[HACKERS] HS and clog
Hi, During the investigation of http://archives.postgresql.org/message-id/CAL_0b1t%3DWuM6roO8dki%3Dw8DhH8P8whhohbPjReymmQUrOcNT2A%40mail.gmail.com I noticed that during HS we do the following in RecordKnownAssignedTransactionIds: if (TransactionIdFollows(xid, latestObservedXid)) { TransactionId next_expected_xid; /* * Extend clog and subtrans like we do in GetNewTransactionId() during * normal operation using individual extend steps. Typical case * requires almost no activity. */ next_expected_xid = latestObservedXid; TransactionIdAdvance(next_expected_xid); while (TransactionIdPrecedesOrEquals(next_expected_xid, xid)) { ExtendCLOG(next_expected_xid); ExtendSUBTRANS(next_expected_xid); TransactionIdAdvance(next_expected_xid); } Extending subtrans is fine, that's required since its truncated after restart and because its not really WAL logged, but extending CLOG? Thats strange, isn't it, since clog is actually WAL logged, so all required pages will be zeroed from their wal records anyway. The commit introducing HS changed ExtendCLOG to do /* Zero the page and make an XLOG entry about it */ ZeroCLOGPage(pageno, !InRecovery); to make that even work during recovery. Imo this shouldn't be needed. Simon, do you remember why you added that? It makes me uneasy doing something like that only during HS but not during normal crash recovery/disabled HS. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HS and clog
On 30 March 2013 18:20, Andres Freund and...@2ndquadrant.com wrote: Simon, do you remember why you added that? That doesn't look like code I added, but I'll look some more. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HS and clog
On 30 March 2013 18:20, Andres Freund and...@2ndquadrant.com wrote: Imo this shouldn't be needed. In principle, I think your premise looks correct. ExtendCLOG() is not needed. If the xid truly is known assigned at a point in time, then the clog should already have been extended to allow that. I can't recall any special case there, but there may be one. Given that it seems to work, changing it requires more care... -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HS and clog
Simon Riggs si...@2ndquadrant.com writes: On 30 March 2013 18:20, Andres Freund and...@2ndquadrant.com wrote: Imo this shouldn't be needed. In principle, I think your premise looks correct. ExtendCLOG() is not needed. If the xid truly is known assigned at a point in time, then the clog should already have been extended to allow that. IIRC, the slru/clog code assumes, or at least did at one time assume, that requests to initialize new pages are consecutive after startup. Which is OK in normal operation since we don't skip over assignment of any XIDs. What I thought Andres was worried about here was that there might be a gap in the extension requests when doing HS replay, and that maybe that led to failure to create files or portions of files that later the code would try to reference. But maybe I misunderstood. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HS and clog
On 2013-03-30 18:24:44 -0400, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On 30 March 2013 18:20, Andres Freund and...@2ndquadrant.com wrote: Imo this shouldn't be needed. In principle, I think your premise looks correct. ExtendCLOG() is not needed. If the xid truly is known assigned at a point in time, then the clog should already have been extended to allow that. I think thats pretty much guaranteed since xids should only be generated by GetNewTransactionId() which calls ExtendCLOG() which emits CLOG_ZEROPAGE. So any failure here would lead to broken crash recovery. What I am worried about is mostly that it could lead to a) errors with overwriting existing clog entries if the known assigned xids machinery lost track somehow. Check for example the bug that triggered me looking at the code referenced upthread. b) future coding errors which only work with either HS enabled/disabled since the behaviour is now different between both. IIRC, the slru/clog code assumes, or at least did at one time assume, that requests to initialize new pages are consecutive after startup. Which is OK in normal operation since we don't skip over assignment of any XIDs. It at least expects that ExtendClog() gets called for every xid since it only zeroes it for the first xid on a page. What I thought Andres was worried about here was that there might be a gap in the extension requests when doing HS replay, and that maybe that led to failure to create files or portions of files that later the code would try to reference. No, I am not particularly worried about that since I don't think the HS specific call to ExtendCLOG() can happen before a CLOG_ZEROPAGE record has been replayed, so I can't see how it ever can do something necessary. Its more that I am worried that it could overwrite stuff in edge-cases. I am pretty sure that the scenario you describe happens for SUBTRANS tho. Leading to the reported bug. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HS and clog
On 2013-03-30 23:58:26 +0100, Andres Freund wrote: I am pretty sure that the scenario you describe happens for SUBTRANS tho. Leading to the reported bug. For a slightly too long explanation see: http://archives.postgresql.org/message-id/20130330172144.GI28736%40alap2.anarazel.de Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers