[HACKERS] HS and clog

2013-03-30 Thread Andres Freund
Hi,

During the investigation of
http://archives.postgresql.org/message-id/CAL_0b1t%3DWuM6roO8dki%3Dw8DhH8P8whhohbPjReymmQUrOcNT2A%40mail.gmail.com
I noticed that during HS we do the following in 
RecordKnownAssignedTransactionIds:
if (TransactionIdFollows(xid, latestObservedXid))
{
TransactionId next_expected_xid;

/*
 * Extend clog and subtrans like we do in GetNewTransactionId() 
during
 * normal operation using individual extend steps. Typical case
 * requires almost no activity.
 */
next_expected_xid = latestObservedXid;
TransactionIdAdvance(next_expected_xid);
while (TransactionIdPrecedesOrEquals(next_expected_xid, xid))
{
ExtendCLOG(next_expected_xid);
ExtendSUBTRANS(next_expected_xid);

TransactionIdAdvance(next_expected_xid);
}

Extending subtrans is fine, that's required since its truncated after
restart and because its not really WAL logged, but extending CLOG? Thats
strange, isn't it, since clog is actually WAL logged, so all required
pages will be zeroed from their wal records anyway.
The commit introducing HS changed ExtendCLOG to do
/* Zero the page and make an XLOG entry about it */
ZeroCLOGPage(pageno, !InRecovery);
to make that even work during recovery.

Imo this shouldn't be needed.

Simon, do you remember why you added that?  It makes me uneasy doing
something like that only during HS but not during normal crash
recovery/disabled HS.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] HS and clog

2013-03-30 Thread Simon Riggs
On 30 March 2013 18:20, Andres Freund and...@2ndquadrant.com wrote:

 Simon, do you remember why you added that?

That doesn't look like code I added, but I'll look some more.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] HS and clog

2013-03-30 Thread Simon Riggs
On 30 March 2013 18:20, Andres Freund and...@2ndquadrant.com wrote:

 Imo this shouldn't be needed.

In principle, I think your premise looks correct. ExtendCLOG() is not needed.

If the xid truly is known assigned at a point in time, then the clog
should already have been extended to allow that.

I can't recall any special case there, but there may be one.

Given that it seems to work, changing it requires more care...

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] HS and clog

2013-03-30 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On 30 March 2013 18:20, Andres Freund and...@2ndquadrant.com wrote:
 Imo this shouldn't be needed.

 In principle, I think your premise looks correct. ExtendCLOG() is not needed.

 If the xid truly is known assigned at a point in time, then the clog
 should already have been extended to allow that.

IIRC, the slru/clog code assumes, or at least did at one time assume,
that requests to initialize new pages are consecutive after startup.
Which is OK in normal operation since we don't skip over assignment
of any XIDs.

What I thought Andres was worried about here was that there might be
a gap in the extension requests when doing HS replay, and that maybe
that led to failure to create files or portions of files that later
the code would try to reference.

But maybe I misunderstood.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] HS and clog

2013-03-30 Thread Andres Freund
On 2013-03-30 18:24:44 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On 30 March 2013 18:20, Andres Freund and...@2ndquadrant.com wrote:
  Imo this shouldn't be needed.
 
  In principle, I think your premise looks correct. ExtendCLOG() is not 
  needed.
 
  If the xid truly is known assigned at a point in time, then the clog
  should already have been extended to allow that.

I think thats pretty much guaranteed since xids should only be generated by
GetNewTransactionId() which calls ExtendCLOG() which emits CLOG_ZEROPAGE. So
any failure here would lead to broken crash recovery.

What I am worried about is mostly that it could lead to

a) errors with overwriting existing clog entries if the known assigned xids
   machinery lost track somehow. Check for example the bug that
   triggered me looking at the code referenced upthread.
b) future coding errors which only work with either HS enabled/disabled since
   the behaviour is now different between both.

 IIRC, the slru/clog code assumes, or at least did at one time assume,
 that requests to initialize new pages are consecutive after startup.
 Which is OK in normal operation since we don't skip over assignment
 of any XIDs.

It at least expects that ExtendClog() gets called for every xid since it
only zeroes it for the first xid on a page.

 What I thought Andres was worried about here was that there might be
 a gap in the extension requests when doing HS replay, and that maybe
 that led to failure to create files or portions of files that later
 the code would try to reference.

No, I am not particularly worried about that since I don't think the HS
specific call to ExtendCLOG() can happen before a CLOG_ZEROPAGE record
has been replayed, so I can't see how it ever can do something
necessary. Its more that I am worried that it could overwrite stuff in
edge-cases.

I am pretty sure that the scenario you describe happens for SUBTRANS
tho. Leading to the reported bug.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] HS and clog

2013-03-30 Thread Andres Freund
On 2013-03-30 23:58:26 +0100, Andres Freund wrote:
 I am pretty sure that the scenario you describe happens for SUBTRANS
 tho. Leading to the reported bug.

For a slightly too long explanation see:
http://archives.postgresql.org/message-id/20130330172144.GI28736%40alap2.anarazel.de

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers