On Oct 25, 2011, at 13:39, Florian Pflug wrote:
> On Oct 25, 2011, at 11:13, Simon Riggs wrote:
>> On Tue, Oct 25, 2011 at 8:03 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>>> We are starting recovery at the right place but we are initialising
>>> the clog and subtrans incorrectly. Precisely, the oldestActiveXid is
>>> being derived later than it should be, which can cause problems if
>>> this then means that whole pages are uninitialised in subtrans. The bug
>>> only shows up if you do enough transactions (2048 is always enough) to
>>> move to the next subtrans page between the redo pointer and the
>>> checkpoint record while at the same time we do not have a long running
>>> transaction that spans those two points. That's just enough to happen
>>> reasonably frequently on busy systems and yet just enough to have
>>> slipped through testing.
>>> We must either
>>> 1. During CreateCheckpoint() we should derive oldestActiveXid before
>>> we derive the redo location
>> (1) looks the best way forwards in all cases.
> Let me see if I understand this
> The problem seems to be that we currently derive oldestActiveXid at the end of
> the checkpoint, just before writing the checkpoint record. Since we use
> oldestActiveXid to initialize SUBTRANS, this is wrong. Records written before
> that checkpoint record (but after the REDO location, of course) may very well
> contain XIDs earlier than that wrongly derived oldestActiveXID, and if we attempt
> to touch these XIDs' SUBTRANS state, we error out.
> Your patch seems sensible, because the checkpoint "logically" occurs at the
> REDO location, not the checkpoint's location, so we ought to log an
> oldestActiveXID corresponding to that location.

Thinking about this some more (and tracing through the code), I realized that
things are a bit more complicated.

What we actually need to ensure, I think, is that the XID we pass to
StartupSUBTRANS() is earlier than any top-level XID in XLOG_XACT_ASSIGNMENT
records. Which, at first glance, implies that we ought to use the nextId at the
*beginning* of the checkpoint for SUBTRANS initialization. At second glance,
however, that'd be wrong, because [...]
assignment. Thus, an XLOG_XACT_ASSIGNMENT written *after* the checkpoint has
started may contain sub-XIDs which were assigned *before* the checkpoint has
started.

Using oldestActiveXID works around that because we guarantee that sub-XIDs are
larger than their parent XIDs and because only active transactions can produce
XLOG_XACT_ASSIGNMENT records.

So your patch is fine, but I think the reasoning about why oldestActiveXID is
the correct value for StartupSUBTRANS deserves an explanation somewhere.

best regards,
Florian Pflug

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)