Re: [HACKERS] [BUG?] lag of minRecoveryPont in archive recovery

Heikki Linnakangas Tue, 11 Dec 2012 08:51:56 -0800

On 11.12.2012 17:04, Fujii Masao wrote:

On Tue, Dec 11, 2012 at 6:14 PM, Heikki Linnakangas
<[email protected]>  wrote:

On 11.12.2012 08:07, Kyotaro HORIGUCHI wrote:


The cause is that CheckRecoveryConsistency() is called before rm_redo(),
as Horiguchi-san suggested upthead. Imagine the case where
minRecoveryPoint is set to the location of the XLOG_SMGR_TRUNCATE
record. When restarting the server with that minRecoveryPoint,
the followings would happen, and then PANIC occurs.

1. XLOG_SMGR_TRUNCATE record is read.
2. CheckRecoveryConsistency() is called, and database is marked as
      consistent since we've reached minRecoveryPoint (i.e., the location
      of XLOG_SMGR_TRUNCATE).
3. XLOG_SMGR_TRUNCATE record is replayed, and invalid page is
      found.
4. Since the database has already been marked as consistent, an invalid
      page leads to PANIC.



Exactly.

In smgr_redo, EndRecPtr which is pointing the record next to
SMGR_TRUNCATE, is used as the new minRecoveryPoint. On the other
hand, during the second startup of the standby,
CheckRecoveryConsistency checks for consistency by
XLByteLE(minRecoveryPoint, EndRecPtr) which should be true at
just BEFORE the SMGR_TRUNCATE record is applied. So
reachedConsistency becomes true just before the SMGR_TRUNCATE
record will be applied. Bang!



Ah, I see. I thought moving CheckRecoveryConsistency was just a nice-to-have
thing, so that we'd notice that we're consistent earlier, so didn't pay
attention to that part.

I think the real issue here is that CheckRecoveryConsistency should not be
comparing minRecoveryPoint against recoveryLastRecPtr, not EndRecPtr as it
currently does. EndRecPtr points to the end of the last record *read*, while
recoveryLastRecPtr points to the end of the last record *replayed*. The
latter is what CheckRecoveryConsistency should be concerned with.

BTW, it occurs to me that we have two variables in the shared struct that
are almost the same thing: recoveryLastRecPtr and replayEndRecPtr. The
former points to the end of the last record successfully replayed, while
replayEndRecPtr points to the end of the last record successfully replayed,
or begin replayed at the moment. I find that confusing, so I suggest that we
rename recoveryLastRecPtr to make that more clear. Included in the attached
patch.


The patch looks good. I confirmed that the PANIC error didn't happen,
with the patch.

Thanks. After thinking about this more, I think it's still not quiteright. Now that we fix the check in CheckRecoveryConsistency, we have tomove the call to CheckRecoveryConsistency to after the rm_redo call.Otherwise you don't enter hot standby mode after replaying the lastrecord, the one ending at minRecoveryPoint. You have to wait for onemore record to be read (but not replayed), so thatCheckRecoveryConsistency gets called and we let backends in.

As the code stands, the bad placement of CheckRecoveryConsistency iscompensated by the bug that we open up for hot standby one record tooearly. The net result is that we open up for hot standby just beforereplaying the last record that makes us consistent. The window forsomeone to see the inconsistent state is really small, postmaster willhave to receive the signal, start accepting connections, and you haveconnect and run a query, all before the startup process gets around toreplay the record its already read into memory.


- Heikki


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUG?] lag of minRecoveryPont in archive recovery

Reply via email to