Re: [HACKERS] [PATCHES] WAL logging freezing

Heikki Linnakangas Mon, 30 Oct 2006 05:08:56 -0800

We just discussed this in detail with Simon, and it looks like we have5 (!) different but related problems:

1) The original problem of freeze then crash, leaving too high values inrelminxid and datminxid. If you then run vacuum, it might truncate CLOGand you lose the commit status of the records that were supposed to befrozen.


To fix this, we need to WAL log freezing as already discussed.

2) vactuple_get_minxid doesn't take into account xmax's of tuples thathave HEAP_XMAX_INVALID set. That's a problem:


transaction 1001 - BEGIN; DELETE FROM foo where key = 1;
transaction 1001 - ROLLBACK;
transaction 1002 - VACUUM foo;
crash

VACUUM foo will set relminxid to 1002, because HEAP_XMAX_INVALID was seton the tuple (possibly by vacuum itself) that the deletion that rolledback touched. However, that hint-bit update hasn't hit the disk yet, soafter recovery, the tuple will have an xmax of 1001 with no hint-bit,and relminxid is 1002.

The simplest fix for this issue is to ignore the HEAP_XMAX_INVALID hintbit, and take any xmax other than InvalidXid into account whencalculating the relminxid.

3) If you recover from a PITR backup (or have a hot stand-by), with basebackup that's more than 4 billion transactions older than the newest WALrecord, the clog entries of old transactions in the base backup willoverlap with the clog entries of new transactions that are in the WALrecords. This is the problem you also pointed out below.

To fix this, we need to emit a WAL record when truncating the clog. Wemust also make sure that recovery of any WAL record type doesn't rely onclog, because if we truncate the clog and then crash, recovery won'thave the clog available for the old transactions. At the moment,TruncateCLog issues a checkpoint to protect from that but that's notgoing to work when rolling forward logs in PITR, right?

4) If we fix issue 2 so that vactuple_get_minxid always takes xmax intoaccount, even if HEAP_XMAX_INVALID is set, a tuple with an aborted xmaxwill keep us from advancing relminxid and truncating clog etc. Thatdoesn't lead to data corruption, but you will eventually hit thetransaction wrap-around limit. We don't have the same problem with xmin,because we freeze tuples that are older than FreezeLimit to avoid it,but we don't do that for xmax.

To fix this, replace any xmax older than FreezeLimit with InvalidXidduring vacuum. That also needs to be WAL logged.

5) We don't freeze tuples that are in RECENTLY_DEAD orDELETE_IN_PROGRESS state. That doesn't lead to data corruption, but itmight make you hit the transaction wrap-around limit. That can happen ifyou have a transaction that deletes or updates a very old, but not yetfrozen tuple. If you run vacuum while the deleting transaction is inprogress, vacuum won't freeze the tuple, and won't advance thewrap-around limit because of the old tuple. That's not serious if thedeleting transaction commits, because the next vacuum will then removethe tuple, but if it aborts, we might run into the same problem on thenext vacuum, and the next one, and the next one, until we reach thewrap-around.

To fix this, simply do the freezing for tuples in RECENTLY_DEAD andDELETE_IN_PROGRESS states as well,


Am I missing something? Finding this many bugs makes me nervous...

Simon volunteered to make the clog changes for 3 because it's a PITRrelated issue. I can write a patch/patches for the other changes if ithelps.


Tom Lane wrote:

"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:

Tom Lane wrote:

I think it's premature to start writing
patches until we've decided how this really needs to work.

Not logging hint-bit updates seems safe to me. As long as we have theclog, the hint-bit is just a hint. The problem with freezing is thatafter freezing tuples, the corresponding clog page can go away.


Actually clog can go away much sooner than that, at least in normal
operation --- that's what datvacuumxid is for, to track where we can
truncate clog.  Maybe it's OK to say that during WAL replay we keep it
all the way back to the freeze horizon, but I'm not sure how we keep the
system from wiping clog it still needs right after switching to normal
operation.  Maybe we should somehow not xlog updates of datvacuumxid?

Another thing I'm concerned about is the scenario where a PITR
hot-standby machine tracks a master over a period of more than 4 billion
transactions.  I'm not sure what will happen in the slave's pg_clog
directory, but I'm afraid it won't be good :-(

                        regards, tom lane



--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Re: [HACKERS] [PATCHES] WAL logging freezing

Reply via email to