Re: [HACKERS] PITR - Some data is not recovered.

Tom Lane Tue, 10 Aug 2004 19:28:31 -0700

OKADA Satoshi <[EMAIL PROTECTED]> writes:
> I'm testing PITR using pgbench and postgresql ver.8.0bata. 
> I think that result of recovery is wrong.


Many thanks for this bug report.  I have developed a patch (attached)
that seems to fix it for me.  Please test and see if you can still
cause the problem.

BTW, I found that it was much easier to cause the bug to happen if you
force a long commit delay.  To do this you need to have
        fsync = on
        commit_siblings = 1
        commit_delay = 100000
and while the pgbench run is going, start a new psql and do
        begin;
        create table foo(f1 int);
(or any other command that modifies the database).  If you are watching
things with "top" you should notice a drastic decrease in the CPU
consumption of the backend connected to pgbench as soon as you do this.
Then proceed with taking the on-line backup.  You can end the extra psql
session after the backup is done, to allow pgbench to go back to
normal speed.

Doing it this way I got a failure just about every time (3 failures in 3
tries) whereas without, I had seen only one failure in four tries.

                        regards, tom lane


Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/access/transam/xact.c,v
retrieving revision 1.177
diff -c -r1.177 xact.c
*** src/backend/access/transam/xact.c   3 Aug 2004 15:57:26 -0000       1.177
--- src/backend/access/transam/xact.c   10 Aug 2004 20:55:48 -0000
***************
*** 574,586 ****
                START_CRIT_SECTION();
  
                /*
!                * We only need to log the commit in XLOG if the transaction made
!                * any transaction-controlled XLOG entries or will delete files.
                 * (If it made no transaction-controlled XLOG entries, its XID
                 * appears nowhere in permanent storage, so no one else will ever care
!                * if it committed.)
                 */
                madeTCentries = (MyLastRecPtr.xrecoff != 0);
                if (madeTCentries || nrels > 0)
                {
                        XLogRecData rdata[3];
--- 574,601 ----
                START_CRIT_SECTION();
  
                /*
!                * If our transaction made any transaction-controlled XLOG entries,
!                * we need to lock out checkpoint start between writing our XLOG
!                * record and updating pg_clog.  Otherwise it is possible for the
!                * checkpoint to set REDO after the XLOG record but fail to flush the
!                * pg_clog update to disk, leading to loss of the transaction commit
!                * if we crash a little later.  Slightly klugy fix for problem
!                * discovered 2004-08-10.
!                *
                 * (If it made no transaction-controlled XLOG entries, its XID
                 * appears nowhere in permanent storage, so no one else will ever care
!                * if it committed; so it doesn't matter if we lose the commit flag.)
!                *
!                * Note we only need a shared lock.
                 */
                madeTCentries = (MyLastRecPtr.xrecoff != 0);
+               if (madeTCentries)
+                       LWLockAcquire(CheckpointStartLock, LW_SHARED);
+ 
+               /*
+                * We only need to log the commit in XLOG if the transaction made
+                * any transaction-controlled XLOG entries or will delete files.
+                */
                if (madeTCentries || nrels > 0)
                {
                        XLogRecData rdata[3];
***************
*** 668,673 ****
--- 683,692 ----
                        TransactionIdCommitTree(nchildren, children);
                }
  
+               /* Unlock checkpoint lock if we acquired it */
+               if (madeTCentries)
+                       LWLockRelease(CheckpointStartLock);
+ 
                END_CRIT_SECTION();
        }
  
***************
*** 850,855 ****
--- 869,876 ----
                 *
                 * We do not flush XLOG to disk unless deleting files, since the
                 * default assumption after a crash would be that we aborted, anyway.
+                * For the same reason, we don't need to worry about interlocking
+                * against checkpoint start.
                 */
                if (MyLastRecPtr.xrecoff != 0 || nrels > 0)
                {
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/access/transam/xlog.c,v
retrieving revision 1.158
diff -c -r1.158 xlog.c
*** src/backend/access/transam/xlog.c   9 Aug 2004 16:26:01 -0000       1.158
--- src/backend/access/transam/xlog.c   10 Aug 2004 20:55:49 -0000
***************
*** 4699,4704 ****
--- 4699,4713 ----
        checkPoint.ThisTimeLineID = ThisTimeLineID;
        checkPoint.time = time(NULL);
  
+       /*
+        * We must hold CheckpointStartLock while determining the checkpoint
+        * REDO pointer.  This ensures that any concurrent transaction commits
+        * will be either not yet logged, or logged and recorded in pg_clog.
+        * See notes in RecordTransactionCommit().
+        */
+       LWLockAcquire(CheckpointStartLock, LW_EXCLUSIVE);
+ 
+       /* And we need WALInsertLock too */
        LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
  
        /*
***************
*** 4731,4736 ****
--- 4740,4746 ----
                        ControlFile->checkPointCopy.redo.xrecoff)
                {
                        LWLockRelease(WALInsertLock);
+                       LWLockRelease(CheckpointStartLock);
                        LWLockRelease(CheckpointLock);
                        END_CRIT_SECTION();
                        return;
***************
*** 4789,4794 ****
--- 4799,4807 ----
         * GetSnapshotData needs to get XidGenLock while holding SInvalLock,
         * so there's a risk of deadlock. Need to find a better solution.  See
         * pgsql-hackers discussion of 17-Dec-01.
+        *
+        * XXX actually, the whole UNDO code is dead code and unlikely to ever
+        * be revived, so the lack of a good solution here is not troubling.
         */
  #ifdef NOT_USED
        checkPoint.undo = GetUndoRecPtr();
***************
*** 4798,4808 ****
  #endif
  
        /*
!        * Now we can release insert lock, allowing other xacts to proceed
!        * even while we are flushing disk buffers.
         */
        LWLockRelease(WALInsertLock);
  
        /*
         * Get the other info we need for the checkpoint record.
         */
--- 4811,4823 ----
  #endif
  
        /*
!        * Now we can release insert lock and checkpoint start lock, allowing
!        * other xacts to proceed even while we are flushing disk buffers.
         */
        LWLockRelease(WALInsertLock);
  
+       LWLockRelease(CheckpointStartLock);
+ 
        /*
         * Get the other info we need for the checkpoint record.
         */
Index: src/include/storage/lwlock.h
===================================================================
RCS file: /cvsroot/pgsql-server/src/include/storage/lwlock.h,v
retrieving revision 1.12
diff -c -r1.12 lwlock.h
*** src/include/storage/lwlock.h        11 Jun 2004 16:43:24 -0000      1.12
--- src/include/storage/lwlock.h        10 Aug 2004 20:55:49 -0000
***************
*** 36,41 ****
--- 36,42 ----
        WALWriteLock,
        ControlFileLock,
        CheckpointLock,
+       CheckpointStartLock,
        RelCacheInitLock,
        BgWriterCommLock,
  

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] PITR - Some data is not recovered.

Reply via email to