Re: [HACKERS] Allowing multiple concurrent base backups

Heikki Linnakangas Fri, 18 Mar 2011 04:57:12 -0700

On 18.03.2011 10:48, Heikki Linnakangas wrote:

On 17.03.2011 21:39, Robert Haas wrote:

On Mon, Jan 31, 2011 at 10:45 PM, Fujii Masao<masao.fu...@gmail.com>
wrote:

On Tue, Feb 1, 2011 at 1:31 AM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:

Hmm, good point. It's harmless, but creating the history file in the
first
place sure seems like a waste of time.


The attached patch changes pg_stop_backup so that it doesn't create
the backup history file if archiving is not enabled.

When I tested the multiple backups, I found that they can have the same
checkpoint location and the same history file name.

--------------------
$ for ((i=0; i<4; i++)); do
pg_basebackup -D test$i -c fast -x -l test$i&
done

$ cat test0/backup_label
START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/20000E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test0

$ cat test1/backup_label
START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/20000E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test1

$ cat test2/backup_label
START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/20000E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test2

$ cat test3/backup_label
START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/20000E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test3

$ ls archive/*.backup
archive/000000010000000000000002.000000B0.backup
--------------------

This would cause a serious problem. Because the backup-end record
which indicates the same "START WAL LOCATION" can be written by the
first backup before the other finishes. So we might think wrongly that
we've already reached a consistency state by reading the backup-end
record (written by the first backup) before reading the last required
WAL
file.

/*
* Force a CHECKPOINT. Aside from being necessary to prevent torn
* page problems, this guarantees that two successive backup runs will
* have different checkpoint positions and hence different history
* file names, even if nothing happened in between.
*
* We use CHECKPOINT_IMMEDIATE only if requested by user (via passing
* fast = true). Otherwise this can take awhile.
*/
RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
(fast ? CHECKPOINT_IMMEDIATE : 0));

This problem happens because the above code (in do_pg_start_backup)
actually doesn't ensure that the concurrent backups have the different
checkpoint locations. ISTM that we should change the above or elsewhere
to ensure that.


Yes, good point.

Here's a patch based on that approach, ensuring that each base backupuses a different checkpoint as the start location. I think I'll committhis, rather than invent a new unique ID mechanism for backups. Thelatter would need changes in recovery and control file too, and I don'tfeel like tinkering with that at this stage.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 15af669..570f02b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -355,10 +355,13 @@ typedef struct XLogCtlInsert
 	 * exclusiveBackup is true if a backup started with pg_start_backup() is
 	 * in progress, and nonExclusiveBackups is a counter indicating the number
 	 * of streaming base backups currently in progress. forcePageWrites is
-	 * set to true when either of these is non-zero.
+	 * set to true when either of these is non-zero. lastBackupStart is the
+	 * latest checkpoint redo location used as a starting point for an online
+	 * backup.
 	 */
 	bool		exclusiveBackup;
 	int			nonExclusiveBackups;
+	XLogRecPtr	lastBackupStart;
 } XLogCtlInsert;
 
 /*
@@ -8809,6 +8812,19 @@ do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
 						MAXPGPATH)));
 
 	/*
+	 * Force an XLOG file switch before the checkpoint, to ensure that the WAL
+	 * segment the checkpoint is written to doesn't contain pages with old
+	 * timeline IDs. That would otherwise happen if you called
+	 * pg_start_backup() right after restoring from a PITR archive: the first
+	 * WAL segment containing the startup checkpoint has pages in the
+	 * beginning with the old timeline ID. That can cause trouble at recovery:
+	 * we won't have a history file covering the old timeline if pg_xlog
+	 * directory was not included in the base backup and the WAL archive was
+	 * cleared too before starting the backup.
+	 */
+	RequestXLogSwitch();
+
+	/*
 	 * Mark backup active in shared memory.  We must do full-page WAL writes
 	 * during an on-line backup even if not doing so at other times, because
 	 * it's quite possible for the backup dump to obtain a "torn" (partially
@@ -8843,43 +8859,54 @@ do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
 	XLogCtl->Insert.forcePageWrites = true;
 	LWLockRelease(WALInsertLock);
 
-	/*
-	 * Force an XLOG file switch before the checkpoint, to ensure that the WAL
-	 * segment the checkpoint is written to doesn't contain pages with old
-	 * timeline IDs. That would otherwise happen if you called
-	 * pg_start_backup() right after restoring from a PITR archive: the first
-	 * WAL segment containing the startup checkpoint has pages in the
-	 * beginning with the old timeline ID. That can cause trouble at recovery:
-	 * we won't have a history file covering the old timeline if pg_xlog
-	 * directory was not included in the base backup and the WAL archive was
-	 * cleared too before starting the backup.
-	 */
-	RequestXLogSwitch();
-
 	/* Ensure we release forcePageWrites if fail below */
 	PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive));
 	{
-		/*
-		 * Force a CHECKPOINT.	Aside from being necessary to prevent torn
-		 * page problems, this guarantees that two successive backup runs will
-		 * have different checkpoint positions and hence different history
-		 * file names, even if nothing happened in between.
-		 *
-		 * We use CHECKPOINT_IMMEDIATE only if requested by user (via passing
-		 * fast = true).  Otherwise this can take awhile.
-		 */
-		RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
-						  (fast ? CHECKPOINT_IMMEDIATE : 0));
+		bool gotUniqueStartpoint = false;
+		do
+		{
+			/*
+			 * Force a CHECKPOINT.	Aside from being necessary to prevent torn
+			 * page problems, this guarantees that two successive backup runs will
+			 * have different checkpoint positions and hence different history
+			 * file names, even if nothing happened in between.
+			 *
+			 * We use CHECKPOINT_IMMEDIATE only if requested by user (via passing
+			 * fast = true).  Otherwise this can take awhile.
+			 */
+			RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
+							  (fast ? CHECKPOINT_IMMEDIATE : 0));
 
-		/*
-		 * Now we need to fetch the checkpoint record location, and also its
-		 * REDO pointer.  The oldest point in WAL that would be needed to
-		 * restore starting from the checkpoint is precisely the REDO pointer.
-		 */
-		LWLockAcquire(ControlFileLock, LW_SHARED);
-		checkpointloc = ControlFile->checkPoint;
-		startpoint = ControlFile->checkPointCopy.redo;
-		LWLockRelease(ControlFileLock);
+			/*
+			 * Now we need to fetch the checkpoint record location, and also its
+			 * REDO pointer.  The oldest point in WAL that would be needed to
+			 * restore starting from the checkpoint is precisely the REDO pointer.
+			 */
+			LWLockAcquire(ControlFileLock, LW_SHARED);
+			checkpointloc = ControlFile->checkPoint;
+			startpoint = ControlFile->checkPointCopy.redo;
+			LWLockRelease(ControlFileLock);
+
+			/*
+			 * If two base backups are started at the same time (in WAL
+			 * sender processes), we need to make sure that they use
+			 * different checkpoints as starting locations, because we use
+			 * the starting WAL location as a unique identifier for the base
+			 * backup in the end-of-backup WAL record and when we write the
+			 * backup history file. Perhaps it would be better generate a
+			 * separate unique ID for each backup instead of forcing another
+			 * checkpoint, but taking a checkpoint right after another is
+			 * not that expensive either because only few buffers have been
+			 * dirtied yet.
+			 */
+			LWLockAcquire(WALInsertLock, LW_SHARED);
+			if (XLByteLT(XLogCtl->Insert.lastBackupStart, startpoint))
+			{
+				XLogCtl->Insert.lastBackupStart = startpoint;
+				gotUniqueStartpoint = true;
+			}
+			LWLockRelease(WALInsertLock);
+		} while(!gotUniqueStartpoint);
 
 		XLByteToSeg(startpoint, _logId, _logSeg);
 		XLogFileName(xlogfilename, ThisTimeLineID, _logId, _logSeg);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Allowing multiple concurrent base backups

Reply via email to