On Thu, Aug 4, 2016 at 2:19 AM, Andreas Seltenreich <seltenre...@gmx.de> wrote:
> testing with sqlsmith shows that the following assertion doesn't hold:
>
>     FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", 
> Line: 10200)
>
> The triggering statements always contain a call to pg_start_backup with
> the third argument 'true', i.e. it's trying to start an exlusive backup.
>
> I didn't manage to put together a stand-alone testcase yet.

While I have not been able to trigger this assertion directly, I have
bumped into the fact that pg_stop_backup can reset unconditionally
XLogCtl->Insert.exclusiveBackup *before* pg_start_backup finishes or
even creates the backup_label file if it is set. So the in-memory
state of the backup is like there is no backups running at all
(including exclusive and non-exclusive), but there could be a
backup_label file present. In this state, it is not possible to
trigger pg_start_backup or pg_stop_backup again except if the
backup_label file is manually removed.

In do_pg_stop_backup, both steps would be better reversed, like in the
patch attached. So what we should actually do in pg_stop_backup is
first look at if the backup_label file exists, and then we reset the
in-memory flag as the last thing that do_pg_start_backup does is
writing the backup_label file. This does not close completely the
window though. After the backup_label file is created, it could still
be possible to trigger  the assertion if there is an error on the
tablespace map file.

This window exists as well on back-branches btw, this is not new to
9.6. Magnus, what do you think?
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f13f9c1..9380225 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10274,40 +10274,6 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 			  errmsg("WAL level not sufficient for making an online backup"),
 				 errhint("wal_level must be set to \"replica\" or \"logical\" at server start.")));
 
-	/*
-	 * OK to update backup counters and forcePageWrites
-	 */
-	WALInsertLockAcquireExclusive();
-	if (exclusive)
-	{
-		if (!XLogCtl->Insert.exclusiveBackup)
-		{
-			WALInsertLockRelease();
-			ereport(ERROR,
-					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-					 errmsg("exclusive backup not in progress")));
-		}
-		XLogCtl->Insert.exclusiveBackup = false;
-	}
-	else
-	{
-		/*
-		 * The user-visible pg_start/stop_backup() functions that operate on
-		 * exclusive backups can be called at any time, but for non-exclusive
-		 * backups, it is expected that each do_pg_start_backup() call is
-		 * matched by exactly one do_pg_stop_backup() call.
-		 */
-		Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
-		XLogCtl->Insert.nonExclusiveBackups--;
-	}
-
-	if (!XLogCtl->Insert.exclusiveBackup &&
-		XLogCtl->Insert.nonExclusiveBackups == 0)
-	{
-		XLogCtl->Insert.forcePageWrites = false;
-	}
-	WALInsertLockRelease();
-
 	if (exclusive)
 	{
 		/*
@@ -10362,6 +10328,40 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 	}
 
 	/*
+	 * OK to update backup counters and forcePageWrites
+	 */
+	WALInsertLockAcquireExclusive();
+	if (exclusive)
+	{
+		if (!XLogCtl->Insert.exclusiveBackup)
+		{
+			WALInsertLockRelease();
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("exclusive backup not in progress")));
+		}
+		XLogCtl->Insert.exclusiveBackup = false;
+	}
+	else
+	{
+		/*
+		 * The user-visible pg_start/stop_backup() functions that operate on
+		 * exclusive backups can be called at any time, but for non-exclusive
+		 * backups, it is expected that each do_pg_start_backup() call is
+		 * matched by exactly one do_pg_stop_backup() call.
+		 */
+		Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
+		XLogCtl->Insert.nonExclusiveBackups--;
+	}
+
+	if (!XLogCtl->Insert.exclusiveBackup &&
+		XLogCtl->Insert.nonExclusiveBackups == 0)
+	{
+		XLogCtl->Insert.forcePageWrites = false;
+	}
+	WALInsertLockRelease();
+
+	/*
 	 * Read and parse the START WAL LOCATION line (this code is pretty crude,
 	 * but we are not expecting any variability in the file format).
 	 */
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to