Re: [HACKERS] [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-08-06 Thread Andreas Seltenreich
Michael Paquier writes:

> Andreas, with the patch attached is the assertion still triggered?
> [2. text/x-diff; base-backup-crash-v2.patch]

I didn't observe the crashes since applying this patch.  There should
have been about five by the amount of fuzzing done.

regards
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-08-05 Thread Kyotaro HORIGUCHI
Hello,

While the exact cause of the situation is not known yet but we
have apparently forgot that pg_stop_backup can be called
simultaneously with pg_start_backup. It seems anyway to me that
pg_stop_backup should be called before the end of pg_start_backup
from their definition and what they do. And it should fix the
assertion failure.

On solution is exclusiveBackupStarted (I'd like to rename the
variable) on shared memory as the patch does.

Another place where we can have something with the same effect is
file system. We can create 'backup_label.tmp" at very early in
pg_start_backup and rename it to backup_label at the end of the
function. Anyway exclusive pg_stop_backup needs that. A defect of
that would be the remaining backup_label.tmp file after crash
during pg_start_backup. Renaming tmp to (none) is atomic enough
for this usage but I'm not sure if it's in a network file system.
exclusiveBackup is also used this kind of exclusion, this also is
replasable with the .tmp file.

But after some thoughts, it found to need to add a bunch of error
handling code for the file operations and it should become too
complex. So I abandoned it.


At Fri, 5 Aug 2016 12:16:13 +0900, Michael Paquier  
wrote in 
> On Thu, Aug 4, 2016 at 4:38 PM, Michael Paquier
>  wrote:
> > Andreas, with the patch attached is the assertion still triggered?
> >
> > Thoughts from others are welcome as well.
> 
> Self review:
>   * of streaming base backups currently in progress. forcePageWrites is 
> set
>   * to true when either of these is non-zero. lastBackupStart is the 
> latest
>   * checkpoint redo location used as a starting point for an online 
> backup.
> + * exclusiveBackupStarted is true while pg_start_backup() is being called
> + * during an exclusive backup.
>   */
>  boolexclusiveBackup;
>  intnonExclusiveBackups;
>  XLogRecPtrlastBackupStart;
> +boolexclusiveBackupStarted;
> It would be better to just use one variable that uses an enum to track
> the following states of an exclusive backup: none, starting and
> in-progress.

What I thought when I looked the first patch is that. Addition to
that I don't see the name exclusiveBackupStated is
appropriate.

One more argument is the necessity of the WALInsertLock at the
end of pg_start_backup. No other backend cannot reach there
concurrently and possible latency from cache coherency won't be a
matter for the purpose, I suppose. What do you think it is needed
for?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-08-04 Thread Michael Paquier
On Thu, Aug 4, 2016 at 4:38 PM, Michael Paquier
 wrote:
> Andreas, with the patch attached is the assertion still triggered?
>
> Thoughts from others are welcome as well.

Self review:
  * of streaming base backups currently in progress. forcePageWrites is set
  * to true when either of these is non-zero. lastBackupStart is the latest
  * checkpoint redo location used as a starting point for an online backup.
+ * exclusiveBackupStarted is true while pg_start_backup() is being called
+ * during an exclusive backup.
  */
 boolexclusiveBackup;
 intnonExclusiveBackups;
 XLogRecPtrlastBackupStart;
+boolexclusiveBackupStarted;
It would be better to just use one variable that uses an enum to track
the following states of an exclusive backup: none, starting and
in-progress.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-08-04 Thread Michael Paquier
On Thu, Aug 4, 2016 at 4:03 PM, Michael Paquier
 wrote:
> This window exists as well on back-branches btw, this is not new to
> 9.6. Magnus, what do you think?

Actually, a completely water-proof solution is to use an in-memory
flag proper to exclusive backups during the time pg_start_backup() is
called, at the cost of taking WALInsertLockAcquireExclusive once again
at the end of do_pg_start_backup(), but it seems to me that it is an
acceptable cost because pg_start_backup is not a hot code path. Using
a separate flag has the advantage to provide to the user a better
error message when pg_stop_backup is used while pg_start_backup is
running as well.

Andreas, with the patch attached is the assertion still triggered?

Thoughts from others are welcome as well.
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f13f9c1..c4bc332 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -505,10 +505,13 @@ typedef struct XLogCtlInsert
 	 * of streaming base backups currently in progress. forcePageWrites is set
 	 * to true when either of these is non-zero. lastBackupStart is the latest
 	 * checkpoint redo location used as a starting point for an online backup.
+	 * exclusiveBackupStarted is true while pg_start_backup() is being called
+	 * during an exclusive backup.
 	 */
 	bool		exclusiveBackup;
 	int			nonExclusiveBackups;
 	XLogRecPtr	lastBackupStart;
+	bool		exclusiveBackupStarted;
 
 	/*
 	 * WAL insertion locks.
@@ -9833,7 +9836,8 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
 	WALInsertLockAcquireExclusive();
 	if (exclusive)
 	{
-		if (XLogCtl->Insert.exclusiveBackup)
+		if (XLogCtl->Insert.exclusiveBackup ||
+			XLogCtl->Insert.exclusiveBackupStarted)
 		{
 			WALInsertLockRelease();
 			ereport(ERROR,
@@ -9842,6 +9846,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
 	 errhint("Run pg_stop_backup() and try again.")));
 		}
 		XLogCtl->Insert.exclusiveBackup = true;
+		XLogCtl->Insert.exclusiveBackupStarted = true;
 	}
 	else
 		XLogCtl->Insert.nonExclusiveBackups++;
@@ -10178,6 +10183,16 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
 	PG_END_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive));
 
 	/*
+	 * Mark that start phase has correctly finished for an exclusive backup.
+	 */
+	if (exclusive)
+	{
+		WALInsertLockAcquireExclusive();
+		XLogCtl->Insert.exclusiveBackupStarted = false;
+		WALInsertLockRelease();
+	}
+
+	/*
 	 * We're done.  As a convenience, return the starting WAL location.
 	 */
 	if (starttli_p)
@@ -10195,8 +10210,10 @@ pg_start_backup_callback(int code, Datum arg)
 	WALInsertLockAcquireExclusive();
 	if (exclusive)
 	{
-		Assert(XLogCtl->Insert.exclusiveBackup);
+		Assert(XLogCtl->Insert.exclusiveBackup &&
+			   XLogCtl->Insert.exclusiveBackupStarted);
 		XLogCtl->Insert.exclusiveBackup = false;
+		XLogCtl->Insert.exclusiveBackupStarted = false;
 	}
 	else
 	{
@@ -10287,6 +10304,13 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 	(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 	 errmsg("exclusive backup not in progress")));
 		}
+		if (XLogCtl->Insert.exclusiveBackupStarted)
+		{
+			WALInsertLockRelease();
+			ereport(ERROR,
+	(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+	 errmsg("exclusive backup currently starting")));
+		}
 		XLogCtl->Insert.exclusiveBackup = false;
 	}
 	else

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-08-04 Thread Michael Paquier
On Thu, Aug 4, 2016 at 2:19 AM, Andreas Seltenreich  wrote:
> testing with sqlsmith shows that the following assertion doesn't hold:
>
> FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", 
> Line: 10200)
>
> The triggering statements always contain a call to pg_start_backup with
> the third argument 'true', i.e. it's trying to start an exlusive backup.
>
> I didn't manage to put together a stand-alone testcase yet.

While I have not been able to trigger this assertion directly, I have
bumped into the fact that pg_stop_backup can reset unconditionally
XLogCtl->Insert.exclusiveBackup *before* pg_start_backup finishes or
even creates the backup_label file if it is set. So the in-memory
state of the backup is like there is no backups running at all
(including exclusive and non-exclusive), but there could be a
backup_label file present. In this state, it is not possible to
trigger pg_start_backup or pg_stop_backup again except if the
backup_label file is manually removed.

In do_pg_stop_backup, both steps would be better reversed, like in the
patch attached. So what we should actually do in pg_stop_backup is
first look at if the backup_label file exists, and then we reset the
in-memory flag as the last thing that do_pg_start_backup does is
writing the backup_label file. This does not close completely the
window though. After the backup_label file is created, it could still
be possible to trigger  the assertion if there is an error on the
tablespace map file.

This window exists as well on back-branches btw, this is not new to
9.6. Magnus, what do you think?
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f13f9c1..9380225 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10274,40 +10274,6 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 			  errmsg("WAL level not sufficient for making an online backup"),
  errhint("wal_level must be set to \"replica\" or \"logical\" at server start.")));
 
-	/*
-	 * OK to update backup counters and forcePageWrites
-	 */
-	WALInsertLockAcquireExclusive();
-	if (exclusive)
-	{
-		if (!XLogCtl->Insert.exclusiveBackup)
-		{
-			WALInsertLockRelease();
-			ereport(ERROR,
-	(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-	 errmsg("exclusive backup not in progress")));
-		}
-		XLogCtl->Insert.exclusiveBackup = false;
-	}
-	else
-	{
-		/*
-		 * The user-visible pg_start/stop_backup() functions that operate on
-		 * exclusive backups can be called at any time, but for non-exclusive
-		 * backups, it is expected that each do_pg_start_backup() call is
-		 * matched by exactly one do_pg_stop_backup() call.
-		 */
-		Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
-		XLogCtl->Insert.nonExclusiveBackups--;
-	}
-
-	if (!XLogCtl->Insert.exclusiveBackup &&
-		XLogCtl->Insert.nonExclusiveBackups == 0)
-	{
-		XLogCtl->Insert.forcePageWrites = false;
-	}
-	WALInsertLockRelease();
-
 	if (exclusive)
 	{
 		/*
@@ -10362,6 +10328,40 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 	}
 
 	/*
+	 * OK to update backup counters and forcePageWrites
+	 */
+	WALInsertLockAcquireExclusive();
+	if (exclusive)
+	{
+		if (!XLogCtl->Insert.exclusiveBackup)
+		{
+			WALInsertLockRelease();
+			ereport(ERROR,
+	(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+	 errmsg("exclusive backup not in progress")));
+		}
+		XLogCtl->Insert.exclusiveBackup = false;
+	}
+	else
+	{
+		/*
+		 * The user-visible pg_start/stop_backup() functions that operate on
+		 * exclusive backups can be called at any time, but for non-exclusive
+		 * backups, it is expected that each do_pg_start_backup() call is
+		 * matched by exactly one do_pg_stop_backup() call.
+		 */
+		Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
+		XLogCtl->Insert.nonExclusiveBackups--;
+	}
+
+	if (!XLogCtl->Insert.exclusiveBackup &&
+		XLogCtl->Insert.nonExclusiveBackups == 0)
+	{
+		XLogCtl->Insert.forcePageWrites = false;
+	}
+	WALInsertLockRelease();
+
+	/*
 	 * Read and parse the START WAL LOCATION line (this code is pretty crude,
 	 * but we are not expecting any variability in the file format).
 	 */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-08-03 Thread Andreas Seltenreich
Hi,

testing with sqlsmith shows that the following assertion doesn't hold:

FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 
10200)

The triggering statements always contain a call to pg_start_backup with
the third argument 'true', i.e. it's trying to start an exlusive backup.

I didn't manage to put together a stand-alone testcase yet.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers