Re: [HACKERS] [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
Michael Paquier writes: > Andreas, with the patch attached is the assertion still triggered? > [2. text/x-diff; base-backup-crash-v2.patch] I didn't observe the crashes since applying this patch. There should have been about five by the amount of fuzzing done. regards Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
Hello, While the exact cause of the situation is not known yet but we have apparently forgot that pg_stop_backup can be called simultaneously with pg_start_backup. It seems anyway to me that pg_stop_backup should be called before the end of pg_start_backup from their definition and what they do. And it should fix the assertion failure. On solution is exclusiveBackupStarted (I'd like to rename the variable) on shared memory as the patch does. Another place where we can have something with the same effect is file system. We can create 'backup_label.tmp" at very early in pg_start_backup and rename it to backup_label at the end of the function. Anyway exclusive pg_stop_backup needs that. A defect of that would be the remaining backup_label.tmp file after crash during pg_start_backup. Renaming tmp to (none) is atomic enough for this usage but I'm not sure if it's in a network file system. exclusiveBackup is also used this kind of exclusion, this also is replasable with the .tmp file. But after some thoughts, it found to need to add a bunch of error handling code for the file operations and it should become too complex. So I abandoned it. At Fri, 5 Aug 2016 12:16:13 +0900, Michael Paquierwrote in > On Thu, Aug 4, 2016 at 4:38 PM, Michael Paquier > wrote: > > Andreas, with the patch attached is the assertion still triggered? > > > > Thoughts from others are welcome as well. > > Self review: > * of streaming base backups currently in progress. forcePageWrites is > set > * to true when either of these is non-zero. lastBackupStart is the > latest > * checkpoint redo location used as a starting point for an online > backup. > + * exclusiveBackupStarted is true while pg_start_backup() is being called > + * during an exclusive backup. > */ > boolexclusiveBackup; > intnonExclusiveBackups; > XLogRecPtrlastBackupStart; > +boolexclusiveBackupStarted; > It would be better to just use one variable that uses an enum to track > the following states of an exclusive backup: none, starting and > in-progress. What I thought when I looked the first patch is that. Addition to that I don't see the name exclusiveBackupStated is appropriate. One more argument is the necessity of the WALInsertLock at the end of pg_start_backup. No other backend cannot reach there concurrently and possible latency from cache coherency won't be a matter for the purpose, I suppose. What do you think it is needed for? regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
On Thu, Aug 4, 2016 at 4:38 PM, Michael Paquierwrote: > Andreas, with the patch attached is the assertion still triggered? > > Thoughts from others are welcome as well. Self review: * of streaming base backups currently in progress. forcePageWrites is set * to true when either of these is non-zero. lastBackupStart is the latest * checkpoint redo location used as a starting point for an online backup. + * exclusiveBackupStarted is true while pg_start_backup() is being called + * during an exclusive backup. */ boolexclusiveBackup; intnonExclusiveBackups; XLogRecPtrlastBackupStart; +boolexclusiveBackupStarted; It would be better to just use one variable that uses an enum to track the following states of an exclusive backup: none, starting and in-progress. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
On Thu, Aug 4, 2016 at 4:03 PM, Michael Paquierwrote: > This window exists as well on back-branches btw, this is not new to > 9.6. Magnus, what do you think? Actually, a completely water-proof solution is to use an in-memory flag proper to exclusive backups during the time pg_start_backup() is called, at the cost of taking WALInsertLockAcquireExclusive once again at the end of do_pg_start_backup(), but it seems to me that it is an acceptable cost because pg_start_backup is not a hot code path. Using a separate flag has the advantage to provide to the user a better error message when pg_stop_backup is used while pg_start_backup is running as well. Andreas, with the patch attached is the assertion still triggered? Thoughts from others are welcome as well. -- Michael diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index f13f9c1..c4bc332 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -505,10 +505,13 @@ typedef struct XLogCtlInsert * of streaming base backups currently in progress. forcePageWrites is set * to true when either of these is non-zero. lastBackupStart is the latest * checkpoint redo location used as a starting point for an online backup. + * exclusiveBackupStarted is true while pg_start_backup() is being called + * during an exclusive backup. */ bool exclusiveBackup; int nonExclusiveBackups; XLogRecPtr lastBackupStart; + bool exclusiveBackupStarted; /* * WAL insertion locks. @@ -9833,7 +9836,8 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p, WALInsertLockAcquireExclusive(); if (exclusive) { - if (XLogCtl->Insert.exclusiveBackup) + if (XLogCtl->Insert.exclusiveBackup || + XLogCtl->Insert.exclusiveBackupStarted) { WALInsertLockRelease(); ereport(ERROR, @@ -9842,6 +9846,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p, errhint("Run pg_stop_backup() and try again."))); } XLogCtl->Insert.exclusiveBackup = true; + XLogCtl->Insert.exclusiveBackupStarted = true; } else XLogCtl->Insert.nonExclusiveBackups++; @@ -10178,6 +10183,16 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p, PG_END_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive)); /* + * Mark that start phase has correctly finished for an exclusive backup. + */ + if (exclusive) + { + WALInsertLockAcquireExclusive(); + XLogCtl->Insert.exclusiveBackupStarted = false; + WALInsertLockRelease(); + } + + /* * We're done. As a convenience, return the starting WAL location. */ if (starttli_p) @@ -10195,8 +10210,10 @@ pg_start_backup_callback(int code, Datum arg) WALInsertLockAcquireExclusive(); if (exclusive) { - Assert(XLogCtl->Insert.exclusiveBackup); + Assert(XLogCtl->Insert.exclusiveBackup && + XLogCtl->Insert.exclusiveBackupStarted); XLogCtl->Insert.exclusiveBackup = false; + XLogCtl->Insert.exclusiveBackupStarted = false; } else { @@ -10287,6 +10304,13 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p) (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("exclusive backup not in progress"))); } + if (XLogCtl->Insert.exclusiveBackupStarted) + { + WALInsertLockRelease(); + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("exclusive backup currently starting"))); + } XLogCtl->Insert.exclusiveBackup = false; } else -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
On Thu, Aug 4, 2016 at 2:19 AM, Andreas Seltenreichwrote: > testing with sqlsmith shows that the following assertion doesn't hold: > > FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", > Line: 10200) > > The triggering statements always contain a call to pg_start_backup with > the third argument 'true', i.e. it's trying to start an exlusive backup. > > I didn't manage to put together a stand-alone testcase yet. While I have not been able to trigger this assertion directly, I have bumped into the fact that pg_stop_backup can reset unconditionally XLogCtl->Insert.exclusiveBackup *before* pg_start_backup finishes or even creates the backup_label file if it is set. So the in-memory state of the backup is like there is no backups running at all (including exclusive and non-exclusive), but there could be a backup_label file present. In this state, it is not possible to trigger pg_start_backup or pg_stop_backup again except if the backup_label file is manually removed. In do_pg_stop_backup, both steps would be better reversed, like in the patch attached. So what we should actually do in pg_stop_backup is first look at if the backup_label file exists, and then we reset the in-memory flag as the last thing that do_pg_start_backup does is writing the backup_label file. This does not close completely the window though. After the backup_label file is created, it could still be possible to trigger the assertion if there is an error on the tablespace map file. This window exists as well on back-branches btw, this is not new to 9.6. Magnus, what do you think? -- Michael diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index f13f9c1..9380225 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -10274,40 +10274,6 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p) errmsg("WAL level not sufficient for making an online backup"), errhint("wal_level must be set to \"replica\" or \"logical\" at server start."))); - /* - * OK to update backup counters and forcePageWrites - */ - WALInsertLockAcquireExclusive(); - if (exclusive) - { - if (!XLogCtl->Insert.exclusiveBackup) - { - WALInsertLockRelease(); - ereport(ERROR, - (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), - errmsg("exclusive backup not in progress"))); - } - XLogCtl->Insert.exclusiveBackup = false; - } - else - { - /* - * The user-visible pg_start/stop_backup() functions that operate on - * exclusive backups can be called at any time, but for non-exclusive - * backups, it is expected that each do_pg_start_backup() call is - * matched by exactly one do_pg_stop_backup() call. - */ - Assert(XLogCtl->Insert.nonExclusiveBackups > 0); - XLogCtl->Insert.nonExclusiveBackups--; - } - - if (!XLogCtl->Insert.exclusiveBackup && - XLogCtl->Insert.nonExclusiveBackups == 0) - { - XLogCtl->Insert.forcePageWrites = false; - } - WALInsertLockRelease(); - if (exclusive) { /* @@ -10362,6 +10328,40 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p) } /* + * OK to update backup counters and forcePageWrites + */ + WALInsertLockAcquireExclusive(); + if (exclusive) + { + if (!XLogCtl->Insert.exclusiveBackup) + { + WALInsertLockRelease(); + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("exclusive backup not in progress"))); + } + XLogCtl->Insert.exclusiveBackup = false; + } + else + { + /* + * The user-visible pg_start/stop_backup() functions that operate on + * exclusive backups can be called at any time, but for non-exclusive + * backups, it is expected that each do_pg_start_backup() call is + * matched by exactly one do_pg_stop_backup() call. + */ + Assert(XLogCtl->Insert.nonExclusiveBackups > 0); + XLogCtl->Insert.nonExclusiveBackups--; + } + + if (!XLogCtl->Insert.exclusiveBackup && + XLogCtl->Insert.nonExclusiveBackups == 0) + { + XLogCtl->Insert.forcePageWrites = false; + } + WALInsertLockRelease(); + + /* * Read and parse the START WAL LOCATION line (this code is pretty crude, * but we are not expecting any variability in the file format). */ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
Hi, testing with sqlsmith shows that the following assertion doesn't hold: FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200) The triggering statements always contain a call to pg_start_backup with the third argument 'true', i.e. it's trying to start an exlusive backup. I didn't manage to put together a stand-alone testcase yet. regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers