Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2017-01-17 Thread Michael Paquier
I sent this email one month ago but forgot to cc pgsql-hackers ;)
For the record, it is the set of patches attached that have been
pushed as 974ece5, and only Fujii-san has received them... Thanks for
committing the fix by the way!

On Thu, Dec 15, 2016 at 3:30 AM, Fujii Masao  wrote:
> On Wed, Dec 14, 2016 at 11:10 PM, Fujii Masao  wrote:
>> On second thought, do users really want to distinguish those three errornous
>> states? I'm inclined to merge the checks for those three conditions into one,
>> that is,
>>
>> if (XLogCtl->Insert.exclusiveBackupState != EXCLUSIVE_BACKUP_IN_NONE)
>> {
>> WALInsertLockRelease();
>> ereport(ERROR,
>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>  errmsg("a backup is already in progress"),
>>
>> Also it may be better to handle the similar checks in pg_stop_backup,
>> in the same way.
>
> Here is the updated version of the patch that I applied the above "merge" to.

Okay.

> Unfortunately this patch is not applied cleanly to old versions.
> So we need to create more patches for back-patch.

Attached are patches for all supported branches. I have tweaked the
following comment for master/9.6, that's why I am re-sending it:
+* (see comments of ExclusiveBackupState for more details).

Getting master and 9.6 was straight-forward. 9.5 showed a small
conflict. 9.4 bigger conflicts but the code blocks are
straight-forward to place except that the tablespace map does not
exist there, and that we need to be careful that the same flow is used
for the removal of the backup_label file. At 9.3, conflicts are caused
because of changes of APIs for WAL insert locking. Finally 9.2 showed
no conflicts with 9.3, which was a bit surprising.
--
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 457a83452e..b33e3697e8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -349,6 +349,29 @@ typedef struct XLogwrtResult
 } XLogwrtResult;
 
 /*
+ * State of an exclusive backup, necessary to control concurrent activities
+ * across sessions when working on exclusive backups.
+ *
+ * EXCLUSIVE_BACKUP_NONE means that there is no exclusive backup actually
+ * running, to be more precise pg_start_backup() is not being executed for
+ * an exclusive backup and there is no exclusive backup in progress.
+ * EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is starting an
+ * exclusive backup.
+ * EXCLUSIVE_BACKUP_IN_PROGRESS means that pg_start_backup() has finished
+ * running and an exclusive backup is in progress. pg_stop_backup() is
+ * needed to finish it.
+ * EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is stopping an
+ * exclusive backup.
+ */
+typedef enum ExclusiveBackupState
+{
+	EXCLUSIVE_BACKUP_NONE = 0,
+	EXCLUSIVE_BACKUP_STARTING,
+	EXCLUSIVE_BACKUP_IN_PROGRESS,
+	EXCLUSIVE_BACKUP_STOPPING
+} ExclusiveBackupState;
+
+/*
  * Shared state data for XLogInsert.
  */
 typedef struct XLogCtlInsert
@@ -370,13 +393,15 @@ typedef struct XLogCtlInsert
 	bool		fullPageWrites;
 
 	/*
-	 * exclusiveBackup is true if a backup started with pg_start_backup() is
-	 * in progress, and nonExclusiveBackups is a counter indicating the number
-	 * of streaming base backups currently in progress. forcePageWrites is set
-	 * to true when either of these is non-zero. lastBackupStart is the latest
-	 * checkpoint redo location used as a starting point for an online backup.
+	 * exclusiveBackupState indicates the state of an exclusive backup
+	 * (see comments of ExclusiveBackupState for more details).
+	 * nonExclusiveBackups is a counter indicating the number of streaming
+	 * base backups currently in progress. forcePageWrites is set to true
+	 * when either of these is non-zero. lastBackupStart is the latest
+	 * checkpoint redo location used as a starting point for an online
+	 * backup.
 	 */
-	bool		exclusiveBackup;
+	ExclusiveBackupState exclusiveBackupState;
 	int			nonExclusiveBackups;
 	XLogRecPtr	lastBackupStart;
 } XLogCtlInsert;
@@ -693,6 +718,7 @@ static bool CheckForStandbyTrigger(void);
 static void xlog_outrec(StringInfo buf, XLogRecord *record);
 #endif
 static void pg_start_backup_callback(int code, Datum arg);
+static void pg_stop_backup_callback(int code, Datum arg);
 static bool read_backup_label(XLogRecPtr *checkPointLoc,
   bool *backupEndRequired, bool *backupFromStandby);
 static void rm_redo_error_callback(void *arg);
@@ -8699,7 +8725,12 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
 	LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
 	if (exclusive)
 	{
-		if (XLogCtl->Insert.exclusiveBackup)
+		/*
+		 * At first, mark that we're now starting an exclusive backup,
+		 * to ensure that there are no other sessions currently running
+		 * pg_start_backup() or pg_stop_backup().
+		 */
+		if (XLogCtl->Insert.exclusiveBackupState != 

Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-12-14 Thread Fujii Masao
On Wed, Dec 14, 2016 at 11:10 PM, Fujii Masao  wrote:
> On Tue, Dec 13, 2016 at 10:24 PM, Michael Paquier
>  wrote:
>> On Tue, Dec 13, 2016 at 12:49 PM, Fujii Masao  wrote:
>>
>> Thanks for the review.
>
> Thanks for the updated version of the patch!
>
>>> +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>> + errmsg("a backup is already starting")));
>>> +}
>>> +if (XLogCtl->Insert.exclusiveBackupState == 
>>> EXCLUSIVE_BACKUP_STOPPING)
>>> +{
>>> +WALInsertLockRelease();
>>> +ereport(ERROR,
>>> +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>> + errmsg("a backup is currently stopping")));
>>>
>>> Isn't it better to use "an exclusive backup" explicitly rather than
>>> "a backup" here?
>>
>> Yes. It would be better to not touch the original in-progress messages
>> though.
>
> On second thought, do users really want to distinguish those three errornous
> states? I'm inclined to merge the checks for those three conditions into one,
> that is,
>
> if (XLogCtl->Insert.exclusiveBackupState != EXCLUSIVE_BACKUP_IN_NONE)
> {
> WALInsertLockRelease();
> ereport(ERROR,
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>  errmsg("a backup is already in progress"),
>
> Also it may be better to handle the similar checks in pg_stop_backup,
> in the same way.

Here is the updated version of the patch that I applied the above "merge" to.

Unfortunately this patch is not applied cleanly to old versions.
So we need to create more patches for back-patch.

Regards,

-- 
Fujii Masao


base-backup-crash-v7.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-12-14 Thread Fujii Masao
On Tue, Dec 13, 2016 at 10:24 PM, Michael Paquier
 wrote:
> On Tue, Dec 13, 2016 at 12:49 PM, Fujii Masao  wrote:
>
> Thanks for the review.

Thanks for the updated version of the patch!

>> +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>> + errmsg("a backup is already starting")));
>> +}
>> +if (XLogCtl->Insert.exclusiveBackupState == 
>> EXCLUSIVE_BACKUP_STOPPING)
>> +{
>> +WALInsertLockRelease();
>> +ereport(ERROR,
>> +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>> + errmsg("a backup is currently stopping")));
>>
>> Isn't it better to use "an exclusive backup" explicitly rather than
>> "a backup" here?
>
> Yes. It would be better to not touch the original in-progress messages
> though.

On second thought, do users really want to distinguish those three errornous
states? I'm inclined to merge the checks for those three conditions into one,
that is,

if (XLogCtl->Insert.exclusiveBackupState != EXCLUSIVE_BACKUP_IN_NONE)
{
WALInsertLockRelease();
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 errmsg("a backup is already in progress"),

Also it may be better to handle the similar checks in pg_stop_backup,
in the same way.

>> You changed the code so that pg_stop_backup() removes backup_label before
>> it marks the exclusive-backup-state as not-in-progress. Could you tell me
>> why you did this change? It's better to explain it in the comment.
>
> That's actually mentioned in the comments :)
>
> This is to ensure that there cannot be any other pg_stop_backup() running
> in parallel and trying to remove the backup label file. The key point here
> is that the backup_label file is removed during EXCLUSIVE_BACKUP_STOPPING,
> that the removal of the backup_label file is kept last on purpose (that's
> what HEAD does anyway), and that we can rollback to an in-progress state
> in case of failure.

Okay.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-12-13 Thread Michael Paquier
On Wed, Dec 14, 2016 at 9:25 AM, Kyotaro HORIGUCHI
 wrote:
> Ah, sorry for the confusion.
>
> At Tue, 13 Dec 2016 22:43:22 +0900, Michael Paquier 
>  wrote in 
> 

Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-12-13 Thread Kyotaro HORIGUCHI
Ah, sorry for the confusion.

At Tue, 13 Dec 2016 22:43:22 +0900, Michael Paquier  
wrote in 

Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-12-13 Thread Michael Paquier
On Tue, Dec 13, 2016 at 7:15 PM, Kyotaro HORIGUCHI
 wrote:
> Thank you for the comment.

Er, it is good to see more people interested in this problem... As the
former author, still working actively on this patch, perhaps it would
be better if I continue to code this patch, no? It is a waste to step
on each other's toes.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-12-13 Thread Michael Paquier
On Tue, Dec 13, 2016 at 12:49 PM, Fujii Masao  wrote:

Thanks for the review.

> Isn't it better to mention "an exclusive backup" here? What about
>
> EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is starting an 
> exclusive
> backup.
> EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is stopping an exclusive
> backup.

OK, changed this way.

> I think that it's worth explaining why ExclusiveBackupState is necessary,
> in the comment.

Changed like that at the top of the declaration of ExclusiveBackupState:
 * State of an exclusive backup, necessary to control concurrent activities
 * across sessions when working on exclusive backups.

> - * exclusiveBackup is true if a backup started with pg_start_backup() is
> - * in progress, and nonExclusiveBackups is a counter indicating the 
> number
> + * exclusiveBackupState is changed to EXCLUSIVE_BACKUP_STARTING for the
> + * duration of pg_start_backup(), then to EXCLUSIVE_BACKUP_IN_PROGRESS 
> once
> + * it is done, and nonExclusiveBackups is a counter indicating the number
>
> Why didn't you mention EXCLUSIVE_BACKUP_STOPPING and _NONE?
> Basically it's better to explain fully how the state changes.

Let's simplify things instead and just say that the meaning of the values is
described where ExclusiveBackupState is declared.

> +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("a backup is already starting")));
> +}
> +if (XLogCtl->Insert.exclusiveBackupState == 
> EXCLUSIVE_BACKUP_STOPPING)
> +{
> +WALInsertLockRelease();
> +ereport(ERROR,
> +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("a backup is currently stopping")));
>
> Isn't it better to use "an exclusive backup" explicitly rather than
> "a backup" here?

Yes. It would be better to not touch the original in-progress messages
though.

> You changed the code so that pg_stop_backup() removes backup_label before
> it marks the exclusive-backup-state as not-in-progress. Could you tell me
> why you did this change? It's better to explain it in the comment.

That's actually mentioned in the comments :)

This is to ensure that there cannot be any other pg_stop_backup() running
in parallel and trying to remove the backup label file. The key point here
is that the backup_label file is removed during EXCLUSIVE_BACKUP_STOPPING,
that the removal of the backup_label file is kept last on purpose (that's
what HEAD does anyway), and that we can rollback to an in-progress state
in case of failure.

I have rewritten this block like that. Does that sound better?

+* Remove backup_label for an exclusive backup. Before doing anything
+* the status of the exclusive backup is switched to
+* EXCLUSIVE_BACKUP_STOPPING to ensure that there cannot be concurrent
+* operations. In case of failure, in which case the status of the
+* exclusive backup is switched back to in-progress. The removal of the
+* backup_label file is kept last as it is the critical piece proving
+* if an exclusive backup is running or not in the event of a system
+* crash.

Thanks,
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 084401d..8fa9cf4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -463,6 +463,29 @@ typedef union WALInsertLockPadded
 } WALInsertLockPadded;
 
 /*
+ * State of an exclusive backup, necessary to control concurrent activities
+ * across sessions when working on exclusive backups.
+ *
+ * EXCLUSIVE_BACKUP_NONE means that there is no exclusive backup actually
+ * running, to be more precise pg_start_backup() is not being executed for
+ * an exclusive backup and there is no exclusive backup in progress.
+ * EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is starting an
+ * exclusive backup.
+ * EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is stopping an
+ * exclusive backup.
+ * EXCLUSIVE_BACKUP_IN_PROGRESS means that pg_start_backup() has finished
+ * running and an exclusive backup is in progress. pg_stop_backup() is
+ * needed to finish it.
+ */
+typedef enum ExclusiveBackupState
+{
+	EXCLUSIVE_BACKUP_NONE = 0,
+	EXCLUSIVE_BACKUP_STARTING,
+	EXCLUSIVE_BACKUP_STOPPING,
+	EXCLUSIVE_BACKUP_IN_PROGRESS
+} ExclusiveBackupState;
+
+/*
  * Shared state data for WAL insertion.
  */
 typedef struct XLogCtlInsert
@@ -503,13 +526,14 @@ typedef struct XLogCtlInsert
 	bool		fullPageWrites;
 
 	/*
-	 * exclusiveBackup is true if a backup started with pg_start_backup() is
-	 * in progress, and nonExclusiveBackups is a counter indicating the number
-	 * of streaming base backups currently in progress. forcePageWrites is set
-	 * to true when either of these is non-zero. lastBackupStart is the latest
-	 * checkpoint redo location used as a starting point for an online backup.
+	 * The possible 

Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-12-13 Thread Kyotaro HORIGUCHI
Thank you for the comment.

At Tue, 13 Dec 2016 12:49:12 +0900, Fujii Masao  wrote 
in 

Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-12-12 Thread Fujii Masao
On Tue, Nov 8, 2016 at 11:18 AM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Sat, 5 Nov 2016 21:18:42 +0900, Michael Paquier 
>  wrote in 
> 

Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-11-28 Thread Michael Paquier
On Thu, Nov 17, 2016 at 1:18 PM, Michael Paquier
 wrote:
> On Mon, Nov 7, 2016 at 6:18 PM, Kyotaro HORIGUCHI
>  wrote:
>> I will mark this as "Ready for Committer".
>
> I have just noticed that Robert has switched this patch to "needs
> review" by mistake (I think that there was a mistake with another
> patch), so I have switched it back to "Ready for committer". I agree
> with Horiguchi-san that this seems adapted, as it got some review and
> some love.

Moved to next CF.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-11-16 Thread Michael Paquier
On Mon, Nov 7, 2016 at 6:18 PM, Kyotaro HORIGUCHI
 wrote:
> I will mark this as "Ready for Committer".

I have just noticed that Robert has switched this patch to "needs
review" by mistake (I think that there was a mistake with another
patch), so I have switched it back to "Ready for committer". I agree
with Horiguchi-san that this seems adapted, as it got some review and
some love.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-11-07 Thread Kyotaro HORIGUCHI
Hello,

At Sat, 5 Nov 2016 21:18:42 +0900, Michael Paquier  
wrote in 

Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-11-05 Thread Michael Paquier
On Fri, Nov 4, 2016 at 5:36 PM, Kyotaro HORIGUCHI
 wrote:
>> On Tue, Aug 30, 2016 at 1:44 PM, Fujii Masao  wrote:
>> > On Tue, Aug 30, 2016 at 1:32 PM, Michael Paquier
>> >  wrote:
>> >> On Mon, Aug 29, 2016 at 11:16 PM, Fujii Masao  
>> >> wrote:
>> >>> You seem to add another entry for this patch into CommitFest.
>> >>> Either of them needs to be removed.
>> >>> https://commitfest.postgresql.org/10/698/
>> >>
>> >> Indeed. I just removed this one.
>> >>
>> >>> This patch prevents pg_stop_backup from starting while pg_start_backup
>> >>> is running. OTOH, we also should prevent pg_start_backup from starting
>> >>> until "previous" pg_stop_backup has completed? What happens if
>> >>> pg_start_backup starts while pg_stop_backup is running?
>> >>> As far as I read the current code, ISTM that there is no need to do that,
>> >>> but it's better to do the double check.
>> >>
>> >> I don't agree about doing that.
>> >
>> > Hmm... my previous comment was confusing. I wanted to comment "it's better
>> > *also for you* to do the double check whether we need to prevent 
>> > pg_start_backup
>> > while pg_stop_backup is running or not". Your answer seems to be almost 
>> > the same
>> > as mine, i.e., not necessary. Right?
>>
>> Yes, that's not necessary to do more. In the worst case, say
>> pg_start_backup starts when pg_stop_backup is running. And
>> pg_stop_backup has decremented the backup counters, but not yet
>> removed the backup_label, then pg_start_backup would just choke on the
>> presence of the backup_label file
>
> I don't see any problem on the state-transition of
> exclusiveBackupState. For the following part
>
> @@ -10217,7 +10255,7 @@ do_pg_start_backup(const char *backupidstr, bool 
> fast, TimeLineID *starttli_p,
>  {
>/*
> * Check for existing backup label --- implies a back>
up is already
> -   * running.  (XXX given that we checked exclusiveBackup above,
> +   * running.  (XXX given that we checked exclusiveBackupState above,
> * maybe it would be OK to just unlink any such label file?)
>
> The issue in the XXX comment should be settled by this
> chance. PostgreSQL no longer (or should not) places the file by
> mistake of itself. It is only possible by someone evil crafting
> it, or crash-then-restarted. For the former it can be safely
> removed with some notice. For the latter we should remove it
> since the backup taken through the restarting is not
> reliable. Addition to that, issueing pg_start_backup indicates
> that the operator already have forgotten that he/she issued it
> previously. So it seems to me that it can be removed with some
> WARNING.
>
> The error checking on exclusiveBackupState looks somewhat
> redandunt but I don't come up with a better coding.

Yes, that's on purpose to make the error messages more verbose for the user.

> So, everything other than the XXX comment looks fine for me, and
> this is rather straghtforward patch.

I agree that we could do something better, but that would be an
optimization, not a bug fix which is what this patch is aiming at.

> So after deciding what to do
> for the issue and anyone opposed to this patch, I'll make this
> 'Ready for commiter'.

Thanks for the input.

By the way, as long as I look at that, the patch applies cleanly on
master and 9.6 but not further down. If a committer shows up to push
this patch, I can prepare versions for back branches as needed.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-11-04 Thread Kyotaro HORIGUCHI
Hello,

(the header of this message is crafted so it might be isolate
this message from the thread)

The patch still applies on the current master with disaplacements.

> On Tue, Aug 30, 2016 at 1:44 PM, Fujii Masao  wrote:
> > On Tue, Aug 30, 2016 at 1:32 PM, Michael Paquier
> >  wrote:
> >> On Mon, Aug 29, 2016 at 11:16 PM, Fujii Masao  
> >> wrote:
> >>> You seem to add another entry for this patch into CommitFest.
> >>> Either of them needs to be removed.
> >>> https://commitfest.postgresql.org/10/698/
> >>
> >> Indeed. I just removed this one.
> >>
> >>> This patch prevents pg_stop_backup from starting while pg_start_backup
> >>> is running. OTOH, we also should prevent pg_start_backup from starting
> >>> until "previous" pg_stop_backup has completed? What happens if
> >>> pg_start_backup starts while pg_stop_backup is running?
> >>> As far as I read the current code, ISTM that there is no need to do that,
> >>> but it's better to do the double check.
> >>
> >> I don't agree about doing that.
> >
> > Hmm... my previous comment was confusing. I wanted to comment "it's better
> > *also for you* to do the double check whether we need to prevent 
> > pg_start_backup
> > while pg_stop_backup is running or not". Your answer seems to be almost the 
> > same
> > as mine, i.e., not necessary. Right?
> 
> Yes, that's not necessary to do more. In the worst case, say
> pg_start_backup starts when pg_stop_backup is running. And
> pg_stop_backup has decremented the backup counters, but not yet
> removed the backup_label, then pg_start_backup would just choke on the
> presence of the backup_label file

I don't see any problem on the state-transition of
exclusiveBackupState. For the following part

@@ -10217,7 +10255,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, 
TimeLineID *starttli_p,
 {
   /*
* Check for existing backup label --- implies a backup is already
-   * running.  (XXX given that we checked exclusiveBackup above,
+   * running.  (XXX given that we checked exclusiveBackupState above,
* maybe it would be OK to just unlink any such label file?)

The issue in the XXX comment should be settled by this
chance. PostgreSQL no longer (or should not) places the file by
mistake of itself. It is only possible by someone evil crafting
it, or crash-then-restarted. For the former it can be safely
removed with some notice. For the latter we should remove it
since the backup taken through the restarting is not
reliable. Addition to that, issueing pg_start_backup indicates
that the operator already have forgotten that he/she issued it
previously. So it seems to me that it can be removed with some
WARNING.

The error checking on exclusiveBackupState looks somewhat
redandunt but I don't come up with a better coding.

So, everything other than the XXX comment looks fine for me, and
this is rather straghtforward patch. So after deciding what to do
for the issue and anyone opposed to this patch, I'll make this
'Ready for commiter'.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-08-29 Thread Michael Paquier
On Tue, Aug 30, 2016 at 1:44 PM, Fujii Masao  wrote:
> On Tue, Aug 30, 2016 at 1:32 PM, Michael Paquier
>  wrote:
>> On Mon, Aug 29, 2016 at 11:16 PM, Fujii Masao  wrote:
>>> You seem to add another entry for this patch into CommitFest.
>>> Either of them needs to be removed.
>>> https://commitfest.postgresql.org/10/698/
>>
>> Indeed. I just removed this one.
>>
>>> This patch prevents pg_stop_backup from starting while pg_start_backup
>>> is running. OTOH, we also should prevent pg_start_backup from starting
>>> until "previous" pg_stop_backup has completed? What happens if
>>> pg_start_backup starts while pg_stop_backup is running?
>>> As far as I read the current code, ISTM that there is no need to do that,
>>> but it's better to do the double check.
>>
>> I don't agree about doing that.
>
> Hmm... my previous comment was confusing. I wanted to comment "it's better
> *also for you* to do the double check whether we need to prevent 
> pg_start_backup
> while pg_stop_backup is running or not". Your answer seems to be almost the 
> same
> as mine, i.e., not necessary. Right?

Yes, that's not necessary to do more. In the worst case, say
pg_start_backup starts when pg_stop_backup is running. And
pg_stop_backup has decremented the backup counters, but not yet
removed the backup_label, then pg_start_backup would just choke on the
presence of the backup_label file.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-08-29 Thread Fujii Masao
On Tue, Aug 30, 2016 at 1:32 PM, Michael Paquier
 wrote:
> On Mon, Aug 29, 2016 at 11:16 PM, Fujii Masao  wrote:
>> You seem to add another entry for this patch into CommitFest.
>> Either of them needs to be removed.
>> https://commitfest.postgresql.org/10/698/
>
> Indeed. I just removed this one.
>
>> This patch prevents pg_stop_backup from starting while pg_start_backup
>> is running. OTOH, we also should prevent pg_start_backup from starting
>> until "previous" pg_stop_backup has completed? What happens if
>> pg_start_backup starts while pg_stop_backup is running?
>> As far as I read the current code, ISTM that there is no need to do that,
>> but it's better to do the double check.
>
> I don't agree about doing that.

Hmm... my previous comment was confusing. I wanted to comment "it's better
*also for you* to do the double check whether we need to prevent pg_start_backup
while pg_stop_backup is running or not". Your answer seems to be almost the same
as mine, i.e., not necessary. Right?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-08-29 Thread Michael Paquier
On Mon, Aug 29, 2016 at 11:16 PM, Fujii Masao  wrote:
> You seem to add another entry for this patch into CommitFest.
> Either of them needs to be removed.
> https://commitfest.postgresql.org/10/698/

Indeed. I just removed this one.

> This patch prevents pg_stop_backup from starting while pg_start_backup
> is running. OTOH, we also should prevent pg_start_backup from starting
> until "previous" pg_stop_backup has completed? What happens if
> pg_start_backup starts while pg_stop_backup is running?
> As far as I read the current code, ISTM that there is no need to do that,
> but it's better to do the double check.

I don't agree about doing that. The on-disk presence of the
backup_label file is what prevents pg_start_backup to run should
pg_stop_backup have already decremented its counters but not unlinked
yet the backup_label, so this insurance is really enough IMO. One good
reason to keep pg_stop_backup as-is in its failure handling is that if
for example it fails to remove the backup_label file, it is still
possible to do again a backup by just removing manually the existing
backup_label file *without* restarting the server. If you have an
in-memory state it would not be possible to fallback to this method
and you'd need a method to clean up the in-memory state.

Now, well, with the patch as well as HEAD, it could be possible that
the backup counters are decremented, but pg_stop_backup *fails* to
remove the backup_label file which would prevent any subsequent
pg_start_backup to run, because there is still a backup_label file, as
well as any other pg_stop_backup to run, because there is no backup
running per the in-memory state. We could improve the in-memory state
by:
- Having an extra exclusive backup status to mark an exclusive backup
as stopping, say EXCLUSIVE_BACKUP_STOPPING. Then mark the exclusive
backup status as such when do_pg_stop_backup starts.
- delaying counter decrementation in pg_stop_backup to the moment when
the backup_label file has been removed.
- Taking an extra time the exclusive WAL insert lock after
backup_label is removed, and decrement the counters.
- During the time the backup_label file is removed, we need an error
callback to switch back to BACKUP_RUNNING in case of error, similarly
to do_pg_start_backup.
Which is more or less the attached patch. Now, if pg_stop_backup
fails, this has the disadvantage to force a server restart to clean up
the in-memory state, instead of just removing manually the existing
backup_file.

For those reasons I still strongly recommend using v3, and not meddle
with the error handling of pg_stop_backup. Because, well, that's
actually not necessary, and this would just hurt the error handling of
exclusive backups.
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index acd95aa..ad04e3e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -460,6 +460,28 @@ typedef union WALInsertLockPadded
 } WALInsertLockPadded;
 
 /*
+ * State of an exclusive backup
+ *
+ * EXCLUSIVE_BACKUP_NONE means that there is no exclusive backup actually
+ * running, to be more precise pg_start_backup() is not being executed for
+ * an exclusive backup and there is exclusive backup in progress.
+ * EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is running and
+ * that is is not done yet.
+ * EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is running and
+ * that is is not done yet.
+ * EXCLUSIVE_BACKUP_IN_PROGRESS means that pg_start_backup() has finished
+ * running and an exclusive backup is in progress. pg_stop_backup() is
+ * needed to finish it.
+ */
+typedef enum ExclusiveBackupState
+{
+	EXCLUSIVE_BACKUP_NONE = 0,
+	EXCLUSIVE_BACKUP_STARTING,
+	EXCLUSIVE_BACKUP_STOPPING,
+	EXCLUSIVE_BACKUP_IN_PROGRESS
+} ExclusiveBackupState;
+
+/*
  * Shared state data for WAL insertion.
  */
 typedef struct XLogCtlInsert
@@ -500,13 +522,14 @@ typedef struct XLogCtlInsert
 	bool		fullPageWrites;
 
 	/*
-	 * exclusiveBackup is true if a backup started with pg_start_backup() is
-	 * in progress, and nonExclusiveBackups is a counter indicating the number
+	 * exclusiveBackupState is changed to EXCLUSIVE_BACKUP_STARTING for the
+	 * duration of pg_start_backup(), then to EXCLUSIVE_BACKUP_IN_PROGRESS once
+	 * it is done, and nonExclusiveBackups is a counter indicating the number
 	 * of streaming base backups currently in progress. forcePageWrites is set
 	 * to true when either of these is non-zero. lastBackupStart is the latest
 	 * checkpoint redo location used as a starting point for an online backup.
 	 */
-	bool		exclusiveBackup;
+	ExclusiveBackupState exclusiveBackupState;
 	int			nonExclusiveBackups;
 	XLogRecPtr	lastBackupStart;
 
@@ -842,6 +865,7 @@ static void xlog_outrec(StringInfo buf, XLogReaderState *record);
 #endif
 static void xlog_outdesc(StringInfo buf, XLogReaderState *record);
 static void pg_start_backup_callback(int code, Datum arg);
+static void 

Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-08-29 Thread Fujii Masao
On Mon, Aug 22, 2016 at 10:43 AM, Michael Paquier
 wrote:
> On Sat, Aug 6, 2016 at 6:35 PM, Andreas Seltenreich  
> wrote:
>> Michael Paquier writes:
>>
>>> Andreas, with the patch attached is the assertion still triggered?
>>> [2. text/x-diff; base-backup-crash-v2.patch]
>>
>> I didn't observe the crashes since applying this patch.  There should
>> have been about five by the amount of fuzzing done.
>
> I have reworked the patch, replacing those two booleans with a single
> enum. That's definitely clearer. Also, I have added this patch to the
> CF to not lose track of it:
> https://commitfest.postgresql.org/10/731/
> Horiguchi-san, I have added you as a reviewer as you provided some
> input. I hope you don't mind.

You seem to add another entry for this patch into CommitFest.
Either of them needs to be removed.
https://commitfest.postgresql.org/10/698/

This patch prevents pg_stop_backup from starting while pg_start_backup
is running. OTOH, we also should prevent pg_start_backup from starting
until "previous" pg_stop_backup has completed? What happens if
pg_start_backup starts while pg_stop_backup is running?
As far as I read the current code, ISTM that there is no need to do that,
but it's better to do the double check.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-08-22 Thread Michael Paquier
On Mon, Aug 22, 2016 at 10:09 PM, Amit Kapila  wrote:
> Won't the similar problem exists for nonExclusiveBackups?  Basically
> in similar situation, the count for nonExclusiveBackups will be
> decremented and if it hits pg_start_backup_callback(), the following
> Assertion can fail.
> pg_start_backup_callback()
> {
> ..
> Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
> ..
> }

This cannot be hit for non-exclusive backups as pg_start_backup() and
pg_stop_backup() need to be launched from the same session: their call
will never overlap across session, and this overlap is the reason why
exclusive backups are exposed to the problem.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-08-22 Thread Amit Kapila
On Mon, Aug 22, 2016 at 7:13 AM, Michael Paquier
 wrote:
> On Sat, Aug 6, 2016 at 6:35 PM, Andreas Seltenreich  
> wrote:
>> Michael Paquier writes:
>>
>>> Andreas, with the patch attached is the assertion still triggered?
>>> [2. text/x-diff; base-backup-crash-v2.patch]
>>
>> I didn't observe the crashes since applying this patch.  There should
>> have been about five by the amount of fuzzing done.
>
> I have reworked the patch, replacing those two booleans with a single
> enum. That's definitely clearer. Also, I have added this patch to the
> CF to not lose track of it:
> https://commitfest.postgresql.org/10/731/
> Horiguchi-san, I have added you as a reviewer as you provided some
> input. I hope you don't mind.
>

Won't the similar problem exists for nonExclusiveBackups?  Basically
in similar situation, the count for nonExclusiveBackups will be
decremented and if it hits pg_start_backup_callback(), the following
Assertion can fail.
pg_start_backup_callback()
{
..
Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
..
}


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-08-21 Thread Michael Paquier
On Sat, Aug 6, 2016 at 6:35 PM, Andreas Seltenreich  wrote:
> Michael Paquier writes:
>
>> Andreas, with the patch attached is the assertion still triggered?
>> [2. text/x-diff; base-backup-crash-v2.patch]
>
> I didn't observe the crashes since applying this patch.  There should
> have been about five by the amount of fuzzing done.

I have reworked the patch, replacing those two booleans with a single
enum. That's definitely clearer. Also, I have added this patch to the
CF to not lose track of it:
https://commitfest.postgresql.org/10/731/
Horiguchi-san, I have added you as a reviewer as you provided some
input. I hope you don't mind.
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f13f9c1..2b8c09c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -460,6 +460,25 @@ typedef union WALInsertLockPadded
 } WALInsertLockPadded;
 
 /*
+ * State of an exclusive backup
+ *
+ * EXCLUSIVE_BACKUP_NONE means that there is no exclusive backup actually
+ * running, to be more precise pg_start_backup() is not being executed for
+ * an exclusive backup and there is exclusive backup in progress.
+ * EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is running and
+ * that is is not done yet.
+ * EXCLUSIVE_BACKUP_IN_PROGRESS means that pg_start_backup() has finished
+ * running and an exclusive backup is in progress. pg_stop_backup() is
+ * needed to finish it.
+ */
+typedef enum ExclusiveBackupState
+{
+	EXCLUSIVE_BACKUP_NONE = 0,
+	EXCLUSIVE_BACKUP_STARTING,
+	EXCLUSIVE_BACKUP_IN_PROGRESS
+} ExclusiveBackupState;
+
+/*
  * Shared state data for WAL insertion.
  */
 typedef struct XLogCtlInsert
@@ -500,13 +519,14 @@ typedef struct XLogCtlInsert
 	bool		fullPageWrites;
 
 	/*
-	 * exclusiveBackup is true if a backup started with pg_start_backup() is
-	 * in progress, and nonExclusiveBackups is a counter indicating the number
+	 * exclusiveBackupState is changed to EXCLUSIVE_BACKUP_STARTING for the
+	 * duration of pg_start_backup(), then to EXCLUSIVE_BACKUP_IN_PROGRESS once
+	 * it is done, and nonExclusiveBackups is a counter indicating the number
 	 * of streaming base backups currently in progress. forcePageWrites is set
 	 * to true when either of these is non-zero. lastBackupStart is the latest
 	 * checkpoint redo location used as a starting point for an online backup.
 	 */
-	bool		exclusiveBackup;
+	ExclusiveBackupState exclusiveBackupState;
 	int			nonExclusiveBackups;
 	XLogRecPtr	lastBackupStart;
 
@@ -9833,7 +9853,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
 	WALInsertLockAcquireExclusive();
 	if (exclusive)
 	{
-		if (XLogCtl->Insert.exclusiveBackup)
+		if (XLogCtl->Insert.exclusiveBackupState != EXCLUSIVE_BACKUP_NONE)
 		{
 			WALInsertLockRelease();
 			ereport(ERROR,
@@ -9841,7 +9861,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
 	 errmsg("a backup is already in progress"),
 	 errhint("Run pg_stop_backup() and try again.")));
 		}
-		XLogCtl->Insert.exclusiveBackup = true;
+		XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_STARTING;
 	}
 	else
 		XLogCtl->Insert.nonExclusiveBackups++;
@@ -10096,7 +10116,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
 		{
 			/*
 			 * Check for existing backup label --- implies a backup is already
-			 * running.  (XXX given that we checked exclusiveBackup above,
+			 * running.  (XXX given that we checked exclusiveBackupState above,
 			 * maybe it would be OK to just unlink any such label file?)
 			 */
 			if (stat(BACKUP_LABEL_FILE, _buf) != 0)
@@ -10178,6 +10198,16 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
 	PG_END_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive));
 
 	/*
+	 * Mark that start phase has correctly finished for an exclusive backup.
+	 */
+	if (exclusive)
+	{
+		WALInsertLockAcquireExclusive();
+		XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_IN_PROGRESS;
+		WALInsertLockRelease();
+	}
+
+	/*
 	 * We're done.  As a convenience, return the starting WAL location.
 	 */
 	if (starttli_p)
@@ -10195,8 +10225,8 @@ pg_start_backup_callback(int code, Datum arg)
 	WALInsertLockAcquireExclusive();
 	if (exclusive)
 	{
-		Assert(XLogCtl->Insert.exclusiveBackup);
-		XLogCtl->Insert.exclusiveBackup = false;
+		Assert(XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STARTING);
+		XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_NONE;
 	}
 	else
 	{
@@ -10204,7 +10234,7 @@ pg_start_backup_callback(int code, Datum arg)
 		XLogCtl->Insert.nonExclusiveBackups--;
 	}
 
-	if (!XLogCtl->Insert.exclusiveBackup &&
+	if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&
 		XLogCtl->Insert.nonExclusiveBackups == 0)
 	{
 		XLogCtl->Insert.forcePageWrites = false;
@@ -10280,14 +10310,21 @@ do_pg_stop_backup(char *labelfile, bool