Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails

Kyotaro HORIGUCHI Thu, 21 Jul 2016 01:21:07 -0700

Sorry for the absense. I've reached here.

At Thu, 21 Jul 2016 12:20:30 +0900, Michael Paquier <michael.paqu...@gmail.com> 
wrote in <cab7npqr+krkk9fqwqusjna2ob-rcq6mi8ybz3wl0kcob5ss...@mail.gmail.com>
> On Thu, Jul 21, 2016 at 11:56 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> > On Thu, Jul 21, 2016 at 7:28 AM, Michael Paquier
> > <michael.paqu...@gmail.com> wrote:
> >> On Wed, Jul 20, 2016 at 8:56 PM, Michael Paquier
> >> <michael.paqu...@gmail.com> wrote:
> >>>>
> >>>> Yeah, I think that is totally different angle to fix this issue, so
> >>>> don't you think it is better to start a separate thread to discuss
> >>>> about it for 10.0 and mark this patch as ready for committer.
> >>>
> >>> I'd like to tackle this problem in 10.0, but that will strongly depend
> >>> on how my patches move on in CF1 and CF2.
> >>
> >> By the way, thank you for taking the time to provide input. I think
> >> we're in good shape here now.
> >>
> >
> > So, if I understand correctly, then we can mark the version posted by
> > you upthread [1] which includes a test along with Kyotaro's fix can be
> > marked as Ready for committer.  If so, then please change the status
> > of patch accordingly.
> 
> Oops. I thought you did it already. So done.


Thank you very much for the intensive discussion on this.

After some additional consideration, I attached two patches about
comment and documentation. 0001- is a patch related to the
previous patch that adds description on why we use replay end
point instead of minRecoveryPoint. And 0002- is a fix and an
additional mention on what would happen when pg_control is not
copied last during backup.


==== About the first patch.

Related to the previous patch, I found the following comment just
above where it applies.

xlog.c:10412
- * We return the current minimum recovery point as the backup end
- * location. Note that it can be greater than the exact backup end
- * location if the minimum recovery point is updated after the backup of
- * pg_control. This is harmless for current uses.

I haven't gave a notice on this but this seems to be necessary to
edit so as to mention this fix and the gap like the following.

+ * minRecoveryPoint can go behind the last checkpoint's redo location when
+ * the checkpoint writes out no buffer. This does no harm to performing a
+ * recovery but such inversion seems inconsistent from the view of the
+ * callers and prevents them from knowing WAL segments needed by sane
+ * calcuation. For the reason we return the last replayed point as the
+ * backup end location. Note that it can be greater than the exact backup
+ * end location or even the minimum recovery point of pg_control at the
+ * time. This is harmless for current uses.


==== About the second patch.

By the way, related to the following discussion in the upthread,

At Tue, 19 Jul 2016 14:13:36 +0900, Michael Paquier <michael.paqu...@gmail.com> 
wrote in <cab7npqrxmkr2rq7_wa1azzqbfvruwu_5qtbr_keqqwgueja...@mail.gmail.com>
> The thing that is really annoying btw is that there will be always a
> gap between minRecoveryPoint and the actual moment where a backup
> finishes because there is no way to rely on the XLOG_BACKUP_END
> record. On top of that we can not be sure if pg_control has been
> backed up last or not. Which is why it would be cool to document that
> gap.

Even with out my previous patch, the gap between minRecoveryPoint
*when pg_control is backed up* and that (or the replay end) at
the end of backup is crucial and should be back-patched.

Addition to that, while I examined the documentation I found the
following description about pg_stop_backup in
continuous-archiving.html, which contradicts to its definition.

> In the same connection as before, issue the command:
> 
>   SELECT * FROM pg_stop_backup(false);
...
> The pg_stop_backup will return one row with three values.

AFAICS pg_stop_backup returns a single value of LSN. I don't know
where this comes from, but the attached patch fixes this and adds
a mention on the "gap". The original description mentioned
backup_label and tablespace_map but it seems not necessary. The
following is the new content rewritten by the patch.

| The pg_stop_backup will return the LSN when it is called. This
| LSN informs upto where the backup needs WAL segment files to
| complete. For a backup taken from a master, this function leaves
| a backup history file to inform the segment files needed and a
| server starts from the backup performes a recovery up to where a
| backup-end WAL record to reach a consistent state. But things are
| different for a backup taken from a standby. For the case, backup
| history file won't be created and recvoery is perfomed up to the
| Minimum-recovery-ending-location field in the pg_control file
| included in the backup instead. The location is properly stored
| in the backup if you copy the the pg_control file after all the
| other files as pg_basebackup does. Note that a hot standby
| started from a backup not taken in that manner will consider to
| have reached a consistent state mistakenly earlier.
| 
| Once the WAL segment files active during the backup are archived,
| you are done. The file contains the LSN returned by
| pg_stop_backup is the last segment that is required to form a
| complete set of backup files.

The "LSN" in the first line is intentionally ambiguated. It may
seems too much to write here.

> Another crazy idea would be to return pg_control as an extra
> return field of pg_stop_backup() and encourage users to write that
> back in the backup itself. This would allow closing any hole in the
> current logic for backups taken from live standbys: minRecoveryPoint
> would be updated directly to the last replayed LSN/TLI in the control
> file.

It seems to be doable at do_pg_*_backup level. non-exclusive
backup code does the same thing to backup_label but no usage
seen.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

>From 74fc54085d20955cef4d5b7ce4b49d10351b1fbd Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp>
Date: Thu, 21 Jul 2016 15:45:04 +0900
Subject: [PATCH 1/2] Add the reason why we use replayEndRecPtr as recovery-end
 point.

---
 src/backend/access/transam/xlog.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index aecede1..13ef75f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10409,10 +10409,14 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 	 * required files are available, a user should wait for them to be
 	 * archived, or include them into the backup.
 	 *
-	 * We return the current minimum recovery point as the backup end
-	 * location. Note that it can be greater than the exact backup end
-	 * location if the minimum recovery point is updated after the backup of
-	 * pg_control. This is harmless for current uses.
+	 * minRecoveryPoint can go behind the last checkpoint's redo location when
+	 * the checkpoint writes out no buffer. This does no harm to performing a
+	 * recovery but such inversion seems inconsistent from the view of the
+	 * callers and prevents them from knowing WAL segments needed by sane
+	 * calcuation. For the reason we return the last replayed point as the
+	 * backup end location. Note that it can be greater than the exact backup
+	 * end location or even the minimum recovery point of pg_control at the
+	 * time. This is harmless for current uses.
 	 *
 	 * XXX currently a backup history file is for informational and debug
 	 * purposes only. It's not essential for an online backup. Furthermore,
-- 
1.8.3.1

>From 756daf723fb57c565e39c02dda235514e3bb7577 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp>
Date: Thu, 21 Jul 2016 15:43:15 +0900
Subject: [PATCH 2/2] Fix documentation about pg_stop_backup.

In the description about continuous archiving, pg_stop_backup is
wrongly described. This fixes that. Addition to that, adding a mention
on what happens if pg_control is not backup last.
---
 doc/src/sgml/backup.sgml | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 0f09d82..c4c182d 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -894,20 +894,27 @@ SELECT * FROM pg_stop_backup(false);
      ready to archive.
     </para>
     <para>
-     The <function>pg_stop_backup</> will return one row with three
-     values. The second of these fields should be written to a file named
-     <filename>backup_label</> in the root directory of the backup. The
-     third field should be written to a file named
-     <filename>tablespace_map</> unless the field is empty. These files are
-     vital to the backup working, and must be written without modification.
+     The <function>pg_stop_backup</> will return the LSN when it is called.
+     This LSN informs upto where the backup needs WAL segment files to
+     complete. For a backup taken from a master, this function leaves a backup
+     history file to inform the segment files needed and a server starts from
+     the backup performes a recovery up to where a backup-end WAL record to
+     reach a consistent state. But things are different for a backup taken
+     from a standby. For the case, backup history file won't be created and
+     recvoery is perfomed up to the Minimum-recovery-ending-location field in
+     the pg_control file included in the backup instead. The location is
+     properly stored in the backup if you copy the the pg_control file after
+     all the other files as pg_basebackup does. Note that a hot standby
+     started from a backup not taken in that manner will consider to have
+     reached a consistent state mistakenly earlier.
     </para>
    </listitem>
    <listitem>
     <para>
      Once the WAL segment files active during the backup are archived, you are
-     done.  The file identified by <function>pg_stop_backup</>'s first return
-     value is the last segment that is required to form a complete set of
-     backup files.  If <varname>archive_mode</> is enabled,
+     done.  The file contains the LSN returned by <function>pg_stop_backup</>
+     is the last segment that is required to form a complete set of backup
+     files.  If <varname>archive_mode</> is enabled,
      <function>pg_stop_backup</> does not return until the last segment has
      been archived.
      Archiving of these files happens automatically since you have
-- 
1.8.3.1

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails

Reply via email to