Sorry, I'm confused about the minRecoveryPoint.

Reconsidered a bit.

At Tue, 14 Jun 2016 20:31:11 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
<horiguchi.kyot...@lab.ntt.co.jp> wrote in 
<20160614.203111.229211034.horiguchi.kyot...@lab.ntt.co.jp>
> > > After looking more closely, I found that the minRecoveryPoint
> > > tends to be too small as the backup end point, and up to the
> > > record at the lastReplayedRecPtr can affect the pages on disk and
> > > they can go into the backup just taken.
> > >
> > > My conclusion here is that do_pg_stop_backup should return
> > > lastReplayedRecPtr, not minRecoveryPoint.
> > 
> > I have been thinking quite a bit about this patch, and this logic
> > sounds quite right to me. When stopping the backup we need to let the
> > user know up to which point it needs to replay WAL, and relation pages
> > are touched up to lastReplayedEndRecPtr.
> 
> Yes, but by design, the changes in buffers don't go into disk
> until buffer replacing occurs, which updates minRecoveryPoint. My
> understanding is that the problem is that a restart point that is
> not accompanied with buffer updates advances only the redo point
> of the last checkpoint and doesn't update minRecoveryPoint, which
> may be behind the redo point at the time.
> 
> It seems to me that we could more agressively advance the
> minRecoveryPoint (but must not let it go too far..), but it is
> right for it to aim a bit smaller than the ideal location.

It's wrong. minRecoveryPoint should be greater than or equal to
the maximum buffer-touching LSN reached in previous recoveries,
and it can be the same to replayEndRecPtr but may be behind it if
no acutual modification on page files is done hereafter. xlog.c
works that way. The value of the minRecoveryPoint smaller than
the redo point of the last checkpoint with no buffer flush is
allowable from this point of view but it is not proper for the
end point of a backup.

If we skip recording the last checkpoint position when it
eventually causes no buffer flush, minRecoveryPoint is again
usable for the purpose. However, it causes repeated restartpoint
trial on the same (skipped) checkpoint record.

As the consequence, we can solve this problemn also by explicitly
updating the minRecoveryPoint for an executed restartpoint
without no buffer flush.

The attached patch performs this way and also solves the problem.

Which one do you think is more preferable? Or any other solution?

This patch updates minRecoeryPoint only for restartpoints that
caused no buffer flush but such restriction might not be
necessary.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 5927a196295eea63424011c15d7359ed8141812c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp>
Date: Wed, 15 Jun 2016 12:00:33 +0900
Subject: [PATCH] Advancing minRecoveryPoint for executed empty restart point.

Currently, restart point with no buffer flush doesn't update the
minRecoveryPoint but updates lastCheckPoint. This can cause
do_pg_stop_backup() to return the stop LSN smaller than the start LSN
given by do_pg_start_backup() and leads to falure in taking base
backup.

This patch let CreateRestartPoint update the minRecoveryPoint if an
executed restartpoint is accompanied with no buffer flush.
---
 src/backend/access/transam/xlog.c     | 9 +++++++++
 src/test/recovery/t/001_stream_rep.pl | 5 +++++
 2 files changed, 14 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e4645a3..7697223 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8797,6 +8797,15 @@ CreateRestartPoint(int flags)
 	CheckPointGuts(lastCheckPoint.redo, flags);
 
 	/*
+	 * basebackup needs minRecoveryPoint to be grater than or equal to the
+	 * redo point of this checkpoint. If no buffer is flushed so far,
+	 * minRecoveryPoint has not advanced during this checkpoint. So explicitly
+	 * advance it to there for the case.
+	 */
+	if (CheckpointStats.ckpt_bufs_written == 0)
+		UpdateMinRecoveryPoint(lastCheckPointRecPtr, false);
+
+	/*
 	 * Remember the prior checkpoint's redo pointer, used later to determine
 	 * the point at which we can truncate the log.
 	 */
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 7b42f21..cee4768 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -24,6 +24,11 @@ $node_standby_1->start;
 # pg_basebackup works on a standby).
 $node_standby_1->backup($backup_name);
 
+# Take a second backup of the standby while the master is offline.
+$node_master->stop;
+$node_standby_1->backup('my_backup_2');
+$node_master->start;
+
 # Create second standby node linking to standby 1
 my $node_standby_2 = get_new_node('standby_2');
 $node_standby_2->init_from_backup($node_standby_1, $backup_name,
-- 
1.8.3.1

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to