On Mon, Oct 24, 2016 at 9:18 AM, Kyotaro HORIGUCHI
> Thank you for looking and retelling this.
> At Fri, 21 Oct 2016 13:02:21 -0400, Robert Haas <robertmh...@gmail.com> wrote
> in <ca+tgmoardyfcqcg9x-e99xcxdc4lw0zygp5qlsphfsb6byd...@mail.gmail.com>
>> On Sun, Oct 2, 2016 at 8:52 AM, Michael Paquier
>> <michael.paqu...@gmail.com> wrote:
>> >> So, if I understand correctly, then we can mark the version posted by
>> >> you upthread  which includes a test along with Kyotaro's fix can be
>> >> marked as Ready for committer. If so, then please change the status
>> >> of patch accordingly.
>> > Patch moved to next CF 2016-11, still with status "ready for committer".
>> > IMO, this thread has reached as conclusion to use this patch to
>> > address the problem reported:
>> > cab7npqtv5gmkqcndofgtgqoqxz2xlz4rrw247oqojzztvy6...@mail.gmail.com
>> I spent some time reading this thread today and trying to understand
>> exactly what's going on here. Here's my attempt to restate the
>> 1. If a restartpoint flushes no dirty buffers, then the redo location
>> advances but the minimum recovery point does not; therefore, the
>> minimum recovery point can fall behind the redo location. That's
>> correct behavior, because if the standby is subsequently restarted, it
>> can correctly begin at the checkpoint record associated with the
>> restartpoint and need not replay any further.
>> 2. However, it causes a problem when a base backup with the "include
>> WAL" option is taken from a standby because -- on standby servers only
>> -- the end location for a backup as returned by do_pg_stop_backup() is
>> the minimum recovery point. If this precedes the start point for the
>> backup -- which is the restart point -- perform_base_backup() will
>> conclude that no WAL files are required for the backup and fail an
>> internal sanity check.
> Yes. The option implies that the copied $PGDATA is a standalone
> backup, that is, usable to start a standby instantly so at least
> one WAL segment is needed, as mentioned in 3 below.
>> 3. Removing the sanity check wouldn't help, because it can never be
>> right to end up with zero WAL files as a result of a base backup. At
>> a minimum, we need the WAL file which contains the checkpoint record
>> from which the control file specifies that redo should start.
>> Actually, I think that checkpoint record could be spread across
>> multiple files: it might be big. We need them all, but the current
>> code won't ensure that we get them.
> Yes. The "checkpoit records" means that records during a checkpoint.
>> The consensus solution on this thread seems to be that we should have
>> pg_do_stop_backup() return the last-replayed XLOG location as the
>> backup end point. If the control file has been updated with a newer
>> redo location, then the associated checkpoint record has surely been
>> replayed, so we'll definitely have enough WAL to replay that
>> checkpoint record (and we don't need to replay anything else, because
>> we're supposing that this is the case where the minimum recovery point
>> precedes the redo location). Although this will work, it might
>> include more WAL in the backup than is strictly necessary. If the
>> additional WAL replayed beyond the minimum recovery point does NOT
>> include a checkpoint, we don't need any of it; if it does, we need
>> only the portion up to and including the last checkpoint record, and
>> not anything beyond that.
> StartupXLOG mandates the existence of the start record of the
> latest checkpoint. pg_start_backup() returns the start point
> (redo location) of the restartpoint that it requested and
> minRecoveryPoint is always after the replpayed checkpoint so it
> works if always was going well.
> But if the checkpoint contains no update, the restart point
> exceeds minRecoveryPoint. Then no WAL copied.
>> I can think of two solutions that would be "tighter":
>> 1. When performing a restartpoint, update the minimum recovery point
>> to just beyond the checkpoint record. I think this can't hurt anyone
>> who is actually restarting recovery on the same machine, because
>> that's exactly the point where they are going to start recovery. A
>> minimum recovery point which precedes the point at which they will
>> start recovery is no better than one which is equal to the point at
>> which they will start recovery.
> This is a candidate that I thought of. But I avoided to change
> the behavior of minRecoveryPoint that (seems to me that) it
> describes only buffer state. So checkpoint with no update doesn't
> advances minRecoveryPoint as my understanding.
I think what you are saying is not completely right, because we do
update minRecoveryPoint when we don't perform a new restart point.
When we perform restart point, then it assumes that flushing the
buffers will take care of updating minRecoveryPoint.
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription: