On Mon, Oct 23, 2017 at 11:20 PM, Simon Riggs <si...@2ndquadrant.com> wrote: > Remove the code that maintained two checkpoint's WAL files and all > associated stuff. > > Try to avoid breaking anything else > > This > * reduces disk space requirements on master > * removes a minor bug in fast failover > * simplifies code
In short, yeah! I had a close look at the patch and noticed a couple of issues. + else /* - * The last valid checkpoint record required for a streaming - * recovery exists in neither standby nor the primary. + * We used to attempt to go back to a secondary checkpoint + * record here, but only when not in standby_mode. We now + * just fail if we can't read the last checkpoint because + * this allows us to simplify processing around checkpoints. */ ereport(PANIC, (errmsg("could not locate a valid checkpoint record"))); - } Using brackets in this case is more elegant style IMO. OK, there is one line, but the comment is long so the block becomes confusingly-shaped. /* Version identifier for this pg_control format */ -#define PG_CONTROL_VERSION 1003 +#define PG_CONTROL_VERSION 1100 Yes, this had to happen anyway in this release cycle. backup.sgml describes the following: to higher segment numbers. It's assumed that segment files whose contents precede the checkpoint-before-last are no longer of interest and can be recycled. However this is not true anymore with this patch. The documentation of pg_control_checkpoint() is not updated. func.sgml lists the tuple fields returned for this function. You forgot to update pg_control_checkpoint in pg_proc.h. prior_lsn is still listed in the list of arguments returned. And you need to update as well the output list of types. * whichChkpt identifies the checkpoint (merely for reporting purposes). - * 1 for "primary", 2 for "secondary", 0 for "other" (backup_label) + * 1 for "primary", 0 for "other" (backup_label) + * 2 for "secondary" is no longer supported */ I would suggest to just remove the reference to "secondary". - * Delete old log files (those no longer needed even for previous - * checkpoint or the standbys in XLOG streaming). + * Delete old log files and recycle them */ Here that's more "Delete or recycle old log files". Recycling of a WAL segment is its renaming into a newer segment. You forgot to update this formula in xlog.c: distance = (2.0 + CheckPointCompletionTarget) * CheckPointDistanceEstimate; /* add 10% for good measure. */ distance *= 1.10; You can guess easily where the mistake is. - checkPointLoc = ControlFile->prevCheckPoint; + /* + * It appears to be a bug that we used to use prevCheckpoint here + */ + checkPointLoc = ControlFile->checkPoint; Er, no. This is correct because we expect the prior checkpoint to still be present in the event of a failure detecting the latest checkpoint. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers