On Wed, Aug 01, 2018 at 09:09:30PM +0000, Richard Schmidt wrote:
> Our procedure that runs on machine A and B is as follows:
> 
>   1.  Build new databases on A and B, and configure A as Primary and B
>   as Standby databases. 
>   2.  Make some changes to the A (the primary) and check that they are
>   replicated to the B (the standby) 
>   3.  Promote B to be the new primary
>   4.  Switch off A (the original primary)
>   5.  Add the replication slot to B (the new primary) for A (soon to
>   be standby)
>   6.  Add a recovery.conf to A (soon to be standby). File contains
>   recovery_target_timeline = 'latest' and restore_command = 'cp
>   /ice-dev/wal_archive/%f "%p"'
>   7.  Run pg_rewind on A - this appears to work as it returns the
>   message 'source and target cluster are on the same timeline no
>   rewind required'; 
>   8.  Start up server A (now a slave)

The outcome of step 7 is not what you want: after promoting B, pg_rewind
should actually do its work.  Your flow is missing a checkpoint on the
promoted standby, which needs to run after step 3 and before step 7.
The checkpoint makes the promoted standby update its timeline number in
the on-disk control file, which pg_rewind reads to decide whether a
rewind is needed.
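
As a rough sketch, the corrected order of the relevant steps could look
like the following.  The data directories and the connection string are
placeholders, not values from your setup, and each command would run on
the machine named in its comment.  With DRY_RUN=1 (the default here) the
commands are only printed, so the ordering can be reviewed first.

```shell
#!/bin/sh
# Sketch only: paths and connection string are hypothetical placeholders.
DRY_RUN="${DRY_RUN:-1}"
A_PGDATA="${A_PGDATA:-/path/to/A/data}"
B_PGDATA="${B_PGDATA:-/path/to/B/data}"
B_CONNINFO="${B_CONNINFO:-host=B port=5432 user=postgres dbname=postgres}"

# Print the command in dry-run mode (display only, quoting is not
# preserved), otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

rewind_flow() {
    # Step 3, on B: promote the standby.
    run pg_ctl promote -D "$B_PGDATA"
    # Missing step, on B: checkpoint so that the control file records
    # the new timeline before pg_rewind looks at it.
    run psql -d "$B_CONNINFO" -c "CHECKPOINT;"
    # Step 4, on A: stop the old primary.
    run pg_ctl stop -D "$A_PGDATA" -m fast
    # Step 7, on A: rewind A using B as the source.
    run pg_rewind --target-pgdata="$A_PGDATA" --source-server="$B_CONNINFO"
}

rewind_flow
```

Steps 5 and 6 (replication slot and recovery.conf) are left out of the
sketch as they stay unchanged.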

We see too many reports of this mistake, so I am going to propose a
patch on the -hackers mailing list to mention it in the documentation.
--
Michael
