If it is not already set, could you add the
recovery_target_timeline='latest' parameter to the recovery.conf file?
https://www.postgresql.org/docs/devel/static/recovery-target-settings.html
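A minimal sketch of what slave2's recovery.conf could look like on 9.6. NEW_MASTER_HOST and /path/to/archive are placeholders, not values from your setup, so keep your existing primary_conninfo and restore_command and only add the last line. On 9.6 the default recovery_target_timeline is the timeline that was current when the base backup was taken. Without 'latest', the standby stays on the old timeline instead of following the promoted server onto its new one.

```
# recovery.conf on slave2 (PostgreSQL 9.6); host and archive path are placeholders
standby_mode = 'on'
primary_conninfo = 'host=NEW_MASTER_HOST port=5432 user=postgres'
restore_command = 'cp /path/to/archive/%f %p'
# follow the newest timeline found (e.g. after a promotion)
recovery_target_timeline = 'latest'
```

recovery.conf is only read at server start, so slave2 needs a restart after the change.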

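For reference, the file names in your archive listing encode the timeline. A WAL segment name is the timeline ID as 8 hex digits, followed by two 8-hex-digit fields that together give the segment number. The small Python sketch below decodes them. It assumes the default 16 MB segment size, and the helper names are made up for illustration. It shows that 00000005000001BB000000AB.partial and 00000006000001BB000000AB are the same segment on timelines 5 and 6. That segment also contains 1BB/AB000098, the divergence point pg_rewind reported. Your log shows slave2 still streaming timeline 5, which would be consistent with it not being set to follow the switch to timeline 6.

```python
import re

# A WAL segment file name is 24 upper-case hex digits: the timeline ID (8),
# then two 8-digit fields that together form the segment number. From 9.5 on,
# the last, incomplete segment of the old timeline is archived after promotion
# with a ".partial" suffix.
WAL_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})(\.partial)?$")

SEG_SIZE = 16 * 1024 * 1024            # assumption: default 16 MB WAL segments
SEGS_PER_ID = 0x100000000 // SEG_SIZE  # segments per 4 GB "xlogid" unit (256)


def parse_wal_name(name):
    """Return (timeline, xlogid, segment, is_partial) for a WAL file name."""
    m = WAL_NAME.match(name)
    if m is None:
        raise ValueError(f"not a WAL segment file name: {name!r}")
    tli, xlogid, seg, partial = m.groups()
    return int(tli, 16), int(xlogid, 16), int(seg, 16), partial is not None


def segment_for_lsn(lsn, timeline):
    """Return the WAL file name holding an LSN written as 'HI/LO', e.g. '1BB/AB000098'."""
    hi, lo = (int(part, 16) for part in lsn.split("/"))
    segno = ((hi << 32) | lo) // SEG_SIZE
    return "%08X%08X%08X" % (timeline, segno // SEGS_PER_ID, segno % SEGS_PER_ID)


if __name__ == "__main__":
    for name in ("00000005000001BB000000AB.partial", "00000006000001BB000000AB"):
        print(name, parse_wal_name(name))
    # The divergence point reported by pg_rewind, on the old timeline 5:
    print(segment_for_lsn("1BB/AB000098", timeline=5))
```

Both names decode to the same segment fields (0x1BB, 0xAB) and differ only in the timeline, and the divergence LSN maps back to 00000005000001BB000000AB.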

2018-03-08 10:41 GMT+03:00 Dylan Luong <dylan.lu...@unisa.edu.au>:

> Hi Michael,
>
> I tested the failover today, and slave2 failed to resync with the new
> master (old slave1).
>
> After I promoted slave1 to master, I was able to run pg_rewind on the
> old master and bring it back as a new slave.
>
> I then stopped slave2 and ran pg_rewind on it against the new master; it
> reported that no rewind was required:
>
>       $ pg_rewind -D /var/lib/pgsql/9.6/data \
>           --source-server="host=xxxxx.xxx.xxxx port=5432 user=postgres"
>       servers diverged at WAL position 1BB/AB000098 on timeline 5
>       no rewind required
>
> I then updated recovery.conf on slave2 so that primary_conninfo points to
> the new master's IP. When starting up postgres, it failed with the
> following errors in the logs:
>
> database system was shut down in recovery at 2018-03-08 17:52:10 ACDT
> 2018-03-08 17:56:27 ACDT [23026]: [2-1] db=,user= app=,host= LOG:  entering standby mode
> cp: cannot stat '/pg_backup/backup/archive /00000005.history': No such file or directory
> cp: cannot stat '/pg_backup/backup/archive /00000005000001BB000000AB': No such file or directory
> 2018-03-08 17:56:27 ACDT [23026]: [3-1] db=,user= app=,host= LOG:  consistent recovery state reached at 1BB/AB000098
> 2018-03-08 17:56:27 ACDT [23026]: [4-1] db=,user= app=,host= LOG:  record with incorrect prev-link 1B9/73000040 at 1BB/AB000098
> 2018-03-08 17:56:27 ACDT [23024]: [3-1] db=,user= app=,host= LOG:  database system is ready to accept read only connections
> 2018-03-08 17:56:27 ACDT [23032]: [1-1] db=,user= app=,host= LOG:  started streaming WAL from primary at 1BB/AB000000 on timeline 5
> 2018-03-08 17:56:27 ACDT [23032]: [2-1] db=,user= app=,host= LOG:  replication terminated by primary server
> 2018-03-08 17:56:27 ACDT [23032]: [3-1] db=,user= app=,host= DETAIL:  End of WAL reached on timeline 5 at 1BB/AB000098.
> cp: cannot stat '/pg_backup/backup/archive_sync/00000005000001BB000000AB': No such file or directory
> 2018-03-08 17:56:27 ACDT [23032]: [4-1] db=,user= app=,host= LOG:  restarted WAL streaming at 1BB/AB000000 on timeline 5
> 2018-03-08 17:56:27 ACDT [23032]: [5-1] db=,user= app=,host= LOG:  replication terminated by primary server
> 2018-03-08 17:56:27 ACDT [23032]: [6-1] db=,user= app=,host= DETAIL:  End of WAL reached on timeline 5 at 1BB/AB000098.
>
>
> On the new master, in the /pg_backup/backup/archive folder, I can see a
> file 00000005000001BB000000AB.partial:
>       $ ls -l
>       -rw-------. 1 postgres postgres 16777216 Mar  8 16:48 00000005000001BB000000AB.partial
>       -rw-------. 1 postgres postgres 16777216 Mar  8 16:49 00000006000001BB000000AB
>       -rw-------. 1 postgres postgres 16777216 Mar  8 16:49 00000006000001BB000000AC
>       -rw-------. 1 postgres postgres 16777216 Mar  8 16:49 00000006000001BB000000AD
>       -rw-------. 1 postgres postgres 16777216 Mar  8 16:49 00000006000001BB000000AE
>       -rw-------. 1 postgres postgres 16777216 Mar  8 16:49 00000006000001BB000000AF
>       -rw-------. 1 postgres postgres 16777216 Mar  8 16:49 00000006000001BB000000B0
>       -rw-------. 1 postgres postgres 16777216 Mar  8 16:49 00000006000001BB000000B1
>       -rw-------. 1 postgres postgres 16777216 Mar  8 16:49 00000006000001BB000000B2
>       -rw-------. 1 postgres postgres 16777216 Mar  8 16:50 00000006000001BB000000B3
>       -rw-------. 1 postgres postgres 16777216 Mar  8 17:01 00000006000001BB000000B4
>       -rw-------. 1 postgres postgres 16777216 Mar  8 17:14 00000006000001BB000000B5
>       -rw-------. 1 postgres postgres      218 Mar  8 16:48 00000006.history
>
> Any ideas?
>
> Dylan
>
> -----Original Message-----
> From: Michael Paquier [mailto:mich...@paquier.xyz]
> Sent: Tuesday, 6 March 2018 5:55 PM
> To: Dylan Luong <dylan.lu...@unisa.edu.au>
> Cc: pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
> Subject: Re: Resync second slave to new master
>
> On Tue, Mar 06, 2018 at 06:00:40AM +0000, Dylan Luong wrote:
> > So every time after promoting Slave1 to become master (either manually
> > or automatically), just stop Slave2 and run pg_rewind on Slave2 against
> > the new master (old Slave1). And when the old master server is available
> > again, use pg_rewind on that server as well against the new master to
> > return to the original configuration.
>
> Yes.  That's exactly the idea.  Running pg_rewind on the old master will
> be necessary anyway because you need to stop it cleanly once, which will
> cause it to generate WAL records at least for the shutdown checkpoint.
> Doing it on slave 2 may be optional, but it is still safer to do.
> --
> Michael
>
>
