Hi Michael,
I tested the failover today and slave2 failed to resync with the new
master (the old slave1).
After I promoted slave1 to become the new master, I was able to use pg_rewind on
the old master and bring it back as a new slave.
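For reference, the rewind on the old master was along these lines (the promote
command and the source-server host are illustrative placeholders, not the exact
values I used):

$ pg_ctl promote -D /var/lib/pgsql/9.6/data      # on slave1, making it the new master
$ pg_rewind -D /var/lib/pgsql/9.6/data --source-server="host=<new-master> port=5432 user=postgres"   # on the old master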
I then stopped slave2 and ran pg_rewind on slave2 against the new master, but it
reported that no rewind was required:
$ pg_rewind -D /var/lib/pgsql/9.6/data --source-server="host=xxxxx.xxx.xxxx port=5432 user=postgres"
servers diverged at WAL position 1BB/AB000098 on timeline 5
no rewind required
So I then updated recovery.conf on slave2, setting primary_conninfo to the new
master's IP.
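For reference, slave2's recovery.conf is roughly along these lines (the host
value is a placeholder and the restore_command line is illustrative, based on
the archive path that shows up in the log below):

standby_mode = 'on'
primary_conninfo = 'host=<new-master-ip> port=5432 user=postgres'
restore_command = 'cp /pg_backup/backup/archive/%f %p'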
When starting up Postgres, it failed with the following errors in the logs:
database system was shut down in recovery at 2018-03-08 17:52:10 ACDT
2018-03-08 17:56:27 ACDT [23026]: [2-1] db=,user= app=,host= LOG: entering standby mode
cp: cannot stat '/pg_backup/backup/archive/00000005.history': No such file or directory
cp: cannot stat '/pg_backup/backup/archive/00000005000001BB000000AB': No such file or directory
2018-03-08 17:56:27 ACDT [23026]: [3-1] db=,user= app=,host= LOG: consistent recovery state reached at 1BB/AB000098
2018-03-08 17:56:27 ACDT [23026]: [4-1] db=,user= app=,host= LOG: record with incorrect prev-link 1B9/73000040 at 1BB/AB000098
2018-03-08 17:56:27 ACDT [23024]: [3-1] db=,user= app=,host= LOG: database system is ready to accept read only connections
2018-03-08 17:56:27 ACDT [23032]: [1-1] db=,user= app=,host= LOG: started streaming WAL from primary at 1BB/AB000000 on timeline 5
2018-03-08 17:56:27 ACDT [23032]: [2-1] db=,user= app=,host= LOG: replication terminated by primary server
2018-03-08 17:56:27 ACDT [23032]: [3-1] db=,user= app=,host= DETAIL: End of WAL reached on timeline 5 at 1BB/AB000098.
cp: cannot stat '/pg_backup/backup/archive_sync/00000005000001BB000000AB': No such file or directory
2018-03-08 17:56:27 ACDT [23032]: [4-1] db=,user= app=,host= LOG: restarted WAL streaming at 1BB/AB000000 on timeline 5
2018-03-08 17:56:27 ACDT [23032]: [5-1] db=,user= app=,host= LOG: replication terminated by primary server
2018-03-08 17:56:27 ACDT [23032]: [6-1] db=,user= app=,host= DETAIL: End of WAL reached on timeline 5 at 1BB/AB000098.
On the new master, in the /pg_backup/backup/archive folder, I can see the file
00000005000001BB000000AB.partial.
E.g.
ls -l
-rw-------. 1 postgres postgres 16777216 Mar 8 16:48 00000005000001BB000000AB.partial
-rw-------. 1 postgres postgres 16777216 Mar 8 16:49 00000006000001BB000000AB
-rw-------. 1 postgres postgres 16777216 Mar 8 16:49 00000006000001BB000000AC
-rw-------. 1 postgres postgres 16777216 Mar 8 16:49 00000006000001BB000000AD
-rw-------. 1 postgres postgres 16777216 Mar 8 16:49 00000006000001BB000000AE
-rw-------. 1 postgres postgres 16777216 Mar 8 16:49 00000006000001BB000000AF
-rw-------. 1 postgres postgres 16777216 Mar 8 16:49 00000006000001BB000000B0
-rw-------. 1 postgres postgres 16777216 Mar 8 16:49 00000006000001BB000000B1
-rw-------. 1 postgres postgres 16777216 Mar 8 16:49 00000006000001BB000000B2
-rw-------. 1 postgres postgres 16777216 Mar 8 16:50 00000006000001BB000000B3
-rw-------. 1 postgres postgres 16777216 Mar 8 17:01 00000006000001BB000000B4
-rw-------. 1 postgres postgres 16777216 Mar 8 17:14 00000006000001BB000000B5
-rw-------. 1 postgres postgres 218 Mar 8 16:48 00000006.history
Any ideas?
Dylan
-----Original Message-----
From: Michael Paquier [mailto:[email protected]]
Sent: Tuesday, 6 March 2018 5:55 PM
To: Dylan Luong <[email protected]>
Cc: pgsql-generallists.postgresql.org <[email protected]>
Subject: Re: Resync second slave to new master
On Tue, Mar 06, 2018 at 06:00:40AM +0000, Dylan Luong wrote:
> So every time after promoting the slave to become master (either manually
> or automatically), just stop slave2 and run pg_rewind on slave2 against
> the new master (old slave1). And when the old master server is available
> again, use pg_rewind on that server as well against the new master to
> return to the original configuration.
Yes, that's exactly the idea. Running pg_rewind on the old master will
be necessary anyway, because you need to stop it cleanly once, which will
cause it to generate WAL records at least for the shutdown checkpoint.
Running it on slave2 may be optional, but it is still safer to do.
--
Michael