On Wed, Jan 14, 2015 at 2:11 AM, Heikki Linnakangas <hlinnakan...@vmware.com > wrote:
> On 01/13/2015 12:11 PM, Vladimir Borodin wrote: > >> >> 05 янв. 2015 г., в 18:15, Vladimir Borodin <r...@simply.name> написал(а): >> >> Hi all. >>> >>> I have a simple script for planned switchover of PostgreSQL (9.3 and >>> 9.4) master to one of its replicas. This script checks a lot of things >>> before doing it and one of them is that all data from master has been >>> received by replica that is going to be promoted. Right now the check is >>> done like below: >>> >>> On the master: >>> >>> postgres@pgtest03d ~ $ psql -t -A -c 'select >>> pg_current_xlog_location();' >>> 0/33000090 >>> postgres@pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_ctl stop -m fast >>> waiting for server to shut down.... done >>> server stopped >>> postgres@pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_controldata | head >>> pg_control version number: 937 >>> Catalog version number: 201306121 >>> Database system identifier: 6061800518091528182 >>> Database cluster state: shut down >>> pg_control last modified: Mon 05 Jan 2015 06:47:57 PM MSK >>> Latest checkpoint location: 0/34000028 >>> Prior checkpoint location: 0/33000028 >>> Latest checkpoint's REDO location: 0/34000028 >>> Latest checkpoint's REDO WAL file: 0000001B0000000000000034 >>> Latest checkpoint's TimeLineID: 27 >>> postgres@pgtest03d ~ $ >>> >>> On the replica (after shutdown of master): >>> >>> postgres@pgtest03g ~ $ psql -t -A -c "select >>> pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/34000028');" >>> 104 >>> postgres@pgtest03g ~ $ >>> >>> These 104 bytes seems to be the size of shutdown checkpoint record (as I >>> can understand from pg_xlogdump output). >>> >>> postgres@pgtest03g ~/9.3/data/pg_xlog $ /usr/pgsql-9.3/bin/pg_xlogdump >>> -s 0/33000090 -t 27 >>> rmgr: XLOG len (rec/tot): 0/ 32, tx: 0, lsn: >>> 0/33000090, prev 0/33000028, bkp: 0000, desc: xlog switch >>> rmgr: XLOG len (rec/tot): 72/ 104, tx: 0, lsn: >>> 0/34000028, prev 0/33000090, bkp: 0000, desc: checkpoint: redo 0/34000028; >>> tli 27; prev tli 27; fpw true; xid 0/6010; oid 54128; multi 1; offset 0; >>> oldest xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; >>> shutdown >>> pg_xlogdump: FATAL: error in WAL record at 0/34000028: record with zero >>> length at 0/34000090 >>> >>> postgres@pgtest03g ~/9.3/data/pg_xlog $ >>> >>> I’m not sure that these 104 bytes will always be 104 bytes to have a >>> strict equality while checking. Could it change in the future? Or is there >>> a better way to understand that streaming replica received all data after >>> master shutdown? The check that pg_xlog_location_diff returns 104 bytes >>> seems a bit strange. >>> >> > Don't rely on it being 104 bytes. It can vary across versions, and across > different architectures. > > You could simply check that the standby's pg_last_xlog_replay_location() > > master's "Latest checkpoint location", and not care about the exact > difference. > > > I believe there were some changes made in v9.3 which will wait for pending WALs to be replicated before a fast and smart shutdown (of master) can close the replication connection. http://git.postgresql.org/pg/commitdiff/985bd7d49726c9f178558491d31a570d47340459