Re: [EXTERNAL] RE: PostgreSQL-12 replication. Check replication lag
Hi Ahmed, Can you please elaborate? One way I found to check the time lag is this: postgres=# SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()) END AS log_delay; What I want to understand is that, in case of replication link failure, there will still be inserts happening at the master node. In that case, how will the slave know if it is up-to-date? Regards, Mariya From: "Ahmed, Nawaz (Fuji Xerox Australia)" Date: Wednesday, 5 August 2020 at 11:41 PM To: Mariya Rampurawala , "pgsql-gene...@postgresql.org" Subject: [EXTERNAL] RE: PostgreSQL-12 replication. Check replication lag Hi, You could achieve that with a combination of those functions and querying the pg_stat_replication on either side of your setup. Regards, [Fuji Xerox Australia] Nawaz Ahmed Database Administrator Fuji Xerox Australia Pty. Limited 8 Khartoum Rd, Macquarie Park NSW 2113 Australia | P: 02 9856 5337 | E: nawaz.ah...@aus.fujixerox.com<mailto:nawaz.ah...@aus.fujixerox.com> | W: www.fujixerox.com.au<http://www.fujixerox.com.au> | [http://xww.aus.xerox.com/staffdirectory/ebizcard/images/facebook.png]<https://www.facebook.com/FujiXeroxAustralia> [http://xww.aus.xerox.com/staffdirectory/ebizcard/images/twitter.png] <https://twitter.com/fujixeroxaus> [http://xww.aus.xerox.com/staffdirectory/ebizcard/images/linkedin.png] <http://www.linkedin.com/company/fuji-xerox-australia> [http://xww.aus.xerox.com/staffdirectory/ebizcard/images/youtube.png] <https://www.youtube.com/playlist?list=PL1624299D50FAA7C1> [http://xww.aus.xerox.com/staffdirectory/ebizcard/images/blog.png] <http://www.fujixerox.com.au/blog/> [http://xww.aus.xerox.com/staffdirectory/ebizcard/images/feedback.png]<https://fujixerox.au1.qualtrics.com/jfe/form/SV_ehrj1apkwRRNKYZ> Please think before you print and use duplex printing to preserve resources[Please think before you print and use duplex printing to preserve resources]<https://www.fujixerox.com.au/en/Company/Sustainability> From: Mariya Rampurawala Sent: Wednesday, 5 August 2020 1:23 AM To: pgsql-gene...@postgresql.org Subject: PostgreSQL-12 replication. Check replication lag Hi, I am working on providing HA for replication, using automation scripts. My set up consists of two nodes, Master and Slave. When master fails, The slave is promoted to master. In case of sync replication, we do not check if the data is up-to-date on slave. But in case of async replication, how can we check if the slave is up-to-date. One way I found from the blogs is to compare the values of pg_last_wal_receive_lsn() and pg_last_wal_replay_lsn() If these values are same, can we say that the slave is up-to-date? Both these values I will fetch from the slave node. How will the slave ensure it is up-to-date, if the network between master and slave is broken? Regards, Mariya IMPORTANT NOTE: Fuji Xerox email transmission, including any attachments, is private and confidential and may contain legally privileged information. It is for the addressee's attention only. If you are not the intended recipient and have received this transmission, you must not use, edit, print, copy or disclose its contents to any person or disseminate the information contained herein or hereto attached, and you must notify sender immediately by return email and delete this transmission from your system. Any confidentiality, privilege or copyright is not waived or lost because this e-mail has been sent to you in error. We have used reasonable efforts to protect this transmission from computer viruses and other malicious software, but no warranty is made and the sender takes no responsibility for any loss or damage incurred from using this email or the information contained in this email.
PostgreSQL-12 replication. Check replication lag
Hi, I am working on providing HA for replication, using automation scripts. My set up consists of two nodes, Master and Slave. When master fails, The slave is promoted to master. In case of sync replication, we do not check if the data is up-to-date on slave. But in case of async replication, how can we check if the slave is up-to-date. One way I found from the blogs is to compare the values of pg_last_wal_receive_lsn() and pg_last_wal_replay_lsn() If these values are same, can we say that the slave is up-to-date? Both these values I will fetch from the slave node. How will the slave ensure it is up-to-date, if the network between master and slave is broken? Regards, Mariya
Re: [EXTERNAL] Re: PostgreSQL-12 replication failover, pg_rewind fails
Thank you Kyotaro and Laurenz for your quick responses. This helped me get my setup working. Regards, Mariya On 13/05/20, 6:51 AM, "Kyotaro Horiguchi" wrote: At Tue, 12 May 2020 17:29:50 +0200, Laurenz Albe wrote in > On Tue, 2020-05-12 at 09:40 +, Mariya Rampurawala wrote: > > > but if the target cluster ran for a long time after the divergence, > > > the old WAL files might no longer be present. In that case, they can > > > be manually copied from the WAL archive to the pg_wal directory, or > > > fetched on startup by configuring primary_conninfo or restore_command. > > > > I hit this issue every time I follow the aforementioned steps, manually as well as with scripts. > > How long is "long time after divergence"? Is there a way I can make some > > configuration changes so that I donʼt hit this issue? > > Is there anything I must change in my restore command? As mentioned in the documentation, pg_rewind uses the WAL records startng from the last checkpoint just before the divergence point. The divergence point is shown as the follows in your log messages. > pg_rewind: servers diverged at WAL location 6/B9D8 on timeline 53 pg_rewind scans backward starting from that location to find a checkpoint record, which is the oldest WAL record pg_rewind needs. As you see it is not a matter of calculation. There's no other way than actually restoring WAL segment files to read and try finding. > What you can do is to use a higher value for "wal_keep_segments". > Then PostgreSQL will keep around that number of old WAL segments, > which increases the chance for "pg_rewind" to succeed. So this is one effective way to reduce the chance to lose required WAL (segment) files. On PG12, an easy way to automatically restore all required WAL files would be restoring the WAL file every time pg_rewind complains that it is missing. Or, you could use pg_waldump to find a checkpoint record. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [EXTERNAL] Re: PostgreSQL-12 replication failover, pg_rewind fails
Hi, Thank you for the response. > but if the target cluster ran for a long time after the divergence, > the old WAL files might no longer be present. In that case, they can > be manually copied from the WAL archive to the pg_wal directory, or > fetched on startup by configuring primary_conninfo or restore_command. I hit this issue every time I follow the aforementioned steps, manually as well as with scripts. How long is "long time after divergence"? Is there a way I can make some configuration changes so that I don’t hit this issue? Is there anything I must change in my restore command? === primary_conninfo = 'user=replicator host=10.209.57.16 port=5432 sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any' restore_command = 'scp root@10.209.56.88:/pg_backup/%f %p' === Regards, Mariya On 12/05/20, 2:15 PM, "Kyotaro Horiguchi" wrote: Hello. At Tue, 12 May 2020 06:32:30 +, Mariya Rampurawala wrote in > I am working on providing HA for replication, using automation scripts. > My set up consists on two nodes, Master and Slave. When master fails, The slave is promoted to master. But when I try to re-register the old master as slave, the pg_rewind command fails. Details below. ... > 1. Rewind again: > 2. -bash-4.2$ /usr/pgsql-12/bin/pg_rewind -D /pg_mnt/pg-12/data --source-server="host=10.209.57.17 port=5432 user=postgres dbname=postgres" > > pg_rewind: servers diverged at WAL location 6/B9D8 on timeline 53 > > pg_rewind: error: could not open file "/pg_mnt/pg-12/data/pg_wal/0035000600B9": No such file or directory > > pg_rewind: fatal: could not find previous WAL record at 6/B9D8 > > > I have tried this multiple times but always face the same error. Can someone help me resolve this? As the error message is saying, required WAL file has been removed on the old master. It is the normal behavior and described in the documentation. https://www.postgresql.org/docs/12/app-pgrewind.html > but if the target cluster ran for a long time after the divergence, > the old WAL files might no longer be present. In that case, they can > be manually copied from the WAL archive to the pg_wal directory, or > fetched on startup by configuring primary_conninfo or restore_command. So you seem to need to restore the required WAL files from archive or the current master. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: PostgreSQL-12 replication failover, pg_rewind fails
Hello, Can someone please help me with the below query? Regards, Mariya From: Mariya Rampurawala Date: Sunday, 10 May 2020 at 2:55 PM To: "pgsql-general@lists.postgresql.org" , "pgsql-gene...@postgresql.org" Subject: PostgreSQL-12 replication failover, pg_rewind fails Hi, I am working on providing HA for replication, using automation scripts. My set up consists on two nodes, Master and Slave. When master fails, The slave is promoted to master. But when I try to re-register the old master as slave, the pg_rewind command fails. Details below. Non-default entries in Postgresql.conf === archive_mode = on archive_command = 'scp %p root@10.209.56.88:/pg_backup/%f' archive_cleanup_command = '/bin/ssh root@10.209.56.88 "/usr/pgsql-12/bin/pg_archivecleanup /pg_backup %r"' synchronous_standby_names = '*' listen_addresses = '*' wal_level = hot_standby max_wal_senders = 32 max_replication_slots = 8 hot_standby = on wal_log_hints = on synchronous_commit = on hot_standby_feedback = on restart_after_crash = off max_connections = 100 === Postgresql.auto.conf at standby: === primary_conninfo = 'user=replicator host=10.209.57.16 port=5432 sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any' restore_command = 'scp root@10.209.56.88:/pg_backup/%f %p' recovery_target_timeline='latest' === Reproduction steps: 1. Start master: -bash-4.2$ /usr/pgsql-12/bin/pg_ctl -D /pg_mnt/pg-12/data/ start waiting for server to start2020-05-11 05:26:01.665 IST [23999] LOG: starting PostgreSQL 12.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit 2020-05-11 05:26:01.665 IST [23999] LOG: listening on IPv4 address "0.0.0.0", port 5432 2020-05-11 05:26:01.665 IST [23999] LOG: listening on IPv6 address "::", port 5432 2020-05-11 05:26:01.666 IST [23999] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432" 2020-05-11 05:26:01.668 IST [23999] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" 2020-05-11 05:26:01.688 IST [23999] LOG: redirecting log output to logging collector process 2020-05-11 05:26:01.688 IST [23999] HINT: Future log output will appear in directory "log". done server started 1. Basebackup on slave: /usr/pgsql-12/bin/pg_basebackup -D /pg_mnt/pg-12/data -X stream -R -c fast -U replicator --host=10.209.57.16 --port=5432 1. Copy postgresql.auto.conf to slave data directory and touch standby.signal 2. Start slave -bash-4.2$ /usr/pgsql-12/bin/pg_ctl -D /pg_mnt/pg-12/data start waiting for server to start2020-05-11 05:29:09.422 IST [22624] LOG: starting PostgreSQL 12.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit 2020-05-11 05:29:09.422 IST [22624] LOG: listening on IPv4 address "0.0.0.0", port 5432 2020-05-11 05:29:09.422 IST [22624] LOG: listening on IPv6 address "::", port 5432 2020-05-11 05:29:09.423 IST [22624] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432" 2020-05-11 05:29:09.425 IST [22624] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" 2020-05-11 05:29:09.442 IST [22624] LOG: redirecting log output to logging collector process 2020-05-11 05:29:09.442 IST [22624] HINT: Future log output will appear in directory "log". .. done server started 1. Check replication status on master: -bash-4.2$ psql psql (12.2) Type "help" for help. postgres=# \x Expanded display is on. postgres=# select * from pg_stat_replication; -[ RECORD 1 ]+- pid | 24988 usesysid | 16385 usename | replicator application_name | walreceiver client_addr | 10.209.57.14 client_hostname | client_port | 40802 backend_start| 2020-05-11 05:29:14.752341+05:30 backend_xmin | 588 state| streaming sent_lsn | 5/A6D8 write_lsn| 5/A6D8 flush_lsn| 5/A6D8 replay_lsn | 5/A6D8 write_lag| flush_lag| replay_lag | sync_priority| 1 sync_state | sync reply_time | 2020-05-11 05:29:54.895857+05:30 1. From a third node write large data on master #psql -h 10.209.57.16 --user=postgres psql (12.2) Type "help" for help. postgres=# \c vcs You are now connected to database "vcs" as user "postgres". vcs=# create table c1 as select generate_series(1,100); 1. Kill postgres processes on master: -bash-4.2$ ps -ef | grep post root 1860 18990 0 May08 pts/100:00:00 vim postgresql.cluster.conf root 1925 14243 0 Apr07 pts/000:00:00 su - postgres postgres 1926 1925 0 Apr07 pts/000:00:00 -bash
PostgreSQL-12 replication failover, pg_rewind fails
Hi, I am working on providing HA for replication, using automation scripts. My set up consists on two nodes, Master and Slave. When master fails, The slave is promoted to master. But when I try to re-register the old master as slave, the pg_rewind command fails. Details below. Non-default entries in Postgresql.conf === archive_mode = on archive_command = 'scp %p root@10.209.56.88:/pg_backup/%f' archive_cleanup_command = '/bin/ssh root@10.209.56.88 "/usr/pgsql-12/bin/pg_archivecleanup /pg_backup %r"' synchronous_standby_names = '*' listen_addresses = '*' wal_level = hot_standby max_wal_senders = 32 max_replication_slots = 8 hot_standby = on wal_log_hints = on synchronous_commit = on hot_standby_feedback = on restart_after_crash = off max_connections = 100 === Postgresql.auto.conf at standby: === primary_conninfo = 'user=replicator host=10.209.57.16 port=5432 sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any' restore_command = 'scp root@10.209.56.88:/pg_backup/%f %p' recovery_target_timeline='latest' === Reproduction steps: 1. Start master: -bash-4.2$ /usr/pgsql-12/bin/pg_ctl -D /pg_mnt/pg-12/data/ start waiting for server to start2020-05-11 05:26:01.665 IST [23999] LOG: starting PostgreSQL 12.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit 2020-05-11 05:26:01.665 IST [23999] LOG: listening on IPv4 address "0.0.0.0", port 5432 2020-05-11 05:26:01.665 IST [23999] LOG: listening on IPv6 address "::", port 5432 2020-05-11 05:26:01.666 IST [23999] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432" 2020-05-11 05:26:01.668 IST [23999] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" 2020-05-11 05:26:01.688 IST [23999] LOG: redirecting log output to logging collector process 2020-05-11 05:26:01.688 IST [23999] HINT: Future log output will appear in directory "log". done server started 1. Basebackup on slave: /usr/pgsql-12/bin/pg_basebackup -D /pg_mnt/pg-12/data -X stream -R -c fast -U replicator --host=10.209.57.16 --port=5432 1. Copy postgresql.auto.conf to slave data directory and touch standby.signal 2. Start slave -bash-4.2$ /usr/pgsql-12/bin/pg_ctl -D /pg_mnt/pg-12/data start waiting for server to start2020-05-11 05:29:09.422 IST [22624] LOG: starting PostgreSQL 12.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit 2020-05-11 05:29:09.422 IST [22624] LOG: listening on IPv4 address "0.0.0.0", port 5432 2020-05-11 05:29:09.422 IST [22624] LOG: listening on IPv6 address "::", port 5432 2020-05-11 05:29:09.423 IST [22624] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432" 2020-05-11 05:29:09.425 IST [22624] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" 2020-05-11 05:29:09.442 IST [22624] LOG: redirecting log output to logging collector process 2020-05-11 05:29:09.442 IST [22624] HINT: Future log output will appear in directory "log". .. done server started 1. Check replication status on master: -bash-4.2$ psql psql (12.2) Type "help" for help. postgres=# \x Expanded display is on. postgres=# select * from pg_stat_replication; -[ RECORD 1 ]+- pid | 24988 usesysid | 16385 usename | replicator application_name | walreceiver client_addr | 10.209.57.14 client_hostname | client_port | 40802 backend_start| 2020-05-11 05:29:14.752341+05:30 backend_xmin | 588 state| streaming sent_lsn | 5/A6D8 write_lsn| 5/A6D8 flush_lsn| 5/A6D8 replay_lsn | 5/A6D8 write_lag| flush_lag| replay_lag | sync_priority| 1 sync_state | sync reply_time | 2020-05-11 05:29:54.895857+05:30 1. From a third node write large data on master #psql -h 10.209.57.16 --user=postgres psql (12.2) Type "help" for help. postgres=# \c vcs You are now connected to database "vcs" as user "postgres". vcs=# create table c1 as select generate_series(1,100); 1. Kill postgres processes on master: -bash-4.2$ ps -ef | grep post root 1860 18990 0 May08 pts/100:00:00 vim postgresql.cluster.conf root 1925 14243 0 Apr07 pts/000:00:00 su - postgres postgres 1926 1925 0 Apr07 pts/000:00:00 -bash root 7321 1 0 Mar14 ?00:00:15 /usr/libexec/postfix/master -w postfix 7350 7321 0 Mar14 ?00:00:03 qmgr -l -t unix -u postfix 10410 7321 0 04:47 ?00:00:00 pickup -l -t unix -u postgres 23999 1 0 05:26 ?00:00:00 /usr/pgsql-12/bin/postgres -D /pg_mnt/pg-12/data postgres 24000 23999 0 05:26 ?00:00:00 postgres: logger postgres 24002 23999 0 05:26 ?00:00:00 postgres:
Wal receiver process listens to physical IP
Hi, I have been working on a project where I need to virtualize the postgresql standby server. The wal receiver process listens to the Physical IP and a random port. How can I configure it to listen to a VIP that the wal sender process can connect to? Also, Is there a way I can configure the port number for this? $ /usr/pgsql-12/bin/postgres -V postgres (PostgreSQL) 12.2 postgresql.conf: listen_addresses = '10.209.57.17' • This is the VIP $ ps -ef | grep post postgres 2945 1 0 05:37 ?00:00:00 /usr/pgsql-12/bin/postgres -D /pg_mnt/pg-12/data -h 10.209.57.17 postgres 2946 2945 0 05:37 ?00:00:00 postgres: logger postgres 2947 2945 0 05:37 ?00:00:00 postgres: startup recovering 00010007 postgres 2952 2945 0 05:37 ?00:00:00 postgres: checkpointer postgres 2953 2945 0 05:37 ?00:00:00 postgres: background writer postgres 2954 2945 0 05:37 ?00:00:00 postgres: stats collector postgres 2955 2945 0 05:37 ?00:00:04 postgres: walreceiver $ netstat -anp | grep 55656 tcp0 0 10.209.57.15:55656 10.209.57.16:5432 ESTABLISHED 2955/postgres: walr 10.209.57.15 is the physical IP $ ps -ef | grep 2955 postgres 2955 2945 0 05:37 ?00:00:04 postgres: walreceiver Regards, Mariya