Re: [EXTERNAL] RE: PostgreSQL-12 replication. Check replication lag

2020-08-05 Thread Mariya Rampurawala
Hi Ahmed,

Can you please elaborate?

One way I found to check the time lag is this:


postgres=# SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()

THEN 0

ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())

END AS log_delay;

What I want to understand is that, in case of replication link failure, there 
will still be inserts happening at the master node. In that case, how will the 
slave know if it is up-to-date?

Regards,
Mariya

From: "Ahmed, Nawaz (Fuji Xerox Australia)" 
Date: Wednesday, 5 August 2020 at 11:41 PM
To: Mariya Rampurawala , 
"pgsql-gene...@postgresql.org" 
Subject: [EXTERNAL] RE: PostgreSQL-12 replication. Check replication lag


Hi,

You could achieve that with a combination of those functions and querying the 
pg_stat_replication on either side of your setup.

Regards,
[Fuji Xerox Australia]
Nawaz Ahmed
Database Administrator

Fuji Xerox Australia Pty. Limited
8 Khartoum Rd, Macquarie Park NSW 2113 Australia
| P: 02 9856 5337 | E: 
nawaz.ah...@aus.fujixerox.com<mailto:nawaz.ah...@aus.fujixerox.com> | W: 
www.fujixerox.com.au<http://www.fujixerox.com.au> |
[http://xww.aus.xerox.com/staffdirectory/ebizcard/images/facebook.png]<https://www.facebook.com/FujiXeroxAustralia>
  [http://xww.aus.xerox.com/staffdirectory/ebizcard/images/twitter.png] 
<https://twitter.com/fujixeroxaus>   
[http://xww.aus.xerox.com/staffdirectory/ebizcard/images/linkedin.png] 
<http://www.linkedin.com/company/fuji-xerox-australia>   
[http://xww.aus.xerox.com/staffdirectory/ebizcard/images/youtube.png] 
<https://www.youtube.com/playlist?list=PL1624299D50FAA7C1>   
[http://xww.aus.xerox.com/staffdirectory/ebizcard/images/blog.png] 
<http://www.fujixerox.com.au/blog/>

[http://xww.aus.xerox.com/staffdirectory/ebizcard/images/feedback.png]<https://fujixerox.au1.qualtrics.com/jfe/form/SV_ehrj1apkwRRNKYZ>



Please think before you print and use duplex printing to preserve 
resources[Please think before you print and use duplex printing to preserve 
resources]<https://www.fujixerox.com.au/en/Company/Sustainability>
From: Mariya Rampurawala 
Sent: Wednesday, 5 August 2020 1:23 AM
To: pgsql-gene...@postgresql.org
Subject: PostgreSQL-12 replication. Check replication lag

Hi,

I am working on providing HA for replication, using automation scripts.
My set up consists of two nodes, Master and Slave. When master fails, The slave 
is promoted to master.

In case of sync replication, we do not check if the data is up-to-date on slave.
But in case of async replication, how can we check if the slave is up-to-date.

One way I found from the blogs is to compare the values of 
pg_last_wal_receive_lsn() and pg_last_wal_replay_lsn()

If these values are same, can we say that the slave is up-to-date?

Both these values I will fetch from the slave node. How will the slave ensure 
it is up-to-date, if the network between master and slave is broken?

Regards,
Mariya


IMPORTANT NOTE: Fuji Xerox email transmission, including any attachments, is 
private and confidential and may contain legally privileged information. It is 
for the addressee's attention only. If you are not the intended recipient and 
have received this transmission, you must not use, edit, print, copy or 
disclose its contents to any person or disseminate the information contained 
herein or hereto attached, and you must notify sender immediately by return 
email and delete this transmission from your system. Any confidentiality, 
privilege or copyright is not waived or lost because this e-mail has been sent 
to you in error. We have used reasonable efforts to protect this transmission 
from computer viruses and other malicious software, but no warranty is made and 
the sender takes no responsibility for any loss or damage incurred from using 
this email or the information contained in this email.


PostgreSQL-12 replication. Check replication lag

2020-08-04 Thread Mariya Rampurawala
Hi,

I am working on providing HA for replication, using automation scripts.
My set up consists of two nodes, Master and Slave. When master fails, The slave 
is promoted to master.

In case of sync replication, we do not check if the data is up-to-date on slave.
But in case of async replication, how can we check if the slave is up-to-date.

One way I found from the blogs is to compare the values of 
pg_last_wal_receive_lsn() and pg_last_wal_replay_lsn()

If these values are same, can we say that the slave is up-to-date?

Both these values I will fetch from the slave node. How will the slave ensure 
it is up-to-date, if the network between master and slave is broken?

Regards,
Mariya



Re: [EXTERNAL] Re: PostgreSQL-12 replication failover, pg_rewind fails

2020-05-13 Thread Mariya Rampurawala
Thank you Kyotaro and Laurenz for your quick responses.
This helped me get my setup working.

Regards,
Mariya

On 13/05/20, 6:51 AM, "Kyotaro Horiguchi"  wrote:

At Tue, 12 May 2020 17:29:50 +0200, Laurenz Albe  
wrote in 
> On Tue, 2020-05-12 at 09:40 +, Mariya Rampurawala wrote:
> > > but if the target cluster ran for a long time after the divergence,
> > > the old WAL files might no longer be present. In that case, they can
> > > be manually copied from the WAL archive to the pg_wal directory, or
> > > fetched on startup by configuring primary_conninfo or restore_command.
> > 
> > I hit this issue every time I follow the aforementioned steps, manually 
as well as with scripts.
> > How long is "long time after divergence"? Is there a way I can make some
> > configuration changes so that I donʼt hit this issue?
> > Is there anything I must change in my restore command?

As mentioned in the documentation, pg_rewind uses the WAL records
startng from the last checkpoint just before the divergence point. The
divergence point is shown as the follows in your log messages.

> pg_rewind: servers diverged at WAL location 6/B9D8 on timeline 53

pg_rewind scans backward starting from that location to find a
checkpoint record, which is the oldest WAL record pg_rewind needs.  As
you see it is not a matter of calculation.  There's no other way than
actually restoring WAL segment files to read and try finding.

> What you can do is to use a higher value for "wal_keep_segments".
> Then PostgreSQL will keep around that number of old WAL segments,
> which increases the chance for "pg_rewind" to succeed.

So this is one effective way to reduce the chance to lose required WAL
(segment) files.

On PG12, an easy way to automatically restore all required WAL files
would be restoring the WAL file every time pg_rewind complains that it
is missing.

Or, you could use pg_waldump to find a checkpoint record.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [EXTERNAL] Re: PostgreSQL-12 replication failover, pg_rewind fails

2020-05-12 Thread Mariya Rampurawala
Hi,

Thank you for the response.

> but if the target cluster ran for a long time after the divergence,
> the old WAL files might no longer be present. In that case, they can
> be manually copied from the WAL archive to the pg_wal directory, or
> fetched on startup by configuring primary_conninfo or restore_command.

I hit this issue every time I follow the aforementioned steps, manually as well 
as with scripts.
How long is "long time after divergence"? Is there a way I can make some 
configuration changes so that I don’t hit this issue?
Is there anything I must change in my restore command?

===
primary_conninfo = 'user=replicator host=10.209.57.16 port=5432 sslmode=prefer 
sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any'
restore_command = 'scp  root@10.209.56.88:/pg_backup/%f %p'
===

Regards,
Mariya

On 12/05/20, 2:15 PM, "Kyotaro Horiguchi"  wrote:

Hello.

At Tue, 12 May 2020 06:32:30 +, Mariya Rampurawala 
 wrote in 
> I am working on providing HA for replication, using automation scripts.
> My set up consists on two nodes, Master and Slave. When master fails, The 
slave is promoted to master. But when I try to re-register the old master as 
slave, the pg_rewind command fails. Details below.
...
>   1.  Rewind again:
>   2.  -bash-4.2$ /usr/pgsql-12/bin/pg_rewind -D /pg_mnt/pg-12/data 
--source-server="host=10.209.57.17  port=5432 user=postgres dbname=postgres"
> 
> pg_rewind: servers diverged at WAL location 6/B9D8 on timeline 53
> 
> pg_rewind: error: could not open file 
"/pg_mnt/pg-12/data/pg_wal/0035000600B9": No such file or directory
> 
> pg_rewind: fatal: could not find previous WAL record at 6/B9D8
> 
> 
> I have tried this multiple times but always face the same error. Can 
someone help me resolve this?

As the error message is saying, required WAL file has been removed on
the old master.  It is the normal behavior and described in the
documentation.

https://www.postgresql.org/docs/12/app-pgrewind.html

> but if the target cluster ran for a long time after the divergence,
> the old WAL files might no longer be present. In that case, they can
> be manually copied from the WAL archive to the pg_wal directory, or
> fetched on startup by configuring primary_conninfo or restore_command.

So you seem to need to restore the required WAL files from archive or
the current master.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: PostgreSQL-12 replication failover, pg_rewind fails

2020-05-12 Thread Mariya Rampurawala
Hello,
Can someone please help me with the below query?

Regards,
Mariya

From: Mariya Rampurawala 
Date: Sunday, 10 May 2020 at 2:55 PM
To: "pgsql-general@lists.postgresql.org" , 
"pgsql-gene...@postgresql.org" 
Subject: PostgreSQL-12 replication failover, pg_rewind fails

Hi,

I am working on providing HA for replication, using automation scripts.
My set up consists on two nodes, Master and Slave. When master fails, The slave 
is promoted to master. But when I try to re-register the old master as slave, 
the pg_rewind command fails. Details below.

Non-default entries in Postgresql.conf

===
archive_mode = on
archive_command = 'scp %p root@10.209.56.88:/pg_backup/%f'
archive_cleanup_command = '/bin/ssh root@10.209.56.88 
"/usr/pgsql-12/bin/pg_archivecleanup /pg_backup %r"'

synchronous_standby_names = '*'

listen_addresses = '*'

wal_level = hot_standby

max_wal_senders = 32

max_replication_slots = 8

hot_standby = on

wal_log_hints = on

synchronous_commit = on

hot_standby_feedback = on

restart_after_crash = off

max_connections = 100

===



Postgresql.auto.conf at standby:

===
primary_conninfo = 'user=replicator host=10.209.57.16 port=5432 sslmode=prefer 
sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any'
restore_command = 'scp  root@10.209.56.88:/pg_backup/%f %p'
recovery_target_timeline='latest'

===





Reproduction steps:

  1.  Start master:

-bash-4.2$ /usr/pgsql-12/bin/pg_ctl -D /pg_mnt/pg-12/data/  start

waiting for server to start2020-05-11 05:26:01.665 IST [23999] LOG:  
starting PostgreSQL 12.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 
20150623 (Red Hat 4.8.5-39), 64-bit

2020-05-11 05:26:01.665 IST [23999] LOG:  listening on IPv4 address "0.0.0.0", 
port 5432

2020-05-11 05:26:01.665 IST [23999] LOG:  listening on IPv6 address "::", port 
5432

2020-05-11 05:26:01.666 IST [23999] LOG:  listening on Unix socket 
"/var/run/postgresql/.s.PGSQL.5432"

2020-05-11 05:26:01.668 IST [23999] LOG:  listening on Unix socket 
"/tmp/.s.PGSQL.5432"

2020-05-11 05:26:01.688 IST [23999] LOG:  redirecting log output to logging 
collector process

2020-05-11 05:26:01.688 IST [23999] HINT:  Future log output will appear in 
directory "log".

done

server started

  1.  Basebackup on slave:

/usr/pgsql-12/bin/pg_basebackup -D /pg_mnt/pg-12/data -X stream -R -c fast -U 
replicator --host=10.209.57.16 --port=5432

  1.  Copy postgresql.auto.conf to slave data directory and touch standby.signal
  2.  Start slave

-bash-4.2$ /usr/pgsql-12/bin/pg_ctl -D /pg_mnt/pg-12/data start

waiting for server to start2020-05-11 05:29:09.422 IST [22624] LOG:  
starting PostgreSQL 12.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 
20150623 (Red Hat 4.8.5-39), 64-bit

2020-05-11 05:29:09.422 IST [22624] LOG:  listening on IPv4 address "0.0.0.0", 
port 5432

2020-05-11 05:29:09.422 IST [22624] LOG:  listening on IPv6 address "::", port 
5432

2020-05-11 05:29:09.423 IST [22624] LOG:  listening on Unix socket 
"/var/run/postgresql/.s.PGSQL.5432"

2020-05-11 05:29:09.425 IST [22624] LOG:  listening on Unix socket 
"/tmp/.s.PGSQL.5432"

2020-05-11 05:29:09.442 IST [22624] LOG:  redirecting log output to logging 
collector process

2020-05-11 05:29:09.442 IST [22624] HINT:  Future log output will appear in 
directory "log".

.. done

server started

  1.  Check replication status on master:

-bash-4.2$ psql

psql (12.2)

Type "help" for help.



postgres=# \x

Expanded display is on.

postgres=# select * from pg_stat_replication;

-[ RECORD 1 ]+-

pid  | 24988

usesysid | 16385

usename  | replicator

application_name | walreceiver

client_addr  | 10.209.57.14

client_hostname  |

client_port  | 40802

backend_start| 2020-05-11 05:29:14.752341+05:30

backend_xmin | 588

state| streaming

sent_lsn | 5/A6D8

write_lsn| 5/A6D8

flush_lsn| 5/A6D8

replay_lsn   | 5/A6D8

write_lag|

flush_lag|

replay_lag   |

sync_priority| 1

sync_state   | sync

reply_time   | 2020-05-11 05:29:54.895857+05:30



  1.  From a third node write large data on master
#psql -h 10.209.57.16 --user=postgres
psql (12.2)
Type "help" for help.

postgres=# \c vcs
You are now connected to database "vcs" as user "postgres".
vcs=# create table c1 as select generate_series(1,100);


  1.  Kill postgres processes on master:
-bash-4.2$ ps -ef | grep post
root  1860 18990  0 May08 pts/100:00:00 vim postgresql.cluster.conf
root  1925 14243  0 Apr07 pts/000:00:00 su - postgres
postgres  1926  1925  0 Apr07 pts/000:00:00 -bash

PostgreSQL-12 replication failover, pg_rewind fails

2020-05-10 Thread Mariya Rampurawala
Hi,

I am working on providing HA for replication, using automation scripts.
My set up consists on two nodes, Master and Slave. When master fails, The slave 
is promoted to master. But when I try to re-register the old master as slave, 
the pg_rewind command fails. Details below.

Non-default entries in Postgresql.conf

===
archive_mode = on
archive_command = 'scp %p root@10.209.56.88:/pg_backup/%f'
archive_cleanup_command = '/bin/ssh root@10.209.56.88 
"/usr/pgsql-12/bin/pg_archivecleanup /pg_backup %r"'

synchronous_standby_names = '*'

listen_addresses = '*'

wal_level = hot_standby

max_wal_senders = 32

max_replication_slots = 8

hot_standby = on

wal_log_hints = on

synchronous_commit = on

hot_standby_feedback = on

restart_after_crash = off

max_connections = 100

===



Postgresql.auto.conf at standby:

===
primary_conninfo = 'user=replicator host=10.209.57.16 port=5432 sslmode=prefer 
sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any'
restore_command = 'scp  root@10.209.56.88:/pg_backup/%f %p'
recovery_target_timeline='latest'

===





Reproduction steps:

  1.  Start master:

-bash-4.2$ /usr/pgsql-12/bin/pg_ctl -D /pg_mnt/pg-12/data/  start

waiting for server to start2020-05-11 05:26:01.665 IST [23999] LOG:  
starting PostgreSQL 12.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 
20150623 (Red Hat 4.8.5-39), 64-bit

2020-05-11 05:26:01.665 IST [23999] LOG:  listening on IPv4 address "0.0.0.0", 
port 5432

2020-05-11 05:26:01.665 IST [23999] LOG:  listening on IPv6 address "::", port 
5432

2020-05-11 05:26:01.666 IST [23999] LOG:  listening on Unix socket 
"/var/run/postgresql/.s.PGSQL.5432"

2020-05-11 05:26:01.668 IST [23999] LOG:  listening on Unix socket 
"/tmp/.s.PGSQL.5432"

2020-05-11 05:26:01.688 IST [23999] LOG:  redirecting log output to logging 
collector process

2020-05-11 05:26:01.688 IST [23999] HINT:  Future log output will appear in 
directory "log".

done

server started

  1.  Basebackup on slave:

/usr/pgsql-12/bin/pg_basebackup -D /pg_mnt/pg-12/data -X stream -R -c fast -U 
replicator --host=10.209.57.16 --port=5432

  1.  Copy postgresql.auto.conf to slave data directory and touch standby.signal
  2.  Start slave

-bash-4.2$ /usr/pgsql-12/bin/pg_ctl -D /pg_mnt/pg-12/data start

waiting for server to start2020-05-11 05:29:09.422 IST [22624] LOG:  
starting PostgreSQL 12.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 
20150623 (Red Hat 4.8.5-39), 64-bit

2020-05-11 05:29:09.422 IST [22624] LOG:  listening on IPv4 address "0.0.0.0", 
port 5432

2020-05-11 05:29:09.422 IST [22624] LOG:  listening on IPv6 address "::", port 
5432

2020-05-11 05:29:09.423 IST [22624] LOG:  listening on Unix socket 
"/var/run/postgresql/.s.PGSQL.5432"

2020-05-11 05:29:09.425 IST [22624] LOG:  listening on Unix socket 
"/tmp/.s.PGSQL.5432"

2020-05-11 05:29:09.442 IST [22624] LOG:  redirecting log output to logging 
collector process

2020-05-11 05:29:09.442 IST [22624] HINT:  Future log output will appear in 
directory "log".

.. done

server started

  1.  Check replication status on master:

-bash-4.2$ psql

psql (12.2)

Type "help" for help.



postgres=# \x

Expanded display is on.

postgres=# select * from pg_stat_replication;

-[ RECORD 1 ]+-

pid  | 24988

usesysid | 16385

usename  | replicator

application_name | walreceiver

client_addr  | 10.209.57.14

client_hostname  |

client_port  | 40802

backend_start| 2020-05-11 05:29:14.752341+05:30

backend_xmin | 588

state| streaming

sent_lsn | 5/A6D8

write_lsn| 5/A6D8

flush_lsn| 5/A6D8

replay_lsn   | 5/A6D8

write_lag|

flush_lag|

replay_lag   |

sync_priority| 1

sync_state   | sync

reply_time   | 2020-05-11 05:29:54.895857+05:30



  1.  From a third node write large data on master
#psql -h 10.209.57.16 --user=postgres
psql (12.2)
Type "help" for help.

postgres=# \c vcs
You are now connected to database "vcs" as user "postgres".
vcs=# create table c1 as select generate_series(1,100);


  1.  Kill postgres processes on master:
-bash-4.2$ ps -ef | grep post
root  1860 18990  0 May08 pts/100:00:00 vim postgresql.cluster.conf
root  1925 14243  0 Apr07 pts/000:00:00 su - postgres
postgres  1926  1925  0 Apr07 pts/000:00:00 -bash
root  7321 1  0 Mar14 ?00:00:15 /usr/libexec/postfix/master -w
postfix   7350  7321  0 Mar14 ?00:00:03 qmgr -l -t unix -u
postfix  10410  7321  0 04:47 ?00:00:00 pickup -l -t unix -u
postgres 23999 1  0 05:26 ?00:00:00 /usr/pgsql-12/bin/postgres -D 
/pg_mnt/pg-12/data
postgres 24000 23999  0 05:26 ?00:00:00 postgres: logger
postgres 24002 23999  0 05:26 ?00:00:00 postgres: 

Wal receiver process listens to physical IP

2020-03-20 Thread Mariya Rampurawala
Hi,

I have been working on a project where I need to virtualize the postgresql 
standby server.
The wal receiver process listens to the Physical IP and a random port.
How can I configure it to listen to a VIP that the wal sender process can 
connect to?
Also, Is there a way I can configure the port number for this?


$ /usr/pgsql-12/bin/postgres -V

postgres (PostgreSQL) 12.2


postgresql.conf:

listen_addresses = '10.209.57.17'  • This is the VIP



$ ps -ef | grep post
postgres  2945 1  0 05:37 ?00:00:00 /usr/pgsql-12/bin/postgres -D 
/pg_mnt/pg-12/data -h 10.209.57.17
postgres  2946  2945  0 05:37 ?00:00:00 postgres: logger
postgres  2947  2945  0 05:37 ?00:00:00 postgres: startup   recovering 
00010007
postgres  2952  2945  0 05:37 ?00:00:00 postgres: checkpointer
postgres  2953  2945  0 05:37 ?00:00:00 postgres: background writer
postgres  2954  2945  0 05:37 ?00:00:00 postgres: stats collector
postgres  2955  2945  0 05:37 ?00:00:04 postgres: walreceiver





$ netstat -anp | grep 55656
tcp0  0 10.209.57.15:55656  10.209.57.16:5432   ESTABLISHED 
2955/postgres: walr





10.209.57.15 is the physical IP


$ ps -ef | grep 2955
postgres  2955  2945  0 05:37 ?00:00:04 postgres: walreceiver


Regards,
Mariya