How to Implement DR for a Production PostgreSQL v12.8 database cluster with repmgr & PgBouncer?

Hilbert, Karin Fri, 12 Nov 2021 15:56:08 -0800

I manage PostgreSQL v12.8 database clusters.
Our database clusters are on Linux VMs, with OS:
  Flavor: redhat_7
  Release: 3.10.0-1160.45.1.el7.x86_64


We have repmgr clusters of 1 Primary & 2 Standby servers & use another server 
with PgBouncer to direct the connections to the current Primary.

The PostgreSQL servers have asynchronous replication & use repmgr to handle 
automatic failovers.
Any failovers have always promoted the 1st Standby server in the cluster.
We did have one time where the newly promoted server almost immediately went 
down & the 2nd Standby was promoted.
All three servers in the cluster currently have a priority of 100.

I've been requested to set up DR for the Production cluster.  My Google 
searches haven't found anything describing how DR should be implemented.  I 
have seen that there should always be an odd number of servers in the cluster.

My thoughts are to have a new VM created in our DR datacenter that will replace 
the 2nd Standby server in the cluster.  I'm thinking that the DR server would 
have a lower priority in the repmgr.conf file (50 instead of 100), since we 
would want the local Standby to be promoted first.  We have failover configured 
to be automatic, but it waits 80 seconds before promoting a Standby (to avoid a 
premature failover due to network flapping).

Is there any reason to change the failover configuration to manual for DR?  I 
would think in a DR situation we would want it to be automatic, but my 
colleague disagrees.  I'm thinking that if the Primary & the 1st Standby both 
go down, even if it isn't a real "DR" situation, we would still want it to 
promote to the 3rd (DR) server to prevent an outage.  Our failover script 
performs post-promote tasks, including redirecting the PgBouncer server to 
point to the new Primary, so it shouldn't matter if it was pointing to the DR 
server or a local server.

If we do decide to make failover manual, can the configuration specify 
automatic for the local servers & manual for the DR server, or is it an all or 
nothing type of configuration?

I guess another strategy could be to configure it as a stand-alone server & 
leave our current cluster intact.  Then copy all the backup & WAL files to the 
DR server, but then in a DR situation we would have to perform the restore 
before the databases could be available.

Any advice on a good DR strategy would be appreciated.
Thanks,

Karin Hilbert

How to Implement DR for a Production PostgreSQL v12.8 database cluster with repmgr & PgBouncer?

Reply via email to