What parameters have you set in the recovery.conf file?
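
For comparison, a standby's recovery.conf on PG 10 often looks roughly like the sketch below. The host name, user, archive path, and trigger path are placeholders I made up, not values from your setup. The line worth checking first is recovery_target_timeline: on PG 10 a standby stays on the timeline its base backup was taken on unless this is set to 'latest', which may be related to the timeline mismatch errors after a role switch.

```conf
# Hypothetical recovery.conf for a PG 10 hot standby (all values are placeholders)
standby_mode = 'on'

# Streaming replication source: the current master
primary_conninfo = 'host=new-master.example.com port=5432 user=replicator'

# Fetch archived WAL and timeline history files from the central archive
restore_command = 'cp /mnt/wal_archive/%f "%p"'

# Follow the newest timeline found in the archive after a promotion
recovery_target_timeline = 'latest'

# Creating this file promotes the standby
trigger_file = '/tmp/postgresql.trigger'
```

For the standby to follow a new timeline, the .history files written at promotion also need to be reachable through restore_command.
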
Regards,
Ninad Shah

On Fri, 20 Aug 2021 at 18:53, Hispaniola Sol <mois...@hotmail.com> wrote:

> Team,
>
> I have a pg 10 cluster with a master and two hot-standby nodes. There is a
> requirement for a manual failover (nodes switching the roles) at will. This
> is a vanilla 3 node PG cluster that was built with WAL archiving (central
> location) and streaming replication to two hot standby nodes. The failover
> is scripted in Ansible. Ansible massages and moves around the
> archive/restore scripts, the conf files and the trigger and calls
> `pg_ctlcluster` to start/stop. This part _seems_ to be doing the job fine.
>
> The issue I am struggling with is the apparent fragility of the process -
> all 3 nodes will end up in a "good" state after the switch only every other
> time. Other times I have to rebase the hot-standby from the new master with
> pg_basebackup. It seems the issues are mostly with those nodes, ending up
> as slaves after the roles switch runs.
> They get errors like mismatch in timelines, recovering from the same WAL
> over and over again, invalid resource manager ID in primary checkpoint
> record, etc.
>
> In this light, I am wondering - using what's offered by PostgreSQL itself,
> i.e. streaming WAL replication with log shipping - can I expect to have
> this kind of failover 100% reliable on PG side ? Anyone is doing this
> reliably on PostgreSQL 10.1x ?
>
> Thanks !
>
> Moishe