Re: Manual failover cluster

Ninad Shah Mon, 23 Aug 2021 09:13:23 -0700

What parameters have you set in the recovery.conf file?
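
For comparison, a standby's recovery.conf on PG 10 usually looks roughly like
the sketch below. The host, user and paths are placeholders, not taken from
your setup. Note that on PG 10 a standby only follows a newly promoted master
onto its new timeline if recovery_target_timeline is set to 'latest'; it is
not the default in that version, and leaving it out is one possible cause of
timeline mismatch errors.

```
# recovery.conf on a standby (PostgreSQL 10); all values are placeholders
standby_mode = 'on'
primary_conninfo = 'host=NEW_MASTER_HOST port=5432 user=REPLICATION_USER'
restore_command = 'cp /path/to/central/archive/%f "%p"'
recovery_target_timeline = 'latest'
trigger_file = '/path/to/trigger_file'
```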


Regards,
Ninad Shah

On Fri, 20 Aug 2021 at 18:53, Hispaniola Sol <[email protected]> wrote:

> Team,
>
> I have a pg 10 cluster with a master and two hot-standby nodes. There is a
> requirement for a manual failover (nodes switching the roles) at will. This
> is a vanilla 3 node PG cluster that was built with WAL archiving (central
> location) and streaming replication to two hot standby nodes.  The failover
> is scripted in Ansible. Ansible massages and moves around the
> archive/restore scripts, the conf files and the trigger, and calls
> `pg_ctlcluster` to start/stop. This part _seems_ to be doing the job fine.
>
> The issue I am struggling with is the apparent fragility of the process -
> all 3 nodes will end up in a "good" state after the switch only every other
> time. Other times I have to rebase the hot-standby from the new master with
> pg_basebackup. The issues seem to be mostly with the nodes that end up
> as slaves after the role switch runs.
> They get errors like mismatch in timelines, recovering from the same WAL
> over and over again, invalid resource manager ID in primary checkpoint
> record, etc.
>
> In this light, I am wondering - using what's offered by PostgreSQL itself,
> i.e. streaming WAL replication with log shipping - can I expect to have
> this kind of failover be 100% reliable on the PG side? Is anyone doing this
> reliably on PostgreSQL 10.1x?
>
> Thanks !
>
> Moishe
>
