Right, so I may have been too quick to give up. I set maintenance mode back on and promoted ph-sql-04 manually. Unfortunately I don't have the logs of ph-sql-03 anymore, because I reinitialized it.
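For the record, the manual recovery described above amounts to something like the following. This is only a sketch: the maintenance-mode property is standard Pacemaker, and the bindir/pgdata paths are the ones from the pgsqld resource definition quoted further down the thread.

```shell
# Stop Pacemaker from reacting while intervening by hand
pcs property set maintenance-mode=true

# On ph-sql-04: promote the standby manually
# (paths per the pgsqld resource attributes on this cluster)
/usr/pgsql-11/bin/pg_ctl promote -D /var/lib/pgsql/11/data
```

These commands need a live cluster and a running standby, so they are shown for illustration only.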
You mention that the demote timeout should be start timeout + stop timeout. Start/stop are 60s, so would that mean 120s for the demote timeout? Or 30s for start/stop?

On Fri, 14 Jun 2019 at 15:55, Jehan-Guillaume de Rorthais <[email protected]> wrote:
> On Fri, 14 Jun 2019 13:18:09 +0200
> Tiemen Ruiten <[email protected]> wrote:
>
> > Thank you, useful advice!
> >
> > Logs are attached; they cover the period from when I set
> > maintenance-mode=false until after the node fencing.
>
> The switchover started @ 09:51:43.
>
> In fact, the action that timed out was the demote action, not the stop
> action:
>
>   pgsqld_demote_0:31997 - timed out after 30000ms
>
> As explained, the demote does a stop/start because PgSQL doesn't support
> hot demotion. So your demote timeout should be stop timeout + start
> timeout. I would recommend 60s there instead of 30s.
>
> After Pacemaker decided what to do next, you had some more timeouts. I
> suppose the PgSQL logs should give some more explanation of what happened
> during these long minutes:
>
>   pgsqld_notify_0:37945 - timed out after 60000ms
>   ...
>   pgsqld_stop_0:7783 - timed out after 60000ms
>
> It is 09:54:16. Now pengine becomes angry and wants to make sure pgsql is
> stopped on node 03:
>
>   pengine: warning: unpack_rsc_op_failure: Processing failed stop of
>   pgsqld:1 on ph-sql-03: unknown error | rc=1
>   ...
>   pengine: warning: pe_fence_node: Cluster node ph-sql-03 will be fenced:
>   pgsqld:1 failed there
>   ...
>   pengine: warning: stage6: Scheduling Node ph-sql-03 for STONITH
>   ...
>   pengine: notice: native_stop_constraints: Stop of failed resource
>   pgsqld:1 is implicit after ph-sql-03 is fenced
>
> From there node 03 is down for 9 minutes; it comes back at 10:02:59.
>
> Meanwhile, @ 09:54:29, node 5 took over the DC role and decided to promote
> pgsql on node 4 as expected.
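In other words, since a demote is implemented as a full stop followed by a full start, its timeout has to cover both. A minimal sketch of the arithmetic, assuming the 60s start/stop timeouts already configured on this cluster (the `pcs resource update` line is an assumption about pcs syntax of this era, shown commented out since it needs a live cluster):

```shell
# demote = stop + start, so its timeout must cover both.
stop_timeout=60
start_timeout=60
demote_timeout=$(( stop_timeout + start_timeout ))
echo "${demote_timeout}s"   # prints 120s

# On a live cluster this would then be applied with something like
# (pcs syntax assumed, not verified against this pcs version):
#   pcs resource update pgsqld op demote interval=0s timeout=120s
```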
> The pre-promote notify actions are triggered, but at 09:55:24, the
> transition is canceled because of maintenance mode:
>
>   Transition aborted by cib-bootstrap-options-maintenance-mode doing
>   modify maintenance-mode=true
>
> Soon after, both notify actions timed out on both nodes:
>
>   warning: child_timeout_callback: pgsqld_notify_0 process (PID 38838)
>   timed out
>
> Not sure what happened on your side that could explain these timeouts,
> but because the cluster was in maintenance mode, there was a human
> intervention ongoing anyway.
>
> > On Fri, 14 Jun 2019 at 12:48, Jehan-Guillaume de Rorthais
> > <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > On Fri, 14 Jun 2019 12:27:12 +0200
> > > Tiemen Ruiten <[email protected]> wrote:
> > >
> > > > I set up a new 3-node PostgreSQL cluster with HA managed by PAF.
> > > > Nodes are named ph-sql-03, ph-sql-04, ph-sql-05. Archive mode is on,
> > > > writing archive files to an NFS share that's mounted on all nodes,
> > > > using pgBackRest.
> > > >
> > > > What I did:
> > > > - Create a pacemaker cluster, cib.xml is attached.
> > > > - Set maintenance-mode=true in pacemaker
> > >
> > > This is not required. Just build your PgSQL replication, shut down the
> > > instances, then add the PAF resource to the cluster.
> > >
> > > But it's not very important here.
> > >
> > > > - Bring up ph-sql-03 with pg_ctl start
> > > > - Take a pg_basebackup on ph-sql-04 and ph-sql-05
> > > > - Create a recovery.conf on ph-sql-04 and ph-sql-05:
> > > >
> > > >   standby_mode = 'on'
> > > >   primary_conninfo = 'user=replication password=XXXXXXXXXXXXXXXX
> > > >     application_name=ph-sql-0x host=10.100.130.20 port=5432
> > > >     sslmode=prefer sslcompression=0 krbsrvname=postgres
> > > >     target_session_attrs=any'
> > > >   recovery_target_timeline = 'latest'
> > > >   restore_command = 'pgbackrest --stanza=pgdb2 archive-get %f "%p"'
> > >
> > > Sounds fine.
> > > > - Bring up ph-sql-04 and ph-sql-05 and let recovery finish
> > > > - Set maintenance-mode=false in pacemaker
> > > > - Cluster is now running with ph-sql-03 as master and ph-sql-04/5
> > > >   as slaves
> > > >
> > > > At this point I tried a manual failover:
> > > > - pcs resource move --wait --master pgsql-ha ph-sql-04
> > > >
> > > > Contrary to my expectations, pacemaker attempted to stop pgsqld on
> > > > ph-sql-03.
> > >
> > > Indeed. PostgreSQL doesn't support hot demote. It has to be shut down
> > > and started as a standby.
> > >
> > > > This took longer than the configured timeout of 60s (the checkpoint
> > > > hadn't completed yet) and the node was fenced.
> > >
> > > 60s of checkpoint during a maintenance window? That's important
> > > indeed. I would recommend doing a manual checkpoint before triggering
> > > the move/switchover.
> > >
> > > > Then I ended up with ph-sql-04 and ph-sql-05 both in slave mode and
> > > > ph-sql-03 rebooting.
> > > >
> > > > Master: pgsql-ha
> > > >  Meta Attrs: notify=true
> > > >  Resource: pgsqld (class=ocf provider=heartbeat type=pgsqlms)
> > > >   Attributes: bindir=/usr/pgsql-11/bin pgdata=/var/lib/pgsql/11/data
> > > >    recovery_template=/var/lib/pgsql/recovery.conf.pcmk
> > > >   Operations: demote interval=0s timeout=30s (pgsqld-demote-interval-0s)
> > > >    methods interval=0s timeout=5 (pgsqld-methods-interval-0s)
> > > >    monitor interval=15s role=Master timeout=10s (pgsqld-monitor-interval-15s)
> > > >    monitor interval=16s role=Slave timeout=10s (pgsqld-monitor-interval-16s)
> > > >    notify interval=0s timeout=60s (pgsqld-notify-interval-0s)
> > > >    promote interval=0s timeout=30s (pgsqld-promote-interval-0s)
> > > >    reload interval=0s timeout=20 (pgsqld-reload-interval-0s)
> > > >    start interval=0s timeout=60s (pgsqld-start-interval-0s)
> > > >    stop interval=0s timeout=60s (pgsqld-stop-interval-0s)
> > > >
> > > > I understand I should at least increase the timeout of the stop operation
> > > > for pgsqld, though I'm not sure by how much. Checkpoints can take
> > > > up to 15 minutes to complete on this cluster. So is 20 minutes
> > > > reasonable?
> > >
> > > 20 minutes is not reasonable for HA; 2 minutes is, for a manual
> > > procedure. Timeouts are there so the cluster knows how to react
> > > during unexpected failures, not during maintenance.
> > >
> > > As I wrote, just add a manual checkpoint to your switchover procedure
> > > before the actual move.
> > >
> > > > Any other operations I should increase the timeouts for?
> > > >
> > > > Why didn't pacemaker elect and promote one of the other nodes?
> > >
> > > Do you have logs of all nodes during this time period?

> --
> Jehan-Guillaume de Rorthais
> Dalibo

--
Tiemen Ruiten
Systems Engineer
R&D Media
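Putting the advice from the thread together, a switchover procedure might look like the following. This is a sketch, not a definitive runbook: the resource and host names come from the thread, the psql invocation assumes a local superuser connection, and `pcs resource clear` is an assumption about how this pcs version removes the constraint created by the move.

```shell
#!/bin/sh
# 1. Flush dirty buffers ahead of time so the shutdown checkpoint performed
#    by the demote (stop/start) is cheap and fits inside its timeout
psql -U postgres -c 'CHECKPOINT;'

# 2. Only then trigger the switchover
pcs resource move --wait --master pgsql-ha ph-sql-04

# 3. Remove the location constraint the move created, so Pacemaker is free
#    to place the master elsewhere later
pcs resource clear pgsql-ha
```

The key point is step 1: with checkpoints that can take up to 15 minutes on this cluster, issuing the checkpoint manually keeps the demote's shutdown well under the 60s stop timeout instead of inflating the timeout to cover the worst case.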
