>>> Tiemen Ruiten <t.rui...@rdmedia.com> wrote on 14.06.2019 at 16:43 in
message <caaegnz2ty9z_l6g9cnp+ixjh0bhjha0-mazhlexajx7uap-...@mail.gmail.com>:
> Right, so I may have been too fast to give up. I set maintenance mode back
> on and promoted ph-sql-04 manually. Unfortunately I don't have the logs of
> ph-sql-03 anymore because I reinitialized it.
>
> You mention that the demote timeout should be start timeout + stop timeout.
> Start/stop are 60s, so would that mean 120s for the demote timeout? Or 30s
> for start/stop?
Timeout values always depend on your specific configuration, so general
values cannot be given. I suggest timing the operations once (maybe with a
very large timeout), then adjusting the timeout to the measured value times
some safety factor like 1.5 or even 3. Of course it all depends: if fencing
and a restart including recovery is faster than waiting for an
extraordinarily slow stop, you may prefer a shorter timeout value. As said
before: it all depends... Sorry for the late response, BTW.

Ulrich

>
> On Fri, 14 Jun 2019 at 15:55, Jehan-Guillaume de Rorthais <j...@dalibo.com>
> wrote:
>
>> On Fri, 14 Jun 2019 13:18:09 +0200
>> Tiemen Ruiten <t.rui...@rdmedia.com> wrote:
>>
>> > Thank you, useful advice!
>> >
>> > Logs are attached, they cover the period between when I set
>> > maintenance-mode=false till after the node fencing.
>>
>> Switchover started @ 09:51:43.
>>
>> In fact, the action that timed out was the demote action, not the stop
>> action:
>>
>> pgsqld_demote_0:31997 - timed out after 30000ms
>>
>> As explained, the demote does a stop/start because PgSQL doesn't support
>> hot demotion. So your demote timeout should be stop timeout + start
>> timeout. I would recommend 60s there instead of 30s.
>>
>> After Pacemaker decided what to do next, you had some more timeouts. I
>> suppose the PgSQL logs should give some more explanation of what happened
>> during these long minutes:
>>
>> pgsqld_notify_0:37945 - timed out after 60000ms
>> ...
>> pgsqld_stop_0:7783 - timed out after 60000ms
>>
>> It is 09:54:16. Now pengine becomes angry and wants to make sure pgsql is
>> stopped on node 03:
>>
>> pengine: warning: unpack_rsc_op_failure: Processing failed stop of
>> pgsqld:1 on ph-sql-03: unknown error | rc=1
>> ...
>> pengine: warning: pe_fence_node: Cluster node ph-sql-03 will be fenced:
>> pgsqld:1 failed there
>> ...
>> pengine: warning: stage6: Scheduling Node ph-sql-03 for STONITH
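[Editor's note: the sizing rule described above (time the operation once with
a generous timeout, then multiply by a safety factor) can be sketched as a
small shell calculation. The 40-second measurement and the 1.5 factor below
are made-up example numbers, not values from this cluster.]

```shell
# Hypothetical example: a stop measured at 40s, padded with a 1.5
# safety factor as suggested above.
measured=40
factor=1.5
timeout=$(awk -v m="$measured" -v f="$factor" 'BEGIN { printf "%d", m * f }')
echo "suggested timeout: ${timeout}s"   # suggested timeout: 60s
```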
>> pengine: notice: native_stop_constraints: Stop of failed resource
>> pgsqld:1 is implicit after ph-sql-03 is fenced
>>
>> From there, node 03 is down for 9 minutes; it comes back at 10:02:59.
>>
>> Meanwhile, @ 09:54:29, node 5 took over the DC role and decided to
>> promote pgsql on node 4, as expected.
>>
>> The pre-promote notify actions are triggered, but at 09:55:24, the
>> transition is canceled because of maintenance mode:
>>
>> Transition aborted by cib-bootstrap-options-maintenance-mode doing modify
>> maintenance-mode=true
>>
>> Soon after, both notify actions timed out on both nodes:
>>
>> warning: child_timeout_callback: pgsqld_notify_0 process (PID 38838)
>> timed out
>>
>> Not sure what happened on your side that could explain these timeouts,
>> but because the cluster was in maintenance mode, there was a human
>> intervention ongoing anyway.
>>
>> > On Fri, 14 Jun 2019 at 12:48, Jehan-Guillaume de Rorthais
>> > <j...@dalibo.com> wrote:
>> >
>> > > Hi,
>> > >
>> > > On Fri, 14 Jun 2019 12:27:12 +0200
>> > > Tiemen Ruiten <t.rui...@rdmedia.com> wrote:
>> > > > I set up a new 3-node PostgreSQL cluster with HA managed by PAF.
>> > > > Nodes are named ph-sql-03, ph-sql-04, ph-sql-05. Archive mode is on
>> > > > and archive files are written to an NFS share that's mounted on all
>> > > > nodes, using pgBackRest.
>> > > >
>> > > > What I did:
>> > > > - Create a pacemaker cluster, cib.xml is attached.
>> > > > - Set maintenance-mode=true in pacemaker
>> > >
>> > > This is not required. Just build your PgSQL replication, shut down the
>> > > instances, then add the PAF resource to the cluster.
>> > >
>> > > But it's not very important here.
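[Editor's note: for readers following along, the "add the PAF resource" step
mentioned above usually looks roughly like the pcs sketch below. This is an
illustration only, not a command from this thread; the paths and names match
the configuration shown further down, and the demote timeout follows the
stop + start rule discussed here.]

```shell
# Sketch of creating a PAF multi-state resource with pcs (illustrative).
# Demote timeout is set to stop + start (60s + 60s), per the thread's advice.
pcs resource create pgsqld ocf:heartbeat:pgsqlms \
    bindir=/usr/pgsql-11/bin pgdata=/var/lib/pgsql/11/data \
    op start timeout=60s \
    op stop timeout=60s \
    op promote timeout=30s \
    op demote timeout=120s \
    op monitor interval=15s timeout=10s role="Master" \
    op monitor interval=16s timeout=10s role="Slave" \
    op notify timeout=60s \
    --master notify=true
```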
>> > >
>> > > > - Bring up ph-sql-03 with pg_ctl start
>> > > > - Take a pg_basebackup on ph-sql-04 and ph-sql-05
>> > > > - Create a recovery.conf on ph-sql-04 and ph-sql-05:
>> > > >
>> > > > standby_mode = 'on'
>> > > > primary_conninfo = 'user=replication password=XXXXXXXXXXXXXXXX
>> > > > application_name=ph-sql-0x host=10.100.130.20 port=5432 sslmode=prefer
>> > > > sslcompression=0 krbsrvname=postgres target_session_attrs=any'
>> > > > recovery_target_timeline = 'latest'
>> > > > restore_command = 'pgbackrest --stanza=pgdb2 archive-get %f "%p"'
>> > >
>> > > Sounds fine.
>> > >
>> > > > - Bring up ph-sql-04 and ph-sql-05 and let recovery finish
>> > > > - Set maintenance-mode=false in pacemaker
>> > > > - Cluster is now running with ph-sql-03 as master and ph-sql-04/5
>> > > >   as slaves
>> > > >
>> > > > At this point I tried a manual failover:
>> > > > - pcs resource move --wait --master pgsql-ha ph-sql-04
>> > > >
>> > > > Contrary to my expectations, pacemaker attempted to stop pgsqld on
>> > > > ph-sql-03.
>> > >
>> > > Indeed. PostgreSQL doesn't support hot demote. It has to be shut down
>> > > and started as a standby.
>> > >
>> > > > This took longer than the configured timeout of 60s (a checkpoint
>> > > > hadn't completed yet) and the node was fenced.
>> > >
>> > > 60s of checkpoint during a maintenance window? That's significant
>> > > indeed. I would recommend doing a manual checkpoint before triggering
>> > > the move/switchover.
>> > >
>> > > > Then I ended up with ph-sql-04 and ph-sql-05 both in slave mode and
>> > > > ph-sql-03 rebooting.
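[Editor's note: the pre-switchover checkpoint recommended above could look
like the sketch below. Host, user, and resource names are the ones appearing
in this thread; connection options will differ per setup.]

```shell
# Force a checkpoint on the current master so the shutdown checkpoint
# performed during demote is quick, then trigger the switchover.
psql -h ph-sql-03 -U postgres -c 'CHECKPOINT;'
pcs resource move --wait --master pgsql-ha ph-sql-04
```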
>> > > >
>> > > > Master: pgsql-ha
>> > > >  Meta Attrs: notify=true
>> > > >  Resource: pgsqld (class=ocf provider=heartbeat type=pgsqlms)
>> > > >   Attributes: bindir=/usr/pgsql-11/bin pgdata=/var/lib/pgsql/11/data
>> > > >               recovery_template=/var/lib/pgsql/recovery.conf.pcmk
>> > > >   Operations: demote interval=0s timeout=30s (pgsqld-demote-interval-0s)
>> > > >               methods interval=0s timeout=5 (pgsqld-methods-interval-0s)
>> > > >               monitor interval=15s role=Master timeout=10s (pgsqld-monitor-interval-15s)
>> > > >               monitor interval=16s role=Slave timeout=10s (pgsqld-monitor-interval-16s)
>> > > >               notify interval=0s timeout=60s (pgsqld-notify-interval-0s)
>> > > >               promote interval=0s timeout=30s (pgsqld-promote-interval-0s)
>> > > >               reload interval=0s timeout=20 (pgsqld-reload-interval-0s)
>> > > >               start interval=0s timeout=60s (pgsqld-start-interval-0s)
>> > > >               stop interval=0s timeout=60s (pgsqld-stop-interval-0s)
>> > > >
>> > > > I understand I should at least increase the timeout of the stop
>> > > > operation for pgsqld, though I'm not sure by how much. Checkpoints
>> > > > can take up to 15 minutes to complete on this cluster. So is 20
>> > > > minutes reasonable?
>> > >
>> > > 20 minutes is not reasonable for HA; 2 minutes is, for a manual
>> > > procedure. Timeouts are there so the cluster knows how to react during
>> > > an unexpected failure, not during maintenance.
>> > >
>> > > As I wrote, just add a manual checkpoint to your switchover procedure
>> > > before the actual move.
>> > >
>> > > > Any other operations I should increase the timeouts for?
>> > > >
>> > > > Why didn't pacemaker elect and promote one of the other nodes?
>> > >
>> > > Do you have logs of all nodes during this time period?
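[Editor's note: rather than a 20-minute stop timeout, the concrete fix
discussed in this thread is raising the demote timeout on the existing
resource. A sketch with pcs, where 120s assumes the 60s stop + 60s start
shown in the configuration above:]

```shell
# Raise the demote timeout to cover a full stop + start (60s + 60s),
# per the advice in this thread.
pcs resource update pgsqld op demote interval=0s timeout=120s

# Review the resulting operation list:
pcs resource show pgsqld
```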
>> > >
>>
>> --
>> Jehan-Guillaume de Rorthais
>> Dalibo
>
> --
> Tiemen Ruiten
> Systems Engineer
> R&D Media

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/