One of the guides suggested simulating split brain by killing the
corosync process. That actually worked on one cluster, but on another
the corosync process was restarted after being killed, without the
cluster noticing anything. Except that after several attempts pacemaker
died while stopping resources ...
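
A minimal sketch of that test, not taken from the guide itself. It
assumes "kill" means SIGKILL, that the commands run as root on one node
of a test cluster, and that an automatic restart (if any) comes from the
service manager. The run and simulate_corosync_crash helpers and the
DRY_RUN variable are made up for illustration; by default the commands
are only printed.

```shell
#!/bin/sh
# Sketch: simulate a corosync crash on ONE node of a test cluster.
# DRY_RUN defaults to 1, so commands are only printed; set DRY_RUN=0
# (as root) to really run them. Helper names are hypothetical.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

simulate_corosync_crash() {
    # SIGKILL, so corosync gets no chance to leave the membership cleanly.
    run killall -9 corosync
    # Give a service manager (if one is set to restart corosync) time to act.
    run sleep 5
    # Is corosync back? pidof exits non-zero when no process is found.
    run pidof corosync
    # How do membership and resources look from this node?
    run corosync-quorumtool -s
    run crm_mon -1
}

simulate_corosync_crash
```

If pidof already reports a corosync process after the sleep, the node
probably never left the membership for long enough to be noticed, which
would match what was seen on the second cluster.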
I am trying to wrap my head around how pcmk_delay_max works. My
understanding is:
- on startup pacemaker always starts one instance of stonith/sbd; it
probably selects the node for it at random. I suppose this initial start
is delayed by a random amount of time within pcmk_delay_max.
- when cluster is partitioned,
Jan,
Your help is very much appreciated. I am getting further, but it still
looks very strange.
1. To use "debug-promote", I upgraded pacemaker from 1.12 to 1.16 and
pcs to 0.9.160.
2. I recreated the resource with the commands below:
pcs resource create ovndb_servers ocf:ovn:ovndb-servers \