On 18.12.2020 12:00, Ulrich Windl wrote:
>
> Maybe a related question: Do STONITH resources have special rules, meaning
> they don't wait for successful fencing?

Pacemaker stonith resources in the CIB do not perform fencing. They only
register fencing devices with fenced, which does the actual job. In
particular ...

> I saw this between fencing being initiated and fencing being confirmed (h16
> was DC, now h18 became DC):
>
> Dec 18 09:29:29 h18 pacemaker-controld[4479]: notice: Processing graph 0
> (ref=pe_calc-dc-1608280169-21) derived from
> /var/lib/pacemaker/pengine/pe-warn-9.bz2
> Dec 18 09:29:29 h18 pacemaker-controld[4479]: notice: Requesting fencing
> (reboot) of node h16
> Dec 18 09:29:29 h18 pacemaker-controld[4479]: notice: Initiating start
> operation prm_stonith_sbd_start_0 locally on h18

... the "start" operation on a Pacemaker stonith resource only registers the
device with fenced. It does *not* initiate a stonith operation.

> ...
> Dec 18 09:31:14 h18 pacemaker-controld[4479]: error: Node h18 did not send
> start result (via controller) within 45000ms (action timeout plus
> cluster-delay)

I am not sure what happens here. Either fenced took a very long time to
respond, or there was some problem with the communication between them.

> Dec 18 09:31:14 h18 pacemaker-controld[4479]: error: [Action 22]:
> In-flight resource op prm_stonith_sbd_start_0 on h18 (priority: 9900,
> waiting: (null))
> Dec 18 09:31:14 h18 pacemaker-controld[4479]: notice: Transition 0 aborted:
> Action lost
> Dec 18 09:31:14 h18 pacemaker-controld[4479]: warning: rsc_op 22:
> prm_stonith_sbd_start_0 on h18 timed out
> ...
> Dec 18 09:31:15 h18 pacemaker-controld[4479]: notice: Peer h16 was
> terminated (reboot) by h18 on behalf of pacemaker-controld.4527: OK
> Dec 18 09:31:17 h18 pacemaker-execd[4476]: notice: prm_stonith_sbd start
> (call 164) exited with status 0 (execution time 110960ms, queue time 15001ms)

It could be related to the pending fencing, but I am not familiar with the
low-level details.

> ...
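
To make the ordering in the quoted log easier to see, here is a small Python sketch (my own illustration, not a Pacemaker tool) that turns four of the timestamps above into offsets from the moment fencing was requested. The event labels are my paraphrases of the log lines, and the log has only one-second resolution, so treat the numbers as approximate.

```python
from datetime import datetime

# Timestamps copied from the quoted h18 log (second granularity only).
# The labels are paraphrases of the log messages, not Pacemaker terminology.
EVENTS = [
    ("fencing of h16 requested, start of prm_stonith_sbd initiated", "09:29:29"),
    ("controller reports the start result as missing", "09:31:14"),
    ("fencing of h16 confirmed (by h18)", "09:31:15"),
    ("execd reports start exited with status 0", "09:31:17"),
]


def offsets(events):
    """Return (label, seconds since the first event) for each event."""
    parsed = [(label, datetime.strptime(ts, "%H:%M:%S")) for label, ts in events]
    base = parsed[0][1]
    return [(label, int((t - base).total_seconds())) for label, t in parsed]


for label, secs in offsets(EVENTS):
    print(f"+{secs:4d}s  {label}")
```

The start result shows up only a couple of seconds after the fencing confirmation. That is consistent with the start having been held up behind the pending fencing, but it does not prove it.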
> Dec 18 09:31:30 h18 pacemaker-controld[4479]: notice: Peer h16 was
> terminated (reboot) by h19 on behalf of pacemaker-controld.4479: OK
> Dec 18 09:31:30 h18 pacemaker-controld[4479]: notice: Transition 0
> (Complete=31, Pending=0, Fired=0, Skipped=1, Incomplete=3,
> Source=/var/lib/pacemaker/pengine/pe-warn-9.bz2): Stopped
> ...
> Dec 18 09:31:30 h18 pacemaker-schedulerd[4478]: warning: Unexpected result
> (error) was recorded for start of prm_stonith_sbd on h18 at Dec 18 09:31:14
> 2020
> Dec 18 09:31:30 h18 pacemaker-schedulerd[4478]: notice: * Recover
> prm_stonith_sbd ( h18 )
> ...
>
> Regards,
> Ulrich
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/