18.12.2020 12:00, Ulrich Windl wrote:
> 
> Maybe a related question: Do STONITH resources have special rules, meaning 
> they don't wait for successful fencing?

Pacemaker stonith resources in the CIB do not perform fencing
themselves. They only register fencing devices with fenced, which does
the actual job. In particular ...

> I saw this between fencing being initiated and fencing being confirmed (h16 
> was DC, now h18 became DC):
> 
> Dec 18 09:29:29 h18 pacemaker-controld[4479]:  notice: Processing graph 0 
> (ref=pe_calc-dc-1608280169-21) derived from 
> /var/lib/pacemaker/pengine/pe-warn-9.bz2
> Dec 18 09:29:29 h18 pacemaker-controld[4479]:  notice: Requesting fencing 
> (reboot) of node h16
> Dec 18 09:29:29 h18 pacemaker-controld[4479]:  notice: Initiating start 
> operation prm_stonith_sbd_start_0 locally on h18

... the "start" operation on a Pacemaker stonith resource only registers
this device with fenced. It does *not* initiate a stonith operation.

> ...
> Dec 18 09:31:14 h18 pacemaker-controld[4479]:  error: Node h18 did not send 
> start result (via controller) within 45000ms (action timeout plus 
> cluster-delay)

I am not sure what happens here. Either fenced took a very long time to
respond, or something went wrong with the communication between them.

> Dec 18 09:31:14 h18 pacemaker-controld[4479]:  error: [Action   22]: 
> In-flight resource op prm_stonith_sbd_start_0      on h18 (priority: 9900, 
> waiting: (null))
> Dec 18 09:31:14 h18 pacemaker-controld[4479]:  notice: Transition 0 aborted: 
> Action lost
> Dec 18 09:31:14 h18 pacemaker-controld[4479]:  warning: rsc_op 22: 
> prm_stonith_sbd_start_0 on h18 timed out
> ...
> Dec 18 09:31:15 h18 pacemaker-controld[4479]:  notice: Peer h16 was 
> terminated (reboot) by h18 on behalf of pacemaker-controld.4527: OK
> Dec 18 09:31:17 h18 pacemaker-execd[4476]:  notice: prm_stonith_sbd start 
> (call 164) exited with status 0 (execution time 110960ms, queue time 15001ms)

It could be related to the pending fencing, but I am not familiar with
the low-level details.
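
To check how the start timeout lines up with the fencing, the timestamps
in the quoted log can be compared directly. This is only a rough sketch:
the helper names are made up for illustration, the log lines are
re-joined from the wrapped excerpts above, and the year is passed in
because syslog timestamps do not carry one.

```python
from datetime import datetime

# Log lines re-joined from the wrapped excerpts quoted above.
LOG = [
    "Dec 18 09:29:29 h18 pacemaker-controld[4479]:  notice: "
    "Requesting fencing (reboot) of node h16",
    "Dec 18 09:31:14 h18 pacemaker-controld[4479]:  warning: "
    "rsc_op 22: prm_stonith_sbd_start_0 on h18 timed out",
    "Dec 18 09:31:15 h18 pacemaker-controld[4479]:  notice: "
    "Peer h16 was terminated (reboot) by h18 on behalf of "
    "pacemaker-controld.4527: OK",
]


def stamp(line, year=2020):
    """Parse the leading syslog timestamp (first 15 characters)."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def seconds_between(first, second):
    """Whole seconds elapsed between two log lines."""
    return int((stamp(second) - stamp(first)).total_seconds())


# Fencing requested -> start operation reported as timed out
print(seconds_between(LOG[0], LOG[1]))
# Fencing requested -> fencing confirmed by h18
print(seconds_between(LOG[0], LOG[2]))
```

The start was reported as timed out 105 seconds after fencing was
requested, and fencing was confirmed one second after that. This fits
the idea that the start was waiting behind the pending fencing, but it
does not prove it.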

> ...
> Dec 18 09:31:30 h18 pacemaker-controld[4479]:  notice: Peer h16 was 
> terminated (reboot) by h19 on behalf of pacemaker-controld.4479: OK
> Dec 18 09:31:30 h18 pacemaker-controld[4479]:  notice: Transition 0 
> (Complete=31, Pending=0, Fired=0, Skipped=1, Incomplete=3, 
> Source=/var/lib/pacemaker/pengine/pe-warn-9.bz2): Stopped
> ...
> Dec 18 09:31:30 h18 pacemaker-schedulerd[4478]:  warning: Unexpected result 
> (error) was recorded for start of prm_stonith_sbd on h18 at Dec 18 09:31:14 
> 2020
> Dec 18 09:31:30 h18 pacemaker-schedulerd[4478]:  notice:  * Recover    
> prm_stonith_sbd                      (             h18 )
> ...
> 
> Regards,
> Ulrich
> 
> 
> 
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
> 

