On Thu, 2020-12-17 at 19:13 +0300, Andrei Borzenkov wrote:
> 17.12.2020 14:02, Ulrich Windl wrote:
> > > > > Andrei Borzenkov <arvidj...@gmail.com> wrote on 17.12.2020
> > > > > at 09:50 in
> > 
> > message
> > <caa91j0vuv4nmtetcpqnimf-xrrv_9kqkcnpvman4xbonbqp...@mail.gmail.com
> > >:
> > 
> > ...
> > > According to logs from xstha1, it started to activate resources
> > > only
> > > after stonith was confirmed
> > > 
> > > Dec 16 15:08:12 [708] stonith-ng:   notice: log_operation:
> > > Operation 'off' [1273] (call 4 from crmd.712) for host 'xstha2'
> > > with
> > > device 'xstha2-stonith' returned: 0 (OK)
> > > Dec 16 15:08:12 [708] stonith-ng:   notice: remote_op_done:
> > > Operation 'off' targeting xstha2 on xstha1 for
> > > crmd.712@xstha1.e487e7cc: OK
> > > 
> > > It is possible that your IPMI/BMC/whatever implementation
> > > responds
> > > with success before it actually completes this action. I have
> > > seen at
> > 
> > Shouldn't a reasonable "stonith-timeout=180" do? 
> 
> This is maximum time to wait for successful stonith. In this case
> stonith *was* successful - at least from the pacemaker point of view.

This reminded me that some IPMI implementations return "success" for
commands before they've actually been completed. This is why
fence_ipmilan has a "power_wait" parameter that defaults to 2 seconds.

The best thing would be to do some manual testing using ipmitool or
whatnot to turn off the power, and observe how long it takes between
when the command returns and the server actually is powered down. Then
set power_wait to a comfortable margin above that. Or just keep raising
power_wait until the problem goes away :)
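
For what it's worth, the manual test could be scripted roughly like this.
It is only a sketch under some assumptions: that the BMC is reachable
with "ipmitool -I lanplus", that "chassis power status" reports the real
power state (some BMCs may not, so watching the console is still the
better check), and that doubling the worst observed delay is a
"comfortable margin". The function names and the margin are made up for
illustration; only the ipmitool subcommands are real.

```python
import math
import subprocess
import time


def ipmi(args, host, user, password):
    """Run one ipmitool command over lanplus and return its stdout."""
    cmd = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password]
    result = subprocess.run(cmd + list(args), capture_output=True,
                            text=True, check=True)
    return result.stdout


def measure_off_lag(run, poll_interval=0.5, timeout=120.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Seconds between 'power off' returning and the status reading 'off'.

    `run` takes a list of ipmitool arguments and returns the output text.
    """
    run(["chassis", "power", "off"])
    returned_at = clock()  # the command has returned "success" at this point
    while clock() - returned_at < timeout:
        # ipmitool prints "Chassis Power is on" or "Chassis Power is off"
        if "is off" in run(["chassis", "power", "status"]).lower():
            return clock() - returned_at
        sleep(poll_interval)
    raise TimeoutError("node still reports power on after %.0fs" % timeout)


def suggest_power_wait(lags, margin=2.0):
    """Worst observed lag times an assumed safety margin, rounded up."""
    return math.ceil(max(lags) * margin)


# Example use against a real BMC (host and credentials are placeholders):
#   from functools import partial
#   run = partial(ipmi, host="bmc-of-xstha2", user="admin", password="secret")
#   lag = measure_off_lag(run)
#   print("observed lag:", lag, "-> power_wait =", suggest_power_wait([lag]))
```

Running it a few times and feeding all the measured lags to
suggest_power_wait() gives a less lucky number than a single run.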
-- 
Ken Gaillot <kgail...@redhat.com>