On Wed, 2020-12-16 at 15:56 +0100, Gabriele Bulfon wrote:
> Thanks, here are the logs; they show how it tried to start resources
> on the nodes.
> Keep in mind that node1 was already running the resources, and I
> simulated a problem by bringing down the HA interface.
>  
> Gabriele

From the logs, Pacemaker is scheduling resource recovery after fencing
(which means stonith-enabled must already be true, by the way). I don't
know how you could see resources start without fencing succeeding
first.

Have you tested the fence devices themselves? E.g. manually run the
fence agent with the same parameters, or run "stonith_admin --reboot
<node>". It's possible the fence device is returning success without
actually doing the fencing, though I'm not sure how that would happen
either.
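
A rough sketch of that manual check, assuming a peer node named "node2"
(a placeholder, substitute your own node name). The run() wrapper is my
own helper, not part of Pacemaker; it only prints the commands unless
DRY_RUN=0 is set, since the second command really does reboot the target.

```shell
#!/bin/sh
# Hypothetical helper: print commands by default, execute only if DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# fence_check NODE
#   1. list the fence devices the cluster believes can fence NODE
#   2. ask the fencer to reboot NODE (destructive when DRY_RUN=0)
fence_check() {
    node=$1
    run stonith_admin --list "$node"
    run stonith_admin --reboot "$node"
}

# Example (dry run):   fence_check node2
# Example (for real):  DRY_RUN=0 fence_check node2
```

If the reboot request reports success but the node never actually goes
down, the device or its parameters would be the first thing to look at.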

BTW if you're using corosync < 3, turning down the interface isn't a
good test. Physically pulling the cable, or using the firewall to block
both incoming and outgoing packets on the interface, is better.
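
For the firewall variant, something along these lines might do, assuming
a Linux node with iptables and a cluster interface called "eth1" (both
are assumptions on my part; on other platforms use the native packet
filter instead). As above, run() is just an illustrative helper that
prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical helper: print commands by default, execute only if DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# block_iface IFACE: drop all traffic in both directions on IFACE.
block_iface() {
    iface=$1
    run iptables -I INPUT -i "$iface" -j DROP
    run iptables -I OUTPUT -o "$iface" -j DROP
}

# unblock_iface IFACE: remove the two rules added by block_iface.
unblock_iface() {
    iface=$1
    run iptables -D INPUT -i "$iface" -j DROP
    run iptables -D OUTPUT -o "$iface" -j DROP
}

# Example (dry run):   block_iface eth1
# Example (for real):  DRY_RUN=0 block_iface eth1
```

Blocking both directions matters: each node then loses sight of the
other at the same time, which is what a pulled cable would look like.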

>  
> Sonicle S.r.l. : http://www.sonicle.com
> Music: http://www.gabrielebulfon.com
> eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
>  
> 
> 
> 
> ----------------------------------------------------------------------
> 
> From: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>
> To: users@clusterlabs.org
> Date: 16 December 2020 15:45:36 CET
> Subject: [ClusterLabs] Antw: [EXT] delaying start of a resource
> 
> > >>> Gabriele Bulfon <gbul...@sonicle.com> wrote on 16.12.2020 at
> > 15:32 in message <1523391015.734.1608129155836@www>:
> > > Hi, I now have a two-node cluster using stonith with different
> > > pcmk_delay_base values, so that node 1 has priority to stonith
> > > node 2 in case of problems.
> > > 
> > > There is still one problem, though: node 2 delays its stonith
> > > action for 10 seconds and node 1 for just 1, but node 2 does not
> > > delay the start of resources. So while it has not yet been powered
> > > off by node 1 (and is waiting out its delay before powering off
> > > node 1), it actually starts resources, causing a window of a few
> > > seconds in which both the NFS IP and the ZFS pool (!!!!!) are
> > > mounted by both!
> > 
> > AFAIK Pacemaker will not start resources on a node that is
> > scheduled for stonith. Even more: Pacemaker will try to stop
> > resources on a node scheduled for stonith in order to start them
> > elsewhere.
> > 
> > > How can I delay node 2's resource start until the delayed stonith
> > > action is done? Or how can I just delay the resource start, so
> > > that I can make that delay longer than its pcmk_delay_base?
> > 
> > We probably need to see logs and configs to understand.
> > 
> > > 
> > > Also, I was advised to set "stonith-enabled=true", but I don't
> > > know where to set this flag (cib-bootstrap-options is not happy
> > > with it...).
> > 
> > I think it's on by default, so you must have set it to false.
> > In crm shell it is "configure# property stonith-enabled=...".
> > 
> > Regards,
> > Ulrich
-- 
Ken Gaillot <kgail...@redhat.com>
