On Tue, Jun 15, 2021 at 10:41 PM Strahil Nikolov <hunter86...@yahoo.com> wrote:
> Maybe you can try:
>
> while true ; do echo '0' > /proc/sys/kernel/nmi_watchdog ; sleep 1 ; done
>
> and in another shell stop pacemaker and sbd.
>
> I guess the only way to easily reproduce is with sbd over iscsi.
>
> Best Regards,
> Strahil Nikolov
>
> On Tue, Jun 15, 2021 at 21:30, Andrei Borzenkov <arvidj...@gmail.com> wrote:
> On 15.06.2021 20:48, Strahil Nikolov wrote:
> > I'm using 'pcs cluster stop' (or its crm alternative), yet I'm not sure if it will help in this case.
> >
>
> No it won't. It will still stop pacemaker.
>

I guess this really is a delicate issue, and we might think about adding
some handle here. Of course, handles of this kind always carry a certain
risk that they get used in a way that prevents a node from suiciding when
it actually should.

Unfortunately, the way 'pcs cluster stop' avoids suicides of single nodes
in larger clusters might not work here. It first stops pacemaker on all
nodes and only then stops corosync, which keeps quorum for long enough and
gives a quick shutdown of the rest. On a 2-node cluster, however, sbd
isn't actually checking for quorum but for the number of nodes registered
with the corosync protocol pacemaker uses.

Regards,
Klaus

>
> > Most probably the safest way is to wait for the storage to be recovered, as without the pacemaker<->SBD communication, sbd will stop and the watchdog will be triggered.
> >
>
> What makes you think I am not aware of it?
>
> can you suggest the steps to avoid it?
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
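
To illustrate the stop order described in the reply above (pacemaker on all nodes first, corosync only afterwards), here is a minimal dry-run sketch. The node names, the NODES and RUN variables, and the use of ssh with systemctl are assumptions for illustration only; this is not how pcs itself is implemented, and as noted above the ordering may not prevent a watchdog reset on a 2-node cluster with sbd.

```shell
#!/usr/bin/env bash
# Sketch of a two-phase cluster stop (hypothetical helper, not part of pcs).
# NODES: space-separated node names (made-up defaults).
# RUN:   command prefix; the default "echo" only prints what would be run.
NODES="${NODES:-node1 node2 node3}"
RUN="${RUN:-echo}"

stop_cluster() {
    local node
    # Phase 1: stop pacemaker on every node while corosync is still running
    for node in $NODES; do
        $RUN ssh "$node" systemctl stop pacemaker
    done
    # Phase 2: only after pacemaker is down everywhere, stop corosync
    for node in $NODES; do
        $RUN ssh "$node" systemctl stop corosync
    done
}

stop_cluster
```

With the defaults this only prints the ssh commands in the order they would be issued; setting RUN to an empty string would execute them, which should be tried on a test cluster first.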