Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

Andrei Borzenkov Thu, 30 Nov 2017 03:52:41 -0800

On Thu, Nov 30, 2017 at 1:48 PM, Gao,Yan <y...@suse.com> wrote:
> On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:
>>
>> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two-node cluster with
>> VMs on vSphere using a shared VMDK as SBD. During basic tests (killing
>> corosync and forcing STONITH), pacemaker was not started after reboot.
>> During boot I see in the logs:
>>
>> Nov 22 16:04:56 sapprod01s crmd[3151]:     crit: We were allegedly just fenced by sapprod01p for sapprod01p
>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd process (3151) can no longer be respawned,
>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down Pacemaker
>>
>> SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
>> stonith with SBD always takes msgwait (at least, visually, the host is not
>> declared OFFLINE until 120s have passed). But the VM reboots lightning fast
>> and is up and running long before the timeout expires.
>>
>> I think I have seen a similar report already. Is this something that can
>> be fixed by SBD/pacemaker tuning?
>
> SBD_DELAY_START=yes in /etc/sysconfig/sbd is the solution.
>
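A simplified, hypothetical model of the race described in the report. Time 0 is when the surviving node writes the poison pill. The victim is assumed to start rebooting at about that moment. `boot` is an assumed reboot time for the VM, and `delay` stands for the extra wait before sbd starts on boot. The function name `fence_race` and the 40s boot time are illustrative and do not come from sbd or pacemaker. The model also assumes that the cluster stack on the victim cannot come up before sbd does.

```sh
#!/bin/sh
# Hypothetical model of the fast-reboot race (not sbd or pacemaker code).
# Time 0: the surviving node writes the poison pill; the victim starts
# rebooting at about the same moment.
#   $1 boot:    assumed seconds until the victim is back up
#   $2 msgwait: seconds the fencing node waits before treating the victim
#               as fenced (120 in this report)
#   $3 delay:   extra seconds before sbd starts on boot (0 = no delay);
#               assumed here to hold back the cluster stack as well
fence_race() {
    boot=$1
    msgwait=$2
    delay=$3
    rejoin=$((boot + delay))
    if [ "$rejoin" -lt "$msgwait" ]; then
        # The victim is already back in the cluster when the fencing result
        # arrives, matching the "We were allegedly just fenced" exit in the logs.
        echo "early"
    else
        echo "ok"
    fi
}

fence_race 40 120 0     # fast VM reboot, no delay configured
fence_race 40 120 120   # same reboot, sbd start delayed by msgwait
```

With no delay, the fast-booting VM rejoins before msgwait has elapsed on the fencing side. Delaying the start by msgwait moves the rejoin past that point, at least in this simplified timeline.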


Sounds promising, but is it enough? The comment in /etc/sysconfig/sbd says
"Whether to delay after starting sbd on boot for "msgwait" seconds.",
but as I understand it, the stonith agent timeout is 2 * msgwait.
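The question can be made concrete with the figures from this cluster. With msgwait at 120s, a delay of msgwait covers 120s. A stonith timeout of 2 * msgwait would be 240s. That doubling is the poster's understanding and is not confirmed here. Whether the remaining gap matters depends on when the fencing node actually reports success, which this arithmetic does not model.

```sh
#!/bin/sh
# Worked numbers for this cluster; msgwait is taken from the original report.
msgwait=120
delay=$msgwait                    # SBD_DELAY_START=yes, per the sysconfig comment
stonith_timeout=$((2 * msgwait))  # the poster's understanding; not verified here
echo "delay=${delay}s stonith_timeout=${stonith_timeout}s uncovered=$((stonith_timeout - delay))s"
```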

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
