On Thu, Nov 30, 2017 at 1:48 PM, Gao,Yan <y...@suse.com> wrote:
> On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:
>>
>> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with
>> VM on VSphere using shared VMDK as SBD. During basic tests by killing
>> corosync and forcing STONITH pacemaker was not started after reboot.
>> In logs I see during boot
>>
>> Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were allegedly just fenced by sapprod01p for sapprod01p
>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]: warning: The crmd process (3151) can no longer be respawned,
>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]: notice: Shutting down Pacemaker
>>
>> SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
>> stonith with SBD always takes msgwait (at least, visually host is not
>> declared as OFFLINE until 120s passed). But VM reboots lightning fast
>> and is up and running long before timeout expires.
>>
>> I think I have seen similar report already. Is it something that can
>> be fixed by SBD/pacemaker tuning?
>
> SBD_DELAY_START=yes in /etc/sysconfig/sbd is the solution.
>
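For reference, the setting suggested above would look roughly like this in /etc/sysconfig/sbd. This is only a sketch: the SBD_DEVICE path is a made-up placeholder rather than the device from this cluster, and any other settings in the file are left out.

```shell
# /etc/sysconfig/sbd (fragment; the file consists of shell variable assignments)

# Hypothetical placeholder for the shared VMDK used as the SBD device.
SBD_DEVICE="/dev/disk/by-id/EXAMPLE-sbd-disk"

# Delay for "msgwait" seconds after sbd starts on boot, so that a node which
# reboots very quickly does not rejoin while its peer is still waiting for
# the fencing request to be acknowledged.
SBD_DELAY_START=yes
```

The watchdog and msgwait timeouts actually written to the device can be checked with `sbd -d <device> dump`.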
Sounds promising. Is it enough? The comment in /etc/sysconfig/sbd says
"Whether to delay after starting sbd on boot for "msgwait" seconds.",
but as I understand it, the stonith agent timeout is 2 * msgwait.

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org