05.12.2017 13:34, Gao,Yan wrote:
> On 12/05/2017 08:57 AM, Dejan Muhamedagic wrote:
>> On Mon, Dec 04, 2017 at 09:55:46PM +0300, Andrei Borzenkov wrote:
>>> 04.12.2017 14:48, Gao,Yan wrote:
>>>> On 12/02/2017 07:19 PM, Andrei Borzenkov wrote:
>>>>> 30.11.2017 13:48, Gao,Yan wrote:
>>>>>> On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:
>>>>>>> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; a two-node cluster with
>>>>>>> VMs on vSphere using a shared VMDK as SBD. During basic tests of
>>>>>>> killing corosync and forcing STONITH, pacemaker was not started after
>>>>>>> reboot. In the logs I see during boot:
>>>>>>>
>>>>>>> Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were allegedly just fenced by sapprod01p for sapprod01p
>>>>>>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]: warning: The crmd process (3151) can no longer be respawned,
>>>>>>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]: notice: Shutting down Pacemaker
>>>>>>>
>>>>>>> SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
>>>>>>> stonith with SBD always takes msgwait (at least, visually the host is
>>>>>>> not declared OFFLINE until 120s have passed). But the VM reboots
>>>>>>> lightning fast and is up and running long before the timeout expires.
>>>>>>>
>>>>>>> I think I have seen a similar report already. Is it something that can
>>>>>>> be fixed by SBD/pacemaker tuning?
>>>>>> SBD_DELAY_START=yes in /etc/sysconfig/sbd is the solution.
>>>>>>
>>>>> I tried it (on openSUSE Tumbleweed, which is what I have at hand; it has
>>>>> SBD 1.3.0), and with SBD_DELAY_START=yes sbd does not appear to watch
>>>>> the disk at all.
>>>> It simply waits that long on startup before starting the rest of the
>>>> cluster stack, to make sure the fencing that targeted it has returned.
>>>> It intentionally doesn't watch anything during this period of time.
>>>>
>>> Unfortunately it waits too long.
>>>
>>> ha1:~ # systemctl status sbd.service
>>> ● sbd.service - Shared-storage based fencing daemon
>>>    Loaded: loaded (/usr/lib/systemd/system/sbd.service; enabled; vendor preset: disabled)
>>>    Active: failed (Result: timeout) since Mon 2017-12-04 21:47:03 MSK; 4min 16s ago
>>>   Process: 1861 ExecStop=/usr/bin/kill -TERM $MAINPID (code=exited, status=0/SUCCESS)
>>>   Process: 2058 ExecStart=/usr/sbin/sbd $SBD_OPTS -p /var/run/sbd.pid watch (code=killed, signa
>>>  Main PID: 1792 (code=exited, status=0/SUCCESS)
>>>
>>> Dec 04 21:45:32 ha1 systemd[1]: Starting Shared-storage based fencing daemon...
>>> Dec 04 21:47:02 ha1 systemd[1]: sbd.service: Start operation timed out. Terminating.
>>> Dec 04 21:47:03 ha1 systemd[1]: Failed to start Shared-storage based fencing daemon.
>>> Dec 04 21:47:03 ha1 systemd[1]: sbd.service: Unit entered failed state.
>>> Dec 04 21:47:03 ha1 systemd[1]: sbd.service: Failed with result 'timeout'.
>>>
>>> But the real problem is that even though SBD failed to start, the whole
>>> cluster stack continues to run; and because SBD blindly trusts in
>>> well-behaving nodes, fencing appears to succeed after the timeout ...
>>> without anyone taking any action on the poison pill ...
>>
>> That's something I always wondered about: if a node is capable of
>> reading a poison pill, then it could also write an "I'm leaving"
>> message into its slot before shutdown. Wouldn't that make sbd more
>> reliable? Any reason not to implement that?
> Probably it's not considered necessary :) SBD is a fencing mechanism
> which only needs to ensure fencing works.
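
(For reference, a sketch of the pieces being discussed, together with one
possible way to keep systemd from killing the delayed start: a drop-in that
raises TimeoutStartSec above the expected delay. The device path and the
180 s value are only illustrative, on the assumption that the delay is on
the order of msgwait - 120 s in this setup - while the unit's default start
timeout is 90 s; this is not a verified fix.)

ha1:~ # grep -E 'SBD_DEVICE|SBD_DELAY_START' /etc/sysconfig/sbd
SBD_DEVICE="/dev/disk/by-id/example-shared-vmdk"   # illustrative path
SBD_DELAY_START=yes

ha1:~ # cat /etc/systemd/system/sbd.service.d/timeout.conf
[Service]
# Default start timeout (90 s) expires before the delayed start finishes;
# raise it above the expected delay (illustrative value).
TimeoutStartSec=180

ha1:~ # systemctl daemon-reload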
I'm sorry, but SBD has zero chance of ensuring that fencing works. Recently
I did a storage vMotion of a VM with the shared VMDK used for SBD - it
silently created a copy of the VMDK that was indistinguishable from the
original. As a result, both VMs ended up running with their own copy. Of
course fencing did not work - but each VM *assumed* it had worked, because
it posted the message and waited for the timeout ... I would expect the
"monitor" action of the SBD fencing agent to actually test whether messages
are seen by the remote node(s) (a rough manual check along those lines is
sketched at the end of this message) ...

> SBD on the fencing target is either there eating the pill or getting
> reset by the watchdog; otherwise it's not there, which is supposed to
> imply that the whole cluster stack is not running, so it doesn't need
> to actually eat the pill.
>
> How systemd should handle the service dependencies is another topic...
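
A rough manual check along those lines - this is only a sketch of what such
a monitor could verify, not what the agent does today; the device path and
the node name ha2 are illustrative:

ha1:~ # SBD_DEV=/dev/disk/by-id/example-shared-vmdk
ha1:~ # sbd -d "$SBD_DEV" dump               # header and timeouts as seen from this node's copy
ha1:~ # sbd -d "$SBD_DEV" list               # allocated slots and any pending messages
ha1:~ # sbd -d "$SBD_DEV" message ha2 test   # post a harmless test message to ha2's slot

If ha2 really shares the same device, its sbd daemon should log the received
test message; after the silent vMotion copy described above it would see
nothing, which is exactly the condition a smarter monitor could detect.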