17.12.2017 15:20, Gao,Yan wrote:
> On 2017/12/16 16:59, Andrei Borzenkov wrote:
>> 04.12.2017 21:55, Andrei Borzenkov wrote:
>> ...
>>>>>
>>>>> I tried it (on openSUSE Tumbleweed, which is what I have at hand; it
>>>>> has SBD 1.3.0) and with SBD_DELAY_START=yes sbd does not appear to
>>>>> watch the disk at all.
>>>> It simply waits that long on startup before starting the rest of the
>>>> cluster stack to make sure the fencing that targeted it has
>>>> returned. It intentionally doesn't watch anything during this period
>>>> of time.
>>>>
>>>
>>> Unfortunately it waits too long.
>>>
>>> ha1:~ # systemctl status sbd.service
>>> ● sbd.service - Shared-storage based fencing daemon
>>>    Loaded: loaded (/usr/lib/systemd/system/sbd.service; enabled; vendor preset: disabled)
>>>    Active: failed (Result: timeout) since Mon 2017-12-04 21:47:03 MSK; 4min 16s ago
>>>   Process: 1861 ExecStop=/usr/bin/kill -TERM $MAINPID (code=exited, status=0/SUCCESS)
>>>   Process: 2058 ExecStart=/usr/sbin/sbd $SBD_OPTS -p /var/run/sbd.pid watch (code=killed, signa
>>>  Main PID: 1792 (code=exited, status=0/SUCCESS)
>>>
>>> Dec 04 21:45:32 ha1 systemd[1]: Starting Shared-storage based fencing daemon...
>>> Dec 04 21:47:02 ha1 systemd[1]: sbd.service: Start operation timed out. Terminating.
>>> Dec 04 21:47:03 ha1 systemd[1]: Failed to start Shared-storage based fencing daemon.
>>> Dec 04 21:47:03 ha1 systemd[1]: sbd.service: Unit entered failed state.
>>> Dec 04 21:47:03 ha1 systemd[1]: sbd.service: Failed with result 'timeout'.
>>>
>>> But the real problem is that even though SBD failed to start, the
>>> whole cluster stack continues to run; and because SBD blindly trusts
>>> in well-behaved nodes, fencing appears to succeed after the timeout
>>> ... without anyone taking any action on the poison pill ...
>>>
>>
>> That's an sbd bug. It declares itself as RequiredBy=corosync.service
>> but puts itself Before=pacemaker.service. Due to systemd design,
>> service A *MUST* have a Before dependency on service B if failure to
>> start A should cause failure to start B. *Or* use BindsTo ... but that
>> sounds wrong because it would cause B to start briefly and then be
>> killed.
>>
>> So the question is what is intended here. Should sbd.service be a
>> prerequisite for corosync or for pacemaker?
> It should be so only if it's enabled. Try this:
> https://github.com/ClusterLabs/sbd/pull/39
>
This is wrong; I commented on this pull request.
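
To illustrate the ordering point from my earlier mail, here is a minimal sketch with two hypothetical units, a.service and b.service. This is not the actual sbd.service and not the content of the pull request; the unit names, Description and ExecStart path are made up for the example.

```ini
# /etc/systemd/system/a.service  (hypothetical prerequisite unit)
[Unit]
Description=Prerequisite service A (illustration only)
# Ordering: b.service is not started until a.service has finished starting.
# Without this line the two start jobs run in parallel, and a failure
# of a.service does not stop b.service from starting.
Before=b.service

[Service]
# Placeholder command, assumed for the example.
ExecStart=/usr/local/bin/a-daemon

[Install]
# Requirement: "systemctl enable a.service" adds a Requires=a.service
# dependency to b.service (via a b.service.requires/ symlink).
RequiredBy=b.service
```

With both the requirement and the ordering pointing at the same unit, a failed start of a.service should make the start job of b.service fail with a dependency error. If RequiredBy= and Before= name different units, as described above for sbd, neither of those units gets both halves.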