Hi,

>> As can be seen below from uptime, the node-1 is not shutdown by `pcs
>> cluster stop node-1` executed on itself.
>>
>> I found some discussions on users at clusterlabs.org
>> <http://clusterlabs.org/mailman/listinfo/users> about whether a node
>> running SBD resource can fence itself,
>> but the conclusion was not clear to me.
>
> I am not familiar with pcs, but stopping pacemaker services manually
> makes node leave cluster in controlled manner, and does not result in
> fencing, at least in my experience.

I confirm that killing corosync on node-1 results in fencing of node-1,
but in a reboot instead of my desired shutdown:

[root@node-1 ~]# killall -15 corosync

Broadcast message from systemd-journald@node-1 (Sat 2016-06-25 21:55:07 EDT):

sbd[4761]: /dev/sdb1: emerg: do_exit: Rebooting system: off
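In case it is useful for reproducing this: the same poison-pill path can be exercised without going through pacemaker at all, by writing a message into the node's slot by hand with sbd's "message" sub-command (how node-1 then reacts, reboot vs. poweroff, is exactly what I am trying to pin down, so treat this as a test sketch rather than a recipe):

[root@node-2 ~]# sbd -d /dev/sdb1 message node-1 off
[root@node-2 ~]# sbd -d /dev/sdb1 list

The list output should then show the pending message in node-1's slot.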
So the next one is question 6: how do I set up fence_sbd so that the fenced node shuts down? Both action=off and mode=onoff action=off passed to fence_sbd when creating the MyStonith resource result in a reboot.

[root@node-2 ~]# pcs stonith show MyStonith
 Resource: MyStonith (class=stonith type=fence_sbd)
  Attributes: devices=/dev/sdb1 power_timeout=21 action=off
  Operations: monitor interval=60s (MyStonith-monitor-interval-60s)

Another question (question 4 from my first post): the cluster is now in the state listed below.

[root@node-2 ~]# pcs status
Cluster name: mycluster
Last updated: Sat Jun 25 22:06:51 2016
Last change: Sat Jun 25 15:41:09 2016 by root via cibadmin on node-1
Stack: corosync
Current DC: node-2 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
3 nodes and 1 resource configured

Online: [ node-2 node-3 ]
OFFLINE: [ node-1 ]

Full list of resources:

 MyStonith	(stonith:fence_sbd):	Started node-2

PCSD Status:
  node-1: Online
  node-2: Online
  node-3: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

[root@node-2 ~]# sbd -d /dev/sdb1 list
0	node-3	clear
1	node-2	clear
2	node-1	off	node-2

What is the proper way of operating a cluster with SBD? I found that executing `sbd watch` on node-1 clears the SBD status:

[root@node-1 ~]# sbd -d /dev/sdb1 watch
[root@node-1 ~]# sbd -d /dev/sdb1 list
0	node-3	clear
1	node-2	clear
2	node-1	clear

After making sure that sbd is not running on node-1 (I can do that because node-1 is currently not part of the cluster):

[root@node-1 ~]# killall -15 sbd

I can join node-1 back into the cluster from node-2:

[root@node-2 ~]# pcs cluster start node-1

>> Question 3:
>> Neither node-1 is fenced by `stonith_admin -F node-1` executed on node-2,
>> despite the fact
>> /var/log/messages on node-2 (the one currently running MyStonith)
>> reporting:
>> ...
>> notice: Operation 'off' [3309] (call 2 from stonith_admin.3288) for host
>> 'node-1' with device 'MyStonith' returned: 0 (OK)
>> ...
>> What is happening here?
>
> Do you have sbd daemon running? SBD is based on self-fencing - the only
> thing that fence agent does is to place request for another node to kill
> itself. It is expected that sbd running on another node will respond to
> this request by committing suicide.

It looks to me that, as expected, sbd is integrated with corosync, and by doing `pcs cluster stop node-1` I also stopped sbd on node-1, so node-1 did not respond to the fence request from node-2.
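A side note on that, in case it matters (I have only looked at this on my own CentOS 7 test nodes, so please correct me if it is wrong): the sbd package ships an sbd.service, and my understanding is that once it is enabled and pointed at the device in /etc/sysconfig/sbd it is started and stopped together with the cluster stack, so running `sbd watch` by hand should not be needed; a left-over slot entry can also be cleared explicitly instead of restarting the watcher:

[root@node-1 ~]# grep ^SBD_DEVICE /etc/sysconfig/sbd
SBD_DEVICE="/dev/sdb1"
[root@node-1 ~]# systemctl enable sbd
[root@node-2 ~]# sbd -d /dev/sdb1 message node-1 clear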
Now, back to question 6: with sbd running on node-1 and node-1 being part of the cluster,

[root@node-2 ~]# stonith_admin -F node-1

results in a reboot of node-1 instead of a shutdown. /var/log/messages on node-2 after the last command shows "reboot":

Jun 25 22:36:46 localhost stonith-ng[3102]: notice: Client crmd.3106.b61d09b8 wants to fence (reboot) 'node-1' with device '(any)'
Jun 25 22:36:46 localhost stonith-ng[3102]: notice: Initiating remote operation reboot for node-1: f29ba740-4929-4755-a3f5-3aca9ff3c3ff (0)
Jun 25 22:36:46 localhost stonith-ng[3102]: notice: MyStonith can fence (reboot) node-1: dynamic-list
Jun 25 22:36:46 localhost stonith-ng[3102]: notice: watchdog can not fence (reboot) node-1: static-list
Jun 25 22:36:46 localhost stonith-ng[3102]: notice: MyStonith can fence (reboot) node-1: dynamic-list
Jun 25 22:36:46 localhost stonith-ng[3102]: notice: watchdog can not fence (reboot) node-1: static-list
Jun 25 22:36:59 localhost stonith-ng[3102]: notice: Operation 'off' [10653] (call 2 from stonith_admin.10640) for host 'node-1' with device 'MyStonith' returned: 0 (OK)
Jun 25 22:36:59 localhost stonith-ng[3102]: notice: Operation off of node-1 by node-2 for [email protected]: OK
Jun 25 22:37:00 localhost stonith-ng[3102]: notice: Operation 'reboot' [10693] (call 4 from crmd.3106) for host 'node-1' with device 'MyStonith' returned: 0 (OK)
Jun 25 22:37:00 localhost stonith-ng[3102]: notice: Operation reboot of node-1 by node-2 for [email protected]: OK
...

This may seem strange, but when sbd is not running on node-1 I consistently get "(off)" instead of "(reboot)" in node-2:/var/log/messages after issuing

[root@node-2 ~]# stonith_admin -F node-1

and in this case there is of course no response from node-1 to the fencing request.

Cheers,
Marcin
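P.S. Regarding question 6: from the log above it looks like it is crmd asking for a 'reboot' that ends up being executed after the 'off' I requested with stonith_admin, so my next experiment (untested so far, based on my reading of the pacemaker fencing options, so take it as a sketch) will be to remap reboot requests to off, either cluster-wide or on the device itself:

[root@node-2 ~]# pcs property set stonith-action=off
[root@node-2 ~]# pcs stonith update MyStonith pcmk_reboot_action=off

If anyone knows whether that is the intended way to get a poweroff out of fence_sbd, I would appreciate a pointer.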
