Thanks. I most assuredly will, but first I have to run some experiments, to get a feeling for it.
On Wed, Apr 17, 2019 at 3:56 PM digimer <li...@alteeve.ca> wrote:

> Happy to help you understand, just keep asking questions. :)
>
> The point can be explained this way:
>
> * If two nodes can work without coordination, you don't need a cluster;
>   just run your services everywhere. If that is not the case, then you
>   require coordination. Fencing ensures that a node that has entered an
>   unknown state can be forced into a known state (off). In this way, no
>   action will be taken by a node unless the peer can be informed, or the
>   peer is gone.
>
> The method by which a node is forced into a known state depends on the
> hardware (or infrastructure) in your particular setup. So perhaps explain
> what your nodes are built on, and we can assist with more specific
> details.
>
> digimer
>
> On 2019-04-17 5:46 p.m., JCA wrote:
>
>> Thanks. This implies that I officially do not understand what fencing
>> can do for me in my simple cluster. Back to the drawing board.
>>
>> On Wed, Apr 17, 2019 at 3:33 PM digimer <li...@alteeve.ca> wrote:
>>
>> Fencing requires some mechanism, outside the nodes themselves, that can
>> terminate the nodes. Typically, IPMI (iLO, iRMC, RSA, DRAC, etc.) is
>> used for this. Alternatively, switched PDUs are common. If you don't
>> have these but do have a watchdog timer on your nodes, SBD
>> (storage-based death) can work.
>>
>> You can use 'fence_<device> <options> -o status' at the command line to
>> figure out what will work with your hardware. Once you have called
>> 'fence_foo ... -o status' and gotten the status of each node,
>> translating that into a Pacemaker configuration is pretty simple.
>> That's when you enable stonith.
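[Editor's note: the status probe digimer describes can be tried straight from a shell. A minimal sketch, not from this thread: the BMC address and credentials below are placeholders, and the fence_scsi line reuses the disk ID that appears later in the discussion. Check each agent's `-h` output for the exact flags on your version.]

```shell
# Probe an IPMI-style fence device before wiring it into Pacemaker.
# The address and credentials are placeholders for illustration only.
fence_ipmilan --ip=192.168.122.11 --username=admin --password=secret -o status

# With no IPMI but shared SCSI storage (as in this thread), fence_scsi
# can be probed the same way against the shared disk:
fence_scsi -n one \
    -d /dev/disk/by-id/ata-VBOX_HARDDISK_VBaaa429e4-514e8ecb -o status
```

If either command cannot report a sane status from the command line, no Pacemaker configuration built on top of it will work.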
>> Once stonith is set up and working in Pacemaker (i.e. you can crash a
>> node and the peer reboots it), then you go to DRBD and set 'fencing:
>> resource-and-stonith;' (this tells DRBD to block on communication
>> failure with the peer and request a fence), and then set up the
>> 'fence-peer /path/to/crm-fence-peer.sh' and 'unfence-peer
>> /path/to/crm-unfence-peer.sh' handlers (I am going from memory; check
>> the man page to verify the syntax).
>>
>> With all this done, if either Pacemaker/Corosync or DRBD loses contact
>> with the peer, it will block and fence. Only after the peer has been
>> confirmed terminated will I/O resume. This way, split-brain becomes
>> effectively impossible.
>>
>> digimer
>>
>> On 2019-04-17 5:17 p.m., JCA wrote:
>>
>> Here is what I did:
>>
>> # pcs stonith create disk_fencing fence_scsi \
>>       pcmk_host_list="one two" pcmk_monitor_action="metadata" \
>>       pcmk_reboot_action="off" \
>>       devices="/dev/disk/by-id/ata-VBOX_HARDDISK_VBaaa429e4-514e8ecb" \
>>       meta provides="unfencing"
>>
>> where ata-VBOX-... corresponds to the device holding the partition that
>> is shared between both nodes in my cluster.
>> The command completes without any errors (that I can see), and after
>> that I have:
>>
>> # pcs status
>> Cluster name: ClusterOne
>> Stack: corosync
>> Current DC: one (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
>> Last updated: Wed Apr 17 14:35:25 2019
>> Last change: Wed Apr 17 14:11:14 2019 by root via cibadmin on one
>>
>> 2 nodes configured
>> 5 resources configured
>>
>> Online: [ one two ]
>>
>> Full list of resources:
>>
>>  MyCluster      (ocf::myapp:myapp-script):      Stopped
>>  Master/Slave Set: DrbdDataClone [DrbdData]
>>      Stopped: [ one two ]
>>  DrbdFS         (ocf::heartbeat:Filesystem):    Stopped
>>  disk_fencing   (stonith:fence_scsi):           Stopped
>>
>> Daemon Status:
>>   corosync: active/enabled
>>   pacemaker: active/enabled
>>   pcsd: active/enabled
>>
>> Things stay that way indefinitely, until I set stonith-enabled to
>> false, at which point all the resources above get started immediately.
>>
>> Obviously, I am missing something big here. But what is it?
>>
>> On Wed, Apr 17, 2019 at 2:59 PM Adam Budziński <budzinski.a...@gmail.com> wrote:
>>
>>> You did not configure any fencing device.
>>>
>>> On Wed, Apr 17, 2019 at 10:51 PM, JCA <1.41...@gmail.com> wrote:
>>>
>>>> I am trying to get fencing working, as described in the "Clusters
>>>> from Scratch" guide, and I am stymied at the get-go :-(
>>>>
>>>> The document mentions a property named stonith-enabled. When I was
>>>> trying to get my first cluster going, I noticed that my resources
>>>> would start only when this property is set to false, by means of
>>>>
>>>> # pcs property set stonith-enabled=false
>>>>
>>>> Otherwise, all the resources remain stopped.
>>>>
>>>> I created a fencing resource for the partition that I am sharing
>>>> across the nodes by means of DRBD. This works fine, but I still have
>>>> the same problem as above: when stonith-enabled is set to true, all
>>>> the resources get stopped and remain in that state.
>>>>
>>>> I am very confused here.
>>>> Can anybody point me in the right direction out of this conundrum?
>>>>
>>>> _______________________________________________
>>>> Manage your subscription:
>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>
>>>> ClusterLabs home: https://www.clusterlabs.org/
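[Editor's note: for reference, the DRBD settings digimer describes from memory above usually land in the resource file roughly as below. A sketch only, based on DRBD 8.4-era syntax; the resource name r0 is invented, and DRBD 9 moved the fencing option and added an unfence-peer handler, so verify option names against drbd.conf(5).]

```
# /etc/drbd.d/r0.res -- sketch only; resource name is hypothetical
resource r0 {
  disk {
    # Block I/O on loss of the peer and ask the cluster to fence it
    fencing resource-and-stonith;
  }
  handlers {
    # Scripts shipped with drbd-utils that place/remove a Pacemaker
    # location constraint around the fence
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```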
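[Editor's note: the symptom JCA reports, every resource Stopped while stonith-enabled=true, is usually debugged by exercising the fence path itself. A hedged sketch using standard pcs/Pacemaker tooling; the node names come from the thread, nothing here is from JCA's actual cluster.]

```shell
# Show failed actions and the stonith resource's state in detail
pcs status --full

# Have Pacemaker validate the live configuration and report problems
crm_verify -L -V

# Deliberately fence one node from the other; if this succeeds, the
# fence agent works and resources should start with stonith enabled
pcs stonith fence two
```

If the manual fence fails, the agent's own logs (journalctl on the node running the stonith resource) usually say why, and that is the thing to fix before re-enabling stonith.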