> -----Original Message-----
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: 17 February 2014 00:55
> To: li...@blueface.com; The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] node1 fencing itself after node2 being fenced
>
> If you have configured cman to use fence_pcmk, then all cman/dlm/clvmd
> fencing operations are sent to Pacemaker.
> If you aren't running pacemaker, then you have a big problem as no-one
> can perform fencing.
I have configured pacemaker as the resource manager, and I have it enabled
to start on boot-up too, as follows:

chkconfig cman on
chkconfig clvmd on
chkconfig pacemaker on

> I don't know if you are testing without pacemaker running, but if so you
> would need to configure cman with real fencing devices.

I have been testing with pacemaker running, and the fencing appears to be
operating fine. The issue I seem to have is that clvmd is unable to
re-acquire its locks when the node attempts to rejoin the cluster after a
fence operation, so clvmd just hangs when the startup script fires it off
on boot-up.

When the third node is in this state (hung clvmd), the other two nodes are
unable to obtain locks from it. As an example, this is what happens on
node1 (running pvs) while the third node is hung at the clvmd startup
phase after pacemaker has issued a fence operation:

[root@test01 ~]# pvs
  Error locking on node test03: Command timed out
  Unable to obtain global lock.

The dlm elements look fine to me here too:

[root@test01 ~]# dlm_tool ls
dlm lockspaces
name          cdr
id            0xa8054052
flags         0x00000008 fs_reg
change        member 2 joined 0 remove 1 failed 1 seq 2,2
members       1 2

name          clvmd
id            0x4104eefa
flags         0x00000000
change        member 3 joined 1 remove 0 failed 0 seq 3,3
members       1 2 3

So it looks like cman/dlm are operating properly; however, clvmd hangs and
never exits, so pacemaker never starts on the third node, which is left in
the "pending" state while clvmd is hung:

[root@test02 ~]# crm_mon -Afr -1
Last updated: Mon Feb 17 15:52:28 2014
Last change: Mon Feb 17 15:43:16 2014 via cibadmin on test01
Stack: cman
Current DC: test02 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
3 Nodes configured
15 Resources configured

Node test03: pending
Online: [ test01 test02 ]

Full list of resources:

fence_test01   (stonith:fence_vmware_soap):   Started test01
fence_test02   (stonith:fence_vmware_soap):   Started test02
fence_test03   (stonith:fence_vmware_soap):   Started test01
Clone Set: fs_cdr-clone [fs_cdr]
     Started: [ test01 test02 ]
     Stopped: [ test03 ]
Resource Group: sftp01-vip
     vip-001   (ocf::heartbeat:IPaddr2):   Started test01
     vip-002   (ocf::heartbeat:IPaddr2):   Started test01
Resource Group: sftp02-vip
     vip-003   (ocf::heartbeat:IPaddr2):   Started test02
     vip-004   (ocf::heartbeat:IPaddr2):   Started test02
Resource Group: sftp03-vip
     vip-005   (ocf::heartbeat:IPaddr2):   Started test02
     vip-006   (ocf::heartbeat:IPaddr2):   Started test02
sftp01   (lsb:sftp01):   Started test01
sftp02   (lsb:sftp02):   Started test02
sftp03   (lsb:sftp03):   Started test02

Node Attributes:
* Node test01:
* Node test02:
* Node test03:

Migration summary:
* Node test03:
* Node test02:
* Node test01:
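For reference, a rough way to see where clvmd is stuck on the hung node
would be something like the following (only a sketch; the lockspace name
"clvmd" is taken from the dlm_tool output above):

# on test03, while clvmd is hung during startup
service clvmd status        # init-script view of the daemon
dlm_tool lockdebug clvmd    # lock state inside the clvmd lockspace
dlm_tool dump | tail -50    # recent dlm_controld debug messages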
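One workaround I want to try is giving clvmd a startup timeout, so that a
node which cannot re-acquire its locks fails its init script instead of
hanging forever. This assumes an RHEL 6-style init script that sources
/etc/sysconfig/clvmd (untested on this cluster so far):

# /etc/sysconfig/clvmd
# -T30: allow clvmd 30 seconds to complete startup; if it cannot finish
#       initialisation (e.g. re-acquire its locks) within that time it
#       exits with an error instead of blocking the boot sequence
CLVMDOPTS="-T30"

That should at least let the boot continue and pacemaker start on the
third node, even if the clustered volume groups still need attention
afterwards.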
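For completeness, the fence operations in these tests go through
pacemaker, so the "pending" state above can be reproduced by hand with
something like this (stonith_admin ships with pacemaker):

# ask pacemaker to reboot test03 via its configured stonith device
# (fence_test03, a fence_vmware_soap resource, per the crm_mon output)
stonith_admin --reboot test03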