Hi Darren,

On Thu, Sep 29, 2011 at 02:15:34PM +0100, darren.mans...@opengi.co.uk wrote:
> (Originally sent to DRBD-user, reposted here as it may be more relevant)
>
> Hello all.
>
> I'm implementing a 2-node cluster using Corosync/Pacemaker/DRBD/OCFS2
> for dual-primary shared FS.
>
> I've followed the instructions on the DRBD applications site and it
> works really well.
>
> However, if I 'pull the plug' on a node, the other node continues to
> operate the clones, but the filesystem is locked and inaccessible (the
> monitor op works for the filesystem, but fails for the OCFS2 resource).
>
> If I reboot one node, there are no problems and I can continue to
> access the OCFS2 FS.
>
> After I pull the plug:
>
> Online: [ test-odp-02 ]
> OFFLINE: [ test-odp-01 ]
>
>  Resource Group: Load-Balancing
>      Virtual-IP-ODP (ocf::heartbeat:IPaddr2): Started test-odp-02
>      Virtual-IP-ODPWS (ocf::heartbeat:IPaddr2): Started test-odp-02
>      ldirectord (ocf::heartbeat:ldirectord): Started test-odp-02
>  Master/Slave Set: ms_drbd_ocfs2 [p_drbd_ocfs2]
>      Masters: [ test-odp-02 ]
>      Stopped: [ p_drbd_ocfs2:1 ]
>  Clone Set: cl-odp [odp]
>      Started: [ test-odp-02 ]
>      Stopped: [ odp:1 ]
>  Clone Set: cl-odpws [odpws]
>      Started: [ test-odp-02 ]
>      Stopped: [ odpws:1 ]
>  Clone Set: cl_fs_ocfs2 [p_fs_ocfs2]
>      Started: [ test-odp-02 ]
>      Stopped: [ p_fs_ocfs2:1 ]
>  Clone Set: cl_ocfs2mgmt [g_ocfs2mgmt]
>      Started: [ test-odp-02 ]
>      Stopped: [ g_ocfs2mgmt:1 ]
>
> Failed actions:
>     p_o2cb:0_monitor_10000 (node=test-odp-02, call=19, rc=-2, status=Timed Out): unknown exec error
>
> test-odp-02:~ # mount
> /dev/drbd0 on /opt/odp type ocfs2 (rw,_netdev,noatime,cluster_stack=pcmk)
>
> test-odp-02:~ # ls /opt/odp
> ...just hangs forever...
>
> If I then power test-odp-01 back on, everything fails back fine and the
> ls command suddenly completes.
>
> It seems to me that OCFS2 is trying to talk to the node that has
> disappeared and doesn't time out. Does anyone have any ideas? (attached
> CRM and DRBD configs)

Your configuration has stonith-enabled="false". With stonith disabled,
I doubt that your cluster can behave as it should: a dual-primary
DRBD/OCFS2 setup depends on fencing to recover from the loss of a node.

Thanks,

Dejan

> Many thanks.
>
> Darren Mansell

> Content-Description: crm.txt
> node test-odp-01
> node test-odp-02 \
>         attributes standby="off"
> primitive Virtual-IP-ODP ocf:heartbeat:IPaddr2 \
>         params lvs_support="true" ip="2.21.15.100" cidr_netmask="8" broadcast="2.255.255.255" \
>         op monitor interval="1m" timeout="10s" \
>         meta migration-threshold="10" failure-timeout="600"
> primitive Virtual-IP-ODPWS ocf:heartbeat:IPaddr2 \
>         params lvs_support="true" ip="2.21.15.103" cidr_netmask="8" broadcast="2.255.255.255" \
>         op monitor interval="1m" timeout="10s" \
>         meta migration-threshold="10" failure-timeout="600"
> primitive ldirectord ocf:heartbeat:ldirectord \
>         params configfile="/etc/ha.d/ldirectord.cf" \
>         op monitor interval="2m" timeout="20s" \
>         meta migration-threshold="10" failure-timeout="600"
> primitive odp lsb:odp \
>         op monitor interval="10s" enabled="true" timeout="10s" \
>         meta migration-threshold="10" failure-timeout="600"
> primitive odpwebservice lsb:odpws \
>         op monitor interval="10s" enabled="true" timeout="10s" \
>         meta migration-threshold="10" failure-timeout="600"
> primitive p_controld ocf:pacemaker:controld \
>         op monitor interval="10s" enabled="true" timeout="10s" \
>         meta migration-threshold="10" failure-timeout="600"
> primitive p_drbd_ocfs2 ocf:linbit:drbd \
>         params drbd_resource="r0" \
>         op monitor interval="10s" enabled="true" timeout="10s" \
>         meta migration-threshold="10" failure-timeout="600"
> primitive p_fs_ocfs2 ocf:heartbeat:Filesystem \
>         params device="/dev/drbd/by-res/r0" directory="/opt/odp" fstype="ocfs2" options="rw,noatime" \
>         op monitor interval="10s" enabled="true" timeout="10s" \
>         meta migration-threshold="10" failure-timeout="600"
> primitive p_o2cb ocf:ocfs2:o2cb \
>         op monitor interval="10s" enabled="true" timeout="10s" \
>         meta migration-threshold="10" failure-timeout="600"
> group Load-Balancing Virtual-IP-ODP Virtual-IP-ODPWS ldirectord
> group g_ocfs2mgmt p_controld p_o2cb
> ms ms_drbd_ocfs2 p_drbd_ocfs2 \
>         meta master-max="2" clone-max="2" notify="true"
> clone cl-odp odp
> clone cl-odpws odpws
> clone cl_fs_ocfs2 p_fs_ocfs2 \
>         meta target-role="Started"
> clone cl_ocfs2mgmt g_ocfs2mgmt \
>         meta interleave="true"
> location Prefer-Node1 ldirectord \
>         rule $id="prefer-node1-rule" 100: #uname eq test-odp-01
> order o_ocfs2 inf: ms_drbd_ocfs2:promote cl_ocfs2mgmt:start cl_fs_ocfs2:start
> order tomcatlast1 inf: cl_fs_ocfs2 cl-odp
> order tomcatlast2 inf: cl_fs_ocfs2 cl-odpws
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.5-5bd2b9154d7d9f86d7f56fe0a74072a5a6590c60" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         no-quorum-policy="ignore" \
>         start-failure-is-fatal="false" \
>         stonith-action="reboot" \
>         stonith-enabled="false" \
>         last-lrm-refresh="1317207361"
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
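
To make the stonith point concrete, here is a rough sketch of what enabling
fencing could look like in the crm configuration quoted above. It is only an
illustration under assumptions: the thread does not say which fencing hardware
is available, so the external/ipmi plugin, the resource and constraint names
(st-test-odp-01, l-st-test-odp-01, and so on), the IP addresses, and the
credentials are all hypothetical placeholders. Only the node names and the
stonith-enabled property come from the posted configuration.

```
# Hypothetical fencing devices, one per node. The plugin choice
# (external/ipmi) and every parameter value are placeholders.
primitive st-test-odp-01 stonith:external/ipmi \
        params hostname="test-odp-01" ipaddr="192.0.2.11" userid="admin" passwd="changeme" interface="lan" \
        op monitor interval="60m" timeout="120s"
primitive st-test-odp-02 stonith:external/ipmi \
        params hostname="test-odp-02" ipaddr="192.0.2.12" userid="admin" passwd="changeme" interface="lan" \
        op monitor interval="60m" timeout="120s"
# Keep each device off the node it is meant to fence.
location l-st-test-odp-01 st-test-odp-01 -inf: test-odp-01
location l-st-test-odp-02 st-test-odp-02 -inf: test-odp-02
# Replaces stonith-enabled="false" from the posted configuration.
property stonith-enabled="true"
```

Whatever device is used, it is worth confirming that each node can actually
fence the other before repeating the pull-the-plug test.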