All,

I am running a two-node web cluster on OCFS2 (v1.5.0) via DRBD Primary/Primary (v8.3.8) and Pacemaker. Everything seems to be working great, except during testing of hard-boot scenarios.
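For context, a rough sketch of the baseline I check on both nodes before each hard-boot test (standard DRBD/Pacemaker/OCFS2 tools only; the resource name repdata and device /dev/drbd0 are from the configs below):

    # healthy baseline before pulling the plug (sketch)
    cat /proc/drbd        # expect: Connected Primary/Primary UpToDate/UpToDate
    crm_mon -1            # expect: msDRBD Masters and all clones started on both nodes
    mounted.ocfs2 -d      # expect: /dev/drbd0 detected as ocfs2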
Whenever I hard-boot one of the nodes, the other node is successfully fenced and marked "Outdated":

  <resource minor="0" cs="WFConnection" ro1="Primary" ro2="Unknown" ds1="UpToDate" ds2="Outdated" />

However, this locks up I/O on the still-active node and prevents any operations within the cluster :( I have even forced DRBD into StandAlone mode while in this state, but that does not resolve the I/O lock either. Does anyone know whether this is even possible with OCFS2, i.e. maintaining an active cluster in Primary/Unknown once the other node has failed (whether the failure is forced, controlled, etc.)?

I have been focusing on the DRBD config, but I am starting to wonder whether it is something in my Pacemaker or OCFS2 setup that is forcing this I/O lock during a failure. Any thoughts?

-----------------------------
crm_mon (crm_mon 1.0.9 for OpenAIS and Heartbeat):

> ============
> Last updated: Mon Apr 4 12:57:47 2011
> Stack: openais
> Current DC: ubu10a - partition with quorum
> Version: 1.0.9-unknown
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
>
> Online: [ ubu10a ubu10b ]
>
> Master/Slave Set: msDRBD
>     Masters: [ ubu10a ubu10b ]
> Clone Set: cloneDLM
>     Started: [ ubu10a ubu10b ]
> Clone Set: cloneO2CB
>     Started: [ ubu10a ubu10b ]
> Clone Set: cloneFS
>     Started: [ ubu10a ubu10b ]

-----------------------------
DRBD (v8.3.8):

> version: 8.3.8 (api:88/proto:86-94)
> 0:repdata Connected Primary/Primary UpToDate/UpToDate C /data ocfs2

-----------------------------
DRBD Conf:

> global {
>     usage-count no;
> }
> common {
>     syncer { rate 10M; }
> }
> resource repdata {
>     protocol C;
>
>     meta-disk internal;
>     device /dev/drbd0;
>     disk /dev/sda3;
>
>     handlers {
>         pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
>         pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
>         local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
>         split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>         fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>         after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>     }
>     startup {
>         degr-wfc-timeout 120;   # 120 = 2 minutes
>         wfc-timeout 30;
>         become-primary-on both;
>     }
>     disk {
>         fencing resource-only;
>     }
>     syncer {
>         rate 10M;
>         al-extents 257;
>     }
>     net {
>         cram-hmac-alg "sha1";
>         shared-secret "XXXXXXX";
>         allow-two-primaries;
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri disconnect;
>     }
>     on ubu10a {
>         address 192.168.0.66:7788;
>     }
>     on ubu10b {
>         address 192.168.0.67:7788;
>     }
> }

-----------------------------
CIB.xml

> node ubu10a \
>     attributes standby="off"
> node ubu10b \
>     attributes standby="off"
> primitive resDLM ocf:pacemaker:controld \
>     op monitor interval="120s"
> primitive resDRBD ocf:linbit:drbd \
>     params drbd_resource="repdata" \
>     operations $id="resDRBD-operations" \
>     op monitor interval="20s" role="Master" timeout="120s" \
>     op monitor interval="30s" role="Slave" timeout="120s"
> primitive resFS ocf:heartbeat:Filesystem \
>     params device="/dev/drbd/by-res/repdata" directory="/data" fstype="ocfs2" \
>     op monitor interval="120s"
> primitive resO2CB ocf:pacemaker:o2cb \
>     op monitor interval="120s"
> ms msDRBD resDRBD \
>     meta resource-stickines="100" notify="true" master-max="2" interleave="true"
> clone cloneDLM resDLM \
>     meta globally-unique="false" interleave="true"
> clone cloneFS resFS \
>     meta interleave="true" ordered="true"
> clone cloneO2CB resO2CB \
>     meta globally-unique="false" interleave="true"
> colocation colDLMDRBD inf: cloneDLM msDRBD:Master
> colocation colFSO2CB inf: cloneFS cloneO2CB
> colocation colO2CBDLM inf: cloneO2CB cloneDLM
> order ordDLMO2CB 0: cloneDLM cloneO2CB
> order ordDRBDDLM 0: msDRBD:promote cloneDLM
> order ordO2CBFS 0: cloneO2CB cloneFS
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.9-unknown" \
>     cluster-infrastructure="openais" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore" \
>     expected-quorum-votes="2"

-----------------------------
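If I understand the fence-peer handler correctly, crm-fence-peer.sh should be dropping a location constraint into the CIB when the peer is lost, restricting the Master role to the surviving node. This is roughly what I would expect to find while the cluster is in the locked state (the constraint and rule names below are illustrative, not copied from my CIB):

    # check for the fencing constraint placed by crm-fence-peer.sh (sketch)
    crm configure show | grep -A2 drbd-fence
    # expected shape, assuming ubu10a survived:
    #   location drbd-fence-by-handler-msDRBD msDRBD \
    #       rule $role="Master" -inf: #uname ne ubu10a

crm-unfence-peer.sh should remove that constraint again after the resync; the part I cannot explain is the OCFS2/DLM I/O lock on the surviving node in the meantime.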