Thank you, Sunil. I am not familiar with qdisk, so I will definitely look into it. I will also try posting this to the mailing list you recommended, in the hope that someone may have an alternative suggestion.
NOTE: I forgot to mention that I am using no-quorum-policy="ignore". Also, if I try to remove expected-quorum-votes="2", it seems to be added back in automatically... Here is the latest CIB file I have been working with; two fencing-related configuration sketches I am considering follow the quoted thread at the end of this message:

> node ubu10a
> node ubu10b
> primitive resDLM ocf:pacemaker:controld \
>         op monitor interval="120s"
> primitive resDRBD ocf:linbit:drbd \
>         params drbd_resource="repdata" \
>         operations $id="resDRBD-operations" \
>         op monitor interval="30s" role="Master" timeout="120s" \
>         op monitor interval="30s" role="Master" timeout="120s"
> primitive resFS ocf:heartbeat:Filesystem \
>         params device="/dev/drbd/by-res/repdata" directory="/data" fstype="ocfs2" \
>         op monitor interval="120s"
> primitive resO2CB ocf:pacemaker:o2cb \
>         op monitor interval="120s"
> ms msDRBD resDRBD \
>         meta resource-stickines="100" notify="true" master-max="2" interleave="true"
> clone cloneDLM resDLM \
>         meta globally-unique="false" interleave="true"
> clone cloneFS resFS \
>         meta interleave="true" ordered="true"
> clone cloneO2CB resO2CB \
>         meta globally-unique="false" interleave="true"
> colocation colDLMDRBD inf: cloneDLM msDRBD:Master
> colocation colFSO2CB inf: cloneFS cloneO2CB
> colocation colO2CBDLM inf: cloneO2CB cloneDLM
> order ordDLMO2CB 0: cloneDLM cloneO2CB
> order ordDRBDDLM 0: msDRBD:promote cloneDLM
> order ordO2CBFS 0: cloneO2CB cloneFS
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.9-unknown" \
>         cluster-infrastructure="openais" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         expected-quorum-votes="2"

From: Sunil Mushran <sunil.mush...@oracle.com>
Date: Fri, 01 Apr 2011 12:01:43 -0700
To: Mike Reid <mbr...@thepei.com>
Cc: <ocfs2-users@oss.oracle.com>
Subject: Re: [Ocfs2-users] Node Recovery locks I/O in two-node OCFS2 cluster (DRBD 8.3.8 / Ubuntu 10.10)

I believe this is a pacemaker issue. There was a time it required a qdisk to continue working as a single node in a 2-node cluster when one node died. If the pacemaker people don't jump in, you may want to try your luck on the linux-cluster mailing list.

On 04/01/2011 11:44 AM, Mike Reid wrote:
> Node Recovery locks I/O in two-node OCFS2 cluster (DRBD 8.3.8 / Ubuntu 10.10)
>
> I am running a two-node web cluster on OCFS2 via DRBD Primary/Primary (v8.3.8)
> and Pacemaker. Everything seems to be working great, except during testing of
> hard-boot scenarios.
>
> Whenever I hard-boot one of the nodes, the other node is successfully fenced
> and marked "Outdated":
>
>     <resource minor="0" cs="WFConnection" ro1="Primary" ro2="Unknown"
>      ds1="UpToDate" ds2="Outdated" />
>
> However, this locks up I/O on the still-active node and prevents any
> operations within the cluster :(
> I have even forced DRBD into StandAlone mode while in this state, but that
> does not resolve the I/O lock either.
>
>     <resource minor="0" cs="StandAlone" ro1="Primary" ro2="Unknown"
>      ds1="UpToDate" ds2="Outdated" />
>
> The only way I've been able to successfully regain I/O within the cluster is
> to bring the other node back up. While monitoring the logs, it seems that it
> is OCFS2 that's establishing the lock/unlock and not DRBD at all.
>
>> Apr 1 12:07:19 ubu10a kernel: [ 1352.739777] (ocfs2rec,3643,0):ocfs2_replay_journal:1605 Recovering node 1124116672 from slot 1 on device (147,0)
>> Apr 1 12:07:19 ubu10a kernel: [ 1352.900874] (ocfs2rec,3643,0):ocfs2_begin_quota_recovery:407 Beginning quota recovery in slot 1
>> Apr 1 12:07:19 ubu10a kernel: [ 1352.902509] (ocfs2_wq,1213,0):ocfs2_finish_quota_recovery:598 Finishing quota recovery in slot 1
>>
>> Apr 1 12:07:20 ubu10a kernel: [ 1354.423915] block drbd0: Handshake successful: Agreed network protocol version 94
>> Apr 1 12:07:20 ubu10a kernel: [ 1354.433074] block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
>> Apr 1 12:07:20 ubu10a kernel: [ 1354.433083] block drbd0: conn( WFConnection -> WFReportParams )
>> Apr 1 12:07:20 ubu10a kernel: [ 1354.433097] block drbd0: Starting asender thread (from drbd0_receiver [2145])
>> Apr 1 12:07:20 ubu10a kernel: [ 1354.433562] block drbd0: data-integrity-alg: <not-used>
>> Apr 1 12:07:20 ubu10a kernel: [ 1354.434090] block drbd0: drbd_sync_handshake:
>> Apr 1 12:07:20 ubu10a kernel: [ 1354.434094] block drbd0: self FBA98A2F89E05B83:EE17466F4DEC2F8B:6A4CD8FDD0562FA1:EC7831379B78B997 bits:4 flags:0
>> Apr 1 12:07:20 ubu10a kernel: [ 1354.434097] block drbd0: peer EE17466F4DEC2F8A:0000000000000000:6A4CD8FDD0562FA0:EC7831379B78B997 bits:2048 flags:2
>> Apr 1 12:07:20 ubu10a kernel: [ 1354.434099] block drbd0: uuid_compare()=1 by rule 70
>> Apr 1 12:07:20 ubu10a kernel: [ 1354.434104] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
>> Apr 1 12:07:21 ubu10a kernel: [ 1354.601353] block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Outdated -> Inconsistent )
>> Apr 1 12:07:21 ubu10a kernel: [ 1354.601367] block drbd0: Began resync as SyncSource (will sync 8192 KB [2048 bits set]).
>> Apr 1 12:07:21 ubu10a kernel: [ 1355.401912] block drbd0: Resync done (total 1 sec; paused 0 sec; 8192 K/sec)
>> Apr 1 12:07:21 ubu10a kernel: [ 1355.401923] block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
>> Apr 1 12:07:22 ubu10a kernel: [ 1355.612601] block drbd0: peer( Secondary -> Primary )
>
> Therefore, my question is: is there an option in OCFS2 to remove or prevent
> this lock, especially since it's inside a DRBD configuration? I'm still new to
> OCFS2, so I am definitely open to any criticism regarding my setup/approach,
> or any recommendations related to keeping the cluster active when another node
> is shut down during testing.
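Here are the two sketches mentioned in my note above. First, the DRBD resource-level fencing I am considering, based on my reading of the DRBD 8.3 documentation on Pacemaker integration. Nobody in this thread has suggested this exact change; the file path (/etc/drbd.d/repdata.res) and the handler scripts (/usr/lib/drbd/crm-fence-peer.sh and crm-unfence-peer.sh, as shipped by drbd8-utils on Ubuntu) are my assumptions and may differ on other setups:

# Sketch for /etc/drbd.d/repdata.res -- to be merged into the existing
# resource definition; "repdata" matches drbd_resource in the CIB above.
resource repdata {
  disk {
    # Suspend I/O and call the fence-peer handler when the peer is lost,
    # rather than carrying on as if nothing happened.
    fencing resource-and-stonith;
  }
  handlers {
    # Handlers shipped with DRBD 8.3 for Pacemaker: they add (and later
    # remove) a constraint that keeps the outdated peer from being promoted.
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}

As I understand it, with resource-and-stonith DRBD freezes I/O until the handler reports back, so this only helps if Pacemaker can actually fence the dead node, which is what the second sketch is about.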
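Second: since my CIB above has stonith-enabled="false", and my understanding is that the Pacemaker-managed DLM blocks cluster I/O until a failed node has actually been fenced, I am looking at defining real STONITH resources before turning fencing back on. Everything below is a sketch of my own, not something from this thread: stonith:external/ipmi is just one possible agent, and the resource names, IP addresses, and credentials are placeholders for whatever power-fencing hardware is really available.

# crm configure sketch -- placeholder IPMI details, adjust to the actual
# fencing hardware (external/ipmi is only one of several stonith agents).
primitive resSTONITHa stonith:external/ipmi \
        params hostname="ubu10a" ipaddr="192.168.1.101" userid="admin" passwd="secret" \
        op monitor interval="60s"
primitive resSTONITHb stonith:external/ipmi \
        params hostname="ubu10b" ipaddr="192.168.1.102" userid="admin" passwd="secret" \
        op monitor interval="60s"
# Keep each fencing device away from the node it is meant to fence.
location locSTONITHa resSTONITHa -inf: ubu10a
location locSTONITHb resSTONITHb -inf: ubu10b
# Only re-enable fencing once the devices are in place and tested.
property stonith-enabled="true"

I would load something like this via the crm shell and test fencing manually before repeating the hard-boot scenario.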
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users