> -----Original Message-----
> From: Elmar Marschke [mailto:elmar.marsc...@schenker.at]
> Sent: Friday, August 16, 2013 10:31 PM
> To: pacemaker@oss.clusterlabs.org
> Subject: Re: [Pacemaker] Dual primary drbd + ocfs2: problems starting o2cb
>
> On 16.08.2013 15:46, Jake Smith wrote:
> >> -----Original Message-----
> >> From: Elmar Marschke [mailto:elmar.marsc...@schenker.at]
> >> Sent: Friday, August 16, 2013 9:05 AM
> >> To: The Pacemaker cluster resource manager
> >> Subject: [Pacemaker] Dual primary drbd + ocfs2: problems starting o2cb
> >>
> >> Hi all,
> >>
> >> I'm working on a two-node pacemaker cluster with dual-primary drbd and ocfs2.
> >>
> >> Dual-primary drbd and ocfs2 WITHOUT pacemaker work fine (mounting, reading,
> >> writing, everything...).
> >>
> >> When I try to make this work in pacemaker, there seems to be a problem
> >> starting the o2cb resource.
> >>
> >> My (already simplified) configuration is:
> >> -----------------------------------------
> >> node poc1 \
> >>         attributes standby="off"
> >> node poc2 \
> >>         attributes standby="off"
> >> primitive res_dlm ocf:pacemaker:controld \
> >>         op monitor interval="120"
> >> primitive res_drbd ocf:linbit:drbd \
> >>         params drbd_resource="r0" \
> >>         op stop interval="0" timeout="100" \
> >>         op start interval="0" timeout="240" \
> >>         op promote interval="0" timeout="90" \
> >>         op demote interval="0" timeout="90" \
> >>         op notify interval="0" timeout="90" \
> >>         op monitor interval="40" role="Slave" timeout="20" \
> >>         op monitor interval="20" role="Master" timeout="20"
> >> primitive res_o2cb ocf:pacemaker:o2cb \
> >>         op monitor interval="60"
> >> ms ms_drbd res_drbd \
> >>         meta notify="true" master-max="2" master-node-max="1" target-role="Started"
> >> property $id="cib-bootstrap-options" \
> >>         no-quorum-policy="ignore" \
> >>         stonith-enabled="false" \
> >>         dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
> >>         cluster-infrastructure="openais" \
> >>         expected-quorum-votes="2" \
> >>         last-lrm-refresh="1376574860"
> >>
> >
> > Looks like you are missing ordering, colocation, and clone statements (or even a
> > group, to make the config shorter; a group is ordering and colocation in one
> > statement). The resources *must* start in a particular order, they must run on
> > the same node, and there must be an instance of each resource on each node.
> >
> > More here for DRBD 8.4:
> > http://www.drbd.org/users-guide/s-ocfs2-pacemaker.html
> > Or DRBD 8.3:
> > http://www.drbd.org/users-guide-8.3/s-ocfs2-pacemaker.html
> >
> > Basically add:
> > group grp_dlm_o2cb res_dlm res_o2cb
> > clone cl_dlm_o2cb grp_dlm_o2cb meta interleave=true
> > order ord_drbd_then_dlm_o2cb res_drbd:promote cl_dlm_o2cb:start
> > colocation col_dlm_o2cb_with_drbdmaster cl_dlm_o2cb res_drbd:Master
> >
> > HTH
> >
> > Jake
>
> Hello Jake,
>
> thanks for your reply. I already had res_dlm and res_o2cb grouped together and
> cloned as in your advice; indeed that was my initial configuration. But the problem
> showed up there as well, so I tried to simplify the configuration to reduce possible
> error sources.
>
> But now it seems I have found a solution, or at least a workaround: I simply use
> the LSB resource agent lsb:o2cb. This one works! The resource starts without a
> problem on both nodes, and as far as I can see right now everything is fine (tried
> with and without the additional group and clone resources).
>
> Don't know if this will bring some drawbacks in the future, but for the moment my
> problem seems to be solved.

Not sure either - usually OCF resource agents are more robust than plain LSB init
scripts. I would also verify that the o2cb init script is fully LSB compliant, or your
cluster will have issues down the road.
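
A rough way to check that is the init script exercise from the Pacemaker
documentation - something along these lines (assuming the script lives at
/etc/init.d/o2cb on your distro; adjust the path if it doesn't):

    # each echo should print the return code noted in the comment
    /etc/init.d/o2cb start  ; echo "rc=$?"   # expect 0
    /etc/init.d/o2cb status ; echo "rc=$?"   # expect 0 while it is running
    /etc/init.d/o2cb start  ; echo "rc=$?"   # expect 0 even though it is already running
    /etc/init.d/o2cb stop   ; echo "rc=$?"   # expect 0
    /etc/init.d/o2cb status ; echo "rc=$?"   # expect 3 once it is stopped
    /etc/init.d/o2cb stop   ; echo "rc=$?"   # expect 0 even though it is already stopped

If any of those return codes are off, Pacemaker will sooner or later misjudge the
state of the resource.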
> Currently it seems to me that there's a subtle problem with the ocf:pacemaker:o2cb
> resource agent; at least on my system.

Maybe, maybe not - if you look at the o2cb resource agent, the error message you were
getting is raised after it has tried for 10 seconds to start
/usr/sbin/ocfs2_controld.pcmk without success. I would time how long o2cb takes to
come up by hand; it might be as simple as allowing the daemon more time to start.
I haven't set up ocfs2 in a while, but I believe you can extend that timeout through
the primitive's configuration without having to muck with the resource agent itself.
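
Something like the following might be enough - untested, and the daemon_timeout
parameter name (default 10 seconds, if I remember right) is from memory, so run
"crm ra meta ocf:pacemaker:o2cb" first to confirm it exists; the values 30 and 90
are just examples:

    primitive res_o2cb ocf:pacemaker:o2cb \
            params daemon_timeout="30" \
            op start interval="0" timeout="90" \
            op monitor interval="60"

The op start timeout only tells Pacemaker how long to wait for the whole start
action; it is the agent parameter that governs the internal wait for
ocfs2_controld.pcmk.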
Jake

> Anyway, thanks a lot for your answer..!
> Best regards
> elmar
>
> >> First error message in corosync.log, as far as I can identify it:
> >> ----------------------------------------------------------------
> >> lrmd: [5547]: info: RA output: (res_dlm:probe:stderr) dlm_controld.pcmk: no process found
> >> [ other stuff ]
> >> lrmd: [5547]: info: RA output: (res_dlm:start:stderr) dlm_controld.pcmk: no process found
> >> [ other stuff ]
> >> lrmd: [5547]: info: RA output: (res_o2cb:start:stderr) 2013/08/16_13:25:20 ERROR: ocfs2_controld.pcmk did not come up
> >>
> >> (
> >> You can find the whole corosync logfile (from starting corosync on node 1
> >> until after the resources have started) at:
> >> http://www.marschke.info/corosync_drei.log
> >> )
> >>
> >> syslog shows:
> >> -------------
> >> ocfs2_controld.pcmk[5774]: Unable to connect to CKPT: Object does not exist
> >>
> >> Output of crm_mon:
> >> ------------------
> >> ============
> >> Stack: openais
> >> Current DC: poc1 - partition WITHOUT quorum
> >> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> >> 2 Nodes configured, 2 expected votes
> >> 4 Resources configured.
> >> ============
> >>
> >> Online: [ poc1 ]
> >> OFFLINE: [ poc2 ]
> >>
> >> Master/Slave Set: ms_drbd [res_drbd]
> >>     Masters: [ poc1 ]
> >>     Stopped: [ res_drbd:1 ]
> >> res_dlm (ocf::pacemaker:controld): Started poc1
> >>
> >> Migration summary:
> >> * Node poc1:
> >>    res_o2cb: migration-threshold=1000000 fail-count=1000000
> >>
> >> Failed actions:
> >>     res_o2cb_start_0 (node=poc1, call=6, rc=1, status=complete): unknown error
> >>
> >> ---------------------------------------------------------------------
> >> This is the situation after a reboot of node poc1. For simplification I left
> >> pacemaker / corosync unstarted on the second node, and I had already removed
> >> the group and clone resources that dlm and o2cb had been in (the errors were
> >> there also).
> >>
> >> Is my configuration of the resource agents correct?
> >> I checked using "ra meta ...", but as far as I can tell everything is ok.
> >>
> >> Is some piece of software missing?
> >> dlm-pcmk is installed, ocfs2_controld.pcmk and dlm_controld.pcmk are available,
> >> and I even added links in /usr/sbin:
> >> root@poc1:~# which ocfs2_controld.pcmk
> >> /usr/sbin/ocfs2_controld.pcmk
> >> root@poc1:~# which dlm_controld.pcmk
> >> /usr/sbin/dlm_controld.pcmk
> >> root@poc1:~#
> >>
> >> I already googled but couldn't find anything useful. Thanks for any hints... :)
> >>
> >> kind regards
> >> elmar

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org