> -----Original Message-----
> From: Elmar Marschke [mailto:elmar.marsc...@schenker.at]
> Sent: Friday, August 16, 2013 9:05 AM
> To: The Pacemaker cluster resource manager
> Subject: [Pacemaker] Dual primary drbd + ocfs2: problems starting o2cb
>
> Hi all,
>
> i'm working on a two node pacemaker cluster with dual primary drbd and
> ocfs2.
>
> Dual pri drbd and ocfs2 WITHOUT pacemaker work fine (mounting, reading,
> writing, everything...).
>
> When i try to make this work in pacemaker, there seems to be a problem to
> start the o2cb resource.
>
> My (already simplified) configuration is:
> -----------------------------------------
> node poc1 \
>     attributes standby="off"
> node poc2 \
>     attributes standby="off"
> primitive res_dlm ocf:pacemaker:controld \
>     op monitor interval="120"
> primitive res_drbd ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op stop interval="0" timeout="100" \
>     op start interval="0" timeout="240" \
>     op promote interval="0" timeout="90" \
>     op demote interval="0" timeout="90" \
>     op notifiy interval="0" timeout="90" \
>     op monitor interval="40" role="Slave" timeout="20" \
>     op monitor interval="20" role="Master" timeout="20"
> primitive res_o2cb ocf:pacemaker:o2cb \
>     op monitor interval="60"
> ms ms_drbd res_drbd \
>     meta notify="true" master-max="2" master-node-max="1" target-role="Started"
> property $id="cib-bootstrap-options" \
>     no-quorum-policy="ignore" \
>     stonith-enabled="false" \
>     dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     last-lrm-refresh="1376574860"
>
Looks like you are missing ordering, colocation, and clone statements (or a group, to make the config shorter: a group is order and colocation in one statement). The resources *must* start in a particular order, they must run on the same node, and there must be an instance of each resource on each node.

More here for DRBD 8.4:
http://www.drbd.org/users-guide/s-ocfs2-pacemaker.html
Or DRBD 8.3:
http://www.drbd.org/users-guide-8.3/s-ocfs2-pacemaker.html

Basically add (the constraints reference the master/slave set ms_drbd, not the primitive inside it):

group grp_dlm_o2cb res_dlm res_o2cb
clone cl_dlm_o2cb grp_dlm_o2cb meta interleave=true
order ord_drbd_then_dlm_o2cb inf: ms_drbd:promote cl_dlm_o2cb:start
colocation col_dlm_o2cb_with_drbdmaster inf: cl_dlm_o2cb ms_drbd:Master

HTH

Jake

> First error message in corosync.log as far as i can identify it:
> ----------------------------------------------------------------
> lrmd: [5547]: info: RA output: (res_dlm:probe:stderr) dlm_controld.pcmk:
> no process found
> [ other stuff ]
> lrmd: [5547]: info: RA output: (res_dlm:start:stderr) dlm_controld.pcmk:
> no process found
> [ other stuff ]
> lrmd: [5547]: info: RA output: (res_o2cb:start:stderr)
> 2013/08/16_13:25:20 ERROR: ocfs2_controld.pcmk did not come up
>
> (
> You can find the whole corosync logfile (starting corosync on node 1 from
> beginning until after starting of resources) on:
> http://www.marschke.info/corosync_drei.log
> )
>
> syslog shows:
> -------------
> ocfs2_controld.pcmk[5774]: Unable to connect to CKPT: Object does not
> exist
>
>
> Output of crm_mon:
> ------------------
> ============
> Stack: openais
> Current DC: poc1 - partition WITHOUT quorum
> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
>
> Online: [ poc1 ]
> OFFLINE: [ poc2 ]
>
> Master/Slave Set: ms_drbd [res_drbd]
>     Masters: [ poc1 ]
>     Stopped: [ res_drbd:1 ]
> res_dlm (ocf::pacemaker:controld): Started poc1
>
> Migration summary:
> * Node poc1:
>     res_o2cb: migration-threshold=1000000 fail-count=1000000
>
> Failed actions:
>     res_o2cb_start_0 (node=poc1, call=6, rc=1, status=complete):
>     unknown error
>
> ---------------------------------------------------------------------
> This is the situation after a reboot of node poc1. For simplification i left
> pacemaker / corosync unstarted on the second node, and already removed a
> group and a clone resource where dlm and o2cb already had been in (errors
> were there also).
>
> Is my configuration of the resource agents correct?
> I checked using "ra meta ...", but as far as i recognized everything is ok.
>
> Is some piece of software missing?
> dlm-pcmk is installed, ocfs2_controld.pcmk and dlm_controld.pcmk are
> available, i even did additional links in /usr/sbin:
> root@poc1:~# which ocfs2_controld.pcmk
> /usr/sbin/ocfs2_controld.pcmk
> root@poc1:~# which dlm_controld.pcmk
> /usr/sbin/dlm_controld.pcmk
> root@poc1:~#
>
> I already googled but couldn't find any useful. Thanks for any hints...:)
>
> kind regards
> elmar
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org