On 19.08.2013 16:25, Vladislav Bogdanov wrote:
16.08.2013 16:04, Elmar Marschke wrote:
Hi all,
I'm working on a two-node pacemaker cluster with dual-primary drbd and
ocfs2.
Dual-primary drbd and ocfs2 work fine WITHOUT pacemaker (mounting,
reading, writing, everything...).
ocfs2 uses its own clustering stack by default.
When I try to make this work in pacemaker, there seems to be a problem
starting the o2cb resource.
My (already simplified) configuration is:
-----------------------------------------
node poc1 \
        attributes standby="off"
node poc2 \
        attributes standby="off"
primitive res_dlm ocf:pacemaker:controld \
        op monitor interval="120"
primitive res_drbd ocf:linbit:drbd \
        params drbd_resource="r0" \
        op stop interval="0" timeout="100" \
        op start interval="0" timeout="240" \
        op promote interval="0" timeout="90" \
        op demote interval="0" timeout="90" \
        op notify interval="0" timeout="90" \
        op monitor interval="40" role="Slave" timeout="20" \
        op monitor interval="20" role="Master" timeout="20"
primitive res_o2cb ocf:pacemaker:o2cb \
        op monitor interval="60"
ms ms_drbd res_drbd \
        meta notify="true" master-max="2" master-node-max="1" \
        target-role="Started"
property $id="cib-bootstrap-options" \
        no-quorum-policy="ignore" \
        stonith-enabled="false" \
        dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        last-lrm-refresh="1376574860"
Side note: you need to run both dlm and o2cb as clones, and group them
(either with "group" or with a pair of colocation/order statements), so
that ocfs2_controld is started only when dlm_controld is already running.
You have probably already tried that, but do not forget the last part.
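A minimal crm shell sketch of what the side note describes, assuming the primitive names res_dlm and res_o2cb from the configuration above. The group, clone and constraint names are hypothetical, and interleave="true" is a common choice for such clones rather than something stated in this thread. Use only one of the two variants, since a primitive can belong to only one group or clone.

```
# Variant 1 (hypothetical names): group dlm and o2cb, then clone the group.
# Group members start in the listed order and run on the same node,
# so res_dlm (dlm_controld) is started before res_o2cb (ocfs2_controld).
group grp_dlm_o2cb res_dlm res_o2cb
clone cl_dlm_o2cb grp_dlm_o2cb \
        meta interleave="true"

# Variant 2 (hypothetical names): clone each primitive separately and
# tie the clones together with a colocation/order pair instead of a group.
clone cl_dlm res_dlm \
        meta interleave="true"
clone cl_o2cb res_o2cb \
        meta interleave="true"
# Place each o2cb instance on a node where a dlm instance runs ...
colocation col_o2cb_with_dlm inf: cl_o2cb cl_dlm
# ... and start dlm before o2cb (stop happens in reverse order).
order ord_dlm_before_o2cb inf: cl_dlm cl_o2cb
```

The group variant is more compact; the colocation/order variant keeps the two clones as separate resources, which can make it easier to see which daemon failed in crm_mon.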
First error message in corosync.log, as far as I can identify it:
----------------------------------------------------------------
lrmd: [5547]: info: RA output: (res_dlm:probe:stderr) dlm_controld.pcmk: no process found
[ other stuff ]
lrmd: [5547]: info: RA output: (res_dlm:start:stderr) dlm_controld.pcmk: no process found
[ other stuff ]
lrmd: [5547]: info: RA output: (res_o2cb:start:stderr) 2013/08/16_13:25:20 ERROR: ocfs2_controld.pcmk did not come up
(You can find the whole corosync logfile, from starting corosync on node 1
until after the resources were started, at:
http://www.marschke.info/corosync_drei.log)
syslog shows:
-------------
ocfs2_controld.pcmk[5774]: Unable to connect to CKPT: Object does not exist
How exactly did you start the corosync process? As "corosync" or as "openais"?
The background is that the CKPT service is not loaded by corosync by
default, only when it is started by the openais script; you may want to
look at that script for details.
Hello Vladislav,
thanks for this information. I started it as "corosync2". Just out of
interest, do you know what "CKPT" means? Anyway, I currently think this
log message is no longer that relevant, because my cluster is now running
fine (apart from another "little" issue, but that may be more related
to the virtual machine I'm currently running as a resource on the
cluster; I have to research that further...).
regards
e.
Output of crm_mon:
------------------
============
Stack: openais
Current DC: poc1 - partition WITHOUT quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
4 Resources configured.
============
Online: [ poc1 ]
OFFLINE: [ poc2 ]
Master/Slave Set: ms_drbd [res_drbd]
Masters: [ poc1 ]
Stopped: [ res_drbd:1 ]
res_dlm (ocf::pacemaker:controld): Started poc1
Migration summary:
* Node poc1:
res_o2cb: migration-threshold=1000000 fail-count=1000000
Failed actions:
res_o2cb_start_0 (node=poc1, call=6, rc=1, status=complete): unknown error
---------------------------------------------------------------------
This is the situation after a reboot of node poc1. To keep things simple I
left pacemaker/corosync stopped on the second node, and I had already
removed the group and clone resources that dlm and o2cb had previously
been in (the errors occurred with those as well).
Is my configuration of the resource agents correct?
I checked using "ra meta ...", and as far as I can tell everything is OK.
Is some piece of software missing?
dlm-pcmk is installed, and ocfs2_controld.pcmk and dlm_controld.pcmk are
available; I even created additional links in /usr/sbin:
root@poc1:~# which ocfs2_controld.pcmk
/usr/sbin/ocfs2_controld.pcmk
root@poc1:~# which dlm_controld.pcmk
/usr/sbin/dlm_controld.pcmk
root@poc1:~#
I already googled but couldn't find anything useful. Thanks for any hints... :)
kind regards
elmar
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org