hi!

i'm on debian lenny and trying to run ocfs2 on a dual-primary drbd
device. the drbd device is already set up as the master/slave resource
msDRBD0.

to get dlm_controld.pcmk i installed it from source (from
cluster-suite-3.0.10).
then i configured a resource "resDLM", cloned across both nodes:
  primitive resDLM ocf:pacemaker:controld op monitor interval="120s"
  clone cloneDLM resDLM meta globally-unique="false" interleave="true"
  colocation colDLM_DRBD0 inf: cloneDLM msDRBD0:Master
  order ordDRBD0_DLM inf: msDRBD0:promote cloneDLM:start
-> seems to work.
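
for what it's worth, the dlm side can also be checked by hand with
dlm_tool (assuming the dlm_tool binary from the same cluster-suite
build is installed and in the path):

```shell
# no lockspaces should exist yet, but the command itself must succeed,
# which proves dlm_controld.pcmk is up and answering
dlm_tool ls
# show the tail of dlm_controld's debug buffer for recent activity
dlm_tool dump | tail -n 20
```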


to get ocfs2_controld.pcmk i installed ocfs2-tools-1.4.3 from source.
after adding the following resources and constraints:
  primitive resO2CB ocf:pacemaker:o2cb op monitor interval="120s"
  clone cloneO2CB resO2CB meta globally-unique="false" interleave="true"
  colocation colO2CB_DLM inf: cloneO2CB cloneDLM
  order ordDLM_O2CB inf: cloneDLM cloneO2CB
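
(the o2cb agent can also be exercised outside the cluster with
ocf-tester from cluster-glue; the agent path below is my guess for
this install:)

```shell
# run the RA's mandatory actions (start/monitor/stop) by hand,
# outside pacemaker, to see the failure directly
ocf-tester -n resO2CB /usr/lib/ocf/resource.d/pacemaker/o2cb
```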

i get the following errors in crm_mon:
======================================
Failed actions:
    resO2CB:0_start_0 (node=app1b.xlhost.de, call=28, rc=1,
status=complete): unknown error
    resO2CB:0_start_0 (node=app1a.xlhost.de, call=38, rc=1,
status=complete): unknown error


the relevant syslog entries:
============================
Apr 12 13:15:18 app1a corosync[4638]:   [pcmk  ] info: pcmk_notify:
Enabling node  notifications for child 8311 (0xd83090)
Apr 12 13:15:18 app1a ocfs2_controld.pcmk: Error opening control device:
Unable to access cluster service



if i start "ocfs2_controld.pcmk -D" i get:
==========================================
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: init_ais_connection:
Creating connection to our AIS plugin
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: init_ais_connection: AIS
connection established
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: get_ais_nodeid: Server
details: id=569559765 uname=app1a.xlhost.de cname=pcmk
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node
app1a.xlhost.de now has id: 569559765
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node
569559765 is now known as app1a.xlhost.de
1271072439 setup_st...@168: Cluster connection established.  Local node
id: 569559765
1271072439 setup_st...@172: Added Pacemaker as client 1 with fd 5
1271072439 setup_c...@609: Initializing CKPT service (try 1)
1271072439 setup_c...@615: Connected to CKPT service with handle
0x327b23c600000000
1271072439 call_ckpt_o...@160: Opening checkpoint
"ocfs2:controld:21f2cad5" (try 1)
1271072439 call_ckpt_o...@170: Opened checkpoint "ocfs2:controld:21f2cad5"
with handle 0x6633487300000000
1271072439 call_section_wr...@340: Writing to section
"daemon_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1)
1271072439 call_section_cre...@292: Creating section "daemon_max_protocol"
on checkpoint "ocfs2:controld:21f2cad5" (try 1)
1271072439 call_section_cre...@300: Created section "daemon_max_protocol"
on checkpoint "ocfs2:controld:21f2cad5"
1271072439 call_section_wr...@340: Writing to section "ocfs2_max_protocol"
on checkpoint "ocfs2:controld:21f2cad5" (try 1)
1271072439 call_section_cre...@292: Creating section "ocfs2_max_protocol"
on checkpoint "ocfs2:controld:21f2cad5" (try 1)
1271072439 call_section_cre...@300: Created section "ocfs2_max_protocol"
on checkpoint "ocfs2:controld:21f2cad5"
1271072439 start_j...@588: Starting join for group "ocfs2:controld"
1271072439 start_j...@592: cpg_join succeeded
1271072439 l...@975: setup done
ocfs2_controld[18489]: 2010/04/12_13:40:39 notice: ais_dispatch:
Membership 156: quorum acquired
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_update_peer: Node
app1a.xlhost.de: id=569559765 state=member (new) addr=r(0)
ip(213.202.242.161)  (new) votes=1 (new) born=156 seen=156
proc=00000000000000000000000000013312 (new)
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node
app1b.xlhost.de now has id: 586336981
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node
586336981 is now known as app1b.xlhost.de
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_update_peer: Node
app1b.xlhost.de: id=586336981 state=member (new) addr=r(0)
ip(213.202.242.162)  votes=1 born=148 seen=156
proc=00000000000000000000000000013312
1271072439 confchg...@495: confchg called
1271072439 daemon_cha...@398: ocfs2_controld (group "ocfs2:controld")
confchg: members 1, left 0, joined 1
1271072439 cpg_joi...@909: CPG is live, we are the first daemon
1271072439 call_ckpt_o...@160: Opening checkpoint "ocfs2:controld" (try 1)
1271072439 call_ckpt_o...@170: Opened checkpoint "ocfs2:controld" with
handle 0x2ae8944a00000001
1271072439 call_section_wr...@340: Writing to section "daemon_protocol" on
checkpoint "ocfs2:controld" (try 1)
1271072439 call_section_cre...@292: Creating section "daemon_protocol" on
checkpoint "ocfs2:controld" (try 1)
1271072439 call_section_cre...@300: Created section "daemon_protocol" on
checkpoint "ocfs2:controld"
1271072439 call_section_wr...@340: Writing to section "ocfs2_protocol" on
checkpoint "ocfs2:controld" (try 1)
1271072439 call_section_cre...@292: Creating section "ocfs2_protocol" on
checkpoint "ocfs2:controld" (try 1)
1271072439 call_section_cre...@300: Created section "ocfs2_protocol" on
checkpoint "ocfs2:controld"
1271072439 cpg_joi...@923: Daemon protocol is 1.0
1271072439 cpg_joi...@925: fs protocol is 1.0
1271072439 cpg_joi...@927: Connecting to dlm_controld
1271072439 cpg_joi...@934: Opening control device
1271072439 cpg_joi...@938: Error opening control device: Unable to access
cluster service
1271072439 exit_dlmcont...@363: Closing dlm_controld connection
1271072439 start_le...@613: leaving group "ocfs2:controld"
1271072439 start_le...@626: cpg_leave succeeded
1271072439 exit_...@760: closing cpg connection
1271072439 call_ckpt_cl...@240: Closing checkpoint
"ocfs2:controld:21f2cad5" (try 1)
1271072439 call_ckpt_cl...@246: Closed checkpoint
"ocfs2:controld:21f2cad5"
1271072439 exit_c...@643: Disconnecting from CKPT service (try 1)
1271072439 exit_c...@647: Disconnected from CKPT service
1271072439 exit_st...@144: closing pacemaker connection
ocfs2_controld[18489]: 2010/04/12_13:40:39 notice:
terminate_ais_connection: Disconnected from AIS


so ocfs2_controld.pcmk can connect to the openais CKPT service and
also reaches dlm_controld.pcmk, but dlm_controld.pcmk then drops the
connection.
here's the output from "dlm_controld.pcmk -q 0 -D":
(the last 6 lines show 3 connection attempts from ocfs2_controld.pcmk!)
=======================================================================
1271072755 dlm_controld 3.0.10 started
cluster-dlm[20608]: 2010/04/12_13:45:55 info: init_ais_connection:
Creating connection to our AIS plugin
cluster-dlm[20608]: 2010/04/12_13:45:55 info: init_ais_connection: AIS
connection established
cluster-dlm[20608]: 2010/04/12_13:45:55 info: get_ais_nodeid: Server
details: id=569559765 uname=app1a.xlhost.de cname=pcmk
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node
app1a.xlhost.de now has id: 569559765
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node 569559765
is now known as app1a.xlhost.de
1271072755 found /dev/misc/dlm-control minor 58
1271072755 found /dev/misc/dlm-monitor minor 57
1271072755 found /dev/misc/dlm_plock minor 56
1271072755 /dev/misc/dlm-monitor fd 9
1271072755 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
1271072755 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
1271072755 confdb_key_get error 11
1271072755 group_mode 3 compat 0
1271072755 setup_cpg_daemon 11
1271072755 dlm:controld conf 2 1 0 memb 569559765 586336981 join 569559765
left
1271072755 run protocol from nodeid 586336981
1271072755 daemon run 1.1.1 max 1.1.1 kernel run 1.1.1 max 1.1.1
1271072755 plocks 13
1271072755 plock cpg message size: 104 bytes
cluster-dlm[20608]: 2010/04/12_13:45:55 notice: ais_dispatch: Membership
156: quorum acquired
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_update_peer: Node
app1a.xlhost.de: id=569559765 state=member (new) addr=r(0)
ip(213.202.242.161)  (new) votes=1 (new) born=156 seen=156
proc=00000000000000000000000000013312 (new)
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node
app1b.xlhost.de now has id: 586336981
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node 586336981
is now known as app1b.xlhost.de
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_update_peer: Node
app1b.xlhost.de: id=586336981 state=member (new) addr=r(0)
ip(213.202.242.162)  votes=1 born=148 seen=156
proc=00000000000000000000000000013312
1271072755 Processing membership 156
1271072755 Adding address ip(213.202.242.161) to configfs for node
569559765
1271072755 set_configfs_node 569559765 213.202.242.161 local 1
1271072755 Added active node 569559765: born-on=156, last-seen=156,
this-event=156, last-event=0
1271072755 Adding address ip(213.202.242.162) to configfs for node
586336981
1271072755 set_configfs_node 586336981 213.202.242.162 local 0
1271072755 Added active node 586336981: born-on=148, last-seen=156,
this-event=156, last-event=0
1271072763 client connection 5 fd 14
1271072763 connection 5 read error -1
1271072776 client connection 5 fd 14
1271072776 connection 5 read error -1
1271072779 client connection 5 fd 14
1271072779 connection 5 read error -1



i'm pretty lost at the moment, as there's nothing i can find via google
regarding the "core" problem:
1271072439 cpg_joi...@934: Opening control device
1271072439 cpg_joi...@938: Error opening control device: Unable to access
cluster service
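
the only things i could think of checking are the kernel-side cluster
stack and the control devices (paths and expected values here are my
assumptions, please correct me if they're wrong):

```shell
# the userspace/pacemaker stack needs the ocfs2_stack_user module
lsmod | grep ocfs2_stack
# the ocfs2 stack glue should report "pcmk" as the active stack
cat /sys/fs/ocfs2/cluster_stack
# the control device ocfs2_controld tries to open
ls -l /dev/misc/ocfs2_control
```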


any help would be greatly appreciated.

best regards,
jürgen herrmann
-- 
>> XLhost.de - eXperts in Linux hosting ® <<

XLhost.de GmbH
Jürgen Herrmann, Managing Director
Boelckestrasse 21, 93051 Regensburg, Germany

Managing Directors: Volker Geith, Jürgen Herrmann
Registered under: HRB9918
VAT ID: DE245931218

Phone: +49 (0)800 XLHOSTDE [0800 95467833]
Fax:   +49 (0)800 95467830

WEB:  http://www.XLhost.de
IRC:  #xlh...@irc.quakenet.org


_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users