hi!

i'm on debian lenny and trying to run ocfs2 on a dual-primary drbd device. the drbd device is already set up as msDRBD0.

to get dlm_controld.pcmk i installed it from source (from cluster-suite-3.0.10). then i configured a resource "resDLM" with 2 clones:

primitive resDLM ocf:pacemaker:controld op monitor interval="120s"
clone cloneDLM resDLM meta globally-unique="false" interleave="true"
colocation colDLM_DRBD0 inf: cloneDLM msDRBD0:Master
order ordDRBD0_DLM inf: msDRBD0:promote cloneDLM:start

-> seems to work.

to get ocfs2_controld.pcmk i installed ocfs2-tools-1.4.3 from source. after adding the resource:

primitive resO2CB ocf:pacemaker:o2cb op monitor interval="120s"
clone cloneO2CB resO2CB meta globally-unique="false" interleave="true"
colocation colO2CB_DLM inf: cloneO2CB cloneDLM
order ordDLM_O2CB inf: cloneDLM cloneO2CB

i get the following errors in crm_mon:
======================================
Failed actions:
resO2CB:0_start_0 (node=app1b.xlhost.de, call=28, rc=1, status=complete): unknown error
resO2CB:0_start_0 (node=app1a.xlhost.de, call=38, rc=1, status=complete): unknown error

the relevant syslog entries:
============================
Apr 12 13:15:18 app1a corosync[4638]: [pcmk ] info: pcmk_notify: Enabling node notifications for child 8311 (0xd83090)
Apr 12 13:15:18 app1a ocfs2_controld.pcmk: Error opening control device: Unable to access cluster service

if i start "ocfs2_controld.pcmk -D" i get:
==========================================
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: init_ais_connection: Creating connection to our AIS plugin
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: init_ais_connection: AIS connection established
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: get_ais_nodeid: Server details: id=569559765 uname=app1a.xlhost.de cname=pcmk
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node app1a.xlhost.de now has id: 569559765
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node 569559765 is now known as app1a.xlhost.de
1271072439 setup_st...@168: Cluster connection established. Local node id: 569559765
1271072439 setup_st...@172: Added Pacemaker as client 1 with fd 5
1271072439 setup_c...@609: Initializing CKPT service (try 1)
1271072439 setup_c...@615: Connected to CKPT service with handle 0x327b23c600000000
1271072439 call_ckpt_o...@160: Opening checkpoint "ocfs2:controld:21f2cad5" (try 1)
1271072439 call_ckpt_o...@170: Opened checkpoint "ocfs2:controld:21f2cad5" with handle 0x6633487300000000
1271072439 call_section_wr...@340: Writing to section "daemon_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1)
1271072439 call_section_cre...@292: Creating section "daemon_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1)
1271072439 call_section_cre...@300: Created section "daemon_max_protocol" on checkpoint "ocfs2:controld:21f2cad5"
1271072439 call_section_wr...@340: Writing to section "ocfs2_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1)
1271072439 call_section_cre...@292: Creating section "ocfs2_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1)
1271072439 call_section_cre...@300: Created section "ocfs2_max_protocol" on checkpoint "ocfs2:controld:21f2cad5"
1271072439 start_j...@588: Starting join for group "ocfs2:controld"
1271072439 start_j...@592: cpg_join succeeded
1271072439 l...@975: setup done
ocfs2_controld[18489]: 2010/04/12_13:40:39 notice: ais_dispatch: Membership 156: quorum acquired
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_update_peer: Node app1a.xlhost.de: id=569559765 state=member (new) addr=r(0) ip(213.202.242.161) (new) votes=1 (new) born=156 seen=156 proc=00000000000000000000000000013312 (new)
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node app1b.xlhost.de now has id: 586336981
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node 586336981 is now known as app1b.xlhost.de
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_update_peer: Node app1b.xlhost.de: id=586336981 state=member (new) addr=r(0) ip(213.202.242.162) votes=1 born=148 seen=156 proc=00000000000000000000000000013312
1271072439 confchg...@495: confchg called
1271072439 daemon_cha...@398: ocfs2_controld (group "ocfs2:controld") confchg: members 1, left 0, joined 1
1271072439 cpg_joi...@909: CPG is live, we are the first daemon
1271072439 call_ckpt_o...@160: Opening checkpoint "ocfs2:controld" (try 1)
1271072439 call_ckpt_o...@170: Opened checkpoint "ocfs2:controld" with handle 0x2ae8944a00000001
1271072439 call_section_wr...@340: Writing to section "daemon_protocol" on checkpoint "ocfs2:controld" (try 1)
1271072439 call_section_cre...@292: Creating section "daemon_protocol" on checkpoint "ocfs2:controld" (try 1)
1271072439 call_section_cre...@300: Created section "daemon_protocol" on checkpoint "ocfs2:controld"
1271072439 call_section_wr...@340: Writing to section "ocfs2_protocol" on checkpoint "ocfs2:controld" (try 1)
1271072439 call_section_cre...@292: Creating section "ocfs2_protocol" on checkpoint "ocfs2:controld" (try 1)
1271072439 call_section_cre...@300: Created section "ocfs2_protocol" on checkpoint "ocfs2:controld"
1271072439 cpg_joi...@923: Daemon protocol is 1.0
1271072439 cpg_joi...@925: fs protocol is 1.0
1271072439 cpg_joi...@927: Connecting to dlm_controld
1271072439 cpg_joi...@934: Opening control device
1271072439 cpg_joi...@938: Error opening control device: Unable to access cluster service
1271072439 exit_dlmcont...@363: Closing dlm_controld connection
1271072439 start_le...@613: leaving group "ocfs2:controld"
1271072439 start_le...@626: cpg_leave succeeded
1271072439 exit_...@760: closing cpg connection
1271072439 call_ckpt_cl...@240: Closing checkpoint "ocfs2:controld:21f2cad5" (try 1)
1271072439 call_ckpt_cl...@246: Closed checkpoint "ocfs2:controld:21f2cad5"
1271072439 exit_c...@643: Disconnecting from CKPT service (try 1)
1271072439 exit_c...@647: Disconnected from CKPT service
1271072439 exit_st...@144: closing pacemaker connection
ocfs2_controld[18489]: 2010/04/12_13:40:39 notice: terminate_ais_connection: Disconnected from AIS

obviously ocfs2_controld.pcmk can connect to the openais CKPT service and to dlm_controld.pcmk, which then terminates the connection.

here's the output from dlm_controld.pcmk -q 0 -D:
(the last 6 lines show 3 connection attempts from ocfs2_controld.pcmk!)
=======================================================================
1271072755 dlm_controld 3.0.10 started
cluster-dlm[20608]: 2010/04/12_13:45:55 info: init_ais_connection: Creating connection to our AIS plugin
cluster-dlm[20608]: 2010/04/12_13:45:55 info: init_ais_connection: AIS connection established
cluster-dlm[20608]: 2010/04/12_13:45:55 info: get_ais_nodeid: Server details: id=569559765 uname=app1a.xlhost.de cname=pcmk
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node app1a.xlhost.de now has id: 569559765
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node 569559765 is now known as app1a.xlhost.de
1271072755 found /dev/misc/dlm-control minor 58
1271072755 found /dev/misc/dlm-monitor minor 57
1271072755 found /dev/misc/dlm_plock minor 56
1271072755 /dev/misc/dlm-monitor fd 9
1271072755 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
1271072755 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
1271072755 confdb_key_get error 11
1271072755 group_mode 3 compat 0
1271072755 setup_cpg_daemon 11
1271072755 dlm:controld conf 2 1 0 memb 569559765 586336981 join 569559765 left
1271072755 run protocol from nodeid 586336981
1271072755 daemon run 1.1.1 max 1.1.1 kernel run 1.1.1 max 1.1.1
1271072755 plocks 13
1271072755 plock cpg message size: 104 bytes
cluster-dlm[20608]: 2010/04/12_13:45:55 notice: ais_dispatch: Membership 156: quorum acquired
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_update_peer: Node app1a.xlhost.de: id=569559765 state=member (new) addr=r(0) ip(213.202.242.161) (new) votes=1 (new) born=156 seen=156 proc=00000000000000000000000000013312 (new)
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node app1b.xlhost.de now has id: 586336981
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node 586336981 is now known as app1b.xlhost.de
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_update_peer: Node app1b.xlhost.de: id=586336981 state=member (new) addr=r(0) ip(213.202.242.162) votes=1 born=148 seen=156 proc=00000000000000000000000000013312
1271072755 Processing membership 156
1271072755 Adding address ip(213.202.242.161) to configfs for node 569559765
1271072755 set_configfs_node 569559765 213.202.242.161 local 1
1271072755 Added active node 569559765: born-on=156, last-seen=156, this-event=156, last-event=0
1271072755 Adding address ip(213.202.242.162) to configfs for node 586336981
1271072755 set_configfs_node 586336981 213.202.242.162 local 0
1271072755 Added active node 586336981: born-on=148, last-seen=156, this-event=156, last-event=0
1271072763 client connection 5 fd 14
1271072763 connection 5 read error -1
1271072776 client connection 5 fd 14
1271072776 connection 5 read error -1
1271072779 client connection 5 fd 14
1271072779 connection 5 read error -1

i'm pretty lost at the moment, as i can't find anything via google about the "core" problem:

1271072439 cpg_joi...@934: Opening control device
1271072439 cpg_joi...@938: Error opening control device: Unable to access cluster service

any help would be greatly appreciated.

best regards,
jürgen herrmann
-- 
>> XLhost.de - eXperts in Linux hosting ® <<

XLhost.de GmbH
Jürgen Herrmann, Managing Director
Boelckestrasse 21, 93051 Regensburg, Germany

Managing Directors: Volker Geith, Jürgen Herrmann
Registered under: HRB9918
VAT ID: DE245931218

Phone: +49 (0)800 XLHOSTDE [0800 95467833]
Fax: +49 (0)800 95467830

WEB: http://www.XLhost.de
IRC: #xlh...@irc.quakenet.org

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users