What versions of openais (corosync?) and pacemaker are you using?
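In case it helps to collect them: on Debian, something along these lines should list what the package database knows about. This is only a sketch. The function name `cluster_versions` and the package name patterns are my assumptions, and anything built from source (such as your dlm_controld and ocfs2-tools) will not show up here.

```shell
#!/bin/sh
# Sketch: print package name, version and status for the cluster stack.
# The patterns below are assumptions; adjust them to your installation.
cluster_versions() {
    if ! command -v dpkg-query >/dev/null 2>&1; then
        echo "dpkg-query not found; check versions with your package manager"
        return 0
    fi
    for pattern in 'corosync*' 'openais*' 'pacemaker*'; do
        # -W prints matching packages; a pattern with no match is skipped quietly
        dpkg-query -W -f='${Package} ${Version} ${Status}\n' "$pattern" 2>/dev/null || true
    done
}

cluster_versions
```

Packages that are known but not installed may appear with an empty version, which is why the status column is printed as well.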
On Mon, Apr 12, 2010 at 2:00 PM, Jürgen Herrmann <[email protected]> wrote:
>
> hi!
>
> i'm on debian lenny and trying to run ocfs2 on a dual primary
> drbd device. the drbd device is already set up as msDRBD0.
>
> to get dlm_controld.pcmk i installed it from source (from
> cluster-suite-3.0.10)
> now i configured a resource "resDLM" with 2 clones:
> primitive resDLM ocf:pacemaker:controld op monitor interval="120s"
> clone cloneDLM resDLM meta globally-unique="false" interleave="true"
> colocation colDLM_DRBD0 inf: cloneDLM msDRBD0:Master
> order ordDRBD0_DLM inf: msDRBD0:promote cloneDLM:start
> -> seems to work.
>
>
> to get ocfs2_controld.pcmk i installed ocfs2-tools-1.4.3 from source.
> after adding the resource:
> primitive resO2CB ocf:pacemaker:o2cb op monitor interval="120s"
> clone cloneO2CB resO2CB meta globally-unique="false" interleave="true"
> colocation colO2CB_DLM inf: cloneO2CB cloneDLM
> order ordDLM_O2CB inf: cloneDLM cloneO2CB
>
> i get the following errors in crm_mon:
> ======================================
> Failed actions:
> resO2CB:0_start_0 (node=app1b.xlhost.de, call=28, rc=1,
> status=complete): unknown error
> resO2CB:0_start_0 (node=app1a.xlhost.de, call=38, rc=1,
> status=complete): unknown error
>
>
> the relevant syslog entries:
> ============================
> Apr 12 13:15:18 app1a corosync[4638]: [pcmk ] info: pcmk_notify:
> Enabling node
> notifications for child 8311 (0xd83090)
> Apr 12 13:15:18 app1a ocfs2_controld.pcmk: Error opening control device:
> Unable to access cluster service
>
>
>
> if i start "ocfs2_controld.pcmk -D" i get:
> ==========================================
> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: init_ais_connection:
> Creating connection to our AIS plugin
> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: init_ais_connection: AIS
> connection established
> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: get_ais_nodeid: Server
> details: id=569559765 uname=app1a.xlhost.de cname=pcmk
> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node
> app1a.xlhost.de now has id: 569559765
> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node
> 569559765 is now known as app1a.xlhost.de
> 1271072439 setup_st...@168: Cluster connection established. Local node
> id: 569559765
> 1271072439 setup_st...@172: Added Pacemaker as client 1 with fd 5
> 1271072439 setup_c...@609: Initializing CKPT service (try 1)
> 1271072439 setup_c...@615: Connected to CKPT service with handle
> 0x327b23c600000000
> 1271072439 call_ckpt_o...@160: Opening checkpoint
> "ocfs2:controld:21f2cad5" (try 1)
> 1271072439 call_ckpt_o...@170: Opened checkpoint "ocfs2:controld:21f2cad5"
> with handle 0x6633487300000000
> 1271072439 call_section_wr...@340: Writing to section
> "daemon_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1)
> 1271072439 call_section_cre...@292: Creating section "daemon_max_protocol"
> on checkpoint "ocfs2:controld:21f2cad5" (try 1)
> 1271072439 call_section_cre...@300: Created section "daemon_max_protocol"
> on checkpoint "ocfs2:controld:21f2cad5"
> 1271072439 call_section_wr...@340: Writing to section "ocfs2_max_protocol"
> on checkpoint "ocfs2:controld:21f2cad5" (try 1)
> 1271072439 call_section_cre...@292: Creating section "ocfs2_max_protocol"
> on checkpoint "ocfs2:controld:21f2cad5" (try 1)
> 1271072439 call_section_cre...@300: Created section "ocfs2_max_protocol"
> on checkpoint "ocfs2:controld:21f2cad5"
> 1271072439 start_j...@588: Starting join for group "ocfs2:controld"
> 1271072439 start_j...@592: cpg_join succeeded
> 1271072439 l...@975: setup done
> ocfs2_controld[18489]: 2010/04/12_13:40:39 notice: ais_dispatch:
> Membership 156: quorum acquired
> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_update_peer: Node
> app1a.xlhost.de: id=569559765 state=member (new) addr=r(0)
> ip(213.202.242.161) (new) votes=1 (new) born=156 seen=156
> proc=00000000000000000000000000013312 (new)
> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node
> app1b.xlhost.de now has id: 586336981
> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node
> 586336981 is now known as app1b.xlhost.de
> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_update_peer: Node
> app1b.xlhost.de: id=586336981 state=member (new) addr=r(0)
> ip(213.202.242.162) votes=1 born=148 seen=156
> proc=00000000000000000000000000013312
> 1271072439 confchg...@495: confchg called
> 1271072439 daemon_cha...@398: ocfs2_controld (group "ocfs2:controld")
> confchg: members 1, left 0, joined 1
> 1271072439 cpg_joi...@909: CPG is live, we are the first daemon
> 1271072439 call_ckpt_o...@160: Opening checkpoint "ocfs2:controld" (try 1)
> 1271072439 call_ckpt_o...@170: Opened checkpoint "ocfs2:controld" with
> handle 0x2ae8944a00000001
> 1271072439 call_section_wr...@340: Writing to section "daemon_protocol" on
> checkpoint "ocfs2:controld" (try 1)
> 1271072439 call_section_cre...@292: Creating section "daemon_protocol" on
> checkpoint "ocfs2:controld" (try 1)
> 1271072439 call_section_cre...@300: Created section "daemon_protocol" on
> checkpoint "ocfs2:controld"
> 1271072439 call_section_wr...@340: Writing to section "ocfs2_protocol" on
> checkpoint "ocfs2:controld" (try 1)
> 1271072439 call_section_cre...@292: Creating section "ocfs2_protocol" on
> checkpoint "ocfs2:controld" (try 1)
> 1271072439 call_section_cre...@300: Created section "ocfs2_protocol" on
> checkpoint "ocfs2:controld"
> 1271072439 cpg_joi...@923: Daemon protocol is 1.0
> 1271072439 cpg_joi...@925: fs protocol is 1.0
> 1271072439 cpg_joi...@927: Connecting to dlm_controld
> >>>>>>>>>>>>>>>>>>>>>>>> here's the error <<<<<<<<<<<<<<<<<<<<<<
> 1271072439 cpg_joi...@934: Opening control device
> 1271072439 cpg_joi...@938: Error opening control device: Unable to access
> cluster service
> 1271072439 exit_dlmcont...@363: Closing dlm_controld connection
> 1271072439 start_le...@613: leaving group "ocfs2:controld"
> 1271072439 start_le...@626: cpg_leave succeeded
> 1271072439 exit_...@760: closing cpg connection
> 1271072439 call_ckpt_cl...@240: Closing checkpoint
> "ocfs2:controld:21f2cad5" (try 1)
> 1271072439 call_ckpt_cl...@246: Closed checkpoint
> "ocfs2:controld:21f2cad5"
> 1271072439 exit_c...@643: Disconnecting from CKPT service (try 1)
> 1271072439 exit_c...@647: Disconnected from CKPT service
> 1271072439 exit_st...@144: closing pacemaker connection
> ocfs2_controld[18489]: 2010/04/12_13:40:39 notice:
> terminate_ais_connection: Disconnected from AIS
>
>
> obviously ocfs2_controld.pcmk can connect to the openais CKPT service and
> to dlm_controld.pcmk, which then terminates the connection.
> here's the output from dlm_controld.pcmk -q 0 -D:
> (the last 6 lines show 3 connection attempts from ocfs2_controld.pcmk!)
> =======================================================================
> 1271072755 dlm_controld 3.0.10 started
> cluster-dlm[20608]: 2010/04/12_13:45:55 info: init_ais_connection:
> Creating connection to our AIS plugin
> cluster-dlm[20608]: 2010/04/12_13:45:55 info: init_ais_connection: AIS
> connection established
> cluster-dlm[20608]: 2010/04/12_13:45:55 info: get_ais_nodeid: Server
> details: id=569559765 uname=app1a.xlhost.de cname=pcmk
> cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node
> app1a.xlhost.de now has id: 569559765
> cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node 569559765
> is now known as app1a.xlhost.de
> 1271072755 found /dev/misc/dlm-control minor 58
> 1271072755 found /dev/misc/dlm-monitor minor 57
> 1271072755 found /dev/misc/dlm_plock minor 56
> 1271072755 /dev/misc/dlm-monitor fd 9
> 1271072755 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
> 1271072755 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
> 1271072755 confdb_key_get error 11
> 1271072755 group_mode 3 compat 0
> 1271072755 setup_cpg_daemon 11
> 1271072755 dlm:controld conf 2 1 0 memb 569559765 586336981 join 569559765
> left
> 1271072755 run protocol from nodeid 586336981
> 1271072755 daemon run 1.1.1 max 1.1.1 kernel run 1.1.1 max 1.1.1
> 1271072755 plocks 13
> 1271072755 plock cpg message size: 104 bytes
> cluster-dlm[20608]: 2010/04/12_13:45:55 notice: ais_dispatch: Membership
> 156: quorum acquired
> cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_update_peer: Node
> app1a.xlhost.de: id=569559765 state=member (new) addr=r(0)
> ip(213.202.242.161) (new) votes=1 (new) born=156 seen=156
> proc=00000000000000000000000000013312 (new)
> cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node
> app1b.xlhost.de now has id: 586336981
> cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node 586336981
> is now known as app1b.xlhost.de
> cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_update_peer: Node
> app1b.xlhost.de: id=586336981 state=member (new) addr=r(0)
> ip(213.202.242.162) votes=1 born=148 seen=156
> proc=00000000000000000000000000013312
> 1271072755 Processing membership 156
> 1271072755 Adding address ip(213.202.242.161) to configfs for node
> 569559765
> 1271072755 set_configfs_node 569559765 213.202.242.161 local 1
> 1271072755 Added active node 569559765: born-on=156, last-seen=156,
> this-event=156, last-event=0
> 1271072755 Adding address ip(213.202.242.162) to configfs for node
> 586336981
> 1271072755 set_configfs_node 586336981 213.202.242.162 local 0
> 1271072755 Added active node 586336981: born-on=148, last-seen=156,
> this-event=156, last-event=0
> 1271072763 client connection 5 fd 14
> 1271072763 connection 5 read error -1
> 1271072776 client connection 5 fd 14
> 1271072776 connection 5 read error -1
> 1271072779 client connection 5 fd 14
> 1271072779 connection 5 read error -1
>
>
>
> i'm pretty lost at the moment, as there's nothing i can find via google
> regarding the "core" problem:
> 1271072439 cpg_joi...@934: Opening control device
> 1271072439 cpg_joi...@938: Error opening control device: Unable to access
> cluster service
>
>
> any help would be greatly appreciated.
>
> best regards,
> jürgen herrmann
> --
> >> XLhost.de - eXperts in Linux hosting ® <<
>
> XLhost.de GmbH
> Jürgen Herrmann, Geschäftsführer
> Boelckestrasse 21, 93051 Regensburg, Germany
>
> Geschäftsführer: Volker Geith, Jürgen Herrmann
> Registriert unter: HRB9918
> Umsatzsteuer-Identifikationsnummer: DE245931218
>
> Fon: +49 (0)800 XLHOSTDE [0800 95467833]
> Fax: +49 (0)800 95467830
>
> WEB: http://www.XLhost.de
> IRC: #[email protected]
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
