Hello Andrew et al,

A few days ago I asked about pacemaker + corosync + clvmd etc. With your advice, I got this working well. That was on virtual test machines; I'm now trying to install a similar setup on raw hardware, but for some reason attrd and cib seem to be crashing.

Here's a snippet from the corosync log:

Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Corosync built-in features: nss rdma
Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Nov 10 14:12:21 vbox3 corosync[4299]: [TOTEM ] Initializing transport (UDP/IP).
Nov 10 14:12:21 vbox3 corosync[4299]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Nov 10 14:12:21 vbox3 corosync[4299]: [TOTEM ] The network interface [10.58.0.1] is now up.
Nov 10 14:12:21 vbox3 corosync[4299]: [pcmk ] info: process_ais_conf: Reading configure
Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Corosync built-in features: nss rdma
Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Nov 10 14:13:16 vbox3 corosync[4348]: [TOTEM ] Initializing transport (UDP/IP).
Nov 10 14:13:16 vbox3 corosync[4348]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Nov 10 14:13:16 vbox3 corosync[4348]: [TOTEM ] The network interface [10.58.0.1] is now up.
Nov 10 14:13:16 vbox3 corosync[4348]: [pcmk ] info: process_ais_conf: Reading configure
Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Corosync built-in features: nss rdma
Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Nov 10 14:13:24 vbox3 corosync[4357]: [TOTEM ] Initializing transport (UDP/IP).
Nov 10 14:13:24 vbox3 corosync[4357]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Nov 10 14:13:24 vbox3 corosync[4357]: [TOTEM ] The network interface [10.58.0.1] is now up.
Nov 10 14:13:24 vbox3 corosync[4357]: [pcmk ] info: process_ais_conf: Reading configure
Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Corosync built-in features: nss rdma
Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Nov 10 14:13:57 vbox3 corosync[4380]: [TOTEM ] Initializing transport (UDP/IP).
Nov 10 14:13:57 vbox3 corosync[4380]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Nov 10 14:13:58 vbox3 corosync[4380]: [TOTEM ] The network interface [10.58.0.1] is now up.
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: process_ais_conf: Reading configure
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_init: Local handle: 9213452461992312833 for logging
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_next: Processing additional logging options...
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Found 'off' for option: debug
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Defaulting to 'off' for option: to_file
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Defaulting to 'daemon' for option: syslog_facility
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_init: Local handle: 2013064636357672962 for service
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_next: Processing additional service options...
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Defaulting to 'no' for option: use_logd
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Found 'no' for option: use_mgmtd
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: CRM: Initialized
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] Logging: Initialized pcmk_startup
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: Service: 9
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: Local hostname: vbox3
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_update_nodeid: Local node id: 16792074
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Creating entry for node 16792074 born on 0
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: 0x260ee10 Node 16792074 now known as vbox3 (was: (null))
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Node vbox3 now has 1 quorum votes (was 0)
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Node 16792074/vbox3 is now: member
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4384 for process stonithd
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4385 for process cib
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4386 for process lrmd
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4387 for process attrd
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4388 for process pengine
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4389 for process crmd
Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: Pacemaker Cluster Manager 1.0.6
Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync configuration service
Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync profile loading service
Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 4: memb=0, new=0, lost=0
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 10
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 12
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: Stack hogger failed 0xffffffff
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 4: memb=1, new=1, lost=0
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_peer_update: NEW: vbox3 16792074
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_peer_update: MEMB: vbox3 16792074
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Node vbox3 now has process list: 00000000000000000000000000013312 (78610)
Nov 10 14:13:58 vbox3 corosync[4380]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 10 14:13:58 vbox3 corosync[4380]: [MAIN ] Completed service synchronization, ready to provide service.
Nov 10 14:13:58 vbox3 cib: [4385]: info: Invoked: /usr/lib64/heartbeat/cib
Nov 10 14:13:58 vbox3 cib: [4385]: info: G_main_add_TriggerHandler: Added signal manual handler
Nov 10 14:13:58 vbox3 cib: [4385]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Nov 10 14:13:58 vbox3 cib: [4385]: info: retrieveCib: Reading c
Nov 10 14:13:58 vbox3 cib: [4385]: WARN: retrieveCib: Cluster configuration not found: /var/lib/heartbeat/crm/cib.xml
Nov 10 14:13:58 vbox3 cib: [4385]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
Nov 10 14:13:58 vbox3 cib: [4385]: WARN: readCibXmlFile: Continuing with an empty configuration.
Nov 10 14:13:58 vbox3 cib: [4385]: info: startCib: CIB Initialization completed successfully
Nov 10 14:13:58 vbox3 cib: [4385]: info: crm_cluster_connect: Connecting to OpenAIS
Nov 10 14:13:58 vbox3 cib: [4385]: info: init_ais_connection: Creating connection to our AIS plugin
Nov 10 14:13:58 vbox3 crmd: [4389]: info: Invoked: /usr/lib64/heartbeat/crmd
Nov 10 14:13:58 vbox3 crmd: [4389]: info: main: CRM Hg Version: cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
Nov 10 14:13:58 vbox3 crmd: [4389]: info: crmd_init: Starting crmd
Nov 10 14:13:58 vbox3 crmd: [4389]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Nov 10 14:13:58 vbox3 pengine: [4388]: info: Invoked: /usr/lib64/heartbeat/pengine
Nov 10 14:13:58 vbox3 pengine: [4388]: info: main: Starting pengine
Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 10
Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 12
Nov 10 14:13:58 vbox3 lrmd: [4386]: info: Started.
Nov 10 14:13:58 vbox3 attrd: [4387]: info: Invoked: /usr/lib64/heartbeat/attrd
Nov 10 14:13:58 vbox3 attrd: [4387]: info: main: Starting up
Nov 10 14:13:58 vbox3 attrd: [4387]: info: crm_cluster_connect: Connecting to OpenAIS
Nov 10 14:13:58 vbox3 attrd: [4387]: info: init_ais_connection: Creating connection to our AIS plugin
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_cluster_connect: Connecting to OpenAIS
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: init_ais_connection: Creating connection to our AIS plugin
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: init_ais_connection: AIS connection established
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_ipc: Recorded connection 0x2615120 for stonithd/4384
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: get_ais_nodeid: Server details: id=16792074 uname=vbox3
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_new_peer: Node vbox3 now has id: 16792074
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_new_peer: Node 16792074 is now known as vbox3
Nov 10 14:13:58 vbox3 stonithd: [4384]: notice: /usr/lib64/heartbeat/stonithd start up successfully.
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4385, core=false)
Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4391 for process cib
Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4387, core=false)
Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: attrd
Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4392 for process attrd
Nov 10 14:13:59 vbox3 crmd: [4389]: info: do_cib_control: Could not connect to the CIB service: connection failed
Nov 10 14:13:59 vbox3 crmd: [4389]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
Nov 10 14:13:59 vbox3 crmd: [4389]: info: crmd_init: Starting crmd's mainloop
Nov 10 14:13:59 vbox3 cib: [4391]: info: Invoked: /usr/lib64/heartbeat/cib
Nov 10 14:13:59 vbox3 cib: [4391]: info: G_main_add_TriggerHandler: Added signal manual handler
Nov 10 14:13:59 vbox3 cib: [4391]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Nov 10 14:13:59 vbox3 cib: [4391]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/
Nov 10 14:13:59 vbox3 cib: [4391]: WARN: retrieveCib: Cluster configuration not found: /var/lib/heartbeat/crm/cib.xml
Nov 10 14:13:59 vbox3 cib: [4391]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
Nov 10 14:13:59 vbox3 cib: [4391]: WARN: readCibXmlFile: Continuing with an empty configuration.
Nov 10 14:13:59 vbox3 cib: [4391]: info: startCib: CIB Initialization completed successfully
Nov 10 14:13:59 vbox3 cib: [4391]: info: crm_cluster_connect: Connecting to OpenAIS
Nov 10 14:13:59 vbox3 cib: [4391]: info: init_ais_connection: Creating connection to our AIS plugin
Nov 10 14:13:59 vbox3 attrd: [4392]: info: Invoked: /usr/lib64/heartbeat/attrd
Nov 10 14:13:59 vbox3 attrd: [4392]: info: main: Starting up
Nov 10 14:13:59 vbox3 attrd: [4392]: info: crm_cluster_connect: Connecting to OpenAIS
Nov 10 14:13:59 vbox3 attrd: [4392]: info: init_ais_connection: Creating connection to our AIS plugin
Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4391, core=false)
Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4393 for process cib
Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4392, core=false)
Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: attrd
Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4394 for process attrd

The last few lines then keep repeating.

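For anyone who wants to tally these respawn loops from a longer log, a minimal Python sketch follows. It assumes only the pcmk_wait_dispatch message format visible in the log above; the function name count_crashes and the regular expression are illustrative and not part of pacemaker or corosync.

```python
import re
from collections import Counter

# Pattern inferred from the pcmk_wait_dispatch ERROR lines in the log above
# (an assumption: other pacemaker versions may word this message differently).
CRASH_RE = re.compile(
    r"pcmk_wait_dispatch: Child process (?P<name>\S+) "
    r"terminated with signal (?P<signal>\d+) "
    r"\(pid=(?P<pid>\d+), core=(?P<core>\w+)\)"
)


def count_crashes(log_text):
    """Return a Counter keyed by (child process name, signal number)."""
    crashes = Counter()
    # finditer scans the whole text, so it works whether the log has one
    # entry per line or several entries run together on one line.
    for match in CRASH_RE.finditer(log_text):
        crashes[(match.group("name"), int(match.group("signal")))] += 1
    return crashes


if __name__ == "__main__":
    # Two entries copied from the log above, used as sample input.
    sample = (
        "Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: "
        "Child process cib terminated with signal 11 (pid=4385, core=false)\n"
        "Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: "
        "Child process attrd terminated with signal 11 (pid=4387, core=false)\n"
    )
    for (name, signal), count in sorted(count_crashes(sample).items()):
        print(f"{name}: signal {signal}, {count} time(s)")
```

Feeding it the full syslog file contents instead of the sample string should show which children die, with which signal, and how often.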
Here are the gdb backtraces obtained from the core files:

cib:
#0 0x00007f9f07218f48 in sem_init@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#1 0x00007f9f0949bf06 in coroipcc_service_connect () from /usr/lib64/libcoroipcc.so.4
#2 0x00007f9f096a5c37 in init_ais_connection (dispatch=0x40d516 <cib_ais_dispatch>, destroy=0x40d658 <cib_ais_destroy>, our_uuid=0x0, our_uname=0x616f28, nodeid=0x0) at ais.c:588
#3 0x00007f9f096a1576 in crm_cluster_connect (our_uname=0x616f28, our_uuid=0x0, dispatch=0x40d516, destroy=0x40d658, hb_conn=0x0) at cluster.c:56
#4 0x000000000040d753 in cib_init () at main.c:424
#5 0x000000000040d08e in main (argc=1, argv=0x7fff9ec48f98) at main.c:218

attrd:
#0 0x00007f194ea0cf48 in sem_init@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#1 0x00007f1950c8ff06 in coroipcc_service_connect () from /usr/lib64/libcoroipcc.so.4
#2 0x00007f1950e99c37 in init_ais_connection (dispatch=0x402891 <attrd_ais_dispatch>, destroy=0x402af3 <attrd_ais_destroy>, our_uuid=0x605918, our_uname=0x605910, nodeid=0x0) at ais.c:588
#3 0x00007f1950e95576 in crm_cluster_connect (our_uname=0x605910, our_uuid=0x605918, dispatch=0x402891, destroy=0x402af3, hb_conn=0x0) at cluster.c:56
#4 0x0000000000403185 in main (argc=1, argv=0x7fffd3548b38) at attrd.c:569

Unfortunately, I'm not 100% sure that all the packages I installed on those machines were compiled the same way, as I deleted the old (testing) packages. The versions are the same, though.

Any idea where I should look for a possible culprit?

Thanks a lot for any reply!

With best regards,

nik

--
-------------------------------------
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.: +420 596 603 142
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-------------------------------------
_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker