Hi guys,

I've got a tricky issue with Corosync and Pacemaker running over a DHCP-assigned IP address using unicast. Corosync crashes periodically.

Packages are from the CentOS 6 repos:

corosync-1.4.1-7.el6_3.1.x86_64
corosynclib-1.4.1-7.el6_3.1.x86_64
pacemaker-cluster-libs-1.1.7-6.el6.x86_64
pacemaker-libs-1.1.7-6.el6.x86_64
pacemaker-cli-1.1.7-6.el6.x86_64
pacemaker-1.1.7-6.el6.x86_64

*Logs*

Feb 09 23:24:33 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
Feb 10 00:24:39 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
Feb 10 01:24:44 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
Feb 10 02:24:48 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
Feb 10 03:24:51 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
Feb 10 04:24:52 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
Feb 10 05:24:54 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
Feb 10 06:25:00 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
Feb 10 07:25:06 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
Feb 10 07:56:22 corosync [TOTEM ] A processor failed, forming new configuration.
Feb 10 07:56:22 corosync [TOTEM ] The network interface is down.
Feb 10 07:56:24 corosync [TOTEM ] The network interface [172.17.0.104] is now up.
Feb 10 07:56:25 [5242] host1 pacemakerd: error: cfg_connection_destroy: Connection destroyed
Feb 10 07:56:25 [5251] host1 crmd: error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
Feb 10 07:56:25 [5246] host1 cib: error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
Feb 10 07:56:25 [5249] host1 attrd: error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
Feb 10 07:56:25 [5251] host1 crmd: error: ais_dispatch: AIS connection failed
Feb 10 07:56:25 [5242] host1 pacemakerd: error: cpg_connection_destroy: Connection destroyed
Feb 10 07:56:25 [5246] host1 cib: error: ais_dispatch: AIS connection failed
Feb 10 07:56:25 [5251] host1 crmd: info: crmd_ais_destroy: connection closed
Feb 10 07:56:25 [5249] host1 attrd: error: ais_dispatch: AIS connection failed
Feb 10 07:56:25 [5247] host1 stonith-ng: error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
Feb 10 07:56:25 [5246] host1 cib: error: cib_ais_destroy: AIS connection terminated
Feb 10 07:56:25 [5249] host1 attrd: crit: attrd_ais_destroy: Lost connection to OpenAIS service!
Feb 10 07:56:25 [5242] host1 pacemakerd: notice: pcmk_shutdown_worker: Shuting down Pacemaker
Feb 10 07:56:25 [5247] host1 stonith-ng: error: ais_dispatch: AIS connection failed
Feb 10 07:56:25 [5249] host1 attrd: notice: main: Exiting...
Feb 10 07:56:25 [5247] host1 stonith-ng: error: stonith_peer_ais_destroy: AIS connection terminated
Feb 10 07:56:25 [5242] host1 pacemakerd: notice: stop_child: Stopping crmd: Sent -15 to process 5251
Feb 10 07:56:25 [5249] host1 attrd: error: attrd_cib_connection_destroy: Connection to the CIB terminated...
Feb 10 07:56:25 [5251] host1 crmd: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Feb 10 07:56:25 [5251] host1 crmd: notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
Feb 10 07:56:25 [5251] host1 crmd: info: do_shutdown_req: Sending shutdown request to host2
Feb 10 07:56:25 [5242] host1 pacemakerd: error: pcmk_child_exit: Child process stonith-ng exited (pid=5247, rc=1)
Feb 10 07:56:25 [5242] host1 pacemakerd: warning: send_ipc_message: IPC Channel to 5249 is not connected
Feb 10 07:56:25 [5242] host1 pacemakerd: warning: send_ipc_message: IPC Channel to 5246 is not connected
Feb 10 07:56:25 [5242] host1 pacemakerd: warning: send_ipc_message: IPC Channel to 5247 is not connected
Feb 10 07:56:25 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
Feb 10 07:56:25 [5242] host1 pacemakerd: error: pcmk_child_exit: Child process cib exited (pid=5246, rc=1)
Feb 10 07:56:25 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
Feb 10 07:56:25 [5242] host1 pacemakerd: error: pcmk_child_exit: Child process attrd exited (pid=5249, rc=1)
Feb 10 07:56:25 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
Feb 10 07:56:27 [5251] host1 crmd: error: send_ais_text: Sending message 68 via pcmk: FAILED (rc=2): Library error: Connection timed out (110)
Feb 10 07:56:27 [5251] host1 crmd: error: do_log: FSA: Input I_ERROR from do_shutdown_req() received in state S_NOT_DC
Feb 10 07:56:27 [5251] host1 crmd: notice: do_state_transition: State transition S_NOT_DC -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=do_shutdown_req ]
Feb 10 07:56:27 [5251] host1 crmd: error: do_recover: Action A_RECOVER (0000000001000000) not supported
Feb 10 07:56:27 [5251] host1 crmd: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
Feb 10 07:56:27 [5251] host1 crmd: notice: do_state_transition: State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
Feb 10 07:56:27 [5251] host1 crmd: info: do_shutdown: Disconnecting STONITH...
Feb 10 07:56:27 [5251] host1 crmd: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
Feb 10 07:56:27 host1 lrmd: [5248]: info: cancel_op: operation monitor[25] on ocf::OpenStackFloatingIP::P_SESSION_IP for client 5251, its parameters: CRM_meta_name=[monitor] crm_feature_set=[3.0.6] CRM_meta_timeout=[20000] CRM_meta_interval=[5000] ip=[172.24.0.104] cancelled
Feb 10 07:56:27 [5251] host1 crmd: error: verify_stopped: Resource P_SESSION_IP was active at shutdown. You may ignore this error if it is unmanaged.
Feb 10 07:56:27 [5251] host1 crmd: info: do_lrm_control: Disconnected from the LRM
Feb 10 07:56:27 [5251] host1 crmd: notice: terminate_ais_connection: Disconnecting from AIS
Feb 10 07:56:27 [5251] host1 crmd: info: do_ha_control: Disconnected from OpenAIS
Feb 10 07:56:27 [5251] host1 crmd: info: do_cib_control: Disconnecting CIB
Feb 10 07:56:27 [5251] host1 crmd: error: send_ipc_message: IPC Channel to 5246 is not connected
Feb 10 07:56:27 [5251] host1 crmd: error: send_ipc_message: IPC Channel to 5246 is not connected
Feb 10 07:56:27 [5251] host1 crmd: error: cib_native_perform_op_delegate: Sending message to CIB service FAILED
Feb 10 07:56:27 [5251] host1 crmd: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
Feb 10 07:56:27 [5251] host1 crmd: error: verify_stopped: Resource P_SESSION_IP was active at shutdown. You may ignore this error if it is unmanaged.
Feb 10 07:56:27 [5251] host1 crmd: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
Feb 10 07:56:27 [5251] host1 crmd: error: do_exit: Could not recover from internal error
Feb 10 07:56:27 [5251] host1 crmd: info: free_mem: Dropping I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
Feb 10 07:56:27 [5251] host1 crmd: info: crm_xml_cleanup: Cleaning up memory from libxml2
Feb 10 07:56:27 [5251] host1 crmd: info: do_exit: [crmd] stopped (2)
Feb 10 07:56:27 [5242] host1 pacemakerd: error: pcmk_child_exit: Child process crmd exited (pid=5251, rc=2)
Feb 10 07:56:27 [5242] host1 pacemakerd: warning: send_ipc_message: IPC Channel to 5251 is not connected
Feb 10 07:56:27 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
Feb 10 07:56:27 [5242] host1 pacemakerd: notice: stop_child: Stopping pengine: Sent -15 to process 5250
Feb 10 07:56:27 [5242] host1 pacemakerd: info: pcmk_child_exit: Child process pengine exited (pid=5250, rc=0)
Feb 10 07:56:27 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
Feb 10 07:56:27 [5242] host1 pacemakerd: notice: stop_child: Stopping lrmd: Sent -15 to process 5248
Feb 10 07:56:27 host1 lrmd: [5248]: info: lrmd is shutting down
Feb 10 07:56:27 [5242] host1 pacemakerd: info: pcmk_child_exit: Child process lrmd exited (pid=5248, rc=0)
Feb 10 07:56:27 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
Feb 10 07:56:27 [5242] host1 pacemakerd: notice: pcmk_shutdown_worker: Shutdown complete
Feb 10 07:56:27 [5242] host1 pacemakerd: info: main: Exiting pacemakerd

*corosync.conf:*

compatibility: whitetank

totem {
    version: 2
    secauth: off
    nodeid: 104
    interface {
        member {
            memberaddr: 172.17.0.104
        }
        member {
            memberaddr: 172.17.0.105
        }
        ringnumber: 0
        bindnetaddr: 172.17.0.0
        mcastport: 5426
        ttl: 1
    }
    transport: udpu
}

logging {
    fileline: off
    to_logfile: yes
    to_syslog: yes
    debug: on
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}

service {
    # Load the Pacemaker Cluster Resource Manager
    ver: 1
    name: pacemaker
}

aisexec {
    user: root
    group: root
}

Thank you!

-- 
Viacheslav Biriukov
BR
http://biriukov.me
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org