2013/2/11 Dan Frincu <df.clus...@gmail.com>

> Hi,
>
> On Sun, Feb 10, 2013 at 2:24 PM, Viacheslav Biriukov
> <v.v.biriu...@gmail.com> wrote:
> > Hi guys,
> >
> > Got a tricky issue with Corosync and Pacemaker over a DHCP-assigned IP
> > address using unicast. Corosync crashes periodically.
> >
> > Packages are from the CentOS 6 repos:
> > corosync-1.4.1-7.el6_3.1.x86_64
> > corosynclib-1.4.1-7.el6_3.1.x86_64
> > pacemaker-cluster-libs-1.1.7-6.el6.x86_64
> > pacemaker-libs-1.1.7-6.el6.x86_64
> > pacemaker-cli-1.1.7-6.el6.x86_64
> > pacemaker-1.1.7-6.el6.x86_64
> >
> >
> > Logs
> >
> > Feb 09 23:24:33 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> > Feb 10 00:24:39 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> > Feb 10 01:24:44 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> > Feb 10 02:24:48 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> > Feb 10 03:24:51 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> > Feb 10 04:24:52 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> > Feb 10 05:24:54 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> > Feb 10 06:25:00 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> > Feb 10 07:25:06 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> > Feb 10 07:56:22 corosync [TOTEM ] A processor failed, forming new configuration.
> > Feb 10 07:56:22 corosync [TOTEM ] The network interface is down.
>
> This ^^^ is your problem. Corosync doesn't like it, see
>
> https://github.com/corosync/corosync/wiki/Corosync-and-ifdown-on-active-network-interface
>
> Normally DHCP shouldn't take the interface down. Also, since changing
> the network configuration in corosync means restarting it, why not go
> with static IPs?
>
> HTH,
> Dan
>
> > Feb 10 07:56:24 corosync [TOTEM ] The network interface [172.17.0.104] is now up.
> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: cfg_connection_destroy: Connection destroyed
> > Feb 10 07:56:25 [5251] host1 crmd: error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
> > Feb 10 07:56:25 [5246] host1 cib: error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
> > Feb 10 07:56:25 [5249] host1 attrd: error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
> > Feb 10 07:56:25 [5251] host1 crmd: error: ais_dispatch: AIS connection failed
> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: cpg_connection_destroy: Connection destroyed
> > Feb 10 07:56:25 [5246] host1 cib: error: ais_dispatch: AIS connection failed
> > Feb 10 07:56:25 [5251] host1 crmd: info: crmd_ais_destroy: connection closed
> > Feb 10 07:56:25 [5249] host1 attrd: error: ais_dispatch: AIS connection failed
> > Feb 10 07:56:25 [5247] host1 stonith-ng: error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
> > Feb 10 07:56:25 [5246] host1 cib: error: cib_ais_destroy: AIS connection terminated
> > Feb 10 07:56:25 [5249] host1 attrd: crit: attrd_ais_destroy: Lost connection to OpenAIS service!
> > Feb 10 07:56:25 [5242] host1 pacemakerd: notice: pcmk_shutdown_worker: Shuting down Pacemaker
> > Feb 10 07:56:25 [5247] host1 stonith-ng: error: ais_dispatch: AIS connection failed
> > Feb 10 07:56:25 [5249] host1 attrd: notice: main: Exiting...
> > Feb 10 07:56:25 [5247] host1 stonith-ng: error: stonith_peer_ais_destroy: AIS connection terminated
> > Feb 10 07:56:25 [5242] host1 pacemakerd: notice: stop_child: Stopping crmd: Sent -15 to process 5251
> > Feb 10 07:56:25 [5249] host1 attrd: error: attrd_cib_connection_destroy: Connection to the CIB terminated...
> > Feb 10 07:56:25 [5251] host1 crmd: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> > Feb 10 07:56:25 [5251] host1 crmd: notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
> > Feb 10 07:56:25 [5251] host1 crmd: info: do_shutdown_req: Sending shutdown request to host2
> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: pcmk_child_exit: Child process stonith-ng exited (pid=5247, rc=1)
> > Feb 10 07:56:25 [5242] host1 pacemakerd: warning: send_ipc_message: IPC Channel to 5249 is not connected
> > Feb 10 07:56:25 [5242] host1 pacemakerd: warning: send_ipc_message: IPC Channel to 5246 is not connected
> > Feb 10 07:56:25 [5242] host1 pacemakerd: warning: send_ipc_message: IPC Channel to 5247 is not connected
> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: pcmk_child_exit: Child process cib exited (pid=5246, rc=1)
> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: pcmk_child_exit: Child process attrd exited (pid=5249, rc=1)
> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> > Feb 10 07:56:27 [5251] host1 crmd: error: send_ais_text: Sending message 68 via pcmk: FAILED (rc=2): Library error: Connection timed out (110)
> > Feb 10 07:56:27 [5251] host1 crmd: error: do_log: FSA: Input I_ERROR from do_shutdown_req() received in state S_NOT_DC
> > Feb 10 07:56:27 [5251] host1 crmd: notice: do_state_transition: State transition S_NOT_DC -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=do_shutdown_req ]
> > Feb 10 07:56:27 [5251] host1 crmd: error: do_recover: Action A_RECOVER (0000000001000000) not supported
> > Feb 10 07:56:27 [5251] host1 crmd: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
> > Feb 10 07:56:27 [5251] host1 crmd: notice: do_state_transition: State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
> > Feb 10 07:56:27 [5251] host1 crmd: info: do_shutdown: Disconnecting STONITH...
> > Feb 10 07:56:27 [5251] host1 crmd: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
> > Feb 10 07:56:27 host1 lrmd: [5248]: info: cancel_op: operation monitor[25] on ocf::OpenStackFloatingIP::P_SESSION_IP for client 5251, its parameters: CRM_meta_name=[monitor] crm_feature_set=[3.0.6] CRM_meta_timeout=[20000] CRM_meta_interval=[5000] ip=[172.24.0.104] cancelled
> > Feb 10 07:56:27 [5251] host1 crmd: error: verify_stopped: Resource P_SESSION_IP was active at shutdown. You may ignore this error if it is unmanaged.
> > Feb 10 07:56:27 [5251] host1 crmd: info: do_lrm_control: Disconnected from the LRM
> > Feb 10 07:56:27 [5251] host1 crmd: notice: terminate_ais_connection: Disconnecting from AIS
> > Feb 10 07:56:27 [5251] host1 crmd: info: do_ha_control: Disconnected from OpenAIS
> > Feb 10 07:56:27 [5251] host1 crmd: info: do_cib_control: Disconnecting CIB
> > Feb 10 07:56:27 [5251] host1 crmd: error: send_ipc_message: IPC Channel to 5246 is not connected
> > Feb 10 07:56:27 [5251] host1 crmd: error: send_ipc_message: IPC Channel to 5246 is not connected
> > Feb 10 07:56:27 [5251] host1 crmd: error: cib_native_perform_op_delegate: Sending message to CIB service FAILED
> > Feb 10 07:56:27 [5251] host1 crmd: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
> > Feb 10 07:56:27 [5251] host1 crmd: error: verify_stopped: Resource P_SESSION_IP was active at shutdown. You may ignore this error if it is unmanaged.
> > Feb 10 07:56:27 [5251] host1 crmd: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
> > Feb 10 07:56:27 [5251] host1 crmd: error: do_exit: Could not recover from internal error
> > Feb 10 07:56:27 [5251] host1 crmd: info: free_mem: Dropping I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
> > Feb 10 07:56:27 [5251] host1 crmd: info: crm_xml_cleanup: Cleaning up memory from libxml2
> > Feb 10 07:56:27 [5251] host1 crmd: info: do_exit: [crmd] stopped (2)
> > Feb 10 07:56:27 [5242] host1 pacemakerd: error: pcmk_child_exit: Child process crmd exited (pid=5251, rc=2)
> > Feb 10 07:56:27 [5242] host1 pacemakerd: warning: send_ipc_message: IPC Channel to 5251 is not connected
> > Feb 10 07:56:27 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> > Feb 10 07:56:27 [5242] host1 pacemakerd: notice: stop_child: Stopping pengine: Sent -15 to process 5250
> > Feb 10 07:56:27 [5242] host1 pacemakerd: info: pcmk_child_exit: Child process pengine exited (pid=5250, rc=0)
> > Feb 10 07:56:27 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> > Feb 10 07:56:27 [5242] host1 pacemakerd: notice: stop_child: Stopping lrmd: Sent -15 to process 5248
> > Feb 10 07:56:27 host1 lrmd: [5248]: info: lrmd is shutting down
> > Feb 10 07:56:27 [5242] host1 pacemakerd: info: pcmk_child_exit: Child process lrmd exited (pid=5248, rc=0)
> > Feb 10 07:56:27 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> > Feb 10 07:56:27 [5242] host1 pacemakerd: notice: pcmk_shutdown_worker: Shutdown complete
> > Feb 10 07:56:27 [5242] host1 pacemakerd: info: main: Exiting pacemakerd
> >
> >
> > corosync.conf:
> >
> > compatibility: whitetank
> >
> > totem {
> >     version: 2
> >     secauth: off
> >     nodeid: 104
> >     interface {
> >         member {
> >             memberaddr: 172.17.0.104
> >         }
> >         member {
> >             memberaddr: 172.17.0.105
> >         }
> >         ringnumber: 0
> >         bindnetaddr: 172.17.0.0
> >         mcastport: 5426
> >         ttl: 1
> >     }
> >     transport: udpu
> > }
> >
> > logging {
> >     fileline: off
> >     to_logfile: yes
> >     to_syslog: yes
> >     debug: on
> >     logfile: /var/log/cluster/corosync.log
> >     debug: off
> >     timestamp: on
> >     logger_subsys {
> >         subsys: AMF
> >         debug: off
> >     }
> > }
> > service {
> >     # Load the Pacemaker Cluster Resource Manager
> >     ver: 1
> >     name: pacemaker
> > }
> >
> > aisexec {
> >     user: root
> >     group: root
> > }
> >
> >
> >
> > Thank you!
> >
> > --
> > Viacheslav Biriukov
> > BR
> > http://biriukov.me
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> >
>
>
> --
> Dan Frincu
> CCNA, RHCE
>
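Dan's diagnosis rests on the TOTEM "The network interface is down." line. To check whether every crash is preceded by such a flap, and to line those timestamps up against the DHCP client's activity in syslog, the down/up events can be pulled out of corosync.log. Below is a minimal Python sketch. It assumes the `Mon DD HH:MM:SS corosync [TOTEM ] ...` format quoted above. `interface_events` and `flaps` are hypothetical helpers and are not part of Corosync or Pacemaker.

```python
import re
import sys
from typing import Iterable, List, Optional, Tuple

# Matches the two TOTEM messages quoted in this thread, e.g.
#   "Feb 10 07:56:22 corosync [TOTEM ] The network interface is down."
#   "Feb 10 07:56:24 corosync [TOTEM ] The network interface [172.17.0.104] is now up."
# Assumption: the two-digit day format shown in the quoted log.
TOTEM_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) corosync \[TOTEM \] "
    r"The network interface (?:\[(?P<addr>[^\]]+)\] )?is (?:now )?(?P<state>down|up)\."
)

Event = Tuple[str, str, Optional[str]]  # (timestamp, "down" or "up", address or None)

# Sample lines copied from the log in this thread.
SAMPLE = [
    "Feb 10 07:56:22 corosync [TOTEM ] A processor failed, forming new configuration.",
    "Feb 10 07:56:22 corosync [TOTEM ] The network interface is down.",
    "Feb 10 07:56:24 corosync [TOTEM ] The network interface [172.17.0.104] is now up.",
]


def interface_events(lines: Iterable[str]) -> List[Event]:
    """Return one (timestamp, state, address) tuple per TOTEM interface down/up line."""
    events: List[Event] = []
    for line in lines:
        m = TOTEM_RE.match(line.strip())
        if m:
            events.append((m.group("ts"), m.group("state"), m.group("addr")))
    return events


def flaps(events: List[Event]) -> List[Tuple[str, str]]:
    """Pair each 'down' with the next 'up'; a 'down' with no later 'up' is dropped."""
    pairs: List[Tuple[str, str]] = []
    down_ts: Optional[str] = None
    for ts, state, _addr in events:
        if state == "down":
            down_ts = ts
        elif down_ts is not None:  # state == "up" after a recorded "down"
            pairs.append((down_ts, ts))
            down_ts = None
    return pairs


if __name__ == "__main__":
    # Pass a log path (e.g. /var/log/cluster/corosync.log); otherwise use SAMPLE.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            lines = fh.readlines()
    else:
        lines = SAMPLE
    for down, up in flaps(interface_events(lines)):
        print(f"interface down at {down}, back up at {up}")
```

The path in the usage comment is the `logfile` from the quoted corosync.conf. Pairing events by order is a simplification. It is enough to show whether the flaps recur at the DHCP lease interval, but it is not a full timeline.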
--
Viacheslav Biriukov
BR
http://biriukov.me
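With `transport: udpu`, each node's ring address is expected to appear among the `memberaddr` entries and to fall inside the network given by `bindnetaddr`. If DHCP hands out a different address, that expectation breaks, which is one reason Dan suggests static IPs. The sketch below checks a candidate address against the `totem` values quoted above using Python's standard `ipaddress` module. The /24 prefix is an assumption because the netmask isn't shown in the thread. `check_ring_address` is a hypothetical helper that only approximates how Corosync picks its interface; it is not Corosync's own logic.

```python
import ipaddress
from typing import List

# Values copied from the quoted totem { interface { ... } } section.
BINDNETADDR = "172.17.0.0"
MEMBERADDRS = ["172.17.0.104", "172.17.0.105"]
# ASSUMPTION: the netmask is not given in the thread; /24 is only a guess.
ASSUMED_PREFIX = 24


def check_ring_address(local_addr: str, prefix: int = ASSUMED_PREFIX) -> List[str]:
    """Return the problems found for local_addr; an empty list means both checks pass.

    Hypothetical helper: an approximation, not Corosync's interface-matching code.
    """
    problems: List[str] = []
    addr = ipaddress.ip_address(local_addr)
    network = ipaddress.ip_network(f"{BINDNETADDR}/{prefix}")
    if addr not in network:
        problems.append(f"{local_addr} is outside the bindnetaddr network {network}")
    if local_addr not in MEMBERADDRS:
        problems.append(f"{local_addr} is not listed as a udpu memberaddr")
    return problems


if __name__ == "__main__":
    # host1's address from the "is now up" log line: both checks pass.
    print(check_ring_address("172.17.0.104"))
    # A hypothetical different DHCP lease: both checks fail under the /24 guess.
    print(check_ring_address("172.17.1.23"))
```

Because corosync.conf is only read at startup, a check like this has to pass for whatever address the node will hold for the life of the Corosync process. That requirement is easiest to meet with a static address or a fixed DHCP reservation.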