It is a VM in OpenStack, so we can't use a static IP. Right now I'm investigating why the interface goes down.
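In case it helps with the investigation, this is the kind of check I'm running — a minimal sketch (my own helper, nothing standard) to pull the TOTEM interface up/down events out of the corosync log so they can be lined up against DHCP lease renewals in /var/log/messages:

```shell
# flaps: grep the TOTEM interface up/down events out of a corosync log,
# so they can be correlated with dhclient activity.
flaps() {
    grep -E 'TOTEM.*The network interface' "$1"
}

# Usage (logfile path taken from the corosync.conf quoted below):
# flaps /var/log/cluster/corosync.log
```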
Thank you!

2013/2/11 Viacheslav Biriukov <v.v.biriu...@gmail.com>
>
> 2013/2/11 Dan Frincu <df.clus...@gmail.com>
>> Hi,
>>
>> On Sun, Feb 10, 2013 at 2:24 PM, Viacheslav Biriukov
>> <v.v.biriu...@gmail.com> wrote:
>> > Hi guys,
>> >
>> > Got a tricky issue with Corosync and Pacemaker over a DHCP IP address
>> > using unicast. Corosync crashes periodically.
>> >
>> > Packages are from the CentOS 6 repos:
>> > corosync-1.4.1-7.el6_3.1.x86_64
>> > corosynclib-1.4.1-7.el6_3.1.x86_64
>> > pacemaker-cluster-libs-1.1.7-6.el6.x86_64
>> > pacemaker-libs-1.1.7-6.el6.x86_64
>> > pacemaker-cli-1.1.7-6.el6.x86_64
>> > pacemaker-1.1.7-6.el6.x86_64
>> >
>> > Logs:
>> >
>> > Feb 09 23:24:33 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>> > Feb 10 00:24:39 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>> > Feb 10 01:24:44 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>> > Feb 10 02:24:48 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>> > Feb 10 03:24:51 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>> > Feb 10 04:24:52 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>> > Feb 10 05:24:54 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>> > Feb 10 06:25:00 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>> > Feb 10 07:25:06 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>> > Feb 10 07:56:22 corosync [TOTEM ] A processor failed, forming new configuration.
>> > Feb 10 07:56:22 corosync [TOTEM ] The network interface is down.
>>
>> This ^^^ is your problem. Corosync doesn't like it, see
>> https://github.com/corosync/corosync/wiki/Corosync-and-ifdown-on-active-network-interface
>>
>> Normally DHCP shouldn't take the interface down. Also, since changing
>> the network configuration in corosync means restarting it, why not go
>> with static IPs?
>>
>> HTH,
>> Dan
>>
>> > Feb 10 07:56:24 corosync [TOTEM ] The network interface [172.17.0.104] is now up.
>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: cfg_connection_destroy: Connection destroyed
>> > Feb 10 07:56:25 [5251] host1 crmd: error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
>> > Feb 10 07:56:25 [5246] host1 cib: error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
>> > Feb 10 07:56:25 [5249] host1 attrd: error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
>> > Feb 10 07:56:25 [5251] host1 crmd: error: ais_dispatch: AIS connection failed
>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: cpg_connection_destroy: Connection destroyed
>> > Feb 10 07:56:25 [5246] host1 cib: error: ais_dispatch: AIS connection failed
>> > Feb 10 07:56:25 [5251] host1 crmd: info: crmd_ais_destroy: connection closed
>> > Feb 10 07:56:25 [5249] host1 attrd: error: ais_dispatch: AIS connection failed
>> > Feb 10 07:56:25 [5247] host1 stonith-ng: error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
>> > Feb 10 07:56:25 [5246] host1 cib: error: cib_ais_destroy: AIS connection terminated
>> > Feb 10 07:56:25 [5249] host1 attrd: crit: attrd_ais_destroy: Lost connection to OpenAIS service!
>> > Feb 10 07:56:25 [5242] host1 pacemakerd: notice: pcmk_shutdown_worker: Shuting down Pacemaker
>> > Feb 10 07:56:25 [5247] host1 stonith-ng: error: ais_dispatch: AIS connection failed
>> > Feb 10 07:56:25 [5249] host1 attrd: notice: main: Exiting...
>> > Feb 10 07:56:25 [5247] host1 stonith-ng: error: stonith_peer_ais_destroy: AIS connection terminated
>> > Feb 10 07:56:25 [5242] host1 pacemakerd: notice: stop_child: Stopping crmd: Sent -15 to process 5251
>> > Feb 10 07:56:25 [5249] host1 attrd: error: attrd_cib_connection_destroy: Connection to the CIB terminated...
>> > Feb 10 07:56:25 [5251] host1 crmd: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> > Feb 10 07:56:25 [5251] host1 crmd: notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
>> > Feb 10 07:56:25 [5251] host1 crmd: info: do_shutdown_req: Sending shutdown request to host2
>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: pcmk_child_exit: Child process stonith-ng exited (pid=5247, rc=1)
>> > Feb 10 07:56:25 [5242] host1 pacemakerd: warning: send_ipc_message: IPC Channel to 5249 is not connected
>> > Feb 10 07:56:25 [5242] host1 pacemakerd: warning: send_ipc_message: IPC Channel to 5246 is not connected
>> > Feb 10 07:56:25 [5242] host1 pacemakerd: warning: send_ipc_message: IPC Channel to 5247 is not connected
>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: pcmk_child_exit: Child process cib exited (pid=5246, rc=1)
>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: pcmk_child_exit: Child process attrd exited (pid=5249, rc=1)
>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>> > Feb 10 07:56:27 [5251] host1 crmd: error: send_ais_text: Sending message 68 via pcmk: FAILED (rc=2): Library error: Connection timed out (110)
>> > Feb 10 07:56:27 [5251] host1 crmd: error: do_log: FSA: Input I_ERROR from do_shutdown_req() received in state S_NOT_DC
>> > Feb 10 07:56:27 [5251] host1 crmd: notice: do_state_transition: State transition S_NOT_DC -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=do_shutdown_req ]
>> > Feb 10 07:56:27 [5251] host1 crmd: error: do_recover: Action A_RECOVER (0000000001000000) not supported
>> > Feb 10 07:56:27 [5251] host1 crmd: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
>> > Feb 10 07:56:27 [5251] host1 crmd: notice: do_state_transition: State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_shutdown: Disconnecting STONITH...
>> > Feb 10 07:56:27 [5251] host1 crmd: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
>> > Feb 10 07:56:27 host1 lrmd: [5248]: info: cancel_op: operation monitor[25] on ocf::OpenStackFloatingIP::P_SESSION_IP for client 5251, its parameters: CRM_meta_name=[monitor] crm_feature_set=[3.0.6] CRM_meta_timeout=[20000] CRM_meta_interval=[5000] ip=[172.24.0.104] cancelled
>> > Feb 10 07:56:27 [5251] host1 crmd: error: verify_stopped: Resource P_SESSION_IP was active at shutdown. You may ignore this error if it is unmanaged.
>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_lrm_control: Disconnected from the LRM
>> > Feb 10 07:56:27 [5251] host1 crmd: notice: terminate_ais_connection: Disconnecting from AIS
>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_ha_control: Disconnected from OpenAIS
>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_cib_control: Disconnecting CIB
>> > Feb 10 07:56:27 [5251] host1 crmd: error: send_ipc_message: IPC Channel to 5246 is not connected
>> > Feb 10 07:56:27 [5251] host1 crmd: error: send_ipc_message: IPC Channel to 5246 is not connected
>> > Feb 10 07:56:27 [5251] host1 crmd: error: cib_native_perform_op_delegate: Sending message to CIB service FAILED
>> > Feb 10 07:56:27 [5251] host1 crmd: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
>> > Feb 10 07:56:27 [5251] host1 crmd: error: verify_stopped: Resource P_SESSION_IP was active at shutdown. You may ignore this error if it is unmanaged.
>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
>> > Feb 10 07:56:27 [5251] host1 crmd: error: do_exit: Could not recover from internal error
>> > Feb 10 07:56:27 [5251] host1 crmd: info: free_mem: Dropping I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
>> > Feb 10 07:56:27 [5251] host1 crmd: info: crm_xml_cleanup: Cleaning up memory from libxml2
>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_exit: [crmd] stopped (2)
>> > Feb 10 07:56:27 [5242] host1 pacemakerd: error: pcmk_child_exit: Child process crmd exited (pid=5251, rc=2)
>> > Feb 10 07:56:27 [5242] host1 pacemakerd: warning: send_ipc_message: IPC Channel to 5251 is not connected
>> > Feb 10 07:56:27 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>> > Feb 10 07:56:27 [5242] host1 pacemakerd: notice: stop_child: Stopping pengine: Sent -15 to process 5250
>> > Feb 10 07:56:27 [5242] host1 pacemakerd: info: pcmk_child_exit: Child process pengine exited (pid=5250, rc=0)
>> > Feb 10 07:56:27 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>> > Feb 10 07:56:27 [5242] host1 pacemakerd: notice: stop_child: Stopping lrmd: Sent -15 to process 5248
>> > Feb 10 07:56:27 host1 lrmd: [5248]: info: lrmd is shutting down
>> > Feb 10 07:56:27 [5242] host1 pacemakerd: info: pcmk_child_exit: Child process lrmd exited (pid=5248, rc=0)
>> > Feb 10 07:56:27 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>> > Feb 10 07:56:27 [5242] host1 pacemakerd: notice: pcmk_shutdown_worker: Shutdown complete
>> > Feb 10 07:56:27 [5242] host1 pacemakerd: info: main: Exiting pacemakerd
>> >
>> > corosync.conf:
>> >
>> > compatibility: whitetank
>> >
>> > totem {
>> >     version: 2
>> >     secauth: off
>> >     nodeid: 104
>> >     interface {
>> >         member {
>> >             memberaddr: 172.17.0.104
>> >         }
>> >         member {
>> >             memberaddr: 172.17.0.105
>> >         }
>> >         ringnumber: 0
>> >         bindnetaddr: 172.17.0.0
>> >         mcastport: 5426
>> >         ttl: 1
>> >     }
>> >     transport: udpu
>> > }
>> >
>> > logging {
>> >     fileline: off
>> >     to_logfile: yes
>> >     to_syslog: yes
>> >     debug: on
>> >     logfile: /var/log/cluster/corosync.log
>> >     debug: off
>> >     timestamp: on
>> >     logger_subsys {
>> >         subsys: AMF
>> >         debug: off
>> >     }
>> > }
>> >
>> > service {
>> >     # Load the Pacemaker Cluster Resource Manager
>> >     ver: 1
>> >     name: pacemaker
>> > }
>> >
>> > aisexec {
>> >     user: root
>> >     group: root
>> > }
>> >
>> > Thank you!
>> >
>> > --
>> > Viacheslav Biriukov
>> > BR
>> > http://biriukov.me
>>
>> --
>> Dan Frincu
>> CCNA, RHCE
>
> --
> Viacheslav Biriukov
> BR
> http://biriukov.me

--
Viacheslav Biriukov
BR
http://biriukov.me
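P.S. One thing we are going to try, since Dan points at dhclient taking the interface down: telling the CentOS 6 initscripts to keep dhclient persistent and keep NetworkManager off the NIC. This is a sketch we haven't verified yet, and the device name is an assumption — adjust to your interface:

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0  (sketch, device name assumed)
DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes
# Keep dhclient running across failed lease renewals instead of
# tearing the interface down (which corosync cannot survive).
PERSISTENT_DHCLIENT=yes
# Keep NetworkManager from managing (and bouncing) this interface.
NM_CONTROLLED=no
```

If that still flaps the link, static IPs as Dan suggests look unavoidable.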
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org