We need something like a VIP for our MySQL servers (for example), with automatic migration when something goes wrong. If you have a better solution, please suggest it. As for dynamic IP addresses: they are not a problem for us. After a boot (which does not happen every day) we reconfigure the cluster using Pacemaker's maintenance mode.
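To make the "reconfigure after boot" step concrete, here is a minimal sketch of how the unicast member list in corosync.conf could be regenerated from the addresses the nodes received. It is an illustration only, not our actual tooling: the function name `render_udpu_interface` and the /24 prefix are assumptions, while the addresses and the port are the ones from the config quoted below. Corosync still has to be restarted to pick up the new file.

```python
import ipaddress


def render_udpu_interface(member_addrs, local_cidr, mcastport=5426):
    """Return a corosync 1.x 'interface' block listing unicast (udpu) members.

    member_addrs: current IP addresses of all cluster nodes.
    local_cidr:   this node's address with prefix length, e.g. "172.17.0.104/24"
                  (the prefix length is an assumption, it is not in the thread).
    """
    # bindnetaddr is the network address of the local interface.
    network = ipaddress.ip_interface(local_cidr).network
    lines = ["interface {"]
    for addr in member_addrs:
        ipaddress.ip_address(addr)  # raises ValueError on a malformed address
        lines += ["    member {", f"        memberaddr: {addr}", "    }"]
    lines += [
        "    ringnumber: 0",
        f"    bindnetaddr: {network.network_address}",
        f"    mcastport: {mcastport}",
        "    ttl: 1",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_udpu_interface(["172.17.0.104", "172.17.0.105"],
                                "172.17.0.104/24"))
```

The output is only the `interface { ... }` part; it would still need to be placed inside the `totem { ... }` section next to `transport: udpu`.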
2013/2/11 Andrew Beekhof <and...@beekhof.net>

> On Mon, Feb 11, 2013 at 9:24 PM, Viacheslav Biriukov
> <v.v.biriu...@gmail.com> wrote:
> > It is VM in the OpenStack. So we can't use static IP.
> > Right now investigating why interface become down.
>
> Even if you solve that, dynamic IP addresses are fundamentally
> incompatible with cluster software.
> You're effectively trying to create a cluster out of nodes which
> change their name every time they boot.
>
> >
> > Thank you!
> >
> >
> > 2013/2/11 Viacheslav Biriukov <v.v.biriu...@gmail.com>
> >>
> >> 2013/2/11 Dan Frincu <df.clus...@gmail.com>
> >>>
> >>> Hi,
> >>>
> >>> On Sun, Feb 10, 2013 at 2:24 PM, Viacheslav Biriukov
> >>> <v.v.biriu...@gmail.com> wrote:
> >>> > Hi guys,
> >>> >
> >>> > Got a tricky issue with Corosync and Pacemaker over DHCP IP address
> >>> > using unicast. Corosync craches periodically.
> >>> >
> >>> > Packages are from centos 6 repos:
> >>> > corosync-1.4.1-7.el6_3.1.x86_64
> >>> > corosynclib-1.4.1-7.el6_3.1.x86_64
> >>> > pacemaker-cluster-libs-1.1.7-6.el6.x86_64
> >>> > pacemaker-libs-1.1.7-6.el6.x86_64
> >>> > pacemaker-cli-1.1.7-6.el6.x86_64
> >>> > pacemaker-1.1.7-6.el6.x86_64
> >>> >
> >>> >
> >>> > Logs
> >>> >
> >>> > Feb 09 23:24:33 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> >>> > Feb 10 00:24:39 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> >>> > Feb 10 01:24:44 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> >>> > Feb 10 02:24:48 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> >>> > Feb 10 03:24:51 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> >>> > Feb 10 04:24:52 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> >>> > Feb 10 05:24:54 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> >>> > Feb 10 06:25:00 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> >>> > Feb 10 07:25:06 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> >>> > Feb 10 07:56:22 corosync [TOTEM ] A processor failed, forming new configuration.
> >>> > Feb 10 07:56:22 corosync [TOTEM ] The network interface is down.
> >>>
> >>> This ^^^ is your problem. Corosync doesn't like it, see
> >>>
> >>> https://github.com/corosync/corosync/wiki/Corosync-and-ifdown-on-active-network-interface
> >>>
> >>> Normally DHCP shouldn't take the interface down. Also, since changing
> >>> the network configuration in corosync means restarting it, why not go
> >>> with static IP's?
> >>>
> >>> HTH,
> >>> Dan
> >>>
> >>> > Feb 10 07:56:24 corosync [TOTEM ] The network interface [172.17.0.104] is now up.
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: cfg_connection_destroy: Connection destroyed
> >>> > Feb 10 07:56:25 [5251] host1 crmd: error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
> >>> > Feb 10 07:56:25 [5246] host1 cib: error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
> >>> > Feb 10 07:56:25 [5249] host1 attrd: error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
> >>> > Feb 10 07:56:25 [5251] host1 crmd: error: ais_dispatch: AIS connection failed
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: cpg_connection_destroy: Connection destroyed
> >>> > Feb 10 07:56:25 [5246] host1 cib: error: ais_dispatch: AIS connection failed
> >>> > Feb 10 07:56:25 [5251] host1 crmd: info: crmd_ais_destroy: connection closed
> >>> > Feb 10 07:56:25 [5249] host1 attrd: error: ais_dispatch: AIS connection failed
> >>> > Feb 10 07:56:25 [5247] host1 stonith-ng: error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
> >>> > Feb 10 07:56:25 [5246] host1 cib: error: cib_ais_destroy: AIS connection terminated
> >>> > Feb 10 07:56:25 [5249] host1 attrd: crit: attrd_ais_destroy: Lost connection to OpenAIS service!
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd: notice: pcmk_shutdown_worker: Shuting down Pacemaker
> >>> > Feb 10 07:56:25 [5247] host1 stonith-ng: error: ais_dispatch: AIS connection failed
> >>> > Feb 10 07:56:25 [5249] host1 attrd: notice: main: Exiting...
> >>> > Feb 10 07:56:25 [5247] host1 stonith-ng: error: stonith_peer_ais_destroy: AIS connection terminated
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd: notice: stop_child: Stopping crmd: Sent -15 to process 5251
> >>> > Feb 10 07:56:25 [5249] host1 attrd: error: attrd_cib_connection_destroy: Connection to the CIB terminated...
> >>> > Feb 10 07:56:25 [5251] host1 crmd: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>> > Feb 10 07:56:25 [5251] host1 crmd: notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
> >>> > Feb 10 07:56:25 [5251] host1 crmd: info: do_shutdown_req: Sending shutdown request to host2
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: pcmk_child_exit: Child process stonith-ng exited (pid=5247, rc=1)
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd: warning: send_ipc_message: IPC Channel to 5249 is not connected
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd: warning: send_ipc_message: IPC Channel to 5246 is not connected
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd: warning: send_ipc_message: IPC Channel to 5247 is not connected
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: pcmk_child_exit: Child process cib exited (pid=5246, rc=1)
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: pcmk_child_exit: Child process attrd exited (pid=5249, rc=1)
> >>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> >>> > Feb 10 07:56:27 [5251] host1 crmd: error: send_ais_text: Sending message 68 via pcmk: FAILED (rc=2): Library error: Connection timed out (110)
> >>> > Feb 10 07:56:27 [5251] host1 crmd: error: do_log: FSA: Input I_ERROR from do_shutdown_req() received in state S_NOT_DC
> >>> > Feb 10 07:56:27 [5251] host1 crmd: notice: do_state_transition: State transition S_NOT_DC -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=do_shutdown_req ]
> >>> > Feb 10 07:56:27 [5251] host1 crmd: error: do_recover: Action A_RECOVER (0000000001000000) not supported
> >>> > Feb 10 07:56:27 [5251] host1 crmd: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
> >>> > Feb 10 07:56:27 [5251] host1 crmd: notice: do_state_transition: State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
> >>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_shutdown: Disconnecting STONITH...
> >>> > Feb 10 07:56:27 [5251] host1 crmd: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
> >>> > Feb 10 07:56:27 host1 lrmd: [5248]: info: cancel_op: operation monitor[25] on ocf::OpenStackFloatingIP::P_SESSION_IP for client 5251, its parameters: CRM_meta_name=[monitor] crm_feature_set=[3.0.6] CRM_meta_timeout=[20000] CRM_meta_interval=[5000] ip=[172.24.0.104] cancelled
> >>> > Feb 10 07:56:27 [5251] host1 crmd: error: verify_stopped: Resource P_SESSION_IP was active at shutdown. You may ignore this error if it is unmanaged.
> >>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_lrm_control: Disconnected from the LRM
> >>> > Feb 10 07:56:27 [5251] host1 crmd: notice: terminate_ais_connection: Disconnecting from AIS
> >>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_ha_control: Disconnected from OpenAIS
> >>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_cib_control: Disconnecting CIB
> >>> > Feb 10 07:56:27 [5251] host1 crmd: error: send_ipc_message: IPC Channel to 5246 is not connected
> >>> > Feb 10 07:56:27 [5251] host1 crmd: error: send_ipc_message: IPC Channel to 5246 is not connected
> >>> > Feb 10 07:56:27 [5251] host1 crmd: error: cib_native_perform_op_delegate: Sending message to CIB service FAILED
> >>> > Feb 10 07:56:27 [5251] host1 crmd: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
> >>> > Feb 10 07:56:27 [5251] host1 crmd: error: verify_stopped: Resource P_SESSION_IP was active at shutdown. You may ignore this error if it is unmanaged.
> >>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
> >>> > Feb 10 07:56:27 [5251] host1 crmd: error: do_exit: Could not recover from internal error
> >>> > Feb 10 07:56:27 [5251] host1 crmd: info: free_mem: Dropping I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
> >>> > Feb 10 07:56:27 [5251] host1 crmd: info: crm_xml_cleanup: Cleaning up memory from libxml2
> >>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_exit: [crmd] stopped (2)
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd: error: pcmk_child_exit: Child process crmd exited (pid=5251, rc=2)
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd: warning: send_ipc_message: IPC Channel to 5251 is not connected
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd: notice: stop_child: Stopping pengine: Sent -15 to process 5250
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd: info: pcmk_child_exit: Child process pengine exited (pid=5250, rc=0)
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd: notice: stop_child: Stopping lrmd: Sent -15 to process 5248
> >>> > Feb 10 07:56:27 host1 lrmd: [5248]: info: lrmd is shutting down
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd: info: pcmk_child_exit: Child process lrmd exited (pid=5248, rc=0)
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd: notice: pcmk_shutdown_worker: Shutdown complete
> >>> > Feb 10 07:56:27 [5242] host1 pacemakerd: info: main: Exiting pacemakerd
> >>> >
> >>> >
> >>> > corosync.conf:
> >>> >
> >>> > compatibility: whitetank
> >>> >
> >>> > totem {
> >>> >     version: 2
> >>> >     secauth: off
> >>> >     nodeid: 104
> >>> >     interface {
> >>> >         member {
> >>> >             memberaddr: 172.17.0.104
> >>> >         }
> >>> >         member {
> >>> >             memberaddr: 172.17.0.105
> >>> >         }
> >>> >         ringnumber: 0
> >>> >         bindnetaddr: 172.17.0.0
> >>> >         mcastport: 5426
> >>> >         ttl: 1
> >>> >     }
> >>> >     transport: udpu
> >>> > }
> >>> >
> >>> > logging {
> >>> >     fileline: off
> >>> >     to_logfile: yes
> >>> >     to_syslog: yes
> >>> >     debug: on
> >>> >     logfile: /var/log/cluster/corosync.log
> >>> >     debug: off
> >>> >     timestamp: on
> >>> >     logger_subsys {
> >>> >         subsys: AMF
> >>> >         debug: off
> >>> >     }
> >>> > }
> >>> > service {
> >>> >     # Load the Pacemaker Cluster Resource Manager
> >>> >     ver: 1
> >>> >     name: pacemaker
> >>> > }
> >>> >
> >>> > aisexec {
> >>> >     user: root
> >>> >     group: root
> >>> > }
> >>> >
> >>> >
> >>> > Thank you!
> >>> >
> >>> > --
> >>> > Viacheslav Biriukov
> >>> > BR
> >>> > http://biriukov.me
> >>> >
> >>> > _______________________________________________
> >>> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> >>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>> >
> >>> > Project Home: http://www.clusterlabs.org
> >>> > Getting started:
> >>> > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>> > Bugs: http://bugs.clusterlabs.org
> >>>
> >>> --
> >>> Dan Frincu
> >>> CCNA, RHCE

--
Viacheslav Biriukov
BR
http://biriukov.me