On Wed, 2010-03-31 at 16:07 -0400, Simpson, John R wrote:
> Greetings all,
>
> I have a lab cluster using Pacemaker 1.0.8 and Corosync 1.2.0-1
> (see packages below) on CentOS 5.4 (32-bit) VM's running under
> VMware ESXi 3.5. My location constraints and connectivity
> tests were working well, so I was feeling really good when
> I decided to shut down the interface used for cluster
> communication and verify that it resulted in a split-brain cluster.
>
> Much to my dismay, corosync crashed almost immediately on the node
> where I shut down the Ethernet interface. I can recreate the issue
> at will on this cluster and a different cluster running a slightly
> more recent version of Pacemaker 1.0.8 and the same version of
> Corosync on CentOS 5.4 64-bit VMs.
>
> I've attached the log, but here is the most suspicious message:
>
> Mar 31 15:35:16 corosync [pcmk ] ERROR: pcmk_peer_update: Something strange happened: 1
>
> Cluster communication is on 172.16.0.0/24 (eth1) and Apache, etc. are on
> 10.127.252.0/24 (eth0).
>
> I've tried to include or attach all the relevant information -- please let me
> know if there's anything else that would be useful.
>
> Regards,
>
> John Simpson
>

I've answered this so many times on the mailing list that I've created a
FAQ for it. If the FAQ is unclear, let me know and we can add to it.

http://www.corosync.org/doku.php?id=faq:ifdown

You mentioned that Corosync crashed (segfault?), which it should not do.
To report that crash, see the following FAQ:

http://www.corosync.org/doku.php?id=faq:crash

> [r...@cy-ha01 ~]# netstat -rn
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
> 10.0.0.0        0.0.0.0         255.255.255.0   U         0 0          0 eth3
> 172.16.0.0      0.0.0.0         255.255.255.0   U         0 0          0 eth1
> 192.168.0.0     0.0.0.0         255.255.255.0   U         0 0          0 eth2
> 10.127.252.0    0.0.0.0         255.255.255.0   U         0 0          0 eth0
> 169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth3
> 224.0.0.0       0.0.0.0         240.0.0.0       U         0 0          0 eth1
> 0.0.0.0         10.127.252.1    0.0.0.0         UG        0 0          0 eth0
>
> [r...@cy-ha01 ~]# date ; ifconfig eth1 down
> Wed Mar 31 15:35:03 EDT 2010
>
> Output from crm_mon when eth1 is shut down.
> ============
> Last updated: Wed Mar 31 15:31:50 2010
> Stack: openais
> Current DC: cy-ha02 - partition with quorum
> Version: 1.0.8-2a76c6ac04bcccf42b89a08e55bfbd90da2fb49a
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
>
> Online: [ cy-ha01 cy-ha02 ]
>
>  Resource Group: WebSiteGroup
>      ServiceIP  (ocf::heartbeat:IPaddr2):       Started cy-ha01
>      WebSite    (ocf::heartbeat:apache):        Started cy-ha01
>  Clone Set: CloneConnectivityTest
>      Started: [ cy-ha02 cy-ha01 ]
> Connection to the CIB terminated
> Reconnecting................................
>
> [r...@cy-ha01 ~]# rpm -qa | grep pace
> pacemaker-libs-devel-1.0.8-1.el5
> pacemaker-1.0.8-1.el5
> pacemaker-libs-1.0.8-1.el5
> [r...@cy-ha01 ~]# rpm -qa | grep coros
> corosynclib-1.2.0-1.el5
> corosync-1.2.0-1.el5
> corosynclib-devel-1.2.0-1.el5
>
> --
> John Simpson
> Senior Software Engineer, I. T. Engineering and Operations
>
> _______________________________________________
> Pacemaker mailing list
> [email protected]
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker

_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
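
For anyone repeating this test, it can help to confirm from the routing
table which interface carries the cluster network and the multicast route
before taking anything down. Below is a minimal sketch that reads
`netstat -rn`-style output and prints the interface for a given
destination. The helper name `route_iface` is made up for illustration,
and the sample input is the routing table quoted in the message above.
It only inspects kernel routes; it is an assumption, not something it
checks, that Corosync's traffic follows them.

```shell
#!/bin/sh
# route_iface DEST
# Hypothetical helper: reads `netstat -rn`-style output on stdin and
# prints the last column (Iface) of the first row whose Destination
# column equals DEST.
route_iface() {
    awk -v dest="$1" '$1 == dest { print $NF; exit }'
}

# Sample input copied from the routing table quoted in the message.
routes='Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
10.0.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth3
172.16.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth2
10.127.252.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth3
224.0.0.0 0.0.0.0 240.0.0.0 U 0 0 0 eth1
0.0.0.0 10.127.252.1 0.0.0.0 UG 0 0 0 eth0'

# Interface for the multicast route, then for the cluster subnet.
printf '%s\n' "$routes" | route_iface 224.0.0.0
printf '%s\n' "$routes" | route_iface 172.16.0.0
```

With the quoted table, both lookups print eth1, which matches the
statement that cluster communication is on eth1. On a live node the same
helper could be fed directly: `netstat -rn | route_iface 224.0.0.0`.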
