----- Original Message -----
> From: fatcha...@gmx.de
> To: pacemaker@oss.clusterlabs.org
> Sent: Friday, July 20, 2012 6:08:45 AM
> Subject: [Pacemaker] problem with pacemaker/corosync on CentOS 6.3
>
> Hi,
>
> I'm using a pacemaker+corosync bundle to run a pound based
> loadbalancer. After an update to CentOS 6.3 there is some mismatch
> of the node status. Via crm_mon on one node everything looks fine
> while on the other node everything is offline. Everything was fine
> on CentOS 6.2.
>
> Node powerpound:
>
> ============
> Last updated: Fri Jul 20 12:04:29 2012
> Last change: Thu Jul 19 17:58:31 2012 via crm_attribute on pilotpound
> Stack: openais
> Current DC: powerpound - partition with quorum
> Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
> 2 Nodes configured, 2 expected votes
> 7 Resources configured.
> ============
>
> Online: [ powerpound pilotpound ]
>
> HA_IP_1 (ocf::heartbeat:IPaddr2): Started powerpound
> HA_IP_2 (ocf::heartbeat:IPaddr2): Started powerpound
> HA_IP_3 (ocf::heartbeat:IPaddr2): Started powerpound
> HA_IP_4 (ocf::heartbeat:IPaddr2): Started powerpound
> HA_IP_5 (ocf::heartbeat:IPaddr2): Started powerpound
> Clone Set: pingclone [ping-gateway]
>     Started: [ pilotpound powerpound ]
>
>
> Node pilotpound:
>
> ============
> Last updated: Fri Jul 20 12:04:32 2012
> Last change: Thu Jul 19 17:58:17 2012 via crm_attribute on pilotpound
> Stack: openais
> Current DC: NONE
> 2 Nodes configured, 2 expected votes
> 7 Resources configured.
> ============
>
> OFFLINE: [ powerpound pilotpound ]
>
>
> From /var/log/messages on pilotpound:
>
> Jul 20 12:06:12 pilotpound cib[24755]: warning: cib_peer_callback: Discarding cib_apply_diff message (35909) from powerpound: not in our membership
> Jul 20 12:06:12 pilotpound cib[24755]: warning: cib_peer_callback: Discarding cib_apply_diff message (35910) from powerpound: not in our membership
>
>
> How could this have happened, and what can I do to solve this problem?
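To see how widespread the symptom in the quoted log is, a rough sketch like the one below can count the discarded CIB messages per sending node. The function name count_discarded is made up for illustration, and the pattern assumes the log lines look like the two quoted above.

```shell
#!/bin/sh
# Count "not in our membership" warnings per sending peer.
# count_discarded is a hypothetical helper name, not a Pacemaker tool.
count_discarded() {
    # $1: log file to scan (the original post read /var/log/messages)
    grep 'not in our membership' "$1" |
        # Pull out the node name that follows "from" in the warning text.
        sed -n 's/.*Discarding .* message ([0-9]*) from \([^:]*\):.*/\1/p' |
        sort | uniq -c
}
```

Running `count_discarded /var/log/messages` on the node that shows everything OFFLINE prints one line per peer, with the number of rejected messages in front of the peer name.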
Pretty sure it had nothing to do with the upgrade - I had this the other day on Ubuntu 12.04 after a reboot of both nodes. I believe a couple of experts called it a "transient" bug. See:
https://bugzilla.redhat.com/show_bug.cgi?id=820821
https://bugzilla.redhat.com/show_bug.cgi?id=5040

>
> Any suggestions are welcome

I fixed it by stopping/killing pacemaker and corosync on the offending node (pilotpound). Then I cleared these files out on the same node:

rm /var/lib/heartbeat/crm/cib*
rm /var/lib/pengine/*

Then I restarted corosync/pacemaker and the node rejoined fine.

HTH

Jake

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
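For anyone repeating the fix described in the reply, here is a minimal sketch of the same steps as a shell function. Several things in it are assumptions rather than part of the original mail: that both daemons are managed through `service` init scripts (if pacemaker is loaded by the corosync plugin instead, the pacemaker stop/start lines may not apply), that the stale files are moved to a backup directory instead of being removed with rm, and the names reset_node_state, run, BACKUP_DIR, and DRY_RUN, which are purely illustrative. It defaults to a dry run that only prints what it would do.

```shell
#!/bin/sh
# Sketch only: reset the local cluster state on the offending node.
# Paths default to the ones given in the reply; BACKUP_DIR is hypothetical.
CIB_DIR="${CIB_DIR:-/var/lib/heartbeat/crm}"
PE_DIR="${PE_DIR:-/var/lib/pengine}"
BACKUP_DIR="${BACKUP_DIR:-/root/cluster-state-backup}"
DRY_RUN="${DRY_RUN:-1}"    # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reset_node_state() {
    # Assumption: init scripts named "pacemaker" and "corosync" exist.
    run service pacemaker stop
    run service corosync stop

    # Move the files aside instead of deleting them (a deliberate change
    # from the rm commands in the reply, so they can be restored).
    run mkdir -p "$BACKUP_DIR/crm" "$BACKUP_DIR/pengine"
    for f in "$CIB_DIR"/cib*; do
        [ -e "$f" ] || continue
        run mv "$f" "$BACKUP_DIR/crm/"
    done
    for f in "$PE_DIR"/*; do
        [ -e "$f" ] || continue
        run mv "$f" "$BACKUP_DIR/pengine/"
    done

    run service corosync start
    run service pacemaker start
}
```

Calling `reset_node_state` as-is prints the planned commands. Once the output looks right, setting `DRY_RUN=0` and running it as root on the offending node only (pilotpound in this thread) would carry the steps out.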