Hi,

I'm glad to report that after a restart of all corosync and pacemaker services the cluster is back to normal operation. A manual failover works fine and everything shifts over smoothly.

Thanks to everyone for their support!

Kind regards

fatcharly

-------- Original Message --------
> Date: Tue, 24 Jul 2012 15:13:39 +0200
> From: fatcha...@gmx.de
> To: Jake Smith <jsm...@argotec.com>, The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
> Subject: Re: [Pacemaker] problem with pacemaker/corosync on CentOS 6.3

> Hi,
>
> here are the results of the corosync status. I can't find a problem there:
>
> pilotpound:
>
> [root@pilotpound ~]# corosync-cfgtool -s
> Printing ring status.
> Local node ID 425699520
> RING ID 0
> id = 192.168.95.25
> status = ring 0 active with no faults
> RING ID 1
> id = 192.168.20.245
> status = ring 1 active with no faults
> [root@pilotpound ~]# corosync-objctl | grep member
> runtime.totem.pg.mrp.srp.members.425699520.ip=r(0) ip(192.168.95.25) r(1) ip(192.168.20.245)
> runtime.totem.pg.mrp.srp.members.425699520.join_count=1
> runtime.totem.pg.mrp.srp.members.425699520.status=joined
> runtime.totem.pg.mrp.srp.members.442476736.ip=r(0) ip(192.168.95.26) r(1) ip(192.168.20.246)
> runtime.totem.pg.mrp.srp.members.442476736.join_count=1
> runtime.totem.pg.mrp.srp.members.442476736.status=joined
>
>
> powerpound:
>
> [root@powerpound ~]# corosync-cfgtool -s
> Printing ring status.
> Local node ID 442476736
> RING ID 0
> id = 192.168.95.26
> status = ring 0 active with no faults
> RING ID 1
> id = 192.168.20.246
> status = ring 1 active with no faults
> [root@powerpound ~]# corosync-objctl | grep member
> runtime.totem.pg.mrp.srp.members.442476736.ip=r(0) ip(192.168.95.26) r(1) ip(192.168.20.246)
> runtime.totem.pg.mrp.srp.members.442476736.join_count=1
> runtime.totem.pg.mrp.srp.members.442476736.status=joined
> runtime.totem.pg.mrp.srp.members.425699520.ip=r(0) ip(192.168.95.25) r(1) ip(192.168.20.245)
> runtime.totem.pg.mrp.srp.members.425699520.join_count=5
> runtime.totem.pg.mrp.srp.members.425699520.status=joined
>
> So I think I've got to swallow the bitter pill and restart the whole
> cluster.
>
> I will report on the result.
>
> Kind regards
>
> fatcharly
>
>
> -------- Original Message --------
> > Date: Fri, 20 Jul 2012 12:21:47 -0400 (EDT)
> > From: Jake Smith <jsm...@argotec.com>
> > To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
> > Subject: Re: [Pacemaker] problem with pacemaker/corosync on CentOS 6.3
> >
> >
> > ----- Original Message -----
> > > From: fatcha...@gmx.de
> > > To: "Jake Smith" <jsm...@argotec.com>, "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
> > > Sent: Friday, July 20, 2012 11:50:52 AM
> > > Subject: Re: [Pacemaker] problem with pacemaker/corosync on CentOS 6.3
> > >
> > > Hi Jake,
> > >
> > > I erased the files as mentioned and started the services. This is
> > > what I get on pilotpound after crm_mon:
> > >
> > > ============
> > > Last updated: Fri Jul 20 17:45:58 2012
> > > Last change:
> > > Current DC: NONE
> > > 0 Nodes configured, unknown expected votes
> > > 0 Resources configured.
> > > ============
> > >
> > >
> > > Looks like the system didn't join the cluster.
> > >
> > > Any suggestions are welcome
> >
> > Oh, it may be worth checking corosync membership to see what it says now:
> > http://www.hastexo.com/resources/hints-and-kinks/checking-corosync-cluster-membership
> >
> > >
> > > Kind regards
> > >
> > > fatcharly
> > >
> > > -------- Original Message --------
> > > > Date: Fri, 20 Jul 2012 10:49:15 -0400 (EDT)
> > > > From: Jake Smith <jsm...@argotec.com>
> > > > To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
> > > > Subject: Re: [Pacemaker] problem with pacemaker/corosync on CentOS 6.3
> > > >
> > > >
> > > > ----- Original Message -----
> > > > > From: fatcha...@gmx.de
> > > > > To: pacemaker@oss.clusterlabs.org
> > > > > Sent: Friday, July 20, 2012 6:08:45 AM
> > > > > Subject: [Pacemaker] problem with pacemaker/corosync on CentOS 6.3
> > > > >
> > > > > Hi,
> > > > >
> > > > > I'm using a pacemaker+corosync bundle to run a pound-based
> > > > > load balancer. After an update to CentOS 6.3 there is a mismatch
> > > > > in the node status. Via crm_mon, everything looks fine on one
> > > > > node while on the other node everything is offline. Everything
> > > > > was fine on CentOS 6.2.
> > > > >
> > > > > Node powerpound:
> > > > >
> > > > > ============
> > > > > Last updated: Fri Jul 20 12:04:29 2012
> > > > > Last change: Thu Jul 19 17:58:31 2012 via crm_attribute on pilotpound
> > > > > Stack: openais
> > > > > Current DC: powerpound - partition with quorum
> > > > > Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
> > > > > 2 Nodes configured, 2 expected votes
> > > > > 7 Resources configured.
> > > > > ============
> > > > >
> > > > > Online: [ powerpound pilotpound ]
> > > > >
> > > > > HA_IP_1 (ocf::heartbeat:IPaddr2): Started powerpound
> > > > > HA_IP_2 (ocf::heartbeat:IPaddr2): Started powerpound
> > > > > HA_IP_3 (ocf::heartbeat:IPaddr2): Started powerpound
> > > > > HA_IP_4 (ocf::heartbeat:IPaddr2): Started powerpound
> > > > > HA_IP_5 (ocf::heartbeat:IPaddr2): Started powerpound
> > > > > Clone Set: pingclone [ping-gateway]
> > > > >     Started: [ pilotpound powerpound ]
> > > > >
> > > > >
> > > > > Node pilotpound:
> > > > >
> > > > > ============
> > > > > Last updated: Fri Jul 20 12:04:32 2012
> > > > > Last change: Thu Jul 19 17:58:17 2012 via crm_attribute on pilotpound
> > > > > Stack: openais
> > > > > Current DC: NONE
> > > > > 2 Nodes configured, 2 expected votes
> > > > > 7 Resources configured.
> > > > > ============
> > > > >
> > > > > OFFLINE: [ powerpound pilotpound ]
> > > > >
> > > > >
> > > > > From /var/log/messages on pilotpound:
> > > > >
> > > > > Jul 20 12:06:12 pilotpound cib[24755]: warning: cib_peer_callback: Discarding cib_apply_diff message (35909) from powerpound: not in our membership
> > > > > Jul 20 12:06:12 pilotpound cib[24755]: warning: cib_peer_callback: Discarding cib_apply_diff message (35910) from powerpound: not in our membership
> > > > >
> > > > >
> > > > > How could this happen, and what can I do to solve this problem?
> > > >
> > > > Pretty sure it had nothing to do with the upgrade - I had this the
> > > > other day on Ubuntu 12.04 after a reboot of both nodes. I believe a
> > > > couple of experts called it a "transient" bug.
> > > > See:
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=820821
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=5040
> > > >
> > > > >
> > > > > Any suggestions are welcome
> > > >
> > > > I fixed it by stopping/killing pacemaker/corosync on the offending
> > > > node (pilotpound). Then I cleared these files out on the same node:
> > > > rm /var/lib/heartbeat/crm/cib*
> > > > rm /var/lib/pengine/*
> > > >
> > > > Then I restarted corosync/pacemaker and the node rejoined fine.
> > > >
> > > > HTH
> > > >
> > > > Jake
> > > >
> > > > _______________________________________________
> > > > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > > >
> > > > Project Home: http://www.clusterlabs.org
> > > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > > Bugs: http://bugs.clusterlabs.org
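
For anyone who wants to compare the `corosync-objctl | grep member` output of both nodes without reading it by eye, here is a minimal sketch in Python. It is only an illustration, not part of corosync or pacemaker: the names `parse_members` and `membership_problems` are made up for this example, and the sample text is the output posted in this thread with the wrapped `ip=` lines rejoined. It assumes the `runtime.totem.pg.mrp.srp.members.<id>.<field>=<value>` key layout shown above.

```python
import re

# Matches keys such as
#   runtime.totem.pg.mrp.srp.members.425699520.status=joined
MEMBER_LINE = re.compile(
    r"^runtime\.totem\.pg\.mrp\.srp\.members\.(\d+)\.(\w+)=(.*)$"
)


def parse_members(objctl_output):
    """Return {node_id: {field: value}} from `corosync-objctl | grep member` text."""
    members = {}
    for line in objctl_output.splitlines():
        match = MEMBER_LINE.match(line.strip())
        if match:
            node_id, field, value = match.groups()
            members.setdefault(int(node_id), {})[field] = value.strip()
    return members


def membership_problems(views):
    """Given {hostname: parsed members}, list every node a host does not see as joined."""
    all_ids = set()
    for members in views.values():
        all_ids.update(members)
    problems = []
    for host, members in sorted(views.items()):
        for node_id in sorted(all_ids):
            status = members.get(node_id, {}).get("status")
            if status != "joined":
                problems.append(f"{host}: node {node_id} status is {status!r}")
    return problems


# Sample data: the output posted in this thread.
PILOTPOUND = """\
runtime.totem.pg.mrp.srp.members.425699520.ip=r(0) ip(192.168.95.25) r(1) ip(192.168.20.245)
runtime.totem.pg.mrp.srp.members.425699520.join_count=1
runtime.totem.pg.mrp.srp.members.425699520.status=joined
runtime.totem.pg.mrp.srp.members.442476736.ip=r(0) ip(192.168.95.26) r(1) ip(192.168.20.246)
runtime.totem.pg.mrp.srp.members.442476736.join_count=1
runtime.totem.pg.mrp.srp.members.442476736.status=joined
"""

POWERPOUND = """\
runtime.totem.pg.mrp.srp.members.442476736.ip=r(0) ip(192.168.95.26) r(1) ip(192.168.20.246)
runtime.totem.pg.mrp.srp.members.442476736.join_count=1
runtime.totem.pg.mrp.srp.members.442476736.status=joined
runtime.totem.pg.mrp.srp.members.425699520.ip=r(0) ip(192.168.95.25) r(1) ip(192.168.20.245)
runtime.totem.pg.mrp.srp.members.425699520.join_count=5
runtime.totem.pg.mrp.srp.members.425699520.status=joined
"""

views = {
    "pilotpound": parse_members(PILOTPOUND),
    "powerpound": parse_members(POWERPOUND),
}
print(membership_problems(views))  # prints []
```

An empty list only says that both nodes agree at the corosync layer. That matches what was seen here: corosync membership looked healthy on both nodes while the cib on pilotpound was still discarding messages, so a clean result from this check does not rule out a problem on the pacemaker side.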