On 11 Apr 2014, at 10:54 pm, Marco Felettigh <ma...@nucleus.it> wrote:
> On Fri, 11 Apr 2014 17:17:57 +1000
> Andrew Beekhof <and...@beekhof.net> wrote:
>
>>
>> On 8 Apr 2014, at 8:37 pm, ma...@nucleus.it wrote:
>>
>>> On Tue, 8 Apr 2014 10:49:16 +1000
>>> Andrew Beekhof <and...@beekhof.net> wrote:
>>>
>>>>
>>>> On 7 Apr 2014, at 8:46 pm, ma...@nucleus.it wrote:
>>>>
>>>>> Hi,
>>>>> in a production environment with 2 nodes ( nodeA , nodeB ) we had
>>>>> a hardware failure, so we restarted nodeB.
>>>>> After the restarted nodeB came up we restarted corosync/pacemaker
>>>>> on it, but for 2 days now the corosync/pacemaker stuff has been
>>>>> looping.
>>>>>
>>>>> crm_mon NodeA:
>>>>>
>>>>> Stack: openais
>>>>> Current DC: nodeA - partition with quorum
>>>>> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
>>>>> 2 Nodes configured, 2 expected votes
>>>>> 17 Resources configured.
>>>>> ============
>>>>>
>>>>> Online: [ nodeA ]
>>>>> OFFLINE: [ nodeB ]
>>>>>
>>>>>
>>>>> crm_mon NodeB:
>>>>>
>>>>> Stack: openais
>>>>> Current DC: NONE
>>>>> 2 Nodes configured, 2 expected votes
>>>>> 17 Resources configured.
>>>>> ============
>>>>>
>>>>> OFFLINE: [ nodeA nodeB ]
>>>>>
>>>>> The loop on nodeB reports:
>>>>> crmd: [7149]: debug: do_election_count_vote: Election 3 (owner: nodeA) lost: vote from nodeA (Age)
>>>>>
>>>>> Investigating further, I found this message on nodeA:
>>>>> cib: [28755]: ERROR: send_ais_message: Not connected to AIS
>>>>>
>>>>> This message now repeats for every operation.
>>>>> Is it a corosync problem or a cib/pacemaker one?
>>>>> Any suggestion on what happened?
>>>>
>>>> For some reason the cib can't connect to corosync anymore.
>>>> No software got upgraded recently?
>>>>
>>>> Are there any logs from corosync?
>>>> Which distro is this?
>>>>
>>>>> And why did starting a cluster node crash the DC stuff? :(
>>>>>
>>>>>
>>>>> Bye Marco
>>>>>
>>>>> _______________________________________________
>>>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started:
>>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://bugs.clusterlabs.org
>>>>
>>>
>>> Hi,
>>> the distro is openSUSE 11.1 and there are no updates, also because
>>> the distro is out of maintenance.
>>
>> A good reason to be using SLES (or RHEL/CentOS).
>
> Better Gentoo ;)
>
>>
>>> We are planning an upgrade, but the interesting thing is to figure
>>> out the reasons for the problem.
>>> The log is attached, thanks for the support.
>>
>> There's nothing obvious in the logs. Just that as far as pacemaker
>> could tell, corosync suddenly went away. Was the corosync process
>> still running?
>>
>
> Yes, corosync was still running.

Stopping pacemaker and restarting it didn't help?