This looks like the underlying problem:

Feb 10 23:58:07 [1199] vcsquorum cib: notice: cib:diff: -- <node uname="vcsquorum.example.com" id="755053578" />
Feb 10 23:58:07 [1199] vcsquorum cib: notice: cib:diff: ++ <node id="755053578" uname="vcsquorum" />

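
If it helps to scan a longer log for the same symptom, here is a rough sketch (not part of Pacemaker; the function name and regular expressions are my own assumptions about the cib:diff line format shown above) that reports any node id that shows up with more than one uname:

```python
import re
from collections import defaultdict

# Assumed format: '... cib:diff: -- <node attr="value" ... />' (or '++').
NODE_RE = re.compile(r'cib:diff:\s+(--|\+\+)\s+<node\s+([^>]*?)/?>')
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def conflicting_unames(log_lines):
    """Return {node id: set of unames} for ids seen with more than one uname."""
    seen = defaultdict(set)
    for line in log_lines:
        match = NODE_RE.search(line)
        if not match:
            continue  # not a <node> diff line
        attrs = dict(ATTR_RE.findall(match.group(2)))
        if "id" in attrs and "uname" in attrs:
            seen[attrs["id"]].add(attrs["uname"])
    return {nid: names for nid, names in seen.items() if len(names) > 1}


if __name__ == "__main__":
    log = [
        'Feb 10 23:58:07 [1199] vcsquorum cib: notice: cib:diff: -- '
        '<node uname="vcsquorum.example.com" id="755053578" />',
        'Feb 10 23:58:07 [1199] vcsquorum cib: notice: cib:diff: ++ '
        '<node id="755053578" uname="vcsquorum" />',
    ]
    for node_id, names in conflicting_unames(log).items():
        print(node_id, sorted(names))
```

Run against the two lines above, it flags id 755053578 as carrying both the fully qualified and the short name.
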
Something is confused about what the node(s) should be called. The same node
id (755053578) appears once as vcsquorum.example.com and once as vcsquorum.

On Mon, Feb 11, 2013 at 6:48 PM, Andrew Martin <amar...@xes-inc.com> wrote:
> Hello,
>
> I am running a 3-node Pacemaker (1.1.8) + Corosync (2.1.0) cluster on Ubuntu
> 12.04. Two of the nodes are "real" nodes, hosting a DRBD filesystem mount and
> some daemons:
> http://pastebin.com/n1sNMhuE
> The third node cannot run resources and acts as a quorum node in standby.
>
> Recently, the nodes will all change to the "pending" state, and may remain
> there for quite some time (many days) before coming back online (if ever).
> Using "crm node clearstate" does not help.
>
> Tonight I stopped pacemaker and corosync on all nodes, emptied the contents
> of /var/lib/pacemaker/cib, /var/lib/pacemaker/pengine, and /var/lib/corosync.
> After doing so, I restarted corosync and pacemaker on all of the nodes, and
> repopulated the CIB once the nodes all joined. This worked in restoring the
> nodes states to "online", however after a few minutes, the nodes all went
> back into "pending", this time only for around 5 minutes. Here's the log from
> the current DC:
> http://pastebin.com/xhfsb15d
>
> There do not appear to be any faults in the corosync rings:
> RING ID 0
> id = 192.168.1.170
> status = ring 0 active with no faults
> RING ID 1
> id = 192.168.7.170
> status = ring 1 active with no faults
>
> corosync.conf:
> http://pastebin.com/DQUNdp9f
>
> Some common messages I am seeing in the log:
> Peer is not part of our cluster
> Diff 2.106.7 -> 2.106.8 from vcs1 not applied to 2.105.12: current "epoch" is
> less than required (epoch, admin_epoch, and num_updates all appear in this
> message)
> What do these messages mean? Do they indicate a problem?
>
> Do you have any ideas on what may be causing this behavior?
>
> Thanks,
>
> Andrew
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org