On 08/04/2011 11:43 AM, Sebastian Kaps wrote:
> Hi Steven,
>
> On 04.08.2011, at 18:27, Steven Dake wrote:
>
>> redundant ring is only supported upstream in corosync 1.4.1 or later.
>
> What does "supported" mean in this context, exactly?
>

It means the corosync community doesn't investigate redundant ring issues
in corosync versions prior to 1.4.1. However, I expect the root of your
problem (the retransmit list problem) is already fixed in the repos and
in the latest released versions.

Regards
-steve

> I'm asking, because we're having serious issues with these systems since
> they went into production (the testing phase did not show any problems,
> but we also couldn't use real workloads then).
>
> Since the cluster went productive, we're having issues with seemingly random
> STONITH events that seem to be related to a high I/O load on a DRBD-mirrored
> OCFS2 volume - but I don't see any pattern yet. We've had these machines
> running for nearly two weeks without major problems and suddenly they went
> back to killing each other :-(
>
>> The retransmit list message issues you are having is fixed in corosync
>> 1.3.3 and later. This is what is triggering the redundant ring faulty
>> error.
>
> Could it also cause the instability problems we're seeing?
> Thanks again, for helping!

yes

> _______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker