On 06/01/2010 11:18 PM, Dave Dillow wrote:
> On 06/01/2010 01:37 PM, Steven Dake wrote:
>> On 06/01/2010 10:26 AM, David Dillow wrote:
>>> Certainly, it doesn't look like there should ever be encapsulated
>>> messages on the regular ring, only the recovery ring. Somehow, we're
>>> getting messages on the regular ring with at least one, if not two
>>> levels of encapsulation.
>>>
>>
>> There should never be an encapsulated message in a regular ring. The
>> ring id problem I spoke about later in this mail would explain why that
>> encapsulated message would come into the regular ring.

Sorry, here would have been a good place to mention that these tests
were with r2917 off the trunk.

> Ok, looks like r2792 fixed the encapsulated messages on the regular
> ring, as you expected. I'm now tripping the assert on line 2750 in
> totemsrp.c:
>
>     assert (instance->commit_token->memb_index <= \
>         instance->commit_token->addr_entries);
>
> This happened on several nodes when running with a peak of 93 nodes in
> the cluster. It happened on one or two nodes, then later caught again
> once the count had dropped to 90 or so.
>
> I'm still running the shorter timeouts, as they seem to stress the
> system a bit more to force issues like this to surface. I've saved off
> three specimens of the core files and associated logs for further study,
> as it is likely the machines will be rebooted tomorrow for other testing
> and they don't have long-term local storage.
>
> Any suggestions on how I can help debug this?
