On 09/24/2010 05:55 PM, Lars Kellogg-Stedman wrote:
>> pacemaker is waiting for something in nanosleep. Not sure what.
>
> Should I ping the pacemaker list separately? I'm not sure how much
> overlap there is between here and there.
>
>> The
>> symptom you describe sounds like an inability for corosync to form a
>> membership because of switch-default STP settings.
>
> I had a brief a-ha! moment: these systems are KVM guests. Network
> connectivity is through bridges on the Linux host, which default to a
> 30 second forwarding delay. Tragically, we had already set this to
> zero:
>
> # brctl showstp br613 | grep -i delay
> forward delay 0.00 bridge forward delay 0.00
>
> And in fact these systems use DHCP to acquire network settings, and if
> the issue was STP this would prevent them from receiving a lease from
> the DHCP server.
>
>> Try running the following on the node after a lockup:
>> killall -SEGV corosync
>> corosync-fplay
>> attach output
>
> I've attached the output to this message.

From the fplay records, it looks like corosync has started up perfectly
and acquired all nodes in the network (248, 249, 250). I would suggest
pinging the pacemaker list for further investigation.

Regards
-steve
