> pacemaker is waiting for something in nanosleep.  Not sure what.

Should I ping the pacemaker list separately?  I'm not sure how much
overlap there is between here and there.

> The
> symptom you describe sounds like an inability for corosync to form a
> membership because of switch-default STP settings.

I had a brief a-ha! moment: these systems are KVM guests.  Network
connectivity is through bridges on the Linux host, which default to a
30 second forwarding delay.  Tragically, we had already set this to
zero:

  # brctl showstp br613 | grep -i delay
  forward delay             0.00                 bridge forward delay       0.00
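
For anyone who wants to double-check the same settings without brctl,
the kernel also exposes them under sysfs.  This is only a rough sketch:
the function name and the SYSFS_NET override are made up for
illustration, and as far as I know forward_delay is reported there in
hundredths of a second, with stp_state 0 meaning STP is off.

```shell
# Hypothetical helper: print the STP state and forward delay of a bridge
# by reading the kernel's sysfs attributes.
# SYSFS_NET is an illustrative override (defaults to /sys/class/net).
bridge_stp_summary() {
  br=$1
  dir="${SYSFS_NET:-/sys/class/net}/$br/bridge"

  # Only bridge devices have a "bridge" subdirectory.
  if [ ! -d "$dir" ]; then
    echo "$br: not a bridge" >&2
    return 1
  fi

  stp=$(cat "$dir/stp_state")        # 0 = STP disabled
  delay=$(cat "$dir/forward_delay")  # raw value as reported by the kernel
  echo "$br stp_state=$stp forward_delay=$delay"
}
```

Running `bridge_stp_summary br613` on the host should print one line
with both values for that bridge.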

And in fact these systems use DHCP to acquire network settings, and if
the issue were STP, it would also prevent them from receiving a lease
from the DHCP server.

> Try running the following on the node after a lockup:
> killall -SEGV corosync
> corosync-fplay
> attach output

I've attached the output to this message.

Attachment: corosync.log
Description: Binary data

_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
