Good morning,

I am not subscribed to the list (yet, waiting on confirmation) so
please CC me on all replies.

My employer has several deployments of Pacemaker on top of Corosync
and we have recently been hitting this:

Jul 18 12:01:05 xxxx corosync[6065]:   [TOTEM ] FAILED TO RECEIVE
Jul 18 12:01:15 xxxx corosync[6065]: last message repeated 15 times
Jul 18 12:01:15 xxxx corosync[6065]:   [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 268: memb=1, new=0, lost=4

Every node in the cluster (every single one, not just the failed
node) then considers itself DC in Pacemaker, and all of them get a
no-quorum event. I can find no explanation for this anywhere aside
from the source, which I'm afraid I haven't devoted enough time to
understanding. I see the failure at exec/totemsrp.c:3548, but I have
no idea why we're hitting it.

I did see this:

http://marc.info/?l=openais&m=131074804115507&w=2

That sounds promising. I also noticed that Corosync 1.4.0 just
shipped; would it resolve this issue? I can provide details of our
switch hardware off-list if necessary. I'm just looking for a little
guidance here (and some light shed on why this occurs).

I appreciate the assistance,

-- 
Jed Smith
[email protected]
