Hi, I had an instance where one of the mcast messages was lost, but corosync did not try to retransmit it, so the other node went into "FAILED TO RECEIVE". The logs from both servers are below. srv3 did not receive mcast message 161, and the problem is that srv3 never requested retransmission of that lost message.
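For anyone who wants to cross-check which sequence numbers each node logged as received, here is a minimal Python sketch that compares the "Received ringid(...) seq N" debug lines between two nodes. The function names (`received_by_node`, `missing_on`) are made up for illustration and are not part of corosync. The script assumes one log entry per line in the format of the excerpts that follow, and it treats sequence numbers as opaque strings rather than interpreting them numerically.

```python
import re
from collections import defaultdict

# Matches debug lines of the form seen in this message, e.g.
#   "... srv4-corosync[22981]: [TOTEM ] totemsrp.c:3747 Received ringid(192.168.10.21:1532) seq 161"
RECEIVED = re.compile(
    r"(\S+)-corosync\[\d+\]: .*?Received ringid\(([^)]*)\) seq (\w+)"
)


def received_by_node(lines):
    """Map (ring id, node name) -> set of seq strings logged as received."""
    seen = defaultdict(set)
    for line in lines:
        m = RECEIVED.search(line)
        if m:
            node, ring, seq = m.groups()
            seen[(ring, node)].add(seq)
    return dict(seen)


def missing_on(seen, ring, node, reference):
    """Seqs that `reference` logged for this ring but `node` did not."""
    ref = seen.get((ring, reference), set())
    got = seen.get((ring, node), set())
    return sorted(ref - got)
```

The comparison only means something when both logs cover the same time window. On a truncated excerpt, a sequence number can look missing simply because its line was cut off.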
2010-03-17 15:22:03.831704 srv3-corosync[6213]: [TOTEM ] totemsrp.c:2094 mcasted message added to pending queue
2010-03-17 15:22:03.831719 srv3-corosync[6213]: [TOTEM ] totemsrp.c:3580 Delivering 160 to 162
2010-03-17 15:22:03.831727 srv3-corosync[6213]: [TOTEM ] totemsrp.c:3747 Received ringid(192.168.10.21:1532) seq 162
2010-03-17 15:22:03.831734 srv3-corosync[6213]: [TOTEM ] totemsrp.c:3580 Delivering 160 to 162

2010-03-17 15:22:03.204757 srv4-corosync[22981]: [TOTEM ] totemsrp.c:3747 Received ringid(192.168.10.21:1532) seq 160
2010-03-17 15:22:03.204765 srv4-corosync[22981]: [TOTEM ] totemsrp.c:3747 Received ringid(192.168.10.21:1532) seq 161
2010-03-17 15:22:03.205069 srv4-corosync[22981]: [TOTEM ] totemsrp.c:2217 releasing messages up to and including 160
2010-03-17 15:22:03.828871 srv4-corosync[22981]: [TOTEM ] totemsrp.c:3747 Received ringid(192.168.10.21:1532) seq 162
2010-03-17 15:22:03.828884 srv4-corosync[22981]: [TOTEM ] totemsrp.c:3580 Delivering 161 to 162
2010-03-17 15:22:03.828892 srv4-corosync[22981]: [TOTEM ] totemsrp.c:3650 Delivering MCAST message with seq 162 to pending delivery queue
2010-03-17 15:22:03.859675 srv4-corosync[22981]: [TOTEM ] totemsrp.c:3442 FAILED TO RECEIVE
2010-03-17 15:22:03.859689 srv4-corosync[22981]: [TOTEM ] totemsrp.c:1102 Set consensus for 22/192.168.10.22 at 0 found 0
2010-03-17 15:22:03.859696 srv4-corosync[22981]: [TOTEM ] totemsrp.c:1795 entering GATHER state from 6.

--
Peakpoint Service Cluster Setup, Troubleshooting & Development
[email protected]
(303) 997-2823
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
