On Thu, 2010-03-18 at 10:55 -0600, hj lee wrote:
> Hi,
>
> I had an instance the one of mcast messages was lost. But the corosync
> does not try to retransmit the lost message, so the other node gets
> into "FAILED TO RECEIVE". The logs from both servers are below. The
> srv3 did not receive the mcast message 161. The problem is the srv3
> did not request the retransmission of that lost message.
>

My analysis of the log data indicates that srv3's IP address is
192.168.10.21. Is that correct? A full log (please attach it) would help
me see the events that led up to the problem. I especially want to know
whether totem was in the operational state or some other state when this
happened.

>
> 2010-03-17 15:22:03.831704 srv3-corosync[6213]: [TOTEM ] totemsrp.c:2094 mcasted message added to pending queue
> 2010-03-17 15:22:03.831719 srv3-corosync[6213]: [TOTEM ] totemsrp.c:3580 Delivering 160 to 162
> 2010-03-17 15:22:03.831727 srv3-corosync[6213]: [TOTEM ] totemsrp.c:3747 Received ringid(192.168.10.21:1532) seq 162
> 2010-03-17 15:22:03.831734 srv3-corosync[6213]: [TOTEM ] totemsrp.c:3580 Delivering 160 to 162
>
> 2010-03-17 15:22:03.204757 srv4-corosync[22981]: [TOTEM ] totemsrp.c:3747 Received ringid(192.168.10.21:1532) seq 160
> 2010-03-17 15:22:03.204765 srv4-corosync[22981]: [TOTEM ] totemsrp.c:3747 Received ringid(192.168.10.21:1532) seq 161
> 2010-03-17 15:22:03.205069 srv4-corosync[22981]: [TOTEM ] totemsrp.c:2217 releasing messages up to and including 160
> 2010-03-17 15:22:03.828871 srv4-corosync[22981]: [TOTEM ] totemsrp.c:3747 Received ringid(192.168.10.21:1532) seq 162
> 2010-03-17 15:22:03.828884 srv4-corosync[22981]: [TOTEM ] totemsrp.c:3580 Delivering 161 to 162
> 2010-03-17 15:22:03.828892 srv4-corosync[22981]: [TOTEM ] totemsrp.c:3650 Delivering MCAST message with seq 162 to pending delivery queue
> 2010-03-17 15:22:03.859675 srv4-corosync[22981]: [TOTEM ] totemsrp.c:3442 FAILED TO RECEIVE
> 2010-03-17 15:22:03.859689 srv4-corosync[22981]: [TOTEM ] totemsrp.c:1102 Set consensus for 22/192.168.10.22 at 0 found 0
> 2010-03-17 15:22:03.859696 srv4-corosync[22981]: [TOTEM ] totemsrp.c:1795 entering GATHER state from 6.
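
When going through the full log, a quick gap scan over the "Received
ringid(...) seq N" lines may help show which sequence numbers a node never
logged. The following is only a minimal Python sketch, not corosync code:
the helper names received_seqs and missing_seqs are hypothetical, and it
assumes the number after "seq" is printed in decimal (if your build prints
it in hex, parse with base 16 instead).

```python
import re

# Matches lines such as:
#   ... totemsrp.c:3747 Received ringid(192.168.10.21:1532) seq 161
SEQ_RE = re.compile(r"Received ringid\(([^)]*)\) seq (\d+)")


def received_seqs(log_lines):
    """Collect the sequence numbers found in 'Received ringid' log lines.

    Hypothetical helper; assumes the seq value is decimal.
    """
    seqs = set()
    for line in log_lines:
        match = SEQ_RE.search(line)
        if match:
            seqs.add(int(match.group(2)))
    return seqs


def missing_seqs(seqs, low, high):
    """Return the sequence numbers in [low, high] that are not in seqs."""
    return [s for s in range(low, high + 1) if s not in seqs]


# The three 'Received' lines from the srv4 excerpt above, unwrapped.
srv4_log = [
    "2010-03-17 15:22:03.204757 srv4-corosync[22981]: [TOTEM ] "
    "totemsrp.c:3747 Received ringid(192.168.10.21:1532) seq 160",
    "2010-03-17 15:22:03.204765 srv4-corosync[22981]: [TOTEM ] "
    "totemsrp.c:3747 Received ringid(192.168.10.21:1532) seq 161",
    "2010-03-17 15:22:03.828871 srv4-corosync[22981]: [TOTEM ] "
    "totemsrp.c:3747 Received ringid(192.168.10.21:1532) seq 162",
]

print(missing_seqs(received_seqs(srv4_log), 160, 162))  # prints []
print(missing_seqs({160, 162}, 160, 162))               # prints [161]
```

The result only reflects the lines fed to it, so a short excerpt can report
a gap that the complete log would fill.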
>
> --
> Peakpoint Service
>
> Cluster Setup, Troubleshooting & Development
> [email protected]
> (303) 997-2823
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
