On 06/17/2010 07:16 PM, Tim Beale wrote:
> Hi,
>
> I'm running corosync on a setup where corosync packets are getting delayed
> and lost. I'm seeing corosync enter recovery mode repeatedly, which is then
> causing other problems for us. (We're running trunk as at revision 2569
> (8 Dec 09), so some of these flow-on problems may already be fixed.)
>
> Corosync entering recovery mode repeatedly doesn't look like it's fixed on
> the latest trunk though. The problem is corosync is canceling its token
> retransmit timeout prematurely in message_handler_mcast().
>
> Corosync in this setup is getting some mcast packets received out of order.
> So corosync receives a mcast message with a lower seq than the last token it
> sent out and stops its token retransmit timer. If the token it just sent is
> lost, then it doesn't retransmit the token. The token timeout occurs and
> corosync enters gather/commit/recovery.
>
> I think the message_handler_mcast() code should also check the seq of the
> mcast message before stopping the retransmit timer (see attached patch). You
> can only guarantee the last token sent was successfully received if another
> node sends a mcast message with a higher seq.
>
> Does anyone see any problems with this patch?
>

Sorry for the long delay; I missed your email. Thanks for pointing this out:
you found a problem in the totem spec! Your logic is sound and shows a good
understanding of how totem works.

A simpler solution altogether may be to not cancel the token retransmit timer
on receipt of a regular message at all. I can see no good reason to cancel
that timer, other than as a micro-optimization, and once you count the extra
comparisons needed to check the seqid and ringid it looks like a wash.

The patch you submitted doesn't handle rollover of the token (i.e. when its
seq reaches boundary conditions in the integer).

Regards
-steve

> Thanks,
> Tim
>
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
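
To illustrate the rollover point, a seq check of the kind discussed in this thread could be made wraparound-safe roughly as follows. This is only a sketch, not the attached patch and not code from the corosync tree: seq_after() and mcast_confirms_token() are hypothetical names, and it assumes 32-bit unsigned sequence numbers that never drift more than 2^31 apart.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from corosync): returns true if sequence
 * number a comes after b when the 32-bit space is treated as circular.
 * The unsigned subtraction wraps modulo 2^32, so the result stays
 * correct across rollover as long as a and b are less than 2^31 apart.
 */
static bool seq_after(uint32_t a, uint32_t b)
{
    return a != b && (uint32_t)(a - b) < UINT32_C(0x80000000);
}

/*
 * Hypothetical decision function: an mcast message only shows that the
 * last token we sent was received if its seq is higher than that
 * token's seq. Only in that case would cancelling the token retransmit
 * timer be safe.
 */
static bool mcast_confirms_token(uint32_t mcast_seq, uint32_t last_token_seq)
{
    return seq_after(mcast_seq, last_token_seq);
}
```

A plain `mcast_seq > last_token_seq` would give the wrong answer right after the counter wraps (for example a token seq of 0xFFFFFFFF followed by an mcast seq of 0), which is the boundary condition mentioned above. Not cancelling the timer on regular messages at all sidesteps the comparison entirely.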
