Hi Jon,

See below.

Jon Maloy wrote:
> I think this is calculated correctly in our case, but the rec_gap 
> passed into tipc_send_proto_msg() gets overwritten by
> that routine. This is normally correct, since the gap should be 
> adjusted according to what is present
> in the deferred-queue, in order to avoid retransmitting more packets 
> than necessary.
>
> The code I was referring to is the following, where 'gap' initially is 
> set to the 'rec_gap' calculated above.
>
> if (l_ptr->oldest_deferred_in) {
>         u32 rec = msg_seqno(buf_msg(l_ptr->oldest_deferred_in));
>         gap = mod(rec - mod(l_ptr->next_in_no));
> }
>
> msg_set_seq_gap(msg, gap);
> .....
>
> When the protocol gets stuck, 'rec_gap' should be found to be (54992 - 
> 53968) = 1024
> Since the result is non-zero, tipc_link_send_proto_msg() is called.
> Inside that routine three things can happen:
> 1) l_ptr->oldest_deferred_in is NULL. This means that 'gap' will 
> retain its value of 1024. This leads us into case 3) below.
How can l_ptr->oldest_deferred_in be NULL? I don't think this is our case.
> 2) The calculation of 'gap' overwrites the original value. If this 
> value is always zero, the protocol will bail out. Can this happen?
> 3) msg_set_seq_gap() always writes a zero into the message.
> Actually, this is fully possible. The field for 'gap' is only 8 bits 
> long, so any gap size which is a multiple of 256 will give a zero. 
> Looking at the dump, this looks very possible: the first packet loss 
> is not 95 packets, as I stated in my first mail, but
> 54492 - 53967 = 525 packets. This is counting only from what we see 
> in Wireshark, which we have reason to suspect doesn't show all 
> packets. So the real value might quite well be 512. And if this 
> happens, we are stuck forever, because the head of the deferred-queue 
> will never move.
This seems to be the case we are seeing in the dump. 525 is just too 
close to 512 :)
> My question to Peter is: How often does this happen? Every time? Often?
> If it happens often, can it be that the Ethernet driver has the habit 
> of throwing away
> blocks of packets which are exactly a multiple of 256 or 512. (These 
> computer
> programmers...)
Considering the amount of traffic I have on the nodes and the packet 
drop rate relative to the frequency of occurrence of this problem - we 
can safely call it a rare condition (... like a 1 in 256 chance when a 
packet drop occurs ;) ). It is not predictable under normal operation, 
while very easy to cause with a stress test, though it usually took me 
between 2 and 10 attempts before I could make the link stall.
I reduced the link window to 224 when I realized the gap field is 8 
bits wide, and I haven't seen any problems since then.
However, it's worth noting that some time ago, when the cluster was 
working over a 100 Mbit net, a window of < 256 was causing more trouble 
than a window of, say, > 512 when facing high traffic/packet rates, and 
that is actually the reason I ended up using windows of up to 4096. 
Being a good programmer, I was trying powers of 2 for link window 
values until I got to 4096, when my troubles almost disappeared during 
the 100 Mbit era.
When we switched to Gbit LAN the picture changed quite dramatically.
Regarding the magic number 256 - it seems that most of my e1000 NICs 
have a 256-entry tx/rx descriptor table, so it may have something to do 
with the issue .. but who knows :)

> Anyway, we have clearly found a potential problem which must be 
> resolved. With window sizes > 255, scenario 3) is bound to happen now 
> and then. Whether this is Peter's problem remains to be seen.
So far it seems that the protocol may be having issues with link 
windows >= 256.
I'd guess it will be a good idea to add the gap check you proposed anyway.

Regards,
Peter.


_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion
