Hi,
OK, so it seems we have pinpointed the problem.
Luckily, the solution is extremely simple, and
backwards compatible.
The 5 bits next to the MSB of the gap field happen to be
unused, and are exactly what we need for a future-proof
gap-field size. (If we can lose a 512-packet block
on a 1 Gb link, how many can we lose on a 100 Gb
link?)
I suggest the following solution, i.e. extending the
bitmask in msg_set_bits()/msg_bits() from 8 to 13 bits:
static inline u32 msg_seq_gap(struct tipc_msg *m)
{
	return msg_bits(m, 1, 16, 0x1fff);
}

static inline void msg_set_seq_gap(struct tipc_msg *m, u32 n)
{
	msg_set_bits(m, 1, 16, 0x1fff, n);
}
I leave it to Allan to integrate this into the 1.7.5 code, since delivering
it to David M would only screw up the patches Allan has in his pipe.
And anyway, I have not actually tested it.
Regards
///jon
Xpl++ wrote:
> Hi Jon,
>
> See below.
>
> Jon Maloy wrote:
>> I think this is calculated correctly in our case, but the rec_gap
>> passed into tipc_send_proto_msg() gets overwritten by
>> that routine. This is normally correct, since the gap should be
>> adjusted according to what is present
>> in the deferred-queue, in order to avoid retransmitting more packets
>> than necessary.
>>
>> The code I was referring to is the following, where 'gap' initially
>> is set to the 'rec_gap' calculated above.
>>
>> if (l_ptr->oldest_deferred_in) {
>> u32 rec = msg_seqno(buf_msg(l_ptr->oldest_deferred_in));
>> gap = mod(rec - mod(l_ptr->next_in_no));
>> }
>>
>> msg_set_seq_gap(msg, gap);
>> .....
>>
>> When the protocol gets stuck, 'rec_gap' should be found to be (54992
>> - 53968) = 1024
>> Since the result is non-zero, tipc_link_send_proto_msg() is called.
>> Inside that routine three things can happen:
>> 1) l_ptr->oldest_deferred_in is NULL. This means that 'gap' will
>> retain its value of 1024.
>> This leads us into case 3) below.
> How can l_ptr->oldest_deferred_in be NULL? I don't think this is our case.
>> 2) The calculation of 'gap' overwrites the original value. If this
>> value is always zero,
>> the protocol will bail out. Can this happen?
>> 3) msg_set_seq_gap() always writes a zero into the message.
>> Actually, this is fully possible. The field for 'gap' is only 8 bits
>> long, so any gap size which is a multiple of 256 will give a zero.
>> Looking at the dump, this looks very possible: the first packet loss
>> is not 95 packets, as I stated in my first mail, but
>> 54483 - 53967 = 525 packets. This is counting only from what we see
>> in Wireshark, which we have reason to suspect doesn't show all
>> packets. So the real value might quite well be 512. And if this
>> happens, we are stuck forever, because the head of the
>> deferred-queue will never move.
> This seems to be the case we are seeing in the dump. 525 is just too
> close to 512 :)
>> My question to Peter is: How often does this happen? Every time? Often?
>> If it happens often, can it be that the Ethernet driver has a habit
>> of throwing away blocks of packets which are exactly a multiple of
>> 256 or 512? (These computer programmers...)
> Considering the amount of traffic I have on the nodes and the packet
> drop rate relative to the frequency of occurrence of this problem, we
> can safely call it a rare condition (... like a 1 in 256 chance when a
> packet drop occurs ;) ). It is not predictable under normal operation,
> while very easy to cause using a stress test, though it usually took
> me between 2 and 10 attempts before I could make the link stall.
> I reduced the link window to 224 when I realized the gap field is
> 8 bits wide, and I haven't seen any problems since then.
> However, it's worth noting that some time ago, when the cluster was
> working over a 100 Mbit net, a window of < 256 was causing more
> trouble than a window of, say, > 512 when facing a high traffic/packet
> rate, and that is actually the reason I ended up using windows of up
> to 4096.
> Being a good programmer, I was trying powers of 2 for link window
> values up until I got to 4096, when my troubles almost disappeared
> during the 100 Mbit era.
> When we switched to Gbit LAN, the picture changed quite dramatically.
> Regarding the magic number 256 - it seems that most of my e1000 NICs
> have a 256-entry TX/RX descriptor table, so it may have something to
> do with the issue .. but who knows :)
>
>> Anyway, we have clearly found a potential problem which must be
>> resolved. With window sizes > 255, scenario 3) is bound to happen
>> now and then. Whether this is Peter's problem remains to be seen.
> So far it seems that the protocol may be having issues with link
> windows >= 256.
> I'd guess it will be a good idea to add the gap check you proposed
> anyway.
>
> Regards,
> Peter.
>
_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion