I tried different varieties of the solutions discussed below.

  1.  Ignoring 1st, 2d or 3d first NACKs, and rely on the time stamp for 
repeated retransmissions.
  2.  Ignoring 1st,2d,4th,5th and all other NACKS which are not a multiple of 3


Some gave the same throughput as with the patches I posted, but none was 
definitely better, as far as I could see.
However, I didn’t do any long series, so there might still be some small 
percentage improvement I have missed.

///jon


From: Jon Maloy <ma...@donjonn.com>
Sent: 4-Dec-19 13:07
To: Jon Maloy <jon.ma...@ericsson.com>; Xue, Ying <ying....@windriver.com>; 
x...@redhat.com; Tuong Tong Lien <tuong.t.l...@dektech.com.au>; Tung Quang 
Nguyen <tung.q.ngu...@dektech.com.au>; Hoang Huu Le 
<hoang.h...@dektech.com.au>; John Rutherford <john.rutherf...@dektech.com.au>; 
tipc-discussion@lists.sourceforge.net; tipc-...@dektech.com.au
Subject: Re: [net-next 1/3] tipc: eliminate gap indicator from ACK messages

Hi Ying,
(cc-ing tipc-discussion, since yout original mail seems to have been dropped 
somewhere, and I want everybody to be able follow the discussion)


Actually we do have a kind of SACK mechanism already, but maybe too simple.
See below.


On Wednesday, December 4, 2019, 11:45:37 AM GMT-5, Xue, Ying 
<ying....@windriver.com<mailto:ying....@windriver.com>> wrote:


I don’t know why I received lots of complains from outlook which indicate my 
below email was not sent to you.
I just try to resend. If you received, please ignore this one.

Thanks,
Ying

-----Original Message-----
From: Ying Xue [mailto:ying....@windriver.com<mailto:ying....@windriver.com>]
Sent: Thursday, December 5, 2019 12:39 AM
To: Jon Maloy; Jon Maloy
Cc: 
mohan.krishna.ghanta.krishnamur...@ericsson.com<mailto:mohan.krishna.ghanta.krishnamur...@ericsson.com>;
 parthasarathy.bhuvara...@gmail.com<mailto:parthasarathy.bhuvara...@gmail.com>; 
tung.q.ngu...@dektech.com.au<mailto:tung.q.ngu...@dektech.com.au>; 
hoang.h...@dektech.com.au<mailto:hoang.h...@dektech.com.au>; 
tuong.t.l...@dektech.com.au<mailto:tuong.t.l...@dektech.com.au>; 
gordan.mihalje...@dektech.com.au<mailto:gordan.mihalje...@dektech.com.au>; 
tipc-discussion@lists.sourceforge.net<mailto:tipc-discussion@lists.sourceforge.net>
Subject: Re: [net-next 1/3] tipc: eliminate gap indicator from ACK messages


On 12/2/19 8:32 AM, Jon Maloy wrote:
> When we increase the link send window we sometimes observe the
> following scenario:
>
> 1) A packet #N arrives out of order far ahead of a sequence of older
>    packets which are still under way. The packet is added to the
>    deferred queue.
> 2) The missing packets arrive in sequence, and for each 16th of them
>    an ACK is sent back to the receiver, as it should be.
> 3) When building those ACK messages, it is checked if there is a gap
>    between the link's 'rcv_nxt' and the first packet in the deferred
>    queue. This is always the case until packet number #N-1 arrives, and
>    a 'gap' indicator is added, effectively turning them into NACK
>    messages.
> 4) When those NACKs arrive at the sender, all the requested
>    retransmissions are done, since it is a first-time request.
>
> This sometimes leads to a huge amount of redundant retransmissions,
> causing a drop in max throughput. This problem gets worse when we
> in a later commit introduce variable window congestion control,
> since it drops the link back to 'slow start' much more often
> than necessary.
>

I think the essential thing is about how we should precisely detect if package 
is really lost.
When we can identify a message is really lost, that will be good moment to ask 
sender to retransmit the lost message to receiver.
But it's very difficult to identify this thing. When receiver finds a missing 
message in its receipt queue, the message has two statuses:
1. The missing message is just out of order.
2. The missing message is really lost.

For the first case, when receiver detects the missing message and it sends a 
retransmission request before the out-of-order message arrives at the receiver,
It will cause a redundant retransmission. In order to avoid this situation, we 
should send retransmission request as late as possible.

For the second case, when the missing message is really lost, it will cause one 
bad thing that lots of messages are queued in sending buffer of sender side if
Receiver doesn't timely send retransmission request. In order to avoid this 
situation, we should send retransmission request as early as possible.

As a consequence, the two conclusions we drew above conflict each other. It 
means we cannot just use one method to perfectly solve this problem.

So let's take SCTP as an example to understand how it handles this situation:

In SCTP, there are two different retransmission mechanisms: one is normal 
retransmission mechanism, and another is called fast retransmission.
In the normal retransmission algorithm, it sets a timer to detect whether 
message is lost or not. When a message's timer is expired, it supposes the 
message
Is lost and the message is then retransmitted.

[jon] This is what we are missing, and as a result first retransmission 
requests are always done. I have already observed that >90% of those are 
useless under normal circumstances, since they end up as duplicates. I was able 
to somewhat improve this figure with my first two patches in the series, but 
there is still a lot of duplicates.
I am sure we can do better here.
On a general note, I strongly dislike timers, especially if they are set per 
packet. Maybe we could just set the control block time stamp to e.g. N ms, and 
then on each link timeout check the age of the first (and oldest) buffer in the 
transmit queue. If the age exceeds N ms, we retransmit, and go on to check the 
next buffer etc.
But remember, nobody is using TIPC across the internet, so delay times are 
short, and it is far likelier that the receiver will detect a gap and send a 
NACK long before this limit has expired. So, I suspect such a mechanism would 
be of limited value. Also note that in the last patch we actually introduce 
something similar in tipc_link_timeout(), although the limit will in practice 
be 375 ms, and I never once saw it being trigged during my testing.


 In the fast retransmission algorithm, three SACK messages indicate the same 
message is missed, the message
will be immediately retransmitted before its retransmission timer is expired.

[jon] We have a similar mechanism, but we based it on the skb timestamp. If a 
packet has been retransmitted once, we don't allow it to be retransmitted until 
1 ms has elapsed, irrespective of number of NACKs. We could of course base this 
on the already existing retransmission counter instead. I will try that.



Regarding my understanding, the normal retransmission mechanism is used to 
handle the first case to send retransmission request as late as possible.
The fast retransmission algorithm is used to handle the second case to send 
retransmission request earlier than retransmission timer expiration.

Therefore, probably we can use its ideas to handle our TIPC cases:
1. We can introduce a retransmission timer to perform delayed retransmission.
2. We should remain the gap indicator. But when receiver reports a gap to 
sender, it doesn't send NACK to sender. Instead it should SACK to sender.

[jon] All our NACKs are in reality SACKs. It is up to the receiver to interpret 
them correctly.


Whenever sender receives NACK, it will immediately retransmit missed messages, 
which probably cause lots of duplicated messages. But for SACK,
It means Selective ACK which contains gap info. But when a sender receives SACK 
and gap info, it doesn't immediately retransmit the missed message.
Instead, it releases the messages in its sending queue which have been 
acknowledged by SACK message. Meanwhile, once three SACK messages report the 
same message
Is missed, the sender can immediately retransmit the message, which is exactly 
what the fast retransmission algorithm is doing.

Of course, the two suggested approaches are much more complex than our current 
TIPC link traffic control algorithm. If we decide to introduce retransmission 
timer, we have to introduce another
algorithm to measure RTT time. I am not sure whether we should introduce 
retransmission timer, but in my opinion, introducing SACK might be a good idea 
to TIPC.


[jon] Once we have an RTT measuring mechanism in place, we should be able to 
calculate an optimal expiration time according to the above suggestion. But I 
am skeptical to its practical value, at least the way TIPC is used now.
But I do agree that we need a mechanism for limiting the number of first 
retransmission attempts, which constitutes the vast majority of 
retransmissions, and to it vast majority are useless. I think I will first try 
leveraging the skb retransmission counter here, and see what will happen.

///jon




About SCTP SACK, please refer to: https://tools.ietf.org/html/rfc4960#page-34

Thanks,
Ying

> We now fix this by not sending any 'gap' indicator in regular ACK
> messages. We already have a mechanism for sending explicit NACKs
> in place, and this is sufficient to keep up the packet flow.
>
> Signed-off-by: Jon Maloy 
> <jon.ma...@ericsson.com<mailto:jon.ma...@ericsson.com>>
> ---
>  net/tipc/link.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/net/tipc/link.c b/net/tipc/link.c
> index 24d4d10..6d86446 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -1521,7 +1521,8 @@ static int tipc_link_build_nack_msg(struct tipc_link *l,
>                      struct sk_buff_head *xmitq)
>  {
>      u32 def_cnt = ++l->stats.deferred_recv;
> -    u32 defq_len = skb_queue_len(&l->deferdq);
> +    struct sk_buff_head *dfq = &l->deferdq;
> +    u32 defq_len = skb_queue_len(dfq);
>      int match1, match2;
>
>      if (link_is_bc_rcvlink(l)) {
> @@ -1532,8 +1533,12 @@ static int tipc_link_build_nack_msg(struct tipc_link 
> *l,
>          return 0;
>      }
>
> -    if (defq_len >= 3 && !((defq_len - 3) % 16))
> -        tipc_link_build_proto_msg(l, STATE_MSG, 0, 0, 0, 0, 0, xmitq);
> +    if (defq_len >= 3 && !((defq_len - 3) % 16)) {
> +        u16 rcvgap = buf_seqno(skb_peek(dfq)) - l->rcv_nxt;
> +
> +        tipc_link_build_proto_msg(l, STATE_MSG, 0, 0,
> +                      rcvgap, 0, 0, xmitq);
> +    }
>      return 0;
>  }
>
> @@ -1631,7 +1636,7 @@ static void tipc_link_build_proto_msg(struct tipc_link 
> *l, int mtyp, bool probe,
>      if (!tipc_link_is_up(l) && (mtyp == STATE_MSG))
>          return;
>
> -    if (!skb_queue_empty(dfq))
> +    if ((probe || probe_reply) && !skb_queue_empty(dfq))
>          rcvgap = buf_seqno(skb_peek(dfq)) - l->rcv_nxt;
>
>      skb = tipc_msg_create(LINK_PROTOCOL, mtyp, INT_H_SIZE,
> @@ -2079,7 +2084,6 @@ static int tipc_link_proto_rcv(struct tipc_link *l, 
> struct sk_buff *skb,
>          if (rcvgap || reply)
>              tipc_link_build_proto_msg(l, STATE_MSG, 0, reply,
>                            rcvgap, 0, 0, xmitq);
> -
>          rc |= tipc_link_advance_transmq(l, ack, gap, ga, xmitq);
>
>          /* If NACK, retransmit will now start at right position */
>

_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Reply via email to