Re: [tipc-discussion] [net-next 1/1] tipc: extend link reset criteria for stale packet retransmission

Xue, Ying Fri, 06 Jul 2018 05:23:24 -0700

Yeah, you definitely can as long as you want :)

Sent from my iPhone


> On Jul 6, 2018, at 8:12 PM, Jon Maloy <jon.ma...@ericsson.com> wrote:
> 
> Hi,
> Agree with your comment, but that would be some other patch.
> Can I put an "acked-by" on this ?
> 
> ///jon
> 
>> -----Original Message-----
>> From: Ying Xue <ying....@windriver.com>
>> Sent: Friday, 06 July, 2018 07:35
>> To: Jon Maloy <jon.ma...@ericsson.com>; Jon Maloy <ma...@donjonn.com>
>> Cc: Mohan Krishna Ghanta Krishnamurthy 
>> <mohan.krishna.ghanta.krishnamur...@ericsson.com>; Tung Quang Nguyen
>> <tung.q.ngu...@dektech.com.au>; Hoang Huu Le <hoang.h...@dektech.com.au>; 
>> Canh Duc Luu <canh.d....@dektech.com.au>;
>> Gordan Mihaljevic <gordan.mihalje...@dektech.com.au>; 
>> parthasarathy.bhuvara...@gmail.com; tipc-
>> discuss...@lists.sourceforge.net
>> Subject: Re: [net-next 1/1] tipc: extend link reset criteria for stale 
>> packet retransmission
>> 
>>> On 07/06/2018 09:11 AM, Jon Maloy wrote:
>>> Currently a link is declared stale and reset if there has been 100
>>> repeated attempts to retransmit the same packet. However, in certain
>>> infrastrucures we see that packet duplicates and delays may cause
>> 
>> s/infrastrucures/infrastructures
>> 
>>> such retransmit attempts to occur at a high rate, so that the peer
>>> doesn't have a reasonable chance to acknowledge the reception before
>>> the 100-limit is hit. This may take much less than the stipulated link
>>> tolerance time, and despite that probe/probe replies otherwise go
>>> through as normal.
>>> 
>>> We now extend the criteria for link reset to also being time based.
>>> I.e., we don't reset the link until the link tolerance time is passed
>>> AND we have made more than 100 retransmissions attempts.
>>> 
>> 
>> Good idea.
>> 
>> But I also ever observed another phenomenon before. In most of cases
>> many retransmission requests were caused because packets were out of
>> order on receipt side, not because the missing packages are really lost.
>> 
>> For example we often meet such case: after a node just requests its peer
>> to retransmit a series of packets, its requested packets arrive
>> subsequently. It means the unnecessary retransmission requests wasted
>> lot of network bandwidth. Especially for broadcast link, the situation
>> is much worse.
>> 
>> Therefore, we need to figure out one method to postpone a bit
>> retransmission request.
>> 
>> Of course, I am not object to this patch, instead it's is quite good.
>> 
>>> Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
>>> ---
>>> net/tipc/link.c | 43 ++++++++++++++++++++++++-------------------
>>> 1 file changed, 24 insertions(+), 19 deletions(-)
>>> 
>>> diff --git a/net/tipc/link.c b/net/tipc/link.c
>>> index 4798a8b..ea1d9d0 100644
>>> --- a/net/tipc/link.c
>>> +++ b/net/tipc/link.c
>>> @@ -106,7 +106,8 @@ struct tipc_stats {
>>>  * @backlogq: queue for messages waiting to be sent
>>>  * @snt_nxt: next sequence number to use for outbound messages
>>>  * @last_retransmitted: sequence number of most recently retransmitted 
>>> message
>>> - * @stale_count: # of identical retransmit requests made by peer
>>> + * @stale_cnt: counter for number of identical retransmit attempts
>>> + * @stale_limit: time when repeated identical retransmits must force link 
>>> reset
>>>  * @ackers: # of peers that needs to ack each packet before it can be 
>>> released
>>>  * @acked: # last packet acked by a certain peer. Used for broadcast.
>>>  * @rcv_nxt: next sequence number to expect for inbound messages
>>> @@ -162,7 +163,8 @@ struct tipc_link {
>>>    u16 snd_nxt;
>>>    u16 last_retransm;
>>>    u16 window;
>>> -    u32 stale_count;
>>> +    u16 stale_cnt;
>>> +    unsigned long stale_limit;
>>> 
>>>    /* Reception */
>>>    u16 rcv_nxt;
>>> @@ -856,7 +858,7 @@ void tipc_link_reset(struct tipc_link *l)
>>>    l->acked = 0;
>>>    l->silent_intv_cnt = 0;
>>>    l->rst_cnt = 0;
>>> -    l->stale_count = 0;
>>> +    l->stale_cnt = 0;
>>>    l->bc_peer_is_up = false;
>>>    memset(&l->mon_state, 0, sizeof(l->mon_state));
>>>    tipc_link_reset_stats(l);
>>> @@ -993,39 +995,41 @@ static void link_retransmit_failure(struct tipc_link 
>>> *l, struct sk_buff *skb)
>>>        msg_seqno(hdr), msg_prevnode(hdr), msg_orignode(hdr));
>>> }
>>> 
>>> -int tipc_link_retrans(struct tipc_link *l, struct tipc_link *nacker,
>>> +/* tipc_link_retrans() - retransmit one or more packets
>>> + * @l: the link to transmit on
>>> + * @r: the receiving link ordering the retransmit. Same as l if unicast
>>> + * @from: retransmit from (inclusive) this sequence number
>>> + * @to: retransmit to (inclusive) this sequence number
>>> + * xmitq: queue for accumulating the retransmitted packets
>>> + */
>>> +int tipc_link_retrans(struct tipc_link *l, struct tipc_link *r,
>>>              u16 from, u16 to, struct sk_buff_head *xmitq)
>>> {
>>>    struct sk_buff *_skb, *skb = skb_peek(&l->transmq);
>>> -    struct tipc_msg *hdr;
>>> -    u16 ack = l->rcv_nxt - 1;
>>>    u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1;
>>> +    u16 ack = l->rcv_nxt - 1;
>>> +    struct tipc_msg *hdr;
>>> 
>>>    if (!skb)
>>>        return 0;
>>> 
>>>    /* Detect repeated retransmit failures on same packet */
>>> -    if (nacker->last_retransm != buf_seqno(skb)) {
>>> -        nacker->last_retransm = buf_seqno(skb);
>>> -        nacker->stale_count = 1;
>>> -    } else if (++nacker->stale_count > 100) {
>>> +    if (r->last_retransm != buf_seqno(skb)) {
>>> +        r->last_retransm = buf_seqno(skb);
>>> +        r->stale_limit = jiffies + msecs_to_jiffies(l->tolerance);
>>> +    } else if (time_after(jiffies, r->stale_limit) && ++r->stale_cnt > 99) 
>>> {
>>>        link_retransmit_failure(l, skb);
>>> -        nacker->stale_count = 0;
>> 
>> Sorry, I could not understand why we don't reset l->stale_cnt here, but
>> resetting it is moved to tipc_link_rcv().
>> 
>> Thanks,
>> Ying
>> 
>>>        if (link_is_bc_sndlink(l))
>>>            return TIPC_LINK_DOWN_EVT;
>>>        return tipc_link_fsm_evt(l, LINK_FAILURE_EVT);
>>>    }
>>> 
>>> -    /* Move forward to where retransmission should start */
>>>    skb_queue_walk(&l->transmq, skb) {
>>> -        if (!less(buf_seqno(skb), from))
>>> -            break;
>>> -    }
>>> -
>>> -    skb_queue_walk_from(&l->transmq, skb) {
>>> -        if (more(buf_seqno(skb), to))
>>> -            break;
>>>        hdr = buf_msg(skb);
>>> +        if (less(msg_seqno(hdr), from))
>>> +            continue;
>>> +        if (more(msg_seqno(hdr), to))
>>> +            break;
>>>        _skb = __pskb_copy(skb, MIN_H_SIZE, GFP_ATOMIC);
>>>        if (!_skb)
>>>            return 0;
>>> @@ -1268,6 +1272,7 @@ int tipc_link_rcv(struct tipc_link *l, struct sk_buff 
>>> *skb,
>>> 
>>>        /* Forward queues and wake up waiting users */
>>>        if (likely(tipc_link_release_pkts(l, msg_ack(hdr)))) {
>>> +            l->stale_cnt = 0;
>>>            tipc_link_advance_backlog(l, xmitq);
>>>            if (unlikely(!skb_queue_empty(&l->wakeupq)))
>>>                link_prepare_wakeup(l);
>>> 

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Re: [tipc-discussion] [net-next 1/1] tipc: extend link reset criteria for stale packet retransmission

Reply via email to