Looking at TCP's FRR they only retransmit after thee "duplicate" 
acknowledgements, which would correspond to us only sending NACK after 
receiving three out-of-sequence packets.
The question is, does this really consume *lots* od bandwidth as you say, or is 
it just a minor nuisance?
I would like to see some real-site statistics before trying to change this.

BR
///jon


> -----Original Message-----
> From: Ying Xue <ying....@windriver.com>
> Sent: Friday, 06 July, 2018 07:35
> To: Jon Maloy <jon.ma...@ericsson.com>; Jon Maloy <ma...@donjonn.com>
> Cc: Mohan Krishna Ghanta Krishnamurthy 
> <mohan.krishna.ghanta.krishnamur...@ericsson.com>; Tung Quang Nguyen
> <tung.q.ngu...@dektech.com.au>; Hoang Huu Le <hoang.h...@dektech.com.au>; 
> Canh Duc Luu <canh.d....@dektech.com.au>;
> Gordan Mihaljevic <gordan.mihalje...@dektech.com.au>; 
> parthasarathy.bhuvara...@gmail.com; tipc-
> discuss...@lists.sourceforge.net
> Subject: Re: [net-next 1/1] tipc: extend link reset criteria for stale packet 
> retransmission
> 
> On 07/06/2018 09:11 AM, Jon Maloy wrote:
> > Currently a link is declared stale and reset if there has been 100
> > repeated attempts to retransmit the same packet. However, in certain
> > infrastrucures we see that packet duplicates and delays may cause
> 
> s/infrastrucures/infrastructures
> 
> > such retransmit attempts to occur at a high rate, so that the peer
> > doesn't have a reasonable chance to acknowledge the reception before
> > the 100-limit is hit. This may take much less than the stipulated link
> > tolerance time, and despite that probe/probe replies otherwise go
> > through as normal.
> >
> > We now extend the criteria for link reset to also being time based.
> > I.e., we don't reset the link until the link tolerance time is passed
> > AND we have made more than 100 retransmissions attempts.
> >
> 
> Good idea.
> 
> But I also ever observed another phenomenon before. In most of cases
> many retransmission requests were caused because packets were out of
> order on receipt side, not because the missing packages are really lost.
> 
> For example we often meet such case: after a node just requests its peer
> to retransmit a series of packets, its requested packets arrive
> subsequently. It means the unnecessary retransmission requests wasted
> lot of network bandwidth. Especially for broadcast link, the situation
> is much worse.
> 
> Therefore, we need to figure out one method to postpone a bit
> retransmission request.
> 
> Of course, I am not object to this patch, instead it's is quite good.
> 
> > Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
> > ---
> >  net/tipc/link.c | 43 ++++++++++++++++++++++++-------------------
> >  1 file changed, 24 insertions(+), 19 deletions(-)
> >
> > diff --git a/net/tipc/link.c b/net/tipc/link.c
> > index 4798a8b..ea1d9d0 100644
> > --- a/net/tipc/link.c
> > +++ b/net/tipc/link.c
> > @@ -106,7 +106,8 @@ struct tipc_stats {
> >   * @backlogq: queue for messages waiting to be sent
> >   * @snt_nxt: next sequence number to use for outbound messages
> >   * @last_retransmitted: sequence number of most recently retransmitted 
> > message
> > - * @stale_count: # of identical retransmit requests made by peer
> > + * @stale_cnt: counter for number of identical retransmit attempts
> > + * @stale_limit: time when repeated identical retransmits must force link 
> > reset
> >   * @ackers: # of peers that needs to ack each packet before it can be 
> > released
> >   * @acked: # last packet acked by a certain peer. Used for broadcast.
> >   * @rcv_nxt: next sequence number to expect for inbound messages
> > @@ -162,7 +163,8 @@ struct tipc_link {
> >     u16 snd_nxt;
> >     u16 last_retransm;
> >     u16 window;
> > -   u32 stale_count;
> > +   u16 stale_cnt;
> > +   unsigned long stale_limit;
> >
> >     /* Reception */
> >     u16 rcv_nxt;
> > @@ -856,7 +858,7 @@ void tipc_link_reset(struct tipc_link *l)
> >     l->acked = 0;
> >     l->silent_intv_cnt = 0;
> >     l->rst_cnt = 0;
> > -   l->stale_count = 0;
> > +   l->stale_cnt = 0;
> >     l->bc_peer_is_up = false;
> >     memset(&l->mon_state, 0, sizeof(l->mon_state));
> >     tipc_link_reset_stats(l);
> > @@ -993,39 +995,41 @@ static void link_retransmit_failure(struct tipc_link 
> > *l, struct sk_buff *skb)
> >             msg_seqno(hdr), msg_prevnode(hdr), msg_orignode(hdr));
> >  }
> >
> > -int tipc_link_retrans(struct tipc_link *l, struct tipc_link *nacker,
> > +/* tipc_link_retrans() - retransmit one or more packets
> > + * @l: the link to transmit on
> > + * @r: the receiving link ordering the retransmit. Same as l if unicast
> > + * @from: retransmit from (inclusive) this sequence number
> > + * @to: retransmit to (inclusive) this sequence number
> > + * xmitq: queue for accumulating the retransmitted packets
> > + */
> > +int tipc_link_retrans(struct tipc_link *l, struct tipc_link *r,
> >                   u16 from, u16 to, struct sk_buff_head *xmitq)
> >  {
> >     struct sk_buff *_skb, *skb = skb_peek(&l->transmq);
> > -   struct tipc_msg *hdr;
> > -   u16 ack = l->rcv_nxt - 1;
> >     u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1;
> > +   u16 ack = l->rcv_nxt - 1;
> > +   struct tipc_msg *hdr;
> >
> >     if (!skb)
> >             return 0;
> >
> >     /* Detect repeated retransmit failures on same packet */
> > -   if (nacker->last_retransm != buf_seqno(skb)) {
> > -           nacker->last_retransm = buf_seqno(skb);
> > -           nacker->stale_count = 1;
> > -   } else if (++nacker->stale_count > 100) {
> > +   if (r->last_retransm != buf_seqno(skb)) {
> > +           r->last_retransm = buf_seqno(skb);
> > +           r->stale_limit = jiffies + msecs_to_jiffies(l->tolerance);
> > +   } else if (time_after(jiffies, r->stale_limit) && ++r->stale_cnt > 99) {
> >             link_retransmit_failure(l, skb);
> > -           nacker->stale_count = 0;
> 
> Sorry, I could not understand why we don't reset l->stale_cnt here, but
> resetting it is moved to tipc_link_rcv().
> 
> Thanks,
> Ying
> 
> >             if (link_is_bc_sndlink(l))
> >                     return TIPC_LINK_DOWN_EVT;
> >             return tipc_link_fsm_evt(l, LINK_FAILURE_EVT);
> >     }
> >
> > -   /* Move forward to where retransmission should start */
> >     skb_queue_walk(&l->transmq, skb) {
> > -           if (!less(buf_seqno(skb), from))
> > -                   break;
> > -   }
> > -
> > -   skb_queue_walk_from(&l->transmq, skb) {
> > -           if (more(buf_seqno(skb), to))
> > -                   break;
> >             hdr = buf_msg(skb);
> > +           if (less(msg_seqno(hdr), from))
> > +                   continue;
> > +           if (more(msg_seqno(hdr), to))
> > +                   break;
> >             _skb = __pskb_copy(skb, MIN_H_SIZE, GFP_ATOMIC);
> >             if (!_skb)
> >                     return 0;
> > @@ -1268,6 +1272,7 @@ int tipc_link_rcv(struct tipc_link *l, struct sk_buff 
> > *skb,
> >
> >             /* Forward queues and wake up waiting users */
> >             if (likely(tipc_link_release_pkts(l, msg_ack(hdr)))) {
> > +                   l->stale_cnt = 0;
> >                     tipc_link_advance_backlog(l, xmitq);
> >                     if (unlikely(!skb_queue_empty(&l->wakeupq)))
> >                             link_prepare_wakeup(l);
> >
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Reply via email to