On 07/06/2018 09:11 AM, Jon Maloy wrote: > Currently a link is declared stale and reset if there has been 100 > repeated attempts to retransmit the same packet. However, in certain > infrastrucures we see that packet duplicates and delays may cause
s/infrastrucures/infrastructures > such retransmit attempts to occur at a high rate, so that the peer > doesn't have a reasonable chance to acknowledge the reception before > the 100-limit is hit. This may take much less than the stipulated link > tolerance time, and despite that probe/probe replies otherwise go > through as normal. > > We now extend the criteria for link reset to also being time based. > I.e., we don't reset the link until the link tolerance time is passed > AND we have made more than 100 retransmissions attempts. > Good idea. But I also ever observed another phenomenon before. In most of cases many retransmission requests were caused because packets were out of order on receipt side, not because the missing packages are really lost. For example we often meet such case: after a node just requests its peer to retransmit a series of packets, its requested packets arrive subsequently. It means the unnecessary retransmission requests wasted lot of network bandwidth. Especially for broadcast link, the situation is much worse. Therefore, we need to figure out one method to postpone a bit retransmission request. Of course, I am not object to this patch, instead it's is quite good. > Signed-off-by: Jon Maloy <jon.ma...@ericsson.com> > --- > net/tipc/link.c | 43 ++++++++++++++++++++++++------------------- > 1 file changed, 24 insertions(+), 19 deletions(-) > > diff --git a/net/tipc/link.c b/net/tipc/link.c > index 4798a8b..ea1d9d0 100644 > --- a/net/tipc/link.c > +++ b/net/tipc/link.c > @@ -106,7 +106,8 @@ struct tipc_stats { > * @backlogq: queue for messages waiting to be sent > * @snt_nxt: next sequence number to use for outbound messages > * @last_retransmitted: sequence number of most recently retransmitted > message > - * @stale_count: # of identical retransmit requests made by peer > + * @stale_cnt: counter for number of identical retransmit attempts > + * @stale_limit: time when repeated identical retransmits must force link > reset > * @ackers: # of peers that needs to ack each packet before it can be > released > * @acked: # last packet acked by a certain peer. Used for broadcast. > * @rcv_nxt: next sequence number to expect for inbound messages > @@ -162,7 +163,8 @@ struct tipc_link { > u16 snd_nxt; > u16 last_retransm; > u16 window; > - u32 stale_count; > + u16 stale_cnt; > + unsigned long stale_limit; > > /* Reception */ > u16 rcv_nxt; > @@ -856,7 +858,7 @@ void tipc_link_reset(struct tipc_link *l) > l->acked = 0; > l->silent_intv_cnt = 0; > l->rst_cnt = 0; > - l->stale_count = 0; > + l->stale_cnt = 0; > l->bc_peer_is_up = false; > memset(&l->mon_state, 0, sizeof(l->mon_state)); > tipc_link_reset_stats(l); > @@ -993,39 +995,41 @@ static void link_retransmit_failure(struct tipc_link > *l, struct sk_buff *skb) > msg_seqno(hdr), msg_prevnode(hdr), msg_orignode(hdr)); > } > > -int tipc_link_retrans(struct tipc_link *l, struct tipc_link *nacker, > +/* tipc_link_retrans() - retransmit one or more packets > + * @l: the link to transmit on > + * @r: the receiving link ordering the retransmit. Same as l if unicast > + * @from: retransmit from (inclusive) this sequence number > + * @to: retransmit to (inclusive) this sequence number > + * xmitq: queue for accumulating the retransmitted packets > + */ > +int tipc_link_retrans(struct tipc_link *l, struct tipc_link *r, > u16 from, u16 to, struct sk_buff_head *xmitq) > { > struct sk_buff *_skb, *skb = skb_peek(&l->transmq); > - struct tipc_msg *hdr; > - u16 ack = l->rcv_nxt - 1; > u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1; > + u16 ack = l->rcv_nxt - 1; > + struct tipc_msg *hdr; > > if (!skb) > return 0; > > /* Detect repeated retransmit failures on same packet */ > - if (nacker->last_retransm != buf_seqno(skb)) { > - nacker->last_retransm = buf_seqno(skb); > - nacker->stale_count = 1; > - } else if (++nacker->stale_count > 100) { > + if (r->last_retransm != buf_seqno(skb)) { > + r->last_retransm = buf_seqno(skb); > + r->stale_limit = jiffies + msecs_to_jiffies(l->tolerance); > + } else if (time_after(jiffies, r->stale_limit) && ++r->stale_cnt > 99) { > link_retransmit_failure(l, skb); > - nacker->stale_count = 0; Sorry, I could not understand why we don't reset l->stale_cnt here, but resetting it is moved to tipc_link_rcv(). Thanks, Ying > if (link_is_bc_sndlink(l)) > return TIPC_LINK_DOWN_EVT; > return tipc_link_fsm_evt(l, LINK_FAILURE_EVT); > } > > - /* Move forward to where retransmission should start */ > skb_queue_walk(&l->transmq, skb) { > - if (!less(buf_seqno(skb), from)) > - break; > - } > - > - skb_queue_walk_from(&l->transmq, skb) { > - if (more(buf_seqno(skb), to)) > - break; > hdr = buf_msg(skb); > + if (less(msg_seqno(hdr), from)) > + continue; > + if (more(msg_seqno(hdr), to)) > + break; > _skb = __pskb_copy(skb, MIN_H_SIZE, GFP_ATOMIC); > if (!_skb) > return 0; > @@ -1268,6 +1272,7 @@ int tipc_link_rcv(struct tipc_link *l, struct sk_buff > *skb, > > /* Forward queues and wake up waiting users */ > if (likely(tipc_link_release_pkts(l, msg_ack(hdr)))) { > + l->stale_cnt = 0; > tipc_link_advance_backlog(l, xmitq); > if (unlikely(!skb_queue_empty(&l->wakeupq))) > link_prepare_wakeup(l); > ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot _______________________________________________ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion