-----Original Message-----
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Wednesday, August 17, 2016 2:09 AM
To: tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; 
jon.ma...@ericsson.com
Cc: ma...@donjonn.com; gbala...@gmail.com
Subject: [PATCH net-next 2/3] tipc: rate limit broadcast retransmissions

As cluster sizes grow, so does the amount of identical or overlapping broadcast 
NACKs generated by the packet receivers. This often leads to 'NACK crunches' 
resulting in huge numbers of redundant retransmissions of the same packet 
ranges.

In this commit, we introduce rate control of broadcast retransmissions, so that 
a retransmitted range cannot be retransmitted again until after at least 10 ms. 
This reduces the frequency of duplicate retransmissions by an order of 
magnitude, while having a significant positive impact on throughput and 
scalability.

Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/link.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 47 insertions(+), 5 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c index 136316f..58bb44d 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -181,7 +181,10 @@ struct tipc_link {
        u16 acked;
        struct tipc_link *bc_rcvlink;
        struct tipc_link *bc_sndlink;
-       int nack_state;
+       unsigned long prev_retr;
+       u16 prev_from;
+       u16 prev_to;
+       u8 nack_state;
        bool bc_peer_is_up;
 
        /* Statistics */
@@ -202,6 +205,8 @@ enum {
        BC_NACK_SND_SUPPRESS,
 };
 
+#define TIPC_BC_RETR_LIMIT 10   /* [ms] */

[Ying] I suggest we should define jiffies number rather than 10ms. If so, we 
don't need to convert l->prev_retr  from jiffies to microsecond.

+
 /*
  * Interval between NACKs when packets arrive out of order
  */
@@ -1590,11 +1595,48 @@ void tipc_link_bc_init_rcv(struct tipc_link *l, struct 
tipc_msg *hdr)
                l->rcv_nxt = peers_snd_nxt;
 }
 
+/* link_bc_retr eval()- check if the indicated range can be 
+retransmitted now
+ * - Adjust permitted range if there is overlap with previous 
+retransmission  */ static bool link_bc_retr_eval(struct tipc_link *l, 
+u16 *from, u16 *to) {
+       unsigned long elapsed = jiffies_to_msecs(jiffies - l->prev_retr);
+
+       if (less(*to, *from))
+               return false;
+
+       /* New retransmission request */
+       if ((elapsed > TIPC_BC_RETR_LIMIT) ||
+           less(*to, l->prev_from) || more(*from, l->prev_to)) {
+               l->prev_from = *from;
+               l->prev_to = *to;
+               l->prev_retr = jiffies;
+               return true;
+       }
+

[Ying] In the my understanding, the statement above seems to be changed as 
below:

if ((elapsed > TIPC_BC_RETR_LIMIT) && (less(*to, l->prev_from) || more(*from, 
l->prev_to)))

Otherwise, I think the rate of retransmission may be still very high especially 
when packet disorder frequently happens.
As it's very normal that BCast retransmission range requested from different 
nodes is not same, I guess the rate is not well suppressed in above condition.

Maybe, we can use the method which is being effective in tipc_sk_enqueue() to 
limit the rate like:

tipc_sk_enqueue()
{
..
        while (skb_queue_len(inputq)) {
                if (unlikely(time_after_eq(jiffies, time_limit)))
                        return;
..
}

And we don’t need to consider the retransmission range.

Regards,
Ying

+       /* Inside range of previous retransmit */
+       if (!less(*from, l->prev_from) && !more(*to, l->prev_to))
+               return false;
+
+       /* Fully or partially outside previous range => exclude overlap */
+       if (less(*from, l->prev_from)) {
+               *to = l->prev_from - 1;
+               l->prev_from = *from;
+       }
+       if (more(*to, l->prev_to)) {
+               *from = l->prev_to + 1;
+               l->prev_to = *to;
+       }
+       l->prev_retr = jiffies;
+       return true;
+}
+
 /* tipc_link_bc_sync_rcv - update rcv link according to peer's send state
  */
 int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr,
                          struct sk_buff_head *xmitq)
 {
+       struct tipc_link *snd_l = l->bc_sndlink;
        u16 peers_snd_nxt = msg_bc_snd_nxt(hdr);
        u16 from = msg_bcast_ack(hdr) + 1;
        u16 to = from + msg_bc_gap(hdr) - 1;
@@ -1613,14 +1655,14 @@ int tipc_link_bc_sync_rcv(struct tipc_link *l, struct 
tipc_msg *hdr,
        if (!l->bc_peer_is_up)
                return rc;
 
+       l->stats.recv_nacks++;
+
        /* Ignore if peers_snd_nxt goes beyond receive window */
        if (more(peers_snd_nxt, l->rcv_nxt + l->window))
                return rc;
 
-       if (!less(to, from)) {
-               rc = tipc_link_retrans(l->bc_sndlink, from, to, xmitq);
-               l->stats.recv_nacks++;
-       }
+       if (link_bc_retr_eval(snd_l, &from, &to))
+               rc = tipc_link_retrans(snd_l, from, to, xmitq);
 
        l->snd_nxt = peers_snd_nxt;
        if (link_bc_rcv_gap(l))
--
2.7.4

------------------------------------------------------------------------------
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Reply via email to