> -----Original Message-----
> From: Xue, Ying [mailto:ying....@windriver.com]
> Sent: Tuesday, 30 August, 2016 05:55
> To: Jon Maloy <ma...@donjonn.com>; Jon Maloy <jon.ma...@ericsson.com>;
> tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan
> <parthasarathy.bhuvara...@ericsson.com>; Richard Alpe
> <richard.a...@ericsson.com>
> Cc: gbala...@gmail.com
> Subject: RE: [PATCH net-next 2/3] tipc: rate limit broadcast retransmissions
> 
> > +/* link_bc_retr_eval() - check if the indicated range can be
> > + * retransmitted now
> > + * - Adjust permitted range if there is overlap with previous
> > + *   retransmission
> > + */
> > +static bool link_bc_retr_eval(struct tipc_link *l, u16 *from, u16 *to)
> > +{
> > +   unsigned long elapsed = jiffies_to_msecs(jiffies - l->prev_retr);
> > +
> > +   if (less(*to, *from))
> > +           return false;
> > +
> > +   /* New retransmission request */
> > +   if ((elapsed > TIPC_BC_RETR_LIMIT) ||
> > +       less(*to, l->prev_from) || more(*from, l->prev_to)) {
> > +           l->prev_from = *from;
> > +           l->prev_to = *to;
> > +           l->prev_retr = jiffies;
> > +           return true;
> > +   }
> > +
> >
> > [Ying] In my understanding, the statement above should be changed as below:
> >
> > if ((elapsed > TIPC_BC_RETR_LIMIT) &&
> >     (less(*to, l->prev_from) || more(*from, l->prev_to)))
> >
> > Otherwise, I think the retransmission rate may still be very high,
> > especially when packets frequently arrive out of order. As it is quite
> > normal that the bcast retransmission ranges requested by different nodes
> > are not the same, I suspect the rate is not well suppressed by the
> > condition above.
> >
> > Maybe we can use the method that is already effective in tipc_sk_enqueue()
> > to limit the rate, like:
> 
> The condition indicates that the two retransmission requests are
> completely disjoint, e.g., 10-12 and 13-15.
> 
> [Ying]Understood, and it makes sense.
> 
> I don't see any reason why we should wait to retransmit 13-15 in this
> case, as it is obvious that somebody is missing it, and we haven't
> retransmitted it before.
> 
> [Ying] In most of the cases I have observed on bcast, the missing packets
> arrive shortly after the NACK requesting their retransmission has been
> submitted to the sender. But the sender still retransmits the requested
> packets through bcast later, wasting network bandwidth. When packets are
> received out of order this gets even worse, which may be one of the
> reasons why we see so many duplicated bcast messages on the wire. To avoid
> this, it would be better to postpone the retransmission a bit on the
> sender side; alternatively, we could restrain the retransmission request
> on the receiver side. In other words, when a receiver detects a gap for
> the first time, it should wait a little before submitting the NACK message.
> 

There is an implicit delay in the fact that gap detection is done by looking 
at STATE messages, not directly at the packet flow. Reception of a STATE 
message should typically happen after the indicated "last_sent" packet has 
already been received, and once the gap is detected we want one single 
retransmission of the missing packets as soon as possible, in order not to 
waste bandwidth.
To further speed up detection and retransmission, you will see that in the 
next commit I introduce detection and NACK-ing directly from the incoming 
packet stream, although it is delayed and coordinated among the receivers.

In brief, I only see positive effects of early gap detection and sending of 
NACKs, as long as we apply intelligent rate limiting to the actual 
retransmissions.
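
For reference, the rate limiting under discussion can be modelled outside the 
kernel as below. This is a simplified userspace sketch of link_bc_retr_eval() 
from the patch, not the kernel code itself: less()/more() mimic TIPC's 
wrap-around sequence number comparison, a millisecond counter stands in for 
jiffies, and BC_RETR_LIMIT_MS is an assumed value.

#include <stdbool.h>
#include <stdint.h>

#define BC_RETR_LIMIT_MS 10	/* assumed; stands in for TIPC_BC_RETR_LIMIT */

/* wrap-around sequence comparison, as in TIPC */
static bool less(uint16_t a, uint16_t b) { return (int16_t)(a - b) < 0; }
static bool more(uint16_t a, uint16_t b) { return less(b, a); }

struct retr_state {
	uint16_t prev_from, prev_to;	/* previously retransmitted range */
	uint64_t prev_retr_ms;		/* time of previous retransmission */
};

static bool retr_eval(struct retr_state *s, uint64_t now_ms,
		      uint16_t *from, uint16_t *to)
{
	if (less(*to, *from))
		return false;

	/* New request: limit expired, or fully disjoint from previous range */
	if (now_ms - s->prev_retr_ms > BC_RETR_LIMIT_MS ||
	    less(*to, s->prev_from) || more(*from, s->prev_to)) {
		s->prev_from = *from;
		s->prev_to = *to;
		s->prev_retr_ms = now_ms;
		return true;
	}

	/* Entirely inside the previous range: suppress */
	if (!less(*from, s->prev_from) && !more(*to, s->prev_to))
		return false;

	/* Partial overlap: keep only the part not retransmitted before */
	if (less(*from, s->prev_from)) {
		*to = s->prev_from - 1;
		s->prev_from = *from;
	}
	if (more(*to, s->prev_to)) {
		*from = s->prev_to + 1;
		s->prev_to = *to;
	}
	s->prev_retr_ms = now_ms;
	return true;
}

With this, back-to-back NACKs for 10-12 and 13-15 both trigger retransmission 
(disjoint ranges), a repeated 10-12 inside the limit interval is suppressed, 
and an overlapping 11-14 is narrowed to 13-14. Note that alternating disjoint 
NACKs (10-12, 13-15, 10-12, ...) all pass, which is the weakness acknowledged 
further down.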

> Moreover, I have a more aggressive idea: since NACKs are delivered through
> unicast, why not use unicast to retransmit bcast messages as well? After
> all, packet loss rarely happens in practice.

I tested this by running the "bcast-blast" program between two peers in a 
200-node network, and I can assure you there were plenty of missed packets. 
Moreover, it seems to me that when a packet is lost, it is most often because 
it was dropped by an overwhelmed bridge, so that many, if not all, recipients 
will miss the same packet. So I don't think unicast is the solution.

My testing also consisted of adding many atomic counters to the code and 
printing them out on command. This gave good insight into the statistical 
behavior of the protocol across all nodes (name spaces). In this rather 
extreme test I could see a lot of retransmissions (almost 100 %), but by 
counting the absolute numbers of packets sent and received I could also 
observe that the percentage of duplicate packets (i.e., redundant 
retransmissions) was very low (< 10 %), which leads me to believe that the 
retransmission protocol is about as efficient as it can be.
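
For completeness, the counters were nothing fancier than kernel atomics 
incremented in the hot paths and dumped on request. A rough sketch of the 
pattern follows; the counter names and the dump hook are hypothetical, not 
the actual test patch:

#include <linux/atomic.h>
#include <linux/printk.h>

/* hypothetical counters, not from the real test patch */
static atomic_t bc_retr_sent = ATOMIC_INIT(0);	/* retransmitted packets */
static atomic_t bc_dup_rcvd = ATOMIC_INIT(0);	/* duplicates received */

/* called from the retransmit path */
static inline void count_bc_retr(void)
{
	atomic_inc(&bc_retr_sent);
}

/* called when an already-delivered packet arrives again */
static inline void count_bc_dup(void)
{
	atomic_inc(&bc_dup_rcvd);
}

/* hooked up to a debug command to dump the totals */
static void dump_bc_stats(void)
{
	pr_info("bc retrans: %d, duplicates: %d\n",
		atomic_read(&bc_retr_sent), atomic_read(&bc_dup_rcvd));
}

Atomic counters keep the hot-path cost low while still giving exact totals 
across all CPUs.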

It is possible that we can improve throughput by decreasing the absolute 
number of dropped packets, but I rather think this is achieved by reducing 
the pressure on the bridge with a slower sending rate; I doubt there is much 
to gain from improving the retransmission protocol itself. I am trying to 
figure out a way to do this, but just decreasing the window size didn't do 
the trick.

My feeling is that more can be done here, but I haven't figured out how, and 
I don't have much more time to spend on this right now.

BR
///jon

> So when bcast messages are retransmitted through bcast, they are redundant
> for most nodes, which probably means it's unnecessary to send them through
> bcast.
> 
> Regards,
> Ying
> 
> But I do understand it can be problematic if we receive repeated disjoint
> NACKs, e.g., 10-12, 13-15, 10-12, 13-15, etc. I will make a test and see
> if there is anything to gain from keeping a "history" of the
> retransmissions, so that we never repeat the same packets inside the same
> interval. I'll take a look.
> 
> ///jon
> 
> >
> > tipc_sk_enqueue()
> > {
> > ..
> >          while (skb_queue_len(inputq)) {
> >                  if (unlikely(time_after_eq(jiffies, time_limit)))
> >                          return;
> > ..
> > }
> >
> > And we don’t need to consider the retransmission range.
> >
> > Regards,
> > Ying
> >
> > +   /* Inside range of previous retransmit */
> > +   if (!less(*from, l->prev_from) && !more(*to, l->prev_to))
> > +           return false;
> > +
> > +   /* Fully or partially outside previous range => exclude overlap */
> > +   if (less(*from, l->prev_from)) {
> > +           *to = l->prev_from - 1;
> > +           l->prev_from = *from;
> > +   }
> > +   if (more(*to, l->prev_to)) {
> > +           *from = l->prev_to + 1;
> > +           l->prev_to = *to;
> > +   }
> > +   l->prev_retr = jiffies;
> > +   return true;
> > +}
> > +
> >   /* tipc_link_bc_sync_rcv - update rcv link according to peer's send state
> >    */
> >   int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr,
> >                       struct sk_buff_head *xmitq)
> >   {
> > +   struct tipc_link *snd_l = l->bc_sndlink;
> >     u16 peers_snd_nxt = msg_bc_snd_nxt(hdr);
> >     u16 from = msg_bcast_ack(hdr) + 1;
> >     u16 to = from + msg_bc_gap(hdr) - 1;
> > @@ -1613,14 +1655,14 @@ int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr,
> >     if (!l->bc_peer_is_up)
> >             return rc;
> >
> > +   l->stats.recv_nacks++;
> > +
> >     /* Ignore if peers_snd_nxt goes beyond receive window */
> >     if (more(peers_snd_nxt, l->rcv_nxt + l->window))
> >             return rc;
> >
> > -   if (!less(to, from)) {
> > -           rc = tipc_link_retrans(l->bc_sndlink, from, to, xmitq);
> > -           l->stats.recv_nacks++;
> > -   }
> > +   if (link_bc_retr_eval(snd_l, &from, &to))
> > +           rc = tipc_link_retrans(snd_l, from, to, xmitq);
> >
> >     l->snd_nxt = peers_snd_nxt;
> >     if (link_bc_rcv_gap(l))
> > --
> > 2.7.4
> >
