> -----Original Message-----
> From: Xue, Ying [mailto:ying....@windriver.com]
> Sent: Tuesday, 30 August, 2016 05:55
> To: Jon Maloy <ma...@donjonn.com>; Jon Maloy <jon.ma...@ericsson.com>;
> tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan
> <parthasarathy.bhuvara...@ericsson.com>; Richard Alpe
> <richard.a...@ericsson.com>
> Cc: gbala...@gmail.com
> Subject: RE: [PATCH net-next 2/3] tipc: rate limit broadcast retransmissions
>
> > +/* link_bc_retr_eval() - check if the indicated range can be retransmitted now
> > + *  - Adjust permitted range if there is overlap with previous retransmission
> > + */
> > +static bool link_bc_retr_eval(struct tipc_link *l, u16 *from, u16 *to)
> > +{
> > +	unsigned long elapsed = jiffies_to_msecs(jiffies - l->prev_retr);
> > +
> > +	if (less(*to, *from))
> > +		return false;
> > +
> > +	/* New retransmission request */
> > +	if ((elapsed > TIPC_BC_RETR_LIMIT) ||
> > +	    less(*to, l->prev_from) || more(*from, l->prev_to)) {
> > +		l->prev_from = *from;
> > +		l->prev_to = *to;
> > +		l->prev_retr = jiffies;
> > +		return true;
> > +	}
> > +
>
> [Ying] In my understanding, the statement above should be changed as below:
>
>     if ((elapsed > TIPC_BC_RETR_LIMIT) &&
>         (less(*to, l->prev_from) || more(*from, l->prev_to)))
>
> Otherwise, I think the rate of retransmission may still be very high,
> especially when packet disorder happens frequently. As it is quite normal
> that the bcast retransmission range requested from different nodes is not
> the same, I suspect the rate is not well suppressed by the condition above.
>
> Maybe we can use the method which is already effective in tipc_sk_enqueue()
> to limit the rate, like:
>
> The condition indicates that the two retransmission requests are
> completely disjoint, e.g., 10-12 and 13-15.
>
> [Ying] Understood, and it makes sense.
>
> I don't see any reason why we should wait with retransmitting 13-15 in
> this case, as it is obvious that somebody is missing it, and we haven't
> retransmitted it before.
>
> [Ying] In most of the cases I observed on bcast before, after a NACK has
> just been submitted asking the sender to retransmit missed packets, the
> requested packets arrive soon afterwards anyway. But tragically the sender
> still retransmits the requested packets through bcast later, wasting
> network bandwidth. Especially when packets are received out of order, this
> further worsens the situation, which may be one of the reasons why we see
> so many duplicated bcast messages on the wire. To avoid this, it is better
> to postpone the retransmission of messages a bit on the sender;
> alternatively, we can restrain the retransmission request on the receiver
> side. In other words, when the receiver detects a gap for the first time,
> it is better to wait a little before submitting the NACK message.
>
There is an implicit delay in the fact that gap detection is made by
looking at STATE messages, and not directly at the packet flow. Reception
of a STATE message should typically happen after the indicated "last_sent"
packet has already been received, and after the detection we want one
single retransmission of the missing packets as soon as possible, in order
not to waste bandwidth. To further speed up detection and retransmission,
you will see that in the next commit I introduce detection and NACK-ing
directly from the incoming packet stream, although it is delayed and
coordinated among the receivers. In brief, I see only positive effects of
early gap detection and sending of NACKs, as long as we can make an
intelligent rate limiting of the actual retransmissions.

> Moreover, I have a more aggressive idea: now that NACKs are delivered
> through unicast, why not use unicast to retransmit bcast messages? This
> is because packet loss rarely happens in practice.

I tested this by running the "bcast-blast" program between two peers in a
200-node network, and I can assure you there were plenty of missed packets.
Moreover, it seems to me that when a packet is lost, it is most often
because it was dropped by an overwhelmed bridge, so that many, if not all,
recipients will miss the same packet. So I don't think unicast is the
solution.

My testing also consisted of adding many atomic counters in the code and
printing them out by command. This gave a good insight into the statistical
behavior of the protocol across all nodes (namespaces). In this rather
extreme test I could see a lot of retransmissions (almost 100 %), but by
counting the absolute number of packets sent and received I could also
observe that the percentage of duplicate packets (i.e., redundant
retransmissions) was very low (< 10 %), something that led me to believe
that the retransmission protocol is about as efficient as it can be.
It is possible that we can improve throughput by decreasing the absolute
number of dropped packets, but I rather think this can be achieved by
reducing the pressure on the bridge through a slower sending rate, and I
doubt there is much to gain from improving the retransmission protocol
itself. I am trying to figure out a way to do this, but just decreasing the
window size didn't do the trick. My feeling is that more can be done here,
but I haven't figured out how, and I don't have much more time to spend on
this right now.

BR ///jon

> So when bcast messages are retransmitted through bcast, these messages
> are redundant for most nodes, which probably means it is unnecessary to
> send them through bcast.
>
> Regards,
> Ying
>
> But I do understand it can be problematic if we receive repeated disjoint
> nacks, e.g., 10-12, 13-15, 10-12, 13-15 etc. I will make a test and see
> if there is anything to gain from keeping a "history" of the
> retransmissions, so that we never repeat the same packets inside the same
> interval. I'll take a look.
>
> ///jon
>
> > tipc_sk_enqueue()
> > {
> > 	..
> > 	while (skb_queue_len(inputq)) {
> > 		if (unlikely(time_after_eq(jiffies, time_limit)))
> > 			return;
> > 	..
> > }
>
> And we don't need to consider the retransmission range.
>
> > Regards,
> > Ying
>
> > +	/* Inside range of previous retransmit */
> > +	if (!less(*from, l->prev_from) && !more(*to, l->prev_to))
> > +		return false;
> > +
> > +	/* Fully or partially outside previous range => exclude overlap */
> > +	if (less(*from, l->prev_from)) {
> > +		*to = l->prev_from - 1;
> > +		l->prev_from = *from;
> > +	}
> > +	if (more(*to, l->prev_to)) {
> > +		*from = l->prev_to + 1;
> > +		l->prev_to = *to;
> > +	}
> > +	l->prev_retr = jiffies;
> > +	return true;
> > +}
> > +
> >  /* tipc_link_bc_sync_rcv - update rcv link according to peer's send state
> >   */
> >  int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr,
> >  			  struct sk_buff_head *xmitq)
> >  {
> > +	struct tipc_link *snd_l = l->bc_sndlink;
> >  	u16 peers_snd_nxt = msg_bc_snd_nxt(hdr);
> >  	u16 from = msg_bcast_ack(hdr) + 1;
> >  	u16 to = from + msg_bc_gap(hdr) - 1;
> > @@ -1613,14 +1655,14 @@ int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr,
> >  	if (!l->bc_peer_is_up)
> >  		return rc;
> >
> > +	l->stats.recv_nacks++;
> > +
> >  	/* Ignore if peers_snd_nxt goes beyond receive window */
> >  	if (more(peers_snd_nxt, l->rcv_nxt + l->window))
> >  		return rc;
> >
> > -	if (!less(to, from)) {
> > -		rc = tipc_link_retrans(l->bc_sndlink, from, to, xmitq);
> > -		l->stats.recv_nacks++;
> > -	}
> > +	if (link_bc_retr_eval(snd_l, &from, &to))
> > +		rc = tipc_link_retrans(snd_l, from, to, xmitq);
> >
> >  	l->snd_nxt = peers_snd_nxt;
> >  	if (link_bc_rcv_gap(l))
> > --
> > 2.7.4