Hi Jon,

See my inline comments.
Thanks,
Hoang

> -----Original Message-----
> From: Jon Maloy <jma...@redhat.com>
> Sent: Friday, October 9, 2020 1:27 AM
> To: Hoang Huu Le <hoang.h...@dektech.com.au>; ma...@donjonn.com; ying....@windriver.com; tipc-discuss...@lists.sourceforge.net
> Subject: Re: [net-next] tipc: introduce smooth change to replicast
>
>
> On 10/7/20 10:22 PM, Hoang Huu Le wrote:
> > In commit cad2929dc432 ("tipc: update a binding service via broadcast"),
> > we use broadcast to update a binding service for a large cluster.
> > However, if we try to publish thousands of services at the same time,
> > we may get a "link overflow" because the queue limit has been reached.
> >
> > We now introduce a smooth change to replicast if the broadcast link
> > has reached its queue limit.
>
> To me this beats the whole purpose of using broadcast distribution in
> the first place.
> We wanted to save CPU and network resources by using broadcast, and
> then, when things get tough, we fall back to the supposedly less
> efficient replicast method. Not good.
>
> I wonder what is really happening when this overflow situation occurs.
> First, the reset limit is dimensioned so that it should be possible to
> publish MAX_PUBLICATIONS (65k) publications in one shot.
> With full bundling, which is what I expect here, there are 1460/20 = 73
> publication items in each buffer, so the reset limit (==max_bulk) should
> be 65k/73 = 897 buffers.

[Hoang] No, that is not right. As I stated in another commit, the reset limit
is 350 buffers (65k / (3744/20)) => #1, where FB_MTU = 3744.
Also, if a user sets a window size of 50, we fall back to a window size of 32 => #2.
So, if the broadcast link is under heavy traffic load, we reach the limit
easily (#1 + #2).

> My figures are just from the top of my head, so you should double check
> them, but I find it unlikely that we hit this limit unless there is a
> lot of other broadcast going on at the same time, and even then I find
> it unlikely.

[Hoang] I just implemented a simple test application (a fuller sketch is
included at the end of my comments):

[...]
num_service = 10k
for (i = 0; i < num_service; i++)
        bind(socket, service<i>);
[...]

Then: run this app; sleep 2; kill the app with SIGINT.
The problem is then reproducible.

> I suggest you try to find out what is really going on when we reach this
> situation.
> -What exactly is in the backlog queue?
> -Only publications?
> -How many?
> -A mixture of publications and other traffic?

[Hoang] Only publication/withdrawal buffers, several thousand of them.

> -Has bundling really worked as supposed?
> -Do we still have some issue with the broadcast link that stops buffers
> being acked and released in a timely manner?
> - Have you been able to dump out such info when this problem occurs?
> - Are you able to re-produce it in your own system?

[Hoang] The answers to these questions are all yes.

> In the end it might be as simple as increasing the reset limit, but we
> really should try to understand what is happening first.

[Hoang] Yes, I think so. In a previous patch I changed the code to raise the
backlog queue limit to 873 buffers. But if I increase the number of services
in my app to 20k (not realistic?), the issue is still reproducible.
That is why I propose this new solution, combined with the two previous
patches, to solve the problem completely.
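For completeness, here is a minimal, self-contained version of the test
application sketched above. It is only a sketch: the service type value
(18888), the NUM_SERVICES constant and the use of a single SOCK_RDM socket
are arbitrary choices of mine, not part of the patch. Each bind() adds one
publication, and killing the process with SIGINT closes the socket and
triggers the corresponding bulk withdrawal on the binding table
distribution link.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/tipc.h>

#define SERVICE_TYPE	18888	/* arbitrary, otherwise unused service type */
#define NUM_SERVICES	10000	/* matches the 10k figure above */

int main(void)
{
	struct sockaddr_tipc addr;
	int sd, i;

	sd = socket(AF_TIPC, SOCK_RDM, 0);
	if (sd < 0) {
		perror("socket");
		return 1;
	}

	memset(&addr, 0, sizeof(addr));
	addr.family = AF_TIPC;
	addr.addrtype = TIPC_ADDR_NAME;
	addr.scope = TIPC_CLUSTER_SCOPE;
	addr.addr.name.name.type = SERVICE_TYPE;

	/* Each bind() publishes one more service instance on this socket */
	for (i = 0; i < NUM_SERVICES; i++) {
		addr.addr.name.name.instance = i;
		if (bind(sd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
			perror("bind");
			return 1;
		}
	}

	/* Wait here; SIGINT terminates the process, closing the socket and
	 * withdrawing all NUM_SERVICES publications in one burst.
	 */
	pause();
	return 0;
}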
>
> Regards
> ///jon
>
> >
> > Signed-off-by: Hoang Huu Le <hoang.h...@dektech.com.au>
> > ---
> >  net/tipc/link.c |  5 ++++-
> >  net/tipc/node.c | 12 ++++++++++--
> >  2 files changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/net/tipc/link.c b/net/tipc/link.c
> > index 06b880da2a8e..ca908ead753a 100644
> > --- a/net/tipc/link.c
> > +++ b/net/tipc/link.c
> > @@ -1022,7 +1022,10 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list,
> >  	/* Allow oversubscription of one data msg per source at congestion */
> >  	if (unlikely(l->backlog[imp].len >= l->backlog[imp].limit)) {
> >  		if (imp == TIPC_SYSTEM_IMPORTANCE) {
> > -			pr_warn("%s<%s>, link overflow", link_rst_msg, l->name);
> > +			pr_warn_ratelimited("%s<%s>, link overflow",
> > +					    link_rst_msg, l->name);
> > +			if (link_is_bc_sndlink(l))
> > +				return -EOVERFLOW;
> >  			return -ENOBUFS;
> >  		}
> >  		rc = link_schedule_user(l, hdr);
> > diff --git a/net/tipc/node.c b/net/tipc/node.c
> > index d269ebe382e1..a37976610367 100644
> > --- a/net/tipc/node.c
> > +++ b/net/tipc/node.c
> > @@ -1750,15 +1750,23 @@ void tipc_node_broadcast(struct net *net, struct sk_buff *skb, int rc_dests)
> >  	struct tipc_node *n;
> >  	u16 dummy;
> >  	u32 dst;
> > +	int rc = 0;
> >
> >  	/* Use broadcast if all nodes support it */
> >  	if (!rc_dests && tipc_bcast_get_mode(net) != BCLINK_MODE_RCAST) {
> > +		txskb = pskb_copy(skb, GFP_ATOMIC);
> > +		if (!txskb)
> > +			goto rcast;
> >  		__skb_queue_head_init(&xmitq);
> > -		__skb_queue_tail(&xmitq, skb);
> > -		tipc_bcast_xmit(net, &xmitq, &dummy);
> > +		__skb_queue_tail(&xmitq, txskb);
> > +		rc = tipc_bcast_xmit(net, &xmitq, &dummy);
> > +		if (rc == -EOVERFLOW)
> > +			goto rcast;
> > +		kfree_skb(skb);
> >  		return;
> >  	}
> >
> > +rcast:
> >  	/* Otherwise use legacy replicast method */
> >  	rcu_read_lock();
> >  	list_for_each_entry_rcu(n, tipc_nodes(net), list) {

_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion