Hi Jon,

See my inline comment.

Thanks,
Hoang
> -----Original Message-----
> From: Jon Maloy <jma...@redhat.com>
> Sent: Friday, October 9, 2020 1:27 AM
> To: Hoang Huu Le <hoang.h...@dektech.com.au>; ma...@donjonn.com; 
> ying....@windriver.com; tipc-
> discuss...@lists.sourceforge.net
> Subject: Re: [net-next] tipc: introduce smooth change to replicast
> 
> 
> 
> On 10/7/20 10:22 PM, Hoang Huu Le wrote:
> > In commit cad2929dc432 ("tipc: update a binding service via broadcast"),
> > we use broadcast to update a binding service for a large cluster.
> > However, if we try to publish thousands of services at the same time,
> > we may get a "link overflow" because the queue limit has been reached.
> >
> > We now introduce a smooth change to replicast if the broadcast link
> > has reached its queue limit.
> To me this beats the whole purpose of using broadcast distribution in
> the first place.
> We wanted to save CPU and network resources by using broadcast, and
> then, when things get tough, we fall back to the supposedly less
> efficient replicast method. Not good.
> 
> I wonder what is really happening when this overflow situation occurs.
> First, the reset limit is dimensioned so that it should be possible to
> publish MAX_PUBLICATIONS (65k) publications in one shot.
> With full bundling, which is what I expect here, there are 1460/20 = 73
> publication items in each buffer, so the reset limit (==max_bulk) should
> be 65k/73 = 897 buffers.

[Hoang] No, that's not right.
As I stated in another commit, the reset limit is 350 buffers
(65k/(3744/20)) => #1,
where:
FB_MTU=3744
Also, if a user sets the window size to 50, we fall back to a window size of 32 => #2.
So, if the broadcast link is under heavy traffic, we reach the limit
easily (#1 + #2).
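
For reference, here is a back-of-the-envelope check of both figures. This is
a standalone sketch; the variable names are mine, and only the constants
(roughly 65k publications, 20-byte publication items, 1460 vs. FB_MTU=3744)
come from the numbers above:

#include <stdio.h>

/* Rough check of the broadcast-link reset limit under two payload
 * assumptions: ~65k publications, 20 bytes per bundled publication item. */
int main(void)
{
        int max_publ = 65535;   /* ~65k publications */
        int item_size = 20;     /* bytes per bundled publication item */
        int eth_data = 1460;    /* assumption behind the ~897-buffer figure */
        int fb_mtu = 3744;      /* FB_MTU actually used on the broadcast link */

        printf("with 1460-byte payload: %d buffers\n",
               max_publ / (eth_data / item_size));      /* 65535/73  = 897 */
        printf("with FB_MTU=3744:       %d buffers\n",
               max_publ / (fb_mtu / item_size));        /* 65535/187 = 350 */
        return 0;
}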

> My figures are just from the top of my head, so you should double check
> them, but I find it unlikely that we hit this limit unless there is a
> lot of other broadcast going on at the same time, and even then I find
> it unlikely.
[Hoang]
I just implemented a simple application:
[...]
num_service = 10k
for (i = 0; i < num_service; i++)
    bind(socket, service<i>);
[...]

Then: run this app; sleep 2; kill -SIGINT the app. After that, the problem
is reproducible.
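
For completeness, a fleshed-out version of that reproducer as I read it.
This is my own sketch; the single socket, the service type value and the
per-instance numbering are assumptions, since the original app is only
outlined above:

/* build: gcc -o bindflood bindflood.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/tipc.h>

#define SERVICE_TYPE 18888      /* arbitrary test service type (my choice) */
#define NUM_SERVICES 10000      /* 10k bindings, as in the outline above */

int main(void)
{
        struct sockaddr_tipc addr;
        int sd, i;

        sd = socket(AF_TIPC, SOCK_RDM, 0);
        if (sd < 0) {
                perror("socket");
                return 1;
        }

        memset(&addr, 0, sizeof(addr));
        addr.family = AF_TIPC;
        addr.addrtype = TIPC_ADDR_NAMESEQ;      /* bind one instance per call */
        addr.scope = TIPC_CLUSTER_SCOPE;
        addr.addr.nameseq.type = SERVICE_TYPE;

        /* Each bind() publishes one more service instance; killing the
         * process later withdraws all of them in one burst. */
        for (i = 0; i < NUM_SERVICES; i++) {
                addr.addr.nameseq.lower = i;
                addr.addr.nameseq.upper = i;
                if (bind(sd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                        perror("bind");
                        return 1;
                }
        }
        pause();        /* hold the bindings until SIGINT */
        return 0;
}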
 
> I suggest you try to find out what is really going on when we reach this
> situation.
> -What exactly is in the backlog queue?
> -Only publications?
> -How many?
> -A mixture of publications and other traffic?
[Hoang] Only publication/withdrawal buffers, several thousand of them.

> -Has bundling really worked as supposed?
> -Do we still have some issue with the broadcast link that stops buffers
> being acked and released in a timely manner?
> - Have you been able to dump out such info when this problem occurs?
> - Are you able to re-produce it in your own system?
[Hoang] The answer to all of these is YES.

> In the end it might be as simple as increasing the reset limit, but we
> really should try to understand what is happening first.
[Hoang] Yes, I think so. In a previous patch I made a code change to raise
the queue backlog limit to 873 buffers.
But if I increase the number of services in my app to 20k (perhaps not a
realistic number?), the issue is still reproducible.
That is why I propose this new solution, combined with the two previous
patches, to solve the problem completely.
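
To summarize how I intend the pieces to fit together, here is a condensed,
abbreviated reading of the tipc_node_broadcast() hunk quoted below (not
additional kernel code, and not compilable on its own): try the broadcast
link first with a copy, and only fall back to per-node replicast when the
broadcast send link reports -EOVERFLOW.

/* Sketch only: try broadcast first, keep the original skb for replicast. */
txskb = pskb_copy(skb, GFP_ATOMIC);             /* copy goes to broadcast */
if (txskb) {
        __skb_queue_head_init(&xmitq);
        __skb_queue_tail(&xmitq, txskb);
        if (tipc_bcast_xmit(net, &xmitq, &dummy) != -EOVERFLOW) {
                kfree_skb(skb);                 /* broadcast took the copy */
                return;
        }
}
/* fall through: the legacy replicast loop consumes the original skb */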
> 
> Regards
> ///jon
> 
> 
> >
> > Signed-off-by: Hoang Huu Le <hoang.h...@dektech.com.au>
> > ---
> >   net/tipc/link.c |  5 ++++-
> >   net/tipc/node.c | 12 ++++++++++--
> >   2 files changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/net/tipc/link.c b/net/tipc/link.c
> > index 06b880da2a8e..ca908ead753a 100644
> > --- a/net/tipc/link.c
> > +++ b/net/tipc/link.c
> > @@ -1022,7 +1022,10 @@ int tipc_link_xmit(struct tipc_link *l, struct 
> > sk_buff_head *list,
> >     /* Allow oversubscription of one data msg per source at congestion */
> >     if (unlikely(l->backlog[imp].len >= l->backlog[imp].limit)) {
> >             if (imp == TIPC_SYSTEM_IMPORTANCE) {
> > -                   pr_warn("%s<%s>, link overflow", link_rst_msg, l->name);
> > +                   pr_warn_ratelimited("%s<%s>, link overflow",
> > +                                       link_rst_msg, l->name);
> > +                   if (link_is_bc_sndlink(l))
> > +                           return -EOVERFLOW;
> >                     return -ENOBUFS;
> >             }
> >             rc = link_schedule_user(l, hdr);
> > diff --git a/net/tipc/node.c b/net/tipc/node.c
> > index d269ebe382e1..a37976610367 100644
> > --- a/net/tipc/node.c
> > +++ b/net/tipc/node.c
> > @@ -1750,15 +1750,23 @@ void tipc_node_broadcast(struct net *net, struct 
> > sk_buff *skb, int rc_dests)
> >     struct tipc_node *n;
> >     u16 dummy;
> >     u32 dst;
> > +   int rc = 0;
> >
> >     /* Use broadcast if all nodes support it */
> >     if (!rc_dests && tipc_bcast_get_mode(net) != BCLINK_MODE_RCAST) {
> > +           txskb = pskb_copy(skb, GFP_ATOMIC);
> > +           if (!txskb)
> > +                   goto rcast;
> >             __skb_queue_head_init(&xmitq);
> > -           __skb_queue_tail(&xmitq, skb);
> > -           tipc_bcast_xmit(net, &xmitq, &dummy);
> > +           __skb_queue_tail(&xmitq, txskb);
> > +           rc = tipc_bcast_xmit(net, &xmitq, &dummy);
> > +           if (rc == -EOVERFLOW)
> > +                   goto rcast;
> > +           kfree_skb(skb);
> >             return;
> >     }
> >
> > +rcast:
> >     /* Otherwise use legacy replicast method */
> >     rcu_read_lock();
> >     list_for_each_entry_rcu(n, tipc_nodes(net), list) {

