On 10/8/20 11:36 PM, Hoang Huu Le wrote:
Hi Jon,
See my inline comments.
Thanks,
Hoang
-----Original Message-----
From: Jon Maloy <jma...@redhat.com>
Sent: Friday, October 9, 2020 1:27 AM
To: Hoang Huu Le <hoang.h...@dektech.com.au>; ma...@donjonn.com;
ying....@windriver.com; tipc-
discuss...@lists.sourceforge.net
Subject: Re: [net-next] tipc: introduce smooth change to replicast
On 10/7/20 10:22 PM, Hoang Huu Le wrote:
In commit cad2929dc432 ("tipc: update a binding service via broadcast"),
we use broadcast to update a binding service for a large cluster.
However, if we try to publish thousands of services at the same time,
we may get a "link overflow" because the queue limit has been reached.
We now introduce a smooth change to replicast if the broadcast link has
reached its queue limit.
To me this defeats the whole purpose of using broadcast distribution in
the first place.
We wanted to save CPU and network resources by using broadcast, and
then, when things get tough, we fall back to the supposedly less
efficient replicast method. Not good.
I wonder what is really happening when this overflow situation occurs.
First, the reset limit is dimensioned so that it should be possible to
publish MAX_PUBLICATIONS (65k) publications in one shot.
With full bundling, which is what I expect here, there are 1460/20 = 73
publication items in each buffer, so the reset limit (==max_bulk) should
be 65k/73 = 897 buffers.
[Hoang] No, that's not right!
As I stated in another commit, the reset limit is 350 buffers (65k/(3744/20))
=> #1.
where:
FB_MTU = 3744
and if a user sets the window size to 50, we fall back to a window size of 32 => #2.
So, if the broadcast link is under heavy traffic, we will reach the
limit easily (#1 + #2).
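For clarity, here is a rough sketch of the arithmetic behind the two figures
(the constant names ITEM_SIZE and MAX_PUBLICATIONS are used illustratively
here, not quoted verbatim from the source):

/* Rough sketch of the two backlog-limit figures discussed above,
 * assuming 20 bytes per bundled publication item.
 */
#define ITEM_SIZE         20        /* bytes per publication item     */
#define MAX_PUBLICATIONS  65536     /* "65k" publications in one shot */

int limit_with_eth_mtu = MAX_PUBLICATIONS / (1460 / ITEM_SIZE); /* 65k/73  ~= 897 buffers */
int limit_with_fb_mtu  = MAX_PUBLICATIONS / (3744 / ITEM_SIZE); /* 65k/187 ~= 350 buffers */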
Yes. I didn't review your previous patches before making this comment.
My figures are just from the top of my head, so you should double check
them, but I find it unlikely that we hit this limit unless there is a
lot of other broadcast going on at the same time, and even then I find
it unlikely.
[Hoang]
I just implemented a simple application (a fuller sketch follows below):
[...]
num_service = 10000;
for (i = 0; i < num_service; i++)
        bind(socket, service<i>);
[...]
Then: run this app; sleep 2; kill -SIGINT app. The problem is then reproducible.
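A fuller sketch of what such a reproducer might look like (my own
reconstruction, not the exact test code; the single SOCK_RDM socket and the
service type 18888 are assumptions):

/* Bind num_service publications on one TIPC socket, then keep them
 * alive until the process is killed, so the kill triggers a bulk
 * withdrawal of all bindings at once.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/tipc.h>

int main(void)
{
	struct sockaddr_tipc addr;
	int num_service = 10000;
	int sd = socket(AF_TIPC, SOCK_RDM, 0);
	int i;

	memset(&addr, 0, sizeof(addr));
	addr.family = AF_TIPC;
	addr.addrtype = TIPC_ADDR_NAME;
	addr.scope = TIPC_CLUSTER_SCOPE;
	addr.addr.name.name.type = 18888;          /* arbitrary service type */

	for (i = 0; i < num_service; i++) {
		addr.addr.name.name.instance = i;  /* one publication per bind() */
		if (bind(sd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
			perror("bind");
	}
	pause();  /* hold the bindings; the kill withdraws them all at once */
	return 0;
}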
I suggest you try to find out what is really going on when we reach this
situation.
-What exactly is in the backlog queue?
-Only publications?
-How many?
-A mixture of publications and other traffic?
[Hoang] Only publication/withdrawal buffers, several thousand of them.
-Has bundling really worked as supposed?
-Do we still have some issue with the broadcast link that stops buffers
being acked and released in a timely manner?
I suspect you mean NO in this case ;-)
- Have you been able to dump out such info when this problem occurs?
- Are you able to re-produce it in your own system?
[Hoang] The answers to both are YES.
In the end it might be as simple as increasing the reset limit, but we
really should try to understand what is happening first.
[Hoang] Yes, I think so. In a previous patch I made a code change to raise the
backlog queue limit to 873 buffers.
But if I increase the number of services in my app to 20k (not realistic?), the
issue is reproducible as well.
Why does it still happen? The new limit is calculated to be safe for 65k
publications, so why does it fail already at 20k?
According to the loop you show above you only do publications, no
withdrawals, so you should not even theoretically be able to reach the
reset limit.
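(As a rough sanity check with the bundling figures above: 20k publications at
73 items per buffer is about 274 buffers, or about 107 buffers at 187 items
per buffer with FB_MTU, well below the 897 and 350 buffer limits respectively.)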
What is happening?
Let's discuss this tomorrow.
///jon
[Hoang] That is the reason why I propose this new solution, combined with the
two previous patches, to solve the problem completely.
Regards
///jon
Signed-off-by: Hoang Huu Le <hoang.h...@dektech.com.au>
---
net/tipc/link.c | 5 ++++-
net/tipc/node.c | 12 ++++++++++--
2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/net/tipc/link.c b/net/tipc/link.c
index 06b880da2a8e..ca908ead753a 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1022,7 +1022,10 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list,
 	/* Allow oversubscription of one data msg per source at congestion */
 	if (unlikely(l->backlog[imp].len >= l->backlog[imp].limit)) {
 		if (imp == TIPC_SYSTEM_IMPORTANCE) {
-			pr_warn("%s<%s>, link overflow", link_rst_msg, l->name);
+			pr_warn_ratelimited("%s<%s>, link overflow",
+					    link_rst_msg, l->name);
+			if (link_is_bc_sndlink(l))
+				return -EOVERFLOW;
 			return -ENOBUFS;
 		}
 		rc = link_schedule_user(l, hdr);
diff --git a/net/tipc/node.c b/net/tipc/node.c
index d269ebe382e1..a37976610367 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1750,15 +1750,23 @@ void tipc_node_broadcast(struct net *net, struct sk_buff *skb, int rc_dests)
 	struct tipc_node *n;
 	u16 dummy;
 	u32 dst;
+	int rc = 0;
 
 	/* Use broadcast if all nodes support it */
 	if (!rc_dests && tipc_bcast_get_mode(net) != BCLINK_MODE_RCAST) {
+		txskb = pskb_copy(skb, GFP_ATOMIC);
+		if (!txskb)
+			goto rcast;
 		__skb_queue_head_init(&xmitq);
-		__skb_queue_tail(&xmitq, skb);
-		tipc_bcast_xmit(net, &xmitq, &dummy);
+		__skb_queue_tail(&xmitq, txskb);
+		rc = tipc_bcast_xmit(net, &xmitq, &dummy);
+		if (rc == -EOVERFLOW)
+			goto rcast;
+		kfree_skb(skb);
 		return;
 	}
 
+rcast:
 	/* Otherwise use legacy replicast method */
 	rcu_read_lock();
 	list_for_each_entry_rcu(n, tipc_nodes(net), list) {
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion