On 10/7/20 10:22 PM, Hoang Huu Le wrote:
In commit cad2929dc432 ("tipc: update a binding service via broadcast"),
we use broadcast to update a binding service for a large cluster.
However, if we try to publish thousands of services at the same time,
we may get a "link overflow" because the queue limit has been reached.
We now introduce a smooth fallback to replicast if the broadcast link
has reached its queue limit.
To me this defeats the whole purpose of using broadcast distribution in
the first place.
We wanted to save CPU and network resources by using broadcast, and
then, when things get tough, we fall back to the supposedly less
efficient replicast method. Not good.
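In other words, the patch effectively turns the send path into the
following (a simplified user-space sketch with stubbed-out helpers, just
to make the fallback explicit; it is not the actual kernel code):

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the kernel helpers (assumptions for illustration only) */
static bool bc_backlog_full;

static int bcast_xmit(const char *msg)
{
	if (bc_backlog_full)
		return -EOVERFLOW;	/* broadcast send link hit its backlog limit */
	printf("broadcast: %s\n", msg);
	return 0;
}

static void rcast_xmit(const char *msg)
{
	/* one unicast copy per peer node - the costly path */
	printf("replicast: %s\n", msg);
}

static void node_broadcast(const char *msg)
{
	/* try broadcast first, fall back to replicast on overflow */
	if (bcast_xmit(msg) == -EOVERFLOW)
		rcast_xmit(msg);
}

int main(void)
{
	node_broadcast("publication #1");	/* sent as broadcast */
	bc_backlog_full = true;
	node_broadcast("publication #2");	/* degrades to replicast */
	return 0;
}

I.e., exactly when the system is busiest we switch over to the method
that sends one copy per destination node.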
I wonder what is really happening when this overflow situation occurs.
First, the reset limit is dimensioned so that it should be possible to
publish MAX_PUBLICATIONS (65k) publications in one shot.
With full bundling, which is what I expect here, there are 1460/20 = 73
publication items in each buffer, so the reset limit (== max_bulk) should
be 65k/73 ≈ 898 buffers.
My figures are just from the top of my head, so you should double check
them, but I find it unlikely that we hit this limit unless there is a
lot of other broadcast traffic going on at the same time, and even then
it seems improbable.
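For reference, a back-of-envelope sketch of that arithmetic (the
1460-byte bundle payload, the 20-byte publication item size and the
65536 publication count are assumptions here and should be checked
against the real TIPC definitions):

#include <stdio.h>

/* Assumed figures - double check against the actual TIPC headers */
#define BUNDLE_PAYLOAD	1460	/* usable bytes per bundled buffer (assumed) */
#define ITEM_SIZE	20	/* bytes per publication item (assumed) */
#define MAX_PUBL	65536	/* "65k" publications in one bulk (assumed) */

int main(void)
{
	int items_per_buf = BUNDLE_PAYLOAD / ITEM_SIZE;
	int bufs_needed = (MAX_PUBL + items_per_buf - 1) / items_per_buf;

	printf("publication items per buffer: %d\n", items_per_buf);	/* 73 */
	printf("buffers for a full bulk:      %d\n", bufs_needed);	/* 898 */
	return 0;
}

So a pure bulk of publications should need on the order of 900 buffers,
which the limit is supposed to accommodate.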
I suggest you try to find out what is really going on when we reach this
situation.
- What exactly is in the backlog queue?
- Only publications?
- How many?
- A mixture of publications and other traffic?
- Has bundling really worked as intended?
- Do we still have some issue with the broadcast link that stops buffers
  being acked and released in a timely manner?
- Have you been able to dump out such info when this problem occurs?
- Are you able to reproduce it in your own system?
In the end it might be as simple as increasing the reset limit, but we
really should try to understand what is happening first.
Regards
///jon
Signed-off-by: Hoang Huu Le <hoang.h...@dektech.com.au>
---
net/tipc/link.c | 5 ++++-
net/tipc/node.c | 12 ++++++++++--
2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/net/tipc/link.c b/net/tipc/link.c
index 06b880da2a8e..ca908ead753a 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1022,7 +1022,10 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list,
 	/* Allow oversubscription of one data msg per source at congestion */
 	if (unlikely(l->backlog[imp].len >= l->backlog[imp].limit)) {
 		if (imp == TIPC_SYSTEM_IMPORTANCE) {
-			pr_warn("%s<%s>, link overflow", link_rst_msg, l->name);
+			pr_warn_ratelimited("%s<%s>, link overflow",
+					    link_rst_msg, l->name);
+			if (link_is_bc_sndlink(l))
+				return -EOVERFLOW;
 			return -ENOBUFS;
 		}
 		rc = link_schedule_user(l, hdr);
rc = link_schedule_user(l, hdr);
diff --git a/net/tipc/node.c b/net/tipc/node.c
index d269ebe382e1..a37976610367 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1750,15 +1750,23 @@ void tipc_node_broadcast(struct net *net, struct sk_buff *skb, int rc_dests)
 	struct tipc_node *n;
 	u16 dummy;
 	u32 dst;
+	int rc = 0;
 
 	/* Use broadcast if all nodes support it */
 	if (!rc_dests && tipc_bcast_get_mode(net) != BCLINK_MODE_RCAST) {
+		txskb = pskb_copy(skb, GFP_ATOMIC);
+		if (!txskb)
+			goto rcast;
 		__skb_queue_head_init(&xmitq);
-		__skb_queue_tail(&xmitq, skb);
-		tipc_bcast_xmit(net, &xmitq, &dummy);
+		__skb_queue_tail(&xmitq, txskb);
+		rc = tipc_bcast_xmit(net, &xmitq, &dummy);
+		if (rc == -EOVERFLOW)
+			goto rcast;
+		kfree_skb(skb);
 		return;
 	}
 
+rcast:
 	/* Otherwise use legacy replicast method */
 	rcu_read_lock();
 	list_for_each_entry_rcu(n, tipc_nodes(net), list) {