Re: [tipc-discussion] [net] tipc: fix incorrect setting window for bcast link
> -----Original Message-----
> From: Jon Maloy
> Sent: Tuesday, October 13, 2020 6:39 AM
> To: Hoang Huu Le; tipc-discussion@lists.sourceforge.net; ma...@donjonn.com; ying@windriver.com
> Subject: Re: [net] tipc: fix incorrect setting window for bcast link
>
> Hi Hoang,
> I apologize for not paying enough attention to this problem until now.
> See below.
>
> On 9/30/20 9:43 PM, Hoang Huu Le wrote:
> > In commit 16ad3f4022bb ("tipc: introduce variable window congestion
> > control"), we applied the Reno algorithm to select the window size,
> > from the minimum window up to the configured maximum window, for the
> > unicast link.
> >
> > However, when setting the window variable we did not restrict the
> > change to the unicast link. This caused the broadcast link window to
> > take an unexpected value (i.e. the window size is constantly 32).
> >
> > We fix this by updating the window for the broadcast link according
> > to its configuration.
> >
> > Signed-off-by: Hoang Huu Le
> > ---
> >  net/tipc/bcast.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
> > index 940d176e0e87..abac9443b4d9 100644
> > --- a/net/tipc/bcast.c
> > +++ b/net/tipc/bcast.c
> > @@ -585,7 +585,7 @@ static int tipc_bc_link_set_queue_limits(struct net *net, u32 max_win)
> >  	if (max_win > TIPC_MAX_LINK_WIN)
> >  		return -EINVAL;
> >  	tipc_bcast_lock(net);
> > -	tipc_link_set_queue_limits(l, BCLINK_WIN_MIN, max_win);
> > +	tipc_link_set_queue_limits(l, max_win, max_win);
>
> I think this is dangerous. The broadcast link puts a much higher stress
> on the switch, and the risk of massive packet loss with ditto
> retransmissions is very high.
> A safer bet is to stay with a fixed window of 50 for now, to solve the
> problem you have at sites, and then you can possibly experiment with a
> variable window later.

So, should we go back to returning 'not supported' when the user changes
the window for the broadcast link, or just silently ignore the setting?
> If it gives significantly higher throughput it might be worthwhile
> trying, but I am pretty sure that e.g. the base for calculating
> ssthresh (300) is too big.
>
> ///jon
>
> >  	tipc_bcast_unlock(net);
> >  	return 0;
> >  }

_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion
Re: [tipc-discussion] [net-next] tipc: introduce smooth change to replicast
On 10/8/20 11:36 PM, Hoang Huu Le wrote:
> Hi Jon,
> See my inline comment.
>
> Thanks,
> Hoang
>
> > -----Original Message-----
> > From: Jon Maloy
> > Sent: Friday, October 9, 2020 1:27 AM
> > To: Hoang Huu Le; ma...@donjonn.com; ying@windriver.com; tipc-discussion@lists.sourceforge.net
> > Subject: Re: [net-next] tipc: introduce smooth change to replicast
> >
> > On 10/7/20 10:22 PM, Hoang Huu Le wrote:
> > > In commit cad2929dc432 ("tipc: update a binding service via
> > > broadcast"), we use broadcast to update a binding service for a
> > > large cluster. However, if we try to publish thousands of services
> > > at the same time, we may get a "link overflow" because the queue
> > > limit has been reached. We now introduce a smooth change to
> > > replicast if the broadcast link has reached its queue limit.
> >
> > To me this beats the whole purpose of using broadcast distribution in
> > the first place. We wanted to save CPU and network resources by using
> > broadcast, and then, when things get tough, we fall back to the
> > supposedly less efficient replicast method. Not good.
> > I wonder what is really happening when this overflow situation
> > occurs. First, the reset limit is dimensioned so that it should be
> > possible to publish MAX_PUBLICATIONS (65k) publications in one shot.
> > With full bundling, which is what I expect here, there are
> > 1460/20 = 73 publication items in each buffer, so the reset limit
> > (== max_bulk) should be 65k/73 = 897 buffers.
>
> [Hoang] No, that's not right! As I stated in another commit, the reset
> limit is 350 buffers (65k/(3744/20)) => #1, where FB_MTU = 3744.
> Also, if a user sets the window size to 50, we fall back to a window
> size of 32 => #2.
> So, if the broadcast link is under high traffic load, we reach the
> limit easily (#1 + #2).

Yes. I didn't review your previous patches before making this comment.
My figures are just from the top of my head, so you should double check
them, but I find it unlikely that we hit this limit unless there is a
lot of other broadcast going on at the same time, and even then I find
it unlikely.
> [Hoang] I just implemented a simple application:
>
>   [...]
>   num_service = 10k
>   for (i = 0; i < num_service; i++)
>       [...];
>
> Then: run this app; sleep 2; kill SIGINT app.
> Then the problem is reproducible.
>
> > I suggest you try to find out what is really going on when we reach
> > this situation.
> > - What exactly is in the backlog queue?
> > - Only publications?
> > - How many?
> > - A mixture of publications and other traffic?
>
> [Hoang] Only publication/withdraw buffers, several thousand.
>
> > - Has bundling really worked as supposed?
> > - Do we still have some issue with the broadcast link that stops
> >   buffers being acked and released in a timely manner?
>
> [Hoang] I suspect you mean NO in this case ;-)
>
> > - Have you been able to dump out such info when this problem occurs?
> > - Are you able to reproduce it in your own system?
>
> [Hoang] These answers are YES.
>
> > In the end it might be as simple as increasing the reset limit, but
> > we really should try to understand what is happening first.
>
> [Hoang] Yes, I think so. In a previous patch I made a code change to
> raise the queue backlog limit to 873 buffers. But if I increase the
> number of services in my app up to 20k (not realistic?), the issue is
> reproducible as well.

Why does it still happen? The new limit is calculated to be safe for 65k
publications, so why does it fail already at 20k? According to the loop
you show above you only do publications, no withdrawals, so you should
not even theoretically be able to reach the reset limit. What is
happening?
Let's discuss this tomorrow.

///jon

> That is the reason why I propose this new solution, combined with the
> two previous patches, to solve the problem completely.
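[Editor's note] The bundling arithmetic disputed above is easy to check. The sketch below reproduces both estimates: Jon's back-of-envelope figure assuming a 1460-byte payload, and Hoang's figure assuming FB_MTU = 3744. The 20-byte item size and the 64K publication count are taken from the thread; the helper name is ours.

```c
#include <assert.h>

#define ITEM_SIZE 20            /* bytes per publication item (from the thread) */
#define MAX_PUBLICATIONS 65536  /* the "65k" publications in the discussion */

/* How many backlog buffers are needed to carry MAX_PUBLICATIONS
 * publication items, assuming full bundling into MTU-sized buffers? */
static int reset_limit(int mtu)
{
    int items_per_buf = mtu / ITEM_SIZE;

    return MAX_PUBLICATIONS / items_per_buf;
}
```

With a 1460-byte buffer this gives Jon's 897; with FB_MTU = 3744 it gives Hoang's 350, so the two figures differ only in the assumed buffer size.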
> > Regards
> > ///jon
>
> > > Signed-off-by: Hoang Huu Le
> > > ---
> > >  net/tipc/link.c |  5 ++++-
> > >  net/tipc/node.c | 12 ++++++++++--
> > >  2 files changed, 14 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/net/tipc/link.c b/net/tipc/link.c
> > > index 06b880da2a8e..ca908ead753a 100644
> > > --- a/net/tipc/link.c
> > > +++ b/net/tipc/link.c
> > > @@ -1022,7 +1022,10 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list,
> > >  	/* Allow oversubscription of one data msg per source at congestion */
> > >  	if (unlikely(l->backlog[imp].len >= l->backlog[imp].limit)) {
> > >  		if (imp == TIPC_SYSTEM_IMPORTANCE) {
> > > -			pr_warn("%s<%s>, link overflow", link_rst_msg, l->name);
> > > +			pr_warn_ratelimited("%s<%s>, link overflow",
> > > +					    link_rst_msg, l->name);
> > > +			if (link_is_bc_sndlink(l))
> > > +				return -EOVERFLOW;
> > >  			return -ENOBUFS;
> > >  		}
> > >  		rc = link_schedule_user(l, hdr);
> > > diff --git a/net/tipc/node.c b/net/tipc/node.c
> > > index d269ebe382e1..a37976610367 100644
> > > --- a/net/tipc/node.c
> > > +++ b/net/tipc/node.c
> > > @@ -1750,15 +1750,23 @@ void tipc_node_broadcast(struct net *net, struct sk_buff *skb, int rc_dests)
> > >  	struct tipc_node *n;
> > >  	u16 dummy;
> > >  	u32 dst;
> > > +	int rc = 0;
> > >
> > >  	/* Use broadcast if all nodes
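[Editor's note] The patch's intent — try broadcast first and degrade to per-node replicast only on the dedicated overflow error — can be illustrated with a userspace sketch. The stub functions and names below are ours, not the kernel code; only the -EOVERFLOW-driven fallback mirrors the patch above.

```c
#include <assert.h>
#include <errno.h>

/* Stubs standing in for the kernel transmit paths (illustration only). */
static int bcast_full;  /* simulate a saturated broadcast backlog queue */

static int toy_bcast_xmit(const char *msg)
{
    (void)msg;
    return bcast_full ? -EOVERFLOW : 0;
}

static int toy_replicast_xmit(const char *msg, int n_dests)
{
    (void)msg;
    return n_dests > 0 ? 0 : -EINVAL;
}

/* The fallback pattern from the patch: broadcast first; only the
 * dedicated overflow code triggers the replicast path, while other
 * errors (e.g. -ENOBUFS) still propagate to the caller. */
static int toy_node_broadcast(const char *msg, int n_dests)
{
    int rc = toy_bcast_xmit(msg);

    if (rc == -EOVERFLOW)
        rc = toy_replicast_xmit(msg, n_dests);
    return rc;
}
```

This also shows why Jon objects: once the backlog saturates, every send silently takes the replicast path, so the cluster quietly loses the efficiency broadcast was introduced for.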
Re: [tipc-discussion] [net] tipc: fix incorrect setting window for bcast link
Hi Hoang,
I apologize for not paying enough attention to this problem until now.
See below.

On 9/30/20 9:43 PM, Hoang Huu Le wrote:
> In commit 16ad3f4022bb ("tipc: introduce variable window congestion
> control"), we applied the Reno algorithm to select the window size,
> from the minimum window up to the configured maximum window, for the
> unicast link.
>
> However, when setting the window variable we did not restrict the
> change to the unicast link. This caused the broadcast link window to
> take an unexpected value (i.e. the window size is constantly 32).
>
> We fix this by updating the window for the broadcast link according to
> its configuration.
>
> Signed-off-by: Hoang Huu Le
> ---
>  net/tipc/bcast.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
> index 940d176e0e87..abac9443b4d9 100644
> --- a/net/tipc/bcast.c
> +++ b/net/tipc/bcast.c
> @@ -585,7 +585,7 @@ static int tipc_bc_link_set_queue_limits(struct net *net, u32 max_win)
>  	if (max_win > TIPC_MAX_LINK_WIN)
>  		return -EINVAL;
>  	tipc_bcast_lock(net);
> -	tipc_link_set_queue_limits(l, BCLINK_WIN_MIN, max_win);
> +	tipc_link_set_queue_limits(l, max_win, max_win);

I think this is dangerous. The broadcast link puts a much higher stress
on the switch, and the risk of massive packet loss with ditto
retransmissions is very high.
A safer bet is to stay with a fixed window of 50 for now, to solve the
problem you have at sites, and then you can possibly experiment with a
variable window later.
If it gives significantly higher throughput it might be worthwhile
trying, but I am pretty sure that e.g. the base for calculating
ssthresh (300) is too big.

///jon

>  	tipc_bcast_unlock(net);
>  	return 0;
>  }
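[Editor's note] For readers unfamiliar with the variable-window scheme being debated, a toy model of Reno-style adaptation (slow start below ssthresh, additive increase above it, multiplicative decrease on loss, clamped to [min_win, max_win]) may help. This is an illustrative sketch only, not TIPC's actual implementation; the struct and constants are made up for the example, with ssthresh seeded at the 300 Jon mentions.

```c
#include <assert.h>

/* Toy Reno-style sender window, clamped to [min_win, max_win]
 * as in the patch discussion above (illustration only). */
struct toy_link {
    int min_win;    /* floor for the window */
    int max_win;    /* configured ceiling */
    int ssthresh;   /* slow-start threshold */
    int window;     /* current send window */
    int acks;       /* acks counted within the current window */
};

static void toy_link_init(struct toy_link *l, int min_win, int max_win)
{
    l->min_win = min_win;
    l->max_win = max_win;
    l->ssthresh = 300;   /* the base Jon considers too big for bcast */
    l->window = min_win;
    l->acks = 0;
}

/* Per acked packet: +1 per ack in slow start (exponential growth per
 * round trip), roughly +1 per window's worth of acks afterwards. */
static void toy_on_ack(struct toy_link *l)
{
    if (l->window < l->ssthresh) {
        l->window++;                       /* slow start */
    } else if (++l->acks >= l->window) {
        l->acks = 0;
        l->window++;                       /* congestion avoidance */
    }
    if (l->window > l->max_win)
        l->window = l->max_win;
}

static void toy_on_loss(struct toy_link *l)
{
    l->ssthresh = l->window / 2;           /* multiplicative decrease */
    l->window = l->min_win;                /* restart from the floor */
    l->acks = 0;
}
```

With min_win = 32 and max_win = 50 the window spends essentially all its time pinned at the ceiling, which is why the fixed-window compromise above loses little in practice.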
Re: [tipc-discussion] [net-next] tipc: re-configure queue limit for broadcast link
On 10/1/20 2:51 AM, Hoang Huu Le wrote:
> The queue limit of the broadcast link is calculated based on the
> initial MTU. However, when the MTU value changes (e.g. manually
> changing the MTU on the NIC device, MTU negotiation, etc.), we do not
> re-calculate the queue limit, so throughput does not reflect the
> change.
>
> Fix it by calling the function to re-calculate the queue limit of the
> broadcast link.
>
> Signed-off-by: Hoang Huu Le
> ---
>  net/tipc/bcast.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
> index abac9443b4d9..bc566b304571 100644
> --- a/net/tipc/bcast.c
> +++ b/net/tipc/bcast.c
> @@ -108,6 +108,7 @@ static void tipc_bcbase_select_primary(struct net *net)
>  {
>  	struct tipc_bc_base *bb = tipc_bc_base(net);
>  	int all_dests =  tipc_link_bc_peers(bb->link);
> +	int max_win = tipc_link_max_win(bb->link);
>  	int i, mtu, prim;
>
>  	bb->primary_bearer = INVALID_BEARER_ID;
> @@ -121,8 +122,11 @@ static void tipc_bcbase_select_primary(struct net *net)
>  			continue;
>  		mtu = tipc_bearer_mtu(net, i);
> -		if (mtu < tipc_link_mtu(bb->link))
> +		if (mtu < tipc_link_mtu(bb->link)) {
>  			tipc_link_set_mtu(bb->link, mtu);
> +			tipc_link_set_queue_limits(bb->link, max_win,
> +						   max_win);
> +		}
>  		bb->bcast_support &= tipc_bearer_bcast_support(net, i);
>  		if (bb->dests[i] < all_dests)
>  			continue;

An obvious bug that explains a lot, and needed to be fixed. Thanks.

Acked-by: Jon Maloy
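[Editor's note] The bug pattern here is generic: a cached limit derived from the MTU goes stale when the MTU shrinks, unless it is re-derived at the same point where the MTU is updated. A minimal userspace sketch of that invariant follows; the formula and names are our own simplification, not TIPC's actual limit calculation.

```c
#include <assert.h>

/* Simplified stand-in for a link whose backlog limit depends on both
 * the window and the MTU (the real TIPC formula is more involved). */
struct toy_bclink {
    int mtu;
    int max_win;
    int backlog_limit;  /* derived value; must track mtu changes */
};

static void toy_set_queue_limits(struct toy_bclink *l)
{
    /* Illustrative formula only: max_win full-size messages,
     * expressed in 1500-byte buffer units. */
    l->backlog_limit = l->max_win * l->mtu / 1500;
}

/* The fixed pattern: whenever the link MTU is lowered to match a
 * bearer, re-derive the queue limits in the same place. */
static void toy_select_bearer(struct toy_bclink *l, int bearer_mtu)
{
    if (bearer_mtu < l->mtu) {
        l->mtu = bearer_mtu;
        toy_set_queue_limits(l);  /* the call the original code missed */
    }
}
```

Without the re-derivation call, a link initialized at FB_MTU = 3744 keeps a limit sized for 3744-byte buffers even after dropping to a 1500-byte bearer, so the effective window no longer matches the configuration.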
Re: [tipc-discussion] [iproute2-next 0/2] tipc: add new options for TIPC encryption
On 10/11/20 5:13 AM, Tuong Lien wrote:
> This series adds two new options in the 'iproute2/tipc' command,
> enabling users to use the new TIPC encryption features, i.e. the
> master key and rekeying, which have been recently merged in kernel.
> The help menu of the "tipc node set key" command is also updated
> accordingly:
>
> # tipc node set key --help
> Usage: tipc node set key KEY [algname ALGNAME] [PROPERTIES]
>        tipc node set key rekeying REKEYING
>
> KEY
>   Symmetric KEY & SALT as a composite ASCII or hex string (0x...) in form:
>   [KEY: 16, 24 or 32 octets][SALT: 4 octets]
>
> ALGNAME
>   Cipher algorithm [default: "gcm(aes)"]
>
> PROPERTIES
>   master            - Set KEY as a cluster master key
>                     - Set KEY as a cluster key
>   nodeid NODEID     - Set KEY as a per-node key for own or peer
>
> REKEYING
>   INTERVAL          - Set rekeying interval (in minutes) [0: disable]
>   now               - Trigger one (first) rekeying immediately
>
> EXAMPLES
>   tipc node set key this_is_a_master_key master
>   tipc node set key 0x746869735F69735F615F6B657931365F73616C74
>   tipc node set key this_is_a_key16_salt algname "gcm(aes)" nodeid 1001002
>   tipc node set key rekeying 600
>
> Tuong Lien (2):
>   tipc: add option to set master key for encryption
>   tipc: add option to set rekeying for encryption
>
>  tipc/cmdl.c |  2 +-
>  tipc/cmdl.h |  1 +
>  tipc/node.c | 81 +++--
>  3 files changed, 62 insertions(+), 22 deletions(-)

Looks good to me.

Series Acked-by: Jon Maloy
Re: [tipc-discussion] [net ] tipc: add stricter control of reserved service types
I think I see now. This would block anybody from calling bind() on the
topology server (except the first user, in the kernel). Using connect()
should be fine. Sorry, did not read carefully enough.

________________________________
From: Rune Torgersen
Sent: Monday, October 12, 2020 8:51 AM
To: jma...@redhat.com; tipc-discussion@lists.sourceforge.net
Cc: x...@redhat.com
Subject: Re: [tipc-discussion] [net ] tipc: add stricter control of reserved service types

Hi.
We use the tipc topology server extensively in our system to keep track
of other instances of an app.
Does this change mean we cannot use the topology server anymore?

This is the userland code we use (this one would check on open sockets
on 200:0 to 200:255):

void *TipcTopologyServerThread(void *context)
{
    int tipcStatusSocket;
    struct tipc_event event;
    struct sockaddr_tipc topsrv;
    struct tipc_subscr subscr = {{200, 0, 255}, TIPC_WAIT_FOREVER,
                                 TIPC_SUB_SERVICE, {5,5,5,5,5,5,5,5}};

    memset(&topsrv, 0, sizeof(topsrv));
    topsrv.family = AF_TIPC;
    topsrv.addrtype = TIPC_ADDR_NAME;
    topsrv.addr.name.name.type = TIPC_TOP_SRV;
    topsrv.addr.name.name.instance = TIPC_TOP_SRV;

    tipcStatusSocket = socket(AF_TIPC, SOCK_SEQPACKET, 0);
    if (tipcStatusSocket < 0) {
        perror("TipcTopologyThread: Could not make TIPC socket");
    }

    // Connect to topology server:
    if (0 > connect(tipcStatusSocket, (struct sockaddr *)&topsrv,
                    sizeof(topsrv))) {
        perror("TipcTopologyThread: Could not connect to TIPC topology server");
        return NULL;
    }
    if (send(tipcStatusSocket, &subscr, sizeof(subscr), 0) != sizeof(subscr)) {
        printf("TipcTopologyThread: Failed to send topology server subscription");
        return NULL;
    }

    // Now wait for the subscriptions to fire:
    while (true) {
        int ret = recv(tipcStatusSocket, &event, sizeof(event), 0);
        if (ret == sizeof(event)) {
            printf("Received an event\n");
            if (event.event == TIPC_PUBLISHED) {
                printf("received a TIPC_PUBLISH msg, type: %u, found_lower: %u, found_upper: %u\n",
                       event.s.seq.type, event.found_lower, event.found_upper);
                // do something
            } else if (event.event == TIPC_WITHDRAWN) {
                printf("received a TIPC_WITHDRAWN msg, type: %u, found_lower: %u, found_upper: %u\n",
                       event.s.seq.type, event.found_lower, event.found_upper);
                // do something
            } else if (event.event == TIPC_SUBSCR_TIMEOUT) {
                printf("received a TIPC_SUBSCR_TIMEOUT msg, type: %u, found_lower: %u, found_upper: %u\n",
                       event.s.seq.type, event.found_lower, event.found_upper);
            }
        }
    }
    return NULL;
}

________________________________
From: jma...@redhat.com
Sent: Saturday, October 10, 2020 9:56 AM
To: tipc-discussion@lists.sourceforge.net
Cc: x...@redhat.com
Subject: [tipc-discussion] [net ] tipc: add stricter control of reserved service types

This email originated from outside Innovative Systems. Do not click
links or open attachments unless you recognize the sender and know the
content is safe.

From: Jon Maloy

TIPC reserves 64 service types for current and future internal use.
Therefore, the bind() function is meant to block regular user sockets
from being bound to these values, while it should let through such
bindings from internal users.

However, since we at the design moment saw no way to distinguish between
regular and internal users, the filter function ended up allowing all
bindings of the types which were really in use ([0,1]), and blocking all
the rest ([2,63]).

This is dangerous, since a regular user may bind to the service type
representing the topology server (TIPC_TOP_SRV == 1) or the one used for
indicating neighboring node status (TIPC_CFG_SRV == 0), and wreak havoc
for users of those services, i.e., practically all users.

The reality is however that TIPC_CFG_SRV never is bound through the
bind() function, since it doesn't represent a regular socket, and
TIPC_TOP_SRV can easily be filtered out, since it is the very first
binding performed when the system is starting. We can hence block
TIPC_CFG_SRV completely, and only allow TIPC_TOP_SRV to be bound once,
and the correct behavior is achieved. This is what we do in this commit.

It should be noted that, although this is a change of the API semantics,
there is no risk we will break any currently working applications by
doing this. Any application trying to bind to the values in question
would be badly broken from the outset, so there is no chance we would
find any such applications in real-world production systems.

Signed-off-by: Jon Maloy
---
 net/tipc/socket.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index e795a8a2955b..67875a5761d0 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -665,6
Re: [tipc-discussion] [net ] tipc: add stricter control of reserved service types
Hi.
We use the tipc topology server extensively in our system to keep track
of other instances of an app.
Does this change mean we cannot use the topology server anymore?

This is the userland code we use (this one would check on open sockets
on 200:0 to 200:255):

void *TipcTopologyServerThread(void *context)
{
    int tipcStatusSocket;
    struct tipc_event event;
    struct sockaddr_tipc topsrv;
    struct tipc_subscr subscr = {{200, 0, 255}, TIPC_WAIT_FOREVER,
                                 TIPC_SUB_SERVICE, {5,5,5,5,5,5,5,5}};

    memset(&topsrv, 0, sizeof(topsrv));
    topsrv.family = AF_TIPC;
    topsrv.addrtype = TIPC_ADDR_NAME;
    topsrv.addr.name.name.type = TIPC_TOP_SRV;
    topsrv.addr.name.name.instance = TIPC_TOP_SRV;

    tipcStatusSocket = socket(AF_TIPC, SOCK_SEQPACKET, 0);
    if (tipcStatusSocket < 0) {
        perror("TipcTopologyThread: Could not make TIPC socket");
    }

    // Connect to topology server:
    if (0 > connect(tipcStatusSocket, (struct sockaddr *)&topsrv,
                    sizeof(topsrv))) {
        perror("TipcTopologyThread: Could not connect to TIPC topology server");
        return NULL;
    }
    if (send(tipcStatusSocket, &subscr, sizeof(subscr), 0) != sizeof(subscr)) {
        printf("TipcTopologyThread: Failed to send topology server subscription");
        return NULL;
    }

    // Now wait for the subscriptions to fire:
    while (true) {
        int ret = recv(tipcStatusSocket, &event, sizeof(event), 0);
        if (ret == sizeof(event)) {
            printf("Received an event\n");
            if (event.event == TIPC_PUBLISHED) {
                printf("received a TIPC_PUBLISH msg, type: %u, found_lower: %u, found_upper: %u\n",
                       event.s.seq.type, event.found_lower, event.found_upper);
                // do something
            } else if (event.event == TIPC_WITHDRAWN) {
                printf("received a TIPC_WITHDRAWN msg, type: %u, found_lower: %u, found_upper: %u\n",
                       event.s.seq.type, event.found_lower, event.found_upper);
                // do something
            } else if (event.event == TIPC_SUBSCR_TIMEOUT) {
                printf("received a TIPC_SUBSCR_TIMEOUT msg, type: %u, found_lower: %u, found_upper: %u\n",
                       event.s.seq.type, event.found_lower, event.found_upper);
            }
        }
    }
    return NULL;
}

________________________________
From: jma...@redhat.com
Sent: Saturday, October 10, 2020 9:56 AM
To: tipc-discussion@lists.sourceforge.net
Cc: x...@redhat.com
Subject: [tipc-discussion] [net ] tipc: add stricter control of reserved service types

From: Jon Maloy

TIPC reserves 64 service types for current and future internal use.
Therefore, the bind() function is meant to block regular user sockets
from being bound to these values, while it should let through such
bindings from internal users.

However, since we at the design moment saw no way to distinguish between
regular and internal users, the filter function ended up allowing all
bindings of the types which were really in use ([0,1]), and blocking all
the rest ([2,63]).

This is dangerous, since a regular user may bind to the service type
representing the topology server (TIPC_TOP_SRV == 1) or the one used for
indicating neighboring node status (TIPC_CFG_SRV == 0), and wreak havoc
for users of those services, i.e., practically all users.

The reality is however that TIPC_CFG_SRV never is bound through the
bind() function, since it doesn't represent a regular socket, and
TIPC_TOP_SRV can easily be filtered out, since it is the very first
binding performed when the system is starting. We can hence block
TIPC_CFG_SRV completely, and only allow TIPC_TOP_SRV to be bound once,
and the correct behavior is achieved. This is what we do in this commit.

It should be noted that, although this is a change of the API semantics,
there is no risk we will break any currently working applications by
doing this. Any application trying to bind to the values in question
would be badly broken from the outset, so there is no chance we would
find any such applications in real-world production systems.

Signed-off-by: Jon Maloy
---
 net/tipc/socket.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index e795a8a2955b..67875a5761d0 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -665,6 +665,7 @@ static int tipc_bind(struct socket *sock, struct sockaddr *uaddr,
 	struct sockaddr_tipc *addr = (struct sockaddr_tipc *)uaddr;
 	struct tipc_sock *tsk = tipc_sk(sk);
 	int res = -EINVAL;
+	u32 stype, dnode;

 	lock_sock(sk);
 	if (unlikely(!uaddr_len)) {
@@ -691,9 +692,10 @@ static int tipc_bind(struct socket *sock, struct sockaddr *uaddr,
 		goto exit;
 	}

-	if
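[Editor's note] The filtering rule described in the commit message — block all reserved types except the topology server, and allow that one only for the first binder — can be sketched in userspace. The function below is our illustration of the described policy, not the actual net/tipc/socket.c code that the truncated diff modifies, and the -EACCES return value is an assumption for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TIPC_CFG_SRV        0   /* neighboring node status service */
#define TIPC_TOP_SRV        1   /* topology server */
#define TIPC_RESERVED_TYPES 64  /* service types [0,63] are reserved */

/* Tracks whether the in-kernel topology server has claimed its type;
 * in the real kernel this is the very first binding at startup. */
static bool top_srv_bound;

/* Policy from the commit message: TIPC_TOP_SRV may be bound exactly
 * once, every other reserved type is rejected outright, and regular
 * user types pass through unchanged. */
static int toy_bind_filter(unsigned int stype)
{
    if (stype == TIPC_TOP_SRV) {
        if (top_srv_bound)
            return -EACCES;    /* already taken by the kernel */
        top_srv_bound = true;
        return 0;
    }
    if (stype < TIPC_RESERVED_TYPES)
        return -EACCES;        /* all other reserved types blocked */
    return 0;                  /* regular user types are fine */
}
```

This is also why Rune's subscriber above is unaffected: it binds nothing in the reserved range, it merely connect()s to the topology server and binds (if anything) type 200.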