Re: [tipc-discussion] [PATCH net/tipc] Replace rcu_swap_protected() with rcu_replace_pointer()
On 12/11/19 10:00 AM, Tuong Lien Tong wrote: >> >> /* Move passive key if any */ >> if (key.passive) { >> -tipc_aead_rcu_swap(rx->aead[key.passive], tmp2, >lock); >> +tmp2 = rcu_replace_pointer(rx->aead[key.passive], tmp2, > >lock); > The 3rd parameter should be the lockdep condition checking instead of the > spinlock's pointer i.e. "lockdep_is_held(>lock)"? > That's why I'd prefer to use the 'tipc_aead_rcu_swap ()' macro, which is > clear & concise at least for the context here. It might be re-used later as > well... > Right. The 3rd parameter of rcu_replace_pointer() should be "lockdep_is_held(>lock)" instead of ">lock". ___ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion
Re: [tipc-discussion] [PATCH net/tipc] Replace rcu_swap_protected() with rcu_replace_pointer()
On 12/11/19 11:17 AM, Paul E. McKenney wrote: >> Acked-by: Ying Xue > As in the following? If so, I will be very happy to apply your Acked-by. Yes. Thanks! ___ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion
Re: [tipc-discussion] [PATCH net/tipc] Replace rcu_swap_protected() with rcu_replace_pointer()
Hi Ying, Paul, Please see my comments inline. Thanks! BR/Tuong -Original Message- From: Ying Xue Sent: Wednesday, December 11, 2019 8:32 AM To: paul...@kernel.org Cc: net...@vger.kernel.org; linux-ker...@vger.kernel.org; mi...@kernel.org; tipc-discussion@lists.sourceforge.net; kernel-t...@fb.com; torva...@linux-foundation.org; da...@davemloft.net Subject: Re: [tipc-discussion] [PATCH net/tipc] Replace rcu_swap_protected() with rcu_replace_pointer() On 12/11/19 6:38 AM, Paul E. McKenney wrote: > commit 4ee8e2c68b076867b7a5af82a38010fffcab611c > Author: Paul E. McKenney > Date: Mon Dec 9 19:13:45 2019 -0800 > > net/tipc: Replace rcu_swap_protected() with rcu_replace_pointer() > > This commit replaces the use of rcu_swap_protected() with the more > intuitively appealing rcu_replace_pointer() as a step towards removing > rcu_swap_protected(). > > Link: https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=Z7-GGtM6wcvtyytXZA1+BHqta4g g...@mail.gmail.com/ > Reported-by: Linus Torvalds > Reported-by: kbuild test robot > Signed-off-by: Paul E. McKenney > Cc: Jon Maloy > Cc: Ying Xue > Cc: "David S. Miller" > Cc: > Cc: > Acked-by: Ying Xue > diff --git a/net/tipc/crypto.c b/net/tipc/crypto.c > index 990a872..978d2db 100644 > --- a/net/tipc/crypto.c > +++ b/net/tipc/crypto.c > @@ -257,9 +257,6 @@ static char *tipc_key_change_dump(struct tipc_key old, struct tipc_key new, > #define tipc_aead_rcu_ptr(rcu_ptr, lock) \ > rcu_dereference_protected((rcu_ptr), lockdep_is_held(lock)) > > -#define tipc_aead_rcu_swap(rcu_ptr, ptr, lock) \ > - rcu_swap_protected((rcu_ptr), (ptr), lockdep_is_held(lock)) > - > #define tipc_aead_rcu_replace(rcu_ptr, ptr, lock)\ > do { \ > typeof(rcu_ptr) __tmp = rcu_dereference_protected((rcu_ptr),\ > @@ -1189,7 +1186,7 @@ static bool tipc_crypto_key_try_align(struct tipc_crypto *rx, u8 new_pending) > > /* Move passive key if any */ > if (key.passive) { > - tipc_aead_rcu_swap(rx->aead[key.passive], tmp2, >lock); > + tmp2 = rcu_replace_pointer(rx->aead[key.passive], tmp2, >lock); The 3rd parameter should be the lockdep condition checking instead of the spinlock's pointer i.e. "lockdep_is_held(>lock)"? That's why I'd prefer to use the 'tipc_aead_rcu_swap ()' macro, which is clear & concise at least for the context here. It might be re-used later as well... > x = (key.passive - key.pending + new_pending) % KEY_MAX; > new_passive = (x <= 0) ? x + KEY_MAX : x; > } > ___ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion ___ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion
Re: [tipc-discussion] [net 0/4] tipc: fix some issues
From: Tuong Lien Date: Tue, 10 Dec 2019 15:21:01 +0700 > This series consists of some bug-fixes for TIPC. Series applied, thanks. ___ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion
Re: [tipc-discussion] [PATCH net/tipc] Replace rcu_swap_protected() with rcu_replace_pointer()
On 12/11/19 6:38 AM, Paul E. McKenney wrote: > commit 4ee8e2c68b076867b7a5af82a38010fffcab611c > Author: Paul E. McKenney > Date: Mon Dec 9 19:13:45 2019 -0800 > > net/tipc: Replace rcu_swap_protected() with rcu_replace_pointer() > > This commit replaces the use of rcu_swap_protected() with the more > intuitively appealing rcu_replace_pointer() as a step towards removing > rcu_swap_protected(). > > Link: > https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=z7-ggtm6wcvtyytxza1+bhqta4gg...@mail.gmail.com/ > Reported-by: Linus Torvalds > Reported-by: kbuild test robot > Signed-off-by: Paul E. McKenney > Cc: Jon Maloy > Cc: Ying Xue > Cc: "David S. Miller" > Cc: > Cc: > Acked-by: Ying Xue > diff --git a/net/tipc/crypto.c b/net/tipc/crypto.c > index 990a872..978d2db 100644 > --- a/net/tipc/crypto.c > +++ b/net/tipc/crypto.c > @@ -257,9 +257,6 @@ static char *tipc_key_change_dump(struct tipc_key old, > struct tipc_key new, > #define tipc_aead_rcu_ptr(rcu_ptr, lock) \ > rcu_dereference_protected((rcu_ptr), lockdep_is_held(lock)) > > -#define tipc_aead_rcu_swap(rcu_ptr, ptr, lock) > \ > - rcu_swap_protected((rcu_ptr), (ptr), lockdep_is_held(lock)) > - > #define tipc_aead_rcu_replace(rcu_ptr, ptr, lock)\ > do { \ > typeof(rcu_ptr) __tmp = rcu_dereference_protected((rcu_ptr),\ > @@ -1189,7 +1186,7 @@ static bool tipc_crypto_key_try_align(struct > tipc_crypto *rx, u8 new_pending) > > /* Move passive key if any */ > if (key.passive) { > - tipc_aead_rcu_swap(rx->aead[key.passive], tmp2, >lock); > + tmp2 = rcu_replace_pointer(rx->aead[key.passive], tmp2, > >lock); > x = (key.passive - key.pending + new_pending) % KEY_MAX; > new_passive = (x <= 0) ? x + KEY_MAX : x; > } > ___ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion
Re: [tipc-discussion] [net-next 0/3] tipc: introduce variable window congestion control
From: Jon Maloy Date: Tue, 10 Dec 2019 00:52:43 +0100 > We improve thoughput greatly by introducing a variety of the Reno > congestion control algorithm at the link level. Series applied, thanks Jon. ___ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion
Re: [tipc-discussion] [PATCH net/tipc] Replace rcu_swap_protected() with rcu_replace_pointer()
On 12/10/19 11:31 AM, Paul E. McKenney wrote: > This commit replaces the use of rcu_swap_protected() with the more > intuitively appealing rcu_replace_pointer() as a step towards removing > rcu_swap_protected(). > > Link: > https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=z7-ggtm6wcvtyytxza1+bhqta4gg...@mail.gmail.com/ > Reported-by: Linus Torvalds > Reported-by: kbuild test robot > Signed-off-by: Paul E. McKenney > Cc: Jon Maloy > Cc: Ying Xue > Cc: "David S. Miller" > Cc: > Cc: > > diff --git a/net/tipc/crypto.c b/net/tipc/crypto.c > index 990a872..64cf831 100644 > --- a/net/tipc/crypto.c > +++ b/net/tipc/crypto.c > @@ -258,7 +258,7 @@ static char *tipc_key_change_dump(struct tipc_key old, > struct tipc_key new, > rcu_dereference_protected((rcu_ptr), lockdep_is_held(lock)) > > #define tipc_aead_rcu_swap(rcu_ptr, ptr, lock) > \ > - rcu_swap_protected((rcu_ptr), (ptr), lockdep_is_held(lock)) > + rcu_replace_pointer((rcu_ptr), (ptr), lockdep_is_held(lock)) (ptr) = rcu_replace_pointer((rcu_ptr), (ptr), lockdep_is_held(lock)) > > #define tipc_aead_rcu_replace(rcu_ptr, ptr, lock)\ > do { \ > @@ -1189,7 +1189,7 @@ static bool tipc_crypto_key_try_align(struct > tipc_crypto *rx, u8 new_pending) > > /* Move passive key if any */ > if (key.passive) { > - tipc_aead_rcu_swap(rx->aead[key.passive], tmp2, >lock); > + tmp2 = rcu_replace_pointer(rx->aead[key.passive], tmp2, > >lock); tipc_aead_rcu_swap() is only called here in TIPC module. If we use rcu_replace_pointer() to switch pointers instead of calling tipc_aead_rcu_swap() macro, I think we should completely remove tipc_aead_rcu_swap(). > x = (key.passive - key.pending + new_pending) % KEY_MAX; > new_passive = (x <= 0) ? x + KEY_MAX : x; > } > ___ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion
Re: [tipc-discussion] [net] tipc: fix use-after-free in tipc_disc_rcv()
On 12/9/19 6:11 PM, Tuong Lien wrote: > In the function 'tipc_disc_rcv()', the 'msg_peer_net_hash()' is called > to read the header data field but after the message skb has been freed, > that might result in a garbage value... > > This commit fixes it by defining a new local variable to store the data > first, just like the other header fields' handling. > > Fixes: f73b12812a3d ("tipc: improve throughput between nodes in netns") > Signed-off-by: Tuong Lien Acked-by: Ying Xue > --- > net/tipc/discover.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/net/tipc/discover.c b/net/tipc/discover.c > index b043e8c6397a..bfe43da127c0 100644 > --- a/net/tipc/discover.c > +++ b/net/tipc/discover.c > @@ -194,6 +194,7 @@ void tipc_disc_rcv(struct net *net, struct sk_buff *skb, > { > struct tipc_net *tn = tipc_net(net); > struct tipc_msg *hdr = buf_msg(skb); > + u32 pnet_hash = msg_peer_net_hash(hdr); > u16 caps = msg_node_capabilities(hdr); > bool legacy = tn->legacy_addr_format; > u32 sugg = msg_sugg_node_addr(hdr); > @@ -242,9 +243,8 @@ void tipc_disc_rcv(struct net *net, struct sk_buff *skb, > return; > if (!tipc_in_scope(legacy, b->domain, src)) > return; > - tipc_node_check_dest(net, src, peer_id, b, caps, signature, > - msg_peer_net_hash(hdr), , , > - _addr); > + tipc_node_check_dest(net, src, peer_id, b, caps, signature, pnet_hash, > + , , _addr); > if (dupl_addr) > disc_dupl_alert(b, src, ); > if (!respond) > ___ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion
[tipc-discussion] [net 3/4] tipc: fix retrans failure due to wrong destination
When a user message is sent, TIPC will check if the socket has faced a congestion at link layer. If that happens, it will make a sleep to wait for the congestion to disappear. This leaves a gap for other users to take over the socket (e.g. multi threads) since the socket is released as well. Also, in case of connectionless (e.g. SOCK_RDM), user is free to send messages to various destinations (e.g. via 'sendto()'), then the socket's preformatted header has to be updated correspondingly prior to the actual payload message building. Unfortunately, the latter action is done before the first action which causes a condition issue that the destination of a certain message can be modified incorrectly in the middle, leading to wrong destination when that message is built. Consequently, when the message is sent to the link layer, it gets stuck there forever because the peer node will simply reject it. After a number of retransmission attempts, the link is eventually taken down and the retransmission failure is reported. This commit fixes the problem by rearranging the order of actions to prevent the race condition from occurring, so the message building is 'atomic' and its header will not be modified by anyone. Fixes: 365ad353c256 ("tipc: reduce risk of user starvation during link congestion") Acked-by: Jon Maloy Signed-off-by: Tuong Lien --- net/tipc/socket.c | 32 ++-- 1 file changed, 18 insertions(+), 14 deletions(-) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 41688da233ab..6552f986774c 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -1364,8 +1364,8 @@ static int __tipc_sendmsg(struct socket *sock, struct msghdr *m, size_t dlen) struct tipc_msg *hdr = >phdr; struct tipc_name_seq *seq; struct sk_buff_head pkts; - u32 dport, dnode = 0; - u32 type, inst; + u32 dport = 0, dnode = 0; + u32 type = 0, inst = 0; int mtu, rc; if (unlikely(dlen > TIPC_MAX_USER_MSG_SIZE)) @@ -1418,23 +1418,11 @@ static int __tipc_sendmsg(struct socket *sock, struct msghdr *m, size_t dlen) type = dest->addr.name.name.type; inst = dest->addr.name.name.instance; dnode = dest->addr.name.domain; - msg_set_type(hdr, TIPC_NAMED_MSG); - msg_set_hdr_sz(hdr, NAMED_H_SIZE); - msg_set_nametype(hdr, type); - msg_set_nameinst(hdr, inst); - msg_set_lookup_scope(hdr, tipc_node2scope(dnode)); dport = tipc_nametbl_translate(net, type, inst, ); - msg_set_destnode(hdr, dnode); - msg_set_destport(hdr, dport); if (unlikely(!dport && !dnode)) return -EHOSTUNREACH; } else if (dest->addrtype == TIPC_ADDR_ID) { dnode = dest->addr.id.node; - msg_set_type(hdr, TIPC_DIRECT_MSG); - msg_set_lookup_scope(hdr, 0); - msg_set_destnode(hdr, dnode); - msg_set_destport(hdr, dest->addr.id.ref); - msg_set_hdr_sz(hdr, BASIC_H_SIZE); } else { return -EINVAL; } @@ -1445,6 +1433,22 @@ static int __tipc_sendmsg(struct socket *sock, struct msghdr *m, size_t dlen) if (unlikely(rc)) return rc; + if (dest->addrtype == TIPC_ADDR_NAME) { + msg_set_type(hdr, TIPC_NAMED_MSG); + msg_set_hdr_sz(hdr, NAMED_H_SIZE); + msg_set_nametype(hdr, type); + msg_set_nameinst(hdr, inst); + msg_set_lookup_scope(hdr, tipc_node2scope(dnode)); + msg_set_destnode(hdr, dnode); + msg_set_destport(hdr, dport); + } else { /* TIPC_ADDR_ID */ + msg_set_type(hdr, TIPC_DIRECT_MSG); + msg_set_lookup_scope(hdr, 0); + msg_set_destnode(hdr, dnode); + msg_set_destport(hdr, dest->addr.id.ref); + msg_set_hdr_sz(hdr, BASIC_H_SIZE); + } + __skb_queue_head_init(); mtu = tipc_node_get_mtu(net, dnode, tsk->portid, false); rc = tipc_msg_build(hdr, m, 0, dlen, mtu, ); -- 2.13.7 ___ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion
[tipc-discussion] [net 4/4] tipc: fix use-after-free in tipc_disc_rcv()
In the function 'tipc_disc_rcv()', the 'msg_peer_net_hash()' is called to read the header data field but after the message skb has been freed, that might result in a garbage value... This commit fixes it by defining a new local variable to store the data first, just like the other header fields' handling. Fixes: f73b12812a3d ("tipc: improve throughput between nodes in netns") Acked-by: Jon Maloy Signed-off-by: Tuong Lien --- net/tipc/discover.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/tipc/discover.c b/net/tipc/discover.c index b043e8c6397a..bfe43da127c0 100644 --- a/net/tipc/discover.c +++ b/net/tipc/discover.c @@ -194,6 +194,7 @@ void tipc_disc_rcv(struct net *net, struct sk_buff *skb, { struct tipc_net *tn = tipc_net(net); struct tipc_msg *hdr = buf_msg(skb); + u32 pnet_hash = msg_peer_net_hash(hdr); u16 caps = msg_node_capabilities(hdr); bool legacy = tn->legacy_addr_format; u32 sugg = msg_sugg_node_addr(hdr); @@ -242,9 +243,8 @@ void tipc_disc_rcv(struct net *net, struct sk_buff *skb, return; if (!tipc_in_scope(legacy, b->domain, src)) return; - tipc_node_check_dest(net, src, peer_id, b, caps, signature, -msg_peer_net_hash(hdr), , , -_addr); + tipc_node_check_dest(net, src, peer_id, b, caps, signature, pnet_hash, +, , _addr); if (dupl_addr) disc_dupl_alert(b, src, ); if (!respond) -- 2.13.7 ___ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion
[tipc-discussion] [net 0/4] tipc: fix some issues
This series consists of some bug-fixes for TIPC. Tuong Lien (4): tipc: fix name table rbtree issues tipc: fix potential hanging after b/rcast changing tipc: fix retrans failure due to wrong destination tipc: fix use-after-free in tipc_disc_rcv() net/tipc/bcast.c | 24 +++-- net/tipc/discover.c | 6 +- net/tipc/name_table.c | 279 -- net/tipc/socket.c | 32 +++--- 4 files changed, 215 insertions(+), 126 deletions(-) -- 2.13.7 ___ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion
[tipc-discussion] [net 2/4] tipc: fix potential hanging after b/rcast changing
In commit c55c8edafa91 ("tipc: smooth change between replicast and broadcast"), we allow instant switching between replicast and broadcast by sending a dummy 'SYN' packet on the last used link to synchronize packets on the links. The 'SYN' message is an object of link congestion also, so if that happens, a 'SOCK_WAKEUP' will be scheduled to be sent back to the socket... However, in that commit, we simply use the same socket 'cong_link_cnt' counter for both the 'SYN' & normal payload message sending. Therefore, if both the replicast & broadcast links are congested, the counter will be not updated correctly but overwritten by the latter congestion. Later on, when the 'SOCK_WAKEUP' messages are processed, the counter is reduced one by one and eventually overflowed. Consequently, further activities on the socket will only wait for the false congestion signal to disappear but never been met. Because sending the 'SYN' message is vital for the mechanism, it should be done anyway. This commit fixes the issue by marking the message with an error code e.g. 'TIPC_ERR_NO_PORT', so its sending should not face a link congestion, there is no need to touch the socket 'cong_link_cnt' either. In addition, in the event of any error (e.g. -ENOBUFS), we will purge the entire payload message queue and make a return immediately. Fixes: c55c8edafa91 ("tipc: smooth change between replicast and broadcast") Acked-by: Jon Maloy Signed-off-by: Tuong Lien --- net/tipc/bcast.c | 24 +++- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c index 55aeba681cf4..656ebc79c64e 100644 --- a/net/tipc/bcast.c +++ b/net/tipc/bcast.c @@ -305,17 +305,17 @@ static int tipc_rcast_xmit(struct net *net, struct sk_buff_head *pkts, * @skb: socket buffer to copy * @method: send method to be used * @dests: destination nodes for message. - * @cong_link_cnt: returns number of encountered congested destination links * Returns 0 if success, otherwise errno */ static int tipc_mcast_send_sync(struct net *net, struct sk_buff *skb, struct tipc_mc_method *method, - struct tipc_nlist *dests, - u16 *cong_link_cnt) + struct tipc_nlist *dests) { struct tipc_msg *hdr, *_hdr; struct sk_buff_head tmpq; struct sk_buff *_skb; + u16 cong_link_cnt; + int rc = 0; /* Is a cluster supporting with new capabilities ? */ if (!(tipc_net(net)->capabilities & TIPC_MCAST_RBCTL)) @@ -343,18 +343,19 @@ static int tipc_mcast_send_sync(struct net *net, struct sk_buff *skb, _hdr = buf_msg(_skb); msg_set_size(_hdr, MCAST_H_SIZE); msg_set_is_rcast(_hdr, !msg_is_rcast(hdr)); + msg_set_errcode(_hdr, TIPC_ERR_NO_PORT); __skb_queue_head_init(); __skb_queue_tail(, _skb); if (method->rcast) - tipc_bcast_xmit(net, , cong_link_cnt); + rc = tipc_bcast_xmit(net, , _link_cnt); else - tipc_rcast_xmit(net, , dests, cong_link_cnt); + rc = tipc_rcast_xmit(net, , dests, _link_cnt); /* This queue should normally be empty by now */ __skb_queue_purge(); - return 0; + return rc; } /* tipc_mcast_xmit - deliver message to indicated destination nodes @@ -396,9 +397,14 @@ int tipc_mcast_xmit(struct net *net, struct sk_buff_head *pkts, msg_set_is_rcast(hdr, method->rcast); /* Switch method ? */ - if (rcast != method->rcast) - tipc_mcast_send_sync(net, skb, method, -dests, cong_link_cnt); + if (rcast != method->rcast) { + rc = tipc_mcast_send_sync(net, skb, method, dests); + if (unlikely(rc)) { + pr_err("Unable to send SYN: method %d, rc %d\n", + rcast, rc); + goto exit; + } + } if (method->rcast) rc = tipc_rcast_xmit(net, pkts, dests, cong_link_cnt); -- 2.13.7 ___ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion
[tipc-discussion] [net 1/4] tipc: fix name table rbtree issues
The current rbtree for service ranges in the name table is built based on the 'lower' & 'upper' range values resulting in a flaw in the rbtree searching. Some issues have been observed in case of range overlapping: Case #1: unable to withdraw a name entry: After some name services are bound, all of them are withdrawn by user but one remains in the name table forever. This corrupts the table and that service becomes dummy i.e. no real port. E.g. / {22, 22} / / ---> {10, 50} / \ /\ {10, 30} {20, 60} The node {10, 30} cannot be removed since the rbtree searching stops at the node's ancestor i.e. {10, 50}, so starting from it will never reach the finding node. Case #2: failed to send data in some cases: E.g. Two service ranges: {20, 60}, {10, 50} are bound. The rbtree for this service will be one of the two cases below depending on the order of the bindings: {20, 60} {10, 50} <-- / \ / \ /\ /\ {10, 50} NIL <-- NIL {20, 60} (a)(b) Now, try to send some data to service {30}, there will be two results: (a): Failed, no route to host. (b): Ok. The reason is that the rbtree searching will stop at the pointing node as shown above. Case #3: Same as case #2b above but if the data sending's scope is local and the {10, 50} is published by a peer node, then it will result in 'no route to host' even though the other {20, 60} is for example on the local node which should be able to get the data. The issues are actually due to the way we built the rbtree. This commit fixes it by introducing an additional field to each node - named 'max', which is the largest 'upper' of that node subtree. The 'max' value for each subtrees will be propagated correctly whenever a node is inserted/ removed or the tree is rebalanced by the augmented rbtree callbacks. By this way, we can change the rbtree searching appoarch to solve the issues above. Another benefit from this is that we can now improve the searching for a next range matching e.g. in case of multicast, so get rid of the unneeded looping over all nodes in the tree. Acked-by: Jon Maloy Signed-off-by: Tuong Lien --- net/tipc/name_table.c | 279 -- 1 file changed, 179 insertions(+), 100 deletions(-) diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c index 92d04dc2a44b..359b2bc888cf 100644 --- a/net/tipc/name_table.c +++ b/net/tipc/name_table.c @@ -36,6 +36,7 @@ #include #include +#include #include "core.h" #include "netlink.h" #include "name_table.h" @@ -51,6 +52,7 @@ * @lower: service range lower bound * @upper: service range upper bound * @tree_node: member of service range RB tree + * @max: largest 'upper' in this node subtree * @local_publ: list of identical publications made from this node * Used by closest_first lookup and multicast lookup algorithm * @all_publ: all publications identical to this one, whatever node and scope @@ -60,6 +62,7 @@ struct service_range { u32 lower; u32 upper; struct rb_node tree_node; + u32 max; struct list_head local_publ; struct list_head all_publ; }; @@ -84,6 +87,130 @@ struct tipc_service { struct rcu_head rcu; }; +#define service_range_upper(sr) ((sr)->upper) +RB_DECLARE_CALLBACKS_MAX(static, sr_callbacks, +struct service_range, tree_node, u32, max, +service_range_upper) + +#define service_range_entry(rbtree_node) \ + (container_of(rbtree_node, struct service_range, tree_node)) + +#define service_range_overlap(sr, start, end) \ + ((sr)->lower <= (end) && (sr)->upper >= (start)) + +/** + * service_range_foreach_match - iterate over tipc service rbtree for each + * range match + * @sr: the service range pointer as a loop cursor + * @sc: the pointer to tipc service which holds the service range rbtree + * @start, end: the range (end >= start) for matching + */ +#define service_range_foreach_match(sr, sc, start, end) \ + for (sr = service_range_match_first((sc)->ranges.rb_node, \ + start, \ + end); \ +sr;\ +sr = service_range_match_next(&(sr)->tree_node,\ + start, \ + end)) + +/** + * service_range_match_first - find first service range matching a range + * @n: the root node of service range rbtree for searching + * @start, end: the range (end >= start) for matching + * + * Return: the leftmost