On 6/1/20 7:45 PM, Hoang Huu Le wrote:
Yes you are right. I only checked this for UDP, where it seems to be ok, but forgot about VXLAN. Still, we can avoid building up 'dest' lists by making a small change to tipc_node_bcast():Hi Jon, See my comment inline. Hoang -----Original Message----- From: Jon Maloy <jma...@redhat.com> Sent: Monday, June 1, 2020 7:40 PM To: Hoang Huu Le <hoang.h...@dektech.com.au>; tipc-discussion@lists.sourceforge.net Cc: Tung Quang Nguyen <tung.q.ngu...@dektech.com.au>; Tuong Tong Lien <tuong.t.l...@dektech.com.au>; ma...@donjonn.com; x...@redhat.com; ying....@windriver.com; parthasarathy.bhuvara...@gmail.com Subject: Re: [ ] tipc: update a binding service via broadcast Hi Hoang. See below. On 6/1/20 5:33 AM, Hoang Huu Le wrote:Hi Jon, There is a concern in function tipc_node_broadcast. See my comment inline. Regards, Hoang -----Original Message----- From: jma...@redhat.com <jma...@redhat.com> Sent: Friday, May 29, 2020 10:34 PM To: tipc-discussion@lists.sourceforge.net Cc: Tung Quang Nguyen <tung.q.ngu...@dektech.com.au>; Hoang Huu Le <hoang.h...@dektech.com.au>; Tuong Tong Lien <tuong.t.l...@dektech.com.au>; jma...@redhat.com; ma...@donjonn.com; x...@redhat.com; ying....@windriver.com; parthasarathy.bhuvara...@gmail.com Subject: [ ] tipc: update a binding service via broadcast From: Hoang Huu Le <hoang.h...@dektech.com.au> Currently, updating binding table (add service binding to name table/withdraw a service binding) is being sent over replicast. However, if we are scaling up clusters to > 100 nodes/containers this method is less affection because of looping through nodes in a cluster one by one.[...]+ /* Use broadcast if all nodes support it */+ if (!rc_dests) { + __skb_queue_head_init(&xmitq); + __skb_queue_tail(&xmitq, skb); + tipc_bcast_xmit(net, &xmitq, &dummy); + return; + } + [Hoang] We could not use 'rc_dests' to send as broadcast mode because of it is not fully broadcast supporting in the cluster. As stated, if there is a node in the cluster not supporting broadcast mode, we need to use replicast instead. That's why we introduced the feature "Smooth switch between replicast/broadcast in bcast.c" and a new command line to configure the broadcast link. If we assume base on 'rc_dests' (i.e capability flags TIPC_NAMED_BCAST), probably 'forced replicast' configuration on broadcast link becomes useless. Then, destination nodes will be missed the publication item.You misunderstand this function. rc_dests is a *counter*, not a flag. It counts the number of peer nodes that don't support TIPC_NAMED_BCAST, and if this is non-zero, we use replicast. So this is safe. [Hoang] Yes, I know about this counter. What I'm considering about the L2 interface (e.g VXLAN) pseudo support broadcast (we discussed the topic a year ago), then, the sending side must be switching to replicast (method->force_bcast = true). But above likely forcing to use broadcast careless setting from broadcast/replicast from L2.
/* Use broadcast if bearer and all nodes support it */ if (!rc_dests && tipc_bcast_get_broadcast_mode(net) != BCLINK_MODE_RCAST) { u16 dummy; __skb_queue_head_init(&xmitq); __skb_queue_tail(&xmitq, skb); tipc_bcast_xmit(net, &xmitq, &dummy); return; }I really believe sequence numbering and these three new bits solve our problem in a simple and robust way. Strictly, even the 'non_legacy' bit could be omitted, since the receiver already has this info from the capability map, but I found it simpler to do it that way. Also, I realized that tipc_named_dequeue() function can be simplified further, since there can be no duplicate messages.
BR ///jon
///jon+ /* Otherwise use legacy replicast method */ rcu_read_lock(); list_for_each_entry_rcu(n, tipc_nodes(net), list) { dst = n->addr; @@ -1749,7 +1762,6 @@ void tipc_node_broadcast(struct net *net, struct sk_buff *skb) tipc_node_xmit_skb(net, txskb, dst, 0); } rcu_read_unlock(); - kfree_skb(skb); }@@ -1844,7 +1856,9 @@ static void tipc_node_bc_rcv(struct net *net, struct sk_buff *skb, int bearer_id /* Handle NAME_DISTRIBUTOR messages sent from 1.7 nodes */if (!skb_queue_empty(&n->bc_entry.namedq)) - tipc_named_rcv(net, &n->bc_entry.namedq); + tipc_named_rcv(net, &n->bc_entry.namedq, + &n->bc_entry.named_rcv_nxt, + &n->bc_entry.named_open);/* If reassembly or retransmission failure => reset all links to peer */if (rc & TIPC_LINK_DOWN_EVT) @@ -2109,7 +2123,9 @@ void tipc_rcv(struct net *net, struct sk_buff *skb, struct tipc_bearer *b) tipc_node_link_down(n, bearer_id, false);if (unlikely(!skb_queue_empty(&n->bc_entry.namedq)))- tipc_named_rcv(net, &n->bc_entry.namedq); + tipc_named_rcv(net, &n->bc_entry.namedq, + &n->bc_entry.named_rcv_nxt, + &n->bc_entry.named_open);if (unlikely(!skb_queue_empty(&n->bc_entry.inputq1)))tipc_node_mcast_rcv(n); diff --git a/net/tipc/node.h b/net/tipc/node.h index a6803b449a2c..9f6f13f1604f 100644 --- a/net/tipc/node.h +++ b/net/tipc/node.h @@ -55,7 +55,8 @@ enum { TIPC_MCAST_RBCTL = (1 << 7), TIPC_GAP_ACK_BLOCK = (1 << 8), TIPC_TUNNEL_ENHANCED = (1 << 9), - TIPC_NAGLE = (1 << 10) + TIPC_NAGLE = (1 << 10), + TIPC_NAMED_BCAST = (1 << 11) };#define TIPC_NODE_CAPABILITIES (TIPC_SYN_BIT | \@@ -68,7 +69,8 @@ enum { TIPC_MCAST_RBCTL | \ TIPC_GAP_ACK_BLOCK | \ TIPC_TUNNEL_ENHANCED | \ - TIPC_NAGLE) + TIPC_NAGLE | \ + TIPC_NAMED_BCAST)#define INVALID_BEARER_ID -1 @@ -101,7 +103,7 @@ int tipc_node_xmit_skb(struct net *net, struct sk_buff *skb, u32 dest,u32 selector); void tipc_node_subscribe(struct net *net, struct list_head *subscr, u32 addr); void tipc_node_unsubscribe(struct net *net, struct list_head *subscr, u32 addr); -void tipc_node_broadcast(struct net *net, struct sk_buff *skb); +void tipc_node_broadcast(struct net *net, struct sk_buff *skb, int rc_dests); int tipc_node_add_conn(struct net *net, u32 dnode, u32 port, u32 peer_port); void tipc_node_remove_conn(struct net *net, u32 dnode, u32 port); int tipc_node_get_mtu(struct net *net, u32 addr, u32 sel, bool connected);
_______________________________________________ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion