Hi Jon,
Please see my inline comment
Regards,
Hoang
*From:* Jon Maloy <jma...@redhat.com>
*Sent:* Friday, May 29, 2020 11:11 PM
*To:* tipc-discussion@lists.sourceforge.net; tipc-dek
<tipc-...@dektech.com.au>; Tuong Tong Lien
<tuong.t.l...@dektech.com.au>; Xin Long <lucien....@gmail.com>; Tung
Quang Nguyen <tung.q.ngu...@dektech.com.au>; Ying Xue
<ying....@windriver.com>
*Subject:* Fwd: Re: FW: [PATCH 2/2] tipc: update a binding service via
broadcast
Added more recipients.
-------- Forwarded Message --------
*Subject: *
Re: FW: [PATCH 2/2] tipc: update a binding service via broadcast
*Date: *
Fri, 29 May 2020 12:08:02 -0400
*From: *
Jon Maloy <jma...@redhat.com> <mailto:jma...@redhat.com>
*To: *
Hoang Huu Le <hoang.h...@dektech.com.au>
<mailto:hoang.h...@dektech.com.au>, ma...@donjonn.com
<mailto:ma...@donjonn.com> <ma...@donjonn.com> <mailto:ma...@donjonn.com>
Hi Hoang,
See below.
On 5/27/20 6:49 AM, Hoang Huu Le wrote:
Hi Jon,
I have put together a DRAFT version based on your idea (see the
attached file).
However, from my point of view, this version introduces too much
implementation code on the sending side.
I don't think it is a good idea to keep rcast_list and
bcast_list in the name table.
I think we should find a new approach or just drop the feature.
Yes, you are right.
I came up with a new idea: just add a sequence number to the
broadcast/replicast messages and re-order them at reception. This even
handles the case where the first broadcast message arrives before the
first bulk message, something we had not anticipated before.
I couldn't resist the temptation to code it, as you can see in
the patch I just sent out.
It is totally untested; I just added the code as I thought it should
be and made sure it compiled.
There is still a little too much new code for my taste, but this might
be a way forward.
Please give your feedback on this.
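To make the idea concrete, here is a minimal user-space model of such a sequence-number reordering scheme (the names `msg`, `reorder` and `rcv` are illustrative only, not from the actual patch, which works on TIPC skb queues):

```c
/* Toy model: each broadcast/replicast name-distributor message carries a
 * sequence number, and the receiver delivers strictly in sequence,
 * deferring anything that arrives early. Illustrative sketch only. */
#include <assert.h>
#include <stdint.h>

#define DEFER_MAX 16

struct msg { uint16_t seqno; };

struct reorder {
	uint16_t expect;              /* next seqno to deliver */
	struct msg defer[DEFER_MAX];  /* deferred out-of-order messages */
	int ndefer;
};

/* Returns how many messages became deliverable, writing them to out[] */
static int rcv(struct reorder *r, struct msg m, struct msg *out)
{
	int n = 0, i;

	if (m.seqno != r->expect) {          /* arrived early: defer it */
		if (r->ndefer < DEFER_MAX)
			r->defer[r->ndefer++] = m;
		return 0;
	}
	out[n++] = m;
	r->expect++;
	/* Deliver any deferred messages that are now in sequence */
	for (i = 0; i < r->ndefer; ) {
		if (r->defer[i].seqno == r->expect) {
			out[n++] = r->defer[i];
			r->expect++;
			r->defer[i] = r->defer[--r->ndefer];
			i = 0;               /* rescan from the start */
		} else {
			i++;
		}
	}
	return n;
}
```

This also shows why the "broadcast before first bulk" case falls out for free: an early message simply waits on the deferred queue until its predecessors arrive.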
I also noticed a couple of things while working with this:
1) There is still an 'expires' field in tipc_mcast_method, and it
seems to even be considered when switching between bcast/rcast. The
whole point of adding the mcast synchronization mechanism was to get
rid of this delay. Have you tested that synchronization really works
without the 'expires'?
[Hoang] Yes, I did.
I issued the commands below to force rcast/bcast regardless of 'expires':
$ tipc link set bro REPLICAST
or
$ tipc link set bro BROADCAST
Regards
///jon
Regards,
Hoang
-----Original Message-----
From: Jon Maloy <jma...@redhat.com> <mailto:jma...@redhat.com>
Sent: Thursday, May 21, 2020 9:44 PM
To: Hoang Huu Le <hoang.h...@dektech.com.au>
<mailto:hoang.h...@dektech.com.au>; ma...@donjonn.com
<mailto:ma...@donjonn.com>
Subject: Re: FW: [PATCH 2/2] tipc: update a binding service via
broadcast
Hi,
I have one more comment below. Looking forward to your feedback.
///jon
On 5/20/20 9:03 PM, Hoang Huu Le wrote:
Yeah, thanks Jon.
I will investigate more on your idea when I finish the issue
with lab infrastructure.
Hoang
-----Original Message-----
From: Jon Maloy <jma...@redhat.com> <mailto:jma...@redhat.com>
Sent: Thursday, May 21, 2020 7:13 AM
To: Hoang Huu Le <hoang.h...@dektech.com.au>
<mailto:hoang.h...@dektech.com.au>; ma...@donjonn.com
<mailto:ma...@donjonn.com>
Subject: Re: FW: [PATCH 2/2] tipc: update a binding service
via broadcast
Hi Hoang,
Below I try to summarize my newest proposal in relation to v2
of your patch.
1) The bulk can be sent just as is done now, with the addition that
   we add the NOT_LAST bit to the header.
2) We need a new capability bit identifying nodes which support
   broadcast NAME_DISTR. I don't see that we can just reuse
   TIPC_MCAST_RBCTL, because the recipients need to have code for
   handling concurrent bulk/broadcast receptions. This bit is not
   added to the cluster capability word.
3) We need to keep two structs of type tipc_nlist in the name table.
   One contains tipc_dest structures for all nodes NOT supporting
   TIPC_MCAST_NAMEDISTR (rcast_list), and the other those which do
   (bcast_list).
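As a rough user-space sketch of point 3 (the names `node`, `dest_lists` and `split_dests` are illustrative, not the real TIPC structures, which use struct tipc_nlist / struct tipc_dest; the capability bit value is an assumption):

```c
/* Toy model of keeping two destination lists, split by whether a peer
 * advertises the (hypothetical) TIPC_MCAST_NAMEDISTR capability bit. */
#include <assert.h>
#include <stdint.h>

#define TIPC_MCAST_NAMEDISTR (1 << 7)   /* assumed bit value */
#define MAX_NODES 64

struct node { uint32_t addr; uint16_t capabilities; };

struct dest_lists {
	uint32_t rcast_list[MAX_NODES]; int nrcast; /* legacy peers */
	uint32_t bcast_list[MAX_NODES]; int nbcast; /* NAMEDISTR-capable */
};

static void split_dests(const struct node *nodes, int n,
			struct dest_lists *d)
{
	d->nrcast = d->nbcast = 0;
	for (int i = 0; i < n; i++) {
		if (nodes[i].capabilities & TIPC_MCAST_NAMEDISTR)
			d->bcast_list[d->nbcast++] = nodes[i].addr;
		else
			d->rcast_list[d->nrcast++] = nodes[i].addr;
	}
}
```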
For more comments see below.
On 5/12/20 6:22 AM, Hoang Huu Le wrote:
Just forward the patch I mentioned.
-----Original Message-----
From: Hoang Le <hoang.h...@dektech.com.au>
<mailto:hoang.h...@dektech.com.au>
Sent: Tuesday, November 19, 2019 5:01 PM
To: jon.ma...@ericsson.com
<mailto:jon.ma...@ericsson.com>; ma...@donjonn.com
<mailto:ma...@donjonn.com>; tipc-...@dektech.com.au
<mailto:tipc-...@dektech.com.au>
Subject: [PATCH 2/2] tipc: update a binding service via
broadcast
Currently, binding table updates (adding a service binding to the
name table / withdrawing a service binding) are sent over
replicast.
However, when scaling clusters up to > 100 nodes/containers, this
method is less efficient because it loops through the nodes in a
cluster one by one.
It is worth using broadcast to update a binding service, so that
the binding table is updated on all nodes in one shot.
The mechanism is backwards compatible, as sync messages are
silently dropped.
v2: resolve synchronization problem when switching from unicast to broadcast
Signed-off-by: Hoang Le <hoang.h...@dektech.com.au>
<mailto:hoang.h...@dektech.com.au>
---
net/tipc/bcast.c | 3 ++-
net/tipc/link.c | 6 ++++++
net/tipc/name_table.c | 33 ++++++++++++++++++++++++++++++---
net/tipc/name_table.h | 4 ++++
net/tipc/node.c | 32 ++++++++++++++++++++++++++++++++
net/tipc/node.h | 2 ++
6 files changed, 76 insertions(+), 4 deletions(-)
diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index e06f05d55534..44ed481fec47 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -324,7 +324,8 @@ static int tipc_mcast_send_sync(struct net *net, struct sk_buff *skb,
hdr = buf_msg(skb);
if (msg_user(hdr) == MSG_FRAGMENTER)
hdr = msg_inner_hdr(hdr);
- if (msg_type(hdr) != TIPC_MCAST_MSG)
+ if (msg_user(hdr) != NAME_DISTRIBUTOR &&
+ msg_type(hdr) != TIPC_MCAST_MSG)
return 0;
/* Allocate dummy message */
diff --git a/net/tipc/link.c b/net/tipc/link.c
index fb72031228c9..a2e9a64d5a0f 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1190,6 +1190,8 @@ static bool tipc_data_input(struct tipc_link *l, struct sk_buff *skb,
struct sk_buff_head *inputq)
{
struct sk_buff_head *mc_inputq = l->bc_rcvlink->inputq;
+ struct name_table *nt = tipc_name_table(l->net);
+ struct sk_buff_head *defnq = &nt->defer_namedq;
struct tipc_msg *hdr = buf_msg(skb);
switch (msg_user(hdr)) {
@@ -1211,6 +1213,10 @@ static bool tipc_data_input(struct tipc_link *l, struct sk_buff *skb,
case NAME_DISTRIBUTOR:
l->bc_rcvlink->state = LINK_ESTABLISHED;
skb_queue_tail(l->namedq, skb);
+
+ spin_lock_bh(&defnq->lock);
+ tipc_mcast_filter_msg(l->net, defnq, l->namedq);
Should not be needed here.
You can instead do this in tipc_named_rcv(), using l->namedq as the
deferred queue and creating a temporary namedq queue on the stack for
the messages ready to be delivered.
We sort the messages in three steps:
1) If there are any chains of bulk messages, we sort those per
source node into the temporary namedq and deliver them first,
when a chain is complete.
2) If we find that a chain is incomplete we push it back to
the head of n->namedq and return without further action.
3) When there are no bulk messages left in n->namedq we call
tipc_mcast_filter_msgs() to sort the remaining messages into
the temporary namedq, as far as possible, and deliver those
which are ready to be delivered.
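The chain-completeness check in steps 1 and 2 can be modeled outside the kernel like this (the struct and field names are illustrative, and NOT_LAST is the proposed header bit, not an existing one):

```c
/* Toy model: bulk messages from a node are delivered only once the whole
 * chain, terminated by a message WITHOUT the proposed NOT_LAST bit, has
 * arrived; an incomplete chain stays on the deferred queue. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct nmsg { uint32_t node; bool is_bulk; bool not_last; };

/* Is the bulk chain from 'node' complete within msgs[0..n)? */
static bool bulk_chain_complete(const struct nmsg *msgs, int n,
				uint32_t node)
{
	for (int i = 0; i < n; i++)
		if (msgs[i].node == node && msgs[i].is_bulk &&
		    !msgs[i].not_last)
			return true;   /* chain terminator present */
	return false;                  /* incomplete: keep deferring */
}
```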
+ spin_unlock_bh(&defnq->lock);
return true;
case MSG_BUNDLER:
case TUNNEL_PROTOCOL:
diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
index 66a65c2cdb23..593dcd11357f 100644
--- a/net/tipc/name_table.c
+++ b/net/tipc/name_table.c
@@ -615,9 +615,11 @@ struct publication *tipc_nametbl_publish(struct net *net, u32 type, u32 lower,
struct tipc_net *tn = tipc_net(net);
struct publication *p = NULL;
struct sk_buff *skb = NULL;
+ bool rcast;
spin_lock_bh(&tn->nametbl_lock);
+ rcast = nt->rcast;
if (nt->local_publ_count >= TIPC_MAX_PUBL) {
pr_warn("Bind failed, max limit %u reached\n", TIPC_MAX_PUBL);
goto exit;
@@ -632,8 +634,18 @@ struct publication *tipc_nametbl_publish(struct net *net, u32 type, u32 lower,
exit:
spin_unlock_bh(&tn->nametbl_lock);
- if (skb)
- tipc_node_broadcast(net, skb);
Here we make two calls to tipc_mcast_xmit(), one with
method->rcast/mandatory and rcast_list, if not empty, and one with
method->bcast/mandatory and bcast_list, if not empty.
Actually, the latter should not be mandatory. We can easily imagine a
situation where we start out with only legacy nodes, and then upgrade
to bcast nodes, one by one.
In the beginning we want tipc_mcast_xmit() to select rcast even for
the bcast nodes, and then, as their proportion of the cluster grows,
it should switch to bcast. This does mean that the method struct for
bcast must be kept between the calls, i.e., it becomes another member
of struct name_table.
///jon
skb must of course be cloned if necessary.
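A minimal model of this dual-transmit logic, including when the clone is needed (names like `dests` and `mcast_publish` are illustrative, not kernel API):

```c
/* Toy model: one transmit call per non-empty destination list; the skb
 * must be cloned when both lists are non-empty, since each call consumes
 * its own copy of the message. */
#include <assert.h>
#include <stdbool.h>

struct dests { int nrcast; int nbcast; };

/* Returns the number of transmit calls; *cloned reports whether a
 * copy of the message had to be made. */
static int mcast_publish(const struct dests *d, bool *cloned)
{
	int calls = 0;

	*cloned = d->nrcast > 0 && d->nbcast > 0;
	if (d->nrcast > 0)
		calls++;   /* tipc_mcast_xmit(): rcast method, mandatory */
	if (d->nbcast > 0)
		calls++;   /* tipc_mcast_xmit(): bcast method, not mandatory */
	return calls;
}
```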
+ if (skb) {
+ /* Use broadcast if all nodes support broadcast NAME_DISTR */
+ if (tipc_net(net)->capabilities & TIPC_MCAST_RBCTL) {
+ tipc_node_broadcast_named_publish(net, skb, &rcast);
+ spin_lock_bh(&tn->nametbl_lock);
+ nt->rcast = rcast;
+ spin_unlock_bh(&tn->nametbl_lock);
+ } else {
+ /* Otherwise, be backwards compatible */
+ tipc_node_broadcast(net, skb);
+ }
Not needed according to above.
return p;
}
@@ -648,8 +660,10 @@ int tipc_nametbl_withdraw(struct net *net, u32 type, u32 lower,
u32 self = tipc_own_addr(net);
struct sk_buff *skb = NULL;
struct publication *p;
+ bool rcast;
spin_lock_bh(&tn->nametbl_lock);
+ rcast = nt->rcast;
p = tipc_nametbl_remove_publ(net, type, lower, upper,
self, key);
if (p) {
@@ -664,7 +678,16 @@ int tipc_nametbl_withdraw(struct net *net, u32 type, u32 lower,
spin_unlock_bh(&tn->nametbl_lock);
if (skb) {
- tipc_node_broadcast(net, skb);
+ /* Use broadcast if all nodes support broadcast NAME_DISTR */
+ if (tipc_net(net)->capabilities & TIPC_MCAST_RBCTL) {
+ tipc_node_broadcast_named_publish(net, skb, &rcast);
+ spin_lock_bh(&tn->nametbl_lock);
+ nt->rcast = rcast;
+ spin_unlock_bh(&tn->nametbl_lock);
+ } else {
+ /* Otherwise, be backwards compatible */
+ tipc_node_broadcast(net, skb);
+ }
One or two calls to tipc_mcast_xmit(), just as above.
return 1;
}
return 0;
@@ -746,6 +769,9 @@ int tipc_nametbl_init(struct net *net)
INIT_LIST_HEAD(&nt->cluster_scope);
rwlock_init(&nt->cluster_scope_lock);
tn->nametbl = nt;
+ /* 'bulk' update messages are sent via unicast */
+ nt->rcast = true;
+ skb_queue_head_init(&nt->defer_namedq);
Not needed. Now, node->namedq *is* the deferred queue.
spin_lock_init(&tn->nametbl_lock);
return 0;
}
@@ -784,6 +810,7 @@ void tipc_nametbl_stop(struct net *net)
* publications, then release the name table
*/
spin_lock_bh(&tn->nametbl_lock);
+ skb_queue_purge(&nt->defer_namedq);
for (i = 0; i < TIPC_NAMETBL_SIZE; i++) {
if (hlist_empty(&nt->services[i]))
continue;
diff --git a/net/tipc/name_table.h b/net/tipc/name_table.h
index f79066334cc8..b8cdf2a29d48 100644
--- a/net/tipc/name_table.h
+++ b/net/tipc/name_table.h
@@ -95,6 +95,8 @@ struct publication {
* - used by name_distr to send bulk updates to new nodes
* - used by name_distr during re-init of name table
* @local_publ_count: number of publications issued by this node
+ * @defer_namedq: temporary queue for 'syncing' updates
+ * @rcast: previous method used to publish/withdraw a service
*/
struct name_table {
struct hlist_head services[TIPC_NAMETBL_SIZE];
@@ -102,6 +104,8 @@ struct name_table {
struct list_head cluster_scope;
rwlock_t cluster_scope_lock;
u32 local_publ_count;
+ struct sk_buff_head defer_namedq;
+ bool rcast;
};
int tipc_nl_name_table_dump(struct sk_buff *skb, struct netlink_callback *cb);
diff --git a/net/tipc/node.c b/net/tipc/node.c
index aaf595613e6e..b058647fa78b 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -2981,3 +2981,35 @@ void tipc_node_pre_cleanup_net(struct net *exit_net)
}
rcu_read_unlock();
}
+
+int tipc_node_broadcast_named_publish(struct net *net, struct sk_buff *skb,
+				      bool *rcast)
+{
+ struct tipc_mc_method method = {.rcast = *rcast};
+ struct sk_buff_head xmitq;
+ struct tipc_nlist dests;
+ struct tipc_node *n;
+ u16 cong_link_cnt;
+ int rc = 0;
+
+ __skb_queue_head_init(&xmitq);
+ __skb_queue_tail(&xmitq, skb);
+
+ tipc_nlist_init(&dests, tipc_own_addr(net));
+ rcu_read_lock();
+ list_for_each_entry_rcu(n, tipc_nodes(net), list) {
+ if (in_own_node(net, n->addr))
+ continue;
+ if (!node_is_up(n))
+ continue;
+ tipc_nlist_add(&dests, n->addr);
+ }
+ rcu_read_unlock();
+
+ rc = tipc_mcast_xmit(net, &xmitq, &method, &dests, &cong_link_cnt);
+ *rcast = method.rcast;
+
+ tipc_nlist_purge(&dests);
+ __skb_queue_purge(&xmitq);
+ return rc;
+}
Not needed any more.
diff --git a/net/tipc/node.h b/net/tipc/node.h
index a6803b449a2c..d7d19f9932b1 100644
--- a/net/tipc/node.h
+++ b/net/tipc/node.h
@@ -124,4 +124,6 @@ int tipc_nl_node_set_key(struct sk_buff *skb, struct genl_info *info);
int tipc_nl_node_flush_key(struct sk_buff *skb, struct genl_info *info);
#endif
void tipc_node_pre_cleanup_net(struct net *exit_net);
+int tipc_node_broadcast_named_publish(struct net *net, struct sk_buff *skb,
+				      bool *rcast);
#endif
There may of course be flaws or potential improvements to this, but
to me this should solve our problem without too much new complexity
and code.
///jon