Added more recipients.
-------- Forwarded Message --------
Subject: Re: FW: [PATCH 2/2] tipc: update a binding service via broadcast
Date: Fri, 29 May 2020 12:08:02 -0400
From: Jon Maloy <jma...@redhat.com>
To: Hoang Huu Le <hoang.h...@dektech.com.au>, ma...@donjonn.com
<ma...@donjonn.com>
Hi Hoang,
See below.
On 5/27/20 6:49 AM, Hoang Huu Le wrote:
Hi Jon,
I have a draft version based on your idea (see the attached file).
But from my point of view, this version introduces too much code on
the sending side.
I don't think it is a good idea to keep rcast_list and bcast_list in
the name table.
I think we should find a new approach or just drop the feature.
Yes, you are right.
I came up with a new idea, to just add a sequence number to the
broadcast/replicast messages and re-order them at reception. This even
handles the case if the first broadcast message arrives before the first
bulk message, something we have not anticipated before.
I couldn't resist the temptation to code it, as you can see in the
patch I just sent out.
It is totally untested; I just added the code as I thought it should be
and made sure it compiled.
There is still a little too much new code for my taste, but this might
be a way forward.
Please give your feedback on this.
I also noticed a couple of things while working with this:
1) There is still an 'expires' field in tipc_mcast_method, and it
even seems to be considered when switching bcast/rcast. The whole
point of adding the mcast synchronization mechanism was to get rid of
this delay. Have you tested that synchronization really works without
the 'expires'?
2) There are some remnants of old code for the name table dist_queue.
This functionality was made redundant by me at least two years ago, so
this should be cleaned up.
3) We might have a potential race condition when new nodes come up, so
that publications are distributed twice.
a) A publication is added to the name table, and the name table
lock is released.
b) A new node comes up, and the new publication is delivered in the
bulk message.
c) The broadcast of the publication goes ahead and sends it out to
all nodes, even the one that just came up.
d) We end up with a double publication on one of the nodes.
e) One of those will linger in the name table after the publication
is withdrawn.
I have never seen this happen, and my analysis might be wrong, but
to me this looks like a possible scenario.
Note that my patch doesn't fix this, but we could possibly arrange
it by adding a 'distributed' flag in the publication item on the
sending side, so that the bulk distribution will ignore it.
Regards
///jon
Regards,
Hoang
-----Original Message-----
From: Jon Maloy <jma...@redhat.com>
Sent: Thursday, May 21, 2020 9:44 PM
To: Hoang Huu Le <hoang.h...@dektech.com.au>; ma...@donjonn.com
Subject: Re: FW: [PATCH 2/2] tipc: update a binding service via broadcast
Hi,
I have one more comment below. Looking forward to your feedback.
///jon
On 5/20/20 9:03 PM, Hoang Huu Le wrote:
Yeah, thanks Jon.
I will investigate more on your idea when I finish the issue with lab
infrastructure.
Hoang
-----Original Message-----
From: Jon Maloy <jma...@redhat.com>
Sent: Thursday, May 21, 2020 7:13 AM
To: Hoang Huu Le <hoang.h...@dektech.com.au>; ma...@donjonn.com
Subject: Re: FW: [PATCH 2/2] tipc: update a binding service via broadcast
Hi Hoang,
Below I try to summarize my newest proposal in relation to v2 of your
patch.
1) The bulk can be sent just as is done now, with the addition that we
add the NOT_LAST bit to the header.
2) We need a new capability bit identifying nodes which support
broadcast NAME_DISTR. I don't see that we can just reuse
TIPC_MCAST_RBCTL, because the recipients need to have code for
handling concurrent bulk/broadcast receptions. This bit is not added
to the cluster capability word.
3) We need to keep two structs of type tipc_nlist in the name table.
One contains tipc_dest structures for all nodes NOT supporting
TIPC_MCAST_NAMEDISTR (rcast_list), and the other those which do
(bcast_list).
For more comments see below.
On 5/12/20 6:22 AM, Hoang Huu Le wrote:
Just forward the patch I mentioned.
-----Original Message-----
From: Hoang Le <hoang.h...@dektech.com.au>
Sent: Tuesday, November 19, 2019 5:01 PM
To: jon.ma...@ericsson.com; ma...@donjonn.com; tipc-...@dektech.com.au
Subject: [PATCH 2/2] tipc: update a binding service via broadcast
Currently, updating the binding table (adding a service binding to the
name table or withdrawing one) is done over replicast. However, if we
scale clusters up to > 100 nodes/containers, this method is
inefficient because it loops through the nodes in a cluster one by
one.
It is worthwhile to use broadcast to update a binding service; then
the binding table is updated on all nodes in one shot.
The mechanism is backward compatible, as the sync message is silently
dropped on legacy nodes.
v2: resolve synchronization problem when switching from unicast to
broadcast
Signed-off-by: Hoang Le <hoang.h...@dektech.com.au>
---
net/tipc/bcast.c | 3 ++-
net/tipc/link.c | 6 ++++++
net/tipc/name_table.c | 33 ++++++++++++++++++++++++++++++---
net/tipc/name_table.h | 4 ++++
net/tipc/node.c | 32 ++++++++++++++++++++++++++++++++
net/tipc/node.h | 2 ++
6 files changed, 76 insertions(+), 4 deletions(-)
diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index e06f05d55534..44ed481fec47 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -324,7 +324,8 @@ static int tipc_mcast_send_sync(struct net *net, struct sk_buff *skb,
hdr = buf_msg(skb);
if (msg_user(hdr) == MSG_FRAGMENTER)
hdr = msg_inner_hdr(hdr);
- if (msg_type(hdr) != TIPC_MCAST_MSG)
+ if (msg_user(hdr) != NAME_DISTRIBUTOR &&
+ msg_type(hdr) != TIPC_MCAST_MSG)
return 0;
/* Allocate dummy message */
diff --git a/net/tipc/link.c b/net/tipc/link.c
index fb72031228c9..a2e9a64d5a0f 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1190,6 +1190,8 @@ static bool tipc_data_input(struct tipc_link *l, struct sk_buff *skb,
struct sk_buff_head *inputq)
{
struct sk_buff_head *mc_inputq = l->bc_rcvlink->inputq;
+ struct name_table *nt = tipc_name_table(l->net);
+ struct sk_buff_head *defnq = &nt->defer_namedq;
struct tipc_msg *hdr = buf_msg(skb);
switch (msg_user(hdr)) {
@@ -1211,6 +1213,10 @@ static bool tipc_data_input(struct tipc_link *l, struct sk_buff *skb,
case NAME_DISTRIBUTOR:
l->bc_rcvlink->state = LINK_ESTABLISHED;
skb_queue_tail(l->namedq, skb);
+
+ spin_lock_bh(&defnq->lock);
+ tipc_mcast_filter_msg(l->net, defnq, l->namedq);
Should not be needed here.
You can instead do this in tipc_named_rcv(), using l->namedq as the
deferred queue and creating a temporary namedq on the stack for the
messages ready to be delivered.
We sort the messages in three steps:
1) If there are any chains of bulk messages, we sort those per source
node into the temporary namedq and deliver them first, when a chain
is complete.
2) If we find that a chain is incomplete we push it back to the head
of n->namedq and return without further action.
3) When there are no bulk messages left in n->namedq we call
tipc_mcast_filter_msgs() to sort the remaining messages into the
temporary namedq, as far as possible, and deliver those which are
ready to be delivered.
+ spin_unlock_bh(&defnq->lock);
return true;
case MSG_BUNDLER:
case TUNNEL_PROTOCOL:
diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
index 66a65c2cdb23..593dcd11357f 100644
--- a/net/tipc/name_table.c
+++ b/net/tipc/name_table.c
@@ -615,9 +615,11 @@ struct publication *tipc_nametbl_publish(struct net *net, u32 type, u32 lower,
struct tipc_net *tn = tipc_net(net);
struct publication *p = NULL;
struct sk_buff *skb = NULL;
+ bool rcast;
spin_lock_bh(&tn->nametbl_lock);
+ rcast = nt->rcast;
if (nt->local_publ_count >= TIPC_MAX_PUBL) {
pr_warn("Bind failed, max limit %u reached\n", TIPC_MAX_PUBL);
goto exit;
@@ -632,8 +634,18 @@ struct publication *tipc_nametbl_publish(struct net *net, u32 type, u32 lower,
exit:
spin_unlock_bh(&tn->nametbl_lock);
- if (skb)
- tipc_node_broadcast(net, skb);
Here we make two calls to tipc_mcast_xmit(): one with
method->rcast/mandatory and rcast_list, if not empty, and one with
method->bcast/mandatory and bcast_list, if not empty.
Actually, the latter should not be mandatory. We can easily imagine a
situation where we start out with only legacy nodes, and then upgrade to
bcast nodes, one by one.
In the beginning we want tipc_mcast_xmit() to select rcast even for the
bcast nodes, and then, as their proportion of the cluster grows, it
should switch to bcast. This means that the method struct for bcast
must be kept between the calls, i.e., as another member of struct
name_table.
///jon
skb must of course be cloned if necessary.
+ if (skb) {
+ /* Use broadcast if all nodes support broadcast NAME_DISTR */
+ if (tipc_net(net)->capabilities & TIPC_MCAST_RBCTL) {
+ tipc_node_broadcast_named_publish(net, skb, &rcast);
+ spin_lock_bh(&tn->nametbl_lock);
+ nt->rcast = rcast;
+ spin_unlock_bh(&tn->nametbl_lock);
+ } else {
+ /* Otherwise, be backwards compatible */
+ tipc_node_broadcast(net, skb);
+ }
Not needed according to above.
return p;
}
@@ -648,8 +660,10 @@ int tipc_nametbl_withdraw(struct net *net, u32 type, u32 lower,
u32 self = tipc_own_addr(net);
struct sk_buff *skb = NULL;
struct publication *p;
+ bool rcast;
spin_lock_bh(&tn->nametbl_lock);
+ rcast = nt->rcast;
p = tipc_nametbl_remove_publ(net, type, lower, upper, self, key);
if (p) {
@@ -664,7 +678,16 @@ int tipc_nametbl_withdraw(struct net *net, u32 type, u32 lower,
spin_unlock_bh(&tn->nametbl_lock);
if (skb) {
- tipc_node_broadcast(net, skb);
+ /* Use broadcast if all nodes support broadcast NAME_DISTR */
+ if (tipc_net(net)->capabilities & TIPC_MCAST_RBCTL) {
+ tipc_node_broadcast_named_publish(net, skb, &rcast);
+ spin_lock_bh(&tn->nametbl_lock);
+ nt->rcast = rcast;
+ spin_unlock_bh(&tn->nametbl_lock);
+ } else {
+ /* Otherwise, be backwards compatible */
+ tipc_node_broadcast(net, skb);
+ }
One or two calls to tipc_mcast_xmit(), just as above.
return 1;
}
return 0;
@@ -746,6 +769,9 @@ int tipc_nametbl_init(struct net *net)
INIT_LIST_HEAD(&nt->cluster_scope);
rwlock_init(&nt->cluster_scope_lock);
tn->nametbl = nt;
+ /* 'bulk' updated messages via unicast */
+ nt->rcast = true;
+ skb_queue_head_init(&nt->defer_namedq);
Not needed. Now, node->namedq *is* the deferred queue.
spin_lock_init(&tn->nametbl_lock);
return 0;
}
@@ -784,6 +810,7 @@ void tipc_nametbl_stop(struct net *net)
* publications, then release the name table
*/
spin_lock_bh(&tn->nametbl_lock);
+ skb_queue_purge(&nt->defer_namedq);
for (i = 0; i < TIPC_NAMETBL_SIZE; i++) {
if (hlist_empty(&nt->services[i]))
continue;
diff --git a/net/tipc/name_table.h b/net/tipc/name_table.h
index f79066334cc8..b8cdf2a29d48 100644
--- a/net/tipc/name_table.h
+++ b/net/tipc/name_table.h
@@ -95,6 +95,8 @@ struct publication {
* - used by name_distr to send bulk updates to new nodes
* - used by name_distr during re-init of name table
* @local_publ_count: number of publications issued by this node
+ * @defer_namedq: temporary queue for 'synching' updates
+ * @rcast: previous method used to publish/withdraw a service
*/
struct name_table {
struct hlist_head services[TIPC_NAMETBL_SIZE];
@@ -102,6 +104,8 @@ struct name_table {
struct list_head cluster_scope;
rwlock_t cluster_scope_lock;
u32 local_publ_count;
+ struct sk_buff_head defer_namedq;
+ bool rcast;
};
int tipc_nl_name_table_dump(struct sk_buff *skb, struct netlink_callback *cb);
diff --git a/net/tipc/node.c b/net/tipc/node.c
index aaf595613e6e..b058647fa78b 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -2981,3 +2981,35 @@ void tipc_node_pre_cleanup_net(struct net *exit_net)
}
rcu_read_unlock();
}
+
+int tipc_node_broadcast_named_publish(struct net *net, struct sk_buff *skb,
+ bool *rcast)
+{
+ struct tipc_mc_method method = {.rcast = *rcast};
+ struct sk_buff_head xmitq;
+ struct tipc_nlist dests;
+ struct tipc_node *n;
+ u16 cong_link_cnt;
+ int rc = 0;
+
+ __skb_queue_head_init(&xmitq);
+ __skb_queue_tail(&xmitq, skb);
+
+ tipc_nlist_init(&dests, tipc_own_addr(net));
+ rcu_read_lock();
+ list_for_each_entry_rcu(n, tipc_nodes(net), list) {
+ if (in_own_node(net, n->addr))
+ continue;
+ if (!node_is_up(n))
+ continue;
+ tipc_nlist_add(&dests, n->addr);
+ }
+ rcu_read_unlock();
+
+ rc = tipc_mcast_xmit(net, &xmitq, &method, &dests, &cong_link_cnt);
+ *rcast = method.rcast;
+
+ tipc_nlist_purge(&dests);
+ __skb_queue_purge(&xmitq);
+ return rc;
+}
Not needed any more.
diff --git a/net/tipc/node.h b/net/tipc/node.h
index a6803b449a2c..d7d19f9932b1 100644
--- a/net/tipc/node.h
+++ b/net/tipc/node.h
@@ -124,4 +124,6 @@ int tipc_nl_node_set_key(struct sk_buff *skb, struct genl_info *info);
int tipc_nl_node_flush_key(struct sk_buff *skb, struct genl_info *info);
#endif
void tipc_node_pre_cleanup_net(struct net *exit_net);
+int tipc_node_broadcast_named_publish(struct net *net, struct sk_buff *skb,
+ bool *rcast);
#endif
There may of course be flaws or room for improvement here, but to me
this should solve our problem without too much new complexity and
code.
///jon
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion