Acked-by: Jon Maloy <jon.ma...@ericsson.com> Also see my comments below.
///jon > -----Original Message----- > From: Tuong Lien <tuong.t.l...@dektech.com.au> > Sent: 24-Apr-19 00:41 > To: Jon Maloy <jon.ma...@ericsson.com>; ma...@donjonn.com; > ying....@windriver.com; tipc-discussion@lists.sourceforge.net > Subject: [PATCH RFC] tipc: fix link "half-failover" issue > > TIPC link can temporarily fall into "half-establish" that only one of the link > endpoints is ESTABLISHED and starts to send traffic, PROTOCOL messages, > whereas the other link endpoint is not up (e.g. immediately when the > endpoint receives ACTIVATE_MSG, the network interface goes down...). > > This is a normal situation and will be settled because the link endpoint will > be > eventually brought down after the link tolerance time. > > However, the situation will become worse when the second link is > established before the first link endpoint goes down, For example: > > 1. Both links <1A-1B>, <2A-2B> down Confusing terminology here. How can there be a link <1A-1B>? I think you want to say "Both link endpoints" 1A,1B down etc. > 2. Link endpoint 1B up, but 1A still down (e.g. due to network > disturbance, wrong session, etc.) > 3. Link <2A-2B> up Same here. > 4. Link endpoint 1B down (e.g. due to link tolerance timeout) > 5. Node B starts failover onto link <2A-2B> > > ==> Node A does never start link failover. > > When the "half-failover" situation happens, two consequences have been > observed: > > a) Peer link/node gets stuck in FAILINGOVER state; > b) Traffic or user messages that peer node is trying to failover onto the > second link can be partially or completely dropped by this node. > > The consequence a) was actually solved by commit c140eb166d68 ("tipc: > fix failover problem"), but that commit didn't cover the b). It's due to the > fact > that the tunnel link endpoint has never been prepared for a failover, so the > 'l->drop_point' (and the other data...) is not set correctly. When a > TUNNLE_MSG s/TUNNLE_MSG/TUNNEL_MSG from peer node arrives on the link, depending on the inner > message's seqno and the current 'l->drop_point' > value, the message can be dropped (- treated as a duplicate message) or > processed. > At this early stage, the traffic messages from peer are likely to be > NAME_DISTRIBUTORs, this means some name table entries will be missed on > the node forever! > > The commit resolves the issue by starting the FAILOVER process on this node > as well. Another benefit from this solution is that we ensure the link will > not > be re-established until the failover ends. > > Signed-off-by: Tuong Lien <tuong.t.l...@dektech.com.au> > --- > net/tipc/link.c | 35 +++++++++++++++++++++++++++++++++++ > net/tipc/link.h | 2 ++ > net/tipc/node.c | 54 > +++++++++++++++++++++++++++++++++++++++++++++++------- > 3 files changed, 84 insertions(+), 7 deletions(-) > > diff --git a/net/tipc/link.c b/net/tipc/link.c index > 6053489c8063..fa639054329d > 100644 > --- a/net/tipc/link.c > +++ b/net/tipc/link.c > @@ -1705,6 +1705,41 @@ void tipc_link_tnl_prepare(struct tipc_link *l, > struct tipc_link *tnl, > } > } > > +/** > + * tipc_link_failover_prepare() - prepare tnl for link failover > + * > + * This is a special version of the precursor - > +tipc_link_tnl_prepare(), > + * see the __tipc_node_link_failover() for details > + * > + * @l: failover link > + * @tnl: tunnel link > + * @xmitq: queue for messages to be xmited */ void > +tipc_link_failover_prepare(struct tipc_link *l, struct tipc_link *tnl, > + struct sk_buff_head *xmitq) > +{ > + struct sk_buff_head *fdefq = &tnl->failover_deferdq; > + > + tipc_link_create_dummy_tnl_msg(tnl, xmitq); > + > + /* This failover link enpoint should never be established s/should never be established before, so did not../was never established, so it has not ... > + * before, so did not receive anything from peer. > + * Otherwise, it must be a normal failover situation or the > + * node has entered SELF_DOWN_PEER_LEAVING and both peer > nodes > + * would have to start over from scratch instead. > + */ > + WARN_ON(l && tipc_link_is_up(l)); > + tnl->drop_point = 1; > + tnl->failover_reasm_skb = NULL; > + > + /* Initiate the link's failover deferdq */ > + if (unlikely(!skb_queue_empty(fdefq))) { > + pr_warn("Link failover deferdq not empty: %d!\n", > + skb_queue_len(fdefq)); > + __skb_queue_purge(fdefq); > + } > +} > + > /* tipc_link_validate_msg(): validate message against current link state > * Returns true if message should be accepted, otherwise false > */ > diff --git a/net/tipc/link.h b/net/tipc/link.h index > 8439e0ee53a8..adcad65e761c 100644 > --- a/net/tipc/link.h > +++ b/net/tipc/link.h > @@ -90,6 +90,8 @@ void tipc_link_tnl_prepare(struct tipc_link *l, struct > tipc_link *tnl, > int mtyp, struct sk_buff_head *xmitq); void > tipc_link_create_dummy_tnl_msg(struct tipc_link *tnl, > struct sk_buff_head *xmitq); > +void tipc_link_failover_prepare(struct tipc_link *l, struct tipc_link *tnl, > + struct sk_buff_head *xmitq); > void tipc_link_build_reset_msg(struct tipc_link *l, struct sk_buff_head > *xmitq); int tipc_link_fsm_evt(struct tipc_link *l, int evt); bool > tipc_link_is_up(struct tipc_link *l); diff --git a/net/tipc/node.c > b/net/tipc/node.c index 7478e2d4ec02..be07cf327d2d 100644 > --- a/net/tipc/node.c > +++ b/net/tipc/node.c > @@ -714,7 +714,6 @@ static void __tipc_node_link_up(struct tipc_node *n, > int bearer_id, > *slot0 = bearer_id; > *slot1 = bearer_id; > tipc_node_fsm_evt(n, SELF_ESTABL_CONTACT_EVT); > - n->failover_sent = false; > n->action_flags |= TIPC_NOTIFY_NODE_UP; > tipc_link_set_active(nl, true); > tipc_bcast_add_peer(n->net, nl, xmitq); @@ -757,6 +756,45 > @@ static void tipc_node_link_up(struct tipc_node *n, int bearer_id, } > > /** > + * __tipc_node_link_failover() - start failover in case "half-failover" Don't see any reason for the "__" prefix. This is normally used when we need two functions with the same name, one locked and one unlocked. > + * > + * This function is only called in a very special situation where link > + * failover can be already started on peer node but not on this node. > + * This can happen when e.g. > + * 1. Both links <1A-1B>, <2A-2B> down > + * 2. Link endpoint 1B up, but 1A still down (e.g. due to network > + * disturbance, wrong session, etc.) > + * 3. Link <2A-2B> up > + * 4. Link endpoint 1B down (e.g. due to link tolerance timeout) > + * 5. Node B starts failover onto link <2A-2B> > + * > + * ==> Node A does never start link/node failover! > + * > + * @n: tipc node structure > + * @l: link peer endpoint failingover (- can be NULL) > + * @tnl: tunnel link > + * @xmitq: queue for messages to be xmited on tnl link later */ static > +void __tipc_node_link_failover(struct tipc_node *n, struct tipc_link *l, > + struct tipc_link *tnl, > + struct sk_buff_head *xmitq) > +{ > + /* Avoid to be "self-failover" that can never end */ > + if (!tipc_link_is_up(tnl)) > + return; > + > + tipc_link_fsm_evt(tnl, LINK_SYNCH_END_EVT); > + tipc_node_fsm_evt(n, NODE_SYNCH_END_EVT); > + > + n->sync_point = tipc_link_rcv_nxt(tnl) + (U16_MAX / 2 - 1); > + tipc_link_failover_prepare(l, tnl, xmitq); > + > + if (l) > + tipc_link_fsm_evt(l, LINK_FAILOVER_BEGIN_EVT); > + tipc_node_fsm_evt(n, NODE_FAILOVER_BEGIN_EVT); } > + > +/** > * __tipc_node_link_down - handle loss of link > */ > static void __tipc_node_link_down(struct tipc_node *n, int *bearer_id, @@ > -1675,14 +1713,16 @@ static bool tipc_node_check_state(struct tipc_node > *n, struct sk_buff *skb, > tipc_skb_queue_splice_tail_init(tipc_link_inputq(pl), > tipc_link_inputq(l)); > } > + > /* If parallel link was already down, and this happened > before > - * the tunnel link came up, FAILOVER was never sent. Ensure > that > - * FAILOVER is sent to get peer out of NODE_FAILINGOVER > state. > + * the tunnel link came up, node failover was never started. > + * Ensure that a FAILOVER_MSG is sent to get peer out of > + * NODE_FAILINGOVER state, also this node must accept > + * TUNNLE_MSGs from peer. s/TUNNLE_MSG/TUNEL_MSG > */ > - if (n->state != NODE_FAILINGOVER && !n->failover_sent) { > - tipc_link_create_dummy_tnl_msg(l, xmitq); > - n->failover_sent = true; > - } > + if (n->state != NODE_FAILINGOVER) > + __tipc_node_link_failover(n, pl, l, xmitq); > + > /* If pkts arrive out of order, use lowest calculated syncpt */ > if (less(syncpt, n->sync_point)) > n->sync_point = syncpt; > -- > 2.13.7 _______________________________________________ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion