Thanks a lot, Jon!
Please see my answers inline... I will send the patch to net-next then.

BR/Tuong

-----Original Message-----
From: Jon Maloy <jon.ma...@ericsson.com> 
Sent: Wednesday, April 24, 2019 11:41 PM
To: Tuong Tong Lien <tuong.t.l...@dektech.com.au>; ma...@donjonn.com; 
ying....@windriver.com; tipc-discussion@lists.sourceforge.net
Subject: RE: [PATCH RFC] tipc: fix link "half-failover" issue

Acked-by: Jon Maloy <jon.ma...@ericsson.com>

Also see my comments below.

///jon

> -----Original Message-----
> From: Tuong Lien <tuong.t.l...@dektech.com.au>
> Sent: 24-Apr-19 00:41
> To: Jon Maloy <jon.ma...@ericsson.com>; ma...@donjonn.com;
> ying....@windriver.com; tipc-discussion@lists.sourceforge.net
> Subject: [PATCH RFC] tipc: fix link "half-failover" issue
> 
> TIPC link can temporarily fall into "half-establish" that only one of the link
> endpoints is ESTABLISHED and starts to send traffic, PROTOCOL messages,
> whereas the other link endpoint is not up (e.g. immediately when the
> endpoint receives ACTIVATE_MSG, the network interface goes down...).
> 
> This is a normal situation and will be settled because the link endpoint will 
> be
> eventually brought down after the link tolerance time.
> 
> However, the situation will become worse when the second link is
> established before the first link endpoint goes down, For example:
> 
>    1. Both links <1A-1B>, <2A-2B> down

Confusing terminology here. How can there be a link <1A-1B>?
I think you want to say "Both link endpoints" 1A,1B down etc.

[Tuong]: I didn't realize this confusion until you said it here 😊!
It should be "Both links <1A-2A>, <1B-2B> down" where the numeric part
presents the node number and the character part is the interface name.

>    2. Link endpoint 1B up, but 1A still down (e.g. due to network
>       disturbance, wrong session, etc.)

>    3. Link <2A-2B> up
Same here.
[Tuong]: Same above, it should be "Link <1B-2B> up"

>    4. Link endpoint 1B down (e.g. due to link tolerance timeout)
>    5. Node B starts failover onto link <2A-2B>
> 
>    ==> Node A does never start link failover.
> 
> When the "half-failover" situation happens, two consequences have been
> observed:
> 
> a) Peer link/node gets stuck in FAILINGOVER state;
> b) Traffic or user messages that peer node is trying to failover onto the
> second link can be partially or completely dropped by this node.
> 
> The consequence a) was actually solved by commit c140eb166d68 ("tipc:
> fix failover problem"), but that commit didn't cover the b). It's due to the 
> fact
> that the tunnel link endpoint has never been prepared for a failover, so the
> 'l->drop_point' (and the other data...) is not set correctly. When a
> TUNNLE_MSG 

s/TUNNLE_MSG/TUNNEL_MSG

[Tuong]: yep, a typo!

from peer node arrives on the link, depending on the inner
> message's seqno and the current 'l->drop_point'
> value, the message can be dropped (- treated as a duplicate message) or
> processed.
> At this early stage, the traffic messages from peer are likely to be
> NAME_DISTRIBUTORs, this means some name table entries will be missed on
> the node forever!
> 
> The commit resolves the issue by starting the FAILOVER process on this node
> as well. Another benefit from this solution is that we ensure the link will 
> not
> be re-established until the failover ends.
> 
> Signed-off-by: Tuong Lien <tuong.t.l...@dektech.com.au>
> ---
>  net/tipc/link.c | 35 +++++++++++++++++++++++++++++++++++
>  net/tipc/link.h |  2 ++
>  net/tipc/node.c | 54
> +++++++++++++++++++++++++++++++++++++++++++++++-------
>  3 files changed, 84 insertions(+), 7 deletions(-)
> 
> diff --git a/net/tipc/link.c b/net/tipc/link.c index 
> 6053489c8063..fa639054329d
> 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -1705,6 +1705,41 @@ void tipc_link_tnl_prepare(struct tipc_link *l,
> struct tipc_link *tnl,
>       }
>  }
> 
> +/**
> + * tipc_link_failover_prepare() - prepare tnl for link failover
> + *
> + * This is a special version of the precursor -
> +tipc_link_tnl_prepare(),
> + * see the __tipc_node_link_failover() for details
> + *
> + * @l: failover link
> + * @tnl: tunnel link
> + * @xmitq: queue for messages to be xmited  */ void
> +tipc_link_failover_prepare(struct tipc_link *l, struct tipc_link *tnl,
> +                             struct sk_buff_head *xmitq)
> +{
> +     struct sk_buff_head *fdefq = &tnl->failover_deferdq;
> +
> +     tipc_link_create_dummy_tnl_msg(tnl, xmitq);
> +
> +     /* This failover link enpoint should never be established

s/should never be established before, so did not../was never established, so it 
has not ...

[Tuong]: ok, will update it.

> +      * before, so did not receive anything from peer.
> +      * Otherwise, it must be a normal failover situation or the
> +      * node has entered SELF_DOWN_PEER_LEAVING and both peer
> nodes
> +      * would have to start over from scratch instead.
> +      */
> +     WARN_ON(l && tipc_link_is_up(l));
> +     tnl->drop_point = 1;
> +     tnl->failover_reasm_skb = NULL;
> +
> +     /* Initiate the link's failover deferdq */
> +     if (unlikely(!skb_queue_empty(fdefq))) {
> +             pr_warn("Link failover deferdq not empty: %d!\n",
> +                     skb_queue_len(fdefq));
> +             __skb_queue_purge(fdefq);
> +     }
> +}
> +
>  /* tipc_link_validate_msg(): validate message against current link state
>   * Returns true if message should be accepted, otherwise false
>   */
> diff --git a/net/tipc/link.h b/net/tipc/link.h index
> 8439e0ee53a8..adcad65e761c 100644
> --- a/net/tipc/link.h
> +++ b/net/tipc/link.h
> @@ -90,6 +90,8 @@ void tipc_link_tnl_prepare(struct tipc_link *l, struct
> tipc_link *tnl,
>                          int mtyp, struct sk_buff_head *xmitq);  void
> tipc_link_create_dummy_tnl_msg(struct tipc_link *tnl,
>                                   struct sk_buff_head *xmitq);
> +void tipc_link_failover_prepare(struct tipc_link *l, struct tipc_link *tnl,
> +                             struct sk_buff_head *xmitq);
>  void tipc_link_build_reset_msg(struct tipc_link *l, struct sk_buff_head
> *xmitq);  int tipc_link_fsm_evt(struct tipc_link *l, int evt);  bool
> tipc_link_is_up(struct tipc_link *l); diff --git a/net/tipc/node.c
> b/net/tipc/node.c index 7478e2d4ec02..be07cf327d2d 100644
> --- a/net/tipc/node.c
> +++ b/net/tipc/node.c
> @@ -714,7 +714,6 @@ static void __tipc_node_link_up(struct tipc_node *n,
> int bearer_id,
>               *slot0 = bearer_id;
>               *slot1 = bearer_id;
>               tipc_node_fsm_evt(n, SELF_ESTABL_CONTACT_EVT);
> -             n->failover_sent = false;
>               n->action_flags |= TIPC_NOTIFY_NODE_UP;
>               tipc_link_set_active(nl, true);
>               tipc_bcast_add_peer(n->net, nl, xmitq); @@ -757,6 +756,45
> @@ static void tipc_node_link_up(struct tipc_node *n, int bearer_id,  }
> 
>  /**
> + * __tipc_node_link_failover() - start failover in case "half-failover"

Don't see any reason for the "__" prefix. This is normally used when we need 
two functions with the same name, one locked and one unlocked.

[Tuong]: Actually, I just want to emphasize that this function should be called
with care that one needs to obtain the node lock or perform some checks (e.g.
node state...) before, rather than there is another function with the same
name... If you think it is unnecessary, then I will remove the prefix...

> + *
> + * This function is only called in a very special situation where link
> + * failover can be already started on peer node but not on this node.
> + * This can happen when e.g.
> + *   1. Both links <1A-1B>, <2A-2B> down
> + *   2. Link endpoint 1B up, but 1A still down (e.g. due to network
> + *      disturbance, wrong session, etc.)
> + *   3. Link <2A-2B> up
> + *   4. Link endpoint 1B down (e.g. due to link tolerance timeout)
> + *   5. Node B starts failover onto link <2A-2B>
> + *
> + *   ==> Node A does never start link/node failover!
> + *
> + * @n: tipc node structure
> + * @l: link peer endpoint failingover (- can be NULL)
> + * @tnl: tunnel link
> + * @xmitq: queue for messages to be xmited on tnl link later  */ static
> +void __tipc_node_link_failover(struct tipc_node *n, struct tipc_link *l,
> +                                   struct tipc_link *tnl,
> +                                   struct sk_buff_head *xmitq)
> +{
> +     /* Avoid to be "self-failover" that can never end */
> +     if (!tipc_link_is_up(tnl))
> +             return;
> +
> +     tipc_link_fsm_evt(tnl, LINK_SYNCH_END_EVT);
> +     tipc_node_fsm_evt(n, NODE_SYNCH_END_EVT);
> +
> +     n->sync_point = tipc_link_rcv_nxt(tnl) + (U16_MAX / 2 - 1);
> +     tipc_link_failover_prepare(l, tnl, xmitq);
> +
> +     if (l)
> +             tipc_link_fsm_evt(l, LINK_FAILOVER_BEGIN_EVT);
> +     tipc_node_fsm_evt(n, NODE_FAILOVER_BEGIN_EVT); }
> +
> +/**
>   * __tipc_node_link_down - handle loss of link
>   */
>  static void __tipc_node_link_down(struct tipc_node *n, int *bearer_id, @@
> -1675,14 +1713,16 @@ static bool tipc_node_check_state(struct tipc_node
> *n, struct sk_buff *skb,
>                       tipc_skb_queue_splice_tail_init(tipc_link_inputq(pl),
>                                                       tipc_link_inputq(l));
>               }
> +
>               /* If parallel link was already down, and this happened
> before
> -              * the tunnel link came up, FAILOVER was never sent. Ensure
> that
> -              * FAILOVER is sent to get peer out of NODE_FAILINGOVER
> state.
> +              * the tunnel link came up, node failover was never started.
> +              * Ensure that a FAILOVER_MSG is sent to get peer out of
> +              * NODE_FAILINGOVER state, also this node must accept
> +              * TUNNLE_MSGs from peer.

s/TUNNLE_MSG/TUNEL_MSG

>                */
> -             if (n->state != NODE_FAILINGOVER && !n->failover_sent) {
> -                     tipc_link_create_dummy_tnl_msg(l, xmitq);
> -                     n->failover_sent = true;
> -             }
> +             if (n->state != NODE_FAILINGOVER)
> +                     __tipc_node_link_failover(n, pl, l, xmitq);
> +
>               /* If pkts arrive out of order, use lowest calculated syncpt */
>               if (less(syncpt, n->sync_point))
>                       n->sync_point = syncpt;
> --
> 2.13.7




_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Reply via email to