From: Tuong Lien <tuong.t.l...@dektech.com.au>
Date: Thu, 2 May 2019 17:23:23 +0700

> TIPC link can temporarily fall into a "half-established" state, in which
> only one of the link endpoints is ESTABLISHED and starts to send traffic
> and PROTOCOL messages, whereas the other link endpoint is not up (e.g.
> when the network interface goes down immediately after the endpoint
> receives ACTIVATE_MSG...).
>
> This is a normal situation and will be settled, because the link
> endpoint will eventually be brought down after the link tolerance time.
>
> However, the situation becomes worse when the second link is
> established before the first link endpoint goes down. For example:
>
> 1. Both links <1A-2A>, <1B-2B> down
> 2. Link endpoint 2A up, but 1A still down (e.g. due to network
>    disturbance, wrong session, etc.)
> 3. Link <1B-2B> up
> 4. Link endpoint 2A down (e.g. due to link tolerance timeout)
> 5. Node B starts failover onto link <1B-2B>
>
> ==> Node A never starts link failover.
>
> When this "half-failover" situation happens, two consequences have been
> observed:
>
> a) The peer link/node gets stuck in the FAILINGOVER state;
> b) Traffic or user messages that the peer node is trying to fail over
>    onto the second link can be partially or completely dropped by this
>    node.
>
> Consequence a) was actually solved by commit c140eb166d68 ("tipc:
> fix failover problem"), but that commit didn't cover b). This is
> because the tunnel link endpoint has never been prepared for a
> failover, so 'l->drop_point' (and the other data...) is not set
> correctly. When a TUNNEL_MSG from the peer node arrives on the link,
> the message can either be dropped (i.e. treated as a duplicate) or
> processed, depending on the inner message's seqno and the current
> 'l->drop_point' value.
> At this early stage, the traffic messages from the peer are likely to
> be NAME_DISTRIBUTORs, which means some name table entries will be
> missed on this node forever!
>
> The commit resolves the issue by starting the FAILOVER process on this
> node as well.
> Another benefit from this solution is that we ensure the link will not
> be re-established until the failover ends.
>
> Acked-by: Jon Maloy <jon.ma...@ericsson.com>
> Signed-off-by: Tuong Lien <tuong.t.l...@dektech.com.au>

Applied, thank you.
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion
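
The drop decision described in the quoted commit message (an inner TUNNEL_MSG payload is either treated as a duplicate or processed, depending on its seqno and 'l->drop_point') can be illustrated with a small, self-contained C sketch. This is not the kernel code: struct tnl_link, tnl_prepare_failover() and tnl_rcv_inner() are hypothetical names, the comparison only mimics the idea of TIPC's wraparound-aware 16-bit sequence numbers, and setting drop_point to the failed link's next expected seqno is an assumed simplification of what failover preparation does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-aware "left < right" for 16-bit sequence numbers.
 * Simplified illustration, not the kernel's helper. */
static bool seq_less(uint16_t left, uint16_t right)
{
    uint16_t diff = (uint16_t)(right - left);
    return diff != 0 && diff < 32768u;
}

/* Hypothetical tunnel-link state: only the fields needed here. */
struct tnl_link {
    bool failover_prepared;  /* has this node started its own failover? */
    uint16_t drop_point;     /* inner seqnos before this are duplicates */
};

/* Hypothetical failover preparation (assumption): remember where the
 * failed link stopped delivering, so only genuinely old messages are
 * discarded when they arrive again inside the tunnel. */
static void tnl_prepare_failover(struct tnl_link *tnl, uint16_t failed_rcv_nxt)
{
    tnl->failover_prepared = true;
    tnl->drop_point = failed_rcv_nxt;
}

/* Returns true if the inner message is delivered, false if it is
 * dropped as a duplicate because it precedes drop_point. */
static bool tnl_rcv_inner(const struct tnl_link *tnl, uint16_t inner_seqno)
{
    assert(tnl);
    return !seq_less(inner_seqno, tnl->drop_point);
}
```

For instance, if an unprepared endpoint still carries a stale drop_point of 100, inner messages 1 to 99 (possibly NAME_DISTRIBUTOR updates) are discarded as duplicates; after tnl_prepare_failover(&tnl, 1) the same messages are delivered.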