Hi,

We are using TIPC as shipped in the RHEL 7.3 kernel (version 3.10.0-514).

Scenario:
A 4-node cluster in which 3 nodes are sending broadcast traffic while 1 node
is leaving and rejoining the cluster (its bearers are being disabled and
enabled).

In the function node_established_contact(), the following is done:
        n_ptr->bclink.acked = tipc_bclink_get_last_sent();
        tipc_bclink_add_node(n_ptr->addr);

By the time the peer node is added to bclink, the last-sent sequence number
on the broadcast link (bcl->fsm_msg_cnt) may already have moved ahead.
However, in the synchronization message to the new node we still send the
old broadcast sequence number. The new node therefore asks for broadcast
packets to be retransmitted that this node may no longer have (they have
already been acknowledged by the other peers and hence freed).

The issue goes away if we set n_ptr->bclink.acked inside
tipc_bclink_add_node() (which acquires the bc_lock), so that the sequence
number is read and the node is added under the same lock.
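To make the window concrete, here is a minimal single-threaded model in C.
It is only a sketch, not the kernel code: bclink_model, node_model and all
function names are hypothetical, the "locked" flag merely stands in for
bc_lock, and the 16-bit sequence number is an assumption of the model. The
racy variant takes the snapshot before the lock is held, so sends can slip
in between; the locked variant takes the snapshot and adds the node in one
critical section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-threaded model; these are NOT the kernel structures. */
struct bclink_model {
    int locked;          /* stands in for bc_lock being held */
    uint16_t last_sent;  /* sequence number of the last broadcast sent */
    int peers;           /* number of nodes in the broadcast set */
};

struct node_model {
    uint16_t acked;      /* stands in for n_ptr->bclink.acked */
};

static void model_lock(struct bclink_model *bcl)
{
    assert(!bcl->locked);   /* nobody else may hold the lock */
    bcl->locked = 1;
}

static void model_unlock(struct bclink_model *bcl)
{
    assert(bcl->locked);
    bcl->locked = 0;
}

/* A broadcast sent by this node; senders need the lock as well. */
static void model_send(struct bclink_model *bcl)
{
    model_lock(bcl);
    bcl->last_sent++;
    model_unlock(bcl);
}

/* Racy order: snapshot first, take the lock later. Returns how far the
 * snapshot lags behind last_sent when the node joins the broadcast set. */
static uint16_t add_node_racy(struct bclink_model *bcl, struct node_model *n,
                              unsigned int sends_in_window)
{
    uint16_t lag;
    unsigned int i;

    n->acked = bcl->last_sent;          /* snapshot taken outside the lock */
    for (i = 0; i < sends_in_window; i++)
        model_send(bcl);                /* traffic continues in the window */

    model_lock(bcl);
    bcl->peers++;
    lag = (uint16_t)(bcl->last_sent - n->acked);
    model_unlock(bcl);
    return lag;
}

/* Proposed order: snapshot and insertion in the same critical section. */
static uint16_t add_node_locked(struct bclink_model *bcl, struct node_model *n)
{
    uint16_t lag;

    model_lock(bcl);
    n->acked = bcl->last_sent;          /* model_send() cannot run here */
    bcl->peers++;
    lag = (uint16_t)(bcl->last_sent - n->acked);
    model_unlock(bcl);
    return lag;
}
```

In this model a non-zero return value from add_node_racy() corresponds to
packets the new peer would consider missing and request again, even though
they were sent before it joined; add_node_locked() always returns 0.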

Impact of the issue: without this change, whenever the race condition
occurs, it leads to constant retransmission on the broadcast link for a
packet that is no longer available, which in turn causes a link reset once
the stale_counter crosses 100.

We also referred to https://sourceforge.net/p/tipc/bugs/111/, but the TIPC
code there is far ahead of our version and it is not possible to backport.

We would like to know whether there is any obvious problem with the above
change. We would appreciate any feedback.

Regards,
Amit
