On 04/28/2016 10:20 PM, Hamish Martin wrote:
> Hi Jon,
>
> Yes it was very difficult to track! Unfortunately I don't know why they
> get onto the wrong link in the first place.
>
> I agree the root problem would be good to find, but I am limited in both
> my understanding of TIPC and my ability to track it further.
> To reproduce this I had a bunch of nodes (7 all up). I effectively turn
> off two of them at a time and then turn them back on. When they are back
> in the cluster I restart another different two. Three of the nodes
> remain up at all times. It is on one of those three nodes that I
> eventually see the problem. Perhaps someone more experienced could try
> something like that and see if they can trigger the issue.
>
> In short, this is ambulance at the bottom of the cliff stuff rather than
> your desired fence at the top. I am happy to help test any theories or
> assist in further describing the test that shows it for us.

I understand. This test is legitimate to do even when we later find the 
root problem.
A further question: which nodes were the broadcast transmitters? All? 
Any of the new ones? Any of the "steady" ones?

///jon

>
> Thanks,
> Hamish Martin.
>
>
> On 04/29/2016 02:09 PM, Jon Maloy wrote:
>> (Removed netdev from list, added some others).
>>
>> This is interesting, and it must have been hard to track.  But I would
>> really like to know the real reason why this happens, so we can catch the 
>> root problem. Broadcast ACK messages are just ordinary STATE messages, and 
>> should have a correct destination address. Did you find out where these 
>> messages really came from, and why they have wrong destination addresses?
>>
>> ///jon
>>
>>> -----Original Message-----
>>> From: Hamish Martin [mailto:[email protected]]
>>> Sent: Thursday, 28 April, 2016 21:35
>>> To: Jon Maloy; [email protected]
>>> Cc: Hamish Martin
>>> Subject: [PATCH] tipc: Only process unicast on intended node
>>>
>>> We have observed complete lock up of broadcast-link transmission due to
>>> unacknowledged packets never being removed from the 'transmq' queue. This
>>> is traced to nodes having their ack field set beyond the sequence number
>>> of packets that have actually been transmitted to them.
>>> Consider an example where node 1 has sent 10 packets to node 2 on a
>>> link and node 3 has sent 20 packets to node 2 on another link. We
>>> see examples of an ack from node 2 destined for node 3 being treated as
>>> an ack from node 2 at node 1. This leads to the ack on the node 1 to node
>>> 2 link being increased to 20 even though we have only sent 10 packets.
>>> When node 1 does get around to sending further packets, none of the
>>> packets with sequence numbers less than 21 are actually removed from the
>>> transmq.
>>> To resolve this we reinstate some code lost in commit d999297c3dbb ("tipc:
>>> reduce locking scope during packet reception") which ensures that only
>>> messages destined for the receiving node are processed by that node. This
>>> prevents the sequence numbers from getting out of sync and resolves the
>>> packet leakage, thereby resolving the broadcast-link transmission
>>> lock-ups we observed.
>>>
>>> Signed-off-by: Hamish Martin <[email protected]>
>>> Reviewed-by: Chris Packham <[email protected]>
>>> Reviewed-by: John Thompson <[email protected]>
>>> ---
>>>    net/tipc/node.c | 5 +++++
>>>    1 file changed, 5 insertions(+)
>>>
>>> diff --git a/net/tipc/node.c b/net/tipc/node.c
>>> index ace178fd3850..e5dda495d4b6 100644
>>> --- a/net/tipc/node.c
>>> +++ b/net/tipc/node.c
>>> @@ -1460,6 +1460,11 @@ void tipc_rcv(struct net *net, struct sk_buff *skb, struct tipc_bearer *b)
>>>                     return tipc_node_bc_rcv(net, skb, bearer_id);
>>>     }
>>>
>>> +   /* Discard unicast link messages destined for another node */
>>> +   if (unlikely(!msg_short(hdr) &&
>>> +                (msg_destnode(hdr) != tipc_own_addr(net))))
>>> +           goto discard;
>>> +
>>>     /* Locate neighboring node that sent packet */
>>>     n = tipc_node_find(net, msg_prevnode(hdr));
>>>     if (unlikely(!n))
>>> --
>>> 2.8.1
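
For readers following the thread, the failure described in the commit
message can be sketched with a toy model. This is only an illustration
under stated assumptions: struct toy_link, struct toy_ack and the toy_*
functions are hypothetical names that do not exist in the kernel, node
addresses are plain integers as in the commit message's example, and
sequence-number wraparound is ignored. The only idea taken from the patch
is the destination check at the top of the receive path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not the kernel's struct tipc_link. */
struct toy_link {
	uint32_t own_addr; /* address of the node owning this link end */
	uint16_t next_seq; /* sequence number of the next packet to send */
	uint16_t acked;    /* highest sequence number believed acknowledged */
	int queued;        /* packets still held in the transmit queue */
};

struct toy_ack {
	uint32_t dest;     /* node the ack is addressed to */
	uint16_t ack;      /* highest sequence number the peer reports */
};

static void toy_send(struct toy_link *l, int count)
{
	assert(count >= 0);
	l->next_seq += count;
	l->queued += count;
}

/* Release packets in (acked, ack]. An ack at or below the stored value
 * is treated as stale and releases nothing. */
static void toy_rcv_ack(struct toy_link *l, struct toy_ack a, bool check_dest)
{
	uint16_t last_sent = l->next_seq - 1;
	uint16_t upto;

	/* Mirrors the idea of the patch: drop acks meant for another node */
	if (check_dest && a.dest != l->own_addr)
		return;
	if (a.ack <= l->acked)
		return;

	upto = a.ack < last_sent ? a.ack : last_sent;
	if (upto > l->acked)
		l->queued -= upto - l->acked;
	/* Without the check above, a stray ack pushes this past last_sent */
	l->acked = a.ack;
}

/* Node 1 sends 10 packets to node 2, sees a stray ack of 20 that node 2
 * addressed to node 3, then sends 5 more packets which node 2 acks.
 * Returns how many packets remain in node 1's transmit queue. */
int stuck_packets(bool check_dest)
{
	struct toy_link l = { .own_addr = 1, .next_seq = 1 };

	toy_send(&l, 10);
	toy_rcv_ack(&l, (struct toy_ack){ .dest = 1, .ack = 10 }, check_dest);
	toy_rcv_ack(&l, (struct toy_ack){ .dest = 3, .ack = 20 }, check_dest);
	toy_send(&l, 5);
	toy_rcv_ack(&l, (struct toy_ack){ .dest = 1, .ack = 15 }, check_dest);
	return l.queued;
}
```

In this model, stuck_packets(false) leaves the five later packets in the
queue because the genuine ack of 15 looks stale against the stored value
of 20, while stuck_packets(true) drains the queue. It says nothing about
why the misaddressed acks arrive in the first place, which is the open
question in this thread.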


_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion