Canh, Tung,
This sounds like it might be the link synch bug we just identified and fixed. 
Maybe you could send that patch to Andrew and let him try it?
PS. I am on vacation, and will only be reading email sporadically for the next 
three weeks.
///jon


From: Booth, Andrew [mailto:abo...@sonusnet.com]
Sent: Thursday, July 13, 2017 20:05
To: Jon Maloy <jon.ma...@ericsson.com>; Parthasarathy Bhuvaragan 
<parthasarathy.bhuvara...@ericsson.com>; Ying Xue <ying....@windriver.com>
Cc: Butler, Peter <pbut...@sonusnet.com>; Booth, Andrew <abo...@sonusnet.com>
Subject: TIPC issue: connection stalls when switch for bearer 0 recovers

Hi,

I am using a configuration with applications on 7 cards communicating using 
TIPC. Each card has two Ethernet devices connecting to two disjoint subnets 
served by switch0 and switch1, respectively. TIPC is set to use two bearers on 
each card.

When I reboot switch0 I occasionally see TIPC connections fail. More precisely, 
the applications send "keepalive" messages every 5 seconds, and when switch0 
recovers, the keepalive messages are not answered within 5 seconds, so the 
applications close the connection. I have Wireshark captures of a connection 
during the period where it fails; these show some of the keepalive request and 
ack packets exchanged on the network, but each application's logs indicate that 
they are not received from the socket. The connection in this case is largely 
idle other than the keepalive exchanges.

I'm looking for ways to narrow down the issue.

The applications are select-based, and I'm adding some more logging to ensure 
that the read bit is set correctly; I would be very surprised if it isn't.
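
A rough sketch of that kind of check, illustrated in Python purely for 
brevity (this may not be the applications' language). The function name 
wait_readable and the logger name are hypothetical; the only real API used is 
the standard select() call, here with the 5-second keepalive interval as the 
timeout.

```python
import logging
import select
import socket

log = logging.getLogger("keepalive")  # hypothetical logger name


def wait_readable(sock, timeout_s=5.0):
    """Return True if select() reports sock readable within timeout_s seconds.

    Logs the outcome either way, so the application log shows whether the
    read bit was ever set during the keepalive interval.
    """
    readable, _, exceptional = select.select([sock], [], [sock], timeout_s)
    if sock in exceptional:
        log.warning("fd %d: exceptional condition reported", sock.fileno())
    if sock in readable:
        log.info("fd %d: read bit set", sock.fileno())
        return True
    log.warning("fd %d: read bit NOT set after %.1fs", sock.fileno(), timeout_s)
    return False
```

If this logs "read bit NOT set" while the capture shows the keepalive packet 
arriving on the wire, that would point below the application rather than at 
its select handling.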

I'm considering adding an ioctl to TIPC to get some information from the socket 
(number of bytes accepted from the application, number of bytes sent to the 
application, etc.) that could be called when our 5s timer expires. The idea is to 
see whether the TIPC socket receives the packets that I see in the Wireshark 
capture, or whether they are held in (or dropped by) the kernel during earlier 
processing.
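
Until such an ioctl exists, the application-side half of those counters can 
be kept in user space and dumped when the timer expires. The sketch below is 
only an illustration in Python, not the proposed kernel change: ConnStats, 
CountingSocket and describe are hypothetical names, and the wrapper counts 
only what the application itself hands to or gets from the socket.

```python
import socket
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnStats:
    bytes_sent: int = 0          # bytes the socket accepted from the application
    bytes_received: int = 0      # bytes the socket delivered to the application
    last_rx: Optional[float] = None  # time.monotonic() of the last non-empty recv


class CountingSocket:
    """Thin wrapper that counts bytes crossing the application/socket boundary."""

    def __init__(self, sock):
        self.sock = sock
        self.stats = ConnStats()

    def send(self, data):
        n = self.sock.send(data)
        self.stats.bytes_sent += n
        return n

    def recv(self, bufsize):
        data = self.sock.recv(bufsize)
        if data:
            self.stats.bytes_received += len(data)
            self.stats.last_rx = time.monotonic()
        return data

    def describe(self):
        """One-line summary intended for the log when the keepalive timer expires."""
        if self.stats.last_rx is None:
            idle = "never received"
        else:
            idle = "%.1fs since last rx" % (time.monotonic() - self.stats.last_rx)
        return "sent=%d received=%d (%s)" % (
            self.stats.bytes_sent, self.stats.bytes_received, idle)
```

Comparing these numbers with the byte counts in the capture shows whether 
data seen on the wire ever reached the application, though not where in the 
kernel it was held or dropped.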

I have Wireshark captures if you're interested. The capture containing all TIPC 
traffic is about 10 MB per slot; the capture showing only the connection traffic 
is about 30 KB per slot.

Most of the cards are running TIPC from the 4.4.0 Linux kernel with some 
patches for specific bugs (I'm not sure how to identify which ones). One of the 
cards is running an older version of TIPC (pre-2.0). I'm not monitoring this 
card for connection errors, but it is exchanging TIPC packets and there is a 
chance it could be causing interference.

Any thoughts on how to proceed?

Thanks for any info,
Andrew
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion