"TIPC session ID mismatch and one-sided link tear-down"
This note details an issue in TIPC concerning the one-sided tear-down of a
link, and how the session ID is not being used to detect that tear-down.
The scenario described was run on two VxWorks targets (running TIPC 1.7.5),
but the same issue applies to Linux targets. From a quick examination of the
code, the problem appears to affect all versions of TIPC, not just TIPC
1.7.5. It is evidently not a high-occurrence situation, since there have
been no known reports of the issue (though it has been observed internally
under certain stress conditions in the past).
It has been observed that a link can be brought down on one node and
re-established without the other node realizing that the link ever went
down. Details below.
Some background on session ID numbers: TIPC establishes a session ID for a
link when the link is first activated. This value is carried in
LINK_PROTOCOL messages and is checked in tipc_recv_proto_msg() when certain
types of LINK_PROTOCOL messages are received. However, there is no check to
verify that the other node has changed its session ID in the event that the
link goes down, nor is the field checked for regular state messages.
It may be prudent to add a check on the session ID during state message
reception to detect stale session IDs. Perhaps the offending messages could
simply be dropped, and the link would then eventually go down on its own. I
would hesitate to recommend resetting the link whenever the session ID is
wrong, in case an old (or malformed) packet happens to be received.
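To make the suggestion concrete, the fragment below sketches what such a
check could look like. This is only an illustration, not the actual TIPC
1.7.5 source: the structure layout, the field names (peer_session, etc.)
and the STATE_MSG constant are hypothetical stand-ins for whatever
tipc_recv_proto_msg() uses internally.

#include <stdio.h>

#define STATE_MSG 0   /* hypothetical id for a LINK_PROTOCOL state message */

struct link {
    unsigned int peer_session;      /* session ID learned at link activation */
    int peer_session_valid;         /* set once the first session ID is seen */
};

struct proto_msg {
    unsigned int type;              /* LINK_PROTOCOL message subtype */
    unsigned int session;           /* session ID carried in the message */
};

/* Return 1 if a state message carries a stale session ID and should be
 * dropped; 0 otherwise.  Dropping (rather than resetting the link) avoids
 * tearing the link down on a single old or malformed packet. */
static int stale_session(const struct link *l, const struct proto_msg *m)
{
    if (m->type != STATE_MSG || !l->peer_session_valid)
        return 0;
    return m->session != l->peer_session;
}

int main(void)
{
    struct link l = { .peer_session = 0x1234, .peer_session_valid = 1 };
    struct proto_msg current = { .type = STATE_MSG, .session = 0x1234 };
    struct proto_msg stale   = { .type = STATE_MSG, .session = 0x9999 };

    printf("current session dropped? %d\n", stale_session(&l, &current)); /* 0 */
    printf("stale session dropped?   %d\n", stale_session(&l, &stale));   /* 1 */
    return 0;
}

With repeated drops of this kind the peer would stop getting valid state
traffic and the link should eventually go down on its own, which is the
behaviour suggested above.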
Perhaps Jon or anyone else can comment on this and offer recommendations on
how to deal with it, or explain why such a check may have been left out of
the code until now. Perhaps the problem was simply never observed and the
check was omitted.
Now, the details of what was observed. The PTTS (Portable TIPC Test Suite)
was used with a blaster-type test that sends a large number of packets
between two nodes. Note that the updated PTTS still needs to be posted to
SourceForge; test #17, shown below, is included in it but disabled by
default.
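For reference, a blaster-type client is essentially a tight send() loop over
a connected SOCK_SEQPACKET socket. The sketch below uses the Linux AF_TIPC
socket API (<linux/tipc.h>) rather than the VxWorks one, and the service
type/instance and packet size are made up for illustration; they are not
taken from PTTS test #17.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/tipc.h>

int main(void)
{
    struct sockaddr_tipc srv;
    char buf[1450];                    /* largest size used in the run below */
    int sd, i;

    /* Address the blastee by TIPC service name (values are illustrative). */
    memset(&srv, 0, sizeof(srv));
    srv.family = AF_TIPC;
    srv.addrtype = TIPC_ADDR_NAME;
    srv.addr.name.name.type = 18888;   /* hypothetical service type */
    srv.addr.name.name.instance = 17;  /* hypothetical instance */
    srv.addr.name.domain = 0;          /* search the whole network */

    sd = socket(AF_TIPC, SOCK_SEQPACKET, 0);
    if (sd < 0 || connect(sd, (struct sockaddr *)&srv, sizeof(srv)) < 0) {
        perror("tipc connect");
        return 1;
    }

    /* Blast a fixed number of packets of one size; the real test repeats
     * this for a range of sizes (1 to 1450 bytes) and reports throughput. */
    memset(buf, 'x', sizeof(buf));
    for (i = 0; i < 100000; i++) {
        if (send(sd, buf, sizeof(buf), 0) != sizeof(buf)) {
            perror("send");
            break;
        }
    }

    close(sd);
    return 0;
}

The blastee side is simply the mirror image: an accept() on a listening
SOCK_SEQPACKET socket followed by a recv() loop.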
Two VxWorks nodes (roles as shown in the outputs below):
<1.1.16> - runs the TIPC Test Suite SERVER
         - link goes down and comes back up; socket connection broken
<1.1.13> - runs the TIPC Test Suite CLIENT
         - link stays up; socket connection not broken, but blocked
Results:
- <1.1.16> detects that the link was reset and tears down the socket
connection on port 1086930948. However, <1.1.13> does not detect the
tear-down and still thinks it has a socket connection from its port,
1087021060, to port 1086930948 on <1.1.16>.
- The name table information on <1.1.16> does not show <1.1.13>, since
<1.1.16> is waiting for <1.1.13> to announce itself on the newly
established link; but <1.1.13> does not know that the link is new and still
shows the old information in its name table output.
OUTPUT from <1.1.16>
-> sp tipcTS
Task spawned: id = 0x16e76e0, name = t1
value = 24016608 = 0x16e76e0
-> TEST FAILED unexpected number of recv() errors errno = 57: errno = 0x39
->
-> tipcConfig "lo"
Log dump:
TIPC info: Activated (version 1.7.5)
TIPC info: Started in network mode
TIPC info: Own node address <1.1.16>, network identity 228
TIPC info: Enabled bearer <eth:motetsec0>, discovery domain <1.1.0>, priority 10
TIPC info: Established link <1.1.16:motetsec0-1.1.13:mottsec0> on network plane A
TIPC info: Resetting link <1.1.16:motetsec0-1.1.13:mottsec0>, requested by peer
TIPC info: Lost link <1.1.16:motetsec0-1.1.13:mottsec0> on network plane A
TIPC info: Lost contact with <1.1.13>
TIPC info: Established link <1.1.16:motetsec0-1.1.13:mottsec0> on network plane A
value = 0 = 0x0
-> tipcConfig "nt"
Type   Lower      Upper      Port Identity         Publication   Scope
0      16781328   16781328   <1.1.16:1086726146>   1086726147    cluster
1      1          1          <1.1.16:1086726145>   1086726146    node
value = 0 = 0x0
-> tipcConfig "p"
Ports:
1086726145: bound to {1,1}
1086726146: bound to {0,16781328}
1086930948:
value = 0 = 0x0
OUTPUT from <1.1.13>
-> sp tipcTC 17
Task spawned: id = 0x296beb0, name = t1
value = 43433648 = 0x296beb0
-> Test # 17
TIPC blaster/blastee (SOCK_SEQPACKET) test...STARTED!
Sent 100000 packets of size 1 in 1417 ms (500000 bit/s)
Sent 100000 packets of size 32 in 1533 ms (16600000 bit/s)
Sent 100000 packets of size 64 in 1433 ms (35700000 bit/s)
Sent 100000 packets of size 100 in 1517 ms (52700000 bit/s)
Sent 100000 packets of size 128 in 1683 ms (60800000 bit/s)
Sent 100000 packets of size 256 in 116017 ms (1700000 bit/s)
Sent 100000 packets of size 512 in 349883 ms (1100000 bit/s)
Sent 100000 packets of size 1000 in 601134 ms (1300000 bit/s)
Sent 100000 packets of size 1024 in 623133 ms (1300000 bit/s)
Sent 100000 packets of size 1280 in 1014000 ms (1000000 bit/s)
Sent 100000 packets of size 1450 in 1132600 ms (1000000 bit/s)
-> tipcConfig "lo"
Log dump:
TIPC info: Activated (version 1.7.5)
TIPC info: Started in network mode
TIPC info: Own node address <1.1.13>, network identity 228
TIPC info: Enabled bearer <eth:mottsec0>, discovery domain <1.1.0>, priority 10
TIPC info: Established link <1.1.13:mottsec0-1.1.16:motetsec0> on network plane A
value = 0 = 0x0
-> tipcConfig "nt"
Type   Lower      Upper      Port Identity         Publication   Scope
0      16781325   16781325   <1.1.13:1086726146>   1086726147    cluster
       16781328   16781328   <1.1.16:1086726146>   1086726147    cluster
1      1          1          <1.1.13:1086726145>   1086726146    node
value = 0 = 0x0
-> tipcConfig "p"
Ports:
1086726145: bound to {1,1}
1086726146: bound to {0,16781325}
1087021060: connected to <1.1.16:1086930948> via {72,1000}
value = 0 = 0x0
[END]