Hi Jon,

Thanks for sharing your experience with the Native API.  This will be
very useful to those creating native API applications as well as for
tracking issues.

Elmer
 

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jon
Paul Maloy
Sent: Friday, October 05, 2007 12:58 PM
To: Stephens, Allan
Cc: Lars Ekman; tipc
Subject: [tipc-discussion] A couple of issues

Hi Allan,
At the last meeting I mentioned that we here at Ericsson had seen some
issues with TIPC when using the native interface in combination with
dual links between nodes.

First discovery:
---------------
Access to a specific port via the native interface is not re-entrant.
The reason for this is that the header for each message is "cached" in
the port structure, which is unprotected from parallel access. (With the
socket interface this is no problem, because the port is implicitly
protected by sock_lock).
Scenario: CPU A is sending a connectionless message. The header is
constructed in the corresponding port structure before being copied
into the allocated sk_buff in the call to msg_build().
In parallel, CPU B is also sending a message, to a different
destination. Before A's header has been copied to the send buffer, it is
modified by B. The result is that we may have an sk_buff in the send
queue to one node with a destination address pointing to another node.
Of course the destination port, message length and possibly even other
header fields may be wrong as well. We saw this happen several times.
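To make the interleaving concrete, here is a minimal userspace model of
the problem (the names are invented for the example; this is not TIPC
code). Two threads share one port whose header is "cached" in the port
structure; each thread builds the header and then copies it into its
own buffer, and nothing stops the other thread's writes from landing in
between, so running it long enough will typically show corrupted
headers:

/*
 * Illustrative userspace model of the race.  All names are made up;
 * only the pattern (shared cached header, build-then-copy) matches
 * the native-API send path described above.
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

struct msg_hdr {                /* stand-in for the cached header      */
        unsigned int dest_node;
        unsigned int dest_port;
};

struct port {                   /* stand-in for the shared port struct */
        struct msg_hdr hdr;     /* "cached" header, no lock around it  */
};

static struct port shared_port;

static void *sender(void *arg)
{
        unsigned int id = (unsigned int)(long)arg;
        struct msg_hdr copy;
        int i;

        for (i = 0; i < 100000; i++) {
                /* step 1: build the header in the shared port ...     */
                shared_port.hdr.dest_node = id;
                shared_port.hdr.dest_port = id;
                /* step 2: ... then copy it into the "sk_buff".  The
                 * other thread may have rewritten the header between
                 * step 1 and step 2.                                  */
                memcpy(&copy, &shared_port.hdr, sizeof(copy));
                if (copy.dest_node != id || copy.dest_port != id)
                        printf("thread %u: header corrupted (%u/%u)\n",
                               id, copy.dest_node, copy.dest_port);
        }
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, sender, (void *)1L);
        pthread_create(&b, NULL, sender, (void *)2L);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
}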

I see three remedies for this: a) We add an extra lock, to protect each
port. This would carry the penalty of a redundant lock when accessing
via the socket interface.
b) We build the header on the stack, at least for connectionless
messages. For connection-oriented messages the header will not change,
except for message packet length. If we make sure to write packet length
directly into the sk_buff, just as we do with sequence numbers and other
link-layer fields, we can possibly keep the header "cache" for this
type of message. (This needs to be analyzed further).
c) We don't change anything. We just make it clear in the Programmers
Guide, and in comments in the header file, that these functions are
*not* re-entrant, and must be protected by a user-provided lock.

I would prefer solution b), if my assumption about connection-oriented
headers holds.
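For the connectionless case, solution b) would look roughly like the
sketch below, reusing the invented names from the model above (again,
not actual TIPC code). The header becomes a local variable in the send
path, so each caller works on its own copy and the cached header in the
port structure is never touched; the function is a drop-in replacement
for sender() in the previous model:

/* Remedy b) applied to the model: the header is built on the sender's
 * stack instead of in the shared port, so there is nothing for the two
 * threads to corrupt and no extra lock is needed.                     */
static void *sender_stack_hdr(void *arg)
{
        unsigned int id = (unsigned int)(long)arg;
        struct msg_hdr hdr;             /* per-call header on the stack */
        struct msg_hdr copy;
        int i;

        for (i = 0; i < 100000; i++) {
                hdr.dest_node = id;
                hdr.dest_port = id;
                /* the "msg_build" step copies from the stack copy      */
                memcpy(&copy, &hdr, sizeof(copy));
                if (copy.dest_node != id || copy.dest_port != id)
                        printf("thread %u: header corrupted\n", id);
        }
        return NULL;
}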

Second discovery:
---------------
When running parallel links between two nodes, a race condition occurs
between the discovery procedures of the two bearers.
Scenario: A discovery message from node A comes in on bearer 1. In
tipc_disc_recv_msg(), we check if there already is an allocated
structure for node A, and if not, we create one.
In parallel, a discovery message from node A also comes in on bearer 2.
The same check is done after bearer 1's test, but before the node
structure has actually been allocated and added to the net structure.
This is possible because we only read-lock "net_lock" when a packet
arrives in tipc_recv_msg().
Unfortunately, we actually *do* change the net structure here, so this
is clearly wrong, and results in a nasty crash later on.
The work-around we used was to add a local spin_lock around these lines
of code, but this does not feel completely satisfactory. Any
suggestions here?
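To make the workaround concrete, here is a rough userspace model of the
pattern (invented names, not the actual TIPC code): "net_lock" is
modelled as an rwlock that the receive path only takes for reading, and
node creation is serialized by a separate lock with a re-check after
taking it, so two bearers delivering discovery messages in parallel
cannot both allocate a node structure. The part that still feels wrong
is noted in the comments:

#include <pthread.h>
#include <stdlib.h>

#define MAX_NODES 256

static pthread_rwlock_t net_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t node_create_lock = PTHREAD_MUTEX_INITIALIZER;
static void *node_table[MAX_NODES];         /* stand-in, indexed by addr */

static void *find_node(unsigned int addr)
{
        return node_table[addr % MAX_NODES];
}

/* called with net_lock held for reading only, like the receive path */
static void *find_or_create_node(unsigned int addr)
{
        void *n = find_node(addr);

        if (n)
                return n;

        /* slow path: serialize creators and re-check, because another
         * bearer may have created the node since the first lookup     */
        pthread_mutex_lock(&node_create_lock);
        n = find_node(addr);
        if (!n) {
                n = calloc(1, 64);          /* stand-in node structure  */
                /* NOTE: the table is still modified while net_lock is
                 * only held for reading; this is the part that does
                 * not feel completely satisfactory.                    */
                node_table[addr % MAX_NODES] = n;
        }
        pthread_mutex_unlock(&node_create_lock);
        return n;
}

/* models the receive path: net_lock is only taken for reading */
static void *recv_discovery(unsigned int addr)
{
        void *n;

        pthread_rwlock_rdlock(&net_lock);
        n = find_or_create_node(addr);
        pthread_rwlock_unlock(&net_lock);
        return n;
}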


Regards
///jon

_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

