Re: [tipc-discussion] TIPC one-sided link tear-down issue (session ID discussion)

Jon Paul Maloy Wed, 05 Dec 2007 14:10:14 -0800

Hi Elmer,
See my comments/questions below.

///jon



Horvath, Elmer wrote:
> "TIPC session ID mismatch and one-sided link tear-down"
>   

> It may be prudent to put a check on the session ID in the state message
> reception to detect stale session IDs.  Perhaps just drop offending
> messages and the link will eventually go down.  I would hesitate to
> recommend dropping a link if the session ID is wrong in case an old (or
> malformed) packet happened to be received.
>   
Yes, we should drop such messages, just to fulfill the protocol. It is
not a complete solution to the problem, though, because ongoing traffic
may still keep the link up for a while, either until the traffic becomes
so low that the link supervision mechanism kicks in and resets the link,
or until the other endpoint resets itself because of too many failed
retransmission attempts. (The sequence numbers will not match.)
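
A minimal sketch of such a check on STATE message reception, in Python
for brevity. All names here (LinkEndpoint, StateMsg, peer_session,
accept_state_msg) are hypothetical and do not come from the TIPC
sources; the sketch only illustrates "drop on mismatch, do not reset".

```python
from dataclasses import dataclass


@dataclass
class LinkEndpoint:
    # Session ID learned from the peer's last RESET/ACTIVATE (assumed field).
    peer_session: int
    dropped: int = 0


@dataclass
class StateMsg:
    session: int


def accept_state_msg(link: LinkEndpoint, msg: StateMsg) -> bool:
    """Return True if the STATE message should be processed.

    A message carrying a stale session ID is counted and dropped; the link
    itself is left alone, so a single old or malformed packet cannot tear
    it down.
    """
    if msg.session != link.peer_session:
        link.dropped += 1
        return False
    return True
```

With this shape, a stale message only bumps the drop counter; the
existing supervision and retransmission limits remain what eventually
takes the link down.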

But what really puzzles me is how this can occur in the first place.
If you have a look at the state diagram for link activation below, and
the corresponding text (pasted in from the tipc-protocol draft), you will
see that a link endpoint can never come back to WORKING_WORKING after
a reset without having confirmation that the other endpoint has also
been reset, i.e. having received either a "RESET" or an "ACTIVATE"
message from the other end.

This is the state machine we have been using for many years, and I have
had reason to come back to it on many occasions, only to confirm each
time that it really is watertight. I have not previously heard of any
problem like the one you describe.

I can only see two possible explanations for this: either the code
doesn't match the state machine (a bug somewhere), or a stray RESET or
ACTIVATE is received from the other end (or somewhere else?) even though
the link is up and running. I have seen the latter happen, when I was
once using a switch that sometimes delivered packets completely out of
order, with several seconds of delay. (This was the initial reason for
introducing the session ID.) But as far as I can see, this can only
happen if the link has recently been started or reset.

Is the latter a possibility? How long has the link been up when this occurs?
Are you using a switch with any known issues? Have you seen this on Linux
too, or only on VxWorks?

<<<Pasted in: >>>>


      2.6.2. Link Activation

Link activation and supervision is completely handled by the generic 
part of the protocol, in contrast to the partially media-dependent 
neighbour detection protocol.

The following FSM describes how a link is activated and supervised.


------------------------------------------------------------------------

  ---------------                               ---------------
 |               |<--(CHECKPOINT == LAST_REC)--|               |
 |               |                             |               |
 |Working-Unknown|----TRAFFIC/ACTIVATE_MSG---->|Working-Working|
 |               |                             |               |
 |               |-------+      +-ACTIVATE_MSG>|               |
  ---------------         \    /                ------------A--
     |                     \  /                   |         |
     | NO TRAFFIC/          \/                 RESET_MSG  TRAFFIC/
     | NO PROBE             /\                    |      ACTIVATE_MSG
     | REPLY               /  \                   |         |
  ---V-----------         /    \                --V------------
 |               |-------+      +--RESET_MSG-->|               |
 |               |                             |               |
 | Reset-Unknown |                             |  Reset-Reset  |
 |               |----------RESET_MSG--------->|               |
 |               |                             |               |
  -------------A-                               ---------------
   |           |
   | BLOCK/    | UNBLOCK/
   | CHANGEOVER| CHANGEOVER END
   | ORIG_MSG  |
  -V-------------
 |               |
 |               |
 |    Blocked    |
 |               |
 |               |
  ---------------


* Figure 20: Link finite state machine *

------------------------------------------------------------------------
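
The figure can be transcribed into a transition table to check the
property claimed for it. The sketch below is one reading of the diagram
(in particular of the two crossing arrows in the middle); the dict
layout, the helper names, and the rule that unlisted events are ignored
in every state are assumptions made for illustration.

```python
# (state, event) -> next state, as read from the figure.
TRANSITIONS = {
    ("Working-Working", "CHECKPOINT == LAST_REC"): "Working-Unknown",
    ("Working-Working", "RESET_MSG"): "Reset-Reset",
    ("Working-Unknown", "TRAFFIC"): "Working-Working",
    ("Working-Unknown", "ACTIVATE_MSG"): "Working-Working",
    ("Working-Unknown", "RESET_MSG"): "Reset-Reset",
    ("Working-Unknown", "NO TRAFFIC/NO PROBE REPLY"): "Reset-Unknown",
    ("Reset-Unknown", "RESET_MSG"): "Reset-Reset",
    ("Reset-Unknown", "ACTIVATE_MSG"): "Working-Working",
    ("Reset-Unknown", "BLOCK/CHANGEOVER ORIG_MSG"): "Blocked",
    ("Blocked", "UNBLOCK/CHANGEOVER END"): "Reset-Unknown",
    ("Reset-Reset", "TRAFFIC"): "Working-Working",
    ("Reset-Reset", "ACTIVATE_MSG"): "Working-Working",
}

# Events that prove the peer endpoint has itself been reset.
PEER_CONFIRMATION = {"RESET_MSG", "ACTIVATE_MSG"}


def step(state, event):
    """Apply one event; events with no listed transition are ignored."""
    return TRANSITIONS.get((state, event), state)


def reachable_without(start, excluded_events):
    """States reachable from start using only non-excluded events."""
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for (src, event), dst in TRANSITIONS.items():
            if src == current and event not in excluded_events and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


# Without a RESET_MSG or ACTIVATE_MSG from the peer, a reset endpoint
# can only move between Reset-Unknown and Blocked.
print(reachable_without("Reset-Unknown", PEER_CONFIRMATION))
```

Under this reading, neither working state is reachable from
Reset-Unknown unless a RESET_MSG or ACTIVATE_MSG has been received.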

A link endpoint's state is defined by its own state, combined with what
is known about the other endpoint's state. The following states exist:

Reset-Unknown

    Own link endpoint reset, i.e. queues are emptied and sequence
    numbers are set back to their initial values. The state of the peer
    endpoint is unknown. LINK_PROTOCOL/RESET_MSG messages are sent
    periodically at CONTINUITY_INTERVAL to inform peer about the own
    endpoint's state, and to force it to reset its own endpoint, if this
    has not already been done. If the peer endpoint is rebooting, or has
    reset for some other reason, it will sooner or later also reach the
    state Reset-Unknown, and start sending its own RESET_MSG messages
    periodically. At least one of the endpoints, and often both, will
    eventually receive a RESET_MSG and transfer to state Reset-Reset. If
    the peer is still active, i.e. in one of the states Working-Working
    or Working-Unknown, and has not yet detected the disturbance causing
    this endpoint to reset, it will sooner or later receive a RESET_MSG,
    and transfer directly to state Reset-Reset. If a LINK_PROTOCOL/
    ACTIVATE_MSG message is received in this state, the link endpoint
    knows that the peer is already in state Reset-Reset, and can itself
    move directly on to state Working-Working. Any other messages are
    ignored in this state. CONTINUITY_INTERVAL is calculated as the
    smallest value of LINK_TOLERANCE/4 and 0.5 sec.
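
    As a worked example of the last sentence, the calculation can be
    sketched as follows. The function name and the use of milliseconds
    are assumptions, and the tolerance values are purely illustrative.

```python
def continuity_interval_ms(link_tolerance_ms):
    """Smallest of LINK_TOLERANCE/4 and 0.5 s, expressed in milliseconds."""
    return min(link_tolerance_ms / 4, 500.0)


# A tolerance of 1500 ms gives an interval of 375 ms; any tolerance
# above 2000 ms is capped at 500 ms.
print(continuity_interval_ms(1500))
print(continuity_interval_ms(4000))
```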



> Perhaps Jon or anyone else can comment on this and provide any
> recommendations on dealing with this, or why it may have been left out
> of the code before now.  Perhaps it was just not observed as a problem
> and left out.
>
>
>   

> [END]
>
> -------------------------------------------------------------------------
> SF.Net email is sponsored by: The Future of Linux Business White Paper
> from Novell.  From the desktop to the data center, Linux is going
> mainstream.  Let it simplify your IT future.
> http://altfarm.mediaplex.com/ad/ck/8857-50307-18918-4
> _______________________________________________
> tipc-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/tipc-discussion
>   

