Re: [trill] RtgDir review of draft-ietf-trill-directory-assist-mechanisms-07.txt

Joel M. Halpern Sat, 10 Dec 2016 23:03:08 -0800

Thanks Donald.  I will look at it within the next few days.


Yours,
Joel

On 12/11/16 12:19 AM, Donald Eastlake wrote:

Hi Joel,

Sorry for the delay but we have attempted to respond to your points in
version -09 of the draft. There were also changes unrelated to your
comments which are briefly described in
https://www.ietf.org/mail-archive/web/trill/current/msg07572.html

Additional changes in -09 include making "SHOULD" the implementation
requirement for methods 2 and 3.

Concerning the possible change to the Push Directory state machine:
on closer inspection, adding states would require more extensive
changes than I originally thought. In any case, in this version, some
explanatory text has been added in Section 2.3.2.

Please take a look when convenient.

Thanks,
Donald
===============================
 Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
 155 Beaver Street, Milford, MA 01757 USA
 [email protected] <mailto:[email protected]>

On Sat, Apr 16, 2016 at 10:03 PM, Donald Eastlake <[email protected]
<mailto:[email protected]>> wrote:

    Hi Joel,

    On Fri, Apr 15, 2016 at 11:46 PM, Joel M. Halpern
    <[email protected] <mailto:[email protected]>> wrote:
    > If by the connectivity check to the directory server, you mean the
    > underlying IS-IS routing reporting connectivity, then say that.

    OK.

    > While that
    > is not actually interchangeable with real connectivity, it is perfectly
    > reasonable for the WG to deem it sufficient.  I think it would only take a
    > sentence or two to clarify for the reader that what is meant is apparent
    > topological connectivity, as distinct from verified communication.

    The phrase usually used in TRILL (See RFC 7780) is "data reachable".

    Thanks,
    Donald
    =============================
     Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
     155 Beaver Street, Milford, MA 01757 USA
     [email protected] <mailto:[email protected]>

    > Yours,
    > Joel
    >
    >
    > On 4/15/16 11:12 PM, Donald Eastlake wrote:
    >>
    >> Hi Joel,
    >>
    >> On Fri, Apr 15, 2016 at 11:51 AM, Joel M. Halpern
    <[email protected] <mailto:[email protected]>>
    >> wrote:
    >>>
    >>> Thank you Donald.  Points of agreement elided, some responses to try to
    >>> clarify my observations.  I will note that from your comments about 3.1, I
    >>> believe my concerns, now moved to 3.7, are larger, as I had assumed that the
    >>> magic was in some other protocol, and you now say it is not defined there.
    >>>
    >>> Yours,
    >>> Joel
    >>>
    >>> On 4/15/16 11:23 AM, Donald Eastlake wrote:
    >>>>
    >>>>
    >>>> Hi Joel
    >>>>
    >>>> Thanks for your thorough review and comments. See below
    >>>>
    >>>> On Wed, Apr 13, 2016 at 4:47 PM, Joel M. Halpern
    <[email protected] <mailto:[email protected]>
    >>>>   <mailto:[email protected] <mailto:[email protected]>> wrote:
    >>>>
    >>> ...
    >>>
    >>>>> Major Issues:
    >>>>> In the state machine transitions in section 2.3.3
    >>>>> for push servers, it appears that if the event indicating that the
    >>>>> server is being shut down occurs while the server is already Going
    >>>>> Stand-By or Uncompleting, the transitions indicate that this "going
    >>>>> down" event will be lost.  A strict reading of this would seem to
    >>>>> mean that the "go Down" event would need to recur after the timeout
    >>>>> condition.  This would seem to be best addressed by a new state
    >>>>> "Going-Down" whose timeout behavior is to move to down state.
    >>>>
    >>>>
    >>>> I understand your point but "going down" and the like are called
    >>>> "events or conditions" in this draft, not just events.
    >>>> The problem with adding a single "Going-Down" state is that transition
    >>>> to that state would lose the information as to whether or not the Push
    >>>> Directory had been advertising that it was pushing complete
    >>>> information. The reason to remember this is that you would want
    >>>> to behave differently if the "going down" condition was revoked
    >>>> before it completed. This information could be preserved in a Boolean
    >>>> pseudo variable but the current style of state machine in this draft
    >>>> avoids such pseudo variables and encodes all of the relevant push
    >>>> directory's state into the state machine state. Thus, I can see three
    >>>> possible responses to your comment:
    >>>>
    >>>> 1) Change wording to emphasize that these "events or conditions" can
    >>>> be conditions that cause a state transition some substantial time
    >>>> after they become true.
    >>>>
    >>>> 2) Add two new states: (1) going down - was complete; (2) going down -
    >>>> was incomplete.
    >>>>
    >>>> 3) Change the style of state machine to admit pseudo variables which
    >>>> can be set and tested as part of the state machinery.
    >>>>
    >>>> Option 1 is just some minor wording changes but adopting either
    >>>> option 2 or 3 involves more extensive changes so I would prefer to
    >>>> avoid them.
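
For illustration only, option 2 above can be sketched as a pure transition
table. This is not taken from the draft: the state names, the event names,
and the choice to return to the prior active state when the "going down"
condition is revoked are all assumptions made for the example.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical state names; the draft's actual state list is not reproduced here.
    DOWN = auto()
    ACTIVE_INCOMPLETE = auto()
    ACTIVE_COMPLETE = auto()
    GOING_DOWN_WAS_INCOMPLETE = auto()
    GOING_DOWN_WAS_COMPLETE = auto()


class Event(Enum):
    # Hypothetical event names.
    SHUTDOWN_REQUESTED = auto()
    SHUTDOWN_REVOKED = auto()
    TIMEOUT = auto()


# Completeness is encoded in the state itself, so no Boolean pseudo variable is needed.
TRANSITIONS = {
    (State.ACTIVE_INCOMPLETE, Event.SHUTDOWN_REQUESTED): State.GOING_DOWN_WAS_INCOMPLETE,
    (State.ACTIVE_COMPLETE, Event.SHUTDOWN_REQUESTED): State.GOING_DOWN_WAS_COMPLETE,
    # Assumption: a revoked shutdown resumes whatever was being advertised before.
    (State.GOING_DOWN_WAS_INCOMPLETE, Event.SHUTDOWN_REVOKED): State.ACTIVE_INCOMPLETE,
    (State.GOING_DOWN_WAS_COMPLETE, Event.SHUTDOWN_REVOKED): State.ACTIVE_COMPLETE,
    # Either going-down state reaches Down on timeout, so the shutdown is never lost.
    (State.GOING_DOWN_WAS_INCOMPLETE, Event.TIMEOUT): State.DOWN,
    (State.GOING_DOWN_WAS_COMPLETE, Event.TIMEOUT): State.DOWN,
}


def step(state: State, event: Event) -> State:
    """Return the next state; pairs not listed in the table leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

With this shape, a shutdown request followed by a timeout always ends in
DOWN, while a revocation can still distinguish the two prior cases.
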
    >>>
    >>>
    >>> From what I have seen, trying to build a state machine with conditions
    >>> rather than events is fraught with problems and tends to lead to errors in
    >>> implementation.  It amounts to hiding pseudo-variables inside the states,
    >>> but not describing them.
    >>> Thus, I would much prefer solution 2, but it is of course up to the WG.
    >>
    >>
    >> Well, option 2 wouldn't be too hard. Option 3 would probably involve the
    >> most change.
    >>
    >>> ...
    >>>
    >>>>> Minor Issues:
    >>>>> In section 2.3.3 describing the state transitions for push
    >>>>> servers, there is an event (event 1) described as "the server was
    >>>>> Down but is now Up."  The state transition diagram describes this as
    >>>>> being a valid event that does not change the server's state if the
    >>>>> server is in any state other than "Down." In one sense, this is
    >>>>> reasonable, saying that such an event is harmless.  I would however
    >>>>> expect some sort of logging or administrative notification, as
    >>>>> something in the system is quite confused.
    >>>>
    >>>>
    >>>> Again, I see your point but it seems to me to be a matter of state
    >>>> machine style. Note that the "event" is described as a condition, so
    >>>> from that point of view, it is true anytime the state is other than
    >>>> Down. On the other hand, if you view it as strictly an event, you are
    >>>> left with the question of what to put at the intersection of a state
    >>>> and event in the table when it is impossible for that event to occur
    >>>> in that state. Some people note this with an "N/A" (not applicable)
    >>>> entry. In fact, previous TRILL state diagrams such as in RFC 7177 use
    >>>> "N/A" so it would probably be simplest to change to that for
    >>>> consistency.
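
A small sketch of how an "N/A" table entry might be handled in code, combined
with the logging suggested in the review comment. The two-state fragment, the
event names, the target state on "server up", and the logger name are all
hypothetical; none of them come from the draft.

```python
import logging

NA = None  # marks a (state, event) cell where the event cannot occur

# Hypothetical fragment of a transition table; names are illustrative only.
TABLE = {
    "Down": {"server_up": "Stand-By", "server_down": NA},
    "Stand-By": {"server_up": NA, "server_down": "Down"},
}

log = logging.getLogger("push-directory")  # assumed logger name


def next_state(state: str, event: str) -> str:
    """Look up the transition; an N/A cell is logged and leaves the state unchanged."""
    target = TABLE[state][event]
    if target is NA:
        log.warning("event %r is N/A in state %r; state unchanged", event, state)
        return state
    return target
```

Treating an N/A cell as "log and stay put" is one design choice among
several; an implementation could equally raise an error.
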
    >>>
    >>>
    >>> I think N/A would be good.
    >>
    >>
    >> OK.
    >>
    >>> ...
    >>>
    >>>>> Text in section 3.2.2.1 on lifetimes and the information
    >>>>> maintenance in section 3.3 imply that the clients and servers must
    >>>>> maintain a connection. Presumably, this is required already by the
    >>>>> RBridge Channel protocol, and I understand that we should not repeat
    >>>>> the entire protocol here.  It would seem to make the reader's life MUCH
    >>>>> simpler if the text noted that the RBridge Channel protocol requires
    >>>>> that there be a maintained connection between the client and the
    >>>>> server, and that these mechanisms leverage the presence of that
    >>>>> connection.
    >>>>
    >>>>
    >>>> The basic RBridge Channel protocol [RFC7178] is a datagram protocol
    >>>> rather than a connection protocol. So there is no guaranteed
    >>>> continuity of connection between RBridges that have previously
    >>>> exchanged RBridge Channel messages. But connection would only be lost
    >>>> if the network partitions since RBridge Channel messages look like
    >>>> data packets to any transit RBridges and will get forwarded as long as
    >>>> there is any route. Network partition is immediately visible in the
    >>>> link state database to the RBridges at both ends of an RBridge Channel
    >>>> exchange.  Section 3.7 provides that if a Pull Directory is no longer
    >>>> reachable (i.e., RBridge Channel protocol packets would no longer get
    >>>> through), then all pull responses from that Pull Directory MUST be
    >>>> discarded since cache consistency update messages can't get through.
    >>>> Perhaps a reference to Section 3.7 should be added to Section 3.3.
    >>>
    >>>
    >>> I don't think a reference to 3.7 is sufficient, although it is helpful.
    >>> If the protocol is a datagram protocol, and if it is important to discard
    >>> data from unreachable pull servers, then I think 3.7 NEEDS to say more than
    >>> just ~if you happen to magically figure out you can't reach the server,
    >>> discard data it has given you.~  From the rest of the text, this is an
    >>> important and unspecified protocol mechanism.
    >>
    >>
    >> Figuring out whether/how you can reach other RBridges is a basic
    >> function of TRILL IS-IS based routing, not something "magical".
    >> Whenever there is a topology change, an RBridge MUST determine routes
    >> to all data reachable RBridges in the new topology. If there was an
    >> RBridge previously reachable but no longer reachable, as would be the
    >> case for all RBridges on the other side of a network partition, this
    >> MUST be noticed so that, for example, all MAC reachability information
    >> associated with each of the no longer reachable RBridges can be discarded.
    >> It does not seem like much of a stretch to believe that an RBridge would
    >> keep track of the Pull Directory or Directories it was using, each of
    >> which will be some other RBridge, and notice when a topology change
    >> makes any of them inaccessible. But I have no problem adding some
    >> wording to make this clearer.
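
The behavior described above (notice on a topology change that a Pull
Directory is no longer reachable, then discard what it supplied) might be
sketched roughly as follows. This is a simplification, not the draft's
mechanism: the link-state database is reduced to a plain adjacency map,
reachability is a breadth-first search rather than TRILL's full "data
reachable" test, and the function names and cache layout are invented for
the example.

```python
from collections import deque


def reachable_from(local, adjacency):
    """Breadth-first search over a simplified link-state view.

    `adjacency` maps each RBridge name to the neighbors it reports
    (hypothetical structure; a real link-state database holds far more).
    """
    seen = {local}
    queue = deque([local])
    while queue:
        node = queue.popleft()
        for peer in adjacency.get(node, ()):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen


def purge_unreachable(cache, local, adjacency):
    """Drop cached pull responses from directories that are no longer reachable.

    `cache` maps a Pull Directory's RBridge name to the responses it supplied.
    Returns the set of directories whose entries were discarded.
    """
    reachable = reachable_from(local, adjacency)
    lost = {directory for directory in cache if directory not in reachable}
    for directory in lost:
        del cache[directory]
    return lost
```

Calling `purge_unreachable` after every topology change would keep the
cache limited to directories that can still send consistency updates.
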
    >>
    >>> ...
    >>> In the flooding flag and behavior, (long text elided) I don't think there is
    >>> anything wrong with the intended behavior.  It is just that the very brief
    >>> description of the FL flag leads the reader to an incorrect expectation.
    >>> Yes, it gets sorted out, but that is not good.  What I would suggest is when
    >>> the flag is defined (with whatever name you choose) note that "for the
    >>> qtypes 2, 3, and 4, the flag indicates that the server should flood its
    >>> response."
    >>
    >>
    >> We can work on clarifying the wording.
    >>
    >> Thanks,
    >> Donald
    >> =============================
    >>   Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
    >>   155 Beaver Street, Milford, MA 01757 USA
    >>   [email protected] <mailto:[email protected]>
    >>
    >


_______________________________________________
trill mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/trill
