Re: [trill] RtgDir review of draft-ietf-trill-directory-assist-mechanisms-07.txt

Joel M. Halpern Mon, 12 Dec 2016 15:38:00 -0800

Thank you Donald. One major and a few minor points I noticed whilereading. This does look to have addressed all my major concerns, andmost of my minor concerns.


Major:

The QTYPE table in section 3.2.1 lists the values 3 and 4 asunused. (This appears to have changed between versions 7 and 8.Possibly in an effort to address my earlier question about why thesevalues were used.) The Pull Directory Forwarding text in section3.2.2.2 still explicitly assigns meanings and responses to QTYPEs 3 and4. Either those values are to be used, in which case 3.2.1 needs to sayso. Or they are not to be used, and 2 is used for all the ARP-likebehaviors. In which case 3.2.2.2 needs to discuss this.


Minor:

The text is now clear as to what the content is when frames areincluded in a query (3.2.1) It would seem helpful to implementors ifthe motivation for distinguishing between type 2 and type 5 in therequest, since the behavior is apparently decidable based on the framecontent itself.

In section 3.2.2.1 on the Response format, in discussing the SIZEfield of the response record, the text refers to errors in the QUERYrecords and to subsequent QUERY records. I presume that this wasintended to say RESPONSE Record in each case?

In bullet 1 of section 3.3, at the end, in describing thepossibility of an all-entries flush (F, P, and N bits set), I think thetext intends that the count must be 0 to trigger this behavior. Itwould help to say that.


On 12/11/16 12:19 AM, Donald Eastlake wrote:

Hi Joel,

Sorry for the delay but we have attempted to respond to your points in
version -09 of the draft. There were also changes unrelated to your
comments which are briefly described in
https://www.ietf.org/mail-archive/web/trill/current/msg07572.html
<https://www.ietf.org/mail-archive/web/trill/current/msg07572.html>

Additional changes in -09 including making "SHOULD" the implementation
requirement for methods 2 and 3.

Concerning the possible change to the Push Directory state machine,
looking at this it appears that changes by adding states would have to
be more extensive than I originally thought. In any case, in this
version, some explanatory text has been added in Section 2.3.2.

Please take a look when convenient.

Thanks,
Donald
===============================
 Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
 155 Beaver Street, Milford, MA 01757 USA
 [email protected] <mailto:[email protected]>

On Sat, Apr 16, 2016 at 10:03 PM, Donald Eastlake <[email protected]
<mailto:[email protected]>> wrote:

    Hi Joel,

    On Fri, Apr 15, 2016 at 11:46 PM, Joel M. Halpern
    <[email protected] <mailto:[email protected]>> wrote:
    > If by the connectivity check to the directory server, you mean the
    > underlying IS-IS routing reporting connectivity, then say that.

    OK.

    > While that
    > is not actually interchangeable with real connectivity, it is perfectly
    > reasoanble for the WG to deem it sufficient.  I think it would only take a
    > sentence or two to clarify for the reader that what is meant is apparent
    > topological connectivity, as distinct from verified communication.

    The phrase usually used in TRILL (See RFC 7780) is "data reachable".

    Thanks,
    Donald
    =============================
     Donald E. Eastlake 3rd   +1-508-333-2270 <tel:%2B1-508-333-2270> (cell)
     155 Beaver Street, Milford, MA 01757 USA
     [email protected] <mailto:[email protected]>

    > Yours,
    > Joel
    >
    >
    > On 4/15/16 11:12 PM, Donald Eastlake wrote:
    >>
    >> Hi Joel,
    >>
    >> On Fri, Apr 15, 2016 at 11:51 AM, Joel M. Halpern
    <[email protected] <mailto:[email protected]>>
    >> wrote:
    >>>
    >>> Thank you Donald.  Points of agreement elided, some responses to
    try to
    >>> clarify my observations.  I will note that from your comments
    about 3.1,
    >>> I
    >>> believe my concerns, now moved to 3.7, are larger, as I had
    assumed that
    >>> the
    >>> magic was in some other protocol, and you now say it is not defined
    >>> there.
    >>>
    >>> Yours,
    >>> Joel
    >>>
    >>> On 4/15/16 11:23 AM, Donald Eastlake wrote:
    >>>>
    >>>>
    >>>> Hi Joel
    >>>>
    >>>> Thanks for your thorough review and comments. See below
    >>>>
    >>>> On Wed, Apr 13, 2016 at 4:47 PM, Joel M. Halpern
    <[email protected] <mailto:[email protected]>
    >>>>   <mailto:[email protected] <mailto:[email protected]>> wrote:
    >>>>
    >>> ...
    >>>
    >>>>> Major Issues:
    >>>>> In the state machine transitions in section 2.3.3
    >>>>> for push servers, it appears that if the event indicating that the
    >>>>> server is being shut down occurs while the server is already Going
    >>>>> Stand-By or Uncompleting, the transitions indicate that this
    >>>>> "going
    >>>>> down" event will be lost.  A strict reading of this would seem to
    >>>>> mean that the "go Down" event would need to recur after the
    >>>>> timeout
    >>>>> condition.  This would seem to be best addressed by a new state
    >>>>> "Going-Down" whose timeout behavior is to move to down state.
    >>>>
    >>>>
    >>>> I understand your point but "going down" and the like are called
    >>>> "events or conditions" in this draft, not just events.
    >>>> The problem with adding a single "Going-Down" state is that
    >>>> transition
    >>>> to that state would lose the information as to whether or not the
    >>>> Push
    >>>> Directory had been advertising that it was pushing complete
    >>>> information or not. The reason to remember this is that you would
    >>>> want
    >>>> to behave a differently if the "going down" condition was revoked
    >>>> before it completed. This information could be preserved in a
    >>>> Boolean
    >>>> pseudo variable but the current style of state machine in this
    draft
    >>>> avoids such pseudo variables and encodes all of the relevant push
    >>>> directory's state into the state machine state. Thus, I can see
    >>>> three
    >>>> possible responses to your comment:
    >>>>
    >>>> 1) Change wording to emphasize that these "events or
    conditions" can
    >>>> be conditions that cause a state transition some substantial time
    >>>> after they become true.
    >>>>
    >>>> 2) Add two new states: (1) going down - was complete; (2) going
    down
    >>>> -
    >>>> was incomplete.
    >>>>
    >>>> 3) Change the style of state machine to admit pseudo variables
    which
    >>>> can be set and testing as part of the state machinery.
    >>>>
    >>>> Option 1 is just some minor wording changes but adopting either
    >>>> options 2 or 3 involves more extensive changes so I would prefer to
    >>>> avoid them.
    >>>
    >>>
    >>>  From what I have seen, trying to build a state machine with
    conditions
    >>> rather than events is fraught with problems and tends to lead to
    errors
    >>> in
    >>> implementation.  It amounts to hiding pseudo-variables inside
    the states,
    >>> but not describing them.
    >>> Thus, I would much prefer solution 2, but it is of course up to
    the WG.
    >>
    >>
    >> Well, option 2 wouldn't be too hard. Option 3 would probably
    involve the
    >> most
    >> change.
    >>
    >>> ...
    >>>
    >>>>> Minor Issues:
    >>>>> In section 2.3.3 describing the state transitions for push
    >>>>> servers, there is an event (event 1) described as "the server was
    >>>>> Down but is now Up."  The state transition diagram describes this
    >>>>> as
    >>>>> being a valid event that does not change the servers state if the
    >>>>> server is in any state other than "Down." In one sense, this is
    >>>>> reasonable, saying that such an event is harmless.  I would
    >>>>> however
    >>>>> expect some sort of logging or administrative notification, as
    >>>>> something in the system is quite confused.
    >>>>
    >>>>
    >>>> Again, I see your point but it seems to me to be a matter of state
    >>>> machine style. Note that the "event" is described as a
    condition, so
    >>>> from that point of view, it is true anytime the state is other than
    >>>> Down. On the other hand, if you view it as strictly an event, you
    >>>> are
    >>>> left with the question of what to put at the intersection of a
    state
    >>>> and event in the table when it is impossible for that event to
    occur
    >>>> in that state. Some people note this with an "N/A" (not applicable)
    >>>> entry. In fact, previous TRILL state diagrams such as in RFC 7177
    >>>> use
    >>>> "N/A" so it would probably be simplest to change to that for
    >>>> consistency.
    >>>
    >>>
    >>> I think N/A would be good.
    >>
    >>
    >> OK.
    >>
    >>> ...
    >>>
    >>>>> Text in section 3.2.2.1 on lifetimes and the information
    >>>>> maintenance in section 3.3 imply that the clients and servers must
    >>>>> maintain a connection. Presumably, this is required already by the
    >>>>> RBridge Channel protocol, and I understand that we should not
    >>>>> repeat
    >>>>> the entire protocol here.  It would seem to make readers life MUCH
    >>>>> simpler if the text noted that the RBridge Channel protocol
    >>>>> requires
    >>>>> that there be a maintained connection between the client and the
    >>>>> server, and that these mechanisms leverage the presence of that
    >>>>> connection.
    >>>>
    >>>>
    >>>> The basic RBridge Channel protocol [RFC7178] is a datagram protocol
    >>>> rather than a connection protocol. So there is no guaranteed
    >>>> continuity of connection between RBridges that have previously
    >>>> exchanged RBridge Channel messages. But connection would only be
    >>>> lost
    >>>> if the network partitions since RBridge Channel messages look like
    >>>> data packets to any transit RBridges and will get forwarded as long
    >>>> as
    >>>> there is any route. Network partition is immediately visible in the
    >>>> link state database to the RBridges at both ends of an RBridge
    >>>> Channel
    >>>> exchange.  Section 3.7 provides that if a Pull Directory is no
    >>>> longer
    >>>> reachable (i.e., RBridge Channel protocol packets would no longer
    >>>> get
    >>>> through), then all pull responses from that Pull Directory MUST be
    >>>> discarded since cache consistency update messages can't get
    through.
    >>>> Perhaps a reference to Section 3.7 should be added to Section 3.3.
    >>>
    >>>
    >>> I don't think a reference to 3.7 is sufficient, although it is
    helpful.
    >>> If the protocol is a datagram protocol, and if it is important
    to discard
    >>> data from unreachable pull servers, then I think 3.7 NEEDS to
    say more
    >>> than
    >>> just ~if you happen to magically figure out you can't reach the
    server,
    >>> discard data it has given you.~  From the rest of the text, this
    is an
    >>> important and unspecified protocol mechanism.
    >>
    >>
    >> Figuring out whether/how you can reach other RBridges is a basic
    >> function of TRILL IS-IS based routing, not something "magical".
    >> Whenever their is a topology change, an RBridge MUST determine routes
    >> to all data reachable RBridges in the new topology. If there was an
    >> RBridge previously reachable but no longer reachable, as would be the
    >> case for all RBridges on the other side of a network partition, this
    >> MUST be noticed so that, for example, all MAC reachability
    information
    >> associated with each of the no longer reachable RBridges can be
    discarded.
    >> It does not seem like much of a stretch to believe that an
    RBridge would
    >> keep track of the Pull Directory or Directories it was using, each of
    >> which will be some other RBridge, and notice when a topology change
    >> makes any of them inaccessible. But I have no problem adding some
    >> wording to make this clearer.
    >>
    >>> ...
    >>> In the flooding flag and behavior, (long text elided) I don't
    think there
    >>> is
    >>> anything wrong with the intended behavior.  It is just that the very
    >>> brief
    >>> description of the FL flag leads the reader to an incorrect
    expectation.
    >>> Yes, it gets sorted out, but that is not good.  What I would
    suggest is
    >>> when
    >>> the flag is defined (with whatever name you choose) note that
    "for the
    >>> qtypes 2,3,and 4, the flag indicates that the server should
    flood its
    >>> response."
    >>
    >>
    >> We can work  on clarifying the wording.
    >>
    >> Thanks,
    >> Donald
    >> =============================
    >>   Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
    >>   155 Beaver Street, Milford, MA 01757 USA
    >>   [email protected] <mailto:[email protected]>
    >>
    >


_______________________________________________
trill mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/trill

Re: [trill] RtgDir review of draft-ietf-trill-directory-assist-mechanisms-07.txt

Reply via email to