Re: [trill] RtgDir review of draft-ietf-trill-directory-assist-mechanisms-07.txt

Donald Eastlake Sat, 10 Dec 2016 21:19:48 -0800

Hi Joel,

Sorry for the delay but we have attempted to respond to your points in
version -09 of the draft. There were also changes unrelated to your
comments which are briefly described in
https://www.ietf.org/mail-archive/web/trill/current/msg07572.html


Additional changes in -09 include making "SHOULD" the implementation
requirement for methods 2 and 3.

Concerning the possible change to the Push Directory state machine: on
closer examination, adding states would require more extensive changes
than I originally thought. In any case, some explanatory text has been
added in Section 2.3.2 of this version.
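
To make the "adding states" alternative concrete (option 2 in the quoted
thread below), here is a minimal Python sketch. It is illustrative only and
is not the state table from the draft: the state and event names are
hypothetical and heavily simplified. It shows how two separate going-down
states remember whether the Push Directory had been advertising complete
information, and how an impossible state/event pair can be reported as N/A.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical, simplified states; not the draft's full state set.
    DOWN = auto()
    ACTIVE_INCOMPLETE = auto()
    ACTIVE_COMPLETE = auto()
    GOING_DOWN_WAS_INCOMPLETE = auto()
    GOING_DOWN_WAS_COMPLETE = auto()


class Event(Enum):
    # Hypothetical events, treated strictly as events (not conditions).
    UP = auto()
    GO_DOWN = auto()
    GO_DOWN_REVOKED = auto()
    TIMEOUT = auto()


# Only the transitions needed for the illustration are listed.
# Any (state, event) pair missing from this table is treated as N/A.
TRANSITIONS = {
    (State.DOWN, Event.UP): State.ACTIVE_INCOMPLETE,
    (State.ACTIVE_INCOMPLETE, Event.GO_DOWN): State.GOING_DOWN_WAS_INCOMPLETE,
    (State.ACTIVE_COMPLETE, Event.GO_DOWN): State.GOING_DOWN_WAS_COMPLETE,
    # Revoking "go down" returns to the state that was remembered.
    (State.GOING_DOWN_WAS_INCOMPLETE, Event.GO_DOWN_REVOKED): State.ACTIVE_INCOMPLETE,
    (State.GOING_DOWN_WAS_COMPLETE, Event.GO_DOWN_REVOKED): State.ACTIVE_COMPLETE,
    # The timeout completes the shutdown from either going-down state.
    (State.GOING_DOWN_WAS_INCOMPLETE, Event.TIMEOUT): State.DOWN,
    (State.GOING_DOWN_WAS_COMPLETE, Event.TIMEOUT): State.DOWN,
}


def step(state, event):
    """Return the next state, or None when the pair is N/A."""
    return TRANSITIONS.get((state, event))
```

For example, step(State.ACTIVE_COMPLETE, Event.GO_DOWN) followed by
Event.GO_DOWN_REVOKED returns to ACTIVE_COMPLETE without any Boolean pseudo
variable, while step(State.ACTIVE_COMPLETE, Event.UP) returns None (N/A).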

Please take a look when convenient.

Thanks,
Donald
===============================
 Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
 155 Beaver Street, Milford, MA 01757 USA
 [email protected]

On Sat, Apr 16, 2016 at 10:03 PM, Donald Eastlake <[email protected]> wrote:

> Hi Joel,
>
> On Fri, Apr 15, 2016 at 11:46 PM, Joel M. Halpern <[email protected]>
> wrote:
> > If by the connectivity check to the directory server, you mean the
> > underlying IS-IS routing reporting connectivity, then say that.
>
> OK.
>
> > While that
> > is not actually interchangeable with real connectivity, it is perfectly
> > reasonable for the WG to deem it sufficient.  I think it would only take
> > a sentence or two to clarify for the reader that what is meant is
> > apparent topological connectivity, as distinct from verified
> > communication.
>
> The phrase usually used in TRILL (See RFC 7780) is "data reachable".
>
> Thanks,
> Donald
> =============================
>  Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
>  155 Beaver Street, Milford, MA 01757 USA
>  [email protected]
>
> > Yours,
> > Joel
> >
> >
> > On 4/15/16 11:12 PM, Donald Eastlake wrote:
> >>
> >> Hi Joel,
> >>
> >> On Fri, Apr 15, 2016 at 11:51 AM, Joel M. Halpern <[email protected]>
> >> wrote:
> >>>
> >>> Thank you Donald.  Points of agreement elided, some responses to try to
> >>> clarify my observations.  I will note that from your comments about
> >>> 3.1, I believe my concerns, now moved to 3.7, are larger, as I had
> >>> assumed that the magic was in some other protocol, and you now say it
> >>> is not defined there.
> >>>
> >>> Yours,
> >>> Joel
> >>>
> >>> On 4/15/16 11:23 AM, Donald Eastlake wrote:
> >>>>
> >>>>
> >>>> Hi Joel
> >>>>
> >>>> Thanks for your thorough review and comments. See below
> >>>>
> >>>> On Wed, Apr 13, 2016 at 4:47 PM, Joel M. Halpern <[email protected]>
> >>>> wrote:
> >>>>
> >>> ...
> >>>
> >>>>> Major Issues:
> >>>>> In the state machine transitions in section 2.3.3
> >>>>> for push servers, it appears that if the event indicating that the
> >>>>> server is being shut down occurs while the server is already Going
> >>>>> Stand-By or Uncompleting, the transitions indicate that this
> >>>>> "going down" event will be lost.  A strict reading of this would
> >>>>> seem to mean that the "go Down" event would need to recur after
> >>>>> the timeout condition.  This would seem to be best addressed by a
> >>>>> new state "Going-Down" whose timeout behavior is to move to down
> >>>>> state.
> >>>>
> >>>>
> >>>> I understand your point but "going down" and the like are called
> >>>> "events or conditions" in this draft, not just events.
> >>>> The problem with adding a single "Going-Down" state is that
> >>>> transition to that state would lose the information as to whether
> >>>> or not the Push Directory had been advertising that it was pushing
> >>>> complete information. The reason to remember this is that you
> >>>> would want to behave differently if the "going down" condition was
> >>>> revoked before it completed. This information could be preserved
> >>>> in a Boolean pseudo variable but the current style of state machine
> >>>> in this draft avoids such pseudo variables and encodes all of the
> >>>> relevant push directory's state into the state machine state.
> >>>> Thus, I can see three possible responses to your comment:
> >>>>
> >>>> 1) Change wording to emphasize that these "events or conditions" can
> >>>> be conditions that cause a state transition some substantial time
> >>>> after they become true.
> >>>>
> >>>> 2) Add two new states: (1) going down - was complete; (2) going
> >>>> down - was incomplete.
> >>>>
> >>>> 3) Change the style of state machine to admit pseudo variables which
> >>>> can be set and tested as part of the state machinery.
> >>>>
> >>>> Option 1 is just some minor wording changes but adopting either
> >>>> option 2 or 3 involves more extensive changes so I would prefer to
> >>>> avoid them.
> >>>
> >>>
> >>> From what I have seen, trying to build a state machine with conditions
> >>> rather than events is fraught with problems and tends to lead to
> >>> errors in implementation.  It amounts to hiding pseudo-variables
> >>> inside the states, but not describing them.
> >>> Thus, I would much prefer solution 2, but it is of course up to the WG.
> >>
> >>
> >> Well, option 2 wouldn't be too hard. Option 3 would probably involve
> >> the most change.
> >>
> >>> ...
> >>>
> >>>>> Minor Issues:
> >>>>> In section 2.3.3 describing the state transitions for push
> >>>>> servers, there is an event (event 1) described as "the server was
> >>>>> Down but is now Up."  The state transition diagram describes this
> >>>>> as being a valid event that does not change the server's state if
> >>>>> the server is in any state other than "Down." In one sense, this
> >>>>> is reasonable, saying that such an event is harmless.  I would
> >>>>> however expect some sort of logging or administrative
> >>>>> notification, as something in the system is quite confused.
> >>>>
> >>>>
> >>>> Again, I see your point but it seems to me to be a matter of state
> >>>> machine style. Note that the "event" is described as a condition, so
> >>>> from that point of view, it is true anytime the state is other than
> >>>> Down. On the other hand, if you view it as strictly an event, you
> >>>> are left with the question of what to put at the intersection of a
> >>>> state and event in the table when it is impossible for that event
> >>>> to occur in that state. Some people note this with an "N/A" (not
> >>>> applicable) entry. In fact, previous TRILL state diagrams such as
> >>>> in RFC 7177 use "N/A" so it would probably be simplest to change
> >>>> to that for consistency.
> >>>
> >>>
> >>> I think N/A would be good.
> >>
> >>
> >> OK.
> >>
> >>> ...
> >>>
> >>>>> Text in section 3.2.2.1 on lifetimes and the information
> >>>>> maintenance in section 3.3 imply that the clients and servers must
> >>>>> maintain a connection. Presumably, this is required already by the
> >>>>> RBridge Channel protocol, and I understand that we should not
> >>>>> repeat the entire protocol here.  It would seem to make the
> >>>>> reader's life MUCH simpler if the text noted that the RBridge
> >>>>> Channel protocol requires that there be a maintained connection
> >>>>> between the client and the server, and that these mechanisms
> >>>>> leverage the presence of that connection.
> >>>>
> >>>>
> >>>> The basic RBridge Channel protocol [RFC7178] is a datagram protocol
> >>>> rather than a connection protocol. So there is no guaranteed
> >>>> continuity of connection between RBridges that have previously
> >>>> exchanged RBridge Channel messages. But connection would only be
> >>>> lost if the network partitions since RBridge Channel messages look
> >>>> like data packets to any transit RBridges and will get forwarded
> >>>> as long as there is any route. Network partition is immediately
> >>>> visible in the link state database to the RBridges at both ends of
> >>>> an RBridge Channel exchange.  Section 3.7 provides that if a Pull
> >>>> Directory is no longer reachable (i.e., RBridge Channel protocol
> >>>> packets would no longer get through), then all pull responses from
> >>>> that Pull Directory MUST be discarded since cache consistency
> >>>> update messages can't get through.
> >>>> Perhaps a reference to Section 3.7 should be added to Section 3.3.
> >>>
> >>>
> >>> I don't think a reference to 3.7 is sufficient, although it is helpful.
> >>> If the protocol is a datagram protocol, and if it is important to
> >>> discard data from unreachable pull servers, then I think 3.7 NEEDS to
> >>> say more than just ~if you happen to magically figure out you can't
> >>> reach the server, discard data it has given you.~  From the rest of
> >>> the text, this is an important and unspecified protocol mechanism.
> >>
> >>
> >> Figuring out whether/how you can reach other RBridges is a basic
> >> function of TRILL IS-IS based routing, not something "magical".
> >> Whenever there is a topology change, an RBridge MUST determine routes
> >> to all data reachable RBridges in the new topology. If there was an
> >> RBridge previously reachable but no longer reachable, as would be the
> >> case for all RBridges on the other side of a network partition, this
> >> MUST be noticed so that, for example, all MAC reachability information
> >> associated with each of the no longer reachable RBridges can be
> >> discarded.
> >> It does not seem like much of a stretch to believe that an RBridge would
> >> keep track of the Pull Directory or Directories it was using, each of
> >> which will be some other RBridge, and notice when a topology change
> >> makes any of them inaccessible. But I have no problem adding some
> >> wording to make this clearer.
> >>
> >>> ...
> >>> In the flooding flag and behavior, (long text elided) I don't think
> >>> there is anything wrong with the intended behavior.  It is just that
> >>> the very brief description of the FL flag leads the reader to an
> >>> incorrect expectation.  Yes, it gets sorted out, but that is not good.
> >>> What I would suggest is when the flag is defined (with whatever name
> >>> you choose) note that "for the qtypes 2, 3, and 4, the flag indicates
> >>> that the server should flood its response."
> >>
> >>
> >> We can work on clarifying the wording.
> >>
> >> Thanks,
> >> Donald
> >> =============================
> >>   Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
> >>   155 Beaver Street, Milford, MA 01757 USA
> >>   [email protected]
> >>
> >
>
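
As a rough illustration of the Section 3.7 behavior discussed in the quoted
thread above, a Pull Directory client could drop cached responses once the
directory that supplied them stops being data reachable. The Python sketch
below assumes a hypothetical PullCache class, and assumes the routing code
hands over the set of currently data reachable RBridge nicknames after each
topology change; neither is defined by the draft.

```python
class PullCache:
    """Hypothetical client-side cache of Pull Directory responses."""

    def __init__(self):
        # Maps directory nickname -> {query key: cached response}.
        self._by_directory = {}

    def store(self, directory, key, response):
        """Remember a response together with the directory that sent it."""
        self._by_directory.setdefault(directory, {})[key] = response

    def lookup(self, key):
        """Return a cached response for key, or None if nothing is cached."""
        for entries in self._by_directory.values():
            if key in entries:
                return entries[key]
        return None

    def on_topology_change(self, data_reachable):
        """Discard everything learned from directories no longer reachable.

        data_reachable is assumed to be the set of RBridge nicknames that
        the IS-IS based route computation currently reports as reachable.
        Returns the list of directories whose responses were discarded.
        """
        lost = [d for d in self._by_directory if d not in data_reachable]
        for directory in lost:
            del self._by_directory[directory]
        return lost
```

With responses stored from two directories, calling on_topology_change with
a set that contains only one of them discards every response from the
other, so a later lookup for one of its keys returns None.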

_______________________________________________
trill mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/trill
