Re: [trill] RtgDir review of draft-ietf-trill-directory-assist-mechanisms-07.txt

Donald Eastlake Tue, 13 Dec 2016 23:06:07 -0800

Hi Joel,

Thanks.
A -10 version has been posted that is intended to incorporate these
improvements.


Donald
===============================
 Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
 155 Beaver Street, Milford, MA 01757 USA
 [email protected]


On Tue, Dec 13, 2016 at 3:26 PM, Joel M. Halpern <[email protected]> wrote:
> Thanks.  That works for me.  I suspect the 3.2.1 / 3.2.2.2 disconnect was a
> skipped correction.
>
> Yours,
> Joel
>
>
> On 12/13/16 3:23 PM, Donald Eastlake wrote:
>>
>> Hi Joel,
>>
>> Thanks for your prompt response. See below at <de>
>>
>> -----Original Message-----
>> From: trill [mailto:[email protected]] On Behalf Of Joel M. Halpern
>> Sent: Monday, December 12, 2016 6:36 PM
>> To: Donald Eastlake
>> Cc: [email protected]; [email protected]; [email protected];
>> [email protected]
>> Subject: Re: [trill] RtgDir review of
>> draft-ietf-trill-directory-assist-mechanisms-07.txt
>>
>> Thank you Donald.  One major and a few minor points I noticed while
>> reading.  This does look to have addressed all my major concerns, and
>> most of my minor concerns.
>>
>> <de> Thanks.
>>
>> Major:
>>      The QTYPE table in section 3.2.1 lists the values 3 and 4 as
>> unused.  (This appears to have changed between versions 7 and 8,
>> possibly in an effort to address my earlier question about why these
>> values were used.)  The Pull Directory Forwarding text in section
>> 3.2.2.2 still explicitly assigns meanings and responses to QTYPEs 3 and
>> 4.  Either those values are to be used, in which case 3.2.1 needs to say
>> so, or they are not to be used and 2 is used for all the ARP-like
>> behaviors, in which case 3.2.2.2 needs to discuss this.
>>
>> <de> Sorry, 3.2.2.2 was overlooked when 3.2.2.1 was updated. This should
>> be easy to fix.
>>
>> <de> I do see a difference between QTYPE 2 and QTYPE 5.
>>         QTYPE 2 can be seen as saying to ignore the MAC destination
>> address, look at the Ethertype, and process as an ARP, ND, or RARP packet
>> (or reject if none of these).
>>         QTYPE 5 can be seen as saying to ignore the Ethertype and do
>> various lookups and/or forwarding based on the MAC destination address.
>>         These seem like different services although I suppose you could
>> guess heuristically which was wanted.
>>
>> Minor:
>>      The text is now clear as to what the content is when frames are
>> included in a query (3.2.1).  It would seem helpful to implementors if
>> the motivation for distinguishing between type 2 and type 5 in the
>> request were explained, since the behavior is apparently decidable
>> based on the frame content itself.
>>
>> <de> OK. Something like my text above could be included.
>>
>>      In section 3.2.2.1 on the Response format, in discussing the SIZE
>> field of the response record, the text refers to errors in the QUERY
>> records and to subsequent QUERY records.  I presume that this was
>> intended to say RESPONSE Record in each case?
>>
>> <de> Yup. Looks like a copy and paste error that slipped by.
>>
>>      In bullet 1 of section 3.3, at the end, in describing the
>> possibility of an all-entries flush (F, P, and N bits set), I think the
>> text intends that the count must be 0 to trigger this behavior.  It
>> would help to say that.
>>
>> <de> OK. Seems fairly clear to me but it can't hurt to make it clearer.
>>
>> <de> Thanks,
>> Donald
>> ==========================================
>> Donald E. Eastlake, 3rd     [email protected]
>> 155 Beaver Street              +1-508-333-2270
>>  Milford, MA 01757 USA
>>
>>
>> On 12/11/16 12:19 AM, Donald Eastlake wrote:
>>>
>>> Hi Joel,
>>>
>>> Sorry for the delay but we have attempted to respond to your points in
>>> version -09 of the draft. There were also changes unrelated to your
>>> comments which are briefly described in
>>> https://www.ietf.org/mail-archive/web/trill/current/msg07572.html
>>>
>>> Additional changes in -09 include making "SHOULD" the implementation
>>> requirement for methods 2 and 3.
>>>
>>> Concerning the possible change to the Push Directory state machine,
>>> looking at this it appears that changes by adding states would have to
>>> be more extensive than I originally thought. In any case, in this
>>> version, some explanatory text has been added in Section 2.3.2.
>>>
>>> Please take a look when convenient.
>>>
>>> Thanks,
>>> Donald
>>> ===============================
>>>  Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
>>>  155 Beaver Street, Milford, MA 01757 USA
>>>  [email protected]
>>>
>>> On Sat, Apr 16, 2016 at 10:03 PM, Donald Eastlake <[email protected]> wrote:
>>>
>>>     Hi Joel,
>>>
>>>     On Fri, Apr 15, 2016 at 11:46 PM, Joel M. Halpern
>>>     <[email protected]> wrote:
>>>     > If by the connectivity check to the directory server, you mean the
>>>     > underlying IS-IS routing reporting connectivity, then say that.
>>>
>>>     OK.
>>>
>>>     > While that is not actually interchangeable with real connectivity,
>>>     > it is perfectly reasonable for the WG to deem it sufficient.  I think
>>>     > it would only take a sentence or two to clarify for the reader that
>>>     > what is meant is apparent topological connectivity, as distinct from
>>>     > verified communication.
>>>
>>>     The phrase usually used in TRILL (See RFC 7780) is "data reachable".
>>>
>>>     Thanks,
>>>     Donald
>>>     =============================
>>>      Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
>>>      155 Beaver Street, Milford, MA 01757 USA
>>>      [email protected]
>>>
>>>     > Yours,
>>>     > Joel
>>>     >
>>>     >
>>>     > On 4/15/16 11:12 PM, Donald Eastlake wrote:
>>>     >>
>>>     >> Hi Joel,
>>>     >>
>>>     >> On Fri, Apr 15, 2016 at 11:51 AM, Joel M. Halpern <[email protected]>
>>>     >> wrote:
>>>     >>>
>>>     >>> Thank you Donald.  Points of agreement elided, some responses to try
>>>     >>> to clarify my observations.  I will note that from your comments
>>>     >>> about 3.1, I believe my concerns, now moved to 3.7, are larger, as I
>>>     >>> had assumed that the magic was in some other protocol, and you now
>>>     >>> say it is not defined there.
>>>     >>>
>>>     >>> Yours,
>>>     >>> Joel
>>>     >>>
>>>     >>> On 4/15/16 11:23 AM, Donald Eastlake wrote:
>>>     >>>>
>>>     >>>>
>>>     >>>> Hi Joel
>>>     >>>>
>>>     >>>> Thanks for your thorough review and comments. See below
>>>     >>>>
>>>     >>>> On Wed, Apr 13, 2016 at 4:47 PM, Joel M. Halpern
>>>     >>>> <[email protected]> wrote:
>>>     >>>>
>>>     >>> ...
>>>     >>>
>>>     >>>>> Major Issues:
>>>     >>>>> In the state machine transitions in section 2.3.3 for push
>>>     >>>>> servers, it appears that if the event indicating that the server
>>>     >>>>> is being shut down occurs while the server is already Going
>>>     >>>>> Stand-By or Uncompleting, the transitions indicate that this
>>>     >>>>> "going down" event will be lost.  A strict reading of this would
>>>     >>>>> seem to mean that the "go Down" event would need to recur after
>>>     >>>>> the timeout condition.  This would seem to be best addressed by a
>>>     >>>>> new state "Going-Down" whose timeout behavior is to move to down
>>>     >>>>> state.
>>>     >>>>
>>>     >>>>
>>>     >>>> I understand your point but "going down" and the like are called
>>>     >>>> "events or conditions" in this draft, not just events.
>>>     >>>> The problem with adding a single "Going-Down" state is that
>>>     >>>> transition to that state would lose the information as to whether
>>>     >>>> or not the Push Directory had been advertising that it was pushing
>>>     >>>> complete information. The reason to remember this is that you
>>>     >>>> would want to behave differently if the "going down" condition
>>>     >>>> was revoked before it completed. This information could be
>>>     >>>> preserved in a Boolean pseudo variable but the current style of
>>>     >>>> state machine in this draft avoids such pseudo variables and
>>>     >>>> encodes all of the relevant push directory's state into the state
>>>     >>>> machine state. Thus, I can see three possible responses to your
>>>     >>>> comment:
>>>     >>>>
>>>     >>>> 1) Change wording to emphasize that these "events or conditions"
>>>     >>>> can be conditions that cause a state transition some substantial
>>>     >>>> time after they become true.
>>>     >>>>
>>>     >>>> 2) Add two new states: (1) going down - was complete; (2) going
>>>     >>>> down - was incomplete.
>>>     >>>>
>>>     >>>> 3) Change the style of state machine to admit pseudo variables
>>>     >>>> which can be set and tested as part of the state machinery.
>>>     >>>>
>>>     >>>> Option 1 is just some minor wording changes but adopting either
>>>     >>>> option 2 or 3 involves more extensive changes so I would prefer
>>>     >>>> to avoid them.
>>>     >>>
>>>     >>>
>>>     >>> From what I have seen, trying to build a state machine with
>>>     >>> conditions rather than events is fraught with problems and tends to
>>>     >>> lead to errors in implementation.  It amounts to hiding
>>>     >>> pseudo-variables inside the states, but not describing them.
>>>     >>> Thus, I would much prefer solution 2, but it is of course up to the
>>>     >>> WG.
>>>     >>
>>>     >>
>>>     >> Well, option 2 wouldn't be too hard. Option 3 would probably involve
>>>     >> the most change.
>>>     >>
>>>     >>> ...
>>>     >>>
>>>     >>>>> Minor Issues:
>>>     >>>>> In section 2.3.3 describing the state transitions for push
>>>     >>>>> servers, there is an event (event 1) described as "the server was
>>>     >>>>> Down but is now Up."  The state transition diagram describes this
>>>     >>>>> as being a valid event that does not change the server's state if
>>>     >>>>> the server is in any state other than "Down." In one sense, this
>>>     >>>>> is reasonable, saying that such an event is harmless.  I would
>>>     >>>>> however expect some sort of logging or administrative
>>>     >>>>> notification, as something in the system is quite confused.
>>>     >>>>
>>>     >>>>
>>>     >>>> Again, I see your point but it seems to me to be a matter of state
>>>     >>>> machine style. Note that the "event" is described as a condition,
>>>     >>>> so from that point of view, it is true anytime the state is other
>>>     >>>> than Down. On the other hand, if you view it as strictly an event,
>>>     >>>> you are left with the question of what to put at the intersection
>>>     >>>> of a state and event in the table when it is impossible for that
>>>     >>>> event to occur in that state. Some people note this with an "N/A"
>>>     >>>> (not applicable) entry. In fact, previous TRILL state diagrams
>>>     >>>> such as in RFC 7177 use "N/A" so it would probably be simplest to
>>>     >>>> change to that for consistency.
>>>     >>>
>>>     >>>
>>>     >>> I think N/A would be good.
>>>     >>
>>>     >>
>>>     >> OK.
>>>     >>
>>>     >>> ...
>>>     >>>
>>>     >>>>> Text in section 3.2.2.1 on lifetimes and the information
>>>     >>>>> maintenance in section 3.3 imply that the clients and servers
>>>     >>>>> must maintain a connection. Presumably, this is required already
>>>     >>>>> by the RBridge Channel protocol, and I understand that we should
>>>     >>>>> not repeat the entire protocol here.  It would seem to make the
>>>     >>>>> reader's life MUCH simpler if the text noted that the RBridge
>>>     >>>>> Channel protocol requires that there be a maintained connection
>>>     >>>>> between the client and the server, and that these mechanisms
>>>     >>>>> leverage the presence of that connection.
>>>     >>>>
>>>     >>>>
>>>     >>>> The basic RBridge Channel protocol [RFC7178] is a datagram
>>>     >>>> protocol rather than a connection protocol. So there is no
>>>     >>>> guaranteed continuity of connection between RBridges that have
>>>     >>>> previously exchanged RBridge Channel messages. But connection
>>>     >>>> would only be lost if the network partitions since RBridge Channel
>>>     >>>> messages look like data packets to any transit RBridges and will
>>>     >>>> get forwarded as long as there is any route. Network partition is
>>>     >>>> immediately visible in the link state database to the RBridges at
>>>     >>>> both ends of an RBridge Channel exchange.  Section 3.7 provides
>>>     >>>> that if a Pull Directory is no longer reachable (i.e., RBridge
>>>     >>>> Channel protocol packets would no longer get through), then all
>>>     >>>> pull responses from that Pull Directory MUST be discarded since
>>>     >>>> cache consistency update messages can't get through. Perhaps a
>>>     >>>> reference to Section 3.7 should be added to Section 3.3.
>>>     >>>
>>>     >>>
>>>     >>> I don't think a reference to 3.7 is sufficient, although it is
>>>     >>> helpful. If the protocol is a datagram protocol, and if it is
>>>     >>> important to discard data from unreachable pull servers, then I
>>>     >>> think 3.7 NEEDS to say more than just ~if you happen to magically
>>>     >>> figure out you can't reach the server, discard data it has given
>>>     >>> you.~  From the rest of the text, this is an important and
>>>     >>> unspecified protocol mechanism.
>>>     >>
>>>     >>
>>>     >> Figuring out whether/how you can reach other RBridges is a basic
>>>     >> function of TRILL IS-IS based routing, not something "magical".
>>>     >> Whenever there is a topology change, an RBridge MUST determine routes
>>>     >> to all data reachable RBridges in the new topology. If there was an
>>>     >> RBridge previously reachable but no longer reachable, as would be the
>>>     >> case for all RBridges on the other side of a network partition, this
>>>     >> MUST be noticed so that, for example, all MAC reachability
>>>     >> information associated with each of the no longer reachable RBridges
>>>     >> can be discarded. It does not seem like much of a stretch to believe
>>>     >> that an RBridge would keep track of the Pull Directory or Directories
>>>     >> it was using, each of which will be some other RBridge, and notice
>>>     >> when a topology change makes any of them inaccessible. But I have no
>>>     >> problem adding some wording to make this clearer.
>>>     >>
>>>     >>> ...
>>>     >>> In the flooding flag and behavior, (long text elided) I don't think
>>>     >>> there is anything wrong with the intended behavior.  It is just that
>>>     >>> the very brief description of the FL flag leads the reader to an
>>>     >>> incorrect expectation. Yes, it gets sorted out, but that is not
>>>     >>> good.  What I would suggest is when the flag is defined (with
>>>     >>> whatever name you choose) note that "for the qtypes 2, 3, and 4, the
>>>     >>> flag indicates that the server should flood its response."
>>>     >>
>>>     >>
>>>     >> We can work on clarifying the wording.
>>>     >>
>>>     >> Thanks,
>>>     >> Donald
>>>     >> =============================
>>>     >>   Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
>>>     >>   155 Beaver Street, Milford, MA 01757 USA
>>>     >>   [email protected]
>>>     >>
>>>     >
>>>
>>>
>>
>> _______________________________________________
>> trill mailing list
>> [email protected]
>> https://www.ietf.org/mailman/listinfo/trill
>>
>

