Hi Joel,

Thanks. A -10 version has been posted that is intended to incorporate these improvements.

Donald
===============================
 Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
 155 Beaver Street, Milford, MA 01757 USA
 [email protected]

On Tue, Dec 13, 2016 at 3:26 PM, Joel M. Halpern <[email protected]> wrote:
> Thanks. That works for me. I suspect the 3.2.1 / 3.2.2.2 disconnect was a
> skipped correction.
>
> Yours,
> Joel
>
>
> On 12/13/16 3:23 PM, Donald Eastlake wrote:
>>
>> Hi Joel,
>>
>> Thanks for your prompt response. See below at <de>
>>
>> -----Original Message-----
>> From: trill [mailto:[email protected]] On Behalf Of Joel M. Halpern
>> Sent: Monday, December 12, 2016 6:36 PM
>> To: Donald Eastlake
>> Cc: [email protected]; [email protected]; [email protected];
>> [email protected]
>> Subject: Re: [trill] RtgDir review of
>> draft-ietf-trill-directory-assist-mechanisms-07.txt
>>
>> Thank you Donald. One major and a few minor points I noticed while
>> reading. This does look to have addressed all my major concerns, and
>> most of my minor concerns.
>>
>> <de> Thanks.
>>
>> Major:
>> The QTYPE table in section 3.2.1 lists the values 3 and 4 as
>> unused. (This appears to have changed between versions 7 and 8.
>> Possibly in an effort to address my earlier question about why these
>> values were used.) The Pull Directory Forwarding text in section
>> 3.2.2.2 still explicitly assigns meanings and responses to QTYPEs 3 and
>> 4. Either those values are to be used, in which case 3.2.1 needs to say
>> so. Or they are not to be used, and 2 is used for all the ARP-like
>> behaviors. In which case 3.2.2.2 needs to discuss this.
>>
>> <de> Sorry, 3.2.2.2 was overlooked when 3.2.2.1 was updated. This should
>> be easy to fix.
>>
>> <de> I do see a difference between QTYPE 2 and QTYPE 5.
>> QTYPE 2 can be seen as saying to ignore the MAC destination
>> address, look at the Ethertype, and process as an ARP, ND, or RARP packet
>> (or reject if none of these).
>> QTYPE 5 can be seen as saying to ignore the Ethertype and do
>> various lookups and/or forwarding based on the MAC destination address.
>> These seem like different services although I suppose you could
>> guess heuristically which was wanted.
>>
>> Minor:
>> The text is now clear as to what the content is when frames are
>> included in a query (3.2.1). It would seem helpful to implementors if
>> the motivation for distinguishing between type 2 and type 5 in the
>> request were explained, since the behavior is apparently decidable
>> based on the frame content itself.
>>
>> <de> OK. Something like my text above could be included.
>>
>> In section 3.2.2.1 on the Response format, in discussing the SIZE
>> field of the response record, the text refers to errors in the QUERY
>> records and to subsequent QUERY records. I presume that this was
>> intended to say RESPONSE Record in each case?
>>
>> <de> Yup. Looks like a copy and paste error that slipped by.
>>
>> In bullet 1 of section 3.3, at the end, in describing the
>> possibility of an all-entries flush (F, P, and N bits set), I think the
>> text intends that the count must be 0 to trigger this behavior. It
>> would help to say that.
>>
>> <de> OK. Seems fairly clear to me but it can't hurt to make it clearer.
>>
>> <de>Thanks,
>> Donald
>> ==========================================
>> Donald E. Eastlake, 3rd [email protected]
>> 155 Beaver Street +1-508-333-2270
>> Milford, MA 01757 USA
>>
>>
>> On 12/11/16 12:19 AM, Donald Eastlake wrote:
>>>
>>> Hi Joel,
>>>
>>> Sorry for the delay but we have attempted to respond to your points in
>>> version -09 of the draft. There were also changes unrelated to your
>>> comments which are briefly described in
>>> https://www.ietf.org/mail-archive/web/trill/current/msg07572.html
>>>
>>> Additional changes in -09 include making "SHOULD" the implementation
>>> requirement for methods 2 and 3.
>>>
>>> Concerning the possible change to the Push Directory state machine,
>>> looking at this it appears that changes by adding states would have to
>>> be more extensive than I originally thought. In any case, in this
>>> version, some explanatory text has been added in Section 2.3.2.
>>>
>>> Please take a look when convenient.
>>>
>>> Thanks,
>>> Donald
>>> ===============================
>>> Donald E. Eastlake 3rd +1-508-333-2270 (cell)
>>> 155 Beaver Street, Milford, MA 01757 USA
>>> [email protected]
>>>
>>> On Sat, Apr 16, 2016 at 10:03 PM, Donald Eastlake <[email protected]> wrote:
>>>
>>> Hi Joel,
>>>
>>> On Fri, Apr 15, 2016 at 11:46 PM, Joel M. Halpern <[email protected]> wrote:
>>> > If by the connectivity check to the directory server, you mean the
>>> > underlying IS-IS routing reporting connectivity, then say that.
>>>
>>> OK.
>>>
>>> > While that is not actually interchangeable with real connectivity, it is
>>> > perfectly reasonable for the WG to deem it sufficient. I think it would
>>> > only take a sentence or two to clarify for the reader that what is meant
>>> > is apparent topological connectivity, as distinct from verified
>>> > communication.
>>>
>>> The phrase usually used in TRILL (See RFC 7780) is "data reachable".
>>>
>>> Thanks,
>>> Donald
>>> =============================
>>> Donald E. Eastlake 3rd +1-508-333-2270 (cell)
>>> 155 Beaver Street, Milford, MA 01757 USA
>>> [email protected]
>>>
>>> > Yours,
>>> > Joel
>>> >
>>> >
>>> > On 4/15/16 11:12 PM, Donald Eastlake wrote:
>>> >>
>>> >> Hi Joel,
>>> >>
>>> >> On Fri, Apr 15, 2016 at 11:51 AM, Joel M. Halpern <[email protected]>
>>> >> wrote:
>>> >>>
>>> >>> Thank you Donald. Points of agreement elided, some responses to try to
>>> >>> clarify my observations.
>>> >>> I will note that from your comments about 3.1, I believe my concerns,
>>> >>> now moved to 3.7, are larger, as I had assumed that the magic was in
>>> >>> some other protocol, and you now say it is not defined there.
>>> >>>
>>> >>> Yours,
>>> >>> Joel
>>> >>>
>>> >>> On 4/15/16 11:23 AM, Donald Eastlake wrote:
>>> >>>>
>>> >>>>
>>> >>>> Hi Joel
>>> >>>>
>>> >>>> Thanks for your thorough review and comments. See below
>>> >>>>
>>> >>>> On Wed, Apr 13, 2016 at 4:47 PM, Joel M. Halpern <[email protected]> wrote:
>>> >>>>
>>> >>> ...
>>> >>>
>>> >>>>> Major Issues:
>>> >>>>> In the state machine transitions in section 2.3.3
>>> >>>>> for push servers, it appears that if the event indicating that the
>>> >>>>> server is being shut down occurs while the server is already Going
>>> >>>>> Stand-By or Uncompleting, the transitions indicate that this "going
>>> >>>>> down" event will be lost. A strict reading of this would seem to
>>> >>>>> mean that the "go Down" event would need to recur after the timeout
>>> >>>>> condition. This would seem to be best addressed by a new state
>>> >>>>> "Going-Down" whose timeout behavior is to move to down state.
>>> >>>>
>>> >>>>
>>> >>>> I understand your point but "going down" and the like are called
>>> >>>> "events or conditions" in this draft, not just events.
>>> >>>> The problem with adding a single "Going-Down" state is that transition
>>> >>>> to that state would lose the information as to whether or not the Push
>>> >>>> Directory had been advertising that it was pushing complete
>>> >>>> information or not. The reason to remember this is that you would want
>>> >>>> to behave differently if the "going down" condition was revoked
>>> >>>> before it completed.
>>> >>>> This information could be preserved in a Boolean
>>> >>>> pseudo variable but the current style of state machine in this draft
>>> >>>> avoids such pseudo variables and encodes all of the relevant push
>>> >>>> directory's state into the state machine state. Thus, I can see three
>>> >>>> possible responses to your comment:
>>> >>>>
>>> >>>> 1) Change wording to emphasize that these "events or conditions" can
>>> >>>> be conditions that cause a state transition some substantial time
>>> >>>> after they become true.
>>> >>>>
>>> >>>> 2) Add two new states: (1) going down - was complete; (2) going down -
>>> >>>> was incomplete.
>>> >>>>
>>> >>>> 3) Change the style of state machine to admit pseudo variables which
>>> >>>> can be set and tested as part of the state machinery.
>>> >>>>
>>> >>>> Option 1 is just some minor wording changes but adopting either
>>> >>>> option 2 or 3 involves more extensive changes so I would prefer to
>>> >>>> avoid them.
>>> >>>
>>> >>>
>>> >>> From what I have seen, trying to build a state machine with conditions
>>> >>> rather than events is fraught with problems and tends to lead to errors
>>> >>> in implementation. It amounts to hiding pseudo-variables inside the
>>> >>> states, but not describing them.
>>> >>> Thus, I would much prefer solution 2, but it is of course up to the WG.
>>> >>
>>> >>
>>> >> Well, option 2 wouldn't be too hard. Option 3 would probably involve the
>>> >> most change.
>>> >>
>>> >>> ...
>>> >>>
>>> >>>>> Minor Issues:
>>> >>>>> In section 2.3.3 describing the state transitions for push
>>> >>>>> servers, there is an event (event 1) described as "the server was
>>> >>>>> Down but is now Up." The state transition diagram describes this as
>>> >>>>> being a valid event that does not change the server's state if the
>>> >>>>> server is in any state other than "Down."
>>> >>>>> In one sense, this is
>>> >>>>> reasonable, saying that such an event is harmless. I would however
>>> >>>>> expect some sort of logging or administrative notification, as
>>> >>>>> something in the system is quite confused.
>>> >>>>
>>> >>>>
>>> >>>> Again, I see your point but it seems to me to be a matter of state
>>> >>>> machine style. Note that the "event" is described as a condition, so
>>> >>>> from that point of view, it is true anytime the state is other than
>>> >>>> Down. On the other hand, if you view it as strictly an event, you are
>>> >>>> left with the question of what to put at the intersection of a state
>>> >>>> and event in the table when it is impossible for that event to occur
>>> >>>> in that state. Some people note this with an "N/A" (not applicable)
>>> >>>> entry. In fact, previous TRILL state diagrams such as in RFC 7177 use
>>> >>>> "N/A" so it would probably be simplest to change to that for
>>> >>>> consistency.
>>> >>>
>>> >>>
>>> >>> I think N/A would be good.
>>> >>
>>> >>
>>> >> OK.
>>> >>
>>> >>> ...
>>> >>>
>>> >>>>> Text in section 3.2.2.1 on lifetimes and the information
>>> >>>>> maintenance in section 3.3 imply that the clients and servers must
>>> >>>>> maintain a connection. Presumably, this is required already by the
>>> >>>>> RBridge Channel protocol, and I understand that we should not repeat
>>> >>>>> the entire protocol here. It would seem to make the reader's life MUCH
>>> >>>>> simpler if the text noted that the RBridge Channel protocol requires
>>> >>>>> that there be a maintained connection between the client and the
>>> >>>>> server, and that these mechanisms leverage the presence of that
>>> >>>>> connection.
>>> >>>>
>>> >>>>
>>> >>>> The basic RBridge Channel protocol [RFC7178] is a datagram protocol
>>> >>>> rather than a connection protocol.
>>> >>>> So there is no guaranteed
>>> >>>> continuity of connection between RBridges that have previously
>>> >>>> exchanged RBridge Channel messages. But connection would only be lost
>>> >>>> if the network partitions since RBridge Channel messages look like
>>> >>>> data packets to any transit RBridges and will get forwarded as long as
>>> >>>> there is any route. Network partition is immediately visible in the
>>> >>>> link state database to the RBridges at both ends of an RBridge Channel
>>> >>>> exchange. Section 3.7 provides that if a Pull Directory is no longer
>>> >>>> reachable (i.e., RBridge Channel protocol packets would no longer get
>>> >>>> through), then all pull responses from that Pull Directory MUST be
>>> >>>> discarded since cache consistency update messages can't get through.
>>> >>>> Perhaps a reference to Section 3.7 should be added to Section 3.3.
>>> >>>
>>> >>>
>>> >>> I don't think a reference to 3.7 is sufficient, although it is helpful.
>>> >>> If the protocol is a datagram protocol, and if it is important to discard
>>> >>> data from unreachable pull servers, then I think 3.7 NEEDS to say more
>>> >>> than just ~if you happen to magically figure out you can't reach the
>>> >>> server, discard data it has given you.~ From the rest of the text, this
>>> >>> is an important and unspecified protocol mechanism.
>>> >>
>>> >>
>>> >> Figuring out whether/how you can reach other RBridges is a basic
>>> >> function of TRILL IS-IS based routing, not something "magical".
>>> >> Whenever there is a topology change, an RBridge MUST determine routes
>>> >> to all data reachable RBridges in the new topology.
>>> >> If there was an
>>> >> RBridge previously reachable but no longer reachable, as would be the
>>> >> case for all RBridges on the other side of a network partition, this
>>> >> MUST be noticed so that, for example, all MAC reachability information
>>> >> associated with each of the no longer reachable RBridges can be
>>> >> discarded.
>>> >> It does not seem like much of a stretch to believe that an RBridge would
>>> >> keep track of the Pull Directory or Directories it was using, each of
>>> >> which will be some other RBridge, and notice when a topology change
>>> >> makes any of them inaccessible. But I have no problem adding some
>>> >> wording to make this clearer.
>>> >>
>>> >>> ...
>>> >>> In the flooding flag and behavior, (long text elided) I don't think
>>> >>> there is anything wrong with the intended behavior. It is just that the
>>> >>> very brief description of the FL flag leads the reader to an incorrect
>>> >>> expectation. Yes, it gets sorted out, but that is not good. What I
>>> >>> would suggest is when the flag is defined (with whatever name you
>>> >>> choose) note that "for the qtypes 2, 3, and 4, the flag indicates that
>>> >>> the server should flood its response."
>>> >>
>>> >>
>>> >> We can work on clarifying the wording.
>>> >>
>>> >> Thanks,
>>> >> Donald
>>> >> =============================
>>> >> Donald E. Eastlake 3rd +1-508-333-2270 (cell)
>>> >> 155 Beaver Street, Milford, MA 01757 USA
>>> >> [email protected]
>>> >>
>>> >
>>>
>>>
>>
>> _______________________________________________
>> trill mailing list
>> [email protected]
>> https://www.ietf.org/mailman/listinfo/trill
>>
>
_______________________________________________
trill mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/trill
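
For readers following the Push Directory state machine discussion in this thread, option 2 (two separate going-down states, so that no pseudo variable is needed) can be sketched as a transition table. This is only an illustration and not text from the draft: the state and event names are hypothetical simplifications, and the choice of which state a revoked "going down" returns to is an assumption made for the example.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical, simplified state set; the draft's real machine has more states.
    DOWN = auto()
    ACTIVE = auto()                     # pushing, not advertising completeness
    COMPLETE = auto()                   # pushing and advertising complete information
    GOING_DOWN_WAS_COMPLETE = auto()    # option 2, new state (1)
    GOING_DOWN_WAS_INCOMPLETE = auto()  # option 2, new state (2)


class Event(Enum):
    # Hypothetical event names.
    GO_DOWN = auto()
    GO_DOWN_REVOKED = auto()
    TIMEOUT = auto()


# The prior "complete or not" information lives in the state itself,
# so revoking the shutdown can resume where the directory left off.
TRANSITIONS = {
    (State.ACTIVE, Event.GO_DOWN): State.GOING_DOWN_WAS_INCOMPLETE,
    (State.COMPLETE, Event.GO_DOWN): State.GOING_DOWN_WAS_COMPLETE,
    # Assumption: a revoked shutdown returns to the state it came from.
    (State.GOING_DOWN_WAS_INCOMPLETE, Event.GO_DOWN_REVOKED): State.ACTIVE,
    (State.GOING_DOWN_WAS_COMPLETE, Event.GO_DOWN_REVOKED): State.COMPLETE,
    # Either going-down state moves to Down when its timer expires.
    (State.GOING_DOWN_WAS_INCOMPLETE, Event.TIMEOUT): State.DOWN,
    (State.GOING_DOWN_WAS_COMPLETE, Event.TIMEOUT): State.DOWN,
}


def step(state: State, event: Event) -> State:
    """Return the next state; pairs absent from the table are treated
    as N/A and leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

With this shape, a "go down" that arrives while the directory is advertising complete information is never lost, and a later revocation can be told apart from one that arrives in the incomplete case.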
