Document: draft-ietf-opsawg-rfc5706bis
Title: Guidelines for Considering Operations and Management in IETF 
Specifications
Reviewer: Dave Thaler
Review result: On the Right Track

I am the assigned iotdir reviewer for this draft. For background on iotdir,
please see the [FAQ](https://wiki.ietf.org/en/group/iotdir). Please resolve
these comments along with any other comments you may receive.

A marked-up PDF copy with my comments inline is at
https://1drv.ms/b/c/dc2b364f3f06fea8/IQBuO9rPPwGxRLZZi9kQSJncAT_Aktrn9MuIurcZp88NRjs?e=2boyoh

I found the checklist of key questions in Appendix A to be well written,
useful, and widely applicable to areas across the IETF, including IoT
protocols.  Similarly, the content in the body is well written and useful.

I do, however, have a number of comments on the body of the document,
intended to make it more widely applicable to areas across the IETF. A
summary of my main technical feedback follows; many other minor editorial
points (which I don't expect to need WG discussion) can be found in my
marked-up PDF copy.

1) Applicability of requirement: Three different places in the document make
   three different, contradictory statements about which RFCs would be
   required to have this section.
      a) abstract says all "RFCs in the IETF Stream"
      b) 3.1 says all IETF RFCs "that document a technical specification"
      c) Appendix B says "all new Standards Track RFCs"
   I think section 3.1 is the best and the abstract and Appendix B should
   both be changed.

2) Architecture RFCs: Most places in the document are consistent in saying
   documents that specify a "New Protocol" or "Protocol Extension", but one
   place in section 3.1 throws in "or an architecture".  Generally speaking,
   an implementation does not claim conformance to an architecture/framework
   document, so depending on how it is written and what it contains, it may
   not be considered a “technical specification”, just a roadmap document. In
   that case, the previous paragraph would not require the section in such an
   architecture document. Furthermore, other parts of the document, such as
   the abstract, focus on requiring it in New Protocols and Protocol
   Extensions. As such, I’d remove “or an architecture”. It might be ok in
   the preceding paragraph to clarify that “anything an implementation would
   claim conformance to is considered a technical specification”, and in my
   view that would cover it.

3) Requirements around individual draft -00 submissions: Section 3.1 says
   "early revisions of Internet-Drafts are expected to include an
   Operational Considerations section".  I'd find it a huge process hurdle
   to “expect” all -00 versions of individual drafts to have such a section
   as that would discourage many new entrants from participating in the IETF.
   I might say "encouraged" instead of "expected".

4) Operator: I found much of the document, as currently worded, to be way too
   _network_ operator focused, for a document that creates a requirement for
   all areas, including IoT. Some places say "network operator" and other
   places just say "operator".  If you widen the term "operator" to be any
   person or organization responsible for managing the protocol
   implementations, then "operator" is fine but it should be added to the
   Terminology section.  E.g., is a cloud hosting service an "operator"?  Is
   a standalone DNS server admin an "operator"?  Is an NTP server admin an
   "operator"?  In a home network, is the household member who configures
   devices an "operator"?  I'd want the definition to be such that the answer
   to all of those is Yes (or else pick a different term that is generic),
   so that the recommendations in the document are as widely applicable and
   useful as possible.  The checklist in Appendix A certainly is good already.

   Similarly, there are a bunch of places that only talk about "their network"
   (e.g., section 4.6) and "impact ... on the network" (e.g., section 4.7),
   rather than about "their devices and bandwidth" or whatever.  Impact on
   the network is good to talk about, but from an operations perspective the
   impact on hosts/devices is also important in my view, and it seems to be
   largely missing.

5) Network Operation: Section 4.5 contains a statement:
   > If the protocol specification requires changes to end hosts, it
   > should also indicate whether safeguards exist to protect networks
   > from potential overload.
   
   This statement seems asymmetric and biased in terms of only being from the
   perspective of a network operator. Shouldn’t there be a similar statement
   that if a protocol specification requires changes to routers it should
   indicate whether safeguards exist to protect hosts from potential
   overload? My point is really that it seems to be more about protecting
   one organization from entities that aren’t under their control. In some
   cases the hosts/servers may be more strictly managed than the network
   boxes (e.g., in some home networks), and indexing on host vs network is,
   in my view, not the right axis here if one is going to be asymmetric in
   recommendation. My point is consistent with the wording in 2.1.2 of
   RFC 5218 “Protocols that can be deployed by a single group or team … have
   a greater chance of success than those that require cooperation across
   organizations” (which makes no distinction between network vs host per se).

   Section 5.4.4 (Fault Isolation) is ok but seems overly network centric.
   Say you have a docker container that is misbehaving in some way… the host
   could isolate or quarantine the container. Same for VMs. Or say you have
   a process in a host that is misbehaving… the kernel could isolate or
   quarantine the process. I’d make the wording here more generic and less
   network operator centric. Operations and management is about more than
   just network operators per se.  The guidance is good and just using more
   generic terminology here in terms of stating the principles would make
   the section stronger and more impactful in my view.

6) Internationalization: Section 4.8 suggests that English should be the
   default language in implementations for human readable messages.  I don't
   think this document should make any such recommendation.  I do, however,
   recommend adding that it must also be possible to identify which language
   a message intended for humans is in (e.g., via a language tag). Otherwise,
   it cannot be reliably displayed correctly.
 
   Section 5.5 also has an internationalization issue.  It cites an IAB
   workshop RFC (where such RFCs reflect the consensus of workshop
   participants, not the IAB or IETF per se), and then makes a blanket
   statement about configuration files that "human-readable strings should
   utilize UTF-8" which comes across as saying this is now an IETF consensus
   statement.  There is IETF consensus on UTF-8 _in protocols_, and more
   specifically UTF-8 with NFC (see section 2 of RFC 5198, which can be
   cited as a normative reference here) but not in _device-local files_.
   The IETF has no recommendation about files since they’re outside the
   scope of protocols per se. Different OSes already diverge in terms of
   both normalization form and UTF-8 vs UTF-16. Hence either change the
   text to be about strings in protocols (not textual configuration files
   like the preceding sentence says) or make it clear that it is not an
   IETF recommendation, or else be prepared for an IETF-wide discussion that
   will never converge. 
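
   To illustrate the language tag and NFC points above, here is a minimal
   Python sketch. The `TaggedMessage` type and the "<lang>\n<text>" wire
   layout are purely my own illustrative assumptions (nothing in the draft
   or RFC 5198 defines them); only `unicodedata.normalize` is a real
   standard library call.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedMessage:
    """Human-readable text plus the language tag needed to display it.

    Hypothetical type, for illustration only.
    """
    lang: str  # language tag, e.g. "fr"
    text: str  # always stored in NFC


def make_message(lang: str, text: str) -> TaggedMessage:
    # Normalize to NFC before the string is carried in a protocol message.
    return TaggedMessage(lang=lang, text=unicodedata.normalize("NFC", text))


def encode(msg: TaggedMessage) -> bytes:
    # Assumed wire layout: "<lang>\n<text>", encoded as UTF-8.
    return f"{msg.lang}\n{msg.text}".encode("utf-8")


composed = "caf\u00e9"     # precomposed U+00E9
decomposed = "cafe\u0301"  # "e" followed by combining acute U+0301

# The two spellings differ as raw strings ...
assert composed != decomposed
# ... but are byte-identical on the wire once normalized to NFC.
assert encode(make_message("fr", composed)) == encode(make_message("fr", decomposed))
```

   Without the normalization step, the two spellings would produce different
   bytes, and without the tag a receiver would have to guess the language.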

7) Information Model Design: The document nicely recommends in point 1 of
   5.3.1 to "start with a small set of essential objects", which is great.
   I’ve seen cases where someone exposes everything just because it’s
   there, not because there’s any need (“someone might want it”). As a
   result, querying all state can be burdensome, since the state can be large
   and/or a given value can be expensive to query, and that can also
   discourage people from implementing the query mechanism at all. To
   determine what is “essential”, I usually recommend
   determining what questions need to be answered to troubleshoot, configure,
   etc. and exposing the things that are needed to answer those questions.
   It might help to say something like this to help readers understand what
   is “essential” here.  And I think that's consistent with the purpose of
   Appendix A's checklist.

   Point 2 in 5.3.1 says "Require that all objects be essential for
   management" but I don't follow what that means.  Please elaborate.

   Point 6 says "Avoid causing critical sections to be heavily instrumented".
   I think it’s not just “critical sections” per se, but anything that would
   be expensive. E.g., if someone wants to expose a summary object _rather
   than the components of it from which the client could do the computation_,
   it would still meet criterion 4, but may be expensive to compute.
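
   As a rough sketch of the summary-versus-components point, assuming a
   device that already maintains two cheap counters (the `Counters` type and
   its field names are hypothetical, not taken from the draft), the
   management client can derive the summary itself:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counters:
    """Raw counters the device maintains anyway (hypothetical names)."""
    packets_received: int
    packets_dropped: int


def drop_ratio(c: Counters) -> float:
    """Summary value computed on the management client, not on the device."""
    total = c.packets_received + c.packets_dropped
    # Guard against a device that has seen no traffic yet.
    return c.packets_dropped / total if total else 0.0


sample = Counters(packets_received=90, packets_dropped=10)
assert drop_ratio(sample) == 0.1
```

   The device only has to expose values it is already tracking, and the
   cost of computing the summary moves to whoever actually wants it.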

8) Liveness Detection: section 5.4.1 says:
   > Protocol Designers should always build in basic testing features
   > (e.g., ICMP echo, UDP/TCP echo service, NULL RPCs (remote procedure
   > calls)) that can be used to test for liveness, with an option to
   > enable and disable them.

   I’m not convinced there aren’t exceptions, such as maybe for very
   constrained IoT devices. Recommend removing “always” and just leaving
   it as lower case “should” like other statements in this doc.

9) Configuration Management: Continuing my theme of making the document
   less network-operator-centric in order to apply more generally, including
   to IoT cases... section 5.5 (Configuration Management) comes across to
   me as overly network centric given its section title, which is nicely
   generic. So if you manage a bunch of end hosts, or a bunch of Kubernetes
   pods, or a bunch of IoT devices, or a bunch of VMs on a cloud service,
   or a bunch of processes on one or more devices, this section should
   still apply, but it provides little or no guidance. Either change the
   section title and narrow the scope, or else (my preference) broaden the
   discussion. For example, it would be remiss to not mention Kubernetes
   in a general discussion of configuration management. Similarly, for
   IoT devices there are various centralized configuration management
   services such as Balena, SocketXP, Golioth, ThingsBoard, etc.  One need
   not name them (I wouldn't), but simply acknowledging the existence of
   popular centralized management platforms would seem appropriate.

10) Operational Considerations section: The rest of the document already
   says that this section shouldn't be required in documents that aren't
   technical specifications and section 3.1 specifically uses process
   documents as an example of when they're not required.  Since this
   document itself is a process document, it's not required, so why is
   it here?  If you do keep this section, you could say explicitly that
   it's not required in a document of this type, so people don’t try
   to use this as a precedent to create barriers that aren’t required.

11) Network Device: This term is used in several places (e.g., section 9
   among others) without definition.  Is it "a device managed by a
   network operator"?  Is it "any device on the network, whether
   router or end host"?  Is it "a device that implements the New Protocol
   or Protocol Extension in question"?  If you use this term, an entry
   in the Terminology section might help.

12) Password-based authentication: Section 9 (Security Considerations)
   says "The security implications of password-based authentication should
   be taken into account when designing a New Protocol or Protocol
   Extension."  True, but this should already be stated in other RFCs and
   is not specific to O&M considerations per se. So is this sentence really
   needed in _this_ document too?  It seems anachronistic to me, even
   though it's clearly good advice.

Dave Thaler

