(Due to weirdness with email, the WGLC announcement for me came on DNSSD and not DNSOP. Shoulder shrug - something to debug for later.)
I was told of this last call, referring to the document at
https://tools.ietf.org/html/draft-ietf-dnsop-session-signal-05

Overall - the approach looks promising. I would want to see this run
through workshops to see how compatible it is with the installed base
before too much is built on top of it. The idea of state-per-channel
has long been a taboo subject, one that I think should be broken;
nevertheless, we have to understand this change in thinking. "Corner
cases" are always a concern.

Section 2, terminology

# The unqualified term "session" in the context of this document means
# the exchange of DNS messages over a connection where:
#
# o The connection between client and server is persistent and
#   relatively long-lived (i.e., minutes or hours, rather than
#   seconds).
#
# o Either end of the connection may initiate messages to the other.

I fear that this may cause confusion down the line, given the history
of how terms have been used in previous RFC documents. A session, to
those who recall the 7-layer OSI model, conjures up ideas of end-to-end
associations that make use of one or more "transports", or connections.
I.e., a session might use multiple transport channels in parallel, or
may use successive transport channels over time. In this use, the DSO
session refers to the management of the transport connection between
two intermediate elements of the DNS.

# A "DSO Session" is established between two endpoints

The endpoints - is that the stub resolver and authoritative server, or
(in DNSSEC terms) the signer and the validator? I believe here it is
the client-server actors for the channel.

# A "DSO Session" is terminated when the underlying connection is
# closed.

This is in conflict with the older notion of a session persisting
across transports.
To support that notion, using (for convenience)
https://en.wikipedia.org/wiki/Session_layer, this is stated: "If a
connection is not used for a long period, the session-layer protocol
may close it and re-open it."

Note - this is terminology, not conceptual. However, I've seen
terminology become a greater problem "down the road" as new people come
into the field.

As a suggestion I'd work in that this is "DNS channel management". It
sounds like the document is defining a DNS channel management protocol.
Or perhaps DNS transport management protocol.

Section 4.1

This section intermingles text on whether or not each DSO request
elicits a response and the process of DSO "session" establishment.
With proper editing, these should be separated to lessen confusion.

Section 4.1.1

Requirements (first MUST) and recommendations to operate in a certain
way tend to become dated quickly. Instead of placing requirements on
clients to act good, let the server refuse workload. I am thinking of
the issue surrounding the iterations in NSEC3 and recent surveys of
operators. Despite the documents saying a low value is better,
operators use high values.

With that in mind, I'd not have "clients MUST take care". If not for
the reasoning above, then "how does one test whether a client has taken
care"? Instead, reinforce the notion that a server has the right to
deny opening a connection on its own grounds (local policy), including
load considerations.

Section 4.1.3

This section reinforces the notion that this is "transport channel
management" and not "session management in the OSI layer 5 sense."

Section 4.2.1

This phrase is confusing and unnecessary: "this is a fatal error". The
logic to that point is clear that the situation doesn't happen and the
prescribed behavior ("close the connection") makes sense.

This is clumsy, when describing the RCODE: "generally set to zero on
transmission, and silently ignored on reception, except". I'd suggest
saying ...
well, reading it again, I don't understand what the paragraph is
saying. It starts out with RCODE being 0 on send, then except when it
conveys the reason for termination... which I'd expect to be in a
response. However, it might be better to say that the RCODE value may
be set according to the definition of the request, but in most cases
will be 0. Maybe?

Section 4.2.2

"Unacknowledged request messages are only appropriate in cases where
the sender already knows that the receiver supports and wishes to
receive these messages."

This passage causes concern in me. The entire notion of unacknowledged
requests troubles me as a protocol designer. There are three outcomes
of sending a request over a reliable transport - the receiver doesn't
understand it, the receiver acts accordingly, or the receiver
misinterprets the request. The latter includes software bugs (or other
unintended consequences) and could cover outright refusal to perform.

It's not that an acknowledgement is needed; the question is how the
sender can confirm that the request was handled to the sender's
expectations. (We have this problem, for example, with "Automated
Updates of DNSSEC Trust Anchors", where there is no feedback loop.)

Ok, I get this: "For example, after a client has subscribed for Push
Notifications" as a plausible use case. In this case I see that the
requests do not need message responses, since turning them on and off
is done via an acknowledged request action. Maybe it's the term that is
confusing, but I can see the concept. Perhaps these are not
"unacknowledged requests" but "subsequent responses".

Section 4.2.2.1

"Where domain names appear within TYPE-DEPENDENT DATA, they MAY be
compressed using standard DNS name compression [RFC1035]."

Do not do name compression! No No No No.
The compression was originally defined for the well-known types (in the
STD 13 documents), then came to be used for newer and newer ones up
through DNSSEC - until someone realized that this was a mistake.
Consult "Handling of Unknown DNS RR Types" (RFC 3597), specifically
this:

   To avoid such corruption, servers MUST NOT compress domain names
   embedded in the RDATA of types that are class-specific or not
   well-known. This requirement was stated in [RFC1123] without
   defining the term "well-known"; it is hereby specified that only
   the RR types defined in [RFC1035] are to be considered "well-known".

Section 4.2.2.4

"If DSO request is received containing an unrecognized Primary TLV,
with a zero MESSAGE ID (indicating that no response is expected), the
receiver MUST silently ignore the message. A response MUST NOT be
sent."

I would have thought this would warrant tearing down the connection,
given the words earlier that this ought never happen. (I'd want the
receiver to alert the sender that there's perhaps a capability
mismatch, though.) Silently ignoring a problem is an example of a
receiver acting in a manner that is not expected by the sender, and it
leads to wedged state machines.

Section 4.3

"The namespaces of 16-bit MESSAGE IDs are disjoint in each direction.
For example, it is *not* an error for both client and server to send a
request message with the same ID."

This will, someday, confuse a young and inexperienced DNS hosting
engineer. Precedent - having DNSKEYs with the same key_id is allowed by
protocol, but popular DNSSEC key management tools will discard any key
matching another's key_id, to preserve the sanity of the human who will
debug.

Section 5.3

'Just because a DSO Session has no traffic for an extended period of
time does not automatically make that DSO Session "inactive", if it has
an active operation that is awaiting events.'

There will be a need to fight cruft, or garbage collect. Inactive
objects tend to be forgotten while still using up resources.
There ought to be some means to manage what might be otherwise
"forgotten." Like subscribing to events that are no longer coming.

Section 5.5

"and attempt re-connection if appropriate."

I thought that if a connection ends, the DSO session ends.

Section 5.6.3.2

Sometimes a client can't distinguish this: "If reconnecting to the same
server," as some server processes have multiple addresses and names.
The following MUST ought to apply to "the same IP address" at best.
(But if the underlying routing of anycast traffic changes, that's
overkill.) Still, make this rule specific to the IP address (and maybe
port number), not to the "server".

Section 6.2

"The RECOMMENDED value is 10 seconds."

Probably a bad idea to codify this, because implementations will set it
to exactly 10 and not scatter it when they should. (Like closing out
many connections in a load-shedding panic.)

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
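
To illustrate the Section 6.2 concern about every implementation
settling on exactly 10 seconds, here is a minimal sketch of scattering
a nominal interval. The function name, the "spread" parameter, and the
choice of a uniform distribution are my own assumptions for
illustration; none of them comes from the draft.

```python
import random


def scattered_interval(nominal=10.0, spread=0.5, rng=random):
    """Return a value drawn uniformly from nominal*(1-spread) to nominal*(1+spread).

    Hypothetical helper: 'nominal', 'spread', and the uniform
    distribution are illustrative assumptions, not taken from the draft.
    """
    if nominal < 0 or not 0.0 <= spread <= 1.0:
        raise ValueError("nominal must be >= 0 and spread within [0, 1]")
    low = nominal * (1.0 - spread)
    high = nominal * (1.0 + spread)
    return rng.uniform(low, high)


if __name__ == "__main__":
    # A seeded generator keeps the demonstration repeatable.
    demo_rng = random.Random(1)
    print([round(scattered_interval(rng=demo_rng), 2) for _ in range(5)])
```

If each peer picks its own value this way, actions triggered by the
timer spread over a window rather than all landing on the same second.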