Re: [DNSOP] Dnsdir last call review of draft-ietf-dnsop-caching-resolution-failures-03

2023-07-03 Thread Peter van Dijk
Hello Duane & others,

thank you for your response. Comments inline below.


On Thu, 2023-06-29 at 23:58 +, Wessels, Duane wrote:
> 
> 
> > 
> > ## 2.2
> > 
> > The first paragraph correctly mentions "policy reasons". The second 
> > paragraph
> > correctly says "they are not authoritative". I am not sure not being
> > authoritative can be considered a policy reason, so perhaps these two
> > paragraphs can be connected with an "or"?
> 
> I see your point.  We propose this change to the introduction sentence:
> 
> A name server returns a message with the RCODE field set to REFUSED
> when it refuses to process the query, e.g., for policy or other reasons.

Works for me.

> 
> > 
> > ## 3.1
> > 
> > "A resolver MUST NOT retry a given query over a server's transport  more 
> > than
> > twice" - should this be clarified to say "in a short period of time" or
> > something like that? Clearly a retry is allowed *eventually*.
> 
> For reference, here’s the sentence in question at the start of 3.1:
> 
>    A resolver MUST NOT retry a given query over a server's transport more
>    than twice (i.e., three queries in total) before considering the
>    server's transport unresponsive for that query.
> 
> We feel that “a given query” and “for that query” in the sentence 
> sufficiently limits the
> scope here, and there is no need to qualify it by some amount of time.
> 
> As an example, let’s say that a recursive has been asked to lookup 
> www.example.com (our “given” query).  The example.com zone has two name 
> servers, each of which has two IP addresses, and (presumably) two transports. 
>  It can send 3 queries to 199.43.135.53 over UDP (then that transport is 
> unresponsive), 3 queries to 199.43.133.53 over UDP, same over TCP, over IPv6, 
> and so on.  In total the recursive can send 2x2x2x3 = 24 queries before it 
> has to give up if all servers and all transports are unresponsive. At this 
> point the resolver gives up on that query and returns SERVFAIL.
> 
> Then, section 3.2 is about caching and says that the resolution failure MUST 
> be cached for at least 5 seconds, but otherwise gives implementations a lot 
> of freedom in how to do that.  Could be by query tuple, by server/transport, 
> or some other way.

Right! 3.2 solves this.

> > Also, "MUST NOT" is pretty strong language. Given the various process 
> > models of
> > resolver implementations, two subprocesses (threads) both retrying the same 
> > or
> > a similar thing a few times can not always be avoided. Would you settle for
> > SHOULD NOT? The "given" in "retry a given query" gives some leeway, but not
> > enough, I feel.
> 
> We feel that MUST NOT is appropriate but would like more input from working 
> group
> members and implementors especially.

Ok

> > "may retry a given query over a different transport .. believe .. is 
> > available"
> > - this ignores that some transports have better security properties than
> > others. One currently active draft in this area is
> > draft-ietf-dprive-unilateral-probing. Perhaps add some wording, without 
> > being
> > too prescriptive, such as "available, and compatible with the resolver's
> > security policies, ..".
> 
> We think “compatible with the resolver’s security policies” goes without 
> saying, but don’t mind making it explicit.

I am inclined to agree, and will leave this for others to judge.

> > 
> > ## 3.2
> > 
> > A previous review
> > (https://secure-web.cisco.com/1-uwEOxF71cZbW0W3ux-QNC1pO0bJjYJvc0KHnZ_wN4Xw3M1XWB_K8diPjdzzV1zzAfZ98vObLHcs-9USjQPtEzxOdqnjHtcYGPxv8yID-fDRYNW8i8BtGJL-qahSS-JHbS3LHL6Bfm0duG-nUUKdSZF_MOoDFhQymCFnu838N4-l8Ky7xjoVKijU3pbZHLVQFpxjYecSLm0hqLoc4GW9n2Ri-vYT-lKiSPl5qB72Q1kbSUp21qnHSMMrfCCEizICDfjVzCKrwtau5DkwfiR7PVxgh2wT1twgX8oVBhJIY-0QfTaJLnHg7itWRgwH3tcX/https%3A%2F%2Fmailarchive.ietf.org%2Farch%2Fmsg%2Fdnsop%2FsJlbyhro-4bDhfGBnXhhD5Htcew%2F)
> > suggested that the then-chosen tuple was not specific enough, and also said 
> > it
> > was too prescriptive. I agree with both. The current draft prescribes 
> > nothing,
> > which I'm generally a fan of!
> > 
> > However, speaking to a coworker (the one likely responsible for implementing
> > this draft, if it turns out our implementation deviates from its final form)
> > told me "some guidance would be nice". After some discussion on
> > prescriptiveness, here is our suggestion: do not prescribe, but mention
> > (without wanting to be complete) a few tuple formats that might make sense, 
> > and
> > suggest that implementations document what they choose here.
> 
> The relevant text here currently says:
> 
>    The implementation might cache different resolution failure conditions
>    differently.  For example, DNSSEC validation failures might be cached
>    according to the queried name, class, and type, whereas unresponsive
>    servers might be cached only according to the server's IP address.
> 
> So we provide two examples, although not really phrased as “tuples”.  I guess 
> you’re suggesting to see more 

Re: [DNSOP] Dnsdir last call review of draft-ietf-dnsop-caching-resolution-failures-03

2023-06-29 Thread Wessels, Duane
Hi Peter,

Thank you for the detailed review.  Responses from the authors are inline below.



> On Jun 26, 2023, at 7:47 AM, Peter van Dijk via Datatracker 
>  wrote:
> 
> Reviewer: Peter van Dijk
> Review result: Almost Ready
> 
> I have been selected as the DNS Directorate reviewer for this draft. The
> DNS Directorate seeks to review all DNS or DNS-related drafts as
> they pass through IETF last call and IESG review, and sometimes on special
> request. The purpose of the review is to provide assistance to the ADs.
> For more information about the DNS Directorate, please see
> https://secure-web.cisco.com/1C33i4Dh-oRD64xG7a2vNsTYqSkPmK_TtaZK5jIi1iKC1wcUTEdFfOLVw2n8nSUfJfqkbodz3_NdwV50FyoYhfQgSizKNv3M_ep9Yx9lkNt6oLgHQ4Vzp63kYnLezQGHdj0sornqx_SXzMbvj1BAzh7zPtyqF42myyeJXxiCyrr6e3QAdSUEYSAtaWewUcT8nEODhdq9BzBue8j3jhmGSCvOkEqY7Y8Tgfwi52mCcXFC6XNOjWYytI5iJuWrXd6WvjALuITYr8tv11C5mJVNaMj6-jLCgBzJbwJt0tkCxrL4MjxM4GuoUUvZ29P9i3K07/https%3A%2F%2Fwiki.ietf.org%2Fen%2Fgroup%2Fdnsdir
> 
> This document is generally in good shape. It is not too prescriptive, leaving
> room to implementers to honour the requirements in a way that makes sense for
> their implementations. The document has not seen a lot of WG discussion; I 
> hope
> this means people have read it and generally agree.
> 
> Section 3.3 contains a "FOR DISCUSSION" note. I believe this means the 
> document
> cannot currently pass Last Call. (See below for some notes on that discussion
> item.)
> 
> I have various nits and suggestions. Please see them below. Section numbers 
> are
> for -03.
> 
> ## 2
> 
> I know section 2 is not meant to be exhaustive, but I wonder if FORMERR
> deserves a mention. In theory, a FORMERR response will not improve with time
> until an auth operator actively intervenes (by updating/replacing software to 
> a
> more compliant version). SERVFAILs, by comparison, can be much more transient.

Good suggestion, we’ve added a short section on FORMERR.

> 
> ## 2.1
> 
> current:
> 
>> Authoritative servers, and more specifically secondary servers,
>> return server failure responses when they don't have any valid data
>> for a zone.  That is, a secondary server has been configured to serve
>> a particular zone, but is unable to retrieve or refresh the zone data
>> from the primary server.
> 
> proposed:
> 
>> Authoritative servers, and more specifically secondary servers,
>> return server failure responses when they don't have any valid data
>> for a query.  For example, a secondary server has been configured to serve
>> a particular zone, but is unable to retrieve or refresh the zone data
>> from the primary server.

Yes, thanks.

> 
> ## 2.2
> 
> The first paragraph correctly mentions "policy reasons". The second paragraph
> correctly says "they are not authoritative". I am not sure not being
> authoritative can be considered a policy reason, so perhaps these two
> paragraphs can be connected with an "or"?

I see your point.  We propose this change to the introduction sentence:

A name server returns a message with the RCODE field set to REFUSED
when it refuses to process the query, e.g., for policy or other reasons.


> 
> ## 2.3
> 
> "If, however, the implementation does not join outstanding queries together,
> ..." - this could use a reference to 5452 4.3 and 5452 5, pointing out that
> implementations really should be joining queries together for security reasons
> whenever they can, beside the reason given in the draft of not overloading
> authoritatives.

Added.

> 
> ## 3.1
> 
> "A resolver MUST NOT retry a given query over a server's transport  more than
> twice" - should this be clarified to say "in a short period of time" or
> something like that? Clearly a retry is allowed *eventually*.

For reference, here’s the sentence in question at the start of 3.1:

   A resolver MUST NOT retry a given query over a server's transport more
   than twice (i.e., three queries in total) before considering the
   server's transport unresponsive for that query.

We feel that “a given query” and “for that query” in the sentence sufficiently 
limits the
scope here, and there is no need to qualify it by some amount of time.

As an example, let’s say that a recursive has been asked to lookup 
www.example.com (our “given” query).  The example.com zone has two name 
servers, each of which has two IP addresses, and (presumably) two transports.  
It can send 3 queries to 199.43.135.53 over UDP (then that transport is 
unresponsive), 3 queries to 199.43.133.53 over UDP, same over TCP, over IPv6, 
and so on.  In total the recursive can send 2x2x2x3 = 24 queries before it has 
to give up if all servers and all transports are unresponsive. At this point 
the resolver gives up on that query and returns SERVFAIL.

Then, section 3.2 is about caching and says that the resolution failure MUST be 
cached for at least 5 seconds, but otherwise gives implementations a lot of 
freedom in how to do that.  Could be by query tuple, by server/transport, or 
some 

Re: [DNSOP] Dnsdir last call review of draft-ietf-dnsop-caching-resolution-failures-03

2023-06-26 Thread Peter van Dijk
On Mon, 2023-06-26 at 07:47 -0700, Peter van Dijk via Datatracker wrote:
> ## 3.2
> 
> A previous review
> (https://mailarchive.ietf.org/arch/msg/dnsop/sJlbyhro-4bDhfGBnXhhD5Htcew/)
> suggested that the then-chosen tuple was not specific enough, and also said it
> was too prescriptive. I agree with both. The current draft prescribes nothing,
> which I'm generally a fan of!
> 
> However, speaking to a coworker (the one likely responsible for implementing
> this draft, if it turns out our implementation deviates from its final form)
> told me "some guidance would be nice". After some discussion on
> prescriptiveness, here is our suggestion: do not prescribe, but mention
> (without wanting to be complete) a few tuple formats that might make sense, 
> and
> suggest that implementations document what they choose here.

I can't believe I forgot this bit:

While this document is in draft status, an implementation status section
would be great. This would allow us to see if the document is
implementable as is (I'd hope so, as it describes behaviour, with plenty
of developer freedoms), and if implementers make choices for which it
turns out it *does* make sense to perhaps write them down normatively.

Kind regards,
-- 
Peter van Dijk
PowerDNS.COM BV - https://www.powerdns.com/

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


[DNSOP] Dnsdir last call review of draft-ietf-dnsop-caching-resolution-failures-03

2023-06-26 Thread Peter van Dijk via Datatracker
Reviewer: Peter van Dijk
Review result: Almost Ready

I have been selected as the DNS Directorate reviewer for this draft. The
DNS Directorate seeks to review all DNS or DNS-related drafts as
they pass through IETF last call and IESG review, and sometimes on special
request. The purpose of the review is to provide assistance to the ADs.
For more information about the DNS Directorate, please see
https://wiki.ietf.org/en/group/dnsdir

This document is generally in good shape. It is not too prescriptive, leaving
room to implementers to honour the requirements in a way that makes sense for
their implementations. The document has not seen a lot of WG discussion; I hope
this means people have read it and generally agree.

Section 3.3 contains a "FOR DISCUSSION" note. I believe this means the document
cannot currently pass Last Call. (See below for some notes on that discussion
item.)

I have various nits and suggestions. Please see them below. Section numbers are
for -03.

## 2

I know section 2 is not meant to be exhaustive, but I wonder if FORMERR
deserves a mention. In theory, a FORMERR response will not improve with time
until an auth operator actively intervenes (by updating/replacing software to a
more compliant version). SERVFAILs, by comparison, can be much more transient.

## 2.1

current:

> Authoritative servers, and more specifically secondary servers,
> return server failure responses when they don't have any valid data
> for a zone.  That is, a secondary server has been configured to serve
> a particular zone, but is unable to retrieve or refresh the zone data
> from the primary server.

proposed:

> Authoritative servers, and more specifically secondary servers,
> return server failure responses when they don't have any valid data
> for a query.  For example, a secondary server has been configured to serve
> a particular zone, but is unable to retrieve or refresh the zone data
> from the primary server.

## 2.2

The first paragraph correctly mentions "policy reasons". The second paragraph
correctly says "they are not authoritative". I am not sure not being
authoritative can be considered a policy reason, so perhaps these two
paragraphs can be connected with an "or"?

## 2.3

"If, however, the implementation does not join outstanding queries together,
..." - this could use a reference to 5452 4.3 and 5452 5, pointing out that
implementations really should be joining queries together for security reasons
whenever they can, beside the reason given in the draft of not overloading
authoritatives.

## 3.1

"A resolver MUST NOT retry a given query over a server's transport  more than
twice" - should this be clarified to say "in a short period of time" or
something like that? Clearly a retry is allowed *eventually*.

Also, "MUST NOT" is pretty strong language. Given the various process models of
resolver implementations, two subprocesses (threads) both retrying the same or
a similar thing a few times can not always be avoided. Would you settle for
SHOULD NOT? The "given" in "retry a given query" gives some leeway, but not
enough, I feel.

"may retry a given query over a different transport .. believe .. is available"
- this ignores that some transports have better security properties than
others. One currently active draft in this area is
draft-ietf-dprive-unilateral-probing. Perhaps add some wording, without being
too prescriptive, such as "available, and compatible with the resolver's
security policies, ..".

## 3.2

A previous review
(https://mailarchive.ietf.org/arch/msg/dnsop/sJlbyhro-4bDhfGBnXhhD5Htcew/)
suggested that the then-chosen tuple was not specific enough, and also said it
was too prescriptive. I agree with both. The current draft prescribes nothing,
which I'm generally a fan of!

However, speaking to a coworker (the one likely responsible for implementing
this draft, if it turns out our implementation deviates from its final form)
told me "some guidance would be nice". After some discussion on
prescriptiveness, here is our suggestion: do not prescribe, but mention
(without wanting to be complete) a few tuple formats that might make sense, and
suggest that implementations document what they choose here.

## 3.3

> FOR DISCUSSION: the requirement quoted above may be problematic
> today.  e.g., focusing on NS as the query type (a) probably goes
> against qname minimization, and (b) is not the real problem.  Also
> RFC 4697 doesn't place any time restriction (TTL) on this.

*Before* qname minimization, queries that yield delegation answers often did
not have type NS. With qname minimization, depending on the implementation,
those queries might in fact be NS. (7816 specifies NS; 9156 relaxes the qtype
requirement for qname-minimized queries). That said, there is no reason for the
requery (which, as this draft reiterates, MUST NOT be done) to use NS, and so,
I do agree the focus on the NS type should be removed.

As for TTL, the originally received delegation will eventually expire, so the