Hi all,

On 27/03/2023 10:24, Stephane Bortzmeyer wrote:
> * Unbound implementation is not ready, but I let Yorgos elaborate on
> this point.
The Unbound implementation is far from ready, but the hackathon time was well spent identifying the changes Unbound needs to cleanly support unilateral probing, and looking closely at the draft.

I will continue the development and report back here with the results. Some initial notes, in case you are interested:
- The feature is going to be off by default;
- When turned on, the default further probing configuration will be to
  actively probe new servers in an attempt to ease testing;
- Retaining data across resets, as per section 4.5, will not be included,
  at least not in the initial implementation.

Now on to my comments on the draft; sorry for the wall of text :)


## A - ALPN
Section 4.4 mentions ALPN for the resolver (a MUST, if I read it correctly), but the document does not mention ALPN for the authoritative side at all.
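To make the asymmetry concrete, here is a minimal Python sketch of both sides, assuming the resolver offers the registered "dot" ALPN identifier and does not authenticate the server (opportunistic probing). The function names are hypothetical. The `select_alpn` helper only spells out the RFC 7301 selection rule that a TLS server stack applies internally; it is exactly the authoritative-side behavior the draft currently leaves unstated.

```python
import ssl
from typing import Optional, Sequence

# ALPN protocol IDs registered for DNS over TLS and DNS over QUIC.
DOT_ALPN = "dot"
DOQ_ALPN = "doq"


def make_probe_context() -> ssl.SSLContext:
    """Resolver side (hypothetical): opportunistic TLS context offering "dot"."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False  # must be cleared before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # unauthenticated, opportunistic encryption
    ctx.set_alpn_protocols([DOT_ALPN])
    return ctx


def select_alpn(server_prefs: Sequence[str],
                client_offer: Sequence[str]) -> Optional[str]:
    """Authoritative side (hypothetical): pick the first server-preferred
    protocol that the client offered.

    Returns None when there is no overlap; per RFC 7301 the server would
    then abort the handshake with a no_application_protocol alert.
    """
    for proto in server_prefs:
        if proto in client_offer:
            return proto
    return None
```

For example, a server that prefers DoQ but also serves DoT would pick "dot" for a resolver offering only "dot". Whether an authoritative server SHOULD reject a DoT client that sends no ALPN at all is the open question.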


## B - Resolver source IP
Section 4.5.1 describes keeping state based on the resolver's own source IP. This is to support the guidance from section 3.1 where it says:

      To avoid incurring additional minor timeouts for such a recursive
      resolver, the pool operator SHOULD either:
      * ensure that all members of the pool enable the same encrypted
        transport(s) within the span of a few seconds, or
      * ensure that the load balancer maps client requests to pool
        members based on client IP addresses.

My interpretation of this text is that the first bullet point covers a pool that offers the same transport services on all members, with a slight hiccup during an update, whereas the second bullet point covers a pool whose individual servers offer different transport services.

The worst case for the former is that the pool is labeled as supporting encryption at most 1 day (the damping variable) later, depending on which servers of the pool are reached. This looks fine to me, and no extra state (i.e., the resolver's own source IP) needs to be kept.
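A minimal sketch of why the delay is bounded, assuming damping is the minimum time a resolver waits after a failed encrypted attempt before probing that server address again, and that this state is kept per authoritative IP only. The constant and function names are hypothetical, not taken from the draft or from Unbound.

```python
from typing import Optional

# Hypothetical constant: the 1-day damping value mentioned above, in seconds.
DAMPING_SECONDS = 24 * 60 * 60


def may_probe_encrypted(now: float, last_failure: Optional[float],
                        damping: float = DAMPING_SECONDS) -> bool:
    """Return True if encrypted transport may be tried again for a server.

    Assumes damping is the minimum wait after a failed encrypted attempt;
    state is keyed by the authoritative IP only, not by resolver source IP.
    """
    return last_failure is None or now - last_failure >= damping
```

If the resolver hits a not-yet-upgraded pool member at t=0 and the pool finishes upgrading right after, `may_probe_encrypted(3600, 0)` is still False while `may_probe_encrypted(86400, 0)` is True: the pool is picked up at most one damping period late, with no knowledge of the resolver's source address.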

I find keeping extra state per resolver source IP for the latter case particularly challenging, especially if the resolver is not configured with explicit outgoing interfaces (and thus uses the default route) and needs to learn its own source address from the reply. That address may not be available the next time around, leading to bind()/send() errors and extra retry code paths. And all this while the measure is not guaranteed to solve the different-transport-services-behind-a-single-IP case, since it depends heavily on the network. I understand that a partial rollout is meant to let an authoritative operator test the waters, but I believe using a separate IP for enabling DoT and/or DoQ during testing would make things simpler for both sides.
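For comparison, this is roughly what per-path state would look like, as a hypothetical sketch (the types and `state_for` are illustrative, not from the draft or Unbound). Every authoritative IP now needs one entry per resolver source address, and when the source address is only learned from the reply, as described above, no entry can be selected before sending.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class EncryptedState:
    works: bool = False
    last_failure: Optional[float] = None


# Hypothetical table keyed by (resolver source IP, authoritative IP).
PathTable = Dict[Tuple[str, str], EncryptedState]


def state_for(table: PathTable, src: Optional[str],
              dst: str) -> Optional[EncryptedState]:
    """Return the state entry for this path, creating it if needed.

    src is None when no outgoing interface is configured and the source
    address is only known once a reply arrives; no entry can be chosen
    before sending in that case.
    """
    if src is None:
        return None
    return table.setdefault((src, dst), EncryptedState())
```

With two outgoing addresses, the same authoritative server ends up with two independent entries, one of which may still say "cleartext only".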

I don't wear an operator's hat, but is a pool with varying transport services something that we actively want to support?


## C - Failure identification
The draft talks about successful and unsuccessful DNS replies, and uses SERVFAIL as an example of an unsuccessful one.
Following the pseudo code in the draft, a SERVFAIL answer on all transports, which IMHO is already a usable DNS answer for the resolver, will make the resolver wait for the replies on all transports before considering the SERVFAIL as the final answer.

In my opinion, a reply with any RCODE is a successful DNS answer (provided, of course, that the ID, qname, etc. match). Otherwise we introduce something like a per-transport health check: see which transport replies "better" and use that one. I believe this aligns with Stephane's observation during the hackathon about different answers on ports 53 and 853, and it needs addressing in section 3, which should clearly state that a nameserver's reply to a given query must be the same regardless of the transport used (maybe not the best wording if TC is also to be considered, but I hope I get my message across :)

Maybe also define an unsuccessful "reply" as a timeout or connection shutdown, rather than an undesirable RCODE? Resolvers already have logic to handle the different RCODEs.

What I am trying to say is: do not base the usability of the encrypted transport on the content of the DNS replies. IMHO, as long as DNS replies come back, the encrypted transport is usable and preferable.
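A minimal sketch of the proposed rule, with hypothetical types: a reply counts as a transport success whenever it matches the outstanding query, whatever its RCODE, and only a timeout, a connection shutdown, or a non-matching reply counts as a failure. A real resolver would check more fields (qclass, source address, etc.) than shown here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SUCCESS = "success"  # matching reply, whatever the RCODE
    FAILURE = "failure"  # timeout, connection shutdown, or non-matching reply


@dataclass
class Query:
    qid: int
    qname: str
    qtype: int


@dataclass
class Reply:
    qid: int
    qname: str
    qtype: int
    rcode: int  # 0 NOERROR, 2 SERVFAIL, 3 NXDOMAIN, 5 REFUSED, ...


def classify(query: Query, reply: Optional[Reply]) -> Outcome:
    """Judge the transport, not the answer: RCODE handling stays with the
    resolver's existing logic."""
    if reply is None:  # timed out or the connection was shut down
        return Outcome.FAILURE
    matches = (reply.qid == query.qid
               and reply.qname == query.qname
               and reply.qtype == query.qtype)
    return Outcome.SUCCESS if matches else Outcome.FAILURE
```

Under this rule, a SERVFAIL over DoT marks DoT as working and is handed to the normal RCODE handling, instead of making the resolver wait for the cleartext reply.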


## D - Wording nit
In sections 4.6.2 and 4.6.9 the following is said:

     If R is successful:
     - Return R to the requesting client

It may well be that R is the reply to an internal query, with no requesting client waiting for an answer. Would the following work better?

     If R is successful:
     - R is further processed by the resolver


## E - Possible bug
In sections 4.6.2 and 4.6.9 the following is said after receiving a successful reply:

    - If Q is in N-queries[X]:
      - Remove Q from N-queries[X]

I believe this is a bug and these lines should be removed: later, slower replies from the N transport would then not be allowed to update the relevant metrics, because section 4.6.9 stops further processing with the following text:

    If Q is not in E-queries[X]:
    - Discard R and process it no further (do not respond to a encrypted
      response to a query that is not outstanding)
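The effect can be reproduced with a small hypothetical model of the per-server sets E-queries[X] and N-queries[X] (class and method names are illustrative, not from the draft). With the cross-removal step, the slower reply on the other transport is discarded and its metrics are lost; without it, the late reply still updates the metrics, while R is only processed further once.

```python
from typing import Dict, Set


class OutstandingQueries:
    """Hypothetical model of E-queries[X] / N-queries[X] for a single server X."""

    def __init__(self, drop_from_other: bool) -> None:
        # drop_from_other=True mimics the removal from the other transport's
        # set questioned above; False is the suggested fix.
        self.drop_from_other = drop_from_other
        self.queries: Dict[str, Set[str]] = {"E": set(), "N": set()}
        self.answered: Set[str] = set()
        self.metric_updates = 0

    def send(self, transport: str, q: str) -> None:
        self.queries[transport].add(q)

    def on_successful_reply(self, transport: str, q: str) -> str:
        if q not in self.queries[transport]:
            return "discarded"  # reply to a query that is not outstanding
        self.queries[transport].remove(q)
        self.metric_updates += 1  # e.g. record that this transport works
        if self.drop_from_other:
            other = "N" if transport == "E" else "E"
            self.queries[other].discard(q)
        if q in self.answered:
            return "metrics only"  # R was already processed further
        self.answered.add(q)
        return "processed"
```

Sending Q on both transports and receiving the encrypted reply first, the buggy variant returns "discarded" for the later cleartext reply, whereas the fixed variant returns "metrics only".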


In general I support the idea of the draft, but I believe we need to iron out the expectations on both sides, also in light of Florian's recent comments about per-zone answers and the behavior of threat-intelligence systems.

Thanks for considering and best regards,
-- Yorgos


Some questions were raised about the draft, given the experience with
PowerDNS Recursor:

* If the ADoT server replies but the reply indicates an error,
   such as SERVFAIL or REFUSED, should the resolver retry without
   DoT? PowerDNS Recursor does, but it would seem to make more
   sense to accept the reply, and simply remind system
   administrators that ports 853 and 53 should deliver consistent
   answers. The draft seems clear on the first point (as long as
   there is a properly formatted DNS reply, regard the server as
   DoT-enabled) but not on the second (no clear reminder for
   authoritative name servers).
* What should the criteria be for selecting the authoritative name
   server to query? Should we prefer a fast insecure server or a slow
   encrypted one? The draft does not say, because it is local
   policy. (PowerDNS Recursor apparently has no way to change its
   default policy, which is to use the fastest one, DoT or
   not.) The draft does not mandate such a knob in the resolver
   either; again, the IETF typically does not tell endpoints how they
   have to be configured.
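Such a local-policy knob could be as small as the following sketch. The `prefer_encrypted` flag and the types are hypothetical and do not correspond to any existing PowerDNS Recursor or Unbound option; real server selection is usually more elaborate (e.g. RTT banding and randomization).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    address: str
    srtt_ms: float   # smoothed round-trip time estimate
    encrypted: bool  # encrypted transport currently known to work


def pick_server(cands: List[Candidate], prefer_encrypted: bool) -> Candidate:
    """Hypothetical selection policy for authoritative name servers."""
    if prefer_encrypted:
        # Encrypted servers first (False sorts before True), then fastest.
        return min(cands, key=lambda c: (not c.encrypted, c.srtt_ms))
    # Default described above: fastest server, DoT or not.
    return min(cands, key=lambda c: c.srtt_ms)
```

With a 20 ms cleartext-only server and an 80 ms DoT-capable one, the default picks the former and `prefer_encrypted=True` picks the latter.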


_______________________________________________
dns-privacy mailing list
dns-privacy@ietf.org
https://www.ietf.org/mailman/listinfo/dns-privacy
