On Fri, Dec 10, 2021 at 5:03 PM Daniel Kahn Gillmor wrote:
> Hi Ben--
>
> Thanks for the prompt review and feedback!
>
> On Thu 2021-12-09 16:14:36 -0500, Ben Schwartz wrote:
> > The SNI guidance looks good to me.
> >
> > I find it confusing to mention ECH in this draft. ECH can never be used
> > with this specification, because there is (by definition) no SVCB record
> > to provide the ECH keys. (If there is a SVCB record in play, then we are
> > no longer in "unilateral probing".)
>
> given that we haven't established what the signalling mechanism
> is/should be for authoritative dprive, i'm not entirely sure that we're
> out of the realm of unilateral probing here. (for example, does the
> mere presence of SVCB indicate a hard-fail condition?) I think that's
> up to the other draft. ☺
>
The terminology section says

  * "unilateral" means capable of opportunistic probing deployment
    without external coordination with any of the other parties

To my eye, that excludes any way of delivering ECHConfigs, whether or not
SVCB becomes the preferred mechanism for that.

In general, I think this draft should try to be clear that it is restricted
to the case where no additional DNS queries are performed.

> That said, i don't think i'd have an objection to removing the ECH
> reference (or at least trimming it down to only be relevant for the
> discussion of potential leak due to SNI in the privacy considerations
> section)
>
> > I did notice one issue with -01:
> >
> >> To avoid incurring additional minor timeouts for such a recursive
> >> resolver, the pool operator SHOULD either:
> >>
> >> * ensure that all members of the pool enable the same encrypted
> >>   transport(s) simultaneously, or
> >>
> >> * ensure that the load balancer maps client requests to pool members
> >>   based on client IP addresses.
> >
> > The first option seems a bit unrealistic.
>
> It might be unrealistic for some pool operators, but it's surely not
> unrealistic for all pool operators (for some plausibly-fuzzy definition
> of "simultaneously")
>
Perhaps "within the span of a few seconds" would be clearer.

> > I would replace it with "ensure that any members of the pool return an
> > explicit rejection packet (e.g. TCP RST) if they do not support the
> > encrypted protocol, or".
>
> While this is good guidance in general for authoritative servers (i'd
> include "ICMP port unreachable" in the list alongside TCP RST, and maybe
> some QUIC-specific signalling?), i'm not convinced it belongs in this
> section about authoritatives behind a pool.
>
> In particular, i don't think the consequences of this approach would
> yield a healthy pool, which is why i didn't include it in the list
> initially.
>
> First, if a pool's load balancer can't reliably map traffic from the
> client at IP address X to pool member Y at all, then any sort of
> stream-based protocol (whether that's DoT or DoQ or even Do53 over TCP)
> is going to fail in pretty terrible ways.
>
I'm not sure this is true. A 5-tuple load balancer, for example, would
preserve stream continuity but fail for the purpose of this section.
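
A rough sketch of the 5-tuple point, in case it helps. The pool member
names, the addresses, and the SHA-256-modulo selection are all assumptions
made for illustration, not a description of any real load balancer:

```python
import hashlib

# Hypothetical pool behind a single service address.
POOL = ["member-a", "member-b", "member-c", "member-d"]


def _pick(key: str) -> str:
    """Map a flow key to a pool member with a stable hash (illustrative only)."""
    digest = hashlib.sha256(key.encode()).digest()
    return POOL[int.from_bytes(digest[:8], "big") % len(POOL)]


def by_five_tuple(src_ip, src_port, dst_ip, dst_port, proto):
    # Every packet of one connection shares the 5-tuple, so a stream stays
    # on one member; a new connection (new source port) may not.
    return _pick(f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}")


def by_client_ip(src_ip, src_port, dst_ip, dst_port, proto):
    # Only the client address is hashed, so every connection from that
    # address reaches the same member.
    return _pick(src_ip)


client, vip = "192.0.2.10", "198.51.100.53"
ports = range(40000, 40064)  # 64 successive connections from one client

five_tuple_members = {by_five_tuple(client, p, vip, 853, "tcp") for p in ports}
client_ip_members = {by_client_ip(client, p, vip, 853, "tcp") for p in ports}

print("5-tuple hashing reached:", sorted(five_tuple_members))
print("client-IP hashing reached:", sorted(client_ip_members))
```

Under these assumptions each individual stream is stable, but successive
streams from one client can reach different members, which is the case
the pool section is trying to avoid.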

I also don't think IP-based load balancing is technically sufficient. For
large resolvers with multiple "exit" IPs, there is (currently) no
requirement that the state estimate for a given destination IP be
partitioned by the resolver's exit IP.
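
To illustrate the exit-IP concern, here is a minimal sketch. The exit
addresses, the authoritative address, and the `SUPPORTS_ENCRYPTION` table
are hypothetical; the table stands in for "the pool member that an
IP-hashing load balancer assigns to this exit IP":

```python
# Hypothetical: whether the pool member reached from each resolver exit IP
# supports the encrypted transport.
SUPPORTS_ENCRYPTION = {
    "203.0.113.1": True,   # this exit IP lands on an upgraded member
    "203.0.113.2": False,  # this exit IP lands on a member not yet upgraded
}

AUTH = "198.51.100.53"  # the pool's service address, as the resolver sees it


def probe(exit_ip: str) -> bool:
    """Pretend to probe AUTH from the given exit IP."""
    return SUPPORTS_ENCRYPTION[exit_ip]


shared_state = {}       # keyed by destination IP only
partitioned_state = {}  # keyed by (exit IP, destination IP)

for exit_ip in SUPPORTS_ENCRYPTION:
    ok = probe(exit_ip)
    shared_state[AUTH] = ok                  # later probes overwrite earlier ones
    partitioned_state[(exit_ip, AUTH)] = ok  # each exit IP keeps its own view

print("shared:", shared_state)
print("partitioned:", partitioned_state)
```

With state keyed only by destination, the result learned from one exit IP
is applied to queries sent from the other, so the estimate is wrong for at
least one of them even though the load balancer is mapping by client IP.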

> If we assume that the load balancer is capable of allocating persistent
> stream-like flows (including QUIC sessions if DoQ is in the mix?) to
> specific pool members, but randomly allocates stream-initiating packets
> from the same client IP address to different pool members, then we'll
> have the problem that a client will "learn" that an encrypted transport
> is available to that authoritative, and upon the next stream initiation
> might land on a pool member that doesn't implement that encrypted
> transport.
>
> In effect, the presence of encrypted transport will "flap" for any given
> client.
>
> While the consequences will be relatively small (even if the RST or port
> unreachable messages are swallowed by the network, the default `timeout`
> parameter for establishing the encrypted transport has fast expiry),
> clients will still incur at least an extra serialized round-trip on each
> "flap",

Yes, but this is no worse than the handshake of the encrypted transport we
are seeking to bootstrap, so it's a performance cost that will be borne
anyway.

> and if the allocations are truly random they'll be frequently
> "damped" into not trying encrypted transport for a full day.
>
This is interesting. Maybe the long damping should only apply if the
request timed out, as opposed to being rejected within a few milliseconds.
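
As a sketch of what that could look like: the one-day figure comes from
the discussion above, while the 60-second retry for explicit rejections
and all of the names here are placeholders I made up, not values or
identifiers from the draft:

```python
from enum import Enum


class ProbeResult(Enum):
    SUCCESS = "success"
    REJECTED = "rejected"    # explicit refusal, e.g. TCP RST or ICMP port unreachable
    TIMED_OUT = "timed_out"  # no answer before the probe timeout expired


LONG_DAMPING_S = 24 * 60 * 60  # "a full day", as discussed in this thread
SHORT_DAMPING_S = 60           # hypothetical short retry after a fast rejection


def damping_seconds(result: ProbeResult) -> int:
    """How long to wait before probing the encrypted transport again."""
    if result is ProbeResult.SUCCESS:
        return 0
    if result is ProbeResult.REJECTED:
        # The failure cost almost nothing, so retrying soon is cheap.
        return SHORT_DAMPING_S
    # A timeout cost the full wait, so back off for the long period.
    return LONG_DAMPING_S


def next_probe_time(now: float, result: ProbeResult) -> float:
    return now + damping_seconds(result)


for outcome in ProbeResult:
    print(outcome.value, damping_seconds(outcome))
```

Whether a short retry interval is safe depends on how often a resolver
would then re-probe a server that never intends to deploy encryption, so
the placeholder value should not be read as a recommendation.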

> How would you feel about adding the guidance you suggested more
> generally to the overall guidance for authoritative servers?

I don't have specific thoughts on how to structure the guidance yet.