On Tue, Apr 14, 2020 at 12:06:41PM -0400, Rich Felker wrote:

> > Well, ISP resolvers and anycast resolvers from Google, Cloudflare,
> > Verisign and Quad9 are generally not too far away.
> 
> If you're on dialup or saturated DSL or cellular link, they're easily
> 300-1000 ms away. Each round trip costs that.

Yes, but the alternative is not getting an answer at all.  I don't see
the point of a fast wrong answer.  The TCP failover does not impact the
vast majority of queries that don't come back with TC=1, so what exactly
is the problem?

> In any case this is a separate topic.

It isn't exactly, because large RRsets get dropped over UDP, and all
you get is an essentially empty response with TC=1.  This can strip
TLSA RRsets, SRV RRsets, multihomed machines with many IPv6 IPs, ...

Again, correctness first, then performance, especially when the
performance cost is only incurred in a minority of cases where
otherwise you get the wrong answer.
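
To make the fallback concrete, here is a minimal sketch in Python (not
musl code).  The TC bit position comes from the RFC 1035 header layout;
the names is_truncated, lookup, send_udp and send_tcp are made up for
illustration, and the two transports are passed in as callables so the
logic can be read without any socket handling.

```python
import struct

TC_FLAG = 0x0200  # truncation bit in the 16-bit DNS header flags field


def is_truncated(reply: bytes) -> bool:
    """Return True if the DNS reply header has TC=1."""
    if len(reply) < 12:
        raise ValueError("short DNS message")
    _msg_id, flags = struct.unpack("!HH", reply[:4])
    return bool(flags & TC_FLAG)


def lookup(query: bytes, send_udp, send_tcp) -> bytes:
    """Try UDP first; retry over TCP only when the UDP reply is truncated."""
    reply = send_udp(query)
    if is_truncated(reply):
        # Only this minority path pays for the extra round trips.
        reply = send_tcp(query)
    return reply
```

Queries whose UDP reply has TC=0 never touch the TCP path, so the cost
is confined to the cases where UDP alone would give a wrong answer.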

> > That was quite some time ago...  This is no longer a problem that needs
> > to be addressed by clients.
> 
> Given your above assumption that everyone is on fiber or similar, I
> think you might be a bit optimistic about what we can rely on...

I am not assuming that everyone is on fiber or similar, I am only
saying that the forwarders for stub resolvers tend to be nearby.
The TCP retry queries the *nearest* resolver that first returned
a TC=1 UDP response, and additional TCP retries can be abandoned
when the first complete answer arrives.  If you're doing DNS over
a dialup link, then it will take longer to get the right answer
than over "broadband".

> > That RFC was published in 2013.  That's long enough ago.
> 
> We support environments that haven't been touched since 2009 or so,
> and to a lesser/minimal-support extent ones that haven't been touched
> since around 2004. Your idea of environments Postfix might be running
> on musl in is very different from the concept of environments that
> arbitrary application binaries linked to musl might be running in.

Nevertheless, the AD bit is on by default in dig and similar tools, with
no reports of any issues in a long time.  Do you see dig fail where musl
libc lookups succeed?  I'm asking around the DNS community for any
evidence of barriers to AD=1; so far nobody knows of any.  I'll try to
find more compelling evidence, but basically tolerating AD=1 (either
ignoring or acting on it per the 2013 RFC) is *required* resolver
behaviour.
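
For reference, setting AD=1 in a query is a single bit in the header
flags.  The sketch below is Python, not musl code, and build_query is a
made-up name; the bit positions are from the standard DNS header, and
the 2013 RFC is presumably RFC 6840 (section 5.7), which is my reading
rather than something stated above.

```python
import struct

RD_FLAG = 0x0100  # recursion desired
AD_FLAG = 0x0020  # authentic data; in a query it asks for the AD bit back


def build_query(name: str, qtype: int, query_id: int, set_ad: bool = True) -> bytes:
    """Build a one-question DNS query, optionally with AD=1 in the header."""
    flags = RD_FLAG | (AD_FLAG if set_ad else 0)
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    labels = name.rstrip(".").encode("ascii").split(b".")
    # Each label is length-prefixed; the name ends with a zero-length label.
    qname = b"".join(bytes([len(l)]) + l for l in labels if l) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class IN
```

The query is otherwise byte-for-byte identical to one with AD=0, which
is why a resolver that ignores the bit still answers normally.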

> So if there's any chance of this breaking there almost certainly needs
> to be a way to turn it off that works even on static binaries.

Whether and where to place such controls is your call.  If novel
/etc/resolv.conf options are not a problem for statically linked
binaries using something other than musl-libc, then you could
have:

    options noad ...

but if that is incompatible with other stub resolver libraries on the
same machine, you may need a private musl-specific configuration file.
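
As a rough sketch of how a stub resolver could honour such a switch
(Python for brevity, not musl code): "noad" is only the hypothetical
option name suggested above, not an existing resolv.conf option, and
ad_enabled is a made-up helper.  The default of True reflects my
preference, not a decision.

```python
def ad_enabled(resolv_conf_text: str, default: bool = True) -> bool:
    """Return False if any 'options' line carries the hypothetical 'noad'."""
    enabled = default
    for line in resolv_conf_text.splitlines():
        # Lines starting with '#' or ';' are comments.
        if line.startswith(("#", ";")):
            continue
        fields = line.split()
        if fields and fields[0] == "options" and "noad" in fields[1:]:
            enabled = False
    return enabled
```

Because the file is read at run time, the switch would also work for
static binaries, which is the requirement raised above.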

My money is on this being unnecessary.  I'll let you know what I find
from dns-operations, and if possible perhaps a RIPE Atlas probe,
assuming they support enabling AD=1.

> > In that case, set the AD bit unconditionally, or provide a documented
> > mechanism to do so via a suitable configuration file.
> 
> Putting it in resolv.conf on an options line is probably the best. The
> main remaining question is just which default to use, and where to
> apply it (at res_mkquery or at res_send).

Your call.

> > Find me a resolver that fails when the AD bit is set.  Stub resolvers
> > that always set it have been around for some time now.
>
> Do you know if the usual Windows, Android, iOS, etc. ones always set
> it? If so it's almost surely safe to do so and this might not even
> need to be an option (which would really be my favorite course of
> action -- making it unconditional so there's no new configuration to
> invent).

Mostly dig, unbound-host, ... Most of the platform C libraries support
DO=1, which obviates the need for AD=1, so they don't do that, but it is
nevertheless safe.  AD=1 is much cheaper than DO=1, because you get back
just the AD bit without the excess RRSIG baggage, which is not needed
when you're not doing your own validation.
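
To show the difference on the wire: AD=1 is one bit in the 12-byte
header every query already has, while DO=1 requires appending an EDNS0
OPT pseudo-record to the additional section (and bumping ARCOUNT).  A
minimal Python sketch of that record follows; opt_record is a made-up
name and the 1232-byte payload size is just an illustrative value.

```python
import struct

OPT_TYPE = 41     # EDNS0 OPT pseudo-RR type
DO_FLAG = 0x8000  # "DNSSEC OK" bit in the flags half of the OPT TTL field


def opt_record(udp_payload_size: int = 1232, do: bool = True) -> bytes:
    """Build an empty EDNS0 OPT record, optionally with DO=1."""
    # TTL field layout: extended RCODE (8 bits), version (8 bits), flags (16 bits).
    ttl = DO_FLAG if do else 0
    # Root owner name, TYPE, CLASS (= UDP payload size), TTL, RDLENGTH = 0.
    return b"\x00" + struct.pack("!HHIH", OPT_TYPE, udp_payload_size, ttl, 0)
```

The 11 extra query bytes are not the real cost; the cost is in the
response, where DO=1 pulls in the RRSIGs.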

-- 
    Viktor.
