On Tue, Apr 14, 2020 at 12:06:41PM -0400, Rich Felker wrote:

> > Well, ISP resolvers and anycast resolvers from Google, Cloudflare,
> > Verisign and Quad are generally not too far away.
>
> If you're on dialup or saturated DSL or cellular link, they're easily
> 300-1000 ms away. Each round trip costs that.

Yes, but the alternative is not getting an answer at all.  I don't see
the point of a fast wrong answer.  The TCP failover does not impact the
vast majority of queries that don't come back with TC=1, so what exactly
is the problem?

> In any case this is a separate topic.

It isn't exactly, because large RRsets get dropped over UDP, and all you
get is an essentially empty response with TC=1.  This can strip TLSA
RRsets, SRV RRsets, multihomed machines with many IPv6 IPs, ...

Again, correctness first, then performance, especially when the
performance cost is only incurred in a minority of cases where otherwise
you get the wrong answer.

> > That was quite some time ago... This is no longer a problem that needs
> > to be addressed by clients.
>
> Given your above assumption that everyone is on fiber or similar, I
> think you might be a bit optimistic about what we can rely on...

I am not assuming that everyone is on fiber or similar; I am only saying
that the forwarders for stub resolvers tend to be nearby.  The TCP retry
queries the *nearest* resolver that first returned a TC=1 UDP response,
and additional TCP retries can be abandoned when the first complete
answer arrives.  If you're doing DNS over a dialup link, then it will
take longer to get the right answer than over "broadband".

> > That RFC was published in 2013. That's long enough ago.
>
> We support environments that haven't been touched since 2009 or so,
> and to a lesser/minimal-support extent ones that haven't been touched
> since around 2004. Your idea of environments Postfix might be running
> on musl in is very different from the concept of environments that
> arbitrary applications binaries linked to musl might be running in.

Nevertheless, the AD bit is on by default in dig and similar tools, with
no reports of any issues in a long time.  Do you see dig fail where musl
libc lookups succeed?  I'm asking around the DNS community for any
evidence of barriers to AD=1; so far nobody knows of any.
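
For concreteness, both bits discussed above live in the fixed 12-byte DNS
header, so a stub can set AD=1 on an outgoing query and test TC=1 on a
reply with simple mask operations.  A minimal sketch in C, assuming the
query has already been built (for example by res_mkquery); the function
names are hypothetical and are not musl's API:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Flag layout of the 12-byte DNS header (RFC 1035, section 4.1.1):
 *   byte 2: QR | Opcode(4) | AA | TC | RD
 *   byte 3: RA | Z | AD | CD | RCODE(4)
 * AD in a query asks the resolver to report validation status
 * (RFC 6840, published 2013).
 */
enum {
    DNS_HDR_LEN = 12,
    DNS_FLAG_TC = 0x02, /* in byte 2: response was truncated */
    DNS_FLAG_AD = 0x20  /* in byte 3: authentic data */
};

/* Hypothetical helper: set AD=1 on a built query.
 * Returns 0 on success, -1 if the buffer cannot hold a DNS header. */
int dns_query_set_ad(uint8_t *msg, size_t len)
{
    if (len < DNS_HDR_LEN)
        return -1;
    msg[3] |= DNS_FLAG_AD;
    return 0;
}

/* Hypothetical helper: nonzero if the reply has TC=1, meaning the
 * same query should be retried over TCP to get the full RRset. */
int dns_reply_truncated(const uint8_t *msg, size_t len)
{
    return len >= DNS_HDR_LEN && (msg[2] & DNS_FLAG_TC);
}

/* Hypothetical helper: nonzero if the resolver set AD=1 in the reply. */
int dns_reply_authenticated(const uint8_t *msg, size_t len)
{
    return len >= DNS_HDR_LEN && (msg[3] & DNS_FLAG_AD);
}
```

Only replies with TC=1 would take the extra TCP round trip; everything
else is handled exactly as before.
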
I'll try to find more compelling evidence, but basically tolerating AD=1
(either ignoring or acting on it per the 2013 RFC) is *required*
resolver behaviour.

> So if there's any chance of this breaking there almost certainly needs
> to be a way to turn it off that works even on static binaries.

Whether and where to place such controls is your call.  If novel
/etc/resolv.conf options are not a problem for statically linked
binaries using something other than musl-libc, then you could have:

    options noad ...

but if that is incompatible with other stub resolver libraries on the
same machine, you may need a private musl-specific configuration file.

My money is on this being unnecessary.  I'll let you know what I find
from dns-operations, and if possible perhaps a RIPE Atlas probe,
assuming they support enabling AD=1.

> > In that case, set the AD bit unconditionally, or provide a documented
> > mechanism to do so via a suitable configuration file.
>
> Putting it in resolv.conf on an options line is probably the best. The
> main remaining question is just which default to use, and where to
> apply it (at res_mkquery or at res_send).

Your call.

> > Find me a resolver that fails when the AD bit is set. Stub resolvers
> > that always set it have been around for some time now.
>
> Do you know if the usual Windows, Android, iOS, etc. ones always set
> it? If so it's almost surely safe to do so and this might not even
> need to be an option (which would really be my favorite course of
> action -- making it unconditional so there's no new configuration to
> invent).

Mostly dig, unbound-host, ...  Most of the platform C libraries support
DO=1, which obviates the need for AD=1, so they don't do that, but it is
nevertheless safe.  AD=1 is much cheaper than DO=1, because you get back
just the AD bit without the excess RRSIG baggage, which is not needed
when you're not doing your own validation.

-- 
    Viktor.