* Rich Felker:

> On Wed, Apr 15, 2020 at 07:19:43PM +0200, Florian Weimer wrote:
>> * Rich Felker:
>> 
>> > This is true for users running local nameservers, which ideally will
>> > eventually be everyone, but at present that's far from the case.
>> > Differences like concurrent attempts from multiple nameservers and/or
>> > lack of TCP fallback on TC are what makes netstat fast on musl vs
>> > repeatedly stalling for multiple seconds at a time on other
>> > implementations. I don't have any data on how often TC happens and if
>> > it's actually a big part of the difference, so this is probably worth
>> > exploring. But I think it's a separate topic from the issue with DANE
>> > on Postfix, so let's set it aside and pick that back up on the musl
>> > list or elsewhere later.
>> 
>> qmail famously used a 512 byte buffer for the DNS response (the same
>> amount that can fit into a UDP DNS response), and it wasn't enough for
>> some MX responses at the time.  Pretty much everyone using qmail
>> eventually had to patch around this.  (There were also problematic ANY
>> queries, if I recall correctly.)
>
> I'd be interested in reading more on this if you know any references.
> Over 512 bytes of MX records seems like a lot, and seems like a really
> bad idea for a domain configuration since there have always been (as
> you noted, with qmail) compatibility problems with not all sites being
> able to resolve them.

The patch is here:

  <http://www.memoryhole.net/qmail/#oversize-dns>

You can find some historic discussions using the term
“CNAME_lookup_failed_temporarily”.  It seems aol.com was affected at
the time.  It may have been a combination with the ANY lookups qmail
performed (instead of querying for A records directly) to see whether
the name started a CNAME chain.

(There was once a belief that it was necessary to fail mail delivery
to domains whose MX records had the start of a CNAME chain on the
right-hand side.)

>> DNS practices for mail have changed since then.  Maybe you can get
>> away with a 512 byte response buffer these days if you don't use
>> DNSSEC.
>
> "If you don't use DNSSEC" is ambiguous. As long as DNSSEC is being
> validated in the nameserver the stub contacts (which should be local
> to have reasonable trust properties), "using" DNSSEC does not impose
> any additional response size requirements on the application/stub
> resolver.

I meant: setting the DO bit in the query.  Sorry.
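To make the distinction concrete: the DO bit is not in the DNS header
at all; it lives in the EDNS(0) OPT pseudo-record (RFC 6891, RFC 3225),
whose 32-bit TTL field carries extended RCODE, version, and flags.  A
minimal sketch of building such a query with only the standard library
(the query ID and the 1232-byte payload size are arbitrary choices for
illustration, not anything a particular resolver does):

```python
import struct

def build_query(qname, qtype=1, dnssec_ok=True, udp_payload=1232):
    """Build a minimal DNS query.  When dnssec_ok is True, append an
    EDNS0 OPT pseudo-RR with the DO bit set (RFC 6891 / RFC 3225)."""
    header = struct.pack(">HHHHHH",
                         0x1234,                 # ID (fixed for illustration)
                         0x0100,                 # flags: RD set
                         1,                      # QDCOUNT
                         0, 0,                   # ANCOUNT, NSCOUNT
                         1 if dnssec_ok else 0)  # ARCOUNT: the OPT record
    question = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    ) + b"\x00" + struct.pack(">HH", qtype, 1)   # QTYPE, QCLASS=IN
    packet = header + question
    if dnssec_ok:
        # OPT pseudo-RR: root name, TYPE=41, CLASS=requestor's UDP
        # payload size, TTL=ext-RCODE/version/flags.  DO is bit 15 of
        # the flags half, i.e. 0x8000 in the low 16 bits of the TTL.
        ttl = 0x00008000
        packet += b"\x00" + struct.pack(">HHIH", 41, udp_payload, ttl, 0)
    return packet
```

With dnssec_ok=False the packet is a plain pre-EDNS query, and the
server may truncate anything over 512 bytes; with the OPT record
present, the advertised payload size governs instead.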

>> I don't understand your PTR example.  It seems such a fringe case that
>> people produce larger PTR responses because they add all virtual hosts
>> to the reverse DNS zone.  Sure, it happens, but not often.
>
> I think it's probably more a matter of the concurrent lookups from
> multiple nameservers (e.g. local, ISP, and G/CF, where local has
> fastest round-trip but not much in cache, G/CF has nearly everything
> in cache but slowest round trip, and ISP is middle on both) than lack
> of tcp fallback that makes netstat etc. so much faster.

The question is: Why would you get a TC bit response?  Is the musl
resolver code triggering some anti-spoofing measure that tries to
validate source addresses over TCP?  (I forgot about this aspect of
DNS.  Ugh.)

> However it's not clear how "fallback to tcp" logic should interact
> with such concurrent requests -- switch to tcp for everything and
> just one nameserver as soon as we get any TC response?

Fallback is TCP for this query only, not for all subsequent queries.
It makes sense to retry against the name server that provided the TC
response: that reduces latency, because that server is more likely to
have the large response in its cache.
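A sketch of that per-query logic, under my reading of it (the function
names are mine; the 2-byte length prefix on the TCP retry is what RFC
1035 section 4.2.2 requires):

```python
import socket
import struct

def is_truncated(response):
    """True if the TC bit (bit 9 of the DNS header flags word) is set."""
    if len(response) < 12:
        return False
    flags = struct.unpack(">H", response[2:4])[0]
    return bool(flags & 0x0200)

def retry_over_tcp(query, server, timeout=5.0):
    """Re-send the same query over TCP, to the same server whose UDP
    reply was truncated.  DNS-over-TCP frames each message with a
    2-byte big-endian length prefix (RFC 1035)."""
    with socket.create_connection((server, 53), timeout=timeout) as s:
        s.sendall(struct.pack(">H", len(query)) + query)
        (rlen,) = struct.unpack(">H", s.recv(2))
        buf = b""
        while len(buf) < rlen:
            chunk = s.recv(rlen - len(buf))
            if not chunk:
                break
            buf += chunk
        return buf
```

The point is that the TC check and the retry are scoped to one query
and one server; nothing about the resolver's overall transport choice
changes.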
