* Rich Felker:

> On Wed, Apr 15, 2020 at 08:27:08PM +0200, Florian Weimer wrote:
>> >> I don't understand your PTR example.  It seems such a fringe case that
>> >> people produce larger PTR responses because they add all virtual hosts
>> >> to the reverse DNS zone.  Sure, it happens, but not often.
>> >
>> > I think it's probably more a matter of the concurrent lookups from
>> > multiple nameservers (e.g. local, ISP, and G/CF, where local has
>> > fastest round-trip but not much in cache, G/CF has nearly everything
>> > in cache but slowest round trip, and ISP is middle on both) than lack
>> > of tcp fallback that makes netstat etc. so much faster.
>> 
>> The question is: Why would you get a TC bit response?  Is the musl
>> resolver code triggering some anti-spoofing measure that tries to
>> validate source addresses over TCP?  (I forgot about this aspect of
>> DNS.  Ugh.)
>
> TC bit is for truncation, and means that the complete response would
> have been larger than 512 bytes and was truncated to whatever number
> of whole RRs fit in 512 bytes.

You mentioned that TC processing added observable latency to the
netstat tool.  netstat performs PTR queries.  Non-DNSSEC responses to
PTR queries are rarely larger than 512 bytes.  (The only exception I
have seen occur when people list all their HTTP virtual hosts in PTR
records, but again, that's very rare.)  Typically, they are less than
150 bytes.  Non-minimal responses can be larger, but the additional
data is removed without setting the TC bit.
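For reference, truncation is signalled in the fixed DNS header: the
TC flag is bit 0x02 of the third header byte (RFC 1035, section
4.1.1).  A minimal sketch of the check, my own illustration rather
than code from any particular resolver:

    #include <stddef.h>

    /* Return 1 if the server set the TC (truncated) bit, 0 if not,
       -1 if the buffer is shorter than the 12-byte DNS header.  */
    static int dns_response_truncated(const unsigned char *resp,
                                      size_t len)
    {
        if (len < 12)
            return -1;
        return (resp[2] & 0x02) != 0;
    }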

This is why something very odd must have happened during your test.
One explanation would be a middlebox that injects TC responses to
force clients onto TCP, thereby validating their source addresses.

>> > However it's not clear how "fallback to tcp" logic should interact
>> > with such concurrent requests -- switch to tcp for everything and
>> > just one nameserver as soon as we get any TC response?
>> 
>> It's TCP for this query only, not all subsequent queries.  It makes
>> sense to query the name server that provided the TC response: It
>> reduces latency because that server is more likely to have the large
>> response in its cache.
>
> I'm not talking about future queries but other unfinished queries that
> are part of the same operation (presently just concurrent A and AAAA
> lookups).

If the second response has TC set (but not the first), you can keep
the first response.  Re-querying both over TCP would increase the
likelihood of getting answers from the same cluster node (and thus
more consistency), but UDP never gives you that guarantee anyway, so
I don't think it matters.
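To make that per-query policy explicit, here is a minimal sketch;
struct lookup, resolve_truncation, and requery_over_tcp are names I
made up for illustration, not musl interfaces:

    #include <stddef.h>

    /* One entry per in-flight lookup, e.g. the concurrent A and
       AAAA queries mentioned above. */
    struct lookup {
        unsigned char resp[512];
        size_t len;
        int truncated;   /* TC bit of the UDP response */
        int server;      /* index of the server that answered */
    };

    /* Assumed helper: re-send this lookup's query over TCP to the
       given server and replace resp/len with the TCP answer. */
    static void requery_over_tcp(int server, struct lookup *lk);

    /* Keep any complete UDP answer as-is; retry only a truncated
       lookup, over TCP, against the server that set TC, since that
       server most likely has the large answer in its cache. */
    static void resolve_truncation(struct lookup *lk)
    {
        if (lk->truncated)
            requery_over_tcp(lk->server, lk);
    }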

If the first response has TC set, you have an open TCP connection
that could carry the second query as well.  However, pipelining DNS
requests has compatibility issues because there is no
application-layer connection teardown (an equivalent of HTTP's
Connection: close).  If the server closes the connection after
sending the response to the first query, without reading the second,
that is a TCP data loss event: it results in an RST segment and,
potentially, loss of the response to the first query.  Ideally, a
client would wait for both the second UDP response and the TCP
response to arrive.  If the second UDP response has TC set as well,
the second TCP query should be delayed until the first TCP response
has arrived.
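For completeness: over TCP, each DNS message carries a two-byte,
network-byte-order length prefix (RFC 1035, section 4.2.2), and
there is no in-band way to announce "this is my last query", which
is what makes the close-versus-pipelining race above possible.  A
minimal sending sketch, assuming fd is an already-connected TCP
socket to the name server:

    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Send one DNS query over TCP with the RFC 1035 length prefix.
       A single write() keeps the prefix and the message together,
       so Nagle's algorithm cannot hold back the body. */
    static ssize_t send_tcp_query(int fd, const unsigned char *q,
                                  size_t qlen)
    {
        unsigned char buf[2 + 512];

        if (qlen > 512)
            return -1;
        buf[0] = qlen >> 8;
        buf[1] = qlen & 0xff;
        memcpy(buf + 2, q, qlen);
        return write(fd, buf, 2 + qlen);
    }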

(We should move this discussion someplace else.)
