* Rich Felker:

> On Wed, Apr 15, 2020 at 08:27:08PM +0200, Florian Weimer wrote:
>> >> I don't understand your PTR example. It seems such a fringe case that
>> >> people produce larger PTR responses because they add all virtual hosts
>> >> to the reverse DNS zone. Sure, it happens, but not often.
>> >
>> > I think it's probably more a matter of the concurrent lookups from
>> > multiple nameservers (e.g. local, ISP, and G/CF, where local has
>> > fastest round-trip but not much in cache, G/CF has nearly everything
>> > in cache but slowest round trip, and ISP is middle on both) than lack
>> > of tcp fallback that makes netstat etc. so much faster.
>>
>> The question is: Why would you get a TC bit response? Is the musl
>> resolver code triggering some anti-spoofing measure that tries to
>> validate source addresses over TCP? (I forgot about this aspect of
>> DNS. Ugh.)
>
> TC bit is for truncation, and means that the complete response would
> have been larger than 512 bytes and was truncated to whatever number
> of whole RRs fit in 512 bytes.
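For concreteness, truncation is signaled purely by a header flag: TC is bit 9 of the 16-bit flags word in bytes 2-3 of the DNS header (RFC 1035, section 4.1.1). A minimal sketch of the check in Python (illustrative only, not musl's actual code):

```python
import struct

def is_truncated(response: bytes) -> bool:
    """Return True if a DNS message has the TC (truncation) bit set.

    The 16-bit flags word occupies bytes 2-3 of the 12-byte header;
    TC is the 0x0200 bit (RFC 1035, section 4.1.1).
    """
    if len(response) < 12:
        raise ValueError("short DNS message")
    (flags,) = struct.unpack_from("!H", response, 2)
    return bool(flags & 0x0200)

# Fabricated header for illustration: ID=0x1234, QR=1, TC=1, one question.
hdr = struct.pack("!HHHHHH", 0x1234, 0x8200, 1, 0, 0, 0)
print(is_truncated(hdr))  # → True
```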
You mentioned that TC processing added observable latency to the netstat tool. netstat performs PTR queries. Non-DNSSEC responses to PTR queries are rarely larger than 512 bytes. (The only exceptions I have seen occur when people list all their HTTP virtual hosts in PTR records, but again, that's very rare.) Typically, they are less than 150 bytes. Non-minimal responses can be larger, but the additional data is removed without setting the TC bit. This is why something very odd must have happened during your test. One explanation would be a middlebox that injects TC responses to validate source addresses.

>> > However it's not clear how "fallback to tcp" logic should interact
>> > with such concurrent requests -- switch to tcp for everything and
>> > just one nameserver as soon as we get any TC response?
>>
>> It's TCP for this query only, not all subsequent queries. It makes
>> sense to query the name server that provided the TC response: It
>> reduces latency because that server is more likely to have the large
>> response in its cache.
>
> I'm not talking about future queries but other unfinished queries that
> are part of the same operation (presently just concurrent A and AAAA
> lookups).

If the second response has TC set (but not the first), you can keep the first response. Re-querying both over TCP increases the likelihood that you get a response from the same cluster node (and thus more consistency), but you will never get that over UDP anyway, so I don't think it matters.

If the first response has TC set, you have an open TCP connection that you could use for the second query as well. However, pipelining of DNS requests has compatibility issues because there is no application-layer connection teardown (an equivalent to HTTP's Connection: close). If the server closes the connection after sending the response to the first query, without reading the second, this is a TCP data loss event, which results in an RST segment and, potentially, loss of the response to the first query.
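One mechanical detail of the fallback itself: on TCP, each DNS message carries a two-byte length prefix (RFC 1035, section 4.2.2), so the retried query is the same message, just framed. A sketch of the framing helpers in Python (the surrounding fallback flow in the comment is illustrative, not any real resolver's code):

```python
import struct

def frame_tcp(msg: bytes) -> bytes:
    """Prefix a DNS message with the two-byte length field used on TCP
    (RFC 1035, section 4.2.2)."""
    return struct.pack("!H", len(msg)) + msg

def unframe_tcp(buf: bytes) -> bytes:
    """Strip the length prefix from one TCP-framed DNS message."""
    (n,) = struct.unpack_from("!H", buf, 0)
    return buf[2:2 + n]

# Hypothetical per-query fallback, sketched as comments:
# if is_truncated(udp_response):
#     with socket.create_connection(server, timeout=3) as s:
#         s.sendall(frame_tcp(query))   # same query, same (TC-sending) server
#         # read the two-byte length, then exactly that many bytes
```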
Ideally, a client would wait for both the second UDP response and the TCP response to arrive. If the second UDP response is truncated as well, the second TCP query should be delayed until the first TCP response has come back. (We should move this discussion someplace else.)
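The policy above can be condensed into a small decision table; a sketch in Python, keyed on which answers are in and whether they were truncated (the return strings and the function itself are illustrative, not any resolver's API):

```python
from typing import Optional

def next_step(first_tc: bool, second_udp_tc: Optional[bool]) -> str:
    """Decide how to proceed once the first UDP answer has arrived.

    second_udp_tc is None while the second UDP answer is still pending.
    Encodes the policy sketched above: keep non-truncated UDP answers,
    and serialize the two TCP queries instead of pipelining them.
    """
    if second_udp_tc is None:
        if first_tc:
            return "retry first over TCP; keep waiting for second UDP answer"
        return "keep first UDP answer; wait for second"
    if not first_tc and not second_udp_tc:
        return "done: keep both UDP answers"
    if not first_tc:
        return "keep first UDP answer; retry second over TCP"
    if second_udp_tc:
        return "delay second TCP query until first TCP response arrives"
    return "keep second UDP answer; finish first over TCP"
```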