On Mon, Apr 13, 2020 at 05:41:38PM -0400, Viktor Dukhovni wrote:
> > Fallback to tcp on TC would also yield very bad performance for users
> > who are not running a local nameserver whenever looking up names with
> > ridiculous numbers of A/AAAA records, where the truncated response
> > certainly suffices (except in your example of FCrDNS).
> 
> Your local nameserver has already done the TCP failover and paid the
> cost of obtaining the full RRset, your stub resolver is just failing to
> give it the opportunity to return the full data to you.  The performance
> cost is low, and such records are a minority.  Correctness trumps
> performance where I come from.  Cutting corners for performance and
> violating requirements is not acceptable.

This is true for users running local nameservers, which ideally will
eventually be everyone, but at present that's far from the case.
Differences like concurrent attempts from multiple nameservers and/or
lack of TCP fallback on TC are what make netstat fast on musl vs
repeatedly stalling for multiple seconds at a time on other
implementations. I don't have any data on how often TC happens and if
it's actually a big part of the difference, so this is probably worth
exploring. But I think it's a separate topic from the issue with DANE
on Postfix, so let's set it aside and pick that back up on the musl
list or elsewhere later.

> > > But some applications need to see the AD bit returned by the local
> > > resolver in order to distinguish between validated and non-validated
> > > results.  Recursive Nameservers (BIND, Unbound, ...) will only set
> > > (when appropriate) the AD bit in replies if it is set in the incoming
> > > query.  The AD bit is part of the standard DNS header:
> > 
> > Is the AD bit valid as part of a query?
> 
> Absolutely, and indeed it is required in order to solicit the AD bit
> in return.  And e.g. dig(1) sets the AD bit in requests by default,
> and you need to use "dig +noad" to turn it off!
> 
> > I couldn't find where this is documented, and it's almost certainly
> > not supported (possibly rejected/dropped) by servers that aren't aware
> > of it.
> 
> That is not the case.  In order for DNS to be extensible, servers are
> required to ignore previously reserved flag bits, so that they can
> later be assigned.

OK, if that's true in practice then it probably suffices to always set
it. I'll see if I can find any more information on this. Searching for
dig noadflag suggests there were at least historically problems with
certain nameservers and firewalls dropping requests with the AD bit
set...
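
For concreteness, here is a minimal sketch of what "always set it" would
mean on an already-serialized query, assuming the standard 12-byte DNS
header from RFC 1035 with the AD flag assignment referenced in the
RFC 6840 section linked below. The helper names dns_query_set_ad and
dns_reply_has_ad are hypothetical and are not part of musl or libresolv.

```c
#include <stddef.h>

/* Fixed DNS header is 12 bytes. Byte 3 holds RA, Z, AD, CD and RCODE;
 * AD is the 0x20 bit of that byte. */
#define DNS_HDR_LEN 12
#define DNS_FLAG_AD 0x20

/* Hypothetical helper: set AD in a serialized query.
 * Returns 0 on success, -1 if the buffer cannot hold a DNS header. */
int dns_query_set_ad(unsigned char *pkt, size_t len)
{
    if (len < DNS_HDR_LEN)
        return -1;
    pkt[3] |= DNS_FLAG_AD;
    return 0;
}

/* Hypothetical helper: nonzero if a reply carries AD.
 * Whether that bit is trustworthy is a separate policy question. */
int dns_reply_has_ad(const unsigned char *pkt, size_t len)
{
    return len >= DNS_HDR_LEN && (pkt[3] & DNS_FLAG_AD) != 0;
}
```

This only touches the header byte; it says nothing about how servers or
middleboxes react to the flag, which is the open question above.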

> there is also no AD bit in the reply.  Implementors of stub resolvers
> need to read many RFCs or consult experts who have:
> 
>     https://tools.ietf.org/html/rfc6840#section-5.7

I've read the ones we implement thoroughly. That does not include the
latest additions, because my view has always been that a stub
resolver's role is to speak the most minimal protocol that all servers
accept, and that all the logic for what to do with DNSSEC belongs in
the server responsible for policy, not in the stub resolver or
application.

From the text you linked, it looks like this use of the AD bit (in
queries) is considerably newer than the DO bit.

> > If the former, I don't see why it would be done conditional on
> > being a local resolver (and also local need not be 127.0.0.1 or ::1;
> > it can be public address of localhost or a lot of other things, e.g. a
> > tunnel out of a container to the actual host, depending on network
> > setup).
> 
> Because the AD bit from a non-local resolver is not trustworthy.  One
> might imagine resolver configurations in which one can indicate that the
> network path to a range of non-local IP addresses (perhaps IPSEC or
> other secure link) is tamper-resistant, but as a default it may make
> sense to ignore the AD bit from remote IPs.

I see. I don't think imposing policy about what IPs are "local" or not
is within the scope of musl, though. There are lots of setups people
use where the sense of "local" is rather muddled.

> Not ignoring is not worse than the situation that Postfix is in today,
> where we don't know whether the AD bit returned by libresolv is
> trustworthy or not, and just document the requirement for a local
> resolver, and hope that users who want DANE security pay attention to
> the docs.
> 
> However, I am suggesting that ignoring non-local AD bits would in fact
> resolve that issue.  A more complete implementation would have a
> configurable whitelist of "trusted" resolvers.

The "trusted" resolver is whatever you write in resolv.conf, to
whatever extent you intend to trust it. There's no point in a separate
whitelist; resolv.conf is that whitelist.

> > I think just adding a resolv.conf option for using the AD bit might be
> > appropriate. One issue that makes this more complicated though is how
> > the API is factored.
> 
> You can safely set it unconditionally, or just to the loopback ones (to
> help remove an AD-bit MiTM footgun).  No known resolvers will object to
> the AD in queries.

Do you know of any research on this? That's my hope too but I did turn up
some results from around 2013-2015 that seemed to be folks running
into problems that might have been servers or firewalls dropping it.

> > res_mkquery in theory doesn't/shouldn't depend on
> > the particular nameservers, but should just serialize a query that can
> > be used with any server (e.g. my implementation of host(1) does this
> > to send to the server you give it on the command line). But the choice
> > of configuration is specific to the configured nameservers.
> 
> You can inject the AD bit just before sending the packet to a particular
> server.

Surely that's one thing you could do, but I feel like it violates
least-surprise a bit. It also makes it a pain to send the query
in-place (requires iovecs with sendmsg).
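
To illustrate the iovec point, here is a rough sketch of per-send
injection that leaves the caller's buffer untouched: copy the first four
header bytes, set AD in the copy, and gather the copy plus the rest of
the query with sendmsg. The function name send_query_with_ad and its
signature are assumptions made for illustration, not existing musl code.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define DNS_FLAG_AD 0x20  /* AD bit in byte 3 of the DNS header */

/* Hypothetical helper: send query q with AD set, without modifying q.
 * sa/salen may be NULL/0 for a connected socket. */
ssize_t send_query_with_ad(int fd, const struct sockaddr *sa, socklen_t salen,
                           const unsigned char *q, size_t qlen)
{
    unsigned char head[4];
    struct iovec iov[2];
    struct msghdr msg;

    if (qlen < 12)          /* shorter than a DNS header */
        return -1;

    /* Patch a private copy of the ID and flags bytes. */
    memcpy(head, q, sizeof head);
    head[3] |= DNS_FLAG_AD;

    /* Gather: patched prefix first, then the untouched remainder. */
    iov[0].iov_base = head;
    iov[0].iov_len = sizeof head;
    iov[1].iov_base = (void *)(q + sizeof head);
    iov[1].iov_len = qlen - sizeof head;

    memset(&msg, 0, sizeof msg);
    msg.msg_name = (void *)sa;
    msg.msg_namelen = salen;
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;

    return sendmsg(fd, &msg, 0);
}
```

The same query buffer can then still be sent unmodified to another
server, at the cost of the extra gather step on every send.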

> > > Sorry, we actually need to know which records were validated in
> > > signed domains, and which are "insecure" responses from unsigned
> > > domains.  That's what the AD bit is for, and you're not setting
> > > it in requests, and so it does not come back in the response.
> > 
> > Can you describe why?
> 
> I can, but you can just read RFC 7672 if you like, I've already
> explained it there.  Bottom line, it is needed.
> 
> > Is it only for the sake of not using TLSA
> > records in unsigned domains? That kind of policy can be implemented at
> > the resolver level
> 
> It cannot and should not be implemented at the resolver level.

Noted that this is your position. :-)

Rich
