Re: EDNS / DANE trouble with Microsoft mail.protection.outlook.com.
> On Thu, Nov 17, 2016 at 10:18:01PM +0100, Walter Doekes wrote:
>> That looks like I have my DNS recursor to blame for the problem. It's a
>> powerdns recursor, version 4.0.0~alpha2 if I'm not mistaken.
>>
>> I'll be forwarding the issue with the appropriate evidence there if it
>> hasn't been fixed already.
>
> Please post a summary with the resolution. If for some (unlikely)
> reason you don't get an adequate answer from PowerDNS support, drop
> me a note, I can reach out directly to the developers. Recursors
> are expected to behave in the manner you observed with bind9.

Okay, today I finally got some time to get this sorted.

It appears it was indeed a bug in pdns-recursor 4.0.0~alpha2-2 on
Ubuntu/Xenial. The bug had been fixed upstream in May 2016:
https://github.com/PowerDNS/pdns/commit/9d534f2a12defc44d2a79291bf34b82e5ee28121

I've filed a bug report for Ubuntu here:
https://bugs.launchpad.net/ubuntu/+source/pdns-recursor/+bug/1646538

Among the Debian and Ubuntu releases, it looks like only Ubuntu/Xenial
(LTS) is affected. All the others run 3.x or 4.0.1 or higher (the latter
include 9d534f2a and the former didn't appear affected by this).

Thanks again for your prompt reply!

Walter Doekes
OSSO B.V.
Re: EDNS / DANE trouble with Microsoft mail.protection.outlook.com.
On Thu, Nov 17, 2016 at 10:18:01PM +0100, Walter Doekes wrote:

> >Postfix will not directly query the remote nameserver, and indeed
> >with DANE you're supposed to be configured to *only* query the
> >local resolver. What resolver is that? And how is it configured?
> >
> >Once the A records come back insecure (AD=0), Postfix will not
> >query for TLSA records.
>
> Yes, I was aware that postfix doesn't do the recursion itself. The
> @remote-dns in the example was merely to clarify.
>
> You are right. I checked with bind9 as recursor today and it does two
> queries: first one that gets the FORMERR and then a second one without EDNS
> that succeeds. It'll happily pass along the successful response to the
> original requestor.
>
> That looks like I have my DNS recursor to blame for the problem. It's a
> powerdns recursor, version 4.0.0~alpha2 if I'm not mistaken.
>
> I'll be forwarding the issue with the appropriate evidence there if it
> hasn't been fixed already.

Please post a summary with the resolution. If for some (unlikely)
reason you don't get an adequate answer from PowerDNS support, drop
me a note, I can reach out directly to the developers. Recursors
are expected to behave in the manner you observed with bind9.

-- 
Viktor.
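
The AD=0 point quoted above can be illustrated with a small sketch. This is not Postfix's code, only a simplified model of the decision it describes: a TLSA lookup is attempted only when the address lookup came back validated (AD=1). The function name `tlsa_query_name` and its parameters are hypothetical; the `_25._tcp.` owner-name prefix is the usual TLSA naming for SMTP on port 25.

```python
from typing import Optional


def tlsa_query_name(mx_host: str, ad_flag: bool, port: int = 25) -> Optional[str]:
    """Return the TLSA owner name to look up, or None if DANE is skipped.

    Hypothetical helper: models "no TLSA query when the A records are
    insecure (AD=0)". It performs no DNS lookups itself.
    """
    if not ad_flag:
        # Address records were not DNSSEC-validated, so no TLSA query is made.
        return None
    # Normalise to exactly one trailing dot, then prepend the port/protocol labels.
    return f"_{port}._tcp.{mx_host.rstrip('.')}."


if __name__ == "__main__":
    host = "umcg-nl.mail.protection.outlook.com."
    print(tlsa_query_name(host, ad_flag=False))
    print(tlsa_query_name(host, ad_flag=True))
```

In this thread the A lookup never got as far as an AD bit: the recursor returned SERVFAIL, so delivery was deferred before any DANE decision could be made.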
Re: EDNS / DANE trouble with Microsoft mail.protection.outlook.com.
Awesome Viktor! Thanks for your speedy response.

On 17-11-16 01:17, Viktor Dukhovni wrote:
> On Wed, Nov 16, 2016 at 11:15:35PM +0100, Walter Doekes wrote:
>> this week we stumbled upon an issue where we could not send mail to certain
>> domains, for instance em...@umcg.nl.
>> ...
>> It turned out that this was the cause:
>> ...
>> $ dig A umcg-nl.mail.protection.outlook.com. \
>>     @ns1-proddns.glbdns.o365filtering.com. +edns +dnssec | grep FORMERR
>> ;; ->>HEADER<<- opcode: QUERY, status: FORMERR, id: 46904
>> ;; WARNING: EDNS query returned status FORMERR - retry with '+nodnssec +noedns'
>
> I can't reproduce your observations using unbound as the local resolver:
>
> $ dig +dnssec +ad +noall +comment +cmd +qu +ans +auth +nocl +nottl \
>     -t a umcg-nl.mail.protection.outlook.com
> ...
> umcg-nl.mail.protection.outlook.com. A 213.199.154.23
> umcg-nl.mail.protection.outlook.com. A 213.199.154.87
>
> Postfix will not directly query the remote nameserver, and in indeed
> with DANE you're supposed to be configured to *only* query the
> local resolver. What resolver is that? And how is it configured?
>
> Once the A records come back insecure (AD=0), Postfix will not
> query for TLSA records.

Yes, I was aware that postfix doesn't do the recursion itself. The
@remote-dns in the example was merely to clarify.

You are right. I checked with bind9 as recursor today and it does two
queries: first one that gets the FORMERR and then a second one without EDNS
that succeeds. It'll happily pass along the succesful response to the
original requestor.

That looks like I have my DNS recursor to blame for the problem. It's a
powerdns recursor, version 4.0.0~alpha2 if I'm not mistaken.

I'll be forwarding the issue with the appropriate evidence there if it
hasn't been fixed already.

Thanks again,
Walter Doekes
OSSO B.V.
Re: EDNS / DANE trouble with Microsoft mail.protection.outlook.com.
On Wed, Nov 16, 2016 at 11:15:35PM +0100, Walter Doekes wrote:

> this week we stumbled upon an issue where we could not send mail to certain
> domains, for instance em...@umcg.nl.
>
>   Nov 16 17:04:08 mail postfix/smtp[13330]: warning:
>     no MX host for umcg.nl has a valid address record
>   Nov 16 17:04:08 mail postfix/smtp[13330]: 1D1D21422C2:
>     to=, relay=none, delay=2257,
>     delays=2256/0.02/0.52/0, dsn=4.4.3, status=deferred
>     (Host or domain name not found. Name service error
>     for name=umcg-nl.mail.protection.outlook.com type=A:
>     Host not found, try again)
>
> It turned out that this was the cause:
>
>   $ dig MX umcg.nl +short
>   10 umcg-nl.mail.protection.outlook.com.
>
>   $ dig NS mail.protection.outlook.com. +short
>   ns1-proddns.glbdns.o365filtering.com.
>   ns2-proddns.glbdns.o365filtering.com.
>
>   $ dig A umcg-nl.mail.protection.outlook.com. \
>       @ns1-proddns.glbdns.o365filtering.com. +edns +dnssec |
>       grep FORMERR
>   ;; ->>HEADER<<- opcode: QUERY, status: FORMERR, id: 46904
>   ;; WARNING: EDNS query returned status FORMERR -
>      retry with '+nodnssec +noedns'

I can't reproduce your observations using unbound as the local resolver:

  $ dig +dnssec +ad +noall +comment +cmd +qu +ans +auth +nocl +nottl \
      -t a umcg-nl.mail.protection.outlook.com

  ; <<>> DiG 9.10.4-P2 <<>> +dnssec +ad +noall +comment +cmd +qu +ans +auth +nocl +nottl -t a umcg-nl.mail.protection.outlook.com
  ;; global options: +cmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10562
  ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

  ;; OPT PSEUDOSECTION:
  ; EDNS: version: 0, flags: do; udp: 4096
  ;; QUESTION SECTION:
  ;umcg-nl.mail.protection.outlook.com. IN A

  ;; ANSWER SECTION:
  umcg-nl.mail.protection.outlook.com. A 213.199.154.23
  umcg-nl.mail.protection.outlook.com. A 213.199.154.87

Postfix will not directly query the remote nameserver, and indeed
with DANE you're supposed to be configured to *only* query the
local resolver. What resolver is that? And how is it configured?

Once the A records come back insecure (AD=0), Postfix will not
query for TLSA records.

> Apparently some Microsoft Office 365 mail servers do not support EDNS and
> return FORMERR. This propagated through our DNS recursors as SERVFAIL and
> caused the lookup to fail.

FORMERR is the expected/standard response in this case, and your
resolver is expected to fall back to non-EDNS queries.

> Some more digging revealed that EDNS was enabled on the query through
> `smtp_addr_list`:
>
>     else if (smtp_tls_insecure_mx_policy > TLS_LEV_MAY)
>         res_opt = RES_USE_DNSSEC;

That setting affects communication between Postfix and the local
resolver, it does not control the options on the next hop query.

> The USE_DNSSEC causes the subsequent queries to use USE_EDNS0 with the DO
> flag and that killed our interoperability with the Microsoft Office 365 DNS.

This analysis is flawed. Your resolver is not supposed to
unconditionally use EDNS upstream just because the local client is
using EDNS.

> - Apart from Microsoft upgrading their servers to 2016 and supporting EDNS,
>   is this issue something postfix should handle?

The problem is your resolver.

> - Would postfix have handled FORMERR but not SERVFAIL and are my caching
>   resolvers to blame?

The latter.

> - Should postfix retry the query without EDNS on unexpected errors?

No.

-- 
Viktor.
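
For readers following along, the fallback a recursor is expected to perform can be sketched as below. This is a minimal simulation under stated assumptions, not the code of PowerDNS, bind9 or unbound: `Response`, `resolve_with_fallback` and `legacy_server` are hypothetical names, and the "upstream" is a stand-in function rather than a real network query. The RCODE numbers are the standard DNS values.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Standard DNS RCODE values.
NOERROR, FORMERR, SERVFAIL = 0, 1, 2


@dataclass
class Response:
    rcode: int
    answers: Tuple[str, ...] = ()


# Hypothetical upstream interface: (query name, use_edns) -> Response.
Upstream = Callable[[str, bool], Response]


def resolve_with_fallback(name: str, upstream: Upstream) -> Response:
    """Query with EDNS first; if the server answers FORMERR, retry once without EDNS."""
    first = upstream(name, True)
    if first.rcode != FORMERR:
        return first
    # The server does not understand EDNS: repeat the query as plain DNS
    # instead of turning the FORMERR into a SERVFAIL for the client.
    return upstream(name, False)


def legacy_server(name: str, use_edns: bool) -> Response:
    """Stand-in for an authoritative server that rejects EDNS queries."""
    if use_edns:
        return Response(FORMERR)
    return Response(NOERROR, ("213.199.154.23", "213.199.154.87"))


if __name__ == "__main__":
    result = resolve_with_fallback("umcg-nl.mail.protection.outlook.com.", legacy_server)
    print(result.rcode, result.answers)
```

A recursor that skips the second query and reports SERVFAIL to its client reproduces the symptom in the Postfix log above.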