Re: EDNS / DANE trouble with Microsoft mail.protection.outlook.com.

2016-12-01 Thread Walter Doekes
> On Thu, Nov 17, 2016 at 10:18:01PM +0100, Walter Doekes wrote:
>> That looks like I have my DNS recursor to blame for the problem. It's a
>> powerdns recursor, version 4.0.0~alpha2 if I'm not mistaken.
>>
>> I'll be forwarding the issue with the appropriate evidence there if it
>> hasn't been fixed already.
>
> Please post a summary with the resolution.  If for some (unlikely)
> reason you don't get an adequate answer from PowerDNS support, drop
> me a note, I can reach out directly to the developers.  Recursors
> are expected to behave in the manner you observed with bind9.

Okay, today I finally got some time to get this sorted. It appears it was
indeed a bug in pdns-recursor 4.0.0~alpha2-2 on Ubuntu/Xenial.

The bug had been fixed upstream in May 2016:
https://github.com/PowerDNS/pdns/commit/9d534f2a12defc44d2a79291bf34b82e5ee28121

I've filed a bug report for Ubuntu here:
https://bugs.launchpad.net/ubuntu/+source/pdns-recursor/+bug/1646538

Of the Debian and Ubuntu releases, it looks like only Ubuntu/Xenial (LTS)
is affected. All the others run 3.x, or 4.0.1 or higher (the latter
include 9d534f2a and the former don't appear to be affected by this).

Thanks again for your prompt reply!

Walter Doekes
OSSO B.V.




Re: EDNS / DANE trouble with Microsoft mail.protection.outlook.com.

2016-11-17 Thread Viktor Dukhovni
On Thu, Nov 17, 2016 at 10:18:01PM +0100, Walter Doekes wrote:

> >Postfix will not directly query the remote nameserver, and indeed
> >with DANE you're supposed to be configured to *only* query the
> >local resolver.  What resolver is that?  And how is it configured?
> >
> >Once the A records come back insecure (AD=0), Postfix will not
> >query for TLSA records.
> 
> Yes, I was aware that postfix doesn't do the recursion itself. The
> @remote-dns in the example was merely to clarify.
> 
> You are right. I checked with bind9 as recursor today and it does two
> queries: first one that gets the FORMERR and then a second one without EDNS
> that succeeds. It'll happily pass along the successful response to the
> original requestor.
> 
> That looks like I have my DNS recursor to blame for the problem. It's a
> powerdns recursor, version 4.0.0~alpha2 if I'm not mistaken.
> 
> I'll be forwarding the issue with the appropriate evidence there if it
> hasn't been fixed already.

Please post a summary with the resolution.  If for some (unlikely)
reason you don't get an adequate answer from PowerDNS support, drop
me a note, I can reach out directly to the developers.  Recursors
are expected to behave in the manner you observed with bind9.

-- 
Viktor.


Re: EDNS / DANE trouble with Microsoft mail.protection.outlook.com.

2016-11-17 Thread Walter Doekes

Awesome Viktor! Thanks for your speedy response.

On 17-11-16 01:17, Viktor Dukhovni wrote:
> On Wed, Nov 16, 2016 at 11:15:35PM +0100, Walter Doekes wrote:
>> this week we stumbled upon an issue where we could not send mail to certain
>> domains, for instance em...@umcg.nl.
>>
>> ...
>>
>> It turned out that this was the cause:
>>
>> ...
>>
>>   $ dig A umcg-nl.mail.protection.outlook.com.  \
>>       @ns1-proddns.glbdns.o365filtering.com. +edns +dnssec |
>>       grep FORMERR
>>   ;; ->>HEADER<<- opcode: QUERY, status: FORMERR, id: 46904
>>   ;; WARNING: EDNS query returned status FORMERR -
>>   retry with '+nodnssec +noedns'
>
> I can't reproduce your observations using unbound as the local
> resolver:
>
> $ dig +dnssec +ad +noall +comment +cmd +qu +ans +auth +nocl +nottl \
> -t a umcg-nl.mail.protection.outlook.com
>
> ...
>
> umcg-nl.mail.protection.outlook.com. A 213.199.154.23
> umcg-nl.mail.protection.outlook.com. A 213.199.154.87
>
> Postfix will not directly query the remote nameserver, and indeed
> with DANE you're supposed to be configured to *only* query the
> local resolver.  What resolver is that?  And how is it configured?
>
> Once the A records come back insecure (AD=0), Postfix will not
> query for TLSA records.

Yes, I was aware that postfix doesn't do the recursion itself. The
@remote-dns in the example was merely to clarify.

You are right. I checked with bind9 as recursor today and it does two
queries: first one that gets the FORMERR and then a second one without
EDNS that succeeds. It'll happily pass along the successful response to
the original requestor.

That looks like I have my DNS recursor to blame for the problem. It's a
powerdns recursor, version 4.0.0~alpha2 if I'm not mistaken.

I'll be forwarding the issue with the appropriate evidence there if it
hasn't been fixed already.



Thanks again,
Walter Doekes
OSSO B.V.



Re: EDNS / DANE trouble with Microsoft mail.protection.outlook.com.

2016-11-16 Thread Viktor Dukhovni
On Wed, Nov 16, 2016 at 11:15:35PM +0100, Walter Doekes wrote:

> this week we stumbled upon an issue where we could not send mail to certain
> domains, for instance em...@umcg.nl.
> 
> Nov 16 17:04:08 mail postfix/smtp[13330]: warning:
> no MX host for umcg.nl has a valid address record
> Nov 16 17:04:08 mail postfix/smtp[13330]: 1D1D21422C2:
> to=, relay=none, delay=2257,
> delays=2256/0.02/0.52/0, dsn=4.4.3, status=deferred
> (Host or domain name not found. Name service error
> for name=umcg-nl.mail.protection.outlook.com type=A:
> Host not found, try again)
> 
> It turned out that this was the cause:
> 
>   $ dig MX umcg.nl +short
>   10 umcg-nl.mail.protection.outlook.com.
> 
>   $ dig NS mail.protection.outlook.com. +short
>   ns1-proddns.glbdns.o365filtering.com.
>   ns2-proddns.glbdns.o365filtering.com.
> 
>   $ dig A umcg-nl.mail.protection.outlook.com.  \
>       @ns1-proddns.glbdns.o365filtering.com. +edns +dnssec |
>       grep FORMERR
>   ;; ->>HEADER<<- opcode: QUERY, status: FORMERR, id: 46904
>   ;; WARNING: EDNS query returned status FORMERR -
>   retry with '+nodnssec +noedns'

I can't reproduce your observations using unbound as the local
resolver:


$ dig +dnssec +ad +noall +comment +cmd +qu +ans +auth +nocl +nottl \
-t a umcg-nl.mail.protection.outlook.com

; <<>> DiG 9.10.4-P2 <<>> +dnssec +ad +noall +comment +cmd +qu +ans +auth +nocl +nottl -t a umcg-nl.mail.protection.outlook.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10562
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;umcg-nl.mail.protection.outlook.com. IN A

;; ANSWER SECTION:
umcg-nl.mail.protection.outlook.com. A 213.199.154.23
umcg-nl.mail.protection.outlook.com. A 213.199.154.87

Postfix will not directly query the remote nameserver, and indeed
with DANE you're supposed to be configured to *only* query the
local resolver.  What resolver is that?  And how is it configured?

Once the A records come back insecure (AD=0), Postfix will not
query for TLSA records.
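
(To illustrate that last point, here is a rough Python sketch of the
decision. It is not Postfix's code; the helper names are made up for
this example. It reads the AD bit from a DNS response header and only
produces a TLSA query name when the address answer was validated. The
"insecure" header mirrors the dig output above: flags "qr rd ra", no
"ad".)

```python
import struct

AD_BIT = 0x0020  # "Authentic Data" bit in the 16-bit DNS header flags word


def ad_flag_set(header: bytes) -> bool:
    """Return True if the AD bit is set in a DNS message header."""
    _msg_id, flags = struct.unpack("!HH", header[:4])
    return bool(flags & AD_BIT)


def tlsa_qname(host: str, port: int = 25) -> str:
    """Owner name of the TLSA RRset for a TCP service on host."""
    return f"_{port}._tcp.{host.rstrip('.')}."


def next_lookup(header: bytes, host: str):
    """Hypothetical helper: the TLSA name to query next, or None when
    the address records came back insecure (AD=0)."""
    return tlsa_qname(host) if ad_flag_set(header) else None


# id 10562, flags qr+rd+ra (0x8180), 1 question, 2 answers, 1 additional
insecure = struct.pack("!HHHHHH", 10562, 0x8180, 1, 2, 0, 1)
# Same header with the AD bit switched on, as a validated answer would have
secure = struct.pack("!HHHHHH", 10562, 0x8180 | AD_BIT, 1, 2, 0, 1)

print(next_lookup(insecure, "umcg-nl.mail.protection.outlook.com"))
print(next_lookup(secure, "umcg-nl.mail.protection.outlook.com"))
```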

> Apparently some Microsoft Office 365 mail servers do not support EDNS and
> return FORMERR. This propagated through our DNS recursors as SERVFAIL and
> caused the lookup to fail.

FORMERR is the expected/standard response in this case, and your
resolver is expected to fall back to non-EDNS queries.
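
(A minimal Python sketch of that fallback, under stated assumptions:
send_query is a stand-in callable supplied by the caller, and
legacy_server is a stub that imitates an authoritative server which
rejects EDNS. No real resolver is implemented this simply; the point
is only that FORMERR triggers one retry without EDNS instead of being
turned into SERVFAIL.)

```python
NOERROR, FORMERR, SERVFAIL = 0, 1, 2  # DNS RCODE values


def resolve(send_query, qname, qtype="A"):
    """Query with EDNS first; on FORMERR retry once without EDNS.

    send_query(qname, qtype, edns) is an assumed interface for this
    sketch and must return a tuple (rcode, answers).
    """
    rcode, answers = send_query(qname, qtype, edns=True)
    if rcode == FORMERR:
        # The upstream server does not understand EDNS: fall back.
        rcode, answers = send_query(qname, qtype, edns=False)
    return rcode, answers


def legacy_server(qname, qtype, edns):
    """Stub upstream that answers plain queries but rejects EDNS."""
    if edns:
        return FORMERR, []
    return NOERROR, ["213.199.154.23", "213.199.154.87"]


rcode, answers = resolve(legacy_server, "umcg-nl.mail.protection.outlook.com.")
print(rcode, answers)
```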

> Some more digging revealed that EDNS was enabled on the query through
> `smtp_addr_list`:
> 
>   else if (smtp_tls_insecure_mx_policy > TLS_LEV_MAY)
>       res_opt = RES_USE_DNSSEC;

That setting affects communication between Postfix and the local
resolver, it does not control the options on the next hop query.

> The USE_DNSSEC causes the subsequent queries to use USE_EDNS0 with the DO
> flag and that killed our interoperability with the Microsoft Office 365 DNS.

This analysis is flawed.  Your resolver is not supposed to
unconditionally use EDNS upstream just because the local client is
using EDNS.
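
(For reference, a small Python sketch of what "EDNS with the DO flag"
means on the wire. build_query is a made-up helper for this example,
not part of Postfix or any resolver. An EDNS query is an ordinary
query plus one OPT pseudo-record in the additional section, and each
hop decides for itself whether to add it.)

```python
import struct

TYPE_A, TYPE_OPT, CLASS_IN = 1, 41, 1


def build_query(qname, qtype=TYPE_A, edns=False, do=False):
    """Build a minimal DNS query, optionally with an EDNS0 OPT record."""
    arcount = 1 if edns else 0
    # id 0x1234, flags 0x0100 (RD), one question, `arcount` additional RRs
    header = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, arcount)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    ) + b"\x00"
    question = labels + struct.pack("!HH", qtype, CLASS_IN)
    if not edns:
        return header + question
    # OPT reuses the RR fields: CLASS carries the UDP payload size, and
    # TTL carries extended RCODE, version and flags (DO is bit 0x8000).
    ttl = 0x00008000 if do else 0
    opt = b"\x00" + struct.pack("!HHIH", TYPE_OPT, 4096, ttl, 0)
    return header + question + opt


plain = build_query("umcg-nl.mail.protection.outlook.com.")
with_do = build_query("umcg-nl.mail.protection.outlook.com.", edns=True, do=True)
print(len(with_do) - len(plain), with_do[-11:].hex())
```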

> - Apart from Microsoft upgrading their servers to 2016 and supporting EDNS,
> is this issue something postfix should handle?

The problem is your resolver.

> - Would postfix have handled FORMERR but not SERVFAIL and are my caching
> resolvers to blame?

The latter.

> - Should postfix retry the query without EDNS on unexpected errors?

No.

-- 
Viktor.