Quoting "W.C.A. Wijngaards" <[email protected]>:

Hi Andreas,

On 01/10/2012 03:43 PM, [email protected] wrote:
Hello

we have an internal Unbound cache that uses a second Unbound instance
at the border firewall for DNS resolution with DNSSEC enabled. Today
our internal Unbound stopped working with errors like this:

Jan 10 14:33:53 mailer unbound: [27958:0] info: validation failure <www.at-web.de. A IN>: no DNSSEC records from x.x.x.x for DS at-web.de. while building chain of trust
Jan 10 14:33:53 mailer unbound: [27958:0] info: validation failure <www.heise.de. A IN>: no DNSSEC records from x.x.x.x for DS heise.de. while building chain of trust

So, to this server it looked as if "dig @x.x.x.x DS heise.de +dnssec
+norec +cdflag" did not return any DNSSEC data.
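To repeat this check from a script, here is a minimal Python sketch (standard library only) that builds the same kind of query as the dig command above: RD cleared (+norec), CD set (+cdflag) and the EDNS DO bit set (+dnssec). It then reports the rcode, whether the TC (truncated) bit is set, and whether the reply carries any DS, RRSIG, NSEC or NSEC3 records. The function names, the query ID and the address used below are hypothetical placeholders, and the parser only handles the simple, well-formed replies it expects. One side note, if I read the unbound.conf documentation correctly: an Unbound server whose access-control grants only "allow" (not "allow_snoop") answers queries without the RD bit with REFUSED, so a REFUSED answer to a +norec query does not necessarily point to a dig problem.

```python
import socket
import struct

# Hypothetical helper names; a sketch, not a full DNS client.
# RR type codes (RFC 1035, RFC 4034, RFC 5155, RFC 6891)
TYPE_SOA, TYPE_OPT, TYPE_DS = 6, 41, 43
TYPE_RRSIG, TYPE_NSEC, TYPE_NSEC3 = 46, 47, 50
DNSSEC_TYPES = {TYPE_DS, TYPE_RRSIG, TYPE_NSEC, TYPE_NSEC3}

# Header flag bits
FLAG_TC, FLAG_RD, FLAG_CD = 0x0200, 0x0100, 0x0010
EDNS_DO = 0x8000  # "DNSSEC OK" bit inside the OPT record's TTL field


def encode_name(name):
    """Encode a domain name as uncompressed wire-format labels."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def build_query(name, qtype=TYPE_DS, qid=0x1234, recurse=False, cd=True,
                do=True, udp_size=4096):
    """Build a query like `dig name DS +norec +cdflag +dnssec`."""
    flags = (FLAG_RD if recurse else 0) | (FLAG_CD if cd else 0)
    header = struct.pack("!HHHHHH", qid, flags, 1, 0, 0, 1)  # QD=1, AR=1 (OPT)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    # OPT pseudo-RR: root name, type 41, class = UDP payload size,
    # TTL = ext-rcode(8) | version(8) | flags(16), empty RDATA.
    opt = b"\x00" + struct.pack("!HHIH", TYPE_OPT, udp_size,
                                EDNS_DO if do else 0, 0)
    return header + question + opt


def skip_name(msg, off):
    """Return the offset just past a (possibly compressed) name."""
    while True:
        length = msg[off]
        if length == 0:
            return off + 1
        if length & 0xC0 == 0xC0:  # compression pointer ends the name
            return off + 2
        off += 1 + length


def inspect_reply(msg):
    """Return (rcode, truncated, RR types found in all sections)."""
    _qid, flags, qd, an, ns, ar = struct.unpack_from("!HHHHHH", msg, 0)
    off = 12
    for _ in range(qd):
        off = skip_name(msg, off) + 4  # skip QTYPE and QCLASS
    types = []
    for _ in range(an + ns + ar):
        off = skip_name(msg, off)
        rtype, _cls, _ttl, rdlen = struct.unpack_from("!HHIH", msg, off)
        types.append(rtype)
        off += 10 + rdlen
    return flags & 0x000F, bool(flags & FLAG_TC), types


def has_dnssec_data(msg):
    """True if the reply contains any DS/RRSIG/NSEC/NSEC3 record."""
    _rcode, _tc, types = inspect_reply(msg)
    return any(t in DNSSEC_TYPES for t in types)


def query_udp(server, payload, port=53, timeout=3.0):
    """Send one UDP query over IPv4 and return the raw reply.

    No reply-ID check, retry or TCP fallback: this is only a sketch.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(payload, (server, port))
        reply, _addr = sock.recvfrom(65535)
        return reply
```

For example, inspect_reply(query_udp("192.0.2.1", build_query("heise.de"))) returns the rcode, the TC flag and the list of record types; 192.0.2.1 is a documentation address standing in for the forwarder's x.x.x.x. Passing recurse=True corresponds to the same command without +norec.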

The man page of my dig version does not mention "+norec", and the above command resulted in status REFUSED. Without "+norec" I got the following, which looks sane to me:

; <<>> DiG 9.7.0-P1 <<>> @x.x.x.x heise.de DS +dnssec +cdflag
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13864
;; flags: qr rd ra cd; QUERY: 1, ANSWER: 0, AUTHORITY: 6, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;heise.de.                      IN      DS

;; AUTHORITY SECTION:
H319DM5GC3EDEK691VQBHEHOT7VGGJ2B.de. 7136 IN NSEC3 1 1 15 BA5EBA11 H31BIK9NA0MJD5K06JE5H9BBFDBD56DB NS SOA NAPTR RRSIG DNSKEY NSEC3PARAM
H319DM5GC3EDEK691VQBHEHOT7VGGJ2B.de. 7136 IN RRSIG NSEC3 8 2 7200 20120117131500 20120110131500 30565 de. fd08T4Fapf6tVVOA2VmYXceBTUS5Ckjz8iqdBttzt4DgAq2e8bI4l/aE wHgXBl2P+CEq6m5H7d4X6WHXvoi+mWYof4LYb1cSW2l212kJ/jT4M6q4 QMYrcocKZaFzKg/X4fZwD1ma0RQ7q8Mx09heV25TlZwxSBjbpRUQv4Ez /0U=
de.                     7136    IN      SOA     f.nic.de. its.denic.de. 2012011062 7200 7200 3600000 7200
de. 7136 IN RRSIG SOA 8 1 86400 20120117131500 20120110131500 30565 de. R5N20le84Cacq8mtIwKifWifIOgJN2tWULiJU/DGDxsBQPiqYkM9zec7 dfgfs8XQbUx3Kkymsuo7sdanAQVld7ieew+aVP9yhgZdc18cmuk4hYBB 1X1Sb8X249kv6xxR/D87pl57g86HW3OzG2pFhV+pjt5IWNUGvBCiiQkQ HUU=
UMUKTKOLDUUT050M28LQE3R399Q894KV.de. 6942 IN NSEC3 1 1 15 BA5EBA11 UMUPU1E8C10ANEOEMVVG217UL77BN1H8 A RRSIG
UMUKTKOLDUUT050M28LQE3R399Q894KV.de. 6942 IN RRSIG NSEC3 8 2 7200 20120117131500 20120110131500 30565 de. Gf4tjJyx6WwHi8tyX7UwkI2CYoyA0I3Jyjv9zqo7o/kmm9ztleOZZSFG y5DzFihl4vyvSVu6ZSmeMHjy1dniIMmvIPMOsWGK120vp/LGYjc0r+J+ KsJsqb8F6bimi6EPy4Q80/Pc2UsOpoYToOawLCqHjMHE7mn76HpPJyXK oX8=

;; Query time: 0 msec
;; SERVER: x.x.x.x#53(x.x.x.x)
;; WHEN: Tue Jan 10 16:23:29 2012
;; MSG SIZE  rcvd: 742

It looks as if there were fragmentation problems. And since it was
internal, there would be extra firewalls or routers in between where
that sort of thing can occur.

There is nothing between the two machines besides a switch. Both machines run iptables, but it is configured to let UDP and TCP port 53 pass, and iptables logged nothing at that time either.
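To put numbers on the fragmentation theory: the reply shown above was 742 bytes ("MSG SIZE rcvd: 742"), while its OPT record advertises a 4096-byte UDP buffer, so UDP answers of up to 4096 bytes are allowed on this path. The sketch below, assuming IPv4 without IP options and the usual 1500-byte Ethernet MTU (the function names are illustrative only), computes how many IPv4 fragments a DNS reply of a given size needs. The 742-byte reply fits in a single packet; only answers larger than 1472 bytes would be split into fragments, and fragmented UDP is what a filtering device can drop without any log entry.

```python
# Illustrative helpers; assumes IPv4 without options and plain UDP.
IPV4_HEADER = 20  # bytes
UDP_HEADER = 8    # bytes


def max_unfragmented_dns(mtu=1500):
    """Largest DNS message that fits in one IPv4/UDP packet."""
    return mtu - IPV4_HEADER - UDP_HEADER


def ipv4_fragments(dns_size, mtu=1500):
    """Number of IPv4 fragments for a UDP datagram carrying dns_size bytes."""
    ip_payload = dns_size + UDP_HEADER
    if ip_payload <= mtu - IPV4_HEADER:
        return 1  # fits in a single, unfragmented packet
    # Fragment offsets count 8-byte units, so every fragment except the
    # last carries a multiple of 8 bytes of IP payload.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return -(-ip_payload // per_fragment)  # ceiling division
```

For example, ipv4_fragments(742) is 1, while ipv4_fragments(4096) is 3.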

The instance at the border firewall has no errors in its log and
works fine all the time. After restarting the internal instance, it
works fine again as well. The auto-trust-anchor-file of the internal
instance has a timestamp from the restart of the instance, so I
suspect something went wrong with the update of this file, but I have
no clue why the restart cured it.

No, the file was probably written right when you restarted it,
because it is written whenever the root DNSKEY is seen. After a
restart the cache is empty, so Unbound fetches the root DNSKEY and
thus updates the file to note that it saw the root key.

That is what strikes me as odd. A few minutes earlier the internal Unbound instance was not able to fetch the key, but after the restart it could do so without problems. As .com domains were also affected, I suspect that it was not the heise.de key that failed; heise.de was simply the first to fail.


Both instances run Unbound 1.4.14 with auto-trust-anchor-file
enabled. Forwarding from the internal instance to the firewall
instance is configured like this:

forward-zone:
    name: "."
    forward-addr: x.x.x.x

This looks fine.

What can we do to debug this problem and prevent it from happening
again?

Something is happening with UDP. There seems to be nothing wrong with
the key files. The error is that somehow the internal instance gets no
DNSSEC data (EDNS backoff, or messages arriving 'stripped' of DNSSEC
data).

As I said, there is basically only a wire between the two. By the way, does Unbound cache the keys? Again, the external Unbound has no problems at all, while the internal one only delivers errors. Is it possible that the problem arises from DNS data delivered by external name servers?
It is very inconvenient when the central resolver cache stops working...

Thanks

Andreas


_______________________________________________
Unbound-users mailing list
[email protected]
http://unbound.nlnetlabs.nl/mailman/listinfo/unbound-users
