Re: *.dlv.isc.org DS: must be secure warnings [was: Re: 9.6.1-P1 log message]

2009-09-27 Thread Mark Andrews

In message prayer.1.3.2.0909262248400.24...@hermes-1.csi.cam.ac.uk, Chris Thompson writes:
 Back in August there was a thread on bind-users about messages
 of the shape
 
   validating @[hex]: [name].dlv.isc.org DS: must be secure failure
 
 (these are category dnssec severity warning) and on 31 August I wrote:
 
 We have been running two production recursive nameservers validating against
 dlv.isc.org since 9 June, and first saw a batch of messages (for both servers)
 like this on 20 July. We reported them to ISC and got suggestions along the
 lines of Mark's above, along with an admission that current versions of BIND
 give up on EDNS too easily in situations they maybe shouldn't, which may be
 fixed in future releases.
 
 Since then we have had a trickle of such warning messages in the logs. We
 assume that they are the result of temporary network glitches somewhere,
 but their frequency appears to be increasing, which is somewhat worrying.
 It's also not clear whether any client queries are actually failing as a
 result, or whether BIND is simply trying another dlv.isc.org nameserver
 with better luck.
 
 I have been looking at this again, and in fact there was a step function
 on 21 August when the messages rose from almost nil to 15-20 per day, and
 then fell back to almost nil after 15 September (we've seen just one since
 then). We have been running BIND 9.6.1-P1 throughout.
 
 I would be very interested to know whether other recursive nameserver
 operators validating via dlv.isc.org have seen a similar pattern. I am
 prepared to believe that the frequency is related to transient network
 errors or delays, but I have no idea whether they are likely to be local
 or at the dlv.isc.org server end.

One gets these or similar messages when named falls back to plain
DNS as a result of multiple timeouts.  Named tries EDNS advertising
a 4096 byte UDP buffer, then after multiple timeouts it tries EDNS
advertising a 512 byte UDP buffer, then after multiple timeouts it
tries plain DNS.
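
The fallback ladder described above can be sketched roughly as follows. This is an illustration only, not named's actual code: `resolve_with_fallback`, `send_query`, and the value of `TIMEOUTS_PER_STEP` are hypothetical names and numbers chosen for the example (the post says "multiple timeouts" without giving a count).

```python
from typing import Callable, Optional, Tuple

# The three steps from the description above:
# (label, use EDNS?, advertised UDP buffer size in bytes).
FALLBACK_STEPS = [
    ("EDNS, 4096-byte buffer", True, 4096),
    ("EDNS, 512-byte buffer", True, 512),
    ("plain DNS", False, 512),
]

# Assumption: "multiple timeouts" is not quantified in the post; 2 is a placeholder.
TIMEOUTS_PER_STEP = 2


def resolve_with_fallback(
    send_query: Callable[[bool, int], Optional[str]],
    timeouts_per_step: int = TIMEOUTS_PER_STEP,
) -> Tuple[Optional[str], Optional[str]]:
    """Walk the fallback steps in order.

    send_query(use_edns, bufsize) is a hypothetical transport hook that
    returns a reply, or None when the query times out.
    Returns (label of the step that succeeded, reply), or (None, None).
    """
    for label, use_edns, bufsize in FALLBACK_STEPS:
        for _ in range(timeouts_per_step):
            reply = send_query(use_edns, bufsize)
            if reply is not None:
                return label, reply
    return None, None


if __name__ == "__main__":
    # Simulate a path that silently drops every EDNS query.
    def edns_blocked(use_edns: bool, bufsize: int) -> Optional[str]:
        return None if use_edns else "answer without DNSSEC records"

    print(resolve_with_fallback(edns_blocked))
```

In the simulated case the query only succeeds at the plain DNS step. A plain DNS query cannot carry the DO bit, so the reply comes back without signatures, which is why a validator expecting a secure answer ends up logging a failure.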

Named also had a bug where it would fall back an EDNS step when it
didn't need to (such as when retrying with TCP).  This made DNSSEC
behind middleware that was dropping fragments difficult.

2564.   [bug]   Only take EDNS fallback steps when processing timeouts. [RT #19405]

Some (perhaps not all) of the timeout causes are below.  This list is
not specific to DLV.

(Apparent) non-responses to UDP queries can have many causes:
*+ Firewalls/middleware that block DNS responses > 512 bytes
*+ Firewalls/middleware that block fragments
*+ Lack of support for out of order responses in NAT
*+ Responses that require fragmentation but have DF set.  Most of these
  will be 1481-1500 bytes in size (IP in IP tunnels).  Larger responses
  are usually fragmented by the sending OS and don't have DF set.  Smaller
  responses make it through a single layer of encapsulation.
*+# Bad nameserver software that fails to respond to EDNS requests
*+# Firewalls/proxies that block EDNS queries or queries/responses with
  one or more of DO, CD or AD set.
* Congestion
* Packet corruption
* Responses that appear lost due to long round-trip times
  - load balancing probes taking too long
  - multiple satellite links
  - significant congestion causing long delays

+ indicates broken software
# indicates fallback to plain DNS will be required
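
The 1481-1500 byte range in the DF item above can be checked with a little arithmetic. The sketch below assumes a 1500-byte path MTU and a 20-byte outer IPv4 header (no options) per IP in IP layer; neither number is stated in the post, and `needs_fragmentation` is a hypothetical helper written for this example.

```python
LINK_MTU = 1500      # assumption: Ethernet-sized path MTU, in bytes
IPIP_OVERHEAD = 20   # assumption: outer IPv4 header without options, in bytes


def needs_fragmentation(packet_size: int, layers: int = 1,
                        mtu: int = LINK_MTU,
                        overhead: int = IPIP_OVERHEAD) -> bool:
    """True if a packet of packet_size bytes no longer fits in the MTU
    once `layers` levels of IP in IP encapsulation have been added."""
    return packet_size + layers * overhead > mtu


if __name__ == "__main__":
    # Packet sizes that fit on the wire unencapsulated but not inside one tunnel.
    affected = [n for n in range(1, LINK_MTU + 1) if needs_fragmentation(n)]
    print(min(affected), max(affected))
```

Under these assumptions a packet of 1480 bytes or less still fits after one layer of encapsulation, while anything from 1481 to 1500 bytes would need to be fragmented inside the tunnel. If DF is set on such a packet, it is dropped instead.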

A handful a day would suggest packet corruption/congestion as the likely
cause.

Mark
-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org
___
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


*.dlv.isc.org DS: must be secure warnings [was: Re: 9.6.1-P1 log message]

2009-09-26 Thread Chris Thompson

Back in August there was a thread on bind-users about messages
of the shape

 validating @[hex]: [name].dlv.isc.org DS: must be secure failure

(these are category dnssec severity warning) and on 31 August I wrote:


We have been running two production recursive nameservers validating against
dlv.isc.org since 9 June, and first saw a batch of messages (for both servers)
like this on 20 July. We reported them to ISC and got suggestions along the
lines of Mark's above, along with an admission that current versions of BIND
give up on EDNS too easily in situations they maybe shouldn't, which may be
fixed in future releases.

Since then we have had a trickle of such warning messages in the logs. We
assume that they are the result of temporary network glitches somewhere,
but their frequency appears to be increasing, which is somewhat worrying.
It's also not clear whether any client queries are actually failing as a
result, or whether BIND is simply trying another dlv.isc.org nameserver
with better luck.


I have been looking at this again, and in fact there was a step function
on 21 August when the messages rose from almost nil to 15-20 per day, and
then fell back to almost nil after 15 September (we've seen just one since
then). We have been running BIND 9.6.1-P1 throughout.

I would be very interested to know whether other recursive nameserver
operators validating via dlv.isc.org have seen a similar pattern. I am
prepared to believe that the frequency is related to transient network
errors or delays, but I have no idea whether they are likely to be local
or at the dlv.isc.org server end.

--
Chris Thompson
Email: c...@cam.ac.uk