Re: BIND servfail from caching server
Thanks, I was able to set up a forward zone on the caching servers for supernet.com that forwards to the ns{2,3}.earthlink.net servers. I will check periodically whether they have fixed the zone, and then remove the forward zone from the caching servers.

Is there a simple tool to quickly identify this kind of issue? I would prefer to set up a cron job that runs periodically and sends me an email once the problem is resolved, so I can remove the aforementioned config.

Thanks again!

On Thu, 2011-03-03 at 16:06 -0800, Chris Buxton wrote:
> It's because the NS RRSet returned by the authoritative name servers lists servers that are not authoritative. Classic DNS mistake.
> [snip]

___
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
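One way to script the periodic check asked about in this message is sketched below in Python. It is a sketch under stated assumptions, not an established tool: it assumes `dig` is on the PATH, that the `;; flags:` header line in dig's default output can be parsed as shown, and that "resolved" means every server named in the zone's own NS set answers with the `aa` (authoritative answer) flag. The function names `parse_ns`, `has_aa_flag`, `dig`, and `lame_servers` are hypothetical and exist only for this illustration.

```python
import subprocess
import sys


def parse_ns(dig_text, zone):
    """Return the set of NS target names for `zone` found in dig output text."""
    owner = zone.rstrip(".").lower() + "."
    targets = set()
    for line in dig_text.splitlines():
        fields = line.split()
        # Record lines look like: owner TTL class type rdata
        if len(fields) >= 5 and not fields[0].startswith(";"):
            if fields[0].lower() == owner and fields[3].upper() == "NS":
                targets.add(fields[4].lower())
    return targets


def has_aa_flag(dig_text):
    """True if dig's header line lists the 'aa' (authoritative answer) flag."""
    for line in dig_text.splitlines():
        if line.startswith(";; flags:"):
            flags = line[len(";; flags:"):].split(";")[0].split()
            return "aa" in flags
    return False  # no header at all, e.g. the server never replied


def dig(*args):
    """Run dig and return its text output; an empty string if it hangs."""
    try:
        done = subprocess.run(["dig", *args], capture_output=True,
                              text=True, timeout=30)
    except subprocess.TimeoutExpired:
        return ""
    return done.stdout


def lame_servers(zone, listed_by):
    """Ask `listed_by` for the zone's own NS set, then test each listed name.

    Returns (all NS names, the subset that did not answer authoritatively).
    """
    ns_names = parse_ns(dig("+norecurse", "NS", zone, "@" + listed_by), zone)
    lame = set()
    for name in sorted(ns_names):
        reply = dig("+norecurse", "SOA", zone, "@" + name.rstrip("."))
        if not has_aa_flag(reply):
            lame.add(name)
    return ns_names, lame


if __name__ == "__main__" and len(sys.argv) == 3:
    # Arguments: the zone, then a server known to host it (from the delegation).
    names, lame = lame_servers(sys.argv[1], sys.argv[2])
    if names and not lame:
        # Print only on success so that cron has something to mail only then.
        print(f"{sys.argv[1]}: all listed name servers answer "
              f"authoritatively: {sorted(names)}")
```

Run from cron with the zone and one of the delegated servers as arguments, for example `supernet.com ns2.earthlink.net`. cron normally mails any output to the job's owner, so no mail means the zone's NS set still lists at least one server that is not answering authoritatively.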
BIND servfail from caching server
When doing a recursive query for MX supernet.com against a caching BIND server, the server responds with the answer. The TTL is 300. After the TTL expires, the next recursive query for the same record returns a SERVFAIL from the caching server.

If I do a +trace on the same query to the same caching server for the same data, it is able to respond with the answer, yet a standard recursive query still gives a SERVFAIL.

Queries for other domains are working fine on this caching server. Other third-party DNS caching servers respond fine for the same record even after the TTL expires; I tried @8.8.8.8 and @208.67.220.220.

If I flush the cache on the caching server, it successfully returns the answer to the query, but only for the life of the TTL; then it goes back to SERVFAIL again. (I tried doing a full stop-and-start of named as well.)

This particular server is running BIND 9.7.0-P2, but this exact same behavior is also happening on a server running 9.5.1-P2.1.

I noticed when doing a trace that the NS records differ between the gTLD servers and the actual authoritative servers.

[snip]
com.            172800  IN  NS  l.gtld-servers.net.
com.            172800  IN  NS  e.gtld-servers.net.
;; Received 502 bytes from 192.36.148.17#53(i.root-servers.net) in 2987 ms

supernet.com.   172800  IN  NS  ns2.earthlink.net.
supernet.com.   172800  IN  NS  ns3.earthlink.net.
;; Received 111 bytes from 192.54.112.30#53(h.gtld-servers.net) in 119 ms

supernet.com.   300     IN  MX  5 onemain-mx.earthlink.net.
supernet.com.   3600    IN  NS  dns1.earthlink.net.
supernet.com.   3600    IN  NS  dns2.earthlink.net.
;; Received 172 bytes from 207.217.120.43#53(ns3.earthlink.net) in 54 ms

Is this just a bug that upgrading BIND will fix, or is there something else going on here?
Re: BIND servfail from caching server
I forgot to add that the only thing that showed up in the logs was the query log entry, nothing else pertaining to the query below. I also checked with tcpdump on the caching server that it was not sending any queries towards the Earthlink IP addresses, which makes sense given that the SERVFAIL response comes back in 2 ms according to dig.

On Thu, 2011-03-03 at 16:29 -0600, Justin Krejci wrote:
> When doing a recursive query for MX supernet.com against a caching BIND server, the BIND server responds back with the answer. The TTL is 300. After the TTL expires the following recursive query for the same record returns a SERVFAIL from the caching server.
> [snip]
Re: BIND servfail from caching server
It's because the NS RRSet returned by the authoritative name servers lists servers that are not authoritative. Classic DNS mistake.

The com zone says that the authoritative servers for supernet.com are ns{2,3}.earthlink.net (delegation). But supernet.com as hosted on ns{2,3}.earthlink.net says that dns{1,2}.earthlink.net are the authoritative servers. This latter set of servers is not actually authoritative for the zone.

For the first query, the resolver has not yet talked to the authoritative servers, so its only information is the delegation NS record set from com. The answer to that query, however, contains the authoritative NS record set, which is considered more credible and therefore replaces the delegation record set in the resolver's cache. Subsequent queries into the zone go to the bad servers, get lame responses, and fail.

Unless you own supernet.com, this problem is not your fault and not for you to fix. You can work around it with conditional forwarding, or a zone of type static-stub if you're using BIND 9.8 already, but that's strictly a workaround and subject to breakage if the zone is moved.

Chris Buxton
BlueCat Networks

On Mar 3, 2011, at 2:29 PM, Justin Krejci wrote:
> When doing a recursive query for MX supernet.com against a caching BIND server, the BIND server responds back with the answer. The TTL is 300. After the TTL expires the following recursive query for the same record returns a SERVFAIL from the caching server.
> [snip]
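The sequence described in this reply (delegation NS set first, replaced by the more credible NS set from the authoritative answer, then failures once the short-lived answer expires) can be illustrated with a toy model in Python. This is only a sketch under assumptions and not how named is implemented: the two credibility ranks are a simplification in the spirit of RFC 2181 section 5.4.1, TTLs are reduced to a single `expire_answer` step, and the class and constant names (`ToyResolver`, `CachedNS`, `REFERRAL`, `AUTH_ANSWER`) are hypothetical.

```python
from dataclasses import dataclass

# Simplified credibility ranks: a higher rank replaces a lower one in the cache.
REFERRAL = 1      # NS set learned from the parent zone's delegation
AUTH_ANSWER = 2   # NS set carried in a response from an authoritative server


@dataclass
class CachedNS:
    servers: frozenset
    rank: int


class ToyResolver:
    def __init__(self, delegation, zone_data):
        # delegation: NS names handed out by the parent zone (com)
        # zone_data: maps each server that really hosts the zone to the
        #            NS set that the zone itself publishes
        self.delegation = frozenset(delegation)
        self.zone_data = zone_data
        self.ns_cache = None
        self.answer_cached = False

    def _store(self, servers, rank):
        # Keep the new NS set only if it is at least as credible as the old one.
        if self.ns_cache is None or rank >= self.ns_cache.rank:
            self.ns_cache = CachedNS(frozenset(servers), rank)

    def resolve(self):
        if self.answer_cached:
            return "ANSWER (from cache)"
        if self.ns_cache is None:
            self._store(self.delegation, REFERRAL)
        for server in sorted(self.ns_cache.servers):
            published = self.zone_data.get(server)
            if published is None:
                continue  # lame: this server does not host the zone
            self._store(published, AUTH_ANSWER)
            self.answer_cached = True
            return "ANSWER"
        return "SERVFAIL"

    def expire_answer(self):
        """Model the 300 s answer TTL expiring while the 3600 s NS set remains."""
        self.answer_cached = False


published_ns = {"dns1.earthlink.net", "dns2.earthlink.net"}
resolver = ToyResolver(
    delegation={"ns2.earthlink.net", "ns3.earthlink.net"},
    zone_data={"ns2.earthlink.net": published_ns,
               "ns3.earthlink.net": published_ns},
)
print(resolver.resolve())   # ANSWER: reached via the delegation from com
resolver.expire_answer()
print(resolver.resolve())   # SERVFAIL: the cached NS set now names lame servers
```

In this model a cache flush corresponds to creating a fresh `ToyResolver`, which again starts from the delegation and succeeds once, matching the behavior reported in the original post.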