RE: Seemingly random ServFail issues on a caching server

2011-09-06 Thread Florian CROUZAT
Florian CROUZAT wrote on 2011-08-31:

 Lyle Giese wrote on 2011-08-31:

 On 8/31/2011 8:40 AM, Florian CROUZAT wrote:
 Florian CROUZAT wrote on 2011-08-25:

 Hi list,

 On a few domains (we'll consider only one domain for this example) I
 sometimes encounter (seemingly random) ServFails while resolving
 domain names. A client (192.168.147.2) asks my caching server
 (192.168.151.100) to resolve a target (www.leclercdrive.fr).

 Here are the relevant logs:

 Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.377 queries: info:
 client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +
 Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.380 queries: info:
 client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +
 Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.382 queries: info:
 client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +


 A tcpdump on the local side of the NS server shows the A request and
 the instant ServFail. A tcpdump on the external side of the NS server
 shows no traffic at all in this case meaning it fails internally and
 doesn't even try to forward the A request to the Internet.

 17:14:19.377608 IP 192.168.147.2.34502 > 192.168.151.100.53: 26340+ A? www.leclercdrive.fr. (37)
 17:14:19.378845 IP 192.168.151.100.53 > 192.168.147.2.34502: 26340 ServFail 0/0/0 (37)
 17:14:19.380607 IP 192.168.147.2.34502 > 192.168.151.100.53: 52628+ A? www.leclercdrive.fr. (37)
 17:14:19.381383 IP 192.168.151.100.53 > 192.168.147.2.34502: 52628 ServFail 0/0/0 (37)
 17:14:19.382605 IP 192.168.147.2.34502 > 192.168.151.100.53: 58933+ A? www.leclercdrive.fr. (37)
 17:14:19.383406 IP 192.168.151.100.53 > 192.168.147.2.34502: 58933 ServFail 0/0/0 (37)

 A few minutes before, or later, it worked just fine, see:

 17:15:58.736177 IP 192.168.147.2.34502 > 192.168.151.100.53: 49610+ A? www.leclercdrive.fr. (37)
 17:15:58.784470 IP 192.168.151.100.53 > 192.168.147.2.34502: 49610 3/3/6 CNAME[|domain]

 The TTL of the www.leclercdrive.fr entry is 300, which seems short
 to me. Maybe the ServFail happens when a request is handled at the
 exact time the TTL reaches zero and the cache entry is being
 flushed? I tried flushing the cache using rndc, but the first request
 after that worked just fine (of course...)

 Any ideas/hints are welcome.

 The DNS server runs 1:9.5.1.dfsg.P3-1+lenny1
 cat /etc/debian_version = 5.0.4
 (I have no control over the version of the tools)



 I found a few other domains in my logfiles where the ServFails happen;
 their respective TTLs are all different, from 300 to 86400 seconds. I
 still have no idea how to resolve this issue and, as far as I have
 investigated, I haven't been able to identify a pattern in those
 ServFails. I'm not even sure the TTL is involved, since I saw two
 ServFails separated in time by less than the TTL of the entry...

 Florian


 The authoritative name servers for leclercdrive.fr are a.dns.gandi.net,
 b.dns.gandi.net and c.dns.gandi.net.  I don't know how big gandi.net
 is, but traceroutes to those servers end up going through Level3 in
 Baltimore, MD from here.  They did have a hurricane go through there
 and I would not be surprised if traffic levels have been a bit high for
 the last few days.

 Lyle

 Well, it's a French registrar, my servers are in France and my clients
 are French too, so from here the traceroute is pretty neat. Anyway, my
 problem isn't (apparently) Gandi-related, or even related to
 www.leclercdrive.fr, since the ServFails happen internally and
 instantaneously in my BIND, which doesn't even try to forward the A
 request.


 Florian

Apparently, even if I don't understand why, the problem seems to be that
the NS ({a,b,c}.dns.gandi.net) of leclercdrive.fr and of the other domains
that ServFail have AAAA entries, and my caching server has IPv6 enabled but
my network doesn't route or handle IPv6.
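
A quick way to see whether a name server publishes IPv6 addresses next to
its IPv4 ones is sketched below. It is only an illustration: the helper
names `split_by_family` and `resolve` are made up for this example, the
sample addresses are documentation placeholders rather than Gandi's real
ones, and `resolve` needs a working resolver on the machine where it runs.

```python
import ipaddress
import socket


def split_by_family(addresses):
    """Split address strings into (ipv4_list, ipv6_list)."""
    v4, v6 = [], []
    for text in addresses:
        # Drop a possible zone suffix such as "%eth0" before parsing.
        addr = ipaddress.ip_address(text.split("%")[0])
        (v6 if addr.version == 6 else v4).append(str(addr))
    return v4, v6


def resolve(host):
    """Return every address the system resolver knows for host (needs network)."""
    infos = socket.getaddrinfo(host, 53, proto=socket.IPPROTO_UDP)
    return sorted({info[4][0] for info in infos})


# Placeholder addresses (assumption), standing in for one name server's
# A and AAAA records.
v4, v6 = split_by_family(["192.0.2.53", "2001:db8::53"])
print("A:", v4, "AAAA:", v6)
```

On a live system, `split_by_family(resolve("a.dns.gandi.net"))` would show
whether that server has any IPv6 address at all.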

All I had to do to get rid of those ServFails was to add -4 to the
startup options of bind (Debian: /etc/default/bind9, OPTIONS=)
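
As a rough illustration of that change, the sketch below adds -4 to the
OPTIONS line of a defaults file held in a string. The function name
`add_ipv4_only` and the sample file content are assumptions made for the
example; the real file on a given system may look different, and options
containing quoted spaces are not handled.

```python
import re


def add_ipv4_only(defaults_text):
    """Return the file text with -4 appended to OPTIONS, unless already there."""
    out = []
    for line in defaults_text.splitlines():
        m = re.match(r'^OPTIONS="(.*)"\s*$', line)
        if m:
            opts = m.group(1).split()
            if "-4" not in opts:
                opts.append("-4")  # named -4: use IPv4 only
            line = 'OPTIONS="%s"' % " ".join(opts)
        out.append(line)
    return "\n".join(out) + "\n"


# Assumed sample content, not copied from the poster's server.
sample = 'RESOLVCONF=yes\nOPTIONS="-u bind"\n'
print(add_ipv4_only(sample), end="")
```

named has to be restarted afterwards for the new option to take effect.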

Anyway, I don't really understand whether it's a bug in bind that only
happens when your interface has a link-local IPv6 address, the remote NS
have AAAA entries and your network doesn't handle IPv6.
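
The first of those three conditions can be checked mechanically. The sketch
below is only an assumption about how one might do it, with a made-up helper
name `only_link_local_ipv6`: given the addresses configured on the host, it
reports whether IPv6 is present but limited to link-local scope. It says
nothing about how BIND itself decides which address family to use.

```python
import ipaddress


def only_link_local_ipv6(local_addresses):
    """True if the host has IPv6 addresses and all of them are link-local."""
    parsed = [ipaddress.ip_address(a.split("%")[0]) for a in local_addresses]
    # Ignore IPv4 and the IPv6 loopback (::1); they are irrelevant here.
    v6 = [a for a in parsed if a.version == 6 and not a.is_loopback]
    return bool(v6) and all(a.is_link_local for a in v6)


# The fe80:: address is a made-up example of a link-local address.
print(only_link_local_ipv6(["192.168.151.100", "::1", "fe80::1%eth0"]))
```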

The solution I applied works, but I'm not satisfied with it.
Any clarifications are of course welcome.

Greetings,
Florian





___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users

RE: Seemingly random ServFail issues on a caching server

2011-08-31 Thread Florian CROUZAT
Florian CROUZAT wrote on 2011-08-25:

 Hi list,

 On a few domains (we'll consider only one domain for this example) I
 sometimes encounter (seemingly random) ServFails while resolving domain
 names. A client (192.168.147.2) asks my caching server (192.168.151.100)
 to resolve a target (www.leclercdrive.fr).

 Here are the relevant logs:

 Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.377 queries: info:
 client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +
 Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.380 queries: info:
 client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +
 Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.382 queries: info:
 client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +


 A tcpdump on the local side of the NS server shows the A request and the
 instant ServFail. A tcpdump on the external side of the NS server shows
 no traffic at all in this case meaning it fails internally and doesn't
 even try to forward the A request to the Internet.

 17:14:19.377608 IP 192.168.147.2.34502 > 192.168.151.100.53: 26340+ A? www.leclercdrive.fr. (37)
 17:14:19.378845 IP 192.168.151.100.53 > 192.168.147.2.34502: 26340 ServFail 0/0/0 (37)
 17:14:19.380607 IP 192.168.147.2.34502 > 192.168.151.100.53: 52628+ A? www.leclercdrive.fr. (37)
 17:14:19.381383 IP 192.168.151.100.53 > 192.168.147.2.34502: 52628 ServFail 0/0/0 (37)
 17:14:19.382605 IP 192.168.147.2.34502 > 192.168.151.100.53: 58933+ A? www.leclercdrive.fr. (37)
 17:14:19.383406 IP 192.168.151.100.53 > 192.168.147.2.34502: 58933 ServFail 0/0/0 (37)

 A few minutes before, or later, it worked just fine, see:

 17:15:58.736177 IP 192.168.147.2.34502 > 192.168.151.100.53: 49610+ A? www.leclercdrive.fr. (37)
 17:15:58.784470 IP 192.168.151.100.53 > 192.168.147.2.34502: 49610 3/3/6 CNAME[|domain]

 The TTL of the www.leclercdrive.fr entry is 300, which seems short to
 me. Maybe the ServFail happens when a request is handled at the exact
 time the TTL reaches zero and the cache entry is being flushed? I
 tried flushing the cache using rndc, but the first request after that
 worked just fine (of course...)

 Any ideas/hints are welcome.

 The DNS server runs 1:9.5.1.dfsg.P3-1+lenny1
 cat /etc/debian_version = 5.0.4
 (I have no control over the version of the tools)



I found a few other domains in my logfiles where the ServFails happen; their
respective TTLs are all different, from 300 to 86400 seconds.
I still have no idea how to resolve this issue and, as far as I have
investigated, I haven't been able to identify a pattern in those ServFails.
I'm not even sure the TTL is involved, since I saw two ServFails separated in
time by less than the TTL of the entry...

Florian






Re: Seemingly random ServFail issues on a caching server

2011-08-31 Thread Lyle Giese

On 8/31/2011 8:40 AM, Florian CROUZAT wrote:

Florian CROUZAT wrote on 2011-08-25:


Hi list,

On a few domains (we'll consider only one domain for this example) I
sometimes encounter (seemingly random) ServFails while resolving domain
names. A client (192.168.147.2) asks my caching server (192.168.151.100)
to resolve a target (www.leclercdrive.fr).

Here are the relevant logs:

Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.377 queries: info:
client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +
Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.380 queries: info:
client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +
Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.382 queries: info:
client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +


A tcpdump on the local side of the NS server shows the A request and the
instant ServFail. A tcpdump on the external side of the NS server shows
no traffic at all in this case meaning it fails internally and doesn't
even try to forward the A request to the Internet.

17:14:19.377608 IP 192.168.147.2.34502 > 192.168.151.100.53: 26340+ A? www.leclercdrive.fr. (37)
17:14:19.378845 IP 192.168.151.100.53 > 192.168.147.2.34502: 26340 ServFail 0/0/0 (37)
17:14:19.380607 IP 192.168.147.2.34502 > 192.168.151.100.53: 52628+ A? www.leclercdrive.fr. (37)
17:14:19.381383 IP 192.168.151.100.53 > 192.168.147.2.34502: 52628 ServFail 0/0/0 (37)
17:14:19.382605 IP 192.168.147.2.34502 > 192.168.151.100.53: 58933+ A? www.leclercdrive.fr. (37)
17:14:19.383406 IP 192.168.151.100.53 > 192.168.147.2.34502: 58933 ServFail 0/0/0 (37)

A few minutes before, or later, it worked just fine, see:

17:15:58.736177 IP 192.168.147.2.34502 > 192.168.151.100.53: 49610+ A? www.leclercdrive.fr. (37)
17:15:58.784470 IP 192.168.151.100.53 > 192.168.147.2.34502: 49610 3/3/6 CNAME[|domain]

The TTL of the www.leclercdrive.fr entry is 300, which seems short to
me. Maybe the ServFail happens when a request is handled at the exact
time the TTL reaches zero and the cache entry is being flushed? I
tried flushing the cache using rndc, but the first request after that
worked just fine (of course...)

Any ideas/hints are welcome.

The DNS server runs 1:9.5.1.dfsg.P3-1+lenny1
cat /etc/debian_version = 5.0.4
(I have no control over the version of the tools)




I found a few other domains in my logfiles where the ServFails happen; their
respective TTLs are all different, from 300 to 86400 seconds.
I still have no idea how to resolve this issue and, as far as I have
investigated, I haven't been able to identify a pattern in those ServFails.
I'm not even sure the TTL is involved, since I saw two ServFails separated in
time by less than the TTL of the entry...

Florian



The authoritative name servers for leclercdrive.fr are a.dns.gandi.net,
b.dns.gandi.net and c.dns.gandi.net.  I don't know how big gandi.net is, 
but traceroutes to those servers end up going through Level3 in 
Baltimore, MD from here.  They did have a hurricane go through there and 
I would not be surprised if traffic levels have been a bit high for the 
last few days.


Lyle


RE: Seemingly random ServFail issues on a caching server

2011-08-31 Thread Florian CROUZAT
Lyle Giese wrote on 2011-08-31:

 On 8/31/2011 8:40 AM, Florian CROUZAT wrote:
 Florian CROUZAT wrote on 2011-08-25:

 Hi list,

 On a few domains (we'll consider only one domain for this example) I
 sometimes encounter (seemingly random) ServFails while resolving
 domain names. A client (192.168.147.2) asks my caching server
 (192.168.151.100) to resolve a target (www.leclercdrive.fr).

 Here are the relevant logs:

 Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.377 queries: info:
 client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +
 Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.380 queries: info:
 client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +
 Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.382 queries: info:
 client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +


 A tcpdump on the local side of the NS server shows the A request and
 the instant ServFail. A tcpdump on the external side of the NS server
 shows no traffic at all in this case meaning it fails internally and
 doesn't even try to forward the A request to the Internet.

 17:14:19.377608 IP 192.168.147.2.34502 > 192.168.151.100.53: 26340+ A? www.leclercdrive.fr. (37)
 17:14:19.378845 IP 192.168.151.100.53 > 192.168.147.2.34502: 26340 ServFail 0/0/0 (37)
 17:14:19.380607 IP 192.168.147.2.34502 > 192.168.151.100.53: 52628+ A? www.leclercdrive.fr. (37)
 17:14:19.381383 IP 192.168.151.100.53 > 192.168.147.2.34502: 52628 ServFail 0/0/0 (37)
 17:14:19.382605 IP 192.168.147.2.34502 > 192.168.151.100.53: 58933+ A? www.leclercdrive.fr. (37)
 17:14:19.383406 IP 192.168.151.100.53 > 192.168.147.2.34502: 58933 ServFail 0/0/0 (37)

 A few minutes before, or later, it worked just fine, see:

 17:15:58.736177 IP 192.168.147.2.34502 > 192.168.151.100.53: 49610+ A? www.leclercdrive.fr. (37)
 17:15:58.784470 IP 192.168.151.100.53 > 192.168.147.2.34502: 49610 3/3/6 CNAME[|domain]

 The TTL of the www.leclercdrive.fr entry is 300, which seems short to
 me. Maybe the ServFail happens when a request is handled at the exact
 time the TTL reaches zero and the cache entry is being flushed? I
 tried flushing the cache using rndc, but the first request after that
 worked just fine (of course...)

 Any ideas/hints are welcome.

 The DNS server runs 1:9.5.1.dfsg.P3-1+lenny1
 cat /etc/debian_version = 5.0.4
 (I have no control over the version of the tools)



 I found a few other domains in my logfiles where the ServFails happen;
 their respective TTLs are all different, from 300 to 86400 seconds. I
 still have no idea how to resolve this issue and, as far as I have
 investigated, I haven't been able to identify a pattern in those
 ServFails. I'm not even sure the TTL is involved, since I saw two
 ServFails separated in time by less than the TTL of the entry...

 Florian


 The authoritative name servers for leclercdrive.fr are a.dns.gandi.net,
 b.dns.gandi.net and c.dns.gandi.net.  I don't know how big gandi.net is,
 but traceroutes to those servers end up going through Level3 in
 Baltimore, MD from here.  They did have a hurricane go through there and
 I would not be surprised if traffic levels have been a bit high for the
 last few days.

 Lyle

Well, it's a French registrar, my servers are in France and my clients are
French too, so from here the traceroute is pretty neat.
Anyway, my problem isn't (apparently) Gandi-related, or even related to
www.leclercdrive.fr, since the ServFails happen internally and
instantaneously in my BIND, which doesn't even try to forward the A request.


Florian






Seemingly random ServFail issues on a caching server

2011-08-25 Thread Florian CROUZAT
Hi list,

On a few domains (we'll consider only one domain for this example) I
sometimes encounter (seemingly random) ServFails while resolving domain
names.
A client (192.168.147.2) asks my caching server (192.168.151.100) to resolve
a target (www.leclercdrive.fr).

Here are the relevant logs:

Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.377 queries: info:
client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +
Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.380 queries: info:
client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +
Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.382 queries: info:
client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +


A tcpdump on the local side of the NS server shows the A request and the
instant ServFail.
A tcpdump on the external side of the NS server shows no traffic at all in
this case meaning it fails internally and doesn't even try to forward the A
request to the Internet.

17:14:19.377608 IP 192.168.147.2.34502 > 192.168.151.100.53: 26340+ A? www.leclercdrive.fr. (37)
17:14:19.378845 IP 192.168.151.100.53 > 192.168.147.2.34502: 26340 ServFail 0/0/0 (37)
17:14:19.380607 IP 192.168.147.2.34502 > 192.168.151.100.53: 52628+ A? www.leclercdrive.fr. (37)
17:14:19.381383 IP 192.168.151.100.53 > 192.168.147.2.34502: 52628 ServFail 0/0/0 (37)
17:14:19.382605 IP 192.168.147.2.34502 > 192.168.151.100.53: 58933+ A? www.leclercdrive.fr. (37)
17:14:19.383406 IP 192.168.151.100.53 > 192.168.147.2.34502: 58933 ServFail 0/0/0 (37)

A few minutes before, or later, it worked just fine, see:

17:15:58.736177 IP 192.168.147.2.34502 > 192.168.151.100.53: 49610+ A? www.leclercdrive.fr. (37)
17:15:58.784470 IP 192.168.151.100.53 > 192.168.147.2.34502: 49610 3/3/6 CNAME[|domain]
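
To put numbers on "instant", the captures above can be paired up query by
query. The following is a minimal sketch, not a general tcpdump parser: the
names `LINE_RE`, `response_latencies` and `SAMPLE` are invented for this
example, a line is treated as a query simply because it contains a "?", and
the ">" between source and destination is optional because it is sometimes
lost when output is pasted into mail.

```python
import re
from datetime import datetime

# One tcpdump DNS line: timestamp, source, destination, query id, flags, rest.
LINE_RE = re.compile(
    r"^(?P<ts>\d\d:\d\d:\d\d\.\d{6}) IP (?P<src>\S+)\s+(?:>\s+)?(?P<dst>\S+):"
    r"\s+(?P<qid>\d+)(?P<flags>[^\s\d]*)\s+(?P<rest>.*)$"
)


def response_latencies(lines):
    """Return (query id, is_servfail, latency in ms) for each answered query."""
    pending = {}  # (client endpoint, query id) -> time the query was seen
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%H:%M:%S.%f")
        if "?" in m["rest"]:  # heuristic: queries print as "A? name."
            pending[(m["src"], m["qid"])] = ts
        else:
            sent = pending.pop((m["dst"], m["qid"]), None)
            if sent is not None:
                ms = (ts - sent).total_seconds() * 1000.0
                results.append((int(m["qid"]), "ServFail" in m["rest"], round(ms, 3)))
    return results


SAMPLE = [
    "17:14:19.377608 IP 192.168.147.2.34502 > 192.168.151.100.53: 26340+ A? www.leclercdrive.fr. (37)",
    "17:14:19.378845 IP 192.168.151.100.53 > 192.168.147.2.34502: 26340 ServFail 0/0/0 (37)",
    "17:14:19.380607 IP 192.168.147.2.34502 > 192.168.151.100.53: 52628+ A? www.leclercdrive.fr. (37)",
    "17:14:19.381383 IP 192.168.151.100.53 > 192.168.147.2.34502: 52628 ServFail 0/0/0 (37)",
    "17:14:19.382605 IP 192.168.147.2.34502 > 192.168.151.100.53: 58933+ A? www.leclercdrive.fr. (37)",
    "17:14:19.383406 IP 192.168.151.100.53 > 192.168.147.2.34502: 58933 ServFail 0/0/0 (37)",
    "17:15:58.736177 IP 192.168.147.2.34502 > 192.168.151.100.53: 49610+ A? www.leclercdrive.fr. (37)",
    "17:15:58.784470 IP 192.168.151.100.53 > 192.168.147.2.34502: 49610 3/3/6 CNAME[|domain]",
]

for qid, servfail, ms in response_latencies(SAMPLE):
    print(qid, "ServFail" if servfail else "answer", ms, "ms")
```

The three ServFails come back in roughly a millisecond each, while the
successful answer takes about 48 ms, which fits the observation that the
failing queries never leave the server.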

The TTL of the www.leclercdrive.fr entry is 300, which seems short to me.
Maybe the ServFail happens when a request is handled at the exact time the
TTL reaches zero and the cache entry is being flushed? I tried flushing
the cache using rndc, but the first request after that worked just fine (of
course...)

Any ideas/hints are welcome.

The DNS server runs 1:9.5.1.dfsg.P3-1+lenny1
cat /etc/debian_version = 5.0.4
(I have no control over the version of the tools)

Thank you.



Florian


