Re: [Dnsmasq-discuss] occasional REFUSED response after successful query

2005-11-19 Thread Simon Kelley

Holger Schletz wrote:

Hi,

I get occasional REFUSED responses from dnsmasq on a specific network, though 
the query is actually successful. I am able to reproduce the error with 
dnsmasq 2.22 on Debian Sarge and 2.23 on Debian Etch in this network. 
However, I could not reproduce it with 2.23 on a different network. My Etch 
home box is also OK.


The problem occurs
- with the first query after dnsmasq starts up (this would not be a serious 
problem). Subsequent queries are successful.
- with queries issued by nightly cron jobs (which fail completely in 
consequence - very bad!)


The network uses a dial-on-demand DSL connection, which gets triggered by the 
query from the cron jobs. However, the dialup is performed on an external 
router and should be completely transparent to the network (except for a 
short delay). BIND9 never had this problem.
Moreover, in the first case the error can be reproduced even if the DSL link 
is already up.


This is how I tested it:

1. restart dnsmasq
2. run host some.domain.name
3. host responds: Host some.domain.name not found: 5(REFUSED)
4. But the query log shows:

Nov 17 11:54:48 zfg15 dnsmasq[8039]: query[A] some.domain.name from 127.0.0.1
Nov 17 11:54:48 zfg15 dnsmasq[8039]: forwarded some.domain.name to first.dns.server
Nov 17 11:54:48 zfg15 dnsmasq[8039]: forwarded some.domain.name to second.dns.server
Nov 17 11:54:48 zfg15 dnsmasq[8039]: reply some.domain.name is some.ip.address


What's going on here? What is so special about my network? And most important: 
how do I fix it? :-)


Thanks,
Holger




Dnsmasq only generates REFUSED return codes itself if there are no 
suitable upstream DNS servers to forward a query to, or if all the attempts 
to forward fail at transmission time (typically with "No route to 
host"). That's clearly not what is happening here, so the REFUSED return 
code must be coming from one of the upstream servers.


What is happening is this:

* This is the first query to dnsmasq, so it doesn't know which of the 
upstream servers are working. In this situation, it sends the query to 
all the servers (two, in this case). That's the first three lines of 
the log.


* One of the upstream servers returns REFUSED, which gets sent back to 
the original requestor. That's your problem. This reply is not logged by 
dnsmasq.


* The other upstream server returns a good answer, which is also sent to 
the original requestor, but too late. That's the last line in the log. 
The upstream server which returns a good answer is marked as good and 
any subsequent requests are sent there, so the problem doesn't recur.
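
To make the sequence above concrete, here is a minimal sketch in Python. It is only a model of the behaviour described in the three points, not dnsmasq code: the function name is made up for illustration, and which of the two servers sent REFUSED is an assumption, since the log does not show it.

```python
# Illustrative model only; not taken from the dnsmasq source.
NOERROR, REFUSED = 0, 5  # standard DNS rcode values


def first_reply_wins(replies):
    """Model the behaviour described above.

    replies: list of (server, rcode) tuples in arrival order.
    Returns (rcode the client acts on, server marked as good).
    """
    # Every reply is passed back, so the client acts on whichever comes first.
    client_sees = replies[0][1] if replies else None
    # The first server to give a good answer is remembered for later queries.
    good_server = next((s for s, rc in replies if rc == NOERROR), None)
    return client_sees, good_server


# Assumption: first.dns.server is the one that answered REFUSED.
arrivals = [("first.dns.server", REFUSED), ("second.dns.server", NOERROR)]
print(first_reply_wins(arrivals))  # (5, 'second.dns.server')
```

The client sees rcode 5 even though a good answer arrives moments later, which matches the output of host and the query log.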


The reason why it happens like this is partly just history and inertia, 
partly because I didn't want to risk the original requestor getting no 
response at all (and suffering a long timeout) when upstream servers 
are returning error codes. However, this isn't the first time this has 
been reported as a bug (see 
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=330422), and from the 
next release the behaviour will change. Now, if a query gets sent to n 
servers, the first n-1 error replies will be dropped, and only the last 
one returned to the original requestor. That means that if some upstream 
servers are erroring, but some are working, then the query will still 
succeed.
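
The rule for the next release can be sketched roughly as follows. This is a model of the description above and not the actual change in dnsmasq: the function name is hypothetical, and treating only REFUSED and SERVFAIL as "error replies" is an assumption for the example.

```python
# Illustrative model only; not taken from the dnsmasq source.
NOERROR, SERVFAIL, REFUSED = 0, 2, 5  # standard DNS rcode values
# Assumption: an "error reply" is one carrying either of these rcodes.
ERROR_RCODES = {SERVFAIL, REFUSED}


def reply_to_forward(replies, n_servers):
    """Pick the reply that is passed back to the original requestor.

    replies: list of (server, rcode) tuples in arrival order.
    n_servers: number of upstream servers the query was sent to.
    Returns the chosen (server, rcode), or None while still waiting.
    """
    errors_seen = 0
    for server, rcode in replies:
        if rcode not in ERROR_RCODES:
            return server, rcode      # a usable answer goes straight back
        errors_seen += 1
        if errors_seen == n_servers:  # every server failed: return the last error
            return server, rcode
        # otherwise drop this error and wait for the remaining servers
    return None


mixed = [("first.dns.server", REFUSED), ("second.dns.server", NOERROR)]
all_bad = [("first.dns.server", REFUSED), ("second.dns.server", SERVFAIL)]
print(reply_to_forward(mixed, 2))    # ('second.dns.server', 0)
print(reply_to_forward(all_bad, 2))  # ('second.dns.server', 2)
```

With one broken and one working server the requestor now gets the good answer, and it still gets an error promptly, rather than a timeout, when every server fails.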


I plan to release version 2.24, which has this change, fairly soon and 
I'm happy to make the current development snapshot available to anyone 
who wants to try it.


Holger, to fix your problem I suggest either weeding out the broken 
nameserver (though experience shows that by now, it's probably working 
again!), or risking the 2.24 beta.
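
One way to look for the broken nameserver is to send the same query to each upstream server directly and compare the return codes. The script below is a rough sketch using only the Python standard library. The helper names are made up for this example, it assumes IPv4 servers reachable over UDP port 53, and it does not check the reply ID or retry, so treat it as a starting point and not a finished tool.

```python
import socket
import struct

RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, qid=0x1234):
    """Build a minimal DNS query packet for the A record of name."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def rcode_of(response):
    """Extract the 4-bit return code from a raw DNS response."""
    flags = struct.unpack("!H", response[2:4])[0]
    return flags & 0x000F


def probe(server, name, timeout=3.0):
    """Query one server; return its rcode, or None if it did not answer."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        try:
            data, _ = sock.recvfrom(512)
        except socket.timeout:
            return None
    return rcode_of(data)


def report(servers, name):
    """Print one line per upstream server with the rcode it returned."""
    for server in servers:
        rcode = probe(server, name)
        label = "no reply" if rcode is None else RCODE_NAMES.get(rcode, str(rcode))
        print(f"{server}: {label}")
```

Calling report(["first.dns.server", "second.dns.server"], "some.domain.name") with the real server addresses filled in prints one line per server. A server that answers REFUSED or SERVFAIL while the other returns NOERROR is the likely culprit. Since the failure is intermittent, it may need to be run several times, ideally right after the DSL link comes up.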


Cheers,

Simon.







Re: [Dnsmasq-discuss] occasional REFUSED response after successful query

2005-11-19 Thread Holger Schletz
Hi,

 The reason why it happens like this is partly just history and inertia,
 partly because I didn't want to risk the original requestor getting no
 response at all (and suffering a long timeout) when upstream servers
 are returning error codes. However, this isn't the first time this has
 been reported as a bug (see
 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=330422), and from the
 next release the behaviour will change. Now, if a query gets sent to n
 servers, the first n-1 error replies will be dropped, and only the last
 one returned to the original requestor. That means that if some upstream
 servers are erroring, but some are working, then the query will still
 succeed.

I already read this one, but didn't realize this was the same issue... I never 
checked the upstream servers extensively. I just assumed they were working as 
I never had problems with the old configuration.

 I plan to release version 2.24, which has this change, fairly soon and
 I'm happy to make the current development snapshot available to anyone
 who wants to try it.

 Holger, to fix your problem I suggest either weeding out the broken
 nameserver (though experience shows that by now, it's probably working
 again!), or risking the 2.24 beta.

I don't like to run out-of-distro software unless I have to. I'll check the 
upstream servers and hope that I find the evil one.

Thanks!
Holger




Re: [Dnsmasq-discuss] occasional REFUSED response after successful query

2005-11-19 Thread Simon Kelley

Holger Schletz wrote:

Hi,



The reason why it happens like this is partly just history and inertia,
partly because I didn't want to risk the original requestor getting no
response at all (and suffering a long timeout) when upstream servers
are returning error codes. However, this isn't the first time this has
been reported as a bug (see
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=330422), and from the
next release the behaviour will change. Now, if a query gets sent to n
servers, the first n-1 error replies will be dropped, and only the last
one returned to the original requestor. That means that if some upstream
servers are erroring, but some are working, then the query will still
succeed.



I already read this one, but didn't realize this was the same issue... I never 
checked the upstream servers extensively. 

The Debian bug was with servers returning SERVFAIL, but exactly the same 
thing applies to REFUSED.


I just assumed they were working as 
I never had problems with the old configuration.

The aim is to have dnsmasq compensate for broken upstream servers as 
much as possible. Tweaking it for every situation is on-going.


Cheers,

Simon.