Ok testing zesty on my own then, verified with three KVM guests:
dns1 192.168.122.79
dns2 192.168.122.225
zesty 192.168.122.220
# basic servers
$ sudo apt-get install bind9 bind9utils bind9-doc
/etc/bind/named.conf.local:
zone "paelzertest1.lan" {
type master;
file "/etc/bind/for.paelzertest1.lan";
};
zone "1.168.192.in-addr.arpa" {
type master;
file "/etc/bind/rev.paelzertest1.lan";
};
dns2 is configured the same way, but with a 2 in place of each 1.
Likewise, the forward/reverse zone files use 1 on dns1 and 2 on dns2.
/etc/bind/for.paelzertest1.lan:
$TTL 86400
@ IN SOA pri.paelzertest1.lan. root.paelzertest1.lan. (
2011071001 ;Serial
3600 ;Refresh
1800 ;Retry
604800 ;Expire
86400 ;Minimum TTL
)
@ IN NS pri.paelzertest1.lan.
@ IN A 192.168.1.200
@ IN A 192.168.1.201
pri IN A 192.168.1.200
test IN A 192.168.1.200
/etc/bind/rev.paelzertest1.lan:
$TTL 86400
@ IN SOA pri.paelzertest1.lan. root.paelzertest1.lan. (
2011071002 ;Serial
3600 ;Refresh
1800 ;Retry
604800 ;Expire
86400 ;Minimum TTL
)
@ IN NS pri.paelzertest1.lan.
@ IN PTR paelzertest1.lan.
pri IN A 192.168.1.200
test IN A 192.168.1.201
200 IN PTR pri.paelzertest1.lan.
201 IN PTR test.paelzertest1.lan.
Disable recursion by adding the following to /etc/bind/named.conf.options:
allow-transfer {"none";};
allow-recursion {"none";};
recursion no;
$ sudo systemctl restart bind9
With this, dns1 answers only for test.paelzertest1.lan, and dns2 refuses
queries for it (and vice versa).
Example:
$ dig test.paelzertest1.lan @192.168.122.225
; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest1.lan @192.168.122.225
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 62119
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;test.paelzertest1.lan. IN A
;; Query time: 0 msec
;; SERVER: 192.168.122.225#53(192.168.122.225)
;; WHEN: Tue Nov 07 07:14:52 UTC 2017
;; MSG SIZE rcvd: 50
ubuntu@zesty-dnsmasq-test:~$ dig test.paelzertest2.lan @192.168.122.225
; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest2.lan @192.168.122.225
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37335
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 2
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;test.paelzertest2.lan. IN A
;; ANSWER SECTION:
test.paelzertest2.lan. 86400 IN A 192.168.2.201
;; AUTHORITY SECTION:
paelzertest2.lan. 86400 IN NS pri.paelzertest2.lan.
;; ADDITIONAL SECTION:
pri.paelzertest2.lan. 86400 IN A 192.168.2.200
;; Query time: 0 msec
;; SERVER: 192.168.122.225#53(192.168.122.225)
;; WHEN: Tue Nov 07 07:14:56 UTC 2017
;; MSG SIZE rcvd: 100
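When repeating these checks, only the "status:" field of the dig header matters. A minimal sketch of a helper that pulls it out of dig output follows; the function name dig_status is my own and not part of dig or bind.

```shell
#!/bin/sh
# dig_status (hypothetical helper): read dig output on stdin and print
# the value of the "status:" field from the ->>HEADER<<- line.
dig_status() {
    sed -n 's/^;; ->>HEADER<<-.*status: \([A-Z]*\),.*$/\1/p'
}

# Demo on a header line copied from the output above (no network needed).
printf '%s\n' ';; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 62119' | dig_status
```

Against the real servers this would be used as, for example, "dig test.paelzertest1.lan @192.168.122.225 | dig_status".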
Now configure dnsmasq as the DNS server, with a config that forwards to the
two DNS servers prepared above.
$ sudo vim /etc/resolv.dnsmasq.conf
nameserver 192.168.122.79
nameserver 192.168.122.225
$ sudo dnsmasq --resolv-file=/etc/resolv.dnsmasq.conf --no-hosts --no-daemon --log-queries
This should give you a local dnsmasq that forwards to our two servers (running
in the foreground with debug enabled).
On a second console on the same test system, use dig to query dnsmasq, which
will then ask the two bind servers we have.
For a name that is certain to fail on both servers we get:
$ dig foo @127.0.0.1
; <<>> DiG 9.10.3-P4-Ubuntu <<>> foo @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 42311
On the server we see:
dnsmasq: query[A] foo from 127.0.0.1
dnsmasq: forwarded foo to 192.168.122.79
dnsmasq: forwarded foo to 192.168.122.225
That works for the Xenial Test.
Now this is a bit of a race: run some local requests and sometimes you
get this combination:
$ dig test.paelzertest2.lan @127.0.0.1
; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest2.lan @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 953
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
server log:
dnsmasq: query[A] test.paelzertest2.lan from 127.0.0.1
dnsmasq: forwarded test.paelzertest2.lan to 192.168.122.79
dnsmasq: forwarded test.paelzertest2.lan to 192.168.122.225
This should not happen (and doesn't with the fix).
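Since the race only hits some of the time, it helps to repeat the query and count the outcomes. A rough sketch, assuming nothing beyond standard shell tools; tally_status and fake_dig are names made up for illustration, and fake_dig only stands in for dig so the demo runs without a network.

```shell
#!/bin/sh
# tally_status N CMD [ARGS...] (hypothetical helper): run CMD N times and
# count the "status:" values in the dig-style header lines it prints.
tally_status() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@"
        i=$((i + 1))
    done |
        sed -n 's/^;; ->>HEADER<<-.*status: \([A-Z]*\),.*$/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Stand-in for dig that always prints the REFUSED header seen above.
fake_dig() {
    printf '%s\n' ';; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 953'
}

tally_status 3 fake_dig    # prints "REFUSED 3"
```

On the test system the call would look like "tally_status 50 dig test.paelzertest2.lan @127.0.0.1"; any REFUSED count above zero means the race was hit.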
For Zesty to force the issue (since it has one of the two patches already) we
need to force "SERVFAIL".
Unfortunately this fail has to be faster than the valid reply to trigger the
race (it would then consider fail success and reply without waiting for the
good answer).
To get an answer a bind has to run, but to get a SERVFAIL instead of an
NXDOMAIN it will need a definition for that zone.
So copy /etc/bind/for.paelzertest1.lan and /etc/bind/rev.paelzertest1.lan from
dns1 to dns2.
Then make it known in /etc/bind/named.conf.local to be loaded.
Finally, "break" it intentionally, e.g. by changing the leading "$TTL" to "TTL".
That way bind still runs (one good zone) and claims the paelzertest1 namespace
(the zone is registered in the conf), but fails to load it.
The status output should show something like:
named[3534]: zone paelzertest1.lan/IN: not loaded due to errors.
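The "break it" step above can be sketched as a small function. This is only an illustration: break_zone is a made-up name, and the demo runs on a scratch file rather than on /etc/bind.

```shell
#!/bin/sh
# break_zone FILE (hypothetical helper): turn a leading "$TTL" directive
# into "TTL" so that named rejects the zone file.
# The untouched original is kept as FILE.orig.
break_zone() {
    cp "$1" "$1.orig" &&
        sed '1s/^\$TTL/TTL/' "$1.orig" > "$1"
}

# Demo on a scratch copy instead of the real zone file.
tmp=$(mktemp)
printf '%s\n' '$TTL 86400' '@ IN NS pri.paelzertest1.lan.' > "$tmp"
break_zone "$tmp"
head -n 1 "$tmp"    # prints "TTL 86400"
rm -f "$tmp" "$tmp.orig"
```

On dns2 this would be run from a root shell against /etc/bind/for.paelzertest1.lan. Running "named-checkzone paelzertest1.lan /etc/bind/for.paelzertest1.lan" (from bind9utils) before restarting bind9 should then report the zone as broken.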
Now dns1 gives me NOERROR but dns2 gives SERVFAIL for
dig test.paelzertest1.lan @192.168.122.225
; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest1.lan @192.168.122.225
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 36187
Disable caching to widen the window of the race.
We also need to set --all-servers; otherwise dnsmasq would pick among the
servers almost at random.
$ sudo dnsmasq --resolv-file=/etc/resolv.dnsmasq.conf --no-hosts --no-daemon --log-queries --cache-size=0 --all-servers
That gives SERVFAIL when querying the dnsmasq server.
$ dig test.paelzertest1.lan @127.0.0.1
; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest1.lan @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 27511
Log from the server:
dnsmasq: query[A] test.paelzertest1.lan from 127.0.0.1
dnsmasq: forwarded test.paelzertest1.lan to 192.168.122.225
=> It didn't try the next server, as it considered SERVFAIL to be a
successful answer.
Installing the version from proposed resolves that.
$ dig test.paelzertest1.lan @127.0.0.1
; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest1.lan @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43539
Server-log:
dnsmasq: query[A] test.paelzertest1.lan from 127.0.0.1
dnsmasq: forwarded test.paelzertest1.lan to 192.168.122.225
dnsmasq: forwarded test.paelzertest1.lan to 192.168.122.79
dnsmasq: reply test.paelzertest1.lan is 192.168.1.201
With that, setting verification-done.
** Tags removed: verification-needed verification-needed-zesty
** Tags added: verification-done verification-done-zesty
https://bugs.launchpad.net/bugs/1726017
Title:
dnsmasq prematurely returns REFUSED, breaking resolver
Status in dnsmasq package in Ubuntu:
Fix Released
Status in dnsmasq source package in Xenial:
In Progress
Status in dnsmasq source package in Zesty:
Fix Committed
Bug description:
[Impact]
* DNS name resolution fails in certain network configurations, where
different DNS servers are responsible for different domains and one or
more servers reply REFUSED to queries about domains other than
their own. Without the patch, dnsmasq returns a negative reply
if only one such negative answer is received from a forwarder, even
if other forwarders return valid responses. This breaks
the resolver and practically all internet connectivity, including web
browsing, email, and receiving updates.
* This should be backported to stable to fix internet connectivity
for users.
* The patch fixes the problem by querying all servers and returning
a negative reply to the requestor only if *all* forwarders return negative
responses.
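The difference in behaviour described in this section can be shown with a toy model. This is not dnsmasq's code, just a sketch under the assumption that each argument is the rcode one forwarder returned, in order of arrival; both function names are made up.

```shell
#!/bin/sh
# Toy model only, not dnsmasq's implementation.
# Each argument is the rcode of one forwarder's reply, in arrival order.

# Without the patch: the first reply to arrive is passed on, whatever it is.
reply_unpatched() {
    echo "$1"
}

# With the patch: REFUSED/SERVFAIL are held back while other forwarders
# may still answer; a failure is passed on only if every reply was one.
reply_patched() {
    last=
    for rcode in "$@"; do
        case $rcode in
            REFUSED|SERVFAIL) last=$rcode ;;
            *) echo "$rcode"; return 0 ;;
        esac
    done
    echo "$last"
}

reply_unpatched REFUSED NOERROR    # prints "REFUSED"
reply_patched REFUSED NOERROR      # prints "NOERROR"
reply_patched REFUSED SERVFAIL     # prints "SERVFAIL"
```

The first call is the failure mode from this report: the fast negative reply wins even though a valid answer arrives later.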
[Test Case]
* It should be possible to test this in a virtual network. One DNS server
should be responsible for queries to the outside world, and the other one
could be a DHCP/DNS instance (perhaps dnsmasq, also) that handles internal
IP addresses and names. It's important that at least one of these servers
returns REFUSED to queries that don't belong to its realm (assuming the
domain name is "my.net", the server for "my.net" would reply REFUSED to
"ubuntu.com" and every other domain). I am not sure if this is normally the
case; all I can say is that my Linux-based ASUS router does it.
Connect an Ubuntu VM to this network.
To aggravate the problem, the DHCP server would put the internal DNS
server first in the nameservers field. If that's the case, the problem
would also occur if the client used "strict-order" in dnsmasq.conf.
[Regression Potential]
* I don't see any. Would there be networks where admins rely upon getting
NXDOMAIN back if just one server fails for a DNS query? I don't know.
* [racb] As the behaviour in the area of REFUSED and SERVFAIL is
being changed, it's probably worth checking during SRU verification
that dnsmasq correctly passes back successful, REFUSED, SERVFAIL,
zero-answer and 1+ answer responses in the simple, single upstream DNS
server case. If there is a regression introduced by these patches, it
is likely to be in the area of handling SERVFAIL, REFUSED and
successful replies.
[Other Info]
Original bug description follows.
Seen with dnsmasq 2.75-1ubuntu0.16.04.3, after Trusty->Xenial update.
In my local network, I have two DNS servers; 192.168.1.1 is the local
DHCP/DNS server configured to reply to queries inside the local
network, and 192.168.1.4 is the forwarder in my DSL Router,
responsible for answering queries about the outside world. The DHCP server
returns these in the order 192.168.1.4,192.168.1.1. The internal
server replies REFUSED to queries about external domains.
This configuration has worked well with Ubuntu 14.04 and other Linux
Distros (using Fedora and OpenSUSE internally here), as well as
various other OSes.
It does not work with Ubuntu 16.04. NetworkManager's dnsmasq instance
pushes the REFUSED reply from 192.168.1.1 to applications and ignores
the successful reply from 192.168.1.4. This causes all DNS queries to
external servers to fail.
I believe this is fixed in dnsmasq 2.76 and related to
http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2016q1/010263.html
http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=68f6312d4bae30b78daafcd6f51dc441b8685b1e
http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=object;h=4ace25c5d6
According to these sources, the bug was introduced with
http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=object;h=51967f9807665dae403f1497b827165c5fa1084b
In my local setup at least, I can work around the problem by using the
"strict-order" option to dnsmasq.
echo strict-order >/etc/NetworkManager/dnsmasq.d/order.conf
But that's not a general solution. If dnsmasq has several forwarders,
and some return SERVFAIL or REFUSED and others return SUCCESS, the
successful answer should be returned to clients, independent of the
strict-order setting.