Ok, testing zesty on my own then. Verified with three KVM guests:
dns1   192.168.122.79
dns2   192.168.122.225
zesty  192.168.122.220

# basic servers
$ sudo apt-get install bind9 bind9utils bind9-doc

/etc/bind/named.conf.local:
zone "paelzertest1.lan" {
    type master;
    file "/etc/bind/for.paelzertest1.lan";
};
zone "1.168.192.in-addr.arpa" {
    type master;
    file "/etc/bind/rev.paelzertest1.lan";
};

The other one is the same but with a 2 instead of a 1.
Likewise the forward/reverse zones use 1 on dns1 and 2 on dns2.

/etc/bind/for.paelzertest1.lan:
$TTL 86400
@       IN  SOA     pri.paelzertest1.lan. root.paelzertest1.lan. (
            2011071001  ;Serial
            3600        ;Refresh
            1800        ;Retry
            604800      ;Expire
            86400       ;Minimum TTL
)
@       IN  NS      pri.paelzertest1.lan.
@       IN  A       192.168.1.200
@       IN  A       192.168.1.201
pri     IN  A       192.168.1.200
test    IN  A       192.168.1.200

/etc/bind/rev.paelzertest1.lan:
$TTL 86400
@       IN  SOA     pri.paelzertest1.lan. root.paelzertest1.lan. (
            2011071002  ;Serial
            3600        ;Refresh
            1800        ;Retry
            604800      ;Expire
            86400       ;Minimum TTL
)
@       IN  NS      pri.paelzertest1.lan.
@       IN  PTR     paelzertest1.lan.
pri     IN  A       192.168.1.200
test    IN  A       192.168.1.201
200     IN  PTR     pri.paelzertest1.lan.
201     IN  PTR     test.paelzertest1.lan.

Disable recursion by adding the following to
/etc/bind/named.conf.options:
allow-transfer {"none";};
allow-recursion {"none";};
recursion no;

$ sudo systemctl restart bind9

Now only dns1 answers for test.paelzertest1.lan, and dns2 refuses
queries for it (and vice versa).

Example:
$ dig test.paelzertest1.lan @192.168.122.225

; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest1.lan @192.168.122.225
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 62119
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;test.paelzertest1.lan.         IN      A

;; Query time: 0 msec
;; SERVER: 192.168.122.225#53(192.168.122.225)
;; WHEN: Tue Nov 07 07:14:52 UTC 2017
;; MSG SIZE  rcvd: 50

ubuntu@zesty-dnsmasq-test:~$ dig test.paelzertest2.lan @192.168.122.225

; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest2.lan @192.168.122.225
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37335
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;test.paelzertest2.lan.         IN      A

;; ANSWER SECTION:
test.paelzertest2.lan.  86400   IN      A       192.168.2.201

;; AUTHORITY SECTION:
paelzertest2.lan.       86400   IN      NS      pri.paelzertest2.lan.

;; ADDITIONAL SECTION:
pri.paelzertest2.lan.   86400   IN      A       192.168.2.200

;; Query time: 0 msec
;; SERVER: 192.168.122.225#53(192.168.122.225)
;; WHEN: Tue Nov 07 07:14:56 UTC 2017
;; MSG SIZE  rcvd: 100

Now we configure dnsmasq as the DNS server, with a config that reaches
out to the two DNS servers we prepared.

$ sudo vim /etc/resolv.dnsmasq.conf
nameserver 192.168.122.79
nameserver 192.168.122.225

$ sudo dnsmasq --resolv-file=/etc/resolv.dnsmasq.conf --no-hosts --no-daemon --log-queries

This should give you a dnsmasq that asks our two servers, running
locally (in the foreground with debug enabled).

On a second console on the test system running dnsmasq, use dig to
query dnsmasq, which will then ask the two binds we have.
For something that is sure to fail on both we get:

$ dig foo @127.0.0.1

; <<>> DiG 9.10.3-P4-Ubuntu <<>> foo @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 42311

On the server we see:
dnsmasq: query[A] foo from 127.0.0.1
dnsmasq: forwarded foo to 192.168.122.79
dnsmasq: forwarded foo to 192.168.122.225

That works for the Xenial Test.

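All the dig runs here are really only compared by their status field,
and some results are easier to judge over many runs than from a single
query. As a minimal sketch (not something I used for the verification),
the two helpers below pull the status out of dig output and tally it
over repeated queries. The names dig_status and tally_status are made
up for this example, and the default of 20 runs is an arbitrary
assumption.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of dnsmasq or bind.

# Read dig output on stdin and print the status from the HEADER line,
# e.g. REFUSED, SERVFAIL or NOERROR. Prints nothing if there is no header.
dig_status() {
    sed -n 's/^;; ->>HEADER<<-.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Query NAME at SERVER COUNT times (default 20, an arbitrary choice)
# and print how often each status was seen.
# Usage: tally_status NAME SERVER [COUNT]
tally_status() {
    name=$1
    server=$2
    count=${3:-20}
    i=0
    while [ "$i" -lt "$count" ]; do
        dig "$name" "@$server" | dig_status
        i=$((i + 1))
    done | sort | uniq -c
}
```

Something like "tally_status test.paelzertest2.lan 127.0.0.1 50" would
then print one line per status seen, each prefixed with its count.
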
Now this is a bit of a race: run some local requests and sometimes you
get this combination:

$ dig test.paelzertest2.lan @127.0.0.1

; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest2.lan @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 953
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

Server log:
dnsmasq: query[A] test.paelzertest2.lan from 127.0.0.1
dnsmasq: forwarded test.paelzertest2.lan to 192.168.122.79
dnsmasq: forwarded test.paelzertest2.lan to 192.168.122.225

This should not happen (and doesn't with the fix).

For Zesty, to force the issue (since it has one of the two patches
already) we need to force "SERVFAIL".
Unfortunately this failure has to arrive faster than the valid reply to
trigger the race (dnsmasq would then treat the failure as a valid
answer and reply without waiting for the good answer).

To get an answer a bind has to run, but to get a SERVFAIL instead of an
NXDOMAIN it will need a definition for that zone.
So copy /etc/bind/for.paelzertest1.lan and
/etc/bind/rev.paelzertest1.lan from dns1 to dns2.
Then add the zone to /etc/bind/named.conf.local so that it gets loaded.
Finally "break" it intentionally, e.g. by changing the leading "$TTL"
to "TTL".
That way bind works (one good zone) and claims the paelzertest1
namespace (it is registered in the conf) but fails to serve it.
Status should show something like:
named[3534]: zone paelzertest1.lan/IN: not loaded due to errors.

Now dns1 gives me NOERROR but dns2 gives SERVFAIL for:
$ dig test.paelzertest1.lan @192.168.122.225

; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest1.lan @192.168.122.225
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 36187

Disable caching to widen the window of the race.
We also need to set --all-servers, otherwise it would iterate over the
servers almost randomly.

$ sudo dnsmasq --resolv-file=/etc/resolv.dnsmasq.conf --no-hosts --no-daemon --log-queries --cache-size=0 --all-servers

That gives SERVFAIL when querying the dnsmasq server.

$ dig test.paelzertest1.lan @127.0.0.1

; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest1.lan @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 27511

Log from the server:
dnsmasq: query[A] test.paelzertest1.lan from 127.0.0.1
dnsmasq: forwarded test.paelzertest1.lan to 192.168.122.225

=> It didn't try the next server, as it considered SERVFAIL to be a
successful answer.

Installing the version from proposed resolves that.

$ dig test.paelzertest1.lan @127.0.0.1

; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest1.lan @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43539

Server log:
dnsmasq: query[A] test.paelzertest1.lan from 127.0.0.1
dnsmasq: forwarded test.paelzertest1.lan to 192.168.122.225
dnsmasq: forwarded test.paelzertest1.lan to 192.168.122.79
dnsmasq: reply test.paelzertest1.lan is 192.168.1.201

With that, setting verification-done.

** Tags removed: verification-needed verification-needed-zesty
** Tags added: verification-done verification-done-zesty

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to dnsmasq in Ubuntu.
https://bugs.launchpad.net/bugs/1726017

Title:
  dnsmasq prematurely returns REFUSED, breaking resolver

Status in dnsmasq package in Ubuntu:
  Fix Released
Status in dnsmasq source package in Xenial:
  In Progress
Status in dnsmasq source package in Zesty:
  Fix Committed

Bug description:
  [Impact]

  * DNS name resolution fails in certain network configurations, where
    different DNS servers are responsible for different domains and one
    or more servers reply REFUSED to queries that regard other domains
    than their own. Without the patch, dnsmasq returns a negative reply
    if only one such negative answer is received from a forwarder, even
    if other forwarders return valid responses.
    This breaks the resolver and practically all internet connectivity,
    including web browsing, email, and receiving updates.

  * This should be backported to stable to fix internet connectivity
    for users.

  * The patch fixes the problem by querying all servers and returning a
    negative reply to the requestor only if *all* forwarders return
    negative responses.

  [Test Case]

  * It should be possible to test this in a virtual network. One DNS
    server should be responsible for queries to the outside world, and
    the other one could be a DHCP/DNS instance (perhaps dnsmasq, also)
    that handles internal IP addresses and names. It's important that
    at least one of these servers returns REFUSED to queries that don't
    belong to its realm (assuming the domain name is "my.net", the
    server for "my.net" would reply REFUSED to "ubuntu.com" and every
    other domain). I am not sure if this is normally the case; all I
    can say is that my Linux-based ASUS router does it. Connect an
    Ubuntu VM to this network. To aggravate the problem, the DHCP
    server would put the internal DNS server first in the nameservers
    field. If that's the case, the problem would also occur if the
    client used "strict-order" in dnsmasq.conf.

  [Regression Potential]

  * I don't see any. Would there be networks where admins rely upon
    getting NXDOMAIN back if just one server fails for a DNS query? I
    don't know.

  * [racb] As the behaviour in the area of REFUSED and SERVFAIL is
    being changed, it's probably worth checking during SRU verification
    that dnsmasq correctly passes back successful, REFUSED, SERVFAIL,
    zero-answer and 1+ answer responses in the simple, single upstream
    DNS server case. If there is a regression introduced by these
    patches, it is likely to be in the area of handling SERVFAIL,
    REFUSED and successful replies.

  [Other Info]

  Original bug description follows.

  Seen with dnsmasq 2.75-1ubuntu0.16.04.3, after Trusty->Xenial update.

  In my local network, I have two DNS servers; 192.168.1.1 is the local
  DHCP/DNS server configured to reply to queries inside the local
  network, and 192.168.1.4 is the forwarder in my DSL router,
  responsible for answering queries about the outside world. The DHCP
  server returns these in the order 192.168.1.4,192.168.1.1. The
  internal server replies REFUSED to queries about external domains.

  This configuration has worked well with Ubuntu 14.04 and other Linux
  distros (using Fedora and OpenSUSE internally here), as well as
  various other OSes.

  It does not work with Ubuntu 16.04. NetworkManager's dnsmasq instance
  pushes the REFUSED reply from 192.168.1.1 to applications and ignores
  the successful reply from 192.168.1.4. This causes all DNS queries to
  external servers to fail.

  I believe this is fixed in dnsmasq 2.76 and related to

  http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2016q1/010263.html
  http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=68f6312d4bae30b78daafcd6f51dc441b8685b1e
  http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=object;h=4ace25c5d6

  According to these sources, the bug was introduced with

  http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=object;h=51967f9807665dae403f1497b827165c5fa1084b

  In my local setup at least, I can work around the problem by using
  the "strict-order" option to dnsmasq.

  echo strict-order >/etc/NetworkManager/dnsmasq.d/order.conf

  But that's not a general solution. If dnsmasq has several forwarders,
  and some return SERVFAIL or REFUSED and others return SUCCESS, the
  successful answer should be returned to clients, independent of the
  strict-order setting.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/dnsmasq/+bug/1726017/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp