Ok testing zesty on my own then, verified with three KVM guests:
dns1 192.168.122.79
dns2 192.168.122.225
zesty 192.168.122.220

# basic servers
$ sudo apt-get install bind9 bind9utils bind9-doc

/etc/bind/named.conf.local:
zone "paelzertest1.lan" {
        type master;
        file "/etc/bind/for.paelzertest1.lan";
 };
zone "1.168.192.in-addr.arpa" {
        type master;
        file "/etc/bind/rev.paelzertest1.lan";
 };

The other one the same but with a 2 instead of a 1

Also the forward/reverse zones, with 1 on dns1 and 2 on dns2:
/etc/bind/for.paelzertest1.lan:
$TTL 86400
@   IN  SOA     pri.paelzertest1.lan. root.paelzertest1.lan. (
        2011071001  ;Serial
        3600        ;Refresh
        1800        ;Retry
        604800      ;Expire
        86400       ;Minimum TTL
)
@       IN  NS         pri.paelzertest1.lan.
@       IN  A          192.168.1.200
@       IN  A          192.168.1.201
pri     IN  A          192.168.1.200
test    IN  A          192.168.1.201

/etc/bind/rev.paelzertest1.lan:
$TTL 86400
@   IN  SOA     pri.paelzertest1.lan. root.paelzertest1.lan. (
        2011071002  ;Serial
        3600        ;Refresh
        1800        ;Retry
        604800      ;Expire
        86400       ;Minimum TTL
)
@       IN  NS          pri.paelzertest1.lan.
@       IN  PTR         paelzertest1.lan.
pri     IN  A           192.168.1.200
test    IN  A           192.168.1.201
200     IN  PTR         pri.paelzertest1.lan.
201     IN  PTR         test.paelzertest1.lan.
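
Since dns2 only differs by the digit, its files can be derived from the dns1
ones. A minimal sketch, assuming the only differences are the zone name, the
192.168.1.x addresses and the reverse zone name; to_zone2 is a made-up helper
name, not part of bind:

```shell
# to_zone2: rewrite dns1 zone/config text on stdin into its dns2 variant.
# Hypothetical helper; assumes only the names and the third octet differ.
to_zone2() {
  sed -e 's/paelzertest1/paelzertest2/g' \
      -e 's/192\.168\.1\./192.168.2./g' \
      -e 's/"1\.168\.192\.in-addr\.arpa"/"2.168.192.in-addr.arpa"/g'
}
```

For example: to_zone2 < for.paelzertest1.lan > for.paelzertest2.lan, and
likewise for the reverse zone and named.conf.local.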

Disable recursion by adding the following to /etc/bind/named.conf.options:
allow-transfer {"none";};
allow-recursion {"none";};
recursion no;

$ sudo systemctl restart bind9

With this setup dns1 answers only for paelzertest1.lan and dns2 only for
paelzertest2.lan; each refuses queries for the other's zone.

Example:
$ dig test.paelzertest1.lan @192.168.122.225

; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest1.lan @192.168.122.225
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 62119
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;test.paelzertest1.lan.         IN      A

;; Query time: 0 msec
;; SERVER: 192.168.122.225#53(192.168.122.225)
;; WHEN: Tue Nov 07 07:14:52 UTC 2017
;; MSG SIZE  rcvd: 50

ubuntu@zesty-dnsmasq-test:~$ dig test.paelzertest2.lan @192.168.122.225

; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest2.lan @192.168.122.225
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37335
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;test.paelzertest2.lan.         IN      A

;; ANSWER SECTION:
test.paelzertest2.lan.  86400   IN      A       192.168.2.201

;; AUTHORITY SECTION:
paelzertest2.lan.       86400   IN      NS      pri.paelzertest2.lan.

;; ADDITIONAL SECTION:
pri.paelzertest2.lan.   86400   IN      A       192.168.2.200

;; Query time: 0 msec
;; SERVER: 192.168.122.225#53(192.168.122.225)
;; WHEN: Tue Nov 07 07:14:56 UTC 2017
;; MSG SIZE  rcvd: 100
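
To check all four server/zone combinations without reading full dig output,
something like the following could be used. It is only a sketch: dig_status and
status_matrix are names I made up, and the IPs and names are the ones from this
test setup.

```shell
# dig_status: print the status word (NOERROR, REFUSED, ...) found in
# dig output read from stdin.
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p'
}

# status_matrix: ask every server for every name and print one line per pair.
# Needs dig and the two bind guests from this setup to be reachable.
status_matrix() {
  for server in 192.168.122.79 192.168.122.225; do
    for name in test.paelzertest1.lan test.paelzertest2.lan; do
      status=$(dig +noall +comments "$name" "@$server" | dig_status)
      printf '%s %s %s\n' "$server" "$name" "$status"
    done
  done
}
```

Calling status_matrix should then show NOERROR for each server's own zone and
REFUSED for the other one, matching the two dig runs above.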


Now we configure dnsmasq as a DNS server that forwards to the two DNS servers
we prepared.

$ sudo vim /etc/resolv.dnsmasq.conf
nameserver 192.168.122.79
nameserver 192.168.122.225
$ sudo dnsmasq --resolv-file=/etc/resolv.dnsmasq.conf --no-hosts --no-daemon --log-queries

This should give you a dnsmasq running locally (in the foreground, with debug
enabled) that asks our two servers.
On a second console on the same test system, use dig to query dnsmasq, which
will then ask the two bind servers we have.

So for something that fails for sure on both we get:
$ dig foo @127.0.0.1

; <<>> DiG 9.10.3-P4-Ubuntu <<>> foo @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 42311

On the server we see:
dnsmasq: query[A] foo from 127.0.0.1
dnsmasq: forwarded foo to 192.168.122.79
dnsmasq: forwarded foo to 192.168.122.225

That works for the Xenial Test.

Now this is a bit of a race: run some local requests and sometimes you
get this combination:

$ dig test.paelzertest2.lan @127.0.0.1

; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest2.lan @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 953
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

server log:
dnsmasq: query[A] test.paelzertest2.lan from 127.0.0.1
dnsmasq: forwarded test.paelzertest2.lan to 192.168.122.79
dnsmasq: forwarded test.paelzertest2.lan to 192.168.122.225

This should not happen (and doesn't with the fix).
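
Because it is a race, repeating the query and counting the results makes it
easier to spot. A rough sketch, with made-up helper names (probe, tally) and
assuming dnsmasq listens on 127.0.0.1 as above; note that with caching enabled
repeated queries may be answered from the cache:

```shell
# tally: count identical lines on stdin and print "STATUS COUNT", sorted.
tally() {
  awk '{ n[$1]++ } END { for (s in n) print s, n[s] }' | sort
}

# probe NAME [COUNT]: query the local dnsmasq COUNT times (default 20)
# and print the status of every reply, one per line. Needs dig.
probe() {
  i=0
  while [ "$i" -lt "${2:-20}" ]; do
    dig +noall +comments "$1" @127.0.0.1 |
      sed -n 's/.*status: \([A-Z]*\),.*/\1/p'
    i=$((i + 1))
  done
}
```

Usage would be: probe test.paelzertest2.lan 50 | tally
Any REFUSED in the tally means the race was hit.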

For Zesty, which already has one of the two patches, we need to force a
"SERVFAIL" to trigger the issue.
Unfortunately this failure has to arrive faster than the valid reply to trigger
the race (dnsmasq would then treat the failure as success and reply without
waiting for the good answer).

To get an answer a bind has to run, but to get a SERVFAIL instead of an
NXDOMAIN it will need a definition for that zone.

So copy /etc/bind/for.paelzertest1.lan and /etc/bind/rev.paelzertest1.lan from
dns1 to dns2.
Then add them to /etc/bind/named.conf.local so they get loaded.
Finally "break" the zone intentionally, e.g. by changing the leading "$TTL" to
"TTL".
That way bind still runs (it has one good zone) and claims the paelzertest1
namespace (the zone is configured), but fails to load it.
The status output should show something like:
  named[3534]: zone paelzertest1.lan/IN: not loaded due to errors.
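
The "break" step can be scripted; a small sketch, where break_zone is a
hypothetical helper name and the zone file is assumed to start with the "$TTL"
line as shown earlier:

```shell
# break_zone: turn a leading "$TTL" directive into a bare "TTL" so that
# named can no longer parse the zone (reads stdin, writes stdout).
break_zone() {
  sed '1s/^\$TTL/TTL/'
}
```

On dns2 the result would replace /etc/bind/for.paelzertest1.lan before
restarting bind9. Running named-checkzone (which, as far as I know, ships with
the bind9utils package installed above) on the broken file should then report
an error as well.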

Now dns1 gives me NOERROR but dns2 gives SERVFAIL for the same query:
$ dig test.paelzertest1.lan @192.168.122.225

; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest1.lan @192.168.122.225
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 36187


Disable caching to widen the window of the race.
We also need to set --all-servers, otherwise it would iterate over the servers
almost randomly.
$ sudo dnsmasq --resolv-file=/etc/resolv.dnsmasq.conf --no-hosts --no-daemon --log-queries --cache-size=0 --all-servers

That gives SERVFAIL when querying the dnsmasq server.
$ dig test.paelzertest1.lan @127.0.0.1

; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest1.lan @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 27511

Log from the server:
dnsmasq: query[A] test.paelzertest1.lan from 127.0.0.1
dnsmasq: forwarded test.paelzertest1.lan to 192.168.122.225

=> It didn't try the next server, as it considered SERVFAIL to be a
successful answer.

Installing the version from proposed resolves that.

$ dig test.paelzertest1.lan @127.0.0.1

; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest1.lan @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43539

Server-log:
dnsmasq: query[A] test.paelzertest1.lan from 127.0.0.1
dnsmasq: forwarded test.paelzertest1.lan to 192.168.122.225
dnsmasq: forwarded test.paelzertest1.lan to 192.168.122.79
dnsmasq: reply test.paelzertest1.lan is 192.168.1.201

With that, setting verification-done.

** Tags removed: verification-needed verification-needed-zesty
** Tags added: verification-done verification-done-zesty

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to dnsmasq in Ubuntu.
https://bugs.launchpad.net/bugs/1726017

Title:
  dnsmasq prematurely returns REFUSED, breaking resolver

Status in dnsmasq package in Ubuntu:
  Fix Released
Status in dnsmasq source package in Xenial:
  In Progress
Status in dnsmasq source package in Zesty:
  Fix Committed

Bug description:
  [Impact]

   * DNS name resolution fails in certain network configurations, where
     different DNS servers are responsible for different domains and one or
     more servers reply REFUSED to queries that regard other domains than
     their own. Without the patch, dnsmasq returns a negative reply
     if only one such negative answer is received from a forwarder, even
     if other forwarders return valid responses.

     This breaks the resolver and practically all internet connectivity,
     including web browsing, email, and receiving updates.

   * This should be backported to stable to fix internet connectivity
     for users.

   * The patch fixes the problem by querying all servers and returning
     a negative reply to the requestor only if *all* forwarders return negative
     responses.
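
  The intended selection rule can be modelled in a few lines. This is a
  simplified sketch, not dnsmasq's actual code: pick_reply is a made-up name,
  and treating exactly REFUSED and SERVFAIL as "negative" is an assumption
  taken from this report.

```shell
# pick_reply STATUS...: statuses from the forwarders, in arrival order.
# Prints the status that would be handed back to the client.
pick_reply() {
  last=""
  for status in "$@"; do
    case "$status" in
      # Negative reply: remember it, but keep waiting for the others.
      REFUSED|SERVFAIL) last=$status ;;
      # Anything else is passed on to the client right away.
      *) printf '%s\n' "$status"; return 0 ;;
    esac
  done
  # Every forwarder was negative, so the last negative reply is returned.
  printf '%s\n' "$last"
}
```

  With this rule, pick_reply SERVFAIL NOERROR yields NOERROR, whereas the
  unpatched behaviour corresponds to returning the first status seen.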

  [Test Case]

   * It should be possible to test this in a virtual network. One DNS server
     should be responsible for queries to the outside world, and the other one
     could be a DHCP/DNS instance (perhaps dnsmasq, also) that handles internal
     IP addresses and names. It's important that at least one of these servers
     return REFUSED to queries that don't belong to its realm (assuming the
     domain name is "my.net", the server for "my.net" would reply REFUSED to
     "ubuntu.com" and every other domain). I am not sure if this is normally the
     case; all I can say is that my Linux-based ASUS router does it.

     Connect an Ubuntu VM to this network.

     To aggravate the problem, the DHCP server would put the internal DNS
     server first in the nameservers field. If that's the case, the problem
     would also occur if the client used "strict-order" in dnsmasq.conf.

  [Regression Potential]

   * I don't see any. Would there be networks where admins rely upon getting
     NXDOMAIN back if just one server fails for a DNS query? I don't know.

   * [racb] As the behaviour in the area of REFUSED and SERVFAIL is
     being changed, it's probably worth checking during SRU verification
     that dnsmasq correctly passes back successful, REFUSED, SERVFAIL,
     zero-answer and 1+ answer responses in the simple, single upstream DNS
     server case. If there is a regression introduced by these patches, it
     is likely to be in the area of handling SERVFAIL, REFUSED and
     successful replies.

  [Other Info]

  Original bug description follows.

  Seen with dnsmasq 2.75-1ubuntu0.16.04.3, after Trusty->Xenial update.

  In my local network, I have two DNS servers; 192.168.1.1 is the local
  DHCP/DNS server configured to reply to queries inside the local
  network, and 192.168.1.4 is the forwarder in my DSL Router,
  responsible for answering queries about the outside world. The DHCP server
  returns these in the order 192.168.1.4,192.168.1.1. The internal
  server replies REFUSED to queries about external domains.

  This configuration has worked well with Ubuntu 14.04 and other Linux
  Distros (using Fedora and OpenSUSE internally here), as well as
  various other OSes.

  It does not work with Ubuntu 16.04. NetworkManager's dnsmasq instance
  pushes the REFUSED reply from 192.168.1.1 to applications and ignores
  the successful reply from 192.168.1.4. This causes all DNS queries to
  external servers to fail.

  I believe this is fixed in dnsmasq 2.76 and related to

  http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2016q1/010263.html
  http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=68f6312d4bae30b78daafcd6f51dc441b8685b1e
  http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=object;h=4ace25c5d6

  According to these sources, the bug was introduced with

  http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=object;h=51967f9807665dae403f1497b827165c5fa1084b

  In my local setup at least, I can work around the problem by using the
  "strict-order" option to dnsmasq.

  echo strict-order >/etc/NetworkManager/dnsmasq.d/order.conf

  But that's not a general solution. If dnsmasq has several forwarders,
  and some return SERVFAIL or REFUSED and others return SUCCESS, the
  successful answer should be returned to clients, independent of the
  strict-order setting.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/dnsmasq/+bug/1726017/+subscriptions
