Re: [Dnsmasq-discuss] NXDOMAIN on existing A record
Petr,

Thank you very much for your help. I will follow your advice and report my findings to the list.

On Wed, Jul 10, 2019, 4:47 AM Petr Mensik wrote:

> Hello Alex,
>
> I would try removing the all-servers and clear-on-reload statements. I
> would use just one server for testing, then retest each of the others
> for the same behaviour. When you do not know which server is used, it
> is hard to debug.
>
> I think the dots in server=/.X/ are not necessary and may even be
> misleading. Try it without them, just server=/X/ip
>
> I think a one-second timeout is too short. Use only localhost in
> /etc/resolv.conf and debug what happens with dnsmasq. Record which
> queries are sent to dnsmasq and what dnsmasq forwards to the configured
> servers.
>
> Note that I have already discovered that requests without the recursion
> desired bit set are always forwarded and are not answered from any
> local records. But that should not be the issue. Try dig +rec and
> dig +norec to rule it out.
>
> Regards,
> Petr

___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
Re: [Dnsmasq-discuss] NXDOMAIN on existing A record
Hello Alex,

I would try removing the all-servers and clear-on-reload statements. I would use just one server for testing, then retest each of the others for the same behaviour. When you do not know which server is used, it is hard to debug.

I think the dots in server=/.X/ are not necessary and may even be misleading. Try it without them, just server=/X/ip

I think a one-second timeout is too short. Use only localhost in /etc/resolv.conf and debug what happens with dnsmasq. Record which queries are sent to dnsmasq and what dnsmasq forwards to the configured servers.

Note that I have already discovered that requests without the recursion desired bit set are always forwarded and are not answered from any local records. But that should not be the issue. Try dig +rec and dig +norec to rule it out.

Regards,
Petr

On 7/7/19 10:28 PM, Alex Litvak wrote:
> (lack of sleep, fixing some mistakes in text)
>
> Hello everyone,
>
> I run consul services on my network, where services are registered
> under .service.consul when they start. All containers and bare metal
> hosts are running dnsmasq 2.80.
> I noticed that if I restart one of the containers, one of the hosts
> continues failing to resolve the service name. I assume that dnsmasq is
> the culprit because:
>
> 1. I can resolve service xyz.service.consul against the standard DNS
> servers with dig.
> 2. Dnsmasq listening on 127.0.0.1 is the first entry in resolv.conf,
> and when I run tcpdump against port 53 on interface lo I see it return
> NXDOMAIN for each A record query for the service in question.
> 3. If I restart dnsmasq, everything is back to normal again. Even
> weirder, if I send SIGHUP to dnsmasq, which only causes a reread of the
> /etc/hosts file, everything is back to normal as far as service
> resolution goes.
>
> I have this problem only on some hosts, without any pattern I can
> recognize. For example, I have two nodes with the same config, OS,
> kernel version, dnsmasq version, etc., and one of them has the problem
> 100% of the time after service xyz.service.consul restarts and the
> other does not.
>
> Where do I start troubleshooting? Any ideas are welcome.

--
Petr Menšík
Software Engineer
Red Hat, http://www.redhat.com/
email: pemen...@redhat.com PGP: 65C6C973
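A sketch, not from the thread, of what the dig +rec / dig +norec comparison changes on the wire: the only difference is the recursion desired (RD) bit in the DNS header flags. The helper names build_query, rcode and ask are hypothetical, and the code assumes plain UDP queries for an A record; it is an illustration of the header layout, not a replacement for dig.

```python
import socket
import struct


def build_query(name: str, recursion_desired: bool = True, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (class IN)."""
    # RD is the 0x0100 bit of the 16-bit flags word in the DNS header.
    flags = 0x0100 if recursion_desired else 0x0000
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def rcode(response: bytes) -> int:
    """Return the RCODE of a DNS response (0 = NOERROR, 3 = NXDOMAIN)."""
    return struct.unpack("!H", response[2:4])[0] & 0x000F


def ask(server: str, name: str, recursion_desired: bool, timeout: float = 5.0) -> int:
    """Send one UDP query to server:53 and return the RCODE of the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, recursion_desired), (server, 53))
        response, _ = sock.recvfrom(512)
    return rcode(response)
```

Calling ask("127.0.0.1", "xyz.service.consul", recursion_desired=True) and then again with recursion_desired=False against the local dnsmasq would show whether the two cases return different RCODEs, which is the check the +rec / +norec suggestion is meant to rule out.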
Re: [Dnsmasq-discuss] NXDOMAIN on existing A record
(lack of sleep, fixing some mistakes in text)

Hello everyone,

I run consul services on my network, where services are registered under .service.consul when they start. All containers and bare metal hosts are running dnsmasq 2.80.
I noticed that if I restart one of the containers, one of the hosts continues failing to resolve the service name. I assume that dnsmasq is the culprit because:

1. I can resolve service xyz.service.consul against the standard DNS servers with dig.
2. Dnsmasq listening on 127.0.0.1 is the first entry in resolv.conf, and when I run tcpdump against port 53 on interface lo I see it return NXDOMAIN for each A record query for the service in question.
3. If I restart dnsmasq, everything is back to normal again. Even weirder, if I send SIGHUP to dnsmasq, which only causes a reread of the /etc/hosts file, everything is back to normal as far as service resolution goes.

I have this problem only on some hosts, without any pattern I can recognize. For example, I have two nodes with the same config, OS, kernel version, dnsmasq version, etc., and one of them has the problem 100% of the time after service xyz.service.consul restarts and the other does not.

Where do I start troubleshooting? Any ideas are welcome.

Here is a standard dnsmasq configuration.

port=53
domain-needed
bogus-priv
interface=lo
listen-address=127.0.0.1
no-dhcp-interface=127.0.0.1
#bind-interfaces
no-resolv
all-servers
dns-forward-max=500

# If you don't want dnsmasq to read /etc/hosts, uncomment the
# following line.
#no-hosts
# or if you want it to read another file, as well as /etc/hosts, use
# this.
#addn-hosts=/etc/banner_add_hosts

#log-queries=extra
#log-facility=/var/log/dnsmasq.log
log-async=25

# Set the cachesize here.
cache-size=1
min-cache-ttl=5
#neg-ttl=3600

# If you want to disable negative caching, uncomment this.
#no-negcache

# For debugging purposes, log each DNS query as it passes through
# dnsmasq.
#log-queries
clear-on-reload

server=10.0.48.12
server=10.0.48.11
server=10.0.21.63
server=10.0.21.61

server=/.la.consul/10.0.73.43
server=/.la.consul/10.0.73.40
server=/.la.consul/10.0.73.28
server=/.chi-pbx.consul/10.1.73.1
server=/.chi-pbx.consul/10.1.73.2
server=/.chi-pbx.consul/10.1.73.3
server=/.consul/10.0.73.43
server=/.consul/10.0.73.40
server=/.consul/10.0.73.28

Resolver config

search ''
options timeout:1 attempts:1
nameserver 127.0.0.1
nameserver 10.0.48.11
nameserver 10.0.48.12
nameserver 10.0.21.63
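A sketch, not from the thread, of how the debugging changes suggested in Petr's reply could be applied mechanically to a configuration like the one above: drop all-servers and clear-on-reload, keep a single server per domain, and remove the leading dot from server=/.X/ entries. The function name debug_variant is hypothetical, and uncommenting log-queries is an added assumption about how one might record what dnsmasq receives and forwards.

```python
import re


def debug_variant(conf: str) -> str:
    """Return a copy of a dnsmasq config text simplified for debugging."""
    dropped = {"all-servers", "clear-on-reload"}
    seen_domains = set()
    out = []
    for line in conf.splitlines():
        stripped = line.strip()
        if stripped in dropped:
            continue  # suggested in the reply: remove these statements
        if stripped == "#log-queries":
            out.append("log-queries")  # assumption: enable query logging
            continue
        match = re.fullmatch(r"server=(?:/([^/]*)/)?(.+)", stripped)
        if match:
            # Strip the leading dot from the domain part, if there is one.
            domain = (match.group(1) or "").lstrip(".")
            if domain in seen_domains:
                continue  # keep only the first server listed for each domain
            seen_domains.add(domain)
            prefix = f"/{domain}/" if domain else ""
            out.append(f"server={prefix}{match.group(2)}")
            continue
        out.append(line)  # everything else is passed through unchanged
    return "\n".join(out)
```

To retest the other upstream servers for the same behaviour, as the reply suggests, the order of the server= lines in the input can be changed so that a different one comes first for each domain.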
Re: [Dnsmasq-discuss] NXDOMAIN on existing A record
On Sun, Jul 07, 2019 at 02:09:20PM -0500, Alex Litvak wrote:
> Hello everyone,
>
> I run consul services on my network where services are registered with
> xyz.service.consul when they start. All containers and bare metal hosts are
> running dnsmasq 2.80.
> I noticed that if I restart one of the containers, one of the hosts continues
> failing to resolve the server hostname. I can see that dnsmasq is the culprit
> because:
>
> 1. I can resolve the service against standard DNS servers.
> 2. Dnsmasq on 127.0.0.1 is first in resolv.conf, and when I run tcpdump
> against port 53 on lo I see it return NXDOMAIN on the service query.
> 3. If I restart dnsmasq, everything is back to normal again. Even
> weirder, if I send SIGHUP to dnsmasq, which only causes it to reread the
> /etc/hosts file, everything is back to normal as far as service resolution goes.
>
> The weird thing is that it only happens on some hosts, without any pattern I
> can recognize. For example, I have two nodes with the same config, OS, kernel
> version, dnsmasq version, etc., and one of them has the problem 100% of the
> time on service restart and the other does not.
>
> Where do I start troubleshooting? Any ideas are welcome.

Draw a diagram / make a sketch / picture it

Regards
Geert Stappers
--
Live and let live