Re: [Dnsmasq-discuss] CPU spinning bug, possibly related to SSHFP queries

2019-11-28 Thread Tore Anderson
* Vladislav Grishenko

> Can you try to capture the DNS exchange to dnsmasq (on the lo interface) and
> from it (on your NIC interface), both at the same time?
> $ tcpdump -i lo port 53 -w /path/to/dns-lo.pcap
> $ tcpdump -i <nic> port 53 -w /path/to/dns-ext.pcap
> It is quite possible that the trigger query (or reply) can't be logged in the
> usual way, but it will be captured by tcpdump.
> Next, I'd like to take a look at the captures to see whether there is
> something after or near the last logged query before the spinning starts.
> 
> P.S. Caution: the pcap files will contain all your DNS traffic, so sending
> them to the mailing list might not be a good idea.

Hi,

PCAP attached. I used «tcpdump -i any», so it's a single file with the internal 
and external traffic interleaved.

It is apparent that the initial SSHFP query is not forwarded upstream, and that 
the subsequent queries from the stub resolver (a retransmission of the SSHFP 
query plus some other unrelated queries) are neither logged nor forwarded.

Here are the corresponding log lines from Dnsmasq:

nov. 29 07:15:53.964856 sloth.fud.no dnsmasq[48069]: query[A] l1-g9-osl2.n.bitbit.net from 127.0.0.1
nov. 29 07:15:53.965060 sloth.fud.no dnsmasq[48069]: forwarded l1-g9-osl2.n.bitbit.net to 87.238.33.1
nov. 29 07:15:53.965155 sloth.fud.no dnsmasq[48069]: query[AAAA] l1-g9-osl2.n.bitbit.net from 127.0.0.1
nov. 29 07:15:53.965273 sloth.fud.no dnsmasq[48069]: forwarded l1-g9-osl2.n.bitbit.net to 87.238.33.1
nov. 29 07:15:54.039407 sloth.fud.no dnsmasq[48069]: reply l1-g9-osl2.n.bitbit.net is <CNAME>
nov. 29 07:15:54.039461 sloth.fud.no dnsmasq[48069]: reply eth0.l1-g9-osl2.n.bitbit.net is 10.20.120.102
nov. 29 07:15:54.039666 sloth.fud.no dnsmasq[48069]: reply l1-g9-osl2.n.bitbit.net is <CNAME>
nov. 29 07:15:54.039700 sloth.fud.no dnsmasq[48069]: reply eth0.l1-g9-osl2.n.bitbit.net is NODATA-IPv6
nov. 29 07:15:54.964042 sloth.fud.no dnsmasq[48069]: query[type=44] l1-g9-osl2.n.bitbit.net from 127.0.0.1
(CPU starts spinning at this point, no further log lines appear)
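
For anyone who wants to check their own logs for the same pattern, here is a minimal Python sketch that lists queries which were logged but never followed by a "forwarded", "reply", "cached" or "config" line for the same name. The function name and the regular expression are illustrative assumptions, not part of dnsmasq, and the matching is deliberately naive (it pairs entries by name only).

```python
import re

# Matches the action and name in a dnsmasq log-queries line, e.g.
# "... dnsmasq[48069]: query[A] host.example from 127.0.0.1"
# (assumed layout, based on the log lines quoted in this thread)
LINE_RE = re.compile(
    r"dnsmasq\[\d+\]: "
    r"(?P<action>query\[(?P<qtype>[^\]]*)\]|forwarded|reply|cached|config) "
    r"(?P<name>\S+)"
)


def unanswered_queries(lines):
    """Return (qtype, name) pairs for queries with no later log entry."""
    pending = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        action, name = m.group("action"), m.group("name")
        if action.startswith("query["):
            pending.append((m.group("qtype"), name))
            continue
        # Any other action for the same name clears the oldest pending query.
        for i, (_, pending_name) in enumerate(pending):
            if pending_name == name:
                del pending[i]
                break
    return pending


SAMPLE = [
    "nov. 29 07:15:53.964856 sloth.fud.no dnsmasq[48069]: query[A] l1-g9-osl2.n.bitbit.net from 127.0.0.1",
    "nov. 29 07:15:53.965060 sloth.fud.no dnsmasq[48069]: forwarded l1-g9-osl2.n.bitbit.net to 87.238.33.1",
    "nov. 29 07:15:54.039461 sloth.fud.no dnsmasq[48069]: reply eth0.l1-g9-osl2.n.bitbit.net is 10.20.120.102",
    "nov. 29 07:15:54.964042 sloth.fud.no dnsmasq[48069]: query[type=44] l1-g9-osl2.n.bitbit.net from 127.0.0.1",
]

print(unanswered_queries(SAMPLE))
```

On the sample lines, only the type=44 query remains pending, which matches the observation that it is the last thing logged before the spin.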

Tore


cpu-spin.pcap.gz
Description: application/gzip
___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss


Re: [Dnsmasq-discuss] CPU spinning bug, possibly related to SSHFP queries

2019-11-28 Thread Vladislav Grishenko
Hi Tore,

Can you try to capture the DNS exchange to dnsmasq (on the lo interface) and
from it (on your NIC interface), both at the same time?
$ tcpdump -i lo port 53 -w /path/to/dns-lo.pcap
$ tcpdump -i <nic> port 53 -w /path/to/dns-ext.pcap
It is quite possible that the trigger query (or reply) can't be logged in the
usual way, but it will be captured by tcpdump.
Next, I'd like to take a look at the captures to see whether there is
something after or near the last logged query before the spinning starts.

P.S. Caution: the pcap files will contain all your DNS traffic, so sending
them to the mailing list might not be a good idea.

Best Regards, Vladislav Grishenko

-Original Message-
From: Dnsmasq-discuss On Behalf Of Tore Anderson
Sent: Thursday, November 28, 2019 12:38 PM
To: dnsmasq-discuss@lists.thekelleys.org.uk
Subject: [Dnsmasq-discuss] CPU spinning bug, possibly related to SSHFP queries

Hello,

I've noticed that Dnsmasq on my system sometimes enters a defective state where 
it starts spinning on the CPU. When it has entered this state, I need to send 
it SIGKILL to get rid of it - SIGTERM is ignored.

The version is current Git master (2.80-93-g6ebdc95).

I've enabled query logging and grabbed the final log lines of a few incidents 
(slightly anonymised):

Example 1:

forwarded git.i.example.org to 192.168.33.1
reply git.i.example.org is <CNAME>
reply git01-osl3.i.example.org is 10.22.3.196
reply git.i.example.org is <CNAME>
reply git01-osl3.i.example.org is 2001:db8:400:c:18:59ff:fe7a:73c4
query[type=44] git.i.example.org from 127.0.0.1
(CPU spin begins)

Example 2:

query[A] s2-a8-osl3.n.example.org from 127.0.0.1
forwarded s2-a8-osl3.n.example.org to 192.168.33.1
query[AAAA] s2-a8-osl3.n.example.org from 127.0.0.1
forwarded s2-a8-osl3.n.example.org to 192.168.33.1
reply s2-a8-osl3.n.example.org is <CNAME>
reply lo.s2-a8-osl3.n.example.org is 2001:db8:1::4:1
reply s2-a8-osl3.n.example.org is <CNAME>
reply lo.s2-a8-osl3.n.example.org is 192.168.63.11
query[type=44] s2-a8-osl3.n.example.org from 127.0.0.1
(CPU spin begins)

Example 3:

query[A] s1-a8-osl3.n.example.org from 127.0.0.1
forwarded s1-a8-osl3.n.example.org to 192.168.33.1
query[AAAA] s1-a8-osl3.n.example.org from 127.0.0.1
forwarded s1-a8-osl3.n.example.org to 192.168.33.1
reply s1-a8-osl3.n.example.org is <CNAME>
reply lo.s1-a8-osl3.n.example.org is 192.168.63.10
reply s1-a8-osl3.n.example.org is <CNAME>
reply lo.s1-a8-osl3.n.example.org is 2001:db8:1::4:0
query[type=44] s1-a8-osl3.n.example.org from 127.0.0.1
(CPU spin begins)

All of them end with an incoming query for SSHFP records (type 44), which I 
find highly suspect. The SSHFP requests come from the SSH client (due to 
VerifyHostKeyDNS being set in my ~/.ssh/config).

None of the hostnames in question have SSHFP records published, but that 
does not seem to matter, as the query does not seem to be forwarded upstream in 
the first place. When the bug does not occur, Dnsmasq does log that it forwards 
the query upstream, like so:

query[type=44] l1-a9-osl3.n.example.org from 127.0.0.1
forwarded l1-a9-osl3.n.example.org to 192.168.33.1

Dnsmasq is invoked from NetworkManager, using the following command line:

/usr/sbin/dnsmasq --no-resolv --keep-in-foreground --no-hosts --bind-interfaces 
--pid-file=/run/NetworkManager/dnsmasq.pid --listen-address=127.0.0.1 
--cache-size=400 --clear-on-reload --conf-file=/dev/null --proxy-dnssec 
--enable-dbus=org.freedesktop.NetworkManager.dnsmasq 
--conf-dir=/etc/NetworkManager/dnsmasq.d

Additional configuration in /etc/NetworkManager/dnsmasq.d/dnssec.conf:

dnssec
conf-file = /usr/share/dnsmasq/trust-anchors.conf
log-queries

Finally, my environment contains RES_OPTIONS=edns0 in case that is relevant 
(this is required for SSH's VerifyHostKeyDNS feature to work correctly).

I cannot reliably reproduce the issue. It seems to happen regularly (several 
times a day) during normal usage - I use the SSH client quite frequently.

I would be happy to provide additional debugging information, if given 
instructions on how to obtain it.

Tore



Re: [Dnsmasq-discuss] inconsistent use of a server=/example.com/ specification

2019-11-28 Thread Geert Stappers
On Tue, Nov 26, 2019 at 06:18:02AM -0500, Brian J. Murrell wrote:
> On Tue, 2019-11-26 at 07:52 +0100, Which Nameserver wrote:
> > 
> } But NOT what might be causing the inconsistency.
> > I hope that OP digs deeper.
> 
> Probably not.  I moved the desired behaviour to somewhere where it
> works reliably, up to NetworkManager.
> 
> example.com is the domainname of a VPN connection and 10.75.22.247 is
> the DNS server for that domain on that VPN.
> 
> Since NM has the ability to learn about domainnames and DNS servers
> from VPN connections, and to route requests for lookups in that domain
> (and the reverse domains) to the specified DNS, I just moved this
> setting out of the dnsmasq configuration into the NM configuration.
> 
> The funny part is that NM is just poking these into dnsmasq using DBus:
> 
> Nov 25 17:53:00 vpn-client.example.com dnsmasq[129564]: setting upstream servers from DBus
> Nov 25 17:53:00 vpn-client.example.com dnsmasq[129564]: using local addresses only for domain ilinx
> Nov 25 17:53:00 vpn-client.example.com dnsmasq[129564]: using nameserver 10.75.22.247#53 for domain example.com
> Nov 25 17:53:00 vpn-client.example.com dnsmasq[129564]: using nameserver 10.75.22.247#53 for domain 0.8.10.in-addr.arpa
> Nov 25 17:53:00 vpn-client.example.com dnsmasq[129564]: using nameserver 10.75.22.247#53 for domain 22.75.10.in-addr.arpa
> 
> but this is working consistently and reliably where:
> 
> server=/example.com/10.75.22.247
> 
> is not.
> 

If hiding the inconsistency in NM makes one happy, then one is happy.

Feel free to report the strange inconsistency again.
Karma bonuspoints for querying the nearby NS.
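
As a side note on the quoted log: the in-addr.arpa entries are the reverse zones that PTR lookups for the VPN address ranges fall under. A small Python sketch of how an address maps onto such a zone, and of a plain label-wise suffix test, is given below. The helper names are hypothetical and the suffix test is only an illustration, not a description of how dnsmasq selects a server.

```python
import ipaddress


def reverse_name(addr):
    """Return the PTR query name for an IP address."""
    return ipaddress.ip_address(addr).reverse_pointer


def in_domain(name, domain):
    """True if name equals domain or is a subdomain of it (label-wise)."""
    name = name.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    return name == domain or name.endswith("." + domain)


ptr = reverse_name("10.75.22.247")
print(ptr)
print(in_domain(ptr, "22.75.10.in-addr.arpa"))
print(in_domain("host.example.com", "example.com"))
```

With these helpers, a reverse lookup for 10.75.22.247 falls under 22.75.10.in-addr.arpa, one of the domains listed in the log above.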


Groeten
Geert Stappers
-- 
Leven en laten leven



Re: [Dnsmasq-discuss] Domain name wildcard match in the --server option

2019-11-28 Thread Geert Stappers


Warning: this email contains a compliment

If you can't handle compliments, stop reading now.


On Wed, Nov 27, 2019 at 06:41:10PM +0100, Geert Stappers wrote:
> On Wed, Nov 27, 2019 at 10:06:33PM +0800, MingJian Hong wrote:
> >   It is my first time to submit a patch by 'git send-email'.
> > If it doesn't work, please let me know. Thanks.
> 
> Patch did apply.  ( not mangled in e-mail )
> 
> Better feedback from my later.
} Better feedback from me later.
 

Now less in a hurry.


What `git send-email` does is send email.
The emails are created from previously created patches.
Those patches contain the commit messages.

 
> Regards
> Geert Stappers
>
> |stappers@alpaca:~/src/dnsmasq
> |$ git show | head -n 25

This shows the top twenty-five lines of the latest commit, a.k.a. the patch.


> |commit ded56ce787ac814cd7953d2b2efb9277f4fa1eb6
> |Author: MingJian Hong 
> |Date:   Wed Nov 27 22:06:33 2019 +0800
> |
> |Domain name wildcard match in the --server option
> |
> |  It is my first time to submit a patch by 'git send-email'.
> |If it doesn't work, please let me know. Thanks.
> |
> |Regards,
> |hmj

Good text for an email, less good text for a commit message


> |diff --git a/src/forward.c b/src/forward.c
> |index f488b90..8bb3bcf 100644
> |--- a/src/forward.c
> |+++ b/src/forward.c
> |@@ -150,10 +150,19 @@ static unsigned int search_servers(time_t now, union all_addr **addrpp, unsigned
> |   }
> | else if (serv->flags & SERV_HAS_DOMAIN)
> |   {
> |+   int isequal;
> |unsigned int domainlen = strlen(serv->domain);
> |-   char *matchstart = qdomain + namelen - domainlen;
> |+   char *matchstart = strcasestr(qdomain, serv->domain);
> |+   if ((matchstart != NULL) && (*(matchstart+domainlen) == 0 || *(matchstart+domainlen) == '.'))

Great work from MingJian Hong. He came with a question like
 How to express  /domain.tld.*/
after finding out that
  /domain.tld$/
was implemented. He then responded with a patch that does
  /domain.tld.*/


Regards
Geert Stappers

P.S.
Some regular expression characters
  /   search string delimiter
  .   any character
  *   zero or more repetitions of the previous character
  $   end


-- 
Leven en laten leven



Re: [Dnsmasq-discuss] DHCP renew and rebind time weirdness

2019-11-28 Thread Geert Stappers
On Thu, Nov 28, 2019 at 09:49:43PM +0200, Johan Kruger wrote:
> Hi,
> 
> I have a fairly simple DHCP setup on a Raspbian box, using dnsmasq
> 2.80. The relevant options in /etc/dnsmasq.conf are (yes, my home LAN
> is 10.168.8.0/24):
> 
> 
> 
> dhcp-range=10.168.8.10,10.168.8.200 # Also tried with 12h on the end, no difference
> dhcp-authoritative
> log-dhcp
> 
> 
> I'm testing by running "dnsmasq -d" in a shell, so I can see what's going on.
> 
> 
> When dnsmasq gets a DHCP request (I'm using "nmap --script
> broadcast-dhcp-discover" on another box to test), I see the following:
> 
> 
> 
> dnsmasq-dhcp: 811185697 broadcast response
> dnsmasq-dhcp: 811185697 sent size:  1 option: 53 message-type  2
> dnsmasq-dhcp: 811185697 sent size:  4 option: 54 server-identifier  10.168.8.254
> dnsmasq-dhcp: 811185697 sent size:  4 option: 51 lease-time  2m
> dnsmasq-dhcp: 811185697 sent size:  4 option: 58 T1  1m
> dnsmasq-dhcp: 811185697 sent size:  4 option: 59 T2  1m45s
> dnsmasq-dhcp: 811185697 sent size:  4 option:  1 netmask  255.255.255.0
> dnsmasq-dhcp: 811185697 sent size:  4 option: 28 broadcast  10.168.8.255
> dnsmasq-dhcp: 811185697 sent size:  4 option:  3 router  10.168.8.254
> dnsmasq-dhcp: 811185697 sent size:  4 option:  6 dns-server  10.168.8.254
> dnsmasq-dhcp: 811185697 sent size: 16 option: 15 domain-name  myhome
> 
> 
> Notice the lease-time, T1 and T2 responses... 2 minutes??? Default
> should be 1 hour, and I get the same 2m if I specify 12h in the
> dhcp-range line.
> 
> 
> 
> I then added the following to the conf file:
> 
> 
> dhcp-option=51,12h
> dhcp-option=58,12h # DHCP Renewal (T1) Time
> dhcp-option=59,12h # DHCP Rebinding (T2) Time
> 
> 
> Then the response looked like this:
> 
> 
> dnsmasq-dhcp: 1994188113 broadcast response
> dnsmasq-dhcp: 1994188113 sent size:  1 option: 53 message-type  2
> dnsmasq-dhcp: 1994188113 sent size:  4 option: 54 server-identifier  10.168.8.254
> dnsmasq-dhcp: 1994188113 sent size:  4 option: 51 lease-time  2m
> dnsmasq-dhcp: 1994188113 sent size:  4 option: 58 T1  1m
> dnsmasq-dhcp: 1994188113 sent size:  4 option: 59 T2  1m45s
> dnsmasq-dhcp: 1994188113 sent size:  4 option:  1 netmask  255.255.255.0
> dnsmasq-dhcp: 1994188113 sent size:  4 option: 28 broadcast  10.168.8.255
> dnsmasq-dhcp: 1994188113 sent size:  4 option:  3 router  10.168.8.254
> dnsmasq-dhcp: 1994188113 sent size:  4 option:  6 dns-server  10.168.8.254
> dnsmasq-dhcp: 1994188113 sent size: 16 option: 15 domain-name  myhome
> dnsmasq-dhcp: 1994188113 sent size:  4 option: 51 lease-time  12h
> 
> 
> Two lease-time responses, at least the correct 12h one is second so
> the client uses that, but notice that T1 and T2 are still wrong.
> 
> 
> 
> The upshot is that most DHCP clients (Android and Windows anyway)
> just reject the response.
> 
> 
> 
> For now, I guess I'll have to install isc-dhcp-server, since 2.80 is
> the only version of dnsmasq available for Raspbian 10 (buster). Bit
> of overkill for a little home LAN only running a couple of devices,
> but there we have it.

Whatever works for you and thanks for the joke





[Dnsmasq-discuss] DHCP lease-time weirdness

2019-11-28 Thread Johan Kruger
Hi,



I have a fairly simple DHCP setup on a Raspbian box, using dnsmasq 2.80. The 
relevant options in /etc/dnsmasq.conf are (yes, my home LAN is 10.168.8.0/24):



dhcp-range=10.168.8.10,10.168.8.200 # Also tried with 12h on the end, no difference
dhcp-authoritative
log-dhcp



I'm testing by running "dnsmasq -d" in a shell, so I can see what's going on.



When dnsmasq gets a DHCP request (I'm using "nmap --script 
broadcast-dhcp-discover" on another box to test), I see the following:



dnsmasq-dhcp: 811185697 broadcast response
dnsmasq-dhcp: 811185697 sent size:  1 option: 53 message-type  2
dnsmasq-dhcp: 811185697 sent size:  4 option: 54 server-identifier  10.168.8.254
dnsmasq-dhcp: 811185697 sent size:  4 option: 51 lease-time  2m
dnsmasq-dhcp: 811185697 sent size:  4 option: 58 T1  1m
dnsmasq-dhcp: 811185697 sent size:  4 option: 59 T2  1m45s
dnsmasq-dhcp: 811185697 sent size:  4 option:  1 netmask  255.255.255.0
dnsmasq-dhcp: 811185697 sent size:  4 option: 28 broadcast  10.168.8.255
dnsmasq-dhcp: 811185697 sent size:  4 option:  3 router  10.168.8.254
dnsmasq-dhcp: 811185697 sent size:  4 option:  6 dns-server  10.168.8.254
dnsmasq-dhcp: 811185697 sent size: 16 option: 15 domain-name  myhome



Notice the lease-time, T1 and T2 responses... 2 minutes??? Default should be 1 
hour, and I get the same 2m if I specify 12h in the dhcp-range line.



I then added the following to the conf file:



dhcp-option=51,12h
dhcp-option=58,12h # DHCP Renewal (T1) Time
dhcp-option=59,12h # DHCP Rebinding (T2) Time



Then the response looked like this:



dnsmasq-dhcp: 1994188113 broadcast response
dnsmasq-dhcp: 1994188113 sent size:  1 option: 53 message-type  2
dnsmasq-dhcp: 1994188113 sent size:  4 option: 54 server-identifier  10.168.8.254
dnsmasq-dhcp: 1994188113 sent size:  4 option: 51 lease-time  2m
dnsmasq-dhcp: 1994188113 sent size:  4 option: 58 T1  1m
dnsmasq-dhcp: 1994188113 sent size:  4 option: 59 T2  1m45s
dnsmasq-dhcp: 1994188113 sent size:  4 option:  1 netmask  255.255.255.0
dnsmasq-dhcp: 1994188113 sent size:  4 option: 28 broadcast  10.168.8.255
dnsmasq-dhcp: 1994188113 sent size:  4 option:  3 router  10.168.8.254
dnsmasq-dhcp: 1994188113 sent size:  4 option:  6 dns-server  10.168.8.254
dnsmasq-dhcp: 1994188113 sent size: 16 option: 15 domain-name  myhome
dnsmasq-dhcp: 1994188113 sent size:  4 option: 51 lease-time  12h



Two lease-time responses, at least the correct 12h one is second so the client 
uses that, but notice that T1 and T2 are still wrong.
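
A note on the numbers: the T1 and T2 values in both responses are what RFC 2131 gives as defaults for a 2-minute lease (T1 = 0.5 x lease, T2 = 0.875 x lease), so they look derived from the 2m lease-time rather than set independently. The Python sketch below just reproduces that arithmetic; the function name is made up for illustration, and whether dnsmasq computes the values exactly this way internally is an assumption.

```python
def default_timers(lease_seconds):
    """Return (T1, T2) in seconds using the RFC 2131 default fractions."""
    t1 = lease_seconds // 2        # renewal time: 0.5 * lease
    t2 = lease_seconds * 7 // 8    # rebinding time: 0.875 * lease
    return t1, t2


# 2m lease, as in the log above
print(default_timers(2 * 60))
# 12h lease, as requested in the config
print(default_timers(12 * 3600))
```

For a 120 s lease this gives 60 s and 105 s, i.e. the 1m and 1m45s seen in the log. For a 12h lease the same fractions would give 6h and 10h30m.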



The upshot is that most DHCP clients (Android and Windows anyway) just reject 
the response.



For now, I guess I'll have to install isc-dhcp-server, since 2.80 is the only 
version of dnsmasq available for Raspbian 10 (buster). Bit of overkill for a 
little home LAN only running a couple of devices, but there we have it.


Re: [Dnsmasq-discuss] [PATCH] DHCPv6: Honor assigning IPv6 address based on MAC address

2019-11-28 Thread Harald Jensås
On Sun, 2019-11-24 at 10:35 +0100, Geert Stappers wrote:
> On Thu, Nov 21, 2019 at 03:12:00PM +0100, Harald Jensås wrote:
> > Bumping this patch again: 
> > http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2018q4/012707.html
> > 
> > I see there have been several requests on the list to include this:
> > http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2019q1/012895.html 
> > (and others ...)
> > afaict there have not been any objections to the patch.
> > 
> > 
> > We are trying to make Openstack Ironic deploy baremetal servers in a
> > setup where Openstack Neutron is used. Neutron uses dnsmasq and the MAC
> > address as identifier to assign IPv6 addresses in a static-only setup;
> > config details are at the bottom of this mail.
> > 
> > The problem with DUID/IAID changes as the baremetal node moves
> > through the deployment phases UEFI->iPXE->Deploy Ramdisk, resulting in
> > "no address available", has always been a blocker for this to work.
> > (The issue was discussed on this list some time ago:
> > http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2017q1/thread.html#11267
> > )
> > 
> > This patch solves this problem of changing DUID/IAIDs.
> > 
> >  Is there a reason not to merge the patch?
> 
> A better question would be:
> 
>   What is needed to get this, reported as working, patch merged?
> 

Thanks Geert, this is indeed the better question.

I can provide dnsmasq logs and traffic captures along with
configuration details from a setup I have where I verified this is
working, if that helps.

In other words, let us know what is needed and we shall try to provide
it.


> 
> > 
> > 
> > /usr/sbin/dnsmasq \
> >  -k \
> >  --no-hosts \
> >  --no-resolv \
> >  --pid-file=/var/lib/neutron/dhcp/6c9c3845-a101-47fa-a491-52e1ecb11e65/pid \
> >  --dhcp-hostsfile=/var/lib/neutron/dhcp/6c9c3845-a101-47fa-a491-52e1ecb11e65/host \
> >  --addn-hosts=/var/lib/neutron/dhcp/6c9c3845-a101-47fa-a491-52e1ecb11e65/addn_hosts \
> >  --dhcp-optsfile=/var/lib/neutron/dhcp/6c9c3845-a101-47fa-a491-52e1ecb11e65/opts \
> >  --dhcp-leasefile=/var/lib/neutron/dhcp/6c9c3845-a101-47fa-a491-52e1ecb11e65/leases \
> >  --dhcp-match=set:ipxe,175 \
> >  --dhcp-userclass=set:ipxe6,iPXE \
> >  --local-service \
> >  --bind-dynamic \
> >  --dhcp-range=set:subnet-2b009f45-e0e2-4888-bb19-8d46e51717b8,fd12:3456:789a:1::,static,64,86400s \
> >  --dhcp-option-force=option:mtu,1450 \
> >  --dhcp-lease-max=16777216 \
> >  --conf-file= \
> >  --domain=localdomain
> > 
> > $ cat /var/lib/neutron/dhcp/6c9c3845-a101-47fa-a491-52e1ecb11e65/host
> > fa:16:3e:8b:07:9f,host-fd12-3456-789a-1--4.localdomain,[fd12:3456:789a:1::4]
> > fa:16:3e:9f:b7:e3,host-fd12-3456-789a-1--ed.localdomain,[fd12:3456:789a:1::ed]
> > 
> > 
> 
> Groeten
> Geert Stappers




[Dnsmasq-discuss] CPU spinning bug, possibly related to SSHFP queries

2019-11-28 Thread Tore Anderson
Hello,

I've noticed that Dnsmasq on my system sometimes enters a defective state where 
it starts spinning on the CPU. When it has entered this state, I need to send 
it SIGKILL to get rid of it - SIGTERM is ignored.

The version is current Git master (2.80-93-g6ebdc95).

I've enabled query logging and grabbed the final log lines of a few incidents 
(slightly anonymised):

Example 1:

forwarded git.i.example.org to 192.168.33.1
reply git.i.example.org is <CNAME>
reply git01-osl3.i.example.org is 10.22.3.196
reply git.i.example.org is <CNAME>
reply git01-osl3.i.example.org is 2001:db8:400:c:18:59ff:fe7a:73c4
query[type=44] git.i.example.org from 127.0.0.1
(CPU spin begins)

Example 2:

query[A] s2-a8-osl3.n.example.org from 127.0.0.1
forwarded s2-a8-osl3.n.example.org to 192.168.33.1
query[AAAA] s2-a8-osl3.n.example.org from 127.0.0.1
forwarded s2-a8-osl3.n.example.org to 192.168.33.1
reply s2-a8-osl3.n.example.org is <CNAME>
reply lo.s2-a8-osl3.n.example.org is 2001:db8:1::4:1
reply s2-a8-osl3.n.example.org is <CNAME>
reply lo.s2-a8-osl3.n.example.org is 192.168.63.11
query[type=44] s2-a8-osl3.n.example.org from 127.0.0.1
(CPU spin begins)

Example 3:

query[A] s1-a8-osl3.n.example.org from 127.0.0.1
forwarded s1-a8-osl3.n.example.org to 192.168.33.1
query[AAAA] s1-a8-osl3.n.example.org from 127.0.0.1
forwarded s1-a8-osl3.n.example.org to 192.168.33.1
reply s1-a8-osl3.n.example.org is <CNAME>
reply lo.s1-a8-osl3.n.example.org is 192.168.63.10
reply s1-a8-osl3.n.example.org is <CNAME>
reply lo.s1-a8-osl3.n.example.org is 2001:db8:1::4:0
query[type=44] s1-a8-osl3.n.example.org from 127.0.0.1
(CPU spin begins)

All of them end with an incoming query for SSHFP records (type 44), which I 
find highly suspect. The SSHFP requests come from the SSH client (due to 
VerifyHostKeyDNS being set in my ~/.ssh/config).

None of the hostnames in question have SSHFP records published, but that 
does not seem to matter, as the query does not seem to be forwarded upstream in 
the first place. When the bug does not occur, Dnsmasq does log that it forwards 
the query upstream, like so:

query[type=44] l1-a9-osl3.n.example.org from 127.0.0.1
forwarded l1-a9-osl3.n.example.org to 192.168.33.1

Dnsmasq is invoked from NetworkManager, using the following command line:

/usr/sbin/dnsmasq --no-resolv --keep-in-foreground --no-hosts --bind-interfaces 
--pid-file=/run/NetworkManager/dnsmasq.pid --listen-address=127.0.0.1 
--cache-size=400 --clear-on-reload --conf-file=/dev/null --proxy-dnssec 
--enable-dbus=org.freedesktop.NetworkManager.dnsmasq 
--conf-dir=/etc/NetworkManager/dnsmasq.d

Additional configuration in /etc/NetworkManager/dnsmasq.d/dnssec.conf:

dnssec
conf-file = /usr/share/dnsmasq/trust-anchors.conf
log-queries

Finally, my environment contains RES_OPTIONS=edns0 in case that is relevant 
(this is required for SSH's VerifyHostKeyDNS feature to work correctly).

I cannot reliably reproduce the issue. It seems to happen regularly (several 
times a day) during normal usage - I use the SSH client quite frequently.
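
For anyone who wants to poke at this without waiting for the SSH client, here is a minimal Python sketch that builds a type 44 (SSHFP) query with an EDNS0 OPT record and can send it to the local Dnsmasq. It is an assumption-laden test helper, not a known reproducer: the function names, the query ID and the advertised payload size of 1232 are arbitrary choices, and there is no claim that this packet alone triggers the spin.

```python
import socket
import struct

SSHFP = 44  # RR type for SSH fingerprint records
OPT = 41    # EDNS0 pseudo-RR type


def build_query(name, qtype=SSHFP, query_id=0x1234, edns_payload=1232):
    """Build a DNS query for `name` with one EDNS0 OPT record attached."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    # QNAME: length-prefixed labels terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    # OPT RR: root name, TYPE=41, CLASS=UDP payload size, TTL=0, RDLEN=0
    opt = b"\x00" + struct.pack("!HHIH", OPT, edns_payload, 0, 0)
    return header + question + opt


def send_query(name, server=("127.0.0.1", 53), timeout=2.0):
    """Send the query over UDP; return the raw reply, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), server)
        try:
            data, _ = sock.recvfrom(4096)
            return data
        except socket.timeout:
            return None


# Example (not executed here): send_query("git.i.example.org")
```

Calling `send_query()` in a loop while watching the query log and CPU usage might help narrow down whether the query type or the EDNS0 option matters.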

I would be happy to provide additional debugging information, if given 
instructions on how to obtain it.

Tore
