[Dnsmasq-discuss] Infinite(?) RTR-ADVERTs being sent out [in Ubuntu NetworkManager testuite]

2018-12-20 Thread Iain Lane
Hi there,

Forgive my ignorance on all matters dnsmasq/ipv6 - I'm coming to this
from the 'desktop' end and I've not got much experience here.

We noticed in Ubuntu that our NetworkManager testsuite started failing
once we had copied dnsmasq 2.80-1 from unstable. What this testsuite
does (you can read the code at [1] if you're interested) is start
dnsmasq in the given mode (supplied by the testcase), create some fake
interfaces, and verify that the interfaces come up properly.

The testcase "test_auto_ip6_raonly_no_pe" started failing.  What we see
is that the test times out, and when that happens the dnsmasq log file
is filled with

  dnsmasq-dhcp[2010]: RTR-ADVERT(veth42) 2600::

repeating over and over. You can view a log file including all the
dnsmasq log entries at [2] - it's huge though, so I suggest downloading
it and using a real text editor instead of your browser.

This test starts dnsmasq in ra-only mode, and that seems to be the case
that's broken. Since this started happening in a new version, I was able
to bisect 2.79 to 2.80, and I found that commit
0a496f059c1e9d75c33cce4c1211d58422ba4f62 is the first bad commit.
Indeed, reverting that on master causes the NM testsuite to start
passing again.

Can anyone help shed some light please? Are we looking at a dnsmasq bug,
or something that our testsutie is doing wrong? If any more information
would be helpful then I'm more than happy to provide it.

Cheers all,

-- 
Iain Lane  [ i...@orangesquash.org.uk ]
Debian Developer   [ la...@debian.org ]
Ubuntu Developer   [ la...@ubuntu.com ]

[1] https://git.launchpad.net/network-manager/tree/debian/tests?h=disco
- nm.py & network_test_base.py
[2] 
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-disco/disco/amd64/n/network-manager/20181218_194950_de909@/log.gz


signature.asc
Description: PGP signature
___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss


Re: [Dnsmasq-discuss] Infinite(?) RTR-ADVERTs being sent out [in Ubuntu NetworkManager testuite]

2018-12-20 Thread Iain Lane
On Thu, Dec 20, 2018 at 01:06:30PM +, Iain Lane wrote:
>   dnsmasq-dhcp[2010]: RTR-ADVERT(veth42) 2600::
> 
> repeating over and over. You can view a log file including all the
> dnsmasq log entries at [2] - it's huge though, so I suggest downloading
> it and using a real text editor instead of your browser.
> 
> This test starts dnsmasq in ra-only mode, and that seems to be the case
> that's broken. Since this started happening in a new version, I was able
> to bisect 2.79 to 2.80, and I found that commit
> 0a496f059c1e9d75c33cce4c1211d58422ba4f62 is the first bad commit.
> Indeed, reverting that on master causes the NM testsuite to start
> passing again.

I attached gdb to dnsmasq while it was doing this. Here's the stack
trace of what was going on at the time:

(gdb) bt
#0  0x7fb1b5a7cfd4 in __GI___libc_write (fd=10, buf=0x55c255462ce8, 
nbytes=62) at ../sysdeps/unix/sysv/linux/write.c:26
#1  0x55c253c4fb94 in log_write () at log.c:179
#2  0x55c253c502fd in my_syslog (priority=6, format=0x55c253c77e39 
"RTR-ADVERT(%s) %s") at log.c:389
#3  0x55c253c5de2e in add_prefixes (local=0x55c25545d65c, prefix=64, 
scope=0, if_index=13, flags=4, preferred=3600, valid=3600, 
vparam=0x7ffcd1e77ce0) at radv.c:725
#4  0x55c253c493a0 in iface_enumerate (family=10, parm=0x7ffcd1e77ce0, 
callback=0x55c253c5d829 ) at netlink.c:268
#5  0x55c253c5cad4 in send_ra_alias (now=1545314006, iface=13, 
iface_name=0x7ffcd1e77e1c "veth42", dest=0x0, send_iface=13)
at radv.c:291
#6  0x55c253c5d826 in send_ra (now=1545314006, iface=13, 
iface_name=0x7ffcd1e77e1c "veth42", dest=0x0) at radv.c:553
#7  0x55c253c5e0f8 in periodic_ra (now=1545314006) at radv.c:808
#8  0x55c253c3e256 in lease_update_file (now=1545314006) at lease.c:355
#9  0x55c253c52e7e in dhcp_construct_contexts (now=1545314006) at 
dhcp6.c:789
#10 0x55c253c36812 in newaddress (now=1545314006) at network.c:1721
#11 0x55c253c39902 in async_event (pipe=8, now=1545314006) at dnsmasq.c:1413
#12 0x55c253c38e71 in main (argc=11, argv=0x7ffcd1e78258) at dnsmasq.c:1084

There's a few places you can get an async_event of EVENT_NEWADDR. Some
further grubbing around shows that it's coming from iface_enumerate() ->
nl_async() -> async_event(EVENT_NEWADDR).

You'll see that iface_enumerate() is frame #4 in that trace too, so this
smells like it might be a source of infinite recursion to me: every time
we call add_prefixes() we also queue another call of the same thing -
the new code is to blame via dhcp_construct_contexts() calling
construct_worker() to enter the new block setting param->newone = 1,
which is checked in dhcp_construct_options().

What's not clear to me is how best to cut this off. If we only called
the new code one time that would probably solve it...

Cheers,

-- 
Iain Lane  [ i...@orangesquash.org.uk ]
Debian Developer   [ la...@debian.org ]
Ubuntu Developer   [ la...@ubuntu.com ]


signature.asc
Description: PGP signature
___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss


Re: [Dnsmasq-discuss] Patch to cache SRV records - updated version (#3)

2018-12-20 Thread Donald Muller
I think this should be added to the code maybe with an option in the config 
file to turn on the caching of these records.

> -Original Message-
> From: Dnsmasq-discuss 
> On Behalf Of Jeremy Allison
> Sent: Thursday, December 20, 2018 4:39 PM
> To: dnsmasq-discuss@lists.thekelleys.org.uk
> Subject: Re: [Dnsmasq-discuss] Patch to cache SRV records - updated version
> (#3)
> 
> On 12/20/2018 12:20 PM, Jeremy Allison wrote:
> > On Thu, 20 Dec 2018 11:53:11 -0800
> >
> > Jeremy Allison  wrote:
> >
> >> I know dnsmasq doesn't cache SRV records by design:
> >>
> >> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=579536;msg=9
> >>
> >> However, when used with Samba code (winbindd) and
> >> other Windows-integrating technology (MIT krb5)
> >> there are a lot of SRV record queries that make
> >> the whole integration stack slow if SRV records
> >> are not cached.
> >>
> >> With that in mind, here is a patch to cache
> >> SRV records positively and negatively inside
> >> dnsmasq.
> >>
> >> I'm sending here it so that people who might need
> >> this functionality have a central place to find
> >> it (it might end up being used in ChromeOS, depending
> >> on test results / review).
> >
> > Sigh. Found a bug when testing. free_mx_srv_record()
> > wasn't checking for NULL pointers on free(),
> > which can be the case for negative cache
> > records.
> 
> Third time is the charm :-). Remember to NULL
> out free'd and uninitialized pointers and
> structrures, and remove the F_SRV flag on deleting
> the cache entry.
> 
> Hopefully this is the last iteration of this.
> I can't see any more issues to address, but
> I'm still testing :-).
> 
> Jeremy.
___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss


Re: [Dnsmasq-discuss] Patch to cache SRV records - updated version (#3)

2018-12-20 Thread Simon Kelley
This is worth having for the removal of the archaic 16-bit limit on the
flags field in a cache record alone. I've been meaning to tackle that
for some time.

This time of year either frees up lots of time for coding, or yields
none at all, and for me it's the later, but I will go through this and
merge it in the new year.


Cheers,

Simon.


On 20/12/2018 21:38, Jeremy Allison wrote:
> On 12/20/2018 12:20 PM, Jeremy Allison wrote:
>> On Thu, 20 Dec 2018 11:53:11 -0800
>>
>> Jeremy Allison  wrote:
>>
>>> I know dnsmasq doesn't cache SRV records by design:
>>>
>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=579536;msg=9
>>>
>>> However, when used with Samba code (winbindd) and
>>> other Windows-integrating technology (MIT krb5)
>>> there are a lot of SRV record queries that make
>>> the whole integration stack slow if SRV records
>>> are not cached.
>>>
>>> With that in mind, here is a patch to cache
>>> SRV records positively and negatively inside
>>> dnsmasq.
>>>
>>> I'm sending here it so that people who might need
>>> this functionality have a central place to find
>>> it (it might end up being used in ChromeOS, depending
>>> on test results / review).
>>
>> Sigh. Found a bug when testing. free_mx_srv_record()
>> wasn't checking for NULL pointers on free(),
>> which can be the case for negative cache
>> records.
> 
> Third time is the charm :-). Remember to NULL
> out free'd and uninitialized pointers and
> structrures, and remove the F_SRV flag on deleting
> the cache entry.
> 
> Hopefully this is the last iteration of this.
> I can't see any more issues to address, but
> I'm still testing :-).
> 
> Jeremy.
> 
> 
> ___
> Dnsmasq-discuss mailing list
> Dnsmasq-discuss@lists.thekelleys.org.uk
> http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
> 



signature.asc
Description: OpenPGP digital signature
___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss


Re: [Dnsmasq-discuss] Infinite(?) RTR-ADVERTs being sent out [in Ubuntu NetworkManager testuite]

2018-12-20 Thread Simon Kelley
Patch below is untested because I'm away from my test rig, but it would
seem to do the right thing, ie set template->if_index _before_ calling
ra_start_unsolicited() so that if we re-enter here via an async event it
doesn't get called again.

If you could test it in your harness, that would be great :)

Cheers,

Simon.



diff --git a/src/dhcp6.c b/src/dhcp6.c
index 3932cc7..4082504 100644
--- a/src/dhcp6.c
+++ b/src/dhcp6.c
@@ -650,12 +650,11 @@ static int construct_worker(struct in6_addr
*local, int prefix,
/* First time found, do fast RA. */
if (template->if_index != if_index ||
!IN6_ARE_ADDR_EQUAL(>local6, local))
  {
+   template->if_index = if_index;
+   template->local6 = *local;
ra_start_unsolicited(param->now, template);
param->newone = 1;
  }
-
-   template->if_index = if_index;
-   template->local6 = *local;
  }

   }


On 20/12/2018 17:21, Iain Lane wrote:
> On Thu, Dec 20, 2018 at 01:06:30PM +, Iain Lane wrote:
>>   dnsmasq-dhcp[2010]: RTR-ADVERT(veth42) 2600::
>>
>> repeating over and over. You can view a log file including all the
>> dnsmasq log entries at [2] - it's huge though, so I suggest downloading
>> it and using a real text editor instead of your browser.
>>
>> This test starts dnsmasq in ra-only mode, and that seems to be the case
>> that's broken. Since this started happening in a new version, I was able
>> to bisect 2.79 to 2.80, and I found that commit
>> 0a496f059c1e9d75c33cce4c1211d58422ba4f62 is the first bad commit.
>> Indeed, reverting that on master causes the NM testsuite to start
>> passing again.
> 
> I attached gdb to dnsmasq while it was doing this. Here's the stack
> trace of what was going on at the time:
> 
> (gdb) bt
> #0  0x7fb1b5a7cfd4 in __GI___libc_write (fd=10, buf=0x55c255462ce8, 
> nbytes=62) at ../sysdeps/unix/sysv/linux/write.c:26
> #1  0x55c253c4fb94 in log_write () at log.c:179
> #2  0x55c253c502fd in my_syslog (priority=6, format=0x55c253c77e39 
> "RTR-ADVERT(%s) %s") at log.c:389
> #3  0x55c253c5de2e in add_prefixes (local=0x55c25545d65c, prefix=64, 
> scope=0, if_index=13, flags=4, preferred=3600, valid=3600, 
> vparam=0x7ffcd1e77ce0) at radv.c:725
> #4  0x55c253c493a0 in iface_enumerate (family=10, parm=0x7ffcd1e77ce0, 
> callback=0x55c253c5d829 ) at netlink.c:268
> #5  0x55c253c5cad4 in send_ra_alias (now=1545314006, iface=13, 
> iface_name=0x7ffcd1e77e1c "veth42", dest=0x0, send_iface=13)
> at radv.c:291
> #6  0x55c253c5d826 in send_ra (now=1545314006, iface=13, 
> iface_name=0x7ffcd1e77e1c "veth42", dest=0x0) at radv.c:553
> #7  0x55c253c5e0f8 in periodic_ra (now=1545314006) at radv.c:808
> #8  0x55c253c3e256 in lease_update_file (now=1545314006) at lease.c:355
> #9  0x55c253c52e7e in dhcp_construct_contexts (now=1545314006) at 
> dhcp6.c:789
> #10 0x55c253c36812 in newaddress (now=1545314006) at network.c:1721
> #11 0x55c253c39902 in async_event (pipe=8, now=1545314006) at 
> dnsmasq.c:1413
> #12 0x55c253c38e71 in main (argc=11, argv=0x7ffcd1e78258) at 
> dnsmasq.c:1084
> 
> There's a few places you can get an async_event of EVENT_NEWADDR. Some
> further grubbing around shows that it's coming from iface_enumerate() ->
> nl_async() -> async_event(EVENT_NEWADDR).
> 
> You'll see that iface_enumerate() is frame #4 in that trace too, so this
> smells like it might be a source of infinite recursion to me: every time
> we call add_prefixes() we also queue another call of the same thing -
> the new code is to blame via dhcp_construct_contexts() calling
> construct_worker() to enter the new block setting param->newone = 1,
> which is checked in dhcp_construct_options().
> 
> What's not clear to me is how best to cut this off. If we only called
> the new code one time that would probably solve it...
> 
> Cheers,
> 
> 
> ___
> Dnsmasq-discuss mailing list
> Dnsmasq-discuss@lists.thekelleys.org.uk
> http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
> 



signature.asc
Description: OpenPGP digital signature
___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss


Re: [Dnsmasq-discuss] Patch to cache SRV records - updated version (#3)

2018-12-20 Thread Jeremy Allison
On 12/20/2018 12:20 PM, Jeremy Allison wrote:
> On Thu, 20 Dec 2018 11:53:11 -0800
> 
> Jeremy Allison  wrote:
> 
>> I know dnsmasq doesn't cache SRV records by design:
>>
>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=579536;msg=9
>>
>> However, when used with Samba code (winbindd) and
>> other Windows-integrating technology (MIT krb5)
>> there are a lot of SRV record queries that make
>> the whole integration stack slow if SRV records
>> are not cached.
>>
>> With that in mind, here is a patch to cache
>> SRV records positively and negatively inside
>> dnsmasq.
>>
>> I'm sending here it so that people who might need
>> this functionality have a central place to find
>> it (it might end up being used in ChromeOS, depending
>> on test results / review).
> 
> Sigh. Found a bug when testing. free_mx_srv_record()
> wasn't checking for NULL pointers on free(),
> which can be the case for negative cache
> records.

Third time is the charm :-). Remember to NULL
out free'd and uninitialized pointers and
structrures, and remove the F_SRV flag on deleting
the cache entry.

Hopefully this is the last iteration of this.
I can't see any more issues to address, but
I'm still testing :-).

Jeremy.
From d9758e1b340ceb7a46c965a53dc040193d80956d Mon Sep 17 00:00:00 2001
From: Jeremy Allison 
Date: Tue, 18 Dec 2018 10:44:08 -0800
Subject: [PATCH 1/6] Widen 'flags' field in cache to unsigned int from
 unsigned short.

Ensure we still only cache unsigned short bits (0x).

Signed-off-by: Jeremy Allison 
---
 src/cache.c   | 23 ++-
 src/dnsmasq.h |  6 --
 src/rfc1035.c |  2 +-
 3 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/src/cache.c b/src/cache.c
index 8662f54..cd72a0c 100644
--- a/src/cache.c
+++ b/src/cache.c
@@ -27,7 +27,7 @@ static int bignames_left, hash_size;
 
 static void make_non_terminals(struct crec *source);
 static struct crec *really_insert(char *name, struct all_addr *addr, 
- time_t now,  unsigned long ttl, unsigned 
short flags);
+ time_t now,  unsigned long ttl, unsigned int 
flags);
 
 /* type->string mapping: this is also used by the name-hash function as a 
mixing table. */
 static const struct {
@@ -330,7 +330,7 @@ static int is_expired(time_t now, struct crec *crecp)
   return 1;
 }
 
-static struct crec *cache_scan_free(char *name, struct all_addr *addr, time_t 
now, unsigned short flags,
+static struct crec *cache_scan_free(char *name, struct all_addr *addr, time_t 
now, unsigned int flags,
struct crec **target_crec, unsigned int 
*target_uid)
 {
   /* Scan and remove old entries.
@@ -465,7 +465,7 @@ void cache_start_insert(void)
 }
 
 struct crec *cache_insert(char *name, struct all_addr *addr, 
- time_t now,  unsigned long ttl, unsigned short flags)
+ time_t now,  unsigned long ttl, unsigned int flags)
 {
   /* Don't log DNSSEC records here, done elsewhere */
   if (flags & (F_IPV4 | F_IPV6 | F_CNAME))
@@ -483,7 +483,7 @@ struct crec *cache_insert(char *name, struct all_addr *addr,
 
 
 static struct crec *really_insert(char *name, struct all_addr *addr, 
- time_t now,  unsigned long ttl, unsigned 
short flags)
+ time_t now,  unsigned long ttl, unsigned int 
flags)
 {
   struct crec *new, *target_crec = NULL;
   union bigname *big_name = NULL;
@@ -495,6 +495,11 @@ static struct crec *really_insert(char *name, struct 
all_addr *addr,
   if (insert_error)
 return NULL;
   
+  /*
+   * Ensure we only cache unsigned short bits.
+   */
+  flags &= DNSMASQ_CACHE_MASK;
+
   /* First remove any expired entries and entries for the name/address we
  are currently inserting. */
   if ((new = cache_scan_free(name, addr, now, flags, _crec, 
_uid)))
@@ -656,7 +661,7 @@ void cache_end_insert(void)
{
  char *name = cache_get_name(new_chain);
  ssize_t m = strlen(name);
- unsigned short flags = new_chain->flags;
+ unsigned int flags = new_chain->flags;
 #ifdef HAVE_DNSSEC
  u16 class = new_chain->uid;
 #endif
@@ -717,7 +722,7 @@ int cache_recv_insert(time_t now, int fd)
   struct all_addr addr;
   unsigned long ttl;
   time_t ttd;
-  unsigned short flags;
+  unsigned int flags;
   struct crec *crecp = NULL;
   
   cache_start_insert();
@@ -856,7 +861,7 @@ struct crec *cache_find_by_name(struct crec *crecp, char 
*name, time_t now, unsi
   /* first search, look for relevant entries and push to top of list
 also free anything which has expired */
   struct crec *next, **up, **insert = NULL, **chainp = 
-  unsigned short ins_flags = 0;
+  unsigned int ins_flags = 0;
   
   for (up = hash_bucket(name), crecp = *up; crecp; crecp = next)
{
@@ -1140,7 +1145,7 @@ int read_hostsfile(char *filename, unsigned int index, 
int cache_size, struct cr
  

Re: [Dnsmasq-discuss] Patch to cache SRV records - updated version.

2018-12-20 Thread Paul Smith
On Thu, 2018-12-20 at 12:20 -0800, Jeremy Allison wrote:
> Sigh. Found a bug when testing. free_mx_srv_record()
> wasn't checking for NULL pointers on free(),
> which can be the case for negative cache
> records.

C has required free(NULL) to be a no-op since ANSI C 1989/ISO C 1990...
Is dnsmasq still targetting C runtimes that don't adhere to that
standard?

Just curious...


___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss