[Dnsmasq-discuss] Infinite(?) RTR-ADVERTs being sent out [in Ubuntu NetworkManager testuite]
Hi there, Forgive my ignorance on all matters dnsmasq/ipv6 - I'm coming to this from the 'desktop' end and I've not got much experience here. We noticed in Ubuntu that our NetworkManager testsuite started failing once we had copied dnsmasq 2.80-1 from unstable. What this testsuite does (you can read the code at [1] if you're interested) is start dnsmasq in the given mode (supplied by the testcase), create some fake interfaces, and verify that the interfaces come up properly. The testcase "test_auto_ip6_raonly_no_pe" started failing. What we see is that the test times out, and when that happens the dnsmasq log file is filled with dnsmasq-dhcp[2010]: RTR-ADVERT(veth42) 2600:: repeating over and over. You can view a log file including all the dnsmasq log entries at [2] - it's huge though, so I suggest downloading it and using a real text editor instead of your browser. This test starts dnsmasq in ra-only mode, and that seems to be the case that's broken. Since this started happening in a new version, I was able to bisect 2.79 to 2.80, and I found that commit 0a496f059c1e9d75c33cce4c1211d58422ba4f62 is the first bad commit. Indeed, reverting that on master causes the NM testsuite to start passing again. Can anyone help shed some light please? Are we looking at a dnsmasq bug, or something that our testsutie is doing wrong? If any more information would be helpful then I'm more than happy to provide it. Cheers all, -- Iain Lane [ i...@orangesquash.org.uk ] Debian Developer [ la...@debian.org ] Ubuntu Developer [ la...@ubuntu.com ] [1] https://git.launchpad.net/network-manager/tree/debian/tests?h=disco - nm.py & network_test_base.py [2] https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-disco/disco/amd64/n/network-manager/20181218_194950_de909@/log.gz signature.asc Description: PGP signature ___ Dnsmasq-discuss mailing list Dnsmasq-discuss@lists.thekelleys.org.uk http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
Re: [Dnsmasq-discuss] Infinite(?) RTR-ADVERTs being sent out [in Ubuntu NetworkManager testuite]
On Thu, Dec 20, 2018 at 01:06:30PM +, Iain Lane wrote: > dnsmasq-dhcp[2010]: RTR-ADVERT(veth42) 2600:: > > repeating over and over. You can view a log file including all the > dnsmasq log entries at [2] - it's huge though, so I suggest downloading > it and using a real text editor instead of your browser. > > This test starts dnsmasq in ra-only mode, and that seems to be the case > that's broken. Since this started happening in a new version, I was able > to bisect 2.79 to 2.80, and I found that commit > 0a496f059c1e9d75c33cce4c1211d58422ba4f62 is the first bad commit. > Indeed, reverting that on master causes the NM testsuite to start > passing again. I attached gdb to dnsmasq while it was doing this. Here's the stack trace of what was going on at the time: (gdb) bt #0 0x7fb1b5a7cfd4 in __GI___libc_write (fd=10, buf=0x55c255462ce8, nbytes=62) at ../sysdeps/unix/sysv/linux/write.c:26 #1 0x55c253c4fb94 in log_write () at log.c:179 #2 0x55c253c502fd in my_syslog (priority=6, format=0x55c253c77e39 "RTR-ADVERT(%s) %s") at log.c:389 #3 0x55c253c5de2e in add_prefixes (local=0x55c25545d65c, prefix=64, scope=0, if_index=13, flags=4, preferred=3600, valid=3600, vparam=0x7ffcd1e77ce0) at radv.c:725 #4 0x55c253c493a0 in iface_enumerate (family=10, parm=0x7ffcd1e77ce0, callback=0x55c253c5d829 ) at netlink.c:268 #5 0x55c253c5cad4 in send_ra_alias (now=1545314006, iface=13, iface_name=0x7ffcd1e77e1c "veth42", dest=0x0, send_iface=13) at radv.c:291 #6 0x55c253c5d826 in send_ra (now=1545314006, iface=13, iface_name=0x7ffcd1e77e1c "veth42", dest=0x0) at radv.c:553 #7 0x55c253c5e0f8 in periodic_ra (now=1545314006) at radv.c:808 #8 0x55c253c3e256 in lease_update_file (now=1545314006) at lease.c:355 #9 0x55c253c52e7e in dhcp_construct_contexts (now=1545314006) at dhcp6.c:789 #10 0x55c253c36812 in newaddress (now=1545314006) at network.c:1721 #11 0x55c253c39902 in async_event (pipe=8, now=1545314006) at dnsmasq.c:1413 #12 0x55c253c38e71 in main (argc=11, argv=0x7ffcd1e78258) at dnsmasq.c:1084 There's a few places you can get an async_event of EVENT_NEWADDR. Some further grubbing around shows that it's coming from iface_enumerate() -> nl_async() -> async_event(EVENT_NEWADDR). You'll see that iface_enumerate() is frame #4 in that trace too, so this smells like it might be a source of infinite recursion to me: every time we call add_prefixes() we also queue another call of the same thing - the new code is to blame via dhcp_construct_contexts() calling construct_worker() to enter the new block setting param->newone = 1, which is checked in dhcp_construct_options(). What's not clear to me is how best to cut this off. If we only called the new code one time that would probably solve it... Cheers, -- Iain Lane [ i...@orangesquash.org.uk ] Debian Developer [ la...@debian.org ] Ubuntu Developer [ la...@ubuntu.com ] signature.asc Description: PGP signature ___ Dnsmasq-discuss mailing list Dnsmasq-discuss@lists.thekelleys.org.uk http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
Re: [Dnsmasq-discuss] Patch to cache SRV records - updated version (#3)
I think this should be added to the code maybe with an option in the config file to turn on the caching of these records. > -Original Message- > From: Dnsmasq-discuss > On Behalf Of Jeremy Allison > Sent: Thursday, December 20, 2018 4:39 PM > To: dnsmasq-discuss@lists.thekelleys.org.uk > Subject: Re: [Dnsmasq-discuss] Patch to cache SRV records - updated version > (#3) > > On 12/20/2018 12:20 PM, Jeremy Allison wrote: > > On Thu, 20 Dec 2018 11:53:11 -0800 > > > > Jeremy Allison wrote: > > > >> I know dnsmasq doesn't cache SRV records by design: > >> > >> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=579536;msg=9 > >> > >> However, when used with Samba code (winbindd) and > >> other Windows-integrating technology (MIT krb5) > >> there are a lot of SRV record queries that make > >> the whole integration stack slow if SRV records > >> are not cached. > >> > >> With that in mind, here is a patch to cache > >> SRV records positively and negatively inside > >> dnsmasq. > >> > >> I'm sending here it so that people who might need > >> this functionality have a central place to find > >> it (it might end up being used in ChromeOS, depending > >> on test results / review). > > > > Sigh. Found a bug when testing. free_mx_srv_record() > > wasn't checking for NULL pointers on free(), > > which can be the case for negative cache > > records. > > Third time is the charm :-). Remember to NULL > out free'd and uninitialized pointers and > structrures, and remove the F_SRV flag on deleting > the cache entry. > > Hopefully this is the last iteration of this. > I can't see any more issues to address, but > I'm still testing :-). > > Jeremy. ___ Dnsmasq-discuss mailing list Dnsmasq-discuss@lists.thekelleys.org.uk http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
Re: [Dnsmasq-discuss] Patch to cache SRV records - updated version (#3)
This is worth having for the removal of the archaic 16-bit limit on the flags field in a cache record alone. I've been meaning to tackle that for some time. This time of year either frees up lots of time for coding, or yields none at all, and for me it's the later, but I will go through this and merge it in the new year. Cheers, Simon. On 20/12/2018 21:38, Jeremy Allison wrote: > On 12/20/2018 12:20 PM, Jeremy Allison wrote: >> On Thu, 20 Dec 2018 11:53:11 -0800 >> >> Jeremy Allison wrote: >> >>> I know dnsmasq doesn't cache SRV records by design: >>> >>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=579536;msg=9 >>> >>> However, when used with Samba code (winbindd) and >>> other Windows-integrating technology (MIT krb5) >>> there are a lot of SRV record queries that make >>> the whole integration stack slow if SRV records >>> are not cached. >>> >>> With that in mind, here is a patch to cache >>> SRV records positively and negatively inside >>> dnsmasq. >>> >>> I'm sending here it so that people who might need >>> this functionality have a central place to find >>> it (it might end up being used in ChromeOS, depending >>> on test results / review). >> >> Sigh. Found a bug when testing. free_mx_srv_record() >> wasn't checking for NULL pointers on free(), >> which can be the case for negative cache >> records. > > Third time is the charm :-). Remember to NULL > out free'd and uninitialized pointers and > structrures, and remove the F_SRV flag on deleting > the cache entry. > > Hopefully this is the last iteration of this. > I can't see any more issues to address, but > I'm still testing :-). > > Jeremy. > > > ___ > Dnsmasq-discuss mailing list > Dnsmasq-discuss@lists.thekelleys.org.uk > http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss > signature.asc Description: OpenPGP digital signature ___ Dnsmasq-discuss mailing list Dnsmasq-discuss@lists.thekelleys.org.uk http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
Re: [Dnsmasq-discuss] Infinite(?) RTR-ADVERTs being sent out [in Ubuntu NetworkManager testuite]
Patch below is untested because I'm away from my test rig, but it would seem to do the right thing, ie set template->if_index _before_ calling ra_start_unsolicited() so that if we re-enter here via an async event it doesn't get called again. If you could test it in your harness, that would be great :) Cheers, Simon. diff --git a/src/dhcp6.c b/src/dhcp6.c index 3932cc7..4082504 100644 --- a/src/dhcp6.c +++ b/src/dhcp6.c @@ -650,12 +650,11 @@ static int construct_worker(struct in6_addr *local, int prefix, /* First time found, do fast RA. */ if (template->if_index != if_index || !IN6_ARE_ADDR_EQUAL(>local6, local)) { + template->if_index = if_index; + template->local6 = *local; ra_start_unsolicited(param->now, template); param->newone = 1; } - - template->if_index = if_index; - template->local6 = *local; } } On 20/12/2018 17:21, Iain Lane wrote: > On Thu, Dec 20, 2018 at 01:06:30PM +, Iain Lane wrote: >> dnsmasq-dhcp[2010]: RTR-ADVERT(veth42) 2600:: >> >> repeating over and over. You can view a log file including all the >> dnsmasq log entries at [2] - it's huge though, so I suggest downloading >> it and using a real text editor instead of your browser. >> >> This test starts dnsmasq in ra-only mode, and that seems to be the case >> that's broken. Since this started happening in a new version, I was able >> to bisect 2.79 to 2.80, and I found that commit >> 0a496f059c1e9d75c33cce4c1211d58422ba4f62 is the first bad commit. >> Indeed, reverting that on master causes the NM testsuite to start >> passing again. > > I attached gdb to dnsmasq while it was doing this. Here's the stack > trace of what was going on at the time: > > (gdb) bt > #0 0x7fb1b5a7cfd4 in __GI___libc_write (fd=10, buf=0x55c255462ce8, > nbytes=62) at ../sysdeps/unix/sysv/linux/write.c:26 > #1 0x55c253c4fb94 in log_write () at log.c:179 > #2 0x55c253c502fd in my_syslog (priority=6, format=0x55c253c77e39 > "RTR-ADVERT(%s) %s") at log.c:389 > #3 0x55c253c5de2e in add_prefixes (local=0x55c25545d65c, prefix=64, > scope=0, if_index=13, flags=4, preferred=3600, valid=3600, > vparam=0x7ffcd1e77ce0) at radv.c:725 > #4 0x55c253c493a0 in iface_enumerate (family=10, parm=0x7ffcd1e77ce0, > callback=0x55c253c5d829 ) at netlink.c:268 > #5 0x55c253c5cad4 in send_ra_alias (now=1545314006, iface=13, > iface_name=0x7ffcd1e77e1c "veth42", dest=0x0, send_iface=13) > at radv.c:291 > #6 0x55c253c5d826 in send_ra (now=1545314006, iface=13, > iface_name=0x7ffcd1e77e1c "veth42", dest=0x0) at radv.c:553 > #7 0x55c253c5e0f8 in periodic_ra (now=1545314006) at radv.c:808 > #8 0x55c253c3e256 in lease_update_file (now=1545314006) at lease.c:355 > #9 0x55c253c52e7e in dhcp_construct_contexts (now=1545314006) at > dhcp6.c:789 > #10 0x55c253c36812 in newaddress (now=1545314006) at network.c:1721 > #11 0x55c253c39902 in async_event (pipe=8, now=1545314006) at > dnsmasq.c:1413 > #12 0x55c253c38e71 in main (argc=11, argv=0x7ffcd1e78258) at > dnsmasq.c:1084 > > There's a few places you can get an async_event of EVENT_NEWADDR. Some > further grubbing around shows that it's coming from iface_enumerate() -> > nl_async() -> async_event(EVENT_NEWADDR). > > You'll see that iface_enumerate() is frame #4 in that trace too, so this > smells like it might be a source of infinite recursion to me: every time > we call add_prefixes() we also queue another call of the same thing - > the new code is to blame via dhcp_construct_contexts() calling > construct_worker() to enter the new block setting param->newone = 1, > which is checked in dhcp_construct_options(). > > What's not clear to me is how best to cut this off. If we only called > the new code one time that would probably solve it... > > Cheers, > > > ___ > Dnsmasq-discuss mailing list > Dnsmasq-discuss@lists.thekelleys.org.uk > http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss > signature.asc Description: OpenPGP digital signature ___ Dnsmasq-discuss mailing list Dnsmasq-discuss@lists.thekelleys.org.uk http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
Re: [Dnsmasq-discuss] Patch to cache SRV records - updated version (#3)
On 12/20/2018 12:20 PM, Jeremy Allison wrote: > On Thu, 20 Dec 2018 11:53:11 -0800 > > Jeremy Allison wrote: > >> I know dnsmasq doesn't cache SRV records by design: >> >> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=579536;msg=9 >> >> However, when used with Samba code (winbindd) and >> other Windows-integrating technology (MIT krb5) >> there are a lot of SRV record queries that make >> the whole integration stack slow if SRV records >> are not cached. >> >> With that in mind, here is a patch to cache >> SRV records positively and negatively inside >> dnsmasq. >> >> I'm sending here it so that people who might need >> this functionality have a central place to find >> it (it might end up being used in ChromeOS, depending >> on test results / review). > > Sigh. Found a bug when testing. free_mx_srv_record() > wasn't checking for NULL pointers on free(), > which can be the case for negative cache > records. Third time is the charm :-). Remember to NULL out free'd and uninitialized pointers and structrures, and remove the F_SRV flag on deleting the cache entry. Hopefully this is the last iteration of this. I can't see any more issues to address, but I'm still testing :-). Jeremy. From d9758e1b340ceb7a46c965a53dc040193d80956d Mon Sep 17 00:00:00 2001 From: Jeremy Allison Date: Tue, 18 Dec 2018 10:44:08 -0800 Subject: [PATCH 1/6] Widen 'flags' field in cache to unsigned int from unsigned short. Ensure we still only cache unsigned short bits (0x). Signed-off-by: Jeremy Allison --- src/cache.c | 23 ++- src/dnsmasq.h | 6 -- src/rfc1035.c | 2 +- 3 files changed, 19 insertions(+), 12 deletions(-) diff --git a/src/cache.c b/src/cache.c index 8662f54..cd72a0c 100644 --- a/src/cache.c +++ b/src/cache.c @@ -27,7 +27,7 @@ static int bignames_left, hash_size; static void make_non_terminals(struct crec *source); static struct crec *really_insert(char *name, struct all_addr *addr, - time_t now, unsigned long ttl, unsigned short flags); + time_t now, unsigned long ttl, unsigned int flags); /* type->string mapping: this is also used by the name-hash function as a mixing table. */ static const struct { @@ -330,7 +330,7 @@ static int is_expired(time_t now, struct crec *crecp) return 1; } -static struct crec *cache_scan_free(char *name, struct all_addr *addr, time_t now, unsigned short flags, +static struct crec *cache_scan_free(char *name, struct all_addr *addr, time_t now, unsigned int flags, struct crec **target_crec, unsigned int *target_uid) { /* Scan and remove old entries. @@ -465,7 +465,7 @@ void cache_start_insert(void) } struct crec *cache_insert(char *name, struct all_addr *addr, - time_t now, unsigned long ttl, unsigned short flags) + time_t now, unsigned long ttl, unsigned int flags) { /* Don't log DNSSEC records here, done elsewhere */ if (flags & (F_IPV4 | F_IPV6 | F_CNAME)) @@ -483,7 +483,7 @@ struct crec *cache_insert(char *name, struct all_addr *addr, static struct crec *really_insert(char *name, struct all_addr *addr, - time_t now, unsigned long ttl, unsigned short flags) + time_t now, unsigned long ttl, unsigned int flags) { struct crec *new, *target_crec = NULL; union bigname *big_name = NULL; @@ -495,6 +495,11 @@ static struct crec *really_insert(char *name, struct all_addr *addr, if (insert_error) return NULL; + /* + * Ensure we only cache unsigned short bits. + */ + flags &= DNSMASQ_CACHE_MASK; + /* First remove any expired entries and entries for the name/address we are currently inserting. */ if ((new = cache_scan_free(name, addr, now, flags, _crec, _uid))) @@ -656,7 +661,7 @@ void cache_end_insert(void) { char *name = cache_get_name(new_chain); ssize_t m = strlen(name); - unsigned short flags = new_chain->flags; + unsigned int flags = new_chain->flags; #ifdef HAVE_DNSSEC u16 class = new_chain->uid; #endif @@ -717,7 +722,7 @@ int cache_recv_insert(time_t now, int fd) struct all_addr addr; unsigned long ttl; time_t ttd; - unsigned short flags; + unsigned int flags; struct crec *crecp = NULL; cache_start_insert(); @@ -856,7 +861,7 @@ struct crec *cache_find_by_name(struct crec *crecp, char *name, time_t now, unsi /* first search, look for relevant entries and push to top of list also free anything which has expired */ struct crec *next, **up, **insert = NULL, **chainp = - unsigned short ins_flags = 0; + unsigned int ins_flags = 0; for (up = hash_bucket(name), crecp = *up; crecp; crecp = next) { @@ -1140,7 +1145,7 @@ int read_hostsfile(char *filename, unsigned int index, int cache_size, struct cr
Re: [Dnsmasq-discuss] Patch to cache SRV records - updated version.
On Thu, 2018-12-20 at 12:20 -0800, Jeremy Allison wrote: > Sigh. Found a bug when testing. free_mx_srv_record() > wasn't checking for NULL pointers on free(), > which can be the case for negative cache > records. C has required free(NULL) to be a no-op since ANSI C 1989/ISO C 1990... Is dnsmasq still targetting C runtimes that don't adhere to that standard? Just curious... ___ Dnsmasq-discuss mailing list Dnsmasq-discuss@lists.thekelleys.org.uk http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss