Re: [Dnsmasq-discuss] Incorrect response for DNAME'd records in dnsmasq 2.80+
On Mon, Sep 14, 2020 at 10:41:32PM +0200, Geert Stappers wrote:
> On Mon, Sep 14, 2020 at 11:23:44AM -0700, James Brown wrote:
> > That is fantastic, Dominick!
> >
> > I'm testing now, but in preliminary testing, this patch appears to fix the
> > DNAME issue for me.
>
> OK.
> Acknowledge.
>
> Thursday night (CEST, UTC+2) I'll retransmit the patch + "Tested-by"

Done, Message-Id: 1600372552-8489-1-git-send-email-stapp...@alpaca.gpm.stappers.nl

___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
Re: [Dnsmasq-discuss] Incorrect response for DNAME'd records in dnsmasq 2.80+
On Mon, Sep 14, 2020 at 11:23:44AM -0700, James Brown wrote:
> That is fantastic, Dominick!
>
> I'm testing now, but in preliminary testing, this patch appears to fix the
> DNAME issue for me.

OK.
Acknowledge.

Thursday night (CEST, UTC+2) I'll retransmit the patch + "Tested-by"
Re: [Dnsmasq-discuss] Incorrect response for DNAME'd records in dnsmasq 2.80+
That is fantastic, Dominick!

I'm testing now, but in preliminary testing, this patch appears to fix the DNAME issue for me.
Re: [Dnsmasq-discuss] Incorrect response for DNAME'd records in dnsmasq 2.80+
This caught my eye because it's similar to a bug I noticed in 2.80. See (and ignore the first half of the message about CNAMEs; that was an unrelated issue):

http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2019q4/013483.html

It sounds like that was essentially the same issue, but without DNAMEs. It turned out it had already been fixed but the fix hadn't been released yet at the time:

http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=162e5e0062ce923c494cc64282f293f0ed64fc10

That fix was eventually in 2.81, but it looks like it misses the cases where the NXDOMAIN reply contains a CNAME or DNAME.

I've attached a patch that hopefully fixes this, but word of warning: I've only been able to verify that it fixes the CNAME case, not the DNAME case. I don't think it breaks the intended functionality from b6f926f, but I will admit, I don't feel familiar enough with the inner workings of Dnsmasq to verify that myself.

Regards,
Dominick
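For readers following along: when a reply carries a CNAME or DNAME chain, the RCODE describes the last name in that chain rather than the name that was asked (RFC 6604), which is why an NXDOMAIN reply containing such records needs separate handling. The sketch below is not the attached patch and not dnsmasq code. It only illustrates, under simplifying assumptions, how one might find the name an RCODE refers to. The `(owner, type, target)` tuples and both function names are hypothetical.

```python
def dname_substitute(qname, owner, target):
    """Rewrite a name below a DNAME owner to the matching name below its target (RFC 6672)."""
    q = qname.lower().rstrip(".")
    o = owner.lower().rstrip(".")
    if not q.endswith("." + o):
        return None  # a DNAME only covers names strictly below its owner
    return q[: -len(o)] + target.lower().rstrip(".")


def rcode_subject(qname, answers):
    """Follow CNAME/DNAME records in an answer section.

    `answers` is a list of hypothetical (owner, rtype, target) tuples.
    Returns the final name in the chain, i.e. the name the RCODE is about.
    """
    name = qname.lower().rstrip(".")
    seen = {name}  # guards against alias loops
    while True:
        nxt = None
        for owner, rtype, target in answers:
            if rtype == "CNAME" and owner.lower().rstrip(".") == name:
                nxt = target.lower().rstrip(".")
            elif rtype == "DNAME":
                nxt = dname_substitute(name, owner, target)
            if nxt:
                break
        if not nxt or nxt in seen:
            return name
        seen.add(nxt)
        name = nxt


answers = [("local.mycompany.net", "DNAME", "local-dcname.mycompany.net")]
print(rcode_subject("foo.local.mycompany.net", answers))
```

In James's setup, an NXDOMAIN on such a reply says that foo.local-dcname.mycompany.net does not exist. It says nothing about whether the query name is an empty non-terminal.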
Re: [Dnsmasq-discuss] Incorrect response for DNAME'd records in dnsmasq 2.80+
On Fri, Sep 11, 2020 at 04:53:14PM -0700, James Brown wrote:
> On Wed, Jul 29, 2020 at 12:16 PM James Brown wrote:
> > Indeed, that's the commit that did it.
>
> Just wanted to bump this thread since this is still kind of a show-stopper
> for anyone that uses DNAMEs heavily. Any thoughts on how to fix?

What about _leaving out_ the commit that did it?
Re: [Dnsmasq-discuss] Incorrect response for DNAME'd records in dnsmasq 2.80+
Just wanted to bump this thread since this is still kind of a show-stopper for anyone that uses DNAMEs heavily. Any thoughts on how to fix?

--
James Brown
Engineer
Re: [Dnsmasq-discuss] Incorrect response for DNAME'd records in dnsmasq 2.80+
Indeed, that's the commit that did it.

I'm not sure why that change has any effect for DNAMEs, though (which are not being generated internally to dnsmasq)...

--
James Brown
Engineer
Re: [Dnsmasq-discuss] Incorrect response for DNAME'd records in dnsmasq 2.80+
On Wed, Jul 29, 2020 at 11:23:17AM -0700, James Brown wrote:
> I bisected this in git and this behavioral change was introduced in
> commit b6f926fbefcd2471699599e44f32b8d25b87b471.

$ git log b6f926fbe...b6f926fbe^1
commit b6f926fbefcd2471699599e44f32b8d25b87b471
Author: Simon Kelley
Date:   Tue Aug 21 17:46:52 2018 +0100

    Don't return NXDOMAIN to empty non-terminals.

    When a record is defined locally, eg an A record for one.two.example then
    we already know that if we forward, eg an AAAA query for one.two.example,
    and get back NXDOMAIN, then we need to alter that to NODATA. This is handled
    by check_for_local_domain(). But, if we forward two.example, because
    one.two.example exists, then the answer to two.example should also be
    a NODATA.

    For most local records this is easy, just to substring matching.
    for A, AAAA and CNAME records that are in the cache, it's more difficult.
    The cache has no efficient way to find such records. The fix is to
    insert empty (none of F_IPV4, F_IPV6 F_CNAME set) records for each
    non-terminal.

    The same considerations apply in auth mode, and the same basic mechanism
    is used there too.


Regards
Geert Stappers
--
Silence is hard to parse
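The idea in that commit message can be sketched in a few lines. This is an illustration of the concept only, not dnsmasq's cache code: it computes the empty non-terminals implied by a set of known names and rewrites an upstream NXDOMAIN to NODATA (NOERROR with an empty answer) for them. The function names are hypothetical, and treating every proper suffix, including the top-most label, as a non-terminal is an assumption the commit message does not confirm.

```python
def normalize(name):
    """Lower-case a name and drop any trailing dot."""
    return name.lower().rstrip(".")


def empty_non_terminals(names):
    """Return every ancestor of a known name that is not itself a known name."""
    known = {normalize(n) for n in names}
    result = set()
    for name in known:
        labels = name.split(".")
        # one.two.example -> two.example, example
        for i in range(1, len(labels)):
            parent = ".".join(labels[i:])
            if parent not in known:
                result.add(parent)
    return result


def fix_rcode(qname, upstream_rcode, names):
    """Turn NXDOMAIN into NOERROR (NODATA) when the name, or something below it, exists locally."""
    q = normalize(qname)
    known = {normalize(n) for n in names}
    if upstream_rcode == "NXDOMAIN" and (q in known or q in empty_non_terminals(names)):
        return "NOERROR"
    return upstream_rcode


local = ["one.two.example"]
print(sorted(empty_non_terminals(local)))
print(fix_rcode("two.example", "NXDOMAIN", local))
print(fix_rcode("three.example", "NXDOMAIN", local))
```

The rewrite is only justified when the queried name really has locally known descendants. Applying it to a name reached through an upstream DNAME, as in James's report, turns a legitimate NXDOMAIN into NODATA.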
[Dnsmasq-discuss] Incorrect response for DNAME'd records in dnsmasq 2.80+
I'm upgrading some test nodes in my employer's cluster from 2.78 to 2.82 and handling of DNAMEs in the new version seems different (and wrong).

The setup:

local.mycompany.net is a DNAME to local-<dcname>.mycompany.net, with authoritative resolvers in each datacenter serving a different DNAME record
prod.mycompany.net is an unrelated domain

/etc/resolv.conf contains the line

    search local.mycompany.net prod.mycompany.net

Imagine searching for the bare-word "foo", which is defined in prod.mycompany.net but nowhere else.

Under dnsmasq 2.78, querying for the bare name "foo" using the system resolver will correctly first attempt to query for "foo.local.mycompany.net", get back a DNAME to foo.local-dcname.mycompany.net, then get an empty response with the NXDOMAIN code; that will fail, and glibc will then query "foo.prod.mycompany.net", which is the correct record.

Under dnsmasq 2.82, querying for the bare name "foo" using the system resolver will correctly first attempt to query for "foo.local.mycompany.net", get back a DNAME to foo.local-dcname.mycompany.net, then get an empty response with the NOERROR code. This causes the system resolver to stop trying new search domains. This behavior seems to be dependent on caching; the first request correctly returns NXDOMAIN but subsequent requests return NOERROR. There's actually something more confusing to it than this; if the first request is for A, then subsequent AAAA requests return NOERROR but subsequent A requests return NXDOMAIN. Some kind of weird cache poisoning between record types?

I bisected this in git and this behavioral change was introduced in commit b6f926fbefcd2471699599e44f32b8d25b87b471.

--
James Brown
Engineer
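A toy model may make the symptom easier to see. It is not glibc's resolver code and it simplifies heavily: it walks the search list the way the report describes, moving on after NXDOMAIN and stopping at the first NOERROR reply, even an empty one. The `Reply` type, the two fake servers, and the address 192.0.2.10 are hypothetical stand-ins for illustration.

```python
from typing import Callable, List, NamedTuple, Tuple


class Reply(NamedTuple):
    rcode: str               # "NOERROR" or "NXDOMAIN"
    answers: Tuple[str, ...] = ()


SEARCH = ["local.mycompany.net", "prod.mycompany.net"]


def search(name: str, lookup: Callable[[str], Reply],
           domains: List[str] = SEARCH) -> Tuple[Tuple[str, ...], List[str]]:
    """Return (answers, names tried) for a bare name, as modelled from the report."""
    tried = []
    for domain in domains:
        fqdn = f"{name}.{domain}"
        tried.append(fqdn)
        reply = lookup(fqdn)
        if reply.rcode == "NXDOMAIN":
            continue  # name does not exist: try the next search domain
        # NOERROR means the name exists, even with no data: stop searching
        return reply.answers, tried
    return (), tried


def behaves_like_2_78(fqdn: str) -> Reply:
    """Hypothetical server: only foo.prod.mycompany.net resolves."""
    if fqdn == "foo.prod.mycompany.net":
        return Reply("NOERROR", ("192.0.2.10",))
    return Reply("NXDOMAIN")


def behaves_like_2_82_cached(fqdn: str) -> Reply:
    """Hypothetical server: the DNAME'd name wrongly yields empty NOERROR."""
    if fqdn == "foo.local.mycompany.net":
        return Reply("NOERROR")
    return behaves_like_2_78(fqdn)


print(search("foo", behaves_like_2_78))
print(search("foo", behaves_like_2_82_cached))
```

With the first server the model reaches foo.prod.mycompany.net; with the second it stops after foo.local.mycompany.net and returns nothing, which matches the failure described above.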