Re: [Pdns-users] pdns_recursor issue

2023-01-26 Thread Otto Moerbeek via Pdns-users
On Thu, Jan 26, 2023 at 10:57:21PM +0100, Arien Vijn wrote:

> 
> > On 26 Jan 2023, at 19:00, Otto Moerbeek  wrote:
> 
> [...]
> 
> > I expect the aggressive cache workaround to function.
> 
> It seems so indeed.
> 
> > What is happening is that a query for a non-existent type (e.g. AAAA)
> > for xdsl-c-serviceweb.gslb.kpn.com
> >
> > $ dig @ns1gslb.kpn.com. AAAA xdsl-c-serviceweb.gslb.kpn.com +dnssec
> > 
> > produces an NSEC3 record that denies all types except TXT and RRSIG:
> > 
> > cq026lgcduus730qu6cbhtrt7qpr2jnu.gslb.kpn.com. 86400 IN NSEC3 1 0 1 19623DE58C1E7E40 CQ026LGCDUUS730QU6CBHTRT7QPR2JNV TXT RRSIG
> > 
> > So when the A record expires and somebody has done an AAAA query in
> > between, the aggressive cache concludes that the wanted A record does
> > not exist and does not even ask the auth for it.
> >
> > After a cache wipe it works because when the (aggressive) cache is
> > empty for that zone, there is also no NSEC3 record denying anything.
> > 
> > So in the end this is a misconfigured domain. Completely disabling the
> > aggressive cache is a bit of a big hammer; you can also add an NTA for
> > the specific problem domain, something like:
> >
> > addNTA('gslb.kpn.com', 'Invalid NSEC3 record served for xdsl-c-serviceweb.gslb.kpn.com')
> >
> > in your Lua config file. This effectively disables DNSSEC for the
> > domain. And please also report this to KPN.
> 
> Thanks for the explanation! This is really useful because KPN pointed to our
> DNS servers.
>
> We also saw this with other (KPN-hosted) 'gslb-domains', which also show no
> trouble anymore after disabling the aggressive cache. So, if we go the NTA
> way, I am afraid that we'll have to add a series of NTAs :/
>
> At any rate, I am really glad to have this explanation. I hope that KPN, and the
> parties they outsourced their DNS service to, will appreciate this too :)
> 
> -- Arien

The post below gives background information and a link to a remedy that
can be applied on the load balancer side:

https://en.blog.nic.cz/2019/07/10/error-in-dnssec-implementation-on-f5-big-ip-load-balancers/

-Otto



___
Pdns-users mailing list
Pdns-users@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/pdns-users


Re: [Pdns-users] pdns_recursor issue

2023-01-26 Thread Arien Vijn via Pdns-users

> On 26 Jan 2023, at 19:00, Otto Moerbeek  wrote:

[...]

> I expect the aggressive cache workaround to function.

It seems so indeed.

> What is happening is that a query for a non-existent type (e.g. AAAA)
> for xdsl-c-serviceweb.gslb.kpn.com
>
> $ dig @ns1gslb.kpn.com. AAAA xdsl-c-serviceweb.gslb.kpn.com +dnssec
> 
> produces an NSEC3 record that denies all types except TXT and RRSIG:
> 
> cq026lgcduus730qu6cbhtrt7qpr2jnu.gslb.kpn.com. 86400 IN NSEC3 1 0 1 19623DE58C1E7E40 CQ026LGCDUUS730QU6CBHTRT7QPR2JNV TXT RRSIG
> 
> So when the A record expires and somebody has done an AAAA query in
> between, the aggressive cache concludes that the wanted A record does
> not exist and does not even ask the auth for it.
>
> After a cache wipe it works because when the (aggressive) cache is
> empty for that zone, there is also no NSEC3 record denying anything.
> 
> So in the end this is a misconfigured domain. Completely disabling the
> aggressive cache is a bit of a big hammer; you can also add an NTA for
> the specific problem domain, something like:
>
> addNTA('gslb.kpn.com', 'Invalid NSEC3 record served for xdsl-c-serviceweb.gslb.kpn.com')
>
> in your Lua config file. This effectively disables DNSSEC for the
> domain. And please also report this to KPN.

Thanks for the explanation! This is really useful because KPN pointed to our
DNS servers.

We also saw this with other (KPN-hosted) 'gslb-domains', which also show no
trouble anymore after disabling the aggressive cache. So, if we go the NTA
way, I am afraid that we'll have to add a series of NTAs :/
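
If it does come to a series of NTAs, the lines need not be typed by hand.
As a rough sketch only (Python, not something taken from our setup), the
helper below prints one addNTA(...) line per zone, ready to paste into the
Lua config file. Only gslb.kpn.com comes from this thread; the zone
gslb.example.net, the reason text and the helper names lua_quote and
nta_lines are made-up placeholders.

```python
def lua_quote(text: str) -> str:
    """Return text as a single-quoted Lua string literal."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def nta_lines(zones, reason):
    """Build one addNTA(...) line per zone, de-duplicated and sorted."""
    # Normalise: trim whitespace, drop the trailing dot, lowercase.
    normalized = sorted(
        {z.strip().rstrip(".").lower() for z in zones if z.strip()}
    )
    return [f"addNTA({lua_quote(z)}, {lua_quote(reason)})" for z in normalized]


# 'gslb.kpn.com' is from the thread; 'gslb.example.net.' is a placeholder.
ZONES = ["gslb.kpn.com", "gslb.example.net."]
REASON = "Invalid NSEC3 record served"

for line in nta_lines(ZONES, REASON):
    print(line)
```

Each NTA switches off validation for that whole zone, so the list is best
kept as short as possible and removed once the zone is fixed.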

At any rate, I am really glad to have this explanation. I hope that KPN, and the
parties they outsourced their DNS service to, will appreciate this too :)

-- Arien







Re: [Pdns-users] pdns_recursor issue

2023-01-26 Thread Otto Moerbeek via Pdns-users
On Thu, Jan 26, 2023 at 05:37:12PM +0100, Arien Vijn via Pdns-users wrote:

> Hi Peter,
> 
> > On 26 Jan 2023, at 17:28, Peter van Dijk via Pdns-users 
> >  wrote:
> 
> [...]
> 
> > After some brief investigation we somewhat suspect this is aggressive
> > NSEC caching. Can you see if aggressive-nsec-cache-size=0 makes the
> > problem go away?
> 
> Thanks! I'll add this line to the configuration right away :)
> 
> -- Ariën
> 

I expect the aggressive cache workaround to function.

What is happening is that a query for a non-existent type (e.g. AAAA)
for xdsl-c-serviceweb.gslb.kpn.com

$ dig @ns1gslb.kpn.com. AAAA xdsl-c-serviceweb.gslb.kpn.com +dnssec

produces an NSEC3 record that denies all types except TXT and RRSIG:

cq026lgcduus730qu6cbhtrt7qpr2jnu.gslb.kpn.com. 86400 IN NSEC3 1 0 1 19623DE58C1E7E40 CQ026LGCDUUS730QU6CBHTRT7QPR2JNV TXT RRSIG

So when the A record expires and somebody has done an AAAA query in
between, the aggressive cache concludes that the wanted A record does
not exist and does not even ask the auth for it.

After a cache wipe it works because when the (aggressive) cache is
empty for that zone, there is also no NSEC3 record denying anything.
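
To make the reasoning concrete, here is a minimal sketch in Python of the
type-bitmap check. It is an illustration under simplifying assumptions, not
the recursor's actual code: it assumes the NSEC3 owner hash is the hash of
the queried name, and it skips DNSSEC validation, TTL handling and the
covering-range checks a real resolver performs. The names Nsec3Record,
parse_nsec3 and aggressive_answer are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nsec3Record:
    owner_hash: str    # hashed owner label, upper-cased
    next_hash: str     # next hashed owner name
    types: frozenset   # types listed as present in the bitmap


def parse_nsec3(line: str) -> Nsec3Record:
    """Parse a one-line NSEC3 record in presentation format."""
    fields = line.split()
    idx = fields.index("NSEC3")
    # After "NSEC3": algorithm, flags, iterations, salt, next hash, types...
    owner_hash = fields[0].split(".")[0].upper()
    return Nsec3Record(owner_hash, fields[idx + 5], frozenset(fields[idx + 6:]))


def aggressive_answer(cache, qname_hash, qtype):
    """Return 'NODATA' if a cached record denies qtype, else None (ask the auth)."""
    for rec in cache:
        if rec.owner_hash == qname_hash and qtype not in rec.types:
            return "NODATA"
    return None


RECORD_LINE = (
    "cq026lgcduus730qu6cbhtrt7qpr2jnu.gslb.kpn.com. 86400 IN NSEC3 1 0 1 "
    "19623DE58C1E7E40 CQ026LGCDUUS730QU6CBHTRT7QPR2JNV TXT RRSIG"
)
record = parse_nsec3(RECORD_LINE)

# Record cached after the AAAA query: the A lookup is denied from cache.
print(aggressive_answer([record], record.owner_hash, "A"))    # NODATA
# TXT is listed in the bitmap, so that query still goes to the auth.
print(aggressive_answer([record], record.owner_hash, "TXT"))  # None
# Empty cache, as after a wipe: nothing denies A, so the auth is asked.
print(aggressive_answer([], record.owner_hash, "A"))          # None
```

The broken part is the bitmap itself: the name clearly has an A record, yet
the signed NSEC3 record lists only TXT and RRSIG.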

So in the end this is a misconfigured domain. Completely disabling the
aggressive cache is a bit of a big hammer; you can also add an NTA for
the specific problem domain, something like:

addNTA('gslb.kpn.com', 'Invalid NSEC3 record served for xdsl-c-serviceweb.gslb.kpn.com')

in your Lua config file. This effectively disables DNSSEC for the
domain. And please also report this to KPN.

-Otto





Re: [Pdns-users] pdns_recursor issue

2023-01-26 Thread Arien Vijn via Pdns-users
Hi Peter,

> On 26 Jan 2023, at 17:28, Peter van Dijk via Pdns-users 
>  wrote:

[...]

> After some brief investigation we somewhat suspect this is aggressive
> NSEC caching. Can you see if aggressive-nsec-cache-size=0 makes the
> problem go away?

Thanks! I'll add this line to the configuration right away :)

-- Ariën





Re: [Pdns-users] pdns_recursor issue

2023-01-26 Thread Arien Vijn via Pdns-users
Hi Otto,

Thanks for checking. Here is the configuration:

local-address=0.0.0.0, ::
local-port=53
query-local-address=::,0.0.0.0
threads=8

allow-from=127.0.0.0/8, ::1/128, 10.0.0.0/8, 87.251.42.0/26, 2001:7b8:650::/48

dnssec=validate

lua-config-file=/etc/powerdns/recursor.lua

webserver=yes
webserver-port=8082
webserver-address=::
webserver-password=
webserver-allow-from=0.0.0.0/32,::/0
api-key=

pdns-distributes-queries=true
reuseport=yes
any-to-tcp=yes
root-nx-trust=no
version-string=powerdns
max-ns-per-resolve=5

-- Ariën


> On 26 Jan 2023, at 17:04, Otto Moerbeek  wrote:
> 
> Hi,
> 
> Please show your configuration.
> 
> I do not think your analysis is to the point.
> If I repeat the scenario, I see a correct retrieval of the A record.
> 
> So we have to find out what is different in your case.
> 
>   -Otto
> 
> 
> On Thu, Jan 26, 2023 at 01:30:54PM +0100, Arien Vijn via Pdns-users wrote:
> 
>> Greetings,
>> 
>> We recently upgraded pdns_recursor from version 4.4.5 to 4.8.0. It seems
>> that we have been running into the following issue ever since.
>> 
>> 1/ Client queries for an A-record for xdsl-serviceweb.kpn.com.
>> 2/ Recursor queries the domain tree and receives the CNAME-record that 
>> points to: xdsl-c-serviceweb.gslb.kpn.com. from the authoritative DNS server.
>> 3/ Recursor queries for and receives the subsequent A-record from the
>> authoritative DNS server for that A-record.
>> 4/ Recursor answers the client mentioned in 1/.
>> 
>> So far so good, until the A-record of xdsl-c-serviceweb.gslb.kpn.com. 
>> expires out of the 'main record cache' but not from the 'main packet cache'. 
>> The CNAME remains in both caches. Please note this excerpt from: rec_control 
>> dump-cache below:
>> 
>>   ; main record cache dump follows
>>   ;
>>   xdsl-serviceweb.kpn.com. 300 -224 IN CNAME xdsl-c-serviceweb.gslb.kpn.com. ; (Secure) auth=1 zone=kpn.com from=194.151.228.10 nm= rtag= ss=0
>>   ; negcache dump follows
>> 
>>   [...]
>> 
>>   ; main packet cache dump from thread follows
>>   ;
>>   xdsl-c-serviceweb.gslb.kpn.com. -1803 A  ; tag 0 udp
>> 
>>   [...]
>> 
>>   ; main packet cache dump from thread follows
>>   ;
>>   xdsl-serviceweb.kpn.com. -470 A  ; tag 0 udp
>>   xdsl-serviceweb.kpn.com. 111 A  ; tag 0 udp
>>   xdsl-serviceweb.kpn.com. 111 AAAA  ; tag 0 udp
>> 
>> 
>> From that point on, pdns_recursor replies to queries for the A-record with
>> the SOA-record of the domain of the said A-record:
>> 
>>   ; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> xdsl-c-serviceweb.gslb.kpn.com. @localhost
>>   ;; global options: +cmd
>>   ;; Got answer:
>>   ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36347
>>   ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
>>
>>   ;; OPT PSEUDOSECTION:
>>   ; EDNS: version: 0, flags:; udp: 512
>>   ;; QUESTION SECTION:
>>   ;xdsl-c-serviceweb.gslb.kpn.com. IN A
>>
>>   ;; AUTHORITY SECTION:
>>   gslb.kpn.com.   79407   IN  SOA ns2gslb.kpn.com. netmaster.gslb.kpn.com. 2023011702 10800 3600 604800 86400
>> 
>>   ;; Query time: 0 msec
>>   ;; SERVER: ::1#53(::1)
>>   ;; WHEN: Thu Jan 26 12:10:13 CET 2023
>>   ;; MSG SIZE  rcvd: 113
>> 
>> 
>> This situation causes actual people to complain and is being resolved by
>> removing the domain tree for the subdomain gslb.kpn.com. from the caches.
>> From then on the story starts again.
>>
>> That the A-record xdsl-c-serviceweb.gslb.kpn.com. remains in the packet
>> cache does not seem right to me, but I don't know enough about DNS and
>> pdns_recursor to be sure. What could trigger this behaviour? Or is it
>> perhaps a configuration issue because we made such a large jump in
>> versions when we upgraded? Last but not least, we see the same behaviour
>> with at least one other hostname.
>> 
>> -- Ariën


Re: [Pdns-users] pdns_recursor issue

2023-01-26 Thread Peter van Dijk via Pdns-users
Hi Arien,

On Thu, 2023-01-26 at 13:30 +0100, Arien Vijn via Pdns-users wrote:
> Greetings,
> 
> We recently upgraded pdns_recursor from version 4.4.5 to 4.8.0. It seems that
> we have been running into the following issue ever since.
> 
> 1/ Client queries for an A-record for xdsl-serviceweb.kpn.com.
> 2/ Recursor queries the domain tree and receives the CNAME-record that points 
> to: xdsl-c-serviceweb.gslb.kpn.com. from the authoritative DNS server.
> 3/ Recursor queries for and receives the subsequent A-record from the
> authoritative DNS server for that A-record.
> 4/ Recursor answers the client mentioned in 1/.
> 
> So far so good, until the A-record of xdsl-c-serviceweb.gslb.kpn.com. expires 
> out of the 'main record cache' but not from the 'main packet cache'. The 
> CNAME remains in both caches. Please note this excerpt from: rec_control 
> dump-cache below:

After some brief investigation we somewhat suspect this is aggressive
NSEC caching. Can you see if aggressive-nsec-cache-size=0 makes the
problem go away?

Kind regards,
-- 
Peter van Dijk
PowerDNS.COM BV - https://www.powerdns.com/



Re: [Pdns-users] pdns_recursor issue

2023-01-26 Thread Otto Moerbeek via Pdns-users
Hi,

Please show your configuration.

I do not think your analysis is to the point.
If I repeat the scenario, I see a correct retrieval of the A record.

So we have to find out what is different in your case.

-Otto


On Thu, Jan 26, 2023 at 01:30:54PM +0100, Arien Vijn via Pdns-users wrote:

> Greetings,
> 
> We recently upgraded pdns_recursor from version 4.4.5 to 4.8.0. It seems that
> we have been running into the following issue ever since.
> 
> 1/ Client queries for an A-record for xdsl-serviceweb.kpn.com.
> 2/ Recursor queries the domain tree and receives the CNAME-record that points 
> to: xdsl-c-serviceweb.gslb.kpn.com. from the authoritative DNS server.
> 3/ Recursor queries for and receives the subsequent A-record from the
> authoritative DNS server for that A-record.
> 4/ Recursor answers the client mentioned in 1/.
> 
> So far so good, until the A-record of xdsl-c-serviceweb.gslb.kpn.com. expires 
> out of the 'main record cache' but not from the 'main packet cache'. The 
> CNAME remains in both caches. Please note this excerpt from: rec_control 
> dump-cache below:
> 
>   ; main record cache dump follows
>   ;
>   xdsl-serviceweb.kpn.com. 300 -224 IN CNAME xdsl-c-serviceweb.gslb.kpn.com. ; (Secure) auth=1 zone=kpn.com from=194.151.228.10 nm= rtag= ss=0
>   ; negcache dump follows
>
>   [...]
>
>   ; main packet cache dump from thread follows
>   ;
>   xdsl-c-serviceweb.gslb.kpn.com. -1803 A  ; tag 0 udp
>
>   [...]
>
>   ; main packet cache dump from thread follows
>   ;
>   xdsl-serviceweb.kpn.com. -470 A  ; tag 0 udp
>   xdsl-serviceweb.kpn.com. 111 A  ; tag 0 udp
>   xdsl-serviceweb.kpn.com. 111 AAAA  ; tag 0 udp
> 
> 
> From that point on, pdns_recursor replies to queries for the A-record with
> the SOA-record of the domain of the said A-record:
> 
>   ; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> xdsl-c-serviceweb.gslb.kpn.com. @localhost
>   ;; global options: +cmd
>   ;; Got answer:
>   ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36347
>   ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
>
>   ;; OPT PSEUDOSECTION:
>   ; EDNS: version: 0, flags:; udp: 512
>   ;; QUESTION SECTION:
>   ;xdsl-c-serviceweb.gslb.kpn.com. IN A
>
>   ;; AUTHORITY SECTION:
>   gslb.kpn.com.   79407   IN  SOA ns2gslb.kpn.com. netmaster.gslb.kpn.com. 2023011702 10800 3600 604800 86400
>
>   ;; Query time: 0 msec
>   ;; SERVER: ::1#53(::1)
>   ;; WHEN: Thu Jan 26 12:10:13 CET 2023
>   ;; MSG SIZE  rcvd: 113
> 
> 
> This situation causes actual people to complain and is being resolved by
> removing the domain tree for the subdomain gslb.kpn.com. from the caches.
> From then on the story starts again.
>
> That the A-record xdsl-c-serviceweb.gslb.kpn.com. remains in the packet cache
> does not seem right to me, but I don't know enough about DNS and
> pdns_recursor to be sure. What could trigger this behaviour? Or is it perhaps
> a configuration issue because we made such a large jump in versions when we
> upgraded? Last but not least, we see the same behaviour with at least one
> other hostname.
> 
> -- Ariën