Re: [dns-operations] Open source DNS quality assurance & risks: survey and discussion

2024-05-16 Thread Petr Špaček

Last call: The survey will close in 24 hours.

Link:
https://ec.europa.eu/eusurvey/runner/RIPE88OpenSourceWGSurvey

Thank you very much to those of you who have already filled out the survey!

If you have not done so yet, please take 4 minutes of your time. It's mostly 
multiple choice, so you will hardly have to type anything yourself!


We'd like to draw some conclusions from the results *by this Friday*, so 
please don't postpone it: click the link now and contribute your experience 
to next week's discussion.


Petr Špaček
Internet Systems Consortium


On 13. 05. 24 10:05, Petr Špaček wrote:

Dear DNS colleagues,

I invite you all to an open discussion about Open source quality 
assurance & risk mitigation. Hopefully this is relevant to many 
participants here as I believe that open source and DNS go well together.


To fuel the discussion please fill out a 5-minute survey here:
https://ec.europa.eu/eusurvey/runner/RIPE88OpenSourceWGSurvey

Our goal is to find out what people actually look for when assessing 
open source quality and risks associated with the software.


The actual discussion will take place during the hybrid RIPE 88 meeting in 
Krakow, Poland, on Thursday, 23 May 2024, starting around 14:00 UTC+2, 
during the Open Source Working Group session.


Remote participation is available free of charge [1].

Your opinions are very welcome even if you don't plan to attend the meeting!

[1] https://ripe88.ripe.net/attend/register/

Thank you for your time - and see you in the meeting, at least virtually!



___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


[dns-operations] Open source DNS quality assurance & risks: survey and discussion

2024-05-13 Thread Petr Špaček

Dear DNS colleagues,

I invite you all to an open discussion about Open source quality 
assurance & risk mitigation. Hopefully this is relevant to many 
participants here as I believe that open source and DNS go well together.


To fuel the discussion please fill out a 5-minute survey here:
https://ec.europa.eu/eusurvey/runner/RIPE88OpenSourceWGSurvey

Our goal is to find out what people actually look for when assessing 
open source quality and risks associated with the software.


The actual discussion will take place during the hybrid RIPE 88 meeting in 
Krakow, Poland, on Thursday, 23 May 2024, starting around 14:00 UTC+2, 
during the Open Source Working Group session.


Remote participation is available free of charge [1].

Your opinions are very welcome even if you don't plan to attend the meeting!

[1] https://ripe88.ripe.net/attend/register/

Thank you for your time - and see you in the meeting, at least virtually!

--
Petr Špaček
Internet Systems Consortium

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


[dns-operations] OARC 43 - Call for Contribution

2024-04-17 Thread Petr Špaček

The Programme Committee is seeking contributions from the community.

This workshop will be a hybrid event.

Date - likely in the week of 23-27 September 2024, details will be 
confirmed later


Location - South America, exact location will be confirmed later

Time zone - approximately 09:00-17:00 UTC-5

Co-located with - related industry events, will be confirmed later

Deadline for Submissions - 2024-06-23 23:59 UTC

For further details please see
https://indico.dns-oarc.net/event/51/abstracts/

Petr Špaček, for the DNS-OARC Programme Committee

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] ag.gov not providing NXDOMAIN responses

2024-04-12 Thread Petr Špaček

On 11. 04. 24 6:15, Stephane Bortzmeyer wrote:

On Tue, Apr 09, 2024 at 01:09:20PM -0500,
  David Zych  wrote
  a message of 121 lines which said:


The problem: when queried for a record underneath ag.gov. which does
not exist, these nameservers do not return a proper NXDOMAIN
response; instead, they don't answer at all.


Funny enough, it depends on the QTYPE.

% dig @ns2.usda.gov. nonono.ag.gov A
;; communications error to 2600:12f0:0:ac04::206#53: timed out
;; communications error to 2600:12f0:0:ac04::206#53: timed out
;; communications error to 2600:12f0:0:ac04::206#53: timed out
;; communications error to 199.141.126.206#53: timed out

; <<>> DiG 9.18.24-1-Debian <<>> @ns2.usda.gov. nonono.ag.gov A
; (2 servers found)
;; global options: +cmd
;; no servers could be reached

% dig @ns2.usda.gov. nonono.ag.gov NS

; <<>> DiG 9.18.24-1-Debian <<>> @ns2.usda.gov. nonono.ag.gov NS
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 44750
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 8, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1220
; COOKIE: 108e6a3526539745cbe04caf6617b75afc5cf42f25232e56 (good)
;; QUESTION SECTION:
;nonono.ag.gov. IN NS

;; AUTHORITY SECTION:
ag.gov. 900 IN SOA ns1.usda.gov. duty\.officer.usda.gov. (
...


The practical trouble this causes has to do with an increasingly popular DNS 
privacy feature called QNAME Minimization, which depends upon authoritative DNS 
servers like yours responding in a standards-compliant way to queries like

_.ag.gov IN A
_.ars.ag.gov IN A
_.tucson.ars.ag.gov IN A


More fun: the previous version of QNAME minimisation used QTYPE=NS. It
then changed to QTYPE=A precisely to work around broken
middleboxes. (And also to avoid sticking out.)


This is not only in violation of
https://datatracker.ietf.org/doc/html/rfc8906
but it is an outright security issue because it allows attackers to mess 
up load balancing in resolvers. See

https://indico.dns-oarc.net/event/47/contributions/1018/attachments/959/1802/pre-silence-not-golden-dns-orac.pdf

I predict you have a much better chance of getting this fixed if you go 
through the respective CERT team and point them to this presentation.
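
When reporting it, a compact reproduction (mirroring the transcripts above) 
is usually enough; the exact output will of course vary:

% dig +tries=1 +time=3 @ns2.usda.gov. _.ag.gov A     # dropped -> times out
% dig +tries=1 +time=3 @ns2.usda.gov. _.ag.gov NS    # answered -> NXDOMAIN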



Answering before someone asks: No, we are not going to work around this in 
the BIND resolver. It has to be fixed on the authoritative side. This is not 
a security bug in BIND. See

https://bind9.readthedocs.io/en/latest/chapter7.html#dns-resolvers

--
Petr Špaček
Internet Systems Consortium

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] NSEC3PARAM change strange behaviour

2023-10-12 Thread Petr Špaček

On 12. 10. 23 13:09, Misak Khachatryan wrote:

Thank you Viktor,

In the logs I see IXFR, which seems to be the cause. This brings me to a 
question for the BIND developers - shouldn't a change of dnssec-policy, or at 
least such a destructive one, automatically trigger an AXFR?


Of course this is not a question to be asked here; I will check the BIND 
documentation and ask on the appropriate mailing list.


Just to close the loop: you can configure the "max-ixfr-ratio" option. See 
https://bind9.readthedocs.io/en/latest/reference.html#namedconf-statement-max-ixfr-ratio
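
A minimal sketch of what that could look like in named.conf (illustrative 
value; see the reference above for the exact semantics in your BIND version):

options {
    // Answer with a full AXFR whenever the IXFR would exceed 50% of the
    // zone size; "unlimited" disables the check.
    max-ixfr-ratio 50%;
};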


Please send further questions to the mailing list
https://lists.isc.org/mailman/listinfo/bind-users

--
Petr Špaček
Internet Systems Consortium


___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] Signature expired for the DS of .ch at Cloudflare ?

2023-10-05 Thread Petr Špaček

On 04. 10. 23 10:38, Stephane Bortzmeyer wrote:

On Wed, Oct 04, 2023 at 10:35:14AM +0200,
  Stephane Bortzmeyer  wrote
  a message of 57 lines which said:


Other instances of Cloudflare have the correct info:

% dig +cd +nsid @1.1.1.1 DS ch.


https://www.cloudflarestatus.com/

Investigating - Cloudflare is aware of, and investigating, DNS resolution 
issues which potentially impacts multiple users using 1.1.1.1 public resolver 
and/or WARP.

Further detail will be provided as more information becomes available.
Oct 04, 2023 - 08:19 UTC


Details are now here:
https://blog.cloudflare.com/1-1-1-1-lookup-failures-on-october-4th-2023/

--
Petr Špaček
Internet Systems Consortium

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] DNS over TCP response fragmentation

2023-10-03 Thread Petr Špaček

On 03. 10. 23 11:25, Jan Petto wrote:
For my research, I am sending DNS requests over TCP to many different 
recursive DNS servers all over the world. A significant portion of these 
servers is sending the DNS response in two separate TCP segments, even 
though it would easily fit into one packet. Only after my client has 
acknowledged the first segment, the second part of the response is sent. 
The first TCP segment always contains only one or two bytes, never more.


I know a DNS message sent over TCP is prefixed by a two-byte field 
containing the message length. My first thought was that the first TCP 
segment contains this length field, and the second segment contains the 
DNS message, but then I discovered cases where only one of the two 
length bytes was contained in the first segment. In any case, sending 
the message length as a separate packet does not make much sense to me 
from an application design perspective. Maybe this is some sort of 
attack mitigation?


I have attached a packet capture containing two such examples. You can 
reproduce the behavior with any DNS client, e.g. dig:


# dig example.org +tcp @100.37.202.139

Also attached is a list of public DNS server IP addresses, where I have 
observed this behavior. They were found via scans of the IP address 
space, I have no affiliation with these servers.


I would greatly appreciate any input as to why so many servers are 
sending their responses in such a way.


I bet it's just a suboptimal implementation on some SOHO router or 
something like that.


There are two things at play, I believe:
- The responder apparently does not use TCP_CORK (see "man tcp") or a 
userspace equivalent.
- The kernel is very relaxed when it comes to TCP segmentation. Nothing 
prescribes that TCP streams MUST be segmented in any particular, optimal way.
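
For illustration, a rough Python sketch of the two approaches (a generic 
sketch, not taken from any particular implementation):

import socket
import struct

def send_dns_reply(sock: socket.socket, wire: bytes) -> None:
    # RFC 1035 section 4.2.2: over TCP the message is prefixed by a two-byte
    # length field. Building a single buffer avoids the tiny 1-2 byte first
    # segment described above.
    sock.sendall(struct.pack("!H", len(wire)) + wire)

def send_dns_reply_corked(sock: socket.socket, wire: bytes) -> None:
    # Linux-only alternative: keep two writes, but cork the socket so the
    # kernel coalesces them into one segment.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
    try:
        sock.sendall(struct.pack("!H", len(wire)))
        sock.sendall(wire)
    finally:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)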


--
Petr Špaček
Internet Systems Consortium

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] MaginotDNS: Attacking the boundary of DNS caching protection

2023-09-27 Thread Petr Špaček

On 27. 09. 23 9:38, Ralf Weber wrote:

Moin!

On 27 Sep 2023, at 3:58, Xiang Li wrote:


Hi Stephane,

This is Xiang, the author of this paper.

For the off-path attack, DoT can protect the CDNS from being poisoned.
For the on-path attack, since the forwarding query is sent to the
attacker's server, only DNSSEC can mitigate the MaginotDNS.


I don’t think this is true otherwise all resolver implementations would
have been affected and not just a few. If you are on path direct behind
the resolver of course all bets are off, but if you are on path just
between the resolver and the forwarder those resolvers that are more
cautious in what cache information they use for iterative queries are not
vulnerable.

I guess that is why Akamai Cacheserve, NLNet Labs Unbound and PowerDNS
Recursor are not mentioned in the paper because they were not vulnerable.


That's right.

If you are interested in the gory details, BIND's description of the 
issue can be found here:

https://gitlab.isc.org/isc-projects/bind9/-/issues/2950#note_241893
https://gitlab.isc.org/isc-projects/bind9/-/issues/2950#note_244624

Also the surrounding comments have more details including vulnerable 
config files and PCAPs.


--
Petr Špaček
Internet Systems Consortium


___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


[dns-operations] IETF 118 hackaton: Does Not Scale: Rethinking DNS

2023-09-15 Thread Petr Špaček

Hello all!

I would like to invite you to a "round table" planned during IETF 118 
hackathon [1] - Saturday and Sunday before IETF 118.


We plan to have an open and friendly brainstorming session with people 
who work on the DNS protocol, write implementations, and operate networks.


The purpose is to brainstorm and think about DNS without being bound by 
current protocol constraints. Where are we hitting limits? What can we 
do about them? Do you want to put your protocol pet peeve out of its misery?


If you want to join, please list yourself here:
https://doodle.com/meeting/participate/id/azXrrv7d.
This will allow us to secure a large enough workspace.


Participants are expected to come with their homework done. Bring a list 
of limitations you can see in the current protocol with you, and don't 
hesitate to think big. Hate the duplicate TTLs in DNS messages? Please 
write it down. Want secure & flexible transport protocol specification? 
Never liked the compression method? Put it on the list.



As a teaser, here are a couple of real-world motivating questions just 
to get us started.


How do we make DNS:
... scalable so it can transfer millions of zones? And how do we monitor 
it? [2]
... handle humongous post-quantum crypto keys and signatures, in both 
protocol and transport? [3]

... support distributed multi-master setups?
... extensible to new wire format & at the same time, maintain a single 
namespace?
... simpler to operate? What if we rethink basic assumptions? [4] (see 
the talk starting at 33:40)


[1] https://wiki.ietf.org/en/meeting/118/hackathon
[2] https://indico.dns-oarc.net/event/47/contributions/1017/
[3] https://indico.dns-oarc.net/event/46/contributions/985/
[4] 
https://icann.zoom.us/rec/share/PUZu_QsO_rdY0gavMatzFOSVpZY1oNahNYnPBuy6pgTUJARw-YIOEzWEV11aqaHW.4Cwr3dGRlunUwhD9?startTime=1693897245000



It's unlikely we will produce running code, but hopefully we'll generate 
some good ideas and possibly proto-I-Ds.


--
Petr Špaček
Internet Systems Consortium

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] DNS .com/.net resolution problems in the Asia/Pacific region

2023-07-25 Thread Petr Špaček

On 18. 07. 23 23:53, Viktor Dukhovni wrote:

Currently, it’s 7 days for .com which almost exactly matches the RRSIG
expiry-inception difference and that doesn’t give any wiggle room if
things go wrong.

Expiry in the SOA applies to AXFR, but many deployments are not
AXFR-based.  And Verisign apparently did try to isolate the server;
sadly, that didn't work out as expected.


. - 7 days SOA expiry and 14 days signature validity
.cz - 7 days SOA expiry and 14 days signature validity
.nl - 28 days SOA expiry and 14 days signature validity
.org - 14 days SOA expiry and 3 weeks signature validity

Do any of these use AXFR?  If all the servers are effectively "primary",
with incremental zone updates driven by some other process, the SOA
expiry is of little relevance.  Sure they should go offline before
signatures start to go stale (as Verisign tried to do, but failed).


Indeed some of the TLDs listed use good old AXFR/IXFR, but that's 
beside the point. See below.




The "go offline" logic should therefore be robust, but that's not
the topic at hand I think.  The topic is whether "bogus" should
generally be retriable (or even required to be retriable within
reasonable retry limits, and with error caching holddowns to
avoid thundering herd storms, ...).


I think that SOA EXPIRY is equally relevant to any sort of replication 
mechanism. Even if everything is driven by a non-DNS database backend, it 
presumably has some notion of the last successful synchronization with its 
database peers. Such a timestamp can be used to trigger SERVFAIL once 
(last sync + SOA EXPIRY) has passed.
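
A back-of-the-envelope sketch of that rule (hypothetical function; a real 
implementation would live in the server's zone-maintenance code):

import time

def zone_still_servable(last_successful_sync: float, soa_expire: int) -> bool:
    # Serve authoritative data only while the last successful synchronization
    # (XFR or database replication) is younger than the SOA EXPIRE interval;
    # afterwards, answer SERVFAIL rather than serving possibly-stale data.
    return time.time() < last_successful_sync + soa_expire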


--
Petr Špaček

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] DNSSEC in BIND

2023-06-12 Thread Petr Špaček

Hello,

detailed documentation for DNSSEC in BIND is here:
https://bind9.readthedocs.io/en/latest/dnssec-guide.html
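
For the impatient, the modern BIND 9.16+ way boils down to one zone option; 
a minimal sketch (zone name and file are placeholders; the guide above covers 
key storage, DS upload to the parent, and so on):

zone "example.com" {
    type primary;
    file "example.com.db";
    dnssec-policy default;   // automatic key generation, signing and rollovers
    inline-signing yes;      // keep the unsigned zone file untouched
};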

If anything is unclear please post questions to BIND mailing list:
https://lists.isc.org/mailman/listinfo/bind-users

HTH.
Petr Špaček
Internet Systems Consortium

On 12. 06. 23 15:37, daniel majela wrote:

   Hello...
My name is Daniel Majela and, if possible, I would like some help to 
implement DNSSEC on my servers.


Today I have 3 recursive and authoritative servers.
My external authoritative zones are copied to 2 DNS servers that are in 
the DMZ.


My first question is whether there is a step-by-step way to implement DNSSEC 
using bind9 9.16.23-RH?


___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] [DNSOP] bind fails to continue recursing on one specific query

2023-03-29 Thread Petr Špaček

On 29. 03. 23 13:03, Dave Lawrence wrote:

Peter DeVries via dns-operations writes:

 Another relevant draft:
 https://datatracker.ietf.org/doc/html/rfc8906

Not sure how, it doesn't address _. as a use case at all and I only
see testing for minimal EDNS not minimal qname.


The journey of that document was with, essentially, No Response
Considered Harmful. While it does go over many specific examples, the
thrust of it from the Introduction is that not responding to
legitimate queries is an ambiguous signal that burdens the DNS
ecosystem even more.


That's right.

Well-behaved DNS resolvers might assume that a timeout indicates that the 
server is not keeping up, and that the resolver should try another server or 
throttle queries to the non-responsive server (in an attempt to 
help the server keep up with the load).


In other words, dropping queries from resolvers might/will cause 
legitimate clients to not get timely answers, but attackers will not 
care and will continue flooding the resolver.


Artificial timeouts also wreak havoc on some RTT estimation approaches, etc.

Thus
=> RFC 8906 => It's A Bad Idea To Drop Queries.

--
Petr Špaček
Internet Systems Consortium

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] [DNSOP] bind fails to continue recursing on one specific query

2023-03-28 Thread Petr Špaček

On 28. 03. 23 13:00, Peter DeVries via dns-operations wrote:

The queries for "_.extglb.tn.gov. IN A ?" in your PCAP are a novelty to
me.  Are these some form of query minimisation, or some sort of sanity
check of the delegation?  Sadly, the "tn.gov" nameserver just drops
these without responding, so their failure could well contribute to the
problems you observe.

These are indeed how BIND does qname minimization in "relaxed" mode
which is currently the default.

We almost blocked these because we didn't know what they were but then
I stumbled upon one of the old RFC drafts for query minimization and
it does mention this as a technique.  I could see someone else doing
so as well because it did make up a very large percentage of our
inbound queries and there isn't much documentation on it.


FTR the underscore trick is now documented in
https://bind9.readthedocs.io/en/latest/reference.html#namedconf-statement-qname-minimization

(And also mentioned in RFC 7816 section 3.)
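
For reference, the mode is a single option in named.conf; a minimal sketch 
(see the link above for the modes your version supports):

options {
    // "relaxed" (the current default) falls back to classic resolution when
    // the _. probes hit non-compliant servers; "strict" does not fall back.
    qname-minimization relaxed;
};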

--
Petr Špaček
Internet Systems Consortium

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] Cloudflare TYPE65283

2023-03-27 Thread Petr Špaček

On 27. 03. 23 16:00, Emmanuel Fusté wrote:

Le 27/03/2023 à 15:38, Petr Špaček a écrit :

On 27. 03. 23 15:27, Emmanuel Fusté wrote:

Le 27/03/2023 à 14:34, Petr Špaček a écrit :

On 27. 03. 23 13:31, Emmanuel Fusté wrote:

Le 27/03/2023 à 12:37, Emmanuel Fusté a écrit :

Le 27/03/2023 à 12:14, Joe Abley a écrit :

Hi Emmanuel,

On Mon, Mar 27, 2023 at 10:51, Emmanuel Fusté 
 wrote:
Cloudflare started to return TYPE65283 in their NSEC records for "compact 
DNSSEC denial of existence"/"minimal lies" NXDOMAIN responses.
It actually breaks established "minimal lies" NXDOMAIN decoding 
implementations.
Does anyone know the usage/purpose of TYPE65283 in this context?


If a compact negative response includes an NSEC RR whose type 
bitmap only includes NSEC and RRSIG, the response is 
indistinguishable from the case where the name exists but is an 
empty non-terminal. Adding a special entry in the type bitmap 
avoids that ambiguity and as a bonus provides an NXDOMAINish 
signal as a kind of compromise to those consumers who are all 
pitchforky about the RCODE. The spec currently calls that special 
type NXNAME.


https://www.ietf.org/archive/id/draft-huque-dnsop-compact-lies-01.txt

The spec is still a work in progress and the NXNAME type does not 
have a codepoint. I believe TYPE65283 is being used as a 
placeholder. I think Christian made a comment to that effect on 
this list last week, although I think he may not have mentioned 
the specific RRTYPE that was to be used.


If this has caused something to break, more details would be good 
to hear! 

..

Ok, replying to myself.
TYPE65283 is, as you stated, the placeholder for a future NXNAME.
So they silently broke their previous implementation to implement 
half of this draft.
Their previous NXDOMAIN implementation corresponds to the draft's ENT 
case, but they still implement their old way for ENTs.

Thank you for the pointer.


Could you elaborate on the type of breakage you mentioned?

What got broken, specifically?

Client-side decoding of the previous draft's NXDOMAIN status as originally 
encoded by Cloudflare.
Having applications relying on the NXDOMAIN status passed by the API, 
we were forced to add simple minimal-lies decoding in the stub 
resolver (we don't want to disable DNSSEC validation on our trusted 
resolver or apply special treatment to it for these clients).
The decoding is based on exactly the presence of RRSIG and NSEC in 
the NSEC record's type bitmap.
The NS1 extension for restoring simple ENT identification is 
compatible with this scheme, as for an ENT you get RRSIG, NSEC and TYPE65281.


Now I need to explicitly strip (or special-case) TYPE65283 to restore 
NXDOMAIN identification from Cloudflare and still identify NXDOMAIN 
on NS1 and NXDOMAIN-or-ENT on Route53.
If Cloudflare switches to this draft for the ENT case too, it will 
become as bad as Route53 and only NS1 will give a distinguishable 
real NXDOMAIN.
Or ALL compact-lies response implementers should switch to this new 
draft and be known to have switched.


Thank you, that explains it!

I simply did not expect changes to draft implementations to be called 
"breakage".


Yes, I perfectly understand this position toward drafts in the common 
IETF sense/usage.
But as these drafts were and are imposed on us unilaterally, on the 
whole Internet, for years, by major DNS service providers, they are 
sadly de facto standards.


You got me curious: What is the use case that depends on this? I mean, from 
reading the DNS spec _alone_ it's not clear why any of the variants in use 
should cause serious problems if they are done correctly.


--
Petr Špaček
Internet Systems Consortium

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] Cloudflare TYPE65283

2023-03-27 Thread Petr Špaček

On 27. 03. 23 15:27, Emmanuel Fusté wrote:

Le 27/03/2023 à 14:34, Petr Špaček a écrit :

On 27. 03. 23 13:31, Emmanuel Fusté wrote:

Le 27/03/2023 à 12:37, Emmanuel Fusté a écrit :

Le 27/03/2023 à 12:14, Joe Abley a écrit :

Hi Emmanuel,

On Mon, Mar 27, 2023 at 10:51, Emmanuel Fusté 
 wrote:
Cloudflare started to return TYPE65283 in their NSEC records for "compact 
DNSSEC denial of existence"/"minimal lies" NXDOMAIN responses.
It actually breaks established "minimal lies" NXDOMAIN decoding 
implementations.
Does anyone know the usage/purpose of TYPE65283 in this context?


If a compact negative response includes an NSEC RR whose type 
bitmap only includes NSEC and RRSIG, the response is 
indistinguishable from the case where the name exists but is an empty 
non-terminal. Adding a special entry in the type bitmap avoids that 
ambiguity and as a bonus provides an NXDOMAINish signal as a kind 
of compromise to those consumers who are all pitchforky about the 
RCODE. The spec currently calls that special type NXNAME.


https://www.ietf.org/archive/id/draft-huque-dnsop-compact-lies-01.txt

The spec is still a work in progress and the NXNAME type does not 
have a codepoint. I believe TYPE65283 is being used as a 
placeholder. I think Christian made a comment to that effect on 
this list last week, although I think he may not have mentioned the 
specific RRTYPE that was to be used.


If this has caused something to break, more details would be good 
to hear! 

..

Ok, replying to myself.
TYPE65283 is, as you stated, the placeholder for a future NXNAME.
So they silently broke their previous implementation to implement 
half of this draft.
Their previous NXDOMAIN implementation corresponds to the draft's ENT case, 
but they still implement their old way for ENTs.

Thank you for the pointer.


Could you elaborate on the type of breakage you mentioned?

What got broken, specifically?

Client-side decoding of the previous draft's NXDOMAIN status as originally 
encoded by Cloudflare.
Having applications relying on the NXDOMAIN status passed by the API, we 
were forced to add simple minimal-lies decoding in the stub resolver 
(we don't want to disable DNSSEC validation on our trusted resolver or 
apply special treatment to it for these clients).
The decoding is based on exactly the presence of RRSIG and NSEC in the 
NSEC record's type bitmap.
The NS1 extension for restoring simple ENT identification is compatible 
with this scheme, as for an ENT you get RRSIG, NSEC and TYPE65281.


Now I need to explicitly strip (or special-case) TYPE65283 to restore 
NXDOMAIN identification from Cloudflare and still identify NXDOMAIN on 
NS1 and NXDOMAIN-or-ENT on Route53.
If Cloudflare switches to this draft for the ENT case too, it will become 
as bad as Route53 and only NS1 will give a distinguishable real NXDOMAIN.
Or ALL compact-lies response implementers should switch to this new draft 
and be known to have switched.


Thank you, that explains it!

I simply did not expect changes to draft implementations to be called 
"breakage".


--
Petr Špaček
Internet Systems Consortium

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] Cloudflare TYPE65283

2023-03-27 Thread Petr Špaček

On 27. 03. 23 13:31, Emmanuel Fusté wrote:

Le 27/03/2023 à 12:37, Emmanuel Fusté a écrit :

Le 27/03/2023 à 12:14, Joe Abley a écrit :

Hi Emmanuel,

On Mon, Mar 27, 2023 at 10:51, Emmanuel Fusté  
wrote:

Cloudflare started to return TYPE65283 in their NSEC records for "compact
DNSSEC denial of existence"/"minimal lies" NXDOMAIN responses.
It actually breaks established "minimal lies" NXDOMAIN decoding
implementations.
Does anyone know the usage/purpose of TYPE65283 in this context?


If a compact negative response includes an NSEC RR whose type bitmap 
only includes NSEC and RRSIG, the response is indistinguishable from 
the case where the name exists but is an empty non-terminal. Adding a 
special entry in the type bitmap avoids that ambiguity and as a bonus 
provides an NXDOMAINish signal as a kind of compromise to those 
consumers who are all pitchforky about the RCODE. The spec currently 
calls that special type NXNAME.


https://www.ietf.org/archive/id/draft-huque-dnsop-compact-lies-01.txt


The spec is still a work in progress and the NXNAME type does not 
have a codepoint. I believe TYPE65283 is being used as a placeholder. 
I think Christian made a comment to that effect on this list last 
week, although I think he may not have mentioned the 
specific RRTYPE that was to be used.


If this has caused something to break, more details would be good to 
hear!


Yes, I know about the draft to unbreak ENTs. Thank you for the updated 
link to the latest version, which supersedes 
draft-huque-dnsop-blacklies-ent-01.

NS1 uses TYPE65281 for ENT.

But in the observed case, the entry is not an ENT:


; <<>> DiG 9.18.13-1-Debian <<>> +norecurse @ns3.cloudflare.com 
+dnssec albert.ns.cloudflare.com.

; (4 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19880
;; flags: qr aa; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;albert.ns.cloudflare.com.  IN  A

;; AUTHORITY SECTION:
cloudflare.com. 300 IN  SOA ns3.cloudflare.com. 
dns.cloudflare.com. 2304565806 1 2400 604800 300
albert.ns.cloudflare.com. 300 IN    NSEC 
\000.albert.ns.cloudflare.com. RRSIG NSEC TYPE65283
albert.ns.cloudflare.com. 300 IN    RRSIG   NSEC 13 4 300 
20230328112618 20230326092618 34505 cloudflare.com. 
vNF+qAaZUSSreKRLhYHfg5sn7qoP1SV+fZgmivg3qmJecz7Cvp69A/8I 
Ew0XPOuG8CPQGA5doswZdnOk9cfLRw==
cloudflare.com. 300 IN  RRSIG   SOA 13 2 300 
20230328112618 20230326092618 34505 cloudflare.com. 
fD4t5hWnE7js8/gRqJn2G833NCmjcyFqW+WJZnPqHX3SiKBlwUlX2wh8 
UFj0ajbwuTVQpiJxZSb5hUNs9+KErQ==


;; Query time: 8 msec
;; SERVER: 162.159.0.33#53(ns3.cloudflare.com) (UDP)
;; WHEN: Mon Mar 27 12:26:18 CEST 2023
;; MSG SIZE  rcvd: 376

And for ENTs, the response did not change from the previous Cloudflare 
implementation: all Cloudflare-known types are added instead of just RRSIG 
and NSEC.
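
For readers who want to inspect this programmatically, a rough sketch with 
dnspython (the server address, query name and the TYPE65283 placeholder are 
taken from the dig output above and may change as the draft evolves):

import dns.message
import dns.query
import dns.rdatatype

def classify_compact_denial(response):
    # Look at the NSEC type bitmap in the AUTHORITY section, as discussed above.
    for rrset in response.authority:
        if rrset.rdtype != dns.rdatatype.NSEC:
            continue
        types = set(rrset[0].to_text().split()[1:])  # skip the "next name" field
        if "TYPE65283" in types:
            return "NXDOMAIN (NXNAME placeholder present)"
        if types == {"RRSIG", "NSEC"}:
            return "NXDOMAIN or ENT (older, ambiguous encoding)"
        return "name exists (other types present)"
    return "no NSEC in authority section"

query = dns.message.make_query("albert.ns.cloudflare.com.", "A", want_dnssec=True)
reply = dns.query.udp(query, "162.159.0.33", timeout=3)
print(classify_compact_denial(reply))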




Ok, replying to myself.
TYPE65283 is, as you stated, the placeholder for a future NXNAME.
So they silently broke their previous implementation to implement half 
of this draft.
Their previous NXDOMAIN implementation corresponds to the draft's ENT case, 
but they still implement their old way for ENTs.

Thank you for the pointer.


Could you elaborate on the type of breakage you mentioned?

What got broken, specifically?

--
Petr Špaček
Internet Systems Consortium

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] DNS measurement traffic etiquette

2023-01-02 Thread Petr Špaček

On 01. 01. 23 20:22, Olafur Gudmundsson wrote:


Andreas,
Do not bother to reach out to anyone; these are unmanaged automated systems.
I once ran an experiment where query names were unique (i.e. only used once and 
derived from the IP address the query was sent to)
I was still receiving “repeat queries” a year later.
The queries came from “cloud compute” instances that had nothing to do with the 
original query.
Some of the queries came to the address that "sent" the query, but others 
followed the delegation information for the domain.

The interesting fact was how periodic those queries were ==> this was generated 
by cron jobs by someone doing something DNS related …


+1 to what Olafur said.

It might very well be *me* doing automated PCAP replays in AWS, or 
anyone else doing DNS research, or some sort of QA on DNS software. And 
of course, malware.


I guess the blog post
https://blog.apnic.net/2016/04/04/dns-zombies/
might give you some insight - at least you are not alone :-)

Petr Špaček
Internet Systems Consortium




Olafur





On Dec 21, 2022, at 9:27 PM, Andreas Ott  wrote:

About two months ago we retired a network lab at my work by disconnecting it 
from the internet, and at the time I (naively) removed from the lab domain name 
all forward DNS records pointing to assets that no longer exist. When it was 
still live we had forward DNS and reverse PTR records, and in most cases these 
matched, further, you were most likely to get back consistent answers on 
forward lookup of the reverse answer. About a week after the closure I also had 
the reverse DNS records removed from the ISP servers that were authoritative 
for the in-addr.arpa zones. All caching timeouts would have long occurred by 
now if an entity would honor what had been in the SOA records. If I query any 
old records today they do return NXDOMAIN for me.

I did move the authoritative DNS servers to a much smaller setup, thinking that with the 
retirement of the assets there would be less traffic asking for them. However, I am still 
seeing significant traffic querying forward records of PTR answers that got deleted a long 
time ago. It appears that this is "measurement" traffic that ignores getting "no" aka 
NXDOMAIN as an answer, and keeps sending the same queries over and over. I identified one 
"DNS labs" entity by name as one of the sources of these queries and will attempt to 
contact them. Most of the other now-useless queries come from anonymous cloud compute 
based sources, like AWS nodes, which have generic reverse DNS entries and don't allow 
identifying the responsible party. To me it looks like the case of something being removed 
from the internet for good is not accounted for when constructing these measurement 
operations: if you get NXDOMAIN, you interpret it as some kind of brokenness that should 
be fixed soon, so you keep asking thousands more times until you get an answer?

What are my best options to find out who is behind all this traffic when it 
comes from anonymous sources?

For how long should I expect this query traffic to continue?

Or is there a way to politely signal to the queriers, via some DNS parameter, that the 
record is now gone for good and they can stop asking, rather than that something is 
broken and will be fixed soon?

Thanks, andreas



___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] Trouble with qa.ws.igt.fiscal.treasury.gov

2022-10-19 Thread Petr Špaček

On 18. 10. 22 17:58, Viktor Dukhovni wrote:

By the way is the validation workflow used in BIND written up somewhere
as a separate document, or are the comments in the code the best way to
understand how BIND validates names below a trust anchor (finding either
a valid signature or an insecure delegation).


Code is your guide :-)

Now seriously: I don't think documenting it is either
a) necessary, or
b) a good idea.

It can change between versions, and we certainly do not want people to 
depend on a particular behavior. We want people to follow the protocol!


--
Petr Špaček
Internet Systems Consortium

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] Input from dns-operations on NCAP proposal

2022-06-01 Thread Petr Špaček

On 24. 05. 22 17:54, Vladimír Čunát via dns-operations wrote:


On 23/05/2022 15.48, Thomas, Matthew via dns-operations wrote:


Configuration 1: Generate a synthetic NXDOMAIN response to all queries 
with no SOA provided in the authority section.


I believe the protocol says not to cache such answers at all. Some 
implementations chose to cache at least a few seconds, but I don't think 
all of them.  Breaking caching seems risky to me, as traffic could 
increase very much (if the TLD was queried a lot).



Configuration 2: Generate a synthetic NXDOMAIN response to all queries 
with a SOA record.  Some example queries for the TLD .foo are below:


It still feels a bit risky to answer in this non-conforming way, and I 
can't really see why one would attempt that.  At the apex, the NXDOMAIN would 
deny the SOA included in the very same answer...



Configuration 3: Use a properly configured empty zone with correct NS 
and SOA records. Queries for the single label TLD would return a 
NOERROR and NODATA response.


I expect that's OK, especially if it's a TLD that's seriously 
considered.  I'd hope that "bad" usage is mainly sensitive to existence 
of records of other types like A.


Generally I agree with Vladimir, Configuration 3 is the way to go.

Non-compliant responses are riskier than protocol-compliant responses, 
and option 3 is the only compliant variant in your proposal.


Reasoning: Behavior for a non-compliant answer is basically undefined, 
because most RFCs do not describe what to do when a MUST condition is 
violated. It's hard to see how further evaluation of undefined behavior 
would help with determining a further course of action.


--
Petr Špaček  @  Internet Systems Consortium

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] You live in a dump, Quoyle!

2022-02-15 Thread Petr Špaček

On 14. 02. 22 19:31, Viktor Dukhovni wrote:

On Mon, Feb 14, 2022 at 09:48:09AM -0800, Fred Morris wrote:


They're full (the DNS is full) of patterns and antipatterns. One fractal
rabbit hole example: [0]

[0] The DNS protocol allows multiple rvalues per type per oname. This
works ok for e.g. A/AAAA, is disallowed for CNAME, and is... I'm not sure
what it is for PTR records.


Multiple PTR records are legal, but not a best (or even sound) practice.


If an app is using hostnames in ACLs, it means you need to list them
all.


SMTP servers in some cases require clients to have FCrDNS
(forward-canonicalised reverse DNS) names.  This requires
the DNS to return:

 client IP -> pick a PTR -> A/AAAA RRSet including same IP

this works even in the presence of multiple PTRs, provided they all
resolve to address lists that contain the input address.

Things tend to work poorly when automation adds a PTR record for
every forward "name -> IP" mapping with a given address.  One
then sometimes ends up with absurdly large PTR RRsets that
consume tens of KB in a TCP fallback after TC=1.

Best practice is to choose just one "primary" name as the PTR
for a given IP.


Things tend to work poorly in other cases, too.

My favorite is:
$ dig -x 66.172.247.9
and associated
$ dig cmts1-dhcp.longlines.com

--
Petr Špaček

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] Command Line BIND Query to Delegating Name Servers Gives FORMERR - Is this Bad for Normal DNS Operations by Public Resolvers?

2021-09-20 Thread Petr Špaček

Hi Jason,

I think the blog post you referenced already answers this:
https://kevinlocke.name/bits/2017/01/20/formerr-from-microsoft-dns-server-for-dig/

>  This behavior appears to violate “Any OPTION-CODE values not 
understood by a responder or requestor MUST be ignored.” from Section 
6.1.2 of RFC 6891, but that is of small consolation for a non-working 
system.


So yes, the authoritative server most likely has a bug.
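
If you need to show the operator the difference, two dig invocations usually 
make the point (server and zone names below are placeholders, since the 
affected servers were not named on the list):

$ dig @ns1.example.net example.net SOA            # default dig adds an EDNS COOKIE option -> FORMERR
$ dig +nocookie @ns1.example.net example.net SOA  # same query without the COOKIE option -> normal answer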

How to approach the operator in question - that's a hard problem. You 
can either try various contacts you find, or you can post the name of the 
domain here and ask them to contact you off-list. For TLDs this method 
can work surprisingly well :-)


Good luck.
Petr Špaček

On 20. 09. 21 15:37, Jason Hynds wrote:

Hi,

I hope that the following conforms to the content expected of this list.


I stumbled on some /name servers/ (a branch of a ccTLD, performing a 
public good service, as far as I know) which are giving a FORMat ERRor 
(FORMERR) to default /dig/ queries from the command line as described in 
the referenced webpage, see [1] below. The workaround of +nocookie 
described in the blog allows for a successful query response. /Nslookup/ 
queries work fine.



I should mention that I have no administrative authority over the name 
servers showing this condition. I just noticed the behaviour whilst 
checking on a DNS hosting migration for a client of the name servers 
exhibiting the behaviour.



Would someone be able to advise me on:

 1. How bad it may be for an authoritative or delegating name server to
be exhibiting this behaviour?
 2. Does this potentially cause a resolution outage, or would a BIND
server adjust and re-query in order to obtain a usable result?
 3. Is the BIND server non-compliant, or the likely Microsoft DNS
non-compliant, to an RFC?
 4. How would I explain such an issue to a name server operator who I do
not know?


I appreciate any guidance provided. I apologise in advance if I violated 
any list policy. Thanks for any assistance.



*REFERENCE*

[1] FORMERR from Microsoft DNS Server for DIG. Posted January 20,
2017 at 11:18 PM MST by Kevin Locke

<https://kevinlocke.name/bits/2017/01/20/formerr-from-microsoft-dns-server-for-dig>.


Regards,


Jason Hynds.


___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] Incomplete type bitmaps in NSEC(3) records and aggressive use of DNSSEC validated cache

2021-09-08 Thread Petr Špaček

On 08. 09. 21 11:12, Ruben van Staveren via dns-operations wrote:
Last month or so I saw two domains, postnl.nl and 
minjenv.nl, return incomplete NSEC3 records where 
existing records were omitted from the Type Bit Maps.


This caused strange intermittent failures when a resolver was used that 
implements aggressive use of DNSSEC validated cache (RFC8198, 4 years 
old), e.g powerdns recursor 4.5.x.



e.g., minjenv.nl has an MX record, but it is not listed in the NSEC3 
you’ll get if you query for the non-existent A/AAAA record (only NS SOA 
RRSIG DNSKEY NSEC3PARAM), causing mail delivery failures until the TTL 
expires. postnl.nl has A/AAAA, but the NSEC3 seen for 
a nonexistent query only has NS SOA MX TXT RRSIG DNSKEY NSEC3PARAM.


The question is not so much about contacting the DNS operators and persuading 
them to upgrade/fix the software they use for DNSSEC signing, but rather: 
should we do more analysis of this phenomenon, and perhaps even have a DNS 
flag day, before even more resolvers and operators implement RFC 8198? 
Someone could deliberately exploit this to make websites/mail unreachable.


Your estimate is correct, it's an old issue with F5 load balancers:
https://support.f5.com/csp/article/K00724442
It's a security issue and affected parties should patch their systems.

Detailed description of the problem can be found e.g. here:
https://en.blog.nic.cz/2019/07/10/error-in-dnssec-implementation-on-f5-big-ip-load-balancers/


--
Petr Špaček

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] Injection Attacks Reloaded: Tunnelling Malicious Payloads over DNS

2021-08-31 Thread Petr Špaček

On 30. 08. 21 18:01, Vladimír Čunát wrote:

On 30/08/2021 17.02, Petr Špaček wrote:
[...] It is clear to this group of DNS experts, but I think we should 
lend a helping hand to DNS consumers and at least explain why 
consumers have to check everything.


Is anyone interested in writing a short RFC on this topic? 


That might serve as a good reference when some DNS expert points out to 
others why they shouldn't be doing what they're doing. However, I don't 
think we can expect a new RFC (by itself) to reduce these cases: *if* 
they were reading DNS RFCs, they would've surely realized that they need 
to be more careful.


Only if people were reading all of the DNS RFCs, but that's IMHO an 
unreasonable expectation for DNS data _consumers_ who do not (and should 
not) care about the inner workings of DNS.


The vast majority of DNS RFCs do not talk about data consumers, and the 
set of consumers is, I guess, almost disjoint with a set of DNS software 
vendors and server operators who are, I think, the primary target of the 
existing RFCs.


I would have a hard time if I wanted to send a link to relevant docs to 
an application developer who wants to use DNS data provided by a 
resolver library today. Most likely, I would require a bunch of links to 
several documents, with a custom commentary to explain which parts in 
what order to read.


For this reason, I think it would be good to have a document explicitly 
focused on consumers of DNS data. I think it should answer questions like:


- What's reasonable input to the resolver library? (E.g., an attacker 
might trick your code into calling the library with an attacker-provided 
input, etc.)
- What should you do with resolver library output? (Beware: it's binary, 
check syntax, it might be from the attacker's server, etc.)


--
Petr Špaček

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] Injection Attacks Reloaded: Tunnelling Malicious Payloads over DNS

2021-08-30 Thread Petr Špaček

On 17. 08. 21 22:17, Tony Finch wrote:

Viktor Dukhovni  wrote:


If applications make unwarranted assumptions about the syntax of
DNS replies, that's surely an application bug, rather than an issue
in DNS.


I particularly liked this paper because it's a really good example of a
common cause of security problems: when it isn't clear whose
responsibility it is to enforce an important restriction, in this case,
hostname syntax vs. DNS name (lack of) syntax. And different implementers
have made different choices, for instance whether the libc stub resolver
enforces hostname syntax or not.

And another classic vulnerability generator: standard APIs that make it
easy for non-specialists to step on every rake in the grass. In this case,
if an application needs something more fancy than getaddrinfo(), it has to
contend with the low-level resolver API which is just about better than
nothing for parsing DNS packets, but certainly won't help you handle names
that ought to have restricted syntax (service names, mail domains, etc...)

So I don't think the problems can be dismissed as simply application bugs:
the problems come from mismatches in expectations at the boundary between
the DNS and the applications. And the DNS is notorious (the subject of
memes!) for being far too difficult to use correctly.


I'm late to this thread, but ...

IMHO authors of the paper highlight a valid point:

There is no _explicit_ guidance for consumers of DNS data which explains 
that the results of the DNS resolution process must be treated very carefully. 
It is clear to this group of DNS experts, but I think we should lend a 
helping hand to DNS consumers and at least explain why consumers have to 
check everything.


Is anyone interested in writing a short RFC on this topic?

--
Petr Špaček

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] why does that domain resolve?

2021-06-10 Thread Petr Špaček

On 04. 06. 21 18:56, Paul Vixie wrote:

On Fri, Jun 04, 2021 at 12:22:10PM -0400, Anthony Lieuallen via dns-operations 
wrote:

This is a question of being parent- vs. child- centric.  The parents in the
DNS tree delegate correctly.  The fact that the children delegate
incorrectly can be a small or non-issue depending on resolver.


those NS RRs are authoritative at the apex of the child, but not at the leaf of
the parent. this means they have higher credibility, and also that they can be
DNSSEC signed and validated. credibility and validity _matter_.


Google Public DNS uses only parent delegations (
https://developers.devsite.corp.google.com/speed/public-dns/docs/troubleshooting/domains#delegation
).  Largely for issues like this: the child delegations can be wrong, but
for the domain to work at all, the parent delegations must be correct.


without broad and deep failure, the quality of apex NS names will never improve.


(Resolvers that choose to use child delegations will likely in this case
discover that these delegations are bogus, and be left with only the valid
delegations, from the parent.)


at which point they should return SERVFAIL. failure _matters_.



Personally, with all the experience we have in 2021, I find the historic 
decision to put authoritative NS RRs on the child side to be a poor 
choice, to the point of being indefensible.


As Anthony points out, the parent version of NS has to work anyway. It 
forces me to think a better course of action would be ignoring 
child-side NS instead of adding complex asynchronous code paths to 
validate child NS, which is not technically needed.


I mean - why waste resources on improving something which is not even 
needed?


(To be clear: This is my personal opinion, and I'm sure some of my 
colleagues at ISC will disagree violently.)


--
Petr Špaček

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] [Ext] Possibly-incorrect NSEC responses from many RSOs

2021-03-03 Thread Petr Špaček

On 03. 03. 21 7:35, Viktor Dukhovni wrote:

On Wed, Mar 03, 2021 at 06:04:45AM +, Paul Vixie wrote:


A laudable goal, but exposing RRSIG as a bare RRset one can query does
not look like a viable path forward.  So I don't see this happening.

You described several cases in which rrsigs wouldn't be stable enough.
in my own role as signer, the rrsigs are refreshed by cron on sundays,
and so I think we're both looking at anecdotes here, worst or best case
scenarios, and what you don't see happening isn't totally compelling.

Another basic issue with RRSIG queries, already mentioned by Brian Dickson,
is that there's no way to ask for the RRSIG of a specific RRset; one can
(at present) only ask for all (or any subset) of the RRSIGs associated
with a given name, and returning them all (at least over UDP) is often
not a good idea.

So, as noted by Tony Finch, the DNSSEC-oblivious iterative resolver may
(as already recommended) get back from its authoritative upstream only a
random representative record from the authoritative upstream (just as
with ANY queries), which is again often not the RRSIG you're looking
for.


For the record, "respond with a randomly selected RRSIG" is implemented 
in Knot DNS 3.0.0, released in September 2020 [1]. Apparently the sky did 
not fall.


[1] https://www.knot-dns.cz/2020-09-09-version-300.html

--
Petr Špaček  @  ISC

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] Quad9 DNSSEC Validation?

2021-03-01 Thread Petr Špaček

On 28. 02. 21 9:39, Florian Weimer wrote:

* Winfried Angele:


I guess they've turned off validation for irs.gov because of a
former failure.


I think it goes beyond that.  It extends to GOV and MIL as a whole, it
seems.


In my experience, negative trust anchors for big parts of MIL and/or GOV 
are way more common; let's not pick specifically on Quad9. For periods 
of time I have seen them at other big resolver operators as well.


IMHO resolver market economics are going against DNSSEC security. If 
resolution does not work on one operator, people routinely switch to 
another where it "works", either because they do not validate at all, or 
because their ops team already added a negative trust anchor.


The only way to fix this is mutual agreement among operators to stop 
working around someone else's mistakes.


Are there operators willing to participate in such an effort?

--
Petr Špaček  @  ISC

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] CLI Tool for DoH

2020-09-29 Thread Petr Špaček
On 29. 09. 20 3:30, cjc+dns-o...@pumpky.net wrote:
> Looking for a command line tool to do testing of DoH. Something like
> dig or drill with DoH support. I suspect there's a Python tool or
> the like out there somewhere, but my google-fu is failing.
> 
> Don't want to re-invent the wheel if I don't have to.

Knot DNS 3.0 has DoH support in kdig:

Examples for various DoH server implementations:
$ kdig @1.1.1.1 +https example.com.
$ kdig @193.17.47.1 +https=/doh example.com.
$ kdig @8.8.4.4 +https +https-get example.com.

Version 3.0 was released a couple of weeks ago and might not be in Linux 
distributions yet. Packages for common distributions, as well as source code, are 
available from https://www.knot-dns.cz/download/
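
If you would rather script it than use kdig, a rough sketch of an RFC 8484 GET 
with dnspython and requests (the endpoint is just an example; recent dnspython 
also ships dns.query.https(), which does the same thing):

import base64
import dns.message
import requests

query = dns.message.make_query("example.com.", "A")
dns_param = base64.urlsafe_b64encode(query.to_wire()).rstrip(b"=").decode()
reply = requests.get("https://cloudflare-dns.com/dns-query",
                     params={"dns": dns_param},
                     headers={"accept": "application/dns-message"},
                     timeout=5)
print(dns.message.from_wire(reply.content))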

-- 
Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] DNS Flag Day 2020 will become effective on 2020-10-01

2020-09-16 Thread Petr Špaček
On 15. 09. 20 13:16, Yasuhiro Orange Morishita / 森下泰宏 wrote:
> Petr-san,
> 
> Thank you for your clarification :-).
> But I have another question.
> 
> In my understanding, the official spelling of the day is "DNS flag
> day".  In the 2019 webpage, all of the spellings is lowercase.
> 
> But the spellings are not unified in the 2020 webpage.
> The ... and top of the official webpage's spellings
> are lowercase, but the content includes the capitalized one.
> 
> This may be a trivial question, but is important for providing the
> information to the related parties, I think.
> It would be helpful if you could clarify it.

It would be great if someone with a strong opinion or expertise in English could 
create a merge request to unify it:
https://github.com/dns-violations/dnsflagday

I'm not a native speaker so I would have to flip a coin to decide :-)

Petr Špaček  @  CZ.NIC


> 
> -- Orange
> 
> From: Petr Špaček 
> Subject: Re: [dns-operations] DNS Flag Day 2020 will become effective on 
> 2020-10-01
> Date: Wed, 9 Sep 2020 10:39:54 +0200
> 
>> Hi Orange-san,
>>
>> On 09. 09. 20 7:00, Yasuhiro Orange Morishita / 森下泰宏 wrote:
>>> Hi Petr-san,
>>>
>>> I tested some auth servers and resolvers by online checker in the
>>> official website.
>>>
>>> But I feel that both of them display "GO" even if EDNS buffer size is
>>> not set to 1232.  Is this by design?
>>
>> This is fine as long as all the authoritative servers work over DNS-over-TCP 
>> and respect EDNS buffer size sent by resolvers.
>>
>> The reason is that the effective EDNS buffer size is the minimal value from 
>> (client, server) pair. Consequently, once resolvers update their defaults, 
>> the change will become effective without any changes on the auth side.
>>
>> Lower EDNS buffer size might force fallback to TCP if auths are sending 
>> longer answers - that's why the web tester is checking availability of 
>> DNS-over-TCP.
>>
>> I hope it helps.
>>
>> If you can point to a section on https://dnsflagday.net/2020/ which should 
>> contain this answer I will be happy to add it there.
>>
>> Have a nice day!
>> Petr Špaček  @  CZ.NIC
>>
>>
>>
>>>
>>> -- Orange
>>>
>>> From: Petr Špaček 
>>> Subject: [dns-operations] DNS Flag Day 2020 will become effective on 
>>> 2020-10-01
>>> Date: Tue, 8 Sep 2020 12:04:39 +0200
>>>
>>>> Dear DNS people.
>>>>
>>>> We are happy to announce next step for DNS Flag Day 2020.
>>>>
>>>> Latest measurements indicate that practical breakage caused by the 
>>>> proposed change is tiny [1]. In other words we can conclude that the 
>>>> Internet is ready for the change.
>>>>
>>>> The long delayed DNS Flag Day will become effective on 2020-10-01 (October 
>>>> 1st 2020)!
>>>>
>>>> Detailed information including test tools and technical description of the 
>>>> change can be found at https://dnsflagday.net/2020/ .
>>>>
>>>> For questions please use dns-operations@lists.dns-oarc.net mailing list.
>>>>
>>>> [1] 
>>>> https://github.com/dns-violations/dnsflagday/issues/139#issuecomment-673489183
>>>>
>>>> -- 
>>>> Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] QTYPEs 65 and 65479

2020-09-16 Thread Petr Špaček
On 16. 09. 20 10:04, Greg Choules via dns-operations wrote:
> Recently, whilst looking for something else, tcpdump on one of our recursive servers showed we 
> are receiving queries with (from its point of view) unrecognised types. 
> Wireshark doesn't have a decode for them yet either. There aren't many, yet. 
> But it's more than just noise.
> A quick reverse lookup on the sources shows them all to be iPhone X or later.
> 
> Can anyone shed some light on what these are and whether we should be doing 
> something about them?

QTYPE 65 is the new HTTPS (Service Binding) RR type. 
(https://www.iana.org/go/draft-ietf-dnsop-svcb-https-00)

65479 is in the private-use range, so it is hard to tell what it is. (See 
https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml)
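
For illustration only (this helper is mine, not part of the original reply), a few 
lines of Python that label a QTYPE number, treating 65280-65534 as the RFC 6895 
private-use range; the table of known types is deliberately tiny:

KNOWN_QTYPES = {1: "A", 2: "NS", 28: "AAAA", 64: "SVCB", 65: "HTTPS"}

def describe_qtype(qtype):
    # 0xFF00-0xFFFE (65280-65534) is reserved for private use (RFC 6895)
    if 65280 <= qtype <= 65534:
        return "TYPE%d (private use, meaning known only to the sender)" % qtype
    return KNOWN_QTYPES.get(qtype, "TYPE%d (see the IANA dns-parameters registry)" % qtype)

for q in (65, 65479):
    print(q, "->", describe_qtype(q))   # 65 -> HTTPS, 65479 -> private use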

-- 
Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] DNS Flag Day 2020 will become effective on 2020-10-01

2020-09-11 Thread Petr Špaček
On 11. 09. 20 6:47, Paul Vixie wrote:
> Ondřej Surý wrote on 2020-09-10 21:25:
>> Paul,
>>
>> do you actually believe that shouting the same thing over and over will 
>> achieve anything?
> 
> no, of course not.
> 
>>
>> We’ve heard you before, we’ve listened to you, we’ve considered your 
>> arguments, and you haven’t convinced us and there’s a consensus between the 
>> vendors to go ahead with the change because it’s beneficial for the DNS 
>> ecosystem.
> 
> i think you changed the definition of the words "we" and "us" midsentence.

My non-native-speaker reading suggests it is in both cases referring to 
"consensus between the vendors". Let's not get distracted.

> 
>>
>> Sending multiple shouts to mailing lists, issue tracker, etc... because you 
>> have different opinion is not helpful to the DNS community nor to the cause. 
>> We are as much DNS experts as you are.
> 
> i don't think all of the people i intend to address here have heard my views. 
> they may think that dns-oarc speaks for the community rather than for a small 
> self selected team. they may also think that i as co-founder of dns-oarc can 
> be relied upon to support this activity. so, thank you for your concern for 
> my reputation (or my sanity, if that's also true), but i'll continue. if you 
> wish to actually respond to any of my claims, i am listening. if you wish to 
> continue to ignore those claims, i will cope.
> 
> this isn't a flag day and shouldn't be called that. it cheapens the term.
> 
> 1232 is a cargo-cult number. we must not revere as holy those things which 
> fall out of the sky.

I disagree. That number is based on the real-world experience of today's DNS 
resolver vendors - on their experience with the un/reliability of real 
configurations.

Later on, research 
https://indico.dns-oarc.net/event/36/contributions/776/attachments/754/1277/DefragDNS-Axel_Koolhaas-Tjeerd_Slokker.pdf
showed that the estimate based on the vendors' experience was pretty good.

> 
> there is a right way to deprecate fragmentation. it would not involve adding 
> config complexity.

Well, this is not adding any complexity at all! It is the other way around:

a] All the configuration knobs for EDNS buffer size were already in the 
software, vendors are _just changing default values_ in their own software (as 
opposed to adding new options).

b] This effort actually enables vendors to remove code/fallback logic which 
attempts to guess a working EDNS buffer size, thus _reducing_ complexity of the 
DNS software and real-world operations.

> 
> there is a right way to reach consensus. it's an RFC draft, not a github repo 
> for the initiated.
> 
> in the testing referenced by the "flagday2020" web page, there was no 
> significant difference in loss between 1200 and 1400. there will be a 
> significant difference in truncation and tcp retry.

I think readers on this list can draw conclusions for themselves, there is no 
need to hand-wave. Slides are here:
https://indico.dns-oarc.net/event/36/contributions/776/attachments/754/1277/DefragDNS-Axel_Koolhaas-Tjeerd_Slokker.pdf

Do not forget to add the ethernet + IP + UDP headers to the EDNS buffer size when 
comparing numbers from the slides, i.e. 14 + 20 + 8 + 1232 = MTU 1274 B vs. MTU 
1442 B. See slides 19 and 20 (= recursive resolvers) and do the math.
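
(A trivial Python helper of my own, not from the slides, just to make the header 
arithmetic explicit; it assumes plain IPv4 over Ethernet with no options or tunnels:)

ETHERNET, IPV4, UDP = 14, 20, 8   # header sizes in bytes

def on_wire_size(edns_bufsize):
    # largest possible frame for a UDP DNS response limited by the EDNS buffer size
    return ETHERNET + IPV4 + UDP + edns_bufsize

print(on_wire_size(1232))   # 1274
print(on_wire_size(1400))   # 1442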

Is lowering the failure rate by roughly 0.8 % for IPv4 and by 0.33 % for IPv6 
significant or not?
That's a matter for each DNS vendor to decide, because in the end it is the 
vendors who have to support the software and deal with all the obscure failure 
reports.

-- 
Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] DNS Flag Day 2020 will become effective on 2020-10-01

2020-09-09 Thread Petr Špaček
Hi Orange-san,

On 09. 09. 20 7:00, Yasuhiro Orange Morishita / 森下泰宏 wrote:
> Hi Petr-san,
> 
> I tested some auth servers and resolvers by online checker in the
> official website.
> 
> But I feel that both of them display "GO" even if EDNS buffer size is
> not set to 1232.  Is this by design?

This is fine as long as all the authoritative servers work over DNS-over-TCP 
and respect the EDNS buffer size sent by resolvers.

The reason is that the effective EDNS buffer size is the minimum of the 
(client, server) pair. Consequently, once resolvers update their defaults, the 
change will become effective without any changes on the auth side.

A lower EDNS buffer size might force a fallback to TCP if auths are sending longer 
answers - that's why the web tester is checking availability of DNS-over-TCP.
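
A toy illustration of the rule above (my own sketch, not part of the flag day 
material; the sizes are made-up examples):

def udp_size_limit(client_bufsize, server_max):
    # the effective limit for a UDP response is the smaller of the two values
    return min(client_bufsize, server_max)

def delivery(response_size, client_bufsize, server_max):
    if response_size <= udp_size_limit(client_bufsize, server_max):
        return "whole answer fits into UDP"
    return "truncated answer (TC=1), client retries over TCP"

# resolver already lowered its default to 1232, auth still advertises 4096:
print(delivery(1400, client_bufsize=1232, server_max=4096))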

I hope it helps.

If you can point to a section on https://dnsflagday.net/2020/ which should 
contain this answer I will be happy to add it there.

Have a nice day!
Petr Špaček  @  CZ.NIC



> 
> -- Orange
> 
> From: Petr Špaček 
> Subject: [dns-operations] DNS Flag Day 2020 will become effective on 
> 2020-10-01
> Date: Tue, 8 Sep 2020 12:04:39 +0200
> 
>> Dear DNS people.
>>
>> We are happy to announce next step for DNS Flag Day 2020.
>>
>> Latest measurements indicate that practical breakage caused by the proposed 
>> change is tiny [1]. In other words we can conclude that the Internet is 
>> ready for the change.
>>
>> The long delayed DNS Flag Day will become effective on 2020-10-01 (October 
>> 1st 2020)!
>>
>> Detailed information including test tools and technical description of the 
>> change can be found at https://dnsflagday.net/2020/ .
>>
>> For questions please use dns-operations@lists.dns-oarc.net mailing list.
>>
>> [1] 
>> https://github.com/dns-violations/dnsflagday/issues/139#issuecomment-673489183
>>
>> -- 
>> Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


[dns-operations] DNS Flag Day 2020 will become effective on 2020-10-01

2020-09-08 Thread Petr Špaček
Dear DNS people.

We are happy to announce next step for DNS Flag Day 2020.

Latest measurements indicate that practical breakage caused by the proposed 
change is tiny [1]. In other words we can conclude that the Internet is ready 
for the change.

The long delayed DNS Flag Day will become effective on 2020-10-01 (October 1st 
2020)!

Detailed information including test tools and technical description of the 
change can be found at https://dnsflagday.net/2020/ .

For questions please use dns-operations@lists.dns-oarc.net mailing list.

[1] 
https://github.com/dns-violations/dnsflagday/issues/139#issuecomment-673489183

-- 
Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] FlagDay 2020 UDP Size

2020-08-05 Thread Petr Špaček
On 04. 08. 20 18:26, Viktor Dukhovni wrote:
> On Mon, Aug 03, 2020 at 09:44:17PM +0100, Tony Finch wrote:
> 
>> jack tavares  wrote:
>>>
>>> I have gone through the archives, is there consensus on this at this time?
>>> For both the date of Flag Day (Which appears to be 1st October 2020,
>>> pending confirmation from google)
>>> and for the suggested default?
>>
>> There are some interesting measurements in
>> https://rp.delaat.net/2019-2020/p78/report.pdf
> 
> What I haven't seen reported is measurements of problems that occur when
> the EDNS(0) UDP buffer size is *too small*.
> 
> There are lots of measurements with lost UDP datagrams when the buffer
> size is too large, but given a "too small" buffer size servers truncate
> responses, and some don't also support TCP.  This causes lookup failures
> when the buffer size is sufficiently low.

You are right, and that's exactly the reason why the https://dnsflagday.net/2020/ web 
test tool focuses on TCP availability.

It is way easier to test whether "TCP works for all auths for a given domain" than 
to test whether "IP fragments can traverse all relevant paths over the Internet for 
all relevant answer sizes". The second option is simply infeasible/madness.

Once we get TCP working we do not need to worry that a too-small EDNS buffer will 
break something; it might only make things less efficient...

Of course, once proponents of perfect-EDNS-buffer-size-detection methods 
implement them in a resilient and scalable way, we can move on to these (at 
the moment hypothetical) better methods and get rid of this slight 
inefficiency.

-- 
Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] RFC 6975 (was: Re: Algorithm 5 and 7 trends (please move to 8 or 13))

2020-06-10 Thread Petr Špaček


On 09. 06. 20 7:16, Brian Somers wrote:
> We turned this up again on Friday and turned it down yet again today.  There
> are issues with sacks.com and I’m told there are a bunch of other support
> tickets (although details haven’t been given yet).
> 
> $ dig +noall +answer +tries=1 +ednsopt=5:08 zacks.com @208.65.116.45  
> 
> ;; connection timed out; no servers could be reached
> $ dig +noall +answer +tries=1 +subnet=1.2.3.0/24 zacks.com @208.65.116.45
> ;; connection timed out; no servers could be reached
> $ dig +noall +answer +tries=1 zacks.com @208.65.116.45
> zacks.com.  1200IN  A   208.65.116.3
> 
> I’m now looking at re-implementing the code we had in place for EDNS
> probing prior to flag day 2019:
> - FORMERR/SERVFAIL/NOTIMP - try without any EDNS codes
> - No response - try with no EDNS codes on the third attempt

Please don't do that, it would further cement the DNS protocol in 1998.

If you really really _really_ need a simple "workaround", send RFC 6975 signals 
only to the root servers. That should be enough to provide researchers with data on 
how RFC 6975 + algorithm deployment is going, without breaking weird auths.

See below for long-term proposal.


> 
> Still trying to think of a way to make this negatively affect the domain that
> misbehaves without negatively affecting our support folks :(
> 
> Any tips around this would be helpful (any resolvers do ECS probing for
> example?).

I think we (= resolver vendors) should coordinate first. There is no rush for 
RFC 6975 deployment so we can plan and act together to finally get the DNS protocol 
into the 21st century.

For example, a coordinated RFC 6975 deployment on major resolvers could push 
auths to action. DNS Flag Day 2019 had a major impact and the DAU/DHU/N3U options are 
an opportunity to clear up the rest.

If needed we can modify https://gitlab.labs.nic.cz/knot/edns-zone-scanner/ to 
also test the DAU/DHU/N3U options and compile a list of shame; unfortunately that is 
the only thing which seems to work.
(I'm happy to help with that but I'm sick at the moment, let's talk later...)

Petr Špaček  @  CZ.NIC



> 
> —
> Brian
> 
>> On Jun 3, 2020, at 1:52 AM, Petr Špaček  wrote:
>>
>> On 03. 06. 20 7:18, Brian Somers wrote:
>>> On May 28, 2020, at 10:35 PM, Viktor Dukhovni  
>>> wrote:
>>>>
>>>> Enough time has passed since the need to abandon SHA-1 has become
>>>> more pressing to discern at least a couple short-term trend-lines.
>>>
>>> Along these lines, have any of the large resolvers implemented
>>> RFC 6975 (DAU/DHU/N3U EDNS codes)?  OpenDNS/Cisco
>>> enabled these a couple of weeks ago but had to disable them
>>> pending qq.com being fixed (its nameservers returned
>>> SERVFAIL).  Now that the fix is there, we’re planning to turn
>>> it up again at the end of the week.
>>>
>>> Just curious about its adoption… it feels like we testing new
>>> waters here.
>>
>> I believe you are the first, congrats! :-)
>>
>> It was not feasible to implement before https://dnsflagday.net/2019/ and 
>> then, you know, nobody asked for it ...
>>
>> Please report other issues you eventually encounter, I would bet there will 
>> be couple more lurking somewhere.
>>
>> -- 
>> Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] EDNS client-subnet best practice?

2020-06-03 Thread Petr Špaček
On 03. 06. 20 14:44, Chris Adams wrote:
> What is considered current best practice for recursive servers on
> enabling EDNS client-subnet?
> 
> I ask because I have a couple of recursive DNS servers at an independent
> telephone company that are getting different answers for a certain large
> website.  The servers are in the same subnet, but one gets an IP
> apparently in another country, while the other gets an IP in a nearby
> state.  The servers are configured identically (CentOS 7 with Unbound).
> 
> I emailed the website's NOC, and their response was that the issue was
> that "Most likely the issue is due to EDNS not being turned on with your
> DNS server."  I assume they were talking about EDNS client-subnet
> (because they then gave an example dig with +subnet set).
> 
> These servers are not configured to send client-subnet to anybody
> (pretty much default Unbound config).  They aren't serving clients from
> outside the AS - I generally think of client-subnet as something you'd
> use on a DNS server with a wide range of clients.  Is it expected that I
> should be enabling EDNS client-subnet on recursive servers?
> 
> I do have some recursive servers that have a large set of clients (where
> client-subnet might be useful) - should I just enable it for all
> requests?  In Unbound terms, enable "client-subnet-always-forward"?

In my view ECS is only useful if routing paths between:
a) resolver & Internet 
b) client sending query to resolver & Internet
are different.

Netmasks in Unbound's max-client-subnet-ipv4/6 would ideally be as short as 
possible, to cover just the prefix that causes the routing to differ and 
nothing more.
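
To illustrate what such a limit means in practice (my own sketch, not Unbound 
code): before forwarding, the resolver truncates the client address to at most the 
configured prefix length, e.g. /24 for IPv4 or /56 for IPv6:

import ipaddress

def ecs_prefix(client_ip, max_prefix_len=24):
    # keep only the first max_prefix_len bits of the client address
    net = ipaddress.ip_network("%s/%d" % (client_ip, max_prefix_len), strict=False)
    return str(net)

print(ecs_prefix("192.0.2.57"))        # 192.0.2.0/24
print(ecs_prefix("2001:db8::1", 56))   # 2001:db8::/56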

As for client-subnet-always-forward... I do not understand what the manual 
attempts to say :-/

-- 
Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] RFC 6975 (was: Re: Algorithm 5 and 7 trends (please move to 8 or 13))

2020-06-03 Thread Petr Špaček
On 03. 06. 20 7:18, Brian Somers wrote:
> On May 28, 2020, at 10:35 PM, Viktor Dukhovni  wrote:
>>
>> Enough time has passed since the need to abandon SHA-1 has become
>> more pressing to discern at least a couple short-term trend-lines.
> 
> Along these lines, have any of the large resolvers implemented
> RFC 6975 (DAU/DHU/N3U EDNS codes)?  OpenDNS/Cisco
> enabled these a couple of weeks ago but had to disable them
> pending qq.com being fixed (its nameservers returned
> SERVFAIL).  Now that the fix is there, we’re planning to turn
> it up again at the end of the week.
> 
> Just curious about its adoption… it feels like we testing new
> waters here.

I believe you are the first, congrats! :-)

It was not feasible to implement before https://dnsflagday.net/2019/ and then, 
you know, nobody asked for it ...

Please report any other issues you encounter; I would bet there will be a 
couple more lurking somewhere.

-- 
Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] A strange DNS problem (intermittent SERVFAILs)

2020-06-03 Thread Petr Špaček
On 02. 06. 20 23:39, Guillaume LUCAS wrote:
> Hello,
> 
> I just subscribed to this list, so sorry for the thread breaking.
> 
>> Several users on Twitter reported problems accessing Banque
>> Populaire (a French bank)
> 
> Since 1 pm (UTC+2) this day (June 2nd), it works from CloudFlare, FDN,…
> everywhere. Customers confirm that on Twitter [*]. But
> nsisp1.i-bp.banquepopulaire.fr. still returns REFUSED for NS/SOA and
> over-TCP queries for www.banquepopulaire.fr or
> www.ibps.bpaca.banquepopulaire.fr. So, I don't understand what the root
> cause of the problem was…
> 
> www.caisse-epargne.fr, a french bank of the same banking group as Banque
> Populaire, had a similar problem in the same period of time: the two
> name servers for this DNS zone, nslp1.gcetech.net and nslp2.gcetech.net,
> returned NODATA for NS/SOA queries (but they answered to over-TCP
> queries). Unbound 1.9 could resolve this name, Unbound 1.6 couldn't.
> Technical details (in french): <http://shaarli.guiguishow.info/?TqC4Ug>.
> Like Banque Populaire, name resolution works since 1 pm (UTC+2) today.
> nslp(1|2).gcetech.net still returns NODATA… So, again, I don't
> understand what the root cause was…
> 
> @Matthew: you said « bcpe.fr is delegated to the same servers which do
> not answer NS queries ». It's wrong. bpce.fr have always been delegated
> to dns(1|2).bpce.fr . These servers have always answered to NS/SOA and
> TCP queries. Name servers for banquepopulaire / bpce.fr / groupebpce.com
> = dns(1|2).bpce.fr, name servers for www.banquepopulaire.fr /
> www.ibps.*.banquepopulaire.fr / www.*.banquepopulaire.fr =
> nsisp1.i-bp.banquepopulaire.fr. On last Saturday, I was able to
> reproduce your result for "dig @1.1.1.1 banquepopulaire.fr ns":
> CloudFlare always aswered SERVFAIL (or didn't answer). CloudFlare was
> the only resolver in this case. So, like you observed, it's normal that
> CloudFlare stop the resolution at this point, but what about the other
> resolvers?

Please let's not get into shaming resolvers here, the delegation chain for 
www.banquepopulaire.fr. is an utter mess.

The subdomain "www" is delegated to the IP address 91.135.182.250 which answers 
REFUSED to most queries, so I guess it is a dumb or misconfigured load-balancer.

The only query which kind of works is:
$ dig +nord @91.135.182.250 www.banquepopulaire.fr

; <<>> DiG 9.16.3 <<>> +nord @91.135.182.250 www.banquepopulaire.fr
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32881
;; flags: qr aa ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.banquepopulaire.fr.IN  A

;; ANSWER SECTION:
www.banquepopulaire.fr. 30  IN  A   91.135.183.180
www.banquepopulaire.fr. 30  IN  A   91.135.183.180

;; Query time: 56 msec
;; SERVER: 91.135.182.250#53(91.135.182.250)
;; WHEN: St čen 03 10:33:51 CEST 2020
;; MSG SIZE  rcvd: 83

Yes, the duplicate RR is really here!


Fresh DNSViz analysis:
https://dnsviz.net/d/www.banquepopulaire.fr/XtdfDQ/dnssec/

www.banquepopulaire.fr zone: The server(s) were not responsive to queries 
over TCP. (91.135.182.250)
www.banquepopulaire.fr/DNSKEY: The response had an invalid RCODE (REFUSED). 
(91.135.182.250, UDP_-_EDNS0_4096_D_K, UDP_-_EDNS0_512_D_K)
www.banquepopulaire.fr/MX: The response had an invalid RCODE (REFUSED). 
(91.135.182.250, UDP_-_EDNS0_4096_D_K, UDP_-_EDNS0_512_D_K)
www.banquepopulaire.fr/NS: The response had an invalid RCODE (REFUSED). 
(91.135.182.250, UDP_-_EDNS0_4096_D_K)
www.banquepopulaire.fr/SOA: No response was received from the server over 
TCP (tried 3 times). (91.135.182.250, TCP_-_EDNS0_4096_D)
www.banquepopulaire.fr/SOA: The response had an invalid RCODE (REFUSED). 
(91.135.182.250, UDP_-_EDNS0_4096_D_K, UDP_-_EDNS0_4096_D_K_0x20)
www.banquepopulaire.fr/TXT: The response had an invalid RCODE (REFUSED). 
(91.135.182.250, UDP_-_EDNS0_4096_D_K)

From my perspective it is sufficiently broken to warrant a fix on the auth side.

Maybe resolver operators decided to work around it on their side, which would be 
most unfortunate. The auth operator should bear the cost of their own 
misconfigurations, not resolver operators.

-- 
Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


[dns-operations] DNSSEC signing bugfix in Knot DNS 2.9.5 (was: DNSSEC Validation Failures for RIPE NCC Zones)

2020-05-25 Thread Petr Špaček
On 22. 05. 20 14:22, Anand Buddhdev wrote:
> Dear colleagues,
> 
> Yesterday afternoon (21 May 2020), our DNSSEC signer rolled the Zone Signing 
> Keys (ZSKs) of all the zones we operate. Unfortunately, a bug in the signer 
> caused it to withdraw the old ZSKs soon after the new keys began signing the 
> zones.
> 
> Validating resolvers may have experienced some failures if they had cached 
> signatures made by the old ZSKs.
> 
> We apologise for any operational problems this may have caused. We are 
> looking at the issue with the developers of our Knot DNS signer to prevent 
> such an occurrence in the future.

Knot DNS 2.9.5 with a fix for this particular problem was released and all 
users are encouraged to upgrade.

Full release announcement:
https://lists.nic.cz/pipermail/knot-dns-users/2020-May/001815.html

The bug sometimes caused automatic key roll-overs to be finished too early, 
leading to temporary DNSSEC validation failures.

More detailed problem description + workaround:
https://lists.nic.cz/pipermail/knot-dns-users/2020-May/001813.html

We apologize to everyone affected.

-- 
Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] For darpa.mil, EDNS buffer == 1232 is *too small*. :-(

2020-04-21 Thread Petr Špaček
Beware, wall of text ahead.

On 21. 04. 20 10:52, Brian Dickson wrote:
> On Tue, Apr 21, 2020 at 1:04 AM Petr Špaček wrote:
> 
> On 21. 04. 20 9:00, Paul Vixie wrote:
> > On Tuesday, 21 April 2020 06:20:04 UTC Petr Špaček wrote:
> 
> Unfortunatelly I can't, we never got to the root cause.
> 
> It is the same story again and again:
> Probably one of ISPs in chain on the affected link was doing weird stuff 
> with big packets. We as Knot Resolver developers were not "their customer" 
> but merely "supplier of their customer" so they refused to talk to us, and 
> their actual customer lost interest as soon as it started to work reliably 
> for them. That's all we have, i.e. nothing.
> 
> 
> I'm confused by the use of the definite and indefinite, which are in 
> disagreement.
> ..."the root cause" suggests a single instance that was being investigated.
> ..."again and again" suggests multiple occurrences.
> If it was multiples, was it all involving a single network, perhaps?
> Or can you clarify if this was only a single instance of this happening?

Sorry, a non-native English speaker here.

We have encountered this on various networks all over the place, mainly because 
the Turris Omnia comes with a DNSSEC-validating resolver by default.


> Can you share any other diagnostic information or observations? 
> Was there fragmentation, or just packet loss above a certain size?
> (Fragmentation would suggest smaller-than-expected MTU or perhaps tunnels, 
> while packet loss would suggest possible MTU mismatch on a single link.)
> 
> Understanding whether this was operator error by an ISP, versus some other 
> non-error situation with real MTU below 1400, is important.

Except for rare cases users tell us just "this magic number works, bye", so 
there is really nothing to share. If I had data I would share it.

Hopefully my last presentation about server selection algorithms @ DNS-OARC 31 
[1] is a strong enough indicator that my team publishes everything of value which 
comes from our tests and experience, even if it does not show our products 
or decisions in the best light.

[1] https://indico.dns-oarc.net/event/32/contributions/711/


> This all has very real and very serious consequences to the entirety of the 
> DNS ecosystem.
 
Is this a generic statement, or does it relate specifically to the difference 
between 1410 and 1232? If so, do you have data to support such a strong statement?


> No, I did mean "would":
> - OpenDNS's experience says that in data centers 1410 works.
> - Our experience says that outside of data centers 1410 does not always 
> work.
> 
> 
> Are there additional instances where 1410 did not work, or are you using that 
> single instance to support the "does not always work" position?
>  
> 
> Let's be precise here. The proposal on the table is to change _default 
> values in configuration_.
> 
> Nobody is proposing to impose "arbitrary maximum response buffer size" 
> and weld it onto DNS software. Vendors are simply looking for defaults which 
> work for them and their customer/user base.
> 
> 
> Actually, it is both.
> 
> Just as a reminder: UDP responses will be sent with a not-to-exceed size of 
> MIN(configured authority max, requestor's max from EDNS0 UDP_BUFSIZE).
> If the client has a smaller value than the server, that is the maximum size 
> of a response that the server will send.
> If that value is too small, it has consequences.
> If that value is used for that client, to all servers, it has consequences 
> for all servers traffic to that client.
> If that value comes from default, and is used on the vast majority of that 
> packages' operators, that affects traffic from all servers to all of those 
> operators.
> If that package represents a large portion of the traffic from resolvers, 
> that's a big deal.
> 
> A significant shift in the amount of TCP traffic could occur due to TC=1 
> responses, with a non-linear relationship between the apparent decrease in 
> MTU, and the amount of TCP traffic.
> 
> Particularly with large RSA key sizes and large signatures, and a large 
> proportion of DNSSEC traffic, the impact could be severe.
> 
> If a DNS authority operator were to begin providing DNSSEC for their customer 
> base, DNSSEC deployment could jump from 1%-2% to 40% overnight.
> (Hint: at least one major DNS hosting provider has strongly suggested this is 
> likely to occur quite soon.)

Well, in that case I strongly suggest this unspecified large operator should go 
with an ECC algorithm instead of RSA ;-)


> 
> And a 5% to 10% decrease in actual MTU (offered by clients in EDNS), the 
> proporti

Re: [dns-operations] For darpa.mil, EDNS buffer == 1232 is *too small*. :-(

2020-04-21 Thread Petr Špaček
On 21. 04. 20 9:00, Paul Vixie wrote:
> On Tuesday, 21 April 2020 06:20:04 UTC Petr Špaček wrote:
>> On 20. 04. 20 22:22, Viktor Dukhovni wrote:
>>> On Mon, Apr 20, 2020 at 12:52:49PM -0700, Brian Somers wrote:
>>>> ...
>>>> At Cisco we allow up to 1410 bytes upstream and drop fragments.  We
>>>> prefer IPv6 addresses when talking to authorities.  We’ve been doing
>>>> this for years (except for a period between Feb 2019 and Aug 2019). 
>>>> Zero customer complaints.> 
>>> So perhaps the advice to default to 1232 should be revised:
>>> ...
>>
>> Please let's not jump to conclusions, especially because of single anecdote.
> 
> my own anecdotes are not singular, but your point is taken.
> 
>> As Knot Resolver developer I counter with another anecdote:
>> We have experience with networks where ~ 1300 buffer was workable minimum
>> and 1400 was already too much.
> 
> i hope you can say much more than this, about that.

Unfortunately I can't, we never got to the root cause.

It is the same story again and again:
Probably one of the ISPs in the chain on the affected link was doing weird stuff with 
big packets. We as Knot Resolver developers were not "their customer" but 
merely a "supplier of their customer", so they refused to talk to us, and their 
actual customer lost interest as soon as it started to work reliably for them. 
That's all we have, i.e. nothing.


>> As for OpenDNS experience - I'm hesistant to generalize. According to
>> https://indico.dns-oarc.net/event/33/contributions/751/attachments/724/1228/
>> 20200201_DNSSEC_Recursive_Resolution_From_the_Ground_Up.pptx DO bit is sent
>> out only since Sep'2018, and presumably from resolvers in data centers.
> 
> i understood the opendns team to say that they also used 1410 as the maximum 
> buffer size in responding to downstream queries. perhaps they can expand here.
> 
>> Results would be very different for recursive resolver deployment deep in
>> corporate networks/on the last mile.
> 
> that statement stretches the verb "would" too far. did you mean "could"?

No, I did mean "would":
- OpenDNS's experience says that in data centers 1410 works.
- Our experience says that outside of data centers 1410 does not always work.

> i 
> think we can learn a lot from authoritative responses (how many are followed 
> by retries or TCP or a complaint?) and recursive responses (same question). 
> 
>> DNS-over-TCP is mandatory to implement so please let's stop working it
>> around.
> 
> +1. no part of this debate is for me an argument against mandated TCP and 
> recommended DoT. those should be assumed on all timelines. however, that does 
> not justify an arbitrary maximum response buffer size such as 1232. all of 
> the 
> math that leads to 1232 is unsuitable for DNS's use.

Let's be precise here. The proposal on the table is to change _default values 
in configuration_.

Nobody is proposing to impose "arbitrary maximum response buffer size" and weld 
it onto DNS software. Vendors are simply looking for defaults which work for 
them and their customer/user base.

-- 
Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] For darpa.mil, EDNS buffer == 1232 is *too small*. :-(

2020-04-21 Thread Petr Špaček
On 20. 04. 20 22:22, Viktor Dukhovni wrote:
> On Mon, Apr 20, 2020 at 12:52:49PM -0700, Brian Somers wrote:
> 
>> On Apr 18, 2020, at 9:39 PM, Viktor Dukhovni  wrote:
>>> Is there any new information on whether something closer to 1400 is
>>> generally safe also for IPv6?
>>
>> At Cisco we allow up to 1410 bytes upstream and drop fragments.  We prefer 
>> IPv6
>> addresses when talking to authorities.  We’ve been doing this for years 
>> (except for
>> a period between Feb 2019 and Aug 2019).  Zero customer complaints.
> 
> So perhaps the advice to default to 1232 should be revised:
> 
> https://dnsflagday.net/2020/#dns-flag-day-2020
> 
> I see some movement in that direction, with the recommendation of 1220
> in:
> 
> 
> https://tools.ietf.org/html/draft-fujiwara-dnsop-avoid-fragmentation-00#section-3
> 
>o  Full-service resolvers SHOULD set EDNS0 requestor's UDP payload
>   size to 1220.  (defined in [RFC4035] as minimum payload size)
> 
>o  Authoritative servers and full-service resolvers SHOULD choose
>   EDNS0 responder's maximum payload size to 1220 (defined in
>   [RFC4035] as minimum payload size)
> 
> revised in -01/-02 to:
> 
> 
> https://tools.ietf.org/html/draft-fujiwara-dnsop-avoid-fragmentation-02#section-4
> 
>o  [RFC4035] defines that "A security-aware name server MUST support
>   the EDNS0 message size extension, MUST support a message size of
>   at least 1220 octets".  Then, the smallest number of the maximum
>   DNS/UDP payload size is 1220.
> 
>o  However, in practice, the smallest MTU witnessed in the
>   operational DNS community is 1500 octets.  The estimated size of a
>   DNS message's UDP headers, IP headers, IP options, and one or more
>   set of tunnel, IP-in-IP, VLAN, and virtual circuit headers, SHOULD
>   be 100 octets.  Then, the maximum DNS/UDP payload size may be
>   1400.
> 
> While darpa.mil still need to enable TCP, a more generous buffer size
> that avoids IPv6 issues will also avoid unnecessary and potentially even
> unavailable TCP fallback.  So I'm in favour of 1400 or 1410 (do we need
> more empirical evidence from other vantage points?) assuming those are
> also safe.
> 
> If the IPv6 obstacles are typically closer to the resolver than the
> authoritative server, just Cisco's experience may not be enough to make
> a definite conclusion.

Please let's not jump to conclusions, especially because of a single anecdote.

As a Knot Resolver developer I counter with another anecdote:
We have experience with networks where a ~1300 buffer was the workable minimum and 
1400 was already too much.

As for the OpenDNS experience - I'm hesitant to generalize. According to
https://indico.dns-oarc.net/event/33/contributions/751/attachments/724/1228/20200201_DNSSEC_Recursive_Resolution_From_the_Ground_Up.pptx
the DO bit has been sent out only since Sep 2018, and presumably from resolvers in data 
centers.

Results would be very different for recursive resolvers deployed deep in 
corporate networks / on the last mile.

DNS-over-TCP is mandatory to implement so please let's stop working around it.

-- 
Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] Any known AD=1 intolerant iterative resolvers?

2020-04-14 Thread Petr Špaček
On 15. 04. 20 7:23, Florian Weimer wrote:
> This approach does not work because you do not know whether the
> recursive resolver merely echoes back the AD bit, or has actually
> performed DNSSEC validation.

As always, any reliance on the AD bit requires out-of-band knowledge of whether the 
other side does validation and can be trusted or not... and I'm sure Viktor 
knows that.

Glibc (after years and years of deliberation) now has explicit configuration 
for passing AD bit back to clients:

GLibc commit 446997ff1433d33452b81dfa9e626b8dccf101a4
Author: Florian Weimer 
Date:   Wed Oct 30 17:26:58 2019 +0100

resolv: Implement trust-ad option for /etc/resolv.conf [BZ #20358]

This introduces a concept of trusted name servers, for which the
AD bit is passed through to applications.  For untrusted name
servers (the default), the AD bit in responses are cleared, to
provide a safe default.

This approach is very similar to the one suggested by Pavel Šimerda
in <https://bugzilla.redhat.com/show_bug.cgi?id=1164339#c15>.

The DNS test framework in support/ is enhanced with support for
setting the AD bit in responses.

Tested on x86_64-linux-gnu.

Change-Id: Ibfe0f7c73ea221c35979842c5c3b6ed486495ccc

Kudos to Florian for making it happen - it took 6 years to get it upstream!


Historical notes:
https://www.sourceware.org/glibc/wiki/DNSSEC

-- 
Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] OpenDNS, Google, Nominet - New delegation update failure mode

2020-04-03 Thread Petr Špaček


On 02. 04. 20 23:11, Doug Barton wrote:
> Thank you for flushing it, I can see that the nodes which were previously 
> failing are now working.
> 
> I also appreciate the logs, which confirms my fear that the old NS set was 
> stuck in the cache with what's left of the parent's TTL. That's sort of good 
> news in the short term since at least we know now that the problem will go 
> away in time. It's better news longer term since it tells me that my 
> ultra-paranoid step of adding both sets to the parent isn't so paranoid after 
> all, and will work to smooth the transitions for the other sites.
> 
> Wasn't there a move away from parent-centric in the past? Did I miss a memo?

Now I'm curious:
Was there?

TL;DR:
Updating the parent NS set and waiting for its TTL to expire is in no way paranoid; 
it is a mandatory step.


Being parent-centric is the only way to make resolution deterministic (with 
respect to NS changes), so we also do that in Knot Resolver. Ultimately, if 
there is no overlap between the parent and child NS sets, even child-centric 
resolvers will inevitably fail resolution as soon as the child NS set expires from 
their cache.

This behavior is baked into the protocol so there is no way around it. I would 
much rather spend time on making parents more flexible instead of spending 
time on workarounds (being child-centric is IMHO a workaround).
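
A rough dnspython sketch (mine, not from this thread) of how an operator can 
double-check such a migration: the delegation NS set served by the parent and the 
apex NS set served by the child should overlap until the parent-side TTL has 
expired. The server IP addresses below are placeholders.

import dns.message, dns.query, dns.rdatatype

def ns_set(server_ip, zone):
    query = dns.message.make_query(zone, dns.rdatatype.NS)
    reply = dns.query.udp(query, server_ip, timeout=3)
    # a parent answers with a referral (NS in AUTHORITY), a child answers
    # authoritatively (NS in ANSWER) - accept both
    names = set()
    for rrset in reply.answer + reply.authority:
        if rrset.rdtype == dns.rdatatype.NS:
            names.update(ns.target.to_text() for ns in rrset)
    return names

parent = ns_set("192.0.2.1", "example.co.uk.")     # one parent-side auth (placeholder)
child = ns_set("198.51.100.1", "example.co.uk.")   # one child-side auth (placeholder)
print("common NS:", parent & child or "NONE - expect failures once caches expire")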

Petr Špaček  @  CZ.NIC

> 
> Thanks again,
> 
> Doug
> 
> 
> On 2020-04-02 13:49, Brian Somers wrote:
>> I’ve flushed shopdisney.co.uk/NS globally.  Should work now for
>> Umbrella/OpenDNS/Cisco
>>
>>> On Apr 2, 2020, at 1:36 PM, Brian Somers  wrote:
>>>
>>> This is what I see with diagnostics turned up:
> 
>>> shopdisney.co.uk.   0   IN  TXT "RESOLVER: shopdisney.co.uk 
>>> IN NS ns1.disneyinternational.net"
>>> shopdisney.co.uk.   0   IN  TXT "RESOLVER: shopdisney.co.uk 
>>> IN NS ns2.disneyinternational.net"
>>> shopdisney.co.uk.   0   IN  TXT "RESOLVER: shopdisney.co.uk 
>>> IN NS ns3.disneyinternational.net"
>>> shopdisney.co.uk.   0   IN  TXT "RESOLVER: shopdisney.co.uk 
>>> IN NS ns4.disneyinternational.net"
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] Any DNAME usage experience?

2020-03-31 Thread Petr Špaček
On 30. 03. 20 12:35, Meir Kraushar via dns-operations wrote:
> - Obviously resolver compliance is very important (Knot support is 
> questionable?)

We intend to release a fix in the 5.1.0 release, probably next week:
https://gitlab.labs.nic.cz/knot/knot-resolver/-/merge_requests/965

I'm sorry for being late to the DNAME party.

-- 
Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] weird queries for mx1.mx2.mx1.mx2...

2020-03-31 Thread Petr Špaček
On 30. 03. 20 21:07, John Levine wrote:
> In article <02fe7bae-fec6-f314-b189-4214b75ce...@nic.cz> you write:
>> This is query list for domain truckinsurancekentucky.com:
>>
>> mx1.mx1.mx1.mx1.mx1.mx2.mx1.mx2.mx1.mta-sts.mx1.mx1.mx2.mx2.mta-sts.mx1.mx1.truckinsurancekentucky.com.
>>  
> 
>> Domain truckinsurancekentucky.com is not the only one with this weird 
>> behavior. Does anyone have an idea what is causing this?
> 
> It sure looks like misconfigured mta-sts.
> 
> That domain is dead, got another live one we could look at and see how it's 
> configured?  

These seem to be alive:

mx1.mx1.mx2.mx2.mx2.mx1.mx2.mx1.mta-sts.mx2.mx1.mx1.mx2.mx2.mx2.mx1.mx2.maxonsoftware.com.
 A

mx2.mx1.mx2.mx1.mx1.mx2.mta-sts.mx1.mx2.mx2.mx1.mx2.mx1.mx2.cineversityoneonone.net.
 A

mx2.mx1.mx1.mx1.mx2.mx2.mx2.mta-sts.mx1.mx2.mx1.mx1.mta-sts.mx2.mx2.mx2.effluentialtechnologies.net.
 A

-- 
Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


[dns-operations] weird queries for mx1.mx2.mx1.mx2...

2020-03-30 Thread Petr Špaček
Hello everyone,

while debugging some resolution problems we have noticed really weird queries, 
seemingly related to e-mail delivery. This is the query list for the domain 
truckinsurancekentucky.com:

mx1.mx1.mx1.mx1.mx1.mx2.mx1.mx2.mx1.mta-sts.mx1.mx1.mx2.mx2.mta-sts.mx1.mx1.truckinsurancekentucky.com.
 

mx1.mx1.mx1.mx2.mx1.mx2.mx1.mx2.mx2.mx1.mx1.mx1.mx2.mx2.mx2.mx1.mx2.mx2.truckinsurancekentucky.com.
 A

mx1.mx2.mx1.mx1.mx1.mx1.mx1.mx2.mx1.mx1.mta-sts.mx1.mx2.mx2.mx2.mx1.truckinsurancekentucky.com.
 A

mx1.mx2.mx1.mx1.mx2.mx1.mx1.mx2.mx1.mx1.mx1.mx1.mx2.mx1.mx2.mta-sts.mx1.truckinsurancekentucky.com.
 NS

mx1.mx2.mx1.mx2.mx2.mx1.mx1.mx1.mx1.mx2.mta-sts.mx1.mx1.mx2.mta-sts.mx2.mx2.truckinsurancekentucky.com.
 

mx1.mx2.mx2.mx1.mx2.mx2.mx1.mx2.mx2.mx2.mx2.mx1.mx2.mx1.mx2.mx1.mx1.mx1.truckinsurancekentucky.com.
 A

mx2.mx1.mx1.mx2.mx1.mx1.mx1.mx2.mx2.mx2.mx2.mta-sts.mx1.mx2.mta-sts.mx1.mx2.mx1.truckinsurancekentucky.com.
 NS

mx2.mx1.mx2.mx1.mx1.mx2.mx1.mx2.mx1.mx2.mx1.mx1.mx1.mx1.mta-sts.mx1.mx2.mx2.truckinsurancekentucky.com.
 NS

mx2.mx2.mx1.mx1.mx1.mx2.mx2.mx2.mx1.mx2.mx1.mx1.mx1.mta-sts.mx1.mx2.truckinsurancekentucky.com.
 A

mx2.mx2.mx1.mx1.mx2.mx1.mx2.mx1.mx1.mta-sts.mx1.mx2.mx1.mx1.mta-sts.mx2.mx2.truckinsurancekentucky.com.
 

mx2.mx2.mx1.mx2.mx1.mx1.mx1.mx2.mx1.mx1.mx1.mx1.mx1.truckinsurancekentucky.com. 


mx2.mx2.mx1.mx2.mx1.mx1.mx1.mx2.mx2.mx2.mx1.mx1.mx1.mta-sts.mx1.mx2.mx2.mx2.truckinsurancekentucky.com.
 A

Domain truckinsurancekentucky.com is not the only one with this weird behavior. 
Does anyone have an idea what is causing this?

(We have access only to anonymized data so we are unable to pinpoint the 
responsible client.)

-- 
Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


[dns-operations] DNS flag day 2020 update

2020-03-25 Thread Petr Špaček
Hello DNS operators!

Work like DNS flag day [1] requires a lot of global collaboration between DNS 
software developers, vendors, operators and service providers which is usually 
initiated and facilitated at conferences.

As that critical means of collaboration is greatly affected by the current 
situation in the world it's going to be harder to move things forward.

But the work has not been stopped or cancelled, it's moving forward as it did 
before!  Furthermore, we will closely monitor current developments and adapt 
our plans appropriately.

Please note there is still no date set (!) although there is a suggestion [2] 
that seems to be generally accepted.

Clarification: The date in this flag day's title is not an indicator of when the 
work will be finished, it is there to differentiate this work from previous work.


Are you a DNS vendor, operator, firewall vendor or service provider, and do you want 
to improve DNS resilience?

Then read our guidelines on "Message Size Considerations" for EDNS [3] to 
reduce or even avoid fragmentation in the DNS, and please allow DNS over TCP!

Thank you for your attention.

[1] https://dnsflagday.net/2020/
[2] 
https://github.com/dns-violations/dnsflagday/issues/139#issuecomment-554724998
[3] https://dnsflagday.net/2020/#message-size-considerations

-- 
Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


[dns-operations] contacts for FRITZ!box (AVM) DNS contacts

2020-03-17 Thread Petr Špaček
Hello,

I would like to talk to DNS engineers working on FRITZ!box manufactured by AVM.

Can you please contact me off-list?

Thank you!

-- 
Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] .ORG still using SHA-1 DNSKEYs

2020-02-07 Thread Petr Špaček
On 07. 02. 20 10:51, James Stevens wrote:
>> - You would be surprised how slow UDP packet processing in kernel can be ;-)
> 
> Often UDP slowness is due to the fact that each packet requires a 
> context-switch from kernel to user-space, and back for the reply.

To be less vague: Knot DNS spends about 40 % of its time waiting for UDP handling 
in the kernel.

> 
> So the bottleneck on a DNS server is generally how fast the CPU can context 
> switch, and this often had a hardwired limit. In that you can top out the 
> packet throughput with the CPU still showing %idle.
> 
> I believe there is (or has been) a dev going on in the kernel to fix this.
> 
> I might be behind the curve, I've not looked into it for a bit.
> 
>> Algorithm 8 or 13 both seem like plausible targets, but opinions from the 
>> community would be very welcome.
> 
> I recently had to help a client make this exact same decision.
> 
> We felt they'd probably want to move to 13 one day and one move is lower risk 
> than two.
> 
> It benefits from smaller UDP packets, big packets can become a problem (esp 
> in v6), so we went for 13.
> 
> Changing algorithm is not fun.

Maybe you do not use the right software :-)

With the right automation it is just a matter of changing the algorithm 
specification + a DS change at the parent.

See
https://www.knot-dns.cz/docs/2.9/singlehtml/#automatic-ksk-and-zsk-rollovers-example
(It works equally well for alg rollovers.)

Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] .ORG still using SHA-1 DNSKEYs

2020-02-07 Thread Petr Špaček


On 06. 02. 20 1:58, Viktor Dukhovni wrote:
> On Wed, Feb 05, 2020 at 12:05:41PM -0500, Joe Abley wrote:
> 
>> We (PIR) are currently discussing a timeline for implementing changes
>> with Afilias, who run all the back-end registry systems for ORG.
>> Algorithm 8 or 13 both seem like plausible targets, but opinions from
>> the community would be very welcome.
> 
> FWIW, the momentum seems to be with algorithm 13:
> 
> https://twitter.com/AFNIC/status/1222904523481444362
> 
> But if the wisdom of the crowd is not the right basis for a decision,
> the considerations as I see them are:
> 
> 1.  P256(13) is generally considered equivalent to ~3072-bit
> RSA in security.
> 
> 2.  P256 signatures are half the size of 1024-bit RSA signatures
> (less amplification and/or truncation).
> 
> 3.  Signing with P256 is much faster than RSA.  For example, on my
> 25-watt (low power) 4-core 8-thread Xeon some quick informal
> measurements with "openssl speed" (1.1.1d) (1 thread, 4 threads
> and 8 threads) yield[1]:
> 
>                   sign      verify     sign/s  verify/s
> rsa 2048 bits 0.000750s 0.000021s   1333.0  48295.7
> rsa 2048 bits 0.000225s 0.000008s   4445.0 131794.8  (4x MP)
> rsa 2048 bits 0.000173s 0.000005s   5768.1 193455.0  (8x MP)
> 
> rsa 1024 bits 0.000107s 0.000007s   9302.9 146937.3
> rsa 1024 bits 0.000034s 0.000002s  29785.5 467587.8  (4x MP)
> rsa 1024 bits 0.000025s 0.000002s  40302.4 564390.1  (8x MP)
> 
> rsa 1280 bits 0.000388s 0.000012s   2575.4  83000.6
> rsa 1280 bits 0.000124s 0.000004s   8058.4 256073.9  (4x MP)
> rsa 1280 bits 0.000091s 0.000003s  10937.2 349880.0  (8x MP)
> 
> ecdsa p256    0.0000s   0.0001s    36479.8  12455.2
> ecdsa p256    0.0000s   0.0000s   124877.2  40733.4  (4x MP)
> ecdsa p256    0.0000s   0.0000s   167358.6  52250.6  (8x MP)
> 
> 4.  However, as you can see above, signature *verification* with
> P256 is ~7 times slower than with 1280-bit RSA (or ~12 times
> slower than with 1024-bit RSA).
> 
> So if you're optimizing for higher security and lower packet size (and
> perhaps much faster zone signing time), P256(13) is the way to go.  If
> however, you're concerned about resolver performance, then rsa1280 has
> an advantage.
> 
> Thus my 25W server running flat out can do ~350K rsa1280 signature
> checks per second, vs. ~52K P256 signature checks per second.
> 
> When the DANE/DNSSEC survey is running, unbound is keeping 1 core pretty
> busy handling O(5K) cache misses a second.
> 
> I don't know what fraction of the CPU cost is in the crypto vs. all the
> other costs of processing the traffic.  I am reluctant to increase
> concurrency, lest my queries be throttled by upstream nameservers.
> 
> The survey already has to deal with a large fraction of the domains
> using P256, so these likely already dominate any crypto impact on CPU
> cost, and yet I can do ~5K validated qps on a low power server also
> running a Postgres database and the survey engine.  The system is
> somewhat less than 50% utilized while running the survey.

Anecdotal evidence:
When benchmarking Knot Resolver on realistic "ISP scenarios", the amount of CPU 
time spent on DNSSEC validation is dwarfed by all the rest. There are two reasons:
- In practice most of the traffic is a cache hit.
- You would be surprised how slow UDP packet processing in the kernel can be ;-)

Based on this anecdote RSA has no practical performance advantage over P256.

-- 
Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] root? we don't need no stinkin' root!

2019-11-28 Thread Petr Špaček
On 27. 11. 19 21:49, David Conrad wrote:
> Petr,
> 
>> I think there is even more fundamental problem:
>> Someone has to pay operational costs of "the new system”.
> 
> The “new system” is simply the existing network of resolvers, augmented to 
> have the root zone.  As far as I can tell, the operational cost would be in 
> (a) ensuring the resolver is upgraded to support obtaining the root zone and 
> (b) dealing with the fetch of the root zone with some frequency.

Oh, sorry, this is a misunderstanding! My reference to "the new system" was meant 
to be "the new system for root zone distribution".
Please let me try again:

Even if "the new system for root zone distribution" is BitTorrent it still:
- (most likely) needs a set of static IP addresses to solve the bootstrap 
problem,
- trackers need to be highly resilient against DDoS,
- trackers most likely need to be anycasted to limit scope of DDoS.

I hypothesize that in the end the requirements for "the new system for root zone 
distribution" will be fairly close to the current requirements for the DNS root 
system... so I do not see where the cost reduction comes from.

Or in other words:
If the current root system must survive a 1 TB/s attack, so must the "new system 
for root zone distribution", unless we move to a decentralized root.

Changing one centralized system to another does not solve the fundamental 
problem of a costly-to-defend single point of failure.

Hopefully it is clearer this time.
Petr Špaček  @  CZ.NIC


> 
> There would be an additional cost, that of making the root zone available to 
> myriads of resolvers, but I believe this is an easily handled issue.
> 
>> Personally I do not see how transition to another root-zone-distribution 
>> system solves the over-provisioning problem - the new system still has to be 
>> ready to absorm absurdly large DDoS attacks.
> 
> Two ways:
> - greater decentralization: there are a lot more resolvers than the number of 
> instances root server operators are likely to ever deploy. While an 
> individual resolver might melt down, the impact would only be the end users 
> using that resolver (and it is relatively easy for a resolver operator to add 
> more capacity, mitigate the attacking client, etc).
> - the cost of operating and upgrade the service to deal with DDoS is 
> distributed to folks whose job it is to provide that service (namely the ISPs 
> or other network operators that run the resolvers).  Remember that the root 
> server operators have day jobs, some of which are not particularly related to 
> running root service, and they are not (currently) being compensated for the 
> costs of providing root service.
> 
>> Have a look at https://www.knot-dns.cz/benchmark/ . The numbers in charts at 
>> bottom of the page show that a *single server machine* is able to reply 
>> *all* steady state queries for the root today.
>> ...
>> Most of the money is today spent on *massive* over-provisioning. As an 
>> practical example, CZ TLD is over-provisiong factor is in order of 
>> *hunderds* of stead-state query traffic, and the root might have even more.
> 
> Yep. As mentioned before, steady state is largely irrelevant.
> 
> In my view, the fact that root service infrastructure funnels up to a 
> (logical) single point is an architectural flaw that may (assuming DDoS 
> attack capacity continues to grow at the current rate or even grows faster 
> with crappy IoT devices) put the root DNS service at risk.  One of the 
> advantages of putting the root zone in the resolver is that it mitigates that 
> potential risk.
> 
> Regards,
> -drc
> (Speaking for myself, not any organization I may be affiliated with)

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] root? we don't need no stinkin' root!

2019-11-27 Thread Petr Špaček
On 26. 11. 19 12:46, David Conrad wrote:
> On Nov 26, 2019, at 11:33 AM, Jim Reid wrote:
>>> On 26 Nov 2019, at 09:16, Florian Weimer wrote:
>>>
>>> Up until recently, well-behaved recursive resolvers had to forward
>>> queries to the root if they were not already covered by a delegation.
>>> RFC 7816 and in particular RFC 8198 changed that, but before that, it
>>> was just how the protocol was expected to work.
>>
>> So what? These RFCs make very little difference to the volume of queries a 
>> resolving server will send to the root. QNAME minimisation has no impact at 
>> all: the root just sees a query for .com instead of foobar.com. A recursive 
>> resolver should already be supporting
>> negative caching and will have a reasonably complete picture of what's in 
>> (or not in) the root. RFC8198 will of course help that but not by much IMO.
> 
> It would appear a rather large percentage of queries to the root (like 50% in 
> some samples) are random strings, between 7 to 15 characters long, sometimes 
> longer.  I believe this is Chrome-style probing to determine if there is 
> NXDOMAIN redirection. A good example of the tragedy of the commons, like 
> water pollution and climate change.
> 
> If resolvers would enable DNSSEC validation, there would, in theory, be a 
> reduction in these queries due to aggressive NSEC caching.  Of course, 
> practice may not match theory 
> (https://indico.dns-oarc.net/event/32/contributions/717/attachments/713/1206/2019-10-31-oarc-nsec-caching.pdf).
>  

The discussion after the talk (including the hallway track :-) was also interesting, 
and with all respect for Geoff's work, these slides should be read with some 
scepticism.

Main points:
1) A load-balancer with N resolver nodes behind it decreases the effectiveness of 
the aggressive cache by a factor of N, it *does not* invalidate the concept.

In other words, a random subdomain attack which flows through a resolver farm 
with N nodes has to fill N caches with NSEC records, and that will simply take 
N times longer when compared with the non-load-balanced scenario.

The aggressive cache still provides an upper bound for the size of NXDOMAIN RRs in 
the cache, which is super useful under attack because it prevents individual 
resolvers from dropping all the useful content from the cache during the attack.


2) Two out of five major DNS resolver implementations used by large ISPs did 
not implement aggressive caching (yet?), so it needs to be expected that 
deployment is not great. Also the feature is pretty new and large ISPs are 
super conservative and might not have deployed new versions yet ...

I forgot the rest so I will conclude with: Watch the video recording and think 
for yourself! :-)
Petr Špaček  @  CZ.NIC


> 
> Of course, steady state query load is largely irrelevant since root service 
> has to be provisioned with massive DDoS in mind. In my personal view, the 
> deployment of additional anycast instances by the root server operators is a 
> useful stopgap, but ultimately, given the rate of growth of DoS attack 
> capacity (and assuming that growth will continue due to the stunning security 
> practices of IoT device manufacturers), stuff like what is discussed in that 
> paper is the right long term strategy.

___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] root? we don't need no stinkin' root!

2019-11-27 Thread Petr Špaček
On 27. 11. 19 9:53, Ondřej Surý wrote:
> Mark,
> 
> I believe that any distributed system that won’t have a fallback to the RZ
> is inevitably doomed and will get out of sync.
> 
> The RFC7706 works because there’s always a safe guard and if the resolver
> is unable to use mirrored zone, it will go to the origin.
> 
> Call me a pessimist, but I’ve yet to see a loosely often neglected 
> distributed system
> that won’t get out of sync.
> 
> So, while the idea of distributing the full RZ to every resolver out there, 
> there are two
> fundamental problems:
> 
> 1. resilience - both against DoS and just plain breakage
> 2. the old clients - while the situation out there is getting better, we will 
> still be stuck with
> old codebase for foreseeable future
> 
> What we can do is to make the load on RZ servers lighter, but we can’t make 
> them just go.

I think there is an even more fundamental problem:
Someone has to pay the operational costs of "the new system".

Personally I do not see how a transition to another root-zone-distribution system 
solves the over-provisioning problem - the new system still has to be ready to 
absorb absurdly large DDoS attacks.

Example:
Have a look at https://www.knot-dns.cz/benchmark/ . The numbers in charts at 
bottom of the page show that a *single server machine* is able to reply *all* 
steady state queries for the root today.

Sure, we have speed-of-light limits, so let's say we need a couple hundred 
servers in well-connected places to keep latency reasonable. That's not a huge 
cost overall (keep in mind that these local nodes could be pretty small *if we 
were ignoring the over-provisioning problem*).

Most of the money today is spent on *massive* over-provisioning. As a practical 
example, the CZ TLD's over-provisioning factor is on the order of *hundreds* of 
times the steady-state query traffic, and the root might have even more.

Once we have a similarly resilient HTTP system, it is a matter of simple 
configuration :-D
https://knot-resolver.readthedocs.io/en/stable/modules.html#cache-prefilling
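
For illustration, the heart of such cache prefilling is just "fetch the root 
zone over HTTPS on a schedule and load it into the resolver". A minimal Python 
sketch of the fetch step only (not the prefill module's actual code; the URL is 
the publicly documented root zone location, and a real deployment would also 
verify the data, e.g. DNSSEC signatures, before using it):

  import urllib.request

  ROOT_ZONE_URL = 'https://www.internic.net/domain/root.zone'

  def fetch_root_zone() -> bytes:
      # Download the current root zone; TLS certificate validation is
      # urllib's default behaviour, DNSSEC validation is not shown here.
      with urllib.request.urlopen(ROOT_ZONE_URL, timeout=30) as resp:
          return resp.read()

  zone_text = fetch_root_zone()
  print('fetched %d bytes, %d lines' % (len(zone_text), zone_text.count(b'\n')))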

-- 
Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] root? we don't need no stinkin' root!

2019-11-27 Thread Petr Špaček
On 26. 11. 19 16:04, Roy Arends wrote:
> 
> 
>> On 26 Nov 2019, at 12:46, David Conrad  wrote:
>>
>> It would appear a rather large percentage of queries to the root (like 50% 
>> in some samples) are random strings, between 7 to 15 characters long, 
>> sometimes longer.  I believe this is Chrome-style probing to determine if 
>> there is NXDOMAIN redirection. A good example of the tragedy of the commons, 
>> like water pollution and climate change.
> 
> Yep.
> 
> https://chromium.googlesource.com/chromium/src/+/32352ad08ee673a4d43e8593ce988b224f6482d3/chrome/browser/intranet_redirect_detector.cc
> Line 79: "// We generate a random hostname with between 7 and 15 characters.”
> 
> https://ithi.research.icann.org/graph-m3.html
> Table "Queries to frequently found name patterns” shows that the frequency 
> distribution for queries between 7 and 15 characters are near flat (around 
> 5.2% per character length) AND an order higher than ANY other queries.
> 
> “Coincidence? I think NOT!”  
> 
> https://youtu.be/MDpuTqBI0RM?t=53

FYI there is also an issue about this in their tracker:
https://bugs.chromium.org/p/chromium/issues/detail?id=946450#c1
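
For the record, the probing behaviour is easy to mimic. A small Python sketch 
(illustration only, not Chromium's code; the 7-15 length range is the one 
documented in the source above, the lowercase-letter alphabet is my 
assumption):

  import collections
  import random
  import string

  def random_probe_name() -> str:
      length = random.randint(7, 15)     # 9 possible lengths, equally likely
      return ''.join(random.choices(string.ascii_lowercase, k=length))

  lengths = collections.Counter(len(random_probe_name()) for _ in range(90000))
  for l in sorted(lengths):
      print(l, lengths[l])               # each length ends up near 10000

Each of the 9 lengths gets roughly 11 % of the probe queries; with such probes 
making up around half of root traffic, that lines up with the ~5.2 % per 
character length in the ITHI table quoted above.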

-- 
Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] s3.amazonaws.com problem - price to pay for not using DNSSEC

2019-10-24 Thread Petr Špaček
On 23. 10. 19 14:37, Daniel Stirnimann wrote:
> I have located a host in our network which sends such queries the
> network resolver (which we operate):
> 
> mqfgioo5.s3.amazonaws[.]com. IN CNAME
> 6l-dpfrn.s3.amazonaws[.]com. IN CNAME
> 2idg5c42.s3.amazonaws[.]com. IN CNAME
> qzq3uz5m.s3.amazonaws[.]com. IN CNAME
> nenkxm2p.s3.amazonaws[.]com. IN CNAME
> yk2max6j.s3.amazonaws[.]com. IN CNAME
> qhcbric2.s3.amazonaws[.]com. IN CNAME
> wg-jmekf.s3.amazonaws[.]com. IN CNAME
> dnwn2ip1.s3.amazonaws[.]com. IN CNAME
> 711o385.s3.amazonaws[.]com. IN CNAME
> rn0v02a6.s3.amazonaws[.]com. IN CNAME
> pm1a3a4t.s3.amazonaws[.]com. IN CNAME
> 0xc.tibo.s3.amazonaws[.]com. IN CNAME
> 76jt.m9g.s3.amazonaws[.]com. IN CNAME
> 4tjc8hp.s3.amazonaws[.]com. IN CNAME
> b-.9ft7y.s3.amazonaws[.]com. IN CNAME

Funnily enough this attack would have been partially mitigated on the resolver 
side if the S3 domain were signed with DNSSEC!

New versions of resolvers already implement RFC 8198, which makes 
random-subdomain attacks ineffective against DNSSEC-signed domains. With 1/3 of 
clients in the world behind a DNSSEC-validating resolver it would already make 
a difference.
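
The core idea is simple: a validated NSEC record proving that nothing exists 
between two names lets the resolver answer NXDOMAIN for *any* name in that gap 
straight from cache. A minimal Python sketch using dnspython (illustration 
only, not any resolver's implementation; the zone and record below are made 
up):

  import dns.name  # dnspython; Name objects sort in DNS canonical order

  def nsec_covers(owner: str, next_name: str, qname: str) -> bool:
      # True if qname falls strictly between the NSEC owner and "next" names,
      # i.e. the NSEC proves qname does not exist and NXDOMAIN can be
      # synthesized without asking the authoritative servers again.
      o, n, q = (dns.name.from_text(x) for x in (owner, next_name, qname))
      return o < q < n

  # Hypothetical NSEC from a signed zone: nothing exists between these two.
  print(nsec_covers('a.s3.example.', 'zzz.s3.example.',
                    'mqfgioo5.s3.example.'))   # True  -> cached NXDOMAIN
  print(nsec_covers('a.s3.example.', 'zzz.s3.example.',
                    'a.s3.example.'))          # False -> name exists

Every random-subdomain query that lands in an already-cached gap never reaches 
the authoritative servers, which is where the protection comes from.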

This "protection effect" of signing + RFC 8198 was experimentally confirmed by 
measurements back in March 2018 and reported by me at DNS-OARC 28 meeting. 
Slides:
https://indico.dns-oarc.net/event/28/contributions/509/attachments/479/786/DNS-OARC-28-presentation-RFC8198.pdf

Update from 2019:
The slight latency increase reported on slide 9 is in fact a bug in the BIND 
implementation, not a feature of the protocol.

Petr Špaček  @  CZ.NIC


> 
> Interestingly, it also sends other suspicious queries such as:
> 
> . IN TYPE1847
> . IN TYPE1847
> . IN TYPE567
> . IN TYPE1847
> . IN TYPE567
> . IN TYPE1847
> . IN TYPE1847
> . IN TYPE1900
> . IN TYPE823
> . IN TYPE1900
> . IN TYPE1847
> 7a4. IN TYPE868
> . IN TYPE1847
> . IN TYPE1847
> . IN TYPE1900
> . IN TYPE1847
> . IN TYPE1847
> 3n2y. IN TYPE612
> . IN TYPE311
> . IN TYPE1900
> 
> However, these are mostly answered from cache because of aggressive use
> of DNSSEC-validated cache. Still, I guess root server operators may see
> an increase in queries with unassigned query types.
> 
> Daniel
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] Random question about Google resolver behaviour and long-lived TCP sessions

2019-09-30 Thread Petr Špaček
On 27. 09. 19 18:19, Alexander Dupuy via dns-operations wrote:
> Tony Finch wrote:
> 
> So I wonder if Google have implemented EDNS TCP keepalive. If you change
> what BIND calls tcp-advertised-timeout, do Google's TCP connection
> lifetimes change to match?
> 
> 
> Google Public DNS has not implemented EDNS TCP keepalive, neither as a server 
> for its clients, nor in its TCP connections to authoritative servers. Has 
> BIND added support on its client side, or only as a DNS server? It seems like 
> Unbound has client and server-side support 
> (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231283), and the GetDNS 
> client code also supports it (https://getdnsapi.net/releases/getdns-0-9-0/) 
> but those are the only ones I found.

Knot Resolver has a stub implementation of EDNS keepalive:
https://knot-resolver.readthedocs.io/en/stable/modules.html#edns-keepalive

Quote from docs:
The edns_keepalive module implements RFC 7828 for clients connecting to Knot 
Resolver via TCP and TLS. Note that client connections are timed-out the same 
way regardless of them sending the EDNS option; the module just allows clients 
to discover the timeout.

When connecting to servers, Knot Resolver does not send this EDNS option. It 
still attempts to reuse established connections intelligently.
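
In practice a client can discover the advertised timeout by sending the 
edns-tcp-keepalive option (code 11, RFC 7828) with empty data over TCP and 
reading the 2-octet timeout (in units of 100 ms) from the response. A small 
dnspython sketch (illustration only; 192.0.2.53 is a placeholder resolver 
address):

  import struct

  import dns.edns
  import dns.message
  import dns.query

  EDNS_TCP_KEEPALIVE = 11   # option code assigned by RFC 7828

  # The client sends the option with zero-length data, as the RFC requires.
  query = dns.message.make_query(
      'example.com', 'A', use_edns=0,
      options=[dns.edns.GenericOption(EDNS_TCP_KEEPALIVE, b'')])
  response = dns.query.tcp(query, '192.0.2.53', timeout=3)

  for opt in response.options:
      if opt.otype == EDNS_TCP_KEEPALIVE and len(opt.data) == 2:
          (timeout,) = struct.unpack('!H', opt.data)
          print('server idle timeout: %.1f s' % (timeout / 10.0))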


> I don't see any implementations of RFC 8490 (DNS Stateful Operations).

BTW the protocol is complex like hell so I do not see it being implemented 
anytime soon, if ever, in Knot Resolver.

-- 
Petr Špaček  @  CZ.NIC
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations