Re: [dns-operations] Open source DNS quality assurance & risks: survey and discussion
Last call: The survey will close in 24 hours. Link: https://ec.europa.eu/eusurvey/runner/RIPE88OpenSourceWGSurvey Thank you very much to those of you who have already filled out the survey! If you have not yet, please take 4 minutes of your time; it's pretty easy, with lots of multiple choice, so you'll hardly have to type anything yourself! We'd like to draw some conclusions from the results *by this Friday*, so don't schedule it for later: click the link now and contribute your experience to next week's discussion. Petr Špaček Internet Systems Consortium On 13. 05. 24 10:05, Petr Špaček wrote: Dear DNS colleagues, I invite you all to an open discussion about open source quality assurance & risk mitigation. Hopefully this is relevant to many participants here, as I believe that open source and DNS go well together. To fuel the discussion, please fill out a 5-minute survey here: https://ec.europa.eu/eusurvey/runner/RIPE88OpenSourceWGSurvey Our goal is to find out what people actually look for when assessing open source quality and the risks associated with the software. The actual discussion will take place during the hybrid RIPE 88 meeting in Krakow, Poland on Thursday, 2024 May 23, starting around 14:00 UTC+2, during the Open Source Working Group session. Remote participation is available free of charge [1]. Your opinions are very welcome even if you don't plan to attend the meeting! [1] https://ripe88.ripe.net/attend/register/ Thank you for your time - and see you in the meeting, at least virtually! ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
[dns-operations] Open source DNS quality assurance & risks: survey and discussion
Dear DNS colleagues, I invite you all to an open discussion about open source quality assurance & risk mitigation. Hopefully this is relevant to many participants here, as I believe that open source and DNS go well together. To fuel the discussion, please fill out a 5-minute survey here: https://ec.europa.eu/eusurvey/runner/RIPE88OpenSourceWGSurvey Our goal is to find out what people actually look for when assessing open source quality and the risks associated with the software. The actual discussion will take place during the hybrid RIPE 88 meeting in Krakow, Poland on Thursday, 2024 May 23, starting around 14:00 UTC+2, during the Open Source Working Group session. Remote participation is available free of charge [1]. Your opinions are very welcome even if you don't plan to attend the meeting! [1] https://ripe88.ripe.net/attend/register/ Thank you for your time - and see you in the meeting, at least virtually! -- Petr Špaček Internet Systems Consortium ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
[dns-operations] OARC 43 - Call for Contribution
The Programme Committee is seeking contributions from the community. This workshop will be a hybrid event. Date - likely in the week of 23-27 September 2024, details will be confirmed later Location - South America, exact location will be confirmed later Time zone - approximately 09:00-17:00 UTC-5 Co-located with - related industry events, will be confirmed later Deadline for Submissions - 2024-06-23 23:59 UTC For further details please see https://indico.dns-oarc.net/event/51/abstracts/ Petr Špaček, for the DNS-OARC Programme Committee ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] ag.gov not providing NXDOMAIN responses
On 11. 04. 24 6:15, Stephane Bortzmeyer wrote: On Tue, Apr 09, 2024 at 01:09:20PM -0500, David Zych wrote a message of 121 lines which said: The problem: when queried for a record underneath ag.gov. which does not exist, these nameservers do not return a proper NXDOMAIN response; instead, they don't answer at all. Funny enough, it depends on the QTYPE. % dig @ns2.usda.gov. nonono.ag.gov A ;; communications error to 2600:12f0:0:ac04::206#53: timed out ;; communications error to 2600:12f0:0:ac04::206#53: timed out ;; communications error to 2600:12f0:0:ac04::206#53: timed out ;; communications error to 199.141.126.206#53: timed out ; <<>> DiG 9.18.24-1-Debian <<>> @ns2.usda.gov. nonono.ag.gov A ; (2 servers found) ;; global options: +cmd ;; no servers could be reached % dig @ns2.usda.gov. nonono.ag.gov NS ; <<>> DiG 9.18.24-1-Debian <<>> @ns2.usda.gov. nonono.ag.gov NS ; (2 servers found) ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 44750 ;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 8, ADDITIONAL: 1 ;; WARNING: recursion requested but not available ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags: do; udp: 1220 ; COOKIE: 108e6a3526539745cbe04caf6617b75afc5cf42f25232e56 (good) ;; QUESTION SECTION: ;nonono.ag.gov. IN NS ;; AUTHORITY SECTION: ag.gov. 900 IN SOA ns1.usda.gov. duty\.officer.usda.gov. ( ... The practical trouble this causes has to do with an increasingly popular DNS privacy feature called QNAME Minimization, which depends upon authoritative DNS servers like yours responding in a standards-compliant way to queries like _.ag.gov IN A _.ars.ag.gov IN A _.tucson.ars.ag.gov IN A More fun: the previous version of QNAME minimisation used QTYPE=NS. It then changed to QTYPE=A precisely to work around broken middleboxes. (And also to avoid sticking out.) 
This is not only in violation of https://datatracker.ietf.org/doc/html/rfc8906 but also an outright security issue, because it allows attackers to mess up load balancing in resolvers. See https://indico.dns-oarc.net/event/47/contributions/1018/attachments/959/1802/pre-silence-not-golden-dns-orac.pdf I predict you have a much better chance of getting this fixed if you go through the respective CERT team and point them to this presentation. Answering before someone asks: No, we are not going to work around this in the BIND resolver. It has to be fixed on the auth side. This is not a security bug in BIND. See https://bind9.readthedocs.io/en/latest/chapter7.html#dns-resolvers -- Petr Špaček Internet Systems Consortium ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
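The "_"-prefixed probe sequence quoted above can be illustrated with a short sketch. This is only an illustration of how a relaxed-mode QNAME-minimizing resolver reveals one label at a time below a known zone cut, not BIND's actual algorithm; the function name is hypothetical.

```python
def minimized_qnames(qname: str, zone_cut: str) -> list[str]:
    """Return the '_'-prefixed probe names a relaxed-mode QNAME-minimizing
    resolver would ask below a known zone cut, outermost first (sketch)."""
    labels = qname.rstrip(".").split(".")
    cut_labels = zone_cut.rstrip(".").split(".")
    assert labels[-len(cut_labels):] == cut_labels, "qname must be under zone_cut"
    probes = []
    # Reveal one additional label per query, hiding the rest behind "_".
    for i in range(len(labels) - len(cut_labels), -1, -1):
        probes.append("_." + ".".join(labels[i:]))
    return probes
```

For the `tucson.ars.ag.gov` example with the `ag.gov` cut this reproduces the three probes listed in the quoted message: `_.ag.gov`, `_.ars.ag.gov`, `_.tucson.ars.ag.gov`.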
Re: [dns-operations] NSEC3PARAM change strange behaviour
On 12. 10. 23 13:09, Misak Khachatryan wrote: Thank you Viktor, In the logs I see IXFR, which should be the case. This brings me to a question for the bind developers - shouldn't a change of dnssec-policy, or at least such a destructive one, automatically trigger AXFR? Of course it's not a question to be asked here; I will check the bind documentation and ask it on the appropriate mailing list. Just to close the loop, you can configure the "max-ixfr-ratio" option. See https://bind9.readthedocs.io/en/latest/reference.html#namedconf-statement-max-ixfr-ratio Please send further questions to the mailing list https://lists.isc.org/mailman/listinfo/bind-users -- Petr Špaček Internet Systems Consortium ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
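As a configuration sketch (see the linked BIND documentation for the authoritative syntax; the 30% threshold here is an illustrative value, not a recommendation), forcing a full AXFR when the incremental delta grows too large looks roughly like:

```
options {
    // If an outgoing IXFR delta would exceed 30% of the total zone size,
    // send a full AXFR instead of the incremental transfer.
    max-ixfr-ratio 30%;
};
```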
Re: [dns-operations] Signature expired for the DS of .ch at Cloudflare ?
On 04. 10. 23 10:38, Stephane Bortzmeyer wrote: On Wed, Oct 04, 2023 at 10:35:14AM +0200, Stephane Bortzmeyer wrote a message of 57 lines which said: Other instances of Cloudflare have the correct info: % dig +cd +nsid @1.1.1.1 DS ch. https://www.cloudflarestatus.com/ Investigating - Cloudflare is aware of, and investigating, DNS resolution issues which potentially impacts multiple users using 1.1.1.1 public resolver and/or WARP. Further detail will be provided as more information becomes available. Oct 04, 2023 - 08:19 UTC Details are now here: https://blog.cloudflare.com/1-1-1-1-lookup-failures-on-october-4th-2023/ -- Petr Špaček Internet Systems Consortium ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] DNS over TCP response fragmentation
On 03. 10. 23 11:25, Jan Petto wrote: For my research, I am sending DNS requests over TCP to many different recursive DNS servers all over the world. A significant portion of these servers is sending the DNS response in two separate TCP segments, even though it would easily fit into one packet. Only after my client has acknowledged the first segment is the second part of the response sent. The first TCP segment always contains only one or two bytes, never more. I know a DNS message sent over TCP is prefixed by a two-byte field containing the message length. My first thought was that the first TCP segment contains this length field, and the second segment contains the DNS message, but then I discovered cases where only one of the two length bytes was contained in the first segment. In any case, sending the message length as a separate packet does not make much sense to me from an application design perspective. Maybe this is some sort of attack mitigation? I have attached a packet capture containing two such examples. You can reproduce the behavior with any DNS client, e.g. dig: # dig example.org +tcp @100.37.202.139 Also attached is a list of public DNS server IP addresses where I have observed this behavior. They were found via scans of the IP address space; I have no affiliation with these servers. I would greatly appreciate any input as to why so many servers are sending their responses in such a way. I bet it's just a suboptimal implementation on some SOHO router or something like that. There are two things at play, I believe: - The responder apparently does not use TCP_CORK (see "man tcp") or a userspace equivalent. - The kernel is very relaxed when it comes to TCP segmentation. Nothing prescribes that TCP streams MUST be segmented in any sort of optimal way. -- Petr Špaček Internet Systems Consortium ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
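The userspace fix alluded to above is simply to build the two-byte length prefix and the message into one buffer and hand it to a single write, rather than writing the prefix separately. A minimal sketch (the function name is illustrative):

```python
import struct

def frame_dns_tcp(message: bytes) -> bytes:
    """Frame a DNS message for TCP transport (RFC 1035 section 4.2.2):
    a two-byte big-endian length field followed by the message.  Passing
    the whole buffer to one sendall() lets the kernel emit a single
    segment, instead of the 1-2 byte first segment observed above."""
    return struct.pack("!H", len(message)) + message
```

On Linux, an implementation that must do multiple writes can achieve the same effect with `sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)` before writing and unsetting it afterwards, as described in tcp(7).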
Re: [dns-operations] MaginotDNS: Attacking the boundary of DNS caching protection
On 27. 09. 23 9:38, Ralf Weber wrote: Moin! On 27 Sep 2023, at 3:58, Xiang Li wrote: Hi Stephane, This is Xiang, the author of this paper. For the off-path attack, DoT can protect the CDNS from being poisoned. For the on-path attack, since the forwarding query is sent to the attacker's server, only DNSSEC can mitigate the MaginotDNS. I don’t think this is true otherwise all resolver implementations would have been affected and not just a few. If you are on path direct behind the resolver of course all bets are off, but if you are on path just between the resolver and the forwarder those resolvers that are more cautious in what cache information they use for iterative queries are not vulnerable. I guess that is why Akamai Cacheserve, NLNet Labs Unbound and PowerDNS Recursor are not mentioned in the paper because they were not vulnerable. That's right. If you are interested in the gory details, BIND's description of the issue can be found here: https://gitlab.isc.org/isc-projects/bind9/-/issues/2950#note_241893 https://gitlab.isc.org/isc-projects/bind9/-/issues/2950#note_244624 Also the surrounding comments have more details including vulnerable config files and PCAPs. -- Petr Špaček Internet Systems Consortium ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
[dns-operations] IETF 118 hackaton: Does Not Scale: Rethinking DNS
Hello all! I would like to invite you to a "round table" planned during IETF 118 hackathon [1] - Saturday and Sunday before IETF 118. We plan to have an open and friendly brainstorming session with people who work on the DNS protocol, write implementations, and operate networks. The purpose is to brainstorm and think about DNS without being bound by current protocol constraints. Where are we hitting limits? What can we do about them? Do you want to put your protocol pet peeve out of its misery? If you want to join, please list yourself here: https://doodle.com/meeting/participate/id/azXrrv7d. This will allow us to secure a large enough workspace. Participants are expected to come with their homework done. Bring a list of limitations you can see in the current protocol with you, and don't hesitate to think big. Hate the duplicate TTLs in DNS messages? Please write it down. Want secure & flexible transport protocol specification? Never liked the compression method? Put it on the list. As a teaser, here are a couple of real-world motivating questions just to get us started. How do we make DNS: ... scalable so it can transfer millions of zones? And how do we monitor it? [2] ... handle humongous post-quantum crypto keys and signatures, in both protocol and transport? [3] ... support distributed multi-master setups? ... extensible to new wire format & at the same time, maintain a single namespace? ... simpler to operate? What if we rethink basic assumptions? [4] (see the talk starting at 33:40) [1] https://wiki.ietf.org/en/meeting/118/hackathon [2] https://indico.dns-oarc.net/event/47/contributions/1017/ [3] https://indico.dns-oarc.net/event/46/contributions/985/ [4] https://icann.zoom.us/rec/share/PUZu_QsO_rdY0gavMatzFOSVpZY1oNahNYnPBuy6pgTUJARw-YIOEzWEV11aqaHW.4Cwr3dGRlunUwhD9?startTime=1693897245000 It's unlikely we will produce running code, but hopefully we'll generate some good ideas and possibly proto-I-Ds. 
-- Petr Špaček Internet Systems Consortium ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] DNS .com/.net resolution problems in the Asia/Pacific region
On 18. 07. 23 23:53, Viktor Dukhovni wrote: Currently, it’s 7 days for .com which almost exactly matches the RRSIG expiry-inception difference and that doesn’t give any wiggle room if things go wrong. Expiry in the SOA applies to AXFR, but many deployments are not AXFR-based. And Verisign apparently did try to isolate the server; sadly, that didn't work out as expected. . - 7 days SOA expiry and 14 days signature validity .cz - 7 days SOA expiry and 14 days signature validity .nl - 28 days SOA expiry and 14 days signature validity .org - 14 days SOA expiry and 3 weeks signature validity Do any of these use AXFR? If all the servers are effectively "primary", with incremental zone updates driven by some other process, the SOA expiry is of little relevance. Sure they should go offline before signatures start to go stale (as Verisign tried to do, but failed). Indeed some of the TLDs listed use good old AXFR/IXFR, but that's beside the point. See below. The "go offline" logic should therefore be robust, but that's not the topic at hand, I think. The topic is whether "bogus" should generally be retriable (or even required to be retriable within reasonable retry limits, and with error caching holddowns to avoid thundering herd storms, ...). I think that SOA EXPIRY is equally relevant to any sort of replication mechanism. Even if everything is driven by a non-DNS database backend, it presumably has some notion of the last successful synchronization with its database peers. Such a timestamp can be used to trigger SERVFAIL once (last sync + SOA EXPIRY) has passed. -- Petr Špaček ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
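The suggestion above can be sketched in a few lines. All names and the 7-day value are illustrative (the .com figure from the thread), not from any particular implementation:

```python
import time

SOA_EXPIRY = 7 * 24 * 3600  # e.g. 7 days, as for .com

def should_servfail(last_sync, soa_expiry=SOA_EXPIRY, now=None):
    """Treat SOA EXPIRY as a freshness bound for any replication
    mechanism, DNS-based or not: once (last successful sync + EXPIRY)
    has passed, the server should stop answering authoritatively."""
    if now is None:
        now = time.time()
    return now >= last_sync + soa_expiry
```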
Re: [dns-operations] DNSSEC in BIND
Hello, detailed documentation for DNSSEC in BIND is here: https://bind9.readthedocs.io/en/latest/dnssec-guide.html If anything is unclear, please post questions to the BIND mailing list: https://lists.isc.org/mailman/listinfo/bind-users HTH. Petr Špaček Internet Systems Consortium On 12. 06. 23 15:37, daniel majela wrote: Hello... My name is Daniel Majela and, if possible, I would like some help to implement DNSSEC on my servers. Today I have 3 recursive and authoritative servers. My external authoritative zones are copied to 2 DNS servers that are in the DMZ. My first question is whether there is a step-by-step way to implement DNSSEC using bind9 9.16.23-RH? ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] [DNSOP] bind fails to continue recursing on one specific query
On 29. 03. 23 13:03, Dave Lawrence wrote: Peter DeVries via dns-operations writes: Another relevant draft: https://datatracker.ietf.org/doc/html/rfc8906 Not sure how; it doesn't address _. as a use case at all, and I only see testing for minimal EDNS, not minimal qname. The journey of that document began with, essentially, No Response Considered Harmful. While it does go over many specific examples, the thrust of it from the Introduction is that not responding to legitimate queries is an ambiguous signal that burdens the DNS ecosystem even more. That's right. Well-behaved DNS resolvers might assume that a timeout indicates the server is not keeping up, and that the resolver should try another server or enable throttling for the non-responsive server (in an attempt to help it keep up with the load). In other words, dropping queries from resolvers might/will cause legitimate clients not to get timely answers, but attackers will not care and will continue flooding the resolver. Artificial timeouts also wreak havoc on some RTT-estimation approaches, etc. Thus => RFC 8906 => It's A Bad Idea To Drop Queries. -- Petr Špaček Internet Systems Consortium ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] [DNSOP] bind fails to continue recursing on one specific query
On 28. 03. 23 13:00, Peter DeVries via dns-operations wrote: The queries for "_.extglb.tn.gov. IN A ?" in your PCAP are a novelty to me. Are these some form of query minimisation, or some sort of sanity check of the delegation? Sadly, the "tn.gov" nameserver just drops these without responding, so their failure could well contribute to the problems you observe. These are indeed how BIND does qname minimization in "relaxed" mode, which is currently the default. We almost blocked these because we didn't know what they were, but then I stumbled upon one of the old RFC drafts for query minimization, and it does mention this as a technique. I could see someone else blocking them as well, because they did make up a very large percentage of our inbound queries and there isn't much documentation on them. FTR the underscore trick is now documented in https://bind9.readthedocs.io/en/latest/reference.html#namedconf-statement-qname-minimization (And also mentioned in RFC 7816 section 3.) -- Petr Špaček Internet Systems Consortium ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] Cloudflare TYPE65283
On 27. 03. 23 16:00, Emmanuel Fusté wrote: Le 27/03/2023 à 15:38, Petr Špaček a écrit : On 27. 03. 23 15:27, Emmanuel Fusté wrote: Le 27/03/2023 à 14:34, Petr Špaček a écrit : On 27. 03. 23 13:31, Emmanuel Fusté wrote: Le 27/03/2023 à 12:37, Emmanuel Fusté a écrit : Le 27/03/2023 à 12:14, Joe Abley a écrit : Hi Emmanuel, On Mon, Mar 27, 2023 at 10:51, Emmanuel Fusté wrote: Cloudflare started to return TYPE65283 in their NSEC records for "compact DNSSEC denial of existence"/"minimal lies" for NXDOMAINs. It actually breaks established "minimal lies" NXDOMAIN decoding implementations. Does anyone know the usage/purpose of TYPE65283 in this context? If a compact negative response includes an NSEC RR whose type bitmap only includes NSEC and RRSIG, the response is indistinguishable from the case where the name exists but is an empty non-terminal. Adding a special entry in the type bitmap avoids that ambiguity and, as a bonus, provides an NXDOMAINish signal as a kind of compromise to those consumers who are all pitchforky about the RCODE. The spec currently calls that special type NXNAME. https://www.ietf.org/archive/id/draft-huque-dnsop-compact-lies-01.txt The spec is still a work in progress and the NXNAME type does not have a codepoint. I believe TYPE65283 is being used as a placeholder. I think Christian made a comment to that effect on this list last week, although I think he may not have mentioned the specific RRTYPE that was to be used. If this has caused something to break, more details would be good to hear! .. Ok, replying to myself. TYPE65283 is, as you stated, the placeholder for a future NXNAME. So they silently broke their previous implementation to implement half of this draft. Their previous NXDOMAIN implementation corresponds to the draft's ENT case, but they still implement their old way for ENT. Thank you for the pointer. Could you elaborate on the type of breakage you mentioned?
What got broken, specifically? Client-side decoding of the previous-draft NXDOMAIN status as originally encoded by Cloudflare. Having applications relying on the NXDOMAIN status passed by the API, we were forced to add simple minimal-lies decoding in the stub resolver (we don't want to disable DNSSEC validation on our trusted resolver or apply special treatment on it for these clients). The decoding is based on exactly the presence of RRSIG and NSEC in the NSEC record. The NS1 extension for restoring simple ENT identification is compatible with this scheme, as for an ENT you get RRSIG, NSEC and TYPE65281. Now I need to explicitly strip (or special-case) TYPE65283 to restore NXDOMAIN identification from Cloudflare and still identify NXDOMAIN on NS1 and NXDOMAIN or ENT on Route53. If Cloudflare switches to this draft for the ENT case too, it will become as bad as Route53 and only NS1 will give a distinguishable real NXDOMAIN. Otherwise ALL compact-lies response implementers should switch to this new draft and be known to have switched. Thank you, that explains it! I simply did not expect changes to draft implementations to be called "breakage". Yes, I perfectly understand this position towards drafts in the common IETF sense/usage. But as these drafts were and are imposed on us unilaterally, Internet-wide, for years by major DNS service providers, they are sadly de-facto standards. You got me curious: What is the use case that depends on this? I mean, from reading the DNS spec _alone_ it's not clear why any of the variants in use should cause serious problems if it's done correctly. -- Petr Špaček Internet Systems Consortium ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] Cloudflare TYPE65283
On 27. 03. 23 15:27, Emmanuel Fusté wrote: Le 27/03/2023 à 14:34, Petr Špaček a écrit : On 27. 03. 23 13:31, Emmanuel Fusté wrote: Le 27/03/2023 à 12:37, Emmanuel Fusté a écrit : Le 27/03/2023 à 12:14, Joe Abley a écrit : Hi Emmanuel, On Mon, Mar 27, 2023 at 10:51, Emmanuel Fusté wrote: Cloudflare started to return TYPE65283 in their NSEC records for "compact DNSSEC denial of existence"/"minimal lies" for NXDOMAINs. It actually breaks established "minimal lies" NXDOMAIN decoding implementations. Does anyone know the usage/purpose of TYPE65283 in this context? If a compact negative response includes an NSEC RR whose type bitmap only includes NSEC and RRSIG, the response is indistinguishable from the case where the name exists but is an empty non-terminal. Adding a special entry in the type bitmap avoids that ambiguity and, as a bonus, provides an NXDOMAINish signal as a kind of compromise to those consumers who are all pitchforky about the RCODE. The spec currently calls that special type NXNAME. https://www.ietf.org/archive/id/draft-huque-dnsop-compact-lies-01.txt The spec is still a work in progress and the NXNAME type does not have a codepoint. I believe TYPE65283 is being used as a placeholder. I think Christian made a comment to that effect on this list last week, although I think he may not have mentioned the specific RRTYPE that was to be used. If this has caused something to break, more details would be good to hear! .. Ok, replying to myself. TYPE65283 is, as you stated, the placeholder for a future NXNAME. So they silently broke their previous implementation to implement half of this draft. Their previous NXDOMAIN implementation corresponds to the draft's ENT case, but they still implement their old way for ENT. Thank you for the pointer. Could you elaborate on the type of breakage you mentioned? What got broken, specifically?
Client-side decoding of the previous-draft NXDOMAIN status as originally encoded by Cloudflare. Having applications relying on the NXDOMAIN status passed by the API, we were forced to add simple minimal-lies decoding in the stub resolver (we don't want to disable DNSSEC validation on our trusted resolver or apply special treatment on it for these clients). The decoding is based on exactly the presence of RRSIG and NSEC in the NSEC record. The NS1 extension for restoring simple ENT identification is compatible with this scheme, as for an ENT you get RRSIG, NSEC and TYPE65281. Now I need to explicitly strip (or special-case) TYPE65283 to restore NXDOMAIN identification from Cloudflare and still identify NXDOMAIN on NS1 and NXDOMAIN or ENT on Route53. If Cloudflare switches to this draft for the ENT case too, it will become as bad as Route53 and only NS1 will give a distinguishable real NXDOMAIN. Otherwise ALL compact-lies response implementers should switch to this new draft and be known to have switched. Thank you, that explains it! I simply did not expect changes to draft implementations to be called "breakage". -- Petr Špaček Internet Systems Consortium ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] Cloudflare TYPE65283
On 27. 03. 23 13:31, Emmanuel Fusté wrote: Le 27/03/2023 à 12:37, Emmanuel Fusté a écrit : Le 27/03/2023 à 12:14, Joe Abley a écrit : Hi Emmanuel, On Mon, Mar 27, 2023 at 10:51, Emmanuel Fusté wrote: Cloudflare started to return TYPE65283 in their NSEC records for "compact DNSSEC denial of existence"/"minimal lies" for NXDOMAINs. It actually breaks established "minimal lies" NXDOMAIN decoding implementations. Does anyone know the usage/purpose of TYPE65283 in this context? If a compact negative response includes an NSEC RR whose type bitmap only includes NSEC and RRSIG, the response is indistinguishable from the case where the name exists but is an empty non-terminal. Adding a special entry in the type bitmap avoids that ambiguity and, as a bonus, provides an NXDOMAINish signal as a kind of compromise to those consumers who are all pitchforky about the RCODE. The spec currently calls that special type NXNAME. https://www.ietf.org/archive/id/draft-huque-dnsop-compact-lies-01.txt The spec is still a work in progress and the NXNAME type does not have a codepoint. I believe TYPE65283 is being used as a placeholder. I think Christian made a comment to that effect on this list last week, although I think he may not have mentioned the specific RRTYPE that was to be used. If this has caused something to break, more details would be good to hear! Yes, I know about the draft to unbreak ENTs. Thank you for the updated link to the latest version, which supersedes draft-huque-dnsop-blacklies-ent-01. NS1 uses TYPE65281 for ENTs. But in the observed case, the entry is not an ENT: ; <<>> DiG 9.18.13-1-Debian <<>> +norecurse @ns3.cloudflare.com +dnssec albert.ns.cloudflare.com.
; (4 servers found) ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19880 ;; flags: qr aa; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags: do; udp: 1232 ;; QUESTION SECTION: ;albert.ns.cloudflare.com. IN A ;; AUTHORITY SECTION: cloudflare.com. 300 IN SOA ns3.cloudflare.com. dns.cloudflare.com. 2304565806 1 2400 604800 300 albert.ns.cloudflare.com. 300 IN NSEC \000.albert.ns.cloudflare.com. RRSIG NSEC TYPE65283 albert.ns.cloudflare.com. 300 IN RRSIG NSEC 13 4 300 20230328112618 20230326092618 34505 cloudflare.com. vNF+qAaZUSSreKRLhYHfg5sn7qoP1SV+fZgmivg3qmJecz7Cvp69A/8I Ew0XPOuG8CPQGA5doswZdnOk9cfLRw== cloudflare.com. 300 IN RRSIG SOA 13 2 300 20230328112618 20230326092618 34505 cloudflare.com. fD4t5hWnE7js8/gRqJn2G833NCmjcyFqW+WJZnPqHX3SiKBlwUlX2wh8 UFj0ajbwuTVQpiJxZSb5hUNs9+KErQ== ;; Query time: 8 msec ;; SERVER: 162.159.0.33#53(ns3.cloudflare.com) (UDP) ;; WHEN: Mon Mar 27 12:26:18 CEST 2023 ;; MSG SIZE rcvd: 376 And for ENTs, the response did not change from the previous Cloudflare implementation: all types known to Cloudflare are added instead of RRSIG and NSEC. Ok, replying to myself. TYPE65283 is, as you stated, the placeholder for a future NXNAME. So they silently broke their previous implementation to implement half of this draft. Their previous NXDOMAIN implementation corresponds to the draft's ENT case, but they still implement their old way for ENT. Thank you for the pointer. Could you elaborate on the type of breakage you mentioned? What got broken, specifically? -- Petr Špaček Internet Systems Consortium ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
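For readers decoding such responses themselves, the NSEC type bitmap in the dig output above (RRSIG NSEC TYPE65283) uses the window-block wire format of RFC 4034 section 4.1.2. The following sketch decodes that format and applies the classification discussed in the thread; the function names and labels are illustrative, and the TYPE65283 codepoint is only the placeholder mentioned above, not an assigned value:

```python
def decode_type_bitmap(data: bytes) -> set:
    """Decode an NSEC/NSEC3 type bitmap (RFC 4034 section 4.1.2):
    a sequence of [window number][octet count][bitmap octets] blocks,
    where bit 0 of each octet is the most significant bit."""
    types, i = set(), 0
    while i < len(data):
        window, length = data[i], data[i + 1]
        for j, octet in enumerate(data[i + 2:i + 2 + length]):
            for bit in range(8):
                if octet & (0x80 >> bit):
                    types.add(window * 256 + j * 8 + bit)
        i += 2 + length
    return types

NXNAME_PLACEHOLDER = 65283  # placeholder codepoint observed in the thread

def classify_compact_denial(types: set) -> str:
    """Classify a compact-denial NSEC per the thread: RRSIG=46, NSEC=47."""
    if NXNAME_PLACEHOLDER in types:
        return "NXDOMAIN"  # name provably does not exist (draft signal)
    if types == {46, 47}:
        return "ENT"       # ambiguous pre-draft encoding / empty non-terminal
    return "other"
```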
Re: [dns-operations] DNS measurement traffic etiquette
On 01. 01. 23 20:22, Olafur Gudmundsson wrote: Andreas, Do not bother to reach out to anyone these are unmanaged automated systems. I once ran an experiment where query names were unique (i.e. only used once and derived from the IP address the query was sent to) I was still receiving “repeat queries” a year later. The queries came from “cloud compute” instances that had nothing to do with the original query. Some of they queries came to the address that “sent" the query but others followed the delegation information for the domain The interesting fact was how periodic those queries were ==> this was generated by cron jobs by someone doing something DNS related … +1 to what Olafur said. It might very well be *me* doing automated PCAP replays in AWS, or anyone else doing DNS research, or some sort of QA on DNS software. And of course, malware. I guess blog post https://blog.apnic.net/2016/04/04/dns-zombies/ might give you some insight - at least you are not alone :-) Petr Špaček Internet Systems Consortium Olafur On Dec 21, 2022, at 9:27 PM, Andreas Ott wrote: About two months ago we retired a network lab at my work by disconnecting it from the internet, and at the time I (naively) removed from the lab domain name all forward DNS records pointing to assets that no longer exist. When it was still live we had forward DNS and reverse PTR records, and in most cases these matched, further, you were most likely to get back consistent answers on forward lookup of the reverse answer. About a week after the closure I also had the reverse DNS records removed from the ISP servers that were authoritative for the in-addr.arpa zones. All caching timeouts would have long occurred by now if an entity would honor what had been in the SOA records. If I query any old records today they do return NXDOMAIN for me. I did move the authoritative DNS servers to a much smaller setup thinking with the retirement of the assets there would be less traffic asking for them. 
However, I am still seeing significant traffic querying forward records of PTR answers that were deleted a long time ago. It appears that this is "measurement" traffic that ignores getting "no" aka NXDOMAIN as an answer, and keeps sending the same queries over and over. I identified one "DNS labs" entity by name as one of the sources of these queries and will attempt to contact them. Most of the other, now useless, queries come from anonymous cloud-compute sources, like AWS nodes, which have generic reverse DNS entries and don't allow identifying the responsible party. To me it looks like the case of something being removed from the internet for good is not accounted for when constructing the measurement operations: if you get NXDOMAIN, you interpret it as some kind of brokenness that should be back soon, so you keep asking thousands more times until you get an answer? What are my best options to find out who is behind all this traffic when it comes from anonymous sources? For how long should I expect this query traffic to continue? Or is there a way to politely signal to the queriers, via any DNS parameter, that the record is now gone for good and they can stop asking, and that nothing is broken that will be fixed soon? Thanks, andreas ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
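Olafur's unique-name experiment above is straightforward to reproduce for anyone who wants to measure replays against their own zone. A minimal sketch, assuming a hypothetical probe.example. zone and a simple reversible encoding (not Olafur's actual setup):

```python
def unique_qname(target_ip: str, zone: str = "probe.example.") -> str:
    """Derive a single-use query name from the IP address the query
    is sent to.  Any later repeat of the same name is provably a
    replay, and the label reveals which server originally saw it."""
    # dots/colons are not valid in a hostname label, so encode them
    label = target_ip.replace(".", "-").replace(":", "-")
    return label + "." + zone
```

Replayed queries arriving months later can then be attributed simply by decoding the leftmost label of the QNAME seen in the authoritative server's logs.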
Re: [dns-operations] Trouble with qa.ws.igt.fiscal.treasury.gov
On 18. 10. 22 17:58, Viktor Dukhovni wrote: By the way, is the validation workflow used in BIND written up somewhere as a separate document, or are the comments in the code the best way to understand how BIND validates names below a trust anchor (finding either a valid signature or an insecure delegation)? Code is your guide :-) Now seriously: I don't think documenting it is either a) necessary or b) a good idea. It can change between versions, and we certainly do not want people to depend on particular behavior. We want people to follow the protocol! -- Petr Špaček Internet Systems Consortium ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] Input from dns-operations on NCAP proposal
On 24. 05. 22 17:54, Vladimír Čunát via dns-operations wrote: On 23/05/2022 15.48, Thomas, Matthew via dns-operations wrote: Configuration 1: Generate a synthetic NXDOMAIN response to all queries with no SOA provided in the authority section. I believe the protocol says not to cache such answers at all. Some implementations chose to cache for at least a few seconds, but I don't think all of them do. Breaking caching seems risky to me, as traffic could increase very much (if the TLD was queried a lot). Configuration 2: Generate a synthetic NXDOMAIN response to all queries with a SOA record. Some example queries for the TLD .foo are below: It still feels a bit risky to answer in this non-conforming way, and I can't really see why one would attempt that. At the apex, the NXDOMAIN would deny the SOA included in the very same answer... Configuration 3: Use a properly configured empty zone with correct NS and SOA records. Queries for the single-label TLD would return a NOERROR and NODATA response. I expect that's OK, especially if it's a TLD that's seriously considered. I'd hope that "bad" usage is mainly sensitive to the existence of records of other types like A. Generally I agree with Vladimír: Configuration 3 is the way to go. Non-compliant responses are riskier than protocol-compliant responses, and option 3 is the only compliant variant in your proposal. Reasoning: behavior for a non-compliant answer is basically undefined, because most RFCs do not describe what to do when a MUST condition is violated. It's hard to see how further evaluation of undefined behavior would help with determining further course of action. -- Petr Špaček @ Internet Systems Consortium ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] You live in a dump, Quoyle!
On 14. 02. 22 19:31, Viktor Dukhovni wrote: On Mon, Feb 14, 2022 at 09:48:09AM -0800, Fred Morris wrote: They're full (the DNS is full) of patterns and antipatterns. One fractal rabbit hole example: [0] [0] The DNS protocol allows multiple rvalues per type per oname. This works OK for e.g. A/AAAA, is disallowed for CNAME, and is... I'm not sure what it is for PTR records. Multiple PTR records are legal, but not a best (or even sound) practice. If an app is using hostnames in ACLs, it means you need to list them all. SMTP servers in some cases require clients to have FCrDNS (forward-confirmed reverse DNS) names. This requires the DNS to return: client IP -> pick a PTR -> A/AAAA RRset including the same IP. This works even in the presence of multiple PTRs, provided they all resolve to address lists that contain the input address. Things tend to work poorly when automation adds a PTR record for every forward "name -> IP" mapping with a given address. One then sometimes ends up with absurdly large PTR RRsets that consume tens of KB in a TCP fallback after TC=1. Best practice is to choose just one "primary" name as the PTR for a given IP. Things tend to work poorly in other cases, too. My favorite is: $ dig -x 66.172.247.9 and the associated $ dig cmts1-dhcp.longlines.com -- Petr Špaček ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
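The FCrDNS rule Viktor describes can be sketched as a small predicate. The lookup callables are injected here so the logic is testable; in real use they would wrap actual PTR and A/AAAA resolution. This is an illustrative sketch, not any MTA's actual code:

```python
def fcrdns_ok(client_ip, ptr_lookup, addr_lookup):
    """Forward-confirmed reverse DNS: client IP -> PTR name(s)
    -> A/AAAA RRset containing the same IP.

    ptr_lookup(ip) returns the PTR names for an address;
    addr_lookup(name) returns the A/AAAA addresses for a name.
    Since a verifier may pick any single PTR, the check passes
    reliably only if *every* PTR resolves back to the client IP."""
    names = ptr_lookup(client_ip)
    return bool(names) and all(client_ip in addr_lookup(n) for n in names)
```

With the oversized auto-generated PTR RRsets described above, a single stale name in the set is enough to make this check flaky.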
Re: [dns-operations] Command Line BIND Query to Delegating Name Servers Gives FORMERR - Is this Bad for Normal DNS Operations by Public Resolvers?
Hi Jason, I think you already answered yourself in the blog post you referenced: https://kevinlocke.name/bits/2017/01/20/formerr-from-microsoft-dns-server-for-dig/ > This behavior appears to violate “Any OPTION-CODE values not understood by a responder or requestor MUST be ignored.” from Section 6.1.2 of RFC 6891, but that is of small consolation for a non-working system. So yes, the authoritative server most likely has a bug. How to approach the operator in question - that's a hard problem. You can either try the various contacts you find, or you can send the name of the domain here and ask the operators to contact you off-list. For TLDs this method can work surprisingly well :-) Good luck. Petr Špaček On 20. 09. 21 15:37, Jason Hynds wrote: Hi, I hope that the following conforms to the content expected of this list. I stumbled on some /name servers/ (a branch of a ccTLD, performing a public-good service, as far as I know) which are giving a FORMat ERRor (FORMERR) to default /dig/ queries from the command line as described in the referenced webpage, see [1] below. The workaround of +nocookie described in the blog allows for a successful query response. /Nslookup/ queries work fine. I should mention that I have no administrative authority over the name servers showing this condition. I just noticed the behaviour whilst checking on a DNS hosting migration for a client of the name servers exhibiting the behaviour. Would someone be able to advise me on: 1. How bad it may be for an authoritative or delegating name server to be exhibiting this behaviour? 2. Does this potentially cause a resolution outage, or would a BIND server adjust and re-query in order to obtain a usable result? 3. Is the BIND server non-compliant, or the likely Microsoft DNS non-compliant, to an RFC? 4. How would I explain such an issue to a name server operator whom I do not know? I appreciate any guidance provided. I apologize in advance if I violated any list policy. Thanks for any assistance. 
*REFERENCE* [1] FORMERR from Microsoft DNS Server for DIG. Posted January 20, 2017 at 11:18 PM MST by Kevin Locke <https://kevinlocke.name/bits/2017/01/20/formerr-from-microsoft-dns-server-for-dig>. Regards, Jason Hynds. ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] Incomplete type bitmaps in NSEC(3) records and aggressive use of DNSSEC validated cache
On 08. 09. 21 11:12, Ruben van Staveren via dns-operations wrote: Last month or so I saw two domains, postnl.nl and minjenv.nl, return incomplete NSEC3 records where existing records were omitted from the Type Bit Maps. This caused strange intermittent failures when a resolver was used that implements aggressive use of DNSSEC-validated cache (RFC 8198, 4 years old), e.g. PowerDNS Recursor 4.5.x. For example, minjenv.nl has an MX record, but it is not listed in the NSEC3 you’ll get if you query for the nonexistent A/AAAA record (only NS SOA RRSIG DNSKEY NSEC3PARAM), causing mail delivery failures until the TTL expires. postnl.nl has A/AAAA, but the NSEC3 seen for a nonexistent query only has NS SOA MX TXT RRSIG DNSKEY NSEC3PARAM. The question is not so much how to contact the DNS operators and persuade them to upgrade/fix the software used for DNSSEC signing, but rather whether we should do more analysis of this phenomenon and perhaps even have a DNS flag day before even more resolvers and operators implement RFC 8198. There might be an issue with someone deliberately exploiting this to make websites/mail unreachable. Your estimate is correct, it's an old issue with F5 load balancers: https://support.f5.com/csp/article/K00724442 It's a security issue and affected parties should patch their systems. A detailed description of the problem can be found e.g. here: https://en.blog.nic.cz/2019/07/10/error-in-dnssec-implementation-on-f5-big-ip-load-balancers/ -- Petr Špaček ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
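When debugging reports like this, it helps to compute offline which NSEC3 record should cover a given name, so the type bitmap can be compared against the zone's actual contents. A minimal sketch of the RFC 5155 hash (SHA-1 is the only algorithm defined), checked against the test vector from RFC 5155 Appendix A:

```python
import base64
import hashlib

def nsec3_hash(name: str, salt_hex: str, iterations: int) -> str:
    """RFC 5155 NSEC3 hash: iterated SHA-1 over the canonical
    (lowercase) wire-format name, output in base32hex."""
    labels = name.lower().rstrip(".").split(".")
    wire = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    salt = bytes.fromhex(salt_hex)
    digest = hashlib.sha1(wire + salt).digest()
    for _ in range(iterations):  # "iterations" extra passes beyond the first
        digest = hashlib.sha1(digest + salt).digest()
    # map standard base32 output onto the base32hex alphabet (RFC 4648)
    return base64.b32encode(digest).decode("ascii").translate(str.maketrans(
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
        "0123456789abcdefghijklmnopqrstuv"))
```

With the zone's salt and iteration count from its NSEC3PARAM record, this gives the owner name of the NSEC3 record that a validating resolver will use for aggressive negative caching.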
Re: [dns-operations] Injection Attacks Reloaded: Tunnelling Malicious Payloads over DNS
On 30. 08. 21 18:01, Vladimír Čunát wrote: On 30/08/2021 17.02, Petr Špaček wrote: [...] It is clear to this group of DNS experts, but I think we should lend a helping hand to DNS consumers and at least explain why consumers have to check everything. Is anyone interested in writing a short RFC on this topic? That might serve as a good reference when some DNS expert points out to others why they shouldn't be doing what they're doing. However, I don't think we can expect a new RFC (by itself) to reduce these cases: *if* they were reading DNS RFCs, they would've surely realized that they need to be more careful. Only if people were reading all of the DNS RFCs, but that's IMHO an unreasonable expectation for DNS data _consumers_ who do not (and should not) care about the inner workings of DNS. The vast majority of DNS RFCs do not talk about data consumers, and the set of consumers is, I guess, almost disjoint from the set of DNS software vendors and server operators who are, I think, the primary target of the existing RFCs. I would have a hard time if I wanted to send a link to relevant docs to an application developer who wants to use DNS data provided by a resolver library today. Most likely, I would need a bunch of links to several documents, with custom commentary to explain which parts to read and in what order. For this reason, I think it would be good to have a document explicitly focused on consumers of DNS data. I think it should answer questions like: - What's reasonable input to the resolver library? (E.g., an attacker might trick your code into calling the library with attacker-provided input, etc.) - What should you do with resolver library output? (Beware: it's binary, check syntax, it might be from the attacker's server, etc.) -- Petr Špaček ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
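As one concrete example of what such a consumer-focused document could spell out: a name coming back from a resolver library is untrusted binary data, and if the application expects a hostname it must enforce hostname (LDH) syntax itself. A minimal sketch; the exact policy (IDN labels, underscores in service names, etc.) is application-specific:

```python
import re

# One LDH label: letters/digits/hyphens, no leading/trailing hyphen, max 63 octets.
LDH_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")

def is_safe_hostname(name: str) -> bool:
    """Reject resolver output that is not plain hostname syntax.
    DNS itself allows arbitrary bytes in names, so the *consumer*
    must enforce the stricter syntax its application expects."""
    name = name.rstrip(".")
    if not name or len(name) > 253:
        return False
    return all(LDH_LABEL.match(label) for label in name.split("."))
```

A check like this belongs at the boundary between the resolver library and the rest of the application, before the name reaches shell commands, log files, HTML, or certificate matching.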
Re: [dns-operations] Injection Attacks Reloaded: Tunnelling Malicious Payloads over DNS
On 17. 08. 21 22:17, Tony Finch wrote: Viktor Dukhovni wrote: If applications make unwarranted assumptions about the syntax of DNS replies, that's surely an application bug, rather than an issue in DNS. I particularly liked this paper because it's a really good example of a common cause of security problems: when it isn't clear whose responsibility it is to enforce an important restriction, in this case, hostname syntax vs. DNS name (lack of) syntax. And different implementers have made different choices, for instance whether the libc stub resolver enforces hostname syntax or not. And another classic vulnerability generator: standard APIs that make it easy for non-specialists to step on every rake in the grass. In this case, if an application needs something more fancy than getaddrinfo(), it has to contend with the low-level resolver API which is just about better than nothing for parsing DNS packets, but certainly won't help you handle names that ought to have restricted syntax (service names, mail domains, etc...) So I don't think the problems can be dismissed as simply application bugs: the problems come from mismatches in expectations at the boundary between the DNS and the applications. And the DNS is notorious (the subject of memes!) for being far too difficult to use correctly. I'm late to this thread, but ... IMHO the authors of the paper highlight a valid point: there is no _explicit_ guidance for consumers of DNS data which explains that the results of the DNS resolution process must be treated very carefully. It is clear to this group of DNS experts, but I think we should lend a helping hand to DNS consumers and at least explain why consumers have to check everything. Is anyone interested in writing a short RFC on this topic? -- Petr Špaček ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] why does that domain resolve?
On 04. 06. 21 18:56, Paul Vixie wrote: On Fri, Jun 04, 2021 at 12:22:10PM -0400, Anthony Lieuallen via dns-operations wrote: This is a question of being parent- vs. child-centric. The parents in the DNS tree delegate correctly. The fact that the children delegate incorrectly can be a small or non-issue depending on the resolver. those NS RRs are authoritative at the apex of the child, but not at the leaf of the parent. this means they have higher credibility, and also that they can be DNSSEC signed and validated. credibility and validity _matter_. Google Public DNS uses only parent delegations ( https://developers.devsite.corp.google.com/speed/public-dns/docs/troubleshooting/domains#delegation ). Largely for issues like this: the child delegations can be wrong, but for the domain to work at all, the parent delegations must be correct. without broad and deep failure, the quality of apex NS names will never improve. (Resolvers that choose to use child delegations will likely in this case discover that these delegations are bogus, and be left with only the valid delegations, from the parent.) at which point they should return SERVFAIL. failure _matters_. Personally, with all the experience we have in 2021, I find the historic decision to put authoritative NS RRs on the child side to be a poor choice, to the point of being indefensible. As Anthony points out, the parent version of the NS has to work anyway. This forces me to think that a better course of action would be to ignore child-side NS instead of adding complex asynchronous code paths to validate child NS, which are not technically needed. I mean - why waste resources on improving something which is not even needed? (To be clear: This is my personal opinion, and I'm sure some of my colleagues at ISC will disagree violently.) -- Petr Špaček ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] [Ext] Possibly-incorrect NSEC responses from many RSOs
On 03. 03. 21 7:35, Viktor Dukhovni wrote: On Wed, Mar 03, 2021 at 06:04:45AM +, Paul Vixie wrote: A laudable goal, but exposing RRSIG as a bare RRset one can query does not look like a viable path forward. So I don't see this happening. You described several cases in which RRSIGs wouldn't be stable enough. in my own role as signer, the rrsigs are refreshed by cron on sundays, and so i think we're both looking at anecdotes here, worst or best case scenarios, and what you don't see happening isn't totally compelling. Another basic issue with RRSIG queries, already mentioned by Brian Dickson, is that there's no way to ask for the RRSIG of a specific RRset; one can (at present) only ask for all (or any subset) of the RRSIGs associated with a given name, and returning them all (at least over UDP) is often not a good idea. So, as noted by Tony Finch, the DNSSEC-oblivious iterative resolver may (as already recommended) get back from its authoritative upstream only a random representative record (just as with ANY queries), which is again often not the RRSIG you're looking for. For the record, "respond with a randomly selected RRSIG" has been implemented in Knot DNS 3.0.0, released in September 2020 [1]. Apparently the sky did not fall. [1] https://www.knot-dns.cz/2020-09-09-version-300.html -- Petr Špaček @ ISC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] Quad9 DNSSEC Validation?
On 28. 02. 21 9:39, Florian Weimer wrote: * Winfried Angele: I guess they've turned off validation for irs.gov because of a former failure. I think it goes beyond that. It extends to GOV and MIL as a whole, it seems. In my experience negative trust anchors for big parts of MIL and/or GOV are way more common than that; let's not pick specifically on Quad9. For periods of time I have seen them at other big resolver operators as well. IMHO resolver market economics work against DNSSEC security. If resolution does not work on one operator, people routinely switch to another where it "works", either because they do not validate at all, or because their ops team already added a negative trust anchor. The only way to fix this is a mutual agreement among operators to stop working around someone else's mistakes. Are there operators willing to participate in such an effort? -- Petr Špaček @ ISC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] CLI Tool for DoH
On 29. 09. 20 3:30, cjc+dns-o...@pumpky.net wrote: > Looking for a command line tool to do testing of DoH. Something like > dig or drill with DoH support. I suspect there's a Python tool or > the like out there somewhere, but my google-fu is failing. > > Don't want to re-invent the wheel if I don't have to. Knot DNS 3.0 has DoH support in kdig. Examples for various DoH server implementations: $ kdig @1.1.1.1 +https example.com. $ kdig @193.17.47.1 +https=/doh example.com. $ kdig @8.8.4.4 +https +https-get example.com. Version 3.0 was released a couple of weeks ago and might not be in Linux distributions yet. Packages for common distributions, and also the source code, are available from https://www.knot-dns.cz/download/ -- Petr Špaček @ CZ.NIC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
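Under the hood, a DoH GET request like kdig's +https-get mode is just a DNS message, base64url-encoded into the dns= query parameter (RFC 8484). A minimal sketch of constructing one by hand; the server URL is illustrative, and no EDNS record is attached to keep it short:

```python
import base64
import struct

def doh_get_url(server: str, qname: str, qtype: int = 1) -> str:
    """Build an RFC 8484 GET URL for a DNS query.  Message ID 0 and
    padding-free base64url, as the RFC suggests for cache friendliness."""
    header = struct.pack(">HHHHHH", 0, 0x0100, 1, 0, 0, 0)  # ID=0, RD=1, QDCOUNT=1
    qwire = b"".join(bytes([len(l)]) + l.encode("ascii")
                     for l in qname.rstrip(".").split(".")) + b"\x00"
    msg = header + qwire + struct.pack(">HH", qtype, 1)  # QTYPE, QCLASS=IN
    dns = base64.urlsafe_b64encode(msg).rstrip(b"=").decode("ascii")
    return f"{server}?dns={dns}"

# e.g. doh_get_url("https://1.1.1.1/dns-query", "example.com.")
```

The resulting URL can be fetched with any HTTP/2-capable client using Accept: application/dns-message; the response body is a plain DNS message.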
Re: [dns-operations] DNS Flag Day 2020 will become effective on 2020-10-01
On 15. 09. 20 13:16, Yasuhiro Orange Morishita / 森下泰宏 wrote: > Petr-san, > > Thank you for your clarification :-). > But I have another question. > > In my understanding, the official spelling of the day is "DNS flag > day". In the 2019 webpage, all of the spellings are lowercase. > > But the spellings are not unified in the 2020 webpage. > The ... and top of the official webpage's spellings > are lowercase, but the content includes the capitalized one. > > This may be a trivial question, but it is important for providing the > information to the related parties, I think. > It would be helpful if you could clarify it. It would be great if someone with a strong opinion or expertise in English could create a merge request to unify it: https://github.com/dns-violations/dnsflagday I'm not a native speaker so I would have to flip a coin to decide :-) Petr Špaček @ CZ.NIC > > -- Orange > > From: Petr Špaček > Subject: Re: [dns-operations] DNS Flag Day 2020 will become effective on > 2020-10-01 > Date: Wed, 9 Sep 2020 10:39:54 +0200 > >> Hi Orange-san, >> >> On 09. 09. 20 7:00, Yasuhiro Orange Morishita / 森下泰宏 wrote: >>> Hi Petr-san, >>> >>> I tested some auth servers and resolvers by online checker in the >>> official website. >>> >>> But I feel that both of them display "GO" even if EDNS buffer size is >>> not set to 1232. Is this by design? >> >> This is fine as long as all the authoritative servers work over DNS-over-TCP >> and respect EDNS buffer size sent by resolvers. >> >> The reason is that the effective EDNS buffer size is the minimal value from >> (client, server) pair. Consequently, once resolvers update their defaults, >> the change will become effective without any changes on the auth side. >> >> Lower EDNS buffer size might force fallback to TCP if auths are sending >> longer answers - that's why the web tester is checking availability of >> DNS-over-TCP. >> >> I hope it helps. 
>> >> If you can point to a section on https://dnsflagday.net/2020/ which should >> contain this answer I will be happy to add it there. >> >> Have a nice day! >> Petr Špaček @ CZ.NIC >> >> >> >>> >>> -- Orange >>> >>> From: Petr Špaček >>> Subject: [dns-operations] DNS Flag Day 2020 will become effective on >>> 2020-10-01 >>> Date: Tue, 8 Sep 2020 12:04:39 +0200 >>> >>>> Dear DNS people. >>>> >>>> We are happy to announce next step for DNS Flag Day 2020. >>>> >>>> Latest measurements indicate that practical breakage caused by the >>>> proposed change is tiny [1]. In other words we can conclude that the >>>> Internet is ready for the change. >>>> >>>> The long delayed DNS Flag Day will become effective on 2020-10-01 (October >>>> 1st 2020)! >>>> >>>> Detailed information including test tools and technical description of the >>>> change can be found at https://dnsflagday.net/2020/ . >>>> >>>> For questions please use dns-operations@lists.dns-oarc.net mailing list. >>>> >>>> [1] >>>> https://github.com/dns-violations/dnsflagday/issues/139#issuecomment-673489183 >>>> >>>> -- >>>> Petr Špaček @ CZ.NIC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] QTYPEs 65 and 65479
On 16. 09. 20 10:04, Greg Choules via dns-operations wrote: > Recently, whilst looking for something else, tcpdump on one of our recursive servers showed we are receiving queries with (from its point of view) unrecognised types. Wireshark doesn't have a decode for them yet either. There aren't many, yet. But it's more than just noise. > A quick reverse lookup on the sources shows them all to be iPhone X or later. > > Can anyone shed some light on what these are and whether we should be doing > something about them? QTYPE 65 is the new HTTPS binding RR type. (https://www.iana.org/go/draft-ietf-dnsop-svcb-https-00) 65479 is in the Private Use range, so it is hard to tell. (See https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml) -- Petr Špaček @ CZ.NIC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] DNS Flag Day 2020 will become effective on 2020-10-01
On 11. 09. 20 6:47, Paul Vixie wrote: > Ondřej Surý wrote on 2020-09-10 21:25: >> Paul, >> >> do you actually believe that shouting the same thing over and over will >> achieve anything? > > no, of course not. > >> >> We’ve heard you before, we’ve listened to you, we’ve considered your >> arguments, and you haven’t convinced us and there’s a consensus between the >> vendors to go ahead with the change because it’s beneficial for the DNS >> ecosystem. > > i think you changed the definition of the words "we" and "us" midsentence. My non-native-speaker reading suggests it is in both cases referring to the "consensus between the vendors". Let's not get distracted. > >> >> Sending multiple shouts to mailing lists, issue tracker, etc... because you >> have different opinion is not helpful to the DNS community nor to the cause. >> We are as much DNS experts as you are. > > i don't think all of the people i intend to address here have heard my views. > they may think that dns-oarc speaks for the community rather than for a small > self selected team. they may also think that i as co-founder of dns-oarc can > be relied upon to support this activity. so, thank you for your concern for > my reputation (or my sanity, if that's also true), but i'll continue. if you > wish to actually respond to any of my claims, i am listening. if you wish to > continue to ignore those claims, i will cope. > > this isn't a flag day and shouldn't be called that. it cheapens the term. > > 1232 is a cargo-cult number. we must not revere as holy those things which > fall out of the sky. I disagree. That number is based on the real-world experience of today's DNS resolver vendors - on their experience with the un/reliability of real configurations. Later research https://indico.dns-oarc.net/event/36/contributions/776/attachments/754/1277/DefragDNS-Axel_Koolhaas-Tjeerd_Slokker.pdf showed that the estimate based on the vendors' experience was pretty good. > > there is a right way to deprecate fragmentation. 
it would not involve adding config complexity. Well, this is not adding any complexity at all! It is the other way around: a] All the configuration knobs for EDNS buffer size were already in the software; vendors are _just changing default values_ in their own software (as opposed to adding new options). b] This effort actually enables vendors to remove code/fallback logic which attempts to guess a working EDNS buffer size, thus _reducing_ complexity of the DNS software and real-world operations. > > there is a right way to reach consensus. it's an RFC draft, not a github repo for the initiated. > > in the testing referenced by the "flagday2020" web page, there was no significant difference in loss between 1200 and 1400. there will be a significant difference in truncation and tcp retry. I think readers on this list can draw conclusions for themselves; there is no need to hand-wave. Slides are here: https://indico.dns-oarc.net/event/36/contributions/776/attachments/754/1277/DefragDNS-Axel_Koolhaas-Tjeerd_Slokker.pdf Do not forget to add the ethernet + IP + UDP headers to the EDNS buffer size when comparing numbers from the slides, i.e. 14 + 20 + 8 + 1232 = MTU 1274 B vs. MTU 1442 B. See slides 19 and 20 (= recursive resolvers) and do the math. Is lowering the failure rate roughly by 0.8 % for IPv4 and by 0.33 % for IPv6 significant or not? That's a matter for each DNS vendor to decide, because in the end it is the vendors who have to support the software and deal with all the obscure failure reports. -- Petr Špaček @ CZ.NIC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] DNS Flag Day 2020 will become effective on 2020-10-01
Hi Orange-san, On 09. 09. 20 7:00, Yasuhiro Orange Morishita / 森下泰宏 wrote: > Hi Petr-san, > > I tested some auth servers and resolvers by online checker in the > official website. > > But I feel that both of them display "GO" even if EDNS buffer size is > not set to 1232. Is this by design? This is fine as long as all the authoritative servers work over DNS-over-TCP and respect EDNS buffer size sent by resolvers. The reason is that the effective EDNS buffer size is the minimal value from (client, server) pair. Consequently, once resolvers update their defaults, the change will become effective without any changes on the auth side. Lower EDNS buffer size might force fallback to TCP if auths are sending longer answers - that's why the web tester is checking availability of DNS-over-TCP. I hope it helps. If you can point to a section on https://dnsflagday.net/2020/ which should contain this answer I will be happy to add it there. Have a nice day! Petr Špaček @ CZ.NIC > > -- Orange > > From: Petr Špaček > Subject: [dns-operations] DNS Flag Day 2020 will become effective on > 2020-10-01 > Date: Tue, 8 Sep 2020 12:04:39 +0200 > >> Dear DNS people. >> >> We are happy to announce next step for DNS Flag Day 2020. >> >> Latest measurements indicate that practical breakage caused by the proposed >> change is tiny [1]. In other words we can conclude that the Internet is >> ready for the change. >> >> The long delayed DNS Flag Day will become effective on 2020-10-01 (October >> 1st 2020)! >> >> Detailed information including test tools and technical description of the >> change can be found at https://dnsflagday.net/2020/ . >> >> For questions please use dns-operations@lists.dns-oarc.net mailing list. >> >> [1] >> https://github.com/dns-violations/dnsflagday/issues/139#issuecomment-673489183 >> >> -- >> Petr Špaček @ CZ.NIC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
[dns-operations] DNS Flag Day 2020 will become effective on 2020-10-01
Dear DNS people. We are happy to announce the next step for DNS Flag Day 2020. The latest measurements indicate that practical breakage caused by the proposed change is tiny [1]. In other words, we can conclude that the Internet is ready for the change. The long-delayed DNS Flag Day will become effective on 2020-10-01 (October 1st 2020)! Detailed information, including test tools and a technical description of the change, can be found at https://dnsflagday.net/2020/ . For questions please use the dns-operations@lists.dns-oarc.net mailing list. [1] https://github.com/dns-violations/dnsflagday/issues/139#issuecomment-673489183 -- Petr Špaček @ CZ.NIC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] FlagDay 2020 UDP Size
On 04. 08. 20 18:26, Viktor Dukhovni wrote: > On Mon, Aug 03, 2020 at 09:44:17PM +0100, Tony Finch wrote: > >> jack tavares wrote: >>> >>> I have gone through the archives, is there consensus on this at this time? >>> For both the date of Flag Day (Which appears to be 1st October 2020, >>> pending confirmation from google) >>> and for the suggested default? >> >> There are some interesting measurements in >> https://rp.delaat.net/2019-2020/p78/report.pdf > > What I haven't seen reported is measurements of problems that occur when > the EDNS(0) UDP buffer size is *too small*. > > There are lots of measurements with lost UDP datagrams when the buffer > size is too large, but given a "too small" buffer size servers truncate > responses, and some don't also support TCP. This causes lookup failures > when the buffer size is sufficiently low. You are right, and that's exactly the reason why the https://dnsflagday.net/2020/ web test tool focuses on TCP availability. It is way easier to test whether "TCP works for all auths for a given domain" than to test whether "IP fragments can traverse all relevant paths over the Internet for all relevant answer sizes". The second option is just infeasible/madness. Once we get TCP working, we do not need to worry that a too-small EDNS buffer will break something; it might only make things less efficient... Of course, once proponents of perfect-EDNS-buffer-size-detection methods implement them in a resilient and scalable way, we can move on to these (at the moment hypothetical) better methods and get rid of this slight inefficiency. -- Petr Špaček @ CZ.NIC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] RFC 6975 (was: Re: Algorithm 5 and 7 trends (please move to 8 or 13))
On 09. 06. 20 7:16, Brian Somers wrote: > We turned this up again on Friday and turned it down yet again today. There > are issues with zacks.com and I’m told there are a bunch of other support > tickets (although details haven’t been given yet). > > $ dig +noall +answer +tries=1 +ednsopt=5:08 zacks.com @208.65.116.45 > > ;; connection timed out; no servers could be reached > $ dig +noall +answer +tries=1 +subnet=1.2.3.0/24 zacks.com @208.65.116.45 > ;; connection timed out; no servers could be reached > $ dig +noall +answer +tries=1 zacks.com @208.65.116.45 > zacks.com. 1200 IN A 208.65.116.3 > > I’m now looking at re-implementing the code we had in place for EDNS > probing prior to flag day 2019: > - FORMERR/SERVFAIL/NOTIMP - try without any EDNS codes > - No response - try with no EDNS codes on the third attempt Please don't do that; it would further cement the DNS protocol in 1998. If you really really _really_ need a simple "workaround", send RFC 6975 signals only to root servers. That should be enough to provide researchers with data on how RFC 6975+algo deployment is going without breaking weird auths. See below for a long-term proposal. > > Still trying to think of a way to make this negatively affect the domain that > misbehaves without negatively affecting our support folks :( > > Any tips around this would be helpful (any resolvers do ECS probing for > example?). I think we (= resolver vendors) should coordinate first. There is no rush for RFC 6975 deployment, so we can plan and act together to finally get the DNS protocol into the 21st century. For example, coordinated RFC 6975 deployment on major resolvers could push auths to action. DNS Flag Day 2019 had major impact, and the DAU/DHU/N3U options are an opportunity to clear up the rest. If needed we can modify https://gitlab.labs.nic.cz/knot/edns-zone-scanner/ to also test the DAU/DHU/N3U options and compile a list of shame; unfortunately that is the only thing which seems to work. 
(I'm happy to help with that but I'm sick at the moment, let's talk later...) Petr Špaček @ CZ.NIC > > — > Brian > >> On Jun 3, 2020, at 1:52 AM, Petr Špaček wrote: >> >> On 03. 06. 20 7:18, Brian Somers wrote: >>> On May 28, 2020, at 10:35 PM, Viktor Dukhovni >>> wrote: >>>> >>>> Enough time has passed since the need to abandon SHA-1 has become >>>> more pressing to discern at least a couple short-term trend-lines. >>> >>> Along these lines, have any of the large resolvers implemented >>> RFC 6975 (DAU/DHU/N3U EDNS codes)? OpenDNS/Cisco >>> enabled these a couple of weeks ago but had to disable them >>> pending qq.com being fixed (its nameservers returned >>> SERVFAIL). Now that the fix is there, we're planning to turn >>> it up again at the end of the week. >>> >>> Just curious about its adoption… it feels like we're testing new >>> waters here. >> >> I believe you are the first, congrats! :-) >> >> It was not feasible to implement before https://dnsflagday.net/2019/ and >> then, you know, nobody asked for it ... >> >> Please report any other issues you encounter; I would bet there are a >> couple more lurking somewhere. >> >> -- >> Petr Špaček @ CZ.NIC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
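The fallback probing strategy Brian describes (retry without EDNS options on FORMERR/SERVFAIL/NOTIMP, drop them on the third attempt after timeouts) can be sketched as follows. This is a purely illustrative model of that decision logic; the names and structure are mine, not OpenDNS code.

```python
from enum import Enum

class Outcome(Enum):
    OK = "ok"              # usable answer received
    RCODE_ERROR = "rcode"  # FORMERR / SERVFAIL / NOTIMP
    TIMEOUT = "timeout"    # no response at all

def next_probe(outcome, attempt):
    """Return "edns" to keep sending the RFC 6975 options, or "plain"
    to retry the query without any EDNS options.

    - FORMERR/SERVFAIL/NOTIMP: retry without EDNS options immediately.
    - No response: keep EDNS for two attempts, drop it on the third.
    """
    if outcome is Outcome.RCODE_ERROR:
        return "plain"
    if outcome is Outcome.TIMEOUT and attempt >= 3:
        return "plain"
    return "edns"
```

The objection in the thread is exactly that this logic rewards broken authoritatives: once widely deployed, servers that choke on EDNS options never see pressure to fix them.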
Re: [dns-operations] EDNS client-subnet best practice?
On 03. 06. 20 14:44, Chris Adams wrote: > What is considered current best practice for recursive servers on > enabling EDNS client-subnet? > > I ask because I have a couple of recursive DNS servers at an independent > telephone company that are getting different answers for a certain large > website. The servers are in the same subnet, but one gets an IP > apparently in another country, while the other gets an IP in a nearby > state. The servers are configured identically (CentOS 7 with Unbound). > > I emailed the website's NOC, and their response was that the issue was > that "Most likely the issue is due to EDNS not being turned on with your > DNS server." I assume they were talking about EDNS client-subnet > (because they then gave an example dig with +subnet set). > > These servers are not configured to send client-subnet to anybody > (pretty much default Unbound config). They aren't serving clients from > outside the AS - I generally think of client-subnet as something you'd > use on a DNS server with a wide range of clients. Is it expected that I > should be enabling EDNS client-subnet on recursive servers? > > I do have some recursive servers that have a large set of clients (where > client-subnet might be useful) - should I just enable it for all > requests? In Unbound terms, enable "client-subnet-always-forward"? In my view ECS is only useful if the routing paths between a) the resolver & the Internet and b) the client sending the query & the Internet are different. Netmasks in Unbound's max-client-subnet-ipv4/6 would ideally be as short as possible, covering just the prefix that causes the routing to differ and nothing more. As for client-subnet-always-forward... I do not understand what the manual attempts to say :-/ -- Petr Špaček @ CZ.NIC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
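For reference, the Unbound knobs mentioned in this thread live in the subnet module. A minimal sketch along the lines of the advice above, with the whitelist address purely illustrative (option names per the Unbound subnet module documentation; check your version's manual before copying):

```
server:
    # The subnetcache module must be first in the module chain:
    module-config: "subnetcache validator iterator"
    # Only attach ECS for authoritative servers known to use it:
    send-client-subnet: 192.0.2.1
    # Keep the forwarded prefixes as short as workable:
    max-client-subnet-ipv4: 24
    max-client-subnet-ipv6: 56
```

For resolvers whose clients all sit behind the same routing path as the resolver itself (Chris's case), leaving ECS off entirely is consistent with the view expressed above.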
Re: [dns-operations] RFC 6975 (was: Re: Algorithm 5 and 7 trends (please move to 8 or 13))
On 03. 06. 20 7:18, Brian Somers wrote: > On May 28, 2020, at 10:35 PM, Viktor Dukhovni wrote: >> >> Enough time has passed since the need to abandon SHA-1 has become >> more pressing to discern at least a couple short-term trend-lines. > > Along these lines, have any of the large resolvers implemented > RFC 6975 (DAU/DHU/N3U EDNS codes)? OpenDNS/Cisco > enabled these a couple of weeks ago but had to disable them > pending qq.com being fixed (its nameservers returned > SERVFAIL). Now that the fix is there, we're planning to turn > it up again at the end of the week. > > Just curious about its adoption… it feels like we're testing new > waters here. I believe you are the first, congrats! :-) It was not feasible to implement before https://dnsflagday.net/2019/ and then, you know, nobody asked for it ... Please report any other issues you encounter; I would bet there are a couple more lurking somewhere. -- Petr Špaček @ CZ.NIC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] A strange DNS problem (intermittent SERVFAILs)
On 02. 06. 20 23:39, Guillaume LUCAS wrote: > Hello, > > I just subscribed to this list, so sorry for the thread breaking. > >> Several users on Twitter reported problems accessing Banque >> Populaire (a French bank) > > Since 1 pm (UTC+2) today (June 2nd), it works from CloudFlare, FDN,… > everywhere. Customers confirm that on Twitter [*]. But > nsisp1.i-bp.banquepopulaire.fr. still returns REFUSED for NS/SOA and > over-TCP queries for www.banquepopulaire.fr or > www.ibps.bpaca.banquepopulaire.fr. So, I don't understand what the root > cause of the problem was… > > www.caisse-epargne.fr, a French bank of the same banking group as Banque > Populaire, had a similar problem over the same period: the two > name servers for this DNS zone, nslp1.gcetech.net and nslp2.gcetech.net, > returned NODATA for NS/SOA queries (but they answered over-TCP > queries). Unbound 1.9 could resolve this name, Unbound 1.6 couldn't. > Technical details (in French): <http://shaarli.guiguishow.info/?TqC4Ug>. > Like Banque Populaire, name resolution has worked since 1 pm (UTC+2) today. > nslp(1|2).gcetech.net still returns NODATA… So, again, I don't > understand what the root cause was… > > @Matthew: you said « bcpe.fr is delegated to the same servers which do > not answer NS queries ». That's wrong. bpce.fr has always been delegated > to dns(1|2).bpce.fr . These servers have always answered NS/SOA and > TCP queries. Name servers for banquepopulaire / bpce.fr / groupebpce.com > = dns(1|2).bpce.fr, name servers for www.banquepopulaire.fr / > www.ibps.*.banquepopulaire.fr / www.*.banquepopulaire.fr = > nsisp1.i-bp.banquepopulaire.fr. Last Saturday, I was able to > reproduce your result for "dig @1.1.1.1 banquepopulaire.fr ns": > CloudFlare always answered SERVFAIL (or didn't answer). CloudFlare was > the only resolver in this case. So, like you observed, it's normal that > CloudFlare stops the resolution at this point, but what about the other > resolvers? 
Please, let's not get into shaming resolvers here; the delegation chain for www.banquepopulaire.fr. is an utter mess. The subdomain "www" is delegated to IP address 91.135.182.250 which answers REFUSED to most queries, so I guess it is a dumb or misconfigured load-balancer. The only query which kind of works is:

$ dig +nord @91.135.182.250 www.banquepopulaire.fr

; <<>> DiG 9.16.3 <<>> +nord @91.135.182.250 www.banquepopulaire.fr
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32881
;; flags: qr aa ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.banquepopulaire.fr.        IN      A

;; ANSWER SECTION:
www.banquepopulaire.fr. 30      IN      A       91.135.183.180
www.banquepopulaire.fr. 30      IN      A       91.135.183.180

;; Query time: 56 msec
;; SERVER: 91.135.182.250#53(91.135.182.250)
;; WHEN: St čen 03 10:33:51 CEST 2020
;; MSG SIZE  rcvd: 83

Yes, the duplicate RR is really there! Fresh DNSViz analysis: https://dnsviz.net/d/www.banquepopulaire.fr/XtdfDQ/dnssec/

www.banquepopulaire.fr zone: The server(s) were not responsive to queries over TCP. (91.135.182.250)
www.banquepopulaire.fr/DNSKEY: The response had an invalid RCODE (REFUSED). (91.135.182.250, UDP_-_EDNS0_4096_D_K, UDP_-_EDNS0_512_D_K)
www.banquepopulaire.fr/MX: The response had an invalid RCODE (REFUSED). (91.135.182.250, UDP_-_EDNS0_4096_D_K, UDP_-_EDNS0_512_D_K)
www.banquepopulaire.fr/NS: The response had an invalid RCODE (REFUSED). (91.135.182.250, UDP_-_EDNS0_4096_D_K)
www.banquepopulaire.fr/SOA: No response was received from the server over TCP (tried 3 times). (91.135.182.250, TCP_-_EDNS0_4096_D)
www.banquepopulaire.fr/SOA: The response had an invalid RCODE (REFUSED). (91.135.182.250, UDP_-_EDNS0_4096_D_K, UDP_-_EDNS0_4096_D_K_0x20)
www.banquepopulaire.fr/TXT: The response had an invalid RCODE (REFUSED). 
(91.135.182.250, UDP_-_EDNS0_4096_D_K) From my perspective it is sufficiently broken to warrant a fix on the auth side. Maybe resolver operators decided to work around it on their side, which would be most unfortunate. The auth operator should bear the cost of their own misconfigurations, not resolver operators. -- Petr Špaček @ CZ.NIC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
[dns-operations] DNSSEC signing bugfix in Knot DNS 2.9.5 (was: DNSSEC Validation Failures for RIPE NCC Zones)
On 22. 05. 20 14:22, Anand Buddhdev wrote: > Dear colleagues, > > Yesterday afternoon (21 May 2020), our DNSSEC signer rolled the Zone Signing > Keys (ZSKs) of all the zones we operate. Unfortunately, a bug in the signer > caused it to withdraw the old ZSKs soon after the new keys began signing the > zones. > > Validating resolvers may have experienced some failures if they had cached > signatures made by the old ZSKs. > > We apologise for any operational problems this may have caused. We are > looking at the issue with the developers of our Knot DNS signer to prevent > such an occurrence in the future. Knot DNS 2.9.5 with a fix for this particular problem has been released, and all users are encouraged to upgrade. Full release announcement: https://lists.nic.cz/pipermail/knot-dns-users/2020-May/001815.html The bug sometimes caused automatic key roll-overs to be finished too early, leading to temporary DNSSEC validation failures. More detailed problem description + workaround: https://lists.nic.cz/pipermail/knot-dns-users/2020-May/001813.html We apologize to everyone affected. -- Petr Špaček @ CZ.NIC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
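The failure mode Anand describes reduces to one invariant, sketched below. This is a toy model for illustration (names are mine, not Knot DNS code): a cached signature is only usable while its signing key is still published.

```python
def rrsig_validatable(signing_keytag, published_keytags):
    """A cached RRSIG can only validate while the key that produced it
    is still present in the zone's published DNSKEY RRset (and until the
    signature itself expires).  Withdrawing the old ZSK before cached
    signatures have aged out breaks this for validating resolvers."""
    return signing_keytag in published_keytags
```

This is why a correct ZSK roll keeps the old key published for at least the maximum TTL of records signed by it after the new key takes over.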
Re: [dns-operations] For darpa.mil, EDNS buffer == 1232 is *too small*. :-(
Beware, wall of text ahead. On 21. 04. 20 10:52, Brian Dickson wrote: > On Tue, Apr 21, 2020 at 1:04 AM Petr Špaček <mailto:petr.spa...@nic.cz>> wrote: > > On 21. 04. 20 9:00, Paul Vixie wrote: > > On Tuesday, 21 April 2020 06:20:04 UTC Petr Špaček wrote: > > Unfortunately I can't, we never got to the root cause. > > It is the same story again and again: > Probably one of the ISPs in the chain on the affected link was doing weird stuff > with big packets. We as Knot Resolver developers were not "their customer" > but merely "supplier of their customer" so they refused to talk to us, and > their actual customer lost interest as soon as it started to work reliably > for them. That's all we have, i.e. nothing. > > > I'm confused by the use of the definite and indefinite, which are in > disagreement. > ..."the root cause" suggests a single instance that was being investigated. > ..."again and again" suggests multiple occurrences. > If it was multiples, was it all involving a single network, perhaps? > Or can you clarify if this was only a single instance of this happening? Sorry, a non-native English speaker here. We have encountered this on various networks all over the place, mainly because Turris Omnia comes with a DNSSEC-validating resolver enabled by default. > Can you share any other diagnostic information or observations? > Was there fragmentation, or just packet loss above a certain size? > (Fragmentation would suggest smaller-than-expected MTU or perhaps tunnels, > while packet loss would suggest possible MTU mismatch on a single link.) > > Understanding whether this was operator error by an ISP, versus some other > non-error situation with real MTU below 1400, is important. Except in rare cases, users tell us just "this magic number works, bye", so there is really nothing to share. If I had data I would share it. 
Hopefully my recent presentation about server selection algorithms @ DNS-OARC 31 [1] is a strong enough indicator that my team publishes everything of value which comes from our tests and experience, even if it does not show our products or decisions in the best light. [1] https://indico.dns-oarc.net/event/32/contributions/711/ > This all has very real and very serious consequences to the entirety of the > DNS ecosystem. Is this a generic statement, or does it relate specifically to the difference between 1410 and 1232? If so, do you have data to support such a strong statement? > No, I did mean "would": > - OpenDNS's experience says that in data centers 1410 works. > - Our experience says that outside of data centers 1410 does not always > work. > > > Are there additional instances where 1410 did not work, or are you using that > single instance to support the "does not always work" position? > > > Let's be precise here. The proposal on the table is to change _default > values in configuration_. > > Nobody is proposing to impose "arbitrary maximum response buffer size" > and weld it onto DNS software. Vendors are simply looking for defaults which > work for them and their customer/user base. > > > Actually, it is both. > > Just as a reminder: UDP responses will be sent with a not-to-exceed size of > MIN(configured authority max, requestor's max from EDNS0 UDP_BUFSIZE). > If the client has a smaller value than the server, that is the maximum size > of a response that the server will send. > If that value is too small, it has consequences. > If that value is used for that client, to all servers, it has consequences > for all servers' traffic to that client. > If that value comes from default, and is used by the vast majority of that > package's operators, that affects traffic from all servers to all of those > operators. > If that package represents a large portion of the traffic from resolvers, > that's a big deal. 
> > A significant shift in the amount of TCP traffic could occur due to TC=1 > responses, with a non-linear relationship between the apparent decrease in > MTU, and the amount of TCP traffic. > > Particularly with large RSA key sizes and large signatures, and a large > proportion of DNSSEC traffic, the impact could be severe. > > If a DNS authority operator were to begin providing DNSSEC for their customer > base, DNSSEC deployment could jump from 1%-2% to 40% overnight. > (Hint: at least one major DNS hosting provider has strongly suggested this is > likely to occur quite soon.) Well, in that case I strongly suggest this unspecified large operator should go with an ECC algorithm instead of RSA ;-) > > And a 5% to 10% decrease in actual MTU (offered by clients in EDNS), the > proporti
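Brian's not-to-exceed rule quoted above can be sketched as a toy model (not any vendor's code); the second function shows where TC=1 and TCP fallback enter the picture:

```python
def effective_udp_limit(server_max, client_bufsize):
    """Not-to-exceed size for a UDP response: the smaller of the
    server's configured maximum and the requestor's EDNS0 UDP_BUFSIZE."""
    return min(server_max, client_bufsize)

def fits_in_udp(message_size, server_max, client_bufsize):
    """If the message does not fit, the server truncates (TC=1)
    and the client is expected to retry over TCP."""
    return message_size <= effective_udp_limit(server_max, client_bufsize)
```

Because the limit is a MIN, a smaller default on either side alone is enough to shift a whole band of response sizes from UDP to TCP, which is exactly the non-linear effect described above.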
Re: [dns-operations] For darpa.mil, EDNS buffer == 1232 is *too small*. :-(
On 21. 04. 20 9:00, Paul Vixie wrote: > On Tuesday, 21 April 2020 06:20:04 UTC Petr Špaček wrote: >> On 20. 04. 20 22:22, Viktor Dukhovni wrote: >>> On Mon, Apr 20, 2020 at 12:52:49PM -0700, Brian Somers wrote: >>>> ... >>>> At Cisco we allow up to 1410 bytes upstream and drop fragments. We >>>> prefer IPv6 addresses when talking to authorities. We’ve been doing >>>> this for years (except for a period between Feb 2019 and Aug 2019). >>>> Zero customer complaints.> >>> So perhaps the advice to default to 1232 should be revised: >>> ... >> >> Please let's not jump to conclusions, especially because of a single anecdote. > > my own anecdotes are not singular, but your point is taken. > >> As Knot Resolver developer I counter with another anecdote: >> We have experience with networks where a ~1300 buffer was the workable minimum >> and 1400 was already too much. > > i hope you can say much more than this, about that. Unfortunately I can't, we never got to the root cause. It is the same story again and again: Probably one of the ISPs in the chain on the affected link was doing weird stuff with big packets. We as Knot Resolver developers were not "their customer" but merely "supplier of their customer" so they refused to talk to us, and their actual customer lost interest as soon as it started to work reliably for them. That's all we have, i.e. nothing. >> As for OpenDNS experience - I'm hesitant to generalize. According to >> https://indico.dns-oarc.net/event/33/contributions/751/attachments/724/1228/ >> 20200201_DNSSEC_Recursive_Resolution_From_the_Ground_Up.pptx the DO bit is sent >> out only since Sep 2018, and presumably from resolvers in data centers. > > i understood the opendns team to say that they also used 1410 as the maximum > buffer size in responding to downstream queries. perhaps they can expand here. > >> Results would be very different for recursive resolver deployment deep in >> corporate networks/on the last mile. > > that statement stretches the verb "would" too far. 
did you mean "could"? No, I did mean "would": - OpenDNS's experience says that in data centers 1410 works. - Our experience says that outside of data centers 1410 does not always work. > i > think we can learn a lot from authoritative responses (how many are followed > by retries or TCP or a complaint?) and recursive responses (same question). > >> DNS-over-TCP is mandatory to implement so please let's stop working it >> around. > > +1. no part of this debate is for me an argument against mandated TCP and > recommended DoT. those should be assumed on all timelines. however, that does > not justify an arbitrary maximum response buffer size such as 1232. all of > the > math that leads to 1232 is unsuitable for DNS's use. Let's be precise here. The proposal on the table is to change _default values in configuration_. Nobody is proposing to impose "arbitrary maximum response buffer size" and weld it onto DNS software. Vendors are simply looking for defaults which work for them and their customer/user base. -- Petr Špaček @ CZ.NIC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] For darpa.mil, EDNS buffer == 1232 is *too small*. :-(
On 20. 04. 20 22:22, Viktor Dukhovni wrote: > On Mon, Apr 20, 2020 at 12:52:49PM -0700, Brian Somers wrote: > >> On Apr 18, 2020, at 9:39 PM, Viktor Dukhovni wrote: >>> Is there any new information on whether something closer to 1400 is >>> generally safe also for IPv6? >> >> At Cisco we allow up to 1410 bytes upstream and drop fragments. We prefer >> IPv6 >> addresses when talking to authorities. We’ve been doing this for years >> (except for >> a period between Feb 2019 and Aug 2019). Zero customer complaints. > > So perhaps the advice to default to 1232 should be revised: > > https://dnsflagday.net/2020/#dns-flag-day-2020 > > I see some movement in that direction, with the recommendation of 1220 > in: > > > https://tools.ietf.org/html/draft-fujiwara-dnsop-avoid-fragmentation-00#section-3 > > o Full-service resolvers SHOULD set EDNS0 requestor's UDP payload > size to 1220. (defined in [RFC4035] as minimum payload size) > > o Authoritative servers and full-service resolvers SHOULD choose > EDNS0 responder's maximum payload size to 1220 (defined in > [RFC4035] as minimum payload size) > > revised in -01/-02 to: > > > https://tools.ietf.org/html/draft-fujiwara-dnsop-avoid-fragmentation-02#section-4 > > o [RFC4035] defines that "A security-aware name server MUST support > the EDNS0 message size extension, MUST support a message size of > at least 1220 octets". Then, the smallest number of the maximum > DNS/UDP payload size is 1220. > > o However, in practice, the smallest MTU witnessed in the > operational DNS community is 1500 octets. The estimated size of a > DNS message's UDP headers, IP headers, IP options, and one or more > set of tunnel, IP-in-IP, VLAN, and virtual circuit headers, SHOULD > be 100 octets. Then, the maximum DNS/UDP payload size may be > 1400. > > While darpa.mil still needs to enable TCP, a more generous buffer size > that avoids IPv6 issues will also avoid unnecessary and potentially even > unavailable TCP fallback. 
So I'm in favour of 1400 or 1410 (do we need > more empirical evidence from other vantage points?) assuming those are > also safe. > > If the IPv6 obstacles are typically closer to the resolver than the > authoritative server, just Cisco's experience may not be enough to make > a definite conclusion. Please let's not jump to conclusions, especially because of a single anecdote. As Knot Resolver developer I counter with another anecdote: We have experience with networks where a ~1300 buffer was the workable minimum and 1400 was already too much. As for OpenDNS experience - I'm hesitant to generalize. According to https://indico.dns-oarc.net/event/33/contributions/751/attachments/724/1228/20200201_DNSSEC_Recursive_Resolution_From_the_Ground_Up.pptx the DO bit has been sent out only since Sep 2018, and presumably from resolvers in data centers. Results would be very different for recursive resolver deployments deep in corporate networks/on the last mile. DNS-over-TCP is mandatory to implement so please let's stop working around it. -- Petr Špaček @ CZ.NIC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
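The two candidate defaults argued over in this thread come from simple header arithmetic; the 100-octet budget is the draft's estimate quoted above, not a measurement:

```python
# Ethernet MTU minus the draft's 100-octet estimate for IP/UDP headers,
# options and tunnel overhead:
ETHERNET_MTU = 1500
HEADER_BUDGET = 100
draft_payload = ETHERNET_MTU - HEADER_BUDGET          # 1400

# IPv6 minimum link MTU minus fixed IPv6 and UDP headers - the usual
# derivation of the conservative flag day 2020 value:
IPV6_MIN_MTU = 1280
IPV6_HEADER = 40
UDP_HEADER = 8
flag_day_payload = IPV6_MIN_MTU - IPV6_HEADER - UDP_HEADER  # 1232
```

The disagreement in the thread is not about this arithmetic but about which MTU assumption matches reality on last-mile networks.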
Re: [dns-operations] Any known AD=1 intolerant iterative resolvers?
On 15. 04. 20 7:23, Florian Weimer wrote: > This approach does not work because you do not know whether the > recursive resolver merely echoes back the AD bit, or has actually > performed DNSSEC validation. As always, any reliance on the AD bit requires out-of-band knowledge of whether the other side does validation and can be trusted or not... and I'm sure Viktor knows that. Glibc (after years and years of deliberation) now has an explicit configuration option for passing the AD bit back to clients: glibc commit 446997ff1433d33452b81dfa9e626b8dccf101a4 Author: Florian Weimer Date: Wed Oct 30 17:26:58 2019 +0100 resolv: Implement trust-ad option for /etc/resolv.conf [BZ #20358] This introduces a concept of trusted name servers, for which the AD bit is passed through to applications. For untrusted name servers (the default), the AD bit in responses are cleared, to provide a safe default. This approach is very similar to the one suggested by Pavel Šimerda in <https://bugzilla.redhat.com/show_bug.cgi?id=1164339#c15>. The DNS test framework in support/ is enhanced with support for setting the AD bit in responses. Tested on x86_64-linux-gnu. Change-Id: Ibfe0f7c73ea221c35979842c5c3b6ed486495ccc Kudos to Florian for making it happen; it took 6 years to get it upstream! Historical notes: https://www.sourceware.org/glibc/wiki/DNSSEC -- Petr Špaček @ CZ.NIC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
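For reference, the option from the quoted commit ends up being used like this in /etc/resolv.conf (per the glibc resolv.conf documentation; the loopback nameserver here is illustrative and only makes sense for a local validating resolver you trust):

```
# /etc/resolv.conf
# Pass the AD bit through to applications only because the listed
# nameserver is a trusted, locally-run validating resolver:
nameserver 127.0.0.1
options edns0 trust-ad
```

With `trust-ad` absent (the default), glibc clears the AD bit in responses before handing them to applications, which is the safe behavior Florian's commit describes.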
Re: [dns-operations] OpenDNS, Google, Nominet - New delegation update failure mode
On 02. 04. 20 23:11, Doug Barton wrote: > Thank you for flushing it, I can see that the nodes which were previously > failing are now working. > > I also appreciate the logs, which confirm my fear that the old NS set was > stuck in the cache with what's left of the parent's TTL. That's sort of good > news in the short term since at least we know now that the problem will go > away in time. It's better news longer term since it tells me that my > ultra-paranoid step of adding both sets to the parent isn't so paranoid after > all, and will work to smooth the transitions for the other sites. > > Wasn't there a move away from parent-centric in the past? Did I miss a memo? Now I'm curious: Was there? TL;DR: Updating the parent NS set and waiting for its TTL to expire is in no way paranoid, it is a mandatory step. Being parent-centric is the only way to make resolution deterministic (with respect to NS changes), so we also do that in Knot Resolver. Ultimately, if there is no overlap between the parent and child NS sets, even child-centric resolvers will inevitably fail resolution as soon as the child NS set expires from their cache. This behavior is baked into the protocol so there is no way around it. I would much rather spend time on making parents more flexible than on workarounds (child-centric behavior is IMHO a workaround). Petr Špaček @ CZ.NIC > > Thanks again, > > Doug > > > On 2020-04-02 13:49, Brian Somers wrote: >> I’ve flushed shopdisney.co.uk/NS globally. Should work now for >> Umbrella/OpenDNS/Cisco >> >>> On Apr 2, 2020, at 1:36 PM, Brian Somers wrote: >>> >>> This is what I see with diagnostics turned up: > >>> shopdisney.co.uk. 0 IN TXT "RESOLVER: shopdisney.co.uk >>> IN NS ns1.disneyinternational.net" >>> shopdisney.co.uk. 0 IN TXT "RESOLVER: shopdisney.co.uk >>> IN NS ns2.disneyinternational.net" >>> shopdisney.co.uk. 0 IN TXT "RESOLVER: shopdisney.co.uk >>> IN NS ns3.disneyinternational.net" >>> shopdisney.co.uk. 
0 IN TXT "RESOLVER: shopdisney.co.uk >>> IN NS ns4.disneyinternational.net" ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] Any DNAME usage experience?
On 30. 03. 20 12:35, Meir Kraushar via dns-operations wrote: > - Obviously resolver compliance is very important (Knot support is > questionable?) We intend to release a fix in the 5.1.0 release, probably next week: https://gitlab.labs.nic.cz/knot/knot-resolver/-/merge_requests/965 I'm sorry for being late to the DNAME party. -- Petr Špaček @ CZ.NIC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] weird queries for mx1.mx2.mx1.mx2...
On 30. 03. 20 21:07, John Levine wrote: > In article <02fe7bae-fec6-f314-b189-4214b75ce...@nic.cz> you write: >> This is query list for domain truckinsurancekentucky.com: >> >> mx1.mx1.mx1.mx1.mx1.mx2.mx1.mx2.mx1.mta-sts.mx1.mx1.mx2.mx2.mta-sts.mx1.mx1.truckinsurancekentucky.com. >> > >> Domain truckinsurancekentucky.com is not the only one with this weird >> behavior. Does anyone have an idea what is causing this? > > It sure looks like misconfigured mta-sts. > > That domain is dead, got another live one we could look at and see how it's > configured? These seem to be alive: mx1.mx1.mx2.mx2.mx2.mx1.mx2.mx1.mta-sts.mx2.mx1.mx1.mx2.mx2.mx2.mx1.mx2.maxonsoftware.com. A mx2.mx1.mx2.mx1.mx1.mx2.mta-sts.mx1.mx2.mx2.mx1.mx2.mx1.mx2.cineversityoneonone.net. A mx2.mx1.mx1.mx1.mx2.mx2.mx2.mta-sts.mx1.mx2.mx1.mx1.mta-sts.mx2.mx2.mx2.effluentialtechnologies.net. A -- Petr Špaček @ CZ.NIC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
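Purely as an illustration of the kind of client bug that would match John's "misconfigured mta-sts" guess (this is speculation consistent with the thread, not an identified cause; the function and names are mine): a client that keeps treating relative host names such as "mx1", "mx2" and "mta-sts" as relative to its *previous* query name, instead of the zone apex, snowballs into exactly this shape of query.

```python
def buggy_lookup_chain(zone, labels):
    # Each failed lookup prepends the next relative label to the
    # previous query name instead of restarting from the zone apex.
    name = zone
    for label in labels:
        name = label + "." + name
    return name
```

For example, `buggy_lookup_chain("example.com.", ["mta-sts", "mx1", "mx2"])` yields `mx2.mx1.mta-sts.example.com.`, the same growth pattern as the names observed above.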
[dns-operations] weird queries for mx1.mx2.mx1.mx2...
Hello everyone, while debugging some resolution problems we have noticed some really weird queries, seemingly related to e-mail delivery. This is the query list for the domain truckinsurancekentucky.com:

mx1.mx1.mx1.mx1.mx1.mx2.mx1.mx2.mx1.mta-sts.mx1.mx1.mx2.mx2.mta-sts.mx1.mx1.truckinsurancekentucky.com.
mx1.mx1.mx1.mx2.mx1.mx2.mx1.mx2.mx2.mx1.mx1.mx1.mx2.mx2.mx2.mx1.mx2.mx2.truckinsurancekentucky.com. A
mx1.mx2.mx1.mx1.mx1.mx1.mx1.mx2.mx1.mx1.mta-sts.mx1.mx2.mx2.mx2.mx1.truckinsurancekentucky.com. A
mx1.mx2.mx1.mx1.mx2.mx1.mx1.mx2.mx1.mx1.mx1.mx1.mx2.mx1.mx2.mta-sts.mx1.truckinsurancekentucky.com. NS
mx1.mx2.mx1.mx2.mx2.mx1.mx1.mx1.mx1.mx2.mta-sts.mx1.mx1.mx2.mta-sts.mx2.mx2.truckinsurancekentucky.com.
mx1.mx2.mx2.mx1.mx2.mx2.mx1.mx2.mx2.mx2.mx2.mx1.mx2.mx1.mx2.mx1.mx1.mx1.truckinsurancekentucky.com. A
mx2.mx1.mx1.mx2.mx1.mx1.mx1.mx2.mx2.mx2.mx2.mta-sts.mx1.mx2.mta-sts.mx1.mx2.mx1.truckinsurancekentucky.com. NS
mx2.mx1.mx2.mx1.mx1.mx2.mx1.mx2.mx1.mx2.mx1.mx1.mx1.mx1.mta-sts.mx1.mx2.mx2.truckinsurancekentucky.com. NS
mx2.mx2.mx1.mx1.mx1.mx2.mx2.mx2.mx1.mx2.mx1.mx1.mx1.mta-sts.mx1.mx2.truckinsurancekentucky.com. A
mx2.mx2.mx1.mx1.mx2.mx1.mx2.mx1.mx1.mta-sts.mx1.mx2.mx1.mx1.mta-sts.mx2.mx2.truckinsurancekentucky.com.
mx2.mx2.mx1.mx2.mx1.mx1.mx1.mx2.mx1.mx1.mx1.mx1.mx1.truckinsurancekentucky.com.
mx2.mx2.mx1.mx2.mx1.mx1.mx1.mx2.mx2.mx2.mx1.mx1.mx1.mta-sts.mx1.mx2.mx2.mx2.truckinsurancekentucky.com. A

The domain truckinsurancekentucky.com is not the only one with this weird behavior. Does anyone have an idea what is causing this? (We have access only to anonymized data so we are unable to pinpoint the responsible client.) -- Petr Špaček @ CZ.NIC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
[dns-operations] DNS flag day 2020 update
Hello DNS operators! Work like DNS flag day [1] requires a lot of global collaboration between DNS software developers, vendors, operators and service providers, which is usually initiated and facilitated at conferences. As that critical means of collaboration is greatly affected by the current situation in the world, it's going to be harder to move things forward. But the work has not been stopped or cancelled - it's moving forward as it did before! Furthermore, we will closely monitor current developments and adapt our plans appropriately. Please note there is still no date set (!), although there is a suggestion [2] that seems to be generally accepted. Clarification: The date in this flag day's title is not an indicator of when the work will be finished; it is to differentiate this work from previous work. Are you a DNS vendor, operator, firewall vendor or service provider and want to improve DNS resilience? Then read our guidelines on "Message Size Considerations" for EDNS [3] to reduce or even avoid fragmentation of DNS messages, and please allow DNS over TCP! Thank you for your attention. [1] https://dnsflagday.net/2020/ [2] https://github.com/dns-violations/dnsflagday/issues/139#issuecomment-554724998 [3] https://dnsflagday.net/2020/#message-size-considerations -- Petr Špaček @ CZ.NIC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
[dns-operations] contacts for FRITZ!box (AVM) DNS contacts
Hello, I would like to talk to DNS engineers working on FRITZ!box manufactured by AVM. Can you please contact me off-list? Thank you! -- Petr Špaček @ CZ.NIC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
Re: [dns-operations] .ORG still using SHA-1 DNSKEYs
On 07. 02. 20 10:51, James Stevens wrote: >> - You would be surprised how slow UDP packet processing in kernel can be ;-) > > Often UDP slowness is due to the fact that each packet requires a > context-switch from kernel to user-space, and back for the reply. To be less vague: Knot DNS spends about 40 % of its time waiting for UDP handling in the kernel. > > So the bottleneck on a DNS server is generally how fast the CPU can context > switch, and this often has a hardwired limit. In that you can top out the > packet throughput with the CPU still showing %idle. > > I believe there is (or has been) a dev effort going on in the kernel to fix this. > > I might be behind the curve, I've not looked into it for a bit. > >> Algorithm 8 or 13 both seem like plausible targets, but opinions from the >> community would be very welcome. > > I recently had to help a client make this exact same decision. > > We felt they'd probably want to move to 13 one day and one move is lower risk > than two. > > It benefits from smaller UDP packets, big packets can become a problem (esp > in v6), so we went for 13. > > Changing algorithm is not fun. Maybe you do not use the right software :-) With the right automation it is just a matter of changing the algorithm specification plus a DS change at the parent. See https://www.knot-dns.cz/docs/2.9/singlehtml/#automatic-ksk-and-zsk-rollovers-example (It works equally well for algorithm rollovers.) Petr Špaček @ CZ.NIC ___ dns-operations mailing list dns-operations@lists.dns-oarc.net https://lists.dns-oarc.net/mailman/listinfo/dns-operations
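A sketch of the kind of Knot DNS 2.9 configuration the linked documentation describes (identifiers and addresses are illustrative; consult the linked docs for your version before copying). Per those docs, changing `algorithm` on an already-signed zone starts an automatic algorithm rollover, with the required DS change at the parent tracked via the submission check:

```
remote:
  - id: parent_ns
    address: 192.0.2.1        # a parent-zone nameserver (illustrative)

submission:
  - id: parent-check
    parent: parent_ns         # poll the parent for the new DS record

policy:
  - id: auto
    algorithm: ecdsap256sha256  # e.g. switched from rsasha256
    ksk-submission: parent-check

zone:
  - domain: example.com.
    dnssec-signing: on
    dnssec-policy: auto
```

The signer then keeps both algorithms published and signing for as long as the protocol requires, so resolvers never see a zone signed by an algorithm missing from the DNSKEY RRset.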
Re: [dns-operations] .ORG still using SHA-1 DNSKEYs
On 06. 02. 20 1:58, Viktor Dukhovni wrote: > On Wed, Feb 05, 2020 at 12:05:41PM -0500, Joe Abley wrote: > >> We (PIR) are currently discussing a timeline for implementing changes >> with Afilias, who run all the back-end registry systems for ORG. >> Algorithm 8 or 13 both seem like plausible targets, but opinions from >> the community would be very welcome. > > FWIW, the momentum seems to be with algorithm 13: > > https://twitter.com/AFNIC/status/1222904523481444362 > > But if the wisdom of the crowd is not the right basis for a decision, > the considerations as I see them are: > > 1. P256(13) is generally considered equivalent to ~3072-bit > RSA in security. > > 2. P256 signatures are half the size of 1024-bit RSA signatures > (less amplification and/or truncation). > > 3. Signing with P256 is much faster than RSA. For example, on my > 25-watt (low power) 4-core 8-thread Xeon some quick informal > measurements with "openssl speed" (1.1.1d) (1 thread, 4 threads > and 8 threads) yield [1]:
>
>                     sign       verify      sign/s   verify/s
>    rsa 2048 bits    0.000750s  0.000021s   1333.0    48295.7
>    rsa 2048 bits    0.000225s  0.000008s   4445.0   131794.8  (4x MP)
>    rsa 2048 bits    0.000173s  0.000005s   5768.1   193455.0  (8x MP)
>
>    rsa 1024 bits    0.000107s  0.000007s   9302.9   146937.3
>    rsa 1024 bits    0.000034s  0.000002s  29785.5   467587.8  (4x MP)
>    rsa 1024 bits    0.000025s  0.000002s  40302.4   564390.1  (8x MP)
>
>    rsa 1280 bits    0.000388s  0.000012s   2575.4    83000.6
>    rsa 1280 bits    0.000124s  0.000004s   8058.4   256073.9  (4x MP)
>    rsa 1280 bits    0.000091s  0.000003s  10937.2   349880.0  (8x MP)
>
>    ecdsa p256       0.0000s    0.0001s    36479.8    12455.2
>    ecdsa p256       0.0000s    0.0000s   124877.2    40733.4  (4x MP)
>    ecdsa p256       0.0000s    0.0000s   167358.6    52250.6  (8x MP)
>
> 4. However, as you can see above, signature *verification* with > P256 is ~7 times slower than with 1280-bit RSA (or ~12 times > slower than with 1024-bit RSA). > > So if you're optimizing for higher security and lower packet size (and > perhaps much faster zone signing time), P256(13) is the way to go. 
If > however, you're concerned about resolver performance, then rsa1280 has > an advantage. > > Thus my 25W server running flat out can do ~350K rsa1280 signature > checks per second, vs. ~52K P256 signature checks per second. > > When the DANE/DNSSEC survey is running, unbound is keeping 1 core pretty > busy handling O(5K) cache misses a second. > > I don't know what fraction of the CPU cost is in the crypto vs. all the > other costs of processing the traffic. I am reluctant to increase > concurrency, lest my queries be throttled by upstream nameservers. > > The survey already has to deal with a large fraction of the domains > using P256, so these likely already dominate any crypto impact on CPU > cost, and yet I can do ~5K validated qps on a low power server also > running a Postgres database and the survey engine. The system is > somewhat less than 50% utilized while running the survey. Anecdotal evidence: When benchmarking Knot Resolver on realistic "ISP scenarios", the amount of CPU time spent on DNSSEC validation is dwarfed by all the rest. There are two reasons: - In practice most of the traffic is cache hits. - You would be surprised how slow UDP packet processing in the kernel can be ;-) Based on this anecdote, RSA has no practical performance advantage over P256. -- Petr Špaček @ CZ.NIC
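Petr's anecdote can be made concrete with back-of-the-envelope arithmetic. All numbers below are hypothetical except the ~52K P256 verifications per second taken from Viktor's measurements: only cache misses trigger validation, so even a modest hit rate leaves crypto as a small slice of CPU.

```python
def crypto_cpu_fraction(qps, cache_hit_ratio, verifies_per_second,
                        sigs_per_miss=2):
    # Only cache misses need signature verification; assume a couple of
    # RRSIGs checked per miss (hypothetical, varies with chain depth).
    misses = qps * (1 - cache_hit_ratio)
    return (misses * sigs_per_miss) / verifies_per_second

# 5K qps, 90% cache hits, ~52K P256 verifies/s on one machine:
frac = crypto_cpu_fraction(5000, 0.90, 52000)
assert frac < 0.05   # under 5% of capacity goes to crypto
```

With these assumptions, validation consumes around 2% of the verification capacity, consistent with the observation that everything else dominates.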
Re: [dns-operations] root? we don't need no stinkin' root!
On 27. 11. 19 21:49, David Conrad wrote: > Petr, > >> I think there is an even more fundamental problem: >> Someone has to pay the operational costs of "the new system”. > > The “new system” is simply the existing network of resolvers, augmented to > have the root zone. As far as I can tell, the operational cost would be in > (a) ensuring the resolver is upgraded to support obtaining the root zone and > (b) dealing with the fetch of the root zone with some frequency. Oh, sorry, this is a misunderstanding! My reference to "the new system" was meant to be "the new system for root zone distribution". Please let me try again: Even if "the new system for root zone distribution" is BitTorrent it still: - (most likely) needs a set of static IP addresses to solve the bootstrap problem, - trackers need to be highly resilient against DDoS, - trackers most likely need to be anycasted to limit the scope of a DDoS. I hypothesize that in the end the requirements for "the new system for root zone distribution" will be fairly close to the current requirements for the current DNS root system... so I do not see where the cost reduction comes from. Or in other words: If the current root system must survive a 1 TB/s attack, so must "the new system for root zone distribution", unless we move to a decentralized root. Changing one centralized system to another does not solve the fundamental problem of a costly-to-defend single point of failure. Hopefully it is clearer this time. Petr Špaček @ CZ.NIC > > There would be an additional cost, that of making the root zone available to > myriads of resolvers, but I believe this is an easily handled issue. > >> Personally I do not see how transition to another root-zone-distribution >> system solves the over-provisioning problem - the new system still has to be >> ready to absorb absurdly large DDoS attacks. > > Two ways: > - greater decentralization: there are a lot more resolvers than the number of > instances root server operators are likely to ever deploy. 
While an > individual resolver might melt down, the impact would only be the end users > using that resolver (and it is relatively easy for a resolver operator to add > more capacity, mitigate the attacking client, etc). > - the cost of operating and upgrading the service to deal with DDoS is > distributed to folks whose job it is to provide that service (namely the ISPs > or other network operators that run the resolvers). Remember that the root > server operators have day jobs, some of which are not particularly related to > running root service, and they are not (currently) being compensated for the > costs of providing root service. > >> Have a look at https://www.knot-dns.cz/benchmark/ . The numbers in charts at >> bottom of the page show that a *single server machine* is able to reply >> to *all* steady-state queries for the root today. >> ... >> Most of the money is today spent on *massive* over-provisioning. As a >> practical example, the CZ TLD's over-provisioning factor is on the order of >> *hundreds* of steady-state query traffic, and the root might have even more. > > Yep. As mentioned before, steady state is largely irrelevant. > > In my view, the fact that root service infrastructure funnels up to a > (logical) single point is an architectural flaw that may (assuming DDoS > attack capacity continues to grow at the current rate or even grows faster > with crappy IoT devices) put the root DNS service at risk. One of the > advantages of putting the root zone in the resolver is that it mitigates that > potential risk. > > Regards, > -drc > (Speaking for myself, not any organization I may be affiliated with)
Re: [dns-operations] root? we don't need no stinkin' root!
On 26. 11. 19 12:46, David Conrad wrote: > On Nov 26, 2019, at 11:33 AM, Jim Reid <mailto:j...@rfc1035.com>> wrote: >>> On 26 Nov 2019, at 09:16, Florian Weimer >> <mailto:f...@deneb.enyo.de>> wrote: >>> >>> Up until recently, well-behaved recursive resolvers had to forward >>> queries to the root if they were not already covered by a delegation. >>> RFC 7816 and in particular RFC 8198 changed that, but before that, it >>> was just how the protocol was expected to work. >> >> So what? These RFCs make very little difference to the volume of queries a >> resolving server will send to the root. QNAME minimisation has no impact at >> all: the root just sees a query for .com instead of foobar.com >> <http://foobar.com/>. A recursive resolver should already be supporting >> negative caching and will have a reasonably complete picture of what's in >> (or not in) the root. RFC8198 will of course help that but not by much IMO. > > It would appear a rather large percentage of queries to the root (like 50% in > some samples) are random strings, between 7 to 15 characters long, sometimes > longer. I believe this is Chrome-style probing to determine if there is > NXDOMAIN redirection. A good example of the tragedy of the commons, like > water pollution and climate change. > > If resolvers would enable DNSSEC validation, there would, in theory, be a > reduction in these queries due to aggressive NSEC caching. Of course, > practice may not match theory > (https://indico.dns-oarc.net/event/32/contributions/717/attachments/713/1206/2019-10-31-oarc-nsec-caching.pdf). > The discussion after the talk (including the hallway track :-) was also interesting, and with all respect for Geoff's work, these slides should be read with some skepticism. Main points: 1) A load balancer with N resolver nodes behind it decreases the effectiveness of the aggressive cache by a factor of N; it *does not* invalidate the concept. 
In other words, a random subdomain attack which flows through a resolver farm with N nodes has to fill N caches with NSEC records, and that will simply take N times longer when compared with the non-load-balanced scenario. The aggressive cache still provides an upper bound for the size of NXDOMAIN RRs in the cache, which is super useful under attack because it prevents individual resolvers from dropping all the useful content from the cache during the attack. 2) Two out of five major DNS resolver implementations used by large ISPs did not implement aggressive caching (yet?), so deployment cannot be expected to be great. Also the feature is pretty new, and large ISPs are super conservative and might not have deployed new versions yet ... I forgot the rest so I will conclude with: Watch the video recording and think for yourself! :-) Petr Špaček @ CZ.NIC > > Of course, steady state query load is largely irrelevant since root service > has to be provisioned with massive DDoS in mind. In my personal view, the > deployment of additional anycast instances by the root server operators is a > useful stopgap, but ultimately, given the rate of growth of DoS attack > capacity (and assuming that growth will continue due to the stunning security > practices of IoT device manufacturers), stuff like what is discussed in that > paper is the right long term strategy.
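Point 1 can be sketched numerically: with queries spread evenly over N nodes, each node sees only 1/N of the attack, yet must independently cache every NSEC record. Warm-up therefore takes N times longer. This is a simplification assuming a uniform load balancer and no shared cache:

```python
def seconds_to_fill_caches(nsec_records, attack_qps, nodes):
    # Each node sees attack_qps / nodes queries per second, and every node
    # must independently learn all nsec_records before NXDOMAIN synthesis
    # fully kicks in on that node.
    per_node_qps = attack_qps / nodes
    return nsec_records / per_node_qps

one_node = seconds_to_fill_caches(100_000, 10_000, 1)   # single resolver
farm = seconds_to_fill_caches(100_000, 10_000, 8)       # 8-node farm
assert farm == 8 * one_node  # effectiveness reduced by factor N, not lost
```

The upper bound still holds per node: once a node's cache contains the full NSEC chain, no further attack query reaches the authoritative server through it.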
Re: [dns-operations] root? we don't need no stinkin' root!
On 27. 11. 19 9:53, Ondřej Surý wrote: > Mark, > > I believe that any distributed system that won’t have a fallback to the RZ > is inevitably doomed and will get out of sync. > > The RFC7706 works because there’s always a safe guard and if the resolver > is unable to use mirrored zone, it will go to the origin. > > Call me a pessimist, but I’ve yet to see a loosely often neglected > distributed system > that won’t get out of sync. > > So, while the idea of distributing the full RZ to every resolver out there, > there are two > fundamental problems: > > 1. resilience - both against DoS and just plain breakage > 2. the old clients - while the situation out there is getting better, we will > still be stuck with > old codebase for foreseeable future > > What we can do is to make the load on RZ servers lighter, but we can’t make > them just go. I think there is an even more fundamental problem: Someone has to pay the operational costs of "the new system". Personally I do not see how a transition to another root-zone-distribution system solves the over-provisioning problem - the new system still has to be ready to absorb absurdly large DDoS attacks. Example: Have a look at https://www.knot-dns.cz/benchmark/ . The numbers in the charts at the bottom of the page show that a *single server machine* is able to reply to *all* steady-state queries for the root today. Sure, we have speed-of-light limits, so let's say we need a couple hundred servers in well connected places to keep reasonable latency. That's not a huge cost overall (keep in mind that these local nodes could be pretty small *if we were ignoring the over-provisioning problem*). Most of the money today is spent on *massive* over-provisioning. As a practical example, the CZ TLD's over-provisioning factor is on the order of *hundreds* of steady-state query traffic, and the root might have even more. 
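The argument can be put into a toy cost model. Every number below is hypothetical, chosen only to illustrate the shape of the problem: once the over-provisioning factor is in the hundreds, DDoS headroom rather than steady-state load or latency dictates the fleet size.

```python
# Toy model (all numbers hypothetical, for illustration only).
steady_qps = 1_000_000            # assumed steady-state query load for the zone
node_capacity_qps = 1_000_000     # one tuned machine can carry steady state
overprovision_factor = 300        # survive attacks ~hundreds of times steady state
latency_nodes = 200               # nodes wanted for speed-of-light latency alone

# Capacity needed to absorb the attack, expressed in whole machines:
ddos_nodes = steady_qps * overprovision_factor // node_capacity_qps

assert ddos_nodes > latency_nodes  # DDoS headroom, not latency, sets the budget
```

Under these assumptions, latency alone would be satisfied with a few hundred small nodes, but the over-provisioning requirement multiplies the capacity bill by two orders of magnitude.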
Once we have a similarly resilient HTTP system, it is a matter of simple configuration :-D https://knot-resolver.readthedocs.io/en/stable/modules.html#cache-prefilling -- Petr Špaček @ CZ.NIC
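For reference, the cache prefilling linked above looks roughly like this in Knot Resolver's Lua configuration (following the linked documentation; the CA-bundle path varies by distribution):

```lua
-- Sketch of Knot Resolver cache prefilling (see the linked docs).
modules.load('prefill')
prefill.config({
  ['.'] = {
    url = 'https://www.internic.net/domain/root.zone',
    ca_file = '/etc/pki/tls/certs/ca-bundle.crt',  -- path varies by distro
    interval = 86400  -- re-fetch the root zone once a day
  }
})
```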
Re: [dns-operations] root? we don't need no stinkin' root!
On 26. 11. 19 16:04, Roy Arends wrote: > > >> On 26 Nov 2019, at 12:46, David Conrad wrote: >> >> It would appear a rather large percentage of queries to the root (like 50% >> in some samples) are random strings, between 7 to 15 characters long, >> sometimes longer. I believe this is Chrome-style probing to determine if >> there is NXDOMAIN redirection. A good example of the tragedy of the commons, >> like water pollution and climate change. > > Yep. > > https://chromium.googlesource.com/chromium/src/+/32352ad08ee673a4d43e8593ce988b224f6482d3/chrome/browser/intranet_redirect_detector.cc > Line 79: "// We generate a random hostname with between 7 and 15 characters.” > > https://ithi.research.icann.org/graph-m3.html > Table "Queries to frequently found name patterns” shows that the frequency > distribution for queries between 7 and 15 characters is near flat (around > 5.2% per character length) AND an order of magnitude higher than ANY other queries. > > “Coincidence? I think NOT!” > > https://youtu.be/MDpuTqBI0RM?t=53 FYI there is also an issue about this in their tracker: https://bugs.chromium.org/p/chromium/issues/detail?id=946450#c1 -- Petr Špaček @ CZ.NIC
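The Chromium behaviour quoted above (line 79 of intranet_redirect_detector.cc) is easy to mimic, which explains the near-flat distribution of 7-to-15-character junk labels at the root. The letters-only alphabet here is an assumption for illustration, not a claim about Chromium's exact character set:

```python
import random
import string

def chrome_style_probe(rng=random):
    # "We generate a random hostname with between 7 and 15 characters."
    length = rng.randint(7, 15)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

probe = chrome_style_probe()
assert 7 <= len(probe) <= 15 and probe.isalpha()
```

Because the length is chosen uniformly, each of the nine lengths accounts for about 1/9 of the probes, matching the roughly flat ~5.2%-per-length pattern Roy cites (his percentages are of overall root traffic, so the per-length shares come out lower than 1/9).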
Re: [dns-operations] s3.amazonaws.com problem - price to pay for not using DNSSEC
On 23. 10. 19 14:37, Daniel Stirnimann wrote: > I have located a host in our network which sends such queries to the > network resolver (which we operate): > > mqfgioo5.s3.amazonaws[.]com. IN CNAME > 6l-dpfrn.s3.amazonaws[.]com. IN CNAME > 2idg5c42.s3.amazonaws[.]com. IN CNAME > qzq3uz5m.s3.amazonaws[.]com. IN CNAME > nenkxm2p.s3.amazonaws[.]com. IN CNAME > yk2max6j.s3.amazonaws[.]com. IN CNAME > qhcbric2.s3.amazonaws[.]com. IN CNAME > wg-jmekf.s3.amazonaws[.]com. IN CNAME > dnwn2ip1.s3.amazonaws[.]com. IN CNAME > 711o385.s3.amazonaws[.]com. IN CNAME > rn0v02a6.s3.amazonaws[.]com. IN CNAME > pm1a3a4t.s3.amazonaws[.]com. IN CNAME > 0xc.tibo.s3.amazonaws[.]com. IN CNAME > 76jt.m9g.s3.amazonaws[.]com. IN CNAME > 4tjc8hp.s3.amazonaws[.]com. IN CNAME > b-.9ft7y.s3.amazonaws[.]com. IN CNAME Funnily enough, this attack would have been partially mitigated on the resolver side if the S3 domain were signed with DNSSEC! New versions of resolvers already implement RFC 8198, which makes random-subdomain attacks ineffective against DNSSEC-signed domains. With 1/3 of clients in the world behind a DNSSEC-validating resolver it would already make a difference. This "protection effect" of signing + RFC 8198 was experimentally confirmed by measurements back in March 2018 and reported by me at the DNS-OARC 28 meeting. Slides: https://indico.dns-oarc.net/event/28/contributions/509/attachments/479/786/DNS-OARC-28-presentation-RFC8198.pdf Update from 2019: The slight latency increase reported on slide 9 is in fact a bug in the BIND implementation and not a feature of the protocol. Petr Špaček @ CZ.NIC > > Interestingly, it also sends other suspicious queries such as: > > . IN TYPE1847 > . IN TYPE1847 > . IN TYPE567 > . IN TYPE1847 > . IN TYPE567 > . IN TYPE1847 > . IN TYPE1847 > . IN TYPE1900 > . IN TYPE823 > . IN TYPE1900 > . IN TYPE1847 > 7a4. IN TYPE868 > . IN TYPE1847 > . IN TYPE1847 > . IN TYPE1900 > . IN TYPE1847 > . IN TYPE1847 > 3n2y. IN TYPE612 > . IN TYPE311 > . 
IN TYPE1900 > > However, these are mostly answered from cache because of aggressive use > of DNSSEC-validated cache. Still, I guess root server operators may see > an increase in queries with unassigned query types. > > Daniel
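The aggressive use of a DNSSEC-validated cache that Daniel credits for absorbing most of this junk (RFC 8198) boils down to interval checks against cached NSEC records. This sketch uses plain string ordering and invented names instead of canonical DNS name ordering, so it is only illustrative:

```python
# Cached NSEC intervals (owner, next-owner) proving that nothing exists
# between the two names. Real resolvers use canonical DNS name ordering
# (RFC 4034); plain string comparison is a simplification here.
nsec_cache = [
    ("foo.example.", "mail.example."),
    ("mail.example.", "www.example."),
]

def synthesizes_nxdomain(qname):
    # RFC 8198: if a cached NSEC interval covers the query name, answer
    # NXDOMAIN from cache instead of asking the authoritative server.
    return any(lo < qname < hi for lo, hi in nsec_cache)

assert synthesizes_nxdomain("qzq3uz5m.example.")  # covered -> no upstream query
assert not synthesizes_nxdomain("zzz.example.")   # outside all cached intervals
```

This is why a handful of cached NSEC records can absorb an unbounded stream of random-subdomain queries for a signed zone.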
Re: [dns-operations] Random question about Google resolver behaviour and long-lived TCP sessions
On 27. 09. 19 18:19, Alexander Dupuy via dns-operations wrote: > Tony Finch wrote: > > So I wonder if Google have implemented EDNS TCP keepalive. If you change > what BIND calls tcp-advertised-timeout, do Google's TCP connection > lifetimes change to match? > > > Google Public DNS has not implemented EDNS TCP keepalive, neither as a server > for its clients, nor in its TCP connections to authoritative servers. Has > BIND added support on its client side, or only as a DNS server? It seems like > Unbound has client and server-side support > (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231283), and the GetDNS > client code also supports it (https://getdnsapi.net/releases/getdns-0-9-0/) > but those are the only ones I found. Knot Resolver has a stub implementation of EDNS keepalive: https://knot-resolver.readthedocs.io/en/stable/modules.html#edns-keepalive Quote from docs: The edns_keepalive module implements RFC 7828 for clients connecting to Knot Resolver via TCP and TLS. Note that client connections are timed-out the same way regardless of them sending the EDNS option; the module just allows clients to discover the timeout. When connecting to servers, Knot Resolver does not send this EDNS option. It still attempts to reuse established connections intelligently. > I don't see any implementations of RFC 8490 (DNS Stateful Operations). BTW the protocol is complex like hell so I do not see it being implemented soon, if ever, in Knot Resolver. -- Petr Špaček @ CZ.NIC
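For the curious, the EDNS option being discussed is tiny on the wire: RFC 7828 defines option code 11 carrying a 2-octet timeout in units of 100 milliseconds. A minimal encoder (the function name is mine, not from any of the implementations above):

```python
import struct

EDNS_TCP_KEEPALIVE = 11  # RFC 7828 OPTION-CODE

def encode_keepalive(timeout_ms):
    # OPTION-CODE (2 octets) | OPTION-LENGTH = 2 | TIMEOUT in 100 ms units
    units = timeout_ms // 100
    if not 0 <= units <= 0xFFFF:
        raise ValueError("timeout out of range for a 16-bit field")
    return struct.pack("!HHH", EDNS_TCP_KEEPALIVE, 2, units)

# A 30-second idle timeout is encoded as 300 units (0x012C):
assert encode_keepalive(30_000) == b"\x00\x0b\x00\x02\x01\x2c"
```

A client sends the option with zero-length data to ask; the server answers with the 2-octet timeout, which is exactly the "discover the timeout" behaviour the Knot Resolver docs describe.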