Re: [dns-operations] Name servers returning incorrectly truncated UDP responses

2022-07-30 Thread Brian Dickson
On Sat, Jul 30, 2022 at 11:14 AM Ondřej Surý  wrote:

> I am 99% sure that fpdns is wrong and this is not djbdns. fpdns relies
> on subtle differences between various DNS implementations and is often
> wrong because there’s either not enough data or just not enough
> difference. That’s what I saw when we started with Knot DNS - the
> quality implementations just didn’t have enough differences between
> them, because they adhered to the standards, for fpdns to tell them
> apart.
>

If there are SUBTLE differences between DJB DNS and anything else, I would
die of shock.
I will offer a beer to anyone who shows me anything even remotely close to
as broken as that POS.
(The DNS community should maybe start offering rewards for replacing it, to
anyone running djbdns, in real currency, just to purge it from the
internet.)

Brian



>
> Cheers,
> --
> Ondřej Surý  (He/Him)
>
> On 30. 7. 2022, at 19:37, Puneet Sood  wrote:
>
> 
>
>
> On Sat, Jul 30, 2022 at 10:26 AM Dave Lawrence  wrote:
>
>> Greg Choules via dns-operations writes:
>> > I am including in this mail the RNAME from the SOA (same for both
>> > zones) in the hope that someone who is responsible for DNS at Sony
>> > entertainment will see this and take note.
>>
>> And tell us what in the world DNS software they're running, and why
>> they chose it.
>>
>
> Jaap up-thread used fpdns to figure out the first question.
>
> fpdns e.ns.email.sonyentertainmentnetwork.com
> fingerprint (e.ns.email.sonyentertainmentnetwork.com, 207.251.96.133): DJ
> Bernstein TinyDNS 1.05 [Old Rules]
>
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] Input from dns-operations on NCAP proposal

2022-06-03 Thread Brian Dickson
On Fri, Jun 3, 2022 at 3:17 PM John R Levine  wrote:

> On Fri, 3 Jun 2022, John Levine wrote:
> >> In such a configuration, if the host name "foo" matches the candidate
> TLD
> >> "foo", and the latter is changed from NXDOMAIN ...
>
> > Do we have any idea how many systems still use search lists?  We've been
> saying
> > bad things about them at least since .CS was added in 1991.
>
> It occurs to me there is another way to look at this.  There are already
> 1487 delegated TLDs, and I doubt anyone could name more than a small
> fraction of them.  If this increases the number of names that will break
> search lists from 1487 to 1488, how much of a problem is this likely to be
> in practice, which leads back to ...
>
>
If it were ONLY a progression of 1487->1488, it might not be that bad (but
again, that all depends on what name number 1488 actually is.)

What it actually is, is an exercise in survivorship bias.
Anyone who might have been impacted by any of the earlier rounds of
expansion will (likely) have learned their lesson.
That lesson may depend on tribal knowledge, which might not be reliable
enough to keep a previous victim from being re-victimized.

Anyone not previously affected may be unaware of the risk their own set-up
places them in, until their choices run up against newly deployed TLDs.

Until the practice or standard/implementation for search-lists is fully
deprecated, the risk will remain, for either new TLDs being deployed or new
host names or naming conventions being deployed.

Unimaginative host names like "mail001" are likely safe.

However, naming hosts after some class of entities, like manufacturers or
fast food companies or even classes of things, will ironically be risky.

The best analogy I can think of is playing "minesweeper" on a huge board,
where the number of mines periodically increases, there are no signals of
adjacent mines (1-8), no flags, and no automatic flooding of zero-mine
areas.
Spots you have already clicked on could be subsequently mined, and you
lose. It is an asynchronous race condition, where an external party is
making moves (adding mines) on your behalf.
It would not be considered a "fun" game, IMNSHO.

Brian

P.S. Having "ndots:N" for N>0 isn't necessarily safe, either. Any new TLD
that matches an internal namespace component rather than a hostname won't
necessarily be discovered until registrations begin.
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] Input from dns-operations on NCAP proposal

2022-06-03 Thread Brian Dickson
On Fri, Jun 3, 2022 at 11:57 AM Thomas, Matthew via dns-operations <
dns-operati...@dns-oarc.net> wrote:

>
>
>
> -- Forwarded message --
> From: "Thomas, Matthew" 
> To: "d...@virtualized.org" , "pspa...@isc.org" <
> pspa...@isc.org>
> Cc: "vladimir.cunat+i...@nic.cz" , "
> dns-operati...@dns-oarc.net" 
> Bcc:
> Date: Fri, 3 Jun 2022 18:48:57 +
> Subject: Re:  Re: [dns-operations] Input from dns-operations on NCAP
> proposal
> Thank you David.  That change from NXDOMAIN to NOERROR/NODATA and things
> going "boom" is exactly what we are looking for community input towards.
> Do folks know of applications, or things like suffix search list
> processing, that will change their behavior?
>
>
There is one particular non-default configuration that definitely would
make things go "boom". (This is not a comprehensive list of behaviors, just
one example that is known.)

If the options value of "ndots:N" is set in /etc/resolv.conf (or whatever
analogous configuration elements exist in non-Unix/linux systems) to a
value of N==0, then a lookup for a single label name (e.g. "foo") would be
made as an absolute query first, before doing search list additions.

"ndots" can generally be any number between 0 and X, for an
implementation-specific X. Some implementations cap X at 15, some at 255,
and other implementations may differ.

In such a configuration, if the host name "foo" matches the candidate TLD
"foo", and the latter is changed from NXDOMAIN (non-existing in the root)
to anything else (e.g. a delegation is made for "foo"), this will break
search list processing for "foo". I.e. earth-shattering kaboom.
BEFORE: "foo" => NXDOMAIN, resolver then tries various "foo.bar.example.com",
"foo.example.com" etc.
AFTER: "foo" => not NXDOMAIN, resolver stops after the answer it gets
(especially if there is a matching QTYPE and RRTYPE in the Answer, such as
QTYPE == A, answer is 127.0.53.53)
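The BEFORE/AFTER behavior can be sketched with a simplified model of glibc-style search-list qualification (a sketch only; real resolvers have additional rules and options):

```python
# Simplified model of glibc-style search-list qualification.
# Names containing at least `ndots` dots are tried as absolute queries
# first; otherwise the search suffixes are tried first.
def query_order(name, search_list, ndots=1):
    absolute_first = name.count(".") >= ndots
    suffixed = [f"{name}.{suffix}" for suffix in search_list]
    return [name] + suffixed if absolute_first else suffixed + [name]

search = ["corp.example.com", "example.com"]
# ndots:0 -- the bare label "foo" goes out as an absolute query FIRST:
print(query_order("foo", search, ndots=0))
# default ndots:1 -- search suffixes are tried first, absolute query last:
print(query_order("foo", search, ndots=1))
```

With ndots:0, delegating "foo" in the root changes the answer to the very first query the stub sends, which is exactly the breakage described above.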

Brian



> Matt
>
> On 6/2/22, 5:22 PM, "David Conrad"  wrote:
>
> Hi,
>
> On Jun 1, 2022, at 12:39 AM, Petr Špaček  wrote:
> > On 24. 05. 22 17:54, Vladimír Čunát via dns-operations wrote:
> >>> Configuration 1: Generate a synthetic NXDOMAIN response to all
> queries with no SOA provided in the authority section.
> >>> Configuration 2: Generate a synthetic NXDOMAIN response to all
> queries with a SOA record.  Some example queries for the TLD .foo are below:
> >>> Configuration 3: Use a properly configured empty zone with correct
> NS and SOA records. Queries for the single label TLD would return a NOERROR
> and NODATA response.
> >> I expect that's OK, especially if it's a TLD that's seriously
> considered.  I'd hope that "bad" usage is mainly sensitive to existence of
> records of other types like A.
> >
> > Generally I agree with Vladimir, Configuration 3 is the way to go.
> >
> > Non-compliant responses are riskier than protocol-compliant
> responses, and option 3 is the only compliant variant in your proposal.
>
> Just to be clear, the elsewhere-expressed concern with configuration 3
> is that it exposes applications to new and unexpected behavior.  That is,
> if applications have been “tuned” to anticipate an NXDOMAIN and they get
> something else, even a NOERROR/NODATA response, the argument goes those
> applications _could_ explode in an earth shattering kaboom, cause mass
> hysteria, cats and dogs living together, etc.
>
> While I’ve always considered this concern "a bit" unreasonable, I
> figure its existence is worth pointing out.
>
> Regards,
> -drc
>
>
>
>
>
>
> -- Forwarded message --
> From: "Thomas, Matthew via dns-operations" 
> To: "d...@virtualized.org" , "pspa...@isc.org" <
> pspa...@isc.org>
> Cc: "dns-operati...@dns-oarc.net" 
> Bcc:
> Date: Fri, 3 Jun 2022 18:48:57 +
> Subject: Re: [dns-operations] Input from dns-operations on NCAP proposal
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] Input from dns-operations on NCAP proposal

2022-06-02 Thread Brian Dickson
he in-zone CNAME record to cause re-queries from
   resolvers, to estimate query volume

Additional child record CNAMEs could be added with the same or similar
target(s).

   - Each CNAME would be added to the root zone, since there is no
   delegation involved.
  - e.g. common-name.candidate-tld. CNAME
  some-other-target-that-is-cnamed-to-nxdomain.ncap.example.net.

It would also be possible to add a wildcard CNAME below any FQDN, which
would match any descendant of the FQDN for which no existing name was
present in the zone. (Details of wildcard matching are omitted for brevity.)

   - e.g. *.candidate-tld. CNAME
   wildcard-target-that-is-cnamed-to-nxdomain.ncap.example.net.

It would be advisable to do this first, before any consideration of doing
option 3.
None of the other options is advisable.

Brian Dickson

P.S. This solution can be tested and validated relatively easily, as it
only involves normal, standard DNS server(s) and supported record types.
P.P.S. Of course, you would need to supply your own real domain name
anywhere in the above that "example.net" appears.



>
>
> Best,
>
>
>
> Matt Thomas
>
> NCAP Co-chair
>
>
>
>
>
> -- Forwarded message --
> From: "Thomas, Matthew via dns-operations" 
> To: "dns-operati...@dns-oarc.net" 
> Cc:
> Bcc:
> Date: Mon, 23 May 2022 13:48:12 +
> Subject: [dns-operations] Input from dns-operations on NCAP proposal
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] [Ext] Obsoleting 1024-bit RSA ZSKs (move to 1280 or algorithm 13)

2021-10-21 Thread Brian Dickson
On Thu, Oct 21, 2021 at 7:54 PM George Michaelson  wrote:

> I would be concerned that the language which makes the recommendation
> HAS to also note the operational problems. You alluded to the UDP
> packetsize problem. And implicitly the V6 fragmentation problem. What
> about the functional limitations of the HSM and associated signing
> hardware? I checked, and the units we operate (for other purposes than
> DNSSEC) don't support RSA1280. They do RSA1024 or  RSA2048. This is
> analogous to the recommendation I frequently make casually, to stop
> using RSA and move to the shorter cryptographic signature algorithms
> to bypass the size problem: They are slower, and they aren't supported
> by some hardware cryptographic modules.
>

Okay, yes, this was something I wasn't taking into consideration.
(My apologies to everyone.)

Everything is, to some degree or another, a trade-off.

So, out of curiosity (and for a single data point I suppose), which non-RSA
algorithms does your HSM support?
If it includes one of the elliptic curve algorithms, I think the
interesting thing would be the respective multipliers on slowdown and
crypto strength (work factor).
E.g. a 50x slowdown which produces, say, a 1000x work factor increase,
would be worth considering seriously, but it is unclear what the work
factor increase would be.

I think additionally, anyone looking at what to do would probably need to
determine two parameters:

   - Natural signing rate (e.g. due to changes in data to be signed)
   - Re-signing time (speed x number of entries)

There are places on the performance curves that are unsupportable: when
the number of entries is large enough and the natural signing rate is high
enough, the re-signing time becomes effectively infinite (a full re-sign
can never complete).
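A toy capacity model of those two parameters (purely illustrative numbers; real HSM throughput varies with algorithm and key size):

```python
# Toy model: an HSM signing at `sigs_per_second` must keep up with the
# natural signing rate (churn) AND still complete periodic full re-signs.
def resign_time_seconds(num_rrsets, sigs_per_second, churn_sigs_per_second):
    spare = sigs_per_second - churn_sigs_per_second  # capacity left for re-signing
    if spare <= 0:
        return float("inf")  # full re-sign can never complete
    return num_rrsets / spare

print(resign_time_seconds(20_000_000, 500, 100))  # 50000.0 seconds (~14 hours)
print(resign_time_seconds(20_000_000, 500, 500))  # inf: utilization is 100%
```

The second call is the unsupportable point on the curve: once churn consumes all signing capacity, only faster hardware, more HSMs in parallel, or a faster algorithm helps.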

In that situation, there are not a lot of alternatives: replace the HSM(s);
scale horizontally with additional HSMs operating in parallel; use a faster
(and presumably weaker) algorithm.

The fourth option is to perform signing using non-HSM equipment, which has
challenges of its own.


> Even without moving algorithm, Signing gets slower as a function of
> keysize as well as time to brute force. So, there is a loss of
> "volume" of signing events through the system overall. Time to resign
> zones can change. Maybe this alters some operational boundary limits?
> (from what I can see, 1024 -> 1280 would incur 5x slowdown.  1024-2048
> would be 10-20x slowdown. RSA to elliptic curve could be 50x or worse
> slowdown)
>
> If the case for "bigger" is weak, then if the consequences of bigger
> are operational risks, maybe bigger isn't better, if the TTL bound
> life, is less than the brute force risk?
>
> A totally fictitious example. but .. lets pretend somebody has locked
> in to a hardware TPM, and it simply won't do the recommended algorithm
> but would power on with 1024 until the cows come home? If the TTL was
> kept within bounds, if resign could be done in a 10 day cycle rather
> than a 20 day cycle (for instance) I don't see why the algorithm
> change is the best choice.
>
>
You are correct, and much depends on things like stability of the zone and
total zone size.

The ultimate limit is really the utilization level of the signing hardware.
Once the hardware is operating full-out constantly, it is only a matter of
time before the theoretical adversarial risk exceeds the zone operator's
risk tolerance.
If hardware performance generally continues to improve along the current
exponential scale (e.g. CPU and GPU performance), signing hardware will
eventually become obsolete and need replacing.

Brian


> On Fri, Oct 22, 2021 at 11:46 AM Brian Dickson
>  wrote:
> >
> >
> >
> > On Wed, Oct 20, 2021 at 10:22 AM Paul Hoffman 
> wrote:
> >>
> >> On Oct 20, 2021, at 9:29 AM, Viktor Dukhovni 
> wrote:
> >>
> >> > I'd like to encourage implementations to change the default RSA key
> size
> >> > for ZSKs from 1024 to 1280 (if sticking with RSA, or the user elects
> RSA).
> >>
> >> This misstates the value of breaking ZSKs. Once a KSK is broken, the
> attacker can impersonate the zone only as long as the impersonation is not
> noticed. Once it is noticed, any sane zone owner will immediately change
> the ZSK again, thus greatly limiting the time that the attacker has.
> >
> >
> > This presupposes what the ZSKs are signing, and what the attacker does
> while that ZSK has not been replaced.
> >
> > For example, if the zone in question is a TLD or eTLD, then the records
> signed by the ZSK would include almost exclusively DS records.
> > DS records do change occasionally, so noticing a changed DS with valid
> signature is unlikely for anyone other than the operator of the
> corresponding delegated zone.

Re: [dns-operations] [Ext] Obsoleting 1024-bit RSA ZSKs (move to 1280 or algorithm 13)

2021-10-21 Thread Brian Dickson
On Wed, Oct 20, 2021 at 10:22 AM Paul Hoffman 
wrote:

> On Oct 20, 2021, at 9:29 AM, Viktor Dukhovni 
> wrote:
>
> > I'd like to encourage implementations to change the default RSA key size
> > for ZSKs from 1024 to 1280 (if sticking with RSA, or the user elects
> RSA).
>
> This misstates the value of breaking ZSKs. Once a KSK is broken, the
> attacker can impersonate the zone only as long as the impersonation is not
> noticed. Once it is noticed, any sane zone owner will immediately change
> the ZSK again, thus greatly limiting the time that the attacker has.
>

This presupposes what the ZSKs are signing, and what the attacker does
while that ZSK has not been replaced.

For example, if the zone in question is a TLD or eTLD, then the records
signed by the ZSK would include almost exclusively DS records.
DS records do change occasionally, so noticing a changed DS with valid
signature is unlikely for anyone other than the operator of the
corresponding delegated zone.
An attacker using such a substituted DS record can basically spoof anything
they want in the delegated zone, assuming they are in a position to do that
spoofing.
And how long those results are cached is controlled only by the resolver
implementation and operator configuration, and the attacker.

So, the timing is not the duration until the attack is noticed
(NOTICE_DELAY), it is the range MIN_TTL to MIN_TTL+NOTICE_DELAY (where
MIN_TTL is min(configured_TTL_limit, attacker_supplied_TTL)).
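To make that window concrete, a tiny arithmetic sketch (variable names follow the post; the numbers are purely illustrative):

```python
# Illustrative only: how long a spoofed, validly-signed DS (and the
# records it vouches for) can linger in a resolver cache.
def exposure_window(configured_ttl_limit, attacker_supplied_ttl, notice_delay):
    min_ttl = min(configured_ttl_limit, attacker_supplied_ttl)
    # The attack persists for at least MIN_TTL, and for as much as
    # MIN_TTL + NOTICE_DELAY if nobody intervenes with the resolver.
    return (min_ttl, min_ttl + notice_delay)

# Resolver caps TTLs at 1 day, attacker asks for 7 days,
# and the substitution goes unnoticed for 3 days:
print(exposure_window(86_400, 7 * 86_400, 3 * 86_400))  # (86400, 345600)
```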

The ability of the operator of the delegated zone to intervene with the
resolver operator is not predictable, as it depends on what relationship,
if any, the two parties have, and how successful the delegated zone
operator is in convincing the resolver operator that the cached records
need to be purged.

Stronger ZSKs at TLDs are warranted even if the incremental improvement is
less than what cryptographers consider interesting, IMNSHO. It's not an
all-or-nothing thing (jump by 32 bits or don't change); it's a question of
what granularity, in increments of bits, is reasonable for RSA keys. More
of those increments is better, but at least one such increment should be
strongly encouraged.

I think Viktor's analysis justifies the suggestion of 256 bits (of RSA) as
the granularity, and thus recommending whatever in the series 1280, 1536,
1792, 2048 the TLD operator is comfortable with, with recommendations
against going too big (and thus tripping over the UDP-to-TCP boundary).
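Back-of-envelope arithmetic for the size trade-off (a sketch, not a measured benchmark):

```python
# An RSA signature is as long as the modulus: key_bits / 8 bytes.
def rsa_sig_bytes(key_bits):
    return key_bits // 8

# Stepping up from 1024 in 256-bit increments adds 32 bytes per RRSIG:
for bits in (1024, 1280, 1536, 1792, 2048):
    print(f"RSA-{bits}: {rsa_sig_bytes(bits)}-byte signature")
# For comparison, an ECDSA P-256 (algorithm 13) signature is 64 bytes
# regardless, which is why switching algorithms sidesteps the growth.
```

Whether a given response trips the UDP-to-TCP boundary still depends on how many RRSIGs it carries, so the per-signature delta is what matters here.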


> In summary, it is fine to propose that software default to issuing larger
> RSA keys for ZSKs, but not with an analysis that makes a lot of unstated
> guesses. Instead, it is fine to say "make them as large as possible without
> causing automatically needing TCP, and ECDSA P256 is a great choice at a
> much smaller key size".
>

I'm fine with adding those to the recommendations (i.e. good guidance for
the rationale for picking ZSK size and/or algorithm), with the added
emphasis on not doing nothing.

Brian
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] Verisign won't delete obsolete glue records?

2021-03-01 Thread Brian Dickson
On Mon, Mar 1, 2021 at 4:41 PM Doug Barton  wrote:

>
> Thanks for the explanation about objects vs. host names. In this case
> it's not a third party that is using the old names, it's still us, so we
> don't want to "break" those delegations.
>
> Perhaps I didn't ask my question clearly enough. Let's take a delegation
> for example.com to ns1.example.info and ns2.example.info. There will be
> no host records at Verisign for those two names, right? So how are those
> delegation host names represented in the database, and why can't my
> now-obsolete glue records be represented the same way?
>

Okay, I think I understand better what you're asking.

My understanding is that, even though the delegation is to an off-TLD name
server, the registry still needs an object.
So, the glue rules mean that object will have a name, but not have any
addresses.

Those objects' names are basically first-come, first served.
But, if you rename them, the original name is no longer in existence.
At that point, if you wanted to, you could create a new object with the
now-vacated name.

(This may even be what you want to do, one way or another.)

I'm pretty sure you can't have different objects using the same name at the
same time.

And basically, if you want the other delegations to point to the
same/original IP, or to the new name, what you really want to do is rename
the host, not change the delegation of the domain.

(I'm assuming you want all the domains to point to a new name, and not have
any delegations pointing to the old name).

If you did the re-delegation first, that could be a bit tricky. You might
need to do the following:

   - Rename the new host record that was created to a throw-away name
   - Change the delegation to the original name (and re-connect to the
   original object)
   - Delete the now-unreferenced throw-away name
   - Rename the original object host to the new name you want to use for
   all your delegations

Repeat the above for each name server host name.

After the above steps, there will no longer be any host objects which are
children of the "primary" domain.

Thus, you won't need to try to delete anything, because the name will
already no longer exist. (The object will, but it will have a new name.)

Brian
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] Verisign won't delete obsolete glue records?

2021-03-01 Thread Brian Dickson
On Mon, Mar 1, 2021 at 3:28 PM Doug Barton  wrote:

> I'm being told something by my registrar which I find impossible to
> believe, but they keep telling me that they have accurately transmitted
> my request, and that the answer is no. "Let me 'splain. No, there is too
> much. Let me sum up."
>
>

> So what am I missing here? I know that in the past it was possible, and
> in fact desirable, to remove those obsolete glue records, but now it's
> impossible to do it?
>

Not speaking with knowledge of the specifics, only concerning the general
case:
The RRR (registry/registrar/registrant) system is somewhat complex, and
arcane.
The common language used, EPP, is capable of representing relationships,
but is restrictive.

The root problem is the object model (tied to the database nature of
registries).
A glue record is basically a host record, with a name and IP address(es).
Domains (registered with the registry, belonging to registrants) have their
delegations represented as references to host records.

This is where things break down: the delegation is to the object, not the
name.

If you change your delegations to a different name, that will either change
the reference to a different object, or possibly create a new object and
use that for its delegation reference.

The old object (with the original name) still exists.

If (and ONLY if) there are no other references (i.e. delegations) to that
object, can the object be deleted.

That rule is enforced, and is tied to the database model for hosts and
domains.

You do generally have the option of renaming the object, and there are some
interesting options available.

One is to change the name to an off-TLD name, in which case the
corresponding IP address(es) are removed.

Using an off-TLD name that is deliberately and permanently unresolvable is
a nice, clean way of "breaking" the other domains, who should really not
have been using your name server as their name server without your
permission.

An example name would be "SOME_RANDOM_VALUE".empty.as112.arpa
(empty.as112.arpa is a zone intended to never have any non-apex records, as
the name suggests, and its existence is defined for that purpose in RFC
7535).

For "SOME_RANDOM_VALUE", it is recommended that you use a GUID type
generated value for the label, to ensure it does not collide with anyone
else doing the same thing. (There are others doing this already.)
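A hypothetical example of generating such a rename target (the label shown is whatever uuid4 produces; only the empty.as112.arpa suffix comes from the suggestion above):

```python
# GUID-style label under empty.as112.arpa (RFC 7535): guaranteed
# unresolvable, and unlikely to collide with anyone else doing the same.
import uuid

label = uuid.uuid4().hex  # 32 hex chars, well under the 63-octet label limit
target = f"{label}.empty.as112.arpa"
print(target)
```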

Hope this helps explain the situation.

(It's not your fault, and it isn't the registry's fault, it is whoever has
for whatever reason delegated some other domain to your name server that
has caused the problem.)

Brian
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] Quad9 DNSSEC Validation?

2021-03-01 Thread Brian Dickson
On Mon, Mar 1, 2021 at 2:16 PM Viktor Dukhovni 
wrote:

> On Mon, Mar 01, 2021 at 09:12:38AM +0100, Petr Špaček wrote:
>
> > In my experience negative trust anchors for big parts of MIL and/or GOV
> > are way more common, let's not pick specifically on Quad9. For periods
> > of time I have seen with other big resolver operators as well.
>
> On the .gov side, just 10 of 1239 domains fail to return validated
> DNSKEY RRsets (with rounded number of weeks duration):
>
> weeks |   domain
>---+
>


>   148 | uscapitolpolice.gov


Just an observation, in terms of real world implications of DNSSEC
validation failures:

I hope this wasn't in any way a contributing factor in the 2021-01-06
events/response.

Brian
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] [Ext] Possibly-incorrect NSEC responses from many RSOs

2021-02-28 Thread Brian Dickson
On Sun, Feb 28, 2021 at 12:37 PM Viktor Dukhovni 
wrote:

> On Sun, Feb 28, 2021 at 08:52:38PM +0100, Vladimír Čunát wrote:
>
> > On 2/28/21 8:47 PM, Paul Hoffman wrote:
> > >> [1]https://tools.ietf.org/html/rfc8482#section-7  [tools.ietf.org]
> > > That RFC (a) doesn't update RFC 4025 and (b) is only about QTYPE of
> "ANY".
> >
> > I meant just the informal future-work note focused on QTYPE=RRSIG (in
> > the linked section), to support my claim that there are advantages in
> > avoiding full replies to such queries.
>
> Not only are "full" replies not needed, detached from the RRSet for
> which an RRSIG is the signature, the content of the RRSIG is both
> useless and meaningless.  Since it can never be validated it should not
> be cached.
>
> An interoperable synthetic reply when the qname exists would be:
>
>  0 IN RRSIG RRSIG 255  0 <0x0> <0x0> 0
>  AA==
>
> A signature payload of a single 0 byte avoids potential issues with
> unexpected zero-length signatures.
>
>   * It is less clear what to do when the qname is wildcard-synthesized.
> Should there be NSEC records to validate a wildcard-based response???
>
> My take is "no", just always set the closest encloser to equal the
> qname, and let the zero TTL take care of not having such replies
> stick around in caches to imply anything about the node.
>
> Iterative resolvers should not cache RRSIG replies, regardless of TTL.
>
> I'm writing a new stub resolver for Haskell, and even prior to this
> thread my plan was to not permit RRSIG queries, because they made no
> sense.  I could instead just return the above synthetic response without
> asking any upstream server, but an error telling the user they're doing
> the wrong thing seems more appropriate.
>

I think this is vaguely interesting, if for no other reason than that it exposes
some weaknesses in the original 403[345] RFCs.

Two relevant questions are:

   - Are the observed RRSIG queries the result of an actual client's
   behavior? (The alternative being diagnostic tool usage, and I'd be
   surprised if it was the former.)
   - If a validating client is either using a security-oblivious resolver
   (the client might be a stub or a forwarder), or has an intermediate network
   device interfering with DNSSEC, is it even possible to do validation, ever?

It's clear that the relevant portion(s) of 403[345] don't correctly handle
the RRSIG case, and pretty much cannot (since RRSIG needs to know the
original QTYPE to select/filter the RRSIG).

If (big IF) there is interest in solving for the second case (validating
client behind a middle box or resolver that is not returning RRSIG records,
possibly not doing EDNS, and may or may not be security-aware with respect
to CD and AD bits), what then?
It might be interesting to know how much of the Internet is in those
situations (validating stub or validating resolver, but unable to actually
validate).

The follow-up question, if a substantial portion of the client base is
impacted, is which path is more likely to occur on a timely basis:

   - Removal/upgrade of resolvers or middle boxes causing these issues
   - Deployment of new code on resolvers and clients with ways of
   addressing the RRSIG issue

(I don't think there is any real reason or value for auth-only servers to
do anything different, or at most only add the auth piece of any new logic.)

If the latter (deployment of new code) is the path of least resistance
(which would be unpleasant, obviously), the question would then be: how
would a client signal to a server, that it wants RRSIG records for a
specific signed RRSET/RRTYPE?

The assumption would probably be a worst-case scenario: no EDNS, but
possibly transparent path for AD/CD bits, and possibly support for new
OPCODEs. (Testing real paths might be needed for the OPCODE support.)

The methods I can think of are basically:

   - Underscore added to QNAME, to indicate the second QTYPE (either
   _RRSIG.QNAME for QTYPE==thing that is signed by the RRSIG, or _QTYPE.QNAME
   with real QTYPE being RRSIG).
   - New OPCODE for RRSIG, so instead of OPCODE==0 and QTYPE==FOO, have
   OPCODE==RRSIG and QTYPE==FOO
   - The returned reply would either be just the RRSIG of the right QTYPE,
   or the answer of QTYPE RRSET in the Answer section, and the RRSIG(s) in the
   Additional section
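As a purely hypothetical illustration of the first method (the `_QTYPE.QNAME` convention is an idea from this post, not an existing standard):

```python
# Hypothetical encoding of the first bullet's idea: carry the covered
# RRTYPE in an underscore-prefixed QNAME, with the real QTYPE being RRSIG.
def rrsig_probe_qname(qname, covered_qtype):
    return f"_{covered_qtype.lower()}.{qname}"

print(rrsig_probe_qname("www.example.com", "A"))  # _a.www.example.com
```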

Absent the above, it is probably fair to exclude RRSIG from things that can
get sensible answers, and 403[345] should be updated to clarify.

(IMHO, the extra logic might not be too bad, and would potentially be
useful for advancing the deployment of validating stubs.)

Brian
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


[dns-operations] Broken A and J root responses

2021-02-26 Thread Brian Dickson
This is of interest to both resolver operators and Verisign.

We have noticed broken responses to certain query types from some instances
of A and J.
This was raised originally by David Kinzel, BTW, on the DNS-OARC Mattermost
channels.

We have seen queries for NSEC for both "jp" and "sl" return results that
could/would poison the root delegation NS set (and this was what David saw
that started the investigation).

See below for the query/response. Note the Authority section in particular.

Brian Dickson
GoDaddy

dig +do +norec @a.root-servers.net nsec sl. +nsid

; <<>> DiG 9.16.7 <<>> +do +norec @a.root-servers.net nsec sl. +nsid
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27231
;; flags: qr aa; QUERY: 1, ANSWER: 2, AUTHORITY: 3, ADDITIONAL: 3

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
; NSID: 6e 6e 6e 31 2d 73 66 6f 37 ("nnn1-sfo7")

;; QUESTION SECTION:
;sl. IN NSEC

;; ANSWER SECTION:
sl. 86400 IN NSEC sling. NS RRSIG NSEC
sl. 86400 IN RRSIG NSEC 8 1 86400 2021031117 2021022616 42351 .
CQf3h+rHcoK2WSn7ItV8IQLb6yFFXSA+Lt86S58sm32u7QtTJsepap6r
LcREA16YEmr5N9U7ytPyqNZmH92q24XGAtB0bikn9iZXTuIDG6BztbLr
EqmDZ+lxutzmLDL2LOA9wcnk6TiKirxcId9j95Evy3gVNObAe94xvQIw
5LLtjeyQqRvWM+SAg7aXOyugedYIJtxUBVg9P7AHlLU+Z5HSfXo8EeJ9
NgyrkVnNnJNyJ7n02qNiyCiNm0lrkglWTbEAt5iquR6KiLlKcrB6ml3c
ZSqfTBv108Ev+iuL3W80kWJEpkwomPRVlF+2R4yCZt38kA0Xc0VBp4FR hTlGYA==

;; AUTHORITY SECTION:
. 172800 IN NS ns2.neoip.com.
. 172800 IN NS ns1.neoip.com.
. 518400 IN RRSIG NS 8 0 518400 2021031117 2021022616 42351 .
WTZU7GHTyNZvGFvc+avXpUgu26QDWaywDOoS0Ac8FQnuVnwvIbYpdoew
jMJFmZ5b7rWdzlJ6NgwURxLX7/0EOSDYk3sTdnjK9RtQbVtEBCueiSF4
3xkFNILgmiCYuoLQLHNpue/ORvEPMQUYif33KLoSgoX+qMLEqjrp14E0
qKmDCErjHkrV3uqRmvix5psxLSebhCz4WJeqPC3kIi6OcfGMQO5siI4L
gVNnw9Hmal7W9UJGokDbhcsnb51Q43rGlrfp6pBosiWYfJDys9YWg4jU
JUeShUFLH74SqavH+jQ0FsPoi5Vzbtfua3GUs0T67J2TpctlOjUBD3oz yX1g9g==

;; ADDITIONAL SECTION:
ns2.neoip.com. 172800 IN A 64.202.189.47
ns1.neoip.com. 172800 IN A 45.83.41.38

;; Query time: 21 msec
;; SERVER: 198.41.0.4#53(198.41.0.4)
;; WHEN: Fri Feb 26 11:12:15 PST 2021
;; MSG SIZE  rcvd: 719
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] Speaking of fixing things...

2020-10-30 Thread Brian Dickson
Hi, Victor,
Would you mind checking the list for domains with broken signed delegations
to anything matching *.domaincontrol.com (GoDaddy's nameservers), including
categorization (e.g. lame NS vs. non-lame NS with a broken signature)?
My suspicion is there may be a bunch of lame delegations, and knowing which
TLDs (and if possible domains!) would be greatly appreciated.
Cleaning up lame delegations is neither easy nor fast, but we do want to
actually clean them up.

(The root issue is there is currently no path for the delegatee to get the
lame delegation removed. None. Nada. :-( )

Thanks,
Brian

On Thu, Oct 29, 2020 at 10:59 PM Viktor Dukhovni 
wrote:

> I have a list of ~69k domain names with extant DS RRsets, where the
> DNSKEY RRset has been either unavailable or failing validation for 180
> days or more (92k domains if the bar is set to 90 days).  These span 439
> TLDs!  Of these domains, ~30k are simply lame and zone apex NS lookups
> fail even with CD=1.  The remaining ~39k likely have DNSSEC-specific
> misconfiguration.
>
> The top 25 TLDs by count of long-term dead signed delegations are:
>
>   24742 com
>    9258 nl
>    5357 se
>    4553 cz
>    2897 net
>    2763 eu
>    2044 pl
>    1661 org
>    1070 no
>    1035 hu
>     992 fr
>     916 nu
>     731 uk
>     701 info
>     594 be
>     562 ch
>     557 xyz
>     552 de
>     421 es
>     349 sk
>     346 dk
>     321 app
>     282 io
>     250 biz
>     240 pt
>
> If any of the TLDs have policies that allow the deadwood to be delisted
> (still registered, but not delegated) I can provide the list of
> domains...  It would be nice to see less breakage in the live zones.
>
> --
> Viktor.


Re: [dns-operations] [Ext] DNS Flag Day 2020 will become effective on 2020-10-01

2020-09-11 Thread Brian Dickson
On Fri, Sep 11, 2020 at 1:01 PM Vladimír Čunát 
wrote:

> On 9/11/20 9:14 PM, Randy Bush wrote:
> >> The main issue with having the discussion on github, is that it is a
> >> discussion on github, not on a major mailing list involving the
> >> operators and folks doing independent implementations.
> > for cabals which like a bubble, this is a feature, not a bug
>
> Are you telling me that Flag Day 2020 got too little publicity in here
> and similar circles?  (its web, linked to GitHub, the plans, etc.)  I
> rather thought we've pushed it everywhere often enough to make anyone
> sick of the topic, but perhaps my perception is biased.  I'm really
> sorry if anyone feels excluded from the discussion.  To be clear, we've
> had multiple "Flag Day 2020" threads just on this list.
>

TL;DR: Yes, or rather the content of that discussion was not necessarily
raised adequately in other venues, IMNSHO.

The main participants on that github site appear not to have had enough
breadth and depth of experience with networks, low-level transports, and all
the roles in the DNS ecosystem to collectively reach supportable
conclusions/decisions.
E.g. "voting" based on participants' opinions is neither justifiable nor
sensible, especially when there are only five voters. The majority of those
voters appeared to be suffering from "group think" after reading the paper
discussing the range of 12xx to 1400.

In particular, things really went off the rails when discussing using
client-side defaults lower than current defaults, i.e. Paul Vixie's
suggestion to leave the offered client bufsize as is, and only change the
server-side configured max size.

The discussion around that mostly devolved into vendors/implementers
defending past engineering/implementation choices ("that's how we did it"),
without adequately considering the actual real-world impact.

The deployment to client side machines will be a slow roll, for sure.

But the difficulty, time, and pain, to roll that *back* when it is
discovered as a major operational PITA and cost for authority operators,
has been completely overlooked.

In short: I would be perfectly okay if the recommendation were ONLY for the
authority (and server side of resolvers) to lower their default configured
UDP bufsizes, at which point having a range of recommended values (rather
than a single value) would be more appropriate.
Server-side defaults can have their values changed (overridden) by config
changes, but that ONLY has an effect if the clients are NOT ALSO imposing
the SAME low values.

That's the problem: EDNS0 UDP Bufsize negotiation allows different values
to be configured/offered, and uses the MINIMUM value. If both ends have
their defaults lowered, and that causes a problem, it CANNOT be fixed
unilaterally.
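A tiny sketch of that negotiation (illustrative only, not drawn from any implementation): the responder is bounded by both its own configured maximum and the requestor's offered size, so a both-ends default change cannot be backed out from one side:

```python
# Effective EDNS0 UDP payload size: bounded by BOTH the client's offered
# size and the server's configured maximum (illustrative sketch).

def effective_udp_size(client_offered: int, server_max: int) -> int:
    return min(client_offered, server_max)

# Server-side-only change: the authority operator can raise server_max
# again, and transactions with unchanged clients immediately recover.
assert effective_udp_size(4096, 1232) == 1232   # lowered server default
assert effective_udp_size(4096, 4096) == 4096   # ...rolled back: fixed

# Both-sides change: raising the server limit no longer helps, because
# the deployed client population still offers 1232.
assert effective_udp_size(1232, 4096) == 1232
```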

Even only considering the recursive resolver population (estimated at ~3M),
this is a huge issue, and IMHO a huge mistake.

The analysis of the relative impact (e.g. N x cost for TCP) ignores things
like state exhaustion, where the state CANNOT be increased (since port 53
is mandated, and server IP addresses are hardcoded everywhere). You cannot
add IP addresses or ports to fix state exhaustion, which can be a localized
issue on anycast operated networks.

Sorry for the long message, but it is really a big deal, and the timeline
is unfortunate. I'd suggest pushing the date back by a month or two
minimum, and re-opening discussion on these issues on the github site.

Or, discuss them here with a wider set of participants.

Brian


Re: [dns-operations] [Ext] DNS Flag Day 2020 will become effective on 2020-10-01

2020-09-11 Thread Brian Dickson
To quote the late Douglas Adams, from HHGttG:
>
> “There’s no point in acting surprised about it. All the planning charts
> and demolition orders have been on display at your local planning
> department in Alpha Centauri for 50 of your Earth years, so you’ve had
> plenty of time to lodge any formal complaint and it’s far too late to start
> making a fuss about it now. … What do you mean you’ve never been to Alpha
> Centauri? Oh, for heaven’s sake, mankind, it’s only four light years away,
> you know. I’m sorry, but if you can’t be bothered to take an interest in
> local affairs, that’s your own lookout. Energize the demolition beams.”


The main issue with having the discussion on github, is that it is a
discussion on github, not on a major mailing list involving the operators
and folks doing independent implementations.

The other main issue is, that EDNS UDP size is negotiated.
This means that it is NOT required that the default be the same on both the
client and server.
I would argue that which end should be lower should depend largely on:

   1. The number of potential places overrides are necessary
   2. The comparative skill and expertise of the operators in those places
   3. The log-log nature of distribution of volume of queries between the
   top-talkers (biggest recursives and biggest authorities) and the long tail
   4. The position in the ecosystem of the various software elements, i.e.
   client-recursive vs recursive-root vs recursive-TLD vs recursive-leaf-auth

There is an asymmetry involved: when a lot of small-ish clients have new
defaults that end up triggering excessive TCP traffic, it
disproportionately impacts the authority server operators, who then have no
control over the situation.

As such, I would greatly prefer the recommendation be lifted to a higher
number, supported by the data, which achieves the following goals
simultaneously:

   - Minimize probability of fragmentation (to approximately 0.1% or 0.01%
   or even lower)
   - Minimize the resulting degree of TCP traffic triggered by DNS
   responses that exceed the UDP size negotiated

To me, that means maximizing the UDP size within a reasonable range of
observed data points with similar-enough behavior.

From a theoretical perspective, I would be surprised if 1452 wouldn't work,
but the data suggests otherwise, from what I recall. (1452 = 1500 minus 8
bytes each for MPLS or 802.1q, twice, plus L2 (Ethernet frame)
encapsulation over MPLS, plus IP-in-IP encapsulation, either/or, twice.)
If 1400 works pretty much as well as 1232, I really want to encourage
re-evaluating the consensus regarding the Flag Day 2020 number.
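For reference, here is the back-of-envelope arithmetic behind the numbers in this thread (my own sketch; the right overhead budget for the 1400-1452 range is exactly the point in dispute):

```python
# Header-size arithmetic behind the candidate EDNS bufsize values.

IPV6_HEADER = 40   # bytes, assuming no extension headers
IPV4_HEADER = 20   # bytes, assuming no options
UDP_HEADER = 8

def max_dns_payload(path_mtu: int, ip_header: int = IPV6_HEADER) -> int:
    """Largest DNS/UDP payload that avoids IP fragmentation at this MTU."""
    return path_mtu - ip_header - UDP_HEADER

# 1232 is the conservative end: IPv6's minimum MTU (1280), which every
# link is required to support, minus IPv6 + UDP headers.
assert max_dns_payload(1280) == 1232

# 1452 assumes a clean 1500-byte Ethernet path less the same 48 bytes;
# any tunnels/tags (MPLS, 802.1q, IP-in-IP) eat further into that budget.
assert max_dns_payload(1500) == 1452
assert max_dns_payload(1500, ip_header=IPV4_HEADER) == 1472
```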

Brian

P.S. Maybe we could call this "frag day" instead? Apologies if anyone finds
that term offensive for any reason. But this is all about fragmentation,
and "frag" is much less of a mouthful, and it rhymes with "flag".

On Fri, Sep 11, 2020 at 9:46 AM Vladimír Čunát 
wrote:

> On 9/11/20 4:44 PM, Paul Hoffman wrote:
> > If this is really just a vendor-driven flag day, please be clearer about
> that on the web page.
>
> The GitHub repo and other places have been open for *everyone* to
> participate in the discussions.  That's how I understand the "we",
> similarly to "we DNSOP".  Yes, the final number was not unanimous, but
> such a thing rarely happens.  And yes, I think it's true that
> open-source resolver vendors were the most active there.
>
> --Vladimir
>


Re: [dns-operations] [Ext] Nameserver responses from different IP than destination of request

2020-08-31 Thread Brian Dickson
On Mon, Aug 31, 2020 at 5:09 PM Paul Hoffman  wrote:

> On Aug 31, 2020, at 2:47 PM, Viktor Dukhovni 
> wrote:
> >
> > Quite likely the domains that are completely broken (none of the
> > nameservers respond from the right IP) are simply parked, and nobody
> > cares whether they actually work or not.
> >
> > The only reason you're seeing queries for them may be that folks doing
> > DNS measurements query all the domains they can find, including the parked
> > ones that nobody actually cares to have working.
>
> These assumptions seem... assumptiony. I'd love to see some data from
> anyone who is collecting it on which NS names or IPs are exhibiting the
> behavior.
>

I don't disagree, but the data would really only be visible to anyone
on-path or at either end of the resolver-to-authority transaction.

I think the only way to get meaningful data would be an active experiment,
involving an authority server (or set of servers) for a domain set up just
this way.
That is the kind of thing that Geoff and George are good at, so if they
want to do such an experiment and let us all know the results, I think that
would be interesting.

But I can't compel them to do that, and absent them choosing to do that, I
think the general consensus is it's fine to let the broken stuff be broken.
(The interesting result would be on the resolver side, as to which
resolvers, if any, accept broken answers, and if possible, inferring the
resolver operator's software being used.)

Brian


Re: [dns-operations] prefetching and thundering herds

2020-07-15 Thread Brian Dickson
TL;DR:
I think the main issue is making sure that any caching "stubs" (e.g.
resolvers in forward-only mode) do NOT prefetch, but instead query for
expired entries only when a natural query arrives from the forwarder's
client(s) (e.g. from an application on the host).
That should, in principle, prevent thundering herd from client to recursive.

Doing the opposite (prefetch by forwarder) would definitely cause
thundering herd behavior, and likely cause significantly degraded
performance from the client applications' perspective.
(UDP, network queues, hardware queues, retries, and all that fun stuff
being triggered with greater and greater synchronization over time.)
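A minimal sketch of the intended behavior (hypothetical code, not any shipping stub resolver): refresh only on a client-driven query, never in the background, so expirations cannot synchronize into a herd:

```python
import time

# Hypothetical sketch (not any shipping stub resolver): a forward-only
# cache that never prefetches. Entries are refreshed only when a client
# query arrives after expiry, so TTL expirations cannot turn into a
# synchronized burst of upstream queries.

class ForwardOnlyCache:
    def __init__(self, resolve_upstream):
        self.resolve_upstream = resolve_upstream  # callable(name) -> (rrset, ttl)
        self.cache = {}                           # name -> (rrset, expires_at)

    def query(self, name, now=None):
        now = time.monotonic() if now is None else now
        entry = self.cache.get(name)
        if entry and entry[1] > now:
            return entry[0]            # fresh: answer from cache, no upstream
        # Expired or absent: go upstream ONLY because a client actually asked.
        rrset, ttl = self.resolve_upstream(name)
        self.cache[name] = (rrset, now + ttl)
        return rrset

upstream_calls = []
def fake_upstream(name):
    upstream_calls.append(name)
    return (["192.0.2.1"], 300)        # TTL 300s

c = ForwardOnlyCache(fake_upstream)
c.query("example.com", now=0)          # miss: one upstream query
c.query("example.com", now=100)        # fresh: cache hit, no upstream query
c.query("example.com", now=400)        # expired: refreshed on demand
assert len(upstream_calls) == 2
```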

Brian

On Wed, Jul 15, 2020 at 4:50 AM Tony Finch  wrote:

> I've been wondering about the effects of stub resolvers with caches as
> clients of recursive servers. To what extent do they cause a thundering
> herd effect where all the cache entries expire with the same deadline?
> The herd will arrive when the RRset expires so most of those clients will
> hit maximum latency and stress the server's query deduplication mechanism.
>
> (I don't think I have enough traffic to get a useful answer from my
> servers right now.)
>
> If thundering herds happen, do they thunder enough to help explain the
> lack of benefit from prefetching observed by PowerDNS?
>
> Or maybe is the herd is too small to thunder? Instead there's a more
> gentle swell of queries after the TTL expires?
>
> https://lists.dns-oarc.net/pipermail/dns-operations/2019-April/018605.html
>
> If there is much of a herd, would it make sense to give some proportion of
> the clients a slightly reduced TTL so that they will trigger prefetch
> before the rest of them requery?
>
> Tony.
> --
> f.anthony.n.finch  http://dotat.at/
> Bailey: Southwest 4 or 5, increasing 6 or 7 later. Moderate or rough,
> occasionally very rough later in far northwest. Drizzle, fog patches.
> Moderate
> or poor, occasionally very poor.


Re: [dns-operations] any registries require DNSKEY not DS?

2020-04-17 Thread Brian Dickson
On Fri, Apr 17, 2020 at 12:57 PM Olafur Gudmundsson  wrote:

>
>
> On Jan 22, 2020, at 11:16 PM, Paul Vixie  wrote:
>
> On Thursday, 23 January 2020 02:51:28 UTC Warren Kumari wrote:
>
> ...
>
> If the parent makes the DS for me from my DNSKEY, well, then the DS
> suddently "feels" like it belongs more to the parent than the child,
> but this is starting to get into the "I no longer know why I believe
> what I believe" territory (and is internally inconsistent), so I'll
> just stop thinking about this and go shopping instead :-)
>
>
> as you see, the DS RRset is authoritative in the parent, in spite of its
> name
> being the delegation point, which is otherwise authoritative only in the
> child. so, DS really is "owned by" the delegating zone, unlike, say, NS.
>
> historians please note: we should have put the DS RRset at $child._dnssec.
> $parent, so that there was no exception to the rule whereby the delegation
> point belongs to the child. this was an unforced error; we were just
> careless.
> so, example._dnssec.com rather than example.com.
>
> --
> Paul
>
>
> Paul,
> If we start talking about history and looking back with hindsight:
>
> IMHO the second biggest mistake in DNS design was to have the same type in
> both the parent and child zone.
> If RFC1035 had specified a DEL record in the parent and NS in the child (or
> the other way around), it would have been obvious to specify a range of
> records that were parent-only (just like meta records); thus all resolvers
> from the get-go would have known that types in that range reside only at
> the parent.
> ……
> If we had the DEL record then that could also have provided the glue hints
> and no need for additional processing,
>

Would the method have potentially been to have GLUEA and GLUEAAAA records
rather than effectively overloading the A/AAAA status (authoritative vs.
not)?
And then all of the new types that live only in the parent could have been
signed.
I'm guessing it's way too late to start doing that now, without rev'ing all
of DNS to v2.

Brian



>
> You may recall that in 1995 when you and I were trying to formalize for
> DNSSEC what the the exact semantics of NS record were, then you and Paul
> Mockapetris came up with
> “Parent is authoritative for the existence of NS record, Child is
> authoritative for the contents”
>
>
> Just in case you are wondering what the biggest mistake was: the QR bit.
> Recursion should have been on a different port than authoritative service.
>
> But this is all hindsight based on 30 years of coding and operational
> difficulties.
>
> Regards,
> Ólafur
>