Re: [dns-operations] Planning the 2021 DITL collection

2021-03-05 Thread Mark Allman

> OARC is beginning planning for the 2021 Day in the Life (DITL)
> collection.

As a researcher, I find the DITL collection to be a fantastic
resource.  I appreciate all the hard work.

That said, as I have used or tried to use the data over the years I
have been bit by the lack of meta-data.  I would encourage folks to
document a few simple things as the data is collected.  In
particular:

  - It is often crucial to know what is missing from a dataset, if
possible (it isn't always).  So, e.g., if there are 10 replicas
of x-root and data only comes from 7 of them that is good to
scribble down.  And, which are missing and where they are
located would also be nice to know.

  - Similarly, if you have some indication of the measurement-based
packet loss rate, please also scribble that down.  That isn't
packets lost in the middle of the network somewhere, but packets
that were not recorded by the measurement infrastructure.
Tcpdump or the like spit out their own (incorrect, but sometimes
better than nothing) notion of this and recording that would be
handy.

  - If the packets in the traces have been changed in any way from
what was on the wire, it'd be great to know.  The crucial one
here is whether the IP addresses have been anonymized.  And, if
so, are they being uniformly anonymized across all the traces /
locations you submit?  Or, is it random per trace / DNS server /
what?

  - If there is something strange going on that might impact how
folks interpret the data, please scribble it down.  Even really
benign things like "the disk filled and so there is an hour-long
gap" are handy to know because when we see this gap we can
readily decide it wasn't network-related.

  - Add some easily accessible contact information if you wouldn't
mind.  Sometimes we could use some help in figuring out puzzles
in the data.  I know sometimes folks don't want to be interrupted
to help ... and that's OK.  But, if you wouldn't mind, we'd for sure
appreciate it.

I am not suggesting some formal document or something.  Scribble in
a text file that can be left with the data.  Anything is better than
the current state.
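
For instance, even a few scribbled lines like these (all values made
up) would answer most of the questions above:

  site: x-root; replicas: 10; captured: 7 (missing: ams1, syd2, jnb1)
  capture loss (per tcpdump's "packets dropped by kernel"): ~0.02%
  addresses: client IPs anonymized; same prefix-preserving key
    used across all traces we submit
  gaps: 03:10-04:05 UTC on the ord2 replica (disk filled)
  contact: dns-ops@example.net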

Many thanks!

allman




Re: [dns-operations] resolver cache question

2020-11-17 Thread Mark Allman

>> [1] https://www.sciencedirect.com/science/article/abs/pii/S1389128620312627
>
> This is a network security paper, not a systems engineering
> paper. The authors are primarily concerned with the storage
> requirements of systems that store DNS data well beyond the
> nominal TTL of the RRsets, which they want to reduce by filtering
> out "disposable" records.

Sure.  There are multiple goals.  The logging goal is a little weird
as motivation to my eye.  In any case, there are other papers, too.

> As far as detecting whether "clogging" of DNS resolver caches
> occurs, it probably would make sense to have resolvers increment a
> "premature expiration" counter when they expire cache entries
> prior to the scheduled TTD of the cached RRset or cached
> message. This would allow operators to provide feedback to
> resolver developers as to whether this behavior is actually
> occurring in the real world. (Looking at the unbound-control
> manpage I don't think unbound has such a metric, not to pick on
> unbound.)

I was sort of hoping this kind of thing was logged and folks could
give me a clue with actual data.  Seems like a natural thing to log.
But, I guess not.  Or, maybe folks are not willing to share.

Thanks!

allman




Re: [dns-operations] resolver cache question

2020-11-13 Thread Mark Allman

> One could use a Bloom filter to avoid caching (most) lookup
> results that are encountered just once.  Or start out with an
> artificially lowered TTL combined with prefetching.

I am not sure what you mean.  If a given lookup isn't in the Bloom
filter, add it to the filter; and if it is, cache the result?
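
(Something like this, I guess?  A toy sketch of a "cache on second
sight" reading of that idea; the filter parameters and the hashing
below are made up for illustration, not from any real resolver.)

  import hashlib

  class BloomFilter:
      def __init__(self, size=1 << 20, hashes=4):
          self.size, self.hashes = size, hashes
          self.bits = bytearray(size // 8)

      def _positions(self, key):
          for i in range(self.hashes):
              h = hashlib.sha256(f"{i}:{key}".encode()).digest()
              yield int.from_bytes(h[:8], "big") % self.size

      def add(self, key):
          for p in self._positions(key):
              self.bits[p // 8] |= 1 << (p % 8)

      def __contains__(self, key):
          return all(self.bits[p // 8] & (1 << (p % 8))
                     for p in self._positions(key))

  def maybe_cache(name, rrset, seen, cache):
      # Only cache a name the second time it is looked up; one-time
      # names never take up cache space (modulo false positives).
      if name in seen:
          cache[name] = rrset
      else:
          seen.add(name)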

I dunno ... do we need that?  That's sort of the question.  Do
caches end up evicting useful records because of these one-time
records?  Or, do current cache sizes and eviction policies just
basically handle this without a lot of complication?  I really have
no idea.

> The authoritative server operators know that their zones are used
> for what is essentially an RPC.  Don't they set a zero TTL?

No.  The first paper I cited says that in their dataset the one-time
lookups have TTLs on the order of 600 sec (the median, or something
like that; I can't quite recall).  So, non-trivial.

> So maybe this is about ignoring low TTLs to increase cache hit
> rates.  Unfortunately this report is behind a paywall, and I'm not
> going to spend $35.95 to find out if the authors have considered
> Bloom filters and the rest.

Yeah- annoying as hell.  The particulars are different, but [3] has
the same sort of thrust.  I.e., infer which lookups will only be
used once and don't cache them.  I should have linked to this, as
well, in the first message.  The investigation is fine as far as it
goes.  But, it makes some assumptions about the size and operation
of the cache and those are the aspects I'd like to better
understand from folks who run real resolvers.

allman


[3] 
https://ccronline.sigcomm.org/2017/exploring-domain-name-based-features-on-the-effectiveness-of-dns-caching/




[dns-operations] resolver cache question

2020-11-13 Thread Mark Allman

Folks-

I just finished reading a paper that basically tries to figure out
if a hostname is worth caching or not [1].  This isn't the first
paper like this I have read.  This sort of thing strikes me as a
solution in search of a problem.  The basic idea is that there are
lots of hostnames that are automatically generated---for various
reasons---and only ever looked up one time.  Then there is an
argument made that these obviously clog up resolver caches.
Therefore, if we can train a fancy ML classifier well enough to
predict these hostnames are ephemeral and will only be resolved the
once---because they are automatically generated and so have some
tells---then we can save cache space (and effort) by not caching
these.

  - My first reaction to the notion of clogging the cache is always
to think that surely some pretty simple LFU/LRU eviction policy
could handle this pretty readily.  But, that aside...

  - I wonder how much this notion of caches getting clogged up
really happens.  Could anyone help with a clue?  How often do
resolvers evict entries before the TTL expires?  Or, how much
over-provisioning of resolvers happens to accommodate such
records?  I know resolver caching helps [2], but I always feel
like I really know nothing about it when I read papers like
this.  Can folks help?  Or, point me at handy references?

[1] https://www.sciencedirect.com/science/article/abs/pii/S1389128620312627
[2] https://www.icir.org/mallman/pubs/All20b/

Many thanks!

allman


--
https://www.icir.org/mallman/
@mallman_icsi




Re: [dns-operations] root? we don't need no stinkin' root!

2019-12-18 Thread Mark Allman


> Still, I believe that a small resolver instance only needs a few
> DNS queries to root (per TTL), so switching everyone to always
> transferring the whole root should increase the total traffic
> considerably,

An anecdote here ...

I crunched a day's worth of DNS traffic originated at ICSI (which is
pretty much a "small resolver instance") from mid-Oct (which just
happened to be handy).  The entire root zone file would be ~725
full-size TCP packets.  Our two main DNS resolvers together sent
nearly 63K queries to the root nameservers.
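
(The packet count is back-of-the-envelope: assuming a root zone of
roughly 1 MB and ~1,460 bytes of payload per full-size TCP segment,
1,060,000 / 1,460 is about 725 packets; the exact zone size varies a
bit from day to day.)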

I am not arguing either of these is onerous for us.  But, the notion
that snarfing a MB of zone file is somehow a considerable increase
in traffic vs. what we impose on the roots now seems dubious.

allman


Re: [dns-operations] root? we don't need no stinkin' root!

2019-12-18 Thread Mark Allman

>> On 11 Dec 2019, at 12:51, Stephane Bortzmeyer  wrote:
>>
>> IMHO, this is by far the biggest issue with your proposal: TLDs change
>> from one technical operator to another and, when it happens, all name
>> servers change at once.
>
> That’s not correct.
>
> In principle, they could all change at once, In reality, they
> don’t.

I wondered about this.  So, I crunched across our corpus of root
zone files, which spans from Apr 28 2009 to now (I stopped crunching
on Dec 11 2019).  We have one zone file per day (we miss a day here
or there due to glitches, but not many; the corpus is 3,500 days
long).  I found:

  - There are 1,578 TLDs that appear in the root zone file at some
point in the last 10 years.

  - Of those, 1,139 (72.2%) have at least one nameserver (by IP)
that is constant over the entire period the TLD is active.  (I'd
have not guessed it was this high!)

  - For the remaining 439 TLDs, for each day the TLD was active I
calculated how many days into the future it would be until none
of that day's set of nameservers (by IP) was still listed.  For
each TLD I took the minimum value (a sketch of this computation
is below the list).  That shows:

+ 173 TLDs (or 11.0% of all TLDs) at some point have a switch as
  Stephane describes.  I.e., there are no common IP addresses in
  the nameserver set between day X and day X+1.

+ Another 107 TLDs (or 6.8% of all TLDs) had a point where a
  zone file became outdated in [2,7] days.

+ 75 TLDs (or 4.8% of all TLDs) had a point where a zone file
  became outdated in [8,30] days.

+ 84 TLDs (or 5.4% of all TLDs) only ever became outdated after
  more than 30 days.
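
(For clarity, roughly the per-TLD computation, sketched over a
hypothetical list of daily snapshots, each mapping TLD -> set of
nameserver IPs; the data layout is made up, not our actual corpus
format.)

  def min_days_until_ns_set_gone(zones, tld):
      # zones: one dict per day, mapping TLD -> set of nameserver IPs
      worst = None
      for d, z in enumerate(zones):
          if tld not in z:
              continue
          current = z[tld]
          for future in range(d + 1, len(zones)):
              if not (current & zones[future].get(tld, set())):
                  # first future day sharing no IP with day d's NS set
                  horizon = future - d
                  worst = horizon if worst is None else min(worst, horizon)
                  break
      return worst   # None: some of each day's set stays listed throughout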

FWIW.

allman


Re: [dns-operations] root? we don't need no stinkin' root!

2019-12-18 Thread Mark Allman


Hi Stephane!

Thanks for the note.  I have been thinking about this point a bit.

> IMHO, this is by far the biggest issue with your proposal: TLDs
> change from one technical operator to another and, when it
> happens, all name servers change at once. Should your proposal be
> implemented, we would have to debug problems with root zones
> outdated by 1, 2, 3 months and some TLD working for some resolvers
> but not all. (True, we already have resolver-specific issues, but
> I think it would aggravate the problem.)
>
> This would have anti-competitive consequences, discouraging TLD
> holders to swap technical operators.

I get your point.  But, it's predicated on resolvers using a local
zone file that is "outdated by 1, 2, 3 months".  And, I can't really
quite figure out if that is realistic or, if it is, how much we
should care.  A few comments:

  - To be clear, the scenario is that someone has taken the step to
grab a root zone file and use it in the resolution process, but
then (a) has no effective process to update the zone file or
(b) has a process that goes bad at some point without anyone
noticing.  This would without a doubt happen if we shut down the
root nameservers and forced everyone to use local replicas.
But, I am hard pressed to convince myself it'd happen a lot or
that we should care about (/engineer around) such shoddy
operations.

  - Seems like if we took this approach to run without root
nameservers that we'd first design software to update local
replicas in an automated and robust fashion.  In other words,
this isn't something every operator is going to have to piece
together themselves.

  - To the extent that this is an issue, RFC 7706-style local roots
already have it.  So, this is not a new issue---but, the issue
might be bigger if more local roots existed.

  - Finally, I think there is some incentive to stay up-to-date.  We
do see problems when soft state becomes de-facto hard state
because it doesn't change except for once every eon or so.
E.g., the root hints file.  But, since the root zone file does
change pretty constantly (albeit in small ways), there is an
incentive to keep up, it seems to me.

I guess in sum, after some thought I am not ready to buy that this
situation you describe will constitute a big enough phenomenon to
exert anti-competitive pressure on TLD holders.

allman


Re: [dns-operations] root? we don't need no stinkin' root!

2019-12-02 Thread Mark Allman


> For reachability, it is not enough to consider the nameserver IP
> addresses, did you also check DS record stability?

I did not.  I was more interested in understanding how much the
infrastructure churned.  To me the crypto stuff is config that we
can more readily hack.  And, while I didn't scrutinize your long
list, I sort of skimmed it and it seems the changes may not always
hold for a year, but are generally good for more than "a few days".

However, that said, ...

> In any case, it seems likely that having a root zone that is a year
> out of date would be problematic for many TLDs.

My point wasn't to argue that a zone file that is a year out of date
is somehow OK.  Even without the DNSSEC bits, I think not being able
to reach 50 TLDs is not OK.  However, the infrastructure seems to
change slowly.  And, that has some ramifications.  And, we might be
able to leverage those if we wanted to ...

  - E.g., if we wanted to extend the TTL that doesn't seem like it
would be a big problem.

  - E.g., if *in a pinch* we had to use an expired, but not too old,
root zone file to reach a TLD server because we couldn't fetch a
current zone file, that would likely be OK, too.

allman


Re: [dns-operations] root? we don't need no stinkin' root!

2019-12-02 Thread Mark Allman


Hi Florian!

> What's the change rate for the root zone?  If there is a full
> transition of the name server addresses for a zone, how long does
> it typically take from the first change to the completion of the
> sequence of changes?

Not a direct answer to your question, but a couple empirical bits
from the paper that started this thread ...

We analyzed a snapshot of the root zone file from each day in
April, 2019.  On the first of the month the root zone included
1,532 TLDs and one was deleted during the month.  Of the TLDs,
all but five have at least one nameserver (by IP address) that
is constant for the entire month.  That is, if a recursive
resolver used a root zone file that was out of date by one
month, 99.6% of the TLDs would remain accessible.  The five
TLDs that do not have a constant nameserver for the entire month
are run by NeuStar and use a slowly rotating set of IP addresses
for the TLD nameservers.  The overlap ensures that a root zone
file that is no more than 14 days out of date will ensure
constant TLD reachability.  Further, comparing the root zone
files on April 1, 2018 and April 1, 2019 we find that all but
50 TLDs (3.3%) would still retain reachability with a root zone
file that is a year out of date.

Obviously, there could be a more comprehensive analysis, but I think
that gives some idea about how stable the root zone file is in
practice.

allman


Re: [dns-operations] root? we don't need no stinkin' root!

2019-11-26 Thread Mark Allman


Hi Paul!

> The biggest problem I see here is the legacy/long-tail problem. As
> of a few years ago, I bumped into BIND 4 servers still
> active. Wouldn't be shocked to hear they are still being used.
>
> IPv4 reachable traditional DNS servers for some tiny group of
> antique folks will be needed for years, even if we get 99+% of the
> world to some new system.

I wonder if we're ever allowed to just decide this sort of thing is
ridiculous old shit and for lots of reasons we can and should just
garbage collect it away.

> Doesn't mean we shouldn't be thinking about a better way to do it
> for that 99% though.

Is it better if we only get to 99%?

To me, this whole notion is that we can in fact get rid of this
giant network service.  If we don't get rid of it then what is the
incentive to move one's own resolver away from using the root
nameservers?  I don't have any heartburn with RFC 7706.  But, it is
a quite minor optimization in the general case.  It may well be
important in some corner cases, but in general I don't think running
a local root nameserver helps all that much.

Maybe 99% lets us draw down the size of the root infrastructure...I
dunno.  But, if we don't say something like "it's going to go away"
then I am not sure resolvers will move away from it.

allman


--
https://www.icir.org/mallman/
@mallman_icsi


Re: [dns-operations] root? we don't need no stinkin' root!

2019-11-26 Thread Mark Allman


> It would appear a rather large percentage of queries to the root
> (like 50% in some samples) are random strings, between 7 to 15
> characters long, sometimes longer.  I believe this is Chrome-style
> probing to determine if there is NXDOMAIN redirection. A good
> example of the tragedy of the commons, like water pollution and
> climate change.

I will note that there have been quite a number of studies over the
last 20 years that show > 95% of the queries are junk of one kind or
another.  Someone mentioned Duane's nice paper.  But, this
observation started with Brownlee et al.'s 2001 paper.  Point
being, Chrome might cause some of this now, but it was there
long before Chrome started this particular probing.

What's more... in my rudimentary poking of the DITL data [*] it
seems that 25-50% of the "resolvers" that query the root *never*
send a legit query.  I.e., we can't ascribe a lot of this junk to
resolvers that could just work better somehow.

[*] There may be numbers on this sort of thing in the Brownlee,
Wessels, etc. papers ... I just can't recall them off the top of
my head.

allman

--
https://www.icir.org/mallman/
@mallman_icsi


Re: [dns-operations] root? we don't need no stinkin' root!

2019-11-26 Thread Mark Allman


Let me try to get away from what is or is not "big" and ask two
questions.  (These are legit questions to me.  I have studied the
DNS a whole bunch, but I do not operate any non-trivial part of the
DNS and so that viewpoint is valuable to me.)

(1) Setting aside history and how things have been done and why
(which I am happy to stipulate is rational)... At this point,
are there tangible benefits for getting information about the
TLD nameservers to resolvers as needed via a network service?

(2) Are there fundamental problems that would arise in recursive
resolvers if the information about TLD nameservers was no longer
available via a network service, but instead had to come from a
file that was snarfed periodically?

Thanks!

allman


--
https://www.icir.org/mallman/
@mallman_icsi


[dns-operations] root? we don't need no stinkin' root!

2019-11-25 Thread Mark Allman


Left here to be ripped apart ... :-)

  Mark Allman. On Eliminating Root Nameservers from the DNS, ACM
  SIGCOMM Workshop on Hot Topics in Networks (HotNets), November
  2019.
  https://www.icir.org/mallman/pubs/All19b/

  Abstract:
The Domain Name System (DNS) leverages nearly 1K distributed
servers to provide information about the root of the Internet's
namespace. The large size and broad distribution of the root
nameserver infrastructure has a number of benefits, including
providing robustness, low delays to topologically close root
servers and a way to cope with the immense torrent of queries
destined for the root nameservers. While the root nameserver
service operates well, it represents a large community
investment. Due to this large cost, in this paper we take the
position that DNS' root nameservers should be
eliminated. Instead, recursive resolvers should use a local copy
of the root zone file instead of consulting root
nameservers. This paper considers the pros and cons of this
alternate approach.

allman


--
https://www.icir.org/mallman/
@mallman_icsi


Re: [dns-operations] resolvers considered harmful

2014-10-23 Thread Mark Allman

> simply on their own moves the entire query load of all endpoints
> (billions) onto the authoritative nameservers only. Do you really
> propose a billion clients should perform lookups against my 3 poor
> nameservers for nohats.ca.?

Well ...

  - Not all of those billions of clients are interested in nohats.ca,
so trying to compare billions of clients against your three
nameservers is a red herring.

  - I don't know what your load is, but do you have any idea how much
your load would increase if shared resolvers did not shield you from
some of it?  We quantify this a little in our paper (for .com).  We
should use numbers to talk about these things instead of just waving
our hands at some bogeyman.

  - And, I'd spin this around on you ... You clearly care about your 3
poor nameservers.  That is natural and rational.  But, why do you
think it is someone else's job to run a cache to shield you from
load?  Why should we at ICSI run a shared resolver for your benefit?
If we get benefit and it happens to help you, too, great.  But, I
can tell you that we certainly don't factor your load into our
considerations of how to run our infrastructure.

> Suggesting to dismantle the largest distributed database in the world
> and thinking you can get away with it is a very ill thought plan not
> rooted in reality.

Well, ...

  - We root our argument in some empiricism, anyway.  That is more
than you're doing.  One can always get more data and more vantage
points, but at least let's not pretend we just waved our hands here,
please.

  - We are not talking about dismantling the distributed database.  We
are talking about eliminating optional caches from the system.  The
actual database embodied in the auth servers remains untouched.  I
have had many conversations with people over the last year about
this idea and I always find it sort of interesting that resolvers
are viewed as a required component of the system and that it so much
blows people's minds that the system could or should work without
them. 

(Yet, web caches---which one can view as pretty analogous---do not
seem to rise to that level.  I.e., folks seem to view them for what
they are---perhaps a helping hand---and not as some crucial
component of the system.)

  - Doing what we have always done because we have always done it and
not thinking about the implications of that seems like a lousy plan,
too, BTW.

allman






Re: [dns-operations] resolvers considered harmful

2014-10-23 Thread Mark Allman

> The biggest problem I have with this paper is of terminology.

No- I don't want every app to build in a resolver.  Madness!

Think of it as a change under-the-hood to gethostbyname().  Same
interface to the applications.  But, underneath it doesn't go query
whatever is in /etc/resolv.conf, but rather just walks the tree itself
(to the extent needed, based on the cache).
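
(Roughly this sort of thing---a toy, purely local walk of the tree
using dnspython; the hard-coded root server address and the lack of
caching, retries, CNAME chasing, and DNSSEC validation make it a
sketch, not a real stub.)

  import dns.message
  import dns.query
  import dns.rdatatype

  def iterative_lookup(name, server="198.41.0.4"):   # a.root-servers.net
      while True:
          q = dns.message.make_query(name, dns.rdatatype.A)
          resp = dns.query.udp(q, server, timeout=3)
          if resp.answer:
              # done: return whatever A records came back
              return [r.address for rrset in resp.answer
                      for r in rrset if r.rdtype == dns.rdatatype.A]
          # otherwise follow the referral via any IPv4 glue we were handed
          glue = [r.address for rrset in resp.additional
                  for r in rrset if r.rdtype == dns.rdatatype.A]
          if not glue:
              raise RuntimeError("referral without IPv4 glue; toy gives up")
          server = glue[0]

  # e.g., iterative_lookup("www.cnn.com")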

> Then, when it comes to privacy (the biggest problem with your
> proposal), I strongly disagree with the way you get rid of the
> problems by saying "we note that many users are willing to use open
> shared resolvers (e.g., Google DNS) and are therefore comfortable with
> directly attributable DNS requests arriving at a large third-party
> network".  This is propaganda, not science. Users use Google Public DNS
> because their ISP's resolver is broken or slow, or because the ISP
> censors http://www.bortzmeyer.org/dns-routing-hijack-turkey.html or
> because the IP address is cool or simply because they feel that it's
> Google so it must be nice. They never perform an assessment of the
> public resolver privacy policy and practices, and they certainly don't
> analyze the tradeoffs. Most users (even most IT professionals) have no
> idea of the complex privacy issues associated with DNS.

I understand you have probably thought this through more than I have.
But, I have a couple of views here in addition to the above ...

  - Ultimately you're going to take the results of a DNS transaction and
turn around and hit the given service with an application.  So,
while I may have been some nebulous "someone at ICSI" during the
name lookup, once I make the TCP connection I am not so anonymous
anymore.

That does not apply to all cases, of course.  I.e., I ask Verisign
for google.com and then I TCP to Google and not Verisign.  So, in
this case I could remain "someone at ICSI" to Verisign if I used the
shared resolver.

  - I think a rational way to look at this is the way we look at privacy
more generally.  If you communicate with someone then they'll know
your IP.  If you don't want that, take some explicit step to prevent
it (e.g., use Tor).  We get an obfuscation from shared resolvers
now, but is that enough of a reason to keep them around?

allman






Re: [dns-operations] resolvers considered harmful

2014-10-23 Thread Mark Allman

> cache hit rate is about 80%-90% for those caching you think can be
> removed. Note that this cache hit rate is heavilly skewed because of
> the facebook one time uncachable hostnames they were using at the
> time. If you also include the fact that these caches were feeding
> other caches, you will see the enormous amount of queries you are
> suggesting to unleash on authoritative nameservers on the internet.

Right... Our cache hit rate is somewhere in the two-thirds area.  At
least the last time I looked.  (We have some notions from a couple years
ago in http://www.icir.org/mallman/pubs/CAR13/ .)  But, one important
bit here is that while this does increase the load, the load is
distributed.  So, it isn't like we're landing all the load on one given
point in the network.  And, it is distributed proportionally to the
popularity of the underlying services (which intuitively seems about
right to me).

>>   - And, I'd spin this around on you ... You clearly care about your 3
>> poor nameservers.  That is natural and rational.  But, why do you
>> think it is someone else's job to run a cache to shield you from
>> load?
>
> If you really believe the mode of the internet should be that the
> weakest device should be able to deal with the largest volume load the
> world can throw at it, there is not much point discussing this
> further.  I'm just happy that people like Van Jacobson designed the
> internet.
>
>> Why should we at ICSI run a shared resolver for your benefit?
>
> Because thousands of ISP run caches for your servers' benefit.
> Using your reasoning, we should drop all the exponential backoff
> in our TCP/IP protocols. You'll just have to deal with the load,
> and if you get blasted off the net it's clear your fault for being
> underpowered.

I think this is a really bad analogy.  I do happen to know something
about congestion control.  Maybe even two things!

Congestion control is a shared set of algorithms / strategies for
dealing with the case when some shared piece of infrastructure is
over-committed.  For instance, a link in the middle of the net.  It
isn't any one person's/host's fault that it is over-capacity.  So, we
agree on a set of techniques that we can all use to reasonably backoff
and share the link so nobody starves.

We do not point at the owner of the link and say "hey, we're trying to
use more capacity than you have so buy some more capacity!"  I.e., we
don't impose on someone else to add resources on our behalf.  Rather, we
decide to all play nice and so we all get something done (even if slower
than we'd really like).  Congestion control is about coping with a
less-than-ideal shared reality.

But, this is not at all like your nameserver.  Your nameserver is not
shared infrastructure that zillions of disparate people all happen to
(over) use.  It is infrastructure that works on your behalf to serve
names that you want served.  If it can't handle the load then there is a
clear culprit: you.  I.e., your popularity has outgrown your resources.
Why should that be my problem?  We don't have Google telling us to all
fire up an institutional HTTP cache on our networks because it has run
out of capacity and it's our problem to fix.

Finally, it isn't that I believe "the weakest device should be able to
deal with the largest volume load the world can throw at it".  Rather, I
believe if someone is providing a service, then they should be
responsible for provisioning for the load that service incurs (or
dealing with the suboptimal performance).  I wouldn't have thought that
a controversial notion.

allman






Re: [dns-operations] resolvers considered harmful

2014-10-23 Thread Mark Allman

> There is no relationship between the data and the conclusion. Having a
> short TTL is not because you make changes often, it's because, when
> you decide to make a change, you want it to be effective rapidly. The
> actual number of changes does not matter, what matter are the
> expectations of users ("sorry, buddy, we made the change immediately
> but it will not be seen by all caches before one week").

I think that is totally fair.  But, two things ...

  - The TLDs are a little weird in that they are trying to control for
their load and yet serving someone else's names.  So, yeah, there is
this sort of mismatch where the SLDs would like shorter TTLs because
they want the flexibility and don't pay the serving price.
Meanwhile, the TLDs don't directly care about the flexibility and so
they optimize for load shedding.  So, um, yeah ...

  - But, inside google.com this is all much more straightforward.  I.e.,
they can be as flexible as they want to provision for.

allman






Re: [dns-operations] resolvers considered harmful

2014-10-23 Thread Mark Allman

>>   - As noted in the paper 93% of the zones see no increase in our
>> trace-driven simulations.  That is, they are accessed by at most one
>> end host per TTL and therefore see no benefit from the shared cache
>> and hence will see the same load regardless of whether it is an end
>> host or a shared resolver asking the questions.
>
> How does this compare to resolvers with one or two (or four) orders of
> magnitude more clients behind them?

Presumably pretty well.  I only know of old results here, but Jung's
IMW 2001 paper suggests that the cache hit rate levels off after 10-20
users.  I have in mind that there is a more recent (but not last year)
validation of this, but I don't have a reference at my fingertips.

(And, that follows my intuition, BTW.  I.e., that most of the cache hit
rate comes from popular names and you don't need many users to coax
those popular names into the cache.)

>>   - There is also a philosophical-yet-practical argument here.  That is,
>> if I want to bypass all the shared resolver junk between my
>> laptop and the auth servers I can do that now.  And, it seems to
>> me that even given all the arguments against bypassing a shared
>> resolver that should be viewed as at least a rational choice.
>> So, in this case the auth zones just have to cope with what shows
>> up.  So, do we believe that it is incumbent upon (say) AT&T to
>> provide shared resolvers to shield (say) Google from a portion of
>> the DNS load?
>
> It doesn’t look to me like your paper has done anything to capture
> what it looks like behind AT&T’s resolvers, so I’m not sure how you
> can come to that sort of conclusion.

Correct.  This is a thought experiment with exemplars that I gave names
to. 

allman






[dns-operations] resolvers considered harmful

2014-10-22 Thread Mark Allman

Short paper / crazy idea for your amusement ...

Kyle Schomp, Mark Allman, Michael Rabinovich.  DNS Resolvers Considered
Harmful, ACM SIGCOMM Workshop on Hot Topics in Networks (HotNets),
October 2014.  To appear.
http://www.icir.org/mallman/pubs/SAR14/

Abstract:
  The Domain Name System (DNS) is a critical component of the Internet
  infrastructure that has many security vulnerabilities.  In particular,
  shared DNS resolvers are a notorious security weak spot in the system.
  We propose an unorthodox approach for tackling vulnerabilities in
  shared DNS resolvers: removing shared DNS resolvers entirely and
  leaving recursive resolution to the clients.  We show that the two
  primary costs of this approach---loss of performance and an increase
  in system load---are modest and therefore conclude that this approach
  is beneficial for strengthening the DNS by reducing the attack
  surface.

Comments welcome.

allman


--
http://www.icir.org/mallman/






Re: [dns-operations] resolvers considered harmful

2014-10-22 Thread Mark Allman

Let me try to tackle a few things before heading off to my next meeting
to conjure other stupid ideas! :-)

Florian Weimer f...@deneb.enyo.de:
> This is a bit over the top.  I've suggested multiple times that one
> possible way to make DNS cache poisoning less attractive is to cache
> only records which are stable over multiple upstream responses, and
> limit the time-to-live not just in seconds, but also in client
> responses.  Expiry in terms of client responses does not cause a cache
> expiration, but a new upstream query once the record is needed again.
> If the new response matches what is currently in the cache, double
> the new client response time-to-live count from the previous starting
> value.  If not, start again at the default low value (perhaps even 1).

Sure... this may well be a fine enough idea.  Lots of such point
solutions are perfectly fine ideas.  Our point is to step back and
wonder whether these resolvers (whether ISP-level, institution-level,
CPE, whatever) are really a large enough benefit compared against the
cost of the attack surface.  And, the paper shows---at least in an
initial fashion---that the benefits might not be all that great.
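
(For concreteness, a rough sketch of how I read that counting scheme;
the class and the starting/doubling values below are made up, not
Florian's actual numbers.)

  class CountedEntry:
      def __init__(self, rrset, start_budget=1):
          self.rrset = rrset
          self.start_budget = start_budget   # client responses per cycle
          self.remaining = start_budget

      def serve(self, refetch):
          # Serve from cache while the response budget lasts; once it is
          # spent, re-query upstream.  A matching answer doubles the
          # budget, a changed answer resets it to the low starting value.
          if self.remaining > 0:
              self.remaining -= 1
              return self.rrset
          fresh = refetch()
          if fresh == self.rrset:
              self.start_budget *= 2
          else:
              self.rrset = fresh
              self.start_budget = 1
          self.remaining = self.start_budget - 1
          return self.rrset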

Andrew Sullivan a...@anvilwalrusden.com:
> There's a third cost here, and that is a large increase in costs to
> authoritative server operators.
>
> That might be worth trading off, but it won't do to pretend that isn't
> a cost that's incurred.

I absolutely agree.  Please read section 5 which addresses exactly this
question.  We use .com as an example of a popular authoritative domain
in this work.

Frank Sweetser f...@wpi.edu:
> We make pretty heavy use of RPZ to block outbound malware traffic,
> especially to prevent people from inadvertently browsing malicious web
> sites.

Yup, this would be a cost of getting rid of resolvers, I agree.  We
mention policy implementation in the paper, but don't say a lot about
it.  My view here is that you could certainly still do exactly this by
funneling traffic through a resolver.  That might be the right tradeoff
for you.  But, the right default may well be to let clients do lookups
themselves and let this situation happen as the exception in places
where folks want to implement such policy.  In other words, just because
there may well be times we want to use an intermediate resolver for good
and valid reasons does not mean that running zillions of such boxes all
over the place (as we do now) is the right general approach.

Matthew Pounsett m...@conundrum.com:
> The paper also appears to make the assumption that eliminating
> existing resolvers is a thing we can do.  Open recursive resolvers
> won’t go away simply because we, as an industry, decide to stop
> setting up new ones.  There’s no way to prevent them from sending
> queries (or to selectively block them), and they are almost by
> definition unmanaged, so we cannot expect they will be taken offline
> by their respective administrators.

Sure.  I agree with this.  But, if we make clients default to not using
resolvers then the harm resolvers can do is reduced.  I.e., so what if I
can cache poison a CPE if none of the clients behind it utilize the CPE
for lookups?  Of course, if the CPE is open then we still have
reflection/amplification problems.  But, if nobody is using the DNS
forwarders on these things then maybe they eventually go away.  We are
not under the illusion that one can wave a magic wand and get rid of
these things.  But, individual clients can get benefits from ignoring
them.  And, if they are generally not found to be useful anymore then
maybe they start going away.

Thanks folks!

allman






Re: [dns-operations] resolvers considered harmful

2014-10-22 Thread Mark Allman

>> Why not just turn on DNSSEC?
>
> Important zones are still unsigned, so I can understand why there is a
> desire for alternative solutions.

Right.  It isn't like we are lacking for ways to solve the problems we
know about.  E.g., we know how to mitigate the Kaminsky attack.  But,
yet, still there are plenty of vulnerable resolvers (per our PAM paper
from this past spring).  E.g., we know how to secure DNS records with
crypto.  But, yet, broadly speaking we don't do it.  So, perhaps we need
to re-think things.

allman






Re: [dns-operations] resolvers considered harmful

2014-10-22 Thread Mark Allman

Let me try to take care of both of these related points together:

Joe Greco jgr...@ns.sol.net:
> Then we merely move on to the issue of cache poisoning individual
> clients.
>
> Assuming that the CPE is a NAT (effectively firewalling clients from
> poisoning attacks) and/or that the individual clients have well-
> designed, impervious resolvers is likely to be a fail.

David Conrad d...@virtualized.org:
> As I understand it, you're proposing pushing the resolvers out to the
> edges

That is not what we are proposing.  We are not suggesting resolvers be
*moved*, but rather *removed*.  That is, clients simply do name lookup
on their own.

Name lookup at an endpoint is different from name lookup in an
intermediate resolver.  

An intermediate resolver looks up a name on behalf of other hosts.  It
therefore *must* listen for lookup requests that roll in from the
network.  This is fundamental to a resolver's operation---it simply
*must* accept requests from other hosts.  Don't get me wrong: it
doesn't have to accept all requests and, as we know, too many resolvers
accept requests they should not.  All I am saying is that the resolver
cannot do its job without accepting requests from other hosts.

On the other hand, an endpoint can look up a name without listening for
any request from the network.  We suggest this be an entirely local
operation.  Think of it like this: just because I want to load the
cnn.com web page I don't have to run httpd.  Well, just because I want
to look up an A record for cnn.com doesn't mean I have to run bind.

Could there be attacks against the internal lookup process on a host?
Of course.  But, those are attacks that require some sort of access to
the end host first.

David Conrad d...@virtualized.org:
> if you're not doing DNSSEC at the edges,

Let me be clear: I am not arguing against DNSSEC.  A crypto-signed
record is always better than a cleartext record.  But, DNSSEC is still
not here and it seems to me that factoring out some of the
intermediaries that we know sometimes both play games and have games
played on them may well be a useful path.

allman






Re: [dns-operations] resolvers considered harmful

2014-10-22 Thread Mark Allman

> Their model doesn't make it a large increase, and I think that's
> because of their unrealistic assumptions about actual use.

I am not sure I follow this one.  We used a workload derived from actual
use to assess how much the load on .com would increase in a world where
each endpoint was doing its own name lookups.

> The problem is that the really popular domains on the Internet (Google
> is one example they do discuss) have completely different patterns
> than everyone else.  The mitigation technique of increasing TTLs
> imposes a cost on the operator of the popular domain (a cost in terms
> of flexibility).  The authors seem to think this is no large cost, and
> I disagree.

The paper quantifies this cost for .com.  We find that something like 1%
of the records change each week.  So, while increasing the TTL from the
current two days to one week certainly sacrifices some possible
flexibility, in practical terms the flexibility isn't being used.

> It's true that the increase in your numbers doesn't seem to be as bad
> as the theoretical maximum, but 2.5 or 3 times several million is
> still a large number.

Right... But, let me make a few points:

  - As noted in the paper 93% of the zones see no increase in our
trace-driven simulations.  That is, they are accessed by at most one
end host per TTL and therefore see no benefit from the shared cache
and hence will see the same load regardless of whether it is an end
host or a shared resolver asking the questions.

  - The point is that the increase rates above don't seem to me to be
disqualifying.  I.e., there is some increase rate (e.g., on the
order of the number of users) that would I think be a show stopper.
But, that doesn't seem to be what we're looking at here.

  - Or, put differently ... We are not pretending that there is no
additional cost at some auth servers.  But, this additional cost
does buy us things.  So, it is simply a different tradeoff than we
are making now.

  - As for popular content providers, as you note, their TTLs are much
lower.  So, while the NS for facebook.com is going to be used by
nearly all the clients behind some shared resolver over its two-day
TTL, the A for star.c10r.facebook.com lasts less than a minute and
so a cached version is only going to shield Facebook from so much of
the load.

I haven't run the numbers, but my hunch is that the increase
rates for facebook.com, google.com, etc. will be smaller than for the
.com results we have in the paper because the cachability of these
records is smaller.

  - There is also a philosophical-yet-practical argument here.  That is,
if I want to bypass all the shared resolver junk between my laptop
and the auth servers I can do that now.  And, it seems to me that
even given all the arguments against bypassing a shared resolver
that should be viewed as at least a rational choice.  So, in this
case the auth zones just have to cope with what shows up.  So, do we
believe that it is incumbent upon (say) AT&T to provide shared
resolvers to shield (say) Google from a portion of the DNS load?
Or, put differently, the results in the paper suggest that there
really isn't much for AT&T to gain from providing those resolvers,
so why should it?  One argument here could be that AT&T is trying to
provide its customers better performance.  But, the paper shows this
is really not happening (which is largely a function of pervasive
DNS prefetching).  So, if I am AT&T I'd be thinking "hey, what am I
or my customers actually gaining from this complexity I have in my
network?!"  And, if the answer is little-to-nothing then it seems
rational to not provide this service.  Or, so it seems to me.

allman






Re: [dns-operations] resolvers considered harmful

2014-10-22 Thread Mark Allman

It is a terminology issue, I think.

> Perhaps I'm being unclear and/or we're having a terminology
> mismatch. To be concrete: are you suggesting that (a) every
> application on an 'endpoint' provide its own iterative resolution, (b)
> the 'end point' effectively runs an iterative caching resolver at
> 127.0.0.1/::1, or (c) something else?

(c)

Or, all of them!

The implementation does not matter to me.  An app could just run "dig
+short" for all I care.  My point is that this is an entirely internal
matter to the host.  Unlike the case of a shared resolver there is no
requirement that it accept any lookup request from outside the box.

>> All I am saying is that the resolver cannot do its job without
>> accepting requests from other hosts.
>
> As a person who frequently runs unbound listening only to 127.0.0.1 on
> my laptop, we may have differing opinions of the scope of the job of a
> resolver.

I should---as I hope we do in the paper---be careful and use the term
'shared resolver' for something outside the host itself.

> My point was that to definitively fix resolver-to-authoritative,
> you're going to need something like DNSSEC.

Yes- absolutely.  E.g., just because a client does the name lookup
itself instead of handing it to a shared resolver isn't going to make
the Great Firewall any less likely to forge a response.  So, I don't
mean to in any way say DNSSEC isn't useful.

allman






[dns-operations] new DNS forwarder vulnerability

2014-03-14 Thread Mark Allman

Just a quick note to let folks know about a new vulnerability we have
found in some low-rent DNS forwarders---which we have been calling the
'preplay attack'.

The finding is that when the vulnerable open resolvers receive a DNS
response they just look at the query string in the response to see if
they have a request for the given string outstanding.  If they do, they
accept the result.  I.e., there is no validation of the source IP, port
numbers, or DNS transaction ID in the response.  Dumb.  This makes
poisoning the caches of these boxes trivial (i.e., send a request for
www.facebook.com and then immediately send an answer).
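
(To make the missing check concrete, here is a sketch of the matching
a forwarder ought to do before accepting a response; the field names
are hypothetical, not from the paper.  The vulnerable boxes
effectively keep only the first comparison.)

  from dataclasses import dataclass

  @dataclass
  class PendingQuery:
      qname: str        # question we asked
      server_ip: str    # where we sent the query
      src_port: int     # our ephemeral source port
      txid: int         # DNS transaction ID we used

  def should_accept(p: PendingQuery, qname, src_ip, dst_port, txid):
      # Full check; skipping the last three comparisons is what makes
      # the preplay attack work.
      return (qname == p.qname
              and src_ip == p.server_ip
              and dst_port == p.src_port
              and txid == p.txid)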

A few notes ...

  - We have found 7--9% of the open resolver population---or 2-3 million
boxes---to be vulnerable to this cache poisoning attack.  (The
variance is from different runs of our experiments.)

  - We have not been able to nail this vulnerability down to a single
box or manufacturer.  To the contrary, our efforts at identifying the
boxes indicate it crosses such boundaries.  (However, these boxes
do seem to be largely situated in residential settings.)

  - We presented these results at PAM earlier this week.  Our paper,
slides, etc. with details of the attack (and results about
previously known DNS attacks) are available here:

  http://www.icir.org/mallman/pubs/SCRA14/

  - We did give CERT a heads up about this before the paper appeared and
they kibitzed the information around to various manufacturers of
this sort of gear.

My mental model is that this sort of gear is upgraded when it goes
kaput.  So, vigilance I guess.

FWIW.

allman


--
http://www.icir.org/mallman/




