Re: [dns-operations] Planning the 2021 DITL collection
> OARC is beginning planning for the 2021 Day in the Life (DITL)
> collection.

As a researcher, the DITL collection is a fantastic resource. I appreciate all the hard work. That said, as I have used or tried to use the data over the years I have been bitten by the lack of meta-data. I would encourage folks to document a few simple things as the data is collected. In particular:

- It is often crucial to know what is missing from a dataset, if possible (it isn't always). So, e.g., if there are 10 replicas of x-root and data only comes from 7 of them that is good to scribble down. And, which are missing and where they are located would also be nice to know.

- Similarly, if you have some indication of the measurement-based packet loss rate please also scribble that down. That isn't packets lost in the middle of the network somewhere, but packets that were not recorded by the measurement infrastructure. Tcpdump or the like spit out their own (incorrect, but sometimes better than nothing) notion of this and recording that would be handy.

- If the packets in the traces have been changed in any way from what was on the wire, it'd be great to know. The crucial one here is whether the IP addresses have been anonymized. And, if so, are they being uniformly anonymized across all the traces / locations you submit? Or, is it random per trace / DNS server / what?

- If there is something strange going on that might impact how folks interpret the data, please scribble it down. Even really benign things like "the disk filled and so there is an hour-long gap" are handy to know, because when we see this gap we can readily decide it wasn't network-related.

- Add some easily accessible contact information if you wouldn't mind. Sometimes we could use some help in figuring out puzzles in the data. I know sometimes folks don't want to be interrupted to help ... and OK. But, if you wouldn't mind, we'd for sure appreciate it.

I am not suggesting some formal document or something. Scribble in a text file that can be left with the data. Anything is better than the current state. (An illustrative example appears at the end of this message.)

Many thanks!
allman
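For illustration only, here is a sketch of the kind of notes file I have in mind -- every name and value below is made up:

    collection: DITL 2021, x-root
    contact: Jane Operator <noc@example.net> (ok to email with questions)
    replicas: 10 total; traces cover 7 -- missing LAX-2 (disk failure),
      FRA-1 and SYD-1 (not instrumented)
    capture-loss: tcpdump reported 0.3% kernel drops (likely an underestimate)
    anonymization: client IPs prefix-preserving, same key across all our traces
    known-oddities: 2021-04-13 02:10-03:05 UTC gap (disk full)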
Re: [dns-operations] resolver cache question
>> [1] https://www.sciencedirect.com/science/article/abs/pii/S1389128620312627
>
> This is a network security paper, not a systems engineering
> paper. The authors are primarily concerned with the storage
> requirements of systems that store DNS data well beyond the
> nominal TTL of the RRsets, which they want to reduce by filtering
> out "disposable" records.

Sure. There are multiple goals. The logging goal is a little weird as motivation to my eye. In any case, there are other papers, too.

> As far as detecting whether "clogging" of DNS resolver caches
> occurs, it probably would make sense to have resolvers increment a
> "premature expiration" counter when they expire cache entries
> prior to the scheduled TTD of the cached RRset or cached
> message. This would allow operators to provide feedback to
> resolver developers as to whether this behavior is actually
> occurring in the real world. (Looking at the unbound-control
> manpage I don't think unbound has such a metric, not to pick on
> unbound.)

I was sort of hoping this kind of thing was logged and folks could give me a clue with actual data. Seems like a natural thing to log. But, I guess not. Or, maybe folks are not willing to share.

Thanks!
allman
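For concreteness, here is a toy sketch of the counter being suggested -- not any real resolver's code, just an LRU cache that notices when it evicts an entry that is still within its TTL (only the insert path is shown):

    import time
    from collections import OrderedDict

    class Cache:
        """Toy LRU cache that counts evictions occurring before the entry's TTD."""
        def __init__(self, capacity):
            self.capacity = capacity
            self.entries = OrderedDict()     # name -> (rrset, ttd)
            self.premature_expirations = 0   # the metric in question

        def insert(self, name, rrset, ttl):
            self.entries[name] = (rrset, time.time() + ttl)
            self.entries.move_to_end(name)
            while len(self.entries) > self.capacity:
                _, (_, ttd) = self.entries.popitem(last=False)  # evict LRU entry
                if ttd > time.time():               # evicted while still fresh:
                    self.premature_expirations += 1 # cache pressure, not TTL expiry

A non-zero premature_expirations rate over time is exactly the "clogging" signal being asked about.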
Re: [dns-operations] resolver cache question
> One could use a Bloom filter to avoid caching (most) lookup
> results that are encountered just once. Or start out with an
> artificially lowered TTL combined with prefetching.

I am not sure what you mean. If a given lookup isn't in a bloom filter, add it to the bloom filter, and if it is, cache it? I dunno ... do we need that? That's sort of the question. Do caches end up evicting useful records because of these one-time records? Or, do current cache sizes and eviction policies just basically handle this without a lot of complication? I really have no idea.

> The authoritative server operators know that their zones are used
> for what is essentially an RPC. Don't they set a zero TTL?

No. The first paper I cited says in their dataset the one-time lookups have TTLs on the order of 600sec (on median or something I can't quite recall). So, non-trivial.

> So maybe this is about ignoring low TTLs to increase cache hit
> rates. Unfortunately this report is behind a paywall, and I'm not
> going to spend $35.95 to find out if the authors have considered
> Bloom filters and the rest.

Yeah- annoying as hell. The particulars are different, but [3] has the same sort of thrust. I.e., infer which lookups will only be used once and don't cache them. I should have linked to this, as well, in the first message. The investigation is fine as far as it goes. But, it makes some assumptions about the size and operation of the cache and those are the aspects I'd like to better understand from folks who run real resolvers.

allman

[3] https://ccronline.sigcomm.org/2017/exploring-domain-name-based-features-on-the-effectiveness-of-dns-caching/
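If it helps pin down the scheme being discussed, here is a toy sketch of Bloom-filter cache admission: only insert a record into the cache on its second sighting, so one-time names never occupy cache space. The sizing and the `cache.insert()` interface are illustrative assumptions, not anything from the thread:

    import hashlib

    class BloomFilter:
        """Tiny Bloom filter; m bits, k hashes. Purely illustrative sizing."""
        def __init__(self, m=1 << 20, k=4):
            self.m, self.k, self.bits = m, k, bytearray(m // 8)

        def _positions(self, key):
            for i in range(self.k):
                h = hashlib.sha256(f"{i}:{key}".encode()).digest()
                yield int.from_bytes(h[:8], "big") % self.m

        def add(self, key):
            for p in self._positions(key):
                self.bits[p // 8] |= 1 << (p % 8)

        def __contains__(self, key):
            return all(self.bits[p // 8] & (1 << (p % 8))
                       for p in self._positions(key))

    seen_once = BloomFilter()

    def maybe_cache(cache, name, rrset, ttl):
        # First sighting: remember it in the filter, but spend no cache space.
        # Second sighting: evidently not a one-time name, so cache it.
        if name in seen_once:
            cache.insert(name, rrset, ttl)
        else:
            seen_once.add(name)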
[dns-operations] resolver cache question
Folks-

I just finished reading a paper that basically tries to figure out if a hostname is worth caching or not [1]. This isn't the first paper like this I have read. This sort of thing strikes me as a solution in search of a problem.

The basic idea is that there are lots of hostnames that are automatically generated---for various reasons---and only ever looked up one time. Then there is an argument made that these obviously clog up resolver caches. Therefore, if we can train a fancy ML classifier well enough to predict these hostnames are ephemeral and will only be resolved the once---because they are automatically generated and so have some tells---then we can save cache space (and effort) by not caching these.

- My first reaction to the notion of clogging the cache is always to think that surely some pretty simple LFU/LRU eviction policy could handle this pretty readily. But, that aside...

- I wonder how much this notion of caches getting clogged up really happens. Could anyone help with a clue? How often do resolvers evict entries before the TTL expires? Or, how much over-provisioning of resolvers happens to accommodate such records?

I know resolver caching helps [2], but I always feel like I really know nothing about it when I read papers like this. Can folks help? Or, point me at handy references?

[1] https://www.sciencedirect.com/science/article/abs/pii/S1389128620312627
[2] https://www.icir.org/mallman/pubs/All20b/

Many thanks!
allman

--
https://www.icir.org/mallman/
@mallman_icsi
Re: [dns-operations] root? we don't need no stinkin' root!
> Still, I believe that a small resolver instance only needs a few
> DNS queries to root (per TTL), so switching everyone to always
> transferring the whole root should increase the total traffic
> considerably,

An anecdote here ... I crunched a day's worth of DNS traffic originated at ICSI (which is pretty much a "small resolver instance") from mid-Oct (which just happened to be handy). The entire root zone file would be ~725 full-size TCP packets. Our two main DNS resolvers together sent nearly 63K queries to the root nameservers.

I am not arguing either of these is onerous for us. But, the notion that snarfing a MB of zone file is somehow a considerable increase in traffic vs. what we impose on the roots now seems dubious.

allman
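(For the back-of-the-envelope behind the ~725 figure: assuming a root zone file of roughly 1 MB and ~1,448 bytes of payload per full-size TCP segment, 1,048,576 / 1,448 ≈ 724 segments. The exact count obviously shifts with the zone size and MSS on a given day.)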
Re: [dns-operations] root? we don't need no stinkin' root!
>> On 11 Dec 2019, at 12:51, Stephane Bortzmeyer wrote:
>>
>> IMHO, this is by far the biggest issue with your proposal: TLDs change
>> from one technical operator to another and, when it happens, all name
>> servers change at once.
>
> That's not correct.
>
> In principle, they could all change at once. In reality, they
> don't.

I wondered about this. So, I crunched across our corpus of root zone files, which spans from Apr 28 2009 to now (I stopped crunching on Dec 11 2019). We have one zone file per day (we miss a day here or there due to glitches, but not many; the corpus is 3,500 days long). I found:

- There are 1,578 TLDs that appear in the root zone file at some point in the last 10 years.

- Of those, 1,139 (72.2%) have at least one nameserver (by IP) that is constant over the entire period the TLD is active. (I'd have not guessed it was this high!)

- For the remaining 439 TLDs, for each day the TLD was active I calculated how many days into the future it would be until no nameserver from the current set (by IP) was still listed. For each TLD I took the minimum value. (A sketch of this calculation appears at the end of this message.) That shows:

  + 173 TLDs (or 11.0% of all TLDs) at some point have a switch as Stephane describes. I.e., there are no common IP addresses in the nameserver set between day X and day X+1.

  + Another 107 TLDs (or 6.8% of all TLDs) had a point where a zone file became outdated in [2,7] days.

  + 75 TLDs (or 4.8% of all TLDs) had a point where a zone file became outdated in [8,30] days.

  + 84 TLDs (or 5.4% of all TLDs) only ever became outdated after more than 30 days.

FWIW.
allman
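In case the methodology above is unclear, here is a sketch of the per-TLD calculation. The data structure is an assumption for illustration: one set of nameserver IPs per day the TLD was active.

    def min_days_until_outdated(ns_sets):
        """ns_sets: list (one entry per active day) of sets of NS IPs.
        Returns the minimum, over all days, of how many days pass before
        no IP from that day's NS set is still listed (None if never)."""
        horizons = []
        for day, current in enumerate(ns_sets):
            for future in range(day + 1, len(ns_sets)):
                if not (current & ns_sets[future]):   # no overlap left
                    horizons.append(future - day)
                    break
        return min(horizons) if horizons else None

A result of 1 corresponds to the "all nameservers change at once" case Stephane describes.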
Re: [dns-operations] root? we don't need no stinkin' root!
Hi Stephane!

Thanks for the note. I have been thinking about this point a bit.

> IMHO, this is by far the biggest issue with your proposal: TLDs
> change from one technical operator to another and, when it
> happens, all name servers change at once. Should your proposal be
> implemented, we would have to debug problems with root zones
> outdated by 1, 2, 3 months and some TLD working for some resolvers
> but not all. (True, we already have resolver-specific issues, but
> I think it would aggravate the problem.)
>
> This would have anti-competitive consequences, discouraging TLD
> holders to swap technical operators.

I get your point. But, it's predicated on resolvers using a local zone file that is "outdated by 1, 2, 3 months". And, I can't really quite figure out if that is realistic or, if it is, how much we should care. A few comments:

- To be clear, the scenario is that someone has taken the step to grab a root zone file and use it in the resolution process, but then (a) has no effective process to update the zone file or (b) has a process that goes bad at some point without anyone noticing. This would without a doubt happen if we shut down the root nameservers and forced everyone to use local replicas. But, I am hard pressed to convince myself it'd happen a lot or that we should care about (/engineer around) such shoddy operations.

- Seems like if we took this approach to running without root nameservers we'd first design software to update local replicas in an automated and robust fashion. In other words, this isn't something every operator is going to have to piece together themselves.

- To the extent that this is an issue, RFC 7706-style local roots already have it. So, this is not a new issue---but, the issue might be bigger if more local roots existed.

- Finally, I think there is some incentive to stay up-to-date. We do see problems when soft state becomes de-facto hard state because it doesn't change except for once every eon or so. E.g., the root hints file. But, since the root zone file does change pretty constantly (albeit in small ways), there is an incentive to keep up, it seems to me.

I guess in sum, after some thought I am not ready to buy that the situation you describe will constitute a big enough phenomenon to exert anti-competitive pressure on TLD holders.

allman
Re: [dns-operations] root? we don't need no stinkin' root!
> For reachability, it is not enough to consider the nameserver IP
> addresses; did you also check DS record stability?

I did not. I was more interested in understanding how much the infrastructure churned. To me the crypto stuff is config that we can more readily hack. And, while I didn't scrutinize your long list, I sort of skimmed it and it seems the records may not always hold for a year, but do generally last more than "a few days". However, that said, ...

> In any case, it seems likely that having a root zone that is a
> year out of date would be problematic for many TLDs.

My point wasn't to argue that a zone file that is a year out of date is somehow OK. Even without the DNSSEC bits, I think not being able to reach 50 TLDs is not OK. However, the infrastructure seems to change slowly. And, that has some ramifications. And, we might be able to leverage those if we wanted to ...

- E.g., if we wanted to extend the TTL that doesn't seem like it would be a big problem.

- E.g., if *in a pinch* we had to use an expired, but not too old, root zone file to reach a TLD server because we couldn't fetch a current zone file, that would likely be OK, too.

allman
Re: [dns-operations] root? we don't need no stinkin' root!
Hi Florian!

> What's the change rate for the root zone? If there is a full
> transition of the name server addresses for a zone, how long does
> it typically take from the first change to the completion of the
> sequence of changes?

Not a direct answer to your question, but a couple of empirical bits from the paper that started this thread ...

   We analyzed a snapshot of the root zone file from each day in
   April, 2019. On the first of the month the root zone included
   1,532 TLDs and one was deleted during the month. Of the TLDs, all
   but five have at least one nameserver (by IP address) that is
   constant for the entire month. That is, if a recursive resolver
   used a root zone file that was out of date by one month, 99.6% of
   the TLDs would remain accessible. The five TLDs that do not have
   a constant nameserver for the entire month are run by NeuStar and
   use a slowly rotating set of IP addresses for the TLD
   nameservers. The overlap ensures that a root zone file that is no
   more than 14 days out of date will ensure constant TLD
   reachability. Further, comparing the root zone files on April 1,
   2018 and April 1, 2019 we find that all but 50 TLDs (3.3%) would
   still retain reachability with a root zone file that is a year
   out of date.

Obviously, there could be a more comprehensive analysis, but I think that gives some idea about how stable the root zone file is in practice.

allman
Re: [dns-operations] root? we don't need no stinkin' root!
Hi Paul!

> The biggest problem I see here is the legacy/long-tail problem. As
> of a few years ago, I bumped into BIND 4 servers still
> active. Wouldn't be shocked to hear they are still being used.
>
> IPv4 reachable traditional DNS servers for some tiny group of
> antique folks will be needed for years, even if we get 99+% of the
> world to some new system.

I wonder if we're ever allowed to just decide this sort of thing is ridiculous old shit and for lots of reasons we can and should just garbage collect it away.

> Doesn't mean we shouldn't be thinking about a better way to do it
> for that 99% though.

Is it better if we only get to 99%? To me, this whole notion is that we can in fact get rid of this giant network service. If we don't get rid of it then what is the incentive to move one's own resolver away from using the root nameservers?

I don't have any heartburn with RFC 7706. But, it is a quite minor optimization in the general case. It may well be important in some corner cases, but in general I don't think running a local root nameserver helps all that much.

Maybe 99% lets us draw down the size of the root infrastructure ... I dunno. But, if we don't say something like "it's going to go away" then I am not sure resolvers will move away from it.

allman
--
https://www.icir.org/mallman/
@mallman_icsi
Re: [dns-operations] root? we don't need no stinkin' root!
> It would appear a rather large percentage of queries to the root
> (like 50% in some samples) are random strings, between 7 to 15
> characters long, sometimes longer. I believe this is Chrome-style
> probing to determine if there is NXDOMAIN redirection. A good
> example of the tragedy of the commons, like water pollution and
> climate change.

I will note that there have been quite a number of studies over the last 20 years that show > 95% of the queries to the root are junk of one kind or another. Someone mentioned Duane's nice paper. But, this observation started with Brownlee et al.'s 2001 paper. Point being, Chrome might cause some of this now, but junk has been there long before Chrome started this particular probing.

What's more ... in my rudimentary poking of the DITL data [*] it seems that 25-50% of the "resolvers" that query the root *never* send a legit query. I.e., we can't ascribe a lot of this junk to resolvers that could just work better somehow.

[*] There may be numbers on this sort of thing in the Brownlee, Wessels, etc. papers ... I just can't recall them off the top of my head.

allman
--
https://www.icir.org/mallman/
@mallman_icsi
Re: [dns-operations] root? we don't need no stinkin' root!
Let me try to get away from what is or is not "big" and ask two questions. (These are legit questions to me. I have studied the DNS a whole bunch, but I do not operate any non-trivial part of the DNS and so that viewpoint is valuable to me.)

(1) Setting aside history and how things have been done and why (which I am happy to stipulate is rational) ... At this point, are there tangible benefits to getting information about the TLD nameservers to resolvers as needed via a network service?

(2) Are there fundamental problems that would arise in recursive resolvers if the information about TLD nameservers was no longer available via a network service, but instead had to come from a file that was snarfed periodically?

Thanks!
allman
--
https://www.icir.org/mallman/
@mallman_icsi
[dns-operations] root? we don't need no stinkin' root!
Left here to be ripped apart ... :-)

    Mark Allman. On Eliminating Root Nameservers from the DNS,
    ACM SIGCOMM Workshop on Hot Topics in Networks (HotNets),
    November 2019.
    https://www.icir.org/mallman/pubs/All19b/

Abstract: The Domain Name System (DNS) leverages nearly 1K distributed servers to provide information about the root of the Internet's namespace. The large size and broad distribution of the root nameserver infrastructure has a number of benefits, including providing robustness, low delays to topologically close root servers and a way to cope with the immense torrent of queries destined for the root nameservers. While the root nameserver service operates well, it represents a large community investment. Due to this large cost, in this paper we take the position that DNS' root nameservers should be eliminated. Instead, recursive resolvers should use a local copy of the root zone file rather than consulting the root nameservers. This paper considers the pros and cons of this alternate approach.

allman
--
https://www.icir.org/mallman/
@mallman_icsi
Re: [dns-operations] resolvers considered harmful
> simply on their own moves the entire query load of all endpoints
> (billions) onto the authoritative nameservers only. Do you really
> propose a billion clients should perform lookups against my 3 poor
> nameservers for nohats.ca.?

Well,

- All billions of clients are not interested in nohats.ca, so trying to compare billions of clients against your three nameservers is a red herring.

- I don't know what your load is, but do you have any idea how much your load will increase if shared resolvers did not shield you from some of it? We quantify this a little in our paper (for .com). We should use numbers to talk about these things instead of just waving our hands at some boogie man.

- And, I'd spin this around on you ... You clearly care about your 3 poor nameservers. That is natural and rational. But, why do you think it is someone else's job to run a cache to shield you from load? Why should we at ICSI run a shared resolver for your benefit? If we get benefit and it happens to help you, too, great. But, I can tell you that we certainly don't factor your load into our considerations of how to run our infrastructure.

> Suggesting to dismantle the largest distributed database in the
> world and thinking you can get away with it is a very ill thought
> plan not rooted in reality.

Well, ...

- We root our argument in some empiricism, anyway. That is more than you're doing. One can always get more data and more vantage points, but at least let's not pretend we just waved our hands here, please.

- We are not talking about dismantling the distributed database. We are talking about eliminating optional caches from the system. The actual database embodied in the auth servers remains untouched. I have had many conversations with people over the last year about this idea and I always find it sort of interesting that resolvers are viewed as a required component of the system and that it so much blows people's minds that the system could or should work without them. (Yet, web caches---which one can view as pretty analogous---do not seem to rise to that level. I.e., folks seem to view them for what they are---perhaps a helping hand---and not as some crucial component of the system.)

- Doing what we have always done because we have always done it and not thinking about the implications of that seems like a lousy plan, too, BTW.

allman
Re: [dns-operations] resolvers considered harmful
> The biggest problem I have with this paper is of terminology.

No- I don't want every app to build in a resolver. Madness! Think of it as a change under-the-hood to gethostbyname(). Same interface to the applications. But, underneath it doesn't go query whatever is in /etc/resolv.conf, but rather just walks the tree itself (to the extent needed, based on the cache).

> Then, when it comes to privacy (the biggest problem with your
> proposal), I strongly disagree with the way you get rid of the
> problems by saying "we note that many users are willing to use
> open shared resolvers (e.g., Google DNS) and are therefore
> comfortable with directly attributable DNS requests arriving at a
> large third-party network." This is propaganda, not science. Users
> use Google Public DNS because their ISP's resolver is broken or
> slow, or because the ISP censors
> http://www.bortzmeyer.org/dns-routing-hijack-turkey.html or
> because the IP address is cool or simply because they feel that
> it's Google so it must be nice. They never perform an assessment
> of the public resolver privacy policy and practices, and they
> certainly don't analyze the tradeoffs. Most users (even most IT
> professionals) have no idea of the complex privacy issues
> associated with DNS.

I understand you have probably thought this through more than I have. But, I have a couple of views here in addition to the above ...

- Ultimately you're going to take the results of a DNS transaction and turn around and hit the given service with an application. So, while I may have been some nebulous someone at ICSI during the name lookup, once I make the TCP connection I am not so anonymous anymore. That does not apply to all cases, of course. I.e., I ask Verisign for google.com and then I TCP to Google and not Verisign. So, in this case I could remain "someone at ICSI" to Verisign if I used the shared resolver.

- I think a rational way to look at this is the way we look at privacy more generally. If you communicate with someone then they'll know your IP. If you don't want that, take some explicit step to prevent it (e.g., use Tor). We get an obfuscation from shared resolvers now, but is that enough of a reason to keep them around?

allman
Re: [dns-operations] resolvers considered harmful
> cache hit rate is about 80%-90% for those caching you think can be
> removed. Note that this cache hit rate is heavily skewed because
> of the facebook one-time uncachable hostnames they were using at
> the time. If you also include the fact that these caches were
> feeding other caches, you will see the enormous amount of queries
> you are suggesting to unleash on authoritative nameservers on the
> internet.

Right ... Our cache hit rate is somewhere in the two-thirds area. At least the last time I looked. (We have some notions from a couple of years ago in http://www.icir.org/mallman/pubs/CAR13/ .) But, one important bit here is that while this does increase the load, the load is distributed. So, it isn't like we're landing all the load on one given point in the network. And, it is distributed proportionally to the popularity of the underlying services (which intuitively seems about right to me).

>> - And, I'd spin this around on you ... You clearly care about
>> your 3 poor nameservers. That is natural and rational. But, why
>> do you think it is someone else's job to run a cache to shield
>> you from load?
>
> If you really believe the mode of the internet should be that the
> weakest device should be able to deal with the largest volume load
> the world can throw at it, there is not much point discussing this
> further. I'm just happy that people like Van Jacobson designed the
> internet.

>> Why should we at ICSI run a shared resolver for your benefit?
>
> Because thousands of ISPs run caches for your servers'
> benefit. Using your reasoning, we should drop all the exponential
> backoff in our TCP/IP protocols. You'll just have to deal with the
> load, and if you get blasted off the net it's clearly your fault
> for being underpowered.

I think this is a really bad analogy. I do happen to know something about congestion control. Maybe even two things!

Congestion control is a shared set of algorithms / strategies for dealing with the case when some shared piece of infrastructure is over-committed. For instance, a link in the middle of the net. It isn't any one person's/host's fault that it is over-capacity. So, we agree on a set of techniques that we can all use to reasonably back off and share the link so nobody starves. We do not point at the owner of the link and say "hey, we're trying to use more capacity than you have, so buy some more capacity!". I.e., we don't impose on someone else to add resources on our behalf. Rather, we decide to all play nice and so we all get something done (even if slower than we'd really like). Congestion control is about coping with a less-than-ideal shared reality.

But, this is not at all like your nameserver. Your nameserver is not shared infrastructure that zillions of disparate people all happen to (over)use. It is infrastructure that works on your behalf to serve names that you want served. If it can't handle the load then there is a clear culprit: you. I.e., your popularity has outgrown your resources. Why should that be my problem? We don't have Google telling us to all fire up an institutional HTTP cache on our networks because it has run out of capacity and it's our problem to fix.

Finally, it isn't that I believe "the weakest device should be able to deal with the largest volume load the world can throw at it". Rather, I believe if someone is providing a service, then they should be responsible for provisioning for the load that service incurs (or dealing with the suboptimal performance). I wouldn't have thought that a controversial notion.

allman
Re: [dns-operations] resolvers considered harmful
> There is no relationship between the data and the
> conclusion. Having a short TTL is not because you make changes
> often, it's because, when you decide to make a change, you want it
> to be effective rapidly. The actual number of changes does not
> matter; what matters are the expectations of users ("sorry, buddy,
> we made the change immediately but it will not be seen by all
> caches before one week").

I think that is totally fair. But, two things ...

- The TLDs are a little weird in that they are trying to control their load and yet are serving someone else's names. So, yeah, there is this sort of mismatch where the SLDs would like shorter TTLs because they want the flexibility and don't pay the serving price. Meanwhile, the TLDs don't directly care about the flexibility and so they optimize for load shedding. So, um, yeah.

- But, inside google.com this is all much more straightforward. I.e., they can be as flexible as they want to provision for.

allman
Re: [dns-operations] resolvers considered harmful
>> - As noted in the paper 93% of the zones see no increase in our
>> trace-driven simulations. That is, they are accessed by at most
>> one end host per TTL and therefore see no benefit from the
>> shared cache and hence will see the same load regardless of
>> whether it is an end host or a shared resolver asking the
>> questions.
>
> How does this compare to resolvers with one or two (or four)
> orders of magnitude more clients behind them?

Presumably pretty well. I only know of old results here, but Jung's IMW 2001 paper suggests that the cache hit rate levels off after 10-20 users. I have in mind that there is a more recent (but not last year) validation of this, but I don't have a reference at my fingertips. (And, that follows my intuition, BTW. I.e., most of the cache hit rate comes from popular names and you don't need many users to coax those popular names into the cache.)

>> - There is also a philosophical-yet-practical argument
>> here. That is, if I want to bypass all the shared resolver junk
>> between my laptop and the auth servers I can do that now. And,
>> it seems to me that even given all the arguments against
>> bypassing a shared resolver that should be viewed as at least a
>> rational choice. So, in this case the auth zones just have to
>> cope with what shows up. So, do we believe that it is incumbent
>> upon (say) AT&T to provide shared resolvers to shield (say)
>> Google from a portion of the DNS load?
>
> It doesn't look to me like your paper has done anything to capture
> what it looks like behind AT&T's resolvers, so I'm not sure how
> you can come to that sort of conclusion.

Correct. This is a thought experiment with exemplars that I gave names to.

allman
[dns-operations] resolvers considered harmful
Short paper / crazy idea for your amusement ...

    Kyle Schomp, Mark Allman, Michael Rabinovich. DNS Resolvers
    Considered Harmful, ACM SIGCOMM Workshop on Hot Topics in
    Networks (HotNets), October 2014. To appear.
    http://www.icir.org/mallman/pubs/SAR14/

Abstract: The Domain Name System (DNS) is a critical component of the Internet infrastructure that has many security vulnerabilities. In particular, shared DNS resolvers are a notorious security weak spot in the system. We propose an unorthodox approach for tackling vulnerabilities in shared DNS resolvers: removing shared DNS resolvers entirely and leaving recursive resolution to the clients. We show that the two primary costs of this approach---loss of performance and an increase in system load---are modest and therefore conclude that this approach is beneficial for strengthening the DNS by reducing the attack surface.

Comments welcome.

allman
--
http://www.icir.org/mallman/
Re: [dns-operations] resolvers considered harmful
Let me try to tackle a few things before heading off to my next meeting to conjure other stupid ideas! :-)

Florian Weimer <f...@deneb.enyo.de>:

> This is a bit over the top. I've suggested multiple times that one
> possible way to make DNS cache poisoning less attractive is to
> cache only records which are stable over multiple upstream
> responses, and limit the time-to-live not just in seconds, but
> also in client responses. Expiry in terms of client responses does
> not cause a cache expiration, but a new upstream query once the
> record is needed again. If the new response matches what is
> currently in the cache, double the new client response
> time-to-live count from the previous starting value. If not, start
> again at the default low value (perhaps even 1).

Sure ... this may well be a fine enough idea. (A toy sketch of my reading of this scheme appears at the end of this message.) Lots of such point solutions are perfectly fine ideas. Our point is to step back and wonder whether these resolvers (whether ISP-level, institution-level, CPE, whatever) are really a large enough benefit compared against the cost of the attack surface. And, the paper shows---at least in an initial fashion---that the benefits might not be all that great.

Andrew Sullivan <a...@anvilwalrusden.com>:

> There's a third cost here, and that is a large increase in costs
> to authoritative server operators. That might be worth trading
> off, but it won't do to pretend that isn't a cost that's incurred.

I absolutely agree. Please read section 5, which addresses exactly this question. We use .com as an example of a popular authoritative domain in this work.

Frank Sweetser <f...@wpi.edu>:

> We make pretty heavy use of RPZ to block outbound malware traffic,
> especially to prevent people from inadvertently browsing malicious
> web sites.

Yup, this would be a cost of getting rid of resolvers, I agree. We mention policy implementation in the paper, but don't say a lot about it. My view here is that you could certainly still do exactly this by funneling traffic through a resolver. That might be the right tradeoff for you. But, the right default may well be to let clients do lookups themselves and let this situation happen as the exception in places where folks want to implement such policy. In other words, just because there may well be times we want to use an intermediate resolver for good and valid reasons does not mean that running zillions of such boxes all over the place (as we do now) is the right general approach.

Matthew Pounsett <m...@conundrum.com>:

> The paper also appears to make the assumption that eliminating
> existing resolvers is a thing we can do. Open recursive resolvers
> won't go away simply because we, as an industry, decide to stop
> setting up new ones. There's no way to prevent them from sending
> queries (or to selectively block them), and they are almost by
> definition unmanaged, so we cannot expect they will be taken
> offline by their respective administrators.

Sure. I agree with this. But, if we make clients default to not using resolvers then the harm resolvers can do is reduced. I.e., so what if I can cache-poison a CPE if none of the clients behind it utilize the CPE for lookups? Of course, if the CPE is open then we still have reflection/amplification problems. But, if nobody is using the DNS forwarders on these things then maybe they eventually go away. We are not under the illusion that one can wave a magic wand and get rid of these things. But, individual clients can get benefits from ignoring them. And, if they are generally not found to be useful anymore then maybe they start going away.

Thanks folks!
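For what it's worth, here is a toy sketch of how I read Florian's scheme: cache entries carry both a TTL and a client-response budget, and the budget doubles when a revalidating upstream answer matches the cached one. All of the structure here is my guess at his intent, not his code; `query_upstream` is an assumed function returning (rrset, ttl).

    import time

    DEFAULT_BUDGET = 1   # initial client-response budget (the "low value")

    class Entry:
        def __init__(self, rrset, ttl, budget):
            self.rrset, self.ttd = rrset, time.time() + ttl
            self.budget = budget          # responses left before revalidation
            self.start_budget = budget    # value to double on a confirming match

    cache = {}

    def lookup(name, query_upstream):
        e = cache.get(name)
        if e and e.budget > 0 and e.ttd > time.time():
            e.budget -= 1                 # serve from cache, spend budget
            return e.rrset
        rrset, ttl = query_upstream(name) # budget/TTL exhausted: re-ask upstream
        if e and rrset == e.rrset:        # stable across upstream responses:
            budget = e.start_budget * 2   # double the response budget
        else:
            budget = DEFAULT_BUDGET       # changed (or new): back to the default
        cache[name] = Entry(rrset, ttl, budget)
        return rrset

The effect, as I read it, is that an attacker's forged record only survives for a handful of client responses before the resolver re-checks it upstream.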
allman
Re: [dns-operations] resolvers considered harmful
>> Why not just turn on DNSSEC?
>
> Important zones are still unsigned, so I can understand why there
> is a desire for alternative solutions.

Right. It isn't like we are lacking ways to solve the problems we know about. E.g., we know how to mitigate the Kaminsky attack. But, yet, still there are plenty of vulnerable resolvers (per our PAM paper from this past spring). E.g., we know how to secure DNS records with crypto. But, yet, broadly speaking we don't do it. So, perhaps we need to re-think things.

allman
Re: [dns-operations] resolvers considered harmful
Let me try to take care of both of these related points together:

Joe Greco <jgr...@ns.sol.net>:

> Then we merely move on to the issue of cache poisoning individual
> clients. Assuming that the CPE is a NAT (effectively firewalling
> clients from poisoning attacks) and/or that the individual clients
> have well-designed, impervious resolvers is likely to be a fail.

David Conrad <d...@virtualized.org>:

> As I understand it, you're proposing pushing the resolvers out to
> the edges

That is not what we are proposing. We are not suggesting resolvers be *moved*, but rather *removed*. That is, clients simply do name lookup on their own.

Name lookup at an endpoint is different from name lookup in an intermediate resolver. An intermediate resolver looks up a name on behalf of other hosts. It therefore *must* listen for lookup requests that roll in from the network. This is fundamental to a resolver's operation---it simply *must* accept requests from other hosts. Don't get me wrong: it doesn't have to accept all requests and, as we know, too many resolvers accept requests they should not. All I am saying is that the resolver cannot do its job without accepting requests from other hosts. On the other hand, an endpoint can look up a name without listening for any request from the network. We suggest this be an entirely local operation. Think of it like this: just because I want to load the cnn.com web page I don't have to run httpd. Well, just because I want to look up an A record for cnn.com doesn't mean I have to run bind. (A rough sketch of such a local walk appears at the end of this message.)

Could there be attacks against the internal lookup process on a host? Of course. But, those are attacks that require some sort of access to the end host first.

David Conrad <d...@virtualized.org>:

> if you're not doing DNSSEC at the edges,

Let me be clear I am not arguing against DNSSEC. A crypto-signed record is always better than a clear-text record. But, DNSSEC is still not here and it seems to me that factoring out some of the intermediaries that we know sometimes both play games and have games played on them may well be a useful path.

allman
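To make "an entirely local operation" concrete, here is a bare-bones sketch of an endpoint walking the tree itself, using the dnspython library. It only handles referrals that come with glue and skips caching, CNAMEs, TCP fallback and DNSSEC, so it is an illustration rather than a real resolver:

    import dns.message, dns.query, dns.rdatatype  # dnspython

    def local_walk(qname, server="198.41.0.4"):   # start at a.root-servers.net
        """Follow referrals from the root until an answer (or dead end)."""
        for _ in range(10):                       # bound the number of hops
            query = dns.message.make_query(qname, dns.rdatatype.A)
            resp = dns.query.udp(query, server, timeout=3)
            if resp.answer:                       # got an answer: done
                return resp.answer
            # referral: grab a glue A record for one of the listed nameservers
            glue = [rr for rrset in resp.additional
                    if rrset.rdtype == dns.rdatatype.A for rr in rrset]
            if not glue:
                raise RuntimeError("referral without glue; a real resolver "
                                   "would now resolve the NS name itself")
            server = glue[0].address
        raise RuntimeError("too many referrals")

    print(local_walk("www.google.com"))

Note that nothing here ever listens on a socket; the host only sends queries and reads the replies to them.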
Re: [dns-operations] resolvers considered harmful
> Their model doesn't make it a large increase, and I think that's
> because of their unrealistic assumptions about actual use.

I am not sure I follow this one. We used a workload derived from actual use to assess how much the load on .com would increase in a world where each endpoint was doing its own name lookups.

> The problem is that the really popular domains on the Internet
> (Google is one example they do discuss) have completely different
> patterns than everyone else. The mitigation technique of
> increasing TTLs imposes a cost on the operator of the popular
> domain (a cost in terms of flexibility). The authors seem to think
> this is no large cost, and I disagree.

The paper quantifies this cost for .com. We find that something like 1% of the records change each week. So, while increasing the TTL from the current two days to one week certainly sacrifices some possible flexibility, in practical terms the flexibility isn't being used.

> It's true that the increase in your numbers doesn't seem to be as
> bad as the theoretical maximum, but 2.5 or 3 times several million
> is still a large number.

Right ... But, let me make a few points:

- As noted in the paper, 93% of the zones see no increase in our trace-driven simulations. That is, they are accessed by at most one end host per TTL and therefore see no benefit from the shared cache and hence will see the same load regardless of whether it is an end host or a shared resolver asking the questions.

- The point is that the increase rates above don't seem to me to be disqualifying. I.e., there is some increase rate (e.g., on the order of the number of users) that would I think be a show stopper. But, that doesn't seem to be what we're looking at here.

- Or, put differently ... We are not pretending that there is no additional cost at some auth servers. But, this additional cost does buy us things. So, it is simply a different tradeoff than we are making now.

- As for popular content providers, as you note, their TTLs are much lower. So, while the NS for facebook.com is going to be used by nearly all the clients behind some shared resolver over its two-day TTL, the A for star.c10r.facebook.com lasts less than a minute and so a cached version is only going to shield facebook from so much of the load. I haven't run the numbers, but my hunch is that we'll see increase rates for facebook.com, google.com, etc. to be smaller than for the .com results we have in the paper because the cacheability of these records is smaller.

- There is also a philosophical-yet-practical argument here. That is, if I want to bypass all the shared resolver junk between my laptop and the auth servers I can do that now. And, it seems to me that even given all the arguments against bypassing a shared resolver that should be viewed as at least a rational choice. So, in this case the auth zones just have to cope with what shows up. So, do we believe that it is incumbent upon (say) AT&T to provide shared resolvers to shield (say) Google from a portion of the DNS load? Or, put differently, the results in the paper suggest that there really isn't much for AT&T to gain from providing those resolvers, so why should it? One argument here could be that AT&T is trying to provide its customers better performance. But, the paper shows this is really not happening (which is largely a function of pervasive DNS prefetching). So, if I am AT&T I'd be thinking "hey, what am I or my customers actually gaining from this complexity I have in my network?!". And, if the answer is little-to-nothing then it seems rational to not provide this service. Or, so it seems to me.

allman
Re: [dns-operations] resolvers considered harmful
>> It is a terminology issue, I think.
>
> Perhaps I'm being unclear and/or we're having a terminology
> mismatch. To be concrete: are you suggesting that (a) every
> application on an 'endpoint' provide its own iterative resolution,
> (b) the 'end point' effectively runs an iterative caching resolver
> at 127.0.0.1/::1, or (c) something else?

(c) Or, all of them! The implementation does not matter to me. An app could just run "dig +short" for all I care. My point is that this is an entirely internal matter to the host. Unlike the case of a shared resolver, there is no requirement that it accept any lookup request from outside the box.

>> All I am saying is that the resolver cannot do its job without
>> accepting requests from other hosts.
>
> As a person who frequently runs unbound listening only to
> 127.0.0.1 on my laptop, we may have differing opinions of the
> scope of the job of a resolver.

I should---as I hope we do in the paper---be careful and use the term 'shared resolver' for something outside the host itself.

> My point was that to definitively fix resolver-to-authoritative,
> you're going to need something like DNSSEC.

Yes- absolutely. E.g., just because a client does the name lookup instead of handing it to a shared resolver isn't going to make the great firewall any less likely to forge a response. So, I don't mean to in any way say DNSSEC isn't useful.

allman
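As an aside, for anyone wanting to see the no-shared-resolver mode by hand, stock dig can already do the walk itself:

    dig +trace www.example.com

With +trace, dig iterates from the root on its own rather than handing the query to the resolver in /etc/resolv.conf (the initial fetch of the root NS list aside).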
[dns-operations] new DNS forwarder vulnerability
Just a quick note to let folks know about a new vulnerability we have found in some low-rent DNS forwarders---which we have been calling the 'preplay attack'.

The finding is that when the vulnerable open resolvers receive a DNS response they just look at the query string in the response to see if they have a request for the given string outstanding. If they do, they accept the result. I.e., there is no validation of the source IP, port numbers or DNS transaction ID in the response. Dumb. This makes poisoning the caches of these boxes trivial (i.e., send a request for www.facebook.com and then immediately send an answer). (A sketch of the missing checks appears at the end of this message.)

A few notes ...

- We have found 7--9% of the open resolver population---or 2-3 million boxes---to be vulnerable to this cache poisoning attack. (The variance is from different runs of our experiments.)

- We have not been able to nail this vulnerability down to a single box or manufacturer. To the contrary, our efforts at identifying the boxes indicate the vulnerability crosses such boundaries. (However, these boxes do seem to be largely situated in residential settings.)

- We presented these results at PAM earlier this week. Our paper, slides, etc. with details of the attack (and results about previously known DNS attacks) are available here: http://www.icir.org/mallman/pubs/SCRA14/

- We did give CERT a heads up about this before the paper appeared and they kibitzed the information around to various manufacturers of this sort of gear. My mental model is that this sort of gear is upgraded when it goes kaput. So, vigilance I guess.

FWIW.

allman
--
http://www.icir.org/mallman/
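For anyone writing a forwarder, here is a minimal sketch of the response validation these boxes are skipping. The field names are illustrative assumptions; `pending` is assumed to map a transaction ID to a record of the outstanding query:

    def response_is_acceptable(resp, pending):
        """resp: parsed DNS response plus its source/destination addressing.
        pending: txid -> record of the outstanding query.
        The vulnerable boxes match on the query name alone; a sane
        forwarder checks all of the following before accepting."""
        q = pending.get(resp.txid)
        if q is None:
            return False                       # no outstanding query with this ID
        return (resp.src_ip == q.server_ip     # came from the server we asked
                and resp.src_port == 53        # from the port we sent to
                and resp.dst_port == q.local_port  # to the ephemeral port we used
                and resp.question == q.question)   # for the question we asked

Skipping all four checks is what reduces the attack to "send a query, then immediately send yourself the answer".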