Re: DNS caches that support partitioning ?

2012-10-14 Thread Florian Weimer
* John Levine:

 Are there DNS caches that allow you to partition the cache for
 subtrees of DNS names?  That is, you can say that all entries from
 say, in-addr.arpa, are limited to 20% of the cache.

You can build something like that using forwarders and most DNS
caches.  But it won't result in an exact split because cross-subtree
CNAMEs and DNS delegations will cause caching outside the subtree.
However, for in-addr.arpa, that's probably what you want anyway.



Return two locations or low TTL [was: DNS caches that support partitioning ?]

2012-08-20 Thread Patrick W. Gilmore
While I hesitate to argue DNS with Mark, I feel this needs a response.

On Aug 19, 2012, at 17:37 , Mark Andrews ma...@isc.org wrote:
 In message ddf607b5-415b-41e8-9222-eb549d3db...@semihuman.com, Chris 
 Woodfield writes:

 What Patrick said. For large sites that offer services in multiple data
 centers on multiple IPs that can individually fail at any time, 300
 seconds is actually a bit on the long end.

 Which is why the DNS supports multiple address records.  Clients
 don't have to wait a minute to fail over to a second address.  One
 doesn't have to point all the addresses returned to the closest
 data center.  One can get sub-second fail over in clients as HE
 code shows.

I'm afraid I am not familiar with HE code, so perhaps I am being silly here.  
But I do not think returning multiple A records for multiple datacenters is as 
useful as lowering the TTL.  Just a few reasons off the top of my head:

* How do you guarantee the user goes to the closer location if you respond
  with multiple addresses?  Forcing users to go to farther away datacenters
  half the time is likely a poor trade-off for the occasional TTL problem
  when a DC goes down.

* How many applications are even aware multiple addresses were returned?

* How do you guarantee sub-second failover when most apps will wait longer 
  than one second to see if an address responds?

Etc.

And that doesn't begin to touch things such as cache efficiency that affect 
companies like Google, CDNs, etc.


 As for the original problem.  LRU replacement will keep hot items in
 the cache unless it is seriously undersized.

This was covered well by others.

-- 
TTFN,
patrick




Re: Return two locations or low TTL [was: DNS caches that support partitioning ?]

2012-08-20 Thread Dobbins, Roland

On Aug 20, 2012, at 5:24 PM, Patrick W. Gilmore wrote:

 But I do not think returning multiple A records for multiple datacenters is 
 as useful as lowering the TTL.

Some folks do this via various GSLB mechanisms which selectively respond with 
different records based on the assumed relative topological distance between 
the  querying resolver and various server/service instantiations in different 
locations.

---
Roland Dobbins rdobb...@arbor.net // http://www.arbornetworks.com

  Luck is the residue of opportunity and design.

   -- John Milton




Re: Return two locations or low TTL [was: DNS caches that support partitioning ?]

2012-08-20 Thread Patrick W. Gilmore
On Aug 20, 2012, at 06:49 , Dobbins, Roland rdobb...@arbor.net wrote:
 On Aug 20, 2012, at 5:24 PM, Patrick W. Gilmore wrote:
 
 But I do not think returning multiple A records for multiple datacenters is 
 as useful as lowering the TTL.
 
 Some folks do this via various GSLB mechanisms which selectively respond with 
 different records based on the assumed relative topological distance between 
 the  querying resolver and various server/service instantiations in different 
 locations.

Some folks == more than half of all traffic on broadband modems these days.

However, I think you missed a post or two in this thread.

The original point was you need a low TTL to respond with a single A record or 
multiple A records which all point to the same datacenter in case that node / 
DC goes down.  Mark replied saying you can respond with multiple A records 
pointing at multiple DCs, thereby allowing a much longer TTL.

My question above is asking Mark how you guarantee the user/application selects 
the A record closest to them and only use the other A record when the closer 
one is unavailable.

-- 
TTFN,
patrick




Re: DNS caches that support partitioning ?

2012-08-20 Thread Tony Finch
Raymond Dijkxhoorn raym...@prolocation.net wrote:

 When you use forwarding it doesn't cache the entry ('forward only'
 option in bind for example).

That's incorrect. Try configuring a forwarded zone and observe the TTLs
you get in responses. The forward only option disables recursion but
not caching.

 I talked with Paul Vixie about doing this internal inside bind but that was
 not something they would be delighted to do (at least not now). If you could
 define how large your cache pool was for certain objects that would fix it
 also.

You might be able to hack it using a combination of forwarding zones,
views, max-cache-ttl, and attach-cache.
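
For illustration only, a hypothetical named.conf fragment combining those options might look like the following. The directives are real BIND 9 options, but the view layout, sizes, and addresses are invented; check the ARM for your version before relying on any of it.

```
// Hypothetical sketch: give reverse-DNS lookups their own, smaller cache.
view "reverse" {
    match-destinations { 127.0.0.2; };   // send rDNS queries to this address
    attach-cache "reverse-cache";        // dedicated, separately sized cache
    max-cache-size 20m;                  // the "20% of the cache" partition
    max-cache-ttl 300;                   // optionally cap retention as well

    zone "in-addr.arpa" {
        type forward;
        forward only;
        forwarders { 192.0.2.53; };      // illustrative forwarder address
    };
};
```

The main view keeps its default cache, so churny in-addr.arpa answers can no longer push other records out of it.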

Tony.
-- 
f.anthony.n.finch  d...@dotat.at  http://dotat.at/
Forties, Cromarty: East, veering southeast, 4 or 5, occasionally 6 at first.
Rough, becoming slight or moderate. Showers, rain at first. Moderate or good,
occasionally poor at first.



Re: Return two locations or low TTL [was: DNS caches that support partitioning ?]

2012-08-20 Thread Chris Adams
Once upon a time, Patrick W. Gilmore patr...@ianai.net said:
 * How many applications are even aware multiple addresses were returned?

Most anything that supports IPv6 should handle this correctly, since
getaddrinfo() will return a list of addresses to try.
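
As a minimal sketch of that behaviour (host name and port are arbitrary examples):

```python
import socket

# getaddrinfo() returns one tuple per address record (A and AAAA alike),
# so a client can walk the whole list instead of stopping at the first.
for family, socktype, proto, canonname, sockaddr in socket.getaddrinfo(
        "localhost", 80, type=socket.SOCK_STREAM):
    print(family, sockaddr)
```

Each tuple carries everything needed to open a socket to that particular address, which is what makes a multi-address fallback loop possible at all.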

 * How do you guarantee sub-second failover when most apps will wait longer 
   than one second to see if an address responds?

That's a bigger issue.  Also, for web services, the application might
wait, but the end-user usually won't (if the site doesn't come up in a
second, they move on to something else).

-- 
Chris Adams cmad...@hiwaay.net
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.



Re: Return two locations or low TTL [was: DNS caches that support partitioning ?]

2012-08-20 Thread Patrick W. Gilmore
On Aug 20, 2012, at 08:25 , Tony Finch d...@dotat.at wrote:
 Patrick W. Gilmore patr...@ianai.net wrote:
 On Aug 19, 2012, at 17:37 , Mark Andrews ma...@isc.org wrote:
 
 Which is why the DNS supports multiple address records.  Clients
 don't have to wait a minutes to fallover to a second address.  One
 doesn't have to point all the addresses returned to the closest
 data center.  One can get sub-second fail over in clients as HE
 code shows.
 
 I'm afraid I am not familiar with HE code, so perhaps I am being silly
 here.
 
 Mark is referring to happy eyeballs:
 http://www.isc.org/community/blog/201101/how-to-connect-to-a-multi-homed-server-over-tcp

Oh.  Yep, I was being silly, thinking only of v4.  (I'm sleep deprived of late 
- yes, more than usual.)  Unfortunately, whether we like it or not, 99+% of 
traffic on the 'Net is still v4, as were the examples given.

Even with HE, though, there is no (not yet a?) way in DNS to signal use this A 
record first, then that one if the first doesn't work / is slow / whatever.  
Any chance of getting MX-style weights for A records? :)

Even then, it would not solve the original problem of low TTLs.  Just as a 
simple example, when traffic ramps quickly, a provider may want to move some 
users off a node to balance traffic.  With a long TTL, that's not really 
possible barring really bad hacks like DoS'ing some users to hope they use the 
next A record, which would lead to massive complaints.

We could go on, but hopefully the point is clear that low TTLs are useful in 
many instances despite the ability to return multiple A records.

-- 
TTFN,
patrick




Re: Return two locations or low TTL [was: DNS caches that support partitioning ?]

2012-08-20 Thread Patrick W. Gilmore
On Aug 20, 2012, at 08:47 , Chris Adams cmad...@hiwaay.net wrote:
 Once upon a time, Patrick W. Gilmore patr...@ianai.net said:

 * How many applications are even aware multiple addresses were returned?
 
 Most anything that supports IPv6 should handle this correctly, since
 getaddrinfo() will return a list of addresses to try.

Ah, the amazing new call which destroys any possibility of randomness or round 
robin or other ways of load balancing between A / AAAA records.

Yes, all of us returning more than one A / AAAA record are hoping that gets 
widely deployed instantly.  Or not.

-- 
TTFN,
patrick




Re: Return two locations or low TTL [was: DNS caches that support partitioning ?]

2012-08-20 Thread Dobbins, Roland

On Aug 20, 2012, at 5:56 PM, Patrick W. Gilmore wrote:

 My question above is asking Mark how you guarantee the user/application 
 selects the A record closest to them and only use the other A record when the 
 closer one is unavailable.

I understand - my point was that folks using a GSLB-type solution would 
generally include availability probing in the GSLB stack, so that a given 
instance won't be included in answers if it's locally unavailable (obviously, 
the GSLB can't know about all path elements between the querying resolver and 
the desired server/service).

---
Roland Dobbins rdobb...@arbor.net // http://www.arbornetworks.com





Re: Return two locations or low TTL [was: DNS caches that support partitioning ?]

2012-08-20 Thread Tony Finch
Patrick W. Gilmore patr...@ianai.net wrote:
 On Aug 20, 2012, at 08:47 , Chris Adams cmad...@hiwaay.net wrote:
 
  Most anything that supports IPv6 should handle this correctly, since
  getaddrinfo() will return a list of addresses to try.

 Ah, the amazing new call which destroys any possibility of randomness or
 round robin or other ways of load balancing between A / AAAA records.
 Yes, all of us returning more than one A / AAAA record are hoping that
 gets widely deployed instantly.  Or not.

The problem is RFC 3484 address selection; getaddrinfo is just the usual
place this is implemented. I had believed that there was work in progress
to fix this problem with the specs but it seems to have stalled.
http://tools.ietf.org/html/draft-ietf-6man-rfc3484-revise-05

Tony.
-- 
f.anthony.n.finch  d...@dotat.at  http://dotat.at/



Re: Return two locations or low TTL [was: DNS caches that support partitioning ?]

2012-08-20 Thread Shumon Huque

On 8/20/12 10:11 AM, Tony Finch wrote:

 Patrick W. Gilmore patr...@ianai.net wrote:
  On Aug 20, 2012, at 08:47 , Chris Adams cmad...@hiwaay.net wrote:
 
   Most anything that supports IPv6 should handle this correctly, since
   getaddrinfo() will return a list of addresses to try.
 
  Ah, the amazing new call which destroys any possibility of randomness or
  round robin or other ways of load balancing between A / AAAA records.
  Yes, all of us returning more than one A / AAAA record are hoping that
  gets widely deployed instantly.  Or not.
 
 The problem is RFC 3484 address selection; getaddrinfo is just the usual
 place this is implemented. I had believed that there was work in progress
 to fix this problem with the specs but it seems to have stalled.
 http://tools.ietf.org/html/draft-ietf-6man-rfc3484-revise-05
 
 Tony.



It's in the RFC editor queue actually:

http://datatracker.ietf.org/doc/draft-ietf-6man-rfc3484bis/?include_text=1
http://datatracker.ietf.org/doc/draft-ietf-6man-rfc3484bis/history/

--Shumon.




Re: Return two locations or low TTL [was: DNS caches that support partitioning ?]

2012-08-20 Thread Nick Hilliard
On 20/08/2012 14:18, Patrick W. Gilmore wrote:
 On Aug 20, 2012, at 08:47 , Chris Adams cmad...@hiwaay.net wrote:
 Most anything that supports IPv6 should handle this correctly, since
 getaddrinfo() will return a list of addresses to try.
 
 Ah, the amazing new call which destroys any possibility of randomness or
 round robin or other ways of load balancing between A / AAAA records.

well, new as in about 16 years old.

 Yes, all of us returning more than one A / AAAA record are hoping that
 gets widely deployed instantly.  Or not.

inet_addr() is zomfg brain damaged, ipv4-only, non-thread-safe and needs to
die horribly in a fire.  getaddrinfo() puts at least the possibility of
address selection into the hands of the developer, which is going to be
completely necessary for happy eyeballs / rfc3484bis / etc.

Nick




Re: DNS caches that support partitioning ?

2012-08-20 Thread Aaron Hopkins

On Sat, 18 Aug 2012, Patrick W. Gilmore wrote:


IMHO, if Google loses a datacenter and all users are stuck waiting for a
long TTL to run out, that is Very Bad.  In fact, I would call even 2.5
minutes (average of a 5 min TTL) Very Bad.  I'm impressed they are
comfortable with a 300 second TTL.


Google is very aggressive about reducing user-visible latency, of which they
cite DNS as a significant contributing factor.  They may be choosing to
strike a different balance of faster when everything is working correctly vs
faster to recover when something breaks.

-- Aaron



Re: Return two locations or low TTL [was: DNS caches that support partitioning ?]

2012-08-20 Thread Tony Finch
Shumon Huque shu...@upenn.edu wrote:
 On 8/20/12 10:11 AM, Tony Finch wrote:
 
  The problem is RFC 3484 address selection; getaddrinfo is just the usual
  place this is implemented. I had believed that there was work in progress
  to fix this problem with the specs but it seems to have stalled.
  http://tools.ietf.org/html/draft-ietf-6man-rfc3484-revise-05

 It's in the RFC editor queue actually:

 http://datatracker.ietf.org/doc/draft-ietf-6man-rfc3484bis/?include_text=1
 http://datatracker.ietf.org/doc/draft-ietf-6man-rfc3484bis/history/

Excellent :-)

Tony.
-- 
f.anthony.n.finch  d...@dotat.at  http://dotat.at/



Re: Return two locations or low TTL [was: DNS caches that support partitioning ?]

2012-08-20 Thread Tim Chown

On 20 Aug 2012, at 16:39, Tony Finch d...@dotat.at wrote:

 Shumon Huque shu...@upenn.edu wrote:
 On 8/20/12 10:11 AM, Tony Finch wrote:
 
 The problem is RFC 3484 address selection; getaddrinfo is just the usual
 place this is implemented. I had believed that there was work in progress
 to fix this problem with the specs but it seems to have stalled.
 http://tools.ietf.org/html/draft-ietf-6man-rfc3484-revise-05
 
 It's in the RFC editor queue actually:
 
 http://datatracker.ietf.org/doc/draft-ietf-6man-rfc3484bis/?include_text=1
 http://datatracker.ietf.org/doc/draft-ietf-6man-rfc3484bis/history/
 
 Excellent :-)
 
 Tony.

I'll ask the IETF admins to link -revise to the bis draft so this continuation 
is clearer in the tools pages.

Tim




Re: Return two locations or low TTL [was: DNS caches that support partitioning ?]

2012-08-20 Thread Patrick W. Gilmore
On Aug 20, 2012, at 10:07 , Dobbins, Roland rdobb...@arbor.net wrote:
 On Aug 20, 2012, at 5:56 PM, Patrick W. Gilmore wrote:
 
 My question above is asking Mark how you guarantee the user/application 
 selects the A record closest to them and only use the other A record when 
 the closer one is unavailable.
 
 I understand - my point was that folks using a GSLB-type solution would 
 generally include availability probing in the GSLB stack, so that a given 
 instance won't be included in answers if it's locally unavailable

How does that allow for a long TTL?  If you set a 3600 second TTL when the DC 
is up, and the DC goes down 2 seconds later, what do you do?


 (obviously, the GSLB can't know about all path elements between the querying 
 resolver and the desired server/service).

Says who? :)

-- 
TTFN,
patrick




Re: Return two locations or low TTL [was: DNS caches that support partitioning ?]

2012-08-20 Thread Mark Andrews

In message 0d919d57-bda0-4fda-873d-3dc0cd574...@ianai.net, Patrick W. 
Gilmore writes:
 On Aug 20, 2012, at 06:49 , Dobbins, Roland rdobb...@arbor.net wrote:
  On Aug 20, 2012, at 5:24 PM, Patrick W. Gilmore wrote:
 
  But I do not think returning multiple A records for multiple
  datacenters is as useful as lowering the TTL.
 
  Some folks do this via various GSLB mechanisms which selectively
  respond with different records based on the assumed relative topological
  distance between the querying resolver and various server/service
  instantiations in different locations.
 
 Some folks == more than half of all traffic on broadband modems
 these days.
 
 However, I think you missed a post or two in this thread.
 
 The original point was you need a low TTL to respond with a single A
 record or multiple A records which all point to the same datacenter in
 case that node / DC goes down.  Mark replied saying you can respond with
 multiple A records pointing at multiple DCs, thereby allowing a much
 longer TTL.
 
 My question above is asking Mark how you guarantee the user/application
 selects the A record closest to them and only use the other A record
 when the closer one is unavailable.

You can't, but a GSLB also can't know if the path from the client
to the DC selected by the GSLB will work.  There is a high probability
that it will, but no certainty.  By returning addresses for multiple
DCs you increase the probability that a client will be able to
connect in the presence of network errors not visible to the GSLB
control algorithms.

If you want to add weights etc. then you need to use something like
SRV to pass this information to the client.  The GSLB can then
adjust these.
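
For reference, a hypothetical zone fragment showing the priority and weight fields SRV already provides (all names, ports, and values invented for illustration):

```
; Lower priority is tried first; weight load-balances within a priority.
_imaps._tcp.example.com. 3600 IN SRV 10 60 993 dc1.example.com.
_imaps._tcp.example.com. 3600 IN SRV 10 40 993 dc2.example.com.
_imaps._tcp.example.com. 3600 IN SRV 20  0 993 backup.example.com.
```

A GSLB that rewrites these priorities and weights could steer clients without shortening the TTL, but only for protocols whose clients actually look up SRV records.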

Mark
-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org



Re: Return two locations or low TTL [was: DNS caches that support partitioning ?]

2012-08-20 Thread Mark Andrews

In message 20120820124734.ga14...@hiwaay.net, Chris Adams writes:
 Once upon a time, Patrick W. Gilmore patr...@ianai.net said:
  * How many applications are even aware multiple addresses were returned?
 
 Most anything that supports IPv6 should handle this correctly, since
 getaddrinfo() will return a list of addresses to try.
 
  * How do you guarantee sub-second failover when most apps will wait longer 
than one second to see if an address responds?
 
 That's a bigger issue.  Also, for web services, the application might
 wait, but the end-user usually won't (if the site doesn't come up in a
 second, they move on to something else).

You file RFE / bug reports against the clients for having crappy
fail over behaviour.  It isn't hard to write TCP-based code that
fails over to the next available server.  You don't have to wait
for connect to fail before you attempt to connect to the next
address.  You just use a smarter connect loop.

UDP code is a little harder, as the work needs to be spread more
throughout the code rather than just replacing the dumb connect
loop with a smart one.
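
A minimal sketch of such a loop in Python (the stagger interval is illustrative, and this only gestures at the Happy Eyeballs idea rather than implementing the full algorithm):

```python
import socket
import threading

def staggered_connect(candidates, stagger=0.3, timeout=5.0):
    """Start a connect attempt to each (host, port) in turn, launching the
    next one after a short stagger instead of waiting for the previous
    attempt to fail.  Returns the first socket that connects."""
    lock = threading.Lock()
    done = threading.Event()
    winner = []

    def attempt(addr):
        try:
            s = socket.create_connection(addr, timeout=timeout)
        except OSError:
            return                 # this address lost; others may still win
        with lock:
            if winner:
                s.close()          # another attempt already won the race
                return
            winner.append(s)
        done.set()

    for addr in candidates:
        threading.Thread(target=attempt, args=(addr,), daemon=True).start()
        if done.wait(stagger):     # stop launching attempts once one wins
            break
    done.wait(timeout)             # give in-flight attempts time to finish
    if not winner:
        raise OSError("no address reachable: %r" % (candidates,))
    return winner[0]
```

The point is exactly Mark's: the second address is tried after a few hundred milliseconds, not after a full connect timeout, so failover is sub-second from the client's point of view.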

Mark
-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org



Re: DNS caches that support partitioning ?

2012-08-19 Thread Chris Woodfield
What Patrick said. For large sites that offer services in multiple data centers 
on multiple IPs that can individually fail at any time, 300 seconds is actually 
a bit on the long end.

-C

On Aug 18, 2012, at 3:43 PM, Patrick W. Gilmore patr...@ianai.net wrote:

 On Aug 18, 2012, at 8:44, Jimmy Hess mysi...@gmail.com wrote:
 
 And I say that, because some very popular RRs have insanely low TTLs.
 
 Case in point:
 www.l.google.com.    300    IN    A    74.125.227.148
 www.l.google.com.    300    IN    A    74.125.227.144
 www.l.google.com.    300    IN    A    74.125.227.146
 www.l.google.com.    300    IN    A    74.125.227.145
 www.l.google.com.    300    IN    A    74.125.227.147
 www.l.google.com.    300    IN    A    74.125.227.148
 
 Different people have different points of view.
 
 IMHO, if Google loses a datacenter and all users are stuck waiting for a 
 long TTL to run out, that is Very Bad.  In fact, I would call even 2.5 
 minutes (average of a 5 min TTL) Very Bad.  I'm impressed they are comfortable 
 with a 300 second TTL.
 
 You obviously feel differently.  Feel free to set your TTL higher.
 
 -- 
 TTFN,
 patrick
 
 




Re: DNS caches that support partitioning ?

2012-08-19 Thread Mark Andrews

In message ddf607b5-415b-41e8-9222-eb549d3db...@semihuman.com, Chris 
Woodfield writes:
 What Patrick said. For large sites that offer services in multiple data
 centers on multiple IPs that can individually fail at any time, 300
 seconds is actually a bit on the long end.
 
 -C

Which is why the DNS supports multiple address records.  Clients
don't have to wait a minute to fail over to a second address.  One
doesn't have to point all the addresses returned to the closest
data center.  One can get sub-second fail over in clients as HE
code shows.

As for the original problem.  LRU replacement will keep hot items in
the cache unless it is seriously undersized.

Mark

-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org



Re: DNS caches that support partitioning ?

2012-08-19 Thread William Herrin
On Sun, Aug 19, 2012 at 5:37 PM, Mark Andrews ma...@isc.org wrote:
 As for the original problem.  LRU replacement will keep hot items in
 the cache unless it is seriously undersized.

Maybe. This discussion is reminiscent of the Linux swappiness debate.

Early in the 2.x series Linux kernels, the guy responsible for the
virtual memory manager changed it to allow the disk cache to push
program code and data out of ram if all other disk cache was more
recently touched than the program data. Previously, the disk cache
would only consume otherwise free memory. Programs would only get
pushed to swap by memory demands from other programs.

The users went ape. Suddenly if you copied a bunch of data from one
disk to another, your machine would be sluggish and choppy for minutes
or hours afterward as programs recovered swapped pages from disk and
ran just long enough to hit the next section needing to be recovered
from swap. Some folks ditched swap entirely to get around the problem.

The guy insisted the users were wrong. He had the benchmarks,
meticulously collected data and careful analysis to prove that the
machines were more efficient with pure LRU swap. The math said he was
right. 2+2=4. But it didn't.

In the very common case of copy-a-bunch-of-files, simple LRU
expiration of memory pages was the wrong answer. It caused the machine
to behave badly. More work was required until a tuned and weighted LRU
algorithm solved the problem.


Whether John's solution of limiting the cache by zone subtree is
useful or not, he started an interesting discussion. Consider, for
example, what happens when you ask for www.google.com. You get a 7-day
CNAME record for a 5 minute www.l.google.com A record and the resolver
gets 2-day NS records for ns1.google.com, 4 day A records for
ns1.google.com, 2 day NS records for a.gtld-servers.net, etc.

Those authority records don't get touched again until www.l.google.com
expires. With a hypothetically simple least recently used (LRU)
algorithm, the 4 minute old A record for ns1.google.com was last
touched longer ago than the 3 minute old A record for
5.6.7.8.rbl.antispam.com. So when the resolver needs more cache for
4.3.2.1.rbl.antispam.com, which record gets kicked?

Then, of course, when www.l.google.com expires after five minutes the
entire chain has to be refetched because ns1.google.com was already
LRU'd out of the cache. This is distinctly slower than just refetching
www.l.google.com from the already known address of ns1.google.com and
the user sees a slight pause at their web browser while it happens.

Would a smarter, weighted LRU algorithm work better here? Something
where rarely used leaf data doesn't tend to evict the also rarely used
but much more important data from the lookup chain?
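
One way such a weighted LRU might look, as a toy sketch (the weight values and the idea of classifying infrastructure records are invented for illustration, not taken from any real resolver):

```python
import time

class WeightedLRU:
    """Toy cache: eviction ranks entries by idle time divided by weight,
    so heavily weighted 'infrastructure' records (NS, nameserver glue)
    age much more slowly than churny leaf answers."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}   # key -> (value, weight, last_used)

    def get(self, key):
        if key in self.entries:
            value, weight, _ = self.entries[key]
            self.entries[key] = (value, weight, time.monotonic())
            return value
        return None

    def put(self, key, value, weight=1.0):
        if key not in self.entries and len(self.entries) >= self.capacity:
            now = time.monotonic()
            # Evict the entry with the largest weighted idle time:
            # idle_seconds / weight, so weight=10 ages ten times slower.
            victim = max(self.entries,
                         key=lambda k: (now - self.entries[k][2])
                                       / self.entries[k][1])
            del self.entries[victim]
        self.entries[key] = (value, weight, time.monotonic())
```

With weights like this, the 4-minute-old ns1.google.com A record in Bill's example outlives the 3-minute-old DNSBL answer even though it was touched less recently.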

Regards,
Bill Herrin


-- 
William D. Herrin  her...@dirtside.com  b...@herrin.us
3005 Crane Dr. .. Web: http://bill.herrin.us/
Falls Church, VA 22042-3004



Re: DNS caches that support partitioning ?

2012-08-19 Thread Jimmy Hess
On 8/19/12, Mark Andrews ma...@isc.org wrote:
 As for the original problem.  LRU replacement will keep hot items in
 the cache unless it is seriously undersized.
[snip]
Well, that's the problem.  Items that are not relatively hot will
be purged, even though they may be very popular RRs.  Cache
efficiency is not defined as keeping the hot items.

Efficient caching is defined as maximizing the hit percentage.
The DNS cache may have a load of DNSBL queries, so all the entries
will be cold.  The problem in that case is not low utilization; it's
high utilization by queries that are useless to cache, because
those questions will only be asked once, and in LRU there's no
buffer maintaining an eviction history.

An example alternative strategy: you have a cache of XX RR
buckets, and you keep a list of the last YY cache replacements (not
necessarily the entire RRs, just the label and a 1-byte count of
evictions).  The cache policy is then: replace the entry whose TTL
has expired, or else the entry with the lowest eviction count that is
least recently used or has the fewest queries.
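
A toy sketch of that kind of policy, with a ghost list of recent evictions (the sizes and tie-breaking rules are arbitrary choices for illustration):

```python
import time
from collections import OrderedDict

class GhostHistoryCache:
    """Fixed-size RR cache plus a small 'ghost' list remembering which
    labels were evicted and how often.  Labels that keep being evicted
    and re-inserted earn protection; one-shot lookups (e.g. DNSBL
    answers) have no eviction history and are replaced first."""

    def __init__(self, capacity, history_size=64):
        self.capacity = capacity
        self.history_size = history_size
        self.cache = OrderedDict()    # label -> (value, expiry); recency order
        self.ghosts = OrderedDict()   # label -> times evicted

    def _evict_one(self, now):
        expired = [k for k, (_, exp) in self.cache.items() if exp <= now]
        if expired:
            victim = expired[0]       # dead entries go first, per the policy
        else:
            # Fewest past evictions loses; OrderedDict iteration order
            # breaks ties in least-recently-used order.
            victim = min(self.cache, key=lambda k: self.ghosts.get(k, 0))
        self.ghosts[victim] = self.ghosts.get(victim, 0) + 1
        self.ghosts.move_to_end(victim)
        while len(self.ghosts) > self.history_size:
            self.ghosts.popitem(last=False)
        del self.cache[victim]

    def put(self, label, value, ttl):
        now = time.monotonic()
        if label not in self.cache and len(self.cache) >= self.capacity:
            self._evict_one(now)
        self.cache[label] = (value, now + ttl)
        self.cache.move_to_end(label)

    def get(self, label):
        item = self.cache.get(label)
        if item is None or item[1] <= time.monotonic():
            return None
        self.cache.move_to_end(label)
        return item[0]
```

The ghost list is the "buffer maintaining an eviction history" that plain LRU lacks: a record that keeps bouncing in and out of the cache accumulates evidence that it deserves to stay.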

--
-JH



Re: DNS caches that support partitioning ?

2012-08-19 Thread Gary Buhrmaster
Re: LRU badness

One approach is called adaptive replacement cache (ARC) which is
used by Oracle/Sun in ZFS, and was used in PostgreSQL for a time
(and slightly modified, as I recall, to be more like 2Q due to
concerns over the IBM patent on the algorithm).  Unfortunately,
we do not have any implementations of the OPT (aka clairvoyant)
algorithm, so something like 2Q might be an interesting approach
to experiment with.

Gary



Re: DNS caches that support partitioning ?

2012-08-18 Thread Raymond Dijkxhoorn

Hi!


 The cache needs to be big enough that it has a thrashy bit that is
 getting changed all the time.  Those are the records that go into the
 cache and then die without being queried again.  If the problem is
 that there's some other record in there that might be queried again,
 but that doesn't get queried often enough to keep it alive, then the
 additional cost of the recursive lookup is just not that big a deal.

 A big part of the problem is that TTLs reflect the server's estimate
 of how long it might be until the data changes, but not the client's
 likelihood of reusing the data.  I publish a BL of hosts in Korea.
 The data changes maybe once a month so from my point of view a large
 TTL would make sense, but most of the hits are for bots spraying spam
 at random, so the client's not likely to reuse the data even if it
 sticks around for many hours.

 A plausible alternative to a partitioned cache might be TTL capping,
 some way to say to your cache that no matter what the TTL is on
 data from a range of names, don't keep it for more than a few seconds.

 Another thing that would be useful to me for figuring out the BL cache
 stuff would be traces of incoming IP addresses and timestamps for mail
 servers larger than my tiny system, so I could try out some cache models.


About any caching nameserver supports forwarding. So if you REALLY are 
afraid this will bite you, fire up another instance. Forward all the 
reverse DNS to that and you are set. That other instance will handle your 
reverse DNS with a cache that you will specify. The other ones are not 
impacted at all. When you use forwarding it doesn't cache the entry 
('forward only' option in bind for example).


So:

Caching resolver - Caching resolver dedicated for reverse DNS.

Your first resolver does not cache reverse DNS and you are safe there. On 
the second you do the reverse DNS.


I talked with Paul Vixie about doing this internal inside bind but that 
was not something they would be delighted to do (at least not now). If you 
could define how large your cache pool was for certain objects that would 
fix it also.


Reverse DNS isn't the only issue here. There are many sites that give each 
user a subdomain. And if I look at my top talkers on some busy resolvers I 
do see that that's doing about 25-30% of the lookups currently.


akamai.net, amazonaws.com and so on all make nice use of DNS for this.
Those have literally millions of entries in DNS also. And that's what 
currently is doing the load on resolvers...


Oh, and don't forget DNSSEC. Traffic went up already by a factor of 7 due 
to that. Those objects are also much larger to store.


Bye,
Raymond.







Re: DNS caches that support partitioning ?

2012-08-18 Thread Jimmy Hess
On 8/17/12, Michael Thomas m...@mtcc.com wrote:
 If the dnsbl queries are not likely to be used again, why don't they
 set their ttl way down?
Because the DNSBLs don't tune the TTLs for individual responses; they
likely still benefit from extended caching.  High TTLs for responses
make sense for controlling load on the DNSBL servers, and your
cache efficiency is not the DNSBL operators' problem.  There are
/some/ IP addresses that generate a lot of mail, and I would suggest
there are /plenty/ of low-traffic recursive DNS servers in the world
that may still make DNSBL queries.

The /real/ problem to be addressed is the poor caching discipline
used by DNS recursive servers.  A recursive server with intelligent
caching would not simply allocate a chunk of heap space, hold entries
for their TTL, and evict the oldest query when the assigned cache
space runs out.  Least-Recently-Hit cache entry, LRU, Least-Recent RR
Response, Least-Recently Queried, and Lowest-Remaining-TTL are all
naive policies.

An intelligent recursive DNS caching system would leverage both RAM
and disk when appropriate, store the cache efficiently and
persistently, and keep track of the cache evictions that occurred.
The number of times an RR was requested would factor not only into
avoiding eviction; for a popular RR, it would also make sense for the
recursive DNS server to make a pre-emptive query to refresh an RR
that is about to expire due to TTL, so that the response latency at
some random future time that RR is queried doesn't go up just because
the cache entry expired.
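
A toy sketch of that pre-emptive refresh idea (the thresholds are invented, and the refresh is done synchronously here for brevity where a real server would do it in the background):

```python
import time

class PrefetchingCache:
    """On a cache hit, if a record is both popular and close to expiring,
    re-query it so the next client never pays the miss latency.
    `resolve` is a caller-supplied upstream lookup function."""

    PREFETCH_WINDOW = 10.0   # refresh when this many seconds of TTL remain
    MIN_HITS = 3             # ...but only for records queried this often

    def __init__(self, resolve):
        self.resolve = resolve
        self.entries = {}    # name -> [value, expiry, hit_count]

    def lookup(self, name, ttl=300):
        now = time.monotonic()
        entry = self.entries.get(name)
        if entry is not None and entry[1] > now:
            entry[2] += 1
            if entry[2] >= self.MIN_HITS and entry[1] - now < self.PREFETCH_WINDOW:
                # Hot record about to expire: refresh it now.  Inline here
                # for brevity; a real server would refresh asynchronously.
                entry[0], entry[1] = self.resolve(name), now + ttl
            return entry[0]
        value = self.resolve(name)               # ordinary cache miss
        self.entries[name] = [value, now + ttl, 1]
        return value
```

BIND 9's later `prefetch` option implements essentially this trade: a little extra upstream traffic for popular names in exchange for never exposing their expiry latency to clients.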

And I say that, because some very popular RRs have insanely low TTLs.

Case in point:
www.l.google.com.   300 IN  A   74.125.227.148
www.l.google.com.   300 IN  A   74.125.227.144
www.l.google.com.   300 IN  A   74.125.227.146
www.l.google.com.   300 IN  A   74.125.227.145
www.l.google.com.   300 IN  A   74.125.227.147
www.l.google.com.   300 IN  A   74.125.227.148




 In any case, DNSBL's use of DNS has always been a hack. If v6
 causes the hack to blow up, they should create their own protocol
 rather than ask how we can make the global DNS accommodate
 their misuse of DNS.

 Mike
[snip]

--
-JH



Re: DNS caches that support partitioning ?

2012-08-18 Thread Patrick W. Gilmore
On Aug 18, 2012, at 5:35, Raymond Dijkxhoorn raym...@prolocation.net wrote:

 Reverse DNS isn't the only issue here. There are many sites that give each 
 user a subdomain. And if I look at my top talkers on some busy resolvers I do 
 see that that's doing about 25-30% of the lookups currently.
 
 akamai.net, amazonaws.com and so on all make nice use of DNS for this.
 Those have literally millions of entries in DNS also. And that's what currently 
 is doing the load on resolvers...

Akamai has no users.  So not really sure what you mean by that.

There are a /lot/ of hostnames on *.akamai.net.  That may have something to do 
with the 1000s of companies that use Akamai to deliver approximately 20% of all 
the traffic going down broadband modems.  Which fits nicely in your DNS lookup 
percentage.

-- 
TTFN,
patrick




DNS caches that support partitioning ?

2012-08-17 Thread John Levine
Are there DNS caches that allow you to partition the cache for
subtrees of DNS names?  That is, you can say that all entries from
say, in-addr.arpa, are limited to 20% of the cache.

The application I have in mind is to see if it helps to keep DNSBL
traffic, which caches poorly, from pushing other stuff out of the
cache, but there are doubtless others.
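For illustration, here is one hypothetical way the partitioning described
above could work: a cache split into two LRU segments, where names under a
configured suffix can only evict entries in their own segment. The suffix and
the 20% share come from the question; the class and everything else in it are
invented for the sketch.

```python
from collections import OrderedDict

class PartitionedCache:
    def __init__(self, capacity, partition_suffix, partition_share=0.2):
        self.suffix = partition_suffix
        part_cap = max(1, int(capacity * partition_share))
        # Two independent LRU segments with fixed capacities.
        self.caps = {"main": capacity - part_cap, "part": part_cap}
        self.segs = {"main": OrderedDict(), "part": OrderedDict()}

    def _seg(self, name):
        return "part" if name.endswith(self.suffix) else "main"

    def put(self, name, rrset):
        seg = self._seg(name)
        d = self.segs[seg]
        d[name] = rrset
        d.move_to_end(name)
        while len(d) > self.caps[seg]:
            d.popitem(last=False)   # evict LRU within this segment only

    def get(self, name):
        d = self.segs[self._seg(name)]
        if name in d:
            d.move_to_end(name)   # refresh recency on a hit
            return d[name]
        return None
```

The point of the split is that churn under in-addr.arpa can only ever push
out other in-addr.arpa entries, never the main segment.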

R's,
John





Re: DNS caches that support partitioning ?

2012-08-17 Thread Andrew Sullivan
On Fri, Aug 17, 2012 at 04:13:09PM -, John Levine wrote:
 The application I have in mind is to see if it helps to keep DNSBL
 traffic, which caches poorly, from pushing other stuff out of the
 cache, but there are doubtless others.

If it's getting evicted from cache because other things are getting
used more often, why do you want to put your thumb on that scale?  The
other queries are presumably benefitting just as much from the caching.

Best,

A

-- 
Andrew Sullivan
Dyn Labs
asulli...@dyn.com



Re: DNS caches that support partitioning ?

2012-08-17 Thread valdis . kletnieks
On Fri, 17 Aug 2012 15:32:11 -0400, Andrew Sullivan said:
 On Fri, Aug 17, 2012 at 04:13:09PM -, John Levine wrote:
  The application I have in mind is to see if it helps to keep DNSBL
  traffic, which caches poorly, from pushing other stuff out of the
  cache, but there are doubtless others.

 If it's getting evicted from cache because other things are getting
 used more often, why do you want to put your thumb on that scale? The
 other queries are presumably benefitting just as much from the caching.

I think John's issue is that he's seeing those other queries *not* benefiting
from the caching because they get pushed out by DNSBL queries that will likely
not ever be used again.  You don't want your cached entry for www.google.com
to get pushed out by a lookup for a dialup line somewhere in Africa.


pgpfqHP9iuevn.pgp
Description: PGP signature


Re: DNS caches that support partitioning ?

2012-08-17 Thread Michael Thomas

On 08/17/2012 01:32 PM, valdis.kletni...@vt.edu wrote:

 On Fri, 17 Aug 2012 15:32:11 -0400, Andrew Sullivan said:
 
  On Fri, Aug 17, 2012 at 04:13:09PM -, John Levine wrote:
 
   The application I have in mind is to see if it helps to keep DNSBL
   traffic, which caches poorly, from pushing other stuff out of the
   cache, but there are doubtless others.
 
  If it's getting evicted from cache because other things are getting
  used more often, why do you want to put your thumb on that scale? The
  other queries are presumably benefitting just as much from the caching.
 
 I think John's issue is that he's seeing those other queries *not* benefiting
 from the caching because they get pushed out by DNSBL queries that will likely
 not ever be used again.  You don't want your cached entry for www.google.com
 to get pushed out by a lookup for a dialup line somewhere in Africa.

If the DNSBL queries are not likely to be used again, why don't they
set their TTLs way down?

In any case, DNSBL's use of DNS has always been a hack. If v6
causes the hack to blow up, they should create their own protocol
rather than ask how we can make the global DNS accommodate
their misuse of DNS.

Mike



Re: DNS caches that support partitioning ?

2012-08-17 Thread Andrew Sullivan
On Fri, Aug 17, 2012 at 04:32:45PM -0400, valdis.kletni...@vt.edu wrote:
 
 I think John's issue is that he's seeing those other queries *not* benefiting
 from the caching because they get pushed out by DNSBL queries that will likely
 not ever be used again.  You don't want your cached entry for www.google.com
 to get pushed out by a lookup for a dialup line somewhere in Africa.

Oh, yes, I see.  You're right, I misread it.  But the proposed
solution still seems wrong to me.  If the entry for www.google.com
gets invalidated by a new cache candidate that is never going to get
used again, the cache is simply too small (or else it doesn't have
enough traffic, and you shouldn't have a cache there at all).

The cache needs to be big enough that it has a thrashy bit that is
getting changed all the time.  Those are the records that go into the
cache and then die without being queried again.  If the problem is
that there's some other record in there that might be queried again,
but that doesn't get queried often enough to keep it alive, then the
additional cost of the recursive lookup is just not that big a deal.  

Best,

A
-- 
Andrew Sullivan
Dyn Labs
asulli...@dyn.com



Re: DNS caches that support partitioning ?

2012-08-17 Thread John Levine
 The cache needs to be big enough that it has a thrashy bit that is
 getting changed all the time.  Those are the records that go into the
 cache and then die without being queried again.  If the problem is
 that there's some other record in there that might be queried again,
 but that doesn't get queried often enough to keep it alive, then the
 additional cost of the recursive lookup is just not that big a deal.

A big part of the problem is that TTLs reflect the server's estimate
of how long it might be until the data changes, but not the client's
likelihood of reusing the data.  I publish a BL of hosts in Korea.
The data changes maybe once a month so from my point of view a large
TTL would make sense, but most of the hits are for bots spraying spam
at random, so the client's not likely to reuse the data even if it
sticks around for many hours.

A plausible alternative to a partitioned cache might be TTL capping,
some way to say to your cache that no matter what the TTL is on
data from a range of names, don't keep it for more than a few seconds.
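The TTL-capping idea is even simpler to sketch. A hypothetical resolver knob
might look like the following; the suffix and cap values are purely
illustrative:

```python
# Regardless of the TTL the authority publishes, entries under the
# configured suffixes are kept no longer than a local cap (in seconds).
TTL_CAPS = {".dnsbl.example.org": 5}

def effective_ttl(name, published_ttl):
    """Return the TTL the cache should actually honor for this name."""
    for suffix, cap in TTL_CAPS.items():
        if name.endswith(suffix):
            return min(published_ttl, cap)
    return published_ttl
```

Unbound's cache-max-ttl already caps TTLs globally; the per-suffix variant
described here would be the new part.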

Another thing that would be useful to me for figuring out the BL cache
stuff would be traces of incoming IP addresses and timestamps for mail
servers larger than my tiny system, so I could try out some cache models.


R's,
John