Re: DNS Reliability

2013-09-23 Thread bmanning
On Mon, Sep 16, 2013 at 06:36:22PM +0200, Niels Bakker wrote:
 * bmann...@vacation.karoshi.com (bmann...@vacation.karoshi.com) [Fri 13 Sep 
 2013, 22:16 CEST]:
  from where?  to where?  what % of the Internet is _not_
  reachable from my DNS service at any given time?  why is
  that acceptable? and more importantly, whose job is it to
  fix/stabilize the net so these remote locations can reach
  my DNS service?
 
  we will answer 100% of the valid DNS queries we receive.
 
 Is this thread even about authoritative or recursive DNS?
 
 
   -- Niels.


Does it matter?

/bill



Re: DNS Reliability

2013-09-16 Thread Niels Bakker

* bmann...@vacation.karoshi.com (bmann...@vacation.karoshi.com) [Fri 13 Sep 
2013, 22:16 CEST]:

from where?  to where?  what % of the Internet is _not_
reachable from my DNS service at any given time?  why is
that acceptable? and more importantly, whose job is it to
fix/stabilize the net so these remote locations can reach
my DNS service?

we will answer 100% of the valid DNS queries we receive.


Is this thread even about authoritative or recursive DNS?


-- Niels.

--



Re: DNS Reliability

2013-09-16 Thread Nick Hilliard
On 16/09/2013 17:36, Niels Bakker wrote:
 Is this thread even about authoritative or recursive DNS?

as far as I can tell, it's about *waves hands wildly*

Or something.

Nick





Re: DNS Reliability

2013-09-16 Thread Sebastian Castro

On 13/09/13 12:45, valdis.kletni...@vt.edu wrote:
 On Thu, 12 Sep 2013 14:03:44 -0600, Phil Fagan said:
 Everything else remaining equal...is there a standard or
 expectation for DNS reliability?
 
 98% 99% 99.5% 99.9% 99.99% 99.999%
 
 Measured in queries completed vs. queries lost.
 
 Whats the consensus?
 
 Remember to factor in Duane Wessels' work that showed that
 something like 98% of the DNS traffic at the root servers was
 totally bogus?
 
 Maybe you need to factor in broken queries not answered, and
 offenders slapped around with a large trout?  Because if it's
 busted requests you're sending towards the root, they're going to
 count against your completed/lost ratio in a really bad way.
 
 Anybody know if people have cleaned up their collective acts since
 Duane did that paper?
 

Wearing a different hat, I had the chance to rerun that analysis with
data from 2008 (the original paper is from 2003), and the numbers were
still around 98%.

http://www.caida.org/publications/presentations/2008/wide_castro_root_servers/wide_castro_root_servers.pdf

Cheers,
-- 
Sebastian Castro
DNS Specialist
.nz Registry Services (New Zealand Domain Name Registry Limited)
desk: +64 4 495 2337
mobile: +64 21 400535



Re: DNS Reliability

2013-09-13 Thread Marco Davids (Prive)
On 09/13/13 03:53, Larry Sheldon wrote:
 On 9/12/2013 3:25 PM, Phil Fagan wrote:
 Its a good point about the anycast; 99.999% should be expected.
 A small choice of attitude-reflecting language.

 I expect 100.000%

 I'll accept 99.999% or better.


It depends... define 'lost queries'. For example, is RRL (response rate
limiting) included here or not? Sometimes you want to deliberately
'lose' queries.
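
To make that definition concrete, here is a minimal sketch of a loss ratio that can either include or exclude deliberate RRL drops. The counter names (`received`, `answered`, `rate_limited`) are hypothetical and not taken from any particular name server's statistics output.

```python
def loss_ratio(received, answered, rate_limited=0, count_rrl_as_lost=False):
    """Fraction of received queries that count as lost.

    All three counters are hypothetical inputs. rate_limited is the number
    of queries dropped on purpose by response rate limiting (RRL).
    """
    if received == 0:
        return 0.0
    unanswered = received - answered
    if not count_rrl_as_lost:
        # Deliberate drops are accounted for, so they are not treated as lost.
        unanswered -= rate_limited
    return max(unanswered, 0) / received


# Same traffic, two definitions of "lost".
print(loss_ratio(1000, 990, rate_limited=8))
print(loss_ratio(1000, 990, rate_limited=8, count_rrl_as_lost=True))
```

The same traffic gives a very different figure depending on which definition is chosen, so the definition probably has to be settled before any percentage is compared.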

--
Marco



Re: DNS Reliability

2013-09-13 Thread Larry Sheldon

On 9/13/2013 2:14 AM, Marco Davids (Prive) wrote:

On 09/13/13 03:53, Larry Sheldon wrote:

On 9/12/2013 3:25 PM, Phil Fagan wrote:

Its a good point about the anycast; 99.999% should be expected.

A small choice of attitude-reflecting language.

I expect 100.000%

I'll accept 99.999% or better.



It depends... define 'lost queries'. For example, is RRL (response rate
limiting) included here or not? Sometimes you want to deliberately
'lose' queries.




I do not ever set any amount of failure as an objective.  I usually have 
a specified tolerance for failure.  If for some odd circumstance I want 
to discard queries, that would involve knowing exactly what happened to 
them--not losing them.




--
Requiescas in pace o email   Two identifying characteristics
of System Administrators:
Ex turpi causa non oritur actio  Infallibility, and the ability to
learn from their mistakes.
  (Adapted from Stephen Pinker)



Re: DNS Reliability

2013-09-13 Thread Phil Fagan
Tolerance for failure; I like it.

Eric - I'm interested in an accepted norm for loss of queries made to the
cache tier. Yes, when I provide a 'service' to a client (I don't really care
about the SLA) I'm interested in what the accepted norm or guidance is on %
loss of queries -- because this drives my architecture, right?

Marco - I think 'lost queries' in this instance is simply, wait for
it... the full UDP session. Yes yes, 'session' is bad to say, but service
requests completed through middleboxes are tracked as sessions.

So that's what I'm looking for: what is the general consensus for reliability,
all other things equal? Sure, you have the factors of UDP, retry, path, etc.,
but I think Larry hit the nail on the head - what's my clients' [aggregate]
tolerance before Evil ensues?








-- 
Phil Fagan
Denver, CO
970-480-7618


Re: DNS Reliability

2013-09-13 Thread Jean-Francois Mezei
On 13-09-12 21:53, Larry Sheldon wrote:

 I expect 100.000%
 
 I'll accept 99.999% or better.

At these numbers, one has to start to count failover time. A system
can be disaster tolerant but take 2 hours to recover fully, or it could
also recover within a couple of seconds. It depends on architecture and
available services. And in networking, you also need to consider
internal and external routing update propagation times.
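
As a rough illustration of why failover time matters at these numbers, the sketch below converts an availability percentage into a yearly downtime budget. It assumes a 365-day year and treats availability purely as uptime over time, which is only one of several possible definitions.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year, for simplicity


def downtime_budget_seconds(availability_pct, period_seconds=SECONDS_PER_YEAR):
    """Downtime allowed in the period at the given availability percentage."""
    return period_seconds * (1 - availability_pct / 100.0)


for pct in (99.9, 99.99, 99.999):
    budget = downtime_budget_seconds(pct)
    print(f"{pct}% -> {budget / 60:.1f} minutes of downtime per year")
```

Under these assumptions 99.999% leaves a little over five minutes a year, so a single two-hour recovery would use up the budget many times over.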





Re: DNS Reliability

2013-09-13 Thread bmanning
On Fri, Sep 13, 2013 at 04:01:51PM -0400, Jean-Francois Mezei wrote:
 On 13-09-12 21:53, Larry Sheldon wrote:
 
  I expect 100.000%
  
  I'll accept 99.999% or better.
 
 At these numbers, one has to start to count failover time. A system
 can be disaster tolerant but take 2 hours to recover fully, or it could
 also recover within a couple of seconds. It depends on architecture and
 available services. And in networking, you also need to consider
 internal and external routing update propagation times.
 
 

from where?  to where?  what % of the Internet is _not_ reachable
from my DNS service at any given time?  why is that acceptable?
and more importantly, whose job is it to fix/stabilize the net so
these remote locations can reach my DNS service?

we will answer 100% of the valid DNS queries we receive. 


/bill



Re: DNS Reliability

2013-09-13 Thread Joe Abley

On 2013-09-13, at 16:01, Jean-Francois Mezei jfmezei_na...@vaxination.ca 
wrote:

 On 13-09-12 21:53, Larry Sheldon wrote:
 
 I expect 100.000%
 
 I'll accept 99.999% or better.
 
 At these numbers, one has to start to count failover time.

Before really any part of this thread makes sense, you have to describe exactly 
what you mean by available.


Joe




Re: DNS Reliability

2013-09-12 Thread Rubens Kuhl
On Thu, Sep 12, 2013 at 5:03 PM, Phil Fagan philfa...@gmail.com wrote:

 Everything else remaining equal...is there a standard or expectation for
 DNS reliability?

 98%
 99%
 99.5%
 99.9%
 99.99%
 99.999%

 Measured in queries completed vs. queries lost.

 Whats the consensus?


ICANN new gTLD agreements specified 100% availability for the service,
meaning at least 2 DNS IP addresses answered 95% of requests within 500 ms
(UDP) or 1500 ms (TCP) for 51+% of the probes, or 99% availability for a
single name server, defined as 1 DNS IP address.
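
A loose sketch of how the summary above could be evaluated from probe data follows. It is a simplification based only on this message, not on the agreement text: each probe is assumed to send one query per name server IP, the "95% of requests" dimension is skipped, and the function names, the probe data layout, and the sample addresses are all hypothetical.

```python
UDP_LIMIT_MS = 500   # round-trip limit quoted above for UDP
TCP_LIMIT_MS = 1500  # round-trip limit quoted above for TCP


def probe_sees_service(rtts_ms, limit_ms=UDP_LIMIT_MS, min_servers=2):
    """rtts_ms maps a name server IP to its RTT in ms, or None if unanswered.

    The probe counts the service as up when at least min_servers addresses
    answered within the limit.
    """
    answered_in_time = sum(
        1 for rtt in rtts_ms.values() if rtt is not None and rtt <= limit_ms
    )
    return answered_in_time >= min_servers


def service_available(probes, limit_ms=UDP_LIMIT_MS, probe_share=0.51):
    """True when at least probe_share of the probes see the service as up."""
    if not probes:
        return False
    up = sum(1 for rtts in probes if probe_sees_service(rtts, limit_ms))
    return up / len(probes) >= probe_share


# Three hypothetical probes, each querying three documentation-range addresses.
probes = [
    {"192.0.2.1": 40, "192.0.2.2": 120, "192.0.2.3": None},
    {"192.0.2.1": 30, "192.0.2.2": 700, "192.0.2.3": 90},
    {"192.0.2.1": None, "192.0.2.2": None, "192.0.2.3": 60},
]
print(service_available(probes))
```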


Rubens


Re: DNS Reliability

2013-09-12 Thread Glen Wiley
Remember though that anycast only solves for availability in one layer of
the system, and it is not difficult to create a less available anycast
presence if you do silly things with the way you manage your routes. A
system is only as available as the least available layer in that system.

For example, if you use an automated system that changes your route
advertisements and that system encounters a defect that breaks your
announcements, then although a well-built anycast footprint might achieve
99.999%, a poorly implemented management system that is less available and
creates an outage would reduce the number.
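
A small sketch of that layering effect, assuming the layers fail independently and that every layer must be up for the service to work. The 99.9% figure for the route management layer is a made-up number for illustration only.

```python
from math import prod


def serial_availability(layers):
    """Availability of a chain in which every layer must be up at once.

    Assumes the layers fail independently of each other.
    """
    return prod(layers)


# Hypothetical figures: a five-nines anycast footprint behind a
# three-nines route management system.
layers = {"anycast footprint": 0.99999, "route management": 0.999}
combined = serial_availability(layers.values())
print(f"combined availability: {combined:.6%}")
```

Under these assumptions the combined figure can never exceed the weakest layer, and is usually a little below it.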


On Thu, Sep 12, 2013 at 4:25 PM, Phil Fagan philfa...@gmail.com wrote:

 Its a good point about the anycast; 99.999% should be expected.






-- 
Glen Wiley
KK4SFV

A designer knows he has achieved perfection not when there is nothing left
to add, but when there is nothing left to take away. - Antoine de
Saint-Exupery


Re: DNS Reliability

2013-09-12 Thread Randy Bush
 Everything else remaining equal...is there a standard or expectation for
 DNS reliability?
 ...
 Measured in queries completed vs. queries lost.

this is the wrong question.  the protocol is designed assuming query
failures.
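
One way to see why per-query loss and end-user failure are different things: a resolver retries, often against several servers. The sketch below assumes every attempt is lost independently with the same probability, which is optimistic when losses are correlated (for example during an outage). The function name and the figures are illustrative only.

```python
def resolution_failure_probability(per_query_loss, attempts):
    """Chance that every one of `attempts` independent queries is lost."""
    return per_query_loss ** attempts


# With 1% per-query loss, see how quickly retries shrink the failure rate.
for attempts in (1, 2, 3):
    p = resolution_failure_probability(0.01, attempts)
    print(f"{attempts} attempt(s): {p:.6%} chance of no answer")
```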

randy



Re: DNS Reliability

2013-09-12 Thread Bryan Tong
To me anything below 99.99% is unacceptable.

100 failures out of 100,000 queries still seems like a lot, especially if
it's not network related.

So I would say 99.999% would be what I would look for.
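
For reference, a short sketch of how many failures each level in the list allows per 100,000 queries. Note that 100 failures in 100,000 corresponds to 99.9%; the function name is just for illustration.

```python
def expected_failures(availability_pct, queries=100_000):
    """Number of failed queries implied by a completed-vs-lost percentage."""
    return queries * (100 - availability_pct) / 100


for pct in (98, 99, 99.5, 99.9, 99.99, 99.999):
    print(f"{pct}% -> about {round(expected_failures(pct))} failures per 100,000")
```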

Thanks


On Thu, Sep 12, 2013 at 2:03 PM, Phil Fagan philfa...@gmail.com wrote:

 Everything else remaining equal...is there a standard or expectation for
 DNS reliability?

 98%
 99%
 99.5%
 99.9%
 99.99%
 99.999%

 Measured in queries completed vs. queries lost.

 Whats the consensus?


 --
 Phil Fagan
 Denver, CO
 970-480-7618




-- 

Bryan Tong
Nullivex LLC | eSited LLC
(507) 298-1624


Re: DNS Reliability

2013-09-12 Thread Phil Fagan
Thumbs up on this one; my entire path and chain of management of that path
need to be equally fault tolerant - Awesome.


On Thu, Sep 12, 2013 at 2:40 PM, Glen Wiley glen.wi...@gmail.com wrote:

 Remember though that anycast only solves for availability in one layer of
 the system and it is not difficult to create a less available anycast
 presence if you do silly things with the way you manage your routes. A
 system is only as available as the least available layer in that system

 For example, if you use an automated system that changes your route
 advertisements and that system encounters a defect that breaks your
 announcements then although a well built anycast footprint might achieve
 99.999, a poorly implemented management system that is less available and
 creates an outage would reduce the number.






-- 
Phil Fagan
Denver, CO
970-480-7618


Re: DNS Reliability

2013-09-12 Thread Phil Fagan
It's a good point about the anycast; 99.999% should be expected.


On Thu, Sep 12, 2013 at 2:14 PM, Beavis pfu...@gmail.com wrote:

 I go with 99.999% given that you have a good number of DNS Servers
 (anycasted).






-- 
Phil Fagan
Denver, CO
970-480-7618


Re: DNS Reliability

2013-09-12 Thread Beavis
I go with 99.999% given that you have a good number of DNS Servers
(anycasted).


On Thu, Sep 12, 2013 at 9:03 PM, Phil Fagan philfa...@gmail.com wrote:

 Everything else remaining equal...is there a standard or expectation for
 DNS reliability?

 98%
 99%
 99.5%
 99.9%
 99.99%
 99.999%

 Measured in queries completed vs. queries lost.

 Whats the consensus?


 --
 Phil Fagan
 Denver, CO
 970-480-7618




-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

Disclaimer:
http://goldmark.org/jeff/stupid-disclaimers/


Re: DNS Reliability

2013-09-12 Thread Phil Fagan
Good reference; thank you.


On Thu, Sep 12, 2013 at 2:39 PM, Rubens Kuhl rube...@gmail.com wrote:




 On Thu, Sep 12, 2013 at 5:03 PM, Phil Fagan philfa...@gmail.com wrote:

 Everything else remaining equal...is there a standard or expectation for
 DNS reliability?

 98%
 99%
 99.5%
 99.9%
 99.99%
 99.999%

 Measured in queries completed vs. queries lost.

 Whats the consensus?


 ICANN new gTLD agreements specified 100% availability for the service,
 meaning at least 2 DNS IP addresses answered 95% of requests within 500 ms
 (UDP) or 1500 ms (TCP) for 51+% of the probes, or 99% availability for a
 single name server, defined as 1 DNS IP address.


 Rubens






-- 
Phil Fagan
Denver, CO
970-480-7618


Re: DNS Reliability

2013-09-12 Thread George William Herbert


On Sep 12, 2013, at 2:35 PM, Randy Bush ra...@psg.com wrote:

 Everything else remaining equal...is there a standard or expectation for
 DNS reliability?
 ...
 Measured in queries completed vs. queries lost.
 
 this is the wrong question.  the protocol is designed assuming query
 failures.
 
 randy

I think it's part of the right answer.  Capacity and server connectivity 
issues, what this metric will mostly measure, do matter.

The other part, more likely to get you on CNN and Reddit and the front pages of 
the NY Times and WSJ, is the area represented by MTBF / MTTR / etc.: how often 
is DNS for your domain DOWN - or WRONG - and how fast did you recover?
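
The usual steady-state relationship between those two figures is availability = MTBF / (MTBF + MTTR). A minimal sketch, with made-up hours for illustration:

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability from mean time between failures and
    mean time to repair, both in the same unit."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical: one failure every 999 hours on average, one hour to recover.
print(f"{availability(999, 1):.3%}")

# Same failure rate, but recovery takes two hours instead of one.
print(f"{availability(999, 2):.3%}")
```

The point of the sketch is that recovery time moves the number as much as failure frequency does.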

The other subthread about routability plays into that.  For BIGPLACE 
environments, you should be considering how many AS numbers independently host 
DNS instances for you, in how many geographical regions, and whether you have a 
backup registrar available and spun up...


-george william herbert


Sent from Kangphone


Re: DNS Reliability

2013-09-12 Thread George Michaelson
we're already outside our operating envelope, if these community
expectation figures are believable. a wise man once said to me that when
setting formal conformance targets it's a good idea to only set ones you can
honestly achieve, otherwise you're setting yourself up to be measured to
fail. I don't think that necessarily competes with 'aim high' ('be all you
can be') but...





Re: DNS Reliability

2013-09-12 Thread Randy Bush
 we're already outside our operating envelope

not really.  just some folk seem not to understand things such as udp
datagrams and the dns protocols.

randy



Re: DNS Reliability

2013-09-12 Thread George Michaelson
you removed a clause in that sentence randy:

we're already outside our operating envelope, if these community
expectation figures are believable

there is a point to that clause. it's the same as your answer in some
respects.


On Fri, Sep 13, 2013 at 8:39 AM, Randy Bush ra...@psg.com wrote:

  we're already outside our operating envelope

 not really.  just some folk seem not to understand things such as udp
 datagrams and the dns protocols.

 randy



Re: DNS Reliability

2013-09-12 Thread George William Herbert


On Sep 12, 2013, at 3:39 PM, Randy Bush ra...@psg.com wrote:

 we're already outside our operating envelope
 
 not really.  just some folk seem not to understand things such as udp
 datagrams and the dns protocols.
 
 randy

Statistically, UDP sometimes arrives after an internet wide round trip.  Honest!

The worry is bimodal.

Most small sites, two or three servers, stop worrying.

Most medium sites, watch your server load and run external monitoring.

Most big sites are not sufficiently paranoid / redundant here.


-george william herbert


Sent from Kangphone




Re: DNS Reliability

2013-09-12 Thread Valdis . Kletnieks
On Thu, 12 Sep 2013 14:03:44 -0600, Phil Fagan said:
 Everything else remaining equal...is there a standard or expectation for
 DNS reliability?

 98%
 99%
 99.5%
 99.9%
 99.99%
 99.999%

 Measured in queries completed vs. queries lost.

 Whats the consensus?

Remember to factor in Duane Wessels' work that showed that something
like 98% of the DNS traffic at the root servers was totally bogus?

Maybe you need to factor in broken queries not answered, and offenders slapped
around with a large trout?  Because if it's busted requests you're sending
towards the root, they're going to count against your completed/lost ratio in a
really bad way.

Anybody know if people have cleaned up their collective acts since Duane
did that paper?





Re: DNS Reliability

2013-09-12 Thread Larry Sheldon

On 9/12/2013 3:25 PM, Phil Fagan wrote:

Its a good point about the anycast; 99.999% should be expected.


A small choice of attitude-reflecting language.

I expect 100.000%

I'll accept 99.999% or better.

--
Requiescas in pace o email   Two identifying characteristics
of System Administrators:
Ex turpi causa non oritur actio  Infallibility, and the ability to
learn from their mistakes.
  (Adapted from Stephen Pinker)



Re: DNS Reliability

2013-09-12 Thread Christopher Morrow
On Thu, Sep 12, 2013 at 6:26 PM, George William Herbert
george.herb...@gmail.com wrote:
 The other subthread about routeability plays into that.  For BIGPLACE 
 environments, you should be considering how many AS numbers independently 
 host DNS instances for you, in how many geographical regions, and do you have 
 a backup registrar available spun up...

here's an interesting point... if you are a BIGPLACE, do you want to
trust your fate to some third party hosting your dns for you? What
about how your internal name service stuff is managed?

say you have a practice of using rsh to effect updates across your 4
main dns nodes. adding a 5th or Nth outside, where rsh is not
possible/desired, means adding additional processes and cruft to
your update process. is this acceptable?

Take, for instance, the FBI.gov domain 3 days ago: some set of updates
happened, their ipv4 servers were answering with a consistent
response, their ipv6 nodes were answering with a variety of not
correct answers :( In the case of the FBI.gov domain, all of it is
handled outside 'fbi.gov hands' (all servers hosted externally) but...

-chris



Re: DNS Reliability

2013-09-12 Thread Eric Brunner-Williams
On 9/12/13 1:39 PM, Rubens Kuhl wrote:
 ICANN new gTLD agreements specified 100% availability for the service,
 meaning at least 2 DNS IP addresses answered 95% of requests within 500 ms
 (UDP) or 1500 ms (TCP) for 51+% of the probes, or 99% availability for a
 single name server, defined as 1 DNS IP address.

unless phil happens to be building out (or spec'ing out $provider's
offered sla) for one of the happy thousand or so celebrants of 2014, a
surprisingly large fraction of which are tenant plays on existing
infrastructure, the bogie above, uninterpreted, is not a controlling
authority.

additionally, was phil asking for a metric for an authoritative
server, serving a zone delegated directly from the iana root? was he
asking for a metric for a caching server?

and if the metric is queries completed vs. queries lost, from where
to where? (that is the uninterpreted bit from the bogie rubens
quotes, as we did have to correct some assumptions of the requirement
author -- where is the measurement being performed?)

i'm with randy on this, dns is a service, the better question is what
fails as query response degrades, in the presence of hierarchical
caching and the protocol being used as designed under best effort of
infrastructure and application.

eric