Re: [LINK] RFI: Telstra DNS outage

2016-05-14 Thread Roger Clarke
Thanks Geoff, that clarifies quite a few things.

One query remains though:  if resolvers stopped responding, resulting in the 
service dying, surely that still means that there was a single-point-of-failure 
- although in a different part of the system from where I was inferring.

__

At 9:23 +1000 15/5/16, Geoff Huston wrote:
>Hi Roger,
>
>Yes you are making unfounded accusations here based on poor evidence and 
>insufficient analysis.
>
>Firstly, you are confusing resolvers and authoritative name servers. The 
>article you quote was about Telstra's resolvers not answering DNS queries from 
>Telstra customers. I.e. Telstra's resolvers stopped responding. Your note 
>looks at the authoritative name servers for the telstra.net domain.
>
>Secondly, you should've seen that two of the four servers are operated by 
>APNIC, rather than Telstra. So there is no single point of name failure in 
>serving telstra.net
>
>Thirdly, in the DNS too much is sometimes as bad as too little. More servers 
>for a name can cause slower responses to resolution requests in some cases. 
>Telstra's design of its server infrastructure, using 2 organizations and 4 
>server addresses looks like a good decision.
>
>Fourthly, you are inferring way too much from the IPv4 address. I have not 
>bothered to check, but the fact that these are numerically adjacent addresses 
>still permits the possibility that these are the addresses of two anycast 
>clouds and there may be a number of servers that respond to the same address. 
>It may also be the case that the internal routing infrastructure treats these 
>as distinct /32s and they may well be provisioned using diverse internal paths.
>
>I would hesitate to hurl around accusations of "utter incompetence" in this 
>case. I would tend to say that the server design for serving 'telstra.net' 
>looks like decent service engineering, and the "problems" you appear to 
>identify may well reflect your understanding of DNS and network engineering.
>
>
>Regards,
>
>Geoff
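
A minimal sketch of the address-adjacency point above, using only Python's standard `ipaddress` module. The two addresses are the A records quoted in this thread; the function names are illustrative. It only shows how the two addresses relate numerically at different prefix lengths. It cannot say how Telstra actually routes or provisions them.

```python
import ipaddress

# The two A records quoted in the thread.
DNS0 = "203.50.5.199"
DNS1 = "203.50.5.200"


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits the two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def same_block(a: str, b: str, prefix_len: int) -> bool:
    """True if both addresses fall inside the same block of the given size."""
    net_a = ipaddress.ip_network(f"{a}/{prefix_len}", strict=False)
    net_b = ipaddress.ip_network(f"{b}/{prefix_len}", strict=False)
    return net_a == net_b


if __name__ == "__main__":
    print("shared leading bits:", common_prefix_len(DNS0, DNS1))
    for plen in (24, 28, 29, 32):
        print(f"/{plen}: same block = {same_block(DNS0, DNS1, plen)}")
```

The addresses share a /24 and a /28 but fall in different /29 blocks, so "numerically adjacent" does not by itself establish "same subnet"; whether they share fate depends on prefix lengths and routing that are not visible from the DNS records.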

__


>> On 13 May 2016, at 09:02, Roger Clarke  wrote:
>> 
>> itNews reports:
>>> Telstra suffered a nationwide network outage last night, as two of its 
>>> internet domain name servers ceased to respond to queries from thousands of 
>>> customer systems.
>> 
>> Am I missing something here?
>> 
>> I've chastised small-time ISPs in the past for having both or all of their 
>> DNS-servers on the same sub-net and therefore (under IPv4 at least) subject 
>> to the same threats.  They thereby represent a single-point-of-failure, 
>> rather than the redundancy that is the whole point of having >1 DNS-server.
>> 
>> But Telstra currently shows this:
>> 
>> telstra.net.        NS  dns1.telstra.net.
>> telstra.net.        NS  sec1.apnic.net.
>> telstra.net.        NS  sec3.apnic.net.
>> telstra.net.        NS  dns0.telstra.net.
>> 
>> dns1.telstra.net.   A   203.50.5.200
>> dns0.telstra.net.   A   203.50.5.199
>> 
>> Is the largest provider in the country utterly incompetent?
>> 
>> Or is there something important about Internet architecture that I fail to 
>> understand?
>> 
>> __
>> 
>> Telstra DNS outage causes customer grief
>> By Juha Saarinen on May 13, 2016 6:51AM
>> Two-hour interruption to services.
>> http://www.itnews.com.au/news/telstra-dns-outage-causes-customer-grief-419496
>> 
>> Telstra suffered a nationwide network outage last night, as two of its 
>> internet domain name servers ceased to respond to queries from thousands of 
>> customer systems.
>> 
>> Two Telstra name servers used by customers for domain resolution, ns0 and 
>> ns1.telstra.net, went offline just after eight o'clock last night, users 
>> reported.
>> 
>> Domain name system servers are used to look up and point client systems to 
>> the correct IP address for human readable URLs such as www.telstra.net.
>> 
>> Without working DNS resolution, web browsers and other applications are 
>> unable to locate the IP address of the server they need to communicate with.
>> 
>> The name servers appear to have come back up around 11pm yesterday.
>> 
>> Telstra's service status web page made no mention of the DNS server problem.
>> 
>> While many Telstra customers took to Twitter and Facebook to complain about 
>> the outage, the telco did not confirm the service interruption until this 
>> morning, when it said the issue had been dealt with.
>> 
>> @crakd67 Sorry for the delay in replying - the DNS issue has since been 
>> resolved - Steph
>> - Telstra (@Telstra) May 12, 2016
>> 
>> iTnews has contacted Telstra for comment on the outage.
>> 
>> The telco earlier this month pledged to pour an extra $50 million into its 
>> mobile network after a series of damaging outages in the early months of 
>> this year.
>> 
>> 
>> -- 
>> Roger Clarke http://www.rogerclarke.com/
>>
>> Xamax Consultancy Pty Ltd  78 Sidaway St, 

Re: [LINK] RFI: Telstra DNS outage

2016-05-13 Thread JanW
At 12:15 PM 13/05/2016, Roger Clarke wrote:

>>Update A Telstra spokesman acknowledged last night's outage and attributed it 
>>to a failure with a component that manages traffic ...
>...snip... 

This discussion today was timely. Our computer club met today and asked what 
happened to the Internet last night. I was able to tell them that it wasn't 
their fault.

Link comes to the rescue again!

Jan


I write books. http://janwhitaker.com/?page_id=8

Melbourne, Victoria, Australia
jw...@janwhitaker.com
Twitter: JL_Whitaker
Blog: www.janwhitaker.com 

Sooner or later, I hate to break it to you, you're gonna die, so how do you 
fill in the space between here and there? It's yours. Seize your space. 
~Margaret Atwood, writer 

___
Link mailing list
Link@mailman.anu.edu.au
http://mailman.anu.edu.au/mailman/listinfo/link


Re: [LINK] RFI: Telstra DNS outage

2016-05-12 Thread Roger Clarke
At 13:27 +1200 13/5/16, Juha Saarinen wrote:
>Have some official comment from Telstra now:
>http://www.itnews.com.au/news/telstra-dns-outage-causes-customer-grief-419496

>Update A Telstra spokesman acknowledged last night's outage and attributed it 
>to a failure with a component that manages traffic to its two DNS servers.
>"We can confirm there was an issue with one of our DNS servers which impacted 
>some of our corporate customers only. It was resolved in 90 minutes," the 
>spokesperson said.
>"We isolated the component and traffic was directed to the servers."

Thanks Juha!

This suggests that the Telstra spokesman may have mis-read the DNS entries the 
same way I did (in order to conclude that there were "two [and only two] DNS 
servers").

And that the Telstra spokesman may have overlooked the use of Anycasting.

Or, alternatively, the Telstra spokesman knows what he's talking about, and 
there's a single-point-of-failure for the entire Telstra service, viz. "a 
[single] component that manages traffic to its two DNS servers".

That's consistent with 'a router' and hence with my initial presumption that 
the two DNS-servers are on the same sub-net.  (But there are many other 
possibilities, so it certainly doesn't prove my presumption to be correct).
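
The single-point-of-failure argument can be put as rough arithmetic. The sketch below assumes independent failures and uses made-up availability figures; nothing here is a measured number for Telstra, and the function name is illustrative only.

```python
def availability_behind_shared_component(component: float,
                                         server: float,
                                         n_servers: int = 2) -> float:
    """Availability of n redundant servers reached through one shared component.

    Assumes failures are independent. The service is up only if the shared
    component is up AND at least one of the servers behind it is up.
    """
    all_servers_down = (1.0 - server) ** n_servers
    return component * (1.0 - all_servers_down)


if __name__ == "__main__":
    # Hypothetical figures: every element is up 99% of the time.
    for n in (1, 2, 4):
        total = availability_behind_shared_component(0.99, 0.99, n)
        print(f"{n} server(s) behind one component: {total:.6f}")
```

Under these assumptions, adding servers behind the shared component can never lift the overall figure above the availability of that component, which is the sense in which it is a single point of failure.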

I need to revise my critical comments to something like "it still seems like 
incompetence on the part of Telstra's technical staff and/or execs (e.g. if 
they ignored risk assessments and tech recommendations) and/or spokesman".

I remain aghast, and I think Telstra customers should be as well.


-- 
Roger Clarke http://www.rogerclarke.com/

Xamax Consultancy Pty Ltd  78 Sidaway St, Chapman ACT 2611 AUSTRALIA
Tel: +61 2 6288 6916    http://about.me/roger.clarke
mailto:roger.cla...@xamax.com.au    http://www.xamax.com.au/

Visiting Professor in the Faculty of Law    University of N.S.W.
Visiting Professor in Computer Science    Australian National University


Re: [LINK] RFI: Telstra DNS outage

2016-05-12 Thread Roger Clarke
>> On 13 May 2016, at 9:02 AM, Roger Clarke  wrote:
>> Is the largest provider in the country utterly incompetent?
>> Or is there something important about Internet architecture that I fail to 
>> understand?

At 9:08 +1000 13/5/16, Avi Miller wrote:
>It's most likely that Telstra are AnyCasting their DNS servers:
>https://en.wikipedia.org/wiki/Anycast
>Essentially this means that they have a single IP address that is routed to 
>the nearest actual DNS server to the requester. And that there can be lots and 
>lots of backends for this.

Thanks for this!

However, following through to RFC3258
https://tools.ietf.org/html/rfc3258

it seems that redundancy, and hence accessibility when the primary DNS-server 
is unreachable, was *not* a motivation for the application of Anycasting to the 
DNS:
"The primary motivation for the development and deployment of these practices 
is to increase the distribution of Domain Name System (DNS) servers to 
previously under-served areas of the network topology and to reduce the latency 
for DNS query responses in those areas"

And, as I understand it, the first backbone router, where BGP comes into play, 
should intercept the packet addressed to the Telstra name-server, and 
substitute an IP-address based on its internal table.

If Anycasting is in use, and the Telstra name-servers were unreachable, then 
presumably either the BGP tables were polluted, or *all* of the net-near 
name-servers were out of action.  (Or even *all* of the name-servers were out 
of action, if the process is clever enough to detect that the net-near ones 
aren't responding and then sends packets to net-distant servers).

Either way, it still seems like incompetence on Telstra's part.

(And the speed with which it was fixed suggests that there could have been a 
pre-programmed solution to whatever the underlying cause was, had they bothered 
to implement it).
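
A toy model of the anycast behaviour under discussion. With anycast, the same prefix is announced from several sites and ordinary routing delivers each packet towards whichever announcing site is closest in routing terms; the destination address itself is not rewritten along the way. The site names and path costs below are hypothetical, and real BGP path selection involves far more than a single cost figure.

```python
def pick_site(announcements, withdrawn=frozenset()):
    """Return the lowest-cost site still announcing the anycast prefix.

    announcements: mapping of site name -> path cost as seen by one client.
    withdrawn:     sites whose announcement has been withdrawn.
    Returns None when no site is announcing the prefix any more.
    """
    live = {site: cost for site, cost in announcements.items()
            if site not in withdrawn}
    if not live:
        return None
    return min(live, key=live.get)


if __name__ == "__main__":
    # Hypothetical sites and costs, as seen from one client network.
    seen_from_client = {"site-a": 2, "site-b": 5, "site-c": 9}
    print(pick_site(seen_from_client))
    print(pick_site(seen_from_client, withdrawn={"site-a"}))
    print(pick_site(seen_from_client, withdrawn=set(seen_from_client)))
```

In this model a failed site only helps if its route is actually withdrawn. A site that stops answering queries while still announcing the prefix keeps attracting traffic, which is one way an anycast service can fail without every server being down.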


-- 
Roger Clarke http://www.rogerclarke.com/

Xamax Consultancy Pty Ltd  78 Sidaway St, Chapman ACT 2611 AUSTRALIA
Tel: +61 2 6288 6916    http://about.me/roger.clarke
mailto:roger.cla...@xamax.com.au    http://www.xamax.com.au/

Visiting Professor in the Faculty of Law    University of N.S.W.
Visiting Professor in Computer Science    Australian National University


Re: [LINK] RFI: Telstra DNS outage

2016-05-12 Thread Jim Birch
Avi Miller wrote:

> It's most likely that Telstra are AnyCasting their DNS servers


...so the problem would likely relate to their routers' Anycast configuration,
rather than to an actual DNS server problem, I guess.

Jim


Re: [LINK] RFI: Telstra DNS outage

2016-05-12 Thread Hamish Moffatt

On 13/05/16 09:02, Roger Clarke wrote:

> itNews reports:
>
>> Telstra suffered a nationwide network outage last night, as two of its 
>> internet domain name servers ceased to respond to queries from thousands of 
>> customer systems.
>
> Am I missing something here?
>
> I've chastised small-time ISPs in the past for having both or all of their 
> DNS-servers on the same sub-net and therefore (under IPv4 at least) subject 
> to the same threats.  They thereby represent a single-point-of-failure, 
> rather than the redundancy that is the whole point of having >1 DNS-server.
>
> But Telstra currently shows this:
>
> telstra.net.        NS  dns1.telstra.net.
> telstra.net.        NS  sec1.apnic.net.
> telstra.net.        NS  sec3.apnic.net.
> telstra.net.        NS  dns0.telstra.net.
>
> dns1.telstra.net.   A   203.50.5.200
> dns0.telstra.net.   A   203.50.5.199
>
> Is the largest provider in the country utterly incompetent?
>
> Or is there something important about Internet architecture that I fail to 
> understand?



Besides dns0/1.telstra.net there's two other servers there you've 
overlooked.
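
A small sketch of that count, grouping the four NS targets quoted in this thread by the parent domain of each server name. Taking the last two labels as the "operator domain" is a simplifying assumption (it is not a Public Suffix List lookup), and a server's name only suggests, rather than proves, who runs it.

```python
from collections import defaultdict

# NS records as quoted in the thread, whitespace normalised.
NS_RECORDS = """\
telstra.net. NS dns1.telstra.net.
telstra.net. NS sec1.apnic.net.
telstra.net. NS sec3.apnic.net.
telstra.net. NS dns0.telstra.net.
"""


def group_by_parent_domain(zone_text: str) -> dict:
    """Map each parent domain (last two labels) to the NS targets under it."""
    groups = defaultdict(list)
    for line in zone_text.splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[1] != "NS":
            continue  # skip anything that is not a simple "owner NS target" line
        target = fields[2].rstrip(".")
        parent = ".".join(target.split(".")[-2:])
        groups[parent].append(target)
    return dict(groups)


if __name__ == "__main__":
    for parent, servers in group_by_parent_domain(NS_RECORDS).items():
        print(parent, servers)
```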


In addition to what the others have said, those are the IPs for 
telstra.net's name servers (used for everybody worldwide to find 
Telstra), not the name servers used by Telstra customers to find things 
on the Internet. On my cable connection in Melbourne the provided name 
servers are 61.9.133.193 and 61.9.134.49 
(dns-cust.lon.bigpond.net.au/dns-cust.win.bigpond.net.au).
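
The difference between the two roles shows up in the query itself. A customer's machine normally sends its resolver a query with the "recursion desired" (RD) bit set, asking it to chase the answer; a query put directly to an authoritative server normally leaves RD clear and just asks what that server knows about its own zone. The sketch below builds both kinds of query packet using only the standard library. It does not send anything, and the query ID and function name are arbitrary choices for illustration.

```python
import struct

QTYPE_A = 1       # IPv4 address record
QTYPE_NS = 2      # name server record
QCLASS_IN = 1     # Internet class
FLAG_RD = 0x0100  # "recursion desired" bit in the header flags word


def build_query(name: str, qtype: int, recursion_desired: bool,
                query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet containing one question."""
    flags = FLAG_RD if recursion_desired else 0
    # Header fields: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, QCLASS_IN)


if __name__ == "__main__":
    # What a customer's machine would send to a resolver such as 61.9.133.193
    to_resolver = build_query("www.telstra.net", QTYPE_A, recursion_desired=True)
    # What one might send to an authoritative server such as dns0.telstra.net
    to_authoritative = build_query("telstra.net", QTYPE_NS, recursion_desired=False)
    print(to_resolver.hex())
    print(to_authoritative.hex())
```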




Hamish


Re: [LINK] RFI: Telstra DNS outage

2016-05-12 Thread Avi Miller
Hi,

> On 13 May 2016, at 9:02 AM, Roger Clarke  wrote:
> 
> Is the largest provider in the country utterly incompetent?
> Or is there something important about Internet architecture that I fail to 
> understand?

It's most likely that Telstra are AnyCasting their DNS servers:

https://en.wikipedia.org/wiki/Anycast

Essentially this means that they have a single IP address that is routed to the 
nearest actual DNS server to the requester. And that there can be lots and lots 
of backends for this.

Cheers,
Avi


[LINK] RFI: Telstra DNS outage

2016-05-12 Thread Roger Clarke
itNews reports:
>Telstra suffered a nationwide network outage last night, as two of its 
>internet domain name servers ceased to respond to queries from thousands of 
>customer systems.

Am I missing something here?

I've chastised small-time ISPs in the past for having both or all of their 
DNS-servers on the same sub-net and therefore (under IPv4 at least) subject to 
the same threats.  They thereby represent a single-point-of-failure, rather 
than the redundancy that is the whole point of having >1 DNS-server.

But Telstra currently shows this:

telstra.net.        NS  dns1.telstra.net.
telstra.net.        NS  sec1.apnic.net.
telstra.net.        NS  sec3.apnic.net.
telstra.net.        NS  dns0.telstra.net.

dns1.telstra.net.   A   203.50.5.200
dns0.telstra.net.   A   203.50.5.199

Is the largest provider in the country utterly incompetent?

Or is there something important about Internet architecture that I fail to 
understand?

__

Telstra DNS outage causes customer grief
By Juha Saarinen on May 13, 2016 6:51AM
Two-hour interruption to services.
http://www.itnews.com.au/news/telstra-dns-outage-causes-customer-grief-419496

Telstra suffered a nationwide network outage last night, as two of its internet 
domain name servers ceased to respond to queries from thousands of customer 
systems.

Two Telstra name servers used by customers for domain resolution, ns0 and 
ns1.telstra.net, went offline just after eight o'clock last night, users 
reported.

Domain name system servers are used to look up and point client systems to the 
correct IP address for human readable URLs such as www.telstra.net.

Without working DNS resolution, web browsers and other applications are unable 
to locate the IP address of the server they need to communicate with.

The name servers appear to have come back up around 11pm yesterday.

Telstra's service status web page made no mention of the DNS server problem.

While many Telstra customers took to Twitter and Facebook to complain about the 
outage, the telco did not confirm the service interruption until this morning, 
when it said the issue had been dealt with.

@crakd67 Sorry for the delay in replying - the DNS issue has since been 
resolved - Steph
- Telstra (@Telstra) May 12, 2016

iTnews has contacted Telstra for comment on the outage.

The telco earlier this month pledged to pour an extra $50 million into its 
mobile network after a series of damaging outages in the early months of this 
year.


-- 
Roger Clarke http://www.rogerclarke.com/

Xamax Consultancy Pty Ltd  78 Sidaway St, Chapman ACT 2611 AUSTRALIA
Tel: +61 2 6288 6916    http://about.me/roger.clarke
mailto:roger.cla...@xamax.com.au    http://www.xamax.com.au/

Visiting Professor in the Faculty of Law    University of N.S.W.
Visiting Professor in Computer Science    Australian National University