Re: [uknof] Go daddy what happened

2012-10-06 Thread Thomas Mangin
Hi Neil,

http://www.gossamer-threads.com/lists/nsp/outages/40837

Thomas

On 6 Oct 2012, at 05:29, Neil J. McRae n...@domino.org wrote:

 but even if they didn't have RR how do they get into a situation where a 
 router starts switching in software. RR is a red herring in this failure 
 scenario even with full mesh this failure would still have happened.
 
  root cause is somewhere a wad of routes turned a lot of silicon into 
 something useless. 
 
 does anyone know what kit this was?
 
 Sent from my iPad 
 
 On 5 Oct 2012, at 22:40, Thomas Mangin thomas.man...@exa-networks.co.uk 
 wrote:
 
 http://inside.godaddy.com/inside-story-happened-godaddy-com-sept-10-2012/
 
 Their conclusion about RR makes sense ...
 
 Sent from my iPad



Re: [uknof] Go daddy what happened

2012-10-06 Thread Will Hargrave

On 6 Oct 2012, at 05:29, Neil J. McRae n...@domino.org wrote:

 but even if they didn't have RR how do they get into a situation where a 
 router starts switching in software. RR is a red herring in this failure 
 scenario even with full mesh this failure would still have happened.
  root cause is somewhere a wad of routes turned a lot of silicon into 
 something useless. 
 does anyone know what kit this was?

These sorts of designs are common in DC networks now, with increasing use of L3 
to the edge. 

The key thing here is to keep your internet edge/core separated from your DC 
network.

Great preso here from Microsoft:
http://www.nanog.org/meetings/nanog55/abstracts.php?pt=MTk0MiZuYW5vZzU1&nm=nanog55


-- 
Will Hargrave
+44 114 303 






Re: [uknof] Go daddy what happened

2012-10-06 Thread Neil J. McRae
Ah ok now I understand Will's email, didn't spot this reply :-)

and your hypothesis seems very reasonable.

--
Neil J. McRae.
n...@domino.org


From: uknof-boun...@lists.uknof.org.uk [uknof-boun...@lists.uknof.org.uk] on 
behalf of Daniel Austin [dan...@kewlio.net]
Sent: 06 October 2012 07:54
To: uknof@lists.uknof.org.uk
Subject: Re: [uknof] Go daddy what happened

Hi,

On 06/10/2012 05:29, Neil J. McRae wrote:
 but even if they didn't have RR how do they get into a situation where a
 router starts switching in software. RR is a red herring in this failure
 scenario even with full mesh this failure would still have happened.

   root cause is somewhere a wad of routes turned a lot of silicon into
 something useless.

 does anyone know what kit this was?

I had a theory that they were using switches to route with a limited
table, and accidentally pushed a full table to them.
When they say 210x normal routes... if they normally had around 2000
routes in the FIB, 210x this would be approx a full table.
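
The arithmetic behind that guess can be checked with a rough sketch. The 2000-route baseline is the hypothetical figure from this message, and the full-table size used below is an assumed ballpark for late 2012, not a number GoDaddy published.

```python
# Figures from the message: a guessed baseline and the quoted multiplier.
baseline_fib_routes = 2_000   # hypothetical "normal" FIB size
multiplier = 210              # "210x normal routes"

# Assumption (not from the thread): approximate IPv4 full-table size in late 2012.
assumed_full_table = 425_000

estimated_routes = baseline_fib_routes * multiplier
ratio = estimated_routes / assumed_full_table

print(f"estimated routes pushed: {estimated_routes:,}")
print(f"fraction of assumed full table: {ratio:.2f}")
```

With these inputs the estimate lands within a few percent of the assumed full table, which is why the "full table leaked into a small FIB" theory fits the 210x figure.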

If they limited the route reflectors with a max-prefix setting, they
could end up in a situation where their routers become islands.
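
A toy model of that island scenario, as a sketch only: it assumes each router has sessions to two route reflectors, that a session is dropped and its routes withdrawn once the received prefix count exceeds the configured limit, and that sessions do not come back automatically. The class and function names and the 5000-prefix limit are made up for illustration and are not taken from any vendor implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """One BGP session from a router to a route reflector (toy model)."""
    peer: str
    max_prefix: int
    up: bool = True
    routes: set = field(default_factory=set)

    def receive(self, prefixes):
        """Accept prefixes; drop the session if the limit is exceeded."""
        if not self.up:
            return
        self.routes |= set(prefixes)
        if len(self.routes) > self.max_prefix:
            # Max-prefix trip: session goes down, learned routes are withdrawn.
            self.up = False
            self.routes.clear()


def is_island(sessions):
    """A router with no session left up has no routes at all."""
    return not any(s.up for s in sessions)


# Hypothetical router with sessions to two route reflectors.
sessions = [Session("rr1", max_prefix=5_000), Session("rr2", max_prefix=5_000)]

# Normal operation: about 2000 internal routes, well under the limit.
for s in sessions:
    s.receive(range(2_000))
island_before = is_island(sessions)

# Leak: both reflectors pass on roughly a full table (integers stand in for prefixes).
for s in sessions:
    s.receive(range(420_000))
island_after = is_island(sessions)

print(island_before, island_after)
```

Because both reflectors carry the same leaked routes, every session trips at the same time, so the redundancy of having two reflectors does not help.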

These are the sorts of mistakes I'd expect from a new, inexperienced ISP
- not someone the size of GoDaddy.


Thanks,

Dan.