Re: FYI Netflix is down

2012-07-11 Thread steve pirk [egrep]
On Mon, Jul 9, 2012 at 10:20 AM, Dave Hart daveh...@gmail.com wrote: We continue to investigate why these connections were timing out during connect, rather than quickly determining that there was no route to the unavailable hosts and failing quickly. potential translation: We continue to

Re: FYI Netflix is down

2012-07-09 Thread gb10hkzo-na...@yahoo.co.uk
Steve at pirk, I fail to grasp the concept in your argument. You do realise, do you not, that your $ black boxes from your favourite brand name vendor have software running inside of them do you not ? Case in point for example, the recent LINX issues it wasn't the hardware that gave

Re: FYI Netflix is down

2012-07-09 Thread Alain Hebert
Hi, Well depending on your black box, your mileage will vary. Their wide use of ASIC eliminate a lot of the headache of pure software implementation. Buffer, timing, expected results, etc. Their real software only represent a small part of the device and is mostly

Re: FYI Netflix is down

2012-07-09 Thread valdis . kletnieks
On Mon, 09 Jul 2012 08:07:14 -0400, Alain Hebert said: Their wide use of ASIC eliminate a lot of the headache of pure software implementation. And gets you, in return, the headaches of buggy hardware, where bug-fixing is just a bit harder than load the new release. ;)

Re: FYI Netflix is down

2012-07-09 Thread Rayson Ho
On Sun, Jul 8, 2012 at 8:27 PM, steve pirk [egrep] st...@pirk.com wrote: I am pretty sure Netflix and others were trying to do it right, as they all had graceful fail-over to a secondary AWS zone defined. It looks to me like Amazon uses DNS round-robin to load balance the zones, because they

Re: FYI Netflix is down

2012-07-09 Thread Dave Hart
On Mon, Jul 9, 2012 at 15:50 UTC, Rayson Ho wrote: There are also bugs from the Netflix side uncovered by the AWS outage: Lessons Netflix Learned from the AWS Storm http://techblog.netflix.com/2012/07/lessons-netflix-learned-from-aws-storm.html We continue to investigate why these

Re: FYI Netflix is down

2012-07-08 Thread steve pirk [egrep]
On Tue, Jul 3, 2012 at 1:00 PM, Ryan Malayter malay...@gmail.com wrote: Doing it the right way makes the cloud far less cost-effective and far less agile. Once you get it all set up just so, change becomes very difficult. All the monitoring and fail-over/fail-back operations are generally

Re: FYI Netflix is down

2012-07-08 Thread Ryan Malayter
On Jul 8, 2012, at 7:27 PM, steve pirk [egrep] st...@pirk.com wrote: I am pretty sure Netflix and others were trying to do it right, as they all had graceful fail-over to a secondary AWS zone defined. Having a single company as an infrastructure supplier is not trying to do it right from

RE: FYI Netflix is down

2012-07-06 Thread Dan Golding
-Original Message- I imagine Netflix is mature enough to track this data as you suggest, and that's why they use AWS - downtime isn't a big deal for their business unless it gets really, really bad. There is another possibility that is probably much more widespread amongst AWS

Re: FYI Netflix is down

2012-07-06 Thread James Downs
On Jul 6, 2012, at 1:50 PM, Dan Golding wrote: This happens all the time. Not saying Netflix is doing this, but lots of other folks are. It’s a trap that’s easy to fall into. Especially with Netflix did the reverse. They moved *to* Amazon, so they could do noops.

Re: FYI Netflix is down

2012-07-04 Thread Kyle Creyts
Tell that to people in the third world without utilities. On Jul 3, 2012 8:32 PM, Randy Bush ra...@psg.com wrote: Also, I don't think there is an acceptable level of downtime for water. coming soon to a planet near you randy

Re: FYI Netflix is down

2012-07-04 Thread Randy Bush
Tell that to people in the third world without utilities. Also, I don't think there is an acceptable level of downtime for water. coming soon to a planet near you i work there regularly. the typical nanog kiddie does not. randy

Re: FYI Netflix is down

2012-07-03 Thread George Herbert
On Jul 2, 2012, at 7:19 PM, Rodrick Brown rodrick.br...@gmail.com wrote: People are acting as if Netflix is part of some critical service they stream movies for Christ sake. Some acceptable level of loss is fine for 99.99% of Netflix's user base just like cable, electricity and running

RE: FYI Netflix is down

2012-07-03 Thread Dan Golding
-Original Message- From: James Downs [mailto:e...@egon.cc] On Jul 2, 2012, at 7:19 PM, Rodrick Brown wrote: People are acting as if Netflix is part of some critical service they stream movies for Christ sake. Some acceptable level of loss is fine for 99.99% of Netflix's user

RE: FYI Netflix is down

2012-07-03 Thread Ryan Malayter
James Downs wrote: For Netflix (and all other similar services) downtime is money and money is downtime. There is a quantifiable cost for customer acquisition and a quantifiable churn during each minute of downtime. Mature organizations actually calculate and track this. The trick is to

Re: FYI Netflix is down

2012-07-03 Thread James Downs
On Jul 3, 2012, at 6:11 AM, Dan Golding wrote: Also, I don't think there is an acceptable level of downtime for water. Neither do water utilities. I remember a certain conversation I had with a web-developer. We were talking about zero downtime releases. He thought it was acceptable if the

Re: FYI Netflix is down

2012-07-03 Thread Rodrick Brown
On Jul 3, 2012, at 9:11 AM, Dan Golding dgold...@ragingwire.com wrote: -Original Message- From: James Downs [mailto:e...@egon.cc] On Jul 2, 2012, at 7:19 PM, Rodrick Brown wrote: People are acting as if Netflix is part of some critical service they stream movies for Christ

Re: FYI Netflix is down

2012-07-03 Thread Rodrick Brown
On Jul 3, 2012, at 10:58 AM, Ryan Malayter malay...@gmail.com wrote: James Downs wrote: For Netflix (and all other similar services) downtime is money and money is downtime. There is a quantifiable cost for customer acquisition and a quantifiable churn during each minute of downtime. Mature

Re: FYI Netflix is down

2012-07-03 Thread david raistrick
On Tue, 3 Jul 2012, Rodrick Brown wrote: face when implementing BCP today. I doubt Amazon gave much thought to multiple site outages and clients not being able to dynamically redeploy their engines because of inaccessibility from ELB. Considering there's a grand total of -one- tool in the

Re: FYI Netflix is down

2012-07-03 Thread Jon Lewis
On Mon, 2 Jul 2012, Greg D. Moore wrote: As for pulling the plug to test stuff. I recall a demo at Netapps in the early 00's. They were talking about their fault tolerance and how great it was. So I walked up to their demo array and said, So, it shouldn't be a problem if I pulled this drive

Re: FYI Netflix is down

2012-07-03 Thread Jon Lewis
On Mon, 2 Jul 2012, david raistrick wrote: On Mon, 2 Jul 2012, James Downs wrote: back-plane / control-plane was unable to cope with the requests. Netflix uses Amazon's ELB to balance the traffic and no back-plane meant they were unable to reconfigure it to route around the problem.

Re: FYI Netflix is down

2012-07-03 Thread Seth Mattinen
On 6/29/12 8:22 PM, Joe Blanchard wrote: Seems that they are unreachable at the moment. Called and theres a recorded message stating they are aware of an issue, no details. I didn't see anyone post this yet, so here's Amazon's summary of events: http://aws.amazon.com/message/67457/

Re: FYI Netflix is down

2012-07-03 Thread Jay Ashworth
- Original Message - From: Steven Bellovin s...@cs.columbia.edu Subject: Re: FYI Netflix is down On Jul 2, 2012, at 3:43 PM, Greg D. Moore wrote: At 03:08 PM 7/2/2012, George Herbert wrote: If folks have not read it, I would suggest reading Normal Accidents by Charles Perrow

Re: FYI Netflix is down

2012-07-03 Thread George Herbert
On Jul 3, 2012, at 10:38 AM, Jay Ashworth j...@baylink.com wrote: - Original Message - From: Steven Bellovin s...@cs.columbia.edu Subject: Re: FYI Netflix is down On Jul 2, 2012, at 3:43 PM, Greg D. Moore wrote: At 03:08 PM 7/2/2012, George Herbert wrote: If folks have

Re: FYI Netflix is down

2012-07-03 Thread Ryan Malayter
Jon Lewis wrote: It seems like if you're going to outsource your mission critical infrastructure to cloud you should probably pick at least 2 unrelated cloud providers and if at all possible, not outsource the systems that balance/direct traffic...and if you're really serious about it, have

Re: FYI Netflix is down

2012-07-03 Thread Randy Bush
Also, I don't think there is an acceptable level of downtime for water. coming soon to a planet near you randy

RE: FYI Netflix is down

2012-07-02 Thread Dan Golding
-Original Message- From: Todd Underwood [mailto:toddun...@gmail.com] scott, This was not a cascading failure.  It was a simple power outage Actually, it was a very complex power outage. I'm going to assume that what happened this weekend was similar to the event that happened

Re: FYI Netflix is down

2012-07-02 Thread Todd Underwood
Actually, it was a very complex power outage. I'm going to assume that what happened this weekend was similar to the event that happened at the same facility approximately two weeks ago (it's immaterial - the details are probably different, but it illustrates the complexity of a data center

Re: FYI Netflix is down

2012-07-02 Thread AP NANOG
While I was working for a wireless telecom company our primary datacenter was knocked off the power grid due to weather, the generators kicked on and everything was fine, till one generator was struck by lightning and that same strike fried the control panel on the second one. Considering the

Re: FYI Netflix is down

2012-07-02 Thread Leo Bicknell
In a message written on Mon, Jul 02, 2012 at 11:30:06AM -0400, Todd Underwood wrote: from the perspective of people watching B-rate movies: this was a failure to implement and test a reliable system for streaming those movies in the face of a power outage at one facility. I want to emphasize

Re: FYI Netflix is down

2012-07-02 Thread david raistrick
On Mon, 2 Jul 2012, Leo Bicknell wrote: I used to work with a guy who had a simple test for these things, and if I was a VP at Amazon, Netflix, or any other large company I would do the same. About once a month he would walk out on the you mean like this?

Re: FYI Netflix is down

2012-07-02 Thread Leo Bicknell
In a message written on Mon, Jul 02, 2012 at 12:13:22PM -0400, david raistrick wrote: you mean like this? http://techblog.netflix.com/2011/07/netflix-simian-army.html Yes, Netflix seems to get it, and I think their Simian Army is a great QA tool. However, it is not a complete testing

Re: FYI Netflix is down

2012-07-02 Thread david raistrick
On Mon, 2 Jul 2012, Leo Bicknell wrote: http://techblog.netflix.com/2011/07/netflix-simian-army.html Yes, Netflix seems to get it, and I think their Simian Army is a great QA tool. However, it is not a complete testing system, I have never seen them talk about testing non-software

Re: FYI Netflix is down

2012-07-02 Thread AP NANOG
This is an excellent example of how tests should be run, unfortunately far too many places don't do this... -- Thank you, Robert Miller http://www.armoredpackets.com Twitter: @arch3angel On 7/2/12 12:09 PM, Leo Bicknell wrote: In a message written on Mon, Jul 02, 2012 at 11:30:06AM -0400,

Re: FYI Netflix is down

2012-07-02 Thread Grant Ridder
The problem is large scale tests take a lot of time and planning. For it to be done right, you really need a dedicated DR team. -Grant On Mon, Jul 2, 2012 at 11:31 AM, AP NANOG na...@armoredpackets.com wrote: This is an excellent example of how tests should be run, unfortunately far too many

Re: FYI Netflix is down

2012-07-02 Thread Leo Bicknell
In a message written on Mon, Jul 02, 2012 at 12:23:57PM -0400, david raistrick wrote: When the hardware is outsourced how would you propose testing the non-software components? They do simulate availability zone issues (and AZ is as close as you get to controlling which internal

Re: FYI Netflix is down

2012-07-02 Thread Cameron Byrne
On Jul 2, 2012 10:53 AM, Leo Bicknell bickn...@ufp.org wrote: In a message written on Mon, Jul 02, 2012 at 12:23:57PM -0400, david raistrick wrote: When the hardware is outsourced how would you propose testing the non-software components? They do simulate availability zone issues (and AZ

Re: FYI Netflix is down

2012-07-02 Thread James Downs
On Jul 2, 2012, at 9:23 AM, david raistrick wrote: When the hardware is outsourced how would you propose testing the non-software components? They do simulate availability zone issues (and AZ is as close as you get to controlling which internal power/network/etc grid you're attached to).

Re: FYI Netflix is down

2012-07-02 Thread Tony McCrory
On 2 July 2012 19:20, Cameron Byrne cb.li...@gmail.com wrote: Make your chaos animal go after sites and regions instead of individual VMs. CB From a previous post mortem http://techblog.netflix.com/2011_04_01_archive.html Create More Failures Currently, Netflix uses a service called

Re: FYI Netflix is down

2012-07-02 Thread Paul Graydon
On 07/02/2012 08:53 AM, Tony McCrory wrote: On 2 July 2012 19:20, Cameron Byrne cb.li...@gmail.com wrote: Make your chaos animal go after sites and regions instead of individual VMs. CB From a previous post mortem http://techblog.netflix.com/2011_04_01_archive.html Create More Failures

RE: FYI Netflix is down

2012-07-02 Thread Dan Golding
-Original Message- From: Leo Bicknell [mailto:bickn...@ufp.org] I want to emphasize _and test_. [snip] I used to work with a guy who had a simple test for these things, and if I was a VP at Amazon, Netflix, or any other large company I would do the same. About once a month

Re: FYI Netflix is down

2012-07-02 Thread AP NANOG
I believe in my dictionary Chaos Gorilla translates into Time To Go Home, with a rough definition of Everything just crapped out - The world is ending; but then again I may have that incorrect :-) -- Thank you, Robert Miller http://www.armoredpackets.com Twitter: @arch3angel On 7/2/12 2:59

Re: FYI Netflix is down

2012-07-02 Thread Joly MacFie
Good band name. Chaos Gorilla -- --- Joly MacFie 218 565 9365 Skype:punkcast WWWhatsup NYC - http://wwwhatsup.com http://pinstand.com - http://punkcast.com VP (Admin) - ISOC-NY - http://isoc-ny.org

Re: FYI Netflix is down

2012-07-02 Thread Greg D. Moore
At 03:08 PM 7/2/2012, George Herbert wrote: If folks have not read it, I would suggest reading Normal Accidents by Charles Perrow. The it can't happen is almost guaranteed to happen. ;-) And when it does, it'll often interact in ways we can't predict or sometimes even understand. As for

Re: FYI Netflix is down

2012-07-02 Thread david raistrick
On Mon, 2 Jul 2012, James Downs wrote: back-plane / control-plane was unable to cope with the requests. Netflix uses Amazon's ELB to balance the traffic and no back-plane meant they were unable to reconfigure it to route around the problem. Someone needs to define back-plane/control-plane

Re: FYI Netflix is down

2012-07-02 Thread Brett Frankenberger
On Mon, Jul 02, 2012 at 09:09:09AM -0700, Leo Bicknell wrote: In a message written on Mon, Jul 02, 2012 at 11:30:06AM -0400, Todd Underwood wrote: from the perspective of people watching B-rate movies: this was a failure to implement and test a reliable system for streaming those movies

RE: FYI Netflix is down

2012-07-02 Thread Dan Golding
-Original Message- From: Greg D. Moore [mailto:moor...@greenms.com] If folks have not read it, I would suggest reading Normal Accidents by Charles Perrow. Also, Human Error by James Reason.

Re: FYI Netflix is down

2012-07-02 Thread George Herbert
On Mon, Jul 2, 2012 at 12:43 PM, Greg D. Moore moor...@greenms.com wrote: At 03:08 PM 7/2/2012, George Herbert wrote: If folks have not read it, I would suggest reading Normal Accidents by Charles Perrow. The it can't happen is almost guaranteed to happen. ;-)  And when it does, it'll often

Re: FYI Netflix is down

2012-07-02 Thread Greg D. Moore
At 05:04 PM 7/2/2012, George Herbert wrote: On Mon, Jul 2, 2012 at 12:43 PM, Greg D. Moore moor...@greenms.com wrote: At 03:08 PM 7/2/2012, George Herbert wrote: If folks have not read it, I would suggest reading Normal Accidents by Charles Perrow. The it can't happen is almost guaranteed

Re: FYI Netflix is down

2012-07-02 Thread Steven Bellovin
On Jul 2, 2012, at 3:43 PM, Greg D. Moore wrote: At 03:08 PM 7/2/2012, George Herbert wrote: If folks have not read it, I would suggest reading Normal Accidents by Charles Perrow. Strong second to that suggestion. --Steve Bellovin, https://www.cs.columbia.edu/~smb

Re: FYI Netflix is down

2012-07-02 Thread James Downs
On Jul 2, 2012, at 1:20 PM, david raistrick wrote: Amazon resources are controlled (from a consumer viewpoint) by API - that API is also used by amazon's internal toolkits that support ELB (and RDS..). Those (http accessed) API interfaces were unavailable for a good portion of the

Re: FYI Netflix is down

2012-07-02 Thread Rodrick Brown
On Jul 2, 2012, at 7:03 PM, James Downs e...@egon.cc wrote: On Jul 2, 2012, at 1:20 PM, david raistrick wrote: Amazon resources are controlled (from a consumer viewpoint) by API - that API is also used by amazon's internal toolkits that support ELB (and RDS..). Those (http accessed)

Re: FYI Netflix is down

2012-07-02 Thread James Downs
On Jul 2, 2012, at 7:19 PM, Rodrick Brown wrote: People are acting as if Netflix is part of some critical service they stream movies for Christ sake. Some acceptable level of loss is fine for 99.99% of Netflix's user base just like cable, electricity and running water I suffer a few

Re: FYI Netflix is down

2012-07-02 Thread Hal Murray
George Herbert george.herb...@gmail.com said: I worked for a Sun clone vendor (Axil) for a while and took some of our systems and storage to Comdex one year in the 90s. We had a RAID unit (Mylex controller) we had just introduced. Beforehand, I made REALLY REALLY SURE that the

Re: FYI Netflix is down

2012-07-01 Thread Jay Ashworth
- Original Message - From: Tyler Haske tyler.ha...@gmail.com How to run a datacenter 101. Have more then one location, preferably far apart. It being Amazon I would expect more. :/ Not entirely. Datacenters do go down, our best efforts to the contrary notwithstanding. Amazon

Re: FYI Netflix is down

2012-07-01 Thread steve pirk [egrep]
On Sun, Jul 1, 2012 at 11:38 AM, Jay Ashworth j...@baylink.com wrote: Not entirely. Datacenters do go down, our best efforts to the contrary notwithstanding. Amazon doesn't guarantee you redundancy on EC2, only the tools to provide it yourself. 25% Amazon; 75% service provider clients;

Re: FYI Netflix is down

2012-06-30 Thread Roy
On 6/29/2012 10:38 PM, jamie rishaw wrote: you know what's happening even more? ..Amazon not learning their lesson. they just had an outage quite similar.. they performed a full audit on electrical systems worldwide, according to the rfo/post mortem. looks like they need to perform a full and

Re: FYI Netflix is down

2012-06-30 Thread Grant Ridder
well one would think that they could at least get power redundancy right... On Sat, Jun 30, 2012 at 1:07 AM, Roy r.engehau...@gmail.com wrote: On 6/29/2012 10:38 PM, jamie rishaw wrote: you know what's happening even more? ..Amazon not learning their lesson. they just had an outage quite

Re: FYI Netflix is down

2012-06-30 Thread Tyler Haske
I am not a computer science guy but been around a long time.  Data centers and clouds are like software.  Once they reach a certain size, its impossible to keep the bugs out.  You can test and test your heart out and something will slip by.  You can say the same thing about nuclear reactors,

Re: FYI Netflix is down

2012-06-30 Thread Andrew D Kirch
On 6/30/2012 3:11 AM, Tyler Haske wrote: How to run a datacenter 101. Have more then one location, preferably far apart. It being Amazon I would expect more. :/ Based on? Clouds are nothing more than outsourced responsibility. My business has stopped while my IT department explains to me

Re: FYI Netflix is down

2012-06-30 Thread joel jaeggli
On 6/30/12 12:11 AM, Tyler Haske wrote: I am not a computer science guy but been around a long time. Data centers and clouds are like software. Once they reach a certain size, its impossible to keep the bugs out. You can test and test your heart out and something will slip by. You can say

Re: FYI Netflix is down

2012-06-30 Thread Lynda
On 6/30/2012 12:11 AM, Tyler Haske wrote: On 6/29/2012 11:07 PM, Roy wrote: I am not a computer science guy but been around a long time. Data centers and clouds are like software. Once they reach a certain size, its impossible to keep the bugs out. You can test and test your heart out and

Re: FYI Netflix is down

2012-06-30 Thread Justin M. Streiner
On Sat, 30 Jun 2012, jamie rishaw wrote: you know what's happening even more? ..Amazon not learning their lesson. I was not giving anyone a free pass or attempting to shrug off the outage. I was just stating that there are many reasons why things break. I haven't seen anything official on

Re: FYI Netflix is down

2012-06-30 Thread Jimmy Hess
On 6/30/12, Grant Ridder shortdudey...@gmail.com wrote: well one would think that they could at least get power redundancy right... It is very similar to suggesting redundancy within a site against building collapse. Reliable power redundancy is very hard and very expensive. Much harder and

Re: FYI Netflix is down

2012-06-30 Thread Cameron Byrne
On Jun 30, 2012 12:25 AM, joel jaeggli joe...@bogus.com wrote: On 6/30/12 12:11 AM, Tyler Haske wrote: I am not a computer science guy but been around a long time. Data centers and clouds are like software. Once they reach a certain size, its impossible to keep the bugs out. You can test

Re: FYI Netflix is down

2012-06-30 Thread Jimmy Hess
On 6/30/12, Cameron Byrne cb.li...@gmail.com wrote: On Jun 30, 2012 12:25 AM, joel jaeggli joe...@bogus.com wrote: On 6/30/12 12:11 AM, Tyler Haske wrote: Geo-redundancy is key. In fact, i would take distributed data centers over RAID, UPS, or any other fancy pants © mechanisms any day.

Re: FYI Netflix is down

2012-06-30 Thread Seth Mattinen
On 6/30/12 4:50 AM, Justin M. Streiner wrote: On Sat, 30 Jun 2012, jamie rishaw wrote: you know what's happening even more? ..Amazon not learning their lesson. I was not giving anyone a free pass or attempting to shrug off the outage. I was just stating that there are many reasons why

Re: FYI Netflix is down

2012-06-30 Thread Roy
On 6/30/2012 12:11 AM, Tyler Haske wrote: I am not a computer science guy but been around a long time. Data centers and clouds are like software. Once they reach a certain size, its impossible to keep the bugs out. You can test and test your heart out and something will slip by. You can say

Re: FYI Netflix is down

2012-06-30 Thread Todd Underwood
On Jun 30, 2012 11:23 AM, Seth Mattinen se...@rollernet.us wrote: But haven't they all been cascading failures? No. They have not. That's not what that term means. 'Cascading failure' has a fairly specific meaning that doesn't imply resilience in the face of decomposition into smaller

Re: FYI Netflix is down

2012-06-30 Thread Jimmy Hess
On 6/30/12, Todd Underwood toddun...@gmail.com wrote: On Jun 30, 2012 11:23 AM, Seth Mattinen se...@rollernet.us wrote: But haven't they all been cascading failures? No. They have not. That's not what that term means. 'Cascading failure' has a fairly specific meaning that doesn't imply

Re: FYI Netflix is down

2012-06-30 Thread Seth Mattinen
On 6/30/12 9:25 AM, Todd Underwood wrote: On Jun 30, 2012 11:23 AM, Seth Mattinen se...@rollernet.us mailto:se...@rollernet.us wrote: But haven't they all been cascading failures? No. They have not. That's not what that term means. 'Cascading failure' has a fairly specific meaning

Re: FYI Netflix is down

2012-06-30 Thread Todd Underwood
This was not a cascading failure. It was a simple power outage Cascading failures involve interdependencies among components. T On Jun 30, 2012 2:21 PM, Seth Mattinen se...@rollernet.us wrote: On 6/30/12 9:25 AM, Todd Underwood wrote: On Jun 30, 2012 11:23 AM, Seth Mattinen

Re: FYI Netflix is down

2012-06-30 Thread Jimmy Hess
On 6/30/12, Todd Underwood toddun...@gmail.com wrote: This was not a cascading failure. It was a simple power outage Cascading failures involve interdependencies among components. Actually, you can't really say that. It's true that it was a simple power outage for Amazon. Power failed,

Re: FYI Netflix is down

2012-06-30 Thread Mike Devlin
The last 2 Amazon outages were power issues isolated to just their us-east Virginia data center. I read somewhere that Amazon has something like 70% of their ec2 resources in Virginia and it's also their oldest ec2 datacenter, so I am guessing they learned a lot of lessons and are stuck with an

Re: FYI Netflix is down

2012-06-30 Thread Seth Mattinen
On 6/30/12 12:04 PM, Todd Underwood wrote: This was not a cascading failure. It was a simple power outage Cascading failures involve interdependencies among components. I guess I'm assuming there were UPS and generator systems involved (and failing) with powering the critical load, but I

Re: FYI Netflix is down

2012-06-30 Thread Rayson Ho
in a single Availability Zone have lost power due to electrical storms in the area. We are actively working to restore power. -Original Message- From: Grant Ridder [mailto:shortdudey...@gmail.com] Sent: Friday, June 29, 2012 8:42 PM To: Jason Baugher Cc: nanog@nanog.org Subject: Re: FYI

Re: FYI Netflix is down

2012-06-30 Thread Randy Bush
Sorry to be the monday morning quarterback, but the sites that went down learned a valuable lesson in single point of failure analysis. as this has happened more than once before, i am less optimistic. or maybe they decided the spof risk was not worth the avoidance costs. randy

Re: FYI Netflix is down

2012-06-30 Thread Jared Mauch
The interesting thing to me is the us population by time zone. If amazon has 70% of servers in the eastern time zone it makes some sense. Mountain + pacific is smaller than central, which is a bit more than half eastern. These stats are older but a good rough gauge:

Re: FYI Netflix is down

2012-06-30 Thread Scott Howard
On Sat, Jun 30, 2012 at 12:04 PM, Todd Underwood toddun...@gmail.com wrote: This was not a cascading failure. It was a simple power outage Cascading failures involve interdependencies among components. Not always. Cascading failures can also occur when there is zero dependency between

Re: FYI Netflix is down

2012-06-30 Thread Todd Underwood
scott, This was not a cascading failure.  It was a simple power outage Cascading failures involve interdependencies among components. Not always.  Cascading failures can also occur when there is zero dependency between components.  The simplest form of this is where one environment fails

Re: FYI Netflix is down

2012-06-30 Thread Bryan Horstmann-Allen
On 2012-06-30 16:08:40, Rayson Ho wrote: If I recall correctly, availability zone (AZ) mappings are specific to an AWS account, and in fact there is no way to know if you are running in the same AZ as

Re: FYI Netflix is down

2012-06-30 Thread Mike Devlin
On Sat, Jun 30, 2012 at 4:45 PM, Bryan Horstmann-Allen b...@mirrorshades.net wrote: Explain Netflix and Heroku last night. Both of whom architect across multiple AZs and have for many years. The API and EBS across the region were also affected. ELB was _also_ affected across the region,

Re: FYI Netflix is down

2012-06-30 Thread Bryan Horstmann-Allen
On 2012-06-30 16:55:53, Mike Devlin wrote: But in netflix case, if they architected their environment the way they said they did, why wouldnt they just fail over to us-west? especially at their scale, I

Re: FYI Netflix is down

2012-06-30 Thread Mike Devlin
On Sat, Jun 30, 2012 at 5:04 PM, Bryan Horstmann-Allen b...@mirrorshades.net wrote: Have a look at Asgard, the AWS management tool they just open sourced. It implies they rely very heavily on many AWS features, some of which are very much region specific. As to their multi-region

Re: FYI Netflix is down

2012-06-30 Thread Brett Frankenberger
On Sat, Jun 30, 2012 at 01:19:54PM -0700, Scott Howard wrote: On Sat, Jun 30, 2012 at 12:04 PM, Todd Underwood toddun...@gmail.com wrote: This was not a cascading failure. It was a simple power outage Cascading failures involve interdependencies among components. Not always.

Re: FYI Netflix is down

2012-06-29 Thread Jason Baugher
Seeing some reports of Pinterest and Instagram down as well. Amazon cloud services being implicated. On 6/29/2012 10:22 PM, Joe Blanchard wrote: Seems that they are unreachable at the moment. Called and theres a recorded message stating they are aware of an issue, no details. -Joe

Re: FYI Netflix is down

2012-06-29 Thread Grant Ridder
From Amazon Amazon Elastic Compute Cloud (N. Virginia) (http://status.aws.amazon.com/) 8:21 PM PDT We are investigating connectivity issues for a number of instances in the US-EAST-1 Region. 8:31 PM PDT We are investigating elevated errors rates for APIs in the US-EAST-1 (Northern Virginia)

RE: FYI Netflix is down

2012-06-29 Thread James Laszko
, June 29, 2012 8:42 PM To: Jason Baugher Cc: nanog@nanog.org Subject: Re: FYI Netflix is down From Amazon Amazon Elastic Compute Cloud (N. Virginia) (http://status.aws.amazon.com/) 8:21 PM PDT We are investigating connectivity issues for a number of instances in the US-EAST-1 Region. 8:31 PM PDT

Re: FYI Netflix is down

2012-06-29 Thread Grant Ridder
to electrical storms in the area. We are actively working to restore power. -Original Message- From: Grant Ridder [mailto:shortdudey...@gmail.com] Sent: Friday, June 29, 2012 8:42 PM To: Jason Baugher Cc: nanog@nanog.org Subject: Re: FYI Netflix is down From Amazon Amazon Elastic

Re: FYI Netflix is down

2012-06-29 Thread Jason Baugher
- From: Grant Ridder [mailto:shortdudey...@gmail.com] Sent: Friday, June 29, 2012 8:42 PM To: Jason Baugher Cc: nanog@nanog.org Subject: Re: FYI Netflix is down From Amazon Amazon Elastic Compute Cloud (N. Virginia) (http://status.aws.amazon.com/) 8:21 PM PDT We are investigating connectivity

Re: FYI Netflix is down

2012-06-29 Thread Mike Lyon
: FYI Netflix is down From Amazon Amazon Elastic Compute Cloud (N. Virginia) (http://status.aws.amazon.com/) 8:21 PM PDT We are investigating connectivity issues for a number of instances in the US-EAST-1 Region. 8:31 PM PDT We are investigating elevated errors rates for APIs in the US

Re: FYI Netflix is down

2012-06-29 Thread Ian Wilson
On Fri, Jun 29, 2012 at 11:44 PM, Grant Ridder shortdudey...@gmail.com wrote: I have an instance in zone C and it is up and fine, so it must be A, B, or D that is down. It is my understanding that instance zones are randomized between customers -- so your zone C may be my zone A. Ian -- Ian

Re: FYI Netflix is down

2012-06-29 Thread Grant Ridder
Yes, although, when you launch an instance, you do have the option of selecting a zone if you want. However, once the instance is started it stays in that zone and does not switch. On Fri, Jun 29, 2012 at 10:47 PM, Ian Wilson ian.m.wil...@gmail.com wrote: On Fri, Jun 29, 2012 at 11:44 PM,

Re: FYI Netflix is down

2012-06-29 Thread Derek Ivey
have lost power due to electrical storms in the area. We are actively working to restore power. -Original Message- From: Grant Ridder [mailto:shortdudey...@gmail.com] Sent: Friday, June 29, 2012 8:42 PM To: Jason Baugher Cc: nanog@nanog.org Subject: Re: FYI Netflix

Re: FYI Netflix is down

2012-06-29 Thread Seth Mattinen
On 6/29/12 8:47 PM, Mike Lyon wrote: Whatever happened to UPSs and generators? You don't need them with The Cloud! But seriously, this is something like the third or fourth time AWS fell over flat in recent memory. ~Seth

Re: FYI Netflix is down

2012-06-29 Thread Grant Ridder
They may use it for content, but reddit.com resolves to IPs owned by Qwest On Fri, Jun 29, 2012 at 10:51 PM, Seth Mattinen se...@rollernet.us wrote: On 6/29/12 8:47 PM, Mike Lyon wrote: Whatever happened to UPSs and generators? You don't need them with The Cloud! But seriously, this is

Re: FYI Netflix is down

2012-06-29 Thread Grant Ridder
8:49 PM PDT Power has been restored to the impacted Availability Zone and we are working to bring impacted instances and volumes back online On Fri, Jun 29, 2012 at 10:52 PM, Grant Ridder shortdudey...@gmail.com wrote: They may use it for content, but reddit.com resolves to IPs owned by Qwest

Re: FYI Netflix is down

2012-06-29 Thread William Herrin
On Fri, Jun 29, 2012 at 11:42 PM, Grant Ridder shortdudey...@gmail.com wrote: From Amazon Amazon Elastic Compute Cloud (N. Virginia)  (http://status.aws.amazon.com/) 8:21 PM PDT We are investigating connectivity issues for a number of instances in the US-EAST-1 Region. 8:31 PM PDT We are

Re: FYI Netflix is down

2012-06-29 Thread Seth Mattinen
On 6/29/12 8:22 PM, Joe Blanchard wrote: Seems that they are unreachable at the moment. Called and theres a recorded message stating they are aware of an issue, no details. Streaming services and web; just tried my Roku and it failed to connect. ~Seth

Re: FYI Netflix is down

2012-06-29 Thread Justin M. Streiner
, 2012 8:42 PM To: Jason Baugher Cc: nanog@nanog.org Subject: Re: FYI Netflix is down From Amazon Amazon Elastic Compute Cloud (N. Virginia) (http://status.aws.amazon.com/) 8:21 PM PDT We are investigating connectivity issues for a number of instances in the US-EAST-1 Region. 8:31 PM PDT We
