Re: Check out Massive Amazon cloud service outage disrupts sites
Not a problem now:

$HASP003 RC=(52),P J247-240 - NO SELECTABLE ENTRIES FOUND
$HASP003 MATCHING SPECIFICATION

On Sun, Mar 5, 2017 at 1:44 AM, Steve Smith <sasd...@gmail.com> wrote:
> Operator is told to purge jobs 247-249. Types in $PJ247-240 (missed it by
> *that* much). Which as anyone familiar with JES2 knows, pretty much wiped
> the spool clean. And JES2 calls *its* problems "catastrophic".
>
> I don't know if that would still "work", I'm sure not going to try it.
Re: Check out Massive Amazon cloud service outage disrupts sites
Typos can be deadly in this business... true story (and likely not unique):

Operator is told to purge jobs 247-249. Types in $PJ247-240 (missed it by *that* much). Which, as anyone familiar with JES2 knows, pretty much wiped the spool clean. And JES2 calls *its* problems "catastrophic".

I don't know if that would still "work"; I'm sure not going to try it.

sas

p.s. It wasn't me (I have caused some disasters in my career, just not this one). I don't remember the actual numbers, but the salient point is that the 9 & 0 keys are adjacent. JES2 allows (or did in the 90s) job number ranges on commands to wrap around from the max (at the time) to 1. It did not ask for verification or check for reasonableness.
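The wrap-around described in the p.s. can be illustrated with a small sketch. This is not JES2 code: the maximum job number (9999), the function names, and the 10-job limit are all assumptions made up for the illustration. It only shows why a range whose end is lower than its start selects nearly everything, and what a "reasonableness" check might look like.

```python
# Hypothetical illustration only: not JES2 logic. MAX_JOB and the
# safety limit are assumed values chosen for the example.
MAX_JOB = 9999


def expand_range(first, last, max_job=MAX_JOB):
    """Expand a job-number range, wrapping from max_job back to 1
    when last < first (the behaviour described in the post)."""
    if first <= last:
        return list(range(first, last + 1))
    return list(range(first, max_job + 1)) + list(range(1, last + 1))


def checked_range(first, last, limit=10):
    """Refuse ranges that wrap or that select more than `limit` jobs."""
    if last < first:
        raise ValueError(f"range {first}-{last} wraps around; refusing")
    jobs = expand_range(first, last)
    if len(jobs) > limit:
        raise ValueError(f"range selects {len(jobs)} jobs (limit {limit})")
    return jobs


intended = expand_range(247, 249)   # the three jobs that were meant
typo = expand_range(247, 240)       # wraps: 247..9999 plus 1..240
print(len(intended), len(typo))     # prints "3 9993"
```

With the assumed maximum of 9999, the mistyped range covers every job number except 241-246, while `checked_range(247, 240)` would raise instead of proceeding.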
Re: Check out Massive Amazon cloud service outage disrupts sites
The one thing that might be interesting is which AWS site went down, and if they [took] the CIA down.

Steve

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Vernooij, Kees (ITOPT1) - KLM
Sent: Friday, March 3, 2017 8:35 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Check out Massive Amazon cloud service outage disrupts sites

It was on Dutch news sites, with the text: "Amazon Web Services (AWS) announced this on Thursday".

Kees.

> -Original Message-
> From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of zMan
> Sent: 03 March, 2017 15:31
> To: IBM-MAIN@LISTSERV.UA.EDU
> Subject: Re: Check out Massive Amazon cloud service outage disrupts sites
>
> Cite? (Not challenging you, interested!)
>
> On Fri, Mar 3, 2017 at 3:40 AM, Vernooij, Kees (ITOPT1) - KLM <kees.verno...@klm.com> wrote:
>
> > The outage was caused by a typo.
> > I remember there were times we made scripts and tested them on our test
> > environments, to avoid silly errors in the production environment...
> >
> > Kees.
> >
> > > -Original Message-
> > > From: IBM Mainframe Discussion List [mailto:IBM-m...@listserv.ua.edu] On Behalf Of Edward Finnell
> > > Sent: 28 February, 2017 23:47
> > > To: IBM-MAIN@LISTSERV.UA.EDU
> > > Subject: Check out Massive Amazon cloud service outage disrupts sites
> > >
> > > _Massive Amazon cloud service outage disrupts sites_
> > > (http://www.usatoday.com/story/tech/news/2017/02/28/amazons-cloud-service-goes-down-sites-scramble/98530914/)
> > >
> > > Wondered why traffic was a little off.
>
> --
> zMan -- "I've got a mainframe and I'm not afraid to use it"

--
For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN

For information, services and offers, please visit our web site: http://www.klm.com. This e-mail and any attachment may contain confidential and privileged material intended for the addressee only. If you are not the addressee, you are notified that no part of the e-mail or any attachment may be disclosed, copied or distributed, and that any other action related to this e-mail or attachment is strictly prohibited, and may be unlawful. If you have received this e-mail by error, please notify the sender immediately by return e-mail, and delete this message.
Koninklijke Luchtvaart Maatschappij NV (KLM), its subsidiaries and/or its employees shall not be liable for the incorrect or incomplete transmission of this e-mail or any attachments, nor responsible for any delay in receipt. Koninklijke Luchtvaart Maatschappij N.V. (also known as KL
Re: Check out Massive Amazon cloud service outage disrupts sites
There's a great (though probably apocryphal) story of the Xerox STAR product being installed in directory /bin/star. Tech support is on phone with customer, they decide to whack it and start over:

"Type rm dash rf slash bin slash star"
"What's happening?"
"It's taking a really long time..."

Like I said, probably apocryphal, but...
Re: Check out Massive Amazon cloud service outage disrupts sites
On Fri, Mar 3, 2017 at 1:09 PM, Paul Gilmartin <000433f07816-dmarc-requ...@listserv.ua.edu> wrote:

> On Fri, 3 Mar 2017 11:19:52 -0600, John McKown wrote:
> >
> >> The outage was caused by a typo.
> >
> >Ah, yes. The UNIX community has the legend of the system administrator
> >(aka "root") who meant to remove all single character files via "rm -f ?"
> >who typed in "rm -f /" OOPS!
> >
> Can even superuser unlink a non-empty directory?

No. But that would unlink all empty directories and files. Not that there should be many such. Actual stupid is: rm -rf /

> -- gil

--
"Irrigation of the land with seawater desalinated by fusion power is ancient. It's called 'rain'." -- Michael McClary, in alt.fusion

Maranatha! <><
John McKown
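The answer to gil's question can be checked safely in a scratch directory. The sketch below is only an illustration in Python, using the standard-library calls `os.rmdir` and `shutil.rmtree` as stand-ins for a plain directory removal and for a recursive "rm -r"; the directory and file names are made up, and everything happens inside a temporary directory.

```python
import os
import shutil
import tempfile

# Work inside a throwaway temporary directory; nothing outside it is touched.
base = tempfile.mkdtemp()
victim = os.path.join(base, "victim")
os.mkdir(victim)
with open(os.path.join(victim, "a"), "w") as f:
    f.write("data")

# Removing a directory that still has entries is refused by the system call.
try:
    os.rmdir(victim)
    removed_plain = True
except OSError:
    removed_plain = False

# A recursive removal deletes the contents first, then the directory itself.
shutil.rmtree(victim)
removed_recursive = not os.path.exists(victim)

shutil.rmtree(base)  # clean up the scratch area
print(removed_plain, removed_recursive)  # prints "False True"
```

The refusal comes from the directory not being empty rather than from permissions, which is why the recursive form is the dangerous one.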
Re: Check out Massive Amazon cloud service outage disrupts sites
Hi,

Become root, navigate to /, enter rm -rf *. Boom. Every Unix/Linux admin has done a variation of this, usually only once!

As a MF guy who has dabbled in the dark side (not as much as John McK!), I always thought it would be fun to do exactly that from z/OS UNIX and watch the MVS syslog. :-)

Happy Friday,
BobL

This e-mail transmission may contain information that is proprietary, privileged and/or confidential and is intended exclusively for the person(s) to whom it is addressed. Any use, copying, retention or disclosure by any person other than the intended recipient or the intended recipient's designees is strictly prohibited. If you are not the intended recipient or their designee, please notify the sender immediately by return e-mail and delete all copies. OppenheimerFunds may, at its sole discretion, monitor, review, retain and/or disclose the content of all email communications.
Re: Check out Massive Amazon cloud service outage disrupts sites
On 3 March 2017 at 13:51, John McKown wrote:
> On Fri, Mar 3, 2017 at 12:31 PM, J R wrote:
>
>> Maybe a side effect of Agile / DevOps.
>>
>> Ah, well, time for a Scrum!
>
> Software, rugby or drink?
> http://stackoverflow.com/questions/11469358/what-is-the-difference-between-scrum-and-agile-development
>
> https://untappd.com/b/facer-s-scrum-dragon/1936165

http://dilbert.com/strip/2017-02-06

Tony H.
Re: Check out Massive Amazon cloud service outage disrupts sites
On Fri, 3 Mar 2017 11:19:52 -0600, John McKown wrote:
>
>> The outage was caused by a typo.
>
>Ah, yes. The UNIX community has the legend of the system administrator
>(aka "root") who meant to remove all single character files via "rm -f ?"
>who typed in "rm -f /" OOPS!
>
Can even superuser unlink a non-empty directory?

-- gil
Re: Check out Massive Amazon cloud service outage disrupts sites
On Fri, Mar 3, 2017 at 12:31 PM, J R wrote:

> Maybe a side effect of Agile / DevOps.
>
> Ah, well, time for a Scrum!

Software, rugby or drink?
http://stackoverflow.com/questions/11469358/what-is-the-difference-between-scrum-and-agile-development
https://untappd.com/b/facer-s-scrum-dragon/1936165

John McKown
Re: Check out Massive Amazon cloud service outage disrupts sites
Maybe a side effect of Agile / DevOps.

Ah, well, time for a Scrum!
Re: Check out Massive Amazon cloud service outage disrupts sites
On Fri, Mar 3, 2017 at 2:40 AM, Vernooij, Kees (ITOPT1) - KLM <kees.verno...@klm.com> wrote:

> The outage was caused by a typo.

Ah, yes. The UNIX community has the legend of the system administrator (aka "root") who meant to remove all single character files via "rm -f ?" who typed in "rm -f /" OOPS!

> I remember there were times we made scripts and tested them on our test
> environments, to avoid silly errors in the production environment...
>
> Kees.

John McKown
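For anyone who has not met the shell wildcard in the legend: "?" matches exactly one character, so "rm -f ?" targets only single-character names, and on a US keyboard "?" is the shifted "/" key. A minimal sketch of the pattern itself, using Python's standard `fnmatch` module rather than a real shell; the file names are invented for the example.

```python
from fnmatch import fnmatchcase

# Invented directory listing, for illustration only.
names = ["a", "b", "ab", "x1", "notes.txt"]

# "?" matches any single character, so only one-character names qualify.
single = [n for n in names if fnmatchcase(n, "?")]
print(single)  # prints "['a', 'b']"
```

Nothing is deleted here; the point is only how small the intended target set was compared with "/".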
Re: Check out Massive Amazon cloud service outage disrupts sites
When I see things like this I am always amazed how IT has changed. Acting like that in the previous century was not the way professionals worked. Of course, there was a test, a development and a production environment.

But what really puzzled me was that billing did not work the way they had expected it to. Architecture? Application Design? Sizing? Testing??? Or is it just coding nowadays?

Mike
Re: Check out Massive Amazon cloud service outage disrupts sites
Bill Woodger wrote:

>>...giving a link to this [honest] post mortem by the AWS:
>>https://aws.amazon.com/message/41926/

>I'm not sure "honest" is the exact word I'd use to describe what Amazon writes :-).

This is why I put that word in those brackets, because I also barely read that airplane magazine junk you mentioned, like this one:

>"Finally, we want to apologize for the impact this event caused for our customers. While we are proud of our long track record of availability with Amazon S3, we know how critical this service is to our customers, their applications and end users, and their businesses. We will do everything we can to learn from this event and use it to improve our availability even further."

I would also ask: why 'finally'?

>As has been said, don't you test it first? With something of ever-increasing-scale you don't even rely on "well, it worked OK six months ago".

Do they have a sandbox to do their testing? Was that guy not supervised or peer reviewed at all?

>They were "debugging". It was a "billing" problem. Something causing the billing to "progress more slowly than expected" (does that really sound so bad?). Debugging billing on a live system, and they lose vast numbers of business-availability-hours across vast numbers of websites? Debugging? Really? Seriously? And they can get away with that?

I nearly spilled my coffee when I saw that 'debugging' thing on a live system. Just like you, I also think it is just a standard PR thing. They're just pacifying journalists, shareholders, bosses and their users.

>Move along, please, nothing to see here.

Just a virtual police line. No rubberneckers here, move on!

Groete / Greetings
Elardus Engelbrecht
Re: Check out Massive Amazon cloud service outage disrupts sites
On Fri, 3 Mar 2017 09:30:11 -0600, Elardus Engelbrecht wrote:

>Vernooij, Kees (ITOPT1) - KLM wrote:
>
> ...
>
>...giving a link to this [honest] post mortem by the AWS:
>
>https://aws.amazon.com/message/41926/
>
>Just a simple lame typo... ;-)
>
>Groete / Greetings
>Elardus Engelbrecht

The link to the Amazon release was in the article mentioned yesterday.

I'm not sure "honest" is the exact word I'd use to describe what Amazon writes :-). There's also some irony (for me) that the most obvious things on that web page are "by the way, take up our service" and "hey, you can even do it for free".

Here's an example of how "well crafted" the item is:

"Finally, we want to apologize for the impact this event caused for our customers. While we are proud of our long track record of availability with Amazon S3, we know how critical this service is to our customers, their applications and end users, and their businesses. We will do everything we can to learn from this event and use it to improve our availability even further."

Why "finally"? Isn't that the first thing they want to do? Why is it an "event", which doesn't sound very bad? After all, event happens, it's just often spelled differently in that phrase. And it is not lessons learned to improve availability. It is to "improve our availability even further". So it was a good thing.

So, full disclosure, everything in the open. Whoops. Somehow it is convenient not to mention or address HOW DID THAT EVER HAPPEN IN THE FIRST PLACE. As has been said, don't you test it first? With something of ever-increasing-scale you don't even rely on "well, it worked OK six months ago".

"The Amazon Simple Storage Service (S3) team was debugging an issue causing the S3 billing system to progress more slowly than expected."

They were "debugging". It was a "billing" problem. Something causing the billing to "progress more slowly than expected" (does that really sound so bad?). Debugging billing on a live system, and they lose vast numbers of business-availability-hours across vast numbers of websites? Debugging? Really? Seriously? And they can get away with that?

Yes, it's all in there. Sort of. Standard PR technique to reveal "everything" so that no-one digs into the revelations, because the revelatory work of the journalist is already done by Amazon themselves.

Move along, please, nothing to see here.
Re: Check out Massive Amazon cloud service outage disrupts sites
Vernooij, Kees (ITOPT1) - KLM wrote:

Thanks for giving the sources.

You can look at https://en.wikipedia.org/wiki/Timeline_of_Amazon_Web_Services where it says:

"On March 2, AWS reveals that the outage was caused by an incorrect parameter passed in by an authorized employee while debugging that ended up deleting more instances than the employee intended."

citing this source:

http://venturebeat.com/2017/03/02/aws-apologizes-for-february-28-outage-takes-steps-to-prevent-similar-events/

...giving a link to this [honest] post mortem by the AWS:

https://aws.amazon.com/message/41926/

Just a simple lame typo... ;-)

Groete / Greetings
Elardus Engelbrecht
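The failure mode quoted above (one wrong parameter removing more instances than intended) is the sort of thing a pre-flight check in the tool can catch. A minimal sketch, assuming a hypothetical fleet-management command: none of these names come from AWS, and the 10% cap and server names are arbitrary values picked for illustration.

```python
# Hypothetical guard for a capacity-removal command. Nothing here is
# AWS tooling; names and the 10% cap are assumptions for illustration.

class RemovalRefused(Exception):
    pass


def plan_removal(fleet, requested, max_fraction=0.10, min_remaining=1):
    """Return the sorted servers to remove, or raise if the request looks unsafe."""
    fleet = set(fleet)
    requested = set(requested)
    unknown = requested - fleet
    if unknown:
        raise RemovalRefused(f"unknown servers: {sorted(unknown)}")
    if len(requested) > max_fraction * len(fleet):
        raise RemovalRefused(
            f"{len(requested)} of {len(fleet)} servers exceeds the "
            f"{max_fraction:.0%} cap"
        )
    if len(fleet) - len(requested) < min_remaining:
        raise RemovalRefused("too little capacity would remain")
    return sorted(requested)


fleet = [f"srv{n:03d}" for n in range(100)]
ok = plan_removal(fleet, ["srv001", "srv002"])   # small, plausible request
try:
    plan_removal(fleet, fleet[:60])              # the "typo" case: far too many
    refused = False
except RemovalRefused:
    refused = True
print(ok, refused)
```

The design choice is simply to make the tool, not the operator, responsible for noticing that a request is out of proportion to the fleet.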
Re: Check out Massive Amazon cloud service outage disrupts sites
http://www.geekwire.com/2017/amazon-explains-massive-aws-outage-says-employee-error-took-servers-offline-promises-changes/

Kees.
Re: Check out Massive Amazon cloud service outage disrupts sites
http://www.pcmag.com/news/352160/a-typo-took-amazon-s3-offline

Dan
Re: Check out Massive Amazon cloud service outage disrupts sites
Since I can't find any other references, the full translation of the Dutch article is:

Seattle - A simple typing error was the cause of Tuesday's global failure on the internet. That was announced by Amazon Web Services (AWS) on Thursday. An employee was trying to fix a problem with Amazon's cloud servers and had to turn some of them off. However, he mistakenly gave the command to disable a range of servers. That triggered a chain reaction in which the computers had to be restarted. Websites such as Buzzfeed, Expedia and Medium were barely reachable for hours. Snapchat also suffered from the failure. In the end, the problem lasted four hours.

Kees.
Re: Check out Massive Amazon cloud service outage disrupts sites
It was on Dutch news sites, with the text: "Amazon Web Services (AWS) announced this on Thursday".

Kees.
Re: Check out Massive Amazon cloud service outage disrupts sites
Cite? (Not challenging you, interested!)

--
zMan -- "I've got a mainframe and I'm not afraid to use it"
Re: Check out Massive Amazon cloud service outage disrupts sites
The outage was caused by a typo.
I remember there were times we made scripts and tested them on our test environments, to avoid silly errors in the production environment...

Kees.

> -----Original Message-----
> From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Edward Finnell
> Sent: 28 February, 2017 23:47
> To: IBM-MAIN@LISTSERV.UA.EDU
> Subject: Check out Massive Amazon cloud service outage disrupts sites
>
> _Massive Amazon cloud service outage disrupts sites_
> (http://www.usatoday.com/story/tech/news/2017/02/28/amazons-cloud-service-goes-down-sites-scramble/98530914/)
>
> Wondered why traffic was a little off.
Re: Check out Massive Amazon cloud service outage disrupts sites
l...@garlic.com (Anne & Lynn Wheeler) writes:
> trivia: I had worked with Jim Gray at IBM SJR ... before he left for
> Tandem. At Tandem he does a detailed analysis of failure modes, finding
> that hardware was in the process of becoming significantly more reliable
> ... and failures were starting to shift to human error, software bugs,
> and environmental (power, acts of nature, etc) ... copy of summary from
> that study
> http://www.garlic.com/~lynn/grayft84.pdf

re:
http://www.garlic.com/~lynn/2017c.html#13 Check out Massive Amazon cloud service outage disrupts sites
http://www.garlic.com/~lynn/2017c.html#14 Check out Massive Amazon cloud service outage disrupts sites

Amazon knocked AWS sites offline because of typo
http://www.zdnet.com/article/amazon-knocked-aws-sites-offline-because-of-typo/

"Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended."

... my wife was in the gburg JES group when she was conned into going to POK to be in charge of loosely-coupled architecture. While there she did peer-coupled shared data architecture, which saw little uptake (except for IMS hot-standby) until SYSPLEX and Parallel SYSPLEX. She didn't remain long, in part because of the little uptake and in part because of constant battles with the communication group trying to force her into using SNA/VTAM for loosely-coupled operation.

long after we had left IBM, we used to periodically drop in on the guy responsible for running one of the largest financial transaction networks. He attributed their 100% availability over an extended period of time to

* geographically separated, triple-replicated IMS hot-standby
* automated operator ... aka taking human error out of the loop

old reference to having done work on operator automation in the early 70s ... originally for running automated unattended benchmarking ... that included automated system reboot, with possible reconfiguration &/or a different kernel, between each benchmark.
http://www.garlic.com/~lynn/94.html#2

--
virtualization experience starting Jan1968, online at home since Mar1970
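The "larger set of servers was removed than intended" failure quoted above is the kind of mistake a reasonableness check in front of the command can catch. What follows is only a minimal Python sketch of that idea, not a description of Amazon's tooling: the function names `expand_range` and `check_removal`, the 5 percent limit, and the fleet size are all assumptions made up for illustration.

```python
def expand_range(first, last):
    """Expand an inclusive numeric range, refusing anything that would wrap around."""
    if first < 1 or last < first:
        raise ValueError(f"suspicious range {first}-{last}; refusing to expand")
    return list(range(first, last + 1))


def check_removal(targets, fleet_size, max_percent=5):
    """Return the targets unchanged if the request stays within the limit.

    The limit is a hypothetical policy: no single command may touch more
    than max_percent of the fleet without some further confirmation step.
    """
    targets = list(targets)
    limit = max(1, fleet_size * max_percent // 100)
    if len(targets) > limit:
        raise ValueError(
            f"request touches {len(targets)} of {fleet_size} items; "
            f"limit is {limit}, ask for confirmation instead"
        )
    return targets


if __name__ == "__main__":
    # A small, well-formed request passes both checks.
    print(check_removal(expand_range(247, 249), fleet_size=1000))
    # A mistyped upper bound is rejected before anything is touched.
    try:
        expand_range(247, 240)
    except ValueError as err:
        print("rejected:", err)
```

The point of the design is that both checks run before any destructive action, so a mistyped input fails loudly instead of being interpreted generously.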
Re: Check out Massive Amazon cloud service outage disrupts sites
charl...@mcn.org (Charles Mills) writes:
> It is hard to prepare for unknown unknowns. It is legendary that people have
> had recovery failures because the fallover switch (channel, power, network,
> whatever) failed.

re:
http://www.garlic.com/~lynn/2017c.html#13 Check out Massive Amazon cloud service outage disrupts sites

unks unks are frequently the result of not having done detailed end-to-end evaluations of all possible scenarios.

we started out claiming no-single-point-of-failure ... but it required doing end-to-end walk-throughs looking for all sorts of critical components (I was also asked to review IBM RAID designs and sometimes would uncover a single-point-of-failure in the least anticipated places). This included replicated fallover switches and process over precedence of failover (as part of handling some race conditions).

Also needed an inverse "RESERVE" ... there is a failure case where a processor gets suspended just before a write operation, which then kicks off recovery processes. The processor that is assumed to have failed has to be "fenced off" from proceeding with the write operation when it wakes up (aka RESERVE allows only one processor to write and prevents all others; inverse "RESERVE" blocks one or more identified processors from writing; there also has to be a tie-breaker process for race conditions).

"real" no-single-point-of-failure contributed to having to specify geographically separated operation. we also started defining what was needed to handle multiple points of failure ... and looking at 5-nines availability configurations.
http://www.garlic.com/~lynn/submain.html#available

As an undergraduate at the univ ... I was first hired as a fulltime person to be responsible for IBM production mainframe systems. Then before graduation, I was hired fulltime by Boeing to help with the creation of Boeing Computer Services (consolidate all dataprocessing in an independent business unit to better monetize the investment, including offering services to non-Boeing entities). I thought the Renton datacenter was possibly the largest in the world, with something like $300M (late 60s dollars) in ibm mainframes (360/65s were arriving faster than they could be installed). 747#3 was flying the skies of Seattle getting FAA flt. certification. There was also a decision to replicate Renton up at the new 747 plant in Everett (there was a disaster scenario where Mt. Rainier heats up and the resulting mudslide takes out the Renton datacenter). I finally joined IBM (science center) after graduation ... some past posts
http://www.garlic.com/~lynn/subtopic.html#545tech

One of the things that was exposed in the 70s with respect to IBM dasd ... was that IBM mainframe channel/dasd had (for some time, up through the 80s) an undetectable power interruption failure mode in the middle of a write operation ... control/dasd had sufficient power to complete the write correctly, but there wasn't sufficient power to transfer data from processor memory ... so the record write was completed with all zeros and valid error correcting codes. In the CMS case, the MFD is somewhat the equivalent of the OS VTOC; a change was made to have pairs of alternating MFD records, with a sequence number appended. A power-interrupted MFD write would zero all or part of the appended sequence ... so it wouldn't appear most current during recovery (and the other MFD would be used). Towards the mid-80s there was controller work to try and handle the case (for operating systems that didn't know how). The later hardware solution was that all the data had to be available before the write could start.

later they let me play disk engineer in bldgs 14&15 ... some past posts
http://www.garlic.com/~lynn/subtopic.html#disk

they had a bunch of mainframes for dasd engineering testing that were scheduled stand-alone 7x24 around the clock. They had once tried to do testing under MVS ... but in that environment, MVS had 15min MTBF, requiring manual re-ipl. I offered to redo the input/output supervisor so that it was bullet proof and never failed ... greatly improving productivity, allowing anytime, on-demand concurrent testing. When I wrote up the work in an internal report, I may have made a mistake mentioning the MVS 15min MTBF ... because I was later told that the MVS RAS group did their best to have me separated from the company. A couple years later ... field engineering had a 3880 controller error regression test with 57 "injected" errors (that they considered typical and likely to occur). MVS was failing in all 57 cases (requiring manual re-ipl) ... and in 2/3rds of the cases, there was no indication of what was responsible for the failure ... previously posted old email
http://www.garlic.com/~lynn/2007.html#email801015

trivia: I had worked with Jim Gray at IBM SJR ... before he left for Tandem. At Tandem he does a detailed analysis of failure modes, finding that hardware was in the process of becoming significantly more reliable ... and failures were starting to shift to human error, software bugs, and environmental (power, acts of nature, etc) ... copy of summary from that study
http://www.garlic.com/~lynn/grayft84.pdf
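The inverse "RESERVE" described in the post above can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the HA/CMP implementation: the class `SharedDisk`, its methods, and the node names "A" and "B" are hypothetical, and the tie-breaker for race conditions that the post mentions is deliberately left out.

```python
class SharedDisk:
    """Toy shared device that can block writes from specific nodes."""

    def __init__(self):
        self.blocks = {}     # block number -> data
        self.fenced = set()  # node ids that are not allowed to write

    def fence(self, node_id):
        """Inverse reserve: block one identified node from writing."""
        self.fenced.add(node_id)

    def unfence(self, node_id):
        """Let a repaired node back in."""
        self.fenced.discard(node_id)

    def write(self, node_id, block_no, data):
        # The check lives on the device side, because the suspended node
        # does not know that it has been declared failed.
        if node_id in self.fenced:
            raise PermissionError(f"node {node_id} is fenced off")
        self.blocks[block_no] = data


if __name__ == "__main__":
    disk = SharedDisk()
    disk.write("A", 0, b"old")
    # Node A stalls just before its next write; recovery assumes it failed.
    disk.fence("A")
    disk.write("B", 0, b"new")  # node B takes over
    try:
        disk.write("A", 0, b"stale")  # A wakes up and retries its write
    except PermissionError as err:
        print("blocked:", err)
    print(disk.blocks[0])
```

The design choice worth noting is that the fence is enforced where the write lands rather than by asking the suspect node to behave, since that node cannot be trusted to know its own state.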
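The alternating MFD records with an appended sequence number, as described in the post above, can also be sketched in a few lines of Python. This is an illustration only: the class `PairedRecord`, the 8-byte sequence field, and the `interrupted` flag are assumptions for the example and not the CMS on-disk format, and only the all-zeros case of an interrupted write is simulated.

```python
import struct

SEQ = struct.Struct(">Q")  # hypothetical 8-byte sequence number appended to the payload


class PairedRecord:
    """Two alternating copies of one record; the higher sequence number wins."""

    def __init__(self, payload_size):
        self.payload_size = payload_size
        size = payload_size + SEQ.size
        # All zeros means sequence 0, i.e. never validly written.
        self.slots = [bytes(size), bytes(size)]

    def _seq(self, slot):
        return SEQ.unpack(self.slots[slot][-SEQ.size:])[0]

    def current(self):
        """Index of the slot that recovery would treat as most current."""
        return 0 if self._seq(0) >= self._seq(1) else 1

    def read(self):
        return self.slots[self.current()][:self.payload_size]

    def write(self, payload, interrupted=False):
        if len(payload) != self.payload_size:
            raise ValueError("payload has the wrong size")
        cur = self.current()
        target = 1 - cur  # always overwrite the older copy
        record = payload + SEQ.pack(self._seq(cur) + 1)
        if interrupted:
            # Simulate the power failure: the record lands as all zeros.
            record = bytes(len(record))
        self.slots[target] = record


if __name__ == "__main__":
    mfd = PairedRecord(payload_size=4)
    mfd.write(b"AAAA")
    mfd.write(b"BBBB")
    mfd.write(b"CCCC", interrupted=True)
    print(mfd.read())
```

Because the zeroed copy carries sequence 0, it can never look newer than the surviving copy, so recovery falls back to the last good record without needing to detect the failure directly.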
Re: Check out Massive Amazon cloud service outage disrupts sites
> On Mar 1, 2017, at 11:40 AM, Barry Merrill wrote:
>
> John Deere's data center in the 70s had multiple AC providers but still had
> Midwest lightning induced outages, but the data center manager was
> unsuccessful in installing a diesel backup system because the only UPS system
> that met their power needs was available ONLY with a Caterpillar Diesel
> engine, and the board would not approve.
>
> After the third or fourth outage, the data center manager was finally allowed
> to build a GREEN windowless building to house the diesel and generator, and
> they snuck the yellow Cat engine into its home in the middle of the night.
>
> Barry Merrill
--SNIP--

In that same time frame we had a DC in downtown Chicago. There was a suggestion that we move the DC to another building, but the building didn't have access to two different power grids (I think that was the term), so the idea was delayed until the change happened. We moved the DC and have not had a power outage since (although it could have happened after I left, I doubt it).

Ed
Re: Check out Massive Amazon cloud service outage disrupts sites
It is hard to prepare for unknown unknowns. It is legendary that people have had recovery failures because the fallover switch (channel, power, network, whatever) failed.

Charles
Re: Check out Massive Amazon cloud service outage disrupts sites
jesse1.robin...@sce.com (Jesse 1 Robinson) writes:
> Our data center folks insist on dual power feeds for everything,
> sometimes infuriatingly so. To test power redundancy, they
> occasionally drop one power feed or the other--with ample heads
> up--and check that all devices are functioning. Other than call-home
> events, we have not had any surprises so far.

when we were doing IBM's ha/cmp product ... some past posts
http://www.garlic.com/~lynn/subtopic.html#hacmp

we went around to various customers talking about failure modes. One customer in Manhattan had carefully chosen a building that had telco feeds from four different substations on four sides of the bldg, power from different substations on opposite sides of the bldg, and water from different water mains on opposite sides of the bldg. The datacenter was shut down when a transformer in the basement exploded and the bldg had to be evacuated because of contamination.

There were a number of other customers with similar stories. It was while out talking to customers that I coined disaster survivability (to differentiate from disaster recovery) and geographic survivability. some past posts
http://www.garlic.com/~lynn/submain.html#availability

I was then asked to write a section for the corporate continuous availability strategy document. The section got pulled when both Rochester (as/400) and POK (es/9000) complained that they weren't able to meet the requirements.

trivia: the mainframe DB2 group were also complaining that if I was allowed to proceed with (commercial) HA/CMP cluster scaleup, it would be at least 5yrs ahead of them. old post about Jan1992 meeting in Ellison's conference room
http://www.garlic.com/~lynn/95.html#13

--
virtualization experience starting Jan1968, online at home since Mar1970
Re: Check out Massive Amazon cloud service outage disrupts sites
John Deere's data center in the 70s had multiple AC providers but still had Midwest lightning-induced outages, but the data center manager was unsuccessful in installing a diesel backup system because the only UPS system that met their power needs was available ONLY with a Caterpillar Diesel engine, and the board would not approve.

After the third or fourth outage, the data center manager was finally allowed to build a GREEN windowless building to house the diesel and generator, and they snuck the yellow Cat engine into its home in the middle of the night.

Barry Merrill

Merrilly yours,

Herbert W. Barry Merrill, PhD
President-Programmer
Merrill Consultants
MXG Software
10717 Cromwell Drive
Dallas, TX 75229
http://www.mxg.com
technical questions: supp...@mxg.com
admin questions: ad...@mxg.com
tel: 214 351 1966
fax: 214 350 3694
Re: Check out Massive Amazon cloud service outage disrupts sites
On Wed, 1 Mar 2017 09:02:47 -0800, Charles Mills (charl...@mcn.org) wrote about "Re: Check out Massive Amazon cloud service outage disrupts sites" (in <04c101d292ad$a751f750$f5f5e5f0$@mcn.org>):

[snip]
> Up here in northern CA we used to joke that PG's company song
> should be the Simon & Garfunkel "Hello Darkness my Old Friend."

Given that the song's title is "Sound of Silence", it would apply to stereo systems too. ... :-))

--
Regards,

Dave [RLU #314465]
david.w.n...@googlemail.com (David W Noon)
Re: Check out Massive Amazon cloud service outage disrupts sites
I heard about a major electric utility that had an influential VP who believed that it was 'unseemly' for the company to spend money on backup power. Sort of dissing their own product. He blocked all efforts to install diesel generators. Thankfully he is long gone. A major multi-state event in the mid-2000s induced the company to finally invest in serious power backup and DR. I love the old saw about the cobbler's children going without shoes.

.
.
J.O.Skip Robinson
Southern California Edison Company
Electric Dragon Team Paddler
SHARE MVS Program Co-Manager
323-715-0595 Mobile
626-543-6132 Office ⇐=== NEW
robin...@sce.com
Re: Check out Massive Amazon cloud service outage disrupts sites
And you're in the electric power BUSINESS!

Up here in northern CA we used to joke that PG's company song should be the Simon & Garfunkel "Hello Darkness my Old Friend."

Charles
Re: Check out Massive Amazon cloud service outage disrupts sites
Our data center folks insist on dual power feeds for everything, sometimes infuriatingly so. To test power redundancy, they occasionally drop one power feed or the other--with ample heads up--and check that all devices are functioning. Other than call-home events, we have not had any surprises so far. Testing for a full power outage (both sides) is a lot harder, but it has been done here a few times.

.
.
J.O.Skip Robinson
Southern California Edison Company
Electric Dragon Team Paddler
SHARE MVS Program Co-Manager
323-715-0595 Mobile
626-543-6132 Office ⇐=== NEW
robin...@sce.com

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Mike Beer
Sent: Wednesday, March 01, 2017 5:45 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: (External):Re: Check out Massive Amazon cloud service outage disrupts sites

>Ah, probably happened when the floor maintenance technician (janitor in old-speak) repurposed an electrical outlet.

Usually large datacenters test their emergency procedures (at least once a year) - unfortunately the redundant power supply does not always work. And some servers might not resume work automatically...

Mike
Re: Check out Massive Amazon cloud service outage disrupts sites
>Ah, probably happened when the floor maintenance technician (janitor in old-speak) repurposed an electrical outlet.

Usually large datacenters test their emergency procedures (at least once a year) - unfortunately the redundant power supply does not always work. And some servers might not resume work automatically...

Mike
Re: Check out Massive Amazon cloud service outage disrupts sites
On Tue, Feb 28, 2017 at 5:52 PM, Edward Finnell <000248cce9f3-dmarc-requ...@listserv.ua.edu> wrote:

> The notice says check the Dashboard for updates. They seem to have one or
> two of these a year. So far they've been pretty tight-lipped about it.

Ah, probably happened when the floor maintenance technician (janitor in old-speak) repurposed an electrical outlet.

--
"Irrigation of the land with seawater desalinated by fusion power is ancient. It's called 'rain'." -- Michael McClary, in alt.fusion

Maranatha! <><
John McKown
Re: Check out Massive Amazon cloud service outage disrupts sites
On Tue, 28 Feb 2017 17:46:46 -0500, Edward Finnell wrote:

>_Massive Amazon cloud service outage disrupts sites_
>(http://www.usatoday.com/story/tech/news/2017/02/28/amazons-cloud-service-goes-down-sites-scramble
>/98530914/)
>
>Wondered why traffic was a little off.

That URL is broken by wrap. Google gives me one that works:
http://www.usatoday.com/story/tech/news/2017/02/28/amazons-cloud-service-goes-down-sites-scramble/98530914/

-- gil
Re: Check out Massive Amazon cloud service outage disrupts sites
Yep, I've had the dashboard up since this morning. I'll be interested to learn what (if any) recourse paying customers have if they can say (and prove): "I lost $ due to this outage"

VMs that don't (automatically?) have access to the replicated data within the AZ? That would be bad. More to come, I'm sure!

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Edward Finnell
Sent: Tuesday, February 28, 2017 4:52 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Check out Massive Amazon cloud service outage disrupts sites [ EXTERNAL ]

The notice says check the Dashboard for updates. They seem to have one or two of these a year. So far they've been pretty tight-lipped about it.

In a message dated 2/28/2017 5:35:37 P.M. Central Standard Time, bles...@ofiglobal.com writes:

I'm doing an AWS class and was online fiddling with my EC2 instance when it happened. I use us-west-1a (I'm in Colorado), but there was some increased response even there - maybe as they moved resources? I thought this was not supposed to happen, ever. Should be interesting to see what we learn going forward. Thanks!

This e-mail transmission may contain information that is proprietary, privileged and/or confidential and is intended exclusively for the person(s) to whom it is addressed. Any use, copying, retention or disclosure by any person other than the intended recipient or the intended recipient's designees is strictly prohibited. If you are not the intended recipient or their designee, please notify the sender immediately by return e-mail and delete all copies. OppenheimerFunds may, at its sole discretion, monitor, review, retain and/or disclose the content of all email communications.
Re: Check out Massive Amazon cloud service outage disrupts sites
The notice says check the Dashboard for updates. They seem to have one or two of these a year. So far they've been pretty tight-lipped about it.

In a message dated 2/28/2017 5:35:37 P.M. Central Standard Time, bles...@ofiglobal.com writes:

I'm doing an AWS class and was online fiddling with my EC2 instance when it happened. I use us-west-1a (I'm in Colorado), but there was some increased response even there - maybe as they moved resources? I thought this was not supposed to happen, ever. Should be interesting to see what we learn going forward. Thanks!
Re: Check out Massive Amazon cloud service outage disrupts sites
Hi Ed!

I'm doing an AWS class and was online fiddling with my EC2 instance when it happened. I use us-west-1a (I'm in Colorado), but there was some increased response even there - maybe as they moved resources? I thought this was not supposed to happen, ever. Should be interesting to see what we learn going forward. Thanks!

BobL

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Edward Finnell
Sent: Tuesday, February 28, 2017 3:47 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Check out Massive Amazon cloud service outage disrupts sites [ EXTERNAL ]

_Massive Amazon cloud service outage disrupts sites_
(https://urldefense.proofpoint.com/v2/url?u=http-3A__www.usatoday.com_story_tech_news_2017_02_28_amazons-2Dcloud-2Dservice-2Dgoes-2Ddown-2Dsites-2Dscramble=DwICAg=huW-Z3760n7oNORvLCN2eJBo4X7nIGCr9Ffht-z0f4k=1KMMjoSvFEwY7ZoooplFIrKcOeeTJVI4X6Bc3o6vdK4=jyMISBULPMnoU7rInNUdpCpFoRdcz0WuaEppIETiqMI=70jvEDM3OSBtkoXSmN-K_b_Z1tgmShl1K5OIN0hq5gE=
/98530914/)

Wondered why traffic was a little off.
Check out Massive Amazon cloud service outage disrupts sites
_Massive Amazon cloud service outage disrupts sites_
(http://www.usatoday.com/story/tech/news/2017/02/28/amazons-cloud-service-goes-down-sites-scramble
/98530914/)

Wondered why traffic was a little off.