Re: [uknof] Example of total DC loss

2017-06-13 Thread Paul Mansfield
not an ISP or datacentre outage, but somewhat similar:

http://www.independent.co.uk/news/uk/home-news/victoria-line-cement-flooding-fixed-workers-used-sugar-to-stop-spilled-concrete-from-setting-9082206.html


I've had water leaks from aircon systems in computer rooms, but never
had to deal with a flood of concrete!



Re: [uknof] Example of total DC loss

2017-06-05 Thread Neil J. McRae
Completely agree - especially when all the telemetry is saying - everything is 
ok!

Neil  

Sent from my iPhone

> On 3 Jun 2017, at 20:56, Nick Hilliard  wrote:
> 
> Neil J. McRae wrote:
>> That telehouse one below was 1997 I was in the building - it was bad but
>> amusing watching the telehouse ops guys running around like headless
>> chickens!
> 
> from personal experience, I can say that the silence which ensues from a
> power failure in a data centre is seriously creepy.
> 
> Also, due respects to anyone caught in the situation where the entire
> show has derailed.  The term "blind panic" doesn't do justice to the
> situation.
> 
> Nick



Re: [uknof] Example of total DC loss

2017-06-05 Thread David Derrick

On 03/06/2017 20:56, Nick Hilliard wrote:


from personal experience, I can say that the silence which ensues from a
power failure in a data centre is seriously creepy.


Oh yes.

Took me a moment to realise why it was so quiet after I'd caused the 
breaker to our Telehouse suite to pop. Followed by a very fast walk down 
to Ops. That was not a fun day.

--
David Derrick
Systems & Network Engineer
Entanet International Ltd
0330 100 0330



Re: [uknof] Example of total DC loss

2017-06-03 Thread Scott Weeks


--- n...@foobar.org wrote:
From: Nick Hilliard 

from personal experience, I can say that the silence which 
ensues from a power failure in a data centre is seriously 
creepy.
-

Yes, I've experienced that, too.  And the looks on everyone's 
faces during the silence are also creepy...

scott



Re: [uknof] Example of total DC loss

2017-06-03 Thread Paul Brown


Quoting Nick Hilliard :



from personal experience, I can say that the silence which ensues from a
power failure in a data centre is seriously creepy.



Oh yes - I was indirectly the cause of that happening at one hosting
location in the Midlands about ten years ago: when inserting a Dell
blade server into its chassis, the power rails somehow shorted and
dumped about 60A straight through some copper which wasn't man enough
for it, and which then proceeded to vaporise a chunk of the blade
motherboard and both multi-way connectors (blade side and chassis
side).


One of my guys who was working behind it ended up inhaling a load of
incredibly magic smoke, and the DC's VESDA install proved its worth by
SCRAMing the entire floor.


Yeah - took a lot of other customers out with it, and we were banned  
from installing Dell blades at that location ever again.


Not a day long outage - took about an hour to verify that the site was
good to restart, and to start feeding power in again, but yeah - the
eerie silence after the white noise of several thousand server fans
and suchlike.


I bought a hundred and fifty HP DL140G2's straight afterwards and  
started ripping out the Dell kit.


P.





Re: [uknof] Example of total DC loss

2017-06-03 Thread Simon Gunton
On 3 Jun 2017 8:56 pm, "Nick Hilliard"  wrote:


from personal experience, I can say that the silence which ensues from a
power failure in a data centre is seriously creepy.


I used to do the gentests in $DayJob-1 and yeh the moment the ATS switches
and the cooling goes silent and the house lights go out for a brief moment.
Oh and jumping out of your skin when the twang of the ATS springs kick in.

Simon


Re: [uknof] Example of total DC loss

2017-06-03 Thread Nick Hilliard
Neil J. McRae wrote:
> That telehouse one below was 1997 I was in the building - it was bad but
> amusing watching the telehouse ops guys running around like headless
> chickens!

from personal experience, I can say that the silence which ensues from a
power failure in a data centre is seriously creepy.

Also, due respects to anyone caught in the situation where the entire
show has derailed.  The term "blind panic" doesn't do justice to the
situation.

Nick



Re: [uknof] Example of total DC loss

2017-06-03 Thread Maria Blackmore
On 2 June 2017 at 15:06, Paul Brown  wrote:

> There was also the Telehouse incident in 1998 (I think - may have been
> '97) when somebody hit the Master Off switch in one of the suites rather
> than the Open Door one, and then lots of ISPs' kit responded badly to the
> power surge when the breaker was reset.
>
There was also an incident at Telehouse North in the early 2000s during a
full load generator test, when they simulated a generator failure. The
remaining generators couldn't handle the load step, and also shut down. As
they picked up the load, the UPS system feeding the B bus (iirc) suffered
an undervolt for a single cycle, which caused the UPS to go to bypass. The
bypass feed was connected to the grid breakers, which were open at the time
for the generator test, and everything became very quiet in Telehouse for
about six seconds. This was presumably how long it takes for someone to
swear viciously and close the grid breakers, at which point it became only
marginally louder as almost every single breaker on every single PDU on the
B bus tripped all at once.

Fun times.


-- 
Maria Blackmore
Professional Network Fairy


Re: [uknof] Example of total DC loss

2017-06-02 Thread Paul Brown
I'm hunting for examples of long duration data centre outages in
the UK, from a day of downtime to total data centre loss (explosion or
some other industrial accident).


Cable and Wireless Watford

http://www.telegraph.co.uk/news/uknews/2282427/Metal-thieves-to-blame-for-Sainsburys-website-blackout.html

IIRC they had several instances of this happening to that particular site.

Also, Manchester Guardian Tunnels (BT) fire

https://www.theguardian.com/uk/2004/mar/30/simonjeffery

Also, not a datacentre outage, but a critical infrastructure failure  
when the Colossus backbone imploded.


https://www.theregister.co.uk/2001/11/20/uk_hit_by_major_adsl/

Quite a few (Including IIRC a CapGemini site on behalf of the NHS) on  
Brakspear Way in Hemel were rendered inoperable/inaccessible after the  
Buncefield disaster.


http://www.theregister.co.uk/2005/12/12/oil_blast_northgate/

There was also the Telehouse incident in 1998 (I think - may have been  
'97) when somebody hit the Master Off switch in one of the suites  
rather than the Open Door one, and then lots of ISPs' kit responded  
badly to the power surge when the breaker was reset. Brief total  
outage (IIRC it took out a large chunk of LINX infrastructure briefly)  
and substantial impact for weeks afterwards as fragile PSUs in  
installed kit either failed on restart, or suffered premature death -  
I know of one Ascend MAX that never came back to life.


Paul


Re: [uknof] Example of total DC loss

2017-06-02 Thread Tom Hill
On 02/06/17 12:29, Peter Knapp wrote:
> FWIW this took out a load of subtend exchanges as well. Even
> “passthru” services that, or so you would believe, relied only on the
> ODF were affected, although I understand this was an accident rather
> than because EADs had optical amps in the building. Someone commented
> (at the time) that somebody “put their foot in it”, quite literally, in
> badly managed fibre alongside one of the ODFs and ripped a bunch out.

This bodes well for DFA!

-- 
Tom Hill
Network Manager

Bytemark Hosting
http://www.bytemark.co.uk/
tel. +44 1904 890 890





Re: [uknof] Example of total DC loss

2017-06-02 Thread Pete Stevens

I’m hunting for examples of long duration data centre outages in the UK, 
from a day of downtime to total data centre loss
(explosion or some other industrial accident).


Northgate systems lost theirs after Buncefield blew up.

http://www.information-age.com/getting-back-to-business-after-buncefield-284331/

Didn't make many headlines because their DR plan was pretty good.

Pete

--
Pete Stevens
p...@ex-parrot.com
http://www.ex-parrot.com/~pete/

The last time humans crossed space to a destination was the Apollo 17 mission
in 1972. In the 32 years since, no man has seen, with his own eyes, Earth as
that beautiful, solitary blue sphere, and - reality check - no woman has ever
seen it at all.
   -- James Cameron

Re: [uknof] Example of total DC loss

2017-06-02 Thread Peter Knapp
You have to love some of the comments though:

yorkie71 20th January 2016
and when this happened I never knew BT Exchange being down would also impact 
Mobile phone signals who knew I thought the magic satellites/mobile masts 
did that


Aye Yorkie71, mobile phone backhaul is via Sooty’s magic wand.

FWIW this took out a load of subtend exchanges as well. Even “passthru” 
services that, or so you would believe, relied only on the ODF were affected, 
although I understand this was an accident rather than because EADs had optical 
amps in the building. Someone commented (at the time) that somebody “put their 
foot in it”, quite literally, in badly managed fibre alongside one of the ODFs 
and ripped a bunch out.


Much the same happened in Leeds centre, although the shadow exchange was 
pressed into service here.

From: uknof [mailto:uknof-boun...@lists.uknof.org.uk] On Behalf Of Alistair 
Cockeram
Sent: 02 June 2017 12:06
To: uknof@lists.uknof.org.uk
Subject: Re: [uknof] Example of total DC loss

On 1 June 2017 at 11:50, Simon Green <simon.gr...@wirehive.com> wrote:
I’m hunting for examples of long duration data centre outages in the UK, 
from a day of downtime to total data centre loss (explosion or some other 
industrial accident).
 [...]
Slightly more casually interested in BT exchanges as well.

York Stonebow.

http://www.yorkpress.co.uk/news/14214765.FLOODS__BT_investigating_how_to_prevent_repeat_of_outage_that_hit_50_000_properties/

--
Alistair Cockeram


Re: [uknof] Example of total DC loss

2017-06-01 Thread Paul Mansfield
The first one, I forget all the details but I think @Nick Sellors
who's probably a lurker on this list, may give more details, but as I
remember it...

Teledanmark (I think) had a PoP whose UPS needed upgrading. In
preparation they topped up the diesel and fired up the backup
generator, put the UPS into bypass and it was removed.
The lorry carrying the new generator was reversing onto the site, and
collided with the generator and took the site offline.


The second one, and unfortunately my google-fu has failed to find the
original text, is a story of how the night shift at a large computing
facility would get bored and play cricket using a ball made of
rolled-up paper and an improvised bat. The games got fiercer
and the players better. Then one day someone hit the "ball" really
hard for a perfect six and it hit the emergency power off. I seem to
recall the author saying the sudden silence was shocking.

The closest I could get to the story:
https://www.usenix.org/legacy/publications/bibliography/byDate.html
http://ftp.math.utah.edu/pub/tex/bib/usenix1990.html#Anonymous:1996:HSC



Re: [uknof] Example of total DC loss

2017-06-01 Thread Peter Knapp
Be*There, Global Crossing/Level 3 etc.

http://www.ispreview.co.uk/index.php/2013/09/major-broadband-outage-northampton-uk-business-park-fire.html

Peter

From: uknof [mailto:uknof-boun...@lists.uknof.org.uk] On Behalf Of Jack Kay
Sent: 01 June 2017 18:50
To: Simon Green
Cc: uknof@lists.uknof.org.uk
Subject: Re: [uknof] Example of total DC loss

https://www.theregister.co.uk/2015/11/18/telecity_outage_fix_failed/

The SOV outage wasn’t a day long, but the fallout and “AT RISK” period that 
followed spanned several days.


On 1 Jun 2017, at 11:50, Simon Green <simon.gr...@wirehive.com> wrote:

Morning List :)

I’m hunting for examples of long duration data centre outages in the UK, 
from a day of downtime to total data centre loss (explosion or some other 
industrial accident).

Is anyone aware of any tales they could share? Bigger and higher impact the 
better.

Slightly more casually interested in BT exchanges as well.

I’m aware of:
• Several corporate incidents, including Three, Capita, and Vodafone
• The Telecity power issues from a few years back, though they were 
less than a day



Simon



Re: [uknof] Example of total DC loss

2017-06-01 Thread Job Snijders
The syslog is real :-)

On Thu, 1 Jun 2017 at 16:59, Will Hargrave  wrote:

> On 1 Jun 2017, at 15:09, Rob Evans wrote:
>
> >> As I recall (fuzzily, though) there were AC issues caused by dust, and
> >> they eventually ran out of fuel for the generator as the port authority
> >> wouldn't allow tankers onto Manhattan. Might be wrong on that; it was a
> >> long time ago.
> > We have an epic ticket from that, I’ll see if I can find a copy.
>
> The catastrophic fire which took out University of Twente’s DC in 2002
> is an example that springs to mind. Not sure if the legendary syslog
> messages are fake or not:
>
>
> lo0.ar5.enschede1.surf.net 3613: Nov 20 07:20:50.927 UTC:
> %ENV_MON-2-TEMP: Hotpoint temp sensor(slot 18) temperature has reached
> WARNING level at 61(C)
> lo0.cr2.amsterdam2.surf.net 1146: Nov 20 07:20:56.458 UTC:
> %CLNS-5-ADJCHANGE: ISIS: Adjacency to ar5.enschede1 (POS2/0) Down,
> interface deleted(non-iih)
>
> --
> Will Hargrave
> Technical Director
> LONAP Ltd
> +44 114 303 
>
>


Re: [uknof] Example of total DC loss

2017-06-01 Thread Neil J. McRae
Depressing seeing so many links from The Register. Very sad.

Sent from my iPhone

On 1 Jun 2017, at 18:53, Jack Kay wrote:

https://www.theregister.co.uk/2015/11/18/telecity_outage_fix_failed/

The SOV outage wasn’t a day long, but the fallout and “AT RISK” period that 
followed spanned several days.


On 1 Jun 2017, at 11:50, Simon Green wrote:

Morning List :)

I’m hunting for examples of long duration data centre outages in the UK, 
from a day of downtime to total data centre loss (explosion or some other 
industrial accident).

Is anyone aware of any tales they could share? Bigger and higher impact the 
better.

Slightly more casually interested in BT exchanges as well.

I’m aware of:
• Several corporate incidents, including Three, Capita, and Vodafone
• The Telecity power issues from a few years back, though they were 
less than a day



Simon



Re: [uknof] Example of total DC loss

2017-06-01 Thread Jack Kay
https://www.theregister.co.uk/2015/11/18/telecity_outage_fix_failed/

The SOV outage wasn’t a day long, but the fallout and “AT RISK” period that 
followed spanned several days.


On 1 Jun 2017, at 11:50, Simon Green wrote:

Morning List :)

I’m hunting for examples of long duration data centre outages in the UK, 
from a day of downtime to total data centre loss (explosion or some other 
industrial accident).

Is anyone aware of any tales they could share? Bigger and higher impact the 
better.

Slightly more casually interested in BT exchanges as well.

I’m aware of:
• Several corporate incidents, including Three, Capita, and Vodafone
• The Telecity power issues from a few years back, though they were 
less than a day



Simon



Re: [uknof] Example of total DC loss

2017-06-01 Thread Tony Finch
Will Hargrave  wrote:
>
> The catastrophic fire which took out University of Twente’s DC in 2002 is an
> example that springs to mind. Not sure if the legendary syslog messages are
> fake or not:
>
> lo0.ar5.enschede1.surf.net 3613: Nov 20 07:20:50.927 UTC: %ENV_MON-2-TEMP: 
> Hotpoint temp sensor(slot 18) temperature has reached WARNING level at 61(C)
> lo0.cr2.amsterdam2.surf.net 1146: Nov 20 07:20:56.458 UTC: %CLNS-5-ADJCHANGE: 
> ISIS: Adjacency to ar5.enschede1 (POS2/0) Down, interface deleted(non-iih)

Reminds me of when one of our E450s lost its magic smoke - nearly burned a
hole in its motherboard, but still managed to print a panic message on its
console. Happily it did not set off the fire alarm...

http://people.ds.cam.ac.uk/fanf2/hermes/doc/misc/orange-fire/

Tony.
-- 
f.anthony.n.finch    http://dotat.at/  -  I xn--zr8h punycode
Southeast Fitzroy: Variable 4, becoming northerly 4 or 5. Moderate or rough.
Fair then occasional rain. Good, occasionally moderate.

Re: [uknof] Example of total DC loss

2017-06-01 Thread Will Hargrave

On 1 Jun 2017, at 15:09, Rob Evans wrote:

As I recall (fuzzily, though) there were AC issues caused by dust, and they 
eventually ran out of fuel for the generator as the port authority 
wouldn't allow tankers onto Manhattan. Might be wrong on that; it was a 
long time ago.

We have an epic ticket from that, I’ll see if I can find a copy.


The catastrophic fire which took out University of Twente’s DC in 2002 
is an example that springs to mind. Not sure if the legendary syslog 
messages are fake or not:



lo0.ar5.enschede1.surf.net 3613: Nov 20 07:20:50.927 UTC: 
%ENV_MON-2-TEMP: Hotpoint temp sensor(slot 18) temperature has reached 
WARNING level at 61(C)
lo0.cr2.amsterdam2.surf.net 1146: Nov 20 07:20:56.458 UTC: 
%CLNS-5-ADJCHANGE: ISIS: Adjacency to ar5.enschede1 (POS2/0) Down, 
interface deleted(non-iih)


--
Will Hargrave
Technical Director
LONAP Ltd
+44 114 303 



Re: [uknof] Example of total DC loss

2017-06-01 Thread Rob Evans

> As a reminder for the bitrot in the last 16 years...
> 
>   http://www.slimey.org/bbc_ticket_10083.txt

Our version of the same is appended (newest updates at top).

Rob

> Ticket Number: 20010912-2          Ticket Status: UPDATE
> Ticket Type  : Unscheduled         Ticket Source: TEN-US NOC
> Ticket Scope : Site                Site/Line    : New York
> Ticket Owner : TEN-US NOC          Problem Fixer: Telehouse NY
> Ticket Opened: 20010912 12:08 UTC  Problem Start: 20010911 20:35 UTC
> Ticket Update: 20010918 05:13 UTC
> Ticket Closed:                     Problem Ends :
> 
> 
> Ticket Summary: Status of 25 Broadway
> 
> Problem Description:
> 
> This ticket is being issued to track the status of the infrastructure
> at the DANTE World Services PoP in New York.  Individual circuits
> will be dealt with separately.  Updates will be in the "actions" section
> below.
> 
> Affected:
> 
> DANTE World Service
> 
> Actions:
> 
> cziarhe   20010918 05:13 UTC
> Load was switched from the Con Ed generator to the critical generator
> as planned.  Unfortunately, after about 45 minutes the generator
> again began to overheat and the load was switched back.  This was
> achieved without interruption to service.  It is now believed this
> could be due to limescale build-up in the radiator.
> 
> This is to be rectified by using an acid wash through the cooling
> system and again running the generator under load at 22:30 UTC
> tonight (Tuesday).
> 
> A plan is also being developed to bring in a new generator should
> the situation with the critical generator persist.
> 
> 
> 
> Time to Fix: (Hours:Mins)
> 
> 
> 
> Fix:
> 
> 
> 
> History:
> 
> cziarhe   20010917 21:53 UTC
> Telehouse NY believe the problems with their "critical" generator
> have been caused by faulty thermostats.  These have now been replaced
> and the generator has been spun up for a basic test.  The next
> step is to try a load test, which will begin at 22:30 UTC.  If
> this is successful, then load from the "essential" generator, which
> supplies the air conditioning will be transferred to the Con Ed
> generator for maintenance to take place on this generator, which
> has also been running overtemperature.
> 
> Assuming all this works successfully, Telehouse plan to run the
> Con Ed generator in conjunction with either the essential (Air
> Con) or critical (equipment) generator to make the best use of
> available fuel.  Further fuel supply problems are not thought to
> be likely as there are 17 tankers in the city, and a further 30
> generators are available.
> 
> The IP NOC will monitor the initial switch to the critical generator
> to ensure equipment remains in operation, and reinstall the workaround
> with KPN should it be needed.
> 
> cziarhe   20010916 14:59 UTC
> The latest update from Telehouse reports that an extra 2000 gallon
> fuel tank is expected to be connected to the Con Ed generator soon
> to allow for longer refuelling intervals.  In addition, Con Ed's
> refuelling process to all buildings in the area has now settled
> down, so it should be as reliable as possible in the circumstances.
> 
> An engineer is currently onsite investigating the problems with
> Telehouse's own generator.
> 
> cziarhe   20010916 12:05 UTC
> Regular fuel deliveries appear to have been secured for the Con
> Ed generator.  There were deliveries at 02:00 UTC and 11:30 UTC,
> and another is scheduled for 21:30 UTC.
> 
> The status of the Telehouse generator is currently unknown.
> 
> cziajom   20010915 20:16 UTC
> Contacted Telehouse NY. Informed that Con Ed generator is now up,
> with enough fuel until Tuesday (18 Sep). Engineers are still working
> on the Telehouse generator. NY Routers are starting to come up.
> 
> cziarhe   20010915 18:25 UTC
> Despite what we were informed of earlier, the Con Ed generator ran
> out of fuel at 16:45 UTC.  Telehouse's generator then started up,
> but overheated again.  There is a fuel truck just outside the cordoned
> area, but it is not being allowed through, and unfortunately we
> have no ETA for the fuel, or time-to-repair for the Telehouse generator.
> 
> cziarhe   20010915 18:15 UTC
> Power was lost again at 17:45 UTC, we are contacting Telehouse for
> information.
> 
> cziarhe   20010915 15:52 UTC
> The current status relayed to us by Telehouse is as follows:
> 
> The Con Ed generator is monitored around the clock by Con Ed personnel,
> who are responsible for refuelling it.  As of 15:00 UTC there was
> approximately 15 hours worth of fuel remaining.
> 
> The water pump on Telehouse's own generator has been replaced and
> the unit has been run for about 30 minutes to test.  Telehouse
> are taking this opportunity to perform some routine maintenance
> (changing oil and filters, pressure washing the radiator) to ensure
> it is ready should it be needed for another extended run.
> 
> Fuel currently available onsite should last until 

Re: [uknof] Example of total DC loss

2017-06-01 Thread Tim Anker
+1 for Hurricane Sandy hitting a few Manhattan and East coast facilities.  I 
recall hearing stories of staff in 75 Broad trudging up and down stairs with 
buckets of diesel (due to pump failures) until they were (probably quite 
sensibly) told to stop.  Other facilities also had issues with diesel, e.g. algae 
clogging up filters after prolonged running.

Back here, Gos has featured a few times on this "list of shame", Braham had a 
network issue once due to theft, HEX, Sov (quite recently) but don’t think any 
of these have been days and days.  A bit of Googling will unearth these.  A 
large ISP had a mega network issue a couple of years ago, they have since been 
acquired, haven't the heart to name and shame on here, you're welcome to call 
me if you want the full grisly details!!  But again, more of a network than 
data centre issue.

Tim

-Original Message-
From: uknof [mailto:uknof-boun...@lists.uknof.org.uk] On Behalf Of James Bensley
Sent: 01 June 2017 14:42
To: Simon Green <simon.gr...@wirehive.com>
Cc: uknof@lists.uknof.org.uk
Subject: Re: [uknof] Example of total DC loss

On 1 June 2017 at 11:50, Simon Green <simon.gr...@wirehive.com> wrote:
> Morning List :)
>
>
>
> I’m hunting for examples of long duration data centre outages in 
> the UK, from a day of downtime to total data centre loss (explosion or 
> some other industrial accident).
>
>
>
> Is anyone aware of any tales they could share? Bigger and higher 
> impact the better.
>
>
>
> Slightly more casually interested in BT exchanges as well.
>
>
>
> I’m aware of:
>
> · Several corporate incidents, including Three, Capita, and Vodafone
>
> · The Telecity power issues from a few years back, though they were
> less than a day
>


Not a DC outage but the King's College outage was pretty serious; if you have a 
SPoF, be it a single RAID array or single DC, it's a SPoF;

https://www.theregister.co.uk/2016/10/25/and_so_we_enter_day_seven_of_kings_college_london_major_it_outage/

https://www.theregister.co.uk/2016/11/15/after_kcl_kills_uniwide_backups_staff_get_order_to_never_make_their_own/

Cheers,
James.



Re: [uknof] Example of total DC loss

2017-06-01 Thread Rob Evans

> As I recall (fuzzily, though) there were AC issues caused by dust, and they 
> eventually ran out of fuel for the generator as the port authority wouldn't 
> allow tankers onto Manhattan. Might be wrong on that; it was a long time ago.

We have an epic ticket from that, I’ll see if I can find a copy.

Rob





Re: [uknof] Example of total DC loss

2017-06-01 Thread Phil Mayers

On 01/06/17 14:41, James Bensley wrote:


Not a DC outage but the King's College outage was pretty serious; if
you have a SPoF, be it a single RAID array or single DC, it's a SPoF;


I dislike linking to the 'reg (yuck) but another useful link related to 
the KCL problems is:


https://www.theregister.co.uk/2017/02/23/kcl_external_review/

The review is a surprisingly even-handed document IMO. Some of the 
non-technical findings - p. 12 & 13 - are worth pondering.


Cheers,
Phil



Re: [uknof] Example of total DC loss

2017-06-01 Thread James Bensley
On 1 June 2017 at 11:50, Simon Green  wrote:
> Morning List :)
>
>
>
> I’m hunting for examples of long duration data centre outages in the UK,
> from a day of downtime to total data centre loss (explosion or some other
> industrial accident).
>
>
>
> Is anyone aware of any tales they could share? Bigger and higher impact the
> better.
>
>
>
> Slightly more casually interested in BT exchanges as well.
>
>
>
> I’m aware of:
>
> · Several corporate incidents, including Three, Capita, and Vodafone
>
> · The Telecity power issues from a few years back, though they were
> less than a day
>


Not a DC outage but the King's College outage was pretty serious; if
you have a SPoF, be it a single RAID array or single DC, it's a SPoF;

https://www.theregister.co.uk/2016/10/25/and_so_we_enter_day_seven_of_kings_college_london_major_it_outage/

https://www.theregister.co.uk/2016/11/15/after_kcl_kills_uniwide_backups_staff_get_order_to_never_make_their_own/

Cheers,
James.



Re: [uknof] Example of total DC loss

2017-06-01 Thread Paul Waring

On 01/06/17 11:50, Simon Green wrote:
I’m hunting for examples of long duration data centre outages in the 
UK, from a day of downtime to total data centre loss (explosion or some 
other industrial accident).


Is anyone aware of any tales they could share? Bigger and higher impact 
the better.


SSP had a major outage towards the end of 2016 where a data centre power 
supply failure damaged their storage system. They supply integration 
software to a significant percentage of insurance brokers, who were 
basically unable to do any work for several weeks:


http://www.insurancebusinessmag.com/uk/news/breaking-news/the-ssp-outage-two-weeks-on-37592.aspx

http://www.theregister.co.uk/2016/09/07/ssp_decommissions_outage_hit_data_centre/

It didn't make the mainstream news (other than El Reg) as SSP isn't a 
household name, but it was the top story within the industry for over a 
month.


--
Paul Waring
Freelance consultant
https://www.pwaring.com



Re: [uknof] Example of total DC loss

2017-06-01 Thread David Derrick

On 01/06/2017 11:50, Simon Green wrote:

Morning List :)

I'm hunting for examples of long duration data centre outages in
the UK, from a day of downtime to total data centre loss (explosion
or some other industrial accident).


Have you asked BA?

Any particular reason you want UK examples? I can think of several 
examples in the US, like The Planet in Houston and the ones affected by 
Hurricane Sandy.

--
David Derrick
Systems & Network Engineer
Entanet International Ltd
0330 100 0330



Re: [uknof] Example of total DC loss

2017-06-01 Thread Simon Lockhart
As a reminder for the bitrot in the last 16 years...

http://www.slimey.org/bbc_ticket_10083.txt

Simon

On Thu Jun 01, 2017 at 01:26:46PM +, Neil J. McRae wrote:
> As I recall (fuzzily, though) there were AC issues caused by dust, and they 
> eventually ran out of fuel for the generator as the port authority wouldn't 
> allow tankers onto Manhattan. Might be wrong on that; it was a long time ago.
> 
> Sent from my iPad
> 
> On 1 Jun 2017, at 14:22, Rob pickering wrote:
> 
> 
> Not UK, but Telehouse NY 25 Broadway had a fairly long outage a few days 
> after the initial 9/11 attack. ISTR they were initially OK, but then had 
> generator problems due to ingress of dust into radiators. It's a long while 
> ago, lots of neural bit rot since then, may not have been 25 Broadway at all, 
> but Nanog archives (if they go back that far) will probably tell you quite a 
> bit of the story.
> 
> On 01/06/2017 11:50, Simon Green wrote:
> Morning List :)
> 
> I'm hunting for examples of long duration data centre outages in the UK, 
> from a day of downtime to total data centre loss (explosion or some other 
> industrial accident).
> 
> Is anyone aware of any tales they could share? Bigger and higher impact the 
> better.
> 
> Slightly more casually interested in BT exchanges as well.
> 
> I'm aware of:
> 
> · Several corporate incidents, including Three, Capita, and Vodafone
> 
> · The Telecity power issues from a few years back, though they were 
> less than a day
> 
> 
> 
> Simon
> 

-- 
Simon Lockhart |   * Server Co-location * ADSL * Domain Registration *
   Director|  * Domain & Web Hosting * Connectivity * Consultancy * 
  Bogons Ltd   | *  http://www.bogons.net/  *  Email: i...@bogons.net  * 



Re: [uknof] Example of total DC loss

2017-06-01 Thread Neil J. McRae
As I recall (fuzzily, though) there were AC issues caused by dust, and they 
eventually ran out of fuel for the generator as the port authority wouldn't 
allow tankers onto Manhattan. Might be wrong on that; it was a long time ago.

Sent from my iPad

On 1 Jun 2017, at 14:22, Rob pickering wrote:


Not UK, but Telehouse NY 25 Broadway had a fairly long outage a few days after 
the initial 9/11 attack. ISTR they were initially OK, but then had generator 
problems due to ingress of dust into radiators. It's a long while ago, lots of 
neural bit rot since then, may not have been 25 Broadway at all, but Nanog 
archives (if they go back that far) will probably tell you quite a bit of the 
story.

On 01/06/2017 11:50, Simon Green wrote:
Morning List :)

I’m hunting for examples of long duration data centre outages in the UK, 
from a day of downtime to total data centre loss (explosion or some other 
industrial accident).

Is anyone aware of any tales they could share? Bigger and higher impact the 
better.

Slightly more casually interested in BT exchanges as well.

I’m aware of:

· Several corporate incidents, including Three, Capita, and Vodafone

· The Telecity power issues from a few years back, though they were 
less than a day



Simon



Re: [uknof] Example of total DC loss

2017-06-01 Thread Rob pickering
Not UK, but Telehouse NY 25 Broadway had a fairly long outage a few days 
after the initial 9/11 attack. ISTR they were initially OK, but then had 
generator problems due to ingress of dust into radiators. It's a long 
while ago, lots of neural bit rot since then, may not have been 25 
Broadway at all, but Nanog archives (if they go back that far) will 
probably tell you quite a bit of the story.



On 01/06/2017 11:50, Simon Green wrote:


Morning List :)

I’m hunting for examples of long duration data centre outages in 
the UK, from a day of downtime to total data centre loss (explosion or 
some other industrial accident).


Is anyone aware of any tales they could share? Bigger and higher 
impact the better.


Slightly more casually interested in BT exchanges as well.

I’m aware of:

· Several corporate incidents, including Three, Capita, and Vodafone

· The Telecity power issues from a few years back, though they were 
less than a day


Simon





[uknof] Example of total DC loss

2017-06-01 Thread Simon Green
Morning List :)

I'm hunting for examples of long duration data centre outages in the UK, 
from a day of downtime to total data centre loss (explosion or some other 
industrial accident).

Is anyone aware of any tales they could share? Bigger and higher impact the 
better.

Slightly more casually interested in BT exchanges as well.

I'm aware of:

* Several corporate incidents, including Three, Capita, and Vodafone

* The Telecity power issues from a few years back, though they were 
less than a day



Simon