Re: "Hypothetical" Datacenter Overheating

2024-01-21 Thread Jay R. Ashworth
- Original Message -
> From: "Tom Beecher" 

>> It's certainly one of many possible root causes which someone doing an
>> AAR on an event like this should be thinking about, and looking for in
>> their evaluation of the data they see.
> 
> And I'm sure they are and will.
> 
> By the time that post was made, the vendor had shared multiple updates
> about what the actual cause seemed to be, which were very plausible. An
> unaffiliated 3rd party stating 'maybe an attack!' when there has been no
> observation or information shared that even remotely points to that simply
> spreads FUD for no reason.

I didn't see any of them in the thread, which was the only thing I was paying
attention to, so those are facts not in evidence to *me*.

I didn't see an exclamation point in his comment, which seemed relatively
measured to me.

Cheers,
-- jra
-- 
Jay R. Ashworth  Baylink   j...@baylink.com
Designer The Things I Think   RFC 2100
Ashworth & Associates   http://www.bcp38.info  2000 Land Rover DII
St Petersburg FL USA  BCP38: Ask For It By Name!   +1 727 647 1274


Re: "Hypothetical" Datacenter Overheating

2024-01-21 Thread Tom Beecher
>
> It's certainly one of many possible root causes which someone doing an
> AAR on an event like this should be thinking about, and looking for in
> their
> evaluation of the data they see.
>

And I'm sure they are and will.

By the time that post was made, the vendor had shared multiple updates
about what the actual cause seemed to be, which were very plausible. An
unaffiliated 3rd party stating 'maybe an attack!' when there has been no
observation or information shared that even remotely points to that simply
spreads FUD for no reason.

I respectfully disagree.



On Sun, Jan 21, 2024 at 1:22 AM Jay R. Ashworth  wrote:

> - Original Message -
> > From: "Tom Beecher" 
> > To: "Lamar Owen" 
> > Cc: nanog@nanog.org
> > Sent: Wednesday, January 17, 2024 8:06:07 PM
> > Subject: Re: "Hypothetical" Datacenter Overheating
>
> >> If these chillers are connected to BACnet or similar network, then I
> >> wouldn't rule out the possibility of an attack.
> >
> > Don't insinuate something like this without evidence. Completely
> > unreasonable and inappropriate.
>
> WADR, horsecrap.
>
> It's certainly one of many possible root causes which someone doing an
> AAR on an event like this should be thinking about, and looking for in
> their
> evaluation of the data they see.
>
> He didn't *accuse* anyone, which would be out of bounds.
>
> Cheers,
> -- jra
> --
> Jay R. Ashworth  Baylink
> j...@baylink.com
> Designer The Things I Think   RFC
> 2100
> Ashworth & Associates   http://www.bcp38.info  2000 Land
> Rover DII
> St Petersburg FL USA  BCP38: Ask For It By Name!   +1 727 647
> 1274
>


Re: "Hypothetical" Datacenter Overheating

2024-01-20 Thread Jay R. Ashworth
- Original Message -
> From: "Tom Beecher" 
> To: "Lamar Owen" 
> Cc: nanog@nanog.org
> Sent: Wednesday, January 17, 2024 8:06:07 PM
> Subject: Re: "Hypothetical" Datacenter Overheating

>> If these chillers are connected to BACnet or similar network, then I
>> wouldn't rule out the possibility of an attack.
> 
> Don't insinuate something like this without evidence. Completely
> unreasonable and inappropriate.

WADR, horsecrap.

It's certainly one of many possible root causes which someone doing an
AAR on an event like this should be thinking about, and looking for in their
evaluation of the data they see.

He didn't *accuse* anyone, which would be out of bounds.

Cheers,
-- jra
-- 
Jay R. Ashworth  Baylink   j...@baylink.com
Designer The Things I Think   RFC 2100
Ashworth & Associates   http://www.bcp38.info  2000 Land Rover DII
St Petersburg FL USA  BCP38: Ask For It By Name!   +1 727 647 1274


Re: "Hypothetical" Datacenter Overheating

2024-01-18 Thread Glenn McGurrin via NANOG
I'm actually referring to something like the unit linked below. I've not 
yet had a system where they made sense; I mostly deal with places where I 
either have no say in the HVAC or have only very small server rooms. But 
I've thought these were an interesting concept since I first saw them years ago.


https://www.chiltrix.com/server-room-chiller.html

quoting from the page:

The Chiltrix SE Server Room Edition adds a "free cooling" option to CX34.

Server rooms need cooling all year, even when it is cold outside. If you 
operate in a northern area with cold winters, this option is for you.


When outdoor temperatures drop below 38F, the CX34 glycol-water loop is 
automatically extended through a special water-to-air heat exchanger to 
harvest outdoor cold ambient conditions to pre-cool the glycol-water 
loop so that the CX34 variable speed compressor can drop to a very slow 
speed and consume less power. This can save about 50% off of its 
already low power consumption without lowering capacity.


At and below 28F, the CX34 chiller with Free Cooling SE add-on will turn 
off the compressor entirely and still be able to maintain its rated 
cooling capacity using only the variable speed pump and fan motors. At 
this point, the CX34 achieves a COP of >41 and an EER of >141.


Enjoy the savings of 2 tons of cooling for less than 75 watts. The 
colder it gets, the less water flow rate is needed, allowing the VSD 
pump power draw to drop under 20 watts.


Depending on location, for some customers free cooling mode can be 
active up to 3 months per year during the daytime and up to 5 months per 
year at night.
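
To make the changeover idea concrete, the control logic amounts to something
like the sketch below (Python, using the 38F/28F set points quoted above;
everything else, including the absence of hysteresis, is my assumption, not
Chiltrix's actual controls):

    FREE_COOL_ASSIST_F = 38.0  # below this, the outdoor coil pre-cools the glycol loop
    COMPRESSOR_OFF_F   = 28.0  # at or below this, pumps and fans alone carry the load

    def cooling_mode(outdoor_f):
        """Pick a mode from outdoor dry-bulb temperature (deg F)."""
        if outdoor_f <= COMPRESSOR_OFF_F:
            return "free"        # compressor off, variable-speed pump and fan only
        if outdoor_f <= FREE_COOL_ASSIST_F:
            return "assist"      # pre-cooled loop, compressor runs slowly
        return "mechanical"      # normal chiller operation

    for t in (45, 35, 27):
        print(t, cooling_mode(t))   # mechanical, assist, free

A real unit would add hysteresis and freeze protection on top of this.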


On 1/17/2024 3:10 PM, Izaac wrote:

On Wed, Jan 17, 2024 at 12:07:42AM -0500, Glenn McGurrin via NANOG wrote:

Free air cooling loops maybe? (Not direct free air cooling with air
exchange, the version with something much like an air handler outside with a
> coil and a fan running cold outside air over the coil with the water/glycol
that would normally be the loop off of the chiller) the primary use of them
is cost savings by using less energy to cool when it's fairly cold out, but
it can also prevent low temperature issues on compressors by not running
them when it's cold.  I'd expect it would not require the same sort of
facade changes as it could be on the roof and depending only need
water/glycol lines into the space, depending on cooling tower vs air cooled
and chiller location it could also potentially use the same piping (which I
think is the traditional use).


You're looking for these: https://en.wikipedia.org/wiki/Thermal_wheel

Basically, an aluminum honeycomb wheel.  One half of its housing is an
air duct "outside" while the other half is an air duct that's "inside."
Cold outside air blows through the straws and cools the metal.  Wheel
rotates slowly.  That straw is now "inside."  Inside air blows through
it and deposits heat onto the metal.  Turn turn turn.

A surprisingly effective way to lower heating/cooling costs.  Basically
"free," as you just need to turn it on the bearing.  Do you get deposits
in the comb?  Yes, if you don't filter properly.  Do you get
condensation in the comb?  Yeah.  Treat it with desiccants.



Re: "Hypothetical" Datacenter Overheating

2024-01-18 Thread Lamar Owen

On 1/15/24 10:14, sro...@ronan-online.com wrote:

I’m more interested in how you lose six chillers all at once.

According to a post on a support forum for one of the clients in that 
space: "We understand the issue is due to snow on the roof affecting the 
cooling equipment."


Never overlook the simplest single points of failure.  Snow on cooling 
tower fan blades... failed fan motors are possible or even likely at 
that point, assuming the airflow isn't simply clogged.  Conceptually it's much 
like the issue of having multiple providers for redundancy, but they're 
all in the same cable or conduit.




Re: "Hypothetical" Datacenter Overheating

2024-01-18 Thread Lamar Owen

On 1/17/24 20:06, Tom Beecher wrote:


If these chillers are connected to BACnet or similar network, then
I wouldn't rule out the possibility of an attack.


Don't insinuate something like this without evidence. Completely 
unreasonable and inappropriate.


I wasn't meaning to insinuate anything; it's as much of a reasonable 
possibility as any other these days.


Perhaps I should have worded it differently: "if my small data centers' 
chillers were connected to some building management network such as 
BACnet and all of them went down concurrently I would be investigating 
my building management network for signs of intrusion in addition to 
checking other items, such as shared points of failure in things like 
chilled water pumps, electrical supply, emergency shut-off circuits, 
chiller/closed-loop configurations for various temperature, pressure, 
and flow set points, etc."  Bit more wordy, but doesn't have the same 
implication.  But I would think it unreasonable, if I were to find 
myself in this situation in my own operations, to rule any possibility 
out that can explain simultaneous shutdowns.
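
One cheap first pass on the "simultaneous" question is just to compare the 
trip timestamps from whatever event log the chillers or BMS kept. A rough 
sketch; the log format and times here are invented:

    from datetime import datetime

    # Hypothetical per-chiller trip times pulled from a BMS export
    trips = {
        "CH-1": "2024-01-15T06:02:11",
        "CH-2": "2024-01-15T06:02:14",
        "CH-3": "2024-01-15T06:02:12",
        "CH-4": "2024-01-15T06:03:05",
        "CH-5": "2024-01-15T06:02:13",
        "CH-6": "2024-01-15T06:02:16",
    }

    times = sorted(datetime.fromisoformat(t) for t in trips.values())
    spread = (times[-1] - times[0]).total_seconds()

    if spread < 60:
        print(f"All six tripped within {spread:.0f}s: look for a shared cause "
              "(control network, set point push, EPO, common power or pump).")
    else:
        print(f"Trips spread over {spread:.0f}s: more consistent with an "
              "environmental cause hitting units one at a time.")

Nothing conclusive, but it tells you which of the two stories to chase first.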


And this week we did have a chiller go out on freeze warning, but the DC 
temp never made it quite up to 80F before the outdoor temperature rose back 
into double digits and the chiller restarted.


Re: "Hypothetical" Datacenter Overheating

2024-01-18 Thread Mike Hammett





- 
Mike Hammett 
Intelligent Computing Solutions 

Midwest Internet Exchange 

The Brothers WISP 

- Original Message -

From: "Tom Beecher"  
To: "Mike Hammett"  
Cc: sro...@ronan-online.com, "NANOG"  
Sent: Thursday, January 18, 2024 9:19:09 AM 
Subject: Re: "Hypothetical" Datacenter Overheating 




Well right, which came well after the question was posited here. 




Wasn't poo pooing the question, just sharing the information as I didn't see 
that cited otherwise in this thread. 


On Thu, Jan 18, 2024 at 10:15 AM Mike Hammett < na...@ics-il.net > wrote: 





Well right, which came well after the question was posited here. 




- 
Mike Hammett 
Intelligent Computing Solutions 

Midwest Internet Exchange 

The Brothers WISP 



From: "Tom Beecher" < beec...@beecher.cc > 
To: "Mike Hammett" < na...@ics-il.net > 
Cc: sro...@ronan-online.com , "NANOG" < nanog@nanog.org > 
Sent: Thursday, January 18, 2024 9:00:34 AM 
Subject: Re: "Hypothetical" Datacenter Overheating 




and none in the other two facilities you operate in that same building had any 
failures. 




Quoting directly from their outage ticket updates : 



CH2 does not have chillers, cooling arrangement is DX CRACs manufactured by 
another company. CH3 has Smart chillers but are water cooled not air cooled so 
not susceptible to cold ambient air temps as they are indoor chillers. 







On Mon, Jan 15, 2024 at 10:19 AM Mike Hammett < na...@ics-il.net > wrote: 





and none in the other two facilities you operate in that same building had any 
failures. 




- 
Mike Hammett 
Intelligent Computing Solutions 

Midwest Internet Exchange 

The Brothers WISP 



From: sro...@ronan-online.com 
To: "Mike Hammett" < na...@ics-il.net > 
Cc: "NANOG" < nanog@nanog.org > 
Sent: Monday, January 15, 2024 9:14:49 AM 
Subject: Re: "Hypothetical" Datacenter Overheating 



I’m more interested in how you lose six chillers all at once. 


Shane 



On Jan 15, 2024, at 9:11 AM, Mike Hammett < na...@ics-il.net > wrote: 







Let's say that hypothetically, a datacenter you're in had a cooling failure and 
escalated to an average of 120 degrees before mitigations started having an 
effect. What are normal QA procedures on your behalf? What is the facility 
likely to be doing? What should be expected in the aftermath? 




- 
Mike Hammett 
Intelligent Computing Solutions 

Midwest Internet Exchange 

The Brothers WISP 












Re: "Hypothetical" Datacenter Overheating

2024-01-18 Thread Tom Beecher
>
> Well right, which came well after the question was posited here.


Wasn't poo pooing the question, just sharing the information as I didn't
see that cited otherwise in this thread.

On Thu, Jan 18, 2024 at 10:15 AM Mike Hammett  wrote:

> Well right, which came well after the question was posited here.
>
>
>
> -
> Mike Hammett
> Intelligent Computing Solutions <http://www.ics-il.com/>
> <https://www.facebook.com/ICSIL>
> <https://plus.google.com/+IntelligentComputingSolutionsDeKalb>
> <https://www.linkedin.com/company/intelligent-computing-solutions>
> <https://twitter.com/ICSIL>
> Midwest Internet Exchange <http://www.midwest-ix.com/>
> <https://www.facebook.com/mdwestix>
> <https://www.linkedin.com/company/midwest-internet-exchange>
> <https://twitter.com/mdwestix>
> The Brothers WISP <http://www.thebrotherswisp.com/>
> <https://www.facebook.com/thebrotherswisp>
> <https://www.youtube.com/channel/UCXSdfxQv7SpoRQYNyLwntZg>
> --
> *From: *"Tom Beecher" 
> *To: *"Mike Hammett" 
> *Cc: *sro...@ronan-online.com, "NANOG" 
> *Sent: *Thursday, January 18, 2024 9:00:34 AM
> *Subject: *Re: "Hypothetical" Datacenter Overheating
>
> and none in the other two facilities you operate in that same building had
>> any failures.
>
>
> Quoting directly from their outage ticket updates :
>
> CH2 does not have chillers, cooling arrangement is DX CRACs manufactured
>> by another company. CH3 has Smart chillers but are water cooled not air
>> cooled so not susceptible to cold ambient air temps as they are indoor
>> chillers.
>
>
>
>
> On Mon, Jan 15, 2024 at 10:19 AM Mike Hammett  wrote:
>
>> and none in the other two facilities you operate in that same building
>> had any failures.
>>
>>
>>
>> -
>> Mike Hammett
>> Intelligent Computing Solutions <http://www.ics-il.com/>
>> <https://www.facebook.com/ICSIL>
>> <https://plus.google.com/+IntelligentComputingSolutionsDeKalb>
>> <https://www.linkedin.com/company/intelligent-computing-solutions>
>> <https://twitter.com/ICSIL>
>> Midwest Internet Exchange <http://www.midwest-ix.com/>
>> <https://www.facebook.com/mdwestix>
>> <https://www.linkedin.com/company/midwest-internet-exchange>
>> <https://twitter.com/mdwestix>
>> The Brothers WISP <http://www.thebrotherswisp.com/>
>> <https://www.facebook.com/thebrotherswisp>
>> <https://www.youtube.com/channel/UCXSdfxQv7SpoRQYNyLwntZg>
>> --
>> *From: *sro...@ronan-online.com
>> *To: *"Mike Hammett" 
>> *Cc: *"NANOG" 
>> *Sent: *Monday, January 15, 2024 9:14:49 AM
>> *Subject: *Re: "Hypothetical" Datacenter Overheating
>>
>> I’m more interested in how you lose six chillers all at once.
>>
>> Shane
>>
>> On Jan 15, 2024, at 9:11 AM, Mike Hammett  wrote:
>>
>> 
>> Let's say that hypothetically, a datacenter you're in had a cooling
>> failure and escalated to an average of 120 degrees before mitigations
>> started having an effect. What are normal QA procedures on your behalf?
>> What is the facility likely to be doing? What  should be expected in the
>> aftermath?
>>
>>
>>
>> -
>> Mike Hammett
>> Intelligent Computing Solutions <http://www.ics-il.com/>
>> <https://www.facebook.com/ICSIL>
>> <https://plus.google.com/+IntelligentComputingSolutionsDeKalb>
>> <https://www.linkedin.com/company/intelligent-computing-solutions>
>> <https://twitter.com/ICSIL>
>> Midwest Internet Exchange <http://www.midwest-ix.com/>
>> <https://www.facebook.com/mdwestix>
>> <https://www.linkedin.com/company/midwest-internet-exchange>
>> <https://twitter.com/mdwestix>
>> The Brothers WISP <http://www.thebrotherswisp.com/>
>> <https://www.facebook.com/thebrotherswisp>
>> <https://www.youtube.com/channel/UCXSdfxQv7SpoRQYNyLwntZg>
>>
>>
>>
>


Re: "Hypothetical" Datacenter Overheating

2024-01-18 Thread Mike Hammett
Well right, which came well after the question was posited here. 




- 
Mike Hammett 
Intelligent Computing Solutions 

Midwest Internet Exchange 

The Brothers WISP 

- Original Message -

From: "Tom Beecher"  
To: "Mike Hammett"  
Cc: sro...@ronan-online.com, "NANOG"  
Sent: Thursday, January 18, 2024 9:00:34 AM 
Subject: Re: "Hypothetical" Datacenter Overheating 




and none in the other two facilities you operate in that same building had any 
failures. 




Quoting directly from their outage ticket updates : 



CH2 does not have chillers, cooling arrangement is DX CRACs manufactured by 
another company. CH3 has Smart chillers but are water cooled not air cooled so 
not susceptible to cold ambient air temps as they are indoor chillers. 







On Mon, Jan 15, 2024 at 10:19 AM Mike Hammett < na...@ics-il.net > wrote: 





and none in the other two facilities you operate in that same building had any 
failures. 




- 
Mike Hammett 
Intelligent Computing Solutions 

Midwest Internet Exchange 

The Brothers WISP 



From: sro...@ronan-online.com 
To: "Mike Hammett" < na...@ics-il.net > 
Cc: "NANOG" < nanog@nanog.org > 
Sent: Monday, January 15, 2024 9:14:49 AM 
Subject: Re: "Hypothetical" Datacenter Overheating 



I’m more interested in how you lose six chillers all at once. 


Shane 



On Jan 15, 2024, at 9:11 AM, Mike Hammett < na...@ics-il.net > wrote: 







Let's say that hypothetically, a datacenter you're in had a cooling failure and 
escalated to an average of 120 degrees before mitigations started having an 
effect. What are normal QA procedures on your behalf? What is the facility 
likely to be doing? What should be expected in the aftermath? 




- 
Mike Hammett 
Intelligent Computing Solutions 

Midwest Internet Exchange 

The Brothers WISP 









Re: "Hypothetical" Datacenter Overheating

2024-01-18 Thread Tom Beecher
>
> and none in the other two facilities you operate in that same building had
> any failures.


Quoting directly from their outage ticket updates :

CH2 does not have chillers, cooling arrangement is DX CRACs manufactured by
> another company. CH3 has Smart chillers but are water cooled not air cooled
> so not susceptible to cold ambient air temps as they are indoor chillers.




On Mon, Jan 15, 2024 at 10:19 AM Mike Hammett  wrote:

> and none in the other two facilities you operate in that same building had
> any failures.
>
>
>
> -
> Mike Hammett
> Intelligent Computing Solutions <http://www.ics-il.com/>
> <https://www.facebook.com/ICSIL>
> <https://plus.google.com/+IntelligentComputingSolutionsDeKalb>
> <https://www.linkedin.com/company/intelligent-computing-solutions>
> <https://twitter.com/ICSIL>
> Midwest Internet Exchange <http://www.midwest-ix.com/>
> <https://www.facebook.com/mdwestix>
> <https://www.linkedin.com/company/midwest-internet-exchange>
> <https://twitter.com/mdwestix>
> The Brothers WISP <http://www.thebrotherswisp.com/>
> <https://www.facebook.com/thebrotherswisp>
> <https://www.youtube.com/channel/UCXSdfxQv7SpoRQYNyLwntZg>
> ------
> *From: *sro...@ronan-online.com
> *To: *"Mike Hammett" 
> *Cc: *"NANOG" 
> *Sent: *Monday, January 15, 2024 9:14:49 AM
> *Subject: *Re: "Hypothetical" Datacenter Overheating
>
> I’m more interested in how you lose six chillers all at once.
>
> Shane
>
> On Jan 15, 2024, at 9:11 AM, Mike Hammett  wrote:
>
> 
> Let's say that hypothetically, a datacenter you're in had a cooling
> failure and escalated to an average of 120 degrees before mitigations
> started having an effect. What are normal QA procedures on your behalf?
> What is the facility likely to be doing? What  should be expected in the
> aftermath?
>
>
>
> -
> Mike Hammett
> Intelligent Computing Solutions <http://www.ics-il.com/>
> <https://www.facebook.com/ICSIL>
> <https://plus.google.com/+IntelligentComputingSolutionsDeKalb>
> <https://www.linkedin.com/company/intelligent-computing-solutions>
> <https://twitter.com/ICSIL>
> Midwest Internet Exchange <http://www.midwest-ix.com/>
> <https://www.facebook.com/mdwestix>
> <https://www.linkedin.com/company/midwest-internet-exchange>
> <https://twitter.com/mdwestix>
> The Brothers WISP <http://www.thebrotherswisp.com/>
> <https://www.facebook.com/thebrotherswisp>
> <https://www.youtube.com/channel/UCXSdfxQv7SpoRQYNyLwntZg>
>
>
>


Re: "Hypothetical" Datacenter Overheating

2024-01-17 Thread Tom Beecher
>
> If these chillers are connected to BACnet or similar network, then I
> wouldn't rule out the possibility of an attack.
>

Don't insinuate something like this without evidence. Completely
unreasonable and inappropriate.

On Wed, Jan 17, 2024 at 8:31 AM Lamar Owen  wrote:

> >This sort of mass failure seems to point
> >towards either design issues (like equipment >selection/configuration vs
> >temperature range for the location), systemic maintenance issues, or
> >some sort of single failure point that could take all the chillers out,
> >none of which I'd be happy to see in a data center.
>
>
>
> If these chillers are connected to BACnet or similar network, then I
> wouldn't rule out the possibility of an attack.
>


Re: "Hypothetical" Datacenter Overheating

2024-01-17 Thread JoeSox
Great question. Power off the non-essential equipment and get
natural air flowing.
And then start praying and watching the utility's GIS outage map. lol
--
Later, Joe


On Mon, Jan 15, 2024 at 2:26 PM Mike Hammett  wrote:

> Let's say that hypothetically, a datacenter you're in had a cooling
> failure and escalated to an average of 120 degrees before mitigations
> started having an effect. What are normal QA procedures on your behalf?
> What is the facility likely to be doing? What  should be expected in the
> aftermath?
>
>
>
> -
> Mike Hammett
> Intelligent Computing Solutions 
> 
> 
> 
> 
> Midwest Internet Exchange 
> 
> 
> 
> The Brothers WISP 
> 
> 
>


Re: "Hypothetical" Datacenter Overheating

2024-01-17 Thread Izaac
On Wed, Jan 17, 2024 at 12:07:42AM -0500, Glenn McGurrin via NANOG wrote:
> Free air cooling loops maybe? (Not direct free air cooling with air
> exchange, the version with something much like an air handler outside with a
> coil and a fan running cold outside air over the coil with the water/glycol
> that would normally be the loop off of the chiller) the primary use of them
> is cost savings by using less energy to cool when it's fairly cold out, but
> it can also prevent low temperature issues on compressors by not running
> them when it's cold.  I'd expect it would not require the same sort of
> facade changes as it could be on the roof and depending only need
> water/glycol lines into the space, depending on cooling tower vs air cooled
> and chiller location it could also potentially use the same piping (which I
> think is the traditional use).

You're looking for these: https://en.wikipedia.org/wiki/Thermal_wheel

Basically, an aluminum honeycomb wheel.  One half of its housing is an
air duct "outside" while the other half is an air duct that's "inside."
Cold outside air blows through the straws and cools the metal.  Wheel
rotates slowly.  That straw is now "inside."  Inside air blows through
it and deposits heat onto the metal.  Turn turn turn.

A surprisingly effective way to lower heating/cooling costs.  Basically
"free," as you just need to turn it on the bearing.  Do you get deposits
in the comb?  Yes, if you don't filter properly.  Do you get
condensation in the comb?  Yeah.  Treat it with desiccants.
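
If you want to gut-check how much heat a wheel like that moves, the
sensible-heat arithmetic is short. A sketch; the effectiveness and airflow
numbers are assumptions, not anybody's spec:

    RHO_AIR = 1.2     # kg/m^3, roughly, near sea level
    CP_AIR  = 1005.0  # J/(kg*K)

    def wheel_recovery_kw(flow_m3_s, t_inside_c, t_outside_c, effectiveness=0.7):
        """Sensible heat shifted between the two air streams, in kW."""
        m_dot = flow_m3_s * RHO_AIR
        return effectiveness * m_dot * CP_AIR * (t_inside_c - t_outside_c) / 1000.0

    # 5 m^3/s of air, 35C exhaust, -5C outside: about 169 kW moved for
    # little more than fan power and the motor turning the wheel.
    print(round(wheel_recovery_kw(5.0, 35.0, -5.0), 1))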

-- 
. ___ ___  .   .  ___
.  \/  |\  |\ \
.  _\_ /__ |-\ |-\ \__


Re: "Hypothetical" Datacenter Overheating

2024-01-17 Thread Lamar Owen
>This sort of mass failure seems to point
>towards either design issues (like equipment >selection/configuration vs 
>temperature range for the location), systemic maintenance issues, or 
>some sort of single failure point that could take all the chillers out, 
>none of which I'd be happy to see in a data center.



If these chillers are connected to BACnet or similar network, then I wouldn't 
rule out the possibility of an attack.

Re: "Hypothetical" Datacenter Overheating

2024-01-17 Thread Saku Ytti
On Wed, 17 Jan 2024 at 03:18,  wrote:

> Others have pointed to references, I found some others, it's all
> pretty boring but perhaps one should embrace the general point that
> some equipment may not like abrupt temperature changes.

Can you share them? Only one I've found is:
https://www.ashrae.org/file%20library/technical%20resources/bookstore/supplemental%20files/referencecard_2021thermalguidelines.pdf

That quotes 20C/h, which is a much higher rate than almost anyone can
actually produce in their DC ambient. But it offers no explanation of
where this figure comes from.
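
Checking a room-temperature log against that figure is at least trivial; a
rough sketch with invented sample data:

    MAX_RATE_C_PER_H = 20.0
    # (timestamp_seconds, temp_C) samples from an imaginary sensor
    samples = [(0, 22.0), (900, 27.5), (1800, 34.0), (2700, 41.0)]

    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        rate = (c1 - c0) / ((t1 - t0) / 3600.0)
        flag = "EXCEEDS" if abs(rate) > MAX_RATE_C_PER_H else "ok"
        print(f"{t0:>5}-{t1:>5}s: {rate:+.1f} C/h  {flag}")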

I believe in reality there is immense complexity here:
 - Gradient tolerance depends on the processes and materials used in
manufacturing (pre- and post-RoHS will certainly have different
gradients)
 - Gradient has directionality, unlike the ASHRAE quote, because
devices are engineered to go from 20C to 90C in a very short moment
when turned on, but there was less engineering pressure for similar
cooling rates
 - Gradient has positionality: moving 20C between any two points does
not mean equal risk

And likely no one knows well, because no one has had to know well,
because it's not expensive enough to derisk.

But what we do know well:
- ASHRAE quotes a rate which you are unlikely to be able to hit
- Devices that travel with you regularly see 50C instant ambient
gradients, in both directions, multiple times a day
- Devices see large, fast gradients when turned on, but slower ones when
turned off
- Compute people quote ASHRAE, networking people appear not to;
perhaps, as you say, spindles are ultimately the reason for the limits
to exist

I think generally we have a bias in that we like to identify risks and
then add them as organisational knowledge, but ultimately all these
new rules and exceptions you introduce increase cost and complexity and
reduce efficiency and productivity. So we should be very critical
about them. It is fine to realise risks, and to use realised risks as
data to analyse whether avoiding those risks makes sense. It's very easy to
build poorly defined rules on top of poorly defined rules and arrive at
high-cost, low-efficiency operations.
This 'few degrees per hour' is an exceedingly palatable
rule of thumb; it sounds good, until you stop to think about it.

I would not recommend spending any time or money de-risking gradients.
I would hope that the rules that de-risk condensation are enough to cover
de-risking gradients, and I would re-evaluate after sufficient realised
risks.
-- 
  ++ytti


Re: "Hypothetical" Datacenter Overheating

2024-01-16 Thread Glenn McGurrin via NANOG
Free air cooling loops maybe? (Not direct free air cooling with air 
exchange; the version with something much like an air handler outside, 
with a coil and a fan running cold outside air over the coil carrying the 
water/glycol that would normally be the loop off of the chiller.) The 
primary use of them is cost savings, by using less energy to cool when 
it's fairly cold out, but they can also prevent low-temperature issues on 
compressors by not running them when it's cold.  I'd expect it would not 
require the same sort of facade changes, as it could be on the roof and, 
depending on the design, only need water/glycol lines into the space; 
depending on cooling tower vs. air cooled and chiller location it could 
also potentially use the same piping (which I think is the traditional use).


I'm also fairly curious to see the root cause analysis, also hoping 
someone is at least looking at some mechanism to allow transferring 
chiller capacity between floors if they had multiple floors and only had 
the failure on one floor.  This sort of mass failure seems to point 
towards either design issues (like equipment selection/configuration vs 
temperature range for the location), systemic maintenance issues, or 
some sort of single failure point that could take all the chillers out, 
none of which I'd be happy to see in a data center.


Anyone have any idea what the total cost of this incident is likely to 
reach (dead equipment, etc.)?


On 1/16/2024 4:08 PM, Sean Donelan wrote:


350 Cermak Chicago is a "historic" building which means you can't change 
the visible outside.  Someone had long discussions about the benefits of 
outside air economizers, but can't change the windows.  Need to hide 
HVAC plant (as much as possible).


I would design all colos to look like 375 Pearl St (formerly Verizon, 
formerly AT&T) New York.  Vents and concrete.


Almost all the windows visible on the outside of 350 Cermak Chicago are 
"fake." They are enclosed on the inside (with fake indoor decor) because 
of 1912 glass panes aren't very weatherproof. But they preserve the look 
and feel of the neighborhood :-)



350 Cermak rebuilt as a colo is over 20-years old. It will be 
interesting to read the final root cause analysis.


Of course, as always, networks and data centers should not depend on a 
single POP.  Diversify the redundancy, because something will always fail.


There are multiple POP/IXP in major cities.  And multiple cities with 
POPs and IXPs.


Re: "Hypothetical" Datacenter Overheating

2024-01-16 Thread bzs


For completeness' sake at the first commercial ISP to sell individual
dial-up to the public, The World, we had six of those typical desktop
2400bps modems (I forget the brand tho I still have them, a photo also
I think) sitting on a file cabinet in an office space in Brookline, MA
plugged into a Sun 4/280. I bought those modems from a local computer
retail store on my own personal credit card thinking maybe others
would like to try this internet thing, and we'd just gotten access to
a T1.

On January 17, 2024 at 09:28 ka...@biplane.com.au (Karl Auer) wrote:
 > On Tue, 2024-01-16 at 10:44 -0800, Jay Hennigan wrote:
 > > We made our own. And then we had to deal with all the wall warts. We 
 > > rigged up a power supply with a big snake of barrel jacks.
 > 
 > Luxury. We had a hamster in a hamster wheel for each modem.
 > 
 > Ah, the old days.
 > 
 > Regards, K.
 > 
 > -- 
 > ~~~
 > Karl Auer (ka...@biplane.com.au)
 > http://www.biplane.com.au/kauer
 > 

-- 
-Barry Shein

Software Tool & Die| b...@theworld.com | http://www.TheWorld.com
Purveyors to the Trade | Voice: +1 617-STD-WRLD   | 800-THE-WRLD
The World: Since 1989  | A Public Information Utility | *oo*


Re: "Hypothetical" Datacenter Overheating

2024-01-16 Thread bzs


Others have pointed to references, I found some others, it's all
pretty boring but perhaps one should embrace the general point that
some equipment may not like abrupt temperature changes.

But phones (well, modern mobile phones) don't generally have moving
parts.

So the issue is more likely with things like hard drives, the kind
with fast spinning platters with heads flying microns above those
platters, or even just fans and similar gear.

It leads me to another question which is that IN THE BEFORE DAYS you
had to wait to remove a removable disk until you were sure it was spun
down, maybe 30 seconds or so. If you lifted one and felt that
gyroscopic pull you may well have toasted it.

Today we have spinning disks in laptops you can toss across the room
or whatever so clearly they solved that problem. And they're
apparently a lot more tolerant of temperature and other environmental
changes.

So the tolerances may be much greater than one is ever likely to run
into.

I'm still not sure I'd be comfortable opening the windows to let
sub-freezing air into a 120F room with petabytes of spinning rust.

P.S. Please don't tell me what an SSD is. Yes they're probably much
more tolerant of environmental changes.

On January 16, 2024 at 09:08 s...@ytti.fi (Saku Ytti) wrote:
 > On Tue, 16 Jan 2024 at 08:51,  wrote:
 > 
 > > A rule of thumb is a few degrees per hour change but YMMV, depends on
 > > the equipment. Sometimes manufacturer's specs include this.
 > 
 > Is this common sense, or do you have reference to this, like paper
 > showing at what temperature change at what rate occurs what damage?
 > 
 > I regularly bring fine electronics, say iPhone, through significant
 > temperature gradients, as do most people who have to live in places
 > where inside and outside can be wildly different temperatures, with no
 > particular observable effect. iPhone does go into 'thermometer' mode,
 > when it overheats though.
 > 
 > Manufacturers, say Juniper and Cisco describe humidity, storage and
 > operating temperatures, but do not define temperature change rate.
 > Does NEBS have an opinion on this, or is this just a common case of
 > yours?
 > 
 > -- 
 >   ++ytti

-- 
-Barry Shein

Software Tool & Die| b...@theworld.com | http://www.TheWorld.com
Purveyors to the Trade | Voice: +1 617-STD-WRLD   | 800-THE-WRLD
The World: Since 1989  | A Public Information Utility | *oo*


Re: "Hypothetical" Datacenter Overheating

2024-01-16 Thread Karl Auer
On Tue, 2024-01-16 at 10:44 -0800, Jay Hennigan wrote:
> We made our own. And then we had to deal with all the wall warts. We 
> rigged up a power supply with a big snake of barrel jacks.

Luxury. We had a hamster in a hamster wheel for each modem.

Ah, the old days.

Regards, K.

-- 
~~~
Karl Auer (ka...@biplane.com.au)
http://www.biplane.com.au/kauer




Re: "Hypothetical" Datacenter Overheating

2024-01-16 Thread Sean Donelan



350 Cermak Chicago is a "historic" building which means you can't change 
the visible outside.  Someone had long discussions about the benefits of 
outside air economizers, but can't change the windows.  Need to hide 
HVAC plant (as much as possible).


I would design all colos to look like 375 Pearl St (formerly Verizon, 
formerly AT&T) New York.  Vents and concrete.


Almost all the windows visible on the outside of 350 Cermak Chicago are 
"fake." They are enclosed on the inside (with fake indoor decor) because 
of 1912 glass panes aren't very weatherproof. But they preserve the look 
and feel of the neighborhood :-)



350 Cermak rebuilt as a colo is over 20-years old. It will be interesting 
to read the final root cause analysis.


Of course, as always, networks and data centers should not depend on a 
single POP.  Diversify the redundancy, because something will always fail.


There are multiple POP/IXP in major cities.  And multiple cities with POPs 
and IXPs.


Re: "Hypothetical" Datacenter Overheating

2024-01-16 Thread Jay Hennigan

On 1/16/24 10:33, Shawn L via NANOG wrote:
I remember those days --- I think we bought cages from someone and 
pulled the boards out of the modems to mount them.


We made our own. And then we had to deal with all the wall warts. We 
rigged up a power supply with a big snake of barrel jacks.


Livingston Portmaster 2s, of course.

--
Jay Hennigan - j...@west.net
Network Engineering - CCIE #7880
503 897-8550 - WB6RDV



Re: "Hypothetical" Datacenter Overheating

2024-01-16 Thread William Herrin
On Tue, Jan 16, 2024 at 10:30 AM Chris Adams  wrote:
> The back-room ISP I started at was at least owned by a company with
> their own small machine shop, so we had them make plates we could mount
> two Sportsters (sans top) to and slide them into card cages. 20 modems
> in a 5U cage!

The ISP where I worked bought wall-mounted wire shelves and routed the
serial cable through the shelf to stably hold the modems vertically in
place and apart from each other. Worked pretty well. The open shelving
let air move up past them keeping them cool and offered plenty of tie
points for cable management.

Regards,
Bill Herrin


-- 
William Herrin
b...@herrin.us
https://bill.herrin.us/


Re: "Hypothetical" Datacenter Overheating

2024-01-16 Thread Izaac
On Tue, Jan 16, 2024 at 08:37:09AM -0800, Warren Kumari wrote:
> ISP/Colo provider

The good ole days.  When one stacked modems with two pencils in between
them and box fans blew through the gaps.

-- 
. ___ ___  .   .  ___
.  \/  |\  |\ \
.  _\_ /__ |-\ |-\ \__


RE: "Hypothetical" Datacenter Overheating

2024-01-16 Thread Robert Mercier
For gear climate controls and rate of change, look at the ASHRAE ratings on the 
equipment.  Most vendors publish the rating of equipment on their data sheets 
(sometimes they include the ASHRAE rating), and it gives the required operating 
conditions as well as acceptable rates of change.  Most well-run data centres 
follow these recommendations; this “hypothetical” data centre normally does as 
well, but it appears it may have missed some maintenance tasks.  I have equipment 
in another provider's suite in the same building which hasn’t been affected.



The ASHRAE (American Society of Heating, Refrigerating and Air-Conditioning 
Engineers) has a committee - ASHRAE Technical Committee 9.9 that covers Mission 
Critical Facilities, Data Centers, Technology Spaces and Electronic Equipment.



2021 Data Center Cooling Resiliency Brief

https://tpc.ashrae.org/Documents?cmtKey=fd4a4ee6-96a3-4f61-8b85-43418dfa988d



2016 ASHRAE Data Center Power Equipment Thermal Guidelines and Best Practices

https://tpc.ashrae.org/Documents?cmtKey=fd4a4ee6-96a3-4f61-8b85-43418dfa988d



2020 Cold Weather Shipping Acclimation and Best Practices (included this one 
because it is fitting this time of year)

https://tpc.ashrae.org/FileDownload?idx=809784d5-911b-4e9a-a2da-ff3ab6ff9eea





BTW; it hit -50°C (-58°F) in Alberta last week and you aren’t hearing about the 
data centres in that province going offline.  The record low for Chicago was 
-27°F set in 1985; this building wasn’t a data centre at that time, and only 
became a data centre in 1999 so they would have known how cold it could get 
there when they did the initial system planning and should have accounted for 
this.





Rob







Robert Mercier
CTO
Next Dimension Inc.
Tel: 1-800-461-0585 ext 421
rmerc...@nextdimensioninc.com
www.nextdimensioninc.com




-Original Message-
From: NANOG  On Behalf 
Of Saku Ytti
Sent: January 16, 2024 2:09 AM
To: b...@theworld.com
Cc: NANOG 
Subject: Re: "Hypothetical" Datacenter Overheating



On Tue, 16 Jan 2024 at 08:51, b...@theworld.com wrote:



> A rule of thumb is a few degrees per hour change but YMMV, depends on

> the equipment. Sometimes manufacturer's specs include this.



Is this common sense, or do you have reference to this, like paper showing at 
what temperature change at what rate occurs what damage?



I regularly bring fine electronics, say iPhone, through significant temperature 
gradients, as do most people who have to live in places where inside and 
outside can be wildly different temperatures, with no particular observable 
effect. iPhone does go into 'thermometer' mode, when it overheats though.



Manufacturers, say Juniper and Cisco describe humidity, storage and operating 
temperatures, but do not define temperature change rate.

Does NEBS have an opinion on this, or is this just a common case of yours?



--

  ++ytti


Re: "Hypothetical" Datacenter Overheating

2024-01-16 Thread Warren Kumari
On Mon, Jan 15, 2024 at 9:55 AM, William Herrin  wrote:

> On Mon, Jan 15, 2024 at 6:08 AM Mike Hammett  wrote:
>
> Let's say that hypothetically, a datacenter you're in had a cooling
> failure and escalated to an average of 120 degrees before mitigations
> started having an effect. What should be expected in the aftermath?
>
> Hi Mike,
>
> A decade or so ago I maintained a computer room with a single air
> conditioner because the boss wouldn't go for n+1. It failed in exactly this
> manner several times.
>


And in the early 2000s I worked at a (very crappy) ISP/Colo provider which
had their primary locations in a small, brick garage. It *did* have
redundant AC — in the form of two large window units, stuck into a hole
which had been hacked through the brick wall. They were redundant — there
were two of them, and they were on separate circuits. What more could you
ask for?!

At 2AM one morning I'm awakened from my slumber by a warning page from the
monitoring system (Whatsup Gold. Remember Whatsup Gold?) letting me know
that the temperature is out of range. This is a fairly common occurrence,
so I ack it and go back to sleep. A short while later I'm awakened again,
and this time it's a critical alert and the temperature is really high.

So, I grumble, get dressed, and drive over to the location. I open the
door, and, yes, it really *is* hot. This is because the AC units have been
vibrating over the years, and the entire row of bricks above have popped
out. There is now an even larger hole in the wall, and both AC units are
lying outside, still running.

'Twas not a good day….
W




After the overheat was detected by the monitoring system, it would be
> brought under control with a combination of spot cooler and powering down
> to a minimal configuration. But of course it takes time to get people there
> and set up the mitigations, during which the heat continues to rise.
>
> The main thing I noticed was a modest uptick in spinning drive failures
> for the couple months that followed. If there was any other consequence it
> was at a rate where I'd have had to be carefully measuring before and after
> to detect it.
>
> Regards,
> Bill Herrin
>
> --
> William Herrin
> b...@herrin.us
> https://bill.herrin.us/
>


Re: "Hypothetical" Datacenter Overheating

2024-01-16 Thread Izaac
On Mon, Jan 15, 2024 at 10:14:49AM -0500, sro...@ronan-online.com wrote:
> I’m more interested in how you lose six chillers all at once.

Because you're probably mistaking an air handling unit for a chiller.

I usually point people at this to get us on the same page:
https://www.youtube.com/watch?v=1cvFlBLo4u0

If you are not so mistaken, it is important to realize that neither
roofs nor utility risers are infinitely wide to accommodate the six
independent cooling towers and refrigerant lines that would thus be
required for a single floor.  They're sharing something somewhere.

-- 
. ___ ___  .   .  ___
.  \/  |\  |\ \
.  _\_ /__ |-\ |-\ \__


Re: "Hypothetical" Datacenter Overheating

2024-01-16 Thread Ray Bellis

On 16/01/2024 01:32, Mike Hammett wrote:

Someone I talked to while on scene today said their area got to 130 and 
cooked two core routers.


We've lost one low-end switch.  I'm very glad it wasn't two core routers!

We're still looking into what recourse we have against the datacenter 
operator.


Ray


Re: "Hypothetical" Datacenter Overheating

2024-01-16 Thread Saku Ytti
On Tue, 16 Jan 2024 at 12:22, Nathan Ward  wrote:

> Here’s some manufacturer specs:
> https://www.dell.com/support/manuals/en-nz/poweredge-r6515/per6515_ts_pub/environmental-specifications?guid=guid-debd273c-0dc8-40d8-abbc-be059a0ce59c=en-us
>
> 3rd section, “Maximum temperature gradient”.

Thanks. It seems quite a few compute contexts quote ASHRAE gradients,
but in a networking-kit context it is very rarely quoted (unless
indirectly via NEBS), while intuitively I wouldn't expect their
tolerances to be significantly different.

-- 
  ++ytti


Re: "Hypothetical" Datacenter Overheating

2024-01-16 Thread Nathan Ward via NANOG
On 16/01/2024 at 10:50:13 PM, Saku Ytti  wrote:

> On Tue, 16 Jan 2024 at 11:00, William Herrin  wrote:
>
> You have a computer room humidified to 40% and you inject cold air
>
> below the dew point. The surfaces in the room will get wet.
>
>
> I think humidity and condensation is well understood and indeed
> documented but by NEBS and vendors as verboten.
>
> I am more interested in temperature changes when not condensating and
> causing water damage. Like we could theorise, some soldering will
> expand/contract too fast, breaking or various other types of scenarios
> one might guess without context, and indeed electronics often have to
> experience large temperature gradients and appear to survive.
> When you turn these things on, various parts rapidly heat from ambient
> to 80-90c. So I have some doubts if this is actually a problem you
> need to consider, in absence of condensation.
>

Here’s some manufacturer specs:

https://www.dell.com/support/manuals/en-nz/poweredge-r6515/per6515_ts_pub/environmental-specifications?guid=guid-debd273c-0dc8-40d8-abbc-be059a0ce59c=en-us

3rd section, “Maximum temperature gradient”.

From memory, the management cards alarm when the gradient is exceeded, too.

--
Nathan Ward


Re: "Hypothetical" Datacenter Overheating

2024-01-16 Thread Saku Ytti
On Tue, 16 Jan 2024 at 11:00, William Herrin  wrote:

> You have a computer room humidified to 40% and you inject cold air
> below the dew point. The surfaces in the room will get wet.

I think humidity and condensation are well understood and indeed
documented by NEBS and vendors as verboten.

I am more interested in temperature changes when not condensing and
causing water damage. We could theorise that some soldering will
expand/contract too fast and break, or various other scenarios
one might guess without context, yet electronics often have to
experience large temperature gradients and appear to survive.
When you turn these things on, various parts rapidly heat from ambient
to 80-90C. So I have some doubts whether this is actually a problem you
need to consider, in the absence of condensation.

-- 
  ++ytti


Re: "Hypothetical" Datacenter Overheating

2024-01-16 Thread Bryan Holloway



On 1/15/24 23:11, Martin Hannigan wrote:



On Mon, Jan 15, 2024 at 4:10 PM Jay Hennigan > wrote:


On 1/15/24 10:37, Pennington, Scott wrote:
 > yes but it has been -8 in Chicago plenty of times before this.
 >   Very interested in root cause...

Absolutely. My point was that claiming "Global warming" isn't going to
fly as an excuse.


+1

Is their design N+1?

https://www.equinix.com/data-centers/americas-colocation/united-states-colocation/chicago-data-centers/ch1
 


We're not smashing temp records in Chicago. At least it doesn't seem so 
when you look across historical data:


https://www.weather.gov/lot/Chicago_Temperature_Records 



HTH,

-M<


I was at that "hypothetical" location once when it was -17º F ...

No issues then ...

I really have to wonder how six chillers all failed at once.


Re: "Hypothetical" Datacenter Overheating

2024-01-16 Thread William Herrin
On Mon, Jan 15, 2024 at 11:08 PM Saku Ytti  wrote:
> On Tue, 16 Jan 2024 at 08:51,  wrote:
> > A rule of thumb is a few degrees per hour change but YMMV, depends on
> > the equipment. Sometimes manufacturer's specs include this.
>
> Is this common sense, or do you have reference to this, like paper
> showing at what temperature change at what rate occurs what damage?

It's uncommon sense.

You have a computer room humidified to 40% and you inject cold air
below the dew point. The surfaces in the room will get wet.

See also: https://en.wikipedia.org/wiki/Thermal_stress
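
To put a number on "below the dew point": with the usual Magnus
approximation, a 22C room at 40% RH has a dew point around 8C, so air that
chills surfaces below roughly that will make them sweat. (Room conditions
assumed for illustration.)

    import math

    def dew_point_c(temp_c, rh_percent, a=17.62, b=243.12):
        """Dew point via the Magnus approximation (Magnus coefficients a, b)."""
        gamma = a * temp_c / (b + temp_c) + math.log(rh_percent / 100.0)
        return b * gamma / (a - gamma)

    print(round(dew_point_c(22.0, 40.0), 1))  # ~7.8 C, about 46 F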

Regards,
Bill Herrin

-- 
William Herrin
b...@herrin.us
https://bill.herrin.us/


Re: "Hypothetical" Datacenter Overheating

2024-01-15 Thread sronan
Good thing there are no windows at this “hypothetical” location :)

> On Jan 16, 2024, at 1:51 AM, b...@theworld.com wrote:
> 
> 
> Something worth a thought is that as much as devices don't like being
> too hot they also don't like to have their temperature change too
> quickly. Parts can expand/shrink variably depending on their
> composition.
> 
> A rule of thumb is a few degrees per hour change but YMMV, depends on
> the equipment. Sometimes manufacturer's specs include this.
> 
> Throwing open the windows on a winter day to try to rapidly bring the
> room down to a "normal" temperature may do more harm than good.
> 
> It might be worthwhile figuring out what is reasonable in advance with
> buy-in rather than in a panic because, from personal experience,
> someone will be screaming in your ear JUST OPEN ALL THE WINDOWS
> WHADDYA STUPID?
> 
>> On January 15, 2024 at 09:23 clay...@mnsi.net (Clayton Zekelman) wrote:
>> 
>> 
>> 
>> At 09:08 AM 2024-01-15, Mike Hammett wrote:
>>> Let's say that hypothetically, a datacenter you're in had a cooling
>>> failure and escalated to an average of 120 degrees before
>>> mitigations started having an effect. What are normal QA procedures
>>> on your behalf? What is the facility likely to be doing?
>>> What  should be expected in the aftermath?
>> 
>> One would hope they would have had disaster recovery plans to bring
>> in outside cold air, and have executed on it quickly, rather than
>> hoping the chillers got repaired.
>> 
>> All our owned facilities have large outside air intakes, automatic
>> dampers and air mixing chambers in case of mechanical cooling
>> failure, because cooling systems are often not designed to run well
>> in extreme cold.  All of these can be manually run incase of controls
>> failure, but people tell me I'm a little obsessive over backup plans
>> for backup plans.
>> 
>> You will start to see premature failure of equipment over the coming
>> weeks/months/years.
>> 
>> Coincidentally, we have some gear in a data centre in the Chicago
>> area that is experiencing that sort of issue right now... :-(
>> 
>> 
>> 
> 
> --
>-Barry Shein
> 
> Software Tool & Die| b...@theworld.com | 
> http://www.TheWorld.com
> Purveyors to the Trade | Voice: +1 617-STD-WRLD   | 800-THE-WRLD
> The World: Since 1989  | A Public Information Utility | *oo*


Re: "Hypothetical" Datacenter Overheating

2024-01-15 Thread Saku Ytti
On Tue, 16 Jan 2024 at 08:51,  wrote:

> A rule of thumb is a few degrees per hour change but YMMV, depends on
> the equipment. Sometimes manufacturer's specs include this.

Is this common sense, or do you have reference to this, like paper
showing at what temperature change at what rate occurs what damage?

I regularly bring fine electronics, say iPhone, through significant
temperature gradients, as do most people who have to live in places
where inside and outside can be wildly different temperatures, with no
particular observable effect. iPhone does go into 'thermometer' mode,
when it overheats though.

Manufacturers, say Juniper and Cisco describe humidity, storage and
operating temperatures, but do not define temperature change rate.
Does NEBS have an opinion on this, or is this just a common case of
yours?

-- 
  ++ytti


Re: "Hypothetical" Datacenter Overheating

2024-01-15 Thread bzs


Something worth a thought is that as much as devices don't like being
too hot they also don't like to have their temperature change too
quickly. Parts can expand/shrink variably depending on their
composition.

A rule of thumb is a few degrees per hour change but YMMV, depends on
the equipment. Sometimes manufacturer's specs include this.

Throwing open the windows on a winter day to try to rapidly bring the
room down to a "normal" temperature may do more harm than good.

It might be worthwhile figuring out what is reasonable in advance with
buy-in rather than in a panic because, from personal experience,
someone will be screaming in your ear JUST OPEN ALL THE WINDOWS
WHADDYA STUPID?

On January 15, 2024 at 09:23 clay...@mnsi.net (Clayton Zekelman) wrote:
 > 
 > 
 > 
 > At 09:08 AM 2024-01-15, Mike Hammett wrote:
 > >Let's say that hypothetically, a datacenter you're in had a cooling 
 > >failure and escalated to an average of 120 degrees before 
 > >mitigations started having an effect. What are normal QA procedures 
 > >on your behalf? What is the facility likely to be doing? 
 > >What  should be expected in the aftermath?
 > 
 > One would hope they would have had disaster recovery plans to bring 
 > in outside cold air, and have executed on it quickly, rather than 
 > hoping the chillers got repaired.
 > 
 > All our owned facilities have large outside air intakes, automatic 
 > dampers and air mixing chambers in case of mechanical cooling 
 > failure, because cooling systems are often not designed to run well 
 > in extreme cold.  All of these can be manually run incase of controls 
 > failure, but people tell me I'm a little obsessive over backup plans 
 > for backup plans.
 > 
 > You will start to see premature failure of equipment over the coming 
 > weeks/months/years.
 > 
 > Coincidentally, we have some gear in a data centre in the Chicago 
 > area that is experiencing that sort of issue right now... :-(
 > 
 > 
 > 

-- 
-Barry Shein

Software Tool & Die| b...@theworld.com | http://www.TheWorld.com
Purveyors to the Trade | Voice: +1 617-STD-WRLD   | 800-THE-WRLD
The World: Since 1989  | A Public Information Utility | *oo*


Re: "Hypothetical" Datacenter Overheating

2024-01-15 Thread Mike Hammett
Someone I talked to while on scene today said their area got to 130 and cooked 
two core routers. 




- 
Mike Hammett 
Intelligent Computing Solutions 

Midwest Internet Exchange 

The Brothers WISP 

- Original Message -

From: "Mike Hammett"  
To: "NANOG"  
Sent: Monday, January 15, 2024 8:08:25 AM 
Subject: "Hypothetical" Datacenter Overheating 


Let's say that hypothetically, a datacenter you're in had a cooling failure and 
escalated to an average of 120 degrees before mitigations started having an 
effect. What are normal QA procedures on your behalf? What is the facility 
likely to be doing? What should be expected in the aftermath? 




- 
Mike Hammett 
Intelligent Computing Solutions 

Midwest Internet Exchange 

The Brothers WISP 




Re: "Hypothetical" Datacenter Overheating

2024-01-15 Thread Karl Auer
On Mon, 2024-01-15 at 08:08 -0600, Mike Hammett wrote:
> Let's say that hypothetically, a datacenter you're in had a cooling
> failure and escalated to an average of 120 degrees

Major double-take there for this non-US reader, until I realised you
just had to mean Fahrenheit.

Regards, K.

-- 
~~~
Karl Auer (ka...@biplane.com.au)
http://www.biplane.com.au/kauer





Re: "Hypothetical" Datacenter Overheating

2024-01-15 Thread Lamar Owen
On Mon, Jan 15, 2024 at 7:14 AM  wrote:
>> I’m more interested in how you lose six chillers all at once.
>Extreme cold. If the transfer temperature is too low, they can reach a
>state where the refrigerant liquifies too soon, damaging the
>compressor.
>Regards,
>Bill Herrin

Our 70-ton Tranes here have kicked out on 'freeze warning' before; there's a 
strainer in the water loop at the evaporator that can clog, restricting flow 
enough to allow freezing to occur if the chiller is actively cooling.  It's so 
strange to have an overheating data center in subzero (F) temps.  The flow 
sensor in the water loop can sometimes get too cold and not register the flow 
as well.
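
The interlock behaves roughly like the sketch below; the set points are 
invented and the real logic lives in the chiller's own controls:

    FREEZE_CUTOUT_F = 36.0   # assumed leaving-water freeze cutout
    MIN_FLOW_GPM    = 120.0  # assumed minimum evaporator flow

    def freeze_check(leaving_water_f, evap_flow_gpm, compressor_on):
        """Return why (or whether) the chiller should trip to protect the evaporator."""
        if not compressor_on:
            return "ok"
        if evap_flow_gpm < MIN_FLOW_GPM:
            return "trip: low evaporator flow (clogged strainer or cold flow sensor?)"
        if leaving_water_f < FREEZE_CUTOUT_F:
            return "trip: freeze warning, leaving water too cold"
        return "ok"

    print(freeze_check(leaving_water_f=34.0, evap_flow_gpm=200.0, compressor_on=True))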





Re: "Hypothetical" Datacenter Overheating

2024-01-15 Thread William Herrin
On Mon, Jan 15, 2024 at 7:14 AM  wrote:
> I’m more interested in how you lose six chillers all at once.

Extreme cold. If the transfer temperature is too low, they can reach a
state where the refrigerant liquifies too soon, damaging the
compressor.

Regards,
Bill Herrin


-- 
William Herrin
b...@herrin.us
https://bill.herrin.us/


Re: "Hypothetical" Datacenter Overheating

2024-01-15 Thread Martin Hannigan
On Mon, Jan 15, 2024 at 4:10 PM Jay Hennigan  wrote:

> On 1/15/24 10:37, Pennington, Scott wrote:
> > yes but it has been -8 in Chicago plenty of times before this.
> >   Very interested in root cause...
>
> Absolutely. My point was that claiming "Global warming" isn't going to
> fly as an excuse.
>

+1

Is their design N+1?

https://www.equinix.com/data-centers/americas-colocation/united-states-colocation/chicago-data-centers/ch1

We're not smashing temp records in Chicago. At least it doesn't seem so
when you look across historical data:

https://www.weather.gov/lot/Chicago_Temperature_Records

HTH,

-M<


Re: "Hypothetical" Datacenter Overheating

2024-01-15 Thread Mel Beckman
My sarcasm generator is clearly set incorrectly :)

 -mel

> On Jan 15, 2024, at 10:33 AM, Jay Hennigan  wrote:
> 
> On 1/15/24 07:21, Mel Beckman wrote:
>> Easy. Climate change. Lol!
> 
> It was -8°F in Chicago yesterday.
> 
 On Jan 15, 2024, at 7:17 AM, sro...@ronan-online.com wrote:
>>> 
>>> 
>>> I’m more interested in how you lose six chillers all at once.
> 
> 
> -- 
> Jay Hennigan - j...@west.net
> Network Engineering - CCIE #7880
> 503 897-8550 - WB6RDV
> 


Re: "Hypothetical" Datacenter Overheating

2024-01-15 Thread Jay Hennigan

On 1/15/24 10:37, Pennington, Scott wrote:
yes but it has been -8 in Chicago plenty of times before this.  
  Very interested in root cause...


Absolutely. My point was that claiming "Global warming" isn't going to 
fly as an excuse.




*From:* NANOG  on 
behalf of Jay Hennigan 

*Sent:* Monday, January 15, 2024 1:31 PM
*To:* nanog@nanog.org 
*Subject:* Re: "Hypothetical" Datacenter Overheating
On 1/15/24 07:21, Mel Beckman wrote:

Easy. Climate change. Lol!


It was -8°F in Chicago yesterday.


On Jan 15, 2024, at 7:17 AM, sro...@ronan-online.com wrote:


I’m more interested in how you lose six chillers all at once.


--
Jay Hennigan - j...@west.net
Network Engineering - CCIE #7880
503 897-8550 - WB6RDV



Re: "Hypothetical" Datacenter Overheating

2024-01-15 Thread Jay Hennigan

On 1/15/24 07:21, Mel Beckman wrote:

Easy. Climate change. Lol!


It was -8°F in Chicago yesterday.


On Jan 15, 2024, at 7:17 AM, sro...@ronan-online.com wrote:


I’m more interested in how you lose six chillers all at once.



--
Jay Hennigan - j...@west.net
Network Engineering - CCIE #7880
503 897-8550 - WB6RDV



Re: "Hypothetical" Datacenter Overheating

2024-01-15 Thread sronan
Exactly. Perhaps they weren’t all online to begin with…

On Jan 15, 2024, at 10:18 AM, Mike Hammett  wrote:

> and none in the other two facilities you operate in that same building had
> any failures.
>
> -
> Mike Hammett
> Intelligent Computing Solutions
> Midwest Internet Exchange
> The Brothers WISP
>
> From: sro...@ronan-online.com
> To: "Mike Hammett" 
> Cc: "NANOG" 
> Sent: Monday, January 15, 2024 9:14:49 AM
> Subject: Re: "Hypothetical" Datacenter Overheating
>
> I’m more interested in how you lose six chillers all at once.
>
> Shane
>
> On Jan 15, 2024, at 9:11 AM, Mike Hammett  wrote:
>
> Let's say that hypothetically, a datacenter you're in had a cooling failure
> and escalated to an average of 120 degrees before mitigations started having
> an effect. What are normal QA procedures on your behalf? What is the facility
> likely to be doing? What should be expected in the aftermath?
>
> -
> Mike Hammett
> Intelligent Computing Solutions
> Midwest Internet Exchange
> The Brothers WISP

Re: "Hypothetical" Datacenter Overheating

2024-01-15 Thread Mel Beckman
Easy. Climate change. Lol!

 -mel

On Jan 15, 2024, at 7:17 AM, sro...@ronan-online.com wrote:


I’m more interested in how you lose six chillers all at once.

Shane

On Jan 15, 2024, at 9:11 AM, Mike Hammett  wrote:


Let's say that hypothetically, a datacenter you're in had a cooling failure and 
escalated to an average of 120 degrees before mitigations started having an 
effect. What are normal QA procedures on your behalf? What is the facility 
likely to be doing? What  should be expected in the aftermath?



-
Mike Hammett
Intelligent Computing Solutions
Midwest Internet Exchange
The Brothers WISP


Re: "Hypothetical" Datacenter Overheating

2024-01-15 Thread Mike Hammett
and none in the other two facilities you operate in that same building had any 
failures. 




- 
Mike Hammett 
Intelligent Computing Solutions 

Midwest Internet Exchange 

The Brothers WISP 

- Original Message -

From: sro...@ronan-online.com 
To: "Mike Hammett"  
Cc: "NANOG"  
Sent: Monday, January 15, 2024 9:14:49 AM 
Subject: Re: "Hypothetical" Datacenter Overheating 



I’m more interested in how you lose six chillers all at once. 


Shane 



On Jan 15, 2024, at 9:11 AM, Mike Hammett  wrote: 







Let's say that hypothetically, a datacenter you're in had a cooling failure and 
escalated to an average of 120 degrees before mitigations started having an 
effect. What are normal QA procedures on your behalf? What is the facility 
likely to be doing? What should be expected in the aftermath? 




- 
Mike Hammett 
Intelligent Computing Solutions 

Midwest Internet Exchange 

The Brothers WISP 






Re: "Hypothetical" Datacenter Overheating

2024-01-15 Thread sronan
I’m more interested in how you lose six chillers all at once.

Shane

On Jan 15, 2024, at 9:11 AM, Mike Hammett  wrote:

> Let's say that hypothetically, a datacenter you're in had a cooling failure
> and escalated to an average of 120 degrees before mitigations started having
> an effect. What are normal QA procedures on your behalf? What is the facility
> likely to be doing? What should be expected in the aftermath?
>
> -
> Mike Hammett
> Intelligent Computing Solutions
> Midwest Internet Exchange
> The Brothers WISP

Re: "Hypothetical" Datacenter Overheating

2024-01-15 Thread Jason Canady
Our Zayo circuit just came up 30 minutes ago and it routes through 350 E 
Cermak.  Chillers were all messed up.  No hypothetical there.  :-) It 
was down for over 16 hours!


On 1/15/24 10:04 AM, Bryan Holloway wrote:

I think we're beyond "hypothetical" at this point, Mike ... ;)


On 1/15/24 15:49, Mike Hammett wrote:

Coincidence indeed   ;-)



-
Mike Hammett
Intelligent Computing Solutions <http://www.ics-il.com/>
Midwest Internet Exchange <http://www.midwest-ix.com/>
The Brothers WISP <http://www.thebrotherswisp.com/>



*From: *"Clayton Zekelman" 
*To: *"Mike Hammett" , "NANOG" 
*Sent: *Monday, January 15, 2024 8:23:37 AM
*Subject: *Re: "Hypothetical" Datacenter Overheating




At 09:08 AM 2024-01-15, Mike Hammett wrote:
 >Let's say that hypothetically, a datacenter you're in had a cooling
 >failure and escalated to an average of 120 degrees before
 >mitigations started having an effect. What are normal QA procedures
 >on your behalf? What is the facility likely to be doing?
 >What  should be expected in the aftermath?

One would hope they would have had disaster recovery plans to bring
in outside cold air, and have executed on it quickly, rather than
hoping the chillers got repaired.

All our owned facilities have large outside air intakes, automatic
dampers and air mixing chambers in case of mechanical cooling
failure, because cooling systems are often not designed to run well
in extreme cold.  All of these can be manually run in case of controls
failure, but people tell me I'm a little obsessive over backup plans
for backup plans.

You will start to see premature failure of equipment over the coming
weeks/months/years.

Coincidentally, we have some gear in a data centre in the Chicago
area that is experiencing that sort of issue right now... :-(







Re: "Hypothetical" Datacenter Overheating

2024-01-15 Thread Bryan Holloway

I think we're beyond "hypothetical" at this point, Mike ... ;)


On 1/15/24 15:49, Mike Hammett wrote:

Coincidence indeed   ;-)



-
Mike Hammett
Intelligent Computing Solutions <http://www.ics-il.com/>
Midwest Internet Exchange <http://www.midwest-ix.com/>
The Brothers WISP <http://www.thebrotherswisp.com/>

*From: *"Clayton Zekelman" 
*To: *"Mike Hammett" , "NANOG" 
*Sent: *Monday, January 15, 2024 8:23:37 AM
*Subject: *Re: "Hypothetical" Datacenter Overheating




At 09:08 AM 2024-01-15, Mike Hammett wrote:
 >Let's say that hypothetically, a datacenter you're in had a cooling
 >failure and escalated to an average of 120 degrees before
 >mitigations started having an effect. What are normal QA procedures
 >on your behalf? What is the facility likely to be doing?
 >What  should be expected in the aftermath?

One would hope they would have had disaster recovery plans to bring
in outside cold air, and have executed on it quickly, rather than
hoping the chillers got repaired.

All our owned facilities have large outside air intakes, automatic
dampers and air mixing chambers in case of mechanical cooling
failure, because cooling systems are often not designed to run well
in extreme cold.  All of these can be manually run in case of controls
failure, but people tell me I'm a little obsessive over backup plans
for backup plans.

You will start to see premature failure of equipment over the coming
weeks/months/years.

Coincidentally, we have some gear in a data centre in the Chicago
area that is experiencing that sort of issue right now... :-(







Re: "Hypothetical" Datacenter Overheating

2024-01-15 Thread William Herrin
On Mon, Jan 15, 2024 at 6:08 AM Mike Hammett  wrote:
> Let's say that hypothetically, a datacenter you're in had a cooling failure
> and escalated to an average of 120 degrees before mitigations started
> having an effect. What  should be expected in the aftermath?

Hi Mike,

A decade or so ago I maintained a computer room with a single air
conditioner because the boss wouldn't go for n+1. It failed in exactly
this manner several times. After the overheat was detected by the
monitoring system, it would be brought under control with a
combination of a spot cooler and powering down to a minimal
configuration. But of course it takes time to get people there and set
up the mitigations, during which the heat continues to rise.
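
Automating the first response shaves some of that delay. A minimal sketch of
a staged overheat response; the thresholds, host names, and shutdown
mechanism here are all hypothetical:

# Minimal sketch of an automated overheat response: page on the first
# threshold, start shutting down non-critical hosts on the second, so the
# room buys time while humans travel.

import subprocess

WARN_F = 85.0        # first threshold: page the on-call
SHED_F = 95.0        # second threshold: start shedding non-critical load

# Hypothetical hosts that are safe to power off under duress.
NON_CRITICAL_HOSTS = ["build01.example.net", "lab02.example.net"]

def respond(room_temp_f: float) -> None:
    if room_temp_f >= SHED_F:
        print(f"{room_temp_f:.0f}F: shedding non-critical load")
        for host in NON_CRITICAL_HOSTS:
            try:
                # Graceful shutdown over SSH (assumes key auth is in place).
                subprocess.run(["ssh", host, "sudo", "shutdown", "-h", "now"],
                               check=False, timeout=30)
            except (OSError, subprocess.TimeoutExpired):
                pass          # host already down or unreachable; keep going
    elif room_temp_f >= WARN_F:
        print(f"{room_temp_f:.0f}F: paging on-call, no automatic action yet")

if __name__ == "__main__":
    respond(88.0)             # demo reading: warns but sheds nothing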

The main thing I noticed was a modest uptick in spinning drive
failures for the couple months that followed. If there was any other
consequence it was at a rate where I'd have had to be carefully
measuring before and after to detect it.

Regards,
Bill Herrin


-- 
William Herrin
b...@herrin.us
https://bill.herrin.us/


Re: "Hypothetical" Datacenter Overheating

2024-01-15 Thread Mike Hammett
Coincidence indeed ;-) 




- 
Mike Hammett 
Intelligent Computing Solutions 

Midwest Internet Exchange 

The Brothers WISP 

- Original Message -

From: "Clayton Zekelman"  
To: "Mike Hammett" , "NANOG"  
Sent: Monday, January 15, 2024 8:23:37 AM 
Subject: Re: "Hypothetical" Datacenter Overheating 




At 09:08 AM 2024-01-15, Mike Hammett wrote: 
>Let's say that hypothetically, a datacenter you're in had a cooling 
>failure and escalated to an average of 120 degrees before 
>mitigations started having an effect. What are normal QA procedures 
>on your behalf? What is the facility likely to be doing? 
>What should be expected in the aftermath? 

One would hope they would have had disaster recovery plans to bring 
in outside cold air, and have executed on it quickly, rather than 
hoping the chillers got repaired. 

All our owned facilities have large outside air intakes, automatic 
dampers and air mixing chambers in case of mechanical cooling 
failure, because cooling systems are often not designed to run well 
in extreme cold. All of these can be manually run in case of controls
failure, but people tell me I'm a little obsessive over backup plans 
for backup plans. 

You will start to see premature failure of equipment over the coming 
weeks/months/years. 

Coincidentally, we have some gear in a data centre in the Chicago 
area that is experiencing that sort of issue right now... :-( 







Re: "Hypothetical" Datacenter Overheating

2024-01-15 Thread Clayton Zekelman





At 09:08 AM 2024-01-15, Mike Hammett wrote:
Let's say that hypothetically, a datacenter you're in had a cooling 
failure and escalated to an average of 120 degrees before 
mitigations started having an effect. What are normal QA procedures 
on your behalf? What is the facility likely to be doing? 
What  should be expected in the aftermath?


One would hope they would have had disaster recovery plans to bring 
in outside cold air, and have executed on it quickly, rather than 
hoping the chillers got repaired.


All our owned facilities have large outside air intakes, automatic 
dampers and air mixing chambers in case of mechanical cooling 
failure, because cooling systems are often not designed to run well 
in extreme cold.  All of these can be manually run in case of controls
failure, but people tell me I'm a little obsessive over backup plans 
for backup plans.
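
As a rough illustration of that free-cooling decision: every number and
point name below is invented, and a real BMS sequence carries far more
interlocks, but the core of it is just a mixing calculation.

# Rough sketch of an emergency free-cooling decision: if mechanical
# cooling is lost and it's cold outside, open the outside-air damper and
# mix to a target supply temperature. All values are made up.

def outside_air_damper_pct(outdoor_f: float,
                           return_f: float,
                           target_supply_f: float = 65.0,
                           min_pct: float = 10.0) -> float:
    """Fraction of outside air (0-100%) needed to hit the target mix temp."""
    if outdoor_f >= return_f:
        return min_pct                      # outside air is no help
    # Simple mixing equation: supply = x*outdoor + (1-x)*return
    x = (return_f - target_supply_f) / (return_f - outdoor_f)
    return max(min_pct, min(100.0, x * 100.0))

def emergency_mode(chillers_ok: bool, outdoor_f: float, return_f: float):
    if chillers_ok:
        return "normal", 0.0
    if outdoor_f > 60.0:
        return "mechanical failure, outside air too warm -- shed load", 0.0
    return "free cooling", outside_air_damper_pct(outdoor_f, return_f)

if __name__ == "__main__":
    mode, pct = emergency_mode(chillers_ok=False, outdoor_f=-8.0, return_f=95.0)
    print(mode, f"{pct:.0f}% outside air")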


You will start to see premature failure of equipment over the coming 
weeks/months/years.


Coincidentally, we have some gear in a data centre in the Chicago 
area that is experiencing that sort of issue right now... :-(







"Hypothetical" Datacenter Overheating

2024-01-15 Thread Mike Hammett
Let's say that hypothetically, a datacenter you're in had a cooling failure and 
escalated to an average of 120 degrees before mitigations started having an 
effect. What are normal QA procedures on your behalf? What is the facility 
likely to be doing? What should be expected in the aftermath? 




- 
Mike Hammett 
Intelligent Computing Solutions 

Midwest Internet Exchange 

The Brothers WISP