I'm going to start working on VCL-169 Event Driven power down...

This is the first step of a larger power management feature.

In this step, I suggest extending the health_check.pl script to accept options for different data center events that would require the hardware to be shut down. These events are usually related to heat issues detected by sensors within the blade chassis or by other external thermal sensors.
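As a rough illustration of what such options might look like, here is a minimal sketch. It is written in Python only to keep the example short; health_check.pl is Perl, where something like Getopt::Long would presumably be used instead. The option names `--powerdown` and `--countdown-minutes`, their values, and the default are all hypothetical and not part of the existing script.

```python
import argparse


def build_parser():
    """Build a command-line parser with hypothetical power-down options."""
    parser = argparse.ArgumentParser(prog="health_check")
    # Hypothetical option: which power-down phase the data center event calls for.
    # "idle"  -> shut down idle blades only (phase 1)
    # "inuse" -> also shut down blades that are in use (phase 2)
    parser.add_argument(
        "--powerdown",
        choices=["idle", "inuse"],
        default=None,
        help="power-down phase to run; omit for a normal health check",
    )
    # Hypothetical option: warning period given to users before an in-use
    # blade is shut down. The default of 15 minutes is an arbitrary placeholder.
    parser.add_argument(
        "--countdown-minutes",
        type=int,
        default=15,
        help="minutes of warning before in-use blades are shut down",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.powerdown, args.countdown_minutes)
```

With no options the script would behave as it does today; the power-down path runs only when an event is passed explicitly.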

The two primary events are:

1) Shut down idle blades (phase 1)
I'm thinking the process is to find all idle blades under the controlling management node, relocate any upcoming reservations assigned to those blades, and then shut the blades down.

2) Shut down blades currently in use (phase 2, if phase 1 did not do enough)
This second phase would be triggered if and only if phase 1 is not effective. It notifies the user of the VCL resource about the unexpected data center problem and then starts a countdown to when the node will be shut down. Depending on the reservation type (long-term vs. short-term, or some other criterion), we'll need to either reclaim the blade or just shut it down and retain the reservation data by extending the end time. Once things are back to normal, vcld will detect these previous reservations on startup, start them back up, and notify the end users that their reservations are available again.
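To make the two phases concrete, here is a rough sketch of the decision logic. It is an illustration in Python rather than code from vcld or health_check.pl. The `Blade` and `Reservation` types and the `relocate` and `notify` callbacks are hypothetical. The rule that long-term reservations are retained and short ones are reclaimed is only an assumption, since that choice is still open.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, List, Optional, Tuple


@dataclass
class Reservation:
    user: str
    end: datetime
    long_term: bool = False


@dataclass
class Blade:
    name: str
    powered_on: bool = True
    current: Optional[Reservation] = None  # None means the blade is idle
    upcoming: List[Reservation] = field(default_factory=list)


def phase1_shutdown_idle(
    blades: List[Blade], relocate: Callable[[Reservation], None]
) -> List[str]:
    """Phase 1: relocate upcoming reservations, then power off idle blades."""
    shut_down = []
    for blade in blades:
        if not blade.powered_on or blade.current is not None:
            continue  # already off, or in use (left for phase 2)
        for reservation in blade.upcoming:
            relocate(reservation)  # hypothetical: move it to another blade
        blade.upcoming.clear()
        blade.powered_on = False
        shut_down.append(blade.name)
    return shut_down


def phase2_shutdown_in_use(
    blades: List[Blade],
    now: datetime,
    countdown: timedelta,
    extension: timedelta,
    notify: Callable[[str, str], None],
) -> List[Tuple[str, str, datetime]]:
    """Phase 2: warn users and plan the shutdown of in-use blades.

    Returns (blade name, action, shutdown time) tuples. Nothing is powered
    off here; the caller would act on the plan once the countdown expires.
    """
    plan = []
    shutdown_at = now + countdown
    for blade in blades:
        reservation = blade.current
        if not blade.powered_on or reservation is None:
            continue
        notify(reservation.user,
               f"{blade.name} will be shut down at {shutdown_at:%H:%M}")
        if reservation.long_term:
            # Assumption: retain long-term reservations by extending the end time.
            reservation.end += extension
            action = "retain"
        else:
            # Assumption: short reservations are dropped and the blade reclaimed.
            blade.current = None
            action = "reclaim"
        plan.append((blade.name, action, shutdown_at))
    return plan
```

Keeping phase 2 as a plan rather than an immediate shutdown leaves room for the countdown, and for cancelling it if the thermal event clears first.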

If there are any thoughts or other suggestions, please feel free to comment.

Aaron
