Re: Best practises and ideas for power outtake scenarios

Andrija Panic Thu, 03 Jun 2021 14:11:00 -0700

Hi Chris,

Shutting down the ACS VMs and hosts is only one part of the story. The
other part is shutting down the surrounding infrastructure, such as storage
arrays and any other external systems you might have. But let's focus on ACS.


I would write some scripts that do the following:
- Poll the UPS to detect when power goes down, then sleep for a short
time (e.g. 1-2 minutes, in case it is just a hiccup). If power is still
down, you can use bash and cloudmonkey to automate the steps below
(test this properly in your test environment first):

- Disable the zone(s).
- Put all Primary Storages into maintenance mode - this causes all
existing VMs on that storage to be shut down one by one = takes a LONG time.
     ---- So a better approach is to query ACS for the list of running VMs
(save this list in a file somewhere for later!) and send the VM stop
command in batches of 10 or 20 or 50 (depending on the size and speed of
your storage) - don't overload the storage...
     ---- Confirm all your user VMs are stopped.
     ---- Stop all system VMs (SSVM, CPVM, all VRs).
     ---- Confirm there is no running VM.
- At this point there is no VM running at all in your ACS setup.
- Shut down the management server (this way you avoid putting your hosts
into maintenance mode in ACS, which otherwise takes time), then shut down
the MySQL servers (used by the ACS mgmt servers).
- If using VMware, put all hosts into maintenance mode and ask vCenter to
shut down all hosts (you need to work out what to do with vCenter itself,
etc.).
- If other hypervisors are used (KVM, XenServer), do whatever it takes to
shut them down safely/correctly.
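
As a rough illustration of the "wait out a hiccup, then stop VMs in batches"
part of the steps above, here is a minimal sketch in Python (bash would work
just as well; Python is used only for illustration). The names
outage_confirmed, run_in_batches and cmk_stop are hypothetical. How you read
the UPS status depends on your UPS tooling, so it is passed in as a callable.
cmk_stop assumes CloudMonkey is installed and configured as `cmk` and that it
exits non-zero on failure; verify both in your test env.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def outage_confirmed(on_battery: Callable[[], bool], grace_s: float,
                     poll_s: float,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True only if the UPS stays on battery for the whole grace period."""
    waited = 0.0
    while waited < grace_s:
        if not on_battery():
            return False  # power came back: treat it as a hiccup
        sleep(poll_s)
        waited += poll_s
    return on_battery()


def run_in_batches(vm_ids: List[str], batch_size: int,
                   action: Callable[[str], bool]) -> Dict[str, bool]:
    """Apply `action` to every VM, one batch at a time.

    VMs inside a batch are handled in parallel; the next batch only
    starts once the current one has finished, to avoid overloading storage.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results: Dict[str, bool] = {}
    for start in range(0, len(vm_ids), batch_size):
        batch = vm_ids[start:start + batch_size]
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            for vm_id, ok in zip(batch, pool.map(action, batch)):
                results[vm_id] = ok
    return results


def cmk_stop(vm_id: str) -> bool:
    """Stop one VM through CloudMonkey (assumption: `cmk` is on the PATH)."""
    proc = subprocess.run(["cmk", "stop", "virtualmachine", f"id={vm_id}"])
    return proc.returncode == 0
```

A possible use would be to call outage_confirmed(my_ups_check, 120, 10) and,
if it returns True, run_in_batches(running_vm_ids, 20, cmk_stop), then
re-check any VM whose result is False before moving on to the system VMs.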

When power is back, reverse everything: power on the hosts, wait some time,
and confirm that all your hosts are up/connected in your management tools
(e.g. in vCenter, XenCenter, KVM via VirtManager, whatever...).
Start MySQL, then the mgmt server, and enable the zone (SSVM and CPVM will
be started automatically). Then take the list of previously running VMs
(that you saved) and start them one by one, or in batches of 5 or 10 (or
more), depending on your HW performance (try to avoid a "boot storm").
Starting a VM in a network causes its VR to be started automatically, so
you do not need to start the VRs explicitly, but you can handle that
manually as well if you like.
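
To make the "save the list, then start in batches" idea concrete, here is a
minimal Python sketch. The helper names (save_vm_list, load_vm_list,
start_in_batches), the one-ID-per-line file format and the pause between
batches are all assumptions for illustration; start_vm is a callable you
supply (for example a wrapper around your cloudmonkey start command).

```python
import time
from pathlib import Path
from typing import Callable, List


def save_vm_list(path: Path, vm_ids: List[str]) -> None:
    """Write one VM ID per line so the list survives the shutdown."""
    path.write_text("\n".join(vm_ids) + "\n")


def load_vm_list(path: Path) -> List[str]:
    """Read the saved VM IDs back, ignoring blank lines."""
    return [line.strip() for line in path.read_text().splitlines()
            if line.strip()]


def start_in_batches(vm_ids: List[str], batch_size: int,
                     start_vm: Callable[[str], bool], pause_s: float,
                     sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Start VMs batch by batch, pausing between batches.

    Returns the IDs whose start call reported failure, so they can be
    retried or checked by hand.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    failed: List[str] = []
    for i in range(0, len(vm_ids), batch_size):
        for vm_id in vm_ids[i:i + batch_size]:
            if not start_vm(vm_id):
                failed.append(vm_id)
        if i + batch_size < len(vm_ids):
            sleep(pause_s)  # let the storage settle before the next batch
    return failed
```

The batch size and pause are the knobs to tune against your HW; starting
small and increasing after a test run is probably the safer way to find out
where the boot storm begins.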


This is just to give you an idea (I once had to shut down ACS and everything
else, which included SolidFire storage, a Cloudian S3 cluster, some other
storage solutions etc., at my previous company) - it was a really
"interesting" experience...


Best,


On Thu, 3 Jun 2021 at 18:21, vas...@gmx.de <vas...@gmx.de> wrote:

> Hello everyone,
>
> I would like to ask for some ideas / best practices for dealing with
> power outage scenarios involving the CloudStack infrastructure.
> The use case would be a power outage at a datacenter where all
> components of CloudStack (management server, hosts, storage) are
> hosted, and which can't be repaired within a given time.
>
> so the "simple" target process would be something like this:
>
> 1. Power outage detected by UPS
> 2. UPS sends a notification to CS management
> 3. CS management sends information to all VMs as well as hosts
> 4. VMs and hosts perform a graceful shutdown
> 5. Management server performs a graceful shutdown
> 6. Afterwards, shutdown of storage and further components
>
> Are there any included "workflows" or mechanics which can be used out
> of the box? Any real-life best practices how to implement this kind of
> workflow?
>
> with regards,
> Chris
>


-- 

Andrija Panić
