>What we are still wondering about is whether it is fundamentally a
>good idea to limit CloudStack's ability to freely and automatically
>migrate VMs between all cluster nodes. Is setting
>"enable.ha.storage.migration"=false the intended way to handle a setup
>with multiple clusters, or is it a dirty hack to work around
>disadvantages of our setup? In the latter case we would like to know
>that, keep a focus on alternatives, and be ready to improve our setup
>in the mid-term.
The logic around having multiple primary storage options tied to clusters is
really designed to limit failure domains. Ideally you want to spread your
workloads across different failure domains so that if you do lose a primary
storage system, you still have services up and running.
We build redundancy into the cluster and the storage attached to the cluster.
We also run multiple clusters within a pod. If you spread your redundant VMs
across multiple clusters (with their own primary storage), it's easier to
absorb a catastrophic storage failure, as your eggs aren't in one basket.
We turn off HA storage migration, as it doesn't make much sense to us. It
assumes the source storage is still up, and you obviously can't migrate a
VM's volumes off a primary storage that is down. If you have enough hosts in
a cluster, you should never run into a situation where you can't bring all
your VMs back up after a host failure. In that sense, HA storage migration
is a pointless feature if you build and scale your clusters properly.
Just my 2 cents.
>We would be happy to hear about additional advice, experience and
>Thanks a lot,
Heinlein Support GmbH
> From: Melanie Desaive <m.desa...@heinlein-support.de>
> Sent: Wednesday, October 12, 2016 5:50 AM
> To: email@example.com
> Subject: Long downtimes for VMs through automatically triggered storage
> Hi all,
> my colleague and I are having a dispute about when CloudStack should
> automatically trigger storage migrations and what options we have to
> control CloudStack's behavior in terms of storage migrations.
> We are operating a setup with two XenServer clusters which are combined
> into one pod, each cluster with its own independent SRs of type lvmoiscsi.
> Unfortunately we hit a XenServer bug which prevented a few VMs from
> starting on any compute node. Each time this bug appeared, CloudStack
> tried to start the affected VM successively on each node of the current
> cluster and afterwards started a storage migration to the second cluster.
> We are using the UserDispersing deployment planner.
> The decision of the deployment planner to start the storage migration
> was very unfortunate for us. Mainly because:
> * We are operating some VMs with big data volumes, which were
> inaccessible for as long as the storage migration was running.
> * The SR on the destination cluster did not even have the capacity to
> hold all volumes of the big VMs; the migration was triggered anyway.
> We would like some best-practice advice on how others are
> preventing long, unplanned downtimes for VMs with huge data volumes
> caused by automated storage migrations.
> We discussed the topic and came up with the following questions:
> * Is the described behaviour of the deployment planner intentional?
> * Is it possible to exclude a few VMs with huge storage volumes from
> automated storage migration, and what would be the best way to achieve
> this? Could we use storage or host tags for this purpose?
> * Is it possible to globally prevent the deployment planner from
> starting storage migrations?
> * Are there global settings to achieve this?
> * Would we have to adapt the deployment planner?
> * Do we have to rethink our system architecture and avoid huge data
> volumes completely?
> * Was the decision to put two clusters into one pod a bad idea?
> * Are there other solutions to our problem?
> We would greatly appreciate any advice in the issue!
> Best regards,