Re: [openstack-dev] [nova] automatically evacuate instances on compute failure

Chris Friesen Tue, 08 Oct 2013 15:51:55 -0700

On 10/08/2013 03:20 PM, Alex Glikson wrote:

It seems that this can be broken into 3 incremental pieces. First, it would be
great if the ability to schedule a single 'evacuate' were finally
merged
(https://blueprints.launchpad.net/nova/+spec/find-host-and-evacuate-instance).


Agreed.

Then, it would make sense to have the logic that evacuates an entire
host
(https://blueprints.launchpad.net/python-novaclient/+spec/find-and-evacuate-host).
The reasoning behind suggesting that this should not necessarily be in
Nova is, perhaps, that it *can* be implemented outside Nova using the
individual 'evacuate' API.

This actually more or less exists already in the form of the "nova host-evacuate" command. One major issue with it, however, is that it requires the caller to specify whether all the instances are on shared or local storage, so it can't handle a mix of local and shared storage for the instances. If any of them boot off block storage, for instance, you need to move them first and then do the remaining ones as a group.
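For illustration, here is a minimal sketch of a host evacuation done outside Nova, one instance at a time, so that the shared-storage choice is made per instance rather than once for the whole host. The `Instance` type, its `on_shared_storage` flag and the `evacuate` callable (standing in for a single-instance evacuate API call) are all hypothetical; it just mirrors the two-pass workaround above by doing the shared/block-storage instances first.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Instance:
    uuid: str
    # Hypothetical flag: True if the instance's disk survives loss of the
    # host (shared instance storage or boot from block storage).
    on_shared_storage: bool


# Hypothetical stand-in for one per-instance evacuate call:
# evacuate(instance_uuid, on_shared_storage)
EvacuateFn = Callable[[str, bool], None]


def evacuate_host(instances: Iterable[Instance], evacuate: EvacuateFn) -> List[str]:
    """Evacuate every instance of a failed host; return UUIDs that failed."""
    instances = list(instances)  # allow iterating more than once
    # Shared/block-storage instances first, then the local-storage ones.
    ordered = [i for i in instances if i.on_shared_storage] + [
        i for i in instances if not i.on_shared_storage
    ]
    failed = []
    for inst in ordered:
        try:
            evacuate(inst.uuid, inst.on_shared_storage)
        except Exception:
            # Keep going; the caller decides what to do with failures.
            failed.append(inst.uuid)
    return failed
```

Returning the failures instead of stopping at the first one leaves the "what do you do if an evacuate fails?" question to the caller.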

It would be nice to embed the knowledge of whether or not an instance is on shared storage in the instance itself at creation time. I envision specifying this in the config file for the compute manager along with the instance storage location; the compute manager could then set the field in the instance at creation time.
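Roughly, and only as a sketch: the config names below (`instances_path_is_shared` in particular) and the `on_shared_storage` instance field are hypothetical, not existing Nova options or fields. Treating boot-from-volume instances as shared is also an assumption, based on the block-storage case above.

```python
from dataclasses import dataclass


@dataclass
class ComputeConfig:
    # Illustrative compute-manager settings: where instance disks live and
    # whether that location is shared with other compute nodes.
    instances_path: str
    instances_path_is_shared: bool  # hypothetical option


@dataclass
class InstanceRecord:
    uuid: str
    # Hypothetical field, recorded once when the instance is created.
    on_shared_storage: bool


def build_instance(uuid: str, boot_from_volume: bool, conf: ComputeConfig) -> InstanceRecord:
    # Assumption: a boot-from-volume instance survives host loss no matter
    # where this node keeps its local instance files.
    shared = boot_from_volume or conf.instances_path_is_shared
    return InstanceRecord(uuid=uuid, on_shared_storage=shared)
```

Anything evacuating later could then read the flag off each instance instead of asking the caller.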

Finally, it should be possible to close the
loop and invoke the evacuation automatically as a result of failure
detection (not clear how exactly this would work, though). Hopefully we
will have at least the first part merged soon (not sure if anyone is
actively working on a rebase).

My interpretation of the discussion so far is that the nova maintainers would prefer this to be driven by an outside orchestration daemon.

Currently the only way a service is recognized as "down" is if someone calls is_up() and it notices that the service hasn't sent an update in the last minute. Nothing in nova actively scans for compute node failures, which is where the outside daemon comes in.
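To make the passive vs. active distinction concrete, here is a rough sketch. The `is_up` below is a simplified stand-in for the real check (not nova's actual signature). `scan_for_failures` and `monitor_once` are hypothetical pieces of an outside daemon that polls heartbeat times and fires a callback once per newly-down host. The 60-second threshold is the one-minute window described above.

```python
import time
from typing import Callable, Dict, List, Optional, Set

# Matches the "no update in the last minute" rule described above.
SERVICE_DOWN_TIME = 60.0


def is_up(last_heartbeat: float, now: float, down_time: float = SERVICE_DOWN_TIME) -> bool:
    """Passive check: only answers when someone asks about one service."""
    return now - last_heartbeat <= down_time


def scan_for_failures(heartbeats: Dict[str, float], now: Optional[float] = None) -> List[str]:
    """Active scan over all compute hosts; returns the down ones, sorted."""
    if now is None:
        now = time.time()
    return sorted(host for host, ts in heartbeats.items() if not is_up(ts, now))


def monitor_once(
    heartbeats: Dict[str, float],
    handled: Set[str],
    on_down: Callable[[str], None],
    now: Optional[float] = None,
) -> None:
    """One polling pass of a hypothetical external daemon."""
    for host in scan_for_failures(heartbeats, now):
        if host not in handled:  # trigger each failure only once
            handled.add(host)
            on_down(host)  # e.g. kick off a host evacuation
```

A real daemon would call `monitor_once` in a loop. It would also need to decide when a host that recovers can be removed from `handled` again.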

Also, there is some complexity involved in dealing with auto-evacuate: What do you do if an evacuate fails? How do you recover intelligently if there is no admin involved?


Chris

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
