On 10/06/2015 04:34 PM, Matthew Booth wrote:
> Hi, Roman,
>
> Evacuate has been on my radar for a while and this post has prodded me
> to take a look at the code. I think it's worth starting by explaining
> the problems in the current solution. Nova client is currently
> responsible for doing this evacuate. It does:
>
</snipping a lot of reasonable text>

>
> I believe we can solve this problem, but I think that without fixing
> single-instance evacuate we're just pushing the problem around (or
> creating new places for it to live). I would base the robustness of my
> implementation on a single principle:
>
> An instance has a single owner, which is exclusively responsible for
> rebuilding it.
>
> In outline, I would redefine the evacuate process to do:
>
> API:
> 1. Call the scheduler to get a destination for the evacuate if none was
> given.
> 2. Atomically update instance.host to this destination, and task state
> to rebuilding.
>

We can't do this because of resource tracking: the host switch has to
be done after the claim, which can happen only on the target compute;
otherwise we don't track the resources properly (*).

That does not invalidate your more general point, which is that we need
a way to make sure that started evacuations can be picked up and resumed
in case of any failures along the way (even a failure of the target host
during the rebuild).

dansmith did some work on this [1], and I later built upon it [2]. I
think our assumption was that we would use the migration record for
this, which _I think_ gives us all the stuff you talk about further
below, apart of course from the need for an external task to actually
see the evacuation through to the end. I think this is in line with most
HA design proposals, where we make sure our control plane is redundant
while we really don't care about individual compute nodes (apart from
the instances they host).

I am also not sure that leaving the actual building of the instance up
to a periodic task is a good choice if we want to minimize downtime,
which seems to me to be the point of the instance HA proposals.

N.
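To make the ordering concrete, here is a minimal Python sketch. It is
not Nova code: `Instance`, `Migration`, `ComputeHost` and
`evacuate_on_target` are hypothetical names, and the status values are
made up for illustration. It assumes the target compute claims resources
first, switches instance.host only afterwards, and records progress on a
migration-like record so that an interrupted evacuation can be resumed
without claiming twice.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    uuid: str
    host: str
    vcpus: int


@dataclass
class Migration:
    # Hypothetical stand-in for a migration record.
    instance_uuid: str
    source: str
    dest: str
    status: str = "accepted"  # accepted -> claimed -> done (made-up states)


@dataclass
class ComputeHost:
    name: str
    free_vcpus: int

    def claim(self, instance: Instance) -> None:
        """Reserve resources on this host, or raise if they do not fit."""
        if instance.vcpus > self.free_vcpus:
            raise RuntimeError(f"{self.name}: not enough vcpus")
        self.free_vcpus -= instance.vcpus


def evacuate_on_target(instance: Instance, migration: Migration,
                       target: ComputeHost) -> None:
    """Run, or resume, an evacuation on the target compute."""
    if migration.status == "accepted":
        # 1. Claim first; the instance still belongs to the source host.
        target.claim(instance)
        migration.status = "claimed"
    if migration.status == "claimed":
        # 2. Only after a successful claim does the owner change.
        instance.host = target.name
        migration.status = "done"
```

An external task that finds a record which is not yet "done" could
simply call `evacuate_on_target` again. A real implementation would have
to persist the record and deal with a crash between the claim and the
status update, which this in-memory sketch does not attempt.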
(*) We could "solve" this by checking instance.task_state, for example,
but IMHO we shouldn't go down that route, as it becomes way more
difficult to reason about resource tracking once you introduce one more
free variable.

[1] https://github.com/openstack/nova/blob/02b7e64b29dd707c637ea7026d337e5cb196f337/nova/compute/api.py#L3303
[2] https://github.com/openstack/nova/blob/02b7e64b29dd707c637ea7026d337e5cb196f337/nova/compute/manager.py#L2702

> Compute:
> 3. Rebuild the instance.
>
> This would be supported by a periodic task on the compute host which
> looks for rebuilding instances assigned to this host which aren't
> currently rebuilding, and kicks off a rebuild for them. This would cover
> the compute going down during a rebuild, or the api going down before
> messaging the compute.
>
> Implementing this gives us several things:
>
> 1. The list instances, evacuate all instances process becomes
> idempotent, because as soon as the evacuate is initiated, the instance
> is removed from the source host.
> 2. We get automatic recovery of failure of the target compute. Because
> we atomically moved the instance to the target compute immediately, if
> the target compute also has to be evacuated, our instance won't fall
> through the gap.
> 3. We don't need an additional place for the code to run, because it
> will run on the compute. All the work has to be done by the compute
> anyway. By farming the evacuates out directly and immediately to the
> target compute we reduce both overhead and complexity.
>
> The coordination becomes very simple. If we've run the nova client
> evacuation anywhere at least once, the actual evacuations are now
> Somebody Else's Problem (to quote h2g2), and will complete eventually. As
> evacuation in any case involves a forced change of owner it requires
> fencing of the source and implies an external agent such as pacemaker.
> The nova client evacuation can run in pacemaker.
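
For illustration, the quoted proposal (an atomic handoff plus a periodic
task on the new owner) could be sketched roughly as below. This is plain
Python with an in-memory store, not Nova code: `InstanceStore`,
`handoff` and `periodic_rebuild` are hypothetical names, a lock stands
in for an atomic database update, and the sketch deliberately ignores
resource claims.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass
class Instance:
    uuid: str
    host: str
    task_state: Optional[str] = None


class InstanceStore:
    """In-memory stand-in for the database (hypothetical)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rows = {}

    def add(self, instance: Instance) -> None:
        with self._lock:
            self._rows[instance.uuid] = instance

    def handoff(self, uuid: str, expected_host: str, new_host: str) -> bool:
        """Move the instance only if it is still owned by expected_host."""
        with self._lock:
            inst = self._rows[uuid]
            if inst.host != expected_host:
                return False  # already handed off; a repeated call is a no-op
            inst.host = new_host
            inst.task_state = "rebuilding"
            return True

    def rebuilding_on(self, host: str) -> List[Instance]:
        with self._lock:
            return [i for i in self._rows.values()
                    if i.host == host and i.task_state == "rebuilding"]

    def finish(self, uuid: str) -> None:
        with self._lock:
            self._rows[uuid].task_state = None


def periodic_rebuild(store: InstanceStore, host: str, in_progress: Set[str],
                     rebuild: Callable[[Instance], None]) -> List[str]:
    """One pass of the periodic task; returns the uuids rebuilt in this pass."""
    rebuilt = []
    for inst in store.rebuilding_on(host):
        if inst.uuid in in_progress:
            continue  # a rebuild for this instance is already running
        in_progress.add(inst.uuid)
        try:
            rebuild(inst)
            store.finish(inst.uuid)
            rebuilt.append(inst.uuid)
        finally:
            in_progress.discard(inst.uuid)
    return rebuilt
```

Calling `handoff` a second time with the old source host returns False,
which is the idempotence property from point 1, and a rebuild that never
started or never finished stays in the "rebuilding" state until a later
pass of `periodic_rebuild` picks it up.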
>
> Matt
>
> On Fri, Oct 2, 2015 at 2:05 PM, Roman Dobosz <[email protected]
> <mailto:[email protected]>> wrote:
>
>     Hi all,
>
>     The case of automatic evacuation (or resurrection currently) is a topic
>     which surfaces once in a while, but it isn't yet fully supported by
>     OpenStack and/or by the cluster services. There were some attempts to
>     bring the feature into OpenStack; however, it turns out it cannot be
>     easily integrated. On the other hand, evacuation may be executed
>     from the outside using Nova client or Nova API calls for evacuation
>     initiation.
>
>     I did some research regarding the ways it could be designed, based
>     on Russell Bryant's blog post[1] as a starting point. Apart from it, I've
>     also taken high availability and reliability into consideration when
>     designing the solution.
>
>     Together with a coworker, we did a first PoC[2] to enable the cluster
>     to perform evacuation. The idea behind that PoC was simple: providing
>     an additional, small service which would trigger and supervise the
>     evacuation process, which would be triggered from the outside (in this
>     example we were using the Pacemaker fencing facility, but it might be
>     anything) using RabbitMQ directly. Those services are running on the
>     control plane in AA fashion.
>
>     That worked well for us. So we started exploring other possibilities
>     like oslo.messaging, just to use it in the same manner as we did in
>     the PoC. It turns out that the implementation will not be as easy,
>     because there is no facility in oslo.messaging for sending an ACK from
>     the client after the job is done (rather than as soon as it gets the
>     message). We also looked at the existing OpenStack projects for a
>     candidate which provides a service for managing long-running tasks.
>
>     There is the Mistral project, which gives us almost all the features we
>     need. The one missing feature is the HA of the Mistral tasks execution.
>
>     The question is, how could such a problem (long-running tasks) be
>     resolved in OpenStack?
>
>     [1]
>     http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/
>     [2] https://github.com/dawiddeja/evacuationd
>
>     --
>     Cheers,
>     Roman Dobosz
>
>     __________________________________________________________________________
>     OpenStack Development Mailing List (not for usage questions)
>     Unsubscribe:
>     [email protected]?subject:unsubscribe
>     <http://[email protected]?subject:unsubscribe>
>     http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
