Re: [openstack-dev] [nova][mistral] Automatic evacuation as a long running task

2015-10-12 Thread Nikola Đipanov
On 10/06/2015 04:34 PM, Matthew Booth wrote:
> Hi, Roman,
> 
> Evacuated has been on my radar for a while and this post has prodded me
> to take a look at the code. I think it's worth starting by explaining
> the problems in the current solution. Nova client is currently
> responsible for doing the host evacuation. It does:
>
> 
> I believe we can solve this problem, but I think that without fixing
> single-instance evacuate we're just pushing the problem around (or
> creating new places for it to live). I would base the robustness of my
> implementation on a single principle:
> 
>   An instance has a single owner, which is exclusively responsible for
> rebuilding it.
> 
> In outline, I would redefine the evacuate process to do:
> 
> API:
> 1. Call the scheduler to get a destination for the evacuate if none was
> given.
> 2. Atomically update instance.host to this destination, and task state
> to rebuilding.
> 

We can't do this because of resource tracking: the host switch has to
be done after the claim, which can only happen on the target compute;
otherwise we don't track the resources properly (*).
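A toy model of the ordering described above, assuming a single RAM counter per node. `ComputeNode`, `claim` and `rebuild_on_target` are hypothetical names, not Nova's resource tracker. It shows one consequence of claiming on the target before `instance.host` changes: a failed claim leaves both ownership and accounting untouched.

```python
from dataclasses import dataclass


@dataclass
class ComputeNode:
    name: str
    free_ram_mb: int

    def claim(self, ram_mb: int) -> bool:
        """Toy claim: reserve RAM on this node if enough is free."""
        if ram_mb > self.free_ram_mb:
            return False
        self.free_ram_mb -= ram_mb
        return True


@dataclass
class Instance:
    uuid: str
    host: str
    ram_mb: int


def rebuild_on_target(inst: Instance, target: ComputeNode) -> bool:
    """Runs on the target compute: claim first, switch ownership second."""
    if not target.claim(inst.ram_mb):
        # Claim failed: instance.host is untouched, so accounting stays consistent.
        return False
    inst.host = target.name  # ownership moves only once the resources are claimed
    # ... the actual rebuild would start here ...
    return True


small = ComputeNode("compute-2", free_ram_mb=2048)
big_vm = Instance("vm-1", host="compute-1", ram_mb=4096)
ok = rebuild_on_target(big_vm, small)  # False: vm-1 still belongs to compute-1
```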

That does not invalidate your more general point, which is that we need a
way to make sure that started evacuations can be picked up and resumed
in case of any failure along the way (even a failure of the target host
during the rebuild itself).

There is some work that dansmith did [1], and I later built upon some of
it [2]. I think our assumption was that we would use the migration record
for this, which _I think_ gives us everything you talk about further
below, apart, of course, from the need for an external task to
actually see the evacuation through to the end. I think this is in line
with most HA design proposals, where we make sure our control plane is
redundant while we don't really care about individual compute nodes
(apart from the instances they host).

I am also not sure that leaving the actual building of the instance up
to a periodic task is a good choice if we want to minimize downtime,
which seems to me to be the point of the instance HA proposals.

N.

(*) We could "solve" this by checking instance.task_state, for example, but
IMHO we shouldn't go down that route, as it becomes much more difficult to
reason about resource tracking once you introduce one more free variable.

[1] https://github.com/openstack/nova/blob/02b7e64b29dd707c637ea7026d337e5cb196f337/nova/compute/api.py#L3303
[2] https://github.com/openstack/nova/blob/02b7e64b29dd707c637ea7026d337e5cb196f337/nova/compute/manager.py#L2702

> Compute:
> 3. Rebuild the instance.
> 
> This would be supported by a periodic task on the compute host which
> looks for instances assigned to this host that are in the rebuilding
> task state but aren't actually being rebuilt, and kicks off a rebuild
> for them. This would cover the compute going down during a rebuild, or
> the API going down before messaging the compute.
> 
> Implementing this gives us several things:
> 
> 1. The 'list instances, evacuate all instances' process becomes
> idempotent, because as soon as the evacuate is initiated, the instance
> is removed from the source host.
> 2. We get automatic recovery of failure of the target compute. Because
> we atomically moved the instance to the target compute immediately, if
> the target compute also has to be evacuated, our instance won't fall
> through the gap.
> 3. We don't need an additional place for the code to run, because it
> will run on the compute. All the work has to be done by the compute
> anyway. By farming the evacuates out directly and immediately to the
> target compute we reduce both overhead and complexity.
> 
> The coordination becomes very simple. If we've run the nova client
> evacuation anywhere at least once, the actual evacuations are now
> Somebody Else's Problem (to quote h2g2), and will complete eventually. As
> evacuation in any case involves a forced change of owner, it requires
> fencing of the source and implies an external agent such as Pacemaker.
> The nova client evacuation can run in Pacemaker.
> 
> Matt

Re: [openstack-dev] [nova][mistral] Automatic evacuation as a long running task

2015-10-08 Thread Deja, Dawid
Hi Matthew,

Thanks for shedding some light on the problems Nova has with evacuating an
instance. It is very important to keep those limitations in mind when
preparing the final solution, or to fix them, as you proposed.

Nevertheless, I would say that evacuationD does more than calling 'nova
host-evacuate' does. Consider the following scenario:

1. Call 'nova host-evacuate HostX'
2. The caller dies during the call: the information that some VMs still
need to be evacuated is lost.

This would not happen with evacuationD, because it prepares one RabbitMQ
message for each VM that needs to be evacuated. Moreover, it deals with the
situation where the process that lists the VMs crashes: in that case, the
whole operation is continued by another daemon.

EvacD may also handle another problem that you mentioned: failure of the
target host of an evacuation. In that scenario, an 'evacuate host' message
will be sent for that target host, and EvacD will try to evacuate all of its
VMs, even those in the rebuild state. Of course, evacuation of such instances
fails, but they would eventually enter the error state and evacuationD would
start the resurrection process. This can be sped up by setting the instances'
state to 'error' (except for those which are in the 'active' state) at the
beginning of the whole 'evacuate host' process.

Finally, another action, called 'Look for VM', could be added. It would check
whether a given VM ended up in the active state on its new host; if not, the
VM could be rebuilt. I hope this would give us as much certainty as possible
that the VM is alive.
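A rough sketch of what the 'Look for VM' action could do. `look_for_vm`, `get_status` and `rebuild` are hypothetical names (the callbacks might wrap Nova client calls in practice); `ACTIVE`, `REBUILD` and `ERROR` are Nova server statuses, while the timeout and polling interval are arbitrary assumptions.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical callback types: uuid -> (status, host), and uuid -> None.
GetStatus = Callable[[str], Tuple[str, Optional[str]]]
Rebuild = Callable[[str], None]


def look_for_vm(get_status: GetStatus, rebuild: Rebuild, uuid: str,
                source_host: str, timeout: float = 300.0,
                interval: float = 5.0) -> bool:
    """Wait until `uuid` is ACTIVE on a host other than `source_host`.

    Returns True on success; otherwise requests a rebuild and returns False.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status, host = get_status(uuid)
        if status == "ACTIVE" and host != source_host:
            return True
        if status == "ERROR":
            break  # no point waiting for an instance that has already failed
        time.sleep(interval)
    rebuild(uuid)
    return False


# Usage with canned statuses: the VM is rebuilding, then comes up on compute-2.
statuses = iter([("REBUILD", "compute-2"), ("ACTIVE", "compute-2")])
rebuilt = []
ok = look_for_vm(lambda uuid: next(statuses), rebuilt.append,
                 "vm-1", source_host="compute-1", interval=0)
```

Polling with a deadline keeps the check bounded; an instance that never becomes active is handed back for another rebuild rather than waited on forever.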

Dawid


Re: [openstack-dev] [nova][mistral] Automatic evacuation as a long running task

2015-10-06 Thread Steve Gordon
- Original Message -
> From: "Roman Dobosz" 
> To: "OpenStack Development Mailing List" 
> 
> Hi all,
> 
> The case of automatic evacuation (or resurrection currently) is a topic
> which surfaces once in a while, but it isn't yet fully supported by
> OpenStack and/or by the cluster services. There have been some attempts
> to bring the feature into OpenStack; however, it turns out it cannot
> easily be integrated. On the other hand, evacuation may be executed
> from the outside, using Nova client or Nova API calls to initiate
> evacuation.
> 
> I did some research into how it could be designed, based
> on Russell Bryant's blog post[1] as a starting point. Apart from that,
> I've also taken high availability and reliability into consideration
> when designing the solution.
> 
> Together with a coworker, we built a first PoC[2] enabling the cluster
> to perform evacuation. The idea behind that PoC was simple: provide an
> additional, small service which triggers and supervises the
> evacuation process, and which is itself triggered from the outside (in
> this example we used the Pacemaker fencing facility, but it might be
> anything) using RabbitMQ directly. Those services run on the
> control plane in active/active (AA) fashion.

Hi Roman,

Another aspect of this, which we discussed briefly a few weeks back, was
whether external HA solutions like the one proposed by Russell should be
"opt-in" on a per-instance basis via an image property or flavor extra
specification. That is, the external instance high-availability solution
would only automatically move virtual machines that had this attribute
associated with them, whatever it ends up being.
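As an illustration of the opt-in idea, an external HA tool might filter instances like this before evacuating them. The key `ha:evacuate`, the `Server` type and the precedence rule (either source opting in is enough) are assumptions for this sketch; the thread leaves the actual attribute undecided.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical key: no name has been standardized.
HA_KEY = "ha:evacuate"


@dataclass
class Server:
    uuid: str
    flavor_extra_specs: Dict[str, str] = field(default_factory=dict)
    image_properties: Dict[str, str] = field(default_factory=dict)


def wants_ha(server: Server) -> bool:
    """Opted in if either the flavor extra spec or the image property is 'true'."""
    return any(
        source.get(HA_KEY, "").lower() == "true"
        for source in (server.flavor_extra_specs, server.image_properties)
    )


def select_for_evacuation(servers: List[Server]) -> List[str]:
    """Only opted-in instances are moved by the external HA tool."""
    return [s.uuid for s in servers if wants_ha(s)]


servers = [
    Server("vm-1", flavor_extra_specs={HA_KEY: "true"}),
    Server("vm-2"),
    Server("vm-3", image_properties={HA_KEY: "True"}),
]
selected = select_for_evacuation(servers)  # vm-1 and vm-3
```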

I'm wondering if there is any appetite in the community for standardizing
what this literal property or extra specification would be, even though the
HA solutions themselves are not delivered by Nova but rather by
deployers/distributors using external tools like Pacemaker?

Thanks,

Steve

> That worked well for us. So we started exploring other possibilities,
> like oslo.messaging, just to use it in the same manner as we did in the
> PoC. It turns out that the implementation will not be as easy, because
> there is no facility in oslo.messaging for sending an ACK from the
> client after the job is done (rather than as soon as it gets the
> message). We also looked at existing OpenStack projects for a candidate
> which provides a service for managing long-running tasks.
> 
> There is the Mistral project, which gives us almost all the features we
> need. The one missing feature is HA of Mistral task execution.
> 
> The question is: how could such a problem (long-running tasks) be solved
> in OpenStack?
> 
> [1] http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/
> [2] https://github.com/dawiddeja/evacuationd

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][mistral] Automatic evacuation as a long running task

2015-10-06 Thread Matthew Booth
Hi, Roman,

Evacuated has been on my radar for a while and this post has prodded me to
take a look at the code. I think it's worth starting by explaining the
problems in the current solution. Nova client is currently responsible for
doing the host evacuation. It does:

1. List all instances on the source host
2. Initiate evacuate for each instance

Evacuating a single instance does:

API:
1. Set instance task state to rebuilding
2. Create a migration record with source and dest if specified

Conductor:
3. Call the scheduler to get a destination host if not specified
4. Get the migration object from the db

Compute:
5. Rebuild the instance on dest
6. Update instance.host to dest

Examining single-instance evacuation, the first obvious thing to look at is
what happens if two evacuations happen simultaneously. Because step 1 is
atomic, it should not be possible to initiate two simultaneous evacuations
of a single instance. However, note that this atomic action hasn't updated
the instance host, meaning the source host remains the owner of this
instance. If the evacuation process fails to complete, the source host will
automatically delete the instance if it comes back up, because it will find
a migration record, but it will not be rebuilt anywhere else. Evacuating it
again will fail, because its task state is already rebuilding.
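The stuck state can be reproduced with a toy model of steps 1 and 2. `InstanceStore`, `set_task_state_if` and `api_evacuate` are hypothetical stand-ins for the DB and API, not Nova code; the lock models the atomic update of step 1.

```python
import threading
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instance:
    uuid: str
    host: str
    task_state: Optional[str] = None


class InstanceStore:
    """Hypothetical in-memory stand-in for the Nova DB (not Nova code)."""

    def __init__(self, instances: List[Instance]):
        self._lock = threading.Lock()
        self._by_uuid = {i.uuid: i for i in instances}
        self.migrations: List[Tuple[str, str, Optional[str]]] = []

    def get(self, uuid: str) -> Instance:
        return self._by_uuid[uuid]

    def set_task_state_if(self, uuid: str, expected: Optional[str], new: str) -> bool:
        """Atomic compare-and-set of task_state (models step 1)."""
        with self._lock:
            inst = self._by_uuid[uuid]
            if inst.task_state != expected:
                return False
            inst.task_state = new
            return True


class EvacuateRejected(Exception):
    pass


def api_evacuate(store: InstanceStore, uuid: str, dest: Optional[str] = None) -> None:
    # Step 1: the only atomic guard; note that instance.host is NOT changed.
    if not store.set_task_state_if(uuid, None, "rebuilding"):
        raise EvacuateRejected(f"{uuid} is already rebuilding")
    # Step 2: record the migration; the source host is still the owner.
    store.migrations.append((uuid, store.get(uuid).host, dest))
    # Steps 3-6 would run later in conductor and compute. If anything dies
    # before step 6, nothing in this model ever retries the rebuild.


store = InstanceStore([Instance("vm-1", host="compute-1")])
api_evacuate(store, "vm-1")  # accepted; assume the conductor then crashes
try:
    api_evacuate(store, "vm-1")  # a retry is refused by the same guard
    retried = True
except EvacuateRejected:
    retried = False
# vm-1 is now owned by compute-1 and stuck in 'rebuilding'.
```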

Also, let's imagine that the conductor crashes. There is not enough state
for any tool, whether internal or external, to know whether the rebuild is
ongoing somewhere or not, and therefore whether it is safe to retry, even
if that retry would succeed, which it wouldn't.

Which is to say that we can't currently robustly evacuate one instance!

Looking at the nova client side, there is an obvious race: there is no
guarantee in step 2 that instances returned in step 1 have not already
been evacuated by another process. We're protected here, though, because
evacuating a single instance twice will fail the second time. Note that the
process isn't idempotent, though, because an evacuation which falls into a
hole will never be retried.

Moving on to what evacuated does. Evacuated uses rabbit to distribute jobs
reliably. There are 2 jobs in evacuated:

1. Evacuate host:
  1.1 Get list of all instances on the source host from Nova
  1.2 Send an evacuate vm job for each instance
2. Evacuate vm:
  2.1 Tell Nova to start evacuating an instance

Because we're using rabbit as a reliable message bus, the initiator of one
of the tasks knows that it will eventually run to completion at least once.
Note that there's nothing to prevent the task being executed more than once
per call, though. A task may crash before sending an ack, or may just be
really slow. However, in both cases, for exactly the same reasons as for
the implementation in nova client, running more than once should not race.
It is still not idempotent, though, again for exactly the same reasons as
nova client.
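The at-least-once behaviour can be sketched with a toy broker, assuming a deque stands in for RabbitMQ and a set stands in for Nova's task-state guard. `nova_evacuate`, `handle` and `run` are hypothetical names, not evacuated's code. A job is acked only after its handler runs, so a worker that dies before acking causes a redelivery.

```python
from collections import deque
from typing import Deque, List, Set, Tuple

Job = Tuple[str, str]  # (job type, argument)

instances_on = {"compute-1": ["vm-1", "vm-2"]}  # hypothetical inventory
evacuating: Set[str] = set()         # stands in for Nova's atomic task_state guard
calls: List[Tuple[str, bool]] = []   # (uuid, accepted) for each evacuate call


def nova_evacuate(uuid: str) -> None:
    """Hypothetical stand-in for nova.evacuate(): a repeat call is rejected."""
    accepted = uuid not in evacuating
    evacuating.add(uuid)
    calls.append((uuid, accepted))


def handle(job: Job, queue: Deque[Job]) -> None:
    kind, arg = job
    if kind == "evacuate_host":
        # 1.1 list instances on the host, 1.2 publish one evacuate_vm job each
        for uuid in instances_on[arg]:
            queue.append(("evacuate_vm", uuid))
    else:
        # 2.1 ask Nova to start evacuating one instance
        nova_evacuate(arg)


def run(queue: Deque[Job], crash_before_ack: Set[Job]) -> None:
    """Ack (popleft) only after the handler has run.

    A job in `crash_before_ack` has its first run die before the ack, so the
    broker redelivers it later (modelled by rotating it to the back).
    """
    while queue:
        job = queue[0]
        handle(job, queue)
        if job in crash_before_ack:
            crash_before_ack.discard(job)
            queue.rotate(-1)
            continue
        queue.popleft()


run(deque([("evacuate_host", "compute-1")]), {("evacuate_vm", "vm-1")})
print(calls)  # [('vm-1', True), ('vm-2', True), ('vm-1', False)]
```

The duplicate call for vm-1 is rejected instead of racing, but `calls` only records that evacuations were started; nothing here confirms that either instance was actually rebuilt.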

Also notice that, exactly as in the nova client implementation, we are not
asserting that an instance has been evacuated. We are only asserting that
we called nova.evacuate, which is to say that we got as far as step 2 in
the evacuation sequence above.

In other words, in terms of robustness, calling evacuated's evacuate host
is identical to asserting that nova client's evacuate host ran to
completion at least once, which is quite a lot simpler to do. That's still
not very robust, though: we don't recover from failures, and we don't
ensure that an instance is evacuated, only that we started an attempt to
evacuate at least once. I'm obviously not satisfied with nova client;
however, as its implementation is simpler, I would favour it over evacuated.

I believe we can solve this problem, but I think that without fixing
single-instance evacuate we're just pushing the problem around (or creating
new places for it to live). I would base the robustness of my
implementation on a single principle:

  An instance has a single owner, which is exclusively responsible for
rebuilding it.

In outline, I would redefine the evacuate process to do:

API:
1. Call the scheduler to get a destination for the evacuate if none was
given.
2. Atomically update instance.host to this destination, and task state to
rebuilding.

Compute:
3. Rebuild the instance.
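A minimal model of the proposed API behaviour, assuming an in-memory store where a lock plays the role of a conditional DB update. `InstanceStore`, `move_owner` and `evacuate_host` are hypothetical names, `pick_dest` stands in for the scheduler, and resource claims are ignored. It shows why rerunning 'list, then evacuate each' becomes harmless once ownership moves atomically.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Instance:
    uuid: str
    host: str
    task_state: Optional[str] = None


class InstanceStore:
    """Hypothetical in-memory store; the lock models a conditional DB update."""

    def __init__(self, instances: List[Instance]):
        self._lock = threading.Lock()
        self._by_uuid = {i.uuid: i for i in instances}

    def get(self, uuid: str) -> Instance:
        return self._by_uuid[uuid]

    def on_host(self, host: str) -> List[str]:
        return [i.uuid for i in self._by_uuid.values() if i.host == host]

    def move_owner(self, uuid: str, source: str, dest: str) -> bool:
        """Atomically set host=dest, task_state='rebuilding' if `source` still owns it."""
        with self._lock:
            inst = self._by_uuid[uuid]
            if inst.host != source:
                return False  # someone else already took ownership
            inst.host, inst.task_state = dest, "rebuilding"
            return True


def evacuate_host(store: InstanceStore, source: str,
                  pick_dest: Callable[[str], str]) -> List[str]:
    """List, then evacuate each; moved instances drop out of later listings."""
    return [uuid for uuid in store.on_host(source)
            if store.move_owner(uuid, source, pick_dest(uuid))]


store = InstanceStore([Instance("vm-1", "compute-1"), Instance("vm-2", "compute-1"),
                       Instance("vm-3", "compute-2")])
first = evacuate_host(store, "compute-1", lambda uuid: "compute-3")
again = evacuate_host(store, "compute-1", lambda uuid: "compute-3")  # no-op rerun
# If compute-3 also fails, its listing already includes vm-1 and vm-2.
chained = evacuate_host(store, "compute-3", lambda uuid: "compute-4")
```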

This would be supported by a periodic task on the compute host which looks
for instances assigned to this host that are in the rebuilding task state
but aren't actually being rebuilt, and kicks off a rebuild for them. This
would cover the compute going down during a rebuild, or the API going down
before messaging the compute.
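One pass of such a periodic task might look like the sketch below. `recover_rebuilds`, `in_progress` and `start_rebuild` are hypothetical names; the sketch assumes the compute keeps an in-memory set of rebuilds it is currently running, which is naturally empty after a restart.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Set


@dataclass
class Instance:
    uuid: str
    host: str
    task_state: Optional[str] = None


def recover_rebuilds(host: str, instances: Iterable[Instance], in_progress: Set[str],
                     start_rebuild: Callable[[Instance], None]) -> List[str]:
    """One pass of a hypothetical periodic task running on compute `host`.

    Restarts rebuilds for instances this host owns that are 'rebuilding' but
    have no rebuild running locally (e.g. after this compute restarted, or the
    API died before messaging it).
    """
    started = []
    for inst in instances:
        if (inst.host == host and inst.task_state == "rebuilding"
                and inst.uuid not in in_progress):
            in_progress.add(inst.uuid)
            start_rebuild(inst)
            started.append(inst.uuid)
    return started


instances = [Instance("vm-1", "compute-3", "rebuilding"),
             Instance("vm-2", "compute-3", "rebuilding"),
             Instance("vm-3", "compute-3"),
             Instance("vm-4", "compute-4", "rebuilding")]
running = {"vm-2"}  # vm-2's rebuild is already underway on this compute
rebuilt: List[str] = []
first_pass = recover_rebuilds("compute-3", instances, running,
                              lambda inst: rebuilt.append(inst.uuid))
second_pass = recover_rebuilds("compute-3", instances, running,
                               lambda inst: rebuilt.append(inst.uuid))
```

Only vm-1 is picked up: vm-2 is already running, vm-3 is not rebuilding, and vm-4 belongs to another compute. The second pass starts nothing.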

Implementing this gives us several things:

1. The 'list instances, evacuate all instances' process becomes idempotent,
because as soon as the evacuate is initiated, the instance is removed from
the source host.
2. We get automatic recovery of failure of the target compute. Because we
atomically moved the instance to the target compute immediately, if the
target compute also has to be evacuated, our instance won't fall through
the 

Re: [openstack-dev] [nova][mistral] Automatic evacuation as a long running task

2015-10-06 Thread Renat Akhmerov
Roman,

Here are some things that may help you:
In Mistral we’ve been aware of this post-message-processing ACK problem since
we began to use oslo.messaging, and we’ve been communicating with the oslo
team in order to fix it. Patch [1] is supposed to help us finally solve it. I
would encourage you to participate in that effort too, to make sure it matches
your understanding of the problem. We’ve also seen the bug [2] that you filed
on Launchpad, so we’ll be updating its status.

As for Mistral HA, I would say the following: it is actually supported by
design, but there are a number of issues with its implementation. It’s not
strictly HA information but, FYI, there are existing Mistral installations
running in production with multiple Mistral engines, executors and API
servers, although I have to admit that it’s not yet easy to make such
installations work reliably. We keep working on it, and we have big plans for
making Mistral HA in the Mitaka cycle. A significant part of the design
sessions in Tokyo will be about exactly HA, which includes a lot of things:
proper testing, profiling, identifying points of failure and overall
performance improvement (which is also one of the things influencing overall
robustness).

As for the task you’re trying to solve, IMO Mistral is a good candidate for
it precisely because it’s a standalone, reliable service that can take the
execution of a long process under its control. This is one of the main ideas
behind it. We are currently planning to address similar cases with Mistral
within our company, and I think we’ll share the results once we get something
done and described.

Thanks for bringing this up. And I'll say what I usually do: you’re very
welcome to contribute to Mistral; it should be fun to do.

Looking forward to hearing more from you about your discoveries.

[1] https://review.openstack.org/#/c/229186/ 

[2] https://bugs.launchpad.net/mistral/+bug/1502120 


Renat Akhmerov
@ Mirantis Inc.





[openstack-dev] [nova][mistral] Automatic evacuation as a long running task

2015-10-02 Thread Roman Dobosz
Hi all,

The case of automatic evacuation (or resurrection currently) is a topic
which surfaces once in a while, but it isn't yet fully supported by
OpenStack and/or by the cluster services. There have been some attempts to
bring the feature into OpenStack; however, it turns out it cannot easily
be integrated. On the other hand, evacuation may be executed from the
outside, using Nova client or Nova API calls to initiate evacuation.

I did some research into how it could be designed, based on Russell
Bryant's blog post[1] as a starting point. Apart from that, I've also
taken high availability and reliability into consideration when
designing the solution.

Together with a coworker, we built a first PoC[2] enabling the cluster to
perform evacuation. The idea behind that PoC was simple: provide an
additional, small service which triggers and supervises the evacuation
process, and which is itself triggered from the outside (in this example
we used the Pacemaker fencing facility, but it might be anything) using
RabbitMQ directly. Those services run on the control plane in
active/active (AA) fashion.

That worked well for us. So we started exploring other possibilities, like
oslo.messaging, just to use it in the same manner as we did in the PoC.
It turns out that the implementation will not be as easy, because there
is no facility in oslo.messaging for sending an ACK from the client after
the job is done (rather than as soon as it gets the message). We also
looked at existing OpenStack projects for a candidate which provides a
service for managing long-running tasks.
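The difference between acking on receipt and acking after the job is done can be shown with a toy in-memory broker. `ToyBroker`, `consume_one` and `WorkerCrashed` are hypothetical names; this is not oslo.messaging's API. The only broker behaviour assumed is that unacked deliveries are requeued when their consumer dies, as RabbitMQ does when a consumer's channel closes.

```python
from collections import deque
from typing import Callable, Deque, List


class WorkerCrashed(Exception):
    """Raised by a job to simulate the worker process dying mid-job."""


class ToyBroker:
    """Hypothetical broker: unacked messages are requeued if the consumer dies."""

    def __init__(self, messages: List[str]):
        self.queue: Deque[str] = deque(messages)
        self.unacked: List[str] = []

    def deliver(self) -> str:
        msg = self.queue.popleft()
        self.unacked.append(msg)
        return msg

    def ack(self, msg: str) -> None:
        self.unacked.remove(msg)

    def consumer_died(self) -> None:
        while self.unacked:
            self.queue.appendleft(self.unacked.pop())


def consume_one(broker: ToyBroker, work: Callable[[str], None],
                ack_after_done: bool) -> None:
    msg = broker.deliver()
    if not ack_after_done:
        broker.ack(msg)          # ack on receipt
    try:
        work(msg)
    except WorkerCrashed:
        broker.consumer_died()   # stands in for the broker noticing the dead worker
        return
    if ack_after_done:
        broker.ack(msg)          # ack only once the job is done


def crashing_job(msg: str) -> None:
    raise WorkerCrashed(msg)


early = ToyBroker(["evacuate host compute-1"])
consume_one(early, crashing_job, ack_after_done=False)
late = ToyBroker(["evacuate host compute-1"])
consume_one(late, crashing_job, ack_after_done=True)
print(list(early.queue), list(late.queue))  # [] ['evacuate host compute-1']
```

With ack-on-receipt the job is silently lost when the worker dies; with ack-after-done it stays queued for another worker, at the cost of possibly running twice.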

There is the Mistral project, which gives us almost all the features we
need. The one missing feature is HA of Mistral task execution.

The question is: how could such a problem (long-running tasks) be solved
in OpenStack?

[1] http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/
[2] https://github.com/dawiddeja/evacuationd

-- 
Cheers,
Roman Dobosz
