Re: [openstack-dev] [nova] formally distinguish server "desired state" from "actual state"?

2014-10-02 Thread Chen CH Ji
Not only the ERROR state, but VERIFY_RESIZE might also have this kind of
problem; https://review.openstack.org/#/c/101435/ has more info.
So I guess the server task work might be the right direction for these
problems ...

Best Regards!

Kevin (Chen) Ji 纪 晨

Engineer, zVM Development, CSTL
Notes: Chen CH Ji/China/IBM@IBMCN   Internet: jiche...@cn.ibm.com
Phone: +86-10-82454158
Address: 3/F Ring Building, ZhongGuanCun Software Park, Haidian District,
Beijing 100193, PRC



From:   Chris Friesen 
To: "openstack-dev@lists.openstack.org"

Date:   10/02/2014 03:05 AM
Subject: [openstack-dev] [nova] formally distinguish server "desired state" from "actual state"?




Currently in nova we have the "vm_state", which according to the code
comments is supposed to represent "a VM's current stable (not
transition) state", or "what the customer expect the VM to be".

However, we then added in an ERROR state.  How does this possibly make
sense given the above definition?  Which customer would ever expect the
VM to be in an error state?

Given this, I wonder whether it might make sense to formally distinguish
between the expected/desired state (i.e. the state that the customer
wants the VM to be in), and the actual state (i.e. the state that nova
thinks the VM is in).

This would more easily allow for recovery actions, since if the actual
state changes to ERROR (or similar) we would still have the
expected/desired state available for reference when trying to take
recovery actions.

Thoughts?

Chris

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



Re: [openstack-dev] [nova] formally distinguish server "desired state" from "actual state"?

2014-10-01 Thread Jay Pipes

On 10/01/2014 04:31 PM, Chris Friesen wrote:

On 10/01/2014 01:23 PM, Jay Pipes wrote:

On 10/01/2014 03:07 PM, Chris Friesen wrote:

Currently in nova we have the "vm_state", which according to the code
comments is supposed to represent "a VM's current stable (not
transition) state", or "what the customer expect the VM to be".

However, we then added in an ERROR state.  How does this possibly make
sense given the above definition?


Where do you see that vm_state is intended to be "what the customer
expects the VM to be"?


From nova/compute/vm_states.py:
'vm_state describes a VM's current stable (not transition) state. That
is, if there is no ongoing compute API calls (running tasks), vm_state
should reflect what the customer expect the VM to be.'


Hmm, interesting wording. I wasn't aware of that wiki page and I'm not 
sure about the freshness of it, but I think what the language is saying 
is that if a user isn't actively running an action against the server, 
it should be in the last state a user put it in -- i.e. active, 
terminated, stopped, paused, etc.



Also, from http://wiki.openstack.org/VMState:
'vm_state reflects the stable state based on API calls, matching user
expectation, revised “top-down” within API implementation.'


Yeah, also awkward wording...


Now granted, the wiki also says 'If the task fails and is not possible
to rollback, the vm_state is set to ERROR.'  I don't particularly like
that behaviour, which is why I'd like to see a separate "actual state".


If we had a task-based system, which is what I am advocating for, you 
would have a *task* (action) set to an ERROR state, not the VM itself. 
Which is what I was getting at... the task's history could tell the user 
what failed about the action, but the state of a VM could continue to 
be, for example, ACTIVE (or STOPPED or whatever). In the oscomputevnext 
proposal, I have an ERROR state for virt_state, but you are correct that 
it doesn't make sense to have one there if you have the history of the 
failure of an action in the task item history and ERROR isn't really a 
state of the virtual machine at all, just an operation against one.
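To make that distinction concrete, here is a minimal sketch in which a failed action is recorded on the task while the server keeps its last stable state. Nothing here is nova or oscomputevnext code: the `Task` class, `run_task`, the dict-based server, and the failure message are all made up for illustration, and the sketch assumes the failed action left the server untouched.

```python
class Task:
    """Hypothetical record of one action taken against a server."""

    def __init__(self, action):
        self.action = action
        self.state = "pending"
        self.error = None


def run_task(server, task, operation):
    """Run operation(server); a failure is recorded on the task only."""
    try:
        new_state = operation(server)
    except Exception as exc:
        # The task carries the error; the server state is left alone.
        task.state = "error"
        task.error = str(exc)
    else:
        task.state = "done"
        server["state"] = new_state
    server["tasks"].append(task)
    return task


def failing_resize(server):
    raise RuntimeError("no valid host found")  # made-up failure


server = {"state": "ACTIVE", "tasks": []}
task = run_task(server, Task("resize"), failing_resize)
print(server["state"], task.state, task.error)
```

The server still reports ACTIVE, and the reason for the failure is available from the task kept in its history.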



I don't think this is all that useful. I think what would be more useful
is changing the Nova API to perform actions against an instance using a
POST /servers/{server_id}/tasks call, allow a user to see a history of
what actions were taken against an instance with a call to GET
/servers/{server_id}/tasks and allow a user to see the progress of a
particular task (say, a rebuild) by calling GET /tasks/{task_id}/items.


Yep, I like that idea.  But I think it's orthogonal to the issue of
desired vs actual state.  When you start a task it could change the
"desired" state, and when the task completes the "actual" state should
match the "expected" state.


Not sure it's necessary to have a desired state on the instance (since 
that could be derived from the task history), but I see your point about 
it being orthogonal.



I proposed as much here:

http://docs.oscomputevnext.apiary.io/#servertask


Just curious, where is the equivalent of "evacuate"?


Evacuate is an operator API and IMO is not appropriate to live in the 
same API as the one used by regular users of a compute service.


Put another way: you don't see an evacuate host API in the EC2 API, do 
you? I guarantee there *is* such an API, but it's not in the public 
REST-ish EC2 API that you and I use and may not even be an HTTP API at all.


I talk a little bit more about my opinion on having operator API calls 
mixed into the same compute control API in my notes on GitHub here, if 
you're interested:


https://github.com/jaypipes/openstack-compute-api#operator-api-calls


http://docs.oscomputevnext.apiary.io/#servertaskitem


This would more easily allow for recovery actions, since if the actual
state changes to ERROR (or similar) we would still have the
expected/desired state available for reference when trying to take
recovery actions.


Where would the expected/desired state be stored?  Or is it implicit in
the most recent task attempted for the instance in question?


Right, exactly, it would be implicit in the most recent task attempted.
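As a rough sketch of what "implicit in the most recent task" could look like, the function below walks a task history from newest to oldest and returns the state the latest state-changing task was aiming for. The `TARGET_STATE` mapping, the action names, and the dict layout of a task are assumptions for this example, not part of any existing API.

```python
# Hypothetical mapping from a task's action to the state it aims for.
TARGET_STATE = {"start": "ACTIVE", "stop": "STOPPED", "pause": "PAUSED"}


def desired_state(task_history):
    """Return the target state of the most recent state-changing task.

    task_history is ordered oldest first. Tasks whose action has no target
    state in the mapping (a snapshot, say) are skipped.
    """
    for task in reversed(task_history):
        target = TARGET_STATE.get(task["action"])
        if target is not None:
            return target
    return None


history = [
    {"action": "start", "state": "done"},
    {"action": "stop", "state": "error"},  # the stop attempt failed
]
print(desired_state(history))  # STOPPED
```

Even though the last task failed, the history still says the user wanted the server stopped, which is the information a recovery action would need.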


I think a task-based API and internal system that uses taskflow to
organize related tasks with state machine changes is the best design to
work towards.


I think something like this would certainly be an improvement over what
we have now. That said, I don't see that as mutually exclusive with an
explicit distinction between desired and actual state.  I think having
"nova list" or the dashboard equivalent show both states would be useful.


Sure, fair point.
-jay



Re: [openstack-dev] [nova] formally distinguish server "desired state" from "actual state"?

2014-10-01 Thread Johannes Erdfelt
On Wed, Oct 01, 2014, Chris Friesen  wrote:
> Currently in nova we have the "vm_state", which according to the
> code comments is supposed to represent "a VM's current stable (not
> transition) state", or "what the customer expect the VM to be".
> 
> However, we then added in an ERROR state.  How does this possibly
> make sense given the above definition?  Which customer would ever
> expect the VM to be in an error state?
> 
> Given this, I wonder whether it might make sense to formally
> distinguish between the expected/desired state (i.e. the state that
> the customer wants the VM to be in), and the actual state (i.e. the
> state that nova thinks the VM is in).
> 
> This would more easily allow for recovery actions, since if the
> actual state changes to ERROR (or similar) we would still have the
> expected/desired state available for reference when trying to take
> recovery actions.
> 
> Thoughts?

I'm happy you brought this up because I've had a similar proposal
bouncing around in the back of my head lately.

ERROR is a pet peeve of mine because it doesn't tell you the operational
state of the instance. It may be running or it may not be running. It
also ends up complicating logic quite a bit (we have a very ugly patch
to allow users to revert resizes in ERROR).

Also, in a few places we have to store vm_state off into instance
metadata (key 'old_vm_state') so it can be restored to the correct state
(for things like RESCUED). This is fairly ugly.
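For anyone who hasn't seen that pattern, it looks roughly like the simplified sketch below. This is not the actual nova code: the dict-based instance and the `rescue`/`unrescue` helpers are stand-ins, and only the 'old_vm_state' key comes from the description above.

```python
def rescue(instance):
    """Stash the current vm_state in metadata before overwriting it."""
    instance["metadata"]["old_vm_state"] = instance["vm_state"]
    instance["vm_state"] = "rescued"


def unrescue(instance):
    """Restore the stashed vm_state and drop the bookkeeping key."""
    instance["vm_state"] = instance["metadata"].pop("old_vm_state")


instance = {"vm_state": "stopped", "metadata": {}}
rescue(instance)
print(instance["vm_state"])  # rescued
unrescue(instance)
print(instance["vm_state"])  # stopped
```

The previous state lives outside the state field itself, which is the kind of bookkeeping a separate desired state could make unnecessary.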

I've wanted to sit down and work through all of the different vm_state
transitions and figure out how to make it all less confusing. I just haven't
had the time to do it yet :(

JE




Re: [openstack-dev] [nova] formally distinguish server "desired state" from "actual state"?

2014-10-01 Thread Chris Friesen

On 10/01/2014 01:23 PM, Jay Pipes wrote:

On 10/01/2014 03:07 PM, Chris Friesen wrote:

Currently in nova we have the "vm_state", which according to the code
comments is supposed to represent "a VM's current stable (not
transition) state", or "what the customer expect the VM to be".

However, we then added in an ERROR state.  How does this possibly make
sense given the above definition?


Where do you see that vm_state is intended to be "what the customer
expects the VM to be"?


From nova/compute/vm_states.py:
'vm_state describes a VM's current stable (not transition) state. That 
is, if there is no ongoing compute API calls (running tasks), vm_state 
should reflect what the customer expect the VM to be.'


Also, from http://wiki.openstack.org/VMState:
'vm_state reflects the stable state based on API calls, matching user 
expectation, revised “top-down” within API implementation.'



Now granted, the wiki also says 'If the task fails and is not possible 
to rollback, the vm_state is set to ERROR.'  I don't particularly like 
that behaviour, which is why I'd like to see a separate "actual state".



I don't think this is all that useful. I think what would be more useful
is changing the Nova API to perform actions against an instance using a
POST /servers/{server_id}/tasks call, allow a user to see a history of
what actions were taken against an instance with a call to GET
/servers/{server_id}/tasks and allow a user to see the progress of a
particular task (say, a rebuild) by calling GET /tasks/{task_id}/items.


Yep, I like that idea.  But I think it's orthogonal to the issue of 
desired vs actual state.  When you start a task it could change the 
"desired" state, and when the task completes the "actual" state should 
match the "expected" state.



I proposed as much here:

http://docs.oscomputevnext.apiary.io/#servertask


Just curious, where is the equivalent of "evacuate"?


http://docs.oscomputevnext.apiary.io/#servertaskitem


This would more easily allow for recovery actions, since if the actual
state changes to ERROR (or similar) we would still have the
expected/desired state available for reference when trying to take
recovery actions.


Where would the expected/desired state be stored?  Or is it implicit in 
the most recent task attempted for the instance in question?



I think a task-based API and internal system that uses taskflow to
organize related tasks with state machine changes is the best design to
work towards.


I think something like this would certainly be an improvement over what 
we have now. That said, I don't see that as mutually exclusive with an 
explicit distinction between desired and actual state.  I think having 
"nova list" or the dashboard equivalent show both states would be useful.
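Purely as an illustration of what such a listing might look like, the sketch below renders a small table with both columns. The column names, the dict keys, and the `format_server_list` helper are invented for this example and do not correspond to existing "nova list" output.

```python
def format_server_list(servers):
    """Render servers as a plain-text table with desired and actual state."""
    header = ("Name", "Desired", "Actual")
    rows = [header] + [(s["name"], s["desired"], s["actual"]) for s in servers]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )


servers = [
    {"name": "web1", "desired": "ACTIVE", "actual": "ACTIVE"},
    {"name": "db1", "desired": "ACTIVE", "actual": "ERROR"},
]
print(format_server_list(servers))
```

A server whose two columns disagree stands out immediately, without having to dig through its task history.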


Chris



Re: [openstack-dev] [nova] formally distinguish server "desired state" from "actual state"?

2014-10-01 Thread Jay Pipes

On 10/01/2014 03:07 PM, Chris Friesen wrote:

Currently in nova we have the "vm_state", which according to the code
comments is supposed to represent "a VM's current stable (not
transition) state", or "what the customer expect the VM to be".

However, we then added in an ERROR state.  How does this possibly make
sense given the above definition?


Where do you see that vm_state is intended to be "what the customer 
expects the VM to be"? An ERROR vm_state is pretty clearly a stable 
state (i.e. it's not a transient state).


Which customer would ever expect the
VM to be in an error state?


Nobody would. But I don't see where you are getting that definition of 
what vm_state means...



Given this, I wonder whether it might make sense to formally distinguish
between the expected/desired state (i.e. the state that the customer
wants the VM to be in), and the actual state (i.e. the state that nova
thinks the VM is in).


I don't think this is all that useful. I think what would be more useful 
is changing the Nova API to perform actions against an instance using a 
POST /servers/{server_id}/tasks call, allow a user to see a history of 
what actions were taken against an instance with a call to GET 
/servers/{server_id}/tasks and allow a user to see the progress of a 
particular task (say, a rebuild) by calling GET /tasks/{task_id}/items.


I proposed as much here:

http://docs.oscomputevnext.apiary.io/#servertask
http://docs.oscomputevnext.apiary.io/#servertaskitem
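To give a feel for the shape of those three calls, here is a small in-memory stand-in. It is only a sketch: the `TaskAPI` class, its method names, the dict layout of a task, and the "download image" step are hypothetical, and no HTTP is involved. The comments show which proposed call each method is meant to mirror.

```python
import itertools
from collections import defaultdict


class TaskAPI:
    """In-memory stand-in for the three proposed calls; not a real Nova API."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._tasks_by_server = defaultdict(list)
        self._tasks = {}

    def post_server_task(self, server_id, action):
        # Mirrors POST /servers/{server_id}/tasks
        task = {"id": next(self._ids), "server_id": server_id,
                "action": action, "items": []}
        self._tasks[task["id"]] = task
        self._tasks_by_server[server_id].append(task)
        return task

    def get_server_tasks(self, server_id):
        # Mirrors GET /servers/{server_id}/tasks
        return list(self._tasks_by_server[server_id])

    def get_task_items(self, task_id):
        # Mirrors GET /tasks/{task_id}/items
        return list(self._tasks[task_id]["items"])


api = TaskAPI()
rebuild = api.post_server_task("server-1", "rebuild")
rebuild["items"].append({"step": "download image", "state": "done"})
print([t["action"] for t in api.get_server_tasks("server-1")])  # ['rebuild']
print(api.get_task_items(rebuild["id"]))
```

The history of actions and the progress of a single task are both plain reads here, separate from whatever state the server itself is in.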


This would more easily allow for recovery actions, since if the actual
state changes to ERROR (or similar) we would still have the
expected/desired state available for reference when trying to take
recovery actions.


I think a task-based API and internal system that uses taskflow to 
organize related tasks with state machine changes is the best design to 
work towards.


Best,
-jay



[openstack-dev] [nova] formally distinguish server "desired state" from "actual state"?

2014-10-01 Thread Chris Friesen


Currently in nova we have the "vm_state", which according to the code 
comments is supposed to represent "a VM's current stable (not 
transition) state", or "what the customer expect the VM to be".


However, we then added in an ERROR state.  How does this possibly make 
sense given the above definition?  Which customer would ever expect the 
VM to be in an error state?


Given this, I wonder whether it might make sense to formally distinguish 
between the expected/desired state (i.e. the state that the customer 
wants the VM to be in), and the actual state (i.e. the state that nova 
thinks the VM is in).


This would more easily allow for recovery actions, since if the actual 
state changes to ERROR (or similar) we would still have the 
expected/desired state available for reference when trying to take 
recovery actions.
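A rough illustration of the idea follows. It is not existing nova code: the `Server` class, its two fields, and the `recovery_action` helper are made up for this sketch, and only the state names are borrowed from nova's vm_states.

```python
from dataclasses import dataclass

# State names borrowed from nova's vm_states for familiarity; the two-field
# split itself is hypothetical and does not exist in nova.
ACTIVE, STOPPED, ERROR = "active", "stopped", "error"


@dataclass
class Server:
    name: str
    desired_state: str  # what the user last asked for
    actual_state: str   # what nova currently believes


def recovery_action(server):
    """Return a hypothetical action name that would move actual toward desired."""
    if server.actual_state == server.desired_state:
        return None
    # The desired state survives the failure, so it can drive recovery.
    return {ACTIVE: "start", STOPPED: "stop"}.get(server.desired_state)


vm = Server("vm1", desired_state=ACTIVE, actual_state=ACTIVE)
vm.actual_state = ERROR  # a failure changes only the actual state
print(recovery_action(vm))  # start
```

Because the failure only overwrites the actual state, the recovery path can still see that the user wanted the server running.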


Thoughts?

Chris
