Re: [openstack-dev] [nova] formally distinguish server "desired state" from "actual state"?
Not only the ERROR state: VERIFY_RESIZE might have this kind of problem too. https://review.openstack.org/#/c/101435/ has more info, so I guess the server task work might be the right direction for those problems.

Best Regards!

Kevin (Chen) Ji 纪 晨
Engineer, zVM Development, CSTL
Notes: Chen CH Ji/China/IBM@IBMCN Internet: jiche...@cn.ibm.com
Phone: +86-10-82454158
Address: 3/F Ring Building, ZhongGuanCun Software Park, Haidian District, Beijing 100193, PRC

From: Chris Friesen
To: "openstack-dev@lists.openstack.org"
Date: 10/02/2014 03:05 AM
Subject: [openstack-dev] [nova] formally distinguish server "desired state" from "actual state"?

Currently in nova we have the "vm_state", which according to the code comments is supposed to represent "a VM's current stable (not transition) state", or "what the customer expect the VM to be".

However, we then added in an ERROR state. How does this possibly make sense given the above definition? Which customer would ever expect the VM to be in an error state?

Given this, I wonder whether it might make sense to formally distinguish between the expected/desired state (i.e. the state that the customer wants the VM to be in), and the actual state (i.e. the state that nova thinks the VM is in).

This would more easily allow for recovery actions, since if the actual state changes to ERROR (or similar) we would still have the expected/desired state available for reference when trying to take recovery actions.

Thoughts?

Chris

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] formally distinguish server "desired state" from "actual state"?
On 10/01/2014 04:31 PM, Chris Friesen wrote:
> On 10/01/2014 01:23 PM, Jay Pipes wrote:
>> On 10/01/2014 03:07 PM, Chris Friesen wrote:
>>> Currently in nova we have the "vm_state", which according to the code comments is supposed to represent "a VM's current stable (not transition) state", or "what the customer expect the VM to be".
>>>
>>> However, we then added in an ERROR state. How does this possibly make sense given the above definition?
>>
>> Where do you see that vm_state is intended to be "what the customer expects the VM to be"?
>
> From nova/compute/vm_states.py:
>
> 'vm_state describes a VM's current stable (not transition) state. That is, if there is no ongoing compute API calls (running tasks), vm_state should reflect what the customer expect the VM to be.'

Hmm, interesting wording. I wasn't aware of that wiki page and I'm not sure about the freshness of it, but I think what the language is saying is that if a user isn't actively running an action against the server, it should be in the last state a user put it in -- i.e. active, terminated, stopped, paused, etc.

> Also, from http://wiki.openstack.org/VMState:
>
> 'vm_state reflects the stable state based on API calls, matching user expectation, revised “top-down” within API implementation.'

Yeah, also awkward wording...

> Now granted, the wiki also says 'If the task fails and is not possible to rollback, the vm_state is set to ERROR.' I don't particularly like that behaviour, which is why I'd like to see a separate "actual state".

If we had a task-based system, which is what I am advocating for, you would have a *task* (action) set to an ERROR state, not the VM itself. Which is what I was getting at... the task's history could tell the user what failed about the action, but the state of a VM could continue to be, for example, ACTIVE (or STOPPED or whatever).
In the oscomputevnext proposal, I have an ERROR state for virt_state, but you are correct that it doesn't make sense to have one there if you have the history of the failure of an action in the task item history. ERROR isn't really a state of the virtual machine at all, just of an operation against one.

>> I don't think this is all that useful. I think what would be more useful is changing the Nova API to perform actions against an instance using a POST /servers/{server_id}/tasks call, allow a user to see a history of what actions were taken against an instance with a call to GET /servers/{server_id}/tasks and allow a user to see the progress of a particular task (say, a rebuild) by calling GET /tasks/{task_id}/items.
>
> Yep, I like that idea. But I think it's orthogonal to the issue of desired vs actual state. When you start a task it could change the "desired" state, and when the task completes the "actual" state should match the "expected" state.

Not sure it's necessary to have a desired state on the instance (since that could be derived from the task history), but I see your point about it being orthogonal.

>> I proposed as much here:
>>
>> http://docs.oscomputevnext.apiary.io/#servertask
>
> Just curious, where is the equivalent of "evacuate"?

Evacuate is an operator API and IMO is not appropriate to live in the same API as the one used by regular users of a compute service.

Put another way: you don't see an evacuate host API in the EC2 API, do you?

I guarantee there *is* such an API, but it's not in the public REST-ish EC2 API that you and I use and may not even be an HTTP API at all.
I talk a little bit more about my opinion on having operator API calls mixed into the same compute control API in my notes on GitHub here, if you're interested:

https://github.com/jaypipes/openstack-compute-api#operator-api-calls

>> http://docs.oscomputevnext.apiary.io/#servertaskitem
>>
>>> This would more easily allow for recovery actions, since if the actual state changes to ERROR (or similar) we would still have the expected/desired state available for reference when trying to take recovery actions.
>
> Where would the expected/desired state be stored? Or is it implicit in the most recent task attempted for the instance in question?

Right, exactly, it would be implicit in the most recent task attempted.

>> I think a task-based API and internal system that uses taskflow to organize related tasks with state machine changes is the best design to work towards.
>
> I think something like this would certainly be an improvement over what we have now. That said, I don't see that as mutually exclusive with an explicit distinction between desired and actual state. I think having "nova list" or the dashboard equivalent show both states would be useful.

Sure, fair point.

-jay
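The idea that the desired state is "implicit in the most recent task attempted" can be sketched in a few lines. This is only an illustration under stated assumptions: the action names, the TARGET_STATE mapping, and the (action, task_state) history format are hypothetical and do not come from nova or the oscomputevnext proposal.

```python
from typing import List, Optional, Tuple

# Hypothetical mapping from a task's action to the state that action aims for.
# Actions that do not change the intended state (e.g. "rebuild") are omitted.
TARGET_STATE = {"start": "active", "stop": "stopped", "pause": "paused"}


def desired_state(history: List[Tuple[str, str]]) -> Optional[str]:
    """Derive the desired state from a task history.

    history holds (action, task_state) pairs, oldest first. The outcome of
    the task is deliberately ignored: a failed "stop" still tells us the
    user wanted the server stopped.
    """
    for action, _task_state in reversed(history):
        if action in TARGET_STATE:
            return TARGET_STATE[action]
    return None  # no state-changing task has been attempted yet
```

With this shape, a history of a successful "start" followed by a failed "stop" yields "stopped" as the desired state, which is the information a recovery action would need.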
Re: [openstack-dev] [nova] formally distinguish server "desired state" from "actual state"?
On Wed, Oct 01, 2014, Chris Friesen wrote:
> Currently in nova we have the "vm_state", which according to the
> code comments is supposed to represent "a VM's current stable (not
> transition) state", or "what the customer expect the VM to be".
>
> However, we then added in an ERROR state. How does this possibly
> make sense given the above definition? Which customer would ever
> expect the VM to be in an error state?
>
> Given this, I wonder whether it might make sense to formally
> distinguish between the expected/desired state (i.e. the state that
> the customer wants the VM to be in), and the actual state (i.e. the
> state that nova thinks the VM is in).
>
> This would more easily allow for recovery actions, since if the
> actual state changes to ERROR (or similar) we would still have the
> expected/desired state available for reference when trying to take
> recovery actions.
>
> Thoughts?

I'm happy you brought this up because I've had a similar proposal bouncing around in the back of my head lately.

ERROR is a pet peeve of mine because it doesn't tell you the operational state of the instance. It may be running or it may not be running. It also ends up complicating logic quite a bit (we have a very ugly patch to allow users to revert resizes in ERROR).

Also, in a few places we have to store vm_state off into instance metadata (key 'old_vm_state') so it can be restored to the correct state (for things like RESCUED). This is fairly ugly.

I've wanted to sit down and work through all of the different vm_state transitions and figure out how to make it all less confusing. I just haven't had the time to do it yet :(

JE
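For readers unfamiliar with the 'old_vm_state' pattern mentioned above, here is a simplified illustration of the save-and-restore idea. It is not nova's code: the instance is modelled as a plain dict, the function names are hypothetical, and the fallback to "active" is an assumption of this sketch.

```python
from typing import Any, Dict


def enter_rescue(instance: Dict[str, Any]) -> None:
    """Stash the current vm_state in metadata, then mark the instance rescued."""
    instance["metadata"]["old_vm_state"] = instance["vm_state"]
    instance["vm_state"] = "rescued"


def leave_rescue(instance: Dict[str, Any]) -> None:
    """Restore the stashed vm_state and drop the metadata key."""
    # Falling back to "active" when nothing was saved is an assumption here.
    instance["vm_state"] = instance["metadata"].pop("old_vm_state", "active")
```

The awkwardness is visible even at this size: the state the user actually wants lives in a side channel rather than in a first-class field.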
Re: [openstack-dev] [nova] formally distinguish server "desired state" from "actual state"?
On 10/01/2014 01:23 PM, Jay Pipes wrote:
> On 10/01/2014 03:07 PM, Chris Friesen wrote:
>> Currently in nova we have the "vm_state", which according to the code comments is supposed to represent "a VM's current stable (not transition) state", or "what the customer expect the VM to be".
>>
>> However, we then added in an ERROR state. How does this possibly make sense given the above definition?
>
> Where do you see that vm_state is intended to be "what the customer expects the VM to be"?

From nova/compute/vm_states.py:

'vm_state describes a VM's current stable (not transition) state. That is, if there is no ongoing compute API calls (running tasks), vm_state should reflect what the customer expect the VM to be.'

Also, from http://wiki.openstack.org/VMState:

'vm_state reflects the stable state based on API calls, matching user expectation, revised “top-down” within API implementation.'

Now granted, the wiki also says 'If the task fails and is not possible to rollback, the vm_state is set to ERROR.' I don't particularly like that behaviour, which is why I'd like to see a separate "actual state".

> I don't think this is all that useful. I think what would be more useful is changing the Nova API to perform actions against an instance using a POST /servers/{server_id}/tasks call, allow a user to see a history of what actions were taken against an instance with a call to GET /servers/{server_id}/tasks and allow a user to see the progress of a particular task (say, a rebuild) by calling GET /tasks/{task_id}/items.

Yep, I like that idea. But I think it's orthogonal to the issue of desired vs actual state. When you start a task it could change the "desired" state, and when the task completes the "actual" state should match the "expected" state.

> I proposed as much here:
>
> http://docs.oscomputevnext.apiary.io/#servertask

Just curious, where is the equivalent of "evacuate"?
> http://docs.oscomputevnext.apiary.io/#servertaskitem
>
>> This would more easily allow for recovery actions, since if the actual state changes to ERROR (or similar) we would still have the expected/desired state available for reference when trying to take recovery actions.

Where would the expected/desired state be stored? Or is it implicit in the most recent task attempted for the instance in question?

> I think a task-based API and internal system that uses taskflow to organize related tasks with state machine changes is the best design to work towards.

I think something like this would certainly be an improvement over what we have now. That said, I don't see that as mutually exclusive with an explicit distinction between desired and actual state. I think having "nova list" or the dashboard equivalent show both states would be useful.

Chris
Re: [openstack-dev] [nova] formally distinguish server "desired state" from "actual state"?
On 10/01/2014 03:07 PM, Chris Friesen wrote:
> Currently in nova we have the "vm_state", which according to the code comments is supposed to represent "a VM's current stable (not transition) state", or "what the customer expect the VM to be".
>
> However, we then added in an ERROR state. How does this possibly make sense given the above definition?

Where do you see that vm_state is intended to be "what the customer expects the VM to be"?

An ERROR vm_state is pretty clearly a stable state (i.e. it's not a transient state).

> Which customer would ever expect the VM to be in an error state?

Nobody would. But I don't see where you are getting that definition of what vm_state means...

> Given this, I wonder whether it might make sense to formally distinguish between the expected/desired state (i.e. the state that the customer wants the VM to be in), and the actual state (i.e. the state that nova thinks the VM is in).

I don't think this is all that useful. I think what would be more useful is changing the Nova API to perform actions against an instance using a POST /servers/{server_id}/tasks call, allow a user to see a history of what actions were taken against an instance with a call to GET /servers/{server_id}/tasks and allow a user to see the progress of a particular task (say, a rebuild) by calling GET /tasks/{task_id}/items.

I proposed as much here:

http://docs.oscomputevnext.apiary.io/#servertask
http://docs.oscomputevnext.apiary.io/#servertaskitem

> This would more easily allow for recovery actions, since if the actual state changes to ERROR (or similar) we would still have the expected/desired state available for reference when trying to take recovery actions.

I think a task-based API and internal system that uses taskflow to organize related tasks with state machine changes is the best design to work towards.

Best,
-jay
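The three calls proposed in this message can be modelled in memory to show how the pieces relate. This is a rough sketch, not the oscomputevnext API: only the three URL patterns come from the message, while the class names, fields, state strings, and the record_item helper are assumptions made for illustration.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskItem:
    description: str
    state: str  # "success" or "error" (hypothetical values)


@dataclass
class Task:
    id: int
    server_id: str
    action: str
    state: str = "pending"
    items: List[TaskItem] = field(default_factory=list)


class TaskStore:
    """In-memory stand-in for the three proposed calls."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._tasks: Dict[int, Task] = {}

    def create_task(self, server_id: str, action: str) -> Task:
        # Models POST /servers/{server_id}/tasks
        task = Task(next(self._ids), server_id, action)
        self._tasks[task.id] = task
        return task

    def list_tasks(self, server_id: str) -> List[Task]:
        # Models GET /servers/{server_id}/tasks
        return [t for t in self._tasks.values() if t.server_id == server_id]

    def list_items(self, task_id: int) -> List[TaskItem]:
        # Models GET /tasks/{task_id}/items
        return list(self._tasks[task_id].items)

    def record_item(self, task_id: int, description: str, ok: bool) -> None:
        # Hypothetical helper: a failed step marks the task, not the server.
        task = self._tasks[task_id]
        task.items.append(TaskItem(description, "success" if ok else "error"))
        if not ok:
            task.state = "error"
```

Note that nothing here touches a server-level state: a failed rebuild step leaves the error on the task and its items, which is the separation the message argues for.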
[openstack-dev] [nova] formally distinguish server "desired state" from "actual state"?
Currently in nova we have the "vm_state", which according to the code comments is supposed to represent "a VM's current stable (not transition) state", or "what the customer expect the VM to be".

However, we then added in an ERROR state. How does this possibly make sense given the above definition? Which customer would ever expect the VM to be in an error state?

Given this, I wonder whether it might make sense to formally distinguish between the expected/desired state (i.e. the state that the customer wants the VM to be in), and the actual state (i.e. the state that nova thinks the VM is in).

This would more easily allow for recovery actions, since if the actual state changes to ERROR (or similar) we would still have the expected/desired state available for reference when trying to take recovery actions.

Thoughts?

Chris
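To make the proposed distinction concrete, here is a minimal sketch of a server record that carries both states, assuming hypothetical names throughout. ServerState, recovery_action, and the action strings are not part of nova; only the idea of keeping the desired state available after a failure comes from the message above.

```python
from dataclasses import dataclass
from typing import Optional

# A few state names in the spirit of vm_state; the values are illustrative.
ACTIVE, STOPPED, ERROR = "active", "stopped", "error"


@dataclass
class ServerState:
    desired: str  # the state the customer last asked for
    actual: str   # the state the compute service believes the VM is in


def recovery_action(state: ServerState) -> Optional[str]:
    """Choose a recovery step from the desired state when actual is ERROR."""
    if state.actual != ERROR:
        return None  # nothing to recover
    # The desired state survives the failure, so it can drive recovery.
    return {ACTIVE: "start", STOPPED: "stop"}.get(state.desired)
```

For example, a server with desired ACTIVE and actual ERROR maps to "start", while a single vm_state field set to ERROR would have lost that information.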