Re: [Openstack-qa-team] wait_for_server_status and Compute API
What I'm saying is that wait_for_server_status is just a function that, at least at the moment, has no concept of this state machine. Coding in a whitelist of good states may be confusing/frustrating as the different projects are expanded. Unless you know to add in a good state for feature X that was just added, writing a test will be a headache. Instead, what I'm proposing is to make it a standard (documented in our HACKING file) that waiting for one of these in-between states is, in general, bad practice. This can then be controlled through code reviews. That's just my opinion. Daryl On Jun 18, 2012, at 12:43 PM, David Kranz wrote: > I am not sure what you mean. I meant that the status argument of > wait_for_server_status should be an enum type, whether that be an actual enum > type or an allowed set of strings. It should be a static type error, or at > least as close as we can come, to pass a value that always means a coding > error. Or are you objecting to any kind of static type checking? > > I also made a mistake in that DELETED should not be in the allowed set > because this method will bomb if the server is actually in that state. The > call to get_server fill fail which is why we also have > wait_for_server_termination. > > -David > > On 6/18/2012 1:37 PM, Daryl Walleck wrote: >> I'm not sure I agree with causing a failure based on what the user inputs to >> a function. I feel like this is something we should enforce through >> standards and not in code. >> >> Daryl >> >> On Jun 18, 2012, at 12:34 PM, David Kranz wrote: >> >>> On 6/18/2012 1:07 PM, Jay Pipes wrote: On 06/18/2012 12:49 PM, Daryl Walleck wrote: > I can verify that rescue is a non-race state. The transition is active > to rescue on setting rescue, and rescue to active when leaving rescue. I don't see a RESCUE state. I see a RESCUED state. Is that what you are referring to here? Want to make sure, since the semantics and tenses of the power, VM, and task states are a bit inconsistent. Best, -jay > - >>> For a black-box test what we have is 'status', which is neither vm-state >>> not task state. I believe 'status' contains the values of the attributes >>> in the below code. I am going to add an assertion to wait_for_server_status >>> that will fail if you give it an ephemeral state. From this list and the comments of Daryl and Jay, I propose the list of >>> allowed states for this check: >>> >>> ACTIVE, VERIFY_RESIZE, STOPPED, SHUTOFF, PAUSED, SUSPENDED, RESCUE, ERROR, >>> DELETED >>> >>> Any comments? >>> >>> >>> From nova/nova/api/openstack/common.py: >>> >>> _STATE_MAP = { >>>vm_states.ACTIVE: { >>>'default': 'ACTIVE', >>>task_states.REBOOTING: 'REBOOT', >>>task_states.REBOOTING_HARD: 'HARD_REBOOT', >>>task_states.UPDATING_PASSWORD: 'PASSWORD', >>>task_states.RESIZE_VERIFY: 'VERIFY_RESIZE', >>>}, >>>vm_states.BUILDING: { >>>'default': 'BUILD', >>>}, >>>vm_states.REBUILDING: { >>>'default': 'REBUILD', >>>}, >>>vm_states.STOPPED: { >>>'default': 'STOPPED', >>>}, >>>vm_states.SHUTOFF: { >>>'default': 'SHUTOFF', >>>}, >>>vm_states.MIGRATING: { >>>'default': 'MIGRATING', >>>}, >>>vm_states.RESIZING: { >>>'default': 'RESIZE', >>>task_states.RESIZE_REVERTING: 'REVERT_RESIZE', >>>}, >>>vm_states.PAUSED: { >>>'default': 'PAUSED', >>>}, >>>vm_states.SUSPENDED: { >>>'default': 'SUSPENDED', >>>}, >>>vm_states.RESCUED: { >>>'default': 'RESCUE', >>>}, >>>vm_states.ERROR: { >>>'default': 'ERROR', >>>}, >>>vm_states.DELETED: { >>>'default': 'DELETED', >>>}, >>>vm_states.SOFT_DELETE: { >>>'default': 'DELETED', >>>}, >>> } >>> >>> def status_from_state(vm_state, task_state='default'): >>>"""Given vm_state and task_state, return a status string.""" >>>task_map = _STATE_MAP.get(vm_state, dict(default='UNKNOWN_STATE')) >>>status = task_map.get(task_state, task_map['default']) >>>LOG.debug("Generated %(status)s from vm_state=%(vm_state)s " >>> "task_state=%(task_state)s." % locals()) >>>return status >>> >>> >>> def vm_state_from_status(status): >>>"""Map the server status string to a vm state.""" >>>for state, task_map in _STATE_MAP.iteritems(): >>>status_string = task_map.get("default") >>>if status.lower() == status_string.lower(): >>>return state >>> >>> >>> >>> -- >>> Mailing list: https://launchpad.net/~openstack-qa-team >>> Post to : openstack-qa-team@lists.launchpad.net >>> Unsubscribe : https://launchpad.net/~openstack-qa-team >>> More help : https://help.launchpad.net/ListHelp > -- Mailing list: https://launchpad.net/~openstack-qa-team Post to : openstack-qa-team@lists.launchpad.net Unsubscribe : https://launchpad.net/
Re: [Openstack-qa-team] wait_for_server_status and Compute API
I am not sure what you mean. I meant that the status argument of wait_for_server_status should be an enum type, whether that be an actual enum type or an allowed set of strings. It should be a static type error, or at least as close as we can come, to pass a value that always means a coding error. Or are you objecting to any kind of static type checking? I also made a mistake in that DELETED should not be in the allowed set because this method will bomb if the server is actually in that state. The call to get_server fill fail which is why we also have wait_for_server_termination. -David On 6/18/2012 1:37 PM, Daryl Walleck wrote: I'm not sure I agree with causing a failure based on what the user inputs to a function. I feel like this is something we should enforce through standards and not in code. Daryl On Jun 18, 2012, at 12:34 PM, David Kranz wrote: On 6/18/2012 1:07 PM, Jay Pipes wrote: On 06/18/2012 12:49 PM, Daryl Walleck wrote: I can verify that rescue is a non-race state. The transition is active to rescue on setting rescue, and rescue to active when leaving rescue. I don't see a RESCUE state. I see a RESCUED state. Is that what you are referring to here? Want to make sure, since the semantics and tenses of the power, VM, and task states are a bit inconsistent. Best, -jay - For a black-box test what we have is 'status', which is neither vm-state not task state. I believe 'status' contains the values of the attributes in the below code. I am going to add an assertion to wait_for_server_status that will fail if you give it an ephemeral state. From this list and the comments of Daryl and Jay, I propose the list of allowed states for this check: ACTIVE, VERIFY_RESIZE, STOPPED, SHUTOFF, PAUSED, SUSPENDED, RESCUE, ERROR, DELETED Any comments? From nova/nova/api/openstack/common.py: _STATE_MAP = { vm_states.ACTIVE: { 'default': 'ACTIVE', task_states.REBOOTING: 'REBOOT', task_states.REBOOTING_HARD: 'HARD_REBOOT', task_states.UPDATING_PASSWORD: 'PASSWORD', task_states.RESIZE_VERIFY: 'VERIFY_RESIZE', }, vm_states.BUILDING: { 'default': 'BUILD', }, vm_states.REBUILDING: { 'default': 'REBUILD', }, vm_states.STOPPED: { 'default': 'STOPPED', }, vm_states.SHUTOFF: { 'default': 'SHUTOFF', }, vm_states.MIGRATING: { 'default': 'MIGRATING', }, vm_states.RESIZING: { 'default': 'RESIZE', task_states.RESIZE_REVERTING: 'REVERT_RESIZE', }, vm_states.PAUSED: { 'default': 'PAUSED', }, vm_states.SUSPENDED: { 'default': 'SUSPENDED', }, vm_states.RESCUED: { 'default': 'RESCUE', }, vm_states.ERROR: { 'default': 'ERROR', }, vm_states.DELETED: { 'default': 'DELETED', }, vm_states.SOFT_DELETE: { 'default': 'DELETED', }, } def status_from_state(vm_state, task_state='default'): """Given vm_state and task_state, return a status string.""" task_map = _STATE_MAP.get(vm_state, dict(default='UNKNOWN_STATE')) status = task_map.get(task_state, task_map['default']) LOG.debug("Generated %(status)s from vm_state=%(vm_state)s " "task_state=%(task_state)s." % locals()) return status def vm_state_from_status(status): """Map the server status string to a vm state.""" for state, task_map in _STATE_MAP.iteritems(): status_string = task_map.get("default") if status.lower() == status_string.lower(): return state -- Mailing list: https://launchpad.net/~openstack-qa-team Post to : openstack-qa-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack-qa-team More help : https://help.launchpad.net/ListHelp -- Mailing list: https://launchpad.net/~openstack-qa-team Post to : openstack-qa-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack-qa-team More help : https://help.launchpad.net/ListHelp
Re: [Openstack-qa-team] wait_for_server_status and Compute API
I'm not sure I agree with causing a failure based on what the user inputs to a function. I feel like this is something we should enforce through standards and not in code. Daryl On Jun 18, 2012, at 12:34 PM, David Kranz wrote: > On 6/18/2012 1:07 PM, Jay Pipes wrote: >> On 06/18/2012 12:49 PM, Daryl Walleck wrote: >>> I can verify that rescue is a non-race state. The transition is active >>> to rescue on setting rescue, and rescue to active when leaving rescue. >> >> I don't see a RESCUE state. I see a RESCUED state. Is that what you are >> referring to here? Want to make sure, since the semantics and tenses of the >> power, VM, and task states are a bit inconsistent. >> >> Best, >> -jay >> >>> - > For a black-box test what we have is 'status', which is neither vm-state not > task state. I believe 'status' contains the values of the attributes in the > below code. I am going to add an assertion to wait_for_server_status that > will fail if you give it an ephemeral state. >> From this list and the comments of Daryl and Jay, I propose the list of > allowed states for this check: > > ACTIVE, VERIFY_RESIZE, STOPPED, SHUTOFF, PAUSED, SUSPENDED, RESCUE, ERROR, > DELETED > > Any comments? > > > From nova/nova/api/openstack/common.py: > > _STATE_MAP = { >vm_states.ACTIVE: { >'default': 'ACTIVE', >task_states.REBOOTING: 'REBOOT', >task_states.REBOOTING_HARD: 'HARD_REBOOT', >task_states.UPDATING_PASSWORD: 'PASSWORD', >task_states.RESIZE_VERIFY: 'VERIFY_RESIZE', >}, >vm_states.BUILDING: { >'default': 'BUILD', >}, >vm_states.REBUILDING: { >'default': 'REBUILD', >}, >vm_states.STOPPED: { >'default': 'STOPPED', >}, >vm_states.SHUTOFF: { >'default': 'SHUTOFF', >}, >vm_states.MIGRATING: { >'default': 'MIGRATING', >}, >vm_states.RESIZING: { >'default': 'RESIZE', >task_states.RESIZE_REVERTING: 'REVERT_RESIZE', >}, >vm_states.PAUSED: { >'default': 'PAUSED', >}, >vm_states.SUSPENDED: { >'default': 'SUSPENDED', >}, >vm_states.RESCUED: { >'default': 'RESCUE', >}, >vm_states.ERROR: { >'default': 'ERROR', >}, >vm_states.DELETED: { >'default': 'DELETED', >}, >vm_states.SOFT_DELETE: { >'default': 'DELETED', >}, > } > > def status_from_state(vm_state, task_state='default'): >"""Given vm_state and task_state, return a status string.""" >task_map = _STATE_MAP.get(vm_state, dict(default='UNKNOWN_STATE')) >status = task_map.get(task_state, task_map['default']) >LOG.debug("Generated %(status)s from vm_state=%(vm_state)s " > "task_state=%(task_state)s." % locals()) >return status > > > def vm_state_from_status(status): >"""Map the server status string to a vm state.""" >for state, task_map in _STATE_MAP.iteritems(): >status_string = task_map.get("default") >if status.lower() == status_string.lower(): >return state > > > > -- > Mailing list: https://launchpad.net/~openstack-qa-team > Post to : openstack-qa-team@lists.launchpad.net > Unsubscribe : https://launchpad.net/~openstack-qa-team > More help : https://help.launchpad.net/ListHelp -- Mailing list: https://launchpad.net/~openstack-qa-team Post to : openstack-qa-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack-qa-team More help : https://help.launchpad.net/ListHelp
Re: [Openstack-qa-team] wait_for_server_status and Compute API
On 6/18/2012 1:07 PM, Jay Pipes wrote: On 06/18/2012 12:49 PM, Daryl Walleck wrote: I can verify that rescue is a non-race state. The transition is active to rescue on setting rescue, and rescue to active when leaving rescue. I don't see a RESCUE state. I see a RESCUED state. Is that what you are referring to here? Want to make sure, since the semantics and tenses of the power, VM, and task states are a bit inconsistent. Best, -jay - For a black-box test what we have is 'status', which is neither vm-state not task state. I believe 'status' contains the values of the attributes in the below code. I am going to add an assertion to wait_for_server_status that will fail if you give it an ephemeral state. From this list and the comments of Daryl and Jay, I propose the list of allowed states for this check: ACTIVE, VERIFY_RESIZE, STOPPED, SHUTOFF, PAUSED, SUSPENDED, RESCUE, ERROR, DELETED Any comments? From nova/nova/api/openstack/common.py: _STATE_MAP = { vm_states.ACTIVE: { 'default': 'ACTIVE', task_states.REBOOTING: 'REBOOT', task_states.REBOOTING_HARD: 'HARD_REBOOT', task_states.UPDATING_PASSWORD: 'PASSWORD', task_states.RESIZE_VERIFY: 'VERIFY_RESIZE', }, vm_states.BUILDING: { 'default': 'BUILD', }, vm_states.REBUILDING: { 'default': 'REBUILD', }, vm_states.STOPPED: { 'default': 'STOPPED', }, vm_states.SHUTOFF: { 'default': 'SHUTOFF', }, vm_states.MIGRATING: { 'default': 'MIGRATING', }, vm_states.RESIZING: { 'default': 'RESIZE', task_states.RESIZE_REVERTING: 'REVERT_RESIZE', }, vm_states.PAUSED: { 'default': 'PAUSED', }, vm_states.SUSPENDED: { 'default': 'SUSPENDED', }, vm_states.RESCUED: { 'default': 'RESCUE', }, vm_states.ERROR: { 'default': 'ERROR', }, vm_states.DELETED: { 'default': 'DELETED', }, vm_states.SOFT_DELETE: { 'default': 'DELETED', }, } def status_from_state(vm_state, task_state='default'): """Given vm_state and task_state, return a status string.""" task_map = _STATE_MAP.get(vm_state, dict(default='UNKNOWN_STATE')) status = task_map.get(task_state, task_map['default']) LOG.debug("Generated %(status)s from vm_state=%(vm_state)s " "task_state=%(task_state)s." % locals()) return status def vm_state_from_status(status): """Map the server status string to a vm state.""" for state, task_map in _STATE_MAP.iteritems(): status_string = task_map.get("default") if status.lower() == status_string.lower(): return state -- Mailing list: https://launchpad.net/~openstack-qa-team Post to : openstack-qa-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack-qa-team More help : https://help.launchpad.net/ListHelp
Re: [Openstack-qa-team] wait_for_server_status and Compute API
On 06/18/2012 12:49 PM, Daryl Walleck wrote: I can verify that rescue is a non-race state. The transition is active to rescue on setting rescue, and rescue to active when leaving rescue. I don't see a RESCUE state. I see a RESCUED state. Is that what you are referring to here? Want to make sure, since the semantics and tenses of the power, VM, and task states are a bit inconsistent. Best, -jay Original message Subject: Re: [Openstack-qa-team] wait_for_server_status and Compute API From: Jay Pipes To: "openstack-qa-team@lists.launchpad.net" ,"openst...@lists.launchpad.net" CC: Re: [Openstack-qa-team] wait_for_server_status and Compute API On 06/18/2012 12:01 PM, David Kranz wrote: There are a few tempest tests, and many in the old kong suite that is still there, that wait for a server status that is something other than ACTIVE or VERIFY_RESIZE. These other states, such as BUILD or REBOOT, are transient so I don't understand why it is correct for code to poll for those states. Am I missing something or do those tests have race condition bugs? No, you are correct, and I have made some comments in recent code reviews to that effect. Here are all the task states: https://github.com/openstack/nova/blob/master/nova/compute/task_states.py Out of all those task states, I believe the only one safe to poll in a wait loop is RESIZE_VERIFY. All the others are prone to state transitions outside the control of the user. For the VM states: https://github.com/openstack/nova/blob/master/nova/compute/vm_states.py I consider the following to be non-racy, quiescent states: ACTIVE DELETED STOPPED SHUTDOFF PAUSED SUSPENDED ERROR I consider the following to be racy states that should not be tested for: MIGRATING -- Instead, the final state should be checked for... RESIZING -- Instead, the RESIZE_VERIFY and RESIZE_CONFIRM task states should be checked I have absolutely no idea what the state termination is for the following VM states: RESCUED -- is this a permanent state? Is this able to be queried for in a consistent manner before it transitions to some further state? SOFT_DELETE -- I have no clue what the purpose or queryability of this state is, but would love to know... Best, -jay -- Mailing list: https://launchpad.net/~openstack-qa-team Post to : openstack-qa-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack-qa-team More help : https://help.launchpad.net/ListHelp -- Mailing list: https://launchpad.net/~openstack-qa-team Post to : openstack-qa-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack-qa-team More help : https://help.launchpad.net/ListHelp
Re: [Openstack-qa-team] wait_for_server_status and Compute API
I can verify that rescue is a non-race state. The transition is active to rescue on setting rescue, and rescue to active when leaving rescue. Original message Subject: Re: [Openstack-qa-team] wait_for_server_status and Compute API From: Jay Pipes To: "openstack-qa-team@lists.launchpad.net" ,"openst...@lists.launchpad.net" CC: Re: [Openstack-qa-team] wait_for_server_status and Compute API On 06/18/2012 12:01 PM, David Kranz wrote: > There are a few tempest tests, and many in the old kong suite that is > still there, that wait for a server status that is something other than > ACTIVE or VERIFY_RESIZE. These other states, such as BUILD or REBOOT, > are transient so I don't understand why it is correct for code to poll > for those states. Am I missing something or do those tests have race > condition bugs? No, you are correct, and I have made some comments in recent code reviews to that effect. Here are all the task states: https://github.com/openstack/nova/blob/master/nova/compute/task_states.py Out of all those task states, I believe the only one safe to poll in a wait loop is RESIZE_VERIFY. All the others are prone to state transitions outside the control of the user. For the VM states: https://github.com/openstack/nova/blob/master/nova/compute/vm_states.py I consider the following to be non-racy, quiescent states: ACTIVE DELETED STOPPED SHUTDOFF PAUSED SUSPENDED ERROR I consider the following to be racy states that should not be tested for: MIGRATING -- Instead, the final state should be checked for... RESIZING -- Instead, the RESIZE_VERIFY and RESIZE_CONFIRM task states should be checked I have absolutely no idea what the state termination is for the following VM states: RESCUED -- is this a permanent state? Is this able to be queried for in a consistent manner before it transitions to some further state? SOFT_DELETE -- I have no clue what the purpose or queryability of this state is, but would love to know... Best, -jay -- Mailing list: https://launchpad.net/~openstack-qa-team Post to : openstack-qa-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack-qa-team More help : https://help.launchpad.net/ListHelp -- Mailing list: https://launchpad.net/~openstack-qa-team Post to : openstack-qa-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack-qa-team More help : https://help.launchpad.net/ListHelp
Re: [Openstack-qa-team] wait_for_server_status and Compute API
On 06/18/2012 12:01 PM, David Kranz wrote: There are a few tempest tests, and many in the old kong suite that is still there, that wait for a server status that is something other than ACTIVE or VERIFY_RESIZE. These other states, such as BUILD or REBOOT, are transient so I don't understand why it is correct for code to poll for those states. Am I missing something or do those tests have race condition bugs? No, you are correct, and I have made some comments in recent code reviews to that effect. Here are all the task states: https://github.com/openstack/nova/blob/master/nova/compute/task_states.py Out of all those task states, I believe the only one safe to poll in a wait loop is RESIZE_VERIFY. All the others are prone to state transitions outside the control of the user. For the VM states: https://github.com/openstack/nova/blob/master/nova/compute/vm_states.py I consider the following to be non-racy, quiescent states: ACTIVE DELETED STOPPED SHUTDOFF PAUSED SUSPENDED ERROR I consider the following to be racy states that should not be tested for: MIGRATING -- Instead, the final state should be checked for... RESIZING -- Instead, the RESIZE_VERIFY and RESIZE_CONFIRM task states should be checked I have absolutely no idea what the state termination is for the following VM states: RESCUED -- is this a permanent state? Is this able to be queried for in a consistent manner before it transitions to some further state? SOFT_DELETE -- I have no clue what the purpose or queryability of this state is, but would love to know... Best, -jay -- Mailing list: https://launchpad.net/~openstack-qa-team Post to : openstack-qa-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack-qa-team More help : https://help.launchpad.net/ListHelp
[Openstack-qa-team] wait_for_server_status and Compute API
There are a few tempest tests, and many in the old kong suite that is still there, that wait for a server status that is something other than ACTIVE or VERIFY_RESIZE. These other states, such as BUILD or REBOOT, are transient so I don't understand why it is correct for code to poll for those states. Am I missing something or do those tests have race condition bugs? -David -- Mailing list: https://launchpad.net/~openstack-qa-team Post to : openstack-qa-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack-qa-team More help : https://help.launchpad.net/ListHelp