Reviewed: https://review.opendev.org/658845 Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=aae5c7aa3819ad9161fd2effed3872d540099230 Submitter: Zuul Branch: master
commit aae5c7aa3819ad9161fd2effed3872d540099230 Author: Matthew Booth <[email protected]> Date: Mon May 13 16:04:39 2019 +0100 Fix retry of instance_update_and_get_original _instance_update modifies its 'values' argument. Consequently if it is retried due to an update conflict, the second invocation has the wrong arguments. A specific issue this causes is that if we called it with expected_task_state a concurrent modification to task_state will cause us to fail and retry. However, expected_task_state will have been popped from values on the first invocation and will not be present for the second. Consequently the second invocation will fail to perform the task_state check and therefore succeed, resulting in a race. We rewrite the old race unit test which wasn't testing the correct thing for 2 reasons: 1. Due to the bug fixed in this patch, although we were calling update_on_match() twice, the second call didn't check the task state. 2. side_effect=iterable returns function items without executing them, but we weren't hitting this due to the bug fixed in this patch. Closes-Bug: #1821373 Change-Id: I01c63e685113bf30e687ccb14a4d18e344b306f6 ** Changed in: nova Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1821373 Title: Most instance actions can be called concurrently Status in OpenStack Compute (nova): Fix Released Bug description: A customer reported that they were getting DB corruption if they called shelve twice in quick succession on the same instance. This should be prevented by the guard in nova.API.shelve, which does: instance.task_state = task_states.SHELVING instance.save(expected_task_state=[None]) This is intended to act as a robust gate against 2 instance actions happening concurrently. The first will set the task state to SHELVING, the second will fail because the task state is not SHELVING. The comparison is done atomically in db.instance_update_and_get_original(), and should be race free. However, instance.save() shortcuts if there is no update and does not call db.instance_update_and_get_original(). Therefore this guard fails if we call the same operation twice: instance = get_instance() => Returned instance.task_state is None instance.task_state = task_states.SHELVING instance.save(expected_task_state=[None]) => task_state was None, now SHELVING, updates = {'task_state': SHELVING} => db.instance_update_and_get_original() executes and succeeds instance = get_instance() => Returned instance.task_state is SHELVING instance.task_state = task_states.SHELVING instance.save(expected_task_state=[None]) => task_state was SHELVING, still SHELVING, updates = {} => db.instance_update_and_get_original() does not execute, therefore doesn't raise the expected exception This pattern is common to almost all instance actions in nova api. A quick scan suggests that all of the following actions are affected by this bug, and can therefore all potentially be executed multiple times concurrently for the same instance: restore force_stop start backup snapshot soft reboot hard reboot rebuild revert_resize resize shelve shelve_offload unshelve pause unpause suspend resume rescue unrescue set_admin_password live_migrate evacuate To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1821373/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : [email protected] Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp

