Public bug reported:

LibvirtDriver._get_disk_over_committed_size_total() loops over all the instances on the hypervisor to compute the total overcommitted disk size.
However, at the time that routine is called from ResourceTracker.update_available_resource() we do not hold COMPUTE_RESOURCE_SEMAPHORE. This means that instance claims can be modified while the loop runs (due to instance creation, deletion, resize, migration, etc.), so the calculated value for data['disk_available_least'] may not reflect current reality, and different eventlets may end up with different views of data['disk_available_least'].

A related bug was reported some time back (https://bugs.launchpad.net/nova/+bug/968339), but rather than dealing with the underlying race condition, the fix papered over it by ignoring the InstanceNotFound exception.

** Affects: nova
     Importance: Undecided
         Status: New

** Tags: compute race-condition

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1577642

Title:
  race between disk_available_least and instance operations

Status in OpenStack Compute (nova):
  New
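
To make the race described in this report concrete, here is a minimal sketch of the general remedy: compute the aggregate while holding the same lock that guards claim changes. This is not nova's code. ToyTracker, RESOURCE_LOCK, claim(), release() and the GB figures are hypothetical stand-ins, and a plain threading.Lock is assumed in place of COMPUTE_RESOURCE_SEMAPHORE.

```python
import threading

# Hypothetical stand-in for the lock the report calls COMPUTE_RESOURCE_SEMAPHORE.
RESOURCE_LOCK = threading.Lock()


class ToyTracker:
    """Toy model of per-host disk accounting; not nova's ResourceTracker."""

    def __init__(self, disk_total_gb):
        self.disk_total_gb = disk_total_gb
        # instance id -> (virtual_size_gb, allocated_gb)
        self.instances = {}

    def claim(self, instance_id, virtual_gb, allocated_gb):
        # Claims mutate shared state, so they take the lock.
        with RESOURCE_LOCK:
            self.instances[instance_id] = (virtual_gb, allocated_gb)

    def release(self, instance_id):
        with RESOURCE_LOCK:
            self.instances.pop(instance_id, None)

    def _over_committed_total(self):
        # Caller must already hold RESOURCE_LOCK, so the dict cannot
        # change while we iterate over it.
        return sum(virtual - allocated
                   for virtual, allocated in self.instances.values())

    def disk_available_least(self):
        # Take the same lock as claim()/release(): both sums are then
        # computed against one consistent snapshot of the instance set.
        with RESOURCE_LOCK:
            used = sum(allocated for _, allocated in self.instances.values())
            return self.disk_total_gb - used - self._over_committed_total()


if __name__ == "__main__":
    tracker = ToyTracker(disk_total_gb=100)
    tracker.claim("a", virtual_gb=20, allocated_gb=5)
    tracker.claim("b", virtual_gb=10, allocated_gb=10)
    print(tracker.disk_available_least())
```

The trade-off in a design like this is that the lock is held for the whole loop. If each iteration does slow I/O (as a per-instance disk query might), claims would be blocked for that long, so a real fix may prefer to snapshot the instance list under the lock and tolerate or reconcile changes afterwards.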