Public bug reported:

While I was working on another race condition bug around a failure to delete an instance while it was booting [1], I noticed that we have an assumption in the DB API layer that if we fail to soft delete an instance record, it means that a query constraint was not met.

This was misleading when I was working on debugging [1] because the traceback indicated a constraint on the 'host' column was not met:

  nova-api.log.1:2021-02-02 08:51:20.093 19 ERROR nova.api.openstack.wsgi     instance.destroy()
  nova-api.log.1:2021-02-02 08:51:20.093 19 ERROR nova.api.openstack.wsgi   File "/usr/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 226, in wrapper
  nova-api.log.1:2021-02-02 08:51:20.093 19 ERROR nova.api.openstack.wsgi     return fn(self, *args, **kwargs)
  nova-api.log.1:2021-02-02 08:51:20.093 19 ERROR nova.api.openstack.wsgi   File "/usr/lib/python3.6/site-packages/nova/objects/instance.py", line 659, in destroy
  nova-api.log.1:2021-02-02 08:51:20.093 19 ERROR nova.api.openstack.wsgi     reason='host changed')
  nova-api.log.1:2021-02-02 08:51:20.093 19 ERROR nova.api.openstack.wsgi nova.exception.ObjectActionError: Object action destroy failed because: host changed

which means that the instance.host changed while attempting to destroy the instance record.

This was however not possible in this case as the instance had not yet landed on a compute host (nova-compute sets the instance.host). What had actually happened was that nova-conductor had deleted the instance record after finding that nova-api had deleted the build request, as part of its logic to halt the build of an instance that's being deleted while it's booting.

So when nova-api tried to delete the instance record, it failed (returned 0 rows soft deleted). Because of the assumption in the DB API layer that a failure to soft delete means a constraint was not met, it raised ConstraintNotMet, which instance.destroy interprets as "host changed", which makes nova-api expect the instance record to exist. So the handling was for a "host changed" scenario when in reality it was an "instance not found" scenario.
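To make the ambiguity concrete, here is a minimal, self-contained sketch. It uses Python's sqlite3 module rather than nova's actual DB API, and the table layout and the make_db and soft_delete helpers are hypothetical; only the two exception names come from this report. A conditional soft delete reports 0 rows both when the host constraint is not met and when the record is already gone, so one possible way to tell the two apart is a follow-up lookup:

```python
import sqlite3


class ConstraintNotMet(Exception):
    """Stand-in for the exception named in this report (not nova's class)."""


class InstanceNotFound(Exception):
    """Stand-in for the exception named in this report (not nova's class)."""


def make_db():
    # Hypothetical, heavily simplified instances table for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instances ("
        "uuid TEXT PRIMARY KEY, host TEXT, deleted INTEGER DEFAULT 0)"
    )
    return conn


def soft_delete(conn, uuid, expected_host):
    # Conditional soft delete: matches only a live row whose host equals
    # expected_host. "IS" is used so that a NULL host compares as equal.
    cur = conn.execute(
        "UPDATE instances SET deleted = 1 "
        "WHERE uuid = ? AND deleted = 0 AND host IS ?",
        (uuid, expected_host),
    )
    if cur.rowcount == 0:
        # Zero rows is ambiguous: the record may be missing, or present
        # with a different host. A second query disambiguates the two.
        # Assumption: this check is not atomic with the UPDATE above.
        row = conn.execute(
            "SELECT 1 FROM instances WHERE uuid = ? AND deleted = 0",
            (uuid,),
        ).fetchone()
        if row is None:
            raise InstanceNotFound(uuid)
        raise ConstraintNotMet(uuid)
```

In this sketch, soft deleting the same record twice raises InstanceNotFound on the second call, while passing an expected_host that differs from the stored one raises ConstraintNotMet.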
We can avoid incorrect exception handling and future confusion while debugging if we make a change to raise InstanceNotFound instead of ConstraintNotMet when the instance record is missing during a soft delete.

[1] https://bugs.launchpad.net/nova/+bug/1914777

** Affects: nova
     Importance: Undecided
     Assignee: melanie witt (melwitt)
         Status: New

** Tags: db

https://bugs.launchpad.net/bugs/1917945

Title:
  ConstraintNotMet raised from DB API layer when instance is not found

Status in OpenStack Compute (nova):
  New

