Reviewed:  https://review.opendev.org/702368
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f35930eef8fa27ee972e87366abb38596839fdba
Submitter: Zuul
Branch:    master

commit f35930eef8fa27ee972e87366abb38596839fdba
Author: Alexandre Arents
Date:   Mon Jan 13 15:53:24 2020 +0000

    Avoid allocation leak when deleting instance stuck in BUILD

    During instance build, the conductor claims resources through the
    scheduler and creates the instance DB entry in the cell.

    If for any reason the conductor is not able to complete a build after
    the instance claim (e.g. AMQP issues, conductor restart before the
    build completes) and in the meantime the user requests deletion of the
    instance stuck in BUILD, nova-api deletes the build_request but leaves
    the allocation in place, resulting in a leak.

    This change makes nova-api ensure that the allocation is cleaned up
    when the build is ongoing or incomplete. Note that because the build
    did not reach a cell, the compute service is not able to heal the
    allocation during its periodic update_available_resource task.

    Furthermore, it ensures that the instance mapping is also queued for
    deletion.

    Change-Id: I4d3193d8401614311010ed0e055fcb3aaeeebaed
    Closes-Bug: #1859496

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1859496

Title:
  Deleting stuck build instance may leak allocations

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Description
  ===========
  After control-plane issues during instance creation, instances may stay
  stuck in the BUILD state. Even after they are deleted, their placement
  allocations may remain, and the compute host log complains:

    Instance eba20a0f-5856-4600-bcaa-7b758d04b5c5 has allocations against this compute host but is not found in the database.
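
  The condition in that log message amounts to a set difference: consumers
  that hold placement allocations but have no matching instance record. The
  sketch below illustrates only that idea on in-memory rows. The function
  name and the second UUID are hypothetical, the `consumer_id` key is
  borrowed from the placement.allocations table queried in this report, and
  this is not how nova-compute itself performs the check.

```python
def find_leaked_consumers(allocation_rows, live_instance_uuids):
    """Return consumer UUIDs that hold allocations but have no live instance.

    allocation_rows: iterable of dicts with at least a "consumer_id" key
    live_instance_uuids: iterable of UUIDs of non-deleted instances
    """
    consumers = {row["consumer_id"] for row in allocation_rows}
    return sorted(consumers - set(live_instance_uuids))


# Hypothetical sample data: one orphaned consumer (the UUID from the log
# message above) and one consumer that still has an instance record.
ORPHAN = "eba20a0f-5856-4600-bcaa-7b758d04b5c5"
ALIVE = "00000000-0000-0000-0000-000000000001"  # made-up UUID

rows = [
    {"consumer_id": ORPHAN, "resource_class_id": 0, "used": 1},
    {"consumer_id": ORPHAN, "resource_class_id": 1, "used": 512},
    {"consumer_id": ALIVE, "resource_class_id": 0, "used": 1},
]

leaked = find_leaked_consumers(rows, [ALIVE])
```

  With this sample data, `leaked` contains only the orphaned UUID, because
  the other consumer is matched by a live instance.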

  Steps to reproduce
  ==================
  On a fresh devstack master install:

  1) Open a terminal that displays the entries in placement.allocations
     and nova_cell1.instances every second:

       while true ; do date ; mysql -e "select * from placement.allocations" ; mysql -e "select * from nova_cell1.instances where deleted=0" ; sleep 1 ; done

  2) Trigger a spawn of 50 instances and kill rabbit after 5 seconds to
     simulate an issue on the control plane:

       openstack server create --flavor m1.tiny --image cirros-0.4.0-x86_64-disk --nic net-id=private alex --min 50 --max 50 & sleep 5 ; sudo pkill rabbitmq-server

     Note: to reach the bug, the goal is to get the instances allocated by
     the scheduler without giving the conductor time to create the entries
     in nova_cell1.instances.

     You should see rows appearing in placement.allocations:

  +---------------------+------------+------+----------------------+--------------------------------------+-------------------+------+
  | created_at          | updated_at | id   | resource_provider_id | consumer_id                          | resource_class_id | used |
  +---------------------+------------+------+----------------------+--------------------------------------+-------------------+------+
  | 2020-01-13 11:02:51 | NULL       | 1727 |                    1 | 8d0a42fe-922b-4c08-afe3-65d65893d355 |                 2 |    1 |
  | 2020-01-13 11:02:51 | NULL       | 1728 |                    1 | 8d0a42fe-922b-4c08-afe3-65d65893d355 |                 1 |  512 |
  | 2020-01-13 11:02:51 | NULL       | 1729 |                    1 | 8d0a42fe-922b-4c08-afe3-65d65893d355 |                 0 |    1 |
  | 2020-01-13 11:02:51 | NULL       | 1730 |                    1 | 3cd1b8be-6997-452e-86e0-5013c9ab6bda |                 2 |    1 |
  | 2020-01-13 11:02:51 | NULL       | 1731 |                    1 | 3cd1b8be-6997-452e-86e0-5013c9ab6bda |                 1 |  512 |
  .....

     The instances are all stuck in BUILD at this stage.

  3) Delete the instances:

       openstack server list | awk '/m1.tiny/ {print $2}' | xargs openstack server delete

  4) service rabbitmq-server start

  5) openstack server list
     (displays nothing)

  6) mysql -e "select count(*) from placement.allocations"

       +----------+
       | count(*) |
       +----------+
       |      150 |
       +----------+

     The allocations remain.

  7) The nova-compute logs complain:

       Instance eba20a0f-5856-4600-bcaa-7b758d04b5c5 has allocations against this compute host but is not found in the database.

  Expected result
  ===============
  The placement allocations of the instances should be cleaned up after
  deletion.

  Actual result
  =============
  The placement allocations of the instances are leaked.

  Environment
  ===========
  At least stein through master seem impacted.
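
  The difference between the actual and the expected result can be pictured
  with a small toy model. It is a sketch only: every class and method name
  below is hypothetical and none of it is nova's real code or API. It
  assumes that each scheduled instance holds three allocation rows (resource
  class ids 0, 1 and 2, with the `used` values from the table in step 2),
  which is consistent with 50 instances leaving 150 rows behind in step 6.

```python
# Toy model only: these names are hypothetical and are not nova's real API.
from dataclasses import dataclass, field


@dataclass
class ToyControlPlane:
    build_requests: set = field(default_factory=set)
    # consumer (instance) UUID -> {resource_class_id: used}
    allocations: dict = field(default_factory=dict)
    mappings_queued_for_delete: set = field(default_factory=set)

    def schedule(self, uuid):
        """Scheduler has claimed resources; no cell instance record exists yet."""
        self.build_requests.add(uuid)
        # Three rows per instance, values taken from the table in step 2.
        self.allocations[uuid] = {0: 1, 1: 512, 2: 1}

    def delete_in_build(self, uuid, cleanup=True):
        """Delete an instance that only exists as a build request."""
        self.build_requests.discard(uuid)
        if cleanup:
            # Behaviour described by the commit message: also release the
            # allocation and queue the instance mapping for deletion.
            self.allocations.pop(uuid, None)
            self.mappings_queued_for_delete.add(uuid)

    def allocation_rows(self):
        """Count rows as `select count(*) from placement.allocations` would."""
        return sum(len(res) for res in self.allocations.values())


def run(cleanup):
    """Schedule 50 stuck instances, delete them all, return leftover rows."""
    cp = ToyControlPlane()
    uuids = [f"instance-{i}" for i in range(50)]
    for u in uuids:
        cp.schedule(u)
    for u in uuids:
        cp.delete_in_build(u, cleanup=cleanup)
    return cp.allocation_rows()


rows_without_cleanup = run(cleanup=False)  # 50 instances x 3 rows = 150
rows_with_cleanup = run(cleanup=True)      # 0
```

  Without the cleanup step the model leaves 50 x 3 = 150 rows behind, the
  count reported in step 6. With it, no rows remain, which is the expected
  result.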

