Public bug reported:

Description
===========
Under some corner conditions, instances might become orphaned: Nova is no longer aware that the instance is running on the host.
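To make the failure mode concrete, the core check is a set difference between the guests the hypervisor reports and the instances the Nova DB knows about for this host. This is a minimal sketch only; `hypervisor_uuids` and `db_uuids` stand in for the real libvirt domain listing and Nova DB query, which are not shown in this report:

```python
# Sketch: an instance is "orphaned" when it is running on the hypervisor
# but has no record in Nova's DB (e.g. deleted via the API while
# nova-compute was down). The two inputs are hypothetical stand-ins for
# the real libvirt and DB queries.

def find_orphan_instances(hypervisor_uuids, db_uuids):
    """Return UUIDs of guests running on the host but unknown to Nova."""
    return set(hypervisor_uuids) - set(db_uuids)

# Example: guest "b" was deleted from the DB during the downtime, but its
# qemu process is still running and consuming resources on the host.
orphans = find_orphan_instances({"a", "b"}, {"a"})
```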
Steps to reproduce
==================
1) Suppose nova-compute goes down for some reason, and during this downtime the user deletes the server via the API, so its records are deleted from the DB. After this, nova-compute comes back up. The guest VM is still running on the compute node and consuming resources, but Nova no longer manages it.
2) Once a live migration begins, it runs to completion under libvirt's control. If something happens to the underlying infrastructure, e.g. RabbitMQ dies or the network is badly congested, Nova may fail to delete the instance on the source compute, or the rollback may fail. The result is two copies of the same instance, one on the source and one on the destination compute node. The copy on the source host is a duplicate: an orphan instance from the source compute node's point of view.

Expected result
===============
There should be no orphan instances.

Actual result
=============
Some instances are outside Nova's management.

Environment
===========
Reproducing this condition is not easy. Refer to the discussion at the Stein meetup: https://etherpad.openstack.org/p/nova-ptg-stein L931

Fix
===
Proposal: add a periodic task that takes a configurable action when it finds an orphan instance. Suggested actions:
* reap the instance.
* stop the instance.
* log a message only. [default]
The interval of the periodic task should also be configurable.

This was previously proposed as a blueprint, but it is better qualified as a bug. Refer to:
https://blueprints.launchpad.net/nova/+spec/periodic-orphan-instances-delete

** Affects: nova
   Importance: Undecided
       Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
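The proposed fix above can be sketched as a small action dispatcher that the periodic task would call per orphan. This is an illustrative sketch, not actual Nova code: the config values, the `driver` object, and its `destroy`/`power_off` methods are all assumed names (in Nova proper these would be oslo.config options and virt-driver calls):

```python
import logging

LOG = logging.getLogger(__name__)

# Illustrative config values; in Nova these would be oslo.config options
# so operators can tune them in nova.conf.
ORPHAN_ACTION = "log"        # one of: "reap", "stop", "log" (default)
ORPHAN_TASK_INTERVAL = 600   # seconds between periodic-task runs


def handle_orphan(uuid, driver, action=ORPHAN_ACTION):
    """Apply the configured action to one orphan instance.

    ``driver`` is a hypothetical virt-driver stand-in exposing
    ``destroy(uuid)`` and ``power_off(uuid)``.
    """
    if action == "reap":
        driver.destroy(uuid)      # delete the stray guest outright
    elif action == "stop":
        driver.power_off(uuid)    # stop it but leave it for inspection
    else:
        # Safest default: only report, let the operator decide.
        LOG.warning("Orphan instance %s found on this host", uuid)
```

Defaulting to logging keeps the task non-destructive unless an operator explicitly opts in to reaping or stopping, which matches the suggestion in the Fix section.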
https://bugs.launchpad.net/bugs/1820802

Title:
  nova orphan instances

Status in OpenStack Compute (nova):
  New

