Public bug reported:

Description
===========
The current evacuation flow allows an admin to force a host down, evacuate its instances and unset forced down *without* ever restarting the compute service. This is arguably operator error (PEBKAC): the API reference clearly documents that operators need to fence the source compute service ahead of evacuation (see below), which should result in a service restart, but nothing in the current flow enforces it:

https://docs.openstack.org/api-ref/compute/?expanded=evacuate-server-evacuate-action-detail#evacuate-server-evacuate-action

This leaves evacuation migration records marked as done instead of completed, as the source host is never given a chance to clean up. The request to unset forced down should be rejected until this cleanup happens and the evacuation migration records are marked as completed.

This could ultimately lead to data loss if the instance is migrated back to the host ahead of the next service restart. That restart would cause the evacuation cleanup logic to fire, potentially removing storage from under the running instance.

Steps to reproduce
==================
- Mark a given host as forced down
- Evacuate instances from this host
- Unset forced down on the host
- Check that the migration records associated with the evacuations are still marked as done

Expected result
===============
The request to unset forced down is rejected until the service is restarted and the evacuation migration records are moved to completed.

Actual result
=============
The request to unset forced down is allowed and the evacuation migration records remain marked as done. This could eventually lead to data loss if the instance is migrated back to the host prior to the next service restart.

Environment
===========
1. Exact version of OpenStack you are running. See the following list for all releases: http://docs.openstack.org/releases/

   Master

2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that?

   N/A

3. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that?

   N/A

4. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...)

   N/A

Logs & Configs
==============

** Affects: nova
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1922053

Title:
  Operators can unset forced-down with `done` evacuation migration records against the host

Status in OpenStack Compute (nova):
  New
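The check requested under "Expected result" can be illustrated with a minimal, self-contained sketch. It does not use nova's code: the `Migration` dataclass, its field names, `ForcedDownConflict` and `unset_forced_down` are hypothetical stand-ins chosen for illustration, and only the `done` and `completed` statuses come from the report itself.

```python
from dataclasses import dataclass


@dataclass
class Migration:
    """Hypothetical stand-in for a migration record (not nova's object)."""
    source_compute: str
    migration_type: str
    status: str


class ForcedDownConflict(Exception):
    """Hypothetical error raised when forced down cannot be unset yet."""


def pending_evacuations(migrations, host):
    """Return evacuation records from ``host`` that are still ``done``."""
    return [
        m for m in migrations
        if m.source_compute == host
        and m.migration_type == "evacuation"
        and m.status == "done"
    ]


def unset_forced_down(migrations, host):
    """Reject the request while ``done`` evacuation records remain.

    The records only move to ``completed`` once the source compute
    service restarts and cleans up, so a remaining ``done`` record
    means that restart has not happened yet.
    """
    pending = pending_evacuations(migrations, host)
    if pending:
        raise ForcedDownConflict(
            f"{len(pending)} evacuation migration(s) from {host} are still "
            "'done'; restart the compute service before unsetting forced down"
        )
    return {"host": host, "forced_down": False}
```

In this sketch the request fails for a host whose evacuation records are still `done` and succeeds once they are `completed`; records of other migration types or from other hosts are ignored. Where such a check would live in nova, and which error the API should return, is left open by the report.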

