Reviewed:  https://review.openstack.org/636699
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=19cb8280232fd3b0ba0000a475d061ea9fb10e1a
Submitter: Zuul
Branch:    master
commit 19cb8280232fd3b0ba0000a475d061ea9fb10e1a
Author: Jim Rollenhagen <[email protected]>
Date:   Wed Feb 13 12:59:53 2019 -0500

    ironic: check fresh data when sync_power_state doesn't line up

    We return cached data to sync_power_state to avoid pummeling the
    ironic API. However, this can lead to a race condition where an
    instance is powered on, but nova thinks it should be off and calls
    stop(). Check again without the cache when this happens to make
    sure we don't unnecessarily kill an instance.

    Closes-Bug: #1815791
    Change-Id: I907b69eb689cf6c169a4869cfc7889308ca419d5

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1815791

Title:
  Race condition causes Nova to shut off a successfully deployed baremetal server

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  When booting a baremetal server with Nova, we see Ironic report a
  successful power on:

  ironic-conductor.log:2019-02-13 10:52:15.901 7 INFO ironic.conductor.utils [req-774350ce-9392-4096-b66c-20ad3d794e4e 7a9b1ac45e084e7cbeb9cb740ffe8d08 41ea8af8d00e46438c7be3b182bbb53f - default default] Successfully set node a00696d5-32ba-475e-9528-59bf11cffea6 power state to power on by power on.

  But seconds later, Nova (a) triggers a power state sync and then (b)
  decides the node is in state "power off" and shuts it down:

  nova-compute.log:2019-02-13 10:52:17.289 7 DEBUG nova.compute.manager [req-9bcae7d4-4201-40ea-a66c-c5954117f0e4 - - - - -] Triggering sync for uuid dcb4f055-cda4-4d61-ba8f-976645c4e92a _sync_power_states /usr/lib/python2.7/site-packages/nova/compute/manager.py:7516
  nova-compute.log:2019-02-13 10:52:17.295 7 DEBUG oslo_concurrency.lockutils [-] Lock "dcb4f055-cda4-4d61-ba8f-976645c4e92a" acquired by "nova.compute.manager.query_driver_power_state_and_sync" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:327
  nova-compute.log:2019-02-13 10:52:17.344 7 WARNING nova.compute.manager [-] [instance: dcb4f055-cda4-4d61-ba8f-976645c4e92a] Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state: 4, current VM power_state: 4
  nova-compute.log:2019-02-13 10:52:17.345 7 DEBUG nova.compute.api [-] [instance: dcb4f055-cda4-4d61-ba8f-976645c4e92a] Going to try to stop instance force_stop /usr/lib/python2.7/site-packages/nova/compute/api.py:2291

  It looks like Nova is using stale cached data to make this decision.
  jroll on IRC suggests a solution may look like
  https://review.openstack.org/#/c/636699/ (bypass the cached data to
  verify the power state before shutting down the server).

  This is with nova @ ad842aa and ironic @ 4404292.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1815791/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
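The fix described in the commit message ("check again without the cache when this happens") can be illustrated with a small Python sketch. This is a hypothetical model, not nova's actual code: `CachedPowerStateSource`, `sync_power_state`, the `use_cache` flag, and the string power states are assumptions made for illustration. The cached reading is trusted only when it agrees with the expected state. On disagreement, the state is re-read from the backend before anything is stopped.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical power-state labels used only in this sketch.
POWER_ON = "power on"
POWER_OFF = "power off"


class CachedPowerStateSource:
    """Hypothetical wrapper around a backend query (standing in for an ironic API call).

    Results are served from a per-node cache for up to ``ttl`` seconds so that
    periodic tasks do not hit the backend on every run.
    """

    def __init__(self, fetch: Callable[[str], str], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}
        self.backend_calls = 0  # counts real backend queries

    def get(self, node_id: str, use_cache: bool = True) -> str:
        now = self._clock()
        entry = self._cache.get(node_id)
        # Serve the cached value only if allowed and not yet expired.
        if use_cache and entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Otherwise query the backend and refresh the cache.
        self.backend_calls += 1
        state = self._fetch(node_id)
        self._cache[node_id] = (now, state)
        return state


def sync_power_state(node_id: str, expected: str,
                     source: CachedPowerStateSource,
                     stop: Callable[[str], None]) -> str:
    """Stop the node only if a fresh (uncached) read confirms it is off."""
    observed = source.get(node_id)
    if observed == expected:
        return observed  # cache agrees: no extra backend call
    # Cached value disagrees with what we expect: it may be stale, so re-check.
    observed = source.get(node_id, use_cache=False)
    if observed == expected:
        return observed  # the cache was stale; do nothing
    if expected == POWER_ON and observed == POWER_OFF:
        stop(node_id)  # confirmed by fresh data: the node really is off
    return observed
```

In this sketch, the extra uncached read happens only when the cached value disagrees with the expected state. The common case therefore still avoids a backend call, and a genuine mismatch costs one additional request.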

