Public bug reported:

When the ironic service is briefly down (e.g. the ironic conductor is down), removing an instance immediately puts the instance into error state, without any retry.

Investigation points to this code segment:
https://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L977-L984

When the conductor is down, the exception we receive is not InstanceDeployFailure, so it is re-raised right away and the retry settings CONF.ironic.api_max_retries and CONF.ironic.api_retry_interval are never applied.

Reproduce:
1. nova boot a baremetal instance.
2. Reboot the ironic conductor node (or stop the conductor service).
3. Remove the instance while it is spawning.
4. The instance goes into error state immediately, not after 2 minutes (default value).

For comparison, comment out L983-984 and repeat the steps.

Proposed fix:
Improve the exception handling to be more robust.

** Affects: nova
     Importance: Undecided
         Status: New

** Tags: ironic

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1685590

Title:
  No retry for removing instance in case of ironic service down

Status in OpenStack Compute (nova):
  New
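To illustrate the kind of handling the proposed fix is asking for, here is a minimal sketch of a retry loop around the unprovision call. It is not the nova code: the function name unprovision_with_retry, the ServiceUnavailable class, and the local InstanceDeployFailure class are hypothetical stand-ins, and the default values are only assumed so that they add up to the 2 minutes mentioned in the report.

```python
import time


class InstanceDeployFailure(Exception):
    """Local stand-in for the exception named in the report (hypothetical)."""


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for the error seen while the conductor is down."""


def unprovision_with_retry(set_deleted, max_retries=60, retry_interval=2,
                           sleep=time.sleep):
    """Call set_deleted() until it succeeds or the retries run out.

    max_retries and retry_interval play the role of
    CONF.ironic.api_max_retries and CONF.ironic.api_retry_interval
    (the defaults here are assumptions: 60 * 2 s = 2 minutes).
    Returns the number of attempts that were made.
    """
    for attempt in range(1, max_retries + 1):
        try:
            set_deleted()
            return attempt
        except InstanceDeployFailure:
            # Assumption: this is the one failure the caller tolerates,
            # so it is treated as "nothing left to do".
            return attempt
        except ServiceUnavailable:
            # Transient outage: give up only after the last attempt,
            # otherwise wait and try again instead of failing at once.
            if attempt == max_retries:
                raise
            sleep(retry_interval)
```

With a callable that fails twice with ServiceUnavailable and then succeeds, the function returns 3 after sleeping twice, so the instance would not go into error state on the first failure. Passing sleep as a parameter keeps the sketch testable without real delays.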

