Public bug reported:

Version: OpenStack Liberty

Boot from volumes that fail in volume initialize_connection are not rescheduled. Initialize connection failures can be very host-specific, and in many cases the boot would succeed if the instance build was rescheduled to another host.

The instance is not rescheduled because initialize_connection is being called down this stack:

  nova.compute.manager _build_resources
  nova.compute.manager _prep_block_device
  nova.virt.block_device attach_block_devices
  nova.virt.block_device.DriverVolumeBlockDevice.attach

When this fails, an exception is thrown which lands in this block:
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1740
and throws an InvalidBDM exception, which is caught by this block:
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2110
This in turn throws a BuildAbortException, which causes the instance to not be rescheduled by landing the flow in this block:
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2004

To fix this we likely need a different exception thrown from nova.virt.block_device.DriverVolumeBlockDevice.attach when the failure is in initialize_connection, and then work back up the stack to ensure that when this different exception is thrown a BuildAbortException is not thrown, so the reschedule can happen.

** Affects: nova
     Importance: Undecided
     Assignee: Samuel Matzek (smatzek)
         Status: New

** Changed in: nova
     Assignee: (unassigned) => Samuel Matzek (smatzek)

https://bugs.launchpad.net/bugs/1488111

Title:
  Boot from volumes that fail in initialize_connection are not rescheduled

Status in OpenStack Compute (nova):
  New
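
The exception flow described in this report, and the shape of the proposed fix, can be modelled with a small standalone sketch. This is not nova's code: the classes below are local stand-ins that only borrow the exception names from the report, VolumeConnectionFailed is a hypothetical name for the "different exception" the report asks for, and the function signatures (including the lookup_volume step) are simplified assumptions made for illustration.

```python
# Toy model of the flow in this report. Nothing here imports or calls nova;
# every class and function is a simplified, hypothetical stand-in.


class InvalidBDM(Exception):
    """Stand-in for the InvalidBDM exception named in the report."""


class BuildAbortException(Exception):
    """Stand-in: the build is aborted and is not rescheduled."""


class RescheduledException(Exception):
    """Stand-in: the build is handed back to be tried on another host."""


class VolumeConnectionFailed(Exception):
    """Hypothetical new exception for initialize_connection failures."""


def attach(lookup_volume, initialize_connection):
    """Models DriverVolumeBlockDevice.attach with two failure points."""
    lookup_volume()  # failures here are not host-specific
    try:
        return initialize_connection()
    except Exception as exc:
        # Proposed change: tag host-specific failures with their own type.
        raise VolumeConnectionFailed(str(exc)) from exc


def prep_block_device(lookup_volume, initialize_connection):
    """Models _prep_block_device: other failures still become InvalidBDM."""
    try:
        return attach(lookup_volume, initialize_connection)
    except VolumeConnectionFailed:
        raise  # let the host-specific failure propagate unchanged
    except Exception as exc:
        raise InvalidBDM(str(exc)) from exc


def build_resources(lookup_volume, initialize_connection):
    """Models _build_resources: InvalidBDM aborts, the new type reschedules."""
    try:
        return prep_block_device(lookup_volume, initialize_connection)
    except VolumeConnectionFailed as exc:
        raise RescheduledException(str(exc)) from exc
    except InvalidBDM as exc:
        raise BuildAbortException(str(exc)) from exc


def build_outcome(lookup_volume, initialize_connection):
    """Reports which branch a build attempt ends up in."""
    try:
        build_resources(lookup_volume, initialize_connection)
        return "active"
    except RescheduledException:
        return "rescheduled"
    except BuildAbortException:
        return "aborted"


def ok():
    return {"driver_volume_type": "iscsi"}


def boom():
    raise RuntimeError("no path to storage from this host")
```

In this model, build_outcome(ok, boom) ends in the reschedule branch because the initialize_connection failure keeps its own type all the way up, while build_outcome(boom, ok) is still wrapped in InvalidBDM and aborted as before. Where exactly the real code should translate the new exception is left open by the report; the sketch only shows that the distinction has to survive every intermediate handler.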

