Re: [openstack-dev] [nova] Networks are not cleaned up in build failure
On 01/15/2015 12:55 PM, Andrew Laski wrote:
> On 01/15/2015 09:33 AM, Brian Haley wrote:
>> On 01/14/2015 02:15 PM, Andrew Laski wrote:
>>> On 01/14/2015 12:57 PM, Murray, Paul (HP Cloud) wrote:
>>>> Hi All,
>>>>
>>>> I recently experienced failures getting images from Glance while
>>>> spawning instances. This step comes after building the networks in
>>>> the build sequence. When the Glance failure occurred the instance was
>>>> cleaned up and rescheduled as expected, but the networks were not
>>>> cleaned up. On investigation I found that the cleanup code for the
>>>> networks is in the compute manager's _do_build_run_instance() method
>>>> as follows:
>>>>
>>>>     # NOTE(comstud): Deallocate networks if the driver wants
>>>>     # us to do so.
>>>>     if self.driver.deallocate_networks_on_reschedule(instance):
>>>>         self._cleanup_allocated_networks(context, instance,
>>>>                                          requested_networks)
>>>>
>>>> The default behavior for the deallocate_networks_on_reschedule()
>>>> method defined in ComputeDriver is:
>>>>
>>>>     def deallocate_networks_on_reschedule(self, instance):
>>>>         """Does the driver want networks deallocated on reschedule?"""
>>>>         return False
>>>>
>>>> Only the Ironic driver overrides this method to return True, so I
>>>> think this means the networks will not be cleaned up for any other
>>>> virt driver.
>>>>
>>>> Is this really the desired behavior?
>>>
>>> Yes. Other than when using Ironic there is nothing specific to a
>>> particular host in the networking setup. This means it is not
>>> necessary to deallocate and reallocate networks when an instance is
>>> rescheduled, so we can avoid the unnecessary work of doing it.
>>
>> That's either not true any more, or not true when DVR is enabled in
>> Neutron, since in this case the port['binding:host_id'] value has been
>> initialized to a compute node, and won't get updated when
>> nova-conductor re-schedules the VM elsewhere.
>>
>> This causes the neutron port to stay on the original compute node, and
>> any neutron operations (like floatingip-associate) happen on the "old"
>> port, leaving the VM unreachable.
>
> Gotcha.
> Then we should be rebinding that port on a reschedule or go back to
> de/reallocating. I'm assuming there's some way to handle the port being
> moved, or resizes would be broken for the same reason.
>
> If we do need to move back to de/reallocation of networks I think it
> would be better to remove the conditional nature of it and just do it.
> If the deallocate_networks_on_reschedule method defaults to True I
> don't see a case where it would be overridden by a driver given the
> information above.

Andrew, I was able to run a test here on a multi-node setup with DVR enabled:

- Booted a VM
- Associated a floating IP
- Updated binding:host_id (as admin) using the neutron API:

    $ neutron port-update $port -- --binding:host_id=novacompute5

The port was correctly moved to the other compute node and the floating IP configured. So that showed me the agents all did the right thing as far as I can tell. I know Paul was looking at the nova code to try and update just this field; I'll check in with him regarding that so we can get a patch up soon.

-Brian

__ OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
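[For readers following along: the rebind Brian tested from the CLI could be done from code roughly as below. This is a minimal sketch, not the Nova patch being discussed; `rebind_port_on_reschedule()` is a hypothetical helper, and the client is a stand-in stub that mimics the small subset of the Neutron client API used here (`show_port`/`update_port`), so the flow is self-contained.]

```python
# Sketch: repoint a Neutron port's binding:host_id at the host an
# instance was rescheduled to, so the DVR agents wire it up there.
# StubNeutronClient is a stand-in for the real Neutron API client;
# rebind_port_on_reschedule() is a hypothetical helper, not Nova code.

class StubNeutronClient(object):
    """Mimics the subset of the Neutron client API used below."""

    def __init__(self, ports):
        self._ports = ports  # port_id -> port dict

    def show_port(self, port_id):
        return {'port': dict(self._ports[port_id])}

    def update_port(self, port_id, body):
        self._ports[port_id].update(body['port'])
        return {'port': dict(self._ports[port_id])}


def rebind_port_on_reschedule(client, port_id, new_host):
    """Update the port's binding:host_id to the new host, if needed."""
    current = client.show_port(port_id)['port']
    if current.get('binding:host_id') == new_host:
        return current  # already bound where we need it
    body = {'port': {'binding:host_id': new_host}}
    return client.update_port(port_id, body)['port']


if __name__ == '__main__':
    client = StubNeutronClient(
        {'port-1': {'id': 'port-1', 'binding:host_id': 'novacompute1'}})
    port = rebind_port_on_reschedule(client, 'port-1', 'novacompute5')
    print(port['binding:host_id'])  # novacompute5
```

This mirrors what the `neutron port-update` command above does over the REST API; the real fix would of course go through Nova's network API layer rather than a raw client call.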
Re: [openstack-dev] [nova] Networks are not cleaned up in build failure
On 01/15/2015 09:33 AM, Brian Haley wrote:
> On 01/14/2015 02:15 PM, Andrew Laski wrote:
>> On 01/14/2015 12:57 PM, Murray, Paul (HP Cloud) wrote:
>>> Hi All,
>>>
>>> I recently experienced failures getting images from Glance while
>>> spawning instances. This step comes after building the networks in
>>> the build sequence. When the Glance failure occurred the instance was
>>> cleaned up and rescheduled as expected, but the networks were not
>>> cleaned up. On investigation I found that the cleanup code for the
>>> networks is in the compute manager's _do_build_run_instance() method
>>> as follows:
>>>
>>>     # NOTE(comstud): Deallocate networks if the driver wants
>>>     # us to do so.
>>>     if self.driver.deallocate_networks_on_reschedule(instance):
>>>         self._cleanup_allocated_networks(context, instance,
>>>                                          requested_networks)
>>>
>>> The default behavior for the deallocate_networks_on_reschedule()
>>> method defined in ComputeDriver is:
>>>
>>>     def deallocate_networks_on_reschedule(self, instance):
>>>         """Does the driver want networks deallocated on reschedule?"""
>>>         return False
>>>
>>> Only the Ironic driver overrides this method to return True, so I
>>> think this means the networks will not be cleaned up for any other
>>> virt driver.
>>>
>>> Is this really the desired behavior?
>>
>> Yes. Other than when using Ironic there is nothing specific to a
>> particular host in the networking setup. This means it is not
>> necessary to deallocate and reallocate networks when an instance is
>> rescheduled, so we can avoid the unnecessary work of doing it.
>
> That's either not true any more, or not true when DVR is enabled in
> Neutron, since in this case the port['binding:host_id'] value has been
> initialized to a compute node, and won't get updated when
> nova-conductor re-schedules the VM elsewhere.
>
> This causes the neutron port to stay on the original compute node, and
> any neutron operations (like floatingip-associate) happen on the "old"
> port, leaving the VM unreachable.

Gotcha. Then we should be rebinding that port on a reschedule or go back to de/reallocating. I'm assuming there's some way to handle the port being moved, or resizes would be broken for the same reason.

If we do need to move back to de/reallocation of networks I think it would be better to remove the conditional nature of it and just do it. If the deallocate_networks_on_reschedule method defaults to True I don't see a case where it would be overridden by a driver given the information above.

>> If the instance goes to ERROR then the network will get cleaned up
>> when the instance is deleted.
>
> I think we need to clean-up even in this case now too.
>
> -Brian
Re: [openstack-dev] [nova] Networks are not cleaned up in build failure
On 01/14/2015 02:15 PM, Andrew Laski wrote:
> On 01/14/2015 12:57 PM, Murray, Paul (HP Cloud) wrote:
>> Hi All,
>>
>> I recently experienced failures getting images from Glance while
>> spawning instances. This step comes after building the networks in the
>> build sequence. When the Glance failure occurred the instance was
>> cleaned up and rescheduled as expected, but the networks were not
>> cleaned up. On investigation I found that the cleanup code for the
>> networks is in the compute manager's _do_build_run_instance() method
>> as follows:
>>
>>     # NOTE(comstud): Deallocate networks if the driver wants
>>     # us to do so.
>>     if self.driver.deallocate_networks_on_reschedule(instance):
>>         self._cleanup_allocated_networks(context, instance,
>>                                          requested_networks)
>>
>> The default behavior for the deallocate_networks_on_reschedule()
>> method defined in ComputeDriver is:
>>
>>     def deallocate_networks_on_reschedule(self, instance):
>>         """Does the driver want networks deallocated on reschedule?"""
>>         return False
>>
>> Only the Ironic driver overrides this method to return True, so I
>> think this means the networks will not be cleaned up for any other
>> virt driver.
>>
>> Is this really the desired behavior?
>
> Yes. Other than when using Ironic there is nothing specific to a
> particular host in the networking setup. This means it is not necessary
> to deallocate and reallocate networks when an instance is rescheduled,
> so we can avoid the unnecessary work of doing it.

That's either not true any more, or not true when DVR is enabled in Neutron, since in this case the port['binding:host_id'] value has been initialized to a compute node, and won't get updated when nova-conductor re-schedules the VM elsewhere.

This causes the neutron port to stay on the original compute node, and any neutron operations (like floatingip-associate) happen on the "old" port, leaving the VM unreachable.

> If the instance goes to ERROR then the network will get cleaned up when
> the instance is deleted.

I think we need to clean-up even in this case now too.

-Brian
Re: [openstack-dev] [nova] Networks are not cleaned up in build failure
On 01/14/2015 12:57 PM, Murray, Paul (HP Cloud) wrote:
> Hi All,
>
> I recently experienced failures getting images from Glance while
> spawning instances. This step comes after building the networks in the
> build sequence. When the Glance failure occurred the instance was
> cleaned up and rescheduled as expected, but the networks were not
> cleaned up. On investigation I found that the cleanup code for the
> networks is in the compute manager's _do_build_run_instance() method as
> follows:
>
>     # NOTE(comstud): Deallocate networks if the driver wants
>     # us to do so.
>     if self.driver.deallocate_networks_on_reschedule(instance):
>         self._cleanup_allocated_networks(context, instance,
>                                          requested_networks)
>
> The default behavior for the deallocate_networks_on_reschedule() method
> defined in ComputeDriver is:
>
>     def deallocate_networks_on_reschedule(self, instance):
>         """Does the driver want networks deallocated on reschedule?"""
>         return False
>
> Only the Ironic driver overrides this method to return True, so I think
> this means the networks will not be cleaned up for any other virt
> driver.
>
> Is this really the desired behavior?

Yes. Other than when using Ironic there is nothing specific to a particular host in the networking setup. This means it is not necessary to deallocate and reallocate networks when an instance is rescheduled, so we can avoid the unnecessary work of doing it.

If the instance goes to ERROR then the network will get cleaned up when the instance is deleted.

> I have filed a bug for this and plan to fix it:
> https://bugs.launchpad.net/nova/+bug/1410739
>
> My initial thought is to fix this either by making the method in the
> base class return True or by adding the method to virt drivers
> returning True (I would expect the former). But I wanted to check if
> there is a reason for the base class behavior (and so the default
> behavior) to be *NOT* to clean up the networks?
>
> Paul
[openstack-dev] [nova] Networks are not cleaned up in build failure
Hi All,

I recently experienced failures getting images from Glance while spawning instances. This step comes after building the networks in the build sequence. When the Glance failure occurred the instance was cleaned up and rescheduled as expected, but the networks were not cleaned up. On investigation I found that the cleanup code for the networks is in the compute manager's _do_build_run_instance() method as follows:

    # NOTE(comstud): Deallocate networks if the driver wants
    # us to do so.
    if self.driver.deallocate_networks_on_reschedule(instance):
        self._cleanup_allocated_networks(context, instance,
                                         requested_networks)

The default behavior for the deallocate_networks_on_reschedule() method defined in ComputeDriver is:

    def deallocate_networks_on_reschedule(self, instance):
        """Does the driver want networks deallocated on reschedule?"""
        return False

Only the Ironic driver overrides this method to return True, so I think this means the networks will not be cleaned up for any other virt driver.

Is this really the desired behavior?

I have filed a bug for this and plan to fix it:
https://bugs.launchpad.net/nova/+bug/1410739

My initial thought is to fix this either by making the method in the base class return True or by adding the method to virt drivers returning True (I would expect the former). But I wanted to check if there is a reason for the base class behavior (and so the default behavior) to be *NOT* to clean up the networks?

Paul

Paul Murray
Nova Technical Lead, HP Cloud
+44 117 316 2527

Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks RG12 1HN. Registered No: 690597 England. The contents of this message and any attachments to it are confidential and may be legally privileged. If you have received this message in error, you should delete it from your system immediately and advise the sender. To any recipient of this message within HP, unless otherwise stated you should consider this message and attachments as "HP CONFIDENTIAL".
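[For readers following along: the "make the base class return True" option Paul describes can be sketched as below. Class names mirror Nova's but this is a simplified, self-contained stand-in, not the actual nova code; FakeComputeManager is a toy reduction of the reschedule path that calls the driver hook.]

```python
# Sketch of the proposed fix: flip the ComputeDriver default so every
# driver deallocates networks on reschedule.  Simplified stand-ins for
# the Nova classes being discussed, not the real implementation.

class ComputeDriver(object):
    def deallocate_networks_on_reschedule(self, instance):
        """Does the driver want networks deallocated on reschedule?"""
        # Proposed change: default True, so the networks (and hence the
        # port's binding:host_id) get rebuilt on the new host.
        return True


class IronicDriver(ComputeDriver):
    # With the base class returning True, the Ironic override that
    # previously forced True becomes redundant and could be dropped.
    pass


class FakeComputeManager(object):
    """Toy reduction of the reschedule path in the compute manager."""

    def __init__(self, driver):
        self.driver = driver
        self.cleaned_up = []

    def _cleanup_allocated_networks(self, context, instance,
                                    requested_networks):
        self.cleaned_up.append(instance)

    def reschedule(self, context, instance, requested_networks):
        # NOTE(comstud): Deallocate networks if the driver wants
        # us to do so.
        if self.driver.deallocate_networks_on_reschedule(instance):
            self._cleanup_allocated_networks(context, instance,
                                             requested_networks)


if __name__ == '__main__':
    manager = FakeComputeManager(ComputeDriver())
    manager.reschedule(None, 'instance-1', None)
    print(manager.cleaned_up)  # ['instance-1']
```

With the default flipped, the conditional in the manager becomes a no-op for every in-tree driver, which is why Andrew suggests later in the thread that the conditional could simply be removed.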