Re: [openstack-dev] [nova] Networks are not cleaned up in build failure

2015-01-15 Thread Andrew Laski


On 01/15/2015 09:33 AM, Brian Haley wrote:

On 01/14/2015 02:15 PM, Andrew Laski wrote:

On 01/14/2015 12:57 PM, Murray, Paul (HP Cloud) wrote:

Hi All,

I recently experienced failures getting images from Glance while spawning
instances. This step comes after building the networks in the build sequence.
When the Glance failure occurred the instance was cleaned up and rescheduled
as expected, but the networks were not cleaned up. On investigation I found
that the cleanup code for the networks is in the compute manager’s
_do_build_and_run_instance() method as follows:

    # NOTE(comstud): Deallocate networks if the driver wants
    # us to do so.
    if self.driver.deallocate_networks_on_reschedule(instance):
        self._cleanup_allocated_networks(context, instance,
                                         requested_networks)

The default behavior for the deallocate_networks_on_reschedule() method
defined in ComputeDriver is:

    def deallocate_networks_on_reschedule(self, instance):
        """Does the driver want networks deallocated on reschedule?"""
        return False

Only the Ironic driver overrides this method to return True, so I think this
means the networks will not be cleaned up for any other virt driver.


Is this really the desired behavior?


Yes.  Other than when using Ironic there is nothing specific to a particular
host in the networking setup.  This means it is not necessary to deallocate and
reallocate networks when an instance is rescheduled, so we can avoid the
unnecessary work of doing it.

That's either not true any more, or not true when DVR is enabled in Neutron,
since in this case the port['binding:host_id'] value has been initialized to a
compute node, and won't get updated when nova-conductor re-schedules the VM
elsewhere.

This causes the neutron port to stay on the original compute node, and any
neutron operations (like floatingip-associate) happen on the old port, leaving
the VM unreachable.


Gotcha.  Then we should be rebinding that port on a reschedule or go 
back to de/reallocating.  I'm assuming there's some way to handle the 
port being moved or resizes would be broken for the same reason.


If we do need to move back to de/reallocation of networks I think it 
would be better to remove the conditional nature of it and just do it.  
If the deallocate_networks_on_reschedule method defaults to True I don't 
see a case where it would be overridden by a driver given the 
information above.
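
To make that concrete, here is a minimal, self-contained sketch of that idea:
drop the driver hook and always clean up on a failed build. The class and
method names mirror the Nova code under discussion, but the bodies are
simplified stand-ins, not a patch.

    # Sketch: unconditional network cleanup on a failed build/reschedule.
    # Names mirror nova/compute/manager.py; bodies are stand-ins.
    class ComputeManager(object):
        def _build_and_run_instance(self, context, instance,
                                    requested_networks):
            # Stand-in for the real build; simulate the Glance failure
            # Paul hit while fetching the image.
            raise IOError('simulated Glance image fetch failure')

        def _cleanup_allocated_networks(self, context, instance,
                                        requested_networks):
            # In Nova this deallocates the instance's ports/fixed IPs.
            print('deallocating networks for %s' % instance)

        def _do_build_and_run_instance(self, context, instance,
                                       requested_networks):
            try:
                self._build_and_run_instance(context, instance,
                                             requested_networks)
            except Exception:
                # No deallocate_networks_on_reschedule() check: networks
                # are always cleaned up, so a stale DVR port binding
                # cannot survive a reschedule.
                self._cleanup_allocated_networks(context, instance,
                                                 requested_networks)
                raise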



If the instance goes to ERROR then the network will get cleaned up when the
instance is deleted.

I think we need to clean up even in this case now too.

-Brian



Re: [openstack-dev] [nova] Networks are not cleaned up in build failure

2015-01-15 Thread Brian Haley
On 01/15/2015 12:55 PM, Andrew Laski wrote:
 On 01/15/2015 09:33 AM, Brian Haley wrote:
 On 01/14/2015 02:15 PM, Andrew Laski wrote:
 On 01/14/2015 12:57 PM, Murray, Paul (HP Cloud) wrote:
 Hi All,

 I recently experienced failures getting images from Glance while spawning
 instances. This step comes after building the networks in the build sequence.
 When the Glance failure occurred the instance was cleaned up and rescheduled
 as expected, but the networks were not cleaned up. On investigation I found
 that the cleanup code for the networks is in the compute manager’s
 _do_build_and_run_instance() method as follows:

      # NOTE(comstud): Deallocate networks if the driver wants
      # us to do so.
      if self.driver.deallocate_networks_on_reschedule(instance):
          self._cleanup_allocated_networks(context, instance,
                                           requested_networks)

 The default behavior for the deallocate_networks_on_reschedule() method
 defined in ComputeDriver is:

      def deallocate_networks_on_reschedule(self, instance):
          """Does the driver want networks deallocated on reschedule?"""
          return False

 Only the Ironic driver overrides this method to return True, so I think this
 means the networks will not be cleaned up for any other virt driver.

 Is this really the desired behavior?

 Yes.  Other than when using Ironic there is nothing specific to a particular
 host in the networking setup.  This means it is not necessary to deallocate
 and reallocate networks when an instance is rescheduled, so we can avoid the
 unnecessary work of doing it.
 That's either not true any more, or not true when DVR is enabled in Neutron,
 since in this case the port['binding:host_id'] value has been initialized to
 a compute node, and won't get updated when nova-conductor re-schedules the VM
 elsewhere.

 This causes the neutron port to stay on the original compute node, and any
 neutron operations (like floatingip-associate) happen on the old port,
 leaving the VM unreachable.
 
 Gotcha.  Then we should be rebinding that port on a reschedule or go back to
 de/reallocating.  I'm assuming there's some way to handle the port being moved
 or resizes would be broken for the same reason.
 
 If we do need to move back to de/reallocation of networks I think it would be
 better to remove the conditional nature of it and just do it.  If the
 deallocate_networks_on_reschedule method defaults to True I don't see a case
 where it would be overridden by a driver given the information above.

Andrew,

I was able to run a test here on a multi-node setup with DVR enabled:

- Booted VM
- Associated floating IP
- Updated binding:host_id (as admin) using the neutron API:

  $ neutron port-update $port -- --binding:host_id=novacompute5

The port was correctly moved to the other compute node and the floating IP
configured.  So that showed me the agents all did the right thing as far as I
can tell.  I know Paul was looking at the nova code to try to update just this
field; I'll check in with him regarding that so we can get a patch up soon.
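
For reference, here is the same rebind done through python-neutronclient,
roughly what a Nova-side patch could call on reschedule. This is a sketch,
not the eventual patch; the credentials, port ID, and host name are
placeholders.

    # Sketch: rebind a neutron port to the host the VM was rescheduled to.
    # All literal values below are illustrative placeholders.
    from neutronclient.v2_0 import client

    neutron = client.Client(username='admin',
                            password='secret',
                            tenant_name='admin',
                            auth_url='http://controller:5000/v2.0')

    port_id = 'PORT-UUID'        # the instance's neutron port
    new_host = 'novacompute5'    # destination compute node

    # Equivalent of: neutron port-update $port -- --binding:host_id=novacompute5
    neutron.update_port(port_id, {'port': {'binding:host_id': new_host}})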

-Brian



Re: [openstack-dev] [nova] Networks are not cleaned up in build failure

2015-01-15 Thread Brian Haley
On 01/14/2015 02:15 PM, Andrew Laski wrote:
 
 On 01/14/2015 12:57 PM, Murray, Paul (HP Cloud) wrote:

 Hi All, 

 I recently experienced failures getting images from Glance while spawning
 instances. This step comes after building the networks in the build sequence.
 When the Glance failure occurred the instance was cleaned up and rescheduled
 as expected, but the networks were not cleaned up. On investigation I found
 that the cleanup code for the networks is in the compute manager’s
 _do_build_and_run_instance() method as follows:

      # NOTE(comstud): Deallocate networks if the driver wants
      # us to do so.
      if self.driver.deallocate_networks_on_reschedule(instance):
          self._cleanup_allocated_networks(context, instance,
                                           requested_networks)

 The default behavior for the deallocate_networks_on_reschedule() method
 defined in ComputeDriver is:

      def deallocate_networks_on_reschedule(self, instance):
          """Does the driver want networks deallocated on reschedule?"""
          return False

 Only the Ironic driver overrides this method to return True, so I think this
 means the networks will not be cleaned up for any other virt driver.

 Is this really the desired behavior?

 
 Yes.  Other than when using Ironic there is nothing specific to a particular
 host in the networking setup.  This means it is not necessary to deallocate
 and reallocate networks when an instance is rescheduled, so we can avoid the
 unnecessary work of doing it.

That's either not true any more, or not true when DVR is enabled in Neutron,
since in this case the port['binding:host_id'] value has been initialized to a
compute node, and won't get updated when nova-conductor re-schedules the VM
elsewhere.

This causes the neutron port to stay on the original compute node, and any
neutron operations (like floatingip-associate) happen on the old port, leaving
the VM unreachable.
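
To see the stale binding directly, something like the following sketch (the
credentials and port ID are placeholders) shows the port still reporting the
original host after a reschedule:

    # Sketch: inspect a port's binding after a reschedule. With DVR the
    # value can still name the original compute node. Placeholder values.
    from neutronclient.v2_0 import client

    neutron = client.Client(username='admin', password='secret',
                            tenant_name='admin',
                            auth_url='http://controller:5000/v2.0')

    port = neutron.show_port('PORT-UUID')['port']
    # Still the first compute node, not the host the VM landed on:
    print(port['binding:host_id'])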

 If the instance goes to ERROR then the network will get cleaned up when the
 instance is deleted.

I think we need to clean up even in this case now too.

-Brian



Re: [openstack-dev] [nova] Networks are not cleaned up in build failure

2015-01-14 Thread Andrew Laski


On 01/14/2015 12:57 PM, Murray, Paul (HP Cloud) wrote:


Hi All,

I recently experienced failures getting images from Glance while
spawning instances. This step comes after building the networks in the
build sequence. When the Glance failure occurred the instance was
cleaned up and rescheduled as expected, but the networks were not
cleaned up. On investigation I found that the cleanup code for the
networks is in the compute manager’s _do_build_and_run_instance()
method as follows:


    # NOTE(comstud): Deallocate networks if the driver wants
    # us to do so.
    if self.driver.deallocate_networks_on_reschedule(instance):
        self._cleanup_allocated_networks(context, instance,
                                         requested_networks)

The default behavior for the deallocate_networks_on_reschedule()
method defined in ComputeDriver is:


    def deallocate_networks_on_reschedule(self, instance):
        """Does the driver want networks deallocated on reschedule?"""
        return False

Only the Ironic driver overrides this method to return True, so I
think this means the networks will not be cleaned up for any other
virt driver.


Is this really the desired behavior?



Yes.  Other than when using Ironic there is nothing specific to a 
particular host in the networking setup.  This means it is not necessary 
to deallocate and reallocate networks when an instance is rescheduled, 
so we can avoid the unnecessary work of doing it.


If the instance goes to ERROR then the network will get cleaned up when 
the instance is deleted.


I have filed a bug for this and plan to fix it: 
https://bugs.launchpad.net/nova/+bug/1410739


My initial thought is to fix this either by making the method in the
base class return True or by overriding the method in each virt driver
to return True (I would expect the former). But I wanted to check if
there is a reason for the base class behavior (and so the default
behavior) to be **NOT** to clean up the networks?
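
As a rough illustration of the first option (not the actual patch),
flipping the default in the ComputeDriver base class would look
something like:

    # Sketch of the proposed default change in nova/virt/driver.py.
    # Simplified illustration, not the actual patch.
    class ComputeDriver(object):
        def deallocate_networks_on_reschedule(self, instance):
            """Does the driver want networks deallocated on reschedule?"""
            return True  # was False; previously only Ironic returned True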


Paul

Paul Murray

Nova Technical Lead, HP Cloud

+44 117 316 2527




