Public bug reported:

Hi,

When I rebuild an ironic instance via nova, the instance's node field is overwritten with a wrong value after the first rebuild, so the next rebuild is not possible.

Steps to reproduce
==================
1. Spawn a new ironic instance.
2. Rebuild the instance. After this step, hypervisor_hostname for the instance is completely different from before (I use the "nova show uuid" command to display it). When you display the instance in ironic (ironic node-show --instance uuid), the UUID of the node differs from the node recorded in nova.
3. Rebuild a second time; it fails with the error below.
   http://paste.openstack.org/show/irCzuu5qucX6kF44X6oe/

Environment
===========
Mitaka release and Ubuntu 16

My workaround
=============
After debugging, I found where the bug(?) is.

https://github.com/openstack/nova/blob/stable/mitaka/nova/compute/manager.py#L2795

2795:    compute_node = self._get_compute_info(context, self.host)
2796:    scheduled_node = compute_node.hypervisor_hostname
[...]
5118:    def _get_compute_info(self, context, host):
5119:        return objects.ComputeNode.get_first_node_by_host_for_old_compat(
5120:            context, host)

OK, let's dive deeper:

https://github.com/openstack/nova/blob/stable/mitaka/nova/objects/compute_node.py#L274

274:    def get_first_node_by_host_for_old_compat(cls, context, host,
275:                                              use_slave=False):
276:        computes = ComputeNodeList.get_all_by_host(context, host, use_slave)
277:        # FIXME(sbauza): Some hypervisors (VMware, Ironic) can return multiple
278:        # nodes per host, we should return all the nodes and modify the callers
279:        # instead.
280:        # Arbitrarily returning the first node.
281:        return computes[0]

It looks like the method returns the first node for the given host. With the ironic hypervisor there are multiple nodes per host, and the first node returned is effectively arbitrary, so it need not be the node the instance is deployed on.
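The mismatch described above can be illustrated in isolation. The following is only a minimal sketch, not nova code: the ComputeNode dataclass, the get_first_node_by_host helper, and the host and node names are hypothetical stand-ins chosen for illustration. It assumes a single compute host that manages several ironic nodes, and a lookup that returns the first node for that host.

```python
from dataclasses import dataclass


@dataclass
class ComputeNode:
    """Hypothetical stand-in for nova's ComputeNode object."""
    host: str
    hypervisor_hostname: str
    hypervisor_type: str


def get_first_node_by_host(nodes, host):
    """Mimic the 'return computes[0]' behaviour quoted in the report."""
    computes = [n for n in nodes if n.host == host]
    return computes[0]


# One compute host managing three ironic nodes (names are made up).
NODES = [
    ComputeNode("compute-1", "node-aaa", "ironic"),
    ComputeNode("compute-1", "node-bbb", "ironic"),
    ComputeNode("compute-1", "node-ccc", "ironic"),
]

# Node the instance was originally deployed on (assumed for the example).
instance_node = "node-bbb"

# The rebuild path derives the node from the host alone.
scheduled_node = get_first_node_by_host(NODES, "compute-1").hypervisor_hostname

# True whenever the instance is not on the first node listed for the host.
mismatch = scheduled_node != instance_node
print(scheduled_node, mismatch)
```

In this sketch the lookup yields the first node listed for the host rather than the node the instance lives on, which matches the symptom of hypervisor_hostname changing after the first rebuild.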
My workaround, nothing sophisticated but it works for me:

--- manager.py_org      2016-09-14 13:50:37.807379651 +0200
+++ manager.py  2016-09-14 13:51:40.275126034 +0200
@@ -2793,7 +2793,11 @@
         if not scheduled_node:
             try:
                 compute_node = self._get_compute_info(context, self.host)
-                scheduled_node = compute_node.hypervisor_hostname
+                #workaround for ironic
+                if compute_node.hypervisor_type == 'ironic':
+                    scheduled_node = instance.node
+                else:
+                    scheduled_node = compute_node.hypervisor_hostname
             except exception.ComputeHostNotFound:
                 LOG.exception(_LE('Failed to get compute_info for %s'),
                               self.host)

I've tested this issue on the Mitaka release, but the code seems to be the same in the master branch.

That's all.
Regards

** Affects: nova
     Importance: Undecided
         Status: New

** Tags: ironic rebuild

-- 
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1623473

Title:
  Overwrite node field by wrong value after ironic instance rebuild

Status in OpenStack Compute (nova):
  New

