Public bug reported:

This is very similar to https://bugs.launchpad.net/nova/+bug/1944759
(which should already be fixed) but still happens when resizing to the
same host.

Reproduction:

Fresh single-node devstack/master (Nova commit
b5029890c1c5b1b5153c9ca2fc9a8ea2437f635d).

In nova-cpu.conf I set the following (my devstack VM has 4 vCPUs):

[DEFAULT]
allow_resize_to_same_host = True # already True by default on a single-node devstack
update_resources_interval = 20 # to increase chances of a race

[compute]
cpu_shared_set = 0
cpu_dedicated_set = 1-3

Create two flavors, with 1 and 2 pinned CPUs respectively, and start
resizing (and confirming) a cirros-based instance back and forth between
them.

Sometimes the resize confirm fails with:

Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lock "a3b3ecbe-2039-42fb-8365-da12e3c93bae" acquired by "nova.compute.manager.ComputeManager.confirm_resize.<locals>.do_confirm_resize" :: waited 0.000s {{(pid=136855) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:386}}
Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG nova.compute.manager [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Going to confirm migration 33 {{(pid=136855) do_confirm_resize /opt/stack/nova/nova/compute/manager.py:4287}}
Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Acquired lock "refresh_cache-a3b3ecbe-2039-42fb-8365-da12e3c93bae" {{(pid=136855) lock /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:294}}
Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG nova.network.neutron [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Building network info cache for instance {{(pid=136855) _get_instance_nw_info /opt/stack/nova/nova/network/neutron.py:1997}}
Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG nova.objects.instance [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lazy-loading 'info_cache' on Instance uuid a3b3ecbe-2039-42fb-8365-da12e3c93bae {{(pid=136855) obj_load_attr /opt/stack/nova/nova/objects/instance.py:1099}}
Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG nova.network.neutron [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Instance cache missing network info. {{(pid=136855) _get_preexisting_port_ids /opt/stack/nova/nova/network/neutron.py:3300}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG nova.network.neutron [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Updating instance_info_cache with network_info: [] {{(pid=136855) update_instance_cache_with_nw_info /opt/stack/nova/nova/network/neutron.py:117}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Releasing lock "refresh_cache-a3b3ecbe-2039-42fb-8365-da12e3c93bae" {{(pid=136855) lock /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:312}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG nova.objects.instance [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lazy-loading 'migration_context' on Instance uuid a3b3ecbe-2039-42fb-8365-da12e3c93bae {{(pid=136855) obj_load_attr /opt/stack/nova/nova/objects/instance.py:1099}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lock "compute_resources" acquired by "nova.compute.resource_tracker.ResourceTracker.drop_move_claim_at_source" :: waited 0.000s {{(pid=136855) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:386}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lock "compute_resources" "released" by "nova.compute.resource_tracker.ResourceTracker.drop_move_claim_at_source" :: held 0.037s {{(pid=136855) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:400}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Confirm resize failed on source host master-dsvm. Resource allocations in the placement service will be removed regardless because the instance is now on the destination host master-dsvm. You can try hard rebooting the instance to correct its state.: nova.exception.CPUUnpinningInvalid: CPU set to unpin [1] must be a subset of pinned CPU set [2, 3]
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Traceback (most recent call last):
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]   File "/opt/stack/nova/nova/compute/manager.py", line 4316, in do_confirm_resize
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]     self._confirm_resize(
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]   File "/opt/stack/nova/nova/compute/manager.py", line 4401, in _confirm_resize
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]     self.rt.drop_move_claim_at_source(context, instance, migration)
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]   File "/usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py", line 391, in inner
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]     return f(*args, **kwargs)
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 563, in drop_move_claim_at_source
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]     self._drop_move_claim(
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 638, in _drop_move_claim
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]     self._update_usage(usage, nodename, sign=-1)
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1321, in _update_usage
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]     cn.numa_topology = hardware.numa_usage_from_instance_numa(
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]   File "/opt/stack/nova/nova/virt/hardware.py", line 2476, in numa_usage_from_instance_numa
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]     new_cell.unpin_cpus(pinned_cpus)
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]   File "/opt/stack/nova/nova/objects/numa.py", line 106, in unpin_cpus
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]     raise exception.CPUUnpinningInvalid(requested=list(cpus),
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] nova.exception.CPUUnpinningInvalid: CPU set to unpin [1] must be a subset of pinned CPU set [2, 3]
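To illustrate the failure mode, here is a minimal standalone sketch (my own toy model, NOT Nova's actual code; the class names are invented) of the subset check that unpin_cpus() performs. It reproduces the exact exception text above for the state the race leaves behind: host usage already only tracks the new flavor's pins [2, 3], so dropping the old move claim for CPU [1] has nothing left to unpin.

```python
class CPUUnpinningInvalid(Exception):
    """Toy version of nova.exception.CPUUnpinningInvalid."""

    def __init__(self, requested, available):
        super().__init__(
            "CPU set to unpin %s must be a subset of pinned CPU set %s"
            % (sorted(requested), sorted(available)))


class HostNUMACellUsage:
    """Toy stand-in for the per-cell pinned-CPU bookkeeping."""

    def __init__(self, pinned_cpus):
        self.pinned_cpus = set(pinned_cpus)

    def unpin_cpus(self, cpus):
        # A CPU may only be unpinned if it is currently tracked as pinned.
        cpus = set(cpus)
        if not cpus.issubset(self.pinned_cpus):
            raise CPUUnpinningInvalid(cpus, self.pinned_cpus)
        self.pinned_cpus -= cpus


# State after the periodic resource update rebuilt usage from the new
# 2-CPU flavor; the old 1-CPU claim (CPU 1) is no longer recorded.
cell = HostNUMACellUsage(pinned_cpus=[2, 3])
try:
    cell.unpin_cpus([1])
except CPUUnpinningInvalid as exc:
    print(exc)  # CPU set to unpin [1] must be a subset of pinned CPU set [2, 3]
```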

The full log snippet is at
https://paste.opendev.org/show/biKlHnGI4PPt451riHXn/

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1961188

Title:
  confirm resize fails with CPUUnpinningInvalid when resizing to the
  same host

Status in OpenStack Compute (nova):
  New

