First I tried to reproduce the issue on master. But it turned out that
instance.flavor is no longer lazy loaded in nova-compute at all, since
[1] added an instance.flavor access to the compute API live_migrate[2]
call via the block_accelerator decorator[3]. This runs in the nova-api
service well before the live migration reaches the compute service.

Right now on master there is no double lazy load on the instance during
the live migration, so the problem cannot be reproduced there any more.
It has been fixed by accident since Ussuri. The nova branches older than
Ussuri are in extended maintenance mode [4]. You can still propose
fixes there, but there won't be any point release from Stein any more.

I think a fix you can try is to simply trigger the instance.flavor
lazy load before _live_migration() spawns the new thread.
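
The idea behind that fix can be sketched with a minimal, self-contained
stand-in for nova's lazy-load machinery (Context, Instance, the cached
flavor dict and this temporary_mutation are simplified illustrations,
not the real nova code): once the attribute is loaded in the parent
thread, neither migration thread re-enters temporary_mutation.

```python
import threading
from contextlib import contextmanager


class Context:
    """Simplified stand-in for the request context shared by both threads."""
    def __init__(self):
        self.read_deleted = "no"


@contextmanager
def temporary_mutation(ctx, **kwargs):
    # Save the old values, apply the mutation, restore on exit
    # (same save/restore idea as nova.utils.temporary_mutation).
    old = {k: getattr(ctx, k) for k in kwargs}
    for k, v in kwargs.items():
        setattr(ctx, k, v)
    try:
        yield
    finally:
        for k, v in old.items():
            setattr(ctx, k, v)


class Instance:
    """Toy instance: the first flavor access mutates the shared context."""
    def __init__(self, context):
        self._context = context
        self._cache = {}

    @property
    def flavor(self):
        if "flavor" not in self._cache:
            with temporary_mutation(self._context, read_deleted="yes"):
                self._cache["flavor"] = {"memory_mb": 2048}  # fake DB load
        return self._cache["flavor"]


ctx = Context()
inst = Instance(ctx)

inst.flavor  # trigger the lazy load once, before any thread is spawned

# The two "live migration" threads now only read the cached value, so
# temporary_mutation is never entered concurrently.
threads = [threading.Thread(target=lambda: inst.flavor["memory_mb"])
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(ctx.read_deleted)  # the context is left untouched: "no"
```

Note the real code runs under eventlet greenthreads rather than OS
threads, but the ordering argument is the same: a single-threaded
pre-load leaves nothing for the two threads to race on.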


[1] https://review.opendev.org/c/openstack/nova/+/674726
[2] https://github.com/openstack/nova/blob/edaaa97d9911b849f3b5731746274b44a08ce14c/nova/compute/api.py#L5241
[3] https://github.com/openstack/nova/blob/edaaa97d9911b849f3b5731746274b44a08ce14c/nova/compute/api.py#L328
[4] https://docs.openstack.org/project-team-guide/stable-branches.html

** Also affects: nova/stein
   Importance: Undecided
       Status: New

** Changed in: nova
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1941819

Title:
  A mistake caused by temporary_mutation reentry

Status in OpenStack Compute (nova):
  Invalid
Status in OpenStack Compute (nova) stein series:
  New

Bug description:
  nova/virt/libvirt/driver.py:LibvirtDriver._live_migration() spawns a
thread to execute _live_migration_operation (hereafter called thread A),
while the original thread executes _live_migration_monitor (hereafter
called thread B).
  In _live_migration_operation, the assignment inst_type =
instance.flavor calls nova/objects/instance.py:obj_load_attr.
  _live_migration_monitor calls _live_migration_data_gb, where the
assignment ram_gb = instance.flavor.memory_mb * units.Mi / units.Gi
also calls nova/objects/instance.py:obj_load_attr.
  obj_load_attr in turn calls temporary_mutation. The mistake occurs
when temporary_mutation is entered by the two threads simultaneously.
  Time0: self._context['read_deleted'] is 'no'.
  Time1: Thread A calls temporary_mutation; self._context['read_deleted']
is assigned the value 'yes'. The saved old value is 'no'.
  Time2: Thread B calls temporary_mutation; self._context['read_deleted']
is assigned the value 'yes'. The saved old value is 'yes'.
  Time3: Thread A executes the finally block of temporary_mutation; the
value of self._context['read_deleted'] is restored to 'no'.
  Time4: Thread B executes the finally block of temporary_mutation; the
value of self._context['read_deleted'] is "restored" to 'yes'.
  Result: the two overlapping calls to temporary_mutation change the
value of self._context['read_deleted'] from 'no' to 'yes'. When the
source host later calls update_available_resource(ctxt) in
_post_live_migration, grabbing all instances assigned to the node also
reads deleted instances, which is time-consuming.
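
  The Time0..Time4 interleaving above can be reproduced deterministically
with a minimal stand-in for temporary_mutation and two threads forced
into that ordering by events (Context and all other names here are
simplified illustrations, not the real nova code):

```python
import threading
from contextlib import contextmanager


class Context:
    """Simplified stand-in for the request context shared by both threads."""
    def __init__(self):
        self.read_deleted = "no"


@contextmanager
def temporary_mutation(ctx, **kwargs):
    # Save the old values, apply the mutation, restore on exit
    # (same save/restore idea as nova.utils.temporary_mutation).
    old = {k: getattr(ctx, k) for k in kwargs}
    for k, v in kwargs.items():
        setattr(ctx, k, v)
    try:
        yield
    finally:
        for k, v in old.items():
            setattr(ctx, k, v)


ctx = Context()
a_entered = threading.Event()
b_entered = threading.Event()
a_exited = threading.Event()


def thread_a():
    # Time1: A saves the old value 'no' and sets 'yes'.
    with temporary_mutation(ctx, read_deleted="yes"):
        a_entered.set()
        b_entered.wait()   # keep the mutation open until B has entered
    # Time3: A's finally block has restored 'no' at this point.
    a_exited.set()


def thread_b():
    a_entered.wait()
    # Time2: B saves the old value 'yes' (A's mutation!) and sets 'yes'.
    with temporary_mutation(ctx, read_deleted="yes"):
        b_entered.set()
        a_exited.wait()    # stay inside until A has exited
    # Time4: B's finally block "restores" the saved 'yes'.


ta = threading.Thread(target=thread_a)
tb = threading.Thread(target=thread_b)
ta.start()
tb.start()
ta.join()
tb.join()

print(ctx.read_deleted)  # 'yes' -- the temporary mutation has leaked
```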

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1941819/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
