First I tried to reproduce the issue on master. But it turned out that instance.flavor is not lazy loaded in nova-compute at all any more, since [1] added an instance.flavor access to the compute API live_migrate() call [2] via the block_accelerator decorator [3]. That access runs in the nova-api service, well before the live migration request reaches the compute service.
Right now on master there is no double lazy load on the instance during the live migration, so the problem cannot be reproduced there any more; it has been fixed by accident since Ussuri. The nova branches older than Ussuri are in extended maintenance mode [4], so you can still propose fixes, but there will not be any point release from stable/stein any more. A fix you can try is to simply trigger the instance.flavor lazy load before _live_migration() spawns the new thread (a rough sketch of that idea is at the end of this mail).

[1] https://review.opendev.org/c/openstack/nova/+/674726
[2] https://github.com/openstack/nova/blob/edaaa97d9911b849f3b5731746274b44a08ce14c/nova/compute/api.py#L5241
[3] https://github.com/openstack/nova/blob/edaaa97d9911b849f3b5731746274b44a08ce14c/nova/compute/api.py#L328
[4] https://docs.openstack.org/project-team-guide/stable-branches.html

** Also affects: nova/stein
   Importance: Undecided
       Status: New

** Changed in: nova
       Status: New => Invalid

--
https://bugs.launchpad.net/bugs/1941819

Title:
  A mistake caused by temporary_mutation reentry

Status in OpenStack Compute (nova):
  Invalid
Status in OpenStack Compute (nova) stein series:
  New

Bug description:
  nova/virt/libvirt/driver.py:LibvirtDriver._live_migration() spawns a
  thread to execute _live_migration_operation() (thread A below), while
  the original thread executes _live_migration_monitor() (thread B
  below).

  In _live_migration_operation(), the assignment inst_type =
  instance.flavor calls nova/objects/instance.py:obj_load_attr().
  _live_migration_monitor() calls _live_migration_data_gb(), where the
  assignment ram_gb = instance.flavor.memory_mb * units.Mi / units.Gi
  also calls obj_load_attr(). obj_load_attr() in turn calls
  temporary_mutation(). The mistake happens when temporary_mutation()
  is called by the two threads simultaneously on the same context:

  Time0: context.read_deleted is 'no'.
  Time1: Thread A calls temporary_mutation(); context.read_deleted is
         set to 'yes', the saved old value is 'no'.
  Time2: Thread B calls temporary_mutation(); context.read_deleted is
         set to 'yes', the saved old value is 'yes'.
  Time3: Thread A runs the finally block of temporary_mutation();
         context.read_deleted is restored to 'no'.
  Time4: Thread B runs the finally block of temporary_mutation();
         context.read_deleted is restored to 'yes'.

  Result: the two overlapping calls to temporary_mutation() leave
  context.read_deleted changed from 'no' to 'yes'. When the source host
  then calls update_available_resource(ctxt) in _post_live_migration(),
  grabbing all the instances assigned to the node also reads deleted
  instances, which is time-consuming.
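For illustration only, here is a minimal standalone sketch of the
save/restore race described above. It uses a simplified stand-in for
nova.utils.temporary_mutation() and a bare context class, and the two
"threads" are interleaved by hand instead of via eventlet, so none of
this is actual nova code:

    import contextlib


    @contextlib.contextmanager
    def temporary_mutation(obj, **kwargs):
        # Simplified model of nova.utils.temporary_mutation(): remember
        # the old values, apply the new ones, restore the old ones on exit.
        old = {k: getattr(obj, k) for k in kwargs}
        for k, v in kwargs.items():
            setattr(obj, k, v)
        try:
            yield
        finally:
            for k, v in old.items():
                setattr(obj, k, v)


    class Context(object):
        read_deleted = 'no'


    ctxt = Context()

    # Enter both context managers before either one exits, mirroring the
    # Time0..Time4 sequence above (A = operation thread, B = monitor thread).
    thread_a = temporary_mutation(ctxt, read_deleted='yes')
    thread_b = temporary_mutation(ctxt, read_deleted='yes')

    thread_a.__enter__()                  # Time1: A saves 'no', sets 'yes'
    thread_b.__enter__()                  # Time2: B saves 'yes', sets 'yes'
    thread_a.__exit__(None, None, None)   # Time3: A restores 'no'
    thread_b.__exit__(None, None, None)   # Time4: B restores 'yes'

    print(ctxt.read_deleted)  # prints 'yes' although it started as 'no'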

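And a rough, hypothetical sketch of the fix suggested above for
stable/stein: touch instance.flavor in LibvirtDriver._live_migration()
before the operation thread is spawned, so the lazy load (and the
temporary_mutation it performs on the shared context) happens exactly
once, in a single thread. The signature and surrounding code are
abbreviated from memory and are not an exact copy of the Stein source:

    # nova/virt/libvirt/driver.py (inside class LibvirtDriver)

    def _live_migration(self, context, instance, dest, post_method,
                        recover_method, block_migration, migrate_data):
        # Force the flavor lazy load up front, in this thread, so that
        # neither _live_migration_operation() nor
        # _live_migration_monitor() triggers obj_load_attr() /
        # temporary_mutation() concurrently on the shared context later.
        instance.flavor  # noqa -- deliberate attribute access

        # ... existing setup code unchanged ...

        opthread = utils.spawn(self._live_migration_operation, ...)

        # ... the original thread then continues into
        # self._live_migration_monitor(...) as before ...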
