Public bug reported:

Description
===========
Server cold migration fails after a re-schedule.
Steps to reproduce
==================
* create a devstack with two compute hosts using the libvirt driver
* set allow_resize_to_same_host=True on both computes
* set up cells v2 without cell conductor and rabbit separation so that
  the re-schedule logic can call back to the super conductor / scheduler
* enable the NUMATopologyFilter and make sure both computes have NUMA
  resources
* create a flavor with the hw:cpu_policy='dedicated' extra spec
* boot a server with the flavor. Check which compute the server is
  placed on (let's call it host1)
* boot enough servers on host2 so that the next scheduling request can
  still be fulfilled by both computes, but host1 will be preferred by
  the weighers
* cold migrate the pinned server

Expected result
===============
* the scheduler selects host1 first, but that host fails with an
  UnableToMigrateToSelf exception as libvirt does not have the
  capability
* a re-schedule happens
* the scheduler selects host2, where the server spawns successfully

Actual result
=============
* during the re-schedule, when the conductor sends the prep_resize RPC
  to host2, the json serialization of the request spec fails with a
  Circular reference error

Environment
===========
* two node devstack with the libvirt driver
* stable/pike nova. Expected to be reproducible in newer branches, but
  not since Stein. See the triage part.

Triage
======
The json serialization blows up in the migrate conductor task. [1]
After debugging I see that the infinite loop happens when
jsonutils.to_primitive tries to serialize a VirtCPUTopology instance.
The problematic piece of code has been removed by
I4244f7dd8fe74565180f73684678027067b4506e in Stein.
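The class of failure described in the triage can be illustrated with a
minimal stdlib sketch (this uses the plain json module rather than nova's
jsonutils.to_primitive, and a toy dict rather than a real request spec,
but the failure mode is the same):

```python
import json

# Minimal sketch (not nova code): a structure that ends up referencing
# itself, directly or through a nested object, cannot be serialized to
# JSON. The "flavor"/"parent" names are purely illustrative.
spec = {"flavor": {"vcpus": 2}}
spec["flavor"]["parent"] = spec  # a cycle, analogous to the VirtCPUTopology case

try:
    json.dumps(spec)
except ValueError as exc:
    print("serialization failed:", exc)
```

CPython's json encoder detects the cycle and raises ValueError, which is
where the "Circular reference" error in the conductor traceback comes from.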
[1] https://github.com/openstack/nova/blob/4224a61b4f3a8b910dcaa498f9663479d61a6060/nova/conductor/tasks/migrate.py#L87

** Affects: nova
   Importance: Medium
   Assignee: Balazs Gibizer (balazs-gibizer)
   Status: Invalid

** Affects: nova/ocata
   Importance: Undecided
   Status: New

** Affects: nova/pike
   Importance: Medium
   Assignee: Balazs Gibizer (balazs-gibizer)
   Status: Triaged

** Affects: nova/queens
   Importance: Undecided
   Status: New

** Affects: nova/rocky
   Importance: Undecided
   Status: New

** Tags: stable-only

** Tags added: stable-only

** Changed in: nova
   Assignee: (unassigned) => Balazs Gibizer (balazs-gibizer)

** Changed in: nova
   Status: New => Triaged

** Changed in: nova
   Importance: Undecided => Medium

** Also affects: nova/pike
   Importance: Undecided
   Status: New

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/ocata
   Importance: Undecided
   Status: New

** Changed in: nova
   Status: Triaged => Invalid

** Changed in: nova/pike
   Status: New => Triaged

** Changed in: nova/pike
   Importance: Undecided => Medium

** Changed in: nova/pike
   Assignee: (unassigned) => Balazs Gibizer (balazs-gibizer)
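For reference, the reproduction steps from the description can be
approximated with the following commands. This is only a sketch: the
flavor name "pinned", the server name "vm1", and the image/network names
are illustrative assumptions, and the nova.conf fragment must be applied
on both computes before running the commands.

```shell
# On both computes, in nova.conf (then restart nova-compute):
#   [DEFAULT]
#   allow_resize_to_same_host = True
#   [filter_scheduler]
#   enabled_filters = <existing filters>,NUMATopologyFilter

# Flavor with the dedicated CPU policy extra spec
openstack flavor create pinned --vcpus 2 --ram 1024 --disk 1 \
    --property hw:cpu_policy=dedicated

# Boot the pinned server and note which compute it landed on (host1)
openstack server create --flavor pinned --image cirros \
    --network private --wait vm1
openstack server show vm1 -c 'OS-EXT-SRV-ATTR:host'

# ...boot enough servers on host2 so that host1 is preferred by the
# weighers, then trigger the cold migration and the re-schedule:
openstack server migrate vm1
```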
https://bugs.launchpad.net/bugs/1864665

Title:
  Circular reference error during re-schedule