Public bug reported:

The compute manager's late anti-affinity policy check rejects both parallel VM boot requests even though one of them could be accepted on the host.

To reproduce:
* create a server group with the anti-affinity policy
* select a single compute and disable the rest of your computes
* boot two VMs in parallel

Expected:
One of the two VMs boots successfully; the other VM fails with NoValidHost.

Actual:
If you are (un)lucky then both VMs will fail with nova.exception.GroupAffinityViolation

```
❯ journalctl -D sosreport-compute-1-2024-09-17-tzgxrpu/var/log/journal/730eba01f47f493698df59515d1c213a -u edpm_nova_compute | grep 9d115f6b-bb02-4390-a161-15fb8f83c0cc | grep nova.exception.GroupAffinityViolation:
Sep 17 02:05:36 compute-1 nova_compute[84038]: 2024-09-17 00:05:36.406 2 ERROR nova.compute.manager [None req-a5316266-aca0-4d11-90f9-631e26d058ab 188fff18565b4e46b0c04391ec532b3e b698d1d3bfeb4a75bf32b7a80d19dd46 - - default default] [instance: 9d115f6b-bb02-4390-a161-15fb8f83c0cc] Failed to build and run instance: nova.exception.GroupAffinityViolation: Anti-affinity instance group policy was violated
Sep 17 02:05:36 compute-1 nova_compute[84038]: 2024-09-17 00:05:36.406 2 ERROR nova.compute.manager [instance: 9d115f6b-bb02-4390-a161-15fb8f83c0cc] nova.exception.GroupAffinityViolation: Anti-affinity instance group policy was violated

❯ journalctl -D sosreport-compute-1-2024-09-17-tzgxrpu/var/log/journal/730eba01f47f493698df59515d1c213a -u edpm_nova_compute | grep ea192e6a-4685-45ae-839b-315dfd36697d | grep nova.exception.GroupAffinityViolation
Sep 17 02:05:36 compute-1 nova_compute[84038]: 2024-09-17 00:05:36.132 2 ERROR nova.compute.manager [None req-b37d5098-75bf-4a3c-a85d-6f2ccdf0104f 188fff18565b4e46b0c04391ec532b3e b698d1d3bfeb4a75bf32b7a80d19dd46 - - default default] [instance: ea192e6a-4685-45ae-839b-315dfd36697d] Failed to build and run instance: nova.exception.GroupAffinityViolation: Anti-affinity instance group policy was violated
Sep 17 02:05:36 compute-1 nova_compute[84038]: 2024-09-17 00:05:36.132 2 ERROR nova.compute.manager [instance: ea192e6a-4685-45ae-839b-315dfd36697d] nova.exception.GroupAffinityViolation: Anti-affinity instance group policy was violated
```

There is a functional reproducer pushed in https://review.opendev.org/c/openstack/nova/+/930326

** Affects: nova
     Importance: Undecided
         Status: New

** Tags: compute scheduler

** Tags added: compute scheduler

** Description changed:

- If you are (un)lucky then both VM will fail with nova.exception.GroupAffinityViolation
+ If you are (un)lucky then both VMs will fail with nova.exception.GroupAffinityViolation

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2081853

Title:
  Booting two VMs with anti-affinity in parallel to the same host results in both failing

Status in OpenStack Compute (nova):
  New
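
The difference between the expected and the actual outcome in this report can be illustrated with a small toy model. This is only a sketch and not nova code: ToyHost, boot_unlucky and boot_serialized are hypothetical names, and the model assumes that both requests are recorded against the host before either of them runs the late check. That ordering is an assumption made for illustration; it is not taken from the nova source.

```python
from dataclasses import dataclass, field


class GroupAffinityViolation(Exception):
    """Toy stand-in for nova.exception.GroupAffinityViolation."""


@dataclass
class ToyHost:
    # Hypothetical model: UUIDs of server group members assigned to this host.
    members: set = field(default_factory=set)

    def assign(self, uuid):
        self.members.add(uuid)

    def late_check(self, uuid):
        # Reject the instance if any *other* group member is on this host.
        if self.members - {uuid}:
            raise GroupAffinityViolation(uuid)


def boot_unlucky(host, uuids):
    """Assumed bad interleaving: every request is assigned, then every check runs."""
    for uuid in uuids:
        host.assign(uuid)
    results = {}
    for uuid in uuids:
        try:
            host.late_check(uuid)
            results[uuid] = "ACTIVE"
        except GroupAffinityViolation:
            results[uuid] = "ERROR"
    return results


def boot_serialized(host, uuids):
    """Each request is assigned and checked before the next one starts."""
    results = {}
    for uuid in uuids:
        host.assign(uuid)
        try:
            host.late_check(uuid)
            results[uuid] = "ACTIVE"
        except GroupAffinityViolation:
            host.members.discard(uuid)  # the rejected instance leaves the host
            results[uuid] = "ERROR"
    return results


if __name__ == "__main__":
    print(boot_unlucky(ToyHost(), ["vm-a", "vm-b"]))
    print(boot_serialized(ToyHost(), ["vm-a", "vm-b"]))
```

In this model the unlucky interleaving makes each instance see the other as already present, so both are rejected, while the serialized variant accepts the first instance and rejects only the second, which matches the expected behaviour described in the report.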

