Public bug reported:

Description
===========
When the Nova scheduler runs numa_fit_instance_to_host() for an instance with 8 NUMA nodes against a host whose NUMA topology has 16 NUMA nodes (3 cores × 2 threads each), the call takes ~5 minutes if the first half of the host's NUMA nodes is already occupied.

This makes scheduling a 48-core flavor extremely slow.

Output of reproducer:

```
InstanceNUMATopology(cells=[InstanceNUMACell(8),InstanceNUMACell(9),InstanceNUMACell(10),InstanceNUMACell(11),InstanceNUMACell(12),InstanceNUMACell(13),InstanceNUMACell(14)],emulator_threads_policy=None,id=<?>,instance_uuid=<?>)

________________________________________________________
Executed in  269.13 secs    fish           external
   usr time  268.60 secs    0.00 micros  268.60 secs
   sys time    0.07 secs  595.00 micros    0.07 secs
```

Steps to reproduce
==================
1. Add a host with 16 NUMA nodes (3 cores × 2 threads each) to OpenStack.
2. Create a flavor for 48 CPUs that takes exactly half of the host:

   openstack flavor create sh4a-c48r488e20 \
     --ram $((488*1024)) \
     --vcpus 48 \
     --ephemeral 20 \
     --disk 20 \
     --swap 0 \
     --property 'hw:mem_page_size=1GB' \
     --property 'hw:cpu_policy=dedicated' \
     --property 'hw:cpu_thread_policy=prefer' \
     --property 'hw:cpu_max_sockets=8' \
     --property 'hw:cpu_sockets=8' \
     --property 'hw:numa_mempolicy=strict' \
     --property 'hw:numa_nodes=8' \
     --property 'hw:numa_cpus.0=0,1,2,3,4,5' \
     --property 'hw:numa_cpus.1=6,7,8,9,10,11' \
     --property 'hw:numa_cpus.2=12,13,14,15,16,17' \
     --property 'hw:numa_cpus.3=18,19,20,21,22,23' \
     --property 'hw:numa_cpus.4=24,25,26,27,28,29' \
     --property 'hw:numa_cpus.5=30,31,32,33,34,35' \
     --property 'hw:numa_cpus.6=36,37,38,39,40,41' \
     --property 'hw:numa_cpus.7=42,43,44,45,46,47' \
     --property 'hw:numa_mem.0=62464' \
     --property 'hw:numa_mem.1=62464' \
     --property 'hw:numa_mem.2=62464' \
     --property 'hw:numa_mem.3=62464' \
     --property 'hw:numa_mem.4=62464' \
     --property 'hw:numa_mem.5=62464' \
     --property 'hw:numa_mem.6=62464' \
     --property 'hw:numa_mem.7=62464' \
     --property 'hw:cpu_threads=2' \
     --property 'hw:cpu_max_threads=2'

3. Create an instance with this flavor (so that it would normally land on that host). The command is omitted because it differs between installations.
4. Wait for the first instance to spawn (this part is fast, as it takes the first 8 NUMA nodes).
5. Create a second instance with the same flavor, then wait 5+ minutes until nova-scheduler is done with its work.

Expected result
===============
NUMA nodes are selected within 10-15 seconds.

Actual result
=============
The algorithm is slow enough that it takes 5 minutes to get the instance scheduled.

Environment
===========
1. OpenStack Nova 23.2.0-1.el8.
   NOTE: I am able to reproduce this on the master branch with a 20-line reproducer.

   commit 4939318649650b60dd07d161b80909e70d0e093e (HEAD -> master, upstream/master)
   Merge: c6e0f4f551 4c339c10e3
   Author: Zuul <[email protected]>
   Date:   Tue May 17 00:01:41 2022 +0000

       Merge "Drop lower-constraints.txt and its testing"

2. Libvirt + KVM (although it is not relevant here)
   libvirt-8.0.0-6.module_el8.7.0+1140+ff0772f9.x86_64
   qemu-kvm-6.2.0-12.module_el8.7.0+1140+ff0772f9.x86_64

3. LVM storage (not relevant either)
   lvm2-2.03.14-3.el8.x86_64

4. Neutron with L2 (not relevant)

Logs & Configs
==============
Check the reproducer and try it with the DEBUG lines uncommented (will attach it here too).

** Affects: nova
     Importance: Undecided
         Status: New

** Attachment added: "reproducer-simplified.py"
   https://bugs.launchpad.net/bugs/1978372/+attachment/5596697/+files/t.py

https://bugs.launchpad.net/bugs/1978372

Title:
  numa_fit_instance_to_host() algorithm is highly ineffective on higher number of NUMA nodes

Status in OpenStack Compute (nova):
  New
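
For a rough sense of scale, here is a minimal sketch of why an occupied first half could be the slow case. It assumes, and this report does not confirm it, that the fitting routine tries ordered selections of host cells one by one in the order itertools.permutations yields them. The helpers attempts_until_fit() and rejected_before_fit() are hypothetical names used only for illustration and are not part of Nova.

```python
import itertools
import math


def attempts_until_fit(host_cells, free_cells, wanted):
    """Brute force: count ordered selections tried until one uses only free cells."""
    candidates = itertools.permutations(host_cells, wanted)
    for tried, combo in enumerate(candidates, start=1):
        if all(cell in free_cells for cell in combo):
            return tried
    return None  # no selection fits


def rejected_before_fit(total, occupied, wanted):
    """Closed form for the same count minus one.

    Assumes the occupied cells are the lowest-numbered ones and that at
    least `wanted` cells are free. At each position i, every occupied cell
    placed there heads a block of perm(total-1-i, wanted-1-i) selections
    that are enumerated before the first all-free selection.
    """
    return sum(
        occupied * math.perm(total - 1 - i, wanted - 1 - i)
        for i in range(wanted)
    )


# Scaled-down case: 8 host cells, first 4 occupied, instance wants 4 cells.
small = attempts_until_fit(list(range(8)), set(range(4, 8)), 4)
print(small)                          # 985
print(rejected_before_fit(8, 4, 4))   # 984

# Case from this report: 16 host cells, first 8 occupied, 8 wanted.
print(math.perm(16, 8))               # 518918400 ordered selections in total
print(rejected_before_fit(16, 8, 8))  # 278095760 rejected before the first fit
```

Under that assumption, more than half of all ordered selections are enumerated and rejected before the first one that uses only free cells, which would explain why the first instance schedules quickly and the second does not.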

