Reviewed:  https://review.opendev.org/744020
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=737e0c0111acd364d1481bdabd9d23bc8d5d6a2e
Submitter: Zuul
Branch:    master
commit 737e0c0111acd364d1481bdabd9d23bc8d5d6a2e
Author: Stephen Finucane <[email protected]>
Date:   Thu Jul 30 17:37:38 2020 +0100

    tests: Add reproducer for bug #1889633

    With the introduction of the cpu-resources work [1], (libvirt)
    hosts can now report 'PCPU' inventory separate from 'VCPU'
    inventory, which is consumed by instances with pinned CPUs
    ('hw:cpu_policy=dedicated'). As part of that effort, we had to
    drop support for the ability to boot instances with
    'hw:cpu_thread_policy=isolate' (i.e. "I don't want hyperthreads")
    on hosts with hyperthreading. This had previously been implemented
    by marking the thread siblings of the host cores used by such an
    instance as reserved and unusable by other instances, but such a
    design wasn't possible in a world where we had to track resource
    consumption in placement before landing on the host. Instead, the
    'isolate' policy now simply means "give me a host without
    hyperthreads". This is enforced by hosts with hyperthreads
    reporting the 'HW_CPU_HYPERTHREADING' trait, and instances with
    the 'isolate' policy requesting 'HW_CPU_HYPERTHREADING=forbidden'.

    Or at least, that's how it should work. We also have a fallback
    placement query to find hosts with 'VCPU' inventory, and that
    query doesn't care about the 'HW_CPU_HYPERTHREADING' trait. This
    was envisioned to ensure hosts with old-style configuration
    ('[DEFAULT] vcpu_pin_set') could continue to be scheduled to. We
    figured that this second fallback query could accidentally pick up
    hosts with new-style configuration, but we are also tracking the
    available and used cores from those listed in the
    '[compute] cpu_dedicated_set' as part of the host 'NUMATopology'
    objects (specifically, via the 'pcpuset' and 'cpu_pinning' fields
    of the 'NUMACell' child objects). These are validated by both the
    'NUMATopologyFilter' and the virt driver itself, which means hosts
    with new-style configuration that got caught up in this second
    query would be rejected by this filter or by a late failure on the
    host. (Hint: there's much more detail on this in the spec.)

    Unfortunately, we didn't think about hyperthreading. If a host
    gets picked up in the second request, it might well have enough
    'PCPU' inventory but simply have been rejected in the first query
    because it had hyperthreads. In this case, because it has enough
    free cores available for pinning, neither the filter nor the virt
    driver will reject the request, resulting in a situation whereby
    the instance ends up falling back to the old code paths and
    consuming $flavor.vcpu host cores, plus the thread siblings of
    each of these cores. Despite this, it will be marked as consuming
    $flavor.vcpu 'VCPU' (not 'PCPU') inventory in placement.

    This patch proves this to be the case, allowing us to resolve the
    issue later.

    [1] https://specs.openstack.org/openstack/nova-specs/specs/train/approved/cpu-resources.html

    Change-Id: I87cd4d14192b1a40cbdca6e3af0f818f2cab613e
    Signed-off-by: Stephen Finucane <[email protected]>
    Related-Bug: #1889633
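As an illustrative aside (not part of the original message), here is a
minimal Python sketch of the mapping the commit message describes; the
function and its signature are hypothetical, not nova's actual code:

    def build_placement_request(flavor_vcpus, cpu_policy,
                                cpu_thread_policy):
        """Sketch: map flavor CPU settings to a placement request of
        resources plus forbidden traits."""
        resources = {}
        forbidden_traits = set()
        if cpu_policy == 'dedicated':
            # Pinned instances should consume PCPU inventory.
            resources['PCPU'] = flavor_vcpus
            if cpu_thread_policy == 'isolate':
                # 'isolate' now means "give me a host without
                # hyperthreads": hosts with SMT enabled report the
                # HW_CPU_HYPERTHREADING trait, so forbid it.
                forbidden_traits.add('HW_CPU_HYPERTHREADING')
        else:
            resources['VCPU'] = flavor_vcpus
        return resources, forbidden_traits

The fallback query described above effectively repeats this request
with 'VCPU' in place of 'PCPU' and without the trait constraint, which
is how a hyperthreaded host with new-style configuration can be picked
up despite the 'isolate' policy.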
** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1889633

Title:
  Pinned instance with thread policy can consume VCPU

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) train series:
  Triaged
Status in OpenStack Compute (nova) ussuri series:
  Triaged

Bug description:
  In Train, we introduced the concept of the 'PCPU' resource type to
  track pinned instance CPU usage. The '[compute] cpu_dedicated_set'
  option is used to indicate which host cores should be used by pinned
  instances and, once this config option was set, nova would start
  reporting 'PCPU' resource types in addition to (or entirely instead
  of, if '[compute] cpu_shared_set' was unset) 'VCPU'. Requests for
  pinned instances (via the 'hw:cpu_policy=dedicated' flavor extra
  spec or equivalent image metadata property) would result in a query
  for 'PCPU' inventory rather than 'VCPU', as previously done.

  We anticipated some upgrade issues with this change, whereby there
  could be a period during an upgrade in which some hosts would have
  the new configuration, meaning they'd be reporting 'PCPU', but the
  remainder would still be on legacy config and would therefore
  continue reporting just 'VCPU'. An instance could reasonably be
  expected to land on any host, but since only the hosts with the new
  configuration were reporting 'PCPU' inventory and the
  'hw:cpu_policy=dedicated' extra spec was resulting in a request for
  'PCPU', the hosts with legacy configuration would never be chosen.
  We worked around this issue by adding support for a fallback
  placement query, enabled by default, which would make a second
  request using 'VCPU' inventory instead of 'PCPU'. The idea behind
  this was that the hosts with 'PCPU' inventory would be preferred,
  meaning we'd only try the 'VCPU' allocation if the preferred path
  failed.

  Crucially, we anticipated that if a host with new-style
  configuration was picked up by this second 'VCPU' query, an instance
  would never actually be able to build there. This is because the
  new-style configuration would be reflected in the 'numa_topology'
  blob of the 'ComputeNode' object, specifically via the 'cpuset' (for
  cores allocated to 'VCPU') and 'pcpuset' (for cores allocated to
  'PCPU') fields. With new-style configuration, both of these are set
  to unique values. If the scheduler had determined that there wasn't
  enough 'PCPU' inventory available for the instance, that would
  implicitly mean there weren't enough of the cores listed in the
  'pcpuset' field still available.

  Turns out there's a gap in this thinking: thread policies. The
  'isolate' CPU thread policy previously meant "give me a host with no
  hyperthreads, else a host with hyperthreads but mark the thread
  siblings of the cores used by the instance as reserved". This didn't
  translate to a new 'PCPU' world where we needed to know how many
  cores we were consuming up front, before landing on the host. To
  work around this, we removed support for the latter case and instead
  relied on a trait, 'HW_CPU_HYPERTHREADING', to indicate whether a
  host had hyperthread support or not. Using the 'isolate' policy
  meant the request marked this trait as "forbidden", so hosts
  reporting it could not be selected.

  The gap comes via a combination of this trait request and the
  fallback query. If we request the 'isolate' thread policy, hosts
  with new-style configuration and sufficient 'PCPU' inventory will
  nonetheless be rejected if they report the 'HW_CPU_HYPERTHREADING'
  trait. However, these hosts can get picked up in the fallback query,
  and the instance will not fail to build on them due to lack of
  'PCPU' inventory. This means we end up with a pinned instance on a
  host using new-style configuration that is consuming 'VCPU'
  inventory. Boo.

  # Steps to reproduce

  1. Using a host with hyperthreading support enabled, configure both
     '[compute] cpu_dedicated_set' and '[compute] cpu_shared_set' (a
     sketch of such a configuration follows below).
  2. Boot an instance with the 'hw:cpu_thread_policy=isolate' extra
     spec.
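For illustration, the setup in step 1 could look something like the
following nova.conf fragment; the core numbers are hypothetical and
assume an 8-core hyperthreaded host:

    [compute]
    # Cores reported as dedicated 'PCPU' inventory (hypothetical)
    cpu_dedicated_set = 2-7
    # Cores reported as shared 'VCPU' inventory (hypothetical)
    cpu_shared_set = 0-1

combined, for step 2, with a flavor carrying the extra specs:

    hw:cpu_policy=dedicated
    hw:cpu_thread_policy=isolate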
  # Expected result

  Instance should not boot, since the host has hyperthreads.

  # Actual result

  Instance boots.
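As an illustrative check (not part of the original report), the
incorrect accounting can be observed by listing the instance's
placement allocations, e.g. with the osc-placement plugin:

    openstack resource provider allocation show <instance-uuid>

On an affected host, the allocation for the pinned instance shows
'VCPU' rather than 'PCPU' consumption, matching the behaviour
described above.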

