On 06/18/2018 10:16 AM, Artom Lifshitz wrote:
For Rocky I'm trying to get live migration to work properly for
instances that have a NUMA topology.
A question that came up on one of the patches is how to handle
resource claims on the destination, or indeed whether to handle them at all.
The previous attempt's approach (call it A) was to use the
resource tracker. This is race-free and the "correct" way to do it,
but the code is pretty opaque and not easily reviewable, as evidenced
by its sitting in review purgatory for literally years.
A simpler approach (call it B) is to ignore resource claims entirely
for now and wait for NUMA in placement to land in order to handle it
that way. This is obviously race-prone and not the "correct" way of
doing it, but the code would be relatively easy to review.
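To make the trade-off concrete, here is a toy sketch of the check-then-commit race that approach B tolerates (the class and method names are illustrative, none of this is Nova code): two concurrent incoming migrations each read the destination's free pinned CPUs before either one records its usage, so both land on the same host CPUs.

```python
# Toy sketch of the race approach B leaves open -- illustrative only.
# Two incoming live migrations both inspect the destination's free pinned
# CPUs before either one commits, so both end up pinned to the same CPUs.

class DestHost:
    """A destination host tracking which CPUs are free for pinning."""

    def __init__(self, cpus):
        self.free = set(cpus)

    def pick_cpus(self, count):
        # Step 1: read free CPUs. Without a claim, nothing is reserved here.
        return set(sorted(self.free)[:count])

    def commit(self, cpus):
        # Step 2: the migration lands and the CPUs are marked as used.
        self.free -= cpus

dest = DestHost(cpus=[0, 1, 2, 3])

# Both migrations read free CPUs before either commits (the race window).
mig_a_cpus = dest.pick_cpus(2)
mig_b_cpus = dest.pick_cpus(2)
dest.commit(mig_a_cpus)
dest.commit(mig_b_cpus)

print(mig_a_cpus == mig_b_cpus)  # True: both instances share CPUs {0, 1}
```

Approach A closes this window by making the read and the reservation a single atomic claim (pick and reserve under the resource tracker's lock), so the second migration would be forced onto the remaining CPUs instead.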
For the longest time, live migration did not keep track of resources
(until it started updating placement allocations). The message to
operators was essentially "we're giving you this massive hammer, don't
break your fingers." Continuing to ignore resource claims for now is
just maintaining the status quo. In addition, there is value in
improving NUMA live migration *now*, even if the improvement is
incomplete because it's missing resource claims. "Best is the enemy of
good" and all that. Finally, making use of the resource tracker is
just work that we know will get thrown out once we start using
placement for NUMA resources.
For all those reasons, I would favor approach B, but I wanted to ask
the community for their thoughts.
Side question... does either approach touch PCI device management during
live migration?
I ask because the only workloads I've ever seen that pin guest vCPU
threads to specific host processors -- or make use of huge pages
consumed from a specific host NUMA node -- have also made use of SR-IOV
and/or PCI passthrough. 
If workloads that use PCI passthrough or SR-IOV VFs cannot be live
migrated (due to existing complications in the lower-level virt layers),
I don't see much of a point in spending lots of developer resources trying
to "fix" this situation when in the real world, only a mythical workload
that uses CPU pinning or huge pages but *doesn't* use PCI passthrough or
SR-IOV VFs would be helped by it.
[1] I know I'm only one person, but every workload I've seen that
requires pinned CPUs and/or huge pages is a VNF that has been
essentially an ASIC that a telco OEM/vendor has converted into software
and requires the same guarantees that the ASIC and custom hardware gave
the original hardware-based workload. These VNFs, every single one of
them, used either PCI passthrough or SR-IOV VFs to handle
latency-sensitive network I/O.
OpenStack Development Mailing List (not for usage questions)