On 06/18/2018 10:16 AM, Artom Lifshitz wrote:
Hey all,

For Rocky I'm trying to get live migration to work properly for
instances that have a NUMA topology [1].

A question that came up on one of the patches [2] is how to handle
resource claims on the destination, or indeed whether to handle them
at all.

The previous attempt's approach [3] (call it A) was to use the
resource tracker. This is race-free and the "correct" way to do it,
but the code is pretty opaque and not easily reviewable, as evidenced
by [3] sitting in review purgatory for literally years.
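
For the curious, here is a rough, purely illustrative sketch of what
approach A amounts to on the destination side. This is not the actual
code in [3]; the names (claim_on_destination, live_migration_claim,
claimed_numa_topology, dst_numa_info) are hypothetical stand-ins for
whatever the resource tracker would really expose:

    # Hypothetical sketch of approach A (illustrative names only, not
    # real Nova APIs): claim NUMA resources on the destination through
    # the resource tracker before the migration is allowed to proceed.
    def claim_on_destination(rt, context, instance, migrate_data,
                             host_numa_topology):
        # Ask the destination's resource tracker for a claim that pins
        # the instance's NUMA topology to free host cells. If the claim
        # fails (say, a racing boot grabbed the same cores), an
        # exception aborts the migration instead of landing an
        # overcommitted guest on the destination.
        claim = rt.live_migration_claim(context, instance,
                                        host_numa_topology)
        # Record the destination pinning so the source side can rewrite
        # the guest definition with the new cpuset/memnode mappings.
        migrate_data.dst_numa_info = claim.claimed_numa_topology
        return migrate_data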

A simpler approach (call it B) is to ignore resource claims entirely
for now and wait for NUMA in placement to land in order to handle it
that way. This is obviously race-prone and not the "correct" way of
doing it, but the code would be relatively easy to review.

For the longest time, live migration did not keep track of resources
(until it started updating placement allocations). The message to
operators was essentially "we're giving you this massive hammer, don't
break your fingers." Continuing to ignore resource claims for now is
just maintaining the status quo. In addition, there is value in
improving NUMA live migration *now*, even if the improvement is
incomplete because it's missing resource claims. "Best is the enemy of
good" and all that. Finally, making use of the resource tracker is
just work that we know will get thrown out once we start using
placement for NUMA resources.

For all those reasons, I would favor approach B, but I wanted to ask
the community for their thoughts.

Side question... does either approach touch PCI device management during live migration?

I ask because the only workloads I've ever seen that pin guest vCPU threads to specific host processors -- or make use of huge pages consumed from a specific host NUMA node -- have also made use of SR-IOV and/or PCI passthrough. [1]

If workloads that use PCI passthrough or SR-IOV VFs cannot be live migrated (due to existing complications in the lower-level virt layers), I don't see much point in spending lots of developer resources trying to "fix" this situation when, in the real world, only a mythical workload that uses CPU pinning or huge pages but *doesn't* use PCI passthrough or SR-IOV VFs would be helped by it.

Best,
-jay

[1] I know I'm only one person, but every workload I've seen that requires pinned CPUs and/or huge pages is a VNF that is essentially an ASIC a telco OEM/vendor has converted into software, and that requires the same guarantees the ASIC and custom hardware gave the original hardware-based workload. These VNFs, every single one of them, used either PCI passthrough or SR-IOV VFs to handle latency-sensitive network I/O.
