On 21/03/2022 21:12, Tamas K Lengyel wrote:
> During VM forking and resetting, a failed vmentry has been observed due
> to the guest non-register state going out of sync with the guest
> register state. For example, a VM fork reset right after a STI
> instruction can trigger the failed entry. This is because the guest
> non-register state is not saved from the parent VM, so the reset
> operation only copies the register state.
>
> Fix this by including the guest non-register state in hvm_hw_cpu so
> that when it's copied from the parent VM the vCPU state remains in
> sync.
>
> SVM is not currently wired in, as VM forking is VMX-only and saving
> non-register state during normal save/restore/migration operations
> hasn't been needed. If deemed necessary in the future it can be wired
> in by adding an SVM substructure to hvm_hw_cpu.
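
For reference, the proposal amounts to carrying something like the
block below alongside the existing register state; the field names are
illustrative, not necessarily those used in the patch.

#include <stdint.h>

/*
 * Illustrative layout only.  The point is that hvm_hw_cpu carries the
 * guest's non-register state, so a fork reset copies it together with
 * the GPRs and control registers.
 */
struct hvm_hw_cpu_nonreg {
    uint32_t interruptibility;  /* STI / MOV-SS / NMI blocking */
    uint32_t activity_state;    /* active / hlt / shutdown / wait-for-SIPI */
    uint64_t pending_dbg;       /* pending debug exceptions */
};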

I disagree here.  This bug isn't really to do with fuzzing; it's to do
with whether our APIs can get/set vCPU state correctly, and fuzzing is
merely the example which demonstrated the breakage.

This will also cause very subtle bugs for the guest-transparent
migration work and the live update work, both of which want to move
vCPU state behind the back of a VM which hasn't actively entered Xen
via a hypercall.

(Quick tangent.  Seriously, can the SDM be fixed so it actually
enumerates the Activity States?)

Xen doesn't currently support any situation where Intel's idea of
Activity State is anything other than 0 (Active).  This in turn is
buggy, because we don't encode VPF_blocked anywhere.  An activity state
of hlt might not be the best place to encode it, because mwait is
notably absent from Intel's activity states.  We'll also terminate
things like schedop_poll early.
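
As a sketch of the constraint (the helper is hypothetical, not existing
Xen code): until VPF_blocked has a sensible encoding, the only value a
set operation can sanely accept is Active.

#include <errno.h>
#include <stdint.h>

/* Intel's activity states (SDM, VMCS guest non-register state). */
#define ACTIVITY_STATE_ACTIVE     0
#define ACTIVITY_STATE_HLT        1
#define ACTIVITY_STATE_SHUTDOWN   2
#define ACTIVITY_STATE_WAIT_SIPI  3

static int check_activity_state(uint32_t state)
{
    return state == ACTIVITY_STATE_ACTIVE ? 0 : -EINVAL;
}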

Next, AMD does have various interruptibility fields.  If you want me to
write the patch then fine, but they should not be excluded from a patch
like this.  AMD doesn't have separate bits for STI and MovSS, due to
microarchitectural differences, but the single INTERRUPT_SHADOW bit
does need managing, as does the blocked-by-NMI bit, which is emulated
on SVM and on older Intel CPUs.
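
To illustrate (hypothetical helper; the bit layout is Intel's
interruptibility-state field): the Intel-format bits fold down onto
AMD's single interrupt shadow, with blocked-by-NMI tracked separately.

#include <stdbool.h>
#include <stdint.h>

/* Intel interruptibility-state bits (guest non-register state). */
#define INTR_SHADOW_STI     (1u << 0)
#define INTR_SHADOW_MOV_SS  (1u << 1)
#define INTR_SHADOW_SMI     (1u << 2)
#define INTR_SHADOW_NMI     (1u << 3)

static void intr_state_to_svm(uint32_t intr_state,
                              bool *int_shadow, bool *nmi_blocked)
{
    /* AMD collapses STI and MOV-SS blocking into one shadow bit. */
    *int_shadow  = intr_state & (INTR_SHADOW_STI | INTR_SHADOW_MOV_SS);
    *nmi_blocked = intr_state & INTR_SHADOW_NMI;
}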

Minor tangent: blocked-by-SMI needs cross-linking with the VM-entry
controls, so I'm not sure it's something we ought to expose.

Next, it turns out that MSR_DEBUGCTL isn't included anywhere (not even
in the MSR list).  It is important because it forms part of the
VM-entry cross-check with PENDING_DBG and RFLAGS.TF.
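
Roughly, the rule (restated here as a hypothetical check, for the
STI/MOV-SS-blocking case) is that BS in the pending debug exceptions
must track TF && !BTF; a set operation which takes DEBUGCTL, RFLAGS and
PENDING_DBG from different sources can trivially violate it and fail
the next VM entry.

#include <stdbool.h>
#include <stdint.h>

#define RFLAGS_TF        (1u << 8)    /* single-step trap flag   */
#define DEBUGCTL_BTF     (1u << 1)    /* single-step on branches */
#define PENDING_DBG_BS   (1u << 14)   /* single-step pending     */

static bool pending_dbg_consistent(uint64_t rflags, uint64_t debugctl,
                                   uint64_t pending_dbg)
{
    bool bs_expected = (rflags & RFLAGS_TF) && !(debugctl & DEBUGCTL_BTF);

    return !!(pending_dbg & PENDING_DBG_BS) == bs_expected;
}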

Next, we also don't preserve the PDPTRs.  This is another area where
Intel and AMD CPUs behave differently, but under Intel's architecture
the guest kernel can clobber the 32-bit PAE block of pointers and
everything will continue to function correctly until the next mov into
cr3.  There are definitely bugs in Xen's emulated pagewalk in this
area.
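
In other words (sketch only; vmcs_read_pdpte() is a placeholder for
whatever accessor the real code would use), the four cached PDPTEs are
genuine vCPU state and need copying as-is, not re-deriving from guest
memory which may have been clobbered since the last cr3 load.

#include <stdint.h>

uint64_t vmcs_read_pdpte(unsigned int idx);  /* placeholder accessor */

static void save_pdptes(uint64_t pdpte[4])
{
    for ( unsigned int i = 0; i < 4; i++ )
        pdpte[i] = vmcs_read_pdpte(i);
}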

So there are a lot of errors with hvm_hw_cpu and I bet I haven't found
them all.

As discussed at multiple previous XenSummits, the current load/save
mess needs moving out of Xen, and that would be the correct time to fix
the other errors, but it is probably too much to ask for this case.


At a minimum, there shouldn't be a VMX-specific union, because we are
talking about architecture-agnostic properties of the vCPU.  It's fine
for the format to follow Intel's bit layout for the subset of bits we
tolerate saving/restoring, but this needs documenting in the header,
and unsupported values need rejecting on set.  An extra check/reject is
0% overhead for forking, so I don't find that a credible argument
against doing it.
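
Something along these lines on the set side (names hypothetical):
anything outside the documented subset gets rejected rather than
silently dropped or blindly written into the VMCS/VMCB.

#include <errno.h>
#include <stdint.h>

/* Subset of the Intel-layout interruptibility bits we tolerate. */
#define INTR_STATE_STI     (1u << 0)
#define INTR_STATE_MOV_SS  (1u << 1)
#define INTR_STATE_NMI     (1u << 3)
#define INTR_STATE_KNOWN   (INTR_STATE_STI | INTR_STATE_MOV_SS | INTR_STATE_NMI)

static int check_nonreg_state(uint32_t interruptibility)
{
    return (interruptibility & ~INTR_STATE_KNOWN) ? -EINVAL : 0;
}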

I'm not sure if we want to include the activity state at the moment
without a better idea of how to encode VPF_blocked, but I think DEBUGCTL
does need including.  This in turn involves transmitting the LBR MSRs too.

~Andrew
