On 09/05/2019 14:38, Tamas K Lengyel wrote:
> Hi all,
> I'm investigating an issue with altp2m that can easily be reproduced
> and leads to a hypervisor deadlock when PML is available in hardware.
> I haven't been able to trace down where the actual deadlock occurs.
>
> The problem seems to stem from hvm/vmx/vmcs.c:vmx_vcpu_flush_pml_buffer,
> which calls p2m_change_type_one on all gfns that were recorded in the
> PML buffer. The problem occurs when the PML-buffer-full vmexit happens
> while the active p2m is an altp2m. Switching p2m_change_type_one to
> work with the altp2m instead of the hostp2m, however, results in EPT
> misconfiguration crashes.
>
> Adding to the issue is that it seems to occur only when the altp2m has
> remapped GFNs. Since PML records entries based on GFN, this leads me to
> question whether it is safe at all to use PML when altp2m is used with
> GFN remapping. However, AFAICT the GFNs in the PML buffer are not the
> remapped GFNs, and my understanding is that it should be safe as long
> as the GFNs being tracked by PML are never the remapped GFNs.
>
> Booting Xen with ept=pml=0 resolves the issue.
>
> If anyone has any insight into what might be happening, please let me know.
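
(For reference, the flush path Tamas describes has roughly the following
shape; this is a simplified sketch from memory, with the empty-buffer
check and the VMCS enter/exit elided, not verbatim source.  Note that it
acts on the host p2m regardless of which view the vcpu was running on.)

    void vmx_vcpu_flush_pml_buffer(struct vcpu *v)
    {
        /* The vcpu's PML page; the exact field path varies by version. */
        uint64_t *pml_buf = __map_domain_page(v->arch.hvm.vmx.pml_pg);
        unsigned long idx;

        __vmread(GUEST_PML_INDEX, &idx);

        /*
         * Hardware fills the buffer from the top down; the saved index
         * points at the next free slot (or is 0xffff when full).
         */
        idx = (idx >= NR_PML_ENTRIES) ? 0 : idx + 1;

        /* Walk every GFN logged by hardware since the last flush. */
        for ( ; idx < NR_PML_ENTRIES; idx++ )
        {
            unsigned long gfn = pml_buf[idx] >> PAGE_SHIFT;

            /*
             * Flip the GFN back from log-dirty to normal RAM and mark
             * it dirty.  Both operations act on the host p2m.
             */
            p2m_change_type_one(v->domain, gfn, p2m_ram_logdirty, p2m_ram_rw);
            paging_mark_gfn_dirty(v->domain, gfn);
        }

        unmap_domain_page(pml_buf);

        /* Reset the index so the buffer appears empty again. */
        __vmwrite(GUEST_PML_INDEX, NR_PML_ENTRIES - 1);
    }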


I could have sworn that George spotted a problem here and fixed it.  I
shouldn't be surprised if we have more.

The problem that PML introduced (and this is mostly my fault, as I
suggested the buggy solution) is that the vmexit handler from one vcpu
pauses others to drain the PML queue into the dirty bitmap.  Overall I
wasn't happy with the design and I've got some ideas to improve it, but
within the scope of how PML was engineered, I proposed
domain_pause_except_self().
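
Roughly, that helper has the following shape (a sketch from memory, not
the exact source):

    void domain_pause_except_self(struct domain *d)
    {
        struct vcpu *v, *curr = current;

        if ( curr->domain == d )
        {
            /* Synchronously pause every sibling vcpu of the caller. */
            for_each_vcpu ( d, v )
                if ( v != curr )
                    vcpu_pause(v);
        }
        else
            domain_pause(d);
    }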

As it turns out, that is vulnerable to deadlock when two vcpus try to
pause each other, each waiting for the other to become descheduled.
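
The reason is that vcpu_pause() is synchronous: it bumps the pause count
and then spins until the target is genuinely off its pcpu.  Sketched from
memory:

    void vcpu_pause(struct vcpu *v)
    {
        ASSERT(v != current);
        atomic_inc(&v->pause_count);
        vcpu_sleep_sync(v);
    }

    void vcpu_sleep_sync(struct vcpu *v)
    {
        vcpu_sleep_nosync(v);

        /*
         * Busy-wait until the target has actually been descheduled.  If
         * the target is itself sitting in vcpu_pause() waiting for us,
         * it never leaves its pcpu and both vcpus spin here forever.
         */
        while ( !vcpu_runnable(v) && v->is_running )
            cpu_relax();
    }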

I see this has been reused by the altp2m code, but it *should* be safe
against deadlock now that it takes the hypercall_deadlock_mutex.
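
i.e. the pause-my-siblings path is now serialised on that per-domain
lock, so two vcpus can no longer be pausing each other at once.  Roughly
(a sketch; the exact error handling is from memory):

    int domain_pause_except_self(struct domain *d)
    {
        struct vcpu *v, *curr = current;

        if ( curr->domain == d )
        {
            /*
             * Bail out rather than deadlock if another vcpu got there
             * first; the caller retries.  The lock is dropped again by
             * the matching unpause path.
             */
            if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
                return -ERESTART;

            for_each_vcpu ( d, v )
                if ( v != curr )
                    vcpu_pause(v);
        }
        else
            domain_pause(d);

        return 0;
    }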

Anyway - sorry for not being more help, but I bet the problem is going
to be somewhere around vcpu pausing.

~Andrew
