On 09/05/2019 14:38, Tamas K Lengyel wrote:
> Hi all,
> I'm investigating an issue with altp2m that can easily be reproduced
> and leads to a hypervisor deadlock when PML is available in hardware.
> I haven't been able to trace down where the actual deadlock occurs.
>
> The problem seem to stem from hvm/vmx/vmcs.c:vmx_vcpu_flush_pml_buffer
> that calls p2m_change_type_one on all gfns that were recorded the PML
> buffer. The problem occurs when the PML buffer full vmexit happens
> while the active p2m is an altp2m. Switching p2m_change_type_one to
> work with the altp2m instead of the hostp2m however results in EPT
> misconfiguration crashes.
>
> Adding to the issue is that it seem to only occur when the altp2m has
> remapped GFNs. Since PML records entries based on GFN leads me to
> question whether it is safe at all to use PML when altp2m is used with
> GFN remapping. However, AFAICT the GFNs in the PML buffer are not the
> remapped GFNs and my understanding is that it should be safe as long
> as the GFNs being tracked by PML are never the remapped GFNs.
>
> Booting Xen with ept=pml=0 resolves the issue.
>
> If anyone has any insight into what might be happening, please let me know.

I could have sworn that George spotted a problem here and fixed it. I
shouldn't be surprised if we have more.

The problem that PML introduced (and this is mostly my fault, as I
suggested the buggy solution) is that the vmexit handler from one vcpu
pauses the others to drain the PML queue into the dirty bitmap. Overall I
wasn't happy with the design and I've got some ideas to improve it, but
within the scope of how altp2m was engineered, I proposed
domain_pause_except_self().

As it turns out, that is vulnerable to deadlocks when you get two vcpus
trying to pause each other and waiting for each other to become
de-scheduled.

I see this has been reused by the altp2m code, but it *should* be safe
from deadlocks now that it takes the hypercall_deadlock_mutex.

Anyway - sorry for not being more help, but I bet the problem is going
to be somewhere around vcpu pausing.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
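
For illustration only, the mutual-pause hazard and the trylock style of avoidance mentioned above can be sketched as a toy model. This is not Xen code: every toy_* name, the TOY_ERESTART value and the two-vcpu limit are assumptions made up for the example. A real pause also waits for the target vcpu to be de-scheduled, which the model reduces to a counter. The idea is that a vcpu which cannot take the per-domain mutex backs off and retries rather than spinning while someone else may be trying to pause it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_NR_VCPUS 2
#define TOY_ERESTART (-1)   /* hypothetical "back off and retry" code */

/* Toy stand-in for a domain: one try-lock plus per-vcpu pause counts. */
struct toy_domain {
    atomic_flag pause_mutex;          /* plays the role of the deadlock mutex */
    int pause_count[TOY_NR_VCPUS];
};

void toy_domain_init(struct toy_domain *d)
{
    atomic_flag_clear(&d->pause_mutex);
    for (int i = 0; i < TOY_NR_VCPUS; i++)
        d->pause_count[i] = 0;
}

/* Returns true if the lock was free and is now held by the caller. */
bool toy_trylock(struct toy_domain *d)
{
    return !atomic_flag_test_and_set(&d->pause_mutex);
}

void toy_unlock(struct toy_domain *d)
{
    atomic_flag_clear(&d->pause_mutex);
}

/*
 * Pause every vcpu except 'self'. If another vcpu already holds the
 * mutex (so it may be about to pause us), give up instead of waiting.
 */
int toy_pause_except_self(struct toy_domain *d, int self)
{
    if (!toy_trylock(d))
        return TOY_ERESTART;

    for (int i = 0; i < TOY_NR_VCPUS; i++)
        if (i != self)
            d->pause_count[i]++;

    toy_unlock(d);
    return 0;
}

/* Single-threaded walk through the contended and uncontended cases. */
int toy_demo(void)
{
    struct toy_domain d;

    toy_domain_init(&d);

    /* Uncontended: vcpu 0 pauses vcpu 1. */
    assert(toy_pause_except_self(&d, 0) == 0);
    assert(d.pause_count[1] == 1);

    /* Contended: pretend vcpu 0 is mid-pause and holds the mutex. */
    assert(toy_trylock(&d));
    assert(toy_pause_except_self(&d, 1) == TOY_ERESTART);
    assert(d.pause_count[0] == 0);    /* vcpu 1 paused nobody */
    toy_unlock(&d);

    return 0;
}
```

In the contended case the second vcpu returns without having paused anyone, so it stays free to be paused itself and can retry later. Whether the PML flush path gets the same protection is a separate question that this sketch does not answer.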