>>> On 21.09.16 at 16:18, <kevin.ma...@gdata.de> wrote:
> I have found the problem (after hours and hours of gruesome
> debugging with the almighty print) and it seems that this could potentially
> have quite a bit of impact if altp2m is enabled for a guest domain (even if
> the functionality is never actively used), since destroying any vcpu of this
> guest could lead to a hypervisor panic.
> So a malicious user could simply destroy and restart his VM(s) in order to
> DoS the VMs of other users by killing the hypervisor.
> Granted, this is not very effective, but, depending on the environment, it
> is extremely easy to implement.

So this is not a security problem because altp2m isn't a supported
feature yet, albeit the features page doesn't explicitly state this one
way or the other. The correct way to report a suspected security
issue would, however, have been to contact secur...@xenproject.org 
(see also https://www.xenproject.org/security-policy.html).

> The bug persists in Xen 4.7 and I do not think that it was fixed in the current
> master branch.
> The following happens.
> The call
> void hvm_vcpu_destroy(struct vcpu *v)
> {
>     hvm_all_ioreq_servers_remove_vcpu(v->domain, v);
>     if ( hvm_altp2m_supported() )
>         altp2m_vcpu_destroy(v);
> at some time reaches vmx_vcpu_update_eptp which ends with a
> vmx_vmcs_exit(v);.

I don't see how this can be a problem - it is properly paired with
a vmx_vmcs_enter().
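
To illustrate what I mean by "properly paired", here is a toy model. It is
only a sketch: struct toy_vcpu, the depth counter and all toy_* names are
hypothetical stand-ins, not the actual Xen definitions. The point is merely
that a function which enters and exits once leaves the caller's state as it
found it, no matter whether the caller had already entered itself.

```c
#include <assert.h>

/* Hypothetical stand-in for a vCPU; NOT Xen's struct vcpu. */
struct toy_vcpu {
    int vmcs_depth;       /* number of enter calls not yet matched by an exit */
    unsigned long eptp;   /* stand-in for the value written to the VMCS */
};

/* Assumed model of "enter": take one more reference on the VMCS. */
static void toy_vmcs_enter(struct toy_vcpu *v)
{
    v->vmcs_depth++;
}

/* Assumed model of "exit": drop the reference taken by the matching enter. */
static void toy_vmcs_exit(struct toy_vcpu *v)
{
    assert(v->vmcs_depth > 0);   /* an exit without an enter is a bug */
    v->vmcs_depth--;
}

/* A balanced update: one enter, one write, one exit. */
static void toy_update_eptp(struct toy_vcpu *v, unsigned long new_eptp)
{
    toy_vmcs_enter(v);
    v->eptp = new_eptp;
    toy_vmcs_exit(v);
}
```

In this model the depth is the same before and after toy_update_eptp(), so
the trailing exit alone cannot be what leaves the caller without a VMCS.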

> For the next function in hvm_vcpu_destroy, nestedhvm_vcpu_destroy(v), the
> missing vmcs is no problem (at least in our use case), but
> free_compat_arg_xlat crashes.
> The callstack is as follows:
> hvm_vcpu_destroy
> free_compat_arg_xlat
> destroy_perdomain_mapping
> map_domain_page
> (probably inlined) mapcache_current_vcpu
> sync_local_execstate

For you to get here, you must be running on the idle vCPU, yet
proof of this is not visible from the partial call stack you provide.
And anyway, things breaking here suggest something going wrong
earlier, or else - afaict - we'd run into this problem also without use
of altp2m (basically whenever map_domain_page() would get used
on the guest cleanup path, which - as you see from the call tree -
happens always). So I'm afraid the patch you've put together is
papering over a problem rather than fixing it, and the actual bug
remains not understood.

Perhaps a relevant aspect is you saying "some time reaches
vmx_vcpu_update_eptp": Why only sometimes? Afaics
altp2m_vcpu_destroy() unconditionally calls
altp2m_vcpu_update_p2m(), which is just a wrapper around


