I am writing to follow up on the bug report I sent, regarding a BUG()
triggered in Xen when performing a nested VMRUN with CR0.PG=0 in Long
Mode. The issue was discussed with Andrew Cooper at that time, and I
would like to check if there have been any updates or plans for
addressing this issue.

To briefly recap:
- The problem occurs when an L1 hypervisor, while in 64-bit mode,
executes VMRUN with CR0.PG=0 in VMCB12, targeting a 64-bit L2 guest.
- Instead of raising VMEXIT_INVALID, the system encounters a BUG() at
`nsvm_vmcb_guest_intercepts_exitcode`.
- The VMEXIT reason observed was 0x402 (AVIC_NOACCEL), although Xen does
not support AVIC.

Andrew pointed out that this could indicate either a missing validity
check (as the state LMA=1 && PG=0 is invalid) or possible memory
corruption.

Given that this issue could potentially allow a guest VM to trigger a
hypervisor panic, I believe it might be worth formally recognizing and
addressing.
May I kindly ask if this has been acknowledged as a bug internally, or
if there are any plans to handle this case safely (e.g., raising
VMEXIT_INVALID instead of BUG()) in future Xen releases?
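
To make the request concrete, below is a minimal sketch of the kind of
validity check I have in mind. It is only an illustration: the struct and
function names are hypothetical and are not taken from the Xen source, and
a real fix would have to live in Xen's nested SVM VMRUN path. The bit
positions are the architectural ones for CR0.PG and EFER.LME/LMA.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions for the relevant control bits. */
#define X86_CR0_PG  (UINT64_C(1) << 31)  /* CR0.PG: paging enabled */
#define EFER_LME    (UINT64_C(1) << 8)   /* EFER.LME: long mode enable */
#define EFER_LMA    (UINT64_C(1) << 10)  /* EFER.LMA: long mode active */

/* Hypothetical stand-in for the guest state that L1 supplies in VMCB12. */
struct vmcb12_state {
    uint64_t cr0;
    uint64_t efer;
};

/*
 * Hypothetical helper: returns true when VMCB12 describes the state
 * discussed in this thread (LMA=1 && PG=0). A caller would then fail the
 * virtual VMRUN with VMEXIT_INVALID instead of entering L2.
 */
static bool vmcb12_lma_without_paging(const struct vmcb12_state *s)
{
    return (s->efer & EFER_LMA) && !(s->cr0 & X86_CR0_PG);
}
```

With a check like this, a 64-bit L1 that enters a 32-bit L2 with PG=0
(EFER.LMA clear in VMCB12) would still be accepted, and only the
LMA=1 && PG=0 combination would be rejected.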

Thank you very much for your time.


On Wed, Dec 6, 2023 at 12:05 PM Reima ISHII <ish...@g.ecc.u-tokyo.ac.jp>
wrote:

> Thank you for your prompt response.
>
> On Tue, Dec 5, 2023 at 11:43 PM Andrew Cooper <andrew.coop...@citrix.com>
> wrote:
> > Who is still in 64-bit mode ?
> >
> > It is legal for a 64-bit L1 to VMRUN into a 32-bit L2 with PG=0.
> >
> > But I'm guessing that you mean L2 is also 64-bit, and we're clearing PG,
> > thus creating an illegal state (LMA=1 && PG=0) in VMCB12.
> >
> > And yes, in that case (virtual) VMRUN at L1 ought to fail with
> > VMEXIT_INVALID.
>
> Yes, your understanding is correct. The issue is triggered by a VMRUN
> into a 64-bit L2 guest when CR0.PG is cleared in VMCB12. Instead of
> failing with VMEXIT_INVALID as expected, the VMRUN at L1 proceeds and
> Xen hits a BUG().
>
> > As an incidental observation, that function is particularly absurd and
> > the two switches should be merged.
> >
> > VMExit reason 0x402 is AVIC_NOACCEL and Xen has no support for AVIC in
> > the slightest right now.  i.e. Xen shouldn't have AVIC active in the
> > VMCB, and should never see any AVIC related VMExits.
> >
> > It is possible that we've got memory corruption, and have accidentally
> > activated AVIC in the VMCB.
>
> The idea of potential memory corruption activating AVIC in the VMCB is
> certainly an interesting perspective. While I'm not sure how exactly
> such memory corruption could occur, the suggestion does provide a
> compelling explanation for the VMExit reason 0x402 (AVIC_NOACCEL),
> particularly considering Xen's current lack of AVIC support.
>
> > But, is this by any chance all running nested under KVM in your fuzzer?
>
> No, KVM was not used. The issue was observed on a Xen hypervisor's
> domU HVM running directly on the hardware. Within the guest HVM, a
> simple custom hypervisor was utilized.
>
> --
> Graduate School of Information Science and Technology, The University of
> Tokyo
> Reima Ishii
> ish...@g.ecc.u-tokyo.ac.jp
>


-- 
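Graduate School of Information Science and Technology, The University of
Tokyo, Department of Information Physics and Computing, second-year
master's student
Reima Ishii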
ish...@g.ecc.u-tokyo.ac.jp
