I am writing to follow up on the bug report I sent regarding a BUG() triggered in Xen when performing a nested VMRUN with CR0.PG=0 in Long Mode. The issue was discussed with Andrew Cooper at that time, and I would like to ask whether there have been any updates or plans for addressing it.
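
For concreteness, the kind of check that appears to be missing (rejecting a VMCB12 with LMA=1 and PG=0 before entering L2) could look like the minimal sketch below. This is not Xen code: the struct `vmcb12_state` and both function names are hypothetical and only illustrate the condition. The bit positions (CR0.PG is bit 31, EFER.LMA is bit 10) and the VMEXIT_INVALID code (-1) are the architectural values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions on AMD64. */
#define CR0_PG    (UINT64_C(1) << 31)   /* CR0.PG: paging enabled */
#define EFER_LMA  (UINT64_C(1) << 10)   /* EFER.LMA: long mode active */

/* Architectural exit code for an invalid guest state in the VMCB. */
#define VMEXIT_INVALID_CODE  (-1)

/* Hypothetical stand-in for the relevant VMCB12 save-area fields. */
struct vmcb12_state {
    uint64_t cr0;
    uint64_t efer;
};

/* Returns true for the inconsistent state LMA=1 && PG=0. */
static bool vmcb12_lma_without_paging(const struct vmcb12_state *s)
{
    return (s->efer & EFER_LMA) && !(s->cr0 & CR0_PG);
}

/*
 * Hypothetical pre-VMRUN check: returns VMEXIT_INVALID_CODE if the
 * virtual VMRUN should fail, or 0 if this particular check passes.
 */
static int64_t vmcb12_check_long_mode(const struct vmcb12_state *s)
{
    return vmcb12_lma_without_paging(s) ? VMEXIT_INVALID_CODE : 0;
}
```

With a check of this shape, a VMCB12 that has EFER.LMA set and CR0.PG clear would be reported to L1 as VMEXIT_INVALID instead of reaching the BUG() path, while a 32-bit L2 with PG=0 (LMA clear) would still pass.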

To briefly recap:

- The problem occurs when an L1 hypervisor, while in 64-bit mode, executes
  VMRUN with CR0.PG=0 in VMCB12, targeting a 64-bit L2 guest.
- Instead of raising VMEXIT_INVALID, the system encounters a BUG() at
  `nsvm_vmcb_guest_intercepts_exitcode`.
- The VMEXIT reason observed was 0x402 (AVIC_NOACCEL), although Xen does not
  support AVIC.

Andrew pointed out that this could indicate either a missing validity check (as the state LMA=1 && PG=0 is invalid) or possible memory corruption.

Given that this issue could potentially allow a guest VM to trigger a hypervisor panic, I believe it is worth formally recognizing and addressing. May I ask whether this has been acknowledged as a bug internally, or whether there are any plans to handle this case safely (e.g., raising VMEXIT_INVALID instead of BUG()) in future Xen releases?

Thank you very much for your time.

On Wed, Dec 6, 2023 at 12:05 PM Reima ISHII <ish...@g.ecc.u-tokyo.ac.jp> wrote:
> Thank you for your prompt response.
>
> On Tue, Dec 5, 2023 at 11:43 PM Andrew Cooper <andrew.coop...@citrix.com> wrote:
> > Who is still in 64-bit mode ?
> >
> > It is legal for a 64-bit L1 to VMRUN into a 32-bit L2 with PG=0.
> >
> > But I'm guessing that you mean L2 is also 64-bit, and we're clearing PG,
> > thus creating an illegal state (LMA=1 && PG=0) in VMCB12.
> >
> > And yes, in that case (virtual) VMRUN at L1 ought to fail with
> > VMEXIT_INVALID.
>
> Yes, you are correct in your understanding. This issue is triggered by
> VMRUN execution to 64-bit L2 guests, when CR0.PG is cleared in VMCB12.
> Contrary to the expected behavior where a VMRUN at L1 should fail with
> VMEXIT_INVALID, the VMRUN does not fail but instead, the system
> encounters a BUG().
>
> > As an incidental observation, that function is particularly absurd and
> > the two switches should be merged.
> >
> > VMExit reason 0x402 is AVIC_NOACCEL and Xen has no support for AVIC in
> > the slightest right now. i.e. Xen shouldn't have AVIC active in the
> > VMCB, and should never any AVIC related VMExits.
> >
> > It is possible that we've got memory corruption, and have accidentally
> > activated AVIC in the VMCB.
>
> The idea of potential memory corruption activating AVIC in the VMCB is
> certainly an interesting perspective. While I'm not sure how exactly
> such memory corruption could occur, the suggestion does provide a
> compelling explanation for the VMExit reason 0x402 (AVIC_NOACCEL),
> particularly considering Xen's current lack of AVIC support.
>
> > But, is this by any chance all running nested under KVM in your fuzzer?
>
> No, KVM was not used. The issue was observed on a Xen hypervisor's
> domU HVM running directly on the hardware. Within the guest HVM, a
> simple custom hypervisor was utilized.
>
> --
> Graduate School of Information Science and Technology, The University of
> Tokyo
> Reima Ishii
> ish...@g.ecc.u-tokyo.ac.jp

--
Graduate School of Information Science and Technology, The University of Tokyo
Department of Information Physics and Computing, second-year master's student
Reima Ishii
ish...@g.ecc.u-tokyo.ac.jp