On 21/04/2025 6:59 pm, REIMA ISHII wrote:
> Hi,
> I would like to follow up on the bug report I sent regarding a nested
> SVM issue in Xen, where an invalid CR4 value in VMCB12 leads to an
> assertion failure during VMRUN.
>
> As I haven't seen any updates or feedback, I wanted to kindly check if
> this issue has been acknowledged internally, or if there are any plans
> for addressing this case in future releases.
>
> Since this issue can potentially cause a hypervisor panic, I believe it
> would be valuable to handle this safely.
>
> Thank you for your time

Sorry, both issues fell between the cracks.

I've opened https://gitlab.com/xen-project/xen/-/issues/215 so it
doesn't get lost again.

> On Mon, Nov 13, 2023 at 4:36 PM Reima ISHII <ish...@g.ecc.u-tokyo.ac.jp> wrote:
>> Hi Xen Development Team,
>>
>> I am reporting a potential bug in the nested SVM implementation of the
>> Xen hypervisor, observed under specific conditions in a DomU HVM
>> guest.
>>
>> L1 on the DomU HVM guest sets a bit in CR4 of the VMCB12 save area
>> that is not part of hvm_cr4_guest_valid_bits and performs a VMRUN.
>> Subsequently, hvm_set_cr4 on the Xen hypervisor fails and
>> nsvm_vcpu_vmexit_inject causes an assertion failure.
>>
>> The environment is as follows:
>> - Xen Version: Xen-4.18-unstable (commit
>> 290f82375d828ef93f831a5ef028f1283aa1ea47)
>> - Architecture: x86_64 (AMD)
>>
>> The potential impact on system stability and release builds remains
>> uncertain, but this issue might pose a problem and merits attention
>> for improved robustness in nested virtualization scenarios.
>>
>> (XEN) arch/x86/hvm/svm/nestedsvm.c:554:d1v0 hvm_set_cr4 failed, rc: 2
>> (XEN) d1v0[nsvm_vmcb_prepare4vmrun]: CR4: invalid value 0x20020 (valid
>> 0x750fff, rejected 0x20000)
>> (XEN) arch/x86/hvm/svm/nestedsvm.c:658:d1v0 virtual vmcb invalid
>> (XEN) arch/x86/hvm/svm/nestedsvm.c:729:d1v0 prepare4vmrun failed, ret = 1
>> (XEN) arch/x86/hvm/svm/nestedsvm.c:768:d1v0 inject VMEXIT(INVALID)
>> (XEN) Assertion 'vmcb->_vintr.fields.vgif == 0' failed at
>> arch/x86/hvm/svm/nestedsvm.c:799
>> (XEN) Debugging connection not set up.
>> (XEN) ----[ Xen-4.18-unstable  x86_64  debug=y gcov=y  Tainted:   C    ]----
>> (XEN) CPU:    2
>> (XEN) RIP:    e008:[<ffff82d04029bef6>] nsvm_vcpu_switch+0x34e/0x502
>> (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d1v0)
>> (XEN) rax: ffff830839677000   rbx: ffff83083967b000   rcx: 0000000000000030
>> (XEN) rdx: 0000000000000000   rsi: 0000000000000003   rdi: ffff83083967b000
>> (XEN) rbp: ffff83083abb7ee8   rsp: ffff83083abb7ed0   r8:  0000000000000010
>> (XEN) r9:  0000000000750fff   r10: 0000000000040000   r11: 0000000000000000
>> (XEN) r12: ffff83083abb7ef8   r13: ffffffffffffffff   r14: 0000000000000000
>> (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 0000000000f506e0
>> (XEN) cr3: 00000008397bb000   cr2: 0000000000000000
>> (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
>> (XEN) ds: 0000   es: 0000   fs: 0033   gs: 0033   ss: 0000   cs: e008
>> (XEN) Xen code around <ffff82d04029bef6> (nsvm_vcpu_switch+0x34e/0x502):
>> (XEN)  48 83 05 7a c5 3b 00 01 <0f> 0b 48 83 05 78 c5 3b 00 01 48 83 05 60 c5 3b
>> (XEN) Xen stack trace from rsp=ffff83083abb7ed0:
>> (XEN)    ffff83083967b000 0000000000000000 0000000000000000 00007cf7c54480e7
>> (XEN)    ffff82d0402a49d6 0000000000000000 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 0000000000126000 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000126000
>> (XEN)    0000000000000000 0000000000000000 0000000000000000 000000000012af30
>> (XEN)    0000beef0000beef 00000000001056f3 000000bf0000beef 0000000000000002
>> (XEN)    000000000012af60 000000000000beef 800000083abfbeef 800000083abfbeef
>> (XEN)    800000083abfbeef 800000083abfbeef 0000e01000000002 ffff83083967b000
>> (XEN)    00000037fa582000 0000000000f506e0 0000000000000000 0000000000000000
>> (XEN)    8000030300000000 800000083abff100
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82d04029bef6>] R nsvm_vcpu_switch+0x34e/0x502
>> (XEN)    [<ffff82d0402a49d6>] F svm_asm_do_resume+0x16/0x187
>> (XEN)
>> (XEN) debugtrace_dump() global buffer starting
>> 1 cpupool_create(pool=0,sched=6)
>> 2 Created cpupool 0 with scheduler SMP Credit Scheduler rev2 (credit2)
>> 3 cpupool_add_domain(dom=0,pool=0) n_dom 1 rc 0
>> 4-14 p2m: p2m_alloc_table(): allocating p2m table
>> 15 cpupool_add_domain(dom=1,pool=0) n_dom 2 rc 0
>> (XEN) wrap: 0
>> (XEN) debugtrace_dump() global buffer finished
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 2:
>> (XEN) Assertion 'vmcb->_vintr.fields.vgif == 0' failed at
>> arch/x86/hvm/svm/nestedsvm.c:799
>> (XEN) ****************************************

This is fun.  The ASSERT() is incorrect, but that's also not the real
issue here.

The real bug is trying to raise #GP in the virtual vmentry path because
of bad control register state.  It should trigger a virtual vmexit
reporting VMEXIT_INVALID.
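
To make the intended behaviour concrete, here is a minimal sketch (not
Xen's actual code) of a CR4 check in the virtual vmentry path that
reports VMEXIT_INVALID rather than raising #GP.  The helper names
rejected_cr4_bits() and check_vmcb12_cr4() are hypothetical; the values
in the usage note below are the ones from the log above, and -1 is the
VMEXIT_INVALID exit code from the AMD APM.

```c
#include <stdint.h>

/* Exit code reported when the VMCB holds an illegal guest state. */
#define VMEXIT_INVALID (-1)

/* Hypothetical helper: the CR4 bits the guest is not allowed to set. */
static uint64_t rejected_cr4_bits(uint64_t cr4, uint64_t valid_bits)
{
    return cr4 & ~valid_bits;
}

/*
 * Hypothetical check for the virtual vmentry path.  Returns 0 when the
 * VMCB12 CR4 value is acceptable, or VMEXIT_INVALID when it is not, so
 * the caller can perform a virtual vmexit instead of injecting #GP.
 */
static int check_vmcb12_cr4(uint64_t cr4, uint64_t valid_bits)
{
    if ( rejected_cr4_bits(cr4, valid_bits) )
        return VMEXIT_INVALID;

    return 0;
}
```

With the values from the log, rejected_cr4_bits(0x20020, 0x750fff)
yields 0x20000, matching the "rejected 0x20000" message, and
check_vmcb12_cr4() returns VMEXIT_INVALID.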

As for the ASSERT(), (v)GIF blocks external interrupts (including INIT,
NMI, and any #MC which can be delayed).  It does not block exceptions,
so a #GP ought to be injectable like this.
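
The distinction above can be written down as a toy model.  This is only
a sketch of the rule as stated here, not of any Xen data structure: the
enum and the deliverable() helper are hypothetical, and #MC is
simplified to the delayable case.

```c
#include <stdbool.h>

/* Hypothetical event classes, following the paragraph above. */
enum event_kind {
    EV_EXTERNAL_INTR,
    EV_INIT,
    EV_NMI,
    EV_MACHINE_CHECK,   /* simplified: only the delayable kind */
    EV_EXCEPTION,       /* e.g. #GP */
};

/*
 * Returns true when an event of the given kind can be delivered with
 * the (virtual) global interrupt flag in the given state.  Exceptions
 * are not gated by GIF; everything else in this model is.
 */
static bool deliverable(enum event_kind kind, bool gif)
{
    if ( kind == EV_EXCEPTION )
        return true;

    return gif;
}
```

In this model deliverable(EV_EXCEPTION, false) is true while
deliverable(EV_NMI, false) is false, which is why an assertion on vgif
is the wrong condition for guarding exception injection.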


The underlying problem is the reuse of the helpers for the main
`MOV CR` path.  They simply don't behave correctly for nested virt.

Unfortunately, this is going to be quite complicated to fix.  I have no
idea when I'm going to have enough time to look into this.

~Andrew
