On 19/04/2022 11:51, Juergen Gross wrote: > On 19.04.22 12:40, Andrew Cooper wrote: >> On 19/04/2022 11:18, Juergen Gross wrote: >>> A hypervisor built without CONFIG_GDBSX will crash in case the >>> XEN_DOMCTL_gdbsx_guestmemio domctl is being called, as the call will >>> end up in iommu_do_domctl() with d == NULL: >>> >>> (XEN) CPU: 6 >>> (XEN) RIP: e008:[<ffff82d040269984>] iommu_do_domctl+0x4/0x30 >>> (XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor (d0v0) >>> (XEN) rax: 00000000000003e8 rbx: ffff830856277ef8 rcx: >>> ffff830856277fff >>> ... >>> (XEN) Xen call trace: >>> (XEN) [<ffff82d040269984>] R iommu_do_domctl+0x4/0x30 >>> (XEN) [<ffff82d04035cd5f>] S arch_do_domctl+0x7f/0x2330 >>> (XEN) [<ffff82d040239e46>] S do_domctl+0xe56/0x1930 >>> (XEN) [<ffff82d040238ff0>] S do_domctl+0/0x1930 >>> (XEN) [<ffff82d0402f8c59>] S pv_hypercall+0x99/0x110 >>> (XEN) [<ffff82d0402f5161>] S >>> arch/x86/pv/domain.c#_toggle_guest_pt+0x11/0x90 >>> (XEN) [<ffff82d040366288>] S lstar_enter+0x128/0x130 >>> (XEN) >>> (XEN) Pagetable walk from 0000000000000144: >>> (XEN) L4[0x000] = 0000000000000000 ffffffffffffffff >>> (XEN) >>> (XEN) **************************************** >>> (XEN) Panic on CPU 6: >>> (XEN) FATAL PAGE FAULT >>> (XEN) [error_code=0000] >>> (XEN) Faulting linear address: 0000000000000144 >>> >>> Fix this issue by modifying the interface of gdbsx_guest_mem_io() to >>> take the already known domain pointer instead of the domid. >> >> There is some explanation missing here. The adjustments you make are >> within CONFIG_GDBSX, with the exception of the final hunk. > > Yeah, and this is the one really fixing the issue, while the > other hunks are needed to cope with the way the problem is > fixed.
In which case the salient point needed in the commit message is "reject the use of XEN_DOMCTL_gdbsx_guestmemio without a valid domain". I'd go so far as to say that that ought to be a oneliner fix, which also discusses why it's safe now (it didn't used to be IIRC), and the cleanup in a separate patch. This also reminds me that there's a pile of almost complete series of debugger/gdbsx/traps cleanup in need of pushing over the line. > >> The actual bug is that non-IOMMU subops end up in iommu_do_domctl(), so >> while this is good cleanup to gdbsx_guest_mem_io() it, along with Jan's >> adjustment to iommu_do_domctl(), are not suitable fixes to the crash as >> reported. > > The same way non-arch subops might end up in arch_do_domctl(). > > What would be the right way to fix that in your opinion? > > IMO any subop handler called under the default label of a switch() should > be able to handle unknown subops. This is done for iommu_do_domctl() in > Jan's patch by not dereferencing d unconditionally. The problem isn't (specifically) how they're chained; it's the name. arch_do_domctl() is clearly a dumping ground for arbitrary subops. iommu_do_domctl() is clearly not. The bug was ultimately chaining iommu_do_domctl() in a default, which is why it *was* reasonable for "use is_iommu_enabled() where appropriate..." to assume that only iommu subops would reach the function. iommu_do_domctl() either needs re-chaining under several case xxx:, or it needs renaming to something generic like arch_do_domctl2(). Nothing else is going to leave the code in a position where it's harder to make mistakes like this. ~Andrew