On 19/04/2022 11:51, Juergen Gross wrote:
> On 19.04.22 12:40, Andrew Cooper wrote:
>> On 19/04/2022 11:18, Juergen Gross wrote:
>>> A hypervisor built without CONFIG_GDBSX will crash in case the
>>> XEN_DOMCTL_gdbsx_guestmemio domctl is being called, as the call will
>>> end up in iommu_do_domctl() with d == NULL:
>>>
>>> (XEN) CPU:    6
>>> (XEN) RIP:    e008:[<ffff82d040269984>] iommu_do_domctl+0x4/0x30
>>> (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d0v0)
>>> (XEN) rax: 00000000000003e8   rbx: ffff830856277ef8   rcx:
>>> ffff830856277fff
>>> ...
>>> (XEN) Xen call trace:
>>> (XEN)    [<ffff82d040269984>] R iommu_do_domctl+0x4/0x30
>>> (XEN)    [<ffff82d04035cd5f>] S arch_do_domctl+0x7f/0x2330
>>> (XEN)    [<ffff82d040239e46>] S do_domctl+0xe56/0x1930
>>> (XEN)    [<ffff82d040238ff0>] S do_domctl+0/0x1930
>>> (XEN)    [<ffff82d0402f8c59>] S pv_hypercall+0x99/0x110
>>> (XEN)    [<ffff82d0402f5161>] S
>>> arch/x86/pv/domain.c#_toggle_guest_pt+0x11/0x90
>>> (XEN)    [<ffff82d040366288>] S lstar_enter+0x128/0x130
>>> (XEN)
>>> (XEN) Pagetable walk from 0000000000000144:
>>> (XEN)  L4[0x000] = 0000000000000000 ffffffffffffffff
>>> (XEN)
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 6:
>>> (XEN) FATAL PAGE FAULT
>>> (XEN) [error_code=0000]
>>> (XEN) Faulting linear address: 0000000000000144
>>>
>>> Fix this issue by modifying the interface of gdbsx_guest_mem_io() to
>>> take the already known domain pointer instead of the domid.
>>
>> There is some explanation missing here.  The adjustments you make are
>> within CONFIG_GDBSX, with the exception of the final hunk.
>
> Yeah, and this is the one really fixing the issue, while the
> other hunks are needed to cope with the way the problem is
> fixed.

In which case the salient point needed in the commit message is "reject
the use of XEN_DOMCTL_gdbsx_guestmemio without a valid domain".

I'd go so far as to say that that ought to be a oneliner fix, which also
discusses why it's safe now (it didn't used to be IIRC), and the cleanup
in a separate patch.

This also reminds me that there's a pile of almost complete series of
debugger/gdbsx/traps cleanup in need of pushing over the line.

>
>> The actual bug is that non-IOMMU subops end up in iommu_do_domctl(), so
>> while this is good cleanup to gdbsx_guest_mem_io() it, along with Jan's
>> adjustment to iommu_do_domctl(), are not suitable fixes to the crash as
>> reported.
>
> The same way non-arch subops might end up in arch_do_domctl().
>
> What would be the right way to fix that in your opinion?
>
> IMO any subop handler called under the default label of a switch() should
> be able to handle unknown subops. This is done for iommu_do_domctl() in
> Jan's patch by not dereferencing d unconditionally.

The problem isn't (specifically) how they're chained; it's the name. 
arch_do_domctl() is clearly a dumping ground for arbitrary subops. 
iommu_do_domctl() is clearly not.

The bug was ultimately chaining iommu_do_domctl() in a default, which is
why it *was* reasonable for "use is_iommu_enabled() where
appropriate..." to assume that only iommu subops would reach the function.

iommu_do_domctl() either needs re-chaining under several case xxx:, or
it needs renaming to something generic like arch_do_domctl2().

Nothing else is going to leave the code in a position where it's harder
to make mistakes like this.

~Andrew

Reply via email to