On 20/04/2022 06:52, Jan Beulich wrote:
> On 19.04.2022 18:06, Andrew Cooper wrote:
>> On 19/04/2022 16:52, Juergen Gross wrote:
>>> On 19.04.22 17:48, Andrew Cooper wrote:
>>>> On 19/04/2022 10:39, Jan Beulich wrote:
>>>>> Besides the reporter's issue of hitting a NULL deref when
>>>>> !CONFIG_GDBSX,
>>>>> XEN_DOMCTL_test_assign_device can legitimately end up having NULL
>>>>> passed
>>>>> here, when the domctl was passed DOMID_INVALID.
>>>>>
>>>>> Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...")
>>>>> Reported-by: Cheyenne Wills <cheyenne.wi...@gmail.com>
>>>>> Signed-off-by: Jan Beulich <jbeul...@suse.com>
>>>>>
>>>>> --- a/xen/drivers/passthrough/iommu.c
>>>>> +++ b/xen/drivers/passthrough/iommu.c
>>>>> @@ -558,7 +558,7 @@ int iommu_do_domctl(
>>>>>   {
>>>>>       int ret = -ENODEV;
>>>>>   -    if ( !is_iommu_enabled(d) )
>>>>> +    if ( !(d ? is_iommu_enabled(d) : iommu_enabled) )
>>>>>           return -EOPNOTSUPP;
>>>> Having spent the better part of a day debugging this mess, this patch is
>>>> plain broken.
>>>>
>>>> It depends on Juergen's "xen/iommu: cleanup iommu related domctl
>>>> handling" patch, because otherwise it erroneously fails non-IOMMU
>>>> subops.
>>> Which is not a real problem, as it is being called after all other
>>> subops didn't match.
>> It is a real problem even now, because it is bogus for the host or
>> domain's IOMMU status to have any alteration to the
>> XEN_DOMCTL_gdbsx_guestmemio path.  The root cause of this bug is the
>> existing XEN_DOMCTL_gdbsx_guestmemio case being compiled out in the
>> !GDBSX case.
> I find your wording ("plain broken" in particular) irritating, to put
> it mildly. The change in behavior is that -EOPNOTSUPP may now be
> returned for the gdbsx operation instead of -ENOSYS.

It's not just gdbsx operations - it's every domctl subop whose case
statement happens to get conditionally compiled out:

XEN_DOMCTL_set_access_required
XEN_DOMCTL_debug_op
XEN_DOMCTL_mem_sharing_op
XEN_DOMCTL_audit_p2m

and every future domctl.  I didn't trying reasoning about the differing
populations of each arches arch_do_domctl().

>  And that's when
> it would better have been -EOPNOTSUPP in the first place.

This irrelevant unless you have a time machine, or you can prove that
the change doesn't break things.

For the record, I didn't know about Juergen's discovery of 2 ENOSYS vs
EOPNOTSUPP breakages in xen.git alone when writing the email.  The mass,
and spurious, change to almost 2^32 subops was enough to give pause for
thought.

>> It would be a more obvious problem if there were calls chained after
>> iommu_do_domctl() in the arch_domctl() default: blocks, because then it
>> wouldn't only be compiled-out functionality which hit this check.
> But that's not the case.

There is timebomb which just exploded on a user, and you've provided a
patch claiming to defuse it, when all you have done is swap out one
trigger for another.

Specifically, you've replaced a latent bug (nothing actually calls
test_assign_device with DOMID_INVALID) with a real error metastability
for logic that really does care about ENOSYS vs EOPNOTSUPP.

Yes - we should decide whether it ought be legal to call
test_assign_device with DOMID_INVALID or not, and then write the
reasoning down in the same patch which adjusts do_domctl() and/or
iommu_do_domctl() to have consistent behaviour.

But until iommu_do_domctl() is filtered to not operate on unrelated
subops, making this change breaks more things than it fixes.

~Andrew

Reply via email to