Hi Yi,

On 8/27/25 2:30 PM, Yi Liu wrote:
> Hi Eric,
>
> On 2025/8/27 19:22, Eric Auger wrote:
>> Hi
>>
>> On 8/27/25 1:13 PM, Yi Liu wrote:
>>> On 2025/8/22 14:40, Zhenzhong Duan wrote:
>>>> Introduce a new PCIIOMMUOps optional callback, get_viommu_cap() which
>>>> allows to retrieve capabilities exposed by a vIOMMU. The first planned
>>>> vIOMMU device capability is VIOMMU_CAP_HW_NESTED that advertises the
>>>> support of HW nested stage translation scheme.
>>>> pci_device_get_viommu_cap
>>>> is a wrapper that can be called on a PCI device potentially
>>>> protected by
>>>> a vIOMMU.
>>>>
>>>> get_viommu_cap() is designed to return 64bit bitmap of purely emulated
>>>> capabilities which are only determined by user's configuration, no
>>>> host
>>>> capabilities involved. Reasons are:
>>>>
>>>> 1. host may has heterogeneous IOMMUs, each with different capabilities
>>>> 2. this is migration friendly, return value is consistent between
>>>> source
>>>>      and target.
>>>> 3. host IOMMU capabilities are passed to vIOMMU through
>>>> set_iommu_device()
>>>>      interface which have to be after attach_device(), when
>>>> get_viommu_cap()
>>>>      is called in attach_device(), there is no way for vIOMMU to
>>>> get host
>>>>      IOMMU capabilities yet, so only emulated capabilities can be
>>>> returned.
>>>>      See below sequence:
>>>>
>>>>        vfio_device_attach():
>>>>            iommufd_cdev_attach():
>>>>                pci_device_get_viommu_cap() for HW nesting cap
>>>>                create a nesting parent hwpt
>>>>                attach device to the hwpt
>>>>                vfio_device_hiod_create_and_realize() creating hiod
>>>>        ...
>>>>        pci_device_set_iommu_device(hiod)
>>>
>>>> Suggested-by: Yi Liu <yi.l....@intel.com>
>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.d...@intel.com>
>>>> ---
>>>>    MAINTAINERS          |  1 +
>>>>    include/hw/iommu.h   | 19 +++++++++++++++++++
>>>>    include/hw/pci/pci.h | 25 +++++++++++++++++++++++++
>>>>    hw/pci/pci.c         | 11 +++++++++++
>>>>    4 files changed, 56 insertions(+)
>>>>    create mode 100644 include/hw/iommu.h
>>>>
>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>>> index a07086ed76..54fb878128 100644
>>>> --- a/MAINTAINERS
>>>> +++ b/MAINTAINERS
>>>> @@ -2305,6 +2305,7 @@ F: include/system/iommufd.h
>>>>    F: backends/host_iommu_device.c
>>>>    F: include/system/host_iommu_device.h
>>>>    F: include/qemu/chardev_open.h
>>>> +F: include/hw/iommu.h
>>>>    F: util/chardev_open.c
>>>>    F: docs/devel/vfio-iommufd.rst
>>>>    diff --git a/include/hw/iommu.h b/include/hw/iommu.h
>>>> new file mode 100644
>>>> index 0000000000..7dd0c11b16
>>>> --- /dev/null
>>>> +++ b/include/hw/iommu.h
>>>> @@ -0,0 +1,19 @@
>>>> +/*
>>>> + * General vIOMMU capabilities, flags, etc
>>>> + *
>>>> + * Copyright (C) 2025 Intel Corporation.
>>>> + *
>>>> + * SPDX-License-Identifier: GPL-2.0-or-later
>>>> + */
>>>> +
>>>> +#ifndef HW_IOMMU_H
>>>> +#define HW_IOMMU_H
>>>> +
>>>> +#include "qemu/bitops.h"
>>>> +
>>>> +enum {
>>>> +    /* hardware nested stage-1 page table support */
>>>> +    VIOMMU_CAP_HW_NESTED = BIT_ULL(0),
>>>
>>> This naming is a bit confusing. get_viommu_cap indicates it will return
>>> the viommu's capability while this naming is HW_NESTED. It's conflict
>>> with the commit message which claims only emulated capability will be
>>> returned.
>>
>> it actually means the viommu has the code to handle HW nested case,
>> independently on the actual HW support.
>> maybe remove the "emulation" wording.
>
> yeah, I know the meaning and the purpose here. Just not quite satisfied
> with the naming.
>
>>
>> Otherwise we may also use the virtio has_feature naming?
>
> has_feature seems better. Looks to ask if vIOMMU has something and then
> do something.
>
>>
>>>
>>> TBH. I'm hesitating to name it as get_viommu_cap. The scope is a little
>>> larger than what we want so far. So I'm wondering if it can be done
>>> in a
>>> more straightforward way. e.g. just a bool op named
>>> iommu_nested_wanted(). Just an example, maybe better naming. We can
>>> extend the op to be returning a u64 value in the future when we see
>>> another request on VFIO from vIOMMU.
>> personnally I am fine with the bitmask which looks more future proof.
>
> not quite sure if there is another info that needs to be checked in
> this "VFIO asks vIOMMU" manner. Have you seen one beside this
> nested hwpt requirement by vIOMMU?

I don't remember any at this point. But I guess with ARM CCA device
passthrough we might have other needs

Eric
>
> Regards,
> Yi Liu
>


Reply via email to