On 3/21/25 1:54 AM, Donald Dutile wrote:
>
>
> On 3/19/25 1:04 PM, Eric Auger wrote:
>>
>>
>>
>> On 3/18/25 10:22 PM, Donald Dutile wrote:
>>>
>>>
>>> On 3/18/25 3:13 PM, Nicolin Chen wrote:
>>>> On Tue, Mar 18, 2025 at 07:31:36PM +0100, Eric Auger wrote:
>>>>> On 3/17/25 9:19 PM, Nicolin Chen wrote:
>>>>>> On Mon, Mar 17, 2025 at 04:24:53PM -0300, Jason Gunthorpe wrote:
>>>>>>> On Mon, Mar 17, 2025 at 12:10:19PM -0700, Nicolin Chen wrote:
>>>>>>>> Another question: how does an emulated device work with a vSMMUv3?
>>>>>>>> I could imagine that all the accel steps would be bypassed since
>>>>>>>> !sdev->idev. Yet, the emulated iotlb should cache its translation
>>>>>>>> so we will need to flush the iotlb, which will increase complexity
>>>>>>>> as the TLBI command dispatching function will need to be aware what
>>>>>>>> ASID is for emulated device and what is for vfio device..
>>>>>>> I think you should block it. We already expect different vSMMU's
>>>>>>> depending on the physical SMMU under the PCI device, it makes sense
>>>>>>> that a SW VFIO device would have its own, non-accelerated, vSMMU
>>>>>>> model in the guest.
>>>>>> Yea, I agree and it'd be cleaner for an implementation separating
>>>>>> them.
>>>>>>
>>>>>> In my mind, the general idea of "accel=on" is also to keep things
>>>>>> in a more efficient way: passthrough devices go to HW-accelerated
>>>>>> vSMMUs (separated PCIE buses), while emulated ones go to a vSMMU-
>>>>>> bypassed (PCIE0).
>>>>
>>>>> Originally a specific SMMU device was needed to opt in for the MSI
>>>>> reserved region ACPI IORT description, which is not needed if you
>>>>> don't rely on S1+S2. However, if we don't rely on this trick, this
>>>>> was not even needed with legacy integration
>>>>> (https://patchwork.kernel.org/project/qemu-devel/cover/20180921081819.9203-1-eric.au...@redhat.com/).
>>>>>
>>>>>
>>>>>
>>>>> Nevertheless I don't think anything prevents the acceleration granted
>>>>> device from also working with virtio/vhost devices for instance,
>>>>> unless you unplug the existing infra. The translation and invalidation
>>>>> should just use different control paths (explicit translation requests,
>>>>> invalidation notifications towards vhost, ...).
>>>>
>>>> smmuv3_translate() is per sdev, so it's easy.
>>>>
>>>> Invalidation is done via commands, which could be tricky:
>>>> a) Broadcast command
>>>> b) ASID validation -- we'll need to keep track of a list of ASIDs
>>>>      for vfio device to compare the ASID in each per-ASID command,
>>>>      potentially by trapping all CFGI_CD(_ALL) commands? Note that
>>>>      each vfio device may have multiple ASIDs (for multiple CDs).
>>>> Either a or b above will have some validation efficiency impact.
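
Option (b) above can be sketched as follows. This is only a minimal
illustration, assuming a per-vSMMU bitmap of ASIDs that belong to vfio
devices, populated when CFGI_CD(_ALL) commands are trapped. All names
(AccelAsidSet, tlbi_route, ...) are hypothetical and not existing QEMU
code, and the sketch does not refcount ASIDs shared by several CDs.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; none of this is existing QEMU code. */
#define ASID_SPACE 65536u            /* SMMUv3 ASIDs are at most 16 bits */

typedef struct AccelAsidSet {
    uint8_t bits[ASID_SPACE / 8];    /* one bit per ASID used by a vfio device */
} AccelAsidSet;

typedef enum TlbiRoute {
    ROUTE_EMULATED_IOTLB,            /* flush only the emulated iotlb */
    ROUTE_HOST,                      /* forward only to the host SMMU */
    ROUTE_BOTH,                      /* no ASID to filter on: do both */
} TlbiRoute;

static void asid_set_init(AccelAsidSet *s)
{
    memset(s->bits, 0, sizeof(s->bits));
}

/* Would be called when a trapped CFGI_CD reveals an ASID for a vfio device. */
static void asid_set_add(AccelAsidSet *s, uint16_t asid)
{
    s->bits[asid >> 3] |= (uint8_t)(1u << (asid & 7));
}

static void asid_set_del(AccelAsidSet *s, uint16_t asid)
{
    s->bits[asid >> 3] &= (uint8_t)~(1u << (asid & 7));
}

static bool asid_set_has(const AccelAsidSet *s, uint16_t asid)
{
    return (s->bits[asid >> 3] >> (asid & 7)) & 1u;
}

/* has_asid is false for commands that carry no ASID (case a). */
static TlbiRoute tlbi_route(const AccelAsidSet *s, bool has_asid, uint16_t asid)
{
    if (!has_asid) {
        return ROUTE_BOTH;
    }
    return asid_set_has(s, asid) ? ROUTE_HOST : ROUTE_EMULATED_IOTLB;
}
```

The per-command lookup is a constant-time bit test, so most of the cost
of (b) would likely sit in trapping the CFGI_CD commands rather than in
the TLBI path itself.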
>>>>
>>>>> Again, what legitimizes having different qemu devices for the same
>>>>> IP? I understand that it simplifies the implementation but I am not
>>>>> sure this is a good reason. Nevertheless it is worth challenging.
>>>>> What is the plan for intel iommu? Will we have 2 devices, the legacy
>>>>> device and one for nested?
>>>>
>>>> Hmm, it seems that there are two different topics:
>>>> 1. Use one SMMU device model (source code file; "iommu=" string)
>>>>      for both an emulated vSMMU and an HW-accelerated vSMMU.
>>>> 2. Allow one vSMMU instance to work with both an emulated device
>>>>      and a passthrough device.
>>>> And I get that you want both 1 and 2.
>>>>
>>>> I'm totally okay with 1, yet see no compelling benefit from 2 for
>>>> the increased complexity in the invalidation routine.
>>>>
>>>> And another question about the mixed device attachment. Let's say
>>>> we have in the host:
>>>>     VFIO passthrough dev0 -> pSMMU0
>>>>     VFIO passthrough dev1 -> pSMMU1
>>>> Should we allow emulated devices to be flexibly plugged?
>>>>     dev0 -> vSMMU0 /* Hard requirement */
>>>>     dev1 -> vSMMU1 /* Hard requirement */
>>>>     emu0 -> vSMMU0 /* Soft requirement; can be vSMMU1 also */
>>>>     emu1 -> vSMMU1 /* Soft requirement; can be vSMMU0 also */
>>>>
>>>> Thanks
>>>> Nicolin
>>>>
>>> I agree w/Jason & Nicolin: different vSMMUs for pass-through devices
>>> than emulated, & vice-versa.
>>> Not mixing... because... of the next agreement:
>> you need to clarify what you mean by different vSMMUs: are you talking
>> about different instances or different qemu device types?
> Both. A device that needs the hw-accel feature has to use an smmu that
> has that feature; an emulated device can use such an smmu, but as
> mentioned in other threads, if you start with all emulated devices in
> one smmu and then hot-plug an (assigned) device, it needs another smmu
> that has hw-accel features.
> Keeping them split makes it easier at config time, and it may enable
> the code to be simpler...
> but the other half of my brain wants common code paths with
> accel/emulate branches, though a different smmu instance will likely
> simplify the smmu-(accel-)specific lookups.

Yes, I think we agree on the fact that several smmu instances are needed,
especially for matching the underlying HW topology and for having
separate protection for emulated and host devices (esp. with vCMD queues).

Eric
>
>>>
>>> I agree with Eric that 'accel' isn't needed -- this should be
>>> ascertained from the pSMMU that a physical device is attached to.
>> we can simply use an AUTO_ON_OFF property and by default choose AUTO
>> value. That would close the debate ;-)
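
A minimal sketch of how such a tri-state property could resolve,
assuming the capability of the underlying pSMMU is already known as a
boolean. The enum and function names are hypothetical and only
illustrate the AUTO/ON/OFF semantics, not an existing QEMU interface.

```c
#include <stdbool.h>

/* Hypothetical tri-state value of an "accel" property. */
typedef enum AccelMode {
    ACCEL_AUTO,                      /* default: follow host capability */
    ACCEL_ON,                        /* require acceleration */
    ACCEL_OFF,                       /* force the emulated path */
} AccelMode;

typedef enum AccelResult {
    ACCEL_RESULT_EMULATED,
    ACCEL_RESULT_ACCELERATED,
    ACCEL_RESULT_ERROR,              /* "on" requested but host cannot do it */
} AccelResult;

/* host_capable: assumed to be derived from the pSMMU behind the device. */
static AccelResult accel_resolve(AccelMode mode, bool host_capable)
{
    switch (mode) {
    case ACCEL_OFF:
        return ACCEL_RESULT_EMULATED;
    case ACCEL_ON:
        return host_capable ? ACCEL_RESULT_ACCELERATED : ACCEL_RESULT_ERROR;
    case ACCEL_AUTO:
    default:
        return host_capable ? ACCEL_RESULT_ACCELERATED : ACCEL_RESULT_EMULATED;
    }
}
```

With AUTO as the default, an explicit OFF would still give a way to turn
the feature off on hosts where the acceleration turns out to be broken.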
>>
> Preaching to the choir... yes.
>
>> Eric
>>> Now... how does vfio(?; why not qemu?) layer determine that? -- where
>>> are SMMUv3 'accel' features exposed either: a) in the device struct
>>> (for the smmuv3) or (b) somewhere under sysfs? ... I couldn't find
>>> anything under either on my g-h system, but would appreciate a ptr if
>>> there is.
>>> and like Eric, although 'accel' is better than the original 'nested',
>>> it's non-obvious what accel feature(s) are being turned on, or not.
>>> In fact, if broken accel hw occurs ('if' -> 'when'), how should it be
>>> turned off? ... if the info is in the kernel, a kernel boot-param will
>>> be needed;
>>> if in sysfs, writing 0 to an enable (disable) attribute may be an
>>> alternative as well.
>>> Bottom line: we need (a) a way to ascertain the accel feature and (b) a
>>> way to disable it when it is broken,
>>> so qemu's smmuv3 spec will 'just work'.
>>> [This may also help when migrating from a machine that has accel
>>> working to one that does not.]
>>>
>>> ... and when an emulated device is assigned a vSMMU, there are no
>>> accel features ... unless we have tunables like batch iotlb
>>> invalidation for perf reasons, which can be viewed as an 'accel'
>>> option.
>>>
>>
>

