RE: [RFC 00/18] vfio: Adopt iommufd

2022-06-28 Thread Shameerali Kolothum Thodi via


> -Original Message-
> From: Yi Liu [mailto:yi.l@intel.com]
> Sent: 18 May 2022 15:01
> To: zhangfei@foxmail.com; Jason Gunthorpe ;
> Zhangfei Gao 
> Cc: eric.au...@redhat.com; Alex Williamson ;
> Shameerali Kolothum Thodi ;
> coh...@redhat.com; qemu-devel@nongnu.org;
> da...@gibson.dropbear.id.au; th...@redhat.com; far...@linux.ibm.com;
> mjros...@linux.ibm.com; akrow...@linux.ibm.com; pa...@linux.ibm.com;
> jjhe...@linux.ibm.com; jasow...@redhat.com; k...@vger.kernel.org;
> nicol...@nvidia.com; eric.auger@gmail.com; kevin.t...@intel.com;
> chao.p.p...@intel.com; yi.y@intel.com; pet...@redhat.com
> Subject: Re: [RFC 00/18] vfio: Adopt iommufd
> 
> On 2022/5/18 15:22, zhangfei@foxmail.com wrote:
> >
> >
> > On 2022/5/17 下午4:55, Yi Liu wrote:
> >> Hi Zhangfei,
> >>
> >> On 2022/5/12 17:01, zhangfei@foxmail.com wrote:
> >>>
> >>> Hi, Yi
> >>>
> >>> On 2022/5/11 下午10:17, zhangfei@foxmail.com wrote:
> >>>>
> >>>>
> >>>> On 2022/5/10 下午10:08, Yi Liu wrote:
> >>>>> On 2022/5/10 20:45, Jason Gunthorpe wrote:
> >>>>>> On Tue, May 10, 2022 at 08:35:00PM +0800, Zhangfei Gao wrote:
> >>>>>>> Thanks Yi and Eric,
> >>>>>>> Then will wait for the updated iommufd kernel for the PCI MMIO
> region.
> >>>>>>>
> >>>>>>> Another question,
> >>>>>>> How to get the iommu_domain in the ioctl.
> >>>>>>
> >>>>>> The ID of the iommu_domain (called the hwpt) it should be returned
> by
> >>>>>> the vfio attach ioctl.
> >>>>>
> >>>>> yes, hwpt_id is returned by the vfio attach ioctl and recorded in
> >>>>> qemu. You can query page table related capabilities with this id.
> >>>>>
> >>>>>
> https://lore.kernel.org/kvm/20220414104710.28534-16-yi.l@intel.com/
> >>>>>
> >>>> Thanks Yi,
> >>>>
> >>>> Do we use iommufd_hw_pagetable_from_id in kernel?
> >>>>
> >>>> The qemu send hwpt_id via ioctl.
> >>>> Currently VFIOIOMMUFDContainer has hwpt_list,
> >>>> Which member is good to save hwpt_id, IOMMUTLBEntry?
> >>>
> >>> Can VFIOIOMMUFDContainer  have multi hwpt?
> >>
> >> yes, it is possible
> > Then how to get hwpt_id in map/unmap_notify(IOMMUNotifier *n,
> IOMMUTLBEntry
> > *iotlb)
> 
> in map/unmap, should use ioas_id instead of hwpt_id
> 
> >
> >>
> >>> Since VFIOIOMMUFDContainer has hwpt_list now.
> >>> If so, how to get specific hwpt from map/unmap_notify in hw/vfio/as.c,
> >>> where no vbasedev can be used for compare.
> >>>
> >>> I am testing with a workaround, adding VFIOIOASHwpt *hwpt in
> >>> VFIOIOMMUFDContainer.
> >>> And save hwpt when vfio_device_attach_container.
> >>>
> >>>>
> >>>> In kernel ioctl: iommufd_vfio_ioctl
> >>>> @dev: Device to get an iommu_domain for
> >>>> iommufd_hw_pagetable_from_id(struct iommufd_ctx *ictx, u32 pt_id,
> >>>> struct device *dev)
> >>>> But iommufd_vfio_ioctl seems no para dev?
> >>>
> >>> We can set dev=Null since IOMMUFD_OBJ_HW_PAGETABLE does not
> need dev.
> >>> iommufd_hw_pagetable_from_id(ictx, hwpt_id, NULL)
> >>
> >> this is not good. dev is passed in to this function to allocate domain
> >> and also check sw_msi things. If you pass in a NULL, it may even unable
> >> to get a domain for the hwpt. It won't work I guess.
> >
> > The iommufd_hw_pagetable_from_id can be used for
> > 1, allocate domain, which need para dev
> > case IOMMUFD_OBJ_IOAS
> > hwpt = iommufd_hw_pagetable_auto_get(ictx, ioas, dev);
> 
> this is used when attaching ioas.
> 
> > 2. Just return allocated domain via hwpt_id, which does not need dev.
> > case IOMMUFD_OBJ_HW_PAGETABLE:
> > return container_of(obj, struct iommufd_hw_pagetable, obj);
> 
> yes, this would be the usage in nesting. you may check my below
> branch. It's for nesting integration.
> 
> https://github.com/luxis1999/iommufd/tree/iommufd-v5.18-rc4-nesting
> 
> > By the way, any plan of the nested mode?
> I'm working with Eric, Nic on it. Currently, I've got the above kernel
> branch, QEMU side is also WIP.

Hi Yi/Eric,

I had a look at the above nesting kernel and QEMU branches, and as mentioned
in the cover letter it is not working on ARM yet.

IIUC, to get it working via iommufd the main thing we need is a way to
configure the phys SMMU in nested mode and to set up the stage 2 mappings.
The cache/PASID related changes look more straightforward.

I have quite a few hacks to get it working on ARM, but it is still a WIP.
So just wondering, do you guys have something that can be shared yet?

Please let me know.

Thanks,
Shameer


Re: [RFC 00/18] vfio: Adopt iommufd

2022-06-28 Thread Eric Auger
Hi Shameer,

On 6/28/22 10:14, Shameerali Kolothum Thodi wrote:
>
>> -Original Message-
>> From: Yi Liu [mailto:yi.l@intel.com]
>> Sent: 18 May 2022 15:01
>> To: zhangfei@foxmail.com; Jason Gunthorpe ;
>> Zhangfei Gao 
>> Cc: eric.au...@redhat.com; Alex Williamson ;
>> Shameerali Kolothum Thodi ;
>> coh...@redhat.com; qemu-devel@nongnu.org;
>> da...@gibson.dropbear.id.au; th...@redhat.com; far...@linux.ibm.com;
>> mjros...@linux.ibm.com; akrow...@linux.ibm.com; pa...@linux.ibm.com;
>> jjhe...@linux.ibm.com; jasow...@redhat.com; k...@vger.kernel.org;
>> nicol...@nvidia.com; eric.auger@gmail.com; kevin.t...@intel.com;
>> chao.p.p...@intel.com; yi.y@intel.com; pet...@redhat.com
>> Subject: Re: [RFC 00/18] vfio: Adopt iommufd
>>
>> On 2022/5/18 15:22, zhangfei@foxmail.com wrote:
>>>
>>> On 2022/5/17 下午4:55, Yi Liu wrote:
>>>> Hi Zhangfei,
>>>>
>>>> On 2022/5/12 17:01, zhangfei@foxmail.com wrote:
>>>>> Hi, Yi
>>>>>
>>>>> On 2022/5/11 下午10:17, zhangfei@foxmail.com wrote:
>>>>>>
>>>>>> On 2022/5/10 下午10:08, Yi Liu wrote:
>>>>>>> On 2022/5/10 20:45, Jason Gunthorpe wrote:
>>>>>>>> On Tue, May 10, 2022 at 08:35:00PM +0800, Zhangfei Gao wrote:
>>>>>>>>> Thanks Yi and Eric,
>>>>>>>>> Then will wait for the updated iommufd kernel for the PCI MMIO
>> region.
>>>>>>>>> Another question,
>>>>>>>>> How to get the iommu_domain in the ioctl.
>>>>>>>> The ID of the iommu_domain (called the hwpt) it should be returned
>> by
>>>>>>>> the vfio attach ioctl.
>>>>>>> yes, hwpt_id is returned by the vfio attach ioctl and recorded in
>>>>>>> qemu. You can query page table related capabilities with this id.
>>>>>>>
>>>>>>>
>> https://lore.kernel.org/kvm/20220414104710.28534-16-yi.l@intel.com/
>>>>>> Thanks Yi,
>>>>>>
>>>>>> Do we use iommufd_hw_pagetable_from_id in kernel?
>>>>>>
>>>>>> The qemu send hwpt_id via ioctl.
>>>>>> Currently VFIOIOMMUFDContainer has hwpt_list,
>>>>>> Which member is good to save hwpt_id, IOMMUTLBEntry?
>>>>> Can VFIOIOMMUFDContainer  have multi hwpt?
>>>> yes, it is possible
>>> Then how to get hwpt_id in map/unmap_notify(IOMMUNotifier *n,
>> IOMMUTLBEntry
>>> *iotlb)
>> in map/unmap, should use ioas_id instead of hwpt_id
>>
>>>>> Since VFIOIOMMUFDContainer has hwpt_list now.
>>>>> If so, how to get specific hwpt from map/unmap_notify in hw/vfio/as.c,
>>>>> where no vbasedev can be used for compare.
>>>>>
>>>>> I am testing with a workaround, adding VFIOIOASHwpt *hwpt in
>>>>> VFIOIOMMUFDContainer.
>>>>> And save hwpt when vfio_device_attach_container.
>>>>>
>>>>>> In kernel ioctl: iommufd_vfio_ioctl
>>>>>> @dev: Device to get an iommu_domain for
>>>>>> iommufd_hw_pagetable_from_id(struct iommufd_ctx *ictx, u32 pt_id,
>>>>>> struct device *dev)
>>>>>> But iommufd_vfio_ioctl seems no para dev?
>>>>> We can set dev=Null since IOMMUFD_OBJ_HW_PAGETABLE does not
>> need dev.
>>>>> iommufd_hw_pagetable_from_id(ictx, hwpt_id, NULL)
>>>> this is not good. dev is passed in to this function to allocate domain
>>>> and also check sw_msi things. If you pass in a NULL, it may even unable
>>>> to get a domain for the hwpt. It won't work I guess.
>>> The iommufd_hw_pagetable_from_id can be used for
>>> 1, allocate domain, which need para dev
>>> case IOMMUFD_OBJ_IOAS
>>> hwpt = iommufd_hw_pagetable_auto_get(ictx, ioas, dev);
>> this is used when attaching ioas.
>>
>>> 2. Just return allocated domain via hwpt_id, which does not need dev.
>>> case IOMMUFD_OBJ_HW_PAGETABLE:
>>> return container_of(obj, struct iommufd_hw_pagetable, obj);
>> yes, this would be the usage in nesting. you may check my below
>> branch. It's for nesting integration.
>>
>> https://github.com/luxis1999/iommufd/tree/iommufd-v5.18-rc4-nesting
>>
>>> By the way, any plan of the nested mode?
>> I'm working with Eric, Nic on it. Currently, I've got the above kernel
>> branch, QEMU side is also WIP.
> Hi Yi/Eric,
>
> I had a look at the above nesting kernel and QEMU branches, and as mentioned
> in the cover letter it is not working on ARM yet.
>
> IIUC, to get it working via iommufd the main thing we need is a way to
> configure the phys SMMU in nested mode and to set up the stage 2 mappings.
> The cache/PASID related changes look more straightforward.
>
> I have quite a few hacks to get it working on ARM, but it is still a WIP.
> So just wondering, do you guys have something that can be shared yet?

I am working on the respin based on the latest iommufd kernel branches and
the QEMU RFC v2, but it is still WIP.

I will share as soon as possible.

Eric
>
> Please let me know.
>
> Thanks,
> Shameer




Re: [RFC 00/18] vfio: Adopt iommufd

2022-05-18 Thread Yi Liu

On 2022/5/18 15:22, zhangfei@foxmail.com wrote:



On 2022/5/17 下午4:55, Yi Liu wrote:

Hi Zhangfei,

On 2022/5/12 17:01, zhangfei@foxmail.com wrote:


Hi, Yi

On 2022/5/11 下午10:17, zhangfei@foxmail.com wrote:



On 2022/5/10 下午10:08, Yi Liu wrote:

On 2022/5/10 20:45, Jason Gunthorpe wrote:

On Tue, May 10, 2022 at 08:35:00PM +0800, Zhangfei Gao wrote:

Thanks Yi and Eric,
Then will wait for the updated iommufd kernel for the PCI MMIO region.

Another question,
How to get the iommu_domain in the ioctl.


The ID of the iommu_domain (called the hwpt) it should be returned by
the vfio attach ioctl.


yes, hwpt_id is returned by the vfio attach ioctl and recorded in
qemu. You can query page table related capabilities with this id.

https://lore.kernel.org/kvm/20220414104710.28534-16-yi.l@intel.com/


Thanks Yi,

Do we use iommufd_hw_pagetable_from_id in kernel?

The qemu send hwpt_id via ioctl.
Currently VFIOIOMMUFDContainer has hwpt_list,
Which member is good to save hwpt_id, IOMMUTLBEntry?


Can VFIOIOMMUFDContainer  have multi hwpt?


yes, it is possible
Then how to get hwpt_id in map/unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry 
*iotlb)


in map/unmap, should use ioas_id instead of hwpt_id






Since VFIOIOMMUFDContainer has hwpt_list now.
If so, how to get specific hwpt from map/unmap_notify in hw/vfio/as.c, 
where no vbasedev can be used for compare.


I am testing with a workaround, adding VFIOIOASHwpt *hwpt in 
VFIOIOMMUFDContainer.

And save hwpt when vfio_device_attach_container.



In kernel ioctl: iommufd_vfio_ioctl
@dev: Device to get an iommu_domain for
iommufd_hw_pagetable_from_id(struct iommufd_ctx *ictx, u32 pt_id, 
struct device *dev)

But iommufd_vfio_ioctl seems no para dev?


We can set dev=Null since IOMMUFD_OBJ_HW_PAGETABLE does not need dev.
iommufd_hw_pagetable_from_id(ictx, hwpt_id, NULL)


this is not good. dev is passed in to this function to allocate domain
and also check sw_msi things. If you pass in a NULL, it may even unable
to get a domain for the hwpt. It won't work I guess.


The iommufd_hw_pagetable_from_id can be used for
1, allocate domain, which need para dev
case IOMMUFD_OBJ_IOAS
hwpt = iommufd_hw_pagetable_auto_get(ictx, ioas, dev);


this is used when attaching ioas.


2. Just return allocated domain via hwpt_id, which does not need dev.
case IOMMUFD_OBJ_HW_PAGETABLE:
return container_of(obj, struct iommufd_hw_pagetable, obj);


yes, this would be the usage in nesting. you may check my below
branch. It's for nesting integration.

https://github.com/luxis1999/iommufd/tree/iommufd-v5.18-rc4-nesting


By the way, any plan of the nested mode?

I'm working with Eric, Nic on it. Currently, I've got the above kernel
branch, QEMU side is also WIP.


Thanks


--
Regards,
Yi Liu
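

For reference, a minimal userspace sketch of the split discussed above: DMA
map/unmap is keyed on the ioas_id, while the hwpt_id returned by the attach
ioctl is only kept for page-table capability and nesting queries. The struct,
flag and ioctl names below follow the iommufd uAPI (<linux/iommufd.h>) as it
later landed upstream, so they may differ slightly from the RFC branches in
this thread.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

/* Map a chunk of guest RAM into the IOAS at a fixed IOVA. */
static int ioas_map_ram(int iommufd, uint32_t ioas_id,
                        uint64_t iova, void *vaddr, uint64_t len)
{
    struct iommu_ioas_map map = {
        .size = sizeof(map),
        .flags = IOMMU_IOAS_MAP_FIXED_IOVA |
                 IOMMU_IOAS_MAP_READABLE |
                 IOMMU_IOAS_MAP_WRITEABLE,
        .ioas_id = ioas_id,       /* map/unmap always target the IOAS */
        .user_va = (uint64_t)(uintptr_t)vaddr,
        .length  = len,
        .iova    = iova,
    };

    return ioctl(iommufd, IOMMU_IOAS_MAP, &map);
}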



Re: [RFC 00/18] vfio: Adopt iommufd

2022-05-18 Thread zhangfei....@foxmail.com




On 2022/5/17 下午4:55, Yi Liu wrote:

Hi Zhangfei,

On 2022/5/12 17:01, zhangfei@foxmail.com wrote:


Hi, Yi

On 2022/5/11 下午10:17, zhangfei@foxmail.com wrote:



On 2022/5/10 下午10:08, Yi Liu wrote:

On 2022/5/10 20:45, Jason Gunthorpe wrote:

On Tue, May 10, 2022 at 08:35:00PM +0800, Zhangfei Gao wrote:

Thanks Yi and Eric,
Then will wait for the updated iommufd kernel for the PCI MMIO 
region.


Another question,
How to get the iommu_domain in the ioctl.


The ID of the iommu_domain (called the hwpt) it should be returned by
the vfio attach ioctl.


yes, hwpt_id is returned by the vfio attach ioctl and recorded in
qemu. You can query page table related capabilities with this id.

https://lore.kernel.org/kvm/20220414104710.28534-16-yi.l@intel.com/ 




Thanks Yi,

Do we use iommufd_hw_pagetable_from_id in kernel?

The qemu send hwpt_id via ioctl.
Currently VFIOIOMMUFDContainer has hwpt_list,
Which member is good to save hwpt_id, IOMMUTLBEntry?


Can VFIOIOMMUFDContainer  have multi hwpt?


yes, it is possible
Then how to get hwpt_id in map/unmap_notify(IOMMUNotifier *n, 
IOMMUTLBEntry *iotlb)





Since VFIOIOMMUFDContainer has hwpt_list now.
If so, how to get specific hwpt from map/unmap_notify in 
hw/vfio/as.c, where no vbasedev can be used for compare.


I am testing with a workaround, adding VFIOIOASHwpt *hwpt in 
VFIOIOMMUFDContainer.

And save hwpt when vfio_device_attach_container.



In kernel ioctl: iommufd_vfio_ioctl
@dev: Device to get an iommu_domain for
iommufd_hw_pagetable_from_id(struct iommufd_ctx *ictx, u32 pt_id, 
struct device *dev)

But iommufd_vfio_ioctl seems no para dev?


We can set dev=Null since IOMMUFD_OBJ_HW_PAGETABLE does not need dev.
iommufd_hw_pagetable_from_id(ictx, hwpt_id, NULL)


this is not good. dev is passed in to this function to allocate domain
and also check sw_msi things. If you pass in a NULL, it may even unable
to get a domain for the hwpt. It won't work I guess.


The iommufd_hw_pagetable_from_id can be used for
1, allocate domain, which need para dev
case IOMMUFD_OBJ_IOAS
hwpt = iommufd_hw_pagetable_auto_get(ictx, ioas, dev);

2. Just return allocated domain via hwpt_id, which does not need dev.
case IOMMUFD_OBJ_HW_PAGETABLE:
return container_of(obj, struct iommufd_hw_pagetable, obj);

By the way, any plan of the nested mode?

Thanks



Re: [RFC 00/18] vfio: Adopt iommufd

2022-05-17 Thread Yi Liu

Hi Zhangfei,

On 2022/5/12 17:01, zhangfei@foxmail.com wrote:


Hi, Yi

On 2022/5/11 下午10:17, zhangfei@foxmail.com wrote:



On 2022/5/10 下午10:08, Yi Liu wrote:

On 2022/5/10 20:45, Jason Gunthorpe wrote:

On Tue, May 10, 2022 at 08:35:00PM +0800, Zhangfei Gao wrote:

Thanks Yi and Eric,
Then will wait for the updated iommufd kernel for the PCI MMIO region.

Another question,
How to get the iommu_domain in the ioctl.


The ID of the iommu_domain (called the hwpt) it should be returned by
the vfio attach ioctl.


yes, hwpt_id is returned by the vfio attach ioctl and recorded in
qemu. You can query page table related capabilities with this id.

https://lore.kernel.org/kvm/20220414104710.28534-16-yi.l@intel.com/


Thanks Yi,

Do we use iommufd_hw_pagetable_from_id in kernel?

The qemu send hwpt_id via ioctl.
Currently VFIOIOMMUFDContainer has hwpt_list,
Which member is good to save hwpt_id, IOMMUTLBEntry?


Can VFIOIOMMUFDContainer  have multi hwpt?


yes, it is possible


Since VFIOIOMMUFDContainer has hwpt_list now.
If so, how to get specific hwpt from map/unmap_notify in hw/vfio/as.c, 
where no vbasedev can be used for compare.


I am testing with a workaround, adding VFIOIOASHwpt *hwpt in 
VFIOIOMMUFDContainer.

And save hwpt when vfio_device_attach_container.



In kernel ioctl: iommufd_vfio_ioctl
@dev: Device to get an iommu_domain for
iommufd_hw_pagetable_from_id(struct iommufd_ctx *ictx, u32 pt_id, struct 
device *dev)

But iommufd_vfio_ioctl seems no para dev?


We can set dev=Null since IOMMUFD_OBJ_HW_PAGETABLE does not need dev.
iommufd_hw_pagetable_from_id(ictx, hwpt_id, NULL)


This is not good. dev is passed into this function to allocate the domain
and also to check sw_msi things. If you pass in NULL, it may even be unable
to get a domain for the hwpt. It won't work, I guess.


Thanks






--
Regards,
Yi Liu
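

To make the point about the dev parameter concrete, here is a rough sketch of
what the allocation path needs a device for. The helper below is illustrative
only (it is not the actual iommufd code); iommu_domain_alloc(),
iommu_get_resv_regions()/iommu_put_resv_regions() and IOMMU_RESV_SW_MSI are
the real kernel APIs of that era.

#include <linux/device.h>
#include <linux/iommu.h>
#include <linux/list.h>

/* Illustrative only: why a hwpt allocation path needs a real dev. */
static struct iommu_domain *alloc_domain_for_dev(struct device *dev,
                                                 phys_addr_t *sw_msi_start)
{
        struct iommu_resv_region *resv;
        struct iommu_domain *domain;
        LIST_HEAD(resv_regions);

        /* The domain type/ops come from the device's IOMMU, found via its bus. */
        domain = iommu_domain_alloc(dev->bus);
        if (!domain)
                return NULL;

        /*
         * The sw_msi window is a per-device reserved region (e.g. on ARM
         * SMMU); with dev == NULL there is nothing to query here.
         */
        iommu_get_resv_regions(dev, &resv_regions);
        list_for_each_entry(resv, &resv_regions, list) {
                if (resv->type == IOMMU_RESV_SW_MSI)
                        *sw_msi_start = resv->start;
        }
        iommu_put_resv_regions(dev, &resv_regions);

        return domain;
}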



Re: [RFC 00/18] vfio: Adopt iommufd

2022-05-17 Thread Yi Liu

Hi Zhangfei,

On 2022/5/11 22:17, zhangfei@foxmail.com wrote:



On 2022/5/10 下午10:08, Yi Liu wrote:

On 2022/5/10 20:45, Jason Gunthorpe wrote:

On Tue, May 10, 2022 at 08:35:00PM +0800, Zhangfei Gao wrote:

Thanks Yi and Eric,
Then will wait for the updated iommufd kernel for the PCI MMIO region.

Another question,
How to get the iommu_domain in the ioctl.


The ID of the iommu_domain (called the hwpt) it should be returned by
the vfio attach ioctl.


yes, hwpt_id is returned by the vfio attach ioctl and recorded in
qemu. You can query page table related capabilities with this id.

https://lore.kernel.org/kvm/20220414104710.28534-16-yi.l@intel.com/


Thanks Yi,

Do we use iommufd_hw_pagetable_from_id in kernel?

The qemu send hwpt_id via ioctl.
Currently VFIOIOMMUFDContainer has hwpt_list,
Which member is good to save hwpt_id, IOMMUTLBEntry?


Currently, we don't make use of the hwpt yet in the version we have
in the qemu branch. I have a change to make use of it. Also, it
would be used in the future for nested translation setup and for
querying dirty page bit support for a given domain.



In kernel ioctl: iommufd_vfio_ioctl
@dev: Device to get an iommu_domain for
iommufd_hw_pagetable_from_id(struct iommufd_ctx *ictx, u32 pt_id, struct 
device *dev)

But iommufd_vfio_ioctl seems no para dev?


There is. You can look at vfio_group_set_iommufd(), which loops over the
device_list provided by vfio, and the device info is passed to iommufd.


Thanks






--
Regards,
Yi Liu



Re: [RFC 00/18] vfio: Adopt iommufd

2022-05-12 Thread zhangfei....@foxmail.com



Hi, Yi

On 2022/5/11 下午10:17, zhangfei@foxmail.com wrote:



On 2022/5/10 下午10:08, Yi Liu wrote:

On 2022/5/10 20:45, Jason Gunthorpe wrote:

On Tue, May 10, 2022 at 08:35:00PM +0800, Zhangfei Gao wrote:

Thanks Yi and Eric,
Then will wait for the updated iommufd kernel for the PCI MMIO region.

Another question,
How to get the iommu_domain in the ioctl.


The ID of the iommu_domain (called the hwpt) it should be returned by
the vfio attach ioctl.


yes, hwpt_id is returned by the vfio attach ioctl and recorded in
qemu. You can query page table related capabilities with this id.

https://lore.kernel.org/kvm/20220414104710.28534-16-yi.l@intel.com/


Thanks Yi,

Do we use iommufd_hw_pagetable_from_id in kernel?

The qemu send hwpt_id via ioctl.
Currently VFIOIOMMUFDContainer has hwpt_list,
Which member is good to save hwpt_id, IOMMUTLBEntry?


Can VFIOIOMMUFDContainer  have multi hwpt?
Since VFIOIOMMUFDContainer has hwpt_list now.
If so, how to get specific hwpt from map/unmap_notify in hw/vfio/as.c, 
where no vbasedev can be used for compare.


I am testing with a workaround, adding VFIOIOASHwpt *hwpt in 
VFIOIOMMUFDContainer.

And save hwpt when vfio_device_attach_container.




In kernel ioctl: iommufd_vfio_ioctl
@dev: Device to get an iommu_domain for
iommufd_hw_pagetable_from_id(struct iommufd_ctx *ictx, u32 pt_id, 
struct device *dev)

But iommufd_vfio_ioctl seems no para dev?


We can set dev=Null since IOMMUFD_OBJ_HW_PAGETABLE does not need dev.
iommufd_hw_pagetable_from_id(ictx, hwpt_id, NULL)

Thanks







Re: [RFC 00/18] vfio: Adopt iommufd

2022-05-11 Thread zhangfei....@foxmail.com




On 2022/5/10 下午10:08, Yi Liu wrote:

On 2022/5/10 20:45, Jason Gunthorpe wrote:

On Tue, May 10, 2022 at 08:35:00PM +0800, Zhangfei Gao wrote:

Thanks Yi and Eric,
Then will wait for the updated iommufd kernel for the PCI MMIO region.

Another question,
How to get the iommu_domain in the ioctl.


The ID of the iommu_domain (called the hwpt) it should be returned by
the vfio attach ioctl.


yes, hwpt_id is returned by the vfio attach ioctl and recorded in
qemu. You can query page table related capabilities with this id.

https://lore.kernel.org/kvm/20220414104710.28534-16-yi.l@intel.com/


Thanks Yi,

Do we use iommufd_hw_pagetable_from_id in kernel?

The qemu send hwpt_id via ioctl.
Currently VFIOIOMMUFDContainer has hwpt_list,
Which member is good to save hwpt_id, IOMMUTLBEntry?

In kernel ioctl: iommufd_vfio_ioctl
@dev: Device to get an iommu_domain for
iommufd_hw_pagetable_from_id(struct iommufd_ctx *ictx, u32 pt_id, struct 
device *dev)

But iommufd_vfio_ioctl seems no para dev?

Thanks







Re: [RFC 00/18] vfio: Adopt iommufd

2022-05-10 Thread Yi Liu

On 2022/5/10 20:45, Jason Gunthorpe wrote:

On Tue, May 10, 2022 at 08:35:00PM +0800, Zhangfei Gao wrote:

Thanks Yi and Eric,
Then will wait for the updated iommufd kernel for the PCI MMIO region.

Another question,
How to get the iommu_domain in the ioctl.


The ID of the iommu_domain (called the hwpt) it should be returned by
the vfio attach ioctl.


yes, hwpt_id is returned by the vfio attach ioctl and recorded in
qemu. You can query page table related capabilities with this id.

https://lore.kernel.org/kvm/20220414104710.28534-16-yi.l@intel.com/

--
Regards,
Yi Liu



Re: [RFC 00/18] vfio: Adopt iommufd

2022-05-10 Thread Jason Gunthorpe
On Tue, May 10, 2022 at 08:35:00PM +0800, Zhangfei Gao wrote:
> Thanks Yi and Eric,
> Then will wait for the updated iommufd kernel for the PCI MMIO region.
> 
> Another question,
> How to get the iommu_domain in the ioctl.

The ID of the iommu_domain (called the hwpt) it should be returned by
the vfio attach ioctl.

Jason
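

A rough sketch of that flow from the device fd side, assuming the
VFIO_DEVICE_ATTACH_IOAS layout proposed in the RFC kernel series
(argsz/flags/iommufd/ioas_id/out_hwpt_id, from the RFC's <linux/vfio.h>).
This uAPI was RFC-only and did not land upstream in this form, so treat the
struct and ioctl names as assumptions.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>   /* RFC branch headers assumed */

static int attach_device_to_ioas(int device_fd, int iommufd,
                                 uint32_t ioas_id, uint32_t *hwpt_id)
{
    struct vfio_device_attach_ioas attach = {
        .argsz = sizeof(attach),
        .iommufd = iommufd,
        .ioas_id = ioas_id,
    };

    if (ioctl(device_fd, VFIO_DEVICE_ATTACH_IOAS, &attach))
        return -1;

    /* The kernel reports which hw page table (iommu_domain) was used. */
    *hwpt_id = attach.out_hwpt_id;
    return 0;
}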



Re: [RFC 00/18] vfio: Adopt iommufd

2022-05-10 Thread Zhangfei Gao




On 2022/5/10 下午2:51, Eric Auger wrote:

Hi Zhangfei,

On 5/10/22 05:17, Yi Liu wrote:

Hi Zhangfei,

On 2022/5/9 22:24, Zhangfei Gao wrote:

Hi, Alex

On 2022/4/27 上午12:35, Alex Williamson wrote:

On Tue, 26 Apr 2022 12:43:35 +
Shameerali Kolothum Thodi  wrote:


-Original Message-
From: Eric Auger [mailto:eric.au...@redhat.com]
Sent: 26 April 2022 12:45
To: Shameerali Kolothum Thodi
; Yi
Liu ; alex.william...@redhat.com;
coh...@redhat.com;
qemu-devel@nongnu.org
Cc: da...@gibson.dropbear.id.au; th...@redhat.com;
far...@linux.ibm.com;
mjros...@linux.ibm.com; akrow...@linux.ibm.com; pa...@linux.ibm.com;
jjhe...@linux.ibm.com; jasow...@redhat.com; k...@vger.kernel.org;
j...@nvidia.com; nicol...@nvidia.com; eric.auger@gmail.com;
kevin.t...@intel.com; chao.p.p...@intel.com; yi.y@intel.com;
pet...@redhat.com; Zhangfei Gao 
Subject: Re: [RFC 00/18] vfio: Adopt iommufd

[...]

https://lore.kernel.org/kvm/0-v1-e79cd8d168e8+6-iommufd_...@nvidia.com


/
[2] https://github.com/luxis1999/iommufd/tree/iommufd-v5.17-rc6
[3]
https://github.com/luxis1999/qemu/tree/qemu-for-5.17-rc6-vm-rfcv1

Hi,

I had a go with the above branches on our ARM64 platform trying to

pass-through

a VF dev, but Qemu reports an error as below,

[    0.444728] hisi_sec2 :00:01.0: enabling device ( ->
0002)
qemu-system-aarch64-iommufd: IOMMU_IOAS_MAP failed: Bad address
qemu-system-aarch64-iommufd: vfio_container_dma_map(0xfeb40ce0,

0x80, 0x1, 0xb40ef000) = -14 (Bad address)

I think this happens for the dev BAR addr range. I haven't
debugged the

kernel

yet to see where it actually reports that.

Does it prevent your assigned device from working? I have such errors
too but this is a known issue. This is due to the fact P2P DMA is not
supported yet.

Yes, the basic tests all good so far. I am still not very clear how
it works if
the map() fails though. It looks like it fails in,

iommufd_ioas_map()
    iopt_map_user_pages()
     iopt_map_pages()
     ..
   pfn_reader_pin_pages()

So does it mean it just works because the page is resident()?

No, it just means that you're not triggering any accesses that require
peer-to-peer DMA support.  Any sort of test where the device is only
performing DMA to guest RAM, which is by far the standard use case,
will work fine.  This also doesn't affect vCPU access to BAR space.
It's only a failure of the mappings of the BAR space into the IOAS,
which is only used when a device tries to directly target another
device's BAR space via DMA.  Thanks,

I also get this issue when trying to add a prereg listener:

+    container->prereg_listener = vfio_memory_prereg_listener;
+    memory_listener_register(&container->prereg_listener,
+                             &address_space_memory);

host kernel log:
iommufd_ioas_map 1 iova=80, iova1=80,
cmd->iova=80, cmd->user_va=9c495000, cmd->length=1
iopt_alloc_area input area=859a2d00 iova=80
iopt_alloc_area area=859a2d00 iova=80
pin_user_pages_remote rc=-14

qemu log:
vfio_prereg_listener_region_add
iommufd_map iova=0x80
qemu-system-aarch64: IOMMU_IOAS_MAP failed: Bad address
qemu-system-aarch64: vfio_dma_map(0xfb96a930, 0x80,
0x1, 0x9c495000) = -14 (Bad address)
qemu-system-aarch64: (null)
double free or corruption (fasttop)
Aborted (core dumped)

With hack of ignoring address 0x80 in map and unmap, kernel
can boot.

do you know if the iova 0x80 guest RAM or MMIO? Currently,
iommufd kernel part doesn't support mapping device BAR MMIO. This is a
known gap.

In qemu arm virt machine this indeed matches the PCI MMIO region.


Thanks Yi and Eric,
Then will wait for the updated iommufd kernel for the PCI MMIO region.

Another question,
How to get the iommu_domain in the ioctl.

QEMU can get container->ioas_id.

The kernel can get the ioas via the ioas_id.
But how to get the domain?
Currently I am hacking with ioas->iopt.next_domain_id, which is increasing:
domain = xa_load(&ioas->iopt.domains, ioas->iopt.next_domain_id - 1);

Any idea?

Thanks
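

A side note on the hack above: next_domain_id - 1 only finds the right domain
while the IOAS holds exactly one. With multiple domains the xarray would have
to be walked and matched against the device/hwpt of interest. A rough sketch,
relying on the RFC-internal iommufd_ioas layout quoted above (not upstream
code):

#include <linux/xarray.h>

static struct iommu_domain *find_domain_sketch(struct iommufd_ioas *ioas)
{
        struct iommu_domain *domain;
        unsigned long index;

        xa_for_each(&ioas->iopt.domains, index, domain) {
                /*
                 * Here we would match against the device/hwpt of interest;
                 * returning the first entry just mirrors the
                 * next_domain_id - 1 hack for the single-domain case.
                 */
                return domain;
        }
        return NULL;
}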



Re: [RFC 00/18] vfio: Adopt iommufd

2022-05-10 Thread Eric Auger
Hi Zhangfei,

On 5/10/22 05:17, Yi Liu wrote:
> Hi Zhangfei,
>
> On 2022/5/9 22:24, Zhangfei Gao wrote:
>> Hi, Alex
>>
>> On 2022/4/27 上午12:35, Alex Williamson wrote:
>>> On Tue, 26 Apr 2022 12:43:35 +
>>> Shameerali Kolothum Thodi  wrote:
>>>
>>>>> -Original Message-
>>>>> From: Eric Auger [mailto:eric.au...@redhat.com]
>>>>> Sent: 26 April 2022 12:45
>>>>> To: Shameerali Kolothum Thodi
>>>>> ; Yi
>>>>> Liu ; alex.william...@redhat.com;
>>>>> coh...@redhat.com;
>>>>> qemu-devel@nongnu.org
>>>>> Cc: da...@gibson.dropbear.id.au; th...@redhat.com;
>>>>> far...@linux.ibm.com;
>>>>> mjros...@linux.ibm.com; akrow...@linux.ibm.com; pa...@linux.ibm.com;
>>>>> jjhe...@linux.ibm.com; jasow...@redhat.com; k...@vger.kernel.org;
>>>>> j...@nvidia.com; nicol...@nvidia.com; eric.auger@gmail.com;
>>>>> kevin.t...@intel.com; chao.p.p...@intel.com; yi.y@intel.com;
>>>>> pet...@redhat.com; Zhangfei Gao 
>>>>> Subject: Re: [RFC 00/18] vfio: Adopt iommufd
>>>> [...]
>>>>> https://lore.kernel.org/kvm/0-v1-e79cd8d168e8+6-iommufd_...@nvidia.com
>>>>>
>>>>>>> /
>>>>>>> [2] https://github.com/luxis1999/iommufd/tree/iommufd-v5.17-rc6
>>>>>>> [3]
>>>>>>> https://github.com/luxis1999/qemu/tree/qemu-for-5.17-rc6-vm-rfcv1
>>>>>> Hi,
>>>>>>
>>>>>> I had a go with the above branches on our ARM64 platform trying to
>>>>> pass-through
>>>>>> a VF dev, but Qemu reports an error as below,
>>>>>>
>>>>>> [    0.444728] hisi_sec2 :00:01.0: enabling device ( ->
>>>>>> 0002)
>>>>>> qemu-system-aarch64-iommufd: IOMMU_IOAS_MAP failed: Bad address
>>>>>> qemu-system-aarch64-iommufd: vfio_container_dma_map(0xfeb40ce0,
>>>>> 0x80, 0x1, 0xb40ef000) = -14 (Bad address)
>>>>>> I think this happens for the dev BAR addr range. I haven't
>>>>>> debugged the
>>>>> kernel
>>>>>> yet to see where it actually reports that.
>>>>> Does it prevent your assigned device from working? I have such errors
>>>>> too but this is a known issue. This is due to the fact P2P DMA is not
>>>>> supported yet.
>>>> Yes, the basic tests all good so far. I am still not very clear how
>>>> it works if
>>>> the map() fails though. It looks like it fails in,
>>>>
>>>> iommufd_ioas_map()
>>>>    iopt_map_user_pages()
>>>>     iopt_map_pages()
>>>>     ..
>>>>   pfn_reader_pin_pages()
>>>>
>>>> So does it mean it just works because the page is resident()?
>>> No, it just means that you're not triggering any accesses that require
>>> peer-to-peer DMA support.  Any sort of test where the device is only
>>> performing DMA to guest RAM, which is by far the standard use case,
>>> will work fine.  This also doesn't affect vCPU access to BAR space.
>>> It's only a failure of the mappings of the BAR space into the IOAS,
>>> which is only used when a device tries to directly target another
>>> device's BAR space via DMA.  Thanks,
>>
>> I also get this issue when trying to add a prereg listener:
>>
>> +    container->prereg_listener = vfio_memory_prereg_listener;
>> +    memory_listener_register(&container->prereg_listener,
>> +                             &address_space_memory);
>>
>> host kernel log:
>> iommufd_ioas_map 1 iova=80, iova1=80,
>> cmd->iova=80, cmd->user_va=9c495000, cmd->length=1
>> iopt_alloc_area input area=859a2d00 iova=80
>> iopt_alloc_area area=859a2d00 iova=80
>> pin_user_pages_remote rc=-14
>>
>> qemu log:
>> vfio_prereg_listener_region_add
>> iommufd_map iova=0x80
>> qemu-system-aarch64: IOMMU_IOAS_MAP failed: Bad address
>> qemu-system-aarch64: vfio_dma_map(0xfb96a930, 0x80,
>> 0x1, 0x9c495000) = -14 (Bad address)
>> qemu-system-aarch64: (null)
>> double free or corruption (fasttop)
>> Aborted (core dumped)
>>
>> With hack of ignoring address 0x80 in map and unmap, kernel
>> can boot.
>
> do you know if the iova 0x80 guest RAM or MMIO? Currently,
> iommufd kernel part doesn't support mapping device BAR MMIO. This is a
> known gap.
In qemu arm virt machine this indeed matches the PCI MMIO region.

Thanks

Eric




Re: [RFC 00/18] vfio: Adopt iommufd

2022-05-09 Thread Yi Liu

Hi Zhangfei,

On 2022/5/9 22:24, Zhangfei Gao wrote:

Hi, Alex

On 2022/4/27 上午12:35, Alex Williamson wrote:

On Tue, 26 Apr 2022 12:43:35 +
Shameerali Kolothum Thodi  wrote:


-Original Message-
From: Eric Auger [mailto:eric.au...@redhat.com]
Sent: 26 April 2022 12:45
To: Shameerali Kolothum Thodi ; Yi
Liu ; alex.william...@redhat.com; coh...@redhat.com;
qemu-devel@nongnu.org
Cc: da...@gibson.dropbear.id.au; th...@redhat.com; far...@linux.ibm.com;
mjros...@linux.ibm.com; akrow...@linux.ibm.com; pa...@linux.ibm.com;
jjhe...@linux.ibm.com; jasow...@redhat.com; k...@vger.kernel.org;
j...@nvidia.com; nicol...@nvidia.com; eric.auger@gmail.com;
kevin.t...@intel.com; chao.p.p...@intel.com; yi.y@intel.com;
pet...@redhat.com; Zhangfei Gao 
Subject: Re: [RFC 00/18] vfio: Adopt iommufd

[...]

https://lore.kernel.org/kvm/0-v1-e79cd8d168e8+6-iommufd_...@nvidia.com

/
[2] https://github.com/luxis1999/iommufd/tree/iommufd-v5.17-rc6
[3] https://github.com/luxis1999/qemu/tree/qemu-for-5.17-rc6-vm-rfcv1

Hi,

I had a go with the above branches on our ARM64 platform trying to

pass-through

a VF dev, but Qemu reports an error as below,

[    0.444728] hisi_sec2 :00:01.0: enabling device ( -> 0002)
qemu-system-aarch64-iommufd: IOMMU_IOAS_MAP failed: Bad address
qemu-system-aarch64-iommufd: vfio_container_dma_map(0xfeb40ce0,

0x80, 0x1, 0xb40ef000) = -14 (Bad address)

I think this happens for the dev BAR addr range. I haven't debugged the

kernel

yet to see where it actually reports that.

Does it prevent your assigned device from working? I have such errors
too but this is a known issue. This is due to the fact P2P DMA is not
supported yet.
Yes, the basic tests all good so far. I am still not very clear how it 
works if

the map() fails though. It looks like it fails in,

iommufd_ioas_map()
   iopt_map_user_pages()
    iopt_map_pages()
    ..
  pfn_reader_pin_pages()

So does it mean it just works because the page is resident()?

No, it just means that you're not triggering any accesses that require
peer-to-peer DMA support.  Any sort of test where the device is only
performing DMA to guest RAM, which is by far the standard use case,
will work fine.  This also doesn't affect vCPU access to BAR space.
It's only a failure of the mappings of the BAR space into the IOAS,
which is only used when a device tries to directly target another
device's BAR space via DMA.  Thanks,


I also get this issue when trying to add a prereg listener:

+    container->prereg_listener = vfio_memory_prereg_listener;
+    memory_listener_register(&container->prereg_listener,
+                             &address_space_memory);

host kernel log:
iommufd_ioas_map 1 iova=80, iova1=80, cmd->iova=80, 
cmd->user_va=9c495000, cmd->length=1

iopt_alloc_area input area=859a2d00 iova=80
iopt_alloc_area area=859a2d00 iova=80
pin_user_pages_remote rc=-14

qemu log:
vfio_prereg_listener_region_add
iommufd_map iova=0x80
qemu-system-aarch64: IOMMU_IOAS_MAP failed: Bad address
qemu-system-aarch64: vfio_dma_map(0xfb96a930, 0x80, 0x1, 
0x9c495000) = -14 (Bad address)

qemu-system-aarch64: (null)
double free or corruption (fasttop)
Aborted (core dumped)

With hack of ignoring address 0x80 in map and unmap, kernel can boot.


Do you know if the iova 0x80 is guest RAM or MMIO? Currently, the iommufd
kernel part doesn't support mapping device BAR MMIO. This is a known gap.


--
Regards,
Yi Liu



Re: [RFC 00/18] vfio: Adopt iommufd

2022-05-09 Thread Zhangfei Gao

Hi, Alex

On 2022/4/27 上午12:35, Alex Williamson wrote:

On Tue, 26 Apr 2022 12:43:35 +
Shameerali Kolothum Thodi  wrote:


-Original Message-
From: Eric Auger [mailto:eric.au...@redhat.com]
Sent: 26 April 2022 12:45
To: Shameerali Kolothum Thodi ; Yi
Liu ; alex.william...@redhat.com; coh...@redhat.com;
qemu-devel@nongnu.org
Cc: da...@gibson.dropbear.id.au; th...@redhat.com; far...@linux.ibm.com;
mjros...@linux.ibm.com; akrow...@linux.ibm.com; pa...@linux.ibm.com;
jjhe...@linux.ibm.com; jasow...@redhat.com; k...@vger.kernel.org;
j...@nvidia.com; nicol...@nvidia.com; eric.auger@gmail.com;
kevin.t...@intel.com; chao.p.p...@intel.com; yi.y@intel.com;
pet...@redhat.com; Zhangfei Gao 
Subject: Re: [RFC 00/18] vfio: Adopt iommufd

[...]
  
  

https://lore.kernel.org/kvm/0-v1-e79cd8d168e8+6-iommufd_...@nvidia.com

/
[2] https://github.com/luxis1999/iommufd/tree/iommufd-v5.17-rc6
[3] https://github.com/luxis1999/qemu/tree/qemu-for-5.17-rc6-vm-rfcv1

Hi,

I had a go with the above branches on our ARM64 platform trying to

pass-through

a VF dev, but Qemu reports an error as below,

[0.444728] hisi_sec2 :00:01.0: enabling device ( -> 0002)
qemu-system-aarch64-iommufd: IOMMU_IOAS_MAP failed: Bad address
qemu-system-aarch64-iommufd: vfio_container_dma_map(0xfeb40ce0,

0x80, 0x1, 0xb40ef000) = -14 (Bad address)

I think this happens for the dev BAR addr range. I haven't debugged the

kernel

yet to see where it actually reports that.

Does it prevent your assigned device from working? I have such errors
too but this is a known issue. This is due to the fact P2P DMA is not
supported yet.
   

Yes, the basic tests all good so far. I am still not very clear how it works if
the map() fails though. It looks like it fails in,

iommufd_ioas_map()
   iopt_map_user_pages()
iopt_map_pages()
..
  pfn_reader_pin_pages()

So does it mean it just works because the page is resident()?

No, it just means that you're not triggering any accesses that require
peer-to-peer DMA support.  Any sort of test where the device is only
performing DMA to guest RAM, which is by far the standard use case,
will work fine.  This also doesn't affect vCPU access to BAR space.
It's only a failure of the mappings of the BAR space into the IOAS,
which is only used when a device tries to directly target another
device's BAR space via DMA.  Thanks,


I also get this issue when trying to add a prereg listener:

+    container->prereg_listener = vfio_memory_prereg_listener;
+    memory_listener_register(&container->prereg_listener,
+                             &address_space_memory);

host kernel log:
iommufd_ioas_map 1 iova=80, iova1=80, 
cmd->iova=80, cmd->user_va=9c495000, cmd->length=1

iopt_alloc_area input area=859a2d00 iova=80
iopt_alloc_area area=859a2d00 iova=80
pin_user_pages_remote rc=-14

qemu log:
vfio_prereg_listener_region_add
iommufd_map iova=0x80
qemu-system-aarch64: IOMMU_IOAS_MAP failed: Bad address
qemu-system-aarch64: vfio_dma_map(0xfb96a930, 0x80, 0x1, 
0x9c495000) = -14 (Bad address)

qemu-system-aarch64: (null)
double free or corruption (fasttop)
Aborted (core dumped)

With hack of ignoring address 0x80 in map and unmap, kernel can 
boot.


Thanks
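

For what it's worth, the "ignore that address" hack can be expressed more
generically by skipping anything that is not plain guest RAM in the
prereg-style listener. The predicates below are standard QEMU MemoryRegion
helpers; the function name and its use are only a sketch of the workaround,
not the actual patch.

#include "qemu/osdep.h"
#include "exec/memory.h"

/* Sketch: filter out device BARs / MMIO before trying IOMMU_IOAS_MAP. */
static bool prereg_section_is_mappable(MemoryRegionSection *section)
{
    /*
     * Device BAR ranges show up as ram_device (or plain MMIO) regions;
     * only real guest RAM can be pinned and mapped into the IOAS today.
     */
    return memory_region_is_ram(section->mr) &&
           !memory_region_is_ram_device(section->mr);
}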




RE: [RFC 00/18] vfio: Adopt iommufd

2022-04-28 Thread Tian, Kevin
> From: Daniel P. Berrangé 
> Sent: Friday, April 29, 2022 12:20 AM
> 
> On Thu, Apr 28, 2022 at 08:24:48AM -0600, Alex Williamson wrote:
> > On Thu, 28 Apr 2022 03:21:45 +
> > "Tian, Kevin"  wrote:
> >
> > > > From: Alex Williamson 
> > > > Sent: Wednesday, April 27, 2022 12:22 AM
> > > > > >
> > > > > > My expectation would be that libvirt uses:
> > > > > >
> > > > > >  -object iommufd,id=iommufd0,fd=NNN
> > > > > >  -device vfio-pci,fd=MMM,iommufd=iommufd0
> > > > > >
> > > > > > Whereas simple QEMU command line would be:
> > > > > >
> > > > > >  -object iommufd,id=iommufd0
> > > > > >  -device vfio-pci,iommufd=iommufd0,host=:02:00.0
> > > > > >
> > > > > > The iommufd object would open /dev/iommufd itself.  Creating an
> > > > > > implicit iommufd object is somewhat problematic because one of the
> > > > > > things I forgot to highlight in my previous description is that the
> > > > > > iommufd object is meant to be shared across not only various vfio
> > > > > > devices (platform, ccw, ap, nvme, etc), but also across subsystems,
> ex.
> > > > > > vdpa.
> > > > >
> > > > > Out of curiosity - in concept one iommufd is sufficient to support all
> > > > > ioas requirements across subsystems while having multiple
> iommufd's
> > > > > instead lose the benefit of centralized accounting. The latter will 
> > > > > also
> > > > > cause some trouble when we start virtualizing ENQCMD which
> requires
> > > > > VM-wide PASID virtualization thus further needs to share that
> > > > > information across iommufd's. Not unsolvable but really no gain by
> > > > > adding such complexity. So I'm curious whether Qemu provide
> > > > > a way to restrict that certain object type can only have one instance
> > > > > to discourage such multi-iommufd attempt?
> > > >
> > > > I don't see any reason for QEMU to restrict iommufd objects.  The
> QEMU
> > > > philosophy seems to be to let users create whatever configuration they
> > > > want.  For libvirt though, the assumption would be that a single
> > > > iommufd object can be used across subsystems, so libvirt would never
> > > > automatically create multiple objects.
> > >
> > > I like the flexibility that the object approach gives in your proposal.
> > > But with the said complexity in mind (with no foreseen benefit), I wonder
> >
> > What's the actual complexity?  Front-end/backend splits are very common
> > in QEMU.  We're making the object connection via name, why is it
> > significantly more complicated to allow multiple iommufd objects?  On
> > the contrary, it seems to me that we'd need to go out of our way to add
> > code to block multiple iommufd objects.

Probably it's just a hypothetical concern I had when thinking about the need
to manage certain global information (e.g. PASID virtualization) across
iommufd's down the road. With your and Daniel's replies I think we'll
try to follow the common practice in QEMU first, given there are
more positive reasons to do so than the hypothetical concern itself.

> >
> > > whether an alternative approach which treats iommufd as a global
> > > property instead of an object is acceptable in Qemu, i.e.:
> > >
> > > -iommufd on/off
> > > -device vfio-pci,iommufd,[fd=MMM/host=:02:00.0]
> > >
> > > All devices with iommufd specified then implicitly share a single iommufd
> > > object within Qemu.
> >
> > QEMU requires key-value pairs AFAIK, so the above doesn't work, then
> > we're just back to the iommufd=on/off.
> >
> > > This still allows vfio devices to be specified via fd but just requires 
> > > Libvirt
> > > to grant file permission on /dev/iommu. Is it a worthwhile tradeoff to be
> > > considered or just not a typical way in Qemu philosophy e.g. any object
> > > associated with a device must be explicitly specified?
> >
> > Avoiding QEMU opening files was a significant focus of my alternate
> > proposal.  Also note that we must be able to support hotplug, so we
> > need to be able to dynamically add and remove the iommufd object, I
> > don't see that a global property allows for that.  Implicit
> > associations of devices to shared resources doesn't seem particularly
> > desirable to me.  Thanks,
> 
> Adding new global properties/options is rather an anti-pattern for QEMU
> these days. Using -object is the right approach. If you only want to
> allow for one of them, just document this requirement. We've got other
> objects which are singletons like all the confidential guest classes
> for each arch.
> 

Good to know there is such a last resort. As said, we'll try to avoid this
restriction and follow Alex's proposal unless unexpectedly unreasonable
complexities arise later.

Thanks
Kevin


Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-28 Thread Daniel P . Berrangé
On Thu, Apr 28, 2022 at 08:24:48AM -0600, Alex Williamson wrote:
> On Thu, 28 Apr 2022 03:21:45 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Wednesday, April 27, 2022 12:22 AM  
> > > > >
> > > > > My expectation would be that libvirt uses:
> > > > >
> > > > >  -object iommufd,id=iommufd0,fd=NNN
> > > > >  -device vfio-pci,fd=MMM,iommufd=iommufd0
> > > > >
> > > > > Whereas simple QEMU command line would be:
> > > > >
> > > > >  -object iommufd,id=iommufd0
> > > > >  -device vfio-pci,iommufd=iommufd0,host=:02:00.0
> > > > >
> > > > > The iommufd object would open /dev/iommufd itself.  Creating an
> > > > > implicit iommufd object is somewhat problematic because one of the
> > > > > things I forgot to highlight in my previous description is that the
> > > > > iommufd object is meant to be shared across not only various vfio
> > > > > devices (platform, ccw, ap, nvme, etc), but also across subsystems, 
> > > > > ex.
> > > > > vdpa.  
> > > >
> > > > Out of curiosity - in concept one iommufd is sufficient to support all
> > > > ioas requirements across subsystems while having multiple iommufd's
> > > > instead lose the benefit of centralized accounting. The latter will also
> > > > cause some trouble when we start virtualizing ENQCMD which requires
> > > > VM-wide PASID virtualization thus further needs to share that
> > > > information across iommufd's. Not unsolvable but really no gain by
> > > > adding such complexity. So I'm curious whether Qemu provide
> > > > a way to restrict that certain object type can only have one instance
> > > > to discourage such multi-iommufd attempt?  
> > > 
> > > I don't see any reason for QEMU to restrict iommufd objects.  The QEMU
> > > philosophy seems to be to let users create whatever configuration they
> > > want.  For libvirt though, the assumption would be that a single
> > > iommufd object can be used across subsystems, so libvirt would never
> > > automatically create multiple objects.  
> > 
> > I like the flexibility that the object approach gives in your proposal.
> > But with the said complexity in mind (with no foreseen benefit), I wonder
> 
> What's the actual complexity?  Front-end/backend splits are very common
> in QEMU.  We're making the object connection via name, why is it
> significantly more complicated to allow multiple iommufd objects?  On
> the contrary, it seems to me that we'd need to go out of our way to add
> code to block multiple iommufd objects.
> 
> > whether an alternative approach which treats iommufd as a global
> > property instead of an object is acceptable in Qemu, i.e.:
> > 
> > -iommufd on/off
> > -device vfio-pci,iommufd,[fd=MMM/host=:02:00.0]
> > 
> > All devices with iommufd specified then implicitly share a single iommufd
> > object within Qemu.
> 
> QEMU requires key-value pairs AFAIK, so the above doesn't work, then
> we're just back to the iommufd=on/off.
>  
> > This still allows vfio devices to be specified via fd but just requires 
> > Libvirt
> > to grant file permission on /dev/iommu. Is it a worthwhile tradeoff to be
> > considered or just not a typical way in Qemu philosophy e.g. any object
> > associated with a device must be explicitly specified?
> 
> Avoiding QEMU opening files was a significant focus of my alternate
> proposal.  Also note that we must be able to support hotplug, so we
> need to be able to dynamically add and remove the iommufd object, I
> don't see that a global property allows for that.  Implicit
> associations of devices to shared resources doesn't seem particularly
> desirable to me.  Thanks,

Adding new global properties/options is rather an anti-pattern for QEMU
these days. Using -object is the right approach. If you only want to
allow for one of them, just document this requirement. We've got other
objects which are singletons like all the confidential guest classes
for each arch.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-28 Thread Alex Williamson
On Thu, 28 Apr 2022 03:21:45 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson 
> > Sent: Wednesday, April 27, 2022 12:22 AM  
> > > >
> > > > My expectation would be that libvirt uses:
> > > >
> > > >  -object iommufd,id=iommufd0,fd=NNN
> > > >  -device vfio-pci,fd=MMM,iommufd=iommufd0
> > > >
> > > > Whereas simple QEMU command line would be:
> > > >
> > > >  -object iommufd,id=iommufd0
> > > >  -device vfio-pci,iommufd=iommufd0,host=:02:00.0
> > > >
> > > > The iommufd object would open /dev/iommufd itself.  Creating an
> > > > implicit iommufd object is somewhat problematic because one of the
> > > > things I forgot to highlight in my previous description is that the
> > > > iommufd object is meant to be shared across not only various vfio
> > > > devices (platform, ccw, ap, nvme, etc), but also across subsystems, ex.
> > > > vdpa.  
> > >
> > > Out of curiosity - in concept one iommufd is sufficient to support all
> > > ioas requirements across subsystems while having multiple iommufd's
> > > instead lose the benefit of centralized accounting. The latter will also
> > > cause some trouble when we start virtualizing ENQCMD which requires
> > > VM-wide PASID virtualization thus further needs to share that
> > > information across iommufd's. Not unsolvable but really no gain by
> > > adding such complexity. So I'm curious whether Qemu provide
> > > a way to restrict that certain object type can only have one instance
> > > to discourage such multi-iommufd attempt?  
> > 
> > I don't see any reason for QEMU to restrict iommufd objects.  The QEMU
> > philosophy seems to be to let users create whatever configuration they
> > want.  For libvirt though, the assumption would be that a single
> > iommufd object can be used across subsystems, so libvirt would never
> > automatically create multiple objects.  
> 
> I like the flexibility that the object approach gives in your proposal.
> But with the said complexity in mind (with no foreseen benefit), I wonder

What's the actual complexity?  Front-end/backend splits are very common
in QEMU.  We're making the object connection via name, why is it
significantly more complicated to allow multiple iommufd objects?  On
the contrary, it seems to me that we'd need to go out of our way to add
code to block multiple iommufd objects.

> whether an alternative approach which treats iommufd as a global
> property instead of an object is acceptable in Qemu, i.e.:
> 
> -iommufd on/off
> -device vfio-pci,iommufd,[fd=MMM/host=:02:00.0]
> 
> All devices with iommufd specified then implicitly share a single iommufd
> object within Qemu.

QEMU requires key-value pairs AFAIK, so the above doesn't work, then
we're just back to the iommufd=on/off.
 
> This still allows vfio devices to be specified via fd but just requires 
> Libvirt
> to grant file permission on /dev/iommu. Is it a worthwhile tradeoff to be
> considered or just not a typical way in Qemu philosophy e.g. any object
> associated with a device must be explicitly specified?

Avoiding QEMU opening files was a significant focus of my alternate
proposal.  Also note that we must be able to support hotplug, so we
need to be able to dynamically add and remove the iommufd object, I
don't see that a global property allows for that.  Implicit
associations of devices to shared resources doesn't seem particularly
desirable to me.  Thanks,

Alex




RE: [RFC 00/18] vfio: Adopt iommufd

2022-04-27 Thread Tian, Kevin
> From: Alex Williamson 
> Sent: Wednesday, April 27, 2022 12:22 AM
> > >
> > > My expectation would be that libvirt uses:
> > >
> > >  -object iommufd,id=iommufd0,fd=NNN
> > >  -device vfio-pci,fd=MMM,iommufd=iommufd0
> > >
> > > Whereas simple QEMU command line would be:
> > >
> > >  -object iommufd,id=iommufd0
> > >  -device vfio-pci,iommufd=iommufd0,host=:02:00.0
> > >
> > > The iommufd object would open /dev/iommufd itself.  Creating an
> > > implicit iommufd object is somewhat problematic because one of the
> > > things I forgot to highlight in my previous description is that the
> > > iommufd object is meant to be shared across not only various vfio
> > > devices (platform, ccw, ap, nvme, etc), but also across subsystems, ex.
> > > vdpa.
> >
> > Out of curiosity - in concept one iommufd is sufficient to support all
> > ioas requirements across subsystems while having multiple iommufd's
> > instead lose the benefit of centralized accounting. The latter will also
> > cause some trouble when we start virtualizing ENQCMD which requires
> > VM-wide PASID virtualization thus further needs to share that
> > information across iommufd's. Not unsolvable but really no gain by
> > adding such complexity. So I'm curious whether Qemu provide
> > a way to restrict that certain object type can only have one instance
> > to discourage such multi-iommufd attempt?
> 
> I don't see any reason for QEMU to restrict iommufd objects.  The QEMU
> philosophy seems to be to let users create whatever configuration they
> want.  For libvirt though, the assumption would be that a single
> iommufd object can be used across subsystems, so libvirt would never
> automatically create multiple objects.

I like the flexibility that the object approach gives in your proposal.
But with the said complexity in mind (with no foreseen benefit), I wonder
whether an alternative approach which treats iommufd as a global
property instead of an object is acceptable in Qemu, i.e.:

-iommufd on/off
-device vfio-pci,iommufd,[fd=MMM/host=:02:00.0]

All devices with iommufd specified then implicitly share a single iommufd
object within Qemu.

This still allows vfio devices to be specified via fd but just requires Libvirt
to grant file permission on /dev/iommu. Is it a worthwhile tradeoff to be
considered or just not a typical way in Qemu philosophy e.g. any object
associated with a device must be explicitly specified?

Thanks
Kevin


Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-26 Thread Alex Williamson
On Tue, 26 Apr 2022 13:42:17 -0300
Jason Gunthorpe  wrote:

> On Tue, Apr 26, 2022 at 10:21:59AM -0600, Alex Williamson wrote:
> > We also need to be able to advise libvirt as to how each iommufd object
> > or user of that object factors into the VM locked memory requirement.
> > When used by vfio-pci, we're only mapping VM RAM, so we'd ask libvirt
> > to set the locked memory limit to the size of VM RAM per iommufd,
> > regardless of the number of devices using a given iommufd.  However, I
> > don't know if all users of iommufd will be exclusively mapping VM RAM.
> > Combinations of devices where some map VM RAM and others map QEMU
> > buffer space could still require some incremental increase per device
> > (I'm not sure if vfio-nvme is such a device).  It seems like heuristics
> > will still be involved even after iommufd solves the per-device
> > vfio-pci locked memory limit issue.  Thanks,  
> 
> If the model is to pass the FD, how about we put a limit on the FD
> itself instead of abusing the locked memory limit?
> 
> We could have a no-way-out ioctl that directly limits the # of PFNs
> covered by iopt_pages inside an iommufd.

FD passing would likely only be the standard for libvirt invoked VMs.
The QEMU vfio-pci device would still parse a host= or sysfsdev= option
when invoked by mortals and associate to use the legacy vfio group
interface or the new vfio device interface based on whether an iommufd
is specified.

Does that rule out your suggestion?  I don't know, please reveal more
about the mechanics of putting a limit on the FD itself and this
no-way-out ioctl.  The latter name suggests to me that I should also
note that we need to support memory hotplug with these devices.  Thanks,

Alex




Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-26 Thread Jason Gunthorpe
On Tue, Apr 26, 2022 at 01:24:35PM -0600, Alex Williamson wrote:
> On Tue, 26 Apr 2022 13:42:17 -0300
> Jason Gunthorpe  wrote:
> 
> > On Tue, Apr 26, 2022 at 10:21:59AM -0600, Alex Williamson wrote:
> > > We also need to be able to advise libvirt as to how each iommufd object
> > > or user of that object factors into the VM locked memory requirement.
> > > When used by vfio-pci, we're only mapping VM RAM, so we'd ask libvirt
> > > to set the locked memory limit to the size of VM RAM per iommufd,
> > > regardless of the number of devices using a given iommufd.  However, I
> > > don't know if all users of iommufd will be exclusively mapping VM RAM.
> > > Combinations of devices where some map VM RAM and others map QEMU
> > > buffer space could still require some incremental increase per device
> > > (I'm not sure if vfio-nvme is such a device).  It seems like heuristics
> > > will still be involved even after iommufd solves the per-device
> > > vfio-pci locked memory limit issue.  Thanks,  
> > 
> > If the model is to pass the FD, how about we put a limit on the FD
> > itself instead of abusing the locked memory limit?
> > 
> > We could have a no-way-out ioctl that directly limits the # of PFNs
> > covered by iopt_pages inside an iommufd.
> 
> FD passing would likely only be the standard for libvirt invoked VMs.
> The QEMU vfio-pci device would still parse a host= or sysfsdev= option
> when invoked by mortals and associate to use the legacy vfio group
> interface or the new vfio device interface based on whether an iommufd
> is specified.

Yes, but perhaps we don't need resource limits in the mortals case..

> Does that rule out your suggestion?  I don't know, please reveal more
> about the mechanics of putting a limit on the FD itself and this
> no-way-out ioctl.  The latter name suggests to me that I should also
> note that we need to support memory hotplug with these devices.  Thanks,

So libvirt uses CAP_SYS_RESOURCE and prlimit to adjust things in
realtime today?

It could still work; instead of being no-way-out, iommufd would have to check
for CAP_SYS_RESOURCE to allow raising the limit.

It is a pretty simple idea, we just attach a resource limit to the FD
and every PFN that gets mapped into the iommufd counts against that
limit, regardless if it is pinned or not. An ioctl on the FD would set
the limit, defaulting to unlimited.

To me this has the appeal that what is being resourced controlled is
strictly defined - address space mapped into an iommufd - which has a
bunch of nice additional consequences like partially bounding the
amount of kernel memory an iommufd can consume and so forth.

Doesn't interact with iouring or rdma however.

Though we could certainly consider allowing RDMA to consume an iommufd
to access pinned pages much like a vfio-mdev would - I'm not sure what
is ideal for the QEMU usage of RDMA for migration.

Jason
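

To illustrate only the shape of that idea: a purely hypothetical uAPI for a
per-FD PFN limit. Neither the ioctl nor the struct below exists in the RFC or
upstream; every name here is invented for illustration.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/types.h>

/* Hypothetical: a per-iommufd cap on mapped PFNs, 0 == unlimited. */
struct iommufd_pfn_limit {
        __u32 size;
        __u32 flags;
        __aligned_u64 max_pfns;
};
/* Hypothetical request code; ';' is used only as a placeholder magic. */
#define IOMMUFD_SET_PFN_LIMIT _IOW(';', 0xf0, struct iommufd_pfn_limit)

static int iommufd_set_pfn_limit(int iommufd, uint64_t max_pfns)
{
        struct iommufd_pfn_limit limit = {
                .size = sizeof(limit),
                .max_pfns = max_pfns,
        };

        /*
         * Every PFN later mapped into this iommufd would count against the
         * cap, pinned or not; raising it would need CAP_SYS_RESOURCE.
         */
        return ioctl(iommufd, IOMMUFD_SET_PFN_LIMIT, &limit);
}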



Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-26 Thread Alex Williamson
On Tue, 26 Apr 2022 12:43:35 +
Shameerali Kolothum Thodi  wrote:

> > -Original Message-
> > From: Eric Auger [mailto:eric.au...@redhat.com]
> > Sent: 26 April 2022 12:45
> > To: Shameerali Kolothum Thodi ; Yi
> > Liu ; alex.william...@redhat.com; coh...@redhat.com;
> > qemu-devel@nongnu.org
> > Cc: da...@gibson.dropbear.id.au; th...@redhat.com; far...@linux.ibm.com;
> > mjros...@linux.ibm.com; akrow...@linux.ibm.com; pa...@linux.ibm.com;
> > jjhe...@linux.ibm.com; jasow...@redhat.com; k...@vger.kernel.org;
> > j...@nvidia.com; nicol...@nvidia.com; eric.auger@gmail.com;
> > kevin.t...@intel.com; chao.p.p...@intel.com; yi.y....@intel.com;
> > pet...@redhat.com; Zhangfei Gao 
> > Subject: Re: [RFC 00/18] vfio: Adopt iommufd  
> 
> [...]
>  
> > >>  
> > https://lore.kernel.org/kvm/0-v1-e79cd8d168e8+6-iommufd_...@nvidia.com  
> > >> /
> > >> [2] https://github.com/luxis1999/iommufd/tree/iommufd-v5.17-rc6
> > >> [3] https://github.com/luxis1999/qemu/tree/qemu-for-5.17-rc6-vm-rfcv1  
> > > Hi,
> > >
> > > I had a go with the above branches on our ARM64 platform trying to  
> > pass-through  
> > > a VF dev, but Qemu reports an error as below,
> > >
> > > [0.444728] hisi_sec2 :00:01.0: enabling device ( -> 0002)
> > > qemu-system-aarch64-iommufd: IOMMU_IOAS_MAP failed: Bad address
> > > qemu-system-aarch64-iommufd: vfio_container_dma_map(0xfeb40ce0,  
> > 0x80, 0x1, 0xb40ef000) = -14 (Bad address)  
> > >
> > > I think this happens for the dev BAR addr range. I haven't debugged the  
> > kernel  
> > > yet to see where it actually reports that.  
> > Does it prevent your assigned device from working? I have such errors
> > too but this is a known issue. This is due to the fact P2P DMA is not
> > supported yet.
> >   
> 
> Yes, the basic tests are all good so far. I am still not very clear how it
> works if the map() fails though. It looks like it fails in,
> 
> iommufd_ioas_map()
>   iopt_map_user_pages()
>     iopt_map_pages()
>       ...
>         pfn_reader_pin_pages()
> 
> So does it mean it just works because the page is resident()?

No, it just means that you're not triggering any accesses that require
peer-to-peer DMA support.  Any sort of test where the device is only
performing DMA to guest RAM, which is by far the standard use case,
will work fine.  This also doesn't affect vCPU access to BAR space.
It's only a failure of the mappings of the BAR space into the IOAS,
which is only used when a device tries to directly target another
device's BAR space via DMA.  Thanks,

Alex
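
For illustration, a sketch of how a userspace driver could treat such
IOAS map failures. The helper and its is_ram parameter are hypothetical;
the uapi names (struct iommu_ioas_map, IOMMU_IOAS_MAP and its flags)
follow the iommufd interface as it later landed upstream and may differ
from the RFC branches discussed here:

#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>   /* struct iommu_ioas_map, IOMMU_IOAS_MAP */

/*
 * Map one section of the address space into an IOAS.  RAM-backed sections
 * must map successfully; device (BAR) sections may fail while P2P DMA is
 * unsupported, which only breaks DMA that targets another device's BAR,
 * so the failure is tolerated.
 */
static int ioas_map_section(int iommufd, uint32_t ioas_id, uint64_t iova,
                            uint64_t length, void *vaddr, bool is_ram)
{
    struct iommu_ioas_map map = {
        .size = sizeof(map),
        .flags = IOMMU_IOAS_MAP_FIXED_IOVA |
                 IOMMU_IOAS_MAP_READABLE | IOMMU_IOAS_MAP_WRITEABLE,
        .ioas_id = ioas_id,
        .user_va = (uintptr_t)vaddr,
        .length = length,
        .iova = iova,
    };

    if (ioctl(iommufd, IOMMU_IOAS_MAP, &map) == 0) {
        return 0;
    }
    if (!is_ram) {
        return 0;   /* BAR mapping failed: only P2P DMA to this BAR is lost */
    }
    return -1;      /* RAM mapping failure is fatal for the caller */
}

Only the non-RAM branch is tolerant: guest RAM mappings must succeed,
while a failed BAR mapping merely means DMA targeting that BAR will fault.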




Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-26 Thread Jason Gunthorpe
On Tue, Apr 26, 2022 at 10:21:59AM -0600, Alex Williamson wrote:
> We also need to be able to advise libvirt as to how each iommufd object
> or user of that object factors into the VM locked memory requirement.
> When used by vfio-pci, we're only mapping VM RAM, so we'd ask libvirt
> to set the locked memory limit to the size of VM RAM per iommufd,
> regardless of the number of devices using a given iommufd.  However, I
> don't know if all users of iommufd will be exclusively mapping VM RAM.
> Combinations of devices where some map VM RAM and others map QEMU
> buffer space could still require some incremental increase per device
> (I'm not sure if vfio-nvme is such a device).  It seems like heuristics
> will still be involved even after iommufd solves the per-device
> vfio-pci locked memory limit issue.  Thanks,

If the model is to pass the FD, how about we put a limit on the FD
itself instead of abusing the locked memory limit?

We could have a no-way-out ioctl that directly limits the # of PFNs
covered by iopt_pages inside an iommufd.

Jason



Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-26 Thread Alex Williamson
On Tue, 26 Apr 2022 08:37:41 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson 
> > Sent: Monday, April 25, 2022 10:38 PM
> > 
> > On Mon, 25 Apr 2022 11:10:14 +0100
> > Daniel P. Berrangé  wrote:
> >   
> > > On Fri, Apr 22, 2022 at 04:09:43PM -0600, Alex Williamson wrote:  
> > > > [Cc +libvirt folks]
> > > >
> > > > On Thu, 14 Apr 2022 03:46:52 -0700
> > > > Yi Liu  wrote:
> > > >  
> > > > > With the introduction of iommufd[1], the linux kernel provides a  
> > generic  
> > > > > interface for userspace drivers to propagate their DMA mappings to  
> > kernel  
> > > > > for assigned devices. This series does the porting of the VFIO devices
> > > > > onto the /dev/iommu uapi and let it coexist with the legacy  
> > implementation.  
> > > > > Other devices like vdpa, vfio mdev, etc. are not considered yet.
> > >
> > > snip
> > >  
> > > > > The selection of the backend is made on a device basis using the new
> > > > > iommufd option (on/off/auto). By default the iommufd backend is  
> > selected  
> > > > > if supported by the host and by QEMU (iommufd KConfig). This option  
> > is  
> > > > > currently available only for the vfio-pci device. For other types of
> > > > > devices, it does not yet exist and the legacy BE is chosen by 
> > > > > default.  
> > > >
> > > > I've discussed this a bit with Eric, but let me propose a different
> > > > command line interface.  Libvirt generally likes to pass file
> > > > descriptors to QEMU rather than grant it access to those files
> > > > directly.  This was problematic with vfio-pci because libvirt can't
> > > > easily know when QEMU will want to grab another /dev/vfio/vfio
> > > > container.  Therefore we abandoned this approach and instead libvirt
> > > > grants file permissions.
> > > >
> > > > However, with iommufd there's no reason that QEMU ever needs more  
> > than  
> > > > a single instance of /dev/iommufd and we're using per device vfio file
> > > > descriptors, so it seems like a good time to revisit this.  
> > >
> > > I assume access to '/dev/iommufd' gives the process somewhat elevated
> > > privileges, such that you don't want to unconditionally give QEMU
> > > access to this device ?  
> > 
> > It's not that much dissimilar to /dev/vfio/vfio, it's an unprivileged
> > interface which should have limited scope for abuse, but more so here
> > the goal would be to de-privilege QEMU that one step further that it
> > cannot open the device file itself.
> >   
> > > > The interface I was considering would be to add an iommufd object to
> > > > QEMU, so we might have a:
> > > >
> > > > -device iommufd[,fd=#][,id=foo]
> > > >
> > > > For non-libvirt usage this would have the ability to open /dev/iommufd
> > > > itself if an fd is not provided.  This object could be shared with
> > > > other iommufd users in the VM and maybe we'd allow multiple instances
> > > > for more esoteric use cases.  [NB, maybe this should be a -object 
> > > > rather  
> > than  
> > > > -device since the iommufd is not a guest visible device?]  
> > >
> > > Yes,  -object would be the right answer for something that's purely
> > > a host side backend impl selector.
> > >  
> > > > The vfio-pci device might then become:
> > > >
> > > > -device vfio-  
> > pci[,host=:BB:DD.f][,sysfsdev=/sys/path/to/device][,fd=#][,iommufd=f
> > oo]  
> > > >
> > > > So essentially we can specify the device via host, sysfsdev, or passing
> > > > an fd to the vfio device file.  When an iommufd object is specified,
> > > > "foo" in the example above, each of those options would use the
> > > > vfio-device access mechanism, essentially the same as iommufd=on in
> > > > your example.  With the fd passing option, an iommufd object would be
> > > > required and necessarily use device level access.
> > > >
> > > > In your example, the iommufd=auto seems especially troublesome for
> > > > libvirt because QEMU is going to have different locked memory
> > > > requirements based on whether we're using type1 or iommufd, where  
> > the  
> > > > latter resolves the duplicate accounting issues.  libvirt needs to know 
> > > >  
> 
> Based on the current plan there is probably a transition window between
> the point where the first vfio device type (vfio-pci) gains iommufd
> support and the point where all vfio types support it. Libvirt can figure
> out whether to use iommufd by checking for the presence of
> /dev/vfio/devices/vfioX. But what would the resource limit policy be in
> Libvirt in such a transition window, when both type1 and iommufd might
> be used? Or do we just expect Libvirt to support iommufd only after the
> transition window ends, to avoid handling such a mess?

Good point regarding libvirt testing for the vfio device files for use
with iommufd: libvirt would test whether /dev/iommufd exists and whether
the device it wants to assign maps to a /dev/vfio/devices/vfioX file.
This was essentially implicit in the fd=# option to the vfio-pci device.
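
A sketch of that probe in C; the node names are the ones discussed in
this thread (the /dev/iommufd vs /dev/iommu spelling and the per-device
vfio cdev path were still being settled), so treat them as assumptions:

#include <stdbool.h>
#include <unistd.h>

/* iommufd can be used for a given assignment only if the iommufd chardev
 * exists and the device exposes its own vfio cdev, e.g.
 * "/dev/vfio/devices/vfio0" (illustrative name only). */
static bool device_supports_iommufd(const char *vfio_cdev)
{
    return access("/dev/iommufd", F_OK) == 0 &&
           access(vfio_cdev, F_OK) == 0;
}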

In mixed combinations, I'd expect 

RE: [RFC 00/18] vfio: Adopt iommufd

2022-04-26 Thread Shameerali Kolothum Thodi via


> -Original Message-
> From: Eric Auger [mailto:eric.au...@redhat.com]
> Sent: 26 April 2022 12:45
> To: Shameerali Kolothum Thodi ; Yi
> Liu ; alex.william...@redhat.com; coh...@redhat.com;
> qemu-devel@nongnu.org
> Cc: da...@gibson.dropbear.id.au; th...@redhat.com; far...@linux.ibm.com;
> mjros...@linux.ibm.com; akrow...@linux.ibm.com; pa...@linux.ibm.com;
> jjhe...@linux.ibm.com; jasow...@redhat.com; k...@vger.kernel.org;
> j...@nvidia.com; nicol...@nvidia.com; eric.auger@gmail.com;
> kevin.t...@intel.com; chao.p.p...@intel.com; yi.y@intel.com;
> pet...@redhat.com; Zhangfei Gao 
> Subject: Re: [RFC 00/18] vfio: Adopt iommufd

[...]
 
> >>
> https://lore.kernel.org/kvm/0-v1-e79cd8d168e8+6-iommufd_...@nvidia.com
> >> /
> >> [2] https://github.com/luxis1999/iommufd/tree/iommufd-v5.17-rc6
> >> [3] https://github.com/luxis1999/qemu/tree/qemu-for-5.17-rc6-vm-rfcv1
> > Hi,
> >
> > I had a go with the above branches on our ARM64 platform trying to
> pass-through
> > a VF dev, but Qemu reports an error as below,
> >
> > [0.444728] hisi_sec2 :00:01.0: enabling device ( -> 0002)
> > qemu-system-aarch64-iommufd: IOMMU_IOAS_MAP failed: Bad address
> > qemu-system-aarch64-iommufd: vfio_container_dma_map(0xfeb40ce0,
> 0x80, 0x1, 0xb40ef000) = -14 (Bad address)
> >
> > I think this happens for the dev BAR addr range. I haven't debugged the
> kernel
> > yet to see where it actually reports that.
> Does it prevent your assigned device from working? I have such errors
> too but this is a known issue. This is due to the fact P2P DMA is not
> supported yet.
> 

Yes, the basic tests are all good so far. I am still not very clear how it works
if the map() fails though. It looks like it fails in,

iommufd_ioas_map()
  iopt_map_user_pages()
    iopt_map_pages()
      ...
        pfn_reader_pin_pages()

So does it mean it just works because the page is resident()?

Thanks,
Shameer





Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-26 Thread Jason Gunthorpe
On Tue, Apr 26, 2022 at 08:37:41AM +, Tian, Kevin wrote:

> Based on the current plan there is probably a transition window between
> the point where the first vfio device type (vfio-pci) gains iommufd
> support and the point where all vfio types support it.

I am still hoping to do it all in one shot, let's see :)

Jason



Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-26 Thread Eric Auger
Hi Shameer,

On 4/26/22 11:47 AM, Shameerali Kolothum Thodi wrote:
>
>> -Original Message-
>> From: Yi Liu [mailto:yi.l@intel.com]
>> Sent: 14 April 2022 11:47
>> To: alex.william...@redhat.com; coh...@redhat.com;
>> qemu-devel@nongnu.org
>> Cc: da...@gibson.dropbear.id.au; th...@redhat.com; far...@linux.ibm.com;
>> mjros...@linux.ibm.com; akrow...@linux.ibm.com; pa...@linux.ibm.com;
>> jjhe...@linux.ibm.com; jasow...@redhat.com; k...@vger.kernel.org;
>> j...@nvidia.com; nicol...@nvidia.com; eric.au...@redhat.com;
>> eric.auger@gmail.com; kevin.t...@intel.com; yi.l@intel.com;
>> chao.p.p...@intel.com; yi.y@intel.com; pet...@redhat.com
>> Subject: [RFC 00/18] vfio: Adopt iommufd
>>
>> With the introduction of iommufd[1], the linux kernel provides a generic
>> interface for userspace drivers to propagate their DMA mappings to kernel
>> for assigned devices. This series does the porting of the VFIO devices
>> onto the /dev/iommu uapi and let it coexist with the legacy implementation.
>> Other devices like vdpa, vfio mdev, etc. are not considered yet.
>>
>> For vfio devices, the new interface is tied with device fd and iommufd
>> as the iommufd solution is device-centric. This is different from legacy
>> vfio which is group-centric. To support both interfaces in QEMU, this
>> series introduces the iommu backend concept in the form of different
>> container classes. The existing vfio container is named legacy container
>> (equivalent with legacy iommu backend in this series), while the new
>> iommufd based container is named as iommufd container (may also be
>> mentioned
>> as iommufd backend in this series). The two backend types have their own
>> way to setup secure context and dma management interface. Below diagram
>> shows how it looks like with both BEs.
>>
>> VFIO
>> AddressSpace/Memory
>> +---+  +--+  +-+  +-+
>> |  pci  |  | platform |  |  ap |  | ccw |
>> +---+---+  ++-+  +--+--+  +--+--+ +--+
>> |   |   |||   AddressSpace
>> |
>> |   |   ||++-+
>> +---V---V---VV+   /
>> |   VFIOAddressSpace  | <+
>> |  |  |  MemoryListener
>> |  VFIOContainer list |
>> +---+++
>> ||
>> ||
>> +---V--++V--+
>> |   iommufd||vfio legacy|
>> |  container   || container |
>> +---+--+++--+
>> ||
>> | /dev/iommu | /dev/vfio/vfio
>> | /dev/vfio/devices/vfioX| /dev/vfio/$group_id
>>  Userspace  ||
>>
>> ===++==
>> ==
>>  Kernel |  device fd |
>> +---+| group/container fd
>> | (BIND_IOMMUFD ||
>> (SET_CONTAINER/SET_IOMMU)
>> |  ATTACH_IOAS) || device fd
>> |   ||
>> |   +---VV-+
>> iommufd |   |vfio  |
>> (map/unmap  |   +-++---+
>>  ioas_copy) | || map/unmap
>> | ||
>>  +--V--++-V--+  +--V+
>>  | iommfd core ||  device|  |  vfio iommu   |
>>  +-+++  +---+
>>
>> [Secure Context setup]
>> - iommufd BE: uses device fd and iommufd to setup secure context
>>   (bind_iommufd, attach_ioas)
>> - vfio legacy BE: uses group fd and container fd to setup secure context
>>   (set_container, set_iommu)
>> [Device access]
>> - iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
>> - vfio legacy BE: device fd is retrieved from group fd ioctl
>> [DMA Mapping flow]
>> - VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
>> - VFIO populates DMA map/unmap via the container BEs
>>   *) iommufd BE: uses iommufd
>>   *) vfio legacy BE: uses container fd
>>
>> This series qomifies the VFIOContainer object which acts as a base class
>> for a container. This base class is derived into the legacy VFIO container
>> and the new iommufd based container. The base class implements generic
>> code
>> such as code related to memory_listener and address space management
>> whereas
>> the derived class implements callbacks that depend on the kernel user space
>> being used.
>>
>> The selection of 

RE: [RFC 00/18] vfio: Adopt iommufd

2022-04-26 Thread Shameerali Kolothum Thodi via



> -Original Message-
> From: Yi Liu [mailto:yi.l@intel.com]
> Sent: 14 April 2022 11:47
> To: alex.william...@redhat.com; coh...@redhat.com;
> qemu-devel@nongnu.org
> Cc: da...@gibson.dropbear.id.au; th...@redhat.com; far...@linux.ibm.com;
> mjros...@linux.ibm.com; akrow...@linux.ibm.com; pa...@linux.ibm.com;
> jjhe...@linux.ibm.com; jasow...@redhat.com; k...@vger.kernel.org;
> j...@nvidia.com; nicol...@nvidia.com; eric.au...@redhat.com;
> eric.auger@gmail.com; kevin.t...@intel.com; yi.l@intel.com;
> chao.p.p...@intel.com; yi.y@intel.com; pet...@redhat.com
> Subject: [RFC 00/18] vfio: Adopt iommufd
> 
> With the introduction of iommufd[1], the linux kernel provides a generic
> interface for userspace drivers to propagate their DMA mappings to kernel
> for assigned devices. This series does the porting of the VFIO devices
> onto the /dev/iommu uapi and let it coexist with the legacy implementation.
> Other devices like vdpa, vfio mdev, etc. are not considered yet.
> 
> For vfio devices, the new interface is tied with device fd and iommufd
> as the iommufd solution is device-centric. This is different from legacy
> vfio which is group-centric. To support both interfaces in QEMU, this
> series introduces the iommu backend concept in the form of different
> container classes. The existing vfio container is named legacy container
> (equivalent with legacy iommu backend in this series), while the new
> iommufd based container is named as iommufd container (may also be
> mentioned
> as iommufd backend in this series). The two backend types have their own
> way to setup secure context and dma management interface. Below diagram
> shows how it looks like with both BEs.
> 
> VFIO
> AddressSpace/Memory
> +---+  +--+  +-+  +-+
> |  pci  |  | platform |  |  ap |  | ccw |
> +---+---+  ++-+  +--+--+  +--+--+ +--+
> |   |   |||   AddressSpace
> |
> |   |   ||++-+
> +---V---V---VV+   /
> |   VFIOAddressSpace  | <+
> |  |  |  MemoryListener
> |  VFIOContainer list |
> +---+++
> ||
> ||
> +---V--++V--+
> |   iommufd||vfio legacy|
> |  container   || container |
> +---+--+++--+
> ||
> | /dev/iommu | /dev/vfio/vfio
> | /dev/vfio/devices/vfioX| /dev/vfio/$group_id
>  Userspace  ||
> 
> ===++==
> ==
>  Kernel |  device fd |
> +---+| group/container fd
> | (BIND_IOMMUFD ||
> (SET_CONTAINER/SET_IOMMU)
> |  ATTACH_IOAS) || device fd
> |   ||
> |   +---VV-+
> iommufd |   |vfio  |
> (map/unmap  |   +-++---+
>  ioas_copy) | || map/unmap
> | ||
>  +--V--++-V--+  +--V+
>  | iommfd core ||  device|  |  vfio iommu   |
>  +-+++  +---+
> 
> [Secure Context setup]
> - iommufd BE: uses device fd and iommufd to setup secure context
>   (bind_iommufd, attach_ioas)
> - vfio legacy BE: uses group fd and container fd to setup secure context
>   (set_container, set_iommu)
> [Device access]
> - iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
> - vfio legacy BE: device fd is retrieved from group fd ioctl
> [DMA Mapping flow]
> - VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
> - VFIO populates DMA map/unmap via the container BEs
>   *) iommufd BE: uses iommufd
>   *) vfio legacy BE: uses container fd
> 
> This series qomifies the VFIOContainer object which acts as a base class
> for a container. This base class is derived into the legacy VFIO container
> and the new iommufd based container. The base class implements generic
> code
> such as code related to memory_listener and address space management
> whereas
> the derived class implements callbacks that depend on the kernel user space
> being used.
> 
> The selection of the backend is made on a device basis using the new
> iommufd option (on/off/auto). By default the iommufd backend is selected
> if supported by the host 

RE: [RFC 00/18] vfio: Adopt iommufd

2022-04-26 Thread Tian, Kevin
> From: Eric Auger 
> Sent: Tuesday, April 26, 2022 3:55 AM
> 
> Hi Kevin,
> 
> On 4/18/22 10:49 AM, Tian, Kevin wrote:
> >> From: Liu, Yi L 
> >> Sent: Thursday, April 14, 2022 6:47 PM
> >>
> >> This series qomifies the VFIOContainer object which acts as a base class
> > what does 'qomify' mean? I didn't find this word from dictionary...
> sorry this is pure QEMU terminology. This stands for "QEMU Object Model"
> additional info at:
> https://qemu.readthedocs.io/en/latest/devel/qom.html
> 

Nice to know!


RE: [RFC 00/18] vfio: Adopt iommufd

2022-04-26 Thread Tian, Kevin
> From: Alex Williamson 
> Sent: Monday, April 25, 2022 10:38 PM
> 
> On Mon, 25 Apr 2022 11:10:14 +0100
> Daniel P. Berrangé  wrote:
> 
> > On Fri, Apr 22, 2022 at 04:09:43PM -0600, Alex Williamson wrote:
> > > [Cc +libvirt folks]
> > >
> > > On Thu, 14 Apr 2022 03:46:52 -0700
> > > Yi Liu  wrote:
> > >
> > > > With the introduction of iommufd[1], the linux kernel provides a
> generic
> > > > interface for userspace drivers to propagate their DMA mappings to
> kernel
> > > > for assigned devices. This series does the porting of the VFIO devices
> > > > onto the /dev/iommu uapi and let it coexist with the legacy
> implementation.
> > > > Other devices like vdpa, vfio mdev, etc. are not considered yet.
> >
> > snip
> >
> > > > The selection of the backend is made on a device basis using the new
> > > > iommufd option (on/off/auto). By default the iommufd backend is
> selected
> > > > if supported by the host and by QEMU (iommufd KConfig). This option
> is
> > > > currently available only for the vfio-pci device. For other types of
> > > > devices, it does not yet exist and the legacy BE is chosen by default.
> > >
> > > I've discussed this a bit with Eric, but let me propose a different
> > > command line interface.  Libvirt generally likes to pass file
> > > descriptors to QEMU rather than grant it access to those files
> > > directly.  This was problematic with vfio-pci because libvirt can't
> > > easily know when QEMU will want to grab another /dev/vfio/vfio
> > > container.  Therefore we abandoned this approach and instead libvirt
> > > grants file permissions.
> > >
> > > However, with iommufd there's no reason that QEMU ever needs more
> than
> > > a single instance of /dev/iommufd and we're using per device vfio file
> > > descriptors, so it seems like a good time to revisit this.
> >
> > I assume access to '/dev/iommufd' gives the process somewhat elevated
> > privileges, such that you don't want to unconditionally give QEMU
> > access to this device ?
> 
> It's not that much dissimilar to /dev/vfio/vfio, it's an unprivileged
> interface which should have limited scope for abuse, but more so here
> the goal would be to de-privilege QEMU that one step further that it
> cannot open the device file itself.
> 
> > > The interface I was considering would be to add an iommufd object to
> > > QEMU, so we might have a:
> > >
> > > -device iommufd[,fd=#][,id=foo]
> > >
> > > For non-libvirt usage this would have the ability to open /dev/iommufd
> > > itself if an fd is not provided.  This object could be shared with
> > > other iommufd users in the VM and maybe we'd allow multiple instances
> > > for more esoteric use cases.  [NB, maybe this should be a -object rather
> than
> > > -device since the iommufd is not a guest visible device?]
> >
> > Yes,  -object would be the right answer for something that's purely
> > a host side backend impl selector.
> >
> > > The vfio-pci device might then become:
> > >
> > > -device vfio-
> pci[,host=:BB:DD.f][,sysfsdev=/sys/path/to/device][,fd=#][,iommufd=f
> oo]
> > >
> > > So essentially we can specify the device via host, sysfsdev, or passing
> > > an fd to the vfio device file.  When an iommufd object is specified,
> > > "foo" in the example above, each of those options would use the
> > > vfio-device access mechanism, essentially the same as iommufd=on in
> > > your example.  With the fd passing option, an iommufd object would be
> > > required and necessarily use device level access.
> > >
> > > In your example, the iommufd=auto seems especially troublesome for
> > > libvirt because QEMU is going to have different locked memory
> > > requirements based on whether we're using type1 or iommufd, where
> the
> > > latter resolves the duplicate accounting issues.  libvirt needs to know

Based on the current plan there is probably a transition window between
the point where the first vfio device type (vfio-pci) gains iommufd
support and the point where all vfio types support it. Libvirt can figure
out whether to use iommufd by checking for the presence of
/dev/vfio/devices/vfioX. But what would the resource limit policy be in
Libvirt in such a transition window, when both type1 and iommufd might
be used? Or do we just expect Libvirt to support iommufd only after the
transition window ends, to avoid handling such a mess?

> > > deterministically which backed is being used, which this proposal seems
> > > to provide, while at the same time bringing us more in line with fd
> > > passing.  Thoughts?  Thanks,
> >
> > Yep, I agree that libvirt needs to have more direct control over this.
> > This is also even more important if there are notable feature differences
> > in the 2 backends.
> >
> > I wonder if anyone has considered an even more distinct impl, whereby
> > we have a completely different device type on the backend, eg
> >
> >   -device vfio-iommu-
> pci[,host=:BB:DD.f][,sysfsdev=/sys/path/to/device][,fd=#][,iommufd=f
> oo]
> >
> > If a vendor wants to fully 

Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-25 Thread Alex Williamson
On Mon, 25 Apr 2022 22:23:05 +0200
Eric Auger  wrote:

> Hi Alex,
> 
> On 4/23/22 12:09 AM, Alex Williamson wrote:
> > [Cc +libvirt folks]
> >
> > On Thu, 14 Apr 2022 03:46:52 -0700
> > Yi Liu  wrote:
> >  
> >> With the introduction of iommufd[1], the linux kernel provides a generic
> >> interface for userspace drivers to propagate their DMA mappings to kernel
> >> for assigned devices. This series does the porting of the VFIO devices
> >> onto the /dev/iommu uapi and let it coexist with the legacy implementation.
> >> Other devices like vdpa, vfio mdev, etc. are not considered yet.
> >>
> >> For vfio devices, the new interface is tied with device fd and iommufd
> >> as the iommufd solution is device-centric. This is different from legacy
> >> vfio which is group-centric. To support both interfaces in QEMU, this
> >> series introduces the iommu backend concept in the form of different
> >> container classes. The existing vfio container is named legacy container
> >> (equivalent with legacy iommu backend in this series), while the new
> >> iommufd based container is named as iommufd container (may also be 
> >> mentioned
> >> as iommufd backend in this series). The two backend types have their own
> >> way to setup secure context and dma management interface. Below diagram
> >> shows how it looks like with both BEs.
> >>
> >> VFIO   AddressSpace/Memory
> >> +---+  +--+  +-+  +-+
> >> |  pci  |  | platform |  |  ap |  | ccw |
> >> +---+---+  ++-+  +--+--+  +--+--+ +--+
> >> |   |   |||   AddressSpace   |
> >> |   |   ||++-+
> >> +---V---V---VV+   /
> >> |   VFIOAddressSpace  | <+
> >> |  |  |  MemoryListener
> >> |  VFIOContainer list |
> >> +---+++
> >> ||
> >> ||
> >> +---V--++V--+
> >> |   iommufd||vfio legacy|
> >> |  container   || container |
> >> +---+--+++--+
> >> ||
> >> | /dev/iommu | /dev/vfio/vfio
> >> | /dev/vfio/devices/vfioX| /dev/vfio/$group_id
> >>  Userspace  ||
> >>  ===++
> >>  Kernel |  device fd |
> >> +---+| group/container fd
> >> | (BIND_IOMMUFD || (SET_CONTAINER/SET_IOMMU)
> >> |  ATTACH_IOAS) || device fd
> >> |   ||
> >> |   +---VV-+
> >> iommufd |   |vfio  |
> >> (map/unmap  |   +-++---+
> >>  ioas_copy) | || map/unmap
> >> | ||
> >>  +--V--++-V--+  +--V+
> >>  | iommfd core ||  device|  |  vfio iommu   |
> >>  +-+++  +---+
> >>
> >> [Secure Context setup]
> >> - iommufd BE: uses device fd and iommufd to setup secure context
> >>   (bind_iommufd, attach_ioas)
> >> - vfio legacy BE: uses group fd and container fd to setup secure context
> >>   (set_container, set_iommu)
> >> [Device access]
> >> - iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
> >> - vfio legacy BE: device fd is retrieved from group fd ioctl
> >> [DMA Mapping flow]
> >> - VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
> >> - VFIO populates DMA map/unmap via the container BEs
> >>   *) iommufd BE: uses iommufd
> >>   *) vfio legacy BE: uses container fd
> >>
> >> This series qomifies the VFIOContainer object which acts as a base class
> >> for a container. This base class is derived into the legacy VFIO container
> >> and the new iommufd based container. The base class implements generic code
> >> such as code related to memory_listener and address space management 
> >> whereas
> >> the derived class implements callbacks that depend on the kernel user space
> >> being used.
> >>
> >> The selection of the backend is made on a device basis using the new
> >> iommufd option (on/off/auto). By default the iommufd backend is selected
> >> if supported by the host and by QEMU (iommufd KConfig). This option is
> >> currently available only for the vfio-pci device. For other types of
> >> devices, it does not yet exist and the legacy BE is 

Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-25 Thread Eric Auger
Hi Alex,

On 4/23/22 12:09 AM, Alex Williamson wrote:
> [Cc +libvirt folks]
>
> On Thu, 14 Apr 2022 03:46:52 -0700
> Yi Liu  wrote:
>
>> With the introduction of iommufd[1], the linux kernel provides a generic
>> interface for userspace drivers to propagate their DMA mappings to kernel
>> for assigned devices. This series does the porting of the VFIO devices
>> onto the /dev/iommu uapi and let it coexist with the legacy implementation.
>> Other devices like vdpa, vfio mdev, etc. are not considered yet.
>>
>> For vfio devices, the new interface is tied with device fd and iommufd
>> as the iommufd solution is device-centric. This is different from legacy
>> vfio which is group-centric. To support both interfaces in QEMU, this
>> series introduces the iommu backend concept in the form of different
>> container classes. The existing vfio container is named legacy container
>> (equivalent with legacy iommu backend in this series), while the new
>> iommufd based container is named as iommufd container (may also be mentioned
>> as iommufd backend in this series). The two backend types have their own
>> way to setup secure context and dma management interface. Below diagram
>> shows how it looks like with both BEs.
>>
>> VFIO   AddressSpace/Memory
>> +---+  +--+  +-+  +-+
>> |  pci  |  | platform |  |  ap |  | ccw |
>> +---+---+  ++-+  +--+--+  +--+--+ +--+
>> |   |   |||   AddressSpace   |
>> |   |   ||++-+
>> +---V---V---VV+   /
>> |   VFIOAddressSpace  | <+
>> |  |  |  MemoryListener
>> |  VFIOContainer list |
>> +---+++
>> ||
>> ||
>> +---V--++V--+
>> |   iommufd||vfio legacy|
>> |  container   || container |
>> +---+--+++--+
>> ||
>> | /dev/iommu | /dev/vfio/vfio
>> | /dev/vfio/devices/vfioX| /dev/vfio/$group_id
>>  Userspace  ||
>>  ===++
>>  Kernel |  device fd |
>> +---+| group/container fd
>> | (BIND_IOMMUFD || (SET_CONTAINER/SET_IOMMU)
>> |  ATTACH_IOAS) || device fd
>> |   ||
>> |   +---VV-+
>> iommufd |   |vfio  |
>> (map/unmap  |   +-++---+
>>  ioas_copy) | || map/unmap
>> | ||
>>  +--V--++-V--+  +--V+
>>  | iommfd core ||  device|  |  vfio iommu   |
>>  +-+++  +---+
>>
>> [Secure Context setup]
>> - iommufd BE: uses device fd and iommufd to setup secure context
>>   (bind_iommufd, attach_ioas)
>> - vfio legacy BE: uses group fd and container fd to setup secure context
>>   (set_container, set_iommu)
>> [Device access]
>> - iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
>> - vfio legacy BE: device fd is retrieved from group fd ioctl
>> [DMA Mapping flow]
>> - VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
>> - VFIO populates DMA map/unmap via the container BEs
>>   *) iommufd BE: uses iommufd
>>   *) vfio legacy BE: uses container fd
>>
>> This series qomifies the VFIOContainer object which acts as a base class
>> for a container. This base class is derived into the legacy VFIO container
>> and the new iommufd based container. The base class implements generic code
>> such as code related to memory_listener and address space management whereas
>> the derived class implements callbacks that depend on the kernel user space
>> being used.
>>
>> The selection of the backend is made on a device basis using the new
>> iommufd option (on/off/auto). By default the iommufd backend is selected
>> if supported by the host and by QEMU (iommufd KConfig). This option is
>> currently available only for the vfio-pci device. For other types of
>> devices, it does not yet exist and the legacy BE is chosen by default.
> I've discussed this a bit with Eric, but let me propose a different
> command line interface.  Libvirt generally likes to pass file
> descriptors to QEMU rather than grant it access to those files
> directly.  This was 

Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-25 Thread Eric Auger
Hi Kevin,

On 4/18/22 10:49 AM, Tian, Kevin wrote:
>> From: Liu, Yi L 
>> Sent: Thursday, April 14, 2022 6:47 PM
>>
>> With the introduction of iommufd[1], the linux kernel provides a generic
>> interface for userspace drivers to propagate their DMA mappings to kernel
>> for assigned devices. This series does the porting of the VFIO devices
>> onto the /dev/iommu uapi and let it coexist with the legacy implementation.
>> Other devices like vdpa, vfio mdev, etc. are not considered yet.
> vfio mdev has no special support in QEMU; it's just not supported by
> iommufd yet, and thus can only be operated through the legacy container
> interface at this point. Later, once it's supported by the kernel,
> presumably no additional enabling work will be required for mdev in QEMU.
>
>> For vfio devices, the new interface is tied with device fd and iommufd
>> as the iommufd solution is device-centric. This is different from legacy
>> vfio which is group-centric. To support both interfaces in QEMU, this
>> series introduces the iommu backend concept in the form of different
>> container classes. The existing vfio container is named legacy container
>> (equivalent with legacy iommu backend in this series), while the new
>> iommufd based container is named as iommufd container (may also be
>> mentioned
>> as iommufd backend in this series). The two backend types have their own
>> way to setup secure context and dma management interface. Below diagram
>> shows how it looks like with both BEs.
>>
>> VFIO   AddressSpace/Memory
>> +---+  +--+  +-+  +-+
>> |  pci  |  | platform |  |  ap |  | ccw |
>> +---+---+  ++-+  +--+--+  +--+--+ +--+
>> |   |   |||   AddressSpace   |
>> |   |   ||++-+
>> +---V---V---VV+   /
>> |   VFIOAddressSpace  | <+
>> |  |  |  MemoryListener
>> |  VFIOContainer list |
>> +---+++
>> ||
>> ||
>> +---V--++V--+
>> |   iommufd||vfio legacy|
>> |  container   || container |
>> +---+--+++--+
>> ||
>> | /dev/iommu | /dev/vfio/vfio
>> | /dev/vfio/devices/vfioX| /dev/vfio/$group_id
>>  Userspace  ||
>>
>> ===++===
>> =
>>  Kernel |  device fd |
>> +---+| group/container fd
>> | (BIND_IOMMUFD || (SET_CONTAINER/SET_IOMMU)
>> |  ATTACH_IOAS) || device fd
>> |   ||
>> |   +---VV-+
>> iommufd |   |vfio  |
>> (map/unmap  |   +-++---+
>>  ioas_copy) | || map/unmap
>> | ||
>>  +--V--++-V--+  +--V+
>>  | iommfd core ||  device|  |  vfio iommu   |
>>  +-+++  +---+
> last row: s/iommfd/iommufd/
>
> overall this sounds a reasonable abstraction. Later when vdpa starts
> supporting iommufd probably the iommufd BE will become even
> smaller with more logic shareable between vfio and vdpa.
>
>> [Secure Context setup]
>> - iommufd BE: uses device fd and iommufd to setup secure context
>>   (bind_iommufd, attach_ioas)
>> - vfio legacy BE: uses group fd and container fd to setup secure context
>>   (set_container, set_iommu)
>> [Device access]
>> - iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
>> - vfio legacy BE: device fd is retrieved from group fd ioctl
>> [DMA Mapping flow]
>> - VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
>> - VFIO populates DMA map/unmap via the container BEs
>>   *) iommufd BE: uses iommufd
>>   *) vfio legacy BE: uses container fd
>>
>> This series qomifies the VFIOContainer object which acts as a base class
> what does 'qomify' mean? I didn't find this word from dictionary...
sorry this is pure QEMU terminology. This stands for "QEMU Object Model"
additional info at:
https://qemu.readthedocs.io/en/latest/devel/qom.html

Eric
>
>> for a container. This base class is derived into the legacy VFIO container
>> and the new iommufd based container. The base class implements generic
>> code
>> such as code related to memory_listener and address 

Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-25 Thread Eric Auger
Hi,

On 4/18/22 2:09 PM, Yi Liu wrote:
> Hi Kevin,
>
> On 2022/4/18 16:49, Tian, Kevin wrote:
>>> From: Liu, Yi L 
>>> Sent: Thursday, April 14, 2022 6:47 PM
>>>
>>> With the introduction of iommufd[1], the linux kernel provides a
>>> generic
>>> interface for userspace drivers to propagate their DMA mappings to
>>> kernel
>>> for assigned devices. This series does the porting of the VFIO devices
>>> onto the /dev/iommu uapi and let it coexist with the legacy
>>> implementation.
>>> Other devices like vdpa, vfio mdev, etc. are not considered yet.
>>
>> vfio mdev has no special support in QEMU; it's just not supported by
>> iommufd yet, and thus can only be operated through the legacy container
>> interface at this point. Later, once it's supported by the kernel,
>> presumably no additional enabling work will be required for mdev in QEMU.
>
> yes. will make it more precise in next version.
>
>>>
>>> For vfio devices, the new interface is tied with device fd and iommufd
>>> as the iommufd solution is device-centric. This is different from
>>> legacy
>>> vfio which is group-centric. To support both interfaces in QEMU, this
>>> series introduces the iommu backend concept in the form of different
>>> container classes. The existing vfio container is named legacy
>>> container
>>> (equivalent with legacy iommu backend in this series), while the new
>>> iommufd based container is named as iommufd container (may also be
>>> mentioned
>>> as iommufd backend in this series). The two backend types have their
>>> own
>>> way to setup secure context and dma management interface. Below diagram
>>> shows how it looks like with both BEs.
>>>
>>>  VFIO   AddressSpace/Memory
>>>  +---+  +--+  +-+  +-+
>>>  |  pci  |  | platform |  |  ap |  | ccw |
>>>  +---+---+  ++-+  +--+--+  +--+--+
>>> +--+
>>>  |   |   |    |    |  
>>> AddressSpace   |
>>>  |   |   |    |   
>>> ++-+
>>>  +---V---V---VV+   /
>>>  |   VFIOAddressSpace  | <+
>>>  |  |  |  MemoryListener
>>>  |  VFIOContainer list |
>>>  +---+++
>>>  |    |
>>>  |    |
>>>  +---V--+    +V--+
>>>  |   iommufd    |    |    vfio legacy    |
>>>  |  container   |    | container |
>>>  +---+--+    ++--+
>>>  |    |
>>>  | /dev/iommu | /dev/vfio/vfio
>>>  | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
>>>   Userspace  |    |
>>>
>>> ===++===
>>> =
>>>   Kernel |  device fd |
>>>  +---+    | group/container fd
>>>  | (BIND_IOMMUFD |    | (SET_CONTAINER/SET_IOMMU)
>>>  |  ATTACH_IOAS) |    | device fd
>>>  |   |    |
>>>  |   +---VV-+
>>>  iommufd |   |    vfio  |
>>> (map/unmap  |   +-++---+
>>>   ioas_copy) | |    | map/unmap
>>>  | |    |
>>>   +--V--+    +-V--+  +--V+
>>>   | iommfd core |    |  device    |  |  vfio iommu   |
>>>   +-+    ++  +---+
>>
>> last row: s/iommfd/iommufd/
>
> thanks. a typo.
>
>> overall this sounds a reasonable abstraction. Later when vdpa starts
>> supporting iommufd probably the iommufd BE will become even
>> smaller with more logic shareable between vfio and vdpa.
>
> let's see if Jason Wang will give some idea. :-)
>
>>>
>>> [Secure Context setup]
>>> - iommufd BE: uses device fd and iommufd to setup secure context
>>>    (bind_iommufd, attach_ioas)
>>> - vfio legacy BE: uses group fd and container fd to setup secure
>>> context
>>>    (set_container, set_iommu)
>>> [Device access]
>>> - iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
>>> - vfio legacy BE: device fd is retrieved from group fd ioctl
>>> [DMA Mapping flow]
>>> - VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
>>> - VFIO populates DMA map/unmap via the container BEs
>>>    *) iommufd BE: uses iommufd
>>>    *) vfio legacy BE: uses container fd
>>>
>>> This series qomifies the VFIOContainer object which acts as a base
>>> class
>>
>> what does 'qomify' mean? I didn't find this word from dictionary...
>>
>>> for a 

Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-25 Thread Eric Auger
Hi Nicolin,

On 4/19/22 5:26 AM, Nicolin Chen wrote:
> On Sun, Apr 17, 2022 at 12:30:40PM +0200, Eric Auger wrote:
>
 - More tests
>>> I did a quick test on my ARM64 platform, using "iommu=smmuv3"
>>> string. The behaviors are different between using default and
>>> using legacy "iommufd=off".
>>>
>>> The legacy pathway exits the VM with:
>>> vfio 0002:01:00.0:
>>> failed to setup container for group 1:
>>> memory listener initialization failed:
>>> Region smmuv3-iommu-memory-region-16-0:
>>> device 00.02.0 requires iommu MAP notifier which is not currently 
>>> supported
>>>
>>> while the iommufd pathway started the VM but reported errors
>>> from host kernel about address translation failures, probably
>>> because of accessing unmapped addresses.
>>>
>>> I found iommufd pathway also calls error_propagate_prepend()
>>> to add to errp for not supporting IOMMU_NOTIFIER_MAP, but it
>>> doesn't get a chance to print errp out. Perhaps there should
>>> be a final error check somewhere to exit?
>> thank you for giving it a try.
>>
>> vsmmuv3 + vfio is not supported as we miss the HW nested stage support
>> and SMMU does not support cache mode. If you want to test viommu on ARM
>> you shall test virtio-iommu+vfio. This should work but this is not yet
>> tested.
> I tried "-device virtio-iommu" and "-device virtio-iommu-pci"
> separately with vfio-pci, but neither seems to work. The host
> SMMU driver reports Translation Faults.
>
> Do you know what commands I should use to run QEMU for that
> combination?
you shall use:

 -device virtio-iommu-pci -device vfio-pci,host=

Please make sure the "-device virtio-iommu-pci" is set *before* the
"-device vfio-pci,"

Otherwise the IOMMU MR notifiers are not set properly and this may be
the cause of your physical SMMU translation faults.

Eric
>
>> I pushed a fix for the error notification issue:
>> qemu-for-5.17-rc6-vm-rfcv2-rc0 on my git https://github.com/eauger/qemu.git
> Yes. This fixes the problem. Thanks!
>




Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-25 Thread Alex Williamson
On Mon, 25 Apr 2022 11:10:14 +0100
Daniel P. Berrangé  wrote:

> On Fri, Apr 22, 2022 at 04:09:43PM -0600, Alex Williamson wrote:
> > [Cc +libvirt folks]
> > 
> > On Thu, 14 Apr 2022 03:46:52 -0700
> > Yi Liu  wrote:
> >   
> > > With the introduction of iommufd[1], the linux kernel provides a generic
> > > interface for userspace drivers to propagate their DMA mappings to kernel
> > > for assigned devices. This series does the porting of the VFIO devices
> > > onto the /dev/iommu uapi and let it coexist with the legacy 
> > > implementation.
> > > Other devices like vdpa, vfio mdev, etc. are not considered yet.
> 
> snip
> 
> > > The selection of the backend is made on a device basis using the new
> > > iommufd option (on/off/auto). By default the iommufd backend is selected
> > > if supported by the host and by QEMU (iommufd KConfig). This option is
> > > currently available only for the vfio-pci device. For other types of
> > > devices, it does not yet exist and the legacy BE is chosen by default.  
> > 
> > I've discussed this a bit with Eric, but let me propose a different
> > command line interface.  Libvirt generally likes to pass file
> > descriptors to QEMU rather than grant it access to those files
> > directly.  This was problematic with vfio-pci because libvirt can't
> > easily know when QEMU will want to grab another /dev/vfio/vfio
> > container.  Therefore we abandoned this approach and instead libvirt
> > grants file permissions.
> > 
> > However, with iommufd there's no reason that QEMU ever needs more than
> > a single instance of /dev/iommufd and we're using per device vfio file
> > descriptors, so it seems like a good time to revisit this.  
> 
> I assume access to '/dev/iommufd' gives the process somewhat elevated
> privileges, such that you don't want to unconditionally give QEMU
> access to this device ?

It's not that dissimilar to /dev/vfio/vfio; it's an unprivileged
interface which should have limited scope for abuse, but here the goal
would be to de-privilege QEMU one step further so that it cannot open
the device file itself.

> > The interface I was considering would be to add an iommufd object to
> > QEMU, so we might have a:
> > 
> > -device iommufd[,fd=#][,id=foo]
> > 
> > For non-libvirt usage this would have the ability to open /dev/iommufd
> > itself if an fd is not provided.  This object could be shared with
> > other iommufd users in the VM and maybe we'd allow multiple instances
> > for more esoteric use cases.  [NB, maybe this should be a -object rather 
> > than
> > -device since the iommufd is not a guest visible device?]  
> 
> Yes,  -object would be the right answer for something that's purely
> a host side backend impl selector.
> 
> > The vfio-pci device might then become:
> > 
> > -device 
> > vfio-pci[,host=:BB:DD.f][,sysfsdev=/sys/path/to/device][,fd=#][,iommufd=foo]
> > 
> > So essentially we can specify the device via host, sysfsdev, or passing
> > an fd to the vfio device file.  When an iommufd object is specified,
> > "foo" in the example above, each of those options would use the
> > vfio-device access mechanism, essentially the same as iommufd=on in
> > your example.  With the fd passing option, an iommufd object would be
> > required and necessarily use device level access.
> > 
> > In your example, the iommufd=auto seems especially troublesome for
> > libvirt because QEMU is going to have different locked memory
> > requirements based on whether we're using type1 or iommufd, where the
> > latter resolves the duplicate accounting issues.  libvirt needs to know
> > deterministically which backed is being used, which this proposal seems
> > to provide, while at the same time bringing us more in line with fd
> > passing.  Thoughts?  Thanks,  
> 
> Yep, I agree that libvirt needs to have more direct control over this.
> This is also even more important if there are notable feature differences
> in the 2 backends.
> 
> I wonder if anyone has considered an even more distinct impl, whereby
> we have a completely different device type on the backend, eg
> 
>   -device 
> vfio-iommu-pci[,host=:BB:DD.f][,sysfsdev=/sys/path/to/device][,fd=#][,iommufd=foo]
> 
> If a vendor wants to fully remove the legacy impl, they can then use the
> Kconfig mechanism to disable the build of the legacy impl device, while
> keeping the iommu impl (or vica-verca if the new iommu impl isn't considered
> reliable enough for them to support yet).
> 
> Libvirt would use
> 
>-object iommu,id=iommu0,fd=NNN
>-device vfio-iommu-pci,fd=MMM,iommu=iommu0
> 
> Non-libvirt would use a simpler
> 
>-device vfio-iommu-pci,host=:03:22.1
> 
> with QEMU auto-creating a 'iommu' object in the background.
> 
> This would fit into libvirt's existing modelling better. We currently have
> a concept of a PCI assignment backend, which previously supported the
> legacy PCI assignment, vs the VFIO PCI assignment. This new iommu impl
> feels like a 3rd PCI assignment 

Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-25 Thread Jason Gunthorpe
On Mon, Apr 25, 2022 at 11:10:14AM +0100, Daniel P. Berrangé wrote:

> > However, with iommufd there's no reason that QEMU ever needs more than
> > a single instance of /dev/iommufd and we're using per device vfio file
> > descriptors, so it seems like a good time to revisit this.
> 
> I assume access to '/dev/iommufd' gives the process somewhat elevated
> privileges, such that you don't want to unconditionally give QEMU
> access to this device ?

It doesn't give much; at worst it allows userspace to allocate kernel
memory and pin pages, which can already be done through all sorts of
other interfaces QEMU already has access to.

Jason
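
To put that in concrete terms, a minimal sketch of what a bare holder of
the iommufd chardev can do on its own: allocate an IOAS and map (and
thereby let the kernel account and potentially pin) its own anonymous
memory. The uapi names follow the iommufd interface as it later landed
upstream, and the chardev is opened here as /dev/iommu per the diagram
quoted earlier in the thread; the exact node name was still under
discussion:

#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/iommufd.h>   /* IOMMU_IOAS_ALLOC, IOMMU_IOAS_MAP */

int main(void)
{
    int fd = open("/dev/iommu", O_RDWR);
    if (fd < 0) {
        return 1;
    }

    /* Allocate an I/O address space object. */
    struct iommu_ioas_alloc alloc = { .size = sizeof(alloc) };
    if (ioctl(fd, IOMMU_IOAS_ALLOC, &alloc)) {
        return 1;
    }

    /* Map 2MiB of our own anonymous memory into it; without
     * IOMMU_IOAS_MAP_FIXED_IOVA the kernel chooses the IOVA. */
    size_t len = 2 << 20;
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        return 1;
    }

    struct iommu_ioas_map map = {
        .size = sizeof(map),
        .flags = IOMMU_IOAS_MAP_READABLE | IOMMU_IOAS_MAP_WRITEABLE,
        .ioas_id = alloc.out_ioas_id,
        .user_va = (uintptr_t)buf,
        .length = len,
    };
    return ioctl(fd, IOMMU_IOAS_MAP, &map) ? 1 : 0;
}

Everything beyond that (binding a device, attaching it to the IOAS)
requires a vfio device fd as well.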



Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-25 Thread Daniel P . Berrangé
On Fri, Apr 22, 2022 at 04:09:43PM -0600, Alex Williamson wrote:
> [Cc +libvirt folks]
> 
> On Thu, 14 Apr 2022 03:46:52 -0700
> Yi Liu  wrote:
> 
> > With the introduction of iommufd[1], the linux kernel provides a generic
> > interface for userspace drivers to propagate their DMA mappings to kernel
> > for assigned devices. This series does the porting of the VFIO devices
> > onto the /dev/iommu uapi and let it coexist with the legacy implementation.
> > Other devices like vdpa, vfio mdev, etc. are not considered yet.

snip

> > The selection of the backend is made on a device basis using the new
> > iommufd option (on/off/auto). By default the iommufd backend is selected
> > if supported by the host and by QEMU (iommufd KConfig). This option is
> > currently available only for the vfio-pci device. For other types of
> > devices, it does not yet exist and the legacy BE is chosen by default.
> 
> I've discussed this a bit with Eric, but let me propose a different
> command line interface.  Libvirt generally likes to pass file
> descriptors to QEMU rather than grant it access to those files
> directly.  This was problematic with vfio-pci because libvirt can't
> easily know when QEMU will want to grab another /dev/vfio/vfio
> container.  Therefore we abandoned this approach and instead libvirt
> grants file permissions.
> 
> However, with iommufd there's no reason that QEMU ever needs more than
> a single instance of /dev/iommufd and we're using per device vfio file
> descriptors, so it seems like a good time to revisit this.

I assume access to '/dev/iommufd' gives the process somewhat elevated
privileges, such that you don't want to unconditionally give QEMU
access to this device ?

> The interface I was considering would be to add an iommufd object to
> QEMU, so we might have a:
> 
> -device iommufd[,fd=#][,id=foo]
> 
> For non-libvirt usage this would have the ability to open /dev/iommufd
> itself if an fd is not provided.  This object could be shared with
> other iommufd users in the VM and maybe we'd allow multiple instances
> for more esoteric use cases.  [NB, maybe this should be a -object rather than
> -device since the iommufd is not a guest visible device?]

Yes,  -object would be the right answer for something that's purely
a host side backend impl selector.

> The vfio-pci device might then become:
> 
> -device 
> vfio-pci[,host=:BB:DD.f][,sysfsdev=/sys/path/to/device][,fd=#][,iommufd=foo]
> 
> So essentially we can specify the device via host, sysfsdev, or passing
> an fd to the vfio device file.  When an iommufd object is specified,
> "foo" in the example above, each of those options would use the
> vfio-device access mechanism, essentially the same as iommufd=on in
> your example.  With the fd passing option, an iommufd object would be
> required and necessarily use device level access.
> 
> In your example, the iommufd=auto seems especially troublesome for
> libvirt because QEMU is going to have different locked memory
> requirements based on whether we're using type1 or iommufd, where the
> latter resolves the duplicate accounting issues.  libvirt needs to know
> deterministically which backed is being used, which this proposal seems
> to provide, while at the same time bringing us more in line with fd
> passing.  Thoughts?  Thanks,

Yep, I agree that libvirt needs to have more direct control over this.
This is also even more important if there are notable feature differences
in the 2 backends.

I wonder if anyone has considered an even more distinct impl, whereby
we have a completely different device type on the backend, eg

  -device 
vfio-iommu-pci[,host=:BB:DD.f][,sysfsdev=/sys/path/to/device][,fd=#][,iommufd=foo]

If a vendor wants to fully remove the legacy impl, they can then use the
Kconfig mechanism to disable the build of the legacy impl device, while
keeping the iommu impl (or vica-verca if the new iommu impl isn't considered
reliable enough for them to support yet).

Libvirt would use

   -object iommu,id=iommu0,fd=NNN
   -device vfio-iommu-pci,fd=MMM,iommu=iommu0

Non-libvirt would use a simpler

   -device vfio-iommu-pci,host=:03:22.1

with QEMU auto-creating a 'iommu' object in the background.
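
For completeness, a sketch of how a management application could hand
pre-opened fds to such options. The -object/-device spellings are the
ones proposed above rather than an existing QEMU interface, and the
vfio0 node and binary name are placeholders:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Open the nodes ourselves; no O_CLOEXEC, so both fds survive exec(). */
    int iommu_fd = open("/dev/iommu", O_RDWR);                /* the NNN fd */
    int vfio_fd  = open("/dev/vfio/devices/vfio0", O_RDWR);   /* the MMM fd */
    if (iommu_fd < 0 || vfio_fd < 0) {
        return 1;
    }

    char obj[64], dev[128];
    snprintf(obj, sizeof(obj), "iommu,id=iommu0,fd=%d", iommu_fd);
    snprintf(dev, sizeof(dev), "vfio-iommu-pci,fd=%d,iommu=iommu0", vfio_fd);

    execlp("qemu-system-aarch64", "qemu-system-aarch64",
           "-object", obj, "-device", dev, (char *)NULL);
    perror("exec");
    return 1;
}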

This would fit into libvirt's existing modelling better. We currently have
a concept of a PCI assignment backend, which previously supported the
legacy PCI assignment, vs the VFIO PCI assignment. This new iommu impl
feels like a 3rd PCI assignment approach, and so fits with how we modelled
it as a different device type in the past.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-22 Thread Alex Williamson
[Cc +libvirt folks]

On Thu, 14 Apr 2022 03:46:52 -0700
Yi Liu  wrote:

> With the introduction of iommufd[1], the linux kernel provides a generic
> interface for userspace drivers to propagate their DMA mappings to kernel
> for assigned devices. This series does the porting of the VFIO devices
> onto the /dev/iommu uapi and let it coexist with the legacy implementation.
> Other devices like vdpa, vfio mdev, etc. are not considered yet.
> 
> For vfio devices, the new interface is tied with device fd and iommufd
> as the iommufd solution is device-centric. This is different from legacy
> vfio which is group-centric. To support both interfaces in QEMU, this
> series introduces the iommu backend concept in the form of different
> container classes. The existing vfio container is named legacy container
> (equivalent with legacy iommu backend in this series), while the new
> iommufd based container is named as iommufd container (may also be mentioned
> as iommufd backend in this series). The two backend types have their own
> way to setup secure context and dma management interface. Below diagram
> shows how it looks like with both BEs.
> 
> VFIO   AddressSpace/Memory
> +---+  +--+  +-+  +-+
> |  pci  |  | platform |  |  ap |  | ccw |
> +---+---+  ++-+  +--+--+  +--+--+ +--+
> |   |   |||   AddressSpace   |
> |   |   ||++-+
> +---V---V---VV+   /
> |   VFIOAddressSpace  | <+
> |  |  |  MemoryListener
> |  VFIOContainer list |
> +---+++
> ||
> ||
> +---V--++V--+
> |   iommufd||vfio legacy|
> |  container   || container |
> +---+--+++--+
> ||
> | /dev/iommu | /dev/vfio/vfio
> | /dev/vfio/devices/vfioX| /dev/vfio/$group_id
>  Userspace  ||
>  ===++
>  Kernel |  device fd |
> +---+| group/container fd
> | (BIND_IOMMUFD || (SET_CONTAINER/SET_IOMMU)
> |  ATTACH_IOAS) || device fd
> |   ||
> |   +---VV-+
> iommufd |   |vfio  |
> (map/unmap  |   +-++---+
>  ioas_copy) | || map/unmap
> | ||
>  +--V--++-V--+  +--V+
>  | iommfd core ||  device|  |  vfio iommu   |
>  +-+++  +---+
> 
> [Secure Context setup]
> - iommufd BE: uses device fd and iommufd to setup secure context
>   (bind_iommufd, attach_ioas)
> - vfio legacy BE: uses group fd and container fd to setup secure context
>   (set_container, set_iommu)
> [Device access]
> - iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
> - vfio legacy BE: device fd is retrieved from group fd ioctl
> [DMA Mapping flow]
> - VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
> - VFIO populates DMA map/unmap via the container BEs
>   *) iommufd BE: uses iommufd
>   *) vfio legacy BE: uses container fd
> 
> This series qomifies the VFIOContainer object which acts as a base class
> for a container. This base class is derived into the legacy VFIO container
> and the new iommufd based container. The base class implements generic code
> such as code related to memory_listener and address space management whereas
> the derived class implements callbacks that depend on the kernel user space
> being used.
> 
> The selection of the backend is made on a device basis using the new
> iommufd option (on/off/auto). By default the iommufd backend is selected
> if supported by the host and by QEMU (iommufd KConfig). This option is
> currently available only for the vfio-pci device. For other types of
> devices, it does not yet exist and the legacy BE is chosen by default.

I've discussed this a bit with Eric, but let me propose a different
command line interface.  Libvirt generally likes to pass file
descriptors to QEMU rather than grant it access to those files
directly.  This was problematic with vfio-pci because libvirt can't
easily know when QEMU will want to grab another /dev/vfio/vfio
container.  Therefore we abandoned 

Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-18 Thread Nicolin Chen
On Sun, Apr 17, 2022 at 12:30:40PM +0200, Eric Auger wrote:

> >> - More tests
> > I did a quick test on my ARM64 platform, using "iommu=smmuv3"
> > string. The behaviors are different between using default and
> > using legacy "iommufd=off".
> >
> > The legacy pathway exits the VM with:
> > vfio 0002:01:00.0:
> > failed to setup container for group 1:
> > memory listener initialization failed:
> > Region smmuv3-iommu-memory-region-16-0:
> > device 00.02.0 requires iommu MAP notifier which is not currently 
> > supported
> >
> > while the iommufd pathway started the VM but reported errors
> > from host kernel about address translation failures, probably
> > because of accessing unmapped addresses.
> >
> > I found iommufd pathway also calls error_propagate_prepend()
> > to add to errp for not supporting IOMMU_NOTIFIER_MAP, but it
> > doesn't get a chance to print errp out. Perhaps there should
> > be a final error check somewhere to exit?
> 
> thank you for giving it a try.
> 
> vsmmuv3 + vfio is not supported as we miss the HW nested stage support
> and SMMU does not support cache mode. If you want to test viommu on ARM
> you shall test virtio-iommu+vfio. This should work but this is not yet
> tested.

I tried "-device virtio-iommu" and "-device virtio-iommu-pci"
separately with vfio-pci, but neither seems to work. The host
SMMU driver reports Translation Faults.

Do you know what commands I should use to run QEMU for that
combination?

> I pushed a fix for the error notification issue:
> qemu-for-5.17-rc6-vm-rfcv2-rc0 on my git https://github.com/eauger/qemu.git

Yes. This fixes the problem. Thanks!



Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-18 Thread Yi Liu

Hi Kevin,

On 2022/4/18 16:49, Tian, Kevin wrote:

From: Liu, Yi L 
Sent: Thursday, April 14, 2022 6:47 PM

With the introduction of iommufd[1], the Linux kernel provides a generic
interface for userspace drivers to propagate their DMA mappings to the
kernel for assigned devices. This series ports the VFIO devices onto the
/dev/iommu uapi and lets it coexist with the legacy implementation. Other
devices such as vdpa and vfio mdev are not considered yet.


vfio mdev has no special support in QEMU. It is just not supported by
iommufd yet, so it can only be operated through the legacy container
interface at this point. Once it is supported by the kernel, presumably no
additional enabling work will be required for mdev in QEMU.


Yes, will make it more precise in the next version.



For vfio devices, the new interface is tied to the device fd and iommufd,
as the iommufd solution is device-centric. This is different from legacy
vfio, which is group-centric. To support both interfaces in QEMU, this
series introduces the iommu backend concept in the form of different
container classes. The existing vfio container is named the legacy
container (equivalent to the legacy iommu backend in this series), while
the new iommufd-based container is named the iommufd container (also
referred to as the iommufd backend in this series). The two backend types
have their own ways to set up a secure context and their own DMA
management interfaces. The diagram below shows how it looks with both BEs.

    VFIO                                         AddressSpace/Memory
    +-------+  +----------+  +-----+  +-----+
    |  pci  |  | platform |  |  ap |  | ccw |
    +---+---+  +----+-----+  +--+--+  +--+--+    +----------------------+
        |           |           |        |       |     AddressSpace     |
        |           |           |        |       +------------+---------+
    +---V-----------V-----------V--------V----+              /
    |            VFIOAddressSpace             | <-----------+
    |                    |                    |  MemoryListener
    |            VFIOContainer list           |
    +-------+---------------------------+-----+
            |                           |
            |                           |
    +-------V------+           +--------V--------+
    |   iommufd    |           |   vfio legacy   |
    |  container   |           |    container    |
    +-------+------+           +--------+--------+
            |                           |
            | /dev/iommu                | /dev/vfio/vfio
            | /dev/vfio/devices/vfioX   | /dev/vfio/$group_id
 Userspace  |                           |
============+===========================+===============================
 Kernel     |  device fd                |
            +---------------+           | group/container fd
            | (BIND_IOMMUFD |           | (SET_CONTAINER/SET_IOMMU)
            |  ATTACH_IOAS) |           | device fd
            |               |           |
            |       +-------V-----------V---------------+
    iommufd |       |                vfio                |
(map/unmap  |       +--------+----------------+---------+
 ioas_copy) |                |                | map/unmap
            |                |                |
     +------V------+    +----V-------+   +----V----------+
     | iommfd core |    |   device   |   |   vfio iommu  |
     +-------------+    +------------+   +---------------+


last row: s/iommfd/iommufd/


thanks. a typo.


Overall this sounds like a reasonable abstraction. Later, when vdpa starts
supporting iommufd, the iommufd BE will probably become even smaller, with
more logic shareable between vfio and vdpa.


let's see if Jason Wang will give some idea. :-)



[Secure Context setup]
- iommufd BE: uses device fd and iommufd to setup secure context
   (bind_iommufd, attach_ioas)
- vfio legacy BE: uses group fd and container fd to setup secure context
   (set_container, set_iommu)
[Device access]
- iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
- vfio legacy BE: device fd is retrieved from group fd ioctl
[DMA Mapping flow]
- VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
- VFIO populates DMA map/unmap via the container BEs
   *) iommufd BE: uses iommufd
   *) vfio legacy BE: uses container fd

This series qomifies the VFIOContainer object which acts as a base class


What does 'qomify' mean? I didn't find this word in the dictionary...


for a container. This base class is derived into the legacy VFIO container
and the new iommufd based container. The base class implements generic code
such as code related to memory_listener and address space management whereas
the derived class implements callbacks that depend on the kernel user space


'the kernel user space'?


Aha, I just wanted to express that different BE callbacks will use the
different user interfaces exposed by
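To illustrate the "qomify" point being discussed, here is a minimal sketch
of what turning VFIOContainer into a QOM base class with per-backend
callbacks could look like; all type and callback names are illustrative
guesses, not the series' actual code:

  #include "qemu/osdep.h"
  #include "qom/object.h"
  #include "exec/hwaddr.h"

  #define TYPE_VFIO_CONTAINER "vfio-container-base"

  typedef struct VFIOContainer {
      Object parent_obj;
      /* generic state shared by both BEs: address space, listener, ... */
  } VFIOContainer;

  typedef struct VFIOContainerClass {
      ObjectClass parent_class;

      /* per-backend hooks: the legacy BE backs these with container fd
       * ioctls, the iommufd BE with /dev/iommu operations */
      int (*attach_device)(VFIOContainer *c, void *vbasedev, Error **errp);
      int (*dma_map)(VFIOContainer *c, hwaddr iova, uint64_t size,
                     void *vaddr, bool readonly);
      int (*dma_unmap)(VFIOContainer *c, hwaddr iova, uint64_t size);
  } VFIOContainerClass;

  #define VFIO_CONTAINER_GET_CLASS(obj) \
      OBJECT_GET_CLASS(VFIOContainerClass, (obj), TYPE_VFIO_CONTAINER)

  /* Generic code (e.g. the MemoryListener callbacks) dispatches through
   * the class without knowing which backend it is talking to. */
  static int vfio_container_dma_map(VFIOContainer *c, hwaddr iova,
                                    uint64_t size, void *vaddr, bool readonly)
  {
      return VFIO_CONTAINER_GET_CLASS(c)->dma_map(c, iova, size, vaddr,
                                                  readonly);
  }

  static const TypeInfo vfio_container_info = {
      .name          = TYPE_VFIO_CONTAINER,
      .parent        = TYPE_OBJECT,
      .instance_size = sizeof(VFIOContainer),
      .class_size    = sizeof(VFIOContainerClass),
      .abstract      = true,  /* only the two derived BEs are instantiated */
  };

  static void vfio_container_register_types(void)
  {
      type_register_static(&vfio_container_info);
  }
  type_init(vfio_container_register_types)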

RE: [RFC 00/18] vfio: Adopt iommufd

2022-04-18 Thread Tian, Kevin
> From: Liu, Yi L 
> Sent: Thursday, April 14, 2022 6:47 PM
> 
> With the introduction of iommufd[1], the Linux kernel provides a generic
> interface for userspace drivers to propagate their DMA mappings to the
> kernel for assigned devices. This series ports the VFIO devices onto the
> /dev/iommu uapi and lets it coexist with the legacy implementation. Other
> devices such as vdpa and vfio mdev are not considered yet.

vfio mdev has no special support in QEMU. It is just not supported by
iommufd yet, so it can only be operated through the legacy container
interface at this point. Once it is supported by the kernel, presumably no
additional enabling work will be required for mdev in QEMU.

> 
> For vfio devices, the new interface is tied to the device fd and iommufd,
> as the iommufd solution is device-centric. This is different from legacy
> vfio, which is group-centric. To support both interfaces in QEMU, this
> series introduces the iommu backend concept in the form of different
> container classes. The existing vfio container is named the legacy
> container (equivalent to the legacy iommu backend in this series), while
> the new iommufd-based container is named the iommufd container (also
> referred to as the iommufd backend in this series). The two backend types
> have their own ways to set up a secure context and their own DMA
> management interfaces. The diagram below shows how it looks with both BEs.
> 
>     VFIO                                         AddressSpace/Memory
>     +-------+  +----------+  +-----+  +-----+
>     |  pci  |  | platform |  |  ap |  | ccw |
>     +---+---+  +----+-----+  +--+--+  +--+--+    +----------------------+
>         |           |           |        |       |     AddressSpace     |
>         |           |           |        |       +------------+---------+
>     +---V-----------V-----------V--------V----+              /
>     |            VFIOAddressSpace             | <-----------+
>     |                    |                    |  MemoryListener
>     |            VFIOContainer list           |
>     +-------+---------------------------+-----+
>             |                           |
>             |                           |
>     +-------V------+           +--------V--------+
>     |   iommufd    |           |   vfio legacy   |
>     |  container   |           |    container    |
>     +-------+------+           +--------+--------+
>             |                           |
>             | /dev/iommu                | /dev/vfio/vfio
>             | /dev/vfio/devices/vfioX   | /dev/vfio/$group_id
>  Userspace  |                           |
> ============+===========================+===============================
>  Kernel     |  device fd                |
>             +---------------+           | group/container fd
>             | (BIND_IOMMUFD |           | (SET_CONTAINER/SET_IOMMU)
>             |  ATTACH_IOAS) |           | device fd
>             |               |           |
>             |       +-------V-----------V---------------+
>     iommufd |       |                vfio                |
> (map/unmap  |       +--------+----------------+---------+
>  ioas_copy) |                |                | map/unmap
>             |                |                |
>      +------V------+    +----V-------+   +----V----------+
>      | iommfd core |    |   device   |   |   vfio iommu  |
>      +-------------+    +------------+   +---------------+

last row: s/iommfd/iommufd/

Overall this sounds like a reasonable abstraction. Later, when vdpa starts
supporting iommufd, the iommufd BE will probably become even smaller, with
more logic shareable between vfio and vdpa.

> 
> [Secure Context setup]
> - iommufd BE: uses device fd and iommufd to setup secure context
>   (bind_iommufd, attach_ioas)
> - vfio legacy BE: uses group fd and container fd to setup secure context
>   (set_container, set_iommu)
> [Device access]
> - iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
> - vfio legacy BE: device fd is retrieved from group fd ioctl
> [DMA Mapping flow]
> - VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
> - VFIO populates DMA map/unmap via the container BEs
>   *) iommufd BE: uses iommufd
>   *) vfio legacy BE: uses container fd
> 
> This series qomifies the VFIOContainer object which acts as a base class

What does 'qomify' mean? I didn't find this word in the dictionary...

> for a container. This base class is derived into the legacy VFIO container
> and the new iommufd based container. The base class implements generic code
> such as code related to memory_listener and address space management whereas
> the derived class implements callbacks that depend on the kernel user space

'the kernel user space'?

> being used.
> 
> The selection of the backend is made on a device basis using the new
> iommufd option (on/off/auto). By default the iommufd backend is 
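To make the map/unmap split in the quoted cover letter concrete, the sketch
below shows what each backend ultimately issues for a single mapping. The
legacy half uses the existing type1 uapi; the iommufd half uses
IOMMU_IOAS_MAP as proposed for /dev/iommu, so its struct and flag names
follow that proposal and may differ from what finally lands. Error handling
is omitted.

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/vfio.h>      /* legacy type1 uapi, in mainline */
  #include <linux/iommufd.h>   /* iommufd uapi, from an iommufd-capable kernel */

  /* Legacy BE: map guest memory through the container fd. */
  static int legacy_dma_map(int container_fd, uint64_t iova,
                            uint64_t size, void *vaddr)
  {
      struct vfio_iommu_type1_dma_map map = {
          .argsz = sizeof(map),
          .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
          .vaddr = (uintptr_t)vaddr,
          .iova  = iova,
          .size  = size,
      };

      return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
  }

  /* iommufd BE: map into an IOAS through /dev/iommu. */
  static int iommufd_dma_map(int iommufd, uint32_t ioas_id, uint64_t iova,
                             uint64_t size, void *vaddr)
  {
      struct iommu_ioas_map map = {
          .size    = sizeof(map),
          .flags   = IOMMU_IOAS_MAP_READABLE | IOMMU_IOAS_MAP_WRITEABLE |
                     IOMMU_IOAS_MAP_FIXED_IOVA,
          .ioas_id = ioas_id,
          .user_va = (uintptr_t)vaddr,
          .length  = size,
          .iova    = iova,
      };

      return ioctl(iommufd, IOMMU_IOAS_MAP, &map);
  }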

Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-17 Thread Eric Auger
Hi Nicolin,

On 4/15/22 10:37 AM, Nicolin Chen wrote:
> Hi,
>
> Thanks for the work!
>
> On Thu, Apr 14, 2022 at 03:46:52AM -0700, Yi Liu wrote:
>  
>> - More tests
> I did a quick test on my ARM64 platform, using the "iommu=smmuv3"
> string. The behaviors differ between using the default and using the
> legacy "iommufd=off".
>
> The legacy pathway exits the VM with:
> vfio 0002:01:00.0:
> failed to setup container for group 1:
> memory listener initialization failed:
> Region smmuv3-iommu-memory-region-16-0:
> device 00.02.0 requires iommu MAP notifier which is not currently supported
>
> while the iommufd pathway started the VM but reported errors
> from the host kernel about address translation failures, probably
> because of accesses to unmapped addresses.
>
> I found the iommufd pathway also calls error_propagate_prepend()
> to add the unsupported IOMMU_NOTIFIER_MAP error to errp, but errp
> never gets a chance to be printed. Perhaps there should be a final
> error check somewhere to exit?

Thank you for giving it a try.

vSMMUv3 + VFIO is not supported as we lack HW nested-stage support
and the SMMU does not support cache mode. If you want to test a vIOMMU
on ARM you should test virtio-iommu + VFIO. This should work but has
not been tested yet.

I pushed a fix for the error notification issue:
qemu-for-5.17-rc6-vm-rfcv2-rc0 on my git https://github.com/eauger/qemu.git

Thanks

Eric
>
> Nic
>




Re: [RFC 00/18] vfio: Adopt iommufd

2022-04-15 Thread Nicolin Chen
Hi,

Thanks for the work!

On Thu, Apr 14, 2022 at 03:46:52AM -0700, Yi Liu wrote:
 
> - More tests

I did a quick test on my ARM64 platform, using the "iommu=smmuv3"
string. The behaviors differ between using the default and using the
legacy "iommufd=off".

The legacy pathway exits the VM with:
vfio 0002:01:00.0:
failed to setup container for group 1:
memory listener initialization failed:
Region smmuv3-iommu-memory-region-16-0:
device 00.02.0 requires iommu MAP notifier which is not currently supported

while the iommufd pathway started the VM but reported errors
from the host kernel about address translation failures, probably
because of accesses to unmapped addresses.

I found the iommufd pathway also calls error_propagate_prepend()
to add the unsupported IOMMU_NOTIFIER_MAP error to errp, but errp
never gets a chance to be printed. Perhaps there should be a final
error check somewhere to exit?

Nic