Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 08:14:29PM +0100, j...@8bytes.org wrote: > On Tue, Nov 03, 2020 at 01:48:51PM -0400, Jason Gunthorpe wrote: > > I think the same PCI driver with a small flag to support the PF or > > VF is not the same as two completely different drivers in different > > subsystems > > There are counter-examples: ixgbe vs. ixgbevf. > > Note that also a single driver can support both, an SVA device and an > mdev device, sharing code for accessing parts of the device like queues > and handling interrupts. Needing an mdev device at all is the larger issue; mdev means the kernel must carry a lot of emulation code depending on how the SVA device is designed. E.g. creating queues may require an emulated BAR. Shifting that code to userspace and having a single clean 'SVA' interface from the kernel for the device makes a lot more sense, especially from a security perspective. Forcing all vIOMMU stuff to only use VFIO permanently closes this as an option. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 01:48:51PM -0400, Jason Gunthorpe wrote: > I think the same PCI driver with a small flag to support the PF or > VF is not the same as two completely different drivers in different > subsystems There are counter-examples: ixgbe vs. ixgbevf. Note that also a single driver can support both, an SVA device and an mdev device, sharing code for accessing parts of the device like queues and handling interrupts. Regards, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 05:55:40PM +0100, j...@8bytes.org wrote: > On Tue, Nov 03, 2020 at 11:22:23AM -0400, Jason Gunthorpe wrote: > > This whole thread was brought up by IDXD which has a SVA driver and > > now wants to add a vfio-mdev driver too. SVA devices that want to be > > plugged into VMs are going to be common - this architecture that a SVA > > driver cannot cover the kvm case seems problematic. > > Isn't that the same pattern as having separate drivers for VFs and the > parent device in SR-IOV? I think the same PCI driver with a small flag to support the PF or VF is not the same as two completely different drivers in different subsystems Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 11:22:23AM -0400, Jason Gunthorpe wrote: > This whole thread was brought up by IDXD which has a SVA driver and > now wants to add a vfio-mdev driver too. SVA devices that want to be > plugged into VMs are going to be common - this architecture that a SVA > driver cannot cover the kvm case seems problematic. Isn't that the same pattern as having separate drivers for VFs and the parent device in SR-IOV? Regards, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 03:35:32PM +0100, j...@8bytes.org wrote: > On Tue, Nov 03, 2020 at 10:06:42AM -0400, Jason Gunthorpe wrote: > > The point is that other places beyond VFIO need this > > Which and why? > > > Sure, but sometimes it is necessary, and in those cases the answer > > can't be "rewrite a SVA driver to use vfio" > > This is getting to abstract. Can you come up with an example where > handling this in VFIO or an endpoint device kernel driver does not work? This whole thread was brought up by IDXD which has a SVA driver and now wants to add a vfio-mdev driver too. SVA devices that want to be plugged into VMs are going to be common - this architecture that a SVA driver cannot cover the kvm case seems problematic. Yes, everything can have a SVA driver and a vfio-mdev, it works just fine, but it is not very clean or simple. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 10:06:42AM -0400, Jason Gunthorpe wrote: > The point is that other places beyond VFIO need this Which and why? > Sure, but sometimes it is necessary, and in those cases the answer > can't be "rewrite a SVA driver to use vfio" This is getting too abstract. Can you come up with an example where handling this in VFIO or an endpoint device kernel driver does not work? Regards, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 03:03:18PM +0100, j...@8bytes.org wrote: > On Tue, Nov 03, 2020 at 09:23:35AM -0400, Jason Gunthorpe wrote: > > Userspace needs fine grained control over the composition of the page > > table behind the PASID, 1:1 with the mm_struct is only one use case. > > VFIO already offers an interface for that. It shouldn't be too > complicated to expand that for PASID-bound page-tables. > > > Userspace needs to be able to handle IOMMU faults, apparently > > Could be implemented by a fault-fd handed out by VFIO. The point is that other places beyond VFIO need this > I really don't think that user-space should have to deal with details > like PASIDs or other IOMMU internals, unless absolutly necessary. This > is an OS we work on, and the idea behind an OS is to abstract the > hardware away. Sure, but sometimes it is necessary, and in those cases the answer can't be "rewrite a SVA driver to use vfio" Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 09:23:35AM -0400, Jason Gunthorpe wrote: > Userspace needs fine grained control over the composition of the page > table behind the PASID, 1:1 with the mm_struct is only one use case. VFIO already offers an interface for that. It shouldn't be too complicated to expand that for PASID-bound page-tables. > Userspace needs to be able to handle IOMMU faults, apparently Could be implemented by a fault-fd handed out by VFIO. > The Intel guys had a bunch of other stuff too, looking through the new > API they are proposing for vfio gives some flavour what they think is > needed.. I really don't think that user-space should have to deal with details like PASIDs or other IOMMU internals, unless absolutely necessary. This is an OS we work on, and the idea behind an OS is to abstract the hardware away. Regards, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
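[Illustration] For a rough idea of what consuming such a fault-fd could look like from userspace, here is a minimal sketch. The VFIO_DEVICE_GET_FAULT_FD ioctl and the one-record-per-read() protocol are assumptions made up for this sketch (no such uAPI is defined in this thread); the event layout borrows from the uapi struct iommu_fault that existed in <linux/iommu.h> of that era.

/* Hypothetical sketch: relay IOMMU page faults from a fault-fd to a vIOMMU.
 * VFIO_DEVICE_GET_FAULT_FD and the read() protocol are assumptions, not a
 * real uAPI; struct iommu_fault is the uapi definition of that era. */
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/iommu.h>

#define VFIO_DEVICE_GET_FAULT_FD _IO(';', 200)	/* made-up ioctl number */

static int handle_faults(int device_fd)
{
	struct iommu_fault evt;
	int fault_fd = ioctl(device_fd, VFIO_DEVICE_GET_FAULT_FD);

	if (fault_fd < 0)
		return -1;

	/* A real interface would likely be poll()/eventfd driven and batched. */
	while (read(fault_fd, &evt, sizeof(evt)) == (ssize_t)sizeof(evt)) {
		if (evt.type == IOMMU_FAULT_PAGE_REQ)
			printf("page request: pasid=%u addr=0x%llx\n",
			       evt.prm.pasid, (unsigned long long)evt.prm.addr);
		/* relay to the guest vIOMMU here, then complete the fault via
		 * a page-response ioctl (also hypothetical in this sketch) */
	}
	close(fault_fd);
	return 0;
}

Whether such an fd would be handed out by VFIO, by a /dev/sva-style device, or per PASID is exactly the layering question being debated in this thread.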
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 02:18:52PM +0100, j...@8bytes.org wrote: > On Tue, Nov 03, 2020 at 08:56:43AM -0400, Jason Gunthorpe wrote: > > On Tue, Nov 03, 2020 at 10:52:09AM +0100, j...@8bytes.org wrote: > > > So having said this, what is the benefit of exposing those SVA internals > > > to user-space? > > > > Only the device use of the PASID is device specific, the actual PASID > > and everything on the IOMMU side is generic. > > > > There is enough API there it doesn't make sense to duplicate it into > > every single SVA driver. > > What generic things have to be done by the drivers besides > allocating/deallocating PASIDs and binding an address space to it? > > Is there anything which isn't better handled in a kernel-internal > library which drivers just use? Userspace needs fine-grained control over the composition of the page table behind the PASID; 1:1 with the mm_struct is only one use case. Userspace needs to be able to handle IOMMU faults, apparently. The Intel guys had a bunch of other stuff too; looking through the new API they are proposing for vfio gives some flavour of what they think is needed. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 08:56:43AM -0400, Jason Gunthorpe wrote: > On Tue, Nov 03, 2020 at 10:52:09AM +0100, j...@8bytes.org wrote: > > So having said this, what is the benefit of exposing those SVA internals > > to user-space? > > Only the device use of the PASID is device specific, the actual PASID > and everything on the IOMMU side is generic. > > There is enough API there it doesn't make sense to duplicate it into > every single SVA driver. What generic things have to be done by the drivers besides allocating/deallocating PASIDs and binding an address space to it? Is there anything which isn't better handled in a kernel-internal library which drivers just use? Regards, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 10:52:09AM +0100, j...@8bytes.org wrote: > So having said this, what is the benefit of exposing those SVA internals > to user-space? Only the device use of the PASID is device specific, the actual PASID and everything on the IOMMU side is generic. There is enough API there it doesn't make sense to duplicate it into every single SVA driver. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Mon, Oct 12, 2020 at 08:38:54AM +, Tian, Kevin wrote: > > From: Jason Wang > > Jason suggest something like /dev/sva. There will be a lot of other > > subsystems that could benefit from this (e.g vDPA). Honestly, I fail to see the benefit of offloading these IOMMU specific setup tasks to user-space. The ways PASID, and the device partitioning it allows, are used are very device specific. A GPU will be partitioned completely differently than a network card. So the device drivers should use the (v)SVA APIs to set up the partitioning in a way which makes sense for the device. And VFIO is of course a user by itself, as it allows assigning device partitions to guests. Or even allow assigning complete devices and allow the guests to partition them themselves. So having said this, what is the benefit of exposing those SVA internals to user-space? Regards, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/22 上午11:54, Liu, Yi L wrote: Hi Jason, From: Jason Wang Sent: Thursday, October 22, 2020 10:56 AM [...] If you(Intel) don't have plan to do vDPA, you should not prevent other vendors from implementing PASID capable hardware through non-VFIO subsystem/uAPI on top of your SIOV architecture. Isn't it? yes, that's true. So if Intel has the willing to collaborate on the POC, I'd happy to help. E.g it's not hard to have a PASID capable virtio device through qemu, and we can start from there. actually, I'm already doing a poc to move the PASID allocation/free interface out of VFIO. So that other users could use it as well. I think this is also what you replied previously. :-) I'll send out when it's ready and seek for your help on mature it. does it sound good to you? Yes, fine with me. Thanks Regards, Yi Liu Thanks ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi Jason, > From: Jason Wang > Sent: Thursday, October 22, 2020 10:56 AM > [...] > If you(Intel) don't have plan to do vDPA, you should not prevent other vendors > from implementing PASID capable hardware through non-VFIO subsystem/uAPI > on top of your SIOV architecture. Isn't it? Yes, that's true. > So if Intel has the willing to collaborate on the POC, I'd happy to help. E.g > it's not > hard to have a PASID capable virtio device through qemu, and we can start from > there. Actually, I'm already doing a PoC to move the PASID allocation/free interface out of VFIO, so that other users can use it as well. I think this is also what you suggested in your earlier reply. :-) I'll send it out when it's ready and seek your help to mature it. Does that sound good to you? Regards, Yi Liu > > Thanks > > > > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/22 上午1:51, Raj, Ashok wrote: On Wed, Oct 21, 2020 at 08:48:29AM -0300, Jason Gunthorpe wrote: On Tue, Oct 20, 2020 at 01:27:13PM -0700, Raj, Ashok wrote: On Tue, Oct 20, 2020 at 05:14:03PM -0300, Jason Gunthorpe wrote: On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote: On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote: On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote: I think we agreed (or agree to disagree and commit) for device types that we have for SIOV, VFIO based approach works well without having to re-invent another way to do the same things. Not looking for a shortcut by any means, but we need to plan around existing hardware though. Looks like vDPA took some shortcuts then to not abstract iommu uAPI instead :-)? When all necessary hardware was available.. This would be a solved puzzle. I think it is the opposite, vIOMMU and related has outgrown VFIO as the "home" and needs to stand alone. Apparently the HW that will need PASID for vDPA is Intel HW, so if So just to make this clear, I did check internally if there are any plans for vDPA + SVM. There are none at the moment. Not SVM, SIOV. ... And that included.. I should have said vDPA + PASID, No current plans. I have no idea who set expectations with you. Can you please put me in touch with that person, privately is fine. It was the team that aruged VDPA had to be done through VFIO - SIOV and PASID was one of their reasons it had to be VFIO, check the list archives Humm... I could search the arhives, but the point is I'm confirming that there is no forward looking plan! And who ever did was it was based on probably strawman hypothetical argument that wasn't grounded in reality. If they didn't plan to use it, bit of a strawman argument, right? This doesn't need to continue like the debates :-) Pun intended :-) I don't think it makes any sense to have an abstract strawman argument design discussion. Yi is looking into for pasid management alone. Rest of the IOMMU related topics should wait until we have another *real* use requiring consolidation. Contrary to your argument, vDPA went with a half blown device only iommu user without considering existing abstractions like containers and such in VFIO is part of the reason the gap is big at the moment. And you might not agree, but that's beside the point. Can you explain why it must care VFIO abstractions? vDPA is trying to hide device details which is fundamentally different with what VFIO wants to do. vDPA allows the parent to deal with IOMMU stuffs, and if necessary, the parent can talk with IOMMU drivers directly via IOMMU APIs. Rather than pivot ourselves around hypothetical, strawman, non-intersecting, suggesting architecture without having done a proof of concept to validate the proposal should stop. We have to ground ourselves in reality. The reality is VFIO should not be the only user for (v)SVA/SIOV/PASID. The kernel hard already had users like GPU or uacce. The use cases we have so far for SIOV, VFIO and mdev seem to be the right candidates and addresses them well. Now you might disagree, but as noted we all agreed to move past this. The mdev is not perfect for sure, but it's another topic. If you(Intel) don't have plan to do vDPA, you should not prevent other vendors from implementing PASID capable hardware through non-VFIO subsystem/uAPI on top of your SIOV architecture. Isn't it? So if Intel has the willing to collaborate on the POC, I'd happy to help. 
E.g it's not hard to have a PASID capable virtio device through qemu, and we can start from there. Thanks ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 21, 2020 at 08:32:18PM -0300, Jason Gunthorpe wrote: > On Wed, Oct 21, 2020 at 01:03:15PM -0700, Raj, Ashok wrote: > > > I'm not sure why you tie in IDXD and VDPA here. How IDXD uses native > > SVM is orthogonal to how we achieve mdev passthrough to guest and > > vSVM. > > Everyone assumes that vIOMMU and SIOV aka PASID is going to be needed > on the VDPA side as well, I think that is why JasonW brought this up > in the first place. True, and to that effect we are working on moving PASID allocation outside of VFIO, so that both agents, VFIO and vDPA, can share one way to allocate and manage PASIDs from user space once PASID support becomes available there. Since the IOASID allocator is almost standalone, this is possible. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
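[Illustration] To make "the IOASID allocator is almost standalone" a little more concrete, here is a rough kernel-side sketch of a tiny character device handing out PASIDs through the in-kernel ioasid allocator. The /dev/sva name is only the placeholder used in this thread; the ioctl numbers, the hard-coded range, and the missing per-opener ioasid_set/quota handling are all simplifications for illustration, not the actual proposal.

/* Sketch only: a minimal PASID allocation chardev on top of the in-kernel
 * ioasid allocator. Names, ioctl numbers and policy are illustrative. */
#include <linux/module.h>
#include <linux/miscdevice.h>
#include <linux/ioasid.h>
#include <linux/uaccess.h>
#include <linux/fs.h>

#define SVA_ALLOC_PASID	_IOR('s', 0, __u32)	/* made-up ioctl numbers */
#define SVA_FREE_PASID	_IOW('s', 1, __u32)

static long sva_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
	void __user *uarg = (void __user *)arg;
	u32 pasid;

	switch (cmd) {
	case SVA_ALLOC_PASID:
		/* Range is arbitrary here; a real interface would enforce
		 * per-VM quotas and track ownership via an ioasid_set. */
		pasid = ioasid_alloc(NULL, 1, (1U << 20) - 1, NULL);
		if (pasid == INVALID_IOASID)
			return -ENOSPC;
		if (copy_to_user(uarg, &pasid, sizeof(pasid))) {
			ioasid_free(pasid);
			return -EFAULT;
		}
		return 0;
	case SVA_FREE_PASID:
		if (copy_from_user(&pasid, uarg, sizeof(pasid)))
			return -EFAULT;
		ioasid_free(pasid);
		return 0;
	}
	return -ENOTTY;
}

static const struct file_operations sva_fops = {
	.owner		= THIS_MODULE,
	.unlocked_ioctl	= sva_ioctl,
};

static struct miscdevice sva_misc = {
	.minor	= MISC_DYNAMIC_MINOR,
	.name	= "sva",	/* placeholder name from the thread */
	.fops	= &sva_fops,
};
module_misc_device(sva_misc);
MODULE_LICENSE("GPL");

The interesting design questions (who owns the allocation set, and how a PASID later gets associated with devices and page tables) are exactly what the rest of the thread argues about.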
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 21, 2020 at 01:03:15PM -0700, Raj, Ashok wrote: > I'm not sure why you tie in IDXD and VDPA here. How IDXD uses native > SVM is orthogonal to how we achieve mdev passthrough to guest and > vSVM. Everyone assumes that vIOMMU and SIOV aka PASID is going to be needed on the VDPA side as well, I think that is why JasonW brought this up in the first place. We may not see vSVA for VDPA, but that seems like some special sub mode of all the other vIOMMU and PASID stuff, and not a completely distinct thing. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 21, 2020 at 03:24:42PM -0300, Jason Gunthorpe wrote: > > > Contrary to your argument, vDPA went with a half blown device only > > iommu user without considering existing abstractions like containers > > VDPA IOMMU was done *for Intel*, as the kind of half-architected thing > you are advocating should be allowed for IDXD here. Not sure why you > think bashing that work is going to help your case here. I'm not bashing that work, sorry if it came out that way, but it just feels like double standards. I'm not sure why you tie in IDXD and VDPA here. How IDXD uses native SVM is orthogonal to how we achieve mdev passthrough to guest and vSVM. We visited that exact thing multiple times. Doing SVM is quite simple and doesn't carry the weight of the long list of other things (Kevin explained this in detail not too long ago) we need to accomplish for mdev passthrough. For SVM, all you need is access to the hw, mmio, and bind_mm to get a PASID bound with the IOMMU. For IDXD, creating passthrough devices for guest access and vSVM goes through the VFIO path. For guest SVM, we expose mdevs to the guest OS, and idxd in the guest provides vSVM services. vSVM is *not* built around native SVM interfaces. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
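[Illustration] The native SVM path described above (bind the mm, get a PASID, point the device's MMIO/work queue at it) maps onto the existing in-kernel API roughly as sketched below. Error handling is trimmed, and the my_* helpers are hypothetical driver code, not real functions.

/* Sketch of native SVM in a host driver: iommu_sva_* is the existing kernel
 * API (the drvdata argument existed on kernels of this era); my_wq_set_pasid()
 * stands in for device-specific MMIO setup and is not a real function. */
#include <linux/iommu.h>
#include <linux/sched/mm.h>
#include <linux/err.h>

void my_wq_set_pasid(struct device *dev, u32 pasid);	/* hypothetical */

struct my_ctx {
	struct iommu_sva *handle;
	u32 pasid;
};

static int my_bind_current_mm(struct device *dev, struct my_ctx *ctx)
{
	ctx->handle = iommu_sva_bind_device(dev, current->mm, NULL);
	if (IS_ERR(ctx->handle))
		return PTR_ERR(ctx->handle);

	ctx->pasid = iommu_sva_get_pasid(ctx->handle);
	/* Tell the hardware which PASID this queue's DMA should carry. */
	my_wq_set_pasid(dev, ctx->pasid);
	return 0;
}

static void my_unbind(struct my_ctx *ctx)
{
	iommu_sva_unbind_device(ctx->handle);
}

The vSVM/mdev path being argued about is a different beast: there the first-level page table comes from the guest rather than a host mm_struct, which is what drags in the bind-guest-page-table and fault-reporting uAPIs discussed elsewhere in the thread.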
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 21, 2020 at 08:48:29AM -0300, Jason Gunthorpe wrote: > On Tue, Oct 20, 2020 at 01:27:13PM -0700, Raj, Ashok wrote: > > On Tue, Oct 20, 2020 at 05:14:03PM -0300, Jason Gunthorpe wrote: > > > On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote: > > > > On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote: > > > > > On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote: > > > > > > I think we agreed (or agree to disagree and commit) for device > > > > > > types that > > > > > > we have for SIOV, VFIO based approach works well without having to > > > > > > re-invent > > > > > > another way to do the same things. Not looking for a shortcut by > > > > > > any means, > > > > > > but we need to plan around existing hardware though. Looks like > > > > > > vDPA took > > > > > > some shortcuts then to not abstract iommu uAPI instead :-)? When all > > > > > > necessary hardware was available.. This would be a solved puzzle. > > > > > > > > > > I think it is the opposite, vIOMMU and related has outgrown VFIO as > > > > > the "home" and needs to stand alone. > > > > > > > > > > Apparently the HW that will need PASID for vDPA is Intel HW, so if > > > > > > > > So just to make this clear, I did check internally if there are any > > > > plans > > > > for vDPA + SVM. There are none at the moment. > > > > > > Not SVM, SIOV. > > > > ... And that included.. I should have said vDPA + PASID, No current plans. > > I have no idea who set expectations with you. Can you please put me in > > touch > > with that person, privately is fine. > > It was the team that aruged VDPA had to be done through VFIO - SIOV > and PASID was one of their reasons it had to be VFIO, check the list > archives Humm... I could search the arhives, but the point is I'm confirming that there is no forward looking plan! And who ever did was it was based on probably strawman hypothetical argument that wasn't grounded in reality. > > If they didn't plan to use it, bit of a strawman argument, right? This doesn't need to continue like the debates :-) Pun intended :-) I don't think it makes any sense to have an abstract strawman argument design discussion. Yi is looking into for pasid management alone. Rest of the IOMMU related topics should wait until we have another *real* use requiring consolidation. Contrary to your argument, vDPA went with a half blown device only iommu user without considering existing abstractions like containers and such in VFIO is part of the reason the gap is big at the moment. And you might not agree, but that's beside the point. Rather than pivot ourselves around hypothetical, strawman, non-intersecting, suggesting architecture without having done a proof of concept to validate the proposal should stop. We have to ground ourselves in reality. The use cases we have so far for SIOV, VFIO and mdev seem to be the right candidates and addresses them well. Now you might disagree, but as noted we all agreed to move past this. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 21, 2020 at 10:51:46AM -0700, Raj, Ashok wrote: > > If they didn't plan to use it, bit of a strawman argument, right? > > This doesn't need to continue like the debates :-) Pun intended :-) > > I don't think it makes any sense to have an abstract strawman argument > design discussion. Yi is looking into for pasid management alone. Rest > of the IOMMU related topics should wait until we have another > *real* use requiring consolidation. Actually I'm really annoyed right now that the other Intel team wasted quite a lot of the rest of our time arguing about vDPA and vfio with no actual interest in this technology. So you'll excuse me if I'm not particularly enamored with this discussion right now. > Contrary to your argument, vDPA went with a half blown device only > iommu user without considering existing abstractions like containers VDPA IOMMU was done *for Intel*, as the kind of half-architected thing you are advocating should be allowed for IDXD here. Not sure why you think bashing that work is going to help your case here. I'm saying Intel needs to get its architecture together and stop creating this mess across the kernel to support Intel devices. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 01:27:13PM -0700, Raj, Ashok wrote: > On Tue, Oct 20, 2020 at 05:14:03PM -0300, Jason Gunthorpe wrote: > > On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote: > > > On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote: > > > > On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote: > > > > > I think we agreed (or agree to disagree and commit) for device types > > > > > that > > > > > we have for SIOV, VFIO based approach works well without having to > > > > > re-invent > > > > > another way to do the same things. Not looking for a shortcut by any > > > > > means, > > > > > but we need to plan around existing hardware though. Looks like vDPA > > > > > took > > > > > some shortcuts then to not abstract iommu uAPI instead :-)? When all > > > > > necessary hardware was available.. This would be a solved puzzle. > > > > > > > > I think it is the opposite, vIOMMU and related has outgrown VFIO as > > > > the "home" and needs to stand alone. > > > > > > > > Apparently the HW that will need PASID for vDPA is Intel HW, so if > > > > > > So just to make this clear, I did check internally if there are any plans > > > for vDPA + SVM. There are none at the moment. > > > > Not SVM, SIOV. > > ... And that included.. I should have said vDPA + PASID, No current plans. > I have no idea who set expectations with you. Can you please put me in touch > with that person, privately is fine. It was the team that aruged VDPA had to be done through VFIO - SIOV and PASID was one of their reasons it had to be VFIO, check the list archives If they didn't plan to use it, bit of a strawman argument, right? Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/20 下午10:19, Liu, Yi L wrote: From: Jason Gunthorpe Sent: Tuesday, October 20, 2020 10:02 PM [...] Whoever provides the vIOMMU emulation and relays the page fault to the guest has to translate the RID - that's the point. But the device info (especially the sub-device info) is within the passthru framework (e.g. VFIO). So page fault reporting needs to go through passthru framework. what does that have to do with VFIO? How will VPDA provide the vIOMMU emulation? a pardon here. I believe vIOMMU emulation should be based on IOMMU vendor specification, right? you may correct me if I'm missing anything. I'm asking how will VDPA translate the RID when VDPA triggers a page fault that has to be relayed to the guest. VDPA also has virtual PCI devices it creates. I've got a question. Does vDPA work with vIOMMU so far? e.g. Intel vIOMMU or other type vIOMMU. The kernel code is ready. Note that vhost support for vIOMMU came even earlier than VFIO's. The API is designed to be generic and is not limited to any specific type of vIOMMU. For qemu, it just needs a patch to implement the map/unmap notifier as VFIO did. Thanks Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 05:14:03PM -0300, Jason Gunthorpe wrote: > On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote: > > On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote: > > > On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote: > > > > I think we agreed (or agree to disagree and commit) for device types > > > > that > > > > we have for SIOV, VFIO based approach works well without having to > > > > re-invent > > > > another way to do the same things. Not looking for a shortcut by any > > > > means, > > > > but we need to plan around existing hardware though. Looks like vDPA > > > > took > > > > some shortcuts then to not abstract iommu uAPI instead :-)? When all > > > > necessary hardware was available.. This would be a solved puzzle. > > > > > > I think it is the opposite, vIOMMU and related has outgrown VFIO as > > > the "home" and needs to stand alone. > > > > > > Apparently the HW that will need PASID for vDPA is Intel HW, so if > > > > So just to make this clear, I did check internally if there are any plans > > for vDPA + SVM. There are none at the moment. > > Not SVM, SIOV. ... And that included.. I should have said vDPA + PASID, No current plans. I have no idea who set expectations with you. Can you please put me in touch with that person, privately is fine. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote: > On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote: > > On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote: > > > I think we agreed (or agree to disagree and commit) for device types that > > > we have for SIOV, VFIO based approach works well without having to > > > re-invent > > > another way to do the same things. Not looking for a shortcut by any > > > means, > > > but we need to plan around existing hardware though. Looks like vDPA took > > > some shortcuts then to not abstract iommu uAPI instead :-)? When all > > > necessary hardware was available.. This would be a solved puzzle. > > > > I think it is the opposite, vIOMMU and related has outgrown VFIO as > > the "home" and needs to stand alone. > > > > Apparently the HW that will need PASID for vDPA is Intel HW, so if > > So just to make this clear, I did check internally if there are any plans > for vDPA + SVM. There are none at the moment. Not SVM, SIOV. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote: > On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote: > > I think we agreed (or agree to disagree and commit) for device types that > > we have for SIOV, VFIO based approach works well without having to > > re-invent > > another way to do the same things. Not looking for a shortcut by any means, > > but we need to plan around existing hardware though. Looks like vDPA took > > some shortcuts then to not abstract iommu uAPI instead :-)? When all > > necessary hardware was available.. This would be a solved puzzle. > > I think it is the opposite, vIOMMU and related has outgrown VFIO as > the "home" and needs to stand alone. > > Apparently the HW that will need PASID for vDPA is Intel HW, so if So just to make this clear, I did check internally if there are any plans for vDPA + SVM. There are none at the moment. It seems like you have better insight into our plans ;-). Please do let me know who confirmed vDPA roadmap with you and I would love to talk to them to clear the air. Cheers, Ashok ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote: > I think we agreed (or agree to disagree and commit) for device types that > we have for SIOV, VFIO based approach works well without having to re-invent > another way to do the same things. Not looking for a shortcut by any means, > but we need to plan around existing hardware though. Looks like vDPA took > some shortcuts then to not abstract iommu uAPI instead :-)? When all > necessary hardware was available.. This would be a solved puzzle. I think it is the opposite, vIOMMU and related has outgrown VFIO as the "home" and needs to stand alone. Apparently the HW that will need PASID for vDPA is Intel HW, so if more is needed to do a good design you are probably the only one that can get it/do it. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 02:03:36PM -0300, Jason Gunthorpe wrote: > On Tue, Oct 20, 2020 at 09:24:30AM -0700, Raj, Ashok wrote: > > Hi Jason, > > > > > > On Tue, Oct 20, 2020 at 11:02:17AM -0300, Jason Gunthorpe wrote: > > > On Tue, Oct 20, 2020 at 10:21:41AM +, Liu, Yi L wrote: > > > > > > > > I'm sure there will be some > > > > > weird overlaps because we can't delete any of the existing VFIO APIs, > > > > > but > > > > > that > > > > > should not be a blocker. > > > > > > > > but the weird thing is what we should consider. And it perhaps not just > > > > overlap, it may be a re-definition of VFIO container. As I mentioned, > > > > VFIO > > > > container is IOMMU context from the day it was defined. It could be the > > > > blocker. :-( > > > > > > So maybe you have to broaden the VFIO container to be usable by other > > > subsystems. The discussion here is about what the uAPI should look > > > like in a fairly abstract way. When we say 'dev/sva' it just some > > > placeholder for a shared cdev that provides the necessary > > > dis-aggregated functionality > > > > > > It could be an existing cdev with broader functionaltiy, it could > > > really be /dev/iommu, etc. This is up to the folks building it to > > > decide. > > > > > > > I'm not expert on vDPA for now, but I saw you three open source > > > > veterans have a similar idea for a place to cover IOMMU handling, > > > > I think it may be a valuable thing to do. I said "may be" as I'm not > > > > sure about Alex's opinion on such idea. But the sure thing is this > > > > idea may introduce weird overlap even re-definition of existing > > > > thing as I replied above. We need to evaluate the impact and mature > > > > the idea step by step. > > > > > > This has happened before, uAPIs do get obsoleted and replaced with > > > more general/better versions. It is often too hard to create a uAPI > > > that lasts for decades when the HW landscape is constantly changing > > > and sometime a reset is needed. > > > > I'm throwing this out with a lot of hesitation, but I'm going to :-) > > > > So we have been disussing this for months now, with some high level vision > > trying to get the uAPI's solidified with a vDPA hardware that might > > potentially have SIOV/SVM like extensions in hardware which actualy doesn't > > exist today. Understood people have plans. > > > Given that vDPA today has diverged already with duplicating use of IOMMU > > api's without making an effort to gravitate to /dev/iommu as how you are > > proposing. > > I see it more like, given that we already know we have multiple users > of IOMMU, adding new IOMMU focused features has to gravitate toward > some kind of convergance. > > Currently things are not so bad, VDPA is just getting started and the > current IOMMU feature set is not so big. > > PASID/vIOMMU/etc/et are all stressing this more, I think the > responsibility falls to the people proposing these features to do the > architecture work. > > > The question is should we hold hostage the current vSVM/vIOMMU efforts > > without even having made an effort for current vDPA/VFIO convergence. > > I don't think it is "held hostage" it is a "no shortcuts" approach, > there was always a recognition that future VDPA drivers will need some > work to integrated with vIOMMU realted stuff. I think we agreed (or agree to disagree and commit) for device types that we have for SIOV, VFIO based approach works well without having to re-invent another way to do the same things. 
Not looking for a shortcut by any means, but we need to plan around existing hardware though. Looks like vDPA took some shortcuts then to not abstract iommu uAPI instead :-)? When all necessary hardware was available.. This would be a solved puzzle. > > This is no different than the IMS discussion. The first proposed patch > was really simple, but a layering violation. > > The correct solution was some wild 20 patch series modernizing how x86 That was more like 48 patches, not 20 :-). But we had a real device with IMS to model and create these new abstractions and test them against. For vDPA+SVM we have non-intersecting conversations at the moment with no real hardware to model our discussion around. > interrupts works because it had outgrown itself. This general approach > to use the shared MSI infrastructure was pointed out at the very > beginning of IMS, BTW. Agreed, and thankfully Thomas worked hard and made it a lot easier :-). Today IMS only deals with on device store. Although IMS could mean just simply having system memory to hold the interrupt attributes. This is how some of the graphics devices would be with context holding interrupt attributes. But certainly not rushing this since we need a REAL user to be there before we support DEV_MSI that uses msg_addr/msg_data held in system memory. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/ma
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 09:24:30AM -0700, Raj, Ashok wrote: > Hi Jason, > > On Tue, Oct 20, 2020 at 11:02:17AM -0300, Jason Gunthorpe wrote: > > On Tue, Oct 20, 2020 at 10:21:41AM +, Liu, Yi L wrote: > > > > > > I'm sure there will be some > > > > weird overlaps because we can't delete any of the existing VFIO APIs, > > > > but > > > > that > > > > should not be a blocker. > > > > > > but the weird thing is what we should consider. And it perhaps not just > > > overlap, it may be a re-definition of VFIO container. As I mentioned, VFIO > > > container is IOMMU context from the day it was defined. It could be the > > > blocker. :-( > > > > So maybe you have to broaden the VFIO container to be usable by other > > subsystems. The discussion here is about what the uAPI should look > > like in a fairly abstract way. When we say 'dev/sva' it just some > > placeholder for a shared cdev that provides the necessary > > dis-aggregated functionality > > > > It could be an existing cdev with broader functionaltiy, it could > > really be /dev/iommu, etc. This is up to the folks building it to > > decide. > > > > > I'm not expert on vDPA for now, but I saw you three open source > > > veterans have a similar idea for a place to cover IOMMU handling, > > > I think it may be a valuable thing to do. I said "may be" as I'm not > > > sure about Alex's opinion on such idea. But the sure thing is this > > > idea may introduce weird overlap even re-definition of existing > > > thing as I replied above. We need to evaluate the impact and mature > > > the idea step by step. > > > > This has happened before, uAPIs do get obsoleted and replaced with > > more general/better versions. It is often too hard to create a uAPI > > that lasts for decades when the HW landscape is constantly changing > > and sometime a reset is needed. > > I'm throwing this out with a lot of hesitation, but I'm going to :-) > > So we have been disussing this for months now, with some high level vision > trying to get the uAPI's solidified with a vDPA hardware that might > potentially have SIOV/SVM like extensions in hardware which actualy doesn't > exist today. Understood people have plans. > Given that vDPA today has diverged already with duplicating use of IOMMU > api's without making an effort to gravitate to /dev/iommu as how you are > proposing. I see it more like: given that we already know we have multiple users of IOMMU, adding new IOMMU focused features has to gravitate toward some kind of convergence. Currently things are not so bad, VDPA is just getting started and the current IOMMU feature set is not so big. PASID/vIOMMU/etc. are all stressing this more; I think the responsibility falls to the people proposing these features to do the architecture work. > The question is should we hold hostage the current vSVM/vIOMMU efforts > without even having made an effort for current vDPA/VFIO convergence. I don't think it is "held hostage", it is a "no shortcuts" approach; there was always a recognition that future VDPA drivers will need some work to integrate with vIOMMU related stuff. This is no different than the IMS discussion. The first proposed patch was really simple, but a layering violation. The correct solution was some wild 20 patch series modernizing how x86 interrupts work because it had outgrown itself. This general approach to use the shared MSI infrastructure was pointed out at the very beginning of IMS, BTW. 
Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi Jason, On Tue, Oct 20, 2020 at 11:02:17AM -0300, Jason Gunthorpe wrote: > On Tue, Oct 20, 2020 at 10:21:41AM +, Liu, Yi L wrote: > > > > I'm sure there will be some > > > weird overlaps because we can't delete any of the existing VFIO APIs, but > > > that > > > should not be a blocker. > > > > but the weird thing is what we should consider. And it perhaps not just > > overlap, it may be a re-definition of VFIO container. As I mentioned, VFIO > > container is IOMMU context from the day it was defined. It could be the > > blocker. :-( > > So maybe you have to broaden the VFIO container to be usable by other > subsystems. The discussion here is about what the uAPI should look > like in a fairly abstract way. When we say 'dev/sva' it just some > placeholder for a shared cdev that provides the necessary > dis-aggregated functionality > > It could be an existing cdev with broader functionaltiy, it could > really be /dev/iommu, etc. This is up to the folks building it to > decide. > > > I'm not expert on vDPA for now, but I saw you three open source > > veterans have a similar idea for a place to cover IOMMU handling, > > I think it may be a valuable thing to do. I said "may be" as I'm not > > sure about Alex's opinion on such idea. But the sure thing is this > > idea may introduce weird overlap even re-definition of existing > > thing as I replied above. We need to evaluate the impact and mature > > the idea step by step. > > This has happened before, uAPIs do get obsoleted and replaced with > more general/better versions. It is often too hard to create a uAPI > that lasts for decades when the HW landscape is constantly changing > and sometime a reset is needed. I'm throwing this out with a lot of hesitation, but I'm going to :-) So we have been disussing this for months now, with some high level vision trying to get the uAPI's solidified with a vDPA hardware that might potentially have SIOV/SVM like extensions in hardware which actualy doesn't exist today. Understood people have plans. Given that vDPA today has diverged already with duplicating use of IOMMU api's without making an effort to gravitate to /dev/iommu as how you are proposing. I think we all understand creating a permanent uAPI is hard, and they can evolve in future. Maybe we should start work on how to converge on generalizing the IOMMU story first with what we have today (vDPA + VFIO) convergence and let it evolve with real hardware and new features like SVM/SIOV in mind. This is going to take time and we can start with what we have today for pulling vDPA and VFIO pieces first. The question is should we hold hostage the current vSVM/vIOMMU efforts without even having made an effort for current vDPA/VFIO convergence. > > The jump to shared PASID based IOMMU feels like one of those moments here. As we have all noted, even without PASID we have divergence today? > > > > Whoever provides the vIOMMU emulation and relays the page fault to the > > > guest > > > has to translate the RID - > > > > that's the point. But the device info (especially the sub-device info) is > > within the passthru framework (e.g. VFIO). So page fault reporting needs > > to go through passthru framework. > > > > > what does that have to do with VFIO? > > > > > > How will VPDA provide the vIOMMU emulation? > > > > a pardon here. I believe vIOMMU emulation should be based on IOMMU vendor > > specification, right? you may correct me if I'm missing anything. 
> > I'm asking how will VDPA translate the RID when VDPA triggers a page > fault that has to be relayed to the guest. VDPA also has virtual PCI > devices it creates. > > We can't rely on VFIO to be the place that the vIOMMU lives because it > excludes/complicates everything that is not VFIO from using that > stuff. > > Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Gunthorpe > Sent: Tuesday, October 20, 2020 10:02 PM [...] > > > Whoever provides the vIOMMU emulation and relays the page fault to the guest > > > has to translate the RID - > > > > that's the point. But the device info (especially the sub-device info) is > > within the passthru framework (e.g. VFIO). So page fault reporting needs > > to go through passthru framework. > > > what does that have to do with VFIO? > > > > > > How will VPDA provide the vIOMMU emulation? > > > > a pardon here. I believe vIOMMU emulation should be based on IOMMU vendor > specification, right? you may correct me if I'm missing anything. > > I'm asking how will VDPA translate the RID when VDPA triggers a page > fault that has to be relayed to the guest. VDPA also has virtual PCI > devices it creates. I've got a question. Does vDPA work with a vIOMMU so far? E.g. the Intel vIOMMU or other types of vIOMMU. Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Gunthorpe > Sent: Tuesday, October 20, 2020 10:05 PM > To: Liu, Yi L > > On Tue, Oct 20, 2020 at 02:00:31PM +, Liu, Yi L wrote: > > > From: Jason Gunthorpe > > > Sent: Tuesday, October 20, 2020 9:55 PM > > > > > > On Tue, Oct 20, 2020 at 09:40:14AM +, Liu, Yi L wrote: > > > > > > > > See previous discussion with Kevin. If I understand correctly, > > > > > you expect a > > > shared > > > > > L2 table if vDPA and VFIO device are using the same PASID. > > > > > > > > L2 table sharing is not mandatory. The mapping is the same, but no > > > > need to assume L2 tables are shared. Especially for VFIO/vDPA > > > > devices. Even within a passthru framework, like VFIO, if the > > > > attributes of backend IOMMU are not the same, the L2 page table are not > shared, but the mapping is the same. > > > > > > I think not being able to share the PASID shows exactly why this > > > VFIO centric approach is bad. > > > > no, I didn't say PASID is not sharable. My point is sharing L2 page > > table is not mandatory. > > IMHO a PASID should be 1:1 with a page table, what does it even mean to share > a PASID but have different page tables? A PASID is actually 1:1 with an address space; it doesn't really need to be 1:1 with a page table. :-) Regards, Yi Liu > Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 02:00:31PM +, Liu, Yi L wrote: > > From: Jason Gunthorpe > > Sent: Tuesday, October 20, 2020 9:55 PM > > > > On Tue, Oct 20, 2020 at 09:40:14AM +, Liu, Yi L wrote: > > > > > > See previous discussion with Kevin. If I understand correctly, you > > > > expect a > > shared > > > > L2 table if vDPA and VFIO device are using the same PASID. > > > > > > L2 table sharing is not mandatory. The mapping is the same, but no need to > > > assume L2 tables are shared. Especially for VFIO/vDPA devices. Even within > > > a passthru framework, like VFIO, if the attributes of backend IOMMU are > > > not > > > the same, the L2 page table are not shared, but the mapping is the same. > > > > I think not being able to share the PASID shows exactly why this VFIO > > centric approach is bad. > > no, I didn't say PASID is not sharable. My point is sharing L2 page table is > not mandatory. IMHO a PASID should be 1:1 with a page table, what does it even mean to share a PASID but have different page tables? Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 10:21:41AM +, Liu, Yi L wrote: > > I'm sure there will be some > > weird overlaps because we can't delete any of the existing VFIO APIs, but > > that > > should not be a blocker. > > but the weird thing is what we should consider. And it perhaps not just > overlap, it may be a re-definition of VFIO container. As I mentioned, VFIO > container is IOMMU context from the day it was defined. It could be the > blocker. :-( So maybe you have to broaden the VFIO container to be usable by other subsystems. The discussion here is about what the uAPI should look like in a fairly abstract way. When we say 'dev/sva' it just some placeholder for a shared cdev that provides the necessary dis-aggregated functionality It could be an existing cdev with broader functionaltiy, it could really be /dev/iommu, etc. This is up to the folks building it to decide. > I'm not expert on vDPA for now, but I saw you three open source > veterans have a similar idea for a place to cover IOMMU handling, > I think it may be a valuable thing to do. I said "may be" as I'm not > sure about Alex's opinion on such idea. But the sure thing is this > idea may introduce weird overlap even re-definition of existing > thing as I replied above. We need to evaluate the impact and mature > the idea step by step. This has happened before, uAPIs do get obsoleted and replaced with more general/better versions. It is often too hard to create a uAPI that lasts for decades when the HW landscape is constantly changing and sometime a reset is needed. The jump to shared PASID based IOMMU feels like one of those moments here. > > Whoever provides the vIOMMU emulation and relays the page fault to the guest > > has to translate the RID - > > that's the point. But the device info (especially the sub-device info) is > within the passthru framework (e.g. VFIO). So page fault reporting needs > to go through passthru framework. > > > what does that have to do with VFIO? > > > > How will VPDA provide the vIOMMU emulation? > > a pardon here. I believe vIOMMU emulation should be based on IOMMU vendor > specification, right? you may correct me if I'm missing anything. I'm asking how will VDPA translate the RID when VDPA triggers a page fault that has to be relayed to the guest. VDPA also has virtual PCI devices it creates. We can't rely on VFIO to be the place that the vIOMMU lives because it excludes/complicates everything that is not VFIO from using that stuff. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Gunthorpe > Sent: Tuesday, October 20, 2020 9:55 PM > > On Tue, Oct 20, 2020 at 09:40:14AM +, Liu, Yi L wrote: > > > > See previous discussion with Kevin. If I understand correctly, you expect > > > a > shared > > > L2 table if vDPA and VFIO device are using the same PASID. > > > > L2 table sharing is not mandatory. The mapping is the same, but no need to > > assume L2 tables are shared. Especially for VFIO/vDPA devices. Even within > > a passthru framework, like VFIO, if the attributes of backend IOMMU are not > > the same, the L2 page table are not shared, but the mapping is the same. > > I think not being able to share the PASID shows exactly why this VFIO > centric approach is bad. no, I didn't say PASID is not sharable. My point is sharing L2 page table is not mandatory. Regards, Yi Liu > Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 09:40:14AM +, Liu, Yi L wrote: > > See previous discussion with Kevin. If I understand correctly, you expect a > > shared > > L2 table if vDPA and VFIO device are using the same PASID. > > L2 table sharing is not mandatory. The mapping is the same, but no need to > assume L2 tables are shared. Especially for VFIO/vDPA devices. Even within > a passthru framework, like VFIO, if the attributes of backend IOMMU are not > the same, the L2 page table are not shared, but the mapping is the same. I think not being able to share the PASID shows exactly why this VFIO centric approach is bad. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Gunthorpe > Sent: Monday, October 19, 2020 10:25 PM > > On Mon, Oct 19, 2020 at 08:39:03AM +, Liu, Yi L wrote: > > Hi Jason, > > > > Good to see your response. > > Ah, I was away got it. :-) > > > > > Second, IOMMU nested translation is a per IOMMU domain > > > > > capability. Since IOMMU domains are managed by VFIO/VDPA > > > > > (alloc/free domain, attach/detach device, set/get domain > > > > > attribute, etc.), reporting/enabling the nesting capability is > > > > > an natural extension to the domain uAPI of existing passthrough > frameworks. > > > > > Actually, VFIO already includes a nesting enable interface even > > > > > before this series. So it doesn't make sense to generalize this > > > > > uAPI out. > > > > > > The subsystem that obtains an IOMMU domain for a device would have > > > to register it with an open FD of the '/dev/sva'. That is the > > > connection between the two subsystems. It would be some simple > > > kernel internal > > > stuff: > > > > > > sva = get_sva_from_file(fd); > > > > Is this fd provided by userspace? I suppose the /dev/sva has a set of > > uAPIs which will finally program page table to host iommu driver. As > > far as I know, it's weird for VFIO user. Why should VFIO user connect > > to a /dev/sva fd after it sets a proper iommu type to the opened > > container. VFIO container already stands for an iommu context with > > which userspace could program page mapping to host iommu. > > Again the point is to dis-aggregate the vIOMMU related stuff from VFIO so it > can > be shared between more subsystems that need it. I understand you here. :-) > I'm sure there will be some > weird overlaps because we can't delete any of the existing VFIO APIs, but > that > should not be a blocker. but the weird thing is what we should consider. And it perhaps not just overlap, it may be a re-definition of VFIO container. As I mentioned, VFIO container is IOMMU context from the day it was defined. It could be the blocker. :-( > Having VFIO run in a mode where '/dev/sva' provides all the IOMMU handling is > a possible path. This looks to be similar with the proposal from Jason Wang and Kevin Tian. It is an idea to add "/dev/iommu" and delegate the IOMMU domain alloc, device attach/detach which is no in passthru framework to an independent kernel driver. Just as Jason Wang said replace vfio iommu type1 driver. Jason Wang: "And all the proposal in this series is to reuse the container fd. It should be possible to replace e.g type1 IOMMU with a unified module." link: https://lore.kernel.org/kvm/20201019142526.gj6...@nvidia.com/T/#md49fe9ac9d9eff6ddf5b8c2ee2f27eb2766f66f3 Kevin Tian: "Based on above, I feel a more reasonable way is to first make a /dev/iommu uAPI supporting DMA map/unmap usages and then introduce vSVA to it. Doing this order is because DMA map/unmap is widely used thus can better help verify the core logic with many existing devices." link: https://lore.kernel.org/kvm/mwhpr11mb1645c702d148a2852b41fca08c...@mwhpr11mb1645.namprd11.prod.outlook.com/ > > If your plan is to just opencode everything into VFIO then I don't > see how VDPA will work well, and if proper in-kernel abstractions are built I > fail to see how > routing some of it through userspace is a fundamental problem. I'm not expert on vDPA for now, but I saw you three open source veterans have a similar idea for a place to cover IOMMU handling, I think it may be a valuable thing to do. I said "may be" as I'm not sure about Alex's opinion on such idea. 
But the sure thing is this idea may introduce weird overlap or even re-definition of existing things, as I replied above. We need to evaluate the impact and mature the idea step by step. That means it would take time, so perhaps we may do it in a staged way. First, have "/dev/iommu" ready to handle page MAP/UNMAP so it can be used by both VFIO and vDPA; meanwhile let VFIO grow (adding features) by itself and consider adopting the new /dev/iommu later, once /dev/iommu is competent. Of course this needs Alex's approval. And then add new features to /dev/iommu, like SVA. > > > > sva_register_device_to_pasid(sva, pasid, > > > iommu_domain); > > > > So this is supposed to be called by VFIO/VDPA to register the info to > > /dev/sva. > > right? And in dev/sva, it will also maintain the device/iommu_domain > > and pasid info? will it be duplicated with VFIO/VDPA? > > Each part needs to have the information it needs? yeah, but it's the duplication that I'm not very fond of. Perhaps the idea from Jason Wang and Kevin would avoid such duplication. > > > > > Moreover, mapping page fault to subdevice requires pre- > > > > > registering subdevice fault data to IOMMU layer when binding > > > > > guest page table, while such fault data can be only retrieved > > > > > from parent driver through VFIO/VDPA. > > > > > > Not sure what this means, page fault should be tied to the PASID, > > > any hookup needed for that
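A minimal sketch of the staged "/dev/iommu handles MAP/UNMAP first" idea from the mail above. Everything here is hypothetical -- the ioctl number, struct layout and flag names do not exist as a kernel uAPI; they only illustrate how a framework-agnostic map call could look from userspace:

#include <stdint.h>
#include <sys/ioctl.h>

/* Invented placeholders, for illustration only. */
struct iommu_ioas_map {
        uint64_t user_va;   /* process VA backing the range */
        uint64_t iova;      /* IOVA the device will use */
        uint64_t size;
        uint32_t flags;
};
#define IOMMU_IOAS_MAP   _IOW('I', 0x40, struct iommu_ioas_map)
#define IOMMU_MAP_READ   (1u << 0)
#define IOMMU_MAP_WRITE  (1u << 1)

/* Map one range on an open /dev/iommu fd; the same call would serve a
 * VFIO or a vDPA device once either framework has been attached to
 * this fd inside the kernel. */
static int iommu_map_range(int iommu_fd, void *buf, uint64_t iova, uint64_t size)
{
        struct iommu_ioas_map map = {
                .user_va = (uint64_t)(uintptr_t)buf,
                .iova    = iova,
                .size    = size,
                .flags   = IOMMU_MAP_READ | IOMMU_MAP_WRITE,
        };

        return ioctl(iommu_fd, IOMMU_IOAS_MAP, &map);
}

The point of the staging is that this map/unmap core could be adopted first, with nesting/vSVA operations added to the same fd later.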
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Wang > Sent: Tuesday, October 20, 2020 5:20 PM > > Hi Yi: > > On 2020/10/20 ??4:19, Liu, Yi L wrote: > >> Yes, but since PASID is a global identifier now, I think kernel > >> should track the a device list per PASID? > > We have such track. It's done in iommu driver. You can refer to the > > struct intel_svm. PASID is a global identifier, but it doesn’t affect > > that the PASID table is per-device. > > > >> So for such binding, PASID should be > >> sufficient for uAPI. > > not quite get it. PASID may be bound to multiple devices, how do you > > figure out the target device if you don’t provide such info. > > > I may miss soemthing but is there any reason that userspace need to figure out > the target device? PASID is about address space not a specific device I think. If you have multiple devices assigned to a VM, you won't expect to bind all of them to a PASID in a single bind operation, right? you may want to bind only the devices you really mean. This manner should be more flexible and reasonable. :-) > > > > > The binding request is initiated by the virtual IOMMU, when > > capturing guest attempt of binding page table to a virtual PASID > > entry for a given device. > And for L2 page table programming, if PASID is use by both e.g VFIO > and vDPA, user need to choose one of uAPI to build l2 mappings? > >>> for L2 page table mappings, it's done by VFIO MAP/UNMAP. for vdpa, I > >>> guess it is tlb flush. so you are right. Keeping L1/L2 page table > >>> management in a single uAPI set is also a reason for my current > >>> series which extends VFIO for L1 management. > >> I'm afraid that would introduce confusing to userspace. E.g: > >> > >> 1) when having only vDPA device, it uses vDPA uAPI to do l2 > >> management > >> 2) when vDPA shares PASID with VFIO, it will use VFIO uAPI to do the > >> l2 management? > > I think vDPA will still use its own l2 for the l2 mappings. not sure > > why you need vDPA use VFIO's l2 management. I don't think it is the case. > > > See previous discussion with Kevin. If I understand correctly, you expect a > shared > L2 table if vDPA and VFIO device are using the same PASID. L2 table sharing is not mandatory. The mapping is the same, but no need to assume L2 tables are shared. Especially for VFIO/vDPA devices. Even within a passthru framework, like VFIO, if the attributes of backend IOMMU are not the same, the L2 page table are not shared, but the mapping is the same. > In this case, if l2 is still managed separately, there will be duplicated > request of > map and unmap. yes, but this is not a functional issue, right? If we want to solve it, we should have a single uAPI set which can handle both L1 and L2 management. That's also why you proposed to replace type1 driver. right? Regards, Yi Liu > > Thanks > > > > > > Regards, > > Yi Liu > > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
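To illustrate the "bind only the devices you really mean" point above, here is one possible shape for a guest page-table bind request issued per device. The struct, ioctl number and field names are illustrative assumptions, not the uAPI proposed in this series:

#include <stdint.h>
#include <sys/ioctl.h>

/* Illustrative only: the bind carries the PASID, and the target device
 * is implied by the fd the ioctl is issued on, so a PASID shared by
 * several devices is bound to each of them explicitly. */
struct gpasid_bind {
        uint32_t pasid;       /* guest PASID being bound */
        uint64_t gpgd;        /* GPA of the guest first-level page table */
        uint32_t addr_width;  /* guest address width, e.g. 48 */
};
#define GPASID_BIND  _IOW('V', 0x60, struct gpasid_bind)

static int bind_one_device(int device_fd, uint32_t pasid, uint64_t gpgd)
{
        struct gpasid_bind bind = {
                .pasid      = pasid,
                .gpgd       = gpgd,
                .addr_width = 48,
        };

        /* A second assigned device using the same PASID gets its own
         * bind call, and only if the guest actually programmed it. */
        return ioctl(device_fd, GPASID_BIND, &bind);
}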
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi Yi: On 2020/10/20 下午4:19, Liu, Yi L wrote: Yes, but since PASID is a global identifier now, I think kernel should track the a device list per PASID? We have such track. It's done in iommu driver. You can refer to the struct intel_svm. PASID is a global identifier, but it doesn’t affect that the PASID table is per-device. So for such binding, PASID should be sufficient for uAPI. not quite get it. PASID may be bound to multiple devices, how do you figure out the target device if you don’t provide such info. I may miss soemthing but is there any reason that userspace need to figure out the target device? PASID is about address space not a specific device I think. The binding request is initiated by the virtual IOMMU, when capturing guest attempt of binding page table to a virtual PASID entry for a given device. And for L2 page table programming, if PASID is use by both e.g VFIO and vDPA, user need to choose one of uAPI to build l2 mappings? for L2 page table mappings, it's done by VFIO MAP/UNMAP. for vdpa, I guess it is tlb flush. so you are right. Keeping L1/L2 page table management in a single uAPI set is also a reason for my current series which extends VFIO for L1 management. I'm afraid that would introduce confusing to userspace. E.g: 1) when having only vDPA device, it uses vDPA uAPI to do l2 management 2) when vDPA shares PASID with VFIO, it will use VFIO uAPI to do the l2 management? I think vDPA will still use its own l2 for the l2 mappings. not sure why you need vDPA use VFIO's l2 management. I don't think it is the case. See previous discussion with Kevin. If I understand correctly, you expect a shared L2 table if vDPA and VFIO device are using the same PASID. In this case, if l2 is still managed separately, there will be duplicated request of map and unmap. Thanks Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
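For context on the duplicated L2 map/unmap concern above: with separate L2 management, the same guest memory range ends up being mapped once through VFIO type1 and once again through the vDPA/vhost IOTLB path. The VFIO side below uses the existing VFIO_IOMMU_MAP_DMA uAPI; the vDPA side is only noted in the comment since its exact message flow depends on the vhost-vdpa backend:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* The same GPA range mapped through VFIO type1.  A vDPA device using
 * the same address space would, with separate L2 management, need an
 * equivalent request through its own uAPI (e.g. the vhost IOTLB path)
 * -- the same mapping, issued twice. */
static int vfio_map(int container_fd, void *hva, uint64_t gpa, uint64_t size)
{
        struct vfio_iommu_type1_dma_map map = {
                .argsz = sizeof(map),
                .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
                .vaddr = (uintptr_t)hva,
                .iova  = gpa,       /* GPA used as IOVA for a VM */
                .size  = size,
        };

        return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}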
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hey Jason, > From: Jason Wang > Sent: Tuesday, October 20, 2020 2:18 PM > > On 2020/10/15 ??6:14, Liu, Yi L wrote: > >> From: Jason Wang > >> Sent: Thursday, October 15, 2020 4:41 PM > >> > >> > >> On 2020/10/15 ??3:58, Tian, Kevin wrote: > From: Jason Wang > Sent: Thursday, October 15, 2020 2:52 PM > > > On 2020/10/14 ??11:08, Tian, Kevin wrote: > >> From: Jason Wang > >> Sent: Tuesday, October 13, 2020 2:22 PM > >> > >> > >> On 2020/10/12 ??4:38, Tian, Kevin wrote: > From: Jason Wang > Sent: Monday, September 14, 2020 12:20 PM > > >>> [...] > >>> > If it's possible, I would suggest a generic uAPI instead of > >>> a VFIO > specific one. > > Jason suggest something like /dev/sva. There will be a lot of > other subsystems that could benefit from this (e.g vDPA). > > Have you ever considered this approach? > > >>> Hi, Jason, > >>> > >>> We did some study on this approach and below is the output. It's a > >>> long writing but I didn't find a way to further abstract w/o > >>> losing necessary context. Sorry about that. > >>> > >>> Overall the real purpose of this series is to enable IOMMU nested > >>> translation capability with vSVA as one major usage, through below > >>> new uAPIs: > >>> 1) Report/enable IOMMU nested translation capability; > >>> 2) Allocate/free PASID; > >>> 3) Bind/unbind guest page table; > >>> 4) Invalidate IOMMU cache; > >>> 5) Handle IOMMU page request/response (not in this series); > >>> 1/3/4) is the minimal set for using IOMMU nested translation, with > >>> the other two optional. For example, the guest may enable vSVA on > >>> a device without using PASID. Or, it may bind its gIOVA page table > >>> which doesn't require page fault support. Finally, all operations > >>> can be applied to either physical device or subdevice. > >>> > >>> Then we evaluated each uAPI whether generalizing it is a good > >>> thing both in concept and regarding to complexity. > >>> > >>> First, unlike other uAPIs which are all backed by iommu_ops, PASID > >>> allocation/free is through the IOASID sub-system. > >> A question here, is IOASID expected to be the single management > >> interface for PASID? > > yes > > > >> (I'm asking since there're already vendor specific IDA based PASID > >> allocator e.g amdgpu_pasid_alloc()) > > That comes before IOASID core was introduced. I think it should be > > changed to use the new generic interface. Jacob/Jean can better > > comment if other reason exists for this exception. > If there's no exception it should be fixed. > > > >>> From this angle > >>> we feel generalizing PASID management does make some sense. > >>> First, PASID is just a number and not related to any device before > >>> it's bound to a page table and IOMMU domain. Second, PASID is a > >>> global resource (at least on Intel VT-d), > >> I think we need a definition of "global" here. It looks to me for > >> vt-d the PASID table is per device. > > PASID table is per device, thus VT-d could support per-device PASIDs > > in concept. > I think that's the requirement of PCIE spec which said PASID + RID > identifies the process address space ID. > > > > However on Intel platform we require PASIDs to be managed in > > system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV > > and ENQCMD together. > Any reason for such requirement? (I'm not familiar with ENQCMD, but > my understanding is that vSVA, SIOV or SR-IOV doesn't have the > requirement for system-wide PASID). > >>> ENQCMD is a new instruction to allow multiple processes submitting > >>> workload to one shared workqueue. 
Each process has an unique PASID > >>> saved in a MSR, which is included in the ENQCMD payload to indicate > >>> the address space when the CPU sends to the device. As one process > >>> might issue ENQCMD to multiple devices, OS-wide PASID allocation is > >>> required both in host and guest side. > >>> > >>> When executing ENQCMD in the guest to a SIOV device, the guest > >>> programmed value in the PASID_MSR must be translated to a host PASID > >>> value for proper function/isolation as PASID represents the address > >>> space. The translation is done through a new VMCS PASID translation > >>> structure (per-VM, and 1:1 mapping). From this angle the host PASIDs > >>> must be allocated 'globally' cross all assigned devices otherwise it > >>> may lead to 1:N mapping when a guest process issues ENQCMD to multiple > >>> assigned devices/subdevices. > >>> > >>> There will be a KVM forum session for this topic btw. > >> > >> Thanks for the background. Now I see the restrict comes from ENQCMD. > >> > >> >
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/15 下午6:14, Liu, Yi L wrote: From: Jason Wang Sent: Thursday, October 15, 2020 4:41 PM On 2020/10/15 ??3:58, Tian, Kevin wrote: From: Jason Wang Sent: Thursday, October 15, 2020 2:52 PM On 2020/10/14 ??11:08, Tian, Kevin wrote: From: Jason Wang Sent: Tuesday, October 13, 2020 2:22 PM On 2020/10/12 ??4:38, Tian, Kevin wrote: From: Jason Wang Sent: Monday, September 14, 2020 12:20 PM [...] > If it's possible, I would suggest a generic uAPI instead of a VFIO specific one. Jason suggest something like /dev/sva. There will be a lot of other subsystems that could benefit from this (e.g vDPA). Have you ever considered this approach? Hi, Jason, We did some study on this approach and below is the output. It's a long writing but I didn't find a way to further abstract w/o losing necessary context. Sorry about that. Overall the real purpose of this series is to enable IOMMU nested translation capability with vSVA as one major usage, through below new uAPIs: 1) Report/enable IOMMU nested translation capability; 2) Allocate/free PASID; 3) Bind/unbind guest page table; 4) Invalidate IOMMU cache; 5) Handle IOMMU page request/response (not in this series); 1/3/4) is the minimal set for using IOMMU nested translation, with the other two optional. For example, the guest may enable vSVA on a device without using PASID. Or, it may bind its gIOVA page table which doesn't require page fault support. Finally, all operations can be applied to either physical device or subdevice. Then we evaluated each uAPI whether generalizing it is a good thing both in concept and regarding to complexity. First, unlike other uAPIs which are all backed by iommu_ops, PASID allocation/free is through the IOASID sub-system. A question here, is IOASID expected to be the single management interface for PASID? yes (I'm asking since there're already vendor specific IDA based PASID allocator e.g amdgpu_pasid_alloc()) That comes before IOASID core was introduced. I think it should be changed to use the new generic interface. Jacob/Jean can better comment if other reason exists for this exception. If there's no exception it should be fixed. From this angle we feel generalizing PASID management does make some sense. First, PASID is just a number and not related to any device before it's bound to a page table and IOMMU domain. Second, PASID is a global resource (at least on Intel VT-d), I think we need a definition of "global" here. It looks to me for vt-d the PASID table is per device. PASID table is per device, thus VT-d could support per-device PASIDs in concept. I think that's the requirement of PCIE spec which said PASID + RID identifies the process address space ID. However on Intel platform we require PASIDs to be managed in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV and ENQCMD together. Any reason for such requirement? (I'm not familiar with ENQCMD, but my understanding is that vSVA, SIOV or SR-IOV doesn't have the requirement for system-wide PASID). ENQCMD is a new instruction to allow multiple processes submitting workload to one shared workqueue. Each process has an unique PASID saved in a MSR, which is included in the ENQCMD payload to indicate the address space when the CPU sends to the device. As one process might issue ENQCMD to multiple devices, OS-wide PASID allocation is required both in host and guest side. 
When executing ENQCMD in the guest to a SIOV device, the guest programmed value in the PASID_MSR must be translated to a host PASID value for proper function/isolation as PASID represents the address space. The translation is done through a new VMCS PASID translation structure (per-VM, and 1:1 mapping). From this angle the host PASIDs must be allocated 'globally' cross all assigned devices otherwise it may lead to 1:N mapping when a guest process issues ENQCMD to multiple assigned devices/subdevices. There will be a KVM forum session for this topic btw. Thanks for the background. Now I see the restrict comes from ENQCMD. Thus the host creates only one 'global' PASID namespace but do use per-device PASID table to assure isolation between devices on Intel platforms. But ARM does it differently as Jean explained. They have a global namespace for host processes on all host-owned devices (same as Intel), but then per-device namespace when a device (and its PASID table) is assigned to userspace. Another question, is this possible to have two DMAR hardware unit(at least I can see two even in my laptop). In this case, is PASID still a global resource? yes while having separate VFIO/ VDPA allocation interfaces may easily cause confusion in userspace, e.g. which interface to be used if both VFIO/VDPA devices exist. Moreover, an unified interface allows centralized control over how many PASIDs are allowed per process. Yes. One unclear part with this generalization is about the permission. Do we
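A purely conceptual model of the guest-to-host PASID translation described above. In reality this is a per-VM hardware structure referenced from the VMCS and walked by the CPU when the guest executes ENQCMD, not software like this; the sketch only shows why the mapping must be 1:1 and therefore why host PASIDs need one system-wide namespace:

#include <stdint.h>

#define MAX_GUEST_PASIDS 64   /* arbitrary size for the sketch */

/* Per-VM table mapping guest PASIDs to host PASIDs (conceptual). */
struct vm_pasid_xlate {
        uint32_t nr;
        struct {
                uint32_t guest_pasid;
                uint32_t host_pasid;
        } map[MAX_GUEST_PASIDS];
};

static int xlate_pasid(const struct vm_pasid_xlate *t, uint32_t guest_pasid,
                       uint32_t *host_pasid)
{
        for (uint32_t i = 0; i < t->nr; i++) {
                if (t->map[i].guest_pasid == guest_pasid) {
                        *host_pasid = t->map[i].host_pasid;
                        return 0;
                }
        }
        return -1;   /* untranslated guest PASID: the ENQCMD must fault */
}

If the same guest PASID could map to different host PASIDs depending on the target device, this lookup would no longer be a function of the guest PASID alone, which is the 1:N problem the mail describes.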
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Mon, Oct 19, 2020 at 08:39:03AM +, Liu, Yi L wrote: > Hi Jason, > > Good to see your response. Ah, I was away > > > > Second, IOMMU nested translation is a per IOMMU domain > > > > capability. Since IOMMU domains are managed by VFIO/VDPA > > > > (alloc/free domain, attach/detach device, set/get domain attribute, > > > > etc.), reporting/enabling the nesting capability is an natural > > > > extension to the domain uAPI of existing passthrough frameworks. > > > > Actually, VFIO already includes a nesting enable interface even > > > > before this series. So it doesn't make sense to generalize this uAPI > > > > out. > > > > The subsystem that obtains an IOMMU domain for a device would have to > > register it with an open FD of the '/dev/sva'. That is the connection > > between the two subsystems. It would be some simple kernel internal > > stuff: > > > > sva = get_sva_from_file(fd); > > Is this fd provided by userspace? I suppose the /dev/sva has a set of uAPIs > which will finally program page table to host iommu driver. As far as I know, > it's weird for VFIO user. Why should VFIO user connect to a /dev/sva fd after > it sets a proper iommu type to the opened container. VFIO container already > stands for an iommu context with which userspace could program page mapping > to host iommu. Again the point is to dis-aggregate the vIOMMU related stuff from VFIO so it can be shared between more subsystems that need it. I'm sure there will be some weird overlaps because we can't delete any of the existing VFIO APIs, but that should not be a blocker. Having VFIO run in a mode where '/dev/sva' provides all the IOMMU handling is a possible path. If your plan is to just opencode everything into VFIO then I don't see how VDPA will work well, and if proper in-kernel abstractions are built I fail to see how routing some of it through userspace is a fundamental problem. > > sva_register_device_to_pasid(sva, pasid, pci_device, iommu_domain); > > So this is supposed to be called by VFIO/VDPA to register the info to > /dev/sva. > right? And in dev/sva, it will also maintain the device/iommu_domain and pasid > info? will it be duplicated with VFIO/VDPA? Each part needs to have the information it needs? > > > > Moreover, mapping page fault to subdevice requires pre- > > > > registering subdevice fault data to IOMMU layer when binding > > > > guest page table, while such fault data can be only retrieved from > > > > parent driver through VFIO/VDPA. > > > > Not sure what this means, page fault should be tied to the PASID, any > > hookup needed for that should be done in-kernel when the device is > > connected to the PASID. > > you may refer to chapter 7.4.1.1 of VT-d spec. Page request is reported to > software together with the requestor id of the device. For the page request > injects to guest, it should have the device info. Whoever provides the vIOMMU emulation and relays the page fault to the guest has to translate the RID - what does that have to do with VFIO? How will VPDA provide the vIOMMU emulation? Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi Jason, Good to see your response. > From: Jason Gunthorpe > Sent: Friday, October 16, 2020 11:37 PM > > On Wed, Oct 14, 2020 at 03:16:22AM +, Tian, Kevin wrote: > > Hi, Alex and Jason (G), > > > > How about your opinion for this new proposal? For now looks both > > Jason (W) and Jean are OK with this direction and more discussions > > are possibly required for the new /dev/ioasid interface. Internally > > we're doing a quick prototype to see any unforeseen issue with this > > separation. > > Assuming VDPA and VFIO will be the only two users so duplicating > everything only twice sounds pretty restricting to me. > > > > Second, IOMMU nested translation is a per IOMMU domain > > > capability. Since IOMMU domains are managed by VFIO/VDPA > > > (alloc/free domain, attach/detach device, set/get domain attribute, > > > etc.), reporting/enabling the nesting capability is an natural > > > extension to the domain uAPI of existing passthrough frameworks. > > > Actually, VFIO already includes a nesting enable interface even > > > before this series. So it doesn't make sense to generalize this uAPI > > > out. > > The subsystem that obtains an IOMMU domain for a device would have to > register it with an open FD of the '/dev/sva'. That is the connection > between the two subsystems. It would be some simple kernel internal > stuff: > > sva = get_sva_from_file(fd); Is this fd provided by userspace? I suppose the /dev/sva has a set of uAPIs which will finally program page table to host iommu driver. As far as I know, it's weird for VFIO user. Why should VFIO user connect to a /dev/sva fd after it sets a proper iommu type to the opened container. VFIO container already stands for an iommu context with which userspace could program page mapping to host iommu. > sva_register_device_to_pasid(sva, pasid, pci_device, iommu_domain); So this is supposed to be called by VFIO/VDPA to register the info to /dev/sva. right? And in dev/sva, it will also maintain the device/iommu_domain and pasid info? will it be duplicated with VFIO/VDPA? > Not sure why this is a roadblock? > > How would this be any different from having some kernel libsva that > VDPA and VFIO would both rely on? > > You don't plan to just open code all this stuff in VFIO, do you? > > > > Then the tricky part comes with the remaining operations (3/4/5), > > > which are all backed by iommu_ops thus effective only within an > > > IOMMU domain. To generalize them, the first thing is to find a way > > > to associate the sva_FD (opened through generic /dev/sva) with an > > > IOMMU domain that is created by VFIO/VDPA. The second thing is > > > to replicate {domain<->device/subdevice} association in /dev/sva > > > path because some operations (e.g. page fault) is triggered/handled > > > per device/subdevice. Therefore, /dev/sva must provide both per- > > > domain and per-device uAPIs similar to what VFIO/VDPA already > > > does. > > Yes, the point here was to move the general APIs out of VFIO and into > a sharable location. So, of course one would expect some duplication > during the transition period. > > > > Moreover, mapping page fault to subdevice requires pre- > > > registering subdevice fault data to IOMMU layer when binding > > > guest page table, while such fault data can be only retrieved from > > > parent driver through VFIO/VDPA. > > Not sure what this means, page fault should be tied to the PASID, any > hookup needed for that should be done in-kernel when the device is > connected to the PASID. you may refer to chapter 7.4.1.1 of VT-d spec. 
Page request is reported to software together with the requestor id of the device. For the page request injects to guest, it should have the device info. Regards, Yi Liu > > > > space but they may be organized in multiple IOMMU domains based > > > on their bus type. How (should we let) the userspace know the > > > domain information and open an sva_FD for each domain is the main > > > problem here. > > Why is one sva_FD per iommu domain required? The HW can attach the > same PASID to multiple iommu domains, right? > > > > In the end we just realized that doing such generalization doesn't > > > really lead to a clear design and instead requires tight coordination > > > between /dev/sva and VFIO/VDPA for almost every new uAPI > > > (especially about synchronization when the domain/device > > > association is changed or when the device/subdevice is being reset/ > > > drained). Finally it may become a usability burden to the userspace > > > on proper use of the two interfaces on the assigned device. > > If you have a list of things that needs to be done to attach a PCI > device to a PASID then of course they should be tidy kernel APIs > already, and not just hard wired into VFIO. > > The worst outcome would be to have VDPA and VFIO have to different > ways to do all of this with a different set of bugs. Bug fixes/new > features in VFIO won't flow over to VDPA. > > Jason ___
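For readers without the VT-d spec at hand, the information a recoverable page request delivers to software looks roughly like the struct below. This is a simplified paraphrase of the page request descriptor in chapter 7.4.1.1, not its exact bit layout:

#include <stdint.h>

/* Simplified view of a VT-d page request as seen by software. */
struct page_req {
        uint16_t requester_id;   /* bus/devfn of the faulting device */
        uint32_t pasid;          /* address space the fault occurred in */
        uint64_t address;        /* faulting page address */
        uint16_t group_index;    /* PRG index used for the response */
        uint8_t  perm_read:1;
        uint8_t  perm_write:1;
        uint8_t  last_in_group:1;
};

/* Whoever emulates the vIOMMU has to turn requester_id (a host RID,
 * possibly of a parent device when a subdevice/mdev faulted) into the
 * RID the guest knows before injecting the request into the VM. */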
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 14, 2020 at 03:16:22AM +, Tian, Kevin wrote: > Hi, Alex and Jason (G), > > How about your opinion for this new proposal? For now looks both > Jason (W) and Jean are OK with this direction and more discussions > are possibly required for the new /dev/ioasid interface. Internally > we're doing a quick prototype to see any unforeseen issue with this > separation. Assuming VDPA and VFIO will be the only two users so duplicating everything only twice sounds pretty restricting to me. > > Second, IOMMU nested translation is a per IOMMU domain > > capability. Since IOMMU domains are managed by VFIO/VDPA > > (alloc/free domain, attach/detach device, set/get domain attribute, > > etc.), reporting/enabling the nesting capability is an natural > > extension to the domain uAPI of existing passthrough frameworks. > > Actually, VFIO already includes a nesting enable interface even > > before this series. So it doesn't make sense to generalize this uAPI > > out. The subsystem that obtains an IOMMU domain for a device would have to register it with an open FD of the '/dev/sva'. That is the connection between the two subsystems. It would be some simple kernel internal stuff: sva = get_sva_from_file(fd); sva_register_device_to_pasid(sva, pasid, pci_device, iommu_domain); Not sure why this is a roadblock? How would this be any different from having some kernel libsva that VDPA and VFIO would both rely on? You don't plan to just open code all this stuff in VFIO, do you? > > Then the tricky part comes with the remaining operations (3/4/5), > > which are all backed by iommu_ops thus effective only within an > > IOMMU domain. To generalize them, the first thing is to find a way > > to associate the sva_FD (opened through generic /dev/sva) with an > > IOMMU domain that is created by VFIO/VDPA. The second thing is > > to replicate {domain<->device/subdevice} association in /dev/sva > > path because some operations (e.g. page fault) is triggered/handled > > per device/subdevice. Therefore, /dev/sva must provide both per- > > domain and per-device uAPIs similar to what VFIO/VDPA already > > does. Yes, the point here was to move the general APIs out of VFIO and into a sharable location. So, of course one would expect some duplication during the transition period. > > Moreover, mapping page fault to subdevice requires pre- > > registering subdevice fault data to IOMMU layer when binding > > guest page table, while such fault data can be only retrieved from > > parent driver through VFIO/VDPA. Not sure what this means, page fault should be tied to the PASID, any hookup needed for that should be done in-kernel when the device is connected to the PASID. > > space but they may be organized in multiple IOMMU domains based > > on their bus type. How (should we let) the userspace know the > > domain information and open an sva_FD for each domain is the main > > problem here. Why is one sva_FD per iommu domain required? The HW can attach the same PASID to multiple iommu domains, right? > > In the end we just realized that doing such generalization doesn't > > really lead to a clear design and instead requires tight coordination > > between /dev/sva and VFIO/VDPA for almost every new uAPI > > (especially about synchronization when the domain/device > > association is changed or when the device/subdevice is being reset/ > > drained). Finally it may become a usability burden to the userspace > > on proper use of the two interfaces on the assigned device. 
If you have a list of things that need to be done to attach a PCI device to a PASID then of course they should be tidy kernel APIs already, and not just hard wired into VFIO. The worst outcome would be to have VDPA and VFIO have two different ways to do all of this with a different set of bugs. Bug fixes/new features in VFIO won't flow over to VDPA. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
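Expanding the two-line sketch in the mail above into the rough shape it might take inside a passthrough framework. Only get_sva_from_file() and sva_register_device_to_pasid() come from the sketch itself; the wrapper, the sva_context type and sva_put() are assumptions about how a caller could use them:

#include <linux/err.h>
#include <linux/iommu.h>
#include <linux/pci.h>

/* Hypothetical kernel-internal glue in VFIO (or vDPA): userspace hands
 * over an open /dev/sva fd, and the passthrough framework registers
 * its device and iommu_domain with it. */
static int vfio_connect_sva(struct pci_dev *pdev, int sva_fd, u32 pasid,
                            struct iommu_domain *domain)
{
        struct sva_context *sva;   /* assumed type behind /dev/sva */
        int ret;

        sva = get_sva_from_file(sva_fd);
        if (IS_ERR(sva))
                return PTR_ERR(sva);

        ret = sva_register_device_to_pasid(sva, pasid, pdev, domain);
        if (ret)
                sva_put(sva);      /* assumed refcount helper */
        return ret;
}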
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Wang > Sent: Thursday, October 15, 2020 4:41 PM > > > On 2020/10/15 ??3:58, Tian, Kevin wrote: > >> From: Jason Wang > >> Sent: Thursday, October 15, 2020 2:52 PM > >> > >> > >> On 2020/10/14 ??11:08, Tian, Kevin wrote: > From: Jason Wang > Sent: Tuesday, October 13, 2020 2:22 PM > > > On 2020/10/12 ??4:38, Tian, Kevin wrote: > >> From: Jason Wang > >> Sent: Monday, September 14, 2020 12:20 PM > >> > > [...] > > > If it's possible, I would suggest a generic uAPI instead of > > a VFIO > >> specific one. > >> > >> Jason suggest something like /dev/sva. There will be a lot of > >> other subsystems that could benefit from this (e.g vDPA). > >> > >> Have you ever considered this approach? > >> > > Hi, Jason, > > > > We did some study on this approach and below is the output. It's a > > long writing but I didn't find a way to further abstract w/o > > losing necessary context. Sorry about that. > > > > Overall the real purpose of this series is to enable IOMMU nested > > translation capability with vSVA as one major usage, through below > > new uAPIs: > > 1) Report/enable IOMMU nested translation capability; > > 2) Allocate/free PASID; > > 3) Bind/unbind guest page table; > > 4) Invalidate IOMMU cache; > > 5) Handle IOMMU page request/response (not in this series); > > 1/3/4) is the minimal set for using IOMMU nested translation, with > > the other two optional. For example, the guest may enable vSVA on > > a device without using PASID. Or, it may bind its gIOVA page table > > which doesn't require page fault support. Finally, all operations > > can be applied to either physical device or subdevice. > > > > Then we evaluated each uAPI whether generalizing it is a good > > thing both in concept and regarding to complexity. > > > > First, unlike other uAPIs which are all backed by iommu_ops, PASID > > allocation/free is through the IOASID sub-system. > A question here, is IOASID expected to be the single management > interface for PASID? > >>> yes > >>> > (I'm asking since there're already vendor specific IDA based PASID > allocator e.g amdgpu_pasid_alloc()) > >>> That comes before IOASID core was introduced. I think it should be > >>> changed to use the new generic interface. Jacob/Jean can better > >>> comment if other reason exists for this exception. > >> > >> If there's no exception it should be fixed. > >> > >> > > From this angle > > we feel generalizing PASID management does make some sense. > > First, PASID is just a number and not related to any device before > > it's bound to a page table and IOMMU domain. Second, PASID is a > > global resource (at least on Intel VT-d), > I think we need a definition of "global" here. It looks to me for > vt-d the PASID table is per device. > >>> PASID table is per device, thus VT-d could support per-device PASIDs > >>> in concept. > >> > >> I think that's the requirement of PCIE spec which said PASID + RID > >> identifies the process address space ID. > >> > >> > >>>However on Intel platform we require PASIDs to be managed in > >>> system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV > >>> and ENQCMD together. > >> > >> Any reason for such requirement? (I'm not familiar with ENQCMD, but > >> my understanding is that vSVA, SIOV or SR-IOV doesn't have the > >> requirement for system-wide PASID). > > ENQCMD is a new instruction to allow multiple processes submitting > > workload to one shared workqueue. 
Each process has an unique PASID > > saved in a MSR, which is included in the ENQCMD payload to indicate > > the address space when the CPU sends to the device. As one process > > might issue ENQCMD to multiple devices, OS-wide PASID allocation is > > required both in host and guest side. > > > > When executing ENQCMD in the guest to a SIOV device, the guest > > programmed value in the PASID_MSR must be translated to a host PASID > > value for proper function/isolation as PASID represents the address > > space. The translation is done through a new VMCS PASID translation > > structure (per-VM, and 1:1 mapping). From this angle the host PASIDs > > must be allocated 'globally' cross all assigned devices otherwise it > > may lead to 1:N mapping when a guest process issues ENQCMD to multiple > > assigned devices/subdevices. > > > > There will be a KVM forum session for this topic btw. > > > Thanks for the background. Now I see the restrict comes from ENQCMD. > > > > > >> > >>> Thus the host creates only one 'global' PASID namespace but do use > >>> per-device PASID table to assure isolation between devices on Intel > >>> platforms. But ARM does it differently as Jean explained. > >>> They have a global namespace for host processes on all host-owned > >>> devices (s
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/15 下午3:58, Tian, Kevin wrote: From: Jason Wang Sent: Thursday, October 15, 2020 2:52 PM On 2020/10/14 上午11:08, Tian, Kevin wrote: From: Jason Wang Sent: Tuesday, October 13, 2020 2:22 PM On 2020/10/12 下午4:38, Tian, Kevin wrote: From: Jason Wang Sent: Monday, September 14, 2020 12:20 PM [...] > If it's possible, I would suggest a generic uAPI instead of a VFIO specific one. Jason suggest something like /dev/sva. There will be a lot of other subsystems that could benefit from this (e.g vDPA). Have you ever considered this approach? Hi, Jason, We did some study on this approach and below is the output. It's a long writing but I didn't find a way to further abstract w/o losing necessary context. Sorry about that. Overall the real purpose of this series is to enable IOMMU nested translation capability with vSVA as one major usage, through below new uAPIs: 1) Report/enable IOMMU nested translation capability; 2) Allocate/free PASID; 3) Bind/unbind guest page table; 4) Invalidate IOMMU cache; 5) Handle IOMMU page request/response (not in this series); 1/3/4) is the minimal set for using IOMMU nested translation, with the other two optional. For example, the guest may enable vSVA on a device without using PASID. Or, it may bind its gIOVA page table which doesn't require page fault support. Finally, all operations can be applied to either physical device or subdevice. Then we evaluated each uAPI whether generalizing it is a good thing both in concept and regarding to complexity. First, unlike other uAPIs which are all backed by iommu_ops, PASID allocation/free is through the IOASID sub-system. A question here, is IOASID expected to be the single management interface for PASID? yes (I'm asking since there're already vendor specific IDA based PASID allocator e.g amdgpu_pasid_alloc()) That comes before IOASID core was introduced. I think it should be changed to use the new generic interface. Jacob/Jean can better comment if other reason exists for this exception. If there's no exception it should be fixed. From this angle we feel generalizing PASID management does make some sense. First, PASID is just a number and not related to any device before it's bound to a page table and IOMMU domain. Second, PASID is a global resource (at least on Intel VT-d), I think we need a definition of "global" here. It looks to me for vt-d the PASID table is per device. PASID table is per device, thus VT-d could support per-device PASIDs in concept. I think that's the requirement of PCIE spec which said PASID + RID identifies the process address space ID. However on Intel platform we require PASIDs to be managed in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV and ENQCMD together. Any reason for such requirement? (I'm not familiar with ENQCMD, but my understanding is that vSVA, SIOV or SR-IOV doesn't have the requirement for system-wide PASID). ENQCMD is a new instruction to allow multiple processes submitting workload to one shared workqueue. Each process has an unique PASID saved in a MSR, which is included in the ENQCMD payload to indicate the address space when the CPU sends to the device. As one process might issue ENQCMD to multiple devices, OS-wide PASID allocation is required both in host and guest side. When executing ENQCMD in the guest to a SIOV device, the guest programmed value in the PASID_MSR must be translated to a host PASID value for proper function/isolation as PASID represents the address space. 
The translation is done through a new VMCS PASID translation structure (per-VM, and 1:1 mapping). From this angle the host PASIDs must be allocated 'globally' cross all assigned devices otherwise it may lead to 1:N mapping when a guest process issues ENQCMD to multiple assigned devices/subdevices. There will be a KVM forum session for this topic btw. Thanks for the background. Now I see the restrict comes from ENQCMD. Thus the host creates only one 'global' PASID namespace but do use per-device PASID table to assure isolation between devices on Intel platforms. But ARM does it differently as Jean explained. They have a global namespace for host processes on all host-owned devices (same as Intel), but then per-device namespace when a device (and its PASID table) is assigned to userspace. Another question, is this possible to have two DMAR hardware unit(at least I can see two even in my laptop). In this case, is PASID still a global resource? yes while having separate VFIO/ VDPA allocation interfaces may easily cause confusion in userspace, e.g. which interface to be used if both VFIO/VDPA devices exist. Moreover, an unified interface allows centralized control over how many PASIDs are allowed per process. Yes. One unclear part with this generalization is about the permission. Do we open this interface to any process or only to those which have assigned devices? If the latter, w
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Wang > Sent: Thursday, October 15, 2020 2:52 PM > > > On 2020/10/14 上午11:08, Tian, Kevin wrote: > >> From: Jason Wang > >> Sent: Tuesday, October 13, 2020 2:22 PM > >> > >> > >> On 2020/10/12 下午4:38, Tian, Kevin wrote: > From: Jason Wang > Sent: Monday, September 14, 2020 12:20 PM > > >>> [...] > >>>> If it's possible, I would suggest a generic uAPI instead of a VFIO > specific one. > > Jason suggest something like /dev/sva. There will be a lot of other > subsystems that could benefit from this (e.g vDPA). > > Have you ever considered this approach? > > >>> Hi, Jason, > >>> > >>> We did some study on this approach and below is the output. It's a > >>> long writing but I didn't find a way to further abstract w/o losing > >>> necessary context. Sorry about that. > >>> > >>> Overall the real purpose of this series is to enable IOMMU nested > >>> translation capability with vSVA as one major usage, through > >>> below new uAPIs: > >>> 1) Report/enable IOMMU nested translation capability; > >>> 2) Allocate/free PASID; > >>> 3) Bind/unbind guest page table; > >>> 4) Invalidate IOMMU cache; > >>> 5) Handle IOMMU page request/response (not in this series); > >>> 1/3/4) is the minimal set for using IOMMU nested translation, with > >>> the other two optional. For example, the guest may enable vSVA on > >>> a device without using PASID. Or, it may bind its gIOVA page table > >>> which doesn't require page fault support. Finally, all operations can > >>> be applied to either physical device or subdevice. > >>> > >>> Then we evaluated each uAPI whether generalizing it is a good thing > >>> both in concept and regarding to complexity. > >>> > >>> First, unlike other uAPIs which are all backed by iommu_ops, PASID > >>> allocation/free is through the IOASID sub-system. > >> > >> A question here, is IOASID expected to be the single management > >> interface for PASID? > > yes > > > >> (I'm asking since there're already vendor specific IDA based PASID > >> allocator e.g amdgpu_pasid_alloc()) > > That comes before IOASID core was introduced. I think it should be > > changed to use the new generic interface. Jacob/Jean can better > > comment if other reason exists for this exception. > > > If there's no exception it should be fixed. > > > > > >> > >>>From this angle > >>> we feel generalizing PASID management does make some sense. > >>> First, PASID is just a number and not related to any device before > >>> it's bound to a page table and IOMMU domain. Second, PASID is a > >>> global resource (at least on Intel VT-d), > >> > >> I think we need a definition of "global" here. It looks to me for vt-d > >> the PASID table is per device. > > PASID table is per device, thus VT-d could support per-device PASIDs > > in concept. > > > I think that's the requirement of PCIE spec which said PASID + RID > identifies the process address space ID. > > > > However on Intel platform we require PASIDs to be managed > > in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV > > and ENQCMD together. > > > Any reason for such requirement? (I'm not familiar with ENQCMD, but my > understanding is that vSVA, SIOV or SR-IOV doesn't have the requirement > for system-wide PASID). ENQCMD is a new instruction to allow multiple processes submitting workload to one shared workqueue. Each process has an unique PASID saved in a MSR, which is included in the ENQCMD payload to indicate the address space when the CPU sends to the device. 
As one process might issue ENQCMD to multiple devices, OS-wide PASID allocation is required both in host and guest side. When executing ENQCMD in the guest to a SIOV device, the guest programmed value in the PASID_MSR must be translated to a host PASID value for proper function/isolation as PASID represents the address space. The translation is done through a new VMCS PASID translation structure (per-VM, and 1:1 mapping). From this angle the host PASIDs must be allocated 'globally' cross all assigned devices otherwise it may lead to 1:N mapping when a guest process issues ENQCMD to multiple assigned devices/subdevices. There will be a KVM forum session for this topic btw. > > > > Thus the host creates only one 'global' PASID > > namespace but do use per-device PASID table to assure isolation between > > devices on Intel platforms. But ARM does it differently as Jean explained. > > They have a global namespace for host processes on all host-owned > > devices (same as Intel), but then per-device namespace when a device > > (and its PASID table) is assigned to userspace. > > > >> Another question, is this possible to have two DMAR hardware unit(at > >> least I can see two even in my laptop). In this case, is PASID still a > >> global resource? > > yes > > > >> > >>>while having separate VFIO/ > >>> VDPA allocation interfaces may easily cause confusion in userspace, > >>> e.g. which interface to be u
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/15 上午7:10, Alex Williamson wrote: On Wed, 14 Oct 2020 03:08:31 + "Tian, Kevin" wrote: From: Jason Wang Sent: Tuesday, October 13, 2020 2:22 PM On 2020/10/12 下午4:38, Tian, Kevin wrote: From: Jason Wang Sent: Monday, September 14, 2020 12:20 PM [...] > If it's possible, I would suggest a generic uAPI instead of a VFIO specific one. Jason suggest something like /dev/sva. There will be a lot of other subsystems that could benefit from this (e.g vDPA). Have you ever considered this approach? Hi, Jason, We did some study on this approach and below is the output. It's a long writing but I didn't find a way to further abstract w/o losing necessary context. Sorry about that. Overall the real purpose of this series is to enable IOMMU nested translation capability with vSVA as one major usage, through below new uAPIs: 1) Report/enable IOMMU nested translation capability; 2) Allocate/free PASID; 3) Bind/unbind guest page table; 4) Invalidate IOMMU cache; 5) Handle IOMMU page request/response (not in this series); 1/3/4) is the minimal set for using IOMMU nested translation, with the other two optional. For example, the guest may enable vSVA on a device without using PASID. Or, it may bind its gIOVA page table which doesn't require page fault support. Finally, all operations can be applied to either physical device or subdevice. Then we evaluated each uAPI whether generalizing it is a good thing both in concept and regarding to complexity. First, unlike other uAPIs which are all backed by iommu_ops, PASID allocation/free is through the IOASID sub-system. A question here, is IOASID expected to be the single management interface for PASID? yes (I'm asking since there're already vendor specific IDA based PASID allocator e.g amdgpu_pasid_alloc()) That comes before IOASID core was introduced. I think it should be changed to use the new generic interface. Jacob/Jean can better comment if other reason exists for this exception. From this angle we feel generalizing PASID management does make some sense. First, PASID is just a number and not related to any device before it's bound to a page table and IOMMU domain. Second, PASID is a global resource (at least on Intel VT-d), I think we need a definition of "global" here. It looks to me for vt-d the PASID table is per device. PASID table is per device, thus VT-d could support per-device PASIDs in concept. However on Intel platform we require PASIDs to be managed in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV and ENQCMD together. Thus the host creates only one 'global' PASID namespace but do use per-device PASID table to assure isolation between devices on Intel platforms. But ARM does it differently as Jean explained. They have a global namespace for host processes on all host-owned devices (same as Intel), but then per-device namespace when a device (and its PASID table) is assigned to userspace. Another question, is this possible to have two DMAR hardware unit(at least I can see two even in my laptop). In this case, is PASID still a global resource? yes while having separate VFIO/ VDPA allocation interfaces may easily cause confusion in userspace, e.g. which interface to be used if both VFIO/VDPA devices exist. Moreover, an unified interface allows centralized control over how many PASIDs are allowed per process. Yes. One unclear part with this generalization is about the permission. Do we open this interface to any process or only to those which have assigned devices? 
If the latter, what would be the mechanism to coordinate between this new interface and specific passthrough frameworks? I'm not sure, but if you just want a permission, you probably can introduce new capability (CAP_XXX) for this. A more tricky case, vSVA support on ARM (Eric/Jean please correct me) plans to do per-device PASID namespace which is built on a bind_pasid_table iommu callback to allow guest fully manage its PASIDs on a given passthrough device. I see, so I think the answer is to prepare for the namespace support from the start. (btw, I don't see how namespace is handled in current IOASID module?) The PASID table is based on GPA when nested translation is enabled on ARM SMMU. This design implies that the guest manages PASID table thus PASIDs instead of going through host-side API on assigned device. From this angle we don't need explicit namespace in the host API. Just need a way to control how many PASIDs a process is allowed to allocate in the global namespace. btw IOASID module already has 'set' concept per-process and PASIDs are managed per-set. Then the quota control can be easily introduced in the 'set' level. I'm not sure how such requirement can be unified w/o involving passthrough frameworks, or whether ARM could also switch to global PASID style... Second, IOMMU nested translation is a per IOMMU domain capability. Since IO
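The 'set' concept referred to above is the one in the kernel's IOASID allocator, whose allocation call takes an ioasid_set plus a range. The quota check below is the hypothetical per-set addition being discussed, with invented helper names; only ioasid_alloc() and INVALID_IOASID reflect the existing interface:

#include <linux/ioasid.h>

/* PASIDs come out of one global namespace but are accounted against
 * the caller's ioasid_set; a per-set quota (hypothetical helpers
 * below) is where "how many PASIDs per process" control could live. */
static ioasid_t alloc_guest_pasid(struct ioasid_set *vm_set, void *priv)
{
        if (ioasid_set_count(vm_set) >= ioasid_set_quota(vm_set))  /* assumed */
                return INVALID_IOASID;

        /* 20-bit PASID space, PASID 0 left reserved */
        return ioasid_alloc(vm_set, 1, (1U << 20) - 1, priv);
}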
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/14 上午11:08, Tian, Kevin wrote: From: Jason Wang Sent: Tuesday, October 13, 2020 2:22 PM On 2020/10/12 下午4:38, Tian, Kevin wrote: From: Jason Wang Sent: Monday, September 14, 2020 12:20 PM [...] > If it's possible, I would suggest a generic uAPI instead of a VFIO specific one. Jason suggest something like /dev/sva. There will be a lot of other subsystems that could benefit from this (e.g vDPA). Have you ever considered this approach? Hi, Jason, We did some study on this approach and below is the output. It's a long writing but I didn't find a way to further abstract w/o losing necessary context. Sorry about that. Overall the real purpose of this series is to enable IOMMU nested translation capability with vSVA as one major usage, through below new uAPIs: 1) Report/enable IOMMU nested translation capability; 2) Allocate/free PASID; 3) Bind/unbind guest page table; 4) Invalidate IOMMU cache; 5) Handle IOMMU page request/response (not in this series); 1/3/4) is the minimal set for using IOMMU nested translation, with the other two optional. For example, the guest may enable vSVA on a device without using PASID. Or, it may bind its gIOVA page table which doesn't require page fault support. Finally, all operations can be applied to either physical device or subdevice. Then we evaluated each uAPI whether generalizing it is a good thing both in concept and regarding to complexity. First, unlike other uAPIs which are all backed by iommu_ops, PASID allocation/free is through the IOASID sub-system. A question here, is IOASID expected to be the single management interface for PASID? yes (I'm asking since there're already vendor specific IDA based PASID allocator e.g amdgpu_pasid_alloc()) That comes before IOASID core was introduced. I think it should be changed to use the new generic interface. Jacob/Jean can better comment if other reason exists for this exception. If there's no exception it should be fixed. From this angle we feel generalizing PASID management does make some sense. First, PASID is just a number and not related to any device before it's bound to a page table and IOMMU domain. Second, PASID is a global resource (at least on Intel VT-d), I think we need a definition of "global" here. It looks to me for vt-d the PASID table is per device. PASID table is per device, thus VT-d could support per-device PASIDs in concept. I think that's the requirement of PCIE spec which said PASID + RID identifies the process address space ID. However on Intel platform we require PASIDs to be managed in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV and ENQCMD together. Any reason for such requirement? (I'm not familiar with ENQCMD, but my understanding is that vSVA, SIOV or SR-IOV doesn't have the requirement for system-wide PASID). Thus the host creates only one 'global' PASID namespace but do use per-device PASID table to assure isolation between devices on Intel platforms. But ARM does it differently as Jean explained. They have a global namespace for host processes on all host-owned devices (same as Intel), but then per-device namespace when a device (and its PASID table) is assigned to userspace. Another question, is this possible to have two DMAR hardware unit(at least I can see two even in my laptop). In this case, is PASID still a global resource? yes while having separate VFIO/ VDPA allocation interfaces may easily cause confusion in userspace, e.g. which interface to be used if both VFIO/VDPA devices exist. 
Moreover, an unified interface allows centralized control over how many PASIDs are allowed per process. Yes. One unclear part with this generalization is about the permission. Do we open this interface to any process or only to those which have assigned devices? If the latter, what would be the mechanism to coordinate between this new interface and specific passthrough frameworks? I'm not sure, but if you just want a permission, you probably can introduce new capability (CAP_XXX) for this. A more tricky case, vSVA support on ARM (Eric/Jean please correct me) plans to do per-device PASID namespace which is built on a bind_pasid_table iommu callback to allow guest fully manage its PASIDs on a given passthrough device. I see, so I think the answer is to prepare for the namespace support from the start. (btw, I don't see how namespace is handled in current IOASID module?) The PASID table is based on GPA when nested translation is enabled on ARM SMMU. This design implies that the guest manages PASID table thus PASIDs instead of going through host-side API on assigned device. From this angle we don't need explicit namespace in the host API. Just need a way to control how many PASIDs a process is allowed to allocate in the global namespace. btw IOASID module already has 'set' concept per-process and PASIDs are managed per-set. Then the quota control can be
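Reading the CAP_XXX suggestion above concretely: the open() handler of such a /dev/sva or /dev/ioasid node could simply refuse callers without the capability. CAP_SYS_RAWIO is used only as a stand-in for whatever new capability would be defined, and sva_context_alloc() is an assumed helper:

#include <linux/capability.h>
#include <linux/fs.h>

static int sva_open(struct inode *inode, struct file *filp)
{
        /* stand-in for the new CAP_XXX the mail suggests */
        if (!capable(CAP_SYS_RAWIO))
                return -EPERM;

        filp->private_data = sva_context_alloc();   /* assumed helper */
        return filp->private_data ? 0 : -ENOMEM;
}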
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, 14 Oct 2020 03:08:31 + "Tian, Kevin" wrote: > > From: Jason Wang > > Sent: Tuesday, October 13, 2020 2:22 PM > > > > > > On 2020/10/12 下午4:38, Tian, Kevin wrote: > > >> From: Jason Wang > > >> Sent: Monday, September 14, 2020 12:20 PM > > >> > > > [...] > > > > If it's possible, I would suggest a generic uAPI instead of a VFIO > > >> specific one. > > >> > > >> Jason suggest something like /dev/sva. There will be a lot of other > > >> subsystems that could benefit from this (e.g vDPA). > > >> > > >> Have you ever considered this approach? > > >> > > > Hi, Jason, > > > > > > We did some study on this approach and below is the output. It's a > > > long writing but I didn't find a way to further abstract w/o losing > > > necessary context. Sorry about that. > > > > > > Overall the real purpose of this series is to enable IOMMU nested > > > translation capability with vSVA as one major usage, through > > > below new uAPIs: > > > 1) Report/enable IOMMU nested translation capability; > > > 2) Allocate/free PASID; > > > 3) Bind/unbind guest page table; > > > 4) Invalidate IOMMU cache; > > > 5) Handle IOMMU page request/response (not in this series); > > > 1/3/4) is the minimal set for using IOMMU nested translation, with > > > the other two optional. For example, the guest may enable vSVA on > > > a device without using PASID. Or, it may bind its gIOVA page table > > > which doesn't require page fault support. Finally, all operations can > > > be applied to either physical device or subdevice. > > > > > > Then we evaluated each uAPI whether generalizing it is a good thing > > > both in concept and regarding to complexity. > > > > > > First, unlike other uAPIs which are all backed by iommu_ops, PASID > > > allocation/free is through the IOASID sub-system. > > > > > > A question here, is IOASID expected to be the single management > > interface for PASID? > > yes > > > > > (I'm asking since there're already vendor specific IDA based PASID > > allocator e.g amdgpu_pasid_alloc()) > > That comes before IOASID core was introduced. I think it should be > changed to use the new generic interface. Jacob/Jean can better > comment if other reason exists for this exception. > > > > > > > > From this angle > > > we feel generalizing PASID management does make some sense. > > > First, PASID is just a number and not related to any device before > > > it's bound to a page table and IOMMU domain. Second, PASID is a > > > global resource (at least on Intel VT-d), > > > > > > I think we need a definition of "global" here. It looks to me for vt-d > > the PASID table is per device. > > PASID table is per device, thus VT-d could support per-device PASIDs > in concept. However on Intel platform we require PASIDs to be managed > in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV > and ENQCMD together. Thus the host creates only one 'global' PASID > namespace but do use per-device PASID table to assure isolation between > devices on Intel platforms. But ARM does it differently as Jean explained. > They have a global namespace for host processes on all host-owned > devices (same as Intel), but then per-device namespace when a device > (and its PASID table) is assigned to userspace. > > > > > Another question, is this possible to have two DMAR hardware unit(at > > least I can see two even in my laptop). In this case, is PASID still a > > global resource? > > yes > > > > > > > > while having separate VFIO/ > > > VDPA allocation interfaces may easily cause confusion in userspace, > > > e.g. 
which interface to be used if both VFIO/VDPA devices exist. > > > Moreover, an unified interface allows centralized control over how > > > many PASIDs are allowed per process. > > > > > > Yes. > > > > > > > > > > One unclear part with this generalization is about the permission. > > > Do we open this interface to any process or only to those which > > > have assigned devices? If the latter, what would be the mechanism > > > to coordinate between this new interface and specific passthrough > > > frameworks? > > > > > > I'm not sure, but if you just want a permission, you probably can > > introduce new capability (CAP_XXX) for this. > > > > > > > A more tricky case, vSVA support on ARM (Eric/Jean > > > please correct me) plans to do per-device PASID namespace which > > > is built on a bind_pasid_table iommu callback to allow guest fully > > > manage its PASIDs on a given passthrough device. > > > > > > I see, so I think the answer is to prepare for the namespace support > > from the start. (btw, I don't see how namespace is handled in current > > IOASID module?) > > The PASID table is based on GPA when nested translation is enabled > on ARM SMMU. This design implies that the guest manages PASID > table thus PASIDs instead of going through host-side API on assigned > device. From this angle we don't need explicit namespace in the
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi, Alex and Jason (G), How about your opinion for this new proposal? For now looks both Jason (W) and Jean are OK with this direction and more discussions are possibly required for the new /dev/ioasid interface. Internally we're doing a quick prototype to see any unforeseen issue with this separation. Please let us know your thoughts. Thanks Kevin > From: Tian, Kevin > Sent: Monday, October 12, 2020 4:39 PM > > > From: Jason Wang > > Sent: Monday, September 14, 2020 12:20 PM > > > [...] > > If it's possible, I would suggest a generic uAPI instead of a VFIO > > specific one. > > > > Jason suggest something like /dev/sva. There will be a lot of other > > subsystems that could benefit from this (e.g vDPA). > > > > Have you ever considered this approach? > > > > Hi, Jason, > > We did some study on this approach and below is the output. It's a > long writing but I didn't find a way to further abstract w/o losing > necessary context. Sorry about that. > > Overall the real purpose of this series is to enable IOMMU nested > translation capability with vSVA as one major usage, through > below new uAPIs: > 1) Report/enable IOMMU nested translation capability; > 2) Allocate/free PASID; > 3) Bind/unbind guest page table; > 4) Invalidate IOMMU cache; > 5) Handle IOMMU page request/response (not in this series); > 1/3/4) is the minimal set for using IOMMU nested translation, with > the other two optional. For example, the guest may enable vSVA on > a device without using PASID. Or, it may bind its gIOVA page table > which doesn't require page fault support. Finally, all operations can > be applied to either physical device or subdevice. > > Then we evaluated each uAPI whether generalizing it is a good thing > both in concept and regarding to complexity. > > First, unlike other uAPIs which are all backed by iommu_ops, PASID > allocation/free is through the IOASID sub-system. From this angle > we feel generalizing PASID management does make some sense. > First, PASID is just a number and not related to any device before > it's bound to a page table and IOMMU domain. Second, PASID is a > global resource (at least on Intel VT-d), while having separate VFIO/ > VDPA allocation interfaces may easily cause confusion in userspace, > e.g. which interface to be used if both VFIO/VDPA devices exist. > Moreover, an unified interface allows centralized control over how > many PASIDs are allowed per process. > > One unclear part with this generalization is about the permission. > Do we open this interface to any process or only to those which > have assigned devices? If the latter, what would be the mechanism > to coordinate between this new interface and specific passthrough > frameworks? A more tricky case, vSVA support on ARM (Eric/Jean > please correct me) plans to do per-device PASID namespace which > is built on a bind_pasid_table iommu callback to allow guest fully > manage its PASIDs on a given passthrough device. I'm not sure > how such requirement can be unified w/o involving passthrough > frameworks, or whether ARM could also switch to global PASID > style... > > Second, IOMMU nested translation is a per IOMMU domain > capability. Since IOMMU domains are managed by VFIO/VDPA > (alloc/free domain, attach/detach device, set/get domain attribute, > etc.), reporting/enabling the nesting capability is an natural > extension to the domain uAPI of existing passthrough frameworks. > Actually, VFIO already includes a nesting enable interface even > before this series. 
So it doesn't make sense to generalize this uAPI > out. > > Then the tricky part comes with the remaining operations (3/4/5), > which are all backed by iommu_ops thus effective only within an > IOMMU domain. To generalize them, the first thing is to find a way > to associate the sva_FD (opened through generic /dev/sva) with an > IOMMU domain that is created by VFIO/VDPA. The second thing is > to replicate {domain<->device/subdevice} association in /dev/sva > path because some operations (e.g. page fault) is triggered/handled > per device/subdevice. Therefore, /dev/sva must provide both per- > domain and per-device uAPIs similar to what VFIO/VDPA already > does. Moreover, mapping page fault to subdevice requires pre- > registering subdevice fault data to IOMMU layer when binding > guest page table, while such fault data can be only retrieved from > parent driver through VFIO/VDPA. > > However, we failed to find a good way even at the 1st step about > domain association. The iommu domains are not exposed to the > userspace, and there is no 1:1 mapping between domain and device. > In VFIO, all devices within the same VFIO container share the address > space but they may be organized in multiple IOMMU domains based > on their bus type. How (should we let) the userspace know the > domain information and open an sva_FD for each domain is the main > problem here. > > In the end we just realized that doing such generalization
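For reference, the pre-existing "nesting enable interface" pointed out above is the VFIO_TYPE1_NESTING_IOMMU backend selected on the container fd, which is also how the capability is probed; this is why reporting/enabling reads as a natural extension of the existing domain uAPI. A trimmed userspace sketch (the group number is arbitrary, error handling omitted):

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

int enable_nesting_example(void)
{
        int container = open("/dev/vfio/vfio", O_RDWR);
        int group = open("/dev/vfio/26", O_RDWR);

        if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION)
                return -1;

        /* "reporting": ask whether this IOMMU supports nested translation */
        if (!ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_NESTING_IOMMU))
                return -1;

        /* a group must be attached before an IOMMU backend can be chosen */
        if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container))
                return -1;

        /* "enabling": pick the nesting-capable type1 backend for the domain */
        if (ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU))
                return -1;

        return container;
}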
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Wang > Sent: Tuesday, October 13, 2020 2:22 PM > > > On 2020/10/12 下午4:38, Tian, Kevin wrote: > >> From: Jason Wang > >> Sent: Monday, September 14, 2020 12:20 PM > >> > > [...] > > > If it's possible, I would suggest a generic uAPI instead of a VFIO > >> specific one. > >> > >> Jason suggest something like /dev/sva. There will be a lot of other > >> subsystems that could benefit from this (e.g vDPA). > >> > >> Have you ever considered this approach? > >> > > Hi, Jason, > > > > We did some study on this approach and below is the output. It's a > > long writing but I didn't find a way to further abstract w/o losing > > necessary context. Sorry about that. > > > > Overall the real purpose of this series is to enable IOMMU nested > > translation capability with vSVA as one major usage, through > > below new uAPIs: > > 1) Report/enable IOMMU nested translation capability; > > 2) Allocate/free PASID; > > 3) Bind/unbind guest page table; > > 4) Invalidate IOMMU cache; > > 5) Handle IOMMU page request/response (not in this series); > > 1/3/4) is the minimal set for using IOMMU nested translation, with > > the other two optional. For example, the guest may enable vSVA on > > a device without using PASID. Or, it may bind its gIOVA page table > > which doesn't require page fault support. Finally, all operations can > > be applied to either physical device or subdevice. > > > > Then we evaluated each uAPI whether generalizing it is a good thing > > both in concept and regarding to complexity. > > > > First, unlike other uAPIs which are all backed by iommu_ops, PASID > > allocation/free is through the IOASID sub-system. > > > A question here, is IOASID expected to be the single management > interface for PASID? yes > > (I'm asking since there're already vendor specific IDA based PASID > allocator e.g amdgpu_pasid_alloc()) That comes before IOASID core was introduced. I think it should be changed to use the new generic interface. Jacob/Jean can better comment if other reason exists for this exception. > > > > From this angle > > we feel generalizing PASID management does make some sense. > > First, PASID is just a number and not related to any device before > > it's bound to a page table and IOMMU domain. Second, PASID is a > > global resource (at least on Intel VT-d), > > > I think we need a definition of "global" here. It looks to me for vt-d > the PASID table is per device. PASID table is per device, thus VT-d could support per-device PASIDs in concept. However on Intel platform we require PASIDs to be managed in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV and ENQCMD together. Thus the host creates only one 'global' PASID namespace but do use per-device PASID table to assure isolation between devices on Intel platforms. But ARM does it differently as Jean explained. They have a global namespace for host processes on all host-owned devices (same as Intel), but then per-device namespace when a device (and its PASID table) is assigned to userspace. > > Another question, is this possible to have two DMAR hardware unit(at > least I can see two even in my laptop). In this case, is PASID still a > global resource? yes > > > > while having separate VFIO/ > > VDPA allocation interfaces may easily cause confusion in userspace, > > e.g. which interface to be used if both VFIO/VDPA devices exist. > > Moreover, an unified interface allows centralized control over how > > many PASIDs are allowed per process. > > > Yes. 
> > > > > > One unclear part with this generalization is about the permission. > > Do we open this interface to any process or only to those which > > have assigned devices? If the latter, what would be the mechanism > > to coordinate between this new interface and specific passthrough > > frameworks? > > > I'm not sure, but if you just want a permission, you probably can > introduce a new capability (CAP_XXX) for this. > > > > A more tricky case, vSVA support on ARM (Eric/Jean > > please correct me) plans to do per-device PASID namespace which > > is built on a bind_pasid_table iommu callback to allow guest fully > > manage its PASIDs on a given passthrough device. > > > I see, so I think the answer is to prepare for the namespace support > from the start. (btw, I don't see how namespace is handled in current > IOASID module?) The PASID table is based on GPA when nested translation is enabled on ARM SMMU. This design implies that the guest manages PASID table thus PASIDs instead of going through host-side API on assigned device. From this angle we don't need explicit namespace in the host API. Just need a way to control how many PASIDs a process is allowed to allocate in the global namespace. btw IOASID module already has 'set' concept per-process and PASIDs are managed per-set. Then the quota control can be easily introduced in the 'set' level. > > > > I'm not sure > > how such requirement can be unified w/o involving passthrough frameworks, or whether ARM could also switch to global PASID style...
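The 'set'-level quota idea mentioned above could look roughly like the following. This is purely illustrative: the quota field and the example_* helpers are hypothetical and not part of the IOASID code under discussion, and a permission gate such as the CAP_XXX check suggested earlier would sit in front of this path.

/*
 * Sketch of per-set PASID quota: each process gets an ioasid_set, and
 * allocation fails once the per-set quota is exhausted.
 */
#include <linux/atomic.h>
#include <linux/ioasid.h>

struct example_pasid_quota {
        struct ioasid_set set;  /* per-process allocation context */
        atomic_t used;
        int quota;              /* e.g. an administrator-controlled limit */
};

static ioasid_t example_quota_alloc(struct example_pasid_quota *q,
                                    ioasid_t min, ioasid_t max, void *priv)
{
        ioasid_t id;

        if (atomic_inc_return(&q->used) > q->quota) {
                atomic_dec(&q->used);
                return INVALID_IOASID;
        }

        id = ioasid_alloc(&q->set, min, max, priv);
        if (id == INVALID_IOASID)
                atomic_dec(&q->used);
        return id;
}

static void example_quota_free(struct example_pasid_quota *q, ioasid_t id)
{
        ioasid_free(id);
        atomic_dec(&q->used);
}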
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jean-Philippe Brucker > Sent: Tuesday, October 13, 2020 6:28 PM > > On Mon, Oct 12, 2020 at 08:38:54AM +, Tian, Kevin wrote: > > > From: Jason Wang > > > Sent: Monday, September 14, 2020 12:20 PM > > > > > [...] > > > If it's possible, I would suggest a generic uAPI instead of a VFIO > > > specific one. > > > > > > Jason suggest something like /dev/sva. There will be a lot of other > > > subsystems that could benefit from this (e.g vDPA). > > > > > > Have you ever considered this approach? > > > > > > > Hi, Jason, > > > > We did some study on this approach and below is the output. It's a > > long writing but I didn't find a way to further abstract w/o losing > > necessary context. Sorry about that. > > > > Overall the real purpose of this series is to enable IOMMU nested > > translation capability with vSVA as one major usage, through > > below new uAPIs: > > 1) Report/enable IOMMU nested translation capability; > > 2) Allocate/free PASID; > > 3) Bind/unbind guest page table; > > 4) Invalidate IOMMU cache; > > 5) Handle IOMMU page request/response (not in this series); > > 1/3/4) is the minimal set for using IOMMU nested translation, with > > the other two optional. For example, the guest may enable vSVA on > > a device without using PASID. Or, it may bind its gIOVA page table > > which doesn't require page fault support. Finally, all operations can > > be applied to either physical device or subdevice. > > > > Then we evaluated each uAPI whether generalizing it is a good thing > > both in concept and regarding to complexity. > > > > First, unlike other uAPIs which are all backed by iommu_ops, PASID > > allocation/free is through the IOASID sub-system. From this angle > > we feel generalizing PASID management does make some sense. > > First, PASID is just a number and not related to any device before > > it's bound to a page table and IOMMU domain. Second, PASID is a > > global resource (at least on Intel VT-d), while having separate VFIO/ > > VDPA allocation interfaces may easily cause confusion in userspace, > > e.g. which interface to be used if both VFIO/VDPA devices exist. > > Moreover, an unified interface allows centralized control over how > > many PASIDs are allowed per process. > > > > One unclear part with this generalization is about the permission. > > Do we open this interface to any process or only to those which > > have assigned devices? If the latter, what would be the mechanism > > to coordinate between this new interface and specific passthrough > > frameworks? A more tricky case, vSVA support on ARM (Eric/Jean > > please correct me) plans to do per-device PASID namespace which > > is built on a bind_pasid_table iommu callback to allow guest fully > > manage its PASIDs on a given passthrough device. > > Yes we need a bind_pasid_table. The guest needs to allocate the PASID > tables because they are accessed via guest-physical addresses by the HW > SMMU. > > With bind_pasid_table, the invalidation message also requires a scope to > invalidate a whole PASID context, in addition to invalidating a mappings > ranges. > > > I'm not sure > > how such requirement can be unified w/o involving passthrough > > frameworks, or whether ARM could also switch to global PASID > > style... > > Not planned at the moment, sorry. It requires a PV IOMMU to do PASID > allocation, which is possible with virtio-iommu but not with a vSMMU > emulation. The VM will manage its own PASID space. 
The upside is that we > don't need userspace access to IOASID, so I won't pester you with comments > on that part of the API :) It makes sense. Possibly in the future, when you plan to support a SIOV-like capability, you may have to convert the PASID table to use host physical addresses, and then the same API could be reused. :) Thanks Kevin > > > Second, IOMMU nested translation is a per IOMMU domain > > capability. Since IOMMU domains are managed by VFIO/VDPA > > (alloc/free domain, attach/detach device, set/get domain attribute, > > etc.), reporting/enabling the nesting capability is a natural > > extension to the domain uAPI of existing passthrough frameworks. > > Actually, VFIO already includes a nesting enable interface even > > before this series. So it doesn't make sense to generalize this uAPI > > out. > > Agree for enabling, but for reporting we did consider adding a sysfs > interface in /sys/class/iommu/ describing an IOMMU's properties. Then > opted for VFIO capabilities to keep the API nice and contained, but if > we're breaking up the API, sysfs might be more convenient to use and > extend. > > > Then the tricky part comes with the remaining operations (3/4/5), > > which are all backed by iommu_ops thus effective only within an > > IOMMU domain. To generalize them, the first thing is to find a way > > to associate the sva_FD (opened through generic /dev/sva) with an > > IOMMU domain that is created by VFIO/VDPA. The second thing is > > to replicate {domain<->device/subdevice} association in /dev/sva > > path because some operations (e.g. page fault) are triggered/handled > > per device/subdevice.
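As a rough illustration of the sysfs alternative Jean mentions, an IOMMU driver can already publish attributes under /sys/class/iommu/<name>/ via iommu_device_sysfs_add(); the "nesting" attribute below is hypothetical and only shows where such a capability report could live outside of VFIO.

#include <linux/device.h>
#include <linux/iommu.h>
#include <linux/sysfs.h>

static ssize_t nesting_show(struct device *dev,
                            struct device_attribute *attr, char *buf)
{
        /* a real driver would consult its hardware capability registers */
        return sprintf(buf, "%d\n", 1);
}
static DEVICE_ATTR_RO(nesting);

static struct attribute *example_iommu_attrs[] = {
        &dev_attr_nesting.attr,
        NULL,
};
ATTRIBUTE_GROUPS(example_iommu);

/*
 * At probe time something like:
 *      iommu_device_sysfs_add(&iommu->iommu_dev, dev, example_iommu_groups,
 *                             "example-iommu.%d", id);
 * would create /sys/class/iommu/example-iommu.0/nesting for userspace
 * to read before it decides how to configure the passthrough framework.
 */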
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Mon, Oct 12, 2020 at 08:38:54AM +, Tian, Kevin wrote: > > From: Jason Wang > > Sent: Monday, September 14, 2020 12:20 PM > > > [...] > > If it's possible, I would suggest a generic uAPI instead of a VFIO > > specific one. > > > > Jason suggest something like /dev/sva. There will be a lot of other > > subsystems that could benefit from this (e.g vDPA). > > > > Have you ever considered this approach? > > > > Hi, Jason, > > We did some study on this approach and below is the output. It's a > long writing but I didn't find a way to further abstract w/o losing > necessary context. Sorry about that. > > Overall the real purpose of this series is to enable IOMMU nested > translation capability with vSVA as one major usage, through > below new uAPIs: > 1) Report/enable IOMMU nested translation capability; > 2) Allocate/free PASID; > 3) Bind/unbind guest page table; > 4) Invalidate IOMMU cache; > 5) Handle IOMMU page request/response (not in this series); > 1/3/4) is the minimal set for using IOMMU nested translation, with > the other two optional. For example, the guest may enable vSVA on > a device without using PASID. Or, it may bind its gIOVA page table > which doesn't require page fault support. Finally, all operations can > be applied to either physical device or subdevice. > > Then we evaluated each uAPI whether generalizing it is a good thing > both in concept and regarding to complexity. > > First, unlike other uAPIs which are all backed by iommu_ops, PASID > allocation/free is through the IOASID sub-system. From this angle > we feel generalizing PASID management does make some sense. > First, PASID is just a number and not related to any device before > it's bound to a page table and IOMMU domain. Second, PASID is a > global resource (at least on Intel VT-d), while having separate VFIO/ > VDPA allocation interfaces may easily cause confusion in userspace, > e.g. which interface to be used if both VFIO/VDPA devices exist. > Moreover, an unified interface allows centralized control over how > many PASIDs are allowed per process. > > One unclear part with this generalization is about the permission. > Do we open this interface to any process or only to those which > have assigned devices? If the latter, what would be the mechanism > to coordinate between this new interface and specific passthrough > frameworks? A more tricky case, vSVA support on ARM (Eric/Jean > please correct me) plans to do per-device PASID namespace which > is built on a bind_pasid_table iommu callback to allow guest fully > manage its PASIDs on a given passthrough device. Yes we need a bind_pasid_table. The guest needs to allocate the PASID tables because they are accessed via guest-physical addresses by the HW SMMU. With bind_pasid_table, the invalidation message also requires a scope to invalidate a whole PASID context, in addition to invalidating a mappings ranges. > I'm not sure > how such requirement can be unified w/o involving passthrough > frameworks, or whether ARM could also switch to global PASID > style... Not planned at the moment, sorry. It requires a PV IOMMU to do PASID allocation, which is possible with virtio-iommu but not with a vSMMU emulation. The VM will manage its own PASID space. The upside is that we don't need userspace access to IOASID, so I won't pester you with comments on that part of the API :) > Second, IOMMU nested translation is a per IOMMU domain > capability. 
Since IOMMU domains are managed by VFIO/VDPA > (alloc/free domain, attach/detach device, set/get domain attribute, > etc.), reporting/enabling the nesting capability is a natural > extension to the domain uAPI of existing passthrough frameworks. > Actually, VFIO already includes a nesting enable interface even > before this series. So it doesn't make sense to generalize this uAPI > out. Agree for enabling, but for reporting we did consider adding a sysfs interface in /sys/class/iommu/ describing an IOMMU's properties. Then opted for VFIO capabilities to keep the API nice and contained, but if we're breaking up the API, sysfs might be more convenient to use and extend. > Then the tricky part comes with the remaining operations (3/4/5), > which are all backed by iommu_ops thus effective only within an > IOMMU domain. To generalize them, the first thing is to find a way > to associate the sva_FD (opened through generic /dev/sva) with an > IOMMU domain that is created by VFIO/VDPA. The second thing is > to replicate {domain<->device/subdevice} association in /dev/sva > path because some operations (e.g. page fault) are triggered/handled > per device/subdevice. Therefore, /dev/sva must provide both per- > domain and per-device uAPIs similar to what VFIO/VDPA already > does. Moreover, mapping page fault to subdevice requires pre- > registering subdevice fault data to IOMMU layer when binding > guest page table, while such fault data can only be retrieved from the > parent driver through VFIO/VDPA.
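The invalidation scopes Jean describes (a whole PASID context versus a mapping range) might be conveyed by a request along the lines of the sketch below. The struct is illustrative only and is not the uAPI proposed in the series.

#include <stdint.h>

enum example_inval_granularity {
        EXAMPLE_INV_DOMAIN,     /* flush everything behind the domain */
        EXAMPLE_INV_PASID,      /* flush one whole PASID context */
        EXAMPLE_INV_ADDR,       /* flush a mapping range within one PASID */
};

struct example_inval_request {
        uint32_t granularity;   /* enum example_inval_granularity */
        uint32_t pasid;         /* valid for PASID and ADDR scopes */
        uint64_t addr;          /* valid for ADDR scope */
        uint64_t size;          /* valid for ADDR scope */
};

The PASID-wide scope is what becomes necessary once the guest owns the PASID table: when the guest tears down a context, the host only learns "PASID x of this device is gone" and has to flush the whole context rather than individual ranges.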
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/12 下午4:38, Tian, Kevin wrote: From: Jason Wang Sent: Monday, September 14, 2020 12:20 PM [...] > If it's possible, I would suggest a generic uAPI instead of a VFIO specific one. Jason suggest something like /dev/sva. There will be a lot of other subsystems that could benefit from this (e.g vDPA). Have you ever considered this approach? Hi, Jason, We did some study on this approach and below is the output. It's a long writing but I didn't find a way to further abstract w/o losing necessary context. Sorry about that. Overall the real purpose of this series is to enable IOMMU nested translation capability with vSVA as one major usage, through below new uAPIs: 1) Report/enable IOMMU nested translation capability; 2) Allocate/free PASID; 3) Bind/unbind guest page table; 4) Invalidate IOMMU cache; 5) Handle IOMMU page request/response (not in this series); 1/3/4) is the minimal set for using IOMMU nested translation, with the other two optional. For example, the guest may enable vSVA on a device without using PASID. Or, it may bind its gIOVA page table which doesn't require page fault support. Finally, all operations can be applied to either physical device or subdevice. Then we evaluated each uAPI whether generalizing it is a good thing both in concept and regarding to complexity. First, unlike other uAPIs which are all backed by iommu_ops, PASID allocation/free is through the IOASID sub-system. A question here, is IOASID expected to be the single management interface for PASID? (I'm asking since there're already vendor specific IDA based PASID allocator e.g amdgpu_pasid_alloc()) From this angle we feel generalizing PASID management does make some sense. First, PASID is just a number and not related to any device before it's bound to a page table and IOMMU domain. Second, PASID is a global resource (at least on Intel VT-d), I think we need a definition of "global" here. It looks to me for vt-d the PASID table is per device. Another question, is this possible to have two DMAR hardware unit(at least I can see two even in my laptop). In this case, is PASID still a global resource? while having separate VFIO/ VDPA allocation interfaces may easily cause confusion in userspace, e.g. which interface to be used if both VFIO/VDPA devices exist. Moreover, an unified interface allows centralized control over how many PASIDs are allowed per process. Yes. One unclear part with this generalization is about the permission. Do we open this interface to any process or only to those which have assigned devices? If the latter, what would be the mechanism to coordinate between this new interface and specific passthrough frameworks? I'm not sure, but if you just want a permission, you probably can introduce new capability (CAP_XXX) for this. A more tricky case, vSVA support on ARM (Eric/Jean please correct me) plans to do per-device PASID namespace which is built on a bind_pasid_table iommu callback to allow guest fully manage its PASIDs on a given passthrough device. I see, so I think the answer is to prepare for the namespace support from the start. (btw, I don't see how namespace is handled in current IOASID module?) I'm not sure how such requirement can be unified w/o involving passthrough frameworks, or whether ARM could also switch to global PASID style... Second, IOMMU nested translation is a per IOMMU domain capability. 
Since IOMMU domains are managed by VFIO/VDPA (alloc/free domain, attach/detach device, set/get domain attribute, etc.), reporting/enabling the nesting capability is a natural extension to the domain uAPI of existing passthrough frameworks. Actually, VFIO already includes a nesting enable interface even before this series. So it doesn't make sense to generalize this uAPI out. So my understanding is that VFIO already: 1) uses multiple fds 2) separates IOMMU ops to a dedicated container fd (type1 iommu) 3) provides an API to associate devices/groups with a container And the whole proposal in this series is to reuse the container fd. It should be possible to replace e.g type1 IOMMU with a unified module. Then the tricky part comes with the remaining operations (3/4/5), which are all backed by iommu_ops thus effective only within an IOMMU domain. To generalize them, the first thing is to find a way to associate the sva_FD (opened through generic /dev/sva) with an IOMMU domain that is created by VFIO/VDPA. The second thing is to replicate {domain<->device/subdevice} association in /dev/sva path because some operations (e.g. page fault) are triggered/handled per device/subdevice. Is there any reason that the #PF cannot be handled via the SVA fd? Therefore, /dev/sva must provide both per- domain and per-device uAPIs similar to what VFIO/VDPA already does. Moreover, mapping page fault to subdevice requires pre- registering subdevice fault data to IOMMU layer when binding guest page table, while such fault data can only be retrieved from the parent driver through VFIO/VDPA.
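For reference, the fd topology enumerated above (a container fd carrying the IOMMU ops, group fds associated with it, device fds handed out by the group) is what userspace already does with type1 today. The sketch below uses the standard VFIO ioctls; the group number and device name are placeholders and error handling is omitted.

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

int vfio_fd_topology_example(void)
{
        int container, group, device;

        container = open("/dev/vfio/vfio", O_RDWR);     /* 2) IOMMU ops live here */
        group = open("/dev/vfio/26", O_RDWR);           /* 1) one fd per group */

        /* 3) associate the group with the container */
        ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);

        /* choose the IOMMU backend; the idea above is that this backend
         * (type1) could be replaced by a unified module behind the same fd */
        ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1v2_IOMMU);

        /* all map/unmap (and the proposed vSVA ops) are issued against the
         * container fd, i.e. against the IOMMU backend, not the device */

        /* device fds are handed out by the group */
        device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");
        return device;
}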