Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-09-06 Thread Xu Zaibo



On 2018/9/5 19:02, Jean-Philippe Brucker wrote:
> On 05/09/2018 04:15, Xu Zaibo wrote:
>>>>    1. Once the series is finished, a VFIO-PCI device can be held by
>>>> only one process through the bind ioctl, without a PASID (that is,
>>>> without exposing the PASID to user space).
>>> It could, but isn't supported at the moment. In addition to adding
>>> support in the I/O page fault code, we'd also need to update the VFIO
>>> API. Currently a VFIO_TYPE1 domain always supports the MAP/UNMAP ioctl.
>>> The case you describe isn't compatible with MAP/UNMAP, since the process
>>> manages the shared address space with mmap or malloc. We'd probably need
>>> to introduce a new VFIO IOMMU type, in which case the bind could be
>>> performed implicitly when the process does VFIO_SET_IOMMU. Then the
>>> process wouldn't need to send an additional BIND IOCTL.
>> OK, got it. This is the legacy mode, so all the VFIO APIs are kept
>> unchanged?
> Yes, existing VFIO semantics are preserved.
>
>>>>    2. When using a VFIO-PCI device to support multiple processes with
>>>> the SVA series, a primary process with multiple secondary processes
>>>> must be deployed, just like DPDK (https://www.dpdk.org/). And the
>>>> PASID still has to be exposed to user land.
>>> Right. A third case, also implemented by this patch (and complete), is
>>> the primary process simply doing a BIND for itself, and using the
>>> returned PASID to share its own address space with the device.
>> OK. But I am worried that the solution of one primary process with
>> several secondary ones is a little bit limited. Maybe users don't want
>> to depend on a primary process. :)
> I don't see a better way for vfio-pci, though. But more importantly, I
> don't know of any users :) While the feature is great for testing new
> hardware, and I've been using it for all kinds of stress testing, I
> haven't received feedback from possible users in production settings
> (DPDK etc) and can't speculate about what they'd prefer.

At present, there seems to be no other way that is compatible with the
current logic.

Thank you.

Thanks,
Zaibo





Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-09-05 Thread Jean-Philippe Brucker
On 05/09/2018 04:15, Xu Zaibo wrote:
>>>   1. Once the series is finished, a VFIO-PCI device can be held by
>>> only one process through the bind ioctl, without a PASID (that is,
>>> without exposing the PASID to user space).
>> It could, but isn't supported at the moment. In addition to adding
>> support in the I/O page fault code, we'd also need to update the VFIO
>> API. Currently a VFIO_TYPE1 domain always supports the MAP/UNMAP ioctl.
>> The case you describe isn't compatible with MAP/UNMAP, since the process
>> manages the shared address space with mmap or malloc. We'd probably need
>> to introduce a new VFIO IOMMU type, in which case the bind could be
>> performed implicitly when the process does VFIO_SET_IOMMU. Then the
>> process wouldn't need to send an additional BIND IOCTL.
> OK, got it. This is the legacy mode, so all the VFIO APIs are kept
> unchanged?

Yes, existing VFIO semantics are preserved.

>>>   2. When using a VFIO-PCI device to support multiple processes with
>>> the SVA series, a primary process with multiple secondary processes
>>> must be deployed, just like DPDK (https://www.dpdk.org/). And the
>>> PASID still has to be exposed to user land.
>> Right. A third case, also implemented by this patch (and complete), is
>> the primary process simply doing a BIND for itself, and using the
>> returned PASID to share its own address space with the device.
>>
> OK. But I am worried that the solution of one primary process with
> several secondary ones is a little bit limited. Maybe users don't want
> to depend on a primary process. :)

I don't see a better way for vfio-pci, though. But more importantly, I
don't know of any users :) While the feature is great for testing new
hardware, and I've been using it for all kinds of stress testing, I
haven't received feedback from possible users in production settings
(DPDK etc) and can't speculate about what they'd prefer.

Thanks,
Jean


Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-09-04 Thread Jean-Philippe Brucker
On 04/09/2018 03:12, Xu Zaibo wrote:
> On 2018/9/3 18:34, Jean-Philippe Brucker wrote:
>> On 01/09/18 03:23, Xu Zaibo wrote:
>>> As one application takes a whole function while using VFIO-PCI, why do
>>> the application and the function need to enable the PASID capability?
>>> (Since just one I/O page table is enough for them.)
>> At the moment the series doesn't provide support for SVA without PASID
>> (on the I/O page fault path, 08/40). In addition the BIND ioctl could be
>> used by the owner application to bind other processes (slaves) and
>> perform sub-assignment. But that feature is incomplete because we don't
>> send stop_pasid notification to the owner when a slave dies.
>>
> So, could I understand it like this?
> 
>  1. Once the series is finished, a VFIO-PCI device can be held by only
> one process through the bind ioctl, without a PASID (that is, without
> exposing the PASID to user space).

It could, but isn't supported at the moment. In addition to adding
support in the I/O page fault code, we'd also need to update the VFIO
API. Currently a VFIO_TYPE1 domain always supports the MAP/UNMAP ioctl.
The case you describe isn't compatible with MAP/UNMAP, since the process
manages the shared address space with mmap or malloc. We'd probably need
to introduce a new VFIO IOMMU type, in which case the bind could be
performed implicitly when the process does VFIO_SET_IOMMU. Then the
process wouldn't need to send an additional BIND IOCTL.

>  2. When using a VFIO-PCI device to support multiple processes with the
> SVA series, a primary process with multiple secondary processes must be
> deployed, just like DPDK (https://www.dpdk.org/). And the PASID still
> has to be exposed to user land.

Right. A third case, also implemented by this patch (and complete), is
the primary process simply doing a BIND for itself, and using the
returned PASID to share its own address space with the device.

Thanks,
Jean
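As a rough illustration of the alternative sketched above, a new IOMMU type could make VFIO_SET_IOMMU perform the bind implicitly. VFIO_TYPE1_SVA_IOMMU, its value and this whole flow are invented for the sketch; nothing in the series defines them:

#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Invented: a TYPE1 variant whose VFIO_SET_IOMMU binds the caller's mm */
#define VFIO_TYPE1_SVA_IOMMU	9

static int attach_sva_container(int container, int group)
{
	if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container))
		return -1;
	/*
	 * The bind would happen here implicitly: no BIND ioctl and no PASID
	 * exposed to userspace. MAP/UNMAP would be unavailable, since the
	 * process manages the shared address space with mmap or malloc.
	 */
	return ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_SVA_IOMMU);
}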


Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-09-03 Thread Xu Zaibo




On 2018/9/3 18:34, Jean-Philippe Brucker wrote:
> On 01/09/18 03:23, Xu Zaibo wrote:
>> As one application takes a whole function while using VFIO-PCI, why do
>> the application and the function need to enable the PASID capability?
>> (Since just one I/O page table is enough for them.)
> At the moment the series doesn't provide support for SVA without PASID
> (on the I/O page fault path, 08/40). In addition the BIND ioctl could be
> used by the owner application to bind other processes (slaves) and
> perform sub-assignment. But that feature is incomplete because we don't
> send stop_pasid notification to the owner when a slave dies.

So, could I understand it like this?

1. Once the series is finished, a VFIO-PCI device can be held by only
one process through the bind ioctl, without a PASID (that is, without
exposing the PASID to user space).

2. When using a VFIO-PCI device to support multiple processes with the
SVA series, a primary process with multiple secondary processes must be
deployed, just like DPDK (https://www.dpdk.org/). And the PASID still
has to be exposed to user land.

Thanks,
Zaibo



Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-09-03 Thread Jean-Philippe Brucker
On 01/09/18 03:23, Xu Zaibo wrote:
> As one application takes a whole function while using VFIO-PCI, why do
> the application and the function need to enable the PASID capability?
> (Since just one I/O page table is enough for them.)

At the moment the series doesn't provide support for SVA without PASID
(on the I/O page fault path, 08/40). In addition the BIND ioctl could be
used by the owner application to bind other processes (slaves) and
perform sub-assignment. But that feature is incomplete because we don't
send stop_pasid notification to the owner when a slave dies.

Thanks,
Jean
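To make the sub-assignment case above concrete, here is a minimal sketch of an owner binding another process by pid with the VFIO_IOMMU_BIND_PID flag from this patch. The helper name is made up, and error handling is reduced to a single return code:

#include <sys/ioctl.h>
#include <sys/types.h>
#include <linux/vfio.h>		/* with this patch applied */

static int bind_foreign_process(int container, pid_t pid, __u32 *pasid)
{
	char buf[sizeof(struct vfio_iommu_type1_bind) +
		 sizeof(struct vfio_iommu_type1_bind_process)] = {0};
	struct vfio_iommu_type1_bind *bind = (void *)buf;
	struct vfio_iommu_type1_bind_process *proc = (void *)bind->data;

	bind->argsz = sizeof(buf);
	bind->flags = VFIO_IOMMU_BIND_PROCESS;
	proc->flags = VFIO_IOMMU_BIND_PID;	/* @pid names a foreign task */
	proc->pid = pid;

	/* The kernel applies a ptrace PTRACE_MODE_ATTACH_REALCREDS check */
	if (ioctl(container, VFIO_IOMMU_BIND, bind))
		return -1;

	*pasid = proc->pasid;	/* DMA tagged with this PASID reaches @pid's mm */
	return 0;
}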


Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-08-31 Thread Xu Zaibo

Hi Jean,

On 2018/8/31 21:34, Jean-Philippe Brucker wrote:
> On 27/08/18 09:06, Xu Zaibo wrote:
>>> +struct vfio_iommu_type1_bind_process {
>>> +	__u32	flags;
>>> +#define VFIO_IOMMU_BIND_PID		(1 << 0)
>>> +	__u32	pasid;
>> As I am doing some work on the SVA patch set, I wonder why user space
>> needs this pasid. Maybe it is more reasonable to set the pasid into all
>> devices under the vfio container through a callback function from
>> 'vfio_device' while the 'VFIO_IOMMU_BIND_PROCESS' command is executed in
>> kernel land? I am not sure, because there is no suitable callback in
>> 'vfio_device' at present.
> When using vfio-pci, the kernel doesn't know how to program the PASID
> into the device because the only kernel driver for the device is the
> generic vfio-pci module. The PCI specification doesn't describe a way of
> setting up the PASID, it's vendor-specific. Only the userspace
> application owning the device (and calling VFIO_IOMMU_BIND) knows how to
> do it, so we return the allocated PASID.
>
> Note that unlike vfio-mdev where applications share slices of a
> function, with vfio-pci one application owns the whole function so it's
> safe to let userspace set the PASID in hardware. With vfio-mdev it's the
> kernel driver that should be in charge of setting the PASID as you
> described, and we wouldn't have a reason to return the PASID in the
> vfio_iommu_type1_bind_process structure.

Understood. But I still have the following confusion:

As one application takes a whole function while using VFIO-PCI, why do
the application and the function need to enable the PASID capability?
(Since just one I/O page table is enough for them.)

Maybe I misunderstood; I hope you can help me clear it up. Thank you
very much.

Thanks,
Zaibo





Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-08-31 Thread Jean-Philippe Brucker
Hi Zaibo,

On 27/08/18 09:06, Xu Zaibo wrote:
>> +struct vfio_iommu_type1_bind_process {
>> +	__u32	flags;
>> +#define VFIO_IOMMU_BIND_PID		(1 << 0)
>> +	__u32	pasid;
> As I am doing some work on the SVA patch set, I wonder why user space
> needs this pasid. Maybe it is more reasonable to set the pasid into all
> devices under the vfio container through a callback function from
> 'vfio_device' while the 'VFIO_IOMMU_BIND_PROCESS' command is executed in
> kernel land? I am not sure, because there is no suitable callback in
> 'vfio_device' at present.

When using vfio-pci, the kernel doesn't know how to program the PASID
into the device because the only kernel driver for the device is the
generic vfio-pci module. The PCI specification doesn't describe a way of
setting up the PASID, it's vendor-specific. Only the userspace
application owning the device (and calling VFIO_IOMMU_BIND) knows how to
do it, so we return the allocated PASID.

Note that unlike vfio-mdev where applications share slices of a
function, with vfio-pci one application owns the whole function so it's
safe to let userspace set the PASID in hardware. With vfio-mdev it's the
kernel driver that should be in charge of setting the PASID as you
described, and we wouldn't have a reason to return the PASID in the
vfio_iommu_type1_bind_process structure.

Thanks,
Jean
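As an illustration of "only the owner application knows how to program the PASID", a userspace driver might write the PASID returned by VFIO_IOMMU_BIND into a vendor-specific register of a mmap'ed BAR. The register offset and layout below are entirely invented, since neither the PCI spec nor this patch defines them; error handling is elided:

#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

#define ACME_PASID_REG	0x40	/* hypothetical vendor-specific offset */

static void acme_set_pasid(int device_fd, uint32_t pasid)
{
	struct vfio_region_info info = {
		.argsz = sizeof(info),
		.index = VFIO_PCI_BAR0_REGION_INDEX,
	};
	volatile uint32_t *bar;

	ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &info);
	bar = mmap(NULL, info.size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   device_fd, info.offset);
	/* From now on the device can tag its DMA with this PASID */
	bar[ACME_PASID_REG / 4] = pasid;
}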


Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-08-27 Thread Xu Zaibo

Hi Jean,

On 2018/5/12 3:06, Jean-Philippe Brucker wrote:

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 1aa7b82e8169..dc07752c8fe8 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -665,6 +665,82 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/*
+ * VFIO_IOMMU_BIND_PROCESS
+ *
+ * Allocate a PASID for a process address space, and use it to attach this
+ * process to all devices in the container. Devices can then tag their DMA
+ * traffic with the returned @pasid to perform transactions on the associated
+ * virtual address space. Mapping and unmapping buffers is performed by
+ * standard functions such as mmap and malloc.
+ *
+ * If flag is VFIO_IOMMU_BIND_PID, @pid contains the pid of a foreign process
+ * to bind. Otherwise the current task is bound. Given that the caller owns
+ * the device, setting this flag grants the caller read and write permissions
+ * on the entire address space of foreign process described by @pid.
+ * Therefore, permission to perform the bind operation on a foreign process is
+ * governed by the ptrace access mode PTRACE_MODE_ATTACH_REALCREDS check. See
+ * man ptrace(2) for more information.
+ *
+ * On success, VFIO writes a Process Address Space ID (PASID) into @pasid. This
+ * ID is unique to a process and can be used on all devices in the container.
+ *
+ * On fork, the child inherits the device fd and can use the bonds setup by
+ * its parent. Consequently, the child has R/W access on the address spaces
+ * bound by its parent. After an execv, the device fd is closed and the child
+ * doesn't have access to the address space anymore.
+ *
+ * To remove a bond between process and container, VFIO_IOMMU_UNBIND ioctl is
+ * issued with the same parameters. If a pid was specified in VFIO_IOMMU_BIND,
+ * it should also be present for VFIO_IOMMU_UNBIND. Otherwise unbind the
+ * current task from the container.
+ */
+struct vfio_iommu_type1_bind_process {
+	__u32	flags;
+#define VFIO_IOMMU_BIND_PID		(1 << 0)
+	__u32	pasid;

As I am doing some work on the SVA patch set, I wonder why user space
needs this pasid. Maybe it is more reasonable to set the pasid into all
devices under the vfio container through a callback function from
'vfio_device' while the 'VFIO_IOMMU_BIND_PROCESS' command is executed in
kernel land? I am not sure, because there is no suitable callback in
'vfio_device' at present.

Thanks,
Zaibo

+	__s32	pid;
+};
+
+/*
+ * Only mode supported at the moment is VFIO_IOMMU_BIND_PROCESS, which takes
+ * vfio_iommu_type1_bind_process in data.
+ */
+struct vfio_iommu_type1_bind {
+	__u32	argsz;
+	__u32	flags;
+#define VFIO_IOMMU_BIND_PROCESS		(1 << 0)
+	__u8	data[];
+};
+
+/*
+ * VFIO_IOMMU_BIND - _IOWR(VFIO_TYPE, VFIO_BASE + 22, struct vfio_iommu_bind)
+ *
+ * Manage address spaces of devices in this container. Initially a TYPE1
+ * container can only have one address space, managed with
+ * VFIO_IOMMU_MAP/UNMAP_DMA.
+ *
+ * An IOMMU of type VFIO_TYPE1_NESTING_IOMMU can be managed by both MAP/UNMAP
+ * and BIND ioctls at the same time. MAP/UNMAP acts on the stage-2 (host) page
+ * tables, and BIND manages the stage-1 (guest) page tables. Other types of
+ * IOMMU may allow MAP/UNMAP and BIND to coexist, where MAP/UNMAP controls
+ * non-PASID traffic and BIND controls PASID traffic. But this depends on the
+ * underlying IOMMU architecture and isn't guaranteed.
+ *
+ * Availability of this feature depends on the device, its bus, the underlying
+ * IOMMU and the CPU architecture.
+ *
+ * returns: 0 on success, -errno on failure.
+ */
+#define VFIO_IOMMU_BIND		_IO(VFIO_TYPE, VFIO_BASE + 22)
+
+/*
+ * VFIO_IOMMU_UNBIND - _IOWR(VFIO_TYPE, VFIO_BASE + 23, struct vfio_iommu_bind)
+ *
+ * Undo what was done by the corresponding VFIO_IOMMU_BIND ioctl.
+ */
+#define VFIO_IOMMU_UNBIND	_IO(VFIO_TYPE, VFIO_BASE + 23)
+
  /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
  
  /*



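Putting the uapi above together, a minimal sketch of the simplest case, a process binding itself and retrieving its PASID, could look as follows. The group number is a placeholder and error handling is elided:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>		/* with this patch applied */

int main(void)
{
	int container = open("/dev/vfio/vfio", O_RDWR);
	int group = open("/dev/vfio/42", O_RDWR);	/* placeholder group */
	struct vfio_iommu_type1_bind *bind;
	struct vfio_iommu_type1_bind_process *proc;

	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
	ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1v2_IOMMU);

	/* The bind_process payload travels in the flexible data[] array */
	bind = calloc(1, sizeof(*bind) + sizeof(*proc));
	bind->argsz = sizeof(*bind) + sizeof(*proc);
	bind->flags = VFIO_IOMMU_BIND_PROCESS;
	proc = (void *)bind->data;
	proc->flags = 0;	/* no VFIO_IOMMU_BIND_PID: bind the current task */

	if (!ioctl(container, VFIO_IOMMU_BIND, bind))
		printf("bound with PASID %u\n", proc->pasid);

	/* ... program the PASID into the device, DMA to malloc'ed buffers ... */

	ioctl(container, VFIO_IOMMU_UNBIND, bind);
	free(bind);
	return 0;
}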


Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-05-29 Thread Xu Zaibo

Hi,

On 2018/5/29 19:55, Jean-Philippe Brucker wrote:
> (If possible, please reply in plain text to the list. Reading this in a
> text-only reader is confusing, because all quotes have the same level)

Sorry for that, I have reset Thunderbird. :) Thanks.

> On 26/05/18 04:53, Xu Zaibo wrote:
>> I guess there may be some misunderstandings :).
>>
>> From the current patches, 'iommu_sva_device_shutdown' is called by
>> 'vfio_iommu_sva_shutdown', which is mainly used by
>> 'vfio_iommu_type1_detach_group', which is finally called automatically
>> by a process's release of the vfio facility, or manually through
>> 'VFIO_GROUP_UNSET_CONTAINER' in the process.
>>
>> So, just imagine that 2 processes are working on the device with the
>> IOPF feature, and the 2 do the following to enable SVA:
>>
>> 1. open("/dev/vfio/vfio");
>>
>> 2. open the group of the device by calling open("/dev/vfio/x"); but
>> now, I think the second process will fail to open the group, because
>> current VFIO thinks that one group can only be used by one process/VM.
>> At present, I use mediated devices to create more groups on the parent
>> device to support multiple processes;
>>
>> 3. VFIO_GROUP_SET_CONTAINER;
>>
>> 4. VFIO_SET_IOMMU;
>>
>> 5. VFIO_IOMMU_BIND;
> I have a concern regarding your driver. With mdev you can't allow
> processes to program the PASID themselves, since the parent device has a
> single PASID table. You lose all isolation since processes could write
> any value in the PASID field and access address spaces of other
> processes bound to this parent device (even if the BIND call was for
> other mdevs).

Yes, if wrapdrive does nothing in this PASID setting procedure in kernel
space, I think this security risk definitely exists.

> The wrap driver has to mediate calls to bind(), and either program the
> PASID into the device itself, or at least check that, when receiving a
> SET_PASID ioctl from a process, the given PASID was actually allocated
> to the process.

Yes, good advice, thanks.

>> 6. do some work with the hardware working unit filled by PASID on the
>> device;
>>
>> 7. VFIO_IOMMU_UNBIND;
>>
>> 8. VFIO_GROUP_UNSET_CONTAINER; --- here, it has to sleep to wait for
>> the other process to finish the work of step 6;
>>
>> 9. close(group); close(container);
>>
>> So, my idea is: is it possible to just release the parameters or
>> facilities that belong only to the current process when that process
>> shuts down the device, and release them all when the last process does
>> so? Then, as in step 8 above, we don't need to wait, or maybe wait for
>> a very long time if the other process is doing lots of work on the
>> device.
> Given that you need to notify the mediating driver of IOMMU_BIND calls
> as explained above, you could do something similar for shutdown: from
> the mdev driver, call iommu_sva_shutdown_device() only for the last mdev.

Currently, I add an API in wrapdrive to check whether it is the last
mdev; as vfio shuts down the device, it calls that API to do the check
first.

Thanks
Zaibo



Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-05-29 Thread Jean-Philippe Brucker
(If possible, please reply in plain text to the list. Reading this in a
text-only reader is confusing, because all quotes have the same level)

On 26/05/18 04:53, Xu Zaibo wrote:
> I guess there may be some misunderstandings :).
> 
> From the current patches, 'iommu_sva_device_shutdown' is called by
> 'vfio_iommu_sva_shutdown', which is mainly used by
> 'vfio_iommu_type1_detach_group', which is finally called automatically
> by a process's release of the vfio facility, or manually through
> 'VFIO_GROUP_UNSET_CONTAINER' in the process.
> 
> So, just imagine that 2 processes are working on the device with the
> IOPF feature, and the 2 do the following to enable SVA:
> 
> 1. open("/dev/vfio/vfio");
> 
> 2. open the group of the device by calling open("/dev/vfio/x"); but
> now, I think the second process will fail to open the group, because
> current VFIO thinks that one group can only be used by one process/VM.
> At present, I use mediated devices to create more groups on the parent
> device to support multiple processes;
> 
> 3. VFIO_GROUP_SET_CONTAINER;
> 
> 4. VFIO_SET_IOMMU;
> 
> 5. VFIO_IOMMU_BIND;

I have a concern regarding your driver. With mdev you can't allow
processes to program the PASID themselves, since the parent device has a
single PASID table. You lose all isolation since processes could write
any value in the PASID field and access address spaces of other
processes bound to this parent device (even if the BIND call was for
other mdevs).

The wrap driver has to mediate calls to bind(), and either program the
PASID into the device itself, or at least check that, when receiving a
SET_PASID ioctl from a process, the given PASID was actually allocated
to the process.

> 6. do some work with the hardware working unit filled by PASID on the
> device;
> 
> 7. VFIO_IOMMU_UNBIND;
> 
> 8. VFIO_GROUP_UNSET_CONTAINER; --- here, it has to sleep to wait for
> the other process to finish the work of step 6;
> 
> 9. close(group); close(container);
> 
> So, my idea is: is it possible to just release the parameters or
> facilities that belong only to the current process when that process
> shuts down the device, and release them all when the last process does
> so? Then, as in step 8 above, we don't need to wait, or maybe wait for
> a very long time if the other process is doing lots of work on the
> device.
Given that you need to notify the mediating driver of IOMMU_BIND calls
as explained above, you could do something similar for shutdown: from
the mdev driver, call iommu_sva_shutdown_device() only for the last mdev.

Thanks,
Jean
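A sketch of the check suggested above, as it might look inside the mediating (mdev parent) driver. All names here (acme_mdev, the SET_PASID path, the list of bound PASIDs) are invented; only the rule itself, reject any PASID that was not allocated through this mdev's bind(), comes from the discussion:

#include <linux/errno.h>
#include <linux/list.h>
#include <linux/mutex.h>

struct acme_pasid {
	int			pasid;		/* allocated via VFIO_IOMMU_BIND */
	struct list_head	node;
};

struct acme_mdev {
	struct mutex		pasid_lock;
	struct list_head	pasid_list;	/* PASIDs bound through this mdev */
};

/* Device-specific register write, not shown */
static int acme_hw_write_pasid(struct acme_mdev *amdev, int pasid);

/* Handler for a (hypothetical) SET_PASID request from userspace */
static int acme_mdev_set_pasid(struct acme_mdev *amdev, int pasid)
{
	struct acme_pasid *p;
	int ret = -EACCES;	/* not this process's PASID: keep isolation */

	mutex_lock(&amdev->pasid_lock);
	list_for_each_entry(p, &amdev->pasid_list, node) {
		if (p->pasid == pasid) {
			/* The PASID was bound through this mdev: program it */
			ret = acme_hw_write_pasid(amdev, pasid);
			break;
		}
	}
	mutex_unlock(&amdev->pasid_lock);
	return ret;
}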


Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-05-25 Thread Xu Zaibo

Hi,

On 2018/5/25 17:47, Jean-Philippe Brucker wrote:
> On 25/05/18 03:39, Xu Zaibo wrote:
>> Hi,
>>
>> On 2018/5/24 23:04, Jean-Philippe Brucker wrote:
>>> On 24/05/18 13:35, Xu Zaibo wrote:
>>>>> Right, sva_init() must be called once for any device that intends to
>>>>> use bind(). For the second process though, group->sva_enabled will be
>>>>> true so we won't call sva_init() again, only bind().
>>>> Well, while I create mediated devices based on one parent device to
>>>> support multiple processes (a new process will create a new
>>>> 'vfio_group' for the corresponding mediated device, and 'sva_enabled'
>>>> cannot work any more), *sva_init and *sva_shutdown in fact work on the
>>>> parent device. As a result, I only need sva initiation and shutdown on
>>>> the parent device once. So I change the two as follows:
>>>>
>>>> @@ -551,8 +565,18 @@ int iommu_sva_device_init(struct device *dev, unsigned long features,
>>>> 	if (features & ~IOMMU_SVA_FEAT_IOPF)
>>>> 		return -EINVAL;
>>>>
>>>> +	/* If already exists, do nothing */
>>>> +	mutex_lock(&dev->iommu_param->lock);
>>>> +	if (dev->iommu_param->sva_param) {
>>>> +		mutex_unlock(&dev->iommu_param->lock);
>>>> +		return 0;
>>>> +	}
>>>> +	mutex_unlock(&dev->iommu_param->lock);
>>>>
>>>> 	if (features & IOMMU_SVA_FEAT_IOPF) {
>>>> 		ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf,
>>>>
>>>> @@ -621,6 +646,14 @@ int iommu_sva_device_shutdown(struct device *dev)
>>>> 	if (!domain)
>>>> 		return -ENODEV;
>>>>
>>>> +	/* If any other process is working on the device, shutdown does nothing. */
>>>> +	mutex_lock(&dev->iommu_param->lock);
>>>> +	if (!list_empty(&dev->iommu_param->sva_param->mm_list)) {
>>>> +		mutex_unlock(&dev->iommu_param->lock);
>>>> +		return 0;
>>>> +	}
>>>> +	mutex_unlock(&dev->iommu_param->lock);
>>> I don't think iommu-sva.c is the best place for this, it's probably
>>> better to implement an intermediate layer (the mediating driver), that
>>> calls iommu_sva_device_init() and iommu_sva_device_shutdown() once. Then
>>> vfio-pci would still call these functions itself, but for mdev the
>>> mediating driver keeps a refcount of groups, and calls device_shutdown()
>>> only when freeing the last mdev.
>>>
>>> A device driver (non mdev in this example) expects to be able to free
>>> all its resources after sva_device_shutdown() returns. Imagine the
>>> mm_list isn't empty (mm_exit() is running late), and instead of waiting
>>> in unbind_dev_all() below, we return 0 immediately. Then the calling
>>> driver frees its resources, and the mm_exit callback along with private
>>> data passed to bind() disappear. If a mm_exit() is still running in
>>> parallel, then it will try to access freed data and corrupt memory. So
>>> in this function if mm_list isn't empty, the only thing we can do is
>>> wait.
>> I still don't understand why we should 'unbind_dev_all'; is it possible
>> to do an 'unbind_dev_pasid'?
> Not in sva_device_shutdown(), it needs to clean up everything. For
> example you want to physically unplug the device, or assign it to a VM.
> To prevent any leak sva_device_shutdown() needs to remove all bonds. In
> theory there shouldn't be any, since either the driver did unbind_dev(),
> or all process exited. This is a safety net.
>
>> Then we can do other things instead of waiting, which users may not
>> like. :)
> They may not like it, but it's for their own good :) At the moment we're
> waiting that:
>
> * All exit_mm() callback for this device have finished. If we don't wait
>   then the caller will free the private data passed to bind and the
>   mm_exit() callback while they are still being used.
>
> * All page requests targeting this device are dealt with. If we don't
>   wait then some requests, that are lingering in the IOMMU PRI queue,
>   may hit the next contexts bound to this device, possibly in a
>   different VM. It may not be too risky (though probably exploitable in
>   some way), but is incredibly messy.
>
> All of this is bounded in time, and normally should be over pretty fast
> unless the device driver's exit_mm() does something strange. If the
> driver did the right thing, there shouldn't be any wait here (although
> there may be one in unbind_dev() for the same reasons - prevent use
> after free).

I guess there may be some misunderstandings :).

From the current patches, 'iommu_sva_device_shutdown' is called by
'vfio_iommu_sva_shutdown', which is mainly used by
'vfio_iommu_type1_detach_group', which is finally called automatically
by a process's release of the vfio facility, or manually through
'VFIO_GROUP_UNSET_CONTAINER' in the process.

So, just imagine that 2 processes are working on the device with the
IOPF feature, and the 2 do the following to enable SVA:

1. open("/dev/vfio/vfio");

2. open the group of the device by calling open("/dev/vfio/x"); but
now, I think the second process will fail to open the group, because
current VFIO thinks that one group can only be used by one process/VM.
At present, I use mediated devices to create more groups on the parent
device to support multiple processes;

3. VFIO_GROUP_SET_CONTAINER;

4. VFIO_SET_IOMMU;

5. VFIO_IOMMU_BIND;

6. do some work with the hardware working unit filled by PASID on the
device;

7. VFIO_IOMMU_UNBIND;

8. VFIO_GROUP_UNSET_CONTAINER; --- here, it has to sleep to wait for
the other process to finish the work of step 6;

9. close(group); close(container);

So, my idea is: is it possible to just release the parameters or
facilities that belong only to the current process when that process
shuts down the device, and release them all when the last process does
so? Then, as in step 8 above, we don't need to wait, or maybe wait for
a very long time if the other process is doing lots of work on the
device.

Thanks,
Zaibo

Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-05-25 Thread Jean-Philippe Brucker
On 25/05/18 03:39, Xu Zaibo wrote:
> Hi,
> 
> On 2018/5/24 23:04, Jean-Philippe Brucker wrote:
>> On 24/05/18 13:35, Xu Zaibo wrote:
>>>> Right, sva_init() must be called once for any device that intends to
>>>> use bind(). For the second process though, group->sva_enabled will be
>>>> true so we won't call sva_init() again, only bind().
>>> Well, while I create mediated devices based on one parent device to
>>> support multiple processes (a new process will create a new
>>> 'vfio_group' for the corresponding mediated device, and 'sva_enabled'
>>> cannot work any more), *sva_init and *sva_shutdown in fact work on the
>>> parent device. As a result, I only need sva initiation and shutdown on
>>> the parent device once. So I change the two as follows:
>>>
>>> @@ -551,8 +565,18 @@ int iommu_sva_device_init(struct device *dev, unsigned long features,
>>> 	if (features & ~IOMMU_SVA_FEAT_IOPF)
>>> 		return -EINVAL;
>>>
>>> +	/* If already exists, do nothing */
>>> +	mutex_lock(&dev->iommu_param->lock);
>>> +	if (dev->iommu_param->sva_param) {
>>> +		mutex_unlock(&dev->iommu_param->lock);
>>> +		return 0;
>>> +	}
>>> +	mutex_unlock(&dev->iommu_param->lock);
>>>
>>> 	if (features & IOMMU_SVA_FEAT_IOPF) {
>>> 		ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf,
>>>
>>> @@ -621,6 +646,14 @@ int iommu_sva_device_shutdown(struct device *dev)
>>> 	if (!domain)
>>> 		return -ENODEV;
>>>
>>> +	/* If any other process is working on the device, shutdown does nothing. */
>>> +	mutex_lock(&dev->iommu_param->lock);
>>> +	if (!list_empty(&dev->iommu_param->sva_param->mm_list)) {
>>> +		mutex_unlock(&dev->iommu_param->lock);
>>> +		return 0;
>>> +	}
>>> +	mutex_unlock(&dev->iommu_param->lock);
>> I don't think iommu-sva.c is the best place for this, it's probably
>> better to implement an intermediate layer (the mediating driver), that
>> calls iommu_sva_device_init() and iommu_sva_device_shutdown() once. Then
>> vfio-pci would still call these functions itself, but for mdev the
>> mediating driver keeps a refcount of groups, and calls device_shutdown()
>> only when freeing the last mdev.
>>
>> A device driver (non mdev in this example) expects to be able to free
>> all its resources after sva_device_shutdown() returns. Imagine the
>> mm_list isn't empty (mm_exit() is running late), and instead of waiting
>> in unbind_dev_all() below, we return 0 immediately. Then the calling
>> driver frees its resources, and the mm_exit callback along with private
>> data passed to bind() disappear. If a mm_exit() is still running in
>> parallel, then it will try to access freed data and corrupt memory. So
>> in this function if mm_list isn't empty, the only thing we can do is wait.
>>
> I still don't understand why we should 'unbind_dev_all'; is it possible
> to do an 'unbind_dev_pasid'?

Not in sva_device_shutdown(), it needs to clean up everything. For
example you want to physically unplug the device, or assign it to a VM.
To prevent any leak sva_device_shutdown() needs to remove all bonds. In
theory there shouldn't be any, since either the driver did unbind_dev(),
or all process exited. This is a safety net.

> Then we can do other things instead of waiting, which users may not like. :)

They may not like it, but it's for their own good :) At the moment we're
waiting that:

* All exit_mm() callback for this device have finished. If we don't wait
  then the caller will free the private data passed to bind and the
  mm_exit() callback while they are still being used.

* All page requests targeting this device are dealt with. If we don't
  wait then some requests, that are lingering in the IOMMU PRI queue,
  may hit the next contexts bound to this device, possibly in a
  different VM. It may not be too risky (though probably exploitable in
  some way), but is incredibly messy.

All of this is bounded in time, and normally should be over pretty fast
unless the device driver's exit_mm() does something strange. If the
driver did the right thing, there shouldn't be any wait here (although
there may be one in unbind_dev() for the same reasons - prevent use
after free).

Thanks,
Jean
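A rough sketch of the resulting behaviour in shutdown, waiting until the last bond is gone. The wait queue and the helper are invented; the series implements this differently, but the constraint is the same:

#include <linux/list.h>
#include <linux/wait.h>

/* Invented helper: block until every bond on this device is gone */
static void sva_wait_for_bonds(struct iommu_sva_param *param)
{
	/*
	 * Two things may still be in flight: mm_exit() callbacks using the
	 * private data passed to bind(), and page requests sitting in the
	 * IOMMU PRI queue. Returning early would let the driver free data
	 * that is still in use.
	 */
	wait_event(param->wq, list_empty(&param->mm_list));
}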


Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-05-24 Thread Xu Zaibo

Hi,

On 2018/5/24 23:04, Jean-Philippe Brucker wrote:
> On 24/05/18 13:35, Xu Zaibo wrote:
>>> Right, sva_init() must be called once for any device that intends to
>>> use bind(). For the second process though, group->sva_enabled will be
>>> true so we won't call sva_init() again, only bind().
>> Well, while I create mediated devices based on one parent device to
>> support multiple processes (a new process will create a new
>> 'vfio_group' for the corresponding mediated device, and 'sva_enabled'
>> cannot work any more), *sva_init and *sva_shutdown in fact work on the
>> parent device. As a result, I only need sva initiation and shutdown on
>> the parent device once. So I change the two as follows:
>>
>> @@ -551,8 +565,18 @@ int iommu_sva_device_init(struct device *dev, unsigned long features,
>> 	if (features & ~IOMMU_SVA_FEAT_IOPF)
>> 		return -EINVAL;
>>
>> +	/* If already exists, do nothing */
>> +	mutex_lock(&dev->iommu_param->lock);
>> +	if (dev->iommu_param->sva_param) {
>> +		mutex_unlock(&dev->iommu_param->lock);
>> +		return 0;
>> +	}
>> +	mutex_unlock(&dev->iommu_param->lock);
>>
>> 	if (features & IOMMU_SVA_FEAT_IOPF) {
>> 		ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf,
>>
>> @@ -621,6 +646,14 @@ int iommu_sva_device_shutdown(struct device *dev)
>> 	if (!domain)
>> 		return -ENODEV;
>>
>> +	/* If any other process is working on the device, shutdown does nothing. */
>> +	mutex_lock(&dev->iommu_param->lock);
>> +	if (!list_empty(&dev->iommu_param->sva_param->mm_list)) {
>> +		mutex_unlock(&dev->iommu_param->lock);
>> +		return 0;
>> +	}
>> +	mutex_unlock(&dev->iommu_param->lock);
> I don't think iommu-sva.c is the best place for this, it's probably
> better to implement an intermediate layer (the mediating driver), that
> calls iommu_sva_device_init() and iommu_sva_device_shutdown() once. Then
> vfio-pci would still call these functions itself, but for mdev the
> mediating driver keeps a refcount of groups, and calls device_shutdown()
> only when freeing the last mdev.
>
> A device driver (non mdev in this example) expects to be able to free
> all its resources after sva_device_shutdown() returns. Imagine the
> mm_list isn't empty (mm_exit() is running late), and instead of waiting
> in unbind_dev_all() below, we return 0 immediately. Then the calling
> driver frees its resources, and the mm_exit callback along with private
> data passed to bind() disappear. If a mm_exit() is still running in
> parallel, then it will try to access freed data and corrupt memory. So
> in this function if mm_list isn't empty, the only thing we can do is wait.

I still don't understand why we should 'unbind_dev_all'; is it possible
to do an 'unbind_dev_pasid'?

Then we can do other things instead of waiting, which users may not like. :)

Thanks
Zaibo

>> +
>> 	__iommu_sva_unbind_dev_all(dev);
>>
>> 	mutex_lock(&dev->iommu_param->lock);
>>
>> I add the above two checks in both *sva_init and *sva_shutdown, and it
>> is working now, but I don't know if it will cause any new problems.
>> What's more, I doubt whether it is reasonable to check this to avoid
>> repeated operations in VFIO; maybe it is better to check in the IOMMU. :)






Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-05-24 Thread Jean-Philippe Brucker
On 24/05/18 13:35, Xu Zaibo wrote:
>> Right, sva_init() must be called once for any device that intends to use
>> bind(). For the second process though, group->sva_enabled will be true
>> so we won't call sva_init() again, only bind().
> 
> Well, while I create mediated devices based on one parent device to
> support multiple processes (a new process will create a new 'vfio_group'
> for the corresponding mediated device, and 'sva_enabled' cannot work any
> more), *sva_init and *sva_shutdown in fact work on the parent device.
> As a result, I only need sva initiation and shutdown on the parent
> device once. So I change the two as follows:
> 
> @@ -551,8 +565,18 @@ int iommu_sva_device_init(struct device *dev, unsigned long features,
> 	if (features & ~IOMMU_SVA_FEAT_IOPF)
> 		return -EINVAL;
> 
> +	/* If already exists, do nothing */
> +	mutex_lock(&dev->iommu_param->lock);
> +	if (dev->iommu_param->sva_param) {
> +		mutex_unlock(&dev->iommu_param->lock);
> +		return 0;
> +	}
> +	mutex_unlock(&dev->iommu_param->lock);
> 
> 	if (features & IOMMU_SVA_FEAT_IOPF) {
> 		ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf,
> 
> @@ -621,6 +646,14 @@ int iommu_sva_device_shutdown(struct device *dev)
> 	if (!domain)
> 		return -ENODEV;
> 
> +	/* If any other process is working on the device, shutdown does nothing. */
> +	mutex_lock(&dev->iommu_param->lock);
> +	if (!list_empty(&dev->iommu_param->sva_param->mm_list)) {
> +		mutex_unlock(&dev->iommu_param->lock);
> +		return 0;
> +	}
> +	mutex_unlock(&dev->iommu_param->lock);

I don't think iommu-sva.c is the best place for this, it's probably
better to implement an intermediate layer (the mediating driver), that
calls iommu_sva_device_init() and iommu_sva_device_shutdown() once. Then
vfio-pci would still call these functions itself, but for mdev the
mediating driver keeps a refcount of groups, and calls device_shutdown()
only when freeing the last mdev.

A device driver (non mdev in this example) expects to be able to free
all its resources after sva_device_shutdown() returns. Imagine the
mm_list isn't empty (mm_exit() is running late), and instead of waiting
in unbind_dev_all() below, we return 0 immediately. Then the calling
driver frees its resources, and the mm_exit callback along with private
data passed to bind() disappear. If a mm_exit() is still running in
parallel, then it will try to access freed data and corrupt memory. So
in this function if mm_list isn't empty, the only thing we can do is wait.

Thanks,
Jean

> +
> 	__iommu_sva_unbind_dev_all(dev);
> 
> 	mutex_lock(&dev->iommu_param->lock);
> 
> I add the above two checks in both *sva_init and *sva_shutdown, and it
> is working now, but I don't know if it will cause any new problems.
> What's more, I doubt whether it is reasonable to check this to avoid
> repeated operations in VFIO; maybe it is better to check in the IOMMU. :)



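A sketch of the refcounting suggested above, in a mediating driver, so that iommu_sva_device_init() and iommu_sva_device_shutdown() from this series run on the parent device only for the first and last mdev. The parent structure is illustrative, and the mm_exit handler is elided (NULL):

#include <linux/iommu.h>
#include <linux/mutex.h>

struct acme_parent {
	struct device	*dev;		/* parent PCI device */
	struct mutex	lock;
	unsigned int	mdev_count;	/* live mdevs on this parent */
};

static int acme_mdev_create(struct acme_parent *parent)
{
	int ret = 0;

	mutex_lock(&parent->lock);
	if (!parent->mdev_count++) {
		/* First mdev: enable SVA on the parent once */
		ret = iommu_sva_device_init(parent->dev, IOMMU_SVA_FEAT_IOPF,
					    0, NULL);
		if (ret)
			parent->mdev_count--;
	}
	mutex_unlock(&parent->lock);
	return ret;
}

static void acme_mdev_remove(struct acme_parent *parent)
{
	mutex_lock(&parent->lock);
	/* Last mdev gone: shut SVA down on the parent */
	if (!--parent->mdev_count)
		iommu_sva_device_shutdown(parent->dev);
	mutex_unlock(&parent->lock);
}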


Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-05-24 Thread Xu Zaibo



On 2018/5/24 19:44, Jean-Philippe Brucker wrote:

> Hi,
>
> On 23/05/18 10:38, Xu Zaibo wrote:
>>> +static int vfio_iommu_bind_group(struct vfio_iommu *iommu,
>>> +				 struct vfio_group *group,
>>> +				 struct vfio_mm *vfio_mm)
>>> +{
>>> +	int ret;
>>> +	bool enabled_sva = false;
>>> +	struct vfio_iommu_sva_bind_data data = {
>>> +		.vfio_mm	= vfio_mm,
>>> +		.iommu		= iommu,
>>> +		.count		= 0,
>>> +	};
>>> +
>>> +	if (!group->sva_enabled) {
>>> +		ret = iommu_group_for_each_dev(group->iommu_group, NULL,
>>> +					       vfio_iommu_sva_init);
>> Do we need to do *sva_init here, or do anything to avoid repeated
>> initiation? If another process has already done the initiation on this
>> device, I think the current process will get an EEXIST.
> Right, sva_init() must be called once for any device that intends to use
> bind(). For the second process though, group->sva_enabled will be true
> so we won't call sva_init() again, only bind().

Well, while I create mediated devices based on one parent device to
support multiple processes (a new process will create a new 'vfio_group'
for the corresponding mediated device, and 'sva_enabled' cannot work any
more), *sva_init and *sva_shutdown in fact work on the parent device.
As a result, I only need sva initiation and shutdown on the parent
device once. So I change the two as follows:

@@ -551,8 +565,18 @@ int iommu_sva_device_init(struct device *dev, unsigned long features,
 	if (features & ~IOMMU_SVA_FEAT_IOPF)
 		return -EINVAL;

+	/* If already exists, do nothing */
+	mutex_lock(&dev->iommu_param->lock);
+	if (dev->iommu_param->sva_param) {
+		mutex_unlock(&dev->iommu_param->lock);
+		return 0;
+	}
+	mutex_unlock(&dev->iommu_param->lock);

 	if (features & IOMMU_SVA_FEAT_IOPF) {
 		ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf,

@@ -621,6 +646,14 @@ int iommu_sva_device_shutdown(struct device *dev)
 	if (!domain)
 		return -ENODEV;

+	/* If any other process is working on the device, shutdown does nothing. */
+	mutex_lock(&dev->iommu_param->lock);
+	if (!list_empty(&dev->iommu_param->sva_param->mm_list)) {
+		mutex_unlock(&dev->iommu_param->lock);
+		return 0;
+	}
+	mutex_unlock(&dev->iommu_param->lock);
+
 	__iommu_sva_unbind_dev_all(dev);

 	mutex_lock(&dev->iommu_param->lock);

I add the above two checks in both *sva_init and *sva_shutdown, and it
is working now, but I don't know if it will cause any new problems.
What's more, I doubt whether it is reasonable to check this to avoid
repeated operations in VFIO; maybe it is better to check in the IOMMU. :)

Thanks
Zaibo




Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-05-24 Thread Jean-Philippe Brucker
Hi,

On 23/05/18 10:38, Xu Zaibo wrote:
>> +static int vfio_iommu_bind_group(struct vfio_iommu *iommu,
>> +				 struct vfio_group *group,
>> +				 struct vfio_mm *vfio_mm)
>> +{
>> +	int ret;
>> +	bool enabled_sva = false;
>> +	struct vfio_iommu_sva_bind_data data = {
>> +		.vfio_mm	= vfio_mm,
>> +		.iommu		= iommu,
>> +		.count		= 0,
>> +	};
>> +
>> +	if (!group->sva_enabled) {
>> +		ret = iommu_group_for_each_dev(group->iommu_group, NULL,
>> +					       vfio_iommu_sva_init);
> Do we need to do *sva_init here, or do anything to avoid repeated
> initiation? If another process has already done the initiation on this
> device, I think the current process will get an EEXIST.

Right, sva_init() must be called once for any device that intends to use
bind(). For the second process though, group->sva_enabled will be true
so we won't call sva_init() again, only bind().

Thanks,
Jean

Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-05-23 Thread Xu Zaibo

Hi,

On 2018/5/12 3:06, Jean-Philippe Brucker wrote:

> Add two new ioctls for VFIO containers. VFIO_IOMMU_BIND_PROCESS creates a
> bond between a container and a process address space, identified by a
> Process Address Space ID (PASID). Devices in the container append this
> PASID to DMA transactions in order to access the process' address space.
> The process page tables are shared with the IOMMU, and mechanisms such as
> PCI ATS/PRI are used to handle faults. VFIO_IOMMU_UNBIND_PROCESS removes a
> bond created with VFIO_IOMMU_BIND_PROCESS.
>
> Signed-off-by: Jean-Philippe Brucker
>
> +static int vfio_iommu_bind_group(struct vfio_iommu *iommu,
> +				 struct vfio_group *group,
> +				 struct vfio_mm *vfio_mm)
> +{
> +	int ret;
> +	bool enabled_sva = false;
> +	struct vfio_iommu_sva_bind_data data = {
> +		.vfio_mm	= vfio_mm,
> +		.iommu		= iommu,
> +		.count		= 0,
> +	};
> +
> +	if (!group->sva_enabled) {
> +		ret = iommu_group_for_each_dev(group->iommu_group, NULL,
> +					       vfio_iommu_sva_init);

Do we need to do *sva_init here, or do anything to avoid repeated
initiation? If another process has already done the initiation on this
device, I think the current process will get an EEXIST.

Thanks.

> +		if (ret)
> +			return ret;
> +
> +		group->sva_enabled = enabled_sva = true;
> +	}
> +
> +	ret = iommu_group_for_each_dev(group->iommu_group, &data,
> +				       vfio_iommu_sva_bind_dev);
> +	if (ret && data.count > 1)
> +		iommu_group_for_each_dev(group->iommu_group, vfio_mm,
> +					 vfio_iommu_sva_unbind_dev);
> +	if (ret && enabled_sva) {
> +		iommu_group_for_each_dev(group->iommu_group, NULL,
> +					 vfio_iommu_sva_shutdown);
> +		group->sva_enabled = false;
> +	}
> +
> +	return ret;
> +}
>
> @@ -1442,6 +1636,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> 	if (ret)
> 		goto out_detach;
>
> +	ret = vfio_iommu_replay_bind(iommu, group);
> +	if (ret)
> +		goto out_detach;
> +
> 	if (resv_msi) {
> 		ret = iommu_get_msi_cookie(domain->domain, resv_msi_base);
> 		if (ret)
> @@ -1547,6 +1745,11 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
> 			continue;
>
> 		iommu_detach_group(domain->domain, iommu_group);
> +		if (group->sva_enabled) {
> +			iommu_group_for_each_dev(iommu_group, NULL,
> +						 vfio_iommu_sva_shutdown);
> +			group->sva_enabled = false;
> +		}

Why shut down here? If another process is working on the device, there
may be a crash?

Thanks.

> 		list_del(&group->next);
> 		kfree(group);
> 		/*
> @@ -1562,6 +1765,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,






Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-05-21 Thread Jean-Philippe Brucker
On 17/05/18 16:58, Jonathan Cameron wrote:
>> +static int vfio_iommu_bind_group(struct vfio_iommu *iommu,
>> +				 struct vfio_group *group,
>> +				 struct vfio_mm *vfio_mm)
>> +{
>> +	int ret;
>> +	bool enabled_sva = false;
>> +	struct vfio_iommu_sva_bind_data data = {
>> +		.vfio_mm	= vfio_mm,
>> +		.iommu		= iommu,
>> +		.count		= 0,
>> +	};
>> +
>> +	if (!group->sva_enabled) {
>> +		ret = iommu_group_for_each_dev(group->iommu_group, NULL,
>> +					       vfio_iommu_sva_init);
>> +		if (ret)
>> +			return ret;
>> +
>> +		group->sva_enabled = enabled_sva = true;
>> +	}
>> +
>> +	ret = iommu_group_for_each_dev(group->iommu_group, &data,
>> +				       vfio_iommu_sva_bind_dev);
>> +	if (ret && data.count > 1)
> 
> Are we safe to run this even if data.count == 1?  I assume that at
> that point we would always not have an iommu domain associated with the
> device so the initial check would error out.

If data.count == 1, then the first bind didn't succeed. But it's not
necessarily a domain missing, failure to bind may come from various
places. If this vfio_mm was already bound to another device then it
contains a valid PASID and it's safe to call unbind(). Otherwise we call
it with PASID -1 (VFIO_INVALID_PASID) and that's a bit of a grey area.
-1 is currently invalid everywhere, but in the future someone might
implement 32 bits of PASIDs, in which case a bond between this dev and
PASID -1 might exist. But I think it's safe for now, and whoever
redefines VFIO_INVALID_PASID when such implementation appears will also
fix this case.

> Just be nice to get rid of the special casing in this error path as then
> could just do it all under if (ret) and make it visually clearer these
> are different aspects of the same error path.

[...]
>>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  			   unsigned int cmd, unsigned long arg)
>>  {
>> @@ -1728,6 +2097,44 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  
>>  		return copy_to_user((void __user *)arg, &unmap, minsz) ?
>>  			-EFAULT : 0;
>> +
>> +	} else if (cmd == VFIO_IOMMU_BIND) {
>> +		struct vfio_iommu_type1_bind bind;
>> +
>> +		minsz = offsetofend(struct vfio_iommu_type1_bind, flags);
>> +
>> +		if (copy_from_user(&bind, (void __user *)arg, minsz))
>> +			return -EFAULT;
>> +
>> +		if (bind.argsz < minsz)
>> +			return -EINVAL;
>> +
>> +		switch (bind.flags) {
>> +		case VFIO_IOMMU_BIND_PROCESS:
>> +			return vfio_iommu_type1_bind_process(iommu, (void *)arg,
>> +							     &bind);
> 
> Can we combine these two blocks given it is only this case statement that is 
> different?

That would be nicer, though I don't know yet what's needed for vSVA (by
Yi Liu on Cc), which will add statements to the switches.

Thanks,
Jean

> 
>> +		default:
>> +			return -EINVAL;
>> +		}
>> +
>> +	} else if (cmd == VFIO_IOMMU_UNBIND) {
>> +		struct vfio_iommu_type1_bind bind;
>> +
>> +		minsz = offsetofend(struct vfio_iommu_type1_bind, flags);
>> +
>> +		if (copy_from_user(&bind, (void __user *)arg, minsz))
>> +			return -EFAULT;
>> +
>> +		if (bind.argsz < minsz)
>> +			return -EINVAL;
>> +
>> +		switch (bind.flags) {
>> +		case VFIO_IOMMU_BIND_PROCESS:
>> +			return vfio_iommu_type1_unbind_process(iommu, (void *)arg,
>> +							       &bind);
>> +		default:
>> +			return -EINVAL;
>> +		}
>>  	}


Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-05-17 Thread Jonathan Cameron
On Fri, 11 May 2018 20:06:14 +0100
Jean-Philippe Brucker  wrote:

> Add two new ioctls for VFIO containers. VFIO_IOMMU_BIND_PROCESS creates a
> bond between a container and a process address space, identified by a
> Process Address Space ID (PASID). Devices in the container append this
> PASID to DMA transactions in order to access the process' address space.
> The process page tables are shared with the IOMMU, and mechanisms such as
> PCI ATS/PRI are used to handle faults. VFIO_IOMMU_UNBIND_PROCESS removes a
> bond created with VFIO_IOMMU_BIND_PROCESS.
> 
> Signed-off-by: Jean-Philippe Brucker 

A couple of small comments inline..

Jonathan

> 
> ---
> v1->v2:
> * Simplify mm_exit
> * Can be built as module again
> ---
>  drivers/vfio/vfio_iommu_type1.c | 449 ++--
>  include/uapi/linux/vfio.h   |  76 ++
>  2 files changed, 504 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 5c212bf29640..2902774062b8 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -30,6 +30,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -60,6 +61,7 @@ MODULE_PARM_DESC(disable_hugepages,
>  
>  struct vfio_iommu {
>  	struct list_head	domain_list;
> +	struct list_head	mm_list;
>  	struct vfio_domain	*external_domain; /* domain for external user */
>  	struct mutex		lock;
>  	struct rb_root		dma_list;
> @@ -90,6 +92,14 @@ struct vfio_dma {
>  struct vfio_group {
>  	struct iommu_group	*iommu_group;
>  	struct list_head	next;
> +	bool			sva_enabled;
> +};
> +
> +struct vfio_mm {
> +#define VFIO_PASID_INVALID	(-1)
> +	int			pasid;
> +	struct mm_struct	*mm;
> +	struct list_head	next;
>  };
>  
>  /*
> @@ -1238,6 +1248,164 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
>   return 0;
>  }
>  
> +static int vfio_iommu_mm_exit(struct device *dev, int pasid, void *data)
> +{
> +	struct vfio_mm *vfio_mm;
> +	struct vfio_iommu *iommu = data;
> +
> +	mutex_lock(&iommu->lock);
> +	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
> +		if (vfio_mm->pasid == pasid) {
> +			list_del(&vfio_mm->next);
> +			kfree(vfio_mm);
> +			break;
> +		}
> +	}
> +	mutex_unlock(&iommu->lock);
> +
> +	return 0;
> +}
> +
> +static int vfio_iommu_sva_init(struct device *dev, void *data)
> +{
> + return iommu_sva_device_init(dev, IOMMU_SVA_FEAT_IOPF, 0,
> +  vfio_iommu_mm_exit);
> +}
> +
> +static int vfio_iommu_sva_shutdown(struct device *dev, void *data)
> +{
> + iommu_sva_device_shutdown(dev);
> +
> + return 0;
> +}
> +
> +struct vfio_iommu_sva_bind_data {
> + struct vfio_mm *vfio_mm;
> + struct vfio_iommu *iommu;
> + int count;
> +};
> +
> +static int vfio_iommu_sva_bind_dev(struct device *dev, void *data)
> +{
> +	struct vfio_iommu_sva_bind_data *bind_data = data;
> +
> +	/* Multi-device groups aren't supported for SVA */
> +	if (bind_data->count++)
> +		return -EINVAL;
> +
> +	return __iommu_sva_bind_device(dev, bind_data->vfio_mm->mm,
> +				       &bind_data->vfio_mm->pasid,
> +				       IOMMU_SVA_FEAT_IOPF, bind_data->iommu);
> +}
> +
> +static int vfio_iommu_sva_unbind_dev(struct device *dev, void *data)
> +{
> + struct vfio_mm *vfio_mm = data;
> +
> + return __iommu_sva_unbind_device(dev, vfio_mm->pasid);
> +}
> +
> +static int vfio_iommu_bind_group(struct vfio_iommu *iommu,
> +				 struct vfio_group *group,
> +				 struct vfio_mm *vfio_mm)
> +{
> +	int ret;
> +	bool enabled_sva = false;
> +	struct vfio_iommu_sva_bind_data data = {
> +		.vfio_mm	= vfio_mm,
> +		.iommu		= iommu,
> +		.count		= 0,
> +	};
> +
> +	if (!group->sva_enabled) {
> +		ret = iommu_group_for_each_dev(group->iommu_group, NULL,
> +					       vfio_iommu_sva_init);
> +		if (ret)
> +			return ret;
> +
> +		group->sva_enabled = enabled_sva = true;
> +	}
> +
> +	ret = iommu_group_for_each_dev(group->iommu_group, &data,
> +				       vfio_iommu_sva_bind_dev);
> +	if (ret && data.count > 1)

Are we safe to run this even if data.count == 1?  I assume that at
that point we would always not have an iommu domain associated with the
device so the initial check would error out.

Just be nice to get rid of the special casing in this error path as then
could just do it all under if (ret) and make it visually clearer these
are different aspects of the same error path.

[PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-05-11 Thread Jean-Philippe Brucker
Add two new ioctls for VFIO containers. VFIO_IOMMU_BIND_PROCESS creates a
bond between a container and a process address space, identified by a
Process Address Space ID (PASID). Devices in the container append this
PASID to DMA transactions in order to access the process' address space.
The process page tables are shared with the IOMMU, and mechanisms such as
PCI ATS/PRI are used to handle faults. VFIO_IOMMU_UNBIND_PROCESS removes a
bond created with VFIO_IOMMU_BIND_PROCESS.

Signed-off-by: Jean-Philippe Brucker 

---
v1->v2:
* Simplify mm_exit
* Can be built as module again
---
 drivers/vfio/vfio_iommu_type1.c | 449 ++--
 include/uapi/linux/vfio.h   |  76 ++
 2 files changed, 504 insertions(+), 21 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5c212bf29640..2902774062b8 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -60,6 +61,7 @@ MODULE_PARM_DESC(disable_hugepages,
 
 struct vfio_iommu {
 	struct list_head	domain_list;
+	struct list_head	mm_list;
 	struct vfio_domain	*external_domain; /* domain for external user */
 	struct mutex		lock;
 	struct rb_root		dma_list;
@@ -90,6 +92,14 @@ struct vfio_dma {
 struct vfio_group {
 	struct iommu_group	*iommu_group;
 	struct list_head	next;
+	bool			sva_enabled;
+};
+
+struct vfio_mm {
+#define VFIO_PASID_INVALID	(-1)
+	int			pasid;
+	struct mm_struct	*mm;
+	struct list_head	next;
 };
 
 /*
@@ -1238,6 +1248,164 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
return 0;
 }
 
+static int vfio_iommu_mm_exit(struct device *dev, int pasid, void *data)
+{
+	struct vfio_mm *vfio_mm;
+	struct vfio_iommu *iommu = data;
+
+	mutex_lock(&iommu->lock);
+	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
+		if (vfio_mm->pasid == pasid) {
+			list_del(&vfio_mm->next);
+			kfree(vfio_mm);
+			break;
+		}
+	}
+	mutex_unlock(&iommu->lock);
+
+	return 0;
+}
+
+static int vfio_iommu_sva_init(struct device *dev, void *data)
+{
+	return iommu_sva_device_init(dev, IOMMU_SVA_FEAT_IOPF, 0,
+				     vfio_iommu_mm_exit);
+}
+
+static int vfio_iommu_sva_shutdown(struct device *dev, void *data)
+{
+	iommu_sva_device_shutdown(dev);
+
+	return 0;
+}
+
+struct vfio_iommu_sva_bind_data {
+   struct vfio_mm *vfio_mm;
+   struct vfio_iommu *iommu;
+   int count;
+};
+
+static int vfio_iommu_sva_bind_dev(struct device *dev, void *data)
+{
+	struct vfio_iommu_sva_bind_data *bind_data = data;
+
+	/* Multi-device groups aren't supported for SVA */
+	if (bind_data->count++)
+		return -EINVAL;
+
+	return __iommu_sva_bind_device(dev, bind_data->vfio_mm->mm,
+				       &bind_data->vfio_mm->pasid,
+				       IOMMU_SVA_FEAT_IOPF, bind_data->iommu);
+}
+
+static int vfio_iommu_sva_unbind_dev(struct device *dev, void *data)
+{
+   struct vfio_mm *vfio_mm = data;
+
+   return __iommu_sva_unbind_device(dev, vfio_mm->pasid);
+}
+
+static int vfio_iommu_bind_group(struct vfio_iommu *iommu,
+				 struct vfio_group *group,
+				 struct vfio_mm *vfio_mm)
+{
+	int ret;
+	bool enabled_sva = false;
+	struct vfio_iommu_sva_bind_data data = {
+		.vfio_mm	= vfio_mm,
+		.iommu		= iommu,
+		.count		= 0,
+	};
+
+	if (!group->sva_enabled) {
+		ret = iommu_group_for_each_dev(group->iommu_group, NULL,
+					       vfio_iommu_sva_init);
+		if (ret)
+			return ret;
+
+		group->sva_enabled = enabled_sva = true;
+	}
+
+	ret = iommu_group_for_each_dev(group->iommu_group, &data,
+				       vfio_iommu_sva_bind_dev);
+	if (ret && data.count > 1)
+		iommu_group_for_each_dev(group->iommu_group, vfio_mm,
+					 vfio_iommu_sva_unbind_dev);
+	if (ret && enabled_sva) {
+		iommu_group_for_each_dev(group->iommu_group, NULL,
+					 vfio_iommu_sva_shutdown);
+		group->sva_enabled = false;
+	}
+
+	return ret;
+}
+
+static void vfio_iommu_unbind_group(struct vfio_group *group,
+				    struct vfio_mm *vfio_mm)
+{
+	iommu_group_for_each_dev(group->iommu_group, vfio_mm,
+				 vfio_iommu_sva_unbind_dev);
+}
+