Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices

2018-03-02 Thread Jean-Philippe Brucker
On 01/03/18 03:03, Liu, Yi L wrote:
> Hi Jean,
> 
>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
>> Sent: Thursday, February 15, 2018 8:41 PM
>> Subject: Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
>>
>> On 13/02/18 23:34, Tian, Kevin wrote:
>>>> From: Jean-Philippe Brucker
>>>> Sent: Tuesday, February 13, 2018 8:57 PM
>>>>
>>>> On 13/02/18 07:54, Tian, Kevin wrote:
>>>>>> From: Jean-Philippe Brucker
>>>>>> Sent: Tuesday, February 13, 2018 2:33 AM
>>>>>>
>>>>>> Add bind() and unbind() operations to the IOMMU API. Device drivers
>>>> can
>>>>>> use them to share process page tables with their devices.
>>>>>> bind_group() is provided for VFIO's convenience, as it needs to
>>>>>> provide a coherent interface on containers. Other device drivers
>>>>>> will most likely want to use bind_device(), which binds a single device 
>>>>>> in the
>> group.
>>>>>
>>>>> I saw your bind_group implementation tries to bind the address space
>>>>> for all devices within a group, which IMO has some problem. Based on
>>>> PCIe
>>>>> spec, packet routing on the bus doesn't take PASID into consideration.
>>>>> since devices within same group cannot be isolated based on
>>>>> requestor-
>>>> ID
>>>>> i.e. traffic not guaranteed going to IOMMU, enabling SVA on multiple
>>>> devices
>>>>> could cause undesired p2p.
>>>> But so does enabling "classic" DMA... If two devices are not
>>>> protected by ACS for example, they are put in the same IOMMU group,
>>>> and one device might be able to snoop the other's DMA. VFIO allows
>>>> userspace to create a container for them and use MAP/UNMAP, but makes
>>>> it explicit to the user that for DMA, these devices are not isolated
>>>> and must be considered as a single device (you can't pass them to
>>>> different VMs or put them in different containers). So I tried to
>>>> keep the same idea as MAP/UNMAP for SVA, performing BIND/UNBIND
>>>> operations on the VFIO container instead of the device.
>>>
>>> there is a small difference. for classic DMA we can reserve PCI BARs
>>> when allocating IOVA, thus multiple devices in the same group can
>>> still work correctly applied with same translation, if isolation is
>>> not cared in between. However for SVA it's CPU virtual addresses
>>> managed by kernel mm thus difficult to introduce similar address
>>> reservation. Then it's possible for a VA falling into other device's
>>> BAR in the same group and cause undesired p2p traffic. In such regard,
>>> SVA is actually functionally-broken.
>>
>> I think the problem exists even if there is a single device in the group.
>> If for example, malloc() returns a VA that corresponds to a PCI host bridge 
>> in IOVA
>> space, performing DMA on that buffer won't reach the IOMMU and will cause
>> undesirable side-effects.
> 
> If only a single device in a group, should it mean there is ACS support in
> the path from this device to root complex? It means any memory request
> from this device would be upstreamed to root complex, thus it should be
> able to avoid undesired p2p traffics. So I intend to believe, even we do
> bind in group level, we actually expect to make it work only for the case
> where a single device within a group.

Yes if each device has its own group then ACS is properly enabled.

Even without thinking about ACS or p2p, all memory requests don't
necessarily make it to the IOMMU. For example transactions targeting the
PCI host bridge MMIO window (marked as RESV_RESERVED by dma-iommu.c),
may get eaten by the RC and not reach the IOMMU (I'm blindly following
the code here, don't have anything in the spec to back me up). Commit
fade1ec055dc also refers to "faults, corruption and other badness"
though I don't know if that's only for PCI or could also affect future
systems.

And I don't think prefixing transactions with a PASID changes the
situation. I couldn't find anything in the PCIe spec contradicting it
and I guess it's up to the root complex implementation. So I tend to
take a conservative approach and assume that RESV_RESERVED regions will
also apply to PASID-prefixed traffic.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices

2018-03-02 Thread Jean-Philippe Brucker
On 28/02/18 20:34, Sinan Kaya wrote:
> On 2/12/2018 1:33 PM, Jean-Philippe Brucker wrote:
>> +int iommu_sva_unbind_group(struct iommu_group *group, int pasid)
>> +{
>> +struct group_device *device;
>> +
>> +mutex_lock(>mutex);
>> +list_for_each_entry(device, >devices, list)
>> +iommu_sva_unbind_device(device->dev, pasid);
>> +mutex_unlock(>mutex);
>> +
>> +return 0;
>> +}
> 
> I think we should handle the errors returned by iommu_sva_unbind_device() here
> or at least print a warning if we want to still continue unbinding. 

Agreed, though bind_group/unbind_group are probably going away in next
series

Thanks,
Jean

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 02/37] iommu/sva: Bind process address spaces to devices

2018-02-28 Thread Liu, Yi L
Hi Jean,

> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Thursday, February 15, 2018 8:41 PM
> Subject: Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
> 
> On 13/02/18 23:34, Tian, Kevin wrote:
> >> From: Jean-Philippe Brucker
> >> Sent: Tuesday, February 13, 2018 8:57 PM
> >>
> >> On 13/02/18 07:54, Tian, Kevin wrote:
> >>>> From: Jean-Philippe Brucker
> >>>> Sent: Tuesday, February 13, 2018 2:33 AM
> >>>>
> >>>> Add bind() and unbind() operations to the IOMMU API. Device drivers
> >> can
> >>>> use them to share process page tables with their devices.
> >>>> bind_group() is provided for VFIO's convenience, as it needs to
> >>>> provide a coherent interface on containers. Other device drivers
> >>>> will most likely want to use bind_device(), which binds a single device 
> >>>> in the
> group.
> >>>
> >>> I saw your bind_group implementation tries to bind the address space
> >>> for all devices within a group, which IMO has some problem. Based on
> >> PCIe
> >>> spec, packet routing on the bus doesn't take PASID into consideration.
> >>> since devices within same group cannot be isolated based on
> >>> requestor-
> >> ID
> >>> i.e. traffic not guaranteed going to IOMMU, enabling SVA on multiple
> >> devices
> >>> could cause undesired p2p.
> >> But so does enabling "classic" DMA... If two devices are not
> >> protected by ACS for example, they are put in the same IOMMU group,
> >> and one device might be able to snoop the other's DMA. VFIO allows
> >> userspace to create a container for them and use MAP/UNMAP, but makes
> >> it explicit to the user that for DMA, these devices are not isolated
> >> and must be considered as a single device (you can't pass them to
> >> different VMs or put them in different containers). So I tried to
> >> keep the same idea as MAP/UNMAP for SVA, performing BIND/UNBIND
> >> operations on the VFIO container instead of the device.
> >
> > there is a small difference. for classic DMA we can reserve PCI BARs
> > when allocating IOVA, thus multiple devices in the same group can
> > still work correctly applied with same translation, if isolation is
> > not cared in between. However for SVA it's CPU virtual addresses
> > managed by kernel mm thus difficult to introduce similar address
> > reservation. Then it's possible for a VA falling into other device's
> > BAR in the same group and cause undesired p2p traffic. In such regard,
> > SVA is actually functionally-broken.
> 
> I think the problem exists even if there is a single device in the group.
> If for example, malloc() returns a VA that corresponds to a PCI host bridge 
> in IOVA
> space, performing DMA on that buffer won't reach the IOMMU and will cause
> undesirable side-effects.

If only a single device in a group, should it mean there is ACS support in
the path from this device to root complex? It means any memory request
from this device would be upstreamed to root complex, thus it should be
able to avoid undesired p2p traffics. So I intend to believe, even we do
bind in group level, we actually expect to make it work only for the case
where a single device within a group.

Thanks,
Yi Liu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices

2018-02-28 Thread Sinan Kaya
On 2/12/2018 1:33 PM, Jean-Philippe Brucker wrote:
> +int iommu_sva_unbind_group(struct iommu_group *group, int pasid)
> +{
> + struct group_device *device;
> +
> + mutex_lock(>mutex);
> + list_for_each_entry(device, >devices, list)
> + iommu_sva_unbind_device(device->dev, pasid);
> + mutex_unlock(>mutex);
> +
> + return 0;
> +}

I think we should handle the errors returned by iommu_sva_unbind_device() here
or at least print a warning if we want to still continue unbinding. 

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices

2018-02-15 Thread Jean-Philippe Brucker
On 15/02/18 10:21, j...@8bytes.org wrote:
> On Tue, Feb 13, 2018 at 12:57:23PM +, Jean-Philippe Brucker wrote:
>> * bind_device() fails if the device's group has more than one device,
>> otherwise calls __bind_device(). This prevents device drivers that are
>> oblivious to IOMMU groups from opening a backdoor.
>>
>> * bind_group() calls __bind_device() for all devices in group. This way
>> users that are aware of IOMMU groups can still use them safely. Note that
>> at the moment bind_group() fails as soon as it finds a device that doesn't
>> support SVA. Having all devices support SVA in a given group is
>> unrealistic and this behavior ought to be improved.
> 
> Yeah, so the problem on PCI is that all functions of a multi-function
> device are put into one group. For AMD-GPUs this means that the GPU
> (SVA-capable) will end up in the same group as the on-GPU sound
> device (not SVA-capable).

As I understood it ACS also isolate functions within a device, for example
the two PFs of my ixgbe card are in different groups. Strangely all VFs go
in the same group, I haven't investigated why yet.

> Before this causes us big headaches I suggest to only provide the
> bind_device() function.

Ok. I added bind_group() to make it easier for VFIO - so if one bind()
fails in the group, iommu.c can rollback and remove the bonds already
created. If we mandate a single device in the group for SVA, then VFIO can
use iommu_group_for_each_dev() and ensure that the callback was only
called once.

> This should be fine because for SVA we don't
> need all types of isolation that iommu_groups provide.
> 
> IOMMU-groups provide two types of isolation:
> 
>   1) They group devices together which the IOMMU can't distinguish
>  from each other, like PCI devices behind a PCIe bridge.
> 
>   2) Devices that can't be isolated from each other are also put
>  into the same group. This is the case for multi-function PCIe
>  devices as well as all PCIe devices behind a non-ACS bridge.
>  But all these devices cann still be distinguished by the
>  IOMMU.

But transactions don't necessarily reach the IOMMU if devices are not
isolated by ACS. So even if you disable all translation in the IOMMU for
one device in the group, it may still have a view of address spaces shared
with another device in that group.

> These two types of protection are needed to safely assign devices to
> guests, but for bare-metal SVA all we need is type 1) isolation, and
> not even that if we can assume that all SVA-capable devices have an
> exclusive device-id (or stream-id).

I'm not as optimistic that we won't need IOMMU groups with SVA devices for
2) (hardware bugs, integration issues, etc). I'd be more comfortable if we
added a sanity-check as suggested by Kevin, to ensure that SVA is
disallowed if multiple devices are in the group.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices

2018-02-15 Thread Jean-Philippe Brucker
On 13/02/18 23:34, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker
>> Sent: Tuesday, February 13, 2018 8:57 PM
>>
>> On 13/02/18 07:54, Tian, Kevin wrote:
 From: Jean-Philippe Brucker
 Sent: Tuesday, February 13, 2018 2:33 AM

 Add bind() and unbind() operations to the IOMMU API. Device drivers
>> can
 use them to share process page tables with their devices. bind_group()
 is provided for VFIO's convenience, as it needs to provide a coherent
 interface on containers. Other device drivers will most likely want to
 use bind_device(), which binds a single device in the group.
>>>
>>> I saw your bind_group implementation tries to bind the address space
>>> for all devices within a group, which IMO has some problem. Based on
>> PCIe
>>> spec, packet routing on the bus doesn't take PASID into consideration.
>>> since devices within same group cannot be isolated based on requestor-
>> ID
>>> i.e. traffic not guaranteed going to IOMMU, enabling SVA on multiple
>> devices
>>> could cause undesired p2p.
>> But so does enabling "classic" DMA... If two devices are not protected by
>> ACS for example, they are put in the same IOMMU group, and one device
>> might be able to snoop the other's DMA. VFIO allows userspace to create a
>> container for them and use MAP/UNMAP, but makes it explicit to the user
>> that for DMA, these devices are not isolated and must be considered as a
>> single device (you can't pass them to different VMs or put them in
>> different containers). So I tried to keep the same idea as MAP/UNMAP for
>> SVA, performing BIND/UNBIND operations on the VFIO container instead of
>> the device.
> 
> there is a small difference. for classic DMA we can reserve PCI BARs 
> when allocating IOVA, thus multiple devices in the same group can 
> still work correctly applied with same translation, if isolation is not
> cared in between. However for SVA it's CPU virtual addresses 
> managed by kernel mm thus difficult to introduce similar address 
> reservation. Then it's possible for a VA falling into other device's 
> BAR in the same group and cause undesired p2p traffic. In such 
> regard, SVA is actually functionally-broken.

I think the problem exists even if there is a single device in the group.
If for example, malloc() returns a VA that corresponds to a PCI host
bridge in IOVA space, performing DMA on that buffer won't reach the IOMMU
and will cause undesirable side-effects.

My series doesn't address the problem, but I believe we should carve
reserved regions out of the process address space during bind(), for
example by creating a PROT_NONE vma preventing userspace from obtaining
that VA.

If you solve this problem, you also solve it for multiple devices in a
group, because the IOMMU core provides the resv API on groups... That's
until you hotplug a device into a live group (currently WARN in VFIO),
with different resv regions.

>> I kept the analogy simple though, because I don't think there will be many
>> SVA-capable systems that require IOMMU groups. They will likely
> 
> I agree that multiple SVA-capable devices in same IOMMU group is not
> a typical configuration, especially it's usually observed on new devices.
> Then based on above limitation, I think we could just explicitly avoid
> enabling SVA in such case. :-)

I'd certainly like that :)

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices

2018-02-15 Thread Christian König

Am 15.02.2018 um 11:21 schrieb j...@8bytes.org:

On Tue, Feb 13, 2018 at 12:57:23PM +, Jean-Philippe Brucker wrote:

* bind_device() fails if the device's group has more than one device,
otherwise calls __bind_device(). This prevents device drivers that are
oblivious to IOMMU groups from opening a backdoor.

* bind_group() calls __bind_device() for all devices in group. This way
users that are aware of IOMMU groups can still use them safely. Note that
at the moment bind_group() fails as soon as it finds a device that doesn't
support SVA. Having all devices support SVA in a given group is
unrealistic and this behavior ought to be improved.

Yeah, so the problem on PCI is that all functions of a multi-function
device are put into one group. For AMD-GPUs this means that the GPU
(SVA-capable) will end up in the same group as the on-GPU sound
device (not SVA-capable).


Yeah, but SVA only applies to rather new AMD-GPUs, which in turn can 
only do PCIe and there the problem doesn't seems to exist any more.


E.g. the audio device on my Vega10 gets a separate group despite being 
behind several bridges:
0b:00.0 VGA compatible controller: Advanced Micro Devices, Inc. 
[AMD/ATI] Vega 10 [Radeon Vega Frontier Edition]

0b:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf8

...

[    6.362665] iommu: Adding device :0b:00.0 to group 14
[    6.368468] iommu: Using direct mapping for device :0b:00.0
[    6.380040] iommu: Adding device :0b:00.1 to group 15


Regards,
Christian.



Before this causes us big headaches I suggest to only provide the
bind_device() function. This should be fine because for SVA we don't
need all types of isolation that iommu_groups provide.

IOMMU-groups provide two types of isolation:

1) They group devices together which the IOMMU can't distinguish
   from each other, like PCI devices behind a PCIe bridge.

2) Devices that can't be isolated from each other are also put
   into the same group. This is the case for multi-function PCIe
   devices as well as all PCIe devices behind a non-ACS bridge.
   But all these devices cann still be distinguished by the
   IOMMU.

These two types of protection are needed to safely assign devices to
guests, but for bare-metal SVA all we need is type 1) isolation, and
not even that if we can assume that all SVA-capable devices have an
exclusive device-id (or stream-id).



Joerg



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices

2018-02-15 Thread j...@8bytes.org
On Tue, Feb 13, 2018 at 12:57:23PM +, Jean-Philippe Brucker wrote:
> * bind_device() fails if the device's group has more than one device,
> otherwise calls __bind_device(). This prevents device drivers that are
> oblivious to IOMMU groups from opening a backdoor.
> 
> * bind_group() calls __bind_device() for all devices in group. This way
> users that are aware of IOMMU groups can still use them safely. Note that
> at the moment bind_group() fails as soon as it finds a device that doesn't
> support SVA. Having all devices support SVA in a given group is
> unrealistic and this behavior ought to be improved.

Yeah, so the problem on PCI is that all functions of a multi-function
device are put into one group. For AMD-GPUs this means that the GPU
(SVA-capable) will end up in the same group as the on-GPU sound
device (not SVA-capable).

Before this causes us big headaches I suggest to only provide the
bind_device() function. This should be fine because for SVA we don't
need all types of isolation that iommu_groups provide.

IOMMU-groups provide two types of isolation:

1) They group devices together which the IOMMU can't distinguish
   from each other, like PCI devices behind a PCIe bridge.

2) Devices that can't be isolated from each other are also put
   into the same group. This is the case for multi-function PCIe
   devices as well as all PCIe devices behind a non-ACS bridge.
   But all these devices cann still be distinguished by the
   IOMMU.

These two types of protection are needed to safely assign devices to
guests, but for bare-metal SVA all we need is type 1) isolation, and
not even that if we can assume that all SVA-capable devices have an
exclusive device-id (or stream-id).



Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 02/37] iommu/sva: Bind process address spaces to devices

2018-02-13 Thread Tian, Kevin
> From: Jean-Philippe Brucker
> Sent: Tuesday, February 13, 2018 8:57 PM
> 
> On 13/02/18 07:54, Tian, Kevin wrote:
> >> From: Jean-Philippe Brucker
> >> Sent: Tuesday, February 13, 2018 2:33 AM
> >>
> >> Add bind() and unbind() operations to the IOMMU API. Device drivers
> can
> >> use them to share process page tables with their devices. bind_group()
> >> is provided for VFIO's convenience, as it needs to provide a coherent
> >> interface on containers. Other device drivers will most likely want to
> >> use bind_device(), which binds a single device in the group.
> >
> > I saw your bind_group implementation tries to bind the address space
> > for all devices within a group, which IMO has some problem. Based on
> PCIe
> > spec, packet routing on the bus doesn't take PASID into consideration.
> > since devices within same group cannot be isolated based on requestor-
> ID
> > i.e. traffic not guaranteed going to IOMMU, enabling SVA on multiple
> devices
> > could cause undesired p2p.
> But so does enabling "classic" DMA... If two devices are not protected by
> ACS for example, they are put in the same IOMMU group, and one device
> might be able to snoop the other's DMA. VFIO allows userspace to create a
> container for them and use MAP/UNMAP, but makes it explicit to the user
> that for DMA, these devices are not isolated and must be considered as a
> single device (you can't pass them to different VMs or put them in
> different containers). So I tried to keep the same idea as MAP/UNMAP for
> SVA, performing BIND/UNBIND operations on the VFIO container instead of
> the device.

there is a small difference. for classic DMA we can reserve PCI BARs 
when allocating IOVA, thus multiple devices in the same group can 
still work correctly applied with same translation, if isolation is not
cared in between. However for SVA it's CPU virtual addresses 
managed by kernel mm thus difficult to introduce similar address 
reservation. Then it's possible for a VA falling into other device's 
BAR in the same group and cause undesired p2p traffic. In such 
regard, SVA is actually functionally-broken.

> 
> I kept the analogy simple though, because I don't think there will be many
> SVA-capable systems that require IOMMU groups. They will likely

I agree that multiple SVA-capable devices in same IOMMU group is not
a typical configuration, especially it's usually observed on new devices.
Then based on above limitation, I think we could just explicitly avoid
enabling SVA in such case. :-)

> implement
> proper device isolation. Unlike iommu_attach_device(), bind_device()
> doesn't call bind_group(), because keeping bonds consistent in groups is
> complicated, not worth implementing (drivers can explicitly bind() all
> devices that need it) and probably wouldn't ever be used. I also can't
> test it. But maybe we could implement the following for now:
> 
> * bind_device() fails if the device's group has more than one device,
> otherwise calls __bind_device(). This prevents device drivers that are
> oblivious to IOMMU groups from opening a backdoor.
> 
> * bind_group() calls __bind_device() for all devices in group. This way
> users that are aware of IOMMU groups can still use them safely. Note that
> at the moment bind_group() fails as soon as it finds a device that doesn't
> support SVA. Having all devices support SVA in a given group is
> unrealistic and this behavior ought to be improved.
> 
> * hotplugging a device into a group still succeeds even if the group
> already has mm bonds. Same happens for classic DMA, a hotplugged
> device
> will have access to all mappings already present in the domain.
> 
> > If my understanding of PCIe spec is correct, probably we should fail
> > calling bind_group()/bind_device() when there are multiple devices within
> > the given group. If only one device then bind_group is essentially a
> wrapper
> > to bind_device.>>
> >> Regardless of the IOMMU group or domain a device is in, device drivers
> >> should call bind() for each device that will use the PASID.
> >>
> [...]
> >> +/**
> >> + * iommu_sva_bind_device() - Bind a process address space to a device
> >> + * @dev: the device
> >> + * @mm: the mm to bind, caller must hold a reference to it
> >> + * @pasid: valid address where the PASID will be stored
> >> + * @flags: bond properties (IOMMU_SVA_FEAT_*)
> >> + * @drvdata: private data passed to the mm exit handler
> >> + *
> >> + * Create a bond between device and task, allowing the device to access
> >> the mm
> >> + * using the returned PASID. A subsequent bind() for the same device
> and
> >> mm will
> >> + * reuse the bond (and return the same PASID), but users will have to
> call
> >> + * unbind() twice.
> >
> > what's the point of requiring unbind twice?
> 
> Mmh, that was necessary when we kept bond information as domain<-
> >mm, but
> since it's now device<->mm, we can probably remove the bond refcount. I
> consider that a bind() between a given device and mm will always be 

Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices

2018-02-13 Thread Jean-Philippe Brucker
On 13/02/18 07:54, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker
>> Sent: Tuesday, February 13, 2018 2:33 AM
>>
>> Add bind() and unbind() operations to the IOMMU API. Device drivers can
>> use them to share process page tables with their devices. bind_group()
>> is provided for VFIO's convenience, as it needs to provide a coherent
>> interface on containers. Other device drivers will most likely want to
>> use bind_device(), which binds a single device in the group.
> 
> I saw your bind_group implementation tries to bind the address space
> for all devices within a group, which IMO has some problem. Based on PCIe
> spec, packet routing on the bus doesn't take PASID into consideration. 
> since devices within same group cannot be isolated based on requestor-ID
> i.e. traffic not guaranteed going to IOMMU, enabling SVA on multiple devices
> could cause undesired p2p.
But so does enabling "classic" DMA... If two devices are not protected by
ACS for example, they are put in the same IOMMU group, and one device
might be able to snoop the other's DMA. VFIO allows userspace to create a
container for them and use MAP/UNMAP, but makes it explicit to the user
that for DMA, these devices are not isolated and must be considered as a
single device (you can't pass them to different VMs or put them in
different containers). So I tried to keep the same idea as MAP/UNMAP for
SVA, performing BIND/UNBIND operations on the VFIO container instead of
the device.

I kept the analogy simple though, because I don't think there will be many
SVA-capable systems that require IOMMU groups. They will likely implement
proper device isolation. Unlike iommu_attach_device(), bind_device()
doesn't call bind_group(), because keeping bonds consistent in groups is
complicated, not worth implementing (drivers can explicitly bind() all
devices that need it) and probably wouldn't ever be used. I also can't
test it. But maybe we could implement the following for now:

* bind_device() fails if the device's group has more than one device,
otherwise calls __bind_device(). This prevents device drivers that are
oblivious to IOMMU groups from opening a backdoor.

* bind_group() calls __bind_device() for all devices in group. This way
users that are aware of IOMMU groups can still use them safely. Note that
at the moment bind_group() fails as soon as it finds a device that doesn't
support SVA. Having all devices support SVA in a given group is
unrealistic and this behavior ought to be improved.

* hotplugging a device into a group still succeeds even if the group
already has mm bonds. Same happens for classic DMA, a hotplugged device
will have access to all mappings already present in the domain.

> If my understanding of PCIe spec is correct, probably we should fail 
> calling bind_group()/bind_device() when there are multiple devices within 
> the given group. If only one device then bind_group is essentially a wrapper
> to bind_device.>>
>> Regardless of the IOMMU group or domain a device is in, device drivers
>> should call bind() for each device that will use the PASID.
>>
[...]
>> +/**
>> + * iommu_sva_bind_device() - Bind a process address space to a device
>> + * @dev: the device
>> + * @mm: the mm to bind, caller must hold a reference to it
>> + * @pasid: valid address where the PASID will be stored
>> + * @flags: bond properties (IOMMU_SVA_FEAT_*)
>> + * @drvdata: private data passed to the mm exit handler
>> + *
>> + * Create a bond between device and task, allowing the device to access
>> the mm
>> + * using the returned PASID. A subsequent bind() for the same device and
>> mm will
>> + * reuse the bond (and return the same PASID), but users will have to call
>> + * unbind() twice.
> 
> what's the point of requiring unbind twice?

Mmh, that was necessary when we kept bond information as domain<->mm, but
since it's now device<->mm, we can probably remove the bond refcount. I
consider that a bind() between a given device and mm will always be issued
by the same driver.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 02/37] iommu/sva: Bind process address spaces to devices

2018-02-12 Thread Tian, Kevin
> From: Jean-Philippe Brucker
> Sent: Tuesday, February 13, 2018 2:33 AM
> 
> Add bind() and unbind() operations to the IOMMU API. Device drivers can
> use them to share process page tables with their devices. bind_group()
> is provided for VFIO's convenience, as it needs to provide a coherent
> interface on containers. Other device drivers will most likely want to
> use bind_device(), which binds a single device in the group.

I saw your bind_group implementation tries to bind the address space
for all devices within a group, which IMO has some problem. Based on PCIe
spec, packet routing on the bus doesn't take PASID into consideration. 
since devices within same group cannot be isolated based on requestor-ID
i.e. traffic not guaranteed going to IOMMU, enabling SVA on multiple devices
could cause undesired p2p.
 
If my understanding of PCIe spec is correct, probably we should fail 
calling bind_group()/bind_device() when there are multiple devices within 
the given group. If only one device then bind_group is essentially a wrapper
to bind_device.

> 
> Regardless of the IOMMU group or domain a device is in, device drivers
> should call bind() for each device that will use the PASID.
> 
> This patch only adds skeletons for the device driver API, most of the
> implementation is still missing.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/iommu-sva.c | 105
> ++
>  drivers/iommu/iommu.c |  63 
>  include/linux/iommu.h |  36 
>  3 files changed, 204 insertions(+)
> 
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index cab5d723520f..593685d891bf 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -9,6 +9,9 @@
> 
>  #include 
> 
> +/* TODO: stub for the fault queue. Remove later. */
> +#define iommu_fault_queue_flush(...)
> +
>  /**
>   * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a
> device
>   * @dev: the device
> @@ -78,6 +81,8 @@ int iommu_sva_device_shutdown(struct device *dev)
>   if (!domain)
>   return -ENODEV;
> 
> + __iommu_sva_unbind_dev_all(dev);
> +
>   if (domain->ops->sva_device_shutdown)
>   domain->ops->sva_device_shutdown(dev);
> 
> @@ -88,3 +93,103 @@ int iommu_sva_device_shutdown(struct device
> *dev)
>   return 0;
>  }
>  EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown);
> +
> +/**
> + * iommu_sva_bind_device() - Bind a process address space to a device
> + * @dev: the device
> + * @mm: the mm to bind, caller must hold a reference to it
> + * @pasid: valid address where the PASID will be stored
> + * @flags: bond properties (IOMMU_SVA_FEAT_*)
> + * @drvdata: private data passed to the mm exit handler
> + *
> + * Create a bond between device and task, allowing the device to access
> the mm
> + * using the returned PASID. A subsequent bind() for the same device and
> mm will
> + * reuse the bond (and return the same PASID), but users will have to call
> + * unbind() twice.

what's the point of requiring unbind twice?

> + *
> + * Callers should have taken care of setting up SVA for this device with
> + * iommu_sva_device_init() beforehand. They may also be notified of the
> bond
> + * disappearing, for example when the last task that uses the mm dies, by
> + * registering a notifier with iommu_register_mm_exit_handler().
> + *
> + * If IOMMU_SVA_FEAT_PASID is requested, a PASID is allocated and
> returned.
> + * TODO: The alternative, binding the non-PASID context to an mm, isn't
> + * supported at the moment because existing IOMMU domain types
> initialize the
> + * non-PASID context for iommu_map()/unmap() or bypass. This requires
> a new
> + * domain type.
> + *
> + * If IOMMU_SVA_FEAT_IOPF is not requested, the caller must pin down
> all
> + * mappings shared with the device. mlock() isn't sufficient, as it doesn't
> + * prevent minor page faults (e.g. copy-on-write). TODO: !IOPF isn't
> allowed at
> + * the moment.
> + *
> + * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an
> error
> + * is returned.
> + */
> +int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
> int *pasid,
> +   unsigned long flags, void *drvdata)
> +{
> + struct iommu_domain *domain;
> + struct iommu_param *dev_param = dev->iommu_param;
> +
> + domain = iommu_get_domain_for_dev(dev);
> + if (!domain)
> + return -EINVAL;
> +
> + if (!pasid)
> + return -EINVAL;
> +
> + if (!dev_param || (flags & ~dev_param->sva_features))
> + return -EINVAL;
> +
> + if (flags != (IOMMU_SVA_FEAT_PASID | IOMMU_SVA_FEAT_IOPF))
> + return -EINVAL;
> +
> + return -ENOSYS; /* TODO */
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
> +
> +/**
> + * iommu_sva_unbind_device() - Remove a bond created with
> iommu_sva_bind_device
> + * @dev: the device
>