Re: [PATCH 03/37] iommu/sva: Manage process address spaces

2018-02-28 Thread Lu Baolu
Hi Jean,

On 02/13/2018 02:33 AM, Jean-Philippe Brucker wrote:
> Introduce boilerplate code for allocating IOMMU mm structures and binding
> them to devices. Four operations are added to IOMMU drivers:
>
> * mm_alloc(): to create an io_mm structure and perform architecture-
>   specific operations required to grab the process (for instance on ARM,
>   pin down the CPU ASID so that the process doesn't get assigned a new
>   ASID on rollover).
>
>   There is a single valid io_mm structure per Linux mm. Future extensions
>   may also use io_mm for kernel-managed address spaces, populated with
>   map()/unmap() calls instead of bound to process address spaces. This
>   patch focuses on "shared" io_mm.
>
> * mm_attach(): attach an mm to a device. The IOMMU driver checks that the
>   device is capable of sharing an address space, and writes the PASID
>   table entry to install the pgd.
>
>   Some IOMMU drivers will have a single PASID table per domain, for
>   convenience. Other can implement it differently but to help these
>   drivers, mm_attach and mm_detach take 'attach_domain' and
>   'detach_domain' parameters, that tell whether they need to set and clear
>   the PASID entry or only send the required TLB invalidations.
>
> * mm_detach(): detach an mm from a device. The IOMMU driver removes the
>   PASID table entry and invalidates the IOTLBs.
>
> * mm_free(): free a structure allocated by mm_alloc(), and let arch
>   release the process.
>
> mm_attach and mm_detach operations are serialized with a spinlock. At the
> moment it is global, but if we try to optimize it, the core should at
> least prevent concurrent attach()/detach() on the same domain (so
> multi-level PASID table code can allocate tables lazily). mm_alloc() can
> sleep, but mm_free must not (because we'll have to call it from call_srcu
> later on.)
>
> At the moment we use an IDR for allocating PASIDs and retrieving contexts.
> We also use a single spinlock. These can be refined and optimized later (a
> custom allocator will be needed for top-down PASID allocation).
>
> Keeping track of address spaces requires the use of MMU notifiers.
> Handling process exit with regard to unbind() is tricky, so it is left for
> another patch and we explicitly fail mm_alloc() for the moment.
>
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/iommu-sva.c | 382 
> +-
>  drivers/iommu/iommu.c |   2 +
>  include/linux/iommu.h |  25 +++
>  3 files changed, 406 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 593685d891bf..f9af9d66b3ed 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -7,11 +7,321 @@
>   * SPDX-License-Identifier: GPL-2.0
>   */
>  
> +#include 
>  #include 
> +#include 
> +#include 
> +
> +/**
> + * DOC: io_mm model
> + *
> + * The io_mm keeps track of process address spaces shared between CPU and 
> IOMMU.
> + * The following example illustrates the relation between structures
> + * iommu_domain, io_mm and iommu_bond. An iommu_bond is a link between io_mm 
> and
> + * device. A device can have multiple io_mm and an io_mm may be bound to
> + * multiple devices.
> + *  ___
> + * |  IOMMU domain A   |
> + * |   |
> + * | |  IOMMU group   |+--- io_pgtables
> + * | |||
> + * | |   dev 00:00.0 +--- bond --- io_mm X
> + * | ||   \|
> + * |   '- bond ---.
> + * |___|   \
> + *  ___ \
> + * |  IOMMU domain B   |   io_mm Y
> + * |   |   / /
> + * | |  IOMMU group   ||  / /
> + * | ||| / /
> + * | |   dev 00:01.0  bond -' /
> + * | |   dev 00:01.1  bond --'
> + * | |||
> + * |   +--- io_pgtables
> + * |___|
> + *
> + * In this example, device 00:00.0 is in domain A, devices 00:01.* are in 
> domain
> + * B. All devices within the same domain access the same address spaces. 
> Device
> + * 00:00.0 accesses address spaces X and Y, each corresponding to an 
> mm_struct.
> + * Devices 00:01.* only access address space Y. In addition each
> + * IOMMU_DOMAIN_DMA domain has a private address space, io_pgtable, that is
> + * managed with iommu_map()/iommu_unmap(), and isn't shared with the CPU MMU.
> + *
> + * To obtain the above configuration, users would for instance issue the
> + * following calls:
> + *
> + * iommu_sva_bind_device(dev 00:00.0, mm 

Re: Question on IOMMU indentity mapping for intel-iommu

2018-02-28 Thread Alexander Duyck
On Wed, Feb 28, 2018 at 7:15 PM, Tian, Kevin  wrote:
>> From: David Woodhouse
>> Sent: Tuesday, February 27, 2018 3:48 PM
>>
>> On Mon, 2018-02-26 at 15:01 -0800, Alexander Duyck wrote:
>> > I am interested in adding a new memory mapping option that
>> > establishes
>> > one identity-mapped region for all DMA_TO_DEVICE mappings and
>> creates
>> > a new dynamic mapping for any DMA_FROM_DEVICE and
>> DMA_BIDIRECTIONAL
>> > mappings. My thought is it should allow for a compromise between
>> > security and performance (in the case of networking) in that many of
>> > the server NICs drivers these days are running with mostly pinned or
>> > resused paged for Rx. By using an identity mapping for the Tx packets
>
> Rx packets?

Yes, Rx packets. In the case of drivers like ixgbevf the driver has a
page reuse mechanism in place that allows it to reuse the same page
multiple times for each Rx buffer. As a result we often end up
processing millions of packets and don't have to allocate and/or
map/unmap a page while doing so. As such we normally don't have to
invalidate or create any new mappings in the IOMMU for those packets
which greatly reduces the overhead.

>> > we should be able to significantly cut down on the IOMMU overhead for
>> > the device. The other advantage if this works is that we could use
>> > this to possibly do something like dirty page tracking in the case of
>> > a emulated version of the IOMMU.
>
> I didn't quite get this part. dirty-page tracking on emulated IOMMU
> doesn't rely on the changes that you are proposing - just monitor
> invalidation requests and then count all changed pages as dirty.
> Of course it would be overkill since even RO mapping changes are
> also counted. In that manner your proposal is still an optimization
> instead of a must to enable dirty-page tracking, correct?

Yes this is meant to be an optimization. One of the theoretical issues
with trying to use the IOMMU for dirty page tracking is that we are
currently having to update a paravirtual IOMMU to create and
invalidate mappings for every packet. With that being the case then
the argument could be make that we might as well just be using a
paravirtual network interface since many of the benefits of SR-IOV are
likely lost.

I feel that the design of drivers like ixgbevf already solves half the
issue since we almost never map/unmap Rx buffers. If we identity
mapped the Tx packets then that is the other half of the issue solved.
Essentially what you end up with is that the SR-IOV device, in this
case a VF running under something like ixgbevf, has a means to
provided the hypervisor with a list of pages that the device is
writing to so they could theoretically either be avoided for the live
migration, or at least we could keep track of them so that when the
device is either halted or invalidates the mapping we could then mark
the pages as dirty and migrate them.

The added bonus with all this is that we could probably use it as a
half-way point between the current iommu=pt solution that is often
used to get performance and the standard setup which gives you a bit
more in the way of security. At least with this we would probably see
just about the same throughput for most NICs, but we would have the
added security of a device that is unable to write outside of where we
had permitted it to.

>> >
>> > I was originally thinking I could get away with just reusing the
>> > identity mapping code but it looks like that would end up merging
>> > everything into one domain if I am understanding correctly. Do I have
>> > that right?
>> >
>> > Would I be correct in assuming that I will need to have a separate
>> > domain per device, each domain containing the 1 TO_DEVICE identity
>> > mapped region, and then whatever other mappings are needed to handle
>> > the FROM and BIDIRECTIONAL mappings?
>>
>> In the normal model where we explicitly map every RX and TX buffer, you
>> have a domain device anyway; that's not a new requirement for your
>> model.
>>
>> It sounds like an interesting idea; I agree that it's a reasonable
>> compromise between security and performance. The device can *read* all
>> of memory, but it can't write anywhere that isn't explicitly mapped.
>>
>> In addition, we're mapping buffers for RX some time in advance of them
>> being needed (replenishing the tail of the RX ring), where the latency
>> of the map operation hopefully shouldn't be quite so much of an issue.
>> While packets for TX can go straight to the device with no latency.
>> Overall, I think it might work really well.
>>
>> You don't want the existing identity mapping code; that will give you a
>> RW mapping which you don't want — you really do want read-only or this
>> whole exercise is pointless, right? And you're right, it would have put
>> the domain into the single identity domain.
>>
>> You could probably start by mocking this up with the IOMMU API. Create
>> a domain with the 1:1 read-only mapping of all memory, add your 

RE: Question on IOMMU indentity mapping for intel-iommu

2018-02-28 Thread Tian, Kevin
> From: David Woodhouse
> Sent: Tuesday, February 27, 2018 3:48 PM
> 
> On Mon, 2018-02-26 at 15:01 -0800, Alexander Duyck wrote:
> > I am interested in adding a new memory mapping option that
> > establishes
> > one identity-mapped region for all DMA_TO_DEVICE mappings and
> creates
> > a new dynamic mapping for any DMA_FROM_DEVICE and
> DMA_BIDIRECTIONAL
> > mappings. My thought is it should allow for a compromise between
> > security and performance (in the case of networking) in that many of
> > the server NICs drivers these days are running with mostly pinned or
> > resused paged for Rx. By using an identity mapping for the Tx packets

Rx packets?

> > we should be able to significantly cut down on the IOMMU overhead for
> > the device. The other advantage if this works is that we could use
> > this to possibly do something like dirty page tracking in the case of
> > a emulated version of the IOMMU.

I didn't quite get this part. dirty-page tracking on emulated IOMMU
doesn't rely on the changes that you are proposing - just monitor
invalidation requests and then count all changed pages as dirty.
Of course it would be overkill since even RO mapping changes are
also counted. In that manner your proposal is still an optimization
instead of a must to enable dirty-page tracking, correct?

> >
> > I was originally thinking I could get away with just reusing the
> > identity mapping code but it looks like that would end up merging
> > everything into one domain if I am understanding correctly. Do I have
> > that right?
> >
> > Would I be correct in assuming that I will need to have a separate
> > domain per device, each domain containing the 1 TO_DEVICE identity
> > mapped region, and then whatever other mappings are needed to handle
> > the FROM and BIDIRECTIONAL mappings?
> 
> In the normal model where we explicitly map every RX and TX buffer, you
> have a domain device anyway; that's not a new requirement for your
> model.
> 
> It sounds like an interesting idea; I agree that it's a reasonable
> compromise between security and performance. The device can *read* all
> of memory, but it can't write anywhere that isn't explicitly mapped.
> 
> In addition, we're mapping buffers for RX some time in advance of them
> being needed (replenishing the tail of the RX ring), where the latency
> of the map operation hopefully shouldn't be quite so much of an issue.
> While packets for TX can go straight to the device with no latency.
> Overall, I think it might work really well.
> 
> You don't want the existing identity mapping code; that will give you a
> RW mapping which you don't want — you really do want read-only or this
> whole exercise is pointless, right? And you're right, it would have put
> the domain into the single identity domain.
> 
> You could probably start by mocking this up with the IOMMU API. Create
> a domain with the 1:1 read-only mapping of all memory, add your device
> to it, and then do your writeable mappings on top (at IOVAs higher than
> the top of physical memory). That's probably a quick way to assess
> performance and prove the concept (although you don't get deferred
> unmap of RX packets that way, which might mess things up a bit).

Curious question. Do you think above could be a final solution or just
a proof-of-concept? If the latter, will the cleaner option be to allow the
device binding to multiple domains e.g. RO & RW which are selected
based on DMA mapping direction (not sure whether it's an intrusive 
change to iommu driver which today likely assumes single binding)?

> 
> When we expose this through the DMA API, I'd quite like this *not* to
> be Intel-specific. It could reasonably live in a higher layer and be
> usable with all kinds of IOMMU implementations.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

RE: [PATCH 02/37] iommu/sva: Bind process address spaces to devices

2018-02-28 Thread Liu, Yi L
Hi Jean,

> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Thursday, February 15, 2018 8:41 PM
> Subject: Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
> 
> On 13/02/18 23:34, Tian, Kevin wrote:
> >> From: Jean-Philippe Brucker
> >> Sent: Tuesday, February 13, 2018 8:57 PM
> >>
> >> On 13/02/18 07:54, Tian, Kevin wrote:
>  From: Jean-Philippe Brucker
>  Sent: Tuesday, February 13, 2018 2:33 AM
> 
>  Add bind() and unbind() operations to the IOMMU API. Device drivers
> >> can
>  use them to share process page tables with their devices.
>  bind_group() is provided for VFIO's convenience, as it needs to
>  provide a coherent interface on containers. Other device drivers
>  will most likely want to use bind_device(), which binds a single device 
>  in the
> group.
> >>>
> >>> I saw your bind_group implementation tries to bind the address space
> >>> for all devices within a group, which IMO has some problem. Based on
> >> PCIe
> >>> spec, packet routing on the bus doesn't take PASID into consideration.
> >>> since devices within same group cannot be isolated based on
> >>> requestor-
> >> ID
> >>> i.e. traffic not guaranteed going to IOMMU, enabling SVA on multiple
> >> devices
> >>> could cause undesired p2p.
> >> But so does enabling "classic" DMA... If two devices are not
> >> protected by ACS for example, they are put in the same IOMMU group,
> >> and one device might be able to snoop the other's DMA. VFIO allows
> >> userspace to create a container for them and use MAP/UNMAP, but makes
> >> it explicit to the user that for DMA, these devices are not isolated
> >> and must be considered as a single device (you can't pass them to
> >> different VMs or put them in different containers). So I tried to
> >> keep the same idea as MAP/UNMAP for SVA, performing BIND/UNBIND
> >> operations on the VFIO container instead of the device.
> >
> > there is a small difference. for classic DMA we can reserve PCI BARs
> > when allocating IOVA, thus multiple devices in the same group can
> > still work correctly applied with same translation, if isolation is
> > not cared in between. However for SVA it's CPU virtual addresses
> > managed by kernel mm thus difficult to introduce similar address
> > reservation. Then it's possible for a VA falling into other device's
> > BAR in the same group and cause undesired p2p traffic. In such regard,
> > SVA is actually functionally-broken.
> 
> I think the problem exists even if there is a single device in the group.
> If for example, malloc() returns a VA that corresponds to a PCI host bridge 
> in IOVA
> space, performing DMA on that buffer won't reach the IOMMU and will cause
> undesirable side-effects.

If only a single device in a group, should it mean there is ACS support in
the path from this device to root complex? It means any memory request
from this device would be upstreamed to root complex, thus it should be
able to avoid undesired p2p traffics. So I intend to believe, even we do
bind in group level, we actually expect to make it work only for the case
where a single device within a group.

Thanks,
Yi Liu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 08/13] iommu/rockchip: Control clocks needed to access the IOMMU

2018-02-28 Thread JeffyChen

Hi Robin,

Thanks for your reply.

On 02/28/2018 11:06 PM, Robin Murphy wrote:

On 28/02/18 13:00, JeffyChen wrote:

Hi Robin,

Thanks for your reply.

On 02/28/2018 12:59 AM, Robin Murphy wrote:

the rockchip IOMMU is part of the master block in hardware, so it
needs
to control the master's power domain and some of the master's clocks
when access it's registers.

and the number of clocks needed here, might be different between each
IOMMUs(according to which master block it belongs), it's a little like
our power domain:
https://elixir.free-electrons.com/linux/latest/source/arch/arm64/boot/dts/rockchip/rk3399.dtsi#L935




i'm not sure how to describe this correctly, is it ok use something
like
"the same as it's master block"?


would it make sense to add a property to specify the master who owns
the iommu, and we can get all clocks(only some of those clocks are
actually needed) from it in the of_xlate()? and we can also reuse the
clock-names of that master to build clk_bulk_data and log errors in
clk_bulk_get.


I'm inclined to agree with Rob here - if we're to add anything to the
binding, it should only be whatever clock inputs are defined for the
IOMMU IP block itself. If Linux doesn't properly handle the interconnect
clock hierarchy external to a particular integration, that's a separate
issue and it's not the binding's problem.

I actually quite like the hack of "borrowing" the clocks from
dev->of_node in of_xlate() - you shouldn't need any DT changes for that,
because you already know that each IOMMU instance only has the one
master device anyway.


Thanks:) but actually we are going to support sharing IOMMU between
multiple masters(one of them is the main master i think) in the newer
chips(not yet supported on upstream kernel)...


Ha! OK, fair enough, back to the first point then...


So we might have to get all clocks from all masters, or find a way to
specify the main master...and for the multiple masters case, do it in
of_xlate() turns out to be a little racy...maybe we can add a property
to specify main master, and get it's clocks in probe()?


I notice that the 4.4 BSP kernel consistently specifies "aclk" and
"hclk" for the IOMMU instances - it feels unusual to say "why don't we
follow the downstream binding?", but it does look a lot like what I
would expect (I'd guess at one for the register slave interface and one
for the master interface/general operation?)

huh, right.
i did noticed that, but there's a hevc_mmu with ("aclk", "hclk", 
"clk_core", "clk_cabac") confused me.


so confirmed with Simon, that hevc_mmu is wrong. currently all IOMMUs 
should only have 2 clks, either aclk+hclk or aclk+pclk (depends on the 
clk tree)


so it seems to be a good idea to do so, will send patches soon, thanks :)


If we can implement conceptually-correct clock handling based on an
accurate binding, which should cover most cases, and *then* look at
hacking around those where it doesn't quite work in practice due to
shortcomings elsewhere, that would be ideal, and of course a lot nicer
than just jumping straight into piles of hacks.

Robin.






___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices

2018-02-28 Thread Sinan Kaya
On 2/12/2018 1:33 PM, Jean-Philippe Brucker wrote:
> +int iommu_sva_unbind_group(struct iommu_group *group, int pasid)
> +{
> + struct group_device *device;
> +
> + mutex_lock(>mutex);
> + list_for_each_entry(device, >devices, list)
> + iommu_sva_unbind_device(device->dev, pasid);
> + mutex_unlock(>mutex);
> +
> + return 0;
> +}

I think we should handle the errors returned by iommu_sva_unbind_device() here
or at least print a warning if we want to still continue unbinding. 

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[GIT PULL] dma-mapping fixe for Linux 4.16

2018-02-28 Thread Christoph Hellwig
The following changes since commit a638af00b27266c09ab7ac69141e6f4ac6c00eff:

  Merge tag 'usb-4.16-rc3' of 
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb (2018-02-22 12:13:01 
-0800)

are available in the git repository at:

  git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-4.16-3

for you to fetch changes up to af1da686843750809738c01e153320106e890804:

  dma-debug: fix memory leak in debug_dma_alloc_coherent (2018-02-22 15:02:33 
-0800)


A single fix for a memory leak regression in the dma-debug code.


Miles Chen (1):
  dma-debug: fix memory leak in debug_dma_alloc_coherent

 lib/dma-debug.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 37/37] vfio: Add support for Shared Virtual Addressing

2018-02-28 Thread Jean-Philippe Brucker
On 28/02/18 01:26, Sinan Kaya wrote:
[...]
>> +static int vfio_iommu_sva_init(struct device *dev, void *data)
>> +{
> 
> data is not getting used.

That's the pointer passed to "iommu_group_for_each_dev", NULL at the
moment. Next version of this patch will keep some state in data to
ensure one device per group.

>> +
>> +int ret;
>> +
>> +ret = iommu_sva_device_init(dev, IOMMU_SVA_FEAT_PASID |
>> +IOMMU_SVA_FEAT_IOPF, 0);
>> +if (ret)
>> +return ret;
>> +
>> +return iommu_register_mm_exit_handler(dev, vfio_iommu_mm_exit);
>> +}
>> +
>> +static int vfio_iommu_sva_shutdown(struct device *dev, void *data)
>> +{
>> +iommu_sva_device_shutdown(dev);
>> +iommu_unregister_mm_exit_handler(dev);
>> +
>> +return 0;
>> +}
>> +
>> +static int vfio_iommu_bind_group(struct vfio_iommu *iommu,
>> + struct vfio_group *group,
>> + struct vfio_mm *vfio_mm)
>> +{
>> +int ret;
>> +int pasid;
>> +
>> +if (!group->sva_enabled) {
>> +ret = iommu_group_for_each_dev(group->iommu_group, NULL,
>> +   vfio_iommu_sva_init);
>> +if (ret)
>> +return ret;
>> +
>> +group->sva_enabled = true;
>> +}
>> +
>> +ret = iommu_sva_bind_group(group->iommu_group, vfio_mm->mm, ,
>> +   IOMMU_SVA_FEAT_PASID | IOMMU_SVA_FEAT_IOPF,
>> +   vfio_mm);
>> +if (ret)
>> +return ret;
> 
> don't you need to clean up the work done by vfio_iommu_sva_init() here.

Yes I suppose we can, if we enabled during this bind

[...]
>> +static long vfio_iommu_type1_bind_process(struct vfio_iommu *iommu,
>> +  void __user *arg,
>> +  struct vfio_iommu_type1_bind *bind)
>> +{
>> +struct vfio_iommu_type1_bind_process params;
>> +struct vfio_domain *domain;
>> +struct vfio_group *group;
>> +struct vfio_mm *vfio_mm;
>> +struct mm_struct *mm;
>> +unsigned long minsz;
>> +int ret = 0;
>> +
>> +minsz = sizeof(*bind) + sizeof(params);
>> +if (bind->argsz < minsz)
>> +return -EINVAL;
>> +
>> +arg += sizeof(*bind);
>> +if (copy_from_user(, arg, sizeof(params)))
>> +return -EFAULT;
>> +
>> +if (params.flags & ~VFIO_IOMMU_BIND_PID)
>> +return -EINVAL;
>> +
>> +if (params.flags & VFIO_IOMMU_BIND_PID) {
>> +mm = vfio_iommu_get_mm_by_vpid(params.pid);
>> +if (IS_ERR(mm))
>> +return PTR_ERR(mm);
>> +} else {
>> +mm = get_task_mm(current);
>> +if (!mm)
>> +return -EINVAL;
>> +}
> 
> I think you can merge mm failure in both states.

Yes, I think vfio_iommu_get_mm_by_vpid could return NULL instead of an
error pointer, and we can throw -ESRCH in all cases (the existing
get_task_mm() failure in this driver does return -ESRCH, so it would be
consistent.)

[...]
>> +/*
>> + * We can't simply unbind a foreign process by PASID, because the
>> + * process might have died and the PASID might have been reallocated to
>> + * another process. Instead we need to fetch that process mm by PID
>> + * again to make sure we remove the right vfio_mm. In addition, holding
>> + * the mm guarantees that mm_users isn't dropped while we unbind and the
>> + * exit_mm handler doesn't fire. While not strictly necessary, not
>> + * having to care about that race simplifies everyone's life.
>> + */
>> +if (params.flags & VFIO_IOMMU_BIND_PID) {
>> +mm = vfio_iommu_get_mm_by_vpid(params.pid);
>> +if (IS_ERR(mm))
>> +return PTR_ERR(mm);
>> +} else {
>> +mm = get_task_mm(current);
>> +if (!mm)
>> +return -EINVAL;
>> +}
>> +
> 
> I think you can merge mm failure in both states.

ok

>> +ret = -ESRCH;
>> +mutex_lock(>lock);
>> +list_for_each_entry(vfio_mm, >mm_list, next) {
>> +if (vfio_mm->mm != mm)
>> +continue;
>> +
> 
> these loops look wierd 
> 1. for loops + break 
> 2. for loops + goto
> 
> how about closing the for loop here. and then return here if not vfio_mm
> not found.

ok

>> +vfio_iommu_unbind(iommu, vfio_mm);
>> +list_del(_mm->next);
>> +kfree(vfio_mm);
>> +ret = 0;
>> +break;
>> +}
>> +mutex_unlock(>lock);
>> +mmput(mm);
>> +
>> +return ret;
>> +}
>> +
> 

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 16/37] iommu: Add generic PASID table library

2018-02-28 Thread Jean-Philippe Brucker
On 27/02/18 18:51, Jacob Pan wrote:
[...]
>> +struct iommu_pasid_table_ops *
>> +iommu_alloc_pasid_ops(enum iommu_pasid_table_fmt fmt,
>> +  struct iommu_pasid_table_cfg *cfg, void
>> *cookie) +{
> I guess you don't need to pass in cookie here.

The cookie is stored in the table driver and passed back to the IOMMU
driver when invalidating a PASID table entry

>> +struct iommu_pasid_table *table;
>> +const struct iommu_pasid_init_fns *fns;
>> +
>> +if (fmt >= PASID_TABLE_NUM_FMTS)
>> +return NULL;
>> +
>> +fns = pasid_table_init_fns[fmt];
>> +if (!fns)
>> +return NULL;
>> +
>> +table = fns->alloc(cfg, cookie);
>> +if (!table)
>> +return NULL;
>> +
>> +table->fmt = fmt;
>> +table->cookie = cookie;
>> +table->cfg = *cfg;
>> +
> the ops is already IOMMU model specific, why do you need to pass cfg
> back?

The table driver needs some config information at runtime. Callbacks such
as iommu_pasid_table_ops::alloc_shared_entry() receive the
iommu_pasid_table_ops instance as argument. They can then get the
iommu_pasid_table structure with container_of() and retrieve the config
stored in table->cfg.

>> +return >ops;
> If there is no common code that uses these ops, I don't see the benefit
> of having these APIs. Or the plan is to consolidate even further such
> that referene to pasid table can be attached at per iommu_domain etc,
> but that would be model specific choice.

I don't plan to consolidate further. This API is for multiple IOMMU
drivers with different transports implementing the same PASID table
formats. For example my vSVA implementation uses this API in virtio-iommu
for assigning PASID tables to the guest (All fairly experimental at this
point. I initially intended to assign just the page directories, but
passing the whole PASID table seemed more popular.)

In the future there might be other vendor IOMMUs implementing the same
PASID table formats, just like there are currently 6 IOMMU drivers using
the page-table code implemented by the io-pgtable.c lib (which I copied in
this patch).

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 01/37] iommu: Introduce Shared Virtual Addressing API

2018-02-28 Thread Jean-Philippe Brucker
On 27/02/18 06:21, Tian, Kevin wrote:
[...]
>> Technically nothing prevents it, but now the resv problem discussed on
>> patch 2/37 stands out. For example on x86 you'd probably need to carve
>> the
>> IOAPIC MSI range out of the process address space. On Arm you'd need to
>> create a write-only mapping for MSIs (IOMMU translates it to the IRQ chip
>> address, but thankfully accessing the doorbell from CPU side doesn't
>> trigger an MSI.)
> 
> so if overlap already exists when binding a process address space
> (since binding may happen much later than creating the process),
> I assume the call will simply fail since carve out at this point is not
> possible?

Yes in this case I think it's safer to abort the bind() call

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/1] iommu/vt-d: Fix a potential memory leak

2018-02-28 Thread Jacob Pan
On Sat, 24 Feb 2018 13:42:27 +0800
Lu Baolu  wrote:

> A memory block was allocated in intel_svm_bind_mm() but never freed
> in a failure path. This patch fixes this by free it to avoid memory
> leakage.
> 
looks good to me.

Thanks,
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc:  # v4.4+
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel-svm.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 35a408d..3d4b924 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -396,6 +396,7 @@ int intel_svm_bind_mm(struct device *dev, int
> *pasid, int flags, struct svm_dev_ pasid_max - 1, GFP_KERNEL);
>   if (ret < 0) {
>   kfree(svm);
> + kfree(sdev);
>   goto out;
>   }
>   svm->pasid = ret;

[Jacob Pan]
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 08/13] iommu/rockchip: Control clocks needed to access the IOMMU

2018-02-28 Thread Robin Murphy

On 28/02/18 13:00, JeffyChen wrote:

Hi Robin,

Thanks for your reply.

On 02/28/2018 12:59 AM, Robin Murphy wrote:

the rockchip IOMMU is part of the master block in hardware, so it needs
to control the master's power domain and some of the master's clocks
when access it's registers.

and the number of clocks needed here, might be different between each
IOMMUs(according to which master block it belongs), it's a little like
our power domain:
https://elixir.free-electrons.com/linux/latest/source/arch/arm64/boot/dts/rockchip/rk3399.dtsi#L935 





i'm not sure how to describe this correctly, is it ok use something 
like

"the same as it's master block"?


would it make sense to add a property to specify the master who owns
the iommu, and we can get all clocks(only some of those clocks are
actually needed) from it in the of_xlate()? and we can also reuse the
clock-names of that master to build clk_bulk_data and log errors in
clk_bulk_get.


I'm inclined to agree with Rob here - if we're to add anything to the
binding, it should only be whatever clock inputs are defined for the
IOMMU IP block itself. If Linux doesn't properly handle the interconnect
clock hierarchy external to a particular integration, that's a separate
issue and it's not the binding's problem.

I actually quite like the hack of "borrowing" the clocks from
dev->of_node in of_xlate() - you shouldn't need any DT changes for that,
because you already know that each IOMMU instance only has the one
master device anyway.


Thanks:) but actually we are going to support sharing IOMMU between 
multiple masters(one of them is the main master i think) in the newer 
chips(not yet supported on upstream kernel)...


Ha! OK, fair enough, back to the first point then...

So we might have to get all clocks from all masters, or find a way to 
specify the main master...and for the multiple masters case, do it in 
of_xlate() turns out to be a little racy...maybe we can add a property 
to specify main master, and get it's clocks in probe()?


I notice that the 4.4 BSP kernel consistently specifies "aclk" and 
"hclk" for the IOMMU instances - it feels unusual to say "why don't we 
follow the downstream binding?", but it does look a lot like what I 
would expect (I'd guess at one for the register slave interface and one 
for the master interface/general operation?)


If we can implement conceptually-correct clock handling based on an 
accurate binding, which should cover most cases, and *then* look at 
hacking around those where it doesn't quite work in practice due to 
shortcomings elsewhere, that would be ideal, and of course a lot nicer 
than just jumping straight into piles of hacks.


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 08/13] iommu/rockchip: Control clocks needed to access the IOMMU

2018-02-28 Thread JeffyChen

Hi Robin,

Thanks for your reply.

On 02/28/2018 12:59 AM, Robin Murphy wrote:

the rockchip IOMMU is part of the master block in hardware, so it needs
to control the master's power domain and some of the master's clocks
when access it's registers.

and the number of clocks needed here, might be different between each
IOMMUs(according to which master block it belongs), it's a little like
our power domain:
https://elixir.free-electrons.com/linux/latest/source/arch/arm64/boot/dts/rockchip/rk3399.dtsi#L935



i'm not sure how to describe this correctly, is it ok use something like
"the same as it's master block"?


would it make sense to add a property to specify the master who owns
the iommu, and we can get all clocks(only some of those clocks are
actually needed) from it in the of_xlate()? and we can also reuse the
clock-names of that master to build clk_bulk_data and log errors in
clk_bulk_get.


I'm inclined to agree with Rob here - if we're to add anything to the
binding, it should only be whatever clock inputs are defined for the
IOMMU IP block itself. If Linux doesn't properly handle the interconnect
clock hierarchy external to a particular integration, that's a separate
issue and it's not the binding's problem.

I actually quite like the hack of "borrowing" the clocks from
dev->of_node in of_xlate() - you shouldn't need any DT changes for that,
because you already know that each IOMMU instance only has the one
master device anyway.


Thanks:) but actually we are going to support sharing IOMMU between 
multiple masters(one of them is the main master i think) in the newer 
chips(not yet supported on upstream kernel)...


So we might have to get all clocks from all masters, or find a way to 
specify the main master...and for the multiple masters case, do it in 
of_xlate() turns out to be a little racy...maybe we can add a property 
to specify main master, and get it's clocks in probe()?




Robin.



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu