Re: [PATCH 03/37] iommu/sva: Manage process address spaces
Hi Jean,

On 02/13/2018 02:33 AM, Jean-Philippe Brucker wrote:
> Introduce boilerplate code for allocating IOMMU mm structures and binding
> them to devices. Four operations are added to IOMMU drivers:
>
> * mm_alloc(): to create an io_mm structure and perform architecture-
>   specific operations required to grab the process (for instance on ARM,
>   pin down the CPU ASID so that the process doesn't get assigned a new
>   ASID on rollover).
>
>   There is a single valid io_mm structure per Linux mm. Future extensions
>   may also use io_mm for kernel-managed address spaces, populated with
>   map()/unmap() calls instead of bound to process address spaces. This
>   patch focuses on "shared" io_mm.
>
> * mm_attach(): attach an mm to a device. The IOMMU driver checks that the
>   device is capable of sharing an address space, and writes the PASID
>   table entry to install the pgd.
>
>   Some IOMMU drivers will have a single PASID table per domain, for
>   convenience. Others can implement it differently but, to help these
>   drivers, mm_attach and mm_detach take 'attach_domain' and
>   'detach_domain' parameters that tell whether they need to set and clear
>   the PASID entry or only send the required TLB invalidations.
>
> * mm_detach(): detach an mm from a device. The IOMMU driver removes the
>   PASID table entry and invalidates the IOTLBs.
>
> * mm_free(): free a structure allocated by mm_alloc(), and let arch
>   release the process.
>
> mm_attach and mm_detach operations are serialized with a spinlock. At the
> moment it is global, but if we try to optimize it, the core should at
> least prevent concurrent attach()/detach() on the same domain (so
> multi-level PASID table code can allocate tables lazily). mm_alloc() can
> sleep, but mm_free must not (because we'll have to call it from call_srcu
> later on).
>
> At the moment we use an IDR for allocating PASIDs and retrieving contexts.
> We also use a single spinlock.
> These can be refined and optimized later (a
> custom allocator will be needed for top-down PASID allocation).
>
> Keeping track of address spaces requires the use of MMU notifiers.
> Handling process exit with regard to unbind() is tricky, so it is left for
> another patch and we explicitly fail mm_alloc() for the moment.
>
> Signed-off-by: Jean-Philippe Brucker
> ---
>  drivers/iommu/iommu-sva.c | 382 ++++++++++++++++++++++++++++++++++++++++-
>  drivers/iommu/iommu.c     |   2 +
>  include/linux/iommu.h     |  25 +++
>  3 files changed, 406 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 593685d891bf..f9af9d66b3ed 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -7,11 +7,321 @@
>   * SPDX-License-Identifier: GPL-2.0
>   */
>
> +#include
>  #include
> +#include
> +#include
> +
> +/**
> + * DOC: io_mm model
> + *
> + * The io_mm keeps track of process address spaces shared between CPU and
> + * IOMMU. The following example illustrates the relation between structures
> + * iommu_domain, io_mm and iommu_bond. An iommu_bond is a link between io_mm
> + * and device. A device can have multiple io_mm and an io_mm may be bound to
> + * multiple devices.
> + *              ____________________________
> + *             |  IOMMU domain A            |
> + *             |  ________________          |
> + *             | |  IOMMU group   |         +------- io_pgtables
> + *             | |                |         |
> + *             | |   dev 00:00.0 ----+------- bond --- io_mm X
> + *             | |________________|   \     |
> + *             |                       '----- bond ---.
> + *             |____________________________|          \
> + *              ____________________________            \
> + *             |  IOMMU domain B            |          io_mm Y
> + *             |  ________________          |          / /
> + *             | |  IOMMU group   |         |         / /
> + *             | |                |         |        / /
> + *             | |   dev 00:01.0 ------------ bond -' /
> + *             | |   dev 00:01.1 ------------ bond --'
> + *             | |________________|          |
> + *             |                             +------- io_pgtables
> + *             |____________________________|
> + *
> + * In this example, device 00:00.0 is in domain A, devices 00:01.* are in
> + * domain B. All devices within the same domain access the same address
> + * spaces. Device 00:00.0 accesses address spaces X and Y, each corresponding
> + * to an mm_struct. Devices 00:01.* only access address space Y.
> + * In addition each
> + * IOMMU_DOMAIN_DMA domain has a private address space, io_pgtable, that is
> + * managed with iommu_map()/iommu_unmap(), and isn't shared with the CPU MMU.
> + *
> + * To obtain the above configuration, users would for instance issue the
> + * following calls:
> + *
> + *     iommu_sva_bind_device(dev 00:00.0, mm
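The allocation scheme described in the commit message above (a single io_mm per mm, PASIDs handed out by an IDR under one spinlock) can be sketched in user space. The bitmap below stands in for the kernel IDR, and all names (`io_mm_alloc`, `io_mm_free`, `users`) are illustrative rather than the patch's actual code:

```c
#include <assert.h>
#include <stdlib.h>

#define NR_PASIDS 32

/* Bitmap stands in for the kernel IDR used by the patch. */
static unsigned char pasid_used[NR_PASIDS];

struct io_mm {
    int pasid;
    int users;      /* number of bonds referencing this io_mm */
};

/* Sketch of mm_alloc(): grab a free PASID and create the io_mm. */
static struct io_mm *io_mm_alloc(void)
{
    for (int p = 0; p < NR_PASIDS; p++) {
        if (!pasid_used[p]) {
            struct io_mm *io_mm = calloc(1, sizeof(*io_mm));
            pasid_used[p] = 1;
            io_mm->pasid = p;
            io_mm->users = 1;
            return io_mm;
        }
    }
    return NULL; /* PASID space exhausted */
}

/* Sketch of mm_free(): release the PASID (must not sleep in the real driver). */
static void io_mm_free(struct io_mm *io_mm)
{
    pasid_used[io_mm->pasid] = 0;
    free(io_mm);
}
```

A freed PASID becomes immediately reusable, which is exactly why the later unbind-by-PASID discussion in this series has to re-check the mm before removing a bond.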
Re: Question on IOMMU identity mapping for intel-iommu
On Wed, Feb 28, 2018 at 7:15 PM, Tian, Kevin wrote:
>> From: David Woodhouse
>> Sent: Tuesday, February 27, 2018 3:48 PM
>>
>> On Mon, 2018-02-26 at 15:01 -0800, Alexander Duyck wrote:
>> > I am interested in adding a new memory mapping option that
>> > establishes one identity-mapped region for all DMA_TO_DEVICE
>> > mappings and creates a new dynamic mapping for any DMA_FROM_DEVICE
>> > and DMA_BIDIRECTIONAL mappings. My thought is it should allow for a
>> > compromise between security and performance (in the case of
>> > networking) in that many of the server NIC drivers these days are
>> > running with mostly pinned or reused pages for Rx. By using an
>> > identity mapping for the Tx packets
>>
>> Rx packets?

Yes, Rx packets. In the case of drivers like ixgbevf the driver has a page
reuse mechanism in place that allows it to reuse the same page multiple
times for each Rx buffer. As a result we often end up processing millions
of packets without having to allocate and/or map/unmap a page while doing
so. As such we normally don't have to invalidate or create any new mappings
in the IOMMU for those packets, which greatly reduces the overhead.

>> > we should be able to significantly cut down on the IOMMU overhead
>> > for the device. The other advantage if this works is that we could
>> > use this to possibly do something like dirty page tracking in the
>> > case of an emulated version of the IOMMU.
>
> I didn't quite get this part. dirty-page tracking on emulated IOMMU
> doesn't rely on the changes that you are proposing - just monitor
> invalidation requests and then count all changed pages as dirty.
> Of course it would be overkill since even RO mapping changes are
> also counted. In that manner your proposal is still an optimization
> instead of a must to enable dirty-page tracking, correct?

Yes, this is meant to be an optimization.
One of the theoretical issues with trying to use the IOMMU for dirty page
tracking is that we are currently having to update a paravirtual IOMMU to
create and invalidate mappings for every packet. With that being the case,
the argument could be made that we might as well just be using a
paravirtual network interface, since many of the benefits of SR-IOV are
likely lost.

I feel that the design of drivers like ixgbevf already solves half the
issue since we almost never map/unmap Rx buffers. If we identity mapped the
Tx packets then that is the other half of the issue solved. Essentially
what you end up with is that the SR-IOV device, in this case a VF running
under something like ixgbevf, has a means to provide the hypervisor with a
list of pages that the device is writing to, so they could theoretically
either be avoided for the live migration, or at least we could keep track
of them so that when the device is either halted or invalidates the mapping
we could then mark the pages as dirty and migrate them.

The added bonus with all this is that we could probably use it as a
half-way point between the current iommu=pt solution that is often used to
get performance and the standard setup which gives you a bit more in the
way of security. At least with this we would probably see just about the
same throughput for most NICs, but we would have the added security of a
device that is unable to write outside of where we had permitted it to.

>> >
>> > I was originally thinking I could get away with just reusing the
>> > identity mapping code but it looks like that would end up merging
>> > everything into one domain if I am understanding correctly. Do I have
>> > that right?
>> >
>> > Would I be correct in assuming that I will need to have a separate
>> > domain per device, each domain containing the 1 TO_DEVICE identity
>> > mapped region, and then whatever other mappings are needed to handle
>> > the FROM and BIDIRECTIONAL mappings?
>>
>> In the normal model where we explicitly map every RX and TX buffer, you
>> have a domain device anyway; that's not a new requirement for your
>> model.
>>
>> It sounds like an interesting idea; I agree that it's a reasonable
>> compromise between security and performance. The device can *read* all
>> of memory, but it can't write anywhere that isn't explicitly mapped.
>>
>> In addition, we're mapping buffers for RX some time in advance of them
>> being needed (replenishing the tail of the RX ring), where the latency
>> of the map operation hopefully shouldn't be quite so much of an issue.
>> While packets for TX can go straight to the device with no latency.
>> Overall, I think it might work really well.
>>
>> You don't want the existing identity mapping code; that will give you a
>> RW mapping which you don't want — you really do want read-only or this
>> whole exercise is pointless, right? And you're right, it would have put
>> the domain into the single identity domain.
>>
>> You could probably start by mocking this up with the IOMMU API. Create
>> a domain with the 1:1 read-only mapping of all memory, add your
RE: Question on IOMMU identity mapping for intel-iommu
> From: David Woodhouse
> Sent: Tuesday, February 27, 2018 3:48 PM
>
> On Mon, 2018-02-26 at 15:01 -0800, Alexander Duyck wrote:
> > I am interested in adding a new memory mapping option that
> > establishes one identity-mapped region for all DMA_TO_DEVICE
> > mappings and creates a new dynamic mapping for any DMA_FROM_DEVICE
> > and DMA_BIDIRECTIONAL mappings. My thought is it should allow for a
> > compromise between security and performance (in the case of
> > networking) in that many of the server NIC drivers these days are
> > running with mostly pinned or reused pages for Rx. By using an
> > identity mapping for the Tx packets
>
> Rx packets?

> > we should be able to significantly cut down on the IOMMU overhead
> > for the device. The other advantage if this works is that we could
> > use this to possibly do something like dirty page tracking in the
> > case of an emulated version of the IOMMU.

I didn't quite get this part. dirty-page tracking on emulated IOMMU
doesn't rely on the changes that you are proposing - just monitor
invalidation requests and then count all changed pages as dirty.
Of course it would be overkill since even RO mapping changes are
also counted. In that manner your proposal is still an optimization
instead of a must to enable dirty-page tracking, correct?

> >
> > I was originally thinking I could get away with just reusing the
> > identity mapping code but it looks like that would end up merging
> > everything into one domain if I am understanding correctly. Do I have
> > that right?
> >
> > Would I be correct in assuming that I will need to have a separate
> > domain per device, each domain containing the 1 TO_DEVICE identity
> > mapped region, and then whatever other mappings are needed to handle
> > the FROM and BIDIRECTIONAL mappings?
>
> In the normal model where we explicitly map every RX and TX buffer, you
> have a domain device anyway; that's not a new requirement for your
> model.
> > It sounds like an interesting idea; I agree that it's a reasonable > compromise between security and performance. The device can *read* all > of memory, but it can't write anywhere that isn't explicitly mapped. > > In addition, we're mapping buffers for RX some time in advance of them > being needed (replenishing the tail of the RX ring), where the latency > of the map operation hopefully shouldn't be quite so much of an issue. > While packets for TX can go straight to the device with no latency. > Overall, I think it might work really well. > > You don't want the existing identity mapping code; that will give you a > RW mapping which you don't want — you really do want read-only or this > whole exercise is pointless, right? And you're right, it would have put > the domain into the single identity domain. > > You could probably start by mocking this up with the IOMMU API. Create > a domain with the 1:1 read-only mapping of all memory, add your device > to it, and then do your writeable mappings on top (at IOVAs higher than > the top of physical memory). That's probably a quick way to assess > performance and prove the concept (although you don't get deferred > unmap of RX packets that way, which might mess things up a bit). Curious question. Do you think above could be a final solution or just a proof-of-concept? If the latter, will the cleaner option be to allow the device binding to multiple domains e.g. RO & RW which are selected based on DMA mapping direction (not sure whether it's an intrusive change to iommu driver which today likely assumes single binding)? > > When we expose this through the DMA API, I'd quite like this *not* to > be Intel-specific. It could reasonably live in a higher layer and be > usable with all kinds of IOMMU implementations. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
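The compromise being discussed above — a 1:1 read-only window for DMA_TO_DEVICE buffers, dynamic writeable IOVAs (above the top of physical memory) for everything else — boils down to a per-mapping policy decision. A user-space sketch of that decision, with all names and the base address purely illustrative:

```c
#include <assert.h>
#include <stdint.h>

enum dma_data_direction { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE };

/* Hypothetical start of the dynamic IOVA range, above physical memory. */
#define IOVA_DYNAMIC_BASE (1ULL << 40)

static uint64_t next_dynamic_iova = IOVA_DYNAMIC_BASE;

/*
 * Pick an IOVA for a buffer at physical address 'pa'. TO_DEVICE buffers
 * are covered by the identity read-only mapping, so the IOVA is simply
 * the physical address; writeable directions get a fresh IOVA to be
 * mapped RW in the device's domain.
 */
static uint64_t map_for_direction(uint64_t pa, uint64_t size,
                                  enum dma_data_direction dir)
{
    if (dir == DMA_TO_DEVICE)
        return pa;              /* identity RO window, no map needed */

    uint64_t iova = next_dynamic_iova;
    next_dynamic_iova += size;  /* real code would use an IOVA allocator */
    return iova;
}
```

This also makes David's point concrete: the device can read anywhere (the whole identity window), but can only write through the explicitly created dynamic mappings.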
RE: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
Hi Jean,

> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Thursday, February 15, 2018 8:41 PM
> Subject: Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
>
> On 13/02/18 23:34, Tian, Kevin wrote:
> >> From: Jean-Philippe Brucker
> >> Sent: Tuesday, February 13, 2018 8:57 PM
> >>
> >> On 13/02/18 07:54, Tian, Kevin wrote:
> >>>> From: Jean-Philippe Brucker
> >>>> Sent: Tuesday, February 13, 2018 2:33 AM
> >>>>
> >>>> Add bind() and unbind() operations to the IOMMU API. Device drivers
> >>>> can use them to share process page tables with their devices.
> >>>> bind_group() is provided for VFIO's convenience, as it needs to
> >>>> provide a coherent interface on containers. Other device drivers
> >>>> will most likely want to use bind_device(), which binds a single
> >>>> device in the group.
> >>>
> >>> I saw your bind_group implementation tries to bind the address space
> >>> for all devices within a group, which IMO has some problem. Based on
> >>> PCIe spec, packet routing on the bus doesn't take PASID into
> >>> consideration. since devices within same group cannot be isolated
> >>> based on requestor-ID i.e. traffic not guaranteed going to IOMMU,
> >>> enabling SVA on multiple devices could cause undesired p2p.
> >>
> >> But so does enabling "classic" DMA... If two devices are not
> >> protected by ACS for example, they are put in the same IOMMU group,
> >> and one device might be able to snoop the other's DMA. VFIO allows
> >> userspace to create a container for them and use MAP/UNMAP, but makes
> >> it explicit to the user that for DMA, these devices are not isolated
> >> and must be considered as a single device (you can't pass them to
> >> different VMs or put them in different containers). So I tried to
> >> keep the same idea as MAP/UNMAP for SVA, performing BIND/UNBIND
> >> operations on the VFIO container instead of the device.
> >
> > there is a small difference.
> > for classic DMA we can reserve PCI BARs
> > when allocating IOVA, thus multiple devices in the same group can
> > still work correctly applied with same translation, if isolation is
> > not cared about in between. However for SVA it's CPU virtual addresses
> > managed by kernel mm, thus difficult to introduce similar address
> > reservation. Then it's possible for a VA falling into another device's
> > BAR in the same group to cause undesired p2p traffic. In such regard,
> > SVA is actually functionally-broken.
>
> I think the problem exists even if there is a single device in the group.
> If, for example, malloc() returns a VA that corresponds to a PCI host
> bridge in IOVA space, performing DMA on that buffer won't reach the IOMMU
> and will cause undesirable side-effects.

If there is only a single device in a group, shouldn't that mean there is
ACS support in the path from this device to the root complex? It means any
memory request from this device would be upstreamed to the root complex,
so it should be able to avoid undesired p2p traffic. So I'm inclined to
believe that even if we do the bind at group level, we actually expect it
to work only in the case where there is a single device within the group.

Thanks,
Yi Liu
Re: [PATCH v5 08/13] iommu/rockchip: Control clocks needed to access the IOMMU
Hi Robin,

Thanks for your reply.

On 02/28/2018 11:06 PM, Robin Murphy wrote:
> On 28/02/18 13:00, JeffyChen wrote:
>> Hi Robin,
>>
>> Thanks for your reply.
>>
>> On 02/28/2018 12:59 AM, Robin Murphy wrote:
>>>> the rockchip IOMMU is part of the master block in hardware, so it
>>>> needs to control the master's power domain and some of the master's
>>>> clocks when accessing its registers. and the number of clocks needed
>>>> here might be different between each IOMMU (according to which master
>>>> block it belongs to), it's a little like our power domain:
>>>> https://elixir.free-electrons.com/linux/latest/source/arch/arm64/boot/dts/rockchip/rk3399.dtsi#L935
>>>>
>>>> i'm not sure how to describe this correctly, is it ok to use
>>>> something like "the same as its master block"?
>>>>
>>>> would it make sense to add a property to specify the master who owns
>>>> the iommu, and we can get all clocks (only some of those clocks are
>>>> actually needed) from it in the of_xlate()? and we can also reuse the
>>>> clock-names of that master to build clk_bulk_data and log errors in
>>>> clk_bulk_get.
>>>
>>> I'm inclined to agree with Rob here - if we're to add anything to the
>>> binding, it should only be whatever clock inputs are defined for the
>>> IOMMU IP block itself. If Linux doesn't properly handle the
>>> interconnect clock hierarchy external to a particular integration,
>>> that's a separate issue and it's not the binding's problem.
>>>
>>> I actually quite like the hack of "borrowing" the clocks from
>>> dev->of_node in of_xlate() - you shouldn't need any DT changes for
>>> that, because you already know that each IOMMU instance only has the
>>> one master device anyway.
>>
>> Thanks:) but actually we are going to support sharing IOMMU between
>> multiple masters (one of them is the main master i think) in the newer
>> chips (not yet supported on the upstream kernel)...
>
> Ha! OK, fair enough, back to the first point then...
>> So we might have to get all clocks from all masters, or find a way to
>> specify the main master... and for the multiple masters case, doing it
>> in of_xlate() turns out to be a little racy... maybe we can add a
>> property to specify the main master, and get its clocks in probe()?
>
> I notice that the 4.4 BSP kernel consistently specifies "aclk" and
> "hclk" for the IOMMU instances - it feels unusual to say "why don't we
> follow the downstream binding?", but it does look a lot like what I
> would expect (I'd guess at one for the register slave interface and one
> for the master interface/general operation?)

huh, right. i did notice that, but there's a hevc_mmu with ("aclk",
"hclk", "clk_core", "clk_cabac") which confused me. so i confirmed with
Simon that the hevc_mmu is wrong. currently all IOMMUs should only have
2 clks, either aclk+hclk or aclk+pclk (depends on the clk tree). so it
seems to be a good idea to do so, will send patches soon, thanks :)

> If we can implement conceptually-correct clock handling based on an
> accurate binding, which should cover most cases, and *then* look at
> hacking around those where it doesn't quite work in practice due to
> shortcomings elsewhere, that would be ideal, and of course a lot nicer
> than just jumping straight into piles of hacks.
>
> Robin.
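The clk_bulk pattern under discussion amounts to: get N named clocks, enable all of them before touching IOMMU registers, and unwind in reverse order if any enable fails. The sketch below models only that all-or-nothing unwind logic with a toy clock type — it is not the real `clk_bulk_enable()` API, just an illustration of its contract:

```c
#include <stddef.h>

struct toy_clk {
    const char *name;
    int enabled;
    int fail_enable;  /* simulate a clock that refuses to turn on */
};

/* Mirrors the clk_bulk_enable() contract: enable all, or none. */
static int toy_clk_bulk_enable(size_t n, struct toy_clk *clks)
{
    for (size_t i = 0; i < n; i++) {
        if (clks[i].fail_enable) {
            while (i--)
                clks[i].enabled = 0;  /* unwind in reverse order */
            return -1;
        }
        clks[i].enabled = 1;
    }
    return 0;
}
```

With the "aclk"+"hclk" binding discussed above, the driver would fill a two-entry array and make one bulk call, so a failure of either clock leaves nothing half-enabled.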
Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
On 2/12/2018 1:33 PM, Jean-Philippe Brucker wrote:
> +int iommu_sva_unbind_group(struct iommu_group *group, int pasid)
> +{
> +	struct group_device *device;
> +
> +	mutex_lock(&group->mutex);
> +	list_for_each_entry(device, &group->devices, list)
> +		iommu_sva_unbind_device(device->dev, pasid);
> +	mutex_unlock(&group->mutex);
> +
> +	return 0;
> +}

I think we should handle the errors returned by iommu_sva_unbind_device()
here, or at least print a warning if we want to continue unbinding anyway.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Collaborative Project.
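One way to address Sinan's point without aborting the walk — since the remaining devices still need to be unbound — is to keep iterating but report the first failure to the caller. A sketch of that pattern, with the per-device unbind results modeled as an array of errno-style values (names hypothetical):

```c
#include <stddef.h>

/*
 * Walk all devices, attempting the unbind on each even after a failure,
 * and return the first error encountered (0 if all succeeded).
 */
static int unbind_all(const int *results, size_t n)
{
    int ret = 0;

    for (size_t i = 0; i < n; i++) {
        int err = results[i];  /* stand-in for iommu_sva_unbind_device() */
        if (err && !ret)
            ret = err;         /* remember first error, keep going */
    }
    return ret;
}
```

The same shape would let iommu_sva_unbind_group() return a meaningful error while still detaching every device it can.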
[GIT PULL] dma-mapping fixes for Linux 4.16
The following changes since commit a638af00b27266c09ab7ac69141e6f4ac6c00eff:

  Merge tag 'usb-4.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb (2018-02-22 12:13:01 -0800)

are available in the git repository at:

  git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-4.16-3

for you to fetch changes up to af1da686843750809738c01e153320106e890804:

  dma-debug: fix memory leak in debug_dma_alloc_coherent (2018-02-22 15:02:33 -0800)

A single fix for a memory leak regression in the dma-debug code.

Miles Chen (1):
  dma-debug: fix memory leak in debug_dma_alloc_coherent

 lib/dma-debug.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
Re: [PATCH 37/37] vfio: Add support for Shared Virtual Addressing
On 28/02/18 01:26, Sinan Kaya wrote:
[...]
>> +static int vfio_iommu_sva_init(struct device *dev, void *data)
>> +{
>
> data is not getting used.

That's the pointer passed to "iommu_group_for_each_dev", NULL at the
moment. Next version of this patch will keep some state in data to ensure
one device per group.

>> +	int ret;
>> +
>> +	ret = iommu_sva_device_init(dev, IOMMU_SVA_FEAT_PASID |
>> +				    IOMMU_SVA_FEAT_IOPF, 0);
>> +	if (ret)
>> +		return ret;
>> +
>> +	return iommu_register_mm_exit_handler(dev, vfio_iommu_mm_exit);
>> +}
>> +
>> +static int vfio_iommu_sva_shutdown(struct device *dev, void *data)
>> +{
>> +	iommu_sva_device_shutdown(dev);
>> +	iommu_unregister_mm_exit_handler(dev);
>> +
>> +	return 0;
>> +}
>> +
>> +static int vfio_iommu_bind_group(struct vfio_iommu *iommu,
>> +				 struct vfio_group *group,
>> +				 struct vfio_mm *vfio_mm)
>> +{
>> +	int ret;
>> +	int pasid;
>> +
>> +	if (!group->sva_enabled) {
>> +		ret = iommu_group_for_each_dev(group->iommu_group, NULL,
>> +					       vfio_iommu_sva_init);
>> +		if (ret)
>> +			return ret;
>> +
>> +		group->sva_enabled = true;
>> +	}
>> +
>> +	ret = iommu_sva_bind_group(group->iommu_group, vfio_mm->mm, &pasid,
>> +				   IOMMU_SVA_FEAT_PASID | IOMMU_SVA_FEAT_IOPF,
>> +				   vfio_mm);
>> +	if (ret)
>> +		return ret;
>
> don't you need to clean up the work done by vfio_iommu_sva_init() here?

Yes, I suppose we can, if we enabled it during this bind.

[...]
>> +static long vfio_iommu_type1_bind_process(struct vfio_iommu *iommu,
>> +					  void __user *arg,
>> +					  struct vfio_iommu_type1_bind *bind)
>> +{
>> +	struct vfio_iommu_type1_bind_process params;
>> +	struct vfio_domain *domain;
>> +	struct vfio_group *group;
>> +	struct vfio_mm *vfio_mm;
>> +	struct mm_struct *mm;
>> +	unsigned long minsz;
>> +	int ret = 0;
>> +
>> +	minsz = sizeof(*bind) + sizeof(params);
>> +	if (bind->argsz < minsz)
>> +		return -EINVAL;
>> +
>> +	arg += sizeof(*bind);
>> +	if (copy_from_user(&params, arg, sizeof(params)))
>> +		return -EFAULT;
>> +
>> +	if (params.flags & ~VFIO_IOMMU_BIND_PID)
>> +		return -EINVAL;
>> +
>> +	if (params.flags & VFIO_IOMMU_BIND_PID) {
>> +		mm = vfio_iommu_get_mm_by_vpid(params.pid);
>> +		if (IS_ERR(mm))
>> +			return PTR_ERR(mm);
>> +	} else {
>> +		mm = get_task_mm(current);
>> +		if (!mm)
>> +			return -EINVAL;
>> +	}
>
> I think you can merge the mm failure handling in both branches.

Yes, I think vfio_iommu_get_mm_by_vpid could return NULL instead of an
error pointer, and we can throw -ESRCH in all cases (the existing
get_task_mm() failure in this driver does return -ESRCH, so it would be
consistent.)

[...]

>> +	/*
>> +	 * We can't simply unbind a foreign process by PASID, because the
>> +	 * process might have died and the PASID might have been reallocated
>> +	 * to another process. Instead we need to fetch that process mm by
>> +	 * PID again to make sure we remove the right vfio_mm. In addition,
>> +	 * holding the mm guarantees that mm_users isn't dropped while we
>> +	 * unbind and the exit_mm handler doesn't fire. While not strictly
>> +	 * necessary, not having to care about that race simplifies
>> +	 * everyone's life.
>> +	 */
>> +	if (params.flags & VFIO_IOMMU_BIND_PID) {
>> +		mm = vfio_iommu_get_mm_by_vpid(params.pid);
>> +		if (IS_ERR(mm))
>> +			return PTR_ERR(mm);
>> +	} else {
>> +		mm = get_task_mm(current);
>> +		if (!mm)
>> +			return -EINVAL;
>> +	}
>
> I think you can merge the mm failure handling in both branches.
ok

>> +	ret = -ESRCH;
>> +	mutex_lock(&iommu->lock);
>> +	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
>> +		if (vfio_mm->mm != mm)
>> +			continue;
>
> these loops look weird
> 1. for loops + break
> 2. for loops + goto
>
> how about closing the for loop here, and then returning here if vfio_mm
> was not found?

ok

>> +		vfio_iommu_unbind(iommu, vfio_mm);
>> +		list_del(&vfio_mm->next);
>> +		kfree(vfio_mm);
>> +		ret = 0;
>> +		break;
>> +	}
>> +	mutex_unlock(&iommu->lock);
>> +	mmput(mm);
>> +
>> +	return ret;
>> +}

Thanks,
Jean
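The argsz/minsz check quoted above is the standard VFIO ioctl compatibility dance: the structure may grow over time, so the kernel only insists that userspace supplied at least the fields the kernel knows about, and rejects unknown flag bits. A user-space sketch of just that check, with simplified stand-in structures (the real ones live in `vfio.h`):

```c
#include <stdint.h>

/* Simplified stand-ins for the vfio_iommu_type1_bind* structures. */
struct bind_hdr     { uint32_t argsz; uint32_t flags; };
struct bind_process { int32_t pid; uint32_t flags; };

#define BIND_PID (1u << 0)   /* stand-in for VFIO_IOMMU_BIND_PID */

/* Return 0 if the user-supplied arguments pass the minsz/flags checks. */
static int check_bind_args(const struct bind_hdr *hdr,
                           const struct bind_process *params)
{
    uint32_t minsz = sizeof(*hdr) + sizeof(*params);

    if (hdr->argsz < minsz)
        return -22;              /* -EINVAL: buffer smaller than known fields */
    if (params->flags & ~BIND_PID)
        return -22;              /* -EINVAL: unknown flag bits set */
    return 0;
}
```

A larger argsz from newer userspace passes unchanged — forward compatibility is exactly why the comparison is `<` rather than `!=`.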
Re: [PATCH 16/37] iommu: Add generic PASID table library
On 27/02/18 18:51, Jacob Pan wrote:
[...]
>> +struct iommu_pasid_table_ops *
>> +iommu_alloc_pasid_ops(enum iommu_pasid_table_fmt fmt,
>> +		      struct iommu_pasid_table_cfg *cfg, void *cookie)
>> +{
>
> I guess you don't need to pass in cookie here.

The cookie is stored in the table driver and passed back to the IOMMU
driver when invalidating a PASID table entry.

>> +	struct iommu_pasid_table *table;
>> +	const struct iommu_pasid_init_fns *fns;
>> +
>> +	if (fmt >= PASID_TABLE_NUM_FMTS)
>> +		return NULL;
>> +
>> +	fns = pasid_table_init_fns[fmt];
>> +	if (!fns)
>> +		return NULL;
>> +
>> +	table = fns->alloc(cfg, cookie);
>> +	if (!table)
>> +		return NULL;
>> +
>> +	table->fmt = fmt;
>> +	table->cookie = cookie;
>> +	table->cfg = *cfg;
>
> the ops is already IOMMU model specific, why do you need to pass cfg
> back?

The table driver needs some config information at runtime. Callbacks such
as iommu_pasid_table_ops::alloc_shared_entry() receive the
iommu_pasid_table_ops instance as argument. They can then get the
iommu_pasid_table structure with container_of() and retrieve the config
stored in table->cfg.

>> +	return &table->ops;
>
> If there is no common code that uses these ops, I don't see the benefit
> of having these APIs. Or the plan is to consolidate even further such
> that a reference to the pasid table can be attached per iommu_domain
> etc., but that would be a model specific choice.

I don't plan to consolidate further. This API is for multiple IOMMU
drivers with different transports implementing the same PASID table
formats. For example my vSVA implementation uses this API in virtio-iommu
for assigning PASID tables to the guest (all fairly experimental at this
point. I initially intended to assign just the page directories, but
passing the whole PASID table seemed more popular.)
In the future there might be other vendor IOMMUs implementing the same
PASID table formats, just like there are currently 6 IOMMU drivers using
the page-table code implemented by the io-pgtable.c lib (which I copied
in this patch).

Thanks,
Jean
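The idiom Jean describes — handing callers a pointer to the ops structure embedded in the table, then recovering the enclosing table (and its cfg/cookie) with container_of() — is the same one io-pgtable uses. A minimal standalone illustration with heavily simplified structures:

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct pasid_table_ops { int (*dummy)(void); };

struct pasid_table {
    int fmt;
    void *cookie;                 /* opaque IOMMU-driver data */
    struct pasid_table_ops ops;   /* embedded; callers only see this */
};

/* Given only the ops pointer a callback receives, recover the cookie. */
static void *table_cookie(struct pasid_table_ops *ops)
{
    struct pasid_table *table = container_of(ops, struct pasid_table, ops);
    return table->cookie;
}
```

Because the ops structure is embedded rather than pointed-to, no extra back-pointer is needed: the arithmetic in container_of() is enough to get from the ops back to the table and everything stored alongside it.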
Re: [PATCH 01/37] iommu: Introduce Shared Virtual Addressing API
On 27/02/18 06:21, Tian, Kevin wrote:
[...]
>> Technically nothing prevents it, but now the resv problem discussed on
>> patch 2/37 stands out. For example on x86 you'd probably need to carve
>> the IOAPIC MSI range out of the process address space. On Arm you'd
>> need to create a write-only mapping for MSIs (IOMMU translates it to
>> the IRQ chip address, but thankfully accessing the doorbell from the
>> CPU side doesn't trigger an MSI.)
>
> so if overlap already exists when binding a process address space
> (since binding may happen much later than creating the process),
> I assume the call will simply fail since carve out at this point is not
> possible?

Yes, in this case I think it's safer to abort the bind() call.

Thanks,
Jean
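The conclusion above — abort bind() if the process address space already overlaps an IOMMU reserved region (IOAPIC MSI window, MSI doorbell, and so on) — reduces to an interval-overlap test at bind time. A hedged sketch with half-open ranges, VMAs modeled as a plain array (all names illustrative):

```c
#include <stdint.h>
#include <stddef.h>

struct resv_region { uint64_t start, end; };  /* [start, end) */
struct vma_range   { uint64_t start, end; };  /* [start, end) */

/* Nonzero (bind would be aborted) if any VMA intersects a reserved region. */
static int bind_would_conflict(const struct vma_range *vmas, size_t nvma,
                               const struct resv_region *resv, size_t nresv)
{
    for (size_t i = 0; i < nvma; i++)
        for (size_t j = 0; j < nresv; j++)
            /* Two half-open intervals overlap iff each starts before
             * the other ends. */
            if (vmas[i].start < resv[j].end && resv[j].start < vmas[i].end)
                return 1;
    return 0;
}
```

Note this only covers the check at bind time; a VMA created *after* a successful bind could still land on a reserved range, which is part of why the thread treats this as an open problem rather than a solved one.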
Re: [PATCH 1/1] iommu/vt-d: Fix a potential memory leak
On Sat, 24 Feb 2018 13:42:27 +0800
Lu Baolu wrote:
> A memory block was allocated in intel_svm_bind_mm() but never freed
> in a failure path. This patch fixes this by freeing it to avoid a
> memory leak.

Looks good to me. Thanks,

> Cc: Ashok Raj
> Cc: Jacob Pan
> Cc:  # v4.4+
> Signed-off-by: Lu Baolu
> ---
>  drivers/iommu/intel-svm.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 35a408d..3d4b924 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -396,6 +396,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
>  				  pasid_max - 1, GFP_KERNEL);
>  	if (ret < 0) {
>  		kfree(svm);
> +		kfree(sdev);
>  		goto out;
>  	}
>  	svm->pasid = ret;

[Jacob Pan]
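The bug being fixed above is the classic multi-allocation error path: once `sdev` has been allocated alongside `svm`, every later failure must free both. The user-space sketch below mimics that shape with malloc() and a live-allocation counter, so the leak (or its absence) is observable; the function and its failure knob are illustrative, not the kernel code:

```c
#include <stdlib.h>

static int live_allocs;

static void *xalloc(void)  { live_allocs++; return malloc(16); }
static void xfree(void *p) { live_allocs--; free(p); }

/* Mimics intel_svm_bind_mm(): two allocations, then a step that can fail. */
static int bind_mm(int pasid_alloc_fails)
{
    void *svm  = xalloc();
    void *sdev = xalloc();

    if (pasid_alloc_fails) {
        /* The bug was freeing only svm here and leaking sdev;
         * the fix frees both. */
        xfree(svm);
        xfree(sdev);
        return -1;
    }

    /* Success path: pretend the structures are handed off, then
     * released, so the demo balances its allocations. */
    xfree(svm);
    xfree(sdev);
    return 0;
}
```

After either path, the live-allocation counter returning to zero is exactly the property the one-line `kfree(sdev)` restores in the real driver.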