RE: [PATCH v2 2/4] iommu: Introduce device fault data
> From: Jacob Pan [mailto:jacob.jun@linux.intel.com] > Sent: Thursday, June 6, 2019 1:38 AM > > On Wed, 5 Jun 2019 08:51:45 + > "Tian, Kevin" wrote: > > > > From: Jacob Pan > > > Sent: Tuesday, June 4, 2019 6:09 AM > > > > > > On Mon, 3 Jun 2019 15:57:47 +0100 > > > Jean-Philippe Brucker wrote: > > > > > > > +/** > > > > + * struct iommu_fault_page_request - Page Request data > > > > + * @flags: encodes whether the corresponding fields are valid and > > > > whether this > > > > + * is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* > > > > values) > > > > + * @pasid: Process Address Space ID > > > > + * @grpid: Page Request Group Index > > > > + * @perm: requested page permissions (IOMMU_FAULT_PERM_* > values) > > > > + * @addr: page address > > > > + * @private_data: device-specific private information > > > > + */ > > > > +struct iommu_fault_page_request { > > > > +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID (1 << 0) > > > > +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE (1 << 1) > > > > +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA (1 << 2) > > > > + __u32 flags; > > > > + __u32 pasid; > > > > + __u32 grpid; > > > > + __u32 perm; > > > > + __u64 addr; > > > > + __u64 private_data[2]; > > > > +}; > > > > + > > > > > > Just a thought, for non-identity G-H PASID management. We could > > > pass on guest PASID in PRQ to save a lookup in QEMU. In this case, > > > QEMU allocate a GPASID for vIOMMU then a host PASID for pIOMMU. > > > QEMU has a G->H lookup. When PRQ comes in to the pIOMMU with > > > HPASID, IOMMU driver > > > can retrieve GPASID from the bind data then report to the guest via > > > VFIO. In this case QEMU does not need to do a H->G PASID lookup. > > > > > > Should we add a gpasid field here? or we can add a flag and field > > > later, up to you. > > > > > > > Can private_data serve this purpose? It's better not introducing > > gpasid awareness within host IOMMU driver. It is just a user-level > > data associated with a PASID when binding happens. 
> > Kernel doesn't care the actual meaning, simply record it and then
> > return back to user space later upon device fault. Qemu interprets
> > the meaning as gpasid in its own context. Otherwise usages may use
> > it for other purposes.
>
> private_data was intended for device PRQ with private data, part of the
> VT-d PRQ descriptor. For vSVA, we can withhold private_data in the host,
> then respond back when a page response from the guest matches the
> pending PRQ with the data withheld. But for in-kernel PRQ reporting,
> private data still might be passed on to any driver who wants to
> process the PRQ. So we can't re-purpose it.

sure. I just use it as one example to extend.

> But for the in-kernel VDCM driver, it needs a lookup from guest PASID
> to host PASID. I thought you wanted to have the IOMMU driver provide
> such a service, since the knowledge of H-G PASID can be established at
> bind_gpasid time. In that sense, we _do_ have gpasid awareness.

yes, it makes sense. My original point is that the IOMMU driver itself
doesn't need to know the actual meaning of this field (then it may be
reused for a different purpose from gpasid), but you are right that the
mdev driver in the kernel anyway needs to do G-H translation, so
explicitly defining it looks reasonable.

Thanks
Kevin
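The G→H/H→G translation being debated can be modeled in a few lines. Below is a hypothetical userspace sketch (not QEMU or kernel code; all names are invented): a table records each (guest PASID, host PASID) pair at bind time, so a fault reported with a host PASID can be translated back to the guest PASID.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the G<->H PASID mapping discussed above.
 * A real VMM would keep per-device state and proper allocators. */
#define MAX_BINDINGS 16

struct pasid_binding {
	unsigned int gpasid;	/* PASID the guest allocated (vIOMMU) */
	unsigned int hpasid;	/* PASID the host allocated (pIOMMU) */
	int used;
};

static struct pasid_binding bindings[MAX_BINDINGS];

/* Record a pair at bind time. Returns 0 on success, -1 when full. */
int pasid_bind(unsigned int gpasid, unsigned int hpasid)
{
	for (size_t i = 0; i < MAX_BINDINGS; i++) {
		if (!bindings[i].used) {
			bindings[i] = (struct pasid_binding){ gpasid, hpasid, 1 };
			return 0;
		}
	}
	return -1;
}

/* H->G lookup the fault reporter would need if the fault record
 * carried only the host PASID. Returns -1 if unknown. */
int pasid_h2g(unsigned int hpasid)
{
	for (size_t i = 0; i < MAX_BINDINGS; i++)
		if (bindings[i].used && bindings[i].hpasid == hpasid)
			return (int)bindings[i].gpasid;
	return -1;
}
```

Carrying the guest PASID in the fault record itself, as Jacob suggests, would make the `pasid_h2g()` search unnecessary on the fault path.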
Re: [PATCH] iommu/dma: Apply dma_{alloc,free}_contiguous functions
Looks fine to me. Robin, any comments? On Mon, Jun 03, 2019 at 03:52:59PM -0700, Nicolin Chen wrote: > This patch replaces dma_{alloc,release}_from_contiguous() with > dma_{alloc,free}_contiguous() to simplify those function calls. > > Signed-off-by: Nicolin Chen > --- > drivers/iommu/dma-iommu.c | 14 -- > 1 file changed, 4 insertions(+), 10 deletions(-) > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > index 0cd49c2d3770..cc3d39dbbe1a 100644 > --- a/drivers/iommu/dma-iommu.c > +++ b/drivers/iommu/dma-iommu.c > @@ -951,8 +951,8 @@ static void __iommu_dma_free(struct device *dev, size_t > size, void *cpu_addr) > > if (pages) > __iommu_dma_free_pages(pages, count); > - if (page && !dma_release_from_contiguous(dev, page, count)) > - __free_pages(page, get_order(alloc_size)); > + if (page) > + dma_free_contiguous(dev, page, alloc_size); > } > > static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr, > @@ -970,12 +970,7 @@ static void *iommu_dma_alloc_pages(struct device *dev, > size_t size, > struct page *page = NULL; > void *cpu_addr; > > - if (gfpflags_allow_blocking(gfp)) > - page = dma_alloc_from_contiguous(dev, alloc_size >> PAGE_SHIFT, > - get_order(alloc_size), > - gfp & __GFP_NOWARN); > - if (!page) > - page = alloc_pages(gfp, get_order(alloc_size)); > + page = dma_alloc_contiguous(dev, alloc_size, gfp); > if (!page) > return NULL; > > @@ -997,8 +992,7 @@ static void *iommu_dma_alloc_pages(struct device *dev, > size_t size, > memset(cpu_addr, 0, alloc_size); > return cpu_addr; > out_free_pages: > - if (!dma_release_from_contiguous(dev, page, alloc_size >> PAGE_SHIFT)) > - __free_pages(page, get_order(alloc_size)); > + dma_free_contiguous(dev, page, alloc_size); > return NULL; > } > > -- > 2.17.1 ---end quoted text---
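The refactor above can be illustrated with a userspace analogue (all function names here are invented stand-ins, not the real kernel APIs): the old call sites open-coded "try the contiguous pool, else fall back to the page allocator" on alloc, and mirrored that test on free; a single alloc/free pair hides the split.

```c
#include <stdlib.h>

/* Stand-in for a CMA-style pool; toggled to exercise the fallback. */
int pool_available = 1;

static void *pool_alloc(size_t size)
{
	return pool_available ? malloc(size) : NULL;
}

/* Returns nonzero if the pointer belonged to the pool and was freed. */
static int pool_free(void *p)
{
	if (!pool_available)
		return 0;
	free(p);
	return 1;
}

/* The consolidated helper: preferred pool first, fallback second. */
void *alloc_contiguous(size_t size)
{
	void *p = pool_alloc(size);

	if (!p)
		p = malloc(size);	/* fallback path */
	return p;
}

/* Matching free: only fall back when the pool did not own the buffer. */
void free_contiguous(void *p, size_t size)
{
	(void)size;
	if (!pool_free(p))
		free(p);
}
```

With this pairing, call sites shrink to one call each, which is exactly the effect of the patch on iommu_dma_alloc_pages() and __iommu_dma_free().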
RE: [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING
Hi Christoph,

Thank you for your comments!

> From: Christoph Hellwig, Sent: Wednesday, June 5, 2019 9:38 PM
>
> On Wed, Jun 05, 2019 at 01:21:59PM +0100, Robin Murphy wrote:
> > And if the problem is really that you're not getting merging because of
> > exposing the wrong parameters to the DMA API and/or the block layer, or
> > that you just can't quite express your requirement to the block layer in
> > the first place, then that should really be tackled at the source rather
> > than worked around further down in the stack.
>
> The problem is that we need a way to communicate to the block layer
> that more than a single segment is ok IFF the DMA API instance supports
> merging. And of course the answer will depend on further parameters
> like the maximum merged segment size and alignment for the segment.

I'm afraid I don't understand why we need a way to communicate to the
block layer that more than a single segment is OK iff the DMA API
instance supports merging.

> We'll need some way to communicate that, but I don't really think this
> series is the way to go.

I should discard patches 1/8 through 4/8.

> We'd really need something hanging off the device (or through a query
> API) how the dma map ops implementation exposes under what circumstances
> it can merge. The driver can then communicate that to the block layer
> so that the block layer doesn't split requests up when reaching the
> segment limit.

The block layer already has a "max_segment_size" limit for each device,
so regardless of whether it can merge the segments, we can use that
limit. Is my understanding incorrect?

Best regards,
Yoshihiro Shimoda
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
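The "max_segment_size" point can be made concrete with a small model (plain C, not kernel code; the function is hypothetical): an IOMMU that remaps pages can merge adjacent chunks into one DMA segment, but a block-layer-style segment size limit caps how large any merged segment may grow, which determines how many segments come out of the mapping.

```c
#include <stddef.h>

/* Greedy model of segment merging under a max_segment_size cap:
 * chunks are folded into the current segment until adding the next
 * one would exceed the limit, at which point a new segment starts.
 * (A chunk larger than the cap is still counted as one segment;
 * real code would have to reject or split it.) */
size_t count_merged_segments(const size_t *lens, size_t n, size_t max_seg)
{
	size_t count = 0, cur = 0;

	for (size_t i = 0; i < n; i++) {
		if (cur == 0 || cur + lens[i] > max_seg) {
			count++;		/* start a new segment */
			cur = lens[i];
		} else {
			cur += lens[i];		/* merge into current one */
		}
	}
	return count;
}
```

This is why the segment count the device sees depends on the limit advertised to the block layer, not only on whether the DMA layer can merge at all.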
RE: [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING
Hi Robin,

Thank you for your comments!

> From: Robin Murphy, Sent: Wednesday, June 5, 2019 9:22 PM
> > @@ -902,8 +914,18 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
> >  	if (iommu_map_sg(domain, iova, sg, nents, prot) < iova_len)
> >  		goto out_free_iova;
> >
> > -	return __finalise_sg(dev, sg, nents, iova);
> > +	ret = __finalise_sg(dev, sg, nents, iova);
> > +	/*
> > +	 * Check whether the sg entry is single if a device requires and
> > +	 * the IOMMU driver has the capable.
> > +	 */
> > +	if (iova_contiguous && ret != 1)
> > +		goto out_unmap_sg;
>
> I don't see that just failing really gives this option any value.
> Clearly the MMC driver has to do *something* to handle the failure (plus
> presumably the case of not having IOMMU DMA ops at all), which begs the
> question of why it couldn't just do whatever that is anyway, without all
> this infrastructure. For starters, it would be a far simpler and less
> invasive patch:
>
> 	if (dma_map_sg(...) > 1) {
> 		dma_unmap_sg(...);
> 		/* split into multiple requests and try again */
> 	}

I understood it.

> But then it would make even more sense to just have the driver be
> proactive about its special requirement in the first place, and simply
> validate the list before it even tries to map it:
>
> 	for_each_sg(sgl, sg, n, i)
> 		if ((i > 0 && sg->offset % PAGE_SIZE) ||
> 		    (i < n - 1 && sg->length % PAGE_SIZE))
> 			/* segment will not be mergeable */

In a previous version, I made such code [1]. But I think I misunderstood
Christoph's comments [2] [3].

[1] https://patchwork.kernel.org/patch/10970047/
[2] https://marc.info/?l=linux-renesas-soc&m=155956751811689&w=2
[3] https://marc.info/?l=linux-renesas-soc&m=155852814607202&w=2

> For reference, I think v4l2 and possibly some areas of DRM already do
> something vaguely similar to judge whether they get contiguous buffers
> or not.

I see. I'll check these areas later.
> > +
> > +	return ret;
> >
> > +out_unmap_sg:
> > +	iommu_dma_unmap_sg(dev, sg, nents, dir, attrs);
> >  out_free_iova:
> >  	iommu_dma_free_iova(cookie, iova, iova_len);
> >  out_restore_sg:
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 91af22a..f971dd3 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -104,6 +104,7 @@ enum iommu_cap {
> >  				   transactions */
> >  	IOMMU_CAP_INTR_REMAP,	/* IOMMU supports interrupt isolation */
> >  	IOMMU_CAP_NOEXEC,	/* IOMMU_NOEXEC flag */
> > +	IOMMU_CAP_MERGING,	/* IOMMU supports segments merging */
>
> This isn't a 'capability' of the IOMMU - "segment merging" equates to
> just remapping pages, and there's already a fundamental assumption that
> IOMMUs are capable of that. Plus it's very much a DMA API concept, so
> hardly belongs in the IOMMU API anyway.

I got it.

> All in all, I'm struggling to see the point of this. Although it's not a
> DMA API guarantee, iommu-dma already merges scatterlists as aggressively
> as it is allowed to, and will continue to do so for the foreseeable
> future (since it avoids considerable complication in the IOVA
> allocation), so if you want to make sure iommu_dma_map_sg() merges an
> entire list, just don't give it a non-mergeable list.

Thank you for the explanation. I didn't know that a driver should not
give it a non-mergeable list.

> And if you still really really want dma_map_sg() to have a behaviour of
> "merge to a single segment or fail", then that should at least be a DMA
> API attribute, which could in principle be honoured by bounce-buffering
> implementations as well.

I got it. For this patch series, it seems I have to modify the block
layer instead, so that such a new DMA API is not needed.
> And if the problem is really that you're not getting merging because of
> exposing the wrong parameters to the DMA API and/or the block layer, or
> that you just can't quite express your requirement to the block layer in
> the first place, then that should really be tackled at the source rather
> than worked around further down in the stack.

I'll reply to Christoph's email about this topic later.

Best regards,
Yoshihiro Shimoda

> Robin.
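Robin's pre-validation loop can be exercised outside the kernel. The sketch below mirrors the quoted `for_each_sg()` rule exactly: a list can merge to a single IOVA-contiguous segment under remapping only if every interior boundary falls on a page boundary, i.e. only the first segment may have a nonzero page offset and only the last may have a non-page-aligned length. (The struct and PAGE_SIZE here are simplified stand-ins.)

```c
#include <stddef.h>

#define PAGE_SIZE 4096u

struct seg {
	unsigned int offset;	/* offset within the first page */
	unsigned int length;
};

/* Returns 1 if the list could merge to a single IOVA-contiguous
 * segment under remapping, 0 otherwise — the same rule as the
 * for_each_sg() check quoted above. */
int list_is_mergeable(const struct seg *s, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (i > 0 && s[i].offset % PAGE_SIZE)
			return 0;	/* interior segment starts mid-page */
		if (i < n - 1 && s[i].length % PAGE_SIZE)
			return 0;	/* interior segment ends mid-page */
	}
	return 1;
}
```

A driver with a single-segment requirement could run such a check before mapping, instead of mapping and then failing, which is the point Robin makes.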
Re: [PATCH v8 26/29] vfio-pci: Register an iommu fault handler
On Tue, 4 Jun 2019 18:11:08 +0200 Auger Eric wrote: > Hi Alex, > > On 6/4/19 12:31 AM, Alex Williamson wrote: > > On Sun, 26 May 2019 18:10:01 +0200 > > Eric Auger wrote: > > > >> This patch registers a fault handler which records faults in > >> a circular buffer and then signals an eventfd. This buffer is > >> exposed within the fault region. > >> > >> Signed-off-by: Eric Auger > >> > >> --- > >> > >> v3 -> v4: > >> - move iommu_unregister_device_fault_handler to vfio_pci_release > >> --- > >> drivers/vfio/pci/vfio_pci.c | 49 > >> + drivers/vfio/pci/vfio_pci_private.h > >> | 1 + 2 files changed, 50 insertions(+) > >> > >> diff --git a/drivers/vfio/pci/vfio_pci.c > >> b/drivers/vfio/pci/vfio_pci.c index f75f61127277..52094ba8 > >> 100644 --- a/drivers/vfio/pci/vfio_pci.c > >> +++ b/drivers/vfio/pci/vfio_pci.c > >> @@ -30,6 +30,7 @@ > >> #include > >> #include > >> #include > >> +#include > >> > >> #include "vfio_pci_private.h" > >> > >> @@ -296,6 +297,46 @@ static const struct vfio_pci_regops > >> vfio_pci_fault_prod_regops = { .add_capability = > >> vfio_pci_fault_prod_add_capability, }; > >> > >> +int vfio_pci_iommu_dev_fault_handler(struct iommu_fault_event > >> *evt, void *data) +{ > >> + struct vfio_pci_device *vdev = (struct vfio_pci_device *) > >> data; > >> + struct vfio_region_fault_prod *prod_region = > >> + (struct vfio_region_fault_prod > >> *)vdev->fault_pages; > >> + struct vfio_region_fault_cons *cons_region = > >> + (struct vfio_region_fault_cons > >> *)(vdev->fault_pages + 2 * PAGE_SIZE); > >> + struct iommu_fault *new = > >> + (struct iommu_fault *)(vdev->fault_pages + > >> prod_region->offset + > >> + prod_region->prod * > >> prod_region->entry_size); > >> + int prod, cons, size; > >> + > >> + mutex_lock(&vdev->fault_queue_lock); > >> + > >> + if (!vdev->fault_abi) > >> + goto unlock; > >> + > >> + prod = prod_region->prod; > >> + cons = cons_region->cons; > >> + size = prod_region->nb_entries; > >> + > >> + if (CIRC_SPACE(prod, cons, size) < 1) 
> >> + goto unlock; > >> + > >> + *new = evt->fault; > >> + prod = (prod + 1) % size; > >> + prod_region->prod = prod; > >> + mutex_unlock(&vdev->fault_queue_lock); > >> + > >> + mutex_lock(&vdev->igate); > >> + if (vdev->dma_fault_trigger) > >> + eventfd_signal(vdev->dma_fault_trigger, 1); > >> + mutex_unlock(&vdev->igate); > >> + return 0; > >> + > >> +unlock: > >> + mutex_unlock(&vdev->fault_queue_lock); > >> + return -EINVAL; > >> +} > >> + > >> static int vfio_pci_init_fault_region(struct vfio_pci_device > >> *vdev) { > >>struct vfio_region_fault_prod *header; > >> @@ -328,6 +369,13 @@ static int vfio_pci_init_fault_region(struct > >> vfio_pci_device *vdev) header = (struct vfio_region_fault_prod > >> *)vdev->fault_pages; header->version = -1; > >>header->offset = PAGE_SIZE; > >> + > >> + ret = > >> iommu_register_device_fault_handler(&vdev->pdev->dev, > >> + > >> vfio_pci_iommu_dev_fault_handler, > >> + vdev); > >> + if (ret) > >> + goto out; > >> + > >>return 0; > >> out: > >>kfree(vdev->fault_pages); > >> @@ -570,6 +618,7 @@ static void vfio_pci_release(void *device_data) > >>if (!(--vdev->refcnt)) { > >>vfio_spapr_pci_eeh_release(vdev->pdev); > >>vfio_pci_disable(vdev); > >> + > >> iommu_unregister_device_fault_handler(&vdev->pdev->dev); > > > > > > But this can fail if there are pending faults which leaves a device > > reference and then the system is broken :( > This series only features unrecoverable errors and for those the > unregistration cannot fail. Now unrecoverable errors were added I > admit this is confusing. We need to sort this out or clean the > dependencies. As Alex pointed out in 4/29, we can make iommu_unregister_device_fault_handler() never fail and clean up all the pending faults in the host IOMMU belong to that device. But the problem is that if a fault, such as PRQ, has already been injected into the guest, the page response may come back after handler is unregistered and registered again. 
We need a way to reject such a page response belonging to the previous
life of the handler. Perhaps a sync call to the guest with your fault
queue eventfd? I am not sure.

Jacob
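The producer side of the fault queue in vfio_pci_iommu_dev_fault_handler() — check for space, store the record, advance `prod` modulo the queue size — can be modeled standalone. A sketch with an int payload standing in for struct iommu_fault; the "space < 1" test keeps one slot empty, so `prod == cons` always means an empty ring rather than a full one.

```c
#include <stddef.h>

#define RING_ENTRIES 4	/* capacity is RING_ENTRIES - 1 */

struct fault_ring {
	int entries[RING_ENTRIES];
	unsigned int prod;	/* advanced by the handler (producer) */
	unsigned int cons;	/* advanced by userspace (consumer) */
};

/* Mirrors the handler above: fail if fewer than one free slot,
 * otherwise store the record and advance prod. Returns 0 or -1. */
int ring_push(struct fault_ring *r, int fault)
{
	unsigned int space =
		(r->cons + RING_ENTRIES - r->prod - 1) % RING_ENTRIES;

	if (space < 1)
		return -1;	/* queue full: the fault is dropped */
	r->entries[r->prod] = fault;
	r->prod = (r->prod + 1) % RING_ENTRIES;
	return 0;
}
```

The "queue full" return path is exactly where the thread's question about dropped PRQs and stale page responses arises: the consumer never learns about the dropped record unless a side channel reports it.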
Re: [PATCH v2 2/4] iommu: Introduce device fault data
On Wed, 5 Jun 2019 12:24:09 +0100 Jean-Philippe Brucker wrote: > On 05/06/2019 09:51, Tian, Kevin wrote: > >> From: Jacob Pan > >> Sent: Tuesday, June 4, 2019 6:09 AM > >> > >> On Mon, 3 Jun 2019 15:57:47 +0100 > >> Jean-Philippe Brucker wrote: > >> > >>> +/** > >>> + * struct iommu_fault_page_request - Page Request data > >>> + * @flags: encodes whether the corresponding fields are valid and > >>> whether this > >>> + * is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* > >>> values) > >>> + * @pasid: Process Address Space ID > >>> + * @grpid: Page Request Group Index > >>> + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values) > >>> + * @addr: page address > >>> + * @private_data: device-specific private information > >>> + */ > >>> +struct iommu_fault_page_request { > >>> +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID (1 << 0) > >>> +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE (1 << 1) > >>> +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA (1 << 2) > >>> + __u32 flags; > >>> + __u32 pasid; > >>> + __u32 grpid; > >>> + __u32 perm; > >>> + __u64 addr; > >>> + __u64 private_data[2]; > >>> +}; > >>> + > >> > >> Just a thought, for non-identity G-H PASID management. We could > >> pass on guest PASID in PRQ to save a lookup in QEMU. In this case, > >> QEMU allocate a GPASID for vIOMMU then a host PASID for pIOMMU. > >> QEMU has a G->H lookup. When PRQ comes in to the pIOMMU with > >> HPASID, IOMMU driver > >> can retrieve GPASID from the bind data then report to the guest via > >> VFIO. In this case QEMU does not need to do a H->G PASID lookup. > >> > >> Should we add a gpasid field here? or we can add a flag and field > >> later, up to you. > >> > > > > Can private_data serve this purpose? > > Isn't private_data already used for VT-d's Private Data field? > yes, as part of the PRQ. please see my explanation in the previous email. > > It's better not introducing > > gpasid awareness within host IOMMU driver. 
> > It is just a user-level
> > data associated with a PASID when binding happens. Kernel doesn't
> > care the actual meaning, simply record it and then return back to
> > user space later upon device fault. Qemu interprets the meaning as
> > gpasid in its own context. otherwise usages may use it for other
> > purpose.
>
> Regarding a gpasid field I don't mind either way, but extending the
> iommu_fault structure later won't be completely straightforward, so we
> could add some padding now.
>
> Userspace negotiates the iommu_fault struct format with VFIO before
> allocating a circular buffer of N fault structures ().
> So adding new fields requires introducing a new ABI version and a
> struct iommu_fault_v2. That may be OK for disruptive changes, but
> just adding a new field indicated by a flag shouldn't have to be that
> complicated.
>
> How about setting the iommu_fault structure to 128 bytes?
>
> struct iommu_fault {
> 	__u32 type;
> 	__u32 padding;
> 	union {
> 		struct iommu_fault_unrecoverable event;
> 		struct iommu_fault_page_request prm;
> 		__u8 padding2[120];
> 	};
> };
>
> Given that @prm is currently 40 bytes and @event 32 bytes, the padding
> allows either of them to grow 10 new 64-bit fields (or 20 new 32-bit
> fields, which is still representable with new flags) before we have to
> upgrade the ABI version.
>
> A 4kB and a 64kB queue can hold respectively:
>
> * 85 and 1365 records when iommu_fault is 48 bytes (current format).
> * 64 and 1024 records when iommu_fault is 64 bytes (but allows to grow
>   only 2 new 64-bit fields).
> * 32 and 512 records when iommu_fault is 128 bytes.
>
> In comparison,
> * the SMMU event queue can hold 128 and 2048 events respectively at
>   those sizes (and is allowed to grow up to 524k entries)
> * the SMMU PRI queue can hold 256 and 4096 PR.
>
> But the SMMU queues have to be physically contiguous, whereas our
> fault queues are in userspace memory, which is less expensive. So
> 128-byte records might be reasonable.
> What do you think?

I think 128-byte is large enough for any future extension, though 64B
might be good enough, and it is a cache line. A PCI page request message
is only 16B :)

VT-d currently uses one 4K page for the PRQ, which holds 128 PRQ
descriptors. This can grow to 16K entries per the spec. That is per
IOMMU. The user fault queue here is per device, so we do have to be
frugal about it, since it will support mdev at the per-PASID level at
some point?

I have to look into Eric's patchset on how he handles queue full on the
producer side. If we go with a 128B iommu_fault and a 4KB queue (32
entries as in your table), a VT-d PRQ size of 128 entries can
potentially cause queue full. We have to handle this VFIO queue full
differently from the IOMMU queue full, in that we only need to discard
PRQs for one device (whereas IOMMU queue full has to clear out all).

Anyway, I think 64B should be enough but 128B is fine too. We have to de
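The sizes being traded off above are easy to sanity-check. Below is a standalone sketch of the proposed 128-byte layout; the page-request fields are taken from the quoted patch, while the unrecoverable-event struct is a 32-byte placeholder since its exact fields are not shown in this thread.

```c
#include <stdint.h>

typedef uint32_t __u32;
typedef uint64_t __u64;
typedef uint8_t  __u8;

/* Field layout from the quoted patch: 40 bytes. */
struct iommu_fault_page_request {
	__u32 flags;
	__u32 pasid;
	__u32 grpid;
	__u32 perm;
	__u64 addr;
	__u64 private_data[2];
};

/* Placeholder: the thread states this struct is currently 32 bytes. */
struct iommu_fault_unrecoverable {
	__u64 placeholder[4];
};

/* Jean-Philippe's proposed fixed-size record. */
struct iommu_fault {
	__u32 type;
	__u32 padding;
	union {
		struct iommu_fault_unrecoverable event;
		struct iommu_fault_page_request prm;
		__u8 padding2[120];
	};
};

/* The sizes quoted in the discussion. */
_Static_assert(sizeof(struct iommu_fault_page_request) == 40, "prm");
_Static_assert(sizeof(struct iommu_fault_unrecoverable) == 32, "event");
_Static_assert(sizeof(struct iommu_fault) == 128, "record");
```

With 128-byte records, a 4kB queue holds exactly 32 of them, matching the table in the email.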
[RFC 2/2] iommu: arm-smmu: Don't blindly use first SMR to calculate mask
With the SMRs inherited from the bootloader the first SMR might actually be valid and in use. As such probing the SMR mask using the first SMR might break a stream in use. Search for an unused stream and use this to probe the SMR mask. Signed-off-by: Bjorn Andersson --- drivers/iommu/arm-smmu.c | 20 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index c8629a656b42..0c6f5fe6f382 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1084,23 +1084,35 @@ static void arm_smmu_test_smr_masks(struct arm_smmu_device *smmu) { void __iomem *gr0_base = ARM_SMMU_GR0(smmu); u32 smr; + int idx; if (!smmu->smrs) return; + for (idx = 0; idx < smmu->num_mapping_groups; idx++) { + smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(idx)); + if (!(smr & SMR_VALID)) + break; + } + + if (idx == smmu->num_mapping_groups) { + dev_err(smmu->dev, "Unable to compute streamid_mask\n"); + return; + } + /* * SMR.ID bits may not be preserved if the corresponding MASK * bits are set, so check each one separately. We can reject * masters later if they try to claim IDs outside these masks. */ smr = smmu->streamid_mask << SMR_ID_SHIFT; - writel_relaxed(smr, gr0_base + ARM_SMMU_GR0_SMR(0)); - smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(0)); + writel_relaxed(smr, gr0_base + ARM_SMMU_GR0_SMR(idx)); + smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(idx)); smmu->streamid_mask = smr >> SMR_ID_SHIFT; smr = smmu->streamid_mask << SMR_MASK_SHIFT; - writel_relaxed(smr, gr0_base + ARM_SMMU_GR0_SMR(0)); - smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(0)); + writel_relaxed(smr, gr0_base + ARM_SMMU_GR0_SMR(idx)); + smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(idx)); smmu->smr_mask_mask = smr >> SMR_MASK_SHIFT; } -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
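The mask-probing technique this patch relocates — write a test pattern to an SMR, read it back, and the bits that stick are the implemented ones — can be simulated against a fake register with unimplemented bits. This is a hypothetical userspace sketch; the field positions follow the SMR layout used in these patches (ID in bits 15:0, MASK starting at bit 16), and the 10-bit widths are an arbitrary choice for the simulation.

```c
#include <stdint.h>

#define SMR_ID_SHIFT   0
#define SMR_MASK_SHIFT 16

/* Fake SMR with only 10 implemented ID bits and 10 MASK bits:
 * writes to unimplemented bits read back as zero. */
#define IMPLEMENTED (((uint32_t)0x3ff << SMR_MASK_SHIFT) | 0x3ff)

static uint32_t smr_reg;

static void smr_write(uint32_t v) { smr_reg = v & IMPLEMENTED; }
static uint32_t smr_read(void)    { return smr_reg; }

/* Probe each field separately, as arm_smmu_test_smr_masks() does,
 * since ID bits may not be preserved while MASK bits are set. */
void probe_smr(uint16_t *streamid_mask, uint16_t *smr_mask_mask)
{
	smr_write((uint32_t)0xffff << SMR_ID_SHIFT);
	*streamid_mask = (uint16_t)(smr_read() >> SMR_ID_SHIFT);

	smr_write((uint32_t)*streamid_mask << SMR_MASK_SHIFT);
	*smr_mask_mask = (uint16_t)(smr_read() >> SMR_MASK_SHIFT);
}
```

The patch's point is that this probe is destructive to whatever the register held, so it must be aimed at an SMR whose VALID bit is clear rather than blindly at index 0.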
[RFC 1/2] iommu: arm-smmu: Handoff SMR registers and context banks
Boot splash screen or EFI framebuffer requires the display hardware to operate while the Linux iommu driver probes. Therefore, we cannot simply wipe out the SMR register settings programmed by the bootloader. Detect which SMR registers are in use during probe, and which context banks they are associated with. Reserve these context banks for the first Linux device whose stream-id matches the SMR register. Any existing page-tables will be discarded. Heavily based on downstream implementation by Patrick Daly . Signed-off-by: Bjorn Andersson --- drivers/iommu/arm-smmu-regs.h | 2 + drivers/iommu/arm-smmu.c | 80 --- 2 files changed, 77 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h index e9132a926761..8c1fd84032a2 100644 --- a/drivers/iommu/arm-smmu-regs.h +++ b/drivers/iommu/arm-smmu-regs.h @@ -105,7 +105,9 @@ #define ARM_SMMU_GR0_SMR(n)(0x800 + ((n) << 2)) #define SMR_VALID (1 << 31) #define SMR_MASK_SHIFT 16 +#define SMR_MASK_MASK 0x7fff #define SMR_ID_SHIFT 0 +#define SMR_ID_MASK0x #define ARM_SMMU_GR0_S2CR(n) (0xc00 + ((n) << 2)) #define S2CR_CBNDX_SHIFT 0 diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 5e54cc0a28b3..c8629a656b42 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -135,6 +135,7 @@ struct arm_smmu_s2cr { enum arm_smmu_s2cr_type type; enum arm_smmu_s2cr_privcfg privcfg; u8 cbndx; + boolhandoff; }; #define s2cr_init_val (struct arm_smmu_s2cr){ \ @@ -399,9 +400,22 @@ static int arm_smmu_register_legacy_master(struct device *dev, return err; } -static int __arm_smmu_alloc_bitmap(unsigned long *map, int start, int end) +static int __arm_smmu_alloc_cb(struct arm_smmu_device *smmu, int start, + struct device *dev) { + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); + unsigned long *map = smmu->context_map; + int end = smmu->num_context_banks; + int cbndx; int idx; + int i; + + for_each_cfg_sme(fwspec, i, idx) { + if (smmu->s2crs[idx].handoff) { + 
cbndx = smmu->s2crs[idx].cbndx; + goto found_handoff; + } + } do { idx = find_next_zero_bit(map, end, start); @@ -410,6 +424,17 @@ static int __arm_smmu_alloc_bitmap(unsigned long *map, int start, int end) } while (test_and_set_bit(idx, map)); return idx; + +found_handoff: + for (i = 0; i < smmu->num_mapping_groups; i++) { + if (smmu->s2crs[i].cbndx == cbndx) { + smmu->s2crs[i].cbndx = 0; + smmu->s2crs[i].handoff = false; + smmu->s2crs[i].count--; + } + } + + return cbndx; } static void __arm_smmu_free_bitmap(unsigned long *map, int idx) @@ -759,7 +784,8 @@ static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx) } static int arm_smmu_init_domain_context(struct iommu_domain *domain, - struct arm_smmu_device *smmu) + struct arm_smmu_device *smmu, + struct device *dev) { int irq, start, ret = 0; unsigned long ias, oas; @@ -873,8 +899,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, ret = -EINVAL; goto out_unlock; } - ret = __arm_smmu_alloc_bitmap(smmu->context_map, start, - smmu->num_context_banks); + ret = __arm_smmu_alloc_cb(smmu, start, dev); if (ret < 0) goto out_unlock; @@ -1264,7 +1289,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) return ret; /* Ensure that the domain is finalised */ - ret = arm_smmu_init_domain_context(domain, smmu); + ret = arm_smmu_init_domain_context(domain, smmu, dev); if (ret < 0) goto rpm_put; @@ -1798,6 +1823,49 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu) writel(reg, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0); } +static void arm_smmu_read_smr_state(struct arm_smmu_device *smmu) +{ + u32 privcfg; + u32 cbndx; + u32 mask; + u32 type; + u32 s2cr; + u32 smr; + u32 id; + int i; + + for (i = 0; i < smmu->num_mapping_groups; i++) { + smr = readl_relaxed(ARM_SMMU_GR0(smmu) + ARM_SMMU_GR0_SMR(i)); + mask = (smr >> SMR_MASK_SHIFT) & SMR_MASK_MASK; + id = smr & SMR_ID_MASK; +
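The handoff scan starts by decoding each SMR register into its VALID/MASK/ID fields using the new mask definitions. A standalone decode sketch follows; note the ID field width is assumed to be 16 bits here, since the `SMR_ID_MASK` value is truncated in the quoted diff.

```c
#include <stdint.h>

#define SMR_VALID      (1u << 31)
#define SMR_MASK_SHIFT 16
#define SMR_MASK_MASK  0x7fff
#define SMR_ID_SHIFT   0
#define SMR_ID_MASK    0xffff	/* assumed 16-bit ID field */

struct smr_fields {
	int valid;
	uint16_t mask;
	uint16_t id;
};

/* Split a raw SMR register value into fields, as
 * arm_smmu_read_smr_state() does while scanning for
 * bootloader-programmed stream match entries. */
struct smr_fields decode_smr(uint32_t smr)
{
	return (struct smr_fields){
		.valid = !!(smr & SMR_VALID),
		.mask  = (smr >> SMR_MASK_SHIFT) & SMR_MASK_MASK,
		.id    = (smr >> SMR_ID_SHIFT) & SMR_ID_MASK,
	};
}
```

Entries that decode as valid are the ones the patch marks as handed off, so the display hardware's translation survives until a matching Linux device claims it.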
[RFC 0/2] iommu: arm-smmu: Inherit SMR and CB config during init
Qualcomm devices typically boot with some sort of splash screen on the display, or for some devices (i.e. recent Qualcomm laptops) an EFI framebuffer. For this the bootloader allocates a static framebuffer, configures the display hardware to output this on the display, sets up the SMMU for the display hardware and jumps to the kernel. But as the arm-smmu disables all SMRs the display hardware will no longer be able to access the framebuffer and the result is that the device resets. The given proposal reads back the SMR state at boot and for marks these contexts as busy. This ensures that the display hardware will have continued access to the framebuffer. Once a device is attached we try to match it to the predefined stream mapping, so that e.g. the display driver will inherit the particular SMRs/CBs. This has the positive side effect that on some Qualcomm platforms, e.g. QCS404, writes to the SMR registers are ignored. But as we inherit the preconfigured mapping from the bootloader we can use the arm-smmu driver on these platforms. Bjorn Andersson (2): iommu: arm-smmu: Handoff SMR registers and context banks iommu: arm-smmu: Don't blindly use first SMR to calculate mask drivers/iommu/arm-smmu-regs.h | 2 + drivers/iommu/arm-smmu.c | 100 +++--- 2 files changed, 93 insertions(+), 9 deletions(-) -- 2.18.0
Re: [PATCH] driver: core: Allow subsystems to continue deferring probe
On Wed, Jun 05, 2019 at 04:23:12PM +0200, Thierry Reding wrote: > From: Thierry Reding > > Some subsystems, such as pinctrl, allow continuing to defer probe > indefinitely. This is useful for devices that depend on resources > provided by devices that are only probed after the init stage. > > One example of this can be seen on Tegra, where the DPAUX hardware > contains pinmuxing controls for pins that it shares with an I2C > controller. The I2C controller is typically used for communication > with a monitor over HDMI (DDC). However, other instances of the I2C > controller are used to access system critical components, such as a > PMIC. The I2C controller driver will therefore usually be a builtin > driver, whereas the DPAUX driver is part of the display driver that > is loaded from a module to avoid bloating the kernel image with all > of the DRM/KMS subsystem. > > In this particular case the pins used by this I2C/DDC controller > become accessible very late in the boot process. However, since the > controller is only used in conjunction with display, that's not an > issue. > > Unfortunately the driver core currently outputs a warning message > when a device fails to get the pinctrl before the end of the init > stage. That can be confusing for the user because it may sound like > an unwanted error occurred, whereas it's really an expected and > harmless situation. > > In order to eliminate this warning, this patch allows callers of the > driver_deferred_probe_check_state() helper to specify that they want > to continue deferring probe, regardless of whether we're past the > init stage or not. All of the callers of that function are updated > for the new signature, but only the pinctrl subsystem passes a true > value in the new persist parameter if appropriate. 
> > Signed-off-by: Thierry Reding > --- > drivers/base/dd.c| 17 - > drivers/base/power/domain.c | 2 +- > drivers/iommu/of_iommu.c | 2 +- > drivers/pinctrl/devicetree.c | 10 ++ > include/linux/device.h | 2 +- > 5 files changed, 21 insertions(+), 12 deletions(-) > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c > index 0df9b4461766..25ffbadf4187 100644 > --- a/drivers/base/dd.c > +++ b/drivers/base/dd.c > @@ -238,23 +238,30 @@ __setup("deferred_probe_timeout=", > deferred_probe_timeout_setup); > /** > * driver_deferred_probe_check_state() - Check deferred probe state > * @dev: device to check > + * @persist: Boolean flag indicating whether drivers should keep trying to > + * probe after built-in drivers have had a chance to probe. This is useful > + * for built-in drivers that rely on resources provided by modular drivers. Functionality aside, having random boolean flags in a function parameter list is a horrid way to have an api. Now when you see a call to this function with either "true" or "false" as an option, you have no idea what that means. So you need to go look up this definition and then work backwards. So even if the idea is sane (I'm not saying that), this implementation is not acceptable at all. thanks, greg k-h ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
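Greg's objection — a bare true/false argument is unreadable at the call site — is commonly addressed with a named enum. A hypothetical illustration (the enum and function names here are invented for the example, not a proposal for the actual driver-core API):

```c
/* With a bool parameter, the call site reads as
 *     check_state(dev, true);
 * and the reader must look up what 'true' means. A named mode
 * makes the call self-documenting:
 *     check_state(dev, PROBE_KEEP_DEFERRING);
 */
enum probe_mode {
	PROBE_GIVE_UP_AFTER_INIT,	/* old behaviour: warn and stop */
	PROBE_KEEP_DEFERRING,		/* keep trying past initcalls */
};

/* Toy decision function standing in for the driver-core helper:
 * returns nonzero if probing should still be deferred. */
int should_defer_probe(int initcalls_done, enum probe_mode mode)
{
	if (!initcalls_done)
		return 1;	/* built-in drivers still probing */
	return mode == PROBE_KEEP_DEFERRING;
}
```

The behaviour is identical to a bool flag; only the call-site readability changes, which is the substance of the review comment.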
Re: [PATCH v2 2/4] iommu: Introduce device fault data
On Wed, 5 Jun 2019 08:51:45 + "Tian, Kevin" wrote: > > From: Jacob Pan > > Sent: Tuesday, June 4, 2019 6:09 AM > > > > On Mon, 3 Jun 2019 15:57:47 +0100 > > Jean-Philippe Brucker wrote: > > > > > +/** > > > + * struct iommu_fault_page_request - Page Request data > > > + * @flags: encodes whether the corresponding fields are valid and > > > whether this > > > + * is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* > > > values) > > > + * @pasid: Process Address Space ID > > > + * @grpid: Page Request Group Index > > > + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values) > > > + * @addr: page address > > > + * @private_data: device-specific private information > > > + */ > > > +struct iommu_fault_page_request { > > > +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID (1 << 0) > > > +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE (1 << 1) > > > +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA (1 << 2) > > > + __u32 flags; > > > + __u32 pasid; > > > + __u32 grpid; > > > + __u32 perm; > > > + __u64 addr; > > > + __u64 private_data[2]; > > > +}; > > > + > > > > Just a thought, for non-identity G-H PASID management. We could > > pass on guest PASID in PRQ to save a lookup in QEMU. In this case, > > QEMU allocate a GPASID for vIOMMU then a host PASID for pIOMMU. > > QEMU has a G->H lookup. When PRQ comes in to the pIOMMU with > > HPASID, IOMMU driver > > can retrieve GPASID from the bind data then report to the guest via > > VFIO. In this case QEMU does not need to do a H->G PASID lookup. > > > > Should we add a gpasid field here? or we can add a flag and field > > later, up to you. > > > > Can private_data serve this purpose? It's better not introducing > gpasid awareness within host IOMMU driver. It is just a user-level > data associated with a PASID when binding happens. Kernel doesn't > care the actual meaning, simply record it and then return back to > user space later upon device fault. Qemu interprets the meaning as > gpasid in its own context. 
otherwise usages may use it for other > purpose. > private_data was intended for device PRQ with private data, part of the VT-d PRQ descriptor. For vSVA, we can withhold private_data in the host then respond back when page response from the guest matches pending PRQ with the data withheld. But for in-kernel PRQ reporting, private data still might be passed on to any driver who wants to process the PRQ. So we can't re-purpose it. But for in-kernel VDCM driver, it needs a lookup from guest PASID to host PASID. I thought you wanted to have IOMMU driver provide such service since the knowledge of H-G pasid can be established during bind_gpasid time. In that sense, we _do_ have gpasid awareness. > Thanks > Kevin [Jacob Pan] ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING
On 06/05/2019 02:11 PM, Yoshihiro Shimoda wrote: > This patch adds a new capable IOMMU_CAP_MERGING to check whether > the IOVA would be contiguous strictly if a device requires and > the IOMMU driver has the capable. s/has/is/? Or capable what? > Signed-off-by: Yoshihiro Shimoda > --- > drivers/iommu/dma-iommu.c | 26 -- > include/linux/iommu.h | 1 + > 2 files changed, 25 insertions(+), 2 deletions(-) > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > index 034caae..ecf1a04 100644 > --- a/drivers/iommu/dma-iommu.c > +++ b/drivers/iommu/dma-iommu.c [...] > @@ -867,6 +872,13 @@ static int iommu_dma_map_sg(struct device *dev, struct > scatterlist *sg, > sg_dma_len(s) = s_length; > s->offset -= s_iova_off; > s_length = iova_align(iovad, s_length + s_iova_off); > + /* > + * Check whether the IOVA would be contiguous strictly if > + * a device requires and the IOMMU driver has the capable. Same question here... > + */ > + if (iova_contiguous && i > 0 && > + (s_iova_off || s->length != s_length)) > + return 0; > s->length = s_length; > > /* > @@ -902,8 +914,18 @@ static int iommu_dma_map_sg(struct device *dev, struct > scatterlist *sg, > if (iommu_map_sg(domain, iova, sg, nents, prot) < iova_len) > goto out_free_iova; > > - return __finalise_sg(dev, sg, nents, iova); > + ret = __finalise_sg(dev, sg, nents, iova); > + /* > + * Check whether the sg entry is single if a device requires and > + * the IOMMU driver has the capable. You meant capability perhaps? [...] MBR, Sergei
[PATCH] driver: core: Allow subsystems to continue deferring probe
From: Thierry Reding Some subsystems, such as pinctrl, allow continuing to defer probe indefinitely. This is useful for devices that depend on resources provided by devices that are only probed after the init stage. One example of this can be seen on Tegra, where the DPAUX hardware contains pinmuxing controls for pins that it shares with an I2C controller. The I2C controller is typically used for communication with a monitor over HDMI (DDC). However, other instances of the I2C controller are used to access system critical components, such as a PMIC. The I2C controller driver will therefore usually be a builtin driver, whereas the DPAUX driver is part of the display driver that is loaded from a module to avoid bloating the kernel image with all of the DRM/KMS subsystem. In this particular case the pins used by this I2C/DDC controller become accessible very late in the boot process. However, since the controller is only used in conjunction with display, that's not an issue. Unfortunately the driver core currently outputs a warning message when a device fails to get the pinctrl before the end of the init stage. That can be confusing for the user because it may sound like an unwanted error occurred, whereas it's really an expected and harmless situation. In order to eliminate this warning, this patch allows callers of the driver_deferred_probe_check_state() helper to specify that they want to continue deferring probe, regardless of whether we're past the init stage or not. All of the callers of that function are updated for the new signature, but only the pinctrl subsystem passes a true value in the new persist parameter if appropriate. 
Signed-off-by: Thierry Reding --- drivers/base/dd.c| 17 - drivers/base/power/domain.c | 2 +- drivers/iommu/of_iommu.c | 2 +- drivers/pinctrl/devicetree.c | 10 ++ include/linux/device.h | 2 +- 5 files changed, 21 insertions(+), 12 deletions(-) diff --git a/drivers/base/dd.c b/drivers/base/dd.c index 0df9b4461766..25ffbadf4187 100644 --- a/drivers/base/dd.c +++ b/drivers/base/dd.c @@ -238,23 +238,30 @@ __setup("deferred_probe_timeout=", deferred_probe_timeout_setup); /** * driver_deferred_probe_check_state() - Check deferred probe state * @dev: device to check + * @persist: Boolean flag indicating whether drivers should keep trying to + * probe after built-in drivers have had a chance to probe. This is useful + * for built-in drivers that rely on resources provided by modular drivers. * * Returns -ENODEV if init is done and all built-in drivers have had a chance - * to probe (i.e. initcalls are done), -ETIMEDOUT if deferred probe debug - * timeout has expired, or -EPROBE_DEFER if none of those conditions are met. + * to probe (i.e. initcalls are done) and unless persist is set, -ETIMEDOUT if + * deferred probe debug timeout has expired, or -EPROBE_DEFER if none of those + * conditions are met. * * Drivers or subsystems can opt-in to calling this function instead of directly * returning -EPROBE_DEFER. 
*/ -int driver_deferred_probe_check_state(struct device *dev) +int driver_deferred_probe_check_state(struct device *dev, bool persist) { if (initcalls_done) { if (!deferred_probe_timeout) { dev_WARN(dev, "deferred probe timeout, ignoring dependency"); return -ETIMEDOUT; } - dev_warn(dev, "ignoring dependency for device, assuming no driver"); - return -ENODEV; + + if (!persist) { + dev_warn(dev, "ignoring dependency for device, assuming no driver"); + return -ENODEV; + } } return -EPROBE_DEFER; } diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index 33c30c1e6a30..effa5276a773 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -2423,7 +2423,7 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev, mutex_unlock(&gpd_list_lock); dev_dbg(dev, "%s() failed to find PM domain: %ld\n", __func__, PTR_ERR(pd)); - return driver_deferred_probe_check_state(base_dev); + return driver_deferred_probe_check_state(base_dev, false); } dev_dbg(dev, "adding to PM domain %s\n", pd->name); diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c index f04a6df65eb8..70f3946b088a 100644 --- a/drivers/iommu/of_iommu.c +++ b/drivers/iommu/of_iommu.c @@ -117,7 +117,7 @@ static int of_iommu_xlate(struct device *dev, * a proper probe-ordering dependency mechanism in future. */ if (!ops) - return driver_deferred_probe_check_state(dev); + return driver_deferred_probe_check_state(dev, false); return ops->of_xlate(dev, iommu_spec); } diff --git a/drivers/pinctrl/devicetree.c b/dri
Re: [Freedreno] [PATCH] of/device: add blacklist for iommu dma_ops
On Wed, Jun 5, 2019 at 6:18 AM Marek Szyprowski wrote:
>
> Hi Rob,
>
> On 2019-06-05 14:57, Rob Clark wrote:
> > On Tue, Jun 4, 2019 at 11:58 PM Tomasz Figa wrote:
> >> But first of all, I remember Marek already submitted some patches long
> >> ago that extended struct driver with some flag that means that the
> >> driver doesn't want the IOMMU to be attached before probe. Why
> >> wouldn't that work? Sounds like a perfect opt-out solution.
> > Actually, yeah.. we should do that. That is the simplest solution.
>
> Tomasz has very good memory. It took me a while to find that old patches:
>
> https://patchwork.kernel.org/patch/4677251/
> https://patchwork.kernel.org/patch/4677941/
> https://patchwork.kernel.org/patch/4677401/
>
> It looks that my idea was a bit ahead of its time ;)

if I re-spin this, was there a reason not to just use bitfields, ie:

-	bool suppress_bind_attrs;	/* disables bind/unbind via sysfs */
+	bool suppress_bind_attrs : 1;	/* disables bind/unbind via sysfs */
+	bool has_own_iommu_manager : 1;	/* driver explicitly manages IOMMU */

That seems like it would have been a bit less churn and a bit nicer
looking (IMO at least)

BR,
-R
Re: [Freedreno] [PATCH] of/device: add blacklist for iommu dma_ops
Hi Rob,

On 2019-06-05 14:57, Rob Clark wrote:
> On Tue, Jun 4, 2019 at 11:58 PM Tomasz Figa wrote:
>> But first of all, I remember Marek already submitted some patches long
>> ago that extended struct driver with some flag that means that the
>> driver doesn't want the IOMMU to be attached before probe. Why
>> wouldn't that work? Sounds like a perfect opt-out solution.
> Actually, yeah.. we should do that. That is the simplest solution.

Tomasz has very good memory. It took me a while to find that old patches:

https://patchwork.kernel.org/patch/4677251/
https://patchwork.kernel.org/patch/4677941/
https://patchwork.kernel.org/patch/4677401/

It looks that my idea was a bit ahead of its time ;)

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Re: [Freedreno] [PATCH] of/device: add blacklist for iommu dma_ops
On Tue, Jun 4, 2019 at 11:58 PM Tomasz Figa wrote:
>
> But first of all, I remember Marek already submitted some patches long
> ago that extended struct driver with some flag that means that the
> driver doesn't want the IOMMU to be attached before probe. Why
> wouldn't that work? Sounds like a perfect opt-out solution.

Actually, yeah.. we should do that. That is the simplest solution.

BR,
-R
Re: [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING
On Wed, Jun 05, 2019 at 01:21:59PM +0100, Robin Murphy wrote:
> And if the problem is really that you're not getting merging because of
> exposing the wrong parameters to the DMA API and/or the block layer, or
> that you just can't quite express your requirement to the block layer in
> the first place, then that should really be tackled at the source rather
> than worked around further down in the stack.

The problem is that we need a way to communicate to the block layer
that more than a single segment is ok IFF the DMA API instance supports
merging. And of course the answer will depend on further parameters
like the maximum merged segment size and alignment for the segment.

We'll need some way to communicate that, but I don't really think this
series is the way to go. We'd really need something hanging off the
device (or a query API) through which the dma map ops implementation
exposes under what circumstances it can merge. The driver can then
communicate that to the block layer so that the block layer doesn't
split requests up when reaching the segment limit.
Re: [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING
On 05/06/2019 12:11, Yoshihiro Shimoda wrote: This patch adds a new capable IOMMU_CAP_MERGING to check whether the IOVA would be contiguous strictly if a device requires and the IOMMU driver has the capable. Signed-off-by: Yoshihiro Shimoda --- drivers/iommu/dma-iommu.c | 26 -- include/linux/iommu.h | 1 + 2 files changed, 25 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 034caae..ecf1a04 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -847,11 +847,16 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, dma_addr_t iova; size_t iova_len = 0; unsigned long mask = dma_get_seg_boundary(dev); - int i; + int i, ret; + bool iova_contiguous = false; if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) iommu_dma_sync_sg_for_device(dev, sg, nents, dir); + if (dma_get_iova_contiguous(dev) && + iommu_capable(dev->bus, IOMMU_CAP_MERGING)) + iova_contiguous = true; + /* * Work out how much IOVA space we need, and align the segments to * IOVA granules for the IOMMU driver to handle. With some clever @@ -867,6 +872,13 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, sg_dma_len(s) = s_length; s->offset -= s_iova_off; s_length = iova_align(iovad, s_length + s_iova_off); + /* +* Check whether the IOVA would be contiguous strictly if +* a device requires and the IOMMU driver has the capable. +*/ + if (iova_contiguous && i > 0 && + (s_iova_off || s->length != s_length)) + return 0; s->length = s_length; /* @@ -902,8 +914,18 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, if (iommu_map_sg(domain, iova, sg, nents, prot) < iova_len) goto out_free_iova; - return __finalise_sg(dev, sg, nents, iova); + ret = __finalise_sg(dev, sg, nents, iova); + /* +* Check whether the sg entry is single if a device requires and +* the IOMMU driver has the capable. 
+*/ + if (iova_contiguous && ret != 1) + goto out_unmap_sg; I don't see that just failing really gives this option any value. Clearly the MMC driver has to do *something* to handle the failure (plus presumably the case of not having IOMMU DMA ops at all), which begs the question of why it couldn't just do whatever that is anyway, without all this infrastructure. For starters, it would be a far simpler and less invasive patch: if (dma_map_sg(...) > 1) { dma_unmap_sg(...); /* split into multiple requests and try again */ } But then it would make even more sense to just have the driver be proactive about its special requirement in the first place, and simply validate the list before it even tries to map it: for_each_sg(sgl, sg, n, i) if ((i > 0 && sg->offset % PAGE_SIZE) || (i < n - 1 && sg->length % PAGE_SIZE)) /* segment will not be mergeable */ For reference, I think v4l2 and possibly some areas of DRM already do something vaguely similar to judge whether they get contiguous buffers or not. + + return ret; +out_unmap_sg: + iommu_dma_unmap_sg(dev, sg, nents, dir, attrs); out_free_iova: iommu_dma_free_iova(cookie, iova, iova_len); out_restore_sg: diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 91af22a..f971dd3 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -104,6 +104,7 @@ enum iommu_cap { transactions */ IOMMU_CAP_INTR_REMAP, /* IOMMU supports interrupt isolation */ IOMMU_CAP_NOEXEC, /* IOMMU_NOEXEC flag */ + IOMMU_CAP_MERGING, /* IOMMU supports segments merging */ This isn't a 'capability' of the IOMMU - "segment merging" equates to just remapping pages, and there's already a fundamental assumption that IOMMUs are capable of that. Plus it's very much a DMA API concept, so hardly belongs in the IOMMU API anyway. All in all, I'm struggling to see the point of this. 
Although it's not a DMA API guarantee, iommu-dma already merges scatterlists as aggressively as it is allowed to, and will continue to do so for the foreseeable future (since it avoids considerable complication in the IOVA allocation), so if you want to make sure iommu_dma_map_sg() merges an entire list, just don't give it a non-mergeable list. And if you still really really want dma_map_sg() to have a behaviour of "merge to a single segment or fail", then that should at least be a DMA API attribute, which could in principle be honoured by bounce-buffer
Re: [PATCH v3] iommu/arm-smmu: Avoid constant zero in TLBI writes
[+Joerg on To:]

On Mon, Jun 03, 2019 at 02:15:37PM +0200, Marc Gonzalez wrote:
> From: Robin Murphy
>
> Apparently, some Qualcomm arm64 platforms which appear to expose their
> SMMU global register space are still, in fact, using a hypervisor to
> mediate it by trapping and emulating register accesses. Sadly, some
> deployed versions of said trapping code have bugs wherein they go
> horribly wrong for stores using r31 (i.e. XZR/WZR) as the source
> register.
>
> While this can be mitigated for GCC today by tweaking the constraints
> for the implementation of writel_relaxed(), to avoid any potential
> arms race with future compilers more aggressively optimising register
> allocation, the simple way is to just remove all the problematic
> constant zeros. For the write-only TLB operations, the actual value is
> irrelevant anyway and any old nearby variable will provide a suitable
> GPR to encode. The one point at which we really do need a zero to clear
> a context bank happens before any of the TLB maintenance where crashes
> have been reported, so is apparently not a problem... :/
>
> Reported-by: AngeloGioacchino Del Regno
> Tested-by: Marc Gonzalez
> Signed-off-by: Robin Murphy
> Signed-off-by: Marc Gonzalez

Acked-by: Will Deacon

Joerg -- Please can you take this as a fix for 5.2, with a Cc stable?

Cheers,
Will
Re: [PATCH v2 0/4] iommu: Add device fault reporting API
On 03/06/2019 22:59, Jacob Pan wrote:
> On Mon, 3 Jun 2019 15:57:45 +0100
> Jean-Philippe Brucker wrote:
>
>> Allow device drivers and VFIO to get notified on IOMMU translation
>> fault, and handle recoverable faults (PCI PRI). Several series require
>> this API (Intel VT-d and Arm SMMUv3 nested support, as well as the
>> generic host SVA implementation).
>>
>> Changes since v1 [1]:
>> * Allocate iommu_param earlier, in iommu_probe_device().
>> * Pass struct iommu_fault to fault handlers, instead of the
>>   iommu_fault_event wrapper.
>> * Removed unused iommu_fault_event::iommu_private.
>> * Removed unnecessary iommu_page_response::addr.
>> * Added iommu_page_response::version, which would allow to introduce a
>>   new incompatible iommu_page_response structure (as opposed to just
>>   adding a flag + field).
>>
>> [1] [PATCH 0/4] iommu: Add device fault reporting API
>> https://lore.kernel.org/lkml/20190523180613.55049-1-jean-philippe.bruc...@arm.com/
>>
>> Jacob Pan (3):
>>   driver core: Add per device iommu param
>>   iommu: Introduce device fault data
>>   iommu: Introduce device fault report API
>>
>> Jean-Philippe Brucker (1):
>>   iommu: Add recoverable fault reporting
>
> This interface meet the need for vt-d, just one more comment on 2/4. Do
> you want to add Co-developed-by you for the three patches from me?

I'm fine without it, I don't think it adds much to the Signed-off-by,
which is required.

Thanks,
Jean
Re: [PATCH v2 2/4] iommu: Introduce device fault data
On 05/06/2019 09:51, Tian, Kevin wrote: >> From: Jacob Pan >> Sent: Tuesday, June 4, 2019 6:09 AM >> >> On Mon, 3 Jun 2019 15:57:47 +0100 >> Jean-Philippe Brucker wrote: >> >>> +/** >>> + * struct iommu_fault_page_request - Page Request data >>> + * @flags: encodes whether the corresponding fields are valid and >>> whether this >>> + * is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* >>> values) >>> + * @pasid: Process Address Space ID >>> + * @grpid: Page Request Group Index >>> + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values) >>> + * @addr: page address >>> + * @private_data: device-specific private information >>> + */ >>> +struct iommu_fault_page_request { >>> +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID (1 << 0) >>> +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE (1 << 1) >>> +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA (1 << 2) >>> + __u32 flags; >>> + __u32 pasid; >>> + __u32 grpid; >>> + __u32 perm; >>> + __u64 addr; >>> + __u64 private_data[2]; >>> +}; >>> + >> >> Just a thought, for non-identity G-H PASID management. We could pass on >> guest PASID in PRQ to save a lookup in QEMU. In this case, QEMU >> allocate a GPASID for vIOMMU then a host PASID for pIOMMU. QEMU has a >> G->H lookup. When PRQ comes in to the pIOMMU with HPASID, IOMMU >> driver >> can retrieve GPASID from the bind data then report to the guest via >> VFIO. In this case QEMU does not need to do a H->G PASID lookup. >> >> Should we add a gpasid field here? or we can add a flag and field >> later, up to you. >> > > Can private_data serve this purpose? Isn't private_data already used for VT-d's Private Data field? > It's better not introducing > gpasid awareness within host IOMMU driver. It is just a user-level > data associated with a PASID when binding happens. Kernel doesn't > care the actual meaning, simply record it and then return back to user > space later upon device fault. Qemu interprets the meaning as gpasid > in its own context. 
> otherwise usages may use it for other purpose.

Regarding a gpasid field I don't mind either way, but extending the
iommu_fault structure later won't be completely straightforward so we
could add some padding now.

Userspace negotiates the iommu_fault struct format with VFIO, before
allocating a circular buffer of N fault structures
(https://lore.kernel.org/lkml/20190526161004.25232-26-eric.au...@redhat.com/).
So adding new fields requires introducing a new ABI version and a
struct iommu_fault_v2. That may be OK for disruptive changes, but just
adding a new field indicated by a flag shouldn't have to be that
complicated. How about setting the iommu_fault structure to 128 bytes?

	struct iommu_fault {
		__u32	type;
		__u32	padding;
		union {
			struct iommu_fault_unrecoverable event;
			struct iommu_fault_page_request prm;
			__u8 padding2[120];
		};
	};

Given that @prm is currently 40 bytes and @event 32 bytes, the padding
allows either of them to grow 10 new 64-bit fields (or 20 new 32-bit
fields, which is still representable with new flags) before we have to
upgrade the ABI version.

A 4kB and a 64kB queue can hold respectively:
* 85 and 1365 records when iommu_fault is 48 bytes (current format).
* 64 and 1024 records when iommu_fault is 64 bytes (but allows to grow
  only 2 new 64-bit fields).
* 32 and 512 records when iommu_fault is 128 bytes.

In comparison,
* the SMMU event queue can hold 128 and 2048 events respectively at
  those sizes (and is allowed to grow up to 524k entries)
* the SMMU PRI queue can hold 256 and 4096 PR.

But the SMMU queues have to be physically contiguous, whereas our fault
queues are in userspace memory which is less expensive. So 128-byte
records might be reasonable. What do you think?

The iommu_fault_response (patch 4/4) is a bit easier to extend because
it's userspace->kernel and userspace can just declare the size it's
using.
I did add a version field in case we run out of flags or want to change the whole thing, but I think I was being overly cautious and it might just be a waste of space. Thanks, Jean
[RFC PATCH v5 1/8] dma-mapping: add a device driver helper for iova contiguous
This API can set a flag indicating whether a device strictly requires
contiguous iova.

Signed-off-by: Yoshihiro Shimoda
---
 include/linux/device.h      |  1 +
 include/linux/dma-mapping.h | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index e85264f..a33d611 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -752,6 +752,7 @@ struct device_dma_parameters {
 	 */
 	unsigned int max_segment_size;
 	unsigned long segment_boundary_mask;
+	bool iova_contiguous;
 };
 
 /**
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 6309a72..cdb4e75 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -729,6 +729,22 @@ static inline int dma_set_seg_boundary(struct device *dev, unsigned long mask)
 	return -EIO;
 }
 
+static inline int dma_get_iova_contiguous(struct device *dev)
+{
+	if (dev->dma_parms)
+		return dev->dma_parms->iova_contiguous;
+	return false;
+}
+
+static inline int dma_set_iova_contiguous(struct device *dev, bool contiguous)
+{
+	if (dev->dma_parms) {
+		dev->dma_parms->iova_contiguous = contiguous;
+		return 0;
+	}
+	return -EIO;
+}
+
 #ifndef dma_max_pfn
 static inline unsigned long dma_max_pfn(struct device *dev)
 {
-- 
2.7.4
[RFC PATCH v5 5/8] mmc: tmio: No memory size limitation if runs on IOMMU
This patch adds a condition to avoid a memory size limitation of
swiotlb if the driver runs on IOMMU.

Tested-by: Takeshi Saito
Signed-off-by: Yoshihiro Shimoda
Reviewed-by: Simon Horman
Reviewed-by: Wolfram Sang
---
 drivers/mmc/host/tmio_mmc_core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/tmio_mmc_core.c b/drivers/mmc/host/tmio_mmc_core.c
index 130b91c..c9f6a59 100644
--- a/drivers/mmc/host/tmio_mmc_core.c
+++ b/drivers/mmc/host/tmio_mmc_core.c
@@ -1194,9 +1194,10 @@ int tmio_mmc_host_probe(struct tmio_mmc_host *_host)
 	 * Since swiotlb has memory size limitation, this will calculate
 	 * the maximum size locally (because we don't have any APIs for it now)
 	 * and check the current max_req_size. And then, this will update
-	 * the max_req_size if needed as a workaround.
+	 * the max_req_size if needed as a workaround. However, if the driver
+	 * runs on IOMMU, this workaround isn't needed.
 	 */
-	if (swiotlb_max_segment()) {
+	if (swiotlb_max_segment() && !device_iommu_mapped(&pdev->dev)) {
 		unsigned int max_size = (1 << IO_TLB_SHIFT) * IO_TLB_SEGSIZE;
 
 		if (mmc->max_req_size > max_size)
-- 
2.7.4
[RFC PATCH v5 8/8] mmc: renesas_sdhi: use multiple segments if possible
If the IOMMU driver has a capable for merging segments to a segment strictly, this can expose the host->mmc->max_segs with multiple segments to a block layer by using blk_queue_max_segments() that mmc_setup_queue() calls. Notes that an sdio card may be possible to use multiple segments with non page aligned size, so that this will not expose multiple segments to avoid failing dma_map_sg() every time. Notes that on renesas_sdhi_sys_dmac, the max_segs value will change from 32 to 512, but the sys_dmac can handle 512 segments, so that this init_card ops is added on "TMIO_MMC_MIN_RCAR2" environment. Signed-off-by: Yoshihiro Shimoda --- drivers/mmc/host/renesas_sdhi_core.c | 27 +++ drivers/mmc/host/renesas_sdhi_internal_dmac.c | 4 2 files changed, 31 insertions(+) diff --git a/drivers/mmc/host/renesas_sdhi_core.c b/drivers/mmc/host/renesas_sdhi_core.c index c5ee4a6..379cefa 100644 --- a/drivers/mmc/host/renesas_sdhi_core.c +++ b/drivers/mmc/host/renesas_sdhi_core.c @@ -20,6 +20,7 @@ #include #include +#include #include #include #include @@ -46,6 +47,8 @@ #define SDHI_VER_GEN3_SD 0xcc10 #define SDHI_VER_GEN3_SDMMC0xcd10 +#define SDHI_MAX_SEGS_IN_IOMMU 512 + struct renesas_sdhi_quirks { bool hs400_disabled; bool hs400_4taps; @@ -203,6 +206,28 @@ static void renesas_sdhi_clk_disable(struct tmio_mmc_host *host) clk_disable_unprepare(priv->clk_cd); } +static void renesas_sdhi_init_card(struct mmc_host *mmc, struct mmc_card *card) +{ + struct tmio_mmc_host *host = mmc_priv(mmc); + + /* +* If the IOMMU driver has a capable for merging segments to +* a segment strictly, this can expose the host->mmc->max_segs with +* multiple segments to a block layer by using blk_queue_max_segments() +* that mmc_setup_queue() calls. Notes that an sdio card may be +* possible to use multiple segments with non page aligned size, so +* that this will not expose multiple segments to avoid failing +* dma_map_sg() every time. 
+*/ + if (host->pdata->max_segs < SDHI_MAX_SEGS_IN_IOMMU && + iommu_capable(host->pdev->dev.bus, IOMMU_CAP_MERGING) && + (mmc_card_mmc(card) || mmc_card_sd(card))) + host->mmc->max_segs = SDHI_MAX_SEGS_IN_IOMMU; + else + host->mmc->max_segs = host->pdata->max_segs ? : + TMIO_DEFAULT_MAX_SEGS; +} + static int renesas_sdhi_card_busy(struct mmc_host *mmc) { struct tmio_mmc_host *host = mmc_priv(mmc); @@ -726,6 +751,8 @@ int renesas_sdhi_probe(struct platform_device *pdev, /* SDR speeds are only available on Gen2+ */ if (mmc_data->flags & TMIO_MMC_MIN_RCAR2) { + host->ops.init_card = renesas_sdhi_init_card; + /* card_busy caused issues on r8a73a4 (pre-Gen2) CD-less SDHI */ host->ops.card_busy = renesas_sdhi_card_busy; host->ops.start_signal_voltage_switch = diff --git a/drivers/mmc/host/renesas_sdhi_internal_dmac.c b/drivers/mmc/host/renesas_sdhi_internal_dmac.c index 751fe91..a442f86 100644 --- a/drivers/mmc/host/renesas_sdhi_internal_dmac.c +++ b/drivers/mmc/host/renesas_sdhi_internal_dmac.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -337,6 +338,9 @@ static int renesas_sdhi_internal_dmac_probe(struct platform_device *pdev) /* value is max of SD_SECCNT. Confirmed by HW engineers */ dma_set_max_seg_size(dev, 0x); + if (iommu_capable(dev->bus, IOMMU_CAP_MERGING)) + dma_set_iova_contiguous(dev, true); + return renesas_sdhi_probe(pdev, &renesas_sdhi_internal_dmac_dma_ops); } -- 2.7.4
[RFC PATCH v5 7/8] mmc: renesas_sdhi: sort headers
This patch ports the headers in alphabetic order to ease the maintenance for this part. Signed-off-by: Yoshihiro Shimoda --- drivers/mmc/host/renesas_sdhi_core.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/mmc/host/renesas_sdhi_core.c b/drivers/mmc/host/renesas_sdhi_core.c index 5e9e36e..c5ee4a6 100644 --- a/drivers/mmc/host/renesas_sdhi_core.c +++ b/drivers/mmc/host/renesas_sdhi_core.c @@ -18,20 +18,20 @@ * */ -#include #include -#include -#include -#include -#include +#include +#include +#include #include #include -#include -#include -#include +#include +#include #include #include +#include #include +#include +#include #include #include "renesas_sdhi.h" -- 2.7.4
[RFC PATCH v5 4/8] iommu/ipmmu-vmsa: add capable ops
This patch adds the .capable into iommu_ops that can merge scatter
gather segments.

Signed-off-by: Yoshihiro Shimoda
---
 drivers/iommu/ipmmu-vmsa.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 408ad0b..81170b8 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -608,6 +608,18 @@ static irqreturn_t ipmmu_irq(int irq, void *dev)
  * IOMMU Operations
  */
 
+static bool ipmmu_capable(enum iommu_cap cap)
+{
+	switch (cap) {
+	case IOMMU_CAP_MERGING:
+		return true;
+	default:
+		break;
+	}
+
+	return false;
+}
+
 static struct iommu_domain *__ipmmu_domain_alloc(unsigned type)
 {
 	struct ipmmu_vmsa_domain *domain;
@@ -950,6 +962,7 @@ static struct iommu_group *ipmmu_find_group(struct device *dev)
 }
 
 static const struct iommu_ops ipmmu_ops = {
+	.capable = ipmmu_capable,
 	.domain_alloc = ipmmu_domain_alloc,
 	.domain_free = ipmmu_domain_free,
 	.attach_dev = ipmmu_attach_device,
-- 
2.7.4
[RFC PATCH v5 2/8] iommu/dma: move iommu_dma_unmap_sg() place
iommu_dma_map_sg() will use the unmap function in the future. To avoid
a forward declaration, this patch moves iommu_dma_unmap_sg() above
iommu_dma_map_sg().

Signed-off-by: Yoshihiro Shimoda
---
 drivers/iommu/dma-iommu.c | 48 +++
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 0dee374..034caae 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -730,6 +730,30 @@ static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
 	__iommu_dma_unmap(dev, dma_handle, size);
 }
 
+static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
+		int nents, enum dma_data_direction dir, unsigned long attrs)
+{
+	dma_addr_t start, end;
+	struct scatterlist *tmp;
+	int i;
+
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		iommu_dma_sync_sg_for_cpu(dev, sg, nents, dir);
+
+	/*
+	 * The scatterlist segments are mapped into a single
+	 * contiguous IOVA allocation, so this is incredibly easy.
+	 */
+	start = sg_dma_address(sg);
+	for_each_sg(sg_next(sg), tmp, nents - 1, i) {
+		if (sg_dma_len(tmp) == 0)
+			break;
+		sg = tmp;
+	}
+	end = sg_dma_address(sg) + sg_dma_len(sg);
+	__iommu_dma_unmap(dev, start, end - start);
+}
+
 /*
  * Prepare a successfully-mapped scatterlist to give back to the caller.
  *
@@ -887,30 +911,6 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 	return 0;
 }
 
-static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
-		int nents, enum dma_data_direction dir, unsigned long attrs)
-{
-	dma_addr_t start, end;
-	struct scatterlist *tmp;
-	int i;
-
-	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-		iommu_dma_sync_sg_for_cpu(dev, sg, nents, dir);
-
-	/*
-	 * The scatterlist segments are mapped into a single
-	 * contiguous IOVA allocation, so this is incredibly easy.
-	 */
-	start = sg_dma_address(sg);
-	for_each_sg(sg_next(sg), tmp, nents - 1, i) {
-		if (sg_dma_len(tmp) == 0)
-			break;
-		sg = tmp;
-	}
-	end = sg_dma_address(sg) + sg_dma_len(sg);
-	__iommu_dma_unmap(dev, start, end - start);
-}
-
 static dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
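The walk in iommu_dma_unmap_sg() above relies on two invariants of the map path: all segments were packed into one contiguous IOVA allocation, and `__finalise_sg()` leaves trailing unmerged entries with a DMA length of zero, which is why the loop stops at the first zero-length segment. A userspace model of that range computation, with the scatterlist reduced to parallel arrays (an assumption for the sketch, not the kernel data structure):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the [start, end) computation in iommu_dma_unmap_sg().
 * dma_addr/dma_len stand in for sg_dma_address()/sg_dma_len();
 * trailing zero-length entries mark the end of the merged mapping.
 * Returns the size of the single contiguous range to unmap and
 * stores its start address in *start_out. */
static uint64_t unmap_range_model(const uint64_t *dma_addr,
				  const size_t *dma_len, int nents,
				  uint64_t *start_out)
{
	int last = 0;

	/* Walk from the second entry, remembering the last mapped one. */
	for (int i = 1; i < nents; i++) {
		if (dma_len[i] == 0)
			break;
		last = i;
	}

	*start_out = dma_addr[0];
	return dma_addr[last] + dma_len[last] - dma_addr[0];
}
```

Because the allocation is contiguous, a single `__iommu_dma_unmap(dev, start, size)` call suffices regardless of how many segments the caller originally passed in.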
[RFC PATCH v5 6/8] mmc: tmio: Add a definition for default max_segs
This patch adds a definition for the default max_segs value so that it
can be used by another driver (renesas_sdhi) in the future.

Signed-off-by: Yoshihiro Shimoda
Reviewed-by: Wolfram Sang
---
 drivers/mmc/host/tmio_mmc.h      | 1 +
 drivers/mmc/host/tmio_mmc_core.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/host/tmio_mmc.h b/drivers/mmc/host/tmio_mmc.h
index c5ba13f..9e387be 100644
--- a/drivers/mmc/host/tmio_mmc.h
+++ b/drivers/mmc/host/tmio_mmc.h
@@ -106,6 +106,7 @@
 #define TMIO_MASK_IRQ	(TMIO_MASK_READOP | TMIO_MASK_WRITEOP | TMIO_MASK_CMD)
 
 #define TMIO_MAX_BLK_SIZE 512
+#define TMIO_DEFAULT_MAX_SEGS 32
 
 struct tmio_mmc_data;
 struct tmio_mmc_host;
diff --git a/drivers/mmc/host/tmio_mmc_core.c b/drivers/mmc/host/tmio_mmc_core.c
index c9f6a59..af1343e 100644
--- a/drivers/mmc/host/tmio_mmc_core.c
+++ b/drivers/mmc/host/tmio_mmc_core.c
@@ -1185,7 +1185,7 @@ int tmio_mmc_host_probe(struct tmio_mmc_host *_host)
 	mmc->caps |= MMC_CAP_4_BIT_DATA | pdata->capabilities;
 	mmc->caps2 |= pdata->capabilities2;
-	mmc->max_segs = pdata->max_segs ? : 32;
+	mmc->max_segs = pdata->max_segs ? : TMIO_DEFAULT_MAX_SEGS;
 	mmc->max_blk_size = TMIO_MAX_BLK_SIZE;
 	mmc->max_blk_count = pdata->max_blk_count ? :
 		(PAGE_SIZE / mmc->max_blk_size) * mmc->max_segs;
-- 
2.7.4
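The `pdata->max_segs ? : TMIO_DEFAULT_MAX_SEGS` expression in the hunk above uses the GNU C "conditional with omitted operand" extension (`x ? : y`), which is common in kernel code: it evaluates `x` once and yields `x` if it is non-zero, otherwise `y`. A minimal sketch of the fallback behaviour (the function name is invented for illustration; this requires GCC or Clang, since `? :` is not standard C):

```c
#include <assert.h>

#define TMIO_DEFAULT_MAX_SEGS 32

/* Model of the max_segs selection: a platform-data value of 0 means
 * "not specified", so fall back to the driver default. */
static unsigned int pick_max_segs(unsigned int pdata_max_segs)
{
	/* GNU extension: equivalent to (x ? x : y), but x is
	 * evaluated only once. */
	return pdata_max_segs ? : TMIO_DEFAULT_MAX_SEGS;
}
```

Naming the constant instead of the literal `32` matters here precisely because a second driver (renesas_sdhi) will reuse it; with a bare literal, the two copies could silently drift apart.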
[RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING
This patch adds a new capability, IOMMU_CAP_MERGING, used to check
whether the IOVA will be strictly contiguous when a device requires it
and the IOMMU driver has the capability.

Signed-off-by: Yoshihiro Shimoda
---
 drivers/iommu/dma-iommu.c | 26 --
 include/linux/iommu.h     |  1 +
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 034caae..ecf1a04 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -847,11 +847,16 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 	dma_addr_t iova;
 	size_t iova_len = 0;
 	unsigned long mask = dma_get_seg_boundary(dev);
-	int i;
+	int i, ret;
+	bool iova_contiguous = false;
 
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
 		iommu_dma_sync_sg_for_device(dev, sg, nents, dir);
 
+	if (dma_get_iova_contiguous(dev) &&
+	    iommu_capable(dev->bus, IOMMU_CAP_MERGING))
+		iova_contiguous = true;
+
 	/*
 	 * Work out how much IOVA space we need, and align the segments to
 	 * IOVA granules for the IOMMU driver to handle. With some clever
@@ -867,6 +872,13 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 		sg_dma_len(s) = s_length;
 		s->offset -= s_iova_off;
 		s_length = iova_align(iovad, s_length + s_iova_off);
+		/*
+		 * Check whether the IOVA would be strictly contiguous if
+		 * the device requires it and the IOMMU driver has the
+		 * capability.
+		 */
+		if (iova_contiguous && i > 0 &&
+		    (s_iova_off || s->length != s_length))
+			return 0;
 		s->length = s_length;
 
 		/*
@@ -902,8 +914,18 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 	if (iommu_map_sg(domain, iova, sg, nents, prot) < iova_len)
 		goto out_free_iova;
 
-	return __finalise_sg(dev, sg, nents, iova);
+	ret = __finalise_sg(dev, sg, nents, iova);
+	/*
+	 * Check whether the sg entry is single if the device requires it
+	 * and the IOMMU driver has the capability.
+	 */
+	if (iova_contiguous && ret != 1)
+		goto out_unmap_sg;
+
+	return ret;
+out_unmap_sg:
+	iommu_dma_unmap_sg(dev, sg, nents, dir, attrs);
 out_free_iova:
 	iommu_dma_free_iova(cookie, iova, iova_len);
 out_restore_sg:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 91af22a..f971dd3 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -104,6 +104,7 @@ enum iommu_cap {
 					   transactions */
 	IOMMU_CAP_INTR_REMAP,	/* IOMMU supports interrupt isolation */
 	IOMMU_CAP_NOEXEC,	/* IOMMU_NOEXEC flag */
+	IOMMU_CAP_MERGING,	/* IOMMU supports segments merging */
 };
 
 /*
-- 
2.7.4
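The per-segment rejection added to iommu_dma_map_sg() above can be modelled in isolation. The sketch below is a userspace approximation, not the kernel code: the granule size is fixed at 4 KiB (in the kernel it comes from the iova domain), and the scatterlist is again reduced to offset/length arrays. It captures the rule the patch enforces: when strict contiguity is requested, every segment after the first must start at a granule boundary and every segment before the last must fill whole granules, otherwise the merged IOVA range would contain gaps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GRANULE 4096UL	/* assumed IOVA granule for this model */

/* Round a length up to the IOVA granule, like iova_align(). */
static size_t iova_align_model(size_t len)
{
	return (len + GRANULE - 1) & ~(GRANULE - 1);
}

/* Model of the contiguity precheck: mirrors the
 * `i > 0 && (s_iova_off || s->length != s_length)` test. */
static bool sg_can_be_contiguous(const size_t *offset,
				 const size_t *length, int nents)
{
	for (int i = 0; i < nents; i++) {
		size_t s_iova_off = offset[i] & (GRANULE - 1);
		size_t aligned = iova_align_model(length[i] + s_iova_off);

		if (i > 0 && (s_iova_off || length[i] != aligned))
			return false;	/* the real code returns 0 (map failure) */
	}
	return true;
}
```

Note the check is skipped for `i == 0`: a leading intra-granule offset and a short final segment are both tolerated, since they do not create a hole between segments.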
[RFC PATCH v5 0/8] treewide: improve R-Car SDHI performance
This patch series is based on iommu.git / next branch.

Since the SDHI host internal DMAC of R-Car Gen3 cannot handle two or
more segments, the performance rate (especially eMMC HS400 reading) is
not good. However, if IOMMU is enabled on the DMAC, IOMMU will map
multiple scatter-gather buffers as one contiguous iova, so the DMAC can
handle that iova and performance can improve. In fact, I measured the
performance with bonnie++: the "Sequential Input - block" rate improved
on r8a7795.

To achieve this, this patch series modifies the DMA MAPPING and IOMMU
subsystems first. Since I'd like feedback from each subsystem about
whether this approach is acceptable for upstream, I submit the series
treewide as an RFC.

Changes from v4:
 - [DMA MAPPING] Add a new device_dma_parameters for iova contiguous.
 - [IOMMU] Add a new capable for "merging" segments.
 - [IOMMU] Add a capable ops into the ipmmu-vmsa driver.
 - [MMC] Sort headers in renesas_sdhi_core.c.
 - [MMC] Remove the following code, added in v3, that can now be
   achieved by the DMA MAPPING and IOMMU subsystems:
   -- Check if R-Car Gen3 IPMMU is used or not on patch 3.
   -- Check if all multiple segment buffers are aligned to PAGE_SIZE
      on patch 3.
 https://patchwork.kernel.org/project/linux-renesas-soc/list/?series=125593

Changes from v3:
 - Use a helper function device_iommu_mapped on patch 1 and 3.
 - Check if R-Car Gen3 IPMMU is used or not on patch 3.
 - Check if all multiple segment buffers are aligned to PAGE_SIZE on patch 3.
 - Add Reviewed-by from Wolfram-san on patch 1 and 2. Note that I also
   got his Reviewed-by on patch 3, but I changed it from v2, so I didn't
   add his Reviewed-by at this time.
 https://patchwork.kernel.org/project/linux-renesas-soc/list/?series=120985

Changes from v2:
 - Add some conditions in the init_card().
 - Add a comment in the init_card().
 - Add definitions for some "MAX_SEGS".
 https://patchwork.kernel.org/project/linux-renesas-soc/list/?series=116729

Changes from v1:
 - Remove adding init_card ops into struct tmio_mmc_dma_ops and
   tmio_mmc_host and just set init_card on renesas_sdhi_core.c.
 - Revise typos on "mmc: tmio: No memory size limitation if runs on IOMMU".
 - Add Simon-san's Reviewed-by on a tmio patch.
 https://patchwork.kernel.org/project/linux-renesas-soc/list/?series=110485

Yoshihiro Shimoda (8):
  dma-mapping: add a device driver helper for iova contiguous
  iommu/dma: move iommu_dma_unmap_sg() place
  iommu: add a new capable IOMMU_CAP_MERGING
  iommu/ipmmu-vmsa: add capable ops
  mmc: tmio: No memory size limitation if runs on IOMMU
  mmc: tmio: Add a definition for default max_segs
  mmc: renesas_sdhi: sort headers
  mmc: renesas_sdhi: use multiple segments if possible

 drivers/iommu/dma-iommu.c                     | 74 +--
 drivers/iommu/ipmmu-vmsa.c                    | 13 +
 drivers/mmc/host/renesas_sdhi_core.c          | 43 +---
 drivers/mmc/host/renesas_sdhi_internal_dmac.c |  4 ++
 drivers/mmc/host/tmio_mmc.h                   |  1 +
 drivers/mmc/host/tmio_mmc_core.c              |  7 +--
 include/linux/device.h                        |  1 +
 include/linux/dma-mapping.h                   | 16 ++
 include/linux/iommu.h                         |  1 +
 9 files changed, 123 insertions(+), 37 deletions(-)

-- 
2.7.4
RE: [PATCH v2 2/4] iommu: Introduce device fault data
> From: Jacob Pan
> Sent: Tuesday, June 4, 2019 6:09 AM
>
> On Mon, 3 Jun 2019 15:57:47 +0100
> Jean-Philippe Brucker wrote:
>
> > +/**
> > + * struct iommu_fault_page_request - Page Request data
> > + * @flags: encodes whether the corresponding fields are valid and
> > + *         whether this is the last page in group
> > + *         (IOMMU_FAULT_PAGE_REQUEST_* values)
> > + * @pasid: Process Address Space ID
> > + * @grpid: Page Request Group Index
> > + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
> > + * @addr: page address
> > + * @private_data: device-specific private information
> > + */
> > +struct iommu_fault_page_request {
> > +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID	(1 << 0)
> > +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE	(1 << 1)
> > +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA	(1 << 2)
> > +	__u32	flags;
> > +	__u32	pasid;
> > +	__u32	grpid;
> > +	__u32	perm;
> > +	__u64	addr;
> > +	__u64	private_data[2];
> > +};
> > +
>
> Just a thought, for non-identity G-H PASID management: we could pass on
> the guest PASID in the PRQ to save a lookup in QEMU. In this case, QEMU
> allocates a GPASID for the vIOMMU, then a host PASID for the pIOMMU.
> QEMU has a G->H lookup. When a PRQ comes in to the pIOMMU with an
> HPASID, the IOMMU driver can retrieve the GPASID from the bind data,
> then report it to the guest via VFIO. In this case QEMU does not need
> to do a H->G PASID lookup.
>
> Should we add a gpasid field here? Or we can add a flag and field
> later, up to you.
>

Can private_data serve this purpose? It's better not to introduce gpasid
awareness within the host IOMMU driver. It is just user-level data
associated with a PASID when binding happens. The kernel doesn't care
about the actual meaning; it simply records it and then returns it back
to user space later upon device fault. QEMU interprets the meaning as
gpasid in its own context; other usages may use it for other purposes.

Thanks
Kevin
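Kevin's suggestion can be sketched as a data-flow: user space stashes the guest PASID in the opaque `private_data` at bind time, and the kernel echoes it back verbatim in the fault report without ever interpreting it. The model below is hypothetical userspace code, not the real VFIO/IOMMU uAPI; the struct layouts and helper names are invented for illustration, and "gpasid lives in word 0" is purely a QEMU-side convention in this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* What user space hands the kernel at bind time (model). */
struct bind_record {
	uint32_t hpasid;		/* host PASID the kernel works with */
	uint64_t private_data[2];	/* opaque cookie: kernel never parses it */
};

/* What the kernel reports back on a page request fault (model). */
struct fault_report {
	uint32_t pasid;			/* host PASID from the PRQ */
	uint64_t private_data[2];	/* copied verbatim from the bind */
};

/* Kernel side: look up the bind by HPASID (elided here) and copy the
 * cookie through without interpreting it. */
static void report_fault(const struct bind_record *bind,
			 struct fault_report *rep)
{
	rep->pasid = bind->hpasid;
	memcpy(rep->private_data, bind->private_data,
	       sizeof(rep->private_data));
}

/* User side: this QEMU-like consumer chose to store the guest PASID
 * in word 0 of the cookie at bind time. */
static uint32_t fault_gpasid(const struct fault_report *rep)
{
	return (uint32_t)rep->private_data[0];
}
```

This is the design point of the thread: the H->G translation cost is paid once at bind time (by choosing what to put in the cookie), not on every fault, and the host driver stays free of guest-PASID semantics.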
Re: [PATCH v2 3/3] xen/swiotlb: remember having called xen_create_contiguous_region()
On 31.05.19 13:38, Juergen Gross wrote:
> On 30/05/2019 14:46, Boris Ostrovsky wrote:
>> On 5/30/19 5:04 AM, Christoph Hellwig wrote:
>>> Please don't add your private flag to page-flags.h. The whole point
>>> of the private flag is that you can use it in any way you want
>>> without touching the common code.
>>
>> There is already a bunch of aliases for various sub-components
>> (including Xen) in page-flags.h for private flags, which is why I
>> suggested we do the same for the new use case. Adding this new alias
>> will keep flag usage consistent.
>
> What about me adding another patch moving those Xen private aliases
> into arch/x86/include/asm/xen/page.h ?

This is becoming difficult. I'd need to remove the "#undef PF_NO_COMPOUND"
from page-flags.h or to #include a (new) xen/page-flags.h from
page-flags.h after all the defines are ready.

Is that really worth the effort given that other components (e.g. file
systems) are doing the same?

Juergen
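The aliasing Boris refers to is the pattern where several subsystems reuse the same private page-flag bit under their own name (as `include/linux/page-flags.h` does with, e.g., `PG_owner_priv_1`). The sketch below is a userspace model of that idea only, not kernel code: the enum names mimic the kernel's style, and `PG_xen_remapped` is used here as an illustrative alias name for the flag discussed in this thread. The key property is that an alias is just a second name for the same bit, so no new storage or common-code change is needed.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of page-flags aliasing: two enumerators share one bit index. */
enum pageflags_model {
	PG_locked,
	PG_owner_priv_1,
	/* Subsystem alias for the private bit, page-flags.h style: */
	PG_xen_remapped = PG_owner_priv_1,
};

struct page_model {
	unsigned long flags;
};

static void set_flag(struct page_model *p, enum pageflags_model f)
{
	p->flags |= 1UL << f;
}

static bool test_flag(const struct page_model *p, enum pageflags_model f)
{
	return p->flags & (1UL << f);
}
```

Christoph's objection and Juergen's reply both follow from this property: because the alias resolves to a shared bit, each definition site must also pin down which page types may use it (the `PF_*` policy wrappers in page-flags.h), which is what makes moving the aliases out of the common header awkward.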
Re: [PATCH] of/device: add blacklist for iommu dma_ops
On Mon, Jun 3, 2019 at 7:48 PM Rob Clark wrote: > > On Sun, Jun 2, 2019 at 11:25 PM Tomasz Figa wrote: > > > > On Mon, Jun 3, 2019 at 4:40 AM Rob Clark wrote: > > > > > > On Fri, May 10, 2019 at 7:35 AM Rob Clark wrote: > > > > > > > > On Tue, Dec 4, 2018 at 2:29 PM Rob Herring wrote: > > > > > > > > > > On Sat, Dec 1, 2018 at 10:54 AM Rob Clark wrote: > > > > > > > > > > > > This solves a problem we see with drm/msm, caused by getting > > > > > > iommu_dma_ops while we attach our own domain and manage it directly > > > > > > at > > > > > > the iommu API level: > > > > > > > > > > > > [0038] user address but active_mm is swapper > > > > > > Internal error: Oops: 9605 [#1] PREEMPT SMP > > > > > > Modules linked in: > > > > > > CPU: 7 PID: 70 Comm: kworker/7:1 Tainted: GW > > > > > > 4.19.3 #90 > > > > > > Hardware name: xxx (DT) > > > > > > Workqueue: events deferred_probe_work_func > > > > > > pstate: 80c9 (Nzcv daif +PAN +UAO) > > > > > > pc : iommu_dma_map_sg+0x7c/0x2c8 > > > > > > lr : iommu_dma_map_sg+0x40/0x2c8 > > > > > > sp : ff80095eb4f0 > > > > > > x29: ff80095eb4f0 x28: > > > > > > x27: ffc0f9431578 x26: > > > > > > x25: x24: 0003 > > > > > > x23: 0001 x22: ffc0fa9ac010 > > > > > > x21: x20: ffc0fab40980 > > > > > > x19: ffc0fab40980 x18: 0003 > > > > > > x17: 01c4 x16: 0007 > > > > > > x15: 000e x14: > > > > > > x13: x12: 0028 > > > > > > x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f > > > > > > x9 : x8 : ffc0fab409a0 > > > > > > x7 : x6 : 0002 > > > > > > x5 : 0001 x4 : > > > > > > x3 : 0001 x2 : 0002 > > > > > > x1 : ffc0f9431578 x0 : > > > > > > Process kworker/7:1 (pid: 70, stack limit = 0x17d08ffb) > > > > > > Call trace: > > > > > >iommu_dma_map_sg+0x7c/0x2c8 > > > > > >__iommu_map_sg_attrs+0x70/0x84 > > > > > >get_pages+0x170/0x1e8 > > > > > >msm_gem_get_iova+0x8c/0x128 > > > > > >_msm_gem_kernel_new+0x6c/0xc8 > > > > > >msm_gem_kernel_new+0x4c/0x58 > > > > > >dsi_tx_buf_alloc_6g+0x4c/0x8c > > > > > >msm_dsi_host_modeset_init+0xc8/0x108 > > > > > 
>msm_dsi_modeset_init+0x54/0x18c > > > > > >_dpu_kms_drm_obj_init+0x430/0x474 > > > > > >dpu_kms_hw_init+0x5f8/0x6b4 > > > > > >msm_drm_bind+0x360/0x6c8 > > > > > >try_to_bring_up_master.part.7+0x28/0x70 > > > > > >component_master_add_with_match+0xe8/0x124 > > > > > >msm_pdev_probe+0x294/0x2b4 > > > > > >platform_drv_probe+0x58/0xa4 > > > > > >really_probe+0x150/0x294 > > > > > >driver_probe_device+0xac/0xe8 > > > > > >__device_attach_driver+0xa4/0xb4 > > > > > >bus_for_each_drv+0x98/0xc8 > > > > > >__device_attach+0xac/0x12c > > > > > >device_initial_probe+0x24/0x30 > > > > > >bus_probe_device+0x38/0x98 > > > > > >deferred_probe_work_func+0x78/0xa4 > > > > > >process_one_work+0x24c/0x3dc > > > > > >worker_thread+0x280/0x360 > > > > > >kthread+0x134/0x13c > > > > > >ret_from_fork+0x10/0x18 > > > > > > Code: d284 91000725 6b17039f 5400048a (f9401f40) > > > > > > ---[ end trace f22dda57f3648e2c ]--- > > > > > > Kernel panic - not syncing: Fatal exception > > > > > > SMP: stopping secondary CPUs > > > > > > Kernel Offset: disabled > > > > > > CPU features: 0x0,22802a18 > > > > > > Memory Limit: none > > > > > > > > > > > > The problem is that when drm/msm does it's own > > > > > > iommu_attach_device(), > > > > > > now the domain returned by iommu_get_domain_for_dev() is drm/msm's > > > > > > domain, and it doesn't have domain->iova_cookie. > > > > > > > > > > > > We kind of avoided this problem prior to sdm845/dpu because the > > > > > > iommu > > > > > > was attached to the mdp node in dt, which is a child of the toplevel > > > > > > mdss node (which corresponds to the dev passed in dma_map_sg()). > > > > > > But > > > > > > with sdm845, now the iommu is attached at the mdss level so we hit > > > > > > the > > > > > > iommu_dma_ops in dma_map_sg(). > > > > > > > > > > > > But auto allocating/attaching a domain before the driver is probed > > > > > > was > > > > > > already a blocking problem for enabling per-context pagetables for > > > > > > the > > > > > > GPU. 
This problem is also now solved with this patch. > > > > > > > > > > > > Fixes: 97890ba9289c dma-mapping: detect and configure IOMMU in > > > > > > of_dma_configure > > > > > > Tested-by: Douglas Anderson > > > > > > Signed-off-by: Rob Clark > > > > > > --- > > > > > > This is an alternative/replacement for [1]. What it lacks in > > > > > > elegance > > > > > > it makes up for in practicality ;-) > > > > > > > > > > >
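The oops quoted earlier in this thread has a simple shape: `iommu_dma_map_sg()` assumes the device's current domain carries the DMA cookie that the default-domain path allocates, but a driver-managed domain (as drm/msm attaches via `iommu_attach_device()`) has no such cookie, so dereferencing it faults. The model below is a userspace sketch of that precondition, with invented struct names; it returns an error where the real pre-fix code crashed, purely to make the failure mode visible.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the iova allocator state hung off a DMA-API domain. */
struct iova_cookie_model {
	unsigned long base;
};

struct domain_model {
	/* NULL on a driver-owned domain that never went through the
	 * iommu-dma setup path -- the situation in the drm/msm oops. */
	struct iova_cookie_model *iova_cookie;
};

/* Model of the map_sg entry check: 0 on success, -1 when the domain
 * has no DMA cookie (where the unguarded real code dereferenced NULL). */
static int map_sg_model(const struct domain_model *dom)
{
	if (!dom->iova_cookie)
		return -1;
	return 0;
}
```

This is why the patch under discussion blacklists iommu `dma_ops` for such devices: if the driver owns the domain and the IOVA space, the DMA API must not assume its own cookie-backed domain is attached.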