RE: [PATCH v2 2/4] iommu: Introduce device fault data

2019-06-05 Thread Tian, Kevin
> From: Jacob Pan [mailto:jacob.jun@linux.intel.com]
> Sent: Thursday, June 6, 2019 1:38 AM
> 
> On Wed, 5 Jun 2019 08:51:45 +
> "Tian, Kevin"  wrote:
> 
> > > From: Jacob Pan
> > > Sent: Tuesday, June 4, 2019 6:09 AM
> > >
> > > On Mon,  3 Jun 2019 15:57:47 +0100
> > > Jean-Philippe Brucker  wrote:
> > >
> > > > +/**
> > > > + * struct iommu_fault_page_request - Page Request data
> > > > + * @flags: encodes whether the corresponding fields are valid and
> > > > whether this
> > > > + * is the last page in group (IOMMU_FAULT_PAGE_REQUEST_*
> > > > values)
> > > > + * @pasid: Process Address Space ID
> > > > + * @grpid: Page Request Group Index
> > > > + * @perm: requested page permissions (IOMMU_FAULT_PERM_*
> values)
> > > > + * @addr: page address
> > > > + * @private_data: device-specific private information
> > > > + */
> > > > +struct iommu_fault_page_request {
> > > > +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID   (1 << 0)
> > > > +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE (1 << 1)
> > > > +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA (1 << 2)
> > > > +   __u32   flags;
> > > > +   __u32   pasid;
> > > > +   __u32   grpid;
> > > > +   __u32   perm;
> > > > +   __u64   addr;
> > > > +   __u64   private_data[2];
> > > > +};
> > > > +
> > >
> > > Just a thought, for non-identity G-H PASID management. We could
> > > pass on guest PASID in PRQ to save a lookup in QEMU. In this case,
> > > QEMU allocate a GPASID for vIOMMU then a host PASID for pIOMMU.
> > > QEMU has a G->H lookup. When PRQ comes in to the pIOMMU with
> > > HPASID, IOMMU driver
> > > can retrieve GPASID from the bind data then report to the guest via
> > > VFIO. In this case QEMU does not need to do a H->G PASID lookup.
> > >
> > > Should we add a gpasid field here? or we can add a flag and field
> > > later, up to you.
> > >
> >
> > Can private_data serve this purpose? It's better not introducing
> > gpasid awareness within host IOMMU driver. It is just a user-level
> > data associated with a PASID when binding happens. Kernel doesn't
> > care the actual meaning, simply record it and then return back to
> > user space later upon device fault. Qemu interprets the meaning as
> > gpasid in its own context. otherwise usages may use it for other
> > purpose.
> >
> private_data was intended for device PRQ with private data, part of the
> VT-d PRQ descriptor. For vSVA, we can withhold private_data in the host
> then respond back when page response from the guest matches pending PRQ
> with the data withheld. But for in-kernel PRQ reporting, private data
> still might be passed on to any driver who wants to process the PRQ. So
> we can't re-purpose it.

Sure. I just used it as one example of how this could be extended.

> 
> But for in-kernel VDCM driver, it needs a lookup from guest PASID to
> host PASID. I thought you wanted to have IOMMU driver provide such
> service since the knowledge of H-G pasid can be established during
> bind_gpasid time. In that sense, we _do_ have gpasid awareness.
> 

Yes, it makes sense. My original point was that the IOMMU driver itself
doesn't need to know the actual meaning of this field (so it could be
reused for purposes other than gpasid), but you are right that an in-kernel
mdev driver needs to do the G->H translation anyway, so explicitly
defining it looks reasonable.
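
For illustration only, one possible shape of such an explicit field (the
GPASID_VALID flag and gpasid member below are hypothetical and not part of
the posted patch):

	struct iommu_fault_page_request {
	#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID	(1 << 0)
	#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE	(1 << 1)
	#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA	(1 << 2)
	#define IOMMU_FAULT_PAGE_REQUEST_GPASID_VALID	(1 << 3) /* hypothetical */
		__u32	flags;
		__u32	pasid;
		__u32	grpid;
		__u32	perm;
		__u64	addr;
		__u64	private_data[2];
		__u64	gpasid;	/* hypothetical: guest PASID recorded at bind_gpasid time */
	};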

Thanks
Kevin 


Re: [PATCH] iommu/dma: Apply dma_{alloc,free}_contiguous functions

2019-06-05 Thread Christoph Hellwig
Looks fine to me.  Robin, any comments?

On Mon, Jun 03, 2019 at 03:52:59PM -0700, Nicolin Chen wrote:
> This patch replaces dma_{alloc,release}_from_contiguous() with
> dma_{alloc,free}_contiguous() to simplify those function calls.
> 
> Signed-off-by: Nicolin Chen 
> ---
>  drivers/iommu/dma-iommu.c | 14 --
>  1 file changed, 4 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 0cd49c2d3770..cc3d39dbbe1a 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -951,8 +951,8 @@ static void __iommu_dma_free(struct device *dev, size_t 
> size, void *cpu_addr)
>  
>   if (pages)
>   __iommu_dma_free_pages(pages, count);
> - if (page && !dma_release_from_contiguous(dev, page, count))
> - __free_pages(page, get_order(alloc_size));
> + if (page)
> + dma_free_contiguous(dev, page, alloc_size);
>  }
>  
>  static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr,
> @@ -970,12 +970,7 @@ static void *iommu_dma_alloc_pages(struct device *dev, 
> size_t size,
>   struct page *page = NULL;
>   void *cpu_addr;
>  
> - if (gfpflags_allow_blocking(gfp))
> - page = dma_alloc_from_contiguous(dev, alloc_size >> PAGE_SHIFT,
> -  get_order(alloc_size),
> -  gfp & __GFP_NOWARN);
> - if (!page)
> - page = alloc_pages(gfp, get_order(alloc_size));
> + page = dma_alloc_contiguous(dev, alloc_size, gfp);
>   if (!page)
>   return NULL;
>  
> @@ -997,8 +992,7 @@ static void *iommu_dma_alloc_pages(struct device *dev, 
> size_t size,
>   memset(cpu_addr, 0, alloc_size);
>   return cpu_addr;
>  out_free_pages:
> - if (!dma_release_from_contiguous(dev, page, alloc_size >> PAGE_SHIFT))
> - __free_pages(page, get_order(alloc_size));
> + dma_free_contiguous(dev, page, alloc_size);
>   return NULL;
>  }
>  
> -- 
> 2.17.1
---end quoted text---


RE: [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING

2019-06-05 Thread Yoshihiro Shimoda
Hi Christoph,

Thank you for your comments!

> From: Christoph Hellwig, Sent: Wednesday, June 5, 2019 9:38 PM
> 
> On Wed, Jun 05, 2019 at 01:21:59PM +0100, Robin Murphy wrote:
> > And if the problem is really that you're not getting merging because of
> > exposing the wrong parameters to the DMA API and/or the block layer, or
> > that you just can't quite express your requirement to the block layer in
> > the first place, then that should really be tackled at the source rather
> > than worked around further down in the stack.
> 
> The problem is that we need a way to communicate to the block layer
> that more than a single segment is ok IFF the DMA API instance supports
> merging.  And of course the answer will depend on further parameters
> like the maximum merged segment size and alignment for the segment.

I'm afraid I don't understand why we need a way to communicate to
the block layer that more than a single segment is ok IFF the DMA API
instance supports merging.

> We'll need some way to communicate that, but I don't really think this
> series is the way to go.

I should discard the patches 1/8 through 4/8.

> We'd really need something hanging off the device (or through a query
> API) how the dma map ops implementation exposes under what circumstances
> it can merge.  The driver can then communicate that to the block layer
> so that the block layer doesn't split requests up when reaching the
> segment limit.

The block layer already has a per-device limit, "max_segment_size", so
regardless of whether the segments can be merged or not, we can use that limit.
Is my understanding incorrect?
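
For reference, a minimal sketch of the existing knobs being discussed,
assuming "dev" is the DMA device and "q" the request queue (not code from
this series):

	/* Cap the size of a single DMA segment... */
	dma_set_max_seg_size(dev, SZ_4K);		/* example value */
	/* ...and tell the block layer, so it never builds larger segments. */
	blk_queue_max_segment_size(q, dma_get_max_seg_size(dev));
	/* A driver that can only handle one mapped segment per request: */
	blk_queue_max_segments(q, 1);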

Best regards,
Yoshihiro Shimoda



RE: [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING

2019-06-05 Thread Yoshihiro Shimoda
Hi Robin,

Thank you for your comments!

> From: Robin Murphy, Sent: Wednesday, June 5, 2019 9:22 PM

> > @@ -902,8 +914,18 @@ static int iommu_dma_map_sg(struct device *dev, struct 
> > scatterlist *sg,
> > if (iommu_map_sg(domain, iova, sg, nents, prot) < iova_len)
> > goto out_free_iova;
> >
> > -   return __finalise_sg(dev, sg, nents, iova);
> > +   ret = __finalise_sg(dev, sg, nents, iova);
> > +   /*
> > +* Check whether the sg entry is single if a device requires and
> > +* the IOMMU driver has the capable.
> > +*/
> > +   if (iova_contiguous && ret != 1)
> > +   goto out_unmap_sg;
> 
> I don't see that just failing really gives this option any value.
> Clearly the MMC driver has to do *something* to handle the failure (plus
> presumably the case of not having IOMMU DMA ops at all), which begs the
> question of why it couldn't just do whatever that is anyway, without all
> this infrastructure. For starters, it would be a far simpler and less
> invasive patch:
> 
>   if (dma_map_sg(...) > 1) {
>   dma_unmap_sg(...);
>   /* split into multiple requests and try again */
>   }

I understood it.

> But then it would make even more sense to just have the driver be
> proactive about its special requirement in the first place, and simply
> validate the list before it even tries to map it:
> 
>   for_each_sg(sgl, sg, n, i)
>   if ((i > 0 && sg->offset % PAGE_SIZE) ||
>   (i < n - 1 && sg->length % PAGE_SIZE))
>   /* segment will not be mergeable */

In a previous version, I made such code [1].
But I think I misunderstood Christoph's comments [2] [3].

[1]
https://patchwork.kernel.org/patch/10970047/

[2]
https://marc.info/?l=linux-renesas-soc&m=155956751811689&w=2

[3]
https://marc.info/?l=linux-renesas-soc&m=155852814607202&w=2

> For reference, I think v4l2 and possibly some areas of DRM already do
> something vaguely similar to judge whether they get contiguous buffers
> or not.

I see. I'll check these areas later.

> > +
> > +   return ret;
> >
> > +out_unmap_sg:
> > +   iommu_dma_unmap_sg(dev, sg, nents, dir, attrs);
> >   out_free_iova:
> > iommu_dma_free_iova(cookie, iova, iova_len);
> >   out_restore_sg:
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 91af22a..f971dd3 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -104,6 +104,7 @@ enum iommu_cap {
> >transactions */
> > IOMMU_CAP_INTR_REMAP,   /* IOMMU supports interrupt isolation */
> > IOMMU_CAP_NOEXEC,   /* IOMMU_NOEXEC flag */
> > +   IOMMU_CAP_MERGING,  /* IOMMU supports segments merging */
> 
> This isn't a 'capability' of the IOMMU - "segment merging" equates to
> just remapping pages, and there's already a fundamental assumption that
> IOMMUs are capable of that. Plus it's very much a DMA API concept, so
> hardly belongs in the IOMMU API anyway.

I got it.

> All in all, I'm struggling to see the point of this. Although it's not a
> DMA API guarantee, iommu-dma already merges scatterlists as aggressively
> as it is allowed to, and will continue to do so for the foreseeable
> future (since it avoids considerable complication in the IOVA
> allocation), so if you want to make sure iommu_dma_map_sg() merges an
> entire list, just don't give it a non-mergeable list.

Thank you for the explanation. I didn't know that a driver should not
give it a non-mergeable list.

> And if you still
> really really want dma_map_sg() to have a behaviour of "merge to a
> single segment or fail", then that should at least be a DMA API
> attribute, which could in principle be honoured by bounce-buffering
> implementations as well.

I got it. For this patch series, it seems I have to modify the block layer
so that such a new DMA API is not needed, though.
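
Purely to illustrate the "DMA API attribute" direction mentioned above, a
hypothetical sketch (DMA_ATTR_IOVA_CONTIGUOUS is invented here and does not
exist):

	count = dma_map_sg_attrs(dev, sgl, nents, DMA_TO_DEVICE,
				 DMA_ATTR_IOVA_CONTIGUOUS);
	if (count != 1) {
		if (count > 0)
			dma_unmap_sg_attrs(dev, sgl, nents, DMA_TO_DEVICE,
					   DMA_ATTR_IOVA_CONTIGUOUS);
		/* fall back: split the request into smaller chunks and retry */
	}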

> And if the problem is really that you're not getting merging because of
> exposing the wrong parameters to the DMA API and/or the block layer, or
> that you just can't quite express your requirement to the block layer in
> the first place, then that should really be tackled at the source rather
> than worked around further down in the stack.

I'll reply on Christoph email about this topic later.

Best regards,
Yoshihiro Shimoda

> Robin.


Re: [PATCH v8 26/29] vfio-pci: Register an iommu fault handler

2019-06-05 Thread Jacob Pan
On Tue, 4 Jun 2019 18:11:08 +0200
Auger Eric  wrote:

> Hi Alex,
> 
> On 6/4/19 12:31 AM, Alex Williamson wrote:
> > On Sun, 26 May 2019 18:10:01 +0200
> > Eric Auger  wrote:
> >   
> >> This patch registers a fault handler which records faults in
> >> a circular buffer and then signals an eventfd. This buffer is
> >> exposed within the fault region.
> >>
> >> Signed-off-by: Eric Auger 
> >>
> >> ---
> >>
> >> v3 -> v4:
> >> - move iommu_unregister_device_fault_handler to vfio_pci_release
> >> ---
> >>  drivers/vfio/pci/vfio_pci.c | 49
> >> + drivers/vfio/pci/vfio_pci_private.h
> >> |  1 + 2 files changed, 50 insertions(+)
> >>
> >> diff --git a/drivers/vfio/pci/vfio_pci.c
> >> b/drivers/vfio/pci/vfio_pci.c index f75f61127277..52094ba8
> >> 100644 --- a/drivers/vfio/pci/vfio_pci.c
> >> +++ b/drivers/vfio/pci/vfio_pci.c
> >> @@ -30,6 +30,7 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >> +#include 
> >>  
> >>  #include "vfio_pci_private.h"
> >>  
> >> @@ -296,6 +297,46 @@ static const struct vfio_pci_regops
> >> vfio_pci_fault_prod_regops = { .add_capability =
> >> vfio_pci_fault_prod_add_capability, };
> >>  
> >> +int vfio_pci_iommu_dev_fault_handler(struct iommu_fault_event
> >> *evt, void *data) +{
> >> +  struct vfio_pci_device *vdev = (struct vfio_pci_device *)
> >> data;
> >> +  struct vfio_region_fault_prod *prod_region =
> >> +  (struct vfio_region_fault_prod
> >> *)vdev->fault_pages;
> >> +  struct vfio_region_fault_cons *cons_region =
> >> +  (struct vfio_region_fault_cons
> >> *)(vdev->fault_pages + 2 * PAGE_SIZE);
> >> +  struct iommu_fault *new =
> >> +  (struct iommu_fault *)(vdev->fault_pages +
> >> prod_region->offset +
> >> +  prod_region->prod *
> >> prod_region->entry_size);
> >> +  int prod, cons, size;
> >> +
> >> +  mutex_lock(&vdev->fault_queue_lock);
> >> +
> >> +  if (!vdev->fault_abi)
> >> +  goto unlock;
> >> +
> >> +  prod = prod_region->prod;
> >> +  cons = cons_region->cons;
> >> +  size = prod_region->nb_entries;
> >> +
> >> +  if (CIRC_SPACE(prod, cons, size) < 1)
> >> +  goto unlock;
> >> +
> >> +  *new = evt->fault;
> >> +  prod = (prod + 1) % size;
> >> +  prod_region->prod = prod;
> >> +  mutex_unlock(&vdev->fault_queue_lock);
> >> +
> >> +  mutex_lock(&vdev->igate);
> >> +  if (vdev->dma_fault_trigger)
> >> +  eventfd_signal(vdev->dma_fault_trigger, 1);
> >> +  mutex_unlock(&vdev->igate);
> >> +  return 0;
> >> +
> >> +unlock:
> >> +  mutex_unlock(&vdev->fault_queue_lock);
> >> +  return -EINVAL;
> >> +}
> >> +
> >>  static int vfio_pci_init_fault_region(struct vfio_pci_device
> >> *vdev) {
> >>struct vfio_region_fault_prod *header;
> >> @@ -328,6 +369,13 @@ static int vfio_pci_init_fault_region(struct
> >> vfio_pci_device *vdev) header = (struct vfio_region_fault_prod
> >> *)vdev->fault_pages; header->version = -1;
> >>header->offset = PAGE_SIZE;
> >> +
> >> +  ret =
> >> iommu_register_device_fault_handler(&vdev->pdev->dev,
> >> +
> >> vfio_pci_iommu_dev_fault_handler,
> >> +  vdev);
> >> +  if (ret)
> >> +  goto out;
> >> +
> >>return 0;
> >>  out:
> >>kfree(vdev->fault_pages);
> >> @@ -570,6 +618,7 @@ static void vfio_pci_release(void *device_data)
> >>if (!(--vdev->refcnt)) {
> >>vfio_spapr_pci_eeh_release(vdev->pdev);
> >>vfio_pci_disable(vdev);
> >> +
> >> iommu_unregister_device_fault_handler(&vdev->pdev->dev);  
> > 
> > 
> > But this can fail if there are pending faults which leaves a device
> > reference and then the system is broken :(  
> This series only features unrecoverable errors, and for those the
> unregistration cannot fail. Now that unrecoverable errors were added, I
> admit this is confusing. We need to sort this out or clean up the
> dependencies.
As Alex pointed out in 4/29, we can make
iommu_unregister_device_fault_handler() never fail and clean up all the
pending faults in the host IOMMU belonging to that device. But the problem
is that if a fault, such as a PRQ, has already been injected into the
guest, the page response may come back after the handler is unregistered
and registered again. We need a way to reject such page responses belonging
to the previous life of the handler. Perhaps a sync call to the guest
with your fault queue eventfd? I am not sure.
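
A hypothetical sketch of the "reject stale responses" idea (none of these
types or fields exist in the posted series): tag each reported fault with
the handler's registration generation and drop page responses whose
generation no longer matches.

	struct fault_handler_state {
		u64 generation;		/* bumped on every (un)registration */
	};

	struct pending_fault {
		u64 generation;		/* copied from the handler when reported */
		u32 pasid;
		u32 grpid;
	};

	static bool page_response_is_stale(const struct fault_handler_state *h,
					   const struct pending_fault *f)
	{
		return f->generation != h->generation;
	}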

Jacob


Re: [PATCH v2 2/4] iommu: Introduce device fault data

2019-06-05 Thread Jacob Pan
On Wed, 5 Jun 2019 12:24:09 +0100
Jean-Philippe Brucker  wrote:

> On 05/06/2019 09:51, Tian, Kevin wrote:
> >> From: Jacob Pan
> >> Sent: Tuesday, June 4, 2019 6:09 AM
> >>
> >> On Mon,  3 Jun 2019 15:57:47 +0100
> >> Jean-Philippe Brucker  wrote:
> >>  
> >>> +/**
> >>> + * struct iommu_fault_page_request - Page Request data
> >>> + * @flags: encodes whether the corresponding fields are valid and
> >>> whether this
> >>> + * is the last page in group (IOMMU_FAULT_PAGE_REQUEST_*
> >>> values)
> >>> + * @pasid: Process Address Space ID
> >>> + * @grpid: Page Request Group Index
> >>> + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
> >>> + * @addr: page address
> >>> + * @private_data: device-specific private information
> >>> + */
> >>> +struct iommu_fault_page_request {
> >>> +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID (1 << 0)
> >>> +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE   (1 << 1)
> >>> +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA   (1 << 2)
> >>> + __u32   flags;
> >>> + __u32   pasid;
> >>> + __u32   grpid;
> >>> + __u32   perm;
> >>> + __u64   addr;
> >>> + __u64   private_data[2];
> >>> +};
> >>> +  
> >>
> >> Just a thought, for non-identity G-H PASID management. We could
> >> pass on guest PASID in PRQ to save a lookup in QEMU. In this case,
> >> QEMU allocate a GPASID for vIOMMU then a host PASID for pIOMMU.
> >> QEMU has a G->H lookup. When PRQ comes in to the pIOMMU with
> >> HPASID, IOMMU driver
> >> can retrieve GPASID from the bind data then report to the guest via
> >> VFIO. In this case QEMU does not need to do a H->G PASID lookup.
> >>
> >> Should we add a gpasid field here? or we can add a flag and field
> >> later, up to you.
> >>  
> > 
> > Can private_data serve this purpose?  
> 
> Isn't private_data already used for VT-d's Private Data field?
> 
Yes, as part of the PRQ. Please see my explanation in the previous
email.
> > It's better not introducing
> > gpasid awareness within host IOMMU driver. It is just a user-level
> > data associated with a PASID when binding happens. Kernel doesn't
> > care the actual meaning, simply record it and then return back to
> > user space later upon device fault. Qemu interprets the meaning as
> > gpasid in its own context. otherwise usages may use it for other
> > purpose.  
> 
> Regarding a gpasid field I don't mind either way, but extending the
> iommu_fault structure later won't be completely straightforward so we
> could add some padding now.
> 
> Userspace negotiate the iommu_fault struct format with VFIO, before
> allocating a circular buffer of N fault structures
> ().
> So adding new fields requires introducing a new ABI version and a
> struct iommu_fault_v2. That may be OK for disruptive changes, but
> just adding a new field indicated by a flag shouldn't have to be that
> complicated.
> 
> How about setting the iommu_fault structure to 128 bytes?
> 
> struct iommu_fault {
>   __u32   type;
>   __u32   padding;
>   union {
>   struct iommu_fault_unrecoverable event;
>   struct iommu_fault_page_request prm;
>   __u8 padding2[120];
>   };
> };
> 
> Given that @prm is currently 40 bytes and @event 32 bytes, the padding
> allows either of them to grow 10 new 64-bit fields (or 20 new 32-bit
> fields, which is still representable with new flags) before we have to
> upgrade the ABI version.
> 
> A 4kB and a 64kB queue can hold respectively:
> 
> * 85 and 1365 records when iommu_fault is 48 bytes (current format).
> * 64 and 1024 records when iommu_fault is 64 bytes (but allows to grow
> only 2 new 64-bit fields).
> * 32 and 512 records when iommu_fault is 128 bytes.
> 
> In comparison,
> * the SMMU even queue can hold 128 and 2048 events respectively at
> those sizes (and is allowed to grow up to 524k entries)
> * the SMMU PRI queue can hold 256 and 4096 PR.
> 
> But the SMMU queues have to be physically contiguous, whereas our
> fault queues are in userspace memory which is less expensive. So
> 128-byte records might be reasonable. What do you think?
> 
I think 128 bytes is large enough for any future extension, but
64B might be good enough and it is a cache line. A PCI page request msg
is only 16B :)

VT-d currently uses one 4K page for the PRQ, which holds 128 PRQ
descriptors. This can grow to 16K entries per the spec. That is per IOMMU.
The user fault queue here is per device, so we do have to be frugal
about it since it will support mdev at the per-PASID level at some point?

I have to look into Eric's patchset on how he handles queue full in the
producer. If we go with 128B size in iommu_fault and 4KB size queue
(32 entries as in your table), VT-d PRQ size of 128 entries can
potentially cause queue full. We have to handle this VFIO queue full
differently than the IOMMU queue full in that we only need to discard
PRQ for one device. (Whereas IOMMU queue full has to clear out all).

Anyway, I think 64B should be enough but 128B is fine too. We have to
de

[RFC 2/2] iommu: arm-smmu: Don't blindly use first SMR to calculate mask

2019-06-05 Thread Bjorn Andersson
With the SMRs inherited from the bootloader the first SMR might actually
be valid and in use. As such probing the SMR mask using the first SMR
might break a stream in use. Search for an unused stream and use this to
probe the SMR mask.

Signed-off-by: Bjorn Andersson 
---
 drivers/iommu/arm-smmu.c | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c8629a656b42..0c6f5fe6f382 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1084,23 +1084,35 @@ static void arm_smmu_test_smr_masks(struct 
arm_smmu_device *smmu)
 {
void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
u32 smr;
+   int idx;
 
if (!smmu->smrs)
return;
 
+   for (idx = 0; idx < smmu->num_mapping_groups; idx++) {
+   smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(idx));
+   if (!(smr & SMR_VALID))
+   break;
+   }
+
+   if (idx == smmu->num_mapping_groups) {
+   dev_err(smmu->dev, "Unable to compute streamid_mask\n");
+   return;
+   }
+
/*
 * SMR.ID bits may not be preserved if the corresponding MASK
 * bits are set, so check each one separately. We can reject
 * masters later if they try to claim IDs outside these masks.
 */
smr = smmu->streamid_mask << SMR_ID_SHIFT;
-   writel_relaxed(smr, gr0_base + ARM_SMMU_GR0_SMR(0));
-   smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(0));
+   writel_relaxed(smr, gr0_base + ARM_SMMU_GR0_SMR(idx));
+   smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(idx));
smmu->streamid_mask = smr >> SMR_ID_SHIFT;
 
smr = smmu->streamid_mask << SMR_MASK_SHIFT;
-   writel_relaxed(smr, gr0_base + ARM_SMMU_GR0_SMR(0));
-   smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(0));
+   writel_relaxed(smr, gr0_base + ARM_SMMU_GR0_SMR(idx));
+   smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(idx));
smmu->smr_mask_mask = smr >> SMR_MASK_SHIFT;
 }
 
-- 
2.18.0



[RFC 1/2] iommu: arm-smmu: Handoff SMR registers and context banks

2019-06-05 Thread Bjorn Andersson
Boot splash screen or EFI framebuffer requires the display hardware to
operate while the Linux iommu driver probes. Therefore, we cannot simply
wipe out the SMR register settings programmed by the bootloader.

Detect which SMR registers are in use during probe, and which context
banks they are associated with. Reserve these context banks for the
first Linux device whose stream-id matches the SMR register.

Any existing page-tables will be discarded.

Heavily based on downstream implementation by Patrick Daly.

Signed-off-by: Bjorn Andersson 
---
 drivers/iommu/arm-smmu-regs.h |  2 +
 drivers/iommu/arm-smmu.c  | 80 ---
 2 files changed, 77 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h
index e9132a926761..8c1fd84032a2 100644
--- a/drivers/iommu/arm-smmu-regs.h
+++ b/drivers/iommu/arm-smmu-regs.h
@@ -105,7 +105,9 @@
 #define ARM_SMMU_GR0_SMR(n)(0x800 + ((n) << 2))
 #define SMR_VALID  (1 << 31)
 #define SMR_MASK_SHIFT 16
+#define SMR_MASK_MASK  0x7fff
 #define SMR_ID_SHIFT   0
+#define SMR_ID_MASK0x
 
 #define ARM_SMMU_GR0_S2CR(n)   (0xc00 + ((n) << 2))
 #define S2CR_CBNDX_SHIFT   0
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 5e54cc0a28b3..c8629a656b42 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -135,6 +135,7 @@ struct arm_smmu_s2cr {
enum arm_smmu_s2cr_type type;
enum arm_smmu_s2cr_privcfg  privcfg;
u8  cbndx;
+   boolhandoff;
 };
 
 #define s2cr_init_val (struct arm_smmu_s2cr){  \
@@ -399,9 +400,22 @@ static int arm_smmu_register_legacy_master(struct device 
*dev,
return err;
 }
 
-static int __arm_smmu_alloc_bitmap(unsigned long *map, int start, int end)
+static int __arm_smmu_alloc_cb(struct arm_smmu_device *smmu, int start,
+  struct device *dev)
 {
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+   unsigned long *map = smmu->context_map;
+   int end = smmu->num_context_banks;
+   int cbndx;
int idx;
+   int i;
+
+   for_each_cfg_sme(fwspec, i, idx) {
+   if (smmu->s2crs[idx].handoff) {
+   cbndx = smmu->s2crs[idx].cbndx;
+   goto found_handoff;
+   }
+   }
 
do {
idx = find_next_zero_bit(map, end, start);
@@ -410,6 +424,17 @@ static int __arm_smmu_alloc_bitmap(unsigned long *map, int 
start, int end)
} while (test_and_set_bit(idx, map));
 
return idx;
+
+found_handoff:
+   for (i = 0; i < smmu->num_mapping_groups; i++) {
+   if (smmu->s2crs[i].cbndx == cbndx) {
+   smmu->s2crs[i].cbndx = 0;
+   smmu->s2crs[i].handoff = false;
+   smmu->s2crs[i].count--;
+   }
+   }
+
+   return cbndx;
 }
 
 static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
@@ -759,7 +784,8 @@ static void arm_smmu_write_context_bank(struct 
arm_smmu_device *smmu, int idx)
 }
 
 static int arm_smmu_init_domain_context(struct iommu_domain *domain,
-   struct arm_smmu_device *smmu)
+   struct arm_smmu_device *smmu,
+   struct device *dev)
 {
int irq, start, ret = 0;
unsigned long ias, oas;
@@ -873,8 +899,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain 
*domain,
ret = -EINVAL;
goto out_unlock;
}
-   ret = __arm_smmu_alloc_bitmap(smmu->context_map, start,
- smmu->num_context_banks);
+   ret = __arm_smmu_alloc_cb(smmu, start, dev);
if (ret < 0)
goto out_unlock;
 
@@ -1264,7 +1289,7 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
return ret;
 
/* Ensure that the domain is finalised */
-   ret = arm_smmu_init_domain_context(domain, smmu);
+   ret = arm_smmu_init_domain_context(domain, smmu, dev);
if (ret < 0)
goto rpm_put;
 
@@ -1798,6 +1823,49 @@ static void arm_smmu_device_reset(struct arm_smmu_device 
*smmu)
writel(reg, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
 }
 
+static void arm_smmu_read_smr_state(struct arm_smmu_device *smmu)
+{
+   u32 privcfg;
+   u32 cbndx;
+   u32 mask;
+   u32 type;
+   u32 s2cr;
+   u32 smr;
+   u32 id;
+   int i;
+
+   for (i = 0; i < smmu->num_mapping_groups; i++) {
+   smr = readl_relaxed(ARM_SMMU_GR0(smmu) + ARM_SMMU_GR0_SMR(i));
+   mask = (smr >> SMR_MASK_SHIFT) & SMR_MASK_MASK;
+   id = smr & SMR_ID_MASK;
+

[RFC 0/2] iommu: arm-smmu: Inherit SMR and CB config during init

2019-06-05 Thread Bjorn Andersson
Qualcomm devices typically boot with some sort of splash screen on the display,
or for some devices (i.e. recent Qualcomm laptops) an EFI framebuffer. For this
the bootloader allocates a static framebuffer, configures the display hardware
to output this on the display, sets up the SMMU for the display hardware and
jumps to the kernel.

But as the arm-smmu disables all SMRs the display hardware will no longer be
able to access the framebuffer and the result is that the device resets.

The given proposal reads back the SMR state at boot and marks these
contexts as busy. This ensures that the display hardware will have continued
access to the framebuffer. Once a device is attached we try to match it to the
predefined stream mapping, so that e.g. the display driver will inherit the
particular SMRs/CBs.


This has the positive side effect that on some Qualcomm platforms, e.g. QCS404,
writes to the SMR registers are ignored. But as we inherit the preconfigured
mapping from the bootloader we can use the arm-smmu driver on these platforms.

Bjorn Andersson (2):
  iommu: arm-smmu: Handoff SMR registers and context banks
  iommu: arm-smmu: Don't blindly use first SMR to calculate mask

 drivers/iommu/arm-smmu-regs.h |   2 +
 drivers/iommu/arm-smmu.c  | 100 +++---
 2 files changed, 93 insertions(+), 9 deletions(-)

-- 
2.18.0



Re: [PATCH] driver: core: Allow subsystems to continue deferring probe

2019-06-05 Thread Greg Kroah-Hartman
On Wed, Jun 05, 2019 at 04:23:12PM +0200, Thierry Reding wrote:
> From: Thierry Reding 
> 
> Some subsystems, such as pinctrl, allow continuing to defer probe
> indefinitely. This is useful for devices that depend on resources
> provided by devices that are only probed after the init stage.
> 
> One example of this can be seen on Tegra, where the DPAUX hardware
> contains pinmuxing controls for pins that it shares with an I2C
> controller. The I2C controller is typically used for communication
> with a monitor over HDMI (DDC). However, other instances of the I2C
> controller are used to access system critical components, such as a
> PMIC. The I2C controller driver will therefore usually be a builtin
> driver, whereas the DPAUX driver is part of the display driver that
> is loaded from a module to avoid bloating the kernel image with all
> of the DRM/KMS subsystem.
> 
> In this particular case the pins used by this I2C/DDC controller
> become accessible very late in the boot process. However, since the
> controller is only used in conjunction with display, that's not an
> issue.
> 
> Unfortunately the driver core currently outputs a warning message
> when a device fails to get the pinctrl before the end of the init
> stage. That can be confusing for the user because it may sound like
> an unwanted error occurred, whereas it's really an expected and
> harmless situation.
> 
> In order to eliminate this warning, this patch allows callers of the
> driver_deferred_probe_check_state() helper to specify that they want
> to continue deferring probe, regardless of whether we're past the
> init stage or not. All of the callers of that function are updated
> for the new signature, but only the pinctrl subsystem passes a true
> value in the new persist parameter if appropriate.
> 
> Signed-off-by: Thierry Reding 
> ---
>  drivers/base/dd.c| 17 -
>  drivers/base/power/domain.c  |  2 +-
>  drivers/iommu/of_iommu.c |  2 +-
>  drivers/pinctrl/devicetree.c | 10 ++
>  include/linux/device.h   |  2 +-
>  5 files changed, 21 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 0df9b4461766..25ffbadf4187 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -238,23 +238,30 @@ __setup("deferred_probe_timeout=", 
> deferred_probe_timeout_setup);
>  /**
>   * driver_deferred_probe_check_state() - Check deferred probe state
>   * @dev: device to check
> + * @persist: Boolean flag indicating whether drivers should keep trying to
> + *   probe after built-in drivers have had a chance to probe. This is useful
> + *   for built-in drivers that rely on resources provided by modular drivers.

Functionality aside, having random boolean flags in a function parameter
list is a horrid way to have an api.  Now when you see a call to this
function with either "true" or "false" as an option, you have no idea
what that means.  So you need to go look up this definition and then
work backwards.

So even if the idea is sane (I'm not saying that), this implementation
is not acceptable at all.

thanks,

greg k-h
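
One possible shape of a more self-documenting interface along the lines of
the comment above (purely illustrative, not an agreed API):

	enum deferred_probe_mode {
		DEFERRED_PROBE_DEFAULT,		/* give up once initcalls are done */
		DEFERRED_PROBE_PERSIST,		/* keep deferring past late init */
	};

	int driver_deferred_probe_check_state(struct device *dev,
					      enum deferred_probe_mode mode);

	/* Call sites then read as, e.g.:
	 *	driver_deferred_probe_check_state(dev, DEFERRED_PROBE_PERSIST);
	 */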


Re: [PATCH v2 2/4] iommu: Introduce device fault data

2019-06-05 Thread Jacob Pan
On Wed, 5 Jun 2019 08:51:45 +
"Tian, Kevin"  wrote:

> > From: Jacob Pan
> > Sent: Tuesday, June 4, 2019 6:09 AM
> > 
> > On Mon,  3 Jun 2019 15:57:47 +0100
> > Jean-Philippe Brucker  wrote:
> >   
> > > +/**
> > > + * struct iommu_fault_page_request - Page Request data
> > > + * @flags: encodes whether the corresponding fields are valid and
> > > whether this
> > > + * is the last page in group (IOMMU_FAULT_PAGE_REQUEST_*
> > > values)
> > > + * @pasid: Process Address Space ID
> > > + * @grpid: Page Request Group Index
> > > + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
> > > + * @addr: page address
> > > + * @private_data: device-specific private information
> > > + */
> > > +struct iommu_fault_page_request {
> > > +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID (1 << 0)
> > > +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE   (1 << 1)
> > > +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA   (1 << 2)
> > > + __u32   flags;
> > > + __u32   pasid;
> > > + __u32   grpid;
> > > + __u32   perm;
> > > + __u64   addr;
> > > + __u64   private_data[2];
> > > +};
> > > +  
> > 
> > Just a thought, for non-identity G-H PASID management. We could
> > pass on guest PASID in PRQ to save a lookup in QEMU. In this case,
> > QEMU allocate a GPASID for vIOMMU then a host PASID for pIOMMU.
> > QEMU has a G->H lookup. When PRQ comes in to the pIOMMU with
> > HPASID, IOMMU driver
> > can retrieve GPASID from the bind data then report to the guest via
> > VFIO. In this case QEMU does not need to do a H->G PASID lookup.
> > 
> > Should we add a gpasid field here? or we can add a flag and field
> > later, up to you.
> >   
> 
> Can private_data serve this purpose? It's better not introducing
> gpasid awareness within host IOMMU driver. It is just a user-level
> data associated with a PASID when binding happens. Kernel doesn't
> care the actual meaning, simply record it and then return back to
> user space later upon device fault. Qemu interprets the meaning as
> gpasid in its own context. otherwise usages may use it for other
> purpose.
> 
private_data was intended for device PRQ with private data, part of the
VT-d PRQ descriptor. For vSVA, we can withhold private_data in the host
then respond back when page response from the guest matches pending PRQ
with the data withheld. But for in-kernel PRQ reporting, private data
still might be passed on to any driver who wants to process the PRQ. So
we can't re-purpose it.

But for in-kernel VDCM driver, it needs a lookup from guest PASID to
host PASID. I thought you wanted to have IOMMU driver provide such
service since the knowledge of H-G pasid can be established during
bind_gpasid time. In that sense, we _do_ have gpasid awareness.
 
> Thanks
> Kevin

[Jacob Pan]


Re: [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING

2019-06-05 Thread Sergei Shtylyov
On 06/05/2019 02:11 PM, Yoshihiro Shimoda wrote:

> This patch adds a new capable IOMMU_CAP_MERGING to check whether
> the IOVA would be contiguous strictly if a device requires and
> the IOMMU driver has the capable.

   s/has/is/? Or capable what?

> Signed-off-by: Yoshihiro Shimoda 
> ---
>  drivers/iommu/dma-iommu.c | 26 --
>  include/linux/iommu.h |  1 +
>  2 files changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 034caae..ecf1a04 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
[...]
> @@ -867,6 +872,13 @@ static int iommu_dma_map_sg(struct device *dev, struct 
> scatterlist *sg,
>   sg_dma_len(s) = s_length;
>   s->offset -= s_iova_off;
>   s_length = iova_align(iovad, s_length + s_iova_off);
> + /*
> +  * Check whether the IOVA would be contiguous strictly if
> +  * a device requires and the IOMMU driver has the capable.

   Same question here...

> +  */
> + if (iova_contiguous && i > 0 &&
> + (s_iova_off || s->length != s_length))
> + return 0;
>   s->length = s_length;
>  
>   /*
> @@ -902,8 +914,18 @@ static int iommu_dma_map_sg(struct device *dev, struct 
> scatterlist *sg,
>   if (iommu_map_sg(domain, iova, sg, nents, prot) < iova_len)
>   goto out_free_iova;
>  
> - return __finalise_sg(dev, sg, nents, iova);
> + ret = __finalise_sg(dev, sg, nents, iova);
> + /*
> +  * Check whether the sg entry is single if a device requires and
> +  * the IOMMU driver has the capable.

   You  meant capability perhaps?

[...]

MBR, Sergei


[PATCH] driver: core: Allow subsystems to continue deferring probe

2019-06-05 Thread Thierry Reding
From: Thierry Reding 

Some subsystems, such as pinctrl, allow continuing to defer probe
indefinitely. This is useful for devices that depend on resources
provided by devices that are only probed after the init stage.

One example of this can be seen on Tegra, where the DPAUX hardware
contains pinmuxing controls for pins that it shares with an I2C
controller. The I2C controller is typically used for communication
with a monitor over HDMI (DDC). However, other instances of the I2C
controller are used to access system critical components, such as a
PMIC. The I2C controller driver will therefore usually be a builtin
driver, whereas the DPAUX driver is part of the display driver that
is loaded from a module to avoid bloating the kernel image with all
of the DRM/KMS subsystem.

In this particular case the pins used by this I2C/DDC controller
become accessible very late in the boot process. However, since the
controller is only used in conjunction with display, that's not an
issue.

Unfortunately the driver core currently outputs a warning message
when a device fails to get the pinctrl before the end of the init
stage. That can be confusing for the user because it may sound like
an unwanted error occurred, whereas it's really an expected and
harmless situation.

In order to eliminate this warning, this patch allows callers of the
driver_deferred_probe_check_state() helper to specify that they want
to continue deferring probe, regardless of whether we're past the
init stage or not. All of the callers of that function are updated
for the new signature, but only the pinctrl subsystem passes a true
value in the new persist parameter if appropriate.

Signed-off-by: Thierry Reding 
---
 drivers/base/dd.c| 17 -
 drivers/base/power/domain.c  |  2 +-
 drivers/iommu/of_iommu.c |  2 +-
 drivers/pinctrl/devicetree.c | 10 ++
 include/linux/device.h   |  2 +-
 5 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 0df9b4461766..25ffbadf4187 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -238,23 +238,30 @@ __setup("deferred_probe_timeout=", 
deferred_probe_timeout_setup);
 /**
  * driver_deferred_probe_check_state() - Check deferred probe state
  * @dev: device to check
+ * @persist: Boolean flag indicating whether drivers should keep trying to
+ *   probe after built-in drivers have had a chance to probe. This is useful
+ *   for built-in drivers that rely on resources provided by modular drivers.
  *
  * Returns -ENODEV if init is done and all built-in drivers have had a chance
- * to probe (i.e. initcalls are done), -ETIMEDOUT if deferred probe debug
- * timeout has expired, or -EPROBE_DEFER if none of those conditions are met.
+ * to probe (i.e. initcalls are done) and unless persist is set, -ETIMEDOUT if
+ * deferred probe debug timeout has expired, or -EPROBE_DEFER if none of those
+ * conditions are met.
  *
  * Drivers or subsystems can opt-in to calling this function instead of 
directly
  * returning -EPROBE_DEFER.
  */
-int driver_deferred_probe_check_state(struct device *dev)
+int driver_deferred_probe_check_state(struct device *dev, bool persist)
 {
if (initcalls_done) {
if (!deferred_probe_timeout) {
dev_WARN(dev, "deferred probe timeout, ignoring 
dependency");
return -ETIMEDOUT;
}
-   dev_warn(dev, "ignoring dependency for device, assuming no 
driver");
-   return -ENODEV;
+
+   if (!persist) {
+   dev_warn(dev, "ignoring dependency for device, assuming 
no driver");
+   return -ENODEV;
+   }
}
return -EPROBE_DEFER;
 }
diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 33c30c1e6a30..effa5276a773 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -2423,7 +2423,7 @@ static int __genpd_dev_pm_attach(struct device *dev, 
struct device *base_dev,
mutex_unlock(&gpd_list_lock);
dev_dbg(dev, "%s() failed to find PM domain: %ld\n",
__func__, PTR_ERR(pd));
-   return driver_deferred_probe_check_state(base_dev);
+   return driver_deferred_probe_check_state(base_dev, false);
}
 
dev_dbg(dev, "adding to PM domain %s\n", pd->name);
diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index f04a6df65eb8..70f3946b088a 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -117,7 +117,7 @@ static int of_iommu_xlate(struct device *dev,
 * a proper probe-ordering dependency mechanism in future.
 */
if (!ops)
-   return driver_deferred_probe_check_state(dev);
+   return driver_deferred_probe_check_state(dev, false);
 
return ops->of_xlate(dev, iommu_spec);
 }
diff --git a/drivers/pinctrl/devicetree.c b/dri

Re: [Freedreno] [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-05 Thread Rob Clark
On Wed, Jun 5, 2019 at 6:18 AM Marek Szyprowski
 wrote:
>
> Hi Rob,
>
> On 2019-06-05 14:57, Rob Clark wrote:
> > On Tue, Jun 4, 2019 at 11:58 PM Tomasz Figa  wrote:
> >> But first of all, I remember Marek already submitted some patches long
> >> ago that extended struct driver with some flag that means that the
> >> driver doesn't want the IOMMU to be attached before probe. Why
> >> wouldn't that work? Sounds like a perfect opt-out solution.
> > Actually, yeah.. we should do that.  That is the simplest solution.
>
> Tomasz has very good memory. It took me a while to find that old patches:
>
> https://patchwork.kernel.org/patch/4677251/
> https://patchwork.kernel.org/patch/4677941/
> https://patchwork.kernel.org/patch/4677401/
>
> It looks that my idea was a bit ahead of its time ;)
>

If I re-spin this, was there a reason not to just use bitfields, i.e.:

-    bool suppress_bind_attrs;        /* disables bind/unbind via sysfs */
+    bool suppress_bind_attrs : 1;    /* disables bind/unbind via sysfs */
+    bool has_own_iommu_manager : 1;  /* driver explicitly manages IOMMU */

That seems like it would have been a bit less churn and a bit nicer
looking (IMO at least)

BR,
-R


Re: [Freedreno] [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-05 Thread Marek Szyprowski
Hi Rob,

On 2019-06-05 14:57, Rob Clark wrote:
> On Tue, Jun 4, 2019 at 11:58 PM Tomasz Figa  wrote:
>> But first of all, I remember Marek already submitted some patches long
>> ago that extended struct driver with some flag that means that the
>> driver doesn't want the IOMMU to be attached before probe. Why
>> wouldn't that work? Sounds like a perfect opt-out solution.
> Actually, yeah.. we should do that.  That is the simplest solution.

Tomasz has very good memory. It took me a while to find that old patches:

https://patchwork.kernel.org/patch/4677251/
https://patchwork.kernel.org/patch/4677941/
https://patchwork.kernel.org/patch/4677401/

It looks that my idea was a bit ahead of its time ;)

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [Freedreno] [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-05 Thread Rob Clark
On Tue, Jun 4, 2019 at 11:58 PM Tomasz Figa  wrote:
>
> But first of all, I remember Marek already submitted some patches long
> ago that extended struct driver with some flag that means that the
> driver doesn't want the IOMMU to be attached before probe. Why
> wouldn't that work? Sounds like a perfect opt-out solution.

Actually, yeah.. we should do that.  That is the simplest solution.

BR,
-R


Re: [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING

2019-06-05 Thread Christoph Hellwig
On Wed, Jun 05, 2019 at 01:21:59PM +0100, Robin Murphy wrote:
> And if the problem is really that you're not getting merging because of 
> exposing the wrong parameters to the DMA API and/or the block layer, or 
> that you just can't quite express your requirement to the block layer in 
> the first place, then that should really be tackled at the source rather 
> than worked around further down in the stack.

The problem is that we need a way to communicate to the block layer
that more than a single segment is ok IFF the DMA API instance supports
merging.  And of course the answer will depend on further parameters
like the maximum merged segment size and alignment for the segment.

We'll need some way to communicate that, but I don't really think this
series is the way to go.

We'd really need something hanging off the device (or through a query
API) how the dma map ops implementation exposes under what circumstances
it can merge.  The driver can then communicate that to the block layer
so that the block layer doesn't split requests up when reaching the
segment limit.
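
A hypothetical sketch of such a per-device query (neither the structure nor
the helper exists; it only illustrates the kind of information the block
layer would need):

	struct dma_merge_info {
		bool		can_merge;	 /* segments may be merged at all */
		unsigned int	max_merged_size; /* upper bound on a merged segment */
		unsigned int	merge_alignment; /* required alignment, e.g. PAGE_SIZE */
	};

	bool dma_get_merge_info(struct device *dev, struct dma_merge_info *info);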


Re: [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING

2019-06-05 Thread Robin Murphy

On 05/06/2019 12:11, Yoshihiro Shimoda wrote:

This patch adds a new capable IOMMU_CAP_MERGING to check whether
the IOVA would be contiguous strictly if a device requires and
the IOMMU driver has the capable.

Signed-off-by: Yoshihiro Shimoda 
---
  drivers/iommu/dma-iommu.c | 26 --
  include/linux/iommu.h |  1 +
  2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 034caae..ecf1a04 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -847,11 +847,16 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
dma_addr_t iova;
size_t iova_len = 0;
unsigned long mask = dma_get_seg_boundary(dev);
-   int i;
+   int i, ret;
+   bool iova_contiguous = false;
  
  	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))

iommu_dma_sync_sg_for_device(dev, sg, nents, dir);
  
+	if (dma_get_iova_contiguous(dev) &&

+   iommu_capable(dev->bus, IOMMU_CAP_MERGING))
+   iova_contiguous = true;
+
/*
 * Work out how much IOVA space we need, and align the segments to
 * IOVA granules for the IOMMU driver to handle. With some clever
@@ -867,6 +872,13 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
sg_dma_len(s) = s_length;
s->offset -= s_iova_off;
s_length = iova_align(iovad, s_length + s_iova_off);
+   /*
+* Check whether the IOVA would be contiguous strictly if
+* a device requires and the IOMMU driver has the capable.
+*/
+   if (iova_contiguous && i > 0 &&
+   (s_iova_off || s->length != s_length))
+   return 0;
s->length = s_length;
  
  		/*

@@ -902,8 +914,18 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
if (iommu_map_sg(domain, iova, sg, nents, prot) < iova_len)
goto out_free_iova;
  
-	return __finalise_sg(dev, sg, nents, iova);

+   ret = __finalise_sg(dev, sg, nents, iova);
+   /*
+* Check whether the sg entry is single if a device requires and
+* the IOMMU driver has the capable.
+*/
+   if (iova_contiguous && ret != 1)
+   goto out_unmap_sg;


I don't see that just failing really gives this option any value. 
Clearly the MMC driver has to do *something* to handle the failure (plus 
presumably the case of not having IOMMU DMA ops at all), which begs the 
question of why it couldn't just do whatever that is anyway, without all 
this infrastructure. For starters, it would be a far simpler and less 
invasive patch:


if (dma_map_sg(...) > 1) {
dma_unmap_sg(...);
/* split into multiple requests and try again */
}

But then it would make even more sense to just have the driver be 
proactive about its special requirement in the first place, and simply 
validate the list before it even tries to map it:


for_each_sg(sgl, sg, n, i)
if ((i > 0 && sg->offset % PAGE_SIZE) ||
(i < n - 1 && sg->length % PAGE_SIZE))
/* segment will not be mergeable */
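
Fleshed out a little (illustrative only), the same check could be a small
helper in the MMC driver:

	static bool sgl_is_mergeable(struct scatterlist *sgl, int n)
	{
		struct scatterlist *sg;
		int i;

		for_each_sg(sgl, sg, n, i) {
			/* only the first segment may start mid-page */
			if (i > 0 && sg->offset % PAGE_SIZE)
				return false;
			/* only the last segment may end mid-page */
			if (i < n - 1 && sg->length % PAGE_SIZE)
				return false;
		}
		return true;
	}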

For reference, I think v4l2 and possibly some areas of DRM already do 
something vaguely similar to judge whether they get contiguous buffers 
or not.



+
+   return ret;
  
+out_unmap_sg:

+   iommu_dma_unmap_sg(dev, sg, nents, dir, attrs);
  out_free_iova:
iommu_dma_free_iova(cookie, iova, iova_len);
  out_restore_sg:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 91af22a..f971dd3 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -104,6 +104,7 @@ enum iommu_cap {
   transactions */
IOMMU_CAP_INTR_REMAP,   /* IOMMU supports interrupt isolation */
IOMMU_CAP_NOEXEC,   /* IOMMU_NOEXEC flag */
+   IOMMU_CAP_MERGING,  /* IOMMU supports segments merging */


This isn't a 'capability' of the IOMMU - "segment merging" equates to 
just remapping pages, and there's already a fundamental assumption that 
IOMMUs are capable of that. Plus it's very much a DMA API concept, so 
hardly belongs in the IOMMU API anyway.


All in all, I'm struggling to see the point of this. Although it's not a 
DMA API guarantee, iommu-dma already merges scatterlists as aggressively 
as it is allowed to, and will continue to do so for the foreseeable 
future (since it avoids considerable complication in the IOVA 
allocation), so if you want to make sure iommu_dma_map_sg() merges an 
entire list, just don't give it a non-mergeable list. And if you still 
really really want dma_map_sg() to have a behaviour of "merge to a 
single segment or fail", then that should at least be a DMA API 
attribute, which could in principle be honoured by bounce-buffer

Re: [PATCH v3] iommu/arm-smmu: Avoid constant zero in TLBI writes

2019-06-05 Thread Will Deacon
[+Joerg on To:]

On Mon, Jun 03, 2019 at 02:15:37PM +0200, Marc Gonzalez wrote:
> From: Robin Murphy 
> 
> Apparently, some Qualcomm arm64 platforms which appear to expose their
> SMMU global register space are still, in fact, using a hypervisor to
> mediate it by trapping and emulating register accesses. Sadly, some
> deployed versions of said trapping code have bugs wherein they go
> horribly wrong for stores using r31 (i.e. XZR/WZR) as the source
> register.
> 
> While this can be mitigated for GCC today by tweaking the constraints
> for the implementation of writel_relaxed(), to avoid any potential
> arms race with future compilers more aggressively optimising register
> allocation, the simple way is to just remove all the problematic
> constant zeros. For the write-only TLB operations, the actual value is
> irrelevant anyway and any old nearby variable will provide a suitable
> GPR to encode. The one point at which we really do need a zero to clear
> a context bank happens before any of the TLB maintenance where crashes
> have been reported, so is apparently not a problem... :/
> 
> Reported-by: AngeloGioacchino Del Regno 
> Tested-by: Marc Gonzalez 
> Signed-off-by: Robin Murphy 
> Signed-off-by: Marc Gonzalez 

Acked-by: Will Deacon 

Joerg -- Please can you take this as a fix for 5.2, with a Cc stable?

Cheers,

Will


Re: [PATCH v2 0/4] iommu: Add device fault reporting API

2019-06-05 Thread Jean-Philippe Brucker
On 03/06/2019 22:59, Jacob Pan wrote:
> On Mon,  3 Jun 2019 15:57:45 +0100
> Jean-Philippe Brucker  wrote:
> 
>> Allow device drivers and VFIO to get notified on IOMMU translation
>> fault, and handle recoverable faults (PCI PRI). Several series require
>> this API (Intel VT-d and Arm SMMUv3 nested support, as well as the
>> generic host SVA implementation).
>>
>> Changes since v1 [1]:
>> * Allocate iommu_param earlier, in iommu_probe_device().
>> * Pass struct iommu_fault to fault handlers, instead of the
>>   iommu_fault_event wrapper.
>> * Removed unused iommu_fault_event::iommu_private.
>> * Removed unnecessary iommu_page_response::addr.
>> * Added iommu_page_response::version, which would allow to introduce a
>>   new incompatible iommu_page_response structure (as opposed to just
>>   adding a flag + field).
>>
>> [1] [PATCH 0/4] iommu: Add device fault reporting API
>> 
>> https://lore.kernel.org/lkml/20190523180613.55049-1-jean-philippe.bruc...@arm.com/
>>
>> Jacob Pan (3):
>>   driver core: Add per device iommu param
>>   iommu: Introduce device fault data
>>   iommu: Introduce device fault report API
>>
>> Jean-Philippe Brucker (1):
>>   iommu: Add recoverable fault reporting
>>
> This interface meet the need for vt-d, just one more comment on 2/4. Do
> you want to add Co-developed-by you for the three patches from me?

I'm fine without it, I don't think it adds much to the Signed-off-by,
which is required

Thanks,
Jean


Re: [PATCH v2 2/4] iommu: Introduce device fault data

2019-06-05 Thread Jean-Philippe Brucker
On 05/06/2019 09:51, Tian, Kevin wrote:
>> From: Jacob Pan
>> Sent: Tuesday, June 4, 2019 6:09 AM
>>
>> On Mon,  3 Jun 2019 15:57:47 +0100
>> Jean-Philippe Brucker  wrote:
>>
>>> +/**
>>> + * struct iommu_fault_page_request - Page Request data
>>> + * @flags: encodes whether the corresponding fields are valid and
>>> whether this
>>> + * is the last page in group (IOMMU_FAULT_PAGE_REQUEST_*
>>> values)
>>> + * @pasid: Process Address Space ID
>>> + * @grpid: Page Request Group Index
>>> + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
>>> + * @addr: page address
>>> + * @private_data: device-specific private information
>>> + */
>>> +struct iommu_fault_page_request {
>>> +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID   (1 << 0)
>>> +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE (1 << 1)
>>> +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA (1 << 2)
>>> +   __u32   flags;
>>> +   __u32   pasid;
>>> +   __u32   grpid;
>>> +   __u32   perm;
>>> +   __u64   addr;
>>> +   __u64   private_data[2];
>>> +};
>>> +
>>
>> Just a thought, for non-identity G-H PASID management. We could pass on
>> guest PASID in PRQ to save a lookup in QEMU. In this case, QEMU
>> allocate a GPASID for vIOMMU then a host PASID for pIOMMU. QEMU has a
>> G->H lookup. When PRQ comes in to the pIOMMU with HPASID, IOMMU
>> driver
>> can retrieve GPASID from the bind data then report to the guest via
>> VFIO. In this case QEMU does not need to do a H->G PASID lookup.
>>
>> Should we add a gpasid field here? or we can add a flag and field
>> later, up to you.
>>
> 
> Can private_data serve this purpose?

Isn't private_data already used for VT-d's Private Data field?

> It's better not introducing
> gpasid awareness within host IOMMU driver. It is just a user-level
> data associated with a PASID when binding happens. Kernel doesn't
> care the actual meaning, simply record it and then return back to user 
> space later upon device fault. Qemu interprets the meaning as gpasid
> in its own context. otherwise usages may use it for other purpose.

Regarding a gpasid field I don't mind either way, but extending the
iommu_fault structure later won't be completely straightforward so we
could add some padding now.

Userspace negotiate the iommu_fault struct format with VFIO, before
allocating a circular buffer of N fault structures
(https://lore.kernel.org/lkml/20190526161004.25232-26-eric.au...@redhat.com/).
So adding new fields requires introducing a new ABI version and a struct
iommu_fault_v2. That may be OK for disruptive changes, but just adding a
new field indicated by a flag shouldn't have to be that complicated.

How about setting the iommu_fault structure to 128 bytes?

struct iommu_fault {
__u32   type;
__u32   padding;
union {
struct iommu_fault_unrecoverable event;
struct iommu_fault_page_request prm;
__u8 padding2[120];
};
};

Given that @prm is currently 40 bytes and @event 32 bytes, the padding
allows either of them to grow 10 new 64-bit fields (or 20 new 32-bit
fields, which is still representable with new flags) before we have to
upgrade the ABI version.

A 4kB and a 64kB queue can hold respectively:

* 85 and 1365 records when iommu_fault is 48 bytes (current format).
* 64 and 1024 records when iommu_fault is 64 bytes (but allows to grow
only 2 new 64-bit fields).
* 32 and 512 records when iommu_fault is 128 bytes.

In comparison,
* the SMMU even queue can hold 128 and 2048 events respectively at those
sizes (and is allowed to grow up to 524k entries)
* the SMMU PRI queue can hold 256 and 4096 PR.

But the SMMU queues have to be physically contiguous, whereas our fault
queues are in userspace memory which is less expensive. So 128-byte
records might be reasonable. What do you think?
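
As a quick sanity check of the record counts above, the arithmetic can be
reproduced with a throwaway user-space snippet (not part of any patch):

#include <stdio.h>

int main(void)
{
	/* Queue sizes and iommu_fault record sizes discussed above. */
	const unsigned long queues[] = { 4096, 65536 };
	const unsigned long records[] = { 48, 64, 128 };

	for (unsigned int r = 0; r < 3; r++)
		for (unsigned int q = 0; q < 2; q++)
			printf("%3lu-byte record, %2lukB queue: %4lu entries\n",
			       records[r], queues[q] / 1024,
			       queues[q] / records[r]);
	return 0;
}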


The iommu_fault_response (patch 4/4) is a bit easier to extend because
it's userspace->kernel and userspace can just declare the size it's
using. I did add a version field in case we run out of flags or want to
change the whole thing, but I think I was being overly cautious and it
might just be a waste of space.

Thanks,
Jean


[RFC PATCH v5 1/8] dma-mapping: add a device driver helper for iova contiguous

2019-06-05 Thread Yoshihiro Shimoda
This API sets a flag indicating whether a device strictly requires a
contiguous IOVA mapping.

Signed-off-by: Yoshihiro Shimoda 
---
 include/linux/device.h  |  1 +
 include/linux/dma-mapping.h | 16 
 2 files changed, 17 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index e85264f..a33d611 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -752,6 +752,7 @@ struct device_dma_parameters {
 */
unsigned int max_segment_size;
unsigned long segment_boundary_mask;
+   bool iova_contiguous;
 };
 
 /**
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 6309a72..cdb4e75 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -729,6 +729,22 @@ static inline int dma_set_seg_boundary(struct device *dev, 
unsigned long mask)
return -EIO;
 }
 
+static inline int dma_get_iova_contiguous(struct device *dev)
+{
+   if (dev->dma_parms)
+   return dev->dma_parms->iova_contiguous;
+   return false;
+}
+
+static inline int dma_set_iova_contiguous(struct device *dev, bool contiguous)
+{
+   if (dev->dma_parms) {
+   dev->dma_parms->iova_contiguous = contiguous;
+   return 0;
+   }
+   return -EIO;
+}
+
 #ifndef dma_max_pfn
 static inline unsigned long dma_max_pfn(struct device *dev)
 {
-- 
2.7.4
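
For illustration, a bus-master driver that needs a strictly contiguous IOVA
would use the new helpers roughly as below. This is only a sketch: the real
call site for this series is added to renesas_sdhi_internal_dmac.c in patch
8/8, and example_probe() is a made-up function name.

#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/platform_device.h>

static int example_probe(struct platform_device *pdev)
{
	struct device *dev = &pdev->dev;

	/* Fails with -EIO if dev->dma_parms has not been set up. */
	if (dma_set_iova_contiguous(dev, true))
		dev_warn(dev, "cannot request a contiguous IOVA mapping\n");

	/* Later code paths can query the flag back. */
	if (!dma_get_iova_contiguous(dev))
		dev_info(dev, "falling back to multi-segment DMA\n");

	return 0;
}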



[RFC PATCH v5 5/8] mmc: tmio: No memory size limitation if runs on IOMMU

2019-06-05 Thread Yoshihiro Shimoda
This patch adds a condition to avoid the swiotlb memory size limitation
when the device is mapped through an IOMMU.

Tested-by: Takeshi Saito 
Signed-off-by: Yoshihiro Shimoda 
Reviewed-by: Simon Horman 
Reviewed-by: Wolfram Sang 
---
 drivers/mmc/host/tmio_mmc_core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/tmio_mmc_core.c b/drivers/mmc/host/tmio_mmc_core.c
index 130b91c..c9f6a59 100644
--- a/drivers/mmc/host/tmio_mmc_core.c
+++ b/drivers/mmc/host/tmio_mmc_core.c
@@ -1194,9 +1194,10 @@ int tmio_mmc_host_probe(struct tmio_mmc_host *_host)
 * Since swiotlb has memory size limitation, this will calculate
 * the maximum size locally (because we don't have any APIs for it now)
 * and check the current max_req_size. And then, this will update
-* the max_req_size if needed as a workaround.
+* the max_req_size if needed as a workaround. However, if the driver
+* runs on IOMMU, this workaround isn't needed.
 */
-   if (swiotlb_max_segment()) {
+   if (swiotlb_max_segment() && !device_iommu_mapped(&pdev->dev)) {
unsigned int max_size = (1 << IO_TLB_SHIFT) * IO_TLB_SEGSIZE;
 
if (mmc->max_req_size > max_size)
-- 
2.7.4
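
For a sense of scale, the workaround skipped here caps max_req_size at
(1 << IO_TLB_SHIFT) * IO_TLB_SEGSIZE. Assuming the usual mainline values of
IO_TLB_SHIFT = 11 and IO_TLB_SEGSIZE = 128 (taken from
include/linux/swiotlb.h, not from this patch), that is 256 KiB:

#include <stdio.h>

int main(void)
{
	/* Assumed swiotlb constants; adjust if the kernel headers differ. */
	const unsigned int io_tlb_shift = 11;
	const unsigned int io_tlb_segsize = 128;
	const unsigned int max_size = (1u << io_tlb_shift) * io_tlb_segsize;

	printf("swiotlb-limited max_req_size: %u bytes (%u KiB)\n",
	       max_size, max_size / 1024);
	return 0;
}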



[RFC PATCH v5 8/8] mmc: renesas_sdhi: use multiple segments if possible

2019-06-05 Thread Yoshihiro Shimoda
If the IOMMU driver is capable of strictly merging scatter-gather
segments into a single segment, this patch can expose multiple segments
via host->mmc->max_segs to the block layer, through the
blk_queue_max_segments() call made from mmc_setup_queue(). Note that an
SDIO card may use multiple segments with non-page-aligned sizes, so in
that case multiple segments are not exposed, to avoid dma_map_sg()
failing every time.

Note that on renesas_sdhi_sys_dmac the max_segs value changes from 32 to
512; this is acceptable because the sys_dmac can handle 512 segments,
which is why this init_card ops is added for the "TMIO_MMC_MIN_RCAR2"
environment.

Signed-off-by: Yoshihiro Shimoda 
---
 drivers/mmc/host/renesas_sdhi_core.c  | 27 +++
 drivers/mmc/host/renesas_sdhi_internal_dmac.c |  4 
 2 files changed, 31 insertions(+)

diff --git a/drivers/mmc/host/renesas_sdhi_core.c 
b/drivers/mmc/host/renesas_sdhi_core.c
index c5ee4a6..379cefa 100644
--- a/drivers/mmc/host/renesas_sdhi_core.c
+++ b/drivers/mmc/host/renesas_sdhi_core.c
@@ -20,6 +20,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -46,6 +47,8 @@
 #define SDHI_VER_GEN3_SD   0xcc10
 #define SDHI_VER_GEN3_SDMMC0xcd10
 
+#define SDHI_MAX_SEGS_IN_IOMMU 512
+
 struct renesas_sdhi_quirks {
bool hs400_disabled;
bool hs400_4taps;
@@ -203,6 +206,28 @@ static void renesas_sdhi_clk_disable(struct tmio_mmc_host 
*host)
clk_disable_unprepare(priv->clk_cd);
 }
 
+static void renesas_sdhi_init_card(struct mmc_host *mmc, struct mmc_card *card)
+{
+   struct tmio_mmc_host *host = mmc_priv(mmc);
+
+   /*
+* If the IOMMU driver has a capable for merging segments to
+* a segment strictly, this can expose the host->mmc->max_segs with
+* multiple segments to a block layer by using blk_queue_max_segments()
+* that mmc_setup_queue() calls. Notes that an sdio card may be
+* possible to use multiple segments with non page aligned size, so
+* that this will not expose multiple segments to avoid failing
+* dma_map_sg() every time.
+*/
+   if (host->pdata->max_segs < SDHI_MAX_SEGS_IN_IOMMU &&
+   iommu_capable(host->pdev->dev.bus, IOMMU_CAP_MERGING) &&
+   (mmc_card_mmc(card) || mmc_card_sd(card)))
+   host->mmc->max_segs = SDHI_MAX_SEGS_IN_IOMMU;
+   else
+   host->mmc->max_segs = host->pdata->max_segs ? :
+ TMIO_DEFAULT_MAX_SEGS;
+}
+
 static int renesas_sdhi_card_busy(struct mmc_host *mmc)
 {
struct tmio_mmc_host *host = mmc_priv(mmc);
@@ -726,6 +751,8 @@ int renesas_sdhi_probe(struct platform_device *pdev,
 
/* SDR speeds are only available on Gen2+ */
if (mmc_data->flags & TMIO_MMC_MIN_RCAR2) {
+   host->ops.init_card = renesas_sdhi_init_card;
+
/* card_busy caused issues on r8a73a4 (pre-Gen2) CD-less SDHI */
host->ops.card_busy = renesas_sdhi_card_busy;
host->ops.start_signal_voltage_switch =
diff --git a/drivers/mmc/host/renesas_sdhi_internal_dmac.c 
b/drivers/mmc/host/renesas_sdhi_internal_dmac.c
index 751fe91..a442f86 100644
--- a/drivers/mmc/host/renesas_sdhi_internal_dmac.c
+++ b/drivers/mmc/host/renesas_sdhi_internal_dmac.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -337,6 +338,9 @@ static int renesas_sdhi_internal_dmac_probe(struct 
platform_device *pdev)
/* value is max of SD_SECCNT. Confirmed by HW engineers */
dma_set_max_seg_size(dev, 0x);
 
+   if (iommu_capable(dev->bus, IOMMU_CAP_MERGING))
+   dma_set_iova_contiguous(dev, true);
+
return renesas_sdhi_probe(pdev, &renesas_sdhi_internal_dmac_dma_ops);
 }
 
-- 
2.7.4



[RFC PATCH v5 7/8] mmc: renesas_sdhi: sort headers

2019-06-05 Thread Yoshihiro Shimoda
This patch sorts the headers into alphabetical order to ease
maintenance of this part of the driver.

Signed-off-by: Yoshihiro Shimoda 
---
 drivers/mmc/host/renesas_sdhi_core.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/mmc/host/renesas_sdhi_core.c 
b/drivers/mmc/host/renesas_sdhi_core.c
index 5e9e36e..c5ee4a6 100644
--- a/drivers/mmc/host/renesas_sdhi_core.c
+++ b/drivers/mmc/host/renesas_sdhi_core.c
@@ -18,20 +18,20 @@
  *
  */
 
-#include 
 #include 
-#include 
-#include 
-#include 
-#include 
+#include 
+#include 
+#include 
 #include 
 #include 
-#include 
-#include 
-#include 
+#include 
+#include 
 #include 
 #include 
+#include 
 #include 
+#include 
+#include 
 #include 
 
 #include "renesas_sdhi.h"
-- 
2.7.4



[RFC PATCH v5 4/8] iommu/ipmmu-vmsa: add capable ops

2019-06-05 Thread Yoshihiro Shimoda
This patch adds a .capable callback to the ipmmu-vmsa iommu_ops,
reporting that the IOMMU can merge scatter-gather segments.

Signed-off-by: Yoshihiro Shimoda 
---
 drivers/iommu/ipmmu-vmsa.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 408ad0b..81170b8 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -608,6 +608,18 @@ static irqreturn_t ipmmu_irq(int irq, void *dev)
  * IOMMU Operations
  */
 
+static bool ipmmu_capable(enum iommu_cap cap)
+{
+   switch (cap) {
+   case IOMMU_CAP_MERGING:
+   return true;
+   default:
+   break;
+   }
+
+   return false;
+}
+
 static struct iommu_domain *__ipmmu_domain_alloc(unsigned type)
 {
struct ipmmu_vmsa_domain *domain;
@@ -950,6 +962,7 @@ static struct iommu_group *ipmmu_find_group(struct device 
*dev)
 }
 
 static const struct iommu_ops ipmmu_ops = {
+   .capable = ipmmu_capable,
.domain_alloc = ipmmu_domain_alloc,
.domain_free = ipmmu_domain_free,
.attach_dev = ipmmu_attach_device,
-- 
2.7.4



[RFC PATCH v5 2/8] iommu/dma: move iommu_dma_unmap_sg() place

2019-06-05 Thread Yoshihiro Shimoda
iommu_dma_map_sg() will call the unmap function in a later patch of this
series. To avoid a forward declaration, this patch moves the function up.

Signed-off-by: Yoshihiro Shimoda 
---
 drivers/iommu/dma-iommu.c | 48 +++
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 0dee374..034caae 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -730,6 +730,30 @@ static void iommu_dma_unmap_page(struct device *dev, 
dma_addr_t dma_handle,
__iommu_dma_unmap(dev, dma_handle, size);
 }
 
+static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
+   int nents, enum dma_data_direction dir, unsigned long attrs)
+{
+   dma_addr_t start, end;
+   struct scatterlist *tmp;
+   int i;
+
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+   iommu_dma_sync_sg_for_cpu(dev, sg, nents, dir);
+
+   /*
+* The scatterlist segments are mapped into a single
+* contiguous IOVA allocation, so this is incredibly easy.
+*/
+   start = sg_dma_address(sg);
+   for_each_sg(sg_next(sg), tmp, nents - 1, i) {
+   if (sg_dma_len(tmp) == 0)
+   break;
+   sg = tmp;
+   }
+   end = sg_dma_address(sg) + sg_dma_len(sg);
+   __iommu_dma_unmap(dev, start, end - start);
+}
+
 /*
  * Prepare a successfully-mapped scatterlist to give back to the caller.
  *
@@ -887,30 +911,6 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
return 0;
 }
 
-static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
-   int nents, enum dma_data_direction dir, unsigned long attrs)
-{
-   dma_addr_t start, end;
-   struct scatterlist *tmp;
-   int i;
-
-   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-   iommu_dma_sync_sg_for_cpu(dev, sg, nents, dir);
-
-   /*
-* The scatterlist segments are mapped into a single
-* contiguous IOVA allocation, so this is incredibly easy.
-*/
-   start = sg_dma_address(sg);
-   for_each_sg(sg_next(sg), tmp, nents - 1, i) {
-   if (sg_dma_len(tmp) == 0)
-   break;
-   sg = tmp;
-   }
-   end = sg_dma_address(sg) + sg_dma_len(sg);
-   __iommu_dma_unmap(dev, start, end - start);
-}
-
 static dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-- 
2.7.4



[RFC PATCH v5 6/8] mmc: tmio: Add a definition for default max_segs

2019-06-05 Thread Yoshihiro Shimoda
This patch adds a definition for the default max_segs value so that
another driver (renesas_sdhi) can use it in the future.

Signed-off-by: Yoshihiro Shimoda 
Reviewed-by: Wolfram Sang 
---
 drivers/mmc/host/tmio_mmc.h  | 1 +
 drivers/mmc/host/tmio_mmc_core.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/host/tmio_mmc.h b/drivers/mmc/host/tmio_mmc.h
index c5ba13f..9e387be 100644
--- a/drivers/mmc/host/tmio_mmc.h
+++ b/drivers/mmc/host/tmio_mmc.h
@@ -106,6 +106,7 @@
 #define TMIO_MASK_IRQ (TMIO_MASK_READOP | TMIO_MASK_WRITEOP | 
TMIO_MASK_CMD)
 
 #define TMIO_MAX_BLK_SIZE 512
+#define TMIO_DEFAULT_MAX_SEGS 32
 
 struct tmio_mmc_data;
 struct tmio_mmc_host;
diff --git a/drivers/mmc/host/tmio_mmc_core.c b/drivers/mmc/host/tmio_mmc_core.c
index c9f6a59..af1343e 100644
--- a/drivers/mmc/host/tmio_mmc_core.c
+++ b/drivers/mmc/host/tmio_mmc_core.c
@@ -1185,7 +1185,7 @@ int tmio_mmc_host_probe(struct tmio_mmc_host *_host)
 
mmc->caps |= MMC_CAP_4_BIT_DATA | pdata->capabilities;
mmc->caps2 |= pdata->capabilities2;
-   mmc->max_segs = pdata->max_segs ? : 32;
+   mmc->max_segs = pdata->max_segs ? : TMIO_DEFAULT_MAX_SEGS;
mmc->max_blk_size = TMIO_MAX_BLK_SIZE;
mmc->max_blk_count = pdata->max_blk_count ? :
(PAGE_SIZE / mmc->max_blk_size) * mmc->max_segs;
-- 
2.7.4



[RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING

2019-06-05 Thread Yoshihiro Shimoda
This patch adds a new capability, IOMMU_CAP_MERGING, used to check
whether the IOVA mapping will be strictly contiguous when a device
requires it and the IOMMU driver advertises the capability.

Signed-off-by: Yoshihiro Shimoda 
---
 drivers/iommu/dma-iommu.c | 26 --
 include/linux/iommu.h |  1 +
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 034caae..ecf1a04 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -847,11 +847,16 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
dma_addr_t iova;
size_t iova_len = 0;
unsigned long mask = dma_get_seg_boundary(dev);
-   int i;
+   int i, ret;
+   bool iova_contiguous = false;
 
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
iommu_dma_sync_sg_for_device(dev, sg, nents, dir);
 
+   if (dma_get_iova_contiguous(dev) &&
+   iommu_capable(dev->bus, IOMMU_CAP_MERGING))
+   iova_contiguous = true;
+
/*
 * Work out how much IOVA space we need, and align the segments to
 * IOVA granules for the IOMMU driver to handle. With some clever
@@ -867,6 +872,13 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
sg_dma_len(s) = s_length;
s->offset -= s_iova_off;
s_length = iova_align(iovad, s_length + s_iova_off);
+   /*
+* Check whether the IOVA would be contiguous strictly if
+* a device requires and the IOMMU driver has the capable.
+*/
+   if (iova_contiguous && i > 0 &&
+   (s_iova_off || s->length != s_length))
+   return 0;
s->length = s_length;
 
/*
@@ -902,8 +914,18 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
if (iommu_map_sg(domain, iova, sg, nents, prot) < iova_len)
goto out_free_iova;
 
-   return __finalise_sg(dev, sg, nents, iova);
+   ret = __finalise_sg(dev, sg, nents, iova);
+   /*
+* Check whether the sg entry is single if a device requires and
+* the IOMMU driver has the capable.
+*/
+   if (iova_contiguous && ret != 1)
+   goto out_unmap_sg;
+
+   return ret;
 
+out_unmap_sg:
+   iommu_dma_unmap_sg(dev, sg, nents, dir, attrs);
 out_free_iova:
iommu_dma_free_iova(cookie, iova, iova_len);
 out_restore_sg:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 91af22a..f971dd3 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -104,6 +104,7 @@ enum iommu_cap {
   transactions */
IOMMU_CAP_INTR_REMAP,   /* IOMMU supports interrupt isolation */
IOMMU_CAP_NOEXEC,   /* IOMMU_NOEXEC flag */
+   IOMMU_CAP_MERGING,  /* IOMMU supports segments merging */
 };
 
 /*
-- 
2.7.4



[RFC PATCH v5 0/8] treewide: improve R-Car SDHI performance

2019-06-05 Thread Yoshihiro Shimoda
This patch series is based on iommu.git / next branch.

Since the SDHI host internal DMAC of R-Car Gen3 cannot handle two or
more segments, the performance (especially for eMMC HS400 reads) is not
good. However, if an IOMMU is enabled on the DMAC, the IOMMU maps
multiple scatter-gather buffers as one contiguous IOVA, the DMAC can
handle that IOVA, and the performance can improve. In fact, I measured
the performance with bonnie++ and the "Sequential Input - block" rate
improved on r8a7795.

To achieve this, the patch series first modifies the DMA MAPPING and
IOMMU subsystems. Since I'd like feedback from each subsystem on whether
this approach is acceptable upstream, I'm submitting it treewide as an
RFC.

Changes from v4:
 - [DMA MAPPING] Add a new device_dma_parameters member for IOVA contiguity.
 - [IOMMU] Add a new capability for "merging" segments.
 - [IOMMU] Add a .capable ops to the ipmmu-vmsa driver.
 - [MMC] Sort headers in renesas_sdhi_core.c.
 - [MMC] Remove the following code added in v3, since it can now be handled
 by the DMA MAPPING and IOMMU subsystems:
 -- Check if R-Car Gen3 IPMMU is used or not on patch 3.
 -- Check if all multiple segment buffers are aligned to PAGE_SIZE on patch 3.
https://patchwork.kernel.org/project/linux-renesas-soc/list/?series=125593

Changes from v3:
 - Use a helper function device_iommu_mapped on patch 1 and 3.
 - Check if R-Car Gen3 IPMMU is used or not on patch 3.
 - Check if all multiple segment buffers are aligned to PAGE_SIZE on patch 3.
 - Add Wolfram-san's Reviewed-by on patches 1 and 2. Note that I also got
   his Reviewed-by on patch 3, but that patch changed from v2, so I didn't
   add his Reviewed-by this time.
https://patchwork.kernel.org/project/linux-renesas-soc/list/?series=120985

Changes from v2:
 - Add some conditions in the init_card().
 - Add a comment in the init_card().
 - Add definitions for some "MAX_SEGS".
https://patchwork.kernel.org/project/linux-renesas-soc/list/?series=116729

Changes from v1:
 - Stop adding an init_card ops to struct tmio_mmc_dma_ops and
   tmio_mmc_host, and just set init_card in renesas_sdhi_core.c.
 - Fix typos in "mmc: tmio: No memory size limitation if runs on IOMMU".
 - Add Simon-san's Reviewed-by on a tmio patch.
https://patchwork.kernel.org/project/linux-renesas-soc/list/?series=110485


Yoshihiro Shimoda (8):
  dma-mapping: add a device driver helper for iova contiguous
  iommu/dma: move iommu_dma_unmap_sg() place
  iommu: add a new capable IOMMU_CAP_MERGING
  iommu/ipmmu-vmsa: add capable ops
  mmc: tmio: No memory size limitation if runs on IOMMU
  mmc: tmio: Add a definition for default max_segs
  mmc: renesas_sdhi: sort headers
  mmc: renesas_sdhi: use multiple segments if possible

 drivers/iommu/dma-iommu.c | 74 +--
 drivers/iommu/ipmmu-vmsa.c| 13 +
 drivers/mmc/host/renesas_sdhi_core.c  | 43 +---
 drivers/mmc/host/renesas_sdhi_internal_dmac.c |  4 ++
 drivers/mmc/host/tmio_mmc.h   |  1 +
 drivers/mmc/host/tmio_mmc_core.c  |  7 +--
 include/linux/device.h|  1 +
 include/linux/dma-mapping.h   | 16 ++
 include/linux/iommu.h |  1 +
 9 files changed, 123 insertions(+), 37 deletions(-)

-- 
2.7.4
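
Putting the series together, a device driver's view of the new contract
looks roughly like the sketch below. It assumes the attached IOMMU driver
advertises IOMMU_CAP_MERGING (as ipmmu-vmsa does in patch 4);
example_map_buffers() is a made-up helper, not code from any patch.

#include <linux/dma-mapping.h>
#include <linux/iommu.h>
#include <linux/scatterlist.h>

static int example_map_buffers(struct device *dev, struct scatterlist *sgl,
			       int nents)
{
	int mapped;

	/* Patches 3 and 4: the IOMMU driver must advertise strict merging. */
	if (!iommu_capable(dev->bus, IOMMU_CAP_MERGING))
		return -EOPNOTSUPP;

	/* Patch 1: the device opts in to a strictly contiguous IOVA. */
	if (dma_set_iova_contiguous(dev, true))
		return -EIO;

	/* Patch 3: with the opt-in, anything but one merged segment fails. */
	mapped = dma_map_sg(dev, sgl, nents, DMA_TO_DEVICE);
	if (mapped != 1)
		return -EIO;

	/* sg_dma_address(sgl)/sg_dma_len(sgl) now cover the whole range. */
	return 0;
}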



RE: [PATCH v2 2/4] iommu: Introduce device fault data

2019-06-05 Thread Tian, Kevin
> From: Jacob Pan
> Sent: Tuesday, June 4, 2019 6:09 AM
> 
> On Mon,  3 Jun 2019 15:57:47 +0100
> Jean-Philippe Brucker  wrote:
> 
> > +/**
> > + * struct iommu_fault_page_request - Page Request data
> > + * @flags: encodes whether the corresponding fields are valid and
> > whether this
> > + * is the last page in group (IOMMU_FAULT_PAGE_REQUEST_*
> > values)
> > + * @pasid: Process Address Space ID
> > + * @grpid: Page Request Group Index
> > + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
> > + * @addr: page address
> > + * @private_data: device-specific private information
> > + */
> > +struct iommu_fault_page_request {
> > +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID   (1 << 0)
> > +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE (1 << 1)
> > +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA (1 << 2)
> > +   __u32   flags;
> > +   __u32   pasid;
> > +   __u32   grpid;
> > +   __u32   perm;
> > +   __u64   addr;
> > +   __u64   private_data[2];
> > +};
> > +
> 
> Just a thought, for non-identity G-H PASID management. We could pass on
> guest PASID in PRQ to save a lookup in QEMU. In this case, QEMU
> allocate a GPASID for vIOMMU then a host PASID for pIOMMU. QEMU has a
> G->H lookup. When PRQ comes in to the pIOMMU with HPASID, IOMMU
> driver
> can retrieve GPASID from the bind data then report to the guest via
> VFIO. In this case QEMU does not need to do a H->G PASID lookup.
> 
> Should we add a gpasid field here? or we can add a flag and field
> later, up to you.
> 

Can private_data serve this purpose? It's better not introducing
gpasid awareness within host IOMMU driver. It is just a user-level
data associated with a PASID when binding happens. Kernel doesn't
care the actual meaning, simply record it and then return back to user 
space later upon device fault. Qemu interprets the meaning as gpasid
in its own context. otherwise usages may use it for other purpose.

Thanks
Kevin


Re: [PATCH v2 3/3] xen/swiotlb: remember having called xen_create_contiguous_region()

2019-06-05 Thread Juergen Gross

On 31.05.19 13:38, Juergen Gross wrote:

On 30/05/2019 14:46, Boris Ostrovsky wrote:

On 5/30/19 5:04 AM, Christoph Hellwig wrote:

Please don't add your private flag to page-flags.h.  The whole point of
the private flag is that you can use it in any way you want without
touching the common code.



There is already a bunch of aliases for various sub-components
(including Xen) in page-flags.h for private flags, which is why I
suggested we do the same for the new use case. Adding this new alias
will keep flag usage consistent.


What about me adding another patch moving those Xen private aliases
into arch/x86/include/asm/xen/page.h ?


This is becoming difficult.

I'd need to remove the "#undef PF_NO_COMPOUND" from page-flags.h or to
#include a (new) xen/page-flags.h from page-flags.h after all the
defines are ready. Is that really worth the effort given that other
components (e.g. file systems) are doing the same?


Juergen
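
For readers unfamiliar with the aliasing being discussed: sub-components
already reuse the generic private bits in include/linux/page-flags.h along
these lines (an illustrative excerpt from memory, not part of the patch; the
question here is whether a Xen alias of the same kind belongs in that file
or in a Xen-specific header):

enum pageflags_excerpt {
	PG_owner_priv_1,		/* bit whose meaning the owner defines */

	/* Filesystems reuse the same bit under their own name... */
	PG_checked = PG_owner_priv_1,

	/* ...and so does the swap code. */
	PG_swapcache = PG_owner_priv_1,
};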


Re: [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-05 Thread Tomasz Figa
On Mon, Jun 3, 2019 at 7:48 PM Rob Clark  wrote:
>
> On Sun, Jun 2, 2019 at 11:25 PM Tomasz Figa  wrote:
> >
> > On Mon, Jun 3, 2019 at 4:40 AM Rob Clark  wrote:
> > >
> > > On Fri, May 10, 2019 at 7:35 AM Rob Clark  wrote:
> > > >
> > > > On Tue, Dec 4, 2018 at 2:29 PM Rob Herring  wrote:
> > > > >
> > > > > On Sat, Dec 1, 2018 at 10:54 AM Rob Clark  wrote:
> > > > > >
> > > > > > This solves a problem we see with drm/msm, caused by getting
> > > > > > iommu_dma_ops while we attach our own domain and manage it directly 
> > > > > > at
> > > > > > the iommu API level:
> > > > > >
> > > > > >   [0038] user address but active_mm is swapper
> > > > > >   Internal error: Oops: 9605 [#1] PREEMPT SMP
> > > > > >   Modules linked in:
> > > > > >   CPU: 7 PID: 70 Comm: kworker/7:1 Tainted: GW 
> > > > > > 4.19.3 #90
> > > > > >   Hardware name: xxx (DT)
> > > > > >   Workqueue: events deferred_probe_work_func
> > > > > >   pstate: 80c9 (Nzcv daif +PAN +UAO)
> > > > > >   pc : iommu_dma_map_sg+0x7c/0x2c8
> > > > > >   lr : iommu_dma_map_sg+0x40/0x2c8
> > > > > >   sp : ff80095eb4f0
> > > > > >   x29: ff80095eb4f0 x28: 
> > > > > >   x27: ffc0f9431578 x26: 
> > > > > >   x25:  x24: 0003
> > > > > >   x23: 0001 x22: ffc0fa9ac010
> > > > > >   x21:  x20: ffc0fab40980
> > > > > >   x19: ffc0fab40980 x18: 0003
> > > > > >   x17: 01c4 x16: 0007
> > > > > >   x15: 000e x14: 
> > > > > >   x13:  x12: 0028
> > > > > >   x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
> > > > > >   x9 :  x8 : ffc0fab409a0
> > > > > >   x7 :  x6 : 0002
> > > > > >   x5 : 0001 x4 : 
> > > > > >   x3 : 0001 x2 : 0002
> > > > > >   x1 : ffc0f9431578 x0 : 
> > > > > >   Process kworker/7:1 (pid: 70, stack limit = 0x17d08ffb)
> > > > > >   Call trace:
> > > > > >iommu_dma_map_sg+0x7c/0x2c8
> > > > > >__iommu_map_sg_attrs+0x70/0x84
> > > > > >get_pages+0x170/0x1e8
> > > > > >msm_gem_get_iova+0x8c/0x128
> > > > > >_msm_gem_kernel_new+0x6c/0xc8
> > > > > >msm_gem_kernel_new+0x4c/0x58
> > > > > >dsi_tx_buf_alloc_6g+0x4c/0x8c
> > > > > >msm_dsi_host_modeset_init+0xc8/0x108
> > > > > >msm_dsi_modeset_init+0x54/0x18c
> > > > > >_dpu_kms_drm_obj_init+0x430/0x474
> > > > > >dpu_kms_hw_init+0x5f8/0x6b4
> > > > > >msm_drm_bind+0x360/0x6c8
> > > > > >try_to_bring_up_master.part.7+0x28/0x70
> > > > > >component_master_add_with_match+0xe8/0x124
> > > > > >msm_pdev_probe+0x294/0x2b4
> > > > > >platform_drv_probe+0x58/0xa4
> > > > > >really_probe+0x150/0x294
> > > > > >driver_probe_device+0xac/0xe8
> > > > > >__device_attach_driver+0xa4/0xb4
> > > > > >bus_for_each_drv+0x98/0xc8
> > > > > >__device_attach+0xac/0x12c
> > > > > >device_initial_probe+0x24/0x30
> > > > > >bus_probe_device+0x38/0x98
> > > > > >deferred_probe_work_func+0x78/0xa4
> > > > > >process_one_work+0x24c/0x3dc
> > > > > >worker_thread+0x280/0x360
> > > > > >kthread+0x134/0x13c
> > > > > >ret_from_fork+0x10/0x18
> > > > > >   Code: d284 91000725 6b17039f 5400048a (f9401f40)
> > > > > >   ---[ end trace f22dda57f3648e2c ]---
> > > > > >   Kernel panic - not syncing: Fatal exception
> > > > > >   SMP: stopping secondary CPUs
> > > > > >   Kernel Offset: disabled
> > > > > >   CPU features: 0x0,22802a18
> > > > > >   Memory Limit: none
> > > > > >
> > > > > > The problem is that when drm/msm does its own 
> > > > > > iommu_attach_device(),
> > > > > > now the domain returned by iommu_get_domain_for_dev() is drm/msm's
> > > > > > domain, and it doesn't have domain->iova_cookie.
> > > > > >
> > > > > > We kind of avoided this problem prior to sdm845/dpu because the 
> > > > > > iommu
> > > > > > was attached to the mdp node in dt, which is a child of the toplevel
> > > > > > mdss node (which corresponds to the dev passed in dma_map_sg()).  
> > > > > > But
> > > > > > with sdm845, now the iommu is attached at the mdss level so we hit 
> > > > > > the
> > > > > > iommu_dma_ops in dma_map_sg().
> > > > > >
> > > > > > But auto allocating/attaching a domain before the driver is probed 
> > > > > > was
> > > > > > already a blocking problem for enabling per-context pagetables for 
> > > > > > the
> > > > > > GPU.  This problem is also now solved with this patch.
> > > > > >
> > > > > > Fixes: 97890ba9289c dma-mapping: detect and configure IOMMU in 
> > > > > > of_dma_configure
> > > > > > Tested-by: Douglas Anderson 
> > > > > > Signed-off-by: Rob Clark 
> > > > > > ---
> > > > > > This is an alternative/replacement for [1].  What it lacks in 
> > > > > > elegance
> > > > > > it makes up for in practicality ;-)
> > > > > >
> > > > >