[PATCH] iommu/mediatek: fix leaked of_node references

2019-04-16 Thread Wen Yang
The call to of_parse_phandle() returns a node pointer with its refcount
incremented, so it must be explicitly decremented after the last usage.

581 static int mtk_iommu_probe(struct platform_device *pdev)
582 {
...
626 for (i = 0; i < larb_nr; i++) {
627 struct device_node *larbnode;
...
631 larbnode = of_parse_phandle(...);
632 if (!larbnode)
633 return -EINVAL;
634
635 if (!of_device_is_available(larbnode))
636 continue; ---> leaked here
637
...
643 if (!plarbdev)
644 return -EPROBE_DEFER; ---> leaked here
...
647 component_match_add_release(dev, &match, release_of,
648 compare_of, larbnode);
   ---> release_of will call of_node_put
649 }
...
650

Detected by coccinelle with the following warnings:
./drivers/iommu/mtk_iommu.c:644:3-9: ERROR: missing of_node_put; acquired a 
node pointer with refcount incremented on line 631, but without a corresponding 
object release within this function.

Signed-off-by: Wen Yang 
Cc: Joerg Roedel 
Cc: Matthias Brugger 
Cc: iommu@lists.linux-foundation.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-media...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
---
 drivers/iommu/mtk_iommu.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index de3e022..b66d11b 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -632,16 +632,20 @@ static int mtk_iommu_probe(struct platform_device *pdev)
if (!larbnode)
return -EINVAL;
 
-   if (!of_device_is_available(larbnode))
+   if (!of_device_is_available(larbnode)) {
+   of_node_put(larbnode);
continue;
+   }
 
ret = of_property_read_u32(larbnode, "mediatek,larb-id", &id);
if (ret)/* The id is consecutive if there is no this property */
id = i;
 
plarbdev = of_find_device_by_node(larbnode);
-   if (!plarbdev)
+   if (!plarbdev) {
+   of_node_put(larbnode);
return -EPROBE_DEFER;
+   }
data->smi_imu.larb_imu[id].dev = &plarbdev->dev;
 
component_match_add_release(dev, &match, release_of,
-- 
2.9.5

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 1/6] iommu: add generic boot option iommu.dma_mode

2019-04-16 Thread Leizhen (ThunderTown)



On 2019/4/16 23:21, Will Deacon wrote:
> On Fri, Apr 12, 2019 at 02:11:31PM +0100, Robin Murphy wrote:
>> On 12/04/2019 11:26, John Garry wrote:
>>> On 09/04/2019 13:53, Zhen Lei wrote:
 +static int __init iommu_dma_mode_setup(char *str)
 +{
 +if (!str)
 +goto fail;
 +
 +if (!strncmp(str, "passthrough", 11))
 +iommu_default_dma_mode = IOMMU_DMA_MODE_PASSTHROUGH;
 +else if (!strncmp(str, "lazy", 4))
 +iommu_default_dma_mode = IOMMU_DMA_MODE_LAZY;
 +else if (!strncmp(str, "strict", 6))
 +iommu_default_dma_mode = IOMMU_DMA_MODE_STRICT;
 +else
 +goto fail;
 +
 +pr_info("Force dma mode to be %d\n", iommu_default_dma_mode);
>>>
>>> What happens if the cmdline option iommu.dma_mode is passed multiple
>>> times? We get multiple - possibly conflicting - prints, right?
>>
>> Indeed; we ended up removing such prints for the existing options here,
>> specifically because multiple messages seemed more likely to be confusing
>> than useful.

I originally intended to be consistent with the x86 printing, for example:

} else if (!strncmp(str, "strict", 6)) {
pr_info("Disable batched IOTLB flush\n");
intel_iommu_strict = 1;
}

>>
>>> And do we need to have backwards compatibility, such that the setting
>>> for iommu.strict or iommu.passthrough trumps iommu.dma_mode, regardless
>>> of order?
>>
>> As above I think it would be preferable to just keep using the existing
>> options anyway. The current behaviour works out as:
>>
>> iommu.passthrough |      Y      |          N
>> iommu.strict      |      x      |    Y    |    N
>> ------------------|-------------|---------|--------
>>        MODE       | PASSTHROUGH |  STRICT |   LAZY
>>
>> which seems intuitive enough that a specific dma_mode option doesn't add
>> much value, and would more likely just overcomplicate things for users as
>> well as our implementation.
> 
> Agreed. We can't remove the existing options, and they do the job perfectly
> well so I don't see the need to add more options on top.

OK, I will remove the iommu.dma_mode option in the next version. Thanks to
the three of you.

I didn't want to add it at first, but later found that the boot options on
each arch are different, so I wanted to normalize them.

In addition, do we need to keep the build option name
IOMMU_DEFAULT_PASSTHROUGH for compatibility, or change it to
IOMMU_DEFAULT_DMA_MODE_PASSTHROUGH or IOMMU_DEFAULT_MODE_PASSTHROUGH?

> 
> Will
> 
> .
> 

-- 
Thanks!
BestRegards

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 0/2] iommu/arm-smmu-v3: make sure the kdump kernel can work well when smmu is enabled

2019-04-16 Thread Leizhen (ThunderTown)



On 2019/4/16 17:14, Will Deacon wrote:
> On Mon, Apr 08, 2019 at 10:31:47AM +0800, Leizhen (ThunderTown) wrote:
>> On 2019/4/4 23:30, Will Deacon wrote:
>>> On Mon, Mar 18, 2019 at 09:12:41PM +0800, Zhen Lei wrote:
 v1 --> v2:
 1. Drop part2. Now, we only use the SMMUv3 hardware feature 
 STE.config=0b000
 (Report abort to device, no event recorded) to suppress the event messages
 caused by the unexpected devices.
 2. rewrite the patch description.
>>>
>>> This issue came up a while back:
>>>
>>> https://lore.kernel.org/linux-pci/20180302103032.gb19...@arm.com/
>>>
>>> and I'd still prefer to solve it using the disable_bypass logic which we
>>> already have. Something along the lines of the diff below?
>>
>> Yes, my patches also use disable_bypass=1(set ste.config=0b000). If
>> SMMU_IDR0.ST_LEVEL=0(Linear Stream table supported), then all STE entries
>> are allocated and initialized(set ste.config=0b000). But if 
>> SMMU_IDR0.ST_LEVEL=1
>> (2-level Stream Table), we only allocate and initialize the first level
>> tables, but leave the level 2 tables dynamically allocated. That means
>> C_BAD_STREAMID (eventid=0x2) will be reported if an unexpected device
>> accesses memory without being reinitialized in the kdump kernel.
> 
> So is your problem just that the C_BAD_STREAMID events are noisy? If so,
> perhaps we should be disabling fault reporting entirely in the kdump kernel.
> 
> How about the update diff below? I'm keen to have this as simple as
> possible, so we don't end up introducing rarely tested, complex code on
> the crash path.
In theory, it can solve the problem; let me test it.

But then again, the patch below will also disable fault reporting from the
expected devices which are used in the kdump kernel. In fact, my patches have
been merged into our internal version for more than 2 months, and no bug has
been found yet.

However, my patches do not support the case where the hardware does not
support the "STE bypass" feature; I think your patch can also resolve that.

> 
> Will
> 
> --->8
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index d3880010c6cf..d8b73da6447d 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -2454,13 +2454,9 @@ static int arm_smmu_device_reset(struct 
> arm_smmu_device *smmu, bool bypass)
>   /* Clear CR0 and sync (disables SMMU and queue processing) */
>   reg = readl_relaxed(smmu->base + ARM_SMMU_CR0);
>   if (reg & CR0_SMMUEN) {
> - if (is_kdump_kernel()) {
> - arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
> - arm_smmu_device_disable(smmu);
> - return -EBUSY;
> - }
> -
>   dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
> + WARN_ON(is_kdump_kernel() && !disable_bypass);
> + arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
>   }
>  
>   ret = arm_smmu_device_disable(smmu);
> @@ -2553,6 +2549,8 @@ static int arm_smmu_device_reset(struct arm_smmu_device 
> *smmu, bool bypass)
>   return ret;
>   }
>  
> + if (is_kdump_kernel())
> + enables &= ~(CR0_EVTQEN | CR0_PRIQEN);
>  
>   /* Enable the SMMU interface, or ensure bypass */
>   if (!bypass || disable_bypass) {
> 
> .
> 

-- 
Thanks!
BestRegards

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/9] iommu/dma-iommu: Add function to flush any cached not present IOTLB entries

2019-04-16 Thread Tom Murphy via iommu
>That said, I've now gone and looked and AFAICS both the Intel...
Ah, I missed that, you're right.

>...and AMD
It doesn't look like it. On AMD the cache is flushed during
iommu_ops::map only if there are page table pages to free (i.e. when
we're allocating a large page and freeing the sub-pages), right?
I guess this is a bug in the AMD IOMMU driver; as you said, it should
be handled in iommu_map(). I'll submit another patch to flush the NP
cache on a call to the AMD iommu_ops::map if amd_iommu_np_cache is set.
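For illustration, the follow-up could look roughly like the sketch below
(based on the current amd_iommu_map(); reusing the driver's
domain_flush_pages() as the flush helper is my assumption, not the final
patch):

/* Sketch only: flush the not-present cache after installing a new
 * mapping, when the IOMMU reports an NP cache (amd_iommu_np_cache). */
static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
			 phys_addr_t paddr, size_t page_size, int iommu_prot)
{
	struct protection_domain *domain = to_pdomain(dom);
	int prot = 0;
	int ret;

	if (iommu_prot & IOMMU_READ)
		prot |= IOMMU_PROT_IR;
	if (iommu_prot & IOMMU_WRITE)
		prot |= IOMMU_PROT_IW;

	mutex_lock(&domain->api_lock);
	ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL);
	mutex_unlock(&domain->api_lock);

	/* New part: make the range visible before the caller uses it. */
	if (!ret && amd_iommu_np_cache)
		domain_flush_pages(domain, iova, page_size); /* assumed helper */

	return ret;
}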

>What might be
>worthwhile, though, is seeing if there's scope to refactor those drivers
>to push some of it into an iommu_ops::iotlb_sync_map callback to
>optimise the flushing for multi-page mappings.
I am working on a different patch series to improve the flushing and
page-table page freeing for both the AMD and Intel drivers.
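
For reference, the shape of the refactoring suggested above, as I understand
it (a sketch only: it assumes a hypothetical __iommu_map() helper holding the
existing per-page loop, and the iotlb_sync_map hook with the signature used
in this series):

int iommu_map(struct iommu_domain *domain, unsigned long iova,
	      phys_addr_t paddr, size_t size, int prot)
{
	int ret;

	/* hypothetical helper: the existing per-page mapping loop */
	ret = __iommu_map(domain, iova, paddr, size, prot);

	/* One flush for the whole range, driven by the core rather
	 * than by every external caller. */
	if (!ret && domain->ops->iotlb_sync_map)
		domain->ops->iotlb_sync_map(domain);

	return ret;
}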

On Tue, Apr 16, 2019 at 3:01 PM Robin Murphy  wrote:
>
> On 11/04/2019 19:47, Tom Murphy wrote:
> > Both the AMD and Intel drivers can cache not present IOTLB entries. To
> > convert these drivers to the dma-iommu api we need a generic way to
> > flush the NP cache. IOMMU drivers which have a NP cache can implement
> > the .flush_np_cache function in the iommu ops struct. I will implement
> > .flush_np_cache for both the Intel and AMD drivers in later patches.
> >
> > The Intel np-cache is described here:
> > https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf#G7.66452
> >
> > And the AMD np-cache is described here:
> > https://developer.amd.com/wordpress/media/2012/10/34434-IOMMU-Rev_1.26_2-11-09.pdf#page=63
>
> Callers expect that once iommu_map() returns successfully, the mapping
> exists and is ready to use - if these drivers aren't handling this
> flushing internally, how are they not already broken for e.g. VFIO?
>
> > Signed-off-by: Tom Murphy 
> > ---
> >   drivers/iommu/dma-iommu.c | 10 ++
> >   include/linux/iommu.h |  3 +++
> >   2 files changed, 13 insertions(+)
> >
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index 1a4bff3f8427..cc5da30d6e58 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -594,6 +594,9 @@ struct page **iommu_dma_alloc(struct device *dev, 
> > size_t size, gfp_t gfp,
> >   < size)
> >   goto out_free_sg;
> >
> > + if (domain->ops->flush_np_cache)
> > + domain->ops->flush_np_cache(domain, iova, size);
> > +
>
> This doesn't scale. At the very least, it should be internal to
> iommu_map() rather than exposed to be the responsibility of every external
> caller now and forever after.
>
> That said, I've now gone and looked and AFAICS both the Intel and AMD
> drivers *do* appear to handle this in their iommu_ops::map callbacks
> already, so the whole patch does indeed seem bogus. What might be
> worthwhile, though, is seeing if there's scope to refactor those drivers
> to push some of it into an iommu_ops::iotlb_sync_map callback to
> optimise the flushing for multi-page mappings.
>
> Robin.
>
> >   *handle = iova;
> >   sg_free_table(&sgt);
> >   return pages;
> > @@ -652,6 +655,10 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
> > phys_addr_t phys,
> >   iommu_dma_free_iova(cookie, iova, size);
> >   return DMA_MAPPING_ERROR;
> >   }
> > +
> > + if (domain->ops->flush_np_cache)
> > + domain->ops->flush_np_cache(domain, iova, size);
> > +
> >   return iova + iova_off;
> >   }
> >
> > @@ -812,6 +819,9 @@ int iommu_dma_map_sg(struct device *dev, struct 
> > scatterlist *sg,
> >   if (iommu_map_sg_atomic(domain, iova, sg, nents, prot) < iova_len)
> >   goto out_free_iova;
> >
> > + if (domain->ops->flush_np_cache)
> > + domain->ops->flush_np_cache(domain, iova, iova_len);
> > +
> >   return __finalise_sg(dev, sg, nents, iova);
> >
> >   out_free_iova:
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 75559918d9bd..47ff8d731d6a 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -173,6 +173,7 @@ struct iommu_resv_region {
> >* @iotlb_sync_map: Sync mappings created recently using @map to the 
> > hardware
> >* @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty 
> > flush
> >*queue
> > + * @flush_np_cache: Flush the non present entry cache
> >* @iova_to_phys: translate iova to physical address
> >* @add_device: add device to iommu grouping
> >* @remove_device: remove device from iommu grouping
> > @@ -209,6 +210,8 @@ struct iommu_ops {
> >   unsigned long iova, size_t size);
> >   void (*iotlb_sync_map)(struct iommu_domain *domain);
> >   void (*iotlb_sync)(struct iommu_domain *domain);
> > + void (*flush_np_cache)(struct iommu_domain *domain,
> > + unsigned long iova, size_t size);
> >   phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, 

Re: [PATCH v2 1/7] ACPI/IORT: Check ATS capability in root complex nodes

2019-04-16 Thread Jean-Philippe Brucker
On 15/04/2019 19:31, Robin Murphy wrote:
> On 09/04/2019 17:52, Jean-Philippe Brucker wrote:
>> Root complex node in IORT has a bit telling whether it supports ATS or
>> not. Store this bit in the IOMMU fwspec when setting up a device, so it
>> can be accessed later by an IOMMU driver.
> 
> Hmm, it doesn't feel quite right to store a copy of the same RC/SMMU 
> integration property for every endpoint...
> 
> It seems like it might be more logical to track this at the SMMU end, 
> i.e. only allow ARM_SMMU_FEAT_ATS to be set if all RCs targeting that 
> SMMU also support ATS. For the moment that seems sufficiently realistic, 
> and unless some wacky topology ever actually shows up in silicon to 
> complain, I'm inclined not to care too much about it being potentially 
> overly restrictive.

Doesn't that require us to implement a reverse lookup of smmu-node ->
RC-node using the iommu maps, in both DT and IORT? Seems like a lot of
work for little gain. Putting the bit in fwspec isn't ideal, but I don't
find aggregating properties of RCs into ARM_SMMU_FEAT_ATS much nicer.

> Furthermore, I'm now wondering whether it would make sense to push this 
> into the PCI layer as well (or instead), i.e. hook into pci_init_ats() 
> or pci_enable_ats() and it the device is untrusted or the topology 
> doesn't support ATS, prevent the capability from ever being enabled at 
> all, rather than trying to mitigate it later at the SMMU end. What do 
> you reckon?

For the RC property it's a bit difficult because everyone does it
differently. All of DMAR, IVRS, IORT and DT-based implementations would
have to call into the PCI core to declare whether a device is allowed to
do ATS or not. Which is essentially what the IOMMU drivers do now by
calling pci_enable_ats() themselves.

Having pci_enable_ats() check the untrusted bit seems like a good
cleanup. I'm not sure yet but given that it has side effects (AMD IOMMU
doesn't check that bit at the moment) it would probably fit better in a
separate series.
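
Something as small as the sketch below would probably do; pci_ats_allowed()
is a made-up name, the point is just where the untrusted check would live
(a real cleanup would fold it into pci_enable_ats() itself):

/* Illustrative helper, not an existing function: decide whether ATS may
 * be enabled for a device, based on the firmware "untrusted" flag. */
static bool pci_ats_allowed(struct pci_dev *pdev)
{
	if (pdev->untrusted)
		return false;

	return !!pdev->ats_cap;
}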

>> Use the negative version (NO_ATS) at the moment because it's not clear
>> if/how the bit needs to be integrated in other firmware descriptions. The
>> SMMU has a feature bit telling if it supports ATS, which might be
>> sufficient in most systems for deciding whether or not we should enable
>> the ATS capability in endpoints.
> 
> I can fairly confidently guarantee that it won't. For instance, MMU-600 
> reports IDR0.ATS==1 because MMU-600 implements the SMMUv3 architectural 
> ATS support. Actually making use of that support, though, still requires 
> an RC capable of generating the appropriate DTI-ATS messages, and that a 
> DTI interface is wired up correctly between the two.

Right, we need to explicitly validate that the RC understands ATS. If we
enable ATS but the root complex doesn't support it, in addition to being
unusable it might be possible for the EP to corrupt memory, because to
an RC that ignores the AT field, a Translation Request looks like a
Memory Read Request. The endpoint could end up storing the content of
memory in its ATC instead of a translated address, and later use that
as the address of a write.

>> Signed-off-by: Jean-Philippe Brucker 
>> ---
>>   drivers/acpi/arm64/iort.c | 11 +++
>>   include/linux/iommu.h |  4 
>>   2 files changed, 15 insertions(+)
>>
>> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
>> index e48894e002ba..7f2c1c9c6b38 100644
>> --- a/drivers/acpi/arm64/iort.c
>> +++ b/drivers/acpi/arm64/iort.c
>> @@ -1028,6 +1028,14 @@ void iort_dma_setup(struct device *dev, u64 
>> *dma_addr, u64 *dma_size)
>>  dev_dbg(dev, "dma_pfn_offset(%#08llx)\n", offset);
>>   }
>>   
>> +static bool iort_pci_rc_supports_ats(struct acpi_iort_node *node)
>> +{
>> +struct acpi_iort_root_complex *pci_rc;
>> +
>> +pci_rc = (struct acpi_iort_root_complex *)node->node_data;
>> +return pci_rc->ats_attribute & ACPI_IORT_ATS_SUPPORTED;
>> +}
>> +
>>   /**
>>* iort_iommu_configure - Set-up IOMMU configuration for a device.
>>*
>> @@ -1063,6 +1071,9 @@ const struct iommu_ops *iort_iommu_configure(struct 
>> device *dev)
>>  info.node = node;
>>  err = pci_for_each_dma_alias(to_pci_dev(dev),
>>   iort_pci_iommu_init, &info);
>> +
>> +if (!err && !iort_pci_rc_supports_ats(node))
>> +dev->iommu_fwspec->flags |= IOMMU_FWSPEC_PCI_NO_ATS;
>>  } else {
>>  int i = 0;
>>   
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 3dbeb457fb16..ed6738c358ca 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -509,10 +509,14 @@ struct iommu_fwspec {
>>  const struct iommu_ops  *ops;
>>  struct fwnode_handle*iommu_fwnode;
>>  void*iommu_priv;
>> +u32 flags;
>>  unsigned intnum_ids;
>>  u32 ids[1];
>>   };
>>   
>> +/* Firmware 

Re: [PATCH -next] xen-swiotlb: Make two functions static

2019-04-16 Thread Boris Ostrovsky
On 4/16/19 10:49 AM, Yue Haibing wrote:
> From: YueHaibing 
>
> Fix sparse warnings:
>
> drivers/xen/swiotlb-xen.c:489:1: warning:
>  symbol 'xen_swiotlb_sync_single_for_cpu' was not declared. Should it be 
> static?
> drivers/xen/swiotlb-xen.c:496:1: warning:
>  symbol 'xen_swiotlb_sync_single_for_device' was not declared. Should it be 
> static?
>
> Reported-by: Hulk Robot 
> Signed-off-by: YueHaibing 


I think latest patches from Christoph take care of this.

-boris


> ---
>  drivers/xen/swiotlb-xen.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 877baf2..e741df4 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -485,14 +485,14 @@ xen_swiotlb_sync_single(struct device *hwdev, 
> dma_addr_t dev_addr,
>   xen_dma_sync_single_for_device(hwdev, dev_addr, size, dir);
>  }
>  
> -void
> +static void
>  xen_swiotlb_sync_single_for_cpu(struct device *hwdev, dma_addr_t dev_addr,
>   size_t size, enum dma_data_direction dir)
>  {
>   xen_swiotlb_sync_single(hwdev, dev_addr, size, dir, SYNC_FOR_CPU);
>  }
>  
> -void
> +static void
>  xen_swiotlb_sync_single_for_device(struct device *hwdev, dma_addr_t dev_addr,
>  size_t size, enum dma_data_direction dir)
>  {

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 10/18] iommu/vt-d: Add custom allocator for IOASID

2019-04-16 Thread Jacob Pan
On Mon, 15 Apr 2019 14:37:11 -0600
Alex Williamson  wrote:

> On Mon,  8 Apr 2019 16:59:25 -0700
> Jacob Pan  wrote:
> 
> > When VT-d driver runs in the guest, PASID allocation must be
> > performed via virtual command interface. This patch register a
> > custom IOASID allocator which takes precedence over the default
> > IDR based allocator. The resulting IOASID allocation will always
> > come from the host. This ensures that PASID namespace is system-
> > wide.
> > 
> > Signed-off-by: Lu Baolu 
> > Signed-off-by: Liu, Yi L 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/iommu/intel-iommu.c | 50 +
> >  include/linux/intel-iommu.h |  1 +
> >  2 files changed, 51 insertions(+)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 28cb713..a38d774 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -4820,6 +4820,42 @@ static int __init platform_optin_force_iommu(void)
> >  	return 1;
> >  }
> >  
> > +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
> > +{
> > +	struct intel_iommu *iommu = data;
> > +	ioasid_t ioasid;
> > +
> > +	if (vcmd_alloc_pasid(iommu, &ioasid))
> > +		return INVALID_IOASID;
> > +	return ioasid;
> 
> How does this honor min/max?
> 
Sorry, I missed this in my previous reply.
The VT-d virtual command interface always allocates PASIDs from the full
20-bit range. I think a range check is missing here. Thanks for pointing
this out.
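
Roughly what I have in mind for the fix (sketch only; using vcmd_free_pasid()
as the undo path is an assumption about this series):

static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
{
	struct intel_iommu *iommu = data;
	ioasid_t ioasid;

	if (vcmd_alloc_pasid(iommu, &ioasid))
		return INVALID_IOASID;

	/* The virtual command interface allocates from the full 20-bit
	 * space, so reject anything outside the caller's [min, max]. */
	if (ioasid < min || ioasid > max) {
		vcmd_free_pasid(iommu, ioasid); /* assumed free helper */
		return INVALID_IOASID;
	}

	return ioasid;
}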

>  [...]  
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 1/6] iommu: add generic boot option iommu.dma_mode

2019-04-16 Thread Will Deacon
On Fri, Apr 12, 2019 at 02:11:31PM +0100, Robin Murphy wrote:
> On 12/04/2019 11:26, John Garry wrote:
> > On 09/04/2019 13:53, Zhen Lei wrote:
> > > +static int __init iommu_dma_mode_setup(char *str)
> > > +{
> > > +    if (!str)
> > > +    goto fail;
> > > +
> > > +    if (!strncmp(str, "passthrough", 11))
> > > +    iommu_default_dma_mode = IOMMU_DMA_MODE_PASSTHROUGH;
> > > +    else if (!strncmp(str, "lazy", 4))
> > > +    iommu_default_dma_mode = IOMMU_DMA_MODE_LAZY;
> > > +    else if (!strncmp(str, "strict", 6))
> > > +    iommu_default_dma_mode = IOMMU_DMA_MODE_STRICT;
> > > +    else
> > > +    goto fail;
> > > +
> > > +    pr_info("Force dma mode to be %d\n", iommu_default_dma_mode);
> > 
> > What happens if the cmdline option iommu.dma_mode is passed multiple
> > times? We get multiple - possibly conflicting - prints, right?
> 
> Indeed; we ended up removing such prints for the existing options here,
> specifically because multiple messages seemed more likely to be confusing
> than useful.
> 
> > And do we need to have backwards compatibility, such that the setting
> > for iommu.strict or iommu.passthrough trumps iommu.dma_mode, regardless
> > of order?
> 
> As above I think it would be preferable to just keep using the existing
> options anyway. The current behaviour works out as:
> 
> iommu.passthrough |      Y      |          N
> iommu.strict      |      x      |    Y    |    N
> ------------------|-------------|---------|--------
>        MODE       | PASSTHROUGH |  STRICT |   LAZY
> 
> which seems intuitive enough that a specific dma_mode option doesn't add
> much value, and would more likely just overcomplicate things for users as
> well as our implementation.

Agreed. We can't remove the existing options, and they do the job perfectly
well so I don't see the need to add more options on top.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH -next] xen-swiotlb: Make two functions static

2019-04-16 Thread Yue Haibing
From: YueHaibing 

Fix sparse warnings:

drivers/xen/swiotlb-xen.c:489:1: warning:
 symbol 'xen_swiotlb_sync_single_for_cpu' was not declared. Should it be static?
drivers/xen/swiotlb-xen.c:496:1: warning:
 symbol 'xen_swiotlb_sync_single_for_device' was not declared. Should it be 
static?

Reported-by: Hulk Robot 
Signed-off-by: YueHaibing 
---
 drivers/xen/swiotlb-xen.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 877baf2..e741df4 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -485,14 +485,14 @@ xen_swiotlb_sync_single(struct device *hwdev, dma_addr_t 
dev_addr,
xen_dma_sync_single_for_device(hwdev, dev_addr, size, dir);
 }
 
-void
+static void
 xen_swiotlb_sync_single_for_cpu(struct device *hwdev, dma_addr_t dev_addr,
size_t size, enum dma_data_direction dir)
 {
xen_swiotlb_sync_single(hwdev, dev_addr, size, dir, SYNC_FOR_CPU);
 }
 
-void
+static void
 xen_swiotlb_sync_single_for_device(struct device *hwdev, dma_addr_t dev_addr,
   size_t size, enum dma_data_direction dir)
 {
-- 
2.7.4


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 6/9] iommu/amd: Implement map_atomic

2019-04-16 Thread Robin Murphy

On 11/04/2019 19:47, Tom Murphy wrote:

Instead of using a spin lock I removed the mutex lock from both the
amd_iommu_map and amd_iommu_unmap path as well. iommu_map doesn’t lock
while mapping and so if iommu_map is called by two different threads on
the same iova region it results in a race condition even with the locks.
So the locking in amd_iommu_map and amd_iommu_unmap doesn't add any real
protection. The solution to this is for whatever manages the allocated
iova’s externally to make sure iommu_map isn’t called twice on the same
region at the same time.


Note that that assumption is not necessarily sufficient - even with 
correct address space management you can have cases like two threads 
mapping adjacent pages, where even though they are targeting different 
PTEs they can race to install/modify intermediate levels of the 
pagetable. I believe AMD is actually OK in that regard, but some drivers 
*are* relying on locking for correctness so it can't just be 
unequivocally removed everywhere.
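
As a generic illustration (nothing driver-specific, and the helpers are made
up), lock-free installation of an intermediate level only works when the slot
is claimed atomically, along these lines:

/* Two mappers targeting adjacent PTEs may both find this slot empty;
 * cmpxchg ensures exactly one table is installed and the loser frees
 * its copy instead of clobbering the winner's. */
static u64 *install_next_level(u64 *slot)
{
	u64 *table = alloc_table_page();	/* made-up allocator */
	u64 newval = table_desc(table);		/* made-up encoding helper */

	if (cmpxchg64(slot, 0ULL, newval) != 0ULL) {
		free_table_page(table);
		table = desc_to_table(READ_ONCE(*slot)); /* made-up decode */
	}
	return table;
}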


Robin.


Signed-off-by: Tom Murphy 
---
  drivers/iommu/amd_iommu.c | 25 ++---
  1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 2d4ee10626b4..b45e0e033adc 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3089,12 +3089,12 @@ static int amd_iommu_attach_device(struct iommu_domain 
*dom,
return ret;
  }
  
-static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,

-phys_addr_t paddr, size_t page_size, int iommu_prot)
+static int __amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
+phys_addr_t paddr, size_t page_size, int iommu_prot,
+gfp_t gfp)
  {
struct protection_domain *domain = to_pdomain(dom);
int prot = 0;
-   int ret;
  
  	if (domain->mode == PAGE_MODE_NONE)

return -EINVAL;
@@ -3104,11 +3104,21 @@ static int amd_iommu_map(struct iommu_domain *dom, 
unsigned long iova,
if (iommu_prot & IOMMU_WRITE)
prot |= IOMMU_PROT_IW;
  
-	mutex_lock(&domain->api_lock);

-   ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL);
-   mutex_unlock(&domain->api_lock);
+   return iommu_map_page(domain, iova, paddr, page_size, prot, gfp);
+}
  
-	return ret;

+static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
+phys_addr_t paddr, size_t page_size, int iommu_prot)
+{
+   return __amd_iommu_map(dom, iova, paddr, page_size, iommu_prot,
+   GFP_KERNEL);
+}
+
+static int amd_iommu_map_atomic(struct iommu_domain *dom, unsigned long iova,
+phys_addr_t paddr, size_t page_size, int iommu_prot)
+{
+   return __amd_iommu_map(dom, iova, paddr, page_size, iommu_prot,
+   GFP_ATOMIC);
  }
  
  static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,

@@ -3262,6 +3272,7 @@ const struct iommu_ops amd_iommu_ops = {
.attach_dev = amd_iommu_attach_device,
.detach_dev = amd_iommu_detach_device,
.map = amd_iommu_map,
+   .map_atomic = amd_iommu_map_atomic,
.unmap = amd_iommu_unmap,
.iova_to_phys = amd_iommu_iova_to_phys,
.add_device = amd_iommu_add_device,


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 2/9] iommu/dma-iommu: Add function to flush any cached not present IOTLB entries

2019-04-16 Thread Robin Murphy

On 11/04/2019 19:47, Tom Murphy wrote:

Both the AMD and Intel drivers can cache not present IOTLB entries. To
convert these drivers to the dma-iommu api we need a generic way to
flush the NP cache. IOMMU drivers which have a NP cache can implement
the .flush_np_cache function in the iommu ops struct. I will implement
.flush_np_cache for both the Intel and AMD drivers in later patches.

The Intel np-cache is described here:
https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf#G7.66452

And the AMD np-cache is described here:
https://developer.amd.com/wordpress/media/2012/10/34434-IOMMU-Rev_1.26_2-11-09.pdf#page=63


Callers expect that once iommu_map() returns successfully, the mapping 
exists and is ready to use - if these drivers aren't handling this 
flushing internally, how are they not already broken for e.g. VFIO?



Signed-off-by: Tom Murphy 
---
  drivers/iommu/dma-iommu.c | 10 ++
  include/linux/iommu.h |  3 +++
  2 files changed, 13 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 1a4bff3f8427..cc5da30d6e58 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -594,6 +594,9 @@ struct page **iommu_dma_alloc(struct device *dev, size_t 
size, gfp_t gfp,
< size)
goto out_free_sg;
  
+	if (domain->ops->flush_np_cache)

+   domain->ops->flush_np_cache(domain, iova, size);
+


This doesn't scale. At the very least, it should be internal to 
iommu_map() rather than exposed to be the responsibility of every external 
caller now and forever after.


That said, I've now gone and looked and AFAICS both the Intel and AMD 
drivers *do* appear to handle this in their iommu_ops::map callbacks 
already, so the whole patch does indeed seem bogus. What might be 
worthwhile, though, is seeing if there's scope to refactor those drivers 
to push some of it into an iommu_ops::iotlb_sync_map callback to 
optimise the flushing for multi-page mappings.


Robin.


*handle = iova;
sg_free_table(&sgt);
return pages;
@@ -652,6 +655,10 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
phys_addr_t phys,
iommu_dma_free_iova(cookie, iova, size);
return DMA_MAPPING_ERROR;
}
+
+   if (domain->ops->flush_np_cache)
+   domain->ops->flush_np_cache(domain, iova, size);
+
return iova + iova_off;
  }
  
@@ -812,6 +819,9 @@ int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,

if (iommu_map_sg_atomic(domain, iova, sg, nents, prot) < iova_len)
goto out_free_iova;
  
+	if (domain->ops->flush_np_cache)

+   domain->ops->flush_np_cache(domain, iova, iova_len);
+
return __finalise_sg(dev, sg, nents, iova);
  
  out_free_iova:

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 75559918d9bd..47ff8d731d6a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -173,6 +173,7 @@ struct iommu_resv_region {
   * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
   * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
   *queue
+ * @flush_np_cache: Flush the non present entry cache
   * @iova_to_phys: translate iova to physical address
   * @add_device: add device to iommu grouping
   * @remove_device: remove device from iommu grouping
@@ -209,6 +210,8 @@ struct iommu_ops {
unsigned long iova, size_t size);
void (*iotlb_sync_map)(struct iommu_domain *domain);
void (*iotlb_sync)(struct iommu_domain *domain);
+   void (*flush_np_cache)(struct iommu_domain *domain,
+   unsigned long iova, size_t size);
phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t 
iova);
int (*add_device)(struct device *dev);
void (*remove_device)(struct device *dev);


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3/9] iommu/dma-iommu: Add iommu_dma_copy_reserved_iova, iommu_dma_apply_resv_region to the dma-iommu api

2019-04-16 Thread Robin Murphy

On 16/04/2019 14:22, Tom Murphy wrote:

I hoped this could be an exception, it's easier to grok without the
line break and isn't crazy long. Because you mentioned it I'll fix it.


Frankly this patch is hard to justify anyway - iommu-dma already has its 
own reserved region handling, and there should be no need for external 
callers to be poking into the innards provided the IOMMU driver reports 
the correct reserved regions in the first place.


If the iommu-dma abstraction is not quite sufficient to actually convert 
amd-iommu to use it properly, then we should improve the abstraction, 
rather than just punching holes in it to merely poke renamed parts of 
the existing amd-iommu logic into.


Robin.


On Mon, Apr 15, 2019 at 7:31 AM Christoph Hellwig  wrote:


On Thu, Apr 11, 2019 at 07:47:32PM +0100, Tom Murphy via iommu wrote:

+
+ WARN_ON_ONCE(iommu_dma_reserve_iova(domain, region->start, end) == NULL);


Overly long line..

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3/9] iommu/dma-iommu: Add iommu_dma_copy_reserved_iova, iommu_dma_apply_resv_region to the dma-iommu api

2019-04-16 Thread Tom Murphy via iommu
I hoped this could be an exception, it's easier to grok without the
line break and isn't crazy long. Because you mentioned it I'll fix it.

On Mon, Apr 15, 2019 at 7:31 AM Christoph Hellwig  wrote:
>
> On Thu, Apr 11, 2019 at 07:47:32PM +0100, Tom Murphy via iommu wrote:
> > +
> > + WARN_ON_ONCE(iommu_dma_reserve_iova(domain, region->start, end) == 
> > NULL);
>
> Overly long line..
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 0/3] PCIe Host request to reserve IOVA

2019-04-16 Thread Srinath Mannam via iommu
Hi Bjorn,

Thanks for review. Please find my reply below.

On Sat, Apr 13, 2019 at 4:04 AM Bjorn Helgaas  wrote:
>
> On Fri, Apr 12, 2019 at 08:43:32AM +0530, Srinath Mannam wrote:
> > A few SoCs have a limitation that their PCIe host can't accept some inbound
> > address ranges. The allowed inbound address ranges are listed in the
> > dma-ranges DT property, and these address ranges are required to do IOVA
> > mapping. The remaining address ranges have to be reserved in the IOVA mapping.
>
> If I understand correctly, devices below these PCIe host bridges can
> DMA only to the listed address ranges, and you prevent devices from
> doing DMA to the holes between the listed ranges by reserving the
> holes in dma-iommu.
Yes, devices below these PCIe host bridges can DMA only to the listed
address ranges, and this patch prevents allocating DMA (IOVA) addresses
in the holes between the listed ranges.
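
For reference, the reservation step boils down to something like the sketch
below (a condensed paraphrase, assuming the dma_ranges list added to the host
bridge in patch 1/3; not the literal patch):

static void iova_reserve_dma_ranges(struct pci_host_bridge *bridge,
				    struct iova_domain *iovad)
{
	struct resource_entry *window;
	phys_addr_t start = 0;

	/* dma_ranges is sorted, so reserve each hole below the next
	 * allowed window, then the tail above the last one. */
	resource_list_for_each_entry(window, &bridge->dma_ranges) {
		if (window->res->start > start)
			reserve_iova(iovad, iova_pfn(iovad, start),
				     iova_pfn(iovad, window->res->start - 1));
		start = window->res->end + 1;
	}
	reserve_iova(iovad, iova_pfn(iovad, start),
		     iova_pfn(iovad, (phys_addr_t)-1));
}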
>
> Apparently there's something that makes sure driver dma_map_*() always
> goes through dma-iommu?  I traced as far as seeing that dma-iommu
> depends on CONFIG_IOMMU_DMA, and that arm64 selects CONFIG_IOMMU_DMA
> if CONFIG_IOMMU_SUPPORT, but then the trail got cold.  I didn't see
> what selects CONFIG_IOMMU_SUPPORT.
IOMMU_SUPPORT depends on MMU.
>
> This does look like what Robin suggested, as far as I can tell.
> Hopefully he'll take a look and give his reviewed-by.  Thanks for
> persevering!
Thank you.

Regards,
Srinath.
>
> > PCIe Host driver of those SOCs has to list resource entries of allowed
> > address ranges given in dma-ranges DT property in sorted order. This
> > sorted list of resources will be processed and reserve IOVA address for
> > inaccessible address holes while initializing IOMMU domain.
> >
> > This patch set is based on Linux-5.0-rc2.
> >
> > Changes from v3:
> >   - Addressed Robin Murphy review comments.
> > - pcie-iproc: parse dma-ranges and make sorted resource list.
> > - dma-iommu: process list and reserve gaps between entries
> >
> > Changes from v2:
> >   - Patch set rebased to Linux-5.0-rc2
> >
> > Changes from v1:
> >   - Addressed Oza review comments.
> >
> > Srinath Mannam (3):
> >   PCI: Add dma_ranges window list
> >   iommu/dma: Reserve IOVA for PCIe inaccessible DMA address
> >   PCI: iproc: Add sorted dma ranges resource entries to host bridge
> >
> >  drivers/iommu/dma-iommu.c   | 19 
> >  drivers/pci/controller/pcie-iproc.c | 44 
> > -
> >  drivers/pci/probe.c |  3 +++
> >  include/linux/pci.h |  1 +
> >  4 files changed, 66 insertions(+), 1 deletion(-)
> >
> > --
> > 2.7.4
> >
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[REBASE PATCH v5 8/9] xen/gntdev.c: Convert to use vm_map_pages()

2019-04-16 Thread Souptick Joarder
Convert to use vm_map_pages() to map range of kernel
memory to user vma.

map->count is passed to vm_map_pages(), and the internal API verifies
map->count against count (count = vma_pages(vma)) to guard against a
page array boundary overrun.

Signed-off-by: Souptick Joarder 
Reviewed-by: Boris Ostrovsky 
---
 drivers/xen/gntdev.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 5efc5ee..5d64262 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -1084,7 +1084,7 @@ static int gntdev_mmap(struct file *flip, struct 
vm_area_struct *vma)
int index = vma->vm_pgoff;
int count = vma_pages(vma);
struct gntdev_grant_map *map;
-   int i, err = -EINVAL;
+   int err = -EINVAL;
 
if ((vma->vm_flags & VM_WRITE) && !(vma->vm_flags & VM_SHARED))
return -EINVAL;
@@ -1145,12 +1145,9 @@ static int gntdev_mmap(struct file *flip, struct 
vm_area_struct *vma)
goto out_put_map;
 
if (!use_ptemod) {
-   for (i = 0; i < count; i++) {
-   err = vm_insert_page(vma, vma->vm_start + i*PAGE_SIZE,
-   map->pages[i]);
-   if (err)
-   goto out_put_map;
-   }
+   err = vm_map_pages(vma, map->pages, map->count);
+   if (err)
+   goto out_put_map;
} else {
 #ifdef CONFIG_X86
/*
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[REBASE PATCH v5 9/9] xen/privcmd-buf.c: Convert to use vm_map_pages_zero()

2019-04-16 Thread Souptick Joarder
Convert to use vm_map_pages_zero() to map range of kernel
memory to user vma.

This driver has ignored vm_pgoff. We could later "fix" these drivers
to behave according to the normal vm_pgoff offsetting simply by
removing the _zero suffix on the function name and if that causes
regressions, it gives us an easy way to revert.

Signed-off-by: Souptick Joarder 
Reviewed-by: Boris Ostrovsky 
---
 drivers/xen/privcmd-buf.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/xen/privcmd-buf.c b/drivers/xen/privcmd-buf.c
index de01a6d..d02dc43 100644
--- a/drivers/xen/privcmd-buf.c
+++ b/drivers/xen/privcmd-buf.c
@@ -166,12 +166,8 @@ static int privcmd_buf_mmap(struct file *file, struct 
vm_area_struct *vma)
if (vma_priv->n_pages != count)
ret = -ENOMEM;
else
-   for (i = 0; i < vma_priv->n_pages; i++) {
-   ret = vm_insert_page(vma, vma->vm_start + i * PAGE_SIZE,
-vma_priv->pages[i]);
-   if (ret)
-   break;
-   }
+   ret = vm_map_pages_zero(vma, vma_priv->pages,
+   vma_priv->n_pages);
 
if (ret)
privcmd_buf_vmapriv_free(vma_priv);
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[REBASE PATCH v5 3/9] drivers/firewire/core-iso.c: Convert to use vm_map_pages_zero()

2019-04-16 Thread Souptick Joarder
Convert to use vm_map_pages_zero() to map range of kernel memory
to user vma.

This driver has ignored vm_pgoff and mapped the entire pages. We
could later "fix" these drivers to behave according to the normal
vm_pgoff offsetting simply by removing the _zero suffix on the
function name and if that causes regressions, it gives us an easy
way to revert.

Signed-off-by: Souptick Joarder 
---
 drivers/firewire/core-iso.c | 15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/drivers/firewire/core-iso.c b/drivers/firewire/core-iso.c
index 35e784c..5414eb1 100644
--- a/drivers/firewire/core-iso.c
+++ b/drivers/firewire/core-iso.c
@@ -107,19 +107,8 @@ int fw_iso_buffer_init(struct fw_iso_buffer *buffer, 
struct fw_card *card,
 int fw_iso_buffer_map_vma(struct fw_iso_buffer *buffer,
  struct vm_area_struct *vma)
 {
-   unsigned long uaddr;
-   int i, err;
-
-   uaddr = vma->vm_start;
-   for (i = 0; i < buffer->page_count; i++) {
-   err = vm_insert_page(vma, uaddr, buffer->pages[i]);
-   if (err)
-   return err;
-
-   uaddr += PAGE_SIZE;
-   }
-
-   return 0;
+   return vm_map_pages_zero(vma, buffer->pages,
+   buffer->page_count);
 }
 
 void fw_iso_buffer_destroy(struct fw_iso_buffer *buffer,
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[REBASE PATCH v5 5/9] drm/xen/xen_drm_front_gem.c: Convert to use vm_map_pages()

2019-04-16 Thread Souptick Joarder
Convert to use vm_map_pages() to map range of kernel
memory to user vma.

Signed-off-by: Souptick Joarder 
Reviewed-by: Oleksandr Andrushchenko 
---
 drivers/gpu/drm/xen/xen_drm_front_gem.c | 18 +-
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/xen/xen_drm_front_gem.c 
b/drivers/gpu/drm/xen/xen_drm_front_gem.c
index 28bc501..dd0602d 100644
--- a/drivers/gpu/drm/xen/xen_drm_front_gem.c
+++ b/drivers/gpu/drm/xen/xen_drm_front_gem.c
@@ -224,8 +224,7 @@ struct drm_gem_object *
 static int gem_mmap_obj(struct xen_gem_object *xen_obj,
struct vm_area_struct *vma)
 {
-   unsigned long addr = vma->vm_start;
-   int i;
+   int ret;
 
/*
 * clear the VM_PFNMAP flag that was set by drm_gem_mmap(), and set the
@@ -246,18 +245,11 @@ static int gem_mmap_obj(struct xen_gem_object *xen_obj,
 * FIXME: as we insert all the pages now then no .fault handler must
 * be called, so don't provide one
 */
-   for (i = 0; i < xen_obj->num_pages; i++) {
-   int ret;
-
-   ret = vm_insert_page(vma, addr, xen_obj->pages[i]);
-   if (ret < 0) {
-   DRM_ERROR("Failed to insert pages into vma: %d\n", ret);
-   return ret;
-   }
+   ret = vm_map_pages(vma, xen_obj->pages, xen_obj->num_pages);
+   if (ret < 0)
+   DRM_ERROR("Failed to map pages into vma: %d\n", ret);
 
-   addr += PAGE_SIZE;
-   }
-   return 0;
+   return ret;
 }
 
 int xen_drm_front_gem_mmap(struct file *filp, struct vm_area_struct *vma)
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[REBASE PATCH v5 4/9] drm/rockchip/rockchip_drm_gem.c: Convert to use vm_map_pages()

2019-04-16 Thread Souptick Joarder
Convert to use vm_map_pages() to map range of kernel
memory to user vma.

Tested on Rockchip hardware and display is working,
including talking to Lima via prime.

Signed-off-by: Souptick Joarder 
Tested-by: Heiko Stuebner 
---
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 17 ++---
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
index a8db758..a2ebb08 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
@@ -221,26 +221,13 @@ static int rockchip_drm_gem_object_mmap_iommu(struct 
drm_gem_object *obj,
  struct vm_area_struct *vma)
 {
struct rockchip_gem_object *rk_obj = to_rockchip_obj(obj);
-   unsigned int i, count = obj->size >> PAGE_SHIFT;
+   unsigned int count = obj->size >> PAGE_SHIFT;
unsigned long user_count = vma_pages(vma);
-   unsigned long uaddr = vma->vm_start;
-   unsigned long offset = vma->vm_pgoff;
-   unsigned long end = user_count + offset;
-   int ret;
 
if (user_count == 0)
return -ENXIO;
-   if (end > count)
-   return -ENXIO;
 
-   for (i = offset; i < end; i++) {
-   ret = vm_insert_page(vma, uaddr, rk_obj->pages[i]);
-   if (ret)
-   return ret;
-   uaddr += PAGE_SIZE;
-   }
-
-   return 0;
+   return vm_map_pages(vma, rk_obj->pages, count);
 }
 
 static int rockchip_drm_gem_object_mmap_dma(struct drm_gem_object *obj,
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[REBASE PATCH v5 7/9] videobuf2/videobuf2-dma-sg.c: Convert to use vm_map_pages()

2019-04-16 Thread Souptick Joarder
Convert to use vm_map_pages() to map range of kernel memory
to user vma.

vm_pgoff is treated in the V4L2 API as a 'cookie' to select a buffer,
not as an in-buffer offset, by design, and V4L2 always wants to mmap a
whole buffer from its beginning.

Signed-off-by: Souptick Joarder 
Suggested-by: Marek Szyprowski 
Reviewed-by: Marek Szyprowski 
---
 drivers/media/common/videobuf2/videobuf2-core.c|  7 +++
 .../media/common/videobuf2/videobuf2-dma-contig.c  |  6 --
 drivers/media/common/videobuf2/videobuf2-dma-sg.c  | 22 ++
 3 files changed, 13 insertions(+), 22 deletions(-)

diff --git a/drivers/media/common/videobuf2/videobuf2-core.c 
b/drivers/media/common/videobuf2/videobuf2-core.c
index 70e8c33..ca4577a 100644
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -2175,6 +2175,13 @@ int vb2_mmap(struct vb2_queue *q, struct vm_area_struct 
*vma)
goto unlock;
}
 
+   /*
+* vm_pgoff is treated in V4L2 API as a 'cookie' to select a buffer,
+* not as a in-buffer offset. We always want to mmap a whole buffer
+* from its beginning.
+*/
+   vma->vm_pgoff = 0;
+
ret = call_memop(vb, mmap, vb->planes[plane].mem_priv, vma);
 
 unlock:
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c 
b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
index aff0ab7..46245c5 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
@@ -186,12 +186,6 @@ static int vb2_dc_mmap(void *buf_priv, struct 
vm_area_struct *vma)
return -EINVAL;
}
 
-   /*
-* dma_mmap_* uses vm_pgoff as in-buffer offset, but we want to
-* map whole buffer
-*/
-   vma->vm_pgoff = 0;
-
ret = dma_mmap_attrs(buf->dev, vma, buf->cookie,
buf->dma_addr, buf->size, buf->attrs);
 
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c 
b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
index 015e737..d6b8eca 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
@@ -328,28 +328,18 @@ static unsigned int vb2_dma_sg_num_users(void *buf_priv)
 static int vb2_dma_sg_mmap(void *buf_priv, struct vm_area_struct *vma)
 {
struct vb2_dma_sg_buf *buf = buf_priv;
-   unsigned long uaddr = vma->vm_start;
-   unsigned long usize = vma->vm_end - vma->vm_start;
-   int i = 0;
+   int err;
 
if (!buf) {
printk(KERN_ERR "No memory to map\n");
return -EINVAL;
}
 
-   do {
-   int ret;
-
-   ret = vm_insert_page(vma, uaddr, buf->pages[i++]);
-   if (ret) {
-   printk(KERN_ERR "Remapping memory, error: %d\n", ret);
-   return ret;
-   }
-
-   uaddr += PAGE_SIZE;
-   usize -= PAGE_SIZE;
-   } while (usize > 0);
-
+   err = vm_map_pages(vma, buf->pages, buf->num_pages);
+   if (err) {
+   printk(KERN_ERR "Remapping memory, error: %d\n", err);
+   return err;
+   }
 
/*
 * Use common vm_area operations to track buffer refcount.
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[REBASE PATCH v5 2/9] arm: mm: dma-mapping: Convert to use vm_map_pages()

2019-04-16 Thread Souptick Joarder
Convert to use vm_map_pages() to map range of kernel
memory to user vma.

Signed-off-by: Souptick Joarder 
---
 arch/arm/mm/dma-mapping.c | 22 ++
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index f1e2922..de7c76e 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1575,31 +1575,21 @@ static int __arm_iommu_mmap_attrs(struct device *dev, 
struct vm_area_struct *vma
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs)
 {
-   unsigned long uaddr = vma->vm_start;
-   unsigned long usize = vma->vm_end - vma->vm_start;
struct page **pages = __iommu_get_pages(cpu_addr, attrs);
unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   unsigned long off = vma->vm_pgoff;
+   int err;
 
if (!pages)
return -ENXIO;
 
-   if (off >= nr_pages || (usize >> PAGE_SHIFT) > nr_pages - off)
+   if (vma->vm_pgoff >= nr_pages)
return -ENXIO;
 
-   pages += off;
-
-   do {
-   int ret = vm_insert_page(vma, uaddr, *pages++);
-   if (ret) {
-   pr_err("Remapping memory failed: %d\n", ret);
-   return ret;
-   }
-   uaddr += PAGE_SIZE;
-   usize -= PAGE_SIZE;
-   } while (usize > 0);
+   err = vm_map_pages(vma, pages, nr_pages);
+   if (err)
+   pr_err("Remapping memory failed: %d\n", err);
 
-   return 0;
+   return err;
 }
 static int arm_iommu_mmap_attrs(struct device *dev,
struct vm_area_struct *vma, void *cpu_addr,
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 6/7] iommu/arm-smmu-v3: Add support for PCI ATS

2019-04-16 Thread Will Deacon
On Mon, Apr 15, 2019 at 07:00:27PM +0100, Jean-Philippe Brucker wrote:
> On 15/04/2019 14:21, Will Deacon wrote:
> > On Tue, Apr 09, 2019 at 05:52:44PM +0100, Jean-Philippe Brucker wrote:
> >> PCIe devices can implement their own TLB, named Address Translation Cache
> >> (ATC). Enable Address Translation Service (ATS) for devices that support
> >> it and send them invalidation requests whenever we invalidate the IOTLBs.
> >>
> >> ATC invalidation is allowed to take up to 90 seconds, according to the
> >> PCIe spec, so it is possible to get a SMMU command queue timeout during
> >> normal operations. However we expect implementations to complete
> >> invalidation in reasonable time.
> >>
> >> We only enable ATS for "trusted" devices, and currently rely on the
> >> pci_dev->untrusted bit. For ATS we have to trust that:
> > 
> > AFAICT, devicetree has no way to describe a device as untrusted, so
> > everything will be trusted by default on those systems. Is that right?
> 
> Yes, although I'm adding a devicetree property for PCI in v5.2:
> https://lore.kernel.org/linux-pci/20190411211823.gu256...@google.com/T/#m9f2c036748bab6e262233280b22c1e6fab4e1607

Perfect :)

> >> (a) The device doesn't issue "translated" memory requests for addresses
> >> that weren't returned by the SMMU in a Translation Completion. In
> >> particular, if we give control of a device or device partition to a VM
> >> or userspace, software cannot program the device to access arbitrary
> >> "translated" addresses.
> > 
> > Any plans to look at split-stage ATS later on? I think that would allow
> > us to pass untrusted devices through to a VM behind S1 ATS.
> 
> I haven't tested it so far, we can look at that after adding support for
> nested translation

Sure, just curious. Thanks.

> 
> >> (b) The device follows permissions granted by the SMMU in a Translation
> >> Completion. If the device requested read+write permission and only
> >> got read, then it doesn't write.
> > 
> > Guessing we just ignore execute permissions, or do we need read implies
> > exec?
> 
> Without PASID, a function cannot ask for execute permission, only RO and
> RW. In this case execution by the endpoint is never permitted (because
> the Exe bit in an ATS completion is always zero).
> 
> With PASID, the endpoint must explicitly ask for execute permission (and
> interestingly, can't obtain it if the page is mapped exec-only, because
> in ATS exec implies read.)

Got it, thanks again.

> [...]
> >> +static bool disable_ats_check;
> >> +module_param_named(disable_ats_check, disable_ats_check, bool, S_IRUGO);
> >> +MODULE_PARM_DESC(disable_ats_check,
> >> +  "By default, the SMMU checks whether each incoming transaction marked 
> >> as translated is allowed by the stream configuration. This option disables 
> >> the check.");
> > 
> > Yikes, do we really want this option, or is it just a leftover from
> > debugging?
> 
> I wasn't sure what to do with it, I'll drop it in next version

Cheers.

> [...]
> >> @@ -876,6 +911,14 @@ static void arm_smmu_cmdq_skip_err(struct 
> >> arm_smmu_device *smmu)
> >>dev_err(smmu->dev, "retrying command fetch\n");
> >>case CMDQ_ERR_CERROR_NONE_IDX:
> >>return;
> >> +  case CMDQ_ERR_CERROR_ATC_INV_IDX:
> >> +  /*
> >> +   * ATC Invalidation Completion timeout. CONS is still pointing
> >> +   * at the CMD_SYNC. Attempt to complete other pending commands
> >> +   * by repeating the CMD_SYNC, though we might well end up back
> >> +   * here since the ATC invalidation may still be pending.
> >> +   */
> >> +  return;
> > 
> > This is pretty bad, as it means we're unable to unmap a page safely from a
> > misbehaving device. Ideally, we'd block further transactions from the
> > problematic endpoint, but I suppose we can't easily know which one it was,
> > or inject a fault back into the unmap() path.
> > 
> > Having said that, won't we get a CMD_SYNC timeout long before the ATC 
> > timeout?
> 
> Yes, CMD_SYNC timeout is 1s, ATC timeout is 90s...
> 
> > Perhaps we could propagate that out of arm_smmu_cmdq_issue_sync() and fail 
> > the
> > unmap?
> > 
> > Not sure, what do you think?
> 
> The callers of iommu_unmap() will free the page regardless of the return
> value, even though the device could still be accessing it. But I'll look
> at returning 0 if the CMD_SYNC times out, it's a good start for
> consolidating this. With dma-iommu.c it will trigger a WARN().

If it's not too tricky, that would be good.

> It gets a worse with PRI, when the invalidation comes from an MMU
> notifier and we can't even return an error. Ideally we'd hold a
> reference to these pages until invalidation completes.

Agreed.

> [...]
> >> +  /*
> >> +   * Find the smallest power of two that covers the range. Most
> >> +   * significant differing bit between start and end address indicates the
> >> +   * required span, ie. fls(start ^ end). For example:
> >> + 
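
The quoted comment is cut off above; as a standalone illustration of the
fls(start ^ end) span computation, using made-up example values rather than
the driver's own:

#include <stdio.h>

/* fls(): index of the most significant set bit, 1-based (0 for 0),
 * mirroring the kernel helper of the same name. */
static int fls_u64(unsigned long long x)
{
	int i = 0;

	while (x) {
		x >>= 1;
		i++;
	}
	return i;
}

int main(void)
{
	unsigned long long start = 0x20030, end = 0x2103f; /* example range */
	int span_bits = fls_u64(start ^ end);

	/* The invalidation must cover an aligned 2^span_bits region
	 * containing both addresses: here 2^13 = 8KiB at 0x20000. */
	printf("span = 2^%d bytes, base = %#llx\n",
	       span_bits, start & ~((1ULL << span_bits) - 1));
	return 0;
}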

Re: [PATCH v2 0/2] iommu/arm-smmu-v3: make sure the kdump kernel can work well when smmu is enabled

2019-04-16 Thread Will Deacon
On Mon, Apr 08, 2019 at 10:31:47AM +0800, Leizhen (ThunderTown) wrote:
> On 2019/4/4 23:30, Will Deacon wrote:
> > On Mon, Mar 18, 2019 at 09:12:41PM +0800, Zhen Lei wrote:
> >> v1 --> v2:
> >> 1. Drop part2. Now, we only use the SMMUv3 hardware feature 
> >> STE.config=0b000
> >> (Report abort to device, no event recorded) to suppress the event messages
> >> caused by the unexpected devices.
> >> 2. rewrite the patch description.
> > 
> > This issue came up a while back:
> > 
> > https://lore.kernel.org/linux-pci/20180302103032.gb19...@arm.com/
> > 
> > and I'd still prefer to solve it using the disable_bypass logic which we
> > already have. Something along the lines of the diff below?
> 
> Yes, my patches also use disable_bypass=1(set ste.config=0b000). If
> SMMU_IDR0.ST_LEVEL=0(Linear Stream table supported), then all STE entries
> are allocated and initialized(set ste.config=0b000). But if 
> SMMU_IDR0.ST_LEVEL=1
> (2-level Stream Table), we only allocate and initialize the first level
> tables, but leave the level 2 tables dynamically allocated. That means
> C_BAD_STREAMID (eventid=0x2) will be reported if an unexpected device
> accesses memory without being reinitialized in the kdump kernel.

So is your problem just that the C_BAD_STREAMID events are noisy? If so,
perhaps we should be disabling fault reporting entirely in the kdump kernel.

How about the update diff below? I'm keen to have this as simple as
possible, so we don't end up introducing rarely tested, complex code on
the crash path.

Will

--->8

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index d3880010c6cf..d8b73da6447d 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2454,13 +2454,9 @@ static int arm_smmu_device_reset(struct arm_smmu_device 
*smmu, bool bypass)
/* Clear CR0 and sync (disables SMMU and queue processing) */
reg = readl_relaxed(smmu->base + ARM_SMMU_CR0);
if (reg & CR0_SMMUEN) {
-   if (is_kdump_kernel()) {
-   arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
-   arm_smmu_device_disable(smmu);
-   return -EBUSY;
-   }
-
dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
+   WARN_ON(is_kdump_kernel() && !disable_bypass);
+   arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
}
 
ret = arm_smmu_device_disable(smmu);
@@ -2553,6 +2549,8 @@ static int arm_smmu_device_reset(struct arm_smmu_device 
*smmu, bool bypass)
return ret;
}
 
+   if (is_kdump_kernel())
+   enables &= ~(CR0_EVTQEN | CR0_PRIQEN);
 
/* Enable the SMMU interface, or ensure bypass */
if (!bypass || disable_bypass) {
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 3/3] drm/panfrost: Add initial panfrost driver

2019-04-16 Thread Daniel Vetter
On Mon, Apr 15, 2019 at 10:30:14AM +0100, Steven Price wrote:
> On 15/04/2019 10:18, Daniel Vetter wrote:
> > On Fri, Apr 05, 2019 at 05:42:33PM +0100, Steven Price wrote:
> >> On 05/04/2019 17:16, Alyssa Rosenzweig wrote:
> >>> acronym once ever and have it as a "??"), I'm not sure how to respond to
> >>> that... We don't know how to allocate memory for the GPU-internal data
> >>> structures (the tiler heap, for instance, but also a few others I've
> >>> just named "misc_0" and "scratchpad" -- guessing one of those is for
> >>> "TLS"). With kbase, I took the worst-case strategy of allocating
> >>> gigantic chunks on startup with tiny commit counts and GROW_ON_GPF set.
> >>> With the new driver, well, our memory consumption is scary since
> >>> implementing GROW_ON_GPF in an upstream-friendly way is a bit more work
> >>> and isn't expected to hit the 5.2 window.
> >>
> >> Yes GROW_ON_GPF is pretty much required for the tiler heap - it's not
> >> (reasonably) possible to determine how big it should be. The Arm user
> >> space driver does the same approach (tiny commit count, but allow it to
> >> grow).
> > 
> > Jumping in here with a drive through comment ...
> > 
> > Growing gem bo and dma-buf is going to be endless amounts of fun, since we
> > hard-coded that their size is invariant.
> > 
> > I think the only reasonable way to implement this is if you allocate a
> > really huge bo, map it, but only put the pages in on faulting. Or when
> > really evil userspace tries to export it. Actually changing the underlying
> > buffer size is not going to work I think.
> 
> Yes the idea is that you allocate a large amount of virtual address
> space, but only put a few physical pages in. If the GPU needs more you
> fault them in as necessary. The "buffer size" (i.e. virtual address
> region) never changes size.
> 
> > Note: I didn't read kbase, so might be totally wrong in how GROW_ON_GPF
> > works.
> 
> For kbase we simply don't support exporting this type of memory, and are
> fairly restrictive about mapping it into user space (user space
> shouldn't normally need to read it).

You can still disallow sharing with any other driver (in the dma-buf
attach callback), and then enforce whatever mapping restrictions you want
on the panfrost.ko ioctl interface. That should be roughly equivalent to
the restrictions kbase imposes.
> 
> Since Panfrost is using GEM BO it will have to deal with malicious user
> space. But it would be sufficient to simply fully back the region in
> that case.
> 
> Recent version of kbase also support what is termed JIT (Just In Time
> memory allocation). Basically this involves the kernel driver
> allocating/freeing memory regions just before the job is loaded onto the
> GPU. These regions might also be GROW_ON_GPF. The intention is that when
> there isn't memory pressure these regions can be kept between jobs, but
> under memory pressure they can be discarded and recreated if they turn
> out to be needed again.
> 
> Given the differences between the Panfrost and the proprietary user
> space I'm not sure exactly what form this will need to be for Panfrost,
> but Vulkan makes memory management "more interesting"! Allocating
> upfront for the worst case can become prohibitively expensive.

The usual way to do that is with a WONTNEED/WILLNEED madvise ioctl on the
gem bo. I guess that could also be set at create time to essentially only
require the bo to exist during an execbuf call. Should fit pretty well
into what other drivers are doing with gem shmem already I think.

ofc needs a shrinker to be able to reap these bo.
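
Roughly the usual shape of that madvise-plus-shrinker pattern, with entirely
hypothetical names (not panfrost UAPI, just the idea):

/* Hypothetical structure: just enough state for the purgeable pattern. */
struct demo_bo {
	struct mutex lock;
	bool wontneed;		/* userspace marked it purgeable */
	bool purged;		/* backing pages were reclaimed */
};

/* madvise ioctl sketch: record the advice, tell userspace whether the
 * contents survived (0 = purged, 1 = retained). */
static int demo_bo_madvise(struct demo_bo *bo, bool willneed)
{
	int retained;

	mutex_lock(&bo->lock);
	bo->wontneed = !willneed;
	retained = !bo->purged;
	mutex_unlock(&bo->lock);
	return retained;
}

/* shrinker scan sketch: only WONTNEED buffers may lose their pages. */
static bool demo_bo_purge(struct demo_bo *bo)
{
	bool freed = false;

	mutex_lock(&bo->lock);
	if (bo->wontneed && !bo->purged) {
		/* release backing pages here */
		bo->purged = true;
		freed = true;
	}
	mutex_unlock(&bo->lock);
	return freed;
}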
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu