Re: [RFC PATCH 2/4] iommu/vt-d: Add first level page table interfaces

2019-09-26 Thread Peter Xu
Hi, Baolu,

On Fri, Sep 27, 2019 at 10:27:24AM +0800, Lu Baolu wrote:
> > > > > + spin_lock(&(domain)->page_table_lock); \
> > > > 
> > > > Is this intended to lock here instead of taking the lock during the
> > > > whole page table walk?  Is it safe?
> > > > 
> > > > Taking the example where nm==PTE: when we reach here how do we
> > > > guarantee that the PMD page that has this PTE is still valid?
> > > 
> > > We will always keep the non-leaf pages in the table,
> > 
> > I see.  Though, could I ask why?  It seems to me that the existing 2nd
> > level page table does not keep these when unmapping, and it doesn't even
> > use locking at all, by leveraging cmpxchg()?
> 
> I still need some time to understand how cmpxchg() solves the race issue
> when reclaiming pages. For example:
> 
> Thread A                      Thread B
> -A1: check all PTEs empty     -B1: up-level PDE valid
> -A2: clear the up-level PDE
> -A3: reclaim the page         -B2: populate the PTEs
> 
> Both (A1,A2) and (B1,B2) should be atomic. Otherwise, a race could happen.

I'm not sure of this, but IMHO it is similarly because we need to
allocate the iova ranges from the iova allocator first, so thread A (who's
going to unmap pages) and thread B (who's going to map new pages)
should never operate on overlapping regions even if they run concurrently.
I'm referring to intel_unmap(), in which we won't free the iova region
before domain_unmap() completes (which should cover the whole process
of A1-A3), so the iova range being unmapped won't be allocated to
any new pages by some other thread.

There's also a hint in domain_unmap():

  /* we don't need lock here; nobody else touches the iova range */
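
To make that ordering concrete, here is a simplified sketch of the
intel_unmap() path (not the literal upstream code; the flush step is
abbreviated and the wrapper name below is made up for illustration):

  /*
   * Sketch only: the IOVA range is handed back to the allocator only
   * after domain_unmap() has finished clearing the page tables, so a
   * concurrent map (thread B) can never be given a range that overlaps
   * an in-flight unmap (thread A, steps A1-A3).
   */
  static void unmap_ordering_sketch(struct dmar_domain *domain,
                                    unsigned long start_pfn,
                                    unsigned long last_pfn)
  {
          struct page *freelist;

          /* A1-A3: clear PTEs and collect page-table pages to free;
           * no lock needed, nobody else touches this iova range. */
          freelist = domain_unmap(domain, start_pfn, last_pfn);

          /* the IOTLB flush for the range happens here (per IOMMU the
           * domain is attached to), then the collected pages go away */
          dma_free_pagelist(freelist);

          /* only now does the range return to the IOVA allocator and
           * become visible to a concurrent mapper */
          free_iova_fast(&domain->iovad, start_pfn,
                         last_pfn - start_pfn + 1);
  }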

> 
> Actually, the iova allocator always packs IOVA ranges close to the top
> of the address space. This results in requiring a minimal number of
> pages to map the allocated IOVA ranges, which makes memory consumption
> by IOMMU page tables tolerable. Hence, we don't need to reclaim the
> pages until the whole page table is about to be torn down. The real data
> on my test machine also proves this.

Do you mean you have run the code on IOMMU hardware that supports the
1st-level page table?  IMHO any data point would be good to have in the
cover letter as a reference.

[...]

> > > > > +static struct page *
> > > > > +mmunmap_pte_range(struct dmar_domain *domain, pmd_t *pmd,
> > > > > +   unsigned long addr, unsigned long end,
> > > > > +   struct page *freelist, bool reclaim)
> > > > > +{
> > > > > + int i;
> > > > > + unsigned long start;
> > > > > + pte_t *pte, *first_pte;
> > > > > +
> > > > > + start = addr;
> > > > > + pte = pte_offset_kernel(pmd, addr);
> > > > > + first_pte = pte;
> > > > > + do {
> > > > > + set_pte(pte, __pte(0));
> > > > > + } while (pte++, addr += PAGE_SIZE, addr != end);
> > > > > +
> > > > > + domain_flush_cache(domain, first_pte, (void *)pte - (void *)first_pte);
> > > > > +
> > > > > + /* Add page to free list if all entries are empty. */
> > > > > + if (reclaim) {
> > > > 
> > > > Shouldn't we be able to tell whether to reclaim from the (addr, end)
> > > > specified, as long as they cover the whole range of this PMD?
> > > 
> > > Current policy is that we don't reclaim any pages until the whole page
> > > table is torn down.
> > 
> > Ah OK.  But I saw that you're passing in reclaim==!start_addr.
> > Shouldn't that erroneously trigger if one wants to unmap the 1st page
> > but not the whole address space?
> 
> IOVA 0 is assumed to be reserved by the allocator. Otherwise, we have no
> means to check whether an IOVA is valid.

Is this an assumption of the allocator?  Could that change in the future?

Even if so, IMHO relying on that isn't necessary; after all, it's as
simple as replacing (!start_addr) with (start == 0 && end == END).  I see
that domain_unmap() has a similar check when freeing the pgd:

  if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw))
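
(Illustration only: the first-level path could derive reclaim the same
way, using DOMAIN_MAX_ADDR() as the address-sized counterpart of the
DOMAIN_MAX_PFN() check quoted above, instead of keying off start_addr:)

  /* reclaim page-table pages only when tearing down the whole table */
  bool reclaim = (start == 0 && end == DOMAIN_MAX_ADDR(domain->gaw));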

Thanks,

-- 
Peter Xu


Re: [RFC PATCH 2/4] iommu/vt-d: Add first level page table interfaces

2019-09-26 Thread Lu Baolu

Hi Peter,

On 9/26/19 11:49 AM, Peter Xu wrote:

On Thu, Sep 26, 2019 at 10:35:24AM +0800, Lu Baolu wrote:

[...]


@@ -0,0 +1,342 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * intel-pgtable.c - Intel IOMMU page table manipulation library


Could this be a bit misleading?  Normally I use "IOMMU page table"
to refer to the 2nd level page table only, and I've always
understood it as "the new IOMMU will understand the MMU page table as
the 1st level".  At least mention "IOMMU 1st level page table"?



This file is a placeholder for all code that manipulates iommu page
tables (both first level and second level). Instead of putting
everything in intel_iommu.c, let's make the code more structured so that
it's easier to read and maintain. That is the motivation for this
file.


I see.




+ *
+ * Copyright (C) 2019 Intel Corporation
+ *
+ * Author: Lu Baolu 
+ */
+
+#define pr_fmt(fmt) "DMAR: " fmt
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef CONFIG_X86
+/*
+ * mmmap: Map a range of IO virtual address to physical addresses.


"... to physical addresses using MMU page table"?

Might be clearer?


Yes.




+ */
+#define pgtable_populate(domain, nm)   \
+do {   \
+   void *__new = alloc_pgtable_page(domain->nid);   \
+   if (!__new) \
+   return -ENOMEM; \
+   smp_wmb();  \


Could I ask what's this wmb used for?


Sure. This is answered by a comment in __pte_alloc() in mm/memory.c. Let
me post it here.

 /*
  * Ensure all pte setup (eg. pte page lock and page clearing) are
  * visible before the pte is made visible to other CPUs by being
  * put into page tables.
  *
  * The other side of the story is the pointer chasing in the page
  * table walking code (when walking the page table without locking;
  * ie. most of the time). Fortunately, these data accesses consist
  * of a chain of data-dependent loads, meaning most CPUs (alpha
  * being the notable exception) will already guarantee loads are
  * seen in-order. See the alpha page table accessors for the
  * smp_read_barrier_depends() barriers in page table walking code.
  */
 smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */


Ok.  I don't understand much of the rationale behind it, but the comment
seems to make sense...  Could you add a comment above the line, something
like "please refer to the comment in __pte_alloc()"?


Yes.
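
(For instance, only the affected lines of the macro shown, the rest of
pgtable_populate() stays as posted:)

   /* See the comment in __pte_alloc() in mm/memory.c. */ \
   smp_wmb();                                              \
   spin_lock(&(domain)->page_table_lock);                  \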








+   spin_lock(&(domain)->page_table_lock);   \


Is this intended to lock here instead of taking the lock during the
whole page table walk?  Is it safe?

Taking the example where nm==PTE: when we reach here how do we
guarantee that the PMD page that has this PTE is still valid?


We will always keep the non-leaf pages in the table,


I see.  Though, could I ask why?  It seems to me that the existing 2nd
level page table does not keep these when unmapping, and it doesn't even
use locking at all, by leveraging cmpxchg()?


I still need some time to understand how cmpxchg() solves the race issue
when reclaiming pages. For example:

Thread A                      Thread B
-A1: check all PTEs empty     -B1: up-level PDE valid
-A2: clear the up-level PDE
-A3: reclaim the page         -B2: populate the PTEs

Both (A1,A2) and (B1,B2) should be atomic. Otherwise, a race could happen.

Actually, the iova allocator always packs IOVA ranges close to the top
of the address space. This results in requiring a minimal number of
pages to map the allocated IOVA ranges, which makes memory consumption
by IOMMU page tables tolerable. Hence, we don't need to reclaim the
pages until the whole page table is about to be torn down. The real data
on my test machine also proves this.
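
For reference, the existing second-level code handles concurrent
population (the B1/B2 side) without a lock roughly like this: a
simplified excerpt of the pfn_to_dma_pte() pattern in intel-iommu.c,
not the verbatim code; the unmap side, as Peter notes, relies on the
IOVA ranges never overlapping.

  tmp_page = alloc_pgtable_page(domain->nid);
  if (!tmp_page)
          return NULL;

  domain_flush_cache(domain, tmp_page, VTD_PAGE_SIZE);
  pteval = ((uint64_t)virt_to_dma_pfn(tmp_page) << VTD_PAGE_SHIFT) |
           DMA_PTE_READ | DMA_PTE_WRITE;
  if (cmpxchg64(&pte->val, 0ULL, pteval))
          /* someone else populated this entry first: drop our page */
          free_pgtable_page(tmp_page);
  else
          domain_flush_cache(domain, pte, sizeof(*pte));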




hence we only need
a spin lock to serialize multiple attempts at populating an entry for a pde.
As for a pte, we can assume there is only a single thread that can access
it at a time, because different mappings will have different iovas.


Ah yes, sorry, nm will never be pte here... so do you mean the upper
layer, e.g., the iova allocator, will make sure the ranges to be mapped
never overlap with each other, so setting PTEs does not need a lock?


Yes.








+   if (nm ## _present(*nm)) {  \
+   free_pgtable_page(__new);   \
+   } else {\
+   set_##nm(nm, __##nm(__pa(__new) | _PAGE_TABLE));\


It seems to me that PV could trap calls to set_pte().  Then these
could also be trapped by e.g. Xen?  Are 

Re: [RFC PATCH 1/3] dma-mapping: make overriding GFP_* flags arch customizable

2019-09-26 Thread Halil Pasic
On Thu, 26 Sep 2019 14:04:13 +0100
Robin Murphy  wrote:

> On 26/09/2019 13:37, Halil Pasic wrote:
> > On Mon, 23 Sep 2019 17:21:17 +0200
> > Christoph Hellwig  wrote:
> > 
> >> On Mon, Sep 23, 2019 at 02:34:16PM +0200, Halil Pasic wrote:
> >>> Before commit 57bf5a8963f8 ("dma-mapping: clear harmful GFP_* flags in
> >>> common code") tweaking the client code supplied GFP_* flags used to be
> >>> an issue handled in the architecture specific code. The commit message
> >>> suggests, that fixing the client code would actually be a better way
> >>> of dealing with this.
> >>>
> >>> On s390 common I/O devices are generally capable of using the full 64
> >>> bit address space for DMA I/O, but some chunks of the DMA memory need to
> >>> be 31 bit addressable (in physical address space) because the
> >>> instructions involved mandate it. Before switching to DMA API this used
> >>> to be a non-issue, we used to allocate those chunks from ZONE_DMA.
> >>> Currently our only option with the DMA API is to restrict the devices
> >>> (via dma_mask and coherent_dma_mask) to 31 bit, which is sub-optimal.
> >>>
> >>> Thus on s390 we would benefit from having control over what flags are
> >>> dropped.
> >>
> >> No way, sorry.  You need to express that using a dma mask instead of
> >> overloading the GFP flags.
> > 
> > Thanks for your feedback and sorry for the delay. Can you help me figure
> > out how I can express that using a dma mask?
> > 
> > IMHO what you ask from me is frankly impossible.
> > 
> > What I need is the ability to ask for (considering the physical
> > address) 31 bit addressable DMA memory if the chunk is supposed to host
> > control-type data that needs to be 31 bit addressable because that is
> > how the architecture is, without affecting the normal data-path. So
> > normally a 64 bit mask is fine, but occasionally (for control) we would
> > need a 31 bit mask.
> 
> If it's possible to rework the "data" path to use streaming mappings 
> instead of coherent allocations, you could potentially mimic what virtio 
> does for a similar situation - see commit a0be1db4304f.
> 

Thank you for your feedback. Just to be sure we are on the same page, I
read commit a0be1db4304f like this:
1) virtio_pci_legacy needs to allocate the virtqueues so that the base
address fits 44 bits
2) if 64 bit dma is possible they set coherent_dma_mask to
  DMA_BIT_MASK(44) and dma_mask to DMA_BIT_MASK(64)
3) since the queues get allocated with coherent allocations 1) is
satisfied
4) when the streaming mappings see a buffer that is beyond
  DMA_BIT_MASK(44), they have to treat it as non-coherent memory
  and do the syncing magic (which isn't actually required, just
  a side effect of the workaround); see the sketch below
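
In code, that reading boils down to something like this (a sketch of the
a0be1db4304f approach, not the literal commit):

  /* streaming DMA may use the full 64-bit mask, while coherent
   * allocations (the virtqueues) are limited to 44 bits so the
   * legacy queue PFN register can still address them */
  rc = dma_set_mask(&pci_dev->dev, DMA_BIT_MASK(64));
  if (rc)
          rc = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(32));
  else
          rc = dma_set_coherent_mask(&pci_dev->dev, DMA_BIT_MASK(44));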

I'm actually trying to get virtio_ccw working nice with Protected
Virtualization (best think of encrypted memory). So the "data" path
is mostly the same as for virtio_pci.

But unlike virtio_pci_legacy we are perfectly fine with virtqueues at
an arbitrary address.

We can make
coherent_dma_mask == DMA_BIT_MASK(31) != dma_mask == DMA_BIT_MASK(64)
but that affects all dma coherent allocations and needlessly forces
the virtio control structures into the [0..2G] range. Furthermore this
whole issue has nothing to do with memory coherence. For ccw devices,
memory at addresses above 2G is no less coherent than memory at
addresses below 2G.

I've already implemented a patch (see after the scissors line) that
takes a similar route as commit a0be1db4304f, but I consider that a
workaround at best. But if that is what the community wants... I have to
get the job done one way or the other.

Many thanks for your help and your time.

---8<

From: Halil Pasic 
Date: Thu, 25 Jul 2019 18:44:21 +0200
Subject: [PATCH 1/1] s390/cio: fix virtio-ccw DMA without PV

Commit 37db8985b211 ("s390/cio: add basic protected virtualization
support") breaks virtio-ccw devices with VIRTIO_F_IOMMU_PLATFORM for non
Protected Virtualization (PV) guests. The problem is that the dma_mask
of the ccw device, which is used by virtio core, gets changed from 64 to
31 bit, because some of the DMA allocations do require 31 bit
addressable memory. For PV the only drawback is that some of the virtio
structures must end up in ZONE_DMA because we have to bounce the
buffers mapped via DMA API anyway.

But for non PV guests we have a problem: because of the 31 bit mask,
guests bigger than 2G are likely to try bouncing buffers. The swiotlb
however is only initialized for PV guests, because we don't want to
bounce anything for non PV guests. The first such map kills the guest.

Since the DMA API won't allow us to specify for each allocation whether
we need memory from ZONE_DMA (31 bit addressable) or whether any DMA
capable memory will do, let us abuse coherent_dma_mask (which is used for
coherent allocations) to force allocating from ZONE_DMA while changing
dma_mask to DMA_BIT_MASK(64).
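
(A hypothetical sketch of that, with the device pointer made up for
illustration; dma_mask stays wide for the data path while coherent
allocations are forced below 2G:)

  ret = dma_set_mask(&cdev->dev, DMA_BIT_MASK(64));  /* streaming path */
  if (!ret)
          ret = dma_set_coherent_mask(&cdev->dev, DMA_BIT_MASK(31));  /* ZONE_DMA */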

Signed-off-by: Halil Pasic 
Reported-by: Marc 

Re: [PATCH 1/6] iommu/qcom: Use the asid read from device-tree if specified

2019-09-26 Thread Rob Clark
On Thu, Sep 26, 2019 at 5:05 AM  wrote:
>
> From: AngeloGioacchino Del Regno 
>
> As specified in this driver, the context banks are 0x1000 apart.
> Problem is that sometimes the context number (our asid) does not
> match this logic and we end up using the wrong one: this starts
> being a problem in the case that we need to send TZ commands
> to do anything on a specific context.
>
> For this reason, read the ASID from the DT if the property
> "qcom,ctx-num" is present on the IOMMU context node.
>
> Signed-off-by: AngeloGioacchino Del Regno 
> ---
>  .../devicetree/bindings/iommu/qcom,iommu.txt|  1 +
>  drivers/iommu/qcom_iommu.c  | 17 ++---
>  2 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/iommu/qcom,iommu.txt 
> b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt
> index 059139abce35..98102b323196 100644
> --- a/Documentation/devicetree/bindings/iommu/qcom,iommu.txt
> +++ b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt
> @@ -38,6 +38,7 @@ to non-secure vs secure interrupt line.
>  - "qcom,msm-iommu-v1-sec" : secure context bank
>- reg: Base address and size of context bank within the iommu
>- interrupts : The context fault irq.
> +  - qcom,ctx-num   : The number associated to the context bank


I guess this should be more like:

+  and the following optional properties:
+  - qcom,ctx-num   : The number associated to the context bank

(since this is an optional property)

BR,
-R

>
>  ** Optional properties:
>
> diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
> index dadc707573a2..5837556af147 100644
> --- a/drivers/iommu/qcom_iommu.c
> +++ b/drivers/iommu/qcom_iommu.c
> @@ -557,7 +557,8 @@ static int qcom_iommu_of_xlate(struct device *dev, struct 
> of_phandle_args *args)
>  * index into qcom_iommu->ctxs:
>  */
> if (WARN_ON(asid < 1) ||
> -   WARN_ON(asid > qcom_iommu->num_ctxs))
> +   WARN_ON(asid > qcom_iommu->num_ctxs) ||
> +   WARN_ON(qcom_iommu->ctxs[asid - 1] == NULL))
> return -EINVAL;
>
> if (!fwspec->iommu_priv) {
> @@ -665,7 +666,8 @@ static int qcom_iommu_sec_ptbl_init(struct device *dev)
>
>  static int get_asid(const struct device_node *np)
>  {
> -   u32 reg;
> +   u32 reg, val;
> +   int asid;
>
> /* read the "reg" property directly to get the relative address
>  * of the context bank, and calculate the asid from that:
> @@ -673,7 +675,16 @@ static int get_asid(const struct device_node *np)
> if (of_property_read_u32_index(np, "reg", 0, &reg))
> return -ENODEV;
>
> -   return reg / 0x1000;  /* context banks are 0x1000 apart */
> +   /* Context banks are 0x1000 apart but, in some cases, the ASID
> +* number doesn't match to this logic and needs to be passed
> +* from the DT configuration explicitly.
> +*/
> +   if (of_property_read_u32(np, "qcom,ctx-num", &val))
> +   asid = reg / 0x1000;
> +   else
> +   asid = val;
> +
> +   return asid;
>  }
>
>  static int qcom_iommu_ctx_probe(struct platform_device *pdev)
> --
> 2.21.0
>
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH] iommu/ipmmu-vmsa: Hook up r8a774b1 DT matching code

2019-09-26 Thread Biju Das
Hi Geert,

Thanks for the feedback.

> Subject: Re: [PATCH] iommu/ipmmu-vmsa: Hook up r8a774b1 DT matching
> code
> 
> Hi Biju,
> 
> On Tue, Sep 24, 2019 at 9:43 AM Biju Das  wrote:
> > Support RZ/G2N (R8A774B1) IPMMU.
> >
> > Signed-off-by: Biju Das 
> 
> Thanks for your patch!
> 
> > --- a/drivers/iommu/ipmmu-vmsa.c
> > +++ b/drivers/iommu/ipmmu-vmsa.c
> 
> >  static const struct soc_device_attribute soc_rcar_gen3_whitelist[] = {
> > { .soc_id = "r8a774c0", },
> > +   { .soc_id = "r8a774b1", },
> 
> Please preserve alphabetical sort order.
OK. Will send V2.


> > { .soc_id = "r8a7795", .revision = "ES3.*" },
> > { .soc_id = "r8a77965", },
> > { .soc_id = "r8a77990", },
> 
> With the above fixed:
> Reviewed-by: Geert Uytterhoeven 
> 
> Gr{oetje,eeting}s,
> 
> Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-
> m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like 
> that.
> -- Linus Torvalds
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/ipmmu-vmsa: Hook up r8a774b1 DT matching code

2019-09-26 Thread Geert Uytterhoeven
Hi Biju,

On Tue, Sep 24, 2019 at 9:43 AM Biju Das  wrote:
> Support RZ/G2N (R8A774B1) IPMMU.
>
> Signed-off-by: Biju Das 

Thanks for your patch!

> --- a/drivers/iommu/ipmmu-vmsa.c
> +++ b/drivers/iommu/ipmmu-vmsa.c

>  static const struct soc_device_attribute soc_rcar_gen3_whitelist[] = {
> { .soc_id = "r8a774c0", },
> +   { .soc_id = "r8a774b1", },

Please preserve alphabetical sort order.

> { .soc_id = "r8a7795", .revision = "ES3.*" },
> { .soc_id = "r8a77965", },
> { .soc_id = "r8a77990", },

With the above fixed:
Reviewed-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [RFC PATCH 1/3] dma-mapping: make overriding GFP_* flags arch customizable

2019-09-26 Thread Robin Murphy

On 26/09/2019 13:37, Halil Pasic wrote:

On Mon, 23 Sep 2019 17:21:17 +0200
Christoph Hellwig  wrote:


On Mon, Sep 23, 2019 at 02:34:16PM +0200, Halil Pasic wrote:

Before commit 57bf5a8963f8 ("dma-mapping: clear harmful GFP_* flags in
common code") tweaking the client code supplied GFP_* flags used to be
an issue handled in the architecture specific code. The commit message
suggests, that fixing the client code would actually be a better way
of dealing with this.

On s390 common I/O devices are generally capable of using the full 64
bit address space for DMA I/O, but some chunks of the DMA memory need to
be 31 bit addressable (in physical address space) because the
instructions involved mandate it. Before switching to DMA API this used
to be a non-issue, we used to allocate those chunks from ZONE_DMA.
Currently our only option with the DMA API is to restrict the devices to
(via dma_mask and dma_mask_coherent) to 31 bit, which is sub-optimal.

Thus s390 we would benefit form having control over what flags are
dropped.


No way, sorry.  You need to express that using a dma mask instead of
overloading the GFP flags.


Thanks for your feedback and sorry for the delay. Can you help me figure
out how can I express that using a dma mask?

IMHO what you ask from me is frankly impossible.

What I need is the ability to ask for  (considering the physical
address) 31 bit addressable DMA memory if the chunk is supposed to host
control-type data that needs to be 31 bit addressable because that is
how the architecture is, without affecting the normal data-path. So
normally 64 bit mask is fine but occasionally (control) we would need
a 31 bit mask.


If it's possible to rework the "data" path to use streaming mappings 
instead of coherent allocations, you could potentially mimic what virtio 
does for a similar situation - see commit a0be1db4304f.


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dt-bindings: iommu: ipmmu-vmsa: Add r8a774b1 support

2019-09-26 Thread Geert Uytterhoeven
On Tue, Sep 24, 2019 at 9:41 AM Biju Das  wrote:
> Document RZ/G2N (R8A774B1) SoC bindings.
>
> Signed-off-by: Biju Das 

Reviewed-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 6/6] iommu/qcom: Add support for QCIOMMUv2 and QCIOMMU-500 secured contexts

2019-09-26 Thread kholk11
From: AngeloGioacchino Del Regno 

This IOMMU is yet another Qualcomm variant of known IOMMUs, found in
Family-B SoCs, such as MSM8956, MSM8976, MSM8953, MSM8917 and others,
whose firmware perfectly adheres to this driver's logic.
This time, though, the catch is that the secure contexts are also
secured, meaning that these are programmed by the bootloader or TZ
and their "interesting" registers are locked out, so the hypervisor
disallows touching them from the non-secure world: in this case
the OS is supposed to blindly trust the secure configuration of
these contexts and just use them "as they are".

For this reason, it is necessary to distinguish between the v1 and
500/v2 secure contexts in this driver in order to adhere to this
specification. To do this, add a new DT compatible, named
"qcom,msm-iommu-v2-sec" that will trigger the new behavior.

For the sake of completeness, also add a "qcom,msm-iommu-v2-ns" so
that the contexts read more pleasantly to the human eye in the final
SoC DT. Of course, the latter is just cosmetic.

Signed-off-by: AngeloGioacchino Del Regno 
---
 .../devicetree/bindings/iommu/qcom,iommu.txt  |  2 ++
 drivers/iommu/qcom_iommu.c| 19 +--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/iommu/qcom,iommu.txt 
b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt
index a4dd76b8c566..44676d3221d5 100644
--- a/Documentation/devicetree/bindings/iommu/qcom,iommu.txt
+++ b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt
@@ -36,6 +36,8 @@ to non-secure vs secure interrupt line.
   - compatible : Should be one of:
 - "qcom,msm-iommu-v1-ns"  : non-secure context bank
 - "qcom,msm-iommu-v1-sec" : secure context bank
+- "qcom,msm-iommu-v2-ns"  : non-secure QSMMUv2/QSMMU500 context bank
+- "qcom,msm-iommu-v2-sec" : secure QSMMUv2/QSMMU500 context bank
   - reg: Base address and size of context bank within the iommu
   - interrupts : The context fault irq.
   - qcom,ctx-num   : The number associated to the context bank
diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index 03c68fe9439b..2f65a4cdca78 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -56,6 +56,7 @@ struct qcom_iommu_ctx {
struct device   *dev;
void __iomem*base;
bool secure_init;
+   bool secured_ctx;
u8   asid;  /* asid and ctx bank # are 1:1 */
struct iommu_domain *domain;
 };
@@ -281,6 +282,12 @@ static int qcom_iommu_init_domain(struct iommu_domain 
*domain,
ctx->secure_init = true;
}
 
+   /* Secured QSMMU-500/QSMMU-v2 contexts cannot be programmed */
+   if (ctx->secured_ctx) {
+   ctx->domain = domain;
+   break;
+   }
+
qcom_iommu_reset_ctx(ctx);
 
tcr[0] = pgtbl_cfg.arm_lpae_s1_cfg.tcr;
@@ -762,10 +769,15 @@ static int qcom_iommu_ctx_probe(struct platform_device 
*pdev)
return -ENODEV;
}
 
+   if (of_device_is_compatible(dev->of_node, "qcom,msm-iommu-v2-sec"))
+   ctx->secured_ctx = true;
+
/* clear IRQs before registering fault handler, just in case the
 * boot-loader left us a surprise:
 */
-   iommu_writel(ctx, ARM_SMMU_CB_FSR, iommu_readl(ctx, ARM_SMMU_CB_FSR));
+   if (!ctx->secured_ctx)
+   iommu_writel(ctx, ARM_SMMU_CB_FSR,
+iommu_readl(ctx, ARM_SMMU_CB_FSR));
 
ret = devm_request_irq(dev, irq,
   qcom_iommu_fault,
@@ -807,6 +819,8 @@ static int qcom_iommu_ctx_remove(struct platform_device 
*pdev)
 static const struct of_device_id ctx_of_match[] = {
{ .compatible = "qcom,msm-iommu-v1-ns" },
{ .compatible = "qcom,msm-iommu-v1-sec" },
+   { .compatible = "qcom,msm-iommu-v2-ns" },
+   { .compatible = "qcom,msm-iommu-v2-sec" },
{ /* sentinel */ }
 };
 
@@ -824,7 +838,8 @@ static bool qcom_iommu_has_secure_context(struct 
qcom_iommu_dev *qcom_iommu)
struct device_node *child;
 
for_each_child_of_node(qcom_iommu->dev->of_node, child)
-   if (of_device_is_compatible(child, "qcom,msm-iommu-v1-sec"))
+   if (of_device_is_compatible(child, "qcom,msm-iommu-v1-sec") ||
+   of_device_is_compatible(child, "qcom,msm-iommu-v2-sec"))
return true;
 
return false;
-- 
2.21.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 4/6] iommu/qcom: Add support for AArch64 IOMMU pagetables

2019-09-26 Thread kholk11
From: AngeloGioacchino Del Regno 

Some IOMMUs associated with some TZ firmwares may support switching
to the AArch64 pagetable format by sending a "set pagetable format"
scm command indicating the IOMMU secure ID and the context number
to switch.

Add a DT property "qcom,use-aarch64-pagetables" for this driver to
send this command to the secure world and to switch the pagetable
format to benefit from the ARM64 IOMMU pagetables, where possible.

Note that, even though the command should be valid to switch each
context, the property is made global because:
1. It doesn't make too much sense to switch only one or two
   context(s) to AA64 instead of just the entire thing
2. Some IOMMUs will go crazy and produce spectacular results when
   trying to mix up the pagetables on a per-context basis.

Signed-off-by: AngeloGioacchino Del Regno 
---
 .../devicetree/bindings/iommu/qcom,iommu.txt  |  2 +
 drivers/iommu/qcom_iommu.c| 55 +++
 2 files changed, 47 insertions(+), 10 deletions(-)

diff --git a/Documentation/devicetree/bindings/iommu/qcom,iommu.txt 
b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt
index 98102b323196..a4dd76b8c566 100644
--- a/Documentation/devicetree/bindings/iommu/qcom,iommu.txt
+++ b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt
@@ -39,6 +39,8 @@ to non-secure vs secure interrupt line.
   - reg: Base address and size of context bank within the iommu
   - interrupts : The context fault irq.
   - qcom,ctx-num   : The number associated to the context bank
+  - qcom,use-aarch64-pagetables : Switch to AArch64 pagetable format on all
+  contexts declared in this IOMMU
 
 ** Optional properties:
 
diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index 2f31da2e7add..233ef496af27 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -48,6 +48,7 @@ struct qcom_iommu_dev {
void __iomem*local_base;
u32  sec_id;
u8   num_ctxs;
+   bool use_aarch64_pt;
struct qcom_iommu_ctx   *ctxs[0];   /* indexed by asid-1 */
 };
 
@@ -153,11 +154,17 @@ static void qcom_iommu_tlb_inv_range_nosync(unsigned long 
iova, size_t size,
reg = leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA;
 
for (i = 0; i < fwspec->num_ids; i++) {
+   struct qcom_iommu_dev *qcom_iommu = to_iommu(fwspec);
struct qcom_iommu_ctx *ctx = to_ctx(fwspec, fwspec->ids[i]);
size_t s = size;
 
-   iova &= ~12UL;
-   iova |= ctx->asid;
+   if (qcom_iommu->use_aarch64_pt) {
+   iova >>= 12;
+   iova |= (u64)ctx->asid << 48;
+   } else {
+   iova &= ~12UL;
+   iova |= ctx->asid;
+   }
do {
iommu_writel(ctx, reg, iova);
iova += granule;
@@ -222,6 +229,8 @@ static int qcom_iommu_init_domain(struct iommu_domain 
*domain,
struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
struct io_pgtable_ops *pgtbl_ops;
struct io_pgtable_cfg pgtbl_cfg;
+   enum io_pgtable_fmt pgtbl_fmt;
+   unsigned long ias, oas;
int i, ret = 0;
u32 reg;
 
@@ -229,16 +238,25 @@ static int qcom_iommu_init_domain(struct iommu_domain 
*domain,
if (qcom_domain->iommu)
goto out_unlock;
 
+   if (qcom_iommu->use_aarch64_pt) {
+   pgtbl_fmt = ARM_64_LPAE_S1;
+   ias = oas = 48;
+   } else {
+   pgtbl_fmt = ARM_32_LPAE_S1;
+   ias = 32;
+   oas = 40;
+   }
+
pgtbl_cfg = (struct io_pgtable_cfg) {
.pgsize_bitmap  = qcom_iommu_ops.pgsize_bitmap,
-   .ias= 32,
-   .oas= 40,
+   .ias= ias,
+   .oas= oas,
.tlb= &qcom_gather_ops,
.iommu_dev  = qcom_iommu->dev,
};
 
qcom_domain->iommu = qcom_iommu;
-   pgtbl_ops = alloc_io_pgtable_ops(ARM_32_LPAE_S1, &pgtbl_cfg, fwspec);
+   pgtbl_ops = alloc_io_pgtable_ops(pgtbl_fmt, &pgtbl_cfg, fwspec);
if (!pgtbl_ops) {
dev_err(qcom_iommu->dev, "failed to allocate pagetable ops\n");
ret = -ENOMEM;
@@ -252,6 +270,7 @@ static int qcom_iommu_init_domain(struct iommu_domain 
*domain,
 
for (i = 0; i < fwspec->num_ids; i++) {
struct qcom_iommu_ctx *ctx = to_ctx(fwspec, fwspec->ids[i]);
+   u32 tcr[2];
 
if (!ctx->secure_init) {
ret = qcom_scm_restore_sec_cfg(qcom_iommu->sec_id, 
ctx->asid);
@@ -264,12 +283,25 @@ static int qcom_iommu_init_domain(struct iommu_domain 
*domain,
 
qcom_iommu_reset_ctx(ctx);
 

[PATCH 2/6] iommu/qcom: Write TCR before TTBRs to fix ASID access behavior

2019-09-26 Thread kholk11
From: AngeloGioacchino Del Regno 

As also stated in the arm-smmu driver, we must write the TCR before
writing the TTBRs, since the TCR determines the access behavior of
some fields.

Signed-off-by: AngeloGioacchino Del Regno 
---
 drivers/iommu/qcom_iommu.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index 5837556af147..8431fb97a50f 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -245,6 +245,13 @@ static int qcom_iommu_init_domain(struct iommu_domain 
*domain,
ctx->secure_init = true;
}
 
+   /* TCR */
+   iommu_writel(ctx, ARM_SMMU_CB_TCR2,
+   (pgtbl_cfg.arm_lpae_s1_cfg.tcr >> 32) |
+   FIELD_PREP(TCR2_SEP, TCR2_SEP_UPSTREAM));
+   iommu_writel(ctx, ARM_SMMU_CB_TCR,
+   pgtbl_cfg.arm_lpae_s1_cfg.tcr);
+
/* TTBRs */
iommu_writeq(ctx, ARM_SMMU_CB_TTBR0,
pgtbl_cfg.arm_lpae_s1_cfg.ttbr[0] |
@@ -253,13 +260,6 @@ static int qcom_iommu_init_domain(struct iommu_domain 
*domain,
pgtbl_cfg.arm_lpae_s1_cfg.ttbr[1] |
FIELD_PREP(TTBRn_ASID, ctx->asid));
 
-   /* TCR */
-   iommu_writel(ctx, ARM_SMMU_CB_TCR2,
-   (pgtbl_cfg.arm_lpae_s1_cfg.tcr >> 32) |
-   FIELD_PREP(TCR2_SEP, TCR2_SEP_UPSTREAM));
-   iommu_writel(ctx, ARM_SMMU_CB_TCR,
-   pgtbl_cfg.arm_lpae_s1_cfg.tcr);
-
/* MAIRs (stage-1 only) */
iommu_writel(ctx, ARM_SMMU_CB_S1_MAIR0,
pgtbl_cfg.arm_lpae_s1_cfg.mair[0]);
-- 
2.21.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 5/6] iommu/qcom: Index contexts by asid number to allow asid 0

2019-09-26 Thread kholk11
From: AngeloGioacchino Del Regno 

This driver was indexing the contexts by asid-1, which is probably
done under the assumption that the first ASID is always 1.

Unfortunately this is not entirely true: at least in the MSM8956
and MSM8976 GPU IOMMU, the gpu_user context's ASID number is zero.
To allow using an asid number of zero, stop indexing the contexts
by asid-1 and rather index them by asid.

Signed-off-by: AngeloGioacchino Del Regno 
---
 drivers/iommu/qcom_iommu.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index 233ef496af27..03c68fe9439b 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -49,7 +49,7 @@ struct qcom_iommu_dev {
u32  sec_id;
u8   num_ctxs;
bool use_aarch64_pt;
-   struct qcom_iommu_ctx   *ctxs[0];   /* indexed by asid-1 */
+   struct qcom_iommu_ctx   *ctxs[0];   /* indexed by asid */
 };
 
 struct qcom_iommu_ctx {
@@ -87,7 +87,7 @@ static struct qcom_iommu_ctx * to_ctx(struct iommu_fwspec 
*fwspec, unsigned asid
struct qcom_iommu_dev *qcom_iommu = to_iommu(fwspec);
if (!qcom_iommu)
return NULL;
-   return qcom_iommu->ctxs[asid - 1];
+   return qcom_iommu->ctxs[asid];
 }
 
 static inline void
@@ -604,12 +604,10 @@ static int qcom_iommu_of_xlate(struct device *dev, struct 
of_phandle_args *args)
qcom_iommu = platform_get_drvdata(iommu_pdev);
 
/* make sure the asid specified in dt is valid, so we don't have
-* to sanity check this elsewhere, since 'asid - 1' is used to
-* index into qcom_iommu->ctxs:
+* to sanity check this elsewhere:
 */
-   if (WARN_ON(asid < 1) ||
-   WARN_ON(asid > qcom_iommu->num_ctxs) ||
-   WARN_ON(qcom_iommu->ctxs[asid - 1] == NULL))
+   if (WARN_ON(asid > qcom_iommu->num_ctxs) ||
+   WARN_ON(qcom_iommu->ctxs[asid] == NULL))
return -EINVAL;
 
if (!fwspec->iommu_priv) {
@@ -789,7 +787,7 @@ static int qcom_iommu_ctx_probe(struct platform_device 
*pdev)
 
dev_dbg(dev, "found asid %u\n", ctx->asid);
 
-   qcom_iommu->ctxs[ctx->asid - 1] = ctx;
+   qcom_iommu->ctxs[ctx->asid] = ctx;
 
return 0;
 }
@@ -801,7 +799,7 @@ static int qcom_iommu_ctx_remove(struct platform_device 
*pdev)
 
platform_set_drvdata(pdev, NULL);
 
-   qcom_iommu->ctxs[ctx->asid - 1] = NULL;
+   qcom_iommu->ctxs[ctx->asid] = NULL;
 
return 0;
 }
@@ -846,7 +844,8 @@ static int qcom_iommu_device_probe(struct platform_device 
*pdev)
for_each_child_of_node(dev->of_node, child)
max_asid = max(max_asid, get_asid(child));
 
-   sz = sizeof(*qcom_iommu) + (max_asid * sizeof(qcom_iommu->ctxs[0]));
+   sz = sizeof(*qcom_iommu);
+   sz += (max_asid + 1) * sizeof(qcom_iommu->ctxs[0]);
 
qcom_iommu = devm_kzalloc(dev, sz, GFP_KERNEL);
if (!qcom_iommu)
-- 
2.21.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/6] iommu/qcom: Use the asid read from device-tree if specified

2019-09-26 Thread kholk11
From: AngeloGioacchino Del Regno 

As specified in this driver, the context banks are 0x1000 apart.
Problem is that sometimes the context number (our asid) does not
match this logic and we end up using the wrong one: this starts
being a problem in the case that we need to send TZ commands
to do anything on a specific context.

For this reason, read the ASID from the DT if the property
"qcom,ctx-num" is present on the IOMMU context node.

Signed-off-by: AngeloGioacchino Del Regno 
---
 .../devicetree/bindings/iommu/qcom,iommu.txt|  1 +
 drivers/iommu/qcom_iommu.c  | 17 ++---
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/iommu/qcom,iommu.txt 
b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt
index 059139abce35..98102b323196 100644
--- a/Documentation/devicetree/bindings/iommu/qcom,iommu.txt
+++ b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt
@@ -38,6 +38,7 @@ to non-secure vs secure interrupt line.
 - "qcom,msm-iommu-v1-sec" : secure context bank
   - reg: Base address and size of context bank within the iommu
   - interrupts : The context fault irq.
+  - qcom,ctx-num   : The number associated to the context bank
 
 ** Optional properties:
 
diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index dadc707573a2..5837556af147 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -557,7 +557,8 @@ static int qcom_iommu_of_xlate(struct device *dev, struct 
of_phandle_args *args)
 * index into qcom_iommu->ctxs:
 */
if (WARN_ON(asid < 1) ||
-   WARN_ON(asid > qcom_iommu->num_ctxs))
+   WARN_ON(asid > qcom_iommu->num_ctxs) ||
+   WARN_ON(qcom_iommu->ctxs[asid - 1] == NULL))
return -EINVAL;
 
if (!fwspec->iommu_priv) {
@@ -665,7 +666,8 @@ static int qcom_iommu_sec_ptbl_init(struct device *dev)
 
 static int get_asid(const struct device_node *np)
 {
-   u32 reg;
+   u32 reg, val;
+   int asid;
 
/* read the "reg" property directly to get the relative address
 * of the context bank, and calculate the asid from that:
@@ -673,7 +675,16 @@ static int get_asid(const struct device_node *np)
if (of_property_read_u32_index(np, "reg", 0, &reg))
return -ENODEV;
 
-   return reg / 0x1000;  /* context banks are 0x1000 apart */
+   /* Context banks are 0x1000 apart but, in some cases, the ASID
+* number doesn't match to this logic and needs to be passed
+* from the DT configuration explicitly.
+*/
+   if (of_property_read_u32(np, "qcom,ctx-num", &val))
+   asid = reg / 0x1000;
+   else
+   asid = val;
+
+   return asid;
 }
 
 static int qcom_iommu_ctx_probe(struct platform_device *pdev)
-- 
2.21.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 0/6] Add support for QCOM IOMMU v2 and 500

2019-09-26 Thread kholk11
From: AngeloGioacchino Del Regno 

Some Qualcomm Family-B SoCs have got a different version of the QCOM
IOMMU, specifically v2 and 500, which perfectly adhere to the current
qcom_iommu driver, but need some variations due to slightly different
hypervisor behavior.

My personal aim is to upstream MSM8956 support as much as possible.

This code has been tested on two Sony phones featuring the Qualcomm
MSM8956 SoC.

AngeloGioacchino Del Regno (6):
  iommu/qcom: Use the asid read from device-tree if specified
  iommu/qcom: Write TCR before TTBRs to fix ASID access behavior
  iommu/qcom: Properly reset the IOMMU context
  iommu/qcom: Add support for AArch64 IOMMU pagetables
  iommu/qcom: Index contexts by asid number to allow asid 0
  iommu/qcom: Add support for QCIOMMUv2 and QCIOMMU-500 secured contexts

 .../devicetree/bindings/iommu/qcom,iommu.txt  |   5 +
 drivers/iommu/qcom_iommu.c| 133 ++
 2 files changed, 111 insertions(+), 27 deletions(-)

-- 
2.21.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 3/6] iommu/qcom: Properly reset the IOMMU context

2019-09-26 Thread kholk11
From: AngeloGioacchino Del Regno 

To avoid context faults, reset the context entirely on detach and,
to ensure a fresh clean start, also do a complete reset before
programming the context for domain initialization.

Signed-off-by: AngeloGioacchino Del Regno 
---
 drivers/iommu/qcom_iommu.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index 8431fb97a50f..2f31da2e7add 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -198,6 +198,23 @@ static irqreturn_t qcom_iommu_fault(int irq, void *dev)
return IRQ_HANDLED;
 }
 
+static void qcom_iommu_reset_ctx(struct qcom_iommu_ctx *ctx)
+{
+   iommu_writel(ctx, ARM_SMMU_CB_FAR, 0);
+   iommu_writel(ctx, ARM_SMMU_CB_FSR, 0);
+   iommu_writel(ctx, ARM_SMMU_CB_S1_MAIR1, 0);
+   iommu_writel(ctx, ARM_SMMU_CB_PAR, 0);
+   iommu_writel(ctx, ARM_SMMU_CB_S1_MAIR0, 0);
+   iommu_writel(ctx, ARM_SMMU_CB_SCTLR, 0);
+   iommu_writel(ctx, ARM_SMMU_CB_TCR2, 0);
+   iommu_writel(ctx, ARM_SMMU_CB_TCR, 0);
+   iommu_writeq(ctx, ARM_SMMU_CB_TTBR0, 0);
+   iommu_writeq(ctx, ARM_SMMU_CB_TTBR1, 0);
+
+   /* Should we issue a TLBSYNC there instead? */
+   mb();
+}
+
 static int qcom_iommu_init_domain(struct iommu_domain *domain,
  struct qcom_iommu_dev *qcom_iommu,
  struct iommu_fwspec *fwspec)
@@ -245,6 +262,8 @@ static int qcom_iommu_init_domain(struct iommu_domain 
*domain,
ctx->secure_init = true;
}
 
+   qcom_iommu_reset_ctx(ctx);
+
/* TCR */
iommu_writel(ctx, ARM_SMMU_CB_TCR2,
(pgtbl_cfg.arm_lpae_s1_cfg.tcr >> 32) |
@@ -390,8 +409,8 @@ static void qcom_iommu_detach_dev(struct iommu_domain 
*domain, struct device *de
for (i = 0; i < fwspec->num_ids; i++) {
struct qcom_iommu_ctx *ctx = to_ctx(fwspec, fwspec->ids[i]);
 
-   /* Disable the context bank: */
-   iommu_writel(ctx, ARM_SMMU_CB_SCTLR, 0);
+   /* Disable and reset the context bank */
+   qcom_iommu_reset_ctx(ctx);
 
ctx->domain = NULL;
}
-- 
2.21.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 6/6] iommu/amd: Lock code paths traversing protection_domain->dev_list

2019-09-26 Thread Jerry Snitselaar

On Wed Sep 25 19, Joerg Roedel wrote:

From: Joerg Roedel 

The traversing of this list requires protection_domain->lock to be taken
to avoid nasty races with attach/detach code. Make sure the lock is held
on all code-paths traversing this list.

Reported-by: Filippo Sironi 
Fixes: 92d420ec028d ("iommu/amd: Relax locking in dma_ops path")
Signed-off-by: Joerg Roedel 
---
drivers/iommu/amd_iommu.c | 25 -
1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index bac4e20a5919..9c26976a0f99 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1334,8 +1334,12 @@ static void domain_flush_np_cache(struct 
protection_domain *domain,
dma_addr_t iova, size_t size)
{
if (unlikely(amd_iommu_np_cache)) {
+   unsigned long flags;
+
+   spin_lock_irqsave(&domain->lock, flags);
domain_flush_pages(domain, iova, size);
domain_flush_complete(domain);
+   spin_unlock_irqrestore(&domain->lock, flags);
}
}

@@ -1700,8 +1704,13 @@ static int iommu_map_page(struct protection_domain *dom,
ret = 0;

out:
-   if (updated)
+   if (updated) {
+   unsigned long flags;
+
+   spin_lock_irqsave(&dom->lock, flags);
update_domain(dom);
+   spin_unlock_irqrestore(&dom->lock, flags);
+   }

/* Everything flushed out, free pages now */
free_page_list(freelist);
@@ -1857,8 +1866,12 @@ static void free_gcr3_table(struct protection_domain 
*domain)

static void dma_ops_domain_flush_tlb(struct dma_ops_domain *dom)
{
+   unsigned long flags;
+
+   spin_lock_irqsave(&dom->domain.lock, flags);
domain_flush_tlb(&dom->domain);
domain_flush_complete(&dom->domain);
+   spin_unlock_irqrestore(&dom->domain.lock, flags);
}

static void iova_domain_flush_tlb(struct iova_domain *iovad)
@@ -2414,6 +2427,7 @@ static dma_addr_t __map_single(struct device *dev,
{
dma_addr_t offset = paddr & ~PAGE_MASK;
dma_addr_t address, start, ret;
+   unsigned long flags;
unsigned int pages;
int prot = 0;
int i;
@@ -2451,8 +2465,10 @@ static dma_addr_t __map_single(struct device *dev,
iommu_unmap_page(&dma_dom->domain, start, PAGE_SIZE);
}

+   spin_lock_irqsave(&dma_dom->domain.lock, flags);
domain_flush_tlb(&dma_dom->domain);
domain_flush_complete(&dma_dom->domain);
+   spin_unlock_irqrestore(&dma_dom->domain.lock, flags);

dma_ops_free_iova(dma_dom, address, pages);

@@ -2481,8 +2497,12 @@ static void __unmap_single(struct dma_ops_domain 
*dma_dom,
}

if (amd_iommu_unmap_flush) {
+   unsigned long flags;
+
+   spin_lock_irqsave(&dma_dom->domain.lock, flags);
domain_flush_tlb(&dma_dom->domain);
domain_flush_complete(&dma_dom->domain);
+   spin_unlock_irqrestore(&dma_dom->domain.lock, flags);
dma_ops_free_iova(dma_dom, dma_addr, pages);
} else {
pages = __roundup_pow_of_two(pages);
@@ -3246,9 +3266,12 @@ static bool amd_iommu_is_attach_deferred(struct 
iommu_domain *domain,
static void amd_iommu_flush_iotlb_all(struct iommu_domain *domain)
{
struct protection_domain *dom = to_pdomain(domain);
+   unsigned long flags;

+   spin_lock_irqsave(&dom->lock, flags);
domain_flush_tlb_pde(dom);
domain_flush_complete(dom);
+   spin_unlock_irqrestore(&dom->lock, flags);
}

static void amd_iommu_iotlb_sync(struct iommu_domain *domain,
--
2.17.1



Reviewed-by: Jerry Snitselaar 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/6] iommu/amd: Lock dev_data in attach/detach code paths

2019-09-26 Thread Jerry Snitselaar

On Wed Sep 25 19, Joerg Roedel wrote:

From: Joerg Roedel 

Make sure that attaching and detaching a device can't race against each
other and protect the iommu_dev_data with a spin_lock in these code
paths.

Fixes: 92d420ec028d ("iommu/amd: Relax locking in dma_ops path")
Signed-off-by: Joerg Roedel 
---
drivers/iommu/amd_iommu.c   | 9 +
drivers/iommu/amd_iommu_types.h | 3 +++
2 files changed, 12 insertions(+)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 459247c32dc0..bac4e20a5919 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -201,6 +201,7 @@ static struct iommu_dev_data *alloc_dev_data(u16 devid)
if (!dev_data)
return NULL;

+   spin_lock_init(&dev_data->lock);
dev_data->devid = devid;
ratelimit_default_init(&dev_data->rs);

@@ -2157,6 +2158,8 @@ static int attach_device(struct device *dev,

dev_data = get_dev_data(dev);

+   spin_lock(&dev_data->lock);
+
ret = -EBUSY;
if (dev_data->domain != NULL)
goto out;
@@ -2199,6 +2202,8 @@ static int attach_device(struct device *dev,
domain_flush_complete(domain);

out:
+   spin_unlock(&dev_data->lock);
+
spin_unlock_irqrestore(&domain->lock, flags);

return ret;
@@ -2218,6 +2223,8 @@ static void detach_device(struct device *dev)

spin_lock_irqsave(&domain->lock, flags);

+   spin_lock(&dev_data->lock);
+
/*
 * First check if the device is still attached. It might already
 * be detached from its domain because the generic
@@ -2240,6 +2247,8 @@ static void detach_device(struct device *dev)
dev_data->ats.enabled = false;

out:
+   spin_unlock(&dev_data->lock);
+
spin_unlock_irqrestore(&domain->lock, flags);
}

diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 0186501ab971..c9c1612d52e0 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -633,6 +633,9 @@ struct devid_map {
 * This struct contains device specific data for the IOMMU
 */
struct iommu_dev_data {
+   /* Protect against attach/detach races */
+   spinlock_t lock;
+
struct list_head list;/* For domain->dev_list */
struct llist_node dev_data_list;  /* For global dev_data_list */
struct protection_domain *domain; /* Domain the device is bound to */
--
2.17.1



Reviewed-by: Jerry Snitselaar 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/6] iommu/amd: Check for busy devices earlier in attach_device()

2019-09-26 Thread Jerry Snitselaar

On Wed Sep 25 19, Joerg Roedel wrote:

From: Joerg Roedel 

Check early in attach_device whether the device is already attached to a
domain. This also simplifies the code path so that __attach_device() can
be removed.

Fixes: 92d420ec028d ("iommu/amd: Relax locking in dma_ops path")
Signed-off-by: Joerg Roedel 
---
drivers/iommu/amd_iommu.c | 25 +++--
1 file changed, 7 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 2919168577ff..459247c32dc0 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2072,23 +2072,6 @@ static void do_detach(struct iommu_dev_data *dev_data)
domain->dev_cnt -= 1;
}

-/*
- * If a device is not yet associated with a domain, this function makes the
- * device visible in the domain
- */
-static int __attach_device(struct iommu_dev_data *dev_data,
-  struct protection_domain *domain)
-{
-   if (dev_data->domain != NULL)
-   return -EBUSY;
-
-   /* Attach alias group root */
-   do_attach(dev_data, domain);
-
-   return 0;
-}
-
-
static void pdev_iommuv2_disable(struct pci_dev *pdev)
{
pci_disable_ats(pdev);
@@ -2174,6 +2157,10 @@ static int attach_device(struct device *dev,

dev_data = get_dev_data(dev);

+   ret = -EBUSY;
+   if (dev_data->domain != NULL)
+   goto out;
+
if (!dev_is_pci(dev))
goto skip_ats_check;

@@ -2198,7 +2185,9 @@ static int attach_device(struct device *dev,
}

skip_ats_check:
-   ret = __attach_device(dev_data, domain);
+   ret = 0;
+
+   do_attach(dev_data, domain);

/*
 * We might boot into a crash-kernel here. The crashed kernel
--
2.17.1



Reviewed-by: Jerry Snitselaar 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3/6] iommu/amd: Take domain->lock for complete attach/detach path

2019-09-26 Thread Jerry Snitselaar

On Wed Sep 25 19, Joerg Roedel wrote:

From: Joerg Roedel 

The code-paths from which __attach_device() and __detach_device() are
called also access and modify domain state, so take the domain lock there
too. This allows us to get rid of the __detach_device() function.

Fixes: 92d420ec028d ("iommu/amd: Relax locking in dma_ops path")
Signed-off-by: Joerg Roedel 
---
drivers/iommu/amd_iommu.c | 65 ---
1 file changed, 26 insertions(+), 39 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 37a9c04fc728..2919168577ff 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2079,27 +2079,13 @@ static void do_detach(struct iommu_dev_data *dev_data)
static int __attach_device(struct iommu_dev_data *dev_data,
   struct protection_domain *domain)
{
-   unsigned long flags;
-   int ret;
-
-   /* lock domain */
-   spin_lock_irqsave(>lock, flags);
-
-   ret = -EBUSY;
if (dev_data->domain != NULL)
-   goto out_unlock;
+   return -EBUSY;

/* Attach alias group root */
do_attach(dev_data, domain);

-   ret = 0;
-
-out_unlock:
-
-   /* ready */
-   spin_unlock_irqrestore(&domain->lock, flags);
-
-   return ret;
+   return 0;
}


@@ -2181,8 +2167,11 @@ static int attach_device(struct device *dev,
{
struct pci_dev *pdev;
struct iommu_dev_data *dev_data;
+   unsigned long flags;
int ret;

+   spin_lock_irqsave(&domain->lock, flags);
+
dev_data = get_dev_data(dev);

if (!dev_is_pci(dev))
@@ -2190,12 +2179,13 @@ static int attach_device(struct device *dev,

pdev = to_pci_dev(dev);
if (domain->flags & PD_IOMMUV2_MASK) {
+   ret = -EINVAL;
if (!dev_data->passthrough)
-   return -EINVAL;
+   goto out;

if (dev_data->iommu_v2) {
if (pdev_iommuv2_enable(pdev) != 0)
-   return -EINVAL;
+   goto out;

dev_data->ats.enabled = true;
dev_data->ats.qdep= pci_ats_queue_depth(pdev);
@@ -2219,24 +2209,10 @@ static int attach_device(struct device *dev,

domain_flush_complete(domain);

-   return ret;
-}
-
-/*
- * Removes a device from a protection domain (unlocked)
- */
-static void __detach_device(struct iommu_dev_data *dev_data)
-{
-   struct protection_domain *domain;
-   unsigned long flags;
-
-   domain = dev_data->domain;
-
-   spin_lock_irqsave(&domain->lock, flags);
-
-   do_detach(dev_data);
-
+out:
spin_unlock_irqrestore(&domain->lock, flags);
+
+   return ret;
}

/*
@@ -2246,10 +,13 @@ static void detach_device(struct device *dev)
{
struct protection_domain *domain;
struct iommu_dev_data *dev_data;
+   unsigned long flags;

dev_data = get_dev_data(dev);
domain   = dev_data->domain;

+   spin_lock_irqsave(&domain->lock, flags);
+
/*
 * First check if the device is still attached. It might already
 * be detached from its domain because the generic
@@ -2257,12 +2236,12 @@ static void detach_device(struct device *dev)
 * our alias handling.
 */
if (WARN_ON(!dev_data->domain))
-   return;
+   goto out;

-   __detach_device(dev_data);
+   do_detach(dev_data);

if (!dev_is_pci(dev))
-   return;
+   goto out;

if (domain->flags & PD_IOMMUV2_MASK && dev_data->iommu_v2)
pdev_iommuv2_disable(to_pci_dev(dev));
@@ -2270,6 +2249,9 @@ static void detach_device(struct device *dev)
pci_disable_ats(to_pci_dev(dev));

dev_data->ats.enabled = false;
+
+out:
+   spin_unlock_irqrestore(&domain->lock, flags);
}

static int amd_iommu_add_device(struct device *dev)
@@ -2904,13 +2886,18 @@ int __init amd_iommu_init_dma_ops(void)
static void cleanup_domain(struct protection_domain *domain)
{
struct iommu_dev_data *entry;
+   unsigned long flags;
+
+   spin_lock_irqsave(&domain->lock, flags);

while (!list_empty(&domain->dev_list)) {
entry = list_first_entry(&domain->dev_list,
 struct iommu_dev_data, list);
BUG_ON(!entry->domain);
-   __detach_device(entry);
+   do_detach(entry);
}
+
+   spin_unlock_irqrestore(&domain->lock, flags);
}

static void protection_domain_free(struct protection_domain *domain)
--
2.17.1



Reviewed-by: Jerry Snitselaar 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/6] iommu/amd: Remove amd_iommu_devtable_lock

2019-09-26 Thread Jerry Snitselaar

On Wed Sep 25 19, Joerg Roedel wrote:

From: Joerg Roedel 

The lock is not necessary because the device table does not
contain shared state that needs protection. Locking is only
needed on an individual entry basis, and that needs to
happen on the iommu_dev_data level.

Fixes: 92d420ec028d ("iommu/amd: Relax locking in dma_ops path")
Signed-off-by: Joerg Roedel 
---
drivers/iommu/amd_iommu.c | 23 ++-
1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 042854bbc5bc..37a9c04fc728 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -70,7 +70,6 @@
 */
#define AMD_IOMMU_PGSIZES   ((~0xFFFUL) & ~(2ULL << 38))

-static DEFINE_SPINLOCK(amd_iommu_devtable_lock);
static DEFINE_SPINLOCK(pd_bitmap_lock);

/* List of all available dev_data structures */
@@ -2080,10 +2079,11 @@ static void do_detach(struct iommu_dev_data *dev_data)
static int __attach_device(struct iommu_dev_data *dev_data,
   struct protection_domain *domain)
{
+   unsigned long flags;
int ret;

/* lock domain */
-   spin_lock(&domain->lock);
+   spin_lock_irqsave(&domain->lock, flags);

ret = -EBUSY;
if (dev_data->domain != NULL)
@@ -2097,7 +2097,7 @@ static int __attach_device(struct iommu_dev_data 
*dev_data,
out_unlock:

/* ready */
-   spin_unlock(&domain->lock);
+   spin_unlock_irqrestore(&domain->lock, flags);

return ret;
}
@@ -2181,7 +2181,6 @@ static int attach_device(struct device *dev,
{
struct pci_dev *pdev;
struct iommu_dev_data *dev_data;
-   unsigned long flags;
int ret;

dev_data = get_dev_data(dev);
@@ -2209,9 +2208,7 @@ static int attach_device(struct device *dev,
}

skip_ats_check:
-   spin_lock_irqsave(&amd_iommu_devtable_lock, flags);
ret = __attach_device(dev_data, domain);
-   spin_unlock_irqrestore(&amd_iommu_devtable_lock, flags);

/*
 * We might boot into a crash-kernel here. The crashed kernel
@@ -2231,14 +2228,15 @@ static int attach_device(struct device *dev,
static void __detach_device(struct iommu_dev_data *dev_data)
{
struct protection_domain *domain;
+   unsigned long flags;

domain = dev_data->domain;

-   spin_lock(&domain->lock);
+   spin_lock_irqsave(&domain->lock, flags);

do_detach(dev_data);

-   spin_unlock(&domain->lock);
+   spin_unlock_irqrestore(&domain->lock, flags);
}

/*
@@ -2248,7 +2246,6 @@ static void detach_device(struct device *dev)
{
struct protection_domain *domain;
struct iommu_dev_data *dev_data;
-   unsigned long flags;

dev_data = get_dev_data(dev);
domain   = dev_data->domain;
@@ -2262,10 +2259,7 @@ static void detach_device(struct device *dev)
if (WARN_ON(!dev_data->domain))
return;

-   /* lock device table */
-   spin_lock_irqsave(&amd_iommu_devtable_lock, flags);
__detach_device(dev_data);
-   spin_unlock_irqrestore(&amd_iommu_devtable_lock, flags);

if (!dev_is_pci(dev))
return;
@@ -2910,9 +2904,6 @@ int __init amd_iommu_init_dma_ops(void)
static void cleanup_domain(struct protection_domain *domain)
{
struct iommu_dev_data *entry;
-   unsigned long flags;
-
-   spin_lock_irqsave(&amd_iommu_devtable_lock, flags);

while (!list_empty(&domain->dev_list)) {
entry = list_first_entry(&domain->dev_list,
@@ -2920,8 +2911,6 @@ static void cleanup_domain(struct protection_domain 
*domain)
BUG_ON(!entry->domain);
__detach_device(entry);
}
-
-   spin_unlock_irqrestore(&amd_iommu_devtable_lock, flags);
}

static void protection_domain_free(struct protection_domain *domain)
--
2.17.1



Reviewed-by: Jerry Snitselaar 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/6] iommu/amd: Remove domain->updated

2019-09-26 Thread Jerry Snitselaar

On Wed Sep 25 19, Joerg Roedel wrote:

From: Joerg Roedel 

This struct member was used to track whether a domain
change requires updates to the device-table and IOMMU cache
flushes. The problem is that access to this field is racy
since locking in the common mapping code-paths has been
eliminated.

Move the updated field to the stack to get rid of all
potential races and remove the field from the struct.
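
For illustration only (not part of the patch), here is a minimal
user-space sketch of the pattern the change moves to: the "updated"
state lives on each caller's stack and is OR-ed together from the
callees, so concurrent mappers can no longer clear each other's
pending-flush state. All names below are made up.

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a callee that may or may not touch the page table. */
static bool do_one_step(int step)
{
	/* pretend only even-numbered steps modify the page table */
	return (step % 2) == 0;
}

int main(void)
{
	bool updated = false;	/* lives on this caller's stack */
	int i;

	for (i = 0; i < 4; i++)
		updated = do_one_step(i) || updated;

	if (updated)
		printf("flush device table and IOMMU TLBs once\n");

	return 0;
}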

Fixes: 92d420ec028d ("iommu/amd: Relax locking in dma_ops path")
Signed-off-by: Joerg Roedel 
---
drivers/iommu/amd_iommu.c   | 49 +
drivers/iommu/amd_iommu_types.h |  1 -
2 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 7bdce3b10f3d..042854bbc5bc 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1458,10 +1458,11 @@ static void free_pagetable(struct protection_domain *domain)
 * another level increases the size of the address space by 9 bits to a size up
 * to 64 bits.
 */
-static void increase_address_space(struct protection_domain *domain,
+static bool increase_address_space(struct protection_domain *domain,
   gfp_t gfp)
{
unsigned long flags;
+   bool ret = false;
u64 *pte;

spin_lock_irqsave(&domain->lock, flags);
@@ -1478,19 +1479,21 @@ static void increase_address_space(struct protection_domain *domain,
iommu_virt_to_phys(domain->pt_root));
domain->pt_root  = pte;
domain->mode+= 1;
-   domain->updated  = true;
+
+   ret = true;

out:
spin_unlock_irqrestore(&domain->lock, flags);

-   return;
+   return ret;
}

static u64 *alloc_pte(struct protection_domain *domain,
  unsigned long address,
  unsigned long page_size,
  u64 **pte_page,
- gfp_t gfp)
+ gfp_t gfp,
+ bool *updated)
{
int level, end_lvl;
u64 *pte, *page;
@@ -1498,7 +1501,7 @@ static u64 *alloc_pte(struct protection_domain *domain,
BUG_ON(!is_power_of_2(page_size));

while (address > PM_LEVEL_SIZE(domain->mode))
-   increase_address_space(domain, gfp);
+   *updated = increase_address_space(domain, gfp) || *updated;

level   = domain->mode - 1;
pte = &domain->pt_root[PM_LEVEL_INDEX(level, address)];
@@ -1530,7 +1533,7 @@ static u64 *alloc_pte(struct protection_domain *domain,
for (i = 0; i < count; ++i)
cmpxchg64(&pte[i], __pte, 0ULL);

-   domain->updated = true;
+   *updated = true;
continue;
}

@@ -1547,7 +1550,7 @@ static u64 *alloc_pte(struct protection_domain *domain,
if (cmpxchg64(pte, __pte, __npte) != __pte)
free_page((unsigned long)page);
else if (IOMMU_PTE_PRESENT(__pte))
-   domain->updated = true;
+   *updated = true;

continue;
}
@@ -1656,28 +1659,29 @@ static int iommu_map_page(struct protection_domain *dom,
  gfp_t gfp)
{
struct page *freelist = NULL;
+   bool updated = false;
u64 __pte, *pte;
-   int i, count;
+   int ret, i, count;

BUG_ON(!IS_ALIGNED(bus_addr, page_size));
BUG_ON(!IS_ALIGNED(phys_addr, page_size));

+   ret = -EINVAL;
if (!(prot & IOMMU_PROT_MASK))
-   return -EINVAL;
+   goto out;

count = PAGE_SIZE_PTE_COUNT(page_size);
-   pte   = alloc_pte(dom, bus_addr, page_size, NULL, gfp);
+   pte   = alloc_pte(dom, bus_addr, page_size, NULL, gfp, &updated);

-   if (!pte) {
-   update_domain(dom);
-   return -ENOMEM;
-   }
+   ret = -ENOMEM;
+   if (!pte)
+   goto out;

for (i = 0; i < count; ++i)
freelist = free_clear_pte(&pte[i], pte[i], freelist);

if (freelist != NULL)
-   dom->updated = true;
+   updated = true;

if (count > 1) {
__pte = PAGE_SIZE_PTE(__sme_set(phys_addr), page_size);
@@ -1693,12 +1697,16 @@ static int iommu_map_page(struct protection_domain *dom,
for (i = 0; i < count; ++i)
pte[i] = __pte;

-   update_domain(dom);
+   ret = 0;
+
+out:
+   if (updated)
+   update_domain(dom);

/* Everything flushed out, free pages now */
free_page_list(freelist);

-   return 0;
+   return ret;
}

static unsigned long iommu_unmap_page(struct protection_domain *dom,
@@ -2399,15 +2407,10 @@ static void update_device_table(struct protection_domain *domain)

static void update_domain(struct protection_domain *domain)
{
-   if (!domain->updated)
- 

[LINUX PATCH] dma-mapping: Control memset operation using gfp flags

2019-09-26 Thread Dylan Yip
In the case of a 4k video buffer, allocation from reserved memory takes
a long time, ~500ms. This was root-caused to the memset() of the
allocated memory, which consumes a large number of CPU cycles. Due to
this delay, we see that initial frames are being dropped.

To fix this, we have wrapped the default memset, done when allocating
coherent memory, under the __GFP_ZERO flag, so allocated memory is only
cleared when __GFP_ZERO is set. We believe this is safe because the
video decoder always writes before reading. This speeds up decoder
initialization, as we do not set __GFP_ZERO when allocating memory for
the decoder. With this optimization we no longer see initial frame
drops, and decoder initialization time is ~100ms.

This patch adds plumbing through the dma_alloc functions to pass the
gfp flags set by the caller down to __dma_alloc_from_coherent(), where
the flags are checked for __GFP_ZERO. If it is present, the buffer is
memset to 0; otherwise the memset is skipped.
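
As an illustration (hypothetical driver code, not part of this patch),
a decoder driver that always writes a buffer before reading it would
simply omit __GFP_ZERO, while callers that depend on zeroed memory keep
passing it; with this change, allocations served from a per-device
coherent pool are only cleared in the __GFP_ZERO case:

#include <linux/dma-mapping.h>

/* Hypothetical decoder allocation: skip the expensive clear, the
 * hardware overwrites the whole buffer before anything reads it. */
static void *alloc_decoder_buf(struct device *dev, size_t size,
			       dma_addr_t *dma)
{
	return dma_alloc_coherent(dev, size, dma, GFP_KERNEL);
}

/* Hypothetical caller that relies on zeroed memory keeps __GFP_ZERO. */
static void *alloc_zeroed_buf(struct device *dev, size_t size,
			      dma_addr_t *dma)
{
	return dma_alloc_coherent(dev, size, dma, GFP_KERNEL | __GFP_ZERO);
}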

Signed-off-by: Dylan Yip 
---
 arch/arm/mm/dma-mapping-nommu.c |  2 +-
 include/linux/dma-mapping.h | 11 +++
 kernel/dma/coherent.c   | 15 +--
 kernel/dma/mapping.c|  2 +-
 4 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index 52b8255..242b2c3 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -35,7 +35,7 @@ static void *arm_nommu_dma_alloc(struct device *dev, size_t size,
 unsigned long attrs)
 
 {
-   void *ret = dma_alloc_from_global_coherent(size, dma_handle);
+   void *ret = dma_alloc_from_global_coherent(size, dma_handle, gfp);
 
/*
 * dma_alloc_from_global_coherent() may fail because:
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f7d1eea..b715c9f 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -160,24 +160,27 @@ static inline int is_device_dma_capable(struct device *dev)
  * Don't use them in device drivers.
  */
 int dma_alloc_from_dev_coherent(struct device *dev, ssize_t size,
-  dma_addr_t *dma_handle, void **ret);
+  dma_addr_t *dma_handle, void **ret,
+  gfp_t flag);
 int dma_release_from_dev_coherent(struct device *dev, int order, void *vaddr);
 
 int dma_mmap_from_dev_coherent(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, size_t size, int *ret);
 
-void *dma_alloc_from_global_coherent(ssize_t size, dma_addr_t *dma_handle);
+void *dma_alloc_from_global_coherent(ssize_t size, dma_addr_t *dma_handle,
+gfp_t flag);
 int dma_release_from_global_coherent(int order, void *vaddr);
 int dma_mmap_from_global_coherent(struct vm_area_struct *vma, void *cpu_addr,
  size_t size, int *ret);
 
 #else
-#define dma_alloc_from_dev_coherent(dev, size, handle, ret) (0)
+#define dma_alloc_from_dev_coherent(dev, size, handle, ret, flag) (0)
 #define dma_release_from_dev_coherent(dev, order, vaddr) (0)
 #define dma_mmap_from_dev_coherent(dev, vma, vaddr, order, ret) (0)
 
 static inline void *dma_alloc_from_global_coherent(ssize_t size,
-  dma_addr_t *dma_handle)
+  dma_addr_t *dma_handle,
+  gfp_t flag)
 {
return NULL;
 }
diff --git a/kernel/dma/coherent.c b/kernel/dma/coherent.c
index 29fd659..d85fab5 100644
--- a/kernel/dma/coherent.c
+++ b/kernel/dma/coherent.c
@@ -136,7 +136,7 @@ void dma_release_declared_memory(struct device *dev)
 EXPORT_SYMBOL(dma_release_declared_memory);
 
 static void *__dma_alloc_from_coherent(struct dma_coherent_mem *mem,
-   ssize_t size, dma_addr_t *dma_handle)
+   ssize_t size, dma_addr_t *dma_handle, gfp_t gfp_flag)
 {
int order = get_order(size);
unsigned long flags;
@@ -158,7 +158,8 @@ static void *__dma_alloc_from_coherent(struct dma_coherent_mem *mem,
*dma_handle = mem->device_base + (pageno << PAGE_SHIFT);
ret = mem->virt_base + (pageno << PAGE_SHIFT);
spin_unlock_irqrestore(&mem->spinlock, flags);
-   memset(ret, 0, size);
+   if (gfp_flag & __GFP_ZERO)
+   memset(ret, 0, size);
return ret;
 err:
spin_unlock_irqrestore(&mem->spinlock, flags);
@@ -172,6 +173,7 @@ static void *__dma_alloc_from_coherent(struct dma_coherent_mem *mem,
 * @dma_handle: This will be filled with the correct dma handle
  * @ret:   This pointer will be filled with the virtual address
  * to allocated area.
+ * @flag:  gfp flag set by user
  *
  * This function should be only called from per-arch dma_alloc_coherent()
  * to support allocation from per-device coherent memory pools.
@@ -180,24 +182,25