Re: [PATCH v2 01/11] iommu/vt-d: Cache PCI ATS state and Invalidate Queue Depth

2015-07-28 Thread Joerg Roedel
Hi Bjorn,

On Mon, Jul 27, 2015 at 05:54:53PM -0500, Bjorn Helgaas wrote:
  Hmm, this is a place where the relaxed error handling in
  pci_enable_ats() can get problematic. 
 
 By relaxed error handling, do you mean the fact that in v4.1,
 pci_enable_ats() has a BUG_ON(is_enabled), while now it merely
 returns -EINVAL?
 
 (BTW, I did change it to add a WARN_ON and return -EBUSY as you suggested.)

Okay, great.

  If ATS is (by accident or a bug
  elsewhere) already enabled and the function returns -EINVAL, the IOMMU
  driver considers ATS disabled and doesn't flush the IO/TLBs of the
  device. This can cause data corruption later on, so we should make sure
  that info->ats.enabled is consistent with pdev->ats_enabled.
 
 I'm not quite sure I understand this.  In the patch above, we only set
 info->ats.enabled = 1 if pci_enable_ats() has succeeded.  The amd_iommu.c
 code is similar.
 
 Are you concerned about the case where future code enables ATS before
 intel-iommu, the pci_enable_ats() in intel-iommu fails, intel-iommu
 believes ATS is disabled, intel-iommu calls iommu_flush_dev_iotlb(), but it
 doesn't flush the IOTLB?
 
 I guess I could make intel-iommu handle -EBUSY differently from -EINVAL.
 Would that help?  It seems sort of clumsy, but ...?

I was concerned that it was harder now to spot bugs in ATS
enabling/disabling, when pci_enable_ats just returns -EINVAL when ATS is
already enabled.

But with the WARN_ON now we will notice these bugs early again, thanks
for adding it.
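
For reference, a minimal sketch of the guard being discussed (the
ats_enabled field follows this series, which embeds the ATS state in
struct pci_dev; the rest of the function body is elided):

	int pci_enable_ats(struct pci_dev *dev, int ps)
	{
		/* Warn loudly instead of failing silently with -EINVAL,
		 * so that double-enable bugs are caught early again. */
		if (WARN_ON(dev->ats_enabled))
			return -EBUSY;

		/* ... program the ATS capability with page shift ps ... */
		dev->ats_enabled = 1;
		return 0;
	}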



Joerg



Re: [PATCH char-misc-next 10/19] lib: convert iova.c into a library

2015-07-28 Thread Joerg Roedel
On Mon, Jul 27, 2015 at 04:57:32PM -0700, Ashutosh Dixit wrote:
 From: Harish Chegondi harish.chego...@intel.com
 
 This patch converts iova.c into a library, moving it from
 drivers/iommu/ to lib/, and exports its virtual address allocation and
 management functions so that other modules can reuse them.
 
 Cc: Joerg Roedel j...@8bytes.org
 Reviewed-by: Anil S Keshavamurthy anil.s.keshavamur...@intel.com
 Reviewed-by: Sudeep Dutt sudeep.d...@intel.com
 Signed-off-by: Harish Chegondi harish.chego...@intel.com

Where is this going to be used outside of the IOMMU world?



Joerg



Re: [PATCH 1/5] iommu/io-pgtable-arm: Allow appropriate DMA API use

2015-07-28 Thread Will Deacon
Hi Robin,

On Mon, Jul 27, 2015 at 07:18:08PM +0100, Robin Murphy wrote:
 Currently, users of the LPAE page table code are (ab)using dma_map_page()
 as a means to flush page table updates for non-coherent IOMMUs. Since
 from the CPU's point of view, creating IOMMU page tables *is* passing
 DMA buffers to a device (the IOMMU's page table walker), there's little
 reason not to use the DMA API correctly.
 
 Allow drivers to opt into appropriate DMA operations for page table
 allocation and updates by providing the relevant device, and make the
 flush_pgtable() callback optional in case those DMA API operations are
 sufficient. The expectation is that an LPAE IOMMU should have a full view
 of system memory, so use streaming mappings to avoid unnecessary pressure
 on ZONE_DMA, and treat any DMA translation as a warning sign.
 
 Signed-off-by: Robin Murphy robin.mur...@arm.com
 ---
 
 Hi all,
 
 Since Russell fixing Tegra[1] reminded me, I dug this out from, er,
 rather a long time ago[2] and tidied it up. I've tested the SMMUv2
 version with the MMU-401s on Juno (both coherent and non-coherent)
 with no visible regressions; I have the same hope for the SMMUv3 and
 IPMMU changes since they should be semantically identical. At worst
 the Renesas driver might need a larger DMA mask setting as per
 f1d84548694f, but given that there shouldn't be any highmem involved
 I'd think it should be OK as-is.
 
 Robin.
 
 [1]:http://article.gmane.org/gmane.linux.ports.tegra/23150
 [2]:http://article.gmane.org/gmane.linux.kernel.iommu/8972
 
  drivers/iommu/io-pgtable-arm.c | 107 +++--
  drivers/iommu/io-pgtable.h |   2 +
  2 files changed, 84 insertions(+), 25 deletions(-)
 
 diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
 index 4e46021..b93a60e 100644
 --- a/drivers/iommu/io-pgtable-arm.c
 +++ b/drivers/iommu/io-pgtable-arm.c
 @@ -200,12 +200,76 @@ typedef u64 arm_lpae_iopte;
  
  static bool selftest_running = false;
  
 +static dma_addr_t __arm_lpae_dma(struct device *dev, void *pages)
 +{
 + return phys_to_dma(dev, virt_to_phys(pages));
 +}
 +
 +static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
 + struct io_pgtable_cfg *cfg, void *cookie)
 +{
 + void *pages = alloc_pages_exact(size, gfp | __GFP_ZERO);
 + struct device *dev = cfg->iommu_dev;
 + dma_addr_t dma;
 +
 + if (!pages)
 + return NULL;

Missing newline here.

 + if (dev) {
 + dma = dma_map_single(dev, pages, size, DMA_TO_DEVICE);
 + if (dma_mapping_error(dev, dma))
 + goto out_free;
 + /*
 +  * We depend on the IOMMU being able to work with any physical
 +  * address directly, so if the DMA layer suggests it can't by
 +  * giving us back some translation, that bodes very badly...
 +  */
 + if (WARN(dma != __arm_lpae_dma(dev, pages),
 +  "Cannot accommodate DMA translation for IOMMU page tables\n"))

Now that we have a struct device for the iommu, we could use dev_err to make
this diagnostic more useful.

 + goto out_unmap;
 + }

Missing newline again...

 + if (cfg->tlb->flush_pgtable)

Why would you have both a dev and a flush callback? In which cases is the
DMA API insufficient?

 + cfg->tlb->flush_pgtable(pages, size, cookie);

... and here (yeah, pedantry, but consistency keeps this file easier to
read).

 + return pages;
 +
 +out_unmap:
 + dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
 +out_free:
 + free_pages_exact(pages, size);
 + return NULL;
 +}
 +
 +static void __arm_lpae_free_pages(void *pages, size_t size,
 +   struct io_pgtable_cfg *cfg)
 +{
 + struct device *dev = cfg->iommu_dev;
 +
 + if (dev)
 + dma_unmap_single(dev, __arm_lpae_dma(dev, pages),
 +  size, DMA_TO_DEVICE);
 + free_pages_exact(pages, size);
 +}
 +
 +static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
 +struct io_pgtable_cfg *cfg, void *cookie)
 +{
 + struct device *dev = cfg->iommu_dev;
 +
 + *ptep = pte;
 +
 + if (dev)
 + dma_sync_single_for_device(dev, __arm_lpae_dma(dev, ptep),
 +sizeof(pte), DMA_TO_DEVICE);
 + if (cfg->tlb->flush_pgtable)
 + cfg->tlb->flush_pgtable(ptep, sizeof(pte), cookie);

Could we kill the flush_pgtable callback completely and just stick in a
dma_wmb() here? Ideally, we'd have something like dma_store_release,
which we could use to set the parent page table entry, but that's left
as a future exercise ;)

 diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
 index 10e32f6..39fe864 100644
 --- a/drivers/iommu/io-pgtable.h
 +++ b/drivers/iommu/io-pgtable.h
 @@ -41,6 +41,7 @@ struct iommu_gather_ops {
   * @ias:  

Re: [PATCH 2/5] iommu/arm-smmu: Sort out coherency

2015-07-28 Thread Will Deacon
Hi Robin,

Cheers for doing this. Just a few comments (mainly in relation to being
consistent with SMMUv3).

On Mon, Jul 27, 2015 at 07:18:09PM +0100, Robin Murphy wrote:
 Currently we detect the whether the SMMU has coherent page table walk

we detect the whether?

 capability from the IDR0.CTTW field, and base our cache maintenance
 decisions on that. In preparation for fixing the bogus DMA API usage,
 however, we need to ensure that the DMA API agrees about this, which
 necessitates deferring to the dma-coherent property in the device tree
 for the final say.
 
 As an added bonus, since systems exist where an external CTTW signal
 has been tied off incorrectly at integration, allowing DT to override
 it offers a neat workaround for coherency issues with such SMMUs.
 
 Signed-off-by: Robin Murphy robin.mur...@arm.com
 ---
  Documentation/devicetree/bindings/iommu/arm,smmu.txt |  4 
  drivers/iommu/arm-smmu.c | 17 ++---
  2 files changed, 18 insertions(+), 3 deletions(-)
 
 diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt 
 b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
 index 0676050..04724fc 100644
 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
 +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
 @@ -43,6 +43,10 @@ conditions.
  
  ** System MMU optional properties:
  
 +- dma-coherent  : Presence or absence should correctly reflect whether
 +  the SMMU translation table walk interface is
 +  cache-coherent with the CPU.
 +

Please copy the text from:
Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt#
and remove the bit about stream table accesses.

  - calxeda,smmu-secure-config-access : Enable proper handling of buggy
implementations that always use secure access to
SMMU configuration registers. In this case non-secure
 diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
 index 4cd0c29..9a59bef 100644
 --- a/drivers/iommu/arm-smmu.c
 +++ b/drivers/iommu/arm-smmu.c
 @@ -1532,6 +1532,7 @@ static int arm_smmu_device_cfg_probe(struct 
 arm_smmu_device *smmu)
   unsigned long size;
   void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
   u32 id;
 + bool cttw_dt, cttw_reg;
  
   dev_notice(smmu->dev, "probing hardware configuration...\n");
   dev_notice(smmu->dev, "SMMUv%d with:\n", smmu->version);
 @@ -1571,10 +1572,20 @@ static int arm_smmu_device_cfg_probe(struct 
 arm_smmu_device *smmu)
   dev_notice(smmu->dev, "\taddress translation ops\n");
   }
  
 - if (id & ID0_CTTW) {
 + /*
 +  * In order for DMA API calls to work properly, we must defer to what
 +  * the DT says about coherency, regardless of what the hardware claims.
 +  * Fortunately, this also opens up a workaround for systems where the
 +  * ID register value has ended up configured incorrectly.
 +  */
 + cttw_dt = of_property_read_bool(smmu->dev->of_node, "dma-coherent");

Use of_dma_is_coherent(smmu->dev->of_node)

 + if (cttw_dt)
   smmu->features |= ARM_SMMU_FEAT_COHERENT_WALK;
 - dev_notice(smmu->dev, "\tcoherent table walk\n");
 - }
 + cttw_reg = !!(id & ID0_CTTW);
 + if (cttw_dt || cttw_reg)
 + dev_notice(smmu->dev, "\t%scoherent table walk%s\n",
 +cttw_dt ? "" : "no ",
 +(cttw_dt == cttw_reg) ? "" : " (overridden by DT)");

Similarly here; I think we should just follow the message printed by the
SMMUv3 driver but s/IDR0.COHACC/ID0.CTTW/.
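
For illustration, the adapted check might end up looking like the sketch
below (the exact message wording is an assumption modelled on the SMMUv3
driver's style):

	cttw_dt = of_dma_is_coherent(smmu->dev->of_node);
	cttw_reg = !!(id & ID0_CTTW);
	if (cttw_dt)
		smmu->features |= ARM_SMMU_FEAT_COHERENT_WALK;
	if (cttw_dt != cttw_reg)
		dev_notice(smmu->dev,
			   "\tID0.CTTW overridden by dma-coherent property\n");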

Will


Re: [PATCH v3 3/6] iommu: add ARM short descriptor page table allocator.

2015-07-28 Thread Will Deacon
On Tue, Jul 28, 2015 at 02:37:43PM +0100, Yong Wu wrote:
 On Tue, 2015-07-28 at 12:00 +0100, Will Deacon wrote:
  On Tue, Jul 28, 2015 at 06:08:14AM +0100, Yong Wu wrote:
   On Mon, 2015-07-27 at 15:11 +0100, Will Deacon wrote:
On Mon, Jul 27, 2015 at 03:05:38PM +0100, Robin Murphy wrote:
 On 27/07/15 05:21, Yong Wu wrote:
  +   } else {/* page or largepage */
   +   if (quirk & IO_PGTABLE_QUIRK_SHORT_MTK) {
  +   if (large) { /* special Bit */
 
  This definitely needs a better comment! What exactly are you 
  doing here
  and what is that quirk all about?
 
  I use this quirk for the MTK special bit, as we don't have the XN bit
  in the pagetable.
 
  I'm still not really clear about what this is.
 
  There are some differences between the standard spec and the MTK HW:
  our HW doesn't implement some bits, like XN and AP.
  So I added a quirk for the MTK special case.
 
 When you say it doesn't implement these bits, do you mean that having 
 them set will lead to Bad Things happening in the hardware, or that 
 it 
 will simply ignore them and not enforce any of the protections they 
 imply? The former case would definitely want clearly documenting 
 somewhere, whereas for the latter case I'm not sure it's even worth 
 the 
 complication of having a quirk - if the value doesn't matter there 
 seems 
 little point in doing a special dance just for the sake of semantic 
 correctness of the in-memory PTEs, in my opinion.

Agreed. We should only use quirks if the current (architecturally
compliant) code causes real issues with the hardware. Then the quirk can
be used to either avoid the problematic routines or to take extra steps
to make things work as the architecture intended.

I've asked how this IOMMU differs from the architecture on a number of
occasions, but I'm still yet to receive a response other than "it's
special".

   
   After checking further with DE, our pagetable follows ARM-v7's
   short-descriptor format, which is a little different from ARM-v8's, like
   bit0 (PXN) in section and supersection. I hadn't read the ARM-v7 spec
   before, so I added a MTK quirk to disable the PXN bit in section and
   supersection. (If the PXN bit from the ARM-v7 spec is written, our HW
   will page fault.)
  
  I've been reviewing this using the ARMv7 ARM (Rev.C of DDI0406C) the whole
  time. PXN is there as an optional field in non-LPAE implementations. That's
  fine and doesn't require any quirks.
 
 Thanks for your confirmation.
 Then should I delete all the PXN bits here?
 
 Take an example:
 #define ARM_SHORT_PGD_SECTION_XN  (BIT(0) | BIT(4))
 I will change it to BIT(4).

Yes. Then the PXN bit can be added later as a quirk when we have an
implementation that supports it.
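
Sketched as a diff against the patch under review (the per-bit comments
are assumptions based on the discussion above):

	-#define ARM_SHORT_PGD_SECTION_XN	(BIT(0) | BIT(4))	/* PXN | XN */
	+#define ARM_SHORT_PGD_SECTION_XN	BIT(4)			/* XN only */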

Will


Re: [PATCH char-misc-next 10/19] lib: convert iova.c into a library

2015-07-28 Thread David Woodhouse
On Tue, 2015-07-28 at 11:41 +0100, Robin Murphy wrote:
 On 28/07/15 11:03, Joerg Roedel wrote:
  On Mon, Jul 27, 2015 at 04:57:32PM -0700, Ashutosh Dixit wrote:
   From: Harish Chegondi harish.chego...@intel.com
   
   This patch converts iova.c into a library, moving it from
   drivers/iommu/ to lib/, and exports its virtual address 
   allocation and
   management functions so that other modules can reuse them.
   
   Cc: Joerg Roedel j...@8bytes.org
   Reviewed-by: Anil S Keshavamurthy anil.s.keshavamur...@intel.com
   
   Reviewed-by: Sudeep Dutt sudeep.d...@intel.com
   Signed-off-by: Harish Chegondi harish.chego...@intel.com
  
  Where is this going to be used outside of the IOMMU world?
  
 
 ...and how does it relate to the patches from Sakari (+CC) doing much 
 the same thing[1]?

I merged Sakari's patches into the intel-iommu git tree today, FWIW.

If there's really a need to move it from drivers/iommu/ to lib/ then we
could feasibly do that too.

-- 
David Woodhouse                      Open Source Technology Centre
david.woodho...@intel.com            Intel Corporation



Re: [PATCH 3/5] iommu/arm-smmu: Clean up DMA API usage

2015-07-28 Thread Robin Murphy

On 28/07/15 11:18, Will Deacon wrote:

On Mon, Jul 27, 2015 at 07:18:10PM +0100, Robin Murphy wrote:

With the correct DMA API calls now integrated into the io-pgtable code,
let that handle the flushing of non-coherent page table updates.

Signed-off-by: Robin Murphy robin.mur...@arm.com
---
  drivers/iommu/arm-smmu.c | 23 +++
  1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 9a59bef..9501668 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -610,24 +610,13 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long 
iova, size_t size,
  static void arm_smmu_flush_pgtable(void *addr, size_t size, void *cookie)
  {
struct arm_smmu_domain *smmu_domain = cookie;
-   struct arm_smmu_device *smmu = smmu_domain->smmu;
-   unsigned long offset = (unsigned long)addr & ~PAGE_MASK;

-
-   /* Ensure new page tables are visible to the hardware walker */
-   if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK) {
+   /*
+* Ensure new page tables are visible to a coherent hardware walker.
+* The page table code deals with flushing for the non-coherent case.
+*/
+   if (smmu_domain->smmu->features & ARM_SMMU_FEAT_COHERENT_WALK)
dsb(ishst);
-   } else {
-   /*
-* If the SMMU can't walk tables in the CPU caches, treat them
-* like non-coherent DMA since we need to flush the new entries
-* all the way out to memory. There's no possibility of
-* recursion here as the SMMU table walker will not be wired
-* through another SMMU.
-*/
-   dma_map_page(smmu->dev, virt_to_page(addr), offset, size,
-DMA_TO_DEVICE);
-   }
  }

  static struct iommu_gather_ops arm_smmu_gather_ops = {
@@ -899,6 +888,8 @@ static int arm_smmu_init_domain_context(struct iommu_domain 
*domain,
.oas= oas,
.tlb= arm_smmu_gather_ops,
};
+   if (!(smmu->features & ARM_SMMU_FEAT_COHERENT_WALK))
+   pgtbl_cfg.iommu_dev = smmu->dev;


Ideally, I'd like to set the dev unconditionally and kill the flush_pgtable
callback. See my comments on the other patch.


I'm not sure setting dev unconditionally would be right - we'd end up 
doing a load of busy work in the DMA API to decide to do nothing and 
never get a barrier out of it. We could have a bool coherent in the 
cfg to make the wmb() happen, but the mere absence of dev would do that 
job just as well (wmb() isn't going to need it to report anything, 
unlike the DMA API path); either way flush_pgtable can go.


Robin.



Re: [PATCH v2 00/11] PCI: Fix ATS deadlock

2015-07-28 Thread Joerg Roedel
Hi Bjorn,

On Mon, Jul 20, 2015 at 07:13:49PM -0500, Bjorn Helgaas wrote:
 Bjorn Helgaas (11):
   iommu/vt-d: Cache PCI ATS state and Invalidate Queue Depth
   PCI: Allocate ATS struct during enumeration
   PCI: Embed ATS info directly into struct pci_dev
   PCI: Reduce size of ATS structure elements
   PCI: Rationalize pci_ats_queue_depth() error checking
   PCI: Inline the ATS setup code into pci_ats_init()
   PCI: Use pci_physfn() rather than looking up physfn by hand
   PCI: Clean up ATS error handling
   PCI: Move ATS declarations to linux/pci.h so they're all together
   PCI: Stop caching ATS Invalidate Queue Depth
   PCI: Remove pci_ats_enabled()

So while you are working on ATS, could we perhaps also introduce a
way to globally disable ATS on a box from the kernel command line
(pci=noats?)? ATS is basically a way for a device to tunnel through the
IOMMU without access checking (because pre-translated requests are not
checked again).

For security reasons it would be good to have a way to disable ATS
completely if desired.
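
A minimal sketch of what such a switch could look like; the option name
follows the suggestion above, while the flag name and parsing details
are assumptions:

	bool pcie_ats_disabled;

	/* hypothetical handler for one comma-separated "pci=" token */
	static void pci_handle_option(char *str)
	{
		if (!strcmp(str, "noats")) {
			pr_info("PCIe: ATS is disabled\n");
			pcie_ats_disabled = true;
		}
	}

pci_enable_ats() would then simply refuse to enable ATS while
pcie_ats_disabled is set, and IOMMU drivers would fall back to normal
IOTLB invalidation.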



Joerg



Re: [PATCH 2/5] iommu/arm-smmu: Sort out coherency

2015-07-28 Thread Robin Murphy

Hi Will,

On 28/07/15 10:43, Will Deacon wrote:

Hi Robin,

Cheers for doing this. Just a few comments (mainly in relation to being
consistent with SMMUv3).


Thanks!


On Mon, Jul 27, 2015 at 07:18:09PM +0100, Robin Murphy wrote:

Currently we detect the whether the SMMU has coherent page table walk


we detect the whether?


Er, yeah, looks like rain tomorrow afternoon :P


capability from the IDR0.CTTW field, and base our cache maintenance
decisions on that. In preparation for fixing the bogus DMA API usage,
however, we need to ensure that the DMA API agrees about this, which
necessitates deferring to the dma-coherent property in the device tree
for the final say.

As an added bonus, since systems exist where an external CTTW signal
has been tied off incorrectly at integration, allowing DT to override
it offers a neat workaround for coherency issues with such SMMUs.

Signed-off-by: Robin Murphy robin.mur...@arm.com
---
  Documentation/devicetree/bindings/iommu/arm,smmu.txt |  4 
  drivers/iommu/arm-smmu.c | 17 ++---
  2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt 
b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 0676050..04724fc 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -43,6 +43,10 @@ conditions.

  ** System MMU optional properties:

+- dma-coherent  : Presence or absence should correctly reflect whether
+  the SMMU translation table walk interface is
+  cache-coherent with the CPU.
+


Please copy the text from:
Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt#
and remove the bit about stream table accesses.


Done, although since the resulting "...DMA operations made by the SMMU
(page table walks)..." is both redundant and horribly clunky, it gets
rewritten as "...page table walks made by the SMMU...".



  - calxeda,smmu-secure-config-access : Enable proper handling of buggy
implementations that always use secure access to
SMMU configuration registers. In this case non-secure
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 4cd0c29..9a59bef 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1532,6 +1532,7 @@ static int arm_smmu_device_cfg_probe(struct 
arm_smmu_device *smmu)
unsigned long size;
void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
u32 id;
+   bool cttw_dt, cttw_reg;

dev_notice(smmu->dev, "probing hardware configuration...\n");
dev_notice(smmu->dev, "SMMUv%d with:\n", smmu->version);
@@ -1571,10 +1572,20 @@ static int arm_smmu_device_cfg_probe(struct 
arm_smmu_device *smmu)
dev_notice(smmu->dev, "\taddress translation ops\n");
}

-   if (id & ID0_CTTW) {
+   /*
+* In order for DMA API calls to work properly, we must defer to what
+* the DT says about coherency, regardless of what the hardware claims.
+* Fortunately, this also opens up a workaround for systems where the
+* ID register value has ended up configured incorrectly.
+*/
+   cttw_dt = of_property_read_bool(smmu->dev->of_node, "dma-coherent");


Use of_dma_is_coherent(smmu->dev->of_node)


*facepalm*


+   if (cttw_dt)
smmu->features |= ARM_SMMU_FEAT_COHERENT_WALK;
-   dev_notice(smmu->dev, "\tcoherent table walk\n");
-   }
+   cttw_reg = !!(id & ID0_CTTW);
+   if (cttw_dt || cttw_reg)
+   dev_notice(smmu->dev, "\t%scoherent table walk%s\n",
+  cttw_dt ? "" : "no ",
+  (cttw_dt == cttw_reg) ? "" : " (overridden by DT)");


Similarly here; I think we should just follow the message printed by the
SMMUv3 driver but s/IDR0.COHACC/ID0.CTTW/.


I note that the v3 driver stays silent as long as the DT and hardware 
agree - I'm reticent to entirely remove the original message here as 
it's proved rather useful (especially in debugging-over-email exchanges) 
to see at a glance from a boot log whether the driver thinks a 
particular SMMU is coherent or not (particularly with my SMMUv1 
many-instances-in-the-same-system hat on). I can certainly make the 
override message more consistent, though.


Robin



Re: [PATCH 1/5] iommu/io-pgtable-arm: Allow appropriate DMA API use

2015-07-28 Thread Will Deacon
On Tue, Jul 28, 2015 at 04:39:23PM +0100, Robin Murphy wrote:
 On 28/07/15 11:17, Will Deacon wrote:
  +  if (cfg->tlb->flush_pgtable)
 
  Why would you have both a dev and a flush callback? In which cases is the
  DMA API insufficient?
 
 a) Bisectability given existing users.

You could still make it an if (dev) .. else if (flush_pgtable) ... though.
Then we could put the wmb() in the if () clause after the DMA-api calls
and remove the conditional once everybody has been ported over.
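
That is, roughly (a sketch of the transitional shape described above,
shown for the PTE-update path with names from the patch; the allocation
path would follow the same pattern, and the full wmb() reflects where
the thread ends up below):

	if (dev) {
		dma_sync_single_for_device(dev, __arm_lpae_dma(dev, ptep),
					   sizeof(pte), DMA_TO_DEVICE);
		wmb();	/* can go once everybody has been ported over */
	} else if (cfg->tlb->flush_pgtable) {
		cfg->tlb->flush_pgtable(ptep, sizeof(pte), cookie);
	}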

  +static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
  + struct io_pgtable_cfg *cfg, void *cookie)
  +{
  +  struct device *dev = cfg->iommu_dev;
  +
  +  *ptep = pte;
  +
  +  if (dev)
  +  dma_sync_single_for_device(dev, __arm_lpae_dma(dev, ptep),
  + sizeof(pte), DMA_TO_DEVICE);
  +  if (cfg->tlb->flush_pgtable)
  +  cfg->tlb->flush_pgtable(ptep, sizeof(pte), cookie);
 
  Could we kill the flush_pgtable callback completely and just stick in a
  dma_wmb() here? Ideally, we'd have something like dma_store_release,
  which we could use to set the parent page table entry, but that's left
  as a future exercise ;)
 
 I couldn't convince myself that there wouldn't ever be some weird case 
 where an IOMMU driver still needs to do something funky for a coherent 
 device, so I left it in. Given that the existing implementations are 
 either dsb or nothing, however, I agree it may be worth taking out now 
 and only bringing back later if proven necessary.

Yeah, let's keep it as simple as we can and avoid giving people callbacks
unless they actually need them.

 I would think we'd need at least wmb() though, since dma_wmb() only 
 gives us a dmb on arm64; if the PTE is going from invalid to valid (i.e. 
 no TLB involved), we'd have the normal cacheable write of the PTE 
potentially racing with an MMIO write after we return (the "do DMA with
this address" command to the master) and I think we might need a dsb to 
 avoid that - if the PTE write hasn't got far enough for the IOMMU to 
 snoop it, the walk hits the stale invalid entry, aborts the incoming 
 transaction and kills the device.

Yes, you're right. I was in a CPU-centric mindset and forgot that we can't
deal with transient faults when going from invalid -> valid for a device
mapping.

Will


[PATCH 3/5] iommu/amd: Allow non-IOMMUv2 devices in IOMMUv2 domains

2015-07-28 Thread Joerg Roedel
From: Joerg Roedel jroe...@suse.de

Since devices with IOMMUv2 functionality might be in the
same group as devices without it, allow those devices in
IOMMUv2 domains too.
Otherwise attaching the group with the IOMMUv2 device to the
domain will fail.

Signed-off-by: Joerg Roedel jroe...@suse.de
---
 drivers/iommu/amd_iommu.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 89e6d4b..6d3dae9 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2164,15 +2164,17 @@ static int attach_device(struct device *dev,
dev_data = get_dev_data(dev);
 
if (domain->flags & PD_IOMMUV2_MASK) {
-   if (!dev_data->iommu_v2 || !dev_data->passthrough)
+   if (!dev_data->passthrough)
return -EINVAL;
 
-   if (pdev_iommuv2_enable(pdev) != 0)
-   return -EINVAL;
+   if (dev_data->iommu_v2) {
+   if (pdev_iommuv2_enable(pdev) != 0)
+   return -EINVAL;
 
-   dev_data->ats.enabled = true;
-   dev_data->ats.qdep    = pci_ats_queue_depth(pdev);
-   dev_data->pri_tlp     = pci_pri_tlp_required(pdev);
+   dev_data->ats.enabled = true;
+   dev_data->ats.qdep    = pci_ats_queue_depth(pdev);
+   dev_data->pri_tlp     = pci_pri_tlp_required(pdev);
+   }
} else if (amd_iommu_iotlb_sup &&
   pci_enable_ats(pdev, PAGE_SHIFT) == 0) {
dev_data->ats.enabled = true;
@@ -2237,7 +2239,7 @@ static void detach_device(struct device *dev)
__detach_device(dev_data);
write_unlock_irqrestore(amd_iommu_devtable_lock, flags);
 
-   if (domain->flags & PD_IOMMUV2_MASK)
+   if (domain->flags & PD_IOMMUV2_MASK && dev_data->iommu_v2)
pdev_iommuv2_disable(to_pci_dev(dev));
else if (dev_data->ats.enabled)
pci_disable_ats(to_pci_dev(dev));
-- 
1.9.1



[PATCH 2/5] iommu/amd: Use iommu core for passthrough mode

2015-07-28 Thread Joerg Roedel
From: Joerg Roedel jroe...@suse.de

Remove the AMD IOMMU driver implementation for passthrough
mode and rely on the new iommu core features for that.

Signed-off-by: Joerg Roedel jroe...@suse.de
---
 drivers/iommu/amd_iommu.c  | 58 ++
 drivers/iommu/amd_iommu_init.c | 10 +---
 2 files changed, 3 insertions(+), 65 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index a57e9b7..89e6d4b 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -76,8 +76,6 @@ LIST_HEAD(hpet_map);
  * Domain for untranslated devices - only allocated
  * if iommu=pt passed on kernel cmd line.
  */
-static struct protection_domain *pt_domain;
-
 static const struct iommu_ops amd_iommu_ops;
 
 static ATOMIC_NOTIFIER_HEAD(ppr_notifier);
@@ -96,7 +94,7 @@ struct iommu_dev_data {
struct protection_domain *domain; /* Domain the device is bound to */
u16 devid;/* PCI Device ID */
bool iommu_v2;/* Device can make use of IOMMUv2 */
-   bool passthrough; /* Default for device is pt_domain */
+   bool passthrough; /* Device is identity mapped */
struct {
bool enabled;
int qdep;
@@ -116,7 +114,6 @@ struct iommu_cmd {
 struct kmem_cache *amd_iommu_irq_cache;
 
 static void update_domain(struct protection_domain *domain);
-static int alloc_passthrough_domain(void);
 static int protection_domain_init(struct protection_domain *domain);
 
 /
@@ -2221,15 +2218,6 @@ static void __detach_device(struct iommu_dev_data 
*dev_data)
do_detach(head);
 
spin_unlock_irqrestore(&domain->lock, flags);
-
-   /*
-* If we run in passthrough mode the device must be assigned to the
-* passthrough domain if it is detached from any other domain.
-* Make sure we can deassign from the pt_domain itself.
-*/
-   if (dev_data->passthrough &&
-   (dev_data->domain == NULL && domain != pt_domain))
-   __attach_device(dev_data, pt_domain);
 }
 
 /*
@@ -2287,7 +2275,7 @@ static int amd_iommu_add_device(struct device *dev)
 
BUG_ON(!dev_data);
 
-   if (dev_data->iommu_v2)
+   if (iommu_pass_through || dev_data->iommu_v2)
iommu_request_dm_for_dev(dev);
 
/* Domains are initialized for this device - have a look what we ended 
up with */
@@ -2947,21 +2935,6 @@ out_err:
return NULL;
 }
 
-static int alloc_passthrough_domain(void)
-{
-   if (pt_domain != NULL)
-   return 0;
-
-   /* allocate passthrough domain */
-   pt_domain = protection_domain_alloc();
-   if (!pt_domain)
-   return -ENOMEM;
-
-   pt_domain->mode = PAGE_MODE_NONE;
-
-   return 0;
-}
-
 static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
 {
struct protection_domain *pdomain;
@@ -3222,33 +3195,6 @@ static const struct iommu_ops amd_iommu_ops = {
  *
  */
 
-int __init amd_iommu_init_passthrough(void)
-{
-   struct iommu_dev_data *dev_data;
-   struct pci_dev *dev = NULL;
-   int ret;
-
-   ret = alloc_passthrough_domain();
-   if (ret)
-   return ret;
-
-   for_each_pci_dev(dev) {
-   if (!check_device(&dev->dev))
-   continue;
-
-   dev_data = get_dev_data(&dev->dev);
-   dev_data->passthrough = true;
-
-   attach_device(&dev->dev, pt_domain);
-   }
-
-   amd_iommu_stats_init();
-
-   pr_info("AMD-Vi: Initialized for Passthrough Mode\n");
-
-   return 0;
-}
-
 /* IOMMUv2 specific functions */
 int amd_iommu_register_ppr_notifier(struct notifier_block *nb)
 {
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index dbda9ae..a24495e 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2026,14 +2026,6 @@ static bool detect_ivrs(void)
return true;
 }
 
-static int amd_iommu_init_dma(void)
-{
-   if (iommu_pass_through)
-   return amd_iommu_init_passthrough();
-   else
-   return amd_iommu_init_dma_ops();
-}
-
 /
  *
  * AMD IOMMU Initialization State Machine
@@ -2073,7 +2065,7 @@ static int __init state_next(void)
init_state = ret ? IOMMU_INIT_ERROR : IOMMU_INTERRUPTS_EN;
break;
case IOMMU_INTERRUPTS_EN:
-   ret = amd_iommu_init_dma();
+   ret = amd_iommu_init_dma_ops();
init_state = ret ? IOMMU_INIT_ERROR : IOMMU_DMA_OPS;
break;
case IOMMU_DMA_OPS:
-- 
1.9.1


[PATCH 4/5] iommu/amd: Use swiotlb in passthrough mode

2015-07-28 Thread Joerg Roedel
From: Joerg Roedel jroe...@suse.de

In passthrough mode (iommu=pt) all devices are identity
mapped. If a device does not support 64bit DMA it might
still need remapping. Make sure swiotlb is initialized to
provide this remapping.

Signed-off-by: Joerg Roedel jroe...@suse.de
---
 drivers/iommu/amd_iommu.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 6d3dae9..e29baa6 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2282,12 +2282,10 @@ static int amd_iommu_add_device(struct device *dev)
 
/* Domains are initialized for this device - have a look what we ended 
up with */
domain = iommu_get_domain_for_dev(dev);
-   if (domain->type == IOMMU_DOMAIN_IDENTITY) {
+   if (domain->type == IOMMU_DOMAIN_IDENTITY)
dev_data->passthrough = true;
-   dev->archdata.dma_ops = &nommu_dma_ops;
-   } else {
+   else
dev->archdata.dma_ops = &amd_iommu_dma_ops;
-   }
 
 out:
iommu_completion_wait(iommu);
@@ -2852,8 +2850,8 @@ int __init amd_iommu_init_api(void)
 
 int __init amd_iommu_init_dma_ops(void)
 {
+   swiotlb        = iommu_pass_through ? 1 : 0;
iommu_detected = 1;
-   swiotlb = 0;
 
amd_iommu_stats_init();
 
-- 
1.9.1



Re: [PATCH 1/5] iommu/io-pgtable-arm: Allow appropriate DMA API use

2015-07-28 Thread Robin Murphy

On 28/07/15 11:17, Will Deacon wrote:
[...]

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 4e46021..b93a60e 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -200,12 +200,76 @@ typedef u64 arm_lpae_iopte;

  static bool selftest_running = false;

+static dma_addr_t __arm_lpae_dma(struct device *dev, void *pages)
+{
+   return phys_to_dma(dev, virt_to_phys(pages));
+}
+
+static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
+   struct io_pgtable_cfg *cfg, void *cookie)
+{
+   void *pages = alloc_pages_exact(size, gfp | __GFP_ZERO);
+   struct device *dev = cfg->iommu_dev;
+   dma_addr_t dma;
+
+   if (!pages)
+   return NULL;


Missing newline here.


Coding style flamewar, go! Or not, since I have neither the energy nor 
inclination, so I'll go with the status quo.


(Here we come and here we come and here we go etc.)


+   if (dev) {
+   dma = dma_map_single(dev, pages, size, DMA_TO_DEVICE);
+   if (dma_mapping_error(dev, dma))
+   goto out_free;
+   /*
+* We depend on the IOMMU being able to work with any physical
+* address directly, so if the DMA layer suggests it can't by
+* giving us back some translation, that bodes very badly...
+*/
+   if (WARN(dma != __arm_lpae_dma(dev, pages),
+"Cannot accommodate DMA translation for IOMMU page tables\n"))


Now that we have a struct device for the iommu, we could use dev_err to make
this diagnostic more useful.


Good point.


+   goto out_unmap;
+   }


Missing newline again...


+   if (cfg->tlb->flush_pgtable)


Why would you have both a dev and a flush callback? In which cases is the
DMA API insufficient?


a) Bisectability given existing users.
b) When the device is coherent, the DMA API stuff should be a nop even 
if a device was provided, therefore some other synchronisation is needed.



+   cfg->tlb->flush_pgtable(pages, size, cookie);


... and here (yeah, pedantry, but consistency keeps this file easier to
read).


+   return pages;
+
+out_unmap:
+   dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
+out_free:
+   free_pages_exact(pages, size);
+   return NULL;
+}
+
+static void __arm_lpae_free_pages(void *pages, size_t size,
+ struct io_pgtable_cfg *cfg)
+{
+   struct device *dev = cfg->iommu_dev;
+
+   if (dev)
+   dma_unmap_single(dev, __arm_lpae_dma(dev, pages),
+size, DMA_TO_DEVICE);
+   free_pages_exact(pages, size);
+}
+
+static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
+  struct io_pgtable_cfg *cfg, void *cookie)
+{
+   struct device *dev = cfg->iommu_dev;
+
+   *ptep = pte;
+
+   if (dev)
+   dma_sync_single_for_device(dev, __arm_lpae_dma(dev, ptep),
+  sizeof(pte), DMA_TO_DEVICE);
+   if (cfg->tlb->flush_pgtable)
+   cfg->tlb->flush_pgtable(ptep, sizeof(pte), cookie);


Could we kill the flush_pgtable callback completely and just stick in a
dma_wmb() here? Ideally, we'd have something like dma_store_release,
which we could use to set the parent page table entry, but that's left
as a future exercise ;)


I couldn't convince myself that there wouldn't ever be some weird case 
where an IOMMU driver still needs to do something funky for a coherent 
device, so I left it in. Given that the existing implementations are 
either dsb or nothing, however, I agree it may be worth taking out now 
and only bringing back later if proven necessary.


I would think we'd need at least wmb() though, since dma_wmb() only 
gives us a dmb on arm64; if the PTE is going from invalid to valid (i.e. 
no TLB involved), we'd have the normal cacheable write of the PTE 
potentially racing with an MMIO write after we return (the "do DMA with
this address" command to the master) and I think we might need a dsb to 
avoid that - if the PTE write hasn't got far enough for the IOMMU to 
snoop it, the walk hits the stale invalid entry, aborts the incoming 
transaction and kills the device.



diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
index 10e32f6..39fe864 100644
--- a/drivers/iommu/io-pgtable.h
+++ b/drivers/iommu/io-pgtable.h
@@ -41,6 +41,7 @@ struct iommu_gather_ops {
   * @ias:   Input address (iova) size, in bits.
   * @oas:   Output address (paddr) size, in bits.
   * @tlb:   TLB management callbacks for this set of tables.
+ * @iommu_dev: The owner of the page table memory (for DMA purposes).
   */
  struct io_pgtable_cfg {
#define IO_PGTABLE_QUIRK_ARM_NS (1 << 0)  /* Set NS bit in PTEs */
@@ -49,6 +50,7 @@ struct io_pgtable_cfg {
unsigned int 

[PATCH 5/5] iommu/amd: Set global dma_ops if swiotlb is disabled

2015-07-28 Thread Joerg Roedel
From: Joerg Roedel jroe...@suse.de

Some AMD systems also have non-PCI devices which can do DMA.
Those can't be handled by the AMD IOMMU, as the hardware can
only handle PCI. These devices would end up with no dma_ops,
as neither the per-device nor the global dma_ops will get
set. SWIOTLB provides global dma_ops when it is active, so
make sure there are global dma_ops too when swiotlb is
disabled.

Signed-off-by: Joerg Roedel jroe...@suse.de
---
 drivers/iommu/amd_iommu.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index e29baa6..fa9508b 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2853,6 +2853,15 @@ int __init amd_iommu_init_dma_ops(void)
swiotlb        = iommu_pass_through ? 1 : 0;
iommu_detected = 1;
 
+   /*
+* In case we don't initialize SWIOTLB (actually the common case
+* when AMD IOMMU is enabled), make sure there are global
+* dma_ops set as a fall-back for devices not handled by this
+* driver (for example non-PCI devices).
+*/
+   if (!swiotlb)
+   dma_ops = &nommu_dma_ops;
+
amd_iommu_stats_init();
 
if (amd_iommu_unmap_flush)
-- 
1.9.1



[PATCH 1/5] iommu/amd: Use iommu_attach_group()

2015-07-28 Thread Joerg Roedel
From: Joerg Roedel jroe...@suse.de

Since the conversion to default domains the
iommu_attach_device function only works for devices with
their own group. But this isn't always true for current
IOMMUv2 capable devices, so use iommu_attach_group instead.

Signed-off-by: Joerg Roedel jroe...@suse.de
---
 drivers/iommu/amd_iommu_v2.c | 24 +---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 3465faf..f7b875b 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -132,11 +132,19 @@ static struct device_state *get_device_state(u16 devid)
 
 static void free_device_state(struct device_state *dev_state)
 {
+   struct iommu_group *group;
+
/*
 * First detach device from domain - No more PRI requests will arrive
 * from that device after it is unbound from the IOMMUv2 domain.
 */
-   iommu_detach_device(dev_state->domain, &dev_state->pdev->dev);
+   group = iommu_group_get(&dev_state->pdev->dev);
+   if (WARN_ON(!group))
+   return;
+
+   iommu_detach_group(dev_state->domain, group);
+
+   iommu_group_put(group);
 
/* Everything is down now, free the IOMMUv2 domain */
iommu_domain_free(dev_state->domain);
@@ -731,6 +739,7 @@ EXPORT_SYMBOL(amd_iommu_unbind_pasid);
 int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
 {
struct device_state *dev_state;
+   struct iommu_group *group;
unsigned long flags;
int ret, tmp;
u16 devid;
@@ -776,10 +785,16 @@ int amd_iommu_init_device(struct pci_dev *pdev, int 
pasids)
if (ret)
goto out_free_domain;
 
-   ret = iommu_attach_device(dev_state->domain, &pdev->dev);
-   if (ret != 0)
+   group = iommu_group_get(&pdev->dev);
+   if (!group)
goto out_free_domain;
 
+   ret = iommu_attach_group(dev_state->domain, group);
+   if (ret != 0)
+   goto out_drop_group;
+
+   iommu_group_put(group);
+
spin_lock_irqsave(&state_lock, flags);
 
if (__get_device_state(devid) != NULL) {
@@ -794,6 +809,9 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
 
return 0;
 
+out_drop_group:
+   iommu_group_put(group);
+
 out_free_domain:
iommu_domain_free(dev_state->domain);
 
-- 
1.9.1



[PATCH 0/5] AMD IOMMU Fixes for v4.2-rc4

2015-07-28 Thread Joerg Roedel
Hi,

here are a couple of fixes for the AMD IOMMU driver for
issues found recently. The issues were introduced by the
default-domain conversion in this development cycle. I plan
to send these fixes upstream by the end of the week.

Please review!

Thanks,

Joerg

Joerg Roedel (5):
  iommu/amd: Use iommu_attach_group()
  iommu/amd: Use iommu core for passthrough mode
  iommu/amd: Allow non-IOMMUv2 devices in IOMMUv2 domains
  iommu/amd: Use swiotlb in passthrough mode
  iommu/amd: Set global dma_ops if swiotlb is disabled

 drivers/iommu/amd_iommu.c  | 91 +++---
 drivers/iommu/amd_iommu_init.c | 10 +
 drivers/iommu/amd_iommu_v2.c   | 24 +--
 3 files changed, 45 insertions(+), 80 deletions(-)

-- 
1.9.1



Re: [PATCH v3 0/5] iommu/arm-smmu: add support for non-pci devices

2015-07-28 Thread Will Deacon
On Tue, Jul 21, 2015 at 11:30:07AM +0100, Robin Murphy wrote:
 On 21/07/15 08:30, Zhen Lei wrote:
  Changelog:
  v2 - v3:
  1. add support for pci device hotplug, which was missed in patch v2.
  2. only support #iommu-cells = 1, add corresponding description in 
  arm,smmu-v3.txt.
  3. add function find_smmu_by_device, which was extracted from
  find_smmu_by_node, to resolve the problem mentioned by Robin Murphy in
  [PATCH v2 7/9].
  Additionally:
  +platform_set_drvdata(pdev, smmu);   //Patch v2
  +   dev->archdata.iommu = smmu; //Patch v3, dev = &pdev->dev
 
 
 I didn't give any Reviewed-by tags, much less to revised patches that 
 I've not even looked at yet; please see section 13 of 
 Documentation/SubmittingPatches for what the Reviewed-by tag means.

That said, it would be great if you could review the patches anyway ;)

Also, it looks like this is all based on Laurent's IOMMU probe deferral
series which hasn't seen any movement for a while.

Laurent: are you planning to repost that, or is it worth us picking up
the original version and bashing it into shape ourselves?

Will


Re: [PATCH v4 0/4] arm64: IOMMU-backed DMA mapping

2015-07-28 Thread Will Deacon
Hi Joerg, Robin,

On Mon, Jul 20, 2015 at 06:23:35PM +0100, Robin Murphy wrote:
 On 20/07/15 16:26, Joerg Roedel wrote:
  On Thu, Jul 16, 2015 at 07:40:11PM +0100, Robin Murphy wrote:
arch/arm64/Kconfig   |   1 +
arch/arm64/include/asm/dma-mapping.h |  15 +-
arch/arm64/mm/dma-mapping.c  | 452 +
 
  What happened to the plan to merge this with the existing iommu-based
  dma-api implementation for 32 bit ARM?
 
 The issue currently is that there are a bunch of drivers using the 
 exported arm_iommu_* functions directly. From what I can tell, they seem 
 like they could probably all be converted to using default domains 
 and/or the new domain type abstractions via the core IOMMU API, which 
 would then allow killing off dma_iommu_mapping and rewriting the 
 arch/arm implementation to use the new shared code. I don't currently 
 have any 32-bit platform to test with, so I'm a little dubious of taking 
 that all on myself right now.
 
 In the meantime on arm64, DMA mapping ops are needed for SMMUv3 platform 
 device support, the Mediatek M4U patches and my own SMMUv2 work, so it 
 would be very useful to get the arm64 and common code in as a first 
 step, then look at cleaning up arch/arm for 4.4 without dangling 
 dependencies.

What's the plan with this? Do we need to port 32-bit ARM before it's a
candidate for merging, or can we push ahead with getting this up and
running for arm64 (which currently can't use an IOMMU for DMA)?

Will


Re: [PATCH v3 3/6] iommu: add ARM short descriptor page table allocator.

2015-07-28 Thread Will Deacon
On Tue, Jul 28, 2015 at 06:08:14AM +0100, Yong Wu wrote:
 On Mon, 2015-07-27 at 15:11 +0100, Will Deacon wrote:
  On Mon, Jul 27, 2015 at 03:05:38PM +0100, Robin Murphy wrote:
   On 27/07/15 05:21, Yong Wu wrote:
+   } else {/* page or largepage */
+   if (quirk & IO_PGTABLE_QUIRK_SHORT_MTK) {
+   if (large) { /* special Bit */
   
This definitely needs a better comment! What exactly are you doing 
here
and what is that quirk all about?
   
I use this quirk for the MTK special bit, as we don't have the XN bit in
the pagetable.
   
I'm still not really clear about what this is.
   
There are some differences between the standard spec and the MTK HW:
our HW doesn't implement some bits, like XN and AP.
So I added a quirk for the MTK special case.
   
   When you say it doesn't implement these bits, do you mean that having 
   them set will lead to Bad Things happening in the hardware, or that it 
   will simply ignore them and not enforce any of the protections they 
   imply? The former case would definitely want clearly documenting 
   somewhere, whereas for the latter case I'm not sure it's even worth the 
   complication of having a quirk - if the value doesn't matter there seems 
   little point in doing a special dance just for the sake of semantic 
   correctness of the in-memory PTEs, in my opinion.
  
  Agreed. We should only use quirks if the current (architecturally
  compliant) code causes real issues with the hardware. Then the quirk can
  be used to either avoid the problematic routines or to take extra steps
  to make things work as the architecture intended.
  
  I've asked how this IOMMU differs from the architecture on a number of
  occasions, but I'm still yet to receive a response other than "it's
  special".
  
 
 After checking further with DE, our pagetable follows ARM-v7's
 short-descriptor format, which is a little different from ARM-v8's, like
 bit0 (PXN) in section and supersection. I hadn't read the ARM-v7 spec
 before, so I added a MTK quirk to disable the PXN bit in section and
 supersection. (If the PXN bit from the ARM-v7 spec is written, our HW will
 page fault.)

I've been reviewing this using the ARMv7 ARM (Rev.C of DDI0406C) the whole
time. PXN is there as an optional field in non-LPAE implementations. That's
fine and doesn't require any quirks.

 Then should I write this code according to the ARM-v8 spec by default, and
 add an ARM-v7 quirk?

No, I don't think you need this, as the v8 and v7 short-descriptor formats
look compatible to me. You should only need a quirk if architecturally
compliant code cannot work on your hardware.

 And there is a small difference between the ARM-v7 spec and the MTK
 pagetable: the XN bit (bit 0) in small pages. MTK doesn't implement the XN
 bit. Bits [1:0] in MTK's small page descriptor must be 2'b10; if they are
 2'b11, the HW will page fault.

Aha, thanks! *That* is worthy of a quirk. Something like:

  IO_PGTABLE_QUIRK_ARM_NO_XN

 (MTK doesn't implement the AP bits either, but the HW doesn't use them, so
 it is OK even if the AP bits are written.)

Yeah, I think that's fine. The pgtable code will honour the request but
the h/w will ignore it.

 In the end, I will add two quirks like this; is that OK?

I think you only need the one I mentioned above. I don't see the need
for PXN at all (as I said in the last review).
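
A sketch of how the single agreed quirk might be plumbed through; the
quirk name comes from the thread, while the macro names and values
around it are assumptions about the patch under review:

	#define IO_PGTABLE_QUIRK_ARM_NO_XN	BIT(1)	/* hypothetical value */

	static arm_short_iopte __arm_short_pte_prot(int prot,
						    struct io_pgtable_cfg *cfg)
	{
		arm_short_iopte pte = ARM_SHORT_PTE_TYPE_SMALL;

		/* Never set XN on hardware that faults on bits[1:0] == 2'b11 */
		if ((prot & IOMMU_NOEXEC) &&
		    !(cfg->quirks & IO_PGTABLE_QUIRK_ARM_NO_XN))
			pte |= ARM_SHORT_PTE_XN;

		return pte;
	}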

Will


Re: [PATCH v3 5/6] iommu/mediatek: Add mt8173 IOMMU driver

2015-07-28 Thread Yong Wu
On Mon, 2015-07-27 at 16:49 +0100, Robin Murphy wrote:
 On 27/07/15 16:31, Russell King - ARM Linux wrote:
  On Mon, Jul 27, 2015 at 02:23:26PM +0100, Robin Murphy wrote:
  On 16/07/15 10:04, Yong Wu wrote:
  This patch adds support for mediatek m4u (MultiMedia Memory Management
  Unit).
 
  Signed-off-by: Yong Wu yong...@mediatek.com
  [...]
  +static void mtk_iommu_flush_pgtable(void *ptr, size_t size, void *cookie)
  +{
  +   struct mtk_iommu_domain *domain = cookie;
  +   unsigned long offset = (unsigned long)ptr & ~PAGE_MASK;
  +
  +   dma_map_page(domain->data->dev, virt_to_page(ptr), offset,
  +size, DMA_TO_DEVICE);
 
  Nit: this looks like it may as well be dma_map_single.
 
  It would probably be worth following it with a matching unmap too, just to
  avoid any possible leakage bugs (especially if this M4U ever appears in a
  SoC supporting RAM above the 32-bit boundary).
 
  Why not do the job properly?  Take a look at how I implemented the
  streaming DMA API on Tegra SMMU (patch set recently sent out earlier
  today).
 
 There's no need for hacks like dma_map_page() (and discarding its
  return value) or dma_map_page() followed by dma_unmap_page().
 
 Indeed, as it happens I do have a branch where I prototyped that for the 
 long-descriptor io-pgtable-arm code a while ago; this discussion has 
 prompted me to dig it up again. Stay tuned, folks...

Hi Russell, Robin,

 From what I see in arm-smmu-v3.c in v4.2-rc1,
 
 flush_pgtable looks like this:
//==
dma_addr_t dma_addr;
unsigned long offset = (unsigned long)ptr & ~PAGE_MASK;

dma_addr = dma_map_page(dev, virt_to_page(ptr), offset, size, DMA_TO_DEVICE);

if (dma_mapping_error(dev, dma_addr))
dev_err(dev, "failed to flush pgtable at %p\n", ptr);
else
dma_unmap_page(dev, dma_addr, size, DMA_TO_DEVICE);
//==
   I will change the map like this and use dma_map_single instead.

   Does this also seem improper?

   Then how should it be done?  Add this before the unmap?:
   dma_sync_single_for_device(dev, dma_addr, size, DMA_TO_DEVICE);

 
 Robin.
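
For reference, the pattern hinted at above (and implemented by the
io-pgtable-arm patch earlier in this digest) is to map the table once at
allocation time and then only sync on each update; a rough sketch with
illustrative names:

	/* at page table allocation */
	void *table = alloc_pages_exact(size, GFP_KERNEL | __GFP_ZERO);
	dma_addr_t dma = dma_map_single(dev, table, size, DMA_TO_DEVICE);
	/* ... check dma_mapping_error(dev, dma) and unwind on failure ... */

	/* at each update, after writing the entry at byte offset off */
	dma_sync_single_for_device(dev, dma + off, entry_size, DMA_TO_DEVICE);

	/* at page table free */
	dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
	free_pages_exact(table, size);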
 




Re: [PATCH v4 2/4] iommu: Implement common IOMMU ops for DMA mapping

2015-07-28 Thread Catalin Marinas
On Thu, Jul 16, 2015 at 07:40:13PM +0100, Robin Murphy wrote:
 +/**
 + * iommu_dma_alloc - Allocate and map a buffer contiguous in IOVA space
 + * @dev: Device to allocate memory for. Must be a real device
 + *attached to an iommu_dma_domain
 + * @size: Size of buffer in bytes
 + * @gfp: Allocation flags
 + * @prot: IOMMU mapping flags
 + * @handle: Out argument for allocated DMA handle
 + * @flush_page: Arch callback to flush a single page from all caches to
 + *   ensure DMA visibility of __GFP_ZERO. May be NULL if not
 + *   required.
 + *
 + * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
 + * but an IOMMU which supports smaller pages might not map the whole thing.
 + *
 + * Return: Array of struct page pointers describing the buffer,
 + *  or NULL on failure.
 + */
 +struct page **iommu_dma_alloc(struct device *dev, size_t size,
 + gfp_t gfp, int prot, dma_addr_t *handle,
 + void (*flush_page)(const void *, phys_addr_t))
 +{
 + struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
 + struct iova_domain *iovad = domain->dma_api_cookie;
 + struct iova *iova;
 + struct page **pages;
 + struct sg_table sgt;
 + dma_addr_t dma_addr;
 + unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
 +
 + *handle = DMA_ERROR_CODE;
 +
 + pages = __iommu_dma_alloc_pages(count, gfp);
 + if (!pages)
 + return NULL;
 +
 + iova = __alloc_iova(iovad, size, dev->coherent_dma_mask);
 + if (!iova)
 + goto out_free_pages;
 +
 + size = iova_align(iovad, size);
 + if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
 + goto out_free_iova;
 +
 + if (gfp & __GFP_ZERO) {
 + struct sg_mapping_iter miter;
 + /*
 +  * The flushing provided by the SG_MITER_TO_SG flag only
 +  * applies to highmem pages, so it won't do the job here.
 +  */

The comment here is slightly wrong. The SG_MITER_FROM_SG flushing is
done by calling flush_kernel_dcache_page(). This function can be no-op
even on architectures implementing highmem. For example, on arm32 it
only does some flushing if there is a potential D-cache alias with user
space. The flush that you call below is for DMA operations, something
entirely different.

 + sg_miter_start(&miter, sgt.sgl, sgt.orig_nents, SG_MITER_FROM_SG);
 + while (sg_miter_next(&miter)) {
 + memset(miter.addr, 0, PAGE_SIZE);

Isn't __GFP_ZERO already honoured by alloc_pages in
__iommu_dma_alloc_pages?

 + if (flush_page)
 + flush_page(miter.addr, page_to_phys(miter.page));

Could you instead do the check as !(prot & IOMMU_CACHE) so that it saves
architecture code from passing NULL when coherent?

I'm still not convinced we need this flushing here but I'll follow up in
patch 3/4.
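
Sketched, that suggestion would turn the zeroing loop into something like
the following (IOMMU_CACHE is the IOMMU API's coherency flag; flush_page
is then assumed to be always non-NULL):

	while (sg_miter_next(&miter)) {
		memset(miter.addr, 0, PAGE_SIZE);
		/* coherent mappings need no cache maintenance at all */
		if (!(prot & IOMMU_CACHE))
			flush_page(miter.addr, page_to_phys(miter.page));
	}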

-- 
Catalin


Re: [PATCH char-misc-next 10/19] lib: convert iova.c into a library

2015-07-28 Thread Sudeep Dutt
On Tue, 2015-07-28 at 15:38 +0100, David Woodhouse wrote:
 On Tue, 2015-07-28 at 11:41 +0100, Robin Murphy wrote:
  On 28/07/15 11:03, Joerg Roedel wrote:
   On Mon, Jul 27, 2015 at 04:57:32PM -0700, Ashutosh Dixit wrote:
From: Harish Chegondi harish.chego...@intel.com

This patch converts iova.c into a library, moving it from
drivers/iommu/ to lib/, and exports its virtual address 
allocation and
management functions so that other modules can reuse them.

Cc: Joerg Roedel j...@8bytes.org
Reviewed-by: Anil S Keshavamurthy anil.s.keshavamur...@intel.com

Reviewed-by: Sudeep Dutt sudeep.d...@intel.com
Signed-off-by: Harish Chegondi harish.chego...@intel.com
   
   Where is this going to be used outside of the IOMMU world?
   

We are using the IOVA generator in the SCIF driver posted @
http://thread.gmane.org/gmane.linux.kernel/2005895 under
drivers/misc/mic/scif

  
  ...and how does it relate to the patches from Sakari (+CC) doing much 
  the same thing[1]?
 

The patch series from Sakari does the right thing by moving the IOVA
cache management to the IOVA library. We will simply drop this current
patch as it is incorrect.

 I merged Sakari's patches into the intel-iommu git tree today, FWIW.
 
 If there's really a need to move it from drivers/iommu/ to lib/ then we
 could feasibly do that too.
 

The patch series from Sakari should work perfectly for us. We will post
a v2 of the current SCIF patch series without this IOVA patch and modify
the SCIF driver to use the newly added iova_cache_get(..) and
iova_cache_put(..) APIs once it is available in Linus's tree. It would
make it easier for us to integrate if Sakari's patches reach mainline
soon.

It might be cleaner to move IOVA to lib/ in the longer term since we
will have multiple driver subsystems using it, but it should work just
fine for now.

Thanks for the review!

Sudeep Dutt



Re: [PATCH v4 3/4] arm64: Add IOMMU dma_ops

2015-07-28 Thread Catalin Marinas
On Thu, Jul 16, 2015 at 07:40:14PM +0100, Robin Murphy wrote:
 +static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 +  dma_addr_t *handle, gfp_t gfp,
 +  struct dma_attrs *attrs)
 +{
 + bool coherent = is_device_dma_coherent(dev);
 + int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent);
 + void *addr;
 +
 + if (WARN(!dev, "cannot create IOMMU mapping for unknown device\n"))
 + return NULL;
 +
 + /*
 +  * Some drivers rely on this, and we may not want to potentially
 +  * expose stale kernel data to devices anyway.
 +  */
 + gfp |= __GFP_ZERO;
 +
 + if (gfp & __GFP_WAIT) {
 + struct page **pages;
 + pgprot_t pgprot = coherent ? __pgprot(PROT_NORMAL) :
 +  __pgprot(PROT_NORMAL_NC);
 +
 + pgprot = __get_dma_pgprot(attrs, pgprot, coherent);

I think you can simplify this:

pgprot_t pgprot = __get_dma_pgprot(attrs, PAGE_KERNEL, 
coherent);

(and we could do the same in __dma_alloc)

 + pages = iommu_dma_alloc(dev, size, gfp, ioprot, handle,
 + coherent ? NULL : flush_page);

I commented on patch 2/4. If we checked for IOMMU_CACHE in
iommu_dma_alloc(), we could always pass flush_page here without the
NULL.

 + if (!pages)
 + return NULL;
 +
 + addr = dma_common_pages_remap(pages, size, VM_USERMAP, pgprot,
 +   __builtin_return_address(0));
 + if (!addr)
 + iommu_dma_free(dev, pages, size, handle);

For arm64, it would have been easier to simply flush the caches here and
avoid the flush_page argument. However, I reread your earlier reply, so
if we ever honour the ATTR_NO_KERNEL_MAPPING, then we would have to
iterate over the individual pages in the arch code. Let's leave it as
per your patch 2/4 (with a flush_page argument).

-- 
Catalin


Re: [PATCH v4 4/4] arm64: Hook up IOMMU dma_ops

2015-07-28 Thread Catalin Marinas
On Thu, Jul 16, 2015 at 07:40:15PM +0100, Robin Murphy wrote:
 With iommu_dma_ops in place, hook them up to the configuration code, so
 IOMMU-fronted devices will get them automatically.
 
 Signed-off-by: Robin Murphy robin.mur...@arm.com

Acked-by: Catalin Marinas catalin.mari...@arm.com


[RFC PATCH 1/1] iommu: Detach device from domain when removed from group

2015-07-28 Thread Gerald Schaefer
This patch adds a call to __iommu_detach_device() to the
iommu_group_remove_device() function, which triggers the detach_dev
callback that would otherwise go missing in (at least) the following
scenario:

When a user completes the VFIO_SET_IOMMU ioctl for a vfio-pci device,
and the corresponding device is removed thereafter (before any other
ioctl like VFIO_GROUP_GET_DEVICE_FD), then the detach_dev callback of
the underlying IOMMU API is never called.

This also fixes an asymmetry between iommu_group_add_device() and
iommu_group_remove_device(), where the former did an attach_dev but
the latter did no corresponding detach_dev.

Signed-off-by: Gerald Schaefer gerald.schae...@de.ibm.com
---
 drivers/iommu/iommu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f286090..82ac8b3 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -447,6 +447,9 @@ rename:
 }
 EXPORT_SYMBOL_GPL(iommu_group_add_device);
 
+static void __iommu_detach_device(struct iommu_domain *domain,
+				  struct device *dev);
+
 /**
  * iommu_group_remove_device - remove a device from it's current group
  * @dev: device to be removed
@@ -466,6 +469,8 @@ void iommu_group_remove_device(struct device *dev)
 				     IOMMU_GROUP_NOTIFY_DEL_DEVICE, dev);
 
 	mutex_lock(&group->mutex);
+	if (group->domain)
+		__iommu_detach_device(group->domain, dev);
 	list_for_each_entry(tmp_device, &group->devices, list) {
 		if (tmp_device->dev == dev) {
 			device = tmp_device;
-- 
2.3.8



[RFC PATCH 0/1] iommu: Detach device from domain when removed from group

2015-07-28 Thread Gerald Schaefer
Hi,

during IOMMU API function testing on s390 I hit the following scenario:

After binding a device to vfio-pci, the user completes the VFIO_SET_IOMMU
ioctl and stops; see the sample C program below. Now the device is manually
removed via echo 1 > /sys/bus/pci/devices/.../remove.

Although the SET_IOMMU ioctl triggered the attach_dev callback in the
underlying IOMMU API, removing the device in this way won't trigger the
detach_dev callback, neither during the remove itself nor later when the
user program closes the group and container.

On s390, this eventually leads to a kernel panic when binding the device
again to its non-vfio PCI driver, because of the missing arch-specific
cleanup in detach_dev. On x86, the detach_dev callback will also not be
called directly, but there is a notifier that will catch
BUS_NOTIFY_REMOVED_DEVICE and eventually do the cleanup. Other
architectures w/o the notifier probably have at least some kind of memory
leak in this scenario, so a general fix would be nice.

My first approach was to try and fix this in VFIO code, but Alex Williamson
pointed me to some asymmetry in the IOMMU code: iommu_group_add_device()
will invoke the attach_dev callback, but iommu_group_remove_device() won't
trigger detach_dev. Fixing this asymmetry would fix the issue for me, but
is this the correct fix? Any thoughts?

Regards,
Gerald


Here is the sample C program to trigger the ioctl:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

int main(void)
{
	int container, group, rc;

	container = open("/dev/vfio/vfio", O_RDWR);
	if (container < 0) {
		perror("open /dev/vfio/vfio");
		return -1;
	}

	group = open("/dev/vfio/0", O_RDWR);
	if (group < 0) {
		perror("open /dev/vfio/0");
		return -1;
	}

	rc = ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
	if (rc) {
		perror("ioctl VFIO_GROUP_SET_CONTAINER");
		return -1;
	}

	rc = ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);
	if (rc) {
		perror("ioctl VFIO_SET_IOMMU");
		return -1;
	}

	printf("Try device remove...\n");
	getchar();

	close(group);
	close(container);
	return 0;
}

Gerald Schaefer (1):
  iommu: Detach device from domain when removed from group

 drivers/iommu/iommu.c | 5 +
 1 file changed, 5 insertions(+)

-- 
2.3.8



Re: [PATCH 2/2] iommu/omap: Add support for configuring dsp iommus on DRA7xx

2015-07-28 Thread Tony Lindgren
* Suman Anna s-a...@ti.com [150724 09:27]:
 Hi Tony,
 
 On 07/23/2015 11:30 PM, Tony Lindgren wrote:
  * Suman Anna s-a...@ti.com [150723 09:25]:
  Hi Tony,
 
  On 07/23/2015 02:24 AM, Tony Lindgren wrote:
  * Suman Anna s-a...@ti.com [150722 09:25]:
  On 07/22/2015 12:26 AM, Tony Lindgren wrote:
 
  I don't like using syscon for tinkering directly with SoC registers.
 
  This is not a SoC-level register, but a register within a sub-module of
  the DSP processor sub-system. The DSP_SYSTEM sub-module in general is
  described in Section 5.3.3 of the TRM [1], and it implements different
  functionalities like the PRCM handshaking, wakeup logic and DSP
  subsystem top-level configuration. It is a module present within the DSP
  processor sub-system, so can only be accessed when the sub-system is
  clocked and the appropriate reset is released.
 
  OK so if it's specific to the DSP driver along the lines of sysc and
  syss registers.
 
  There will be those registers too within the MMU config register space,
  even for DRA7xx MMUs. This is different, think of it like a register in
  the Control module except that it is present within the DSP sub-system
  instead of at the SoC level.
  
  And what is taking care of pm_runtime_get here to ensure the module
  is powered and clocked?
 
 pm_runtime_get_sync is indeed getting invoked, it's just that the current
 patch doesn't include the code block that invokes it. The function that
 invokes omap2_iommu_enable does so after calling pm_runtime_get_sync().

OK 

  I think you are missing a layer here and it's the Linux kernel side
  device driver for DSP that initializes things.
 
 We already have separate drivers for MMUs (omap-iommu) and the processor
 management (omap-rproc). The former manages all the low-level
 programming sequences for the MMUs, while the latter manages the
 low-level reset etc, and is a client user of the respective IOMMU
 device. Both integrate into the respective frameworks. The IOMMU API
 invocations are handled in the remoteproc core, with the OMAP remoteproc
 driver publishing itself as having an MMU with the remoteproc core. The
 IOMMU API invoke the appropriate iommu_ops.
 
 You can lookup the functions rproc_enable_iommu()/rproc_disable_iommu()
 in the remoteproc core. The IOMMU enabling sequences happen within the
 iommu_attach_device() API. The call flow is
 iommu_attach_device() -> omap_iommu_attach_dev() -> omap_iommu_attach() ->
 iommu_enable() -> omap_device_deassert_hardreset, pm_runtime_get_sync,
 omap2_iommu_enable.

OK. The thing to check here is that you have a separate device driver
for each device with sysc/syss registers. This is because each hardware
module can be independently clocked and idled. Otherwise things will
break at some point. And no, "things are configured for autoidle" or
"we're not doing PM" is not a good excuse here :)

Can you please check that? If the remoteproc driver and iommu driver
are tinkering with registers in separate hardware modules, we have
a layering violation. My guess is that we have at least two hardware
modules involved here, one for the iommu and one for the DSP device.
 
  Typically we handle these registers by mapping them to the PM runtime
  functions for the interconnect so we can reset and idle the hardware
  modules even if there is no driver, see for example
  omap54xx_mmu_dsp_hwmod.
 
  I haven't yet submitted the DRA7xx hwmods, but they will look almost
  exactly like the OMAP5 ones. The reset and idle on these are in general
  not effective at boot time since these are also controlled by a
  hard-reset line, so that's left to the drivers to deal with it through
  the omap_device_deassert_hardreset API.
  
  If the MMU configuration is one time init, it may make sense to add
  it to the hwmod reset function. However, if the Linux kernel side
  driver needs to configure things depending on the DSP firmware, it
  should be done in the kernel side device driver.
 
 The MMU configuration comes into picture whenever the remoteproc driver
 is being booted and shut down, and also during suspend (no support for
 this yet in mainline on OMAP rproc). Today, the hwmod
 _enable/_idle/reset functions alone are not enough to power on the OMAP
 remoteproc/iommus. We need sequencing calls to both
 omap_device_assert/_deassert_hardreset (done through pdata-quirks) and
 pm_runtime API to achieve this.

Right and that's why I'm worried that we have multiple hardware modules
involved :)  
 
  We should use some Linux generic framework for configuring these
  bits to avoid nasty dependencies between various hardware modules
  on the SoC.
 
  What does DSP_SYS_MMU_CONFIG register do? It seems it's probably
  a regulator or a gate clock? If so, it should be set up as a
  regulator or a clock and then the omap-iommu driver can just
 use the regulator or clock framework to request the resource.
 
 No, it's neither. It is a control bit that dictates whether the
 processor/EDMA addresses go through the respective MMUs or bypass them.

Re: [PATCH char-misc-next 10/19] lib: convert iova.c into a library

2015-07-28 Thread Andrew Morton
On Mon, 27 Jul 2015 16:57:32 -0700 Ashutosh Dixit ashutosh.di...@intel.com 
wrote:

 From: Harish Chegondi harish.chego...@intel.com
 
 This patch converts iova.c into a library, moving it from
 drivers/iommu/ to lib/, and exports its virtual address allocation and
 management functions so that other modules can reuse them.

From the following emails it is unclear to me whether we'll actually be
going ahead with this, but whatever.  It's a chance to do some code
reading.

  {drivers/iommu => lib}/iova.c | 9 +

Except the code isn't here.  Stupid git.  Here 'tis:

 /*
  * Copyright __ 2006-2009, Intel Corporation.

Non-ascii thing which I can't get to display correctly in anything. 
Give it up, it'll never work, use (c).

  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
  * version 2, as published by the Free Software Foundation.
  *
  * This program is distributed in the hope it will be useful, but WITHOUT
  * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
  * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
  * more details.
  *
  * You should have received a copy of the GNU General Public License along with
  * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
  * Place - Suite 330, Boston, MA 02111-1307 USA.
  *
  * Author: Anil S Keshavamurthy anil.s.keshavamur...@intel.com
  */
 
 #include <linux/iova.h>
 #include <linux/slab.h>

Wow.  This file only compiles by sheer dumb luck.  kernel.h? 
spinlock.h?  bitops.h?  Some of these things come in thanks to iova.h,
others don't.

 static struct kmem_cache *iommu_iova_cache;
 
 int iommu_iova_cache_init(void)

Should actually be called iommu_iova_cache_create().  Because it's a
wrapper around kmem_cache_create(), and create is what it does.

 {
   int ret = 0;
 
 	iommu_iova_cache = kmem_cache_create("iommu_iova",
 					     sizeof(struct iova),
 					     0,
 					     SLAB_HWCACHE_ALIGN,
 					     NULL);

Could use KMEM_CACHE().
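
ie something like this (untested; note KMEM_CACHE() derives the cache
name from the struct, so it would become "iova" rather than
"iommu_iova"):

	/* name, size and alignment all come from struct iova */
	iommu_iova_cache = KMEM_CACHE(iova, SLAB_HWCACHE_ALIGN);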

   if (!iommu_iova_cache) {
 		pr_err("Couldn't create iova cache\n");
   ret = -ENOMEM;
   }
 
   return ret;
 }
 
 void iommu_iova_cache_destroy(void)

The naming throughout this driver is a mess.  Sometimes it's
iommu_iova_<operation>.  Other times it's <operation>_iova.  Other
times it's <operation>_iova_<operation>.  There's no thought and no
consistency.

The usual standard will work well here.  Stick with iova_<operation>
and use it consistently.

So

iova_cache_create
iova_cache_destroy
iova_mem_free
iova_init_domain
iova_alloc
iova_find
iova_etcetera

 {
   kmem_cache_destroy(iommu_iova_cache);
 }
 
 struct iova *alloc_iova_mem(void)
 {
   return kmem_cache_alloc(iommu_iova_cache, GFP_ATOMIC);
 }

GFP_ATOMIC is unreliable and undesirably depletes page allocator
reserves.  If it is really really the case that no caller can ever
allocate an iova in any other context then OK, but that's an
extraordinary thing and I suggest it should be well documented in code
comments.

However I suspect the thing to do here is to convert this to take a
gfp_t argument to permit callers to use the stronger memory allocation
strategies.
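
ie a sketch of that conversion (callers then pick the allocation
context themselves):

	struct iova *alloc_iova_mem(gfp_t gfp)
	{
		return kmem_cache_alloc(iommu_iova_cache, gfp);
	}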


 void free_iova_mem(struct iova *iova)
 {
   kmem_cache_free(iommu_iova_cache, iova);
 }
 
 void
 init_iova_domain(struct iova_domain *iovad, unsigned long granule,
   unsigned long start_pfn, unsigned long pfn_32bit)
 {
   /*
* IOVA granularity will normally be equal to the smallest
* supported IOMMU page size; both *must* be capable of
* representing individual CPU pages exactly.
*/
 	BUG_ON((granule < PAGE_SIZE) || !is_power_of_2(granule));
 
 	spin_lock_init(&iovad->iova_rbtree_lock);
 	iovad->rbroot = RB_ROOT;
 	iovad->cached32_node = NULL;
 	iovad->granule = granule;
 	iovad->start_pfn = start_pfn;
 	iovad->dma_32bit_pfn = pfn_32bit;
 }
 
 static struct rb_node *
 __get_cached_rbnode(struct iova_domain *iovad, unsigned long *limit_pfn)
 {
 	if ((*limit_pfn != iovad->dma_32bit_pfn) ||
 	    (iovad->cached32_node == NULL))
 		return rb_last(&iovad->rbroot);
 	else {
 		struct rb_node *prev_node = rb_prev(iovad->cached32_node);
 		struct iova *curr_iova =
 			container_of(iovad->cached32_node, struct iova, node);
 		*limit_pfn = curr_iova->pfn_lo - 1;
 		return prev_node;
 	}
   }
 }
 
 static void
 __cached_rbnode_insert_update(struct iova_domain *iovad,
   unsigned long limit_pfn, struct iova *new)
 {
 	if (limit_pfn != iovad->dma_32bit_pfn)
 		return;
 	iovad->cached32_node = &new->node;
 }
 
 static void
 __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
 {