Re: [PATCH V3 6/8] arm: dma-mapping: Reset the device's dma_ops

2017-05-25 Thread Sricharan R
Hi Russell,

On 5/25/2017 8:35 PM, Russell King - ARM Linux wrote:
> On Wed, May 24, 2017 at 02:26:13PM +0300, Laurent Pinchart wrote:
>> Again, the patch I propose is the simplest v4.12-rc fix I can think of,
>> short of reverting your complete IOMMU probe deferral patch series. Let's
>> focus on the v4.12-rc fix, and then discuss how to move forward in v4.13
>> and beyond.
> 
> Except, I don't think it fixes the problem that Sricharan is trying to
> fix, namely that of DMA ops that have been setup, torn down, and are
> trying to be re-setup again.  The issue here is that results in a stale
> setup, because the dma_ops are left in-place from the first iommu setup,
> but the iommu mapping has been disposed of.
> 
> Your patch only avoids the problem of tearing down someone else's DMA
> ops.  We need a combination of both patches together.
> 
> What needs to happen for Sricharan's problem to be resolved is:
> 
> 1. move all of __arm_iommu_detach_device() into arm_iommu_detach_device().
> 2. replace the __arm_iommu_detach_device() call in
>    arm_teardown_iommu_dma_ops() with arm_iommu_detach_device().
> 
> as I don't see any need to have a different order in
> arm_teardown_iommu_dma_ops().
> 

Right, both patches are required, and I was also thinking the same thing about
using arm_iommu_detach_device() from arm_teardown_iommu_dma_ops() instead. Will
repost with this.
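
To make the stale-ops problem concrete, here is a minimal userspace model of
the agreed direction. It is only a sketch, not the arch/arm code:
struct device_model, model_attach(), model_detach() and model_teardown() are
hypothetical stand-ins for the device, arm_iommu_attach_device(),
arm_iommu_detach_device() and arm_teardown_iommu_dma_ops(). The early return
in model_detach() loosely mirrors the "don't tear down someone else's DMA ops"
half of the fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct dma_map_ops and dma_iommu_mapping. */
struct dma_ops_model { const char *name; };
struct mapping_model { int refs; };

struct device_model {
	const struct dma_ops_model *dma_ops; /* NULL: fall back to default ops */
	struct mapping_model *mapping;       /* NULL: not attached to an IOMMU */
};

static const struct dma_ops_model iommu_ops_model = { "iommu" };

/* Attach: take a mapping reference and install the IOMMU-backed ops. */
static void model_attach(struct device_model *dev, struct mapping_model *m)
{
	assert(dev != NULL && m != NULL);
	m->refs++;
	dev->mapping = m;
	dev->dma_ops = &iommu_ops_model;
}

/* Detach: drop the mapping AND reset the ops, in one place. */
static void model_detach(struct device_model *dev)
{
	if (!dev->mapping)
		return; /* not our setup, so leave the ops alone */
	dev->mapping->refs--;
	dev->mapping = NULL;
	dev->dma_ops = NULL;
}

/* Teardown simply reuses detach, so both paths leave the same state. */
static void model_teardown(struct device_model *dev)
{
	model_detach(dev);
}

/* The bug being discussed: IOMMU ops still installed, mapping already gone. */
static int model_is_stale(const struct device_model *dev)
{
	return dev->dma_ops == &iommu_ops_model && dev->mapping == NULL;
}
```

With the reset living in the detach path, a setup, teardown, setup sequence
starts the second setup from clean state, so model_is_stale() stays false
after teardown.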

Regards,
 Sricharan

-- 
"QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption

2017-05-25 Thread Xunlei Pang
On 05/26/2017 at 10:49 AM, Dave Young wrote:
> Ccing Xunlei; he is reading the patches to see what needs to be done for
> kdump. There should still be several places to handle to make kdump work.
>
> On 05/18/17 at 07:01pm, Borislav Petkov wrote:
>> On Tue, Apr 18, 2017 at 04:22:12PM -0500, Tom Lendacky wrote:
>>> Add sysfs support for SME so that user-space utilities (kdump, etc.) can
>>> determine if SME is active.
>> But why do user-space tools need to know that?
>>
>> I mean, when we load the kdump kernel, we do it with the first kernel,
>> with the kexec_load() syscall, AFAICT. And that code does a lot of
>> things during that init, like machine_kexec_prepare()->init_pgtable() to
>> prepare the ident mapping of the second kernel, for example.
>>
>> What I'm aiming at is that the first kernel knows *exactly* whether SME
>> is enabled or not and doesn't need to tell the second one through some
>> sysfs entries - it can do that during loading.
>>
>> So I don't think we need any userspace things at all...
> If the kdump kernel can get the SME status from a hardware register, then
> this should not be necessary and this patch can be dropped.

Yes, I also agree with dropping this one.
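
For reference, a rough sketch of what "get the SME status from a hardware
register" could look like. It only decodes values that the caller has already
read; the function and macro names are made up for illustration. The bit
positions are my reading of the AMD documentation (CPUID leaf 0x8000001F:
EAX bit 0 for SME support, EBX[5:0] for the encryption bit position; SYSCFG
MSR bit 23 for the enable bit) and should be checked against the manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed register layout, see the note above; names are illustrative. */
#define SME_CPUID_LEAF      0x8000001fu  /* leaf the caller is expected to query */
#define SME_CPUID_EAX_SME   (1u << 0)    /* EAX: SME supported */
#define SME_CPUID_EBX_CBIT  0x3fu        /* EBX[5:0]: encryption bit position */
#define SYSCFG_MEM_ENCRYPT  (1ull << 23) /* SYSCFG MSR: memory encryption enabled */

struct sme_status {
	bool active;      /* SME supported by the CPU and enabled in SYSCFG */
	uint64_t me_mask; /* page-table encryption mask, 0 when inactive */
};

/*
 * Decode SME state from raw register values. eax/ebx are the CPUID outputs
 * for SME_CPUID_LEAF, syscfg is the SYSCFG MSR value.
 */
static struct sme_status sme_decode(uint32_t eax, uint32_t ebx, uint64_t syscfg)
{
	struct sme_status st = { false, 0 };
	unsigned int cbit = ebx & SME_CPUID_EBX_CBIT;

	assert(cbit < 64); /* always true for a 6-bit field; guards the shift */

	if (!(eax & SME_CPUID_EAX_SME))
		return st; /* CPU does not implement SME */
	if (!(syscfg & SYSCFG_MEM_ENCRYPT))
		return st; /* implemented but not enabled */

	st.active = true;
	st.me_mask = 1ull << cbit;
	return st;
}
```

Both kernels can run the same decode on their own, which is why no sysfs
hand-off is needed for this particular piece of information.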

Regards,
Xunlei


Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME

2017-05-25 Thread Xunlei Pang
On 04/19/2017 at 05:21 AM, Tom Lendacky wrote:
> Provide support so that kexec can be used to boot a kernel when SME is
> enabled.
>
> Support is needed to allocate pages for kexec without encryption.  This
> is needed in order to be able to reboot in the kernel in the same manner
> as originally booted.

Hi Tom,

Looks like kdump will break; I didn't see similar handling for the kdump
cases. See kernel: kimage_alloc_crash_control_pages(),
kimage_load_crash_segment(), etc.

We need to support kdump with SME. The kdump
kernel/initramfs/purgatory/elfcorehdr/etc. are all loaded into the reserved
memory (see crashkernel=X) by the userspace kexec-tools. I think a
straightforward way would be to mark the whole reserved memory range as
not encrypted before loading all the kexec segments for kdump; I guess we can
handle this easily in arch_kexec_unprotect_crashkres().

Moreover, now that "elfcorehdr=X" is left as decrypted, it needs to be
remapped to the encrypted data.

Regards,
Xunlei

>
> Additionally, when shutting down all of the CPUs we need to be sure to
> flush the caches and then halt. This is needed when booting from a state
> where SME was not active into a state where SME is active (or vice-versa).
> Without these steps, it is possible for cache lines to exist for the same
> physical location but tagged both with and without the encryption bit. This
> can cause random memory corruption when caches are flushed depending on
> which cacheline is written last.
>
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/init.h  |1 +
>  arch/x86/include/asm/irqflags.h  |5 +
>  arch/x86/include/asm/kexec.h |8 
>  arch/x86/include/asm/pgtable_types.h |1 +
>  arch/x86/kernel/machine_kexec_64.c   |   35 
> +-
>  arch/x86/kernel/process.c|   26 +++--
>  arch/x86/mm/ident_map.c  |   11 +++
>  include/linux/kexec.h|   14 ++
>  kernel/kexec_core.c  |7 +++
>  9 files changed, 101 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
> index 737da62..b2ec511 100644
> --- a/arch/x86/include/asm/init.h
> +++ b/arch/x86/include/asm/init.h
> @@ -6,6 +6,7 @@ struct x86_mapping_info {
>   void *context;   /* context for alloc_pgt_page */
>   unsigned long pmd_flag;  /* page flag for PMD entry */
>   unsigned long offset;/* ident mapping offset */
> + unsigned long kernpg_flag;   /* kernel pagetable flag override */
>  };
>  
>  int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
> diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
> index ac7692d..38b5920 100644
> --- a/arch/x86/include/asm/irqflags.h
> +++ b/arch/x86/include/asm/irqflags.h
> @@ -58,6 +58,11 @@ static inline __cpuidle void native_halt(void)
>   asm volatile("hlt": : :"memory");
>  }
>  
> +static inline __cpuidle void native_wbinvd_halt(void)
> +{
> + asm volatile("wbinvd; hlt" : : : "memory");
> +}
> +
>  #endif
>  
>  #ifdef CONFIG_PARAVIRT
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 70ef205..e8183ac 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -207,6 +207,14 @@ struct kexec_entry64_regs {
>   uint64_t r15;
>   uint64_t rip;
>  };
> +
> +extern int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages,
> +gfp_t gfp);
> +#define arch_kexec_post_alloc_pages arch_kexec_post_alloc_pages
> +
> +extern void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages);
> +#define arch_kexec_pre_free_pages arch_kexec_pre_free_pages
> +
>  #endif
>  
>  typedef void crash_vmclear_fn(void);
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index ce8cb1c..0f326f4 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -213,6 +213,7 @@ enum page_cache_mode {
>  #define PAGE_KERNEL  __pgprot(__PAGE_KERNEL | _PAGE_ENC)
>  #define PAGE_KERNEL_RO   __pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
>  #define PAGE_KERNEL_EXEC __pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
> +#define PAGE_KERNEL_EXEC_NOENC   __pgprot(__PAGE_KERNEL_EXEC)
>  #define PAGE_KERNEL_RX   __pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
>  #define PAGE_KERNEL_NOCACHE  __pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
>  #define PAGE_KERNEL_LARGE__pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 085c3b3..11c0ca9 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -86,7 +86,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
>  

Re: [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption

2017-05-25 Thread Dave Young
Ccing Xunlei; he is reading the patches to see what needs to be done for
kdump. There should still be several places to handle to make kdump work.

On 05/18/17 at 07:01pm, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:22:12PM -0500, Tom Lendacky wrote:
> > Add sysfs support for SME so that user-space utilities (kdump, etc.) can
> > determine if SME is active.
> 
> But why do user-space tools need to know that?
> 
> I mean, when we load the kdump kernel, we do it with the first kernel,
> with the kexec_load() syscall, AFAICT. And that code does a lot of
> things during that init, like machine_kexec_prepare()->init_pgtable() to
> prepare the ident mapping of the second kernel, for example.
> 
> What I'm aiming at is that the first kernel knows *exactly* whether SME
> is enabled or not and doesn't need to tell the second one through some
> sysfs entries - it can do that during loading.
> 
> So I don't think we need any userspace things at all...

If the kdump kernel can get the SME status from a hardware register, then
this should not be necessary and this patch can be dropped.

Thanks
Dave


Re: [PATCH v5 29/32] x86/mm: Add support to encrypt the kernel in-place

2017-05-25 Thread Tom Lendacky

On 5/18/2017 7:46 AM, Borislav Petkov wrote:

On Tue, Apr 18, 2017 at 04:21:49PM -0500, Tom Lendacky wrote:

Add the support to encrypt the kernel in-place. This is done by creating
new page mappings for the kernel - a decrypted write-protected mapping
and an encrypted mapping. The kernel is encrypted by copying it through
a temporary buffer.

Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/mem_encrypt.h |6 +
 arch/x86/mm/Makefile   |2
 arch/x86/mm/mem_encrypt.c  |  262 
 arch/x86/mm/mem_encrypt_boot.S |  151 +
 4 files changed, 421 insertions(+)
 create mode 100644 arch/x86/mm/mem_encrypt_boot.S

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index b406df2..8f6f9b4 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -31,6 +31,12 @@ static inline u64 sme_dma_mask(void)
return ((u64)sme_me_mask << 1) - 1;
 }

+void sme_encrypt_execute(unsigned long encrypted_kernel_vaddr,
+unsigned long decrypted_kernel_vaddr,
+unsigned long kernel_len,
+unsigned long encryption_wa,
+unsigned long encryption_pgd);
+
 void __init sme_early_encrypt(resource_size_t paddr,
  unsigned long size);
 void __init sme_early_decrypt(resource_size_t paddr,
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 9e13841..0633142 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -38,3 +38,5 @@ obj-$(CONFIG_NUMA_EMU)+= numa_emulation.o
 obj-$(CONFIG_X86_INTEL_MPX)+= mpx.o
 obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
 obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
+
+obj-$(CONFIG_AMD_MEM_ENCRYPT)  += mem_encrypt_boot.o
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 30b07a3..0ff41a4 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 

 /*
  * Since SME related variables are set early in the boot process they must
@@ -216,8 +217,269 @@ void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
set_memory_decrypted((unsigned long)vaddr, size >> PAGE_SHIFT);
 }

+void __init sme_clear_pgd(pgd_t *pgd_base, unsigned long start,


> static

Yup.




+ unsigned long end)
+{
+   unsigned long addr = start;
+   pgdval_t *pgd_p;
+
+   while (addr < end) {
+   unsigned long pgd_end;
+
+   pgd_end = (addr & PGDIR_MASK) + PGDIR_SIZE;
+   if (pgd_end > end)
+   pgd_end = end;
+
+   pgd_p = (pgdval_t *)pgd_base + pgd_index(addr);
+   *pgd_p = 0;


> Hmm, so this is a contiguous range from [start:end] which translates to
> 8-byte PGD pointers in the PGD page so you can simply memset that range,
> no?
>
> Instead of iterating over each one?


I guess I could do that, but this will probably only end up clearing a
single PGD entry anyway since it's highly doubtful the address range
would cross a 512GB boundary.
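
To illustrate the two forms being compared, here is a small userspace sketch.
It assumes 4-level paging (PGDIR_SHIFT of 39, 512 entries), treats end as
exclusive like the loop in the patch, and uses a plain uint64_t array in place
of the real PGD page; clear_pgd_loop() and clear_pgd_memset() are illustrative
names, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PGDIR_SHIFT   39                 /* 4-level paging: 512 GiB per entry */
#define PTRS_PER_PGD  512
#define PGDIR_SIZE    (1ULL << PGDIR_SHIFT)
#define PGDIR_MASK    (~(PGDIR_SIZE - 1))

typedef uint64_t pgdval_t;

static unsigned int pgd_index(uint64_t addr)
{
	return (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
}

/* Loop form, following the structure of the quoted sme_clear_pgd(). */
static void clear_pgd_loop(pgdval_t *pgd_base, uint64_t start, uint64_t end)
{
	uint64_t addr = start;

	while (addr < end) {
		uint64_t pgd_end = (addr & PGDIR_MASK) + PGDIR_SIZE;

		if (pgd_end > end)
			pgd_end = end;

		pgd_base[pgd_index(addr)] = 0;
		addr = pgd_end;
	}
}

/* memset form: the entries covering [start, end) are contiguous. */
static void clear_pgd_memset(pgdval_t *pgd_base, uint64_t start, uint64_t end)
{
	unsigned int first, last;

	assert(start < end);
	first = pgd_index(start);
	last = pgd_index(end - 1);
	assert(first <= last); /* sketch assumes the range does not wrap */

	memset(&pgd_base[first], 0, (last - first + 1) * sizeof(pgdval_t));
}
```

For a range that stays below a 512GB boundary both forms clear exactly one
entry, so the choice is mostly about readability rather than cost.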




+
+   addr = pgd_end;
+   }
+}
+
+#define PGD_FLAGS  _KERNPG_TABLE_NOENC
+#define PUD_FLAGS  _KERNPG_TABLE_NOENC
+#define PMD_FLAGS  (__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL)
+
+static void __init *sme_populate_pgd(pgd_t *pgd_base, void *pgtable_area,
+unsigned long vaddr, pmdval_t pmd_val)
+{
+   pgdval_t pgd, *pgd_p;
+   pudval_t pud, *pud_p;
+   pmdval_t pmd, *pmd_p;


> You should use the enclosing type, not the underlying one. I.e.,
>
> pgd_t *pgd;
> pud_t *pud;
> ...
>
> and then the macros native_p*d_val(), p*d_offset() and so on. I say
> native_* because we don't want to have any paravirt nastyness here.
> I believe your previous version was using the proper interfaces.


I won't be able to use the p*d_offset() macros since they use __va()
and we're identity mapped during this time (which is why I would guess
the proposed changes for the 5-level pagetables in
arch/x86/kernel/head64.c, __startup_64, don't use these macros
either). I should be able to use the native_set_p*d() and others though,
I'll look into that.



> And the kernel has gotten 5-level pagetables support in
> the meantime, so this'll need to start at p4d AFAICT.
> arch/x86/mm/fault.c::dump_pagetable() looks like a good example to stare
> at.


Yeah, I accounted for that in the other parts of the code but I need
to do that here also.




+   pgd_p = (pgdval_t *)pgd_base + pgd_index(vaddr);
+   pgd = *pgd_p;
+   if (pgd) {
+   pud_p = (pudval_t *)(pgd & ~PTE_FLAGS_MASK);
+   } else {
+   pud_p = pgtable_area;
+   memset(pud_p, 0, sizeof(*pud_p) * PTRS_PER_PUD);
+   pgtable_area += sizeof(*pud_p) * PTRS_PER_PUD;
+
+   *pgd_p = 

Re: [RFC PATCH 04/30] iommu/arm-smmu-v3: Add support for PCI ATS

2017-05-25 Thread Roy Franz (Cavium)
On Tue, May 23, 2017 at 4:21 AM, Jean-Philippe Brucker wrote:
> On 23/05/17 09:41, Leizhen (ThunderTown) wrote:
>> On 2017/2/28 3:54, Jean-Philippe Brucker wrote:
>>> PCIe devices can implement their own TLB, named Address Translation Cache
>>> (ATC). Steps involved in the use and maintenance of such caches are:
>>>
>>> * Device sends an Address Translation Request for a given IOVA to the
>>>   IOMMU. If the translation succeeds, the IOMMU returns the corresponding
>>>   physical address, which is stored in the device's ATC.
>>>
>>> * Device can then use the physical address directly in a transaction.
>>>   A PCIe device does so by setting the TLP AT field to 0b10 - translated.
>>>   The SMMU might check that the device is allowed to send translated
>>>   transactions, and let it pass through.
>>>
>>> * When an address is unmapped, CPU sends a CMD_ATC_INV command to the
>>>   SMMU, that is relayed to the device.
>>>
>>> In theory, this doesn't require a lot of software intervention. The IOMMU
>>> driver needs to enable ATS when adding a PCI device, and send an
>>> invalidation request when unmapping. Note that this invalidation is
>>> allowed to take up to a minute, according to the PCIe spec. In
>>> addition, the invalidation queue on the ATC side is fairly small, 32 by
>>> default, so we cannot keep many invalidations in flight (see ATS spec
>>> section 3.5, Invalidate Flow Control).
>>>
>>> Handling these constraints properly would require to postpone
>>> invalidations, and keep the stale mappings until we're certain that all
>>> devices forgot about them. This requires major work in the page table
>>> managers, and is therefore not done by this patch.
>>>
>>>   Range calculation
>>>   -----------------
>>>
>>> The invalidation packet itself is a bit awkward: range must be naturally
>>> aligned, which means that the start address is a multiple of the range
>>> size. In addition, the size must be a power of two number of 4k pages. We
>>> have a few options to enforce this constraint:
>>>
>>> (1) Find the smallest naturally aligned region that covers the requested
>>> range. This is simple to compute and only takes one ATC_INV, but it
>>> will spill on lots of neighbouring ATC entries.
>>>
>>> (2) Align the start address to the region size (rounded up to a power of
>>> two), and send a second invalidation for the next range of the same
>>> size. Still not great, but reduces spilling.
>>>
>>> (3) Cover the range exactly with the smallest number of naturally aligned
>>> regions. This would be interesting to implement but as for (2),
>>> requires multiple ATC_INV.
>>>
>>> As I suspect ATC invalidation packets will be a very scarce resource,
>>> we'll go with option (1) for now, and only send one big invalidation.
>>>
>>> Note that with io-pgtable, the unmap function is called for each page, so
>>> this doesn't matter. The problem shows up when sharing page tables with
>>> the MMU.
>> Supposing this is true, I'd like to choose option (2), because the worst
>> cases of both (1) and (2) will not happen, but the code of (2) will look
>> clearer. And (2) is technically more acceptable.
>
> I agree that (2) is a bit clearer, but the question is of performance
> rather than readability. I'd like to see some benchmarks or experiment on
> my own before switching to a two-invalidation system.
>
> Intuitively one big invalidation will result in more ATC thrashing and will
> bring overall device performance down. But then according to the PCI spec,
> ATC invalidations are grossly expensive, they have an upper bound of a
> minute. I agree that this is highly improbable and might depend on the
> range size, but purely from an architectural standpoint, reducing the
> number of ATC invalidation requests is the priority, because this is much
> worse than any performance slow-down incurred by ATC thrashing. And for the
> moment I can only base my decisions on the architecture.
>
> So I'd like to keep (1) for now, and update it to (2) (or even (3)) once
> we have more hardware to experiment with.
>
> Thanks,
> Jean
>

I think (1) is a good place to start, as the same restricted encoding that is
used in the invalidations is also used in the translation responses - all of
the ATC entries were created with regions described this way.  We still may
end up with nothing but STU sized ATC entries, as TAs are free to respond to
large translation requests with multiple STU sized translations, and in some
cases this is the best that they can do.  Picking the optimal strategy will
depend on hardware, and maybe workload as well.
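
As an illustration of option (1), here is a small userspace sketch of the
range calculation. It is not taken from the patch: struct atc_inv_range,
fls64_sketch() and atc_cover_range() are names invented for this example, and
the 4k granule comes from the description quoted above. The idea is that the
highest bit in which the first and last page numbers differ gives the size of
the smallest naturally aligned power-of-two block containing both.

```c
#include <assert.h>
#include <stdint.h>

#define ATC_PAGE_SHIFT 12 /* invalidation granule: 4k pages */

struct atc_inv_range {
	uint64_t addr;           /* naturally aligned start address */
	unsigned int log2_pages; /* range covers 2^log2_pages pages */
};

/* Position of the highest set bit, counting from 1; 0 when x == 0. */
static unsigned int fls64_sketch(uint64_t x)
{
	unsigned int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Smallest naturally aligned power-of-two region covering [iova, iova + size). */
static struct atc_inv_range atc_cover_range(uint64_t iova, uint64_t size)
{
	struct atc_inv_range r;
	uint64_t first, last, span_mask;
	unsigned int log2_span;

	assert(size > 0);

	first = iova >> ATC_PAGE_SHIFT;
	last = (iova + size - 1) >> ATC_PAGE_SHIFT;

	/* Bits above log2_span are identical in first and last. */
	log2_span = fls64_sketch(first ^ last);
	span_mask = (1ULL << log2_span) - 1;

	r.addr = (first & ~span_mask) << ATC_PAGE_SHIFT;
	r.log2_pages = log2_span;
	return r;
}
```

The spill shows up quickly: a two-page unmap at 0x7000 straddles the 0x8000
boundary, so the covering region is 16 pages starting at address 0, while an
aligned four-page unmap at 0x4000 is covered exactly.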

Thanks,
Roy



[PATCH 3/4] iommu: add qcom_iommu

2017-05-25 Thread Rob Clark
An iommu driver for Qualcomm "B" family devices which do not completely
implement the ARM SMMU spec.  These devices have context-bank register
layout that is similar to ARM SMMU, but no global register space (or at
least not one that is accessible).

Signed-off-by: Rob Clark 
---
v1: original
v2: bindings cleanups and kconfig issues that kbuild robot pointed out
v3: fix issues pointed out by Rob H. and actually make device removal
work
v4: fix WARN_ON() splats reported by Archit
v5: some fixes to build as a module.. note that it cannot actually
be built as a module yet (at minimum a bunch of other iommu syms
that are needed are not exported, but there may be more to it
than that), but at least qcom_iommu is ready should it become
possible to build iommu drivers as modules.
v6: Add additional pm-runtime get/puts around paths that can hit
TLB inv, to avoid unclocked register access if device using the
iommu is not powered on.  And pre-emptively clear interrupts
before registering IRQ handler just in case the bootloader has
left us a surprise.

 drivers/iommu/Kconfig  |  10 +
 drivers/iommu/Makefile |   1 +
 drivers/iommu/qcom_iommu.c | 878 +
 3 files changed, 889 insertions(+)
 create mode 100644 drivers/iommu/qcom_iommu.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 6ee3a25..aa4b628 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -367,4 +367,14 @@ config MTK_IOMMU_V1
 
  if unsure, say N here.
 
+config QCOM_IOMMU
+   # Note: iommu drivers cannot (yet?) be built as modules
+   bool "Qualcomm IOMMU Support"
+   depends on ARCH_QCOM || COMPILE_TEST
+   select IOMMU_API
+   select IOMMU_IO_PGTABLE_LPAE
+   select ARM_DMA_USE_IOMMU
+   help
+ Support for IOMMU on certain Qualcomm SoCs.
+
 endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 195f7b9..b910aea 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -27,3 +27,4 @@ obj-$(CONFIG_TEGRA_IOMMU_SMMU) += tegra-smmu.o
 obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
 obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
 obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
+obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
new file mode 100644
index 000..bfaf97c
--- /dev/null
+++ b/drivers/iommu/qcom_iommu.c
@@ -0,0 +1,878 @@
+/*
+ * IOMMU API for QCOM secure IOMMUs.  Somewhat based on arm-smmu.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ *
+ * Copyright (C) 2013 ARM Limited
+ * Copyright (C) 2017 Red Hat
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "io-pgtable.h"
+#include "arm-smmu-regs.h"
+
+#define SMMU_INTR_SEL_NS 0x2000
+
+struct qcom_iommu_dev {
+   /* IOMMU core code handle */
+   struct iommu_device  iommu;
+   struct device   *dev;
+   struct clk  *iface_clk;
+   struct clk  *bus_clk;
+   void __iomem*local_base;
+   u32  sec_id;
+   struct list_head context_list;   /* list of qcom_iommu_context */
+};
+
+struct qcom_iommu_ctx {
+   struct device   *dev;
+   void __iomem*base;
+   unsigned int irq;
+   bool secure_init;
+   u32  asid;  /* asid and ctx bank # are 1:1 */
+   struct iommu_group  *group;
+   struct list_head node;  /* head in qcom_iommu_device::context_list */
+};
+
+struct qcom_iommu_domain {
+   struct io_pgtable_ops   *pgtbl_ops;
+   spinlock_t   pgtbl_lock;
+   struct mutex init_mutex; /* Protects iommu pointer */
+   struct iommu_domain  domain;
+   struct qcom_iommu_dev   *iommu;
+};
+
+static struct qcom_iommu_domain *to_qcom_iommu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct qcom_iommu_domain, domain);
+}
+
+static const struct iommu_ops qcom_iommu_ops;
+
+static struct qcom_iommu_dev * __to_iommu(struct iommu_fwspec *fwspec)
+{
+   if (!fwspec || 

[PATCH 2/4] iommu: arm-smmu: split out register defines

2017-05-25 Thread Rob Clark
I want to re-use some of these for qcom_iommu, which has (roughly) the
same context-bank registers.

Signed-off-by: Rob Clark 
---
 drivers/iommu/arm-smmu-regs.h | 227 ++
 drivers/iommu/arm-smmu.c  | 203 +
 2 files changed, 228 insertions(+), 202 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-regs.h

diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h
new file mode 100644
index 000..87589c8
--- /dev/null
+++ b/drivers/iommu/arm-smmu-regs.h
@@ -0,0 +1,227 @@
+/*
+ * IOMMU API for ARM architected SMMU implementations.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2013 ARM Limited
+ *
+ * Author: Will Deacon 
+ */
+
+#ifndef _ARM_SMMU_REGS_H
+#define _ARM_SMMU_REGS_H
+
+/* Configuration registers */
+#define ARM_SMMU_GR0_sCR0  0x0
+#define sCR0_CLIENTPD  (1 << 0)
+#define sCR0_GFRE  (1 << 1)
+#define sCR0_GFIE  (1 << 2)
+#define sCR0_EXIDENABLE(1 << 3)
+#define sCR0_GCFGFRE   (1 << 4)
+#define sCR0_GCFGFIE   (1 << 5)
+#define sCR0_USFCFG(1 << 10)
+#define sCR0_VMIDPNE   (1 << 11)
+#define sCR0_PTM   (1 << 12)
+#define sCR0_FB(1 << 13)
+#define sCR0_VMID16EN  (1 << 31)
+#define sCR0_BSU_SHIFT 14
+#define sCR0_BSU_MASK  0x3
+
+/* Auxiliary Configuration register */
+#define ARM_SMMU_GR0_sACR  0x10
+
+/* Identification registers */
+#define ARM_SMMU_GR0_ID0   0x20
+#define ARM_SMMU_GR0_ID1   0x24
+#define ARM_SMMU_GR0_ID2   0x28
+#define ARM_SMMU_GR0_ID3   0x2c
+#define ARM_SMMU_GR0_ID4   0x30
+#define ARM_SMMU_GR0_ID5   0x34
+#define ARM_SMMU_GR0_ID6   0x38
+#define ARM_SMMU_GR0_ID7   0x3c
+#define ARM_SMMU_GR0_sGFSR 0x48
+#define ARM_SMMU_GR0_sGFSYNR0  0x50
+#define ARM_SMMU_GR0_sGFSYNR1  0x54
+#define ARM_SMMU_GR0_sGFSYNR2  0x58
+
+#define ID0_S1TS   (1 << 30)
+#define ID0_S2TS   (1 << 29)
+#define ID0_NTS(1 << 28)
+#define ID0_SMS(1 << 27)
+#define ID0_ATOSNS (1 << 26)
+#define ID0_PTFS_NO_AARCH32(1 << 25)
+#define ID0_PTFS_NO_AARCH32S   (1 << 24)
+#define ID0_CTTW   (1 << 14)
+#define ID0_NUMIRPT_SHIFT  16
+#define ID0_NUMIRPT_MASK   0xff
+#define ID0_NUMSIDB_SHIFT  9
+#define ID0_NUMSIDB_MASK   0xf
+#define ID0_EXIDS  (1 << 8)
+#define ID0_NUMSMRG_SHIFT  0
+#define ID0_NUMSMRG_MASK   0xff
+
+#define ID1_PAGESIZE   (1 << 31)
+#define ID1_NUMPAGENDXB_SHIFT  28
+#define ID1_NUMPAGENDXB_MASK   7
+#define ID1_NUMS2CB_SHIFT  16
+#define ID1_NUMS2CB_MASK   0xff
+#define ID1_NUMCB_SHIFT0
+#define ID1_NUMCB_MASK 0xff
+
+#define ID2_OAS_SHIFT  4
+#define ID2_OAS_MASK   0xf
+#define ID2_IAS_SHIFT  0
+#define ID2_IAS_MASK   0xf
+#define ID2_UBS_SHIFT  8
+#define ID2_UBS_MASK   0xf
+#define ID2_PTFS_4K(1 << 12)
+#define ID2_PTFS_16K   (1 << 13)
+#define ID2_PTFS_64K   (1 << 14)
+#define ID2_VMID16 (1 << 15)
+
+#define ID7_MAJOR_SHIFT4
+#define ID7_MAJOR_MASK 0xf
+
+/* Global TLB invalidation */
+#define ARM_SMMU_GR0_TLBIVMID  0x64
+#define ARM_SMMU_GR0_TLBIALLNSNH   0x68
+#define ARM_SMMU_GR0_TLBIALLH  0x6c
+#define ARM_SMMU_GR0_sTLBGSYNC 0x70
+#define ARM_SMMU_GR0_sTLBGSTATUS   0x74
+#define sTLBGSTATUS_GSACTIVE   (1 << 0)
+#define TLB_LOOP_TIMEOUT   100 /* 1s! */
+#define TLB_SPIN_COUNT 10
+
+/* Stream mapping registers */
+#define ARM_SMMU_GR0_SMR(n)(0x800 + ((n) << 2))
+#define SMR_VALID 

[PATCH 4/4] iommu: qcom: initialize secure page table

2017-05-25 Thread Rob Clark
From: Stanimir Varbanov 

This basically gets the secure page table size, allocates memory for
secure pagetables and passes the physical address to the trusted zone.

Signed-off-by: Stanimir Varbanov 
Signed-off-by: Rob Clark 
---
 drivers/iommu/qcom_iommu.c | 64 ++
 1 file changed, 64 insertions(+)

diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index bfaf97c..3033862 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -632,6 +632,51 @@ static void qcom_iommu_disable_clocks(struct qcom_iommu_dev *qcom_iommu)
clk_disable_unprepare(qcom_iommu->iface_clk);
 }
 
+static int qcom_iommu_sec_ptbl_init(struct device *dev)
+{
+   size_t psize = 0;
+   unsigned int spare = 0;
+   void *cpu_addr;
+   dma_addr_t paddr;
+   unsigned long attrs;
+   static bool allocated = false;
+   int ret;
+
+   if (allocated)
+   return 0;
+
+   ret = qcom_scm_iommu_secure_ptbl_size(spare, &psize);
+   if (ret) {
+   dev_err(dev, "failed to get iommu secure pgtable size (%d)\n",
+   ret);
+   return ret;
+   }
+
+   dev_info(dev, "iommu sec: pgtable size: %zu\n", psize);
+
+   attrs = DMA_ATTR_NO_KERNEL_MAPPING;
+
+   cpu_addr = dma_alloc_attrs(dev, psize, &paddr, GFP_KERNEL, attrs);
+   if (!cpu_addr) {
+   dev_err(dev, "failed to allocate %zu bytes for pgtable\n",
+   psize);
+   return -ENOMEM;
+   }
+
+   ret = qcom_scm_iommu_secure_ptbl_init(paddr, psize, spare);
+   if (ret) {
+   dev_err(dev, "failed to init iommu pgtable (%d)\n", ret);
+   goto free_mem;
+   }
+
+   allocated = true;
+   return 0;
+
+free_mem:
+   dma_free_attrs(dev, psize, cpu_addr, paddr, attrs);
+   return ret;
+}
+
 static int qcom_iommu_ctx_probe(struct platform_device *pdev)
 {
struct qcom_iommu_ctx *ctx;
@@ -718,6 +763,17 @@ static struct platform_driver qcom_iommu_ctx_driver = {
.remove = qcom_iommu_ctx_remove,
 };
 
+static bool qcom_iommu_has_secure_context(struct qcom_iommu_dev *qcom_iommu)
+{
+   struct device_node *child;
+
+   for_each_child_of_node(qcom_iommu->dev->of_node, child)
+   if (of_device_is_compatible(child, "qcom,msm-iommu-v1-sec"))
+   return true;
+
+   return false;
+}
+
 static int qcom_iommu_device_probe(struct platform_device *pdev)
 {
struct qcom_iommu_dev *qcom_iommu;
@@ -754,6 +810,14 @@ static int qcom_iommu_device_probe(struct platform_device *pdev)
return -ENODEV;
}
 
+   if (qcom_iommu_has_secure_context(qcom_iommu)) {
+   ret = qcom_iommu_sec_ptbl_init(dev);
+   if (ret) {
+   dev_err(dev, "cannot init secure pg table(%d)\n", ret);
+   return ret;
+   }
+   }
+
platform_set_drvdata(pdev, qcom_iommu);
 
pm_runtime_enable(dev);
-- 
2.9.4



[PATCH 1/4] Docs: dt: document qcom iommu bindings

2017-05-25 Thread Rob Clark
Cc: devicet...@vger.kernel.org
Signed-off-by: Rob Clark 
Reviewed-by: Rob Herring 
---
 .../devicetree/bindings/iommu/qcom,iommu.txt   | 121 +
 1 file changed, 121 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/iommu/qcom,iommu.txt

diff --git a/Documentation/devicetree/bindings/iommu/qcom,iommu.txt b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt
new file mode 100644
index 000..0d50f84
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt
@@ -0,0 +1,121 @@
+* QCOM IOMMU v1 Implementation
+
+Qualcomm "B" family devices which are not compatible with arm-smmu have
+a similar looking IOMMU but without access to the global register space,
+and optionally requiring additional configuration to route context irqs
+to non-secure vs secure interrupt line.
+
+** Required properties:
+
+- compatible   : Should be one of:
+
+"qcom,msm8916-iommu"
+
+ Followed by "qcom,msm-iommu-v1".
+
+- clock-names  : Should be a pair of "iface" (required for IOMMUs
+ register group access) and "bus" (required for
+ the IOMMUs underlying bus access).
+
+- clocks   : Phandles for respective clocks described by
+ clock-names.
+
+- #address-cells   : must be 1.
+
+- #size-cells  : must be 1.
+
+- #iommu-cells : Must be 1.
+
+- ranges   : Base address and size of the iommu context banks.
+
+- qcom,iommu-secure-id  : secure-id.
+
+- List of sub-nodes, one per translation context bank.  Each sub-node
+  has the following required properties:
+
+  - compatible : Should be one of:
+- "qcom,msm-iommu-v1-ns"  : non-secure context bank
+- "qcom,msm-iommu-v1-sec" : secure context bank
+  - reg: Base address and size of context bank within the iommu
+  - interrupts : The context fault irq.
+
+** Optional properties:
+
+- reg  : Base address and size of the SMMU local base, should
+ be only specified if the iommu requires configuration
+ for routing of context bank irq's to secure vs non-
+ secure lines.  (Ie. if the iommu contains secure
+ context banks)
+
+
+** Examples:
+
+   apps_iommu: iommu@1e2 {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   #iommu-cells = <1>;
+   compatible = "qcom,msm8916-iommu", "qcom,msm-iommu-v1";
+   ranges = <0 0x1e2 0x4>;
+   reg = <0x1ef 0x3000>;
+   clocks = < GCC_SMMU_CFG_CLK>,
+< GCC_APSS_TCU_CLK>;
+   clock-names = "iface", "bus";
+   qcom,iommu-secure-id = <17>;
+
+   // mdp_0:
+   iommu-ctx@4000 {
+   compatible = "qcom,msm-iommu-v1-ns";
+   reg = <0x4000 0x1000>;
+   interrupts = ;
+   };
+
+   // venus_ns:
+   iommu-ctx@5000 {
+   compatible = "qcom,msm-iommu-v1-sec";
+   reg = <0x5000 0x1000>;
+   interrupts = ;
+   };
+   };
+
+   gpu_iommu: iommu@1f08000 {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   #iommu-cells = <1>;
+   compatible = "qcom,msm8916-iommu", "qcom,msm-iommu-v1";
+   ranges = <0 0x1f08000 0x1>;
+   clocks = < GCC_SMMU_CFG_CLK>,
+< GCC_GFX_TCU_CLK>;
+   clock-names = "iface", "bus";
+   qcom,iommu-secure-id = <18>;
+
+   // gfx3d_user:
+   iommu-ctx@1f09000 {
+   compatible = "qcom,msm-iommu-v1-ns";
+   reg = <0x1000 0x1000>;
+   interrupts = ;
+   };
+
+   // gfx3d_priv:
+   iommu-ctx@1f0a000 {
+   compatible = "qcom,msm-iommu-v1-ns";
+   reg = <0x2000 0x1000>;
+   interrupts = ;
+   };
+   };
+
+   ...
+
+   venus: video-codec@1d0 {
+   ...
+   iommus = <&apps_iommu 5>;
+   };
+
+   mdp: mdp@1a01000 {
+   ...
+   iommus = <&apps_iommu 4>;
+   };
+
+   gpu@01c0 {
+   ...
+   iommus = <&gpu_iommu 1>, <&gpu_iommu 2>;
+   };
-- 
2.9.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 0/4] iommu: add qcom_iommu for early "B" family devices

2017-05-25 Thread Rob Clark
An iommu driver for Qualcomm "B" family devices which do not completely
implement the ARM SMMU spec.  These devices have context-bank register
layout that is similar to ARM SMMU, but no global register space (or at
least not one that is accessible).

A couple more minor changes in 3/4, and dt bindings now have Rob H's r-b.

Rob Clark (3):
  Docs: dt: document qcom iommu bindings
  iommu: arm-smmu: split out register defines
  iommu: add qcom_iommu

Stanimir Varbanov (1):
  iommu: qcom: initialize secure page table

 .../devicetree/bindings/iommu/qcom,iommu.txt   | 121 +++
 drivers/iommu/Kconfig  |  10 +
 drivers/iommu/Makefile |   1 +
 drivers/iommu/arm-smmu-regs.h  | 227 +
 drivers/iommu/arm-smmu.c   | 203 +
 drivers/iommu/qcom_iommu.c | 942 +
 6 files changed, 1302 insertions(+), 202 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/iommu/qcom,iommu.txt
 create mode 100644 drivers/iommu/arm-smmu-regs.h
 create mode 100644 drivers/iommu/qcom_iommu.c

-- 
2.9.4



Re: [PATCH V3 6/8] arm: dma-mapping: Reset the device's dma_ops

2017-05-25 Thread Russell King - ARM Linux
On Wed, May 24, 2017 at 02:26:13PM +0300, Laurent Pinchart wrote:
> Again, the patch I propose is the simplest v4.12-rc fix I can think of, short 
> of reverting your complete IOMMU probe deferral patch series. Let's focus on 
> the v4.12-rc fix, and then discuss how to move forward in v4.13 and beyond.

Except, I don't think it fixes the problem that Sricharan is trying to
fix, namely that of DMA ops that have been set up, torn down, and are
then being set up again.  The issue here is that this results in a stale
setup, because the dma_ops are left in place from the first iommu setup,
but the iommu mapping has been disposed of.

Your patch only avoids the problem of tearing down someone else's DMA
ops.  We need a combination of both patches together.

What needs to happen for Sricharan's problem to be resolved is:

1. move all of __arm_iommu_detach_device() into arm_iommu_detach_device().
2. replace the __arm_iommu_detach_device() call in arm_teardown_iommu_dma_ops()
   with arm_iommu_detach_device().

as I don't see any need to have a different order in
arm_teardown_iommu_dma_ops().
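
As a rough illustration of the two steps above, here is a user-space C
model (not kernel code). The struct layouts and the model_* names are
hypothetical stand-ins for arm_iommu_attach_device(),
arm_iommu_detach_device() and arm_teardown_iommu_dma_ops(); the real
functions do more work (for example releasing the mapping through its
refcount). The sketch only shows that a single detach path which also
resets dma_ops leaves nothing stale behind for a later re-setup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures, not the real ones. */
struct dma_map_ops { const char *name; };
struct iommu_mapping { int refcount; };

struct device {
	const struct dma_map_ops *dma_ops; /* NULL: fall back to default ops */
	struct iommu_mapping *mapping;     /* NULL: no IOMMU mapping attached */
};

static const struct dma_map_ops iommu_ops = { "iommu" };

/* Attach: install the mapping and the IOMMU-backed DMA ops together. */
static int model_attach_device(struct device *dev)
{
	assert(dev != NULL);
	dev->mapping = calloc(1, sizeof(*dev->mapping));
	if (!dev->mapping)
		return -1;
	dev->mapping->refcount = 1;
	dev->dma_ops = &iommu_ops;
	return 0;
}

/* Step 1: one detach function doing all of the work, ops reset included. */
static void model_detach_device(struct device *dev)
{
	assert(dev != NULL);
	if (!dev->mapping)
		return; /* nothing attached, nothing to undo */
	free(dev->mapping);
	dev->mapping = NULL;
	dev->dma_ops = NULL; /* no stale ops survive the mapping */
}

/* Step 2: teardown simply goes through the single detach path. */
static void model_teardown_dma_ops(struct device *dev)
{
	model_detach_device(dev);
}
```

In this model, calling model_teardown_dma_ops() and then
model_attach_device() again starts from a clean device, which is the
re-setup case Sricharan's patch is concerned with.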

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH 1/2] PCI: Save properties required to handle FLR for replay purposes.

2017-05-25 Thread Raj, Ashok
Hi Jean

On Thu, May 11, 2017 at 11:50:24AM +0100, Jean-Philippe Brucker wrote:
> Hi,
> 
> On 10/05/17 19:39, Ashok Raj wrote:
> > From: CQ Tang 
> > 
> > Requires: https://patchwork.kernel.org/patch/9593891
> 
> Since your series is likely to go in much earlier than my SVM mess, maybe
> you could carry that PCI patch along with it? Or I could resend it on its
> own if you prefer.

CQ has tested your patch along with this. In fact the original patch had
exactly what you had in this patch. But I split it since I noticed you caught
part of our change.

Since Joerg already has your version, it might be easy to just pick
your patch and ours and commit separately.

Cheers,
Ashok
> 
> I'm planning to resend the SVM series in a few weeks but it still won't
> make it into mainline since it hasn't run on hardware.
> 
> Thanks,
> Jean-Philippe
> 
> > After a FLR, pci-states need to be restored. This patch saves PASID features
> > and PRI reqs cached.
> > 
> > Cc: Jean-Phillipe Brucker 
> > Cc: David Woodhouse 
> > Cc: iommu@lists.linux-foundation.org
> > 
> > Signed-off-by: CQ Tang 
> > Signed-off-by: Ashok Raj 
> > ---
> >  drivers/pci/ats.c   | 65 
> > +
> >  drivers/pci/pci.c   |  3 +++
> >  include/linux/pci-ats.h | 10 
> >  include/linux/pci.h |  6 +
> >  4 files changed, 69 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> > index 2126497..a769955 100644
> > --- a/drivers/pci/ats.c
> > +++ b/drivers/pci/ats.c
> > @@ -160,17 +160,16 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
> > if (!pos)
> > return -EINVAL;
> >  
> > > -   pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control);
> > > pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
> > -   if ((control & PCI_PRI_CTRL_ENABLE) ||
> > -   !(status & PCI_PRI_STATUS_STOPPED))
> > +   if (!(status & PCI_PRI_STATUS_STOPPED))
> > return -EBUSY;
> >  
> > > pci_read_config_dword(pdev, pos + PCI_PRI_MAX_REQ, &max_requests);
> > reqs = min(max_requests, reqs);
> > +   pdev->pri_reqs_alloc = reqs;
> > pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
> >  
> > -   control |= PCI_PRI_CTRL_ENABLE;
> > +   control = PCI_PRI_CTRL_ENABLE;
> > pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> >  
> > pdev->pri_enabled = 1;
> > @@ -206,6 +205,29 @@ void pci_disable_pri(struct pci_dev *pdev)
> >  EXPORT_SYMBOL_GPL(pci_disable_pri);
> >  
> >  /**
> > + * pci_restore_pri_state - Restore PRI
> > + * @pdev: PCI device structure
> > + *
> > + */
> > +void pci_restore_pri_state(struct pci_dev *pdev)
> > +{
> > +   u16 control = PCI_PRI_CTRL_ENABLE;
> > +   u32 reqs = pdev->pri_reqs_alloc;
> > +   int pos;
> > +
> > +   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> > +   if (!pos)
> > +   return;
> > +
> > +   if (!pdev->pri_enabled)
> > +   return;
> > +
> > +   pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
> > +   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> > +}
> > +EXPORT_SYMBOL_GPL(pci_restore_pri_state);
> > +
> > +/**
> >   * pci_reset_pri - Resets device's PRI state
> >   * @pdev: PCI device structure
> >   *
> > @@ -224,12 +246,7 @@ int pci_reset_pri(struct pci_dev *pdev)
> > if (!pos)
> > return -EINVAL;
> >  
> > > -   pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control);
> > -   if (control & PCI_PRI_CTRL_ENABLE)
> > -   return -EBUSY;
> > -
> > -   control |= PCI_PRI_CTRL_RESET;
> > -
> > +   control = PCI_PRI_CTRL_RESET;
> > pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> >  
> > return 0;
> > @@ -259,12 +276,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int 
> > features)
> > if (!pos)
> > return -EINVAL;
> >  
> > > -   pci_read_config_word(pdev, pos + PCI_PASID_CTRL, &control);
> > > pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported);
> > -
> > -   if (control & PCI_PASID_CTRL_ENABLE)
> > -   return -EINVAL;
> > -
> > supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
> >  
> > /* User wants to enable anything unsupported? */
> > @@ -272,6 +284,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
> > return -EINVAL;
> >  
> > control = PCI_PASID_CTRL_ENABLE | features;
> > +   pdev->pasid_features = features;
> >  
> > pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
> >  
> > @@ -305,6 +318,28 @@ void pci_disable_pasid(struct pci_dev *pdev)
> >  EXPORT_SYMBOL_GPL(pci_disable_pasid);
> >  
> >  /**
> > + * pci_restore_pasid_state - Restore PASID capabilities.
> > + * @pdev: PCI device structure
> > + *
> > + */
> > +void pci_restore_pasid_state(struct pci_dev *pdev)
> > +{
> > +   u16 control;
> > +   int pos;
> > +
> > +   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> > +   if (!pos)
> > +   return;
> 

RE: [PATCH 1/2] staging: fsl-dpaa2/eth: Fix address translations

2017-05-25 Thread Ruxandra Ioana Radulescu
> -Original Message-
> From: Laurentiu Tudor
> Sent: Wednesday, May 24, 2017 3:34 PM
> To: Ruxandra Ioana Radulescu ;
> gre...@linuxfoundation.org
> Cc: de...@driverdev.osuosl.org; linux-ker...@vger.kernel.org;
> ag...@suse.de; a...@arndb.de; linux-arm-ker...@lists.infradead.org;
> iommu@lists.linux-foundation.org; Bogdan Purcareata
> ; stuyo...@gmail.com; Nipun Gupta
> 
> Subject: Re: [PATCH 1/2] staging: fsl-dpaa2/eth: Fix address translations
> 
> Hi Ioana,
> 
> Debatable nit inline.
> 
> On 05/24/2017 03:13 PM, Ioana Radulescu wrote:
> > Use the correct mechanisms for translating a DMA-mapped IOVA
> > address into a virtual one. Without this fix, once SMMU is
> > enabled on Layerscape platforms, the Ethernet driver throws
> > IOMMU translation faults.
> >
> > Signed-off-by: Nipun Gupta 
> > Signed-off-by: Ioana Radulescu 
> > ---
> >   drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c | 25
> +++--
> >   drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.h |  1 +
> >   2 files changed, 20 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c
> b/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c
> > index 6f9eed66c64d..3fee0d6f17e0 100644
> > --- a/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c
> > +++ b/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c
> > @@ -37,6 +37,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >
> >   #include "../../fsl-mc/include/mc.h"
> >   #include "../../fsl-mc/include/mc-sys.h"
> > @@ -54,6 +55,16 @@ MODULE_DESCRIPTION("Freescale DPAA2 Ethernet
> Driver");
> >
> >   const char dpaa2_eth_drv_version[] = "0.1";
> >
> > +static void *dpaa2_iova_to_virt(struct iommu_domain *domain,
> 
> if you pass a "struct dpaa2_eth_priv *priv" instead of "iommu_domain"
> you can move the priv->iommu_domain reference in the function and
> slightly simplify the call sites.
 
Fair point, but I'd prefer keeping this function independent of the
Ethernet driver's private data structure. This way, if other (future)
DPAA2 drivers will need a similar function, we can just move it
to a common area instead of duplicating the code.

Thanks,
Ioana


Re: [Qemu-devel] [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function

2017-05-25 Thread Jean-Philippe Brucker
On 23/05/17 08:50, Liu, Yi L wrote:
> On Fri, Apr 28, 2017 at 01:51:42PM +0100, Jean-Philippe Brucker wrote:
[...]

>>>> For the next version of my SVM series, I was thinking of passing group
>>>> instead of device to iommu_bind. Since all devices in a group are expected
>>>> to share the same mappings (whether they want it or not), users will have
>>>
>>> Virtual address space is not tied to a protection domain the way I/O
>>> virtual address space is. Is it really necessary to affect all the devices
>>> in this group? Or is it just for consistency?
>>
>> It's mostly about consistency, and also avoid hiding implicit behavior in
>> the IOMMU driver. I have the following example, described using group and
>> domain structures from the IOMMU API:
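>>  ______________________________
>> |IOMMU  ____________________   |
>> |      |DOM  ____________   |  |
>> |      |    |GRP         |  |  |     bind
>> |      |    |    A<-----------------------Task 1
>> |      |    |    B       |  |  |
>> |      |    |____________|  |  |
>> |      |     ____________   |  |
>> |      |    |GRP         |  |  |
>> |      |    |    C       |  |  |
>> |      |    |____________|  |  |
>> |      |____________________|  |
>> |       ____________________   |
>> |      |DOM  ____________   |  |
>> |      |    |GRP         |  |  |
>> |      |    |    D       |  |  |
>> |      |    |____________|  |  |
>> |      |____________________|  |
>> |______________________________|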
>>
>> Let's take PCI functions A, B, C, and D, all with PASID capabilities. Due
>> to some hardware limitation (in the bus, the device or the IOMMU), B can
>> see all DMA transactions issued by A. A and B are therefore in the same
>> IOMMU group. C and D can be isolated by the IOMMU, so they each have their
>> own group.
>>
>> (As far as I know, in the SVM world at the moment, devices are neatly
>> integrated and there is no need for putting multiple devices in the same
>> IOMMU group, but I don't think we should expect all future SVM systems to
>> be well-behaved.)
>>
>> So when a user binds Task 1 to device A, it is *implicitly* giving device
>> B access to Task 1 as well. Simply because the IOMMU is unable to isolate
>> A from B, PASID or not. B could access the same address space as A, even
>> if you don't call bind again to explicitly attach the PASID table to B.
>>
>> If the bind is done with device as argument, maybe users will believe that
>> using PASIDs provides an additional level of isolation within a group,
>> when it really doesn't. That's why I'm inclined to have the whole bind API
>> be on groups rather than devices, if only for clarity.
> 
> This may depend on how the user understand the isolation. I think different
> PASID does mean different address space. From this perspective, it does look
> like isolation.

Yes, and it isn't isolation. Not at device granularity, that is. IOMMU has
the concept of group because sometimes the hardware simply cannot isolate
devices. Different PASIDs do mean different address spaces, but two
devices in the same group may be able to access each other's address
spaces, regardless of the presence of a PASID.

To illustrate the problem with PASIDs, let's say that for whatever reason
(e.g. lack of ACS Source Validation in a PCI switch), device B (0x0100)
can spoof device A's RID (0x0200). Therefore we put A and B in the same
IOMMU group.

User binds Task 1 to device A and Task 2 to device B. They use PASIDs X
and Y, so user thinks that they are isolated. But given the physical
properties of the system, device B can pretend it is device A, and access
the whole address space of Task 1 by sending transactions with RID 0x0200
and PASID X. So user effectively created a backdoor between tasks 1 and 2
without knowing it, and using PASIDs didn't add any protection.
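
The scenario above can be sketched as a small user-space C model. This is
only an illustration under stated assumptions: the binding table, the
model_translate() helper and the numeric PASID values are hypothetical,
and the model assumes the IOMMU picks an address space purely from the
(RID, PASID) pair it sees on the wire.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one entry per bind() call made by the user. */
struct binding {
	uint16_t rid;   /* requester ID the IOMMU sees on the bus */
	uint32_t pasid;
	int task;       /* task whose address space gets selected */
};

enum {
	RID_A = 0x0200,
	RID_B = 0x0100,
	PASID_X = 1,    /* arbitrary values chosen for the example */
	PASID_Y = 2,
};

static const struct binding bindings[] = {
	{ RID_A, PASID_X, 1 }, /* Task 1 bound to device A */
	{ RID_B, PASID_Y, 2 }, /* Task 2 bound to device B */
};

/*
 * Returns the task whose address space a transaction reaches, or -1 for
 * a translation fault. Only the RID on the wire is checked, because the
 * IOMMU cannot tell which physical device actually sent it.
 */
static int model_translate(uint16_t rid_on_wire, uint32_t pasid)
{
	size_t i;

	for (i = 0; i < sizeof(bindings) / sizeof(bindings[0]); i++)
		if (bindings[i].rid == rid_on_wire &&
		    bindings[i].pasid == pasid)
			return bindings[i].task;
	return -1;
}
```

An honest device B using its own RID with PASID X faults in this model,
but the same device sending RID_A with PASID_X lands in Task 1's address
space, which is why A and B have to be treated as one group.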

>> But I don't know, maybe a comment explaining the above would be sufficient.
>>
>> To be frank my comment about group versus device is partly to make sure
>> that I grasp the various concepts correctly and that we're on the same
>> page. Doing the bind on groups is less significant in your case, for PASID
>> table binding, because VFIO already takes care of IOMMU group properly. In
>> my case I expect DRM, network, DMA drivers to use the API as well for
>> binding tasks, and I don't want to introduce ambiguity in the API that
>> would lead to security holes later.
> 
> For this part, would you provide more detail about why it would be more
> significant to bind on group level in your case? I think we need strong
> reason to support it. Currently, the other map_page APIs are passing
> device as argument. Would it also be recommended to use group as argument?

Well I'm only concerned about the API we're introducing at the moment, I'm
not suggesting we change existing ones. Because PASID is a new concept and
is currently unregulated, it would be good