Re: [PATCH] iommu/amd: Don't put completion-wait semaphore on stack

2016-09-14 Thread Ingo Molnar

* Joerg Roedel  wrote:

> On Wed, Sep 14, 2016 at 11:27:12PM +0200, Joerg Roedel wrote:
> > On Wed, Sep 14, 2016 at 05:26:48PM +0200, Ingo Molnar wrote:
> > > 
> > > Cool, thanks! I'll put this into tip:x86/asm which has the virtually 
> > > mapped stack 
> > > patches - ok?
> > 
> > Yeah, sure, that is the best thing to do. Just wait for the v2 I'll be
> > sending tomorrow. I just realised that the locking is not correct in one
> > of the cases with this patch and I'd like to fix that first.
> 
> Oh sorry, just saw the tip-bot mail. Let me know whether you can still
> remove it and just take v2 or if you want a follow-on patch.

Yeah, I can still remove it - just zapped it in fact.

Thanks,

Ingo


Re: [PATCH] iommu/amd: Don't put completion-wait semaphore on stack

2016-09-14 Thread Joerg Roedel
On Wed, Sep 14, 2016 at 11:27:12PM +0200, Joerg Roedel wrote:
> On Wed, Sep 14, 2016 at 05:26:48PM +0200, Ingo Molnar wrote:
> > 
> > Cool, thanks! I'll put this into tip:x86/asm which has the virtually mapped 
> > stack 
> > patches - ok?
> 
> Yeah, sure, that is the best thing to do. Just wait for the v2 I'll be
> sending tomorrow. I just realised that the locking is not correct in one
> of the cases with this patch and I'd like to fix that first.

Oh sorry, just saw the tip-bot mail. Let me know whether you can still
remove it and just take v2 or if you want a follow-on patch.


Thanks,

Joerg


Re: [PATCH] iommu/amd: Don't put completion-wait semaphore on stack

2016-09-14 Thread Joerg Roedel
Hi Ingo,

On Wed, Sep 14, 2016 at 05:26:48PM +0200, Ingo Molnar wrote:
> 
> Cool, thanks! I'll put this into tip:x86/asm which has the virtually mapped 
> stack 
> patches - ok?

Yeah, sure, that is the best thing to do. Just wait for the v2 I'll be
sending tomorrow. I just realised that the locking is not correct in one
of the cases with this patch and I'd like to fix that first.


Thanks,

Joerg


Re: [PATCH] iommu/amd: Don't put completion-wait semaphore on stack

2016-09-14 Thread Ingo Molnar

* Joerg Roedel  wrote:

> From: Joerg Roedel 
> 
> The semaphore used by the AMD IOMMU to signal command
> completion lived on the stack until now, which was safe as
> the driver busy-waited on the semaphore with IRQs disabled,
> so the stack can't go away under the driver.
> 
> But the recently introduced vmap-based stacks break this as
> the physical address of the semaphore can't be determined
> easily anymore. The driver used the __pa() macro, but that
> only works in the direct-mapping. The result was
> Completion-Wait timeout errors seen by the IOMMU driver,
> breaking system boot.
> 
> Since putting the semaphore on the stack is bad design
> anyway, move the semaphore into 'struct amd_iommu'. It is
> protected by the per-iommu lock and now in the direct
> mapping again. This fixes the Completion-Wait timeout errors
> and makes AMD IOMMU systems boot again with vmap-based
> stacks enabled.
> 
> Reported-by: Borislav Petkov 
> Signed-off-by: Joerg Roedel 
> ---
>  drivers/iommu/amd_iommu.c   | 14 --
>  drivers/iommu/amd_iommu_types.h |  2 ++
>  2 files changed, 10 insertions(+), 6 deletions(-)
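(For illustration, a minimal sketch of the approach the changelog describes; the
field name and call site below are illustrative only, since the diff itself is
not quoted in this message.)

struct amd_iommu {
	/* ... existing members ... */
	spinlock_t	lock;		/* per-IOMMU lock, also protects the semaphore */
	u64		cmd_sem;	/* completion-wait semaphore, in the direct mapping */
};

static int iommu_completion_wait(struct amd_iommu *iommu)
{
	struct iommu_cmd cmd;

	/*
	 * &iommu->cmd_sem is a linearly-mapped address, so __pa() inside
	 * build_completion_wait() keeps working with CONFIG_VMAP_STACK,
	 * unlike the old on-stack semaphore.
	 */
	build_completion_wait(&cmd, (u64)&iommu->cmd_sem);

	/* queue the command under iommu->lock and busy-wait on cmd_sem */
	return 0;
}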

Cool, thanks! I'll put this into tip:x86/asm which has the virtually mapped 
stack 
patches - ok?

Thanks,

Ingo


Re: [RFC PATCH v2 19/20] x86: Access the setup data through debugfs un-encrypted

2016-09-14 Thread Borislav Petkov
On Wed, Sep 14, 2016 at 09:29:41AM -0500, Tom Lendacky wrote:
> This is still required because just using the __va() would still cause
> the mapping created to have the encryption bit set. The ioremap call
> will result in the mapping not having the encryption bit set.

I meant this: https://lkml.kernel.org/r/20160902181447.ga25...@nazgul.tnic

Wouldn't simply clearing the SME mask work?

#define __va(x) ((void *)(((unsigned long)(x)+PAGE_OFFSET) & ~sme_me_mask))

Or are you saying, one needs the whole noodling through ioremap_cache()
because the data is already encrypted and accessing it with sme_me_mask
cleared would simply give you the encrypted garbage?

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--


Re: [RFC PATCH v2 20/20] x86: Add support to make use of Secure Memory Encryption

2016-09-14 Thread Tom Lendacky
On 09/12/2016 12:08 PM, Borislav Petkov wrote:
> On Mon, Aug 22, 2016 at 05:39:08PM -0500, Tom Lendacky wrote:
>> This patch adds the support to check if SME has been enabled and if the
>> mem_encrypt=on command line option is set. If both of these conditions
>> are true, then the encryption mask is set and the kernel is encrypted
>> "in place."
>>
>> Signed-off-by: Tom Lendacky 
>> ---
>>  Documentation/kernel-parameters.txt |3 
>>  arch/x86/kernel/asm-offsets.c   |2 
>>  arch/x86/kernel/mem_encrypt.S   |  302 
>> +++
>>  arch/x86/mm/mem_encrypt.c   |2 
>>  4 files changed, 309 insertions(+)
>>
>> diff --git a/Documentation/kernel-parameters.txt 
>> b/Documentation/kernel-parameters.txt
>> index 46c030a..a1986c8 100644
>> --- a/Documentation/kernel-parameters.txt
>> +++ b/Documentation/kernel-parameters.txt
>> @@ -2268,6 +2268,9 @@ bytes respectively. Such letter suffixes can also be 
>> entirely omitted.
>>  memory contents and reserves bad memory
>>  regions that are detected.
>>  
>> +mem_encrypt=on  [X86_64] Enable memory encryption on processors
>> +that support this feature.
>> +
>>  meye.*= [HW] Set MotionEye Camera parameters
>>  See Documentation/video4linux/meye.txt.
>>  
>> diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
>> index 2bd5c6f..e485ada 100644
>> --- a/arch/x86/kernel/asm-offsets.c
>> +++ b/arch/x86/kernel/asm-offsets.c
>> @@ -85,6 +85,8 @@ void common(void) {
>>  OFFSET(BP_init_size, boot_params, hdr.init_size);
>>  OFFSET(BP_pref_address, boot_params, hdr.pref_address);
>>  OFFSET(BP_code32_start, boot_params, hdr.code32_start);
>> +OFFSET(BP_cmd_line_ptr, boot_params, hdr.cmd_line_ptr);
>> +OFFSET(BP_ext_cmd_line_ptr, boot_params, ext_cmd_line_ptr);
>>  
>>  BLANK();
>>  DEFINE(PTREGS_SIZE, sizeof(struct pt_regs));
>> diff --git a/arch/x86/kernel/mem_encrypt.S b/arch/x86/kernel/mem_encrypt.S
>> index f2e0536..bf9f6a9 100644
>> --- a/arch/x86/kernel/mem_encrypt.S
>> +++ b/arch/x86/kernel/mem_encrypt.S
>> @@ -12,13 +12,230 @@
>>  
>>  #include 
>>  
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>>  .text
>>  .code64
>>  ENTRY(sme_enable)
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +/* Check for AMD processor */
>> +xorl%eax, %eax
>> +cpuid
>> +cmpl$0x68747541, %ebx   # AuthenticAMD
>> +jne .Lmem_encrypt_exit
>> +cmpl$0x69746e65, %edx
>> +jne .Lmem_encrypt_exit
>> +cmpl$0x444d4163, %ecx
>> +jne .Lmem_encrypt_exit
>> +
>> +/* Check for memory encryption leaf */
>> +movl$0x80000000, %eax
>> +cpuid
>> +cmpl$0x8000001f, %eax
>> +jb  .Lmem_encrypt_exit
>> +
>> +/*
>> + * Check for memory encryption feature:
>> + *   CPUID Fn8000_001F[EAX] - Bit 0
>> + * Secure Memory Encryption support
>> + *   CPUID Fn8000_001F[EBX] - Bits 5:0
>> + * Pagetable bit position used to indicate encryption
>> + *   CPUID Fn8000_001F[EBX] - Bits 11:6
>> + * Reduction in physical address space (in bits) when enabled
>> + */
>> +movl$0x8000001f, %eax
>> +cpuid
>> +bt  $0, %eax
>> +jnc .Lmem_encrypt_exit
>> +
>> +/* Check if BIOS/UEFI has allowed memory encryption */
>> +movl$MSR_K8_SYSCFG, %ecx
>> +rdmsr
>> +bt  $MSR_K8_SYSCFG_MEM_ENCRYPT_BIT, %eax
>> +jnc .Lmem_encrypt_exit
> 
> Like other people suggested, it would be great if this were in C. Should be
> actually readable :)

Yup, working on that.  I'll try and make it all completely C.
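For reference, the same checks could look roughly like this in C (a sketch only,
not the actual rework; the AuthenticAMD and SYSCFG MSR checks are omitted for
brevity):

/* uses cpuid() from <asm/processor.h> */
static unsigned long __init sme_probe(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* Largest extended function must cover Fn8000_001F */
	cpuid(0x80000000, &eax, &ebx, &ecx, &edx);
	if (eax < 0x8000001f)
		return 0;

	/* Fn8000_001F: EAX[0] = SME supported, EBX[5:0] = C-bit position */
	cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
	if (!(eax & 0x1))
		return 0;

	return 1UL << (ebx & 0x3f);
}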

> 
>> +
>> +/* Check for the mem_encrypt=on command line option */
>> +push%rsi/* Save RSI (real_mode_data) */
>> +push%rbx/* Save CPUID information */
>> +movlBP_ext_cmd_line_ptr(%rsi), %ecx
>> +shlq$32, %rcx
>> +movlBP_cmd_line_ptr(%rsi), %edi
>> +addq%rcx, %rdi
>> +leaqmem_encrypt_enable_option(%rip), %rsi
>> +callcmdline_find_option_bool
>> +pop %rbx/* Restore CPUID information */
>> +pop %rsi/* Restore RSI (real_mode_data) */
>> +testl   %eax, %eax
>> +jz  .Lno_mem_encrypt
> 
> This too.
> 
>> +
>> +/* Set memory encryption mask */
>> +movl%ebx, %ecx
>> +andl$0x3f, %ecx
>> +bts %ecx, sme_me_mask(%rip)
>> +
>> +.Lno_mem_encrypt:
>> +/*
>> + * BIOS/UEFI has allowed memory encryption so we need to set
>> + * the amount of physical address space reduction even if
>> + * the user decides not to use memory encryption.
>> + */
>> +movl%ebx, %ecx
>> +shrl$6, %ecx
>> +andl$0x3f, %ecx
>> +movb%cl, sme_me_loss(%rip)
>> +
>> +.Lmem_encrypt_exit:
>> +#endif  /* 

Re: [RFC PATCH v2 19/20] x86: Access the setup data through debugfs un-encrypted

2016-09-14 Thread Tom Lendacky
On 09/12/2016 11:59 AM, Borislav Petkov wrote:
> On Mon, Aug 22, 2016 at 05:38:59PM -0500, Tom Lendacky wrote:
>> Since the setup data is in memory in the clear, it must be accessed as
>> un-encrypted.  Always use ioremap (similar to sysfs setup data support)
>> to map the data.
>>
>> Signed-off-by: Tom Lendacky 
>> ---
>>  arch/x86/kernel/kdebugfs.c |   30 +++---
>>  1 file changed, 11 insertions(+), 19 deletions(-)
>>
>> diff --git a/arch/x86/kernel/kdebugfs.c b/arch/x86/kernel/kdebugfs.c
>> index bdb83e4..a58a82e 100644
>> --- a/arch/x86/kernel/kdebugfs.c
>> +++ b/arch/x86/kernel/kdebugfs.c
>> @@ -48,17 +48,13 @@ static ssize_t setup_data_read(struct file *file, char 
>> __user *user_buf,
>>  
>>  pa = node->paddr + sizeof(struct setup_data) + pos;
>>  pg = pfn_to_page((pa + count - 1) >> PAGE_SHIFT);
>> -if (PageHighMem(pg)) {
> 
> Why is it ok to get rid of the PageHighMem() check?

Since the change is to perform the ioremap call no matter what, the check
for PageHighMem() wasn't needed anymore.

> 
> Btw, we did talk earlier in the thread about making __va() clear the SME
> mask and then you won't really need to change stuff here. Or?

This is still required because just using the __va() would still cause
the mapping created to have the encryption bit set. The ioremap call
will result in the mapping not having the encryption bit set.
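
For context, the shape of the ioremap-based access being discussed is roughly
the following (a sketch reusing 'pa', 'count' and 'user_buf' from the quoted
setup_data_read(); the actual hunk is only partially shown above):

	void *p;

	/* Map through ioremap_cache(); the resulting mapping never carries
	 * the encryption attribute, wherever the setup_data happens to live. */
	p = ioremap_cache(pa, count);
	if (!p)
		return -ENXIO;

	if (copy_to_user(user_buf, p, count)) {
		iounmap(p);
		return -EFAULT;
	}
	iounmap(p);

	return count;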

Thanks,
Tom

> 


[PATCH v7.1 19/22] iommu/arm-smmu: Wire up generic configuration support

2016-09-14 Thread Robin Murphy
With everything else now in place, fill in an of_xlate callback and the
appropriate registration to plumb into the generic configuration
machinery, and watch everything just work.

Signed-off-by: Robin Murphy 

---

- Don't pull in of_platform.h for no reason (was an old leftover)
- Don't spam deprecated binding message when multiple SMMUs are using it

---
 drivers/iommu/arm-smmu.c | 168 ++-
 1 file changed, 108 insertions(+), 60 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 9dbb6a37e625..fd6cc19c4ced 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -418,6 +418,8 @@ struct arm_smmu_option_prop {
 
 static atomic_t cavium_smmu_context_count = ATOMIC_INIT(0);
 
+static bool using_legacy_binding, using_generic_binding;
+
 static struct arm_smmu_option_prop arm_smmu_options[] = {
{ ARM_SMMU_OPT_SECURE_CFG_ACCESS, "calxeda,smmu-secure-config-access" },
{ 0, NULL},
@@ -817,12 +819,6 @@ static int arm_smmu_init_domain_context(struct 
iommu_domain *domain,
if (smmu_domain->smmu)
goto out_unlock;
 
-   /* We're bypassing these SIDs, so don't allocate an actual context */
-   if (domain->type == IOMMU_DOMAIN_DMA) {
-   smmu_domain->smmu = smmu;
-   goto out_unlock;
-   }
-
/*
 * Mapping the requested stage onto what we support is surprisingly
 * complicated, mainly because the spec allows S1+S2 SMMUs without
@@ -981,7 +977,7 @@ static void arm_smmu_destroy_domain_context(struct 
iommu_domain *domain)
void __iomem *cb_base;
int irq;
 
-   if (!smmu || domain->type == IOMMU_DOMAIN_DMA)
+   if (!smmu)
return;
 
/*
@@ -1015,8 +1011,8 @@ static struct iommu_domain 
*arm_smmu_domain_alloc(unsigned type)
if (!smmu_domain)
return NULL;
 
-   if (type == IOMMU_DOMAIN_DMA &&
-   iommu_get_dma_cookie(&smmu_domain->domain)) {
+   if (type == IOMMU_DOMAIN_DMA && (using_legacy_binding ||
+   iommu_get_dma_cookie(&smmu_domain->domain))) {
kfree(smmu_domain);
return NULL;
}
@@ -1133,19 +1129,22 @@ static int arm_smmu_master_alloc_smes(struct device 
*dev)
mutex_lock(&smmu->stream_map_mutex);
/* Figure out a viable stream map entry allocation */
for_each_cfg_sme(fwspec, i, idx) {
+   u16 sid = fwspec->ids[i];
+   u16 mask = fwspec->ids[i] >> SMR_MASK_SHIFT;
+
if (idx != INVALID_SMENDX) {
ret = -EEXIST;
goto out_err;
}
 
-   ret = arm_smmu_find_sme(smmu, fwspec->ids[i], 0);
+   ret = arm_smmu_find_sme(smmu, sid, mask);
if (ret < 0)
goto out_err;
 
idx = ret;
if (smrs && smmu->s2crs[idx].count == 0) {
-   smrs[idx].id = fwspec->ids[i];
-   smrs[idx].mask = 0; /* We don't currently share SMRs */
+   smrs[idx].id = sid;
+   smrs[idx].mask = mask;
smrs[idx].valid = true;
}
smmu->s2crs[idx].count++;
@@ -1203,15 +1202,6 @@ static int arm_smmu_domain_add_master(struct 
arm_smmu_domain *smmu_domain,
u8 cbndx = smmu_domain->cfg.cbndx;
int i, idx;
 
-   /*
-* FIXME: This won't be needed once we have IOMMU-backed DMA ops
-* for all devices behind the SMMU. Note that we need to take
-* care configuring SMRs for devices both a platform_device and
-* and a PCI device (i.e. a PCI host controller)
-*/
-   if (smmu_domain->domain.type == IOMMU_DOMAIN_DMA)
-   type = S2CR_TYPE_BYPASS;
-
for_each_cfg_sme(fwspec, i, idx) {
if (type == s2cr[idx].type && cbndx == s2cr[idx].cbndx)
continue;
@@ -1373,25 +1363,50 @@ static bool arm_smmu_capable(enum iommu_cap cap)
}
 }
 
+static int arm_smmu_match_node(struct device *dev, void *data)
+{
+   return dev->of_node == data;
+}
+
+static struct arm_smmu_device *arm_smmu_get_by_node(struct device_node *np)
+{
+   struct device *dev = driver_find_device(&arm_smmu_driver.driver, NULL,
+   np, arm_smmu_match_node);
+   put_device(dev);
+   return dev ? dev_get_drvdata(dev) : NULL;
+}
+
 static int arm_smmu_add_device(struct device *dev)
 {
struct arm_smmu_device *smmu;
struct arm_smmu_master_cfg *cfg;
-   struct iommu_fwspec *fwspec;
+   struct iommu_fwspec *fwspec = dev->iommu_fwspec;
int i, ret;
 
-   ret = arm_smmu_register_legacy_master(dev, &smmu);
-   fwspec = dev->iommu_fwspec;
-   if (ret)
-   goto out_free;
+   if (using_legacy_binding) {
+   ret = 

[PATCH v7.1 13/22] iommu/arm-smmu: Refactor mmu-masters handling

2016-09-14 Thread Robin Murphy
To be able to support the generic bindings and handle of_xlate() calls,
we need to be able to associate SMMUs and stream IDs directly with
devices *before* allocating IOMMU groups. Furthermore, to support real
default domains with multi-device groups we also have to handle domain
attach on a per-device basis, as the "whole group at a time" assumption
fails to properly handle subsequent devices added to a group after the
first has already triggered default domain creation and attachment.

To that end, use the now-vacant dev->archdata.iommu field for easy
config and SMMU instance lookup, and unify config management by chopping
down the platform-device-specific tree and probing the "mmu-masters"
property on-demand instead. This may add a bit of one-off overhead to
initially adding a new device, but we're about to deprecate that binding
in favour of the inherently-more-efficient generic ones anyway.

For the sake of simplicity, this patch does temporarily regress the case
of aliasing PCI devices by losing the duplicate stream ID detection that
the previous per-group config had. Stay tuned, because we'll be back to
fix that in a better and more general way momentarily...

Tested-by: Lorenzo Pieralisi 
Signed-off-by: Robin Murphy 
---

- Fix __find_legacy_master_phandle() to properly cope with more than 
  one SMMU using the legacy binding.

 drivers/iommu/arm-smmu.c | 382 +--
 1 file changed, 107 insertions(+), 275 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 69b6cab65421..2023a77015a0 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -317,18 +317,13 @@ struct arm_smmu_smr {
 };
 
 struct arm_smmu_master_cfg {
+   struct arm_smmu_device  *smmu;
int num_streamids;
u16 streamids[MAX_MASTER_STREAMIDS];
s16 smendx[MAX_MASTER_STREAMIDS];
 };
 #define INVALID_SMENDX -1
 
-struct arm_smmu_master {
-   struct device_node  *of_node;
-   struct rb_node  node;
-   struct arm_smmu_master_cfg  cfg;
-};
-
 struct arm_smmu_device {
struct device   *dev;
 
@@ -376,7 +371,6 @@ struct arm_smmu_device {
unsigned int*irqs;
 
struct list_headlist;
-   struct rb_root  masters;
 
u32 cavium_id_base; /* Specific to Cavium */
 };
@@ -415,12 +409,6 @@ struct arm_smmu_domain {
struct iommu_domain domain;
 };
 
-struct arm_smmu_phandle_args {
-   struct device_node *np;
-   int args_count;
-   uint32_t args[MAX_MASTER_STREAMIDS];
-};
-
 static DEFINE_SPINLOCK(arm_smmu_devices_lock);
 static LIST_HEAD(arm_smmu_devices);
 
@@ -462,132 +450,89 @@ static struct device_node *dev_get_dev_node(struct 
device *dev)
 
while (!pci_is_root_bus(bus))
bus = bus->parent;
-   return bus->bridge->parent->of_node;
+   return of_node_get(bus->bridge->parent->of_node);
}
 
-   return dev->of_node;
+   return of_node_get(dev->of_node);
 }
 
-static struct arm_smmu_master *find_smmu_master(struct arm_smmu_device *smmu,
-   struct device_node *dev_node)
+static int __arm_smmu_get_pci_sid(struct pci_dev *pdev, u16 alias, void *data)
 {
-   struct rb_node *node = smmu->masters.rb_node;
-
-   while (node) {
-   struct arm_smmu_master *master;
-
-   master = container_of(node, struct arm_smmu_master, node);
-
-   if (dev_node < master->of_node)
-   node = node->rb_left;
-   else if (dev_node > master->of_node)
-   node = node->rb_right;
-   else
-   return master;
-   }
-
-   return NULL;
+   *((__be32 *)data) = cpu_to_be32(alias);
+   return 0; /* Continue walking */
 }
 
-static struct arm_smmu_master_cfg *
-find_smmu_master_cfg(struct device *dev)
+static int __find_legacy_master_phandle(struct device *dev, void *data)
 {
-   struct arm_smmu_master_cfg *cfg = NULL;
-   struct iommu_group *group = iommu_group_get(dev);
+   struct of_phandle_iterator *it = *(void **)data;
+   struct device_node *np = it->node;
+   int err;
 
-   if (group) {
-   cfg = iommu_group_get_iommudata(group);
-   iommu_group_put(group);
-   }
-
-   return cfg;
-}
-
-static int insert_smmu_master(struct arm_smmu_device *smmu,
- struct arm_smmu_master *master)
-{
-   struct rb_node **new, *parent;
-
-   new = &smmu->masters.rb_node;
-   parent = NULL;
-   while (*new) {
-   struct arm_smmu_master *this
-   = 

Re: [RFC PATCH v2 11/20] mm: Access BOOT related data in the clear

2016-09-14 Thread Tom Lendacky
On 09/12/2016 11:55 AM, Andy Lutomirski wrote:
> On Aug 22, 2016 6:53 PM, "Tom Lendacky"  wrote:
>>
>> BOOT data (such as EFI related data) is not encrypted when the system is
>> booted and needs to be accessed as non-encrypted.  Add support to the
>> early_memremap API to identify the type of data being accessed so that
>> the proper encryption attribute can be applied.  Currently, two types
>> of data are defined, KERNEL_DATA and BOOT_DATA.
> 
> What happens when you memremap boot services data outside of early
> boot?  Matt just added code that does this.
> 
> IMO this API is not so great.  It scatters a specialized consideration
> all over the place.  Could early_memremap not look up the PA to figure
> out what to do?

Yes, I could see if the PA falls outside of the kernel usable area and,
if so, remove the memory encryption attribute from the mapping (for both
early_memremap and memremap).

Let me look into that, I would prefer something along that line over
this change.
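
(For illustration, one possible shape of that direction; the helper name and
the boundary check are purely illustrative, not a posted patch.)

static pgprot_t __init sme_early_memremap_prot(resource_size_t phys,
					       unsigned long size, pgprot_t prot)
{
	if (!sme_me_mask)
		return prot;

	/* Kernel text/data is encrypted in place; BOOT data such as EFI
	 * tables outside that range is still in the clear at this point. */
	if (phys >= __pa_symbol(_text) && phys + size <= __pa_symbol(_end))
		return prot;

	return __pgprot(pgprot_val(prot) & ~sme_me_mask);
}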

Thanks,
Tom

> 
> --Andy
> 


Re: [RFC PATCH v2 16/20] x86: Check for memory encryption on the APs

2016-09-14 Thread Tom Lendacky


On 09/12/2016 11:43 AM, Borislav Petkov wrote:
> On Mon, Aug 22, 2016 at 05:38:29PM -0500, Tom Lendacky wrote:
>> Add support to check if memory encryption is active in the kernel and that
>> it has been enabled on the AP. If memory encryption is active in the kernel
> 
> A small nit: let's write out "AP" the first time at least: "... on the
> Application Processors (AP)." for more clarity.

Will do.

Thanks,
Tom

> 


Re: [RFC PATCH v2 10/20] x86: Insure that memory areas are encrypted when possible

2016-09-14 Thread Tom Lendacky
On 09/12/2016 11:33 AM, Borislav Petkov wrote:
> On Mon, Sep 12, 2016 at 10:05:36AM -0500, Tom Lendacky wrote:
>> I can look into that.  The reason I put this here is this is all the
>> early page fault support that is very specific to this file. I modified
>> an existing static function to take advantage of the mapping support.
> 
> Yeah, but all this code is SME-specific and doesn't belong there.
> AFAICT, it uses global/public symbols so there shouldn't be a problem to
> have it in mem_encrypt.c.

Ok, I'll look into moving this into mem_encrypt.c. I'd like to avoid
duplicating code so I may have to make that static function external
unless I find a better way.

Thanks,
Tom

> 
>> Hmmm, maybe... With the change to the early_memremap() the initrd is now
>> identified as BOOT_DATA in relocate_initrd() and so it will be mapped
>> and copied as non-encrypted data. But since it was encrypted before the
>> call to relocate_initrd(), it will copy encrypted bytes which will later
>> be accessed encrypted. That isn't clear though, so I'll rework
>> reserve_initrd() to perform the sme_early_mem_enc() once at the end
>> whether the initrd is re-located or not.
> 
> Makes sense.
> 


Re: [RFC PATCH v2 18/20] x86/kvm: Enable Secure Memory Encryption of nested page tables

2016-09-14 Thread Tom Lendacky
On 09/12/2016 09:35 AM, Borislav Petkov wrote:
> On Mon, Aug 22, 2016 at 05:38:49PM -0500, Tom Lendacky wrote:
>> Update the KVM support to include the memory encryption mask when creating
>> and using nested page tables.
>>
>> Signed-off-by: Tom Lendacky 
>> ---
>>  arch/x86/include/asm/kvm_host.h |3 ++-
>>  arch/x86/kvm/mmu.c  |8 ++--
>>  arch/x86/kvm/vmx.c  |3 ++-
>>  arch/x86/kvm/x86.c  |3 ++-
>>  4 files changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h 
>> b/arch/x86/include/asm/kvm_host.h
>> index 33ae3a4..c51c1cb 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1039,7 +1039,8 @@ void kvm_mmu_setup(struct kvm_vcpu *vcpu);
>>  void kvm_mmu_init_vm(struct kvm *kvm);
>>  void kvm_mmu_uninit_vm(struct kvm *kvm);
>>  void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
>> -u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask);
>> +u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask,
>> +u64 me_mask);
> 
> Why do you need a separate mask?
> 
> arch/x86/kvm/mmu.c::set_spte() ORs in shadow_present_mask
> unconditionally. So you can simply do:
> 
> 
>   kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
> PT_DIRTY_MASK, PT64_NX_MASK, 0,
> PT_PRESENT_MASK | sme_me_mask);
> 
> and have this change much simpler.

Just keeping in step with the way this is done for the other masks.
They each have a specific meaning so I was trying not to combine
them in case they may be used differently in the future.

> 
>>  void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
>>  void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 3d4cc8cc..a7040f4 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -122,7 +122,7 @@ module_param(dbg, bool, 0644);
>>  * PT32_LEVEL_BITS))) - 1))
>>  
>>  #define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | 
>> shadow_user_mask \
>> -| shadow_x_mask | shadow_nx_mask)
>> +| shadow_x_mask | shadow_nx_mask | shadow_me_mask)
> 
> This would be sme_me_mask, of course, like with the baremetal masks.
> 
> Or am I missing something?

Given the current patch, this would be shadow_me_mask since that value
is set by the call to kvm_mmu_set_mask_ptes.

Thanks,
Tom

> 


Re: [RFC PATCH v2 16/20] x86: Check for memory encryption on the APs

2016-09-14 Thread Tom Lendacky
On 09/12/2016 07:17 AM, Borislav Petkov wrote:
> On Mon, Aug 22, 2016 at 05:38:29PM -0500, Tom Lendacky wrote:
>> Add support to check if memory encryption is active in the kernel and that
>> it has been enabled on the AP. If memory encryption is active in the kernel
>> but has not been enabled on the AP then do not allow the AP to continue
>> start up.
>>
>> Signed-off-by: Tom Lendacky 
>> ---
>>  arch/x86/include/asm/msr-index.h |2 ++
>>  arch/x86/include/asm/realmode.h  |   12 
>>  arch/x86/realmode/init.c |4 
>>  arch/x86/realmode/rm/trampoline_64.S |   19 +++
>>  4 files changed, 37 insertions(+)
> 
> ...
> 
>> diff --git a/arch/x86/realmode/rm/trampoline_64.S 
>> b/arch/x86/realmode/rm/trampoline_64.S
>> index dac7b20..94e29f4 100644
>> --- a/arch/x86/realmode/rm/trampoline_64.S
>> +++ b/arch/x86/realmode/rm/trampoline_64.S
>> @@ -30,6 +30,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include "realmode.h"
>>  
>>  .text
>> @@ -92,6 +93,23 @@ ENTRY(startup_32)
>>  movl%edx, %fs
>>  movl%edx, %gs
>>  
>> +/* Check for memory encryption support */
>> +bt  $TH_FLAGS_SME_ENABLE_BIT, pa_tr_flags
>> +jnc .Ldone
>> +movl$MSR_K8_SYSCFG, %ecx
>> +rdmsr
>> +bt  $MSR_K8_SYSCFG_MEM_ENCRYPT_BIT, %eax
>> +jc  .Ldone
>> +
>> +/*
>> + * Memory encryption is enabled but the MSR has not been set on this
>> + * CPU so we can't continue
> 
> Hmm, let me try to parse this correctly: BSP has SME enabled but the
> BIOS might not've set this on the AP? Really? Is that even possible?

Anything is possible, although it's highly unlikely.

> 
> Because if SME is enabled, that means that MSR_K8_SYSCFG[23] on the BSP
> is set, right?

Correct.

> 
> Also, I want to rule out here simple BIOS idiocy: if the only problem
> with the bit not being set in the AP is because some BIOS monkey forgot
> to do so, then we should try to set it ourselves and not die for no real
> reason.

Yes, we can do that.  I was debating which way to go with this. Most
likely this would never happen, but if it did...  I can change this to
set the MSR bit and continue.
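
(For illustration, a C equivalent of the change being discussed, using the MSR
bit name from the patch above; the real check sits in the real-mode trampoline
assembly, so this is a sketch of the logic only.)

	u64 syscfg;

	rdmsrl(MSR_K8_SYSCFG, syscfg);
	if (!(syscfg & BIT_ULL(MSR_K8_SYSCFG_MEM_ENCRYPT_BIT))) {
		/* BIOS forgot to enable SME on this AP: set it and continue
		 * instead of refusing to start the CPU. */
		syscfg |= BIT_ULL(MSR_K8_SYSCFG_MEM_ENCRYPT_BIT);
		wrmsrl(MSR_K8_SYSCFG, syscfg);
	}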

Thanks,
Tom

> 
> Or is there another issue?
> 


Re: [RFC PATCH v2 15/20] iommu/amd: AMD IOMMU support for memory encryption

2016-09-14 Thread Tom Lendacky
On 09/12/2016 06:45 AM, Borislav Petkov wrote:
> On Mon, Aug 22, 2016 at 05:38:20PM -0500, Tom Lendacky wrote:
>> Add support to the AMD IOMMU driver to set the memory encryption mask if
>> memory encryption is enabled.
>>
>> Signed-off-by: Tom Lendacky 
>> ---
>>  arch/x86/include/asm/mem_encrypt.h |2 ++
>>  arch/x86/mm/mem_encrypt.c  |5 +
>>  drivers/iommu/amd_iommu.c  |   10 ++
>>  3 files changed, 17 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/mem_encrypt.h 
>> b/arch/x86/include/asm/mem_encrypt.h
>> index 384fdfb..e395729 100644
>> --- a/arch/x86/include/asm/mem_encrypt.h
>> +++ b/arch/x86/include/asm/mem_encrypt.h
>> @@ -36,6 +36,8 @@ void __init sme_early_init(void);
>>  /* Architecture __weak replacement functions */
>>  void __init mem_encrypt_init(void);
>>  
>> +unsigned long amd_iommu_get_me_mask(void);
>> +
>>  unsigned long swiotlb_get_me_mask(void);
>>  void swiotlb_set_mem_dec(void *vaddr, unsigned long size);
>>  
>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
>> index 6b2e8bf..2f28d87 100644
>> --- a/arch/x86/mm/mem_encrypt.c
>> +++ b/arch/x86/mm/mem_encrypt.c
>> @@ -185,6 +185,11 @@ void __init mem_encrypt_init(void)
>>  swiotlb_clear_encryption();
>>  }
>>  
>> +unsigned long amd_iommu_get_me_mask(void)
>> +{
>> +return sme_me_mask;
>> +}
>> +
>>  unsigned long swiotlb_get_me_mask(void)
>>  {
>>  return sme_me_mask;
>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>> index 96de97a..63995e3 100644
>> --- a/drivers/iommu/amd_iommu.c
>> +++ b/drivers/iommu/amd_iommu.c
>> @@ -166,6 +166,15 @@ struct dma_ops_domain {
>>  static struct iova_domain reserved_iova_ranges;
>>  static struct lock_class_key reserved_rbtree_key;
>>  
>> +/*
>> + * Support for memory encryption. If memory encryption is supported, then an
>> + * override to this function will be provided.
>> + */
>> +unsigned long __weak amd_iommu_get_me_mask(void)
>> +{
>> +return 0;
>> +}
> 
> So instead of adding a function each time which returns sme_me_mask
> for each user it has, why don't you add a single function which
> returns sme_me_mask in mem_encrypt.c and add an inline in the header
> mem_encrypt.h which returns 0 for the !CONFIG_AMD_MEM_ENCRYPT case.

Currently, mem_encrypt.h only lives in the arch/x86 directory so it
wouldn't be able to be included here without breaking other archs.

> 
> This all is still funny because we access sme_me_mask directly for the
> different KERNEL_* masks but then you're adding an accessor function.

Because this lives outside of arch/x86, I need to use the weak
function.

> 
> So what you should do instead, IMHO, is either hide sme_me_mask
> altogether and use the accessor functions only (not sure if that would
> work in all cases) or expose sme_me_mask unconditionally and have it be
> 0 if CONFIG_AMD_MEM_ENCRYPT is not enabled so that it just works.
> 
> Or is there a third, more graceful variant?

Is there a better way to do this given the support is only in x86?
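
(For illustration, the header-side pattern being suggested would look roughly
like this; a sketch only, not a posted patch.)

/* in a header guarded appropriately: */
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern unsigned long sme_me_mask;
#else
#define sme_me_mask	0UL
#endif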

Thanks,
Tom

> 


Re: [RFC PATCH v2 14/20] x86: DMA support for memory encryption

2016-09-14 Thread Tom Lendacky
On 09/12/2016 05:58 AM, Borislav Petkov wrote:
> On Mon, Aug 22, 2016 at 05:38:07PM -0500, Tom Lendacky wrote:
>> Since DMA addresses will effectively look like 48-bit addresses when the
>> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
>> device performing the DMA does not support 48-bits. SWIOTLB will be
>> initialized to create un-encrypted bounce buffers for use by these devices.
>>
>> Signed-off-by: Tom Lendacky 
>> ---
>>  arch/x86/include/asm/dma-mapping.h |5 ++-
>>  arch/x86/include/asm/mem_encrypt.h |6 +++
>>  arch/x86/kernel/pci-dma.c  |   11 --
>>  arch/x86/kernel/pci-nommu.c|2 +
>>  arch/x86/kernel/pci-swiotlb.c  |8 +++--
>>  arch/x86/mm/mem_encrypt.c  |   22 
>>  include/linux/swiotlb.h|1 +
>>  init/main.c|   13 +++
>>  lib/swiotlb.c  |   64 
>> 
>>  9 files changed, 115 insertions(+), 17 deletions(-)
> 
> ...
> 
>> @@ -172,3 +174,23 @@ void __init sme_early_init(void)
>>  for (i = 0; i < ARRAY_SIZE(protection_map); i++)
>>  protection_map[i] = __pgprot(pgprot_val(protection_map[i]) | 
>> sme_me_mask);
>>  }
>> +
>> +/* Architecture __weak replacement functions */
>> +void __init mem_encrypt_init(void)
>> +{
>> +if (!sme_me_mask)
>> +return;
>> +
>> +/* Make SWIOTLB use an unencrypted DMA area */
>> +swiotlb_clear_encryption();
>> +}
>> +
>> +unsigned long swiotlb_get_me_mask(void)
> 
> This could just as well be named to something more generic:
> 
> swiotlb_get_clear_dma_mask() or so which basically means the mask of
> bits which get cleared before returning DMA addresses...

Ok.

> 
>> +{
>> +return sme_me_mask;
>> +}
>> +
>> +void swiotlb_set_mem_dec(void *vaddr, unsigned long size)
>> +{
>> +sme_set_mem_dec(vaddr, size);
>> +}
>> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
>> index 5f81f8a..5c909fc 100644
>> --- a/include/linux/swiotlb.h
>> +++ b/include/linux/swiotlb.h
>> @@ -29,6 +29,7 @@ int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, 
>> int verbose);
>>  extern unsigned long swiotlb_nr_tbl(void);
>>  unsigned long swiotlb_size_or_default(void);
>>  extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
>> +extern void __init swiotlb_clear_encryption(void);
>>  
>>  /*
>>   * Enumeration for sync targets
>> diff --git a/init/main.c b/init/main.c
>> index a8a58e2..82c7cd9 100644
>> --- a/init/main.c
>> +++ b/init/main.c
>> @@ -458,6 +458,10 @@ void __init __weak thread_stack_cache_init(void)
>>  }
>>  #endif
>>  
>> +void __init __weak mem_encrypt_init(void)
>> +{
>> +}
>> +
>>  /*
>>   * Set up kernel memory allocators
>>   */
>> @@ -598,6 +602,15 @@ asmlinkage __visible void __init start_kernel(void)
>>   */
>>  locking_selftest();
>>  
>> +/*
>> + * This needs to be called before any devices perform DMA
>> + * operations that might use the swiotlb bounce buffers.
>> + * This call will mark the bounce buffers as un-encrypted so
>> + * that the usage of them will not cause "plain-text" data
> 
>   ...that their usage will not cause ...

Ok.

> 
>> + * to be decrypted when accessed.
>> + */
>> +mem_encrypt_init();
>> +
>>  #ifdef CONFIG_BLK_DEV_INITRD
>>  if (initrd_start && !initrd_below_start_ok &&
>>  page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
>> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
>> index 22e13a0..15d5741 100644
>> --- a/lib/swiotlb.c
>> +++ b/lib/swiotlb.c
>> @@ -131,6 +131,26 @@ unsigned long swiotlb_size_or_default(void)
>>  return size ? size : (IO_TLB_DEFAULT_SIZE);
>>  }
>>  
>> +/*
>> + * Support for memory encryption. If memory encryption is supported, then an
>> + * override to these functions will be provided.
>> + */
> 
> No need for that comment.

Ok.

> 
>> +unsigned long __weak swiotlb_get_me_mask(void)
>> +{
>> +return 0;
>> +}
>> +
>> +void __weak swiotlb_set_mem_dec(void *vaddr, unsigned long size)
>> +{
>> +}
>> +
>> +/* For swiotlb, clear memory encryption mask from dma addresses */
>> +static dma_addr_t swiotlb_phys_to_dma(struct device *hwdev,
>> +  phys_addr_t address)
>> +{
>> +return phys_to_dma(hwdev, address) & ~swiotlb_get_me_mask();
>> +}
>> +
>>  /* Note that this doesn't work with highmem page */
>>  static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
>>volatile void *address)
>> @@ -159,6 +179,30 @@ void swiotlb_print_info(void)
>> bytes >> 20, vstart, vend - 1);
>>  }
>>  
>> +/*
>> + * If memory encryption is active, the DMA address for an encrypted page may
>> + * be beyond the range of the device. If bounce buffers are required be sure
>> + * that they are not on an encrypted page. This should be called before the
>> + * iotlb area is used.
>> + */
>> +void __init 

Re: [PATCH v7 22/22] iommu/dma: Avoid PCI host bridge windows

2016-09-14 Thread Robin Murphy
On 14/09/16 13:35, Marek Szyprowski wrote:
> Hi Robin,
> 
> 
> On 2016-09-14 13:10, Robin Murphy wrote:
>> On 14/09/16 11:55, Marek Szyprowski wrote:
>>> On 2016-09-12 18:14, Robin Murphy wrote:
 With our DMA ops enabled for PCI devices, we should avoid allocating
 IOVAs which a host bridge might misinterpret as peer-to-peer DMA and
 lead to faults, corruption or other badness. To be safe, punch out
 holes
 for all of the relevant host bridge's windows when initialising a DMA
 domain for a PCI device.

 CC: Marek Szyprowski 
 CC: Inki Dae 
 Reported-by: Lorenzo Pieralisi 
 Signed-off-by: Robin Murphy 
>>> I don't know much about PCI and their IOMMU integration, but can't we
>>> use
>>> the direct mapping region feature of iommu core for it? There are
>>> already
>>> iommu_get_dm_regions(), iommu_put_dm_regions() and
>>> iommu_request_dm_for_dev()
>>> functions for handling them...
>> It's rather the opposite problem - in the direct-mapping case, we're
>> making sure the iommu_domain has translations installed for the given
>> IOVAs (which are also the corresponding physical address) before it goes
>> live, whereas what we need to do here is make sure these addresses
>> never get used as IOVAs at all, because any attempt to use them will
>> likely go wrong. Thus we carve them out of the iova_domain such that
>> they will never get near an actual IOMMU API call.
>>
>> This is a slightly generalised equivalent of e.g. amd_iommu.c's
>> init_reserved_iova_ranges().
> 
> Hmmm. Each dm_region has a protection parameter. Can't we reuse them to
> create prohibited/reserved regions by setting it to 0 (no read / no write)
> and mapping to physical 0 address? That's just a quick idea, because
> dm_regions and the proposed code for pci looks a bit similar for me...

It might look similar, but at different levels (iommu_domain vs.
iova_domain) and with the opposite intent. The dm_region prot flag is
just the standard flag as passed to iommu_map() - trying to map a region
with no access in an empty pagetable isn't going to achieve anything
anyway (it's effectively unmapping addresses that are already unmapped).

But for this case, even if you _did_ map something in the pagetable
(i.e. the iommu_domain), it wouldn't make any difference, because the
thing we're mitigating against is handing out addresses which are going
to cause a device's accesses to blow up inside the PCI root complex
without ever even reaching the IOMMU. In short, dm_regions are about
"these addresses are already being used for DMA, so make sure the IOMMU
API doesn't block them", whereas reserved ranges are about "these
addresses are unusable for DMA, so make sure the DMA API can't allocate
them".

Robin.

> 
> [...]
> 
> Best regards



Re: [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU

2016-09-14 Thread Robin Murphy
On 14/09/16 13:32, Auger Eric wrote:
> Hi,
> On 14/09/2016 12:35, Robin Murphy wrote:
>> On 14/09/16 09:41, Auger Eric wrote:
>>> Hi,
>>>
>>> On 12/09/2016 18:13, Robin Murphy wrote:
 Hi all,

 To avoid any more confusing fixups and crazily numbered extra patches, here's
 a quick v7 with everything rebased into the right order. The significant
 change this time is to implement iommu_fwspec properly from the start,
 which ends up being far simpler and more robust than faffing about
 introducing it somewhere 'less intrusive' to move toward core code later.

 New branch in the logical place:

 git://linux-arm.org/linux-rm iommu/generic-v7
>>>
>>> For information, as discussed privately with Robin I experience some
>>> regressions with the former and now deprecated dt description.
>>>
>>> on my AMD Overdrive board and my old dt description I now only see a
>>> single group:
>>>
>>> /sys/kernel/iommu_groups/
>>> /sys/kernel/iommu_groups/0
>>> /sys/kernel/iommu_groups/0/devices
>>> /sys/kernel/iommu_groups/0/devices/e070.xgmac
>>>
>>> whereas I formerly see
>>>
>>> /sys/kernel/iommu_groups/
>>> /sys/kernel/iommu_groups/3
>>> /sys/kernel/iommu_groups/3/devices
>>> /sys/kernel/iommu_groups/3/devices/:00:00.0
>>> /sys/kernel/iommu_groups/1
>>> /sys/kernel/iommu_groups/1/devices
>>> /sys/kernel/iommu_groups/1/devices/e070.xgmac
>>> /sys/kernel/iommu_groups/4
>>> /sys/kernel/iommu_groups/4/devices
>>> /sys/kernel/iommu_groups/4/devices/:00:02.2
>>> /sys/kernel/iommu_groups/4/devices/:01:00.1
>>> /sys/kernel/iommu_groups/4/devices/:00:02.0
>>> /sys/kernel/iommu_groups/4/devices/:01:00.0
>>> /sys/kernel/iommu_groups/2
>>> /sys/kernel/iommu_groups/2/devices
>>> /sys/kernel/iommu_groups/2/devices/e090.xgmac
>>> /sys/kernel/iommu_groups/0
>>> /sys/kernel/iommu_groups/0/devices
>>> /sys/kernel/iommu_groups/0/devices/f000.pcie
>>>
>>> This is the group topology without ACS override. Applying the non
>>> upstreamed "pci: Enable overrides for missing ACS capabilities" I used
>>> to see separate groups for each PCIe components. Now I don't see any
>>> difference with and without ACS override.
>>
>> OK, having reproduced on my Juno, the problem looks to be that
>> of_for_each_phandle() leaves err set to -ENOENT after successfully
>> walking a phandle list, which makes __find_legacy_master_phandle()
>> always bail out after the first SMMU.
>>
>> Can you confirm that the following diff fixes things for you?
> 
> Well it improves but there are still differences in the group topology.
> The PFs now are in group 0.
> 
> root@trusty:~# lspci -nk
> 00:00.0 0600: 1022:1a00
> Subsystem: 1022:1a00
> 00:02.0 0600: 1022:1a01
> 00:02.2 0604: 1022:1a02
> Kernel driver in use: pcieport
> 01:00.0 0200: 8086:1521 (rev 01)
> Subsystem: 8086:0002
> Kernel driver in use: igb
> 01:00.1 0200: 8086:1521 (rev 01)
> Subsystem: 8086:0002
> Kernel driver in use: igb
> 
> 
> with your series + fix:
> /sys/kernel/iommu_groups/
> /sys/kernel/iommu_groups/3
> /sys/kernel/iommu_groups/3/devices
> /sys/kernel/iommu_groups/3/devices/:00:00.0
> /sys/kernel/iommu_groups/1
> /sys/kernel/iommu_groups/1/devices
> /sys/kernel/iommu_groups/1/devices/e070.xgmac
> /sys/kernel/iommu_groups/4
> /sys/kernel/iommu_groups/4/devices
> /sys/kernel/iommu_groups/4/devices/:00:02.2
> /sys/kernel/iommu_groups/4/devices/:00:02.0
> /sys/kernel/iommu_groups/2
> /sys/kernel/iommu_groups/2/devices
> /sys/kernel/iommu_groups/2/devices/e090.xgmac
> /sys/kernel/iommu_groups/0
> /sys/kernel/iommu_groups/0/devices
> /sys/kernel/iommu_groups/0/devices/:01:00.1
> /sys/kernel/iommu_groups/0/devices/f000.pcie
> /sys/kernel/iommu_groups/0/devices/:01:00.0
> 
> Before (4.8-rc5):
> 
> /sys/kernel/iommu_groups/
> /sys/kernel/iommu_groups/3
> /sys/kernel/iommu_groups/3/devices
> /sys/kernel/iommu_groups/3/devices/:00:00.0
> /sys/kernel/iommu_groups/1
> /sys/kernel/iommu_groups/1/devices
> /sys/kernel/iommu_groups/1/devices/e070.xgmac
> /sys/kernel/iommu_groups/4
> /sys/kernel/iommu_groups/4/devices
> /sys/kernel/iommu_groups/4/devices/:00:02.2
> /sys/kernel/iommu_groups/4/devices/:01:00.1
> /sys/kernel/iommu_groups/4/devices/:00:02.0
> /sys/kernel/iommu_groups/4/devices/:01:00.0
> /sys/kernel/iommu_groups/2
> /sys/kernel/iommu_groups/2/devices
> /sys/kernel/iommu_groups/2/devices/e090.xgmac
> /sys/kernel/iommu_groups/0
> /sys/kernel/iommu_groups/0/devices
> /sys/kernel/iommu_groups/0/devices/f000.pcie

Your DT claims that f000.pcie (i.e. the "platform device" side of
the host controller) owns the IDs 0x100 0x101 0x102 0x103 0x200 0x201
0x202 0x203 0x300 0x301 0x302 0x303 0x400 0x401 0x402 0x403. Thus when
new devices (the PCI PFs) come along *also* claiming those IDs (via the
RID-to-SID assumption), we now detect the aliasing and assign them to
the existing group.

The only way that DT worked without SMR 

[PATCH] iommu/exynos: Improve page fault debug message

2016-09-14 Thread Marek Szyprowski
Add master device name to default IOMMU fault message to make easier to
find which device triggered the fault. While at it, move printing some
information (like page table base and first level entry addresses) to
dev_dbg(), because those are typically not very useful for typical device
driver user/developer not equipped with hardware debugging tools.

Signed-off-by: Marek Szyprowski 
---
 drivers/iommu/exynos-iommu.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 30808e9..47f0d67 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -368,13 +368,14 @@ static void show_fault_information(struct sysmmu_drvdata 
*data,
 {
sysmmu_pte_t *ent;
 
-   dev_err(data->sysmmu, "%s FAULT occurred at %#x (page table base: %pa)\n",
-   finfo->name, fault_addr, &data->pgtable);
+   dev_err(data->sysmmu, "%s: %s FAULT occurred at %#x\n",
+   dev_name(data->master), finfo->name, fault_addr);
+   dev_dbg(data->sysmmu, "Page table base: %pa\n", &data->pgtable);
ent = section_entry(phys_to_virt(data->pgtable), fault_addr);
-   dev_err(data->sysmmu, "\tLv1 entry: %#x\n", *ent);
+   dev_dbg(data->sysmmu, "\tLv1 entry: %#x\n", *ent);
if (lv1ent_page(ent)) {
ent = page_entry(ent, fault_addr);
-   dev_err(data->sysmmu, "\t Lv2 entry: %#x\n", *ent);
+   dev_dbg(data->sysmmu, "\t Lv2 entry: %#x\n", *ent);
}
 }
 
-- 
1.9.1



Re: [PATCH v7 22/22] iommu/dma: Avoid PCI host bridge windows

2016-09-14 Thread Marek Szyprowski

Hi Robin,


On 2016-09-14 13:10, Robin Murphy wrote:

On 14/09/16 11:55, Marek Szyprowski wrote:

On 2016-09-12 18:14, Robin Murphy wrote:

With our DMA ops enabled for PCI devices, we should avoid allocating
IOVAs which a host bridge might misinterpret as peer-to-peer DMA and
lead to faults, corruption or other badness. To be safe, punch out holes
for all of the relevant host bridge's windows when initialising a DMA
domain for a PCI device.

CC: Marek Szyprowski 
CC: Inki Dae 
Reported-by: Lorenzo Pieralisi 
Signed-off-by: Robin Murphy 

I don't know much about PCI and their IOMMU integration, but can't we use
the direct mapping region feature of iommu core for it? There are already
iommu_get_dm_regions(), iommu_put_dm_regions() and
iommu_request_dm_for_dev()
functions for handling them...

It's rather the opposite problem - in the direct-mapping case, we're
making sure the iommu_domain has translations installed for the given
IOVAs (which are also the corresponding physical address) before it goes
live, whereas what we need to do here is make sure these addresses
never get used as IOVAs at all, because any attempt to use them will
likely go wrong. Thus we carve them out of the iova_domain such that
they will never get near an actual IOMMU API call.

This is a slightly generalised equivalent of e.g. amd_iommu.c's
init_reserved_iova_ranges().


Hmmm. Each dm_region has a protection parameter. Can't we reuse them to
create prohibited/reserved regions by setting it to 0 (no read / no write)
and mapping to physical 0 address? That's just a quick idea, because
dm_regions and the proposed code for pci looks a bit similar for me...

[...]

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU

2016-09-14 Thread Auger Eric
Hi,
On 14/09/2016 12:35, Robin Murphy wrote:
> On 14/09/16 09:41, Auger Eric wrote:
>> Hi,
>>
>> On 12/09/2016 18:13, Robin Murphy wrote:
>>> Hi all,
>>>
>>> To avoid any more confusing fixups and crazily numbered extra patches, here's
>>> a quick v7 with everything rebased into the right order. The significant
>>> change this time is to implement iommu_fwspec properly from the start,
>>> which ends up being far simpler and more robust than faffing about
>>> introducing it somewhere 'less intrusive' to move toward core code later.
>>>
>>> New branch in the logical place:
>>>
>>> git://linux-arm.org/linux-rm iommu/generic-v7
>>
>> For information, as discussed privately with Robin I experience some
>> regressions with the former and now deprecated dt description.
>>
>> on my AMD Overdrive board and my old dt description I now only see a
>> single group:
>>
>> /sys/kernel/iommu_groups/
>> /sys/kernel/iommu_groups/0
>> /sys/kernel/iommu_groups/0/devices
>> /sys/kernel/iommu_groups/0/devices/e070.xgmac
>>
>> whereas I formerly see
>>
>> /sys/kernel/iommu_groups/
>> /sys/kernel/iommu_groups/3
>> /sys/kernel/iommu_groups/3/devices
>> /sys/kernel/iommu_groups/3/devices/:00:00.0
>> /sys/kernel/iommu_groups/1
>> /sys/kernel/iommu_groups/1/devices
>> /sys/kernel/iommu_groups/1/devices/e070.xgmac
>> /sys/kernel/iommu_groups/4
>> /sys/kernel/iommu_groups/4/devices
>> /sys/kernel/iommu_groups/4/devices/:00:02.2
>> /sys/kernel/iommu_groups/4/devices/:01:00.1
>> /sys/kernel/iommu_groups/4/devices/:00:02.0
>> /sys/kernel/iommu_groups/4/devices/:01:00.0
>> /sys/kernel/iommu_groups/2
>> /sys/kernel/iommu_groups/2/devices
>> /sys/kernel/iommu_groups/2/devices/e090.xgmac
>> /sys/kernel/iommu_groups/0
>> /sys/kernel/iommu_groups/0/devices
>> /sys/kernel/iommu_groups/0/devices/f000.pcie
>>
>> This is the group topology without ACS override. Applying the non
>> upstreamed "pci: Enable overrides for missing ACS capabilities" I used
>> to see separate groups for each PCIe components. Now I don't see any
>> difference with and without ACS override.
> 
> OK, having reproduced on my Juno, the problem looks to be that
> of_for_each_phandle() leaves err set to -ENOENT after successfully
> walking a phandle list, which makes __find_legacy_master_phandle()
> always bail out after the first SMMU.
> 
> Can you confirm that the following diff fixes things for you?

Well it improves but there are still differences in the group topology.
The PFs now are in group 0.

root@trusty:~# lspci -nk
00:00.0 0600: 1022:1a00
Subsystem: 1022:1a00
00:02.0 0600: 1022:1a01
00:02.2 0604: 1022:1a02
Kernel driver in use: pcieport
01:00.0 0200: 8086:1521 (rev 01)
Subsystem: 8086:0002
Kernel driver in use: igb
01:00.1 0200: 8086:1521 (rev 01)
Subsystem: 8086:0002
Kernel driver in use: igb


with your series + fix:
/sys/kernel/iommu_groups/
/sys/kernel/iommu_groups/3
/sys/kernel/iommu_groups/3/devices
/sys/kernel/iommu_groups/3/devices/:00:00.0
/sys/kernel/iommu_groups/1
/sys/kernel/iommu_groups/1/devices
/sys/kernel/iommu_groups/1/devices/e070.xgmac
/sys/kernel/iommu_groups/4
/sys/kernel/iommu_groups/4/devices
/sys/kernel/iommu_groups/4/devices/:00:02.2
/sys/kernel/iommu_groups/4/devices/:00:02.0
/sys/kernel/iommu_groups/2
/sys/kernel/iommu_groups/2/devices
/sys/kernel/iommu_groups/2/devices/e090.xgmac
/sys/kernel/iommu_groups/0
/sys/kernel/iommu_groups/0/devices
/sys/kernel/iommu_groups/0/devices/:01:00.1
/sys/kernel/iommu_groups/0/devices/f000.pcie
/sys/kernel/iommu_groups/0/devices/:01:00.0

Before (4.8-rc5):

/sys/kernel/iommu_groups/
/sys/kernel/iommu_groups/3
/sys/kernel/iommu_groups/3/devices
/sys/kernel/iommu_groups/3/devices/:00:00.0
/sys/kernel/iommu_groups/1
/sys/kernel/iommu_groups/1/devices
/sys/kernel/iommu_groups/1/devices/e070.xgmac
/sys/kernel/iommu_groups/4
/sys/kernel/iommu_groups/4/devices
/sys/kernel/iommu_groups/4/devices/:00:02.2
/sys/kernel/iommu_groups/4/devices/:01:00.1
/sys/kernel/iommu_groups/4/devices/:00:02.0
/sys/kernel/iommu_groups/4/devices/:01:00.0
/sys/kernel/iommu_groups/2
/sys/kernel/iommu_groups/2/devices
/sys/kernel/iommu_groups/2/devices/e090.xgmac
/sys/kernel/iommu_groups/0
/sys/kernel/iommu_groups/0/devices
/sys/kernel/iommu_groups/0/devices/f000.pcie

Thanks

Eric

> 
> Robin
> 
> --->8---
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index fa892d25004d..ac4aab97c93a 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -477,7 +477,7 @@ static int __find_legacy_master_phandle(struct
> device *dev, void *data)
>   return 1;
>   }
>   it->node = np;
> - return err;
> + return err == -ENOENT ? 0 : err;
>  }
> 
>  static struct platform_driver arm_smmu_driver;
> 
> 

Re: [PATCH v3 2/2] iommu/exynos: Add proper runtime pm support

2016-09-14 Thread Ulf Hansson
[...]

>>>
 To solve this problem, I was thinking we could convert to use the
 asynchronous pm_runtime_get() API, when trying to runtime resume the
 device from atomic contexts.
>>>
>>> I'm not sure if this will work for DMA engine devices. If I understand
>>> correctly some clients of a DMA engine device might rely on the DMA
>>> engine being configured and operational after queuing a request and
>>> they might lock up if the DMA engine device activation is postponed
>>> because of async runtime PM activation.
>>
>> I didn't know about this. If you have an example from the top of your
>> head, could you perhaps point me to it?
>
>
> No, I don't have any example. This is just my fear that there might be some
> hardware which works this way. I can imagine a driver which queues a DMA
> engine request and then triggers a 'start' command to its HW block. HW
> blocks usually use some additional signal lines for DMA, so the HW block
> might check if all needed signals from the DMA engine HW are ready. If not,
> it will fail to start, to avoid locking up by starting without the DMA
> engine HW being ready.

Well, I think this reasoning makes lots of sense! Clearly we wouldn't
be able to guarantee that there's isn't a glitch in an undefined HW
behaviour.

I will drop my suggested approach and try out another one.

>
> However I don't have much experience with DMA engines and their drivers. I
> only helped in adding DMA engine support to the Samsung UART driver, and in
> the hardware manual there is information about additional lines between the
> DMA controller and UART module, which are used in DMA mode.

Thanks for sharing your experience!

Now, I should allow you to get back to the original review! :-)

Kind regards
Uffe


Re: [PATCH v7 22/22] iommu/dma: Avoid PCI host bridge windows

2016-09-14 Thread Robin Murphy
Hi Marek,

On 14/09/16 11:55, Marek Szyprowski wrote:
> Hi Robin,
> 
> 
> On 2016-09-12 18:14, Robin Murphy wrote:
>> With our DMA ops enabled for PCI devices, we should avoid allocating
>> IOVAs which a host bridge might misinterpret as peer-to-peer DMA and
>> lead to faults, corruption or other badness. To be safe, punch out holes
>> for all of the relevant host bridge's windows when initialising a DMA
>> domain for a PCI device.
>>
>> CC: Marek Szyprowski 
>> CC: Inki Dae 
>> Reported-by: Lorenzo Pieralisi 
>> Signed-off-by: Robin Murphy 
> 
> I don't know much about PCI and their IOMMU integration, but can't we use
> the direct mapping region feature of iommu core for it? There are already
> iommu_get_dm_regions(), iommu_put_dm_regions() and
> iommu_request_dm_for_dev()
> functions for handling them...

It's rather the opposite problem - in the direct-mapping case, we're
making sure the iommu_domain has translations installed for the given
IOVAs (which are also the corresponding physical address) before it goes
live, whereas what we need to do here is make sure these addresses
never get used as IOVAs at all, because any attempt to use them will
likely go wrong. Thus we carve them out of the iova_domain such that
they will never get near an actual IOMMU API call.

This is a slightly generalised equivalent of e.g. amd_iommu.c's
init_reserved_iova_ranges().

Robin.

> 
>> ---
>>
>> - Squash in the previous drm/exynos fixup
>> - If need be, this one can probably wait
>> ---
>>   arch/arm64/mm/dma-mapping.c   |  2 +-
>>   drivers/gpu/drm/exynos/exynos_drm_iommu.h |  2 +-
>>   drivers/iommu/dma-iommu.c | 25
>> -
>>   include/linux/dma-iommu.h |  3 ++-
>>   4 files changed, 28 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
>> index c4284c432ae8..610d8e53011e 100644
>> --- a/arch/arm64/mm/dma-mapping.c
>> +++ b/arch/arm64/mm/dma-mapping.c
>> @@ -827,7 +827,7 @@ static bool do_iommu_attach(struct device *dev,
>> const struct iommu_ops *ops,
>>* then the IOMMU core will have already configured a group for
>> this
>>* device, and allocated the default domain for that group.
>>*/
>> -if (!domain || iommu_dma_init_domain(domain, dma_base, size)) {
>> +if (!domain || iommu_dma_init_domain(domain, dma_base, size, dev)) {
>>   pr_warn("Failed to set up IOMMU for device %s; retaining
>> platform DMA ops\n",
>>   dev_name(dev));
>>   return false;
>> diff --git a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
>> b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
>> index c8de4913fdbe..87f6b5672e11 100644
>> --- a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
>> +++ b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
>> @@ -66,7 +66,7 @@ static inline int
>> __exynos_iommu_create_mapping(struct exynos_drm_private *priv,
>>   if (ret)
>>   goto free_domain;
>>   -ret = iommu_dma_init_domain(domain, start, size);
>> +ret = iommu_dma_init_domain(domain, start, size, NULL);
>>   if (ret)
>>   goto put_cookie;
>>   diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index 4329d18080cf..c5ab8667e6f2 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -27,6 +27,7 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include 
>>   #include 
>>   #include 
>>   @@ -103,18 +104,38 @@ void iommu_put_dma_cookie(struct iommu_domain
>> *domain)
>>   }
>>   EXPORT_SYMBOL(iommu_put_dma_cookie);
>>   +static void iova_reserve_pci_windows(struct pci_dev *dev,
>> +struct iova_domain *iovad)
>> +{
>> +struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
>> +struct resource_entry *window;
>> +unsigned long lo, hi;
>> +
>> +resource_list_for_each_entry(window, &bridge->windows) {
>> +if (resource_type(window->res) != IORESOURCE_MEM &&
>> +resource_type(window->res) != IORESOURCE_IO)
>> +continue;
>> +
>> +lo = iova_pfn(iovad, window->res->start - window->offset);
>> +hi = iova_pfn(iovad, window->res->end - window->offset);
>> +reserve_iova(iovad, lo, hi);
>> +}
>> +}
>> +
>>   /**
>>* iommu_dma_init_domain - Initialise a DMA mapping domain
>>* @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
>>* @base: IOVA at which the mappable address space starts
>>* @size: Size of IOVA space
>> + * @dev: Device the domain is being initialised for
>>*
>>* @base and @size should be exact multiples of IOMMU page
>> granularity to
>>* avoid rounding surprises. If necessary, we reserve the page at
>> address 0
>>* to ensure it is an invalid IOVA. It is safe to reinitialise a
>> domain, but
>>* any change which could make prior IOVAs invalid will fail.
>>*/
>> -int 

Re: [PATCH v7 22/22] iommu/dma: Avoid PCI host bridge windows

2016-09-14 Thread Marek Szyprowski

Hi Robin,


On 2016-09-12 18:14, Robin Murphy wrote:

With our DMA ops enabled for PCI devices, we should avoid allocating
IOVAs which a host bridge might misinterpret as peer-to-peer DMA and
lead to faults, corruption or other badness. To be safe, punch out holes
for all of the relevant host bridge's windows when initialising a DMA
domain for a PCI device.

CC: Marek Szyprowski 
CC: Inki Dae 
Reported-by: Lorenzo Pieralisi 
Signed-off-by: Robin Murphy 


I don't know much about PCI and their IOMMU integration, but can't we use
the direct mapping region feature of iommu core for it? There are already
iommu_get_dm_regions(), iommu_put_dm_regions() and 
iommu_request_dm_for_dev()

functions for handling them...


---

- Squash in the previous drm/exynos fixup
- If need be, this one can probably wait
---
  arch/arm64/mm/dma-mapping.c   |  2 +-
  drivers/gpu/drm/exynos/exynos_drm_iommu.h |  2 +-
  drivers/iommu/dma-iommu.c | 25 -
  include/linux/dma-iommu.h |  3 ++-
  4 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index c4284c432ae8..610d8e53011e 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -827,7 +827,7 @@ static bool do_iommu_attach(struct device *dev, const 
struct iommu_ops *ops,
 * then the IOMMU core will have already configured a group for this
 * device, and allocated the default domain for that group.
 */
-   if (!domain || iommu_dma_init_domain(domain, dma_base, size)) {
+   if (!domain || iommu_dma_init_domain(domain, dma_base, size, dev)) {
pr_warn("Failed to set up IOMMU for device %s; retaining platform 
DMA ops\n",
dev_name(dev));
return false;
diff --git a/drivers/gpu/drm/exynos/exynos_drm_iommu.h 
b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
index c8de4913fdbe..87f6b5672e11 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
+++ b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
@@ -66,7 +66,7 @@ static inline int __exynos_iommu_create_mapping(struct 
exynos_drm_private *priv,
if (ret)
goto free_domain;
  
-	ret = iommu_dma_init_domain(domain, start, size);

+   ret = iommu_dma_init_domain(domain, start, size, NULL);
if (ret)
goto put_cookie;
  
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c

index 4329d18080cf..c5ab8667e6f2 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -27,6 +27,7 @@
  #include 
  #include 
  #include 
+#include <linux/pci.h>
  #include 
  #include 
  
@@ -103,18 +104,38 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)

  }
  EXPORT_SYMBOL(iommu_put_dma_cookie);
  
+static void iova_reserve_pci_windows(struct pci_dev *dev,

+   struct iova_domain *iovad)
+{
+   struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
+   struct resource_entry *window;
+   unsigned long lo, hi;
+
+   resource_list_for_each_entry(window, &bridge->windows) {
+   if (resource_type(window->res) != IORESOURCE_MEM &&
+   resource_type(window->res) != IORESOURCE_IO)
+   continue;
+
+   lo = iova_pfn(iovad, window->res->start - window->offset);
+   hi = iova_pfn(iovad, window->res->end - window->offset);
+   reserve_iova(iovad, lo, hi);
+   }
+}
+
  /**
   * iommu_dma_init_domain - Initialise a DMA mapping domain
   * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
   * @base: IOVA at which the mappable address space starts
   * @size: Size of IOVA space
+ * @dev: Device the domain is being initialised for
   *
   * @base and @size should be exact multiples of IOMMU page granularity to
   * avoid rounding surprises. If necessary, we reserve the page at address 0
   * to ensure it is an invalid IOVA. It is safe to reinitialise a domain, but
   * any change which could make prior IOVAs invalid will fail.
   */
-int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, u64 
size)
+int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
+   u64 size, struct device *dev)
  {
struct iova_domain *iovad = cookie_iovad(domain);
unsigned long order, base_pfn, end_pfn;
@@ -152,6 +173,8 @@ int iommu_dma_init_domain(struct iommu_domain *domain, 
dma_addr_t base, u64 size
iovad->dma_32bit_pfn = end_pfn;
} else {
init_iova_domain(iovad, 1UL << order, base_pfn, end_pfn);
+   if (dev && dev_is_pci(dev))
+   iova_reserve_pci_windows(to_pci_dev(dev), iovad);
}
return 0;
  }
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 5ee806e41b5c..32c589062bd9 100644
--- 

Re: [PATCH v3 2/2] iommu/exynos: Add proper runtime pm support

2016-09-14 Thread Marek Szyprowski

Hi Ulf,


On 2016-09-14 12:28, Ulf Hansson wrote:

[...]


There are some similarities between IOMMU and DMA engine devices (serial
drivers are imho completely different case). Both hw blocks do their work
on behalf of some other hardware block, which I will call master device.
DMA engine performs some DMA transaction on master's device request, while
IOMMU usually sits between system memory and master's device memory
interface, remapping addresses of each DMA transaction according to its
configuration and provided mapping tables (master device has some kind
of internal DMA controller and performs DMA transactions on their own).
IOMMU is usually used for a) mapping physically discontinuous memory into
contiguous DMA addresses and b) isolating devices, so they can access
only memory, which is dedicated or allocated for them.

DMA engine devices provide explicit API for their master's device drivers,
while IOMMU drivers are usually hidden behind DMA-mapping API (for most
use cases, although it would be possible for master's device driver to
call IOMMU API directly and some GPU/DRM drivers do that).

However from runtime pm perspective the DMA engine and IOMMU devices are
a bit different.

DMA engine drivers have well defined start and end of operation (queuing
dma request and irq from hw about having it finished). During that time
the device has to be runtime active all the time. The problem with using
current implementation of runtime pm is the fact that both start and end
of operation can be triggered from atomic context, what is not really
suitable for runtime pm. So the problem is mainly about API
incompatibility and lack of something like dma_engine_prepare()/unprepare()
(as an analogy to clocks api).

That's also a viable option. Although, DMA clients would then need to
invoke such APIs from non-atomic contexts. Typically that would be
from client driver's runtime PM callbacks.

Personally I would rather avoid such a solution, as it would sprinkle
this logic across lots of drivers. Although, perhaps this is the only
option that actually works.


One might also introduce optional functions and notify the DMA engine core with
a flag whether the client driver wants to use them or not. If not, the core will
prepare the dma engine on initialization. This is not really nice from the API
clarity perspective, but it would allow some time for the transition
to the new approach until all clients get updated.


In case of IOMMU the main problem is determining whether the IOMMU controller
has to be activated. There are no calls in the IOMMU and DMA-mapping APIs which
would bracket all DMA transactions performed by the master device. Someone
proposed to keep the IOMMU runtime active when there exists at least one
mapping created by the IOMMU/DMA-mapping layers. This however does not
cover all the cases. In case of our IOMMU, when it is disabled or runtime
suspended, it enters "pass-through" mode, so the master device can still
perform DMA operations with identity mappings (so DMA address equals to
physical memory address). Till now Exynos IOMMU called pm_runtime_get()
on attaching to the iommu domain (which happens during initialization of
the dma-mapping structures for a given master device) and kept it active all
the time.

This patch series tries to address Exynos IOMMU runtime pm issue by forcing
IOMMU controller to follow runtime pm status of its master device. This way
we ensure that any time the master's device is runtime activated, the iommu
will also be active and the master device won't be able to bypass the
mappings created by the IOMMU layer during its DMA transactions.

Quite long answer, but I hope I managed to give you a bit more background
on this topic.

Yes, indeed. Thank you for taking the time to respond!


As we know, using the pm_runtime_irq_safe() option comes with some
limitations, such as that the runtime PM callbacks are not allowed to sleep.
For a PM domain (genpd) that is attached to the device, this also
means it must not be powered off.


Right, if possible I would like to avoid using pm_runtime_irq_safe()
option, because it is really impractical.


To solve this problem, I was thinking we could convert to use the
asynchronous pm_runtime_get() API, when trying to runtime resume the
device from atomic contexts.

I'm not sure if this will work for DMA engine devices. If I understand
correctly, some clients of a DMA engine device might rely on the DMA
engine being configured and operational after queuing a request, and
they might lock up if the DMA engine device activation is postponed
because of async runtime pm activation.

I didn't know about this. If you have an example from the top of your
head, could you perhaps point me to it?


No, I don't have any example. This is just my fear that there might be some
hardware which works this way. I can imagine a driver which queues a dma
engine request and then triggers a 'start' command to its hw block. HW blocks
usually use some additional signal lines for DMA, so the HW block might check 

Re: [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU

2016-09-14 Thread Robin Murphy
On 14/09/16 09:41, Auger Eric wrote:
> Hi,
> 
> On 12/09/2016 18:13, Robin Murphy wrote:
>> Hi all,
>>
>> To any more confusing fixups and crazily numbered extra patches, here's
>> a quick v7 with everything rebased into the right order. The significant
>> change this time is to implement iommu_fwspec properly from the start,
>> which ends up being far simpler and more robust than faffing about
>> introducing it somewhere 'less intrusive' to move toward core code later.
>>
>> New branch in the logical place:
>>
>> git://linux-arm.org/linux-rm iommu/generic-v7
> 
> For information, as discussed privately with Robin I experience some
> regressions with the former and now deprecated dt description.
> 
> on my AMD Overdrive board and my old dt description I now only see a
> single group:
> 
> /sys/kernel/iommu_groups/
> /sys/kernel/iommu_groups/0
> /sys/kernel/iommu_groups/0/devices
> /sys/kernel/iommu_groups/0/devices/e070.xgmac
> 
> whereas I formerly saw
> 
> /sys/kernel/iommu_groups/
> /sys/kernel/iommu_groups/3
> /sys/kernel/iommu_groups/3/devices
> /sys/kernel/iommu_groups/3/devices/:00:00.0
> /sys/kernel/iommu_groups/1
> /sys/kernel/iommu_groups/1/devices
> /sys/kernel/iommu_groups/1/devices/e070.xgmac
> /sys/kernel/iommu_groups/4
> /sys/kernel/iommu_groups/4/devices
> /sys/kernel/iommu_groups/4/devices/:00:02.2
> /sys/kernel/iommu_groups/4/devices/:01:00.1
> /sys/kernel/iommu_groups/4/devices/:00:02.0
> /sys/kernel/iommu_groups/4/devices/:01:00.0
> /sys/kernel/iommu_groups/2
> /sys/kernel/iommu_groups/2/devices
> /sys/kernel/iommu_groups/2/devices/e090.xgmac
> /sys/kernel/iommu_groups/0
> /sys/kernel/iommu_groups/0/devices
> /sys/kernel/iommu_groups/0/devices/f000.pcie
> 
> This is the group topology without ACS override. Applying the non
> upstreamed "pci: Enable overrides for missing ACS capabilities" I used
> to see separate groups for each PCIe component. Now I don't see any
> difference with and without ACS override.

OK, having reproduced on my Juno, the problem looks to be that
of_for_each_phandle() leaves err set to -ENOENT after successfully
walking a phandle list, which makes __find_legacy_master_phandle()
always bail out after the first SMMU.

Can you confirm that the following diff fixes things for you?

Robin

--->8---
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index fa892d25004d..ac4aab97c93a 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -477,7 +477,7 @@ static int __find_legacy_master_phandle(struct
device *dev, void *data)
return 1;
}
it->node = np;
-   return err;
+   return err == -ENOENT ? 0 : err;
 }

 static struct platform_driver arm_smmu_driver;
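
(Editorial aside: a rough paraphrase, not the literal driver code, of why err
ends up as -ENOENT. of_for_each_phandle() terminates its walk when
of_phandle_iterator_next() returns -ENOENT, so that is the value a fully
successful traversal leaves behind, and it must not be propagated as a
failure.)

#include <linux/device.h>
#include <linux/of.h>

/* Sketch of the pattern used by __find_legacy_master_phandle(). */
static int find_master(struct device *dev, struct device_node *smmu_np)
{
	struct of_phandle_iterator it;
	int err;

	of_for_each_phandle(&it, err, dev->of_node, "mmu-masters",
			    "#stream-id-cells", 0)
		if (it.node == smmu_np)
			return 1;	/* this device is a master of our SMMU */

	/* err == -ENOENT here after a normal, complete walk of the list. */
	return err == -ENOENT ? 0 : err;
}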



Re: [PATCH v3 2/2] iommu/exynos: Add proper runtime pm support

2016-09-14 Thread Ulf Hansson
[...]

>
>
> There are some similarities between IOMMU and DMA engine devices (serial
> drivers are imho completely different case). Both hw blocks do their work
> on behalf of some other hardware block, which I will call master device.
> DMA engine performs some DMA transaction on master's device request, while
> IOMMU usually sits between system memory and master's device memory
> interface, remapping addresses of each DMA transaction according to its
> configuration and provided mapping tables (master device has some kind
> of internal DMA controller and performs DMA transactions on their own).
> IOMMU is usually used for a) mapping physically discontinuous memory into
> contiguous DMA addresses and b) isolating devices, so they can access
> only memory, which is dedicated or allocated for them.
>
> DMA engine devices provide explicit API for their master's device drivers,
> while IOMMU drivers are usually hidden behind DMA-mapping API (for most
> use cases, although it would be possible for master's device driver to
> call IOMMU API directly and some GPU/DRM drivers do that).
>
> However from runtime pm perspective the DMA engine and IOMMU devices are
> a bit different.
>
> DMA engine drivers have well defined start and end of operation (queuing
> dma request and irq from hw about having it finished). During that time
> the device has to be runtime active all the time. The problem with using
> current implementation of runtime pm is the fact that both start and end
> of operation can be triggered from atomic context, what is not really
> suitable for runtime pm. So the problem is mainly about API
> incompatibility and lack of something like dma_engine_prepare()/unprepare()
> (as an analogy to clocks api).

That's also a viable option. Although, DMA clients would then need to
invoke such APIs from non-atomic contexts. Typically that would be
from client driver's runtime PM callbacks.

Personally I would rather avoid such a solution, as it would sprinkle
this logic across lots of drivers. Although, perhaps this is the only
option that actually works.

>
> In case of IOMMU the main problem is determining whether the IOMMU controller
> has to be activated. There are no calls in the IOMMU and DMA-mapping APIs which
> would bracket all DMA transactions performed by the master device. Someone
> proposed to keep the IOMMU runtime active when there exists at least one
> mapping created by the IOMMU/DMA-mapping layers. This however does not
> cover all the cases. In case of our IOMMU, when it is disabled or runtime
> suspended, it enters "pass-through" mode, so the master device can still
> perform DMA operations with identity mappings (so DMA address equals to
> physical memory address). Till now Exynos IOMMU called pm_runtime_get()
> on attaching to the iommu domain (which happens during initialization of
> the dma-mapping structures for a given master device) and kept it active all
> the time.
>
> This patch series tries to address Exynos IOMMU runtime pm issue by forcing
> IOMMU controller to follow runtime pm status of its master device. This way
> we ensure that any time the master's device is runtime activated, the iommu
> will also be active and the master device won't be able to bypass the
> mappings created by the IOMMU layer during its DMA transactions.
>
> Quite long answer, but I hope I managed to give you a bit more background
> on this topic.

Yes, indeed. Thank you for taking the time to respond!

>
>> As we know, using the pm_runtime_irq_safe() option comes with some
>> limitations, such as that the runtime PM callbacks are not allowed to sleep.
>> For a PM domain (genpd) that is attached to the device, this also
>> means it must not be powered off.
>
>
> Right, if possible I would like to avoid using pm_runtime_irq_safe()
> option, because it is really impractical.
>
>> To solve this problem, I was thinking we could convert to use the
>> asynchronous pm_runtime_get() API, when trying to runtime resume the
>> device from atomic contexts.
>
>
> I'm not sure if this will work for DMA engine devices. If I understand
> correctly, some clients of a DMA engine device might rely on the DMA
> engine being configured and operational after queuing a request, and
> they might lock up if the DMA engine device activation is postponed
> because of async runtime pm activation.

I didn't know about this. If you have an example from the top of your
head, could you perhaps point me to it?

[...]

Kind regards
Uffe


[PATCH] iommu/amd: Don't put completion-wait semaphore on stack

2016-09-14 Thread Joerg Roedel
From: Joerg Roedel 

The semaphore used by the AMD IOMMU to signal command
completion lived on the stack until now, which was safe as
the driver busy-waited on the semaphore with IRQs disabled,
so the stack can't go away under the driver.

But the recently introduced vmap-based stacks break this as
the physical address of the semaphore can't be determined
easily anymore. The driver used the __pa() macro, but that
only works in the direct-mapping. The result was
Completion-Wait timeout errors seen by the IOMMU driver,
breaking system boot.

Since putting the semaphore on the stack is bad design
anyway, move the semaphore into 'struct amd_iommu'. It is
protected by the per-iommu lock and now in the direct
mapping again. This fixes the Completion-Wait timeout errors
and makes AMD IOMMU systems boot again with vmap-based
stacks enabled.

Reported-by: Borislav Petkov 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd_iommu.c   | 14 --
 drivers/iommu/amd_iommu_types.h |  2 ++
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 96de97a..1307e73 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -957,15 +957,16 @@ again:
 
if (left <= 2) {
struct iommu_cmd sync_cmd;
-   volatile u64 sem = 0;
int ret;
 
-   build_completion_wait(&sync_cmd, (u64)&sem);
+   iommu->cmd_sem = 0;
+
+   build_completion_wait(&sync_cmd, (u64)&iommu->cmd_sem);
copy_cmd_to_buffer(iommu, &sync_cmd, tail);
 
spin_unlock_irqrestore(&iommu->lock, flags);
 
-   if ((ret = wait_on_sem(&sem)) != 0)
+   if ((ret = wait_on_sem(&iommu->cmd_sem)) != 0)
return ret;
 
goto again;
@@ -993,19 +994,20 @@ static int iommu_queue_command(struct amd_iommu *iommu, 
struct iommu_cmd *cmd)
 static int iommu_completion_wait(struct amd_iommu *iommu)
 {
struct iommu_cmd cmd;
-   volatile u64 sem = 0;
int ret;
 
if (!iommu->need_sync)
return 0;
 
-   build_completion_wait(&cmd, (u64)&sem);
+   iommu->cmd_sem = 0;
+
+   build_completion_wait(&cmd, (u64)&iommu->cmd_sem);
 
ret = iommu_queue_command_sync(iommu, &cmd, false);
if (ret)
return ret;
 
-   return wait_on_sem(&sem);
+   return wait_on_sem(&iommu->cmd_sem);
 }
 
 static int iommu_flush_dte(struct amd_iommu *iommu, u16 devid)
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index caf5e38..9652848 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -524,6 +524,8 @@ struct amd_iommu {
struct irq_domain *ir_domain;
struct irq_domain *msi_domain;
 #endif
+
+   volatile u64 __aligned(8) cmd_sem;
 };
 
 #define ACPIHID_UID_LEN 256
-- 
2.6.2
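
(Editorial aside, not part of the patch: a minimal sketch of why __pa() falls
over once the stack lives in the vmalloc area. The patch sidesteps the lookup
entirely by keeping the semaphore in 'struct amd_iommu', which stays in the
direct mapping.)

#include <linux/mm.h>
#include <linux/vmalloc.h>

/*
 * __pa() assumes the address is in the kernel's linear (direct) mapping.
 * A variable on a vmap'ed stack is not, so its physical address has to be
 * resolved through the vmalloc page tables instead.
 */
static phys_addr_t any_virt_to_phys(const void *addr)
{
	if (!is_vmalloc_addr(addr))
		return __pa(addr);	/* linear map: constant offset */

	return ((phys_addr_t)vmalloc_to_pfn(addr) << PAGE_SHIFT) +
	       offset_in_page(addr);	/* walk the vmalloc mapping */
}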



Re: [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU

2016-09-14 Thread Auger Eric
Hi Will

On 14/09/2016 11:20, Will Deacon wrote:
> On Wed, Sep 14, 2016 at 10:41:57AM +0200, Auger Eric wrote:
>> Hi,
> 
> Hi Eric,
> 
>> On 12/09/2016 18:13, Robin Murphy wrote:
>>> To any more confusing fixups and crazily numbered extra patches, here's
>>> a quick v7 with everything rebased into the right order. The significant
>>> change this time is to implement iommu_fwspec properly from the start,
>>> which ends up being far simpler and more robust than faffing about
>>> introducing it somewhere 'less intrusive' to move toward core code later.
>>>
>>> New branch in the logical place:
>>>
>>> git://linux-arm.org/linux-rm iommu/generic-v7
>>
>> For information, as discussed privately with Robin I experience some
>> regressions with the former and now deprecated dt description.
> 
> Please can you share the DT you're using so we can reproduce this locally?

Done already

Thanks

Eric
> 
> Will
> 


Re: [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU

2016-09-14 Thread Will Deacon
On Wed, Sep 14, 2016 at 10:41:57AM +0200, Auger Eric wrote:
> Hi,

Hi Eric,

> On 12/09/2016 18:13, Robin Murphy wrote:
> > To any more confusing fixups and crazily numbered extra patches, here's
> > a quick v7 with everything rebased into the right order. The significant
> > change this time is to implement iommu_fwspec properly from the start,
> > which ends up being far simpler and more robust than faffing about
> > introducing it somewhere 'less intrusive' to move toward core code later.
> > 
> > New branch in the logical place:
> > 
> > git://linux-arm.org/linux-rm iommu/generic-v7
> 
> For information, as discussed privately with Robin I experience some
> regressions with the former and now deprecated dt description.

Please can you share the DT you're using so we can reproduce this locally?

Will


Re: [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU

2016-09-14 Thread Auger Eric
Hi,

On 12/09/2016 18:13, Robin Murphy wrote:
> Hi all,
> 
> To any more confusing fixups and crazily numbered extra patches, here's
> a quick v7 with everything rebased into the right order. The significant
> change this time is to implement iommu_fwspec properly from the start,
> which ends up being far simpler and more robust than faffing about
> introducing it somewhere 'less intrusive' to move toward core code later.
> 
> New branch in the logical place:
> 
> git://linux-arm.org/linux-rm iommu/generic-v7

For information, as discussed privately with Robin I experience some
regressions with the former and now deprecated dt description.

on my AMD Overdrive board and my old dt description I now only see a
single group:

/sys/kernel/iommu_groups/
/sys/kernel/iommu_groups/0
/sys/kernel/iommu_groups/0/devices
/sys/kernel/iommu_groups/0/devices/e070.xgmac

whereas I formerly saw

/sys/kernel/iommu_groups/
/sys/kernel/iommu_groups/3
/sys/kernel/iommu_groups/3/devices
/sys/kernel/iommu_groups/3/devices/:00:00.0
/sys/kernel/iommu_groups/1
/sys/kernel/iommu_groups/1/devices
/sys/kernel/iommu_groups/1/devices/e070.xgmac
/sys/kernel/iommu_groups/4
/sys/kernel/iommu_groups/4/devices
/sys/kernel/iommu_groups/4/devices/:00:02.2
/sys/kernel/iommu_groups/4/devices/:01:00.1
/sys/kernel/iommu_groups/4/devices/:00:02.0
/sys/kernel/iommu_groups/4/devices/:01:00.0
/sys/kernel/iommu_groups/2
/sys/kernel/iommu_groups/2/devices
/sys/kernel/iommu_groups/2/devices/e090.xgmac
/sys/kernel/iommu_groups/0
/sys/kernel/iommu_groups/0/devices
/sys/kernel/iommu_groups/0/devices/f000.pcie

This is the group topology without ACS override. Applying the non
upstreamed "pci: Enable overrides for missing ACS capabilities" I used
to see separate groups for each PCIe component. Now I don't see any
difference with and without ACS override.

Thanks

Eric
> 
> Robin.
> 
> Mark Rutland (1):
>   Docs: dt: add PCI IOMMU map bindings
> 
> Robin Murphy (21):
>   of/irq: Break out msi-map lookup (again)
>   iommu/of: Handle iommu-map property for PCI
>   iommu: Introduce iommu_fwspec
>   Docs: dt: document ARM SMMUv3 generic binding usage
>   iommu/arm-smmu: Fall back to global bypass
>   iommu/arm-smmu: Implement of_xlate() for SMMUv3
>   iommu/arm-smmu: Support non-PCI devices with SMMUv3
>   iommu/arm-smmu: Set PRIVCFG in stage 1 STEs
>   iommu/arm-smmu: Handle stream IDs more dynamically
>   iommu/arm-smmu: Consolidate stream map entry state
>   iommu/arm-smmu: Keep track of S2CR state
>   iommu/arm-smmu: Refactor mmu-masters handling
>   iommu/arm-smmu: Streamline SMMU data lookups
>   iommu/arm-smmu: Add a stream map entry iterator
>   iommu/arm-smmu: Intelligent SMR allocation
>   iommu/arm-smmu: Convert to iommu_fwspec
>   Docs: dt: document ARM SMMU generic binding usage
>   iommu/arm-smmu: Wire up generic configuration support
>   iommu/arm-smmu: Set domain geometry
>   iommu/dma: Add support for mapping MSIs
>   iommu/dma: Avoid PCI host bridge windows
> 
>  .../devicetree/bindings/iommu/arm,smmu-v3.txt  |   8 +-
>  .../devicetree/bindings/iommu/arm,smmu.txt |  63 +-
>  .../devicetree/bindings/pci/pci-iommu.txt  | 171 
>  arch/arm64/mm/dma-mapping.c|   2 +-
>  drivers/gpu/drm/exynos/exynos_drm_iommu.h  |   2 +-
>  drivers/iommu/Kconfig  |   2 +-
>  drivers/iommu/arm-smmu-v3.c| 386 +
>  drivers/iommu/arm-smmu.c   | 962 
> ++---
>  drivers/iommu/dma-iommu.c  | 161 +++-
>  drivers/iommu/iommu.c  |  56 ++
>  drivers/iommu/of_iommu.c   |  52 +-
>  drivers/irqchip/irq-gic-v2m.c  |   3 +
>  drivers/irqchip/irq-gic-v3-its.c   |   3 +
>  drivers/of/irq.c   |  78 +-
>  drivers/of/of_pci.c| 102 +++
>  include/linux/device.h |   3 +
>  include/linux/dma-iommu.h  |  12 +-
>  include/linux/iommu.h  |  38 +
>  include/linux/of_pci.h |  10 +
>  19 files changed, 1323 insertions(+), 791 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/pci/pci-iommu.txt
> 


Re: [PATCH v3 2/2] iommu/exynos: Add proper runtime pm support

2016-09-14 Thread Marek Szyprowski

Hi Ulf,

On 2016-09-13 16:20, Ulf Hansson wrote:

On 13 September 2016 at 14:49, Marek Szyprowski
 wrote:

This patch uses recently introduced device links to track the runtime pm
state of the master's device. This way each SYSMMU controller is runtime
activated when its master's device is active and can save/restore its state
instead of being enabled all the time. This way SYSMMU controllers no
longer prevent their respective power domains from being turned off when the
master's device is not used.

Apologies for not reviewing earlier, and forgive me if you find my
questions/suggestions silly. You may ignore them if you don't
think they deserve a proper answer. :-)


No problem. There are no silly questions, but there might be some silly
answers ;)


I am not so familiar with the IOMMU subsystem, but I am wondering
whether the issue you are solving is similar to what can be observed
for DMA and serial drivers. And of course also for other IOMMU
drivers.

In general the DMA/serial drivers require using the
pm_runtime_irq_safe() option to be able to easily deploy runtime PM
support (of course there are some other workarounds as well).


There are some similarities between IOMMU and DMA engine devices (serial
drivers are imho completely different case). Both hw blocks do their work
on behalf of some other hardware block, which I will call master device.
DMA engine performs some DMA transaction on master's device request, while
IOMMU usually sits between system memory and master's device memory
interface, remapping addresses of each DMA transaction according to its
configuration and provided mapping tables (master device has some kind
of internal DMA controller and performs DMA transactions on their own).
IOMMU is usually used for a) mapping physically discontinuous memory into
contiguous DMA addresses and b) isolating devices, so they can access
only memory, which is dedicated or allocated for them.

DMA engine devices provide explicit API for their master's device drivers,
while IOMMU drivers are usually hidden behind DMA-mapping API (for most
use cases, although it would be possible for master's device driver to
call IOMMU API directly and some GPU/DRM drivers do that).

However from runtime pm perspective the DMA engine and IOMMU devices are
a bit different.

DMA engine drivers have well defined start and end of operation (queuing
dma request and irq from hw about having it finished). During that time
the device has to be runtime active all the time. The problem with using
current implementation of runtime pm is the fact that both start and end
of operation can be triggered from atomic context, what is not really
suitable for runtime pm. So the problem is mainly about API
incompatibility and lack of something like dma_engine_prepare()/unprepare()
(as an analogy to clocks api).
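
(Editorial aside: the clocks-API analogy in concrete terms. This shows the
existing clk_prepare()/clk_enable() split, not any proposed dmaengine
interface; a dma_engine_prepare()/unprepare() pair would give dmaengine the
same separation of sleepable and atomic-safe steps.)

#include <linux/clk.h>

/* The sleepable part is done up front, the atomic-safe part at "start" time. */
static int enable_clock_example(struct clk *clk)
{
	int ret;

	ret = clk_prepare(clk);		/* may sleep: parents, PLL lock, ... */
	if (ret)
		return ret;

	ret = clk_enable(clk);		/* atomic-safe: just gates the clock on */
	if (ret)
		clk_unprepare(clk);

	return ret;
}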

In case of IOMMU the main problem is determining whether the IOMMU controller
has to be activated. There are no calls in the IOMMU and DMA-mapping APIs which
would bracket all DMA transactions performed by the master device. Someone
proposed to keep the IOMMU runtime active when there exists at least one
mapping created by the IOMMU/DMA-mapping layers. This however does not
cover all the cases. In case of our IOMMU, when it is disabled or runtime
suspended, it enters "pass-through" mode, so the master device can still
perform DMA operations with identity mappings (so DMA address equals to
physical memory address). Till now Exynos IOMMU called pm_runtime_get()
on attaching to the iommu domain (which happens during initialization of
the dma-mapping structures for a given master device) and kept it active all
the time.

This patch series tries to address Exynos IOMMU runtime pm issue by forcing
IOMMU controller to follow runtime pm status of its master device. This way
we ensure that any time the master's device is runtime activated, the iommu
will also be active and the master device won't be able to bypass the
mappings created by the IOMMU layer during its DMA transactions.
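
(Editorial aside: a minimal sketch of what "follow the runtime pm status of
its master device" means in terms of the device-links API; the flags and the
call site here are assumptions for illustration, not the exact patch.)

#include <linux/device.h>

/*
 * With DL_FLAG_PM_RUNTIME the PM core runtime-resumes the supplier (the
 * SYSMMU) whenever the consumer (its master device) is runtime-resumed, so
 * the IOMMU is guaranteed to be active for the master's DMA transactions.
 */
static int sysmmu_track_master(struct device *master, struct device *sysmmu)
{
	struct device_link *link;

	link = device_link_add(master, sysmmu, DL_FLAG_PM_RUNTIME);
	if (!link)
		return -ENODEV;	/* illustrative error handling only */

	return 0;
}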

Quite long answer, but I hope I managed to give you a bit more background
on this topic.


As we know, using the pm_runtime_irq_safe() option comes with some
limitations, such as that the runtime PM callbacks are not allowed to sleep.
For a PM domain (genpd) that is attached to the device, this also
means it must not be powered off.


Right, if possible I would like to avoid using pm_runtime_irq_safe()
option, because it is really impractical.


To solve this problem, I was thinking we could convert to use the
asynchronous pm_runtime_get() API, when trying to runtime resume the
device from atomic contexts.


I'm not sure if this will work for DMA engine devices. If I understand
correctly, some clients of a DMA engine device might rely on the DMA
engine being configured and operational after queuing a request, and
they might lock up if the DMA engine device activation is postponed
because of async runtime pm activation.


Of course when it turns out that the device isn't yet runtime resumed