Re: [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption

2017-04-27 Thread Dave Young
On 04/27/17 at 08:52am, Dave Hansen wrote:
> On 04/27/2017 12:25 AM, Dave Young wrote:
> > On 04/21/17 at 02:55pm, Dave Hansen wrote:
> >> On 04/18/2017 02:22 PM, Tom Lendacky wrote:
> >>> Add sysfs support for SME so that user-space utilities (kdump, etc.) can
> >>> determine if SME is active.
> >>>
> >>> A new directory will be created:
> >>>   /sys/kernel/mm/sme/
> >>>
> >>> And two entries within the new directory:
> >>>   /sys/kernel/mm/sme/active
> >>>   /sys/kernel/mm/sme/encryption_mask
> >>
> >> Why do they care, and what will they be doing with this information?
> > 
> > Since kdump will copy old memory but need this to know if the old memory
> > was encrypted or not. With this sysfs file we can know the previous SME
> > status and pass to kdump kernel as like a kernel param.
> > 
> > Tom, have you got chance to try if it works or not?
> 
> What will the kdump kernel do with it though?  We kexec() into that
> kernel so the SME keys will all be the same, right?  So, will the kdump
> kernel be just setting the encryption bit in the PTE so it can copy the
> old plaintext out?

I assume it is for active -> non active case, the new boot need to know
the old memory is encrypted. But I think I did not read all the patches
I may miss things.

> 
> Why do we need both 'active' and 'encryption_mask'?  How could it be
> that the hardware-enumerated 'encryption_mask' changes across a kexec()?

Leave this question to Tom..

Thanks
Dave
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] iommu/s390: Fix IOMMU groups

2017-04-27 Thread Joerg Roedel
On Thu, Apr 27, 2017 at 08:11:42PM +0200, Gerald Schaefer wrote:
> > +void zpci_destroy_iommu(struct zpci_dev *zdev)
> > +{
> > +   iommu_group_put(zdev->group);
> > +   zdev->group = NULL;
> > +}
> 
> While the rest of this patch doesn't seem to make much of a difference to
> the current behavior, I'm wondering where this extra iommu_group_put()
> comes from. It either was erroneously missing before this patch, or it
> is erroneously introduced by this patch.

This is the way to free an iommu-group. It was missing before probably
because it was unclear whether the add_device function allocated a group
or not. So there was no way to know if it needs to be put again in the
remove_device function.

With this patch the iommu-group is explicitly allocated when the
zpci_dev is created and destroyed again with it.


Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 0/2] iommu/s390: Fix iommu-groups and add sysfs support

2017-04-27 Thread Joerg Roedel
Hi Gerald,

thanks for your reply. I have some more questions, please see below.

On Thu, Apr 27, 2017 at 08:10:18PM +0200, Gerald Schaefer wrote:

> Well, there is a separate zpci_dev for each pci_dev on s390,
> and each of those has its own separate dma-table (thus not shared).

Is that true for all functions of a PCIe card, so does every function of
a device has its own zpci_dev structure and thus its own DMA-table?

My assumption came from the fact that the zpci_dev is read from
pci_dev->sysdata, which is propagated there from the pci_bridge
through the pci_root_bus structures.

> Given this "separate zpci_dev for each pci_dev" situation, I don't
> see what this update actually changes, compared to the previous code,
> see also my comments to that patch.

The add_device call-back is invoked for every function of a pci-device,
because each function gets its own pci_dev structure. Also we usually
group all functions of a PCI-device together into one iommu-group,
because we don't trust that the device isolates its functions from each
other.

Regards,

Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] iommu/s390: Fix IOMMU groups

2017-04-27 Thread Gerald Schaefer
On Thu, 27 Apr 2017 17:28:24 +0200
Joerg Roedel  wrote:

> From: Joerg Roedel 
> 
> Currently the s390 iommu driver allocates an iommu-group for
> every device that is added. But that is wrong, as there is
> only one dma-table per pci-root-bus. Make all devices behind
> one dma-table share one iommu-group.

On s390, each PCI device has its own zpci_dev structure, and also its own
DMA address space. Even with this patch, you'll still allocate an
iommu_group for every device that is added, see below, so I do not really
see the benefit of this (but my knowledge got a little rusty, so I may be
missing something).

The only reason why we allow the theoretical option to attach more than
one device to the same IOMMU domain (and thus address space), is because
it was a requirement by you at that time (IIRC). We have no use-case for
this, and even in this theoretical scenario you would still have separate
zpci_dev structures for the affected devices that share the same IOMMU
domain (DMA address space), and thus also separate IOMMU groups, at least
after this patch.

Before this patch, you could in theory have different PCI devices in the
same IOMMU group, by having iommu_group_get() in s390_iommu_add_device()
return a group != NULL. With this patch, you will enforce that a new
iommu_group is allocated for every device, which would be the contrary
of what the description says.

> 
> Signed-off-by: Joerg Roedel 
> ---
>  arch/s390/include/asm/pci.h |  7 +++
>  arch/s390/pci/pci.c | 10 +-
>  drivers/iommu/s390-iommu.c  | 43 ---
>  3 files changed, 48 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
> index 4e31866..045665d 100644
> --- a/arch/s390/include/asm/pci.h
> +++ b/arch/s390/include/asm/pci.h
> @@ -8,6 +8,7 @@
> 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -123,6 +124,8 @@ struct zpci_dev {
>   unsigned long   iommu_pages;
>   unsigned intnext_bit;
> 
> + struct iommu_group *group;  /* IOMMU group for all devices behind 
> this zdev */

There is always only one device behind a zpci_dev, so this comment makes
no sense.

> +
>   char res_name[16];
>   struct zpci_bar_struct bars[PCI_BAR_COUNT];
> 
> @@ -173,6 +176,10 @@ static inline bool zdev_enabled(struct zpci_dev *zdev)
>  int clp_enable_fh(struct zpci_dev *, u8);
>  int clp_disable_fh(struct zpci_dev *);
> 
> +/* IOMMU Interface */
> +int zpci_init_iommu(struct zpci_dev *zdev);
> +void zpci_destroy_iommu(struct zpci_dev *zdev);
> +
>  #ifdef CONFIG_PCI
>  /* Error handling and recovery */
>  void zpci_event_error(void *);
> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
> index 364b9d8..3178e4d 100644
> --- a/arch/s390/pci/pci.c
> +++ b/arch/s390/pci/pci.c
> @@ -754,6 +754,7 @@ void pcibios_remove_bus(struct pci_bus *bus)
> 
>   zpci_exit_slot(zdev);
>   zpci_cleanup_bus_resources(zdev);
> + zpci_destroy_iommu(zdev);
>   zpci_free_domain(zdev);
> 
>   spin_lock(_list_lock);
> @@ -825,11 +826,15 @@ int zpci_create_device(struct zpci_dev *zdev)
>   if (rc)
>   goto out;
> 
> + rc = zpci_init_iommu(zdev);
> + if (rc)
> + goto out_free;

This will get called for every device that is added, and every device
will get its own zpci_dev, so this will still result in allocating
an IOMMU group for every device.

> +
>   mutex_init(>lock);
>   if (zdev->state == ZPCI_FN_STATE_CONFIGURED) {
>   rc = zpci_enable_device(zdev);
>   if (rc)
> - goto out_free;
> + goto out_iommu;
>   }
>   rc = zpci_scan_bus(zdev);
>   if (rc)
> @@ -846,6 +851,9 @@ int zpci_create_device(struct zpci_dev *zdev)
>  out_disable:
>   if (zdev->state == ZPCI_FN_STATE_ONLINE)
>   zpci_disable_device(zdev);
> +out_iommu:
> + zpci_destroy_iommu(zdev);
> +
>  out_free:
>   zpci_free_domain(zdev);
>  out:
> diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
> index 179e636..cad3ad0 100644
> --- a/drivers/iommu/s390-iommu.c
> +++ b/drivers/iommu/s390-iommu.c
> @@ -163,22 +163,22 @@ static void s390_iommu_detach_device(struct 
> iommu_domain *domain,
>   }
>  }
> 
> -static int s390_iommu_add_device(struct device *dev)
> +static struct iommu_group *s390_iommu_device_group(struct device *dev)
>  {
> - struct iommu_group *group;
> - int rc;
> + struct zpci_dev *zdev = to_pci_dev(dev)->sysdata;
> 
> - group = iommu_group_get(dev);
> - if (!group) {
> - group = iommu_group_alloc();
> - if (IS_ERR(group))
> - return PTR_ERR(group);
> - }
> + return zdev->group;
> +}
> +
> +static int s390_iommu_add_device(struct device *dev)
> +{
> + struct iommu_group *group = iommu_group_get_for_dev(dev);
> + if 

Re: [RFC PATCH 0/2] iommu/s390: Fix iommu-groups and add sysfs support

2017-04-27 Thread Gerald Schaefer
On Thu, 27 Apr 2017 17:28:23 +0200
Joerg Roedel  wrote:

> Hey,
> 
> here are two patches for the s390 PCI and IOMMU code. It is
> based on the assumption that every pci_dev that points to
> the same zpci_dev shares a single dma-table (and thus a
> single address space).

Well, there is a separate zpci_dev for each pci_dev on s390,
and each of those has its own separate dma-table (thus not shared).

> 
> If this assupmtion is true (as it looks to me from reading
> the code) then the iommu-group setup code in the s390 iommu
> driver needs to be updated.

Given this "separate zpci_dev for each pci_dev" situation, I don't
see what this update actually changes, compared to the previous code,
see also my comments to that patch.

> 
> These patches do this and also add support for the
> iommu_device_register interface to the s390 iommu driver.
> 
> Any comments and testing appreciated.
> 
> Thanks,
> 
>   Joerg
> 
> Joerg Roedel (2):
>   iommu/s390: Fix IOMMU groups
>   iommu/s390: Add support for iommu_device handling
> 
>  arch/s390/include/asm/pci.h |  8 +
>  arch/s390/pci/pci.c | 10 ++-
>  drivers/iommu/s390-iommu.c  | 71 
> ++---
>  3 files changed, 78 insertions(+), 11 deletions(-)
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/3] iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74

2017-04-27 Thread Will Deacon
On Thu, Apr 27, 2017 at 05:42:37PM +0100, Mark Rutland wrote:
> On Thu, Apr 27, 2017 at 05:16:23PM +0530, Geetha sowjanya wrote:
> > +   /*
> > +* Override the size, for Cavium CN99xx implementations
> > +* which doesn't support the page 1 SMMU register space.
> > +*/
> > +   cpu_model = read_cpuid_id() & MIDR_CPU_MODEL_MASK;
> > +   if (cpu_model == MIDR_THUNDERX_99XX ||
> > +   cpu_model == MIDR_BRCM_VULCAN)
> > +   size = SZ_64K;
> 
> If you're trying to identify an SMMU erratum, identify the SMMU, not the
> CPU it happens to be paired with this time.
> 
> There are ID registers in the SMMU you can use to do so.
> 
> NAK to using the CPU ID here.

Agreed. I had some off-list discussion with Geetha where we agreed to use
the "silicon ID", which I assumed was the SMMU IIDR register.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/3] iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74

2017-04-27 Thread Mark Rutland
On Thu, Apr 27, 2017 at 05:16:23PM +0530, Geetha sowjanya wrote:
> + /*
> +  * Override the size, for Cavium CN99xx implementations
> +  * which doesn't support the page 1 SMMU register space.
> +  */
> + cpu_model = read_cpuid_id() & MIDR_CPU_MODEL_MASK;
> + if (cpu_model == MIDR_THUNDERX_99XX ||
> + cpu_model == MIDR_BRCM_VULCAN)
> + size = SZ_64K;

If you're trying to identify an SMMU erratum, identify the SMMU, not the
CPU it happens to be paired with this time.

There are ID registers in the SMMU you can use to do so.

NAK to using the CPU ID here.

Mark.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/3] Cavium ThunderX2 SMMUv3 errata workarounds

2017-04-27 Thread Sunil Kovvuri
On Thu, Apr 27, 2017 at 7:09 PM, Robert Richter
 wrote:
> On 27.04.17 17:16:21, Geetha sowjanya wrote:
>> From: Geetha 
>>
>> Cavium CN99xx SMMUv3 implementation has two Silicon Erratas.
>> 1. Errata ID #74
>>SMMU register alias Page 1 is not implemented
>> 2. Errata ID #126
>>SMMU doesnt support unique IRQ lines for gerror, eventq and cmdq-sync
>>
>> The following patchset does software workaround for these two erratas.
>>
>> This series is based on RFC patch.
>> https://www.spinics.net/lists/arm-kernel/msg575739.html
>>
>> As suggested by Will Deacon, code is modified to use silicon id to
>> enable errata#74 workaround.
>
> Can we go with the previous series [1] and:
>
>  * drop the iort model numbering part,
>
>  * add an enablement function that enables flags (smmu->options)
>depending on midr values (which replaces the macro code)?

I don't see how it is efficient and consistent, if we take data from DT
for non-ACPI mode and read CPU ID from MIDR for ACPI mode.

Thanks,
Sunil.

>
> E.g.:
>
> static void acpi_smmu_enable_cavium(struct arm_smmu_device *smmu)
> {
> u32 cpu_model;
>
> if (!IS_ENABLED(CONFIG_ARM64))
> return;
>
> cpu_model = read_cpuid_id() & MIDR_CPU_MODEL_MASK;
> switch (cpu_model) {
> case ...:
> case ...:
> break;
> default:
> /* No Cavium CN99xx SMMU v3 */
> return;
> }
>
> smmu->options |= (ARM_SMMU_OPT_PAGE0_REGS_ONLY |
>  ARM_SMMU_OPT_USE_SHARED_IRQS);
> }
>
> -Robert
>
>
> [1] [RFC PATCH 0/7] Cavium CN99xx SMMUv3 Errata workarounds
> https://marc.info/?l=linux-acpi=149192179623708=2
>
>>
>> Linu Cherian (1):
>>   iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74
>>
>> Geetha (2):
>>   arm64: Add MIDR values for Cavium cn99xx SoCs
>>   iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #126
>>
>>  Documentation/arm64/silicon-errata.txt |  2 ++
>>  arch/arm64/include/asm/cputype.h   |  3 ++
>>  drivers/acpi/arm64/iort.c  | 14 +++-
>>  drivers/iommu/arm-smmu-v3.c| 64 
>> +-
>>  4 files changed, 73 insertions(+), 10 deletions(-)
>>
>> --
>> 1.9.1
>>
>
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Does a new booting kernel by "kexec -l" need to copy IR table from previous kernel?

2017-04-27 Thread Raj, Ashok
Hi Joerg,

On Thu, Apr 27, 2017 at 06:12:38PM +0200, j...@8bytes.org wrote:
> On Thu, Apr 27, 2017 at 03:34:06PM +, Zhuo, Qiuxu wrote:
> > It looks like the printk is misleading and it’s nothing actually
> > failed, but just it isn’t copying if the new kernel is not a kdump
> > kernel.
> 
> Yes, that is true. But the messages are harmless and you are safe to
> ignore them in your usecase. We only care about the kdump case when

Looks like maybe sanitizing the message to make it less disturbing 
should be enough :-) 

> copying the old IR and DMAR tables into the new kernel, because in the
> kdump case the old kernel crashed and left devices in an undefined
> state, so that they might still send DMA and IRQ requests to the new
> kernel, corrupting data or causing spurious/blocked irqs. Blocked IRQs
> or DMA requests might even break devices so that the new kernel can't
> initialize them anymore.

Cheers,
Ashok
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v5 09/32] x86/mm: Provide general kernel support for memory encryption

2017-04-27 Thread Borislav Petkov
On Tue, Apr 18, 2017 at 04:17:54PM -0500, Tom Lendacky wrote:
> Changes to the existing page table macros will allow the SME support to
> be enabled in a simple fashion with minimal changes to files that use these
> macros.  Since the memory encryption mask will now be part of the regular
> pagetable macros, we introduce two new macros (_PAGE_TABLE_NOENC and
> _KERNPG_TABLE_NOENC) to allow for early pagetable creation/initialization
> without the encryption mask before SME becomes active.  Two new pgprot()
> macros are defined to allow setting or clearing the page encryption mask.

...

> @@ -55,7 +57,7 @@ static inline void copy_user_page(void *to, void *from, 
> unsigned long vaddr,
>   __phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
>  
>  #ifndef __va
> -#define __va(x)  ((void *)((unsigned 
> long)(x)+PAGE_OFFSET))
> +#define __va(x)  ((void *)(__sme_clr(x) + PAGE_OFFSET))
>  #endif
>  
>  #define __boot_va(x) __va(x)
> diff --git a/arch/x86/include/asm/page_types.h 
> b/arch/x86/include/asm/page_types.h
> index 7bd0099..fead0a5 100644
> --- a/arch/x86/include/asm/page_types.h
> +++ b/arch/x86/include/asm/page_types.h
> @@ -15,7 +15,7 @@
>  #define PUD_PAGE_SIZE(_AC(1, UL) << PUD_SHIFT)
>  #define PUD_PAGE_MASK(~(PUD_PAGE_SIZE-1))
>  
> -#define __PHYSICAL_MASK  ((phys_addr_t)((1ULL << 
> __PHYSICAL_MASK_SHIFT) - 1))
> +#define __PHYSICAL_MASK  ((phys_addr_t)(__sme_clr((1ULL << 
> __PHYSICAL_MASK_SHIFT) - 1)))

That looks strange: poking SME mask hole into a mask...?

>  #define __VIRTUAL_MASK   ((1UL << __VIRTUAL_MASK_SHIFT) - 1)
>  
>  /* Cast *PAGE_MASK to a signed type so that it is sign-extended if

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Does a new booting kernel by "kexec -l" need to copy IR table from previous kernel?

2017-04-27 Thread j...@8bytes.org
On Thu, Apr 27, 2017 at 03:34:06PM +, Zhuo, Qiuxu wrote:
> It looks like the printk is misleading and it’s nothing actually
> failed, but just it isn’t copying if the new kernel is not a kdump
> kernel.

Yes, that is true. But the messages are harmless and you are safe to
ignore them in your usecase. We only care about the kdump case when
copying the old IR and DMAR tables into the new kernel, because in the
kdump case the old kernel crashed and left devices in an undefined
state, so that they might still send DMA and IRQ requests to the new
kernel, corrupting data or causing spurious/blocked irqs. Blocked IRQs
or DMA requests might even break devices so that the new kernel can't
initialize them anymore.

That is why we want to make sure, that no spurious IRQ or DMA requests
are blocked when the new kernel initializes the IOMMU. In the non-kdump
case we can assume that the old kernel put the devices into a defined
state where they are not sending spurious requests.

>For kdump kernel can we just reinitializing IR table(as like normal
> kernel boot from power on) to handle the “spurious interrupts” issue instead 
> of
> copying IR  table from previous kernel?
> 
>For booting a new kernel by “kexec -l” (my test case), do we still need
> to copy IR table from previous kernel to handle the “spurious interrupts”
> issue?

As I said, for your case there is no need to do the copying and you can
ignore the messages.


Regards,

Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption

2017-04-27 Thread Dave Hansen
On 04/27/2017 12:25 AM, Dave Young wrote:
> On 04/21/17 at 02:55pm, Dave Hansen wrote:
>> On 04/18/2017 02:22 PM, Tom Lendacky wrote:
>>> Add sysfs support for SME so that user-space utilities (kdump, etc.) can
>>> determine if SME is active.
>>>
>>> A new directory will be created:
>>>   /sys/kernel/mm/sme/
>>>
>>> And two entries within the new directory:
>>>   /sys/kernel/mm/sme/active
>>>   /sys/kernel/mm/sme/encryption_mask
>>
>> Why do they care, and what will they be doing with this information?
> 
> Since kdump will copy old memory but need this to know if the old memory
> was encrypted or not. With this sysfs file we can know the previous SME
> status and pass to kdump kernel as like a kernel param.
> 
> Tom, have you got chance to try if it works or not?

What will the kdump kernel do with it though?  We kexec() into that
kernel so the SME keys will all be the same, right?  So, will the kdump
kernel be just setting the encryption bit in the PTE so it can copy the
old plaintext out?

Why do we need both 'active' and 'encryption_mask'?  How could it be
that the hardware-enumerated 'encryption_mask' changes across a kexec()?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 06/32] x86/mm: Add Secure Memory Encryption (SME) support

2017-04-27 Thread Borislav Petkov
On Tue, Apr 18, 2017 at 04:17:27PM -0500, Tom Lendacky wrote:
> Add support for Secure Memory Encryption (SME). This initial support
> provides a Kconfig entry to build the SME support into the kernel and
> defines the memory encryption mask that will be used in subsequent
> patches to mark pages as encrypted.

...

> diff --git a/arch/x86/include/asm/mem_encrypt.h 
> b/arch/x86/include/asm/mem_encrypt.h
> new file mode 100644
> index 000..d5c4a2b
> --- /dev/null
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -0,0 +1,42 @@
> +/*
> + * AMD Memory Encryption Support
> + *
> + * Copyright (C) 2016 Advanced Micro Devices, Inc.
> + *
> + * Author: Tom Lendacky 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +

These ifdeffery closing #endif markers look strange:

> +#ifndef __X86_MEM_ENCRYPT_H__
> +#define __X86_MEM_ENCRYPT_H__
> +
> +#ifndef __ASSEMBLY__
> +
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +
> +extern unsigned long sme_me_mask;
> +
> +static inline bool sme_active(void)
> +{
> + return !!sme_me_mask;
> +}
> +
> +#else/* !CONFIG_AMD_MEM_ENCRYPT */
> +
> +#ifndef sme_me_mask
> +#define sme_me_mask  0UL
> +
> +static inline bool sme_active(void)
> +{
> + return false;
> +}
> +#endif

this endif is the sme_me_mask closing one and it has sme_active() in it.
Shouldn't it be:

#ifndef sme_me_mask
#define sme_me_mask  0UL
#endif

and have sme_active below it, in the !CONFIG_AMD_MEM_ENCRYPT branch?

The same thing is in include/linux/mem_encrypt.h

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Does a new booting kernel by "kexec -l" need to copy IR table from previous kernel?

2017-04-27 Thread Zhuo, Qiuxu
Hi Joerg Roedel,

   When we run below command, the kernel message showed below confusing failed 
messages:
 sudo kexec -l /boot/vmlinuz-4.11.0-rc7 
--append=root=UUID=276659dd-77f0-47f9-967c-7643c260b746 
--initrd=/boot/initrd.img-4.11.0-rc7
 sudo kexec -e
  ---
 [ 0.203397] DMAR-IR: Failed to copy IR table for dmar2 from previous kernel
[ 0.211568] DMAR-IR: Failed to copy IR table for dmar1 from previous 
kernel
[ 0.219738] DMAR-IR: Failed to copy IR table for dmar0 from previous 
kernel
[ 0.227906] DMAR-IR: Failed to copy IR table for dmar3 from previous 
kernel
   ---

  The driver "intel_irq_remapping.c" uses "is_kdump_kernel()" API to 
determine whether copy IR table from previous kernel.
  a) When booting a kdump kernel specified by "kexec -p" (loading the new 
kernel for use on panic), the "is_kdump_kernel()" API is true then the driver 
copy IR table from previous kernel.
  b) While when booting a new kernel specified by "kexec -l" (loading the 
new kernel into current kernel) the "is_kdump_kernel()" API is false then the 
driver prints error message "Failed to copy IR table from
previous kernel" directly. It looks like the printk is misleading 
and it's nothing actually failed, but just it isn't copying if the new kernel 
is not a kdump kernel.

   Found that the commit id "af3b358e48115" said:
---
iommu/vt-d: Copy IR table from old kernel when in kdump mode
When we are booting into a kdump kernel and find IR enabled,
copy over the contents of the previous IR table so that
spurious interrupts will not be target aborted.
   ---

   May I ask some questions:
   What are spurious interrupts?
   For kdump kernel can we just reinitializing IR table(as like normal 
kernel boot from power on) to handle the "spurious interrupts" issue instead of 
copying IR  table from previous kernel?
   For booting a new kernel by "kexec -l" (my test case), do we still need 
to copy IR table from previous kernel to handle the "spurious interrupts" issue?

Thanks!
qiuxu




BR
qiuxu

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[RFC PATCH 0/2] iommu/s390: Fix iommu-groups and add sysfs support

2017-04-27 Thread Joerg Roedel
Hey,

here are two patches for the s390 PCI and IOMMU code. It is
based on the assumption that every pci_dev that points to
the same zpci_dev shares a single dma-table (and thus a
single address space).

If this assupmtion is true (as it looks to me from reading
the code) then the iommu-group setup code in the s390 iommu
driver needs to be updated.

These patches do this and also add support for the
iommu_device_register interface to the s390 iommu driver.

Any comments and testing appreciated.

Thanks,

Joerg

Joerg Roedel (2):
  iommu/s390: Fix IOMMU groups
  iommu/s390: Add support for iommu_device handling

 arch/s390/include/asm/pci.h |  8 +
 arch/s390/pci/pci.c | 10 ++-
 drivers/iommu/s390-iommu.c  | 71 ++---
 3 files changed, 78 insertions(+), 11 deletions(-)

-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/2] iommu/s390: Add support for iommu_device handling

2017-04-27 Thread Joerg Roedel
From: Joerg Roedel 

Add support for the iommu_device_register interface to make
the s390 hardware iommus visible to the iommu core and in
sysfs.

Signed-off-by: Joerg Roedel 
---
 arch/s390/include/asm/pci.h |  1 +
 drivers/iommu/s390-iommu.c  | 30 ++
 2 files changed, 31 insertions(+)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 045665d..8c071af 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -124,6 +124,7 @@ struct zpci_dev {
unsigned long   iommu_pages;
unsigned intnext_bit;
 
+   struct iommu_device iommu_dev;  /* IOMMU core handle */
struct iommu_group *group;  /* IOMMU group for all devices behind 
this zdev */
 
char res_name[16];
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index cad3ad0..ec7f5e4 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -18,6 +18,8 @@
  */
 #define S390_IOMMU_PGSIZES (~0xFFFUL)
 
+static struct iommu_ops s390_iommu_ops;
+
 struct s390_domain {
struct iommu_domain domain;
struct list_headdevices;
@@ -173,10 +175,13 @@ static struct iommu_group *s390_iommu_device_group(struct 
device *dev)
 static int s390_iommu_add_device(struct device *dev)
 {
struct iommu_group *group = iommu_group_get_for_dev(dev);
+   struct zpci_dev *zdev = to_pci_dev(dev)->sysdata;
+
if (IS_ERR(group))
return PTR_ERR(group);
 
iommu_group_put(group);
+   iommu_device_link(>iommu_dev, dev);
 
return 0;
 }
@@ -203,6 +208,7 @@ static void s390_iommu_remove_device(struct device *dev)
s390_iommu_detach_device(domain, dev);
}
 
+   iommu_device_unlink(>iommu_dev, dev);
iommu_group_remove_device(dev);
 }
 
@@ -342,13 +348,37 @@ int zpci_init_iommu(struct zpci_dev *zdev)
if (IS_ERR(zdev->group)) {
rc = PTR_ERR(zdev->group);
zdev->group = NULL;
+   goto out_err;
}
 
+   rc = iommu_device_sysfs_add(>iommu_dev, NULL, NULL,
+   "s390-iommu.%08x", zdev->fid);
+   if (rc)
+   goto out_group;
+
+   iommu_device_set_ops(>iommu_dev, _iommu_ops);
+
+   rc = iommu_device_register(>iommu_dev);
+   if (rc)
+   goto out_sysfs;
+
+   return 0;
+
+out_sysfs:
+   iommu_device_sysfs_remove(>iommu_dev);
+
+out_group:
+   iommu_group_put(zdev->group);
+   zdev->group = NULL;
+
+out_err:
return rc;
 }
 
 void zpci_destroy_iommu(struct zpci_dev *zdev)
 {
+   iommu_device_unregister(>iommu_dev);
+   iommu_device_sysfs_remove(>iommu_dev);
iommu_group_put(zdev->group);
zdev->group = NULL;
 }
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/2] iommu/s390: Fix IOMMU groups

2017-04-27 Thread Joerg Roedel
From: Joerg Roedel 

Currently the s390 iommu driver allocates an iommu-group for
every device that is added. But that is wrong, as there is
only one dma-table per pci-root-bus. Make all devices behind
one dma-table share one iommu-group.

Signed-off-by: Joerg Roedel 
---
 arch/s390/include/asm/pci.h |  7 +++
 arch/s390/pci/pci.c | 10 +-
 drivers/iommu/s390-iommu.c  | 43 ---
 3 files changed, 48 insertions(+), 12 deletions(-)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 4e31866..045665d 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -123,6 +124,8 @@ struct zpci_dev {
unsigned long   iommu_pages;
unsigned intnext_bit;
 
+   struct iommu_group *group;  /* IOMMU group for all devices behind 
this zdev */
+
char res_name[16];
struct zpci_bar_struct bars[PCI_BAR_COUNT];
 
@@ -173,6 +176,10 @@ static inline bool zdev_enabled(struct zpci_dev *zdev)
 int clp_enable_fh(struct zpci_dev *, u8);
 int clp_disable_fh(struct zpci_dev *);
 
+/* IOMMU Interface */
+int zpci_init_iommu(struct zpci_dev *zdev);
+void zpci_destroy_iommu(struct zpci_dev *zdev);
+
 #ifdef CONFIG_PCI
 /* Error handling and recovery */
 void zpci_event_error(void *);
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 364b9d8..3178e4d 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -754,6 +754,7 @@ void pcibios_remove_bus(struct pci_bus *bus)
 
zpci_exit_slot(zdev);
zpci_cleanup_bus_resources(zdev);
+   zpci_destroy_iommu(zdev);
zpci_free_domain(zdev);
 
spin_lock(_list_lock);
@@ -825,11 +826,15 @@ int zpci_create_device(struct zpci_dev *zdev)
if (rc)
goto out;
 
+   rc = zpci_init_iommu(zdev);
+   if (rc)
+   goto out_free;
+
mutex_init(>lock);
if (zdev->state == ZPCI_FN_STATE_CONFIGURED) {
rc = zpci_enable_device(zdev);
if (rc)
-   goto out_free;
+   goto out_iommu;
}
rc = zpci_scan_bus(zdev);
if (rc)
@@ -846,6 +851,9 @@ int zpci_create_device(struct zpci_dev *zdev)
 out_disable:
if (zdev->state == ZPCI_FN_STATE_ONLINE)
zpci_disable_device(zdev);
+out_iommu:
+   zpci_destroy_iommu(zdev);
+
 out_free:
zpci_free_domain(zdev);
 out:
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 179e636..cad3ad0 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -163,22 +163,22 @@ static void s390_iommu_detach_device(struct iommu_domain 
*domain,
}
 }
 
-static int s390_iommu_add_device(struct device *dev)
+static struct iommu_group *s390_iommu_device_group(struct device *dev)
 {
-   struct iommu_group *group;
-   int rc;
+   struct zpci_dev *zdev = to_pci_dev(dev)->sysdata;
 
-   group = iommu_group_get(dev);
-   if (!group) {
-   group = iommu_group_alloc();
-   if (IS_ERR(group))
-   return PTR_ERR(group);
-   }
+   return zdev->group;
+}
+
+static int s390_iommu_add_device(struct device *dev)
+{
+   struct iommu_group *group = iommu_group_get_for_dev(dev);
+   if (IS_ERR(group))
+   return PTR_ERR(group);
 
-   rc = iommu_group_add_device(group, dev);
iommu_group_put(group);
 
-   return rc;
+   return 0;
 }
 
 static void s390_iommu_remove_device(struct device *dev)
@@ -333,6 +333,26 @@ static size_t s390_iommu_unmap(struct iommu_domain *domain,
return size;
 }
 
+int zpci_init_iommu(struct zpci_dev *zdev)
+{
+   int rc = 0;
+
+   zdev->group = iommu_group_alloc();
+
+   if (IS_ERR(zdev->group)) {
+   rc = PTR_ERR(zdev->group);
+   zdev->group = NULL;
+   }
+
+   return rc;
+}
+
+void zpci_destroy_iommu(struct zpci_dev *zdev)
+{
+   iommu_group_put(zdev->group);
+   zdev->group = NULL;
+}
+
 static struct iommu_ops s390_iommu_ops = {
.capable = s390_iommu_capable,
.domain_alloc = s390_domain_alloc,
@@ -344,6 +364,7 @@ static size_t s390_iommu_unmap(struct iommu_domain *domain,
.iova_to_phys = s390_iommu_iova_to_phys,
.add_device = s390_iommu_add_device,
.remove_device = s390_iommu_remove_device,
+   .device_group = s390_iommu_device_group,
.pgsize_bitmap = S390_IOMMU_PGSIZES,
 };
 
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3/4] iommu: Remove pci.h include from trace/events/iommu.h

2017-04-27 Thread kbuild test robot
Hi Joerg,

[auto build test WARNING on linus/master]
[also build test WARNING on v4.11-rc8 next-20170426]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Joerg-Roedel/iommu-Header-file-cleanups/20170427-160734
config: tile-allmodconfig (attached as .config)
compiler: tilegx-linux-gcc (GCC) 4.6.2
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=tile 

All warnings (new ones prefixed by >>):

   In file included from drivers/infiniband//hw/qedr/main.c:39:0:
>> include/linux/qed/qede_roce.h:57:12: warning: 'struct pci_dev' declared 
>> inside parameter list [enabled by default]
>> include/linux/qed/qede_roce.h:57:12: warning: its scope is only this 
>> definition or declaration, which is probably not what you want [enabled by 
>> default]
>> drivers/infiniband//hw/qedr/main.c:905:2: warning: initialization from 
>> incompatible pointer type [enabled by default]
   drivers/infiniband//hw/qedr/main.c:905:2: warning: (near initialization for 
'qedr_drv.add') [enabled by default]
--
   In file included from drivers/infiniband/hw/qedr/main.c:39:0:
>> include/linux/qed/qede_roce.h:57:12: warning: 'struct pci_dev' declared 
>> inside parameter list [enabled by default]
>> include/linux/qed/qede_roce.h:57:12: warning: its scope is only this 
>> definition or declaration, which is probably not what you want [enabled by 
>> default]
   drivers/infiniband/hw/qedr/main.c:905:2: warning: initialization from 
incompatible pointer type [enabled by default]
   drivers/infiniband/hw/qedr/main.c:905:2: warning: (near initialization for 
'qedr_drv.add') [enabled by default]

vim +57 include/linux/qed/qede_roce.h

cee9fbd8 Ram Amrani 2016-10-01  41  QEDE_DOWN,
cee9fbd8 Ram Amrani 2016-10-01  42  QEDE_CHANGE_ADDR,
cee9fbd8 Ram Amrani 2016-10-01  43  QEDE_CLOSE
cee9fbd8 Ram Amrani 2016-10-01  44  };
cee9fbd8 Ram Amrani 2016-10-01  45  
cee9fbd8 Ram Amrani 2016-10-01  46  struct qede_roce_event_work {
cee9fbd8 Ram Amrani 2016-10-01  47  struct list_head list;
cee9fbd8 Ram Amrani 2016-10-01  48  struct work_struct work;
cee9fbd8 Ram Amrani 2016-10-01  49  void *ptr;
cee9fbd8 Ram Amrani 2016-10-01  50  enum qede_roce_event event;
cee9fbd8 Ram Amrani 2016-10-01  51  };
cee9fbd8 Ram Amrani 2016-10-01  52  
cee9fbd8 Ram Amrani 2016-10-01  53  struct qedr_driver {
cee9fbd8 Ram Amrani 2016-10-01  54  unsigned char name[32];
cee9fbd8 Ram Amrani 2016-10-01  55  
cee9fbd8 Ram Amrani 2016-10-01  56  struct qedr_dev* (*add)(struct qed_dev 
*, struct pci_dev *,
cee9fbd8 Ram Amrani 2016-10-01 @57  struct 
net_device *);
cee9fbd8 Ram Amrani 2016-10-01  58  
cee9fbd8 Ram Amrani 2016-10-01  59  void (*remove)(struct qedr_dev *);
cee9fbd8 Ram Amrani 2016-10-01  60  void (*notify)(struct qedr_dev *, enum 
qede_roce_event);
cee9fbd8 Ram Amrani 2016-10-01  61  };
cee9fbd8 Ram Amrani 2016-10-01  62  
cee9fbd8 Ram Amrani 2016-10-01  63  /* APIs for RoCE driver to register 
callback handlers,
cee9fbd8 Ram Amrani 2016-10-01  64   * which will be invoked when device is 
added, removed, ifup, ifdown
cee9fbd8 Ram Amrani 2016-10-01  65   */

:: The code at line 57 was first introduced by commit
:: cee9fbd8e2e9e713cd8bf227c6492fd8854de74b qede: Add qedr framework

:: TO: Ram Amrani <ram.amr...@caviumnetworks.com>
:: CC: David S. Miller <da...@davemloft.net>

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 0/3] Cavium ThunderX2 SMMUv3 errata workarounds

2017-04-27 Thread Robert Richter
On 27.04.17 17:16:21, Geetha sowjanya wrote:
> From: Geetha 
> 
> Cavium CN99xx SMMUv3 implementation has two Silicon Erratas.
> 1. Errata ID #74
>SMMU register alias Page 1 is not implemented
> 2. Errata ID #126
>SMMU doesnt support unique IRQ lines for gerror, eventq and cmdq-sync
> 
> The following patchset does software workaround for these two erratas.
> 
> This series is based on RFC patch.  
> https://www.spinics.net/lists/arm-kernel/msg575739.html
> 
> As suggested by Will Deacon, code is modified to use silicon id to
> enable errata#74 workaround.

Can we go with the previous series [1] and:

 * drop the iort model numbering part,

 * add an enablement function that enables flags (smmu->options)
   depending on midr values (which replaces the macro code)?

E.g.:

static void acpi_smmu_enable_cavium(struct arm_smmu_device *smmu)
{
u32 cpu_model;

if (!IS_ENABLED(CONFIG_ARM64))
return;

cpu_model = read_cpuid_id() & MIDR_CPU_MODEL_MASK;
switch (cpu_model) {
case ...:
case ...:
break;
default:
/* No Cavium CN99xx SMMU v3 */
return;
}

smmu->options |= (ARM_SMMU_OPT_PAGE0_REGS_ONLY |
 ARM_SMMU_OPT_USE_SHARED_IRQS);
}

-Robert


[1] [RFC PATCH 0/7] Cavium CN99xx SMMUv3 Errata workarounds
https://marc.info/?l=linux-acpi=149192179623708=2

> 
> Linu Cherian (1):
>   iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74
> 
> Geetha (2):
>   arm64: Add MIDR values for Cavium cn99xx SoCs
>   iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #126
> 
>  Documentation/arm64/silicon-errata.txt |  2 ++
>  arch/arm64/include/asm/cputype.h   |  3 ++
>  drivers/acpi/arm64/iort.c  | 14 +++-
>  drivers/iommu/arm-smmu-v3.c| 64 
> +-
>  4 files changed, 73 insertions(+), 10 deletions(-)
> 
> -- 
> 1.9.1
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/3] arm64: Add MIDR values for Cavium cn99xx SoCs

2017-04-27 Thread Jayachandran C.
On Thu, Apr 27, 2017 at 5:16 PM, Geetha sowjanya
 wrote:
> From: Geetha 
>
> Add MIDR values for Cavium cn99xx SoCs
>
> Signed-off-by: Geetha 
> ---
>  arch/arm64/include/asm/cputype.h | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/arm64/include/asm/cputype.h 
> b/arch/arm64/include/asm/cputype.h
> index fc50271..066fad0 100644
> --- a/arch/arm64/include/asm/cputype.h
> +++ b/arch/arm64/include/asm/cputype.h
> @@ -85,6 +85,7 @@
>
>  #define CAVIUM_CPU_PART_THUNDERX   0x0A1
>  #define CAVIUM_CPU_PART_THUNDERX_81XX  0x0A2
> +#define CAVIUM_CPU_PART_THUNDERX_99XX  0x0AF

Can you please use the name CAVIUM_CPU_PART_THUNDERX2? We have used
ThunderX2 consistently for this platform, having THUNDERX here would
be confusing.

>  #define BRCM_CPU_PART_VULCAN   0x516
>
> @@ -94,6 +95,8 @@
>  #define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, 
> ARM_CPU_PART_CORTEX_A57)
>  #define MIDR_THUNDERX  MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, 
> CAVIUM_CPU_PART_THUNDERX)
>  #define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, 
> CAVIUM_CPU_PART_THUNDERX_81XX)
> +#define MIDR_THUNDERX_99XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, 
> CAVIUM_CPU_PART_THUNDERX_99XX)
> +#define MIDR_BRCM_VULCAN MIDR_CPU_MODEL(ARM_CPU_IMP_BRCM, 
> BRCM_CPU_PART_VULCAN)
>  #define MIDR_QCOM_FALKOR_V1 MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, 
> QCOM_CPU_PART_FALKOR_V1)
>
>  #ifndef __ASSEMBLY__

Thanks,
JC.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 3/3] iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2

2017-04-27 Thread Geetha sowjanya
From: Geetha 

Cavium 99xx SMMU doesn't support MSI and also doesn't have unique irq
lines for gerror, eventq and cmdq-sync.

This patch addresses the issue by checking if any interrupt sources are
using same irq number, then they are registered as shared irqs.

Signed-off-by: Geetha 
---
 Documentation/arm64/silicon-errata.txt |  1 +
 drivers/iommu/arm-smmu-v3.c| 32 
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/Documentation/arm64/silicon-errata.txt 
b/Documentation/arm64/silicon-errata.txt
index 629e2ce..cc15f25 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -62,6 +62,7 @@ stable kernels.
 | Cavium | ThunderX Core   | #27456  | CAVIUM_ERRATUM_27456
|
 | Cavium | ThunderX SMMUv2 | #27704  | N/A 
|
 | Cavium | ThunderX2 SMMUv3| #74 | N/A 
|
+| Cavium | ThunderX2 SMMUv3| #126| N/A 
|
 || | | 
|
 | Freescale/NXP  | LS2080A/LS1043A | A-008585| FSL_ERRATUM_A008585 
|
 || | | 
|
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index ee23ccd..eb55d38 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2211,10 +2211,30 @@ static void arm_smmu_setup_msis(struct arm_smmu_device 
*smmu)
devm_add_action(dev, arm_smmu_free_msis, dev);
 }
 
+static int get_irq_flags(struct arm_smmu_device *smmu, int irq)
+{
+   int match_count = 0;
+
+   if (irq == smmu->evtq.q.irq)
+   match_count++;
+   if (irq == smmu->cmdq.q.irq)
+   match_count++;
+   if (irq == smmu->gerr_irq)
+   match_count++;
+   if (irq == smmu->priq.q.irq)
+   match_count++;
+
+   if (match_count > 1)
+   return IRQF_SHARED | IRQF_ONESHOT;
+
+   return 0;
+}
+
 static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
 {
int ret, irq;
u32 irqen_flags = IRQ_CTRL_EVTQ_IRQEN | IRQ_CTRL_GERROR_IRQEN;
+   u32 irqflags = 0;
 
/* Disable IRQs first */
ret = arm_smmu_write_reg_sync(smmu, 0, ARM_SMMU_IRQ_CTRL,
@@ -2229,9 +2249,10 @@ static int arm_smmu_setup_irqs(struct arm_smmu_device 
*smmu)
/* Request interrupt lines */
irq = smmu->evtq.q.irq;
if (irq) {
+   irqflags = get_irq_flags(smmu, irq);
ret = devm_request_threaded_irq(smmu->dev, irq, NULL,
arm_smmu_evtq_thread,
-   IRQF_ONESHOT,
+   IRQF_ONESHOT | irqflags,
"arm-smmu-v3-evtq", smmu);
if (ret < 0)
dev_warn(smmu->dev, "failed to enable evtq irq\n");
@@ -2239,8 +2260,9 @@ static int arm_smmu_setup_irqs(struct arm_smmu_device 
*smmu)
 
irq = smmu->cmdq.q.irq;
if (irq) {
+   irqflags = get_irq_flags(smmu, irq);
ret = devm_request_irq(smmu->dev, irq,
-  arm_smmu_cmdq_sync_handler, 0,
+  arm_smmu_cmdq_sync_handler, irqflags,
   "arm-smmu-v3-cmdq-sync", smmu);
if (ret < 0)
dev_warn(smmu->dev, "failed to enable cmdq-sync irq\n");
@@ -2248,8 +2270,9 @@ static int arm_smmu_setup_irqs(struct arm_smmu_device 
*smmu)
 
irq = smmu->gerr_irq;
if (irq) {
+   irqflags = get_irq_flags(smmu, irq);
ret = devm_request_irq(smmu->dev, irq, arm_smmu_gerror_handler,
-  0, "arm-smmu-v3-gerror", smmu);
+  irqflags, "arm-smmu-v3-gerror", smmu);
if (ret < 0)
dev_warn(smmu->dev, "failed to enable gerror irq\n");
}
@@ -2257,9 +2280,10 @@ static int arm_smmu_setup_irqs(struct arm_smmu_device 
*smmu)
if (smmu->features & ARM_SMMU_FEAT_PRI) {
irq = smmu->priq.q.irq;
if (irq) {
+   irqflags = get_irq_flags(smmu, irq);
ret = devm_request_threaded_irq(smmu->dev, irq, NULL,
arm_smmu_priq_thread,
-   IRQF_ONESHOT,
+   IRQF_ONESHOT | irqflags,
"arm-smmu-v3-priq",
smmu);
if (ret < 0)
-- 
1.9.1


[PATCH 2/3] iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74

2017-04-27 Thread Geetha sowjanya
From: Linu Cherian  

Cavium 99xx SMMU implementation doesn't support page 1 register space.
Based on silicon id, ARM_SMMU_PAGE0_REGS_ONLY macro is set as an errata
workaround.

This macro when set, replaces all page 1 offsets used for
EVTQ_PROD/CONS, PRIQ_PROD/CONS register access with page 0 offsets.

Signed-off-by: Linu Cherian 
Signed-off-by: Geetha 
---
 Documentation/arm64/silicon-errata.txt |  1 +
 drivers/acpi/arm64/iort.c  | 14 +-
 drivers/iommu/arm-smmu-v3.c| 32 +++-
 3 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/Documentation/arm64/silicon-errata.txt 
b/Documentation/arm64/silicon-errata.txt
index 2f66683..629e2ce 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -61,6 +61,7 @@ stable kernels.
 | Cavium | ThunderX GICv3  | #23154  | CAVIUM_ERRATUM_23154
|
 | Cavium | ThunderX Core   | #27456  | CAVIUM_ERRATUM_27456
|
 | Cavium | ThunderX SMMUv2 | #27704  | N/A 
|
+| Cavium | ThunderX2 SMMUv3| #74 | N/A 
|
 || | | 
|
 | Freescale/NXP  | LS2080A/LS1043A | A-008585| FSL_ERRATUM_A008585 
|
 || | | 
|
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 4a5bb96..a074ce9 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define IORT_TYPE_MASK(type)   (1 << (type))
 #define IORT_MSI_TYPE  (1 << ACPI_IORT_NODE_ITS_GROUP)
@@ -669,12 +670,23 @@ static void __init arm_smmu_v3_init_resources(struct 
resource *res,
 {
struct acpi_iort_smmu_v3 *smmu;
int num_res = 0;
+   u32 cpu_model;
+   unsigned long size = SZ_128K;
 
/* Retrieve SMMUv3 specific data */
smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
 
+   /*
+* Override the size, for Cavium CN99xx implementations
+* which doesn't support the page 1 SMMU register space.
+*/
+   cpu_model = read_cpuid_id() & MIDR_CPU_MODEL_MASK;
+   if (cpu_model == MIDR_THUNDERX_99XX ||
+   cpu_model == MIDR_BRCM_VULCAN)
+   size = SZ_64K;
+
res[num_res].start = smmu->base_address;
-   res[num_res].end = smmu->base_address + SZ_128K - 1;
+   res[num_res].end = smmu->base_address + size - 1;
res[num_res].flags = IORESOURCE_MEM;
 
num_res++;
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 1dcd154..ee23ccd 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -38,6 +38,7 @@
 #include 
 
 #include 
+#include 
 
 #include "io-pgtable.h"
 
@@ -176,15 +177,15 @@
 #define ARM_SMMU_CMDQ_CONS 0x9c
 
 #define ARM_SMMU_EVTQ_BASE 0xa0
-#define ARM_SMMU_EVTQ_PROD 0x100a8
-#define ARM_SMMU_EVTQ_CONS 0x100ac
+#define ARM_SMMU_EVTQ_PROD (page1_offset_adjust(0x100a8))
+#define ARM_SMMU_EVTQ_CONS (page1_offset_adjust(0x100ac))
 #define ARM_SMMU_EVTQ_IRQ_CFG0 0xb0
 #define ARM_SMMU_EVTQ_IRQ_CFG1 0xb8
 #define ARM_SMMU_EVTQ_IRQ_CFG2 0xbc
 
 #define ARM_SMMU_PRIQ_BASE 0xc0
-#define ARM_SMMU_PRIQ_PROD 0x100c8
-#define ARM_SMMU_PRIQ_CONS 0x100cc
+#define ARM_SMMU_PRIQ_PROD (page1_offset_adjust(0x100c8))
+#define ARM_SMMU_PRIQ_CONS (page1_offset_adjust(0x100cc))
 #define ARM_SMMU_PRIQ_IRQ_CFG0 0xd0
 #define ARM_SMMU_PRIQ_IRQ_CFG1 0xd8
 #define ARM_SMMU_PRIQ_IRQ_CFG2 0xdc
@@ -412,6 +413,10 @@
 #define MSI_IOVA_BASE  0x800
 #define MSI_IOVA_LENGTH0x10
 
+#define ARM_SMMU_PAGE0_REGS_ONLY   \
+   (((read_cpuid_id() & MIDR_CPU_MODEL_MASK) == MIDR_THUNDERX_99XX) \
+   || ((read_cpuid_id() & MIDR_CPU_MODEL_MASK) == MIDR_BRCM_VULCAN))
+
 static bool disable_bypass;
 module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_bypass,
@@ -660,6 +665,15 @@ struct arm_smmu_option_prop {
{ 0, NULL},
 };
 
+static inline unsigned long page1_offset_adjust(
+   unsigned long off)
+{
+   if (!ARM_SMMU_PAGE0_REGS_ONLY)
+   return off;
+   else
+   return (off - SZ_64K);
+}
+
 static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
 {
return container_of(dom, struct arm_smmu_domain, domain);
@@ -2631,6 +2645,14 @@ static int arm_smmu_device_dt_probe(struct 
platform_device *pdev,
return ret;
 }
 
+static unsigned long arm_smmu_resource_size(void)
+{
+   if 

[PATCH 1/3] arm64: Add MIDR values for Cavium cn99xx SoCs

2017-04-27 Thread Geetha sowjanya
From: Geetha 

Add MIDR values for Cavium cn99xx SoCs

Signed-off-by: Geetha 
---
 arch/arm64/include/asm/cputype.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index fc50271..066fad0 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -85,6 +85,7 @@
 
 #define CAVIUM_CPU_PART_THUNDERX   0x0A1
 #define CAVIUM_CPU_PART_THUNDERX_81XX  0x0A2
+#define CAVIUM_CPU_PART_THUNDERX_99XX  0x0AF
 
 #define BRCM_CPU_PART_VULCAN   0x516
 
@@ -94,6 +95,8 @@
 #define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, 
ARM_CPU_PART_CORTEX_A57)
 #define MIDR_THUNDERX  MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, 
CAVIUM_CPU_PART_THUNDERX)
 #define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, 
CAVIUM_CPU_PART_THUNDERX_81XX)
+#define MIDR_THUNDERX_99XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, 
CAVIUM_CPU_PART_THUNDERX_99XX)
+#define MIDR_BRCM_VULCAN MIDR_CPU_MODEL(ARM_CPU_IMP_BRCM, BRCM_CPU_PART_VULCAN)
 #define MIDR_QCOM_FALKOR_V1 MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, 
QCOM_CPU_PART_FALKOR_V1)
 
 #ifndef __ASSEMBLY__
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 0/3] Cavium ThunderX2 SMMUv3 errata workarounds

2017-04-27 Thread Geetha sowjanya
From: Geetha 

Cavium CN99xx SMMUv3 implementation has two Silicon Erratas.
1. Errata ID #74
   SMMU register alias Page 1 is not implemented
2. Errata ID #126
   SMMU doesnt support unique IRQ lines for gerror, eventq and cmdq-sync

The following patchset does software workaround for these two erratas.

This series is based on RFC patch.  
https://www.spinics.net/lists/arm-kernel/msg575739.html

As suggested by Will Deacon, code is modified to use silicon id to
enable errata#74 workaround.

Linu Cherian (1):
  iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74

Geetha (2):
  arm64: Add MIDR values for Cavium cn99xx SoCs
  iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #126

 Documentation/arm64/silicon-errata.txt |  2 ++
 arch/arm64/include/asm/cputype.h   |  3 ++
 drivers/acpi/arm64/iort.c  | 14 +++-
 drivers/iommu/arm-smmu-v3.c| 64 +-
 4 files changed, 73 insertions(+), 10 deletions(-)

-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/arm-smmu-v3: Poll for CMDQ drain completion more effectively

2017-04-27 Thread sunil . kovvuri
From: Sunil Goutham 

Modified polling on CMDQ consumer similar to how polling is done for TLB SYNC
completion in SMMUv2 driver. Code changes are done with reference to

8513c8930069 iommu/arm-smmu: Poll for TLB sync completion more effectively

Poll timeout has been increased which addresses issue of 100us timeout not
sufficient, when command queue is full with TLB invalidation commands.

Signed-off-by: Sunil Goutham 
Signed-off-by: Geetha 
---
 drivers/iommu/arm-smmu-v3.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index d412bdd..34599d4 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -379,6 +379,9 @@
 #define CMDQ_SYNC_0_CS_NONE(0UL << CMDQ_SYNC_0_CS_SHIFT)
 #define CMDQ_SYNC_0_CS_SEV (2UL << CMDQ_SYNC_0_CS_SHIFT)
 
+#define CMDQ_DRAIN_TIMEOUT_US  1000
+#define CMDQ_SPIN_COUNT10
+
 /* Event queue */
 #define EVTQ_ENT_DWORDS4
 #define EVTQ_MAX_SZ_SHIFT  7
@@ -737,7 +740,8 @@ static void queue_inc_prod(struct arm_smmu_queue *q)
  */
 static int queue_poll_cons(struct arm_smmu_queue *q, bool drain, bool wfe)
 {
-   ktime_t timeout = ktime_add_us(ktime_get(), ARM_SMMU_POLL_TIMEOUT_US);
+   ktime_t timeout = ktime_add_us(ktime_get(), CMDQ_DRAIN_TIMEOUT_US);
+   unsigned int spin_cnt, delay = 1;
 
while (queue_sync_cons(q), (drain ? !queue_empty(q) : queue_full(q))) {
if (ktime_compare(ktime_get(), timeout) > 0)
@@ -746,8 +750,13 @@ static int queue_poll_cons(struct arm_smmu_queue *q, bool 
drain, bool wfe)
if (wfe) {
wfe();
} else {
-   cpu_relax();
-   udelay(1);
+   for (spin_cnt = 0;
+spin_cnt < CMDQ_SPIN_COUNT; spin_cnt++) {
+   cpu_relax();
+   continue;
+   }
+   udelay(delay);
+   delay *= 2;
}
}
 
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 03/20] intel_iommu: add "svm" option

2017-04-27 Thread Peter Xu
On Wed, Apr 26, 2017 at 06:06:33PM +0800, Liu, Yi L wrote:
> Expose "Shared Virtual Memory" to guest by using "svm" option.
> Also use "svm" to expose SVM related capabilities to guest.
> e.g. "-device intel-iommu, svm=on"
> 
> Signed-off-by: Liu, Yi L 
> ---
>  hw/i386/intel_iommu.c  | 10 ++
>  hw/i386/intel_iommu_internal.h |  5 +
>  include/hw/i386/intel_iommu.h  |  1 +
>  3 files changed, 16 insertions(+)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index bf98fa5..ba1e7eb 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -2453,6 +2453,7 @@ static Property vtd_properties[] = {
>  DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
>  DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
>  DEFINE_PROP_BOOL("ecs", IntelIOMMUState, ecs, FALSE),
> +DEFINE_PROP_BOOL("svm", IntelIOMMUState, svm, FALSE),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> @@ -2973,6 +2974,15 @@ static void vtd_init(IntelIOMMUState *s)
>  s->ecap |= VTD_ECAP_ECS;
>  }
>  
> +if (s->svm) {
> +if (!s->ecs || !x86_iommu->pt_supported || !s->caching_mode) {
> +error_report("Need to set ecs, pt, caching-mode for svm");
> +exit(1);
> +}
> +s->cap |= VTD_CAP_DWD | VTD_CAP_DRD;
> +s->ecap |= VTD_ECAP_PRS | VTD_ECAP_PTS | VTD_ECAP_PASID28;
> +}
> +
>  if (s->caching_mode) {
>  s->cap |= VTD_CAP_CM;
>  }
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index 71a1c1e..f2a7d12 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -191,6 +191,9 @@
>  #define VTD_ECAP_PT (1ULL << 6)
>  #define VTD_ECAP_MHMV   (15ULL << 20)
>  #define VTD_ECAP_ECS(1ULL << 24)
> +#define VTD_ECAP_PASID28(1ULL << 28)

Could I ask what's this bit? On my spec, it says this bit is reserved
and defunct (spec version: June 2016).

> +#define VTD_ECAP_PRS(1ULL << 29)
> +#define VTD_ECAP_PTS(0xeULL << 35)

Would it better we avoid using 0xe here, or at least add some comment?

>  
>  /* CAP_REG */
>  /* (offset >> 4) << 24 */
> @@ -207,6 +210,8 @@
>  #define VTD_CAP_PSI (1ULL << 39)
>  #define VTD_CAP_SLLPS   ((1ULL << 34) | (1ULL << 35))
>  #define VTD_CAP_CM  (1ULL << 7)
> +#define VTD_CAP_DWD (1ULL << 54)
> +#define VTD_CAP_DRD (1ULL << 55)

Just to confirm: after this series, we should support drain read/write
then, right?

Thanks,

>  
>  /* Supported Adjusted Guest Address Widths */
>  #define VTD_CAP_SAGAW_SHIFT 8
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index ae21fe5..8981615 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -267,6 +267,7 @@ struct IntelIOMMUState {
>  
>  bool caching_mode;  /* RO - is cap CM enabled? */
>  bool ecs;   /* Extended Context Support */
> +bool svm;   /* Shared Virtual Memory */
>  
>  dma_addr_t root;/* Current root table pointer */
>  bool root_extended; /* Type of root table (extended or not) 
> */
> -- 
> 1.9.1
> 

-- 
Peter Xu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Qemu-devel] [RFC PATCH 12/20] Memory: Add func to fire pasidt_bind notifier

2017-04-27 Thread Peter Xu
On Thu, Apr 27, 2017 at 06:25:37PM +0800, Liu, Yi L wrote:
> On Thu, Apr 27, 2017 at 02:14:27PM +0800, Peter Xu wrote:
> > On Thu, Apr 27, 2017 at 10:37:19AM +0800, Liu, Yi L wrote:
> > > On Wed, Apr 26, 2017 at 03:50:16PM +0200, Paolo Bonzini wrote:
> > > > 
> > > > 
> > > > On 26/04/2017 12:06, Liu, Yi L wrote:
> > > > > +void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
> > > > > + void *data)
> > > > > +{
> > > > > +IOMMUNotifier *iommu_notifier;
> > > > > +IOMMUNotifierFlag request_flags;
> > > > > +
> > > > > +assert(memory_region_is_iommu(mr));
> > > > > +
> > > > > +/*TODO: support other bind requests with smaller gran,
> > > > > + * e.g. bind signle pasid entry
> > > > > + */
> > > > > +request_flags = IOMMU_NOTIFIER_SVM_PASIDT_BIND;
> > > > > +
> > > > > +QLIST_FOREACH(iommu_notifier, >iommu_notify, node) {
> > > > > +if (iommu_notifier->notifier_flags & request_flags) {
> > > > > +iommu_notifier->notify(iommu_notifier, data);
> > > > > +break;
> > > > > +}
> > > > > +}
> > > > 
> > > > Peter,
> > > > 
> > > > should this reuse ->notify, or should it be different function pointer
> > > > in IOMMUNotifier?
> > > 
> > > Hi Paolo,
> > > 
> > > Thx for your review.
> > > 
> > > I think it should be “->notify” here. In this patchset, the new notifier
> > > is registered with the existing notifier registration API. So the all the
> > > notifiers are in the mr->iommu_notify list. And notifiers are labeled
> > > by notify flag, so it is able to differentiate the IOMMUNotifier nodes.
> > > When the flag meets, trigger it by “->notify”. The diagram below shows
> > > my understanding , wish it helps to make me understood.
> > > 
> > > VFIOContainer
> > >|
> > >giommu_list(VFIOGuestIOMMU)
> > > \
> > >  VFIOGuestIOMMU1 ->   VFIOGuestIOMMU2 -> VFIOGuestIOMMU3 
> > > ...
> > > | | |
> > > mr->iommu_notify: IOMMUNotifier   ->IOMMUNotifier  ->  IOMMUNotifier
> > >   (Flag:MAP/UNMAP) (Flag:SVM bind)  (Flag:tlb 
> > > invalidate)
> > > 
> > > 
> > > Actually, compared with the MAP/UNMAP notifier, the newly added notifier 
> > > has
> > > no start/end check, and there may be other types of bind notfier flag in
> > > future, so I added a separate fire func for SVM bind notifier.
> > 
> > I agree with Paolo that this interface might not be the suitable place
> > for the SVM notifiers (just like what I worried about in previous
> > discussions).
> > 
> > The biggest problem is that, if you see current notifier mechanism,
> > it's per-memory-region. However iiuc your messages should be
> > per-iommu, or say, per translation unit.
> 
> Hi Peter,
> 
> yes, you're right. the newly added notifier is per-iommu.
> 
> > While, for each iommu, there
> > can be more than one memory regions (ppc can be an example). When
> > there are more than one MRs binded to the same iommu unit, which
> > memory region should you register to? Any one of them, or all?
> 
> Honestly, I'm not expert on ppc. According to the current code,
> I can only find one MR initialized with memory_region_init_iommu()
> in spapr_tce_table_realize(). So to better get your point, let me
> check. Do you mean there may be multiple of iommu MRs behind a iommu?

I am not either. :)

But yes, that's what I mean. At least that's how I understand it.

> 
> I admit it must be considered if there are multiple iommu MRs. I may
> choose to register for one of them since the notifier is per-iommu as
> you've pointed. Then vIOMMU emulator need to trigger the notifier with
> the correct MR. Not sure if ppc vIOMMU is fine with it.
> 
> > So my conclusion is, it just has nothing to do with memory regions...
> >
> > Instead of a different function pointer in IOMMUNotifer, IMHO we can
> > even move a step further, to isolate IOTLB notifications (targeted at
> > memory regions and with start/end ranges) out of SVM/other
> > notifications, since they are different in general. So we basically
> > need two notification mechanism:
> > 
> > - one for memory regions, currently what I can see is IOTLB
> >   notifications
> > 
> > - one for translation units, currently I see all the rest of
> >   notifications needed in virt-svm in this category
> > 
> > Maybe some RFC patches would be good to show what I mean... I'll see
> > whether I can prepare some.
> 
> I agree that it would be helpful to split the two kinds of notifiers. I
> marked it as a FIXME in patch 0006 of this series. Just saw your RFC patch
> for common IOMMUObject. Thx for your work, would try to review it.

Thanks, looking forward to your review comments.

> 
> Besides the notifier registration, pls also help to review the SVM
> virtualization itself. Would be glad to know your comments.

Yes. It's on my list. Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [RFC PATCH 12/20] Memory: Add func to fire pasidt_bind notifier

2017-04-27 Thread Liu, Yi L
On Thu, Apr 27, 2017 at 02:14:27PM +0800, Peter Xu wrote:
> On Thu, Apr 27, 2017 at 10:37:19AM +0800, Liu, Yi L wrote:
> > On Wed, Apr 26, 2017 at 03:50:16PM +0200, Paolo Bonzini wrote:
> > > 
> > > 
> > > On 26/04/2017 12:06, Liu, Yi L wrote:
> > > > +void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
> > > > + void *data)
> > > > +{
> > > > +IOMMUNotifier *iommu_notifier;
> > > > +IOMMUNotifierFlag request_flags;
> > > > +
> > > > +assert(memory_region_is_iommu(mr));
> > > > +
> > > > +/*TODO: support other bind requests with smaller gran,
> > > > + * e.g. bind signle pasid entry
> > > > + */
> > > > +request_flags = IOMMU_NOTIFIER_SVM_PASIDT_BIND;
> > > > +
> > > > +QLIST_FOREACH(iommu_notifier, >iommu_notify, node) {
> > > > +if (iommu_notifier->notifier_flags & request_flags) {
> > > > +iommu_notifier->notify(iommu_notifier, data);
> > > > +break;
> > > > +}
> > > > +}
> > > 
> > > Peter,
> > > 
> > > should this reuse ->notify, or should it be different function pointer
> > > in IOMMUNotifier?
> > 
> > Hi Paolo,
> > 
> > Thx for your review.
> > 
> > I think it should be “->notify” here. In this patchset, the new notifier
> > is registered with the existing notifier registration API. So the all the
> > notifiers are in the mr->iommu_notify list. And notifiers are labeled
> > by notify flag, so it is able to differentiate the IOMMUNotifier nodes.
> > When the flag meets, trigger it by “->notify”. The diagram below shows
> > my understanding , wish it helps to make me understood.
> > 
> > VFIOContainer
> >|
> >giommu_list(VFIOGuestIOMMU)
> > \
> >  VFIOGuestIOMMU1 ->   VFIOGuestIOMMU2 -> VFIOGuestIOMMU3 ...
> > | | |
> > mr->iommu_notify: IOMMUNotifier   ->IOMMUNotifier  ->  IOMMUNotifier
> >   (Flag:MAP/UNMAP) (Flag:SVM bind)  (Flag:tlb 
> > invalidate)
> > 
> > 
> > Actually, compared with the MAP/UNMAP notifier, the newly added notifier has
> > no start/end check, and there may be other types of bind notfier flag in
> > future, so I added a separate fire func for SVM bind notifier.
> 
> I agree with Paolo that this interface might not be the suitable place
> for the SVM notifiers (just like what I worried about in previous
> discussions).
> 
> The biggest problem is that, if you see current notifier mechanism,
> it's per-memory-region. However iiuc your messages should be
> per-iommu, or say, per translation unit.

Hi Peter,

yes, you're right. the newly added notifier is per-iommu.

> While, for each iommu, there
> can be more than one memory regions (ppc can be an example). When
> there are more than one MRs binded to the same iommu unit, which
> memory region should you register to? Any one of them, or all?

Honestly, I'm not expert on ppc. According to the current code,
I can only find one MR initialized with memory_region_init_iommu()
in spapr_tce_table_realize(). So to better get your point, let me
check. Do you mean there may be multiple of iommu MRs behind a iommu?

I admit it must be considered if there are multiple iommu MRs. I may
choose to register for one of them since the notifier is per-iommu as
you've pointed. Then vIOMMU emulator need to trigger the notifier with
the correct MR. Not sure if ppc vIOMMU is fine with it.

> So my conclusion is, it just has nothing to do with memory regions...
>
> Instead of a different function pointer in IOMMUNotifer, IMHO we can
> even move a step further, to isolate IOTLB notifications (targeted at
> memory regions and with start/end ranges) out of SVM/other
> notifications, since they are different in general. So we basically
> need two notification mechanism:
> 
> - one for memory regions, currently what I can see is IOTLB
>   notifications
> 
> - one for translation units, currently I see all the rest of
>   notifications needed in virt-svm in this category
> 
> Maybe some RFC patches would be good to show what I mean... I'll see
> whether I can prepare some.

I agree that it would be helpful to split the two kinds of notifiers. I
marked it as a FIXME in patch 0006 of this series. Just saw your RFC patch
for common IOMMUObject. Thx for your work, would try to review it.

Besides the notifier registration, pls also help to review the SVM
virtualization itself. Would be glad to know your comments.

Thanks,
Yi L

> Thanks,
> 
> -- 
> Peter Xu
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC PATCH 02/20] intel_iommu: exposed extended-context mode to guest

2017-04-27 Thread Peter Xu
On Wed, Apr 26, 2017 at 06:06:32PM +0800, Liu, Yi L wrote:
> VT-d implementations reporting PASID or PRS fields as "Set", must also
> report ecap.ECS as "Set". Extended-Context is required for SVM.
> 
> When ECS is reported, intel iommu driver would initiate extended root entry
> and extended context entry, and also PASID table if there is any SVM capable
> device.
> 
> Signed-off-by: Liu, Yi L 
> ---
>  hw/i386/intel_iommu.c  | 131 
> +++--
>  hw/i386/intel_iommu_internal.h |   9 +++
>  include/hw/i386/intel_iommu.h  |   2 +-
>  3 files changed, 97 insertions(+), 45 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 400d0d1..bf98fa5 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -497,6 +497,11 @@ static inline bool vtd_root_entry_present(VTDRootEntry 
> *root)
>  return root->val & VTD_ROOT_ENTRY_P;
>  }
>  
> +static inline bool vtd_root_entry_upper_present(VTDRootEntry *root)
> +{
> +return root->rsvd & VTD_ROOT_ENTRY_P;
> +}
> +
>  static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index,
>VTDRootEntry *re)
>  {
> @@ -509,6 +514,9 @@ static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t 
> index,
>  return -VTD_FR_ROOT_TABLE_INV;
>  }
>  re->val = le64_to_cpu(re->val);
> +if (s->ecs) {
> +re->rsvd = le64_to_cpu(re->rsvd);
> +}

I feel it slightly hacky to play with re->rsvd. How about:

union VTDRootEntry {
struct {
uint64_t val;
uint64_t rsvd;
} base;
struct {
uint64_t ext_lo;
uint64_t ext_hi;
} extended;
};

(Or any better way that can get rid of rsvd...)

Even:

struct VTDRootEntry {
union {
struct {
uint64_t val;
uint64_t rsvd;
} base;
struct {
uint64_t ext_lo;
uint64_t ext_hi;
} extended;
} data;
bool extended;
};

Then we read the entry into data, and setup extended bit. A benefit of
it is that we may avoid passing around IntelIOMMUState everywhere to
know whether we are using extended context entries.

>  return 0;
>  }
>  
> @@ -517,19 +525,30 @@ static inline bool 
> vtd_context_entry_present(VTDContextEntry *context)
>  return context->lo & VTD_CONTEXT_ENTRY_P;
>  }
>  
> -static int vtd_get_context_entry_from_root(VTDRootEntry *root, uint8_t index,
> -   VTDContextEntry *ce)
> +static int vtd_get_context_entry_from_root(IntelIOMMUState *s,
> + VTDRootEntry *root, uint8_t index, VTDContextEntry *ce)
>  {
> -dma_addr_t addr;
> +dma_addr_t addr, ce_size;
>  
>  /* we have checked that root entry is present */
> -addr = (root->val & VTD_ROOT_ENTRY_CTP) + index * sizeof(*ce);
> -if (dma_memory_read(_space_memory, addr, ce, sizeof(*ce))) {
> +ce_size = (s->ecs) ? (2 * sizeof(*ce)) : (sizeof(*ce));
> +addr = (s->ecs && (index > 0x7f)) ?
> +   ((root->rsvd & VTD_ROOT_ENTRY_CTP) + (index - 0x80) * ce_size) :
> +   ((root->val & VTD_ROOT_ENTRY_CTP) + index * ce_size);
> +
> +if (dma_memory_read(_space_memory, addr, ce, ce_size)) {
>  trace_vtd_re_invalid(root->rsvd, root->val);
>  return -VTD_FR_CONTEXT_TABLE_INV;
>  }
> -ce->lo = le64_to_cpu(ce->lo);
> -ce->hi = le64_to_cpu(ce->hi);
> +
> +ce[0].lo = le64_to_cpu(ce[0].lo);
> +ce[0].hi = le64_to_cpu(ce[0].hi);

Again, I feel this even hackier. :)

I would slightly prefer to play the same union trick to context
entries, just like what I proposed to the root entries above...

> +
> +if (s->ecs) {
> +ce[1].lo = le64_to_cpu(ce[1].lo);
> +ce[1].hi = le64_to_cpu(ce[1].hi);
> +}
> +
>  return 0;
>  }
>  
> @@ -595,9 +614,11 @@ static inline uint32_t 
> vtd_get_agaw_from_context_entry(VTDContextEntry *ce)
>  return 30 + (ce->hi & VTD_CONTEXT_ENTRY_AW) * 9;
>  }
>  
> -static inline uint32_t vtd_ce_get_type(VTDContextEntry *ce)
> +static inline uint32_t vtd_ce_get_type(IntelIOMMUState *s,
> +   VTDContextEntry *ce)
>  {
> -return ce->lo & VTD_CONTEXT_ENTRY_TT;
> +return s->ecs ? (ce->lo & VTD_CONTEXT_ENTRY_TT) :
> +(ce->lo & VTD_EXT_CONTEXT_ENTRY_TT);
>  }
>  
>  static inline uint64_t vtd_iova_limit(VTDContextEntry *ce)
> @@ -842,16 +863,20 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, 
> uint8_t bus_num,
>  return ret_fr;
>  }
>  
> -if (!vtd_root_entry_present()) {
> +if (!vtd_root_entry_present() ||
> +(s->ecs && (devfn > 0x7f) && (!vtd_root_entry_upper_present( {
>  /* Not error - it's okay we don't have root entry. */
>  trace_vtd_re_not_present(bus_num);
>  return -VTD_FR_ROOT_ENTRY_P;
> -} else if (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD)) {
> -

Re: [Qemu-devel] [RFC PATCH 12/20] Memory: Add func to fire pasidt_bind notifier

2017-04-27 Thread Peter Xu
On Thu, Apr 27, 2017 at 02:14:27PM +0800, Peter Xu wrote:
> On Thu, Apr 27, 2017 at 10:37:19AM +0800, Liu, Yi L wrote:
> > On Wed, Apr 26, 2017 at 03:50:16PM +0200, Paolo Bonzini wrote:
> > > 
> > > 
> > > On 26/04/2017 12:06, Liu, Yi L wrote:
> > > > +void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
> > > > + void *data)
> > > > +{
> > > > +IOMMUNotifier *iommu_notifier;
> > > > +IOMMUNotifierFlag request_flags;
> > > > +
> > > > +assert(memory_region_is_iommu(mr));
> > > > +
> > > > +/*TODO: support other bind requests with smaller gran,
> > > > + * e.g. bind signle pasid entry
> > > > + */
> > > > +request_flags = IOMMU_NOTIFIER_SVM_PASIDT_BIND;
> > > > +
> > > > +QLIST_FOREACH(iommu_notifier, >iommu_notify, node) {
> > > > +if (iommu_notifier->notifier_flags & request_flags) {
> > > > +iommu_notifier->notify(iommu_notifier, data);
> > > > +break;
> > > > +}
> > > > +}
> > > 
> > > Peter,
> > > 
> > > should this reuse ->notify, or should it be different function pointer
> > > in IOMMUNotifier?
> > 
> > Hi Paolo,
> > 
> > Thx for your review.
> > 
> > I think it should be “->notify” here. In this patchset, the new notifier
> > is registered with the existing notifier registration API. So the all the
> > notifiers are in the mr->iommu_notify list. And notifiers are labeled
> > by notify flag, so it is able to differentiate the IOMMUNotifier nodes.
> > When the flag meets, trigger it by “->notify”. The diagram below shows
> > my understanding , wish it helps to make me understood.
> > 
> > VFIOContainer
> >|
> >giommu_list(VFIOGuestIOMMU)
> > \
> >  VFIOGuestIOMMU1 ->   VFIOGuestIOMMU2 -> VFIOGuestIOMMU3 ...
> > | | |
> > mr->iommu_notify: IOMMUNotifier   ->IOMMUNotifier  ->  IOMMUNotifier
> >   (Flag:MAP/UNMAP) (Flag:SVM bind)  (Flag:tlb 
> > invalidate)
> > 
> > 
> > Actually, compared with the MAP/UNMAP notifier, the newly added notifier has
> > no start/end check, and there may be other types of bind notfier flag in
> > future, so I added a separate fire func for SVM bind notifier.
> 
> I agree with Paolo that this interface might not be the suitable place
> for the SVM notifiers (just like what I worried about in previous
> discussions).
> 
> The biggest problem is that, if you see current notifier mechanism,
> it's per-memory-region. However iiuc your messages should be
> per-iommu, or say, per translation unit. While, for each iommu, there
> can be more than one memory regions (ppc can be an example). When
> there are more than one MRs binded to the same iommu unit, which
> memory region should you register to? Any one of them, or all?
> 
> So my conclusion is, it just has nothing to do with memory regions...
> 
> Instead of a different function pointer in IOMMUNotifer, IMHO we can
> even move a step further, to isolate IOTLB notifications (targeted at
> memory regions and with start/end ranges) out of SVM/other
> notifications, since they are different in general. So we basically
> need two notification mechanism:
> 
> - one for memory regions, currently what I can see is IOTLB
>   notifications
> 
> - one for translation units, currently I see all the rest of
>   notifications needed in virt-svm in this category
> 
> Maybe some RFC patches would be good to show what I mean... I'll see
> whether I can prepare some.

Here it is (on qemu-devel):

[RFC PATCH 0/8] IOMMU: introduce common IOMMUObject

Thanks,

-- 
Peter Xu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function

2017-04-27 Thread Jean-Philippe Brucker
On 27/04/17 07:36, Liu, Yi L wrote:
> On Wed, Apr 26, 2017 at 05:56:45PM +0100, Jean-Philippe Brucker wrote:
>> Hi Yi, Jacob,
>>
>> On 26/04/17 11:11, Liu, Yi L wrote:
>>> From: Jacob Pan 
>>>
>>> Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
>>> case in the guest:
>>> https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
>>>
>>> As part of the proposed architecture, when a SVM capable PCI
>>> device is assigned to a guest, nested mode is turned on. Guest owns the
>>> first level page tables (request with PASID) and performs GVA->GPA
>>> translation. Second level page tables are owned by the host for GPA->HPA
>>> translation for both request with and without PASID.
>>>
>>> A new IOMMU driver interface is therefore needed to perform tasks as
>>> follows:
>>> * Enable nested translation and appropriate translation type
>>> * Assign guest PASID table pointer (in GPA) and size to host IOMMU
>>>
>>> This patch introduces new functions called iommu_(un)bind_pasid_table()
>>> to IOMMU APIs. Architecture specific IOMMU function can be added later
>>> to perform the specific steps for binding pasid table of assigned devices.
>>>
>>> This patch also adds model definition in iommu.h. It would be used to
>>> check if the bind request is from a compatible entity. e.g. a bind
>>> request from an intel_iommu emulator may not be supported by an ARM SMMU
>>> driver.
>>>
>>> Signed-off-by: Jacob Pan 
>>> Signed-off-by: Liu, Yi L 
>>> ---
>>>  drivers/iommu/iommu.c | 19 +++
>>>  include/linux/iommu.h | 31 +++
>>>  2 files changed, 50 insertions(+)
>>>
>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>> index dbe7f65..f2da636 100644
>>> --- a/drivers/iommu/iommu.c
>>> +++ b/drivers/iommu/iommu.c
>>> @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, 
>>> struct device *dev)
>>>  }
>>>  EXPORT_SYMBOL_GPL(iommu_attach_device);
>>>  
>>> +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
>>> +   struct pasid_table_info *pasidt_binfo)
>>
>> I guess that domain can always be deduced from dev using
>> iommu_get_domain_for_dev, and doesn't need to be passed as argument?
>>
>> For the next version of my SVM series, I was thinking of passing group
>> instead of device to iommu_bind. Since all devices in a group are expected
>> to share the same mappings (whether they want it or not), users will have
>> to do iommu_group_for_each_dev anyway (as you do in patch 6/8). So it
>> might be simpler to let the IOMMU core take the group lock and do
>> group->domain->ops->bind_task(dev...) for each device. The question also
>> holds for iommu_do_invalidate in patch 3/8.
>>
>> This way the prototypes would be:
>> int iommu_bind...(struct iommu_group *group, struct ... *info)
>> int iommu_unbind...(struct iommu_group *group, struct ...*info)
>> int iommu_invalidate...(struct iommu_group *group, struct ...*info)
>>
>> For PASID table binding it might not matter much, as VFIO will most likely
>> be the only user. But task binding will be called by device drivers, which
>> by now should be encouraged to do things at iommu_group granularity.
>> Alternatively it could be done implicitly like in iommu_attach_device,
>> with "iommu_bind_device_x" calling "iommu_bind_group_x".
>>
>>
>> Extending this reasoning, since groups in a domain are also supposed to
>> have the same mappings, then similarly to map/unmap,
>> bind/unbind/invalidate should really be done with an iommu_domain (and
>> nothing else) as target argument. However this requires the IOMMU core to
>> keep a group list in each domain, which might complicate things a little
>> too much.
>>
>> But "all devices in a domain share the same PASID table" is the paradigm
>> I'm currently using in the guts of arm-smmu-v3. And I wonder if, as with
>> iommu_group, it should be made more explicit to users, so they don't
>> assume that devices within a domain are isolated from each others with
>> regard to PASID DMA.
>>
>>> +{
>>> +   if (unlikely(!domain->ops->bind_pasid_table))
>>> +   return -EINVAL;
>>> +
>>> +   return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo);
>>> +}
>>> +EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
>>> +
>>> +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device 
>>> *dev)
>>> +{
>>> +   if (unlikely(!domain->ops->unbind_pasid_table))
>>> +   return -EINVAL;
>>> +
>>> +   return domain->ops->unbind_pasid_table(domain, dev);
>>> +}
>>> +EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
>>> +
>>>  static void __iommu_detach_device(struct iommu_domain *domain,
>>>   struct device *dev)
>>>  {
>>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>>> index 0ff5111..491a011 100644
>>> --- a/include/linux/iommu.h
>>> +++ b/include/linux/iommu.h
>>> @@ -131,6 

Re: [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption

2017-04-27 Thread Dave Young
On 04/21/17 at 02:55pm, Dave Hansen wrote:
> On 04/18/2017 02:22 PM, Tom Lendacky wrote:
> > Add sysfs support for SME so that user-space utilities (kdump, etc.) can
> > determine if SME is active.
> > 
> > A new directory will be created:
> >   /sys/kernel/mm/sme/
> > 
> > And two entries within the new directory:
> >   /sys/kernel/mm/sme/active
> >   /sys/kernel/mm/sme/encryption_mask
> 
> Why do they care, and what will they be doing with this information?

Since kdump will copy old memory but need this to know if the old memory
was encrypted or not. With this sysfs file we can know the previous SME
status and pass to kdump kernel as like a kernel param.

Tom, have you got chance to try if it works or not?

Thanks
Dave
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function

2017-04-27 Thread Liu, Yi L
On Wed, Apr 26, 2017 at 05:56:45PM +0100, Jean-Philippe Brucker wrote:
> Hi Yi, Jacob,
> 
> On 26/04/17 11:11, Liu, Yi L wrote:
> > From: Jacob Pan 
> > 
> > Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
> > case in the guest:
> > https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> > 
> > As part of the proposed architecture, when a SVM capable PCI
> > device is assigned to a guest, nested mode is turned on. Guest owns the
> > first level page tables (request with PASID) and performs GVA->GPA
> > translation. Second level page tables are owned by the host for GPA->HPA
> > translation for both request with and without PASID.
> > 
> > A new IOMMU driver interface is therefore needed to perform tasks as
> > follows:
> > * Enable nested translation and appropriate translation type
> > * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> > 
> > This patch introduces new functions called iommu_(un)bind_pasid_table()
> > to IOMMU APIs. Architecture specific IOMMU function can be added later
> > to perform the specific steps for binding pasid table of assigned devices.
> > 
> > This patch also adds model definition in iommu.h. It would be used to
> > check if the bind request is from a compatible entity. e.g. a bind
> > request from an intel_iommu emulator may not be supported by an ARM SMMU
> > driver.
> > 
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  drivers/iommu/iommu.c | 19 +++
> >  include/linux/iommu.h | 31 +++
> >  2 files changed, 50 insertions(+)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index dbe7f65..f2da636 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, 
> > struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_attach_device);
> >  
> > +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> > +   struct pasid_table_info *pasidt_binfo)
> 
> I guess that domain can always be deduced from dev using
> iommu_get_domain_for_dev, and doesn't need to be passed as argument?
> 
> For the next version of my SVM series, I was thinking of passing group
> instead of device to iommu_bind. Since all devices in a group are expected
> to share the same mappings (whether they want it or not), users will have
> to do iommu_group_for_each_dev anyway (as you do in patch 6/8). So it
> might be simpler to let the IOMMU core take the group lock and do
> group->domain->ops->bind_task(dev...) for each device. The question also
> holds for iommu_do_invalidate in patch 3/8.
> 
> This way the prototypes would be:
> int iommu_bind...(struct iommu_group *group, struct ... *info)
> int iommu_unbind...(struct iommu_group *group, struct ...*info)
> int iommu_invalidate...(struct iommu_group *group, struct ...*info)
> 
> For PASID table binding it might not matter much, as VFIO will most likely
> be the only user. But task binding will be called by device drivers, which
> by now should be encouraged to do things at iommu_group granularity.
> Alternatively it could be done implicitly like in iommu_attach_device,
> with "iommu_bind_device_x" calling "iommu_bind_group_x".
> 
> 
> Extending this reasoning, since groups in a domain are also supposed to
> have the same mappings, then similarly to map/unmap,
> bind/unbind/invalidate should really be done with an iommu_domain (and
> nothing else) as target argument. However this requires the IOMMU core to
> keep a group list in each domain, which might complicate things a little
> too much.
> 
> But "all devices in a domain share the same PASID table" is the paradigm
> I'm currently using in the guts of arm-smmu-v3. And I wonder if, as with
> iommu_group, it should be made more explicit to users, so they don't
> assume that devices within a domain are isolated from each others with
> regard to PASID DMA.
> 
> > +{
> > +   if (unlikely(!domain->ops->bind_pasid_table))
> > +   return -EINVAL;
> > +
> > +   return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
> > +
> > +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device 
> > *dev)
> > +{
> > +   if (unlikely(!domain->ops->unbind_pasid_table))
> > +   return -EINVAL;
> > +
> > +   return domain->ops->unbind_pasid_table(domain, dev);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
> > +
> >  static void __iommu_detach_device(struct iommu_domain *domain,
> >   struct device *dev)
> >  {
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 0ff5111..491a011 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -131,6 +131,15 @@ struct iommu_dm_region {
> > int 

Re: [Qemu-devel] [RFC PATCH 12/20] Memory: Add func to fire pasidt_bind notifier

2017-04-27 Thread Peter Xu
On Thu, Apr 27, 2017 at 10:37:19AM +0800, Liu, Yi L wrote:
> On Wed, Apr 26, 2017 at 03:50:16PM +0200, Paolo Bonzini wrote:
> > 
> > 
> > On 26/04/2017 12:06, Liu, Yi L wrote:
> > > +void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
> > > + void *data)
> > > +{
> > > +IOMMUNotifier *iommu_notifier;
> > > +IOMMUNotifierFlag request_flags;
> > > +
> > > +assert(memory_region_is_iommu(mr));
> > > +
> > > +/*TODO: support other bind requests with smaller gran,
> > > + * e.g. bind signle pasid entry
> > > + */
> > > +request_flags = IOMMU_NOTIFIER_SVM_PASIDT_BIND;
> > > +
> > > +QLIST_FOREACH(iommu_notifier, >iommu_notify, node) {
> > > +if (iommu_notifier->notifier_flags & request_flags) {
> > > +iommu_notifier->notify(iommu_notifier, data);
> > > +break;
> > > +}
> > > +}
> > 
> > Peter,
> > 
> > should this reuse ->notify, or should it be different function pointer
> > in IOMMUNotifier?
> 
> Hi Paolo,
> 
> Thx for your review.
> 
> I think it should be “->notify” here. In this patchset, the new notifier
> is registered with the existing notifier registration API. So the all the
> notifiers are in the mr->iommu_notify list. And notifiers are labeled
> by notify flag, so it is able to differentiate the IOMMUNotifier nodes.
> When the flag meets, trigger it by “->notify”. The diagram below shows
> my understanding , wish it helps to make me understood.
> 
> VFIOContainer
>|
>giommu_list(VFIOGuestIOMMU)
> \
>  VFIOGuestIOMMU1 ->   VFIOGuestIOMMU2 -> VFIOGuestIOMMU3 ...
> | | |
> mr->iommu_notify: IOMMUNotifier   ->IOMMUNotifier  ->  IOMMUNotifier
>   (Flag:MAP/UNMAP) (Flag:SVM bind)  (Flag:tlb invalidate)
> 
> 
> Actually, compared with the MAP/UNMAP notifier, the newly added notifier has
> no start/end check, and there may be other types of bind notfier flag in
> future, so I added a separate fire func for SVM bind notifier.

I agree with Paolo that this interface might not be the suitable place
for the SVM notifiers (just like what I worried about in previous
discussions).

The biggest problem is that, if you see current notifier mechanism,
it's per-memory-region. However iiuc your messages should be
per-iommu, or say, per translation unit. While, for each iommu, there
can be more than one memory regions (ppc can be an example). When
there are more than one MRs binded to the same iommu unit, which
memory region should you register to? Any one of them, or all?

So my conclusion is, it just has nothing to do with memory regions...

Instead of a different function pointer in IOMMUNotifer, IMHO we can
even move a step further, to isolate IOTLB notifications (targeted at
memory regions and with start/end ranges) out of SVM/other
notifications, since they are different in general. So we basically
need two notification mechanism:

- one for memory regions, currently what I can see is IOTLB
  notifications

- one for translation units, currently I see all the rest of
  notifications needed in virt-svm in this category

Maybe some RFC patches would be good to show what I mean... I'll see
whether I can prepare some.

Thanks,

-- 
Peter Xu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu