Re: What differences and relations between SVM, HSA, HMM and Unified Memory?

2017-06-12 Thread Jerome Glisse
On Sat, Jun 10, 2017 at 04:06:28AM +, Wuzongyong (Cordius Wu, Euler Dept) 
wrote:
> Hi,
> 
> Could someone explain the differences and relations between SVM
> (Shared Virtual Memory, by Intel), HSA (Heterogeneous System
> Architecture, by AMD), HMM (Heterogeneous Memory Management, by
> Glisse) and UM (Unified Memory, by NVIDIA)? Are these substitutes
> for one another?
>
> As I understand it, these all aim to solve the same thing: sharing
> pointers between CPU and GPU (implemented with ATS/PASID/PRI/IOMMU
> support). So far, SVM and HSA can only be used by integrated GPUs,
> and Intel declares that the root ports don't have the required TLP
> prefix support, so SVM can't be used by discrete devices. Could
> someone tell me what the required TLP prefix means, specifically?
>
> With HMM, we can use an allocator like malloc to manage host and
> device memory. Does this mean that there is no need to use SVM
> and HSA with HMM, or is HMM the basis on which SVM and HSA
> implement the fine-grained system SVM defined in the OpenCL spec?

The aim of all these technologies is to share an address space between
a device and the CPU. There are three ways to do it:

  A) all in hardware, like CAPI or CCIX, where device memory is cache
     coherent from the CPU's point of view and system memory is also
     accessible by the device in a cache coherent way. So cache
     coherency goes both ways: from the CPU to device memory and from
     the device to system memory


  B) partially in hardware, with ATS/PASID (the same technology behind
     both HSA and SVM). This is a one-way solution: you have cache
     coherent access from the device to system memory, but not the
     other way around. Moreover, the CPU page table is shared with the
     device, so you do not need to program the IOMMU.

     Here you cannot use the device memory transparently, at least
     not without software help like HMM.


  C) all in software. Here the device can access system memory with
     cache coherency, but it does not share the CPU page table. Each
     device has its own page table, and you need to synchronize them.

HMM provides helpers that address all three solutions:
  A) for the all-hardware solution, HMM provides new helpers to assist
     with migration of process memory to device memory
  B) for the partial-hardware solution, you can mix in HMM to again
     provide helpers for migration to device memory. This assumes your
     device can mix and match a local device page table with ATS/PASID
     regions
  C) the full software solution uses all the features of HMM; it is
     all done in software, and HMM just does the heavy lifting on
     behalf of the device driver (a rough sketch of this case follows
     below)
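
To make the software-only case concrete, here is a rough, hypothetical
sketch of how a driver could keep its own device page table in sync
with the CPU page table using MMU notifiers. The my_* names are
invented for illustration; this is not HMM's actual API:

/*
 * Rough sketch of the all-software approach: the driver mirrors the CPU
 * page table into its own device page table and registers an MMU
 * notifier so that stale device mappings are torn down whenever the CPU
 * side changes.  struct my_device and my_dev_unmap_range() are
 * hypothetical driver pieces.
 */
#include <linux/mm.h>
#include <linux/mmu_notifier.h>
#include <linux/sched.h>

struct my_device;
void my_dev_unmap_range(struct my_device *dev,
			unsigned long start, unsigned long end);

struct my_mirror {
	struct mmu_notifier notifier;
	struct my_device *dev;
};

static void my_invalidate_range_start(struct mmu_notifier *mn,
				      struct mm_struct *mm,
				      unsigned long start, unsigned long end)
{
	struct my_mirror *mirror = container_of(mn, struct my_mirror,
						notifier);

	/*
	 * The CPU page table is about to change: drop the now-stale
	 * device mappings.  The device will fault and repopulate them.
	 */
	my_dev_unmap_range(mirror->dev, start, end);
}

static const struct mmu_notifier_ops my_mn_ops = {
	.invalidate_range_start = my_invalidate_range_start,
};

static int my_mirror_register(struct my_mirror *mirror)
{
	mirror->notifier.ops = &my_mn_ops;
	return mmu_notifier_register(&mirror->notifier, current->mm);
}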

In all of the above we are talking about fine-grained system SVM, as in
the OpenCL specification: you can malloc() memory and use it directly
from the GPU.
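
For instance, an ordinary malloc() pointer can be handed straight to a
kernel. A minimal sketch, assuming a command queue and kernel already
exist and that the device reports CL_DEVICE_SVM_FINE_GRAIN_SYSTEM:

#include <CL/cl.h>
#include <stdlib.h>

/* Minimal fine-grained system SVM sketch: no clCreateBuffer, no
 * explicit copies; the GPU works on plain heap memory. */
int run_on_svm(cl_command_queue queue, cl_kernel kernel, size_t n)
{
	int *data = malloc(n * sizeof(*data));	/* ordinary malloc() */
	if (!data)
		return -1;

	for (size_t i = 0; i < n; i++)
		data[i] = (int)i;

	/* Hand the raw CPU pointer to the kernel. */
	clSetKernelArgSVMPointer(kernel, 0, data);
	clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &n, NULL,
			       0, NULL, NULL);
	clFinish(queue);

	int first = data[0];	/* CPU reads results through the same pointer */
	free(data);
	return first;
}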

Hope this clarifies things.

Cheers,
Jérôme


Re: [PATCH v6 14/34] x86/mm: Insure that boot memory areas are mapped properly

2017-06-12 Thread Tom Lendacky

On 6/10/2017 11:01 AM, Borislav Petkov wrote:

On Wed, Jun 07, 2017 at 02:15:39PM -0500, Tom Lendacky wrote:

The boot data and command line data are present in memory in a decrypted
state and are copied early in the boot process.  The early page fault
support will map these areas as encrypted, so before attempting to copy
them, add decrypted mappings so the data is accessed properly when copied.

For the initrd, encrypt this data in place. Since the future mapping of the
initrd area will be encrypted, the data will be accessed properly.

Signed-off-by: Tom Lendacky 
---
  arch/x86/include/asm/mem_encrypt.h |   11 +
  arch/x86/include/asm/pgtable.h     |    3 +
  arch/x86/kernel/head64.c           |   30 --
  arch/x86/kernel/setup.c            |    9 
  arch/x86/mm/mem_encrypt.c          |   77 
  5 files changed, 126 insertions(+), 4 deletions(-)


Some cleanups on top in case you get to send v7:


There will be a v7.



diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 61a704945294..5959a42dd4d5 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -45,13 +45,8 @@ static inline void __init sme_early_decrypt(resource_size_t paddr,
 {
 }
 
-static inline void __init sme_map_bootdata(char *real_mode_data)
-{
-}
-
-static inline void __init sme_unmap_bootdata(char *real_mode_data)
-{
-}
+static inline void __init sme_map_bootdata(char *real_mode_data)   { }
+static inline void __init sme_unmap_bootdata(char *real_mode_data) { }
 
 static inline void __init sme_early_init(void)
 {
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 2321f05045e5..32ebbe0ab04d 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -132,6 +132,10 @@ static void __init __sme_map_unmap_bootdata(char *real_mode_data, bool map)
 	struct boot_params *boot_data;
 	unsigned long cmdline_paddr;
 
+	/* If SME is not active, the bootdata is in the correct state */
+	if (!sme_active())
+		return;
+
 	__sme_early_map_unmap_mem(real_mode_data, sizeof(boot_params), map);
 	boot_data = (struct boot_params *)real_mode_data;
 
@@ -142,40 +146,22 @@ static void __init __sme_map_unmap_bootdata(char *real_mode_data, bool map)
 	cmdline_paddr = boot_data->hdr.cmd_line_ptr |
 			((u64)boot_data->ext_cmd_line_ptr << 32);
 
-	if (cmdline_paddr)
-		__sme_early_map_unmap_mem(__va(cmdline_paddr),
-					  COMMAND_LINE_SIZE, map);
+	if (!cmdline_paddr)
+		return;
+
+	__sme_early_map_unmap_mem(__va(cmdline_paddr), COMMAND_LINE_SIZE, map);
+
+	sme_early_pgtable_flush();


Yup, overall it definitely simplifies things.

I have to call sme_early_pgtable_flush() even if cmdline_paddr is NULL,
so I'll either keep the if and have one flush at the end or I can move
the flush into __sme_early_map_unmap_mem(). I'm leaning towards the
latter.

Thanks,
Tom


 }
 
 void __init sme_unmap_bootdata(char *real_mode_data)
 {
-	/* If SME is not active, the bootdata is in the correct state */
-	if (!sme_active())
-		return;
-
-	/*
-	 * The bootdata and command line aren't needed anymore so clear
-	 * any mapping of them.
-	 */
 	__sme_map_unmap_bootdata(real_mode_data, false);
-
-	sme_early_pgtable_flush();
 }
 
 void __init sme_map_bootdata(char *real_mode_data)
 {
-	/* If SME is not active, the bootdata is in the correct state */
-	if (!sme_active())
-		return;
-
-	/*
-	 * The bootdata and command line will not be encrypted, so they
-	 * need to be mapped as decrypted memory so they can be copied
-	 * properly.
-	 */
 	__sme_map_unmap_bootdata(real_mode_data, true);
-
-	sme_early_pgtable_flush();
 }
 
 void __init sme_early_init(void)
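
For reference, the second option Tom mentions above, moving the flush
into __sme_early_map_unmap_mem(), might look roughly like the
following. This is a hypothetical sketch, not the actual v7 change; the
body of the mapping loop is elided:

static void __init __sme_early_map_unmap_mem(void *vaddr, unsigned long size,
					     bool map)
{
	/* ... existing early page-table map/unmap of [vaddr, vaddr + size) ... */

	/*
	 * Flushing here means every caller gets a flush, so
	 * __sme_map_unmap_bootdata() can return early when cmdline_paddr
	 * is NULL without skipping the flush.
	 */
	sme_early_pgtable_flush();
}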





Re: [PATCH 3/4] iommu: add qcom_iommu

2017-06-12 Thread Rob Clark
On Fri, May 26, 2017 at 8:56 AM, Robin Murphy  wrote:
>> + struct iommu_group  *group;
>
> This feels weird, since a device can be associated with multiple
> contexts, but only one group, so group-per-context is somewhat redundant
> and smacks of being in the wrong place. Does the firmware ever map
> multiple devices to the same context?


so, actually it seems like I can dump all of this, and just plug
generic_device_group directly into the iommu ops without needing to
track the iommu_group myself.  At least this appears to work (see the
sketch below).
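
Roughly, that means pointing the core at the generic helper in the ops
table; a sketch, not the actual patch:

static const struct iommu_ops qcom_iommu_ops = {
	/* ... attach_dev, map, unmap and friends elided ... */

	/* Let the IOMMU core allocate one group per device instead of
	 * tracking an iommu_group per context bank in the driver. */
	.device_group	= generic_device_group,
};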

BR,
-R


Re: What differences and relations between SVM, HSA, HMM and Unified Memory?

2017-06-12 Thread Jean-Philippe Brucker
Hello,

On 10/06/17 05:06, Wuzongyong (Cordius Wu, Euler Dept) wrote:
> Hi,
> 
> Could someone explain the differences and relations between SVM (Shared
> Virtual Memory, by Intel), HSA (Heterogeneous System Architecture, by AMD),
> HMM (Heterogeneous Memory Management, by Glisse) and UM (Unified Memory, by
> NVIDIA)? Are these substitutes for one another?
> 
> As I understand it, these all aim to solve the same thing: sharing pointers
> between CPU and GPU (implemented with ATS/PASID/PRI/IOMMU support). So far,
> SVM and HSA can only be used by integrated GPUs, and Intel declares that the
> root ports don't have the required TLP prefix support, so SVM can't be used
> by discrete devices. Could someone tell me what the required TLP prefix
> means, specifically?
> 
> With HMM, we can use an allocator like malloc to manage host and device
> memory. Does this mean that there is no need to use SVM and HSA with HMM,
> or is HMM the basis on which SVM and HSA implement the fine-grained system
> SVM defined in the OpenCL spec?

I can't provide an exhaustive answer, but I have done some work on SVM.
Take it with a grain of salt, though; I am not an expert.

* HSA is an architecture that provides a common programming model for CPUs
and accelerators (GPGPUs etc). It does have an SVM requirement (I/O page
faults, PASID and compatible address spaces), though that is only a small
part of it.

* Similarly, OpenCL provides an API for dealing with accelerators. OpenCL
2.0 introduced the concept of Fine-Grained System SVM, which allows
userspace pointers to be passed to devices. It is just one flavor of SVM;
the spec also defines coarse-grained and non-system variants. But they may
have coined the name, and I believe that in the context of the Linux IOMMU,
when we talk about "SVM" we mean OpenCL's fine-grained system SVM.

* Nvidia CUDA has a feature similar to fine-grained system SVM, called
Unified Virtual Addressing. I'm not sure whether it maps exactly to
OpenCL's system SVM. Nvidia's Unified Memory seems to be more in line
with HMM, because in addition to unifying the virtual address space, it
also unifies system and device memory.


So SVM is about the userspace API: the ability to perform DMA on a process
address space instead of using a separate DMA address space. One possible
implementation, for PCIe endpoints, uses ATS+PRI+PASID.

* The PASID extension adds a prefix to the PCI TLP (characterized by
bits[31:29] = 0b100) that specifies which address space is affected by the
transaction. The IOMMU uses (RequesterID, PASID, Virt Addr) to derive a
Phys Addr, where it previously only needed (RID, IOVA).

* The PRI extension allows page faults from endpoints to be handled; these
are bound to happen if endpoints attempt to access process memory.

* PRI requires ATS. PRI adds two new TLPs, but ATS makes use of the AT
field [11:10] in PCIe TLPs, which was previously reserved.

So PCI switches, endpoints, root complexes and IOMMUs all have to be aware
of these three extensions in order to use SVM with discrete endpoints.
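
To make the PASID prefix description above concrete, here is a toy
decoder in C. It follows only the bit positions quoted above; the exact
placement of the PASID field within the prefix is an assumption for
illustration, so check the PCIe specification before relying on it:

#include <stdint.h>
#include <stdbool.h>

/* A TLP prefix is characterized by bits[31:29] = 0b100 of the first
 * dword; the PASID prefix carries a 20-bit PASID. */
static bool dw_has_tlp_prefix(uint32_t dw)
{
	return ((dw >> 29) & 0x7) == 0x4;	/* bits[31:29] == 0b100 */
}

static uint32_t dw_pasid(uint32_t dw)
{
	return dw & 0xfffff;	/* assumed: PASID in the low 20 bits */
}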


While SVM is only about the virtual address space, HMM deals with physical
storage. If I understand correctly, HMM allows device RAM to be used
transparently by userspace applications. So upon an I/O page fault, the mm
subsystem will migrate data from system memory into device RAM. It would
differ from "pure" SVM in that you would use different page directories on
the IOMMU and MMU sides, and synchronize them using MMU notifiers. But
please don't take this at face value; I haven't had time to look into HMM
yet.

Thanks,
Jean


Re: [PATCH 27/44] sparc: remove leon_dma_ops

2017-06-12 Thread Andreas Larsson

On 2017-06-08 15:25, Christoph Hellwig wrote:

We can just use pci32_dma_ops.

Btw, given that leon is 32-bit and appears to be PCI based, do we even
need the special case for it in get_arch_dma_ops at all?


Hi!

Yes, it is needed. LEON systems are AMBA bus based. The common case here 
is DMA over AMBA buses. Some LEON systems have PCI bridges, but in 
general CONFIG_PCI is not a given.
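
For context, the special case under discussion has roughly this shape
(paraphrased from the sparc headers of that era; treat the details as
approximate):

static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
{
#ifdef CONFIG_SPARC_LEON
	if (sparc_cpu_model == sparc_leon)
		return leon_dma_ops;		/* DMA over the AMBA bus */
#endif
#if defined(CONFIG_SPARC32) && defined(CONFIG_PCI)
	if (bus == &pci_bus_type)
		return &pci32_dma_ops;
#endif
	return dma_ops;
}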


--
Andreas Larsson
Software Engineer
Cobham Gaisler


Re: Fwd: [PATCH v7 2/3] iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74

2017-06-12 Thread Jayachandran C
On Fri, Jun 09, 2017 at 04:43:07PM +0100, Robin Murphy wrote:
> On 09/06/17 12:38, Jayachandran C wrote:
> > On Fri, Jun 09, 2017 Robin Murphy wrote:
> >>
> >> On 30/05/17 13:03, Geetha sowjanya wrote:
> >>> From: Linu Cherian 
> >>>
> >>> The Cavium ThunderX2 SMMU implementation doesn't support page 1 register
> >>> space, so the PAGE0_REGS_ONLY option is enabled as an errata workaround.
> >>> When turned on, this option replaces all page 1 offsets used for
> >>> EVTQ_PROD/CONS, PRIQ_PROD/CONS register access with page 0 offsets.
> >>>
> >>> SMMU resource size checks are now based on the PAGE0_REGS_ONLY option,
> >>> since the resource size can be either 64k or 128k.
> >>> For this, arm_smmu_device_dt_probe/acpi_probe has been moved before the
> >>> platform_get_resource call, so that SMMU options are set beforehand.
> >>>
> >>> Signed-off-by: Linu Cherian 
> >>> Signed-off-by: Geetha Sowjanya 
> >>> ---
> >>>  Documentation/arm64/silicon-errata.txt            |    1 +
> >>>  .../devicetree/bindings/iommu/arm,smmu-v3.txt     |    6 ++
> >>>  drivers/iommu/arm-smmu-v3.c                       |   64 +++-
> >>>  3 files changed, 56 insertions(+), 15 deletions(-)
> >>>
> >>> diff --git a/Documentation/arm64/silicon-errata.txt 
> >>> b/Documentation/arm64/silicon-errata.txt
> >>> index 10f2ddd..4693a32 100644
> >>> --- a/Documentation/arm64/silicon-errata.txt
> >>> +++ b/Documentation/arm64/silicon-errata.txt
> >>> @@ -62,6 +62,7 @@ stable kernels.
> >>>  | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154        |
> >>>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456        |
> >>>  | Cavium         | ThunderX SMMUv2 | #27704          | N/A                         |
> >>> +| Cavium         | ThunderX2 SMMUv3| #74             | N/A                         |
> >>>  |                |                 |                 |                             |
> >>>  | Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585         |
> >>>  |                |                 |                 |                             |
> >>> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt 
> >>> b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> >>> index be57550..607e270 100644
> >>> --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> >>> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> >>> @@ -49,6 +49,12 @@ the PCIe specification.
> >>>  - hisilicon,broken-prefetch-cmd
> >>>                      : Avoid sending CMD_PREFETCH_* commands to the SMMU.
> >>>
> >>> +- cavium,cn9900-broken-page1-regspace
> >>> +			: Replaces all page 1 offsets used for EVTQ_PROD/CONS,
> >>> +			  PRIQ_PROD/CONS register access with page 0 offsets.
> >>> +			  Set for Cavium ThunderX2 silicon that doesn't support
> >>> +			  SMMU page1 register space.
> >>
> >> The indentation's a bit funky here - the rest of this file is actually
> >> indented with spaces, but either way it's clear your editor isn't set to
> >> 8-space tabs ;)
> >>
> >>> +
> >>>  ** Example
> >>>
> >>>  smmu@2b40 {
> >>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> >>> index 380969a..4e80205 100644
> >>> --- a/drivers/iommu/arm-smmu-v3.c
> >>> +++ b/drivers/iommu/arm-smmu-v3.c
> >>> @@ -412,6 +412,9 @@
> >>>  #define MSI_IOVA_BASE			0x8000000
> >>>  #define MSI_IOVA_LENGTH			0x100000
> >>>
> >>> +#define ARM_SMMU_PAGE0_REGS_ONLY(smmu)	\
> >>> +	((smmu)->options & ARM_SMMU_OPT_PAGE0_REGS_ONLY)
> >>
> >> At the two places we use this macro, frankly I think it would be clearer
> >> to just reference smmu->options directly, as we currently do for
> >> SKIP_PREFETCH. The abstraction also adds more lines than it saves...
> >>
> >>> +
> >>>  static bool disable_bypass;
> >>>  module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
> >>>  MODULE_PARM_DESC(disable_bypass,
> >>> @@ -597,6 +600,7 @@ struct arm_smmu_device {
> >>>   u32 features;
> >>>
> >>>  #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
> >>> +#define ARM_SMMU_OPT_PAGE0_REGS_ONLY    (1 << 1)
> >>
> >> Whitespace again, although this time it's spaces where there should be a
> >> tab.
> >>
> >>>   u32 options;
> >>>
> >>>   struct arm_smmu_cmdq		cmdq;
> >>> @@ -663,9 +667,19 @@ struct arm_smmu_option_prop {
> >>>
> >>>  static struct arm_smmu_option_prop arm_smmu_options[] = {
> >>>   { ARM_SMMU_OPT_SKIP_PREFETCH, "hisilicon,broken-prefetch-cmd" },
> >>> + { ARM_SMMU_OPT_PAGE0_REGS_ONLY, 
> >>>