Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Kuppuswamy, Sathyanarayanan




On 9/28/21 1:58 PM, Borislav Petkov wrote:

On Tue, Sep 28, 2021 at 01:48:46PM -0700, Kuppuswamy, Sathyanarayanan wrote:

Just read it. If you want to use cpuid_has_tdx_guest() directly in
cc_platform_has(), then you want to rename intel_cc_platform_has() to
tdx_cc_platform_has()?


Why?

You simply do:

if (cpuid_has_tdx_guest())
intel_cc_platform_has(...);

and lemme paste from that mail: " ...you should use
cpuid_has_tdx_guest() instead but cache its result so that you don't
call CPUID each time the kernel executes cc_platform_has()."

Makes sense?


Yes. But, since the check is related to TDX, I just want to confirm whether
you are fine with naming the function as intel_*().

Since this patch is going to have dependency on TDX code, I will include
this patch in TDX patch set.





--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Kuppuswamy, Sathyanarayanan




On 9/28/21 1:31 PM, Borislav Petkov wrote:

On Tue, Sep 28, 2021 at 12:19:49PM -0700, Kuppuswamy, Sathyanarayanan wrote:

Intel CC support patch is not included in this series. You want me
to address the issue raised by Joerg before merging it?


Did you not see my email to you today:

https://lkml.kernel.org/r/yvl4zughfsh1q...@zn.tnic


Just read it. If you want to use cpuid_has_tdx_guest() directly in
cc_platform_has(), then you want to rename intel_cc_platform_has() to
tdx_cc_platform_has()?



?



--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Kuppuswamy, Sathyanarayanan




On 9/28/21 12:10 PM, Borislav Petkov wrote:

From: Borislav Petkov 

Hi all,

here's v4 of the cc_platform_has() patchset with feedback incorporated.

I'm going to route this through tip if there are no objections.


Intel CC support patch is not included in this series. You want me
to address the issue raised by Joerg before merging it?



Thx.

Tom Lendacky (8):
   x86/ioremap: Selectively build arch override encryption functions
   arch/cc: Introduce a function to check for confidential computing
 features
   x86/sev: Add an x86 version of cc_platform_has()
   powerpc/pseries/svm: Add a powerpc version of cc_platform_has()
   x86/sme: Replace occurrences of sme_active() with cc_platform_has()
   x86/sev: Replace occurrences of sev_active() with cc_platform_has()
   x86/sev: Replace occurrences of sev_es_active() with cc_platform_has()
   treewide: Replace the use of mem_encrypt_active() with
 cc_platform_has()

  arch/Kconfig |  3 +
  arch/powerpc/include/asm/mem_encrypt.h   |  5 --
  arch/powerpc/platforms/pseries/Kconfig   |  1 +
  arch/powerpc/platforms/pseries/Makefile  |  2 +
  arch/powerpc/platforms/pseries/cc_platform.c | 26 ++
  arch/powerpc/platforms/pseries/svm.c |  5 +-
  arch/s390/include/asm/mem_encrypt.h  |  2 -
  arch/x86/Kconfig |  1 +
  arch/x86/include/asm/io.h|  8 ++
  arch/x86/include/asm/kexec.h |  2 +-
  arch/x86/include/asm/mem_encrypt.h   | 12 +--
  arch/x86/kernel/Makefile |  6 ++
  arch/x86/kernel/cc_platform.c| 69 +++
  arch/x86/kernel/crash_dump_64.c  |  4 +-
  arch/x86/kernel/head64.c |  9 +-
  arch/x86/kernel/kvm.c|  3 +-
  arch/x86/kernel/kvmclock.c   |  4 +-
  arch/x86/kernel/machine_kexec_64.c   | 19 +++--
  arch/x86/kernel/pci-swiotlb.c|  9 +-
  arch/x86/kernel/relocate_kernel_64.S |  2 +-
  arch/x86/kernel/sev.c|  6 +-
  arch/x86/kvm/svm/svm.c   |  3 +-
  arch/x86/mm/ioremap.c| 18 ++--
  arch/x86/mm/mem_encrypt.c| 55 
  arch/x86/mm/mem_encrypt_identity.c   |  9 +-
  arch/x86/mm/pat/set_memory.c |  3 +-
  arch/x86/platform/efi/efi_64.c   |  9 +-
  arch/x86/realmode/init.c |  8 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  |  4 +-
  drivers/gpu/drm/drm_cache.c  |  4 +-
  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c  |  4 +-
  drivers/gpu/drm/vmwgfx/vmwgfx_msg.c  |  6 +-
  drivers/iommu/amd/init.c |  7 +-
  drivers/iommu/amd/iommu.c|  3 +-
  drivers/iommu/amd/iommu_v2.c |  3 +-
  drivers/iommu/iommu.c|  3 +-
  fs/proc/vmcore.c |  6 +-
  include/linux/cc_platform.h  | 88 
  include/linux/mem_encrypt.h  |  4 -
  kernel/dma/swiotlb.c |  4 +-
  40 files changed, 310 insertions(+), 129 deletions(-)
  create mode 100644 arch/powerpc/platforms/pseries/cc_platform.c
  create mode 100644 arch/x86/kernel/cc_platform.c
  create mode 100644 include/linux/cc_platform.h



--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 0/8] Implement generic cc_platform_has() helper function

2021-09-16 Thread Kuppuswamy, Sathyanarayanan




On 9/16/21 8:02 AM, Borislav Petkov wrote:

On Wed, Sep 15, 2021 at 10:26:06AM -0700, Kuppuswamy, Sathyanarayanan wrote:

I have a Intel variant patch (please check following patch). But it includes
TDX changes as well. Shall I move TDX changes to different patch and just
create a separate patch for adding intel_cc_platform_has()?


Yes, please, so that I can expedite that stuff separately and so that it
can go in early in order for future work to be based ontop.


Sent it part of TDX patch series. Please check and cherry pick it.

https://lore.kernel.org/lkml/20210916183550.15349-2-sathyanarayanan.kuppusw...@linux.intel.com/



Thx.



--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 0/8] Implement generic cc_platform_has() helper function

2021-09-15 Thread Kuppuswamy, Sathyanarayanan




On 9/15/21 9:46 AM, Borislav Petkov wrote:

Sathya,

if you want to prepare the Intel variant intel_cc_platform_has() ontop
of those and send it to me, that would be good because then I can
integrate it all in one branch which can be used to base future work
ontop.


I have a Intel variant patch (please check following patch). But it includes
TDX changes as well. Shall I move TDX changes to different patch and just
create a separate patch for adding intel_cc_platform_has()?


commit fc5f98a0ed94629d903827c5b44ee9295f835831
Author: Kuppuswamy Sathyanarayanan 
Date:   Wed May 12 11:35:13 2021 -0700

x86/tdx: Add confidential guest support for TDX guest

TDX architecture provides a way for VM guests to be highly secure and
isolated (from untrusted VMM). To achieve this requirement, any data
coming from VMM cannot be completely trusted. TDX guest fixes this
issue by hardening the IO drivers against the attack from the VMM.
So, when adding hardening fixes to the generic drivers, to protect
custom fixes use cc_platform_has() API.

Also add TDX guest support to cc_platform_has() API to protect the
TDX specific fixes.

Signed-off-by: Kuppuswamy Sathyanarayanan 


diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a5b14de03458..2e78358923a1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -871,6 +871,7 @@ config INTEL_TDX_GUEST
depends on SECURITY
select X86_X2APIC
select SECURITY_LOCKDOWN_LSM
+   select ARCH_HAS_CC_PLATFORM
help
  Provide support for running in a trusted domain on Intel processors
  equipped with Trusted Domain eXtensions. TDX is a new Intel
diff --git a/arch/x86/include/asm/intel_cc_platform.h 
b/arch/x86/include/asm/intel_cc_platform.h
new file mode 100644
index ..472c3174beac
--- /dev/null
+++ b/arch/x86/include/asm/intel_cc_platform.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2021 Intel Corporation */
+#ifndef _ASM_X86_INTEL_CC_PLATFORM_H
+#define _ASM_X86_INTEL_CC_PLATFORM_H
+
+#if defined(CONFIG_CPU_SUP_INTEL) && defined(CONFIG_ARCH_HAS_CC_PLATFORM)
+bool intel_cc_platform_has(unsigned int flag);
+#else
+static inline bool intel_cc_platform_has(unsigned int flag) { return false; }
+#endif
+
+#endif /* _ASM_X86_INTEL_CC_PLATFORM_H */
+
diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
index 3c9bacd3c3f3..e83bc2f48efe 100644
--- a/arch/x86/kernel/cc_platform.c
+++ b/arch/x86/kernel/cc_platform.c
@@ -10,11 +10,16 @@
 #include 
 #include 
 #include 
+#include 
+
+#include 

 bool cc_platform_has(enum cc_attr attr)
 {
if (sme_me_mask)
return amd_cc_platform_has(attr);
+   else if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+   return intel_cc_platform_has(attr);

return false;
 }
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 8321c43554a1..ab486a3b1eb0 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -60,6 +61,21 @@ static u64 msr_test_ctrl_cache __ro_after_init;
  */
 static bool cpu_model_supports_sld __ro_after_init;

+#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
+bool intel_cc_platform_has(enum cc_attr attr)
+{
+   switch (attr) {
+   case CC_ATTR_GUEST_TDX:
+   return cpu_feature_enabled(X86_FEATURE_TDX_GUEST);
+   default:
+   return false;
+   }
+
+   return false;
+}
+EXPORT_SYMBOL_GPL(intel_cc_platform_has);
+#endif
+
 /*
  * Processors which have self-snooping capability can handle conflicting
  * memory type across CPUs by snooping its own cache. However, there exists
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
index 253f3ea66cd8..e38430e6e396 100644
--- a/include/linux/cc_platform.h
+++ b/include/linux/cc_platform.h
@@ -61,6 +61,15 @@ enum cc_attr {
 * Examples include SEV-ES.
 */
CC_ATTR_GUEST_STATE_ENCRYPT,
+
+   /**
+* @CC_ATTR_GUEST_TDX: Trusted Domain Extension Support
+*
+* The platform/OS is running as a TDX guest/virtual machine.
+*
+* Examples include SEV-ES.
+*/
+   CC_ATTR_GUEST_TDX,
 };

 #ifdef CONFIG_ARCH_HAS_CC_PLATFORM


--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()

2021-08-19 Thread Kuppuswamy, Sathyanarayanan




On 8/19/21 11:33 AM, Tom Lendacky wrote:

There was some talk about this on the mailing list where TDX and SEV may
need to be differentiated, so we wanted to reserve a range of values per
technology. I guess I can remove them until they are actually needed.


In TDX also we have similar requirements and we need some flags for
TDX specific checks. So I think it is fine to leave some space for vendor
flags.

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 02/12] mm: Introduce a function to check for virtualization protection features

2021-08-13 Thread Kuppuswamy, Sathyanarayanan




On 8/13/21 9:59 AM, Tom Lendacky wrote:

In prep for other protected virtualization technologies, introduce a
generic helper function, prot_guest_has(), that can be used to check
for specific protection attributes, like memory encryption. This is
intended to eliminate having to add multiple technology-specific checks
to the code (e.g. if (sev_active() || tdx_active())).

Reviewed-by: Joerg Roedel
Co-developed-by: Andi Kleen
Signed-off-by: Andi Kleen
Co-developed-by: Kuppuswamy 
Sathyanarayanan
Signed-off-by: Kuppuswamy 
Sathyanarayanan
Signed-off-by: Tom Lendacky
---
  arch/Kconfig|  3 +++
  include/linux/protected_guest.h | 35 +
  2 files changed, 38 insertions(+)
  create mode 100644 include/linux/protected_guest.h


Reviewed-by: Kuppuswamy Sathyanarayanan 


--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 01/11] mm: Introduce a function to check for virtualization protection features

2021-08-11 Thread Kuppuswamy, Sathyanarayanan




On 7/27/21 3:26 PM, Tom Lendacky wrote:

diff --git a/include/linux/protected_guest.h b/include/linux/protected_guest.h
new file mode 100644
index ..f8ed7b72967b
--- /dev/null
+++ b/include/linux/protected_guest.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Protected Guest (and Host) Capability checks
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky
+ */
+
+#ifndef _PROTECTED_GUEST_H
+#define _PROTECTED_GUEST_H
+
+#ifndef __ASSEMBLY__


Can you include headers for bool type and false definition?

--- a/include/linux/protected_guest.h
+++ b/include/linux/protected_guest.h
@@ -12,6 +12,9 @@

 #ifndef __ASSEMBLY__

+#include 
+#include 

Otherwise, I see following errors in multi-config auto testing.

include/linux/protected_guest.h:40:15: error: unknown type name 'bool'
include/linux/protected_guest.h:40:63: error: 'false' undeclared (first use in 
this functi



+
+#define PATTR_MEM_ENCRYPT  0   /* Encrypted memory */
+#define PATTR_HOST_MEM_ENCRYPT 1   /* Host encrypted memory */
+#define PATTR_GUEST_MEM_ENCRYPT2   /* Guest encrypted 
memory */
+#define PATTR_GUEST_PROT_STATE 3   /* Guest encrypted state */
+
+#ifdef CONFIG_ARCH_HAS_PROTECTED_GUEST
+
+#include 
+
+#else  /* !CONFIG_ARCH_HAS_PROTECTED_GUEST */
+
+static inline bool prot_guest_has(unsigned int attr) { return false; }
+
+#endif /* CONFIG_ARCH_HAS_PROTECTED_GUEST */
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _PROTECTED_GUEST_H */


--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()

2021-08-10 Thread Kuppuswamy, Sathyanarayanan



On 8/10/21 12:48 PM, Tom Lendacky wrote:

On 8/10/21 1:45 PM, Kuppuswamy, Sathyanarayanan wrote:



On 7/27/21 3:26 PM, Tom Lendacky wrote:

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index de01903c3735..cafed6456d45 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -19,7 +19,7 @@
   #include 
   #include 
   #include 
-#include 
+#include 
   #include 
     #include 
@@ -285,7 +285,7 @@ unsigned long __head __startup_64(unsigned long
physaddr,
    * there is no need to zero it after changing the memory encryption
    * attribute.
    */
-    if (mem_encrypt_active()) {
+    if (prot_guest_has(PATTR_MEM_ENCRYPT)) {
   vaddr = (unsigned long)__start_bss_decrypted;
   vaddr_end = (unsigned long)__end_bss_decrypted;



Since this change is specific to AMD, can you replace PATTR_MEM_ENCRYPT with
prot_guest_has(PATTR_SME) || prot_guest_has(PATTR_SEV). It is not used in
TDX.


This is a direct replacement for now. I think the change you're requesting
should be done as part of the TDX support patches so it's clear why it is
being changed.


Ok. I will include it part of TDX changes.



But, wouldn't TDX still need to do something with this shared/unencrypted
area, though? Or since it is shared, there's actually nothing you need to
do (the bss decrpyted section exists even if CONFIG_AMD_MEM_ENCRYPT is not
configured)?


Kirill had a requirement to turn on CONFIG_AMD_MEM_ENCRYPT for adding lazy
accept support in TDX guest kernel. Kirill, can you add details here?



Thanks,
Tom





--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()

2021-08-10 Thread Kuppuswamy, Sathyanarayanan




On 7/27/21 3:26 PM, Tom Lendacky wrote:

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index de01903c3735..cafed6456d45 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -19,7 +19,7 @@
  #include 
  #include 
  #include 
-#include 
+#include 
  #include 
  
  #include 

@@ -285,7 +285,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
 * there is no need to zero it after changing the memory encryption
 * attribute.
 */
-   if (mem_encrypt_active()) {
+   if (prot_guest_has(PATTR_MEM_ENCRYPT)) {
vaddr = (unsigned long)__start_bss_decrypted;
vaddr_end = (unsigned long)__end_bss_decrypted;



Since this change is specific to AMD, can you replace PATTR_MEM_ENCRYPT with
prot_guest_has(PATTR_SME) || prot_guest_has(PATTR_SEV). It is not used in
TDX.

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 06/11] x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()

2021-08-09 Thread Kuppuswamy, Sathyanarayanan




On 8/9/21 2:59 PM, Tom Lendacky wrote:

Not sure how TDX will handle AP booting, are you sure it needs this
special setup as well? Otherwise a check for SEV-ES would be better
instead of the generic PATTR_GUEST_PROT_STATE.

Yes, I'm not sure either. I figure that change can be made, if needed, as
part of the TDX support.


We don't plan to set PROT_STATE. So it does not affect TDX.
For SMP, we use MADT ACPI table for AP booting.

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 00/11] Implement generic prot_guest_has() helper function

2021-08-08 Thread Kuppuswamy, Sathyanarayanan

Hi Tom,

On 7/27/21 3:26 PM, Tom Lendacky wrote:

This patch series provides a generic helper function, prot_guest_has(),
to replace the sme_active(), sev_active(), sev_es_active() and
mem_encrypt_active() functions.

It is expected that as new protected virtualization technologies are
added to the kernel, they can all be covered by a single function call
instead of a collection of specific function calls all called from the
same locations.

The powerpc and s390 patches have been compile tested only. Can the
folks copied on this series verify that nothing breaks for them.


With this patch set, select ARCH_HAS_PROTECTED_GUEST and set
CONFIG_AMD_MEM_ENCRYPT=n, creates following error.

ld: arch/x86/mm/ioremap.o: in function `early_memremap_is_setup_data':
arch/x86/mm/ioremap.c:672: undefined reference to `early_memremap_decrypted'

It looks like early_memremap_is_setup_data() is not protected with
appropriate config.




Cc: Andi Kleen 
Cc: Andy Lutomirski 
Cc: Ard Biesheuvel 
Cc: Baoquan He 
Cc: Benjamin Herrenschmidt 
Cc: Borislav Petkov 
Cc: Christian Borntraeger 
Cc: Daniel Vetter 
Cc: Dave Hansen 
Cc: Dave Young 
Cc: David Airlie 
Cc: Heiko Carstens 
Cc: Ingo Molnar 
Cc: Joerg Roedel 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Thomas Zimmermann 
Cc: Vasily Gorbik 
Cc: VMware Graphics 
Cc: Will Deacon 

---

Patches based on:
   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master
   commit 79e920060fa7 ("Merge branch 'WIP/fixes'")

Tom Lendacky (11):
   mm: Introduce a function to check for virtualization protection
 features
   x86/sev: Add an x86 version of prot_guest_has()
   powerpc/pseries/svm: Add a powerpc version of prot_guest_has()
   x86/sme: Replace occurrences of sme_active() with prot_guest_has()
   x86/sev: Replace occurrences of sev_active() with prot_guest_has()
   x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()
   treewide: Replace the use of mem_encrypt_active() with
 prot_guest_has()
   mm: Remove the now unused mem_encrypt_active() function
   x86/sev: Remove the now unused mem_encrypt_active() function
   powerpc/pseries/svm: Remove the now unused mem_encrypt_active()
 function
   s390/mm: Remove the now unused mem_encrypt_active() function

  arch/Kconfig   |  3 ++
  arch/powerpc/include/asm/mem_encrypt.h |  5 --
  arch/powerpc/include/asm/protected_guest.h | 30 +++
  arch/powerpc/platforms/pseries/Kconfig |  1 +
  arch/s390/include/asm/mem_encrypt.h|  2 -
  arch/x86/Kconfig   |  1 +
  arch/x86/include/asm/kexec.h   |  2 +-
  arch/x86/include/asm/mem_encrypt.h | 13 +
  arch/x86/include/asm/protected_guest.h | 27 ++
  arch/x86/kernel/crash_dump_64.c|  4 +-
  arch/x86/kernel/head64.c   |  4 +-
  arch/x86/kernel/kvm.c  |  3 +-
  arch/x86/kernel/kvmclock.c |  4 +-
  arch/x86/kernel/machine_kexec_64.c | 19 +++
  arch/x86/kernel/pci-swiotlb.c  |  9 ++--
  arch/x86/kernel/relocate_kernel_64.S   |  2 +-
  arch/x86/kernel/sev.c  |  6 +--
  arch/x86/kvm/svm/svm.c |  3 +-
  arch/x86/mm/ioremap.c  | 16 +++---
  arch/x86/mm/mem_encrypt.c  | 60 +++---
  arch/x86/mm/mem_encrypt_identity.c |  3 +-
  arch/x86/mm/pat/set_memory.c   |  3 +-
  arch/x86/platform/efi/efi_64.c |  9 ++--
  arch/x86/realmode/init.c   |  8 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|  4 +-
  drivers/gpu/drm/drm_cache.c|  4 +-
  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c|  4 +-
  drivers/gpu/drm/vmwgfx/vmwgfx_msg.c|  6 +--
  drivers/iommu/amd/init.c   |  7 +--
  drivers/iommu/amd/iommu.c  |  3 +-
  drivers/iommu/amd/iommu_v2.c   |  3 +-
  drivers/iommu/iommu.c  |  3 +-
  fs/proc/vmcore.c   |  6 +--
  include/linux/mem_encrypt.h|  4 --
  include/linux/protected_guest.h| 37 +
  kernel/dma/swiotlb.c   |  4 +-
  36 files changed, 218 insertions(+), 104 deletions(-)
  create mode 100644 arch/powerpc/include/asm/protected_guest.h
  create mode 100644 arch/x86/include/asm/protected_guest.h
  create mode 100644 include/linux/protected_guest.h



--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1 4/8] x86/tdx: Add arch_has_restricted_memory_access for TDX

2021-06-02 Thread Kuppuswamy, Sathyanarayanan




On 6/2/21 5:41 PM, Andi Kleen wrote:

+int arch_has_restricted_virtio_memory_access(void)
+{
+   return is_tdx_guest();
+}
+EXPORT_SYMBOL_GPL(arch_has_restricted_virtio_memory_access);
+


This function definition had to be removed from arch/x86/mm/mem_encrypt.c.

Otherwise, if you enable both CONFIG_AMD_MEM_ENCRYPT,
CONFIG_X86_MEM_ENCRYPT_COMMON it will generate multiple definition error.

--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -493,9 +493,3 @@ void __init amd_mem_encrypt_init(void)

print_mem_encrypt_feature_info();
 }
-
-int arch_has_restricted_virtio_memory_access(void)
-{
-   return sev_active();
-}
-EXPORT_SYMBOL_GPL(arch_has_restricted_virtio_memory_access);

--- a/arch/x86/mm/mem_encrypt_common.c
+++ b/arch/x86/mm/mem_encrypt_common.c
@@ -40,7 +40,7 @@ void __init mem_encrypt_init(void)

 int arch_has_restricted_virtio_memory_access(void)
 {
-   return is_tdx_guest();
+   return (is_tdx_guest() || sev_active());
 }
 EXPORT_SYMBOL_GPL(arch_has_restricted_virtio_memory_access);


--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu: Relax ACS requirement for Intel RCiEP devices.

2020-05-26 Thread Kuppuswamy, Sathyanarayanan



On 5/26/20 3:17 PM, Ashok Raj wrote:

All Intel platforms guarantee that all root complex implementations
must send transactions up to IOMMU for address translations. Hence for
RCiEP devices that are Vendor ID Intel, can claim exception for lack of
ACS support.


3.16 Root-Complex Peer to Peer Considerations
When DMA remapping is enabled, peer-to-peer requests through the
Root-Complex must be handled
as follows:
• The input address in the request is translated (through first-level,
   second-level or nested translation) to a host physical address (HPA).
   The address decoding for peer addresses must be done only on the
   translated HPA. Hardware implementations are free to further limit
   peer-to-peer accesses to specific host physical address regions
   (or to completely disallow peer-forwarding of translated requests).
• Since address translation changes the contents (address field) of
   the PCI Express Transaction Layer Packet (TLP), for PCI Express
   peer-to-peer requests with ECRC, the Root-Complex hardware must use
   the new ECRC (re-computed with the translated address) if it
   decides to forward the TLP as a peer request.
• Root-ports, and multi-function root-complex integrated endpoints, may
   support additional peerto-peer control features by supporting PCI Express
   Access Control Services (ACS) capability. Refer to ACS capability in
   PCI Express specifications for details.

Since Linux didn't give special treatment to allow this exception, certain
RCiEP MFD devices are getting grouped in a single iommu group. This
doesn't permit a single device to be assigned to a guest for instance.

In one vendor system: Device 14.x were grouped in a single IOMMU group.

/sys/kernel/iommu_groups/5/devices/:00:14.0
/sys/kernel/iommu_groups/5/devices/:00:14.2
/sys/kernel/iommu_groups/5/devices/:00:14.3

After the patch:
/sys/kernel/iommu_groups/5/devices/:00:14.0
/sys/kernel/iommu_groups/5/devices/:00:14.2
/sys/kernel/iommu_groups/6/devices/:00:14.3 <<< new group

14.0 and 14.2 are integrated devices, but legacy end points.
Whereas 14.3 was a PCIe compliant RCiEP.

00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30)
Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00

This permits assigning this device to a guest VM.

Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()")
Signed-off-by: Ashok Raj 
To: Joerg Roedel 
To: Bjorn Helgaas 
Cc: linux-ker...@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: Lu Baolu 
Cc: Alex Williamson 
Cc: Darrel Goeddel 
Cc: Mark Scott ,
Cc: Romil Sharma 
Cc: Ashok Raj 
---
  drivers/iommu/iommu.c | 13 -
  1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2b471419e26c..31b595dfedde 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1187,7 +1187,18 @@ static struct iommu_group 
*get_pci_function_alias_group(struct pci_dev *pdev,
struct pci_dev *tmp = NULL;
struct iommu_group *group;
  
-	if (!pdev->multifunction || pci_acs_enabled(pdev, REQ_ACS_FLAGS))

+   /*
+* Intel VT-d Specification Section 3.16, Root-Complex Peer to Peer
+* Considerations manadate that all transactions in RCiEP's and
+* even Integrated MFD's *must* be sent up to the IOMMU. P2P is
+* only possible on translated addresses. This gives enough
+* guarantee that such devices can be forgiven for lack of ACS
+* support.
+*/
+   if (!pdev->multifunction ||
+   (pdev->vendor == PCI_VENDOR_ID_INTEL &&
+pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END) ||
+pci_acs_enabled(pdev, REQ_ACS_FLAGS))

Looks good to me.
Reviewed-by: Kuppuswamy Sathyanarayanan 


return NULL;
  
  	for_each_pci_dev(tmp) {



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v5 23/25] PCI/ATS: Add PRI stubs

2020-04-14 Thread Kuppuswamy, Sathyanarayanan

Hi,

On 4/14/20 10:02 AM, Jean-Philippe Brucker wrote:

The SMMUv3 driver, which can be built without CONFIG_PCI, will soon gain
support for PRI.  Partially revert commit c6e9aefbf9db ("PCI/ATS: Remove
unused PRI and PASID stubs") to re-introduce the PRI stubs, and avoid
adding more #ifdefs to the SMMU driver.

Acked-by: Bjorn Helgaas 
Signed-off-by: Jean-Philippe Brucker 
Reviewed-by: Kuppuswamy Sathyanarayanan 


---
  include/linux/pci-ats.h | 8 
  1 file changed, 8 insertions(+)

diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
index f75c307f346de..e9e266df9b37c 100644
--- a/include/linux/pci-ats.h
+++ b/include/linux/pci-ats.h
@@ -28,6 +28,14 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs);
  void pci_disable_pri(struct pci_dev *pdev);
  int pci_reset_pri(struct pci_dev *pdev);
  int pci_prg_resp_pasid_required(struct pci_dev *pdev);
+#else /* CONFIG_PCI_PRI */
+static inline int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
+{ return -ENODEV; }
+static inline void pci_disable_pri(struct pci_dev *pdev) { }
+static inline int pci_reset_pri(struct pci_dev *pdev)
+{ return -ENODEV; }
+static inline int pci_prg_resp_pasid_required(struct pci_dev *pdev)
+{ return 0; }
  #endif /* CONFIG_PCI_PRI */
  
  #ifdef CONFIG_PCI_PASID



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 24/25] PCI/ATS: Export PRI functions

2020-04-14 Thread Kuppuswamy, Sathyanarayanan



Hi,
On 4/14/20 10:02 AM, Jean-Philippe Brucker wrote:

The SMMUv3 driver uses pci_{enable,disable}_pri() and related
functions. Export those functions to allow the driver to be built as a
module.

Acked-by: Bjorn Helgaas 
Signed-off-by: Jean-Philippe Brucker 
Reviewed-by: Kuppuswamy Sathyanarayanan 


---
  drivers/pci/ats.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index bbfd0d42b8b97..fc8fc6fc8bd55 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -197,6 +197,7 @@ void pci_pri_init(struct pci_dev *pdev)
if (status & PCI_PRI_STATUS_PASID)
pdev->pasid_required = 1;
  }
+EXPORT_SYMBOL_GPL(pci_pri_init);
  
  /**

   * pci_enable_pri - Enable PRI capability
@@ -243,6 +244,7 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
  
  	return 0;

  }
+EXPORT_SYMBOL_GPL(pci_enable_pri);
  
  /**

   * pci_disable_pri - Disable PRI capability
@@ -322,6 +324,7 @@ int pci_reset_pri(struct pci_dev *pdev)
  
  	return 0;

  }
+EXPORT_SYMBOL_GPL(pci_reset_pri);
  
  /**

   * pci_prg_resp_pasid_required - Return PRG Response PASID Required bit
@@ -337,6 +340,7 @@ int pci_prg_resp_pasid_required(struct pci_dev *pdev)
  
  	return pdev->pasid_required;

  }
+EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
  #endif /* CONFIG_PCI_PRI */
  
  #ifdef CONFIG_PCI_PASID



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/3] PCI/ATS: Remove unused PRI and PASID stubs

2019-10-10 Thread Kuppuswamy Sathyanarayanan



On 10/9/19 3:53 PM, Bjorn Helgaas wrote:

From: Bjorn Helgaas 

The following functions are only used by amd_iommu.c and intel-iommu.c
(when CONFIG_INTEL_IOMMU_SVM is enabled).  CONFIG_PCI_PRI and
CONFIG_PCI_PASID are always defined in those cases, so there's no need for
the stubs.

   pci_enable_pri()
   pci_disable_pri()
   pci_reset_pri()
   pci_prg_resp_pasid_required()
   pci_enable_pasid()
   pci_disable_pasid()

Remove the unused stubs.

Signed-off-by: Bjorn Helgaas 

Looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan 


---
  include/linux/pci-ats.h | 10 --
  1 file changed, 10 deletions(-)

diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
index 67de3a9499bb..963c11f7c56b 100644
--- a/include/linux/pci-ats.h
+++ b/include/linux/pci-ats.h
@@ -27,14 +27,7 @@ void pci_restore_pri_state(struct pci_dev *pdev);
  int pci_reset_pri(struct pci_dev *pdev);
  int pci_prg_resp_pasid_required(struct pci_dev *pdev);
  #else /* CONFIG_PCI_PRI */
-static inline int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
-{ return -ENODEV; }
-static inline void pci_disable_pri(struct pci_dev *pdev) { }
  static inline void pci_restore_pri_state(struct pci_dev *pdev) { }
-static inline int pci_reset_pri(struct pci_dev *pdev)
-{ return -ENODEV; }
-static inline int pci_prg_resp_pasid_required(struct pci_dev *pdev)
-{ return 0; }
  #endif /* CONFIG_PCI_PRI */
  
  #ifdef CONFIG_PCI_PASID

@@ -44,9 +37,6 @@ void pci_restore_pasid_state(struct pci_dev *pdev);
  int pci_pasid_features(struct pci_dev *pdev);
  int pci_max_pasids(struct pci_dev *pdev);
  #else /* CONFIG_PCI_PASID */
-static inline int pci_enable_pasid(struct pci_dev *pdev, int features)
-{ return -EINVAL; }
-static inline void pci_disable_pasid(struct pci_dev *pdev) { }
  static inline void pci_restore_pasid_state(struct pci_dev *pdev) { }
  static inline int pci_pasid_features(struct pci_dev *pdev)
  { return -EINVAL; }


--
Sathyanarayanan Kuppuswamy
Linux kernel developer

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] iommu/vt-d: Select PCI_PRI for INTEL_IOMMU_SVM

2019-10-09 Thread Kuppuswamy Sathyanarayanan



On 10/9/19 3:45 PM, Bjorn Helgaas wrote:

From: Bjorn Helgaas 

When CONFIG_INTEL_IOMMU_SVM=y, iommu_enable_dev_iotlb() calls PRI
interfaces (pci_reset_pri() and pci_enable_pri()), but those are only
implemented when CONFIG_PCI_PRI is enabled.

Previously INTEL_IOMMU_SVM selected PCI_PASID but not PCI_PRI, so the state
of PCI_PRI depended on whether AMD_IOMMU (which selects PCI_PRI) was
enabled or PCI_PRI was enabled explicitly.

The behavior of iommu_enable_dev_iotlb() should not depend on whether
AMD_IOMMU is enabled.  Make it predictable by having INTEL_IOMMU_SVM select
PCI_PRI so iommu_enable_dev_iotlb() always uses the full implementations of
PRI interfaces.

Signed-off-by: Bjorn Helgaas 


Looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan 




---
  drivers/iommu/Kconfig | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index e3842eabcfdd..b183c9f916b0 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -207,6 +207,7 @@ config INTEL_IOMMU_SVM
bool "Support for Shared Virtual Memory with Intel IOMMU"
depends on INTEL_IOMMU && X86
select PCI_PASID
+   select PCI_PRI
select MMU_NOTIFIER
help
  Shared Virtual Memory (SVM) provides a facility for devices


--
Sathyanarayanan Kuppuswamy
Linux kernel developer

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/2] PCI/ATS: Move pci_prg_resp_pasid_required() to CONFIG_PCI_PRI

2019-10-09 Thread Kuppuswamy Sathyanarayanan



On 10/9/19 3:45 PM, Bjorn Helgaas wrote:

From: Bjorn Helgaas 

pci_prg_resp_pasid_required() returns the value of the "PRG Response PASID
Required" bit from the PRI capability, but the interface was previously
defined under #ifdef CONFIG_PCI_PASID.

Move it from CONFIG_PCI_PASID to CONFIG_PCI_PRI so it's with the other
PRI-related things.

Signed-off-by: Bjorn Helgaas 
Reviewed-by: Kuppuswamy Sathyanarayanan 


---
  drivers/pci/ats.c   | 55 +++--
  include/linux/pci-ats.h | 11 -
  2 files changed, 30 insertions(+), 36 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index e18499243f84..0d06177252c7 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -280,6 +280,31 @@ int pci_reset_pri(struct pci_dev *pdev)
return 0;
  }
  EXPORT_SYMBOL_GPL(pci_reset_pri);
+
+/**
+ * pci_prg_resp_pasid_required - Return PRG Response PASID Required bit
+ *  status.
+ * @pdev: PCI device structure
+ *
+ * Returns 1 if PASID is required in PRG Response Message, 0 otherwise.
+ */
+int pci_prg_resp_pasid_required(struct pci_dev *pdev)
+{
+   u16 status;
+   int pos;
+
+   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
+   if (!pos)
+   return 0;
+
+   pci_read_config_word(pdev, pos + PCI_PRI_STATUS, );
+
+   if (status & PCI_PRI_STATUS_PASID)
+   return 1;
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
  #endif /* CONFIG_PCI_PRI */
  
  #ifdef CONFIG_PCI_PASID

@@ -395,36 +420,6 @@ int pci_pasid_features(struct pci_dev *pdev)
  }
  EXPORT_SYMBOL_GPL(pci_pasid_features);
  
-/**

- * pci_prg_resp_pasid_required - Return PRG Response PASID Required bit
- *  status.
- * @pdev: PCI device structure
- *
- * Returns 1 if PASID is required in PRG Response Message, 0 otherwise.
- *
- * Even though the PRG response PASID status is read from PRI Status
- * Register, since this API will mainly be used by PASID users, this
- * function is defined within #ifdef CONFIG_PCI_PASID instead of
- * CONFIG_PCI_PRI.
- */
-int pci_prg_resp_pasid_required(struct pci_dev *pdev)
-{
-   u16 status;
-   int pos;
-
-   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
-   if (!pos)
-   return 0;
-
-   pci_read_config_word(pdev, pos + PCI_PRI_STATUS, );
-
-   if (status & PCI_PRI_STATUS_PASID)
-   return 1;
-
-   return 0;
-}
-EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
-
  #define PASID_NUMBER_SHIFT8
  #define PASID_NUMBER_MASK (0x1f << PASID_NUMBER_SHIFT)
  /**
diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
index 1ebb88e7c184..a7a2b3d94fcc 100644
--- a/include/linux/pci-ats.h
+++ b/include/linux/pci-ats.h
@@ -10,6 +10,7 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs);
  void pci_disable_pri(struct pci_dev *pdev);
  void pci_restore_pri_state(struct pci_dev *pdev);
  int pci_reset_pri(struct pci_dev *pdev);
+int pci_prg_resp_pasid_required(struct pci_dev *pdev);
  
  #else /* CONFIG_PCI_PRI */
  
@@ -31,6 +32,10 @@ static inline int pci_reset_pri(struct pci_dev *pdev)

return -ENODEV;
  }
  
+static inline int pci_prg_resp_pasid_required(struct pci_dev *pdev)

+{
+   return 0;
+}
  #endif /* CONFIG_PCI_PRI */
  
  #ifdef CONFIG_PCI_PASID

@@ -40,7 +45,6 @@ void pci_disable_pasid(struct pci_dev *pdev);
  void pci_restore_pasid_state(struct pci_dev *pdev);
  int pci_pasid_features(struct pci_dev *pdev);
  int pci_max_pasids(struct pci_dev *pdev);
-int pci_prg_resp_pasid_required(struct pci_dev *pdev);
  
  #else  /* CONFIG_PCI_PASID */
  
@@ -66,11 +70,6 @@ static inline int pci_max_pasids(struct pci_dev *pdev)

  {
return -EINVAL;
  }
-
-static inline int pci_prg_resp_pasid_required(struct pci_dev *pdev)
-{
-   return 0;
-}
  #endif /* CONFIG_PCI_PASID */
  
  


--
Sathyanarayanan Kuppuswamy
Linux kernel developer



Re: [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices

2019-08-28 Thread Kuppuswamy Sathyanarayanan
On Mon, Aug 19, 2019 at 06:19:25PM -0500, Bjorn Helgaas wrote:
> On Mon, Aug 19, 2019 at 03:53:31PM -0700, Kuppuswamy Sathyanarayanan wrote:
> > On Mon, Aug 19, 2019 at 09:15:00AM -0500, Bjorn Helgaas wrote:
> > > On Thu, Aug 15, 2019 at 03:39:03PM -0700, Kuppuswamy Sathyanarayanan 
> > > wrote:
> > > > On 8/15/19 3:20 PM, Bjorn Helgaas wrote:
> > > > > [+cc Joerg, David, iommu list: because IOMMU drivers are the only
> > > > > callers of pci_enable_pri() and pci_enable_pasid()]
> > > > > 
> > > > > On Thu, Aug 01, 2019 at 05:06:01PM -0700, 
> > > > > sathyanarayanan.kuppusw...@linux.intel.com wrote:
> > > > > > From: Kuppuswamy Sathyanarayanan 
> > > > > > 
> > > > > > 
> > > > > > When IOMMU tries to enable Page Request Interface (PRI) for VF 
> > > > > > device
> > > > > > in iommu_enable_dev_iotlb(), it always fails because PRI support for
> > > > > > PCIe VF device is currently broken. Current implementation expects
> > > > > > the given PCIe device (PF & VF) to implement PRI capability before
> > > > > > enabling the PRI support. But this assumption is incorrect. As per 
> > > > > > PCIe
> > > > > > spec r4.0, sec 9.3.7.11, all VFs associated with PF can only use the
> > > > > > PRI of the PF and not implement it. Hence we need to create 
> > > > > > exception
> > > > > > for handling the PRI support for PCIe VF device.
> > > > > > 
> > > > > > Also, since PRI is a shared resource between PF/VF, following rules
> > > > > > should apply.
> > > > > > 
> > > > > > 1. Use proper locking before accessing/modifying PF resources in VF
> > > > > > PRI enable/disable call.
> > > > > > 2. Use reference count logic to track the usage of PRI resource.
> > > > > > 3. Disable PRI only if the PRI reference count (pri_ref_cnt) is 
> > > > > > zero.
> > > 
> > > > > Wait, why do we need this at all?  I agree the spec says VFs may not
> > > > > implement PRI or PASID capabilities and that VFs use the PRI and
> > > > > PASID of the PF.
> > > > > 
> > > > > But why do we need to support pci_enable_pri() and pci_enable_pasid()
> > > > > for VFs?  There's nothing interesting we can *do* in the VF, and
> > > > > passing it off to the PF adds all this locking mess.  For VFs, can we
> > > > > just make them do nothing or return -EINVAL?  What functionality would
> > > > > we be missing if we did that?
> > > > 
> > > > Currently PRI/PASID capabilities are not enabled by default. IOMMU can
> > > > enable PRI/PASID for VF first (and not enable it for PF). In this case,
> > > > doing nothing for VF device will break the functionality.
> > > 
> > > What is the path where we can enable PRI/PASID for VF but not for the
> > > PF?  The call chains leading to pci_enable_pri() go through the
> > > iommu_ops.add_device interface, which makes me think this is part of
> > > the device enumeration done by the PCI core, and in that case I would
> > > think this it should be done for the PF before VFs.  But maybe this
> > > path isn't exercised until a driver does a DMA map or something
> > > similar?
> 
> > AFAIK, this path will only get exercised when the device does DMA and
> > hence there is no specific order in which PRI/PASID is enabled in PF/VF.
> > In fact, my v2 version of this patch set had a check to ensure PF
> > PRI/PASID enable is happened before VF attempts PRI/PASID
> > enable/disable. But I had to remove it in later version of this series
> > due to failure case reported by one the tester of this code. 
> 
> What's the path?  And does that path make sense?
> 
> I got this far before giving up:
> 
> iommu_go_to_state   # AMD
>   state_next
> amd_iommu_init_pci
>   amd_iommu_init_api
> bus_set_iommu
>   iommu_bus_init
> bus_for_each_dev(..., add_iommu_group)
>   add_iommu_group
> iommu_probe_device
>   amd_iommu_add_device  # 
> amd_iommu_ops.add_device
> init_iommu_group
>   iommu_group_get_for_dev
> iommu_gr

Re: [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices

2019-08-19 Thread Kuppuswamy Sathyanarayanan
On Mon, Aug 19, 2019 at 09:15:00AM -0500, Bjorn Helgaas wrote:
> On Thu, Aug 15, 2019 at 03:39:03PM -0700, Kuppuswamy Sathyanarayanan wrote:
> > On 8/15/19 3:20 PM, Bjorn Helgaas wrote:
> > > [+cc Joerg, David, iommu list: because IOMMU drivers are the only
> > > callers of pci_enable_pri() and pci_enable_pasid()]
> > > 
> > > On Thu, Aug 01, 2019 at 05:06:01PM -0700, 
> > > sathyanarayanan.kuppusw...@linux.intel.com wrote:
> > > > From: Kuppuswamy Sathyanarayanan 
> > > > 
> > > > 
> > > > When IOMMU tries to enable Page Request Interface (PRI) for VF device
> > > > in iommu_enable_dev_iotlb(), it always fails because PRI support for
> > > > PCIe VF device is currently broken. Current implementation expects
> > > > the given PCIe device (PF & VF) to implement PRI capability before
> > > > enabling the PRI support. But this assumption is incorrect. As per PCIe
> > > > spec r4.0, sec 9.3.7.11, all VFs associated with PF can only use the
> > > > PRI of the PF and not implement it. Hence we need to create exception
> > > > for handling the PRI support for PCIe VF device.
> > > > 
> > > > Also, since PRI is a shared resource between PF/VF, following rules
> > > > should apply.
> > > > 
> > > > 1. Use proper locking before accessing/modifying PF resources in VF
> > > > PRI enable/disable call.
> > > > 2. Use reference count logic to track the usage of PRI resource.
> > > > 3. Disable PRI only if the PRI reference count (pri_ref_cnt) is zero.
> 
> > > Wait, why do we need this at all?  I agree the spec says VFs may not
> > > implement PRI or PASID capabilities and that VFs use the PRI and
> > > PASID of the PF.
> > > 
> > > But why do we need to support pci_enable_pri() and pci_enable_pasid()
> > > for VFs?  There's nothing interesting we can *do* in the VF, and
> > > passing it off to the PF adds all this locking mess.  For VFs, can we
> > > just make them do nothing or return -EINVAL?  What functionality would
> > > we be missing if we did that?
> > 
> > Currently PRI/PASID capabilities are not enabled by default. IOMMU can
> > enable PRI/PASID for VF first (and not enable it for PF). In this case,
> > doing nothing for VF device will break the functionality.
> 
> What is the path where we can enable PRI/PASID for VF but not for the
> PF?  The call chains leading to pci_enable_pri() go through the
> iommu_ops.add_device interface, which makes me think this is part of
> the device enumeration done by the PCI core, and in that case I would
> think this it should be done for the PF before VFs.  But maybe this
> path isn't exercised until a driver does a DMA map or something
> similar?
AFAIK, this path will only get exercised when the device does DMA and
hence there is no specific order in which PRI/PASID is enabled in PF/VF.
In fact, my v2 version of this patch set had a check to ensure PF
PRI/PASID enable is happened before VF attempts PRI/PASID
enable/disable. But I had to remove it in later version of this series
due to failure case reported by one the tester of this code. 
> 
> Bjorn

-- 
-- 
Sathyanarayanan Kuppuswamy
Linux kernel developer
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices

2019-08-15 Thread Kuppuswamy Sathyanarayanan



On 8/15/19 3:20 PM, Bjorn Helgaas wrote:

[+cc Joerg, David, iommu list: because IOMMU drivers are the only
callers of pci_enable_pri() and pci_enable_pasid()]

On Thu, Aug 01, 2019 at 05:06:01PM -0700, 
sathyanarayanan.kuppusw...@linux.intel.com wrote:

From: Kuppuswamy Sathyanarayanan 

When IOMMU tries to enable Page Request Interface (PRI) for VF device
in iommu_enable_dev_iotlb(), it always fails because PRI support for
PCIe VF device is currently broken. Current implementation expects
the given PCIe device (PF & VF) to implement PRI capability before
enabling the PRI support. But this assumption is incorrect. As per PCIe
spec r4.0, sec 9.3.7.11, all VFs associated with PF can only use the
PRI of the PF and not implement it. Hence we need to create exception
for handling the PRI support for PCIe VF device.

Also, since PRI is a shared resource between PF/VF, following rules
should apply.

1. Use proper locking before accessing/modifying PF resources in VF
PRI enable/disable call.
2. Use reference count logic to track the usage of PRI resource.
3. Disable PRI only if the PRI reference count (pri_ref_cnt) is zero.

Wait, why do we need this at all?  I agree the spec says VFs may not
implement PRI or PASID capabilities and that VFs use the PRI and
PASID of the PF.

But why do we need to support pci_enable_pri() and pci_enable_pasid()
for VFs?  There's nothing interesting we can *do* in the VF, and
passing it off to the PF adds all this locking mess.  For VFs, can we
just make them do nothing or return -EINVAL?  What functionality would
we be missing if we did that?


Currently PRI/PASID capabilities are not enabled by default. IOMMU can
enable PRI/PASID for VF first (and not enable it for PF). In this case,
doing nothing for VF device will break the functionality.

Also the PRI/PASID config options like "PRI Outstanding Page Request 
Allocation"
or "PASID Execute Permission" or "PASID Privileged Mode" are currently 
configured
as per device feature. And hence there is a chance for VF/PF to use 
different

values for these options.


(Obviously returning -EINVAL would require tweaks in the callers to
either avoid the call for VFs or handle the -EINVAL gracefully.)


Cc: Ashok Raj 
Cc: Keith Busch 
Suggested-by: Ashok Raj 
Signed-off-by: Kuppuswamy Sathyanarayanan 

---
  drivers/pci/ats.c   | 143 ++--
  include/linux/pci.h |   2 +
  2 files changed, 112 insertions(+), 33 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index 1f4be27a071d..079dc544 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -189,6 +189,8 @@ void pci_pri_init(struct pci_dev *pdev)
if (pdev->is_virtfn)
return;
  
+	mutex_init(>pri_lock);

+
pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
if (!pos)
return;
@@ -221,29 +223,57 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
  {
u16 control, status;
u32 max_requests;
+   int ret = 0;
+   struct pci_dev *pf = pci_physfn(pdev);
  
-	if (WARN_ON(pdev->pri_enabled))

-   return -EBUSY;
+   mutex_lock(>pri_lock);
  
-	if (!pdev->pri_cap)

-   return -EINVAL;
+   if (WARN_ON(pdev->pri_enabled)) {
+   ret = -EBUSY;
+   goto pri_unlock;
+   }
  
-	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, );

-   if (!(status & PCI_PRI_STATUS_STOPPED))
-   return -EBUSY;
+   if (!pf->pri_cap) {
+   ret = -EINVAL;
+   goto pri_unlock;
+   }
+
+   if (pdev->is_virtfn && pf->pri_enabled)
+   goto update_status;
+
+   /*
+* Before updating PRI registers, make sure there is no
+* outstanding PRI requests.
+*/
+   pci_read_config_word(pf, pf->pri_cap + PCI_PRI_STATUS, );
+   if (!(status & PCI_PRI_STATUS_STOPPED)) {
+   ret = -EBUSY;
+   goto pri_unlock;
+   }
  
-	pci_read_config_dword(pdev, pdev->pri_cap + PCI_PRI_MAX_REQ,

- _requests);
+   pci_read_config_dword(pf, pf->pri_cap + PCI_PRI_MAX_REQ, _requests);
reqs = min(max_requests, reqs);
-   pdev->pri_reqs_alloc = reqs;
-   pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
+   pf->pri_reqs_alloc = reqs;
+   pci_write_config_dword(pf, pf->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
  
  	control = PCI_PRI_CTRL_ENABLE;

-   pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
+   pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
  
-	pdev->pri_enabled = 1;

+   /*
+* If PRI is not already enabled in PF, increment the PF
+* pri_ref_cnt to track the usage of PRI interface.
+*/
+   if (pdev->is_virtfn && !pf->pri_enabled) {
+