Re: [RFC v4 17/17] procfs: display the protection-key number associated with a vma

2017-06-27 Thread Michael Ellerman
Ram Pai  writes:

> Display the pkey number associated with the vma in smaps of a task.
> The key will be seen as below:
>
> VmFlags: rd wr mr mw me dw ac key=0

Why wouldn't we just emit a "ProtectionKey:" line like x86 does?

See their arch_show_smap().

You should probably also do what x86 does, which is to not display the
key on CPUs that don't support keys.
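
For reference, a minimal sketch of the x86 pattern being suggested (based
on fs/proc/task_mmu.c of that era; treat the details as approximate):

/* Sketch: x86 emits the pkey on its own smaps line and emits nothing
 * at all when the CPU lacks protection-key support. */
void arch_show_smap(struct seq_file *m, struct vm_area_struct *vma)
{
	if (!boot_cpu_has(X86_FEATURE_OSPKE))
		return;

	seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
}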

cheers


Re: [PATCH v3 0/4] kmod: help make deterministic

2017-06-27 Thread Luis R. Rodriguez
On Tue, Jun 27, 2017 at 05:26:05PM +0200, Jessica Yu wrote:
> +++ Luis R. Rodriguez [27/06/17 00:44 +0200]:
> > Feel free to decouple it, but note that the commit log must then be
> > changed. My own take is that this fix is not so critical, as it covers a
> > corner case, so I have instead preferred to couple the test case and the
> > respective fix together. I'll leave it up to you how to proceed.
> 
> I'll take 01 and 02 for the next merge window, as they are
> straightforward. 03/04 can stay together, and as I understand it 04
> may need to switch back to using the normal wait_* api.

OK, I'll rework 03-04 with the regular wait API and submit them, aiming for
Andrew's tree.

> I'm not the maintainer for kmod.c, if that's what you mean by
> decoupling.

I was suggesting adding kmod.c to the MODULE SUPPORT list since it's all
module related, adding myself to the list, and I'd just take on helping with
kmod.c. My point was that it feels odd to decouple kmod.c from module stuff.
But let's give it a shot with them separated first and see how that goes.

>  But I don't think it has one, which is why I'm suggesting
> adding it to MAINTAINERS, since you've been actively working on it :)
> (looking at git log, it looks like Andrew did most of the sign-off's
> for kmod.c in the past). I think a separate entry in MAINTAINERS is
> good, with the name you suggested.

Sure.

  Luis


Re: [PATCH v8 0/2] dm: boot a mapped device without an initramfs

2017-06-27 Thread Kees Cook
On Fri, May 19, 2017 at 12:11 AM, Enric Balletbo i Serra wrote:
> Dear all,
>
> So here is a new version of the patches to be reviewed; this time, as
> suggested by Alasdair, the patches are reworked to match the new
> dmsetup bootformat feature [1]. These patches are not reviewed yet, but
> the format was discussed on IRC and it was suggested to send the
> kernel patches in parallel.

Just pinging on this thread again. Alasdair, how does this look to you?

Thanks!

-Kees

>
> Changes since v7:
>  - Fix build error due to commit
> e516db4f67 (dm ioctl: add a new DM_DEV_ARM_POLL ioctl)
>
> Changes since v6:
>  - Add a new function to issue the equivalent of a DM ioctl programmatically.
>  - Use the new ioctl interface to create the devices.
>  - Use comma-delimited and semicolon-delimited dmsetup-like commands.
>
> Changes since v5:
>  - https://www.redhat.com/archives/dm-devel/2016-February/msg00112.html
>
> [1] https://www.redhat.com/archives/linux-lvm/2017-May/msg00047.html
>
> Waiting for your feedback,
>
> Enric Balletbo i Serra (1):
>   dm ioctl: add a device mapper ioctl function.
>
> Will Drewry (1):
>   init: add support to directly boot to a mapped device
>
>  Documentation/admin-guide/kernel-parameters.rst |   1 +
>  Documentation/admin-guide/kernel-parameters.txt |   3 +
>  Documentation/device-mapper/dm-boot.txt |  65 
>  drivers/md/dm-ioctl.c   |  50 +++
>  include/linux/device-mapper.h   |   6 +
>  init/Makefile   |   1 +
>  init/do_mounts.c|   1 +
>  init/do_mounts.h|  10 +
>  init/do_mounts_dm.c | 459 
> 
>  9 files changed, 596 insertions(+)
>  create mode 100644 Documentation/device-mapper/dm-boot.txt
>  create mode 100644 init/do_mounts_dm.c
>
> --
> 2.9.3
>



-- 
Kees Cook
Pixel Security


[PATCH v8 RESEND 34/38] x86/mm: Create native_make_p4d() for PGTABLE_LEVELS <= 4

2017-06-27 Thread Tom Lendacky
Currently, native_make_p4d() is only defined when CONFIG_PGTABLE_LEVELS
is greater than 4. Provide a static inline version of native_make_p4d()
that can be defined and used when CONFIG_PGTABLE_LEVELS is not greater
than 4.

Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/pgtable_types.h |5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/include/asm/pgtable_types.h 
b/arch/x86/include/asm/pgtable_types.h
index 830992f..6c55973 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -309,6 +309,11 @@ static inline p4dval_t native_p4d_val(p4d_t p4d)
 #else
 #include 
 
+static inline p4d_t native_make_p4d(pudval_t val)
+{
+   return (p4d_t) { .pgd = native_make_pgd((pgdval_t)val) };
+}
+
 static inline p4dval_t native_p4d_val(p4d_t p4d)
 {
return native_pgd_val(p4d.pgd);



[PATCH v8 RESEND 35/38] x86/mm: Add support to encrypt the kernel in-place

2017-06-27 Thread Tom Lendacky
Add the support to encrypt the kernel in-place. This is done by creating
new page mappings for the kernel - a decrypted write-protected mapping
and an encrypted mapping. The kernel is encrypted by copying it through
a temporary buffer.
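
As a rough illustration of the copy-through-a-buffer scheme described
above (a conceptual sketch with assumed names, not the patch's actual
routine):

/*
 * With both an encrypted and a decrypted mapping of the kernel set up,
 * each chunk is read through the decrypted mapping into an intermediate
 * workarea and written back through the encrypted mapping, encrypting
 * the image in place.
 */
static void encrypt_in_place(void *enc_map, void *dec_map,
			     void *workarea, size_t len, size_t chunk)
{
	size_t off, n;

	for (off = 0; off < len; off += chunk) {
		n = (len - off < chunk) ? len - off : chunk;

		memcpy(workarea, dec_map + off, n);	/* read decrypted */
		memcpy(enc_map + off, workarea, n);	/* write encrypted */
	}
}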

Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/mem_encrypt.h |6 +
 arch/x86/mm/Makefile   |1 
 arch/x86/mm/mem_encrypt.c  |  310 
 arch/x86/mm/mem_encrypt_boot.S |  150 +
 4 files changed, 467 insertions(+)
 create mode 100644 arch/x86/mm/mem_encrypt_boot.S

diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index 70e55f6..7122c36 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -21,6 +21,12 @@
 
 extern unsigned long sme_me_mask;
 
+void sme_encrypt_execute(unsigned long encrypted_kernel_vaddr,
+unsigned long decrypted_kernel_vaddr,
+unsigned long kernel_len,
+unsigned long encryption_wa,
+unsigned long encryption_pgd);
+
 void __init sme_early_encrypt(resource_size_t paddr,
  unsigned long size);
 void __init sme_early_decrypt(resource_size_t paddr,
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index a94a7b6..72bf8c0 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -40,3 +40,4 @@ obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
 obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
 
 obj-$(CONFIG_AMD_MEM_ENCRYPT)  += mem_encrypt.o
+obj-$(CONFIG_AMD_MEM_ENCRYPT)  += mem_encrypt_boot.o
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index a7400ec..e5d5439 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /*
  * Since SME related variables are set early in the boot process they must
@@ -199,8 +201,316 @@ void swiotlb_set_mem_attributes(void *vaddr, unsigned 
long size)
set_memory_decrypted((unsigned long)vaddr, size >> PAGE_SHIFT);
 }
 
+static void __init sme_clear_pgd(pgd_t *pgd_base, unsigned long start,
+unsigned long end)
+{
+   unsigned long pgd_start, pgd_end, pgd_size;
+   pgd_t *pgd_p;
+
+   pgd_start = start & PGDIR_MASK;
+   pgd_end = end & PGDIR_MASK;
+
+   pgd_size = (((pgd_end - pgd_start) / PGDIR_SIZE) + 1);
+   pgd_size *= sizeof(pgd_t);
+
+   pgd_p = pgd_base + pgd_index(start);
+
+   memset(pgd_p, 0, pgd_size);
+}
+
+#define PGD_FLAGS  _KERNPG_TABLE_NOENC
+#define P4D_FLAGS  _KERNPG_TABLE_NOENC
+#define PUD_FLAGS  _KERNPG_TABLE_NOENC
+#define PMD_FLAGS  (__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL)
+
+static void __init *sme_populate_pgd(pgd_t *pgd_base, void *pgtable_area,
+unsigned long vaddr, pmdval_t pmd_val)
+{
+   pgd_t *pgd_p;
+   p4d_t *p4d_p;
+   pud_t *pud_p;
+   pmd_t *pmd_p;
+
+   pgd_p = pgd_base + pgd_index(vaddr);
+   if (native_pgd_val(*pgd_p)) {
+   if (IS_ENABLED(CONFIG_X86_5LEVEL))
+   p4d_p = (p4d_t *)(native_pgd_val(*pgd_p) & 
~PTE_FLAGS_MASK);
+   else
+   pud_p = (pud_t *)(native_pgd_val(*pgd_p) & 
~PTE_FLAGS_MASK);
+   } else {
+   pgd_t pgd;
+
+   if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
+   p4d_p = pgtable_area;
+   memset(p4d_p, 0, sizeof(*p4d_p) * PTRS_PER_P4D);
+   pgtable_area += sizeof(*p4d_p) * PTRS_PER_P4D;
+
+   pgd = native_make_pgd((pgdval_t)p4d_p + PGD_FLAGS);
+   } else {
+   pud_p = pgtable_area;
+   memset(pud_p, 0, sizeof(*pud_p) * PTRS_PER_PUD);
+   pgtable_area += sizeof(*pud_p) * PTRS_PER_PUD;
+
+   pgd = native_make_pgd((pgdval_t)pud_p + PGD_FLAGS);
+   }
+   native_set_pgd(pgd_p, pgd);
+   }
+
+   if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
+   p4d_p += p4d_index(vaddr);
+   if (native_p4d_val(*p4d_p)) {
+   pud_p = (pud_t *)(native_p4d_val(*p4d_p) & 
~PTE_FLAGS_MASK);
+   } else {
+   p4d_t p4d;
+
+   pud_p = pgtable_area;
+   memset(pud_p, 0, sizeof(*pud_p) * PTRS_PER_PUD);
+   pgtable_area += sizeof(*pud_p) * PTRS_PER_PUD;
+
+   p4d = native_make_p4d((pudval_t)pud_p + P4D_FLAGS);
+   native_set_p4d(p4d_p, p4d);
+   }
+   }
+
+   pud_p += pud_index(vaddr);
+   if (native_pud_val(*pud_p)) {
+   if (native_pud_val(*pud_p) & _PAGE_PSE)
+   goto out;
+
+   pmd_p = (pmd_t *)(native_pud_val(*pud_p) & 

[PATCH v8 RESEND 36/38] x86/boot: Add early cmdline parsing for options with arguments

2017-06-27 Thread Tom Lendacky
Add a cmdline_find_option() function to look for cmdline options that
take arguments. The argument is returned in a supplied buffer and the
argument length (regardless of whether it fits in the supplied buffer)
is returned, with -1 indicating not found.
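
A hypothetical usage sketch of the new interface (the option name and
buffer size here are illustrative only):

/* Hypothetical caller: probe for mem_encrypt=off on the command line. */
static bool example_mem_encrypt_off(const char *cmdline)
{
	char arg[16];
	int len;

	len = cmdline_find_option(cmdline, "mem_encrypt", arg, sizeof(arg));

	return len == 3 && !strncmp(arg, "off", 3);
}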

Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/cmdline.h |2 +
 arch/x86/lib/cmdline.c |  105 
 2 files changed, 107 insertions(+)

diff --git a/arch/x86/include/asm/cmdline.h b/arch/x86/include/asm/cmdline.h
index e01f7f7..84ae170 100644
--- a/arch/x86/include/asm/cmdline.h
+++ b/arch/x86/include/asm/cmdline.h
@@ -2,5 +2,7 @@
 #define _ASM_X86_CMDLINE_H
 
 int cmdline_find_option_bool(const char *cmdline_ptr, const char *option);
+int cmdline_find_option(const char *cmdline_ptr, const char *option,
+   char *buffer, int bufsize);
 
 #endif /* _ASM_X86_CMDLINE_H */
diff --git a/arch/x86/lib/cmdline.c b/arch/x86/lib/cmdline.c
index 5cc78bf..3261abb 100644
--- a/arch/x86/lib/cmdline.c
+++ b/arch/x86/lib/cmdline.c
@@ -104,7 +104,112 @@ static inline int myisspace(u8 c)
return 0;   /* Buffer overrun */
 }
 
+/*
+ * Find a non-boolean option (i.e. option=argument). In accordance with
+ * standard Linux practice, if this option is repeated, this returns the
+ * last instance on the command line.
+ *
+ * @cmdline: the cmdline string
+ * @max_cmdline_size: the maximum size of cmdline
+ * @option: option string to look for
+ * @buffer: memory buffer to return the option argument
+ * @bufsize: size of the supplied memory buffer
+ *
+ * Returns the length of the argument (regardless of whether it was
+ * truncated to fit in the buffer), or -1 if not found.
+ */
+static int
+__cmdline_find_option(const char *cmdline, int max_cmdline_size,
+ const char *option, char *buffer, int bufsize)
+{
+   char c;
+   int pos = 0, len = -1;
+   const char *opptr = NULL;
+   char *bufptr = buffer;
+   enum {
+   st_wordstart = 0,   /* Start of word/after whitespace */
+   st_wordcmp, /* Comparing this word */
+   st_wordskip,/* Miscompare, skip */
+   st_bufcpy,  /* Copying this to buffer */
+   } state = st_wordstart;
+
+   if (!cmdline)
+   return -1;  /* No command line */
+
+   /*
+* This 'pos' check ensures we do not overrun
+* a non-NULL-terminated 'cmdline'
+*/
+   while (pos++ < max_cmdline_size) {
+   c = *(char *)cmdline++;
+   if (!c)
+   break;
+
+   switch (state) {
+   case st_wordstart:
+   if (myisspace(c))
+   break;
+
+   state = st_wordcmp;
+   opptr = option;
+   /* fall through */
+
+   case st_wordcmp:
+   if ((c == '=') && !*opptr) {
+   /*
+* We matched all the way to the end of the
+* option we were looking for, prepare to
+* copy the argument.
+*/
+   len = 0;
+   bufptr = buffer;
+   state = st_bufcpy;
+   break;
+   } else if (c == *opptr++) {
+   /*
+* We are currently matching, so continue
+* to the next character on the cmdline.
+*/
+   break;
+   }
+   state = st_wordskip;
+   /* fall through */
+
+   case st_wordskip:
+   if (myisspace(c))
+   state = st_wordstart;
+   break;
+
+   case st_bufcpy:
+   if (myisspace(c)) {
+   state = st_wordstart;
+   } else {
+   /*
+* Increment len, but don't overrun the
+* supplied buffer and leave room for the
+* NULL terminator.
+*/
+   if (++len < bufsize)
+   *bufptr++ = c;
+   }
+   break;
+   }
+   }
+
+   if (bufsize)
+   *bufptr = '\0';
+
+   return len;
+}
+
 int cmdline_find_option_bool(const char *cmdline, const char *option)
 {
return __cmdline_find_option_bool(cmdline, COMMAND_LINE_SIZE, option);
 }
+
+int cmdline_find_option(const char *cmdline, const char *option, char *buffer,
+ 

[PATCH v8 RESEND 37/38] compiler-gcc.h: Introduce __nostackp function attribute

2017-06-27 Thread Tom Lendacky
Create a new function attribute, __nostackp, that can be used to turn off
stack protection on a per-function basis.
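
A hedged usage sketch (the function below is hypothetical):

/*
 * Example: code that runs before the stack canary has been initialized
 * must not be instrumented with stack protection, so the attribute is
 * applied per function.
 */
void __nostackp early_boot_helper(void)
{
	/* ... runs before boot_init_stack_canary() ... */
}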

Signed-off-by: Tom Lendacky 
---
 include/linux/compiler-gcc.h |2 ++
 include/linux/compiler.h |4 
 2 files changed, 6 insertions(+)

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 0efef9c..de48b32 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -162,6 +162,8 @@
 
 #if GCC_VERSION >= 40100
 # define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
+
+#define __nostackp __attribute__((__optimize__("no-stack-protector")))
 #endif
 
 #if GCC_VERSION >= 40300
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 707242f..615d50d 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -458,6 +458,10 @@ static __always_inline void __write_once_size(volatile 
void *p, void *res, int s
 #define __visible
 #endif
 
+#ifndef __nostackp
+#define __nostackp
+#endif
+
 /*
  * Assume alignment of return value.
  */



[PATCH v8 RESEND 38/38] x86/mm: Add support to make use of Secure Memory Encryption

2017-06-27 Thread Tom Lendacky
Add support to check if SME has been enabled and if memory encryption
should be activated (checking of the command line option is based on the
configuration of the default state).  If memory encryption is to be
activated, then the encryption mask is set and the kernel is encrypted
"in place."

Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/mem_encrypt.h |6 ++-
 arch/x86/kernel/head64.c   |5 +-
 arch/x86/mm/mem_encrypt.c  |   77 
 3 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index 7122c36..8e618fc 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -17,6 +17,8 @@
 
 #include 
 
+#include 
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 
 extern unsigned long sme_me_mask;
@@ -38,7 +40,7 @@ void __init sme_early_decrypt(resource_size_t paddr,
 void __init sme_early_init(void);
 
 void __init sme_encrypt_kernel(void);
-void __init sme_enable(void);
+void __init sme_enable(struct boot_params *bp);
 
 /* Architecture __weak replacement functions */
 void __init mem_encrypt_init(void);
@@ -60,7 +62,7 @@ static inline void __init sme_unmap_bootdata(char 
*real_mode_data) { }
 static inline void __init sme_early_init(void) { }
 
 static inline void __init sme_encrypt_kernel(void) { }
-static inline void __init sme_enable(void) { }
+static inline void __init sme_enable(struct boot_params *bp) { }
 
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 0cdb53b..925b292 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -45,7 +45,8 @@ static void __head *fixup_pointer(void *ptr, unsigned long 
physaddr)
return ptr - (void *)_text + (void *)physaddr;
 }
 
-unsigned long __head __startup_64(unsigned long physaddr)
+unsigned long __head __startup_64(unsigned long physaddr,
+ struct boot_params *bp)
 {
unsigned long load_delta, *p;
unsigned long pgtable_flags;
@@ -70,7 +71,7 @@ unsigned long __head __startup_64(unsigned long physaddr)
for (;;);
 
/* Activate Secure Memory Encryption (SME) if supported and enabled */
-   sme_enable();
+   sme_enable(bp);
 
/* Include the SME encryption mask in the fixup value */
load_delta += sme_get_me_mask();
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index e5d5439..053d540 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -23,6 +24,13 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+
+static char sme_cmdline_arg[] __initdata = "mem_encrypt";
+static char sme_cmdline_on[]  __initdata = "on";
+static char sme_cmdline_off[] __initdata = "off";
 
 /*
  * Since SME related variables are set early in the boot process they must
@@ -190,6 +198,8 @@ void __init mem_encrypt_init(void)
 
/* Call into SWIOTLB to update the SWIOTLB DMA buffers */
swiotlb_update_mem_attributes();
+
+   pr_info("AMD Secure Memory Encryption (SME) active\n");
 }
 
 void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
@@ -513,6 +523,71 @@ void __init sme_encrypt_kernel(void)
native_write_cr3(__native_read_cr3());
 }
 
-void __init sme_enable(void)
+void __init __nostackp sme_enable(struct boot_params *bp)
 {
+   const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
+   unsigned int eax, ebx, ecx, edx;
+   bool active_by_default;
+   unsigned long me_mask;
+   char buffer[16];
+   u64 msr;
+
+   /* Check for the SME support leaf */
+   eax = 0x80000000;
+   ecx = 0;
+   native_cpuid(&eax, &ebx, &ecx, &edx);
+   if (eax < 0x8000001f)
+   return;
+
+   /*
+* Check for the SME feature:
+*   CPUID Fn8000_001F[EAX] - Bit 0
+* Secure Memory Encryption support
+*   CPUID Fn8000_001F[EBX] - Bits 5:0
+* Pagetable bit position used to indicate encryption
+*/
+   eax = 0x8000001f;
+   ecx = 0;
+   native_cpuid(&eax, &ebx, &ecx, &edx);
+   if (!(eax & 1))
+   return;
+
+   me_mask = 1UL << (ebx & 0x3f);
+
+   /* Check if SME is enabled */
+   msr = __rdmsr(MSR_K8_SYSCFG);
+   if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
+   return;
+
+   /*
+* Fixups have not been applied to phys_base yet and we're running
+* identity mapped, so we must obtain the address to the SME command
+* line argument data using rip-relative addressing.
+*/
+   asm ("lea sme_cmdline_arg(%%rip), %0"
+: "=r" (cmdline_arg)
+: "p" (sme_cmdline_arg));
+   asm ("lea sme_cmdline_on(%%rip), %0"
+: "=r" (cmdline_on)
+: "p" (sme_cmdline_on));
+   asm ("lea 

Re: [PATCH v3 0/4] kmod: help make deterministic

2017-06-27 Thread Jessica Yu

+++ Luis R. Rodriguez [27/06/17 00:44 +0200]:

On Mon, Jun 26, 2017 at 11:37:36PM +0200, Jessica Yu wrote:

+++ Kees Cook [20/06/17 17:23 -0700]:
> On Tue, Jun 20, 2017 at 1:56 PM, Luis R. Rodriguez  wrote:
> > On Fri, May 26, 2017 at 02:12:24PM -0700, Luis R. Rodriguez wrote:
> > > This v3 nukes the proc sysctl interface in favor of just letting
> > > userspace check the kernel revision. Prior to whenever this is merged,
> > > userspace should try to avoid hammering more than 50 kmod threads, as
> > > they can fail and it'd get -ENOMEM.
> > >
> > > We do away with the old heuristics assuming you could end up with
> > > fewer than max_threads/2 < 50 threads, as Dmitry notes this would mean
> > > having a system with 16 MiB of RAM with modules enabled. It simplifies
> > > our patch "kmod: reduce atomic operations on kmod_concurrent"
> > > considerably.
> > >
> > > Since the sysctl interface is gone, this no longer depends on any
> > > other patches, the series is independent. As usual the series is
> > > available on my linux-next 20170526-kmod-only branch which is based
> > > on next-20170526.
> > >
> > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20170526-kmod-only
> > >
> > >   Luis
> > >
> > > Luis R. Rodriguez (4):
> > >   module: use list_for_each_entry_rcu() on find_module_all()
> > >   kmod: reduce atomic operations on kmod_concurrent and simplify
> > >   kmod: add test driver to stress test the module loader
> > >   kmod: throttle kmod thread limit
> >
> > About a month now with no further nitpicks. What tree should these changes
> > go through if there are no issues? Andrew's, or Jessica's?
>
> Seems like going through Jessica's would make the most sense?

Would be happy to take patches 01 (which I need to anyway), 02,
possibly 04 if decoupled from the test driver (03).


Feel free to decouple it, but note that the commit log must then be
changed. My own take is that this fix is not so critical, as it covers a
corner case, so I have instead preferred to couple the test case and the
respective fix together. I'll leave it up to you how to proceed.


I'll take 01 and 02 for the next merge window, as they are
straightforward. 03/04 can stay together, and as I understand it 04
may need to switch back to using the normal wait_* api.


I can't take patch 03 through my tree just yet, as I haven't had time to
give it a look :-/


Understood. I'd appreciate at least a review though.


Of course! I should have rephrased and said *by* this upcoming merge window.


[ Side comment, it seems that kmod.c isn't directly maintained by anyone
right now, perhaps Luis would be interested in picking it up? :-) ]


Sure thing. I'm not sure it makes sense to decouple kernel/kmod.c in
MAINTAINERS though; if you do, let me know what you'd prefer to call it,
"KMOD MODULE USERMODE HELPER"?

If you prefer to keep them together, I can certainly volunteer to review all
kmod changes and can send a patch to add kmod and myself under "MODULE
SUPPORT".


I'm not the maintainer for kmod.c, if that's what you mean by
decoupling. But I don't think it has one, which is why I'm suggesting
adding it to MAINTAINERS, since you've been actively working on it :)
(looking at git log, it looks like Andrew did most of the sign-off's
for kmod.c in the past). I think a separate entry in MAINTAINERS is
good, with the name you suggested.

Thanks!

Jessica



Re: [RFC v4 09/17] powerpc: call the hash functions with the correct pkey value

2017-06-27 Thread Aneesh Kumar K.V



On Tuesday 27 June 2017 03:41 PM, Ram Pai wrote:

Pass the correct protection key value to the hash functions on
page fault.

Signed-off-by: Ram Pai 
---
  arch/powerpc/include/asm/pkeys.h | 11 +++
  arch/powerpc/mm/hash_utils_64.c  |  4 
  arch/powerpc/mm/mem.c|  6 ++
  3 files changed, 21 insertions(+)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index ef1c601..1370b3f 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -74,6 +74,17 @@ static inline bool mm_pkey_is_allocated(struct mm_struct 
*mm, int pkey)
  }

  /*
+ * return the protection key of the vma corresponding to the
+ * given effective address @ea.
+ */
+static inline int mm_pkey(struct mm_struct *mm, unsigned long ea)
+{
+   struct vm_area_struct *vma = find_vma(mm, ea);
+   int pkey = vma ? vma_pkey(vma) : 0;
+   return pkey;
+}
+
+/*



That is not going to work in the hash fault path, right? We can't do a
find_vma() there without holding the mmap_sem.
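
For context, find_vma() is only safe with the mm semaphore held, i.e.
something like:

	down_read(&mm->mmap_sem);	/* required around find_vma() */
	vma = find_vma(mm, ea);
	pkey = vma ? vma_pkey(vma) : 0;
	up_read(&mm->mmap_sem);

and the low-level hash fault path cannot take that sleeping lock.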


-aneesh



[PATCH v8 RESEND 31/38] x86/mm, kexec: Allow kexec to be used with SME

2017-06-27 Thread Tom Lendacky
Provide support so that kexec can be used to boot a kernel when SME is
enabled.

Support is needed to allocate pages for kexec without encryption.  This
is needed in order to be able to reboot into the kernel in the same manner
as it was originally booted.

Additionally, when shutting down all of the CPUs we need to be sure to
flush the caches and then halt. This is needed when booting from a state
where SME was not active into a state where SME is active (or vice-versa).
Without these steps, it is possible for cache lines to exist for the same
physical location but tagged both with and without the encryption bit. This
can cause random memory corruption when caches are flushed depending on
which cacheline is written last.
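
For generic code, the new hooks presumably follow the usual
override-or-fallback pattern (the #define guards match the kexec.h hunk
below; the fallback bodies here are an assumption):

#ifndef arch_kexec_post_alloc_pages
static inline int arch_kexec_post_alloc_pages(void *vaddr,
					      unsigned int pages, gfp_t gfp)
{
	return 0;	/* nothing to do if the arch doesn't override */
}
#endif

#ifndef arch_kexec_pre_free_pages
static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
{
}
#endif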

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/init.h  |1 +
 arch/x86/include/asm/kexec.h |8 
 arch/x86/include/asm/pgtable_types.h |1 +
 arch/x86/kernel/machine_kexec_64.c   |   22 +-
 arch/x86/kernel/process.c|   17 +++--
 arch/x86/mm/ident_map.c  |   12 
 include/linux/kexec.h|8 
 kernel/kexec_core.c  |   12 +++-
 8 files changed, 73 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index 474eb8c..05c4aa0 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -7,6 +7,7 @@ struct x86_mapping_info {
unsigned long page_flag; /* page flag for PMD or PUD entry */
unsigned long offset;/* ident mapping offset */
bool direct_gbpages; /* PUD level 1GB page support */
+   unsigned long kernpg_flag;   /* kernel pagetable flag override */
 };
 
 int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 70ef205..e8183ac 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -207,6 +207,14 @@ struct kexec_entry64_regs {
uint64_t r15;
uint64_t rip;
 };
+
+extern int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages,
+  gfp_t gfp);
+#define arch_kexec_post_alloc_pages arch_kexec_post_alloc_pages
+
+extern void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages);
+#define arch_kexec_pre_free_pages arch_kexec_pre_free_pages
+
 #endif
 
 typedef void crash_vmclear_fn(void);
diff --git a/arch/x86/include/asm/pgtable_types.h 
b/arch/x86/include/asm/pgtable_types.h
index 32095af..830992f 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -213,6 +213,7 @@ enum page_cache_mode {
 #define PAGE_KERNEL__pgprot(__PAGE_KERNEL | _PAGE_ENC)
 #define PAGE_KERNEL_RO __pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
 #define PAGE_KERNEL_EXEC   __pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
+#define PAGE_KERNEL_EXEC_NOENC __pgprot(__PAGE_KERNEL_EXEC)
 #define PAGE_KERNEL_RX __pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
 #define PAGE_KERNEL_NOCACHE__pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
 #define PAGE_KERNEL_LARGE  __pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
diff --git a/arch/x86/kernel/machine_kexec_64.c 
b/arch/x86/kernel/machine_kexec_64.c
index cb0a304..9cf8daa 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -87,7 +87,7 @@ static int init_transition_pgtable(struct kimage *image, 
pgd_t *pgd)
set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
}
pte = pte_offset_kernel(pmd, vaddr);
-   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
+   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC_NOENC));
return 0;
 err:
free_transition_pgtable(image);
@@ -115,6 +115,7 @@ static int init_pgtable(struct kimage *image, unsigned long 
start_pgtable)
.alloc_pgt_page = alloc_pgt_page,
.context= image,
.page_flag  = __PAGE_KERNEL_LARGE_EXEC,
+   .kernpg_flag= _KERNPG_TABLE_NOENC,
};
unsigned long mstart, mend;
pgd_t *level4p;
@@ -602,3 +603,22 @@ void arch_kexec_unprotect_crashkres(void)
 {
kexec_mark_crashkres(false);
 }
+
+int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
+{
+   /*
+* If SME is active we need to be sure that kexec pages are
+* not encrypted because when we boot to the new kernel the
+* pages won't be accessed encrypted (initially).
+*/
+   return set_memory_decrypted((unsigned long)vaddr, pages);
+}
+
+void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
+{
+   /*
+* If SME is active we need to reset the pages back to being
+* an encrypted mapping before freeing them.
+*/
+   

[PATCH v8 RESEND 33/38] x86/mm: Use proper encryption attributes with /dev/mem

2017-06-27 Thread Tom Lendacky
When accessing memory using /dev/mem (or /dev/kmem), use the proper
encryption attributes when mapping the memory.

To ensure the proper attributes are applied when reading or writing
/dev/mem, update the xlate_dev_mem_ptr() function to use memremap(),
which essentially performs the same steps: applying __va() for RAM, or
using ioremap() if not RAM.

To ensure the proper attributes are applied when mmapping /dev/mem,
update phys_mem_access_prot() to call phys_mem_access_encrypted(), a new
function that checks whether the memory should be mapped encrypted. If
not, the VMA protection value is updated to remove the encryption bit.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/io.h |3 +++
 arch/x86/mm/ioremap.c |   18 +-
 arch/x86/mm/pat.c |3 +++
 3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 09c5557..e080a39 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -386,4 +386,7 @@ extern bool arch_memremap_can_ram_remap(resource_size_t 
offset,
unsigned long flags);
 #define arch_memremap_can_ram_remap arch_memremap_can_ram_remap
 
+extern bool phys_mem_access_encrypted(unsigned long phys_addr,
+ unsigned long size);
+
 #endif /* _ASM_X86_IO_H */
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index effa529..71d4ca7 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -396,12 +396,10 @@ void *xlate_dev_mem_ptr(phys_addr_t phys)
unsigned long offset = phys & ~PAGE_MASK;
void *vaddr;
 
-   /* If page is RAM, we can use __va. Otherwise ioremap and unmap. */
-   if (page_is_ram(start >> PAGE_SHIFT))
-   return __va(phys);
+   /* memremap() maps if RAM, otherwise falls back to ioremap() */
+   vaddr = memremap(start, PAGE_SIZE, MEMREMAP_WB);
 
-   vaddr = ioremap_cache(start, PAGE_SIZE);
-   /* Only add the offset on success and return NULL if the ioremap() 
failed: */
+   /* Only add the offset on success and return NULL if memremap() failed 
*/
if (vaddr)
vaddr += offset;
 
@@ -410,10 +408,7 @@ void *xlate_dev_mem_ptr(phys_addr_t phys)
 
 void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
 {
-   if (page_is_ram(phys >> PAGE_SHIFT))
-   return;
-
-   iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
+   memunmap((void *)((unsigned long)addr & PAGE_MASK));
 }
 
 /*
@@ -622,6 +617,11 @@ pgprot_t __init 
early_memremap_pgprot_adjust(resource_size_t phys_addr,
return prot;
 }
 
+bool phys_mem_access_encrypted(unsigned long phys_addr, unsigned long size)
+{
+   return arch_memremap_can_ram_remap(phys_addr, size, 0);
+}
+
 #ifdef CONFIG_ARCH_USE_MEMREMAP_PROT
 /* Remap memory with encryption */
 void __init *early_memremap_encrypted(resource_size_t phys_addr,
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 6753d9c..b970c95 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -748,6 +748,9 @@ void arch_io_free_memtype_wc(resource_size_t start, 
resource_size_t size)
 pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
unsigned long size, pgprot_t vma_prot)
 {
+   if (!phys_mem_access_encrypted(pfn << PAGE_SHIFT, size))
+   vma_prot = pgprot_decrypted(vma_prot);
+
return vma_prot;
 }
 



[PATCH v8 RESEND 27/38] iommu/amd: Allow the AMD IOMMU to work with memory encryption

2017-06-27 Thread Tom Lendacky
The IOMMU is programmed with physical addresses for the various tables
and buffers that are used to communicate between the device and the
driver. When the driver allocates this memory, it is encrypted. In order
for the IOMMU to access the memory as encrypted, the encryption mask needs
to be included in these physical addresses during configuration.

The PTE entries created by the IOMMU should also include the encryption
mask so that when the device behind the IOMMU performs a DMA, the DMA
will be performed to encrypted memory.
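
The amd_iommu_proto.h hunk is truncated below, but the helpers it adds
presumably wrap the SME mask around the usual conversions, along these
lines (a sketch, not the verbatim patch):

static inline u64 iommu_virt_to_phys(void *vaddr)
{
	return (u64)__sme_set(virt_to_phys(vaddr));
}

static inline void *iommu_phys_to_virt(unsigned long paddr)
{
	return phys_to_virt(__sme_clr(paddr));
}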

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 drivers/iommu/amd_iommu.c   |   30 --
 drivers/iommu/amd_iommu_init.c  |   34 --
 drivers/iommu/amd_iommu_proto.h |   10 ++
 drivers/iommu/amd_iommu_types.h |2 +-
 4 files changed, 55 insertions(+), 21 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 503849d..16cc54b 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -544,7 +544,7 @@ static void dump_dte_entry(u16 devid)
 
 static void dump_command(unsigned long phys_addr)
 {
-   struct iommu_cmd *cmd = phys_to_virt(phys_addr);
+   struct iommu_cmd *cmd = iommu_phys_to_virt(phys_addr);
int i;
 
for (i = 0; i < 4; ++i)
@@ -865,11 +865,13 @@ static void copy_cmd_to_buffer(struct amd_iommu *iommu,
 
 static void build_completion_wait(struct iommu_cmd *cmd, u64 address)
 {
+   u64 paddr = iommu_virt_to_phys((void *)address);
+
WARN_ON(address & 0x7ULL);
 
memset(cmd, 0, sizeof(*cmd));
-   cmd->data[0] = lower_32_bits(__pa(address)) | CMD_COMPL_WAIT_STORE_MASK;
-   cmd->data[1] = upper_32_bits(__pa(address));
+   cmd->data[0] = lower_32_bits(paddr) | CMD_COMPL_WAIT_STORE_MASK;
+   cmd->data[1] = upper_32_bits(paddr);
cmd->data[2] = 1;
CMD_SET_TYPE(cmd, CMD_COMPL_WAIT);
 }
@@ -1328,7 +1330,7 @@ static bool increase_address_space(struct 
protection_domain *domain,
return false;
 
*pte = PM_LEVEL_PDE(domain->mode,
-   virt_to_phys(domain->pt_root));
+   iommu_virt_to_phys(domain->pt_root));
domain->pt_root  = pte;
domain->mode+= 1;
domain->updated  = true;
@@ -1365,7 +1367,7 @@ static u64 *alloc_pte(struct protection_domain *domain,
if (!page)
return NULL;
 
-   __npte = PM_LEVEL_PDE(level, virt_to_phys(page));
+   __npte = PM_LEVEL_PDE(level, iommu_virt_to_phys(page));
 
/* pte could have been changed somewhere. */
if (cmpxchg64(pte, __pte, __npte) != __pte) {
@@ -1481,10 +1483,10 @@ static int iommu_map_page(struct protection_domain *dom,
return -EBUSY;
 
if (count > 1) {
-   __pte = PAGE_SIZE_PTE(phys_addr, page_size);
+   __pte = PAGE_SIZE_PTE(__sme_set(phys_addr), page_size);
__pte |= PM_LEVEL_ENC(7) | IOMMU_PTE_P | IOMMU_PTE_FC;
} else
-   __pte = phys_addr | IOMMU_PTE_P | IOMMU_PTE_FC;
+   __pte = __sme_set(phys_addr) | IOMMU_PTE_P | IOMMU_PTE_FC;
 
if (prot & IOMMU_PROT_IR)
__pte |= IOMMU_PTE_IR;
@@ -1700,7 +1702,7 @@ static void free_gcr3_tbl_level1(u64 *tbl)
if (!(tbl[i] & GCR3_VALID))
continue;
 
-   ptr = __va(tbl[i] & PAGE_MASK);
+   ptr = iommu_phys_to_virt(tbl[i] & PAGE_MASK);
 
free_page((unsigned long)ptr);
}
@@ -1715,7 +1717,7 @@ static void free_gcr3_tbl_level2(u64 *tbl)
if (!(tbl[i] & GCR3_VALID))
continue;
 
-   ptr = __va(tbl[i] & PAGE_MASK);
+   ptr = iommu_phys_to_virt(tbl[i] & PAGE_MASK);
 
free_gcr3_tbl_level1(ptr);
}
@@ -1807,7 +1809,7 @@ static void set_dte_entry(u16 devid, struct 
protection_domain *domain, bool ats)
u64 flags = 0;
 
if (domain->mode != PAGE_MODE_NONE)
-   pte_root = virt_to_phys(domain->pt_root);
+   pte_root = iommu_virt_to_phys(domain->pt_root);
 
pte_root |= (domain->mode & DEV_ENTRY_MODE_MASK)
<< DEV_ENTRY_MODE_SHIFT;
@@ -1819,7 +1821,7 @@ static void set_dte_entry(u16 devid, struct 
protection_domain *domain, bool ats)
flags |= DTE_FLAG_IOTLB;
 
if (domain->flags & PD_IOMMUV2_MASK) {
-   u64 gcr3 = __pa(domain->gcr3_tbl);
+   u64 gcr3 = iommu_virt_to_phys(domain->gcr3_tbl);
u64 glx  = domain->glx;
u64 tmp;
 
@@ -3470,10 +3472,10 @@ static u64 *__get_gcr3_pte(u64 *root, int level, int 
pasid, bool alloc)
if (root == NULL)
  

[PATCH v8 RESEND 29/38] x86, drm, fbdev: Do not specify encrypted memory for video mappings

2017-06-27 Thread Tom Lendacky
Since video memory needs to be accessed decrypted, be sure that the
memory encryption mask is not set for the video ranges.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/vga.h   |   14 +-
 arch/x86/mm/pageattr.c   |2 ++
 drivers/gpu/drm/drm_gem.c|2 ++
 drivers/gpu/drm/drm_vm.c |4 
 drivers/gpu/drm/ttm/ttm_bo_vm.c  |7 +--
 drivers/gpu/drm/udl/udl_fb.c |4 
 drivers/video/fbdev/core/fbmem.c |   12 
 7 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/vga.h b/arch/x86/include/asm/vga.h
index c4b9dc2..9f42bee 100644
--- a/arch/x86/include/asm/vga.h
+++ b/arch/x86/include/asm/vga.h
@@ -7,12 +7,24 @@
 #ifndef _ASM_X86_VGA_H
 #define _ASM_X86_VGA_H
 
+#include 
+
 /*
  * On the PC, we can just recalculate addresses and then
  * access the videoram directly without any black magic.
+ * To support memory encryption however, we need to access
+ * the videoram as decrypted memory.
  */
 
-#define VGA_MAP_MEM(x, s) (unsigned long)phys_to_virt(x)
+#define VGA_MAP_MEM(x, s)  \
+({ \
+   unsigned long start = (unsigned long)phys_to_virt(x);   \
+   \
+   if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) \
+   set_memory_decrypted(start, (s) >> PAGE_SHIFT); \
+   \
+   start;  \
+})
 
 #define vga_readb(x) (*(x))
 #define vga_writeb(x, y) (*(y) = (x))
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index d9e09fb..13fc5db 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1825,11 +1825,13 @@ int set_memory_encrypted(unsigned long addr, int 
numpages)
 {
return __set_memory_enc_dec(addr, numpages, true);
 }
+EXPORT_SYMBOL_GPL(set_memory_encrypted);
 
 int set_memory_decrypted(unsigned long addr, int numpages)
 {
return __set_memory_enc_dec(addr, numpages, false);
 }
+EXPORT_SYMBOL_GPL(set_memory_decrypted);
 
 int set_pages_uc(struct page *page, int numpages)
 {
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index b1e28c9..019f48c 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -928,6 +929,7 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned 
long obj_size,
vma->vm_ops = dev->driver->gem_vm_ops;
vma->vm_private_data = obj;
vma->vm_page_prot = 
pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
+   vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
 
/* Take a ref for this mapping of the object, so that the fault
 * handler can dereference the mmap offset's pointer to the object.
diff --git a/drivers/gpu/drm/drm_vm.c b/drivers/gpu/drm/drm_vm.c
index 1170b32..ed4bcbf 100644
--- a/drivers/gpu/drm/drm_vm.c
+++ b/drivers/gpu/drm/drm_vm.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #endif
+#include 
 #include 
 #include "drm_internal.h"
 #include "drm_legacy.h"
@@ -58,6 +59,9 @@ static pgprot_t drm_io_prot(struct drm_local_map *map,
 {
pgprot_t tmp = vm_get_page_prot(vma->vm_flags);
 
+   /* We don't want graphics memory to be mapped encrypted */
+   tmp = pgprot_decrypted(tmp);
+
 #if defined(__i386__) || defined(__x86_64__) || defined(__powerpc__)
if (map->type == _DRM_REGISTERS && !(map->flags & _DRM_WRITE_COMBINING))
tmp = pgprot_noncached(tmp);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 9f53df9..622dab6 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define TTM_BO_VM_NUM_PREFAULT 16
 
@@ -230,9 +231,11 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf)
 * first page.
 */
for (i = 0; i < TTM_BO_VM_NUM_PREFAULT; ++i) {
-   if (bo->mem.bus.is_iomem)
+   if (bo->mem.bus.is_iomem) {
+   /* Iomem should not be marked encrypted */
+   cvma.vm_page_prot = pgprot_decrypted(cvma.vm_page_prot);
pfn = bdev->driver->io_mem_pfn(bo, page_offset);
-   else {
+   } else {
page = ttm->pages[page_offset];
if (unlikely(!page && i == 0)) {
retval = VM_FAULT_OOM;
diff --git a/drivers/gpu/drm/udl/udl_fb.c b/drivers/gpu/drm/udl/udl_fb.c
index 4a65003..92e1690 100644
--- a/drivers/gpu/drm/udl/udl_fb.c
+++ b/drivers/gpu/drm/udl/udl_fb.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 

[PATCH v8 RESEND 30/38] kvm: x86: svm: Support Secure Memory Encryption within KVM

2017-06-27 Thread Tom Lendacky
Update the KVM support to work with SME. The VMCB has a number of fields
where physical addresses are used and these addresses must contain the
memory encryption mask in order to properly access the encrypted memory.
Also, use the memory encryption mask when creating and using the nested
page tables.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/kvm_host.h |2 +-
 arch/x86/kvm/mmu.c  |   12 
 arch/x86/kvm/mmu.h  |2 +-
 arch/x86/kvm/svm.c  |   35 ++-
 arch/x86/kvm/vmx.c  |3 ++-
 arch/x86/kvm/x86.c  |3 ++-
 6 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 695605e..6d1267f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1069,7 +1069,7 @@ struct kvm_arch_async_pf {
 void kvm_mmu_uninit_vm(struct kvm *kvm);
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask,
-   u64 acc_track_mask);
+   u64 acc_track_mask, u64 me_mask);
 
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index cb82259..e85888c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -107,7 +107,7 @@ enum {
(((address) >> PT32_LEVEL_SHIFT(level)) & ((1 << PT32_LEVEL_BITS) - 1))
 
 
-#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
+#define PT64_BASE_ADDR_MASK __sme_clr((((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)))
 #define PT64_DIR_BASE_ADDR_MASK \
(PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + PT64_LEVEL_BITS)) - 1))
 #define PT64_LVL_ADDR_MASK(level) \
@@ -125,7 +125,7 @@ enum {
* PT32_LEVEL_BITS))) - 1))
 
 #define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | shadow_user_mask \
-   | shadow_x_mask | shadow_nx_mask)
+   | shadow_x_mask | shadow_nx_mask | shadow_me_mask)
 
 #define ACC_EXEC_MASK1
 #define ACC_WRITE_MASK   PT_WRITABLE_MASK
@@ -184,6 +184,7 @@ struct kvm_shadow_walk_iterator {
 static u64 __read_mostly shadow_dirty_mask;
 static u64 __read_mostly shadow_mmio_mask;
 static u64 __read_mostly shadow_present_mask;
+static u64 __read_mostly shadow_me_mask;
 
 /*
  * The mask/value to distinguish a PTE that has been marked not-present for
@@ -317,7 +318,7 @@ static bool check_mmio_spte(struct kvm_vcpu *vcpu, u64 spte)
 
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask,
-   u64 acc_track_mask)
+   u64 acc_track_mask, u64 me_mask)
 {
if (acc_track_mask != 0)
acc_track_mask |= SPTE_SPECIAL_MASK;
@@ -330,6 +331,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
shadow_present_mask = p_mask;
shadow_acc_track_mask = acc_track_mask;
WARN_ON(shadow_accessed_mask != 0 && shadow_acc_track_mask != 0);
+   shadow_me_mask = me_mask;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
 
@@ -2398,7 +2400,8 @@ static void link_shadow_page(struct kvm_vcpu *vcpu, u64 
*sptep,
BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK);
 
spte = __pa(sp->spt) | shadow_present_mask | PT_WRITABLE_MASK |
-  shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
+  shadow_user_mask | shadow_x_mask | shadow_accessed_mask |
+  shadow_me_mask;
 
mmu_spte_set(sptep, spte);
 
@@ -2700,6 +2703,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
pte_access &= ~ACC_WRITE_MASK;
 
spte |= (u64)pfn << PAGE_SHIFT;
+   spte |= shadow_me_mask;
 
if (pte_access & ACC_WRITE_MASK) {
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 330bf3a..08b779d 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -48,7 +48,7 @@
 
 static inline u64 rsvd_bits(int s, int e)
 {
-   return ((1ULL << (e - s + 1)) - 1) << s;
+   return __sme_clr(((1ULL << (e - s + 1)) - 1) << s);
 }
 
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ba9891a..d2e9fca 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1138,9 +1138,9 @@ static void avic_init_vmcb(struct vcpu_svm *svm)
 {
struct vmcb *vmcb = svm->vmcb;
	struct kvm_arch *vm_data = &svm->vcpu.kvm->arch;
-   phys_addr_t bpa = page_to_phys(svm->avic_backing_page);
-   phys_addr_t lpa = page_to_phys(vm_data->avic_logical_id_table_page);
-   phys_addr_t ppa = page_to_phys(vm_data->avic_physical_id_table_page);
+   phys_addr_t bpa = __sme_set(page_to_phys(svm->avic_backing_page));
+   phys_addr_t lpa = 

[PATCH v8 RESEND 28/38] x86, realmode: Check for memory encryption on the APs

2017-06-27 Thread Tom Lendacky
Add support to check if memory encryption is active in the kernel and that
it has been enabled on the AP. If memory encryption is active in the kernel
but has not been enabled on the AP, then set the memory encryption bit (bit
23) of MSR_K8_SYSCFG to enable memory encryption on that AP and allow the
AP to continue starting up.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/realmode.h  |   12 
 arch/x86/realmode/init.c |4 
 arch/x86/realmode/rm/trampoline_64.S |   24 
 3 files changed, 40 insertions(+)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 230e190..90d9152 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -1,6 +1,15 @@
 #ifndef _ARCH_X86_REALMODE_H
 #define _ARCH_X86_REALMODE_H
 
+/*
+ * Flag bit definitions for use with the flags field of the trampoline header
+ * in the CONFIG_X86_64 variant.
+ */
+#define TH_FLAGS_SME_ACTIVE_BIT0
+#define TH_FLAGS_SME_ACTIVEBIT(TH_FLAGS_SME_ACTIVE_BIT)
+
+#ifndef __ASSEMBLY__
+
 #include 
 #include 
 
@@ -38,6 +47,7 @@ struct trampoline_header {
u64 start;
u64 efer;
u32 cr4;
+   u32 flags;
 #endif
 };
 
@@ -69,4 +79,6 @@ static inline size_t real_mode_size_needed(void)
 void set_real_mode_mem(phys_addr_t mem, size_t size);
 void reserve_real_mode(void);
 
+#endif /* __ASSEMBLY__ */
+
 #endif /* _ARCH_X86_REALMODE_H */
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index d6ddc7e..1f71980 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -108,6 +108,10 @@ static void __init setup_real_mode(void)
	trampoline_cr4_features = &trampoline_header->cr4;
*trampoline_cr4_features = mmu_cr4_features;
 
+   trampoline_header->flags = 0;
+   if (sme_active())
+   trampoline_header->flags |= TH_FLAGS_SME_ACTIVE;
+
trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
trampoline_pgd[0] = trampoline_pgd_entry.pgd;
trampoline_pgd[511] = init_top_pgt[511].pgd;
diff --git a/arch/x86/realmode/rm/trampoline_64.S 
b/arch/x86/realmode/rm/trampoline_64.S
index dac7b20..614fd70 100644
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "realmode.h"
 
.text
@@ -92,6 +93,28 @@ ENTRY(startup_32)
movl%edx, %fs
movl%edx, %gs
 
+   /*
+* Check for memory encryption support. This is a safety net in
+* case BIOS hasn't done the necessary step of setting the bit in
+* the MSR for this AP. If SME is active and we've gotten this far
+* then it is safe for us to set the MSR bit and continue. If we
+* don't we'll eventually crash trying to execute encrypted
+* instructions.
+*/
+   bt  $TH_FLAGS_SME_ACTIVE_BIT, pa_tr_flags
+   jnc .Ldone
+   movl$MSR_K8_SYSCFG, %ecx
+   rdmsr
+   bts $MSR_K8_SYSCFG_MEM_ENCRYPT_BIT, %eax
+   jc  .Ldone
+
+   /*
+* Memory encryption is enabled but the SME enable bit for this
+* CPU has not been set.  It is safe to set it, so do so.
+*/
+   wrmsr
+.Ldone:
+
movlpa_tr_cr4, %eax
movl%eax, %cr4  # Enable PAE mode
 
@@ -147,6 +170,7 @@ GLOBAL(trampoline_header)
tr_start:   .space  8
GLOBAL(tr_efer) .space  8
GLOBAL(tr_cr4)  .space  4
+   GLOBAL(tr_flags).space  4
 END(trampoline_header)
 
 #include "trampoline_common.S"



[PATCH v8 23/38] x86/realmode: Decrypt trampoline area if memory encryption is active

2017-06-27 Thread Tom Lendacky
When Secure Memory Encryption is enabled, the trampoline area must not
be encrypted. A CPU running in real mode will not be able to decrypt
memory that has been encrypted because it will not be able to use addresses
with the memory encryption mask.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/realmode/init.c |8 
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index cd4be19..d6ddc7e 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -1,6 +1,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -59,6 +60,13 @@ static void __init setup_real_mode(void)
 
base = (unsigned char *)real_mode_header;
 
+   /*
+* If SME is active, the trampoline area will need to be in
+* decrypted memory in order to bring up other processors
+* successfully.
+*/
+   set_memory_decrypted((unsigned long)base, size >> PAGE_SHIFT);
+
memcpy(base, real_mode_blob, size);
 
phys_base = __pa(base);



[PATCH v8 21/38] x86/mm: Add support to access persistent memory in the clear

2017-06-27 Thread Tom Lendacky
Persistent memory is expected to persist across reboots. The encryption
key used by SME will change across reboots, which would result in corrupted
persistent memory.  Persistent memory is handed out by block devices
through memory remapping functions, so be sure not to map this memory as
encrypted.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/mm/ioremap.c |   31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index ee33838..effa529 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -420,17 +420,46 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
  * Examine the physical address to determine if it is an area of memory
  * that should be mapped decrypted.  If the memory is not part of the
  * kernel usable area it was accessed and created decrypted, so these
- * areas should be mapped decrypted.
+ * areas should be mapped decrypted. And since the encryption key can
+ * change across reboots, persistent memory should also be mapped
+ * decrypted.
  */
 static bool memremap_should_map_decrypted(resource_size_t phys_addr,
  unsigned long size)
 {
+   int is_pmem;
+
+   /*
+* Check if the address is part of a persistent memory region.
+* This check covers areas added by E820, EFI and ACPI.
+*/
+   is_pmem = region_intersects(phys_addr, size, IORESOURCE_MEM,
+   IORES_DESC_PERSISTENT_MEMORY);
+   if (is_pmem != REGION_DISJOINT)
+   return true;
+
+   /*
+* Check if the non-volatile attribute is set for an EFI
+* reserved area.
+*/
+   if (efi_enabled(EFI_BOOT)) {
+   switch (efi_mem_type(phys_addr)) {
+   case EFI_RESERVED_TYPE:
+   if (efi_mem_attributes(phys_addr) & EFI_MEMORY_NV)
+   return true;
+   break;
+   default:
+   break;
+   }
+   }
+
/* Check if the address is outside kernel usable area */
switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
case E820_TYPE_RESERVED:
case E820_TYPE_ACPI:
case E820_TYPE_NVS:
case E820_TYPE_UNUSABLE:
+   case E820_TYPE_PRAM:
return true;
default:
break;



[PATCH v8 25/38] swiotlb: Add warnings for use of bounce buffers with SME

2017-06-27 Thread Tom Lendacky
Add warnings to let the user know when bounce buffers are being used for
DMA while SME is active.  Since the bounce buffers are not in encrypted
memory, these notifications allow the user to determine an appropriate
action, if necessary.  Actions can range from utilizing an IOMMU, to
replacing the device with one that supports 64-bit DMA, to ignoring the
message if the device isn't used much.

Signed-off-by: Tom Lendacky 
---
 include/linux/dma-mapping.h |   13 +
 lib/swiotlb.c   |3 +++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 4f3eece..a156c40 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /**
  * List of possible attributes associated with a DMA mapping. The semantics
@@ -554,6 +555,12 @@ static inline int dma_mapping_error(struct device *dev, 
dma_addr_t dma_addr)
 #endif
 }
 
+static inline void dma_check_mask(struct device *dev, u64 mask)
+{
+   if (sme_active() && (mask < (((u64)sme_get_me_mask() << 1) - 1)))
+   dev_warn(dev, "SME is active, device will require DMA bounce 
buffers\n");
+}
+
 #ifndef HAVE_ARCH_DMA_SUPPORTED
 static inline int dma_supported(struct device *dev, u64 mask)
 {
@@ -577,6 +584,9 @@ static inline int dma_set_mask(struct device *dev, u64 mask)
 
if (!dev->dma_mask || !dma_supported(dev, mask))
return -EIO;
+
+   dma_check_mask(dev, mask);
+
*dev->dma_mask = mask;
return 0;
 }
@@ -596,6 +606,9 @@ static inline int dma_set_coherent_mask(struct device *dev, 
u64 mask)
 {
if (!dma_supported(dev, mask))
return -EIO;
+
+   dma_check_mask(dev, mask);
+
dev->coherent_dma_mask = mask;
return 0;
 }
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 04ac91a..8c6c83e 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -507,6 +507,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
if (no_iotlb_memory)
panic("Can not allocate SWIOTLB buffer earlier and can't now 
provide you with the DMA bounce buffer");
 
+   if (sme_active())
+   pr_warn_once("SME is active and system is using DMA bounce 
buffers\n");
+
mask = dma_get_seg_boundary(hwdev);
 
tbl_dma_addr &= mask;



[PATCH v8 26/38] x86/CPU/AMD: Make the microcode level available earlier in the boot

2017-06-27 Thread Tom Lendacky
Move the setting of the cpuinfo_x86.microcode field from init_amd() to
early_init_amd() so that it is available earlier in the boot process. This
avoids having to read MSR_AMD64_PATCH_LEVEL directly during early boot.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/kernel/cpu/amd.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 5bdcbd4..fdcf305 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -547,8 +547,12 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
 
 static void early_init_amd(struct cpuinfo_x86 *c)
 {
+   u32 dummy;
+
early_init_amd_mc(c);
 
+   rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);
+
/*
 * c->x86_power is 8000_0007 edx. Bit 8 is TSC runs at constant rate
 * with P/T states and does not stop in deep C-states
@@ -746,8 +750,6 @@ static void init_amd_bd(struct cpuinfo_x86 *c)
 
 static void init_amd(struct cpuinfo_x86 *c)
 {
-   u32 dummy;
-
early_init_amd(c);
 
/*
@@ -809,8 +811,6 @@ static void init_amd(struct cpuinfo_x86 *c)
if (c->x86 > 0x11)
set_cpu_cap(c, X86_FEATURE_ARAT);
 
-   rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);
-
/* 3DNow or LM implies PREFETCHW */
if (!cpu_has(c, X86_FEATURE_3DNOWPREFETCH))
if (cpu_has(c, X86_FEATURE_3DNOW) || cpu_has(c, X86_FEATURE_LM))



[PATCH v8 RESEND 24/38] x86, swiotlb: Add memory encryption support

2017-06-27 Thread Tom Lendacky
Since DMA addresses will effectively look like 48-bit addresses when the
memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
device performing the DMA does not support 48 bits. SWIOTLB will be
initialized to create decrypted bounce buffers for use by these devices.

Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/dma-mapping.h |5 ++-
 arch/x86/include/asm/mem_encrypt.h |5 +++
 arch/x86/kernel/pci-dma.c  |   11 +--
 arch/x86/kernel/pci-nommu.c|2 +
 arch/x86/kernel/pci-swiotlb.c  |   15 +-
 arch/x86/mm/mem_encrypt.c  |   22 +++
 include/linux/swiotlb.h|1 +
 init/main.c|   10 +++
 lib/swiotlb.c  |   54 +++-
 9 files changed, 108 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/dma-mapping.h 
b/arch/x86/include/asm/dma-mapping.h
index 08a0838..191f9a5 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include <linux/mem_encrypt.h>
 
 #ifdef CONFIG_ISA
 # define ISA_DMA_BIT_MASK DMA_BIT_MASK(24)
@@ -62,12 +63,12 @@ static inline bool dma_capable(struct device *dev, 
dma_addr_t addr, size_t size)
 
 static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
-   return paddr;
+   return __sme_set(paddr);
 }
 
 static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
 {
-   return daddr;
+   return __sme_clr(daddr);
 }
 #endif /* CONFIG_X86_DMA_REMAP */
 
diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index ab1fe77..70e55f6 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -34,6 +34,11 @@ void __init sme_early_decrypt(resource_size_t paddr,
 void __init sme_encrypt_kernel(void);
 void __init sme_enable(void);
 
+/* Architecture __weak replacement functions */
+void __init mem_encrypt_init(void);
+
+void swiotlb_set_mem_attributes(void *vaddr, unsigned long size);
+
 #else  /* !CONFIG_AMD_MEM_ENCRYPT */
 
#define sme_me_mask    0UL
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 3a216ec..72d96d4 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -93,9 +93,12 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t 
size,
if (gfpflags_allow_blocking(flag)) {
page = dma_alloc_from_contiguous(dev, count, get_order(size),
 flag);
-   if (page && page_to_phys(page) + size > dma_mask) {
-   dma_release_from_contiguous(dev, page, count);
-   page = NULL;
+   if (page) {
+   addr = phys_to_dma(dev, page_to_phys(page));
+   if (addr + size > dma_mask) {
+   dma_release_from_contiguous(dev, page, count);
+   page = NULL;
+   }
}
}
/* fallback */
@@ -104,7 +107,7 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t 
size,
if (!page)
return NULL;
 
-   addr = page_to_phys(page);
+   addr = phys_to_dma(dev, page_to_phys(page));
if (addr + size > dma_mask) {
__free_pages(page, get_order(size));
 
diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
index a88952e..98b576a 100644
--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -30,7 +30,7 @@ static dma_addr_t nommu_map_page(struct device *dev, struct 
page *page,
 enum dma_data_direction dir,
 unsigned long attrs)
 {
-   dma_addr_t bus = page_to_phys(page) + offset;
+   dma_addr_t bus = phys_to_dma(dev, page_to_phys(page)) + offset;
WARN_ON(size == 0);
if (!check_addr("map_single", dev, bus, size))
return DMA_ERROR_CODE;
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index 1e23577..6770775 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -6,12 +6,14 @@
 #include 
 #include 
 #include 
+#include <linux/mem_encrypt.h>
 
 #include 
 #include 
 #include 
 #include 
 #include 
+
 int swiotlb __read_mostly;
 
 void *x86_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
@@ -79,8 +81,8 @@ int __init pci_swiotlb_detect_override(void)
  pci_swiotlb_late_init);
 
 /*
- * if 4GB or more detected (and iommu=off not set) return 1
- * and set swiotlb to 1.
+ * If 4GB or more detected (and iommu=off not set) or if SME is active
+ * then set swiotlb to 1 and return 1.
  */
 int __init pci_swiotlb_detect_4gb(void)
 {
@@ -89,6 +91,15 @@ int __init pci_swiotlb_detect_4gb(void)
if (!no_iommu && max_possible_pfn > MAX_DMA32_PFN)
swiotlb = 1;
 #endif
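
[Editor's note: a standalone sketch of why a DMA mask below 48 bits forces
bouncing once the encryption mask is set. The C-bit position (47) is an
assumption for illustration; the real position is read from CPUID during
SME setup elsewhere in this series.]

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t sme_mask = 1ULL << 47;       /* assumed C-bit position */
	uint64_t paddr = 0x12345000;          /* physical address well under 4GB */
	uint64_t daddr = paddr | sme_mask;    /* what phys_to_dma() now yields */

	/* A 32-bit-capable device cannot reach daddr, so SWIOTLB must
	 * bounce the transfer through a decrypted buffer. */
	printf("daddr=%#llx exceeds 32-bit mask: %s\n",
	       (unsigned long long)daddr,
	       daddr > 0xffffffffULL ? "yes" : "no");
	return 0;
}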

[PATCH v8 29/38] x86, drm, fbdev: Do not specify encrypted memory for video mappings

2017-06-27 Thread Tom Lendacky
Since video memory needs to be accessed decrypted, be sure that the
memory encryption mask is not set for the video ranges.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/vga.h   |   14 +-
 arch/x86/mm/pageattr.c   |2 ++
 drivers/gpu/drm/drm_gem.c|2 ++
 drivers/gpu/drm/drm_vm.c |4 
 drivers/gpu/drm/ttm/ttm_bo_vm.c  |7 +--
 drivers/gpu/drm/udl/udl_fb.c |4 
 drivers/video/fbdev/core/fbmem.c |   12 
 7 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/vga.h b/arch/x86/include/asm/vga.h
index c4b9dc2..9f42bee 100644
--- a/arch/x86/include/asm/vga.h
+++ b/arch/x86/include/asm/vga.h
@@ -7,12 +7,24 @@
 #ifndef _ASM_X86_VGA_H
 #define _ASM_X86_VGA_H
 
+#include <asm/set_memory.h>
+
 /*
  * On the PC, we can just recalculate addresses and then
  * access the videoram directly without any black magic.
+ * To support memory encryption however, we need to access
+ * the videoram as decrypted memory.
  */
 
-#define VGA_MAP_MEM(x, s) (unsigned long)phys_to_virt(x)
+#define VGA_MAP_MEM(x, s)  \
+({ \
+   unsigned long start = (unsigned long)phys_to_virt(x);   \
+   \
+   if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) \
+   set_memory_decrypted(start, (s) >> PAGE_SHIFT); \
+   \
+   start;  \
+})
 
 #define vga_readb(x) (*(x))
 #define vga_writeb(x, y) (*(y) = (x))
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index d9e09fb..13fc5db 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1825,11 +1825,13 @@ int set_memory_encrypted(unsigned long addr, int 
numpages)
 {
return __set_memory_enc_dec(addr, numpages, true);
 }
+EXPORT_SYMBOL_GPL(set_memory_encrypted);
 
 int set_memory_decrypted(unsigned long addr, int numpages)
 {
return __set_memory_enc_dec(addr, numpages, false);
 }
+EXPORT_SYMBOL_GPL(set_memory_decrypted);
 
 int set_pages_uc(struct page *page, int numpages)
 {
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index b1e28c9..019f48c 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include <linux/mem_encrypt.h>
 #include 
 #include 
 #include 
@@ -928,6 +929,7 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned 
long obj_size,
vma->vm_ops = dev->driver->gem_vm_ops;
vma->vm_private_data = obj;
vma->vm_page_prot = 
pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
+   vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
 
/* Take a ref for this mapping of the object, so that the fault
 * handler can dereference the mmap offset's pointer to the object.
diff --git a/drivers/gpu/drm/drm_vm.c b/drivers/gpu/drm/drm_vm.c
index 1170b32..ed4bcbf 100644
--- a/drivers/gpu/drm/drm_vm.c
+++ b/drivers/gpu/drm/drm_vm.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #endif
+#include <linux/mem_encrypt.h>
 #include 
 #include "drm_internal.h"
 #include "drm_legacy.h"
@@ -58,6 +59,9 @@ static pgprot_t drm_io_prot(struct drm_local_map *map,
 {
pgprot_t tmp = vm_get_page_prot(vma->vm_flags);
 
+   /* We don't want graphics memory to be mapped encrypted */
+   tmp = pgprot_decrypted(tmp);
+
 #if defined(__i386__) || defined(__x86_64__) || defined(__powerpc__)
if (map->type == _DRM_REGISTERS && !(map->flags & _DRM_WRITE_COMBINING))
tmp = pgprot_noncached(tmp);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 9f53df9..622dab6 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include <linux/mem_encrypt.h>
 
 #define TTM_BO_VM_NUM_PREFAULT 16
 
@@ -230,9 +231,11 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf)
 * first page.
 */
for (i = 0; i < TTM_BO_VM_NUM_PREFAULT; ++i) {
-   if (bo->mem.bus.is_iomem)
+   if (bo->mem.bus.is_iomem) {
+   /* Iomem should not be marked encrypted */
+   cvma.vm_page_prot = pgprot_decrypted(cvma.vm_page_prot);
pfn = bdev->driver->io_mem_pfn(bo, page_offset);
-   else {
+   } else {
page = ttm->pages[page_offset];
if (unlikely(!page && i == 0)) {
retval = VM_FAULT_OOM;
diff --git a/drivers/gpu/drm/udl/udl_fb.c b/drivers/gpu/drm/udl/udl_fb.c
index 4a65003..92e1690 100644
--- a/drivers/gpu/drm/udl/udl_fb.c
+++ b/drivers/gpu/drm/udl/udl_fb.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include <linux/mem_encrypt.h>
 
 #include 
 

[PATCH v8 28/38] x86, realmode: Check for memory encryption on the APs

2017-06-27 Thread Tom Lendacky
Add support to check if memory encryption is active in the kernel and that
it has been enabled on the AP. If memory encryption is active in the kernel
but has not been enabled on the AP, then set the memory encryption bit (bit
23) of MSR_K8_SYSCFG to enable memory encryption on that AP and allow the
AP to continue start up.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/realmode.h  |   12 
 arch/x86/realmode/init.c |4 
 arch/x86/realmode/rm/trampoline_64.S |   24 
 3 files changed, 40 insertions(+)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 230e190..90d9152 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -1,6 +1,15 @@
 #ifndef _ARCH_X86_REALMODE_H
 #define _ARCH_X86_REALMODE_H
 
+/*
+ * Flag bit definitions for use with the flags field of the trampoline header
+ * in the CONFIG_X86_64 variant.
+ */
+#define TH_FLAGS_SME_ACTIVE_BIT    0
+#define TH_FLAGS_SME_ACTIVE        BIT(TH_FLAGS_SME_ACTIVE_BIT)
+
+#ifndef __ASSEMBLY__
+
 #include 
 #include 
 
@@ -38,6 +47,7 @@ struct trampoline_header {
u64 start;
u64 efer;
u32 cr4;
+   u32 flags;
 #endif
 };
 
@@ -69,4 +79,6 @@ static inline size_t real_mode_size_needed(void)
 void set_real_mode_mem(phys_addr_t mem, size_t size);
 void reserve_real_mode(void);
 
+#endif /* __ASSEMBLY__ */
+
 #endif /* _ARCH_X86_REALMODE_H */
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index d6ddc7e..1f71980 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -108,6 +108,10 @@ static void __init setup_real_mode(void)
trampoline_cr4_features = &trampoline_header->cr4;
*trampoline_cr4_features = mmu_cr4_features;
 
+   trampoline_header->flags = 0;
+   if (sme_active())
+   trampoline_header->flags |= TH_FLAGS_SME_ACTIVE;
+
trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
trampoline_pgd[0] = trampoline_pgd_entry.pgd;
trampoline_pgd[511] = init_top_pgt[511].pgd;
diff --git a/arch/x86/realmode/rm/trampoline_64.S 
b/arch/x86/realmode/rm/trampoline_64.S
index dac7b20..614fd70 100644
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include <asm/realmode.h>
 #include "realmode.h"
 
.text
@@ -92,6 +93,28 @@ ENTRY(startup_32)
movl    %edx, %fs
movl    %edx, %gs
 
+   /*
+* Check for memory encryption support. This is a safety net in
+* case BIOS hasn't done the necessary step of setting the bit in
+* the MSR for this AP. If SME is active and we've gotten this far
+* then it is safe for us to set the MSR bit and continue. If we
+* don't we'll eventually crash trying to execute encrypted
+* instructions.
+*/
+   bt  $TH_FLAGS_SME_ACTIVE_BIT, pa_tr_flags
+   jnc .Ldone
+   movl    $MSR_K8_SYSCFG, %ecx
+   rdmsr
+   bts $MSR_K8_SYSCFG_MEM_ENCRYPT_BIT, %eax
+   jc  .Ldone
+
+   /*
+* Memory encryption is enabled but the SME enable bit for this
+* CPU has not been set.  It is safe to set it, so do so.
+*/
+   wrmsr
+.Ldone:
+
movl    pa_tr_cr4, %eax
movl    %eax, %cr4  # Enable PAE mode
 
@@ -147,6 +170,7 @@ GLOBAL(trampoline_header)
tr_start:   .space  8
GLOBAL(tr_efer) .space  8
GLOBAL(tr_cr4)  .space  4
+   GLOBAL(tr_flags)    .space  4
 END(trampoline_header)
 
 #include "trampoline_common.S"
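
[Editor's note: the trampoline check above, restated as C for readability.
This is a hypothetical sketch, not code from the patch; rdmsrl()/wrmsrl(),
BIT_ULL() and the MSR_K8_SYSCFG definitions are the kernel's existing
helpers.]

static void ap_check_sme(u32 tr_flags)
{
	u64 syscfg;

	if (!(tr_flags & TH_FLAGS_SME_ACTIVE))
		return;                         /* kernel is not using SME */

	rdmsrl(MSR_K8_SYSCFG, syscfg);
	if (syscfg & BIT_ULL(MSR_K8_SYSCFG_MEM_ENCRYPT_BIT))
		return;                         /* BIOS already set the bit */

	/* Enable memory encryption on this AP before it executes
	 * encrypted kernel code. */
	syscfg |= BIT_ULL(MSR_K8_SYSCFG_MEM_ENCRYPT_BIT);
	wrmsrl(MSR_K8_SYSCFG, syscfg);
}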



[PATCH v8 27/38] iommu/amd: Allow the AMD IOMMU to work with memory encryption

2017-06-27 Thread Tom Lendacky
The IOMMU is programmed with physical addresses for the various tables
and buffers that are used to communicate between the device and the
driver. When the driver allocates this memory it is encrypted. In order
for the IOMMU to access the memory as encrypted the encryption mask needs
to be included in these physical addresses during configuration.

The PTE entries created by the IOMMU should also include the encryption
mask so that when the device behind the IOMMU performs a DMA, the DMA
will be performed to encrypted memory.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 drivers/iommu/amd_iommu.c   |   30 --
 drivers/iommu/amd_iommu_init.c  |   34 --
 drivers/iommu/amd_iommu_proto.h |   10 ++
 drivers/iommu/amd_iommu_types.h |2 +-
 4 files changed, 55 insertions(+), 21 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 503849d..16cc54b 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -544,7 +544,7 @@ static void dump_dte_entry(u16 devid)
 
 static void dump_command(unsigned long phys_addr)
 {
-   struct iommu_cmd *cmd = phys_to_virt(phys_addr);
+   struct iommu_cmd *cmd = iommu_phys_to_virt(phys_addr);
int i;
 
for (i = 0; i < 4; ++i)
@@ -865,11 +865,13 @@ static void copy_cmd_to_buffer(struct amd_iommu *iommu,
 
 static void build_completion_wait(struct iommu_cmd *cmd, u64 address)
 {
+   u64 paddr = iommu_virt_to_phys((void *)address);
+
WARN_ON(address & 0x7ULL);
 
memset(cmd, 0, sizeof(*cmd));
-   cmd->data[0] = lower_32_bits(__pa(address)) | CMD_COMPL_WAIT_STORE_MASK;
-   cmd->data[1] = upper_32_bits(__pa(address));
+   cmd->data[0] = lower_32_bits(paddr) | CMD_COMPL_WAIT_STORE_MASK;
+   cmd->data[1] = upper_32_bits(paddr);
cmd->data[2] = 1;
CMD_SET_TYPE(cmd, CMD_COMPL_WAIT);
 }
@@ -1328,7 +1330,7 @@ static bool increase_address_space(struct 
protection_domain *domain,
return false;
 
*pte = PM_LEVEL_PDE(domain->mode,
-   virt_to_phys(domain->pt_root));
+   iommu_virt_to_phys(domain->pt_root));
domain->pt_root  = pte;
domain->mode+= 1;
domain->updated  = true;
@@ -1365,7 +1367,7 @@ static u64 *alloc_pte(struct protection_domain *domain,
if (!page)
return NULL;
 
-   __npte = PM_LEVEL_PDE(level, virt_to_phys(page));
+   __npte = PM_LEVEL_PDE(level, iommu_virt_to_phys(page));
 
/* pte could have been changed somewhere. */
if (cmpxchg64(pte, __pte, __npte) != __pte) {
@@ -1481,10 +1483,10 @@ static int iommu_map_page(struct protection_domain *dom,
return -EBUSY;
 
if (count > 1) {
-   __pte = PAGE_SIZE_PTE(phys_addr, page_size);
+   __pte = PAGE_SIZE_PTE(__sme_set(phys_addr), page_size);
__pte |= PM_LEVEL_ENC(7) | IOMMU_PTE_P | IOMMU_PTE_FC;
} else
-   __pte = phys_addr | IOMMU_PTE_P | IOMMU_PTE_FC;
+   __pte = __sme_set(phys_addr) | IOMMU_PTE_P | IOMMU_PTE_FC;
 
if (prot & IOMMU_PROT_IR)
__pte |= IOMMU_PTE_IR;
@@ -1700,7 +1702,7 @@ static void free_gcr3_tbl_level1(u64 *tbl)
if (!(tbl[i] & GCR3_VALID))
continue;
 
-   ptr = __va(tbl[i] & PAGE_MASK);
+   ptr = iommu_phys_to_virt(tbl[i] & PAGE_MASK);
 
free_page((unsigned long)ptr);
}
@@ -1715,7 +1717,7 @@ static void free_gcr3_tbl_level2(u64 *tbl)
if (!(tbl[i] & GCR3_VALID))
continue;
 
-   ptr = __va(tbl[i] & PAGE_MASK);
+   ptr = iommu_phys_to_virt(tbl[i] & PAGE_MASK);
 
free_gcr3_tbl_level1(ptr);
}
@@ -1807,7 +1809,7 @@ static void set_dte_entry(u16 devid, struct 
protection_domain *domain, bool ats)
u64 flags = 0;
 
if (domain->mode != PAGE_MODE_NONE)
-   pte_root = virt_to_phys(domain->pt_root);
+   pte_root = iommu_virt_to_phys(domain->pt_root);
 
pte_root |= (domain->mode & DEV_ENTRY_MODE_MASK)
<< DEV_ENTRY_MODE_SHIFT;
@@ -1819,7 +1821,7 @@ static void set_dte_entry(u16 devid, struct 
protection_domain *domain, bool ats)
flags |= DTE_FLAG_IOTLB;
 
if (domain->flags & PD_IOMMUV2_MASK) {
-   u64 gcr3 = __pa(domain->gcr3_tbl);
+   u64 gcr3 = iommu_virt_to_phys(domain->gcr3_tbl);
u64 glx  = domain->glx;
u64 tmp;
 
@@ -3470,10 +3472,10 @@ static u64 *__get_gcr3_pte(u64 *root, int level, int 
pasid, bool alloc)
if (root == NULL)
  
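
[Editor's note: the hunk defining the new helpers is cut off above.
Presumably iommu_virt_to_phys() and iommu_phys_to_virt() are thin wrappers
over __sme_set()/__sme_clr(), roughly as sketched below; this is an
assumption, not quoted from the patch.]

static inline u64 iommu_virt_to_phys(void *vaddr)
{
	/* Hand the IOMMU a physical address carrying the encryption mask. */
	return (u64)__sme_set(virt_to_phys(vaddr));
}

static inline void *iommu_phys_to_virt(unsigned long paddr)
{
	/* Strip the mask before converting back to a kernel pointer. */
	return phys_to_virt(__sme_clr(paddr));
}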

[PATCH v8 30/38] kvm: x86: svm: Support Secure Memory Encryption within KVM

2017-06-27 Thread Tom Lendacky
Update the KVM support to work with SME. The VMCB has a number of fields
where physical addresses are used and these addresses must contain the
memory encryption mask in order to properly access the encrypted memory.
Also, use the memory encryption mask when creating and using the nested
page tables.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/kvm_host.h |2 +-
 arch/x86/kvm/mmu.c  |   12 
 arch/x86/kvm/mmu.h  |2 +-
 arch/x86/kvm/svm.c  |   35 ++-
 arch/x86/kvm/vmx.c  |3 ++-
 arch/x86/kvm/x86.c  |3 ++-
 6 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 695605e..6d1267f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1069,7 +1069,7 @@ struct kvm_arch_async_pf {
 void kvm_mmu_uninit_vm(struct kvm *kvm);
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask,
-   u64 acc_track_mask);
+   u64 acc_track_mask, u64 me_mask);
 
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index cb82259..e85888c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -107,7 +107,7 @@ enum {
(((address) >> PT32_LEVEL_SHIFT(level)) & ((1 << PT32_LEVEL_BITS) - 1))
 
 
-#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
+#define PT64_BASE_ADDR_MASK __sme_clr((((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)))
 #define PT64_DIR_BASE_ADDR_MASK \
(PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + PT64_LEVEL_BITS)) - 1))
 #define PT64_LVL_ADDR_MASK(level) \
@@ -125,7 +125,7 @@ enum {
* PT32_LEVEL_BITS))) - 1))
 
 #define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | shadow_user_mask \
-   | shadow_x_mask | shadow_nx_mask)
+   | shadow_x_mask | shadow_nx_mask | shadow_me_mask)
 
#define ACC_EXEC_MASK    1
 #define ACC_WRITE_MASK   PT_WRITABLE_MASK
@@ -184,6 +184,7 @@ struct kvm_shadow_walk_iterator {
 static u64 __read_mostly shadow_dirty_mask;
 static u64 __read_mostly shadow_mmio_mask;
 static u64 __read_mostly shadow_present_mask;
+static u64 __read_mostly shadow_me_mask;
 
 /*
  * The mask/value to distinguish a PTE that has been marked not-present for
@@ -317,7 +318,7 @@ static bool check_mmio_spte(struct kvm_vcpu *vcpu, u64 spte)
 
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask,
-   u64 acc_track_mask)
+   u64 acc_track_mask, u64 me_mask)
 {
if (acc_track_mask != 0)
acc_track_mask |= SPTE_SPECIAL_MASK;
@@ -330,6 +331,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
shadow_present_mask = p_mask;
shadow_acc_track_mask = acc_track_mask;
WARN_ON(shadow_accessed_mask != 0 && shadow_acc_track_mask != 0);
+   shadow_me_mask = me_mask;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
 
@@ -2398,7 +2400,8 @@ static void link_shadow_page(struct kvm_vcpu *vcpu, u64 
*sptep,
BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK);
 
spte = __pa(sp->spt) | shadow_present_mask | PT_WRITABLE_MASK |
-  shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
+  shadow_user_mask | shadow_x_mask | shadow_accessed_mask |
+  shadow_me_mask;
 
mmu_spte_set(sptep, spte);
 
@@ -2700,6 +2703,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
pte_access &= ~ACC_WRITE_MASK;
 
spte |= (u64)pfn << PAGE_SHIFT;
+   spte |= shadow_me_mask;
 
if (pte_access & ACC_WRITE_MASK) {
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 330bf3a..08b779d 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -48,7 +48,7 @@
 
 static inline u64 rsvd_bits(int s, int e)
 {
-   return ((1ULL << (e - s + 1)) - 1) << s;
+   return __sme_clr(((1ULL << (e - s + 1)) - 1) << s);
 }
 
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ba9891a..d2e9fca 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1138,9 +1138,9 @@ static void avic_init_vmcb(struct vcpu_svm *svm)
 {
struct vmcb *vmcb = svm->vmcb;
struct kvm_arch *vm_data = &svm->vcpu.kvm->arch;
-   phys_addr_t bpa = page_to_phys(svm->avic_backing_page);
-   phys_addr_t lpa = page_to_phys(vm_data->avic_logical_id_table_page);
-   phys_addr_t ppa = page_to_phys(vm_data->avic_physical_id_table_page);
+   phys_addr_t bpa = __sme_set(page_to_phys(svm->avic_backing_page));
+   phys_addr_t lpa = 

[PATCH v8 RESEND 25/38] swiotlb: Add warnings for use of bounce buffers with SME

2017-06-27 Thread Tom Lendacky
Add warnings to let the user know when bounce buffers are being used for
DMA when SME is active.  Since the bounce buffers are not in encrypted
memory, these notifications are to allow the user to determine some
appropriate action - if necessary.  Actions can range from utilizing an
IOMMU, replacing the device with another device that can support 64-bit
DMA, ignoring the message if the device isn't used much, etc.

Signed-off-by: Tom Lendacky 
---
 include/linux/dma-mapping.h |   13 +
 lib/swiotlb.c   |3 +++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 4f3eece..a156c40 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include <linux/mem_encrypt.h>
 
 /**
  * List of possible attributes associated with a DMA mapping. The semantics
@@ -554,6 +555,12 @@ static inline int dma_mapping_error(struct device *dev, 
dma_addr_t dma_addr)
 #endif
 }
 
+static inline void dma_check_mask(struct device *dev, u64 mask)
+{
+   if (sme_active() && (mask < (((u64)sme_get_me_mask() << 1) - 1)))
+   dev_warn(dev, "SME is active, device will require DMA bounce buffers\n");
+}
+
 #ifndef HAVE_ARCH_DMA_SUPPORTED
 static inline int dma_supported(struct device *dev, u64 mask)
 {
@@ -577,6 +584,9 @@ static inline int dma_set_mask(struct device *dev, u64 mask)
 
if (!dev->dma_mask || !dma_supported(dev, mask))
return -EIO;
+
+   dma_check_mask(dev, mask);
+
*dev->dma_mask = mask;
return 0;
 }
@@ -596,6 +606,9 @@ static inline int dma_set_coherent_mask(struct device *dev, 
u64 mask)
 {
if (!dma_supported(dev, mask))
return -EIO;
+
+   dma_check_mask(dev, mask);
+
dev->coherent_dma_mask = mask;
return 0;
 }
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 04ac91a..8c6c83e 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -507,6 +507,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
if (no_iotlb_memory)
panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
 
+   if (sme_active())
+   pr_warn_once("SME is active and system is using DMA bounce buffers\n");
+
mask = dma_get_seg_boundary(hwdev);
 
tbl_dma_addr &= mask;
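
[Editor's note: a hypothetical driver probe, not part of the patch, that
would trip the new dma_check_mask() warning when SME is active; a 32-bit
mask is below the effective 48-bit limit, so the device ends up going
through the SWIOTLB bounce buffers.]

static int example_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	int ret;

	ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
	if (ret)
		return ret;     /* mask not usable at all */

	/* With SME active, "SME is active, device will require DMA
	 * bounce buffers" has been logged by this point. */
	return 0;
}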



[PATCH v8 RESEND 08/38] x86/mm: Add support to enable SME in early boot processing

2017-06-27 Thread Tom Lendacky
Add support to the early boot code to use Secure Memory Encryption (SME).
Since the kernel has been loaded into memory in a decrypted state, encrypt
the kernel in place and update the early pagetables with the memory
encryption mask so that new pagetable entries will use memory encryption.

The routines to set the encryption mask and perform the encryption are
stub routines for now with functionality to be added in a later patch.

Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/mem_encrypt.h |8 +
 arch/x86/kernel/head64.c   |   53 +---
 arch/x86/kernel/head_64.S  |   20 --
 arch/x86/mm/mem_encrypt.c  |9 ++
 include/linux/mem_encrypt.h|5 +++
 5 files changed, 82 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index a105796..475e34f 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -15,14 +15,22 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/init.h>
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 
 extern unsigned long sme_me_mask;
 
+void __init sme_encrypt_kernel(void);
+void __init sme_enable(void);
+
 #else  /* !CONFIG_AMD_MEM_ENCRYPT */
 
#define sme_me_mask    0UL
 
+static inline void __init sme_encrypt_kernel(void) { }
+static inline void __init sme_enable(void) { }
+
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 46c3c73..1f0ddcc 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include <linux/mem_encrypt.h>
 
 #include 
 #include 
@@ -45,9 +46,10 @@ static void __head *fixup_pointer(void *ptr, unsigned long 
physaddr)
return ptr - (void *)_text + (void *)physaddr;
 }
 
-void __head __startup_64(unsigned long physaddr)
+unsigned long __head __startup_64(unsigned long physaddr)
 {
unsigned long load_delta, *p;
+   unsigned long pgtable_flags;
pgdval_t *pgd;
p4dval_t *p4d;
pudval_t *pud;
@@ -68,6 +70,12 @@ void __head __startup_64(unsigned long physaddr)
if (load_delta & ~PMD_PAGE_MASK)
for (;;);
 
+   /* Activate Secure Memory Encryption (SME) if supported and enabled */
+   sme_enable();
+
+   /* Include the SME encryption mask in the fixup value */
+   load_delta += sme_get_me_mask();
+
/* Fixup the physical addresses in the page table */
 
pgd = fixup_pointer(&early_top_pgt, physaddr);
@@ -94,28 +102,30 @@ void __head __startup_64(unsigned long physaddr)
 
pud = fixup_pointer(early_dynamic_pgts[next_early_pgt++], physaddr);
pmd = fixup_pointer(early_dynamic_pgts[next_early_pgt++], physaddr);
+   pgtable_flags = _KERNPG_TABLE + sme_get_me_mask();
 
if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
p4d = fixup_pointer(early_dynamic_pgts[next_early_pgt++], 
physaddr);
 
i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
-   pgd[i + 0] = (pgdval_t)p4d + _KERNPG_TABLE;
-   pgd[i + 1] = (pgdval_t)p4d + _KERNPG_TABLE;
+   pgd[i + 0] = (pgdval_t)p4d + pgtable_flags;
+   pgd[i + 1] = (pgdval_t)p4d + pgtable_flags;
 
i = (physaddr >> P4D_SHIFT) % PTRS_PER_P4D;
-   p4d[i + 0] = (pgdval_t)pud + _KERNPG_TABLE;
-   p4d[i + 1] = (pgdval_t)pud + _KERNPG_TABLE;
+   p4d[i + 0] = (pgdval_t)pud + pgtable_flags;
+   p4d[i + 1] = (pgdval_t)pud + pgtable_flags;
} else {
i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
-   pgd[i + 0] = (pgdval_t)pud + _KERNPG_TABLE;
-   pgd[i + 1] = (pgdval_t)pud + _KERNPG_TABLE;
+   pgd[i + 0] = (pgdval_t)pud + pgtable_flags;
+   pgd[i + 1] = (pgdval_t)pud + pgtable_flags;
}
 
i = (physaddr >> PUD_SHIFT) % PTRS_PER_PUD;
-   pud[i + 0] = (pudval_t)pmd + _KERNPG_TABLE;
-   pud[i + 1] = (pudval_t)pmd + _KERNPG_TABLE;
+   pud[i + 0] = (pudval_t)pmd + pgtable_flags;
+   pud[i + 1] = (pudval_t)pmd + pgtable_flags;
 
pmd_entry = __PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL;
+   pmd_entry += sme_get_me_mask();
pmd_entry +=  physaddr;
 
for (i = 0; i < DIV_ROUND_UP(_end - _text, PMD_SIZE); i++) {
@@ -136,9 +146,30 @@ void __head __startup_64(unsigned long physaddr)
pmd[i] += load_delta;
}
 
-   /* Fixup phys_base */
+   /*
+* Fixup phys_base - remove the memory encryption mask to obtain
+* the true physical address.
+*/
p = fixup_pointer(&phys_base, physaddr);
-   *p += load_delta;
+   *p += load_delta - sme_get_me_mask();
+
+   /* Encrypt the kernel (if SME is active) */
+   sme_encrypt_kernel();
+
+   /*
+* Return the SME encryption mask (if SME is active) to be used as a
+* 

[PATCH v8 RESEND 06/38] x86/mm: Add Secure Memory Encryption (SME) support

2017-06-27 Thread Tom Lendacky
Add support for Secure Memory Encryption (SME). This initial support
provides a Kconfig entry to build the SME support into the kernel and
defines the memory encryption mask that will be used in subsequent
patches to mark pages as encrypted.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/Kconfig   |   25 +
 arch/x86/include/asm/mem_encrypt.h |   30 ++
 arch/x86/mm/Makefile   |1 +
 arch/x86/mm/mem_encrypt.c  |   21 +
 include/linux/mem_encrypt.h|   35 +++
 5 files changed, 112 insertions(+)
 create mode 100644 arch/x86/include/asm/mem_encrypt.h
 create mode 100644 arch/x86/mm/mem_encrypt.c
 create mode 100644 include/linux/mem_encrypt.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 72028a1..3a59e9c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1409,6 +1409,31 @@ config X86_DIRECT_GBPAGES
  supports them), so don't confuse the user by printing
  that we have them enabled.
 
+config ARCH_HAS_MEM_ENCRYPT
+   def_bool y
+
+config AMD_MEM_ENCRYPT
+   bool "AMD Secure Memory Encryption (SME) support"
+   depends on X86_64 && CPU_SUP_AMD
+   ---help---
+ Say yes to enable support for the encryption of system memory.
+ This requires an AMD processor that supports Secure Memory
+ Encryption (SME).
+
+config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
+   bool "Activate AMD Secure Memory Encryption (SME) by default"
+   default y
+   depends on AMD_MEM_ENCRYPT
+   ---help---
+ Say yes to have system memory encrypted by default if running on
+ an AMD processor that supports Secure Memory Encryption (SME).
+
+ If set to Y, then the encryption of system memory can be
+ deactivated with the mem_encrypt=off command line option.
+
+ If set to N, then the encryption of system memory can be
+ activated with the mem_encrypt=on command line option.
+
 # Common NUMA Features
 config NUMA
bool "Numa Memory Allocation and Scheduler Support"
diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
new file mode 100644
index 000..a105796
--- /dev/null
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -0,0 +1,30 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __X86_MEM_ENCRYPT_H__
+#define __X86_MEM_ENCRYPT_H__
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+
+extern unsigned long sme_me_mask;
+
+#else  /* !CONFIG_AMD_MEM_ENCRYPT */
+
+#define sme_me_mask    0UL
+
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __X86_MEM_ENCRYPT_H__ */
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 0fbdcb6..a94a7b6 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -39,3 +39,4 @@ obj-$(CONFIG_X86_INTEL_MPX)   += mpx.o
 obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
 obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
 
+obj-$(CONFIG_AMD_MEM_ENCRYPT)  += mem_encrypt.o
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
new file mode 100644
index 000..b99d469
--- /dev/null
+++ b/arch/x86/mm/mem_encrypt.c
@@ -0,0 +1,21 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+
+/*
+ * Since SME related variables are set early in the boot process they must
+ * reside in the .data section so as not to be zeroed out when the .bss
+ * section is later cleared.
+ */
+unsigned long sme_me_mask __section(.data) = 0;
+EXPORT_SYMBOL_GPL(sme_me_mask);
diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
new file mode 100644
index 000..59769f7
--- /dev/null
+++ b/include/linux/mem_encrypt.h
@@ -0,0 +1,35 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __MEM_ENCRYPT_H__
+#define __MEM_ENCRYPT_H__
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_ARCH_HAS_MEM_ENCRYPT
+
+#include <asm/mem_encrypt.h>
+
+#else  /* !CONFIG_ARCH_HAS_MEM_ENCRYPT */
+
+#define sme_me_mask    0UL
+
+#endif /* 
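
[Editor's note: a sketch of where the mask goes next. Later patches in
this series fold sme_me_mask into the page protection bits (see the
pgtable_types.h hunk further down in this document), so every new kernel
mapping is encrypted, and everything degenerates to a no-op when the mask
is zero.]

#define _PAGE_ENC    (_AT(pteval_t, sme_me_mask))

/* Encrypted by default; sme_me_mask == 0 restores the old value. */
#define PAGE_KERNEL  __pgprot(__PAGE_KERNEL | _PAGE_ENC)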

[PATCH v8 RESEND 09/38] x86/mm: Simplify p[g4um]d_page() macros

2017-06-27 Thread Tom Lendacky
Create a pgd_pfn() macro similar to the p[4um]d_pfn() macros and then
use the p[g4um]d_pfn() macros in the p[g4um]d_page() macros instead of
duplicating the code.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/pgtable.h |   16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 77037b6..b64ea52 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -195,6 +195,11 @@ static inline unsigned long p4d_pfn(p4d_t p4d)
return (p4d_val(p4d) & p4d_pfn_mask(p4d)) >> PAGE_SHIFT;
 }
 
+static inline unsigned long pgd_pfn(pgd_t pgd)
+{
+   return (pgd_val(pgd) & PTE_PFN_MASK) >> PAGE_SHIFT;
+}
+
 static inline int p4d_large(p4d_t p4d)
 {
/* No 512 GiB pages yet */
@@ -704,8 +709,7 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
  * Currently stuck as a macro due to indirect forward reference to
  * linux/mmzone.h's __section_mem_map_addr() definition:
  */
-#define pmd_page(pmd)  \
-   pfn_to_page((pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT)
+#define pmd_page(pmd)  pfn_to_page(pmd_pfn(pmd))
 
 /*
  * the pmd page can be thought of an array like this: pmd_t[PTRS_PER_PMD]
@@ -773,8 +777,7 @@ static inline unsigned long pud_page_vaddr(pud_t pud)
  * Currently stuck as a macro due to indirect forward reference to
  * linux/mmzone.h's __section_mem_map_addr() definition:
  */
-#define pud_page(pud)  \
-   pfn_to_page((pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT)
+#define pud_page(pud)  pfn_to_page(pud_pfn(pud))
 
 /* Find an entry in the second-level page table.. */
 static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
@@ -824,8 +827,7 @@ static inline unsigned long p4d_page_vaddr(p4d_t p4d)
  * Currently stuck as a macro due to indirect forward reference to
  * linux/mmzone.h's __section_mem_map_addr() definition:
  */
-#define p4d_page(p4d)  \
-   pfn_to_page((p4d_val(p4d) & p4d_pfn_mask(p4d)) >> PAGE_SHIFT)
+#define p4d_page(p4d)  pfn_to_page(p4d_pfn(p4d))
 
 /* Find an entry in the third-level page table.. */
 static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
@@ -859,7 +861,7 @@ static inline unsigned long pgd_page_vaddr(pgd_t pgd)
  * Currently stuck as a macro due to indirect forward reference to
  * linux/mmzone.h's __section_mem_map_addr() definition:
  */
-#define pgd_page(pgd)  pfn_to_page(pgd_val(pgd) >> PAGE_SHIFT)
+#define pgd_page(pgd)  pfn_to_page(pgd_pfn(pgd))
 
 /* to find an entry in a page-table-directory. */
 static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address)



[PATCH v8 RESEND 21/38] x86/mm: Add support to access persistent memory in the clear

2017-06-27 Thread Tom Lendacky
Persistent memory is expected to persist across reboots. The encryption
key used by SME will change across reboots which will result in corrupted
persistent memory.  Persistent memory is handed out by block devices
through memory remapping functions, so be sure not to map this memory as
encrypted.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/mm/ioremap.c |   31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index ee33838..effa529 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -420,17 +420,46 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
  * Examine the physical address to determine if it is an area of memory
  * that should be mapped decrypted.  If the memory is not part of the
  * kernel usable area it was accessed and created decrypted, so these
- * areas should be mapped decrypted.
+ * areas should be mapped decrypted. And since the encryption key can
+ * change across reboots, persistent memory should also be mapped
+ * decrypted.
  */
 static bool memremap_should_map_decrypted(resource_size_t phys_addr,
  unsigned long size)
 {
+   int is_pmem;
+
+   /*
+* Check if the address is part of a persistent memory region.
+* This check covers areas added by E820, EFI and ACPI.
+*/
+   is_pmem = region_intersects(phys_addr, size, IORESOURCE_MEM,
+   IORES_DESC_PERSISTENT_MEMORY);
+   if (is_pmem != REGION_DISJOINT)
+   return true;
+
+   /*
+* Check if the non-volatile attribute is set for an EFI
+* reserved area.
+*/
+   if (efi_enabled(EFI_BOOT)) {
+   switch (efi_mem_type(phys_addr)) {
+   case EFI_RESERVED_TYPE:
+   if (efi_mem_attributes(phys_addr) & EFI_MEMORY_NV)
+   return true;
+   break;
+   default:
+   break;
+   }
+   }
+
/* Check if the address is outside kernel usable area */
switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
case E820_TYPE_RESERVED:
case E820_TYPE_ACPI:
case E820_TYPE_NVS:
case E820_TYPE_UNUSABLE:
+   case E820_TYPE_PRAM:
return true;
default:
break;



[PATCH v8 RESEND 23/38] x86/realmode: Decrypt trampoline area if memory encryption is active

2017-06-27 Thread Tom Lendacky
When Secure Memory Encryption is enabled, the trampoline area must not
be encrypted. A CPU running in real mode will not be able to decrypt
memory that has been encrypted because it will not be able to use addresses
with the memory encryption mask.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/realmode/init.c |8 
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index cd4be19..d6ddc7e 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -1,6 +1,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -59,6 +60,13 @@ static void __init setup_real_mode(void)
 
base = (unsigned char *)real_mode_header;
 
+   /*
+* If SME is active, the trampoline area will need to be in
+* decrypted memory in order to bring up other processors
+* successfully.
+*/
+   set_memory_decrypted((unsigned long)base, size >> PAGE_SHIFT);
+
memcpy(base, real_mode_blob, size);
 
phys_base = __pa(base);



[PATCH v8 RESEND 22/38] x86/mm: Add support for changing the memory encryption attribute

2017-06-27 Thread Tom Lendacky
Add support for changing the memory encryption attribute for one or more
memory pages. This will be useful when we have to change the AP trampoline
area to not be encrypted, or when we need to change the SWIOTLB area to
not be encrypted in support of devices that can't support the encryption
mask range.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/set_memory.h |3 ++
 arch/x86/mm/pageattr.c|   62 +
 2 files changed, 65 insertions(+)

diff --git a/arch/x86/include/asm/set_memory.h 
b/arch/x86/include/asm/set_memory.h
index eaec6c3..cd71273 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -11,6 +11,7 @@
  * Executability : eXeutable, NoteXecutable
  * Read/Write: ReadOnly, ReadWrite
  * Presence  : NotPresent
+ * Encryption: Encrypted, Decrypted
  *
  * Within a category, the attributes are mutually exclusive.
  *
@@ -42,6 +43,8 @@
 int set_memory_wb(unsigned long addr, int numpages);
 int set_memory_np(unsigned long addr, int numpages);
 int set_memory_4k(unsigned long addr, int numpages);
+int set_memory_encrypted(unsigned long addr, int numpages);
+int set_memory_decrypted(unsigned long addr, int numpages);
 
 int set_memory_array_uc(unsigned long *addr, int addrinarray);
 int set_memory_array_wc(unsigned long *addr, int addrinarray);
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index e7d3866..d9e09fb 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1769,6 +1769,68 @@ int set_memory_4k(unsigned long addr, int numpages)
__pgprot(0), 1, 0, NULL);
 }
 
+static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
+{
+   struct cpa_data cpa;
+   unsigned long start;
+   int ret;
+
+   /* Nothing to do if the SME is not active */
+   if (!sme_active())
+   return 0;
+
+   /* Should not be working on unaligned addresses */
+   if (WARN_ONCE(addr & ~PAGE_MASK, "misaligned address: %#lx\n", addr))
+   addr &= PAGE_MASK;
+
+   start = addr;
+
+   memset(&cpa, 0, sizeof(cpa));
+   cpa.vaddr = &addr;
+   cpa.numpages = numpages;
+   cpa.mask_set = enc ? __pgprot(_PAGE_ENC) : __pgprot(0);
+   cpa.mask_clr = enc ? __pgprot(0) : __pgprot(_PAGE_ENC);
+   cpa.pgd = init_mm.pgd;
+
+   /* Must avoid aliasing mappings in the highmem code */
+   kmap_flush_unused();
+   vm_unmap_aliases();
+
+   /*
+* Before changing the encryption attribute, we need to flush caches.
+*/
+   if (static_cpu_has(X86_FEATURE_CLFLUSH))
+   cpa_flush_range(start, numpages, 1);
+   else
+   cpa_flush_all(1);
+
+   ret = __change_page_attr_set_clr(&cpa, 1);
+
+   /*
+* After changing the encryption attribute, we need to flush TLBs
+* again in case any speculative TLB caching occurred (but no need
+* to flush caches again).  We could just use cpa_flush_all(), but
+* in case TLB flushing gets optimized in the cpa_flush_range()
+* path use the same logic as above.
+*/
+   if (static_cpu_has(X86_FEATURE_CLFLUSH))
+   cpa_flush_range(start, numpages, 0);
+   else
+   cpa_flush_all(0);
+
+   return ret;
+}
+
+int set_memory_encrypted(unsigned long addr, int numpages)
+{
+   return __set_memory_enc_dec(addr, numpages, true);
+}
+
+int set_memory_decrypted(unsigned long addr, int numpages)
+{
+   return __set_memory_enc_dec(addr, numpages, false);
+}
+
 int set_pages_uc(struct page *page, int numpages)
 {
unsigned long addr = (unsigned long)page_address(page);
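
[Editor's note: a hypothetical caller of the new interfaces, not part of
the patch. A page that a device must see as plaintext is flipped out of
the encrypted mapping, and flipped back before being freed.]

static void *alloc_shared_page(void)
{
	unsigned long addr = __get_free_page(GFP_KERNEL);

	if (!addr)
		return NULL;

	if (set_memory_decrypted(addr, 1)) {    /* one page */
		free_page(addr);
		return NULL;
	}

	return (void *)addr;
}

static void free_shared_page(void *buf)
{
	set_memory_encrypted((unsigned long)buf, 1);
	free_page((unsigned long)buf);
}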



[PATCH v8 RESEND 12/38] x86/mm: Extend early_memremap() support with additional attrs

2017-06-27 Thread Tom Lendacky
Add early_memremap() support to be able to specify encrypted and
decrypted mappings with and without write-protection. The use of
write-protection is necessary when encrypting data "in place". The
write-protect attribute is considered cacheable for loads, but not
stores. This implies that the hardware will never give the core a
dirty line with this memtype.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/Kconfig |4 +++
 arch/x86/include/asm/fixmap.h|   13 ++
 arch/x86/include/asm/pgtable_types.h |8 ++
 arch/x86/mm/ioremap.c|   44 ++
 include/asm-generic/early_ioremap.h  |2 ++
 mm/early_ioremap.c   |   10 
 6 files changed, 81 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3a59e9c..a04081ce 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1434,6 +1434,10 @@ config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
  If set to N, then the encryption of system memory can be
  activated with the mem_encrypt=on command line option.
 
+config ARCH_USE_MEMREMAP_PROT
+   def_bool y
+   depends on AMD_MEM_ENCRYPT
+
 # Common NUMA Features
 config NUMA
bool "Numa Memory Allocation and Scheduler Support"
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index d9ff226..dcd9fb5 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -164,6 +164,19 @@ static inline void __set_fixmap(enum fixed_addresses idx,
  */
 #define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_IO_NOCACHE
 
+/*
+ * Early memremap routines used for in-place encryption. The mappings created
+ * by these routines are intended to be used as temporary mappings.
+ */
+void __init *early_memremap_encrypted(resource_size_t phys_addr,
+ unsigned long size);
+void __init *early_memremap_encrypted_wp(resource_size_t phys_addr,
+unsigned long size);
+void __init *early_memremap_decrypted(resource_size_t phys_addr,
+ unsigned long size);
+void __init *early_memremap_decrypted_wp(resource_size_t phys_addr,
+unsigned long size);
+
 #include 
 
 #define __late_set_fixmap(idx, phys, flags) __set_fixmap(idx, phys, flags)
diff --git a/arch/x86/include/asm/pgtable_types.h 
b/arch/x86/include/asm/pgtable_types.h
index de32ca3..32095af 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -161,6 +161,7 @@ enum page_cache_mode {
 
 #define _PAGE_CACHE_MASK   (_PAGE_PAT | _PAGE_PCD | _PAGE_PWT)
 #define _PAGE_NOCACHE  (cachemode2protval(_PAGE_CACHE_MODE_UC))
+#define _PAGE_CACHE_WP (cachemode2protval(_PAGE_CACHE_MODE_WP))
 
 #define PAGE_NONE  __pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED)
 #define PAGE_SHARED    __pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
@@ -189,6 +190,7 @@ enum page_cache_mode {
 #define __PAGE_KERNEL_VVAR (__PAGE_KERNEL_RO | _PAGE_USER)
 #define __PAGE_KERNEL_LARGE    (__PAGE_KERNEL | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_EXEC   (__PAGE_KERNEL_EXEC | _PAGE_PSE)
+#define __PAGE_KERNEL_WP   (__PAGE_KERNEL | _PAGE_CACHE_WP)
 
 #define __PAGE_KERNEL_IO   (__PAGE_KERNEL)
 #define __PAGE_KERNEL_IO_NOCACHE   (__PAGE_KERNEL_NOCACHE)
@@ -202,6 +204,12 @@ enum page_cache_mode {
 #define _KERNPG_TABLE  (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED |\
 _PAGE_DIRTY | _PAGE_ENC)
 
+#define __PAGE_KERNEL_ENC  (__PAGE_KERNEL | _PAGE_ENC)
+#define __PAGE_KERNEL_ENC_WP   (__PAGE_KERNEL_WP | _PAGE_ENC)
+
+#define __PAGE_KERNEL_NOENC    (__PAGE_KERNEL)
+#define __PAGE_KERNEL_NOENC_WP (__PAGE_KERNEL_WP)
+
 #define PAGE_KERNEL        __pgprot(__PAGE_KERNEL | _PAGE_ENC)
 #define PAGE_KERNEL_RO     __pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
 #define PAGE_KERNEL_EXEC   __pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index bfc3e2d..26db273 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -414,6 +414,50 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
 }
 
+#ifdef CONFIG_ARCH_USE_MEMREMAP_PROT
+/* Remap memory with encryption */
+void __init *early_memremap_encrypted(resource_size_t phys_addr,
+ unsigned long size)
+{
+   return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_ENC);
+}
+
+/*
+ * Remap memory with encryption and write-protected - cannot be called
+ * before pat_init() is called
+ */
+void __init *early_memremap_encrypted_wp(resource_size_t phys_addr,
+unsigned long size)
+{
+   /* Be sure the write-protect PAT entry is set for write-protect */
+   if 

[PATCH v8 RESEND 13/38] x86/mm: Add support for early encrypt/decrypt of memory

2017-06-27 Thread Tom Lendacky
Add support to be able to either encrypt or decrypt data in place during
the early stages of booting the kernel. This does not change the memory
encryption attribute - it is used for ensuring that data present in either
an encrypted or decrypted memory area is in the proper state (for example
the initrd will have been loaded by the boot loader and will not be
encrypted, but the memory that it resides in is marked as encrypted).

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/mem_encrypt.h |   10 +
 arch/x86/mm/mem_encrypt.c  |   76 
 2 files changed, 86 insertions(+)

diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index dbae7a5..8baa35b 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -21,6 +21,11 @@
 
 extern unsigned long sme_me_mask;
 
+void __init sme_early_encrypt(resource_size_t paddr,
+ unsigned long size);
+void __init sme_early_decrypt(resource_size_t paddr,
+ unsigned long size);
+
 void __init sme_early_init(void);
 
 void __init sme_encrypt_kernel(void);
@@ -30,6 +35,11 @@
 
#define sme_me_mask    0UL
 
+static inline void __init sme_early_encrypt(resource_size_t paddr,
+   unsigned long size) { }
+static inline void __init sme_early_decrypt(resource_size_t paddr,
+   unsigned long size) { }
+
 static inline void __init sme_early_init(void) { }
 
 static inline void __init sme_encrypt_kernel(void) { }
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index f973d3d..54bb73c 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -14,6 +14,9 @@
 #include 
 #include 
 
+#include <asm/tlbflush.h>
+#include <asm/fixmap.h>
+
 /*
  * Since SME related variables are set early in the boot process they must
  * reside in the .data section so as not to be zeroed out when the .bss
@@ -22,6 +25,79 @@
 unsigned long sme_me_mask __section(.data) = 0;
 EXPORT_SYMBOL_GPL(sme_me_mask);
 
+/* Buffer used for early in-place encryption by BSP, no locking needed */
+static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
+
+/*
+ * This routine does not change the underlying encryption setting of the
+ * page(s) that map this memory. It assumes that eventually the memory is
+ * meant to be accessed as either encrypted or decrypted but the contents
+ * are currently not in the desired state.
+ *
+ * This routine follows the steps outlined in the AMD64 Architecture
+ * Programmer's Manual Volume 2, Section 7.10.8 Encrypt-in-Place.
+ */
+static void __init __sme_early_enc_dec(resource_size_t paddr,
+  unsigned long size, bool enc)
+{
+   void *src, *dst;
+   size_t len;
+
+   if (!sme_me_mask)
+   return;
+
+   local_flush_tlb();
+   wbinvd();
+
+   /*
+* There are limited number of early mapping slots, so map (at most)
+* one page at time.
+*/
+   while (size) {
+   len = min_t(size_t, sizeof(sme_early_buffer), size);
+
+   /*
+* Create mappings for the current and desired format of
+* the memory. Use a write-protected mapping for the source.
+*/
+   src = enc ? early_memremap_decrypted_wp(paddr, len) :
+   early_memremap_encrypted_wp(paddr, len);
+
+   dst = enc ? early_memremap_encrypted(paddr, len) :
+   early_memremap_decrypted(paddr, len);
+
+   /*
+* If a mapping can't be obtained to perform the operation,
+* then eventual access of that area in the desired mode
+* will cause a crash.
+*/
+   BUG_ON(!src || !dst);
+
+   /*
+* Use a temporary buffer, of cache-line multiple size, to
+* avoid data corruption as documented in the APM.
+*/
+   memcpy(sme_early_buffer, src, len);
+   memcpy(dst, sme_early_buffer, len);
+
+   early_memunmap(dst, len);
+   early_memunmap(src, len);
+
+   paddr += len;
+   size -= len;
+   }
+}
+
+void __init sme_early_encrypt(resource_size_t paddr, unsigned long size)
+{
+   __sme_early_enc_dec(paddr, size, true);
+}
+
+void __init sme_early_decrypt(resource_size_t paddr, unsigned long size)
+{
+   __sme_early_enc_dec(paddr, size, false);
+}
+
 void __init sme_early_init(void)
 {
unsigned int i;



[PATCH v8 RESEND 15/38] x86/boot/e820: Add support to determine the E820 type of an address

2017-06-27 Thread Tom Lendacky
Add a function that will return the E820 type associated with an address
range.
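
For illustration (not part of this patch), a caller can now test a range
against a specific E820 type; this is a simplified form of the check added
to the ioremap path later in the series:

	static bool paddr_is_e820_reserved(u64 paddr, u64 size)
	{
		return e820__get_entry_type(paddr, paddr + size - 1) ==
		       E820_TYPE_RESERVED;
	}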

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/e820/api.h |2 ++
 arch/x86/kernel/e820.c  |   26 +++---
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h
index 8e0f8b8..3641f5f 100644
--- a/arch/x86/include/asm/e820/api.h
+++ b/arch/x86/include/asm/e820/api.h
@@ -38,6 +38,8 @@
 extern void e820__reallocate_tables(void);
 extern void e820__register_nosave_regions(unsigned long limit_pfn);
 
+extern int  e820__get_entry_type(u64 start, u64 end);
+
 /*
  * Returns true iff the specified range [start,end) is completely contained 
inside
  * the ISA region.
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index d78a586..46c9b65 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -84,7 +84,8 @@ bool e820__mapped_any(u64 start, u64 end, enum e820_type type)
  * Note: this function only works correctly once the E820 table is sorted and
  * not-overlapping (at least for the range specified), which is the case 
normally.
  */
-bool __init e820__mapped_all(u64 start, u64 end, enum e820_type type)
+static struct e820_entry *__e820__mapped_all(u64 start, u64 end,
+enum e820_type type)
 {
int i;
 
@@ -110,9 +111,28 @@ bool __init e820__mapped_all(u64 start, u64 end, enum 
e820_type type)
 * coverage of the desired range exists:
 */
if (start >= end)
-   return 1;
+   return entry;
}
-   return 0;
+
+   return NULL;
+}
+
+/*
+ * This function checks if the entire range <start, end> is mapped with type.
+ */
+bool __init e820__mapped_all(u64 start, u64 end, enum e820_type type)
+{
+   return __e820__mapped_all(start, end, type);
+}
+
+/*
+ * This function returns the type associated with the range <start, end>.
+ */
+int e820__get_entry_type(u64 start, u64 end)
+{
+   struct e820_entry *entry = __e820__mapped_all(start, end, 0);
+
+   return entry ? entry->type : -EINVAL;
 }
 
 /*



[PATCH v8 RESEND 19/38] x86/mm: Add support to access boot related data in the clear

2017-06-27 Thread Tom Lendacky
Boot data (such as EFI related data) is not encrypted when the system is
booted because UEFI/BIOS does not run with SME active. In order to access
this data properly it needs to be mapped decrypted.

Update early_memremap() to provide an arch specific routine to modify the
pagetable protection attributes before they are applied to the new
mapping. This is used to remove the encryption mask for boot related data.

Update memremap() to provide an arch specific routine to determine if RAM
remapping is allowed.  RAM remapping will cause an encrypted mapping to be
generated. By preventing RAM remapping, ioremap_cache() will be used
instead, which will provide a decrypted mapping of the boot related data.
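
In outline, the intent of the RAM-remap hook is (simplified sketch; the
actual implementation below also honors explicit MEMREMAP_ENC/DEC requests
and the setup_data chain):

	bool arch_memremap_can_ram_remap(resource_size_t offset,
					 unsigned long size,
					 unsigned long flags)
	{
		if (!sme_active())
			return true;

		/* boot/EFI data was written decrypted; don't map it as RAM */
		return !memremap_should_map_decrypted(offset, size);
	}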

Reviewed-by: Matt Fleming 
Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/io.h |5 +
 arch/x86/mm/ioremap.c |  179 +
 include/linux/io.h|2 +
 kernel/memremap.c |   20 -
 mm/early_ioremap.c|   18 -
 5 files changed, 217 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 7afb0e2..09c5557 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -381,4 +381,9 @@ extern int __must_check arch_phys_wc_add(unsigned long base,
 #define arch_io_reserve_memtype_wc arch_io_reserve_memtype_wc
 #endif
 
+extern bool arch_memremap_can_ram_remap(resource_size_t offset,
+   unsigned long size,
+   unsigned long flags);
+#define arch_memremap_can_ram_remap arch_memremap_can_ram_remap
+
 #endif /* _ASM_X86_IO_H */
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 26db273..ee33838 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -22,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "physaddr.h"
 
@@ -414,6 +416,183 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
 }
 
+/*
+ * Examine the physical address to determine if it is an area of memory
+ * that should be mapped decrypted.  If the memory is not part of the
+ * kernel usable area it was accessed and created decrypted, so these
+ * areas should be mapped decrypted.
+ */
+static bool memremap_should_map_decrypted(resource_size_t phys_addr,
+ unsigned long size)
+{
+   /* Check if the address is outside kernel usable area */
+   switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
+   case E820_TYPE_RESERVED:
+   case E820_TYPE_ACPI:
+   case E820_TYPE_NVS:
+   case E820_TYPE_UNUSABLE:
+   return true;
+   default:
+   break;
+   }
+
+   return false;
+}
+
+/*
+ * Examine the physical address to determine if it is EFI data. Check
+ * it against the boot params structure and EFI tables and memory types.
+ */
+static bool memremap_is_efi_data(resource_size_t phys_addr,
+unsigned long size)
+{
+   u64 paddr;
+
+   /* Check if the address is part of EFI boot/runtime data */
+   if (!efi_enabled(EFI_BOOT))
+   return false;
+
+   paddr = boot_params.efi_info.efi_memmap_hi;
+   paddr <<= 32;
+   paddr |= boot_params.efi_info.efi_memmap;
+   if (phys_addr == paddr)
+   return true;
+
+   paddr = boot_params.efi_info.efi_systab_hi;
+   paddr <<= 32;
+   paddr |= boot_params.efi_info.efi_systab;
+   if (phys_addr == paddr)
+   return true;
+
+   if (efi_is_table_address(phys_addr))
+   return true;
+
+   switch (efi_mem_type(phys_addr)) {
+   case EFI_BOOT_SERVICES_DATA:
+   case EFI_RUNTIME_SERVICES_DATA:
+   return true;
+   default:
+   break;
+   }
+
+   return false;
+}
+
+/*
+ * Examine the physical address to determine if it is boot data by checking
+ * it against the boot params setup_data chain.
+ */
+static bool memremap_is_setup_data(resource_size_t phys_addr,
+  unsigned long size)
+{
+   struct setup_data *data;
+   u64 paddr, paddr_next;
+
+   paddr = boot_params.hdr.setup_data;
+   while (paddr) {
+   unsigned int len;
+
+   if (phys_addr == paddr)
+   return true;
+
+   data = memremap(paddr, sizeof(*data),
+   MEMREMAP_WB | MEMREMAP_DEC);
+
+   paddr_next = data->next;
+   len = data->len;
+
+   memunmap(data);
+
+   if ((phys_addr > paddr) && (phys_addr < (paddr + len)))
+   return true;
+
+   paddr = paddr_next;
+   }
+
+	return false;
+}

[PATCH v8 RESEND 00/38] x86: Secure Memory Encryption (AMD)

2017-06-27 Thread Tom Lendacky
RESENDING - Mail Server Issues

This patch series provides support for AMD's new Secure Memory Encryption (SME)
feature.

SME can be used to mark individual pages of memory as encrypted through the
page tables. A page of memory that is marked encrypted will be automatically
decrypted when read from DRAM and will be automatically encrypted when
written to DRAM. Details on SME can found in the links below.

The SME feature is identified through a CPUID function and enabled through
the SYSCFG MSR. Once enabled, page table entries will determine how the
memory is accessed. If a page table entry has the memory encryption mask set,
then that memory will be accessed as encrypted memory. The memory encryption
mask (as well as other related information) is determined from settings
returned through the same CPUID function that identifies the presence of the
feature.
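
In outline, the detection flow is (simplified sketch; the real code is in
patches 4 and 5 of this series):

	static bool sme_supported_and_enabled(void)
	{
		u64 syscfg;

		/* CPUID 0x8000001f[eax] bit 0: SME supported */
		if (!(cpuid_eax(0x8000001f) & 1))
			return false;

		/* BIOS must also have set the enable bit in SYSCFG */
		rdmsrl(MSR_K8_SYSCFG, syscfg);
		return !!(syscfg & MSR_K8_SYSCFG_MEM_ENCRYPT);
	}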

The approach that this patch series takes is to encrypt everything possible
starting early in the boot where the kernel is encrypted. Using the page
table macros the encryption mask can be incorporated into all page table
entries and page allocations. By updating the protection map, userspace
allocations are also marked encrypted. Certain data must be accounted for
as having been placed in memory before SME was enabled (EFI, initrd, etc.)
and accessed accordingly.

This patch series is a pre-cursor to another AMD processor feature called
Secure Encrypted Virtualization (SEV). The support for SEV will build upon
the SME support and will be submitted later. Details on SEV can be found
in the links below.

The following links provide additional detail:

AMD Memory Encryption whitepaper:
   
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf

AMD64 Architecture Programmer's Manual:
   http://support.amd.com/TechDocs/24593.pdf
   SME is section 7.10
   SEV is section 15.34

---

This patch series is based off of the master branch of tip:
  https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master

  Commit 6ab5af989579 ("Merge branch 'irq/core'")

Source code is also available at https://github.com/codomania/tip/tree/sme-v8


Still to do:
- Kdump support, including using memremap() instead of ioremap_cache()

Changes since v7:
- Fixed kbuild test robot failure related to pgprot_decrypted() macro
  usage for some non-x86 archs
- Moved calls to encrypt the kernel and retrieve the encryption mask
  from assembler (head_64.S) into C (head64.c)
- Removed use of phys_to_virt() in __ioremap_caller() when address is in
  the ISA range. Now regular ioremap() processing occurs.
- Two new, small patches:
  - Introduced a native_make_p4d() for use when CONFIG_PGTABLE_LEVELS is
not greater than 4
  - Introduced __nostackp GCC option to turn off stack protection on a
per function basis
- General code cleanup based on feedback

Changes since v6:
- Fixed the asm include file issue that caused build errors on other archs
- Rebased the CR3 register changes on top of Andy Lutomirski's patch
- Added a patch to clear the SME cpu feature if running as a PV guest under
  Xen
- Added a patch to obtain the AMD microcode level earlier in the boot
  instead of directly reading the MSR
- Refactor patch #8 ("x86/mm: Add support to enable SME in early boot
  processing") because the 5-level paging support moved the code into the
  new C-function __startup_64()
- Removed need to decrypt trampoline area in-place (set memory attributes
  before copying the trampoline code)
- General code cleanup based on feedback

Changes since v5:
- Added support for 5-level paging
- Added IOMMU support
- Created a generic asm/mem_encrypt.h in order to remove a bunch of
  #ifndef/#define entries
- Removed changes to the __va() macro and defined a function to return
  the true physical address in cr3
- Removed sysfs support as it was determined not to be needed
- General code cleanup based on feedback
- General cleanup of patch subjects and descriptions

Changes since v4:
- Re-worked mapping of setup data to not use a fixed list. Rather, check
  dynamically whether the requested early_memremap()/memremap() call
  needs to be mapped decrypted.
- Moved SME cpu feature into scattered features
- Moved some declarations into header files
- Cleared the encryption mask from the __PHYSICAL_MASK so that users
  of macros such as pmd_pfn_mask() don't have to worry/know about the
  encryption mask
- Updated some return types and values related to EFI and e820 functions
  so that an error could be returned
- During cpu shutdown, removed cache disabling and added a check for kexec
  in progress to use wbinvd followed immediately by halt in order to avoid
  any memory corruption
- Update how persistent memory is identified
- Added a function to find command line arguments and their values
- Added sysfs support
- General code cleanup based on feedback
- General cleanup of patch subjects and descriptions


Changes since v3:
- Broke out some of the 

[PATCH v8 RESEND 14/38] x86/mm: Insure that boot memory areas are mapped properly

2017-06-27 Thread Tom Lendacky
The boot data and command line data are present in memory in a decrypted
state and are copied early in the boot process.  The early page fault
support will map these areas as encrypted, so before attempting to copy
them, add decrypted mappings so the data is accessed properly when copied.

For the initrd, encrypt this data in place. Since the initrd area will
later be mapped as encrypted, the data will then be accessed properly.

Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/mem_encrypt.h |6 +++
 arch/x86/include/asm/pgtable.h |3 ++
 arch/x86/kernel/head64.c   |   30 +++--
 arch/x86/kernel/setup.c|9 +
 arch/x86/mm/kasan_init_64.c|2 +
 arch/x86/mm/mem_encrypt.c  |   63 
 6 files changed, 108 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index 8baa35b..ab1fe77 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -26,6 +26,9 @@ void __init sme_early_encrypt(resource_size_t paddr,
 void __init sme_early_decrypt(resource_size_t paddr,
  unsigned long size);
 
+void __init sme_map_bootdata(char *real_mode_data);
+void __init sme_unmap_bootdata(char *real_mode_data);
+
 void __init sme_early_init(void);
 
 void __init sme_encrypt_kernel(void);
@@ -40,6 +43,9 @@ static inline void __init sme_early_encrypt(resource_size_t 
paddr,
 static inline void __init sme_early_decrypt(resource_size_t paddr,
unsigned long size) { }
 
+static inline void __init sme_map_bootdata(char *real_mode_data) { }
+static inline void __init sme_unmap_bootdata(char *real_mode_data) { }
+
 static inline void __init sme_early_init(void) { }
 
 static inline void __init sme_encrypt_kernel(void) { }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index c6452cb..bbeae4a 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -23,6 +23,9 @@
 #ifndef __ASSEMBLY__
 #include 
 
+extern pgd_t early_top_pgt[PTRS_PER_PGD];
+int __init __early_make_pgtable(unsigned long address, pmdval_t pmd);
+
 void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
 void ptdump_walk_pgd_level_checkwx(void);
 
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 5cd0b72..0cdb53b 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -34,7 +34,6 @@
 /*
  * Manage page tables very early on.
  */
-extern pgd_t early_top_pgt[PTRS_PER_PGD];
 extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
 static unsigned int __initdata next_early_pgt;
 pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
@@ -181,13 +180,13 @@ static void __init reset_early_page_tables(void)
 }
 
 /* Create a new PMD entry */
-int __init early_make_pgtable(unsigned long address)
+int __init __early_make_pgtable(unsigned long address, pmdval_t pmd)
 {
unsigned long physaddr = address - __PAGE_OFFSET;
pgdval_t pgd, *pgd_p;
p4dval_t p4d, *p4d_p;
pudval_t pud, *pud_p;
-   pmdval_t pmd, *pmd_p;
+   pmdval_t *pmd_p;
 
/* Invalid address or early pgt is done ?  */
if (physaddr >= MAXMEM || read_cr3_pa() != __pa_nodebug(early_top_pgt))
@@ -246,12 +245,21 @@ int __init early_make_pgtable(unsigned long address)
memset(pmd_p, 0, sizeof(*pmd_p) * PTRS_PER_PMD);
*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + phys_base + 
_KERNPG_TABLE;
}
-   pmd = (physaddr & PMD_MASK) + early_pmd_flags;
pmd_p[pmd_index(address)] = pmd;
 
return 0;
 }
 
+int __init early_make_pgtable(unsigned long address)
+{
+   unsigned long physaddr = address - __PAGE_OFFSET;
+   pmdval_t pmd;
+
+   pmd = (physaddr & PMD_MASK) + early_pmd_flags;
+
+   return __early_make_pgtable(address, pmd);
+}
+
 /* Don't add a printk in there. printk relies on the PDA which is not 
initialized 
yet. */
 static void __init clear_bss(void)
@@ -274,6 +282,12 @@ static void __init copy_bootdata(char *real_mode_data)
char * command_line;
unsigned long cmd_line_ptr;
 
+   /*
+* If SME is active, this will create decrypted mappings of the
+* boot data in advance of the copy operations.
+*/
+   sme_map_bootdata(real_mode_data);
+
	memcpy(&boot_params, real_mode_data, sizeof boot_params);
	sanitize_boot_params(&boot_params);
cmd_line_ptr = get_cmd_line_ptr();
@@ -281,6 +295,14 @@ static void __init copy_bootdata(char *real_mode_data)
command_line = __va(cmd_line_ptr);
memcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
}
+
+   /*
+* The old boot data is no longer needed and won't be reserved,
+	 * freeing up that memory for use by the kernel. When SME is active,
+	 * remove the decrypted mappings that were created above so the
+	 * memory doesn't remain mapped as decrypted.
+	 */
+	sme_unmap_bootdata(real_mode_data);
 }

[PATCH v8 RESEND 20/38] x86, mpparse: Use memremap to map the mpf and mpc data

2017-06-27 Thread Tom Lendacky
The SMP MP-table is built by UEFI and placed in memory in a decrypted
state. These tables are accessed using a mix of early_memremap(),
early_memunmap(), phys_to_virt() and virt_to_phys(). Change all accesses
to use early_memremap()/early_memunmap(). This allows for proper setting
of the encryption mask so that the data can be successfully accessed when
SME is active.
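
The resulting access pattern is uniform (illustrative sketch; mpf_base and
struct mpf_intel are the file-local names used in this patch):

	static void __init mpf_access_example(void)
	{
		struct mpf_intel *mpf;

		mpf = early_memremap(mpf_base, sizeof(*mpf));
		if (!mpf)
			return;
		/* ... read mpf fields ... */
		early_memunmap(mpf, sizeof(*mpf));
	}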

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/kernel/mpparse.c |   98 -
 1 file changed, 70 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index fd37f39..5cbb317 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -429,7 +429,7 @@ static inline void __init construct_default_ISA_mptable(int 
mpc_default_type)
}
 }
 
-static struct mpf_intel *mpf_found;
+static unsigned long mpf_base;
 
 static unsigned long __init get_mpc_size(unsigned long physptr)
 {
@@ -451,6 +451,7 @@ static int __init check_physptr(struct mpf_intel *mpf, 
unsigned int early)
 
size = get_mpc_size(mpf->physptr);
mpc = early_memremap(mpf->physptr, size);
+
/*
 * Read the physical hardware table.  Anything here will
 * override the defaults.
@@ -497,12 +498,12 @@ static int __init check_physptr(struct mpf_intel *mpf, 
unsigned int early)
  */
 void __init default_get_smp_config(unsigned int early)
 {
-   struct mpf_intel *mpf = mpf_found;
+   struct mpf_intel *mpf;
 
if (!smp_found_config)
return;
 
-   if (!mpf)
+   if (!mpf_base)
return;
 
if (acpi_lapic && early)
@@ -515,6 +516,12 @@ void __init default_get_smp_config(unsigned int early)
if (acpi_lapic && acpi_ioapic)
return;
 
+   mpf = early_memremap(mpf_base, sizeof(*mpf));
+   if (!mpf) {
+   pr_err("MPTABLE: error mapping MP table\n");
+   return;
+   }
+
pr_info("Intel MultiProcessor Specification v1.%d\n",
mpf->specification);
 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86_32)
@@ -529,7 +536,7 @@ void __init default_get_smp_config(unsigned int early)
/*
 * Now see if we need to read further.
 */
-   if (mpf->feature1 != 0) {
+   if (mpf->feature1) {
if (early) {
/*
 * local APIC has default address
@@ -542,8 +549,10 @@ void __init default_get_smp_config(unsigned int early)
construct_default_ISA_mptable(mpf->feature1);
 
} else if (mpf->physptr) {
-   if (check_physptr(mpf, early))
+   if (check_physptr(mpf, early)) {
+   early_memunmap(mpf, sizeof(*mpf));
return;
+   }
} else
BUG();
 
@@ -552,6 +561,8 @@ void __init default_get_smp_config(unsigned int early)
/*
 * Only use the first configuration found.
 */
+
+   early_memunmap(mpf, sizeof(*mpf));
 }
 
 static void __init smp_reserve_memory(struct mpf_intel *mpf)
@@ -561,15 +572,16 @@ static void __init smp_reserve_memory(struct mpf_intel 
*mpf)
 
 static int __init smp_scan_config(unsigned long base, unsigned long length)
 {
-   unsigned int *bp = phys_to_virt(base);
+   unsigned int *bp;
struct mpf_intel *mpf;
-   unsigned long mem;
+   int ret = 0;
 
apic_printk(APIC_VERBOSE, "Scan for SMP in [mem %#010lx-%#010lx]\n",
base, base + length - 1);
BUILD_BUG_ON(sizeof(*mpf) != 16);
 
while (length > 0) {
+   bp = early_memremap(base, length);
mpf = (struct mpf_intel *)bp;
if ((*bp == SMP_MAGIC_IDENT) &&
(mpf->length == 1) &&
@@ -579,24 +591,26 @@ static int __init smp_scan_config(unsigned long base, 
unsigned long length)
 #ifdef CONFIG_X86_LOCAL_APIC
smp_found_config = 1;
 #endif
-   mpf_found = mpf;
+   mpf_base = base;
 
-   pr_info("found SMP MP-table at [mem %#010llx-%#010llx] 
mapped at [%p]\n",
-   (unsigned long long) virt_to_phys(mpf),
-   (unsigned long long) virt_to_phys(mpf) +
-   sizeof(*mpf) - 1, mpf);
+   pr_info("found SMP MP-table at [mem %#010lx-%#010lx] 
mapped at [%p]\n",
+   base, base + sizeof(*mpf) - 1, mpf);
 
-   mem = virt_to_phys(mpf);
-   memblock_reserve(mem, sizeof(*mpf));
+   memblock_reserve(base, sizeof(*mpf));
if (mpf->physptr)
smp_reserve_memory(mpf);
 
-   return 1;
+   ret = 1;
}
-

[PATCH v8 RESEND 17/38] efi: Update efi_mem_type() to return an error rather than 0

2017-06-27 Thread Tom Lendacky
The efi_mem_type() function currently returns a 0, which maps to
EFI_RESERVED_TYPE, if the function is unable to find a memmap entry for
the supplied physical address. Returning EFI_RESERVED_TYPE implies that
a memmap entry exists, when it doesn't.  Instead of returning 0, change
the function to return a negative error value when no memmap entry is
found.
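
Callers can then distinguish "no memmap entry" from a real type
(illustrative):

	static bool paddr_is_efi_runtime_data(unsigned long paddr)
	{
		int type = efi_mem_type(paddr);

		if (type < 0)
			return false;	/* not described by the memmap */

		return type == EFI_RUNTIME_SERVICES_DATA;
	}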

Reviewed-by: Matt Fleming 
Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/ia64/kernel/efi.c  |4 ++--
 arch/x86/platform/efi/efi.c |6 +++---
 include/linux/efi.h |2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
index 1212956..8141600 100644
--- a/arch/ia64/kernel/efi.c
+++ b/arch/ia64/kernel/efi.c
@@ -757,14 +757,14 @@ static void __init handle_palo(unsigned long phys_addr)
return 0;
 }
 
-u32
+int
 efi_mem_type (unsigned long phys_addr)
 {
efi_memory_desc_t *md = efi_memory_descriptor(phys_addr);
 
if (md)
return md->type;
-   return 0;
+   return -EINVAL;
 }
 
 u64
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index f084d87..6217b23 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -1035,12 +1035,12 @@ void __init efi_enter_virtual_mode(void)
 /*
  * Convenience functions to obtain memory types and attributes
  */
-u32 efi_mem_type(unsigned long phys_addr)
+int efi_mem_type(unsigned long phys_addr)
 {
efi_memory_desc_t *md;
 
if (!efi_enabled(EFI_MEMMAP))
-   return 0;
+   return -ENOTSUPP;
 
for_each_efi_memory_desc(md) {
		if ((md->phys_addr <= phys_addr) &&
		    (phys_addr < (md->phys_addr +
				  (md->num_pages << EFI_PAGE_SHIFT))))
			return md->type;
return md->type;
}
-   return 0;
+   return -EINVAL;
 }
 
 static int __init arch_parse_efi_cmdline(char *str)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 8e24f09..4e47f78 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -985,7 +985,7 @@ static inline void efi_esrt_init(void) { }
 extern int efi_config_parse_tables(void *config_tables, int count, int sz,
   efi_config_table_type_t *arch_tables);
 extern u64 efi_get_iobase (void);
-extern u32 efi_mem_type (unsigned long phys_addr);
+extern int efi_mem_type(unsigned long phys_addr);
 extern u64 efi_mem_attributes (unsigned long phys_addr);
 extern u64 efi_mem_attribute (unsigned long phys_addr, unsigned long size);
 extern int __init efi_uart_console_only (void);



[PATCH v8 RESEND 10/38] x86/mm: Provide general kernel support for memory encryption

2017-06-27 Thread Tom Lendacky
Changes to the existing page table macros will allow the SME support to
be enabled in a simple fashion with minimal changes to files that use these
macros.  Since the memory encryption mask will now be part of the regular
pagetable macros, we introduce two new macros (_PAGE_TABLE_NOENC and
_KERNPG_TABLE_NOENC) to allow for early pagetable creation/initialization
without the encryption mask before SME becomes active.  Two new pgprot()
macros are defined to allow setting or clearing the page encryption mask.

The FIXMAP_PAGE_NOCACHE define is introduced for use with MMIO.  SME does
not support encryption for MMIO areas so this define removes the encryption
mask from the page attribute.

Two new macros are introduced (__sme_pa() / __sme_pa_nodebug()) to allow
creating a physical address with the encryption mask.  These are used when
working with the cr3 register so that the PGD can be encrypted. The current
__va() macro is updated so that the virtual address is generated based off
of the physical address without the encryption mask thus allowing the same
virtual address to be generated regardless of whether encryption is enabled
for that physical location or not.
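
As an illustration (not part of the diff below), the new helpers compose
like this; pgprot_encrypted()/pgprot_decrypted() and __sme_pa() are the
names this patch introduces:

	/* sketch: pick a kernel pgprot with or without the SME mask */
	static pgprot_t example_prot(bool encrypt)
	{
		return encrypt ? pgprot_encrypted(PAGE_KERNEL)
			       : pgprot_decrypted(PAGE_KERNEL);
	}

Likewise, write_cr3(__sme_pa(pgdir)) loads cr3 with the mask set, keeping
the PGD itself encrypted.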

Also, an early initialization function is added for SME.  If SME is active,
this function:
 - Updates the early_pmd_flags so that early page faults create mappings
   with the encryption mask.
 - Updates the __supported_pte_mask to include the encryption mask.
 - Updates the protection_map entries to include the encryption mask so
   that user-space allocations will automatically have the encryption mask
   applied.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/boot/compressed/pagetable.c |7 +
 arch/x86/include/asm/fixmap.h|7 +
 arch/x86/include/asm/mem_encrypt.h   |   13 ++
 arch/x86/include/asm/page_types.h|3 ++
 arch/x86/include/asm/pgtable.h   |9 +++
 arch/x86/include/asm/pgtable_types.h |   45 ++
 arch/x86/include/asm/processor.h |3 ++
 arch/x86/kernel/espfix_64.c  |2 +-
 arch/x86/kernel/head64.c |   11 +++-
 arch/x86/kernel/head_64.S|   20 ---
 arch/x86/mm/kasan_init_64.c  |4 ++-
 arch/x86/mm/mem_encrypt.c|   17 +
 arch/x86/mm/pageattr.c   |3 ++
 include/asm-generic/pgtable.h|   12 +
 include/linux/mem_encrypt.h  |8 ++
 15 files changed, 131 insertions(+), 33 deletions(-)

diff --git a/arch/x86/boot/compressed/pagetable.c 
b/arch/x86/boot/compressed/pagetable.c
index 8e69df9..246bf29 100644
--- a/arch/x86/boot/compressed/pagetable.c
+++ b/arch/x86/boot/compressed/pagetable.c
@@ -15,6 +15,13 @@
 #define __pa(x)  ((unsigned long)(x))
 #define __va(x)  ((void *)((unsigned long)(x)))
 
+/*
+ * The pgtable.h and mm/ident_map.c includes make use of the SME related
+ * information which is not used in the compressed image support. Un-define
+ * the SME support to avoid any compile and link errors.
+ */
+#undef CONFIG_AMD_MEM_ENCRYPT
+
 #include "misc.h"
 
 /* These actually do the work of building the kernel identity maps. */
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index b65155c..d9ff226 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -157,6 +157,13 @@ static inline void __set_fixmap(enum fixed_addresses idx,
 }
 #endif
 
+/*
+ * FIXMAP_PAGE_NOCACHE is used for MMIO. Memory encryption is not
+ * supported for MMIO addresses, so make sure that the memory encryption
+ * mask is not part of the page attributes.
+ */
+#define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_IO_NOCACHE
+
 #include 
 
 #define __late_set_fixmap(idx, phys, flags) __set_fixmap(idx, phys, flags)
diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index 475e34f..dbae7a5 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -21,6 +21,8 @@
 
 extern unsigned long sme_me_mask;
 
+void __init sme_early_init(void);
+
 void __init sme_encrypt_kernel(void);
 void __init sme_enable(void);
 
@@ -28,11 +30,22 @@
 
 #define sme_me_mask	0UL
 
+static inline void __init sme_early_init(void) { }
+
 static inline void __init sme_encrypt_kernel(void) { }
 static inline void __init sme_enable(void) { }
 
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
+/*
+ * The __sme_pa() and __sme_pa_nodebug() macros are meant for use when
+ * writing to or comparing values from the cr3 register.  Having the
+ * encryption mask set in cr3 enables the PGD entry to be encrypted and
+ * avoid special case handling of PGD allocations.
+ */
+#define __sme_pa(x)		(__pa(x) | sme_me_mask)
+#define __sme_pa_nodebug(x)	(__pa_nodebug(x) | sme_me_mask)
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __X86_MEM_ENCRYPT_H__ */
diff --git a/arch/x86/include/asm/page_types.h 
b/arch/x86/include/asm/page_types.h

[PATCH v8 RESEND 18/38] x86/efi: Update EFI pagetable creation to work with SME

2017-06-27 Thread Tom Lendacky
When SME is active, pagetable entries created for EFI need to have the
encryption mask set as necessary.

When the new pagetable pages are allocated they are mapped encrypted. So,
update the efi_pgt value that will be used in cr3 to include the encryption
mask so that the PGD table can be read successfully. The pagetable pages,
as well as the kernel, are also added to the pagetable mappings as
encrypted.
All other EFI mappings are mapped decrypted (tables, etc.).

Reviewed-by: Matt Fleming 
Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/platform/efi/efi_64.c |   15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 9bf72f5..12e8388 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -327,7 +327,7 @@ void efi_sync_low_kernel_mappings(void)
 
 int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 {
-   unsigned long pfn, text;
+   unsigned long pfn, text, pf;
struct page *page;
unsigned npages;
pgd_t *pgd;
@@ -335,7 +335,12 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
if (efi_enabled(EFI_OLD_MEMMAP))
return 0;
 
-   efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);
+   /*
+* Since the PGD is encrypted, set the encryption mask so that when
+* this value is loaded into cr3 the PGD will be decrypted during
+* the pagetable walk.
+*/
+   efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd);
pgd = efi_pgd;
 
/*
@@ -345,7 +350,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
 * phys_efi_set_virtual_address_map().
 */
pfn = pa_memmap >> PAGE_SHIFT;
-   if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, _PAGE_NX | 
_PAGE_RW)) {
+   pf = _PAGE_NX | _PAGE_RW | _PAGE_ENC;
+   if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, pf)) {
pr_err("Error ident-mapping new memmap (0x%lx)!\n", pa_memmap);
return 1;
}
@@ -388,7 +394,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
text = __pa(_text);
pfn = text >> PAGE_SHIFT;
 
-   if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, _PAGE_RW)) {
+   pf = _PAGE_RW | _PAGE_ENC;
+   if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, pf)) {
pr_err("Failed to map kernel text 1:1\n");
return 1;
}



[PATCH v8 RESEND 07/38] x86/mm: Remove phys_to_virt() usage in ioremap()

2017-06-27 Thread Tom Lendacky
Currently there is a check if the address being mapped is in the ISA
range (is_ISA_range()), and if it is, then phys_to_virt() is used to
perform the mapping. When SME is active, the default is to add pagetable
mappings with the encryption bit set unless specifically overridden. The
resulting pagetable mapping from phys_to_virt() will result in a mapping
that has the encryption bit set. With SME, the use of ioremap() is
intended to generate pagetable mappings that do not have the encryption
bit set through the use of the PAGE_KERNEL_IO protection value.

Rather than special case the SME scenario, remove the ISA range check and
usage of phys_to_virt() and have ISA range mappings continue through the
remaining ioremap() path.

Signed-off-by: Tom Lendacky 
---
 arch/x86/mm/ioremap.c |7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 4c1b5fd..bfc3e2d 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -106,12 +107,6 @@ static void __iomem *__ioremap_caller(resource_size_t 
phys_addr,
}
 
/*
-* Don't remap the low PCI/ISA area, it's always mapped..
-*/
-   if (is_ISA_range(phys_addr, last_addr))
-   return (__force void __iomem *)phys_to_virt(phys_addr);
-
-   /*
 * Don't allow anybody to remap normal RAM that we're using..
 */
pfn  = phys_addr >> PAGE_SHIFT;



[PATCH v8 RESEND 11/38] x86/mm: Add SME support for read_cr3_pa()

2017-06-27 Thread Tom Lendacky
The cr3 register entry can contain the SME encryption mask that indicates
the PGD is encrypted.  The encryption mask should not be used when
creating a virtual address from the cr3 register, so remove the SME
encryption mask in the read_cr3_pa() function.

During early boot SME will need to use a native version of read_cr3_pa(),
so create native_read_cr3_pa().
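
Illustratively, a consumer of the fixed-up helper looks like this (sketch):

	static pgd_t *current_pgd_example(void)
	{
		/*
		 * read_cr3_pa() masks off the SME bit, so the result is a
		 * true physical address that __va() can translate.
		 */
		return (pgd_t *)__va(read_cr3_pa());
	}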

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/processor-flags.h |5 +++--
 arch/x86/include/asm/processor.h   |5 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/processor-flags.h 
b/arch/x86/include/asm/processor-flags.h
index 79aa2f9..f5d3e50 100644
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -2,6 +2,7 @@
 #define _ASM_X86_PROCESSOR_FLAGS_H
 
 #include 
+#include 
 
 #ifdef CONFIG_VM86
 #define X86_VM_MASK	X86_EFLAGS_VM
@@ -32,8 +33,8 @@
  * CR3_ADDR_MASK is the mask used by read_cr3_pa().
  */
 #ifdef CONFIG_X86_64
-/* Mask off the address space ID bits. */
-#define CR3_ADDR_MASK 0x7FFFFFFFFFFFF000ull
+/* Mask off the address space ID and SME encryption bits. */
+#define CR3_ADDR_MASK __sme_clr(0x7FFFFFFFFFFFF000ull)
 #define CR3_PCID_MASK 0xFFFull
 #else
 /*
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 8010c97..ab878bd 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -240,6 +240,11 @@ static inline unsigned long read_cr3_pa(void)
return __read_cr3() & CR3_ADDR_MASK;
 }
 
+static inline unsigned long native_read_cr3_pa(void)
+{
+   return __native_read_cr3() & CR3_ADDR_MASK;
+}
+
 static inline void load_cr3(pgd_t *pgdir)
 {
write_cr3(__sme_pa(pgdir));



[PATCH v8 RESEND 05/38] x86/CPU/AMD: Handle SME reduction in physical address size

2017-06-27 Thread Tom Lendacky
When System Memory Encryption (SME) is enabled, the physical address
space is reduced. Adjust the x86_phys_bits value to reflect this
reduction.
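
A worked example of the adjustment (ebx value as reported by early EPYC
parts; exact values vary by product):

	/*
	 * cpuid_ebx(0x8000001f) == 0x16f:
	 *   encryption bit position =  0x16f       & 0x3f == 47
	 *   phys addr reduction     = (0x16f >> 6) & 0x3f ==  5
	 * so a part with 48 physical address bits is left with
	 * x86_phys_bits == 43 once SME is enabled.
	 */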

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/kernel/cpu/amd.c |   10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index c47ceee..5bdcbd4 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -613,15 +613,19 @@ static void early_init_amd(struct cpuinfo_x86 *c)
set_cpu_bug(c, X86_BUG_AMD_E400);
 
/*
-* BIOS support is required for SME. If BIOS has not enabled SME
-* then don't advertise the feature (set in scattered.c)
+* BIOS support is required for SME. If BIOS has enabled SME then
+* adjust x86_phys_bits by the SME physical address space reduction
+* value. If BIOS has not enabled SME then don't advertise the
+* feature (set in scattered.c).
 */
if (cpu_has(c, X86_FEATURE_SME)) {
u64 msr;
 
/* Check if SME is enabled */
rdmsrl(MSR_K8_SYSCFG, msr);
-   if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
+   if (msr & MSR_K8_SYSCFG_MEM_ENCRYPT)
+   c->x86_phys_bits -= (cpuid_ebx(0x8000001f) >> 6) & 0x3f;
+   else
clear_cpu_cap(c, X86_FEATURE_SME);
}
 }



[PATCH v8 RESEND 01/38] x86: Document AMD Secure Memory Encryption (SME)

2017-06-27 Thread Tom Lendacky
Create a Documentation entry to describe the AMD Secure Memory
Encryption (SME) feature and add documentation for the mem_encrypt=
kernel parameter.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 Documentation/admin-guide/kernel-parameters.txt |   11 
 Documentation/x86/amd-memory-encryption.txt |   68 +++
 2 files changed, 79 insertions(+)
 create mode 100644 Documentation/x86/amd-memory-encryption.txt

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 9b0b3de..51e03ee 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2197,6 +2197,17 @@
memory contents and reserves bad memory
regions that are detected.
 
+   mem_encrypt=    [X86-64] AMD Secure Memory Encryption (SME) control
+   Valid arguments: on, off
+   Default (depends on kernel configuration option):
+ on  (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y)
+ off (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=n)
+   mem_encrypt=on: Activate SME
+   mem_encrypt=off:    Do not activate SME
+
+   Refer to Documentation/x86/amd-memory-encryption.txt
+   for details on when memory encryption can be activated.
+
mem_sleep_default=  [SUSPEND] Default system suspend mode:
s2idle  - Suspend-To-Idle
shallow - Power-On Suspend or equivalent (if supported)
diff --git a/Documentation/x86/amd-memory-encryption.txt 
b/Documentation/x86/amd-memory-encryption.txt
new file mode 100644
index 000..f512ab7
--- /dev/null
+++ b/Documentation/x86/amd-memory-encryption.txt
@@ -0,0 +1,68 @@
+Secure Memory Encryption (SME) is a feature found on AMD processors.
+
+SME provides the ability to mark individual pages of memory as encrypted using
+the standard x86 page tables.  A page that is marked encrypted will be
+automatically decrypted when read from DRAM and encrypted when written to
+DRAM.  SME can therefore be used to protect the contents of DRAM from physical
+attacks on the system.
+
+A page is encrypted when a page table entry has the encryption bit set (see
+below on how to determine its position).  The encryption bit can also be
+specified in the cr3 register, allowing the PGD table to be encrypted. Each
+successive level of page tables can also be encrypted by setting the encryption
+bit in the page table entry that points to the next table. This allows the full
+page table hierarchy to be encrypted. Note, this means that just because the
+encryption bit is set in cr3, doesn't imply the full hierarchy is encrypted.
+Each page table entry in the hierarchy needs to have the encryption bit set to
+achieve that. So, theoretically, you could have the encryption bit set in cr3
+so that the PGD is encrypted, but not set the encryption bit in the PGD entry
+for a PUD which results in the PUD pointed to by that entry to not be
+encrypted.
+
+Support for SME can be determined through the CPUID instruction. The CPUID
+function 0x8000001f reports information related to SME:
+
+   0x8000001f[eax]:
+   Bit[0] indicates support for SME
+   0x8000001f[ebx]:
+   Bits[5:0]  pagetable bit number used to activate memory
+  encryption
+   Bits[11:6] reduction in physical address space, in bits, when
+  memory encryption is enabled (this only affects
+  system physical addresses, not guest physical
+  addresses)
+
+If support for SME is present, MSR 0xc0010010 (MSR_K8_SYSCFG) can be used to
+determine if SME is enabled and/or to enable memory encryption:
+
+   0xc0010010:
+   Bit[23]   0 = memory encryption features are disabled
+ 1 = memory encryption features are enabled
+
+Linux relies on BIOS to set this bit if BIOS has determined that the reduction
+in the physical address space as a result of enabling memory encryption (see
+CPUID information above) will not conflict with the address space resource
+requirements for the system.  If this bit is not set upon Linux startup then
+Linux itself will not set it and memory encryption will not be possible.
+
+The state of SME in the Linux kernel can be documented as follows:
+   - Supported:
+ The CPU supports SME (determined through CPUID instruction).
+
+   - Enabled:
+ Supported and bit 23 of MSR_K8_SYSCFG is set.
+
+   - Active:
+ Supported, Enabled and the Linux kernel is actively applying
+ the encryption bit to page table entries (the SME mask in the
+ kernel is non-zero).
+
+SME can also be enabled and 

[PATCH v8 RESEND 04/38] x86/CPU/AMD: Add the Secure Memory Encryption CPU feature

2017-06-27 Thread Tom Lendacky
Update the CPU features to include identifying and reporting on the
Secure Memory Encryption (SME) feature.  SME is identified by CPUID
0x8000001f, but requires BIOS support to enable it (set bit 23 of
MSR_K8_SYSCFG).  Only show the SME feature as available if reported by
CPUID and enabled by BIOS.
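
Once this lands, other code can key off the (BIOS-gated) feature bit in the
usual way (illustrative):

	static void __init report_sme_example(void)
	{
		if (boot_cpu_has(X86_FEATURE_SME))
			pr_info("SME supported and enabled by BIOS\n");
	}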

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/cpufeatures.h |1 +
 arch/x86/include/asm/msr-index.h   |2 ++
 arch/x86/kernel/cpu/amd.c  |   13 +
 arch/x86/kernel/cpu/scattered.c|1 +
 4 files changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 2701e5f..2b692df 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -196,6 +196,7 @@
 
 #define X86_FEATURE_HW_PSTATE  ( 7*32+ 8) /* AMD HW-PState */
 #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
+#define X86_FEATURE_SME		( 7*32+10) /* AMD Secure Memory Encryption */
 
 #define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number 
*/
 #define X86_FEATURE_INTEL_PT   ( 7*32+15) /* Intel Processor Trace */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 18b1623..460ac01 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -352,6 +352,8 @@
 #define MSR_K8_TOP_MEM1		0xc001001a
 #define MSR_K8_TOP_MEM2		0xc001001d
 #define MSR_K8_SYSCFG  0xc0010010
+#define MSR_K8_SYSCFG_MEM_ENCRYPT_BIT  23
+#define MSR_K8_SYSCFG_MEM_ENCRYPT  BIT_ULL(MSR_K8_SYSCFG_MEM_ENCRYPT_BIT)
 #define MSR_K8_INT_PENDING_MSG 0xc0010055
 /* C1E active bits in int pending message */
 #define K8_INTP_C1E_ACTIVE_MASK	0x18000000
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index bb5abe8..c47ceee 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -611,6 +611,19 @@ static void early_init_amd(struct cpuinfo_x86 *c)
 */
if (cpu_has_amd_erratum(c, amd_erratum_400))
set_cpu_bug(c, X86_BUG_AMD_E400);
+
+   /*
+* BIOS support is required for SME. If BIOS has not enabled SME
+* then don't advertise the feature (set in scattered.c)
+*/
+   if (cpu_has(c, X86_FEATURE_SME)) {
+   u64 msr;
+
+   /* Check if SME is enabled */
+   rdmsrl(MSR_K8_SYSCFG, msr);
+   if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
+   clear_cpu_cap(c, X86_FEATURE_SME);
+   }
 }
 
 static void init_amd_k8(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 23c2350..05459ad 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -31,6 +31,7 @@ struct cpuid_bit {
	{ X86_FEATURE_HW_PSTATE,	CPUID_EDX,  7, 0x80000007, 0 },
	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
	{ X86_FEATURE_PROC_FEEDBACK,	CPUID_EDX, 11, 0x80000007, 0 },
+	{ X86_FEATURE_SME,		CPUID_EAX,  0, 0x8000001f, 0 },
{ 0, 0, 0, 0, 0 }
 };
 



[PATCH v8 RESEND 03/38] x86, mpparse, x86/acpi, x86/PCI, x86/dmi, SFI: Use memremap for RAM mappings

2017-06-27 Thread Tom Lendacky
The ioremap() function is intended for mapping MMIO. For RAM, the
memremap() function should be used. Convert calls from ioremap() to
memremap() when re-mapping RAM.

This will be used later by SME to control how the encryption mask is
applied to memory mappings, with certain memory locations being mapped
decrypted vs encrypted.
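
The conversion pattern is mechanical; a representative before/after
(sketch only):

	static void remap_example(phys_addr_t pa, size_t len)
	{
		void __iomem *io;
		void *p;

		/* before: treated RAM like MMIO */
		io = ioremap_cache(pa, len);
		if (io)
			iounmap(io);

		/* after: a proper RAM mapping, made SME-aware later on */
		p = memremap(pa, len, MEMREMAP_WB);
		if (p)
			memunmap(p);
	}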

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/dmi.h   |8 
 arch/x86/kernel/acpi/boot.c  |6 +++---
 arch/x86/kernel/kdebugfs.c   |   34 +++---
 arch/x86/kernel/ksysfs.c |   28 ++--
 arch/x86/kernel/mpparse.c|   10 +-
 arch/x86/pci/common.c|4 ++--
 drivers/firmware/dmi-sysfs.c |5 +++--
 drivers/firmware/pcdp.c  |4 ++--
 drivers/sfi/sfi_core.c   |   22 +++---
 9 files changed, 55 insertions(+), 66 deletions(-)

diff --git a/arch/x86/include/asm/dmi.h b/arch/x86/include/asm/dmi.h
index 3c69fed..a8e15b0 100644
--- a/arch/x86/include/asm/dmi.h
+++ b/arch/x86/include/asm/dmi.h
@@ -13,9 +13,9 @@ static __always_inline __init void *dmi_alloc(unsigned len)
 }
 
 /* Use early IO mappings for DMI because it's initialized early */
-#define dmi_early_remap		early_ioremap
-#define dmi_early_unmap		early_iounmap
-#define dmi_remap	ioremap_cache
-#define dmi_unmap	iounmap
+#define dmi_early_remap		early_memremap
+#define dmi_early_unmap		early_memunmap
+#define dmi_remap(_x, _l)	memremap(_x, _l, MEMREMAP_WB)
+#define dmi_unmap(_x)		memunmap(_x)
 
 #endif /* _ASM_X86_DMI_H */
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 6bb6806..850160a 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -115,7 +115,7 @@
 #define ACPI_INVALID_GSI	INT_MIN
 
 /*
- * This is just a simple wrapper around early_ioremap(),
+ * This is just a simple wrapper around early_memremap(),
  * with sanity checks for phys == 0 and size == 0.
  */
 char *__init __acpi_map_table(unsigned long phys, unsigned long size)
@@ -124,7 +124,7 @@ char *__init __acpi_map_table(unsigned long phys, unsigned 
long size)
if (!phys || !size)
return NULL;
 
-   return early_ioremap(phys, size);
+   return early_memremap(phys, size);
 }
 
 void __init __acpi_unmap_table(char *map, unsigned long size)
@@ -132,7 +132,7 @@ void __init __acpi_unmap_table(char *map, unsigned long 
size)
if (!map || !size)
return;
 
-   early_iounmap(map, size);
+   early_memunmap(map, size);
 }
 
 #ifdef CONFIG_X86_LOCAL_APIC
diff --git a/arch/x86/kernel/kdebugfs.c b/arch/x86/kernel/kdebugfs.c
index 38b6458..fd6f8fb 100644
--- a/arch/x86/kernel/kdebugfs.c
+++ b/arch/x86/kernel/kdebugfs.c
@@ -33,7 +33,6 @@ static ssize_t setup_data_read(struct file *file, char __user 
*user_buf,
struct setup_data_node *node = file->private_data;
unsigned long remain;
loff_t pos = *ppos;
-   struct page *pg;
void *p;
u64 pa;
 
@@ -47,18 +46,13 @@ static ssize_t setup_data_read(struct file *file, char 
__user *user_buf,
count = node->len - pos;
 
pa = node->paddr + sizeof(struct setup_data) + pos;
-   pg = pfn_to_page((pa + count - 1) >> PAGE_SHIFT);
-   if (PageHighMem(pg)) {
-   p = ioremap_cache(pa, count);
-   if (!p)
-   return -ENXIO;
-   } else
-   p = __va(pa);
+   p = memremap(pa, count, MEMREMAP_WB);
+   if (!p)
+   return -ENOMEM;
 
remain = copy_to_user(user_buf, p, count);
 
-   if (PageHighMem(pg))
-   iounmap(p);
+   memunmap(p);
 
if (remain)
return -EFAULT;
@@ -109,7 +103,6 @@ static int __init create_setup_data_nodes(struct dentry 
*parent)
struct setup_data *data;
int error;
struct dentry *d;
-   struct page *pg;
u64 pa_data;
int no = 0;
 
@@ -126,16 +119,12 @@ static int __init create_setup_data_nodes(struct dentry 
*parent)
goto err_dir;
}
 
-   pg = pfn_to_page((pa_data+sizeof(*data)-1) >> PAGE_SHIFT);
-   if (PageHighMem(pg)) {
-   data = ioremap_cache(pa_data, sizeof(*data));
-   if (!data) {
-   kfree(node);
-   error = -ENXIO;
-   goto err_dir;
-   }
-   } else
-   data = __va(pa_data);
+   data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
+   if (!data) {
+   kfree(node);
+   error = -ENOMEM;
+   goto err_dir;
+   }
 
node->paddr 

[PATCH v8 RESEND 02/38] x86/mm/pat: Set write-protect cache mode for full PAT support

2017-06-27 Thread Tom Lendacky
For processors that support PAT, set the write-protect cache mode
(_PAGE_CACHE_MODE_WP) entry to the actual write-protect value (0x05).
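
With WP usable, a write-protected kernel mapping can be composed like any
other cache mode (sketch; the helpers that rely on this arrive later in the
series):

	static pgprot_t wp_prot_example(void)
	{
		return __pgprot(__PAGE_KERNEL |
				cachemode2protval(_PAGE_CACHE_MODE_WP));
	}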

Acked-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/mm/pat.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 9b78685..6753d9c 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -295,7 +295,7 @@ static void init_cache_modes(void)
  * pat_init - Initialize PAT MSR and PAT table
  *
  * This function initializes PAT MSR and PAT table with an OS-defined value
- * to enable additional cache attributes, WC and WT.
+ * to enable additional cache attributes, WC, WT and WP.
  *
  * This function must be called on all CPUs using the specific sequence of
  * operations defined in Intel SDM. mtrr_rendezvous_handler() provides this
@@ -356,7 +356,7 @@ void pat_init(void)
 *  010  2  UC-: _PAGE_CACHE_MODE_UC_MINUS
 *  011  3  UC : _PAGE_CACHE_MODE_UC
 *  100  4  WB : Reserved
-*  101  5  WC : Reserved
+*  101  5  WP : _PAGE_CACHE_MODE_WP
 *  110  6  UC-: Reserved
 *  111  7  WT : _PAGE_CACHE_MODE_WT
 *
@@ -364,7 +364,7 @@ void pat_init(void)
 * corresponding types in the presence of PAT errata.
 */
pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
- PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, WT);
+ PAT(4, WB) | PAT(5, WP) | PAT(6, UC_MINUS) | PAT(7, WT);
}
 
if (!boot_cpu_done) {



[PATCH v8 16/38] efi: Add an EFI table address match function

2017-06-27 Thread Tom Lendacky
Add a function that will determine if a supplied physical address matches
the address of an EFI table.
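
Illustrative caller, a simplified form of the check used later in the
series when deciding whether firmware data must be mapped decrypted:

	static bool needs_decrypted_mapping(unsigned long paddr)
	{
		/* EFI tables were written by firmware, i.e. unencrypted */
		return efi_is_table_address(paddr);
	}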

Reviewed-by: Matt Fleming 
Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 drivers/firmware/efi/efi.c |   33 +
 include/linux/efi.h|7 +++
 2 files changed, 40 insertions(+)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 045d6d3..69d4d13 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -55,6 +55,25 @@ struct efi __read_mostly efi = {
 };
 EXPORT_SYMBOL(efi);
 
+static unsigned long *efi_tables[] = {
+	&efi.mps,
+	&efi.acpi,
+	&efi.acpi20,
+	&efi.smbios,
+	&efi.smbios3,
+	&efi.sal_systab,
+	&efi.boot_info,
+	&efi.hcdp,
+	&efi.uga,
+	&efi.uv_systab,
+	&efi.fw_vendor,
+	&efi.runtime,
+	&efi.config_table,
+	&efi.esrt,
+	&efi.properties_table,
+	&efi.mem_attr_table,
+};
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
@@ -855,6 +874,20 @@ int efi_status_to_err(efi_status_t status)
return err;
 }
 
+bool efi_is_table_address(unsigned long phys_addr)
+{
+   unsigned int i;
+
+   if (phys_addr == EFI_INVALID_TABLE_ADDR)
+   return false;
+
+   for (i = 0; i < ARRAY_SIZE(efi_tables); i++)
+   if (*(efi_tables[i]) == phys_addr)
+   return true;
+
+   return false;
+}
+
 #ifdef CONFIG_KEXEC
 static int update_efi_random_seed(struct notifier_block *nb,
  unsigned long code, void *unused)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 8269bcb..8e24f09 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1091,6 +1091,8 @@ static inline bool efi_enabled(int feature)
	return test_bit(feature, &efi.flags) != 0;
 }
 extern void efi_reboot(enum reboot_mode reboot_mode, const char *__unused);
+
+extern bool efi_is_table_address(unsigned long phys_addr);
 #else
 static inline bool efi_enabled(int feature)
 {
@@ -1104,6 +1106,11 @@ static inline bool efi_enabled(int feature)
 {
return false;
 }
+
+static inline bool efi_is_table_address(unsigned long phys_addr)
+{
+   return false;
+}
 #endif
 
 extern int efi_status_to_err(efi_status_t status);



[PATCH v8 22/38] x86/mm: Add support for changing the memory encryption attribute

2017-06-27 Thread Tom Lendacky
Add support for changing the memory encryption attribute for one or more
memory pages. This will be useful when we have to change the AP trampoline
area to not be encrypted. Or when we need to change the SWIOTLB area to
not be encrypted in support of devices that can't support the encryption
mask range.
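
Illustrative use (the actual SWIOTLB and trampoline conversions come later
in the series):

	static int share_buffer_with_device(void *buf, int npages)
	{
		/*
		 * Clear the encryption attribute so a non-SME-aware device
		 * sees the same bytes the CPU writes.
		 */
		return set_memory_decrypted((unsigned long)buf, npages);
	}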

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/set_memory.h |3 ++
 arch/x86/mm/pageattr.c|   62 +
 2 files changed, 65 insertions(+)

diff --git a/arch/x86/include/asm/set_memory.h 
b/arch/x86/include/asm/set_memory.h
index eaec6c3..cd71273 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -11,6 +11,7 @@
 * Executability : eXecutable, NoteXecutable
  * Read/Write: ReadOnly, ReadWrite
  * Presence  : NotPresent
+ * Encryption: Encrypted, Decrypted
  *
  * Within a category, the attributes are mutually exclusive.
  *
@@ -42,6 +43,8 @@
 int set_memory_wb(unsigned long addr, int numpages);
 int set_memory_np(unsigned long addr, int numpages);
 int set_memory_4k(unsigned long addr, int numpages);
+int set_memory_encrypted(unsigned long addr, int numpages);
+int set_memory_decrypted(unsigned long addr, int numpages);
 
 int set_memory_array_uc(unsigned long *addr, int addrinarray);
 int set_memory_array_wc(unsigned long *addr, int addrinarray);
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index e7d3866..d9e09fb 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1769,6 +1769,68 @@ int set_memory_4k(unsigned long addr, int numpages)
__pgprot(0), 1, 0, NULL);
 }
 
+static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
+{
+   struct cpa_data cpa;
+   unsigned long start;
+   int ret;
+
+   /* Nothing to do if the SME is not active */
+   if (!sme_active())
+   return 0;
+
+   /* Should not be working on unaligned addresses */
+   if (WARN_ONCE(addr & ~PAGE_MASK, "misaligned address: %#lx\n", addr))
+   addr &= PAGE_MASK;
+
+   start = addr;
+
+	memset(&cpa, 0, sizeof(cpa));
+	cpa.vaddr = &addr;
+   cpa.numpages = numpages;
+   cpa.mask_set = enc ? __pgprot(_PAGE_ENC) : __pgprot(0);
+   cpa.mask_clr = enc ? __pgprot(0) : __pgprot(_PAGE_ENC);
+   cpa.pgd = init_mm.pgd;
+
+   /* Must avoid aliasing mappings in the highmem code */
+   kmap_flush_unused();
+   vm_unmap_aliases();
+
+   /*
+* Before changing the encryption attribute, we need to flush caches.
+*/
+   if (static_cpu_has(X86_FEATURE_CLFLUSH))
+   cpa_flush_range(start, numpages, 1);
+   else
+   cpa_flush_all(1);
+
+	ret = __change_page_attr_set_clr(&cpa, 1);
+
+   /*
+* After changing the encryption attribute, we need to flush TLBs
+* again in case any speculative TLB caching occurred (but no need
+* to flush caches again).  We could just use cpa_flush_all(), but
+* in case TLB flushing gets optimized in the cpa_flush_range()
+* path use the same logic as above.
+*/
+   if (static_cpu_has(X86_FEATURE_CLFLUSH))
+   cpa_flush_range(start, numpages, 0);
+   else
+   cpa_flush_all(0);
+
+   return ret;
+}
+
+int set_memory_encrypted(unsigned long addr, int numpages)
+{
+   return __set_memory_enc_dec(addr, numpages, true);
+}
+
+int set_memory_decrypted(unsigned long addr, int numpages)
+{
+   return __set_memory_enc_dec(addr, numpages, false);
+}
+
 int set_pages_uc(struct page *page, int numpages)
 {
unsigned long addr = (unsigned long)page_address(page);



[PATCH v8 20/38] x86, mpparse: Use memremap to map the mpf and mpc data

2017-06-27 Thread Tom Lendacky
The SMP MP-table is built by UEFI and placed in memory in a decrypted
state. These tables are accessed using a mix of early_memremap(),
early_memunmap(), phys_to_virt() and virt_to_phys(). Change all accesses
to use early_memremap()/early_memunmap(). This allows for proper setting
of the encryption mask so that the data can be successfully accessed when
SME is active.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/kernel/mpparse.c |   98 -
 1 file changed, 70 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index fd37f39..5cbb317 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -429,7 +429,7 @@ static inline void __init construct_default_ISA_mptable(int mpc_default_type)
}
 }
 
-static struct mpf_intel *mpf_found;
+static unsigned long mpf_base;
 
 static unsigned long __init get_mpc_size(unsigned long physptr)
 {
@@ -451,6 +451,7 @@ static int __init check_physptr(struct mpf_intel *mpf, unsigned int early)
 
size = get_mpc_size(mpf->physptr);
mpc = early_memremap(mpf->physptr, size);
+
/*
 * Read the physical hardware table.  Anything here will
 * override the defaults.
@@ -497,12 +498,12 @@ static int __init check_physptr(struct mpf_intel *mpf, unsigned int early)
  */
 void __init default_get_smp_config(unsigned int early)
 {
-   struct mpf_intel *mpf = mpf_found;
+   struct mpf_intel *mpf;
 
if (!smp_found_config)
return;
 
-   if (!mpf)
+   if (!mpf_base)
return;
 
if (acpi_lapic && early)
@@ -515,6 +516,12 @@ void __init default_get_smp_config(unsigned int early)
if (acpi_lapic && acpi_ioapic)
return;
 
+   mpf = early_memremap(mpf_base, sizeof(*mpf));
+   if (!mpf) {
+   pr_err("MPTABLE: error mapping MP table\n");
+   return;
+   }
+
pr_info("Intel MultiProcessor Specification v1.%d\n",
mpf->specification);
 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86_32)
@@ -529,7 +536,7 @@ void __init default_get_smp_config(unsigned int early)
/*
 * Now see if we need to read further.
 */
-   if (mpf->feature1 != 0) {
+   if (mpf->feature1) {
if (early) {
/*
 * local APIC has default address
@@ -542,8 +549,10 @@ void __init default_get_smp_config(unsigned int early)
construct_default_ISA_mptable(mpf->feature1);
 
} else if (mpf->physptr) {
-   if (check_physptr(mpf, early))
+   if (check_physptr(mpf, early)) {
+   early_memunmap(mpf, sizeof(*mpf));
return;
+   }
} else
BUG();
 
@@ -552,6 +561,8 @@ void __init default_get_smp_config(unsigned int early)
/*
 * Only use the first configuration found.
 */
+
+   early_memunmap(mpf, sizeof(*mpf));
 }
 
 static void __init smp_reserve_memory(struct mpf_intel *mpf)
@@ -561,15 +572,16 @@ static void __init smp_reserve_memory(struct mpf_intel *mpf)
 
 static int __init smp_scan_config(unsigned long base, unsigned long length)
 {
-   unsigned int *bp = phys_to_virt(base);
+   unsigned int *bp;
struct mpf_intel *mpf;
-   unsigned long mem;
+   int ret = 0;
 
apic_printk(APIC_VERBOSE, "Scan for SMP in [mem %#010lx-%#010lx]\n",
base, base + length - 1);
BUILD_BUG_ON(sizeof(*mpf) != 16);
 
while (length > 0) {
+   bp = early_memremap(base, length);
mpf = (struct mpf_intel *)bp;
if ((*bp == SMP_MAGIC_IDENT) &&
(mpf->length == 1) &&
@@ -579,24 +591,26 @@ static int __init smp_scan_config(unsigned long base, unsigned long length)
 #ifdef CONFIG_X86_LOCAL_APIC
smp_found_config = 1;
 #endif
-   mpf_found = mpf;
+   mpf_base = base;
 
-   pr_info("found SMP MP-table at [mem %#010llx-%#010llx] mapped at [%p]\n",
-   (unsigned long long) virt_to_phys(mpf),
-   (unsigned long long) virt_to_phys(mpf) +
-   sizeof(*mpf) - 1, mpf);
+   pr_info("found SMP MP-table at [mem %#010lx-%#010lx] mapped at [%p]\n",
+   base, base + sizeof(*mpf) - 1, mpf);
 
-   mem = virt_to_phys(mpf);
-   memblock_reserve(mem, sizeof(*mpf));
+   memblock_reserve(base, sizeof(*mpf));
if (mpf->physptr)
smp_reserve_memory(mpf);
 
-   return 1;
+   ret = 1;
}
-
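
The conversion follows one map/use/unmap pattern throughout. A minimal
sketch of that pattern, using the mpf_base value saved above (illustrative
only, not a new hunk):

	struct mpf_intel *mpf;

	mpf = early_memremap(mpf_base, sizeof(*mpf));
	if (!mpf) {
		pr_err("MPTABLE: error mapping MP table\n");
		return;
	}
	/* ... consume mpf->physptr, mpf->feature1, ... */
	early_memunmap(mpf, sizeof(*mpf));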

[PATCH v8 17/38] efi: Update efi_mem_type() to return an error rather than 0

2017-06-27 Thread Tom Lendacky
The efi_mem_type() function currently returns a 0, which maps to
EFI_RESERVED_TYPE, if the function is unable to find a memmap entry for
the supplied physical address. Returning EFI_RESERVED_TYPE implies that
a memmap entry exists, when it doesn't.  Instead of returning 0, change
the function to return a negative error value when no memmap entry is
found.

Reviewed-by: Matt Fleming 
Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/ia64/kernel/efi.c  |4 ++--
 arch/x86/platform/efi/efi.c |6 +++---
 include/linux/efi.h |2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
index 1212956..8141600 100644
--- a/arch/ia64/kernel/efi.c
+++ b/arch/ia64/kernel/efi.c
@@ -757,14 +757,14 @@ static void __init handle_palo(unsigned long phys_addr)
return 0;
 }
 
-u32
+int
 efi_mem_type (unsigned long phys_addr)
 {
efi_memory_desc_t *md = efi_memory_descriptor(phys_addr);
 
if (md)
return md->type;
-   return 0;
+   return -EINVAL;
 }
 
 u64
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index f084d87..6217b23 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -1035,12 +1035,12 @@ void __init efi_enter_virtual_mode(void)
 /*
  * Convenience functions to obtain memory types and attributes
  */
-u32 efi_mem_type(unsigned long phys_addr)
+int efi_mem_type(unsigned long phys_addr)
 {
efi_memory_desc_t *md;
 
if (!efi_enabled(EFI_MEMMAP))
-   return 0;
+   return -ENOTSUPP;
 
for_each_efi_memory_desc(md) {
if ((md->phys_addr <= phys_addr) &&
@@ -1048,7 +1048,7 @@ u32 efi_mem_type(unsigned long phys_addr)
	    (phys_addr < (md->phys_addr +
			  (md->num_pages << EFI_PAGE_SHIFT))))
return md->type;
}
-   return 0;
+   return -EINVAL;
 }
 
 static int __init arch_parse_efi_cmdline(char *str)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 8e24f09..4e47f78 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -985,7 +985,7 @@ static inline void efi_esrt_init(void) { }
 extern int efi_config_parse_tables(void *config_tables, int count, int sz,
   efi_config_table_type_t *arch_tables);
 extern u64 efi_get_iobase (void);
-extern u32 efi_mem_type (unsigned long phys_addr);
+extern int efi_mem_type(unsigned long phys_addr);
 extern u64 efi_mem_attributes (unsigned long phys_addr);
 extern u64 efi_mem_attribute (unsigned long phys_addr, unsigned long size);
 extern int __init efi_uart_console_only (void);



[PATCH v8 18/38] x86/efi: Update EFI pagetable creation to work with SME

2017-06-27 Thread Tom Lendacky
When SME is active, pagetable entries created for EFI need to have the
encryption mask set as necessary.

When the new pagetable pages are allocated they are mapped encrypted. So,
update the efi_pgt value that will be used in cr3 to include the encryption
mask so that the PGD table can be read successfully. The pagetable pages
as well as the kernel are added to the EFI pagetable as encrypted mappings.
All other EFI mappings are created decrypted (tables, etc.).

Reviewed-by: Matt Fleming 
Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/platform/efi/efi_64.c |   15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 9bf72f5..12e8388 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -327,7 +327,7 @@ void efi_sync_low_kernel_mappings(void)
 
 int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 {
-   unsigned long pfn, text;
+   unsigned long pfn, text, pf;
struct page *page;
unsigned npages;
pgd_t *pgd;
@@ -335,7 +335,12 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
if (efi_enabled(EFI_OLD_MEMMAP))
return 0;
 
-   efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);
+   /*
+* Since the PGD is encrypted, set the encryption mask so that when
+* this value is loaded into cr3 the PGD will be decrypted during
+* the pagetable walk.
+*/
+   efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd);
pgd = efi_pgd;
 
/*
@@ -345,7 +350,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 * phys_efi_set_virtual_address_map().
 */
pfn = pa_memmap >> PAGE_SHIFT;
-   if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, _PAGE_NX | _PAGE_RW)) {
+   pf = _PAGE_NX | _PAGE_RW | _PAGE_ENC;
+   if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, pf)) {
pr_err("Error ident-mapping new memmap (0x%lx)!\n", pa_memmap);
return 1;
}
@@ -388,7 +394,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
text = __pa(_text);
pfn = text >> PAGE_SHIFT;
 
-   if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, _PAGE_RW)) {
+   pf = _PAGE_RW | _PAGE_ENC;
+   if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, pf)) {
pr_err("Failed to map kernel text 1:1\n");
return 1;
}



[PATCH v8 07/38] x86/mm: Remove phys_to_virt() usage in ioremap()

2017-06-27 Thread Tom Lendacky
Currently there is a check if the address being mapped is in the ISA
range (is_ISA_range()), and if it is, then phys_to_virt() is used to
perform the mapping. When SME is active, the default is to add pagetable
mappings with the encryption bit set unless specifically overridden. The
resulting pagetable mapping from phys_to_virt() will result in a mapping
that has the encryption bit set. With SME, the use of ioremap() is
intended to generate pagetable mappings that do not have the encryption
bit set through the use of the PAGE_KERNEL_IO protection value.

Rather than special case the SME scenario, remove the ISA range check and
usage of phys_to_virt() and have ISA range mappings continue through the
remaining ioremap() path.

Signed-off-by: Tom Lendacky 
---
 arch/x86/mm/ioremap.c |7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 4c1b5fd..bfc3e2d 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -106,12 +107,6 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
}
 
/*
-* Don't remap the low PCI/ISA area, it's always mapped..
-*/
-   if (is_ISA_range(phys_addr, last_addr))
-   return (__force void __iomem *)phys_to_virt(phys_addr);
-
-   /*
 * Don't allow anybody to remap normal RAM that we're using..
 */
pfn  = phys_addr >> PAGE_SHIFT;
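
To make the behavioural change concrete (an illustrative call, not taken
from the patch): a request such as

	void __iomem *vga = ioremap(0xa0000, 0x10000);

previously returned phys_to_virt(0xa0000) directly; it now goes through the
regular __ioremap_caller() path and is created with PAGE_KERNEL_IO, i.e.
without the encryption bit when SME is active.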



[PATCH v8 19/38] x86/mm: Add support to access boot related data in the clear

2017-06-27 Thread Tom Lendacky
Boot data (such as EFI related data) is not encrypted when the system is
booted because UEFI/BIOS does not run with SME active. In order to access
this data properly it needs to be mapped decrypted.

Update early_memremap() to provide an arch specific routine to modify the
pagetable protection attributes before they are applied to the new
mapping. This is used to remove the encryption mask for boot related data.

Update memremap() to provide an arch specific routine to determine if RAM
remapping is allowed.  RAM remapping will cause an encrypted mapping to be
generated. By preventing RAM remapping, ioremap_cache() will be used
instead, which will provide a decrypted mapping of the boot related data.

Reviewed-by: Matt Fleming 
Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/io.h |5 +
 arch/x86/mm/ioremap.c |  179 +
 include/linux/io.h|2 +
 kernel/memremap.c |   20 -
 mm/early_ioremap.c|   18 -
 5 files changed, 217 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 7afb0e2..09c5557 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -381,4 +381,9 @@ extern int __must_check arch_phys_wc_add(unsigned long base,
 #define arch_io_reserve_memtype_wc arch_io_reserve_memtype_wc
 #endif
 
+extern bool arch_memremap_can_ram_remap(resource_size_t offset,
+   unsigned long size,
+   unsigned long flags);
+#define arch_memremap_can_ram_remap arch_memremap_can_ram_remap
+
 #endif /* _ASM_X86_IO_H */
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 26db273..ee33838 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -22,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "physaddr.h"
 
@@ -414,6 +416,183 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
 }
 
+/*
+ * Examine the physical address to determine if it is an area of memory
+ * that should be mapped decrypted.  If the memory is not part of the
+ * kernel usable area it was accessed and created decrypted, so these
+ * areas should be mapped decrypted.
+ */
+static bool memremap_should_map_decrypted(resource_size_t phys_addr,
+ unsigned long size)
+{
+   /* Check if the address is outside kernel usable area */
+   switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
+   case E820_TYPE_RESERVED:
+   case E820_TYPE_ACPI:
+   case E820_TYPE_NVS:
+   case E820_TYPE_UNUSABLE:
+   return true;
+   default:
+   break;
+   }
+
+   return false;
+}
+
+/*
+ * Examine the physical address to determine if it is EFI data. Check
+ * it against the boot params structure and EFI tables and memory types.
+ */
+static bool memremap_is_efi_data(resource_size_t phys_addr,
+unsigned long size)
+{
+   u64 paddr;
+
+   /* Check if the address is part of EFI boot/runtime data */
+   if (!efi_enabled(EFI_BOOT))
+   return false;
+
+   paddr = boot_params.efi_info.efi_memmap_hi;
+   paddr <<= 32;
+   paddr |= boot_params.efi_info.efi_memmap;
+   if (phys_addr == paddr)
+   return true;
+
+   paddr = boot_params.efi_info.efi_systab_hi;
+   paddr <<= 32;
+   paddr |= boot_params.efi_info.efi_systab;
+   if (phys_addr == paddr)
+   return true;
+
+   if (efi_is_table_address(phys_addr))
+   return true;
+
+   switch (efi_mem_type(phys_addr)) {
+   case EFI_BOOT_SERVICES_DATA:
+   case EFI_RUNTIME_SERVICES_DATA:
+   return true;
+   default:
+   break;
+   }
+
+   return false;
+}
+
+/*
+ * Examine the physical address to determine if it is boot data by checking
+ * it against the boot params setup_data chain.
+ */
+static bool memremap_is_setup_data(resource_size_t phys_addr,
+  unsigned long size)
+{
+   struct setup_data *data;
+   u64 paddr, paddr_next;
+
+   paddr = boot_params.hdr.setup_data;
+   while (paddr) {
+   unsigned int len;
+
+   if (phys_addr == paddr)
+   return true;
+
+   data = memremap(paddr, sizeof(*data),
+   MEMREMAP_WB | MEMREMAP_DEC);
+
+   paddr_next = data->next;
+   len = data->len;
+
+   memunmap(data);
+
+   if ((phys_addr > paddr) && (phys_addr < (paddr + len)))
+   return true;
+
+   paddr = paddr_next;
+   }
+
+   return false;
+}
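
On the memremap() side, the flow this enables can be sketched as follows (a
simplified view of the kernel/memremap.c logic, not the literal code):

	/* A MEMREMAP_WB request over RAM consults the arch first; a veto
	 * falls back to ioremap_cache(), i.e. a decrypted mapping. */
	if (is_ram == REGION_INTERSECTS &&
	    !arch_memremap_can_ram_remap(offset, size, flags))
		addr = ioremap_cache(offset, size);	/* decrypted */
	else if (is_ram == REGION_INTERSECTS)
		addr = try_ram_remap(offset, size);	/* direct map, encrypted */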

[PATCH v8 10/38] x86/mm: Provide general kernel support for memory encryption

2017-06-27 Thread Tom Lendacky
Changes to the existing page table macros will allow the SME support to
be enabled in a simple fashion with minimal changes to files that use these
macros.  Since the memory encryption mask will now be part of the regular
pagetable macros, we introduce two new macros (_PAGE_TABLE_NOENC and
_KERNPG_TABLE_NOENC) to allow for early pagetable creation/initialization
without the encryption mask before SME becomes active.  Two new pgprot()
macros are defined to allow setting or clearing the page encryption mask.

The FIXMAP_PAGE_NOCACHE define is introduced for use with MMIO.  SME does
not support encryption for MMIO areas so this define removes the encryption
mask from the page attribute.

Two new macros are introduced (__sme_pa() / __sme_pa_nodebug()) to allow
creating a physical address with the encryption mask.  These are used when
working with the cr3 register so that the PGD can be encrypted. The current
__va() macro is updated so that the virtual address is generated based off
of the physical address without the encryption mask thus allowing the same
virtual address to be generated regardless of whether encryption is enabled
for that physical location or not.

Also, an early initialization function is added for SME.  If SME is active,
this function:
 - Updates the early_pmd_flags so that early page faults create mappings
   with the encryption mask.
 - Updates the __supported_pte_mask to include the encryption mask.
 - Updates the protection_map entries to include the encryption mask so
   that user-space allocations will automatically have the encryption mask
   applied.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/boot/compressed/pagetable.c |7 +
 arch/x86/include/asm/fixmap.h|7 +
 arch/x86/include/asm/mem_encrypt.h   |   13 ++
 arch/x86/include/asm/page_types.h|3 ++
 arch/x86/include/asm/pgtable.h   |9 +++
 arch/x86/include/asm/pgtable_types.h |   45 ++
 arch/x86/include/asm/processor.h |3 ++
 arch/x86/kernel/espfix_64.c  |2 +-
 arch/x86/kernel/head64.c |   11 +++-
 arch/x86/kernel/head_64.S|   20 ---
 arch/x86/mm/kasan_init_64.c  |4 ++-
 arch/x86/mm/mem_encrypt.c|   17 +
 arch/x86/mm/pageattr.c   |3 ++
 include/asm-generic/pgtable.h|   12 +
 include/linux/mem_encrypt.h  |8 ++
 15 files changed, 131 insertions(+), 33 deletions(-)

diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
index 8e69df9..246bf29 100644
--- a/arch/x86/boot/compressed/pagetable.c
+++ b/arch/x86/boot/compressed/pagetable.c
@@ -15,6 +15,13 @@
 #define __pa(x)  ((unsigned long)(x))
 #define __va(x)  ((void *)((unsigned long)(x)))
 
+/*
+ * The pgtable.h and mm/ident_map.c includes make use of the SME related
+ * information which is not used in the compressed image support. Un-define
+ * the SME support to avoid any compile and link errors.
+ */
+#undef CONFIG_AMD_MEM_ENCRYPT
+
 #include "misc.h"
 
 /* These actually do the work of building the kernel identity maps. */
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index b65155c..d9ff226 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -157,6 +157,13 @@ static inline void __set_fixmap(enum fixed_addresses idx,
 }
 #endif
 
+/*
+ * FIXMAP_PAGE_NOCACHE is used for MMIO. Memory encryption is not
+ * supported for MMIO addresses, so make sure that the memory encryption
+ * mask is not part of the page attributes.
+ */
+#define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_IO_NOCACHE
+
 #include 
 
 #define __late_set_fixmap(idx, phys, flags) __set_fixmap(idx, phys, flags)
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 475e34f..dbae7a5 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -21,6 +21,8 @@
 
 extern unsigned long sme_me_mask;
 
+void __init sme_early_init(void);
+
 void __init sme_encrypt_kernel(void);
 void __init sme_enable(void);
 
@@ -28,11 +30,22 @@
 
 #define sme_me_mask	0UL
 
+static inline void __init sme_early_init(void) { }
+
 static inline void __init sme_encrypt_kernel(void) { }
 static inline void __init sme_enable(void) { }
 
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
+/*
+ * The __sme_pa() and __sme_pa_nodebug() macros are meant for use when
+ * writing to or comparing values from the cr3 register.  Having the
+ * encryption mask set in cr3 enables the PGD entry to be encrypted and
+ * avoid special case handling of PGD allocations.
+ */
+#define __sme_pa(x)(__pa(x) | sme_me_mask)
+#define __sme_pa_nodebug(x)(__pa_nodebug(x) | sme_me_mask)
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __X86_MEM_ENCRYPT_H__ */
diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h

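Two illustrative uses of the new helpers (sketches, not lines from this
patch):

	/* Load a PGD with the encryption mask applied, e.g. when writing
	 * cr3 (load_cr3() ends up doing exactly this): */
	write_cr3(__sme_pa(init_mm.pgd));

	/* Strip the encryption mask from a protection value when a
	 * decrypted mapping is required: */
	pgprot_t prot = pgprot_decrypted(PAGE_KERNEL);
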
[PATCH v8 11/38] x86/mm: Add SME support for read_cr3_pa()

2017-06-27 Thread Tom Lendacky
The cr3 register entry can contain the SME encryption mask that indicates
the PGD is encrypted.  The encryption mask should not be used when
creating a virtual address from the cr3 register, so remove the SME
encryption mask in the read_cr3_pa() function.

During early boot SME will need to use a native version of read_cr3_pa(),
so create native_read_cr3_pa().

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/processor-flags.h |5 +++--
 arch/x86/include/asm/processor.h   |5 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h
index 79aa2f9..f5d3e50 100644
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -2,6 +2,7 @@
 #define _ASM_X86_PROCESSOR_FLAGS_H
 
 #include 
+#include <linux/mem_encrypt.h>
 
 #ifdef CONFIG_VM86
 #define X86_VM_MASK	X86_EFLAGS_VM
@@ -32,8 +33,8 @@
  * CR3_ADDR_MASK is the mask used by read_cr3_pa().
  */
 #ifdef CONFIG_X86_64
-/* Mask off the address space ID bits. */
-#define CR3_ADDR_MASK 0x7FFFFFFFFFFFF000ull
+/* Mask off the address space ID and SME encryption bits. */
+#define CR3_ADDR_MASK __sme_clr(0x7FFFFFFFFFFFF000ull)
 #define CR3_PCID_MASK 0xFFFull
 #else
 /*
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 8010c97..ab878bd 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -240,6 +240,11 @@ static inline unsigned long read_cr3_pa(void)
return __read_cr3() & CR3_ADDR_MASK;
 }
 
+static inline unsigned long native_read_cr3_pa(void)
+{
+   return __native_read_cr3() & CR3_ADDR_MASK;
+}
+
 static inline void load_cr3(pgd_t *pgdir)
 {
write_cr3(__sme_pa(pgdir));
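
A small worked example of the masking (bit 47 as the C-bit position is only
an assumption for illustration; the real position is CPUID-reported):

	/*
	 * cr3 = pgd_pa | sme_me_mask	e.g. 0x0000800012345000
	 * cr3 & CR3_ADDR_MASK		->   0x0000000012345000
	 *
	 * __sme_clr() removes the C-bit from the constant mask itself, so
	 * read_cr3_pa() never reports it as part of the address.
	 */
	unsigned long pgd_pa = read_cr3_pa();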



[PATCH v8 15/38] x86/boot/e820: Add support to determine the E820 type of an address

2017-06-27 Thread Tom Lendacky
Add a function that will return the E820 type associated with an address
range.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/e820/api.h |2 ++
 arch/x86/kernel/e820.c  |   26 +++---
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h
index 8e0f8b8..3641f5f 100644
--- a/arch/x86/include/asm/e820/api.h
+++ b/arch/x86/include/asm/e820/api.h
@@ -38,6 +38,8 @@
 extern void e820__reallocate_tables(void);
 extern void e820__register_nosave_regions(unsigned long limit_pfn);
 
+extern int  e820__get_entry_type(u64 start, u64 end);
+
 /*
  * Returns true iff the specified range [start,end) is completely contained inside
  * the ISA region.
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index d78a586..46c9b65 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -84,7 +84,8 @@ bool e820__mapped_any(u64 start, u64 end, enum e820_type type)
  * Note: this function only works correctly once the E820 table is sorted and
  * not-overlapping (at least for the range specified), which is the case normally.
  */
-bool __init e820__mapped_all(u64 start, u64 end, enum e820_type type)
+static struct e820_entry *__e820__mapped_all(u64 start, u64 end,
+enum e820_type type)
 {
int i;
 
@@ -110,9 +111,28 @@ bool __init e820__mapped_all(u64 start, u64 end, enum e820_type type)
 * coverage of the desired range exists:
 */
if (start >= end)
-   return 1;
+   return entry;
}
-   return 0;
+
+   return NULL;
+}
+
+/*
+ * This function checks if the entire range <start,end> is mapped with type.
+ */
+bool __init e820__mapped_all(u64 start, u64 end, enum e820_type type)
+{
+   return __e820__mapped_all(start, end, type);
+}
+
+/*
+ * This function returns the type associated with the range <start,end>.
+ */
+int e820__get_entry_type(u64 start, u64 end)
+{
+   struct e820_entry *entry = __e820__mapped_all(start, end, 0);
+
+   return entry ? entry->type : -EINVAL;
 }
 
 /*
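
The return value is used directly as either a type or an error. An
illustrative caller (this is the shape patch 19 of this series uses):

	switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
	case E820_TYPE_RESERVED:
	case E820_TYPE_ACPI:
	case E820_TYPE_NVS:
	case E820_TYPE_UNUSABLE:
		return true;	/* outside kernel-usable RAM */
	default:
		return false;	/* usable RAM, or -EINVAL (no entry) */
	}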



[PATCH v8 12/38] x86/mm: Extend early_memremap() support with additional attrs

2017-06-27 Thread Tom Lendacky
Add early_memremap() support to be able to specify encrypted and
decrypted mappings with and without write-protection. The use of
write-protection is necessary when encrypting data "in place". The
write-protect attribute is considered cacheable for loads, but not
stores. This implies that the hardware will never give the core a
dirty line with this memtype.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/Kconfig |4 +++
 arch/x86/include/asm/fixmap.h|   13 ++
 arch/x86/include/asm/pgtable_types.h |8 ++
 arch/x86/mm/ioremap.c|   44 ++
 include/asm-generic/early_ioremap.h  |2 ++
 mm/early_ioremap.c   |   10 
 6 files changed, 81 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3a59e9c..a04081ce 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1434,6 +1434,10 @@ config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
  If set to N, then the encryption of system memory can be
  activated with the mem_encrypt=on command line option.
 
+config ARCH_USE_MEMREMAP_PROT
+   def_bool y
+   depends on AMD_MEM_ENCRYPT
+
 # Common NUMA Features
 config NUMA
bool "Numa Memory Allocation and Scheduler Support"
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index d9ff226..dcd9fb5 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -164,6 +164,19 @@ static inline void __set_fixmap(enum fixed_addresses idx,
  */
 #define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_IO_NOCACHE
 
+/*
+ * Early memremap routines used for in-place encryption. The mappings created
+ * by these routines are intended to be used as temporary mappings.
+ */
+void __init *early_memremap_encrypted(resource_size_t phys_addr,
+ unsigned long size);
+void __init *early_memremap_encrypted_wp(resource_size_t phys_addr,
+unsigned long size);
+void __init *early_memremap_decrypted(resource_size_t phys_addr,
+ unsigned long size);
+void __init *early_memremap_decrypted_wp(resource_size_t phys_addr,
+unsigned long size);
+
 #include 
 
 #define __late_set_fixmap(idx, phys, flags) __set_fixmap(idx, phys, flags)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index de32ca3..32095af 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -161,6 +161,7 @@ enum page_cache_mode {
 
 #define _PAGE_CACHE_MASK   (_PAGE_PAT | _PAGE_PCD | _PAGE_PWT)
 #define _PAGE_NOCACHE  (cachemode2protval(_PAGE_CACHE_MODE_UC))
+#define _PAGE_CACHE_WP (cachemode2protval(_PAGE_CACHE_MODE_WP))
 
 #define PAGE_NONE  __pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED)
 #define PAGE_SHARED__pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
@@ -189,6 +190,7 @@ enum page_cache_mode {
 #define __PAGE_KERNEL_VVAR (__PAGE_KERNEL_RO | _PAGE_USER)
 #define __PAGE_KERNEL_LARGE(__PAGE_KERNEL | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_EXEC   (__PAGE_KERNEL_EXEC | _PAGE_PSE)
+#define __PAGE_KERNEL_WP   (__PAGE_KERNEL | _PAGE_CACHE_WP)
 
 #define __PAGE_KERNEL_IO   (__PAGE_KERNEL)
 #define __PAGE_KERNEL_IO_NOCACHE   (__PAGE_KERNEL_NOCACHE)
@@ -202,6 +204,12 @@ enum page_cache_mode {
 #define _KERNPG_TABLE  (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED |\
 _PAGE_DIRTY | _PAGE_ENC)
 
+#define __PAGE_KERNEL_ENC  (__PAGE_KERNEL | _PAGE_ENC)
+#define __PAGE_KERNEL_ENC_WP   (__PAGE_KERNEL_WP | _PAGE_ENC)
+
+#define __PAGE_KERNEL_NOENC(__PAGE_KERNEL)
+#define __PAGE_KERNEL_NOENC_WP (__PAGE_KERNEL_WP)
+
 #define PAGE_KERNEL__pgprot(__PAGE_KERNEL | _PAGE_ENC)
 #define PAGE_KERNEL_RO __pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
 #define PAGE_KERNEL_EXEC   __pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index bfc3e2d..26db273 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -414,6 +414,50 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
 }
 
+#ifdef CONFIG_ARCH_USE_MEMREMAP_PROT
+/* Remap memory with encryption */
+void __init *early_memremap_encrypted(resource_size_t phys_addr,
+ unsigned long size)
+{
+   return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_ENC);
+}
+
+/*
+ * Remap memory with encryption and write-protected - cannot be called
+ * before pat_init() is called
+ */
+void __init *early_memremap_encrypted_wp(resource_size_t phys_addr,
+unsigned long size)
+{
+   /* Be sure the write-protect PAT entry is set for write-protect */
+   if 

[PATCH v8 00/38] x86: Secure Memory Encryption (AMD)

2017-06-27 Thread Tom Lendacky
This patch series provides support for AMD's new Secure Memory Encryption (SME)
feature.

SME can be used to mark individual pages of memory as encrypted through the
page tables. A page of memory that is marked encrypted will be automatically
decrypted when read from DRAM and will be automatically encrypted when
written to DRAM. Details on SME can found in the links below.

The SME feature is identified through a CPUID function and enabled through
the SYSCFG MSR. Once enabled, page table entries will determine how the
memory is accessed. If a page table entry has the memory encryption mask set,
then that memory will be accessed as encrypted memory. The memory encryption
mask (as well as other related information) is determined from settings
returned through the same CPUID function that identifies the presence of the
feature.

The approach that this patch series takes is to encrypt everything possible
starting early in the boot where the kernel is encrypted. Using the page
table macros the encryption mask can be incorporated into all page table
entries and page allocations. By updating the protection map, userspace
allocations are also marked encrypted. Certain data must be accounted for
as having been placed in memory before SME was enabled (EFI, initrd, etc.)
and accessed accordingly.

This patch series is a pre-cursor to another AMD processor feature called
Secure Encrypted Virtualization (SEV). The support for SEV will build upon
the SME support and will be submitted later. Details on SEV can be found
in the links below.

The following links provide additional detail:

AMD Memory Encryption whitepaper:
   
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf

AMD64 Architecture Programmer's Manual:
   http://support.amd.com/TechDocs/24593.pdf
   SME is section 7.10
   SEV is section 15.34

---

This patch series is based off of the master branch of tip:
  https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master

  Commit 6ab5af989579 ("Merge branch 'irq/core'")

Source code is also available at https://github.com/codomania/tip/tree/sme-v8


Still to do:
- Kdump support, including using memremap() instead of ioremap_cache()

Changes since v7:
- Fixed kbuild test robot failure related to pgprot_decrypted() macro
  usage for some non-x86 archs
- Moved calls to encrypt the kernel and retrieve the encryption mask
  from assembler (head_64.S) into C (head64.c)
- Removed use of phys_to_virt() in __ioremap_caller() when address is in
  the ISA range. Now regular ioremap() processing occurs.
- Two new, small patches:
  - Introduced a native_make_p4d() for use when CONFIG_PGTABLE_LEVELS is
not greater than 4
  - Introduced __nostackp GCC option to turn off stack protection on a
per function basis
- General code cleanup based on feedback

Changes since v6:
- Fixed the asm include file issue that caused build errors on other archs
- Rebased the CR3 register changes on top of Andy Lutomirski's patch
- Added a patch to clear the SME cpu feature if running as a PV guest under
  Xen
- Added a patch to obtain the AMD microcode level earlier in the boot
  instead of directly reading the MSR
- Refactor patch #8 ("x86/mm: Add support to enable SME in early boot
  processing") because the 5-level paging support moved the code into the
  new C-function __startup_64()
- Removed need to decrypt trampoline area in-place (set memory attributes
  before copying the trampoline code)
- General code cleanup based on feedback

Changes since v5:
- Added support for 5-level paging
- Added IOMMU support
- Created a generic asm/mem_encrypt.h in order to remove a bunch of
  #ifndef/#define entries
- Removed changes to the __va() macro and defined a function to return
  the true physical address in cr3
- Removed sysfs support as it was determined not to be needed
- General code cleanup based on feedback
- General cleanup of patch subjects and descriptions

Changes since v4:
- Re-worked mapping of setup data to not use a fixed list. Rather, check
  dynamically whether the requested early_memremap()/memremap() call
  needs to be mapped decrypted.
- Moved SME cpu feature into scattered features
- Moved some declarations into header files
- Cleared the encryption mask from the __PHYSICAL_MASK so that users
  of macros such as pmd_pfn_mask() don't have to worry/know about the
  encryption mask
- Updated some return types and values related to EFI and e820 functions
  so that an error could be returned
- During cpu shutdown, removed cache disabling and added a check for kexec
  in progress to use wbinvd followed immediately by halt in order to avoid
  any memory corruption
- Update how persistent memory is identified
- Added a function to find command line arguments and their values
- Added sysfs support
- General code cleanup based on feedback
- General cleanup of patch subjects and descriptions


Changes since v3:
- Broke out some of the patches into smaller individual patches

[PATCH v8 13/38] x86/mm: Add support for early encrypt/decrypt of memory

2017-06-27 Thread Tom Lendacky
Add support to be able to either encrypt or decrypt data in place during
the early stages of booting the kernel. This does not change the memory
encryption attribute - it is used for ensuring that data present in either
an encrypted or decrypted memory area is in the proper state (for example
the initrd will have been loaded by the boot loader and will not be
encrypted, but the memory that it resides in is marked as encrypted).

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/mem_encrypt.h |   10 +
 arch/x86/mm/mem_encrypt.c  |   76 
 2 files changed, 86 insertions(+)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index dbae7a5..8baa35b 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -21,6 +21,11 @@
 
 extern unsigned long sme_me_mask;
 
+void __init sme_early_encrypt(resource_size_t paddr,
+ unsigned long size);
+void __init sme_early_decrypt(resource_size_t paddr,
+ unsigned long size);
+
 void __init sme_early_init(void);
 
 void __init sme_encrypt_kernel(void);
@@ -30,6 +35,11 @@
 
 #define sme_me_mask	0UL
 
+static inline void __init sme_early_encrypt(resource_size_t paddr,
+   unsigned long size) { }
+static inline void __init sme_early_decrypt(resource_size_t paddr,
+   unsigned long size) { }
+
 static inline void __init sme_early_init(void) { }
 
 static inline void __init sme_encrypt_kernel(void) { }
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index f973d3d..54bb73c 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -14,6 +14,9 @@
 #include 
 #include 
 
+#include 
+#include 
+
 /*
  * Since SME related variables are set early in the boot process they must
  * reside in the .data section so as not to be zeroed out when the .bss
@@ -22,6 +25,79 @@
 unsigned long sme_me_mask __section(.data) = 0;
 EXPORT_SYMBOL_GPL(sme_me_mask);
 
+/* Buffer used for early in-place encryption by BSP, no locking needed */
+static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
+
+/*
+ * This routine does not change the underlying encryption setting of the
+ * page(s) that map this memory. It assumes that eventually the memory is
+ * meant to be accessed as either encrypted or decrypted but the contents
+ * are currently not in the desired state.
+ *
+ * This routine follows the steps outlined in the AMD64 Architecture
+ * Programmer's Manual Volume 2, Section 7.10.8 Encrypt-in-Place.
+ */
+static void __init __sme_early_enc_dec(resource_size_t paddr,
+  unsigned long size, bool enc)
+{
+   void *src, *dst;
+   size_t len;
+
+   if (!sme_me_mask)
+   return;
+
+   local_flush_tlb();
+   wbinvd();
+
+   /*
+* There are limited number of early mapping slots, so map (at most)
+* one page at time.
+*/
+   while (size) {
+   len = min_t(size_t, sizeof(sme_early_buffer), size);
+
+   /*
+* Create mappings for the current and desired format of
+* the memory. Use a write-protected mapping for the source.
+*/
+   src = enc ? early_memremap_decrypted_wp(paddr, len) :
+   early_memremap_encrypted_wp(paddr, len);
+
+   dst = enc ? early_memremap_encrypted(paddr, len) :
+   early_memremap_decrypted(paddr, len);
+
+   /*
+* If a mapping can't be obtained to perform the operation,
+* then eventual access of that area in the desired mode
+* will cause a crash.
+*/
+   BUG_ON(!src || !dst);
+
+   /*
+* Use a temporary buffer, of cache-line multiple size, to
+* avoid data corruption as documented in the APM.
+*/
+   memcpy(sme_early_buffer, src, len);
+   memcpy(dst, sme_early_buffer, len);
+
+   early_memunmap(dst, len);
+   early_memunmap(src, len);
+
+   paddr += len;
+   size -= len;
+   }
+}
+
+void __init sme_early_encrypt(resource_size_t paddr, unsigned long size)
+{
+   __sme_early_enc_dec(paddr, size, true);
+}
+
+void __init sme_early_decrypt(resource_size_t paddr, unsigned long size)
+{
+   __sme_early_enc_dec(paddr, size, false);
+}
+
 void __init sme_early_init(void)
 {
unsigned int i;
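
An illustrative call (patch 14 of this series does effectively this for the
initrd, which the boot loader left decrypted in memory that will later be
mapped encrypted; the variable names are placeholders):

	sme_early_encrypt(initrd_paddr, initrd_size);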



[PATCH v8 05/38] x86/CPU/AMD: Handle SME reduction in physical address size

2017-06-27 Thread Tom Lendacky
When System Memory Encryption (SME) is enabled, the physical address
space is reduced. Adjust the x86_phys_bits value to reflect this
reduction.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/kernel/cpu/amd.c |   10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index c47ceee..5bdcbd4 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -613,15 +613,19 @@ static void early_init_amd(struct cpuinfo_x86 *c)
set_cpu_bug(c, X86_BUG_AMD_E400);
 
/*
-* BIOS support is required for SME. If BIOS has not enabled SME
-* then don't advertise the feature (set in scattered.c)
+* BIOS support is required for SME. If BIOS has enabled SME then
+* adjust x86_phys_bits by the SME physical address space reduction
+* value. If BIOS has not enabled SME then don't advertise the
+* feature (set in scattered.c).
 */
if (cpu_has(c, X86_FEATURE_SME)) {
u64 msr;
 
/* Check if SME is enabled */
rdmsrl(MSR_K8_SYSCFG, msr);
-   if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
+   if (msr & MSR_K8_SYSCFG_MEM_ENCRYPT)
+   c->x86_phys_bits -= (cpuid_ebx(0x8000001f) >> 6) & 0x3f;
+   else
clear_cpu_cap(c, X86_FEATURE_SME);
}
 }
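
A worked example of the adjustment (the EBX value is illustrative): if CPUID
0x8000001f returns EBX = 0x16f, the reduction field (bits 11:6) is
(0x16f >> 6) & 0x3f = 5, so a processor with 48 physical address bits ends
up advertising x86_phys_bits = 43 once SME is enabled.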



[PATCH v8 14/38] x86/mm: Ensure that boot memory areas are mapped properly

2017-06-27 Thread Tom Lendacky
The boot data and command line data are present in memory in a decrypted
state and are copied early in the boot process.  The early page fault
support will map these areas as encrypted, so before attempting to copy
them, add decrypted mappings so the data is accessed properly when copied.

For the initrd, encrypt this data in place. Since the initrd area will
later be mapped as encrypted, the data will then be accessed properly.

Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/mem_encrypt.h |6 +++
 arch/x86/include/asm/pgtable.h |3 ++
 arch/x86/kernel/head64.c   |   30 +++--
 arch/x86/kernel/setup.c|9 +
 arch/x86/mm/kasan_init_64.c|2 +
 arch/x86/mm/mem_encrypt.c  |   63 
 6 files changed, 108 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 8baa35b..ab1fe77 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -26,6 +26,9 @@ void __init sme_early_encrypt(resource_size_t paddr,
 void __init sme_early_decrypt(resource_size_t paddr,
  unsigned long size);
 
+void __init sme_map_bootdata(char *real_mode_data);
+void __init sme_unmap_bootdata(char *real_mode_data);
+
 void __init sme_early_init(void);
 
 void __init sme_encrypt_kernel(void);
@@ -40,6 +43,9 @@ static inline void __init sme_early_encrypt(resource_size_t paddr,
 static inline void __init sme_early_decrypt(resource_size_t paddr,
unsigned long size) { }
 
+static inline void __init sme_map_bootdata(char *real_mode_data) { }
+static inline void __init sme_unmap_bootdata(char *real_mode_data) { }
+
 static inline void __init sme_early_init(void) { }
 
 static inline void __init sme_encrypt_kernel(void) { }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index c6452cb..bbeae4a 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -23,6 +23,9 @@
 #ifndef __ASSEMBLY__
 #include 
 
+extern pgd_t early_top_pgt[PTRS_PER_PGD];
+int __init __early_make_pgtable(unsigned long address, pmdval_t pmd);
+
 void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
 void ptdump_walk_pgd_level_checkwx(void);
 
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 5cd0b72..0cdb53b 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -34,7 +34,6 @@
 /*
  * Manage page tables very early on.
  */
-extern pgd_t early_top_pgt[PTRS_PER_PGD];
 extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
 static unsigned int __initdata next_early_pgt;
 pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
@@ -181,13 +180,13 @@ static void __init reset_early_page_tables(void)
 }
 
 /* Create a new PMD entry */
-int __init early_make_pgtable(unsigned long address)
+int __init __early_make_pgtable(unsigned long address, pmdval_t pmd)
 {
unsigned long physaddr = address - __PAGE_OFFSET;
pgdval_t pgd, *pgd_p;
p4dval_t p4d, *p4d_p;
pudval_t pud, *pud_p;
-   pmdval_t pmd, *pmd_p;
+   pmdval_t *pmd_p;
 
/* Invalid address or early pgt is done ?  */
if (physaddr >= MAXMEM || read_cr3_pa() != __pa_nodebug(early_top_pgt))
@@ -246,12 +245,21 @@ int __init early_make_pgtable(unsigned long address)
memset(pmd_p, 0, sizeof(*pmd_p) * PTRS_PER_PMD);
	*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
}
-   pmd = (physaddr & PMD_MASK) + early_pmd_flags;
pmd_p[pmd_index(address)] = pmd;
 
return 0;
 }
 
+int __init early_make_pgtable(unsigned long address)
+{
+   unsigned long physaddr = address - __PAGE_OFFSET;
+   pmdval_t pmd;
+
+   pmd = (physaddr & PMD_MASK) + early_pmd_flags;
+
+   return __early_make_pgtable(address, pmd);
+}
+
 /* Don't add a printk in there. printk relies on the PDA which is not initialized yet. */
 static void __init clear_bss(void)
@@ -274,6 +282,12 @@ static void __init copy_bootdata(char *real_mode_data)
char * command_line;
unsigned long cmd_line_ptr;
 
+   /*
+* If SME is active, this will create decrypted mappings of the
+* boot data in advance of the copy operations.
+*/
+   sme_map_bootdata(real_mode_data);
+
	memcpy(&boot_params, real_mode_data, sizeof boot_params);
sanitize_boot_params(_params);
cmd_line_ptr = get_cmd_line_ptr();
@@ -281,6 +295,14 @@ static void __init copy_bootdata(char *real_mode_data)
command_line = __va(cmd_line_ptr);
memcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
}
+
+   /*
+* The old boot data is no longer needed and won't be reserved,
+* freeing up that memory for use by the kernel.
+*/
+   sme_unmap_bootdata(real_mode_data);
 }

[PATCH v8 09/38] x86/mm: Simplify p[g4um]d_page() macros

2017-06-27 Thread Tom Lendacky
Create a pgd_pfn() macro similar to the p[4um]d_pfn() macros and then
use the p[g4um]d_pfn() macros in the p[g4um]d_page() macros instead of
duplicating the code.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/pgtable.h |   16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 77037b6..b64ea52 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -195,6 +195,11 @@ static inline unsigned long p4d_pfn(p4d_t p4d)
return (p4d_val(p4d) & p4d_pfn_mask(p4d)) >> PAGE_SHIFT;
 }
 
+static inline unsigned long pgd_pfn(pgd_t pgd)
+{
+   return (pgd_val(pgd) & PTE_PFN_MASK) >> PAGE_SHIFT;
+}
+
 static inline int p4d_large(p4d_t p4d)
 {
/* No 512 GiB pages yet */
@@ -704,8 +709,7 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
  * Currently stuck as a macro due to indirect forward reference to
  * linux/mmzone.h's __section_mem_map_addr() definition:
  */
-#define pmd_page(pmd)  \
-   pfn_to_page((pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT)
+#define pmd_page(pmd)  pfn_to_page(pmd_pfn(pmd))
 
 /*
  * the pmd page can be thought of an array like this: pmd_t[PTRS_PER_PMD]
@@ -773,8 +777,7 @@ static inline unsigned long pud_page_vaddr(pud_t pud)
  * Currently stuck as a macro due to indirect forward reference to
  * linux/mmzone.h's __section_mem_map_addr() definition:
  */
-#define pud_page(pud)  \
-   pfn_to_page((pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT)
+#define pud_page(pud)  pfn_to_page(pud_pfn(pud))
 
 /* Find an entry in the second-level page table.. */
 static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
@@ -824,8 +827,7 @@ static inline unsigned long p4d_page_vaddr(p4d_t p4d)
  * Currently stuck as a macro due to indirect forward reference to
  * linux/mmzone.h's __section_mem_map_addr() definition:
  */
-#define p4d_page(p4d)  \
-   pfn_to_page((p4d_val(p4d) & p4d_pfn_mask(p4d)) >> PAGE_SHIFT)
+#define p4d_page(p4d)  pfn_to_page(p4d_pfn(p4d))
 
 /* Find an entry in the third-level page table.. */
 static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
@@ -859,7 +861,7 @@ static inline unsigned long pgd_page_vaddr(pgd_t pgd)
  * Currently stuck as a macro due to indirect forward reference to
  * linux/mmzone.h's __section_mem_map_addr() definition:
  */
-#define pgd_page(pgd)  pfn_to_page(pgd_val(pgd) >> PAGE_SHIFT)
+#define pgd_page(pgd)  pfn_to_page(pgd_pfn(pgd))
 
 /* to find an entry in a page-table-directory. */
 static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address)



[PATCH v8 08/38] x86/mm: Add support to enable SME in early boot processing

2017-06-27 Thread Tom Lendacky
Add support to the early boot code to use Secure Memory Encryption (SME).
Since the kernel has been loaded into memory in a decrypted state, encrypt
the kernel in place and update the early pagetables with the memory
encryption mask so that new pagetable entries will use memory encryption.

The routines to set the encryption mask and perform the encryption are
stub routines for now with functionality to be added in a later patch.

Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/mem_encrypt.h |8 +
 arch/x86/kernel/head64.c   |   53 +---
 arch/x86/kernel/head_64.S  |   20 --
 arch/x86/mm/mem_encrypt.c  |9 ++
 include/linux/mem_encrypt.h|5 +++
 5 files changed, 82 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index a105796..475e34f 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -15,14 +15,22 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/init.h>
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 
 extern unsigned long sme_me_mask;
 
+void __init sme_encrypt_kernel(void);
+void __init sme_enable(void);
+
 #else  /* !CONFIG_AMD_MEM_ENCRYPT */
 
 #define sme_me_mask	0UL
 
+static inline void __init sme_encrypt_kernel(void) { }
+static inline void __init sme_enable(void) { }
+
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 46c3c73..1f0ddcc 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -45,9 +46,10 @@ static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
return ptr - (void *)_text + (void *)physaddr;
 }
 
-void __head __startup_64(unsigned long physaddr)
+unsigned long __head __startup_64(unsigned long physaddr)
 {
unsigned long load_delta, *p;
+   unsigned long pgtable_flags;
pgdval_t *pgd;
p4dval_t *p4d;
pudval_t *pud;
@@ -68,6 +70,12 @@ void __head __startup_64(unsigned long physaddr)
if (load_delta & ~PMD_PAGE_MASK)
for (;;);
 
+   /* Activate Secure Memory Encryption (SME) if supported and enabled */
+   sme_enable();
+
+   /* Include the SME encryption mask in the fixup value */
+   load_delta += sme_get_me_mask();
+
/* Fixup the physical addresses in the page table */
 
	pgd = fixup_pointer(&early_top_pgt, physaddr);
@@ -94,28 +102,30 @@ void __head __startup_64(unsigned long physaddr)
 
pud = fixup_pointer(early_dynamic_pgts[next_early_pgt++], physaddr);
pmd = fixup_pointer(early_dynamic_pgts[next_early_pgt++], physaddr);
+   pgtable_flags = _KERNPG_TABLE + sme_get_me_mask();
 
if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
	p4d = fixup_pointer(early_dynamic_pgts[next_early_pgt++], physaddr);
 
i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
-   pgd[i + 0] = (pgdval_t)p4d + _KERNPG_TABLE;
-   pgd[i + 1] = (pgdval_t)p4d + _KERNPG_TABLE;
+   pgd[i + 0] = (pgdval_t)p4d + pgtable_flags;
+   pgd[i + 1] = (pgdval_t)p4d + pgtable_flags;
 
i = (physaddr >> P4D_SHIFT) % PTRS_PER_P4D;
-   p4d[i + 0] = (pgdval_t)pud + _KERNPG_TABLE;
-   p4d[i + 1] = (pgdval_t)pud + _KERNPG_TABLE;
+   p4d[i + 0] = (pgdval_t)pud + pgtable_flags;
+   p4d[i + 1] = (pgdval_t)pud + pgtable_flags;
} else {
i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
-   pgd[i + 0] = (pgdval_t)pud + _KERNPG_TABLE;
-   pgd[i + 1] = (pgdval_t)pud + _KERNPG_TABLE;
+   pgd[i + 0] = (pgdval_t)pud + pgtable_flags;
+   pgd[i + 1] = (pgdval_t)pud + pgtable_flags;
}
 
i = (physaddr >> PUD_SHIFT) % PTRS_PER_PUD;
-   pud[i + 0] = (pudval_t)pmd + _KERNPG_TABLE;
-   pud[i + 1] = (pudval_t)pmd + _KERNPG_TABLE;
+   pud[i + 0] = (pudval_t)pmd + pgtable_flags;
+   pud[i + 1] = (pudval_t)pmd + pgtable_flags;
 
pmd_entry = __PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL;
+   pmd_entry += sme_get_me_mask();
pmd_entry +=  physaddr;
 
for (i = 0; i < DIV_ROUND_UP(_end - _text, PMD_SIZE); i++) {
@@ -136,9 +146,30 @@ void __head __startup_64(unsigned long physaddr)
pmd[i] += load_delta;
}
 
-   /* Fixup phys_base */
+   /*
+* Fixup phys_base - remove the memory encryption mask to obtain
+* the true physical address.
+*/
	p = fixup_pointer(&phys_base, physaddr);
-   *p += load_delta;
+   *p += load_delta - sme_get_me_mask();
+
+   /* Encrypt the kernel (if SME is active) */
+   sme_encrypt_kernel();
+
+   /*
+* Return the SME encryption mask (if SME is active) to be used as a
+* modifier for the initial pgdir entry programmed into cr3.
+*/
+   return sme_get_me_mask();
+}

[PATCH v8 06/38] x86/mm: Add Secure Memory Encryption (SME) support

2017-06-27 Thread Tom Lendacky
Add support for Secure Memory Encryption (SME). This initial support
provides a Kconfig entry to build the SME support into the kernel and
defines the memory encryption mask that will be used in subsequent
patches to mark pages as encrypted.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/Kconfig   |   25 +
 arch/x86/include/asm/mem_encrypt.h |   30 ++
 arch/x86/mm/Makefile   |1 +
 arch/x86/mm/mem_encrypt.c  |   21 +
 include/linux/mem_encrypt.h|   35 +++
 5 files changed, 112 insertions(+)
 create mode 100644 arch/x86/include/asm/mem_encrypt.h
 create mode 100644 arch/x86/mm/mem_encrypt.c
 create mode 100644 include/linux/mem_encrypt.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 72028a1..3a59e9c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1409,6 +1409,31 @@ config X86_DIRECT_GBPAGES
  supports them), so don't confuse the user by printing
  that we have them enabled.
 
+config ARCH_HAS_MEM_ENCRYPT
+   def_bool y
+
+config AMD_MEM_ENCRYPT
+   bool "AMD Secure Memory Encryption (SME) support"
+   depends on X86_64 && CPU_SUP_AMD
+   ---help---
+ Say yes to enable support for the encryption of system memory.
+ This requires an AMD processor that supports Secure Memory
+ Encryption (SME).
+
+config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
+   bool "Activate AMD Secure Memory Encryption (SME) by default"
+   default y
+   depends on AMD_MEM_ENCRYPT
+   ---help---
+ Say yes to have system memory encrypted by default if running on
+ an AMD processor that supports Secure Memory Encryption (SME).
+
+ If set to Y, then the encryption of system memory can be
+ deactivated with the mem_encrypt=off command line option.
+
+ If set to N, then the encryption of system memory can be
+ activated with the mem_encrypt=on command line option.
+
 # Common NUMA Features
 config NUMA
bool "Numa Memory Allocation and Scheduler Support"
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
new file mode 100644
index 000..a105796
--- /dev/null
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -0,0 +1,30 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __X86_MEM_ENCRYPT_H__
+#define __X86_MEM_ENCRYPT_H__
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+
+extern unsigned long sme_me_mask;
+
+#else  /* !CONFIG_AMD_MEM_ENCRYPT */
+
+#define sme_me_mask	0UL
+
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __X86_MEM_ENCRYPT_H__ */
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 0fbdcb6..a94a7b6 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -39,3 +39,4 @@ obj-$(CONFIG_X86_INTEL_MPX)   += mpx.o
 obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
 obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
 
+obj-$(CONFIG_AMD_MEM_ENCRYPT)  += mem_encrypt.o
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
new file mode 100644
index 000..b99d469
--- /dev/null
+++ b/arch/x86/mm/mem_encrypt.c
@@ -0,0 +1,21 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+
+/*
+ * Since SME related variables are set early in the boot process they must
+ * reside in the .data section so as not to be zeroed out when the .bss
+ * section is later cleared.
+ */
+unsigned long sme_me_mask __section(.data) = 0;
+EXPORT_SYMBOL_GPL(sme_me_mask);
diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
new file mode 100644
index 000..59769f7
--- /dev/null
+++ b/include/linux/mem_encrypt.h
@@ -0,0 +1,35 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __MEM_ENCRYPT_H__
+#define __MEM_ENCRYPT_H__
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_ARCH_HAS_MEM_ENCRYPT
+
+#include <asm/mem_encrypt.h>
+
+#else  /* !CONFIG_ARCH_HAS_MEM_ENCRYPT */
+
+#define sme_me_mask	0UL
+
+#endif /* CONFIG_ARCH_HAS_MEM_ENCRYPT */

[PATCH v8 02/38] x86/mm/pat: Set write-protect cache mode for full PAT support

2017-06-27 Thread Tom Lendacky
For processors that support PAT, set the write-protect cache mode
(_PAGE_CACHE_MODE_WP) entry to the actual write-protect value (0x05).

Acked-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/mm/pat.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 9b78685..6753d9c 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -295,7 +295,7 @@ static void init_cache_modes(void)
  * pat_init - Initialize PAT MSR and PAT table
  *
  * This function initializes PAT MSR and PAT table with an OS-defined value
- * to enable additional cache attributes, WC and WT.
+ * to enable additional cache attributes, WC, WT and WP.
  *
  * This function must be called on all CPUs using the specific sequence of
  * operations defined in Intel SDM. mtrr_rendezvous_handler() provides this
@@ -356,7 +356,7 @@ void pat_init(void)
 *  0102UC-: _PAGE_CACHE_MODE_UC_MINUS
 *  0113UC : _PAGE_CACHE_MODE_UC
 *  1004WB : Reserved
-*  1015WC : Reserved
+*  1015WP : _PAGE_CACHE_MODE_WP
 *  1106UC-: Reserved
 *  1117WT : _PAGE_CACHE_MODE_WT
 *
@@ -364,7 +364,7 @@ void pat_init(void)
 * corresponding types in the presence of PAT errata.
 */
pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
- PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, WT);
+ PAT(4, WB) | PAT(5, WP) | PAT(6, UC_MINUS) | PAT(7, WT);
}
 
if (!boot_cpu_done) {
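
For anyone cross-checking the table above, here is a self-contained sketch that
assembles the same IA32_PAT value (the memory-type byte encodings are the SDM
ones: UC=0x00, WC=0x01, WT=0x04, WP=0x05, WB=0x06, UC-=0x07; the PAT() macro is
modeled on the one in arch/x86/mm/pat.c):

#include <stdint.h>
#include <stdio.h>

enum { UC = 0x00, WC = 0x01, WT = 0x04, WP = 0x05, WB = 0x06, UC_MINUS = 0x07 };

/* Each PAT slot is one byte in the 64-bit IA32_PAT MSR. */
#define PAT(slot, type) ((uint64_t)(type) << ((slot) * 8))

int main(void)
{
        uint64_t pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
                       PAT(4, WB) | PAT(5, WP) | PAT(6, UC_MINUS) | PAT(7, WT);

        printf("IA32_PAT = 0x%016llx\n", (unsigned long long)pat);
        return 0;
}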



[PATCH v8 04/38] x86/CPU/AMD: Add the Secure Memory Encryption CPU feature

2017-06-27 Thread Tom Lendacky
Update the CPU features to include identifying and reporting on the
Secure Memory Encryption (SME) feature.  SME is identified by CPUID
0x8000001f, but requires BIOS support to enable it (set bit 23 of
MSR_K8_SYSCFG).  Only show the SME feature as available if reported by
CPUID and enabled by BIOS.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/cpufeatures.h |1 +
 arch/x86/include/asm/msr-index.h   |2 ++
 arch/x86/kernel/cpu/amd.c  |   13 +
 arch/x86/kernel/cpu/scattered.c|1 +
 4 files changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 2701e5f..2b692df 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -196,6 +196,7 @@
 
 #define X86_FEATURE_HW_PSTATE  ( 7*32+ 8) /* AMD HW-PState */
 #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
+#define X86_FEATURE_SME        ( 7*32+10) /* AMD Secure Memory Encryption */
 
 #define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number 
*/
 #define X86_FEATURE_INTEL_PT   ( 7*32+15) /* Intel Processor Trace */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 18b1623..460ac01 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -352,6 +352,8 @@
 #define MSR_K8_TOP_MEM1                0xc001001a
 #define MSR_K8_TOP_MEM2                0xc001001d
 #define MSR_K8_SYSCFG  0xc0010010
+#define MSR_K8_SYSCFG_MEM_ENCRYPT_BIT  23
+#define MSR_K8_SYSCFG_MEM_ENCRYPT  BIT_ULL(MSR_K8_SYSCFG_MEM_ENCRYPT_BIT)
 #define MSR_K8_INT_PENDING_MSG 0xc0010055
 /* C1E active bits in int pending message */
 #define K8_INTP_C1E_ACTIVE_MASK        0x18000000
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index bb5abe8..c47ceee 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -611,6 +611,19 @@ static void early_init_amd(struct cpuinfo_x86 *c)
 */
if (cpu_has_amd_erratum(c, amd_erratum_400))
set_cpu_bug(c, X86_BUG_AMD_E400);
+
+   /*
+* BIOS support is required for SME. If BIOS has not enabled SME
+* then don't advertise the feature (set in scattered.c)
+*/
+   if (cpu_has(c, X86_FEATURE_SME)) {
+   u64 msr;
+
+   /* Check if SME is enabled */
+   rdmsrl(MSR_K8_SYSCFG, msr);
+   if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
+   clear_cpu_cap(c, X86_FEATURE_SME);
+   }
 }
 
 static void init_amd_k8(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 23c2350..05459ad 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -31,6 +31,7 @@ struct cpuid_bit {
	{ X86_FEATURE_HW_PSTATE,	CPUID_EDX,  7, 0x80000007, 0 },
	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
	{ X86_FEATURE_PROC_FEEDBACK,	CPUID_EDX, 11, 0x80000007, 0 },
+	{ X86_FEATURE_SME,		CPUID_EAX,  0, 0x8000001f, 0 },
{ 0, 0, 0, 0, 0 }
 };
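
As a rough way to exercise this detection logic from user space, a hedged
sketch (CPUID only; note the kernel additionally requires bit 23 of
MSR_K8_SYSCFG to be set by BIOS before it advertises the feature, which CPUID
alone cannot show):

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(0x8000001f, &eax, &ebx, &ecx, &edx)) {
                printf("CPUID leaf 0x8000001f not available\n");
                return 1;
        }
        printf("SME supported:  %s\n", (eax & 1) ? "yes" : "no");
        printf("C-bit position: %u\n", ebx & 0x3f);
        printf("PA reduction:   %u bits\n", (ebx >> 6) & 0x3f);
        return 0;
}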
 



[PATCH v8 01/38] x86: Document AMD Secure Memory Encryption (SME)

2017-06-27 Thread Tom Lendacky
Create a Documentation entry to describe the AMD Secure Memory
Encryption (SME) feature and add documentation for the mem_encrypt=
kernel parameter.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 Documentation/admin-guide/kernel-parameters.txt |   11 
 Documentation/x86/amd-memory-encryption.txt |   68 +++
 2 files changed, 79 insertions(+)
 create mode 100644 Documentation/x86/amd-memory-encryption.txt

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 9b0b3de..51e03ee 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2197,6 +2197,17 @@
memory contents and reserves bad memory
regions that are detected.
 
	mem_encrypt=	[X86-64] AMD Secure Memory Encryption (SME) control
+   Valid arguments: on, off
+   Default (depends on kernel configuration option):
+ on  (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y)
+ off (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=n)
+   mem_encrypt=on: Activate SME
+   mem_encrypt=off:Do not activate SME
+
+   Refer to Documentation/x86/amd-memory-encryption.txt
+   for details on when memory encryption can be activated.
+
mem_sleep_default=  [SUSPEND] Default system suspend mode:
s2idle  - Suspend-To-Idle
shallow - Power-On Suspend or equivalent (if supported)
diff --git a/Documentation/x86/amd-memory-encryption.txt 
b/Documentation/x86/amd-memory-encryption.txt
new file mode 100644
index 000..f512ab7
--- /dev/null
+++ b/Documentation/x86/amd-memory-encryption.txt
@@ -0,0 +1,68 @@
+Secure Memory Encryption (SME) is a feature found on AMD processors.
+
+SME provides the ability to mark individual pages of memory as encrypted using
+the standard x86 page tables.  A page that is marked encrypted will be
+automatically decrypted when read from DRAM and encrypted when written to
+DRAM.  SME can therefore be used to protect the contents of DRAM from physical
+attacks on the system.
+
+A page is encrypted when a page table entry has the encryption bit set (see
+below on how to determine its position).  The encryption bit can also be
+specified in the cr3 register, allowing the PGD table to be encrypted. Each
+successive level of page tables can also be encrypted by setting the encryption
+bit in the page table entry that points to the next table. This allows the full
+page table hierarchy to be encrypted. Note, this means that just because the
+encryption bit is set in cr3, doesn't imply the full hierarchy is encrypted.
+Each page table entry in the hierarchy needs to have the encryption bit set to
+achieve that. So, theoretically, you could have the encryption bit set in cr3
+so that the PGD is encrypted, but not set the encryption bit in the PGD entry
+for a PUD which results in the PUD pointed to by that entry to not be
+encrypted.
+
+Support for SME can be determined through the CPUID instruction. The CPUID
+function 0x8000001f reports information related to SME:
+
+   0x8000001f[eax]:
+   Bit[0] indicates support for SME
+   0x8000001f[ebx]:
+   Bits[5:0]  pagetable bit number used to activate memory
+  encryption
+   Bits[11:6] reduction in physical address space, in bits, when
+  memory encryption is enabled (this only affects
+  system physical addresses, not guest physical
+  addresses)
+
+If support for SME is present, MSR 0xc0010010 (MSR_K8_SYSCFG) can be used to
+determine if SME is enabled and/or to enable memory encryption:
+
+   0xc0010010:
+   Bit[23]   0 = memory encryption features are disabled
+ 1 = memory encryption features are enabled
+
+Linux relies on BIOS to set this bit if BIOS has determined that the reduction
+in the physical address space as a result of enabling memory encryption (see
+CPUID information above) will not conflict with the address space resource
+requirements for the system.  If this bit is not set upon Linux startup then
+Linux itself will not set it and memory encryption will not be possible.
+
+The state of SME in the Linux kernel can be documented as follows:
+   - Supported:
+ The CPU supports SME (determined through CPUID instruction).
+
+   - Enabled:
+ Supported and bit 23 of MSR_K8_SYSCFG is set.
+
+   - Active:
+ Supported, Enabled and the Linux kernel is actively applying
+ the encryption bit to page table entries (the SME mask in the
+ kernel is non-zero).
+
+SME can also be enabled and 
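
For completeness, a small user-space sketch of the MSR check described above,
using the msr driver (assumes the 'msr' module is loaded and root privileges;
this is illustrative, not part of the patch):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MSR_K8_SYSCFG 0xc0010010

int main(void)
{
        uint64_t val;
        int fd = open("/dev/cpu/0/msr", O_RDONLY);

        if (fd < 0) {
                perror("open /dev/cpu/0/msr (is the msr module loaded?)");
                return 1;
        }
        if (pread(fd, &val, sizeof(val), MSR_K8_SYSCFG) != sizeof(val)) {
                perror("pread");
                close(fd);
                return 1;
        }
        close(fd);

        /* Bit 23: memory encryption features enabled by BIOS. */
        printf("SME enabled: %s\n", (val & (1ULL << 23)) ? "yes" : "no");
        return 0;
}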

[PATCH v8 03/38] x86, mpparse, x86/acpi, x86/PCI, x86/dmi, SFI: Use memremap for RAM mappings

2017-06-27 Thread Tom Lendacky
The ioremap() function is intended for mapping MMIO. For RAM, the
memremap() function should be used. Convert calls from ioremap() to
memremap() when re-mapping RAM.

This will be used later by SME to control how the encryption mask is
applied to memory mappings, with certain memory locations being mapped
decrypted vs encrypted.

Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/dmi.h   |8 
 arch/x86/kernel/acpi/boot.c  |6 +++---
 arch/x86/kernel/kdebugfs.c   |   34 +++---
 arch/x86/kernel/ksysfs.c |   28 ++--
 arch/x86/kernel/mpparse.c|   10 +-
 arch/x86/pci/common.c|4 ++--
 drivers/firmware/dmi-sysfs.c |5 +++--
 drivers/firmware/pcdp.c  |4 ++--
 drivers/sfi/sfi_core.c   |   22 +++---
 9 files changed, 55 insertions(+), 66 deletions(-)

diff --git a/arch/x86/include/asm/dmi.h b/arch/x86/include/asm/dmi.h
index 3c69fed..a8e15b0 100644
--- a/arch/x86/include/asm/dmi.h
+++ b/arch/x86/include/asm/dmi.h
@@ -13,9 +13,9 @@ static __always_inline __init void *dmi_alloc(unsigned len)
 }
 
 /* Use early IO mappings for DMI because it's initialized early */
-#define dmi_early_remapearly_ioremap
-#define dmi_early_unmapearly_iounmap
-#define dmi_remap  ioremap_cache
-#define dmi_unmap  iounmap
+#define dmi_early_remapearly_memremap
+#define dmi_early_unmapearly_memunmap
+#define dmi_remap(_x, _l)  memremap(_x, _l, MEMREMAP_WB)
+#define dmi_unmap(_x)  memunmap(_x)
 
 #endif /* _ASM_X86_DMI_H */
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 6bb6806..850160a 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -115,7 +115,7 @@
 #define	ACPI_INVALID_GSI		INT_MIN
 
 /*
- * This is just a simple wrapper around early_ioremap(),
+ * This is just a simple wrapper around early_memremap(),
  * with sanity checks for phys == 0 and size == 0.
  */
 char *__init __acpi_map_table(unsigned long phys, unsigned long size)
@@ -124,7 +124,7 @@ char *__init __acpi_map_table(unsigned long phys, unsigned 
long size)
if (!phys || !size)
return NULL;
 
-   return early_ioremap(phys, size);
+   return early_memremap(phys, size);
 }
 
 void __init __acpi_unmap_table(char *map, unsigned long size)
@@ -132,7 +132,7 @@ void __init __acpi_unmap_table(char *map, unsigned long 
size)
if (!map || !size)
return;
 
-   early_iounmap(map, size);
+   early_memunmap(map, size);
 }
 
 #ifdef CONFIG_X86_LOCAL_APIC
diff --git a/arch/x86/kernel/kdebugfs.c b/arch/x86/kernel/kdebugfs.c
index 38b6458..fd6f8fb 100644
--- a/arch/x86/kernel/kdebugfs.c
+++ b/arch/x86/kernel/kdebugfs.c
@@ -33,7 +33,6 @@ static ssize_t setup_data_read(struct file *file, char __user 
*user_buf,
struct setup_data_node *node = file->private_data;
unsigned long remain;
loff_t pos = *ppos;
-   struct page *pg;
void *p;
u64 pa;
 
@@ -47,18 +46,13 @@ static ssize_t setup_data_read(struct file *file, char 
__user *user_buf,
count = node->len - pos;
 
pa = node->paddr + sizeof(struct setup_data) + pos;
-   pg = pfn_to_page((pa + count - 1) >> PAGE_SHIFT);
-   if (PageHighMem(pg)) {
-   p = ioremap_cache(pa, count);
-   if (!p)
-   return -ENXIO;
-   } else
-   p = __va(pa);
+   p = memremap(pa, count, MEMREMAP_WB);
+   if (!p)
+   return -ENOMEM;
 
remain = copy_to_user(user_buf, p, count);
 
-   if (PageHighMem(pg))
-   iounmap(p);
+   memunmap(p);
 
if (remain)
return -EFAULT;
@@ -109,7 +103,6 @@ static int __init create_setup_data_nodes(struct dentry 
*parent)
struct setup_data *data;
int error;
struct dentry *d;
-   struct page *pg;
u64 pa_data;
int no = 0;
 
@@ -126,16 +119,12 @@ static int __init create_setup_data_nodes(struct dentry 
*parent)
goto err_dir;
}
 
-   pg = pfn_to_page((pa_data+sizeof(*data)-1) >> PAGE_SHIFT);
-   if (PageHighMem(pg)) {
-   data = ioremap_cache(pa_data, sizeof(*data));
-   if (!data) {
-   kfree(node);
-   error = -ENXIO;
-   goto err_dir;
-   }
-   } else
-   data = __va(pa_data);
+   data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
+   if (!data) {
+   kfree(node);
+   error = -ENOMEM;
+   goto err_dir;
+   }
 
node->paddr 
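
The conversion pattern in this patch reduces to roughly the following sketch
(hypothetical caller; 'pa' and 'len' are assumed to describe ordinary RAM such
as a firmware table):

#include <linux/io.h>

static int example_read_ram_table(phys_addr_t pa, size_t len)
{
	void *p;

	/* RAM wants memremap(); ioremap() remains for true MMIO. */
	p = memremap(pa, len, MEMREMAP_WB);
	if (!p)
		return -ENOMEM;

	/* ... parse the table through 'p' like ordinary memory ... */

	memunmap(p);
	return 0;
}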

Re: [PATCH 2/2] x86/idle: use dynamic halt poll

2017-06-27 Thread Paolo Bonzini


On 27/06/2017 16:22, Radim Krčmář wrote:
> vcpu_is_preempted() on current cpu cannot return true, AFAIK.

Of course.  I must have been thinking of an older version of the
vcpu_is_preempted patch (at some point the guest was the one that set
preempted to 0).

Paolo


Re: [PATCH 2/2] x86/idle: use dynamic halt poll

2017-06-27 Thread Radim Krčmář
2017-06-27 15:56+0200, Paolo Bonzini:
> On 27/06/2017 15:40, Radim Krčmář wrote:
>>> ... which is not necessarily _wrong_.  It's just a different heuristic.
>> Right, it's just harder to use than host's single_task_running() -- the
>> VCPU calling vcpu_is_preempted() is never preempted, so we have to look
>> at other VCPUs that are not halted, but still preempted.
>> 
>> If we see some ratio of preempted VCPUs (> 0?), then we stop polling and
>> yield to the host.  Working under the assumption that there is work for
>> this PCPU if other VCPUs have stuff to do.  The downside is that it
>> misses information about host's topology, so it would be hard to make it
>> work well.
> 
> I would just use vcpu_is_preempted on the current CPU.  From guest POV
> this option is really a "f*** everyone else" setting just like
> idle=poll, only a little more polite.

vcpu_is_preempted() on current cpu cannot return true, AFAIK.

> If we've been preempted and we were polling, there are two cases.  If an
> interrupt was queued while the guest was preempted, the poll will be
> treated as successful anyway.

I think the poll should be treated as invalid if the window has expired
while the VCPU was preempted -- the guest can't tell whether the
interrupt arrived still within the poll window (unless we added paravirt
for that), so it shouldn't be wasting time waiting for it.

>If it hasn't, let others run---but really
> that's not because the guest wants to be polite, it's to avoid that the
> scheduler penalizes it excessively.

This sounds like a VM entry just to do an immediate VM exit, so paravirt
seems better here as well ... (the guest telling the host about its
window -- which could also be used to rule it out as a target in the
pause loop random kick.)

> So until it's preempted, I think it's okay if the guest doesn't care
> about others.  You wouldn't use this option anyway in overcommitted
> situations.
> 
> (I'm still not very convinced about the idea).

Me neither.  (The same mechanism is applicable to bare-metal, but was
never used there, so I would rather bring the guest behavior closer to
bare-metal.)


Re: [PATCH 0/2] x86/idle: add halt poll support

2017-06-27 Thread Radim Krčmář
2017-06-23 14:49+0800, Yang Zhang:
> On 2017/6/23 12:35, Wanpeng Li wrote:
> > 2017-06-23 12:08 GMT+08:00 Yang Zhang :
> > > On 2017/6/22 19:50, Wanpeng Li wrote:
> > > > 
> > > > 2017-06-22 19:22 GMT+08:00 root :
> > > > > 
> > > > > From: Yang Zhang 
> > > > > 
> > > > > Some latency-intensive workloads see an obvious performance
> > > > > drop when running inside a VM. The main reason is that the overhead
> > > > > is amplified when running inside a VM; the biggest cost I have seen is
> > > > > inside the idle path.
> > > > > This patch introduces a new mechanism to poll for a while before
> > > > > entering the idle state. If a reschedule is needed during the poll, we
> > > > > don't need to go through the heavy overhead path.
> > > > > 
> > > > > Here is the data I get when running the contextswitch benchmark
> > > > > (https://github.com/tsuna/contextswitch)
> > > > > before patch:
> > > > > 200 process context switches in 4822613801ns (2411.3ns/ctxsw)
> > > > > after patch:
> > > > > 200 process context switches in 3584098241ns (1792.0ns/ctxsw)
> > > > 
> > > > 
> > > > If you test this after disabling the adaptive halt-polling in kvm?
> > > > What's the performance data of w/ this patchset and w/o the adaptive
> > > > halt-polling in kvm, and w/o this patchset and w/ the adaptive
> > > > halt-polling in kvm? In addition, both linux and windows guests can
> > > > get benefit as we have already done this in kvm.
> > > 
> > > 
> > > I will provide more data in next version. But it doesn't conflict with
> > 
> > Another case I can think of is w/ both this patchset and the adaptive
> > halt-polling in kvm.
> > 
> > > current halt polling inside kvm. This is just another enhancement.
> > 
> > I didn't look close to the patchset, however, maybe there is another
> > poll in the kvm part again sometimes if you fails the poll in the
> > guest. In addition, the adaptive halt-polling in kvm has performance
> > penalty when the pCPU is heavily overcommitted though there is a
> > single_task_running() in my testing, it is hard to accurately aware
> > whether there are other tasks waiting on the pCPU in the guest which
> > will make it worser. Depending on vcpu_is_preempted() or steal time
> > maybe not accurately or directly.
> > 
> > So I'm not sure how much sense it makes by adaptive halt-polling in
> > both guest and kvm. I prefer to just keep adaptive halt-polling in
> > kvm(then both linux/windows or other guests can get benefit) and avoid
> > to churn the core x86 path.
> 
> This mechanism is not specific to KVM. It is a kernel feature which can
> benefit a guest running inside an x86 virtualization environment; the
> hypervisor may be KVM, Xen, VMware, or Hyper-V. An administrator can
> configure KVM to use adaptive halt polling, but cannot stop users from
> enabling halt polling inside the guest. Lots of users set idle=poll inside
> the guest to improve performance, which occupies more CPU cycles. This
> mechanism is an enhancement to that, not to KVM halt polling.

Users of idle=poll shouldn't overcommit, so the goal seems to be energy
savings without crippling the guest performance too much ...

Wouldn't switching to idle=mwait work as well?

Thanks.
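
For reference, the mechanism being debated boils down to something like this
rough guest-side sketch (hypothetical names; the actual series hooks the x86
idle entry path and sizes poll_ns adaptively):

static void poll_then_halt(u64 poll_ns)
{
	u64 start = ktime_get_ns();

	local_irq_disable();
	while (ktime_get_ns() - start < poll_ns) {
		if (need_resched()) {
			/* Work showed up: skip the halt entirely. */
			local_irq_enable();
			return;
		}
		cpu_relax();
	}
	/* Nothing arrived during the window: take the real halt. */
	safe_halt();	/* re-enables interrupts on wakeup */
}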


Re: [PATCH 2/2] x86/idle: use dynamic halt poll

2017-06-27 Thread Paolo Bonzini


On 27/06/2017 15:40, Radim Krčmář wrote:
>> ... which is not necessarily _wrong_.  It's just a different heuristic.
> Right, it's just harder to use than host's single_task_running() -- the
> VCPU calling vcpu_is_preempted() is never preempted, so we have to look
> at other VCPUs that are not halted, but still preempted.
> 
> If we see some ratio of preempted VCPUs (> 0?), then we stop polling and
> yield to the host.  Working under the assumption that there is work for
> this PCPU if other VCPUs have stuff to do.  The downside is that it
> misses information about host's topology, so it would be hard to make it
> work well.

I would just use vcpu_is_preempted on the current CPU.  From guest POV
this option is really a "f*** everyone else" setting just like
idle=poll, only a little more polite.

If we've been preempted and we were polling, there are two cases.  If an
interrupt was queued while the guest was preempted, the poll will be
treated as successful anyway.  If it hasn't, let others run---but really
that's not because the guest wants to be polite, it's to avoid that the
scheduler penalizes it excessively.

So until it's preempted, I think it's okay if the guest doesn't care
about others.  You wouldn't use this option anyway in overcommitted
situations.

(I'm still not very convinced about the idea).

Paolo


Re: [PATCH 2/2] x86/idle: use dynamic halt poll

2017-06-27 Thread Radim Krčmář
2017-06-27 14:28+0200, Paolo Bonzini:
> On 27/06/2017 14:23, Wanpeng Li wrote:
> I have considered single_task_running() before. But since there is no
> such paravirtual interface currently and i am not sure whether it is a
> information leak from host if introducing such interface, so i didn't do
> it. Do you mean vcpu_is_preempted can do the same thing? I check the
> code and seems it only tells whether the VCPU is scheduled out or not
> which cannot satisfy the needs.
 Can you help to answer my confusion? I have double checked the code, but
 still not get your point. Do you think it is necessary to introduce an
 paravirtual interface to expose single_task_running() to guest?
>>
>> I think vcpu_is_preempted is a good enough replacement.
>> For example, vcpu->arch.st.steal.preempted is 0 when the vCPU is sched
>> in and vmentry, then several tasks are enqueued on the same pCPU and
>> waiting on cfs red-black tree, the guest should avoid to poll in this
>> scenario, however, vcpu_is_preempted returns false and guest decides
>> to poll.
> 
> ... which is not necessarily _wrong_.  It's just a different heuristic.

Right, it's just harder to use than host's single_task_running() -- the
VCPU calling vcpu_is_preempted() is never preempted, so we have to look
at other VCPUs that are not halted, but still preempted.

If we see some ratio of preempted VCPUs (> 0?), then we stop polling and
yield to the host.  Working under the assumption that there is work for
this PCPU if other VCPUs have stuff to do.  The downside is that it
misses information about host's topology, so it would be hard to make it
work well.


Re: [PATCH 2/2] x86/idle: use dynamic halt poll

2017-06-27 Thread Paolo Bonzini


On 27/06/2017 14:23, Wanpeng Li wrote:
 I have considered single_task_running() before. But since there is no
 such paravirtual interface currently and i am not sure whether it is a
 information leak from host if introducing such interface, so i didn't do
 it. Do you mean vcpu_is_preempted can do the same thing? I check the
 code and seems it only tells whether the VCPU is scheduled out or not
 which cannot satisfy the needs.
>>> Can you help to answer my confusion? I have double checked the code, but
>>> still not get your point. Do you think it is necessary to introduce an
>>> paravirtual interface to expose single_task_running() to guest?
>
> I think vcpu_is_preempted is a good enough replacement.
> For example, vcpu->arch.st.steal.preempted is 0 when the vCPU is sched
> in and vmentry, then several tasks are enqueued on the same pCPU and
> waiting on cfs red-black tree, the guest should avoid to poll in this
> scenario, however, vcpu_is_preempted returns false and guest decides
> to poll.

... which is not necessarily _wrong_.  It's just a different heuristic.

In the end, the guest could run with "idle=poll" even, and there's
little the host scheduler can do about it, except treating it as a CPU
bound task.

Paolo


Re: [PATCH 2/2] x86/idle: use dynamic halt poll

2017-06-27 Thread Wanpeng Li
2017-06-27 20:07 GMT+08:00 Paolo Bonzini :
>
>
> On 27/06/2017 13:22, Yang Zhang wrote:

 Regarding the good/bad idea part, KVM's polling is made much more
 acceptable by single_task_running().  At least you need to integrate it
 with paravirtualization.  If the VM is scheduled out, you shrink the
 polling period.  There is already vcpu_is_preempted for this, it is used
 by mutexes.
>>>
>>> I have considered single_task_running() before. But since there is no
>>> such paravirtual interface currently and i am not sure whether it is a
>>> information leak from host if introducing such interface, so i didn't do
>>> it. Do you mean vcpu_is_preempted can do the same thing? I check the
>>> code and seems it only tells whether the VCPU is scheduled out or not
>>> which cannot satisfy the needs.
>>
>> Can you help to answer my confusion? I have double checked the code, but
>> still not get your point. Do you think it is necessary to introduce an
>> paravirtual interface to expose single_task_running() to guest?
>
> I think vcpu_is_preempted is a good enough replacement.

For example, vcpu->arch.st.steal.preempted is 0 when the vCPU is scheduled
in and does a vmentry; then several tasks are enqueued on the same pCPU,
waiting on the CFS red-black tree. The guest should avoid polling in this
scenario; however, vcpu_is_preempted returns false and the guest decides
to poll.

Regards,
Wanpeng Li


Re: [PATCH v4 2/3] arm64: kvm: route synchronous external abort exceptions to el2

2017-06-27 Thread gengdongjiu
correct the commit message:

 In the firmware-first RAS solution, the OS receives a synchronous
 external abort that is trapped to EL3 by SCR_EL3.EA. Firmware inspects
 HCR_EL2.TEA and chooses the target of APEI's SEA notification:
 if HCR_EL2.TEA is set, it delegates the error exception to the hypervisor,
 otherwise it delegates to the host OS kernel.


On 2017/6/26 20:45, Dongjiu Geng wrote:
> In the firmware-first RAS solution, guest OS receives a synchronous
> external abort, then trapped to EL3 by SCR_EL3.EA. Firmware inspects
> the HCR_EL2.TEA and chooses the target to send APEI's SEA notification.
> If the SCR_EL3.EA is set, delegates the error exception to the hypervisor,
> otherwise it delegates to the guest OS kernel
> 
> Signed-off-by: Dongjiu Geng 
> ---
>  arch/arm64/include/asm/kvm_arm.h | 2 ++
>  arch/arm64/include/asm/kvm_emulate.h | 7 +++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h 
> b/arch/arm64/include/asm/kvm_arm.h
> index 61d694c..1188272 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -23,6 +23,8 @@
>  #include 
>  
>  /* Hyp Configuration Register (HCR) bits */
> +#define HCR_TEA  (UL(1) << 37)
> +#define HCR_TERR (UL(1) << 36)
>  #define HCR_E2H  (UL(1) << 34)
>  #define HCR_ID   (UL(1) << 33)
>  #define HCR_CD   (UL(1) << 32)
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index f5ea0ba..5f64ab2 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -47,6 +47,13 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>   vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
>   if (is_kernel_in_hyp_mode())
>   vcpu->arch.hcr_el2 |= HCR_E2H;
> + if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
> + /* route synchronous external abort exceptions to EL2 */
> + vcpu->arch.hcr_el2 |= HCR_TEA;
> + /* trap error record accesses */
> + vcpu->arch.hcr_el2 |= HCR_TERR;
> + }
> +
>   if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
>   vcpu->arch.hcr_el2 &= ~HCR_RW;
>  }
> 



Re: [PATCH 2/2] x86/idle: use dynamic halt poll

2017-06-27 Thread Paolo Bonzini


On 27/06/2017 13:22, Yang Zhang wrote:
>>>
>>> Regarding the good/bad idea part, KVM's polling is made much more
>>> acceptable by single_task_running().  At least you need to integrate it
>>> with paravirtualization.  If the VM is scheduled out, you shrink the
>>> polling period.  There is already vcpu_is_preempted for this, it is used
>>> by mutexes.
>>
>> I have considered single_task_running() before. But since there is no
>> such paravirtual interface currently and i am not sure whether it is a
>> information leak from host if introducing such interface, so i didn't do
>> it. Do you mean vcpu_is_preempted can do the same thing? I check the
>> code and seems it only tells whether the VCPU is scheduled out or not
>> which cannot satisfy the needs.
> 
> Can you help to answer my confusion? I have double checked the code, but
> still not get your point. Do you think it is necessary to introduce an
> paravirtual interface to expose single_task_running() to guest?

I think vcpu_is_preempted is a good enough replacement.

Paolo


Re: [RFC v4 02/17] mm: ability to disable execute permission on a key at creation

2017-06-27 Thread Balbir Singh
On Tue, 2017-06-27 at 03:11 -0700, Ram Pai wrote:
> Currently sys_pkey_alloc() provides the ability to disable read
> and write permission on the key at creation. powerpc has the
> hardware support to disable execute on a pkey as well. This patch
> enhances the interface to allow disabling execute at key creation
> time. x86 does not allow this, hence the next patch will add the
> ability in x86 to return an error if PKEY_DISABLE_EXECUTE is
> specified.
> 
> Signed-off-by: Ram Pai 
> ---

Acked-by: Balbir Singh 



Re: [RFC v4 03/17] x86: key creation with PKEY_DISABLE_EXECUTE disallowed

2017-06-27 Thread Balbir Singh
On Tue, 2017-06-27 at 03:11 -0700, Ram Pai wrote:
> x86 does not support disabling execute permissions on a pkey.
> 
> Signed-off-by: Ram Pai 
> ---
>  arch/x86/kernel/fpu/xstate.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index c24ac1e..d582631 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -900,6 +900,9 @@ int arch_set_user_pkey_access(struct task_struct *tsk, 
> int pkey,
>   if (!boot_cpu_has(X86_FEATURE_OSPKE))
>   return -EINVAL;
>  
> + if (init_val & PKEY_DISABLE_EXECUTE)
> + return -EINVAL;
> +
>   /* Set the bits we need in PKRU:  */
>   if (init_val & PKEY_DISABLE_ACCESS)
>   new_pkru_bits |= PKRU_AD_BIT;

I am not an x86 expert. IIUC, execute disable is done via allocating an
execute_only_pkey and checking vma_key via AD + vma_flags against VM_EXEC.

Your patch looks good to me

Acked-by: Balbir Singh 

Balbir Singh.



Re: [PATCH 2/2] x86/idle: use dynamic halt poll

2017-06-27 Thread Yang Zhang

On 2017/6/23 11:58, Yang Zhang wrote:

On 2017/6/22 19:51, Paolo Bonzini wrote:



On 22/06/2017 13:22, root wrote:

 ==

+poll_grow: (X86 only)
+
+This parameter is multiplied in the grow_poll_ns() to increase the
poll time.
+By default, the value is 2.
+
+==
+poll_shrink: (X86 only)
+
+This parameter is divided in the shrink_poll_ns() to reduce the poll
time.
+By default, the value is 2.


Even before starting the debate on whether this is a good idea or a bad
idea, KVM reduces the polling value to the minimum (10 us) by default


I noticed it. It looks like the logic inside KVM is more reasonable. I
will do more testing to compare the two.


when polling fails.  Also, it shouldn't be bound to
CONFIG_HYPERVISOR_GUEST, since there's nothing specific to virtual
machines here.


Yes. The original idea was to use CONFIG_HYPERVISOR_GUEST because this
mechanism is only helpful inside a VM. But as Thomas mentioned on another
thread, it is wrong to use it since most distribution kernels set it to
yes, which would still affect bare metal. I will integrate it with the
paravirtualization part as you suggested below.



Regarding the good/bad idea part, KVM's polling is made much more
acceptable by single_task_running().  At least you need to integrate it
with paravirtualization.  If the VM is scheduled out, you shrink the
polling period.  There is already vcpu_is_preempted for this, it is used
by mutexes.


I have considered single_task_running() before. But since there is no
such paravirtual interface currently, and I am not sure whether such an
interface would leak information from the host, I didn't do it. Do you
mean vcpu_is_preempted can do the same thing? I checked the code and it
seems it only tells whether the VCPU is scheduled out or not, which
cannot satisfy the needs.


Hi Paolo

Can you help to clear up my confusion? I have double-checked the code, but
still do not get your point. Do you think it is necessary to introduce a
paravirtual interface to expose single_task_running() to the guest?


--
Yang
Alibaba Cloud Computing
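
To make the grow/shrink knobs quoted above concrete, a self-contained sketch
of how such an adaptive window evolves (constants hypothetical; KVM's
in-kernel version is grow_halt_poll_ns()/shrink_halt_poll_ns() in
virt/kvm/kvm_main.c):

#include <stdio.h>

static unsigned long poll_ns;                     /* current window */
static const unsigned long poll_ns_max  = 500000; /* cap, hypothetical */
static const unsigned long poll_ns_base = 10000;  /* grow-from-zero floor */
static const unsigned int poll_grow = 2, poll_shrink = 2;

static void grow_poll_ns(void)
{
        /* A wakeup arrived just after halting: poll longer next time. */
        poll_ns = poll_ns ? poll_ns * poll_grow : poll_ns_base;
        if (poll_ns > poll_ns_max)
                poll_ns = poll_ns_max;
}

static void shrink_poll_ns(void)
{
        /* The whole window elapsed with no event: back off. */
        poll_ns = poll_shrink ? poll_ns / poll_shrink : 0;
}

int main(void)
{
        grow_poll_ns();
        printf("after a near-miss wakeup: %lu ns\n", poll_ns);
        shrink_poll_ns();
        printf("after a wasted poll:      %lu ns\n", poll_ns);
        return 0;
}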


Re: [RFC v4 01/17] mm: introduce an additional vma bit for powerpc pkey

2017-06-27 Thread Balbir Singh
On Tue, 2017-06-27 at 03:11 -0700, Ram Pai wrote:
> Currently there are only 4 bits in the vma flags to support 16 keys
> on x86.  powerpc supports 32 keys, which needs 5 bits. This patch
> introduces an additional bit in the vma flags.
> 
> Signed-off-by: Ram Pai 
> ---
>  fs/proc/task_mmu.c |  6 +-
>  include/linux/mm.h | 18 +-
>  2 files changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index f0c8b33..2ddc298 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -666,12 +666,16 @@ static void show_smap_vma_flags(struct seq_file *m, 
> struct vm_area_struct *vma)
>   [ilog2(VM_MERGEABLE)]   = "mg",
>   [ilog2(VM_UFFD_MISSING)]= "um",
>   [ilog2(VM_UFFD_WP)] = "uw",
> -#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> +#ifdef CONFIG_ARCH_HAS_PKEYS
>   /* These come out via ProtectionKey: */
>   [ilog2(VM_PKEY_BIT0)]   = "",
>   [ilog2(VM_PKEY_BIT1)]   = "",
>   [ilog2(VM_PKEY_BIT2)]   = "",
>   [ilog2(VM_PKEY_BIT3)]   = "",
> +#endif /* CONFIG_ARCH_HAS_PKEYS */
> +#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
> + /* Additional bit in ProtectionKey: */
> + [ilog2(VM_PKEY_BIT4)]   = "",
>  #endif

Not sure why these are linked with smap bits, but I guess the keys live
in the Supervisor Mode Access Prevention area?

Balbir Singh.



Re: [RFC v4 00/17] powerpc: Memory Protection Keys

2017-06-27 Thread Balbir Singh
On Tue, 2017-06-27 at 03:11 -0700, Ram Pai wrote:
> Memory protection keys enable an application to protect its
> address space from inadvertent access or corruption by
> itself.
> 
> The overall idea:
> 
>  A process allocates a key and associates it with
>  an address range within its address space.
>  The process can then dynamically set read/write
>  permissions on the key without involving the
>  kernel. Any code that violates the permissions
>  of the address space, as defined by its associated
>  key, will receive a segmentation fault.
> 
> This patch series enables the feature on PPC64 HPTE
> platform.
> 
> ISA3.0 section 5.7.13 describes the detailed specifications.
> 
> 
> Testing:
>   This patch series has passed all the protection key
>   tests available in  the selftests directory.
>   The tests are updated to work on both x86 and powerpc.
> 
> version v4:
>   (1) patches no longer depend on the pte bits to program
>   the hpte -- comment by Balbir
>   (2) documentation updates
>   (3) fixed a bug in the selftest.
>   (4) unlike x86, powerpc lets signal handler change key
>   permission bits; the change will persist across
>   signal handler boundaries. Earlier we allowed
>   the signal handler to modify a field in the siginfo
>   structure which would then be used by the kernel
>   to program the key protection register (AMR)
>   -- resolves a issue raised by Ben.
>   "Calls to sys_swapcontext with a made-up context
>   will end up with a crap AMR if done by code who
>   didn't know about that register".
>   (5) these changes enable protection keys on 4k-page 
>   kernels as well.

I have not looked at the full series, but it seems cleaner than the original
one and the side-effect is that we can support 4k as well. Nice!

Balbir Singh.



[RFC v4 01/17] mm: introduce an additional vma bit for powerpc pkey

2017-06-27 Thread Ram Pai
Currently there are only 4 bits in the vma flags to support 16 keys
on x86.  powerpc supports 32 keys, which needs 5 bits. This patch
introduces an additional bit in the vma flags.

Signed-off-by: Ram Pai 
---
 fs/proc/task_mmu.c |  6 +-
 include/linux/mm.h | 18 +-
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f0c8b33..2ddc298 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -666,12 +666,16 @@ static void show_smap_vma_flags(struct seq_file *m, 
struct vm_area_struct *vma)
[ilog2(VM_MERGEABLE)]   = "mg",
[ilog2(VM_UFFD_MISSING)]= "um",
[ilog2(VM_UFFD_WP)] = "uw",
-#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
+#ifdef CONFIG_ARCH_HAS_PKEYS
/* These come out via ProtectionKey: */
[ilog2(VM_PKEY_BIT0)]   = "",
[ilog2(VM_PKEY_BIT1)]   = "",
[ilog2(VM_PKEY_BIT2)]   = "",
[ilog2(VM_PKEY_BIT3)]   = "",
+#endif /* CONFIG_ARCH_HAS_PKEYS */
+#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
+   /* Additional bit in ProtectionKey: */
+   [ilog2(VM_PKEY_BIT4)]   = "",
 #endif
};
size_t i;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7cb17c6..3d35bcc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -208,21 +208,29 @@ extern int overcommit_kbytes_handler(struct ctl_table *, 
int, void __user *,
 #define VM_HIGH_ARCH_BIT_1 33  /* bit only usable on 64-bit 
architectures */
 #define VM_HIGH_ARCH_BIT_2 34  /* bit only usable on 64-bit 
architectures */
 #define VM_HIGH_ARCH_BIT_3 35  /* bit only usable on 64-bit 
architectures */
+#define VM_HIGH_ARCH_BIT_4 36  /* bit only usable on 64-bit arch */
 #define VM_HIGH_ARCH_0 BIT(VM_HIGH_ARCH_BIT_0)
 #define VM_HIGH_ARCH_1 BIT(VM_HIGH_ARCH_BIT_1)
 #define VM_HIGH_ARCH_2 BIT(VM_HIGH_ARCH_BIT_2)
 #define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3)
+#define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4)
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
-#if defined(CONFIG_X86)
-# define VM_PATVM_ARCH_1   /* PAT reserves whole VMA at 
once (x86) */
-#if defined (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS)
+#ifdef CONFIG_ARCH_HAS_PKEYS
 # define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0
-# define VM_PKEY_BIT0  VM_HIGH_ARCH_0  /* A protection key is a 4-bit value */
+# define VM_PKEY_BIT0  VM_HIGH_ARCH_0
 # define VM_PKEY_BIT1  VM_HIGH_ARCH_1
 # define VM_PKEY_BIT2  VM_HIGH_ARCH_2
 # define VM_PKEY_BIT3  VM_HIGH_ARCH_3
-#endif
+#endif /* CONFIG_ARCH_HAS_PKEYS */
+
+#if defined(CONFIG_PPC64_MEMORY_PROTECTION_KEYS)
+# define VM_PKEY_BIT4  VM_HIGH_ARCH_4 /* additional key bit used on ppc64 */
+#endif /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
+
+
+#if defined(CONFIG_X86)
+# define VM_PATVM_ARCH_1   /* PAT reserves whole VMA at 
once (x86) */
 #elif defined(CONFIG_PPC)
 # define VM_SAOVM_ARCH_1   /* Strong Access Ordering 
(powerpc) */
 #elif defined(CONFIG_PARISC)
-- 
1.8.3.1
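
As a toy illustration of the packing this patch extends, the 5-bit key
occupies vm_flags bits 32..36 (VM_HIGH_ARCH_BIT_0 through VM_HIGH_ARCH_BIT_4
above); a user-space sketch of the pack/unpack, not kernel code:

#include <stdio.h>

#define VM_PKEY_SHIFT 32                          /* VM_HIGH_ARCH_BIT_0 */
#define VM_PKEY_MASK  (0x1fUL << VM_PKEY_SHIFT)   /* 5 bits on ppc64 */

static unsigned long pkey_to_vmflags(int pkey)
{
        return ((unsigned long)pkey << VM_PKEY_SHIFT) & VM_PKEY_MASK;
}

static int vmflags_to_pkey(unsigned long vm_flags)
{
        return (vm_flags & VM_PKEY_MASK) >> VM_PKEY_SHIFT;
}

int main(void)
{
        unsigned long flags = pkey_to_vmflags(21);   /* 0b10101: needs BIT4 */

        printf("pkey 21 -> vm_flags 0x%lx -> pkey %d\n",
               flags, vmflags_to_pkey(flags));
        return 0;
}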



[RFC v4 04/17] powerpc: Implement sys_pkey_alloc and sys_pkey_free system call

2017-06-27 Thread Ram Pai
sys_pkey_alloc() allocates and returns an available pkey.
sys_pkey_free() frees up the pkey.

In total, 32 keys are supported on powerpc. However, pkeys 0, 1 and 31
are reserved, so effectively we have 29 pkeys.

Each key can be initialized to disable read, write and execute
permissions. On powerpc a key can be initialized to disable execute.

Signed-off-by: Ram Pai 
---
 arch/powerpc/Kconfig |  15 
 arch/powerpc/include/asm/book3s/64/mmu.h |  10 +++
 arch/powerpc/include/asm/book3s/64/pgtable.h |  62 ++
 arch/powerpc/include/asm/pkeys.h | 124 +++
 arch/powerpc/include/asm/systbl.h|   2 +
 arch/powerpc/include/asm/unistd.h|   4 +-
 arch/powerpc/include/uapi/asm/unistd.h   |   2 +
 arch/powerpc/mm/Makefile |   1 +
 arch/powerpc/mm/mmu_context_book3s64.c   |   5 ++
 arch/powerpc/mm/pkeys.c  |  88 +++
 10 files changed, 310 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/include/asm/pkeys.h
 create mode 100644 arch/powerpc/mm/pkeys.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index f7c8f99..81202e5 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -871,6 +871,21 @@ config SECCOMP
 
  If unsure, say Y. Only embedded should say N here.
 
+config PPC64_MEMORY_PROTECTION_KEYS
+   prompt "PowerPC Memory Protection Keys"
+   def_bool y
+   # Note: only available in 64-bit mode
+   depends on PPC64
+   select ARCH_USES_HIGH_VMA_FLAGS
+   select ARCH_HAS_PKEYS
+   ---help---
+ Memory Protection Keys provides a mechanism for enforcing
+ page-based protections, but without requiring modification of the
+ page tables when an application changes protection domains.
+
+ For details, see Documentation/powerpc/protection-keys.txt
+
+ If unsure, say y.
 endmenu
 
 config ISA_DMA_API
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index 77529a3..0c0a2a8 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -108,6 +108,16 @@ struct patb_entry {
 #ifdef CONFIG_SPAPR_TCE_IOMMU
struct list_head iommu_group_mem_list;
 #endif
+
+#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
+   /*
+* Each bit represents one protection key.
+* bit set   -> key allocated
+* bit unset -> key available for allocation
+*/
+   u32 pkey_allocation_map;
+   s16 execute_only_pkey; /* key holding execute-only protection */
+#endif
 } mm_context_t;
 
 /*
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 85bc987..87e9a89 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -428,6 +428,68 @@ static inline void huge_ptep_set_wrprotect(struct 
mm_struct *mm,
pte_update(mm, addr, ptep, 0, _PAGE_PRIVILEGED, 1);
 }
 
+
+#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
+
+#include 
+static inline u64 read_amr(void)
+{
+   return mfspr(SPRN_AMR);
+}
+static inline void write_amr(u64 value)
+{
+   mtspr(SPRN_AMR, value);
+}
+static inline u64 read_iamr(void)
+{
+   return mfspr(SPRN_IAMR);
+}
+static inline void write_iamr(u64 value)
+{
+   mtspr(SPRN_IAMR, value);
+}
+static inline u64 read_uamor(void)
+{
+   return mfspr(SPRN_UAMOR);
+}
+static inline void write_uamor(u64 value)
+{
+   mtspr(SPRN_UAMOR, value);
+}
+
+#else /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
+
+static inline u64 read_amr(void)
+{
+   WARN(1, "%s called with MEMORY PROTECTION KEYS disabled\n", __func__);
+   return -1;
+}
+static inline void write_amr(u64 value)
+{
+   WARN(1, "%s called with MEMORY PROTECTION KEYS disabled\n", __func__);
+}
+static inline u64 read_uamor(void)
+{
+   WARN(1, "%s called with MEMORY PROTECTION KEYS disabled\n", __func__);
+   return -1;
+}
+static inline void write_uamor(u64 value)
+{
+   WARN(1, "%s called with MEMORY PROTECTION KEYS disabled\n", __func__);
+}
+static inline u64 read_iamr(void)
+{
+   WARN(1, "%s called with MEMORY PROTECTION KEYS disabled\n", __func__);
+   return -1;
+}
+static inline void write_iamr(u64 value)
+{
+   WARN(1, "%s called with MEMORY PROTECTION KEYS disabled\n", __func__);
+}
+
+#endif /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
+
+
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep)
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
new file mode 100644
index 000..7bc8746
--- /dev/null
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -0,0 +1,124 @@
+#ifndef _ASM_PPC64_PKEYS_H
+#define _ASM_PPC64_PKEYS_H
+
+
+#define arch_max_pkey()  32
+
+#define 
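
A user-space sketch of the allocate/associate/free lifecycle these syscalls
provide (assumes a libc whose <sys/syscall.h> defines the pkey_* syscall
numbers; error handling trimmed):

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef PKEY_DISABLE_WRITE
#define PKEY_DISABLE_WRITE 0x2
#endif

int main(void)
{
        long pkey = syscall(SYS_pkey_alloc, 0, PKEY_DISABLE_WRITE);
        char *p;

        if (pkey < 0) {
                perror("pkey_alloc");
                return 1;
        }

        p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                 MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        strcpy(p, "hello");

        /* Attach the key: writes through this mapping now fault. */
        if (syscall(SYS_pkey_mprotect, p, 4096,
                    PROT_READ | PROT_WRITE, pkey) < 0)
                perror("pkey_mprotect");

        printf("reads still work: %s\n", p);

        syscall(SYS_pkey_free, pkey);
        return 0;
}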

[RFC v4 07/17] powerpc: make the hash functions protection-key aware

2017-06-27 Thread Ram Pai
Prepare the hash functions to be aware of protection keys.
The key will later be used to program the HPTE.

Signed-off-by: Ram Pai 
---
 arch/powerpc/include/asm/book3s/64/hash.h |  2 +-
 arch/powerpc/include/asm/book3s/64/mmu-hash.h | 14 ++-
 arch/powerpc/mm/hash64_4k.c   |  4 ++--
 arch/powerpc/mm/hash64_64k.c  |  8 +++
 arch/powerpc/mm/hash_utils_64.c   | 34 ++-
 arch/powerpc/mm/hugepage-hash64.c |  4 ++--
 arch/powerpc/mm/hugetlbpage-hash64.c  |  5 ++--
 arch/powerpc/mm/mem.c |  1 +
 arch/powerpc/mm/mmu_decl.h|  5 +++-
 9 files changed, 48 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index 4e957b0..3c1ef01 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -92,7 +92,7 @@ static inline int hash__pgd_bad(pgd_t pgd)
 
 extern void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned long pte, int huge);
-extern unsigned long htab_convert_pte_flags(unsigned long pteflags);
+extern unsigned long htab_convert_pte_flags(unsigned long pteflags, int pkey);
 /* Atomic PTE updates */
 static inline unsigned long hash__pte_update(struct mm_struct *mm,
 unsigned long addr,
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 6981a52..aa3c299 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -430,11 +430,11 @@ static inline unsigned long hpt_hash(unsigned long vpn,
 #define HPTE_NOHPTE_UPDATE 0x2
 
 extern int __hash_page_4K(unsigned long ea, unsigned long access,
- unsigned long vsid, pte_t *ptep, unsigned long trap,
- unsigned long flags, int ssize, int subpage_prot);
+ unsigned long vsid, pte_t *ptep, unsigned long trap,
+ unsigned long flags, int ssize, int subpage_prot, int pkey);
 extern int __hash_page_64K(unsigned long ea, unsigned long access,
   unsigned long vsid, pte_t *ptep, unsigned long trap,
-  unsigned long flags, int ssize);
+  unsigned long flags, int ssize, int pkey);
 struct mm_struct;
 unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap);
 extern int hash_page_mm(struct mm_struct *mm, unsigned long ea,
@@ -444,16 +444,18 @@ extern int hash_page(unsigned long ea, unsigned long 
access, unsigned long trap,
 unsigned long dsisr);
 int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long 
vsid,
 pte_t *ptep, unsigned long trap, unsigned long flags,
-int ssize, unsigned int shift, unsigned int mmu_psize);
+int ssize, unsigned int shift, unsigned int mmu_psize,
+int pkey);
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 extern int __hash_page_thp(unsigned long ea, unsigned long access,
   unsigned long vsid, pmd_t *pmdp, unsigned long trap,
-  unsigned long flags, int ssize, unsigned int psize);
+  unsigned long flags, int ssize, unsigned int psize,
+  int pkey);
 #else
 static inline int __hash_page_thp(unsigned long ea, unsigned long access,
  unsigned long vsid, pmd_t *pmdp,
  unsigned long trap, unsigned long flags,
- int ssize, unsigned int psize)
+ int ssize, unsigned int psize, int pkey)
 {
BUG();
return -1;
diff --git a/arch/powerpc/mm/hash64_4k.c b/arch/powerpc/mm/hash64_4k.c
index 6fa450c..6765ba2 100644
--- a/arch/powerpc/mm/hash64_4k.c
+++ b/arch/powerpc/mm/hash64_4k.c
@@ -18,7 +18,7 @@
 
 int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
   pte_t *ptep, unsigned long trap, unsigned long flags,
-  int ssize, int subpg_prot)
+  int ssize, int subpg_prot, int pkey)
 {
unsigned long hpte_group;
unsigned long rflags, pa;
@@ -53,7 +53,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, 
unsigned long vsid,
 * PP bits. _PAGE_USER is already PP bit 0x2, so we only
 * need to add in 0x1 if it's a read-only user page
 */
-   rflags = htab_convert_pte_flags(new_pte);
+   rflags = htab_convert_pte_flags(new_pte, pkey);
 
if (cpu_has_feature(CPU_FTR_NOEXECUTE) &&
!cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index 1a68cb1..9ce4d7b 100644

[RFC v4 06/17] powerpc: Implementation for sys_mprotect_pkey() system call

2017-06-27 Thread Ram Pai
This system call associates the pkey with the vma corresponding to
the given address range.

Signed-off-by: Ram Pai 
---
 arch/powerpc/include/asm/mman.h|  8 ++-
 arch/powerpc/include/asm/pkeys.h   | 17 ++-
 arch/powerpc/include/asm/systbl.h  |  1 +
 arch/powerpc/include/asm/unistd.h  |  4 +-
 arch/powerpc/include/uapi/asm/unistd.h |  1 +
 arch/powerpc/mm/pkeys.c| 93 +-
 6 files changed, 117 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
index 30922f6..067eec2 100644
--- a/arch/powerpc/include/asm/mman.h
+++ b/arch/powerpc/include/asm/mman.h
@@ -13,6 +13,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -22,7 +23,12 @@
 static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
unsigned long pkey)
 {
-   return (prot & PROT_SAO) ? VM_SAO : 0;
+#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
+   return (((prot & PROT_SAO) ? VM_SAO : 0) |
+   pkey_to_vmflag_bits(pkey));
+#else
+   return ((prot & PROT_SAO) ? VM_SAO : 0);
+#endif
 }
 #define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)
 
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 7bc8746..41bf5d4 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -14,6 +14,15 @@
VM_PKEY_BIT3 | \
VM_PKEY_BIT4)
 
+static inline unsigned long  pkey_to_vmflag_bits(int pkey)
+{
+   return (((pkey & 0x1UL) ? VM_PKEY_BIT0 : 0x0UL) |
+   ((pkey & 0x2UL) ? VM_PKEY_BIT1 : 0x0UL) |
+   ((pkey & 0x4UL) ? VM_PKEY_BIT2 : 0x0UL) |
+   ((pkey & 0x8UL) ? VM_PKEY_BIT3 : 0x0UL) |
+   ((pkey & 0x10UL) ? VM_PKEY_BIT4 : 0x0UL));
+}
+
 /*
  * Bits are in BE format.
  * NOTE: key 31, 1, 0 are not used.
@@ -42,6 +51,12 @@
 #define mm_set_pkey_is_reserved(mm, pkey) (PKEY_INITIAL_ALLOCAION & \
pkeybit_mask(pkey))
 
+
+static inline int vma_pkey(struct vm_area_struct *vma)
+{
+   return (vma->vm_flags & ARCH_VM_PKEY_FLAGS) >> VM_PKEY_SHIFT;
+}
+
 static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
 {
/* a reserved key is never considered as 'explicitly allocated' */
@@ -114,7 +129,7 @@ static inline int arch_set_user_pkey_access(struct 
task_struct *tsk, int pkey,
return __arch_set_user_pkey_access(tsk, pkey, init_val);
 }
 
-static inline pkey_mm_init(struct mm_struct *mm)
+static inline void pkey_mm_init(struct mm_struct *mm)
 {
mm_pkey_allocation_map(mm) = PKEY_INITIAL_ALLOCAION;
/* -1 means unallocated or invalid */
diff --git a/arch/powerpc/include/asm/systbl.h 
b/arch/powerpc/include/asm/systbl.h
index 22dd776..b33b551 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -390,3 +390,4 @@
 SYSCALL(statx)
 SYSCALL(pkey_alloc)
 SYSCALL(pkey_free)
+SYSCALL(pkey_mprotect)
diff --git a/arch/powerpc/include/asm/unistd.h 
b/arch/powerpc/include/asm/unistd.h
index e0273bc..daf1ba9 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -12,12 +12,10 @@
 #include 
 
 
-#define NR_syscalls		386
+#define NR_syscalls		387
 
 #define __NR__exit __NR_exit
 
-#define __IGNORE_pkey_mprotect
-
 #ifndef __ASSEMBLY__
 
 #include 
diff --git a/arch/powerpc/include/uapi/asm/unistd.h 
b/arch/powerpc/include/uapi/asm/unistd.h
index 7993a07..71ae45e 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -396,5 +396,6 @@
 #define __NR_statx 383
 #define __NR_pkey_alloc384
 #define __NR_pkey_free 385
+#define __NR_pkey_mprotect 386
 
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index b97366e..11a32b3 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -15,6 +15,17 @@
 #include /* PKEY_*   */
 #include 
 
+#define pkeyshift(pkey) ((arch_max_pkey()-pkey-1) * AMR_BITS_PER_PKEY)
+
+static inline bool pkey_allows_readwrite(int pkey)
+{
+   int pkey_shift = pkeyshift(pkey);
+
+   if (!(read_uamor() & (0x3UL << pkey_shift)))
+   return true;
+
+   return !(read_amr() & ((AMR_AD_BIT|AMR_WD_BIT) << pkey_shift));
+}
 
 /*
  * set the access right in AMR IAMR and UAMOR register
@@ -68,7 +79,60 @@ int __arch_set_user_pkey_access(struct task_struct *tsk, int 
pkey,
 
 int __execute_only_pkey(struct mm_struct *mm)
 {
-   return -1;
+   bool need_to_set_mm_pkey = false;
+   int execute_only_pkey = mm->context.execute_only_pkey;
+   int ret;
+
+   /* Do we need to assign a pkey for mm's execute-only maps? */
+   if (execute_only_pkey == -1) {
+   /* Go allocate one to use, which might 
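
To see the pkeyshift() arithmetic from this patch in isolation, a toy sketch
(the AMR_AD_BIT/AMR_WD_BIT encodings here are assumed for illustration, not
taken from the patch):

#include <stdio.h>

#define ARCH_MAX_PKEY     32
#define AMR_BITS_PER_PKEY 2
#define AMR_AD_BIT 0x1UL   /* access-disable: assumed encoding */
#define AMR_WD_BIT 0x2UL   /* write-disable: assumed encoding */

static int pkeyshift(int pkey)
{
        /* Keys occupy 2-bit fields from the top of the 64-bit AMR down. */
        return (ARCH_MAX_PKEY - pkey - 1) * AMR_BITS_PER_PKEY;
}

int main(void)
{
        int pkey = 2;
        unsigned long amr = (AMR_AD_BIT | AMR_WD_BIT) << pkeyshift(pkey);

        printf("AMR mask denying all access via key %d: 0x%016lx\n",
               pkey, amr);
        return 0;
}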

[RFC v4 09/17] powerpc: call the hash functions with the correct pkey value

2017-06-27 Thread Ram Pai
Pass the correct protection key value to the hash functions on
page fault.

Signed-off-by: Ram Pai 
---
 arch/powerpc/include/asm/pkeys.h | 11 +++
 arch/powerpc/mm/hash_utils_64.c  |  4 
 arch/powerpc/mm/mem.c|  6 ++
 3 files changed, 21 insertions(+)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index ef1c601..1370b3f 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -74,6 +74,17 @@ static inline bool mm_pkey_is_allocated(struct mm_struct 
*mm, int pkey)
 }
 
 /*
+ * return the protection key of the vma corresponding to the
+ * given effective address @ea.
+ */
+static inline int mm_pkey(struct mm_struct *mm, unsigned long ea)
+{
+   struct vm_area_struct *vma = find_vma(mm, ea);
+   int pkey = vma ? vma_pkey(vma) : 0;
+   return pkey;
+}
+
+/*
  * Returns a positive, 5-bit key on success, or -1 on failure.
  */
 static inline int mm_pkey_alloc(struct mm_struct *mm)
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 7e67dea..403f75d 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1319,6 +1319,10 @@ int hash_page_mm(struct mm_struct *mm, unsigned long ea,
goto bail;
}
 
+#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
+   pkey = mm_pkey(mm, ea);
+#endif /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
+
if (hugeshift) {
if (is_thp)
rc = __hash_page_thp(ea, access, vsid, (pmd_t *)ptep,
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index ec890d3..0fcaa48 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -541,8 +541,14 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned 
long address,
return;
}
 
+#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
+   hash_preload_pkey(vma->vm_mm, address, access, trap, vma_pkey(vma));
+#else
hash_preload(vma->vm_mm, address, access, trap);
+#endif /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
+
 #endif /* CONFIG_PPC_STD_MMU */
+
 #if (defined(CONFIG_PPC_BOOK3E_64) || defined(CONFIG_PPC_FSL_BOOK3E)) \
&& defined(CONFIG_HUGETLB_PAGE)
if (is_vm_hugetlb_page(vma))
-- 
1.8.3.1



[RFC v4 05/17] powerpc: store and restore the pkey state across context switches

2017-06-27 Thread Ram Pai
Store and restore the AMR, IAMR and UAMOR register state of the task
before scheduling out and after scheduling in, respectively.

Signed-off-by: Ram Pai 
---
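
Not part of the patch: a userspace sketch of the behaviour this enables —
each thread keeps its own AMR value across involuntary context switches.
powerpc64-only (SPR 13 is the problem-state AMR alias); the concrete AMR
values below are arbitrary assumptions, and whatever UAMOR masks off is
excluded from the check, so the sketch only asserts stability.

#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static unsigned long read_amr(void)
{
        unsigned long val;

        asm volatile("mfspr %0, 13" : "=r"(val));       /* SPR 13 = AMR */
        return val;
}

static void write_amr(unsigned long val)
{
        asm volatile("mtspr 13, %0" : : "r"(val) : "memory");
}

static void *worker(void *arg)
{
        unsigned long got;
        int i;

        write_amr((unsigned long)arg);
        got = read_amr();               /* UAMOR may have masked bits off */
        for (i = 0; i < 1000; i++) {
                sched_yield();          /* invite context switches */
                assert(read_amr() == got);      /* fails without save/restore */
        }
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, worker, (void *)0x3UL);
        pthread_create(&b, NULL, worker, (void *)0xcUL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        puts("per-thread AMR state survived scheduling");
        return 0;
}
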
 arch/powerpc/include/asm/processor.h |  5 +
 arch/powerpc/kernel/process.c| 18 ++
 2 files changed, 23 insertions(+)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index a2123f2..1f714df 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -310,6 +310,11 @@ struct thread_struct {
struct thread_vr_state ckvr_state; /* Checkpointed VR state */
unsigned long   ckvrsave; /* Checkpointed VRSAVE */
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
+#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
+   unsigned long   amr;
+   unsigned long   iamr;
+   unsigned long   uamor;
+#endif
 #ifdef CONFIG_KVM_BOOK3S_32_HANDLER
void*   kvm_shadow_vcpu; /* KVM internal data */
 #endif /* CONFIG_KVM_BOOK3S_32_HANDLER */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index baae104..37d001a 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1096,6 +1096,11 @@ static inline void save_sprs(struct thread_struct *t)
t->tar = mfspr(SPRN_TAR);
}
 #endif
+#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
+   t->amr = mfspr(SPRN_AMR);
+   t->iamr = mfspr(SPRN_IAMR);
+   t->uamor = mfspr(SPRN_UAMOR);
+#endif
 }
 
 static inline void restore_sprs(struct thread_struct *old_thread,
@@ -1131,6 +1136,14 @@ static inline void restore_sprs(struct thread_struct *old_thread,
mtspr(SPRN_TAR, new_thread->tar);
}
 #endif
+#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
+   if (old_thread->amr != new_thread->amr)
+   mtspr(SPRN_AMR, new_thread->amr);
+   if (old_thread->iamr != new_thread->iamr)
+   mtspr(SPRN_IAMR, new_thread->iamr);
+   if (old_thread->uamor != new_thread->uamor)
+   mtspr(SPRN_UAMOR, new_thread->uamor);
+#endif
 }
 
 struct task_struct *__switch_to(struct task_struct *prev,
@@ -1686,6 +1699,11 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
current->thread.tm_texasr = 0;
current->thread.tm_tfiar = 0;
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
+#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
+   current->thread.amr   = 0x0ul;
+   current->thread.iamr  = 0x0ul;
+   current->thread.uamor = 0x0ul;
+#endif /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
 }
 EXPORT_SYMBOL(start_thread);
 
-- 
1.8.3.1



[RFC v4 08/17] powerpc: Program HPTE key protection bits

2017-06-27 Thread Ram Pai
Map the PTE protection key bits to the HPTE key protection bits while
creating HPTE entries.

Signed-off-by: Ram Pai 
---
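
Not part of the patch: a quick userspace self-check of the pkey-to-HPTE
bit scatter below — the top two key bits land in HPTE_R_KEY_HI and the
low three in HPTE_R_KEY_LO.  The HPTE_R_KEY_BIT* values are copied from
the mmu-hash.h hunk.

#include <assert.h>
#include <stdio.h>

#define HPTE_R_KEY_BIT0 0x2000000000000000ULL
#define HPTE_R_KEY_BIT1 0x1000000000000000ULL
#define HPTE_R_KEY_BIT2 0x0000000000000800ULL
#define HPTE_R_KEY_BIT3 0x0000000000000400ULL
#define HPTE_R_KEY_BIT4 0x0000000000000200ULL

static unsigned long long pkey_to_hpte_pkey_bits(int pkey)
{
        return ((pkey & 0x10) ? HPTE_R_KEY_BIT0 : 0) |
               ((pkey & 0x08) ? HPTE_R_KEY_BIT1 : 0) |
               ((pkey & 0x04) ? HPTE_R_KEY_BIT2 : 0) |
               ((pkey & 0x02) ? HPTE_R_KEY_BIT3 : 0) |
               ((pkey & 0x01) ? HPTE_R_KEY_BIT4 : 0);
}

int main(void)
{
        /* key 22 = 0b10110: HPTE key bits 0, 2 and 3 must come on */
        assert(pkey_to_hpte_pkey_bits(22) ==
               (HPTE_R_KEY_BIT0 | HPTE_R_KEY_BIT2 | HPTE_R_KEY_BIT3));
        printf("rflags |= 0x%016llx\n", pkey_to_hpte_pkey_bits(22));
        return 0;
}
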
 Makefile  | 2 +-
 arch/powerpc/include/asm/book3s/64/mmu-hash.h | 5 +
 arch/powerpc/include/asm/pkeys.h  | 9 +
 arch/powerpc/mm/hash_utils_64.c   | 4 
 4 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 470bd4d..141ea4e 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
 VERSION = 4
 PATCHLEVEL = 12
 SUBLEVEL = 0
-EXTRAVERSION = -rc3
+EXTRAVERSION = -rc3-64k
 NAME = Fearless Coyote
 
 # *DOCUMENTATION*
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index aa3c299..721a4c3 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -90,6 +90,8 @@
 #define HPTE_R_PP0 ASM_CONST(0x8000000000000000)
 #define HPTE_R_TS  ASM_CONST(0x4000000000000000)
 #define HPTE_R_KEY_HI  ASM_CONST(0x3000000000000000)
+#define HPTE_R_KEY_BIT0 ASM_CONST(0x2000000000000000)
+#define HPTE_R_KEY_BIT1 ASM_CONST(0x1000000000000000)
 #define HPTE_R_RPN_SHIFT   12
 #define HPTE_R_RPN ASM_CONST(0x0ffffffffffff000)
 #define HPTE_R_RPN_3_0 ASM_CONST(0x01fffffffffff000)
@@ -104,6 +106,9 @@
 #define HPTE_R_C   ASM_CONST(0x0000000000000080)
 #define HPTE_R_R   ASM_CONST(0x0000000000000100)
 #define HPTE_R_KEY_LO  ASM_CONST(0x0000000000000e00)
+#define HPTE_R_KEY_BIT2 ASM_CONST(0x0000000000000800)
+#define HPTE_R_KEY_BIT3 ASM_CONST(0x0000000000000400)
+#define HPTE_R_KEY_BIT4 ASM_CONST(0x0000000000000200)
 
 #define HPTE_V_1TB_SEG ASM_CONST(0x4000000000000000)
 #define HPTE_V_VRMA_MASK   ASM_CONST(0x4001ffffff000000)
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 41bf5d4..ef1c601 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -23,6 +23,15 @@ static inline unsigned long  pkey_to_vmflag_bits(int pkey)
((pkey & 0x10UL) ? VM_PKEY_BIT4 : 0x0UL));
 }
 
+static inline unsigned long  pkey_to_hpte_pkey_bits(int pkey)
+{
+   return  (((pkey & 0x10) ? HPTE_R_KEY_BIT0 : 0x0UL) |
+   ((pkey & 0x8) ? HPTE_R_KEY_BIT1 : 0x0UL) |
+   ((pkey & 0x4) ? HPTE_R_KEY_BIT2 : 0x0UL) |
+   ((pkey & 0x2) ? HPTE_R_KEY_BIT3 : 0x0UL) |
+   ((pkey & 0x1) ? HPTE_R_KEY_BIT4 : 0x0UL));
+}
+
 /*
  * Bits are in BE format.
  * NOTE: key 31, 1, 0 are not used.
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 2254ff0..7e67dea 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -231,6 +231,10 @@ unsigned long htab_convert_pte_flags(unsigned long pteflags, int pkey)
 */
rflags |= HPTE_R_M;
 
+#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
+   rflags |= pkey_to_hpte_pkey_bits(pkey);
+#endif
+
return rflags;
 }
 
-- 
1.8.3.1



[RFC v4 10/17] powerpc: Macro the mask used for checking DSI exception

2017-06-27 Thread Ram Pai
Replace the magic number used to check for a DSI exception
with a meaningful value.

Signed-off-by: Ram Pai 
---
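
Not part of the patch, just the arithmetic behind it: the four named bits
OR to DSISR_BIT32 | DSISR_PAGEATTR_CONFLT | DSISR_BADACCESS | DSISR_BIT43
= 0x80000000 | 0x20000000 | 0x04000000 | 0x00100000 = 0xa4100000, and
andis. compares against the upper halfword (the @h operand), i.e. the old
0xa410 magic.  A two-assert check:

#include <assert.h>

#define DSISR_BIT32             0x80000000u
#define DSISR_PAGEATTR_CONFLT   0x20000000u
#define DSISR_BADACCESS         0x04000000u
#define DSISR_BIT43             0x00100000u
#define DSISR_PAGE_FAULT_MASK   (DSISR_BIT32 | DSISR_PAGEATTR_CONFLT | \
                                 DSISR_BADACCESS | DSISR_BIT43)

int main(void)
{
        assert(DSISR_PAGE_FAULT_MASK == 0xa4100000u);
        assert((DSISR_PAGE_FAULT_MASK >> 16) == 0xa410); /* the old andis. magic */
        return 0;
}
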
 arch/powerpc/include/asm/reg.h   | 7 ++-
 arch/powerpc/kernel/exceptions-64s.S | 2 +-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 7e50e47..ba110dd 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -272,16 +272,21 @@
 #define SPRN_DAR   0x013   /* Data Address Register */
 #define SPRN_DBCR  0x136   /* e300 Data Breakpoint Control Reg */
 #define SPRN_DSISR 0x012   /* Data Storage Interrupt Status Register */
+#define   DSISR_BIT32  0x80000000  /* not defined */
 #define   DSISR_NOHPTE 0x40000000  /* no translation found */
+#define   DSISR_PAGEATTR_CONFLT 0x20000000  /* page attribute conflict */
+#define   DSISR_BIT35  0x10000000  /* not defined */
 #define   DSISR_PROTFAULT  0x08000000  /* protection fault */
 #define   DSISR_BADACCESS  0x04000000  /* bad access to CI or G */
 #define   DSISR_ISSTORE    0x02000000  /* access was a store */
 #define   DSISR_DABRMATCH  0x00400000  /* hit data breakpoint */
-#define   DSISR_NOSEGMENT  0x00200000  /* SLB miss */
 #define   DSISR_KEYFAULT   0x00200000  /* Key fault */
+#define   DSISR_BIT43  0x00100000  /* not defined */
 #define   DSISR_UNSUPP_MMU 0x00080000  /* Unsupported MMU config */
 #define   DSISR_SET_RC 0x00040000  /* Failed setting of R/C bits */
 #define   DSISR_PGDIRFAULT  0x00020000  /* Fault on page directory */
+#define   DSISR_PAGE_FAULT_MASK (DSISR_BIT32 | DSISR_PAGEATTR_CONFLT | \
+   DSISR_BADACCESS | DSISR_BIT43)
 #define SPRN_TBRL  0x10C   /* Time Base Read Lower Register (user, R/O) */
 #define SPRN_TBRU  0x10D   /* Time Base Read Upper Register (user, R/O) */
 #define SPRN_CIR   0x11B   /* Chip Information Register (hyper, R/0) */
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index ae418b8..3fd0528 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1411,7 +1411,7 @@ USE_TEXT_SECTION()
.balign IFETCH_ALIGN_BYTES
 do_hash_page:
 #ifdef CONFIG_PPC_STD_MMU_64
-   andis.  r0,r4,0xa410/* weird error? */
+   andis.  r0,r4,DSISR_PAGE_FAULT_MASK@h
bne-handle_page_fault   /* if not, try to insert a HPTE */
andis.  r0,r4,DSISR_DABRMATCH@h
bne-handle_dabr_fault
-- 
1.8.3.1



[RFC v4 12/17] powerpc: Deliver SEGV signal on pkey violation

2017-06-27 Thread Ram Pai
The value of the AMR register at the time of the exception is made
available in gp_regs[PT_AMR] of the signal context.

The value of the pkey whose protection got violated is made available
in the si_pkey field of the siginfo structure.

Signed-off-by: Ram Pai 
---
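
Not part of the patch: a hedged sketch of how powerpc64 userspace might
consume the new information.  PT_AMR (45) and the si_pkey offset (0x20)
are assumptions taken from this series and its selftest; glibc does not
export either yet, and SEGV_PKUERR may need a local fallback definition.

#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <ucontext.h>
#include <unistd.h>

#define PT_AMR 45               /* from this patch's ptrace.h addition */
#define SI_PKEY_OFFSET 0x20     /* assumed siginfo layout, as in the selftest */

#ifndef SEGV_PKUERR
#define SEGV_PKUERR 4
#endif

static void segv_handler(int sig, siginfo_t *si, void *vctx)
{
        ucontext_t *ctx = vctx;
        unsigned int pkey;

        /* glibc has no si_pkey field yet; fish it out by offset */
        memcpy(&pkey, (char *)si + SI_PKEY_OFFSET, sizeof(pkey));
        if (si->si_code == SEGV_PKUERR)
                fprintf(stderr, "key fault: sig=%d key=%u amr=%#llx\n",
                        sig, pkey,
                        (unsigned long long)ctx->uc_mcontext.gp_regs[PT_AMR]);
        _exit(1);
}

int main(void)
{
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = segv_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);
        /* ... fault on a key-protected mapping here ... */
        return 0;
}
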
 arch/powerpc/include/asm/paca.h|  1 +
 arch/powerpc/include/uapi/asm/ptrace.h |  3 ++-
 arch/powerpc/kernel/asm-offsets.c  |  5 
 arch/powerpc/kernel/exceptions-64s.S   | 16 +--
 arch/powerpc/kernel/signal_32.c|  5 
 arch/powerpc/kernel/signal_64.c|  4 +++
 arch/powerpc/kernel/traps.c| 49 ++
 arch/powerpc/mm/fault.c|  2 ++
 8 files changed, 82 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 1c09f8f..a41afd3 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -92,6 +92,7 @@ struct paca_struct {
struct dtl_entry *dispatch_log_end;
 #endif /* CONFIG_PPC_STD_MMU_64 */
u64 dscr_default;   /* per-CPU default DSCR */
+   u64 paca_amr;   /* value of amr at exception */
 
 #ifdef CONFIG_PPC_STD_MMU_64
/*
diff --git a/arch/powerpc/include/uapi/asm/ptrace.h b/arch/powerpc/include/uapi/asm/ptrace.h
index 8036b38..7ec2428 100644
--- a/arch/powerpc/include/uapi/asm/ptrace.h
+++ b/arch/powerpc/include/uapi/asm/ptrace.h
@@ -108,8 +108,9 @@ struct pt_regs {
 #define PT_DAR 41
 #define PT_DSISR 42
 #define PT_RESULT 43
-#define PT_DSCR 44
 #define PT_REGS_COUNT 44
+#define PT_DSCR 44
+#define PT_AMR 45
 
 #define PT_FPR048  /* each FP reg occupies 2 slots in this space */
 
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 709e234..17f5d8a 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -241,6 +241,11 @@ int main(void)
OFFSET(PACAHWCPUID, paca_struct, hw_cpu_id);
OFFSET(PACAKEXECSTATE, paca_struct, kexec_state);
OFFSET(PACA_DSCR_DEFAULT, paca_struct, dscr_default);
+
+#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
+   OFFSET(PACA_AMR, paca_struct, paca_amr);
+#endif /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
+
OFFSET(ACCOUNT_STARTTIME, paca_struct, accounting.starttime);
OFFSET(ACCOUNT_STARTTIME_USER, paca_struct, accounting.starttime_user);
OFFSET(ACCOUNT_USER_TIME, paca_struct, accounting.utime);
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 3fd0528..a4de1b4 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -493,9 +493,15 @@ EXC_COMMON_BEGIN(data_access_common)
ld  r12,_MSR(r1)
ld  r3,PACA_EXGEN+EX_DAR(r13)
lwz r4,PACA_EXGEN+EX_DSISR(r13)
-   li  r5,0x300
std r3,_DAR(r1)
std r4,_DSISR(r1)
+#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
+   andis.  r0,r4,DSISR_KEYFAULT@h /* save AMR only if it's a key fault */
+   beq+1f
+   mfspr   r5,SPRN_AMR
+   std r5,PACA_AMR(r13)
+#endif /*  CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
+1: li  r5,0x300
 BEGIN_MMU_FTR_SECTION
b   do_hash_page/* Try to handle as hpte fault */
 MMU_FTR_SECTION_ELSE
@@ -561,9 +567,15 @@ EXC_COMMON_BEGIN(instruction_access_common)
ld  r12,_MSR(r1)
ld  r3,_NIP(r1)
andis.  r4,r12,0x5820
-   li  r5,0x400
std r3,_DAR(r1)
std r4,_DSISR(r1)
+#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
+   andis.  r0,r4,DSISR_KEYFAULT@h /* save AMR only if it's a key fault */
+   beq+1f
+   mfspr   r5,SPRN_AMR
+   std r5,PACA_AMR(r13)
+#endif /*  CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
+1: li  r5,0x400
 BEGIN_MMU_FTR_SECTION
b   do_hash_page/* Try to handle as hpte fault */
 MMU_FTR_SECTION_ELSE
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 97bb138..9c4a7f3 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -500,6 +500,11 @@ static int save_user_regs(struct pt_regs *regs, struct mcontext __user *frame,
    (unsigned long) &frame->tramp[2]);
}
 
+#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
+   if (__put_user(get_paca()->paca_amr, &frame->mc_gregs[PT_AMR]))
+   return 1;
+#endif /*  CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
+
return 0;
 }
 
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index c83c115..86a4262 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -174,6 +174,10 @@ static long setup_sigcontext(struct sigcontext __user *sc,
if (set != NULL)
 err |=  __put_user(set->sig[0], &sc->oldmask);
 
+#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
+   err |= __put_user(get_paca()->paca_amr, 

[RFC v4 11/17] powerpc: Handle exceptions caused by pkey violation

2017-06-27 Thread Ram Pai
Handle Data and Instruction exceptions caused by memory
protection-key violations.

Signed-off-by: Ram Pai 
---
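
Not part of the patch: a portable userspace model of the AMR/UAMOR checks
the pkey_allows_*() helpers below perform.  NR_PKEYS, the top-down shift
and the two-bits-per-key layout follow the patch; the AD/WD positions
inside each pair are an assumption here, so the model claims only
self-consistency, not the ISA's exact encoding.

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_PKEYS          32
#define AMR_BITS_PER_PKEY 2
/* assumption: positions of the two bits inside a key's pair */
#define AMR_AD_BIT        0x2ULL
#define AMR_WD_BIT        0x1ULL

static int pkey_shift(int pkey)
{
        return (NR_PKEYS - pkey - 1) * AMR_BITS_PER_PKEY;
}

static bool allows_read(uint64_t amr, uint64_t uamor, int pkey)
{
        if (!(uamor & (0x3ULL << pkey_shift(pkey))))
                return true;    /* key not handed to userspace at all */
        return !(amr & (AMR_AD_BIT << pkey_shift(pkey)));
}

static bool allows_write(uint64_t amr, uint64_t uamor, int pkey)
{
        if (!(uamor & (0x3ULL << pkey_shift(pkey))))
                return true;
        return !(amr & (AMR_WD_BIT << pkey_shift(pkey)));
}

int main(void)
{
        int pkey = 2;
        uint64_t uamor = 0x3ULL << pkey_shift(pkey);
        uint64_t amr   = AMR_WD_BIT << pkey_shift(pkey);  /* write-deny key 2 */

        assert(allows_read(amr, uamor, pkey));
        assert(!allows_write(amr, uamor, pkey));
        return 0;
}
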
 arch/powerpc/include/asm/mmu_context.h | 12 ++
 arch/powerpc/include/asm/reg.h |  2 +-
 arch/powerpc/mm/fault.c| 20 +
 arch/powerpc/mm/pkeys.c| 79 ++
 4 files changed, 112 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index da7e943..71fffe0 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -175,11 +175,23 @@ static inline void arch_bprm_mm_init(struct mm_struct *mm,
 {
 }
 
+#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
+bool arch_pte_access_permitted(pte_t pte, bool write);
+bool arch_vma_access_permitted(struct vm_area_struct *vma,
+   bool write, bool execute, bool foreign);
+#else /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
+static inline bool arch_pte_access_permitted(pte_t pte, bool write)
+{
+   /* by default, allow everything */
+   return true;
+}
 static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
bool write, bool execute, bool foreign)
 {
/* by default, allow everything */
return true;
 }
+#endif /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
+
 #endif /* __KERNEL__ */
 #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index ba110dd..6e2a860 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -286,7 +286,7 @@
 #define   DSISR_SET_RC 0x0004  /* Failed setting of R/C bits */
 #define   DSISR_PGDIRFAULT  0x0002  /* Fault on page directory */
 #define   DSISR_PAGE_FAULT_MASK (DSISR_BIT32 | DSISR_PAGEATTR_CONFLT | \
-   DSISR_BADACCESS | DSISR_BIT43)
+   DSISR_BADACCESS | DSISR_KEYFAULT | DSISR_BIT43)
 #define SPRN_TBRL  0x10C   /* Time Base Read Lower Register (user, R/O) */
 #define SPRN_TBRU  0x10D   /* Time Base Read Upper Register (user, R/O) */
 #define SPRN_CIR   0x11B   /* Chip Information Register (hyper, R/0) */
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 3a7d580..3d71984 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -261,6 +261,13 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
}
 #endif
 
+#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
+   if (error_code & DSISR_KEYFAULT) {
+   code = SEGV_PKUERR;
+   goto bad_area_nosemaphore;
+   }
+#endif /*  CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
+
/* We restore the interrupt state now */
if (!arch_irq_disabled_regs(regs))
local_irq_enable();
@@ -441,6 +448,19 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
WARN_ON_ONCE(error_code & DSISR_PROTFAULT);
 #endif /* CONFIG_PPC_STD_MMU */
 
+#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
+   if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
+   is_exec, 0)) {
+   code = SEGV_PKUERR;
+   goto bad_area;
+   }
+#endif /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
+
+   /* handle_mm_fault() needs to know if it's an instruction access
+    * fault.
+    */
+   if (is_exec)
+   flags |= FAULT_FLAG_INSTRUCTION;
/*
 * If for any reason at all we couldn't handle the fault,
 * make sure we exit gracefully rather than endlessly redo
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index 11a32b3..514f503 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -27,6 +27,37 @@ static inline bool pkey_allows_readwrite(int pkey)
return !(read_amr() & ((AMR_AD_BIT|AMR_WD_BIT) << pkey_shift));
 }
 
+static inline bool pkey_allows_read(int pkey)
+{
+   int pkey_shift = (arch_max_pkey()-pkey-1) * AMR_BITS_PER_PKEY;
+
+   if (!(read_uamor() & (0x3ul << pkey_shift)))
+   return true;
+
+   return !(read_amr() & (AMR_AD_BIT << pkey_shift));
+}
+
+static inline bool pkey_allows_write(int pkey)
+{
+   int pkey_shift = (arch_max_pkey()-pkey-1) * AMR_BITS_PER_PKEY;
+
+   if (!(read_uamor() & (0x3ul << pkey_shift)))
+   return true;
+
+   return !(read_amr() & (AMR_WD_BIT << pkey_shift));
+}
+
+static inline bool pkey_allows_execute(int pkey)
+{
+   int pkey_shift = (arch_max_pkey()-pkey-1) * AMR_BITS_PER_PKEY;
+
+   if (!(read_uamor() & (0x3ul << pkey_shift)))
+   return true;
+
+   return !(read_iamr() & (IAMR_EX_BIT << pkey_shift));
+}
+
+
 /*
  * set the access right in AMR IAMR and UAMOR register
  * for @pkey to that specified in @init_val.
@@ -175,3 +206,51 @@ int __arch_override_mprotect_pkey(struct vm_area_struct *vma, int prot,
 */
return 

[RFC v4 15/17] Documentation: Move protection key documentation to arch neutral directory

2017-06-27 Thread Ram Pai
Since PowerPC and Intel both support memory protection keys, move
the documentation to an arch-neutral directory.

Signed-off-by: Ram Pai 
---
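
Not part of the patch: the document below refers to a pkey_set() wrapper
around RDPKRU/WRPKRU without showing one.  A minimal x86 sketch, assuming
the two-bits-per-key PKRU layout; the reference implementation lives in
tools/testing/selftests/x86/protection_keys.c.  Actually executing it
needs a pkeys-capable CPU — elsewhere the opcodes raise SIGILL.

#include <stdio.h>

static inline unsigned int rdpkru(void)
{
        unsigned int eax, edx;
        unsigned int ecx = 0;

        asm volatile(".byte 0x0f,0x01,0xee"
                     : "=a" (eax), "=d" (edx)
                     : "c" (ecx));
        return eax;
}

static inline void wrpkru(unsigned int pkru)
{
        unsigned int ecx = 0, edx = 0;

        asm volatile(".byte 0x0f,0x01,0xef"
                     : : "a" (pkru), "c" (ecx), "d" (edx));
}

/* Replace the two rights bits for 'pkey' with 'rights'
 * (PKEY_DISABLE_ACCESS and/or PKEY_DISABLE_WRITE). */
static int pkey_set(int pkey, unsigned long rights)
{
        unsigned int pkru = rdpkru();

        pkru &= ~(0x3U << (pkey * 2));
        pkru |= (unsigned int)rights << (pkey * 2);
        wrpkru(pkru);
        return 0;
}

int main(void)
{
        pkey_set(1, 0);                         /* open up key 1 */
        printf("PKRU=%#x\n", rdpkru());
        return 0;
}
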
 Documentation/vm/protection-keys.txt  | 85 +++
 Documentation/x86/protection-keys.txt | 85 ---
 2 files changed, 85 insertions(+), 85 deletions(-)
 create mode 100644 Documentation/vm/protection-keys.txt
 delete mode 100644 Documentation/x86/protection-keys.txt

diff --git a/Documentation/vm/protection-keys.txt b/Documentation/vm/protection-keys.txt
new file mode 100644
index 000..b643045
--- /dev/null
+++ b/Documentation/vm/protection-keys.txt
@@ -0,0 +1,85 @@
+Memory Protection Keys for Userspace (PKU aka PKEYs) is a CPU feature
+which will be found on future Intel CPUs.
+
+Memory Protection Keys provides a mechanism for enforcing page-based
+protections, but without requiring modification of the page tables
+when an application changes protection domains.  It works by
+dedicating 4 previously ignored bits in each page table entry to a
+"protection key", giving 16 possible keys.
+
+There is also a new user-accessible register (PKRU) with two separate
+bits (Access Disable and Write Disable) for each key.  Being a CPU
+register, PKRU is inherently thread-local, potentially giving each
+thread a different set of protections from every other thread.
+
+There are two new instructions (RDPKRU/WRPKRU) for reading and writing
+to the new register.  The feature is only available in 64-bit mode,
+even though there is theoretically space in the PAE PTEs.  These
+permissions are enforced on data access only and have no effect on
+instruction fetches.
+
+=== Syscalls ===
+
+There are 3 system calls which directly interact with pkeys:
+
+   int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
+   int pkey_free(int pkey);
+   int pkey_mprotect(unsigned long start, size_t len,
+ unsigned long prot, int pkey);
+
+Before a pkey can be used, it must first be allocated with
+pkey_alloc().  An application calls the WRPKRU instruction
+directly in order to change access permissions to memory covered
+with a key.  In this example WRPKRU is wrapped by a C function
+called pkey_set().
+
+   int real_prot = PROT_READ|PROT_WRITE;
+   pkey = pkey_alloc(0, PKEY_DENY_WRITE);
+   ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+   ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
+   ... application runs here
+
+Now, if the application needs to update the data at 'ptr', it can
+gain access, do the update, then remove its write access:
+
+   pkey_set(pkey, 0); // clear PKEY_DENY_WRITE
+   *ptr = foo; // assign something
+   pkey_set(pkey, PKEY_DENY_WRITE); // set PKEY_DENY_WRITE again
+
+Now when it frees the memory, it will also free the pkey since it
+is no longer in use:
+
+   munmap(ptr, PAGE_SIZE);
+   pkey_free(pkey);
+
+(Note: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
+ An example implementation can be found in
+ tools/testing/selftests/x86/protection_keys.c)
+
+=== Behavior ===
+
+The kernel attempts to make protection keys consistent with the
+behavior of a plain mprotect().  For instance if you do this:
+
+   mprotect(ptr, size, PROT_NONE);
+   something(ptr);
+
+you can expect the same effects with protection keys when doing this:
+
+   pkey = pkey_alloc(0, PKEY_DISABLE_WRITE | PKEY_DISABLE_READ);
+   pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
+   something(ptr);
+
+That should be true whether something() is a direct access to 'ptr'
+like:
+
+   *ptr = foo;
+
+or when the kernel does the access on the application's behalf like
+with a read():
+
+   read(fd, ptr, 1);
+
+The kernel will send a SIGSEGV in both cases, but si_code will be set
+to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
+the plain mprotect() permissions are violated.
diff --git a/Documentation/x86/protection-keys.txt b/Documentation/x86/protection-keys.txt
deleted file mode 100644
index b643045..000
--- a/Documentation/x86/protection-keys.txt
+++ /dev/null
@@ -1,85 +0,0 @@
-Memory Protection Keys for Userspace (PKU aka PKEYs) is a CPU feature
-which will be found on future Intel CPUs.
-
-Memory Protection Keys provides a mechanism for enforcing page-based
-protections, but without requiring modification of the page tables
-when an application changes protection domains.  It works by
-dedicating 4 previously ignored bits in each page table entry to a
-"protection key", giving 16 possible keys.
-
-There is also a new user-accessible register (PKRU) with two separate
-bits (Access Disable and Write Disable) for each key.  Being a CPU
-register, PKRU is inherently thread-local, potentially giving each
-thread a different set of 

[RFC v4 14/17] selftest: PowerPC specific test updates to memory protection keys

2017-06-27 Thread Ram Pai
Abstracted the arch-specific code out into the header file and added
powerpc-specific changes:

a) added a 4k-backed HPTE memory allocator (powerpc-specific).
b) added three test cases where the key is associated after the page is
accessed/allocated/mapped.
c) cleaned up the code to make checkpatch.pl happy.

Signed-off-by: Ram Pai 
---
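
Not part of the patch: the key difference buried in pkey_to_shift() below
is that x86 numbers keys up from bit 0 of the PKRU while powerpc numbers
them down from the top of the AMR.  A two-assert check using the
constants from the hunk:

#include <assert.h>

#define PKRU_BITS_PER_PKEY 2

static unsigned int x86_shift(int pkey)
{
        return pkey * PKRU_BITS_PER_PKEY;               /* NR_PKEYS = 16 */
}

static unsigned int ppc_shift(int pkey)
{
        return (32 - pkey - 1) * PKRU_BITS_PER_PKEY;    /* NR_PKEYS = 32 */
}

int main(void)
{
        assert(x86_shift(2) == 4);      /* key 2 sits low in the PKRU */
        assert(ppc_shift(2) == 58);     /* ... but high in the AMR */
        return 0;
}
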
 tools/testing/selftests/vm/pkey-helpers.h| 230 +--
 tools/testing/selftests/vm/protection_keys.c | 567 ---
 2 files changed, 518 insertions(+), 279 deletions(-)

diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h
index b202939..69bfa89 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -12,13 +12,72 @@
 #include 
 #include 
 
-#define NR_PKEYS 16
-#define PKRU_BITS_PER_PKEY 2
+/* Define some kernel-like types */
+#define  u8 uint8_t
+#define u16 uint16_t
+#define u32 uint32_t
+#define u64 uint64_t
+
+#ifdef __i386__ /* arch */
+
+#define SYS_mprotect_key 380
+#define SYS_pkey_alloc  381
+#define SYS_pkey_free   382
+#define REG_IP_IDX REG_EIP
+#define si_pkey_offset 0x14
+
+#define NR_PKEYS   16
+#define NR_RESERVED_PKEYS  1
+#define PKRU_BITS_PER_PKEY 2
+#define PKEY_DISABLE_ACCESS 0x1
+#define PKEY_DISABLE_WRITE 0x2
+#define HPAGE_SIZE (1UL<<21)
+
+#define INIT_PRKU 0x0UL
+
+#elif __powerpc64__ /* arch */
+
+#define SYS_mprotect_key 386
+#define SYS_pkey_alloc  384
+#define SYS_pkey_free   385
+#define si_pkey_offset 0x20
+#define REG_IP_IDX PT_NIP
+#define REG_TRAPNO PT_TRAP
+#define REG_AMR 45
+#define gregs gp_regs
+#define fpregs fp_regs
+
+#define NR_PKEYS   32
+#define NR_RESERVED_PKEYS  3
+#define PKRU_BITS_PER_PKEY 2
+#define PKEY_DISABLE_ACCESS 0x3  /* disable read and write */
+#define PKEY_DISABLE_WRITE 0x2
+#define HPAGE_SIZE (1UL<<24)
+
+#define INIT_PRKU 0x3UL
+#else /* arch */
+
+   NOT SUPPORTED
+
+#endif /* arch */
+
 
 #ifndef DEBUG_LEVEL
 #define DEBUG_LEVEL 0
 #endif
 #define DPRINT_IN_SIGNAL_BUF_SIZE 4096
+
+
+static inline u32 pkey_to_shift(int pkey)
+{
+#ifdef __i386__ /* arch */
+   return pkey * PKRU_BITS_PER_PKEY;
+#elif __powerpc64__ /* arch */
+   return (NR_PKEYS - pkey - 1) * PKRU_BITS_PER_PKEY;
+#endif /* arch */
+}
+
+
 extern int dprint_in_signal;
 extern char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE];
 static inline void sigsafe_printf(const char *format, ...)
@@ -53,53 +112,76 @@ static inline void sigsafe_printf(const char *format, ...)
 #define dprintf3(args...) dprintf_level(3, args)
 #define dprintf4(args...) dprintf_level(4, args)
 
-extern unsigned int shadow_pkru;
-static inline unsigned int __rdpkru(void)
+extern u64 shadow_pkey_reg;
+
+static inline u64 __rdpkey_reg(void)
 {
+#ifdef __i386__ /* arch */
unsigned int eax, edx;
unsigned int ecx = 0;
-   unsigned int pkru;
+   unsigned int pkey_reg;
 
asm volatile(".byte 0x0f,0x01,0xee\n\t"
 : "=a" (eax), "=d" (edx)
 : "c" (ecx));
-   pkru = eax;
-   return pkru;
+#elif __powerpc64__ /* arch */
+   u64 eax;
+   u64 pkey_reg;
+
+   asm volatile("mfspr %0, 0xd" : "=r" ((u64)(eax)));
+#endif /* arch */
+   pkey_reg = (u64)eax;
+   return pkey_reg;
 }
 
-static inline unsigned int _rdpkru(int line)
+static inline u64 _rdpkey_reg(int line)
 {
-   unsigned int pkru = __rdpkru();
+   u64 pkey_reg = __rdpkey_reg();
 
-   dprintf4("rdpkru(line=%d) pkru: %x shadow: %x\n",
-   line, pkru, shadow_pkru);
-   assert(pkru == shadow_pkru);
+   dprintf4("rdpkey_reg(line=%d) pkey_reg: %lx shadow: %lx\n",
+   line, pkey_reg, shadow_pkey_reg);
+   assert(pkey_reg == shadow_pkey_reg);
 
-   return pkru;
+   return pkey_reg;
 }
 
-#define rdpkru() _rdpkru(__LINE__)
+#define rdpkey_reg() _rdpkey_reg(__LINE__)
 
-static inline void __wrpkru(unsigned int pkru)
+static inline void __wrpkey_reg(u64 pkey_reg)
 {
-   unsigned int eax = pkru;
+#ifdef __i386__ /* arch */
+   unsigned int eax = pkey_reg;
unsigned int ecx = 0;
unsigned int edx = 0;
 
-   dprintf4("%s() changing %08x to %08x\n", __func__, __rdpkru(), pkru);
+   dprintf4("%s() changing %lx to %lx\n",
+__func__, __rdpkey_reg(), pkey_reg);
asm volatile(".byte 0x0f,0x01,0xef\n\t"
 : : "a" (eax), "c" (ecx), "d" (edx));
-   assert(pkru == __rdpkru());
+   dprintf4("%s() PKRUP after changing %lx to %lx\n",
+   __func__, __rdpkey_reg(), pkey_reg);
+#else /* arch */
+   u64 eax = pkey_reg;
+
+   dprintf4("%s() changing %llx to %llx\n",
+__func__, __rdpkey_reg(), pkey_reg);
+   asm volatile("mtspr 0xd, %0" : : "r" ((unsigned long)(eax)) : "memory");
+   

[RFC v4 13/17] selftest: Move protection key selftest to arch neutral directory

2017-06-27 Thread Ram Pai
Signed-off-by: Ram Pai 
---
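
Not part of the patch: for readers new to the selftest, a worked example
of the PKRU layout its helpers manipulate — each key owns two bits,
access-disable at 2*k and write-disable at 2*k+1:

#include <assert.h>

#define PKEY_DISABLE_ACCESS 0x1
#define PKEY_DISABLE_WRITE  0x2

static unsigned int pkru_bits(int pkey, unsigned int rights)
{
        return rights << (pkey * 2);
}

int main(void)
{
        /* write-deny on key 1 lands in bit 3 */
        assert(pkru_bits(1, PKEY_DISABLE_WRITE) == 0x8);
        /* full deny on key 15 fills the top two bits of the register */
        assert(pkru_bits(15, PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)
               == 0xc0000000);
        return 0;
}
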
 tools/testing/selftests/vm/Makefile   |1 +
 tools/testing/selftests/vm/pkey-helpers.h |  219 
 tools/testing/selftests/vm/protection_keys.c  | 1395 +
 tools/testing/selftests/x86/Makefile  |2 +-
 tools/testing/selftests/x86/pkey-helpers.h|  219 
 tools/testing/selftests/x86/protection_keys.c | 1395 -
 6 files changed, 1616 insertions(+), 1615 deletions(-)
 create mode 100644 tools/testing/selftests/vm/pkey-helpers.h
 create mode 100644 tools/testing/selftests/vm/protection_keys.c
 delete mode 100644 tools/testing/selftests/x86/pkey-helpers.h
 delete mode 100644 tools/testing/selftests/x86/protection_keys.c

diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index cbb29e4..1d32f78 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -17,6 +17,7 @@ TEST_GEN_FILES += transhuge-stress
 TEST_GEN_FILES += userfaultfd
 TEST_GEN_FILES += mlock-random-test
 TEST_GEN_FILES += virtual_address_range
+TEST_GEN_FILES += protection_keys
 
 TEST_PROGS := run_vmtests
 
diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h
new file mode 100644
index 000..b202939
--- /dev/null
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -0,0 +1,219 @@
+#ifndef _PKEYS_HELPER_H
+#define _PKEYS_HELPER_H
+#define _GNU_SOURCE
+#include <string.h>
+#include <stdarg.h>
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <signal.h>
+#include <assert.h>
+#include <stdlib.h>
+#include <ucontext.h>
+#include <sys/mman.h>
+
+#define NR_PKEYS 16
+#define PKRU_BITS_PER_PKEY 2
+
+#ifndef DEBUG_LEVEL
+#define DEBUG_LEVEL 0
+#endif
+#define DPRINT_IN_SIGNAL_BUF_SIZE 4096
+extern int dprint_in_signal;
+extern char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE];
+static inline void sigsafe_printf(const char *format, ...)
+{
+   va_list ap;
+
+   va_start(ap, format);
+   if (!dprint_in_signal) {
+   vprintf(format, ap);
+   } else {
+   int len = vsnprintf(dprint_in_signal_buffer,
+   DPRINT_IN_SIGNAL_BUF_SIZE,
+   format, ap);
+   /*
+* len is amount that would have been printed,
+* but actual write is truncated at BUF_SIZE.
+*/
+   if (len > DPRINT_IN_SIGNAL_BUF_SIZE)
+   len = DPRINT_IN_SIGNAL_BUF_SIZE;
+   write(1, dprint_in_signal_buffer, len);
+   }
+   va_end(ap);
+}
+#define dprintf_level(level, args...) do { \
+   if (level <= DEBUG_LEVEL)   \
+   sigsafe_printf(args);   \
+   fflush(NULL);   \
+} while (0)
+#define dprintf0(args...) dprintf_level(0, args)
+#define dprintf1(args...) dprintf_level(1, args)
+#define dprintf2(args...) dprintf_level(2, args)
+#define dprintf3(args...) dprintf_level(3, args)
+#define dprintf4(args...) dprintf_level(4, args)
+
+extern unsigned int shadow_pkru;
+static inline unsigned int __rdpkru(void)
+{
+   unsigned int eax, edx;
+   unsigned int ecx = 0;
+   unsigned int pkru;
+
+   asm volatile(".byte 0x0f,0x01,0xee\n\t"
+: "=a" (eax), "=d" (edx)
+: "c" (ecx));
+   pkru = eax;
+   return pkru;
+}
+
+static inline unsigned int _rdpkru(int line)
+{
+   unsigned int pkru = __rdpkru();
+
+   dprintf4("rdpkru(line=%d) pkru: %x shadow: %x\n",
+   line, pkru, shadow_pkru);
+   assert(pkru == shadow_pkru);
+
+   return pkru;
+}
+
+#define rdpkru() _rdpkru(__LINE__)
+
+static inline void __wrpkru(unsigned int pkru)
+{
+   unsigned int eax = pkru;
+   unsigned int ecx = 0;
+   unsigned int edx = 0;
+
+   dprintf4("%s() changing %08x to %08x\n", __func__, __rdpkru(), pkru);
+   asm volatile(".byte 0x0f,0x01,0xef\n\t"
+: : "a" (eax), "c" (ecx), "d" (edx));
+   assert(pkru == __rdpkru());
+}
+
+static inline void wrpkru(unsigned int pkru)
+{
+   dprintf4("%s() changing %08x to %08x\n", __func__, __rdpkru(), pkru);
+   /* will do the shadow check for us: */
+   rdpkru();
+   __wrpkru(pkru);
+   shadow_pkru = pkru;
+   dprintf4("%s(%08x) pkru: %08x\n", __func__, pkru, __rdpkru());
+}
+
+/*
+ * These are technically racy, since something could
+ * change PKRU between the read and the write.
+ */
+static inline void __pkey_access_allow(int pkey, int do_allow)
+{
+   unsigned int pkru = rdpkru();
+   int bit = pkey * 2;
+
+   if (do_allow)
+   pkru &= (1<
