[PATCH 3.2 038/104] x86/Documentation: Add PTI description

2018-03-11 Thread Ben Hutchings
3.2.101-rc1 review patch.  If anyone has any objections, please let me know.

--

From: Dave Hansen 

commit 01c9b17bf673b05bb401b76ec763e9730ccf1376 upstream.

Add some details about how PTI works, what some of the downsides
are, and how to debug it when things go wrong.

Also document the kernel parameter: 'pti/nopti'.

Signed-off-by: Dave Hansen 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Randy Dunlap 
Reviewed-by: Kees Cook 
Cc: Moritz Lipp 
Cc: Daniel Gruss 
Cc: Michael Schwarz 
Cc: Richard Fellner 
Cc: Andy Lutomirski 
Cc: Linus Torvalds 
Cc: Hugh Dickins 
Cc: Andi Lutomirsky 
Link: https://lkml.kernel.org/r/20180105174436.1bc6f...@viggo.jf.intel.com
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Ben Hutchings 
---
 Documentation/kernel-parameters.txt |  21 ++-
 Documentation/x86/pti.txt           | 186
 2 files changed, 200 insertions(+), 7 deletions(-)
 create mode 100644 Documentation/x86/pti.txt

--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1803,8 +1803,6 @@ bytes respectively. Such letter suffixes
 
        nojitter        [IA-64] Disables jitter checking for ITC timers.
 
-       nopti           [X86-64] Disable KAISER isolation of kernel from user.
-
        no-kvmclock     [X86,KVM] Disable paravirtualized KVM clock driver
 
        no-kvmapf       [X86,KVM] Disable paravirtualized asynchronous page
@@ -2245,11 +2243,20 @@ bytes respectively. Such letter suffixes
        pt.             [PARIDE]
                        See Documentation/blockdev/paride.txt.
 
-       pti=            [X86_64]
-                       Control KAISER user/kernel address space isolation:
-                       on - enable
-                       off - disable
-                       auto - default setting
+       pti=            [X86_64] Control Page Table Isolation of user and
+                       kernel address spaces.  Disabling this feature
+                       removes hardening, but improves performance of
+                       system calls and interrupts.
+
+                       on   - unconditionally enable
+                       off  - unconditionally disable
+                       auto - kernel detects whether your CPU model is
+                              vulnerable to issues that PTI mitigates
+
+                       Not specifying this option is equivalent to pti=auto.
+
+       nopti           [X86_64]
+                       Equivalent to pti=off
 
        pty.legacy_count=
                        [KNL] Number of legacy pty's. Overwrites compiled-in
--- /dev/null
+++ b/Documentation/x86/pti.txt
@@ -0,0 +1,186 @@
+Overview
+========
+
+Page Table Isolation (pti, previously known as KAISER[1]) is a
+countermeasure against attacks on the shared user/kernel address
+space such as the "Meltdown" approach[2].
+
+To mitigate this class of attacks, we create an independent set of
+page tables for use only when running userspace applications.  When
+the kernel is entered via syscalls, interrupts or exceptions, the
+page tables are switched to the full "kernel" copy.  When the system
+switches back to user mode, the user copy is used again.
+
+The userspace page tables contain only a minimal amount of kernel
+data: only what is needed to enter/exit the kernel such as the
+entry/exit functions themselves and the interrupt descriptor table
+(IDT).  There are a few strictly unnecessary things that get mapped
+such as the first C function when entering an interrupt (see
+comments in pti.c).
+
+This approach helps to ensure that side-channel attacks leveraging
+the paging structures do not function when PTI is enabled.  It can be
+enabled by setting CONFIG_PAGE_TABLE_ISOLATION=y at compile time.
+Once enabled at compile-time, it can be disabled at boot with the
+'nopti' or 'pti=' kernel parameters (see kernel-parameters.txt).
+
+Page Table Management
+=====================
+
+When PTI is enabled, the kernel manages two sets of page tables.
+The first set is very similar to the single set which is present in
+kernels without PTI.  This includes a complete mapping of userspace
+that the kernel can use for things like copy_to_user().
+
+Although _complete_, the user portion of the kernel page tables is
+crippled by setting the NX bit in the top level.  This ensures
+that any missed kernel->user CR3 switch will immediately crash
+userspace upon executing its first instruction.
+
+The userspace page tables map only the kernel data needed to enter
+and exit the kernel.  This data is entirely contained in the 'struct
+cpu_entry_area' structure which is placed in the fixmap which gives
+each CPU's copy of the area a compile-time-fixed virtual address.
+
+For new userspace mappings, the kernel makes the entries in its
+page tables like normal.  The only difference is when the kernel
+makes entries in the top (PGD) level.  In addition to setting the
+entry in the main kernel PGD, a copy of the entry is made in the