[PATCH 37/43] x86/mm/kaiser: Use PCID feature to make user and kernel switches faster

2017-11-24 Thread Ingo Molnar
From: Dave Hansen 

Short summary: Use x86 PCID feature to avoid flushing the TLB at all
interrupts and syscalls.  Speed them up.  Makes context switches
and TLB flushing slower.

Background:

KAISER keeps two copies of the page tables.  Switches between the
copies are performed by writing to the CR3 register.  But, CR3
was really designed for context switches and writes to it also
flush the entire TLB (modulo global pages).  This TLB flush
increases the cost of interrupts and context switches.  For
syscall-heavy microbenchmarks it can cut the rate of syscalls by
2/3.

The kernel recently gained support for and Intel CPU feature
called Process Context IDentifiers (PCID) thanks to Andy
Lutomirski.  This feature is intended to allow you to switch
between contexts without flushing the TLB.

Implementation:

PCIDs can be used to avoid flushing the TLB at kernel entry/exit.
This is speeds up both interrupts and syscalls.

First, the kernel and userspace must be assigned different ASIDs.
On entry from userspace, move over to the kernel page tables
*and* ASID.  On exit, restore the user page tables and ASID.
Fortunately, the ASID is programmed via CR3, which is already
being used to switch between the user and kernel page tables.
This gives us convenient, one-stop shopping.

The CR3 write which is used to switch between processes provides
all the TLB flushing normally required at context switch time.
But, with KAISER, that CR3 write only flushes the current
(kernel) ASID.  An extra TLB flush operation is now required in
order to flush the user ASID.  This new instruction (INVPCID) is
probably ~100 cycles, but this is done with the assumption that
the time lost in context switches is more than made up for by
lower cost of interrupts and syscalls.

Support:

PCIDs are generally available on Sandybridge and newer CPUs.  However,
the accompanying INVPCID instruction did not become available until
Haswell (the ones with "v4", or called fourth-generation Core).  This
instruction allows non-current-PCID TLB entries to be flushed without
switching CR3 and global pages to be flushed without a double
MOV-to-CR4.

Without INVPCID, PCIDs are much harder to use.  TLB invalidation gets
much more onerous:

1. Every kernel TLB flush (even for a single page) requires an
   interrupts-off MOV-to-CR4 which is very expensive.  This is because
   there is no way to flush a kernel address that might be loaded
   in *EVERY* PCID.  Right now, there are "only" ~12 of these per-cpu,
   but that's too painful to use the MOV-to-CR3 to flush them.  That
   leaves only the MOV-to-CR4.
2. Every userspace flush (even for a single page requires one of the
   following:
   a. A pair of flushing (bit 63 clear) CR3 writes: one for
  the kernel ASID and another for userspace.
   b. A pair of non-flushing CR3 writes (bit 63 set) with the
  flush done for each.  For instance, what is currently a
  single instruction without KAISER:

invpcid_flush_one(current_pcid, addr);

  becomes this with KAISER:

invpcid_flush_one(current_kern_pcid, addr);
invpcid_flush_one(current_user_pcid, addr);

  and this without INVPCID:

__native_flush_tlb_single(addr);
write_cr3(mm->pgd | current_user_pcid | NOFLUSH);
__native_flush_tlb_single(addr);
write_cr3(mm->pgd | current_kern_pcid | NOFLUSH);

So, for now, fully disable PCIDs with KAISER when INVPCID is not
available.  This is fixable, but it's an optimization that can be
performed later.

Hugh Dickins also points out that PCIDs really have two distinct
use-cases in the context of KAISER.  The first way they can be used
is as "TLB preservation across context-switch", which is what
Andy Lutomirksi's 4.14 PCID code does.  They can also be used as
a "KAISER syscall/interrupt accelerator".  If we just use them to
speed up syscall/interrupts (and ignore the context-switch TLB
preservation), then the deficiency of not having INVPCID
becomes much less onerous.

Signed-off-by: Dave Hansen 
Signed-off-by: Thomas Gleixner 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: daniel.gr...@iaik.tugraz.at
Cc: hu...@google.com
Cc: keesc...@google.com
Cc: linux...@kvack.org
Cc: l...@kernel.org
Cc: michael.schw...@iaik.tugraz.at
Cc: moritz.l...@iaik.tugraz.at
Cc: richard.fell...@student.tugraz.at
Link: https://lkml.kernel.org/r/20171123003509.ec42d...@viggo.jf.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/entry/calling.h|  25 +++--
 arch/x86/include/asm/cpufeatures.h  |   1 +
 arch/x86/include/asm/pgtable_types.h|  11 +++
 arch/x86/include/asm/tlbflush.h | 137 +++-
 arch/x86/include/uapi/asm/processor-flags.h |   3 +-
 arch/x86/kvm/x86.c  |   3 +-
 

[PATCH 37/43] x86/mm/kaiser: Use PCID feature to make user and kernel switches faster

2017-11-24 Thread Ingo Molnar
From: Dave Hansen 

Short summary: Use x86 PCID feature to avoid flushing the TLB at all
interrupts and syscalls.  Speed them up.  Makes context switches
and TLB flushing slower.

Background:

KAISER keeps two copies of the page tables.  Switches between the
copies are performed by writing to the CR3 register.  But, CR3
was really designed for context switches and writes to it also
flush the entire TLB (modulo global pages).  This TLB flush
increases the cost of interrupts and context switches.  For
syscall-heavy microbenchmarks it can cut the rate of syscalls by
2/3.

The kernel recently gained support for and Intel CPU feature
called Process Context IDentifiers (PCID) thanks to Andy
Lutomirski.  This feature is intended to allow you to switch
between contexts without flushing the TLB.

Implementation:

PCIDs can be used to avoid flushing the TLB at kernel entry/exit.
This is speeds up both interrupts and syscalls.

First, the kernel and userspace must be assigned different ASIDs.
On entry from userspace, move over to the kernel page tables
*and* ASID.  On exit, restore the user page tables and ASID.
Fortunately, the ASID is programmed via CR3, which is already
being used to switch between the user and kernel page tables.
This gives us convenient, one-stop shopping.

The CR3 write which is used to switch between processes provides
all the TLB flushing normally required at context switch time.
But, with KAISER, that CR3 write only flushes the current
(kernel) ASID.  An extra TLB flush operation is now required in
order to flush the user ASID.  This new instruction (INVPCID) is
probably ~100 cycles, but this is done with the assumption that
the time lost in context switches is more than made up for by
lower cost of interrupts and syscalls.

Support:

PCIDs are generally available on Sandybridge and newer CPUs.  However,
the accompanying INVPCID instruction did not become available until
Haswell (the ones with "v4", or called fourth-generation Core).  This
instruction allows non-current-PCID TLB entries to be flushed without
switching CR3 and global pages to be flushed without a double
MOV-to-CR4.

Without INVPCID, PCIDs are much harder to use.  TLB invalidation gets
much more onerous:

1. Every kernel TLB flush (even for a single page) requires an
   interrupts-off MOV-to-CR4 which is very expensive.  This is because
   there is no way to flush a kernel address that might be loaded
   in *EVERY* PCID.  Right now, there are "only" ~12 of these per-cpu,
   but that's too painful to use the MOV-to-CR3 to flush them.  That
   leaves only the MOV-to-CR4.
2. Every userspace flush (even for a single page requires one of the
   following:
   a. A pair of flushing (bit 63 clear) CR3 writes: one for
  the kernel ASID and another for userspace.
   b. A pair of non-flushing CR3 writes (bit 63 set) with the
  flush done for each.  For instance, what is currently a
  single instruction without KAISER:

invpcid_flush_one(current_pcid, addr);

  becomes this with KAISER:

invpcid_flush_one(current_kern_pcid, addr);
invpcid_flush_one(current_user_pcid, addr);

  and this without INVPCID:

__native_flush_tlb_single(addr);
write_cr3(mm->pgd | current_user_pcid | NOFLUSH);
__native_flush_tlb_single(addr);
write_cr3(mm->pgd | current_kern_pcid | NOFLUSH);

So, for now, fully disable PCIDs with KAISER when INVPCID is not
available.  This is fixable, but it's an optimization that can be
performed later.

Hugh Dickins also points out that PCIDs really have two distinct
use-cases in the context of KAISER.  The first way they can be used
is as "TLB preservation across context-switch", which is what
Andy Lutomirksi's 4.14 PCID code does.  They can also be used as
a "KAISER syscall/interrupt accelerator".  If we just use them to
speed up syscall/interrupts (and ignore the context-switch TLB
preservation), then the deficiency of not having INVPCID
becomes much less onerous.

Signed-off-by: Dave Hansen 
Signed-off-by: Thomas Gleixner 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: daniel.gr...@iaik.tugraz.at
Cc: hu...@google.com
Cc: keesc...@google.com
Cc: linux...@kvack.org
Cc: l...@kernel.org
Cc: michael.schw...@iaik.tugraz.at
Cc: moritz.l...@iaik.tugraz.at
Cc: richard.fell...@student.tugraz.at
Link: https://lkml.kernel.org/r/20171123003509.ec42d...@viggo.jf.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/entry/calling.h|  25 +++--
 arch/x86/include/asm/cpufeatures.h  |   1 +
 arch/x86/include/asm/pgtable_types.h|  11 +++
 arch/x86/include/asm/tlbflush.h | 137 +++-
 arch/x86/include/uapi/asm/processor-flags.h |   3 +-
 arch/x86/kvm/x86.c  |   3 +-
 arch/x86/mm/init.c  |  75 ++-
 arch/x86/mm/tlb.c   |  66 +-
 8 files changed, 261 

[PATCH 37/43] x86/mm/kaiser: Use PCID feature to make user and kernel switches faster

2017-11-24 Thread Ingo Molnar
From: Dave Hansen 

Short summary: Use x86 PCID feature to avoid flushing the TLB at all
interrupts and syscalls.  Speed them up.  Makes context switches
and TLB flushing slower.

Background:

KAISER keeps two copies of the page tables.  Switches between the
copies are performed by writing to the CR3 register.  But, CR3
was really designed for context switches and writes to it also
flush the entire TLB (modulo global pages).  This TLB flush
increases the cost of interrupts and context switches.  For
syscall-heavy microbenchmarks it can cut the rate of syscalls by
2/3.

The kernel recently gained support for and Intel CPU feature
called Process Context IDentifiers (PCID) thanks to Andy
Lutomirski.  This feature is intended to allow you to switch
between contexts without flushing the TLB.

Implementation:

PCIDs can be used to avoid flushing the TLB at kernel entry/exit.
This is speeds up both interrupts and syscalls.

First, the kernel and userspace must be assigned different ASIDs.
On entry from userspace, move over to the kernel page tables
*and* ASID.  On exit, restore the user page tables and ASID.
Fortunately, the ASID is programmed via CR3, which is already
being used to switch between the user and kernel page tables.
This gives us convenient, one-stop shopping.

The CR3 write which is used to switch between processes provides
all the TLB flushing normally required at context switch time.
But, with KAISER, that CR3 write only flushes the current
(kernel) ASID.  An extra TLB flush operation is now required in
order to flush the user ASID.  This new instruction (INVPCID) is
probably ~100 cycles, but this is done with the assumption that
the time lost in context switches is more than made up for by
lower cost of interrupts and syscalls.

Support:

PCIDs are generally available on Sandybridge and newer CPUs.  However,
the accompanying INVPCID instruction did not become available until
Haswell (the ones with "v4", or called fourth-generation Core).  This
instruction allows non-current-PCID TLB entries to be flushed without
switching CR3 and global pages to be flushed without a double
MOV-to-CR4.

Without INVPCID, PCIDs are much harder to use.  TLB invalidation gets
much more onerous:

1. Every kernel TLB flush (even for a single page) requires an
   interrupts-off MOV-to-CR4 which is very expensive.  This is because
   there is no way to flush a kernel address that might be loaded
   in *EVERY* PCID.  Right now, there are "only" ~12 of these per-cpu,
   but that's too painful to use the MOV-to-CR3 to flush them.  That
   leaves only the MOV-to-CR4.
2. Every userspace flush (even for a single page requires one of the
   following:
   a. A pair of flushing (bit 63 clear) CR3 writes: one for
  the kernel ASID and another for userspace.
   b. A pair of non-flushing CR3 writes (bit 63 set) with the
  flush done for each.  For instance, what is currently a
  single instruction without KAISER:

invpcid_flush_one(current_pcid, addr);

  becomes this with KAISER:

invpcid_flush_one(current_kern_pcid, addr);
invpcid_flush_one(current_user_pcid, addr);

  and this without INVPCID:

__native_flush_tlb_single(addr);
write_cr3(mm->pgd | current_user_pcid | NOFLUSH);
__native_flush_tlb_single(addr);
write_cr3(mm->pgd | current_kern_pcid | NOFLUSH);

So, for now, fully disable PCIDs with KAISER when INVPCID is not
available.  This is fixable, but it's an optimization that can be
performed later.

Hugh Dickins also points out that PCIDs really have two distinct
use-cases in the context of KAISER.  The first way they can be used
is as "TLB preservation across context-switch", which is what
Andy Lutomirksi's 4.14 PCID code does.  They can also be used as
a "KAISER syscall/interrupt accelerator".  If we just use them to
speed up syscall/interrupts (and ignore the context-switch TLB
preservation), then the deficiency of not having INVPCID
becomes much less onerous.

Signed-off-by: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Daniel Gruss 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Hugh Dickins 
Cc: Josh Poimboeuf 
Cc: Kees Cook 
Cc: Linus Torvalds 
Cc: Michael Schwarz 
Cc: Moritz Lipp 
Cc: Peter Zijlstra 
Cc: Richard Fellner 
Cc: Thomas Gleixner 
Cc: linux...@kvack.org
Link: http://lkml.kernel.org/r/20171123003509.ec42d...@viggo.jf.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/entry/calling.h|  25 +++--
 

[PATCH 37/43] x86/mm/kaiser: Use PCID feature to make user and kernel switches faster

2017-11-24 Thread Ingo Molnar
From: Dave Hansen 

Short summary: Use x86 PCID feature to avoid flushing the TLB at all
interrupts and syscalls.  Speed them up.  Makes context switches
and TLB flushing slower.

Background:

KAISER keeps two copies of the page tables.  Switches between the
copies are performed by writing to the CR3 register.  But, CR3
was really designed for context switches and writes to it also
flush the entire TLB (modulo global pages).  This TLB flush
increases the cost of interrupts and context switches.  For
syscall-heavy microbenchmarks it can cut the rate of syscalls by
2/3.

The kernel recently gained support for and Intel CPU feature
called Process Context IDentifiers (PCID) thanks to Andy
Lutomirski.  This feature is intended to allow you to switch
between contexts without flushing the TLB.

Implementation:

PCIDs can be used to avoid flushing the TLB at kernel entry/exit.
This is speeds up both interrupts and syscalls.

First, the kernel and userspace must be assigned different ASIDs.
On entry from userspace, move over to the kernel page tables
*and* ASID.  On exit, restore the user page tables and ASID.
Fortunately, the ASID is programmed via CR3, which is already
being used to switch between the user and kernel page tables.
This gives us convenient, one-stop shopping.

The CR3 write which is used to switch between processes provides
all the TLB flushing normally required at context switch time.
But, with KAISER, that CR3 write only flushes the current
(kernel) ASID.  An extra TLB flush operation is now required in
order to flush the user ASID.  This new instruction (INVPCID) is
probably ~100 cycles, but this is done with the assumption that
the time lost in context switches is more than made up for by
lower cost of interrupts and syscalls.

Support:

PCIDs are generally available on Sandybridge and newer CPUs.  However,
the accompanying INVPCID instruction did not become available until
Haswell (the ones with "v4", or called fourth-generation Core).  This
instruction allows non-current-PCID TLB entries to be flushed without
switching CR3 and global pages to be flushed without a double
MOV-to-CR4.

Without INVPCID, PCIDs are much harder to use.  TLB invalidation gets
much more onerous:

1. Every kernel TLB flush (even for a single page) requires an
   interrupts-off MOV-to-CR4 which is very expensive.  This is because
   there is no way to flush a kernel address that might be loaded
   in *EVERY* PCID.  Right now, there are "only" ~12 of these per-cpu,
   but that's too painful to use the MOV-to-CR3 to flush them.  That
   leaves only the MOV-to-CR4.
2. Every userspace flush (even for a single page requires one of the
   following:
   a. A pair of flushing (bit 63 clear) CR3 writes: one for
  the kernel ASID and another for userspace.
   b. A pair of non-flushing CR3 writes (bit 63 set) with the
  flush done for each.  For instance, what is currently a
  single instruction without KAISER:

invpcid_flush_one(current_pcid, addr);

  becomes this with KAISER:

invpcid_flush_one(current_kern_pcid, addr);
invpcid_flush_one(current_user_pcid, addr);

  and this without INVPCID:

__native_flush_tlb_single(addr);
write_cr3(mm->pgd | current_user_pcid | NOFLUSH);
__native_flush_tlb_single(addr);
write_cr3(mm->pgd | current_kern_pcid | NOFLUSH);

So, for now, fully disable PCIDs with KAISER when INVPCID is not
available.  This is fixable, but it's an optimization that can be
performed later.

Hugh Dickins also points out that PCIDs really have two distinct
use-cases in the context of KAISER.  The first way they can be used
is as "TLB preservation across context-switch", which is what
Andy Lutomirksi's 4.14 PCID code does.  They can also be used as
a "KAISER syscall/interrupt accelerator".  If we just use them to
speed up syscall/interrupts (and ignore the context-switch TLB
preservation), then the deficiency of not having INVPCID
becomes much less onerous.

Signed-off-by: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Daniel Gruss 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Hugh Dickins 
Cc: Josh Poimboeuf 
Cc: Kees Cook 
Cc: Linus Torvalds 
Cc: Michael Schwarz 
Cc: Moritz Lipp 
Cc: Peter Zijlstra 
Cc: Richard Fellner 
Cc: Thomas Gleixner 
Cc: linux...@kvack.org
Link: http://lkml.kernel.org/r/20171123003509.ec42d...@viggo.jf.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/entry/calling.h|  25 +++--
 arch/x86/entry/entry_64.S   |   1 +
 arch/x86/include/asm/cpufeatures.h  |   1 +
 arch/x86/include/asm/pgtable_types.h|  11 +++
 arch/x86/include/asm/tlbflush.h | 137 +++-
 arch/x86/include/uapi/asm/processor-flags.h |   3 +-
 arch/x86/kvm/x86.c  |   3 +-
 arch/x86/mm/init.c  |  75 ++-
 arch/x86/mm/tlb.c