[PATCH 37/43] x86/mm/kaiser: Use PCID feature to make user and kernel switches faster
From: Dave Hansen

Short summary: Use the x86 PCID feature to avoid flushing the TLB at all interrupts and syscalls. This speeds them up, but makes context switches and TLB flushing slower.

Background:

KAISER keeps two copies of the page tables. Switches between the copies are performed by writing to the CR3 register. But CR3 was really designed for context switches, and writes to it also flush the entire TLB (modulo global pages). This TLB flush increases the cost of interrupts and context switches. For syscall-heavy microbenchmarks it can cut the rate of syscalls by 2/3.

The kernel recently gained support for an Intel CPU feature called Process Context IDentifiers (PCID), thanks to Andy Lutomirski. This feature is intended to allow you to switch between contexts without flushing the TLB.

Implementation:

PCIDs can be used to avoid flushing the TLB at kernel entry/exit. This speeds up both interrupts and syscalls.

First, the kernel and userspace must be assigned different ASIDs. On entry from userspace, move over to the kernel page tables *and* ASID. On exit, restore the user page tables and ASID. Fortunately, the ASID is programmed via CR3, which is already being used to switch between the user and kernel page tables. This gives us convenient, one-stop shopping.

The CR3 write which is used to switch between processes provides all the TLB flushing normally required at context switch time. But, with KAISER, that CR3 write only flushes the current (kernel) ASID. An extra TLB flush operation is now required in order to flush the user ASID. This new instruction (INVPCID) probably costs ~100 cycles, but this is done with the assumption that the time lost in context switches is more than made up for by the lower cost of interrupts and syscalls.

Support:

PCIDs are generally available on Sandybridge and newer CPUs. However, the accompanying INVPCID instruction did not become available until Haswell (the ones with "v4", also called fourth-generation Core). This instruction allows non-current-PCID TLB entries to be flushed without switching CR3, and global pages to be flushed without a double MOV-to-CR4.

Without INVPCID, PCIDs are much harder to use. TLB invalidation gets much more onerous:

1. Every kernel TLB flush (even for a single page) requires an interrupts-off MOV-to-CR4, which is very expensive. This is because there is no way to flush a kernel address that might be loaded in *EVERY* PCID. Right now there are "only" ~12 of these per CPU, but that is still too many to flush one at a time with MOV-to-CR3. That leaves only the MOV-to-CR4.

2. Every userspace flush (even for a single page) requires one of the following:

   a. A pair of flushing (bit 63 clear) CR3 writes: one for the kernel ASID and another for userspace.

   b. A pair of non-flushing CR3 writes (bit 63 set) with the flush done for each.

For instance, what is currently a single instruction without KAISER:

	invpcid_flush_one(current_pcid, addr);

becomes this with KAISER:

	invpcid_flush_one(current_kern_pcid, addr);
	invpcid_flush_one(current_user_pcid, addr);

and this without INVPCID:

	__native_flush_tlb_single(addr);
	write_cr3(mm->pgd | current_user_pcid | NOFLUSH);
	__native_flush_tlb_single(addr);
	write_cr3(mm->pgd | current_kern_pcid | NOFLUSH);

So, for now, fully disable PCIDs with KAISER when INVPCID is not available. This is fixable, but it's an optimization that can be performed later.

Hugh Dickins also points out that PCIDs really have two distinct use-cases in the context of KAISER. The first way they can be used is as "TLB preservation across context-switch", which is what Andy Lutomirski's 4.14 PCID code does. They can also be used as a "KAISER syscall/interrupt accelerator". If we just use them to speed up syscalls/interrupts (and ignore the context-switch TLB preservation), then the deficiency of not having INVPCID becomes much less onerous.
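
The "one-stop shopping" CR3 write described above can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the code from the patch: the helper names (build_cr3, kernel_to_user_cr3, user_to_kernel_cr3) are hypothetical, and so is the layout in which the user ASID is the kernel ASID with bit 11 set and the user page tables sit one page (bit 12) above the kernel copy. The only parts taken from the text are that the ASID lives in the low bits of CR3 and that bit 63 set means a non-flushing write.

```c
#include <assert.h>
#include <stdint.h>

/* With PCIDs enabled, CR3 bits 11:0 carry the PCID (ASID). */
#define CR3_PCID_MASK   0xFFFULL
/* Bit 63 set on a CR3 write means "do not flush this PCID's TLB entries". */
#define CR3_NOFLUSH     (1ULL << 63)

/*
 * ASSUMPTIONS for illustration only (not confirmed by the text above):
 *  - the user ASID is the kernel ASID with bit 11 set;
 *  - the user copy of the page tables is the page right after the
 *    kernel copy, so setting bit 12 of the address selects it.
 */
#define USER_PCID_BIT   (1ULL << 11)
#define USER_PGD_BIT    (1ULL << 12)

/* Hypothetical helper: compose a CR3 value from its three parts. */
static uint64_t build_cr3(uint64_t pgd_pa, uint64_t pcid, int noflush)
{
	/* The page-table root must be page aligned. */
	assert((pgd_pa & CR3_PCID_MASK) == 0);
	/* Keep bit 11 free so it can mark the user ASID. */
	assert(pcid < USER_PCID_BIT);

	uint64_t cr3 = pgd_pa | pcid;

	if (noflush)
		cr3 |= CR3_NOFLUSH;
	return cr3;
}

/* Exit to userspace: switch page tables *and* ASID in one value. */
static uint64_t kernel_to_user_cr3(uint64_t kern_cr3)
{
	return kern_cr3 | USER_PGD_BIT | USER_PCID_BIT | CR3_NOFLUSH;
}

/* Entry from userspace: back to the kernel page tables and ASID. */
static uint64_t user_to_kernel_cr3(uint64_t user_cr3)
{
	return (user_cr3 & ~(USER_PGD_BIT | USER_PCID_BIT)) | CR3_NOFLUSH;
}
```

Under these assumptions, entry and exit each need only a single non-flushing CR3 write, because one register value selects both the page-table copy and the ASID. The price is the one described in the text: a flush of the kernel ASID no longer covers the user ASID, so the latter has to be invalidated separately.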

Signed-off-by: Dave Hansen
Signed-off-by: Thomas Gleixner
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: daniel.gr...@iaik.tugraz.at
Cc: hu...@google.com
Cc: keesc...@google.com
Cc: linux...@kvack.org
Cc: l...@kernel.org
Cc: michael.schw...@iaik.tugraz.at
Cc: moritz.l...@iaik.tugraz.at
Cc: richard.fell...@student.tugraz.at
Link: https://lkml.kernel.org/r/20171123003509.ec42d...@viggo.jf.intel.com
Signed-off-by: Ingo Molnar
---
 arch/x86/entry/calling.h                    |  25 +++--
 arch/x86/include/asm/cpufeatures.h          |   1 +
 arch/x86/include/asm/pgtable_types.h        |  11 +++
 arch/x86/include/asm/tlbflush.h             | 137 +++-
 arch/x86/include/uapi/asm/processor-flags.h |   3 +-
 arch/x86/kvm/x86.c                          |   3 +-
 arch/x86/mm/init.c                          |  75 ++-
 arch/x86/mm/tlb.c                           |  66 +-
 8 files changed, 261
[PATCH 37/43] x86/mm/kaiser: Use PCID feature to make user and kernel switches faster
Signed-off-by: Dave Hansen
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Daniel Gruss
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Hugh Dickins
Cc: Josh Poimboeuf
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Michael Schwarz
Cc: Moritz Lipp
Cc: Peter Zijlstra
Cc: Richard Fellner
Cc: Thomas Gleixner
Cc: linux...@kvack.org
Link: http://lkml.kernel.org/r/20171123003509.ec42d...@viggo.jf.intel.com
Signed-off-by: Ingo Molnar
---
 arch/x86/entry/calling.h                    |  25 +++--
 arch/x86/entry/entry_64.S                   |   1 +
 arch/x86/include/asm/cpufeatures.h          |   1 +
 arch/x86/include/asm/pgtable_types.h        |  11 +++
 arch/x86/include/asm/tlbflush.h             | 137 +++-
 arch/x86/include/uapi/asm/processor-flags.h |   3 +-
 arch/x86/kvm/x86.c                          |   3 +-
 arch/x86/mm/init.c                          |  75 ++-
 arch/x86/mm/tlb.c