[tip:x86/fpu] x86/mpx: Remove redundant MPX_BNDCFG_ADDR_MASK

Commit-ID:  3c1d32300920a446c67d697cd6b80f012ad06028
Gitweb:     http://git.kernel.org/tip/3c1d32300920a446c67d697cd6b80f012ad06028
Author:     Qiaowei Ren <qiaowei@intel.com>
AuthorDate: Sun, 7 Jun 2015 11:37:02 -0700
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 9 Jun 2015 12:24:30 +0200

x86/mpx: Remove redundant MPX_BNDCFG_ADDR_MASK

MPX_BNDCFG_ADDR_MASK is defined twice, so this patch removes the
redundant definition.

Signed-off-by: Qiaowei Ren <qiaowei@intel.com>
Signed-off-by: Dave Hansen <dave.han...@linux.intel.com>
Reviewed-by: Thomas Gleixner <t...@linutronix.de>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Dave Hansen <d...@sr71.net>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Link: http://lkml.kernel.org/r/20150607183702.5f129...@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/include/asm/mpx.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 0cdd16a..871e5e5 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -45,7 +45,6 @@
 #define MPX_BNDSTA_TAIL		2
 #define MPX_BNDCFG_TAIL		12
 #define MPX_BNDSTA_ADDR_MASK	(~((1UL<<MPX_BNDSTA_TAIL)-1))
-#define MPX_BNDCFG_ADDR_MASK	(~((1UL<<MPX_BNDCFG_TAIL)-1))
 #define MPX_BT_ADDR_MASK	(~((1UL<<MPX_BD_ENTRY_TAIL)-1))
 #define MPX_BNDCFG_ADDR_MASK	(~((1UL<<MPX_BNDCFG_TAIL)-1))

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/mpx] x86, mpx: Add documentation on Intel MPX
Commit-ID:  5776563648f6437ede91c91cbad85862ca682b0b
Gitweb:     http://git.kernel.org/tip/5776563648f6437ede91c91cbad85862ca682b0b
Author:     Qiaowei Ren
AuthorDate: Fri, 14 Nov 2014 07:18:32 -0800
Committer:  Thomas Gleixner
CommitDate: Tue, 18 Nov 2014 00:58:54 +0100

x86, mpx: Add documentation on Intel MPX

This patch adds the Documentation/x86/intel_mpx.txt file with some
information about Intel MPX.

Signed-off-by: Qiaowei Ren
Signed-off-by: Dave Hansen
Cc: linux...@kvack.org
Cc: linux-m...@linux-mips.org
Cc: Dave Hansen
Link: http://lkml.kernel.org/r/20141114151832.7fdb1...@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner
---
 Documentation/x86/intel_mpx.txt | 234
 1 file changed, 234 insertions(+)

diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.txt
new file mode 100644
index 000..4472ed2
--- /dev/null
+++ b/Documentation/x86/intel_mpx.txt
@@ -0,0 +1,234 @@
+1. Intel(R) MPX Overview
+
+
+Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new capability
+introduced into Intel Architecture. Intel MPX provides hardware features
+that can be used in conjunction with compiler changes to check memory
+references, for those references whose compile-time normal intentions are
+usurped at runtime due to buffer overflow or underflow.
+
+For more information, please refer to Intel(R) Architecture Instruction
+Set Extensions Programming Reference, Chapter 9: Intel(R) Memory Protection
+Extensions.
+
+Note: Currently no hardware with MPX ISA is available but it is always
+possible to use SDE (Intel(R) Software Development Emulator) instead, which
+can be downloaded from
+http://software.intel.com/en-us/articles/intel-software-development-emulator
+
+
+2. How to get the advantage of MPX
+==
+
+For MPX to work, changes are required in the kernel, binutils and compiler.
+No source changes are required for applications, just a recompile.
+
+There are a lot of moving parts of this to all work right.
The following
+is how we expect the compiler, application and kernel to work together.
+
+1) Application developer compiles with -fmpx. The compiler will add the
+   instrumentation as well as some setup code called early after the app
+   starts. New instruction prefixes are noops for old CPUs.
+2) That setup code allocates (virtual) space for the "bounds directory",
+   points the "bndcfgu" register to the directory and notifies the kernel
+   (via the new prctl(PR_MPX_ENABLE_MANAGEMENT)) that the app will be using
+   MPX.
+3) The kernel detects that the CPU has MPX, allows the new prctl() to
+   succeed, and notes the location of the bounds directory. Userspace is
+   expected to keep the bounds directory at that location. We note it
+   instead of reading it each time because the 'xsave' operation needed
+   to access the bounds directory register is an expensive operation.
+4) If the application needs to spill bounds out of the 4 registers, it
+   issues a bndstx instruction. Since the bounds directory is empty at
+   this point, a bounds fault (#BR) is raised, the kernel allocates a
+   bounds table (in the user address space) and makes the relevant entry
+   in the bounds directory point to the new table.
+5) If the application violates the bounds specified in the bounds registers,
+   a separate kind of #BR is raised which will deliver a signal with
+   information about the violation in the 'struct siginfo'.
+6) Whenever memory is freed, we know that it can no longer contain valid
+   pointers, and we attempt to free the associated space in the bounds
+   tables. If an entire table becomes unused, we will attempt to free
+   the table and remove the entry in the directory.
+
+To summarize, there are essentially three things interacting here:
+
+GCC with -fmpx:
+ * enables annotation of code with MPX instructions and prefixes
+ * inserts code early in the application to call in to the "gcc runtime"
+GCC MPX Runtime:
+ * Checks for hardware MPX support in cpuid leaf
+ * allocates virtual space for the bounds directory (malloc() essentially)
+ * points the hardware BNDCFGU register at the directory
+ * calls a new prctl(PR_MPX_ENABLE_MANAGEMENT) to notify the kernel to
+   start managing the bounds directories
+Kernel MPX Code:
+ * Checks for hardware MPX support in cpuid leaf
+ * Handles #BR exceptions and sends SIGSEGV to the app when it violates
+   bounds, like during a buffer overflow.
+ * When bounds are spilled in to an unallocated bounds table, the kernel
+   notices in the #BR exception, allocates the virtual space, then
+   updates the bounds directory to point to the new table. It keeps
+   special track of the memory with a VM_MPX flag.
+ * Frees unused bounds tables at the time that the memory they described
+   is unmapped.
+
+
+3. How does MPX kernel code work
+
+
+Han
[tip:x86/mpx] x86, mpx: Add MPX-specific mmap interface
Commit-ID:  57319d80e1d328e34cb24868a4f4405661485e30
Gitweb:     http://git.kernel.org/tip/57319d80e1d328e34cb24868a4f4405661485e30
Author:     Qiaowei Ren <qiaowei@intel.com>
AuthorDate: Fri, 14 Nov 2014 07:18:27 -0800
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Tue, 18 Nov 2014 00:58:53 +0100

x86, mpx: Add MPX-specific mmap interface

We have chosen to perform the allocation of bounds tables in kernel
(see the patch "on-demand kernel allocation of bounds tables") and to
mark these VMAs with VM_MPX.

However, there is currently no suitable interface to actually do this.
Existing interfaces, like do_mmap_pgoff(), have no way to set a
modified ->vm_ops or ->vm_flags and don't hold mmap_sem long enough to
let a caller do it. This patch wraps mmap_region() and holds mmap_sem
long enough to make the modifications to the VMA which we need.

Also note the 32/64-bit #ifdef in the header. We actually need to do
this at runtime eventually. But, for now, we don't support running
32-bit binaries on 64-bit kernels. Support for this will come in later
patches.
Signed-off-by: Qiaowei Ren <qiaowei@intel.com>
Signed-off-by: Dave Hansen <dave.han...@linux.intel.com>
Cc: linux...@kvack.org
Cc: linux-m...@linux-mips.org
Cc: Dave Hansen <d...@sr71.net>
Link: http://lkml.kernel.org/r/20141114151827.ce440...@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
---
 arch/x86/Kconfig           |  4 +++
 arch/x86/include/asm/mpx.h | 36 +++
 arch/x86/mm/Makefile       |  2 ++
 arch/x86/mm/mpx.c          | 86 ++
 4 files changed, 128 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ded8a67..967dfe0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -248,6 +248,10 @@ config HAVE_INTEL_TXT
 	def_bool y
 	depends on INTEL_IOMMU && ACPI
 
+config X86_INTEL_MPX
+	def_bool y
+	depends on CPU_SUP_INTEL
+
 config X86_32_SMP
 	def_bool y
 	depends on X86_32 && SMP
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
new file mode 100644
index 000..7d7c5f5
--- /dev/null
+++ b/arch/x86/include/asm/mpx.h
@@ -0,0 +1,36 @@
+#ifndef _ASM_X86_MPX_H
+#define _ASM_X86_MPX_H
+
+#include <linux/types.h>
+#include <asm/ptrace.h>
+
+#ifdef CONFIG_X86_64
+
+/* upper 28 bits [47:20] of the virtual address in 64-bit used to
+ * index into bounds directory (BD).
+ */
+#define MPX_BD_ENTRY_OFFSET	28
+#define MPX_BD_ENTRY_SHIFT	3
+/* bits [19:3] of the virtual address in 64-bit used to index into
+ * bounds table (BT).
+ */
+#define MPX_BT_ENTRY_OFFSET	17
+#define MPX_BT_ENTRY_SHIFT	5
+#define MPX_IGN_BITS	3
+
+#else
+
+#define MPX_BD_ENTRY_OFFSET	20
+#define MPX_BD_ENTRY_SHIFT	2
+#define MPX_BT_ENTRY_OFFSET	10
+#define MPX_BT_ENTRY_SHIFT	4
+#define MPX_IGN_BITS	2
+
+#endif
+
+#define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
+#define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))
+
+#define MPX_BNDSTA_ERROR_CODE	0x3
+
+#endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 6a19ad9..ecfdc46 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -30,3 +30,5 @@
 obj-$(CONFIG_ACPI_NUMA)	+= srat.o
 obj-$(CONFIG_NUMA_EMU)	+= numa_emulation.o
 obj-$(CONFIG_MEMTEST)	+= memtest.o
+
+obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
new file mode 100644
index 000..72d13b0
--- /dev/null
+++ b/arch/x86/mm/mpx.c
@@ -0,0 +1,86 @@
+/*
+ * mpx.c - Memory Protection eXtensions
+ *
+ * Copyright (c) 2014, Intel Corporation.
+ * Qiaowei Ren
+ * Dave Hansen
+ */
+#include
+#include
+#include
+
+#include
+#include
+
+static const char *mpx_mapping_name(struct vm_area_struct *vma)
+{
+	return "[mpx]";
+}
+
+static struct vm_operations_struct mpx_vma_ops = {
+	.name = mpx_mapping_name,
+};
+
+/*
+ * This is really a simplified "vm_mmap". it only handles MPX
+ * bounds tables (the bounds directory is user-allocated).
+ *
+ * Later on, we use the vma->vm_ops to uniquely identify these
+ * VMAs.
+ */
+static unsigned long mpx_mmap(unsigned long len)
+{
+	unsigned long ret;
+	unsigned long addr, pgoff;
+	struct mm_struct *mm = current->mm;
+	vm_flags_t vm_flags;
+	struct vm_area_struct *vma;
+
+	/* Only bounds table and bounds directory can be allocated here */
+	if (len != MPX_BD_SIZE_BYTES && len != MPX_BT_SIZE_BYTES)
+		return -EINVAL;
+
+	down_write(&mm->mmap_sem);
+
+	/* Too many mappings? */
+	if (mm->map_count > sysctl_max_map_count) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/* Obtain the address to map to. we verify (or select) it and ensure
+	 * that it represents a valid section of the address space.
+	 */
+	addr = get_unmapped_area(NULL, 0, len, 0, MAP_ANONYMOUS | MAP_PRIVATE);
+	if (addr & ~PAGE
[tip:x86/mpx] x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific
Commit-ID: 4aae7e436fa51faf4bf5d11b175aea82cfe8224a Gitweb: http://git.kernel.org/tip/4aae7e436fa51faf4bf5d11b175aea82cfe8224a Author: Qiaowei Ren AuthorDate: Fri, 14 Nov 2014 07:18:25 -0800 Committer: Thomas Gleixner CommitDate: Tue, 18 Nov 2014 00:58:53 +0100 x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific MPX-enabled applications using large swaths of memory can potentially have large numbers of bounds tables in process address space to save bounds information. These tables can take up huge swaths of memory (as much as 80% of the memory on the system) even if we clean them up aggressively. In the worst-case scenario, the tables can be 4x the size of the data structure being tracked. IOW, a 1-page structure can require 4 bounds-table pages. Being this huge, our expectation is that folks using MPX are going to be keen on figuring out how much memory is being dedicated to it. So we need a way to track memory use for MPX. If we want to specifically track MPX VMAs we need to be able to distinguish them from normal VMAs, and keep them from getting merged with normal VMAs. A new VM_ flag set only on MPX VMAs does both of those things. With this flag, MPX bounds-table VMAs can be distinguished from other VMAs, and userspace can also walk /proc/$pid/smaps to get memory usage for MPX. In addition to this flag, we also introduce a special ->vm_ops specific to MPX VMAs (see the patch "add MPX specific mmap interface"), but currently different ->vm_ops do not by themselves prevent VMA merging, so we still need this flag. We understand that VM_ flags are scarce and are open to other options. 
Signed-off-by: Qiaowei Ren <qiaowei@intel.com>
Signed-off-by: Dave Hansen <dave.han...@linux.intel.com>
Cc: linux...@kvack.org
Cc: linux-m...@linux-mips.org
Cc: Dave Hansen <d...@sr71.net>
Link: http://lkml.kernel.org/r/20141114151825.56562...@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
---
 fs/proc/task_mmu.c | 3 +++
 include/linux/mm.h | 6 ++
 2 files changed, 9 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 4e0388c..f6734c6 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -552,6 +552,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 		[ilog2(VM_GROWSDOWN)]	= "gd",
 		[ilog2(VM_PFNMAP)]	= "pf",
 		[ilog2(VM_DENYWRITE)]	= "dw",
+#ifdef CONFIG_X86_INTEL_MPX
+		[ilog2(VM_MPX)]		= "mp",
+#endif
 		[ilog2(VM_LOCKED)]	= "lo",
 		[ilog2(VM_IO)]		= "io",
 		[ilog2(VM_SEQ_READ)]	= "sr",
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b464611..f7606d3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -128,6 +128,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HUGETLB	0x0040	/* Huge TLB Page VM */
 #define VM_NONLINEAR	0x0080	/* Is non-linear (remap_file_pages) */
 #define VM_ARCH_1	0x0100	/* Architecture-specific flag */
+#define VM_ARCH_2	0x0200
 #define VM_DONTDUMP	0x0400	/* Do not include in the core dump */
 
 #ifdef CONFIG_MEM_SOFT_DIRTY
@@ -155,6 +156,11 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_MAPPED_COPY	VM_ARCH_1 /* T if mapped copy of data (nommu mmap) */
 #endif
 
+#if defined(CONFIG_X86)
+/* MPX specific bounds table or bounds directory */
+# define VM_MPX	VM_ARCH_2
+#endif
+
 #ifndef VM_GROWSUP
 # define VM_GROWSUP	VM_NONE
 #endif
[tip:x86/mpx] mips: Sync struct siginfo with general version
Commit-ID:  232b5fff5bad78ad00b94153fa90ca53bef6a444
Gitweb:     http://git.kernel.org/tip/232b5fff5bad78ad00b94153fa90ca53bef6a444
Author:     Qiaowei Ren <qiaowei@intel.com>
AuthorDate: Fri, 14 Nov 2014 07:18:20 -0800
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Tue, 18 Nov 2014 00:58:53 +0100

mips: Sync struct siginfo with general version

New fields about bound violation are added into general struct
siginfo. This will impact MIPS and IA64, which extend general struct
siginfo. This patch syncs this struct for MIPS with general version.

Signed-off-by: Qiaowei Ren <qiaowei@intel.com>
Signed-off-by: Dave Hansen <dave.han...@linux.intel.com>
Cc: linux...@kvack.org
Cc: linux-m...@linux-mips.org
Cc: Dave Hansen <d...@sr71.net>
Link: http://lkml.kernel.org/r/20141114151820.f7edc...@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
---
 arch/mips/include/uapi/asm/siginfo.h | 4
 1 file changed, 4 insertions(+)

diff --git a/arch/mips/include/uapi/asm/siginfo.h b/arch/mips/include/uapi/asm/siginfo.h
index e811744..d08f83f 100644
--- a/arch/mips/include/uapi/asm/siginfo.h
+++ b/arch/mips/include/uapi/asm/siginfo.h
@@ -92,6 +92,10 @@ typedef struct siginfo {
 			int _trapno;	/* TRAP # which caused the signal */
 #endif
 			short _addr_lsb;
+			struct {
+				void __user *_lower;
+				void __user *_upper;
+			} _addr_bnd;
 		} _sigfault;
 
 		/* SIGPOLL, SIGXFSZ (To do ...) */
[tip:x86/mpx] ia64: Sync struct siginfo with general version
Commit-ID:  53f037b08b5bebf47aa2b574a984e2f9fc7926f2
Gitweb:     http://git.kernel.org/tip/53f037b08b5bebf47aa2b574a984e2f9fc7926f2
Author:     Qiaowei Ren <qiaowei@intel.com>
AuthorDate: Fri, 14 Nov 2014 07:18:22 -0800
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Tue, 18 Nov 2014 00:58:53 +0100

ia64: Sync struct siginfo with general version

New fields about bound violation are added into general struct
siginfo. This will impact MIPS and IA64, which extend general struct
siginfo. This patch syncs this struct for IA64 with general version.

Signed-off-by: Qiaowei Ren <qiaowei@intel.com>
Signed-off-by: Dave Hansen <dave.han...@linux.intel.com>
Cc: linux...@kvack.org
Cc: linux-m...@linux-mips.org
Cc: Dave Hansen <d...@sr71.net>
Link: http://lkml.kernel.org/r/20141114151822.82b3b...@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
---
 arch/ia64/include/uapi/asm/siginfo.h | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/include/uapi/asm/siginfo.h b/arch/ia64/include/uapi/asm/siginfo.h
index 4ea6225..bce9bc1 100644
--- a/arch/ia64/include/uapi/asm/siginfo.h
+++ b/arch/ia64/include/uapi/asm/siginfo.h
@@ -63,6 +63,10 @@ typedef struct siginfo {
 			unsigned int _flags;	/* see below */
 			unsigned long _isr;	/* isr */
 			short _addr_lsb;	/* lsb of faulting address */
+			struct {
+				void __user *_lower;
+				void __user *_upper;
+			} _addr_bnd;
 		} _sigfault;
 
 		/* SIGPOLL */
@@ -110,9 +114,9 @@ typedef struct siginfo {
 /*
  * SIGSEGV si_codes
  */
-#define __SEGV_PSTKOVF	(__SI_FAULT|3)	/* paragraph stack overflow */
+#define __SEGV_PSTKOVF	(__SI_FAULT|4)	/* paragraph stack overflow */
 #undef NSIGSEGV
-#define NSIGSEGV	3
+#define NSIGSEGV	4
 #undef NSIGTRAP
 #define NSIGTRAP	4
[tip:x86/mpx] mpx: Extend siginfo structure to include bound violation information
Commit-ID:  ee1b58d36aa1b5a79eaba11f5c3633c88231da83
Gitweb:     http://git.kernel.org/tip/ee1b58d36aa1b5a79eaba11f5c3633c88231da83
Author:     Qiaowei Ren <qiaowei@intel.com>
AuthorDate: Fri, 14 Nov 2014 07:18:19 -0800
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Tue, 18 Nov 2014 00:58:53 +0100

mpx: Extend siginfo structure to include bound violation information

This patch adds new fields about bound violation into siginfo
structure. si_lower and si_upper are respectively lower bound and
upper bound when bound violation is caused.

Signed-off-by: Qiaowei Ren <qiaowei@intel.com>
Signed-off-by: Dave Hansen <dave.han...@linux.intel.com>
Cc: linux...@kvack.org
Cc: linux-m...@linux-mips.org
Cc: Dave Hansen <d...@sr71.net>
Link: http://lkml.kernel.org/r/20141114151819.1908c...@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
---
 include/uapi/asm-generic/siginfo.h | 9 -
 kernel/signal.c                    | 4
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index ba5be7f..1e35520 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -91,6 +91,10 @@ typedef struct siginfo {
 			int _trapno;	/* TRAP # which caused the signal */
 #endif
 			short _addr_lsb; /* LSB of the reported address */
+			struct {
+				void __user *_lower;
+				void __user *_upper;
+			} _addr_bnd;
 		} _sigfault;
 
 		/* SIGPOLL */
@@ -131,6 +135,8 @@ typedef struct siginfo {
 #define si_trapno	_sifields._sigfault._trapno
 #endif
 #define si_addr_lsb	_sifields._sigfault._addr_lsb
+#define si_lower	_sifields._sigfault._addr_bnd._lower
+#define si_upper	_sifields._sigfault._addr_bnd._upper
 #define si_band		_sifields._sigpoll._band
 #define si_fd		_sifields._sigpoll._fd
 #ifdef __ARCH_SIGSYS
@@ -199,7 +205,8 @@ typedef struct siginfo {
  */
 #define SEGV_MAPERR	(__SI_FAULT|1)	/* address not mapped to object */
 #define SEGV_ACCERR	(__SI_FAULT|2)	/* invalid permissions for mapped object */
-#define NSIGSEGV	2
+#define SEGV_BNDERR	(__SI_FAULT|3)	/* failed address bound checks */
+#define NSIGSEGV	3
 
 /*
  * SIGBUS si_codes
diff --git a/kernel/signal.c b/kernel/signal.c
index 8f0876f..2c403a4 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2748,6 +2748,10 @@ int copy_siginfo_to_user(siginfo_t __user *to, const siginfo_t *from)
 		if (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO)
 			err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb);
 #endif
+#ifdef SEGV_BNDERR
+		err |= __put_user(from->si_lower, &to->si_lower);
+		err |= __put_user(from->si_upper, &to->si_upper);
+#endif
 		break;
 	case __SI_CHLD:
 		err |= __put_user(from->si_pid, &to->si_pid);
[tip:x86/mpx] mpx: Extend siginfo structure to include bound violation information
Commit-ID: ee1b58d36aa1b5a79eaba11f5c3633c88231da83 Gitweb: http://git.kernel.org/tip/ee1b58d36aa1b5a79eaba11f5c3633c88231da83 Author: Qiaowei Ren qiaowei@intel.com AuthorDate: Fri, 14 Nov 2014 07:18:19 -0800 Committer: Thomas Gleixner t...@linutronix.de CommitDate: Tue, 18 Nov 2014 00:58:53 +0100 mpx: Extend siginfo structure to include bound violation information This patch adds new fields about bound violation into siginfo structure. si_lower and si_upper are respectively lower bound and upper bound when bound violation is caused. Signed-off-by: Qiaowei Ren qiaowei@intel.com Signed-off-by: Dave Hansen dave.han...@linux.intel.com Cc: linux...@kvack.org Cc: linux-m...@linux-mips.org Cc: Dave Hansen d...@sr71.net Link: http://lkml.kernel.org/r/20141114151819.1908c...@viggo.jf.intel.com Signed-off-by: Thomas Gleixner t...@linutronix.de --- include/uapi/asm-generic/siginfo.h | 9 - kernel/signal.c| 4 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h index ba5be7f..1e35520 100644 --- a/include/uapi/asm-generic/siginfo.h +++ b/include/uapi/asm-generic/siginfo.h @@ -91,6 +91,10 @@ typedef struct siginfo { int _trapno;/* TRAP # which caused the signal */ #endif short _addr_lsb; /* LSB of the reported address */ + struct { + void __user *_lower; + void __user *_upper; + } _addr_bnd; } _sigfault; /* SIGPOLL */ @@ -131,6 +135,8 @@ typedef struct siginfo { #define si_trapno _sifields._sigfault._trapno #endif #define si_addr_lsb_sifields._sigfault._addr_lsb +#define si_lower _sifields._sigfault._addr_bnd._lower +#define si_upper _sifields._sigfault._addr_bnd._upper #define si_band_sifields._sigpoll._band #define si_fd _sifields._sigpoll._fd #ifdef __ARCH_SIGSYS @@ -199,7 +205,8 @@ typedef struct siginfo { */ #define SEGV_MAPERR(__SI_FAULT|1) /* address not mapped to object */ #define SEGV_ACCERR(__SI_FAULT|2) /* invalid permissions for mapped object */ -#define NSIGSEGV 2 +#define 
SEGV_BNDERR(__SI_FAULT|3) /* failed address bound checks */ +#define NSIGSEGV 3 /* * SIGBUS si_codes diff --git a/kernel/signal.c b/kernel/signal.c index 8f0876f..2c403a4 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2748,6 +2748,10 @@ int copy_siginfo_to_user(siginfo_t __user *to, const siginfo_t *from) if (from-si_code == BUS_MCEERR_AR || from-si_code == BUS_MCEERR_AO) err |= __put_user(from-si_addr_lsb, to-si_addr_lsb); #endif +#ifdef SEGV_BNDERR + err |= __put_user(from-si_lower, to-si_lower); + err |= __put_user(from-si_upper, to-si_upper); +#endif break; case __SI_CHLD: err |= __put_user(from-si_pid, to-si_pid); -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/mpx] ia64: Sync struct siginfo with general version
Commit-ID: 53f037b08b5bebf47aa2b574a984e2f9fc7926f2 Gitweb: http://git.kernel.org/tip/53f037b08b5bebf47aa2b574a984e2f9fc7926f2 Author: Qiaowei Ren qiaowei@intel.com AuthorDate: Fri, 14 Nov 2014 07:18:22 -0800 Committer: Thomas Gleixner t...@linutronix.de CommitDate: Tue, 18 Nov 2014 00:58:53 +0100 ia64: Sync struct siginfo with general version New fields about bound violation are added into general struct siginfo. This will impact MIPS and IA64, which extend general struct siginfo. This patch syncs this struct for IA64 with general version. Signed-off-by: Qiaowei Ren qiaowei@intel.com Signed-off-by: Dave Hansen dave.han...@linux.intel.com Cc: linux...@kvack.org Cc: linux-m...@linux-mips.org Cc: Dave Hansen d...@sr71.net Link: http://lkml.kernel.org/r/20141114151822.82b3b...@viggo.jf.intel.com Signed-off-by: Thomas Gleixner t...@linutronix.de --- arch/ia64/include/uapi/asm/siginfo.h | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/arch/ia64/include/uapi/asm/siginfo.h b/arch/ia64/include/uapi/asm/siginfo.h index 4ea6225..bce9bc1 100644 --- a/arch/ia64/include/uapi/asm/siginfo.h +++ b/arch/ia64/include/uapi/asm/siginfo.h @@ -63,6 +63,10 @@ typedef struct siginfo { unsigned int _flags;/* see below */ unsigned long _isr; /* isr */ short _addr_lsb;/* lsb of faulting address */ + struct { + void __user *_lower; + void __user *_upper; + } _addr_bnd; } _sigfault; /* SIGPOLL */ @@ -110,9 +114,9 @@ typedef struct siginfo { /* * SIGSEGV si_codes */ -#define __SEGV_PSTKOVF (__SI_FAULT|3) /* paragraph stack overflow */ +#define __SEGV_PSTKOVF (__SI_FAULT|4) /* paragraph stack overflow */ #undef NSIGSEGV -#define NSIGSEGV 3 +#define NSIGSEGV 4 #undef NSIGTRAP #define NSIGTRAP 4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/mpx] mips: Sync struct siginfo with general version
Commit-ID: 232b5fff5bad78ad00b94153fa90ca53bef6a444 Gitweb: http://git.kernel.org/tip/232b5fff5bad78ad00b94153fa90ca53bef6a444 Author: Qiaowei Ren qiaowei@intel.com AuthorDate: Fri, 14 Nov 2014 07:18:20 -0800 Committer: Thomas Gleixner t...@linutronix.de CommitDate: Tue, 18 Nov 2014 00:58:53 +0100 mips: Sync struct siginfo with general version New fields about bound violation are added into general struct siginfo. This will impact MIPS and IA64, which extend general struct siginfo. This patch syncs this struct for MIPS with general version. Signed-off-by: Qiaowei Ren qiaowei@intel.com Signed-off-by: Dave Hansen dave.han...@linux.intel.com Cc: linux...@kvack.org Cc: linux-m...@linux-mips.org Cc: Dave Hansen d...@sr71.net Link: http://lkml.kernel.org/r/20141114151820.f7edc...@viggo.jf.intel.com Signed-off-by: Thomas Gleixner t...@linutronix.de --- arch/mips/include/uapi/asm/siginfo.h | 4 1 file changed, 4 insertions(+) diff --git a/arch/mips/include/uapi/asm/siginfo.h b/arch/mips/include/uapi/asm/siginfo.h index e811744..d08f83f 100644 --- a/arch/mips/include/uapi/asm/siginfo.h +++ b/arch/mips/include/uapi/asm/siginfo.h @@ -92,6 +92,10 @@ typedef struct siginfo { int _trapno;/* TRAP # which caused the signal */ #endif short _addr_lsb; + struct { + void __user *_lower; + void __user *_upper; + } _addr_bnd; } _sigfault; /* SIGPOLL, SIGXFSZ (To do ...) */ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/mpx] x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific
Commit-ID: 4aae7e436fa51faf4bf5d11b175aea82cfe8224a Gitweb: http://git.kernel.org/tip/4aae7e436fa51faf4bf5d11b175aea82cfe8224a Author: Qiaowei Ren qiaowei@intel.com AuthorDate: Fri, 14 Nov 2014 07:18:25 -0800 Committer: Thomas Gleixner t...@linutronix.de CommitDate: Tue, 18 Nov 2014 00:58:53 +0100 x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific MPX-enabled applications using large swaths of memory can potentially have large numbers of bounds tables in process address space to save bounds information. These tables can take up huge swaths of memory (as much as 80% of the memory on the system) even if we clean them up aggressively. In the worst-case scenario, the tables can be 4x the size of the data structure being tracked. IOW, a 1-page structure can require 4 bounds-table pages. Being this huge, our expectation is that folks using MPX are going to be keen on figuring out how much memory is being dedicated to it. So we need a way to track memory use for MPX. If we want to specifically track MPX VMAs we need to be able to distinguish them from normal VMAs, and keep them from getting merged with normal VMAs. A new VM_ flag set only on MPX VMAs does both of those things. With this flag, MPX bounds-table VMAs can be distinguished from other VMAs, and userspace can also walk /proc/$pid/smaps to get memory usage for MPX. In addition to this flag, we also introduce a special -vm_ops specific to MPX VMAs (see the patch add MPX specific mmap interface), but currently different -vm_ops do not by themselves prevent VMA merging, so we still need this flag. We understand that VM_ flags are scarce and are open to other options. 
Signed-off-by: Qiaowei Ren qiaowei@intel.com Signed-off-by: Dave Hansen dave.han...@linux.intel.com Cc: linux...@kvack.org Cc: linux-m...@linux-mips.org Cc: Dave Hansen d...@sr71.net Link: http://lkml.kernel.org/r/20141114151825.56562...@viggo.jf.intel.com Signed-off-by: Thomas Gleixner t...@linutronix.de --- fs/proc/task_mmu.c | 3 +++ include/linux/mm.h | 6 ++ 2 files changed, 9 insertions(+) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 4e0388c..f6734c6 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -552,6 +552,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) [ilog2(VM_GROWSDOWN)] = gd, [ilog2(VM_PFNMAP)] = pf, [ilog2(VM_DENYWRITE)] = dw, +#ifdef CONFIG_X86_INTEL_MPX + [ilog2(VM_MPX)] = mp, +#endif [ilog2(VM_LOCKED)] = lo, [ilog2(VM_IO)] = io, [ilog2(VM_SEQ_READ)]= sr, diff --git a/include/linux/mm.h b/include/linux/mm.h index b464611..f7606d3 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -128,6 +128,7 @@ extern unsigned int kobjsize(const void *objp); #define VM_HUGETLB 0x0040 /* Huge TLB Page VM */ #define VM_NONLINEAR 0x0080 /* Is non-linear (remap_file_pages) */ #define VM_ARCH_1 0x0100 /* Architecture-specific flag */ +#define VM_ARCH_2 0x0200 #define VM_DONTDUMP0x0400 /* Do not include in the core dump */ #ifdef CONFIG_MEM_SOFT_DIRTY @@ -155,6 +156,11 @@ extern unsigned int kobjsize(const void *objp); # define VM_MAPPED_COPYVM_ARCH_1 /* T if mapped copy of data (nommu mmap) */ #endif +#if defined(CONFIG_X86) +/* MPX specific bounds table or bounds directory */ +# define VM_MPXVM_ARCH_2 +#endif + #ifndef VM_GROWSUP # define VM_GROWSUPVM_NONE #endif -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/mpx] x86, mpx: Add MPX-specific mmap interface
Commit-ID: 57319d80e1d328e34cb24868a4f4405661485e30 Gitweb: http://git.kernel.org/tip/57319d80e1d328e34cb24868a4f4405661485e30 Author: Qiaowei Ren qiaowei@intel.com AuthorDate: Fri, 14 Nov 2014 07:18:27 -0800 Committer: Thomas Gleixner t...@linutronix.de CommitDate: Tue, 18 Nov 2014 00:58:53 +0100 x86, mpx: Add MPX-specific mmap interface We have chosen to perform the allocation of bounds tables in the kernel (see the patch "on-demand kernel allocation of bounds tables") and to mark these VMAs with VM_MPX. However, there is currently no suitable interface to actually do this. Existing interfaces, like do_mmap_pgoff(), have no way to set a modified ->vm_ops or ->vm_flags and don't hold mmap_sem long enough to let a caller do it. This patch wraps mmap_region() and holds mmap_sem long enough to make the modifications to the VMA which we need. Also note the 32/64-bit #ifdef in the header. We actually need to do this at runtime eventually. But, for now, we don't support running 32-bit binaries on 64-bit kernels. Support for this will come in later patches.
Signed-off-by: Qiaowei Ren qiaowei@intel.com Signed-off-by: Dave Hansen dave.han...@linux.intel.com Cc: linux...@kvack.org Cc: linux-m...@linux-mips.org Cc: Dave Hansen d...@sr71.net Link: http://lkml.kernel.org/r/20141114151827.ce440...@viggo.jf.intel.com Signed-off-by: Thomas Gleixner t...@linutronix.de --- arch/x86/Kconfig | 4 +++ arch/x86/include/asm/mpx.h | 36 +++ arch/x86/mm/Makefile | 2 ++ arch/x86/mm/mpx.c | 86 ++ 4 files changed, 128 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index ded8a67..967dfe0 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -248,6 +248,10 @@ config HAVE_INTEL_TXT def_bool y depends on INTEL_IOMMU && ACPI +config X86_INTEL_MPX + def_bool y + depends on CPU_SUP_INTEL + config X86_32_SMP def_bool y depends on X86_32 && SMP diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h new file mode 100644 index 000..7d7c5f5 --- /dev/null +++ b/arch/x86/include/asm/mpx.h @@ -0,0 +1,36 @@ +#ifndef _ASM_X86_MPX_H +#define _ASM_X86_MPX_H + +#include <linux/types.h> +#include <asm/ptrace.h> + +#ifdef CONFIG_X86_64 + +/* upper 28 bits [47:20] of the virtual address in 64-bit used to + * index into bounds directory (BD). + */ +#define MPX_BD_ENTRY_OFFSET 28 +#define MPX_BD_ENTRY_SHIFT 3 +/* bits [19:3] of the virtual address in 64-bit used to index into + * bounds table (BT).
+ */ +#define MPX_BT_ENTRY_OFFSET 17 +#define MPX_BT_ENTRY_SHIFT 5 +#define MPX_IGN_BITS 3 + +#else + +#define MPX_BD_ENTRY_OFFSET 20 +#define MPX_BD_ENTRY_SHIFT 2 +#define MPX_BT_ENTRY_OFFSET 10 +#define MPX_BT_ENTRY_SHIFT 4 +#define MPX_IGN_BITS 2 + +#endif + +#define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT)) +#define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT)) + +#define MPX_BNDSTA_ERROR_CODE 0x3 + +#endif /* _ASM_X86_MPX_H */ diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 6a19ad9..ecfdc46 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -30,3 +30,5 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o obj-$(CONFIG_NUMA_EMU) += numa_emulation.o obj-$(CONFIG_MEMTEST) += memtest.o + +obj-$(CONFIG_X86_INTEL_MPX) += mpx.o diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c new file mode 100644 index 000..72d13b0 --- /dev/null +++ b/arch/x86/mm/mpx.c @@ -0,0 +1,86 @@ +/* + * mpx.c - Memory Protection eXtensions + * + * Copyright (c) 2014, Intel Corporation. + * Qiaowei Ren qiaowei@intel.com + * Dave Hansen dave.han...@intel.com + */ +#include <linux/kernel.h> +#include <linux/syscalls.h> +#include <linux/sched/sysctl.h> + +#include <asm/mman.h> +#include <asm/mpx.h> + +static const char *mpx_mapping_name(struct vm_area_struct *vma) +{ + return "[mpx]"; +} + +static struct vm_operations_struct mpx_vma_ops = { + .name = mpx_mapping_name, +}; + +/* + * This is really a simplified "vm_mmap". it only handles MPX + * bounds tables (the bounds directory is user-allocated). + * + * Later on, we use the vma->vm_ops to uniquely identify these + * VMAs.
+ */ +static unsigned long mpx_mmap(unsigned long len) +{ + unsigned long ret; + unsigned long addr, pgoff; + struct mm_struct *mm = current->mm; + vm_flags_t vm_flags; + struct vm_area_struct *vma; + + /* Only bounds table and bounds directory can be allocated here */ + if (len != MPX_BD_SIZE_BYTES && len != MPX_BT_SIZE_BYTES) + return -EINVAL; + + down_write(&mm->mmap_sem); + + /* Too many mappings? */ + if (mm->map_count > sysctl_max_map_count) { + ret = -ENOMEM; + goto out; + } + + /* Obtain the address to map to. we verify (or select) it and ensure +* that it represents a valid
[tip:x86/mpx] x86, mpx: Add documentation on Intel MPX
Commit-ID: 5776563648f6437ede91c91cbad85862ca682b0b Gitweb: http://git.kernel.org/tip/5776563648f6437ede91c91cbad85862ca682b0b Author: Qiaowei Ren qiaowei@intel.com AuthorDate: Fri, 14 Nov 2014 07:18:32 -0800 Committer: Thomas Gleixner t...@linutronix.de CommitDate: Tue, 18 Nov 2014 00:58:54 +0100 x86, mpx: Add documentation on Intel MPX This patch adds the Documentation/x86/intel_mpx.txt file with some information about Intel MPX. Signed-off-by: Qiaowei Ren qiaowei@intel.com Signed-off-by: Dave Hansen dave.han...@linux.intel.com Cc: linux...@kvack.org Cc: linux-m...@linux-mips.org Cc: Dave Hansen d...@sr71.net Link: http://lkml.kernel.org/r/20141114151832.7fdb1...@viggo.jf.intel.com Signed-off-by: Thomas Gleixner t...@linutronix.de --- Documentation/x86/intel_mpx.txt | 234 1 file changed, 234 insertions(+) diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.txt new file mode 100644 index 000..4472ed2 --- /dev/null +++ b/Documentation/x86/intel_mpx.txt @@ -0,0 +1,234 @@ +1. Intel(R) MPX Overview + + +Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new capability +introduced into Intel Architecture. Intel MPX provides hardware features +that can be used in conjunction with compiler changes to check memory +references, for those references whose compile-time normal intentions are +usurped at runtime due to buffer overflow or underflow. + +For more information, please refer to Intel(R) Architecture Instruction +Set Extensions Programming Reference, Chapter 9: Intel(R) Memory Protection +Extensions. + +Note: Currently no hardware with MPX ISA is available but it is always +possible to use SDE (Intel(R) Software Development Emulator) instead, which +can be downloaded from +http://software.intel.com/en-us/articles/intel-software-development-emulator + + +2. How to get the advantage of MPX +== + +For MPX to work, changes are required in the kernel, binutils and compiler. +No source changes are required for applications, just a recompile. 
+ +There are a lot of moving parts of this to all work right. The following +is how we expect the compiler, application and kernel to work together. + +1) Application developer compiles with -fmpx. The compiler will add the + instrumentation as well as some setup code called early after the app + starts. New instruction prefixes are noops for old CPUs. +2) That setup code allocates (virtual) space for the "bounds directory", + points the "bndcfgu" register to the directory and notifies the kernel + (via the new prctl(PR_MPX_ENABLE_MANAGEMENT)) that the app will be using + MPX. +3) The kernel detects that the CPU has MPX, allows the new prctl() to + succeed, and notes the location of the bounds directory. Userspace is + expected to keep the bounds directory at that location. We note it + instead of reading it each time because the 'xsave' operation needed + to access the bounds directory register is an expensive operation. +4) If the application needs to spill bounds out of the 4 registers, it + issues a bndstx instruction. Since the bounds directory is empty at + this point, a bounds fault (#BR) is raised, the kernel allocates a + bounds table (in the user address space) and makes the relevant entry + in the bounds directory point to the new table. +5) If the application violates the bounds specified in the bounds registers, + a separate kind of #BR is raised which will deliver a signal with + information about the violation in the 'struct siginfo'. +6) Whenever memory is freed, we know that it can no longer contain valid + pointers, and we attempt to free the associated space in the bounds + tables. If an entire table becomes unused, we will attempt to free + the table and remove the entry in the directory.
+ +To summarize, there are essentially three things interacting here: + +GCC with -fmpx: + * enables annotation of code with MPX instructions and prefixes + * inserts code early in the application to call in to the gcc runtime +GCC MPX Runtime: + * Checks for hardware MPX support in cpuid leaf + * allocates virtual space for the bounds directory (malloc() essentially) + * points the hardware BNDCFGU register at the directory + * calls a new prctl(PR_MPX_ENABLE_MANAGEMENT) to notify the kernel to + start managing the bounds directories +Kernel MPX Code: + * Checks for hardware MPX support in cpuid leaf + * Handles #BR exceptions and sends SIGSEGV to the app when it violates + bounds, like during a buffer overflow. + * When bounds are spilled in to an unallocated bounds table, the kernel + notices in the #BR exception, allocates the virtual space, then + updates the bounds directory to point to the new table. It keeps + special track of the memory with a VM_MPX flag. + * Frees unused bounds tables at the time that the memory they described + is unmapped. + + +3. How does MPX
[PATCH v9 06/12] mpx: extend siginfo structure to include bound violation information
This patch adds new fields about bound violation into the siginfo structure. si_lower and si_upper are, respectively, the lower bound and upper bound in effect when the bound violation occurred. Signed-off-by: Qiaowei Ren --- include/uapi/asm-generic/siginfo.h | 9 +- kernel/signal.c | 4 2 files changed, 12 insertions(+), 1 deletions(-) diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h index ba5be7f..1e35520 100644 --- a/include/uapi/asm-generic/siginfo.h +++ b/include/uapi/asm-generic/siginfo.h @@ -91,6 +91,10 @@ typedef struct siginfo { int _trapno; /* TRAP # which caused the signal */ #endif short _addr_lsb; /* LSB of the reported address */ + struct { + void __user *_lower; + void __user *_upper; + } _addr_bnd; } _sigfault; /* SIGPOLL */ @@ -131,6 +135,8 @@ typedef struct siginfo { #define si_trapno _sifields._sigfault._trapno #endif #define si_addr_lsb _sifields._sigfault._addr_lsb +#define si_lower _sifields._sigfault._addr_bnd._lower +#define si_upper _sifields._sigfault._addr_bnd._upper #define si_band _sifields._sigpoll._band #define si_fd _sifields._sigpoll._fd #ifdef __ARCH_SIGSYS @@ -199,7 +205,8 @@ typedef struct siginfo { */ #define SEGV_MAPERR (__SI_FAULT|1) /* address not mapped to object */ #define SEGV_ACCERR (__SI_FAULT|2) /* invalid permissions for mapped object */ -#define NSIGSEGV 2 +#define SEGV_BNDERR (__SI_FAULT|3) /* failed address bound checks */ +#define NSIGSEGV 3 /* * SIGBUS si_codes diff --git a/kernel/signal.c b/kernel/signal.c index 8f0876f..2c403a4 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2748,6 +2748,10 @@ int copy_siginfo_to_user(siginfo_t __user *to, const siginfo_t *from) if (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO) err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb); #endif +#ifdef SEGV_BNDERR + err |= __put_user(from->si_lower, &to->si_lower); + err |= __put_user(from->si_upper, &to->si_upper); +#endif break; case __SI_CHLD: err |= __put_user(from->si_pid, &to->si_pid); -- 1.7.1
[PATCH v9 04/12] x86, mpx: add MPX to disabled features
This allows us to use cpu_feature_enabled(X86_FEATURE_MPX) as both a runtime and compile-time check. When CONFIG_X86_INTEL_MPX is disabled, cpu_feature_enabled(X86_FEATURE_MPX) will evaluate at compile-time to 0. If CONFIG_X86_INTEL_MPX=y, then the cpuid flag will be checked at runtime. This patch must be applied on top of another of Dave's commits: 381aa07a9b4e1f82969203e9e4863da2a157781d Signed-off-by: Dave Hansen Signed-off-by: Qiaowei Ren --- arch/x86/include/asm/disabled-features.h | 8 +++- 1 files changed, 7 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index 97534a7..f226df0 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -10,6 +10,12 @@ * cpu_feature_enabled(). */ +#ifdef CONFIG_X86_INTEL_MPX +# define DISABLE_MPX 0 +#else +# define DISABLE_MPX (1<<(X86_FEATURE_MPX & 31)) +#endif + #ifdef CONFIG_X86_64 # define DISABLE_VME (1<<(X86_FEATURE_VME & 31)) # define DISABLE_K6_MTRR (1<<(X86_FEATURE_K6_MTRR & 31)) @@ -34,6 +40,6 @@ #define DISABLED_MASK6 0 #define DISABLED_MASK7 0 #define DISABLED_MASK8 0 -#define DISABLED_MASK9 0 +#define DISABLED_MASK9 (DISABLE_MPX) #endif /* _ASM_X86_DISABLED_FEATURES_H */ -- 1.7.1
[PATCH v9 07/12] mips: sync struct siginfo with general version
New fields about bound violation are added into the general struct siginfo. This will impact MIPS and IA64, which extend the general struct siginfo. This patch syncs this struct for MIPS with the general version. Signed-off-by: Qiaowei Ren --- arch/mips/include/uapi/asm/siginfo.h | 4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/arch/mips/include/uapi/asm/siginfo.h b/arch/mips/include/uapi/asm/siginfo.h index e811744..d08f83f 100644 --- a/arch/mips/include/uapi/asm/siginfo.h +++ b/arch/mips/include/uapi/asm/siginfo.h @@ -92,6 +92,10 @@ typedef struct siginfo { int _trapno; /* TRAP # which caused the signal */ #endif short _addr_lsb; + struct { + void __user *_lower; + void __user *_upper; + } _addr_bnd; } _sigfault; /* SIGPOLL, SIGXFSZ (To do ...) */ -- 1.7.1
[PATCH v9 05/12] x86, mpx: on-demand kernel allocation of bounds tables
MPX only has 4 hardware registers for storing bounds information. If MPX-enabled code needs more than these 4 registers, it needs to spill them somewhere. It has two special instructions for this which allow the bounds to be moved between the bounds registers and some new "bounds tables". #BR exceptions are a new class of exceptions just for MPX. They are similar conceptually to a page fault and will be raised by the MPX hardware during both bounds violations or when the tables are not present. This patch handles those #BR exceptions for not-present tables by carving the space out of the normal process's address space (essentially calling the new mmap() interface introduced earlier in this patch set) and then pointing the bounds-directory over to it. The tables *need* to be accessed and controlled by userspace because the instructions for moving bounds in and out of them are extremely frequent. They potentially happen every time a register pointing to memory is dereferenced. Any direct kernel involvement (like a syscall) to access the tables would obviously destroy performance. Why not do this in userspace? This patch is obviously doing this allocation in the kernel. However, MPX does not strictly *require* anything in the kernel. It can theoretically be done completely from userspace. Here are a few ways this *could* be done. I don't think any of them are practical in the real-world, but here they are. Q: Can virtual space simply be reserved for the bounds tables so that we never have to allocate them? A: As noted earlier, these tables are *HUGE*. An X-GB virtual area needs 4*X GB of virtual space, plus 2GB for the bounds directory. If we were to preallocate them for the 128TB of user virtual address space, we would need to reserve 512TB+2GB, which is larger than the entire virtual address space today. This means they can not be reserved ahead of time. Also, a single process's pre-populated bounds directory consumes 2GB of virtual *AND* physical memory. IOW, it's completely infeasible to prepopulate bounds directories.
Q: Can we preallocate bounds table space at the same time memory is allocated which might contain pointers that might eventually need bounds tables? A: This would work if we could hook the site of each and every memory allocation syscall. This can be done for small, constrained applications. But, it isn't practical at a larger scale since a given app has no way of controlling how all the parts of the app might allocate memory (think libraries). The kernel is really the only place to intercept these calls. Q: Could a bounds fault be handed to userspace and the tables allocated there in a signal handler instead of in the kernel? A: (thanks to tglx) mmap() is not on the list of safe async handler functions and even if mmap() would work it still requires locking or nasty tricks to keep track of the allocation state there. Having ruled out all of the userspace-only approaches for managing bounds tables that we could think of, we create them on demand in the kernel. Signed-off-by: Qiaowei Ren --- arch/x86/include/asm/mpx.h | 20 + arch/x86/kernel/Makefile |1 + arch/x86/kernel/mpx.c | 101 arch/x86/kernel/traps.c| 52 ++- 4 files changed, 173 insertions(+), 1 deletions(-) create mode 100644 arch/x86/kernel/mpx.c diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h index 5725ac4..b7598ac 100644 --- a/arch/x86/include/asm/mpx.h +++ b/arch/x86/include/asm/mpx.h @@ -18,6 +18,8 @@ #define MPX_BT_ENTRY_SHIFT 5 #define MPX_IGN_BITS 3 +#define MPX_BD_ENTRY_TAIL 3 + #else #define MPX_BD_ENTRY_OFFSET20 @@ -26,13 +28,31 @@ #define MPX_BT_ENTRY_SHIFT 4 #define MPX_IGN_BITS 2 +#define MPX_BD_ENTRY_TAIL 2 + #endif +#define MPX_BNDSTA_TAIL2 +#define MPX_BNDCFG_TAIL12 +#define MPX_BNDSTA_ADDR_MASK (~((1UL< + * Dave Hansen + */ + +#include +#include +#include + +/* + * With 32-bit mode, MPX_BT_SIZE_BYTES is 4MB, and the size of each + * bounds table is 16KB. With 64-bit mode, MPX_BT_SIZE_BYTES is 2GB, + * and the size of each bounds table is 4MB. 
+ */ +static int allocate_bt(long __user *bd_entry) +{ + unsigned long bt_addr; + unsigned long expected_old_val = 0; + unsigned long actual_old_val = 0; + int ret = 0; + + /* +* Carve the virtual space out of userspace for the new +* bounds table: +*/ + bt_addr = mpx_mmap(MPX_BT_SIZE_BYTES); + if (IS_ERR((void *)bt_addr)) + return PTR_ERR((void *)bt_addr); + /* +* Set the valid flag (kinda like _PAGE_PRESENT in a pte) +*/ + bt_addr = bt_addr | MPX_BD_ENTRY_VALID_FLAG; + + /* +* Go poke the address of the new bounds table in to the +* bounds directory entry out in userspace memory. Note: +
[PATCH v9 03/12] x86, mpx: add MPX specific mmap interface
We have to do the allocation of bounds tables in the kernel (see the patch "on-demand kernel allocation of bounds tables"). Moreover, if we want to track MPX VMAs we need to be able to stick a new VM_MPX flag and a specific vm_ops for MPX in the vm_area_struct. But there is no suitable interface to do this in the current kernel. Existing interfaces, like do_mmap_pgoff(), cannot stick a specific ->vm_ops in the vm_area_struct when a VMA is created. So, this patch adds an MPX-specific mmap interface to do the allocation of bounds tables. Signed-off-by: Qiaowei Ren --- arch/x86/Kconfig | 4 ++ arch/x86/include/asm/mpx.h | 38 + arch/x86/mm/Makefile | 2 + arch/x86/mm/mpx.c | 79 4 files changed, 123 insertions(+), 0 deletions(-) create mode 100644 arch/x86/include/asm/mpx.h create mode 100644 arch/x86/mm/mpx.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 4b663e1..e5bcc70 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -243,6 +243,10 @@ config HAVE_INTEL_TXT def_bool y depends on INTEL_IOMMU && ACPI +config X86_INTEL_MPX + def_bool y + depends on CPU_SUP_INTEL + config X86_32_SMP def_bool y depends on X86_32 && SMP diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h new file mode 100644 index 000..5725ac4 --- /dev/null +++ b/arch/x86/include/asm/mpx.h @@ -0,0 +1,38 @@ +#ifndef _ASM_X86_MPX_H +#define _ASM_X86_MPX_H + +#include <linux/types.h> +#include <asm/ptrace.h> + +#ifdef CONFIG_X86_64 + +/* upper 28 bits [47:20] of the virtual address in 64-bit used to + * index into bounds directory (BD). + */ +#define MPX_BD_ENTRY_OFFSET 28 +#define MPX_BD_ENTRY_SHIFT 3 +/* bits [19:3] of the virtual address in 64-bit used to index into + * bounds table (BT).
+ */ +#define MPX_BT_ENTRY_OFFSET 17 +#define MPX_BT_ENTRY_SHIFT 5 +#define MPX_IGN_BITS 3 + +#else + +#define MPX_BD_ENTRY_OFFSET 20 +#define MPX_BD_ENTRY_SHIFT 2 +#define MPX_BT_ENTRY_OFFSET 10 +#define MPX_BT_ENTRY_SHIFT 4 +#define MPX_IGN_BITS 2 + +#endif + +#define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT)) +#define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT)) + +#define MPX_BNDSTA_ERROR_CODE 0x3 + +unsigned long mpx_mmap(unsigned long len); + +#endif /* _ASM_X86_MPX_H */ diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 6a19ad9..ecfdc46 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -30,3 +30,5 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o obj-$(CONFIG_NUMA_EMU) += numa_emulation.o obj-$(CONFIG_MEMTEST) += memtest.o + +obj-$(CONFIG_X86_INTEL_MPX) += mpx.o diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c new file mode 100644 index 000..e1b28e6 --- /dev/null +++ b/arch/x86/mm/mpx.c @@ -0,0 +1,79 @@ +#include <linux/kernel.h> +#include <linux/syscalls.h> +#include <linux/sched/sysctl.h> +#include <asm/mman.h> +#include <asm/mpx.h> + +static const char *mpx_mapping_name(struct vm_area_struct *vma) +{ + return "[mpx]"; +} + +static struct vm_operations_struct mpx_vma_ops = { + .name = mpx_mapping_name, +}; + +/* + * this is really a simplified "vm_mmap". it only handles mpx + * related maps, including bounds table and bounds directory. + * + * here we can stick new vm_flag VM_MPX in the vma_area_struct + * when create a bounds table or bounds directory, in order to + * track MPX specific memory. + */ +unsigned long mpx_mmap(unsigned long len) +{ + unsigned long ret; + unsigned long addr, pgoff; + struct mm_struct *mm = current->mm; + vm_flags_t vm_flags; + struct vm_area_struct *vma; + + /* Only bounds table and bounds directory can be allocated here */ + if (len != MPX_BD_SIZE_BYTES && len != MPX_BT_SIZE_BYTES) + return -EINVAL; + + down_write(&mm->mmap_sem); + + /* Too many mappings?
*/ + if (mm->map_count > sysctl_max_map_count) { + ret = -ENOMEM; + goto out; + } + + /* Obtain the address to map to. we verify (or select) it and ensure +* that it represents a valid section of the address space. +*/ + addr = get_unmapped_area(NULL, 0, len, 0, MAP_ANONYMOUS | MAP_PRIVATE); + if (addr & ~PAGE_MASK) { + ret = addr; + goto out; + } + + vm_flags = VM_READ | VM_WRITE | VM_MPX | + mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; + + /* Set pgoff according to addr for anon_vma */ + pgoff = addr >> PAGE_SHIFT; + + ret = mmap_region(NULL, addr, len, vm_flags, pgoff); + if (IS_ERR_VALUE(ret)) + goto out; + + vma = find_vma(mm, ret); + if (!vma) { + ret = -ENOMEM; + goto out; + } + vma->vm_ops = &mpx_vma_ops; + + if (vm_flags & VM_LOCKED)
[PATCH v9 01/12] x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific
MPX-enabled applications using large swaths of memory can potentially have large numbers of bounds tables in process address space to save bounds information. These tables can take up huge swaths of memory (as much as 80% of the memory on the system) even if we clean them up aggressively. In the worst-case scenario, the tables can be 4x the size of the data structure being tracked. IOW, a 1-page structure can require 4 bounds-table pages. Being this huge, our expectation is that folks using MPX are going to be keen on figuring out how much memory is being dedicated to it. So we need a way to track memory use for MPX. If we want to specifically track MPX VMAs we need to be able to distinguish them from normal VMAs, and keep them from getting merged with normal VMAs. A new VM_ flag set only on MPX VMAs does both of those things. With this flag, MPX bounds-table VMAs can be distinguished from other VMAs, and userspace can also walk /proc/$pid/smaps to get memory usage for MPX. In addition to this flag, we also introduce a specific ->vm_ops for MPX VMAs (see the patch "add MPX specific mmap interface"); however, VMAs with different ->vm_ops are currently not prevented from merging, so we still need this flag. We understand that VM_ flags are scarce and are open to other options.
Signed-off-by: Qiaowei Ren --- fs/proc/task_mmu.c |1 + include/linux/mm.h |6 ++ 2 files changed, 7 insertions(+), 0 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index dfc791c..cc31520 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -549,6 +549,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) [ilog2(VM_GROWSDOWN)] = "gd", [ilog2(VM_PFNMAP)] = "pf", [ilog2(VM_DENYWRITE)] = "dw", + [ilog2(VM_MPX)] = "mp", [ilog2(VM_LOCKED)] = "lo", [ilog2(VM_IO)] = "io", [ilog2(VM_SEQ_READ)]= "sr", diff --git a/include/linux/mm.h b/include/linux/mm.h index 8981cc8..942be8a 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -127,6 +127,7 @@ extern unsigned int kobjsize(const void *objp); #define VM_HUGETLB 0x0040 /* Huge TLB Page VM */ #define VM_NONLINEAR 0x0080 /* Is non-linear (remap_file_pages) */ #define VM_ARCH_1 0x0100 /* Architecture-specific flag */ +#define VM_ARCH_2 0x0200 #define VM_DONTDUMP0x0400 /* Do not include in the core dump */ #ifdef CONFIG_MEM_SOFT_DIRTY @@ -154,6 +155,11 @@ extern unsigned int kobjsize(const void *objp); # define VM_MAPPED_COPYVM_ARCH_1 /* T if mapped copy of data (nommu mmap) */ #endif +#if defined(CONFIG_X86) +/* MPX specific bounds table or bounds directory */ +# define VM_MPXVM_ARCH_2 +#endif + #ifndef VM_GROWSUP # define VM_GROWSUPVM_NONE #endif -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v9 08/12] ia64: sync struct siginfo with general version
New fields about bound violation are added into the general struct siginfo. This will impact MIPS and IA64, which extend the general struct siginfo. This patch syncs this struct for IA64 with the general version. Signed-off-by: Qiaowei Ren --- arch/ia64/include/uapi/asm/siginfo.h | 8 ++-- 1 files changed, 6 insertions(+), 2 deletions(-) diff --git a/arch/ia64/include/uapi/asm/siginfo.h b/arch/ia64/include/uapi/asm/siginfo.h index 4ea6225..bce9bc1 100644 --- a/arch/ia64/include/uapi/asm/siginfo.h +++ b/arch/ia64/include/uapi/asm/siginfo.h @@ -63,6 +63,10 @@ typedef struct siginfo { unsigned int _flags; /* see below */ unsigned long _isr; /* isr */ short _addr_lsb; /* lsb of faulting address */ + struct { + void __user *_lower; + void __user *_upper; + } _addr_bnd; } _sigfault; /* SIGPOLL */ @@ -110,9 +114,9 @@ typedef struct siginfo { /* * SIGSEGV si_codes */ -#define __SEGV_PSTKOVF (__SI_FAULT|3) /* paragraph stack overflow */ +#define __SEGV_PSTKOVF (__SI_FAULT|4) /* paragraph stack overflow */ #undef NSIGSEGV -#define NSIGSEGV 3 +#define NSIGSEGV 4 #undef NSIGTRAP #define NSIGTRAP 4 -- 1.7.1
[PATCH v9 12/12] x86, mpx: add documentation on Intel MPX
This patch adds the Documentation/x86/intel_mpx.txt file with some information about Intel MPX. Signed-off-by: Qiaowei Ren --- Documentation/x86/intel_mpx.txt | 245 +++ 1 files changed, 245 insertions(+), 0 deletions(-) create mode 100644 Documentation/x86/intel_mpx.txt diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.txt new file mode 100644 index 000..3c20a17 --- /dev/null +++ b/Documentation/x86/intel_mpx.txt @@ -0,0 +1,245 @@ +1. Intel(R) MPX Overview + + +Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new capability +introduced into Intel Architecture. Intel MPX provides hardware features +that can be used in conjunction with compiler changes to check memory +references, for those references whose compile-time normal intentions are +usurped at runtime due to buffer overflow or underflow. + +For more information, please refer to Intel(R) Architecture Instruction +Set Extensions Programming Reference, Chapter 9: Intel(R) Memory Protection +Extensions. + +Note: Currently no hardware with MPX ISA is available but it is always +possible to use SDE (Intel(R) Software Development Emulator) instead, which +can be downloaded from +http://software.intel.com/en-us/articles/intel-software-development-emulator + + +2. How to get the advantage of MPX +== + +For MPX to work, changes are required in the kernel, binutils and compiler. +No source changes are required for applications, just a recompile. + +There are a lot of moving parts of this to all work right. The following +is how we expect the compiler, application and kernel to work together. + +1) Application developer compiles with -fmpx. The compiler will add the + instrumentation as well as some setup code called early after the app + starts. New instruction prefixes are noops for old CPUs. 
+2) That setup code allocates (virtual) space for the "bounds directory", + points the "bndcfgu" register to the directory and notifies the kernel + (via the new prctl(PR_MPX_ENABLE_MANAGEMENT)) that the app will be using + MPX. +3) The kernel detects that the CPU has MPX, allows the new prctl() to + succeed, and notes the location of the bounds directory. Userspace is + expected to keep the bounds directory at that location. We note it + instead of reading it each time because the 'xsave' operation needed + to access the bounds directory register is an expensive operation. +4) If the application needs to spill bounds out of the 4 registers, it + issues a bndstx instruction. Since the bounds directory is empty at + this point, a bounds fault (#BR) is raised, the kernel allocates a + bounds table (in the user address space) and makes the relevant entry + in the bounds directory point to the new table. +5) If the application violates the bounds specified in the bounds registers, + a separate kind of #BR is raised which will deliver a signal with + information about the violation in the 'struct siginfo'. +6) Whenever memory is freed, we know that it can no longer contain valid + pointers, and we attempt to free the associated space in the bounds + tables. If an entire table becomes unused, we will attempt to free + the table and remove the entry in the directory.
+ +To summarize, there are essentially three things interacting here: + +GCC with -fmpx: + * enables annotation of code with MPX instructions and prefixes + * inserts code early in the application to call in to the "gcc runtime" +GCC MPX Runtime: + * Checks for hardware MPX support in cpuid leaf + * allocates virtual space for the bounds directory (malloc() essentially) + * points the hardware BNDCFGU register at the directory + * calls a new prctl(PR_MPX_ENABLE_MANAGEMENT) to notify the kernel to + start managing the bounds directories +Kernel MPX Code: + * Checks for hardware MPX support in cpuid leaf + * Handles #BR exceptions and sends SIGSEGV to the app when it violates + bounds, like during a buffer overflow. + * When bounds are spilled in to an unallocated bounds table, the kernel + notices in the #BR exception, allocates the virtual space, then + updates the bounds directory to point to the new table. It keeps + special track of the memory with a VM_MPX flag. + * Frees unused bounds tables at the time that the memory they described + is unmapped. + + +3. How does MPX kernel code work + + +Handling #BR faults caused by MPX +- + +When MPX is enabled, there are 2 new situations that can generate +#BR faults. + * new bounds tables (BT) need to be allocated to save bounds. + * bounds violation caused by MPX instructions. + +We hook #BR handler to handle these two new situations. + +On-demand kernel allocation of bounds tables + + +MPX only has 4 hardware registers for storing bounds informati
[PATCH v9 09/12] x86, mpx: decode MPX instruction to get bound violation information
This patch sets the bound violation fields of the siginfo struct in the #BR exception handler by decoding the user instruction and constructing the faulting pointer. This patch doesn't use the generic decoder, and implements a limited special-purpose decoder to decode MPX instructions, simply because the generic decoder is very heavyweight, not just in terms of performance but in terms of interface -- because it has to.

Signed-off-by: Qiaowei Ren <qiaowei@intel.com>
---
 arch/x86/include/asm/mpx.h |   23
 arch/x86/kernel/mpx.c      |  299
 arch/x86/kernel/traps.c    |    6 +
 3 files changed, 328 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index b7598ac..780af63 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -3,6 +3,7 @@
 #include <linux/types.h>
 #include <asm/ptrace.h>
+#include <asm/insn.h>

 #ifdef CONFIG_X86_64

@@ -44,15 +45,37 @@
 #define MPX_BNDSTA_ERROR_CODE	0x3
 #define MPX_BD_ENTRY_VALID_FLAG	0x1

+struct mpx_insn {
+	struct insn_field rex_prefix;	/* REX prefix */
+	struct insn_field modrm;
+	struct insn_field sib;
+	struct insn_field displacement;
+
+	unsigned char addr_bytes;	/* effective address size */
+	unsigned char limit;
+	unsigned char x86_64;
+
+	const unsigned char *kaddr;	/* kernel address of insn to analyze */
+	const unsigned char *next_byte;
+};
+
+#define MAX_MPX_INSN_SIZE	15
+
 unsigned long mpx_mmap(unsigned long len);

 #ifdef CONFIG_X86_INTEL_MPX
 int do_mpx_bt_fault(struct xsave_struct *xsave_buf);
+void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+		struct xsave_struct *xsave_buf);
 #else
 static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
 {
 	return -EINVAL;
 }
+static inline void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+		struct xsave_struct *xsave_buf)
+{
+}
 #endif /* CONFIG_X86_INTEL_MPX */

 #endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
index 2103b5e..b7e4c0e 100644
--- a/arch/x86/kernel/mpx.c
+++ b/arch/x86/kernel/mpx.c
@@ -10,6 +10,275 @@
 #include <linux/syscalls.h>
 #include <asm/mpx.h>

+enum reg_type {
+	REG_TYPE_RM = 0,
+	REG_TYPE_INDEX,
+	REG_TYPE_BASE,
+};
+
+static unsigned long get_reg(struct mpx_insn *insn, struct pt_regs *regs,
+			     enum reg_type type)
+{
+	int regno = 0;
+	unsigned char modrm = (unsigned char)insn->modrm.value;
+	unsigned char sib = (unsigned char)insn->sib.value;
+
+	static const int regoff[] = {
+		offsetof(struct pt_regs, ax),
+		offsetof(struct pt_regs, cx),
+		offsetof(struct pt_regs, dx),
+		offsetof(struct pt_regs, bx),
+		offsetof(struct pt_regs, sp),
+		offsetof(struct pt_regs, bp),
+		offsetof(struct pt_regs, si),
+		offsetof(struct pt_regs, di),
+#ifdef CONFIG_X86_64
+		offsetof(struct pt_regs, r8),
+		offsetof(struct pt_regs, r9),
+		offsetof(struct pt_regs, r10),
+		offsetof(struct pt_regs, r11),
+		offsetof(struct pt_regs, r12),
+		offsetof(struct pt_regs, r13),
+		offsetof(struct pt_regs, r14),
+		offsetof(struct pt_regs, r15),
+#endif
+	};
+
+	switch (type) {
+	case REG_TYPE_RM:
+		regno = X86_MODRM_RM(modrm);
+		if (X86_REX_B(insn->rex_prefix.value) == 1)
+			regno += 8;
+		break;
+
+	case REG_TYPE_INDEX:
+		regno = X86_SIB_INDEX(sib);
+		if (X86_REX_X(insn->rex_prefix.value) == 1)
+			regno += 8;
+		break;
+
+	case REG_TYPE_BASE:
+		regno = X86_SIB_BASE(sib);
+		if (X86_REX_B(insn->rex_prefix.value) == 1)
+			regno += 8;
+		break;
+
+	default:
+		break;
+	}
+
+	return regs_get_register(regs, regoff[regno]);
+}
+
+/*
+ * return the address being referenced by the instruction
+ * for rm=3 returning the content of the rm reg
+ * for rm!=3 calculates the address using SIB and Disp
+ */
+static unsigned long get_addr_ref(struct mpx_insn *insn, struct pt_regs *regs)
+{
+	unsigned long addr;
+	unsigned long base;
+	unsigned long indx;
+	unsigned char modrm = (unsigned char)insn->modrm.value;
+	unsigned char sib = (unsigned char)insn->sib.value;
+
+	if (X86_MODRM_MOD(modrm) == 3) {
+		addr = get_reg(insn, regs, REG_TYPE_RM);
+	} else {
+		if (insn->sib.nbytes) {
+			base = get_reg(insn, regs, REG_TYPE_BASE);
+			indx = get_reg(insn, regs, REG_TYPE_INDEX);
+			addr = base + indx * (1 << X86_SIB_SCALE(sib));
+		} else {
+
[PATCH v9 11/12] x86, mpx: cleanup unused bound tables
There are two mappings in play:
 1. The mapping with the actual data, which userspace is munmap()ing or
    brk()ing away, etc...
 2. The mapping for the bounds table *backing* the data (tagged with
    mpx_vma_ops, see the patch "add MPX specific mmap interface").

If userspace uses the prctl() introduced earlier in this patchset to enable kernel management of bounds tables, then when it unmaps the first kind of mapping (the one with the actual data), the kernel needs to free the mapping for the bounds table backing that data. This patch calls arch_unmap() at the very end of do_munmap() to do so. It walks the bounds directory, looks at the entries covered by the data VMA, unmaps the bounds table referenced from the directory, and then clears the directory entry.

Unmapping of bounds tables is called under vm_munmap() of the data VMA, so we have to check ->vm_ops to prevent recursion. Such recursion would represent having bounds tables for bounds tables, which should not occur normally. Being strict about it here helps ensure that we do not have an exploitable stack overflow.

Once we unmap a bounds table, we would have a bounds directory entry pointing at empty address space. That address space could now be allocated for some other (random) use, and the MPX hardware would then try to walk it as if it were a bounds table. That would be bad. So any unmapping of a bounds table has to be accompanied by a corresponding write to the bounds directory entry to mark it invalid.

That write to the bounds directory can fault. Since we are doing the freeing from munmap() (and other paths like it), we hold mmap_sem for write. If we fault, the page fault handler will attempt to acquire mmap_sem for read and we will deadlock. For now, to avoid deadlock, we disable page faults while touching the bounds directory entry. This keeps us from being able to free the tables in this case. This deficiency will be addressed in later patches.
Signed-off-by: Qiaowei Ren <qiaowei@intel.com>
---
 arch/x86/include/asm/mmu_context.h |   16 ++
 arch/x86/include/asm/mpx.h         |    9 +
 arch/x86/mm/mpx.c                  |  317
 include/asm-generic/mmu_context.h  |    6 +
 mm/mmap.c                          |    2 +
 5 files changed, 350 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index e33ddb7..2b52d1b 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -111,4 +111,20 @@ static inline void arch_bprm_mm_init(struct mm_struct *mm,
 #endif
 }

+static inline void arch_unmap(struct mm_struct *mm,
+		struct vm_area_struct *vma,
+		unsigned long start, unsigned long end)
+{
+#ifdef CONFIG_X86_INTEL_MPX
+	/*
+	 * Userspace never asked us to manage the bounds tables,
+	 * so refuse to help.
+	 */
+	if (!kernel_managing_mpx_tables(current->mm))
+		return;
+
+	mpx_notify_unmap(mm, vma, start, end);
+#endif
+}
+
 #endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 32f13f5..a1a0155 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -48,6 +48,13 @@
 #define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
 #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))

+#define MPX_BD_ENTRY_MASK	((1<<MPX_BD_ENTRY_OFFSET)-1)
+#define MPX_BT_ENTRY_MASK	((1<<MPX_BT_ENTRY_OFFSET)-1)
+#define MPX_GET_BD_ENTRY_OFFSET(addr)	((((addr)>>(MPX_BT_ENTRY_OFFSET+ \
+		MPX_IGN_BITS)) & MPX_BD_ENTRY_MASK) << MPX_BD_ENTRY_SHIFT)
+#define MPX_GET_BT_ENTRY_OFFSET(addr)	((((addr)>>MPX_IGN_BITS) & \
+		MPX_BT_ENTRY_MASK) << MPX_BT_ENTRY_SHIFT)
+
 #define MPX_BNDSTA_ERROR_CODE	0x3
 #define MPX_BNDCFG_ENABLE_FLAG	0x1
 #define MPX_BD_ENTRY_VALID_FLAG	0x1
@@ -73,6 +80,8 @@ static inline int kernel_managing_mpx_tables(struct mm_struct *mm)
 	return (mm->bd_addr != MPX_INVALID_BOUNDS_DIR);
 }
 unsigned long mpx_mmap(unsigned long len);
+void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
+		unsigned long start, unsigned long end);

 #ifdef CONFIG_X86_INTEL_MPX
 int do_mpx_bt_fault(struct xsave_struct *xsave_buf);
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 376f2ee..dcc6621 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -1,7 +1,16 @@
+/*
+ * mpx.c - Memory Protection eXtensions
+ *
+ * Copyright (c) 2014, Intel Corporation.
+ * Qiaowei Ren <qiaowei@intel.com>
+ * Dave Hansen dave.han...@intel.com
+ */
+
 #include <linux/kernel.h>
 #include <linux/syscalls.h>
 #include <asm/mpx.h>
 #include <asm/mman.h>
+#include <asm/mmu_context.h>
 #include <linux/sched/sysctl.h>

 static const char *mpx_mapping_name(struct vm_area_struct *vma)
@@ -13,6 +22,11 @@ static struct vm_operations_struct mpx_vma_ops = {
 	.name = mpx_mapping_name,
 };

+int is_mpx_vma(struct vm_area_struct *vma)
+{
+	return (vma->vm_ops == &mpx_vma_ops);
+}
+
 /*
  * this is really a simplified "vm_mmap". it only handles mpx
  * related maps, including bounds table and bounds directory.
[PATCH v9 10/12] x86, mpx: add prctl commands PR_MPX_ENABLE_MANAGEMENT, PR_MPX_DISABLE_MANAGEMENT
This patch adds two prctl() commands that provide an explicit interaction mechanism for enabling or disabling kernel management of bounds tables, covering both on-demand allocation (see the patch "on-demand kernel allocation of bounds tables") and cleanup (see the patch "cleanup unused bound tables").

Applications do not strictly need the kernel to manage bounds tables, and we expect some applications to use MPX without taking advantage of the kernel support. This means the kernel cannot simply infer from the MPX registers whether an application needs bounds table management; prctl() is an explicit signal from userspace. PR_MPX_ENABLE_MANAGEMENT is meant to be a signal from userspace requesting the kernel's help in managing bounds tables, and PR_MPX_DISABLE_MANAGEMENT is the opposite, meaning that userspace no longer wants the kernel's help. With PR_MPX_DISABLE_MANAGEMENT, the kernel won't allocate or free bounds tables, even if the CPU supports the MPX feature.

PR_MPX_ENABLE_MANAGEMENT does an xsave, fetches the base address of the bounds directory from the xsave buffer, and caches it in the new field "bd_addr" of struct mm_struct. PR_MPX_DISABLE_MANAGEMENT sets "bd_addr" to an invalid address. We can then check "bd_addr" to judge whether kernel management of bounds tables is enabled.

xsaves are expensive, so "bd_addr" is kept as a cache to reduce the number of xsaves we have to do at munmap() time. But we still have to do an xsave to get the value of BNDSTATUS at #BR fault time. In addition, with this caching, userspace can't just move the bounds directory around willy-nilly. For sane applications the base address of the bounds directory won't change, otherwise we would be in a world of hurt. But we still check at #BR fault time whether it has been changed by users.
Signed-off-by: Qiaowei Ren <qiaowei@intel.com>
---
 arch/x86/include/asm/mmu_context.h |    9
 arch/x86/include/asm/mpx.h         |   11 +
 arch/x86/include/asm/processor.h   |   18 +++
 arch/x86/kernel/mpx.c              |   88
 arch/x86/kernel/setup.c            |    8 +++
 arch/x86/kernel/traps.c            |   30 -
 arch/x86/mm/mpx.c                  |   25 +++---
 fs/exec.c                          |    2 +
 include/asm-generic/mmu_context.h  |    5 ++
 include/linux/mm_types.h           |    3 +
 include/uapi/linux/prctl.h         |    6 +++
 kernel/sys.c                       |   12 +
 12 files changed, 198 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 166af2a..e33ddb7 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -10,6 +10,7 @@
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 #include <asm/paravirt.h>
+#include <asm/mpx.h>
 #ifndef CONFIG_PARAVIRT
 #include <asm-generic/mm_hooks.h>
@@ -102,4 +103,12 @@ do {						\
 } while (0)
 #endif

+static inline void arch_bprm_mm_init(struct mm_struct *mm,
+		struct vm_area_struct *vma)
+{
+#ifdef CONFIG_X86_INTEL_MPX
+	mm->bd_addr = MPX_INVALID_BOUNDS_DIR;
+#endif
+}
+
 #endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 780af63..32f13f5 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -5,6 +5,12 @@
 #include <asm/ptrace.h>
 #include <asm/insn.h>

+/*
+ * NULL is theoretically a valid place to put the bounds
+ * directory, so point this at an invalid address.
+ */
+#define MPX_INVALID_BOUNDS_DIR	((void __user *)-1)
+
 #ifdef CONFIG_X86_64

 /* upper 28 bits [47:20] of the virtual address in 64-bit used to
@@ -43,6 +49,7 @@
 #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))

 #define MPX_BNDSTA_ERROR_CODE	0x3
+#define MPX_BNDCFG_ENABLE_FLAG	0x1
 #define MPX_BD_ENTRY_VALID_FLAG	0x1

 struct mpx_insn {
@@ -61,6 +68,10 @@ struct mpx_insn {

 #define MAX_MPX_INSN_SIZE	15

+static inline int kernel_managing_mpx_tables(struct mm_struct *mm)
+{
+	return (mm->bd_addr != MPX_INVALID_BOUNDS_DIR);
+}
 unsigned long mpx_mmap(unsigned long len);

 #ifdef CONFIG_X86_INTEL_MPX
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 020142f..b35aefa 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -953,6 +953,24 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
 extern int get_tsc_mode(unsigned long adr);
 extern int set_tsc_mode(unsigned int val);

+/* Register/unregister a process' MPX related resource */
+#define MPX_ENABLE_MANAGEMENT(tsk)	mpx_enable_management((tsk))
+#define MPX_DISABLE_MANAGEMENT(tsk)	mpx_disable_management((tsk))
+
+#ifdef CONFIG_X86_INTEL_MPX
+extern int mpx_enable_management(struct task_struct *tsk);
+extern int mpx_disable_management(struct task_struct *tsk);
+#else
+static inline int mpx_enable_management(struct task_struct *tsk)
+{
+
[PATCH v9 00/12] Intel MPX support
Changes since v6:
 * because arch_vma_name is removed, this patchset has to set MPX
   specific ->vm_ops to do the same thing.
 * fix warnings for 32 bit arch.
 * add more description into these patches.

Changes since v7:
 * introduce VM_ARCH_2 flag.
 * remove all of the pr_debug()s.
 * fix prctl numbers in documentation.
 * fix some bugs on bounds tables freeing.

Changes since v8:
 * add new patch to rename cfg_reg_u and status_reg.
 * add new patch to use disabled features from Dave's patches.
 * add new patch to sync struct siginfo for IA64.
 * rename two new prctl() commands to PR_MPX_ENABLE_MANAGEMENT and
   PR_MPX_DISABLE_MANAGEMENT, check whether the management of bounds
   tables in kernel is enabled at #BR fault time, and add locking to
   protect the access to 'bd_addr'.
 * update the documentation file to add more content about on-demand
   allocation of bounds tables, etc..

Qiaowei Ren (12):
  x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific
  x86, mpx: rename cfg_reg_u and status_reg
  x86, mpx: add MPX specific mmap interface
  x86, mpx: add MPX to disabled features
  x86, mpx: on-demand kernel allocation of bounds tables
  mpx: extend siginfo structure to include bound violation information
  mips: sync struct siginfo with general version
  ia64: sync struct siginfo with general version
  x86, mpx: decode MPX instruction to get bound violation information
  x86, mpx: add prctl commands PR_MPX_ENABLE_MANAGEMENT,
    PR_MPX_DISABLE_MANAGEMENT
  x86, mpx: cleanup unused bound tables
  x86, mpx: add documentation on Intel MPX

 Documentation/x86/intel_mpx.txt          |  245 +++
 arch/ia64/include/uapi/asm/siginfo.h     |    8 +-
 arch/mips/include/uapi/asm/siginfo.h     |    4 +
 arch/x86/Kconfig                         |    4 +
 arch/x86/include/asm/disabled-features.h |    8 +-
 arch/x86/include/asm/mmu_context.h       |   25 ++
 arch/x86/include/asm/mpx.h               |  101 ++
 arch/x86/include/asm/processor.h         |   22 ++-
 arch/x86/kernel/Makefile                 |    1 +
 arch/x86/kernel/mpx.c                    |  488 ++
 arch/x86/kernel/setup.c                  |    8 +
 arch/x86/kernel/traps.c                  |   86 ++-
 arch/x86/mm/Makefile                     |    2 +
 arch/x86/mm/mpx.c                        |  385 +++
 fs/exec.c                                |    2 +
 fs/proc/task_mmu.c                       |    1 +
 include/asm-generic/mmu_context.h        |   11 +
 include/linux/mm.h                       |    6 +
 include/linux/mm_types.h                 |    3 +
 include/uapi/asm-generic/siginfo.h       |    9 +-
 include/uapi/linux/prctl.h               |    6 +
 kernel/signal.c                          |    4 +
 kernel/sys.c                             |   12 +
 mm/mmap.c                                |    2 +
 24 files changed, 1436 insertions(+), 7 deletions(-)

 create mode 100644 Documentation/x86/intel_mpx.txt
 create mode 100644 arch/x86/include/asm/mpx.h
 create mode 100644 arch/x86/kernel/mpx.c
 create mode 100644 arch/x86/mm/mpx.c

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[PATCH v9 02/12] x86, mpx: rename cfg_reg_u and status_reg
According to the Intel SDM extension, the MPX configuration and status registers should be BNDCFGU and BNDSTATUS. This patch renames cfg_reg_u and status_reg to bndcfgu and bndstatus.

Signed-off-by: Qiaowei Ren <qiaowei@intel.com>
---
 arch/x86/include/asm/processor.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index eb71ec7..020142f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -379,8 +379,8 @@ struct bndregs_struct {
 } __packed;

 struct bndcsr_struct {
-	u64 cfg_reg_u;
-	u64 status_reg;
+	u64 bndcfgu;
+	u64 bndstatus;
 } __packed;

 struct xsave_hdr_struct {
--
1.7.1
[PATCH v9 02/12] x86, mpx: rename cfg_reg_u and status_reg
According to Intel SDM extension, MPX configuration and status registers should be BNDCFGU and BNDSTATUS. This patch renames cfg_reg_u and status_reg to bndcfgu and bndstatus. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/x86/include/asm/processor.h |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index eb71ec7..020142f 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -379,8 +379,8 @@ struct bndregs_struct { } __packed; struct bndcsr_struct { - u64 cfg_reg_u; - u64 status_reg; + u64 bndcfgu; + u64 bndstatus; } __packed; struct xsave_hdr_struct { -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v9 00/12] Intel MPX support
toset MPX specific -vm_ops to do the same thing. * fix warnings for 32 bit arch. * add more description into these patches. Changes since v7: * introduce VM_ARCH_2 flag. * remove all of the pr_debug()s. * fix prctl numbers in documentation. * fix some bugs on bounds tables freeing. Changes since v8: * add new patch to rename cfg_reg_u and status_reg. * add new patch to use disabled features from Dave's patches. * add new patch to sync struct siginfo for IA64. * rename two new prctl() commands to PR_MPX_ENABLE_MANAGEMENT and PR_MPX_DISABLE_MANAGEMENT, check whether the management of bounds tables in kernel is enabled at #BR fault time, and add locking to protect the access to 'bd_addr'. * update the documentation file to add more content about on-demand allocation of bounds tables, etc.. Qiaowei Ren (12): mm: distinguish VMAs with different vm_ops x86, mpx: rename cfg_reg_u and status_reg x86, mpx: add MPX specific mmap interface x86, mpx: add MPX to disaabled features x86, mpx: on-demand kernel allocation of bounds tables mpx: extend siginfo structure to include bound violation information mips: sync struct siginfo with general version ia64: sync struct siginfo with general version x86, mpx: decode MPX instruction to get bound violation information x86, mpx: add prctl commands PR_MPX_ENABLE_MANAGEMENT, PR_MPX_DISABLE_MANAGEMENT x86, mpx: cleanup unused bound tables x86, mpx: add documentation on Intel MPX Qiaowei Ren (12): x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific x86, mpx: rename cfg_reg_u and status_reg x86, mpx: add MPX specific mmap interface x86, mpx: add MPX to disaabled features x86, mpx: on-demand kernel allocation of bounds tables mpx: extend siginfo structure to include bound violation information mips: sync struct siginfo with general version ia64: sync struct siginfo with general version x86, mpx: decode MPX instruction to get bound violation information x86, mpx: add prctl commands PR_MPX_ENABLE_MANAGEMENT, 
PR_MPX_DISABLE_MANAGEMENT x86, mpx: cleanup unused bound tables x86, mpx: add documentation on Intel MPX Documentation/x86/intel_mpx.txt | 245 +++ arch/ia64/include/uapi/asm/siginfo.h |8 +- arch/mips/include/uapi/asm/siginfo.h |4 + arch/x86/Kconfig |4 + arch/x86/include/asm/disabled-features.h |8 +- arch/x86/include/asm/mmu_context.h | 25 ++ arch/x86/include/asm/mpx.h | 101 ++ arch/x86/include/asm/processor.h | 22 ++- arch/x86/kernel/Makefile |1 + arch/x86/kernel/mpx.c| 488 ++ arch/x86/kernel/setup.c |8 + arch/x86/kernel/traps.c | 86 ++- arch/x86/mm/Makefile |2 + arch/x86/mm/mpx.c| 385 +++ fs/exec.c|2 + fs/proc/task_mmu.c |1 + include/asm-generic/mmu_context.h| 11 + include/linux/mm.h |6 + include/linux/mm_types.h |3 + include/uapi/asm-generic/siginfo.h |9 +- include/uapi/linux/prctl.h |6 + kernel/signal.c |4 + kernel/sys.c | 12 + mm/mmap.c|2 + 24 files changed, 1436 insertions(+), 7 deletions(-) create mode 100644 Documentation/x86/intel_mpx.txt create mode 100644 arch/x86/include/asm/mpx.h create mode 100644 arch/x86/kernel/mpx.c create mode 100644 arch/x86/mm/mpx.c -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v9 09/12] x86, mpx: decode MPX instruction to get bound violation information
This patch sets bound violation fields of siginfo struct in #BR exception handler by decoding the user instruction and constructing the faulting pointer. This patch does't use the generic decoder, and implements a limited special-purpose decoder to decode MPX instructions, simply because the generic decoder is very heavyweight not just in terms of performance but in terms of interface -- because it has to. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/x86/include/asm/mpx.h | 23 arch/x86/kernel/mpx.c | 299 arch/x86/kernel/traps.c|6 + 3 files changed, 328 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h index b7598ac..780af63 100644 --- a/arch/x86/include/asm/mpx.h +++ b/arch/x86/include/asm/mpx.h @@ -3,6 +3,7 @@ #include linux/types.h #include asm/ptrace.h +#include asm/insn.h #ifdef CONFIG_X86_64 @@ -44,15 +45,37 @@ #define MPX_BNDSTA_ERROR_CODE 0x3 #define MPX_BD_ENTRY_VALID_FLAG0x1 +struct mpx_insn { + struct insn_field rex_prefix; /* REX prefix */ + struct insn_field modrm; + struct insn_field sib; + struct insn_field displacement; + + unsigned char addr_bytes; /* effective address size */ + unsigned char limit; + unsigned char x86_64; + + const unsigned char *kaddr; /* kernel address of insn to analyze */ + const unsigned char *next_byte; +}; + +#define MAX_MPX_INSN_SIZE 15 + unsigned long mpx_mmap(unsigned long len); #ifdef CONFIG_X86_INTEL_MPX int do_mpx_bt_fault(struct xsave_struct *xsave_buf); +void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info, + struct xsave_struct *xsave_buf); #else static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf) { return -EINVAL; } +static inline void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info, + struct xsave_struct *xsave_buf) +{ +} #endif /* CONFIG_X86_INTEL_MPX */ #endif /* _ASM_X86_MPX_H */ diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c index 2103b5e..b7e4c0e 100644 --- a/arch/x86/kernel/mpx.c +++ b/arch/x86/kernel/mpx.c @@ -10,6 
+10,275 @@ #include linux/syscalls.h #include asm/mpx.h +enum reg_type { + REG_TYPE_RM = 0, + REG_TYPE_INDEX, + REG_TYPE_BASE, +}; + +static unsigned long get_reg(struct mpx_insn *insn, struct pt_regs *regs, +enum reg_type type) +{ + int regno = 0; + unsigned char modrm = (unsigned char)insn-modrm.value; + unsigned char sib = (unsigned char)insn-sib.value; + + static const int regoff[] = { + offsetof(struct pt_regs, ax), + offsetof(struct pt_regs, cx), + offsetof(struct pt_regs, dx), + offsetof(struct pt_regs, bx), + offsetof(struct pt_regs, sp), + offsetof(struct pt_regs, bp), + offsetof(struct pt_regs, si), + offsetof(struct pt_regs, di), +#ifdef CONFIG_X86_64 + offsetof(struct pt_regs, r8), + offsetof(struct pt_regs, r9), + offsetof(struct pt_regs, r10), + offsetof(struct pt_regs, r11), + offsetof(struct pt_regs, r12), + offsetof(struct pt_regs, r13), + offsetof(struct pt_regs, r14), + offsetof(struct pt_regs, r15), +#endif + }; + + switch (type) { + case REG_TYPE_RM: + regno = X86_MODRM_RM(modrm); + if (X86_REX_B(insn-rex_prefix.value) == 1) + regno += 8; + break; + + case REG_TYPE_INDEX: + regno = X86_SIB_INDEX(sib); + if (X86_REX_X(insn-rex_prefix.value) == 1) + regno += 8; + break; + + case REG_TYPE_BASE: + regno = X86_SIB_BASE(sib); + if (X86_REX_B(insn-rex_prefix.value) == 1) + regno += 8; + break; + + default: + break; + } + + return regs_get_register(regs, regoff[regno]); +} + +/* + * return the address being referenced be instruction + * for rm=3 returning the content of the rm reg + * for rm!=3 calculates the address using SIB and Disp + */ +static unsigned long get_addr_ref(struct mpx_insn *insn, struct pt_regs *regs) +{ + unsigned long addr; + unsigned long base; + unsigned long indx; + unsigned char modrm = (unsigned char)insn-modrm.value; + unsigned char sib = (unsigned char)insn-sib.value; + + if (X86_MODRM_MOD(modrm) == 3) { + addr = get_reg(insn, regs, REG_TYPE_RM); + } else { + if (insn-sib.nbytes) { + base = get_reg(insn, regs, REG_TYPE_BASE); 
+ indx = get_reg(insn, regs, REG_TYPE_INDEX); + addr = base + indx * (1 X86_SIB_SCALE(sib
[PATCH v9 11/12] x86, mpx: cleanup unused bound tables
There are two mappings in play: 1. The mapping with the actual data, which userspace is munmap()ing or brk()ing away, etc... 2. The mapping for the bounds table *backing* the data (is tagged with mpx_vma_ops, see the patch add MPX specific mmap interface). If userspace use the prctl() indroduced earlier in this patchset to enable the management of bounds tables in kernel, when it unmaps the first kind of mapping with the actual data, kernel needs to free the mapping for the bounds table backing the data. This patch calls arch_unmap() at the very end of do_unmap() to do so. This will walk the directory to look at the entries covered in the data vma and unmaps the bounds table which is referenced from the directory and then clears the directory entry. Unmapping of bounds tables is called under vm_munmap() of the data VMA. So we have to check -vm_ops to prevent recursion. This recursion represents having bounds tables for bounds tables, which should not occur normally. Being strict about it here helps ensure that we do not have an exploitable stack overflow. Once we unmap the bounds table, we would have a bounds directory entry pointing at empty address space. That address space could now be allocated for some other (random) use, and the MPX hardware is now going to go trying to walk it as if it were a bounds table. That would be bad. So any unmapping of a bounds table has to be accompanied by a corresponding write to the bounds directory entry to have it invalid. That write to the bounds directory can fault. Since we are doing the freeing from munmap() (and other paths like it), we hold mmap_sem for write. If we fault, the page fault handler will attempt to acquire mmap_sem for read and we will deadlock. For now, to avoid deadlock, we disable page faults while touching the bounds directory entry. This keeps us from being able to free the tables in this case. This deficiency will be addressed in later patches. 
Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/x86/include/asm/mmu_context.h | 16 ++ arch/x86/include/asm/mpx.h |9 + arch/x86/mm/mpx.c | 317 include/asm-generic/mmu_context.h |6 + mm/mmap.c |2 + 5 files changed, 350 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index e33ddb7..2b52d1b 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -111,4 +111,20 @@ static inline void arch_bprm_mm_init(struct mm_struct *mm, #endif } +static inline void arch_unmap(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long start, unsigned long end) +{ +#ifdef CONFIG_X86_INTEL_MPX + /* +* Userspace never asked us to manage the bounds tables, +* so refuse to help. +*/ + if (!kernel_managing_mpx_tables(current-mm)) + return; + + mpx_notify_unmap(mm, vma, start, end); +#endif +} + #endif /* _ASM_X86_MMU_CONTEXT_H */ diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h index 32f13f5..a1a0155 100644 --- a/arch/x86/include/asm/mpx.h +++ b/arch/x86/include/asm/mpx.h @@ -48,6 +48,13 @@ #define MPX_BD_SIZE_BYTES (1UL(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT)) #define MPX_BT_SIZE_BYTES (1UL(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT)) +#define MPX_BD_ENTRY_MASK ((1MPX_BD_ENTRY_OFFSET)-1) +#define MPX_BT_ENTRY_MASK ((1MPX_BT_ENTRY_OFFSET)-1) +#define MPX_GET_BD_ENTRY_OFFSET(addr) addr)(MPX_BT_ENTRY_OFFSET+ \ + MPX_IGN_BITS)) MPX_BD_ENTRY_MASK) MPX_BD_ENTRY_SHIFT) +#define MPX_GET_BT_ENTRY_OFFSET(addr) addr)MPX_IGN_BITS) \ + MPX_BT_ENTRY_MASK) MPX_BT_ENTRY_SHIFT) + #define MPX_BNDSTA_ERROR_CODE 0x3 #define MPX_BNDCFG_ENABLE_FLAG 0x1 #define MPX_BD_ENTRY_VALID_FLAG0x1 @@ -73,6 +80,8 @@ static inline int kernel_managing_mpx_tables(struct mm_struct *mm) return (mm-bd_addr != MPX_INVALID_BOUNDS_DIR); } unsigned long mpx_mmap(unsigned long len); +void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long start, unsigned long end); #ifdef 
CONFIG_X86_INTEL_MPX int do_mpx_bt_fault(struct xsave_struct *xsave_buf); diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c index 376f2ee..dcc6621 100644 --- a/arch/x86/mm/mpx.c +++ b/arch/x86/mm/mpx.c @@ -1,7 +1,16 @@ +/* + * mpx.c - Memory Protection eXtensions + * + * Copyright (c) 2014, Intel Corporation. + * Qiaowei Ren qiaowei@intel.com + * Dave Hansen dave.han...@intel.com + */ + #include linux/kernel.h #include linux/syscalls.h #include asm/mpx.h #include asm/mman.h +#include asm/mmu_context.h #include linux/sched/sysctl.h static const char *mpx_mapping_name(struct vm_area_struct *vma) @@ -13,6 +22,11 @@ static struct vm_operations_struct mpx_vma_ops = { .name = mpx_mapping_name, }; +int is_mpx_vma(struct
[PATCH v9 10/12] x86, mpx: add prctl commands PR_MPX_ENABLE_MANAGEMENT, PR_MPX_DISABLE_MANAGEMENT
This patch adds two prctl() commands to provide one explicit interaction mechanism to enable or disable the management of bounds tables in kernel, including on-demand kernel allocation (See the patch on-demand kernel allocation of bounds tables) and cleanup (See the patch cleanup unused bound tables). Applications do not strictly need the kernel to manage bounds tables and we expect some applications to use MPX without taking advantage of the kernel support. This means the kernel can not simply infer whether an application needs bounds table management from the MPX registers. prctl() is an explicit signal from userspace. PR_MPX_ENABLE_MANAGEMENT is meant to be a signal from userspace to require kernel's help in managing bounds tables. And PR_MPX_DISABLE_MANAGEMENT is the opposite, meaning that userspace don't want kernel's help any more. With PR_MPX_DISABLE_MANAGEMENT, kernel won't allocate and free the bounds table, even if the CPU supports MPX feature. PR_MPX_ENABLE_MANAGEMENT will do an xsave and fetch the base address of bounds directory from the xsave buffer and then cache it into new filed bd_addr of struct mm_struct. PR_MPX_DISABLE_MANAGEMENT will set bd_addr to one invalid address. Then we can check bd_addr to judge whether the management of bounds tables in kernel is enabled. xsaves are expensive, so bd_addr is kept for caching to reduce the number of we have to do at munmap() time. But we still have to do xsave to get the value of BNDSTATUS at #BR fault time. In addition, with this caching, userspace can't just move the bounds directory around willy-nilly. For sane applications, base address of the bounds directory won't be changed, otherwise we would be in a world of hurt. But we will still check whether it is changed by users at #BR fault time. 
Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/x86/include/asm/mmu_context.h |9 arch/x86/include/asm/mpx.h | 11 + arch/x86/include/asm/processor.h | 18 +++ arch/x86/kernel/mpx.c | 88 arch/x86/kernel/setup.c|8 +++ arch/x86/kernel/traps.c| 30 - arch/x86/mm/mpx.c | 25 +++--- fs/exec.c |2 + include/asm-generic/mmu_context.h |5 ++ include/linux/mm_types.h |3 + include/uapi/linux/prctl.h |6 +++ kernel/sys.c | 12 + 12 files changed, 198 insertions(+), 19 deletions(-) diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 166af2a..e33ddb7 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -10,6 +10,7 @@ #include <asm/pgalloc.h> #include <asm/tlbflush.h> #include <asm/paravirt.h> +#include <asm/mpx.h> #ifndef CONFIG_PARAVIRT #include <asm-generic/mm_hooks.h> @@ -102,4 +103,12 @@ do { \ } while (0) #endif +static inline void arch_bprm_mm_init(struct mm_struct *mm, + struct vm_area_struct *vma) +{ +#ifdef CONFIG_X86_INTEL_MPX + mm->bd_addr = MPX_INVALID_BOUNDS_DIR; +#endif +} + #endif /* _ASM_X86_MMU_CONTEXT_H */ diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h index 780af63..32f13f5 100644 --- a/arch/x86/include/asm/mpx.h +++ b/arch/x86/include/asm/mpx.h @@ -5,6 +5,12 @@ #include <asm/ptrace.h> #include <asm/insn.h> +/* + * NULL is theoretically a valid place to put the bounds + * directory, so point this at an invalid address.
+ */ +#define MPX_INVALID_BOUNDS_DIR ((void __user *)-1) + #ifdef CONFIG_X86_64 /* upper 28 bits [47:20] of the virtual address in 64-bit used to @@ -43,6 +49,7 @@ #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT)) #define MPX_BNDSTA_ERROR_CODE 0x3 +#define MPX_BNDCFG_ENABLE_FLAG 0x1 #define MPX_BD_ENTRY_VALID_FLAG 0x1 struct mpx_insn { @@ -61,6 +68,10 @@ struct mpx_insn { #define MAX_MPX_INSN_SIZE 15 +static inline int kernel_managing_mpx_tables(struct mm_struct *mm) +{ + return (mm->bd_addr != MPX_INVALID_BOUNDS_DIR); +} unsigned long mpx_mmap(unsigned long len); #ifdef CONFIG_X86_INTEL_MPX diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 020142f..b35aefa 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -953,6 +953,24 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip, extern int get_tsc_mode(unsigned long adr); extern int set_tsc_mode(unsigned int val); +/* Register/unregister a process' MPX related resource */ +#define MPX_ENABLE_MANAGEMENT(tsk) mpx_enable_management((tsk)) +#define MPX_DISABLE_MANAGEMENT(tsk) mpx_disable_management((tsk)) + +#ifdef CONFIG_X86_INTEL_MPX +extern int mpx_enable_management(struct task_struct *tsk); +extern int mpx_disable_management(struct task_struct *tsk); +#else +static inline int mpx_enable_management(struct
[PATCH v9 08/12] ia64: sync struct siginfo with general version
New fields about bound violation are added into general struct siginfo. This will impact MIPS and IA64, which extend general struct siginfo. This patch syncs this struct for IA64 with general version. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/ia64/include/uapi/asm/siginfo.h |8 ++-- 1 files changed, 6 insertions(+), 2 deletions(-) diff --git a/arch/ia64/include/uapi/asm/siginfo.h b/arch/ia64/include/uapi/asm/siginfo.h index 4ea6225..bce9bc1 100644 --- a/arch/ia64/include/uapi/asm/siginfo.h +++ b/arch/ia64/include/uapi/asm/siginfo.h @@ -63,6 +63,10 @@ typedef struct siginfo { unsigned int _flags;/* see below */ unsigned long _isr; /* isr */ short _addr_lsb;/* lsb of faulting address */ + struct { + void __user *_lower; + void __user *_upper; + } _addr_bnd; } _sigfault; /* SIGPOLL */ @@ -110,9 +114,9 @@ typedef struct siginfo { /* * SIGSEGV si_codes */ -#define __SEGV_PSTKOVF (__SI_FAULT|3) /* paragraph stack overflow */ +#define __SEGV_PSTKOVF (__SI_FAULT|4) /* paragraph stack overflow */ #undef NSIGSEGV -#define NSIGSEGV 3 +#define NSIGSEGV 4 #undef NSIGTRAP #define NSIGTRAP 4 -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v9 12/12] x86, mpx: add documentation on Intel MPX
This patch adds the Documentation/x86/intel_mpx.txt file with some information about Intel MPX. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- Documentation/x86/intel_mpx.txt | 245 +++ 1 files changed, 245 insertions(+), 0 deletions(-) create mode 100644 Documentation/x86/intel_mpx.txt diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.txt new file mode 100644 index 000..3c20a17 --- /dev/null +++ b/Documentation/x86/intel_mpx.txt @@ -0,0 +1,245 @@ +1. Intel(R) MPX Overview + + +Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new capability +introduced into Intel Architecture. Intel MPX provides hardware features +that can be used in conjunction with compiler changes to check memory +references, for those references whose compile-time normal intentions are +usurped at runtime due to buffer overflow or underflow. + +For more information, please refer to Intel(R) Architecture Instruction +Set Extensions Programming Reference, Chapter 9: Intel(R) Memory Protection +Extensions. + +Note: Currently no hardware with MPX ISA is available but it is always +possible to use SDE (Intel(R) Software Development Emulator) instead, which +can be downloaded from +http://software.intel.com/en-us/articles/intel-software-development-emulator + + +2. How to get the advantage of MPX +== + +For MPX to work, changes are required in the kernel, binutils and compiler. +No source changes are required for applications, just a recompile. + +There are a lot of moving parts of this to all work right. The following +is how we expect the compiler, application and kernel to work together. + +1) Application developer compiles with -fmpx. The compiler will add the + instrumentation as well as some setup code called early after the app + starts. New instruction prefixes are noops for old CPUs. 
+2) That setup code allocates (virtual) space for the bounds directory, + points the bndcfgu register to the directory and notifies the kernel + (via the new prctl(PR_MPX_ENABLE_MANAGEMENT)) that the app will be using + MPX. +3) The kernel detects that the CPU has MPX, allows the new prctl() to + succeed, and notes the location of the bounds directory. Userspace is + expected to keep the bounds directory at that location. We note it + instead of reading it each time because the 'xsave' operation needed + to access the bounds directory register is an expensive operation. +4) If the application needs to spill bounds out of the 4 registers, it + issues a bndstx instruction. Since the bounds directory is empty at + this point, a bounds fault (#BR) is raised, the kernel allocates a + bounds table (in the user address space) and makes the relevant entry + in the bounds directory point to the new table. +5) If the application violates the bounds specified in the bounds registers, + a separate kind of #BR is raised which will deliver a signal with + information about the violation in the 'struct siginfo'. +6) Whenever memory is freed, we know that it can no longer contain valid + pointers, and we attempt to free the associated space in the bounds + tables. If an entire table becomes unused, we will attempt to free + the table and remove the entry in the directory.
+ +To summarize, there are essentially three things interacting here: + +GCC with -fmpx: + * enables annotation of code with MPX instructions and prefixes + * inserts code early in the application to call in to the gcc runtime +GCC MPX Runtime: + * Checks for hardware MPX support in cpuid leaf + * allocates virtual space for the bounds directory (malloc() essentially) + * points the hardware BNDCFGU register at the directory + * calls a new prctl(PR_MPX_ENABLE_MANAGEMENT) to notify the kernel to + start managing the bounds directories +Kernel MPX Code: + * Checks for hardware MPX support in cpuid leaf + * Handles #BR exceptions and sends SIGSEGV to the app when it violates + bounds, like during a buffer overflow. + * When bounds are spilled in to an unallocated bounds table, the kernel + notices in the #BR exception, allocates the virtual space, then + updates the bounds directory to point to the new table. It keeps + special track of the memory with a VM_MPX flag. + * Frees unused bounds tables at the time that the memory they described + is unmapped. + + +3. How does MPX kernel code work + + +Handling #BR faults caused by MPX +- + +When MPX is enabled, there are 2 new situations that can generate +#BR faults. + * new bounds tables (BT) need to be allocated to save bounds. + * bounds violation caused by MPX instructions. + +We hook #BR handler to handle these two new situations. + +On-demand kernel allocation of bounds tables + + +MPX only has 4 hardware registers for storing bounds information. If +MPX
[PATCH v9 03/12] x86, mpx: add MPX specific mmap interface
We have to do the allocation of bounds tables in the kernel (see the patch "on-demand kernel allocation of bounds tables"). Moreover, if we want to track MPX VMAs we need to be able to stick the new VM_MPX flag and a specific ->vm_ops for MPX in the vm_area_struct. But there is no suitable interface to do this in the current kernel. Existing interfaces, like do_mmap_pgoff(), cannot stick a specific ->vm_ops in the vm_area_struct when a VMA is created. So, this patch adds an MPX-specific mmap interface to do the allocation of bounds tables. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/x86/Kconfig |4 ++ arch/x86/include/asm/mpx.h | 38 + arch/x86/mm/Makefile |2 + arch/x86/mm/mpx.c | 79 4 files changed, 123 insertions(+), 0 deletions(-) create mode 100644 arch/x86/include/asm/mpx.h create mode 100644 arch/x86/mm/mpx.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 4b663e1..e5bcc70 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -243,6 +243,10 @@ config HAVE_INTEL_TXT def_bool y depends on INTEL_IOMMU && ACPI +config X86_INTEL_MPX + def_bool y + depends on CPU_SUP_INTEL + config X86_32_SMP def_bool y depends on X86_32 && SMP diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h new file mode 100644 index 000..5725ac4 --- /dev/null +++ b/arch/x86/include/asm/mpx.h @@ -0,0 +1,38 @@ +#ifndef _ASM_X86_MPX_H +#define _ASM_X86_MPX_H + +#include <linux/types.h> +#include <asm/ptrace.h> + +#ifdef CONFIG_X86_64 + +/* upper 28 bits [47:20] of the virtual address in 64-bit used to + * index into bounds directory (BD). + */ +#define MPX_BD_ENTRY_OFFSET 28 +#define MPX_BD_ENTRY_SHIFT 3 +/* bits [19:3] of the virtual address in 64-bit used to index into + * bounds table (BT).
+ */ +#define MPX_BT_ENTRY_OFFSET 17 +#define MPX_BT_ENTRY_SHIFT 5 +#define MPX_IGN_BITS 3 + +#else + +#define MPX_BD_ENTRY_OFFSET 20 +#define MPX_BD_ENTRY_SHIFT 2 +#define MPX_BT_ENTRY_OFFSET 10 +#define MPX_BT_ENTRY_SHIFT 4 +#define MPX_IGN_BITS 2 + +#endif + +#define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT)) +#define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT)) + +#define MPX_BNDSTA_ERROR_CODE 0x3 + +unsigned long mpx_mmap(unsigned long len); + +#endif /* _ASM_X86_MPX_H */ diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 6a19ad9..ecfdc46 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -30,3 +30,5 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o obj-$(CONFIG_NUMA_EMU) += numa_emulation.o obj-$(CONFIG_MEMTEST) += memtest.o + +obj-$(CONFIG_X86_INTEL_MPX) += mpx.o diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c new file mode 100644 index 000..e1b28e6 --- /dev/null +++ b/arch/x86/mm/mpx.c @@ -0,0 +1,79 @@ +#include <linux/kernel.h> +#include <linux/syscalls.h> +#include <asm/mpx.h> +#include <asm/mman.h> +#include <linux/sched/sysctl.h> + +static const char *mpx_mapping_name(struct vm_area_struct *vma) +{ + return "[mpx]"; +} + +static struct vm_operations_struct mpx_vma_ops = { + .name = mpx_mapping_name, }; + +/* + * this is really a simplified "vm_mmap". it only handles mpx + * related maps, including bounds table and bounds directory. + * + * here we can stick new vm_flag VM_MPX in the vma_area_struct + * when create a bounds table or bounds directory, in order to + * track MPX specific memory. + */ +unsigned long mpx_mmap(unsigned long len) +{ + unsigned long ret; + unsigned long addr, pgoff; + struct mm_struct *mm = current->mm; + vm_flags_t vm_flags; + struct vm_area_struct *vma; + + /* Only bounds table and bounds directory can be allocated here */ + if (len != MPX_BD_SIZE_BYTES && len != MPX_BT_SIZE_BYTES) + return -EINVAL; + + down_write(&mm->mmap_sem); + + /* Too many mappings?
*/ + if (mm->map_count > sysctl_max_map_count) { + ret = -ENOMEM; + goto out; + } + + /* Obtain the address to map to. we verify (or select) it and ensure +* that it represents a valid section of the address space. +*/ + addr = get_unmapped_area(NULL, 0, len, 0, MAP_ANONYMOUS | MAP_PRIVATE); + if (addr & ~PAGE_MASK) { + ret = addr; + goto out; + } + + vm_flags = VM_READ | VM_WRITE | VM_MPX | + mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; + + /* Set pgoff according to addr for anon_vma */ + pgoff = addr >> PAGE_SHIFT; + + ret = mmap_region(NULL, addr, len, vm_flags, pgoff); + if (IS_ERR_VALUE(ret)) + goto out; + + vma = find_vma(mm, ret); + if (!vma) { + ret = -ENOMEM; + goto out; + } + vma->vm_ops = &mpx_vma_ops; + + if (vm_flags & VM_LOCKED
[PATCH v9 01/12] x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific
MPX-enabled applications using large swaths of memory can potentially have large numbers of bounds tables in process address space to save bounds information. These tables can take up huge swaths of memory (as much as 80% of the memory on the system) even if we clean them up aggressively. In the worst-case scenario, the tables can be 4x the size of the data structure being tracked. IOW, a 1-page structure can require 4 bounds-table pages. Being this huge, our expectation is that folks using MPX are going to be keen on figuring out how much memory is being dedicated to it. So we need a way to track memory use for MPX. If we want to specifically track MPX VMAs we need to be able to distinguish them from normal VMAs, and keep them from getting merged with normal VMAs. A new VM_ flag set only on MPX VMAs does both of those things. With this flag, MPX bounds-table VMAs can be distinguished from other VMAs, and userspace can also walk /proc/$pid/smaps to get memory usage for MPX. Besides this flag, we also introduce a specific ->vm_ops for MPX VMAs (see the patch "add MPX specific mmap interface"), but currently VMAs with different ->vm_ops are not prevented from merging. We understand that VM_ flags are scarce and are open to other options.
Signed-off-by: Qiaowei Ren qiaowei@intel.com --- fs/proc/task_mmu.c |1 + include/linux/mm.h |6 ++ 2 files changed, 7 insertions(+), 0 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index dfc791c..cc31520 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -549,6 +549,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) [ilog2(VM_GROWSDOWN)] = "gd", [ilog2(VM_PFNMAP)] = "pf", [ilog2(VM_DENYWRITE)] = "dw", + [ilog2(VM_MPX)] = "mp", [ilog2(VM_LOCKED)] = "lo", [ilog2(VM_IO)] = "io", [ilog2(VM_SEQ_READ)] = "sr", diff --git a/include/linux/mm.h b/include/linux/mm.h index 8981cc8..942be8a 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -127,6 +127,7 @@ extern unsigned int kobjsize(const void *objp); #define VM_HUGETLB 0x00400000 /* Huge TLB Page VM */ #define VM_NONLINEAR 0x00800000 /* Is non-linear (remap_file_pages) */ #define VM_ARCH_1 0x01000000 /* Architecture-specific flag */ +#define VM_ARCH_2 0x02000000 #define VM_DONTDUMP 0x04000000 /* Do not include in the core dump */ #ifdef CONFIG_MEM_SOFT_DIRTY @@ -154,6 +155,11 @@ extern unsigned int kobjsize(const void *objp); # define VM_MAPPED_COPY VM_ARCH_1 /* T if mapped copy of data (nommu mmap) */ #endif +#if defined(CONFIG_X86) +/* MPX specific bounds table or bounds directory */ +# define VM_MPX VM_ARCH_2 +#endif + #ifndef VM_GROWSUP # define VM_GROWSUP VM_NONE #endif -- 1.7.1
[PATCH v9 07/12] mips: sync struct siginfo with general version
New fields about bound violation are added into general struct siginfo. This will impact MIPS and IA64, which extend general struct siginfo. This patch syncs this struct for MIPS with general version. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/mips/include/uapi/asm/siginfo.h |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/arch/mips/include/uapi/asm/siginfo.h b/arch/mips/include/uapi/asm/siginfo.h index e811744..d08f83f 100644 --- a/arch/mips/include/uapi/asm/siginfo.h +++ b/arch/mips/include/uapi/asm/siginfo.h @@ -92,6 +92,10 @@ typedef struct siginfo { int _trapno;/* TRAP # which caused the signal */ #endif short _addr_lsb; + struct { + void __user *_lower; + void __user *_upper; + } _addr_bnd; } _sigfault; /* SIGPOLL, SIGXFSZ (To do ...) */ -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v9 05/12] x86, mpx: on-demand kernel allocation of bounds tables
MPX only has 4 hardware registers for storing bounds information. If MPX-enabled code needs more than these 4 registers, it needs to spill them somewhere. It has two special instructions for this which allow the bounds to be moved between the bounds registers and some new bounds tables. The resulting #BR exceptions are similar conceptually to a page fault and will be raised by the MPX hardware both during bounds violations and when the tables are not present. This patch handles those #BR exceptions for not-present tables by carving the space out of the normal process's address space (essentially calling the new mmap() interface introduced earlier in this patch set) and then pointing the bounds directory over to it. The tables *need* to be accessed and controlled by userspace because the instructions for moving bounds in and out of them are extremely frequent. They potentially happen every time a register pointing to memory is dereferenced. Any direct kernel involvement (like a syscall) to access the tables would obviously destroy performance. Why not do this in userspace? This patch is obviously doing this allocation in the kernel. However, MPX does not strictly *require* anything in the kernel. It can theoretically be done completely from userspace. Here are a few ways this *could* be done. I don't think any of them are practical in the real world, but here they are. Q: Can virtual space simply be reserved for the bounds tables so that we never have to allocate them? A: As noted earlier, these tables are *HUGE*. An X-GB virtual area needs 4*X GB of virtual space, plus 2GB for the bounds directory. If we were to preallocate them for the 128TB of user virtual address space, we would need to reserve 512TB+2GB, which is larger than the entire virtual address space today. This means they can not be reserved ahead of time. Also, a single process's pre-populated bounds directory consumes 2GB of virtual *AND* physical memory. IOW, it's completely infeasible to prepopulate bounds directories.
Q: Can we preallocate bounds table space at the same time memory is allocated which might contain pointers that might eventually need bounds tables? A: This would work if we could hook the site of each and every memory allocation syscall. This can be done for small, constrained applications. But, it isn't practical at a larger scale since a given app has no way of controlling how all the parts of the app might allocate memory (think libraries). The kernel is really the only place to intercept these calls. Q: Could a bounds fault be handed to userspace and the tables allocated there in a signal handler instead of in the kernel? A: (thanks to tglx) mmap() is not on the list of safe async handler functions and even if mmap() would work it still requires locking or nasty tricks to keep track of the allocation state there. Having ruled out all of the userspace-only approaches for managing bounds tables that we could think of, we create them on demand in the kernel. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/x86/include/asm/mpx.h | 20 + arch/x86/kernel/Makefile |1 + arch/x86/kernel/mpx.c | 101 arch/x86/kernel/traps.c| 52 ++- 4 files changed, 173 insertions(+), 1 deletions(-) create mode 100644 arch/x86/kernel/mpx.c diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h index 5725ac4..b7598ac 100644 --- a/arch/x86/include/asm/mpx.h +++ b/arch/x86/include/asm/mpx.h @@ -18,6 +18,8 @@ #define MPX_BT_ENTRY_SHIFT 5 #define MPX_IGN_BITS 3 +#define MPX_BD_ENTRY_TAIL 3 + #else #define MPX_BD_ENTRY_OFFSET 20 @@ -26,13 +28,31 @@ #define MPX_BT_ENTRY_SHIFT 4 #define MPX_IGN_BITS 2 +#define MPX_BD_ENTRY_TAIL 2 + #endif +#define MPX_BNDSTA_TAIL 2 +#define MPX_BNDCFG_TAIL 12 +#define MPX_BNDSTA_ADDR_MASK (~((1UL<<MPX_BNDSTA_TAIL)-1)) +#define MPX_BNDCFG_ADDR_MASK (~((1UL<<MPX_BNDCFG_TAIL)-1)) +#define MPX_BT_ADDR_MASK (~((1UL<<MPX_BD_ENTRY_TAIL)-1)) + #define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT)) #define MPX_BT_SIZE_BYTES
(1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT)) #define MPX_BNDSTA_ERROR_CODE 0x3 +#define MPX_BD_ENTRY_VALID_FLAG 0x1 unsigned long mpx_mmap(unsigned long len); +#ifdef CONFIG_X86_INTEL_MPX +int do_mpx_bt_fault(struct xsave_struct *xsave_buf); +#else +static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf) +{ + return -EINVAL; +} +#endif /* CONFIG_X86_INTEL_MPX */ + #endif /* _ASM_X86_MPX_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index ada2e2d..9ece662 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -43,6 +43,7 @@ obj-$(CONFIG_PREEMPT) += preempt.o obj-y += process.o obj-y += i387.o xsave.o +obj
[PATCH v9 04/12] x86, mpx: add MPX to disabled features
This allows us to use cpu_feature_enabled(X86_FEATURE_MPX) as both a runtime and compile-time check. When CONFIG_X86_INTEL_MPX is disabled, cpu_feature_enabled(X86_FEATURE_MPX) will evaluate at compile-time to 0. If CONFIG_X86_INTEL_MPX=y, then the cpuid flag will be checked at runtime. This patch must be applied after another of Dave's commits: 381aa07a9b4e1f82969203e9e4863da2a157781d Signed-off-by: Dave Hansen dave.han...@linux.intel.com Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/x86/include/asm/disabled-features.h |8 +++- 1 files changed, 7 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index 97534a7..f226df0 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -10,6 +10,12 @@ * cpu_feature_enabled(). */ +#ifdef CONFIG_X86_INTEL_MPX +# define DISABLE_MPX 0 +#else +# define DISABLE_MPX (1<<(X86_FEATURE_MPX & 31)) +#endif + #ifdef CONFIG_X86_64 # define DISABLE_VME (1<<(X86_FEATURE_VME & 31)) # define DISABLE_K6_MTRR (1<<(X86_FEATURE_K6_MTRR & 31)) @@ -34,6 +40,6 @@ #define DISABLED_MASK6 0 #define DISABLED_MASK7 0 #define DISABLED_MASK8 0 -#define DISABLED_MASK9 0 +#define DISABLED_MASK9 (DISABLE_MPX) #endif /* _ASM_X86_DISABLED_FEATURES_H */ -- 1.7.1
[PATCH v9 06/12] mpx: extend siginfo structure to include bound violation information
This patch adds new fields about bound violation into siginfo structure. si_lower and si_upper are respectively lower bound and upper bound when bound violation is caused. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- include/uapi/asm-generic/siginfo.h |9 - kernel/signal.c|4 2 files changed, 12 insertions(+), 1 deletions(-) diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h index ba5be7f..1e35520 100644 --- a/include/uapi/asm-generic/siginfo.h +++ b/include/uapi/asm-generic/siginfo.h @@ -91,6 +91,10 @@ typedef struct siginfo { int _trapno;/* TRAP # which caused the signal */ #endif short _addr_lsb; /* LSB of the reported address */ + struct { + void __user *_lower; + void __user *_upper; + } _addr_bnd; } _sigfault; /* SIGPOLL */ @@ -131,6 +135,8 @@ typedef struct siginfo { #define si_trapno _sifields._sigfault._trapno #endif #define si_addr_lsb _sifields._sigfault._addr_lsb +#define si_lower _sifields._sigfault._addr_bnd._lower +#define si_upper _sifields._sigfault._addr_bnd._upper #define si_band _sifields._sigpoll._band #define si_fd _sifields._sigpoll._fd #ifdef __ARCH_SIGSYS @@ -199,7 +205,8 @@ typedef struct siginfo { */ #define SEGV_MAPERR (__SI_FAULT|1) /* address not mapped to object */ #define SEGV_ACCERR (__SI_FAULT|2) /* invalid permissions for mapped object */ -#define NSIGSEGV 2 +#define SEGV_BNDERR (__SI_FAULT|3) /* failed address bound checks */ +#define NSIGSEGV 3 /* * SIGBUS si_codes diff --git a/kernel/signal.c b/kernel/signal.c index 8f0876f..2c403a4 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2748,6 +2748,10 @@ int copy_siginfo_to_user(siginfo_t __user *to, const siginfo_t *from) if (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO) err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb); #endif +#ifdef SEGV_BNDERR + err |= __put_user(from->si_lower, &to->si_lower); + err |= __put_user(from->si_upper, &to->si_upper); +#endif break; case __SI_CHLD: err |= __put_user(from->si_pid, &to->si_pid); -- 1.7.1
[PATCH v8 00/10] Intel MPX support
This patchset adds support for the Memory Protection Extensions (MPX) feature found in future Intel processors. MPX can be used in conjunction with compiler changes to check memory references, for those references whose compile-time normal intentions are usurped at runtime due to buffer overflow or underflow. MPX provides this capability at very low performance overhead for newly compiled code, and provides compatibility mechanisms with legacy software components. The MPX architecture is designed to allow a machine to run both MPX-enabled software and legacy software that is MPX-unaware. In such a case, the legacy software does not benefit from MPX, but it also does not experience any change in functionality or reduction in performance. More information about Intel MPX can be found in "Intel(R) Architecture Instruction Set Extensions Programming Reference". To get the advantage of MPX, changes are required in the OS kernel, binutils, the compiler, and system library support. A new GCC option -fmpx is introduced to utilize MPX instructions. Currently, GCC compiler sources with MPX support are available in a separate branch in the common GCC SVN repository. See the GCC SVN page (http://gcc.gnu.org/svn.html) for details. To have the full protection, we had to add MPX instrumentation to all the necessary Glibc routines (e.g. memcpy) written in assembler, and compile Glibc with the MPX-enabled GCC compiler. Currently, MPX-enabled Glibc source can be found in the Glibc git repository. Enabling an application to use MPX will generally not require source code updates, but there is some runtime code, which is responsible for configuring and enabling MPX, needed in order to make use of MPX. For most applications this runtime support will be available by linking to a library supplied by the compiler, or possibly it will come directly from the OS once OS versions that support MPX are available.
MPX kernel code, namely this patchset, has two main responsibilities: provide handlers for bounds faults (#BR), and manage bounds memory. The high-level areas modified in the patchset are as follows: 1) struct siginfo is extended to include bound violation information. 2) two prctl() commands are added to do performance optimization. Currently no hardware with MPX ISA is available, but it is always possible to use SDE (Intel(R) Software Development Emulator) instead, which can be downloaded from http://software.intel.com/en-us/articles/intel-software-development-emulator This patchset has been tested on a real internal hardware platform at Intel. We have some simple unit tests in user space, which directly call MPX instructions to produce #BR to let the kernel allocate bounds tables and cause bounds violations. We also compiled several benchmarks with an MPX-enabled GCC/Glibc and ICC, and ran them with this patch set. We found a number of bugs in this code in these tests. Future TODO items: 1) support 32-bit binaries on 64-bit kernels. Changes since v1: * check to see if #BR occurred in userspace or kernel space. * use generic structure and macro as much as possible when decoding mpx instructions. Changes since v2: * fix some compile warnings. * update documentation. Changes since v3: * correct some syntax errors in the documentation, and document the extended struct siginfo. * kill the process when the error code of BNDSTATUS is 3. * add some comments. * remove new prctl() commands. * fix some compile warnings for 32-bit. Changes since v4: * raise SIGBUS if the allocations of the bounds tables fail. Changes since v5: * hook the unmap() path to clean up unused bounds tables, and use a new prctl() command to register the bounds directory address in struct mm_struct to check whether a process is MPX enabled during unmap(). * in order to track MPX memory usage precisely, add an MPX-specific mmap interface and a VM_MPX flag to check whether a VMA is an MPX bounds table.
* add macro cpu_has_mpx to do performance optimization. * sync struct siginfo for mips with the general version to avoid a build issue. Changes since v6: * because arch_vma_name is removed, this patchset has to set an MPX-specific ->vm_ops to do the same thing. * fix warnings for 32-bit arch. * add more description to these patches. Changes since v7: * introduce VM_ARCH_2 flag. * remove all of the pr_debug()s. * fix prctl numbers in documentation. * fix some bugs in bounds-table freeing. Qiaowei Ren (10): x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific x86, mpx: add MPX specific mmap interface x86, mpx: add macro cpu_has_mpx x86, mpx: hook #BR exception handler to allocate bound tables x86, mpx: extend siginfo structure to include bound violation information mips: sync struct siginfo with general version x86, mpx: decode MPX instruction to get bound violation information x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER x86, mpx: cleanup unused bound tables
[PATCH v8 02/10] x86, mpx: add MPX specific mmap interface
This patch adds one MPX specific mmap interface, which only handles mpx related maps, including bounds table and bounds directory. In order to track MPX specific memory usage, this interface is added to stick new vm_flag VM_MPX in the vma_area_struct when create a bounds table or bounds directory. These bounds tables can take huge amounts of memory. In the worst-case scenario, the tables can be 4x the size of the data structure being tracked. IOW, a 1-page structure can require 4 bounds-table pages. My expectation is that folks using MPX are going to be keen on figuring out how much memory is being dedicated to it. With this feature, plus some grepping in /proc/$pid/smaps one could take a pretty good stab at it. Signed-off-by: Qiaowei Ren --- arch/x86/Kconfig |4 ++ arch/x86/include/asm/mpx.h | 38 + arch/x86/mm/Makefile |2 + arch/x86/mm/mpx.c | 79 4 files changed, 123 insertions(+), 0 deletions(-) create mode 100644 arch/x86/include/asm/mpx.h create mode 100644 arch/x86/mm/mpx.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 778178f..935aa69 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -243,6 +243,10 @@ config HAVE_INTEL_TXT def_bool y depends on INTEL_IOMMU && ACPI +config X86_INTEL_MPX + def_bool y + depends on CPU_SUP_INTEL + config X86_32_SMP def_bool y depends on X86_32 && SMP diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h new file mode 100644 index 000..5725ac4 --- /dev/null +++ b/arch/x86/include/asm/mpx.h @@ -0,0 +1,38 @@ +#ifndef _ASM_X86_MPX_H +#define _ASM_X86_MPX_H + +#include +#include + +#ifdef CONFIG_X86_64 + +/* upper 28 bits [47:20] of the virtual address in 64-bit used to + * index into bounds directory (BD). + */ +#define MPX_BD_ENTRY_OFFSET28 +#define MPX_BD_ENTRY_SHIFT 3 +/* bits [19:3] of the virtual address in 64-bit used to index into + * bounds table (BT). 
+ */
+#define MPX_BT_ENTRY_OFFSET	17
+#define MPX_BT_ENTRY_SHIFT	5
+#define MPX_IGN_BITS		3
+
+#else
+
+#define MPX_BD_ENTRY_OFFSET	20
+#define MPX_BD_ENTRY_SHIFT	2
+#define MPX_BT_ENTRY_OFFSET	10
+#define MPX_BT_ENTRY_SHIFT	4
+#define MPX_IGN_BITS		2
+
+#endif
+
+#define MPX_BD_SIZE_BYTES	(1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
+#define MPX_BT_SIZE_BYTES	(1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))
+
+#define MPX_BNDSTA_ERROR_CODE	0x3
+
+unsigned long mpx_mmap(unsigned long len);
+
+#endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 6a19ad9..ecfdc46 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -30,3 +30,5 @@ obj-$(CONFIG_ACPI_NUMA)	+= srat.o
 obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o

 obj-$(CONFIG_MEMTEST)		+= memtest.o
+
+obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
new file mode 100644
index 000..e1b28e6
--- /dev/null
+++ b/arch/x86/mm/mpx.c
@@ -0,0 +1,79 @@
+#include
+#include
+#include
+#include
+#include
+
+static const char *mpx_mapping_name(struct vm_area_struct *vma)
+{
+	return "[mpx]";
+}
+
+static struct vm_operations_struct mpx_vma_ops = {
+	.name = mpx_mapping_name,
+};
+
+/*
+ * this is really a simplified "vm_mmap". it only handles mpx
+ * related maps, including bounds table and bounds directory.
+ *
+ * here we can stick new vm_flag VM_MPX in the vm_area_struct
+ * when creating a bounds table or bounds directory, in order to
+ * track MPX specific memory.
+ */
+unsigned long mpx_mmap(unsigned long len)
+{
+	unsigned long ret;
+	unsigned long addr, pgoff;
+	struct mm_struct *mm = current->mm;
+	vm_flags_t vm_flags;
+	struct vm_area_struct *vma;
+
+	/* Only bounds table and bounds directory can be allocated here */
+	if (len != MPX_BD_SIZE_BYTES && len != MPX_BT_SIZE_BYTES)
+		return -EINVAL;
+
+	down_write(&mm->mmap_sem);
+
+	/* Too many mappings?
 */
+	if (mm->map_count > sysctl_max_map_count) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/* Obtain the address to map to. we verify (or select) it and ensure
+	 * that it represents a valid section of the address space.
+	 */
+	addr = get_unmapped_area(NULL, 0, len, 0, MAP_ANONYMOUS | MAP_PRIVATE);
+	if (addr & ~PAGE_MASK) {
+		ret = addr;
+		goto out;
+	}
+
+	vm_flags = VM_READ | VM_WRITE | VM_MPX |
+			mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
+
+	/* Set pgoff according to addr for anon_vma */
+	pgoff = addr >> PAGE_SHIFT;
+
+	ret = mmap_region(NULL, addr, len, vm_flags, pgoff);
+	if (IS_ERR_VALUE(ret))
+		goto out;
+
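The commit message suggests estimating MPX memory use by grepping /proc/$pid/smaps for the new flag. A minimal userspace sketch of that idea, operating on smaps-formatted text rather than a live /proc read (the "mp" tag is the one patch 01 adds for VM_MPX; the parsing here is deliberately simplified and assumes two-letter, space-separated VmFlags tags):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sum the Size: fields of all mappings whose VmFlags line carries the
 * "mp" tag.  Input is smaps-style text; in practice you would read it
 * from /proc/<pid>/smaps. */
static long mpx_kb_in_smaps(const char *smaps)
{
    long total_kb = 0, cur_kb = 0;
    char line[256];
    const char *p = smaps;

    while (*p) {
        size_t n = strcspn(p, "\n");
        if (n >= sizeof(line))
            n = sizeof(line) - 1;
        memcpy(line, p, n);
        line[n] = '\0';
        p += n + (p[n] == '\n');

        /* remember the most recent mapping size ... */
        if (sscanf(line, "Size: %ld kB", &cur_kb) == 1)
            continue;
        /* ... and charge it to MPX if its VmFlags contain "mp" */
        if (strncmp(line, "VmFlags:", 8) == 0 && strstr(line, " mp"))
            total_kb += cur_kb;
    }
    return total_kb;
}
```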
[PATCH v8 03/10] x86, mpx: add macro cpu_has_mpx
As a performance optimization, this patch adds the macro cpu_has_mpx,
which directly evaluates to 0 when MPX is not supported by the kernel.
The community gave a lot of comments on this macro in the previous
version. Dave will introduce a patchset about disabled features to
address it later. In this code:

	if (cpu_has_mpx)
		do_some_mpx_thing();

The patch series from Dave will introduce a new macro
cpu_feature_enabled() (if merged after this patchset) to replace
cpu_has_mpx:

	if (cpu_feature_enabled(X86_FEATURE_MPX))
		do_some_mpx_thing();

Signed-off-by: Qiaowei Ren
---
 arch/x86/include/asm/cpufeature.h |    6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index bb9b258..82ec7ed 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -353,6 +353,12 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
 #define cpu_has_eager_fpu	boot_cpu_has(X86_FEATURE_EAGER_FPU)
 #define cpu_has_topoext	boot_cpu_has(X86_FEATURE_TOPOEXT)

+#ifdef CONFIG_X86_INTEL_MPX
+#define cpu_has_mpx	boot_cpu_has(X86_FEATURE_MPX)
+#else
+#define cpu_has_mpx	0
+#endif /* CONFIG_X86_INTEL_MPX */
+
 #ifdef CONFIG_X86_64

 #undef cpu_has_vme
--
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
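The point of defining the predicate to a literal 0 is that the compiler constant-folds every guarded block away. A userspace analogue of the same pattern, with made-up names (HAVE_WIDGET stands in for CONFIG_X86_INTEL_MPX, widget_cpuid_probe() for boot_cpu_has(X86_FEATURE_MPX)):

```c
#include <assert.h>

/* Pretend runtime CPU probe; only referenced when the feature is
 * compiled in. */
static int widget_cpuid_probe(void)
{
    return 1;            /* pretend the CPU advertises the feature */
}

#ifdef HAVE_WIDGET
#define has_widget widget_cpuid_probe()
#else
#define has_widget 0     /* constant: dependent code becomes dead */
#endif

static int widget_enabled(void)
{
    if (has_widget)      /* dead code when HAVE_WIDGET is unset */
        return 1;
    return 0;
}
```

Compiled without -DHAVE_WIDGET, `widget_enabled()` reduces to `return 0;` with no probe call emitted, which is exactly the optimization the commit message describes.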
[PATCH v8 05/10] x86, mpx: extend siginfo structure to include bound violation information
This patch adds new fields about bound violation into the siginfo
structure. si_lower and si_upper are, respectively, the lower bound
and upper bound in effect when the bound violation was caused.

Signed-off-by: Qiaowei Ren
---
 include/uapi/asm-generic/siginfo.h |    9 +++++++-
 kernel/signal.c                    |    4 ++++
 2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index ba5be7f..1e35520 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -91,6 +91,10 @@ typedef struct siginfo {
 			int _trapno;	/* TRAP # which caused the signal */
 #endif
 			short _addr_lsb; /* LSB of the reported address */
+			struct {
+				void __user *_lower;
+				void __user *_upper;
+			} _addr_bnd;
 		} _sigfault;

 		/* SIGPOLL */
@@ -131,6 +135,8 @@ typedef struct siginfo {
 #define si_trapno	_sifields._sigfault._trapno
 #endif
 #define si_addr_lsb	_sifields._sigfault._addr_lsb
+#define si_lower	_sifields._sigfault._addr_bnd._lower
+#define si_upper	_sifields._sigfault._addr_bnd._upper
 #define si_band		_sifields._sigpoll._band
 #define si_fd		_sifields._sigpoll._fd
 #ifdef __ARCH_SIGSYS
@@ -199,7 +205,8 @@ typedef struct siginfo {
  */
 #define SEGV_MAPERR	(__SI_FAULT|1)	/* address not mapped to object */
 #define SEGV_ACCERR	(__SI_FAULT|2)	/* invalid permissions for mapped object */
-#define NSIGSEGV	2
+#define SEGV_BNDERR	(__SI_FAULT|3)	/* failed address bound checks */
+#define NSIGSEGV	3

 /*
  * SIGBUS si_codes
diff --git a/kernel/signal.c b/kernel/signal.c
index 8f0876f..2c403a4 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2748,6 +2748,10 @@ int copy_siginfo_to_user(siginfo_t __user *to, const siginfo_t *from)
 		if (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO)
 			err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb);
 #endif
+#ifdef SEGV_BNDERR
+		err |= __put_user(from->si_lower, &to->si_lower);
+		err |= __put_user(from->si_upper, &to->si_upper);
+#endif
 		break;
 	case __SI_CHLD:
 		err |= __put_user(from->si_pid, &to->si_pid);
--
1.7.1
[PATCH v8 01/10] x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific
MPX-enabled application will possibly create a lot of bounds tables in process address space to save bounds information. These tables can take up huge swaths of memory (as much as 80% of the memory on the system) even if we clean them up aggressively. Being this huge, we need a way to track their memory use. If we want to track them, we essentially have two options: 1. walk the multi-GB (in virtual space) bounds directory to locate all the VMAs and walk them 2. Find a way to distinguish MPX bounds-table VMAs from normal anonymous VMAs and use some existing mechanism to walk them We expect (1) will be prohibitively expensive. For (2), we only need a single bit, and we've chosen to use a VM_ flag. We understand that they are scarce and are open to other options. There is one potential hybrid approach: check the bounds directory entry for any anonymous VMA that could possibly contain a bounds table. This is less expensive than (1), but still requires reading a pointer out of userspace for every VMA that we iterate over. 
Signed-off-by: Qiaowei Ren
---
 fs/proc/task_mmu.c |    1 +
 include/linux/mm.h |    6 ++
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index dfc791c..cc31520 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -549,6 +549,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 		[ilog2(VM_GROWSDOWN)]	= "gd",
 		[ilog2(VM_PFNMAP)]	= "pf",
 		[ilog2(VM_DENYWRITE)]	= "dw",
+		[ilog2(VM_MPX)]		= "mp",
 		[ilog2(VM_LOCKED)]	= "lo",
 		[ilog2(VM_IO)]		= "io",
 		[ilog2(VM_SEQ_READ)]	= "sr",
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8981cc8..942be8a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -127,6 +127,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HUGETLB	0x00400000	/* Huge TLB Page VM */
 #define VM_NONLINEAR	0x00800000	/* Is non-linear (remap_file_pages) */
 #define VM_ARCH_1	0x01000000	/* Architecture-specific flag */
+#define VM_ARCH_2	0x02000000
 #define VM_DONTDUMP	0x04000000	/* Do not include in the core dump */

 #ifdef CONFIG_MEM_SOFT_DIRTY
@@ -154,6 +155,11 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_MAPPED_COPY	VM_ARCH_1 /* T if mapped copy of data (nommu mmap) */
 #endif

+#if defined(CONFIG_X86)
+/* MPX specific bounds table or bounds directory */
+# define VM_MPX		VM_ARCH_2
+#endif
+
 #ifndef VM_GROWSUP
 # define VM_GROWSUP	VM_NONE
 #endif
--
1.7.1
[PATCH v8 06/10] mips: sync struct siginfo with general version
Due to new fields about bound violation added into struct siginfo,
this patch syncs it with the general version to avoid a build issue.

Signed-off-by: Qiaowei Ren
---
 arch/mips/include/uapi/asm/siginfo.h |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/mips/include/uapi/asm/siginfo.h b/arch/mips/include/uapi/asm/siginfo.h
index e811744..d08f83f 100644
--- a/arch/mips/include/uapi/asm/siginfo.h
+++ b/arch/mips/include/uapi/asm/siginfo.h
@@ -92,6 +92,10 @@ typedef struct siginfo {
 		int _trapno;	/* TRAP # which caused the signal */
 #endif
 		short _addr_lsb;
+		struct {
+			void __user *_lower;
+			void __user *_upper;
+		} _addr_bnd;
 	} _sigfault;

 	/* SIGPOLL, SIGXFSZ (To do ...) */
--
1.7.1
[PATCH v8 04/10] x86, mpx: hook #BR exception handler to allocate bound tables
This patch handles a #BR exception for non-existent tables by carving the space out of the normal processes address space (essentially calling mmap() from inside the kernel) and then pointing the bounds-directory over to it. The tables need to be accessed and controlled by userspace because the compiler generates instructions for MPX-enabled code which frequently store and retrieve entries from the bounds tables. Any direct kernel involvement (like a syscall) to access the tables would destroy performance since these are so frequent. The tables are carved out of userspace because we have no better spot to put them. For each pointer which is being tracked by MPX, the bounds tables contain 4 longs worth of data, and the tables are indexed virtually. If we were to preallocate the tables, we would theoretically need to allocate 4x the virtual space that we have available for userspace somewhere else. We don't have that room in the kernel address space. Signed-off-by: Qiaowei Ren --- arch/x86/include/asm/mpx.h | 20 +++ arch/x86/kernel/Makefile |1 + arch/x86/kernel/mpx.c | 58 arch/x86/kernel/traps.c| 55 - 4 files changed, 133 insertions(+), 1 deletions(-) create mode 100644 arch/x86/kernel/mpx.c diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h index 5725ac4..b7598ac 100644 --- a/arch/x86/include/asm/mpx.h +++ b/arch/x86/include/asm/mpx.h @@ -18,6 +18,8 @@ #define MPX_BT_ENTRY_SHIFT 5 #define MPX_IGN_BITS 3 +#define MPX_BD_ENTRY_TAIL 3 + #else #define MPX_BD_ENTRY_OFFSET20 @@ -26,13 +28,31 @@ #define MPX_BT_ENTRY_SHIFT 4 #define MPX_IGN_BITS 2 +#define MPX_BD_ENTRY_TAIL 2 + #endif +#define MPX_BNDSTA_TAIL2 +#define MPX_BNDCFG_TAIL12 +#define MPX_BNDSTA_ADDR_MASK (~((1UL< +#include +#include + +static int allocate_bt(long __user *bd_entry) +{ + unsigned long bt_addr, old_val = 0; + int ret = 0; + + bt_addr = mpx_mmap(MPX_BT_SIZE_BYTES); + if (IS_ERR((void *)bt_addr)) + return bt_addr; + bt_addr = (bt_addr & MPX_BT_ADDR_MASK) | MPX_BD_ENTRY_VALID_FLAG; + 
+	ret = user_atomic_cmpxchg_inatomic(&old_val, bd_entry, 0, bt_addr);
+	if (ret)
+		goto out;
+
+	/*
+	 * there is an existing bounds table pointed at this bounds
+	 * directory entry, and so we need to free the bounds table
+	 * allocated just now.
+	 */
+	if (old_val)
+		goto out;
+
+	return 0;
+
+out:
+	vm_munmap(bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES);
+	return ret;
+}
+
+/*
+ * When a BNDSTX instruction attempts to save bounds to a BD entry
+ * with the lack of the valid bit being set, a #BR is generated.
+ * This is an indication that no BT exists for this entry. In this
+ * case the fault handler will allocate a new BT.
+ *
+ * With 32-bit mode, the size of BD is 4MB, and the size of each
+ * bound table is 16KB. With 64-bit mode, the size of BD is 2GB,
+ * and the size of each bound table is 4MB.
+ */
+int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
+{
+	unsigned long status;
+	unsigned long bd_entry, bd_base;
+
+	bd_base = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;
+	status = xsave_buf->bndcsr.status_reg;
+
+	bd_entry = status & MPX_BNDSTA_ADDR_MASK;
+	if ((bd_entry < bd_base) ||
+	    (bd_entry >= bd_base + MPX_BD_SIZE_BYTES))
+		return -EINVAL;
+
+	return allocate_bt((long __user *)bd_entry);
+}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 0d0e922..396a88b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -60,6 +60,7 @@
 #include
 #include
 #include
+#include

 #ifdef CONFIG_X86_64
 #include
@@ -228,7 +229,6 @@ dotraplinkage void do_##name(struct pt_regs *regs, long error_code)	\

 DO_ERROR(X86_TRAP_DE,     SIGFPE,  "divide error",		divide_error)
 DO_ERROR(X86_TRAP_OF,     SIGSEGV, "overflow",			overflow)
-DO_ERROR(X86_TRAP_BR,     SIGSEGV, "bounds",			bounds)
 DO_ERROR(X86_TRAP_UD,     SIGILL,  "invalid opcode",		invalid_op)
 DO_ERROR(X86_TRAP_OLD_MF, SIGFPE,  "coprocessor segment overrun",	coprocessor_segment_overrun)
 DO_ERROR(X86_TRAP_TS,     SIGSEGV, "invalid TSS",		invalid_TSS)
@@ -278,6 +278,59 @@ dotraplinkage void do_double_fault(struct pt_regs
*regs, long error_code) } #endif +dotraplinkage void do_bounds(struct pt_regs *regs, long error_code) +{ + enum ctx_state prev_state; + unsigned long status; + struct xsave_struct *xsave_buf; + struct task_struct *tsk = current; + + prev_state = exception_enter(); + if (notify_die(DIE_TRAP, "bounds",
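The sanity check in do_mpx_bt_fault() above is plain interval arithmetic: the faulting bounds-directory entry address (from BNDSTATUS) must land inside [bd_base, bd_base + MPX_BD_SIZE_BYTES). A userspace re-derivation, using the 64-bit constants from the patch (register images are plain parameters here, not real XSAVE state):

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit mode constants from arch/x86/include/asm/mpx.h */
#define MPX_BD_ENTRY_OFFSET  28
#define MPX_BD_ENTRY_SHIFT   3
#define MPX_BNDSTA_TAIL      2
#define MPX_BD_SIZE_BYTES    (1ULL << (MPX_BD_ENTRY_OFFSET + MPX_BD_ENTRY_SHIFT))
#define MPX_BNDSTA_ADDR_MASK (~((1ULL << MPX_BNDSTA_TAIL) - 1))

/* status: BNDSTATUS image (low 2 bits are the error code);
 * bd_base: bounds-directory base from BNDCFGU. */
static int bd_entry_in_range(uint64_t status, uint64_t bd_base)
{
    uint64_t bd_entry = status & MPX_BNDSTA_ADDR_MASK;

    return bd_entry >= bd_base &&
           bd_entry < bd_base + MPX_BD_SIZE_BYTES;
}
```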
[PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl() commands. These commands can be used to register and unregister MPX related resource on the x86 platform. The base of the bounds directory is set into mm_struct during PR_MPX_REGISTER command execution. This member can be used to check whether one application is mpx enabled. Signed-off-by: Qiaowei Ren --- arch/x86/include/asm/mpx.h |1 + arch/x86/include/asm/processor.h | 18 arch/x86/kernel/mpx.c| 55 ++ include/linux/mm_types.h |3 ++ include/uapi/linux/prctl.h |6 kernel/sys.c | 12 6 files changed, 95 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h index 780af63..6cb0853 100644 --- a/arch/x86/include/asm/mpx.h +++ b/arch/x86/include/asm/mpx.h @@ -43,6 +43,7 @@ #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT)) #define MPX_BNDSTA_ERROR_CODE 0x3 +#define MPX_BNDCFG_ENABLE_FLAG 0x1 #define MPX_BD_ENTRY_VALID_FLAG0x1 struct mpx_insn { diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index eb71ec7..b801fea 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -953,6 +953,24 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip, extern int get_tsc_mode(unsigned long adr); extern int set_tsc_mode(unsigned int val); +/* Register/unregister a process' MPX related resource */ +#define MPX_REGISTER(tsk) mpx_register((tsk)) +#define MPX_UNREGISTER(tsk)mpx_unregister((tsk)) + +#ifdef CONFIG_X86_INTEL_MPX +extern int mpx_register(struct task_struct *tsk); +extern int mpx_unregister(struct task_struct *tsk); +#else +static inline int mpx_register(struct task_struct *tsk) +{ + return -EINVAL; +} +static inline int mpx_unregister(struct task_struct *tsk) +{ + return -EINVAL; +} +#endif /* CONFIG_X86_INTEL_MPX */ + extern u16 amd_get_nb_id(int cpu); static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves) diff --git a/arch/x86/kernel/mpx.c 
b/arch/x86/kernel/mpx.c
index 7ef6e39..b86873a 100644
--- a/arch/x86/kernel/mpx.c
+++ b/arch/x86/kernel/mpx.c
@@ -1,6 +1,61 @@
 #include
 #include
+#include
 #include
+#include
+#include
+
+/*
+ * This should only be called when cpuid has been checked
+ * and we are sure that MPX is available.
+ */
+static __user void *task_get_bounds_dir(struct task_struct *tsk)
+{
+	struct xsave_struct *xsave_buf;
+
+	fpu_xsave(&tsk->thread.fpu);
+	xsave_buf = &(tsk->thread.fpu.state->xsave);
+	if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG))
+		return NULL;
+
+	return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u &
+			MPX_BNDCFG_ADDR_MASK);
+}
+
+int mpx_register(struct task_struct *tsk)
+{
+	struct mm_struct *mm = tsk->mm;
+
+	if (!cpu_has_mpx)
+		return -EINVAL;
+
+	/*
+	 * runtime in the userspace will be responsible for allocation of
+	 * the bounds directory. Then, it will save the base of the bounds
+	 * directory into XSAVE/XRSTOR Save Area and enable MPX through
+	 * XRSTOR instruction.
+	 *
+	 * fpu_xsave() is expected to be very expensive. In order to do
+	 * performance optimization, here we get the base of the bounds
+	 * directory and then save it into mm_struct to be used in future.
+*/ + mm->bd_addr = task_get_bounds_dir(tsk); + if (!mm->bd_addr) + return -EINVAL; + + return 0; +} + +int mpx_unregister(struct task_struct *tsk) +{ + struct mm_struct *mm = current->mm; + + if (!cpu_has_mpx) + return -EINVAL; + + mm->bd_addr = NULL; + return 0; +} enum reg_type { REG_TYPE_RM = 0, diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 6e0b286..760aee3 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -454,6 +454,9 @@ struct mm_struct { bool tlb_flush_pending; #endif struct uprobes_state uprobes_state; +#ifdef CONFIG_X86_INTEL_MPX + void __user *bd_addr; /* address of the bounds directory */ +#endif }; static inline void mm_init_cpumask(struct mm_struct *mm) diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index 58afc04..ce86fa9 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -152,4 +152,10 @@ #define PR_SET_THP_DISABLE 41 #define PR_GET_THP_DISABLE 42 +/* + * Register/unregister MPX related resource. + */ +#define PR_MPX_REGISTER43 +#define PR_MPX_UNREGISTER 44 + #endif /* _LINUX_PRCTL_H */ diff --git a/kernel/sys.c b/kernel/sys.c index ce81291..9a43587 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -91,6 +91,
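From userspace, the intended flow is: the MPX runtime allocates the bounds directory and enables MPX via XRSTOR first, and only then asks the kernel to cache the directory base. A hedged sketch of that call sequence (the command numbers 43/44 come straight from the patch's prctl.h hunk; system headers never shipped these names, since later kernel versions renamed the commands, so they are defined locally):

```c
#include <assert.h>
#include <sys/prctl.h>

#ifndef PR_MPX_REGISTER
#define PR_MPX_REGISTER   43   /* values from this patch's prctl.h hunk */
#define PR_MPX_UNREGISTER 44
#endif

/* Ask the kernel to read BNDCFGU (via XSAVE) once and cache the
 * bounds-directory base in mm_struct.  On a kernel or CPU without
 * MPX this fails with EINVAL. */
static int mpx_runtime_register(void)
{
    return prctl(PR_MPX_REGISTER, 0, 0, 0, 0);
}

/* Drop the cached bounds-directory base again. */
static int mpx_runtime_unregister(void)
{
    return prctl(PR_MPX_UNREGISTER, 0, 0, 0, 0);
}
```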
[PATCH v8 10/10] x86, mpx: add documentation on Intel MPX
This patch adds the Documentation/x86/intel_mpx.txt file with some
information about Intel MPX.

Signed-off-by: Qiaowei Ren
---
 Documentation/x86/intel_mpx.txt |  127 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 127 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/x86/intel_mpx.txt

diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.txt
new file mode 100644
index 000..ccffeee
--- /dev/null
+++ b/Documentation/x86/intel_mpx.txt
@@ -0,0 +1,127 @@
+1. Intel(R) MPX Overview
+========================
+
+Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new
+capability introduced into Intel Architecture. Intel MPX provides
+hardware features that can be used in conjunction with compiler
+changes to check memory references, for those references whose
+compile-time normal intentions are usurped at runtime due to
+buffer overflow or underflow.
+
+For more information, please refer to Intel(R) Architecture
+Instruction Set Extensions Programming Reference, Chapter 9:
+Intel(R) Memory Protection Extensions.
+
+Note: Currently no hardware with MPX ISA is available but it is always
+possible to use SDE (Intel(R) Software Development Emulator) instead,
+which can be downloaded from
+http://software.intel.com/en-us/articles/intel-software-development-emulator
+
+
+2. How does MPX kernel code work
+================================
+
+Handling #BR faults caused by MPX
+---------------------------------
+
+When MPX is enabled, there are 2 new situations that can generate
+#BR faults.
+  * bounds violation caused by MPX instructions.
+  * new bounds tables (BT) need to be allocated to save bounds.
+
+We hook the #BR handler to handle these two new situations.
+
+Decoding MPX instructions
+-------------------------
+
+If a #BR is generated due to a bounds violation caused by MPX,
+we need to decode MPX instructions to get the violation address and
+set this address into the extended struct siginfo.
+
+The _sigfault field of struct siginfo is extended as follows:
+
+87		/* SIGILL, SIGFPE, SIGSEGV, SIGBUS */
+88		struct {
+89			void __user *_addr;	/* faulting insn/memory ref.
 */
+90	#ifdef __ARCH_SI_TRAPNO
+91			int _trapno;	/* TRAP # which caused the signal */
+92	#endif
+93			short _addr_lsb; /* LSB of the reported address */
+94			struct {
+95				void __user *_lower;
+96				void __user *_upper;
+97			} _addr_bnd;
+98		} _sigfault;
+
+The '_addr' field refers to the violation address, and the new
+'_addr_bnd' field refers to the upper/lower bounds when a #BR is
+caused.
+
+Glibc will also be updated to support this new siginfo. So users
+can get the violation address and bounds when bounds violations occur.
+
+Freeing unused bounds tables
+----------------------------
+
+When a BNDSTX instruction attempts to save bounds to a bounds directory
+entry marked as invalid, a #BR is generated. This is an indication that
+no bounds table exists for this entry. In this case the fault handler
+will allocate a new bounds table on demand.
+
+Since the kernel allocated those tables on-demand without userspace
+knowledge, it is also responsible for freeing them when the associated
+mappings go away.
+
+Here, the solution for this issue is to hook do_munmap() to check
+whether one process is MPX enabled. If yes, those bounds tables covered
+in the virtual address region which is being unmapped will be freed also.
+
+Adding new prctl commands
+-------------------------
+
+The runtime library in userspace is responsible for allocation of the
+bounds directory. So the kernel has to use the XSAVE instruction to get
+the base of the bounds directory from the BNDCFG register.
+
+But XSAVE is expected to be very expensive. In order to do performance
+optimization, we have to add new prctl commands to get the base of the
+bounds directory to be used in future.
+
+Two new prctl commands are added to register and unregister MPX related
+resource.
+
+155	#define PR_MPX_REGISTER		43
+156	#define PR_MPX_UNREGISTER	44
+
+The base of the bounds directory is set into mm_struct during
+PR_MPX_REGISTER command execution. This member can be used to
+check whether one application is mpx enabled.
+
+
+3.
Tips
+=======
+
+1) Users are not allowed to create bounds tables and point the bounds
+directory at them in userspace. In fact, it is also not necessary
+for users to create bounds tables in userspace.
+
+When a #BR fault is produced due to an invalid entry, a bounds table
+will be created in the kernel on demand and the kernel will not transfer
+this fault to userspace. So userspace can't receive a #BR fault for an
+invalid entry, and it is also not necessary for users to create bounds
+tables by themselves.
+
+Certainly users can allocate bounds tables and forcibly point the bounds
+directory at them through the XSAVE instruction, and then set the valid bit
+of the bounds entry
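The documentation's size claims can be checked against the header's formula MPX_{BD,BT}_SIZE_BYTES = 1 << (ENTRY_OFFSET + ENTRY_SHIFT), plugging in the per-mode constants from arch/x86/include/asm/mpx.h:

```c
#include <assert.h>

/* Region size as computed by the kernel header's shift formula. */
static unsigned long long mpx_region_bytes(int entry_offset, int entry_shift)
{
    return 1ULL << (entry_offset + entry_shift);
}
```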
[PATCH v8 09/10] x86, mpx: cleanup unused bound tables
Since the kernel allocated those tables on-demand without userspace knowledge, it is also responsible for freeing them when the associated mappings go away. Here, the solution for this issue is to hook do_munmap() to check whether one process is MPX enabled. If yes, those bounds tables covered in the virtual address region which is being unmapped will be freed also. Signed-off-by: Qiaowei Ren --- arch/x86/include/asm/mmu_context.h | 16 +++ arch/x86/include/asm/mpx.h |9 ++ arch/x86/mm/mpx.c | 252 include/asm-generic/mmu_context.h |6 + mm/mmap.c |2 + 5 files changed, 285 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 166af2a..d13e01c 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -10,6 +10,7 @@ #include #include #include +#include #ifndef CONFIG_PARAVIRT #include @@ -102,4 +103,19 @@ do { \ } while (0) #endif +static inline void arch_unmap(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long start, unsigned long end) +{ +#ifdef CONFIG_X86_INTEL_MPX + /* +* Check whether this vma comes from MPX-enabled application. +* If so, release this vma related bound tables. 
+	 */
+	if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
+		mpx_unmap(mm, start, end);
+
+#endif
+}
+
 #endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 6cb0853..e848a74 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -42,6 +42,13 @@
 #define MPX_BD_SIZE_BYTES	(1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
 #define MPX_BT_SIZE_BYTES	(1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))

+#define MPX_BD_ENTRY_MASK	((1<<MPX_BD_ENTRY_OFFSET)-1)
+#define MPX_BT_ENTRY_MASK	((1<<MPX_BT_ENTRY_OFFSET)-1)
+#define MPX_GET_BD_ENTRY_OFFSET(addr)	((((addr)>>(MPX_BT_ENTRY_OFFSET+ \
+		MPX_IGN_BITS)) & MPX_BD_ENTRY_MASK) << MPX_BD_ENTRY_SHIFT)
+#define MPX_GET_BT_ENTRY_OFFSET(addr)	((((addr)>>MPX_IGN_BITS) & \
+		MPX_BT_ENTRY_MASK) << MPX_BT_ENTRY_SHIFT)
+
 #define MPX_BNDSTA_ERROR_CODE	0x3
 #define MPX_BNDCFG_ENABLE_FLAG	0x1
 #define MPX_BD_ENTRY_VALID_FLAG	0x1
@@ -63,6 +70,8 @@ struct mpx_insn {
 #define MAX_MPX_INSN_SIZE	15

 unsigned long mpx_mmap(unsigned long len);
+void mpx_unmap(struct mm_struct *mm,
+		unsigned long start, unsigned long end);

 #ifdef CONFIG_X86_INTEL_MPX
 int do_mpx_bt_fault(struct xsave_struct *xsave_buf);
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index e1b28e6..feb1f01 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -1,7 +1,16 @@
+/*
+ * mpx.c - Memory Protection eXtensions
+ *
+ * Copyright (c) 2014, Intel Corporation.
+ * Qiaowei Ren
+ * Dave Hansen
+ */
+
 #include
 #include
 #include
 #include
+#include
 #include

 static const char *mpx_mapping_name(struct vm_area_struct *vma)
@@ -77,3 +86,246 @@ out:
 	up_write(&mm->mmap_sem);
 	return ret;
 }
+
+/*
+ * Get the base of bounds tables pointed by specific bounds
+ * directory entry.
+ */
+static int get_bt_addr(long __user *bd_entry, unsigned long *bt_addr)
+{
+	int valid;
+
+	if (!access_ok(VERIFY_READ, (bd_entry), sizeof(*(bd_entry))))
+		return -EFAULT;
+
+	pagefault_disable();
+	if (get_user(*bt_addr, bd_entry))
+		goto out;
+	pagefault_enable();
+
+	valid = *bt_addr & MPX_BD_ENTRY_VALID_FLAG;
+	*bt_addr &= MPX_BT_ADDR_MASK;
+
+	/*
+	 * If this bounds directory entry is nonzero, and meanwhile
+	 * the valid bit is zero, one SIGSEGV will be produced due to
+	 * this unexpected situation.
+	 */
+	if (!valid && *bt_addr)
+		return -EINVAL;
+	if (!valid)
+		return -ENOENT;
+
+	return 0;
+
+out:
+	pagefault_enable();
+	return -EFAULT;
+}
+
+/*
+ * Free the backing physical pages of bounds table 'bt_addr'.
+ * Assume start...end is within that bounds table.
+ */
+static int __must_check zap_bt_entries(struct mm_struct *mm,
+		unsigned long bt_addr,
+		unsigned long start, unsigned long end)
+{
+	struct vm_area_struct *vma;
+
+	/* Find the vma which overlaps this bounds table */
+	vma = find_vma(mm, bt_addr);
+	/*
+	 * The table entry comes from userspace and could be
+	 * pointing anywhere, so make sure it is at least
+	 * pointing to valid memory.
+	 */
+	if (!vma || !(vma->vm_flags & VM_MPX) ||
+	    vma->vm_start > bt_addr ||
+	    vma->vm_end < bt_addr+MPX_BT_SIZE_BYTES)
+		return -EINVAL;
+
+	zap_page_range(vma, start, end - start, NULL);
+	return 0;
+}
+
+static in
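The MPX_GET_{BD,BT}_ENTRY_OFFSET macros above are just bit slicing. A userspace rendering for 64-bit mode (address bits [47:20] select the bounds-directory entry, bits [19:3] the bounds-table entry, bits [2:0] are ignored; the index is then scaled by the entry size — 8 bytes per BD entry, 32 per BT entry):

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit mode constants from arch/x86/include/asm/mpx.h */
#define BD_ENTRY_OFFSET 28
#define BD_ENTRY_SHIFT  3
#define BT_ENTRY_OFFSET 17
#define BT_ENTRY_SHIFT  5
#define IGN_BITS        3

/* Byte offset of the bounds-directory entry for a pointer value. */
static uint64_t bd_entry_byte_offset(uint64_t addr)
{
    uint64_t index = (addr >> (BT_ENTRY_OFFSET + IGN_BITS)) &
                     ((1UL << BD_ENTRY_OFFSET) - 1);
    return index << BD_ENTRY_SHIFT;     /* index * 8 bytes */
}

/* Byte offset of the bounds-table entry within its table. */
static uint64_t bt_entry_byte_offset(uint64_t addr)
{
    uint64_t index = (addr >> IGN_BITS) & ((1UL << BT_ENTRY_OFFSET) - 1);
    return index << BT_ENTRY_SHIFT;     /* index * 32 bytes */
}
```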
[PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
This patch sets the bound violation fields of the siginfo struct in the
#BR exception handler by decoding the user instruction and constructing
the faulting pointer. This patch doesn't use the generic decoder, and
implements a limited special-purpose decoder to decode MPX instructions,
simply because the generic decoder is very heavyweight not just in terms
of performance but in terms of interface -- because it has to.

Signed-off-by: Qiaowei Ren
---
 arch/x86/include/asm/mpx.h |   23 ++++
 arch/x86/kernel/mpx.c      |  299 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/traps.c    |    6 +
 3 files changed, 328 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index b7598ac..780af63 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -3,6 +3,7 @@

 #include <linux/types.h>
 #include <asm/ptrace.h>
+#include <asm/insn.h>

 #ifdef CONFIG_X86_64

@@ -44,15 +45,37 @@
 #define MPX_BNDSTA_ERROR_CODE	0x3
 #define MPX_BD_ENTRY_VALID_FLAG	0x1

+struct mpx_insn {
+	struct insn_field rex_prefix;	/* REX prefix */
+	struct insn_field modrm;
+	struct insn_field sib;
+	struct insn_field displacement;
+
+	unsigned char addr_bytes;	/* effective address size */
+	unsigned char limit;
+	unsigned char x86_64;
+
+	const unsigned char *kaddr;	/* kernel address of insn to analyze */
+	const unsigned char *next_byte;
+};
+
+#define MAX_MPX_INSN_SIZE	15
+
 unsigned long mpx_mmap(unsigned long len);

 #ifdef CONFIG_X86_INTEL_MPX
 int do_mpx_bt_fault(struct xsave_struct *xsave_buf);
+void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+		struct xsave_struct *xsave_buf);
 #else
 static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
 {
 	return -EINVAL;
 }
+static inline void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+		struct xsave_struct *xsave_buf)
+{
+}
 #endif /* CONFIG_X86_INTEL_MPX */

 #endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
index 88d660f..7ef6e39 100644
--- a/arch/x86/kernel/mpx.c
+++ b/arch/x86/kernel/mpx.c
@@ -2,6 +2,275 @@
 #include
 #include

+enum reg_type {
+	REG_TYPE_RM
= 0, + REG_TYPE_INDEX, + REG_TYPE_BASE, +}; + +static unsigned long get_reg(struct mpx_insn *insn, struct pt_regs *regs, +enum reg_type type) +{ + int regno = 0; + unsigned char modrm = (unsigned char)insn->modrm.value; + unsigned char sib = (unsigned char)insn->sib.value; + + static const int regoff[] = { + offsetof(struct pt_regs, ax), + offsetof(struct pt_regs, cx), + offsetof(struct pt_regs, dx), + offsetof(struct pt_regs, bx), + offsetof(struct pt_regs, sp), + offsetof(struct pt_regs, bp), + offsetof(struct pt_regs, si), + offsetof(struct pt_regs, di), +#ifdef CONFIG_X86_64 + offsetof(struct pt_regs, r8), + offsetof(struct pt_regs, r9), + offsetof(struct pt_regs, r10), + offsetof(struct pt_regs, r11), + offsetof(struct pt_regs, r12), + offsetof(struct pt_regs, r13), + offsetof(struct pt_regs, r14), + offsetof(struct pt_regs, r15), +#endif + }; + + switch (type) { + case REG_TYPE_RM: + regno = X86_MODRM_RM(modrm); + if (X86_REX_B(insn->rex_prefix.value) == 1) + regno += 8; + break; + + case REG_TYPE_INDEX: + regno = X86_SIB_INDEX(sib); + if (X86_REX_X(insn->rex_prefix.value) == 1) + regno += 8; + break; + + case REG_TYPE_BASE: + regno = X86_SIB_BASE(sib); + if (X86_REX_B(insn->rex_prefix.value) == 1) + regno += 8; + break; + + default: + break; + } + + return regs_get_register(regs, regoff[regno]); +} + +/* + * return the address being referenced be instruction + * for rm=3 returning the content of the rm reg + * for rm!=3 calculates the address using SIB and Disp + */ +static unsigned long get_addr_ref(struct mpx_insn *insn, struct pt_regs *regs) +{ + unsigned long addr; + unsigned long base; + unsigned long indx; + unsigned char modrm = (unsigned char)insn->modrm.value; + unsigned char sib = (unsigned char)insn->sib.value; + + if (X86_MODRM_MOD(modrm) == 3) { + addr = get_reg(insn, regs, REG_TYPE_RM); + } else { + if (insn->sib.nbytes) { + base = get_reg(insn, regs, REG_TYPE_BASE); + indx = get_reg(insn, regs, REG_TYPE_INDEX); + addr = base + indx * (1 << 
X86_SIB_SCALE(sib)); + } else { +
[PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
This patch sets the bound violation fields of the siginfo struct in the #BR exception handler by decoding the user instruction and constructing the faulting pointer. It doesn't use the generic decoder; it implements a limited special-purpose decoder for MPX instructions, simply because the generic decoder is very heavyweight, not just in terms of performance but in terms of interface -- because it has to be. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/x86/include/asm/mpx.h | 23 arch/x86/kernel/mpx.c | 299 arch/x86/kernel/traps.c | 6 + 3 files changed, 328 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h index b7598ac..780af63 100644 --- a/arch/x86/include/asm/mpx.h +++ b/arch/x86/include/asm/mpx.h @@ -3,6 +3,7 @@ #include <linux/types.h> #include <asm/ptrace.h> +#include <asm/insn.h> #ifdef CONFIG_X86_64 @@ -44,15 +45,37 @@ #define MPX_BNDSTA_ERROR_CODE 0x3 #define MPX_BD_ENTRY_VALID_FLAG 0x1 +struct mpx_insn { + struct insn_field rex_prefix; /* REX prefix */ + struct insn_field modrm; + struct insn_field sib; + struct insn_field displacement; + + unsigned char addr_bytes; /* effective address size */ + unsigned char limit; + unsigned char x86_64; + + const unsigned char *kaddr; /* kernel address of insn to analyze */ + const unsigned char *next_byte; +}; + +#define MAX_MPX_INSN_SIZE 15 + unsigned long mpx_mmap(unsigned long len); #ifdef CONFIG_X86_INTEL_MPX int do_mpx_bt_fault(struct xsave_struct *xsave_buf); +void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info, + struct xsave_struct *xsave_buf); #else static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf) { return -EINVAL; } +static inline void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info, + struct xsave_struct *xsave_buf) +{ +} #endif /* CONFIG_X86_INTEL_MPX */ #endif /* _ASM_X86_MPX_H */ diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c index 88d660f..7ef6e39 100644 --- a/arch/x86/kernel/mpx.c +++ b/arch/x86/kernel/mpx.c @@ -2,6
+2,275 @@ #include <linux/syscalls.h> #include <asm/mpx.h> +enum reg_type { + REG_TYPE_RM = 0, + REG_TYPE_INDEX, + REG_TYPE_BASE, +}; + +static unsigned long get_reg(struct mpx_insn *insn, struct pt_regs *regs, + enum reg_type type) +{ + int regno = 0; + unsigned char modrm = (unsigned char)insn->modrm.value; + unsigned char sib = (unsigned char)insn->sib.value; + + static const int regoff[] = { + offsetof(struct pt_regs, ax), + offsetof(struct pt_regs, cx), + offsetof(struct pt_regs, dx), + offsetof(struct pt_regs, bx), + offsetof(struct pt_regs, sp), + offsetof(struct pt_regs, bp), + offsetof(struct pt_regs, si), + offsetof(struct pt_regs, di), +#ifdef CONFIG_X86_64 + offsetof(struct pt_regs, r8), + offsetof(struct pt_regs, r9), + offsetof(struct pt_regs, r10), + offsetof(struct pt_regs, r11), + offsetof(struct pt_regs, r12), + offsetof(struct pt_regs, r13), + offsetof(struct pt_regs, r14), + offsetof(struct pt_regs, r15), +#endif + }; + + switch (type) { + case REG_TYPE_RM: + regno = X86_MODRM_RM(modrm); + if (X86_REX_B(insn->rex_prefix.value) == 1) + regno += 8; + break; + + case REG_TYPE_INDEX: + regno = X86_SIB_INDEX(sib); + if (X86_REX_X(insn->rex_prefix.value) == 1) + regno += 8; + break; + + case REG_TYPE_BASE: + regno = X86_SIB_BASE(sib); + if (X86_REX_B(insn->rex_prefix.value) == 1) + regno += 8; + break; + + default: + break; + } + + return regs_get_register(regs, regoff[regno]); +} + +/* + * Return the address being referenced by the instruction: + * for mod == 3, return the content of the r/m register; + * for mod != 3, calculate the address using SIB and displacement. + */ +static unsigned long get_addr_ref(struct mpx_insn *insn, struct pt_regs *regs) +{ + unsigned long addr; + unsigned long base; + unsigned long indx; + unsigned char modrm = (unsigned char)insn->modrm.value; + unsigned char sib = (unsigned char)insn->sib.value; + + if (X86_MODRM_MOD(modrm) == 3) { + addr = get_reg(insn, regs, REG_TYPE_RM); + } else { + if (insn->sib.nbytes) { + base = get_reg(insn, regs, REG_TYPE_BASE);
+ indx = get_reg(insn, regs, REG_TYPE_INDEX); + addr = base + indx * (1 << X86_SIB_SCALE(sib));
[PATCH v8 10/10] x86, mpx: add documentation on Intel MPX
This patch adds the Documentation/x86/intel_mpx.txt file with some information about Intel MPX. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- Documentation/x86/intel_mpx.txt | 127 +++ 1 files changed, 127 insertions(+), 0 deletions(-) create mode 100644 Documentation/x86/intel_mpx.txt diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.txt new file mode 100644 index 000..ccffeee --- /dev/null +++ b/Documentation/x86/intel_mpx.txt @@ -0,0 +1,127 @@ +1. Intel(R) MPX Overview + + +Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new +capability introduced into Intel Architecture. Intel MPX provides +hardware features that can be used in conjunction with compiler +changes to check memory references, for those references whose +compile-time normal intentions are usurped at runtime due to +buffer overflow or underflow. + +For more information, please refer to Intel(R) Architecture +Instruction Set Extensions Programming Reference, Chapter 9: +Intel(R) Memory Protection Extensions. + +Note: Currently no hardware with MPX ISA is available, but it is always +possible to use SDE (Intel(R) Software Development Emulator) instead, +which can be downloaded from +http://software.intel.com/en-us/articles/intel-software-development-emulator + + +2. How does MPX kernel code work + + +Handling #BR faults caused by MPX +- + +When MPX is enabled, there are 2 new situations that can generate +#BR faults: + * a bounds violation caused by an MPX instruction; + * new bounds tables (BT) need to be allocated to save bounds. + +We hook the #BR handler to handle these two new situations. + +Decoding MPX instructions +- + +If a #BR is generated due to a bounds violation caused by MPX, +we need to decode the MPX instruction to get the violation address and +set this address into the extended struct siginfo. + +The _sigfault field of struct siginfo is extended as follows: + +87 /* SIGILL, SIGFPE, SIGSEGV, SIGBUS */ +88 struct { +89 void __user *_addr; /* faulting insn/memory ref.
*/ +90 #ifdef __ARCH_SI_TRAPNO +91 int _trapno;/* TRAP # which caused the signal */ +92 #endif +93 short _addr_lsb; /* LSB of the reported address */ +94 struct { +95 void __user *_lower; +96 void __user *_upper; +97 } _addr_bnd; +98 } _sigfault; + +The '_addr' field refers to the violation address, and the new '_addr_bnd' +field holds the lower/upper bounds in effect when the #BR was raised. + +Glibc will also be updated to support this new siginfo, so users +can get the violation address and bounds when bounds violations occur. + +Freeing unused bounds tables + + +When a BNDSTX instruction attempts to save bounds to a bounds directory +entry marked as invalid, a #BR is generated. This is an indication that +no bounds table exists for this entry. In this case the fault handler +will allocate a new bounds table on demand. + +Since the kernel allocated those tables on-demand without userspace +knowledge, it is also responsible for freeing them when the associated +mappings go away. + +The solution for this issue is to hook do_munmap() to check +whether the process is MPX enabled. If so, the bounds tables covering +the virtual address region being unmapped are freed as well. + +Adding new prctl commands +- + +The runtime library in userspace is responsible for allocating the +bounds directory, so the kernel has to use the XSAVE instruction to get +the base of the bounds directory from the BNDCFG register. + +But XSAVE is expected to be very expensive. As a performance +optimization, we add new prctl commands that fetch the base of the +bounds directory once and cache it for future use. + +Two new prctl commands are added to register and unregister MPX related +resources. + +155#define PR_MPX_REGISTER 43 +156#define PR_MPX_UNREGISTER 44 + +The base of the bounds directory is set into mm_struct during +PR_MPX_REGISTER command execution. This member can be used to +check whether an application is MPX enabled. + + +3.
Tips +=== + +1) Users are not allowed to create bounds tables and point the bounds +directory at them in userspace. In fact, it is also not necessary +for users to create bounds tables in userspace. + +When a #BR fault is produced due to an invalid entry, a bounds table will be +created in the kernel on demand and the kernel will not forward this fault to +userspace. So userspace can't receive #BR faults for invalid entries, and +it is also not necessary for users to create bounds tables themselves. + +Certainly users can allocate bounds tables and forcibly point the bounds +directory at them through the XSAVE instruction, and then set valid
[PATCH v8 09/10] x86, mpx: cleanup unused bound tables
Since the kernel allocated those tables on-demand without userspace knowledge, it is also responsible for freeing them when the associated mappings go away. The solution for this issue is to hook do_munmap() to check whether the process is MPX enabled. If so, the bounds tables covering the virtual address region being unmapped are freed as well. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/x86/include/asm/mmu_context.h | 16 +++ arch/x86/include/asm/mpx.h | 9 ++ arch/x86/mm/mpx.c | 252 include/asm-generic/mmu_context.h | 6 + mm/mmap.c | 2 + 5 files changed, 285 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 166af2a..d13e01c 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -10,6 +10,7 @@ #include <asm/pgalloc.h> #include <asm/tlbflush.h> #include <asm/paravirt.h> +#include <asm/mpx.h> #ifndef CONFIG_PARAVIRT #include <asm-generic/mm_hooks.h> @@ -102,4 +103,19 @@ do { \ } while (0) #endif +static inline void arch_unmap(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long start, unsigned long end) +{ +#ifdef CONFIG_X86_INTEL_MPX + /* +* Check whether this vma comes from an MPX-enabled application. +* If so, release the bounds tables related to this vma.
+*/ + if (mm->bd_addr && !(vma->vm_flags & VM_MPX)) + mpx_unmap(mm, start, end); + +#endif +} + #endif /* _ASM_X86_MMU_CONTEXT_H */ diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h index 6cb0853..e848a74 100644 --- a/arch/x86/include/asm/mpx.h +++ b/arch/x86/include/asm/mpx.h @@ -42,6 +42,13 @@ #define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT)) #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT)) +#define MPX_BD_ENTRY_MASK ((1<<MPX_BD_ENTRY_OFFSET)-1) +#define MPX_BT_ENTRY_MASK ((1<<MPX_BT_ENTRY_OFFSET)-1) +#define MPX_GET_BD_ENTRY_OFFSET(addr) ((((addr)>>(MPX_BT_ENTRY_OFFSET+ \ + MPX_IGN_BITS)) & MPX_BD_ENTRY_MASK) << MPX_BD_ENTRY_SHIFT) +#define MPX_GET_BT_ENTRY_OFFSET(addr) ((((addr)>>MPX_IGN_BITS) & \ + MPX_BT_ENTRY_MASK) << MPX_BT_ENTRY_SHIFT) + #define MPX_BNDSTA_ERROR_CODE 0x3 #define MPX_BNDCFG_ENABLE_FLAG 0x1 #define MPX_BD_ENTRY_VALID_FLAG 0x1 @@ -63,6 +70,8 @@ struct mpx_insn { #define MAX_MPX_INSN_SIZE 15 unsigned long mpx_mmap(unsigned long len); +void mpx_unmap(struct mm_struct *mm, + unsigned long start, unsigned long end); #ifdef CONFIG_X86_INTEL_MPX int do_mpx_bt_fault(struct xsave_struct *xsave_buf); diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c index e1b28e6..feb1f01 100644 --- a/arch/x86/mm/mpx.c +++ b/arch/x86/mm/mpx.c @@ -1,7 +1,16 @@ +/* + * mpx.c - Memory Protection eXtensions + * + * Copyright (c) 2014, Intel Corporation. + * Qiaowei Ren qiaowei@intel.com + * Dave Hansen dave.han...@intel.com + */ + #include <linux/kernel.h> #include <linux/syscalls.h> #include <asm/mpx.h> #include <asm/mman.h> +#include <asm/mmu_context.h> #include <linux/sched/sysctl.h> static const char *mpx_mapping_name(struct vm_area_struct *vma) @@ -77,3 +86,246 @@ out: up_write(&mm->mmap_sem); return ret; } + +/* + * Get the base of the bounds table pointed to by a specific bounds + * directory entry.
+ */ +static int get_bt_addr(long __user *bd_entry, unsigned long *bt_addr) +{ + int valid; + + if (!access_ok(VERIFY_READ, (bd_entry), sizeof(*(bd_entry)))) + return -EFAULT; + + pagefault_disable(); + if (get_user(*bt_addr, bd_entry)) + goto out; + pagefault_enable(); + + valid = *bt_addr & MPX_BD_ENTRY_VALID_FLAG; + *bt_addr &= MPX_BT_ADDR_MASK; + + /* +* If this bounds directory entry is nonzero, and meanwhile +* the valid bit is zero, one SIGSEGV will be produced due to +* this unexpected situation. +*/ + if (!valid && *bt_addr) + return -EINVAL; + if (!valid) + return -ENOENT; + + return 0; + +out: + pagefault_enable(); + return -EFAULT; +} + +/* + * Free the backing physical pages of bounds table 'bt_addr'. + * Assume start...end is within that bounds table. + */ +static int __must_check zap_bt_entries(struct mm_struct *mm, + unsigned long bt_addr, + unsigned long start, unsigned long end) +{ + struct vm_area_struct *vma; + + /* Find the vma which overlaps this bounds table */ + vma = find_vma(mm, bt_addr); + /* +* The table entry comes from userspace and could be +* pointing anywhere, so make sure it is at least +* pointing to valid memory. +*/ + if (!vma || !(vma->vm_flags & VM_MPX))
[PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl() commands. These commands can be used to register and unregister MPX related resources on the x86 platform. The base of the bounds directory is set into mm_struct during PR_MPX_REGISTER command execution. This member can be used to check whether an application is MPX enabled. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/x86/include/asm/mpx.h | 1 + arch/x86/include/asm/processor.h | 18 arch/x86/kernel/mpx.c | 55 ++ include/linux/mm_types.h | 3 ++ include/uapi/linux/prctl.h | 6 kernel/sys.c | 12 6 files changed, 95 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h index 780af63..6cb0853 100644 --- a/arch/x86/include/asm/mpx.h +++ b/arch/x86/include/asm/mpx.h @@ -43,6 +43,7 @@ #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT)) #define MPX_BNDSTA_ERROR_CODE 0x3 +#define MPX_BNDCFG_ENABLE_FLAG 0x1 #define MPX_BD_ENTRY_VALID_FLAG 0x1 struct mpx_insn { diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index eb71ec7..b801fea 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -953,6 +953,24 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip, extern int get_tsc_mode(unsigned long adr); extern int set_tsc_mode(unsigned int val); +/* Register/unregister a process' MPX related resources */ +#define MPX_REGISTER(tsk) mpx_register((tsk)) +#define MPX_UNREGISTER(tsk) mpx_unregister((tsk)) + +#ifdef CONFIG_X86_INTEL_MPX +extern int mpx_register(struct task_struct *tsk); +extern int mpx_unregister(struct task_struct *tsk); +#else +static inline int mpx_register(struct task_struct *tsk) +{ + return -EINVAL; +} +static inline int mpx_unregister(struct task_struct *tsk) +{ + return -EINVAL; +} +#endif /* CONFIG_X86_INTEL_MPX */ + extern u16 amd_get_nb_id(int cpu); static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves) diff --git
a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c index 7ef6e39..b86873a 100644 --- a/arch/x86/kernel/mpx.c +++ b/arch/x86/kernel/mpx.c @@ -1,6 +1,61 @@ #include <linux/kernel.h> #include <linux/syscalls.h> +#include <linux/prctl.h> #include <asm/mpx.h> +#include <asm/i387.h> +#include <asm/fpu-internal.h> + +/* + * This should only be called when cpuid has been checked + * and we are sure that MPX is available. + */ +static __user void *task_get_bounds_dir(struct task_struct *tsk) +{ + struct xsave_struct *xsave_buf; + + fpu_xsave(&tsk->thread.fpu); + xsave_buf = &(tsk->thread.fpu.state->xsave); + if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG)) + return NULL; + + return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u & + MPX_BNDCFG_ADDR_MASK); +} + +int mpx_register(struct task_struct *tsk) +{ + struct mm_struct *mm = tsk->mm; + + if (!cpu_has_mpx) + return -EINVAL; + + /* +* The runtime in userspace will be responsible for allocation of +* the bounds directory. Then, it will save the base of the bounds +* directory into the XSAVE/XRSTOR Save Area and enable MPX through +* the XRSTOR instruction. +* +* fpu_xsave() is expected to be very expensive. In order to do +* performance optimization, here we get the base of the bounds +* directory and then save it into mm_struct to be used in future.
+*/ + mm->bd_addr = task_get_bounds_dir(tsk); + if (!mm->bd_addr) + return -EINVAL; + + return 0; +} + +int mpx_unregister(struct task_struct *tsk) +{ + struct mm_struct *mm = current->mm; + + if (!cpu_has_mpx) + return -EINVAL; + + mm->bd_addr = NULL; + return 0; +} enum reg_type { REG_TYPE_RM = 0, diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 6e0b286..760aee3 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -454,6 +454,9 @@ struct mm_struct { bool tlb_flush_pending; #endif struct uprobes_state uprobes_state; +#ifdef CONFIG_X86_INTEL_MPX + void __user *bd_addr; /* address of the bounds directory */ +#endif }; static inline void mm_init_cpumask(struct mm_struct *mm) diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index 58afc04..ce86fa9 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -152,4 +152,10 @@ #define PR_SET_THP_DISABLE 41 #define PR_GET_THP_DISABLE 42 +/* + * Register/unregister MPX related resources. + */ +#define PR_MPX_REGISTER 43 +#define PR_MPX_UNREGISTER 44 + #endif /* _LINUX_PRCTL_H */ diff --git a/kernel/sys.c b/kernel/sys.c index ce81291..9a43587 100644
[PATCH v8 04/10] x86, mpx: hook #BR exception handler to allocate bound tables
This patch handles a #BR exception for non-existent tables by carving the space out of the normal process's address space (essentially calling mmap() from inside the kernel) and then pointing the bounds directory over to it. The tables need to be accessed and controlled by userspace because the compiler generates instructions for MPX-enabled code which frequently store and retrieve entries from the bounds tables. Any direct kernel involvement (like a syscall) to access the tables would destroy performance since these are so frequent. The tables are carved out of userspace because we have no better spot to put them. For each pointer which is being tracked by MPX, the bounds tables contain 4 longs worth of data, and the tables are indexed virtually. If we were to preallocate the tables, we would theoretically need to allocate 4x the virtual space that we have available for userspace somewhere else. We don't have that room in the kernel address space. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/x86/include/asm/mpx.h | 20 +++ arch/x86/kernel/Makefile | 1 + arch/x86/kernel/mpx.c | 58 arch/x86/kernel/traps.c | 55 - 4 files changed, 133 insertions(+), 1 deletions(-) create mode 100644 arch/x86/kernel/mpx.c diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h index 5725ac4..b7598ac 100644 --- a/arch/x86/include/asm/mpx.h +++ b/arch/x86/include/asm/mpx.h @@ -18,6 +18,8 @@ #define MPX_BT_ENTRY_SHIFT 5 #define MPX_IGN_BITS 3 +#define MPX_BD_ENTRY_TAIL 3 + #else #define MPX_BD_ENTRY_OFFSET 20 @@ -26,13 +28,31 @@ #define MPX_BT_ENTRY_SHIFT 4 #define MPX_IGN_BITS 2 +#define MPX_BD_ENTRY_TAIL 2 + #endif +#define MPX_BNDSTA_TAIL 2 +#define MPX_BNDCFG_TAIL 12 +#define MPX_BNDSTA_ADDR_MASK (~((1UL<<MPX_BNDSTA_TAIL)-1)) +#define MPX_BNDCFG_ADDR_MASK (~((1UL<<MPX_BNDCFG_TAIL)-1)) +#define MPX_BT_ADDR_MASK (~((1UL<<MPX_BD_ENTRY_TAIL)-1)) + #define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT)) #define MPX_BT_SIZE_BYTES
(1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT)) #define MPX_BNDSTA_ERROR_CODE 0x3 +#define MPX_BD_ENTRY_VALID_FLAG 0x1 unsigned long mpx_mmap(unsigned long len); +#ifdef CONFIG_X86_INTEL_MPX +int do_mpx_bt_fault(struct xsave_struct *xsave_buf); +#else +static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf) +{ + return -EINVAL; +} +#endif /* CONFIG_X86_INTEL_MPX */ + #endif /* _ASM_X86_MPX_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index ada2e2d..9ece662 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -43,6 +43,7 @@ obj-$(CONFIG_PREEMPT) += preempt.o obj-y += process.o obj-y += i387.o xsave.o +obj-$(CONFIG_X86_INTEL_MPX) += mpx.o obj-y += ptrace.o obj-$(CONFIG_X86_32) += tls.o obj-$(CONFIG_IA32_EMULATION) += tls.o diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c new file mode 100644 index 000..88d660f --- /dev/null +++ b/arch/x86/kernel/mpx.c @@ -0,0 +1,58 @@ +#include <linux/kernel.h> +#include <linux/syscalls.h> +#include <asm/mpx.h> + +static int allocate_bt(long __user *bd_entry) +{ + unsigned long bt_addr, old_val = 0; + int ret = 0; + + bt_addr = mpx_mmap(MPX_BT_SIZE_BYTES); + if (IS_ERR((void *)bt_addr)) + return bt_addr; + bt_addr = (bt_addr & MPX_BT_ADDR_MASK) | MPX_BD_ENTRY_VALID_FLAG; + + ret = user_atomic_cmpxchg_inatomic(&old_val, bd_entry, 0, bt_addr); + if (ret) + goto out; + + /* +* There is an existing bounds table pointed at this bounds +* directory entry, and so we need to free the bounds table +* allocated just now. +*/ + if (old_val) + goto out; + + return 0; + +out: + vm_munmap(bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES); + return ret; +} + +/* + * When a BNDSTX instruction attempts to save bounds to a BD entry + * with the lack of the valid bit being set, a #BR is generated. + * This is an indication that no BT exists for this entry. In this + * case the fault handler will allocate a new BT. + * + * With 32-bit mode, the size of BD is 4MB, and the size of each + * bound table is 16KB.
With 64-bit mode, the size of BD is 2GB, + * and the size of each bound table is 4MB. + */ +int do_mpx_bt_fault(struct xsave_struct *xsave_buf) +{ + unsigned long status; + unsigned long bd_entry, bd_base; + + bd_base = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK; + status = xsave_buf->bndcsr.status_reg; + + bd_entry = status & MPX_BNDSTA_ADDR_MASK; + if ((bd_entry < bd_base) || + (bd_entry >= bd_base + MPX_BD_SIZE_BYTES)) + return -EINVAL; + + return
[PATCH v8 06/10] mips: sync struct siginfo with general version
Due to the new fields about bound violation added into struct siginfo, this patch syncs the MIPS version with the generic one to avoid build issues. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/mips/include/uapi/asm/siginfo.h | 4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/arch/mips/include/uapi/asm/siginfo.h b/arch/mips/include/uapi/asm/siginfo.h index e811744..d08f83f 100644 --- a/arch/mips/include/uapi/asm/siginfo.h +++ b/arch/mips/include/uapi/asm/siginfo.h @@ -92,6 +92,10 @@ typedef struct siginfo { int _trapno;/* TRAP # which caused the signal */ #endif short _addr_lsb; + struct { + void __user *_lower; + void __user *_upper; + } _addr_bnd; } _sigfault; /* SIGPOLL, SIGXFSZ (To do ...) */ -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v8 01/10] x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific
MPX-enabled application will possibly create a lot of bounds tables in process address space to save bounds information. These tables can take up huge swaths of memory (as much as 80% of the memory on the system) even if we clean them up aggressively. Being this huge, we need a way to track their memory use. If we want to track them, we essentially have two options: 1. walk the multi-GB (in virtual space) bounds directory to locate all the VMAs and walk them 2. Find a way to distinguish MPX bounds-table VMAs from normal anonymous VMAs and use some existing mechanism to walk them We expect (1) will be prohibitively expensive. For (2), we only need a single bit, and we've chosen to use a VM_ flag. We understand that they are scarce and are open to other options. There is one potential hybrid approach: check the bounds directory entry for any anonymous VMA that could possibly contain a bounds table. This is less expensive than (1), but still requires reading a pointer out of userspace for every VMA that we iterate over. 
Signed-off-by: Qiaowei Ren qiaowei@intel.com --- fs/proc/task_mmu.c | 1 + include/linux/mm.h | 6 ++ 2 files changed, 7 insertions(+), 0 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index dfc791c..cc31520 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -549,6 +549,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) [ilog2(VM_GROWSDOWN)] = "gd", [ilog2(VM_PFNMAP)] = "pf", [ilog2(VM_DENYWRITE)] = "dw", + [ilog2(VM_MPX)] = "mp", [ilog2(VM_LOCKED)] = "lo", [ilog2(VM_IO)] = "io", [ilog2(VM_SEQ_READ)] = "sr", diff --git a/include/linux/mm.h b/include/linux/mm.h index 8981cc8..942be8a 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -127,6 +127,7 @@ extern unsigned int kobjsize(const void *objp); #define VM_HUGETLB 0x00400000 /* Huge TLB Page VM */ #define VM_NONLINEAR 0x00800000 /* Is non-linear (remap_file_pages) */ #define VM_ARCH_1 0x01000000 /* Architecture-specific flag */ +#define VM_ARCH_2 0x02000000 #define VM_DONTDUMP 0x04000000 /* Do not include in the core dump */ #ifdef CONFIG_MEM_SOFT_DIRTY @@ -154,6 +155,11 @@ extern unsigned int kobjsize(const void *objp); # define VM_MAPPED_COPY VM_ARCH_1 /* T if mapped copy of data (nommu mmap) */ #endif +#if defined(CONFIG_X86) +/* MPX specific bounds table or bounds directory */ +# define VM_MPX VM_ARCH_2 +#endif + #ifndef VM_GROWSUP # define VM_GROWSUP VM_NONE #endif -- 1.7.1
[PATCH v8 05/10] x86, mpx: extend siginfo structure to include bound violation information
This patch adds new fields about bound violation into the siginfo structure. si_lower and si_upper are respectively the lower and upper bound in effect when a bound violation is raised. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- include/uapi/asm-generic/siginfo.h | 9 - kernel/signal.c | 4 2 files changed, 12 insertions(+), 1 deletions(-) diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h index ba5be7f..1e35520 100644 --- a/include/uapi/asm-generic/siginfo.h +++ b/include/uapi/asm-generic/siginfo.h @@ -91,6 +91,10 @@ typedef struct siginfo { int _trapno;/* TRAP # which caused the signal */ #endif short _addr_lsb; /* LSB of the reported address */ + struct { + void __user *_lower; + void __user *_upper; + } _addr_bnd; } _sigfault; /* SIGPOLL */ @@ -131,6 +135,8 @@ typedef struct siginfo { #define si_trapno _sifields._sigfault._trapno #endif #define si_addr_lsb _sifields._sigfault._addr_lsb +#define si_lower _sifields._sigfault._addr_bnd._lower +#define si_upper _sifields._sigfault._addr_bnd._upper #define si_band _sifields._sigpoll._band #define si_fd _sifields._sigpoll._fd #ifdef __ARCH_SIGSYS @@ -199,7 +205,8 @@ typedef struct siginfo { */ #define SEGV_MAPERR (__SI_FAULT|1) /* address not mapped to object */ #define SEGV_ACCERR (__SI_FAULT|2) /* invalid permissions for mapped object */ -#define NSIGSEGV 2 +#define SEGV_BNDERR (__SI_FAULT|3) /* failed address bound checks */ +#define NSIGSEGV 3 /* * SIGBUS si_codes diff --git a/kernel/signal.c b/kernel/signal.c index 8f0876f..2c403a4 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2748,6 +2748,10 @@ int copy_siginfo_to_user(siginfo_t __user *to, const siginfo_t *from) if (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO) err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb); #endif +#ifdef SEGV_BNDERR + err |= __put_user(from->si_lower, &to->si_lower); + err |= __put_user(from->si_upper, &to->si_upper); +#endif break; case __SI_CHLD: err |= __put_user(from->si_pid, &to->si_pid); -- 1.7.1
[PATCH v8 03/10] x86, mpx: add macro cpu_has_mpx
In order to do performance optimization, this patch adds the macro cpu_has_mpx, which directly returns 0 when MPX is not supported by the kernel. The community gave many comments on this cpu_has_mpx macro in the previous version. Dave will introduce a patchset about disabled features to fix it later. In this code: if (cpu_has_mpx) do_some_mpx_thing(); The patch series from Dave will introduce a new macro cpu_feature_enabled() (if merged after this patchset) to replace cpu_has_mpx: if (cpu_feature_enabled(X86_FEATURE_MPX)) do_some_mpx_thing(); Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/x86/include/asm/cpufeature.h | 6 ++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index bb9b258..82ec7ed 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -353,6 +353,12 @@ extern const char * const x86_bug_flags[NBUGINTS*32]; #define cpu_has_eager_fpu boot_cpu_has(X86_FEATURE_EAGER_FPU) #define cpu_has_topoext boot_cpu_has(X86_FEATURE_TOPOEXT) +#ifdef CONFIG_X86_INTEL_MPX +#define cpu_has_mpx boot_cpu_has(X86_FEATURE_MPX) +#else +#define cpu_has_mpx 0 +#endif /* CONFIG_X86_INTEL_MPX */ + #ifdef CONFIG_X86_64 #undef cpu_has_vme -- 1.7.1
[PATCH v8 02/10] x86, mpx: add MPX specific mmap interface
This patch adds an MPX-specific mmap interface, which only handles MPX related maps, including the bounds table and bounds directory. In order to track MPX specific memory usage, this interface sticks the new vm_flag VM_MPX in the vm_area_struct when creating a bounds table or bounds directory. These bounds tables can take huge amounts of memory. In the worst-case scenario, the tables can be 4x the size of the data structure being tracked. IOW, a 1-page structure can require 4 bounds-table pages. My expectation is that folks using MPX are going to be keen on figuring out how much memory is being dedicated to it. With this feature, plus some grepping in /proc/$pid/smaps one could take a pretty good stab at it. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/x86/Kconfig | 4 ++ arch/x86/include/asm/mpx.h | 38 + arch/x86/mm/Makefile | 2 + arch/x86/mm/mpx.c | 79 4 files changed, 123 insertions(+), 0 deletions(-) create mode 100644 arch/x86/include/asm/mpx.h create mode 100644 arch/x86/mm/mpx.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 778178f..935aa69 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -243,6 +243,10 @@ config HAVE_INTEL_TXT def_bool y depends on INTEL_IOMMU && ACPI +config X86_INTEL_MPX + def_bool y + depends on CPU_SUP_INTEL + config X86_32_SMP def_bool y depends on X86_32 && SMP diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h new file mode 100644 index 000..5725ac4 --- /dev/null +++ b/arch/x86/include/asm/mpx.h @@ -0,0 +1,38 @@ +#ifndef _ASM_X86_MPX_H +#define _ASM_X86_MPX_H + +#include <linux/types.h> +#include <asm/ptrace.h> + +#ifdef CONFIG_X86_64 + +/* upper 28 bits [47:20] of the virtual address in 64-bit used to + * index into bounds directory (BD). + */ +#define MPX_BD_ENTRY_OFFSET 28 +#define MPX_BD_ENTRY_SHIFT 3 +/* bits [19:3] of the virtual address in 64-bit used to index into + * bounds table (BT).
+ */ +#define MPX_BT_ENTRY_OFFSET 17 +#define MPX_BT_ENTRY_SHIFT 5 +#define MPX_IGN_BITS 3 + +#else + +#define MPX_BD_ENTRY_OFFSET 20 +#define MPX_BD_ENTRY_SHIFT 2 +#define MPX_BT_ENTRY_OFFSET 10 +#define MPX_BT_ENTRY_SHIFT 4 +#define MPX_IGN_BITS 2 + +#endif + +#define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT)) +#define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT)) + +#define MPX_BNDSTA_ERROR_CODE 0x3 + +unsigned long mpx_mmap(unsigned long len); + +#endif /* _ASM_X86_MPX_H */ diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 6a19ad9..ecfdc46 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -30,3 +30,5 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o obj-$(CONFIG_NUMA_EMU) += numa_emulation.o obj-$(CONFIG_MEMTEST) += memtest.o + +obj-$(CONFIG_X86_INTEL_MPX) += mpx.o diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c new file mode 100644 index 000..e1b28e6 --- /dev/null +++ b/arch/x86/mm/mpx.c @@ -0,0 +1,79 @@ +#include <linux/kernel.h> +#include <linux/syscalls.h> +#include <asm/mpx.h> +#include <asm/mman.h> +#include <linux/sched/sysctl.h> + +static const char *mpx_mapping_name(struct vm_area_struct *vma) +{ + return "[mpx]"; +} + +static struct vm_operations_struct mpx_vma_ops = { + .name = mpx_mapping_name, +}; + +/* + * This is really a simplified vm_mmap. It only handles MPX + * related maps, including the bounds table and bounds directory. + * + * Here we stick the new vm_flag VM_MPX in the vm_area_struct + * when creating a bounds table or bounds directory, in order to + * track MPX specific memory. + */ +unsigned long mpx_mmap(unsigned long len) +{ + unsigned long ret; + unsigned long addr, pgoff; + struct mm_struct *mm = current->mm; + vm_flags_t vm_flags; + struct vm_area_struct *vma; + + /* Only bounds table and bounds directory can be allocated here */ + if (len != MPX_BD_SIZE_BYTES && len != MPX_BT_SIZE_BYTES) + return -EINVAL; + + down_write(&mm->mmap_sem); + + /* Too many mappings?
*/ + if (mm-map_count sysctl_max_map_count) { + ret = -ENOMEM; + goto out; + } + + /* Obtain the address to map to. we verify (or select) it and ensure +* that it represents a valid section of the address space. +*/ + addr = get_unmapped_area(NULL, 0, len, 0, MAP_ANONYMOUS | MAP_PRIVATE); + if (addr ~PAGE_MASK) { + ret = addr; + goto out; + } + + vm_flags = VM_READ | VM_WRITE | VM_MPX | + mm-def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; + + /* Set pgoff according to addr for anon_vma */ + pgoff = addr PAGE_SHIFT; + + ret = mmap_region(NULL, addr, len, vm_flags, pgoff); + if (IS_ERR_VALUE(ret)) + goto out
[PATCH v8 00/10] Intel MPX support
This patchset adds support for the Memory Protection Extensions (MPX) feature found in future Intel processors.

MPX can be used in conjunction with compiler changes to check memory references, for those references whose compile-time normal intentions are usurped at runtime due to buffer overflow or underflow. MPX provides this capability at very low performance overhead for newly compiled code, and provides compatibility mechanisms with legacy software components.

The MPX architecture is designed to allow a machine to run both MPX enabled software and legacy software that is MPX unaware. In such a case, the legacy software does not benefit from MPX, but it also does not experience any change in functionality or reduction in performance.

More information about Intel MPX can be found in "Intel(R) Architecture Instruction Set Extensions Programming Reference".

To get the advantage of MPX, changes are required in the OS kernel, binutils, compiler, and system libraries.

The new GCC option -fmpx is introduced to utilize MPX instructions. Currently, GCC compiler sources with MPX support are available in a separate branch in the common GCC SVN repository. See the GCC SVN page (http://gcc.gnu.org/svn.html) for details.

To have the full protection, we had to add MPX instrumentation to all the necessary Glibc routines (e.g. memcpy) written in assembler, and compile Glibc with the MPX enabled GCC compiler. Currently, MPX enabled Glibc sources can be found in the Glibc git repository.

Enabling an application to use MPX will generally not require source code updates, but there is some runtime code, which is responsible for configuring and enabling MPX, needed in order to make use of MPX. For most applications this runtime support will be available by linking to a library supplied by the compiler, or possibly it will come directly from the OS once OS versions that support MPX are available.

MPX kernel code, namely this patchset, has mainly the 2 responsibilities: provide handlers for bounds faults (#BR), and manage bounds memory.

The high-level areas modified in the patchset are as follows:
1) struct siginfo is extended to include bound violation information.
2) two prctl() commands are added to do performance optimization.

Currently no hardware with MPX ISA is available, but it is always possible to use SDE (Intel(R) Software Development Emulator) instead, which can be downloaded from
http://software.intel.com/en-us/articles/intel-software-development-emulator

This patchset has been tested on a real internal hardware platform at Intel. We have some simple unit tests in user space, which directly call MPX instructions to produce #BR to let the kernel allocate bounds tables and cause bounds violations. We also compiled several benchmarks with an MPX-enabled GCC/Glibc and ICC, and ran them with this patch set. We found a number of bugs in this code in these tests.

Future TODO items:
1) support 32-bit binaries on 64-bit kernels.

Changes since v1:
  * check to see if #BR occurred in userspace or kernel space.
  * use generic structure and macro as much as possible when decoding mpx instructions.

Changes since v2:
  * fix some compile warnings.
  * update documentation.

Changes since v3:
  * correct some syntax errors in the documentation, and document the extended struct siginfo.
  * kill the process when the error code of BNDSTATUS is 3.
  * add some comments.
  * remove new prctl() commands.
  * fix some compile warnings for 32-bit.

Changes since v4:
  * raise SIGBUS if the allocations of the bound tables fail.

Changes since v5:
  * hook the unmap() path to clean up unused bounds tables, and use a new prctl() command to register the bounds directory address with struct mm_struct, to check whether one process is MPX enabled during unmap().
  * in order to track MPX memory usage precisely, add an MPX specific mmap interface and one VM_MPX flag to check whether a VMA is an MPX bounds table.
  * add the macro cpu_has_mpx to do performance optimization.
  * sync struct siginfo for mips with the general version to avoid a build issue.

Changes since v6:
  * because arch_vma_name was removed, this patchset has to set an MPX specific ->vm_ops to do the same thing.
  * fix warnings for 32 bit arch.
  * add more description into these patches.

Changes since v7:
  * introduce the VM_ARCH_2 flag.
  * remove all of the pr_debug()s.
  * fix prctl numbers in documentation.
  * fix some bugs on bounds tables freeing.

Qiaowei Ren (10):
  x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific
  x86, mpx: add MPX specific mmap interface
  x86, mpx: add macro cpu_has_mpx
  x86, mpx: hook #BR exception handler to allocate bound tables
  x86, mpx: extend siginfo structure to include bound violation information
  mips: sync struct siginfo with general version
  x86, mpx: decode MPX instruction to get bound violation information
  x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  x86, mpx: cleanup unused bound tables
  x86, mpx: add
[PATCH v7 02/10] x86, mpx: add MPX specific mmap interface
This patch adds one MPX specific mmap interface, which only handles mpx related maps, including bounds table and bounds directory.

In order to track MPX specific memory usage, this interface is added to stick the new vm_flag VM_MPX in the vm_area_struct when creating a bounds table or bounds directory.

These bounds tables can take huge amounts of memory. In the worst-case scenario, the tables can be 4x the size of the data structure being tracked. IOW, a 1-page structure can require 4 bounds-table pages.

My expectation is that folks using MPX are going to be keen on figuring out how much memory is being dedicated to it. With this feature, plus some grepping in /proc/$pid/smaps one could take a pretty good stab at it.

Signed-off-by: Qiaowei Ren
---
 arch/x86/Kconfig           |    4 ++
 arch/x86/include/asm/mpx.h |   38 ++++++++++++
 arch/x86/mm/Makefile       |    2 +
 arch/x86/mm/mpx.c          |   79 +++++++++++++++++++++++++
 4 files changed, 123 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/mpx.h
 create mode 100644 arch/x86/mm/mpx.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a8f749e..020db35 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -238,6 +238,10 @@ config HAVE_INTEL_TXT
 	def_bool y
 	depends on INTEL_IOMMU && ACPI
 
+config X86_INTEL_MPX
+	def_bool y
+	depends on CPU_SUP_INTEL
+
 config X86_32_SMP
 	def_bool y
 	depends on X86_32 && SMP
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
new file mode 100644
index 000..5725ac4
--- /dev/null
+++ b/arch/x86/include/asm/mpx.h
@@ -0,0 +1,38 @@
+#ifndef _ASM_X86_MPX_H
+#define _ASM_X86_MPX_H
+
+#include <linux/types.h>
+#include <asm/ptrace.h>
+
+#ifdef CONFIG_X86_64
+
+/* upper 28 bits [47:20] of the virtual address in 64-bit used to
+ * index into bounds directory (BD).
+ */
+#define MPX_BD_ENTRY_OFFSET	28
+#define MPX_BD_ENTRY_SHIFT	3
+/* bits [19:3] of the virtual address in 64-bit used to index into
+ * bounds table (BT).
+ */
+#define MPX_BT_ENTRY_OFFSET	17
+#define MPX_BT_ENTRY_SHIFT	5
+#define MPX_IGN_BITS		3
+
+#else
+
+#define MPX_BD_ENTRY_OFFSET	20
+#define MPX_BD_ENTRY_SHIFT	2
+#define MPX_BT_ENTRY_OFFSET	10
+#define MPX_BT_ENTRY_SHIFT	4
+#define MPX_IGN_BITS		2
+
+#endif
+
+#define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
+#define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))
+
+#define MPX_BNDSTA_ERROR_CODE	0x3
+
+unsigned long mpx_mmap(unsigned long len);
+
+#endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 6a19ad9..ecfdc46 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -30,3 +30,5 @@ obj-$(CONFIG_ACPI_NUMA)	+= srat.o
 obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
 
 obj-$(CONFIG_MEMTEST)		+= memtest.o
+
+obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
new file mode 100644
index 000..e1b28e6
--- /dev/null
+++ b/arch/x86/mm/mpx.c
@@ -0,0 +1,79 @@
+#include <linux/kernel.h>
+#include <linux/syscalls.h>
+#include <asm/mpx.h>
+#include <asm/mman.h>
+#include <linux/sched/sysctl.h>
+
+static const char *mpx_mapping_name(struct vm_area_struct *vma)
+{
+	return "[mpx]";
+}
+
+static struct vm_operations_struct mpx_vma_ops = {
+	.name = mpx_mapping_name,
+};
+
+/*
+ * this is really a simplified "vm_mmap". it only handles mpx
+ * related maps, including bounds table and bounds directory.
+ *
+ * here we can stick the new vm_flag VM_MPX in the vm_area_struct
+ * when creating a bounds table or bounds directory, in order to
+ * track MPX specific memory.
+ */
+unsigned long mpx_mmap(unsigned long len)
+{
+	unsigned long ret;
+	unsigned long addr, pgoff;
+	struct mm_struct *mm = current->mm;
+	vm_flags_t vm_flags;
+	struct vm_area_struct *vma;
+
+	/* Only bounds table and bounds directory can be allocated here */
+	if (len != MPX_BD_SIZE_BYTES && len != MPX_BT_SIZE_BYTES)
+		return -EINVAL;
+
+	down_write(&mm->mmap_sem);
+
+	/* Too many mappings? */
+	if (mm->map_count > sysctl_max_map_count) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/* Obtain the address to map to. we verify (or select) it and ensure
+	 * that it represents a valid section of the address space.
+	 */
+	addr = get_unmapped_area(NULL, 0, len, 0, MAP_ANONYMOUS | MAP_PRIVATE);
+	if (addr & ~PAGE_MASK) {
+		ret = addr;
+		goto out;
+	}
+
+	vm_flags = VM_READ | VM_WRITE | VM_MPX |
+			mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
+
+	/* Set pgoff according to addr for anon_vma */
+	pgoff = addr >> PAGE_SHIFT;
+
+	ret = mmap_region(NULL, addr, len, vm_flags, pgoff);
+	if (IS_ERR_VALUE(ret))
+		goto out;
+
[PATCH v7 05/10] x86, mpx: extend siginfo structure to include bound violation information
This patch adds new fields about bound violation into the siginfo structure. si_lower and si_upper are respectively the lower bound and the upper bound when a bound violation is caused.

Signed-off-by: Qiaowei Ren
---
 include/uapi/asm-generic/siginfo.h |    9 ++++++++-
 kernel/signal.c                    |    4 ++++
 2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index ba5be7f..1e35520 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -91,6 +91,10 @@ typedef struct siginfo {
 			int _trapno;	/* TRAP # which caused the signal */
 #endif
 			short _addr_lsb; /* LSB of the reported address */
+			struct {
+				void __user *_lower;
+				void __user *_upper;
+			} _addr_bnd;
 		} _sigfault;
 
 		/* SIGPOLL */
@@ -131,6 +135,8 @@ typedef struct siginfo {
 #define si_trapno	_sifields._sigfault._trapno
 #endif
 #define si_addr_lsb	_sifields._sigfault._addr_lsb
+#define si_lower	_sifields._sigfault._addr_bnd._lower
+#define si_upper	_sifields._sigfault._addr_bnd._upper
 #define si_band		_sifields._sigpoll._band
 #define si_fd		_sifields._sigpoll._fd
 #ifdef __ARCH_SIGSYS
@@ -199,7 +205,8 @@ typedef struct siginfo {
  */
 #define SEGV_MAPERR	(__SI_FAULT|1)	/* address not mapped to object */
 #define SEGV_ACCERR	(__SI_FAULT|2)	/* invalid permissions for mapped object */
-#define NSIGSEGV	2
+#define SEGV_BNDERR	(__SI_FAULT|3)	/* failed address bound checks */
+#define NSIGSEGV	3
 
 /*
  * SIGBUS si_codes
diff --git a/kernel/signal.c b/kernel/signal.c
index a4077e9..2131636 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2748,6 +2748,10 @@ int copy_siginfo_to_user(siginfo_t __user *to, const siginfo_t *from)
 		if (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO)
 			err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb);
 #endif
+#ifdef SEGV_BNDERR
+		err |= __put_user(from->si_lower, &to->si_lower);
+		err |= __put_user(from->si_upper, &to->si_upper);
+#endif
 		break;
 	case __SI_CHLD:
 		err |= __put_user(from->si_pid, &to->si_pid);
--
1.7.1

--
To unsubscribe
from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v7 04/10] x86, mpx: hook #BR exception handler to allocate bound tables
This patch handles a #BR exception for non-existent tables by carving the space out of the normal processes address space (essentially calling mmap() from inside the kernel) and then pointing the bounds-directory over to it.

The tables need to be accessed and controlled by userspace because the compiler generates instructions for MPX-enabled code which frequently store and retrieve entries from the bounds tables. Any direct kernel involvement (like a syscall) to access the tables would destroy performance since these are so frequent.

The tables are carved out of userspace because we have no better spot to put them. For each pointer which is being tracked by MPX, the bounds tables contain 4 longs worth of data, and the tables are indexed virtually. If we were to preallocate the tables, we would theoretically need to allocate 4x the virtual space that we have available for userspace somewhere else. We don't have that room in the kernel address space.

Signed-off-by: Qiaowei Ren
---
 arch/x86/include/asm/mpx.h |   20 ++++++
 arch/x86/kernel/Makefile   |    1 +
 arch/x86/kernel/mpx.c      |   60 ++++++++++++++++++++
 arch/x86/kernel/traps.c    |   55 +++++++++++++++++-
 4 files changed, 135 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/kernel/mpx.c

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 5725ac4..b7598ac 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -18,6 +18,8 @@
 #define MPX_BT_ENTRY_SHIFT	5
 #define MPX_IGN_BITS		3
 
+#define MPX_BD_ENTRY_TAIL	3
+
 #else
 
 #define MPX_BD_ENTRY_OFFSET	20
@@ -26,13 +28,31 @@
 #define MPX_BT_ENTRY_SHIFT	4
 #define MPX_IGN_BITS		2
 
+#define MPX_BD_ENTRY_TAIL	2
+
 #endif
 
+#define MPX_BNDSTA_TAIL		2
+#define MPX_BNDCFG_TAIL		12
+#define MPX_BNDSTA_ADDR_MASK	(~((1UL<<MPX_BNDSTA_TAIL)-1))
+#define MPX_BNDCFG_ADDR_MASK	(~((1UL<<MPX_BNDCFG_TAIL)-1))
+#define MPX_BT_ADDR_MASK	(~((1UL<<MPX_BD_ENTRY_TAIL)-1))
+
+#define MPX_BNDCFG_ENABLE_FLAG	0x1
+#define MPX_BD_ENTRY_VALID_FLAG	0x1
+
 unsigned long mpx_mmap(unsigned long len);
 
+#ifdef CONFIG_X86_INTEL_MPX
+int do_mpx_bt_fault(struct xsave_struct *xsave_buf);
+#else
+static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_X86_INTEL_MPX */
+
 #endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
new file mode 100644
+#include <linux/kernel.h>
+#include <linux/syscalls.h>
+#include <asm/mpx.h>
+
+static int allocate_bt(long __user *bd_entry)
+{
+	unsigned long bt_addr, old_val = 0;
+	int ret = 0;
+
+	bt_addr = mpx_mmap(MPX_BT_SIZE_BYTES);
+	if (IS_ERR((void *)bt_addr))
+		return bt_addr;
+	bt_addr = (bt_addr & MPX_BT_ADDR_MASK) | MPX_BD_ENTRY_VALID_FLAG;
+
+	ret = user_atomic_cmpxchg_inatomic(&old_val, bd_entry, 0, bt_addr);
+	if (ret)
+		goto out;
+
+	/*
+	 * there is an existing bounds table pointed at this bounds
+	 * directory entry, and so we need to free the bounds table
+	 * allocated just now.
+	 */
+	if (old_val)
+		goto out;
+
+	pr_debug("Allocate bounds table %lx at entry %p\n",
+			bt_addr, bd_entry);
+	return 0;
+
+out:
+	vm_munmap(bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES);
+	return ret;
+}
+
+/*
+ * When a BNDSTX instruction attempts to save bounds to a BD entry
+ * with the lack of the valid bit being set, a #BR is generated.
+ * This is an indication that no BT exists for this entry. In this
+ * case the fault handler will allocate a new BT.
+ *
+ * With 32-bit mode, the size of BD is 4MB, and the size of each
+ * bound table is 16KB. With 64-bit mode, the size of BD is 2GB,
+ * and the size of each bound table is 4MB.
+ */
+int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
+{
+	unsigned long status;
+	unsigned long bd_entry, bd_base;
+
+	bd_base = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;
+	status = xsave_buf->bndcsr.status_reg;
+
+	bd_entry = status & MPX_BNDSTA_ADDR_MASK;
+	if ((bd_entry < bd_base) ||
+	    (bd_entry >= bd_base + MPX_BD_SIZE_BYTES))
+		return -EINVAL;
+
+	return allocate_bt((long __user *)bd_entry);
+}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 0d0e922..396a88b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -60,6 +60,7 @@
 #include
 #include
 #include
+#include <asm/mpx.h>
 
 #ifdef CONFIG_X86_64
 #include
@@ -228,7 +229,6 @@ dotraplinkage void do_##name(struct pt_regs *regs, long error_code)	\
 
 DO_ERROR(X86_TRAP_DE,     SIGFPE,  "divide error",                divide_error)
 DO_ERROR(X86_TRAP_OF,     SIGSEGV, "overflow",                    overflow)
-DO_ERROR(X86_TRAP_BR,     SIGSEGV, "bounds",                      bounds)
 DO_ERROR(X86_TRAP_UD,     SIGILL,  "invalid opcode",              invalid_op)
 DO_ERROR(X86_TRAP_OLD_MF, SIGFPE,  "coprocessor segment overrun", coprocessor_segment_overrun)
 DO_ERROR(X86_TRAP_TS,     SIGSEGV, "invalid TSS",                 invalid_TSS)
@@ -278,6 +278,59 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
 }
 #endif
 
+dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
+{
+	enum ctx_state prev_state;
+	unsigned long status;
+	struct xsave_struct *xsave_buf;
+	struct task_stru
[PATCH v7 00/10] Intel MPX support
This patchset adds support for the Memory Protection Extensions (MPX) feature found in future Intel processors.

MPX can be used in conjunction with compiler changes to check memory references, for those references whose compile-time normal intentions are usurped at runtime due to buffer overflow or underflow. MPX provides this capability at very low performance overhead for newly compiled code, and provides compatibility mechanisms with legacy software components.

The MPX architecture is designed to allow a machine to run both MPX enabled software and legacy software that is MPX unaware. In such a case, the legacy software does not benefit from MPX, but it also does not experience any change in functionality or reduction in performance.

More information about Intel MPX can be found in "Intel(R) Architecture Instruction Set Extensions Programming Reference".

To get the advantage of MPX, changes are required in the OS kernel, binutils, compiler, and system libraries.

The new GCC option -fmpx is introduced to utilize MPX instructions. Currently, GCC compiler sources with MPX support are available in a separate branch in the common GCC SVN repository. See the GCC SVN page (http://gcc.gnu.org/svn.html) for details.

To have the full protection, we had to add MPX instrumentation to all the necessary Glibc routines (e.g. memcpy) written in assembler, and compile Glibc with the MPX enabled GCC compiler. Currently, MPX enabled Glibc sources can be found in the Glibc git repository.

Enabling an application to use MPX will generally not require source code updates, but there is some runtime code, which is responsible for configuring and enabling MPX, needed in order to make use of MPX. For most applications this runtime support will be available by linking to a library supplied by the compiler, or possibly it will come directly from the OS once OS versions that support MPX are available.

MPX kernel code, namely this patchset, has mainly the 2 responsibilities: provide handlers for bounds faults (#BR), and manage bounds memory.

The high-level areas modified in the patchset are as follows:
1) struct siginfo is extended to include bound violation information.
2) two prctl() commands are added to do performance optimization.

Currently no hardware with MPX ISA is available, but it is always possible to use SDE (Intel(R) Software Development Emulator) instead, which can be downloaded from
http://software.intel.com/en-us/articles/intel-software-development-emulator

In addition, this patchset has been tested on an Intel internal hardware platform for MPX testing.

Future TODO items:
1) support 32-bit binaries on 64-bit kernels.

Changes since v1:
  * check to see if #BR occurred in userspace or kernel space.
  * use generic structure and macro as much as possible when decoding mpx instructions.

Changes since v2:
  * fix some compile warnings.
  * update documentation.

Changes since v3:
  * correct some syntax errors in the documentation, and document the extended struct siginfo.
  * kill the process when the error code of BNDSTATUS is 3.
  * add some comments.
  * remove new prctl() commands.
  * fix some compile warnings for 32-bit.

Changes since v4:
  * raise SIGBUS if the allocations of the bound tables fail.

Changes since v5:
  * hook the unmap() path to clean up unused bounds tables, and use a new prctl() command to register the bounds directory address with struct mm_struct, to check whether one process is MPX enabled during unmap().
  * in order to track MPX memory usage precisely, add an MPX specific mmap interface and one VM_MPX flag to check whether a VMA is an MPX bounds table.
  * add the macro cpu_has_mpx to do performance optimization.
  * sync struct siginfo for mips with the general version to avoid a build issue.

Changes since v6:
  * because arch_vma_name was removed, this patchset has to set an MPX specific ->vm_ops to do the same thing.
  * fix warnings for 32 bit arch.
  * add more description into these patches.

Qiaowei Ren (10):
  x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific
  x86, mpx: add MPX specific mmap interface
  x86, mpx: add macro cpu_has_mpx
  x86, mpx: hook #BR exception handler to allocate bound tables
  x86, mpx: extend siginfo structure to include bound violation information
  mips: sync struct siginfo with general version
  x86, mpx: decode MPX instruction to get bound violation information
  x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  x86, mpx: cleanup unused bound tables
  x86, mpx: add documentation on Intel MPX

 Documentation/x86/intel_mpx.txt      |  127 +++++++++
 arch/mips/include/uapi/asm/siginfo.h |    4 +
 arch/x86/Kconfig                     |    4 +
 arch/x86/include/asm/cpufeature.h    |    6 +
 arch/x86/include/asm/mmu_context.h   |   16 ++
 arch/x86/include/asm/mpx.h           |   91 ++++++
 arch/x86/include/asm/processor.h     |   18 ++
 arch/x86/kernel/Makefile             |    1 +
 arch/x86/kernel/mpx.c
[PATCH v7 09/10] x86, mpx: cleanup unused bound tables
Since the kernel allocated those tables on-demand without userspace knowledge, it is also responsible for freeing them when the associated mappings go away.

Here, the solution for this issue is to hook do_munmap() to check whether one process is MPX enabled. If yes, those bounds tables covered in the virtual address region which is being unmapped will be freed also.

Signed-off-by: Qiaowei Ren
---
 arch/x86/include/asm/mmu_context.h |   16 +++
 arch/x86/include/asm/mpx.h         |    9 ++
 arch/x86/mm/mpx.c                  |  181 ++++++++++++++++++++++++++++
 include/asm-generic/mmu_context.h  |    6 +
 mm/mmap.c                          |    2 +
 5 files changed, 214 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index be12c53..af70d4f 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include <asm/mpx.h>
 
 #ifndef CONFIG_PARAVIRT
 #include
@@ -96,4 +97,19 @@ do { \
 } while (0)
 #endif
 
+static inline void arch_unmap(struct mm_struct *mm,
+		struct vm_area_struct *vma,
+		unsigned long start, unsigned long end)
+{
+#ifdef CONFIG_X86_INTEL_MPX
+	/*
+	 * Check whether this vma comes from MPX-enabled application.
+	 * If so, release this vma related bound tables.
+	 */
+	if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
+		mpx_unmap(mm, start, end);
+
+#endif
+}
+
 #endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 6cb0853..e848a74 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -42,6 +42,13 @@
 #define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
 #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))
 
+#define MPX_BD_ENTRY_MASK	((1<<MPX_BD_ENTRY_OFFSET)-1)
+#define MPX_BT_ENTRY_MASK	((1<<MPX_BT_ENTRY_OFFSET)-1)
+#define MPX_GET_BD_ENTRY_OFFSET(addr)	((((addr)>>(MPX_BT_ENTRY_OFFSET+ \
+		MPX_IGN_BITS)) & MPX_BD_ENTRY_MASK) << MPX_BD_ENTRY_SHIFT)
+#define MPX_GET_BT_ENTRY_OFFSET(addr)	((((addr)>>MPX_IGN_BITS) & \
+		MPX_BT_ENTRY_MASK) << MPX_BT_ENTRY_SHIFT)
+
 #define MPX_BNDSTA_ERROR_CODE	0x3
 #define MPX_BNDCFG_ENABLE_FLAG	0x1
 #define MPX_BD_ENTRY_VALID_FLAG	0x1
@@ -63,6 +70,8 @@ struct mpx_insn {
 #define MAX_MPX_INSN_SIZE	15
 
 unsigned long mpx_mmap(unsigned long len);
+void mpx_unmap(struct mm_struct *mm,
+		unsigned long start, unsigned long end);
 
 #ifdef CONFIG_X86_INTEL_MPX
 int do_mpx_bt_fault(struct xsave_struct *xsave_buf);
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index e1b28e6..d29ec9c 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -2,6 +2,7 @@
 #include <linux/syscalls.h>
 #include <asm/mpx.h>
 #include <asm/mman.h>
+#include
 #include <linux/sched/sysctl.h>
 
 static const char *mpx_mapping_name(struct vm_area_struct *vma)
@@ -77,3 +78,183 @@ out:
 	up_write(&mm->mmap_sem);
 	return ret;
 }
+
+/*
+ * Get the base of bounds tables pointed by specific bounds
+ * directory entry.
+ */
+static int get_bt_addr(long __user *bd_entry, unsigned long *bt_addr,
+		unsigned int *valid)
+{
+	if (get_user(*bt_addr, bd_entry))
+		return -EFAULT;
+
+	*valid = *bt_addr & MPX_BD_ENTRY_VALID_FLAG;
+	*bt_addr &= MPX_BT_ADDR_MASK;
+
+	/*
+	 * If this bounds directory entry is nonzero, and meanwhile
+	 * the valid bit is zero, one SIGSEGV will be produced due to
+	 * this unexpected situation.
+	 */
+	if (!(*valid) && *bt_addr)
+		force_sig(SIGSEGV, current);
+
+	return 0;
+}
+
+/*
+ * Free the backing physical pages of bounds table 'bt_addr'.
+ * Assume start...end is within that bounds table.
+ */
+static void zap_bt_entries(struct mm_struct *mm, unsigned long bt_addr,
+		unsigned long start, unsigned long end)
+{
+	struct vm_area_struct *vma;
+
+	/* Find the vma which overlaps this bounds table */
+	vma = find_vma(mm, bt_addr);
+	if (!vma || vma->vm_start > bt_addr ||
+			vma->vm_end < bt_addr+MPX_BT_SIZE_BYTES)
+		return;
+
+	zap_page_range(vma, start, end, NULL);
+}
+
+static void unmap_single_bt(struct mm_struct *mm, long __user *bd_entry,
+		unsigned long bt_addr)
+{
+	if (user_atomic_cmpxchg_inatomic(&bt_addr, bd_entry,
+			bt_addr | MPX_BD_ENTRY_VALID_FLAG, 0))
+		return;
+
+	/*
+	 * to avoid recursion, do_munmap() will check whether it comes
+	 * from one bounds table through VM_MPX flag.
+	 */
+	do_munmap(mm, bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES);
+}
+
+/*
+ * If the bounds table pointed by bounds directory 'bd_entry' is
+ * not shared, unmap this whole bounds table. Otherwise, only free
+ * those backing physical pages of bounds table entries covered i
[PATCH v7 03/10] x86, mpx: add macro cpu_has_mpx
In order to do performance optimization, this patch adds the macro cpu_has_mpx, which will directly return 0 when MPX is not supported by the kernel.

The community gave a lot of comments on this macro cpu_has_mpx in the previous version. Dave will introduce a patchset about disabled features to fix it later.

In this code:

	if (cpu_has_mpx)
		do_some_mpx_thing();

The patch series from Dave will introduce a new macro cpu_feature_enabled() (if merged after this patchset) to replace cpu_has_mpx:

	if (cpu_feature_enabled(X86_FEATURE_MPX))
		do_some_mpx_thing();

Signed-off-by: Qiaowei Ren
---
 arch/x86/include/asm/cpufeature.h |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index e265ff9..f302d08 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -339,6 +339,12 @@ extern const char * const x86_power_flags[32];
 #define cpu_has_eager_fpu	boot_cpu_has(X86_FEATURE_EAGER_FPU)
 #define cpu_has_topoext		boot_cpu_has(X86_FEATURE_TOPOEXT)
 
+#ifdef CONFIG_X86_INTEL_MPX
+#define cpu_has_mpx	boot_cpu_has(X86_FEATURE_MPX)
+#else
+#define cpu_has_mpx	0
+#endif /* CONFIG_X86_INTEL_MPX */
+
 #ifdef CONFIG_X86_64
 #undef cpu_has_vme
--
1.7.1
[PATCH v7 07/10] x86, mpx: decode MPX instruction to get bound violation information
This patch sets the bound violation fields of the siginfo struct in the #BR exception handler by decoding the user instruction and constructing the faulting pointer.

This patch doesn't use the generic decoder, and implements a limited special-purpose decoder to decode MPX instructions, simply because the generic decoder is very heavyweight not just in terms of performance but in terms of interface -- because it has to.

Signed-off-by: Qiaowei Ren
---
 arch/x86/include/asm/mpx.h |   23 +++
 arch/x86/kernel/mpx.c      |  299 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/traps.c    |    6 +
 3 files changed, 328 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index b7598ac..780af63 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -3,6 +3,7 @@
 
 #include <linux/types.h>
 #include <asm/ptrace.h>
+#include <asm/insn.h>
 
 #ifdef CONFIG_X86_64
@@ -44,15 +45,37 @@
 #define MPX_BNDSTA_ERROR_CODE	0x3
 #define MPX_BD_ENTRY_VALID_FLAG	0x1
 
+struct mpx_insn {
+	struct insn_field rex_prefix;	/* REX prefix */
+	struct insn_field modrm;
+	struct insn_field sib;
+	struct insn_field displacement;
+
+	unsigned char addr_bytes;	/* effective address size */
+	unsigned char limit;
+	unsigned char x86_64;
+
+	const unsigned char *kaddr;	/* kernel address of insn to analyze */
+	const unsigned char *next_byte;
+};
+
+#define MAX_MPX_INSN_SIZE	15
+
 unsigned long mpx_mmap(unsigned long len);
 
 #ifdef CONFIG_X86_INTEL_MPX
 int do_mpx_bt_fault(struct xsave_struct *xsave_buf);
+void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+		struct xsave_struct *xsave_buf);
 #else
 static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
 {
 	return -EINVAL;
 }
+static inline void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+		struct xsave_struct *xsave_buf)
+{
+}
 #endif /* CONFIG_X86_INTEL_MPX */
 
 #endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
index f02dcea..c1957a8 100644
--- a/arch/x86/kernel/mpx.c
+++ b/arch/x86/kernel/mpx.c
@@ -2,6 +2,275 @@
 #include <linux/syscalls.h>
 #include <asm/mpx.h>
 
+enum reg_type {
+	REG_TYPE_RM = 0,
+	REG_TYPE_INDEX,
+	REG_TYPE_BASE,
+};
+
+static unsigned long get_reg(struct mpx_insn *insn, struct pt_regs *regs,
+		enum reg_type type)
+{
+	int regno = 0;
+	unsigned char modrm = (unsigned char)insn->modrm.value;
+	unsigned char sib = (unsigned char)insn->sib.value;
+
+	static const int regoff[] = {
+		offsetof(struct pt_regs, ax),
+		offsetof(struct pt_regs, cx),
+		offsetof(struct pt_regs, dx),
+		offsetof(struct pt_regs, bx),
+		offsetof(struct pt_regs, sp),
+		offsetof(struct pt_regs, bp),
+		offsetof(struct pt_regs, si),
+		offsetof(struct pt_regs, di),
+#ifdef CONFIG_X86_64
+		offsetof(struct pt_regs, r8),
+		offsetof(struct pt_regs, r9),
+		offsetof(struct pt_regs, r10),
+		offsetof(struct pt_regs, r11),
+		offsetof(struct pt_regs, r12),
+		offsetof(struct pt_regs, r13),
+		offsetof(struct pt_regs, r14),
+		offsetof(struct pt_regs, r15),
+#endif
+	};
+
+	switch (type) {
+	case REG_TYPE_RM:
+		regno = X86_MODRM_RM(modrm);
+		if (X86_REX_B(insn->rex_prefix.value) == 1)
+			regno += 8;
+		break;
+
+	case REG_TYPE_INDEX:
+		regno = X86_SIB_INDEX(sib);
+		if (X86_REX_X(insn->rex_prefix.value) == 1)
+			regno += 8;
+		break;
+
+	case REG_TYPE_BASE:
+		regno = X86_SIB_BASE(sib);
+		if (X86_REX_B(insn->rex_prefix.value) == 1)
+			regno += 8;
+		break;
+
+	default:
+		break;
+	}
+
+	return regs_get_register(regs, regoff[regno]);
+}
+
+/*
+ * return the address being referenced by the instruction
+ * for rm=3 returning the content of the rm reg
+ * for rm!=3 calculates the address using SIB and Disp
+ */
+static unsigned long get_addr_ref(struct mpx_insn *insn, struct pt_regs *regs)
+{
+	unsigned long addr;
+	unsigned long base;
+	unsigned long indx;
+	unsigned char modrm = (unsigned char)insn->modrm.value;
+	unsigned char sib = (unsigned char)insn->sib.value;
+
+	if (X86_MODRM_MOD(modrm) == 3) {
+		addr = get_reg(insn, regs, REG_TYPE_RM);
+	} else {
+		if (insn->sib.nbytes) {
+			base = get_reg(insn, regs, REG_TYPE_BASE);
+			indx = get_reg(insn, regs, REG_TYPE_INDEX);
+			addr = base + indx * (1 << X86_SIB_SCALE(sib));
+		} else {
[PATCH v7 10/10] x86, mpx: add documentation on Intel MPX
This patch adds the Documentation/x86/intel_mpx.txt file with some information about Intel MPX. Signed-off-by: Qiaowei Ren --- Documentation/x86/intel_mpx.txt | 127 +++ 1 files changed, 127 insertions(+), 0 deletions(-) create mode 100644 Documentation/x86/intel_mpx.txt diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.txt new file mode 100644 index 000..1af9809 --- /dev/null +++ b/Documentation/x86/intel_mpx.txt @@ -0,0 +1,127 @@ +1. Intel(R) MPX Overview + + +Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new +capability introduced into Intel Architecture. Intel MPX provides +hardware features that can be used in conjunction with compiler +changes to check memory references, for those references whose +compile-time normal intentions are usurped at runtime due to +buffer overflow or underflow. + +For more information, please refer to Intel(R) Architecture +Instruction Set Extensions Programming Reference, Chapter 9: +Intel(R) Memory Protection Extensions. + +Note: Currently no hardware with MPX ISA is available but it is always +possible to use SDE (Intel(R) Software Development Emulator) instead, +which can be downloaded from +http://software.intel.com/en-us/articles/intel-software-development-emulator + + +2. How does MPX kernel code work + + +Handling #BR faults caused by MPX +- + +When MPX is enabled, there are 2 new situations that can generate +#BR faults. + * bounds violation caused by MPX instructions. + * new bounds tables (BT) need to be allocated to save bounds. + +We hook the #BR handler to handle these two new situations. + +Decoding MPX instructions +- + +If a #BR is generated due to a bounds violation caused by MPX, +we need to decode the MPX instruction to get the violation address and +set this address into the extended struct siginfo. + +The _sigfault field of struct siginfo is extended as follows: + +87 /* SIGILL, SIGFPE, SIGSEGV, SIGBUS */ +88 struct { +89 void __user *_addr; /* faulting insn/memory ref. */ +90 #ifdef __ARCH_SI_TRAPNO +91 int _trapno;/* TRAP # which caused the signal */ +92 #endif +93 short _addr_lsb; /* LSB of the reported address */ +94 struct { +95 void __user *_lower; +96 void __user *_upper; +97 } _addr_bnd; +98 } _sigfault; + +The '_addr' field refers to the violation address, and the new '_addr_bnd' +field refers to the lower/upper bounds when a #BR is caused. + +Glibc will also be updated to support this new siginfo. So users +can get the violation address and bounds when bounds violations occur. + +Freeing unused bounds tables + + +When a BNDSTX instruction attempts to save bounds to a bounds directory +entry marked as invalid, a #BR is generated. This is an indication that +no bounds table exists for this entry. In this case the fault handler +will allocate a new bounds table on demand. + +Since the kernel allocated those tables on-demand without userspace +knowledge, it is also responsible for freeing them when the associated +mappings go away. + +Here, the solution for this issue is to hook do_munmap() to check +whether one process is MPX enabled. If yes, those bounds tables covered +by the virtual address region which is being unmapped will also be freed. + +Adding new prctl commands +- + +The runtime library in userspace is responsible for allocation of the +bounds directory. So the kernel has to use the XSAVE instruction to get +the base of the bounds directory from the BNDCFG register. + +But XSAVE is expected to be very expensive. In order to do performance +optimization, we add new prctl commands to get the base of the +bounds directory to be used in future. + +Two new prctl commands are added to register and unregister MPX related +resources. + +155#define PR_MPX_REGISTER 43 +156#define PR_MPX_UNREGISTER 44 + +The base of the bounds directory is set into mm_struct during +PR_MPX_REGISTER command execution. This member can be used to +check whether one application is MPX enabled. + + +3.
Tips +=== + +1) Users are not allowed to create bounds tables and point the bounds +directory at them in userspace. In fact, it is also not necessary +for users to create bounds tables in userspace. + +When a #BR fault is produced due to an invalid entry, a bounds table will be +created in the kernel on demand and the kernel will not forward this fault to +userspace. So userspace can't receive a #BR fault for an invalid entry, and +it is also not necessary for users to create bounds tables by themselves. + +Certainly users can allocate bounds tables and forcibly point the bounds +directory at them through the XSAVE instruction, and then set valid bit +of bounds entry
[PATCH v7 01/10] x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific
An MPX-enabled application will possibly create a lot of bounds tables in its process address space to save bounds information. These tables can take up huge swaths of memory (as much as 80% of the memory on the system) even if we clean them up aggressively. Being this huge, we need a way to track their memory use. If we want to track them, we essentially have two options: 1. walk the multi-GB (in virtual space) bounds directory to locate all the VMAs and walk them 2. Find a way to distinguish MPX bounds-table VMAs from normal anonymous VMAs and use some existing mechanism to walk them We expect (1) will be prohibitively expensive. For (2), we only need a single bit, and we've chosen to use a VM_ flag. We understand that they are scarce and are open to other options. There is one potential hybrid approach: check the bounds directory entry for any anonymous VMA that could possibly contain a bounds table. This is less expensive than (1), but still requires reading a pointer out of userspace for every VMA that we iterate over. 
Signed-off-by: Qiaowei Ren --- fs/proc/task_mmu.c |1 + include/linux/mm.h |2 ++ 2 files changed, 3 insertions(+), 0 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index cfa63ee..b2bc755 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -549,6 +549,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) [ilog2(VM_GROWSDOWN)] = "gd", [ilog2(VM_PFNMAP)] = "pf", [ilog2(VM_DENYWRITE)] = "dw", + [ilog2(VM_MPX)] = "mp", [ilog2(VM_LOCKED)] = "lo", [ilog2(VM_IO)] = "io", [ilog2(VM_SEQ_READ)]= "sr", diff --git a/include/linux/mm.h b/include/linux/mm.h index e03dd29..44c75d7 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -127,6 +127,8 @@ extern unsigned int kobjsize(const void *objp); #define VM_HUGETLB 0x00400000 /* Huge TLB Page VM */ #define VM_NONLINEAR 0x00800000 /* Is non-linear (remap_file_pages) */ #define VM_ARCH_1 0x01000000 /* Architecture-specific flag */ +/* MPX specific bounds table or bounds directory (x86) */ +#define VM_MPX 0x02000000 #define VM_DONTDUMP 0x04000000 /* Do not include in the core dump */ #ifdef CONFIG_MEM_SOFT_DIRTY -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v7 06/10] mips: sync struct siginfo with general version
Due to the new bound-violation fields added into struct siginfo, this patch syncs the MIPS version with the generic one to avoid build issues. Signed-off-by: Qiaowei Ren --- arch/mips/include/uapi/asm/siginfo.h | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/arch/mips/include/uapi/asm/siginfo.h b/arch/mips/include/uapi/asm/siginfo.h index e811744..d08f83f 100644 --- a/arch/mips/include/uapi/asm/siginfo.h +++ b/arch/mips/include/uapi/asm/siginfo.h @@ -92,6 +92,10 @@ typedef struct siginfo { int _trapno;/* TRAP # which caused the signal */ #endif short _addr_lsb; + struct { + void __user *_lower; + void __user *_upper; + } _addr_bnd; } _sigfault; /* SIGPOLL, SIGXFSZ (To do ...) */ -- 1.7.1
[PATCH v7 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl() commands. These commands can be used to register and unregister MPX related resources on the x86 platform. The base of the bounds directory is set into mm_struct during PR_MPX_REGISTER command execution. This member can be used to check whether one application is MPX enabled. Signed-off-by: Qiaowei Ren --- arch/x86/include/asm/mpx.h |1 + arch/x86/include/asm/processor.h | 18 arch/x86/kernel/mpx.c| 56 ++ include/linux/mm_types.h |3 ++ include/uapi/linux/prctl.h |6 kernel/sys.c | 12 6 files changed, 96 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h index 780af63..6cb0853 100644 --- a/arch/x86/include/asm/mpx.h +++ b/arch/x86/include/asm/mpx.h @@ -43,6 +43,7 @@ #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT)) #define MPX_BNDSTA_ERROR_CODE 0x3 +#define MPX_BNDCFG_ENABLE_FLAG 0x1 #define MPX_BD_ENTRY_VALID_FLAG0x1 struct mpx_insn { diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index a4ea023..6e0966e 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -952,6 +952,24 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip, extern int get_tsc_mode(unsigned long adr); extern int set_tsc_mode(unsigned int val); +/* Register/unregister a process' MPX related resource */ +#define MPX_REGISTER(tsk) mpx_register((tsk)) +#define MPX_UNREGISTER(tsk) mpx_unregister((tsk)) + +#ifdef CONFIG_X86_INTEL_MPX +extern int mpx_register(struct task_struct *tsk); +extern int mpx_unregister(struct task_struct *tsk); +#else +static inline int mpx_register(struct task_struct *tsk) +{ + return -EINVAL; +} +static inline int mpx_unregister(struct task_struct *tsk) +{ + return -EINVAL; +} +#endif /* CONFIG_X86_INTEL_MPX */ + extern u16 amd_get_nb_id(int cpu); static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves) diff --git a/arch/x86/kernel/mpx.c 
b/arch/x86/kernel/mpx.c index c1957a8..6b7e526 100644 --- a/arch/x86/kernel/mpx.c +++ b/arch/x86/kernel/mpx.c @@ -1,6 +1,62 @@ #include <linux/kernel.h> #include <linux/syscalls.h> +#include <linux/prctl.h> #include <asm/mpx.h> +#include <asm/i387.h> +#include <asm/fpu-internal.h> + +/* + * This should only be called when cpuid has been checked + * and we are sure that MPX is available. + */ +static __user void *task_get_bounds_dir(struct task_struct *tsk) +{ + struct xsave_struct *xsave_buf; + + fpu_xsave(&tsk->thread.fpu); + xsave_buf = &(tsk->thread.fpu.state->xsave); + if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG)) + return NULL; + + return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u & + MPX_BNDCFG_ADDR_MASK); +} + +int mpx_register(struct task_struct *tsk) +{ + struct mm_struct *mm = tsk->mm; + + if (!cpu_has_mpx) + return -EINVAL; + + /* +* The userspace runtime will be responsible for allocation of +* the bounds directory. Then, it will save the base of the bounds +* directory into the XSAVE/XRSTOR Save Area and enable MPX through +* the XRSTOR instruction. +* +* fpu_xsave() is expected to be very expensive. In order to do +* performance optimization, here we get the base of the bounds +* directory and then save it into mm_struct to be used in future. 
+*/ + mm->bd_addr = task_get_bounds_dir(tsk); + if (!mm->bd_addr) + return -EINVAL; + + pr_debug("MPX BD base address %p\n", mm->bd_addr); + return 0; +} + +int mpx_unregister(struct task_struct *tsk) +{ + struct mm_struct *mm = current->mm; + + if (!cpu_has_mpx) + return -EINVAL; + + mm->bd_addr = NULL; + return 0; +} enum reg_type { REG_TYPE_RM = 0, diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 96c5750..131b5b3 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -454,6 +454,9 @@ struct mm_struct { bool tlb_flush_pending; #endif struct uprobes_state uprobes_state; +#ifdef CONFIG_X86_INTEL_MPX + void __user *bd_addr; /* address of the bounds directory */ +#endif }; static inline void mm_init_cpumask(struct mm_struct *mm) diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index 58afc04..ce86fa9 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -152,4 +152,10 @@ #define PR_SET_THP_DISABLE 41 #define PR_GET_THP_DISABLE 42 +/* + * Register/unregister MPX related resource. + */ +#define PR_MPX_REGISTER43 +#define PR_MPX_UNREGISTER 44 + #endif /* _LINUX_PRCTL_H */ diff --git a/kernel/sys.c b/kernel/sys.c index
[PATCH v7 03/10] x86, mpx: add macro cpu_has_mpx
In order to do performance optimization, this patch adds the macro cpu_has_mpx, which directly returns 0 when MPX is not supported by the kernel. The community gave a lot of comments on this cpu_has_mpx macro in the previous version. Dave will introduce a patchset about disabled features to fix it later. In this code: if (cpu_has_mpx) do_some_mpx_thing(); The patch series from Dave will introduce a new macro cpu_feature_enabled() (if merged after this patchset) to replace cpu_has_mpx: if (cpu_feature_enabled(X86_FEATURE_MPX)) do_some_mpx_thing(); Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/x86/include/asm/cpufeature.h |6 ++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index e265ff9..f302d08 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -339,6 +339,12 @@ extern const char * const x86_power_flags[32]; #define cpu_has_eager_fpu boot_cpu_has(X86_FEATURE_EAGER_FPU) #define cpu_has_topoext boot_cpu_has(X86_FEATURE_TOPOEXT) +#ifdef CONFIG_X86_INTEL_MPX +#define cpu_has_mpx boot_cpu_has(X86_FEATURE_MPX) +#else +#define cpu_has_mpx 0 +#endif /* CONFIG_X86_INTEL_MPX */ + #ifdef CONFIG_X86_64 #undef cpu_has_vme -- 1.7.1
[PATCH v7 07/10] x86, mpx: decode MPX instruction to get bound violation information
This patch sets the bound violation fields of the siginfo struct in the #BR exception handler by decoding the user instruction and constructing the faulting pointer. This patch doesn't use the generic decoder, and implements a limited special-purpose decoder to decode MPX instructions, simply because the generic decoder is very heavyweight not just in terms of performance but in terms of interface -- because it has to. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/x86/include/asm/mpx.h | 23 arch/x86/kernel/mpx.c | 299 arch/x86/kernel/traps.c|6 + 3 files changed, 328 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h index b7598ac..780af63 100644 --- a/arch/x86/include/asm/mpx.h +++ b/arch/x86/include/asm/mpx.h @@ -3,6 +3,7 @@ #include <linux/types.h> #include <asm/ptrace.h> +#include <asm/insn.h> #ifdef CONFIG_X86_64 @@ -44,15 +45,37 @@ #define MPX_BNDSTA_ERROR_CODE 0x3 #define MPX_BD_ENTRY_VALID_FLAG 0x1 +struct mpx_insn { + struct insn_field rex_prefix; /* REX prefix */ + struct insn_field modrm; + struct insn_field sib; + struct insn_field displacement; + + unsigned char addr_bytes; /* effective address size */ + unsigned char limit; + unsigned char x86_64; + + const unsigned char *kaddr; /* kernel address of insn to analyze */ + const unsigned char *next_byte; +}; + +#define MAX_MPX_INSN_SIZE 15 + unsigned long mpx_mmap(unsigned long len); #ifdef CONFIG_X86_INTEL_MPX int do_mpx_bt_fault(struct xsave_struct *xsave_buf); +void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info, + struct xsave_struct *xsave_buf); #else static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf) { return -EINVAL; } +static inline void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info, + struct xsave_struct *xsave_buf) +{ +} #endif /* CONFIG_X86_INTEL_MPX */ #endif /* _ASM_X86_MPX_H */ diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c index f02dcea..c1957a8 100644 --- a/arch/x86/kernel/mpx.c +++ b/arch/x86/kernel/mpx.c @@ -2,6 
+2,275 @@ #include <linux/syscalls.h> #include <asm/mpx.h> +enum reg_type { + REG_TYPE_RM = 0, + REG_TYPE_INDEX, + REG_TYPE_BASE, +}; + +static unsigned long get_reg(struct mpx_insn *insn, struct pt_regs *regs, +enum reg_type type) +{ + int regno = 0; + unsigned char modrm = (unsigned char)insn->modrm.value; + unsigned char sib = (unsigned char)insn->sib.value; + + static const int regoff[] = { + offsetof(struct pt_regs, ax), + offsetof(struct pt_regs, cx), + offsetof(struct pt_regs, dx), + offsetof(struct pt_regs, bx), + offsetof(struct pt_regs, sp), + offsetof(struct pt_regs, bp), + offsetof(struct pt_regs, si), + offsetof(struct pt_regs, di), +#ifdef CONFIG_X86_64 + offsetof(struct pt_regs, r8), + offsetof(struct pt_regs, r9), + offsetof(struct pt_regs, r10), + offsetof(struct pt_regs, r11), + offsetof(struct pt_regs, r12), + offsetof(struct pt_regs, r13), + offsetof(struct pt_regs, r14), + offsetof(struct pt_regs, r15), +#endif + }; + + switch (type) { + case REG_TYPE_RM: + regno = X86_MODRM_RM(modrm); + if (X86_REX_B(insn->rex_prefix.value) == 1) + regno += 8; + break; + + case REG_TYPE_INDEX: + regno = X86_SIB_INDEX(sib); + if (X86_REX_X(insn->rex_prefix.value) == 1) + regno += 8; + break; + + case REG_TYPE_BASE: + regno = X86_SIB_BASE(sib); + if (X86_REX_B(insn->rex_prefix.value) == 1) + regno += 8; + break; + + default: + break; + } + + return regs_get_register(regs, regoff[regno]); +} + +/* + * return the address being referenced by the instruction + * for rm=3 returning the content of the rm reg + * for rm!=3 calculates the address using SIB and Disp + */ +static unsigned long get_addr_ref(struct mpx_insn *insn, struct pt_regs *regs) +{ + unsigned long addr; + unsigned long base; + unsigned long indx; + unsigned char modrm = (unsigned char)insn->modrm.value; + unsigned char sib = (unsigned char)insn->sib.value; + + if (X86_MODRM_MOD(modrm) == 3) { + addr = get_reg(insn, regs, REG_TYPE_RM); + } else { + if (insn->sib.nbytes) { + base = get_reg(insn, regs, REG_TYPE_BASE); 
+ indx = get_reg(insn, regs, REG_TYPE_INDEX); + addr = base + indx * (1 << X86_SIB_SCALE(sib
[PATCH v7 04/10] x86, mpx: hook #BR exception handler to allocate bound tables
This patch handles a #BR exception for non-existent tables by carving the space out of the normal process's address space (essentially calling mmap() from inside the kernel) and then pointing the bounds-directory over to it. The tables need to be accessed and controlled by userspace because the compiler generates instructions for MPX-enabled code which frequently store and retrieve entries from the bounds tables. Any direct kernel involvement (like a syscall) to access the tables would destroy performance since these are so frequent. The tables are carved out of userspace because we have no better spot to put them. For each pointer which is being tracked by MPX, the bounds tables contain 4 longs worth of data, and the tables are indexed virtually. If we were to preallocate the tables, we would theoretically need to allocate 4x the virtual space that we have available for userspace somewhere else. We don't have that room in the kernel address space. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/x86/include/asm/mpx.h | 20 ++ arch/x86/kernel/Makefile |1 + arch/x86/kernel/mpx.c | 60 arch/x86/kernel/traps.c| 55 +++- 4 files changed, 135 insertions(+), 1 deletions(-) create mode 100644 arch/x86/kernel/mpx.c diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h index 5725ac4..b7598ac 100644 --- a/arch/x86/include/asm/mpx.h +++ b/arch/x86/include/asm/mpx.h @@ -18,6 +18,8 @@ #define MPX_BT_ENTRY_SHIFT 5 #define MPX_IGN_BITS 3 +#define MPX_BD_ENTRY_TAIL 3 + #else #define MPX_BD_ENTRY_OFFSET20 @@ -26,13 +28,31 @@ #define MPX_BT_ENTRY_SHIFT 4 #define MPX_IGN_BITS 2 +#define MPX_BD_ENTRY_TAIL 2 + #endif +#define MPX_BNDSTA_TAIL 2 +#define MPX_BNDCFG_TAIL 12 +#define MPX_BNDSTA_ADDR_MASK (~((1UL<<MPX_BNDSTA_TAIL)-1)) +#define MPX_BNDCFG_ADDR_MASK (~((1UL<<MPX_BNDCFG_TAIL)-1)) +#define MPX_BT_ADDR_MASK (~((1UL<<MPX_BD_ENTRY_TAIL)-1)) + #define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT)) #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT)) #define MPX_BNDSTA_ERROR_CODE 0x3 +#define MPX_BD_ENTRY_VALID_FLAG 0x1 unsigned long mpx_mmap(unsigned long len); +#ifdef CONFIG_X86_INTEL_MPX +int do_mpx_bt_fault(struct xsave_struct *xsave_buf); +#else +static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf) +{ + return -EINVAL; +} +#endif /* CONFIG_X86_INTEL_MPX */ + #endif /* _ASM_X86_MPX_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 047f9ff..5e81e16 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -43,6 +43,7 @@ obj-$(CONFIG_PREEMPT) += preempt.o obj-y += process.o obj-y += i387.o xsave.o +obj-$(CONFIG_X86_INTEL_MPX)+= mpx.o obj-y += ptrace.o obj-$(CONFIG_X86_32) += tls.o obj-$(CONFIG_IA32_EMULATION) += tls.o diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c new file mode 100644 index 000..f02dcea --- /dev/null +++ b/arch/x86/kernel/mpx.c @@ -0,0 +1,60 @@ +#include <linux/kernel.h> +#include <linux/syscalls.h> +#include <asm/mpx.h> + +static int allocate_bt(long __user *bd_entry) +{ + unsigned long bt_addr, old_val = 0; + int ret = 0; + + bt_addr = mpx_mmap(MPX_BT_SIZE_BYTES); + if (IS_ERR((void *)bt_addr)) + return bt_addr; + bt_addr = (bt_addr & MPX_BT_ADDR_MASK) | MPX_BD_ENTRY_VALID_FLAG; + + ret = user_atomic_cmpxchg_inatomic(&old_val, bd_entry, 0, bt_addr); + if (ret) + goto out; + + /* +* there is an existing bounds table pointed at this bounds +* directory entry, and so we need to free the bounds table +* allocated just now. +*/ + if (old_val) + goto out; + + pr_debug("Allocate bounds table %lx at entry %p\n", + bt_addr, bd_entry); + return 0; + +out: + vm_munmap(bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES); + return ret; +} + +/* + * When a BNDSTX instruction attempts to save bounds to a BD entry + * with the lack of the valid bit being set, a #BR is generated. + * This is an indication that no BT exists for this entry. In this + * case the fault handler will allocate a new BT. 
+ *
+ * With 32-bit mode, the size of BD is 4MB, and the size of each
+ * bound table is 16KB. With 64-bit mode, the size of BD is 2GB,
+ * and the size of each bound table is 4MB.
+ */
+int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
+{
+	unsigned long status;
+	unsigned long bd_entry, bd_base;
+
+	bd_base = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;
+	status = xsave_buf->bndcsr.status_reg;
+
+	bd_entry = status & MPX_BNDSTA_ADDR_MASK;
+	if ((bd_entry < bd_base
[PATCH v7 00/10] Intel MPX support
This patchset adds support for the Memory Protection Extensions (MPX) feature found in future Intel processors. MPX can be used in conjunction with compiler changes to check memory references, for those references whose compile-time normal intentions are usurped at runtime due to buffer overflow or underflow. MPX provides this capability at very low performance overhead for newly compiled code, and provides compatibility mechanisms with legacy software components. The MPX architecture is designed to allow a machine to run both MPX-enabled software and legacy software that is MPX-unaware. In such a case, the legacy software does not benefit from MPX, but it also does not experience any change in functionality or reduction in performance. More information about Intel MPX can be found in the "Intel(R) Architecture Instruction Set Extensions Programming Reference". To get the advantage of MPX, changes are required in the OS kernel, binutils, the compiler, and system library support. A new GCC option, -fmpx, is introduced to utilize MPX instructions. GCC compiler sources with MPX support are currently available in a separate branch of the common GCC SVN repository; see the GCC SVN page (http://gcc.gnu.org/svn.html) for details. To have full protection, we had to add MPX instrumentation to all the necessary Glibc routines (e.g. memcpy) written in assembler, and compile Glibc with the MPX-enabled GCC compiler. MPX-enabled Glibc sources can currently be found in the Glibc git repository. Enabling an application to use MPX will generally not require source code updates, but some runtime code, responsible for configuring and enabling MPX, is needed in order to make use of it. For most applications this runtime support will be available by linking to a library supplied by the compiler, or it may come directly from the OS once OS versions that support MPX are available.
The MPX kernel code, namely this patchset, has mainly two responsibilities: provide handlers for bounds faults (#BR), and manage bounds memory. The high-level areas modified in the patchset are as follows: 1) struct siginfo is extended to include bound violation information. 2) two prctl() commands are added to do performance optimization. Currently no hardware with the MPX ISA is available, but it is always possible to use SDE (Intel(R) Software Development Emulator) instead, which can be downloaded from http://software.intel.com/en-us/articles/intel-software-development-emulator In addition, this patchset has been tested on an Intel internal hardware platform for MPX testing.

Future TODO items:
1) support 32-bit binaries on 64-bit kernels.

Changes since v1:
* check to see if #BR occurred in userspace or kernel space.
* use generic structures and macros as much as possible when decoding MPX instructions.

Changes since v2:
* fix some compile warnings.
* update documentation.

Changes since v3:
* correct some syntax errors in the documentation, and document the extended struct siginfo.
* kill the process when the error code of BNDSTATUS is 3.
* add some comments.
* remove new prctl() commands.
* fix some compile warnings for 32-bit.

Changes since v4:
* raise SIGBUS if the allocations of the bound tables fail.

Changes since v5:
* hook the unmap() path to clean up unused bounds tables, and use a new prctl() command to register the bounds directory address in struct mm_struct to check whether a process is MPX enabled during unmap().
* in order to track MPX memory usage precisely, add an MPX specific mmap interface and a VM_MPX flag to check whether a VMA is an MPX bounds table.
* add macro cpu_has_mpx to do performance optimization.
* sync struct siginfo for mips with the general version to avoid a build issue.

Changes since v6:
* because arch_vma_name is removed, this patchset has to set an MPX specific ->vm_ops to do the same thing.
* fix warnings for 32 bit arch.
* add more description to these patches.
Qiaowei Ren (10): x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific x86, mpx: add MPX specific mmap interface x86, mpx: add macro cpu_has_mpx x86, mpx: hook #BR exception handler to allocate bound tables x86, mpx: extend siginfo structure to include bound violation information mips: sync struct siginfo with general version x86, mpx: decode MPX instruction to get bound violation information x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER x86, mpx: cleanup unused bound tables x86, mpx: add documentation on Intel MPX Documentation/x86/intel_mpx.txt | 127 +++ arch/mips/include/uapi/asm/siginfo.h |4 + arch/x86/Kconfig |4 + arch/x86/include/asm/cpufeature.h|6 + arch/x86/include/asm/mmu_context.h | 16 ++ arch/x86/include/asm/mpx.h | 91 arch/x86/include/asm/processor.h | 18 ++ arch/x86/kernel/Makefile |1 + arch/x86/kernel/mpx.c| 415
[PATCH v7 02/10] x86, mpx: add MPX specific mmap interface
This patch adds one MPX specific mmap interface, which only handles MPX related maps, including the bounds table and bounds directory. In order to track MPX specific memory usage, this interface sticks the new vm_flag VM_MPX in the vm_area_struct when creating a bounds table or bounds directory. These bounds tables can take huge amounts of memory. In the worst-case scenario, the tables can be 4x the size of the data structure being tracked. IOW, a 1-page structure can require 4 bounds-table pages. My expectation is that folks using MPX are going to be keen on figuring out how much memory is being dedicated to it. With this feature, plus some grepping in /proc/$pid/smaps, one could take a pretty good stab at it.

Signed-off-by: Qiaowei Ren qiaowei@intel.com
---
 arch/x86/Kconfig           |  4 ++
 arch/x86/include/asm/mpx.h | 38 +
 arch/x86/mm/Makefile       |  2 +
 arch/x86/mm/mpx.c          | 79 
 4 files changed, 123 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/mpx.h
 create mode 100644 arch/x86/mm/mpx.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a8f749e..020db35 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -238,6 +238,10 @@ config HAVE_INTEL_TXT
 	def_bool y
 	depends on INTEL_IOMMU && ACPI
 
+config X86_INTEL_MPX
+	def_bool y
+	depends on CPU_SUP_INTEL
+
 config X86_32_SMP
 	def_bool y
 	depends on X86_32 && SMP
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
new file mode 100644
index 000..5725ac4
--- /dev/null
+++ b/arch/x86/include/asm/mpx.h
@@ -0,0 +1,38 @@
+#ifndef _ASM_X86_MPX_H
+#define _ASM_X86_MPX_H
+
+#include <linux/types.h>
+#include <asm/ptrace.h>
+
+#ifdef CONFIG_X86_64
+
+/* upper 28 bits [47:20] of the virtual address in 64-bit used to
+ * index into bounds directory (BD).
+ */
+#define MPX_BD_ENTRY_OFFSET	28
+#define MPX_BD_ENTRY_SHIFT	3
+/* bits [19:3] of the virtual address in 64-bit used to index into
+ * bounds table (BT).
+ */
+#define MPX_BT_ENTRY_OFFSET	17
+#define MPX_BT_ENTRY_SHIFT	5
+#define MPX_IGN_BITS		3
+
+#else
+
+#define MPX_BD_ENTRY_OFFSET	20
+#define MPX_BD_ENTRY_SHIFT	2
+#define MPX_BT_ENTRY_OFFSET	10
+#define MPX_BT_ENTRY_SHIFT	4
+#define MPX_IGN_BITS		2
+
+#endif
+
+#define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
+#define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))
+
+#define MPX_BNDSTA_ERROR_CODE	0x3
+
+unsigned long mpx_mmap(unsigned long len);
+
+#endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 6a19ad9..ecfdc46 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -30,3 +30,5 @@ obj-$(CONFIG_ACPI_NUMA)	+= srat.o
 obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
 
 obj-$(CONFIG_MEMTEST)		+= memtest.o
+
+obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
new file mode 100644
index 000..e1b28e6
--- /dev/null
+++ b/arch/x86/mm/mpx.c
@@ -0,0 +1,79 @@
+#include <linux/kernel.h>
+#include <linux/syscalls.h>
+#include <asm/mpx.h>
+#include <asm/mman.h>
+#include <linux/sched/sysctl.h>
+
+static const char *mpx_mapping_name(struct vm_area_struct *vma)
+{
+	return "[mpx]";
+}
+
+static struct vm_operations_struct mpx_vma_ops = {
+	.name = mpx_mapping_name,
+};
+
+/*
+ * this is really a simplified "vm_mmap". it only handles mpx
+ * related maps, including bounds table and bounds directory.
+ *
+ * here we can stick new vm_flag VM_MPX in the vma_area_struct
+ * when create a bounds table or bounds directory, in order to
+ * track MPX specific memory.
+ */
+unsigned long mpx_mmap(unsigned long len)
+{
+	unsigned long ret;
+	unsigned long addr, pgoff;
+	struct mm_struct *mm = current->mm;
+	vm_flags_t vm_flags;
+	struct vm_area_struct *vma;
+
+	/* Only bounds table and bounds directory can be allocated here */
+	if (len != MPX_BD_SIZE_BYTES && len != MPX_BT_SIZE_BYTES)
+		return -EINVAL;
+
+	down_write(&mm->mmap_sem);
+
+	/* Too many mappings?
 */
+	if (mm->map_count > sysctl_max_map_count) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/* Obtain the address to map to. we verify (or select) it and ensure
+	 * that it represents a valid section of the address space.
+	 */
+	addr = get_unmapped_area(NULL, 0, len, 0, MAP_ANONYMOUS | MAP_PRIVATE);
+	if (addr & ~PAGE_MASK) {
+		ret = addr;
+		goto out;
+	}
+
+	vm_flags = VM_READ | VM_WRITE | VM_MPX |
+			mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
+
+	/* Set pgoff according to addr for anon_vma */
+	pgoff = addr >> PAGE_SHIFT;
+
+	ret = mmap_region(NULL, addr, len, vm_flags, pgoff);
+	if (IS_ERR_VALUE(ret))
+		goto out
[PATCH v7 05/10] x86, mpx: extend siginfo structure to include bound violation information
This patch adds new fields about bound violation into the siginfo structure. si_lower and si_upper are respectively the lower bound and upper bound when a bound violation is caused.

Signed-off-by: Qiaowei Ren qiaowei@intel.com
---
 include/uapi/asm-generic/siginfo.h | 9 -
 kernel/signal.c                    | 4 
 2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index ba5be7f..1e35520 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -91,6 +91,10 @@ typedef struct siginfo {
 			int _trapno;	/* TRAP # which caused the signal */
 #endif
 			short _addr_lsb; /* LSB of the reported address */
+			struct {
+				void __user *_lower;
+				void __user *_upper;
+			} _addr_bnd;
 		} _sigfault;
 
 	/* SIGPOLL */
@@ -131,6 +135,8 @@ typedef struct siginfo {
 #define si_trapno	_sifields._sigfault._trapno
 #endif
 #define si_addr_lsb	_sifields._sigfault._addr_lsb
+#define si_lower	_sifields._sigfault._addr_bnd._lower
+#define si_upper	_sifields._sigfault._addr_bnd._upper
 #define si_band		_sifields._sigpoll._band
 #define si_fd		_sifields._sigpoll._fd
 #ifdef __ARCH_SIGSYS
@@ -199,7 +205,8 @@ typedef struct siginfo {
 */
 #define SEGV_MAPERR	(__SI_FAULT|1)	/* address not mapped to object */
 #define SEGV_ACCERR	(__SI_FAULT|2)	/* invalid permissions for mapped object */
-#define NSIGSEGV	2
+#define SEGV_BNDERR	(__SI_FAULT|3)	/* failed address bound checks */
+#define NSIGSEGV	3
 
 /*
 * SIGBUS si_codes
diff --git a/kernel/signal.c b/kernel/signal.c
index a4077e9..2131636 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2748,6 +2748,10 @@ int copy_siginfo_to_user(siginfo_t __user *to, const siginfo_t *from)
 		if (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO)
 			err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb);
 #endif
+#ifdef SEGV_BNDERR
+		err |= __put_user(from->si_lower, &to->si_lower);
+		err |= __put_user(from->si_upper, &to->si_upper);
+#endif
 		break;
 	case __SI_CHLD:
 		err |= __put_user(from->si_pid, &to->si_pid);
--
1.7.1
-- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v7 09/10] x86, mpx: cleanup unused bound tables
Since the kernel allocated those tables on-demand without userspace knowledge, it is also responsible for freeing them when the associated mappings go away. Here, the solution for this issue is to hook do_munmap() to check whether one process is MPX enabled. If yes, those bounds tables covered by the virtual address region which is being unmapped will be freed also.

Signed-off-by: Qiaowei Ren qiaowei@intel.com
---
 arch/x86/include/asm/mmu_context.h |  16 +++
 arch/x86/include/asm/mpx.h         |   9 ++
 arch/x86/mm/mpx.c                  | 181 
 include/asm-generic/mmu_context.h  |   6 +
 mm/mmap.c                          |   2 +
 5 files changed, 214 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index be12c53..af70d4f 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -6,6 +6,7 @@
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 #include <asm/paravirt.h>
+#include <asm/mpx.h>
 #ifndef CONFIG_PARAVIRT
 #include <asm-generic/mm_hooks.h>
@@ -96,4 +97,19 @@ do {						\
 } while (0)
 #endif
 
+static inline void arch_unmap(struct mm_struct *mm,
+		struct vm_area_struct *vma,
+		unsigned long start, unsigned long end)
+{
+#ifdef CONFIG_X86_INTEL_MPX
+	/*
+	 * Check whether this vma comes from MPX-enabled application.
+	 * If so, release this vma related bound tables.
+	 */
+	if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
+		mpx_unmap(mm, start, end);
+
+#endif
+}
+
 #endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 6cb0853..e848a74 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -42,6 +42,13 @@
 #define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
 #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))
 
+#define MPX_BD_ENTRY_MASK	((1<<MPX_BD_ENTRY_OFFSET)-1)
+#define MPX_BT_ENTRY_MASK	((1<<MPX_BT_ENTRY_OFFSET)-1)
+#define MPX_GET_BD_ENTRY_OFFSET(addr)	((((addr)>>(MPX_BT_ENTRY_OFFSET+ \
+		MPX_IGN_BITS)) & MPX_BD_ENTRY_MASK) << MPX_BD_ENTRY_SHIFT)
+#define MPX_GET_BT_ENTRY_OFFSET(addr)	((((addr)>>MPX_IGN_BITS) & \
+		MPX_BT_ENTRY_MASK) << MPX_BT_ENTRY_SHIFT)
+
 #define MPX_BNDSTA_ERROR_CODE	0x3
 #define MPX_BNDCFG_ENABLE_FLAG	0x1
 #define MPX_BD_ENTRY_VALID_FLAG	0x1
@@ -63,6 +70,8 @@ struct mpx_insn {
 #define MAX_MPX_INSN_SIZE	15
 
 unsigned long mpx_mmap(unsigned long len);
+void mpx_unmap(struct mm_struct *mm,
+		unsigned long start, unsigned long end);
 
 #ifdef CONFIG_X86_INTEL_MPX
 int do_mpx_bt_fault(struct xsave_struct *xsave_buf);
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index e1b28e6..d29ec9c 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -2,6 +2,7 @@
 #include <linux/syscalls.h>
 #include <asm/mpx.h>
 #include <asm/mman.h>
+#include <asm/mmu_context.h>
 #include <linux/sched/sysctl.h>
 
 static const char *mpx_mapping_name(struct vm_area_struct *vma)
@@ -77,3 +78,183 @@ out:
 	up_write(&mm->mmap_sem);
 	return ret;
 }
+
+/*
+ * Get the base of bounds tables pointed by specific bounds
+ * directory entry.
+ */
+static int get_bt_addr(long __user *bd_entry, unsigned long *bt_addr,
+		unsigned int *valid)
+{
+	if (get_user(*bt_addr, bd_entry))
+		return -EFAULT;
+
+	*valid = *bt_addr & MPX_BD_ENTRY_VALID_FLAG;
+	*bt_addr &= MPX_BT_ADDR_MASK;
+
+	/*
+	 * If this bounds directory entry is nonzero, and meanwhile
+	 * the valid bit is zero, one SIGSEGV will be produced due to
+	 * this unexpected situation.
+	 */
+	if (!(*valid) && *bt_addr)
+		force_sig(SIGSEGV, current);
+
+	return 0;
+}
+
+/*
+ * Free the backing physical pages of bounds table 'bt_addr'.
+ * Assume start...end is within that bounds table.
+ */
+static void zap_bt_entries(struct mm_struct *mm, unsigned long bt_addr,
+		unsigned long start, unsigned long end)
+{
+	struct vm_area_struct *vma;
+
+	/* Find the vma which overlaps this bounds table */
+	vma = find_vma(mm, bt_addr);
+	if (!vma || vma->vm_start > bt_addr ||
+			vma->vm_end < bt_addr+MPX_BT_SIZE_BYTES)
+		return;
+
+	zap_page_range(vma, start, end, NULL);
+}
+
+static void unmap_single_bt(struct mm_struct *mm, long __user *bd_entry,
+		unsigned long bt_addr)
+{
+	if (user_atomic_cmpxchg_inatomic(&bt_addr, bd_entry,
+			bt_addr | MPX_BD_ENTRY_VALID_FLAG, 0))
+		return;
+
+	/*
+	 * to avoid recursion, do_munmap() will check whether it comes
+	 * from one bounds table through VM_MPX flag.
+	 */
+	do_munmap(mm, bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES);
+}
+
+/*
+ * If the bounds table pointed
[PATCH v6 00/10] Intel MPX support
This patchset adds support for the Memory Protection Extensions (MPX) feature found in future Intel processors. MPX can be used in conjunction with compiler changes to check memory references, for those references whose compile-time normal intentions are usurped at runtime due to buffer overflow or underflow. MPX provides this capability at very low performance overhead for newly compiled code, and provides compatibility mechanisms with legacy software components. The MPX architecture is designed to allow a machine to run both MPX-enabled software and legacy software that is MPX-unaware. In such a case, the legacy software does not benefit from MPX, but it also does not experience any change in functionality or reduction in performance. More information about Intel MPX can be found in the "Intel(R) Architecture Instruction Set Extensions Programming Reference". To get the advantage of MPX, changes are required in the OS kernel, binutils, the compiler, and system library support. A new GCC option, -fmpx, is introduced to utilize MPX instructions. GCC compiler sources with MPX support are currently available in a separate branch of the common GCC SVN repository; see the GCC SVN page (http://gcc.gnu.org/svn.html) for details. To have full protection, we had to add MPX instrumentation to all the necessary Glibc routines (e.g. memcpy) written in assembler, and compile Glibc with the MPX-enabled GCC compiler. MPX-enabled Glibc sources can currently be found in the Glibc git repository. Enabling an application to use MPX will generally not require source code updates, but some runtime code, responsible for configuring and enabling MPX, is needed in order to make use of it. For most applications this runtime support will be available by linking to a library supplied by the compiler, or it may come directly from the OS once OS versions that support MPX are available.
The MPX kernel code, namely this patchset, has mainly two responsibilities: provide handlers for bounds faults (#BR), and manage bounds memory. The high-level areas modified in the patchset are as follows: 1) struct siginfo is extended to include bound violation information. 2) two prctl() commands are added to do performance optimization. Currently no hardware with the MPX ISA is available, but it is always possible to use SDE (Intel(R) Software Development Emulator) instead, which can be downloaded from http://software.intel.com/en-us/articles/intel-software-development-emulator

Future TODO items:
1) support 32-bit binaries on 64-bit kernels.

Changes since v1:
* check to see if #BR occurred in userspace or kernel space.
* use generic structures and macros as much as possible when decoding MPX instructions.

Changes since v2:
* fix some compile warnings.
* update documentation.

Changes since v3:
* correct some syntax errors in the documentation, and document the extended struct siginfo.
* kill the process when the error code of BNDSTATUS is 3.
* add some comments.
* remove new prctl() commands.
* fix some compile warnings for 32-bit.

Changes since v4:
* raise SIGBUS if the allocations of the bound tables fail.

Changes since v5:
* hook the unmap() path to clean up unused bounds tables, and use a new prctl() command to register the bounds directory address in struct mm_struct to check whether a process is MPX enabled during unmap().
* in order to track MPX memory usage precisely, add an MPX specific mmap interface and a VM_MPX flag to check whether a VMA is an MPX bounds table.
* add macro cpu_has_mpx to do performance optimization.
* sync struct siginfo for mips with the general version to avoid a build issue.
Qiaowei Ren (10): x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific x86, mpx: add MPX specific mmap interface x86, mpx: add macro cpu_has_mpx x86, mpx: hook #BR exception handler to allocate bound tables x86, mpx: extend siginfo structure to include bound violation information mips: sync struct siginfo with general version x86, mpx: decode MPX instruction to get bound violation information x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER x86, mpx: cleanup unused bound tables x86, mpx: add documentation on Intel MPX Documentation/x86/intel_mpx.txt | 127 +++ arch/mips/include/uapi/asm/siginfo.h |4 + arch/x86/Kconfig |4 + arch/x86/include/asm/cpufeature.h|6 + arch/x86/include/asm/mmu_context.h | 16 ++ arch/x86/include/asm/mpx.h | 91 arch/x86/include/asm/processor.h | 18 ++ arch/x86/kernel/Makefile |1 + arch/x86/kernel/mpx.c| 413 ++ arch/x86/kernel/traps.c | 62 +- arch/x86/mm/Makefile |2 + arch/x86/mm/init_64.c|2 + arch/x86/mm/mpx.c| 247 fs/proc/task_mmu.c |1 + include/a
[PATCH v6 03/10] x86, mpx: add macro cpu_has_mpx
In order to do performance optimization, this patch adds the macro cpu_has_mpx, which will directly return 0 when MPX is not supported by the kernel.

Signed-off-by: Qiaowei Ren
---
 arch/x86/include/asm/cpufeature.h | 6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index e265ff9..f302d08 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -339,6 +339,12 @@ extern const char * const x86_power_flags[32];
 #define cpu_has_eager_fpu	boot_cpu_has(X86_FEATURE_EAGER_FPU)
 #define cpu_has_topoext	boot_cpu_has(X86_FEATURE_TOPOEXT)
 
+#ifdef CONFIG_X86_INTEL_MPX
+#define cpu_has_mpx	boot_cpu_has(X86_FEATURE_MPX)
+#else
+#define cpu_has_mpx	0
+#endif /* CONFIG_X86_INTEL_MPX */
+
 #ifdef CONFIG_X86_64
 #undef cpu_has_vme
--
1.7.1
[PATCH v6 01/10] x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific
An MPX-enabled application will possibly create a lot of bounds tables in its process address space to save bounds information. These tables can take up huge swaths of memory (as much as 80% of the memory on the system) even if we clean them up aggressively. Being this huge, we need a way to track their memory use. If we want to track them, we essentially have two options:

1. walk the multi-GB (in virtual space) bounds directory to locate all the VMAs and walk them
2. find a way to distinguish MPX bounds-table VMAs from normal anonymous VMAs and use some existing mechanism to walk them

We expect (1) will be prohibitively expensive. For (2), we only need a single bit, and we've chosen to use a VM_ flag. We understand that they are scarce and are open to other options. There is one potential hybrid approach: check the bounds directory entry for any anonymous VMA that could possibly contain a bounds table. This is less expensive than (1), but still requires reading a pointer out of userspace for every VMA that we iterate over.
Signed-off-by: Qiaowei Ren
---
 arch/x86/mm/init_64.c | 2 ++
 fs/proc/task_mmu.c    | 1 +
 include/linux/mm.h    | 2 ++
 3 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index f35c66c..2d41679 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1223,6 +1223,8 @@ int in_gate_area_no_mm(unsigned long addr)
 const char *arch_vma_name(struct vm_area_struct *vma)
 {
+	if (vma->vm_flags & VM_MPX)
+		return "[mpx]";
 	if (vma->vm_mm && vma->vm_start == (long)vma->vm_mm->context.vdso)
 		return "[vdso]";
 	if (vma == &gate_vma)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 442177b..09266bd 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -543,6 +543,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 		[ilog2(VM_GROWSDOWN)]	= "gd",
 		[ilog2(VM_PFNMAP)]	= "pf",
 		[ilog2(VM_DENYWRITE)]	= "dw",
+		[ilog2(VM_MPX)]		= "mp",
 		[ilog2(VM_LOCKED)]	= "lo",
 		[ilog2(VM_IO)]		= "io",
 		[ilog2(VM_SEQ_READ)]	= "sr",
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d677706..029c716 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -127,6 +127,8 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HUGETLB	0x00400000	/* Huge TLB Page VM */
 #define VM_NONLINEAR	0x00800000	/* Is non-linear (remap_file_pages) */
 #define VM_ARCH_1	0x01000000	/* Architecture-specific flag */
+/* MPX specific bounds table or bounds directory (x86) */
+#define VM_MPX		0x02000000
 #define VM_DONTDUMP	0x04000000	/* Do not include in the core dump */
 
 #ifdef CONFIG_MEM_SOFT_DIRTY
--
1.7.1
[PATCH v6 05/10] x86, mpx: extend siginfo structure to include bound violation information
This patch adds new fields about bound violation into the siginfo structure. si_lower and si_upper are respectively the lower bound and upper bound when a bound violation is caused.

Signed-off-by: Qiaowei Ren
---
 include/uapi/asm-generic/siginfo.h | 9 -
 kernel/signal.c                    | 4 
 2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index ba5be7f..1e35520 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -91,6 +91,10 @@ typedef struct siginfo {
 			int _trapno;	/* TRAP # which caused the signal */
 #endif
 			short _addr_lsb; /* LSB of the reported address */
+			struct {
+				void __user *_lower;
+				void __user *_upper;
+			} _addr_bnd;
 		} _sigfault;
 
 	/* SIGPOLL */
@@ -131,6 +135,8 @@ typedef struct siginfo {
 #define si_trapno	_sifields._sigfault._trapno
 #endif
 #define si_addr_lsb	_sifields._sigfault._addr_lsb
+#define si_lower	_sifields._sigfault._addr_bnd._lower
+#define si_upper	_sifields._sigfault._addr_bnd._upper
 #define si_band		_sifields._sigpoll._band
 #define si_fd		_sifields._sigpoll._fd
 #ifdef __ARCH_SIGSYS
@@ -199,7 +205,8 @@ typedef struct siginfo {
 */
 #define SEGV_MAPERR	(__SI_FAULT|1)	/* address not mapped to object */
 #define SEGV_ACCERR	(__SI_FAULT|2)	/* invalid permissions for mapped object */
-#define NSIGSEGV	2
+#define SEGV_BNDERR	(__SI_FAULT|3)	/* failed address bound checks */
+#define NSIGSEGV	3
 
 /*
 * SIGBUS si_codes
diff --git a/kernel/signal.c b/kernel/signal.c
index 6ea13c0..0fcf749 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2773,6 +2773,10 @@ int copy_siginfo_to_user(siginfo_t __user *to, const siginfo_t *from)
 		if (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO)
 			err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb);
 #endif
+#ifdef SEGV_BNDERR
+		err |= __put_user(from->si_lower, &to->si_lower);
+		err |= __put_user(from->si_upper, &to->si_upper);
+#endif
 		break;
 	case __SI_CHLD:
 		err |= __put_user(from->si_pid, &to->si_pid);
--
1.7.1
[PATCH v6 02/10] x86, mpx: add MPX specific mmap interface
This patch adds one MPX specific mmap interface, which only handles MPX related maps, including the bounds table and bounds directory. In order to track MPX specific memory usage, this interface sticks the new vm_flag VM_MPX in the vm_area_struct when creating a bounds table or bounds directory.

Signed-off-by: Qiaowei Ren
---
 arch/x86/Kconfig           |  4 +++
 arch/x86/include/asm/mpx.h | 38 
 arch/x86/mm/Makefile       |  2 +
 arch/x86/mm/mpx.c          | 58 
 4 files changed, 102 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/mpx.h
 create mode 100644 arch/x86/mm/mpx.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 25d2c6f..0194790 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -237,6 +237,10 @@ config HAVE_INTEL_TXT
 	def_bool y
 	depends on INTEL_IOMMU && ACPI
 
+config X86_INTEL_MPX
+	def_bool y
+	depends on CPU_SUP_INTEL
+
 config X86_32_SMP
 	def_bool y
 	depends on X86_32 && SMP
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
new file mode 100644
index 000..5725ac4
--- /dev/null
+++ b/arch/x86/include/asm/mpx.h
@@ -0,0 +1,38 @@
+#ifndef _ASM_X86_MPX_H
+#define _ASM_X86_MPX_H
+
+#include <linux/types.h>
+#include <asm/ptrace.h>
+
+#ifdef CONFIG_X86_64
+
+/* upper 28 bits [47:20] of the virtual address in 64-bit used to
+ * index into bounds directory (BD).
+ */
+#define MPX_BD_ENTRY_OFFSET	28
+#define MPX_BD_ENTRY_SHIFT	3
+/* bits [19:3] of the virtual address in 64-bit used to index into
+ * bounds table (BT).
+ */
+#define MPX_BT_ENTRY_OFFSET	17
+#define MPX_BT_ENTRY_SHIFT	5
+#define MPX_IGN_BITS		3
+
+#else
+
+#define MPX_BD_ENTRY_OFFSET	20
+#define MPX_BD_ENTRY_SHIFT	2
+#define MPX_BT_ENTRY_OFFSET	10
+#define MPX_BT_ENTRY_SHIFT	4
+#define MPX_IGN_BITS		2
+
+#endif
+
+#define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
+#define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))
+
+#define MPX_BNDSTA_ERROR_CODE	0x3
+
+unsigned long mpx_mmap(unsigned long len);
+
+#endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 6a19ad9..ecfdc46 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -30,3 +30,5 @@ obj-$(CONFIG_ACPI_NUMA)	+= srat.o
 obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
 
 obj-$(CONFIG_MEMTEST)		+= memtest.o
+
+obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
new file mode 100644
index 000..546c5d1
--- /dev/null
+++ b/arch/x86/mm/mpx.c
@@ -0,0 +1,58 @@
+#include <linux/kernel.h>
+#include <linux/syscalls.h>
+#include <asm/mpx.h>
+#include <asm/mman.h>
+#include <linux/sched/sysctl.h>
+
+/*
+ * this is really a simplified "vm_mmap". it only handles mpx
+ * related maps, including bounds table and bounds directory.
+ *
+ * here we can stick new vm_flag VM_MPX in the vma_area_struct
+ * when create a bounds table or bounds directory, in order to
+ * track MPX specific memory.
+ */
+unsigned long mpx_mmap(unsigned long len)
+{
+	unsigned long ret;
+	unsigned long addr, pgoff;
+	struct mm_struct *mm = current->mm;
+	vm_flags_t vm_flags;
+
+	/* Only bounds table and bounds directory can be allocated here */
+	if (len != MPX_BD_SIZE_BYTES && len != MPX_BT_SIZE_BYTES)
+		return -EINVAL;
+
+	down_write(&mm->mmap_sem);
+
+	/* Too many mappings? */
+	if (mm->map_count > sysctl_max_map_count) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/* Obtain the address to map to. we verify (or select) it and ensure
+	 * that it represents a valid section of the address space.
+	 */
+	addr = get_unmapped_area(NULL, 0, len, 0, MAP_ANONYMOUS | MAP_PRIVATE);
+	if (addr & ~PAGE_MASK) {
+		ret = addr;
+		goto out;
+	}
+
+	vm_flags = VM_READ | VM_WRITE | VM_MPX |
+			mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
+
+	/* Make bounds tables and bounds directory unlocked. */
+	if (vm_flags & VM_LOCKED)
+		vm_flags &= ~VM_LOCKED;
+
+	/* Set pgoff according to addr for anon_vma */
+	pgoff = addr >> PAGE_SHIFT;
+
+	ret = mmap_region(NULL, addr, len, vm_flags, pgoff);
+
+out:
+	up_write(&mm->mmap_sem);
+	return ret;
+}
--
1.7.1
[PATCH v6 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl() commands. These commands can be used to register and unregister MPX-related resources on the x86 platform. The base of the bounds directory is stored in mm_struct during PR_MPX_REGISTER command execution; this member can later be used to check whether an application is MPX-enabled.

Signed-off-by: Qiaowei Ren
---
 arch/x86/include/asm/mpx.h       |    1 +
 arch/x86/include/asm/processor.h |   18 ++++++
 arch/x86/kernel/mpx.c            |   56 ++++++++++++++++++
 include/linux/mm_types.h         |    3 ++
 include/uapi/linux/prctl.h       |    6 ++++
 kernel/sys.c                     |   12 ++++
 6 files changed, 96 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 780af63..6cb0853 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -43,6 +43,7 @@
 #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))

 #define MPX_BNDSTA_ERROR_CODE	0x3
+#define MPX_BNDCFG_ENABLE_FLAG	0x1
 #define MPX_BD_ENTRY_VALID_FLAG	0x1

 struct mpx_insn {
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..6e0966e 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -952,6 +952,24 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
 extern int get_tsc_mode(unsigned long adr);
 extern int set_tsc_mode(unsigned int val);

+/* Register/unregister a process' MPX related resource */
+#define MPX_REGISTER(tsk)	mpx_register((tsk))
+#define MPX_UNREGISTER(tsk)	mpx_unregister((tsk))
+
+#ifdef CONFIG_X86_INTEL_MPX
+extern int mpx_register(struct task_struct *tsk);
+extern int mpx_unregister(struct task_struct *tsk);
+#else
+static inline int mpx_register(struct task_struct *tsk)
+{
+	return -EINVAL;
+}
+static inline int mpx_unregister(struct task_struct *tsk)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_X86_INTEL_MPX */
+
 extern u16 amd_get_nb_id(int cpu);

 static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves)
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
index 650b282..d8a2a09 100644
--- a/arch/x86/kernel/mpx.c
+++ b/arch/x86/kernel/mpx.c
@@ -1,6 +1,62 @@
 #include
 #include
+#include
 #include
+#include
+#include
+
+/*
+ * This should only be called when cpuid has been checked
+ * and we are sure that MPX is available.
+ */
+static __user void *task_get_bounds_dir(struct task_struct *tsk)
+{
+	struct xsave_struct *xsave_buf;
+
+	fpu_xsave(&tsk->thread.fpu);
+	xsave_buf = &(tsk->thread.fpu.state->xsave);
+	if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG))
+		return NULL;
+
+	return (void __user *)(xsave_buf->bndcsr.cfg_reg_u &
+			MPX_BNDCFG_ADDR_MASK);
+}
+
+int mpx_register(struct task_struct *tsk)
+{
+	struct mm_struct *mm = tsk->mm;
+
+	if (!cpu_has_mpx)
+		return -EINVAL;
+
+	/*
+	 * The runtime in userspace is responsible for allocating the
+	 * bounds directory. It then saves the base of the bounds
+	 * directory into the XSAVE/XRSTOR save area and enables MPX
+	 * through the XRSTOR instruction.
+	 *
+	 * fpu_xsave() is expected to be very expensive. As a
+	 * performance optimization, we read the base of the bounds
+	 * directory once here and cache it in mm_struct for future use.
+	 */
+	mm->bd_addr = task_get_bounds_dir(tsk);
+	if (!mm->bd_addr)
+		return -EINVAL;
+
+	pr_debug("MPX BD base address %p\n", mm->bd_addr);
+	return 0;
+}
+
+int mpx_unregister(struct task_struct *tsk)
+{
+	struct mm_struct *mm = current->mm;
+
+	if (!cpu_has_mpx)
+		return -EINVAL;
+
+	mm->bd_addr = NULL;
+	return 0;
+}

 typedef enum {REG_TYPE_RM, REG_TYPE_INDEX, REG_TYPE_BASE} reg_type_t;
 static unsigned long get_reg(struct mpx_insn *insn, struct pt_regs *regs,
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8967e20..54b8011 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -454,6 +454,9 @@ struct mm_struct {
 	bool tlb_flush_pending;
 #endif
 	struct uprobes_state uprobes_state;
+#ifdef CONFIG_X86_INTEL_MPX
+	void __user *bd_addr;	/* address of the bounds directory */
+#endif
 };

 static inline void mm_init_cpumask(struct mm_struct *mm)
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 58afc04..ce86fa9 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -152,4 +152,10 @@
 #define PR_SET_THP_DISABLE	41
 #define PR_GET_THP_DISABLE	42

+/*
+ * Register/unregister MPX related resource.
+ */
+#define PR_MPX_REGISTER		43
+#define PR_MPX_UNREGISTER
[PATCH v6 07/10] x86, mpx: decode MPX instruction to get bound violation information
This patch sets the bound-violation fields of the siginfo struct in the #BR exception handler, by decoding the user instruction and constructing the faulting pointer.

Signed-off-by: Qiaowei Ren
---
 arch/x86/include/asm/mpx.h |   23 +++
 arch/x86/kernel/mpx.c      |  294 ++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/traps.c    |    6 +
 3 files changed, 323 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index b7598ac..780af63 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -3,6 +3,7 @@

 #include
 #include
+#include

 #ifdef CONFIG_X86_64

@@ -44,15 +45,37 @@
 #define MPX_BNDSTA_ERROR_CODE	0x3
 #define MPX_BD_ENTRY_VALID_FLAG	0x1

+struct mpx_insn {
+	struct insn_field rex_prefix;	/* REX prefix */
+	struct insn_field modrm;
+	struct insn_field sib;
+	struct insn_field displacement;
+
+	unsigned char addr_bytes;	/* effective address size */
+	unsigned char limit;
+	unsigned char x86_64;
+
+	const unsigned char *kaddr;	/* kernel address of insn to analyze */
+	const unsigned char *next_byte;
+};
+
+#define MAX_MPX_INSN_SIZE	15
+
 unsigned long mpx_mmap(unsigned long len);

 #ifdef CONFIG_X86_INTEL_MPX
 int do_mpx_bt_fault(struct xsave_struct *xsave_buf);
+void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+		struct xsave_struct *xsave_buf);
 #else
 static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
 {
 	return -EINVAL;
 }
+static inline void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+		struct xsave_struct *xsave_buf)
+{
+}
 #endif /* CONFIG_X86_INTEL_MPX */

 #endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
index 4230c7b..650b282 100644
--- a/arch/x86/kernel/mpx.c
+++ b/arch/x86/kernel/mpx.c
@@ -2,6 +2,270 @@
 #include
 #include

+typedef enum {REG_TYPE_RM, REG_TYPE_INDEX, REG_TYPE_BASE} reg_type_t;
+static unsigned long get_reg(struct mpx_insn *insn, struct pt_regs *regs,
+		reg_type_t type)
+{
+	int regno = 0;
+	unsigned char modrm = (unsigned char)insn->modrm.value;
+	unsigned char sib = (unsigned char)insn->sib.value;
+
+	static const int regoff[] = {
+		offsetof(struct pt_regs, ax),
+		offsetof(struct pt_regs, cx),
+		offsetof(struct pt_regs, dx),
+		offsetof(struct pt_regs, bx),
+		offsetof(struct pt_regs, sp),
+		offsetof(struct pt_regs, bp),
+		offsetof(struct pt_regs, si),
+		offsetof(struct pt_regs, di),
+#ifdef CONFIG_X86_64
+		offsetof(struct pt_regs, r8),
+		offsetof(struct pt_regs, r9),
+		offsetof(struct pt_regs, r10),
+		offsetof(struct pt_regs, r11),
+		offsetof(struct pt_regs, r12),
+		offsetof(struct pt_regs, r13),
+		offsetof(struct pt_regs, r14),
+		offsetof(struct pt_regs, r15),
+#endif
+	};
+
+	switch (type) {
+	case REG_TYPE_RM:
+		regno = X86_MODRM_RM(modrm);
+		if (X86_REX_B(insn->rex_prefix.value) == 1)
+			regno += 8;
+		break;
+
+	case REG_TYPE_INDEX:
+		regno = X86_SIB_INDEX(sib);
+		if (X86_REX_X(insn->rex_prefix.value) == 1)
+			regno += 8;
+		break;
+
+	case REG_TYPE_BASE:
+		regno = X86_SIB_BASE(sib);
+		if (X86_REX_B(insn->rex_prefix.value) == 1)
+			regno += 8;
+		break;
+
+	default:
+		break;
+	}
+
+	return regs_get_register(regs, regoff[regno]);
+}
+
+/*
+ * Return the address being referenced by the instruction:
+ * for mod == 3, return the content of the rm register;
+ * for mod != 3, calculate the address using SIB and displacement.
+ */
+static unsigned long get_addr_ref(struct mpx_insn *insn, struct pt_regs *regs)
+{
+	unsigned long addr;
+	unsigned long base;
+	unsigned long indx;
+	unsigned char modrm = (unsigned char)insn->modrm.value;
+	unsigned char sib = (unsigned char)insn->sib.value;
+
+	if (X86_MODRM_MOD(modrm) == 3) {
+		addr = get_reg(insn, regs, REG_TYPE_RM);
+	} else {
+		if (insn->sib.nbytes) {
+			base = get_reg(insn, regs, REG_TYPE_BASE);
+			indx = get_reg(insn, regs, REG_TYPE_INDEX);
+			addr = base + indx * (1 << X86_SIB_SCALE(sib));
+		} else {
+			addr = get_reg(insn, regs, REG_TYPE_RM);
+		}
+		addr += insn->displacement.value;
+	}
+
+	return addr;
+}
+
+/* Verify next sizeof(t) bytes can be on the same instruction */
+#define validate_next(t, insn, n)	\
+	((insn)-&
[PATCH v6 09/10] x86, mpx: cleanup unused bound tables
When a user memory region is unmapped, the related bound tables become unused and need to be released as well. This patch cleans up these unused bound tables by hooking the unmap path.

Signed-off-by: Qiaowei Ren
---
 arch/x86/include/asm/mmu_context.h |   16 +++
 arch/x86/include/asm/mpx.h         |    9 ++
 arch/x86/mm/mpx.c                  |  189 ++++++++++++++++++++++++++++
 include/asm-generic/mmu_context.h  |    6 +
 mm/mmap.c                          |    2 +
 5 files changed, 222 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index be12c53..af70d4f 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include

 #ifndef CONFIG_PARAVIRT
 #include
@@ -96,4 +97,19 @@ do { \
 } while (0)
 #endif

+static inline void arch_unmap(struct mm_struct *mm,
+		struct vm_area_struct *vma,
+		unsigned long start, unsigned long end)
+{
+#ifdef CONFIG_X86_INTEL_MPX
+	/*
+	 * Check whether this vma comes from an MPX-enabled application.
+	 * If so, release the bound tables related to this vma.
+	 */
+	if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
+		mpx_unmap(mm, start, end);
+
+#endif
+}
+
 #endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 6cb0853..e848a74 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -42,6 +42,13 @@
 #define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
 #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))

+#define MPX_BD_ENTRY_MASK	((1<<MPX_BD_ENTRY_OFFSET)-1)
+#define MPX_BT_ENTRY_MASK	((1<<MPX_BT_ENTRY_OFFSET)-1)
+#define MPX_GET_BD_ENTRY_OFFSET(addr)	((((addr)>>(MPX_BT_ENTRY_OFFSET+ \
+		MPX_IGN_BITS)) & MPX_BD_ENTRY_MASK) << MPX_BD_ENTRY_SHIFT)
+#define MPX_GET_BT_ENTRY_OFFSET(addr)	((((addr)>>MPX_IGN_BITS) & \
+		MPX_BT_ENTRY_MASK) << MPX_BT_ENTRY_SHIFT)
+
 #define MPX_BNDSTA_ERROR_CODE	0x3
 #define MPX_BNDCFG_ENABLE_FLAG	0x1
 #define MPX_BD_ENTRY_VALID_FLAG	0x1
@@ -63,6 +70,8 @@ struct mpx_insn {
 #define MAX_MPX_INSN_SIZE	15

 unsigned long mpx_mmap(unsigned long len);
+void mpx_unmap(struct mm_struct *mm,
+		unsigned long start, unsigned long end);

 #ifdef CONFIG_X86_INTEL_MPX
 int do_mpx_bt_fault(struct xsave_struct *xsave_buf);
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 546c5d1..fd05cd4 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -2,6 +2,7 @@
 #include
 #include
 #include
+#include
 #include

 /*
@@ -56,3 +57,191 @@ out:
 	up_write(&mm->mmap_sem);
 	return ret;
 }
+
+/*
+ * Get the base of the bounds table pointed to by a specific bounds
+ * directory entry.
+ */
+static int get_bt_addr(long __user *bd_entry, unsigned long *bt_addr,
+		unsigned int *valid)
+{
+	if (get_user(*bt_addr, bd_entry))
+		return -EFAULT;
+
+	*valid = *bt_addr & MPX_BD_ENTRY_VALID_FLAG;
+	*bt_addr &= MPX_BT_ADDR_MASK;
+
+	/*
+	 * If this bounds directory entry is nonzero while the valid
+	 * bit is zero, a SIGSEGV is raised for this unexpected
+	 * situation.
+	 */
+	if (!(*valid) && *bt_addr)
+		force_sig(SIGSEGV, current);
+
+	pr_debug("get_bt: BD Entry (%p) - Table (%lx,%d)\n",
+			bd_entry, *bt_addr, *valid);
+	return 0;
+}
+
+/*
+ * Free the backing physical pages of bounds table 'bt_addr'.
+ * Assume start...end is within that bounds table.
+ */
+static void zap_bt_entries(struct mm_struct *mm, unsigned long bt_addr,
+		unsigned long start, unsigned long end)
+{
+	struct vm_area_struct *vma;
+
+	/* Find the vma which overlaps this bounds table */
+	vma = find_vma(mm, bt_addr);
+	if (!vma || vma->vm_start > bt_addr ||
+	    vma->vm_end < bt_addr+MPX_BT_SIZE_BYTES)
+		return;
+
+	zap_page_range(vma, start, end, NULL);
+	pr_debug("Bound table de-allocation %lx (%lx, %lx)\n",
+			bt_addr, start, end);
+}
+
+static void unmap_single_bt(struct mm_struct *mm, long __user *bd_entry,
+		unsigned long bt_addr)
+{
+	if (user_atomic_cmpxchg_inatomic(&bt_addr, bd_entry,
+			bt_addr | MPX_BD_ENTRY_VALID_FLAG, 0))
+		return;
+
+	pr_debug("Bound table de-allocation %lx at entry addr %p\n",
+			bt_addr, bd_entry);
+	/*
+	 * To avoid recursion, do_munmap() will check whether it comes
+	 * from one bounds table through the VM_MPX flag.
+	 */
+	do_munmap(mm, bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES);
+}
+
+/*
+ * If the bounds table pointed to by bounds directory 'bd_entry' is
+ * not shared, unmap thi
[PATCH v6 10/10] x86, mpx: add documentation on Intel MPX
This patch adds the Documentation/x86/intel_mpx.txt file with some information about Intel MPX.

Signed-off-by: Qiaowei Ren
---
 Documentation/x86/intel_mpx.txt |  127 +++++++++++++++++++++++++++++++
 1 files changed, 127 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/x86/intel_mpx.txt

diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.txt
new file mode 100644
index 000..1af9809
--- /dev/null
+++ b/Documentation/x86/intel_mpx.txt
@@ -0,0 +1,127 @@
+1. Intel(R) MPX Overview
+========================
+
+Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new
+capability introduced into Intel Architecture. Intel MPX provides
+hardware features that can be used in conjunction with compiler
+changes to check memory references, for those references whose
+compile-time normal intentions are usurped at runtime due to
+buffer overflow or underflow.
+
+For more information, please refer to the Intel(R) Architecture
+Instruction Set Extensions Programming Reference, Chapter 9:
+Intel(R) Memory Protection Extensions.
+
+Note: Currently no hardware with the MPX ISA is available, but it is
+always possible to use SDE (Intel(R) Software Development Emulator)
+instead, which can be downloaded from
+http://software.intel.com/en-us/articles/intel-software-development-emulator
+
+
+2. How does MPX kernel code work
+================================
+
+Handling #BR faults caused by MPX
+---------------------------------
+
+When MPX is enabled, there are 2 new situations that can generate
+#BR faults.
+  * bounds violation caused by MPX instructions.
+  * new bounds tables (BT) need to be allocated to save bounds.
+
+We hook the #BR handler to handle these two new situations.
+
+Decoding MPX instructions
+-------------------------
+
+If a #BR is generated due to a bounds violation caused by MPX,
+we need to decode the MPX instruction to get the violation address
+and set this address into the extended struct siginfo.
+
+The _sigfault field of struct siginfo is extended as follows:
+
+87		/* SIGILL, SIGFPE, SIGSEGV, SIGBUS */
+88		struct {
+89			void __user *_addr; /* faulting insn/memory ref. */
+90	#ifdef __ARCH_SI_TRAPNO
+91			int _trapno;	/* TRAP # which caused the signal */
+92	#endif
+93			short _addr_lsb; /* LSB of the reported address */
+94			struct {
+95				void __user *_lower;
+96				void __user *_upper;
+97			} _addr_bnd;
+98		} _sigfault;
+
+The '_addr' field refers to the violation address, and the new
+'_addr_bnd' field carries the lower/upper bounds in effect when the
+#BR was raised.
+
+Glibc will also be updated to support this new siginfo, so users
+can get the violation address and bounds when bounds violations
+occur.
+
+Freeing unused bounds tables
+----------------------------
+
+When a BNDSTX instruction attempts to save bounds to a bounds directory
+entry marked as invalid, a #BR is generated. This is an indication that
+no bounds table exists for this entry. In this case the fault handler
+will allocate a new bounds table on demand.
+
+Since the kernel allocated those tables on-demand without userspace
+knowledge, it is also responsible for freeing them when the associated
+mappings go away.
+
+Here, the solution for this issue is to hook do_munmap() to check
+whether a process is MPX-enabled. If yes, the bounds tables covered
+by the virtual address region which is being unmapped will be freed
+as well.
+
+Adding new prctl commands
+-------------------------
+
+The runtime library in userspace is responsible for allocating the
+bounds directory, so the kernel has to use the XSAVE instruction to
+get the base of the bounds directory from the BNDCFG register.
+
+But XSAVE is expected to be very expensive. As a performance
+optimization, we add new prctl commands to fetch the base of the
+bounds directory once and cache it for future use.
+
+Two new prctl commands are added to register and unregister MPX
+related resources.
+
+155	#define PR_MPX_REGISTER		41
+156	#define PR_MPX_UNREGISTER	42
+
+The base of the bounds directory is set into mm_struct during
+PR_MPX_REGISTER command execution. This member can be used to
+check whether an application is MPX-enabled.
+
+
+3. Tips
+=======
+
+1) Users are not allowed to create bounds tables and point the bounds
+directory at them in userspace. In fact, it is also not necessary
+for users to create bounds tables in userspace.
+
+When a #BR fault is produced due to an invalid entry, a bounds table
+will be created in the kernel on demand and the kernel will not
+transfer this fault to userspace. So userspace can't receive #BR
+faults for invalid entries, and it is also not necessary for users
+to create bounds tables by themselves.
+
+Certainly users can allocate bounds tables and forcibly point the
+bounds directory at them through the XSAVE instruction, and then set
+the valid bit of the bounds entry
[PATCH v6 06/10] mips: sync struct siginfo with general version
Due to the new bound-violation fields added to struct siginfo, this patch syncs the MIPS version with the generic one to avoid a build issue.

Signed-off-by: Qiaowei Ren
---
 arch/mips/include/uapi/asm/siginfo.h |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/mips/include/uapi/asm/siginfo.h b/arch/mips/include/uapi/asm/siginfo.h
index e811744..d08f83f 100644
--- a/arch/mips/include/uapi/asm/siginfo.h
+++ b/arch/mips/include/uapi/asm/siginfo.h
@@ -92,6 +92,10 @@ typedef struct siginfo {
 		int _trapno;	/* TRAP # which caused the signal */
 #endif
 		short _addr_lsb;
+		struct {
+			void __user *_lower;
+			void __user *_upper;
+		} _addr_bnd;
 	} _sigfault;

 	/* SIGPOLL, SIGXFSZ (To do ...) */
--
1.7.1
[PATCH v6 04/10] x86, mpx: hook #BR exception handler to allocate bound tables
This patch handles a #BR exception for non-existent tables by carving the space out of the normal process's address space (essentially calling mmap() from inside the kernel) and then pointing the bounds directory over to it.

The tables need to be accessed and controlled by userspace because the compiler generates instructions for MPX-enabled code which frequently store and retrieve entries from the bounds tables. Any direct kernel involvement (like a syscall) to access the tables would destroy performance, since these accesses are so frequent.

The tables are carved out of userspace because we have no better spot to put them. For each pointer which is being tracked by MPX, the bounds tables contain 4 longs worth of data, and the tables are indexed virtually. If we were to preallocate the tables, we would theoretically need to allocate 4x the virtual space that we have available for userspace somewhere else. We don't have that room in the kernel address space.

Signed-off-by: Qiaowei Ren
---
 arch/x86/include/asm/mpx.h |   20 ++++++
 arch/x86/kernel/Makefile   |    1 +
 arch/x86/kernel/mpx.c      |   63 ++++++++++++++++++
 arch/x86/kernel/traps.c    |   56 ++++++++++++++-
 4 files changed, 139 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/kernel/mpx.c

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 5725ac4..b7598ac 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -18,6 +18,8 @@
 #define MPX_BT_ENTRY_SHIFT	5
 #define MPX_IGN_BITS	3

+#define MPX_BD_ENTRY_TAIL	3
+
 #else

 #define MPX_BD_ENTRY_OFFSET	20
@@ -26,13 +28,31 @@
 #define MPX_BT_ENTRY_SHIFT	4
 #define MPX_IGN_BITS	2

+#define MPX_BD_ENTRY_TAIL	2
+
 #endif

+#define MPX_BNDSTA_TAIL		2
+#define MPX_BNDCFG_TAIL		12
+#define MPX_BNDSTA_ADDR_MASK	(~((1UL<<MPX_BNDSTA_TAIL)-1))
+#define MPX_BNDCFG_ADDR_MASK	(~((1UL<<MPX_BNDCFG_TAIL)-1))
+#define MPX_BT_ADDR_MASK	(~((1UL<<MPX_BD_ENTRY_TAIL)-1))
+
 #define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
 #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))

 #define MPX_BNDSTA_ERROR_CODE	0x3
+#define MPX_BD_ENTRY_VALID_FLAG	0x1

 unsigned long mpx_mmap(unsigned long len);

+#ifdef CONFIG_X86_INTEL_MPX
+int do_mpx_bt_fault(struct xsave_struct *xsave_buf);
+#else
+static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_X86_INTEL_MPX */
+
 #endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index f4d9600..3e81aed 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -41,6 +41,7 @@ obj-$(CONFIG_PREEMPT)	+= preempt.o
 obj-y	+= process.o
 obj-y	+= i387.o xsave.o
+obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
 obj-y	+= ptrace.o
 obj-$(CONFIG_X86_32)	+= tls.o
 obj-$(CONFIG_IA32_EMULATION)	+= tls.o
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
new file mode 100644
index 000..4230c7b
--- /dev/null
+++ b/arch/x86/kernel/mpx.c
@@ -0,0 +1,63 @@
+#include <linux/kernel.h>
+#include <linux/syscalls.h>
+#include <asm/mpx.h>
+
+static int allocate_bt(long __user *bd_entry)
+{
+	unsigned long bt_addr, old_val = 0;
+	int ret = 0;
+
+	bt_addr = mpx_mmap(MPX_BT_SIZE_BYTES);
+	if (IS_ERR((void *)bt_addr)) {
+		pr_err("Bounds table allocation failed at entry addr %p\n",
+				bd_entry);
+		return bt_addr;
+	}
+	bt_addr = (bt_addr & MPX_BT_ADDR_MASK) | MPX_BD_ENTRY_VALID_FLAG;
+
+	ret = user_atomic_cmpxchg_inatomic(&old_val, bd_entry, 0, bt_addr);
+	if (ret)
+		goto out;
+
+	/*
+	 * There is an existing bounds table pointed at this bounds
+	 * directory entry, so we need to free the bounds table
+	 * allocated just now.
+	 */
+	if (old_val)
+		goto out;
+
+	pr_debug("Allocate bounds table %lx at entry %p\n",
+			bt_addr, bd_entry);
+	return 0;
+
+out:
+	vm_munmap(bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES);
+	return ret;
+}
+
+/*
+ * When a BNDSTX instruction attempts to save bounds to a BD entry
+ * whose valid bit is not set, a #BR is generated. This is an
+ * indication that no BT exists for this entry. In this case the
+ * fault handler will allocate a new BT.
+ *
+ * In 32-bit mode, the size of the BD is 4MB, and the size of each
+ * bounds table is 16KB. In 64-bit mode, the size of the BD is 2GB,
+ * and the size of each bounds table is 4MB.
+ */
+int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
+{
+	unsigned long status;
+	unsigned long bd_entry, bd_base;
+
+	bd_base = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;
+	status = xsave_buf->bndcsr.status_reg;
+
+	bd_entry = status & MPX_BNDSTA_ADDR_MASK;
+	if ((bd_entry < bd_base) ||
+	    (bd_entry >= bd_base + MPX_BD_SIZE_BYTES))
+		return -EINVAL;
+
+	return allocate_bt((long __user *)bd_entry);
+}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index f73b5d4..35b9b29 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -59,6 +59,7 @@
 #include
 #include
 #include
+#include

 #ifdef CONFIG_X86_64
 #include
@@ -213,7 +214,6 @@ dotraplinkage void do_##name(struct pt_regs *regs, long error_code)	\
 DO_ERROR_INFO(X86_TRAP_DE, SIGFPE, "divide error", divide_error, FPE_INTDIV, regs->ip )
 DO_ERROR (X86_TRAP_OF, SIGSEGV, "overflow", overflow )
-DO_ERROR (X86_TRAP_BR, SIGSEGV, "bounds", bounds )
 DO_ERROR_INFO(X86_TRAP_UD, SIGILL, "invalid opcode", invalid_op, ILL_ILLOPN, regs->ip )
 DO_ERROR (X86_TRAP_OLD_MF, SIGFPE, "coprocessor segment overrun", coprocessor_segment_overrun
[PATCH v6 04/10] x86, mpx: hook #BR exception handler to allocate bound tables
This patch handles a #BR exception for non-existent tables by carving the space out of the normal processes address space (essentially calling mmap() from inside the kernel) and then pointing the bounds-directory over to it. The tables need to be accessed and controlled by userspace because the compiler generates instructions for MPX-enabled code which frequently store and retrieve entries from the bounds tables. Any direct kernel involvement (like a syscall) to access the tables would destroy performance since these are so frequent. The tables are carved out of userspace because we have no better spot to put them. For each pointer which is being tracked by MPX, the bounds tables contain 4 longs worth of data, and the tables are indexed virtually. If we were to preallocate the tables, we would theoretically need to allocate 4x the virtual space that we have available for userspace somewhere else. We don't have that room in the kernel address space. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/x86/include/asm/mpx.h | 20 ++ arch/x86/kernel/Makefile |1 + arch/x86/kernel/mpx.c | 63 arch/x86/kernel/traps.c| 56 ++- 4 files changed, 139 insertions(+), 1 deletions(-) create mode 100644 arch/x86/kernel/mpx.c diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h index 5725ac4..b7598ac 100644 --- a/arch/x86/include/asm/mpx.h +++ b/arch/x86/include/asm/mpx.h @@ -18,6 +18,8 @@ #define MPX_BT_ENTRY_SHIFT 5 #define MPX_IGN_BITS 3 +#define MPX_BD_ENTRY_TAIL 3 + #else #define MPX_BD_ENTRY_OFFSET20 @@ -26,13 +28,31 @@ #define MPX_BT_ENTRY_SHIFT 4 #define MPX_IGN_BITS 2 +#define MPX_BD_ENTRY_TAIL 2 + #endif +#define MPX_BNDSTA_TAIL2 +#define MPX_BNDCFG_TAIL12 +#define MPX_BNDSTA_ADDR_MASK (~((1ULMPX_BNDSTA_TAIL)-1)) +#define MPX_BNDCFG_ADDR_MASK (~((1ULMPX_BNDCFG_TAIL)-1)) +#define MPX_BT_ADDR_MASK (~((1ULMPX_BD_ENTRY_TAIL)-1)) + #define MPX_BD_SIZE_BYTES (1UL(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT)) #define MPX_BT_SIZE_BYTES 
(1UL(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT)) #define MPX_BNDSTA_ERROR_CODE 0x3 +#define MPX_BD_ENTRY_VALID_FLAG0x1 unsigned long mpx_mmap(unsigned long len); +#ifdef CONFIG_X86_INTEL_MPX +int do_mpx_bt_fault(struct xsave_struct *xsave_buf); +#else +static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf) +{ + return -EINVAL; +} +#endif /* CONFIG_X86_INTEL_MPX */ + #endif /* _ASM_X86_MPX_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index f4d9600..3e81aed 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -41,6 +41,7 @@ obj-$(CONFIG_PREEMPT) += preempt.o obj-y += process.o obj-y += i387.o xsave.o +obj-$(CONFIG_X86_INTEL_MPX)+= mpx.o obj-y += ptrace.o obj-$(CONFIG_X86_32) += tls.o obj-$(CONFIG_IA32_EMULATION) += tls.o diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c new file mode 100644 index 000..4230c7b --- /dev/null +++ b/arch/x86/kernel/mpx.c @@ -0,0 +1,63 @@ +#include linux/kernel.h +#include linux/syscalls.h +#include asm/mpx.h + +static int allocate_bt(long __user *bd_entry) +{ + unsigned long bt_addr, old_val = 0; + int ret = 0; + + bt_addr = mpx_mmap(MPX_BT_SIZE_BYTES); + if (IS_ERR((void *)bt_addr)) { + pr_err(Bounds table allocation failed at entry addr %p\n, + bd_entry); + return bt_addr; + } + bt_addr = (bt_addr MPX_BT_ADDR_MASK) | MPX_BD_ENTRY_VALID_FLAG; + + ret = user_atomic_cmpxchg_inatomic(old_val, bd_entry, 0, bt_addr); + if (ret) + goto out; + + /* +* there is a existing bounds table pointed at this bounds +* directory entry, and so we need to free the bounds table +* allocated just now. +*/ + if (old_val) + goto out; + + pr_debug(Allocate bounds table %lx at entry %p\n, + bt_addr, bd_entry); + return 0; + +out: + vm_munmap(bt_addr MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES); + return ret; +} + +/* + * When a BNDSTX instruction attempts to save bounds to a BD entry + * with the lack of the valid bit being set, a #BR is generated. + * This is an indication that no BT exists for this entry. 
In this + * case the fault handler will allocate a new BT. + * + * With 32-bit mode, the size of BD is 4MB, and the size of each + * bound table is 16KB. With 64-bit mode, the size of BD is 2GB, + * and the size of each bound table is 4MB. + */ +int do_mpx_bt_fault(struct xsave_struct *xsave_buf) +{ + unsigned long status; + unsigned long bd_entry, bd_base; + + bd_base = xsave_buf-bndcsr.cfg_reg_u MPX_BNDCFG_ADDR_MASK
[PATCH v6 10/10] x86, mpx: add documentation on Intel MPX
This patch adds the Documentation/x86/intel_mpx.txt file with some information about Intel MPX. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- Documentation/x86/intel_mpx.txt | 127 +++ 1 files changed, 127 insertions(+), 0 deletions(-) create mode 100644 Documentation/x86/intel_mpx.txt diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.txt new file mode 100644 index 000..1af9809 --- /dev/null +++ b/Documentation/x86/intel_mpx.txt @@ -0,0 +1,127 @@ +1. Intel(R) MPX Overview + + +Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new +capability introduced into Intel Architecture. Intel MPX provides +hardware features that can be used in conjunction with compiler +changes to check memory references, for those references whose +compile-time normal intentions are usurped at runtime due to +buffer overflow or underflow. + +For more information, please refer to Intel(R) Architecture +Instruction Set Extensions Programming Reference, Chapter 9: +Intel(R) Memory Protection Extensions. + +Note: Currently no hardware with MPX ISA is available but it is always +possible to use SDE (Intel(R) Software Development Emulator) instead, +which can be downloaded from +http://software.intel.com/en-us/articles/intel-software-development-emulator + + +2. How does MPX kernel code work + + +Handling #BR faults caused by MPX +- + +When MPX is enabled, there are 2 new situations that can generate +#BR faults. + * bounds violation caused by MPX instructions. + * new bounds tables (BT) need to be allocated to save bounds. + +We hook #BR handler to handle these two new situations. + +Decoding MPX instructions +- + +If a #BR is generated due to a bounds violation caused by MPX. +We need to decode MPX instructions to get violation address and +set this address into extended struct siginfo. + +The _sigfault feild of struct siginfo is extended as follow: + +87 /* SIGILL, SIGFPE, SIGSEGV, SIGBUS */ +88 struct { +89 void __user *_addr; /* faulting insn/memory ref. 
*/ +90 #ifdef __ARCH_SI_TRAPNO +91 int _trapno;/* TRAP # which caused the signal */ +92 #endif +93 short _addr_lsb; /* LSB of the reported address */ +94 struct { +95 void __user *_lower; +96 void __user *_upper; +97 } _addr_bnd; +98 } _sigfault; + +The '_addr' field refers to violation address, and new '_addr_and' +field refers to the upper/lower bounds when a #BR is caused. + +Glibc will be also updated to support this new siginfo. So user +can get violation address and bounds when bounds violations occur. + +Freeing unused bounds tables + + +When a BNDSTX instruction attempts to save bounds to a bounds directory +entry marked as invalid, a #BR is generated. This is an indication that +no bounds table exists for this entry. In this case the fault handler +will allocate a new bounds table on demand. + +Since the kernel allocated those tables on-demand without userspace +knowledge, it is also responsible for freeing them when the associated +mappings go away. + +Here, the solution for this issue is to hook do_munmap() to check +whether one process is MPX enabled. If yes, those bounds tables covered +in the virtual address region which is being unmapped will be freed also. + +Adding new prctl commands +- + +Runtime library in userspace is responsible for allocation of bounds +directory. So kernel have to use XSAVE instruction to get the base +of bounds directory from BNDCFG register. + +But XSAVE is expected to be very expensive. In order to do performance +optimization, we have to add new prctl command to get the base of +bounds directory to be used in future. + +Two new prctl commands are added to register and unregister MPX related +resource. + +155#define PR_MPX_REGISTER 41 +156#define PR_MPX_UNREGISTER 42 + +The base of the bounds directory is set into mm_struct during +PR_MPX_REGISTER command execution. This member can be used to +check whether one application is mpx enabled. + + +3. 
Tips +=== + +1) Users are not allowed to create bounds tables and point the bounds +directory at them in the userspace. In fact, it is not also necessary +for users to create bounds tables in the userspace. + +When #BR fault is produced due to invalid entry, bounds table will be +created in kernel on demand and kernel will not transfer this fault to +userspace. So usersapce can't receive #BR fault for invalid entry, and +it is not also necessary for users to create bounds tables by themselves. + +Certainly users can allocate bounds tables and forcibly point the bounds +directory at them through XSAVE instruction, and then set valid
[PATCH v6 06/10] mips: sync struct siginfo with general version
Due to new fields about bound violation added into struct siginfo, this patch syncs it with general version to avoid build issue. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/mips/include/uapi/asm/siginfo.h |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/arch/mips/include/uapi/asm/siginfo.h b/arch/mips/include/uapi/asm/siginfo.h index e811744..d08f83f 100644 --- a/arch/mips/include/uapi/asm/siginfo.h +++ b/arch/mips/include/uapi/asm/siginfo.h @@ -92,6 +92,10 @@ typedef struct siginfo { int _trapno;/* TRAP # which caused the signal */ #endif short _addr_lsb; + struct { + void __user *_lower; + void __user *_upper; + } _addr_bnd; } _sigfault; /* SIGPOLL, SIGXFSZ (To do ...) */ -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v6 09/10] x86, mpx: cleanup unused bound tables
When user memory region is unmapped, related bound tables become unused and need to be released also. This patch cleanups these unused bound tables through hooking unmap path. Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/x86/include/asm/mmu_context.h | 16 +++ arch/x86/include/asm/mpx.h |9 ++ arch/x86/mm/mpx.c | 189 include/asm-generic/mmu_context.h |6 + mm/mmap.c |2 + 5 files changed, 222 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index be12c53..af70d4f 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -6,6 +6,7 @@ #include asm/pgalloc.h #include asm/tlbflush.h #include asm/paravirt.h +#include asm/mpx.h #ifndef CONFIG_PARAVIRT #include asm-generic/mm_hooks.h @@ -96,4 +97,19 @@ do { \ } while (0) #endif +static inline void arch_unmap(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long start, unsigned long end) +{ +#ifdef CONFIG_X86_INTEL_MPX + /* +* Check whether this vma comes from MPX-enabled application. +* If so, release this vma related bound tables. 
+	 */
+	if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
+		mpx_unmap(mm, start, end);
+
+#endif
+}
+
 #endif /* _ASM_X86_MMU_CONTEXT_H */

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 6cb0853..e848a74 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -42,6 +42,13 @@
 #define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
 #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))

+#define MPX_BD_ENTRY_MASK	((1<<MPX_BD_ENTRY_OFFSET)-1)
+#define MPX_BT_ENTRY_MASK	((1<<MPX_BT_ENTRY_OFFSET)-1)
+#define MPX_GET_BD_ENTRY_OFFSET(addr)	((((addr)>>(MPX_BT_ENTRY_OFFSET+ \
+		MPX_IGN_BITS)) & MPX_BD_ENTRY_MASK) << MPX_BD_ENTRY_SHIFT)
+#define MPX_GET_BT_ENTRY_OFFSET(addr)	((((addr)>>MPX_IGN_BITS) & \
+		MPX_BT_ENTRY_MASK) << MPX_BT_ENTRY_SHIFT)
+
 #define MPX_BNDSTA_ERROR_CODE	0x3
 #define MPX_BNDCFG_ENABLE_FLAG	0x1
 #define MPX_BD_ENTRY_VALID_FLAG	0x1
@@ -63,6 +70,8 @@ struct mpx_insn {
 #define MAX_MPX_INSN_SIZE	15

 unsigned long mpx_mmap(unsigned long len);
+void mpx_unmap(struct mm_struct *mm,
+		unsigned long start, unsigned long end);

 #ifdef CONFIG_X86_INTEL_MPX
 int do_mpx_bt_fault(struct xsave_struct *xsave_buf);

diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 546c5d1..fd05cd4 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -2,6 +2,7 @@
 #include <linux/syscalls.h>
 #include <asm/mpx.h>
 #include <asm/mman.h>
+#include <asm/mmu_context.h>
 #include <linux/sched/sysctl.h>

 /*
@@ -56,3 +57,191 @@ out:
 	up_write(&mm->mmap_sem);
 	return ret;
 }
+
+/*
+ * Get the base of the bounds table pointed to by a specific bounds
+ * directory entry.
+ */
+static int get_bt_addr(long __user *bd_entry, unsigned long *bt_addr,
+		unsigned int *valid)
+{
+	if (get_user(*bt_addr, bd_entry))
+		return -EFAULT;
+
+	*valid = *bt_addr & MPX_BD_ENTRY_VALID_FLAG;
+	*bt_addr &= MPX_BT_ADDR_MASK;
+
+	/*
+	 * If this bounds directory entry is nonzero, and meanwhile
+	 * the valid bit is zero, one SIGSEGV will be produced due to
+	 * this unexpected situation.
+	 */
+	if (!(*valid) && *bt_addr)
+		force_sig(SIGSEGV, current);
+
+	pr_debug("get_bt: BD Entry (%p) -> Table (%lx,%d)\n",
+			bd_entry, *bt_addr, *valid);
+	return 0;
+}
+
+/*
+ * Free the backing physical pages of bounds table 'bt_addr'.
+ * Assume start...end is within that bounds table.
+ */
+static void zap_bt_entries(struct mm_struct *mm, unsigned long bt_addr,
+		unsigned long start, unsigned long end)
+{
+	struct vm_area_struct *vma;
+
+	/* Find the vma which overlaps this bounds table */
+	vma = find_vma(mm, bt_addr);
+	if (!vma || vma->vm_start > bt_addr ||
+			vma->vm_end < bt_addr+MPX_BT_SIZE_BYTES)
+		return;
+
+	zap_page_range(vma, start, end, NULL);
+	pr_debug("Bound table de-allocation %lx (%lx, %lx)\n",
+			bt_addr, start, end);
+}
+
+static void unmap_single_bt(struct mm_struct *mm, long __user *bd_entry,
+		unsigned long bt_addr)
+{
+	if (user_atomic_cmpxchg_inatomic(&bt_addr, bd_entry,
+			bt_addr | MPX_BD_ENTRY_VALID_FLAG, 0))
+		return;
+
+	pr_debug("Bound table de-allocation %lx at entry addr %p\n",
+			bt_addr, bd_entry);
+	/*
+	 * to avoid recursion, do_munmap() will check whether it comes
+	 * from one bounds table through VM_MPX flag.
+	 */
+	do_munmap(mm, bt_addr, MPX_BT_SIZE_BYTES);
[PATCH v6 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl() commands. These commands can be used to register and unregister MPX-related resources on the x86 platform.

The base of the bounds directory is stored in the mm_struct during execution of the PR_MPX_REGISTER command. This member can then be used to check whether an application is MPX-enabled.

Signed-off-by: Qiaowei Ren <qiaowei@intel.com>
---
 arch/x86/include/asm/mpx.h       |    1 +
 arch/x86/include/asm/processor.h |   18
 arch/x86/kernel/mpx.c            |   56 ++
 include/linux/mm_types.h         |    3 ++
 include/uapi/linux/prctl.h       |    6
 kernel/sys.c                     |   12
 6 files changed, 96 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 780af63..6cb0853 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -43,6 +43,7 @@
 #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))

 #define MPX_BNDSTA_ERROR_CODE	0x3
+#define MPX_BNDCFG_ENABLE_FLAG	0x1
 #define MPX_BD_ENTRY_VALID_FLAG	0x1

 struct mpx_insn {

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..6e0966e 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -952,6 +952,24 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
 extern int get_tsc_mode(unsigned long adr);
 extern int set_tsc_mode(unsigned int val);

+/* Register/unregister a process' MPX related resource */
+#define MPX_REGISTER(tsk)	mpx_register((tsk))
+#define MPX_UNREGISTER(tsk)	mpx_unregister((tsk))
+
+#ifdef CONFIG_X86_INTEL_MPX
+extern int mpx_register(struct task_struct *tsk);
+extern int mpx_unregister(struct task_struct *tsk);
+#else
+static inline int mpx_register(struct task_struct *tsk)
+{
+	return -EINVAL;
+}
+static inline int mpx_unregister(struct task_struct *tsk)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_X86_INTEL_MPX */
+
 extern u16 amd_get_nb_id(int cpu);

 static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves)

diff --git
a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
index 650b282..d8a2a09 100644
--- a/arch/x86/kernel/mpx.c
+++ b/arch/x86/kernel/mpx.c
@@ -1,6 +1,62 @@
 #include <linux/kernel.h>
 #include <linux/syscalls.h>
+#include <linux/prctl.h>
 #include <asm/mpx.h>
+#include <asm/i387.h>
+#include <asm/fpu-internal.h>
+
+/*
+ * This should only be called when cpuid has been checked
+ * and we are sure that MPX is available.
+ */
+static __user void *task_get_bounds_dir(struct task_struct *tsk)
+{
+	struct xsave_struct *xsave_buf;
+
+	fpu_xsave(&tsk->thread.fpu);
+	xsave_buf = &(tsk->thread.fpu.state->xsave);
+	if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG))
+		return NULL;
+
+	return (void __user *)(xsave_buf->bndcsr.cfg_reg_u &
+			MPX_BNDCFG_ADDR_MASK);
+}
+
+int mpx_register(struct task_struct *tsk)
+{
+	struct mm_struct *mm = tsk->mm;
+
+	if (!cpu_has_mpx)
+		return -EINVAL;
+
+	/*
+	 * runtime in the userspace will be responsible for allocation of
+	 * the bounds directory. Then, it will save the base of the bounds
+	 * directory into XSAVE/XRSTOR Save Area and enable MPX through
+	 * XRSTOR instruction.
+	 *
+	 * fpu_xsave() is expected to be very expensive. In order to do
+	 * performance optimization, here we get the base of the bounds
+	 * directory and then save it into mm_struct to be used in future.
+	 */
+	mm->bd_addr = task_get_bounds_dir(tsk);
+	if (!mm->bd_addr)
+		return -EINVAL;
+
+	pr_debug("MPX BD base address %p\n", mm->bd_addr);
+	return 0;
+}
+
+int mpx_unregister(struct task_struct *tsk)
+{
+	struct mm_struct *mm = current->mm;
+
+	if (!cpu_has_mpx)
+		return -EINVAL;
+
+	mm->bd_addr = NULL;
+	return 0;
+}

 typedef enum {REG_TYPE_RM, REG_TYPE_INDEX, REG_TYPE_BASE} reg_type_t;

 static unsigned long get_reg(struct mpx_insn *insn, struct pt_regs *regs,

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8967e20..54b8011 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -454,6 +454,9 @@ struct mm_struct {
 	bool tlb_flush_pending;
 #endif
 	struct uprobes_state uprobes_state;
+#ifdef CONFIG_X86_INTEL_MPX
+	void __user *bd_addr;	/* address of the bounds directory */
+#endif
 };

 static inline void mm_init_cpumask(struct mm_struct *mm)

diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 58afc04..ce86fa9 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -152,4 +152,10 @@
 #define PR_SET_THP_DISABLE	41
 #define PR_GET_THP_DISABLE	42

+/*
+ * Register/unregister MPX related resource.
+ */
+#define PR_MPX_REGISTER	43