Re: [RFC PATCH v5 03/12] __wr_after_init: Core and default arch
On 15/02/2019 10:57, Peter Zijlstra wrote: Where are the comments and Changelog notes ? How is an arch maintainer to be aware of this requirement when adding support for his/her arch? Yes, it will be fixed in the next revision. I've added a comment to the core wr_assign function and also to the changelogs for the patches enabling it on x86_64 and arm64, respectively. Should I also mention it in the documentation? -- igor
Re: [RFC PATCH v5 03/12] __wr_after_init: Core and default arch
On 14/02/2019 13:28, Peter Zijlstra wrote: On Thu, Feb 14, 2019 at 12:41:32AM +0200, Igor Stoppa wrote: [...] +#define wr_rcu_assign_pointer(p, v) ({ \ + smp_mb(); \ + wr_assign(p, v);\ + p; \ +}) This requires that wr_memcpy() (through wr_assign) is single-copy-atomic for native types. There is not a comment in sight that states this. Right, I kinda expected native-aligned <-> atomic, but it's not necessarily true. It should be confirmed when enabling write rare on a new architecture. I'll add the comment. Also, is this true of x86/arm64 memcpy ? For x86_64: https://elixir.bootlin.com/linux/v5.0-rc6/source/arch/x86/include/asm/uaccess.h#L462 the mov"itype" part should deal with atomic copy of native, aligned types. For arm64: https://elixir.bootlin.com/linux/v5.0-rc6/source/arch/arm64/lib/copy_template.S#L110 .Ltiny15 deals with copying less than 16 bytes, which includes pointers. When the data is aligned, the copy of a pointer should be atomic. -- igor
[RFC PATCH v5 05/12] __wr_after_init: x86_64: enable
Set ARCH_HAS_PRMEM to Y for x86_64 Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- arch/x86/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 68261430fe6e..7392b53b12c2 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -32,6 +32,7 @@ config X86_64 select SWIOTLB select X86_DEV_DMA_OPS select ARCH_HAS_SYSCALL_WRAPPER + select ARCH_HAS_PRMEM # # Arch settings -- 2.19.1
[RFC PATCH v5 10/12] __wr_after_init: rodata_test: test __wr_after_init
The write protection of the __wr_after_init data can be verified with the same methodology used for const data. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- mm/rodata_test.c | 27 ++++++++++++++++++++++++--- 1 file changed, 24 insertions(+), 3 deletions(-) diff --git a/mm/rodata_test.c b/mm/rodata_test.c index e1349520b436..a669cf9f5a61 100644 --- a/mm/rodata_test.c +++ b/mm/rodata_test.c @@ -16,8 +16,23 @@ #define INIT_TEST_VAL 0xC3 +/* + * Note: __ro_after_init data is, for every practical effect, equivalent to + * const data, since they are even write protected at the same time; there + * is no need for separate testing. + * __wr_after_init data, otoh, is altered also after the write protection + * takes place and it cannot be exploitable for altering more permanent + * data. + */ + static const int rodata_test_data = INIT_TEST_VAL; +#ifdef CONFIG_PRMEM +static int wr_after_init_test_data __wr_after_init = INIT_TEST_VAL; +extern long __start_wr_after_init; +extern long __end_wr_after_init; +#endif + static bool test_data(char *data_type, const int *data, unsigned long start, unsigned long end) { @@ -59,7 +74,13 @@ static bool test_data(char *data_type, const int *data, void rodata_test(void) { - test_data("rodata", &rodata_test_data, - (unsigned long)&__start_rodata, - (unsigned long)&__end_rodata); + if (!test_data("rodata", &rodata_test_data, + (unsigned long)&__start_rodata, + (unsigned long)&__end_rodata)) + return; +#ifdef CONFIG_PRMEM + test_data("wr after init data", &wr_after_init_test_data, + (unsigned long)&__start_wr_after_init, + (unsigned long)&__end_wr_after_init); +#endif } -- 2.19.1
[RFC PATCH v5 11/12] __wr_after_init: test write rare functionality
Set of test cases meant to confirm that the write rare functionality works as expected. It can optionally be compiled as a module. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- mm/Kconfig.debug | 8 +++ mm/Makefile| 1 + mm/test_write_rare.c (new) | 142 +++ 3 files changed, 151 insertions(+) diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug index 9a7b8b049d04..a62c31901fea 100644 --- a/mm/Kconfig.debug +++ b/mm/Kconfig.debug @@ -94,3 +94,11 @@ config DEBUG_RODATA_TEST depends on STRICT_KERNEL_RWX ---help--- This option enables a testcase for the setting rodata read-only. + +config DEBUG_PRMEM_TEST +tristate "Run self test for statically allocated protected memory" +depends on PRMEM +default n +help + Tries to verify that the protection for statically allocated memory + works correctly and that the memory is effectively protected. diff --git a/mm/Makefile b/mm/Makefile index ef3867c16ce0..8de1d468f4e7 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -59,6 +59,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o obj-$(CONFIG_SLOB) += slob.o obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o obj-$(CONFIG_PRMEM) += prmem.o +obj-$(CONFIG_DEBUG_PRMEM_TEST) += test_write_rare.o obj-$(CONFIG_KSM) += ksm.o obj-$(CONFIG_PAGE_POISONING) += page_poison.o obj-$(CONFIG_SLAB) += slab.o diff --git a/mm/test_write_rare.c b/mm/test_write_rare.c new file mode 100644 index ..e9ebc8e12041 --- /dev/null +++ b/mm/test_write_rare.c @@ -0,0 +1,142 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * test_write_rare.c + * + * (C) Copyright 2018 Huawei Technologies Co. Ltd. 
+ * Author: Igor Stoppa + */ + +#include +#include +#include +#include +#include +#include + +#ifdef pr_fmt +#undef pr_fmt +#endif + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +extern long __start_wr_after_init; +extern long __end_wr_after_init; + +static __wr_after_init int scalar = '0'; +static __wr_after_init u8 array[PAGE_SIZE * 3] __aligned(PAGE_SIZE); + +/* The section must occupy a non-zero number of whole pages */ +static bool test_alignment(void) +{ + unsigned long pstart = (unsigned long)&__start_wr_after_init; + unsigned long pend = (unsigned long)&__end_wr_after_init; + + if (WARN((pstart & ~PAGE_MASK) || (pend & ~PAGE_MASK) || +(pstart >= pend), "Boundaries test failed.")) + return false; + pr_info("Boundaries test passed."); + return true; +} + +static bool test_pattern(void) +{ + if (memchr_inv(array, '0', PAGE_SIZE / 2)) + return pr_info("Pattern part 1 failed."); + if (memchr_inv(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 3 / 4)) + return pr_info("Pattern part 2 failed."); + if (memchr_inv(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2)) + return pr_info("Pattern part 3 failed."); + if (memchr_inv(array + PAGE_SIZE * 7 / 4, '1', PAGE_SIZE * 3 / 4)) + return pr_info("Pattern part 4 failed."); + if (memchr_inv(array + PAGE_SIZE * 5 / 2, '0', PAGE_SIZE / 2)) + return pr_info("Pattern part 5 failed."); + return 0; +} + +static bool test_wr_memset(void) +{ + int new_val = '1'; + + wr_memset(&scalar, new_val, sizeof(scalar)); + if (WARN(memchr_inv(&scalar, new_val, sizeof(scalar)), +"Scalar write rare memset test failed.")) + return false; + + pr_info("Scalar write rare memset test passed."); + + wr_memset(array, '0', PAGE_SIZE * 3); + if (WARN(memchr_inv(array, '0', PAGE_SIZE * 3), +"Array page aligned write rare memset test failed.")) + return false; + + wr_memset(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 2); + if (WARN(memchr_inv(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 2), +"Array half page aligned write rare memset test failed.")) + return false; + 
wr_memset(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2); + if (WARN(memchr_inv(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2), +"Array quarter page aligned write rare memset test failed.")) + return false; + + if (WARN(test_pattern(), "Array write rare memset test failed.")) + return false; + + pr_info("Array write rare memset test passed."); + return true; +} + +static u8 array_1[PAGE_SIZE * 2]; +static u8 array_2[PAGE_SIZE * 2]; + +static bool test_wr_memcp
[RFC PATCH v5 09/12] __wr_after_init: rodata_test: refactor tests
Refactor the test cases, in preparation for using them also for testing __wr_after_init memory, when available. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- mm/rodata_test.c | 48 ++++++++++++++++++++++++++++-------------------- 1 file changed, 28 insertions(+), 20 deletions(-) diff --git a/mm/rodata_test.c b/mm/rodata_test.c index d908c8769b48..e1349520b436 100644 --- a/mm/rodata_test.c +++ b/mm/rodata_test.c @@ -14,44 +14,52 @@ #include #include -static const int rodata_test_data = 0xC3; +#define INIT_TEST_VAL 0xC3 -void rodata_test(void) +static const int rodata_test_data = INIT_TEST_VAL; + +static bool test_data(char *data_type, const int *data, + unsigned long start, unsigned long end) { - unsigned long start, end; int zero = 0; /* test 1: read the value */ /* If this test fails, some previous testrun has clobbered the state */ - if (!rodata_test_data) { - pr_err("test 1 fails (start data)\n"); - return; + if (*data != INIT_TEST_VAL) { + pr_err("%s: test 1 fails (init data value)\n", data_type); + return false; } /* test 2: write to the variable; this should fault */ - if (!probe_kernel_write((void *)&rodata_test_data, - (void *)&zero, sizeof(zero))) { - pr_err("test data was not read only\n"); - return; + if (!probe_kernel_write((void *)data, (void *)&zero, sizeof(zero))) { + pr_err("%s: test data was not read only\n", data_type); + return false; } /* test 3: check the value hasn't changed */ - if (rodata_test_data == zero) { - pr_err("test data was changed\n"); - return; + if (*data != INIT_TEST_VAL) { + pr_err("%s: test data was changed\n", data_type); + return false; } /* test 4: check if the rodata section is PAGE_SIZE aligned */ - start = (unsigned long)__start_rodata; - end = (unsigned long)__end_rodata; if (start & (PAGE_SIZE - 1)) { - 
pr_err("start of .rodata is not page size aligned\n"); - return; + pr_err("%s: start of data is not page size aligned\n", + data_type); + return false; } if (end & (PAGE_SIZE - 1)) { - pr_err("end of .rodata is not page size aligned\n"); - return; + pr_err("%s: end of data is not page size aligned\n", + data_type); + return false; } + pr_info("%s tests were successful", data_type); + return true; +} - pr_info("all tests were successful\n"); +void rodata_test(void) +{ + test_data("rodata", &rodata_test_data, + (unsigned long)&__start_rodata, + (unsigned long)&__end_rodata); } -- 2.19.1
[RFC PATCH v5 12/12] IMA: turn ima_policy_flags into __wr_after_init
The policy flags could be targeted by an attacker aiming at disabling IMA, so that there would be no trace of a file system modification in the measurement list. Since the flags can be altered at runtime, it is not possible to make them become fully read-only, for example with __ro_after_init. __wr_after_init can still provide some protection, at least against simple memory overwrite attacks Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- security/integrity/ima/ima.h| 3 ++- security/integrity/ima/ima_policy.c | 9 + 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h index cc12f3449a72..297c25f5122e 100644 --- a/security/integrity/ima/ima.h +++ b/security/integrity/ima/ima.h @@ -24,6 +24,7 @@ #include #include #include +#include #include #include "../integrity.h" @@ -50,7 +51,7 @@ enum tpm_pcrs { TPM_PCR0 = 0, TPM_PCR8 = 8 }; #define IMA_TEMPLATE_IMA_FMT "d|n" /* current content of the policy */ -extern int ima_policy_flag; +extern int ima_policy_flag __wr_after_init; /* set during initialization */ extern int ima_hash_algo; diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c index 8bc8a1c8cb3f..d49c545b9cfb 100644 --- a/security/integrity/ima/ima_policy.c +++ b/security/integrity/ima/ima_policy.c @@ -48,7 +48,7 @@ #define INVALID_PCR(a) (((a) < 0) || \ (a) >= (FIELD_SIZEOF(struct integrity_iint_cache, measured_pcrs) * 8)) -int ima_policy_flag; +int ima_policy_flag __wr_after_init; static int temp_ima_appraise; static int build_ima_appraise __ro_after_init; @@ -460,12 +460,13 @@ void ima_update_policy_flag(void) list_for_each_entry(entry, ima_rules, list) { if (entry->action & IMA_DO_MASK) - 
ima_policy_flag |= entry->action; + wr_assign(ima_policy_flag, + ima_policy_flag | entry->action); } ima_appraise |= (build_ima_appraise | temp_ima_appraise); if (!ima_appraise) - ima_policy_flag &= ~IMA_APPRAISE; + wr_assign(ima_policy_flag, ima_policy_flag & ~IMA_APPRAISE); } static int ima_appraise_flag(enum ima_hooks func) @@ -651,7 +652,7 @@ void ima_update_policy(void) list_splice_tail_init_rcu(&ima_temp_rules, policy, synchronize_rcu); if (ima_rules != policy) { - ima_policy_flag = 0; + wr_assign(ima_policy_flag, 0); ima_rules = policy; /* -- 2.19.1
[RFC PATCH v5 08/12] __wr_after_init: lkdtm test
Verify that trying to modify a variable with the __wr_after_init attribute will cause a crash. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- drivers/misc/lkdtm/core.c | 3 +++ drivers/misc/lkdtm/lkdtm.h | 3 +++ drivers/misc/lkdtm/perms.c | 29 + 3 files changed, 35 insertions(+) diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c index 2837dc77478e..73c34b17c433 100644 --- a/drivers/misc/lkdtm/core.c +++ b/drivers/misc/lkdtm/core.c @@ -155,6 +155,9 @@ static const struct crashtype crashtypes[] = { CRASHTYPE(ACCESS_USERSPACE), CRASHTYPE(WRITE_RO), CRASHTYPE(WRITE_RO_AFTER_INIT), +#ifdef CONFIG_PRMEM + CRASHTYPE(WRITE_WR_AFTER_INIT), +#endif CRASHTYPE(WRITE_KERN), CRASHTYPE(REFCOUNT_INC_OVERFLOW), CRASHTYPE(REFCOUNT_ADD_OVERFLOW), diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h index 3c6fd327e166..abba2f52ffa6 100644 --- a/drivers/misc/lkdtm/lkdtm.h +++ b/drivers/misc/lkdtm/lkdtm.h @@ -38,6 +38,9 @@ void lkdtm_READ_BUDDY_AFTER_FREE(void); void __init lkdtm_perms_init(void); void lkdtm_WRITE_RO(void); void lkdtm_WRITE_RO_AFTER_INIT(void); +#ifdef CONFIG_PRMEM +void lkdtm_WRITE_WR_AFTER_INIT(void); +#endif void lkdtm_WRITE_KERN(void); void lkdtm_EXEC_DATA(void); void lkdtm_EXEC_STACK(void); diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c index 53b85c9d16b8..f681730aa652 100644 --- a/drivers/misc/lkdtm/perms.c +++ b/drivers/misc/lkdtm/perms.c @@ -9,6 +9,7 @@ #include #include #include +#include #include /* Whether or not to fill the target memory area with do_nothing(). */ @@ -27,6 +28,10 @@ static const unsigned long rodata = 0xAA55AA55; /* This is marked __ro_after_init, so it should ultimately be .rodata. 
*/ static unsigned long ro_after_init __ro_after_init = 0x55AA5500; +/* This is marked __wr_after_init, so it should be in .rodata. */ +static +unsigned long wr_after_init __wr_after_init = 0x55AA5500; + /* * This just returns to the caller. It is designed to be copied into * non-executable memory regions. */ @@ -104,6 +109,28 @@ void lkdtm_WRITE_RO_AFTER_INIT(void) *ptr ^= 0xabcd1234; } +#ifdef CONFIG_PRMEM + +void lkdtm_WRITE_WR_AFTER_INIT(void) +{ + unsigned long *ptr = &wr_after_init; + + /* +* Verify we were written to during init. Since an Oops +* is considered a "success", a failure is to just skip the +* real test. +*/ + if ((*ptr & 0xAA) != 0xAA) { + pr_info("%p was NOT written during init!?\n", ptr); + return; + } + + pr_info("attempting bad wr_after_init write at %p\n", ptr); + *ptr ^= 0xabcd1234; +} + +#endif + void lkdtm_WRITE_KERN(void) { size_t size; @@ -200,4 +227,6 @@ void __init lkdtm_perms_init(void) /* Make sure we can write to __ro_after_init values during __init */ ro_after_init |= 0xAA; + /* Make sure we can write to __wr_after_init during __init */ + wr_after_init |= 0xAA; } -- 2.19.1
[RFC PATCH v5 06/12] __wr_after_init: arm64: enable
Set ARCH_HAS_PRMEM to Y for arm64 Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- arch/arm64/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index a4168d366127..7cbb2c133ed7 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -66,6 +66,7 @@ config ARM64 select ARCH_WANT_COMPAT_IPC_PARSE_VERSION select ARCH_WANT_FRAME_POINTERS select ARCH_HAS_UBSAN_SANITIZE_ALL + select ARCH_HAS_PRMEM select ARM_AMBA select ARM_ARCH_TIMER select ARM_GIC -- 2.19.1
[RFC PATCH v5 07/12] __wr_after_init: Documentation: self-protection
Update the self-protection documentation, to mention also the use of the __wr_after_init attribute. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- Documentation/security/self-protection.rst | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/Documentation/security/self-protection.rst b/Documentation/security/self-protection.rst index f584fb74b4ff..df2614bc25b9 100644 --- a/Documentation/security/self-protection.rst +++ b/Documentation/security/self-protection.rst @@ -84,12 +84,14 @@ For variables that are initialized once at ``__init`` time, these can be marked with the (new and under development) ``__ro_after_init`` attribute. -What remains are variables that are updated rarely (e.g. GDT). These -will need another infrastructure (similar to the temporary exceptions -made to kernel code mentioned above) that allow them to spend the rest -of their lifetime read-only. (For example, when being updated, only the -CPU thread performing the update would be given uninterruptible write -access to the memory.) +Others, which are statically allocated, but still need to be updated +rarely, can be marked with the ``__wr_after_init`` attribute. + +The update mechanism must avoid exposing the data to rogue alterations +during the update. For example, only the CPU thread performing the update +would be given uninterruptible write access to the memory. + +Currently there is no protection available for data allocated dynamically. Segregation of kernel memory from userspace memory ~~ -- 2.19.1
[RFC PATCH v5 00/12] hardening: statically allocated protected memory
To: Andy Lutomirski , To: Matthew Wilcox , To: Nadav Amit To: Peter Zijlstra , To: Dave Hansen , To: Mimi Zohar To: Thiago Jung Bauermann CC: Kees Cook CC: Ahmed Soliman CC: linux-integrity CC: Kernel Hardening CC: Linux-MM CC: Linux Kernel Mailing List Hello, new version of the patchset, with a default memset_user() function. Patch-set implementing write-rare memory protection for statically allocated data. Its purpose is to keep write protected the kernel data which is seldom modified, especially if altering it can be exploited during an attack. There is no read overhead, however writing requires special operations that are probably unsuitable for often-changing data. The use is opt-in, by applying the modifier __wr_after_init to a variable declaration. As the name implies, the write protection kicks in only after init() is completed; before that moment, the data is modifiable in the usual way. Current Limitations: * supports only data which is allocated statically, at build time. * verified (and enabled) only on x86_64 and arm64; other architectures need to be tested, possibly providing their own backend. Some notes: - in case an architecture doesn't support write rare, the behavior is to fall back to regular write operations - before altering any memory, the destination is sanitized - write rare data is segregated into its own set of pages - only x86_64 and arm64 verified, atm - the memset_user() assembly functions seem to work, but I'm not too sure they are really ok - I've added a simple example: the protection of ima_policy_flags - the last patch is optional, but it seemed worth doing the refactoring - the x86_64 user space address range is double the size of the kernel address space, so it's possible to randomize the beginning of the mapping of the kernel address space, but on arm64 they have the same size, so it's not possible to do the same. 
Eventually, the randomization could affect exclusively the ranges containing protectable memory, but this should be done together with the protection of dynamically allocated data (once it is available). - unaddressed: Nadav proposed to do: #define __wr __attribute__((address_space(5))) but I don't know exactly where to use it atm Changelog: v4->v5 -- * turned conditional inclusion of mm.h into permanent * added generic, albeit unoptimized memset_user() function * more verbose error messages for testing of wr_memset() v3->v4 -- * added function for setting memory in user space mapping for arm64 * refactored code, to work with both supported architectures * reduced dependency on x86_64 specific code, to support by default also arm64 * improved memset_user() for x86_64, but I'm not sure if I understood correctly what was the best way to enhance it. v2->v3 -- * both wr_memset and wr_memcpy are implemented as generic functions the arch code must provide suitable helpers * regular initialization for ima_policy_flags: it happens during init * remove spurious code from the initialization function v1->v2 -- * introduce cleaner split between generic and arch code * add x86_64 specific memset_user() * replace kernel-space memset() memcopy() with userspace counterpart * randomize the base address for the alternate map across the entire available address range from user space (128TB - 64TB) * convert BUG() to WARN() * turn verification of written data into debugging option * wr_rcu_assign_pointer() as special case of wr_assign() * example with protection of ima_policy_flags * documentation Igor Stoppa (11): __wr_after_init: linker section and attribute __wr_after_init: Core and default arch __wr_after_init: x86_64: randomize mapping offset __wr_after_init: x86_64: enable __wr_after_init: arm64: enable __wr_after_init: Documentation: self-protection __wr_after_init: lkdtm test __wr_after_init: rodata_test: refactor tests __wr_after_init: rodata_test: test __wr_after_init 
__wr_after_init: test write rare functionality IMA: turn ima_policy_flags into __wr_after_init Nadav Amit (1): fork: provide a function for copying init_mm Documentation/security/self-protection.rst | 14 +- arch/Kconfig | 22 +++ arch/arm64/Kconfig | 1 + arch/x86/Kconfig | 1 + arch/x86/mm/Makefile | 2 + arch/x86/mm/prmem.c (new) | 20 +++ drivers/misc/lkdtm/core.c | 3 + drivers/misc/lkdtm/lkdtm.h | 3 + drivers/misc/lkdtm/perms.c | 29 include/asm-generic/vmlinux.lds.h | 25 +++ include/linux/cache.h | 21 +++ include/linux/prmem.h (new)| 70 include/linux/sched/task.h | 1 + init/main.c| 3 + kernel/fork.c | 24 ++- mm/Kconfig.debug | 8 +
[RFC PATCH v5 02/12] __wr_after_init: linker section and attribute
Introduce a linker section and a matching attribute for statically allocated write rare data. The attribute is named "__wr_after_init". After the init phase is completed, this section will be modifiable only by invoking write rare functions. The section occupies a set of full pages, since the granularity available for write protection is of one memory page. The functionality is automatically activated by any architecture that sets CONFIG_ARCH_HAS_PRMEM Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- arch/Kconfig | 15 +++ include/asm-generic/vmlinux.lds.h | 25 + include/linux/cache.h | 21 + init/main.c | 3 +++ 4 files changed, 64 insertions(+) diff --git a/arch/Kconfig b/arch/Kconfig index 4cfb6de48f79..b0b6d176f1c1 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -808,6 +808,21 @@ config VMAP_STACK the stack to map directly to the KASAN shadow map using a formula that is incorrect if the stack is in vmalloc space. +config ARCH_HAS_PRMEM + def_bool n + help + architecture specific symbol stating that the architecture provides + a back-end function for the write rare operation. + +config PRMEM + bool "Write protect critical data that doesn't need high write speed." + depends on ARCH_HAS_PRMEM + default y + help + If the architecture supports it, statically allocated data which + has been selected for hardening becomes (mostly) read-only. + The selection happens by labelling the data "__wr_after_init". 
+ config ARCH_OPTIONAL_KERNEL_RWX def_bool n diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 3d7a6a9c2370..ddb1fd608490 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -311,6 +311,30 @@ KEEP(*(__jump_table)) \ __stop___jump_table = .; +/* + * Allow architectures to handle wr_after_init data on their + * own by defining an empty WR_AFTER_INIT_DATA. + * However, it's important that pages containing WR_RARE data do not + * hold anything else, to avoid both accidentally unprotecting something + * that is supposed to stay read-only all the time and also to protect + * something else that is supposed to be writeable all the time. + */ +#ifndef WR_AFTER_INIT_DATA +#ifdef CONFIG_PRMEM +#define WR_AFTER_INIT_DATA(align) \ + . = ALIGN(PAGE_SIZE); \ + __start_wr_after_init = .; \ + . = ALIGN(align); \ + *(.data..wr_after_init) \ + . = ALIGN(PAGE_SIZE); \ + __end_wr_after_init = .;\ + . = ALIGN(align); +#else +#define WR_AFTER_INIT_DATA(align) \ + . = ALIGN(align); +#endif +#endif + /* * Allow architectures to handle ro_after_init data on their * own by defining an empty RO_AFTER_INIT_DATA. @@ -332,6 +356,7 @@ __start_rodata = .; \ *(.rodata) *(.rodata.*) \ RO_AFTER_INIT_DATA /* Read only after init */ \ + WR_AFTER_INIT_DATA(align) /* wr after init */ \ KEEP(*(__vermagic)) /* Kernel version magic */ \ . = ALIGN(8); \ __start___tracepoints_ptrs = .; \ diff --git a/include/linux/cache.h b/include/linux/cache.h index 750621e41d1c..09bd0b9284b6 100644 --- a/include/linux/cache.h +++ b/include/linux/cache.h @@ -31,6 +31,27 @@ #define __ro_after_init __attribute__((__section__(".data..ro_after_init"))) #endif +/* + * __wr_after_init is used to mark objects that cannot be modified + * directly after init (i.e. after mark_rodata_ro() has been called). + * These objects become effectively read-only, from the perspective of + * performing a direct write, like a variable assignment. 
+ * However, they can be altered through a dedicated function. + * It is intended for those objects which are occasionally modified after + * init, however they are modified so seldom that the extra cost from + * the indirect modification is either negligible or worth paying, for the + * sake of the protection gained. + */ +#ifndef __wr_after_init +#define __wr_after_init \ + __attribute__((__section__(".data..wr_after_init"))) +#endif
[RFC PATCH v5 04/12] __wr_after_init: x86_64: randomize mapping offset
x86_64 specialized way of defining the base address for the alternate mapping used by write-rare. Since the kernel address space spans across 64TB and it is mapped into a used address space of 128TB, the kernel address space can be shifted by a random offset that is up to 64TB and page aligned. This is accomplished by providing arch-specific version of the function __init_wr_base() Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- arch/x86/mm/Makefile | 2 ++ arch/x86/mm/prmem.c (new) | 20 2 files changed, 22 insertions(+) diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 4b101dd6e52f..66652de1e2c7 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -53,3 +53,5 @@ obj-$(CONFIG_PAGE_TABLE_ISOLATION)+= pti.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_identity.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_boot.o + +obj-$(CONFIG_PRMEM)+= prmem.o diff --git a/arch/x86/mm/prmem.c b/arch/x86/mm/prmem.c new file mode 100644 index ..b04fc03f92fb --- /dev/null +++ b/arch/x86/mm/prmem.c @@ -0,0 +1,20 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * prmem.c: Memory Protection Library - x86_64 backend + * + * (C) Copyright 2018-2019 Huawei Technologies Co. Ltd. + * Author: Igor Stoppa + */ + +#include +#include + +unsigned long __init __init_wr_base(void) +{ + /* +* Place 64TB of kernel address space within 128TB of user address +* space, at a random page aligned offset. +*/ + return (((unsigned long)kaslr_get_random_long("WR Poke")) & + PAGE_MASK) % (64 * _BITUL(40)); +} -- 2.19.1
[RFC PATCH v5 03/12] __wr_after_init: Core and default arch
The patch provides: - the core functionality for write-rare after init for statically allocated data, based on code from Matthew Wilcox - the default implementation for a generic architecture. A specific architecture can override one or more of the default functions. The core (API) functions are: - wr_memset(): write rare counterpart of memset() - wr_memcpy(): write rare counterpart of memcpy() - wr_assign(): write rare counterpart of the assignment ('=') operator - wr_rcu_assign_pointer(): write rare counterpart of rcu_assign_pointer() In case either the selected architecture doesn't support write rare after init, or the functionality is disabled, the write rare functions will resolve into their non-write rare counterparts: - memset() - memcpy() - assignment operator - rcu_assign_pointer() For code that can be linked either as a module or as built-in (e.g. a device driver init function), it is not possible to tell upfront which will be the case. In this scenario, if the functions are called during system init, they will automatically choose, at runtime, to go through the fast path of non-write rare. Should they be invoked later, during module init, they will use the write-rare path. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- arch/Kconfig| 7 ++ include/linux/prmem.h (new) | 70 ++ mm/Makefile | 1 + mm/prmem.c (new)| 193 ++ 4 files changed, 271 insertions(+) diff --git a/arch/Kconfig b/arch/Kconfig index b0b6d176f1c1..0380d4a64681 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -814,6 +814,13 @@ config ARCH_HAS_PRMEM architecture specific symbol stating that the architecture provides a back-end function for the write rare operation. 
+config ARCH_HAS_PRMEM_HEADER + def_bool n + depends on ARCH_HAS_PRMEM + help + architecture specific symbol stating that the architecture provides + its own specific header back-end for the write rare operation. + config PRMEM bool "Write protect critical data that doesn't need high write speed." depends on ARCH_HAS_PRMEM diff --git a/include/linux/prmem.h b/include/linux/prmem.h new file mode 100644 index ..05a5e5b3abfd --- /dev/null +++ b/include/linux/prmem.h @@ -0,0 +1,70 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * prmem.h: Header for memory protection library - generic part + * + * (C) Copyright 2018-2019 Huawei Technologies Co. Ltd. + * Author: Igor Stoppa + */ + +#ifndef _LINUX_PRMEM_H +#define _LINUX_PRMEM_H + +#include +#include +#include + +#ifndef CONFIG_PRMEM + +static inline void *wr_memset(void *p, int c, __kernel_size_t n) +{ + return memset(p, c, n); +} + +static inline void *wr_memcpy(void *p, const void *q, __kernel_size_t n) +{ + return memcpy(p, q, n); +} + +#define wr_assign(var, val)((var) = (val)) +#define wr_rcu_assign_pointer(p, v)rcu_assign_pointer(p, v) + +#else + +void *wr_memset(void *p, int c, __kernel_size_t n); +void *wr_memcpy(void *p, const void *q, __kernel_size_t n); + +/** + * wr_assign() - sets a write-rare variable to a specified value + * @var: the variable to set + * @val: the new value + * + * Returns: the variable + */ + +#define wr_assign(dst, val) ({ \ + typeof(dst) tmp = (typeof(dst))val; \ + \ + wr_memcpy(&dst, &tmp, sizeof(dst)); \ + dst;\ +}) + +/** + * wr_rcu_assign_pointer() - initialize a pointer in rcu mode + * @p: the rcu pointer - it MUST be aligned to a machine word + * @v: the new value + * + * Returns the value assigned to the rcu pointer. 
+ * + * It is provided as a macro, to match rcu_assign_pointer() + * The rcu_assign_pointer() is implemented as equivalent of: + * + * smp_mb(); + * WRITE_ONCE(p, v); + */ +#define wr_rcu_assign_pointer(p, v) ({ \ + smp_mb(); \ + wr_assign(p, v);\ + p; \ +}) +#endif +#endif diff --git a/mm/Makefile b/mm/Makefile index d210cc9d6f80..ef3867c16ce0 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -58,6 +58,7 @@ obj-$(CONFIG_SPARSEMEM) += sparse.o obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o obj-$(CONFIG_SLOB) += slob.o obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o +obj-$(CONFIG_PRMEM) += prmem.o obj-$(CONFIG_KSM) += ksm.o obj-$(CONFIG_PAGE_POISONING) += page_poison.o obj-$(CONFIG_SLAB) += slab.o diff --git a/mm/prmem.c b/mm/prmem.c new file mode 100644 index ..455e1e446260 --- /dev/null +++ b/mm/prmem.c @@ -0,0 +1,193 @@ +// SPDX-License-Identifier: GPL-2.0 +/* +
Re: [RFC PATCH v4 01/12] __wr_after_init: Core and default arch
On 12/02/2019 04:39, Matthew Wilcox wrote: On Tue, Feb 12, 2019 at 01:27:38AM +0200, Igor Stoppa wrote: +#ifndef CONFIG_PRMEM [...] +#else + +#include <linux/mm.h> It's a mistake to do conditional includes like this. That way you see include loops with some configs and not others. Our headers are already so messy, better to just include mm.h unconditionally. ok Can I still do the following, in prmem.c ? #ifdef CONFIG_ARCH_HAS_PRMEM_HEADER +#include <asm/prmem.h> +#else + +struct wr_state { + struct mm_struct *prev; +}; + +#endif It's still a conditional include, but it's in a C file, it shouldn't cause any chance of loops. The alternative is that each arch supporting prmem must have a (probably) empty asm/prmem.h header. I did some research about telling the compiler to include a header only if it exists, but it doesn't seem to be a gcc feature. -- igor
Re: [RFC PATCH v4 00/12] hardening: statically allocated protected memory
On 12/02/2019 03:26, Kees Cook wrote: On Mon, Feb 11, 2019 at 5:08 PM igor.sto...@gmail.com wrote: On Tue, 12 Feb 2019, 4.47 Kees Cook On Mon, Feb 11, 2019 at 4:37 PM Igor Stoppa wrote: On 12/02/2019 02:09, Kees Cook wrote: On Mon, Feb 11, 2019 at 3:28 PM Igor Stoppa wrote: It looked like only the memset() needed architecture support. Is there a reason for not being able to implement memset() in terms of an inefficient put_user() loop instead? That would eliminate the need for per-arch support, yes? So far, yes, however from previous discussion about power arch, I understood this implementation would not be so easy to adapt. Lacking other examples where the extra mapping could be used, I did not want to add code without a use case. Probably both arm and x86 32 bit could do, but I would like to first get to the bitter end with memory protection (the other 2 thirds). Mostly, I hated having just one arch and I also really wanted to have arm64. Right, I meant, if you implemented the _memset() case with put_user() in this version, you could drop the arch-specific _memset() and shrink the patch series. Then you could also enable this across all the architectures in one patch. (Would you even need the Kconfig patches, i.e. won't this "Just Work" on everything with an MMU?) I had similar thoughts, but this answer [1] deflated my hopes (if I understood it correctly). It seems that each arch needs to be massaged in separately. True, but I think x86_64, x86, arm64, and arm will all be "normal". power may be that way too, but they always surprise me. :) Anyway, series looks good, but since nothing uses _memset(), it might make sense to leave it out and put all the arch-enabling into a single patch to cover the 4 archs above, in an effort to make the series even smaller. Actually, I do use it, albeit indirectly. That's the whole point of having the IMA patch as example. 
This is the fragment: @@ -460,12 +460,13 @@ void ima_update_policy_flag(void) list_for_each_entry(entry, ima_rules, list) { if (entry->action & IMA_DO_MASK) - ima_policy_flag |= entry->action; + wr_assign(ima_policy_flag, + ima_policy_flag | entry->action); } ima_appraise |= (build_ima_appraise | temp_ima_appraise); if (!ima_appraise) - ima_policy_flag &= ~IMA_APPRAISE; + wr_assign(ima_policy_flag, ima_policy_flag & ~IMA_APPRAISE); } wr_assign() does just that. However, reading again your previous mails, I realize that I might have misinterpreted what you were suggesting. If the advice is to have also a default memset_user() which relies on put_user(), but do not activate the feature by default for every architecture, I definitely agree that it would be good to have it. I just didn't think about it before. What I cannot do is to turn it on for all the architectures prior to test it and atm I do not have means to do it. But I now realize that most likely you were just suggesting to have full, albeit inefficient default support and then let various archs review/enhance it. I can certainly do this. Regarding testing I have a question: how much can/should I lean on qemu? In most cases the MMU might not need to be fully emulated, so I wonder how well qemu-based testing can ensure that real life scenarios will work. -- igor
Re: [RFC PATCH v4 00/12] hardening: statically allocated protected memory
On 12/02/2019 02:09, Kees Cook wrote: On Mon, Feb 11, 2019 at 3:28 PM Igor Stoppa wrote: [...] Patch-set implementing write-rare memory protection for statically allocated data. It seems like this could be expanded in the future to cover dynamic memory too (i.e. just a separate base range in the mm). Indeed. And part of the code refactoring is also geared in that direction. I am working on that part, but it was agreed that I would first provide this subset of features covering statically allocated memory. So I'm sticking to the plan. But this is roughly 1/3 of the basic infra I have in mind. Its purpose is to keep write protected the kernel data which is seldom modified, especially if altering it can be exploited during an attack. There is no read overhead, however writing requires special operations that are probably unsuitable for often-changing data. The use is opt-in, by applying the modifier __wr_after_init to a variable declaration. As the name implies, the write protection kicks in only after init() is completed; before that moment, the data is modifiable in the usual way. Current Limitations: * supports only data which is allocated statically, at build time. * supports only x86_64 and arm64;other architectures need to provide own backend It looked like only the memset() needed architecture support. Is there a reason for not being able to implement memset() in terms of an inefficient put_user() loop instead? That would eliminate the need for per-arch support, yes? So far, yes, however from previous discussion about power arch, I understood this implementation would not be so easy to adapt. Lacking other examples where the extra mapping could be used, I did not want to add code without a use case. Probably both arm and x86 32 bit could do, but I would like to first get to the bitter end with memory protection (the other 2 thirds). Mostly, I hated having just one arch and I also really wanted to have arm64. 
But eventually, yes, a generic put_user() loop could do, provided that there are other arch where the extra mapping to user space would be a good way to limit write access. This last part is what I'm not sure of. - I've added a simple example: the protection of ima_policy_flags You'd also looked at SELinux too, yes? What other things could be targeted for protection? (It seems we can't yet protect page tables themselves with this...) Yes, I have. See the "1/3" explanation above. I'm also trying to get away with as small example as possible, to get the basic infra merged. SELinux is not going to be a small patch set. I'd rather move to it once at least some of the framework is merged. It might be a good use case for dynamic allocation, if I do not find something smaller. But for static write rare, going after IMA was easier, and it is still a good target for protection, imho, as flipping this variable should be sufficient for turning IMA off. For the page tables, I have in mind a little bit different approach, that I hope to explain better once I get to do the dynamic allocation. - the x86_64 user space address range is double the size of the kernel address space, so it's possible to randomize the beginning of the mapping of the kernel address space, but on arm64 they have the same size, so it's not possible to do the same Only the wr_rare section needs mapping, though, yes? Yup, however, once more, I'm not so keen to do what seems as premature optimization, before I have addressed the framework in its entirety, as the dynamic allocation will need similar treatment. - I'm not sure if it's correct, since it doesn't seem to be that common in kernel sources, but instead of using #defines for overriding default function calls, I'm using "weak" for the default functions. The tradition is to use #defines for easier readability, but "weak" continues to be a thing. 
*shrug* Yes, I wasn't so sure about it, but I kinda liked the fact that, by using "weak", the arch header becomes optional, unless one has to redefine the struct wr_state. This will be a nice addition to protect more of the kernel's static data from write-what-where attacks. :) I hope so :-) -- thanks, igor
[RFC PATCH v4 03/12] __wr_after_init: x86_64: randomize mapping offset
x86_64 specialized way of defining the base address for the alternate mapping used by write-rare. Since the kernel address space spans across 64TB and it is mapped into a used address space of 128TB, the kernel address space can be shifted by a random offset that is up to 64TB and page aligned. This is accomplished by providing arch-specific version of the function __init_wr_base() Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- arch/x86/mm/Makefile | 2 ++ arch/x86/mm/prmem.c (new) | 20 2 files changed, 22 insertions(+) diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 4b101dd6e52f..66652de1e2c7 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -53,3 +53,5 @@ obj-$(CONFIG_PAGE_TABLE_ISOLATION)+= pti.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_identity.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_boot.o + +obj-$(CONFIG_PRMEM)+= prmem.o diff --git a/arch/x86/mm/prmem.c b/arch/x86/mm/prmem.c new file mode 100644 index ..b04fc03f92fb --- /dev/null +++ b/arch/x86/mm/prmem.c @@ -0,0 +1,20 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * prmem.c: Memory Protection Library - x86_64 backend + * + * (C) Copyright 2018-2019 Huawei Technologies Co. Ltd. + * Author: Igor Stoppa + */ + +#include +#include + +unsigned long __init __init_wr_base(void) +{ + /* +* Place 64TB of kernel address space within 128TB of user address +* space, at a random page aligned offset. +*/ + return (((unsigned long)kaslr_get_random_long("WR Poke")) & + PAGE_MASK) % (64 * _BITUL(40)); +} -- 2.19.1
[RFC PATCH v4 08/12] __wr_after_init: lkdtm test
Verify that trying to modify a variable with the __wr_after_init attribute will cause a crash. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- drivers/misc/lkdtm/core.c | 3 +++ drivers/misc/lkdtm/lkdtm.h | 3 +++ drivers/misc/lkdtm/perms.c | 29 + 3 files changed, 35 insertions(+) diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c index 2837dc77478e..73c34b17c433 100644 --- a/drivers/misc/lkdtm/core.c +++ b/drivers/misc/lkdtm/core.c @@ -155,6 +155,9 @@ static const struct crashtype crashtypes[] = { CRASHTYPE(ACCESS_USERSPACE), CRASHTYPE(WRITE_RO), CRASHTYPE(WRITE_RO_AFTER_INIT), +#ifdef CONFIG_PRMEM + CRASHTYPE(WRITE_WR_AFTER_INIT), +#endif CRASHTYPE(WRITE_KERN), CRASHTYPE(REFCOUNT_INC_OVERFLOW), CRASHTYPE(REFCOUNT_ADD_OVERFLOW), diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h index 3c6fd327e166..abba2f52ffa6 100644 --- a/drivers/misc/lkdtm/lkdtm.h +++ b/drivers/misc/lkdtm/lkdtm.h @@ -38,6 +38,9 @@ void lkdtm_READ_BUDDY_AFTER_FREE(void); void __init lkdtm_perms_init(void); void lkdtm_WRITE_RO(void); void lkdtm_WRITE_RO_AFTER_INIT(void); +#ifdef CONFIG_PRMEM +void lkdtm_WRITE_WR_AFTER_INIT(void); +#endif void lkdtm_WRITE_KERN(void); void lkdtm_EXEC_DATA(void); void lkdtm_EXEC_STACK(void); diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c index 53b85c9d16b8..f681730aa652 100644 --- a/drivers/misc/lkdtm/perms.c +++ b/drivers/misc/lkdtm/perms.c @@ -9,6 +9,7 @@ #include #include #include +#include #include /* Whether or not to fill the target memory area with do_nothing(). */ @@ -27,6 +28,10 @@ static const unsigned long rodata = 0xAA55AA55; /* This is marked __ro_after_init, so it should ultimately be .rodata. 
*/ static unsigned long ro_after_init __ro_after_init = 0x55AA5500; +/* This is marked __wr_after_init, so it should be in .rodata. */ +static +unsigned long wr_after_init __wr_after_init = 0x55AA5500; + /* * This just returns to the caller. It is designed to be copied into * non-executable memory regions. */ @@ -104,6 +109,28 @@ void lkdtm_WRITE_RO_AFTER_INIT(void) *ptr ^= 0xabcd1234; } +#ifdef CONFIG_PRMEM + +void lkdtm_WRITE_WR_AFTER_INIT(void) +{ + unsigned long *ptr = &wr_after_init; + + /* +* Verify we were written to during init. Since an Oops +* is considered a "success", a failure is to just skip the +* real test. +*/ + if ((*ptr & 0xAA) != 0xAA) { + pr_info("%p was NOT written during init!?\n", ptr); + return; + } + + pr_info("attempting bad wr_after_init write at %p\n", ptr); + *ptr ^= 0xabcd1234; +} + +#endif + void lkdtm_WRITE_KERN(void) { size_t size; @@ -200,4 +227,6 @@ void __init lkdtm_perms_init(void) /* Make sure we can write to __ro_after_init values during __init */ ro_after_init |= 0xAA; + /* Make sure we can write to __wr_after_init during __init */ + wr_after_init |= 0xAA; } -- 2.19.1
[RFC PATCH v4 11/12] __wr_after_init: test write rare functionality
Set of test cases meant to confirm that the write rare functionality works as expected. It can be optionally compiled as module. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- mm/Kconfig.debug | 8 +++ mm/Makefile| 1 + mm/test_write_rare.c (new) | 136 +++ 3 files changed, 145 insertions(+) diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug index 9a7b8b049d04..a62c31901fea 100644 --- a/mm/Kconfig.debug +++ b/mm/Kconfig.debug @@ -94,3 +94,11 @@ config DEBUG_RODATA_TEST depends on STRICT_KERNEL_RWX ---help--- This option enables a testcase for the setting rodata read-only. + +config DEBUG_PRMEM_TEST +tristate "Run self test for statically allocated protected memory" +depends on PRMEM +default n +help + Tries to verify that the protection for statically allocated memory + works correctly and that the memory is effectively protected. diff --git a/mm/Makefile b/mm/Makefile index ef3867c16ce0..8de1d468f4e7 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -59,6 +59,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o obj-$(CONFIG_SLOB) += slob.o obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o obj-$(CONFIG_PRMEM) += prmem.o +obj-$(CONFIG_DEBUG_PRMEM_TEST) += test_write_rare.o obj-$(CONFIG_KSM) += ksm.o obj-$(CONFIG_PAGE_POISONING) += page_poison.o obj-$(CONFIG_SLAB) += slab.o diff --git a/mm/test_write_rare.c b/mm/test_write_rare.c new file mode 100644 index ..dd2a0e2d6024 --- /dev/null +++ b/mm/test_write_rare.c @@ -0,0 +1,136 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * test_write_rare.c + * + * (C) Copyright 2018 Huawei Technologies Co. Ltd. 
+ * Author: Igor Stoppa + */ + +#include +#include +#include +#include +#include +#include + +#ifdef pr_fmt +#undef pr_fmt +#endif + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +extern long __start_wr_after_init; +extern long __end_wr_after_init; + +static __wr_after_init int scalar = '0'; +static __wr_after_init u8 array[PAGE_SIZE * 3] __aligned(PAGE_SIZE); + +/* The section must occupy a non-zero number of whole pages */ +static bool test_alignment(void) +{ + unsigned long pstart = (unsigned long)&__start_wr_after_init; + unsigned long pend = (unsigned long)&__end_wr_after_init; + + if (WARN((pstart & ~PAGE_MASK) || (pend & ~PAGE_MASK) || +(pstart >= pend), "Boundaries test failed.")) + return false; + pr_info("Boundaries test passed."); + return true; +} + +static bool test_pattern(void) +{ + return (memchr_inv(array, '0', PAGE_SIZE / 2) || + memchr_inv(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 3 / 4) || + memchr_inv(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2) || + memchr_inv(array + PAGE_SIZE * 7 / 4, '1', PAGE_SIZE * 3 / 4) || + memchr_inv(array + PAGE_SIZE * 5 / 2, '0', PAGE_SIZE / 2)); +} + +static bool test_wr_memset(void) +{ + int new_val = '1'; + + wr_memset(&scalar, new_val, sizeof(scalar)); + if (WARN(memchr_inv(&scalar, new_val, sizeof(scalar)), +"Scalar write rare memset test failed.")) + return false; + + pr_info("Scalar write rare memset test passed."); + + wr_memset(array, '0', PAGE_SIZE * 3); + if (WARN(memchr_inv(array, '0', PAGE_SIZE * 3), +"Array write rare memset test failed.")) + return false; + + wr_memset(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 2); + if (WARN(memchr_inv(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 2), +"Array write rare memset test failed.")) + return false; + + wr_memset(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2); + if (WARN(memchr_inv(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2), +"Array write rare memset test failed.")) + return false; + + if (WARN(test_pattern(), "Array write rare memset test failed.")) + return false; + + 
pr_info("Array write rare memset test passed."); + return true; +} + +static u8 array_1[PAGE_SIZE * 2]; +static u8 array_2[PAGE_SIZE * 2]; + +static bool test_wr_memcpy(void) +{ + int new_val = 0x12345678; + + wr_assign(scalar, new_val); + if (WARN(memcmp(&scalar, &new_val, sizeof(scalar)), +"Scalar write rare memcpy test failed.")) + return false; + pr_info("Scalar write rare memcpy test passed."); + + wr_memset(array, '0', PAGE_SIZE * 3); + memset(array_1, '1', P
[RFC PATCH v4 12/12] IMA: turn ima_policy_flags into __wr_after_init
The policy flags could be targeted by an attacker aiming at disabling IMA, so that there would be no trace of a file system modification in the measurement list. Since the flags can be altered at runtime, it is not possible to make them become fully read-only, for example with __ro_after_init. __wr_after_init can still provide some protection, at least against simple memory overwrite attacks Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- security/integrity/ima/ima.h| 3 ++- security/integrity/ima/ima_policy.c | 9 + 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h index cc12f3449a72..297c25f5122e 100644 --- a/security/integrity/ima/ima.h +++ b/security/integrity/ima/ima.h @@ -24,6 +24,7 @@ #include #include #include +#include #include #include "../integrity.h" @@ -50,7 +51,7 @@ enum tpm_pcrs { TPM_PCR0 = 0, TPM_PCR8 = 8 }; #define IMA_TEMPLATE_IMA_FMT "d|n" /* current content of the policy */ -extern int ima_policy_flag; +extern int ima_policy_flag __wr_after_init; /* set during initialization */ extern int ima_hash_algo; diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c index 8bc8a1c8cb3f..d49c545b9cfb 100644 --- a/security/integrity/ima/ima_policy.c +++ b/security/integrity/ima/ima_policy.c @@ -48,7 +48,7 @@ #define INVALID_PCR(a) (((a) < 0) || \ (a) >= (FIELD_SIZEOF(struct integrity_iint_cache, measured_pcrs) * 8)) -int ima_policy_flag; +int ima_policy_flag __wr_after_init; static int temp_ima_appraise; static int build_ima_appraise __ro_after_init; @@ -460,12 +460,13 @@ void ima_update_policy_flag(void) list_for_each_entry(entry, ima_rules, list) { if (entry->action & IMA_DO_MASK) - 
ima_policy_flag |= entry->action; + wr_assign(ima_policy_flag, + ima_policy_flag | entry->action); } ima_appraise |= (build_ima_appraise | temp_ima_appraise); if (!ima_appraise) - ima_policy_flag &= ~IMA_APPRAISE; + wr_assign(ima_policy_flag, ima_policy_flag & ~IMA_APPRAISE); } static int ima_appraise_flag(enum ima_hooks func) @@ -651,7 +652,7 @@ void ima_update_policy(void) list_splice_tail_init_rcu(_temp_rules, policy, synchronize_rcu); if (ima_rules != policy) { - ima_policy_flag = 0; + wr_assign(ima_policy_flag, 0); ima_rules = policy; /* -- 2.19.1
[RFC PATCH v4 05/12] __wr_after_init: arm64: memset_user()
arm64 specific version of memset() for user space, memset_user() In the __wr_after_init scenario, write-rare variables have: - a primary read-only mapping in kernel memory space - an alternate, writable mapping, implemented as a user-space mapping The write rare implementation expects the arch code to provide a memset_user() function, which is currently missing. clear_user() is the base for memset_user() Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- arch/arm64/include/asm/uaccess.h | 9 + arch/arm64/lib/Makefile| 2 +- arch/arm64/lib/memset_user.S (new) | 63 3 files changed, 73 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h index 547d7a0c9d05..0094f92a8f1b 100644 --- a/arch/arm64/include/asm/uaccess.h +++ b/arch/arm64/include/asm/uaccess.h @@ -415,6 +415,15 @@ extern unsigned long __must_check __arch_copy_in_user(void __user *to, const voi #define INLINE_COPY_TO_USER #define INLINE_COPY_FROM_USER +extern unsigned long __must_check __arch_memset_user(void __user *to, int c, unsigned long n); +static inline unsigned long __must_check __memset_user(void __user *to, int c, unsigned long n) +{ + if (access_ok(to, n)) + n = __arch_memset_user(__uaccess_mask_ptr(to), c, n); + return n; +} +#define memset_user __memset_user + extern unsigned long __must_check __arch_clear_user(void __user *to, unsigned long n); static inline unsigned long __must_check __clear_user(void __user *to, unsigned long n) { diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile index 5540a1638baf..614b090888de 100644 --- a/arch/arm64/lib/Makefile +++ b/arch/arm64/lib/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 -lib-y := clear_user.o delay.o 
copy_from_user.o\ +lib-y := clear_user.o memset_user.o delay.o copy_from_user.o \ copy_to_user.o copy_in_user.o copy_page.o\ clear_page.o memchr.o memcpy.o memmove.o memset.o\ memcmp.o strcmp.o strncmp.o strlen.o strnlen.o \ diff --git a/arch/arm64/lib/memset_user.S b/arch/arm64/lib/memset_user.S new file mode 100644 index ..1bfbda3d112b --- /dev/null +++ b/arch/arm64/lib/memset_user.S @@ -0,0 +1,63 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * memset_user.S - memset for userspace on arm64 + * + * (C) Copyright 2018 Huawei Technologies Co. Ltd. + * Author: Igor Stoppa + * + * Based on arch/arm64/lib/clear_user.S + */ + +#include + +#include + + .text + +/* Prototype: int __arch_memset_user(void *addr, int c, size_t n) + * Purpose : set n bytes of user memory at "addr" to the value "c" + * Params : x0 - addr, user memory address to set + * : x1 - c, byte value + * : x2 - n, number of bytes to set + * Returns : number of bytes NOT set + * + * Alignment fixed up by hardware. + */ +ENTRY(__arch_memset_user) + uaccess_enable_not_uao x3, x4, x5 + // replicate the byte to the whole register + and x1, x1, 0xff + lsl x3, x1, 8 + orr x1, x3, x1 + lsl x3, x1, 16 + orr x1, x3, x1 + lsl x3, x1, 32 + orr x1, x3, x1 + mov x3, x2 // save the size for fixup return + subs x2, x2, #8 + b.mi 2f +1: +uao_user_alternative 9f, str, sttr, x1, x0, 8 + subs x2, x2, #8 + b.pl 1b +2: adds x2, x2, #4 + b.mi 3f +uao_user_alternative 9f, str, sttr, x1, x0, 4 + sub x2, x2, #4 +3: adds x2, x2, #2 + b.mi 4f +uao_user_alternative 9f, strh, sttrh, w1, x0, 2 + sub x2, x2, #2 +4: adds x2, x2, #1 + b.mi 5f +uao_user_alternative 9f, strb, sttrb, w1, x0, 0 +5: mov x0, #0 + uaccess_disable_not_uao x3, x4 + ret +ENDPROC(__arch_memset_user) + + .section .fixup,"ax" + .align 2 +9: mov x0, x3 // return the original size + ret + .previous -- 2.19.1
[RFC PATCH v4 02/12] __wr_after_init: x86_64: memset_user()
x86_64 specific version of memset() for user space, memset_user() In the __wr_after_init scenario, write-rare variables have: - a primary read-only mapping in kernel memory space - an alternate, writable mapping, implemented as a user-space mapping The write rare implementation expects the arch code to provide a memset_user() function, which is currently missing. clear_user() is the base for memset_user() Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- arch/x86/include/asm/uaccess_64.h | 6 arch/x86/lib/usercopy_64.c| 51 + 2 files changed, 57 insertions(+) diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h index a9d637bc301d..f194bfce4866 100644 --- a/arch/x86/include/asm/uaccess_64.h +++ b/arch/x86/include/asm/uaccess_64.h @@ -213,4 +213,10 @@ copy_user_handle_tail(char *to, char *from, unsigned len); unsigned long mcsafe_handle_tail(char *to, char *from, unsigned len); +unsigned long __must_check +memset_user(void __user *mem, int c, unsigned long len); + +unsigned long __must_check +__memset_user(void __user *mem, int c, unsigned long len); + #endif /* _ASM_X86_UACCESS_64_H */ diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c index ee42bb0cbeb3..e61963585354 100644 --- a/arch/x86/lib/usercopy_64.c +++ b/arch/x86/lib/usercopy_64.c @@ -9,6 +9,57 @@ #include #include +/* + * Memset Userspace + */ + +unsigned long __memset_user(void __user *addr, int c, unsigned long size) +{ + long __d0; + unsigned long pattern = 0x0101010101010101UL * (0xFFUL & c); + + might_fault(); + /* no memory constraint: gcc doesn't know about this memory */ + stac(); + asm volatile( + " movq %[pattern], %%rdx\n" + " testq %[size8],%[size8]\n" + " jz 4f\n" + "0: mov 
%%rdx,(%[dst])\n" + " addq $8,%[dst]\n" + " decl %%ecx ; jnz 0b\n" + "4: movq %[size1],%%rcx\n" + " testl %%ecx,%%ecx\n" + " jz 2f\n" + "1: movb %%dl,(%[dst])\n" + " incq %[dst]\n" + " decl %%ecx ; jnz 1b\n" + "2:\n" + ".section .fixup,\"ax\"\n" + "3: lea 0(%[size1],%[size8],8),%[size8]\n" + " jmp 2b\n" + ".previous\n" + _ASM_EXTABLE_UA(0b, 3b) + _ASM_EXTABLE_UA(1b, 2b) + : [size8] "=&c"(size), [dst] "=&D" (__d0) + : [size1] "r" (size & 7), "[size8]" (size / 8), + "[dst]" (addr), [pattern] "r" (pattern) + : "rdx"); + + clac(); + return size; +} +EXPORT_SYMBOL(__memset_user); + +unsigned long memset_user(void __user *to, int c, unsigned long n) +{ + if (access_ok(to, n)) + return __memset_user(to, c, n); + return n; +} +EXPORT_SYMBOL(memset_user); + + /* * Zero Userspace */ -- 2.19.1
[RFC PATCH v4 06/12] __wr_after_init: arm64: enable
Set ARCH_HAS_PRMEM to Y for arm64 Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- arch/arm64/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index a4168d366127..7cbb2c133ed7 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -66,6 +66,7 @@ config ARM64 select ARCH_WANT_COMPAT_IPC_PARSE_VERSION select ARCH_WANT_FRAME_POINTERS select ARCH_HAS_UBSAN_SANITIZE_ALL + select ARCH_HAS_PRMEM select ARM_AMBA select ARM_ARCH_TIMER select ARM_GIC -- 2.19.1
[RFC PATCH v4 01/12] __wr_after_init: Core and default arch
The patch provides: - the core functionality for write-rare after init for statically allocated data, based on code from Matthew Wilcox - the default implementation for a generic architecture. A specific architecture can override one or more of the default functions. The core (API) functions are: - wr_memset(): write rare counterpart of memset() - wr_memcpy(): write rare counterpart of memcpy() - wr_assign(): write rare counterpart of the assignment ('=') operator - wr_rcu_assign_pointer(): write rare counterpart of rcu_assign_pointer() In case either the selected architecture doesn't support write rare after init, or the functionality is disabled, the write rare functions will resolve into their non-write rare counterparts: - memset() - memcpy() - assignment operator - rcu_assign_pointer() For code that can be linked either as a module or as built-in (e.g. a device driver init function), it is not possible to tell upfront which will be the case. In this scenario, if the functions are called during system init, they will automatically choose, at runtime, to go through the fast non-write rare path. Should they be invoked later, during module init, they will use the write-rare path. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- arch/Kconfig| 7 ++ include/linux/prmem.h (new) | 71 +++ mm/Makefile | 1 + mm/prmem.c (new)| 179 ++ 4 files changed, 258 insertions(+) diff --git a/arch/Kconfig b/arch/Kconfig index b0b6d176f1c1..0380d4a64681 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -814,6 +814,13 @@ config ARCH_HAS_PRMEM architecture specific symbol stating that the architecture provides a back-end function for the write rare operation. 
+config ARCH_HAS_PRMEM_HEADER + def_bool n + depends on ARCH_HAS_PRMEM + help + architecture specific symbol stating that the architecture provides + its own specific header back-end for the write rare operation. + config PRMEM bool "Write protect critical data that doesn't need high write speed." depends on ARCH_HAS_PRMEM diff --git a/include/linux/prmem.h b/include/linux/prmem.h new file mode 100644 index ..0e4683c503b9 --- /dev/null +++ b/include/linux/prmem.h @@ -0,0 +1,71 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * prmem.h: Header for memory protection library - generic part + * + * (C) Copyright 2018-2019 Huawei Technologies Co. Ltd. + * Author: Igor Stoppa + */ + +#ifndef _LINUX_PRMEM_H +#define _LINUX_PRMEM_H + +#include +#include + +#ifndef CONFIG_PRMEM + +static inline void *wr_memset(void *p, int c, __kernel_size_t n) +{ + return memset(p, c, n); +} + +static inline void *wr_memcpy(void *p, const void *q, __kernel_size_t n) +{ + return memcpy(p, q, n); +} + +#define wr_assign(var, val)((var) = (val)) +#define wr_rcu_assign_pointer(p, v)rcu_assign_pointer(p, v) + +#else + +#include + +void *wr_memset(void *p, int c, __kernel_size_t n); +void *wr_memcpy(void *p, const void *q, __kernel_size_t n); + +/** + * wr_assign() - sets a write-rare variable to a specified value + * @dst: the variable to set + * @val: the new value + * + * Returns: the variable + */ + +#define wr_assign(dst, val) ({ \ + typeof(dst) tmp = (typeof(dst))val; \ + \ + wr_memcpy(&dst, &tmp, sizeof(dst)); \ + dst;\ +}) + +/** + * wr_rcu_assign_pointer() - initialize a pointer in rcu mode + * @p: the rcu pointer - it MUST be aligned to a machine word + * @v: the new value + * + * Returns the value assigned to the rcu pointer.
+ * + * It is provided as a macro, to match rcu_assign_pointer(). + * rcu_assign_pointer() is implemented as the equivalent of: + * + * smp_mb(); + * WRITE_ONCE(); + */ +#define wr_rcu_assign_pointer(p, v) ({ \ + smp_mb(); \ + wr_assign(p, v);\ + p; \ +}) +#endif +#endif diff --git a/mm/Makefile b/mm/Makefile index d210cc9d6f80..ef3867c16ce0 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -58,6 +58,7 @@ obj-$(CONFIG_SPARSEMEM) += sparse.o obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o obj-$(CONFIG_SLOB) += slob.o obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o +obj-$(CONFIG_PRMEM) += prmem.o obj-$(CONFIG_KSM) += ksm.o obj-$(CONFIG_PAGE_POISONING) += page_poison.o obj-$(CONFIG_SLAB) += slab.o diff --git a/mm/prmem.c b/mm/prmem.c new file mode 100644 index ..9383b7d6951e --- /dev/null +++ b/mm/prmem.c @@ -0,0 +1,179 @@ +// SPDX-License-Identifier: GPL-2.0 +/* +
[RFC PATCH v4 04/12] __wr_after_init: x86_64: enable
Set ARCH_HAS_PRMEM to Y for x86_64 Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- arch/x86/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 68261430fe6e..7392b53b12c2 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -32,6 +32,7 @@ config X86_64 select SWIOTLB select X86_DEV_DMA_OPS select ARCH_HAS_SYSCALL_WRAPPER + select ARCH_HAS_PRMEM # # Arch settings -- 2.19.1
[RFC PATCH v4 07/12] __wr_after_init: Documentation: self-protection
Update the self-protection documentation, to mention also the use of the __wr_after_init attribute. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- Documentation/security/self-protection.rst | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/Documentation/security/self-protection.rst b/Documentation/security/self-protection.rst index f584fb74b4ff..df2614bc25b9 100644 --- a/Documentation/security/self-protection.rst +++ b/Documentation/security/self-protection.rst @@ -84,12 +84,14 @@ For variables that are initialized once at ``__init`` time, these can be marked with the (new and under development) ``__ro_after_init`` attribute. -What remains are variables that are updated rarely (e.g. GDT). These -will need another infrastructure (similar to the temporary exceptions -made to kernel code mentioned above) that allow them to spend the rest -of their lifetime read-only. (For example, when being updated, only the -CPU thread performing the update would be given uninterruptible write -access to the memory.) +Others, which are statically allocated, but still need to be updated +rarely, can be marked with the ``__wr_after_init`` attribute. + +The update mechanism must avoid exposing the data to rogue alterations +during the update. For example, only the CPU thread performing the update +would be given uninterruptible write access to the memory. + +Currently there is no protection available for data allocated dynamically. Segregation of kernel memory from userspace memory ~~ -- 2.19.1
[RFC PATCH v4 09/12] __wr_after_init: rodata_test: refactor tests
Refactor the test cases, in preparation for using them also for testing __wr_after_init memory, when available. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- mm/rodata_test.c | 48 1 file changed, 28 insertions(+), 20 deletions(-) diff --git a/mm/rodata_test.c b/mm/rodata_test.c index d908c8769b48..e1349520b436 100644 --- a/mm/rodata_test.c +++ b/mm/rodata_test.c @@ -14,44 +14,52 @@ #include #include -static const int rodata_test_data = 0xC3; +#define INIT_TEST_VAL 0xC3 -void rodata_test(void) +static const int rodata_test_data = INIT_TEST_VAL; + +static bool test_data(char *data_type, const int *data, + unsigned long start, unsigned long end) { - unsigned long start, end; int zero = 0; /* test 1: read the value */ /* If this test fails, some previous testrun has clobbered the state */ - if (!rodata_test_data) { - pr_err("test 1 fails (start data)\n"); - return; + if (*data != INIT_TEST_VAL) { + pr_err("%s: test 1 fails (init data value)\n", data_type); + return false; } /* test 2: write to the variable; this should fault */ - if (!probe_kernel_write((void *)&rodata_test_data, - (void *)&zero, sizeof(zero))) { - pr_err("test data was not read only\n"); - return; + if (!probe_kernel_write((void *)data, (void *)&zero, sizeof(zero))) { + pr_err("%s: test data was not read only\n", data_type); + return false; } /* test 3: check the value hasn't changed */ - if (rodata_test_data == zero) { - pr_err("test data was changed\n"); - return; + if (*data != INIT_TEST_VAL) { + pr_err("%s: test data was changed\n", data_type); + return false; } /* test 4: check if the rodata section is PAGE_SIZE aligned */ - start = (unsigned long)__start_rodata; - end = (unsigned long)__end_rodata; if (start & (PAGE_SIZE - 1)) { - 
pr_err("start of .rodata is not page size aligned\n"); - return; + pr_err("%s: start of data is not page size aligned\n", + data_type); + return false; } if (end & (PAGE_SIZE - 1)) { - pr_err("end of .rodata is not page size aligned\n"); - return; + pr_err("%s: end of data is not page size aligned\n", + data_type); + return false; } + pr_info("%s tests were successful", data_type); + return true; +} - pr_info("all tests were successful\n"); +void rodata_test(void) +{ + test_data("rodata", _test_data, + (unsigned long)&__start_rodata, + (unsigned long)&__end_rodata); } -- 2.19.1
[RFC PATCH v4 10/12] __wr_after_init: rodata_test: test __wr_after_init
The write protection of the __wr_after_init data can be verified with the same methodology used for const data. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- mm/rodata_test.c | 27 --- 1 file changed, 24 insertions(+), 3 deletions(-) diff --git a/mm/rodata_test.c b/mm/rodata_test.c index e1349520b436..a669cf9f5a61 100644 --- a/mm/rodata_test.c +++ b/mm/rodata_test.c @@ -16,8 +16,23 @@ #define INIT_TEST_VAL 0xC3 +/* + * Note: __ro_after_init data is, for every practical effect, equivalent to + * const data, since they are even write protected at the same time; there + * is no need for separate testing. + * __wr_after_init data, otoh, is altered also after the write protection + * takes place and it cannot be exploitable for altering more permanent + * data. + */ + static const int rodata_test_data = INIT_TEST_VAL; +#ifdef CONFIG_PRMEM +static int wr_after_init_test_data __wr_after_init = INIT_TEST_VAL; +extern long __start_wr_after_init; +extern long __end_wr_after_init; +#endif + static bool test_data(char *data_type, const int *data, unsigned long start, unsigned long end) { @@ -59,7 +74,13 @@ static bool test_data(char *data_type, const int *data, void rodata_test(void) { - test_data("rodata", &rodata_test_data, - (unsigned long)&__start_rodata, - (unsigned long)&__end_rodata); + if (!test_data("rodata", &rodata_test_data, + (unsigned long)&__start_rodata, + (unsigned long)&__end_rodata)) + return; +#ifdef CONFIG_PRMEM + test_data("wr after init data", &wr_after_init_test_data, + (unsigned long)&__start_wr_after_init, + (unsigned long)&__end_wr_after_init); +#endif } -- 2.19.1
[RFC PATCH v4 00/12] hardening: statically allocated protected memory
To: Andy Lutomirski , To: Matthew Wilcox , To: Nadav Amit To: Peter Zijlstra , To: Dave Hansen , To: Mimi Zohar To: Thiago Jung Bauermann CC: Kees Cook CC: Ahmed Soliman CC: linux-integrity CC: Kernel Hardening CC: Linux-MM CC: Linux Kernel Mailing List Hello, at last I'm able to resume work on the memory protection patchset I've proposed some time ago. This version should address comments received so far and introduce support for arm64. Details below. Patch-set implementing write-rare memory protection for statically allocated data. Its purpose is to keep write protected the kernel data which is seldom modified, especially if altering it can be exploited during an attack. There is no read overhead, however writing requires special operations that are probably unsuitable for often-changing data. The use is opt-in, by applying the modifier __wr_after_init to a variable declaration. As the name implies, the write protection kicks in only after init() is completed; before that moment, the data is modifiable in the usual way. Current Limitations: * supports only data which is allocated statically, at build time. 
* supports only x86_64 and arm64; other architectures need to provide their own backend Some notes: - in case an architecture doesn't support write rare, the behavior is to fall back to regular write operations - before altering any memory, the destination is sanitized - write rare data is segregated into its own set of pages - only x86_64 and arm64 supported, atm - the memset_user() assembly functions seem to work, but I'm not too sure they are really ok - I've added a simple example: the protection of ima_policy_flags - the last patch is optional, but it seemed worth doing the refactoring - the x86_64 user space address range is double the size of the kernel address space, so it's possible to randomize the beginning of the mapping of the kernel address space, but on arm64 they have the same size, so it's not possible to do the same - I'm not sure if it's correct, since it doesn't seem to be that common in kernel sources, but instead of using #defines for overriding default function calls, I'm using "weak" for the default functions. - unaddressed: Nadav proposed to do: #define __wr __attribute__((address_space(5))) but I don't know exactly where to use it atm Changelog: v3->v4 -- * added function for setting memory in user space mapping for arm64 * refactored code, to work with both supported architectures * reduced dependency on x86_64 specific code, to support by default also arm64 * improved memset_user() for x86_64, but I'm not sure if I understood correctly what was the best way to enhance it. 
v2->v3 -- * both wr_memset and wr_memcpy are implemented as generic functions; the arch code must provide suitable helpers * regular initialization for ima_policy_flags: it happens during init * remove spurious code from the initialization function v1->v2 -- * introduce cleaner split between generic and arch code * add x86_64 specific memset_user() * replace kernel-space memset()/memcpy() with userspace counterpart * randomize the base address for the alternate map across the entire available address range from user space (128TB - 64TB) * convert BUG() to WARN() * turn verification of written data into debugging option * wr_rcu_assign_pointer() as special case of wr_assign() * example with protection of ima_policy_flags * documentation Igor Stoppa (12): __wr_after_init: Core and default arch __wr_after_init: x86_64: memset_user() __wr_after_init: x86_64: randomize mapping offset __wr_after_init: x86_64: enable __wr_after_init: arm64: memset_user() __wr_after_init: arm64: enable __wr_after_init: Documentation: self-protection __wr_after_init: lkdtm test __wr_after_init: rodata_test: refactor tests __wr_after_init: rodata_test: test __wr_after_init __wr_after_init: test write rare functionality IMA: turn ima_policy_flags into __wr_after_init Documentation/security/self-protection.rst | 14 +- arch/Kconfig | 7 + arch/arm64/Kconfig | 1 + arch/arm64/include/asm/uaccess.h | 9 ++ arch/arm64/lib/Makefile| 2 +- arch/arm64/lib/memset_user.S (new) | 63 arch/x86/Kconfig | 1 + arch/x86/include/asm/uaccess_64.h | 6 + arch/x86/lib/usercopy_64.c | 51 ++ arch/x86/mm/Makefile | 2 + arch/x86/mm/prmem.c (new) | 20 +++ drivers/misc/lkdtm/core.c | 3 + drivers/misc/lkdtm/lkdtm.h | 3 + drivers/misc/lkdtm/perms.c | 29 include/linux/prmem.h (new)| 71 mm/Kconfig.debug | 8 + mm/Makefile| 2 + mm/prmem.c (new) | 179 ++
Re: [PATCH 03/12] __wr_after_init: generic header
On 21/12/2018 21:45, Matthew Wilcox wrote: On Fri, Dec 21, 2018 at 11:38:16AM -0800, Nadav Amit wrote: On Dec 19, 2018, at 1:33 PM, Igor Stoppa wrote: +static inline void *wr_memset(void *p, int c, __kernel_size_t len) +{ + return __wr_op((unsigned long)p, (unsigned long)c, len, WR_MEMSET); +} What do you think about doing something like: #define __wr __attribute__((address_space(5))) And then make all the pointers to write-rarely memory use this attribute? It might require more changes to the code, but can prevent bugs. I like this idea. It was something I was considering suggesting. I have been thinking about this sort of problem, although from a bit different angle: 1) enforcing alignment for pointers This can be implemented in a similar way, by creating a multi-attribute that would define section, address space (as said here), and alignment. However I'm not sure if it's possible to do anything to enforce the alignment of a pointer field within a structure. I haven't had time to look into this yet. 2) validation of the correctness of the actual value Inside the kernel code, a function is not supposed to sanitize its arguments, as long as they come from some other trusted part of the kernel, rather than say from userspace or from some HW interface. However, ROP/JOP should be considered. I am aware of various efforts to make it harder to exploit these techniques, like signed pointers, CFI plugins, LTO. But they are not necessarily available on every platform and mostly, afaik, they focus on specific types of attacks. LTO can help with global optimizations, for example inlining functions across different objects. CFI can detect jumps in the middle of a function, rather than proper function invocation, from its natural entry point. Signed pointers can prevent data-based attacks on the execution flow, and they might have a role in preventing the attack I have in mind, but they are not available on all platforms. 
What I'd like to do, is to verify, at runtime, that the pointer belongs to the type that the receiving function is meant for. Ex: a legitimate __wr_after_init data must exist between __start_wr_after_init and __end_wr_after_init That is easier and cleaner to test, imho. But dynamically allocated memory doesn't have any such constraint. If it was possible to introduce, for example, a flag to pass to vmalloc, to get the vmap_area from within a specific address range, it would reduce the attack surface. In the implementation I have right now, I'm using extra flags for the pmalloc pages, which means the metadata is the new target for an attack. But with adding the constraint that a dynamically allocated protected memory page must be within a range, then the attacker must change the underlying PTE. And if a region of PTEs are all part of protected memory, it is possible to make the PMD write rare. -- igor
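The runtime check described above — a pointer claimed to be __wr_after_init data must lie between __start_wr_after_init and __end_wr_after_init — can be sketched in userspace as follows. The boundary values and the function name are hypothetical stand-ins; in the kernel the limits would come from the linker-provided section symbols.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Modeled section boundaries; in the kernel these would be
 * (uintptr_t)&__start_wr_after_init and (uintptr_t)&__end_wr_after_init. */
static uintptr_t start_wr_after_init = 0x1000;
static uintptr_t end_wr_after_init = 0x3000;

/* A buffer is a legitimate __wr_after_init target only if it lies
 * entirely inside the protected section [start, end). */
bool wr_range_ok(const void *p, size_t size)
{
	uintptr_t addr = (uintptr_t)p;

	return addr >= start_wr_after_init &&
	       addr < end_wr_after_init &&
	       size <= end_wr_after_init - addr;	/* no overflow */
}
```

Checking `size` against the remaining room (rather than computing `addr + size`) avoids the unsigned wrap-around that an attacker-supplied length could otherwise exploit.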
Re: [PATCH 03/12] __wr_after_init: generic functionality
On 21/12/2018 21:43, Matthew Wilcox wrote: On Fri, Dec 21, 2018 at 09:07:54PM +0200, Igor Stoppa wrote: On 21/12/2018 20:41, Matthew Wilcox wrote: On Fri, Dec 21, 2018 at 08:14:14PM +0200, Igor Stoppa wrote: +static inline int memtst(void *p, int c, __kernel_size_t len) I don't understand why you're verifying that writes actually happen in production code. Sure, write lib/test_wrmem.c or something, but verifying every single rare write seems like a mistake to me. This is actually something I wrote more as a stop-gap. I have the feeling there should be already something similar available. And probably I could not find it. Unless it's so trivial that it doesn't deserve to become a function? But if there is really no existing alternative, I can put it in a separate file. I'm not questioning the implementation, I'm questioning why it's ever called. If I type 'p = q', I don't then verify that p actually is equal to q. I just assume that the compiler did its job. Paranoia, probably. My thinking is that, once the data is protected, it could still be attacked through the metadata. A pte, for example. Preventing the setting of a flag, that for example enables a functionality, might be a nice way to thwart all this protection. If I verify that the write was successful, through the read-only address, then I know that the action really completed successfully. There are many more types of attack that one can come up with, but attacking the metadata is probably the most likely next level. So what I'm trying to do is more akin to: p = *p = q; d == q; But in our case there is an indefinite amount of time between the creation of the alternate mapping and its use. Another way could be to check that the mapping is correct before writing to it. Maybe safer? I went for confirming that the end result is correct. Of course it adds overhead, but if the whole thing is already slow and happening not too often, how much does it matter? 
An alternative approach would be that the code invoking the wr operation performs an explicit test. Would it look better if I implemented this as a wr_assign_verify() inline function? +#ifndef CONFIG_PRMEM So is this PRMEM or wr_mem? It's not obvious that CONFIG_PRMEM controls wrmem. In my mind (maybe still clinging to the old implementation), PRMEM is the master toggle, for protected memory. Then there are various types and the first one being now implemented is write rare after init (because ro after init already exists). However, the same levels of protection should then follow for dynamically allocated memory (ye old pmalloc). PRMEM would then become the moniker for the whole shebang. To my mind, what we have in this patchset is support for statically allocated protected (or write-rare) memory. Later, we'll add dynamically allocated protected memory. So it's all protected memory, and we'll use the same accessors for both ... right? The static one is only write rare because read only after init already exists. The dynamic one must introduce the same write rare, yes, but it should also introduce read_only (I do not count the destruction of an entire pool as a write rare operation). Ex: SELinux policyDB. write rare, regardless if dynamic or static, is a sub-case of protected memory, hence the differentiation between protected and write rare. I'm not claiming to be particularly skilled at choosing names, so if something better sounding is available, it can be used. This is the best I could come up with. [...] I don't think there's anything to be done in that case. Indeed, I think the only thing to do is panic and stop the whole machine if initialisation fails. We'd be in a situation where nothing can update protected memory, and the machine just won't work. I suppose we could "fail insecure" and never protect the memory, but I think that's asking for trouble. ok, so init will BUG() if it fails, instead of the current WARN_ONCE() and return. 
Anyway, my concern was for a driver which can be built either as a module or built-in. Its init code will be called before write-protection happens when it's built in, and after write-protection happens when it's a module. It should be able to use wr_assign() in either circumstance. One might also have a utility function which is called from both init and non-init code and want to use wr_assign() whether initialisation has completed or not. If the writable mapping is created early enough, the only penalty for using the write-rare function on a writable variable is that it would be slower. Probably there wouldn't be so much data to deal with. If the driver is dealing with some HW, most likely that would make any write rare extra delay look negligible. -- igor
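For reference, the memtst() helper debated in this thread is a verification counterpart to memset(); a minimal sketch (the actual patch implementation may differ):

```c
#include <stddef.h>

/* Verification counterpart of memset(): return 0 if the first len bytes
 * at p all equal (unsigned char)c, non-zero on the first mismatch. */
int memtst(const void *p, int c, size_t len)
{
	const unsigned char *q = p;
	size_t i;

	for (i = 0; i < len; i++)
		if (q[i] != (unsigned char)c)
			return 1;
	return 0;
}
```

The zero-on-success convention mirrors memcmp(), which is why it slots directly into checks such as `WARN(memtst(...), "...")`.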
Re: [PATCH 03/12] __wr_after_init: generic functionality
On 21/12/2018 20:41, Matthew Wilcox wrote: On Fri, Dec 21, 2018 at 08:14:14PM +0200, Igor Stoppa wrote: +static inline int memtst(void *p, int c, __kernel_size_t len) I don't understand why you're verifying that writes actually happen in production code. Sure, write lib/test_wrmem.c or something, but verifying every single rare write seems like a mistake to me. This is actually something I wrote more as a stop-gap. I have the feeling there should already be something similar available, and probably I could not find it. Unless it's so trivial that it doesn't deserve to become a function? But if there is really no existing alternative, I can put it in a separate file. +#ifndef CONFIG_PRMEM So is this PRMEM or wr_mem? It's not obvious that CONFIG_PRMEM controls wrmem. In my mind (maybe still clinging to the old implementation), PRMEM is the master toggle, for protected memory. Then there are various types and the first one being now implemented is write rare after init (because ro after init already exists). However, the same levels of protection should then follow for dynamically allocated memory (ye old pmalloc). PRMEM would then become the moniker for the whole shebang. +#define wr_assign(var, val)((var) = (val)) The hamming distance between 'var' and 'val' is too small. The convention in the line immediately below (p and v) is much more readable. ok, I'll fix it +#define wr_rcu_assign_pointer(p, v)rcu_assign_pointer(p, v) +#define wr_assign(var, val) ({ \ + typeof(var) tmp = (typeof(var))val; \ + \ + wr_memcpy(&var, &tmp, sizeof(var)); \ + var;\ +}) Doesn't wr_memcpy return 'var' anyway? It should return the destination, which is &var. But I wanted to return the actual value of the assignment, val. Like if I do (a = 7) it evaluates to 7, similarly wr_assign(a, 7) would also evaluate to 7. The reason why I returned var instead of val is that it would allow to detect any error. 
+/** + * wr_memcpy() - copyes size bytes from q to p typo :-( thanks + * @p: beginning of the memory to write to + * @q: beginning of the memory to read from + * @size: amount of bytes to copy + * + * Returns pointer to the destination + * The architecture code must provide: + * void __wr_enable(wr_state_t *state) + * void *__wr_addr(void *addr) + * void *__wr_memcpy(void *p, const void *q, __kernel_size_t size) + * void __wr_disable(wr_state_t *state) This section shouldn't be in the user documentation of wr_memcpy(). ok + */ +void *wr_memcpy(void *p, const void *q, __kernel_size_t size) +{ + wr_state_t wr_state; + void *wr_poking_addr = __wr_addr(p); + + if (WARN_ONCE(!wr_ready, "No writable mapping available") || Surely not. If somebody's called wr_memcpy() before wr_ready is set, that means we can just call memcpy(). What I was trying to catch is the case where, after a failed init, the writable mapping doesn't exist. In that case wr_ready is also not set. The problem is that I just don't know what to do in a case where there has been such a major error which prevents the creation of the alternate mapping. I understand that we still want to continue, to provide as much debug info as possible, but I am at a loss about finding a saner course of action. -- igor
Re: [PATCH 01/12] x86_64: memset_user()
On 21/12/2018 20:25, Matthew Wilcox wrote: On Fri, Dec 21, 2018 at 08:14:12PM +0200, Igor Stoppa wrote: +unsigned long __memset_user(void __user *addr, int c, unsigned long size) +{ + long __d0; + unsigned long pattern = 0; + int i; + + for (i = 0; i < 8; i++) + pattern = (pattern << 8) | (0xFF & c); That's inefficient. pattern = (unsigned char)c; pattern |= pattern << 8; pattern |= pattern << 16; pattern |= pattern << 32; ok, thank you -- igor
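The shift-or replication Matthew suggests builds the 64-bit fill pattern in three steps instead of an eight-iteration byte loop — each step doubles the number of filled bytes. A userspace sketch of the same idea:

```c
#include <stdint.h>

/* Replicate one byte across a 64-bit word: each shift-or step doubles
 * the number of filled bytes (1 -> 2 -> 4 -> 8). */
uint64_t fill_pattern(int c)
{
	uint64_t pattern = (unsigned char)c;

	pattern |= pattern << 8;
	pattern |= pattern << 16;
	pattern |= pattern << 32;
	return pattern;
}
```

This is three shifts and three ORs regardless of word size, which is why it is preferred over the loop in the original __memset_user() draft.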
[PATCH 04/12] __wr_after_init: debug writes
After each write operation, confirm that it was successful, otherwise generate a warning. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- mm/Kconfig.debug | 8 mm/prmem.c | 6 ++ 2 files changed, 14 insertions(+) diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug index 9a7b8b049d04..b10305cfac3c 100644 --- a/mm/Kconfig.debug +++ b/mm/Kconfig.debug @@ -94,3 +94,11 @@ config DEBUG_RODATA_TEST depends on STRICT_KERNEL_RWX ---help--- This option enables a testcase for the setting rodata read-only. + +config DEBUG_PRMEM +bool "Verify each write rare operation." +depends on PRMEM +default n +help + After any write rare operation, compares the data written with the + value provided by the caller. diff --git a/mm/prmem.c b/mm/prmem.c index e1c1be3a1171..51f6776e2515 100644 --- a/mm/prmem.c +++ b/mm/prmem.c @@ -61,6 +61,9 @@ void *wr_memcpy(void *p, const void *q, __kernel_size_t size) __wr_enable(&wr_state); __wr_memcpy(wr_poking_addr, q, size); __wr_disable(&wr_state); +#ifdef CONFIG_DEBUG_PRMEM + VM_WARN_ONCE(memcmp(p, q, size), "Failed %s()", __func__); +#endif local_irq_enable(); return p; } @@ -92,6 +95,9 @@ void *wr_memset(void *p, int c, __kernel_size_t len) __wr_enable(&wr_state); __wr_memset(wr_poking_addr, c, len); __wr_disable(&wr_state); +#ifdef CONFIG_DEBUG_PRMEM + VM_WARN_ONCE(memtst(p, c, len), "Failed %s()", __func__); +#endif local_irq_enable(); return p; } -- 2.19.1
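The CONFIG_DEBUG_PRMEM check in the patch above boils down to comparing the (normally read-only) destination against the source after the copy. A userspace model, with memcpy() standing in for the arch-specific __wr_memcpy() and the function name purely illustrative:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy, then verify through the destination address, mirroring the
 * VM_WARN_ONCE(memcmp(p, q, size), ...) debug check in the patch. */
bool checked_wr_copy(void *p, const void *q, size_t n)
{
	memcpy(p, q, n);		/* stands in for __wr_memcpy() */
	return memcmp(p, q, n) == 0;	/* the verification step */
}
```

Reading back through the protected address (p), not the poking address, is the point: it catches the case where the alternate mapping silently wrote to the wrong place.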
[PATCH 11/12] IMA: turn ima_policy_flags into __wr_after_init
The policy flags could be targeted by an attacker aiming at disabling IMA, so that there would be no trace of a file system modification in the measurement list. Since the flags can be altered at runtime, it is not possible to make them become fully read-only, for example with __ro_after_init. __wr_after_init can still provide some protection, at least against simple memory overwrite attacks Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- security/integrity/ima/ima.h| 3 ++- security/integrity/ima/ima_policy.c | 9 + 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h index cc12f3449a72..297c25f5122e 100644 --- a/security/integrity/ima/ima.h +++ b/security/integrity/ima/ima.h @@ -24,6 +24,7 @@ #include #include #include +#include #include #include "../integrity.h" @@ -50,7 +51,7 @@ enum tpm_pcrs { TPM_PCR0 = 0, TPM_PCR8 = 8 }; #define IMA_TEMPLATE_IMA_FMT "d|n" /* current content of the policy */ -extern int ima_policy_flag; +extern int ima_policy_flag __wr_after_init; /* set during initialization */ extern int ima_hash_algo; diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c index 7489cb7de6dc..2004de818d92 100644 --- a/security/integrity/ima/ima_policy.c +++ b/security/integrity/ima/ima_policy.c @@ -47,7 +47,7 @@ #define INVALID_PCR(a) (((a) < 0) || \ (a) >= (FIELD_SIZEOF(struct integrity_iint_cache, measured_pcrs) * 8)) -int ima_policy_flag; +int ima_policy_flag __wr_after_init; static int temp_ima_appraise; static int build_ima_appraise __ro_after_init; @@ -452,12 +452,13 @@ void ima_update_policy_flag(void) list_for_each_entry(entry, ima_rules, list) { if (entry->action & IMA_DO_MASK) - 
ima_policy_flag |= entry->action; + wr_assign(ima_policy_flag, + ima_policy_flag | entry->action); } ima_appraise |= (build_ima_appraise | temp_ima_appraise); if (!ima_appraise) - ima_policy_flag &= ~IMA_APPRAISE; + wr_assign(ima_policy_flag, ima_policy_flag & ~IMA_APPRAISE); } static int ima_appraise_flag(enum ima_hooks func) @@ -574,7 +575,7 @@ void ima_update_policy(void) list_splice_tail_init_rcu(&ima_temp_rules, policy, synchronize_rcu); if (ima_rules != policy) { - ima_policy_flag = 0; + wr_assign(ima_policy_flag, 0); ima_rules = policy; } ima_update_policy_flag(); -- 2.19.1
[PATCH 10/12] __wr_after_init: test write rare functionality
Set of test cases meant to confirm that the write rare functionality works as expected. It can be optionally compiled as module. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- mm/Kconfig.debug | 8 +++ mm/Makefile | 1 + mm/test_write_rare.c | 135 +++ 3 files changed, 144 insertions(+) create mode 100644 mm/test_write_rare.c diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug index b10305cfac3c..ae018e56c4e4 100644 --- a/mm/Kconfig.debug +++ b/mm/Kconfig.debug @@ -102,3 +102,11 @@ config DEBUG_PRMEM help After any write rare operation, compares the data written with the value provided by the caller. + +config DEBUG_PRMEM_TEST +tristate "Run self test for statically allocated protected memory" +depends on PRMEM +default n +help + Tries to verify that the protection for statically allocated memory + works correctly and that the memory is effectively protected. diff --git a/mm/Makefile b/mm/Makefile index ef3867c16ce0..8de1d468f4e7 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -59,6 +59,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o obj-$(CONFIG_SLOB) += slob.o obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o obj-$(CONFIG_PRMEM) += prmem.o +obj-$(CONFIG_DEBUG_PRMEM_TEST) += test_write_rare.o obj-$(CONFIG_KSM) += ksm.o obj-$(CONFIG_PAGE_POISONING) += page_poison.o obj-$(CONFIG_SLAB) += slab.o diff --git a/mm/test_write_rare.c b/mm/test_write_rare.c new file mode 100644 index ..30574bc34a20 --- /dev/null +++ b/mm/test_write_rare.c @@ -0,0 +1,135 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * test_write_rare.c + * + * (C) Copyright 2018 Huawei Technologies Co. Ltd. 
+ * Author: Igor Stoppa + */ + +#include +#include +#include +#include +#include + +#ifdef pr_fmt +#undef pr_fmt +#endif + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +extern long __start_wr_after_init; +extern long __end_wr_after_init; + +static __wr_after_init int scalar = '0'; +static __wr_after_init u8 array[PAGE_SIZE * 3] __aligned(PAGE_SIZE); + +/* The section must occupy a non-zero number of whole pages */ +static bool test_alignment(void) +{ + unsigned long pstart = (unsigned long)&__start_wr_after_init; + unsigned long pend = (unsigned long)&__end_wr_after_init; + + if (WARN((pstart & ~PAGE_MASK) || (pend & ~PAGE_MASK) || +(pstart >= pend), "Boundaries test failed.")) + return false; + pr_info("Boundaries test passed."); + return true; +} + +static bool test_pattern(void) +{ + return (memtst(array, '0', PAGE_SIZE / 2) || + memtst(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 3 / 4) || + memtst(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2) || + memtst(array + PAGE_SIZE * 7 / 4, '1', PAGE_SIZE * 3 / 4) || + memtst(array + PAGE_SIZE * 5 / 2, '0', PAGE_SIZE / 2)); +} + +static bool test_wr_memset(void) +{ + int new_val = '1'; + + wr_memset(&scalar, new_val, sizeof(scalar)); + if (WARN(memtst(&scalar, new_val, sizeof(scalar)), +"Scalar write rare memset test failed.")) + return false; + + pr_info("Scalar write rare memset test passed."); + + wr_memset(array, '0', PAGE_SIZE * 3); + if (WARN(memtst(array, '0', PAGE_SIZE * 3), +"Array write rare memset test failed.")) + return false; + + wr_memset(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 2); + if (WARN(memtst(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 2), +"Array write rare memset test failed.")) + return false; + + wr_memset(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2); + if (WARN(memtst(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2), +"Array write rare memset test failed.")) + return false; + + if (WARN(test_pattern(), "Array write rare memset test failed.")) + return false; + + pr_info("Array write rare memset test passed."); + 
+	return true;
+}
+
+static u8 array_1[PAGE_SIZE * 2];
+static u8 array_2[PAGE_SIZE * 2];
+
+static bool test_wr_memcpy(void)
+{
+	int new_val = 0x12345678;
+
+	wr_assign(scalar, new_val);
+	if (WARN(memcmp(&scalar, &new_val, sizeof(scalar)),
+		 "Scalar write rare memcpy test failed."))
+		return false;
+	pr_info("Scalar write rare memcpy test passed.");
+
+	wr_memset(array, '0', PAGE_SIZE * 3);
+	memset(array_1, '1', PAGE_SIZE * 2);
+	memset
[PATCH 07/12] __wr_after_init: lkdtm test
Verify that trying to modify a variable with the __wr_after_init attribute will cause a crash. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- drivers/misc/lkdtm/core.c | 3 +++ drivers/misc/lkdtm/lkdtm.h | 3 +++ drivers/misc/lkdtm/perms.c | 29 + 3 files changed, 35 insertions(+) diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c index 2837dc77478e..73c34b17c433 100644 --- a/drivers/misc/lkdtm/core.c +++ b/drivers/misc/lkdtm/core.c @@ -155,6 +155,9 @@ static const struct crashtype crashtypes[] = { CRASHTYPE(ACCESS_USERSPACE), CRASHTYPE(WRITE_RO), CRASHTYPE(WRITE_RO_AFTER_INIT), +#ifdef CONFIG_PRMEM + CRASHTYPE(WRITE_WR_AFTER_INIT), +#endif CRASHTYPE(WRITE_KERN), CRASHTYPE(REFCOUNT_INC_OVERFLOW), CRASHTYPE(REFCOUNT_ADD_OVERFLOW), diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h index 3c6fd327e166..abba2f52ffa6 100644 --- a/drivers/misc/lkdtm/lkdtm.h +++ b/drivers/misc/lkdtm/lkdtm.h @@ -38,6 +38,9 @@ void lkdtm_READ_BUDDY_AFTER_FREE(void); void __init lkdtm_perms_init(void); void lkdtm_WRITE_RO(void); void lkdtm_WRITE_RO_AFTER_INIT(void); +#ifdef CONFIG_PRMEM +void lkdtm_WRITE_WR_AFTER_INIT(void); +#endif void lkdtm_WRITE_KERN(void); void lkdtm_EXEC_DATA(void); void lkdtm_EXEC_STACK(void); diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c index 53b85c9d16b8..f681730aa652 100644 --- a/drivers/misc/lkdtm/perms.c +++ b/drivers/misc/lkdtm/perms.c @@ -9,6 +9,7 @@ #include #include #include +#include #include /* Whether or not to fill the target memory area with do_nothing(). */ @@ -27,6 +28,10 @@ static const unsigned long rodata = 0xAA55AA55; /* This is marked __ro_after_init, so it should ultimately be .rodata. 
 */
static unsigned long ro_after_init __ro_after_init = 0x55AA5500;

+/* This is marked __wr_after_init, so it should be in .rodata. */
+static
+unsigned long wr_after_init __wr_after_init = 0x55AA5500;
+
 /*
  * This just returns to the caller. It is designed to be copied into
  * non-executable memory regions.
@@ -104,6 +109,28 @@ void lkdtm_WRITE_RO_AFTER_INIT(void)
 	*ptr ^= 0xabcd1234;
 }
 
+#ifdef CONFIG_PRMEM
+
+void lkdtm_WRITE_WR_AFTER_INIT(void)
+{
+	unsigned long *ptr = &wr_after_init;
+
+	/*
+	 * Verify we were written to during init. Since an Oops
+	 * is considered a "success", a failure is to just skip the
+	 * real test.
+	 */
+	if ((*ptr & 0xAA) != 0xAA) {
+		pr_info("%p was NOT written during init!?\n", ptr);
+		return;
+	}
+
+	pr_info("attempting bad wr_after_init write at %p\n", ptr);
+	*ptr ^= 0xabcd1234;
+}
+
+#endif
+
 void lkdtm_WRITE_KERN(void)
 {
 	size_t size;
@@ -200,4 +227,6 @@ void __init lkdtm_perms_init(void)
 
 	/* Make sure we can write to __ro_after_init values during __init */
 	ro_after_init |= 0xAA;
+	/* Make sure we can write to __wr_after_init during __init */
+	wr_after_init |= 0xAA;
 }
-- 
2.19.1
[PATCH 06/12] __wr_after_init: Documentation: self-protection
Update the self-protection documentation, to mention also the use of the __wr_after_init attribute. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- Documentation/security/self-protection.rst | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/Documentation/security/self-protection.rst b/Documentation/security/self-protection.rst index f584fb74b4ff..df2614bc25b9 100644 --- a/Documentation/security/self-protection.rst +++ b/Documentation/security/self-protection.rst @@ -84,12 +84,14 @@ For variables that are initialized once at ``__init`` time, these can be marked with the (new and under development) ``__ro_after_init`` attribute. -What remains are variables that are updated rarely (e.g. GDT). These -will need another infrastructure (similar to the temporary exceptions -made to kernel code mentioned above) that allow them to spend the rest -of their lifetime read-only. (For example, when being updated, only the -CPU thread performing the update would be given uninterruptible write -access to the memory.) +Others, which are statically allocated, but still need to be updated +rarely, can be marked with the ``__wr_after_init`` attribute. + +The update mechanism must avoid exposing the data to rogue alterations +during the update. For example, only the CPU thread performing the update +would be given uninterruptible write access to the memory. + +Currently there is no protection available for data allocated dynamically. Segregation of kernel memory from userspace memory ~~ -- 2.19.1
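The "dedicated update mechanism" described in the documentation change boils down to writing through a separate, writable alias of the protected memory. A minimal userspace analogue of that idea, assuming Linux memfd_create() and two mappings of the same page (illustration only, not kernel code):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Write val through a writable alias and return what the read-only
 * alias observes afterwards: both views back the same physical page,
 * so the update is visible while the "protected" view never becomes
 * writable itself.
 */
uint32_t wr_alias_demo(uint32_t val)
{
	long psz = sysconf(_SC_PAGESIZE);
	int fd = memfd_create("wr_demo", 0);
	uint32_t *ro, *rw, seen;

	ftruncate(fd, psz);
	/* the "protected" view, analogous to the __wr_after_init section */
	ro = mmap(NULL, psz, PROT_READ, MAP_SHARED, fd, 0);
	/* the "poking" view, analogous to the alternate writable mapping */
	rw = mmap(NULL, psz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	*rw = val;		/* the rare update */
	seen = *ro;		/* readers only ever touch the RO view */

	munmap(rw, psz);
	munmap(ro, psz);
	close(fd);
	return seen;
}
```

The kernel version additionally confines the writable alias to a temporary mm that is active only on the updating CPU, with interrupts disabled.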
[PATCH 02/12] __wr_after_init: linker section and label
Introduce a section and a label for statically allocated write rare data. The label is named "__wr_after_init". As the name implies, after the init phase is completed, this section will be modifiable only by invoking write rare functions. The section must take up a set of full pages. To activate both section and label, the arch must set CONFIG_ARCH_HAS_PRMEM Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- arch/Kconfig | 15 +++ include/asm-generic/vmlinux.lds.h | 25 + include/linux/cache.h | 21 + init/main.c | 2 ++ 4 files changed, 63 insertions(+) diff --git a/arch/Kconfig b/arch/Kconfig index e1e540ffa979..8668ffec8098 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -802,6 +802,21 @@ config VMAP_STACK the stack to map directly to the KASAN shadow map using a formula that is incorrect if the stack is in vmalloc space. +config ARCH_HAS_PRMEM + def_bool n + help + architecture specific symbol stating that the architecture provides + a back-end function for the write rare operation. + +config PRMEM + bool "Write protect critical data that doesn't need high write speed." + depends on ARCH_HAS_PRMEM + default y + help + If the architecture supports it, statically allocated data which + has been selected for hardening becomes (mostly) read-only. + The selection happens by labelling the data "__wr_after_init". 
+ config ARCH_OPTIONAL_KERNEL_RWX def_bool n diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 3d7a6a9c2370..ddb1fd608490 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -311,6 +311,30 @@ KEEP(*(__jump_table)) \ __stop___jump_table = .; +/* + * Allow architectures to handle wr_after_init data on their + * own by defining an empty WR_AFTER_INIT_DATA. + * However, it's important that pages containing WR_RARE data do not + * hold anything else, to avoid both accidentally unprotecting something + * that is supposed to stay read-only all the time and also to protect + * something else that is supposed to be writeable all the time. + */ +#ifndef WR_AFTER_INIT_DATA +#ifdef CONFIG_PRMEM +#define WR_AFTER_INIT_DATA(align) \ + . = ALIGN(PAGE_SIZE); \ + __start_wr_after_init = .; \ + . = ALIGN(align); \ + *(.data..wr_after_init) \ + . = ALIGN(PAGE_SIZE); \ + __end_wr_after_init = .;\ + . = ALIGN(align); +#else +#define WR_AFTER_INIT_DATA(align) \ + . = ALIGN(align); +#endif +#endif + /* * Allow architectures to handle ro_after_init data on their * own by defining an empty RO_AFTER_INIT_DATA. @@ -332,6 +356,7 @@ __start_rodata = .; \ *(.rodata) *(.rodata.*) \ RO_AFTER_INIT_DATA /* Read only after init */ \ + WR_AFTER_INIT_DATA(align) /* wr after init */ \ KEEP(*(__vermagic)) /* Kernel version magic */ \ . = ALIGN(8); \ __start___tracepoints_ptrs = .; \ diff --git a/include/linux/cache.h b/include/linux/cache.h index 750621e41d1c..09bd0b9284b6 100644 --- a/include/linux/cache.h +++ b/include/linux/cache.h @@ -31,6 +31,27 @@ #define __ro_after_init __attribute__((__section__(".data..ro_after_init"))) #endif +/* + * __wr_after_init is used to mark objects that cannot be modified + * directly after init (i.e. after mark_rodata_ro() has been called). + * These objects become effectively read-only, from the perspective of + * performing a direct write, like a variable assignment. 
+ * However, they can be altered through a dedicated function.
+ * It is intended for those objects which are occasionally modified after
+ * init, however they are modified so seldom that the extra cost from
+ * the indirect modification is either negligible or worth paying, for the
+ * sake of the protection gained.
+ */
+#ifndef __wr_after_init
+#ifdef CONFIG_PRMEM
+#define __wr_after_init \
+	__attribute__((__section__(".data..wr_after_init")))
[PATCH 12/12] x86_64: __clear_user as case of __memset_user
To avoid code duplication, re-use __memset_user(), when clearing
user-space memory.
The overhead should be minimal (2 extra register assignments) and
outside of the writing loop.

Signed-off-by: Igor Stoppa
CC: Andy Lutomirski
CC: Nadav Amit
CC: Matthew Wilcox
CC: Peter Zijlstra
CC: Kees Cook
CC: Dave Hansen
CC: Mimi Zohar
CC: Thiago Jung Bauermann
CC: Ahmed Soliman
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/lib/usercopy_64.c | 29 +
 1 file changed, 1 insertion(+), 28 deletions(-)

diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index 84f8f8a20b30..ab6aabb62055 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -69,34 +69,7 @@ EXPORT_SYMBOL(memset_user);
 
 unsigned long __clear_user(void __user *addr, unsigned long size)
 {
-	long __d0;
-	might_fault();
-	/* no memory constraint because it doesn't change any memory gcc knows
-	   about */
-	stac();
-	asm volatile(
-		"	testq  %[size8],%[size8]\n"
-		"	jz     4f\n"
-		"0:	movq $0,(%[dst])\n"
-		"	addq   $8,%[dst]\n"
-		"	decl %%ecx ; jnz   0b\n"
-		"4:	movq  %[size1],%%rcx\n"
-		"	testl %%ecx,%%ecx\n"
-		"	jz     2f\n"
-		"1:	movb   $0,(%[dst])\n"
-		"	incq   %[dst]\n"
-		"	decl %%ecx ; jnz  1b\n"
-		"2:\n"
-		".section .fixup,\"ax\"\n"
-		"3:	lea 0(%[size1],%[size8],8),%[size8]\n"
-		"	jmp 2b\n"
-		".previous\n"
-		_ASM_EXTABLE_UA(0b, 3b)
-		_ASM_EXTABLE_UA(1b, 2b)
-		: [size8] "=&c"(size), [dst] "=&D" (__d0)
-		: [size1] "r"(size & 7), "[size8]" (size / 8), "[dst]" (addr));
-	clac();
-	return size;
+	return __memset_user(addr, 0, size);
 }
 EXPORT_SYMBOL(__clear_user);
-- 
2.19.1
[PATCH 09/12] rodata_test: add verification for __wr_after_init
The write protection of the __wr_after_init data can be
verified with the same methodology used for const data.

Signed-off-by: Igor Stoppa
CC: Andy Lutomirski
CC: Nadav Amit
CC: Matthew Wilcox
CC: Peter Zijlstra
CC: Kees Cook
CC: Dave Hansen
CC: Mimi Zohar
CC: Thiago Jung Bauermann
CC: Ahmed Soliman
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 mm/rodata_test.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/mm/rodata_test.c b/mm/rodata_test.c
index e1349520b436..a669cf9f5a61 100644
--- a/mm/rodata_test.c
+++ b/mm/rodata_test.c
@@ -16,8 +16,23 @@
 
 #define INIT_TEST_VAL 0xC3
 
+/*
+ * Note: __ro_after_init data is, for every practical effect, equivalent to
+ * const data, since they are even write protected at the same time; there
+ * is no need for separate testing.
+ * __wr_after_init data, otoh, is altered also after the write protection
+ * takes place and it cannot be exploitable for altering more permanent
+ * data.
+ */
+
 static const int rodata_test_data = INIT_TEST_VAL;
 
+#ifdef CONFIG_PRMEM
+static int wr_after_init_test_data __wr_after_init = INIT_TEST_VAL;
+extern long __start_wr_after_init;
+extern long __end_wr_after_init;
+#endif
+
 static bool test_data(char *data_type, const int *data,
 		      unsigned long start, unsigned long end)
 {
@@ -59,7 +74,13 @@ static bool test_data(char *data_type, const int *data,
 
 void rodata_test(void)
 {
-	test_data("rodata", &rodata_test_data,
-		  (unsigned long)&__start_rodata,
-		  (unsigned long)&__end_rodata);
+	if (!test_data("rodata", &rodata_test_data,
+		       (unsigned long)&__start_rodata,
+		       (unsigned long)&__end_rodata))
+		return;
+#ifdef CONFIG_PRMEM
+	test_data("wr after init data", &wr_after_init_test_data,
+		  (unsigned long)&__start_wr_after_init,
+		  (unsigned long)&__end_wr_after_init);
+#endif
 }
-- 
2.19.1
[PATCH 08/12] rodata_test: refactor tests
Refactor the test cases, in preparation for using them also for
testing __wr_after_init memory, when available.

Signed-off-by: Igor Stoppa
CC: Andy Lutomirski
CC: Nadav Amit
CC: Matthew Wilcox
CC: Peter Zijlstra
CC: Kees Cook
CC: Dave Hansen
CC: Mimi Zohar
CC: Thiago Jung Bauermann
CC: Ahmed Soliman
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 mm/rodata_test.c | 48 
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/mm/rodata_test.c b/mm/rodata_test.c
index d908c8769b48..e1349520b436 100644
--- a/mm/rodata_test.c
+++ b/mm/rodata_test.c
@@ -14,44 +14,52 @@
 #include
 #include
 
-static const int rodata_test_data = 0xC3;
+#define INIT_TEST_VAL 0xC3
 
-void rodata_test(void)
+static const int rodata_test_data = INIT_TEST_VAL;
+
+static bool test_data(char *data_type, const int *data,
+		      unsigned long start, unsigned long end)
 {
-	unsigned long start, end;
 	int zero = 0;
 
 	/* test 1: read the value */
 	/* If this test fails, some previous testrun has clobbered the state */
-	if (!rodata_test_data) {
-		pr_err("test 1 fails (start data)\n");
-		return;
+	if (*data != INIT_TEST_VAL) {
+		pr_err("%s: test 1 fails (init data value)\n", data_type);
+		return false;
 	}
 
 	/* test 2: write to the variable; this should fault */
-	if (!probe_kernel_write((void *)&rodata_test_data,
-				(void *)&zero, sizeof(zero))) {
-		pr_err("test data was not read only\n");
-		return;
+	if (!probe_kernel_write((void *)data, (void *)&zero, sizeof(zero))) {
+		pr_err("%s: test data was not read only\n", data_type);
+		return false;
 	}
 
 	/* test 3: check the value hasn't changed */
-	if (rodata_test_data == zero) {
-		pr_err("test data was changed\n");
-		return;
+	if (*data != INIT_TEST_VAL) {
+		pr_err("%s: test data was changed\n", data_type);
+		return false;
 	}
 
 	/* test 4: check if the rodata section is PAGE_SIZE aligned */
-	start = (unsigned long)__start_rodata;
-	end = (unsigned long)__end_rodata;
 	if (start & (PAGE_SIZE - 1)) {
-
		pr_err("start of .rodata is not page size aligned\n");
-		return;
+		pr_err("%s: start of data is not page size aligned\n",
+		       data_type);
+		return false;
 	}
 	if (end & (PAGE_SIZE - 1)) {
-		pr_err("end of .rodata is not page size aligned\n");
-		return;
+		pr_err("%s: end of data is not page size aligned\n",
+		       data_type);
+		return false;
 	}
+	pr_info("%s tests were successful", data_type);
+	return true;
+}
 
-	pr_info("all tests were successful\n");
+void rodata_test(void)
+{
+	test_data("rodata", &rodata_test_data,
+		  (unsigned long)&__start_rodata,
+		  (unsigned long)&__end_rodata);
 }
-- 
2.19.1
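The boundary checks in test 4 reduce to simple page-mask arithmetic. A standalone sketch of that predicate, assuming 4 KiB pages for illustration (the names are not the kernel's):

```c
#include <assert.h>

#define DEMO_PAGE_SIZE 4096UL
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

/*
 * A protected section is usable only if it starts and ends on page
 * boundaries and spans a non-zero number of whole pages — the same
 * condition checked by test 4 and by the __wr_after_init boundary test.
 */
int section_bounds_ok(unsigned long start, unsigned long end)
{
	return !(start & ~DEMO_PAGE_MASK) &&	/* page-aligned start */
	       !(end & ~DEMO_PAGE_MASK) &&	/* page-aligned end */
	       start < end;			/* at least one page */
}
```

Whole-page granularity matters because protection is applied per page: a partial page would either leave protected data writable or freeze unrelated writable data.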
[PATCH 03/12] __wr_after_init: generic functionality
The patch provides: - the generic part of the write rare functionality for static data, based on code from Matthew Wilcox - the dummy functionality, in case an arch doesn't support write rare or the functionality is disabled The basic functions are: - wr_memset(): write rare counterpart of memset() - wr_memcpy(): write rare counterpart of memcpy() - wr_assign(): write rare counterpart of the assignment ('=') operator - wr_rcu_assign_pointer(): write rare counterpart of rcu_assign_pointer() Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- include/linux/prmem.h | 106 ++ mm/Makefile | 1 + mm/prmem.c| 97 ++ 3 files changed, 204 insertions(+) create mode 100644 include/linux/prmem.h create mode 100644 mm/prmem.c diff --git a/include/linux/prmem.h b/include/linux/prmem.h new file mode 100644 index ..12c1d0d1cb78 --- /dev/null +++ b/include/linux/prmem.h @@ -0,0 +1,106 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * prmem.h: Header for memory protection library + * + * (C) Copyright 2018 Huawei Technologies Co. Ltd. + * Author: Igor Stoppa + * + * Support for: + * - statically allocated write rare data + */ + +#ifndef _LINUX_PRMEM_H +#define _LINUX_PRMEM_H + +#include +#include +#include + + +/** + * memtst() - test len bytes starting at p to match the c value + * @p: beginning of the memory to test + * @c: byte to compare against + * @len: amount of bytes to test + * + * Returns 0 on success, non-zero otherwise. 
+ */
+static inline int memtst(void *p, int c, __kernel_size_t len)
+{
+	__kernel_size_t i;
+
+	for (i = 0; i < len; i++) {
+		u8 d = *(i + (u8 *)p) - (u8)c;
+
+		if (unlikely(d))
+			return d;
+	}
+	return 0;
+}
+
+
+#ifndef CONFIG_PRMEM
+
+static inline void *wr_memset(void *p, int c, __kernel_size_t len)
+{
+	return memset(p, c, len);
+}
+
+static inline void *wr_memcpy(void *p, const void *q, __kernel_size_t size)
+{
+	return memcpy(p, q, size);
+}
+
+#define wr_assign(var, val)	((var) = (val))
+#define wr_rcu_assign_pointer(p, v)	rcu_assign_pointer(p, v)
+
+#else
+
+#include
+#include
+#include
+#include
+
+#include
+
+void *wr_memset(void *p, int c, __kernel_size_t len);
+void *wr_memcpy(void *p, const void *q, __kernel_size_t size);
+
+/**
+ * wr_assign() - sets a write-rare variable to a specified value
+ * @var: the variable to set
+ * @val: the new value
+ *
+ * Returns: the variable
+ *
+ * Note: it might be possible to optimize this, to use wr_memset in some
+ * cases (maybe with NULL?).
+ */
+#define wr_assign(var, val) ({			\
+	typeof(var) tmp = (typeof(var))val;	\
+						\
+	wr_memcpy(&var, &tmp, sizeof(var));	\
+	var;					\
+})
+
+/**
+ * wr_rcu_assign_pointer() - initialize a pointer in rcu mode
+ * @p: the rcu pointer - it MUST be aligned to a machine word
+ * @v: the new value
+ *
+ * Returns the value assigned to the rcu pointer.
+ *
+ * It is provided as macro, to match rcu_assign_pointer()
+ * The rcu_assign_pointer() is implemented as equivalent of:
+ *
+ * smp_mb();
+ * WRITE_ONCE();
+ */
+#define wr_rcu_assign_pointer(p, v) ({	\
+	smp_mb();			\
+	wr_assign(p, v);		\
+	p;				\
+})
+#endif
+#endif
diff --git a/mm/Makefile b/mm/Makefile
index d210cc9d6f80..ef3867c16ce0 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -58,6 +58,7 @@ obj-$(CONFIG_SPARSEMEM) += sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_PRMEM) += prmem.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/prmem.c b/mm/prmem.c
new file mode 100644
index ..e1c1be3a1171
--- /dev/null
+++ b/mm/prmem.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * prmem.c: Memory Protection Library
+ *
+ * (C) Copyright 2017-2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+__ro_after_init bool wr_ready;
+
+/*
+ * The following two variables are statically allocated by the linker
+ * script at the boundaries of the memory region (rounded up to
+ * multiples of PAGE_SIZE) reserved for __wr_after_init.
+ */
+extern long __start_wr_after_init;
+extern long __end_wr_after_init;
+static unsigned long start = (unsigned long)&__start_wr_after_init;
[PATCH 05/12] __wr_after_init: x86_64: __wr_op
Architecture-specific implementation of the core write rare operation. The implementation is based on code from Andy Lutomirski and Nadav Amit for patching the text on x86 [here goes reference to commits, once merged] The modification of write protected data is done through an alternate mapping of the same pages, as writable. This mapping is persistent, but active only for a core that is performing a write rare operation. And only for the duration of said operation. Local interrupts are disabled, while the alternate mapping is active. In theory, it could introduce a non-predictable delay, in a preemptible system, however the amount of data to be altered is likely to be far smaller than a page. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- arch/x86/Kconfig | 1 + arch/x86/include/asm/prmem.h | 72 arch/x86/mm/Makefile | 2 + arch/x86/mm/prmem.c | 69 ++ 4 files changed, 144 insertions(+) create mode 100644 arch/x86/include/asm/prmem.h create mode 100644 arch/x86/mm/prmem.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 8689e794a43c..e5e4fc4fa5c2 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -32,6 +32,7 @@ config X86_64 select SWIOTLB select X86_DEV_DMA_OPS select ARCH_HAS_SYSCALL_WRAPPER + select ARCH_HAS_PRMEM # # Arch settings diff --git a/arch/x86/include/asm/prmem.h b/arch/x86/include/asm/prmem.h new file mode 100644 index ..e1f09f881351 --- /dev/null +++ b/arch/x86/include/asm/prmem.h @@ -0,0 +1,72 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * prmem.h: Header for memory protection library + * + * (C) Copyright 2018 Huawei Technologies Co. Ltd. 
+ * Author: Igor Stoppa
+ *
+ * Support for:
+ * - statically allocated write rare data
+ */
+
+#ifndef _ASM_X86_PRMEM_H
+#define _ASM_X86_PRMEM_H
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+typedef temporary_mm_state_t wr_state_t;
+
+extern __ro_after_init struct mm_struct *wr_poking_mm;
+extern __ro_after_init unsigned long wr_poking_base;
+
+static inline void *__wr_addr(void *addr)
+{
+	return (void *)(wr_poking_base + (unsigned long)addr);
+}
+
+static inline void __wr_enable(wr_state_t *state)
+{
+	*state = use_temporary_mm(wr_poking_mm);
+}
+
+static inline void __wr_disable(wr_state_t *state)
+{
+	unuse_temporary_mm(*state);
+}
+
+
+/**
+ * __wr_memset() - sets len bytes of the destination p to the c value
+ * @p: beginning of the memory to write to
+ * @c: byte to replicate
+ * @len: amount of bytes to copy
+ *
+ * Returns pointer to the destination
+ */
+static inline void *__wr_memset(void *p, int c, __kernel_size_t len)
+{
+	return (void *)memset_user((void __user *)p, (u8)c, len);
+}
+
+/**
+ * __wr_memcpy() - copies size bytes from q to p
+ * @p: beginning of the memory to write to
+ * @q: beginning of the memory to read from
+ * @size: amount of bytes to copy
+ *
+ * Returns pointer to the destination
+ */
+static inline void *__wr_memcpy(void *p, const void *q, __kernel_size_t size)
+{
+	return (void *)copy_to_user((void __user *)p, q, size);
+}
+
+#endif
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 4b101dd6e52f..66652de1e2c7 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -53,3 +53,5 @@ obj-$(CONFIG_PAGE_TABLE_ISOLATION)	+= pti.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_identity.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
+
+obj-$(CONFIG_PRMEM)		+= prmem.o
diff --git a/arch/x86/mm/prmem.c b/arch/x86/mm/prmem.c
new file mode 100644
index ..f4b36baa2f19
--- /dev/null
+++ b/arch/x86/mm/prmem.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * prmem.c: Memory Protection Library
+ *
+ * (C) Copyright 2017-2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+extern __ro_after_init bool wr_ready;
+__ro_after_init struct mm_struct *wr_poking_mm;
+__ro_after_init unsigned long wr_poking_base;
+
+/*
+ * The following two variables are statically allocated by the linker
+ * script at the boundaries of the memory region (rounded up to
+ * multiples of PAGE_SIZE) reserved for __wr_after_init.
+ */
+extern long __start_wr_after_init;
+extern long __end_wr_after_init;
+
+struct mm_struct *copy_init_mm(void);
+void __init wr_poking_init(void)
+{
+	unsigned long start = (unsigned long)&__start_wr_after_init;
+	unsigned long end = (unsigned long)&__end_wr_after_init;
[PATCH 01/12] x86_64: memset_user()
Create x86_64 specific version of memset for user space, based on clear_user(). This will be used for implementing wr_memset() in the __wr_after_init scenario, where write-rare variables have an alternate mapping for writing. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: Thiago Jung Bauermann CC: Ahmed Soliman CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- arch/x86/include/asm/uaccess_64.h | 6 arch/x86/lib/usercopy_64.c| 54 +++ 2 files changed, 60 insertions(+) diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h index a9d637bc301d..f194bfce4866 100644 --- a/arch/x86/include/asm/uaccess_64.h +++ b/arch/x86/include/asm/uaccess_64.h @@ -213,4 +213,10 @@ copy_user_handle_tail(char *to, char *from, unsigned len); unsigned long mcsafe_handle_tail(char *to, char *from, unsigned len); +unsigned long __must_check +memset_user(void __user *mem, int c, unsigned long len); + +unsigned long __must_check +__memset_user(void __user *mem, int c, unsigned long len); + #endif /* _ASM_X86_UACCESS_64_H */ diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c index 1bd837cdc4b1..84f8f8a20b30 100644 --- a/arch/x86/lib/usercopy_64.c +++ b/arch/x86/lib/usercopy_64.c @@ -9,6 +9,60 @@ #include #include +/* + * Memset Userspace + */ + +unsigned long __memset_user(void __user *addr, int c, unsigned long size) +{ + long __d0; + unsigned long pattern = 0; + int i; + + for (i = 0; i < 8; i++) + pattern = (pattern << 8) | (0xFF & c); + might_fault(); + /* no memory constraint: gcc doesn't know about this memory */ + stac(); + asm volatile( + " movq %[val], %%rdx\n" + " testq %[size8],%[size8]\n" + " jz 4f\n" + "0: mov %%rdx,(%[dst])\n" + " addq $8,%[dst]\n" + " decl %%ecx ; jnz 0b\n" + "4: movq %[size1],%%rcx\n" + " testl %%ecx,%%ecx\n" + " jz 2f\n" + "1: movb 
%%dl,(%[dst])\n"
+		"	incq   %[dst]\n"
+		"	decl %%ecx ; jnz  1b\n"
+		"2:\n"
+		".section .fixup,\"ax\"\n"
+		"3:	lea 0(%[size1],%[size8],8),%[size8]\n"
+		"	jmp 2b\n"
+		".previous\n"
+		_ASM_EXTABLE_UA(0b, 3b)
+		_ASM_EXTABLE_UA(1b, 2b)
+		: [size8] "=&c"(size), [dst] "=&D" (__d0)
+		: [size1] "r"(size & 7), "[size8]" (size / 8), "[dst]" (addr),
+		  [val] "ri"(pattern)
+		: "rdx");
+
+	clac();
+	return size;
+}
+EXPORT_SYMBOL(__memset_user);
+
+unsigned long memset_user(void __user *to, int c, unsigned long n)
+{
+	if (access_ok(VERIFY_WRITE, to, n))
+		return __memset_user(to, c, n);
+	return n;
+}
+EXPORT_SYMBOL(memset_user);
+
+
 /*
  * Zero Userspace
  */
-- 
2.19.1
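The fill strategy the assembly above implements — replicate the byte into an 8-byte pattern, store `size / 8` full quadwords, then finish with a `size & 7` byte tail — can be sketched in plain C (userspace illustration only; the real routine additionally handles faults through the exception table):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * C sketch of the __memset_user() loop structure: word-at-a-time
 * body, byte-at-a-time tail, exactly mirroring the movq/movb pair
 * in the assembly.
 */
void *memset_words(void *dst, int c, size_t size)
{
	uint64_t pattern = 0;
	unsigned char *p = dst;
	size_t qwords = size / 8;	/* the "size8" count */
	size_t tail = size & 7;		/* the "size1" count */
	int i;

	/* replicate the byte across all 8 lanes, as done into %rdx */
	for (i = 0; i < 8; i++)
		pattern = (pattern << 8) | (0xFF & c);

	for (; qwords; qwords--, p += 8)
		memcpy(p, &pattern, 8);	/* one 8-byte store (movq) */
	for (; tail; tail--, p++)
		*p = (unsigned char)c;	/* byte tail (movb) */
	return dst;
}
```

Because every byte of the pattern is identical, the word stores are endianness-neutral; `__clear_user()` is just this with `c == 0`.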
Re: [PATCH 04/12] __wr_after_init: x86_64: __wr_op
On 21/12/2018 19:23, Andy Lutomirski wrote:
> On Thu, Dec 20, 2018 at 11:19 AM Igor Stoppa wrote:
>> On 20/12/2018 20:49, Matthew Wilcox wrote:
>>> I think you're causing yourself more headaches by implementing this
>>> "op" function.
>>
>> I probably misinterpreted the initial criticism on my first patchset,
>> about duplication. Somehow, I'm still thinking to the endgame of
>> having higher-level functions, like list management.
>>
>>> Here's some generic code:
>>
>> thank you, I have one question, below
>>
>>> void *wr_memcpy(void *dst, void *src, unsigned int len)
>>> {
>>> 	wr_state_t wr_state;
>>> 	void *wr_poking_addr = __wr_addr(dst);
>>>
>>> 	local_irq_disable();
>>> 	wr_enable(&wr_state);
>>> 	__wr_memcpy(wr_poking_addr, src, len);
>>
>> Is __wr_addr() invoked inside wr_memcpy() instead of being invoked
>> privately within __wr_memcpy() because the code is generic, or is
>> there some other reason?
>>
>>> 	wr_disable(&wr_state);
>>> 	local_irq_enable();
>>> 	return dst;
>>> }
>>>
>>> Now, x86 can define appropriate macros and functions to use the
>>> temporary_mm functionality, and other architectures can do what
>>> makes sense to them.
>
> I suspect that most architectures will want to do this exactly like
> x86, though, but sure, it could be restructured like this.

In spirit, I think yes, but already I couldn't find a clean way to do
multi-arch wr_enable(&wr_state), so I made that too become arch
dependent. Maybe after implementing write rare for a few archs, it
becomes more clear (to me, any advice is welcome) which parts can be
considered common.

> On x86, I *think* that __wr_memcpy() will want to special-case len ==
> 1, 2, 4, and (on 64-bit) 8 byte writes to keep them atomic. I'm
> guessing this is the same on most or all architectures.

I switched to xxx_user() approach, as you suggested. For x86_64 I'm
using copy_user() and I added a memset_user(), based on copy_user().
It's already assembly code optimized for dealing with multiples of
8-byte words or subsets. You can see this in the first patch of the
patchset, even this one.

I'll send out the v3 patchset in a short while.

-- 
igor
Re: [PATCH 04/12] __wr_after_init: x86_64: __wr_op
On 20/12/2018 20:49, Matthew Wilcox wrote:

> I think you're causing yourself more headaches by implementing this
> "op" function.

I probably misinterpreted the initial criticism on my first patchset,
about duplication. Somehow, I'm still thinking to the endgame of having
higher-level functions, like list management.

> Here's some generic code:

thank you, I have one question, below

> void *wr_memcpy(void *dst, void *src, unsigned int len)
> {
> 	wr_state_t wr_state;
> 	void *wr_poking_addr = __wr_addr(dst);
>
> 	local_irq_disable();
> 	wr_enable(&wr_state);
> 	__wr_memcpy(wr_poking_addr, src, len);

Is __wr_addr() invoked inside wr_memcpy() instead of being invoked
privately within __wr_memcpy() because the code is generic, or is there
some other reason?

> 	wr_disable(&wr_state);
> 	local_irq_enable();
>
> 	return dst;
> }
>
> Now, x86 can define appropriate macros and functions to use the
> temporary_mm functionality, and other architectures can do what makes
> sense to them.

-- 
igor
Re: [PATCH 11/12] IMA: turn ima_policy_flags into __wr_after_init
Hi, On 20/12/2018 19:30, Thiago Jung Bauermann wrote: Hello Igor, Igor Stoppa writes: diff --git a/security/integrity/ima/ima_init.c b/security/integrity/ima/ima_init.c index 59d834219cd6..5f4e13e671bf 100644 --- a/security/integrity/ima/ima_init.c +++ b/security/integrity/ima/ima_init.c @@ -21,6 +21,7 @@ #include #include #include +#include #include "ima.h" @@ -98,9 +99,9 @@ void __init ima_load_x509(void) { int unset_flags = ima_policy_flag & IMA_APPRAISE; - ima_policy_flag &= ~unset_flags; + wr_assign(ima_policy_flag, ima_policy_flag & ~unset_flags); integrity_load_x509(INTEGRITY_KEYRING_IMA, CONFIG_IMA_X509_PATH); - ima_policy_flag |= unset_flags; + wr_assign(ima_policy_flag, ima_policy_flag | unset_flags); } #endif In the cover letter, you said: As the name implies, the write protection kicks in only after init() is completed; before that moment, the data is modifiable in the usual way. Given that, is it still necessary or useful to use wr_assign() in a function marked with __init? I might have been over enthusiastic of using the wr interface. You are right, I can drop these two. Thank you. -- igor
Re: [PATCH 04/12] __wr_after_init: x86_64: __wr_op
Hi, On 20/12/2018 19:20, Thiago Jung Bauermann wrote: Hello Igor, +/* + * The following two variables are statically allocated by the linker + * script at the boundaries of the memory region (rounded up to + * multiples of PAGE_SIZE) reserved for __wr_after_init. + */ +extern long __start_wr_after_init; +extern long __end_wr_after_init; + +static inline bool is_wr_after_init(unsigned long ptr, __kernel_size_t size) +{ + unsigned long start = (unsigned long)&__start_wr_after_init; + unsigned long end = (unsigned long)&__end_wr_after_init; + unsigned long low = ptr; + unsigned long high = ptr + size; + + return likely(start <= low && low <= high && high <= end); +} + +void *__wr_op(unsigned long dst, unsigned long src, __kernel_size_t len, + enum wr_op_type op) +{ + temporary_mm_state_t prev; + unsigned long offset; + unsigned long wr_poking_addr; + + /* Confirm that the writable mapping exists. */ + if (WARN_ONCE(!wr_ready, "No writable mapping available")) + return (void *)dst; + + if (WARN_ONCE(op >= WR_OPS_NUMBER, "Invalid WR operation.") || + WARN_ONCE(!is_wr_after_init(dst, len), "Invalid WR range.")) + return (void *)dst; + + offset = dst - (unsigned long)&__start_wr_after_init; + wr_poking_addr = wr_poking_base + offset; + local_irq_disable(); + prev = use_temporary_mm(wr_poking_mm); + + if (op == WR_MEMCPY) + copy_to_user((void __user *)wr_poking_addr, (void *)src, len); + else if (op == WR_MEMSET) + memset_user((void __user *)wr_poking_addr, (u8)src, len); + + unuse_temporary_mm(prev); + local_irq_enable(); + return (void *)dst; +} There's a lot of casting back and forth between unsigned long and void * (also in the previous patch). Is there a reason for that? The intention is to ensure that algebraic operations between addresses are performed as intended, rather than gcc applying some incorrect optimization, wrongly assuming that two addresses belong to the same object. 
That said, I can certainly have a further look at the code and see if I can reduce the number of casts. I do not like them either. But I'm not sure how much can be dropped: if I start from (void *), then I have to cast them to unsigned long for the math. And the xxx_user() operations require a (void __user *). My impression is that there would be fewer casts if variables holding addresses were declared as void * in the first place. It might save 1 or 2 casts. I'll do the count. In that case, it wouldn't hurt to have an additional argument in __wr_op() to carry the byte value for the WR_MEMSET operation. Wouldn't it clobber one more register? Or can gcc figure out that it's not used? __wr_op() is not inline. + +#define TB (1UL << 40) ^^spurious + +struct mm_struct *copy_init_mm(void); +void __init wr_poking_init(void) +{ + unsigned long start = (unsigned long)&__start_wr_after_init; + unsigned long end = (unsigned long)&__end_wr_after_init; + unsigned long i; + unsigned long wr_range; + + wr_poking_mm = copy_init_mm(); + if (WARN_ONCE(!wr_poking_mm, "No alternate mapping available.")) + return; + + wr_range = round_up(end - start, PAGE_SIZE); + + /* Randomize the poking address base */ + wr_poking_base = TASK_UNMAPPED_BASE + + (kaslr_get_random_long("Write Rare Poking") & PAGE_MASK) % + (TASK_SIZE - (TASK_UNMAPPED_BASE + wr_range)); + + /* +* Place 64TB of kernel address space within 128TB of user address +* space, at a random page aligned offset. +*/ + wr_poking_base = (((unsigned long)kaslr_get_random_long("WR Poke")) & + PAGE_MASK) % (64 * _BITUL(40)); You're setting wr_poking_base twice in a row? Is this an artifact from rebase? Yes, the first is a leftover. Thanks for spotting it. -- igor
Re: [PATCH 04/12] __wr_after_init: x86_64: __wr_op
On 19/12/2018 23:33, Igor Stoppa wrote: + if (WARN_ONCE(op >= WR_OPS_NUMBER, "Invalid WR operation.") || + WARN_ONCE(!is_wr_after_init(dst, len), "Invalid WR range.")) + return (void *)dst; + + offset = dst - (unsigned long)&__start_wr_after_init; I forgot to remove the offset. If the whole kernel memory is remapped, it is shifted by wr_poking_base. I'll fix it in the next iteration. + wr_poking_addr = wr_poking_base + offset; wr_poking_addr = wr_poking_base + dst; -- igor
[PATCH] checkpatch.pl: Improve WARNING on Kconfig help
The checkpatch.pl script complains when the help section of a Kconfig entry is too short, but it doesn't really explain what it is looking for. Instead, it gives a generic warning that one should consider writing a paragraph. But what it *really* checks is that the help section is at least $min_conf_desc_length lines long. Since the definition of a paragraph is not really carved in stone (and the primary description is actually "5 sentences"), make the warning less ambiguous by stating the actual test condition explicitly, so that one doesn't have to read the checkpatch.pl sources to figure out the actual test. Signed-off-by: Igor Stoppa CC: Andy Whitcroft CC: Joe Perches CC: Andi Kleen CC: linux-kernel@vger.kernel.org --- scripts/checkpatch.pl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index c883ec55654f..818ddada28b5 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -2931,7 +2931,7 @@ sub process { } if ($is_start && $is_end && $length < $min_conf_desc_length) { WARN("CONFIG_DESCRIPTION", -"please write a paragraph that describes the config symbol fully\n" . $herecurr); +"expecting a 'help' section of $min_conf_desc_length or more lines\n" . $herecurr); } #print "is_start<$is_start> is_end<$is_end> length<$length>\n"; } -- 2.19.1
Re: [PATCH 2/6] __wr_after_init: write rare for static allocation
On 12/12/2018 11:49, Martin Schwidefsky wrote: On Wed, 5 Dec 2018 15:13:56 -0800 Andy Lutomirski wrote: Hi s390 and powerpc people: it would be nice if this generic implementation *worked* on your architectures and that it would allow you to add some straightforward way to add a better arch-specific implementation if you think that would be better. As the code is right now I can guarantee that it will not work on s390. OK, I have thrown in the towel wrt developing at the same time for multiple architectures. ATM I'm oriented toward getting support for one (x86_64), leaving the actual mechanism as architecture specific. Then I can add another one or two and see what makes sense to refactor. This approach should minimize the churn, overall. -- igor
[PATCH 08/12] rodata_test: refactor tests
Refactor the test cases, in preparation for using them also for testing __wr_after_init memory, when available. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- mm/rodata_test.c | 48 1 file changed, 28 insertions(+), 20 deletions(-) diff --git a/mm/rodata_test.c b/mm/rodata_test.c index d908c8769b48..e1349520b436 100644 --- a/mm/rodata_test.c +++ b/mm/rodata_test.c @@ -14,44 +14,52 @@ #include #include -static const int rodata_test_data = 0xC3; +#define INIT_TEST_VAL 0xC3 -void rodata_test(void) +static const int rodata_test_data = INIT_TEST_VAL; + +static bool test_data(char *data_type, const int *data, + unsigned long start, unsigned long end) { - unsigned long start, end; int zero = 0; /* test 1: read the value */ /* If this test fails, some previous testrun has clobbered the state */ - if (!rodata_test_data) { - pr_err("test 1 fails (start data)\n"); - return; + if (*data != INIT_TEST_VAL) { + pr_err("%s: test 1 fails (init data value)\n", data_type); + return false; } /* test 2: write to the variable; this should fault */ - if (!probe_kernel_write((void *)&rodata_test_data, - (void *)&zero, sizeof(zero))) { - pr_err("test data was not read only\n"); - return; + if (!probe_kernel_write((void *)data, (void *)&zero, sizeof(zero))) { + pr_err("%s: test data was not read only\n", data_type); + return false; } /* test 3: check the value hasn't changed */ - if (rodata_test_data == zero) { - pr_err("test data was changed\n"); - return; + if (*data != INIT_TEST_VAL) { + pr_err("%s: test data was changed\n", data_type); + return false; } /* test 4: check if the rodata section is PAGE_SIZE aligned */ - start = (unsigned long)__start_rodata; - end = (unsigned long)__end_rodata; if (start & (PAGE_SIZE - 1)) { - pr_err("start of .rodata is not page size aligned\n"); - return; + pr_err("%s: start of data is not page size aligned\n", + data_type); + return false; } if (end & (PAGE_SIZE - 1)) { - pr_err("end of .rodata is not page size aligned\n"); - return; + pr_err("%s: end of data is not page size aligned\n", + data_type); + return false; } + pr_info("%s tests were successful\n", data_type); + return true; +} - pr_info("all tests were successful\n"); +void rodata_test(void) +{ + test_data("rodata", &rodata_test_data, + (unsigned long)&__start_rodata, + (unsigned long)&__end_rodata); } -- 2.19.1
[PATCH 07/12] __wr_after_init: lkdtm test
Verify that trying to modify a variable with the __wr_after_init attribute will cause a crash. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- drivers/misc/lkdtm/core.c | 3 +++ drivers/misc/lkdtm/lkdtm.h | 3 +++ drivers/misc/lkdtm/perms.c | 29 + 3 files changed, 35 insertions(+) diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c index 2837dc77478e..73c34b17c433 100644 --- a/drivers/misc/lkdtm/core.c +++ b/drivers/misc/lkdtm/core.c @@ -155,6 +155,9 @@ static const struct crashtype crashtypes[] = { CRASHTYPE(ACCESS_USERSPACE), CRASHTYPE(WRITE_RO), CRASHTYPE(WRITE_RO_AFTER_INIT), +#ifdef CONFIG_PRMEM + CRASHTYPE(WRITE_WR_AFTER_INIT), +#endif CRASHTYPE(WRITE_KERN), CRASHTYPE(REFCOUNT_INC_OVERFLOW), CRASHTYPE(REFCOUNT_ADD_OVERFLOW), diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h index 3c6fd327e166..abba2f52ffa6 100644 --- a/drivers/misc/lkdtm/lkdtm.h +++ b/drivers/misc/lkdtm/lkdtm.h @@ -38,6 +38,9 @@ void lkdtm_READ_BUDDY_AFTER_FREE(void); void __init lkdtm_perms_init(void); void lkdtm_WRITE_RO(void); void lkdtm_WRITE_RO_AFTER_INIT(void); +#ifdef CONFIG_PRMEM +void lkdtm_WRITE_WR_AFTER_INIT(void); +#endif void lkdtm_WRITE_KERN(void); void lkdtm_EXEC_DATA(void); void lkdtm_EXEC_STACK(void); diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c index 53b85c9d16b8..f681730aa652 100644 --- a/drivers/misc/lkdtm/perms.c +++ b/drivers/misc/lkdtm/perms.c @@ -9,6 +9,7 @@ #include #include #include +#include #include /* Whether or not to fill the target memory area with do_nothing(). */ @@ -27,6 +28,10 @@ static const unsigned long rodata = 0xAA55AA55; /* This is marked __ro_after_init, so it should ultimately be .rodata. 
*/ static unsigned long ro_after_init __ro_after_init = 0x55AA5500; +/* This is marked __wr_after_init, so it should be in .rodata. */ +static +unsigned long wr_after_init __wr_after_init = 0x55AA5500; + /* * This just returns to the caller. It is designed to be copied into * non-executable memory regions. */ @@ -104,6 +109,28 @@ void lkdtm_WRITE_RO_AFTER_INIT(void) *ptr ^= 0xabcd1234; } +#ifdef CONFIG_PRMEM + +void lkdtm_WRITE_WR_AFTER_INIT(void) +{ + unsigned long *ptr = &wr_after_init; + + /* +* Verify we were written to during init. Since an Oops +* is considered a "success", a failure is to just skip the +* real test. +*/ + if ((*ptr & 0xAA) != 0xAA) { + pr_info("%p was NOT written during init!?\n", ptr); + return; + } + + pr_info("attempting bad wr_after_init write at %p\n", ptr); + *ptr ^= 0xabcd1234; +} + +#endif + void lkdtm_WRITE_KERN(void) { size_t size; @@ -200,4 +227,6 @@ void __init lkdtm_perms_init(void) /* Make sure we can write to __ro_after_init values during __init */ ro_after_init |= 0xAA; + /* Make sure we can write to __wr_after_init during __init */ + wr_after_init |= 0xAA; } -- 2.19.1
[PATCH 04/12] __wr_after_init: x86_64: __wr_op
Architecture-specific implementation of the core write rare operation. The implementation is based on code from Andy Lutomirski and Nadav Amit for patching the text on x86 [here goes reference to commits, once merged] The modification of write protected data is done through an alternate mapping of the same pages, as writable. This mapping is persistent, but it is active only for a core that is performing a write rare operation, and only for the duration of said operation. Local interrupts are disabled while the alternate mapping is active. In theory, this could introduce an unpredictable delay on a preemptible system; however, the amount of data to be altered is likely to be far smaller than a page. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- arch/x86/Kconfig | 1 + arch/x86/mm/Makefile | 2 + arch/x86/mm/prmem.c | 120 +++ 3 files changed, 123 insertions(+) create mode 100644 arch/x86/mm/prmem.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 8689e794a43c..e5e4fc4fa5c2 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -32,6 +32,7 @@ config X86_64 select SWIOTLB select X86_DEV_DMA_OPS select ARCH_HAS_SYSCALL_WRAPPER + select ARCH_HAS_PRMEM # # Arch settings diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 4b101dd6e52f..66652de1e2c7 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -53,3 +53,5 @@ obj-$(CONFIG_PAGE_TABLE_ISOLATION)+= pti.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_identity.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_boot.o + +obj-$(CONFIG_PRMEM)+= prmem.o diff --git a/arch/x86/mm/prmem.c b/arch/x86/mm/prmem.c new file mode 100644 index ..fc367551e736 --- /dev/null +++ b/arch/x86/mm/prmem.c @@ -0,0 +1,120 @@ +// 
SPDX-License-Identifier: GPL-2.0 +/* + * prmem.c: Memory Protection Library + * + * (C) Copyright 2017-2018 Huawei Technologies Co. Ltd. + * Author: Igor Stoppa + */ + +#include +#include +#include +#include +#include +#include +#include + +static __ro_after_init bool wr_ready; +static __ro_after_init struct mm_struct *wr_poking_mm; +static __ro_after_init unsigned long wr_poking_base; + +/* + * The following two variables are statically allocated by the linker + * script at the boundaries of the memory region (rounded up to + * multiples of PAGE_SIZE) reserved for __wr_after_init. + */ +extern long __start_wr_after_init; +extern long __end_wr_after_init; + +static inline bool is_wr_after_init(unsigned long ptr, __kernel_size_t size) +{ + unsigned long start = (unsigned long)&__start_wr_after_init; + unsigned long end = (unsigned long)&__end_wr_after_init; + unsigned long low = ptr; + unsigned long high = ptr + size; + + return likely(start <= low && low <= high && high <= end); +} + +void *__wr_op(unsigned long dst, unsigned long src, __kernel_size_t len, + enum wr_op_type op) +{ + temporary_mm_state_t prev; + unsigned long offset; + unsigned long wr_poking_addr; + + /* Confirm that the writable mapping exists. 
*/ + if (WARN_ONCE(!wr_ready, "No writable mapping available")) + return (void *)dst; + + if (WARN_ONCE(op >= WR_OPS_NUMBER, "Invalid WR operation.") || + WARN_ONCE(!is_wr_after_init(dst, len), "Invalid WR range.")) + return (void *)dst; + + offset = dst - (unsigned long)&__start_wr_after_init; + wr_poking_addr = wr_poking_base + offset; + local_irq_disable(); + prev = use_temporary_mm(wr_poking_mm); + + if (op == WR_MEMCPY) + copy_to_user((void __user *)wr_poking_addr, (void *)src, len); + else if (op == WR_MEMSET) + memset_user((void __user *)wr_poking_addr, (u8)src, len); + + unuse_temporary_mm(prev); + local_irq_enable(); + return (void *)dst; +} + +#define TB (1UL << 40) + +struct mm_struct *copy_init_mm(void); +void __init wr_poking_init(void) +{ + unsigned long start = (unsigned long)&__start_wr_after_init; + unsigned long end = (unsigned long)&__end_wr_after_init; + unsigned long i; + unsigned long wr_range; + + wr_poking_mm = copy_init_mm(); + if (WARN_ONCE(!wr_poking_mm, "No alternate mapping available.")) + return; + + wr_range = round_up(end - start, PAGE_SIZE); + + /* Randomize the poking address base */ + wr_poking_base = TASK_UNMAPPED_BASE + + (kaslr_get_random_long("Write Rare Poking") & PAGE_MASK) % +
[PATCH 03/12] __wr_after_init: generic header
The header provides: - the generic part of the write rare functionality for static data - the dummy functionality, in case an arch doesn't support write rare or the functionality is disabled The basic functions are: - wr_memset(): write rare counterpart of memset() - wr_memcpy(): write rare counterpart of memcpy() - wr_assign(): write rare counterpart of the assignment ('=') operator - wr_rcu_assign_pointer(): write rare counterpart of rcu_assign_pointer() Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- include/linux/prmem.h | 142 ++ 1 file changed, 142 insertions(+) create mode 100644 include/linux/prmem.h diff --git a/include/linux/prmem.h b/include/linux/prmem.h new file mode 100644 index ..7b8f3a054d97 --- /dev/null +++ b/include/linux/prmem.h @@ -0,0 +1,142 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * prmem.h: Header for memory protection library + * + * (C) Copyright 2018 Huawei Technologies Co. Ltd. + * Author: Igor Stoppa + * + * Support for: + * - statically allocated write rare data + */ + +#ifndef _LINUX_PRMEM_H +#define _LINUX_PRMEM_H + +#include +#include +#include +#include +#include +#include +#include +#include + +/** + * memtst() - test n bytes of the source to match the c value + * @p: beginning of the memory to test + * @c: byte to compare against + * @len: amount of bytes to test + * + * Returns 0 on success, non-zero otherwise. 
+ */ +static inline int memtst(void *p, int c, __kernel_size_t len) +{ + __kernel_size_t i; + + for (i = 0; i < len; i++) { + u8 d = *(i + (u8 *)p) - (u8)c; + + if (unlikely(d)) + return d; + } + return 0; +} + + +#ifndef CONFIG_PRMEM + +static inline void *wr_memset(void *p, int c, __kernel_size_t len) +{ + return memset(p, c, len); +} + +static inline void *wr_memcpy(void *p, const void *q, __kernel_size_t size) +{ + return memcpy(p, q, size); +} + +#define wr_assign(var, val)((var) = (val)) + +#define wr_rcu_assign_pointer(p, v)\ + rcu_assign_pointer(p, v) + +#else + +/* + * If CONFIG_PRMEM is enabled, the ARCH code must provide an + * implementation for __wr_op() + */ + +enum wr_op_type { + WR_MEMCPY, + WR_MEMSET, + WR_OPS_NUMBER, +}; + +void *__wr_op(unsigned long dst, unsigned long src, __kernel_size_t len, + enum wr_op_type op); + +/** + * wr_memset() - sets n bytes of the destination to the c value + * @p: beginning of the memory to write to + * @c: byte to replicate + * @len: number of bytes to set + * + * Returns pointer to the destination. + */ +static inline void *wr_memset(void *p, int c, __kernel_size_t len) +{ + return __wr_op((unsigned long)p, (unsigned long)c, len, WR_MEMSET); +} + +/** + * wr_memcpy() - copies n bytes from source to destination + * @p: beginning of the memory to write to + * @q: beginning of the memory to read from + * @size: number of bytes to copy + * + * Returns pointer to the destination + */ +static inline void *wr_memcpy(void *p, const void *q, __kernel_size_t size) +{ + return __wr_op((unsigned long)p, (unsigned long)q, size, WR_MEMCPY); +} + +/** + * wr_assign() - sets a write-rare variable to a specified value + * @var: the variable to set + * @val: the new value + * + * Returns: the variable + * + * Note: it might be possible to optimize this, to use wr_memset in some + * cases (maybe with NULL?). 
+ */ + +#define wr_assign(var, val) ({ \ + typeof(var) tmp = (typeof(var))val; \ + \ + wr_memcpy(&var, &tmp, sizeof(var)); \ + var;\ +}) + +/** + * wr_rcu_assign_pointer() - initialize a pointer in rcu mode + * @p: the rcu pointer - it MUST be aligned to a machine word + * @v: the new value + * + * Returns the value assigned to the rcu pointer. + * + * It is provided as a macro, to match rcu_assign_pointer() + * The rcu_assign_pointer() is implemented as equivalent of: + * + * smp_mb(); + * WRITE_ONCE(p, v); + */ +#define wr_rcu_assign_pointer(p, v) ({ \ + smp_mb(); \ + wr_assign(p, v);\ + p; \ +}) +#endif +#endif -- 2.19.1
[PATCH 05/12] __wr_after_init: x86_64: debug writes
After each write operation, confirm that it was successful, otherwise generate a warning. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- arch/x86/mm/prmem.c | 9 - mm/Kconfig.debug| 8 2 files changed, 16 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/prmem.c b/arch/x86/mm/prmem.c index fc367551e736..9d98525c687a 100644 --- a/arch/x86/mm/prmem.c +++ b/arch/x86/mm/prmem.c @@ -60,7 +60,14 @@ void *__wr_op(unsigned long dst, unsigned long src, __kernel_size_t len, copy_to_user((void __user *)wr_poking_addr, (void *)src, len); else if (op == WR_MEMSET) memset_user((void __user *)wr_poking_addr, (u8)src, len); - +#ifdef CONFIG_DEBUG_PRMEM + if (op == WR_MEMCPY) + VM_WARN_ONCE(memcmp((void *)dst, (void *)src, len), +"Failed wr_memcpy()"); + else if (op == WR_MEMSET) + VM_WARN_ONCE(memtst((void *)dst, (u8)src, len), +"Failed wr_memset()"); +#endif unuse_temporary_mm(prev); local_irq_enable(); return (void *)dst; diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug index 9a7b8b049d04..b10305cfac3c 100644 --- a/mm/Kconfig.debug +++ b/mm/Kconfig.debug @@ -94,3 +94,11 @@ config DEBUG_RODATA_TEST depends on STRICT_KERNEL_RWX ---help--- This option enables a testcase for the setting rodata read-only. + +config DEBUG_PRMEM +bool "Verify each write rare operation." +depends on PRMEM +default n +help + After any write rare operation, compares the data written with the + value provided by the caller. -- 2.19.1
[PATCH 10/12] __wr_after_init: test write rare functionality
Set of test cases meant to confirm that the write rare functionality works as expected. It can be optionally compiled as module. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- mm/Kconfig.debug | 8 +++ mm/Makefile | 1 + mm/test_write_rare.c | 135 +++ 3 files changed, 144 insertions(+) create mode 100644 mm/test_write_rare.c diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug index b10305cfac3c..ae018e56c4e4 100644 --- a/mm/Kconfig.debug +++ b/mm/Kconfig.debug @@ -102,3 +102,11 @@ config DEBUG_PRMEM help After any write rare operation, compares the data written with the value provided by the caller. + +config DEBUG_PRMEM_TEST +tristate "Run self test for statically allocated protected memory" +depends on PRMEM +default n +help + Tries to verify that the protection for statically allocated memory + works correctly and that the memory is effectively protected. diff --git a/mm/Makefile b/mm/Makefile index d210cc9d6f80..62d719c0ee1e 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -58,6 +58,7 @@ obj-$(CONFIG_SPARSEMEM) += sparse.o obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o obj-$(CONFIG_SLOB) += slob.o obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o +obj-$(CONFIG_DEBUG_PRMEM_TEST) += test_write_rare.o obj-$(CONFIG_KSM) += ksm.o obj-$(CONFIG_PAGE_POISONING) += page_poison.o obj-$(CONFIG_SLAB) += slab.o diff --git a/mm/test_write_rare.c b/mm/test_write_rare.c new file mode 100644 index ..30574bc34a20 --- /dev/null +++ b/mm/test_write_rare.c @@ -0,0 +1,135 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * test_write_rare.c + * + * (C) Copyright 2018 Huawei Technologies Co. Ltd. 
+ * Author: Igor Stoppa + */ + +#include +#include +#include +#include +#include + +#ifdef pr_fmt +#undef pr_fmt +#endif + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +extern long __start_wr_after_init; +extern long __end_wr_after_init; + +static __wr_after_init int scalar = '0'; +static __wr_after_init u8 array[PAGE_SIZE * 3] __aligned(PAGE_SIZE); + +/* The section must occupy a non-zero number of whole pages */ +static bool test_alignment(void) +{ + unsigned long pstart = (unsigned long)&__start_wr_after_init; + unsigned long pend = (unsigned long)&__end_wr_after_init; + + if (WARN((pstart & ~PAGE_MASK) || (pend & ~PAGE_MASK) || +(pstart >= pend), "Boundaries test failed.")) + return false; + pr_info("Boundaries test passed."); + return true; +} + +static bool test_pattern(void) +{ + return (memtst(array, '0', PAGE_SIZE / 2) || + memtst(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 3 / 4) || + memtst(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2) || + memtst(array + PAGE_SIZE * 7 / 4, '1', PAGE_SIZE * 3 / 4) || + memtst(array + PAGE_SIZE * 5 / 2, '0', PAGE_SIZE / 2)); +} + +static bool test_wr_memset(void) +{ + int new_val = '1'; + + wr_memset(&scalar, new_val, sizeof(scalar)); + if (WARN(memtst(&scalar, new_val, sizeof(scalar)), +"Scalar write rare memset test failed.")) + return false; + + pr_info("Scalar write rare memset test passed."); + + wr_memset(array, '0', PAGE_SIZE * 3); + if (WARN(memtst(array, '0', PAGE_SIZE * 3), +"Array write rare memset test failed.")) + return false; + + wr_memset(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 2); + if (WARN(memtst(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 2), +"Array write rare memset test failed.")) + return false; + + wr_memset(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2); + if (WARN(memtst(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2), +"Array write rare memset test failed.")) + return false; + + if (WARN(test_pattern(), "Array write rare memset test failed.")) + return false; + + pr_info("Array write rare memset test passed."); 
return true; +} + +static u8 array_1[PAGE_SIZE * 2]; +static u8 array_2[PAGE_SIZE * 2]; + +static bool test_wr_memcpy(void) +{ + int new_val = 0x12345678; + + wr_assign(scalar, new_val); + if (WARN(memcmp(&scalar, &new_val, sizeof(scalar)), +"Scalar write rare memcpy test failed.")) + return false; + pr_info("Scalar write rare memcpy test passed."); + + wr_memset(array, '0', PAGE_SIZE * 3); + memset(array_1, '1', PAGE_SIZE * 2); + memset(array_2, '0', PAGE_SIZE * 2); +
[PATCH 11/12] IMA: turn ima_policy_flags into __wr_after_init
The policy flags could be targeted by an attacker aiming at disabling IMA, so that there would be no trace of a file system modification in the measurement list. Since the flags can be altered at runtime, it is not possible to make them become fully read-only, for example with __ro_after_init. __wr_after_init can still provide some protection, at least against simple memory overwrite attacks Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- security/integrity/ima/ima.h| 3 ++- security/integrity/ima/ima_init.c | 5 +++-- security/integrity/ima/ima_policy.c | 9 + 3 files changed, 10 insertions(+), 7 deletions(-) diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h index cc12f3449a72..297c25f5122e 100644 --- a/security/integrity/ima/ima.h +++ b/security/integrity/ima/ima.h @@ -24,6 +24,7 @@ #include #include #include +#include #include #include "../integrity.h" @@ -50,7 +51,7 @@ enum tpm_pcrs { TPM_PCR0 = 0, TPM_PCR8 = 8 }; #define IMA_TEMPLATE_IMA_FMT "d|n" /* current content of the policy */ -extern int ima_policy_flag; +extern int ima_policy_flag __wr_after_init; /* set during initialization */ extern int ima_hash_algo; diff --git a/security/integrity/ima/ima_init.c b/security/integrity/ima/ima_init.c index 59d834219cd6..5f4e13e671bf 100644 --- a/security/integrity/ima/ima_init.c +++ b/security/integrity/ima/ima_init.c @@ -21,6 +21,7 @@ #include #include #include +#include #include "ima.h" @@ -98,9 +99,9 @@ void __init ima_load_x509(void) { int unset_flags = ima_policy_flag & IMA_APPRAISE; - ima_policy_flag &= ~unset_flags; + wr_assign(ima_policy_flag, ima_policy_flag & ~unset_flags); integrity_load_x509(INTEGRITY_KEYRING_IMA, CONFIG_IMA_X509_PATH); - ima_policy_flag |= unset_flags; + wr_assign(ima_policy_flag, 
ima_policy_flag | unset_flags); } #endif diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c index 7489cb7de6dc..2004de818d92 100644 --- a/security/integrity/ima/ima_policy.c +++ b/security/integrity/ima/ima_policy.c @@ -47,7 +47,7 @@ #define INVALID_PCR(a) (((a) < 0) || \ (a) >= (FIELD_SIZEOF(struct integrity_iint_cache, measured_pcrs) * 8)) -int ima_policy_flag; +int ima_policy_flag __wr_after_init; static int temp_ima_appraise; static int build_ima_appraise __ro_after_init; @@ -452,12 +452,13 @@ void ima_update_policy_flag(void) list_for_each_entry(entry, ima_rules, list) { if (entry->action & IMA_DO_MASK) - ima_policy_flag |= entry->action; + wr_assign(ima_policy_flag, + ima_policy_flag | entry->action); } ima_appraise |= (build_ima_appraise | temp_ima_appraise); if (!ima_appraise) - ima_policy_flag &= ~IMA_APPRAISE; + wr_assign(ima_policy_flag, ima_policy_flag & ~IMA_APPRAISE); } static int ima_appraise_flag(enum ima_hooks func) @@ -574,7 +575,7 @@ void ima_update_policy(void) list_splice_tail_init_rcu(&ima_temp_rules, policy, synchronize_rcu); if (ima_rules != policy) { - ima_policy_flag = 0; + wr_assign(ima_policy_flag, 0); ima_rules = policy; } ima_update_policy_flag(); -- 2.19.1
[PATCH 09/12] rodata_test: add verification for __wr_after_init
The write protection of the __wr_after_init data can be verified with the same methodology used for const data. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- mm/rodata_test.c | 27 --- 1 file changed, 24 insertions(+), 3 deletions(-) diff --git a/mm/rodata_test.c b/mm/rodata_test.c index e1349520b436..a669cf9f5a61 100644 --- a/mm/rodata_test.c +++ b/mm/rodata_test.c @@ -16,8 +16,23 @@ #define INIT_TEST_VAL 0xC3 +/* + * Note: __ro_after_init data is, for every practical effect, equivalent to + * const data, since they are even write protected at the same time; there + * is no need for separate testing. + * __wr_after_init data, otoh, is altered also after the write protection + * takes place, and it must not be exploitable for altering more permanent + * data. + */ + static const int rodata_test_data = INIT_TEST_VAL; +#ifdef CONFIG_PRMEM +static int wr_after_init_test_data __wr_after_init = INIT_TEST_VAL; +extern long __start_wr_after_init; +extern long __end_wr_after_init; +#endif + static bool test_data(char *data_type, const int *data, unsigned long start, unsigned long end) { @@ -59,7 +74,13 @@ static bool test_data(char *data_type, const int *data, void rodata_test(void) { - test_data("rodata", &rodata_test_data, - (unsigned long)&__start_rodata, - (unsigned long)&__end_rodata); + if (!test_data("rodata", &rodata_test_data, + (unsigned long)&__start_rodata, + (unsigned long)&__end_rodata)) + return; +#ifdef CONFIG_PRMEM + test_data("wr after init data", &wr_after_init_test_data, + (unsigned long)&__start_wr_after_init, + (unsigned long)&__end_wr_after_init); +#endif } -- 2.19.1
[PATCH 06/12] __wr_after_init: Documentation: self-protection
Update the self-protection documentation, to mention also the use of the __wr_after_init attribute. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- Documentation/security/self-protection.rst | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/Documentation/security/self-protection.rst b/Documentation/security/self-protection.rst index f584fb74b4ff..df2614bc25b9 100644 --- a/Documentation/security/self-protection.rst +++ b/Documentation/security/self-protection.rst @@ -84,12 +84,14 @@ For variables that are initialized once at ``__init`` time, these can be marked with the (new and under development) ``__ro_after_init`` attribute. -What remains are variables that are updated rarely (e.g. GDT). These -will need another infrastructure (similar to the temporary exceptions -made to kernel code mentioned above) that allow them to spend the rest -of their lifetime read-only. (For example, when being updated, only the -CPU thread performing the update would be given uninterruptible write -access to the memory.) +Others, which are statically allocated, but still need to be updated +rarely, can be marked with the ``__wr_after_init`` attribute. + +The update mechanism must avoid exposing the data to rogue alterations +during the update. For example, only the CPU thread performing the update +would be given uninterruptible write access to the memory. + +Currently there is no protection available for data allocated dynamically. Segregation of kernel memory from userspace memory ~~ -- 2.19.1
[PATCH 12/12] x86_64: __clear_user as case of __memset_user
To avoid code duplication, re-use __memset_user() when clearing user-space memory. The overhead should be minimal (2 extra register assignments) and outside of the writing loop. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- arch/x86/lib/usercopy_64.c | 29 + 1 file changed, 1 insertion(+), 28 deletions(-) diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c index 84f8f8a20b30..ab6aabb62055 100644 --- a/arch/x86/lib/usercopy_64.c +++ b/arch/x86/lib/usercopy_64.c @@ -69,34 +69,7 @@ EXPORT_SYMBOL(memset_user); unsigned long __clear_user(void __user *addr, unsigned long size) { - long __d0; - might_fault(); - /* no memory constraint because it doesn't change any memory gcc knows - about */ - stac(); - asm volatile( - " testq %[size8],%[size8]\n" - " jz 4f\n" - "0: movq $0,(%[dst])\n" - " addq $8,%[dst]\n" - " decl %%ecx ; jnz 0b\n" - "4: movq %[size1],%%rcx\n" - " testl %%ecx,%%ecx\n" - " jz 2f\n" - "1: movb $0,(%[dst])\n" - " incq %[dst]\n" - " decl %%ecx ; jnz 1b\n" - "2:\n" - ".section .fixup,\"ax\"\n" - "3: lea 0(%[size1],%[size8],8),%[size8]\n" - " jmp 2b\n" - ".previous\n" - _ASM_EXTABLE_UA(0b, 3b) - _ASM_EXTABLE_UA(1b, 2b) - : [size8] "=&c"(size), [dst] "=&D" (__d0) - : [size1] "r"(size & 7), "[size8]" (size / 8), "[dst]"(addr)); - clac(); - return size; + return __memset_user(addr, 0, size); } EXPORT_SYMBOL(__clear_user); -- 2.19.1
[RFC v2 PATCH 0/12] hardening: statically allocated protected memory
Patch-set implementing write-rare memory protection for statically allocated data. Its purpose is to keep write-protected the kernel data which is seldom modified. There is no read overhead; however, writing requires special operations that are probably unsuitable for often-changing data. The use is opt-in, by applying the modifier __wr_after_init to a variable declaration. As the name implies, the write protection kicks in only after init() is completed; before that moment, the data is modifiable in the usual way. Current Limitations: * supports only data which is allocated statically, at build time. * supports only x86_64, other architectures need to provide their own backend Some notes: - there is a part of generic code which is basically a NOP, but should allow using the write protection unconditionally. It will automatically default to non-protected functionality, if the specific architecture doesn't support write-rare - to avoid the risk of weakening __ro_after_init, __wr_after_init data is in a separate set of pages, and any invocation will confirm that the memory affected falls within this range. rodata_test is modified accordingly, to check also this case. - for now, the patchset addresses only x86_64, as each architecture seems to have its own way of dealing with user space. Once a few are implemented, it should be more obvious what code can be refactored as common. 
- the memset_user() assembly function seems to work, but I'm not too sure it's really ok - I've added a simple example: the protection of ima_policy_flags - the last patch is optional, but it seemed worth doing the refactoring Changelog: v1->v2 * introduce cleaner split between generic and arch code * add x86_64 specific memset_user() * replace kernel-space memset()/memcpy() with their userspace counterparts * randomize the base address for the alternate map across the entire available address range from user space (128TB - 64TB) * convert BUG() to WARN() * turn verification of written data into debugging option * wr_rcu_assign_pointer() as special case of wr_assign() * example with protection of ima_policy_flags * documentation CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org Igor Stoppa (12): [PATCH 01/12] x86_64: memset_user() [PATCH 02/12] __wr_after_init: linker section and label [PATCH 03/12] __wr_after_init: generic header [PATCH 04/12] __wr_after_init: x86_64: __wr_op [PATCH 05/12] __wr_after_init: x86_64: debug writes [PATCH 06/12] __wr_after_init: Documentation: self-protection [PATCH 07/12] __wr_after_init: lkdtm test [PATCH 08/12] rodata_test: refactor tests [PATCH 09/12] rodata_test: add verification for __wr_after_init [PATCH 10/12] __wr_after_init: test write rare functionality [PATCH 11/12] IMA: turn ima_policy_flags into __wr_after_init [PATCH 12/12] x86_64: __clear_user as case of __memset_user Documentation/security/self-protection.rst | 14 ++- arch/Kconfig | 15 +++ arch/x86/Kconfig | 1 + arch/x86/include/asm/uaccess_64.h | 6 + arch/x86/lib/usercopy_64.c | 41 +-- arch/x86/mm/Makefile | 2 + arch/x86/mm/prmem.c| 127 + drivers/misc/lkdtm/core.c | 3 + drivers/misc/lkdtm/lkdtm.h | 3 + drivers/misc/lkdtm/perms.c | 29 + include/asm-generic/vmlinux.lds.h | 25 + 
include/linux/cache.h | 21 include/linux/prmem.h | 142 init/main.c| 2 + mm/Kconfig.debug | 16 +++ mm/Makefile| 1 + mm/rodata_test.c | 69 mm/test_write_rare.c | 135 ++ security/integrity/ima/ima.h | 3 +- security/integrity/ima/ima_init.c | 5 +- security/integrity/ima/ima_policy.c| 9 +- 21 files changed, 629 insertions(+), 40 deletions(-)
[PATCH 01/12] x86_64: memset_user()
Create x86_64 specific version of memset for user space, based on clear_user(). This will be used for implementing wr_memset() in the __wr_after_init scenario, where write-rare variables have an alternate mapping for writing. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- arch/x86/include/asm/uaccess_64.h | 6 arch/x86/lib/usercopy_64.c| 54 +++ 2 files changed, 60 insertions(+) diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h index a9d637bc301d..f194bfce4866 100644 --- a/arch/x86/include/asm/uaccess_64.h +++ b/arch/x86/include/asm/uaccess_64.h @@ -213,4 +213,10 @@ copy_user_handle_tail(char *to, char *from, unsigned len); unsigned long mcsafe_handle_tail(char *to, char *from, unsigned len); +unsigned long __must_check +memset_user(void __user *mem, int c, unsigned long len); + +unsigned long __must_check +__memset_user(void __user *mem, int c, unsigned long len); + #endif /* _ASM_X86_UACCESS_64_H */ diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c index 1bd837cdc4b1..84f8f8a20b30 100644 --- a/arch/x86/lib/usercopy_64.c +++ b/arch/x86/lib/usercopy_64.c @@ -9,6 +9,60 @@ #include #include +/* + * Memset Userspace + */ + +unsigned long __memset_user(void __user *addr, int c, unsigned long size) +{ + long __d0; + unsigned long pattern = 0; + int i; + + for (i = 0; i < 8; i++) + pattern = (pattern << 8) | (0xFF & c); + might_fault(); + /* no memory constraint: gcc doesn't know about this memory */ + stac(); + asm volatile( + " movq %[val], %%rdx\n" + " testq %[size8],%[size8]\n" + " jz 4f\n" + "0: mov %%rdx,(%[dst])\n" + " addq $8,%[dst]\n" + " decl %%ecx ; jnz 0b\n" + "4: movq %[size1],%%rcx\n" + " testl %%ecx,%%ecx\n" + " jz 2f\n" + "1: movb %%dl,(%[dst])\n" + " incq %[dst]\n" + " decl 
%%ecx ; jnz 1b\n" + "2:\n" + ".section .fixup,\"ax\"\n" + "3: lea 0(%[size1],%[size8],8),%[size8]\n" + " jmp 2b\n" + ".previous\n" + _ASM_EXTABLE_UA(0b, 3b) + _ASM_EXTABLE_UA(1b, 2b) + : [size8] "=&c"(size), [dst] "=&D" (__d0) + : [size1] "r"(size & 7), "[size8]" (size / 8), "[dst]"(addr), + [val] "ri"(pattern) + : "rdx"); + + clac(); + return size; +} +EXPORT_SYMBOL(__memset_user); + +unsigned long memset_user(void __user *to, int c, unsigned long n) +{ + if (access_ok(VERIFY_WRITE, to, n)) + return __memset_user(to, c, n); + return n; +} +EXPORT_SYMBOL(memset_user); + + /* * Zero Userspace */ -- 2.19.1
[PATCH 02/12] __wr_after_init: linker section and label
Introduce a section and a label for statically allocated write rare data. The label is named "__wr_after_init". As the name implies, after the init phase is completed, this section will be modifiable only by invoking write rare functions. The section must take up a set of full pages. To activate both section and label, the arch must set CONFIG_ARCH_HAS_PRMEM Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: Mimi Zohar CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- arch/Kconfig | 15 +++ include/asm-generic/vmlinux.lds.h | 25 + include/linux/cache.h | 21 + init/main.c | 2 ++ 4 files changed, 63 insertions(+) diff --git a/arch/Kconfig b/arch/Kconfig index e1e540ffa979..8668ffec8098 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -802,6 +802,21 @@ config VMAP_STACK the stack to map directly to the KASAN shadow map using a formula that is incorrect if the stack is in vmalloc space. +config ARCH_HAS_PRMEM + def_bool n + help + architecture specific symbol stating that the architecture provides + a back-end function for the write rare operation. + +config PRMEM + bool "Write protect critical data that doesn't need high write speed." + depends on ARCH_HAS_PRMEM + default y + help + If the architecture supports it, statically allocated data which + has been selected for hardening becomes (mostly) read-only. + The selection happens by labelling the data "__wr_after_init". + config ARCH_OPTIONAL_KERNEL_RWX def_bool n diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 3d7a6a9c2370..ddb1fd608490 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -311,6 +311,30 @@ KEEP(*(__jump_table)) \ __stop___jump_table = .; +/* + * Allow architectures to handle wr_after_init data on their + * own by defining an empty WR_AFTER_INIT_DATA. 
+ * However, it's important that pages containing WR_RARE data do not + * hold anything else, to avoid both accidentally unprotecting something + * that is supposed to stay read-only all the time and also to protect + * something else that is supposed to be writeable all the time. + */ +#ifndef WR_AFTER_INIT_DATA +#ifdef CONFIG_PRMEM +#define WR_AFTER_INIT_DATA(align) \ + . = ALIGN(PAGE_SIZE); \ + __start_wr_after_init = .; \ + . = ALIGN(align); \ + *(.data..wr_after_init) \ + . = ALIGN(PAGE_SIZE); \ + __end_wr_after_init = .;\ + . = ALIGN(align); +#else +#define WR_AFTER_INIT_DATA(align) \ + . = ALIGN(align); +#endif +#endif + /* * Allow architectures to handle ro_after_init data on their * own by defining an empty RO_AFTER_INIT_DATA. @@ -332,6 +356,7 @@ __start_rodata = .; \ *(.rodata) *(.rodata.*) \ RO_AFTER_INIT_DATA /* Read only after init */ \ + WR_AFTER_INIT_DATA(align) /* wr after init */ \ KEEP(*(__vermagic)) /* Kernel version magic */ \ . = ALIGN(8); \ __start___tracepoints_ptrs = .; \ diff --git a/include/linux/cache.h b/include/linux/cache.h index 750621e41d1c..09bd0b9284b6 100644 --- a/include/linux/cache.h +++ b/include/linux/cache.h @@ -31,6 +31,27 @@ #define __ro_after_init __attribute__((__section__(".data..ro_after_init"))) #endif +/* + * __wr_after_init is used to mark objects that cannot be modified + * directly after init (i.e. after mark_rodata_ro() has been called). + * These objects become effectively read-only, from the perspective of + * performing a direct write, like a variable assignment. + * However, they can be altered through a dedicated function. + * It is intended for those objects which are occasionally modified after + * init, however they are modified so seldom that the extra cost from + * the indirect modification is either negligible or worth paying, for the + * sake of the protection gained. + */ +#ifndef __wr_after_init +#ifdef CONFIG_PRMEM +#define __wr_after_init \ + __attribute__((__section__(".data..wr_after_init")))
[PATCH] checkpatch.pl: Improve WARNING on Kconfig help
The checkpatch.pl script complains when the help section of a Kconfig entry is too short, but it doesn't really explain what it is looking for. Instead, it gives a generic warning that one should consider writing a paragraph. But what it *really* checks is that the help section is at least $min_conf_desc_length lines long. Since the definition of what is a paragraph is not really carved in stone (and actually the primary description is "5 sentences"), make the warning less ambiguous by making the actual test condition explicit, so that one doesn't have to read the checkpatch.pl sources to figure out the actual test. Signed-off-by: Igor Stoppa CC: Andy Whitcroft CC: Joe Perches CC: linux-kernel@vger.kernel.org --- scripts/checkpatch.pl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index c883ec55654f..33568d7e28d1 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -2931,7 +2931,7 @@ sub process { } if ($is_start && $is_end && $length < $min_conf_desc_length) { WARN("CONFIG_DESCRIPTION", -"please write a paragraph that describes the config symbol fully\n" . $herecurr); +"expecting a 'help' section of " .$min_conf_desc_length . "+ lines\n" . $herecurr); } #print "is_start<$is_start> is_end<$is_end> length<$length>\n"; } -- 2.19.1
Re: [PATCH] checkpatch.pl: Improve WARNING on Kconfig help
On 19/12/2018 14:29, Joe Perches wrote: On Wed, 2018-12-19 at 11:59 +, Andy Whitcroft wrote: On Wed, Dec 19, 2018 at 02:44:36AM -0800, Joe Perches wrote: To cover both cases perhaps: "please ensure that this config symbol is described fully (less than $min_conf_desc_length lines is quite brief)" This is one of those checkpatch bleats I never really thought was appropriate, as some or many Kconfig symbols are fully descriptive even with only a single line. Also, it seems you are arguing for a checkpatch --verbose-help output style rather than the intentionally terse single line output that the script produces today. If I have to use --verbose, to understand that the warning is about me writing 3 lines when the script expects 4, I don't think it's particularly user friendly. Let's write "Expected 4+ lines" or something equally clear. It will fit in a row and get the job done. That is something Al Viro once suggested in this thread: https://lore.kernel.org/patchwork/patch/775901/ On Sat, 2017-04-01 at 05:08 +0100, Al Viro wrote: On Fri, Mar 31, 2017 at 08:52:50PM -0700, Joe Perches wrote: checkpatch messages are single line. Too bad... Incidentally, being able to get more detailed explanation of a warning might be a serious improvement, especially if it contains the rationale. Hell, something like TeX handling of errors might be a good idea - warning printed, offered actions include 'give more help', 'continue', 'exit', 'from now on suppress this kind of warning', 'from now on just dump this kind of warning into log and keep going', 'from now on dump all warnings into log and keep going'. It's all good in general, but here the word "paragraph" is being abused, in the sense that it has been given an arbitrary meaning of "4 lines". And the warning is even worse because it doesn't even acknowledge that I wrote something, even if it's a meager 1 or 2 lines. Which is even more confusing. 
As user, if I'm running checkpatch.pl and I get a warning, I should spend my time trying to decide if/how to fix it, not re-invoking it with extra options or reading its sources. -- igor
[PATCH] checkpatch.pl: Improve WARNING on Kconfig help
The checkpatch.pl script complains when the help section of a Kconfig entry is too short, but it doesn't really explain what it is looking for. Instead, it gives a generic warning that one should consider writing a paragraph. But what it *really* checks is that the help section is at least $min_conf_desc_length lines long. Since the definition of what is a paragraph is not really carved in stone (and actually the primary description is "5 sentences"), make the warning less ambiguous by making the actual test condition explicit, so that one doesn't have to read the checkpatch.pl sources to figure out the actual test. Signed-off-by: Igor Stoppa CC: Andy Whitcroft CC: Joe Perches CC: linux-kernel@vger.kernel.org --- scripts/checkpatch.pl | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index c883ec55654f..e255f0423cca 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -2931,7 +2931,8 @@ sub process { } if ($is_start && $is_end && $length < $min_conf_desc_length) { WARN("CONFIG_DESCRIPTION", -"please write a paragraph that describes the config symbol fully\n" . $herecurr); +"please write a paragraph (" .$min_conf_desc_length . " lines)" . +" that describes the config symbol fully\n" . $herecurr); } #print "is_start<$is_start> is_end<$is_end> length<$length>\n"; } -- 2.19.1
Re: [PATCH 2/6] __wr_after_init: write rare for static allocation
On 06/12/2018 11:44, Peter Zijlstra wrote: On Wed, Dec 05, 2018 at 03:13:56PM -0800, Andy Lutomirski wrote: + if (op == WR_MEMCPY) + memcpy((void *)wr_poking_addr, (void *)src, len); + else if (op == WR_MEMSET) + memset((u8 *)wr_poking_addr, (u8)src, len); + else if (op == WR_RCU_ASSIGN_PTR) + /* generic version of rcu_assign_pointer */ + smp_store_release((void **)wr_poking_addr, + RCU_INITIALIZER((void **)src)); + kasan_enable_current(); Hmm. I suspect this will explode quite badly on sane architectures like s390. (In my book, despite how weird s390 is, it has a vastly nicer model of "user" memory than any other architecture I know of...). I think you should use copy_to_user(), etc, instead. I'm not entirely sure what the best smp_store_release() replacement is. Making this change may also mean you can get rid of the kasan_disable_current(). If you make the MEMCPY one guarantee single-copy atomicity for native words then you're basically done. smp_store_release() can be implemented with: smp_mb(); WRITE_ONCE(); So if we make MEMCPY provide the WRITE_ONCE(), all we need is that barrier, which we can easily place at the call site and not overly complicate our interface with this. Ok, so the 3rd case (WR_RCU_ASSIGN_PTR) could be handled outside of this function. But, since now memcpy() will be replaced by copy_to_user(), can I assume that also copy_to_user() will be atomic, if the destination is properly aligned? On x86_64 it seems yes, however it's not clear to me if this is the outcome of an optimization or if I can expect it to be always true. -- igor
Re: [PATCH 2/6] __wr_after_init: write rare for static allocation
On 06/12/2018 06:44, Matthew Wilcox wrote: On Tue, Dec 04, 2018 at 02:18:01PM +0200, Igor Stoppa wrote: +void *__wr_op(unsigned long dst, unsigned long src, __kernel_size_t len, + enum wr_op_type op) +{ + temporary_mm_state_t prev; + unsigned long flags; + unsigned long offset; + unsigned long wr_poking_addr; + + /* Confirm that the writable mapping exists. */ + BUG_ON(!wr_ready); + + if (WARN_ONCE(op >= WR_OPS_NUMBER, "Invalid WR operation.") || + WARN_ONCE(!is_wr_after_init(dst, len), "Invalid WR range.")) + return (void *)dst; + + offset = dst - (unsigned long)&__start_wr_after_init; + wr_poking_addr = wr_poking_base + offset; + local_irq_save(flags); Why not local_irq_disable()? Do we have a use-case for wanting to access this from interrupt context? No, not that I can think of. It was "just in case", but I can remove it. + /* XXX make the verification optional? */ Well, yes. It seems like debug code to me. Ok, I was not sure about this, because text_poke() does it as part of its normal operations. + /* Randomize the poking address base*/ + wr_poking_base = TASK_UNMAPPED_BASE + + (kaslr_get_random_long("Write Rare Poking") & PAGE_MASK) % + (TASK_SIZE - (TASK_UNMAPPED_BASE + wr_range)); I don't think this is a great idea. We want to use the same mm for both static and dynamic wr memory, yes? So we should have enough space for all of ram, not splatter the static section all over the address space. On x86-64 (4 level page tables), we have a 64TB space for all of physmem and 128TB of user space, so we can place the base anywhere in a 64TB range. I was actually wondering about the dynamic part. It's still not clear to me if it's possible to write the code in a sufficiently generic way that it could work on all 64 bit architectures. I'll start with x86-64 as you suggest. -- igor
Re: [PATCH 2/6] __wr_after_init: write rare for static allocation
On 06/12/2018 01:13, Andy Lutomirski wrote: + kasan_disable_current(); + if (op == WR_MEMCPY) + memcpy((void *)wr_poking_addr, (void *)src, len); + else if (op == WR_MEMSET) + memset((u8 *)wr_poking_addr, (u8)src, len); + else if (op == WR_RCU_ASSIGN_PTR) + /* generic version of rcu_assign_pointer */ + smp_store_release((void **)wr_poking_addr, + RCU_INITIALIZER((void **)src)); + kasan_enable_current(); Hmm. I suspect this will explode quite badly on sane architectures like s390. (In my book, despite how weird s390 is, it has a vastly nicer model of "user" memory than any other architecture I know of...). I see. I can try to setup also a qemu target for s390, for my tests. There seems to be a Debian image, to have a fully bootable system. I think you should use copy_to_user(), etc, instead. I'm having troubles with the "etc" part: as far as I can see, there are both generic and specific support for both copying and clearing user-space memory from kernel, however I couldn't find something that looks like a memset_user(). I can of course roll my own, for example iterating copy_to_user() with the support of a pre-allocated static buffer (1 page should be enough). But, before I go down this path, I wanted to confirm that there's really nothing better that I could use. If that's really the case, the static buffer instance should be replicated for each core, I think, since each core could be performing its own memset_user() at the same time. Alternatively, I could do a loop of WRITE_ONCE(), however I'm not sure how that would work with (lack-of) alignment and might require also a preamble/epilogue to deal with unaligned data? I'm not entirely sure what the best smp_store_release() replacement is. Making this change may also mean you can get rid of the kasan_disable_current(). + + barrier(); /* XXX redundant? */ I think it's redundant. 
If unuse_temporary_mm() allows earlier stores to hit the wrong address space, then something is very very wrong, and something is also very very wrong if the optimizer starts moving stores across a function call that is most definitely a barrier. ok, thanks + + unuse_temporary_mm(prev); + /* XXX make the verification optional? */ + if (op == WR_MEMCPY) + BUG_ON(memcmp((void *)dst, (void *)src, len)); + else if (op == WR_MEMSET) + BUG_ON(memtst((void *)dst, (u8)src, len)); + else if (op == WR_RCU_ASSIGN_PTR) + BUG_ON(*(unsigned long *)dst != src); Hmm. If you allowed cmpxchg or even plain xchg, then these bug_ons would be thoroughly buggy, but maybe they're okay. But they should, at most, be WARN_ON_ONCE(), I have to confess that I do not understand why Nadav's patchset was required to use BUG_ON(), while here it's not correct, not even for memcopy or memset . Is it because it is single-threaded? Or is it because text_poke() is patching code, instead of data? I can turn to WARN_ON_ONCE(), but I'd like to understand the reason. given that you can trigger them by writing the same addresses from two threads at once, and this isn't even entirely obviously bogus given the presence of smp_store_release(). True, however would it be reasonable to require the use of an explicit writer lock, from the user? This operation is not exactly fast and should happen seldom; I'm not sure if it's worth supporting cmpxchg. The speedup would be minimal. I'd rather not implement the locking implicitly, even if it would be possible to detect simultaneous writes, because it might lead to overall inconsistent data. -- igor
[PATCH 6/6] __wr_after_init: lkdtm test
Verify that trying to modify a variable with the __wr_after_init modifier will cause a crash. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- drivers/misc/lkdtm/core.c | 3 +++ drivers/misc/lkdtm/lkdtm.h | 3 +++ drivers/misc/lkdtm/perms.c | 29 + 3 files changed, 35 insertions(+) diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c index 2837dc77478e..73c34b17c433 100644 --- a/drivers/misc/lkdtm/core.c +++ b/drivers/misc/lkdtm/core.c @@ -155,6 +155,9 @@ static const struct crashtype crashtypes[] = { CRASHTYPE(ACCESS_USERSPACE), CRASHTYPE(WRITE_RO), CRASHTYPE(WRITE_RO_AFTER_INIT), +#ifdef CONFIG_PRMEM + CRASHTYPE(WRITE_WR_AFTER_INIT), +#endif CRASHTYPE(WRITE_KERN), CRASHTYPE(REFCOUNT_INC_OVERFLOW), CRASHTYPE(REFCOUNT_ADD_OVERFLOW), diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h index 3c6fd327e166..abba2f52ffa6 100644 --- a/drivers/misc/lkdtm/lkdtm.h +++ b/drivers/misc/lkdtm/lkdtm.h @@ -38,6 +38,9 @@ void lkdtm_READ_BUDDY_AFTER_FREE(void); void __init lkdtm_perms_init(void); void lkdtm_WRITE_RO(void); void lkdtm_WRITE_RO_AFTER_INIT(void); +#ifdef CONFIG_PRMEM +void lkdtm_WRITE_WR_AFTER_INIT(void); +#endif void lkdtm_WRITE_KERN(void); void lkdtm_EXEC_DATA(void); void lkdtm_EXEC_STACK(void); diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c index 53b85c9d16b8..f681730aa652 100644 --- a/drivers/misc/lkdtm/perms.c +++ b/drivers/misc/lkdtm/perms.c @@ -9,6 +9,7 @@ #include #include #include +#include #include /* Whether or not to fill the target memory area with do_nothing(). */ @@ -27,6 +28,10 @@ static const unsigned long rodata = 0xAA55AA55; /* This is marked __ro_after_init, so it should ultimately be .rodata. 
*/ static unsigned long ro_after_init __ro_after_init = 0x55AA5500; +/* This is marked __wr_after_init, so it should be in .rodata. */ +static +unsigned long wr_after_init __wr_after_init = 0x55AA5500; + /* * This just returns to the caller. It is designed to be copied into * non-executable memory regions. */ @@ -104,6 +109,28 @@ void lkdtm_WRITE_RO_AFTER_INIT(void) *ptr ^= 0xabcd1234; } +#ifdef CONFIG_PRMEM + +void lkdtm_WRITE_WR_AFTER_INIT(void) +{ + unsigned long *ptr = &wr_after_init; + + /* +* Verify we were written to during init. Since an Oops +* is considered a "success", a failure is to just skip the +* real test. +*/ + if ((*ptr & 0xAA) != 0xAA) { + pr_info("%p was NOT written during init!?\n", ptr); + return; + } + + pr_info("attempting bad wr_after_init write at %p\n", ptr); + *ptr ^= 0xabcd1234; +} + +#endif + void lkdtm_WRITE_KERN(void) { size_t size; @@ -200,4 +227,6 @@ void __init lkdtm_perms_init(void) /* Make sure we can write to __ro_after_init values during __init */ ro_after_init |= 0xAA; + /* Make sure we can write to __wr_after_init during __init */ + wr_after_init |= 0xAA; } -- 2.19.1
[PATCH 2/6] __wr_after_init: write rare for static allocation
Implementation of write rare for statically allocated data, located in a specific memory section through the use of the __write_rare label. The basic functions are: - wr_memset(): write rare counterpart of memset() - wr_memcpy(): write rare counterpart of memcpy() - wr_assign(): write rare counterpart of the assignment ('=') operator - wr_rcu_assign_pointer(): write rare counterpart of rcu_assign_pointer() The implementation is based on code from Andy Lutomirski and Nadav Amit for patching the text on x86 [here goes reference to commits, once merged] The modification of write protected data is done through an alternate mapping of the same pages, as writable. This mapping is local to each core and is active only for the duration of each write operation. Local interrupts are disabled while the alternate mapping is active. In theory, this could introduce a non-predictable delay in a preemptible system; however, the amount of data to be altered is likely to be far smaller than a page. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- include/linux/prmem.h | 133 ++ init/main.c | 2 + mm/Kconfig| 4 ++ mm/Makefile | 1 + mm/prmem.c| 124 +++ 5 files changed, 264 insertions(+) create mode 100644 include/linux/prmem.h create mode 100644 mm/prmem.c diff --git a/include/linux/prmem.h b/include/linux/prmem.h new file mode 100644 index ..b0131c1f5dc0 --- /dev/null +++ b/include/linux/prmem.h @@ -0,0 +1,133 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * prmem.h: Header for memory protection library + * + * (C) Copyright 2018 Huawei Technologies Co. Ltd. 
+ * Author: Igor Stoppa + * + * Support for: + * - statically allocated write rare data + */ + +#ifndef _LINUX_PRMEM_H +#define _LINUX_PRMEM_H + +#include +#include +#include +#include +#include +#include +#include +#include + +/** + * memtst() - test n bytes of the source to match the c value + * @p: beginning of the memory to test + * @c: byte to compare against + * @len: amount of bytes to test + * + * Returns 0 on success, non-zero otherwise. + */ +static inline int memtst(void *p, int c, __kernel_size_t len) +{ + __kernel_size_t i; + + for (i = 0; i < len; i++) { + u8 d = *(i + (u8 *)p) - (u8)c; + + if (unlikely(d)) + return d; + } + return 0; +} + + +#ifndef CONFIG_PRMEM + +static inline void *wr_memset(void *p, int c, __kernel_size_t len) +{ + return memset(p, c, len); +} + +static inline void *wr_memcpy(void *p, const void *q, __kernel_size_t size) +{ + return memcpy(p, q, size); +} + +#define wr_assign(var, val)((var) = (val)) + +#define wr_rcu_assign_pointer(p, v)\ + rcu_assign_pointer(p, v) + +#else + +enum wr_op_type { + WR_MEMCPY, + WR_MEMSET, + WR_RCU_ASSIGN_PTR, + WR_OPS_NUMBER, +}; + +void *__wr_op(unsigned long dst, unsigned long src, __kernel_size_t len, + enum wr_op_type op); + +/** + * wr_memset() - sets n bytes of the destination to the c value + * @p: beginning of the memory to write to + * @c: byte to replicate + * @len: amount of bytes to copy + * + * Returns pointer to the destination. 
+ */ +static inline void *wr_memset(void *p, int c, __kernel_size_t len) +{ + return __wr_op((unsigned long)p, (unsigned long)c, len, WR_MEMSET); +} + +/** + * wr_memcpy() - copies n bytes from source to destination + * @p: beginning of the memory to write to + * @q: beginning of the memory to read from + * @size: amount of bytes to copy + * + * Returns pointer to the destination + */ +static inline void *wr_memcpy(void *p, const void *q, __kernel_size_t size) +{ + return __wr_op((unsigned long)p, (unsigned long)q, size, WR_MEMCPY); +} + +/** + * wr_assign() - sets a write-rare variable to a specified value + * @var: the variable to set + * @val: the new value + * + * Returns: the variable + * + * Note: it might be possible to optimize this, to use wr_memset in some + * cases (maybe with NULL?). + */ + +#define wr_assign(var, val) ({ \ + typeof(var) tmp = (typeof(var))val; \ + \ + wr_memcpy(&var, &tmp, sizeof(var)); \ + var;\ +}) + +/** + * wr_rcu_assign_pointer() - initialize a pointer in rcu mode + * @p: the rcu pointer + * @v: the new value + * + * Returns the value assigned to the rcu pointer. + * + * It is provided as macro, to match rcu_assign_pointer() + */ +#define wr_rcu_assign_pointer(p, v) ({ \ + __wr_op((unsigned long)&p, v, sizeof(p), WR_RCU_ASSIGN_PTR);\ + p; \ +}) + +#endif +#endif
[PATCH 4/6] rodata_test: add verification for __wr_after_init
The write protection of the __wr_after_init data can be verified with the same methodology used for const data. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- mm/rodata_test.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/mm/rodata_test.c b/mm/rodata_test.c index 3c1e515ca9b1..a98d088ad9cc 100644 --- a/mm/rodata_test.c +++ b/mm/rodata_test.c @@ -16,7 +16,19 @@ #define INIT_TEST_VAL 0xC3 +/* + * Note: __ro_after_init data is, for every practical effect, equivalent to + * const data, since they are even write protected at the same time; there + * is no need for separate testing. + * __wr_after_init data, otoh, is altered also after the write protection + * takes place and it cannot be exploitable for altering more permanent + * data. + */ + static const int rodata_test_data = INIT_TEST_VAL; +static int wr_after_init_test_data __wr_after_init = INIT_TEST_VAL; +extern long __start_wr_after_init; +extern long __end_wr_after_init; static bool test_data(char *data_type, const int *data, unsigned long start, unsigned long end) @@ -60,6 +72,9 @@ void rodata_test(void) { if (test_data("rodata", &rodata_test_data, (unsigned long)&__start_rodata, - (unsigned long)&__end_rodata)) + (unsigned long)&__end_rodata) && + test_data("wr after init data", &wr_after_init_test_data, + (unsigned long)&__start_wr_after_init, + (unsigned long)&__end_wr_after_init)) pr_info("all tests were successful\n"); } -- 2.19.1
[PATCH 5/6] __wr_after_init: test write rare functionality
Set of test cases meant to confirm that the write rare functionality works as expected. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- include/linux/prmem.h | 7 ++- mm/Kconfig.debug | 9 +++ mm/Makefile | 1 + mm/test_write_rare.c | 135 ++ 4 files changed, 149 insertions(+), 3 deletions(-) create mode 100644 mm/test_write_rare.c diff --git a/include/linux/prmem.h b/include/linux/prmem.h index b0131c1f5dc0..d2492ec24c8c 100644 --- a/include/linux/prmem.h +++ b/include/linux/prmem.h @@ -125,9 +125,10 @@ static inline void *wr_memcpy(void *p, const void *q, __kernel_size_t size) * * It is provided as macro, to match rcu_assign_pointer() */ -#define wr_rcu_assign_pointer(p, v) ({ \ - __wr_op((unsigned long)&p, v, sizeof(p), WR_RCU_ASSIGN_PTR);\ - p; \ +#define wr_rcu_assign_pointer(p, v) ({ \ + __wr_op((unsigned long)&p, (unsigned long)v, sizeof(p), \ + WR_RCU_ASSIGN_PTR); \ + p; \ }) #endif #endif diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug index 9a7b8b049d04..a26ecbd27aea 100644 --- a/mm/Kconfig.debug +++ b/mm/Kconfig.debug @@ -94,3 +94,12 @@ config DEBUG_RODATA_TEST depends on STRICT_KERNEL_RWX ---help--- This option enables a testcase for the setting rodata read-only. + +config DEBUG_PRMEM_TEST +tristate "Run self test for statically allocated protected memory" +depends on STRICT_KERNEL_RWX +select PRMEM +default n +help + Tries to verify that the protection for statically allocated memory + works correctly and that the memory is effectively protected. 
diff --git a/mm/Makefile b/mm/Makefile index ef3867c16ce0..8de1d468f4e7 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -59,6 +59,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o obj-$(CONFIG_SLOB) += slob.o obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o obj-$(CONFIG_PRMEM) += prmem.o +obj-$(CONFIG_DEBUG_PRMEM_TEST) += test_write_rare.o obj-$(CONFIG_KSM) += ksm.o obj-$(CONFIG_PAGE_POISONING) += page_poison.o obj-$(CONFIG_SLAB) += slab.o diff --git a/mm/test_write_rare.c b/mm/test_write_rare.c new file mode 100644 index ..240cc43793d1 --- /dev/null +++ b/mm/test_write_rare.c @@ -0,0 +1,135 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * test_write_rare.c + * + * (C) Copyright 2018 Huawei Technologies Co. Ltd. + * Author: Igor Stoppa + */ + +#include +#include +#include +#include +#include + +#ifdef pr_fmt +#undef pr_fmt
 +#endif + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +extern long __start_wr_after_init; +extern long __end_wr_after_init; + +static __wr_after_init int scalar = '0'; +static __wr_after_init u8 array[PAGE_SIZE * 3] __aligned(PAGE_SIZE); + +/* The section must occupy a non-zero number of whole pages */ +static bool test_alignment(void) +{ + unsigned long pstart = (unsigned long)&__start_wr_after_init; + unsigned long pend = (unsigned long)&__end_wr_after_init; + + if (WARN((pstart & ~PAGE_MASK) || (pend & ~PAGE_MASK) || +(pstart >= pend), "Boundaries test failed.")) + return false; + pr_info("Boundaries test passed."); + return true; +} + +static inline bool test_pattern(void) +{ + return (memtst(array, '0', PAGE_SIZE / 2) || + memtst(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 3 / 4) || + memtst(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2) || + memtst(array + PAGE_SIZE * 7 / 4, '1', PAGE_SIZE * 3 / 4) || + memtst(array + PAGE_SIZE * 5 / 2, '0', PAGE_SIZE / 2)); +} + +static bool test_wr_memset(void) +{ + int new_val = '1'; + + wr_memset(&scalar, new_val, sizeof(scalar)); + if (WARN(memtst(&scalar, new_val, sizeof(scalar)), +"Scalar write rare memset test 
failed.")) + return false; + + pr_info("Scalar write rare memset test passed."); + + wr_memset(array, '0', PAGE_SIZE * 3); + if (WARN(memtst(array, '0', PAGE_SIZE * 3), +"Array write rare memset test failed.")) + return false; + + wr_memset(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 2); + if (WARN(memtst(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 2), +"Array write rare memset test failed.")) + return false; + + wr_memset(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2); + if (WARN(memtst(array + PAGE_SIZE * 5 / 4, '0', PAG
[RFC v1 PATCH 0/6] hardening: statically allocated protected memory
This patch-set is the first-cut implementation of write-rare memory protection, as previously agreed [1]. Its purpose is to keep write protected kernel data which is seldom modified. There is no read overhead, however writing requires special operations that are probably unsuitable for often-changing data. The use is opt-in, by applying the modifier __wr_after_init to a variable declaration. As the name implies, the write protection kicks in only after init() is completed; before that moment, the data is modifiable in the usual way. Current Limitations: * supports only data which is allocated statically, at build time. * supports only x86_64 * might not work for very large amounts of data, since it relies on the assumption that said data can be entirely remapped, at init. Some notes: - even if the code is only for x86_64, it is placed in the generic locations, with the intention of extending it also to arm64 - the current section used for collecting wr-after-init data might need to be moved, to work with arm64 MMU - the functionality is in its own c and h files, for now, to ease the introduction (and refactoring) of code dealing with dynamic allocation - recently some updated patches were posted for live-patch on arm64 [2], they might help with adding arm64 support here - to avoid the risk of weakening __ro_after_init, __wr_after_init data is in a separate set of pages, and any invocation will confirm that the memory affected falls within this range. I have modified rodata_test accordingly, to check also this case. - to avoid replicating the code which does the change of mapping, there is only one function performing multiple, selectable, operations, such as memcpy(), memset(). I have added also rcu_assign_pointer() as a further example. But I'm not too fond of this implementation either. I just couldn't think of any that I would like significantly better. 
- I have left out the patchset from Nadav that these patches depend on, but it can be found here [3] (Should I have resubmitted it?) - I am not sure what is the correct form for giving proper credit wrt the authoring of the wr_after_init mechanism, guidance would be appreciated - In an attempt to spam fewer people, I have curbed the list of recipients. If I have omitted someone who should have been kept/added, please add them to the thread. [1] https://www.openwall.com/lists/kernel-hardening/2018/11/22/8 [2] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1793199.html [3] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1810245.html Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org Igor Stoppa (6): [PATCH 1/6] __wr_after_init: linker section and label [PATCH 2/6] __wr_after_init: write rare for static allocation [PATCH 3/6] rodata_test: refactor tests [PATCH 4/6] rodata_test: add verification for __wr_after_init [PATCH 5/6] __wr_after_init: test write rare functionality [PATCH 6/6] __wr_after_init: lkdtm test drivers/misc/lkdtm/core.c | 3 + drivers/misc/lkdtm/lkdtm.h| 3 + drivers/misc/lkdtm/perms.c| 29 include/asm-generic/vmlinux.lds.h | 20 ++ include/linux/cache.h | 17 + include/linux/prmem.h | 134 + init/main.c | 2 + mm/Kconfig| 4 ++ mm/Kconfig.debug | 9 +++ mm/Makefile | 2 + mm/prmem.c| 124 ++ mm/rodata_test.c | 63 -- mm/test_write_rare.c | 135 ++ 13 files changed, 525 insertions(+), 20 deletions(-)
[PATCH 1/6] __wr_after_init: linker section and label
Introduce a section and a label for statically allocated write rare data. The label is named "__wr_after_init". As the name implies, after the init phase is completed, this section will be modifiable only by invoking write rare functions. The section must take up a set of full pages. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- include/asm-generic/vmlinux.lds.h | 20 include/linux/cache.h | 17 + 2 files changed, 37 insertions(+) diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 3d7a6a9c2370..b711dbe6999f 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -311,6 +311,25 @@ KEEP(*(__jump_table)) \ __stop___jump_table = .; +/* + * Allow architectures to handle wr_after_init data on their + * own by defining an empty WR_AFTER_INIT_DATA. + * However, it's important that pages containing WR_RARE data do not + * hold anything else, to avoid both accidentally unprotecting something + * that is supposed to stay read-only all the time and also to protect + * something else that is supposed to be writeable all the time. + */ +#ifndef WR_AFTER_INIT_DATA +#define WR_AFTER_INIT_DATA(align) \ + . = ALIGN(PAGE_SIZE); \ + __start_wr_after_init = .; \ + . = ALIGN(align); \ + *(.data..wr_after_init) \ + . = ALIGN(PAGE_SIZE); \ + __end_wr_after_init = .;\ + . = ALIGN(align); +#endif + /* * Allow architectures to handle ro_after_init data on their * own by defining an empty RO_AFTER_INIT_DATA. @@ -332,6 +351,7 @@ __start_rodata = .; \ *(.rodata) *(.rodata.*) \ RO_AFTER_INIT_DATA /* Read only after init */ \ + WR_AFTER_INIT_DATA(align) /* wr after init */ \ KEEP(*(__vermagic)) /* Kernel version magic */ \ . 
= ALIGN(8); \ __start___tracepoints_ptrs = .; \ diff --git a/include/linux/cache.h b/include/linux/cache.h index 750621e41d1c..9a7e7134b887 100644 --- a/include/linux/cache.h +++ b/include/linux/cache.h @@ -31,6 +31,23 @@ #define __ro_after_init __attribute__((__section__(".data..ro_after_init"))) #endif +/* + * __wr_after_init is used to mark objects that cannot be modified + * directly after init (i.e. after mark_rodata_ro() has been called). + * These objects become effectively read-only, from the perspective of + * performing a direct write, like a variable assignment. + * However, they can be altered through a dedicated function. + * It is intended for those objects which are occasionally modified after + * init, however they are modified so seldom that the extra cost from + * the indirect modification is either negligible or worth paying, for the + * sake of the protection gained. + */ +#ifndef __wr_after_init +#define __wr_after_init \ + __attribute__((__section__(".data..wr_after_init"))) +#endif + + #ifndef cacheline_aligned #define cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES))) #endif -- 2.19.1
[PATCH 3/6] rodata_test: refactor tests
Refactor the test cases, in preparation for using them also for testing __wr_after_init memory. Signed-off-by: Igor Stoppa CC: Andy Lutomirski CC: Nadav Amit CC: Matthew Wilcox CC: Peter Zijlstra CC: Kees Cook CC: Dave Hansen CC: linux-integr...@vger.kernel.org CC: kernel-harden...@lists.openwall.com CC: linux...@kvack.org CC: linux-kernel@vger.kernel.org --- mm/rodata_test.c | 48 1 file changed, 28 insertions(+), 20 deletions(-) diff --git a/mm/rodata_test.c b/mm/rodata_test.c index d908c8769b48..3c1e515ca9b1 100644 --- a/mm/rodata_test.c +++ b/mm/rodata_test.c @@ -14,44 +14,52 @@ #include #include -static const int rodata_test_data = 0xC3; +#define INIT_TEST_VAL 0xC3 -void rodata_test(void) +static const int rodata_test_data = INIT_TEST_VAL; + +static bool test_data(char *data_type, const int *data, + unsigned long start, unsigned long end) { - unsigned long start, end; int zero = 0; /* test 1: read the value */ /* If this test fails, some previous testrun has clobbered the state */ - if (!rodata_test_data) { - pr_err("test 1 fails (start data)\n"); - return; + if (*data != INIT_TEST_VAL) { - pr_err("%s: test 1 fails (init data value)\n", data_type); + return false; } /* test 2: write to the variable; this should fault */ - if (!probe_kernel_write((void *)&rodata_test_data, - (void *)&zero, sizeof(zero))) { - pr_err("test data was not read only\n"); - return; + if (!probe_kernel_write((void *)data, (void *)&zero, sizeof(zero))) { + pr_err("%s: test data was not read only\n", data_type); + return false; } /* test 3: check the value hasn't changed */ - if (rodata_test_data == zero) { - pr_err("test data was changed\n"); - return; + if (*data != INIT_TEST_VAL) { + pr_err("%s: test data was changed\n", data_type); + return false; } /* test 4: check if the rodata section is PAGE_SIZE aligned */ - start = (unsigned long)__start_rodata; - end = (unsigned long)__end_rodata; if (start & (PAGE_SIZE - 1)) { - pr_err("start of .rodata is not page size aligned\n"); - return; + pr_err("%s: 
start of data is not page size aligned\n", + data_type); + return false; } if (end & (PAGE_SIZE - 1)) { - pr_err("end of .rodata is not page size aligned\n"); - return; + pr_err("%s: end of data is not page size aligned\n", + data_type); + return false; } + return true; +} - pr_info("all tests were successful\n"); +void rodata_test(void) +{ + if (test_data("rodata", &rodata_test_data, + (unsigned long)&__start_rodata, + (unsigned long)&__end_rodata)) + pr_info("all tests were successful\n"); } -- 2.19.1
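Patch 4/6 can then pass a __wr_after_init variable and the new section boundaries to the same helper. A sketch of what that caller could look like (the wr_test_data variable name is an assumption; the __start/__end_wr_after_init labels come from patch 1/6):

```c
/* sketch: reusing test_data() for the write-rare section */
static int wr_test_data __wr_after_init = INIT_TEST_VAL;

void rodata_test(void)
{
	if (test_data("rodata", &rodata_test_data,
		      (unsigned long)&__start_rodata,
		      (unsigned long)&__end_rodata) &&
	    test_data("wr after init", &wr_test_data,
		      (unsigned long)&__start_wr_after_init,
		      (unsigned long)&__end_wr_after_init))
		pr_info("all tests were successful\n");
}
```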
Re: [PATCH 10/17] prmem: documentation
Hi, On 13/11/2018 20:36, Andy Lutomirski wrote: On Tue, Nov 13, 2018 at 10:33 AM Igor Stoppa wrote: I forgot one sentence :-( On 13/11/2018 20:31, Igor Stoppa wrote: On 13/11/2018 19:47, Andy Lutomirski wrote: For general rare-writish stuff, I don't think we want IRQs running with them mapped anywhere for write. For AVC and IMA, I'm less sure. Why would these be less sensitive? But I see a big difference between my initial implementation and this one. In my case, by using a shared mapping, visible to all cores, freezing the core that is performing the write would have exposed the writable mapping to a potential attack run from another core. If the mapping is private to the core performing the write, even if it is frozen, it's much harder to figure out what it had mapped and where, from another core. To access that mapping, the attack should be performed from the ISR, I think. Unless the secondary mapping is also available to other cores, through the shared mm_struct ? I don't think this matters much. The other cores will only be able to use that mapping when they're doing a rare write. I'm still mulling over this. There might be other reasons for replicating the mm_struct. If I understand correctly how the text patching works, it happens sequentially, because of the text_mutex used by arch_jump_label_transform Which might be fine for this specific case, but I think I shouldn't introduce a global mutex, when it comes to data. Most likely, if two or more cores want to perform a write rare operation, there is no correlation between them, they could proceed in parallel. And if there really is, then the user of the API should introduce its own locking, for that specific case. A somewhat unrelated question, related to text patching: I see that each patching operation is validated, but wouldn't it be more robust to first validate all of them, and only after they are all found to be compliant, to proceed with the actual modifications? 
And about the actual implementation of the write rare for the statically allocated variables, is it expected that I use Nadav's function? Or that I refactor the code? The name, referring to text, would definitely not be ok for data. And I would have to also generalize it, to deal with larger amounts of data. I would find it easier, as a first cut, to replicate its behavior and refactor only later, once it has stabilized and possibly Nadav's patches have been acked. -- igor
Re: [PATCH 10/17] prmem: documentation
On 13/11/2018 19:16, Andy Lutomirski wrote: > On Tue, Nov 13, 2018 at 6:25 AM Igor Stoppa wrote: [...] >> How about having one mm_struct for each writer (core or thread)? >> > > I don't think that helps anything. I think the mm_struct used for > prmem (or rare_write or whatever you want to call it) write_rare / rarely can be shortened to wr_ which is kinda less confusing than rare_write, since it would become rw_ and easier to confuse with R/W Any advice for better naming is welcome. > should be > entirely abstracted away by an appropriate API, so neither SELinux nor > IMA need to be aware that there's an mm_struct involved. Yes, that is fine. In my proposal I was thinking about tying it to the core/thread that performs the actual write. The high level API could be something like: wr_memcpy(void *src, void *dst, uint_t size) > It's also > entirely possible that some architectures won't even use an mm_struct > behind the scenes -- x86, for example, could have avoided it if there > were a kernel equivalent of PKRU. Sadly, there isn't. The mm_struct - or whatever is the means to do the write on that architecture - can be kept hidden from the API. But the reason why I was proposing to have one mm_struct per writer is that, iiuc, the secondary mapping is created in the secondary mm_struct for each writer using it. So the updating of IMA measurements would have, theoretically, also write access to the SELinux AVC. Which I was trying to avoid. And similarly any other write rare updater. Is this correct? >> 2) Iiuc, the purpose of the 2 pages being remapped is that the target of >> the patch might spill across the page boundary, however if I deal with >> the modification of generic data, I shouldn't (shouldn't I?) assume that >> the data will not span across multiple pages. > > The reason for the particular architecture of text_poke() is to avoid > memory allocation to get it working. 
I think that prmem/rare_write > should have each rare-writable kernel address map to a unique user > address, possibly just by offsetting everything by a constant. For > rare_write, you don't actually need it to work as such until fairly > late in boot, since the rare_writable data will just be writable early > on. Yes, that is true. I think it's safe to assume, from an attack pattern, that as long as user space is not started, the system can be considered ok. Even user-space code run from initrd should be ok, since it can be bundled (and signed) as a single binary with the kernel. Modules loaded from a regular filesystem are a bit more risky, because an attack might inject a rogue key in the key-ring and use it to load malicious modules. >> If the data spans across multiple pages, in unknown amount, I suppose >> that I should not keep interrupts disabled for an unknown time, as it >> would hurt preemption. >> >> What I thought, in my initial patch-set, was to iterate over each page >> that must be written to, in a loop, re-enabling interrupts in-between >> iterations, to give pending interrupts a chance to be served. >> >> This would mean that the data being written to would not be consistent, >> but it's a problem that would have to be addressed anyways, since it can >> be still read by other cores, while the write is ongoing. > > This probably makes sense, except that enabling and disabling > interrupts means you also need to restore the original mm_struct (most > likely), which is slow. I don't think there's a generic way to check > whether an interrupt is pending without turning interrupts on. The only "excuse" I have is that write_rare is opt-in and is "rare". Maybe the enabling/disabling of interrupts - and the consequent switch of mm_struct - could be somehow tied to the latency configuration? If preemption is disabled, the expectations on the system latency are anyway more relaxed. But I'm not sure how it would work against I/O. -- igor
Re: [PATCH 10/17] prmem: documentation
On 01/11/2018 01:19, Andy Lutomirski wrote: ISTM you don't need that atomic operation -- you could take a spinlock and then just add one directly to the variable. It was my intention to provide a 1:1 conversion of existing code, as it should be easier to verify the correctness of the conversion, as long as there isn't any significant degradation in performance. The rework could be done afterward. -- igor
Re: [PATCH 10/17] prmem: documentation
On 30/10/2018 23:02, Andy Lutomirski wrote: On Oct 30, 2018, at 1:43 PM, Igor Stoppa wrote: There is no need to process each of these tens of thousands of allocations and initializations as write-rare. Would it be possible to do the same here? I don’t see why not, although getting the API right will be a tad complicated. yes, I have some first-hand experience with this :-/ To subsequently modify q, p = rare_modify(q); q->a = y; Do you mean p->a = y; here? I assume the intent is that q isn't writable ever, but that's the one we have in the structure at rest. Yes, that was my intent, thanks. To handle the list case that Igor has pointed out, you might want to do something like this: list_for_each_entry(x, , entry) { struct foo *writable = rare_modify(entry); Would this mapping be impossible to spoof by other cores? Indeed. Only the core with the special mm loaded could see it. But I dislike allowing regular writes in the protected region. We really only need four write primitives: 1. Just write one value. Call at any time (except NMI). 2. Just copy some bytes. Same as (1) but any number of bytes. 3,4: Same as 1 and 2 but must be called inside a special rare write region. This is purely an optimization. Atomic? RCU? Yes, they are technically just memory writes, but shouldn't the "normal" operation be executed on the writable mapping? It is possible to sandwich any call between a rare_modify/rare_protect, however isn't that pretty close to having a write-rare version of these plain functions? -- igor
Build error in drivers/cpufreq/intel_pstate.c
Hi, I'm getting the following build error: /home/igor/dev/kernel/linux/drivers/cpufreq/intel_pstate.c: In function ‘show_base_frequency’: /home/igor/dev/kernel/linux/drivers/cpufreq/intel_pstate.c:726:10: error: implicit declaration of function ‘intel_pstate_get_cppc_guranteed’; did you mean ‘intel_pstate_get_epp’? [-Werror=implicit-function-declaration] ratio = intel_pstate_get_cppc_guranteed(policy->cpu); ^~~ intel_pstate_get_epp on top of: commit 11743c56785c751c087eecdb98713eef796609e0 Merge: 929e134c43c9 928002a5e9da -- igor
Re: [PATCH 16/17] prmem: pratomic-long
On 25/10/2018 01:13, Peter Zijlstra wrote: On Wed, Oct 24, 2018 at 12:35:03AM +0300, Igor Stoppa wrote: +static __always_inline +bool __pratomic_long_op(bool inc, struct pratomic_long_t *l) +{ + struct page *page; + uintptr_t base; + uintptr_t offset; + unsigned long flags; + size_t size = sizeof(*l); + bool is_virt = __is_wr_after_init(l, size); + + if (WARN(!(is_virt || likely(__is_wr_pool(l, size))), +WR_ERR_RANGE_MSG)) + return false; + local_irq_save(flags); + if (is_virt) + page = virt_to_page(l); + else + page = vmalloc_to_page(l); + offset = (~PAGE_MASK) & (uintptr_t)l; + base = (uintptr_t)vmap(&page, 1, VM_MAP, PAGE_KERNEL); + if (WARN(!base, WR_ERR_PAGE_MSG)) { + local_irq_restore(flags); + return false; + } + if (inc) + atomic_long_inc((atomic_long_t *)(base + offset)); + else + atomic_long_dec((atomic_long_t *)(base + offset)); + vunmap((void *)base); + local_irq_restore(flags); + return true; + +} That's just hideously nasty.. and horribly broken. We're not going to duplicate all these kernel interfaces wrapped in gunk like that. One possibility would be to have macros which use typeof() on the parameter being passed, to decide what implementation to use: regular or write-rare. This means that type punning would still be needed, to select the implementation. Would this be enough? Is there some better way? Also, you _cannot_ call vunmap() with IRQs disabled. Clearly you've never tested this with debug bits enabled. I thought I had them. And I _did_ have them enabled, at some point. But I must have messed up with the configuration and I failed to notice this. I can think of a way it might work, albeit it's not going to be very pretty: * for the vmap(): if I understand correctly, it might sleep while obtaining memory for creating the mapping. This part could be executed before disabling interrupts. The rest of the function, instead, would be executed after interrupts are disabled. 
* for vunmap(): after the writing is done, switch the alternate mapping back to read only, then enable interrupts and destroy the alternate mapping. Making the secondary mapping read only as well makes it as secure as the primary, which means that it can remain visible even with interrupts enabled. -- igor
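The reordering proposed above could look roughly like the following kernel-style pseudocode (untested; set_alias_ro() is a stand-in for whatever primitive would flip the alias mapping back to read-only before IRQs are re-enabled):

```c
/* sketch: keep the potentially sleeping vmap()/vunmap() outside the
 * IRQ-off window, per the two bullet points above */
bool __pratomic_long_op(bool inc, struct pratomic_long_t *l)
{
	struct page *page = is_vmalloc_addr(l) ? vmalloc_to_page(l)
					       : virt_to_page(l);
	uintptr_t offset = (~PAGE_MASK) & (uintptr_t)l;
	/* may sleep: do it before disabling interrupts */
	void *base = vmap(&page, 1, VM_MAP, PAGE_KERNEL);
	unsigned long flags;

	if (WARN(!base, WR_ERR_PAGE_MSG))
		return false;
	local_irq_save(flags);
	if (inc)
		atomic_long_inc((atomic_long_t *)(base + offset));
	else
		atomic_long_dec((atomic_long_t *)(base + offset));
	set_alias_ro(base);	/* hypothetical: alias becomes as safe as the primary */
	local_irq_restore(flags);
	vunmap(base);		/* may sleep: only after IRQs are back on */
	return true;
}
```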
Re: [PATCH 02/17] prmem: write rare for static allocation
On 26/10/2018 10:41, Peter Zijlstra wrote: On Wed, Oct 24, 2018 at 12:34:49AM +0300, Igor Stoppa wrote: +static __always_inline That's far too large for inline. The reason for it is that it's supposed to minimize the presence of gadgets that might be used in JOP attacks. I am ready to stand corrected, if I'm wrong, but this is the reason why I did it. Regarding the function being too large, yes, I would not normally choose it for inlining. Actually, I would not normally use "__always_inline" and instead I would limit myself to plain "inline", at most. +bool wr_memset(const void *dst, const int c, size_t n_bytes) +{ + size_t size; + unsigned long flags; + uintptr_t d = (uintptr_t)dst; + + if (WARN(!__is_wr_after_init(dst, n_bytes), WR_ERR_RANGE_MSG)) + return false; + while (n_bytes) { + struct page *page; + uintptr_t base; + uintptr_t offset; + uintptr_t offset_complement; + + local_irq_save(flags); + page = virt_to_page(d); + offset = d & ~PAGE_MASK; + offset_complement = PAGE_SIZE - offset; + size = min(n_bytes, offset_complement); + base = (uintptr_t)vmap(&page, 1, VM_MAP, PAGE_KERNEL); + if (WARN(!base, WR_ERR_PAGE_MSG)) { + local_irq_restore(flags); + return false; + } + memset((void *)(base + offset), c, size); + vunmap((void *)base); BUG yes, somehow I managed to drop this debug configuration from the debug builds I made. [...] Also, I see an amount of duplication here that shows you're not nearly lazy enough. I did notice a certain amount of duplication, but I didn't know how to exploit it. -- igor