Re: [RFC PATCH v5 03/12] __wr_after_init: Core and default arch

2019-02-16 Thread Igor Stoppa




On 15/02/2019 10:57, Peter Zijlstra wrote:


Where are the comments and Changelog notes ? How is an arch maintainer
to be aware of this requirement when adding support for his/her arch?


Yes, it will be fixed in the next revision. I've added a comment to the 
core wr_assign() function and also to the changelogs for the patches 
enabling it on x86_64 and arm64, respectively.


Should I also add a mention of it in the documentation?

--
igor



Re: [RFC PATCH v5 03/12] __wr_after_init: Core and default arch

2019-02-14 Thread Igor Stoppa




On 14/02/2019 13:28, Peter Zijlstra wrote:

On Thu, Feb 14, 2019 at 12:41:32AM +0200, Igor Stoppa wrote:


[...]


+#define wr_rcu_assign_pointer(p, v) ({ \
+   smp_mb();   \
+   wr_assign(p, v);\
+   p;  \
+})


This requires that wr_memcpy() (through wr_assign) is single-copy-atomic
for native types. There is not a comment in sight that states this.


Right, I kinda expected native-aligned <-> atomic, but it's not 
necessarily true. It should be confirmed when enabling write rare on a 
new architecture. I'll add the comment.
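
For illustration, the requirement could also be spelled out (and cheaply
checked) next to wr_assign(), along these lines (a sketch only, not part of
the series; the macro name is made up):

/*
 * wr_assign()/wr_rcu_assign_pointer() rely on wr_memcpy() being
 * single-copy atomic for native, naturally aligned types.
 * Documenting (and catching violations of) the alignment part of that
 * assumption could look like this:
 */
#define wr_check_native_alignment(dst) \
	WARN_ONCE((unsigned long)&(dst) & (sizeof(dst) - 1), \
		  "wr_assign() on a misaligned native-sized object\n")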



Also, is this true of x86/arm64 memcpy ?



For x86_64:
https://elixir.bootlin.com/linux/v5.0-rc6/source/arch/x86/include/asm/uaccess.h#L462
the mov"itype" part should deal with atomic copy of native, aligned types.



For arm64:
https://elixir.bootlin.com/linux/v5.0-rc6/source/arch/arm64/lib/copy_template.S#L110 
.Ltiny15 deals with copying less than 16 bytes, which includes pointers. 
When the data is aligned, the copy of a pointer should be atomic.



--
igor


[RFC PATCH v5 05/12] __wr_after_init: x86_64: enable

2019-02-13 Thread Igor Stoppa
Set ARCH_HAS_PRMEM to Y for x86_64

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 68261430fe6e..7392b53b12c2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -32,6 +32,7 @@ config X86_64
select SWIOTLB
select X86_DEV_DMA_OPS
select ARCH_HAS_SYSCALL_WRAPPER
+   select ARCH_HAS_PRMEM
 
 #
 # Arch settings
-- 
2.19.1



[RFC PATCH v5 10/12] __wr_after_init: rodata_test: test __wr_after_init

2019-02-13 Thread Igor Stoppa
The write protection of the __wr_after_init data can be verified with the
same methodology used for const data.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 mm/rodata_test.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/mm/rodata_test.c b/mm/rodata_test.c
index e1349520b436..a669cf9f5a61 100644
--- a/mm/rodata_test.c
+++ b/mm/rodata_test.c
@@ -16,8 +16,23 @@
 
 #define INIT_TEST_VAL 0xC3
 
+/*
+ * Note: __ro_after_init data is, for every practical purpose, equivalent to
+ * const data, since both are write protected at the same time; there
+ * is no need for separate testing.
+ * __wr_after_init data, on the other hand, is altered also after the write
+ * protection takes place, and it cannot be exploited for altering more
+ * permanent data.
+ */
+
 static const int rodata_test_data = INIT_TEST_VAL;
 
+#ifdef CONFIG_PRMEM
+static int wr_after_init_test_data __wr_after_init = INIT_TEST_VAL;
+extern long __start_wr_after_init;
+extern long __end_wr_after_init;
+#endif
+
 static bool test_data(char *data_type, const int *data,
  unsigned long start, unsigned long end)
 {
@@ -59,7 +74,13 @@ static bool test_data(char *data_type, const int *data,
 
 void rodata_test(void)
 {
-   test_data("rodata", _test_data,
- (unsigned long)&__start_rodata,
- (unsigned long)&__end_rodata);
+   if (!test_data("rodata", &rodata_test_data,
+  (unsigned long)&__start_rodata,
+  (unsigned long)&__end_rodata))
+   return;
+#ifdef CONFIG_PRMEM
+   test_data("wr after init data", _after_init_test_data,
+ (unsigned long)&__start_wr_after_init,
+ (unsigned long)&__end_wr_after_init);
+#endif
 }
-- 
2.19.1



[RFC PATCH v5 11/12] __wr_after_init: test write rare functionality

2019-02-13 Thread Igor Stoppa
Set of test cases meant to confirm that the write rare functionality
works as expected.
It can be optionally compiled as a module.
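
When built as a module, loading it (e.g. with modprobe test_write_rare) is
expected to run the checks at init time and report the outcome through the
pr_info()/WARN() calls visible below; when built in, the same checks run
during boot.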

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 mm/Kconfig.debug   |   8 +++
 mm/Makefile|   1 +
 mm/test_write_rare.c (new) | 142 +++
 3 files changed, 151 insertions(+)

diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index 9a7b8b049d04..a62c31901fea 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -94,3 +94,11 @@ config DEBUG_RODATA_TEST
 depends on STRICT_KERNEL_RWX
 ---help---
   This option enables a testcase for the setting rodata read-only.
+
+config DEBUG_PRMEM_TEST
+tristate "Run self test for statically allocated protected memory"
+depends on PRMEM
+default n
+help
+  Tries to verify that the protection for statically allocated memory
+  works correctly and that the memory is effectively protected.
diff --git a/mm/Makefile b/mm/Makefile
index ef3867c16ce0..8de1d468f4e7 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 obj-$(CONFIG_PRMEM) += prmem.o
+obj-$(CONFIG_DEBUG_PRMEM_TEST) += test_write_rare.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/test_write_rare.c b/mm/test_write_rare.c
new file mode 100644
index ..e9ebc8e12041
--- /dev/null
+++ b/mm/test_write_rare.c
@@ -0,0 +1,142 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * test_write_rare.c
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+extern long __start_wr_after_init;
+extern long __end_wr_after_init;
+
+static __wr_after_init int scalar = '0';
+static __wr_after_init u8 array[PAGE_SIZE * 3] __aligned(PAGE_SIZE);
+
+/* The section must occupy a non-zero number of whole pages */
+static bool test_alignment(void)
+{
+   unsigned long pstart = (unsigned long)&__start_wr_after_init;
+   unsigned long pend = (unsigned long)&__end_wr_after_init;
+
+   if (WARN((pstart & ~PAGE_MASK) || (pend & ~PAGE_MASK) ||
+(pstart >= pend), "Boundaries test failed."))
+   return false;
+   pr_info("Boundaries test passed.");
+   return true;
+}
+
+static bool test_pattern(void)
+{
+   if (memchr_inv(array, '0', PAGE_SIZE / 2))
+   return pr_info("Pattern part 1 failed.");
+   if (memchr_inv(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 3 / 4))
+   return pr_info("Pattern part 2 failed.");
+   if (memchr_inv(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2))
+   return pr_info("Pattern part 3 failed.");
+   if (memchr_inv(array + PAGE_SIZE * 7 / 4, '1', PAGE_SIZE * 3 / 4))
+   return pr_info("Pattern part 4 failed.");
+   if (memchr_inv(array + PAGE_SIZE * 5 / 2, '0', PAGE_SIZE / 2))
+   return pr_info("Pattern part 5 failed.");
+   return 0;
+}
+
+static bool test_wr_memset(void)
+{
+   int new_val = '1';
+
+   wr_memset(&scalar, new_val, sizeof(scalar));
+   if (WARN(memchr_inv(&scalar, new_val, sizeof(scalar)),
+"Scalar write rare memset test failed."))
+   return false;
+
+   pr_info("Scalar write rare memset test passed.");
+
+   wr_memset(array, '0', PAGE_SIZE * 3);
+   if (WARN(memchr_inv(array, '0', PAGE_SIZE * 3),
+"Array page aligned write rare memset test failed."))
+   return false;
+
+   wr_memset(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 2);
+   if (WARN(memchr_inv(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 2),
+"Array half page aligned write rare memset test failed."))
+   return false;
+
+   wr_memset(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2);
+   if (WARN(memchr_inv(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2),
+"Array quarter page aligned write rare memset test failed."))
+   return false;
+
+   if (WARN(test_pattern(), "Array write rare memset test failed."))
+   return false;
+
+   pr_info("Array write rare memset test passed.");
+   return true;
+}
+
+static u8 array_1[PAGE_SIZE * 2];
+static u8 array_2[PAGE_SIZE * 2];
+
+static bool test_wr_memcp

[RFC PATCH v5 09/12] __wr_after_init: rodata_test: refactor tests

2019-02-13 Thread Igor Stoppa
Refactor the test cases, in preparation for using them also for testing
__wr_after_init memory, when available.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 mm/rodata_test.c | 48 
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/mm/rodata_test.c b/mm/rodata_test.c
index d908c8769b48..e1349520b436 100644
--- a/mm/rodata_test.c
+++ b/mm/rodata_test.c
@@ -14,44 +14,52 @@
 #include 
 #include 
 
-static const int rodata_test_data = 0xC3;
+#define INIT_TEST_VAL 0xC3
 
-void rodata_test(void)
+static const int rodata_test_data = INIT_TEST_VAL;
+
+static bool test_data(char *data_type, const int *data,
+ unsigned long start, unsigned long end)
 {
-   unsigned long start, end;
int zero = 0;
 
/* test 1: read the value */
/* If this test fails, some previous testrun has clobbered the state */
-   if (!rodata_test_data) {
-   pr_err("test 1 fails (start data)\n");
-   return;
+   if (*data != INIT_TEST_VAL) {
+   pr_err("%s: test 1 fails (init data value)\n", data_type);
+   return false;
}
 
/* test 2: write to the variable; this should fault */
-   if (!probe_kernel_write((void *)&rodata_test_data,
-   (void *)&zero, sizeof(zero))) {
-   pr_err("test data was not read only\n");
-   return;
+   if (!probe_kernel_write((void *)data, (void *)&zero, sizeof(zero))) {
+   pr_err("%s: test data was not read only\n", data_type);
+   return false;
}
 
/* test 3: check the value hasn't changed */
-   if (rodata_test_data == zero) {
-   pr_err("test data was changed\n");
-   return;
+   if (*data != INIT_TEST_VAL) {
+   pr_err("%s: test data was changed\n", data_type);
+   return false;
}
 
/* test 4: check if the rodata section is PAGE_SIZE aligned */
-   start = (unsigned long)__start_rodata;
-   end = (unsigned long)__end_rodata;
if (start & (PAGE_SIZE - 1)) {
-   pr_err("start of .rodata is not page size aligned\n");
-   return;
+   pr_err("%s: start of data is not page size aligned\n",
+  data_type);
+   return false;
}
if (end & (PAGE_SIZE - 1)) {
-   pr_err("end of .rodata is not page size aligned\n");
-   return;
+   pr_err("%s: end of data is not page size aligned\n",
+  data_type);
+   return false;
}
+   pr_info("%s tests were successful", data_type);
+   return true;
+}
 
-   pr_info("all tests were successful\n");
+void rodata_test(void)
+{
+   test_data("rodata", _test_data,
+ (unsigned long)&__start_rodata,
+ (unsigned long)&__end_rodata);
 }
-- 
2.19.1



[RFC PATCH v5 12/12] IMA: turn ima_policy_flags into __wr_after_init

2019-02-13 Thread Igor Stoppa
The policy flags could be targeted by an attacker aiming at disabling IMA,
so that there would be no trace of a file system modification in the
measurement list.

Since the flags can be altered at runtime, it is not possible to make
them become fully read-only, for example with __ro_after_init.

__wr_after_init can still provide some protection, at least against
simple memory overwrite attacks

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 security/integrity/ima/ima.h| 3 ++-
 security/integrity/ima/ima_policy.c | 9 +
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index cc12f3449a72..297c25f5122e 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "../integrity.h"
@@ -50,7 +51,7 @@ enum tpm_pcrs { TPM_PCR0 = 0, TPM_PCR8 = 8 };
 #define IMA_TEMPLATE_IMA_FMT "d|n"
 
 /* current content of the policy */
-extern int ima_policy_flag;
+extern int ima_policy_flag __wr_after_init;
 
 /* set during initialization */
 extern int ima_hash_algo;
diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
index 8bc8a1c8cb3f..d49c545b9cfb 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -48,7 +48,7 @@
 #define INVALID_PCR(a) (((a) < 0) || \
(a) >= (FIELD_SIZEOF(struct integrity_iint_cache, measured_pcrs) * 8))
 
-int ima_policy_flag;
+int ima_policy_flag __wr_after_init;
 static int temp_ima_appraise;
 static int build_ima_appraise __ro_after_init;
 
@@ -460,12 +460,13 @@ void ima_update_policy_flag(void)
 
list_for_each_entry(entry, ima_rules, list) {
if (entry->action & IMA_DO_MASK)
-   ima_policy_flag |= entry->action;
+   wr_assign(ima_policy_flag,
+ ima_policy_flag | entry->action);
}
 
ima_appraise |= (build_ima_appraise | temp_ima_appraise);
if (!ima_appraise)
-   ima_policy_flag &= ~IMA_APPRAISE;
+   wr_assign(ima_policy_flag, ima_policy_flag & ~IMA_APPRAISE);
 }
 
 static int ima_appraise_flag(enum ima_hooks func)
@@ -651,7 +652,7 @@ void ima_update_policy(void)
list_splice_tail_init_rcu(&ima_temp_rules, policy, synchronize_rcu);
 
if (ima_rules != policy) {
-   ima_policy_flag = 0;
+   wr_assign(ima_policy_flag, 0);
ima_rules = policy;
 
/*
-- 
2.19.1



[RFC PATCH v5 08/12] __wr_after_init: lkdtm test

2019-02-13 Thread Igor Stoppa
Verify that trying to modify a variable with the __wr_after_init
attribute will cause a crash.
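
As with the other lkdtm perms tests, the new crash type is expected to be
triggered through debugfs, e.g. by writing WRITE_WR_AFTER_INIT to
/sys/kernel/debug/provoke-crash/DIRECT.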

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 drivers/misc/lkdtm/core.c  |  3 +++
 drivers/misc/lkdtm/lkdtm.h |  3 +++
 drivers/misc/lkdtm/perms.c | 29 +
 3 files changed, 35 insertions(+)

diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index 2837dc77478e..73c34b17c433 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -155,6 +155,9 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(ACCESS_USERSPACE),
CRASHTYPE(WRITE_RO),
CRASHTYPE(WRITE_RO_AFTER_INIT),
+#ifdef CONFIG_PRMEM
+   CRASHTYPE(WRITE_WR_AFTER_INIT),
+#endif
CRASHTYPE(WRITE_KERN),
CRASHTYPE(REFCOUNT_INC_OVERFLOW),
CRASHTYPE(REFCOUNT_ADD_OVERFLOW),
diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
index 3c6fd327e166..abba2f52ffa6 100644
--- a/drivers/misc/lkdtm/lkdtm.h
+++ b/drivers/misc/lkdtm/lkdtm.h
@@ -38,6 +38,9 @@ void lkdtm_READ_BUDDY_AFTER_FREE(void);
 void __init lkdtm_perms_init(void);
 void lkdtm_WRITE_RO(void);
 void lkdtm_WRITE_RO_AFTER_INIT(void);
+#ifdef CONFIG_PRMEM
+void lkdtm_WRITE_WR_AFTER_INIT(void);
+#endif
 void lkdtm_WRITE_KERN(void);
 void lkdtm_EXEC_DATA(void);
 void lkdtm_EXEC_STACK(void);
diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index 53b85c9d16b8..f681730aa652 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* Whether or not to fill the target memory area with do_nothing(). */
@@ -27,6 +28,10 @@ static const unsigned long rodata = 0xAA55AA55;
 /* This is marked __ro_after_init, so it should ultimately be .rodata. */
 static unsigned long ro_after_init __ro_after_init = 0x55AA5500;
 
+/* This is marked __wr_after_init, so it should be in .rodata. */
+static
+unsigned long wr_after_init __wr_after_init = 0x55AA5500;
+
 /*
  * This just returns to the caller. It is designed to be copied into
  * non-executable memory regions.
@@ -104,6 +109,28 @@ void lkdtm_WRITE_RO_AFTER_INIT(void)
*ptr ^= 0xabcd1234;
 }
 
+#ifdef CONFIG_PRMEM
+
+void lkdtm_WRITE_WR_AFTER_INIT(void)
+{
+   unsigned long *ptr = &wr_after_init;
+
+   /*
+* Verify we were written to during init. Since an Oops
+* is considered a "success", a failure is to just skip the
+* real test.
+*/
+   if ((*ptr & 0xAA) != 0xAA) {
+   pr_info("%p was NOT written during init!?\n", ptr);
+   return;
+   }
+
+   pr_info("attempting bad wr_after_init write at %p\n", ptr);
+   *ptr ^= 0xabcd1234;
+}
+
+#endif
+
 void lkdtm_WRITE_KERN(void)
 {
size_t size;
@@ -200,4 +227,6 @@ void __init lkdtm_perms_init(void)
/* Make sure we can write to __ro_after_init values during __init */
ro_after_init |= 0xAA;
 
+   /* Make sure we can write to __wr_after_init during __init */
+   wr_after_init |= 0xAA;
 }
-- 
2.19.1



[RFC PATCH v5 06/12] __wr_after_init: arm64: enable

2019-02-13 Thread Igor Stoppa
Set ARCH_HAS_PRMEM to Y for arm64

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 arch/arm64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a4168d366127..7cbb2c133ed7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -66,6 +66,7 @@ config ARM64
select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
select ARCH_WANT_FRAME_POINTERS
select ARCH_HAS_UBSAN_SANITIZE_ALL
+   select ARCH_HAS_PRMEM
select ARM_AMBA
select ARM_ARCH_TIMER
select ARM_GIC
-- 
2.19.1



[RFC PATCH v5 07/12] __wr_after_init: Documentation: self-protection

2019-02-13 Thread Igor Stoppa
Update the self-protection documentation, to mention also the use of the
__wr_after_init attribute.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 Documentation/security/self-protection.rst | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/Documentation/security/self-protection.rst b/Documentation/security/self-protection.rst
index f584fb74b4ff..df2614bc25b9 100644
--- a/Documentation/security/self-protection.rst
+++ b/Documentation/security/self-protection.rst
@@ -84,12 +84,14 @@ For variables that are initialized once at ``__init`` time, these can
 be marked with the (new and under development) ``__ro_after_init``
 attribute.
 
-What remains are variables that are updated rarely (e.g. GDT). These
-will need another infrastructure (similar to the temporary exceptions
-made to kernel code mentioned above) that allow them to spend the rest
-of their lifetime read-only. (For example, when being updated, only the
-CPU thread performing the update would be given uninterruptible write
-access to the memory.)
+Others, which are statically allocated, but still need to be updated
+rarely, can be marked with the ``__wr_after_init`` attribute.
+
+The update mechanism must avoid exposing the data to rogue alterations
+during the update. For example, only the CPU thread performing the update
+would be given uninterruptible write access to the memory.
+
+Currently there is no protection available for data allocated dynamically.
 
 Segregation of kernel memory from userspace memory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- 
2.19.1



[RFC PATCH v5 00/12] hardening: statically allocated protected memory

2019-02-13 Thread Igor Stoppa
To: Andy Lutomirski ,
To: Matthew Wilcox ,
To: Nadav Amit 
To: Peter Zijlstra ,
To: Dave Hansen ,
To: Mimi Zohar 
To: Thiago Jung Bauermann 
CC: Kees Cook 
CC: Ahmed Soliman 
CC: linux-integrity 
CC: Kernel Hardening 
CC: Linux-MM 
CC: Linux Kernel Mailing List 

Hello,
new version of the patchset, with default memset_user() function.

Patch-set implementing write-rare memory protection for statically
allocated data.
Its purpose is to keep write protected the kernel data which is seldom
modified, especially if altering it can be exploited during an attack.

There is no read overhead, however writing requires special operations that
are probably unsuitable for often-changing data.
The use is opt-in, by applying the modifier __wr_after_init to a variable
declaration.

As the name implies, the write protection kicks in only after init() is
completed; before that moment, the data is modifiable in the usual way.
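
To make the opt-in usage concrete, a minimal consumer could look like this
(assumed code, not part of the series; it relies only on the __wr_after_init
attribute from patch 02 and the wr_assign() helper from patch 03):

#include <linux/cache.h>	/* __wr_after_init */
#include <linux/prmem.h>	/* wr_assign() and friends */

static int threshold __wr_after_init = 10;	/* freely writable during init */

static void set_threshold(int v)
{
	/*
	 * After mark_rodata_ro() a direct write would fault;
	 * the update has to go through the write-rare helper.
	 */
	wr_assign(threshold, v);
}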

Current Limitations:
* supports only data which is allocated statically, at build time.
* verified (and enabled) only on x86_64 and arm64; other architectures need
  to be tested, possibly providing their own backend.

Some notes:
- in case an architecture doesn't support write rare, the behavior is to
  fallback to regular write operations
- before altering any memory, the destination is sanitized
- write rare data is segregated into own set of pages
- only x86_64 and arm64 verified, atm
- the memset_user() assembly functions seem to work, but I'm not too sure
  they are really ok
- I've added a simple example: the protection of ima_policy_flags
- the last patch is optional, but it seemed worth doing the refactoring
- the x86_64 user space address range is double the size of the kernel
  address space, so it's possible to randomize the beginning of the
  mapping of the kernel address space, but on arm64 they have the same
  size, so it's not possible to do the same. Eventually, the randomization
  could affect exclusively the ranges containing protectable memory, but
  this should be done together with the protection of dynamically allocated
  data (once it is available).
- unaddressed: Nadav proposed to do:
#define __wr  __attribute__((address_space(5)))
  but I don't know exactly where to use it atm
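
For reference, sparse consumes that kind of annotation the same way it does
the numbered address spaces behind __user and __iomem; a checker-only marker
would presumably be wrapped like this (hypothetical sketch, not part of the
series):

#ifdef __CHECKER__
# define __wr	__attribute__((address_space(5)))
#else
# define __wr
#endif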

Changelog:

v4->v5
--
* turned conditional inclusion of mm.h into permanent
* added generic, albeit unoptimized memset_user() function
* more verbose error messages for testing of wr_memset()

v3->v4
--

* added function for setting memory in user space mapping for arm64
* refactored code, to work with both supported architectures
* reduced dependency on x86_64 specific code, to support by default also
  arm64
* improved memset_user() for x86_64, but I'm not sure if I understood
  correctly what was the best way to enhance it.

v2->v3
--

* both wr_memset and wr_memcpy are implemented as generic functions
  the arch code must provide suitable helpers
* regular initialization for ima_policy_flags: it happens during init
* remove spurious code from the initialization function

v1->v2
--

* introduce cleaner split between generic and arch code
* add x86_64 specific memset_user()
* replace kernel-space memset() memcopy() with userspace counterpart
* randomize the base address for the alternate map across the entire
  available address range from user space (128TB - 64TB)
* convert BUG() to WARN()
* turn verification of written data into debugging option
* wr_rcu_assign_pointer() as special case of wr_assign()
* example with protection of ima_policy_flags
* documentation

Igor Stoppa (11):
  __wr_after_init: linker section and attribute
  __wr_after_init: Core and default arch
  __wr_after_init: x86_64: randomize mapping offset
  __wr_after_init: x86_64: enable
  __wr_after_init: arm64: enable
  __wr_after_init: Documentation: self-protection
  __wr_after_init: lkdtm test
  __wr_after_init: rodata_test: refactor tests
  __wr_after_init: rodata_test: test __wr_after_init
  __wr_after_init: test write rare functionality
  IMA: turn ima_policy_flags into __wr_after_init

Nadav Amit (1):
  fork: provide a function for copying init_mm

 Documentation/security/self-protection.rst |  14 +-
 arch/Kconfig   |  22 +++
 arch/arm64/Kconfig |   1 +
 arch/x86/Kconfig   |   1 +
 arch/x86/mm/Makefile   |   2 +
 arch/x86/mm/prmem.c (new)  |  20 +++
 drivers/misc/lkdtm/core.c  |   3 +
 drivers/misc/lkdtm/lkdtm.h |   3 +
 drivers/misc/lkdtm/perms.c |  29 
 include/asm-generic/vmlinux.lds.h  |  25 +++
 include/linux/cache.h  |  21 +++
 include/linux/prmem.h (new)|  70 
 include/linux/sched/task.h |   1 +
 init/main.c|   3 +
 kernel/fork.c  |  24 ++-
 mm/Kconfig.debug   |   8 +

[RFC PATCH v5 02/12] __wr_after_init: linker section and attribute

2019-02-13 Thread Igor Stoppa
Introduce a linker section and a matching attribute for statically
allocated write rare data. The attribute is named "__wr_after_init".
After the init phase is completed, this section will be modifiable only by
invoking write rare functions.
The section occupies a set of full pages, since the granularity
available for write protection is one memory page.

The functionality is automatically activated by any architecture that sets
CONFIG_ARCH_HAS_PRMEM
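
As an illustration (not part of the patch), any declaration carrying the new
attribute, e.g.:

static int some_setting __wr_after_init;	/* hypothetical example */

is placed into .data..wr_after_init, which the linker script hunk below pads
to whole pages on both sides.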

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 arch/Kconfig  | 15 +++
 include/asm-generic/vmlinux.lds.h | 25 +
 include/linux/cache.h | 21 +
 init/main.c   |  3 +++
 4 files changed, 64 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index 4cfb6de48f79..b0b6d176f1c1 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -808,6 +808,21 @@ config VMAP_STACK
  the stack to map directly to the KASAN shadow map using a formula
  that is incorrect if the stack is in vmalloc space.
 
+config ARCH_HAS_PRMEM
+   def_bool n
+   help
+ architecture specific symbol stating that the architecture provides
+ a back-end function for the write rare operation.
+
+config PRMEM
+   bool "Write protect critical data that doesn't need high write speed."
+   depends on ARCH_HAS_PRMEM
+   default y
+   help
+ If the architecture supports it, statically allocated data which
+ has been selected for hardening becomes (mostly) read-only.
+ The selection happens by labelling the data "__wr_after_init".
+
 config ARCH_OPTIONAL_KERNEL_RWX
def_bool n
 
diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index 3d7a6a9c2370..ddb1fd608490 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -311,6 +311,30 @@
KEEP(*(__jump_table))   \
__stop___jump_table = .;
 
+/*
+ * Allow architectures to handle wr_after_init data on their
+ * own by defining an empty WR_AFTER_INIT_DATA.
+ * However, it's important that pages containing WR_RARE data do not
+ * hold anything else, to avoid both accidentally unprotecting something
+ * that is supposed to stay read-only all the time and also accidentally
+ * protecting something else that is supposed to be writeable all the time.
+ */
+#ifndef WR_AFTER_INIT_DATA
+#ifdef CONFIG_PRMEM
+#define WR_AFTER_INIT_DATA(align)  \
+   . = ALIGN(PAGE_SIZE);   \
+   __start_wr_after_init = .;  \
+   . = ALIGN(align);   \
+   *(.data..wr_after_init) \
+   . = ALIGN(PAGE_SIZE);   \
+   __end_wr_after_init = .;\
+   . = ALIGN(align);
+#else
+#define WR_AFTER_INIT_DATA(align)  \
+   . = ALIGN(align);
+#endif
+#endif
+
 /*
  * Allow architectures to handle ro_after_init data on their
  * own by defining an empty RO_AFTER_INIT_DATA.
@@ -332,6 +356,7 @@
__start_rodata = .; \
*(.rodata) *(.rodata.*) \
RO_AFTER_INIT_DATA  /* Read only after init */  \
+   WR_AFTER_INIT_DATA(align) /* wr after init */   \
KEEP(*(__vermagic)) /* Kernel version magic */  \
. = ALIGN(8);   \
__start___tracepoints_ptrs = .; \
diff --git a/include/linux/cache.h b/include/linux/cache.h
index 750621e41d1c..09bd0b9284b6 100644
--- a/include/linux/cache.h
+++ b/include/linux/cache.h
@@ -31,6 +31,27 @@
 #define __ro_after_init __attribute__((__section__(".data..ro_after_init")))
 #endif
 
+/*
+ * __wr_after_init is used to mark objects that cannot be modified
+ * directly after init (i.e. after mark_rodata_ro() has been called).
+ * These objects become effectively read-only, from the perspective of
+ * performing a direct write, like a variable assignment.
+ * However, they can be altered through a dedicated function.
+ * It is intended for those objects which are occasionally modified after
+ * init, however they are modified so seldom that the extra cost from
+ * the indirect modification is either negligible or worth paying, for the
+ * sake of the protection gained.
+ */
+#ifndef __wr_after_init
+#define __wr_after_init __attribute__((__section__(".data..wr_after_init")))
+#endif

[RFC PATCH v5 04/12] __wr_after_init: x86_64: randomize mapping offset

2019-02-13 Thread Igor Stoppa
x86_64 specialized way of defining the base address for the alternate
mapping used by write-rare.

Since the kernel address space spans across 64TB and it is mapped into a
used address space of 128TB, the kernel address space can be shifted by a
random offset that is up to 64TB and page aligned.

This is accomplished by providing arch-specific version of the function
__init_wr_base()
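
A quick sanity check of the arithmetic used below (worked out here for
reference, not part of the patch):

	_BITUL(40)      = 1UL << 40            /* 1 TB  */
	64 * _BITUL(40) = 0x0000400000000000   /* 64 TB */

Since 64 * _BITUL(40) is a multiple of PAGE_SIZE, taking the page-aligned
random value modulo that constant yields an offset that is still page
aligned and strictly below 64TB, so the shifted 64TB kernel range always
fits inside the 128TB user address space.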

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/mm/Makefile  |  2 ++
 arch/x86/mm/prmem.c (new) | 20 
 2 files changed, 22 insertions(+)

diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 4b101dd6e52f..66652de1e2c7 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -53,3 +53,5 @@ obj-$(CONFIG_PAGE_TABLE_ISOLATION)+= pti.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)  += mem_encrypt.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)  += mem_encrypt_identity.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)  += mem_encrypt_boot.o
+
+obj-$(CONFIG_PRMEM)+= prmem.o
diff --git a/arch/x86/mm/prmem.c b/arch/x86/mm/prmem.c
new file mode 100644
index ..b04fc03f92fb
--- /dev/null
+++ b/arch/x86/mm/prmem.c
@@ -0,0 +1,20 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * prmem.c: Memory Protection Library - x86_64 backend
+ *
+ * (C) Copyright 2018-2019 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa 
+ */
+
+#include 
+#include 
+
+unsigned long __init __init_wr_base(void)
+{
+   /*
+* Place 64TB of kernel address space within 128TB of user address
+* space, at a random page aligned offset.
+*/
+   return (((unsigned long)kaslr_get_random_long("WR Poke")) &
+   PAGE_MASK) % (64 * _BITUL(40));
+}
-- 
2.19.1



[RFC PATCH v5 03/12] __wr_after_init: Core and default arch

2019-02-13 Thread Igor Stoppa
The patch provides:
- the core functionality for write-rare after init for statically
  allocated data, based on code from Matthew Wilcox
- the default implementation for the generic architecture.
  A specific architecture can override one or more of the default
  functions.

The core (API) functions are:
- wr_memset(): write rare counterpart of memset()
- wr_memcpy(): write rare counterpart of memcpy()
- wr_assign(): write rare counterpart of the assignment ('=') operator
- wr_rcu_assign_pointer(): write rare counterpart of rcu_assign_pointer()

In case either the selected architecture doesn't support write rare
after init, or the functionality is disabled, the write rare functions
will resolve into their non-write rare counterparts:
- memset()
- memcpy()
- assignment operator
- rcu_assign_pointer()

For code that can be either linked as a module or built in (e.g. a device
driver init function), it is not possible to tell upfront which will be the
case. In this scenario, if the functions are called during system init, they
will automatically choose, at runtime, the fast non-write-rare path. Should
they be invoked later, during module init, they will use the write-rare path.
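
As an illustration of the API (assumed consumer code, not part of the patch),
callers do not need any #ifdefs of their own: with CONFIG_PRMEM the helpers go
through the write-rare path, without it they compile down to plain
memset()/memcpy()/assignment:

#include <linux/cache.h>
#include <linux/prmem.h>

static unsigned long counters[4] __wr_after_init;

static void counters_reset(void)
{
	wr_memset(counters, 0, sizeof(counters));	/* memset() if !CONFIG_PRMEM */
}

static void counter_set(int i, unsigned long v)
{
	wr_assign(counters[i], v);			/* plain '=' if !CONFIG_PRMEM */
}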

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 arch/Kconfig|   7 ++
 include/linux/prmem.h (new) |  70 ++
 mm/Makefile |   1 +
 mm/prmem.c (new)| 193 ++
 4 files changed, 271 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index b0b6d176f1c1..0380d4a64681 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -814,6 +814,13 @@ config ARCH_HAS_PRMEM
  architecture specific symbol stating that the architecture provides
  a back-end function for the write rare operation.
 
+config ARCH_HAS_PRMEM_HEADER
+   def_bool n
+   depends on ARCH_HAS_PRMEM
+   help
+ architecture specific symbol stating that the architecture provides
+ its own specific header back-end for the write rare operation.
+
 config PRMEM
bool "Write protect critical data that doesn't need high write speed."
depends on ARCH_HAS_PRMEM
diff --git a/include/linux/prmem.h b/include/linux/prmem.h
new file mode 100644
index ..05a5e5b3abfd
--- /dev/null
+++ b/include/linux/prmem.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * prmem.h: Header for memory protection library - generic part
+ *
+ * (C) Copyright 2018-2019 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa 
+ */
+
+#ifndef _LINUX_PRMEM_H
+#define _LINUX_PRMEM_H
+
+#include 
+#include 
+#include 
+
+#ifndef CONFIG_PRMEM
+
+static inline void *wr_memset(void *p, int c, __kernel_size_t n)
+{
+   return memset(p, c, n);
+}
+
+static inline void *wr_memcpy(void *p, const void *q, __kernel_size_t n)
+{
+   return memcpy(p, q, n);
+}
+
+#define wr_assign(var, val)((var) = (val))
+#define wr_rcu_assign_pointer(p, v)rcu_assign_pointer(p, v)
+
+#else
+
+void *wr_memset(void *p, int c, __kernel_size_t n);
+void *wr_memcpy(void *p, const void *q, __kernel_size_t n);
+
+/**
+ * wr_assign() - sets a write-rare variable to a specified value
+ * @var: the variable to set
+ * @val: the new value
+ *
+ * Returns: the variable
+ */
+
+#define wr_assign(dst, val) ({ \
+   typeof(dst) tmp = (typeof(dst))val; \
+   \
+   wr_memcpy(&dst, &tmp, sizeof(dst)); \
+   dst;\
+})
+
+/**
+ * wr_rcu_assign_pointer() - initialize a pointer in rcu mode
+ * @p: the rcu pointer - it MUST be aligned to a machine word
+ * @v: the new value
+ *
+ * Returns the value assigned to the rcu pointer.
+ *
+ * It is provided as a macro, to match rcu_assign_pointer().
+ * rcu_assign_pointer() is implemented as the equivalent of:
+ *
+ * smp_mb();
+ * WRITE_ONCE();
+ */
+#define wr_rcu_assign_pointer(p, v) ({ \
+   smp_mb();   \
+   wr_assign(p, v);\
+   p;  \
+})
+#endif
+#endif
diff --git a/mm/Makefile b/mm/Makefile
index d210cc9d6f80..ef3867c16ce0 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -58,6 +58,7 @@ obj-$(CONFIG_SPARSEMEM)   += sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_PRMEM) += prmem.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/prmem.c b/mm/prmem.c
new file mode 100644
index ..455e1e446260
--- /dev/null
+++ b/mm/prmem.c
@@ -0,0 +1,193 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+

Re: [RFC PATCH v4 01/12] __wr_after_init: Core and default arch

2019-02-11 Thread Igor Stoppa




On 12/02/2019 04:39, Matthew Wilcox wrote:

On Tue, Feb 12, 2019 at 01:27:38AM +0200, Igor Stoppa wrote:

+#ifndef CONFIG_PRMEM

[...]

+#else
+
+#include <linux/mm.h>


It's a mistake to do conditional includes like this.  That way you see
include loops with some configs and not others.  Our headers are already
so messy, better to just include mm.h unconditionally.



ok

Can I still do the following, in prmem.c ?

#ifdef CONFIG_ARCH_HAS_PRMEM_HEADER
+#include <asm/prmem.h>
+#else
+
+struct wr_state {
+   struct mm_struct *prev;
+};
+
+#endif


It's still a conditional include, but it's in a C file, it shouldn't 
cause any chance of loops.


The alternative is that each arch supporting prmem must have a 
(probably) empty asm/prmem.h header.


I did some research about telling the compiler to include a header only 
if it exists, but it doesn't seem to be a gcc feature.


--
igor


Re: [RFC PATCH v4 00/12] hardening: statically allocated protected memory

2019-02-11 Thread Igor Stoppa




On 12/02/2019 03:26, Kees Cook wrote:

On Mon, Feb 11, 2019 at 5:08 PM igor.sto...@gmail.com
 wrote:




On Tue, 12 Feb 2019, 4.47 Kees Cook 

On Mon, Feb 11, 2019 at 4:37 PM Igor Stoppa  wrote:




On 12/02/2019 02:09, Kees Cook wrote:

On Mon, Feb 11, 2019 at 3:28 PM Igor Stoppa  wrote:
It looked like only the memset() needed architecture support. Is there
a reason for not being able to implement memset() in terms of an
inefficient put_user() loop instead? That would eliminate the need for
per-arch support, yes?


So far, yes, however from previous discussion about power arch, I
understood this implementation would not be so easy to adapt.
Lacking other examples where the extra mapping could be used, I did not
want to add code without a use case.

Probably both arm and x86 32 bit could do, but I would like to first get
to the bitter end with memory protection (the other 2 thirds).

Mostly, I hated having just one arch and I also really wanted to have arm64.


Right, I meant, if you implemented the _memset() case with put_user()
in this version, you could drop the arch-specific _memset() and shrink
the patch series. Then you could also enable this across all the
architectures in one patch. (Would you even need the Kconfig patches,
i.e. won't this "Just Work" on everything with an MMU?)



I had similar thoughts, but this answer [1] deflated my hopes (if I understood 
it correctly).
It seems that each arch needs to be massaged in separately.


True, but I think x86_64, x86, arm64, and arm will all be "normal".
power may be that way too, but they always surprise me. :)

Anyway, series looks good, but since nothing uses _memset(), it might
make sense to leave it out and put all the arch-enabling into a single
patch to cover the 4 archs above, in an effort to make the series even
smaller.


Actually, I do use it, albeit indirectly.
That's the whole point of having the IMA patch as example.

This is the fragment:

@@ -460,12 +460,13 @@ void ima_update_policy_flag(void)

list_for_each_entry(entry, ima_rules, list) {
if (entry->action & IMA_DO_MASK)
-   ima_policy_flag |= entry->action;
+   wr_assign(ima_policy_flag,
+ ima_policy_flag | entry->action);
}

ima_appraise |= (build_ima_appraise | temp_ima_appraise);
if (!ima_appraise)
-   ima_policy_flag &= ~IMA_APPRAISE;
+   wr_assign(ima_policy_flag, ima_policy_flag & ~IMA_APPRAISE);
 }


wr_assign() does just that.

However, reading again your previous mails, I realize that I might have 
misinterpreted what you were suggesting.


If the advice is to have also a default memset_user() which relies on 
put_user(), but not to activate the feature by default for every 
architecture, I definitely agree that it would be good to have it.

I just didn't think about it before.

What I cannot do is turn it on for all the architectures prior to 
testing it, and atm I do not have the means to do it.


But I now realize that most likely you were just suggesting to have 
full, albeit inefficient default support and then let various archs 
review/enhance it. I can certainly do this.
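
For reference, the kind of generic, inefficient fallback being discussed here
could look roughly like this (a sketch assuming only put_user()/access_ok();
the function name is made up and this is not the code that ended up in v5):

#include <linux/uaccess.h>
#include <linux/types.h>

/*
 * Arch-independent memset on a user-space mapping, one byte at a time.
 * Returns the number of bytes NOT set, mirroring clear_user().
 */
static unsigned long generic_memset_user(void __user *to, int c, unsigned long n)
{
	unsigned long i;

	if (!access_ok(to, n))
		return n;
	for (i = 0; i < n; i++)
		if (put_user((u8)c, (u8 __user *)to + i))
			break;
	return n - i;
}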


Regarding testing I have a question: how much can/should I lean on qemu?
In most cases the MMU might not need to be fully emulated, so I wonder 
how well qemu-based testing can ensure that real life scenarios will work.


--
igor


Re: [RFC PATCH v4 00/12] hardening: statically allocated protected memory

2019-02-11 Thread Igor Stoppa




On 12/02/2019 02:09, Kees Cook wrote:

On Mon, Feb 11, 2019 at 3:28 PM Igor Stoppa  wrote:


[...]


Patch-set implementing write-rare memory protection for statically
allocated data.


It seems like this could be expanded in the future to cover dynamic
memory too (i.e. just a separate base range in the mm).


Indeed. And part of the code refactoring is also geared in that 
direction. I am working on that part, but it was agreed that I would 
first provide this subset of features covering statically allocated 
memory. So I'm sticking to the plan. But this is roughly 1/3 of the 
basic infra I have in mind.



Its purpose is to keep write protected the kernel data which is seldom
modified, especially if altering it can be exploited during an attack.

There is no read overhead, however writing requires special operations that
are probably unsuitable for often-changing data.
The use is opt-in, by applying the modifier __wr_after_init to a variable
declaration.

As the name implies, the write protection kicks in only after init() is
completed; before that moment, the data is modifiable in the usual way.

Current Limitations:
* supports only data which is allocated statically, at build time.
* supports only x86_64 and arm64; other architectures need to provide their
   own backend


It looked like only the memset() needed architecture support. Is there
a reason for not being able to implement memset() in terms of an
inefficient put_user() loop instead? That would eliminate the need for
per-arch support, yes?


So far, yes, however from previous discussion about power arch, I 
understood this implementation would not be so easy to adapt.
Lacking other examples where the extra mapping could be used, I did not 
want to add code without a use case.


Probably both arm and x86 32 bit could do, but I would like to first get 
to the bitter end with memory protection (the other 2 thirds).


Mostly, I hated having just one arch and I also really wanted to have arm64.

But eventually, yes, a generic put_user() loop could do, provided that 
there are other arch where the extra mapping to user space would be a 
good way to limit write access. This last part is what I'm not sure of.



- I've added a simple example: the protection of ima_policy_flags


You'd also looked at SELinux too, yes? What other things could be
targeted for protection? (It seems we can't yet protect page tables
themselves with this...)


Yes, I have. See the "1/3" explanation above. I'm also trying to get 
away with as small an example as possible, to get the basic infra merged.
SELinux is not going to be a small patch set. I'd rather move to it once 
at least some of the framework is merged. It might be a good use case 
for dynamic allocation, if I do not find something smaller.
But for static write rare, going after IMA was easier, and it is still a 
good target for protection, imho, as flipping this variable should be 
sufficient for turning IMA off.


For the page tables, I have in mind a little bit different approach, 
that I hope to explain better once I get to do the dynamic allocation.



- the x86_64 user space address range is double the size of the kernel
   address space, so it's possible to randomize the beginning of the
   mapping of the kernel address space, but on arm64 they have the same
   size, so it's not possible to do the same


Only the wr_rare section needs mapping, though, yes?


Yup, however, once more, I'm not so keen to do what seems like premature 
optimization, before I have addressed the framework in its entirety, as 
the dynamic allocation will need similar treatment.



- I'm not sure if it's correct, since it doesn't seem to be that common in
   kernel sources, but instead of using #defines for overriding default
   function calls, I'm using "weak" for the default functions.


The tradition is to use #defines for easier readability, but "weak"
continues to be a thing. *shrug*


Yes, I wasn't so sure about it, but I kinda liked the fact that, by 
using "weak", the arch header becomes optional, unless one has to 
redefine the struct wr_state.
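
A minimal illustration of the pattern being discussed, using __init_wr_base()
from the x86_64 patch as the overridable symbol (the generic body here is made
up):

/*
 * Generic default: an architecture overrides it simply by providing a
 * non-weak definition of the same symbol, so no arch header or #define
 * is needed.
 */
unsigned long __init __weak __init_wr_base(void)
{
	return 0UL;
}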



This will be a nice addition to protect more of the kernel's static
data from write-what-where attacks. :)


I hope so :-)

--
thanks, igor


[RFC PATCH v4 03/12] __wr_after_init: x86_64: randomize mapping offset

2019-02-11 Thread Igor Stoppa
x86_64 specialized way of defining the base address for the alternate
mapping used by write-rare.

Since the kernel address space spans across 64TB and it is mapped into a
used address space of 128TB, the kernel address space can be shifted by a
random offset that is up to 64TB and page aligned.

This is accomplished by providing arch-specific version of the function
__init_wr_base()

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/mm/Makefile  |  2 ++
 arch/x86/mm/prmem.c (new) | 20 
 2 files changed, 22 insertions(+)

diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 4b101dd6e52f..66652de1e2c7 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -53,3 +53,5 @@ obj-$(CONFIG_PAGE_TABLE_ISOLATION)+= pti.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)  += mem_encrypt.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)  += mem_encrypt_identity.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)  += mem_encrypt_boot.o
+
+obj-$(CONFIG_PRMEM)+= prmem.o
diff --git a/arch/x86/mm/prmem.c b/arch/x86/mm/prmem.c
new file mode 100644
index ..b04fc03f92fb
--- /dev/null
+++ b/arch/x86/mm/prmem.c
@@ -0,0 +1,20 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * prmem.c: Memory Protection Library - x86_64 backend
+ *
+ * (C) Copyright 2018-2019 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa 
+ */
+
+#include 
+#include 
+
+unsigned long __init __init_wr_base(void)
+{
+   /*
+* Place 64TB of kernel address space within 128TB of user address
+* space, at a random page aligned offset.
+*/
+   return (((unsigned long)kaslr_get_random_long("WR Poke")) &
+   PAGE_MASK) % (64 * _BITUL(40));
+}
-- 
2.19.1



[RFC PATCH v4 08/12] __wr_after_init: lkdtm test

2019-02-11 Thread Igor Stoppa
Verify that trying to modify a variable with the __wr_after_init
attribute will cause a crash.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 drivers/misc/lkdtm/core.c  |  3 +++
 drivers/misc/lkdtm/lkdtm.h |  3 +++
 drivers/misc/lkdtm/perms.c | 29 +
 3 files changed, 35 insertions(+)

diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index 2837dc77478e..73c34b17c433 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -155,6 +155,9 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(ACCESS_USERSPACE),
CRASHTYPE(WRITE_RO),
CRASHTYPE(WRITE_RO_AFTER_INIT),
+#ifdef CONFIG_PRMEM
+   CRASHTYPE(WRITE_WR_AFTER_INIT),
+#endif
CRASHTYPE(WRITE_KERN),
CRASHTYPE(REFCOUNT_INC_OVERFLOW),
CRASHTYPE(REFCOUNT_ADD_OVERFLOW),
diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
index 3c6fd327e166..abba2f52ffa6 100644
--- a/drivers/misc/lkdtm/lkdtm.h
+++ b/drivers/misc/lkdtm/lkdtm.h
@@ -38,6 +38,9 @@ void lkdtm_READ_BUDDY_AFTER_FREE(void);
 void __init lkdtm_perms_init(void);
 void lkdtm_WRITE_RO(void);
 void lkdtm_WRITE_RO_AFTER_INIT(void);
+#ifdef CONFIG_PRMEM
+void lkdtm_WRITE_WR_AFTER_INIT(void);
+#endif
 void lkdtm_WRITE_KERN(void);
 void lkdtm_EXEC_DATA(void);
 void lkdtm_EXEC_STACK(void);
diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index 53b85c9d16b8..f681730aa652 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* Whether or not to fill the target memory area with do_nothing(). */
@@ -27,6 +28,10 @@ static const unsigned long rodata = 0xAA55AA55;
 /* This is marked __ro_after_init, so it should ultimately be .rodata. */
 static unsigned long ro_after_init __ro_after_init = 0x55AA5500;
 
+/* This is marked __wr_after_init, so it should be in .rodata. */
+static
+unsigned long wr_after_init __wr_after_init = 0x55AA5500;
+
 /*
  * This just returns to the caller. It is designed to be copied into
  * non-executable memory regions.
@@ -104,6 +109,28 @@ void lkdtm_WRITE_RO_AFTER_INIT(void)
*ptr ^= 0xabcd1234;
 }
 
+#ifdef CONFIG_PRMEM
+
+void lkdtm_WRITE_WR_AFTER_INIT(void)
+{
+   unsigned long *ptr = &wr_after_init;
+
+   /*
+* Verify we were written to during init. Since an Oops
+* is considered a "success", a failure is to just skip the
+* real test.
+*/
+   if ((*ptr & 0xAA) != 0xAA) {
+   pr_info("%p was NOT written during init!?\n", ptr);
+   return;
+   }
+
+   pr_info("attempting bad wr_after_init write at %p\n", ptr);
+   *ptr ^= 0xabcd1234;
+}
+
+#endif
+
 void lkdtm_WRITE_KERN(void)
 {
size_t size;
@@ -200,4 +227,6 @@ void __init lkdtm_perms_init(void)
/* Make sure we can write to __ro_after_init values during __init */
ro_after_init |= 0xAA;
 
+   /* Make sure we can write to __wr_after_init during __init */
+   wr_after_init |= 0xAA;
 }
-- 
2.19.1



[RFC PATCH v4 11/12] __wr_after_init: test write rare functionality

2019-02-11 Thread Igor Stoppa
Set of test cases meant to confirm that the write rare functionality
works as expected.
It can be optionally compiled as a module.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 mm/Kconfig.debug   |   8 +++
 mm/Makefile|   1 +
 mm/test_write_rare.c (new) | 136 +++
 3 files changed, 145 insertions(+)

diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index 9a7b8b049d04..a62c31901fea 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -94,3 +94,11 @@ config DEBUG_RODATA_TEST
 depends on STRICT_KERNEL_RWX
 ---help---
   This option enables a testcase for the setting rodata read-only.
+
+config DEBUG_PRMEM_TEST
+tristate "Run self test for statically allocated protected memory"
+depends on PRMEM
+default n
+help
+  Tries to verify that the protection for statically allocated memory
+  works correctly and that the memory is effectively protected.
diff --git a/mm/Makefile b/mm/Makefile
index ef3867c16ce0..8de1d468f4e7 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 obj-$(CONFIG_PRMEM) += prmem.o
+obj-$(CONFIG_DEBUG_PRMEM_TEST) += test_write_rare.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/test_write_rare.c b/mm/test_write_rare.c
new file mode 100644
index ..dd2a0e2d6024
--- /dev/null
+++ b/mm/test_write_rare.c
@@ -0,0 +1,136 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * test_write_rare.c
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+extern long __start_wr_after_init;
+extern long __end_wr_after_init;
+
+static __wr_after_init int scalar = '0';
+static __wr_after_init u8 array[PAGE_SIZE * 3] __aligned(PAGE_SIZE);
+
+/* The section must occupy a non-zero number of whole pages */
+static bool test_alignment(void)
+{
+   unsigned long pstart = (unsigned long)&__start_wr_after_init;
+   unsigned long pend = (unsigned long)&__end_wr_after_init;
+
+   if (WARN((pstart & ~PAGE_MASK) || (pend & ~PAGE_MASK) ||
+(pstart >= pend), "Boundaries test failed."))
+   return false;
+   pr_info("Boundaries test passed.");
+   return true;
+}
+
+static bool test_pattern(void)
+{
+   return (memchr_inv(array, '0', PAGE_SIZE / 2) ||
+   memchr_inv(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 3 / 4) ||
+   memchr_inv(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2) ||
+   memchr_inv(array + PAGE_SIZE * 7 / 4, '1', PAGE_SIZE * 3 / 4) ||
+   memchr_inv(array + PAGE_SIZE * 5 / 2, '0', PAGE_SIZE / 2));
+}
+
+static bool test_wr_memset(void)
+{
+   int new_val = '1';
+
+   wr_memset(&scalar, new_val, sizeof(scalar));
+   if (WARN(memchr_inv(&scalar, new_val, sizeof(scalar)),
+"Scalar write rare memset test failed."))
+   return false;
+
+   pr_info("Scalar write rare memset test passed.");
+
+   wr_memset(array, '0', PAGE_SIZE * 3);
+   if (WARN(memchr_inv(array, '0', PAGE_SIZE * 3),
+"Array write rare memset test failed."))
+   return false;
+
+   wr_memset(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 2);
+   if (WARN(memchr_inv(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 2),
+"Array write rare memset test failed."))
+   return false;
+
+   wr_memset(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2);
+   if (WARN(memchr_inv(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2),
+"Array write rare memset test failed."))
+   return false;
+
+   if (WARN(test_pattern(), "Array write rare memset test failed."))
+   return false;
+
+   pr_info("Array write rare memset test passed.");
+   return true;
+}
+
+static u8 array_1[PAGE_SIZE * 2];
+static u8 array_2[PAGE_SIZE * 2];
+
+static bool test_wr_memcpy(void)
+{
+   int new_val = 0x12345678;
+
+   wr_assign(scalar, new_val);
+   if (WARN(memcmp(&scalar, &new_val, sizeof(scalar)),
+"Scalar write rare memcpy test failed."))
+   return false;
+   pr_info("Scalar write rare memcpy test passed.");
+
+   wr_memset(array, '0', PAGE_SIZE * 3);
+   memset(array_1, '1', P

[RFC PATCH v4 12/12] IMA: turn ima_policy_flags into __wr_after_init

2019-02-11 Thread Igor Stoppa
The policy flags could be targeted by an attacker aiming at disabling IMA,
so that there would be no trace of a file system modification in the
measurement list.

Since the flags can be altered at runtime, it is not possible to make
them become fully read-only, for example with __ro_after_init.

__wr_after_init can still provide some protection, at least against
simple memory overwrite attacks

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 security/integrity/ima/ima.h| 3 ++-
 security/integrity/ima/ima_policy.c | 9 +
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index cc12f3449a72..297c25f5122e 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "../integrity.h"
@@ -50,7 +51,7 @@ enum tpm_pcrs { TPM_PCR0 = 0, TPM_PCR8 = 8 };
 #define IMA_TEMPLATE_IMA_FMT "d|n"
 
 /* current content of the policy */
-extern int ima_policy_flag;
+extern int ima_policy_flag __wr_after_init;
 
 /* set during initialization */
 extern int ima_hash_algo;
diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
index 8bc8a1c8cb3f..d49c545b9cfb 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -48,7 +48,7 @@
 #define INVALID_PCR(a) (((a) < 0) || \
(a) >= (FIELD_SIZEOF(struct integrity_iint_cache, measured_pcrs) * 8))
 
-int ima_policy_flag;
+int ima_policy_flag __wr_after_init;
 static int temp_ima_appraise;
 static int build_ima_appraise __ro_after_init;
 
@@ -460,12 +460,13 @@ void ima_update_policy_flag(void)
 
list_for_each_entry(entry, ima_rules, list) {
if (entry->action & IMA_DO_MASK)
-   ima_policy_flag |= entry->action;
+   wr_assign(ima_policy_flag,
+ ima_policy_flag | entry->action);
}
 
ima_appraise |= (build_ima_appraise | temp_ima_appraise);
if (!ima_appraise)
-   ima_policy_flag &= ~IMA_APPRAISE;
+   wr_assign(ima_policy_flag, ima_policy_flag & ~IMA_APPRAISE);
 }
 
 static int ima_appraise_flag(enum ima_hooks func)
@@ -651,7 +652,7 @@ void ima_update_policy(void)
list_splice_tail_init_rcu(&ima_temp_rules, policy, synchronize_rcu);
 
if (ima_rules != policy) {
-   ima_policy_flag = 0;
+   wr_assign(ima_policy_flag, 0);
ima_rules = policy;
 
/*
-- 
2.19.1



[RFC PATCH v4 05/12] __wr_after_init: arm64: memset_user()

2019-02-11 Thread Igor Stoppa
arm64 specific version of memset() for user space, memset_user()

In the __wr_after_init scenario, write-rare variables have:
- a primary read-only mapping in kernel memory space
- an alternate, writable mapping, implemented as user-space mapping

The write rare implementation expects the arch code to provide a
memset_user() function, which is currently missing.

clear_user() is the base for memset_user()
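
To illustrate where it is expected to fit (just a sketch, not part of this
patch; it is a simplified version of the generic write-rare path, using the
helper names from the earlier revision of the series, with memset_user()
writing through the alternate, user-space mapping):

void *wr_memset(void *p, int c, __kernel_size_t len)
{
        wr_state_t wr_state;
        void __user *wr_poking_addr = (void __user *)__wr_addr(p);

        local_irq_disable();
        __wr_enable(&wr_state);                 /* activate the alternate mm */
        memset_user(wr_poking_addr, c, len);    /* the arch-provided helper */
        __wr_disable(&wr_state);
        local_irq_enable();
        return p;
}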

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 arch/arm64/include/asm/uaccess.h   |  9 +
 arch/arm64/lib/Makefile|  2 +-
 arch/arm64/lib/memset_user.S (new) | 63 
 3 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 547d7a0c9d05..0094f92a8f1b 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -415,6 +415,15 @@ extern unsigned long __must_check __arch_copy_in_user(void 
__user *to, const voi
 #define INLINE_COPY_TO_USER
 #define INLINE_COPY_FROM_USER
 
+extern unsigned long __must_check __arch_memset_user(void __user *to, int c, 
unsigned long n);
+static inline unsigned long __must_check __memset_user(void __user *to, int c, 
unsigned long n)
+{
+   if (access_ok(to, n))
+   n = __arch_memset_user(__uaccess_mask_ptr(to), c, n);
+   return n;
+}
+#define memset_user__memset_user
+
 extern unsigned long __must_check __arch_clear_user(void __user *to, unsigned 
long n);
 static inline unsigned long __must_check __clear_user(void __user *to, 
unsigned long n)
 {
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 5540a1638baf..614b090888de 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
-lib-y  := clear_user.o delay.o copy_from_user.o\
+lib-y  := clear_user.o memset_user.o delay.o copy_from_user.o  \
   copy_to_user.o copy_in_user.o copy_page.o\
   clear_page.o memchr.o memcpy.o memmove.o memset.o\
   memcmp.o strcmp.o strncmp.o strlen.o strnlen.o   \
diff --git a/arch/arm64/lib/memset_user.S b/arch/arm64/lib/memset_user.S
new file mode 100644
index ..1bfbda3d112b
--- /dev/null
+++ b/arch/arm64/lib/memset_user.S
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * memset_user.S - memset for userspace on arm64
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa 
+ *
+ * Based on arch/arm64/lib/clear_user.S
+ */
+
+#include 
+
+#include 
+
+   .text
+
+/* Prototype: int __arch_memset_user(void *addr, int c, size_t n)
+ * Purpose  : set n bytes of user memory at "addr" to the value "c"
+ * Params   : x0 - addr, user memory address to set
+ *  : x1 - c, byte value
+ *  : x2 - n, number of bytes to set
+ * Returns  : number of bytes NOT set
+ *
+ * Alignment fixed up by hardware.
+ */
+ENTRY(__arch_memset_user)
+   uaccess_enable_not_uao x3, x4, x5
+   // replicate the byte to the whole register
+   and x1, x1, 0xff
+   lsl x3, x1, 8
+   orr x1, x3, x1
+   lsl x3, x1, 16
+   orr x1, x3, x1
+   lsl x3, x1, 32
+   orr x1, x3, x1
+   mov x3, x2  // save the size for fixup return
+   subs x2, x2, #8
+   b.mi 2f
+1:
+uao_user_alternative 9f, str, sttr, x1, x0, 8
+   subs x2, x2, #8
+   b.pl 1b
+2: adds x2, x2, #4
+   b.mi 3f
+uao_user_alternative 9f, str, sttr, x1, x0, 4
+   sub x2, x2, #4
+3: adds x2, x2, #2
+   b.mi 4f
+uao_user_alternative 9f, strh, sttrh, w1, x0, 2
+   sub x2, x2, #2
+4: adds x2, x2, #1
+   b.mi 5f
+uao_user_alternative 9f, strb, sttrb, w1, x0, 0
+5: mov x0, #0
+   uaccess_disable_not_uao x3, x4
+   ret
+ENDPROC(__arch_memset_user)
+
+   .section .fixup,"ax"
+   .align  2
+9: mov x0, x3  // return the original size
+   ret
+   .previous
-- 
2.19.1



[RFC PATCH v4 02/12] __wr_after_init: x86_64: memset_user()

2019-02-11 Thread Igor Stoppa
x86_64 specific version of memset() for user space, memset_user()

In the __wr_after_init scenario, write-rare variables have:
- a primary read-only mapping in kernel memory space
- an alternate, writable mapping, implemented as user-space mapping

The write rare implementation expects the arch code to provide a
memset_user() function, which is currently missing.

clear_user() is the base for memset_user()

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/include/asm/uaccess_64.h |  6 
 arch/x86/lib/usercopy_64.c| 51 +
 2 files changed, 57 insertions(+)

diff --git a/arch/x86/include/asm/uaccess_64.h 
b/arch/x86/include/asm/uaccess_64.h
index a9d637bc301d..f194bfce4866 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -213,4 +213,10 @@ copy_user_handle_tail(char *to, char *from, unsigned len);
 unsigned long
 mcsafe_handle_tail(char *to, char *from, unsigned len);
 
+unsigned long __must_check
+memset_user(void __user *mem, int c, unsigned long len);
+
+unsigned long __must_check
+__memset_user(void __user *mem, int c, unsigned long len);
+
 #endif /* _ASM_X86_UACCESS_64_H */
diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index ee42bb0cbeb3..e61963585354 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -9,6 +9,57 @@
 #include 
 #include 
 
+/*
+ * Memset Userspace
+ */
+
+unsigned long __memset_user(void __user *addr, int c, unsigned long size)
+{
+   long __d0;
+   unsigned long  pattern = 0x0101010101010101UL * (0xFFUL & c);
+
+   might_fault();
+   /* no memory constraint: gcc doesn't know about this memory */
+   stac();
+   asm volatile(
+   "   movq %[pattern], %%rdx\n"
+   "   testq  %[size8],%[size8]\n"
+   "   jz 4f\n"
+   "0: mov %%rdx,(%[dst])\n"
+   "   addq   $8,%[dst]\n"
+   "   decl %%ecx ; jnz   0b\n"
+   "4: movq  %[size1],%%rcx\n"
+   "   testl %%ecx,%%ecx\n"
+   "   jz 2f\n"
+   "1: movb   %%dl,(%[dst])\n"
+   "   incq   %[dst]\n"
+   "   decl %%ecx ; jnz  1b\n"
+   "2:\n"
+   ".section .fixup,\"ax\"\n"
+   "3: lea 0(%[size1],%[size8],8),%[size8]\n"
+   "   jmp 2b\n"
+   ".previous\n"
+   _ASM_EXTABLE_UA(0b, 3b)
+   _ASM_EXTABLE_UA(1b, 2b)
+   : [size8] "=&c"(size), [dst] "=&D" (__d0)
+   : [size1] "r" (size & 7), "[size8]" (size / 8),
+ "[dst]" (addr), [pattern] "r" (pattern)
+   : "rdx");
+
+   clac();
+   return size;
+}
+EXPORT_SYMBOL(__memset_user);
+
+unsigned long memset_user(void __user *to, int c, unsigned long n)
+{
+   if (access_ok(to, n))
+   return __memset_user(to, c, n);
+   return n;
+}
+EXPORT_SYMBOL(memset_user);
+
+
 /*
  * Zero Userspace
  */
-- 
2.19.1



[RFC PATCH v4 06/12] __wr_after_init: arm64: enable

2019-02-11 Thread Igor Stoppa
Set ARCH_HAS_PRMEM to Y for arm64

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 arch/arm64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a4168d366127..7cbb2c133ed7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -66,6 +66,7 @@ config ARM64
select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
select ARCH_WANT_FRAME_POINTERS
select ARCH_HAS_UBSAN_SANITIZE_ALL
+   select ARCH_HAS_PRMEM
select ARM_AMBA
select ARM_ARCH_TIMER
select ARM_GIC
-- 
2.19.1



[RFC PATCH v4 01/12] __wr_after_init: Core and default arch

2019-02-11 Thread Igor Stoppa
The patch provides:
- the core functionality for write-rare after init for statically
  allocated data, based on code from Matthew Wilcox
- the default implementation for generic architecture
  A specific architecture can override one or more of the default
  functions.

The core (API) functions are:
- wr_memset(): write rare counterpart of memset()
- wr_memcpy(): write rare counterpart of memcpy()
- wr_assign(): write rare counterpart of the assignment ('=') operator
- wr_rcu_assign_pointer(): write rare counterpart of rcu_assign_pointer()

In case either the selected architecture doesn't support write rare
after init, or the functionality is disabled, the write rare functions
will resolve into their non-write rare counterpart:
- memset()
- memcpy()
- assignment operator
- rcu_assign_pointer()

For code that can be linked either as a module or as built-in (ex: device
driver init function), it is not possible to tell upfront which will be the
case. For this scenario, if the functions are called during system init,
they will automatically choose, at runtime, to go through the fast path of
non-write rare. Should they be invoked later, during module init, they
will use the write-rare path.
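
Roughly, each entry point starts with a check along these lines (sketch
only; wr_ready is the internal flag recording whether the alternate
mapping has been set up yet):

void *wr_memset(void *p, int c, __kernel_size_t len)
{
        if (unlikely(!wr_ready))        /* init not finished: plain write */
                return memset(p, c, len);
        /* ... otherwise go through the alternate, writable mapping ... */
}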

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 arch/Kconfig|   7 ++
 include/linux/prmem.h (new) |  71 +++
 mm/Makefile |   1 +
 mm/prmem.c (new)| 179 ++
 4 files changed, 258 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index b0b6d176f1c1..0380d4a64681 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -814,6 +814,13 @@ config ARCH_HAS_PRMEM
  architecture specific symbol stating that the architecture provides
  a back-end function for the write rare operation.
 
+config ARCH_HAS_PRMEM_HEADER
+   def_bool n
+   depends on ARCH_HAS_PRMEM
+   help
+ architecture specific symbol stating that the architecture provides
+ own specific header back-end for the write rare operation.
+
 config PRMEM
bool "Write protect critical data that doesn't need high write speed."
depends on ARCH_HAS_PRMEM
diff --git a/include/linux/prmem.h b/include/linux/prmem.h
new file mode 100644
index ..0e4683c503b9
--- /dev/null
+++ b/include/linux/prmem.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * prmem.h: Header for memory protection library - generic part
+ *
+ * (C) Copyright 2018-2019 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa 
+ */
+
+#ifndef _LINUX_PRMEM_H
+#define _LINUX_PRMEM_H
+
+#include 
+#include 
+
+#ifndef CONFIG_PRMEM
+
+static inline void *wr_memset(void *p, int c, __kernel_size_t n)
+{
+   return memset(p, c, n);
+}
+
+static inline void *wr_memcpy(void *p, const void *q, __kernel_size_t n)
+{
+   return memcpy(p, q, n);
+}
+
+#define wr_assign(var, val)((var) = (val))
+#define wr_rcu_assign_pointer(p, v)rcu_assign_pointer(p, v)
+
+#else
+
+#include 
+
+void *wr_memset(void *p, int c, __kernel_size_t n);
+void *wr_memcpy(void *p, const void *q, __kernel_size_t n);
+
+/**
+ * wr_assign() - sets a write-rare variable to a specified value
+ * @var: the variable to set
+ * @val: the new value
+ *
+ * Returns: the variable
+ */
+
+#define wr_assign(dst, val) ({ \
+   typeof(dst) tmp = (typeof(dst))val; \
+   \
+   wr_memcpy(&dst, &tmp, sizeof(dst)); \
+   dst;\
+})
+
+/**
+ * wr_rcu_assign_pointer() - initialize a pointer in rcu mode
+ * @p: the rcu pointer - it MUST be aligned to a machine word
+ * @v: the new value
+ *
+ * Returns the value assigned to the rcu pointer.
+ *
+ * It is provided as macro, to match rcu_assign_pointer()
+ * The rcu_assign_pointer() is implemented as equivalent of:
+ *
+ * smp_mb();
+ * WRITE_ONCE();
+ */
+#define wr_rcu_assign_pointer(p, v) ({ \
+   smp_mb();   \
+   wr_assign(p, v);\
+   p;  \
+})
+#endif
+#endif
diff --git a/mm/Makefile b/mm/Makefile
index d210cc9d6f80..ef3867c16ce0 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -58,6 +58,7 @@ obj-$(CONFIG_SPARSEMEM)   += sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_PRMEM) += prmem.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/prmem.c b/mm/prmem.c
new file mode 100644
index ..9383b7d6951e
--- /dev/null
+++ b/mm/prmem.c
@@ -0,0 +1,179 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+

[RFC PATCH v4 04/12] __wr_after_init: x86_64: enable

2019-02-11 Thread Igor Stoppa
Set ARCH_HAS_PRMEM to Y for x86_64

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 68261430fe6e..7392b53b12c2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -32,6 +32,7 @@ config X86_64
select SWIOTLB
select X86_DEV_DMA_OPS
select ARCH_HAS_SYSCALL_WRAPPER
+   select ARCH_HAS_PRMEM
 
 #
 # Arch settings
-- 
2.19.1



[RFC PATCH v4 07/12] __wr_after_init: Documentation: self-protection

2019-02-11 Thread Igor Stoppa
Update the self-protection documentation, to mention also the use of the
__wr_after_init attribute.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 Documentation/security/self-protection.rst | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/Documentation/security/self-protection.rst 
b/Documentation/security/self-protection.rst
index f584fb74b4ff..df2614bc25b9 100644
--- a/Documentation/security/self-protection.rst
+++ b/Documentation/security/self-protection.rst
@@ -84,12 +84,14 @@ For variables that are initialized once at ``__init`` time, 
these can
 be marked with the (new and under development) ``__ro_after_init``
 attribute.
 
-What remains are variables that are updated rarely (e.g. GDT). These
-will need another infrastructure (similar to the temporary exceptions
-made to kernel code mentioned above) that allow them to spend the rest
-of their lifetime read-only. (For example, when being updated, only the
-CPU thread performing the update would be given uninterruptible write
-access to the memory.)
+Others, which are statically allocated, but still need to be updated
+rarely, can be marked with the ``__wr_after_init`` attribute.
+
+The update mechanism must avoid exposing the data to rogue alterations
+during the update. For example, only the CPU thread performing the update
+would be given uninterruptible write access to the memory.
+
+Currently there is no protection available for data allocated dynamically.
 
 Segregation of kernel memory from userspace memory
 ~~
-- 
2.19.1



[RFC PATCH v4 09/12] __wr_after_init: rodata_test: refactor tests

2019-02-11 Thread Igor Stoppa
Refactor the test cases, in preparation for using them also for testing
__wr_after_init memory, when available.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 mm/rodata_test.c | 48 
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/mm/rodata_test.c b/mm/rodata_test.c
index d908c8769b48..e1349520b436 100644
--- a/mm/rodata_test.c
+++ b/mm/rodata_test.c
@@ -14,44 +14,52 @@
 #include 
 #include 
 
-static const int rodata_test_data = 0xC3;
+#define INIT_TEST_VAL 0xC3
 
-void rodata_test(void)
+static const int rodata_test_data = INIT_TEST_VAL;
+
+static bool test_data(char *data_type, const int *data,
+ unsigned long start, unsigned long end)
 {
-   unsigned long start, end;
int zero = 0;
 
/* test 1: read the value */
/* If this test fails, some previous testrun has clobbered the state */
-   if (!rodata_test_data) {
-   pr_err("test 1 fails (start data)\n");
-   return;
+   if (*data != INIT_TEST_VAL) {
+   pr_err("%s: test 1 fails (init data value)\n", data_type);
+   return false;
}
 
/* test 2: write to the variable; this should fault */
-   if (!probe_kernel_write((void *)&rodata_test_data,
-   (void *)&zero, sizeof(zero))) {
-   pr_err("test data was not read only\n");
-   return;
+   if (!probe_kernel_write((void *)data, (void *)&zero, sizeof(zero))) {
+   pr_err("%s: test data was not read only\n", data_type);
+   return false;
}
 
/* test 3: check the value hasn't changed */
-   if (rodata_test_data == zero) {
-   pr_err("test data was changed\n");
-   return;
+   if (*data != INIT_TEST_VAL) {
+   pr_err("%s: test data was changed\n", data_type);
+   return false;
}
 
/* test 4: check if the rodata section is PAGE_SIZE aligned */
-   start = (unsigned long)__start_rodata;
-   end = (unsigned long)__end_rodata;
if (start & (PAGE_SIZE - 1)) {
-   pr_err("start of .rodata is not page size aligned\n");
-   return;
+   pr_err("%s: start of data is not page size aligned\n",
+  data_type);
+   return false;
}
if (end & (PAGE_SIZE - 1)) {
-   pr_err("end of .rodata is not page size aligned\n");
-   return;
+   pr_err("%s: end of data is not page size aligned\n",
+  data_type);
+   return false;
}
+   pr_info("%s tests were successful", data_type);
+   return true;
+}
 
-   pr_info("all tests were successful\n");
+void rodata_test(void)
+{
+   test_data("rodata", &rodata_test_data,
+ (unsigned long)&__start_rodata,
+ (unsigned long)&__end_rodata);
 }
-- 
2.19.1



[RFC PATCH v4 10/12] __wr_after_init: rodata_test: test __wr_after_init

2019-02-11 Thread Igor Stoppa
The write protection of the __wr_after_init data can be verified with the
same methodology used for const data.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 mm/rodata_test.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/mm/rodata_test.c b/mm/rodata_test.c
index e1349520b436..a669cf9f5a61 100644
--- a/mm/rodata_test.c
+++ b/mm/rodata_test.c
@@ -16,8 +16,23 @@
 
 #define INIT_TEST_VAL 0xC3
 
+/*
+ * Note: __ro_after_init data is, for every practical effect, equivalent to
+ * const data, since they are even write protected at the same time; there
+ * is no need for separate testing.
+ * __wr_after_init data, otoh, is altered also after the write protection
+ * takes place and it cannot be exploitable for altering more permanent
+ * data.
+ */
+
 static const int rodata_test_data = INIT_TEST_VAL;
 
+#ifdef CONFIG_PRMEM
+static int wr_after_init_test_data __wr_after_init = INIT_TEST_VAL;
+extern long __start_wr_after_init;
+extern long __end_wr_after_init;
+#endif
+
 static bool test_data(char *data_type, const int *data,
  unsigned long start, unsigned long end)
 {
@@ -59,7 +74,13 @@ static bool test_data(char *data_type, const int *data,
 
 void rodata_test(void)
 {
-   test_data("rodata", &rodata_test_data,
- (unsigned long)&__start_rodata,
- (unsigned long)&__end_rodata);
+   if (!test_data("rodata", &rodata_test_data,
+  (unsigned long)&__start_rodata,
+  (unsigned long)&__end_rodata))
+   return;
+#ifdef CONFIG_PRMEM
+   test_data("wr after init data", &wr_after_init_test_data,
+ (unsigned long)&__start_wr_after_init,
+ (unsigned long)&__end_wr_after_init);
+#endif
 }
-- 
2.19.1



[RFC PATCH v4 00/12] hardening: statically allocated protected memory

2019-02-11 Thread Igor Stoppa
To: Andy Lutomirski ,
To: Matthew Wilcox ,
To: Nadav Amit 
To: Peter Zijlstra ,
To: Dave Hansen ,
To: Mimi Zohar 
To: Thiago Jung Bauermann 
CC: Kees Cook 
CC: Ahmed Soliman 
CC: linux-integrity 
CC: Kernel Hardening 
CC: Linux-MM 
CC: Linux Kernel Mailing List 

Hello,
at last I'm able to resume work on the memory protection patchset I've
proposed some time ago. This version should address comments received so
far and introduce support for arm64. Details below.

Patch-set implementing write-rare memory protection for statically
allocated data.
Its purpose is to keep write protected the kernel data which is seldom
modified, especially if altering it can be exploited during an attack.

There is no read overhead, however writing requires special operations that
are probably unsuitable for often-changing data.
The use is opt-in, by applying the modifier __wr_after_init to a variable
declaration.

As the name implies, the write protection kicks in only after init() is
completed; before that moment, the data is modifiable in the usual way.
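
For illustration, a minimal usage sketch (the names below are made up;
only the __wr_after_init marker and the wr_* calls come from the patches):

/* becomes (mostly) read-only once mark_rodata_ro() has run */
static int my_flags __wr_after_init;

static int __init my_init(void)
{
        my_flags = 0x1;         /* during init: plain write is still fine */
        return 0;
}

static void my_runtime_update(int extra)
{
        /* after init: updates must go through the write-rare API */
        wr_assign(my_flags, my_flags | extra);
}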

Current Limitations:
* supports only data which is allocated statically, at build time.
* supports only x86_64 and arm64; other architectures need to provide their own
  backend

Some notes:
- in case an architecture doesn't support write rare, the behavior is to
  fallback to regular write operations
- before altering any memory, the destination is sanitized
- write rare data is segregated into own set of pages
- only x86_64 and arm64 supported, atm
- the memset_user() assembly functions seem to work, but I'm not too sure
  they are really ok
- I've added a simple example: the protection of ima_policy_flags
- the last patch is optional, but it seemed worth to do the refactoring
- the x86_64 user space address range is double the size of the kernel
  address space, so it's possible to randomize the beginning of the
  mapping of the kernel address space, but on arm64 they have the same
  size, so it's not possible to do the same
- I'm not sure if it's correct, since it doesn't seem to be that common in
  kernel sources, but instead of using #defines for overriding default
  function calls, I'm using "weak" for the default functions.
- unaddressed: Nadav proposed to do:
#define __wr  __attribute__((address_space(5)))
  but I don't know exactly where to use it atm

Changelog:

v3->v4
--

* added function for setting memory in user space mapping for arm64
* refactored code, to work with both supported architectures
* reduced dependency on x86_64 specific code, to support by default also
  arm64
* improved memset_user() for x86_64, but I'm not sure if I understood
  correctly what was the best way to enhance it.

v2->v3
--

* both wr_memset and wr_memcpy are implemented as generic functions
  the arch code must provide suitable helpers
* regular initialization for ima_policy_flags: it happens during init
* remove spurious code from the initialization function

v1->v2
--

* introduce cleaner split between generic and arch code
* add x86_64 specific memset_user()
* replace kernel-space memset() memcopy() with userspace counterpart
* randomize the base address for the alternate map across the entire
  available address range from user space (128TB - 64TB)
* convert BUG() to WARN()
* turn verification of written data into debugging option
* wr_rcu_assign_pointer() as special case of wr_assign()
* example with protection of ima_policy_flags
* documentation

Igor Stoppa (12):
  __wr_after_init: Core and default arch
  __wr_after_init: x86_64: memset_user()
  __wr_after_init: x86_64: randomize mapping offset
  __wr_after_init: x86_64: enable
  __wr_after_init: arm64: memset_user()
  __wr_after_init: arm64: enable
  __wr_after_init: Documentation: self-protection
  __wr_after_init: lkdtm test
  __wr_after_init: rodata_test: refactor tests
  __wr_after_init: rodata_test: test __wr_after_init
  __wr_after_init: test write rare functionality
  IMA: turn ima_policy_flags into __wr_after_init

 Documentation/security/self-protection.rst |  14 +-
 arch/Kconfig   |   7 +
 arch/arm64/Kconfig |   1 +
 arch/arm64/include/asm/uaccess.h   |   9 ++
 arch/arm64/lib/Makefile|   2 +-
 arch/arm64/lib/memset_user.S (new) |  63 
 arch/x86/Kconfig   |   1 +
 arch/x86/include/asm/uaccess_64.h  |   6 +
 arch/x86/lib/usercopy_64.c |  51 ++
 arch/x86/mm/Makefile   |   2 +
 arch/x86/mm/prmem.c (new)  |  20 +++
 drivers/misc/lkdtm/core.c  |   3 +
 drivers/misc/lkdtm/lkdtm.h |   3 +
 drivers/misc/lkdtm/perms.c |  29 
 include/linux/prmem.h (new)|  71 
 mm/Kconfig.debug   |   8 +
 mm/Makefile|   2 +
 mm/prmem.c (new)   | 179 ++

Re: [PATCH 03/12] __wr_after_init: generic header

2018-12-22 Thread Igor Stoppa




On 21/12/2018 21:45, Matthew Wilcox wrote:

On Fri, Dec 21, 2018 at 11:38:16AM -0800, Nadav Amit wrote:

On Dec 19, 2018, at 1:33 PM, Igor Stoppa  wrote:

+static inline void *wr_memset(void *p, int c, __kernel_size_t len)
+{
+   return __wr_op((unsigned long)p, (unsigned long)c, len, WR_MEMSET);
+}


What do you think about doing something like:

#define __wr  __attribute__((address_space(5)))

And then make all the pointers to write-rarely memory to use this attribute?
It might require more changes to the code, but can prevent bugs.


I like this idea.  It was something I was considering suggesting.


I have been thinking about this sort of problem, although from a bit 
different angle:


1) enforcing alignment for pointers
This can be implemented in a similar way, by creating a multi-attribute 
that would define section, address space (as said here) and alignment.
However I'm not sure if it's possible to do anything to enforce the 
alignment of a pointer field within a structure. I haven't had time to 
look into this yet.
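
Something along these lines, I suppose (only a sketch, combining the
address_space(5) idea with the section already used by __wr_after_init;
the per-field alignment is exactly the part I have not verified):

#ifdef __CHECKER__
#define __wr    __attribute__((address_space(5)))
#else
#define __wr    __attribute__((__section__(".data..wr_after_init"), \
                               __aligned__(sizeof(long))))
#endif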


2) validation of the correctness of the actual value
Inside the kernel code, a function is not supposed to sanitize its 
arguments, as long as they come from some other trusted part of the 
kernel, rather than say from userspace or from some HW interface.

However, ROP/JOP should be considered.

I am aware of various efforts to make it harder to exploit these 
techniques, like signed pointers, CFI plugins, LTO.


But they are not necessarily available on every platform and mostly, 
afaik, they focus on specific type of attacks.



LTO can help with global optimizations, for example inlining functions 
across different objects.


CFI can detect jumps into the middle of a function, rather than a proper 
function invocation through its natural entry point.


Signed pointers can prevent data-based attacks to the execution flow, 
and they might have a role in preventing the attack I have in mind, but 
they are not available on all platforms.


What I'd like to do, is to verify, at runtime, that the pointer belongs 
to the type that the receiving function is meant for.


Ex: a legitimate __wr_after_init data must exist between 
__start_wr_after_init and __end_wr_after_init


That is easier and cleaner to test, imho.
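
Something like this (sketch, using the existing section labels):

static inline bool is_wr_after_init(const void *p, size_t size)
{
        unsigned long low = (unsigned long)&__start_wr_after_init;
        unsigned long high = (unsigned long)&__end_wr_after_init;
        unsigned long addr = (unsigned long)p;

        return addr >= low && (addr + size) <= high;
}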

But dynamically allocated memory doesn't have any such constraint.
If it was possible to introduce, for example, a flag to pass to vmalloc, 
to get the vmap_area from within a specific address range, it would 
reduce the attack surface.


In the implementation I have right now, I'm using extra flags for the 
pmalloc pages, which means the metadata is the new target for an attack.


But with adding the constraint that a dynamically allocated protected 
memory page must be within a range, then the attacker must change the 
underlying PTE. And if a region of PTEs are all part of protected 
memory, it is possible to make the PMD write rare.


--
igor


Re: [PATCH 03/12] __wr_after_init: generic functionality

2018-12-21 Thread Igor Stoppa




On 21/12/2018 21:43, Matthew Wilcox wrote:

On Fri, Dec 21, 2018 at 09:07:54PM +0200, Igor Stoppa wrote:

On 21/12/2018 20:41, Matthew Wilcox wrote:

On Fri, Dec 21, 2018 at 08:14:14PM +0200, Igor Stoppa wrote:

+static inline int memtst(void *p, int c, __kernel_size_t len)


I don't understand why you're verifying that writes actually happen
in production code.  Sure, write lib/test_wrmem.c or something, but
verifying every single rare write seems like a mistake to me.


This is actually something I wrote more as a stop-gap.
I have the feeling there should be already something similar available.
And probably I could not find it. Unless it's so trivial that it doesn't
deserve to become a function?

But if there is really no existing alternative, I can put it in a separate
file.


I'm not questioning the implementation, I'm questioning why it's ever
called.  If I type 'p = q', I don't then verify that p actually is equal
to q.  I just assume that the compiler did its job. 


Paranoia, probably.

My thinking is that, once the data is protected, it could still be 
attacked through the metadata. A pte, for example.
Preventing the setting of a flag, that for example enables a 
functionality, might be a nice way to thwart all this protection.


If I verify that the write was successful, through the read-only 
address, then I know that the action really completed successfully.


There are many more types of attack that one can come up with, but 
attacking the metadata is probably the most likely next level.


So what I'm trying to do is more akin to:

p = &d;
*p = q;
d == q;

But in our case there is an indefinite amount of time between the 
creation of the alternate mapping and its use.

Another way could be to check that the mapping is correct before writing 
to it. Maybe safer? I went for confirming that the end result is correct.


Of course it adds overhead, but if the whole thing is already slow and 
happening not too often, how much does it matter?


An alternative approach would be that the code invoking the wr operation 
performs an explicit test.


Would it look better if I implemented this as a wr_assign_verify() 
inline function?
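
Something like this, I mean (sketch; the name wr_assign_verify() is just a
proposal, reusing wr_assign() plus the same comparison as the debug option):

#define wr_assign_verify(var, val) ({                                  \
        typeof(var) tmp = (typeof(var))(val);                          \
                                                                       \
        wr_assign(var, tmp);                                           \
        WARN_ONCE(memcmp(&var, &tmp, sizeof(var)),                     \
                  "Failed wr_assign_verify()");                        \
        var;                                                           \
})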



+#ifndef CONFIG_PRMEM


So is this PRMEM or wr_mem?  It's not obvious that CONFIG_PRMEM controls
wrmem.


In my mind (maybe still clinging to the old implementation), PRMEM is the
master toggle, for protected memory.

Then there are various types and the first one being now implemented is
write rare after init (because ro after init already exists).

However, the same levels of protection should then follow for dynamically
allocated memory (ye old pmalloc).

PRMEM would then become the moniker for the whole shebang.


To my mind, what we have in this patchset is support for statically
allocated protected (or write-rare) memory.  Later, we'll add dynamically
allocated protected memory.  So it's all protected memory, and we'll
use the same accessors for both ... right?


The static one is only write rare because read only after init already 
exists.


The dynamic one must introduce the same write rare, yes, but it should 
also introduce read_only (I do not count the destruction of an entire 
pool as a write rare operation). Ex: SELinux policyDB.


write rare, regardless if dynamic or static, is a sub-case of protected 
memory, hence the differentiation between protected and write rare.


I'm not claiming to be particularly skilled at choosing names, so if 
something better sounding is available, it can be used.

This is the best I could come up with.

[...]


I don't think there's anything to be done in that case.  Indeed,
I think the only thing to do is panic and stop the whole machine if
initialisation fails.  We'd be in a situation where nothing can update
protected memory, and the machine just won't work.

I suppose we could "fail insecure" and never protect the memory, but I
think that's asking for trouble.


ok, so init will BUG() if it fails, instead of the current WARN_ONCE() 
and return.



Anyway, my concern was for a driver which can be built either as a
module or built-in.  Its init code will be called before write-protection
happens when it's built in, and after write-protection happens when it's
a module.  It should be able to use wr_assign() in either circumstance.
One might also have a utility function which is called from both init
and non-init code and want to use wr_assign() whether initialisation
has completed or not.


If the writable mapping is created early enough, the only penalty for 
using the write-rare function on a writable variable is that it would be 
slower. Probably there wouldn't be so much data to deal with.


If the driver is dealing with some HW, most likely that would make any 
write rare extra delay look negligible.


--
igor


Re: [PATCH 03/12] __wr_after_init: generic functionality

2018-12-21 Thread Igor Stoppa




On 21/12/2018 20:41, Matthew Wilcox wrote:

On Fri, Dec 21, 2018 at 08:14:14PM +0200, Igor Stoppa wrote:

+static inline int memtst(void *p, int c, __kernel_size_t len)


I don't understand why you're verifying that writes actually happen
in production code.  Sure, write lib/test_wrmem.c or something, but
verifying every single rare write seems like a mistake to me.


This is actually something I wrote more as a stop-gap.
I have the feeling there should be already something similar available.
And probably I could not find it. Unless it's so trivial that it doesn't 
deserve to become a function?


But if there is really no existing alternative, I can put it in a 
separate file.
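
For reference, this is more or less all it does (it returns non-zero as
soon as a byte differs from the expected value):

static inline int memtst(void *p, int c, __kernel_size_t len)
{
        __kernel_size_t i;

        for (i = 0; i < len; i++)
                if (((u8 *)p)[i] != (u8)c)
                        return -1;
        return 0;
}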





+#ifndef CONFIG_PRMEM


So is this PRMEM or wr_mem?  It's not obvious that CONFIG_PRMEM controls
wrmem.


In my mind (maybe still clinging to the old implementation), PRMEM is 
the master toggle, for protected memory.


Then there are various types and the first one being now implemented is 
write rare after init (because ro after init already exists).


However, the same levels of protection should then follow for 
dynamically allocated memory (ye old pmalloc).


PRMEM would then become the moniker for the whole shebang.


+#define wr_assign(var, val)((var) = (val))


The hamming distance between 'var' and 'val' is too small.  The convention
in the line immediately below (p and v) is much more readable.


ok, I'll fix it


+#define wr_rcu_assign_pointer(p, v)rcu_assign_pointer(p, v)
+#define wr_assign(var, val) ({ \
+   typeof(var) tmp = (typeof(var))val; \
+   \
+   wr_memcpy(&var, &tmp, sizeof(var)); \
+   var;\
+})


Doesn't wr_memcpy return 'var' anyway?


It should return the destination, which is &var

But I wanted to return the actual value of the assignment, val

Like if I do  (a = 7)  it evaluates to 7,

similarly wr_assign(a, 7) would also evaluate to 7

The reason why I returned var instead of val is that it would allow 
detecting any error.



+/**
+ * wr_memcpy() - copyes size bytes from q to p


typo


:-( thanks


+ * @p: beginning of the memory to write to
+ * @q: beginning of the memory to read from
+ * @size: amount of bytes to copy
+ *
+ * Returns pointer to the destination



+ * The architecture code must provide:
+ *   void __wr_enable(wr_state_t *state)
+ *   void *__wr_addr(void *addr)
+ *   void *__wr_memcpy(void *p, const void *q, __kernel_size_t size)
+ *   void __wr_disable(wr_state_t *state)


This section shouldn't be in the user documentation of wr_memcpy().


ok


+ */
+void *wr_memcpy(void *p, const void *q, __kernel_size_t size)
+{
+   wr_state_t wr_state;
+   void *wr_poking_addr = __wr_addr(p);
+
+   if (WARN_ONCE(!wr_ready, "No writable mapping available") ||


Surely not.  If somebody's called wr_memcpy() before wr_ready is set,
that means we can just call memcpy().



What I was trying to catch is the case where, after a failed init, the 
writable mapping doesn't exist. In that case wr_ready is also not set.


The problem is that I just don't know what to do in a case where there 
has been such a major error which prevents the creation of the alternate 
mapping.


I understand that we still want to continue, to provide as much debug 
info as possible, but I am at a loss about finding the sanest course of 
action.


--
igor


Re: [PATCH 01/12] x86_64: memset_user()

2018-12-21 Thread Igor Stoppa




On 21/12/2018 20:25, Matthew Wilcox wrote:

On Fri, Dec 21, 2018 at 08:14:12PM +0200, Igor Stoppa wrote:

+unsigned long __memset_user(void __user *addr, int c, unsigned long size)
+{
+   long __d0;
+   unsigned long  pattern = 0;
+   int i;
+
+   for (i = 0; i < 8; i++)
+   pattern = (pattern << 8) | (0xFF & c);


That's inefficient.

pattern = (unsigned char)c;
pattern |= pattern << 8;
pattern |= pattern << 16;
pattern |= pattern << 32;


ok, thank you

--
igor


[PATCH 04/12] __wr_after_init: debug writes

2018-12-21 Thread Igor Stoppa
After each write operation, confirm that it was successful, otherwise
generate a warning.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 mm/Kconfig.debug | 8 
 mm/prmem.c   | 6 ++
 2 files changed, 14 insertions(+)

diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index 9a7b8b049d04..b10305cfac3c 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -94,3 +94,11 @@ config DEBUG_RODATA_TEST
 depends on STRICT_KERNEL_RWX
 ---help---
   This option enables a testcase for the setting rodata read-only.
+
+config DEBUG_PRMEM
+bool "Verify each write rare operation."
+depends on PRMEM
+default n
+help
+  After any write rare operation, compares the data written with the
+  value provided by the caller.
diff --git a/mm/prmem.c b/mm/prmem.c
index e1c1be3a1171..51f6776e2515 100644
--- a/mm/prmem.c
+++ b/mm/prmem.c
@@ -61,6 +61,9 @@ void *wr_memcpy(void *p, const void *q, __kernel_size_t size)
__wr_enable(&wr_state);
__wr_memcpy(wr_poking_addr, q, size);
__wr_disable(&wr_state);
+#ifdef CONFIG_DEBUG_PRMEM
+   VM_WARN_ONCE(memcmp(p, q, size), "Failed %s()", __func__);
+#endif
local_irq_enable();
return p;
 }
@@ -92,6 +95,9 @@ void *wr_memset(void *p, int c, __kernel_size_t len)
__wr_enable(&wr_state);
__wr_memset(wr_poking_addr, c, len);
__wr_disable(&wr_state);
+#ifdef CONFIG_DEBUG_PRMEM
+   VM_WARN_ONCE(memtst(p, c, len), "Failed %s()", __func__);
+#endif
local_irq_enable();
return p;
 }
-- 
2.19.1



[PATCH 11/12] IMA: turn ima_policy_flags into __wr_after_init

2018-12-21 Thread Igor Stoppa
The policy flags could be targeted by an attacker aiming at disabling IMA,
so that there would be no trace of a file system modification in the
measurement list.

Since the flags can be altered at runtime, it is not possible to make
them become fully read-only, for example with __ro_after_init.

__wr_after_init can still provide some protection, at least against
simple memory overwrite attacks

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 security/integrity/ima/ima.h| 3 ++-
 security/integrity/ima/ima_policy.c | 9 +
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index cc12f3449a72..297c25f5122e 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "../integrity.h"
@@ -50,7 +51,7 @@ enum tpm_pcrs { TPM_PCR0 = 0, TPM_PCR8 = 8 };
 #define IMA_TEMPLATE_IMA_FMT "d|n"
 
 /* current content of the policy */
-extern int ima_policy_flag;
+extern int ima_policy_flag __wr_after_init;
 
 /* set during initialization */
 extern int ima_hash_algo;
diff --git a/security/integrity/ima/ima_policy.c 
b/security/integrity/ima/ima_policy.c
index 7489cb7de6dc..2004de818d92 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -47,7 +47,7 @@
 #define INVALID_PCR(a) (((a) < 0) || \
(a) >= (FIELD_SIZEOF(struct integrity_iint_cache, measured_pcrs) * 8))
 
-int ima_policy_flag;
+int ima_policy_flag __wr_after_init;
 static int temp_ima_appraise;
 static int build_ima_appraise __ro_after_init;
 
@@ -452,12 +452,13 @@ void ima_update_policy_flag(void)
 
list_for_each_entry(entry, ima_rules, list) {
if (entry->action & IMA_DO_MASK)
-   ima_policy_flag |= entry->action;
+   wr_assign(ima_policy_flag,
+ ima_policy_flag | entry->action);
}
 
ima_appraise |= (build_ima_appraise | temp_ima_appraise);
if (!ima_appraise)
-   ima_policy_flag &= ~IMA_APPRAISE;
+   wr_assign(ima_policy_flag, ima_policy_flag & ~IMA_APPRAISE);
 }
 
 static int ima_appraise_flag(enum ima_hooks func)
@@ -574,7 +575,7 @@ void ima_update_policy(void)
list_splice_tail_init_rcu(&ima_temp_rules, policy, synchronize_rcu);
 
if (ima_rules != policy) {
-   ima_policy_flag = 0;
+   wr_assign(ima_policy_flag, 0);
ima_rules = policy;
}
ima_update_policy_flag();
-- 
2.19.1



[PATCH 10/12] __wr_after_init: test write rare functionality

2018-12-21 Thread Igor Stoppa
Set of test cases meant to confirm that the write rare functionality
works as expected.
It can be optionally compiled as module.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 mm/Kconfig.debug |   8 +++
 mm/Makefile  |   1 +
 mm/test_write_rare.c | 135 +++
 3 files changed, 144 insertions(+)
 create mode 100644 mm/test_write_rare.c

diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index b10305cfac3c..ae018e56c4e4 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -102,3 +102,11 @@ config DEBUG_PRMEM
 help
   After any write rare operation, compares the data written with the
   value provided by the caller.
+
+config DEBUG_PRMEM_TEST
+tristate "Run self test for statically allocated protected memory"
+depends on PRMEM
+default n
+help
+  Tries to verify that the protection for statically allocated memory
+  works correctly and that the memory is effectively protected.
diff --git a/mm/Makefile b/mm/Makefile
index ef3867c16ce0..8de1d468f4e7 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 obj-$(CONFIG_PRMEM) += prmem.o
+obj-$(CONFIG_DEBUG_PRMEM_TEST) += test_write_rare.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/test_write_rare.c b/mm/test_write_rare.c
new file mode 100644
index ..30574bc34a20
--- /dev/null
+++ b/mm/test_write_rare.c
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * test_write_rare.c
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+extern long __start_wr_after_init;
+extern long __end_wr_after_init;
+
+static __wr_after_init int scalar = '0';
+static __wr_after_init u8 array[PAGE_SIZE * 3] __aligned(PAGE_SIZE);
+
+/* The section must occupy a non-zero number of whole pages */
+static bool test_alignment(void)
+{
+   unsigned long pstart = (unsigned long)&__start_wr_after_init;
+   unsigned long pend = (unsigned long)&__end_wr_after_init;
+
+   if (WARN((pstart & ~PAGE_MASK) || (pend & ~PAGE_MASK) ||
+(pstart >= pend), "Boundaries test failed."))
+   return false;
+   pr_info("Boundaries test passed.");
+   return true;
+}
+
+static bool test_pattern(void)
+{
+   return (memtst(array, '0', PAGE_SIZE / 2) ||
+   memtst(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 3 / 4) ||
+   memtst(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2) ||
+   memtst(array + PAGE_SIZE * 7 / 4, '1', PAGE_SIZE * 3 / 4) ||
+   memtst(array + PAGE_SIZE * 5 / 2, '0', PAGE_SIZE / 2));
+}
+
+static bool test_wr_memset(void)
+{
+   int new_val = '1';
+
+   wr_memset(&scalar, new_val, sizeof(scalar));
+   if (WARN(memtst(&scalar, new_val, sizeof(scalar)),
+"Scalar write rare memset test failed."))
+   return false;
+
+   pr_info("Scalar write rare memset test passed.");
+
+   wr_memset(array, '0', PAGE_SIZE * 3);
+   if (WARN(memtst(array, '0', PAGE_SIZE * 3),
+"Array write rare memset test failed."))
+   return false;
+
+   wr_memset(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 2);
+   if (WARN(memtst(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 2),
+"Array write rare memset test failed."))
+   return false;
+
+   wr_memset(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2);
+   if (WARN(memtst(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2),
+"Array write rare memset test failed."))
+   return false;
+
+   if (WARN(test_pattern(), "Array write rare memset test failed."))
+   return false;
+
+   pr_info("Array write rare memset test passed.");
+   return true;
+}
+
+static u8 array_1[PAGE_SIZE * 2];
+static u8 array_2[PAGE_SIZE * 2];
+
+static bool test_wr_memcpy(void)
+{
+   int new_val = 0x12345678;
+
+   wr_assign(scalar, new_val);
+   if (WARN(memcmp(&scalar, &new_val, sizeof(scalar)),
+"Scalar write rare memcpy test failed."))
+   return false;
+   pr_info("Scalar write rare memcpy test passed.");
+
+   wr_memset(array, '0', PAGE_SIZE * 3);
+   memset(array_1, '1', PAGE_SIZE * 2);
+   memset

[PATCH 07/12] __wr_after_init: lkdtm test

2018-12-21 Thread Igor Stoppa
Verify that trying to modify a variable with the __wr_after_init
attribute will cause a crash.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 drivers/misc/lkdtm/core.c  |  3 +++
 drivers/misc/lkdtm/lkdtm.h |  3 +++
 drivers/misc/lkdtm/perms.c | 29 +
 3 files changed, 35 insertions(+)

diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index 2837dc77478e..73c34b17c433 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -155,6 +155,9 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(ACCESS_USERSPACE),
CRASHTYPE(WRITE_RO),
CRASHTYPE(WRITE_RO_AFTER_INIT),
+#ifdef CONFIG_PRMEM
+   CRASHTYPE(WRITE_WR_AFTER_INIT),
+#endif
CRASHTYPE(WRITE_KERN),
CRASHTYPE(REFCOUNT_INC_OVERFLOW),
CRASHTYPE(REFCOUNT_ADD_OVERFLOW),
diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
index 3c6fd327e166..abba2f52ffa6 100644
--- a/drivers/misc/lkdtm/lkdtm.h
+++ b/drivers/misc/lkdtm/lkdtm.h
@@ -38,6 +38,9 @@ void lkdtm_READ_BUDDY_AFTER_FREE(void);
 void __init lkdtm_perms_init(void);
 void lkdtm_WRITE_RO(void);
 void lkdtm_WRITE_RO_AFTER_INIT(void);
+#ifdef CONFIG_PRMEM
+void lkdtm_WRITE_WR_AFTER_INIT(void);
+#endif
 void lkdtm_WRITE_KERN(void);
 void lkdtm_EXEC_DATA(void);
 void lkdtm_EXEC_STACK(void);
diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index 53b85c9d16b8..f681730aa652 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* Whether or not to fill the target memory area with do_nothing(). */
@@ -27,6 +28,10 @@ static const unsigned long rodata = 0xAA55AA55;
 /* This is marked __ro_after_init, so it should ultimately be .rodata. */
 static unsigned long ro_after_init __ro_after_init = 0x55AA5500;
 
+/* This is marked __wr_after_init, so it should be in .rodata. */
+static
+unsigned long wr_after_init __wr_after_init = 0x55AA5500;
+
 /*
  * This just returns to the caller. It is designed to be copied into
  * non-executable memory regions.
@@ -104,6 +109,28 @@ void lkdtm_WRITE_RO_AFTER_INIT(void)
*ptr ^= 0xabcd1234;
 }
 
+#ifdef CONFIG_PRMEM
+
+void lkdtm_WRITE_WR_AFTER_INIT(void)
+{
+   unsigned long *ptr = &wr_after_init;
+
+   /*
+* Verify we were written to during init. Since an Oops
+* is considered a "success", a failure is to just skip the
+* real test.
+*/
+   if ((*ptr & 0xAA) != 0xAA) {
+   pr_info("%p was NOT written during init!?\n", ptr);
+   return;
+   }
+
+   pr_info("attempting bad wr_after_init write at %p\n", ptr);
+   *ptr ^= 0xabcd1234;
+}
+
+#endif
+
 void lkdtm_WRITE_KERN(void)
 {
size_t size;
@@ -200,4 +227,6 @@ void __init lkdtm_perms_init(void)
/* Make sure we can write to __ro_after_init values during __init */
ro_after_init |= 0xAA;
 
+   /* Make sure we can write to __wr_after_init during __init */
+   wr_after_init |= 0xAA;
 }
-- 
2.19.1



[PATCH 06/12] __wr_after_init: Documentation: self-protection

2018-12-21 Thread Igor Stoppa
Update the self-protection documentation, to mention also the use of the
__wr_after_init attribute.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 Documentation/security/self-protection.rst | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/Documentation/security/self-protection.rst 
b/Documentation/security/self-protection.rst
index f584fb74b4ff..df2614bc25b9 100644
--- a/Documentation/security/self-protection.rst
+++ b/Documentation/security/self-protection.rst
@@ -84,12 +84,14 @@ For variables that are initialized once at ``__init`` time, 
these can
 be marked with the (new and under development) ``__ro_after_init``
 attribute.
 
-What remains are variables that are updated rarely (e.g. GDT). These
-will need another infrastructure (similar to the temporary exceptions
-made to kernel code mentioned above) that allow them to spend the rest
-of their lifetime read-only. (For example, when being updated, only the
-CPU thread performing the update would be given uninterruptible write
-access to the memory.)
+Others, which are statically allocated, but still need to be updated
+rarely, can be marked with the ``__wr_after_init`` attribute.
+
+The update mechanism must avoid exposing the data to rogue alterations
+during the update. For example, only the CPU thread performing the update
+would be given uninterruptible write access to the memory.
+
+Currently there is no protection available for data allocated dynamically.
 
 Segregation of kernel memory from userspace memory
 ~~
-- 
2.19.1



[PATCH 02/12] __wr_after_init: linker section and label

2018-12-21 Thread Igor Stoppa
Introduce a section and a label for statically allocated write rare
data. The label is named "__wr_after_init".
As the name implies, after the init phase is completed, this section
will be modifiable only by invoking write rare functions.
The section must take up a set of full pages.

To activate both section and label, the arch must set CONFIG_ARCH_HAS_PRMEM

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 arch/Kconfig  | 15 +++
 include/asm-generic/vmlinux.lds.h | 25 +
 include/linux/cache.h | 21 +
 init/main.c   |  2 ++
 4 files changed, 63 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index e1e540ffa979..8668ffec8098 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -802,6 +802,21 @@ config VMAP_STACK
  the stack to map directly to the KASAN shadow map using a formula
  that is incorrect if the stack is in vmalloc space.
 
+config ARCH_HAS_PRMEM
+   def_bool n
+   help
+ architecture specific symbol stating that the architecture provides
+ a back-end function for the write rare operation.
+
+config PRMEM
+   bool "Write protect critical data that doesn't need high write speed."
+   depends on ARCH_HAS_PRMEM
+   default y
+   help
+ If the architecture supports it, statically allocated data which
+ has been selected for hardening becomes (mostly) read-only.
+ The selection happens by labelling the data "__wr_after_init".
+
 config ARCH_OPTIONAL_KERNEL_RWX
def_bool n
 
diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index 3d7a6a9c2370..ddb1fd608490 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -311,6 +311,30 @@
KEEP(*(__jump_table))   \
__stop___jump_table = .;
 
+/*
+ * Allow architectures to handle wr_after_init data on their
+ * own by defining an empty WR_AFTER_INIT_DATA.
+ * However, it's important that pages containing WR_RARE data do not
+ * hold anything else, to avoid both accidentally unprotecting something
+ * that is supposed to stay read-only all the time and accidentally
+ * protecting something else that is supposed to stay writable all the time.
+ */
+#ifndef WR_AFTER_INIT_DATA
+#ifdef CONFIG_PRMEM
+#define WR_AFTER_INIT_DATA(align)  \
+   . = ALIGN(PAGE_SIZE);   \
+   __start_wr_after_init = .;  \
+   . = ALIGN(align);   \
+   *(.data..wr_after_init) \
+   . = ALIGN(PAGE_SIZE);   \
+   __end_wr_after_init = .;\
+   . = ALIGN(align);
+#else
+#define WR_AFTER_INIT_DATA(align)  \
+   . = ALIGN(align);
+#endif
+#endif
+
 /*
  * Allow architectures to handle ro_after_init data on their
  * own by defining an empty RO_AFTER_INIT_DATA.
@@ -332,6 +356,7 @@
__start_rodata = .; \
*(.rodata) *(.rodata.*) \
RO_AFTER_INIT_DATA  /* Read only after init */  \
+   WR_AFTER_INIT_DATA(align) /* wr after init */   \
KEEP(*(__vermagic)) /* Kernel version magic */  \
. = ALIGN(8);   \
__start___tracepoints_ptrs = .; \
diff --git a/include/linux/cache.h b/include/linux/cache.h
index 750621e41d1c..09bd0b9284b6 100644
--- a/include/linux/cache.h
+++ b/include/linux/cache.h
@@ -31,6 +31,27 @@
 #define __ro_after_init __attribute__((__section__(".data..ro_after_init")))
 #endif
 
+/*
+ * __wr_after_init is used to mark objects that cannot be modified
+ * directly after init (i.e. after mark_rodata_ro() has been called).
+ * These objects become effectively read-only, from the perspective of
+ * performing a direct write, like a variable assignment.
+ * However, they can be altered through a dedicated function.
+ * It is intended for those objects which are occasionally modified after
+ * init, however they are modified so seldom that the extra cost from
+ * the indirect modification is either negligible or worth paying, for the
+ * sake of the protection gained.
+ */
+#ifndef __wr_after_init
+#ifdef CONFIG_PRMEM
+#define __wr_after_init \
+   __attribute__((__se

[PATCH 12/12] x86_64: __clear_user as case of __memset_user

2018-12-21 Thread Igor Stoppa
To avoid code duplication, re-use __memset_user(), when clearing
user-space memory.

The overhead should be minimal (2 extra register assignments) and
outside of the writing loop.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/lib/usercopy_64.c | 29 +
 1 file changed, 1 insertion(+), 28 deletions(-)

diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index 84f8f8a20b30..ab6aabb62055 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -69,34 +69,7 @@ EXPORT_SYMBOL(memset_user);
 
 unsigned long __clear_user(void __user *addr, unsigned long size)
 {
-   long __d0;
-   might_fault();
-   /* no memory constraint because it doesn't change any memory gcc knows
-  about */
-   stac();
-   asm volatile(
-   "   testq  %[size8],%[size8]\n"
-   "   jz 4f\n"
-   "0: movq $0,(%[dst])\n"
-   "   addq   $8,%[dst]\n"
-   "   decl %%ecx ; jnz   0b\n"
-   "4: movq  %[size1],%%rcx\n"
-   "   testl %%ecx,%%ecx\n"
-   "   jz 2f\n"
-   "1: movb   $0,(%[dst])\n"
-   "   incq   %[dst]\n"
-   "   decl %%ecx ; jnz  1b\n"
-   "2:\n"
-   ".section .fixup,\"ax\"\n"
-   "3: lea 0(%[size1],%[size8],8),%[size8]\n"
-   "   jmp 2b\n"
-   ".previous\n"
-   _ASM_EXTABLE_UA(0b, 3b)
-   _ASM_EXTABLE_UA(1b, 2b)
-   : [size8] "=&c"(size), [dst] "=&D" (__d0)
-   : [size1] "r"(size & 7), "[size8]" (size / 8), "[dst]"(addr));
-   clac();
-   return size;
+   return __memset_user(addr, 0, size);
 }
 EXPORT_SYMBOL(__clear_user);
 
-- 
2.19.1



[PATCH 09/12] rodata_test: add verification for __wr_after_init

2018-12-21 Thread Igor Stoppa
The write protection of the __wr_after_init data can be verified with the
same methodology used for const data.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 mm/rodata_test.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/mm/rodata_test.c b/mm/rodata_test.c
index e1349520b436..a669cf9f5a61 100644
--- a/mm/rodata_test.c
+++ b/mm/rodata_test.c
@@ -16,8 +16,23 @@
 
 #define INIT_TEST_VAL 0xC3
 
+/*
+ * Note: __ro_after_init data is, for all practical purposes, equivalent to
+ * const data, since both are write protected at the same time; there
+ * is no need for separate testing.
+ * __wr_after_init data, otoh, is altered also after the write protection
+ * takes place, and it cannot be exploited for altering more permanent
+ * data.
+ */
+
 static const int rodata_test_data = INIT_TEST_VAL;
 
+#ifdef CONFIG_PRMEM
+static int wr_after_init_test_data __wr_after_init = INIT_TEST_VAL;
+extern long __start_wr_after_init;
+extern long __end_wr_after_init;
+#endif
+
 static bool test_data(char *data_type, const int *data,
  unsigned long start, unsigned long end)
 {
@@ -59,7 +74,13 @@ static bool test_data(char *data_type, const int *data,
 
 void rodata_test(void)
 {
-   test_data("rodata", &rodata_test_data,
- (unsigned long)&__start_rodata,
- (unsigned long)&__end_rodata);
+   if (!test_data("rodata", &rodata_test_data,
+  (unsigned long)&__start_rodata,
+  (unsigned long)&__end_rodata))
+   return;
+#ifdef CONFIG_PRMEM
+   test_data("wr after init data", &wr_after_init_test_data,
+ (unsigned long)&__start_wr_after_init,
+ (unsigned long)&__end_wr_after_init);
+#endif
 }
-- 
2.19.1



[PATCH 08/12] rodata_test: refactor tests

2018-12-21 Thread Igor Stoppa
Refactor the test cases, in preparation for using them also for testing
__wr_after_init memory, when available.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 mm/rodata_test.c | 48 
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/mm/rodata_test.c b/mm/rodata_test.c
index d908c8769b48..e1349520b436 100644
--- a/mm/rodata_test.c
+++ b/mm/rodata_test.c
@@ -14,44 +14,52 @@
 #include 
 #include 
 
-static const int rodata_test_data = 0xC3;
+#define INIT_TEST_VAL 0xC3
 
-void rodata_test(void)
+static const int rodata_test_data = INIT_TEST_VAL;
+
+static bool test_data(char *data_type, const int *data,
+ unsigned long start, unsigned long end)
 {
-   unsigned long start, end;
int zero = 0;
 
/* test 1: read the value */
/* If this test fails, some previous testrun has clobbered the state */
-   if (!rodata_test_data) {
-   pr_err("test 1 fails (start data)\n");
-   return;
+   if (*data != INIT_TEST_VAL) {
+   pr_err("%s: test 1 fails (init data value)\n", data_type);
+   return false;
}
 
/* test 2: write to the variable; this should fault */
-   if (!probe_kernel_write((void *)&rodata_test_data,
-   (void *)&zero, sizeof(zero))) {
-   pr_err("test data was not read only\n");
-   return;
+   if (!probe_kernel_write((void *)data, (void *)&zero, sizeof(zero))) {
+   pr_err("%s: test data was not read only\n", data_type);
+   return false;
}
 
/* test 3: check the value hasn't changed */
-   if (rodata_test_data == zero) {
-   pr_err("test data was changed\n");
-   return;
+   if (*data != INIT_TEST_VAL) {
+   pr_err("%s: test data was changed\n", data_type);
+   return false;
}
 
/* test 4: check if the rodata section is PAGE_SIZE aligned */
-   start = (unsigned long)__start_rodata;
-   end = (unsigned long)__end_rodata;
if (start & (PAGE_SIZE - 1)) {
-   pr_err("start of .rodata is not page size aligned\n");
-   return;
+   pr_err("%s: start of data is not page size aligned\n",
+  data_type);
+   return false;
}
if (end & (PAGE_SIZE - 1)) {
-   pr_err("end of .rodata is not page size aligned\n");
-   return;
+   pr_err("%s: end of data is not page size aligned\n",
+  data_type);
+   return false;
}
+   pr_info("%s tests were successful", data_type);
+   return true;
+}
 
-   pr_info("all tests were successful\n");
+void rodata_test(void)
+{
+   test_data("rodata", &rodata_test_data,
+ (unsigned long)&__start_rodata,
+ (unsigned long)&__end_rodata);
 }
-- 
2.19.1



[PATCH 03/12] __wr_after_init: generic functionality

2018-12-21 Thread Igor Stoppa
The patch provides:
- the generic part of the write rare functionality for static data,
  based on code from Matthew Wilcox
- the dummy functionality, in case an arch doesn't support write rare or
  the functionality is disabled

The basic functions are:
- wr_memset(): write rare counterpart of memset()
- wr_memcpy(): write rare counterpart of memcpy()
- wr_assign(): write rare counterpart of the assignment ('=') operator
- wr_rcu_assign_pointer(): write rare counterpart of rcu_assign_pointer()

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 include/linux/prmem.h | 106 ++
 mm/Makefile   |   1 +
 mm/prmem.c|  97 ++
 3 files changed, 204 insertions(+)
 create mode 100644 include/linux/prmem.h
 create mode 100644 mm/prmem.c

diff --git a/include/linux/prmem.h b/include/linux/prmem.h
new file mode 100644
index ..12c1d0d1cb78
--- /dev/null
+++ b/include/linux/prmem.h
@@ -0,0 +1,106 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * prmem.h: Header for memory protection library
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa 
+ *
+ * Support for:
+ * - statically allocated write rare data
+ */
+
+#ifndef _LINUX_PRMEM_H
+#define _LINUX_PRMEM_H
+
+#include 
+#include 
+#include 
+
+
+/**
+ * memtst() - test len bytes starting at p to match the c value
+ * @p: beginning of the memory to test
+ * @c: byte to compare against
+ * @len: amount of bytes to test
+ *
+ * Returns 0 on success, non-zero otherwise.
+ */
+static inline int memtst(void *p, int c, __kernel_size_t len)
+{
+   __kernel_size_t i;
+
+   for (i = 0; i < len; i++) {
+   u8 d =  *(i + (u8 *)p) - (u8)c;
+
+   if (unlikely(d))
+   return d;
+   }
+   return 0;
+}
+
+
+#ifndef CONFIG_PRMEM
+
+static inline void *wr_memset(void *p, int c, __kernel_size_t len)
+{
+   return memset(p, c, len);
+}
+
+static inline void *wr_memcpy(void *p, const void *q, __kernel_size_t size)
+{
+   return memcpy(p, q, size);
+}
+
+#define wr_assign(var, val)((var) = (val))
+#define wr_rcu_assign_pointer(p, v)rcu_assign_pointer(p, v)
+
+#else
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+void *wr_memset(void *p, int c, __kernel_size_t len);
+void *wr_memcpy(void *p, const void *q, __kernel_size_t size);
+
+/**
+ * wr_assign() - sets a write-rare variable to a specified value
+ * @var: the variable to set
+ * @val: the new value
+ *
+ * Returns: the variable
+ *
+ * Note: it might be possible to optimize this, to use wr_memset in some
+ * cases (maybe with NULL?).
+ */
+
+#define wr_assign(var, val) ({ \
+   typeof(var) tmp = (typeof(var))val; \
+   \
+   wr_memcpy(&var, &tmp, sizeof(var)); \
+   var;\
+})
+
+/**
+ * wr_rcu_assign_pointer() - initialize a pointer in rcu mode
+ * @p: the rcu pointer - it MUST be aligned to a machine word
+ * @v: the new value
+ *
+ * Returns the value assigned to the rcu pointer.
+ *
+ * It is provided as macro, to match rcu_assign_pointer()
+ * The rcu_assign_pointer() is implemented as equivalent of:
+ *
+ * smp_mb();
+ * WRITE_ONCE();
+ */
+#define wr_rcu_assign_pointer(p, v) ({ \
+   smp_mb();   \
+   wr_assign(p, v);\
+   p;  \
+})
+#endif
+#endif
diff --git a/mm/Makefile b/mm/Makefile
index d210cc9d6f80..ef3867c16ce0 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -58,6 +58,7 @@ obj-$(CONFIG_SPARSEMEM)   += sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_PRMEM) += prmem.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/prmem.c b/mm/prmem.c
new file mode 100644
index ..e1c1be3a1171
--- /dev/null
+++ b/mm/prmem.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * prmem.c: Memory Protection Library
+ *
+ * (C) Copyright 2017-2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+__ro_after_init bool wr_ready;
+
+/*
+ * The following two variables are statically allocated by the linker
+ * script at the boundaries of the memory region (rounded up to
+ * multiples of PAGE_SIZE) reserved for __wr_after_init.
+ */
+extern long __start_wr_after_init;
+extern long __end_wr_after_init;
+static unsigned long start = (unsigned long)&__start_wr_after_init;

[PATCH 05/12] __wr_after_init: x86_64: __wr_op

2018-12-21 Thread Igor Stoppa
Architecture-specific implementation of the core write rare
operation.

The implementation is based on code from Andy Lutomirski and Nadav Amit
for patching the text on x86 [here goes reference to commits, once merged]

The modification of write protected data is done through an alternate
mapping of the same pages, as writable.
This mapping is persistent, but active only for a core that is
performing a write rare operation. And only for the duration of said
operation.
Local interrupts are disabled, while the alternate mapping is active.

In theory, it could introduce a non-predictable delay in a preemptible
system; however, the amount of data to be altered is likely to be far
smaller than a page.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/Kconfig |  1 +
 arch/x86/include/asm/prmem.h | 72 
 arch/x86/mm/Makefile |  2 +
 arch/x86/mm/prmem.c  | 69 ++
 4 files changed, 144 insertions(+)
 create mode 100644 arch/x86/include/asm/prmem.h
 create mode 100644 arch/x86/mm/prmem.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8689e794a43c..e5e4fc4fa5c2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -32,6 +32,7 @@ config X86_64
select SWIOTLB
select X86_DEV_DMA_OPS
select ARCH_HAS_SYSCALL_WRAPPER
+   select ARCH_HAS_PRMEM
 
 #
 # Arch settings
diff --git a/arch/x86/include/asm/prmem.h b/arch/x86/include/asm/prmem.h
new file mode 100644
index ..e1f09f881351
--- /dev/null
+++ b/arch/x86/include/asm/prmem.h
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * prmem.h: Header for memory protection library
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa 
+ *
+ * Support for:
+ * - statically allocated write rare data
+ */
+
+#ifndef _ASM_X86_PRMEM_H
+#define _ASM_X86_PRMEM_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+typedef temporary_mm_state_t wr_state_t;
+
+extern __ro_after_init struct mm_struct *wr_poking_mm;
+extern __ro_after_init unsigned long wr_poking_base;
+
+static inline void *__wr_addr(void *addr)
+{
+   return (void *)(wr_poking_base + (unsigned long)addr);
+}
+
+static inline void __wr_enable(wr_state_t *state)
+{
+   *state = use_temporary_mm(wr_poking_mm);
+}
+
+static inline void __wr_disable(wr_state_t *state)
+{
+   unuse_temporary_mm(*state);
+}
+
+
+/**
+ * __wr_memset() - sets len bytes of the destination p to the c value
+ * @p: beginning of the memory to write to
+ * @c: byte to replicate
+ * @len: amount of bytes to copy
+ *
+ * Returns pointer to the destination
+ */
+static inline void *__wr_memset(void *p, int c, __kernel_size_t len)
+{
+   return (void *)memset_user((void __user *)p, (u8)c, len);
+}
+
+/**
+ * __wr_memcpy() - copies size bytes from q to p
+ * @p: beginning of the memory to write to
+ * @q: beginning of the memory to read from
+ * @size: amount of bytes to copy
+ *
+ * Returns pointer to the destination
+ */
+static inline void *__wr_memcpy(void *p, const void *q, __kernel_size_t size)
+{
+   return (void *)copy_to_user((void __user *)p, q, size);
+}
+
+#endif
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 4b101dd6e52f..66652de1e2c7 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -53,3 +53,5 @@ obj-$(CONFIG_PAGE_TABLE_ISOLATION)+= pti.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)  += mem_encrypt.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)  += mem_encrypt_identity.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)  += mem_encrypt_boot.o
+
+obj-$(CONFIG_PRMEM)+= prmem.o
diff --git a/arch/x86/mm/prmem.c b/arch/x86/mm/prmem.c
new file mode 100644
index ..f4b36baa2f19
--- /dev/null
+++ b/arch/x86/mm/prmem.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * prmem.c: Memory Protection Library
+ *
+ * (C) Copyright 2017-2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+extern __ro_after_init bool wr_ready;
+__ro_after_init struct mm_struct *wr_poking_mm;
+__ro_after_init unsigned long wr_poking_base;
+
+/*
+ * The following two variables are statically allocated by the linker
+ * script at the boundaries of the memory region (rounded up to
+ * multiples of PAGE_SIZE) reserved for __wr_after_init.
+ */
+extern long __start_wr_after_init;
+extern long __end_wr_after_init;
+
+struct mm_struct *copy_init_mm(void);
+void __init wr_poking_init(void)
+{
+   unsigned long start = (unsigned long)&__start_wr_after_init;
+   unsigned long end = (unsigned long)&__end_wr_af

[PATCH 01/12] x86_64: memset_user()

2018-12-21 Thread Igor Stoppa
Create x86_64 specific version of memset for user space, based on
clear_user().
This will be used for implementing wr_memset() in the __wr_after_init
scenario, where write-rare variables have an alternate mapping for
writing.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: Thiago Jung Bauermann 
CC: Ahmed Soliman 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/include/asm/uaccess_64.h |  6 
 arch/x86/lib/usercopy_64.c| 54 +++
 2 files changed, 60 insertions(+)

diff --git a/arch/x86/include/asm/uaccess_64.h 
b/arch/x86/include/asm/uaccess_64.h
index a9d637bc301d..f194bfce4866 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -213,4 +213,10 @@ copy_user_handle_tail(char *to, char *from, unsigned len);
 unsigned long
 mcsafe_handle_tail(char *to, char *from, unsigned len);
 
+unsigned long __must_check
+memset_user(void __user *mem, int c, unsigned long len);
+
+unsigned long __must_check
+__memset_user(void __user *mem, int c, unsigned long len);
+
 #endif /* _ASM_X86_UACCESS_64_H */
diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index 1bd837cdc4b1..84f8f8a20b30 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -9,6 +9,60 @@
 #include 
 #include 
 
+/*
+ * Memset Userspace
+ */
+
+unsigned long __memset_user(void __user *addr, int c, unsigned long size)
+{
+   long __d0;
+   unsigned long  pattern = 0;
+   int i;
+
+   for (i = 0; i < 8; i++)
+   pattern = (pattern << 8) | (0xFF & c);
+   might_fault();
+   /* no memory constraint: gcc doesn't know about this memory */
+   stac();
+   asm volatile(
+   "   movq %[val], %%rdx\n"
+   "   testq  %[size8],%[size8]\n"
+   "   jz 4f\n"
+   "0: mov %%rdx,(%[dst])\n"
+   "   addq   $8,%[dst]\n"
+   "   decl %%ecx ; jnz   0b\n"
+   "4: movq  %[size1],%%rcx\n"
+   "   testl %%ecx,%%ecx\n"
+   "   jz 2f\n"
+   "1: movb   %%dl,(%[dst])\n"
+   "   incq   %[dst]\n"
+   "   decl %%ecx ; jnz  1b\n"
+   "2:\n"
+   ".section .fixup,\"ax\"\n"
+   "3: lea 0(%[size1],%[size8],8),%[size8]\n"
+   "   jmp 2b\n"
+   ".previous\n"
+   _ASM_EXTABLE_UA(0b, 3b)
+   _ASM_EXTABLE_UA(1b, 2b)
+   : [size8] "=&c"(size), [dst] "=&D" (__d0)
+   : [size1] "r"(size & 7), "[size8]" (size / 8), "[dst]"(addr),
+ [val] "ri"(pattern)
+   : "rdx");
+
+   clac();
+   return size;
+}
+EXPORT_SYMBOL(__memset_user);
+
+unsigned long memset_user(void __user *to, int c, unsigned long n)
+{
+   if (access_ok(VERIFY_WRITE, to, n))
+   return __memset_user(to, c, n);
+   return n;
+}
+EXPORT_SYMBOL(memset_user);
+
+
 /*
  * Zero Userspace
  */
-- 
2.19.1
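
For context, a hedged caller sketch for memset_user(); the wrapper, buffer and length are made up, and the return value (the number of bytes that could not be set) maps to -EFAULT when non-zero:

	/* hypothetical caller: zero-fill a user buffer */
	static int example_zero_user(void __user *ubuf, unsigned long len)
	{
		unsigned long left;

		left = memset_user(ubuf, 0, len);
		return left ? -EFAULT : 0;
	}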



Re: [PATCH 04/12] __wr_after_init: x86_64: __wr_op

2018-12-21 Thread Igor Stoppa




On 21/12/2018 19:23, Andy Lutomirski wrote:

On Thu, Dec 20, 2018 at 11:19 AM Igor Stoppa  wrote:




On 20/12/2018 20:49, Matthew Wilcox wrote:


I think you're causing yourself more headaches by implementing this "op"
function.


I probably misinterpreted the initial criticism of my first patchset,
about duplication. Somehow, I'm still thinking of the endgame of having
higher-level functions, like list management.


Here's some generic code:


thank you, I have one question, below


void *wr_memcpy(void *dst, void *src, unsigned int len)
{
   wr_state_t wr_state;
   void *wr_poking_addr = __wr_addr(dst);

   local_irq_disable();
   wr_enable(&wr_state);
   __wr_memcpy(wr_poking_addr, src, len);


Is __wr_addr() invoked inside wr_memcpy() instead of being invoked
privately within __wr_memcpy() because the code is generic, or is there
some other reason?


   wr_disable(&wr_state);
   local_irq_enable();

   return dst;
}

Now, x86 can define appropriate macros and functions to use the temporary_mm
functionality, and other architectures can do what makes sense to them.



I suspect that most architectures will want to do this exactly like
x86, though, but sure, it could be restructured like this.


In spirit, I think yes, but I couldn't find a clean way to do 
multi-arch wr_enable(&wr_state), so I made that arch-dependent too.


Maybe after implementing write rare for a few archs, it becomes more 
clear (to me, any advice is welcome) which parts can be considered common.



On x86, I *think* that __wr_memcpy() will want to special-case len ==
1, 2, 4, and (on 64-bit) 8 byte writes to keep them atomic. i'm
guessing this is the same on most or all architectures.


I switched to the xxx_user() approach, as you suggested.
For x86_64 I'm using copy_user() and I added a memset_user(), based on 
copy_user().


It's already assembly code optimized for dealing with multiples of 
8-byte words or subsets. You can see this in the first patch of the 
patchset, even this one.


I'll send out the v3 patchset in a short while.

--
igor
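
A sketch of the kind of native-size special-casing mentioned above; this is not part of the series, and it would act on the already remapped (writable) alias, assuming a naturally aligned destination:

	static void __wr_native_write(void *dst, const void *src, __kernel_size_t len)
	{
		switch (len) {
		case 1: WRITE_ONCE(*(u8 *)dst, *(const u8 *)src); break;
		case 2: WRITE_ONCE(*(u16 *)dst, *(const u16 *)src); break;
		case 4: WRITE_ONCE(*(u32 *)dst, *(const u32 *)src); break;
		case 8: WRITE_ONCE(*(u64 *)dst, *(const u64 *)src); break;
		default:
			memcpy(dst, src, len);	/* not single-copy atomic */
			break;
		}
	}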


Re: [PATCH 04/12] __wr_after_init: x86_64: __wr_op

2018-12-20 Thread Igor Stoppa




On 20/12/2018 20:49, Matthew Wilcox wrote:


I think you're causing yourself more headaches by implementing this "op"
function.  


I probably misinterpreted the initial criticism of my first patchset, 
about duplication. Somehow, I'm still thinking of the endgame of having 
higher-level functions, like list management.



Here's some generic code:


thank you, I have one question, below


void *wr_memcpy(void *dst, void *src, unsigned int len)
{
wr_state_t wr_state;
void *wr_poking_addr = __wr_addr(dst);

local_irq_disable();
wr_enable(&wr_state);
__wr_memcpy(wr_poking_addr, src, len);


Is __wr_addr() invoked inside wr_memcpy() instead of being invoked 
privately within __wr_memcpy() because the code is generic, or is there 
some other reason?



wr_disable(&wr_state);
local_irq_enable();

return dst;
}

Now, x86 can define appropriate macros and functions to use the temporary_mm
functionality, and other architectures can do what makes sense to them.



--
igor


Re: [PATCH 11/12] IMA: turn ima_policy_flags into __wr_after_init

2018-12-20 Thread Igor Stoppa

Hi,

On 20/12/2018 19:30, Thiago Jung Bauermann wrote:


Hello Igor,

Igor Stoppa  writes:


diff --git a/security/integrity/ima/ima_init.c 
b/security/integrity/ima/ima_init.c
index 59d834219cd6..5f4e13e671bf 100644
--- a/security/integrity/ima/ima_init.c
+++ b/security/integrity/ima/ima_init.c
@@ -21,6 +21,7 @@
  #include 
  #include 
  #include 
+#include 

  #include "ima.h"

@@ -98,9 +99,9 @@ void __init ima_load_x509(void)
  {
int unset_flags = ima_policy_flag & IMA_APPRAISE;

-   ima_policy_flag &= ~unset_flags;
+   wr_assign(ima_policy_flag, ima_policy_flag & ~unset_flags);
integrity_load_x509(INTEGRITY_KEYRING_IMA, CONFIG_IMA_X509_PATH);
-   ima_policy_flag |= unset_flags;
+   wr_assign(ima_policy_flag, ima_policy_flag | unset_flags);
  }
  #endif


In the cover letter, you said:


As the name implies, the write protection kicks in only after init()
is completed; before that moment, the data is modifiable in the usual
way.


Given that, is it still necessary or useful to use wr_assign() in a
function marked with __init?


I might have been overenthusiastic about using the wr interface.
You are right, I can drop these two. Thank you.

--
igor
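
To illustrate the point above (names are made up; only the timing relative to mark_rodata_ro() matters): all initcalls run before the section is sealed, so plain stores are fine there, while later updates need the helper:

	static int example_flag __wr_after_init;

	static int __init example_setup(void)
	{
		example_flag = 1;		/* before mark_rodata_ro(): plain store */
		return 0;
	}
	late_initcall(example_setup);

	static void example_update(int v)
	{
		wr_assign(example_flag, v);	/* after init: write-rare path */
	}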


Re: [PATCH 04/12] __wr_after_init: x86_64: __wr_op

2018-12-20 Thread Igor Stoppa

Hi,

On 20/12/2018 19:20, Thiago Jung Bauermann wrote:


Hello Igor,


+/*
+ * The following two variables are statically allocated by the linker
+ * script at the boundaries of the memory region (rounded up to
+ * multiples of PAGE_SIZE) reserved for __wr_after_init.
+ */
+extern long __start_wr_after_init;
+extern long __end_wr_after_init;
+
+static inline bool is_wr_after_init(unsigned long ptr, __kernel_size_t size)
+{
+   unsigned long start = (unsigned long)&__start_wr_after_init;
+   unsigned long end = (unsigned long)&__end_wr_after_init;
+   unsigned long low = ptr;
+   unsigned long high = ptr + size;
+
+   return likely(start <= low && low <= high && high <= end);
+}
+
+void *__wr_op(unsigned long dst, unsigned long src, __kernel_size_t len,
+ enum wr_op_type op)
+{
+   temporary_mm_state_t prev;
+   unsigned long offset;
+   unsigned long wr_poking_addr;
+
+   /* Confirm that the writable mapping exists. */
+   if (WARN_ONCE(!wr_ready, "No writable mapping available"))
+   return (void *)dst;
+
+   if (WARN_ONCE(op >= WR_OPS_NUMBER, "Invalid WR operation.") ||
+   WARN_ONCE(!is_wr_after_init(dst, len), "Invalid WR range."))
+   return (void *)dst;
+
+   offset = dst - (unsigned long)&__start_wr_after_init;
+   wr_poking_addr = wr_poking_base + offset;
+   local_irq_disable();
+   prev = use_temporary_mm(wr_poking_mm);
+
+   if (op == WR_MEMCPY)
+   copy_to_user((void __user *)wr_poking_addr, (void *)src, len);
+   else if (op == WR_MEMSET)
+   memset_user((void __user *)wr_poking_addr, (u8)src, len);
+
+   unuse_temporary_mm(prev);
+   local_irq_enable();
+   return (void *)dst;
+}


There's a lot of casting back and forth between unsigned long and void *
(also in the previous patch). Is there a reason for that?


The intention is to ensure that algebraic operations between addresses 
are performed as intended, rather than gcc applying some incorrect 
optimization, wrongly assuming that two addresses belong to the same object.


Said this, I can certainly have a further look at the code and see if I 
can reduce the amount of casts. I do not like them either.


But I'm not sure how much can be dropped: if I start from (void *), then 
I have to cast them to unsigned long for the math.


And the xxx_user() operations require a (void __user *).


My impression
is that there would be less casts if variables holding addresses were
declared as void * in the first place. 


It might save 1 or 2 casts. I'll do the count.


In that case, it wouldn't hurt to
have an additional argument in __wr_op() to carry the byte value for the
WR_MEMSET operation.


Wouldn't it clobber one more register? Or can gcc figure out that it's 
not used? __wr_op() is not inline.



+
+#define TB (1UL << 40)


^^spurious


+
+struct mm_struct *copy_init_mm(void);
+void __init wr_poking_init(void)
+{
+   unsigned long start = (unsigned long)&__start_wr_after_init;
+   unsigned long end = (unsigned long)&__end_wr_after_init;
+   unsigned long i;
+   unsigned long wr_range;
+
+   wr_poking_mm = copy_init_mm();
+   if (WARN_ONCE(!wr_poking_mm, "No alternate mapping available."))
+   return;
+
+   wr_range = round_up(end - start, PAGE_SIZE);
+
+   /* Randomize the poking address base*/
+   wr_poking_base = TASK_UNMAPPED_BASE +
+   (kaslr_get_random_long("Write Rare Poking") & PAGE_MASK) %
+   (TASK_SIZE - (TASK_UNMAPPED_BASE + wr_range));
+
+   /*
+* Place 64TB of kernel address space within 128TB of user address
+* space, at a random page aligned offset.
+*/
+   wr_poking_base = (((unsigned long)kaslr_get_random_long("WR Poke")) &
+ PAGE_MASK) % (64 * _BITUL(40));


You're setting wr_poking_base twice in a row? Is this an artifact from
rebase?


Yes, the first is a leftover. Thanks for spotting it.

--
igor


Re: [PATCH 04/12] __wr_after_init: x86_64: __wr_op

2018-12-20 Thread Igor Stoppa

On 19/12/2018 23:33, Igor Stoppa wrote:


+   if (WARN_ONCE(op >= WR_OPS_NUMBER, "Invalid WR operation.") ||
+   WARN_ONCE(!is_wr_after_init(dst, len), "Invalid WR range."))
+   return (void *)dst;
+
+   offset = dst - (unsigned long)&__start_wr_after_init;


I forgot to remove the offset.
If the whole kernel memory is remapped, it is shifted by wr_poking_base.
I'll fix it in the next iteration.


+   wr_poking_addr = wr_poking_base + offset;


wr_poking_addr = wr_poking_base + dst;

--
igor


[PATCH] checkpatch.pl: Improve WARNING on Kconfig help

2018-12-19 Thread Igor Stoppa
The checkpatch.pl script complains when the help section of a Kconfig
entry is too short, but it doesn't really explain what it is looking
for. Instead, it gives a generic warning that one should consider writing
a paragraph.

But what it *really* checks is that the help section is at least
$min_conf_desc_length lines long.

Since the definition of what is a paragraph is not really carved in
stone (and actually the primary description is "5 sentences"), make the
warning less ambiguous by stating explicitly the actual test condition, so that
one doesn't have to read the checkpatch.pl sources to figure out the actual
test.

Signed-off-by: Igor Stoppa 
CC: Andy Whitcroft 
CC: Joe Perches 
CC: Andi Kleen 
CC: linux-kernel@vger.kernel.org
---
 scripts/checkpatch.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index c883ec55654f..818ddada28b5 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2931,7 +2931,7 @@ sub process {
}
if ($is_start && $is_end && $length < 
$min_conf_desc_length) {
WARN("CONFIG_DESCRIPTION",
-"please write a paragraph that describes 
the config symbol fully\n" . $herecurr);
+"expecting a 'help' section of 
$min_conf_desc_length or more lines\n" . $herecurr);
}
#print "is_start<$is_start> is_end<$is_end> 
length<$length>\n";
}
-- 
2.19.1



Re: [PATCH 2/6] __wr_after_init: write rare for static allocation

2018-12-19 Thread Igor Stoppa




On 12/12/2018 11:49, Martin Schwidefsky wrote:

On Wed, 5 Dec 2018 15:13:56 -0800
Andy Lutomirski  wrote:



Hi s390 and powerpc people: it would be nice if this generic
implementation *worked* on your architectures and that it will allow
you to add some straightforward way to add a better arch-specific
implementation if you think that would be better.


As the code is right now I can guarantee that it will not work on s390.


OK, I have thrown in the towel wrt developing at the same time for multiple 
architectures.


ATM I'm oriented toward getting support for one (x86_64), leaving the 
actual mechanism as architecture specific.


Then I can add another one or two and see what makes sense to refactor.
This approach should minimize the churn, overall.


--
igor


[PATCH 08/12] rodata_test: refactor tests

2018-12-19 Thread Igor Stoppa
Refactor the test cases, in preparation for using them also for testing
__wr_after_init memory, when available.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 mm/rodata_test.c | 48 
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/mm/rodata_test.c b/mm/rodata_test.c
index d908c8769b48..e1349520b436 100644
--- a/mm/rodata_test.c
+++ b/mm/rodata_test.c
@@ -14,44 +14,52 @@
 #include 
 #include 
 
-static const int rodata_test_data = 0xC3;
+#define INIT_TEST_VAL 0xC3
 
-void rodata_test(void)
+static const int rodata_test_data = INIT_TEST_VAL;
+
+static bool test_data(char *data_type, const int *data,
+ unsigned long start, unsigned long end)
 {
-   unsigned long start, end;
int zero = 0;
 
/* test 1: read the value */
/* If this test fails, some previous testrun has clobbered the state */
-   if (!rodata_test_data) {
-   pr_err("test 1 fails (start data)\n");
-   return;
+   if (*data != INIT_TEST_VAL) {
+   pr_err("%s: test 1 fails (init data value)\n", data_type);
+   return false;
}
 
/* test 2: write to the variable; this should fault */
-   if (!probe_kernel_write((void *)&rodata_test_data,
-   (void *)&zero, sizeof(zero))) {
-   pr_err("test data was not read only\n");
-   return;
+   if (!probe_kernel_write((void *)data, (void *)&zero, sizeof(zero))) {
+   pr_err("%s: test data was not read only\n", data_type);
+   return false;
}
 
/* test 3: check the value hasn't changed */
-   if (rodata_test_data == zero) {
-   pr_err("test data was changed\n");
-   return;
+   if (*data != INIT_TEST_VAL) {
+   pr_err("%s: test data was changed\n", data_type);
+   return false;
}
 
/* test 4: check if the rodata section is PAGE_SIZE aligned */
-   start = (unsigned long)__start_rodata;
-   end = (unsigned long)__end_rodata;
if (start & (PAGE_SIZE - 1)) {
-   pr_err("start of .rodata is not page size aligned\n");
-   return;
+   pr_err("%s: start of data is not page size aligned\n",
+  data_type);
+   return false;
}
if (end & (PAGE_SIZE - 1)) {
-   pr_err("end of .rodata is not page size aligned\n");
-   return;
+   pr_err("%s: end of data is not page size aligned\n",
+  data_type);
+   return false;
}
+   pr_info("%s tests were successful", data_type);
+   return true;
+}
 
-   pr_info("all tests were successful\n");
+void rodata_test(void)
+{
+   test_data("rodata", &rodata_test_data,
+ (unsigned long)&__start_rodata,
+ (unsigned long)&__end_rodata);
 }
-- 
2.19.1



[PATCH 07/12] __wr_after_init: lkdtm test

2018-12-19 Thread Igor Stoppa
Verify that trying to modify a variable with the __wr_after_init
attribute will cause a crash.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 drivers/misc/lkdtm/core.c  |  3 +++
 drivers/misc/lkdtm/lkdtm.h |  3 +++
 drivers/misc/lkdtm/perms.c | 29 +
 3 files changed, 35 insertions(+)

diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index 2837dc77478e..73c34b17c433 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -155,6 +155,9 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(ACCESS_USERSPACE),
CRASHTYPE(WRITE_RO),
CRASHTYPE(WRITE_RO_AFTER_INIT),
+#ifdef CONFIG_PRMEM
+   CRASHTYPE(WRITE_WR_AFTER_INIT),
+#endif
CRASHTYPE(WRITE_KERN),
CRASHTYPE(REFCOUNT_INC_OVERFLOW),
CRASHTYPE(REFCOUNT_ADD_OVERFLOW),
diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
index 3c6fd327e166..abba2f52ffa6 100644
--- a/drivers/misc/lkdtm/lkdtm.h
+++ b/drivers/misc/lkdtm/lkdtm.h
@@ -38,6 +38,9 @@ void lkdtm_READ_BUDDY_AFTER_FREE(void);
 void __init lkdtm_perms_init(void);
 void lkdtm_WRITE_RO(void);
 void lkdtm_WRITE_RO_AFTER_INIT(void);
+#ifdef CONFIG_PRMEM
+void lkdtm_WRITE_WR_AFTER_INIT(void);
+#endif
 void lkdtm_WRITE_KERN(void);
 void lkdtm_EXEC_DATA(void);
 void lkdtm_EXEC_STACK(void);
diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index 53b85c9d16b8..f681730aa652 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* Whether or not to fill the target memory area with do_nothing(). */
@@ -27,6 +28,10 @@ static const unsigned long rodata = 0xAA55AA55;
 /* This is marked __ro_after_init, so it should ultimately be .rodata. */
 static unsigned long ro_after_init __ro_after_init = 0x55AA5500;
 
+/* This is marked __wr_after_init, so it should be in .rodata. */
+static
+unsigned long wr_after_init __wr_after_init = 0x55AA5500;
+
 /*
  * This just returns to the caller. It is designed to be copied into
  * non-executable memory regions.
@@ -104,6 +109,28 @@ void lkdtm_WRITE_RO_AFTER_INIT(void)
*ptr ^= 0xabcd1234;
 }
 
+#ifdef CONFIG_PRMEM
+
+void lkdtm_WRITE_WR_AFTER_INIT(void)
+{
+   unsigned long *ptr = &wr_after_init;
+
+   /*
+* Verify we were written to during init. Since an Oops
+* is considered a "success", a failure is to just skip the
+* real test.
+*/
+   if ((*ptr & 0xAA) != 0xAA) {
+   pr_info("%p was NOT written during init!?\n", ptr);
+   return;
+   }
+
+   pr_info("attempting bad wr_after_init write at %p\n", ptr);
+   *ptr ^= 0xabcd1234;
+}
+
+#endif
+
 void lkdtm_WRITE_KERN(void)
 {
size_t size;
@@ -200,4 +227,6 @@ void __init lkdtm_perms_init(void)
/* Make sure we can write to __ro_after_init values during __init */
ro_after_init |= 0xAA;
 
+   /* Make sure we can write to __wr_after_init during __init */
+   wr_after_init |= 0xAA;
 }
-- 
2.19.1



[PATCH 04/12] __wr_after_init: x86_64: __wr_op

2018-12-19 Thread Igor Stoppa
Architecture-specific implementation of the core write rare
operation.

The implementation is based on code from Andy Lutomirski and Nadav Amit
for patching the text on x86 [here goes reference to commits, once merged]

The modification of write protected data is done through an alternate
mapping of the same pages, as writable.
This mapping is persistent, but active only for a core that is
performing a write rare operation. And only for the duration of said
operation.
Local interrupts are disabled, while the alternate mapping is active.

In theory, it could introduce a non-predictable delay in a preemptible
system; however, the amount of data to be altered is likely to be far
smaller than a page.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/Kconfig |   1 +
 arch/x86/mm/Makefile |   2 +
 arch/x86/mm/prmem.c  | 120 +++
 3 files changed, 123 insertions(+)
 create mode 100644 arch/x86/mm/prmem.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8689e794a43c..e5e4fc4fa5c2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -32,6 +32,7 @@ config X86_64
select SWIOTLB
select X86_DEV_DMA_OPS
select ARCH_HAS_SYSCALL_WRAPPER
+   select ARCH_HAS_PRMEM
 
 #
 # Arch settings
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 4b101dd6e52f..66652de1e2c7 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -53,3 +53,5 @@ obj-$(CONFIG_PAGE_TABLE_ISOLATION)+= pti.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)  += mem_encrypt.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)  += mem_encrypt_identity.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)  += mem_encrypt_boot.o
+
+obj-$(CONFIG_PRMEM)+= prmem.o
diff --git a/arch/x86/mm/prmem.c b/arch/x86/mm/prmem.c
new file mode 100644
index ..fc367551e736
--- /dev/null
+++ b/arch/x86/mm/prmem.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * prmem.c: Memory Protection Library
+ *
+ * (C) Copyright 2017-2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static __ro_after_init bool wr_ready;
+static __ro_after_init struct mm_struct *wr_poking_mm;
+static __ro_after_init unsigned long wr_poking_base;
+
+/*
+ * The following two variables are statically allocated by the linker
+ * script at the boundaries of the memory region (rounded up to
+ * multiples of PAGE_SIZE) reserved for __wr_after_init.
+ */
+extern long __start_wr_after_init;
+extern long __end_wr_after_init;
+
+static inline bool is_wr_after_init(unsigned long ptr, __kernel_size_t size)
+{
+   unsigned long start = (unsigned long)&__start_wr_after_init;
+   unsigned long end = (unsigned long)&__end_wr_after_init;
+   unsigned long low = ptr;
+   unsigned long high = ptr + size;
+
+   return likely(start <= low && low <= high && high <= end);
+}
+
+void *__wr_op(unsigned long dst, unsigned long src, __kernel_size_t len,
+ enum wr_op_type op)
+{
+   temporary_mm_state_t prev;
+   unsigned long offset;
+   unsigned long wr_poking_addr;
+
+   /* Confirm that the writable mapping exists. */
+   if (WARN_ONCE(!wr_ready, "No writable mapping available"))
+   return (void *)dst;
+
+   if (WARN_ONCE(op >= WR_OPS_NUMBER, "Invalid WR operation.") ||
+   WARN_ONCE(!is_wr_after_init(dst, len), "Invalid WR range."))
+   return (void *)dst;
+
+   offset = dst - (unsigned long)&__start_wr_after_init;
+   wr_poking_addr = wr_poking_base + offset;
+   local_irq_disable();
+   prev = use_temporary_mm(wr_poking_mm);
+
+   if (op == WR_MEMCPY)
+   copy_to_user((void __user *)wr_poking_addr, (void *)src, len);
+   else if (op == WR_MEMSET)
+   memset_user((void __user *)wr_poking_addr, (u8)src, len);
+
+   unuse_temporary_mm(prev);
+   local_irq_enable();
+   return (void *)dst;
+}
+
+#define TB (1UL << 40)
+
+struct mm_struct *copy_init_mm(void);
+void __init wr_poking_init(void)
+{
+   unsigned long start = (unsigned long)&__start_wr_after_init;
+   unsigned long end = (unsigned long)&__end_wr_after_init;
+   unsigned long i;
+   unsigned long wr_range;
+
+   wr_poking_mm = copy_init_mm();
+   if (WARN_ONCE(!wr_poking_mm, "No alternate mapping available."))
+   return;
+
+   wr_range = round_up(end - start, PAGE_SIZE);
+
+   /* Randomize the poking address base*/
+   wr_poking_base = TASK_UNMAPPED_BASE +
+   (kaslr_get_random_long("Write Rare Poking") & PAGE_MASK) %
+  

[PATCH 03/12] __wr_after_init: generic header

2018-12-19 Thread Igor Stoppa
The header provides:
- the generic part of the write rare functionality for static data
- the dummy functionality, in case an arch doesn't support write rare or
  the functionality is disabled

The basic functions are:
- wr_memset(): write rare counterpart of memset()
- wr_memcpy(): write rare counterpart of memcpy()
- wr_assign(): write rare counterpart of the assignment ('=') operator
- wr_rcu_assign_pointer(): write rare counterpart of rcu_assign_pointer()

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 include/linux/prmem.h | 142 ++
 1 file changed, 142 insertions(+)
 create mode 100644 include/linux/prmem.h

diff --git a/include/linux/prmem.h b/include/linux/prmem.h
new file mode 100644
index ..7b8f3a054d97
--- /dev/null
+++ b/include/linux/prmem.h
@@ -0,0 +1,142 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * prmem.h: Header for memory protection library
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa 
+ *
+ * Support for:
+ * - statically allocated write rare data
+ */
+
+#ifndef _LINUX_PRMEM_H
+#define _LINUX_PRMEM_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * memtst() - test n bytes of the source to match the c value
+ * @p: beginning of the memory to test
+ * @c: byte to compare against
+ * @len: amount of bytes to test
+ *
+ * Returns 0 on success, non-zero otherwise.
+ */
+static inline int memtst(void *p, int c, __kernel_size_t len)
+{
+   __kernel_size_t i;
+
+   for (i = 0; i < len; i++) {
+   u8 d =  *(i + (u8 *)p) - (u8)c;
+
+   if (unlikely(d))
+   return d;
+   }
+   return 0;
+}
+
+
+#ifndef CONFIG_PRMEM
+
+static inline void *wr_memset(void *p, int c, __kernel_size_t len)
+{
+   return memset(p, c, len);
+}
+
+static inline void *wr_memcpy(void *p, const void *q, __kernel_size_t size)
+{
+   return memcpy(p, q, size);
+}
+
+#define wr_assign(var, val)((var) = (val))
+
+#define wr_rcu_assign_pointer(p, v)\
+   rcu_assign_pointer(p, v)
+
+#else
+
+/*
+ * If CONFIG_PRMEM is enabled, the ARCH code must provide an
+ * implementation for __wr_op()
+ */
+
+enum wr_op_type {
+   WR_MEMCPY,
+   WR_MEMSET,
+   WR_OPS_NUMBER,
+};
+
+void *__wr_op(unsigned long dst, unsigned long src, __kernel_size_t len,
+ enum wr_op_type op);
+
+/**
+ * wr_memset() - sets n bytes of the destination to the c value
+ * @p: beginning of the memory to write to
+ * @c: byte to replicate
+ * @len: amount of bytes to copy
+ *
+ * Returns pointer to the destination.
+ */
+static inline void *wr_memset(void *p, int c, __kernel_size_t len)
+{
+   return __wr_op((unsigned long)p, (unsigned long)c, len, WR_MEMSET);
+}
+
+/**
+ * wr_memcpy() - copies size bytes from source to destination
+ * @p: beginning of the memory to write to
+ * @q: beginning of the memory to read from
+ * @size: amount of bytes to copy
+ *
+ * Returns pointer to the destination
+ */
+static inline void *wr_memcpy(void *p, const void *q, __kernel_size_t size)
+{
+   return __wr_op((unsigned long)p, (unsigned long)q, size, WR_MEMCPY);
+}
+
+/**
+ * wr_assign() - sets a write-rare variable to a specified value
+ * @var: the variable to set
+ * @val: the new value
+ *
+ * Returns: the variable
+ *
+ * Note: it might be possible to optimize this, to use wr_memset in some
+ * cases (maybe with NULL?).
+ */
+
+#define wr_assign(var, val) ({ \
+   typeof(var) tmp = (typeof(var))val; \
+   \
+   wr_memcpy(&var, &tmp, sizeof(var)); \
+   var;\
+})
+
+/**
+ * wr_rcu_assign_pointer() - initialize a pointer in rcu mode
+ * @p: the rcu pointer - it MUST be aligned to a machine word
+ * @v: the new value
+ *
+ * Returns the value assigned to the rcu pointer.
+ *
+ * It is provided as macro, to match rcu_assign_pointer()
+ * The rcu_assign_pointer() is implemented as equivalent of:
+ *
+ * smp_mb();
+ * WRITE_ONCE();
+ */
+#define wr_rcu_assign_pointer(p, v) ({ \
+   smp_mb();   \
+   wr_assign(p, v);\
+   p;  \
+})
+#endif
+#endif
-- 
2.19.1
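
A hedged sketch of how the rcu variant above might be used (the structure and functions are made up; only the pointer itself is write-rare, the pointed-to object is ordinary memory):

	struct example_conf {
		int level;
	};

	static struct example_conf *conf_ptr __wr_after_init;

	static void example_publish(struct example_conf *new_conf)
	{
		wr_rcu_assign_pointer(conf_ptr, new_conf);
	}

	static int example_read_level(void)
	{
		struct example_conf *c;
		int level = 0;

		rcu_read_lock();
		c = rcu_dereference(conf_ptr);
		if (c)
			level = c->level;
		rcu_read_unlock();
		return level;
	}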



[PATCH 05/12] __wr_after_init: x86_64: debug writes

2018-12-19 Thread Igor Stoppa
After each write operation, confirm that it was successful, otherwise
generate a warning.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/mm/prmem.c | 9 -
 mm/Kconfig.debug| 8 
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/prmem.c b/arch/x86/mm/prmem.c
index fc367551e736..9d98525c687a 100644
--- a/arch/x86/mm/prmem.c
+++ b/arch/x86/mm/prmem.c
@@ -60,7 +60,14 @@ void *__wr_op(unsigned long dst, unsigned long src, 
__kernel_size_t len,
copy_to_user((void __user *)wr_poking_addr, (void *)src, len);
else if (op == WR_MEMSET)
memset_user((void __user *)wr_poking_addr, (u8)src, len);
-
+#ifdef CONFIG_DEBUG_PRMEM
+   if (op == WR_MEMCPY)
+   VM_WARN_ONCE(memcmp((void *)dst, (void *)src, len),
+"Failed wr_memcpy()");
+   else if (op == WR_MEMSET)
+   VM_WARN_ONCE(memtst((void *)dst, (u8)src, len),
+"Failed wr_memset()");
+#endif
unuse_temporary_mm(prev);
local_irq_enable();
return (void *)dst;
diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index 9a7b8b049d04..b10305cfac3c 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -94,3 +94,11 @@ config DEBUG_RODATA_TEST
 depends on STRICT_KERNEL_RWX
 ---help---
   This option enables a testcase for the setting rodata read-only.
+
+config DEBUG_PRMEM
+bool "Verify each write rare operation."
+depends on PRMEM
+default n
+help
+  After any write rare operation, compares the data written with the
+  value provided by the caller.
-- 
2.19.1



[PATCH 10/12] __wr_after_init: test write rare functionality

2018-12-19 Thread Igor Stoppa
Set of test cases meant to confirm that the write rare functionality
works as expected.
It can be optionally compiled as module.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 mm/Kconfig.debug |   8 +++
 mm/Makefile  |   1 +
 mm/test_write_rare.c | 135 +++
 3 files changed, 144 insertions(+)
 create mode 100644 mm/test_write_rare.c

diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index b10305cfac3c..ae018e56c4e4 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -102,3 +102,11 @@ config DEBUG_PRMEM
 help
   After any write rare operation, compares the data written with the
   value provided by the caller.
+
+config DEBUG_PRMEM_TEST
+tristate "Run self test for statically allocated protected memory"
+depends on PRMEM
+default n
+help
+  Tries to verify that the protection for statically allocated memory
+  works correctly and that the memory is effectively protected.
diff --git a/mm/Makefile b/mm/Makefile
index d210cc9d6f80..62d719c0ee1e 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -58,6 +58,7 @@ obj-$(CONFIG_SPARSEMEM)   += sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_DEBUG_PRMEM_TEST) += test_write_rare.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/test_write_rare.c b/mm/test_write_rare.c
new file mode 100644
index ..30574bc34a20
--- /dev/null
+++ b/mm/test_write_rare.c
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * test_write_rare.c
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+extern long __start_wr_after_init;
+extern long __end_wr_after_init;
+
+static __wr_after_init int scalar = '0';
+static __wr_after_init u8 array[PAGE_SIZE * 3] __aligned(PAGE_SIZE);
+
+/* The section must occupy a non-zero number of whole pages */
+static bool test_alignment(void)
+{
+   unsigned long pstart = (unsigned long)&__start_wr_after_init;
+   unsigned long pend = (unsigned long)&__end_wr_after_init;
+
+   if (WARN((pstart & ~PAGE_MASK) || (pend & ~PAGE_MASK) ||
+(pstart >= pend), "Boundaries test failed."))
+   return false;
+   pr_info("Boundaries test passed.");
+   return true;
+}
+
+static bool test_pattern(void)
+{
+   return (memtst(array, '0', PAGE_SIZE / 2) ||
+   memtst(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 3 / 4) ||
+   memtst(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2) ||
+   memtst(array + PAGE_SIZE * 7 / 4, '1', PAGE_SIZE * 3 / 4) ||
+   memtst(array + PAGE_SIZE * 5 / 2, '0', PAGE_SIZE / 2));
+}
+
+static bool test_wr_memset(void)
+{
+   int new_val = '1';
+
+   wr_memset(&scalar, new_val, sizeof(scalar));
+   if (WARN(memtst(&scalar, new_val, sizeof(scalar)),
+"Scalar write rare memset test failed."))
+   return false;
+
+   pr_info("Scalar write rare memset test passed.");
+
+   wr_memset(array, '0', PAGE_SIZE * 3);
+   if (WARN(memtst(array, '0', PAGE_SIZE * 3),
+"Array write rare memset test failed."))
+   return false;
+
+   wr_memset(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 2);
+   if (WARN(memtst(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 2),
+"Array write rare memset test failed."))
+   return false;
+
+   wr_memset(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2);
+   if (WARN(memtst(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2),
+"Array write rare memset test failed."))
+   return false;
+
+   if (WARN(test_pattern(), "Array write rare memset test failed."))
+   return false;
+
+   pr_info("Array write rare memset test passed.");
+   return true;
+}
+
+static u8 array_1[PAGE_SIZE * 2];
+static u8 array_2[PAGE_SIZE * 2];
+
+static bool test_wr_memcpy(void)
+{
+   int new_val = 0x12345678;
+
+   wr_assign(scalar, new_val);
+   if (WARN(memcmp(&scalar, &new_val, sizeof(scalar)),
+"Scalar write rare memcpy test failed."))
+   return false;
+   pr_info("Scalar write rare memcpy test passed.");
+
+   wr_memset(array, '0', PAGE_SIZE * 3);
+   memset(array_1, '1', PAGE_SIZE * 2);
+   memset(array_2, '0', PAGE_SIZE * 2);
+

[PATCH 11/12] IMA: turn ima_policy_flags into __wr_after_init

2018-12-19 Thread Igor Stoppa
The policy flags could be targeted by an attacker aiming at disabling IMA,
so that there would be no trace of a file system modification in the
measurement list.

Since the flags can be altered at runtime, it is not possible to make
them become fully read-only, for example with __ro_after_init.

__wr_after_init can still provide some protection, at least against
simple memory overwrite attacks

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 security/integrity/ima/ima.h| 3 ++-
 security/integrity/ima/ima_init.c   | 5 +++--
 security/integrity/ima/ima_policy.c | 9 +
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index cc12f3449a72..297c25f5122e 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "../integrity.h"
@@ -50,7 +51,7 @@ enum tpm_pcrs { TPM_PCR0 = 0, TPM_PCR8 = 8 };
 #define IMA_TEMPLATE_IMA_FMT "d|n"
 
 /* current content of the policy */
-extern int ima_policy_flag;
+extern int ima_policy_flag __wr_after_init;
 
 /* set during initialization */
 extern int ima_hash_algo;
diff --git a/security/integrity/ima/ima_init.c 
b/security/integrity/ima/ima_init.c
index 59d834219cd6..5f4e13e671bf 100644
--- a/security/integrity/ima/ima_init.c
+++ b/security/integrity/ima/ima_init.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ima.h"
 
@@ -98,9 +99,9 @@ void __init ima_load_x509(void)
 {
int unset_flags = ima_policy_flag & IMA_APPRAISE;
 
-   ima_policy_flag &= ~unset_flags;
+   wr_assign(ima_policy_flag, ima_policy_flag & ~unset_flags);
integrity_load_x509(INTEGRITY_KEYRING_IMA, CONFIG_IMA_X509_PATH);
-   ima_policy_flag |= unset_flags;
+   wr_assign(ima_policy_flag, ima_policy_flag | unset_flags);
 }
 #endif
 
diff --git a/security/integrity/ima/ima_policy.c 
b/security/integrity/ima/ima_policy.c
index 7489cb7de6dc..2004de818d92 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -47,7 +47,7 @@
 #define INVALID_PCR(a) (((a) < 0) || \
(a) >= (FIELD_SIZEOF(struct integrity_iint_cache, measured_pcrs) * 8))
 
-int ima_policy_flag;
+int ima_policy_flag __wr_after_init;
 static int temp_ima_appraise;
 static int build_ima_appraise __ro_after_init;
 
@@ -452,12 +452,13 @@ void ima_update_policy_flag(void)
 
list_for_each_entry(entry, ima_rules, list) {
if (entry->action & IMA_DO_MASK)
-   ima_policy_flag |= entry->action;
+   wr_assign(ima_policy_flag,
+ ima_policy_flag | entry->action);
}
 
ima_appraise |= (build_ima_appraise | temp_ima_appraise);
if (!ima_appraise)
-   ima_policy_flag &= ~IMA_APPRAISE;
+   wr_assign(ima_policy_flag, ima_policy_flag & ~IMA_APPRAISE);
 }
 
 static int ima_appraise_flag(enum ima_hooks func)
@@ -574,7 +575,7 @@ void ima_update_policy(void)
list_splice_tail_init_rcu(_temp_rules, policy, synchronize_rcu);
 
if (ima_rules != policy) {
-   ima_policy_flag = 0;
+   wr_assign(ima_policy_flag, 0);
ima_rules = policy;
}
ima_update_policy_flag();
-- 
2.19.1



[PATCH 09/12] rodata_test: add verification for __wr_after_init

2018-12-19 Thread Igor Stoppa
The write protection of the __wr_after_init data can be verified with the
same methodology used for const data.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 mm/rodata_test.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/mm/rodata_test.c b/mm/rodata_test.c
index e1349520b436..a669cf9f5a61 100644
--- a/mm/rodata_test.c
+++ b/mm/rodata_test.c
@@ -16,8 +16,23 @@
 
 #define INIT_TEST_VAL 0xC3
 
+/*
+ * Note: __ro_after_init data is, for all practical purposes, equivalent to
+ * const data, since both are write protected at the same time; there
+ * is no need for separate testing.
+ * __wr_after_init data, otoh, is altered also after the write protection
+ * takes place, and it cannot be exploited for altering more permanent
+ * data.
+ */
+
 static const int rodata_test_data = INIT_TEST_VAL;
 
+#ifdef CONFIG_PRMEM
+static int wr_after_init_test_data __wr_after_init = INIT_TEST_VAL;
+extern long __start_wr_after_init;
+extern long __end_wr_after_init;
+#endif
+
 static bool test_data(char *data_type, const int *data,
  unsigned long start, unsigned long end)
 {
@@ -59,7 +74,13 @@ static bool test_data(char *data_type, const int *data,
 
 void rodata_test(void)
 {
-   test_data("rodata", &rodata_test_data,
- (unsigned long)&__start_rodata,
- (unsigned long)&__end_rodata);
+   if (!test_data("rodata", &rodata_test_data,
+  (unsigned long)&__start_rodata,
+  (unsigned long)&__end_rodata))
+   return;
+#ifdef CONFIG_PRMEM
+   test_data("wr after init data", &wr_after_init_test_data,
+ (unsigned long)&__start_wr_after_init,
+ (unsigned long)&__end_wr_after_init);
+#endif
 }
-- 
2.19.1



[PATCH 06/12] __wr_after_init: Documentation: self-protection

2018-12-19 Thread Igor Stoppa
Update the self-protection documentation, to mention also the use of the
__wr_after_init attribute.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 Documentation/security/self-protection.rst | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/Documentation/security/self-protection.rst 
b/Documentation/security/self-protection.rst
index f584fb74b4ff..df2614bc25b9 100644
--- a/Documentation/security/self-protection.rst
+++ b/Documentation/security/self-protection.rst
@@ -84,12 +84,14 @@ For variables that are initialized once at ``__init`` time, 
these can
 be marked with the (new and under development) ``__ro_after_init``
 attribute.
 
-What remains are variables that are updated rarely (e.g. GDT). These
-will need another infrastructure (similar to the temporary exceptions
-made to kernel code mentioned above) that allow them to spend the rest
-of their lifetime read-only. (For example, when being updated, only the
-CPU thread performing the update would be given uninterruptible write
-access to the memory.)
+Others, which are statically allocated, but still need to be updated
+rarely, can be marked with the ``__wr_after_init`` attribute.
+
+The update mechanism must avoid exposing the data to rogue alterations
+during the update. For example, only the CPU thread performing the update
+would be given uninterruptible write access to the memory.
+
+Currently there is no protection available for data allocated dynamically.
 
 Segregation of kernel memory from userspace memory
 ~~
-- 
2.19.1



[PATCH 12/12] x86_64: __clear_user as case of __memset_user

2018-12-19 Thread Igor Stoppa
To avoid code duplication, re-use __memset_user() when clearing
user-space memory.

The overhead should be minimal (2 extra register assignments) and
outside of the writing loop.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/lib/usercopy_64.c | 29 +
 1 file changed, 1 insertion(+), 28 deletions(-)

diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index 84f8f8a20b30..ab6aabb62055 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -69,34 +69,7 @@ EXPORT_SYMBOL(memset_user);
 
 unsigned long __clear_user(void __user *addr, unsigned long size)
 {
-   long __d0;
-   might_fault();
-   /* no memory constraint because it doesn't change any memory gcc knows
-  about */
-   stac();
-   asm volatile(
-   "   testq  %[size8],%[size8]\n"
-   "   jz 4f\n"
-   "0: movq $0,(%[dst])\n"
-   "   addq   $8,%[dst]\n"
-   "   decl %%ecx ; jnz   0b\n"
-   "4: movq  %[size1],%%rcx\n"
-   "   testl %%ecx,%%ecx\n"
-   "   jz 2f\n"
-   "1: movb   $0,(%[dst])\n"
-   "   incq   %[dst]\n"
-   "   decl %%ecx ; jnz  1b\n"
-   "2:\n"
-   ".section .fixup,\"ax\"\n"
-   "3: lea 0(%[size1],%[size8],8),%[size8]\n"
-   "   jmp 2b\n"
-   ".previous\n"
-   _ASM_EXTABLE_UA(0b, 3b)
-   _ASM_EXTABLE_UA(1b, 2b)
-   : [size8] "=&c"(size), [dst] "=&D" (__d0)
-   : [size1] "r"(size & 7), "[size8]" (size / 8), "[dst]"(addr));
-   clac();
-   return size;
+   return __memset_user(addr, 0, size);
 }
 EXPORT_SYMBOL(__clear_user);
 
-- 
2.19.1



[RFC v2 PATCH 0/12] hardening: statically allocated protected memory

2018-12-19 Thread Igor Stoppa
Patch-set implementing write-rare memory protection for statically
allocated data.
Its purpose is to write protect kernel data which is seldom modified.
There is no read overhead, however writing requires special operations that
are probably unsuitable for often-changing data.
The use is opt-in, by applying the modifier __wr_after_init to a variable
declaration.

As the name implies, the write protection kicks in only after init() is
completed; before that moment, the data is modifiable in the usual way.
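
To make the opt-in usage concrete, here is a minimal, purely illustrative
sketch; the variable and the functions around it are made up, only
__wr_after_init and wr_assign() (and the rest of the wr_* API) come from
this series:

/* Hypothetical example, not part of the patches. */
#include <linux/cache.h>
#include <linux/prmem.h>

static int session_limit __wr_after_init = 64;	/* made-up variable */

static int __init session_limit_init(void)
{
	/* Before mark_rodata_ro(), a plain assignment still works. */
	session_limit = 128;
	return 0;
}

static void session_limit_update(int new_limit)
{
	/* After init, changes must go through the write-rare API. */
	wr_assign(session_limit, new_limit);
}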

Current Limitations:
* supports only data which is allocated statically, at build time.
* supports only x86_64; other architectures need to provide their own backend

Some notes:
- there is a part of generic code which is basically a NOP, but should
  allow using unconditionally the write protection. It will automatically
  default to non-protected functionality, if the specific architecture
  doesn't support write-rare
- to avoid the risk of weakening __ro_after_init, __wr_after_init data is
  in a separate set of pages, and any invocation will confirm that the
  memory affected falls within this range.
  rodata_test is modified accordingly, to check also this case.
- for now, the patchset addresses only x86_64, as each architecture seems
  to have its own way of dealing with user space. Once a few are implemented,
  it should be more obvious what code can be refactored as common.
- the memset_user() assembly function seems to work, but I'm not too sure
  it's really ok
- I've added a simple example: the protection of ima_policy_flags
- the last patch is optional, but the refactoring seemed worth doing

Changelog:

v1->v2

* introduce cleaner split between generic and arch code
* add x86_64 specific memset_user()
* replace kernel-space memset() memcopy() with userspace counterpart
* randomize the base address for the alternate map across the entire
  available address range from user space (128TB - 64TB)
* convert BUG() to WARN()
* turn verification of written data into debugging option
* wr_rcu_assign_pointer() as special case of wr_assign()
* example with protection of ima_policy_flags
* documentation

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org

Igor Stoppa (12):
[PATCH 01/12] x86_64: memset_user()
[PATCH 02/12] __wr_after_init: linker section and label
[PATCH 03/12] __wr_after_init: generic header
[PATCH 04/12] __wr_after_init: x86_64: __wr_op
[PATCH 05/12] __wr_after_init: x86_64: debug writes
[PATCH 06/12] __wr_after_init: Documentation: self-protection
[PATCH 07/12] __wr_after_init: lkdtm test
[PATCH 08/12] rodata_test: refactor tests
[PATCH 09/12] rodata_test: add verification for __wr_after_init
[PATCH 10/12] __wr_after_init: test write rare functionality
[PATCH 11/12] IMA: turn ima_policy_flags into __wr_after_init
[PATCH 12/12] x86_64: __clear_user as case of __memset_user


Documentation/security/self-protection.rst |  14 ++-
arch/Kconfig   |  15 +++
arch/x86/Kconfig   |   1 +
arch/x86/include/asm/uaccess_64.h  |   6 +
arch/x86/lib/usercopy_64.c |  41 +--
arch/x86/mm/Makefile   |   2 +
arch/x86/mm/prmem.c| 127 +
drivers/misc/lkdtm/core.c  |   3 +
drivers/misc/lkdtm/lkdtm.h |   3 +
drivers/misc/lkdtm/perms.c |  29 +
include/asm-generic/vmlinux.lds.h  |  25 +
include/linux/cache.h  |  21 
include/linux/prmem.h  | 142 
init/main.c|   2 +
mm/Kconfig.debug   |  16 +++
mm/Makefile|   1 +
mm/rodata_test.c   |  69 
mm/test_write_rare.c   | 135 ++
security/integrity/ima/ima.h   |   3 +-
security/integrity/ima/ima_init.c  |   5 +-
security/integrity/ima/ima_policy.c|   9 +-
21 files changed, 629 insertions(+), 40 deletions(-)



[PATCH 01/12] x86_64: memset_user()

2018-12-19 Thread Igor Stoppa
Create x86_64 specific version of memset for user space, based on
clear_user().
This will be used for implementing wr_memset() in the __wr_after_init
scenario, where write-rare variables have an alternate mapping for
writing.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/include/asm/uaccess_64.h |  6 
 arch/x86/lib/usercopy_64.c| 54 +++
 2 files changed, 60 insertions(+)

diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index a9d637bc301d..f194bfce4866 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -213,4 +213,10 @@ copy_user_handle_tail(char *to, char *from, unsigned len);
 unsigned long
 mcsafe_handle_tail(char *to, char *from, unsigned len);
 
+unsigned long __must_check
+memset_user(void __user *mem, int c, unsigned long len);
+
+unsigned long __must_check
+__memset_user(void __user *mem, int c, unsigned long len);
+
 #endif /* _ASM_X86_UACCESS_64_H */
diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index 1bd837cdc4b1..84f8f8a20b30 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -9,6 +9,60 @@
 #include 
 #include 
 
+/*
+ * Memset Userspace
+ */
+
+unsigned long __memset_user(void __user *addr, int c, unsigned long size)
+{
+   long __d0;
+   unsigned long  pattern = 0;
+   int i;
+
+   for (i = 0; i < 8; i++)
+   pattern = (pattern << 8) | (0xFF & c);
+   might_fault();
+   /* no memory constraint: gcc doesn't know about this memory */
+   stac();
+   asm volatile(
+   "   movq %[val], %%rdx\n"
+   "   testq  %[size8],%[size8]\n"
+   "   jz 4f\n"
+   "0: mov %%rdx,(%[dst])\n"
+   "   addq   $8,%[dst]\n"
+   "   decl %%ecx ; jnz   0b\n"
+   "4: movq  %[size1],%%rcx\n"
+   "   testl %%ecx,%%ecx\n"
+   "   jz 2f\n"
+   "1: movb   %%dl,(%[dst])\n"
+   "   incq   %[dst]\n"
+   "   decl %%ecx ; jnz  1b\n"
+   "2:\n"
+   ".section .fixup,\"ax\"\n"
+   "3: lea 0(%[size1],%[size8],8),%[size8]\n"
+   "   jmp 2b\n"
+   ".previous\n"
+   _ASM_EXTABLE_UA(0b, 3b)
+   _ASM_EXTABLE_UA(1b, 2b)
+   : [size8] "=&c"(size), [dst] "=&D" (__d0)
+   : [size1] "r"(size & 7), "[size8]" (size / 8), "[dst]"(addr),
+ [val] "ri"(pattern)
+   : "rdx");
+
+   clac();
+   return size;
+}
+EXPORT_SYMBOL(__memset_user);
+
+unsigned long memset_user(void __user *to, int c, unsigned long n)
+{
+   if (access_ok(VERIFY_WRITE, to, n))
+   return __memset_user(to, c, n);
+   return n;
+}
+EXPORT_SYMBOL(memset_user);
+
+
 /*
  * Zero Userspace
  */
-- 
2.19.1



[PATCH 02/12] __wr_after_init: linker section and label

2018-12-19 Thread Igor Stoppa
Introduce a section and a label for statically allocated write rare
data. The label is named "__wr_after_init".
As the name implies, after the init phase is completed, this section
will be modifiable only by invoking write rare functions.
The section must take up a set of full pages.

To activate both section and label, the arch must set CONFIG_ARCH_HAS_PRMEM

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: Mimi Zohar 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 arch/Kconfig  | 15 +++
 include/asm-generic/vmlinux.lds.h | 25 +
 include/linux/cache.h | 21 +
 init/main.c   |  2 ++
 4 files changed, 63 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index e1e540ffa979..8668ffec8098 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -802,6 +802,21 @@ config VMAP_STACK
  the stack to map directly to the KASAN shadow map using a formula
  that is incorrect if the stack is in vmalloc space.
 
+config ARCH_HAS_PRMEM
+   def_bool n
+   help
+ architecture specific symbol stating that the architecture provides
+ a back-end function for the write rare operation.
+
+config PRMEM
+   bool "Write protect critical data that doesn't need high write speed."
+   depends on ARCH_HAS_PRMEM
+   default y
+   help
+ If the architecture supports it, statically allocated data which
+ has been selected for hardening becomes (mostly) read-only.
+ The selection happens by labelling the data "__wr_after_init".
+
 config ARCH_OPTIONAL_KERNEL_RWX
def_bool n
 
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 3d7a6a9c2370..ddb1fd608490 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -311,6 +311,30 @@
KEEP(*(__jump_table))   \
__stop___jump_table = .;
 
+/*
+ * Allow architectures to handle wr_after_init data on their
+ * own by defining an empty WR_AFTER_INIT_DATA.
+ * However, it's important that pages containing WR_RARE data do not
+ * hold anything else, to avoid both accidentally unprotecting something
+ * that is supposed to stay read-only all the time and also to protect
+ * something else that is supposed to be writeable all the time.
+ */
+#ifndef WR_AFTER_INIT_DATA
+#ifdef CONFIG_PRMEM
+#define WR_AFTER_INIT_DATA(align)  \
+   . = ALIGN(PAGE_SIZE);   \
+   __start_wr_after_init = .;  \
+   . = ALIGN(align);   \
+   *(.data..wr_after_init) \
+   . = ALIGN(PAGE_SIZE);   \
+   __end_wr_after_init = .;\
+   . = ALIGN(align);
+#else
+#define WR_AFTER_INIT_DATA(align)  \
+   . = ALIGN(align);
+#endif
+#endif
+
 /*
  * Allow architectures to handle ro_after_init data on their
  * own by defining an empty RO_AFTER_INIT_DATA.
@@ -332,6 +356,7 @@
__start_rodata = .; \
*(.rodata) *(.rodata.*) \
RO_AFTER_INIT_DATA  /* Read only after init */  \
+   WR_AFTER_INIT_DATA(align) /* wr after init */   \
KEEP(*(__vermagic)) /* Kernel version magic */  \
. = ALIGN(8);   \
__start___tracepoints_ptrs = .; \
diff --git a/include/linux/cache.h b/include/linux/cache.h
index 750621e41d1c..09bd0b9284b6 100644
--- a/include/linux/cache.h
+++ b/include/linux/cache.h
@@ -31,6 +31,27 @@
 #define __ro_after_init __attribute__((__section__(".data..ro_after_init")))
 #endif
 
+/*
+ * __wr_after_init is used to mark objects that cannot be modified
+ * directly after init (i.e. after mark_rodata_ro() has been called).
+ * These objects become effectively read-only, from the perspective of
+ * performing a direct write, like a variable assignment.
+ * However, they can be altered through a dedicated function.
+ * It is intended for those objects which are occasionally modified after
+ * init, however they are modified so seldom that the extra cost from
+ * the indirect modification is either negligible or worth paying, for the
+ * sake of the protection gained.
+ */
+#ifndef __wr_after_init
+#ifdef CONFIG_PRMEM
+#define __wr_after_init \
+   __attribute__((__section__(".data..wr_after_init")

[PATCH] checkpatch.pl: Improve WARNING on Kconfig help

2018-12-19 Thread Igor Stoppa
The checkpatch.pl script complains when the help section of a Kconfig
entry is too short, but it doesn't really explain what it is looking
for. Instead, it gives a generic warning that one should consider writing
a paragraph.

But what it *really* checks is that the help section is at least
$min_conf_desc_length lines long.

Since the definition of what is a paragraph is not really carved in
stone (and actually the primary description is "5 sentences"), make the
warning less ambiguous by stating the actual test condition explicitly,
so that one doesn't have to read the checkpatch.pl sources to figure out
the actual test.

Signed-off-by: Igor Stoppa 
CC: Andy Whitcroft 
CC: Joe Perches 
CC: linux-kernel@vger.kernel.org
---
 scripts/checkpatch.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index c883ec55654f..33568d7e28d1 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2931,7 +2931,7 @@ sub process {
}
if ($is_start && $is_end && $length < 
$min_conf_desc_length) {
WARN("CONFIG_DESCRIPTION",
-"please write a paragraph that describes 
the config symbol fully\n" . $herecurr);
+"expecting a 'help' section of " 
.$min_conf_desc_length . "+ lines\n" . $herecurr);
}
#print "is_start<$is_start> is_end<$is_end> 
length<$length>\n";
}
-- 
2.19.1



Re: [PATCH] checkpatch.pl: Improve WARNING on Kconfig help

2018-12-19 Thread Igor Stoppa




On 19/12/2018 14:29, Joe Perches wrote:

On Wed, 2018-12-19 at 11:59 +, Andy Whitcroft wrote:

On Wed, Dec 19, 2018 at 02:44:36AM -0800, Joe Perches wrote:



To cover both cases perhaps:

"please ensure that this config symbols is described fully (less than
 $min_conf_desc_length lines is quite brief)"


This is one of those checkpatch bleats I never
really thought was appropriate as some or many
Kconfig symbols are fully descriptive in even
with only a single line.

Also, it seems you are arguing for a checkpatch
--verbose-help output style rather than the
intentionally terse single line output that the
script produces today.


If I have to use --verbose, to understand that the warning is about me 
writing 3 lines when the script expects 4, I don't think it's 
particularly user friendly.


Let's write "Expected 4+ lines" or something equally clear.
It will fit in a row and get the job done.


That is something Al Viro once suggested in this thread:
https://lore.kernel.org/patchwork/patch/775901/

On Sat, 2017-04-01 at 05:08 +0100, Al Viro wrote:

On Fri, Mar 31, 2017 at 08:52:50PM -0700, Joe Perches wrote:

checkpatch messages are single line.


Too bad... Incidentally, being able to get more detailed explanation of
a warning might be a serious improvement, especially if it contains
the rationale.  Hell, something like TeX handling of errors might be
a good idea - warning printed, offered actions include 'give more help',
'continue', 'exit', 'from now on suppress this kind of warning', 'from
now on just dump this kind of warning into log and keep going', 'from
now on dump all warnings into log and keep going'.


It's all good in general, but here the word "paragraph" is being abused, 
in the sense that it has been given an arbitrary meaning of "4 lines".
And the warning is even worse because it doesn't even acknowledge that I 
wrote something, even if it's a meager 1 or 2 lines.

Which is even more confusing.

As user, if I'm running checkpatch.pl and I get a warning, I should 
spend my time trying to decide if/how to fix it, not re-invoking it with 
extra options or reading its sources.


--
igor





[PATCH] checkpatch.pl: Improve WARNING on Kconfig help

2018-12-19 Thread Igor Stoppa
The checkpatch.pl script complains when the help section of a Kconfig
entry is too short, but it doesn't really explain what it is looking
for. Instead, it gives a generic warning that one should consider writing
a paragraph.

But what it *really* checks is that the help section is at least
$min_conf_desc_length lines long.

Since the definition of what is a paragraph is not really carved in
stone (and actually the primary description is "5 sentences"), make the
warning less ambiguous by stating the actual test condition explicitly,
so that one doesn't have to read the checkpatch.pl sources to figure out
the actual test.

Signed-off-by: Igor Stoppa 
CC: Andy Whitcroft 
CC: Joe Perches 
CC: linux-kernel@vger.kernel.org
---
 scripts/checkpatch.pl | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index c883ec55654f..e255f0423cca 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2931,7 +2931,8 @@ sub process {
}
if ($is_start && $is_end && $length < 
$min_conf_desc_length) {
WARN("CONFIG_DESCRIPTION",
-"please write a paragraph that describes 
the config symbol fully\n" . $herecurr);
+"please write a paragraph (" 
.$min_conf_desc_length . " lines)" .
+" that describes the config symbol 
fully\n" . $herecurr);
}
#print "is_start<$is_start> is_end<$is_end> 
length<$length>\n";
}
-- 
2.19.1



Re: [PATCH 2/6] __wr_after_init: write rare for static allocation

2018-12-09 Thread Igor Stoppa




On 06/12/2018 11:44, Peter Zijlstra wrote:

On Wed, Dec 05, 2018 at 03:13:56PM -0800, Andy Lutomirski wrote:


+   if (op == WR_MEMCPY)
+   memcpy((void *)wr_poking_addr, (void *)src, len);
+   else if (op == WR_MEMSET)
+   memset((u8 *)wr_poking_addr, (u8)src, len);
+   else if (op == WR_RCU_ASSIGN_PTR)
+   /* generic version of rcu_assign_pointer */
+   smp_store_release((void **)wr_poking_addr,
+ RCU_INITIALIZER((void **)src));
+   kasan_enable_current();


Hmm.  I suspect this will explode quite badly on sane architectures
like s390.  (In my book, despite how weird s390 is, it has a vastly
nicer model of "user" memory than any other architecture I know
of...).  I think you should use copy_to_user(), etc, instead.  I'm not
entirely sure what the best smp_store_release() replacement is.
Making this change may also mean you can get rid of the
kasan_disable_current().


If you make the MEMCPY one guarantee single-copy atomicity for native
words then you're basically done.

smp_store_release() can be implemented with:

smp_mb();
WRITE_ONCE();

So if we make MEMCPY provide the WRITE_ONCE(), all we need is that
barrier, which we can easily place at the call site and not overly
complicate our interface with this.


Ok, so the 3rd case (WR_RCU_ASSIGN_PTR) could be handled outside of this 
function.
But, since now memcpy() will be replaced by copy_to_user(), can I assume 
that copy_to_user() will also be atomic, if the destination is properly 
aligned? On x86_64 it seems so, however it's not clear to me if this is 
the outcome of an optimization or if I can expect it to always hold true.



--
igor


Re: [PATCH 2/6] __wr_after_init: write rare for static allocation

2018-12-09 Thread Igor Stoppa




On 06/12/2018 06:44, Matthew Wilcox wrote:

On Tue, Dec 04, 2018 at 02:18:01PM +0200, Igor Stoppa wrote:

+void *__wr_op(unsigned long dst, unsigned long src, __kernel_size_t len,
+ enum wr_op_type op)
+{
+   temporary_mm_state_t prev;
+   unsigned long flags;
+   unsigned long offset;
+   unsigned long wr_poking_addr;
+
+   /* Confirm that the writable mapping exists. */
+   BUG_ON(!wr_ready);
+
+   if (WARN_ONCE(op >= WR_OPS_NUMBER, "Invalid WR operation.") ||
+   WARN_ONCE(!is_wr_after_init(dst, len), "Invalid WR range."))
+   return (void *)dst;
+
+   offset = dst - (unsigned long)&__start_wr_after_init;
+   wr_poking_addr = wr_poking_base + offset;
+   local_irq_save(flags);


Why not local_irq_disable()?  Do we have a use-case for wanting to access
this from interrupt context?


No, not that I can think of. It was "just in case", but I can remove it.


+   /* XXX make the verification optional? */


Well, yes.  It seems like debug code to me.


Ok, I was not sure about this, because text_poke() does it as part of 
its normal operations.



+   /* Randomize the poking address base*/
+   wr_poking_base = TASK_UNMAPPED_BASE +
+   (kaslr_get_random_long("Write Rare Poking") & PAGE_MASK) %
+   (TASK_SIZE - (TASK_UNMAPPED_BASE + wr_range));


I don't think this is a great idea.  We want to use the same mm for both
static and dynamic wr memory, yes?  So we should have enough space for
all of ram, not splatter the static section all over the address space.

On x86-64 (4 level page tables), we have a 64TB space for all of physmem
and 128TB of user space, so we can place the base anywhere in a 64TB
range.


I was actually wondering about the dynamic part.
It's still not clear to me if it's possible to write the code in a 
sufficiently generic way that it could work on all 64 bit architectures.

I'll start with x86-64 as you suggest.

--
igor



Re: [PATCH 2/6] __wr_after_init: write rare for static allocation

2018-12-09 Thread Igor Stoppa

On 06/12/2018 01:13, Andy Lutomirski wrote:


+   kasan_disable_current();
+   if (op == WR_MEMCPY)
+   memcpy((void *)wr_poking_addr, (void *)src, len);
+   else if (op == WR_MEMSET)
+   memset((u8 *)wr_poking_addr, (u8)src, len);
+   else if (op == WR_RCU_ASSIGN_PTR)
+   /* generic version of rcu_assign_pointer */
+   smp_store_release((void **)wr_poking_addr,
+ RCU_INITIALIZER((void **)src));
+   kasan_enable_current();


Hmm.  I suspect this will explode quite badly on sane architectures
like s390.  (In my book, despite how weird s390 is, it has a vastly
nicer model of "user" memory than any other architecture I know
of...).


I see. I can try to also set up a qemu target for s390, for my tests.
There seems to be a Debian image, to have a fully bootable system.


I think you should use copy_to_user(), etc, instead.


I'm having trouble with the "etc" part: as far as I can see, there is 
both generic and architecture-specific support for copying and for 
clearing user-space memory from the kernel, however I couldn't find 
anything that looks like a memset_user().


I can of course roll my own, for example iterating copy_to_user() with 
the support of a pre-allocated static buffer (1 page should be enough).


But, before I go down this path, I wanted to confirm that there's really 
nothing better that I could use.


If that's really the case, the static buffer instance should be 
replicated for each core, I think, since each core could be performing 
its own memset_user() at the same time.
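
For reference, this is roughly what I mean (pure sketch, not part of the 
series; it uses a small on-stack buffer instead of the pre-allocated 
per-core page, only to keep the example short):

#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/uaccess.h>

/* Hypothetical fallback: emulate memset_user() on top of copy_to_user(). */
static unsigned long memset_user_fallback(void __user *to, int c,
					  unsigned long n)
{
	char buf[64];
	unsigned long left = n;

	memset(buf, c, sizeof(buf));
	while (left) {
		unsigned long chunk = min_t(unsigned long, left, sizeof(buf));

		if (copy_to_user(to, buf, chunk))
			break;
		to += chunk;
		left -= chunk;
	}
	return left;	/* bytes not written, mirroring clear_user() */
}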


Alternatively, I could do a loop of WRITE_ONCE(), however I'm not sure 
how that would work with (lack of) alignment, and it might also require 
a preamble/epilogue to deal with unaligned data.
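
Something along these lines (again just a sketch, with made-up names):

/*
 * Covers only the aligned, word-sized middle of the range; the unaligned
 * head/tail bytes would still need the preamble/epilogue mentioned above.
 */
static void wr_fill_aligned(unsigned long *dst, unsigned long pattern,
			    size_t n_words)
{
	size_t i;

	for (i = 0; i < n_words; i++)
		WRITE_ONCE(dst[i], pattern);
}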



 I'm not
entirely sure what the best smp_store_release() replacement is.
Making this change may also mean you can get rid of the
kasan_disable_current().


+
+   barrier(); /* XXX redundant? */


I think it's redundant.  If unuse_temporary_mm() allows earlier stores
to hit the wrong address space, then something is very very wrong, and
something is also very very wrong if the optimizer starts moving
stores across a function call that is most definitely a barrier.


ok, thanks


+
+   unuse_temporary_mm(prev);
+   /* XXX make the verification optional? */
+   if (op == WR_MEMCPY)
+   BUG_ON(memcmp((void *)dst, (void *)src, len));
+   else if (op == WR_MEMSET)
+   BUG_ON(memtst((void *)dst, (u8)src, len));
+   else if (op == WR_RCU_ASSIGN_PTR)
+   BUG_ON(*(unsigned long *)dst != src);


Hmm.  If you allowed cmpxchg or even plain xchg, then these bug_ons
would be thoroughly buggy, but maybe they're okay.  But they should,
at most, be WARN_ON_ONCE(), 


I have to confess that I do not understand why Nadav's patchset was 
required to use BUG_ON(), while here it's not correct, not even for 
memcpy() or memset().


Is it because it is single-threaded?
Or is it because text_poke() is patching code, instead of data?
I can turn to WARN_ON_ONCE(), but I'd like to understand the reason.


given that you can trigger them by writing
the same addresses from two threads at once, and this isn't even
entirely obviously bogus given the presence of smp_store_release().


True, however would it be reasonable to require the use of an explicit 
writer lock from the user?


This operation is not exactly fast and should happen seldom; I'm not 
sure if it's worth supporting cmpxchg. The speedup would be minimal.


I'd rather not implement the locking implicitly, even if it would be 
possible to detect simultaneous writes, because it might lead to overall 
inconsistent data.
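
To show what I mean by an explicit writer lock, here is a rough 
caller-side sketch (the names are made up; only wr_memcpy() and 
__wr_after_init come from the patchset):

/* Hypothetical caller-side serialization, not part of the series. */
struct foo_cfg {
	int mode;
	unsigned long flags;
};

static DEFINE_MUTEX(foo_cfg_lock);
static struct foo_cfg foo_cfg __wr_after_init;

static void foo_cfg_update(const struct foo_cfg *new_cfg)
{
	mutex_lock(&foo_cfg_lock);
	wr_memcpy(&foo_cfg, new_cfg, sizeof(foo_cfg));
	mutex_unlock(&foo_cfg_lock);
}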


--
igor


[PATCH 6/6] __wr_after_init: lkdtm test

2018-12-04 Thread Igor Stoppa
Verify that trying to modify a variable with the __wr_after_init
modifier will cause a crash.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 drivers/misc/lkdtm/core.c  |  3 +++
 drivers/misc/lkdtm/lkdtm.h |  3 +++
 drivers/misc/lkdtm/perms.c | 29 +
 3 files changed, 35 insertions(+)

diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index 2837dc77478e..73c34b17c433 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -155,6 +155,9 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(ACCESS_USERSPACE),
CRASHTYPE(WRITE_RO),
CRASHTYPE(WRITE_RO_AFTER_INIT),
+#ifdef CONFIG_PRMEM
+   CRASHTYPE(WRITE_WR_AFTER_INIT),
+#endif
CRASHTYPE(WRITE_KERN),
CRASHTYPE(REFCOUNT_INC_OVERFLOW),
CRASHTYPE(REFCOUNT_ADD_OVERFLOW),
diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
index 3c6fd327e166..abba2f52ffa6 100644
--- a/drivers/misc/lkdtm/lkdtm.h
+++ b/drivers/misc/lkdtm/lkdtm.h
@@ -38,6 +38,9 @@ void lkdtm_READ_BUDDY_AFTER_FREE(void);
 void __init lkdtm_perms_init(void);
 void lkdtm_WRITE_RO(void);
 void lkdtm_WRITE_RO_AFTER_INIT(void);
+#ifdef CONFIG_PRMEM
+void lkdtm_WRITE_WR_AFTER_INIT(void);
+#endif
 void lkdtm_WRITE_KERN(void);
 void lkdtm_EXEC_DATA(void);
 void lkdtm_EXEC_STACK(void);
diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index 53b85c9d16b8..f681730aa652 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* Whether or not to fill the target memory area with do_nothing(). */
@@ -27,6 +28,10 @@ static const unsigned long rodata = 0xAA55AA55;
 /* This is marked __ro_after_init, so it should ultimately be .rodata. */
 static unsigned long ro_after_init __ro_after_init = 0x55AA5500;
 
+/* This is marked __wr_after_init, so it should be in .rodata. */
+static
+unsigned long wr_after_init __wr_after_init = 0x55AA5500;
+
 /*
  * This just returns to the caller. It is designed to be copied into
  * non-executable memory regions.
@@ -104,6 +109,28 @@ void lkdtm_WRITE_RO_AFTER_INIT(void)
*ptr ^= 0xabcd1234;
 }
 
+#ifdef CONFIG_PRMEM
+
+void lkdtm_WRITE_WR_AFTER_INIT(void)
+{
+   unsigned long *ptr = &wr_after_init;
+
+   /*
+* Verify we were written to during init. Since an Oops
+* is considered a "success", a failure is to just skip the
+* real test.
+*/
+   if ((*ptr & 0xAA) != 0xAA) {
+   pr_info("%p was NOT written during init!?\n", ptr);
+   return;
+   }
+
+   pr_info("attempting bad wr_after_init write at %p\n", ptr);
+   *ptr ^= 0xabcd1234;
+}
+
+#endif
+
 void lkdtm_WRITE_KERN(void)
 {
size_t size;
@@ -200,4 +227,6 @@ void __init lkdtm_perms_init(void)
/* Make sure we can write to __ro_after_init values during __init */
ro_after_init |= 0xAA;
 
+   /* Make sure we can write to __wr_after_init during __init */
+   wr_after_init |= 0xAA;
 }
-- 
2.19.1



[PATCH 2/6] __wr_after_init: write rare for static allocation

2018-12-04 Thread Igor Stoppa
Implementation of write rare for statically allocated data, located in a
specific memory section through the use of the __wr_after_init label.

The basic functions are:
- wr_memset(): write rare counterpart of memset()
- wr_memcpy(): write rare counterpart of memcpy()
- wr_assign(): write rare counterpart of the assignment ('=') operator
- wr_rcu_assign_pointer(): write rare counterpart of rcu_assign_pointer()

The implementation is based on code from Andy Lutomirski and Nadav Amit
for patching the text on x86 [here goes reference to commits, once merged]

The modification of write protected data is done through an alternate
mapping of the same pages, as writable.
This mapping is local to each core and is active only for the duration
of each write operation.
Local interrupts are disabled, while the alternate mapping is active.

In theory, it could introduce a non-predictable delay, in a preemptible
system, however the amount of data to be altered is likely to be far
smaller than a page.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 include/linux/prmem.h | 133 ++
 init/main.c   |   2 +
 mm/Kconfig|   4 ++
 mm/Makefile   |   1 +
 mm/prmem.c| 124 +++
 5 files changed, 264 insertions(+)
 create mode 100644 include/linux/prmem.h
 create mode 100644 mm/prmem.c

diff --git a/include/linux/prmem.h b/include/linux/prmem.h
new file mode 100644
index ..b0131c1f5dc0
--- /dev/null
+++ b/include/linux/prmem.h
@@ -0,0 +1,133 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * prmem.h: Header for memory protection library
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa 
+ *
+ * Support for:
+ * - statically allocated write rare data
+ */
+
+#ifndef _LINUX_PRMEM_H
+#define _LINUX_PRMEM_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * memtst() - test n bytes of the source to match the c value
+ * @p: beginning of the memory to test
+ * @c: byte to compare against
+ * @len: amount of bytes to test
+ *
+ * Returns 0 on success, non-zero otherwise.
+ */
+static inline int memtst(void *p, int c, __kernel_size_t len)
+{
+   __kernel_size_t i;
+
+   for (i = 0; i < len; i++) {
+   u8 d =  *(i + (u8 *)p) - (u8)c;
+
+   if (unlikely(d))
+   return d;
+   }
+   return 0;
+}
+
+
+#ifndef CONFIG_PRMEM
+
+static inline void *wr_memset(void *p, int c, __kernel_size_t len)
+{
+   return memset(p, c, len);
+}
+
+static inline void *wr_memcpy(void *p, const void *q, __kernel_size_t size)
+{
+   return memcpy(p, q, size);
+}
+
+#define wr_assign(var, val)((var) = (val))
+
+#define wr_rcu_assign_pointer(p, v)\
+   rcu_assign_pointer(p, v)
+
+#else
+
+enum wr_op_type {
+   WR_MEMCPY,
+   WR_MEMSET,
+   WR_RCU_ASSIGN_PTR,
+   WR_OPS_NUMBER,
+};
+
+void *__wr_op(unsigned long dst, unsigned long src, __kernel_size_t len,
+ enum wr_op_type op);
+
+/**
+ * wr_memset() - sets n bytes of the destination to the c value
+ * @p: beginning of the memory to write to
+ * @c: byte to replicate
+ * @len: amount of bytes to copy
+ *
+ * Returns true on success, false otherwise.
+ */
+static inline void *wr_memset(void *p, int c, __kernel_size_t len)
+{
+   return __wr_op((unsigned long)p, (unsigned long)c, len, WR_MEMSET);
+}
+
+/**
+ * wr_memcpy() - copies n bytes from source to destination
+ * @dst: beginning of the memory to write to
+ * @src: beginning of the memory to read from
+ * @n_bytes: amount of bytes to copy
+ *
+ * Returns pointer to the destination
+ */
+static inline void *wr_memcpy(void *p, const void *q, __kernel_size_t size)
+{
+   return __wr_op((unsigned long)p, (unsigned long)q, size, WR_MEMCPY);
+}
+
+/**
+ * wr_assign() - sets a write-rare variable to a specified value
+ * @var: the variable to set
+ * @val: the new value
+ *
+ * Returns: the variable
+ *
+ * Note: it might be possible to optimize this, to use wr_memset in some
+ * cases (maybe with NULL?).
+ */
+
+#define wr_assign(var, val) ({ \
+   typeof(var) tmp = (typeof(var))val; \
+   \
+   wr_memcpy(&var, &tmp, sizeof(var)); \
+   var;\
+})
+
+/**
+ * wr_rcu_assign_pointer() - initialize a pointer in rcu mode
+ * @p: the rcu pointer
+ * @v: the new value
+ *
+ * Returns the value assigned to the rcu pointer.
+ *
+ * It is provided as macro, to match rcu_assign_pointer()
+ */
+#define wr_rcu_assign_pointer(p, v) ({ \
+   __wr_op((unsigned long), v, sizeof(p), WR_RCU_ASSIGN_

[PATCH 4/6] rodata_test: add verification for __wr_after_init

2018-12-04 Thread Igor Stoppa
The write protection of the __wr_after_init data can be verified with the
same methodology used for const data.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 mm/rodata_test.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/mm/rodata_test.c b/mm/rodata_test.c
index 3c1e515ca9b1..a98d088ad9cc 100644
--- a/mm/rodata_test.c
+++ b/mm/rodata_test.c
@@ -16,7 +16,19 @@
 
 #define INIT_TEST_VAL 0xC3
 
+/*
+ * Note: __ro_after_init data is, for every practical effect, equivalent to
+ * const data, since they are even write protected at the same time; there
+ * is no need for separate testing.
+ * __wr_after_init data, otoh, is altered also after the write protection
+ * takes place and it cannot be exploitable for altering more permanent
+ * data.
+ */
+
 static const int rodata_test_data = INIT_TEST_VAL;
+static int wr_after_init_test_data __wr_after_init = INIT_TEST_VAL;
+extern long __start_wr_after_init;
+extern long __end_wr_after_init;
 
 static bool test_data(char *data_type, const int *data,
  unsigned long start, unsigned long end)
@@ -60,6 +72,9 @@ void rodata_test(void)
 {
if (test_data("rodata", &rodata_test_data,
  (unsigned long)&__start_rodata,
- (unsigned long)&__end_rodata))
+ (unsigned long)&__end_rodata) &&
+   test_data("wr after init data", _after_init_test_data,
+ (unsigned long)&__start_wr_after_init,
+ (unsigned long)&__end_wr_after_init))
pr_info("all tests were successful\n");
 }
-- 
2.19.1



[PATCH 5/6] __wr_after_init: test write rare functionality

2018-12-04 Thread Igor Stoppa
Set of test cases meant to confirm that the write rare functionality
works as expected.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 include/linux/prmem.h |   7 ++-
 mm/Kconfig.debug  |   9 +++
 mm/Makefile   |   1 +
 mm/test_write_rare.c  | 135 ++
 4 files changed, 149 insertions(+), 3 deletions(-)
 create mode 100644 mm/test_write_rare.c

diff --git a/include/linux/prmem.h b/include/linux/prmem.h
index b0131c1f5dc0..d2492ec24c8c 100644
--- a/include/linux/prmem.h
+++ b/include/linux/prmem.h
@@ -125,9 +125,10 @@ static inline void *wr_memcpy(void *p, const void *q, 
__kernel_size_t size)
  *
  * It is provided as macro, to match rcu_assign_pointer()
  */
-#define wr_rcu_assign_pointer(p, v) ({ \
-   __wr_op((unsigned long)&p, v, sizeof(p), WR_RCU_ASSIGN_PTR);\
-   p;  \
+#define wr_rcu_assign_pointer(p, v) ({ \
+   __wr_op((unsigned long)&p, (unsigned long)v, sizeof(p), \
+   WR_RCU_ASSIGN_PTR); \
+   p;  \
 })
 #endif
 #endif
diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index 9a7b8b049d04..a26ecbd27aea 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -94,3 +94,12 @@ config DEBUG_RODATA_TEST
 depends on STRICT_KERNEL_RWX
 ---help---
   This option enables a testcase for the setting rodata read-only.
+
+config DEBUG_PRMEM_TEST
+tristate "Run self test for statically allocated protected memory"
+depends on STRICT_KERNEL_RWX
+select PRMEM
+default n
+help
+  Tries to verify that the protection for statically allocated memory
+  works correctly and that the memory is effectively protected.
diff --git a/mm/Makefile b/mm/Makefile
index ef3867c16ce0..8de1d468f4e7 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 obj-$(CONFIG_PRMEM) += prmem.o
+obj-$(CONFIG_DEBUG_PRMEM_TEST) += test_write_rare.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/test_write_rare.c b/mm/test_write_rare.c
new file mode 100644
index ..240cc43793d1
--- /dev/null
+++ b/mm/test_write_rare.c
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * test_write_rare.c
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+extern long __start_wr_after_init;
+extern long __end_wr_after_init;
+
+static __wr_after_init int scalar = '0';
+static __wr_after_init u8 array[PAGE_SIZE * 3] __aligned(PAGE_SIZE);
+
+/* The section must occupy a non-zero number of whole pages */
+static bool test_alignment(void)
+{
+   unsigned long pstart = (unsigned long)&__start_wr_after_init;
+   unsigned long pend = (unsigned long)&__end_wr_after_init;
+
+   if (WARN((pstart & ~PAGE_MASK) || (pend & ~PAGE_MASK) ||
+(pstart >= pend), "Boundaries test failed."))
+   return false;
+   pr_info("Boundaries test passed.");
+   return true;
+}
+
+static inline bool test_pattern(void)
+{
+   return (memtst(array, '0', PAGE_SIZE / 2) ||
+   memtst(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 3 / 4) ||
+   memtst(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2) ||
+   memtst(array + PAGE_SIZE * 7 / 4, '1', PAGE_SIZE * 3 / 4) ||
+   memtst(array + PAGE_SIZE * 5 / 2, '0', PAGE_SIZE / 2));
+}
+
+static bool test_wr_memset(void)
+{
+   int new_val = '1';
+
+   wr_memset(&scalar, new_val, sizeof(scalar));
+   if (WARN(memtst(&scalar, new_val, sizeof(scalar)),
+"Scalar write rare memset test failed."))
+   return false;
+
+   pr_info("Scalar write rare memset test passed.");
+
+   wr_memset(array, '0', PAGE_SIZE * 3);
+   if (WARN(memtst(array, '0', PAGE_SIZE * 3),
+"Array write rare memset test failed."))
+   return false;
+
+   wr_memset(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 2);
+   if (WARN(memtst(array + PAGE_SIZE / 2, '1', PAGE_SIZE * 2),
+"Array write rare memset test failed."))
+   return false;
+
+   wr_memset(array + PAGE_SIZE * 5 / 4, '0', PAGE_SIZE / 2);
+   if (WARN(memtst(array + PAGE_SIZE * 5 / 4, '0', PAG

[RFC v1 PATCH 0/6] hardening: statically allocated protected memory

2018-12-04 Thread Igor Stoppa
This patch-set is the first-cut implementation of write-rare memory
protection, as previously agreed [1].
Its purpose is to write protect kernel data which is seldom modified.
There is no read overhead, however writing requires special operations that
are probably unsuitable for often-changing data.
The use is opt-in, by applying the modifier __wr_after_init to a variable
declaration.

As the name implies, the write protection kicks in only after init() is
completed; before that moment, the data is modifiable in the usual way.

Current Limitations:
* supports only data which is allocated statically, at build time.
* supports only x86_64
* might not work for very large amounts of data, since it relies on the
  assumption that said data can be entirely remapped at init.


Some notes:
- even if the code is only for x86_64, it is placed in the generic
  locations, with the intention of extending it also to arm64
- the current section used for collecting wr-after-init data might need to
  be moved, to work with arm64 MMU
- the functionality is in its own c and h files, for now, to ease the
  introduction (and refactoring) of code dealing with dynamic allocation
- recently some updated patches were posted for live-patch on arm64 [2],
  they might help with adding arm64 support here
- to avoid the risk of weakening __ro_after_init, __wr_after_init data is
  in a separate set of pages, and any invocation will confirm that the
  memory affected falls within this range.
  I have modified rodata_test accordingly, to check also this case.
- to avoid replicating the code which does the change of mapping, there is
  only one function performing multiple, selectable, operations, such as
  memcpy(), memset(). I have added also rcu_assign_pointer() as further
  example. But I'm not too fond of this implementation either. I just
  couldn't think of any that I would like significantly better.
- I have left out the patchset from Nadav that these patches depend on,
  but it can be found here [3] (Should have I resubmitted it?)
- I am not sure what is the correct form for giving proper credit wrt the
  authoring of the wr_after_init mechanism, guidance would be appreciated
- In an attempt to spam fewer people, I have curbed the list of recipients.
  If I have omitted someone who should have been kept/added, please
  add them to the thread.


[1] https://www.openwall.com/lists/kernel-hardening/2018/11/22/8
[2] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1793199.html
[3] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1810245.html

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org



Igor Stoppa (6):
[PATCH 1/6] __wr_after_init: linker section and label
[PATCH 2/6] __wr_after_init: write rare for static allocation
[PATCH 3/6] rodata_test: refactor tests
[PATCH 4/6] rodata_test: add verification for __wr_after_init
[PATCH 5/6] __wr_after_init: test write rare functionality
[PATCH 6/6] __wr_after_init: lkdtm test

drivers/misc/lkdtm/core.c |   3 +
drivers/misc/lkdtm/lkdtm.h|   3 +
drivers/misc/lkdtm/perms.c|  29 
include/asm-generic/vmlinux.lds.h |  20 ++
include/linux/cache.h |  17 +
include/linux/prmem.h | 134 +
init/main.c   |   2 +
mm/Kconfig|   4 ++
mm/Kconfig.debug  |   9 +++
mm/Makefile   |   2 +
mm/prmem.c| 124 ++
mm/rodata_test.c  |  63 --
mm/test_write_rare.c  | 135 ++
13 files changed, 525 insertions(+), 20 deletions(-)





[PATCH 1/6] __wr_after_init: linker section and label

2018-12-04 Thread Igor Stoppa
Introduce a section and a label for statically allocated write rare
data. The label is named "__wr_after_init".
As the name implies, after the init phase is completed, this section
will be modifiable only by invoking write rare functions.
The section must take up a set of full pages.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 include/asm-generic/vmlinux.lds.h | 20 
 include/linux/cache.h | 17 +
 2 files changed, 37 insertions(+)

diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index 3d7a6a9c2370..b711dbe6999f 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -311,6 +311,25 @@
KEEP(*(__jump_table))   \
__stop___jump_table = .;
 
+/*
+ * Allow architectures to handle wr_after_init data on their
+ * own by defining an empty WR_AFTER_INIT_DATA.
+ * However, it's important that pages containing WR_RARE data do not
+ * hold anything else, to avoid both accidentally unprotecting something
+ * that is supposed to stay read-only all the time and also to protect
+ * something else that is supposed to be writeable all the time.
+ */
+#ifndef WR_AFTER_INIT_DATA
+#define WR_AFTER_INIT_DATA(align)  \
+   . = ALIGN(PAGE_SIZE);   \
+   __start_wr_after_init = .;  \
+   . = ALIGN(align);   \
+   *(.data..wr_after_init) \
+   . = ALIGN(PAGE_SIZE);   \
+   __end_wr_after_init = .;\
+   . = ALIGN(align);
+#endif
+
 /*
  * Allow architectures to handle ro_after_init data on their
  * own by defining an empty RO_AFTER_INIT_DATA.
@@ -332,6 +351,7 @@
__start_rodata = .; \
*(.rodata) *(.rodata.*) \
RO_AFTER_INIT_DATA  /* Read only after init */  \
+   WR_AFTER_INIT_DATA(align) /* wr after init */   \
KEEP(*(__vermagic)) /* Kernel version magic */  \
. = ALIGN(8);   \
__start___tracepoints_ptrs = .; \
diff --git a/include/linux/cache.h b/include/linux/cache.h
index 750621e41d1c..9a7e7134b887 100644
--- a/include/linux/cache.h
+++ b/include/linux/cache.h
@@ -31,6 +31,23 @@
 #define __ro_after_init __attribute__((__section__(".data..ro_after_init")))
 #endif
 
+/*
+ * __wr_after_init is used to mark objects that cannot be modified
+ * directly after init (i.e. after mark_rodata_ro() has been called).
+ * These objects become effectively read-only, from the perspective of
+ * performing a direct write, like a variable assignment.
+ * However, they can be altered through a dedicated function.
+ * It is intended for those objects which are occasionally modified after
+ * init, however they are modified so seldom that the extra cost from
+ * the indirect modification is either negligible or worth paying, for the
+ * sake of the protection gained.
+ */
+#ifndef __wr_after_init
+#define __wr_after_init \
+   __attribute__((__section__(".data..wr_after_init")))
+#endif
+
+
 #ifndef cacheline_aligned
 #define cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
 #endif
-- 
2.19.1



[PATCH 3/6] rodata_test: refactor tests

2018-12-04 Thread Igor Stoppa
Refactor the test cases, in preparation for using them also for testing
__wr_after_init memory.

Signed-off-by: Igor Stoppa 

CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Matthew Wilcox 
CC: Peter Zijlstra 
CC: Kees Cook 
CC: Dave Hansen 
CC: linux-integr...@vger.kernel.org
CC: kernel-harden...@lists.openwall.com
CC: linux...@kvack.org
CC: linux-kernel@vger.kernel.org
---
 mm/rodata_test.c | 48 
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/mm/rodata_test.c b/mm/rodata_test.c
index d908c8769b48..3c1e515ca9b1 100644
--- a/mm/rodata_test.c
+++ b/mm/rodata_test.c
@@ -14,44 +14,52 @@
 #include 
 #include 
 
-static const int rodata_test_data = 0xC3;
+#define INIT_TEST_VAL 0xC3
 
-void rodata_test(void)
+static const int rodata_test_data = INIT_TEST_VAL;
+
+static bool test_data(char *data_type, const int *data,
+ unsigned long start, unsigned long end)
 {
-   unsigned long start, end;
int zero = 0;
 
/* test 1: read the value */
/* If this test fails, some previous testrun has clobbered the state */
-   if (!rodata_test_data) {
-   pr_err("test 1 fails (start data)\n");
-   return;
+   if (*data != INIT_TEST_VAL) {
+   pr_err("%s: test 1 fails (init data value)\n", data_type);
+   return false;
}
 
/* test 2: write to the variable; this should fault */
-   if (!probe_kernel_write((void *)&rodata_test_data,
-   (void *)&zero, sizeof(zero))) {
-   pr_err("test data was not read only\n");
-   return;
+   if (!probe_kernel_write((void *)data, (void *)&zero, sizeof(zero))) {
+   pr_err("%s: test data was not read only\n", data_type);
+   return false;
}
 
/* test 3: check the value hasn't changed */
-   if (rodata_test_data == zero) {
-   pr_err("test data was changed\n");
-   return;
+   if (*data != INIT_TEST_VAL) {
+   pr_err("%s: test data was changed\n", data_type);
+   return false;
}
 
/* test 4: check if the rodata section is PAGE_SIZE aligned */
-   start = (unsigned long)__start_rodata;
-   end = (unsigned long)__end_rodata;
if (start & (PAGE_SIZE - 1)) {
-   pr_err("start of .rodata is not page size aligned\n");
-   return;
+   pr_err("%s: start of data is not page size aligned\n",
+  data_type);
+   return false;
}
if (end & (PAGE_SIZE - 1)) {
-   pr_err("end of .rodata is not page size aligned\n");
-   return;
+   pr_err("%s: end of data is not page size aligned\n",
+  data_type);
+   return false;
}
+   return true;
+}
 
-   pr_info("all tests were successful\n");
+void rodata_test(void)
+{
+   if (test_data("rodata", &rodata_test_data,
+ (unsigned long)&__start_rodata,
+ (unsigned long)&__end_rodata))
+   pr_info("all tests were successful\n");
 }
-- 
2.19.1



Re: [PATCH 10/17] prmem: documentation

2018-11-21 Thread Igor Stoppa

Hi,

On 13/11/2018 20:36, Andy Lutomirski wrote:

On Tue, Nov 13, 2018 at 10:33 AM Igor Stoppa  wrote:


I forgot one sentence :-(

On 13/11/2018 20:31, Igor Stoppa wrote:

On 13/11/2018 19:47, Andy Lutomirski wrote:


For general rare-writish stuff, I don't think we want IRQs running
with them mapped anywhere for write.  For AVC and IMA, I'm less sure.


Why would these be less sensitive?

But I see a big difference between my initial implementation and this one.

In my case, by using a shared mapping, visible to all cores, freezing
the core that is performing the write would have exposed the writable
mapping to a potential attack run from another core.

If the mapping is private to the core performing the write, even if it
is frozen, it's much harder to figure out what it had mapped and where,
from another core.

To access that mapping, the attack should be performed from the ISR, I
think.


Unless the secondary mapping is also available to other cores, through
the shared mm_struct ?



I don't think this matters much.  The other cores will only be able to
use that mapping when they're doing a rare write.



I'm still mulling over this.
There might be other reasons for replicating the mm_struct.

If I understand correctly how the text patching works, it happens 
sequentially, because of the text_mutex used by arch_jump_label_transform


Which might be fine for this specific case, but I think I shouldn't 
introduce a global mutex, when it comes to data.
Most likely, if two or more cores want to perform a write rare 
operation, there is no correlation between them, they could proceed in 
parallel. And if there really is, then the user of the API should 
introduce own locking, for that specific case.


A bit of an unrelated question, related to text patching: I see that each 
patching operation is validated, but wouldn't it be more robust to first 
validate all of them, and only after they are all found to be 
compliant, to proceed with the actual modifications?
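
Something like this is the two-pass structure I mean (purely
illustrative: patch_desc, validate_patch() and apply_patch() are made-up
placeholders, not existing kernel code):

/* all the names here are illustrative placeholders */
struct patch_desc {
	void *addr;
	unsigned long new_val;
};

static int apply_batch(struct patch_desc *desc, unsigned int count)
{
	unsigned int i;

	/* first pass: reject the whole batch if any entry is bad */
	for (i = 0; i < count; i++) {
		int err = validate_patch(&desc[i]);

		if (err)
			return err;
	}

	/* second pass: apply, knowing that nothing can fail halfway */
	for (i = 0; i < count; i++)
		apply_patch(&desc[i]);

	return 0;
}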


And about the actual implementation of the write rare for the statically 
allocated variables, is it expected that I use Nadav's function?

Or that I refactor the code?

The name, referring to text would definitely not be ok for data.
And I would have to also generalize it, to deal with larger amounts of data.

I would find it easier, as first cut, to replicate its behavior and 
refactor only later, once it has stabilized and possibly Nadav's patches 
have been acked.


--
igor


Re: [PATCH 10/17] prmem: documentation

2018-11-13 Thread Igor Stoppa
On 13/11/2018 19:16, Andy Lutomirski wrote:
> On Tue, Nov 13, 2018 at 6:25 AM Igor Stoppa  wrote:

[...]

>> How about having one mm_struct for each writer (core or thread)?
>>
> 
> I don't think that helps anything.  I think the mm_struct used for
> prmem (or rare_write or whatever you want to call it)

write_rare / rarely can be shortened to wr_  which is kinda less
confusing than rare_write, since it would become rw_ and easier to
confuse with R/W

Any advice for better naming is welcome.

> should be
> entirely abstracted away by an appropriate API, so neither SELinux nor
> IMA need to be aware that there's an mm_struct involved.

Yes, that is fine. In my proposal I was thinking about tying it to the
core/thread that performs the actual write.

The high level API could be something like:

wr_memcpy(void *src, void *dst, uint_t size)

>  It's also
> entirely possible that some architectures won't even use an mm_struct
> behind the scenes -- x86, for example, could have avoided it if there
> were a kernel equivalent of PKRU.  Sadly, there isn't.

The mm_struct - or whatever is the means to do the write on that
architecture - can be kept hidden from the API.

But the reason why I was proposing to have one mm_struct per writer is
that, iiuc, the secondary mapping is created in the secondary mm_struct
for each writer using it.

So the updating of IMA measurements would have, theoretically, also
write access to the SELinux AVC. Which I was trying to avoid.
And similarly any other write rare updater. Is this correct?

>> 2) Iiuc, the purpose of the 2 pages being remapped is that the target of
>> the patch might spill across the page boundary, however if I deal with
>> the modification of generic data, I shouldn't (shouldn't I?) assume that
>> the data will not span across multiple pages.
> 
> The reason for the particular architecture of text_poke() is to avoid
> memory allocation to get it working.  i think that prmem/rare_write
> should have each rare-writable kernel address map to a unique user
> address, possibly just by offsetting everything by a constant.  For
> rare_write, you don't actually need it to work as such until fairly
> late in boot, since the rare_writable data will just be writable early
> on.

Yes, that is true. I think it's safe to assume, from an attack pattern,
that as long as user space is not started, the system can be considered
ok. Even user-space code run from initrd should be ok, since it can be
bundled (and signed) as a single binary with the kernel.

Modules loaded from a regular filesystem are a bit more risky, because
an attack might inject a rogue key in the key-ring and use it to load
malicious modules.

>> If the data spans across multiple pages, in unknown amount, I suppose
>> that I should not keep interrupts disabled for an unknown time, as it
>> would hurt preemption.
>>
>> What I thought, in my initial patch-set, was to iterate over each page
>> that must be written to, in a loop, re-enabling interrupts in-between
>> iterations, to give pending interrupts a chance to be served.
>>
>> This would mean that the data being written to would not be consistent,
>> but it's a problem that would have to be addressed anyways, since it can
>> be still read by other cores, while the write is ongoing.
> 
> This probably makes sense, except that enabling and disabling
> interrupts means you also need to restore the original mm_struct (most
> likely), which is slow.  I don't think there's a generic way to check
> whether an interrupt is pending without turning interrupts on.

The only "excuse" I have is that write_rare is opt-in and is "rare".
Maybe the enabling/disabling of interrupts - and the consequent switch
of mm_struct - could be somehow tied to the latency configuration?

If preemption is disabled, the expectations on the system latency are
anyway more relaxed.

But I'm not sure how it would work against I/O.
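
For completeness, the page-by-page loop I described above would be
shaped roughly like this (only a sketch: wr_write_page() is a
placeholder for the remap + copy + unmap of the part that falls within
a single page):

static bool wr_memcpy_chunked(void *dst, const void *src, size_t n_bytes)
{
	char *d = dst;
	const char *s = src;

	while (n_bytes) {
		size_t offset = (uintptr_t)d & ~PAGE_MASK;
		size_t size = min(n_bytes, (size_t)PAGE_SIZE - offset);
		unsigned long flags;

		local_irq_save(flags);
		if (!wr_write_page(d, s, size)) {
			local_irq_restore(flags);
			return false;
		}
		local_irq_restore(flags);
		/* interrupts are enabled here, pending ones can run */
		d += size;
		s += size;
		n_bytes -= size;
	}
	return true;
}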

--
igor


Re: [PATCH 10/17] prmem: documentation

2018-10-31 Thread Igor Stoppa




On 01/11/2018 01:19, Andy Lutomirski wrote:


ISTM you don't need that atomic operation -- you could take a spinlock
and then just add one directly to the variable.


It was my intention to provide a 1:1 conversion of existing code, as it 
should be easier to verify the correctness of the conversion, as long as 
there isn't any significant degradation in performance.


The rework could be done afterward.

--
igor


Re: [PATCH 10/17] prmem: documentation

2018-10-30 Thread Igor Stoppa




On 30/10/2018 23:02, Andy Lutomirski wrote:




On Oct 30, 2018, at 1:43 PM, Igor Stoppa  wrote:




There is no need to process each of these tens of thousands of allocations
and initializations as write-rare.

Would it be possible to do the same here?


I don’t see why not, although getting the API right will be a tad complicated.


yes, I have some first-hand experience with this :-/





To subsequently modify q,

p = rare_modify(q);
q->a = y;


Do you mean

 p->a = y;

here? I assume the intent is that q isn't writable ever, but that's
the one we have in the structure at rest.

Yes, that was my intent, thanks.
To handle the list case that Igor has pointed out, you might want to
do something like this:
list_for_each_entry(x, , entry) {
struct foo *writable = rare_modify(entry);


Would this mapping be impossible to spoof by other cores?



Indeed. Only the core with the special mm loaded could see it.

But I dislike allowing regular writes in the protected region. We really only 
need four write primitives:

1. Just write one value.  Call at any time (except NMI).

2. Just copy some bytes. Same as (1) but any number of bytes.

3,4: Same as 1 and 2 but must be called inside a special rare write region. 
This is purely an optimization.


Atomic? RCU?

Yes, they are technically just memory writes, but shouldn't the "normal" 
operation be executed on the writable mapping?


It is possible to sandwich any call between a rare_modify/rare_protect, 
however isn't that pretty close to having a write-rare version of these 
plain functions?
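
To make the "sandwich" concrete (struct wr_foo and wr_foo_get() are made
up; rare_modify()/rare_protect() as in the example above):

struct wr_foo {
	atomic_long_t refcnt;
	/* ... */
};

static void wr_foo_get(struct wr_foo *q)
{
	struct wr_foo *writable = rare_modify(q);

	/* the "plain" operation runs on the writable alias */
	atomic_long_inc(&writable->refcnt);
	rare_protect(q);
}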


--
igor


Build error in drivers/cpufreq/intel_pstate.c

2018-10-30 Thread Igor Stoppa

Hi,
I'm getting the following build error:

/home/igor/dev/kernel/linux/drivers/cpufreq/intel_pstate.c: In function 
‘show_base_frequency’:
/home/igor/dev/kernel/linux/drivers/cpufreq/intel_pstate.c:726:10: 
error: implicit declaration of function 
‘intel_pstate_get_cppc_guranteed’; did you mean ‘intel_pstate_get_epp’? 
[-Werror=implicit-function-declaration]

  ratio = intel_pstate_get_cppc_guranteed(policy->cpu);
  ^~~
  intel_pstate_get_epp



on top of:

commit 11743c56785c751c087eecdb98713eef796609e0
Merge: 929e134c43c9 928002a5e9da

--
igor


Re: [PATCH 16/17] prmem: pratomic-long

2018-10-29 Thread Igor Stoppa




On 25/10/2018 01:13, Peter Zijlstra wrote:

On Wed, Oct 24, 2018 at 12:35:03AM +0300, Igor Stoppa wrote:

+static __always_inline
+bool __pratomic_long_op(bool inc, struct pratomic_long_t *l)
+{
+   struct page *page;
+   uintptr_t base;
+   uintptr_t offset;
+   unsigned long flags;
+   size_t size = sizeof(*l);
+   bool is_virt = __is_wr_after_init(l, size);
+
+   if (WARN(!(is_virt || likely(__is_wr_pool(l, size))),
+WR_ERR_RANGE_MSG))
+   return false;
+   local_irq_save(flags);
+   if (is_virt)
+   page = virt_to_page(l);
+   else
+   page = vmalloc_to_page(l);
+   offset = (~PAGE_MASK) & (uintptr_t)l;
+   base = (uintptr_t)vmap(&page, 1, VM_MAP, PAGE_KERNEL);
+   if (WARN(!base, WR_ERR_PAGE_MSG)) {
+   local_irq_restore(flags);
+   return false;
+   }
+   if (inc)
+   atomic_long_inc((atomic_long_t *)(base + offset));
+   else
+   atomic_long_dec((atomic_long_t *)(base + offset));
+   vunmap((void *)base);
+   local_irq_restore(flags);
+   return true;
+
+}


That's just hideously nasty.. and horribly broken.

We're not going to duplicate all these kernel interfaces wrapped in gunk
like that. 


one possibility would be to have macros which use typeof() on the 
parameter being passed, to decide what implementation to use: regular or 
write-rare


This means that type punning would still be needed, to select the 
implementation.
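
Something along these lines, as a sketch only (pratomic_long_inc() is
assumed to wrap __pratomic_long_op(true, ...); the casts are exactly the
type punning mentioned above):

#define generic_long_inc(l)						\
	__builtin_choose_expr(						\
		__builtin_types_compatible_p(typeof(l),			\
					     struct pratomic_long_t *),	\
		pratomic_long_inc((struct pratomic_long_t *)(l)),	\
		atomic_long_inc((atomic_long_t *)(l)))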


Would this be enough? Is there some better way?


Also, you _cannot_ call vunmap() with IRQs disabled. Clearly
you've never tested this with debug bits enabled.


I thought I had them. And I _did_ have them enabled, at some point.
But I must have messed up with the configuration and I failed to notice 
this.


I can think of a way it might work, albeit it's not going to be very pretty:

* for the vmap(): if I understand correctly, it might sleep while 
obtaining memory for creating the mapping. This part could be executed 
before disabling interrupts. The rest of the function, instead, would be 
executed after interrupts are disabled.


* for vunmap(): after the writing is done, change also the alternate 
mapping to read only, then enable interrupts and destroy the alternate 
mapping. Making also the secondary mapping read only makes it equally 
secure as the primary, which means that it can be visible also with 
interrupts enabled.
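
Putting the two points above together, the reworked sequence would be
roughly this (sketch only, showing just the statically allocated case;
set_alias_ro() is a placeholder for a primitive, not yet existing, that
remarks the alias mapping read-only):

static bool __pratomic_long_inc_sketch(struct pratomic_long_t *l)
{
	struct page *page = virt_to_page(l);
	uintptr_t offset = (~PAGE_MASK) & (uintptr_t)l;
	unsigned long flags;
	uintptr_t base;

	/* vmap() might sleep: do it before disabling interrupts */
	base = (uintptr_t)vmap(&page, 1, VM_MAP, PAGE_KERNEL);
	if (WARN(!base, WR_ERR_PAGE_MSG))
		return false;
	local_irq_save(flags);
	atomic_long_inc((atomic_long_t *)(base + offset));
	/* make the alias as read-only as the original mapping */
	set_alias_ro((void *)base);
	local_irq_restore(flags);
	/* vunmap() might sleep too: only after interrupts are back on */
	vunmap((void *)base);
	return true;
}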



--
igor


Re: [PATCH 02/17] prmem: write rare for static allocation

2018-10-29 Thread Igor Stoppa




On 26/10/2018 10:41, Peter Zijlstra wrote:

On Wed, Oct 24, 2018 at 12:34:49AM +0300, Igor Stoppa wrote:

+static __always_inline


That's far too large for inline.


The reason for it is that it's supposed to minimize the presence of 
gadgets that might be used in JOP attacks.
I am ready to stand corrected, if I'm wrong, but this is the reason why 
I did it.


Regarding the function being too large, yes, I would not normally choose 
it for inlining.


Actually, I would not normally use "__always_inline" and instead I would 
limit myself to plain "inline", at most.





+bool wr_memset(const void *dst, const int c, size_t n_bytes)
+{
+   size_t size;
+   unsigned long flags;
+   uintptr_t d = (uintptr_t)dst;
+
+   if (WARN(!__is_wr_after_init(dst, n_bytes), WR_ERR_RANGE_MSG))
+   return false;
+   while (n_bytes) {
+   struct page *page;
+   uintptr_t base;
+   uintptr_t offset;
+   uintptr_t offset_complement;
+
+   local_irq_save(flags);
+   page = virt_to_page(d);
+   offset = d & ~PAGE_MASK;
+   offset_complement = PAGE_SIZE - offset;
+   size = min(n_bytes, offset_complement);
+   base = (uintptr_t)vmap(&page, 1, VM_MAP, PAGE_KERNEL);
+   if (WARN(!base, WR_ERR_PAGE_MSG)) {
+   local_irq_restore(flags);
+   return false;
+   }
+   memset((void *)(base + offset), c, size);
+   vunmap((void *)base);


BUG


yes, somehow I managed to drop this debug configuration from the debug 
builds I made.


[...]


Also, I see an amount of duplication here that shows you're not nearly
lazy enough.


I did notice a certain amount of duplication, but I didn't know how to 
exploit it.
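
One thing I could try is to keep the remapping of a single page in one
place and have the callers share it, along these lines (sketch only,
none of these names are from the patch-set):

static void *__wr_map_page(const void *addr, uintptr_t *base)
{
	struct page *page = virt_to_page(addr);
	uintptr_t offset = (uintptr_t)addr & ~PAGE_MASK;

	*base = (uintptr_t)vmap(&page, 1, VM_MAP, PAGE_KERNEL);
	if (WARN(!*base, WR_ERR_PAGE_MSG))
		return NULL;
	return (void *)(*base + offset);
}

static void __wr_unmap_page(uintptr_t base)
{
	vunmap((void *)base);
}

wr_memset() and wr_memcpy() would then both reduce to a per-page loop
around these two, differing only in the memset()/memcpy() call in the
middle.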


--
igor

