Re: [RFC PATCH 6/7] execmem: add support for cache of large ROX pages

2024-04-18 Thread Mike Rapoport
On Tue, Apr 16, 2024 at 09:52:34AM +0200, Peter Zijlstra wrote:
> On Mon, Apr 15, 2024 at 08:00:26PM +0300, Mike Rapoport wrote:
> > On Mon, Apr 15, 2024 at 12:47:50PM +0200, Peter Zijlstra wrote:
> > > On Thu, Apr 11, 2024 at 07:05:25PM +0300, Mike Rapoport wrote:
> > > 
> > > > To populate the cache, a writable large page is allocated from vmalloc with
> > > > VM_ALLOW_HUGE_VMAP, filled with invalid instructions and then remapped as
> > > > ROX.
> > >
> > > > +static void execmem_invalidate(void *ptr, size_t size, bool writable)
> > > > +{
> > > > +   if (execmem_info->invalidate)
> > > > +       execmem_info->invalidate(ptr, size, writable);
> > > > +   else
> > > > +       memset(ptr, 0, size);
> > > > +}
> > >
> > > +static void execmem_invalidate(void *ptr, size_t size, bool writeable)
> > > +{
> > > +   /* fill memory with INT3 instructions */
> > > +   if (writeable)
> > > +       memset(ptr, 0xcc, size);
> > > +   else
> > > +       text_poke_set(ptr, 0xcc, size);
> > > +}
> > > 
> > > Thing is, 0xcc (aka INT3_INSN_OPCODE) is not an invalid instruction.
> > > It raises #BP not #UD.
> > 
> > Do you mean that _invalidate is a poor name choice or that it's necessary
> > to use an instruction that raises #UD?
> 
> Poor naming, mostly. #BP handler will still scream bloody murder if the
> site is otherwise unclaimed.
> 
> It just isn't an invalid instruction.

Well, execmem_fill_with_insns_screaming_bloody_murder seems too long, how
about execmem_fill_trapping_insns?
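
For illustration only, a userspace sketch of how the generic helper might read under the proposed name. Everything here is an assumption made for the example: struct execmem_info_model stands in for struct execmem_info, the callback is renamed to match, and x86_fill_trapping_insns() replaces the real x86 hook with a plain memset() because text_poke_set() does not exist outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define INT3_INSN_OPCODE 0xCC	/* x86 INT3: raises #BP, not #UD */

/* Hypothetical userspace stand-in for struct execmem_info. */
struct execmem_info_model {
	void (*fill_trapping_insns)(void *ptr, size_t size, bool writable);
};

/*
 * Stand-in for the x86 callback. The quoted kernel code uses
 * text_poke_set() when the mapping is not writable; this model
 * always uses memset() because it only runs on ordinary memory.
 */
static void x86_fill_trapping_insns(void *ptr, size_t size, bool writable)
{
	(void)writable;
	memset(ptr, INT3_INSN_OPCODE, size);
}

/* Same dispatch as execmem_invalidate() in the patch, under the new name. */
static void execmem_fill_trapping_insns(const struct execmem_info_model *info,
					void *ptr, size_t size, bool writable)
{
	if (info->fill_trapping_insns)
		info->fill_trapping_insns(ptr, size, writable);
	else
		memset(ptr, 0, size);	/* fallback taken from the patch */
}
```

Whether the zero-filled fallback actually traps depends on the architecture, so the new name fits the callback path better than the fallback.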

-- 
Sincerely yours,
Mike.



Re: [RFC PATCH 6/7] execmem: add support for cache of large ROX pages

2024-04-16 Thread Peter Zijlstra
On Mon, Apr 15, 2024 at 08:00:26PM +0300, Mike Rapoport wrote:
> On Mon, Apr 15, 2024 at 12:47:50PM +0200, Peter Zijlstra wrote:
> > On Thu, Apr 11, 2024 at 07:05:25PM +0300, Mike Rapoport wrote:
> > 
> > > To populate the cache, a writable large page is allocated from vmalloc with
> > > VM_ALLOW_HUGE_VMAP, filled with invalid instructions and then remapped as
> > > ROX.
> >
> > > +static void execmem_invalidate(void *ptr, size_t size, bool writable)
> > > +{
> > > +   if (execmem_info->invalidate)
> > > +       execmem_info->invalidate(ptr, size, writable);
> > > +   else
> > > +       memset(ptr, 0, size);
> > > +}
> >
> > +static void execmem_invalidate(void *ptr, size_t size, bool writeable)
> > +{
> > +   /* fill memory with INT3 instructions */
> > +   if (writeable)
> > +       memset(ptr, 0xcc, size);
> > +   else
> > +       text_poke_set(ptr, 0xcc, size);
> > +}
> > 
> > Thing is, 0xcc (aka INT3_INSN_OPCODE) is not an invalid instruction.
> > It raises #BP not #UD.
> 
> Do you mean that _invalidate is a poor name choice or that it's necessary
> to use an instruction that raises #UD?

Poor naming, mostly. #BP handler will still scream bloody murder if the
site is otherwise unclaimed.

It just isn't an invalid instruction.



Re: [RFC PATCH 6/7] execmem: add support for cache of large ROX pages

2024-04-15 Thread Mike Rapoport
On Mon, Apr 15, 2024 at 12:47:50PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 11, 2024 at 07:05:25PM +0300, Mike Rapoport wrote:
> 
> > To populate the cache, a writable large page is allocated from vmalloc with
> > VM_ALLOW_HUGE_VMAP, filled with invalid instructions and then remapped as
> > ROX.
> 
> > +static void execmem_invalidate(void *ptr, size_t size, bool writable)
> > +{
> > +   if (execmem_info->invalidate)
> > +       execmem_info->invalidate(ptr, size, writable);
> > +   else
> > +       memset(ptr, 0, size);
> > +}
>
> +static void execmem_invalidate(void *ptr, size_t size, bool writeable)
> +{
> +   /* fill memory with INT3 instructions */
> +   if (writeable)
> +       memset(ptr, 0xcc, size);
> +   else
> +       text_poke_set(ptr, 0xcc, size);
> +}
> 
> Thing is, 0xcc (aka INT3_INSN_OPCODE) is not an invalid instruction.
> It raises #BP not #UD.

Do you mean that _invalidate is a poor name choice or that it's necessary
to use an instruction that raises #UD?

-- 
Sincerely yours,
Mike.



Re: [RFC PATCH 6/7] execmem: add support for cache of large ROX pages

2024-04-15 Thread Peter Zijlstra
On Thu, Apr 11, 2024 at 07:05:25PM +0300, Mike Rapoport wrote:

> To populate the cache, a writable large page is allocated from vmalloc with
> VM_ALLOW_HUGE_VMAP, filled with invalid instructions and then remapped as
> ROX.

> +static void execmem_invalidate(void *ptr, size_t size, bool writable)
> +{
> +   if (execmem_info->invalidate)
> +       execmem_info->invalidate(ptr, size, writable);
> +   else
> +       memset(ptr, 0, size);
> +}

+static void execmem_invalidate(void *ptr, size_t size, bool writeable)
+{
+   /* fill memory with INT3 instructions */
+   if (writeable)
+       memset(ptr, 0xcc, size);
+   else
+       text_poke_set(ptr, 0xcc, size);
+}

Thing is, 0xcc (aka INT3_INSN_OPCODE) is not an invalid instruction.
It raises #BP not #UD.
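
The distinction can be observed from userspace, where Linux reports #BP as SIGTRAP and #UD as SIGILL. Below is a minimal sketch, not kernel code: signal_raised_by() is a made-up helper that executes a single INT3 (0xcc) or UD2 (0x0f 0x0b) and returns the signal that came back. On non-x86 builds it only simulates the result with raise(), so it demonstrates nothing there.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf fault_env;
static volatile sig_atomic_t fault_signal;

/* Record which signal arrived and jump back out of the faulting code. */
static void on_fault(int sig)
{
	fault_signal = sig;
	siglongjmp(fault_env, 1);
}

/*
 * Hypothetical helper: execute one INT3 (use_ud2 == 0) or one UD2
 * (use_ud2 != 0) and return the signal delivered for it.
 */
static int signal_raised_by(int use_ud2)
{
	struct sigaction sa = { 0 };

	sa.sa_handler = on_fault;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGTRAP, &sa, NULL);
	sigaction(SIGILL, &sa, NULL);

	fault_signal = 0;
	/* savemask = 1 so the signal mask is restored after siglongjmp() */
	if (sigsetjmp(fault_env, 1) == 0) {
#if defined(__x86_64__) || defined(__i386__)
		if (use_ud2)
			__asm__ volatile("ud2");	/* 0x0f 0x0b, raises #UD */
		else
			__asm__ volatile("int3");	/* 0xcc, raises #BP */
#else
		/* Not x86: simulate the outcome instead of executing opcodes. */
		raise(use_ud2 ? SIGILL : SIGTRAP);
#endif
	}
	return fault_signal;
}
```

On x86 Linux, signal_raised_by(0) should return SIGTRAP and signal_raised_by(1) should return SIGILL, provided no debugger is attached to intercept the breakpoint.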



[RFC PATCH 6/7] execmem: add support for cache of large ROX pages

2024-04-11 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Using large pages to map text areas reduces iTLB pressure and improves
performance.

Extend execmem_alloc() with an ability to use PMD_SIZE'ed pages with ROX
permissions as a cache for smaller allocations.

To populate the cache, a writable large page is allocated from vmalloc with
VM_ALLOW_HUGE_VMAP, filled with invalid instructions and then remapped as
ROX.

Portions of that large page are handed out to execmem_alloc() callers
without any changes to the permissions.

When the memory is freed with execmem_free() it is invalidated again so
that it won't contain stale instructions.

The cache is enabled when an architecture sets EXECMEM_ROX_CACHE flag in
definition of an execmem_range.

Signed-off-by: Mike Rapoport (IBM) 
---
 include/linux/execmem.h |   2 +
 mm/execmem.c            | 267 ++--
 2 files changed, 262 insertions(+), 7 deletions(-)
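
The life cycle described in the commit message (populate, hand out portions, refill on free, release a fully free page) can be sketched as a small userspace model. This is not the patch's implementation. The assumptions: a flat array plus a per-chunk busy flag replaces the maple trees, MODEL_PAGE_SIZE stands in for PMD_SIZE, MODEL_CHUNK is an invented allocation granule, and no page permissions are changed at all. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE	4096u	/* assumption: stands in for PMD_SIZE */
#define MODEL_CHUNK	256u	/* assumption: allocation granularity */
#define MODEL_NCHUNKS	(MODEL_PAGE_SIZE / MODEL_CHUNK)
#define TRAP_BYTE	0xCC	/* x86 INT3 */

struct rox_cache_model {
	unsigned char page[MODEL_PAGE_SIZE];
	bool busy[MODEL_NCHUNKS];
};

/* Populate: fill the whole page with trapping bytes, mark it all free. */
static void cache_populate(struct rox_cache_model *c)
{
	memset(c->page, TRAP_BYTE, sizeof(c->page));
	memset(c->busy, 0, sizeof(c->busy));
	/* The patch remaps the page as ROX at this point; the model does not. */
}

/* First-fit allocation of a contiguous run of free chunks. */
static void *cache_alloc(struct rox_cache_model *c, size_t size)
{
	size_t need = (size + MODEL_CHUNK - 1) / MODEL_CHUNK;
	size_t i, j;

	if (need == 0 || need > MODEL_NCHUNKS)
		return NULL;

	for (i = 0; i + need <= MODEL_NCHUNKS; i++) {
		for (j = 0; j < need && !c->busy[i + j]; j++)
			;
		if (j < need)
			continue;	/* run interrupted by a busy chunk */
		for (j = 0; j < need; j++)
			c->busy[i + j] = true;
		return c->page + i * MODEL_CHUNK;
	}
	return NULL;
}

/* Free: refill with trapping bytes so no stale instructions remain. */
static void cache_free(struct rox_cache_model *c, void *ptr, size_t size)
{
	size_t first = (size_t)((unsigned char *)ptr - c->page) / MODEL_CHUNK;
	size_t need = (size + MODEL_CHUNK - 1) / MODEL_CHUNK;
	size_t j;

	for (j = 0; j < need; j++)
		c->busy[first + j] = false;
	memset(ptr, TRAP_BYTE, need * MODEL_CHUNK);
}

/* A page with no busy chunks is what the cleanup work would release. */
static bool cache_page_reclaimable(const struct rox_cache_model *c)
{
	size_t i;

	for (i = 0; i < MODEL_NCHUNKS; i++)
		if (c->busy[i])
			return false;
	return true;
}
```

The model rounds every request up to whole chunks, so freeing must be called with the same size that was allocated.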

diff --git a/include/linux/execmem.h b/include/linux/execmem.h
index 9d22999dbd7d..06f678e6fe55 100644
--- a/include/linux/execmem.h
+++ b/include/linux/execmem.h
@@ -77,12 +77,14 @@ struct execmem_range {
 
 /**
  * struct execmem_info - architecture parameters for code allocations
+ * @invalidate: set memory to contain invalid instructions
  * @ranges: array of parameter sets defining architecture specific
  * parameters for executable memory allocations. The ranges that are not
  * explicitly initialized by an architecture use parameters defined for
  * @EXECMEM_DEFAULT.
  */
 struct execmem_info {
+   void (*invalidate)(void *ptr, size_t size, bool writable);
    struct execmem_range    ranges[EXECMEM_TYPE_MAX];
 };
 
diff --git a/mm/execmem.c b/mm/execmem.c
index c920d2b5a721..716fba68ab0e 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -1,30 +1,88 @@
 // SPDX-License-Identifier: GPL-2.0
 
 #include 
+#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
+#include 
+
+#include "internal.h"
+
 static struct execmem_info *execmem_info __ro_after_init;
 static struct execmem_info default_execmem_info __ro_after_init;
 
-static void *__execmem_alloc(struct execmem_range *range, size_t size)
+struct execmem_cache {
+   struct mutex mutex;
+   struct maple_tree busy_areas;
+   struct maple_tree free_areas;
+};
+
+static struct execmem_cache execmem_cache = {
+   .mutex = __MUTEX_INITIALIZER(execmem_cache.mutex),
+   .busy_areas = MTREE_INIT_EXT(busy_areas, MT_FLAGS_LOCK_EXTERN,
+execmem_cache.mutex),
+   .free_areas = MTREE_INIT_EXT(free_areas, MT_FLAGS_LOCK_EXTERN,
+execmem_cache.mutex),
+};
+
+static void execmem_cache_clean(struct work_struct *work)
+{
+   struct maple_tree *free_areas = &execmem_cache.free_areas;
+   struct mutex *mutex = &execmem_cache.mutex;
+   MA_STATE(mas, free_areas, 0, ULONG_MAX);
+   void *area;
+
+   mutex_lock(mutex);
+   mas_for_each(&mas, area, ULONG_MAX) {
+       size_t size;
+
+       if (!xa_is_value(area))
+           continue;
+
+       size = xa_to_value(area);
+
+       if (IS_ALIGNED(size, PMD_SIZE) && IS_ALIGNED(mas.index, PMD_SIZE)) {
+           void *ptr = (void *)mas.index;
+
+           mas_erase(&mas);
+           vfree(ptr);
+       }
+   }
+   mutex_unlock(mutex);
+}
+
+static DECLARE_WORK(execmem_cache_clean_work, execmem_cache_clean);
+
+static void execmem_invalidate(void *ptr, size_t size, bool writable)
+{
+   if (execmem_info->invalidate)
+       execmem_info->invalidate(ptr, size, writable);
+   else
+       memset(ptr, 0, size);
+}
+
+static void *execmem_vmalloc(struct execmem_range *range, size_t size,
+                             pgprot_t pgprot, unsigned long vm_flags)
 {
    bool kasan = range->flags & EXECMEM_KASAN_SHADOW;
-   unsigned long vm_flags  = VM_FLUSH_RESET_PERMS;
    gfp_t gfp_flags = GFP_KERNEL | __GFP_NOWARN;
+   unsigned int align = range->alignment;
    unsigned long start = range->start;
    unsigned long end = range->end;
-   unsigned int align = range->alignment;
-   pgprot_t pgprot = range->pgprot;
    void *p;
 
    if (kasan)
        vm_flags |= VM_DEFER_KMEMLEAK;
 
-   p = __vmalloc_node_range(size, align, start, end, gfp_flags,
-                            pgprot, vm_flags, NUMA_NO_NODE,
+   if (vm_flags & VM_ALLOW_HUGE_VMAP)
+       align = PMD_SIZE;
+
+   p = __vmalloc_node_range(size, align, start, end, gfp_flags, pgprot,
+                            vm_flags, NUMA_NO_NODE,
                             __builtin_return_address(0));
    if (!p && range->fallback_start) {
        start = range->fallback_start;
@@ -44,6 +102,199 @@ static void *__execmem_alloc(struct execmem_range *range, size_t size)
        return NULL;
    }
 
+   return p;
+}
+
+
+static int