Re: [RFC PATCH 6/7] execmem: add support for cache of large ROX pages
On Tue, Apr 16, 2024 at 09:52:34AM +0200, Peter Zijlstra wrote:
> On Mon, Apr 15, 2024 at 08:00:26PM +0300, Mike Rapoport wrote:
> > On Mon, Apr 15, 2024 at 12:47:50PM +0200, Peter Zijlstra wrote:
> > > On Thu, Apr 11, 2024 at 07:05:25PM +0300, Mike Rapoport wrote:
> > > 
> > > > To populate the cache, a writable large page is allocated from vmalloc
> > > > with VM_ALLOW_HUGE_VMAP, filled with invalid instructions and then
> > > > remapped as ROX.
> > > 
> > > > +static void execmem_invalidate(void *ptr, size_t size, bool writable)
> > > > +{
> > > > +	if (execmem_info->invalidate)
> > > > +		execmem_info->invalidate(ptr, size, writable);
> > > > +	else
> > > > +		memset(ptr, 0, size);
> > > > +}
> > > 
> > > +static void execmem_invalidate(void *ptr, size_t size, bool writeable)
> > > +{
> > > +	/* fill memory with INT3 instructions */
> > > +	if (writeable)
> > > +		memset(ptr, 0xcc, size);
> > > +	else
> > > +		text_poke_set(ptr, 0xcc, size);
> > > +}
> > > 
> > > Thing is, 0xcc (aka INT3_INSN_OPCODE) is not an invalid instruction.
> > > It raises #BP not #UD.
> > 
> > Do you mean that _invalidate is a poor name choice or that it's necessary
> > to use an instruction that raises #UD?
> 
> Poor naming, mostly. #BP handler will still scream bloody murder if the
> site is otherwise unclaimed.
> 
> It just isn't an invalid instruction.

Well, execmem_fill_with_insns_screaming_bloody_murder seems too long, how
about execmem_fill_trapping_insns?

-- 
Sincerely yours,
Mike.
Re: [RFC PATCH 6/7] execmem: add support for cache of large ROX pages
On Mon, Apr 15, 2024 at 08:00:26PM +0300, Mike Rapoport wrote:
> On Mon, Apr 15, 2024 at 12:47:50PM +0200, Peter Zijlstra wrote:
> > On Thu, Apr 11, 2024 at 07:05:25PM +0300, Mike Rapoport wrote:
> > 
> > > To populate the cache, a writable large page is allocated from vmalloc
> > > with VM_ALLOW_HUGE_VMAP, filled with invalid instructions and then
> > > remapped as ROX.
> > 
> > > +static void execmem_invalidate(void *ptr, size_t size, bool writable)
> > > +{
> > > +	if (execmem_info->invalidate)
> > > +		execmem_info->invalidate(ptr, size, writable);
> > > +	else
> > > +		memset(ptr, 0, size);
> > > +}
> > 
> > +static void execmem_invalidate(void *ptr, size_t size, bool writeable)
> > +{
> > +	/* fill memory with INT3 instructions */
> > +	if (writeable)
> > +		memset(ptr, 0xcc, size);
> > +	else
> > +		text_poke_set(ptr, 0xcc, size);
> > +}
> > 
> > Thing is, 0xcc (aka INT3_INSN_OPCODE) is not an invalid instruction.
> > It raises #BP not #UD.
> 
> Do you mean that _invalidate is a poor name choice or that it's necessary
> to use an instruction that raises #UD?

Poor naming, mostly. #BP handler will still scream bloody murder if the
site is otherwise unclaimed.

It just isn't an invalid instruction.
Re: [RFC PATCH 6/7] execmem: add support for cache of large ROX pages
On Mon, Apr 15, 2024 at 12:47:50PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 11, 2024 at 07:05:25PM +0300, Mike Rapoport wrote:
> 
> > To populate the cache, a writable large page is allocated from vmalloc with
> > VM_ALLOW_HUGE_VMAP, filled with invalid instructions and then remapped as
> > ROX.
> 
> > +static void execmem_invalidate(void *ptr, size_t size, bool writable)
> > +{
> > +	if (execmem_info->invalidate)
> > +		execmem_info->invalidate(ptr, size, writable);
> > +	else
> > +		memset(ptr, 0, size);
> > +}
> 
> +static void execmem_invalidate(void *ptr, size_t size, bool writeable)
> +{
> +	/* fill memory with INT3 instructions */
> +	if (writeable)
> +		memset(ptr, 0xcc, size);
> +	else
> +		text_poke_set(ptr, 0xcc, size);
> +}
> 
> Thing is, 0xcc (aka INT3_INSN_OPCODE) is not an invalid instruction.
> It raises #BP not #UD.

Do you mean that _invalidate is a poor name choice or that it's necessary
to use an instruction that raises #UD?

-- 
Sincerely yours,
Mike.
Re: [RFC PATCH 6/7] execmem: add support for cache of large ROX pages
On Thu, Apr 11, 2024 at 07:05:25PM +0300, Mike Rapoport wrote:

> To populate the cache, a writable large page is allocated from vmalloc with
> VM_ALLOW_HUGE_VMAP, filled with invalid instructions and then remapped as
> ROX.

> +static void execmem_invalidate(void *ptr, size_t size, bool writable)
> +{
> +	if (execmem_info->invalidate)
> +		execmem_info->invalidate(ptr, size, writable);
> +	else
> +		memset(ptr, 0, size);
> +}

+static void execmem_invalidate(void *ptr, size_t size, bool writeable)
+{
+	/* fill memory with INT3 instructions */
+	if (writeable)
+		memset(ptr, 0xcc, size);
+	else
+		text_poke_set(ptr, 0xcc, size);
+}

Thing is, 0xcc (aka INT3_INSN_OPCODE) is not an invalid instruction.
It raises #BP not #UD.
[RFC PATCH 6/7] execmem: add support for cache of large ROX pages
From: "Mike Rapoport (IBM)"

Using large pages to map text areas reduces iTLB pressure and improves
performance.

Extend execmem_alloc() with an ability to use PMD_SIZE'ed pages with ROX
permissions as a cache for smaller allocations.

To populate the cache, a writable large page is allocated from vmalloc with
VM_ALLOW_HUGE_VMAP, filled with invalid instructions and then remapped as
ROX.

Portions of that large page are handed out to execmem_alloc() callers
without any changes to the permissions.

When the memory is freed with execmem_free() it is invalidated again so
that it won't contain stale instructions.

The cache is enabled when an architecture sets EXECMEM_ROX_CACHE flag in
definition of an execmem_range.

Signed-off-by: Mike Rapoport (IBM)
---
 include/linux/execmem.h |   2 +
 mm/execmem.c            | 267 ++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 262 insertions(+), 7 deletions(-)

diff --git a/include/linux/execmem.h b/include/linux/execmem.h
index 9d22999dbd7d..06f678e6fe55 100644
--- a/include/linux/execmem.h
+++ b/include/linux/execmem.h
@@ -77,12 +77,14 @@ struct execmem_range {
 
 /**
  * struct execmem_info - architecture parameters for code allocations
+ * @invalidate: set memory to contain invalid instructions
  * @ranges: array of parameter sets defining architecture specific
  * parameters for executable memory allocations. The ranges that are not
  * explicitly initialized by an architecture use parameters defined for
  * @EXECMEM_DEFAULT.
  */
 struct execmem_info {
+	void (*invalidate)(void *ptr, size_t size, bool writable);
 	struct execmem_range	ranges[EXECMEM_TYPE_MAX];
 };
 
diff --git a/mm/execmem.c b/mm/execmem.c
index c920d2b5a721..716fba68ab0e 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -1,30 +1,88 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <...>
+#include <...>
 #include <...>
 #include <...>
+#include <...>
 #include <...>
 #include <...>
+#include <...>
+
+#include "internal.h"
+
 static struct execmem_info *execmem_info __ro_after_init;
 static struct execmem_info default_execmem_info __ro_after_init;
 
-static void *__execmem_alloc(struct execmem_range *range, size_t size)
+struct execmem_cache {
+	struct mutex mutex;
+	struct maple_tree busy_areas;
+	struct maple_tree free_areas;
+};
+
+static struct execmem_cache execmem_cache = {
+	.mutex = __MUTEX_INITIALIZER(execmem_cache.mutex),
+	.busy_areas = MTREE_INIT_EXT(busy_areas, MT_FLAGS_LOCK_EXTERN,
+				     execmem_cache.mutex),
+	.free_areas = MTREE_INIT_EXT(free_areas, MT_FLAGS_LOCK_EXTERN,
+				     execmem_cache.mutex),
+};
+
+static void execmem_cache_clean(struct work_struct *work)
+{
+	struct maple_tree *free_areas = &execmem_cache.free_areas;
+	struct mutex *mutex = &execmem_cache.mutex;
+	MA_STATE(mas, free_areas, 0, ULONG_MAX);
+	void *area;
+
+	mutex_lock(mutex);
+	mas_for_each(&mas, area, ULONG_MAX) {
+		size_t size;
+
+		if (!xa_is_value(area))
+			continue;
+
+		size = xa_to_value(area);
+
+		if (IS_ALIGNED(size, PMD_SIZE) && IS_ALIGNED(mas.index, PMD_SIZE)) {
+			void *ptr = (void *)mas.index;
+
+			mas_erase(&mas);
+			vfree(ptr);
+		}
+	}
+	mutex_unlock(mutex);
+}
+
+static DECLARE_WORK(execmem_cache_clean_work, execmem_cache_clean);
+
+static void execmem_invalidate(void *ptr, size_t size, bool writable)
+{
+	if (execmem_info->invalidate)
+		execmem_info->invalidate(ptr, size, writable);
+	else
+		memset(ptr, 0, size);
+}
+
+static void *execmem_vmalloc(struct execmem_range *range, size_t size,
+			     pgprot_t pgprot, unsigned long vm_flags)
 {
 	bool kasan = range->flags & EXECMEM_KASAN_SHADOW;
-	unsigned long vm_flags = VM_FLUSH_RESET_PERMS;
 	gfp_t gfp_flags = GFP_KERNEL | __GFP_NOWARN;
+	unsigned int align = range->alignment;
 	unsigned long start = range->start;
 	unsigned long end = range->end;
-	unsigned int align = range->alignment;
-	pgprot_t pgprot = range->pgprot;
 	void *p;
 
 	if (kasan)
 		vm_flags |= VM_DEFER_KMEMLEAK;
 
-	p = __vmalloc_node_range(size, align, start, end, gfp_flags,
-				 pgprot, vm_flags, NUMA_NO_NODE,
+	if (vm_flags & VM_ALLOW_HUGE_VMAP)
+		align = PMD_SIZE;
+
+	p = __vmalloc_node_range(size, align, start, end, gfp_flags, pgprot,
+				 vm_flags, NUMA_NO_NODE,
 				 __builtin_return_address(0));
 	if (!p && range->fallback_start) {
 		start = range->fallback_start;
@@ -44,6 +102,199 @@ static void *__execmem_alloc(struct execmem_range *range, size_t size)
 		return NULL;
 	}
 
+	return p;
+}
+
+static int