Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
Hi Masami and Mike, On Sat, Apr 20, 2024 at 2:11 AM Masami Hiramatsu wrote: [...] > > > > > > IIUC, we need to update __execmem_cache_alloc() to take a range pointer as > > > input. module text will use "range" for EXECMEM_MODULE_TEXT, while kprobe > > > will use "range" for EXECMEM_KPROBE. Without "map to" concept or sharing > > > the "range" object, we will have to compare different range parameters to > > > check > > > we can share cached pages between module text and kprobe, which is not > > > efficient. Did I miss something? > > Song, thanks for trying to eplain. I think I need to explain why I used > module_alloc() originally. > > This depends on how kprobe features are implemented on the architecture, and > how much features are supported on kprobes. > > Because kprobe jump optimization and kprobe jump-back optimization need to > use a jump instruction to jump into the trampoline and jump back from the > trampoline directly, if the architecuture jmp instruction supports +-2GB range > like x86, it needs to allocate the trampoline buffer inside such address > space. > This requirement is similar to the modules (because module function needs to > call other functions in the kernel etc.), at least kprobes on x86 used > module_alloc(). > > However, if an architecture only supports breakpoint/trap based kprobe, > it does not need to consider whether the execmem is allocated. > > > > > We can always share large ROX pages as long as they are within the correct > > address space. The permissions for them are ROX and the alignment > > differences are due to KASAN and this is handled during allocation of the > > large page to refill the cache. __execmem_cache_alloc() only needs to limit > > the search for the address space of the range. > > So I don't think EXECMEM_KPROBE always same as EXECMEM_MODULE_TEXT, it > should be configured for each arch. Especially, if it is only used for > searching parameter, it looks OK to me. Thanks for the explanation! I was thinking "we can have EXECMEM_KPROBE share the same parameters as EXECMEM_MODULE_TEXT for all architectures". But this thought is built on top of assumptions on future changes/improvements within multiple sub systems. At this moment, I have no objections moving forward with current execmem APIs. Thanks, Song
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Sat, 20 Apr 2024 07:22:50 +0300 Mike Rapoport wrote: > On Fri, Apr 19, 2024 at 02:42:16PM -0700, Song Liu wrote: > > On Fri, Apr 19, 2024 at 1:00 PM Mike Rapoport wrote: > > > > > > On Fri, Apr 19, 2024 at 10:32:39AM -0700, Song Liu wrote: > > > > On Fri, Apr 19, 2024 at 10:03 AM Mike Rapoport wrote: > > > > [...] > > > > > > > > > > > > > > [1] > > > > > > > https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org > > > > > > > > > > > > For the ROX to work, we need different users (module text, kprobe, > > > > > > etc.) to have > > > > > > the same execmem_range. From [1]: > > > > > > > > > > > > static void *execmem_cache_alloc(struct execmem_range *range, > > > > > > size_t size) > > > > > > { > > > > > > ... > > > > > >p = __execmem_cache_alloc(size); > > > > > >if (p) > > > > > >return p; > > > > > > err = execmem_cache_populate(range, size); > > > > > > ... > > > > > > } > > > > > > > > > > > > We are calling __execmem_cache_alloc() without range. For this to > > > > > > work, > > > > > > we can only call execmem_cache_alloc() with one execmem_range. > > > > > > > > > > Actually, on x86 this will "just work" because everything shares the > > > > > same > > > > > address space :) > > > > > > > > > > The 2M pages in the cache will be in the modules space, so > > > > > __execmem_cache_alloc() will always return memory from that address > > > > > space. > > > > > > > > > > For other architectures this indeed needs to be fixed with passing the > > > > > range to __execmem_cache_alloc() and limiting search in the cache for > > > > > that > > > > > range. > > > > > > > > I think we at least need the "map to" concept (initially proposed by > > > > Thomas) > > > > to get this work. For example, EXECMEM_BPF and EXECMEM_KPROBE > > > > maps to EXECMEM_MODULE_TEXT, so that all these actually share > > > > the same range. > > > > > > Why? > > > > IIUC, we need to update __execmem_cache_alloc() to take a range pointer as > > input. module text will use "range" for EXECMEM_MODULE_TEXT, while kprobe > > will use "range" for EXECMEM_KPROBE. Without "map to" concept or sharing > > the "range" object, we will have to compare different range parameters to > > check > > we can share cached pages between module text and kprobe, which is not > > efficient. Did I miss something? Song, thanks for trying to eplain. I think I need to explain why I used module_alloc() originally. This depends on how kprobe features are implemented on the architecture, and how much features are supported on kprobes. Because kprobe jump optimization and kprobe jump-back optimization need to use a jump instruction to jump into the trampoline and jump back from the trampoline directly, if the architecuture jmp instruction supports +-2GB range like x86, it needs to allocate the trampoline buffer inside such address space. This requirement is similar to the modules (because module function needs to call other functions in the kernel etc.), at least kprobes on x86 used module_alloc(). However, if an architecture only supports breakpoint/trap based kprobe, it does not need to consider whether the execmem is allocated. > > We can always share large ROX pages as long as they are within the correct > address space. The permissions for them are ROX and the alignment > differences are due to KASAN and this is handled during allocation of the > large page to refill the cache. __execmem_cache_alloc() only needs to limit > the search for the address space of the range. So I don't think EXECMEM_KPROBE always same as EXECMEM_MODULE_TEXT, it should be configured for each arch. Especially, if it is only used for searching parameter, it looks OK to me. Thank you, > > And regardless, they way we deal with sharing of the cache can be sorted > out later. > > > Thanks, > > Song > > -- > Sincerely yours, > Mike. > -- Masami Hiramatsu (Google)
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Fri, Apr 19, 2024 at 02:42:16PM -0700, Song Liu wrote: > On Fri, Apr 19, 2024 at 1:00 PM Mike Rapoport wrote: > > > > On Fri, Apr 19, 2024 at 10:32:39AM -0700, Song Liu wrote: > > > On Fri, Apr 19, 2024 at 10:03 AM Mike Rapoport wrote: > > > [...] > > > > > > > > > > > > [1] > > > > > > https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org > > > > > > > > > > For the ROX to work, we need different users (module text, kprobe, > > > > > etc.) to have > > > > > the same execmem_range. From [1]: > > > > > > > > > > static void *execmem_cache_alloc(struct execmem_range *range, size_t > > > > > size) > > > > > { > > > > > ... > > > > >p = __execmem_cache_alloc(size); > > > > >if (p) > > > > >return p; > > > > > err = execmem_cache_populate(range, size); > > > > > ... > > > > > } > > > > > > > > > > We are calling __execmem_cache_alloc() without range. For this to > > > > > work, > > > > > we can only call execmem_cache_alloc() with one execmem_range. > > > > > > > > Actually, on x86 this will "just work" because everything shares the > > > > same > > > > address space :) > > > > > > > > The 2M pages in the cache will be in the modules space, so > > > > __execmem_cache_alloc() will always return memory from that address > > > > space. > > > > > > > > For other architectures this indeed needs to be fixed with passing the > > > > range to __execmem_cache_alloc() and limiting search in the cache for > > > > that > > > > range. > > > > > > I think we at least need the "map to" concept (initially proposed by > > > Thomas) > > > to get this work. For example, EXECMEM_BPF and EXECMEM_KPROBE > > > maps to EXECMEM_MODULE_TEXT, so that all these actually share > > > the same range. > > > > Why? > > IIUC, we need to update __execmem_cache_alloc() to take a range pointer as > input. module text will use "range" for EXECMEM_MODULE_TEXT, while kprobe > will use "range" for EXECMEM_KPROBE. Without "map to" concept or sharing > the "range" object, we will have to compare different range parameters to > check > we can share cached pages between module text and kprobe, which is not > efficient. Did I miss something? We can always share large ROX pages as long as they are within the correct address space. The permissions for them are ROX and the alignment differences are due to KASAN and this is handled during allocation of the large page to refill the cache. __execmem_cache_alloc() only needs to limit the search for the address space of the range. And regardless, they way we deal with sharing of the cache can be sorted out later. > Thanks, > Song -- Sincerely yours, Mike.
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Fri, Apr 19, 2024 at 1:00 PM Mike Rapoport wrote: > > On Fri, Apr 19, 2024 at 10:32:39AM -0700, Song Liu wrote: > > On Fri, Apr 19, 2024 at 10:03 AM Mike Rapoport wrote: > > [...] > > > > > > > > > > [1] > > > > > https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org > > > > > > > > For the ROX to work, we need different users (module text, kprobe, > > > > etc.) to have > > > > the same execmem_range. From [1]: > > > > > > > > static void *execmem_cache_alloc(struct execmem_range *range, size_t > > > > size) > > > > { > > > > ... > > > >p = __execmem_cache_alloc(size); > > > >if (p) > > > >return p; > > > > err = execmem_cache_populate(range, size); > > > > ... > > > > } > > > > > > > > We are calling __execmem_cache_alloc() without range. For this to work, > > > > we can only call execmem_cache_alloc() with one execmem_range. > > > > > > Actually, on x86 this will "just work" because everything shares the same > > > address space :) > > > > > > The 2M pages in the cache will be in the modules space, so > > > __execmem_cache_alloc() will always return memory from that address space. > > > > > > For other architectures this indeed needs to be fixed with passing the > > > range to __execmem_cache_alloc() and limiting search in the cache for that > > > range. > > > > I think we at least need the "map to" concept (initially proposed by Thomas) > > to get this work. For example, EXECMEM_BPF and EXECMEM_KPROBE > > maps to EXECMEM_MODULE_TEXT, so that all these actually share > > the same range. > > Why? IIUC, we need to update __execmem_cache_alloc() to take a range pointer as input. module text will use "range" for EXECMEM_MODULE_TEXT, while kprobe will use "range" for EXECMEM_KPROBE. Without "map to" concept or sharing the "range" object, we will have to compare different range parameters to check we can share cached pages between module text and kprobe, which is not efficient. Did I miss something? Thanks, Song
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Fri, Apr 19, 2024 at 10:32:39AM -0700, Song Liu wrote: > On Fri, Apr 19, 2024 at 10:03 AM Mike Rapoport wrote: > [...] > > > > > > > > [1] https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org > > > > > > For the ROX to work, we need different users (module text, kprobe, etc.) > > > to have > > > the same execmem_range. From [1]: > > > > > > static void *execmem_cache_alloc(struct execmem_range *range, size_t size) > > > { > > > ... > > >p = __execmem_cache_alloc(size); > > >if (p) > > >return p; > > > err = execmem_cache_populate(range, size); > > > ... > > > } > > > > > > We are calling __execmem_cache_alloc() without range. For this to work, > > > we can only call execmem_cache_alloc() with one execmem_range. > > > > Actually, on x86 this will "just work" because everything shares the same > > address space :) > > > > The 2M pages in the cache will be in the modules space, so > > __execmem_cache_alloc() will always return memory from that address space. > > > > For other architectures this indeed needs to be fixed with passing the > > range to __execmem_cache_alloc() and limiting search in the cache for that > > range. > > I think we at least need the "map to" concept (initially proposed by Thomas) > to get this work. For example, EXECMEM_BPF and EXECMEM_KPROBE > maps to EXECMEM_MODULE_TEXT, so that all these actually share > the same range. Why? > Does this make sense? > > Song -- Sincerely yours, Mike.
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Fri, Apr 19, 2024 at 10:03 AM Mike Rapoport wrote: [...] > > > > > > [1] https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org > > > > For the ROX to work, we need different users (module text, kprobe, etc.) to > > have > > the same execmem_range. From [1]: > > > > static void *execmem_cache_alloc(struct execmem_range *range, size_t size) > > { > > ... > >p = __execmem_cache_alloc(size); > >if (p) > >return p; > > err = execmem_cache_populate(range, size); > > ... > > } > > > > We are calling __execmem_cache_alloc() without range. For this to work, > > we can only call execmem_cache_alloc() with one execmem_range. > > Actually, on x86 this will "just work" because everything shares the same > address space :) > > The 2M pages in the cache will be in the modules space, so > __execmem_cache_alloc() will always return memory from that address space. > > For other architectures this indeed needs to be fixed with passing the > range to __execmem_cache_alloc() and limiting search in the cache for that > range. I think we at least need the "map to" concept (initially proposed by Thomas) to get this work. For example, EXECMEM_BPF and EXECMEM_KPROBE maps to EXECMEM_MODULE_TEXT, so that all these actually share the same range. Does this make sense? Song
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Fri, Apr 19, 2024 at 08:54:40AM -0700, Song Liu wrote: > On Thu, Apr 18, 2024 at 11:56 PM Mike Rapoport wrote: > > > > On Thu, Apr 18, 2024 at 02:01:22PM -0700, Song Liu wrote: > > > On Thu, Apr 18, 2024 at 10:54 AM Mike Rapoport wrote: > > > > > > > > On Thu, Apr 18, 2024 at 09:13:27AM -0700, Song Liu wrote: > > > > > On Thu, Apr 18, 2024 at 8:37 AM Mike Rapoport wrote: > > > > > > > > > > > > > > > > I'm looking at execmem_types more as definition of the > > > > > > > > consumers, maybe I > > > > > > > > should have named the enum execmem_consumer at the first place. > > > > > > > > > > > > > > I think looking at execmem_type from consumers' point of view adds > > > > > > > unnecessary complexity. IIUC, for most (if not all) archs, > > > > > > > ftrace, kprobe, > > > > > > > and bpf (and maybe also module text) all have the same > > > > > > > requirements. > > > > > > > Did I miss something? > > > > > > > > > > > > It's enough to have one architecture with different constrains for > > > > > > kprobes > > > > > > and bpf to warrant a type for each. > > > > > > > > > > AFAICT, some of these constraints can be changed without too much > > > > > work. > > > > > > > > But why? > > > > I honestly don't understand what are you trying to optimize here. A few > > > > lines of initialization in execmem_info? > > > > > > IIUC, having separate EXECMEM_BPF and EXECMEM_KPROBE makes it > > > harder for bpf and kprobe to share the same ROX page. In many use cases, > > > a 2MiB page (assuming x86_64) is enough for all BPF, kprobe, ftrace, and > > > module text. It is not efficient if we have to allocate separate pages > > > for each > > > of these use cases. If this is not a problem, the current approach works. > > > > The caching of large ROX pages does not need to be per type. > > > > In the POC I've posted for caching of large ROX pages on x86 [1], the cache > > is > > global and to make kprobes and bpf use it it's enough to set a flag in > > execmem_info. > > > > [1] https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org > > For the ROX to work, we need different users (module text, kprobe, etc.) to > have > the same execmem_range. From [1]: > > static void *execmem_cache_alloc(struct execmem_range *range, size_t size) > { > ... >p = __execmem_cache_alloc(size); >if (p) >return p; > err = execmem_cache_populate(range, size); > ... > } > > We are calling __execmem_cache_alloc() without range. For this to work, > we can only call execmem_cache_alloc() with one execmem_range. Actually, on x86 this will "just work" because everything shares the same address space :) The 2M pages in the cache will be in the modules space, so __execmem_cache_alloc() will always return memory from that address space. For other architectures this indeed needs to be fixed with passing the range to __execmem_cache_alloc() and limiting search in the cache for that range. > Did I miss something? > > Thanks, > Song -- Sincerely yours, Mike.
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Thu, Apr 18, 2024 at 11:56 PM Mike Rapoport wrote: > > On Thu, Apr 18, 2024 at 02:01:22PM -0700, Song Liu wrote: > > On Thu, Apr 18, 2024 at 10:54 AM Mike Rapoport wrote: > > > > > > On Thu, Apr 18, 2024 at 09:13:27AM -0700, Song Liu wrote: > > > > On Thu, Apr 18, 2024 at 8:37 AM Mike Rapoport wrote: > > > > > > > > > > > > > > I'm looking at execmem_types more as definition of the consumers, > > > > > > > maybe I > > > > > > > should have named the enum execmem_consumer at the first place. > > > > > > > > > > > > I think looking at execmem_type from consumers' point of view adds > > > > > > unnecessary complexity. IIUC, for most (if not all) archs, ftrace, > > > > > > kprobe, > > > > > > and bpf (and maybe also module text) all have the same requirements. > > > > > > Did I miss something? > > > > > > > > > > It's enough to have one architecture with different constrains for > > > > > kprobes > > > > > and bpf to warrant a type for each. > > > > > > > > AFAICT, some of these constraints can be changed without too much work. > > > > > > But why? > > > I honestly don't understand what are you trying to optimize here. A few > > > lines of initialization in execmem_info? > > > > IIUC, having separate EXECMEM_BPF and EXECMEM_KPROBE makes it > > harder for bpf and kprobe to share the same ROX page. In many use cases, > > a 2MiB page (assuming x86_64) is enough for all BPF, kprobe, ftrace, and > > module text. It is not efficient if we have to allocate separate pages for > > each > > of these use cases. If this is not a problem, the current approach works. > > The caching of large ROX pages does not need to be per type. > > In the POC I've posted for caching of large ROX pages on x86 [1], the cache is > global and to make kprobes and bpf use it it's enough to set a flag in > execmem_info. > > [1] https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org For the ROX to work, we need different users (module text, kprobe, etc.) to have the same execmem_range. From [1]: static void *execmem_cache_alloc(struct execmem_range *range, size_t size) { ... p = __execmem_cache_alloc(size); if (p) return p; err = execmem_cache_populate(range, size); ... } We are calling __execmem_cache_alloc() without range. For this to work, we can only call execmem_cache_alloc() with one execmem_range. Did I miss something? Thanks, Song
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Thu, Apr 18, 2024 at 02:01:22PM -0700, Song Liu wrote: > On Thu, Apr 18, 2024 at 10:54 AM Mike Rapoport wrote: > > > > On Thu, Apr 18, 2024 at 09:13:27AM -0700, Song Liu wrote: > > > On Thu, Apr 18, 2024 at 8:37 AM Mike Rapoport wrote: > > > > > > > > > > > > I'm looking at execmem_types more as definition of the consumers, > > > > > > maybe I > > > > > > should have named the enum execmem_consumer at the first place. > > > > > > > > > > I think looking at execmem_type from consumers' point of view adds > > > > > unnecessary complexity. IIUC, for most (if not all) archs, ftrace, > > > > > kprobe, > > > > > and bpf (and maybe also module text) all have the same requirements. > > > > > Did I miss something? > > > > > > > > It's enough to have one architecture with different constrains for > > > > kprobes > > > > and bpf to warrant a type for each. > > > > > > AFAICT, some of these constraints can be changed without too much work. > > > > But why? > > I honestly don't understand what are you trying to optimize here. A few > > lines of initialization in execmem_info? > > IIUC, having separate EXECMEM_BPF and EXECMEM_KPROBE makes it > harder for bpf and kprobe to share the same ROX page. In many use cases, > a 2MiB page (assuming x86_64) is enough for all BPF, kprobe, ftrace, and > module text. It is not efficient if we have to allocate separate pages for > each > of these use cases. If this is not a problem, the current approach works. The caching of large ROX pages does not need to be per type. In the POC I've posted for caching of large ROX pages on x86 [1], the cache is global and to make kprobes and bpf use it it's enough to set a flag in execmem_info. [1] https://lore.kernel.org/all/20240411160526.2093408-1-r...@kernel.org > Thanks, > Song -- Sincerely yours, Mike.
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Thu, Apr 18, 2024 at 10:54 AM Mike Rapoport wrote: > > On Thu, Apr 18, 2024 at 09:13:27AM -0700, Song Liu wrote: > > On Thu, Apr 18, 2024 at 8:37 AM Mike Rapoport wrote: > > > > > > > > > > I'm looking at execmem_types more as definition of the consumers, > > > > > maybe I > > > > > should have named the enum execmem_consumer at the first place. > > > > > > > > I think looking at execmem_type from consumers' point of view adds > > > > unnecessary complexity. IIUC, for most (if not all) archs, ftrace, > > > > kprobe, > > > > and bpf (and maybe also module text) all have the same requirements. > > > > Did I miss something? > > > > > > It's enough to have one architecture with different constrains for kprobes > > > and bpf to warrant a type for each. > > > > AFAICT, some of these constraints can be changed without too much work. > > But why? > I honestly don't understand what are you trying to optimize here. A few > lines of initialization in execmem_info? IIUC, having separate EXECMEM_BPF and EXECMEM_KPROBE makes it harder for bpf and kprobe to share the same ROX page. In many use cases, a 2MiB page (assuming x86_64) is enough for all BPF, kprobe, ftrace, and module text. It is not efficient if we have to allocate separate pages for each of these use cases. If this is not a problem, the current approach works. Thanks, Song
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Thu, Apr 18, 2024 at 09:13:27AM -0700, Song Liu wrote: > On Thu, Apr 18, 2024 at 8:37 AM Mike Rapoport wrote: > > > > > > > > I'm looking at execmem_types more as definition of the consumers, maybe > > > > I > > > > should have named the enum execmem_consumer at the first place. > > > > > > I think looking at execmem_type from consumers' point of view adds > > > unnecessary complexity. IIUC, for most (if not all) archs, ftrace, kprobe, > > > and bpf (and maybe also module text) all have the same requirements. > > > Did I miss something? > > > > It's enough to have one architecture with different constrains for kprobes > > and bpf to warrant a type for each. > > AFAICT, some of these constraints can be changed without too much work. But why? I honestly don't understand what are you trying to optimize here. A few lines of initialization in execmem_info? What is the advantage in forcing architectures to have imposed limits on kprobes or bpf allocations? > > Where do you see unnecessary complexity? > > > > > IOW, we have > > > > > > enum execmem_type { > > > EXECMEM_DEFAULT, > > > EXECMEM_TEXT, > > > EXECMEM_KPROBES = EXECMEM_TEXT, > > > EXECMEM_FTRACE = EXECMEM_TEXT, > > > EXECMEM_BPF = EXECMEM_TEXT, /* we may end up without > > > _KPROBE, _FTRACE, _BPF */ > > > EXECMEM_DATA, /* rw */ > > > EXECMEM_RO_DATA, > > > EXECMEM_RO_AFTER_INIT, > > > EXECMEM_TYPE_MAX, > > > }; > > > > > > Does this make sense? > > > > How do you suggest to deal with e.g. riscv that has separate address spaces > > for modules, kprobes and bpf? > > IIUC, modules and bpf use the same address space on riscv Not exactly, bpf is a subset of modules on riscv. > while kprobes use vmalloc address. The whole point of using the entire vmalloc for kprobes is to avoid pollution of limited modules space. > Thanks, > Song -- Sincerely yours, Mike.
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Thu, Apr 18, 2024 at 8:37 AM Mike Rapoport wrote: > [...] > > > > Is +/- 2G enough for all realistic use cases? If so, I guess we don't > > really need > > EXECMEM_ANYWHERE below? > > > > > > > > > > * I'm not sure about BPF's requirements; it seems happy doing the same > > > > as > > > > modules. > > > > > > BPF are happy with vmalloc(). > > > > > > > So if we *must* use a common execmem allocator, what we'd reall want is > > > > our own > > > > types, e.g. > > > > > > > > EXECMEM_ANYWHERE > > > > EXECMEM_NOPLT > > > > EXECMEM_PREL32 > > > > > > > > ... and then we use those in arch code to implement module_alloc() and > > > > friends. > > > > > > I'm looking at execmem_types more as definition of the consumers, maybe I > > > should have named the enum execmem_consumer at the first place. > > > > I think looking at execmem_type from consumers' point of view adds > > unnecessary complexity. IIUC, for most (if not all) archs, ftrace, kprobe, > > and bpf (and maybe also module text) all have the same requirements. > > Did I miss something? > > It's enough to have one architecture with different constrains for kprobes > and bpf to warrant a type for each. > AFAICT, some of these constraints can be changed without too much work. > Where do you see unnecessary complexity? > > > IOW, we have > > > > enum execmem_type { > > EXECMEM_DEFAULT, > > EXECMEM_TEXT, > > EXECMEM_KPROBES = EXECMEM_TEXT, > > EXECMEM_FTRACE = EXECMEM_TEXT, > > EXECMEM_BPF = EXECMEM_TEXT, /* we may end up without > > _KPROBE, _FTRACE, _BPF */ > > EXECMEM_DATA, /* rw */ > > EXECMEM_RO_DATA, > > EXECMEM_RO_AFTER_INIT, > > EXECMEM_TYPE_MAX, > > }; > > > > Does this make sense? > > How do you suggest to deal with e.g. riscv that has separate address spaces > for modules, kprobes and bpf? IIUC, modules and bpf use the same address space on riscv, while kprobes use vmalloc address. I haven't tried this yet, but I think we can let kprobes use the same space as modules and bpf, which is: | -4 GB | 7fff |2 GB | modules, BPF Did I get this right? Thanks, Song
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Wed, Apr 17, 2024 at 04:32:49PM -0700, Song Liu wrote: > On Tue, Apr 16, 2024 at 12:23 AM Mike Rapoport wrote: > > > > On Mon, Apr 15, 2024 at 06:36:39PM +0100, Mark Rutland wrote: > > > On Mon, Apr 15, 2024 at 09:52:41AM +0200, Peter Zijlstra wrote: > > > > On Thu, Apr 11, 2024 at 07:00:41PM +0300, Mike Rapoport wrote: > > > > > +/** > > > > > + * enum execmem_type - types of executable memory ranges > > > > > + * > > > > > + * There are several subsystems that allocate executable memory. > > > > > + * Architectures define different restrictions on placement, > > > > > + * permissions, alignment and other parameters for memory that can > > > > > be used > > > > > + * by these subsystems. > > > > > + * Types in this enum identify subsystems that allocate executable > > > > > memory > > > > > + * and let architectures define parameters for ranges suitable for > > > > > + * allocations by each subsystem. > > > > > + * > > > > > + * @EXECMEM_DEFAULT: default parameters that would be used for types > > > > > that > > > > > + * are not explcitly defined. > > > > > + * @EXECMEM_MODULE_TEXT: parameters for module text sections > > > > > + * @EXECMEM_KPROBES: parameters for kprobes > > > > > + * @EXECMEM_FTRACE: parameters for ftrace > > > > > + * @EXECMEM_BPF: parameters for BPF > > > > > + * @EXECMEM_TYPE_MAX: > > > > > + */ > > > > > +enum execmem_type { > > > > > + EXECMEM_DEFAULT, > > > > > + EXECMEM_MODULE_TEXT = EXECMEM_DEFAULT, > > > > > + EXECMEM_KPROBES, > > > > > + EXECMEM_FTRACE, > > > > > + EXECMEM_BPF, > > > > > + EXECMEM_TYPE_MAX, > > > > > +}; > > > > > > > > Can we please get a break-down of how all these types are actually > > > > different from one another? > > > > > > > > I'm thinking some platforms have a tiny immediate space (arm64 comes to > > > > mind) and has less strict placement constraints for some of them? > > > > > > Yeah, and really I'd *much* rather deal with that in arch code, as I have > > > said > > > several times. > > > > > > For arm64 we have two bsaic restrictions: > > > > > > 1) Direct branches can go +/-128M > > >We can expand this range by having direct branches go to PLTs, at a > > >performance cost. > > > > > > 2) PREL32 relocations can go +/-2G > > >We cannot expand this further. > > > > > > * We don't need to allocate memory for ftrace. We do not use trampolines. > > > > > > * Kprobes XOL areas don't care about either of those; we don't place any > > > PC-relative instructions in those. Maybe we want to in future. > > > > > > * Modules care about both; we'd *prefer* to place them within +/-128M of > > > all > > > other kernel/module code, but if there's no space we can use PLTs and > > > expand > > > that to +/-2G. Since modules can refreence other modules, that ends up > > > actually being halved, and modules have to fit within some 2G window > > > that > > > also covers the kernel. > > Is +/- 2G enough for all realistic use cases? If so, I guess we don't > really need > EXECMEM_ANYWHERE below? > > > > > > > * I'm not sure about BPF's requirements; it seems happy doing the same as > > > modules. > > > > BPF are happy with vmalloc(). > > > > > So if we *must* use a common execmem allocator, what we'd reall want is > > > our own > > > types, e.g. > > > > > > EXECMEM_ANYWHERE > > > EXECMEM_NOPLT > > > EXECMEM_PREL32 > > > > > > ... and then we use those in arch code to implement module_alloc() and > > > friends. > > > > I'm looking at execmem_types more as definition of the consumers, maybe I > > should have named the enum execmem_consumer at the first place. > > I think looking at execmem_type from consumers' point of view adds > unnecessary complexity. IIUC, for most (if not all) archs, ftrace, kprobe, > and bpf (and maybe also module text) all have the same requirements. > Did I miss something? It's enough to have one architecture with different constrains for kprobes and bpf to warrant a type for each. Where do you see unnecessary complexity? > IOW, we have > > enum execmem_type { > EXECMEM_DEFAULT, > EXECMEM_TEXT, > EXECMEM_KPROBES = EXECMEM_TEXT, > EXECMEM_FTRACE = EXECMEM_TEXT, > EXECMEM_BPF = EXECMEM_TEXT, /* we may end up without > _KPROBE, _FTRACE, _BPF */ > EXECMEM_DATA, /* rw */ > EXECMEM_RO_DATA, > EXECMEM_RO_AFTER_INIT, > EXECMEM_TYPE_MAX, > }; > > Does this make sense? How do you suggest to deal with e.g. riscv that has separate address spaces for modules, kprobes and bpf? > Thanks, > Song -- Sincerely yours, Mike.
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Tue, Apr 16, 2024 at 12:23 AM Mike Rapoport wrote: > > On Mon, Apr 15, 2024 at 06:36:39PM +0100, Mark Rutland wrote: > > On Mon, Apr 15, 2024 at 09:52:41AM +0200, Peter Zijlstra wrote: > > > On Thu, Apr 11, 2024 at 07:00:41PM +0300, Mike Rapoport wrote: > > > > +/** > > > > + * enum execmem_type - types of executable memory ranges > > > > + * > > > > + * There are several subsystems that allocate executable memory. > > > > + * Architectures define different restrictions on placement, > > > > + * permissions, alignment and other parameters for memory that can be > > > > used > > > > + * by these subsystems. > > > > + * Types in this enum identify subsystems that allocate executable > > > > memory > > > > + * and let architectures define parameters for ranges suitable for > > > > + * allocations by each subsystem. > > > > + * > > > > + * @EXECMEM_DEFAULT: default parameters that would be used for types > > > > that > > > > + * are not explcitly defined. > > > > + * @EXECMEM_MODULE_TEXT: parameters for module text sections > > > > + * @EXECMEM_KPROBES: parameters for kprobes > > > > + * @EXECMEM_FTRACE: parameters for ftrace > > > > + * @EXECMEM_BPF: parameters for BPF > > > > + * @EXECMEM_TYPE_MAX: > > > > + */ > > > > +enum execmem_type { > > > > + EXECMEM_DEFAULT, > > > > + EXECMEM_MODULE_TEXT = EXECMEM_DEFAULT, > > > > + EXECMEM_KPROBES, > > > > + EXECMEM_FTRACE, > > > > + EXECMEM_BPF, > > > > + EXECMEM_TYPE_MAX, > > > > +}; > > > > > > Can we please get a break-down of how all these types are actually > > > different from one another? > > > > > > I'm thinking some platforms have a tiny immediate space (arm64 comes to > > > mind) and has less strict placement constraints for some of them? > > > > Yeah, and really I'd *much* rather deal with that in arch code, as I have > > said > > several times. > > > > For arm64 we have two bsaic restrictions: > > > > 1) Direct branches can go +/-128M > >We can expand this range by having direct branches go to PLTs, at a > >performance cost. > > > > 2) PREL32 relocations can go +/-2G > >We cannot expand this further. > > > > * We don't need to allocate memory for ftrace. We do not use trampolines. > > > > * Kprobes XOL areas don't care about either of those; we don't place any > > PC-relative instructions in those. Maybe we want to in future. > > > > * Modules care about both; we'd *prefer* to place them within +/-128M of all > > other kernel/module code, but if there's no space we can use PLTs and > > expand > > that to +/-2G. Since modules can refreence other modules, that ends up > > actually being halved, and modules have to fit within some 2G window that > > also covers the kernel. Is +/- 2G enough for all realistic use cases? If so, I guess we don't really need EXECMEM_ANYWHERE below? > > > > * I'm not sure about BPF's requirements; it seems happy doing the same as > > modules. > > BPF are happy with vmalloc(). > > > So if we *must* use a common execmem allocator, what we'd reall want is our > > own > > types, e.g. > > > > EXECMEM_ANYWHERE > > EXECMEM_NOPLT > > EXECMEM_PREL32 > > > > ... and then we use those in arch code to implement module_alloc() and > > friends. > > I'm looking at execmem_types more as definition of the consumers, maybe I > should have named the enum execmem_consumer at the first place. I think looking at execmem_type from consumers' point of view adds unnecessary complexity. IIUC, for most (if not all) archs, ftrace, kprobe, and bpf (and maybe also module text) all have the same requirements. Did I miss something? IOW, we have enum execmem_type { EXECMEM_DEFAULT, EXECMEM_TEXT, EXECMEM_KPROBES = EXECMEM_TEXT, EXECMEM_FTRACE = EXECMEM_TEXT, EXECMEM_BPF = EXECMEM_TEXT, /* we may end up without _KPROBE, _FTRACE, _BPF */ EXECMEM_DATA, /* rw */ EXECMEM_RO_DATA, EXECMEM_RO_AFTER_INIT, EXECMEM_TYPE_MAX, }; Does this make sense? Thanks, Song
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Thu, 11 Apr 2024 19:00:41 +0300 Mike Rapoport wrote: > From: "Mike Rapoport (IBM)" > > module_alloc() is used everywhere as a mean to allocate memory for code. > > Beside being semantically wrong, this unnecessarily ties all subsystems > that need to allocate code, such as ftrace, kprobes and BPF to modules and > puts the burden of code allocation to the modules code. > > Several architectures override module_alloc() because of various > constraints where the executable memory can be located and this causes > additional obstacles for improvements of code allocation. > > Start splitting code allocation from modules by introducing execmem_alloc() > and execmem_free() APIs. > > Initially, execmem_alloc() is a wrapper for module_alloc() and > execmem_free() is a replacement of module_memfree() to allow updating all > call sites to use the new APIs. > > Since architectures define different restrictions on placement, > permissions, alignment and other parameters for memory that can be used by > different subsystems that allocate executable memory, execmem_alloc() takes > a type argument, that will be used to identify the calling subsystem and to > allow architectures define parameters for ranges suitable for that > subsystem. > This looks good to me for the kprobe part. Acked-by: Masami Hiramatsu (Google) Thank you, > Signed-off-by: Mike Rapoport (IBM) > --- > arch/powerpc/kernel/kprobes.c| 6 ++-- > arch/s390/kernel/ftrace.c| 4 +-- > arch/s390/kernel/kprobes.c | 4 +-- > arch/s390/kernel/module.c| 5 +-- > arch/sparc/net/bpf_jit_comp_32.c | 8 ++--- > arch/x86/kernel/ftrace.c | 6 ++-- > arch/x86/kernel/kprobes/core.c | 4 +-- > include/linux/execmem.h | 57 > include/linux/moduleloader.h | 3 -- > kernel/bpf/core.c| 6 ++-- > kernel/kprobes.c | 8 ++--- > kernel/module/Kconfig| 1 + > kernel/module/main.c | 25 +- > mm/Kconfig | 3 ++ > mm/Makefile | 1 + > mm/execmem.c | 26 +++ > 16 files changed, 122 insertions(+), 45 deletions(-) > create mode 100644 include/linux/execmem.h > create mode 100644 mm/execmem.c > > diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c > index bbca90a5e2ec..9fcd01bb2ce6 100644 > --- a/arch/powerpc/kernel/kprobes.c > +++ b/arch/powerpc/kernel/kprobes.c > @@ -19,8 +19,8 @@ > #include > #include > #include > -#include > #include > +#include > #include > #include > #include > @@ -130,7 +130,7 @@ void *alloc_insn_page(void) > { > void *page; > > - page = module_alloc(PAGE_SIZE); > + page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE); > if (!page) > return NULL; > > @@ -142,7 +142,7 @@ void *alloc_insn_page(void) > } > return page; > error: > - module_memfree(page); > + execmem_free(page); > return NULL; > } > > diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c > index c46381ea04ec..798249ef5646 100644 > --- a/arch/s390/kernel/ftrace.c > +++ b/arch/s390/kernel/ftrace.c > @@ -7,13 +7,13 @@ > * Author(s): Martin Schwidefsky > */ > > -#include > #include > #include > #include > #include > #include > #include > +#include > #include > #include > #include > @@ -220,7 +220,7 @@ static int __init ftrace_plt_init(void) > { > const char *start, *end; > > - ftrace_plt = module_alloc(PAGE_SIZE); > + ftrace_plt = execmem_alloc(EXECMEM_FTRACE, PAGE_SIZE); > if (!ftrace_plt) > panic("cannot allocate ftrace plt\n"); > > diff --git a/arch/s390/kernel/kprobes.c b/arch/s390/kernel/kprobes.c > index f0cf20d4b3c5..3c1b1be744de 100644 > --- a/arch/s390/kernel/kprobes.c > +++ b/arch/s390/kernel/kprobes.c > @@ -9,7 +9,6 @@ > > #define pr_fmt(fmt) "kprobes: " fmt > > -#include > #include > #include > #include > @@ -21,6 +20,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -38,7 +38,7 @@ void *alloc_insn_page(void) > { > void *page; > > - page = module_alloc(PAGE_SIZE); > + page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE); > if (!page) > return NULL; > set_memory_rox((unsigned long)page, 1); > diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c > index 42215f9404af..ac97a905e8cd 100644 > --- a/arch/s390/kernel/module.c > +++ b/arch/s390/kernel/module.c > @@ -21,6 +21,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -76,7 +77,7 @@ void *module_alloc(unsigned long size) > #ifdef CONFIG_FUNCTION_TRACER > void module_arch_cleanup(struct module *mod) > { > - module_memfree(mod->arch.trampolines_start); > + execmem_free(mod->arch.trampolines_start); > } > #endif > > @@ -510,7 +511,7 @@
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Mon, Apr 15, 2024 at 06:36:39PM +0100, Mark Rutland wrote: > On Mon, Apr 15, 2024 at 09:52:41AM +0200, Peter Zijlstra wrote: > > On Thu, Apr 11, 2024 at 07:00:41PM +0300, Mike Rapoport wrote: > > > +/** > > > + * enum execmem_type - types of executable memory ranges > > > + * > > > + * There are several subsystems that allocate executable memory. > > > + * Architectures define different restrictions on placement, > > > + * permissions, alignment and other parameters for memory that can be > > > used > > > + * by these subsystems. > > > + * Types in this enum identify subsystems that allocate executable memory > > > + * and let architectures define parameters for ranges suitable for > > > + * allocations by each subsystem. > > > + * > > > + * @EXECMEM_DEFAULT: default parameters that would be used for types that > > > + * are not explcitly defined. > > > + * @EXECMEM_MODULE_TEXT: parameters for module text sections > > > + * @EXECMEM_KPROBES: parameters for kprobes > > > + * @EXECMEM_FTRACE: parameters for ftrace > > > + * @EXECMEM_BPF: parameters for BPF > > > + * @EXECMEM_TYPE_MAX: > > > + */ > > > +enum execmem_type { > > > + EXECMEM_DEFAULT, > > > + EXECMEM_MODULE_TEXT = EXECMEM_DEFAULT, > > > + EXECMEM_KPROBES, > > > + EXECMEM_FTRACE, > > > + EXECMEM_BPF, > > > + EXECMEM_TYPE_MAX, > > > +}; > > > > Can we please get a break-down of how all these types are actually > > different from one another? > > > > I'm thinking some platforms have a tiny immediate space (arm64 comes to > > mind) and has less strict placement constraints for some of them? > > Yeah, and really I'd *much* rather deal with that in arch code, as I have said > several times. > > For arm64 we have two bsaic restrictions: > > 1) Direct branches can go +/-128M >We can expand this range by having direct branches go to PLTs, at a >performance cost. > > 2) PREL32 relocations can go +/-2G >We cannot expand this further. > > * We don't need to allocate memory for ftrace. We do not use trampolines. > > * Kprobes XOL areas don't care about either of those; we don't place any > PC-relative instructions in those. Maybe we want to in future. > > * Modules care about both; we'd *prefer* to place them within +/-128M of all > other kernel/module code, but if there's no space we can use PLTs and expand > that to +/-2G. Since modules can refreence other modules, that ends up > actually being halved, and modules have to fit within some 2G window that > also covers the kernel. > > * I'm not sure about BPF's requirements; it seems happy doing the same as > modules. BPF are happy with vmalloc(). > So if we *must* use a common execmem allocator, what we'd reall want is our > own > types, e.g. > > EXECMEM_ANYWHERE > EXECMEM_NOPLT > EXECMEM_PREL32 > > ... and then we use those in arch code to implement module_alloc() and > friends. I'm looking at execmem_types more as definition of the consumers, maybe I should have named the enum execmem_consumer at the first place. And the arch constrains defined in struct execmem_range describe how memory should be allocated for each consumer. These constraints are defined early at boot and remain static, so initializing them once and letting a common allocator use them makes perfect sense to me. I agree that fallback_{start,end} are not ideal, but we have 3 architectures that have preferred and secondary range for modules. And arm and powerpc use the same logic for kprobes as well, and I don't see why this code should be duplicated. And, for instance, if you decide to place PC-relative instructions if kprobes XOL areas, you'd only need to update execmem_range for kprobes to be more like the range for modules. With central allocator it's easier to deal with the things like VM_FLUSH_RESET_PERMS and caching of ROX memory and I think it will be more maintainable that module_alloc(), alloc_insn_page() and bpf_jit_alloc_exec() spread all over the place. > Mark. -- Sincerely yours, Mike.
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Mon, Apr 15, 2024 at 09:52:41AM +0200, Peter Zijlstra wrote: > On Thu, Apr 11, 2024 at 07:00:41PM +0300, Mike Rapoport wrote: > > +/** > > + * enum execmem_type - types of executable memory ranges > > + * > > + * There are several subsystems that allocate executable memory. > > + * Architectures define different restrictions on placement, > > + * permissions, alignment and other parameters for memory that can be used > > + * by these subsystems. > > + * Types in this enum identify subsystems that allocate executable memory > > + * and let architectures define parameters for ranges suitable for > > + * allocations by each subsystem. > > + * > > + * @EXECMEM_DEFAULT: default parameters that would be used for types that > > + * are not explcitly defined. > > + * @EXECMEM_MODULE_TEXT: parameters for module text sections > > + * @EXECMEM_KPROBES: parameters for kprobes > > + * @EXECMEM_FTRACE: parameters for ftrace > > + * @EXECMEM_BPF: parameters for BPF > > + * @EXECMEM_TYPE_MAX: > > + */ > > +enum execmem_type { > > + EXECMEM_DEFAULT, > > + EXECMEM_MODULE_TEXT = EXECMEM_DEFAULT, > > + EXECMEM_KPROBES, > > + EXECMEM_FTRACE, > > + EXECMEM_BPF, > > + EXECMEM_TYPE_MAX, > > +}; > > Can we please get a break-down of how all these types are actually > different from one another? > > I'm thinking some platforms have a tiny immediate space (arm64 comes to > mind) and has less strict placement constraints for some of them? Yeah, and really I'd *much* rather deal with that in arch code, as I have said several times. For arm64 we have two bsaic restrictions: 1) Direct branches can go +/-128M We can expand this range by having direct branches go to PLTs, at a performance cost. 2) PREL32 relocations can go +/-2G We cannot expand this further. * We don't need to allocate memory for ftrace. We do not use trampolines. * Kprobes XOL areas don't care about either of those; we don't place any PC-relative instructions in those. Maybe we want to in future. * Modules care about both; we'd *prefer* to place them within +/-128M of all other kernel/module code, but if there's no space we can use PLTs and expand that to +/-2G. Since modules can refreence other modules, that ends up actually being halved, and modules have to fit within some 2G window that also covers the kernel. * I'm not sure about BPF's requirements; it seems happy doing the same as modules. So if we *must* use a common execmem allocator, what we'd reall want is our own types, e.g. EXECMEM_ANYWHERE EXECMEM_NOPLT EXECMEM_PREL32 ... and then we use those in arch code to implement module_alloc() and friends. Mark.
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Mon, Apr 15, 2024 at 09:52:41AM +0200, Peter Zijlstra wrote: > On Thu, Apr 11, 2024 at 07:00:41PM +0300, Mike Rapoport wrote: > > +/** > > + * enum execmem_type - types of executable memory ranges > > + * > > + * There are several subsystems that allocate executable memory. > > + * Architectures define different restrictions on placement, > > + * permissions, alignment and other parameters for memory that can be used > > + * by these subsystems. > > + * Types in this enum identify subsystems that allocate executable memory > > + * and let architectures define parameters for ranges suitable for > > + * allocations by each subsystem. > > + * > > + * @EXECMEM_DEFAULT: default parameters that would be used for types that > > + * are not explcitly defined. > > + * @EXECMEM_MODULE_TEXT: parameters for module text sections > > + * @EXECMEM_KPROBES: parameters for kprobes > > + * @EXECMEM_FTRACE: parameters for ftrace > > + * @EXECMEM_BPF: parameters for BPF > > + * @EXECMEM_TYPE_MAX: > > + */ > > +enum execmem_type { > > + EXECMEM_DEFAULT, > > + EXECMEM_MODULE_TEXT = EXECMEM_DEFAULT, > > + EXECMEM_KPROBES, > > + EXECMEM_FTRACE, > > + EXECMEM_BPF, > > + EXECMEM_TYPE_MAX, > > +}; > > Can we please get a break-down of how all these types are actually > different from one another? > > I'm thinking some platforms have a tiny immediate space (arm64 comes to > mind) and has less strict placement constraints for some of them? loongarch, mips, nios2 and sparc define modules address space different from vmalloc and use that for modules, kprobes and bpf (where supported). parisc uses vmalloc range for everything, but it sets permissions to PAGE_KERNEL_RWX because it's PAGE_KERNEL_EXEC is read only and it lacks set_memory_* APIs. arm has an address space for modules, but it fall back to the entire vmalloc with CONFIG_ARM_MODULE_PLTS=y. arm64 uses different ranges for modules and bpf/kprobes. For kprobes it does vmalloc(PAGE_KERNEL_ROX) and for bpf just plain vmalloc(). For modules arm64 first tries to allocated from 128M below kernel_end and if that fails it uses 2G below kernel_end as a fallback. powerpc uses vmalloc space for everything for some configurations. For book3s-32 and 8xx it defines two ranges that are used for module text, kprobes and bpf and the module data can be allocated anywhere in vmalloc. riscv has an address space for modules, a different address space for bpf and uses vmalloc space for kprobes. s390 and x86 have modules address space and use that space for all executable allocations. The EXECMEM_FTRACE type is only used on s390 and x86 and for now it's there more for completeness rather to denote special constraints or properties. -- Sincerely yours, Mike.
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Thu, Apr 11, 2024 at 07:00:41PM +0300, Mike Rapoport wrote: > +/** > + * enum execmem_type - types of executable memory ranges > + * > + * There are several subsystems that allocate executable memory. > + * Architectures define different restrictions on placement, > + * permissions, alignment and other parameters for memory that can be used > + * by these subsystems. > + * Types in this enum identify subsystems that allocate executable memory > + * and let architectures define parameters for ranges suitable for > + * allocations by each subsystem. > + * > + * @EXECMEM_DEFAULT: default parameters that would be used for types that > + * are not explcitly defined. > + * @EXECMEM_MODULE_TEXT: parameters for module text sections > + * @EXECMEM_KPROBES: parameters for kprobes > + * @EXECMEM_FTRACE: parameters for ftrace > + * @EXECMEM_BPF: parameters for BPF > + * @EXECMEM_TYPE_MAX: > + */ > +enum execmem_type { > + EXECMEM_DEFAULT, > + EXECMEM_MODULE_TEXT = EXECMEM_DEFAULT, > + EXECMEM_KPROBES, > + EXECMEM_FTRACE, > + EXECMEM_BPF, > + EXECMEM_TYPE_MAX, > +}; Can we please get a break-down of how all these types are actually different from one another? I'm thinking some platforms have a tiny immediate space (arm64 comes to mind) and has less strict placement constraints for some of them?
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Fri, Apr 12, 2024 at 11:16:10AM +0200, Ingo Molnar wrote: > > * Mike Rapoport wrote: > > > +/** > > + * enum execmem_type - types of executable memory ranges > > + * > > + * There are several subsystems that allocate executable memory. > > + * Architectures define different restrictions on placement, > > + * permissions, alignment and other parameters for memory that can be used > > + * by these subsystems. > > + * Types in this enum identify subsystems that allocate executable memory > > + * and let architectures define parameters for ranges suitable for > > + * allocations by each subsystem. > > + * > > + * @EXECMEM_DEFAULT: default parameters that would be used for types that > > + * are not explcitly defined. > > + * @EXECMEM_MODULE_TEXT: parameters for module text sections > > + * @EXECMEM_KPROBES: parameters for kprobes > > + * @EXECMEM_FTRACE: parameters for ftrace > > + * @EXECMEM_BPF: parameters for BPF > > + * @EXECMEM_TYPE_MAX: > > + */ > > +enum execmem_type { > > + EXECMEM_DEFAULT, > > + EXECMEM_MODULE_TEXT = EXECMEM_DEFAULT, > > + EXECMEM_KPROBES, > > + EXECMEM_FTRACE, > > + EXECMEM_BPF, > > + EXECMEM_TYPE_MAX, > > +}; > > s/explcitly > /explicitly Sure, thanks > Thanks, > > Ingo -- Sincerely yours, Mike.
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Thu, Apr 11, 2024 at 12:42:05PM -0700, Luis Chamberlain wrote: > On Thu, Apr 11, 2024 at 07:00:41PM +0300, Mike Rapoport wrote: > > From: "Mike Rapoport (IBM)" > > > > module_alloc() is used everywhere as a mean to allocate memory for code. > > > > Beside being semantically wrong, this unnecessarily ties all subsystems > > that need to allocate code, such as ftrace, kprobes and BPF to modules and > > puts the burden of code allocation to the modules code. > > > > Several architectures override module_alloc() because of various > > constraints where the executable memory can be located and this causes > > additional obstacles for improvements of code allocation. > > > > Start splitting code allocation from modules by introducing execmem_alloc() > > and execmem_free() APIs. > > > > Initially, execmem_alloc() is a wrapper for module_alloc() and > > execmem_free() is a replacement of module_memfree() to allow updating all > > call sites to use the new APIs. > > > > Since architectures define different restrictions on placement, > > permissions, alignment and other parameters for memory that can be used by > > different subsystems that allocate executable memory, execmem_alloc() takes > > a type argument, that will be used to identify the calling subsystem and to > > allow architectures define parameters for ranges suitable for that > > subsystem. > > It would be good to describe this is a non-fuctional change. Ok. > > Signed-off-by: Mike Rapoport (IBM) > > --- > > > diff --git a/mm/execmem.c b/mm/execmem.c > > new file mode 100644 > > index ..ed2ea41a2543 > > --- /dev/null > > +++ b/mm/execmem.c > > @@ -0,0 +1,26 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > And this just needs to copy over the copyright notices from the main.c file. Will do. > Luis -- Sincerely yours, Mike.
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
* Mike Rapoport wrote: > +/** > + * enum execmem_type - types of executable memory ranges > + * > + * There are several subsystems that allocate executable memory. > + * Architectures define different restrictions on placement, > + * permissions, alignment and other parameters for memory that can be used > + * by these subsystems. > + * Types in this enum identify subsystems that allocate executable memory > + * and let architectures define parameters for ranges suitable for > + * allocations by each subsystem. > + * > + * @EXECMEM_DEFAULT: default parameters that would be used for types that > + * are not explcitly defined. > + * @EXECMEM_MODULE_TEXT: parameters for module text sections > + * @EXECMEM_KPROBES: parameters for kprobes > + * @EXECMEM_FTRACE: parameters for ftrace > + * @EXECMEM_BPF: parameters for BPF > + * @EXECMEM_TYPE_MAX: > + */ > +enum execmem_type { > + EXECMEM_DEFAULT, > + EXECMEM_MODULE_TEXT = EXECMEM_DEFAULT, > + EXECMEM_KPROBES, > + EXECMEM_FTRACE, > + EXECMEM_BPF, > + EXECMEM_TYPE_MAX, > +}; s/explcitly /explicitly Thanks, Ingo
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Thu, Apr 11, 2024 at 07:00:41PM +0300, Mike Rapoport wrote: > From: "Mike Rapoport (IBM)" > > module_alloc() is used everywhere as a mean to allocate memory for code. > > Beside being semantically wrong, this unnecessarily ties all subsystems > that need to allocate code, such as ftrace, kprobes and BPF to modules and > puts the burden of code allocation to the modules code. > > Several architectures override module_alloc() because of various > constraints where the executable memory can be located and this causes > additional obstacles for improvements of code allocation. > > Start splitting code allocation from modules by introducing execmem_alloc() > and execmem_free() APIs. > > Initially, execmem_alloc() is a wrapper for module_alloc() and > execmem_free() is a replacement of module_memfree() to allow updating all > call sites to use the new APIs. > > Since architectures define different restrictions on placement, > permissions, alignment and other parameters for memory that can be used by > different subsystems that allocate executable memory, execmem_alloc() takes > a type argument, that will be used to identify the calling subsystem and to > allow architectures define parameters for ranges suitable for that > subsystem. It would be good to describe this is a non-fuctional change. > Signed-off-by: Mike Rapoport (IBM) > --- > diff --git a/mm/execmem.c b/mm/execmem.c > new file mode 100644 > index ..ed2ea41a2543 > --- /dev/null > +++ b/mm/execmem.c > @@ -0,0 +1,26 @@ > +// SPDX-License-Identifier: GPL-2.0 And this just needs to copy over the copyright notices from the main.c file. Luis
[PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
From: "Mike Rapoport (IBM)" module_alloc() is used everywhere as a mean to allocate memory for code. Beside being semantically wrong, this unnecessarily ties all subsystems that need to allocate code, such as ftrace, kprobes and BPF to modules and puts the burden of code allocation to the modules code. Several architectures override module_alloc() because of various constraints where the executable memory can be located and this causes additional obstacles for improvements of code allocation. Start splitting code allocation from modules by introducing execmem_alloc() and execmem_free() APIs. Initially, execmem_alloc() is a wrapper for module_alloc() and execmem_free() is a replacement of module_memfree() to allow updating all call sites to use the new APIs. Since architectures define different restrictions on placement, permissions, alignment and other parameters for memory that can be used by different subsystems that allocate executable memory, execmem_alloc() takes a type argument, that will be used to identify the calling subsystem and to allow architectures define parameters for ranges suitable for that subsystem. Signed-off-by: Mike Rapoport (IBM) --- arch/powerpc/kernel/kprobes.c| 6 ++-- arch/s390/kernel/ftrace.c| 4 +-- arch/s390/kernel/kprobes.c | 4 +-- arch/s390/kernel/module.c| 5 +-- arch/sparc/net/bpf_jit_comp_32.c | 8 ++--- arch/x86/kernel/ftrace.c | 6 ++-- arch/x86/kernel/kprobes/core.c | 4 +-- include/linux/execmem.h | 57 include/linux/moduleloader.h | 3 -- kernel/bpf/core.c| 6 ++-- kernel/kprobes.c | 8 ++--- kernel/module/Kconfig| 1 + kernel/module/main.c | 25 +- mm/Kconfig | 3 ++ mm/Makefile | 1 + mm/execmem.c | 26 +++ 16 files changed, 122 insertions(+), 45 deletions(-) create mode 100644 include/linux/execmem.h create mode 100644 mm/execmem.c diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c index bbca90a5e2ec..9fcd01bb2ce6 100644 --- a/arch/powerpc/kernel/kprobes.c +++ b/arch/powerpc/kernel/kprobes.c @@ -19,8 +19,8 @@ #include #include #include -#include #include +#include #include #include #include @@ -130,7 +130,7 @@ void *alloc_insn_page(void) { void *page; - page = module_alloc(PAGE_SIZE); + page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE); if (!page) return NULL; @@ -142,7 +142,7 @@ void *alloc_insn_page(void) } return page; error: - module_memfree(page); + execmem_free(page); return NULL; } diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c index c46381ea04ec..798249ef5646 100644 --- a/arch/s390/kernel/ftrace.c +++ b/arch/s390/kernel/ftrace.c @@ -7,13 +7,13 @@ * Author(s): Martin Schwidefsky */ -#include #include #include #include #include #include #include +#include #include #include #include @@ -220,7 +220,7 @@ static int __init ftrace_plt_init(void) { const char *start, *end; - ftrace_plt = module_alloc(PAGE_SIZE); + ftrace_plt = execmem_alloc(EXECMEM_FTRACE, PAGE_SIZE); if (!ftrace_plt) panic("cannot allocate ftrace plt\n"); diff --git a/arch/s390/kernel/kprobes.c b/arch/s390/kernel/kprobes.c index f0cf20d4b3c5..3c1b1be744de 100644 --- a/arch/s390/kernel/kprobes.c +++ b/arch/s390/kernel/kprobes.c @@ -9,7 +9,6 @@ #define pr_fmt(fmt) "kprobes: " fmt -#include #include #include #include @@ -21,6 +20,7 @@ #include #include #include +#include #include #include #include @@ -38,7 +38,7 @@ void *alloc_insn_page(void) { void *page; - page = module_alloc(PAGE_SIZE); + page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE); if (!page) return NULL; set_memory_rox((unsigned long)page, 1); diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c index 42215f9404af..ac97a905e8cd 100644 --- a/arch/s390/kernel/module.c +++ b/arch/s390/kernel/module.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -76,7 +77,7 @@ void *module_alloc(unsigned long size) #ifdef CONFIG_FUNCTION_TRACER void module_arch_cleanup(struct module *mod) { - module_memfree(mod->arch.trampolines_start); + execmem_free(mod->arch.trampolines_start); } #endif @@ -510,7 +511,7 @@ static int module_alloc_ftrace_hotpatch_trampolines(struct module *me, size = FTRACE_HOTPATCH_TRAMPOLINES_SIZE(s->sh_size); numpages = DIV_ROUND_UP(size, PAGE_SIZE); - start = module_alloc(numpages * PAGE_SIZE); + start = execmem_alloc(EXECMEM_FTRACE, numpages * PAGE_SIZE); if (!start) return -ENOMEM; set_memory_rox((unsigned long)start, numpages); diff --git