Re: [PATCH RESEND v8 00/16] mm: jit/text allocator

2024-05-05 Thread Luis Chamberlain
On Sun, May 05, 2024 at 07:06:12PM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" 
> 
> Hi,
> 
> The patches are also available in git:
> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v8
> 
> v8:
> * fix intialization of default_execmem_info

Thanks, applied and pushed to modules-next. If we find fixes, let's
please just now have separate patches on top of this series.

  Luis


Re: [PATCH v7 00/16] mm: jit/text allocator

2024-05-02 Thread Luis Chamberlain
On Thu, May 02, 2024 at 11:50:36PM +0100, Liviu Dudau wrote:
> On Mon, Apr 29, 2024 at 09:29:20AM -0700, Luis Chamberlain wrote:
> > On Mon, Apr 29, 2024 at 03:16:04PM +0300, Mike Rapoport wrote:
> > > From: "Mike Rapoport (IBM)" 
> > > 
> > > Hi,
> > > 
> > > The patches are also available in git:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v7
> > > 
> > > v7 changes:
> > > * define MODULE_{VADDR,END} for riscv32 to fix the build and avoid
> > >   #ifdefs in a function body
> > > * add Acks, thanks everybody
> > 
> > Thanks, I've pushed this to modules-next for further exposure / testing.
> > Given the status of testing so far with prior revisions, in that only a
> > few issues were found and that those were fixed, and the status of
> > reviews, this just might be ripe for v6.10.
> 
> Looks like there is still some work needed. I've picked up next-20240501
> and on arch/mips with CONFIG_MODULE_COMPRESS_XZ=y and 
> CONFIG_MODULE_DECOMPRESS=y
> I fail to load any module:
> 
> # modprobe rfkill
> [11746.539090] Invalid ELF header magic: != ELF
> [11746.587149] execmem: unable to allocate memory
> modprobe: can't load module rfkill (kernel/net/rfkill/rfkill.ko.xz): Out of 
> memory
> 
> The (hopefully) relevant parts of my .config:

Thanks for the report! Any chance we can get you to try a bisection? I
think it should take 2-3 test boots. To help reduce scope you try modules-next:

https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=modules-next

Then can you check by resetting your tree to commmit 3fbe6c2f820a76 (mm:
introduce execmem_alloc() and execmem_free()"). I suspect that should
boot, so your bad commit would be the tip 3c2c250cb3a5fbb ("bpf: remove
CONFIG_BPF_JIT dependency on CONFIG_MODULES of").

That gives us only a few commits to bisect:

git log --oneline 3fbe6c2f820a76bc36d5546bda85832f57c8fce2..
3c2c250cb3a5 (HEAD -> modules-next, korg/modules-next) bpf: remove 
CONFIG_BPF_JIT dependency on CONFIG_MODULES of
11e8e65cce5c kprobes: remove dependency on CONFIG_MODULES
e10cbc38697b powerpc: use CONFIG_EXECMEM instead of CONFIG_MODULES where 
appropriate
4da3d38f24c5 x86/ftrace: enable dynamic ftrace without CONFIG_MODULES
13ae3d74ee70 arch: make execmem setup available regardless of CONFIG_MODULES
460bbbc70a47 powerpc: extend execmem_params for kprobes allocations
e1a14069b5b4 arm64: extend execmem_info for generated code allocations
971e181c6585 riscv: extend execmem_params for generated code allocations
0fa276f26721 mm/execmem, arch: convert remaining overrides of module_alloc to 
execmem
022cef244287 mm/execmem, arch: convert simple overrides of module_alloc to 
execmem

With 2-3 boots we should be to tell which is the bad commit.

  Luis


Re: [PATCH v7 00/16] mm: jit/text allocator

2024-04-29 Thread Luis Chamberlain
On Mon, Apr 29, 2024 at 03:16:04PM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" 
> 
> Hi,
> 
> The patches are also available in git:
> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v7
> 
> v7 changes:
> * define MODULE_{VADDR,END} for riscv32 to fix the build and avoid
>   #ifdefs in a function body
> * add Acks, thanks everybody

Thanks, I've pushed this to modules-next for further exposure / testing.
Given the status of testing so far with prior revisions, in that only a
few issues were found and that those were fixed, and the status of
reviews, this just might be ripe for v6.10.

  Luis


Re: [PATCH v6 00/16] mm: jit/text allocator

2024-04-26 Thread Luis Chamberlain
On Fri, Apr 26, 2024 at 11:28:38AM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" 
> 
> Hi,
> 
> The patches are also available in git:
> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v6
> 
> v6 changes:
> * restore patch "arm64: extend execmem_info for generated code
>   allocations" that disappeared in v5 rebase
> * update execmem initialization so that by default it will be
>   initialized early while late initialization will be an opt-in

I've taken this new iteration again through modules-next so to help get
more testing exposure to this. If we run into conflicts again with mm
we can see if Andrew is willing to take this in through there. However,
it may make sense to only consider up to "mm: introduce execmem_alloc() and
execmem_free()" for v6.10 given we're bound to likely find more issues
and we are already at rc5.

  Luis


Re: [PATCH v5 00/15] mm: jit/text allocator

2024-04-23 Thread Luis Chamberlain
On Mon, Apr 22, 2024 at 12:44:21PM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" 
> 
> (something went wrong with the prevois posting, sorry for the noise)
> 
> Hi,
> 
> Since v3 I looked into making execmem more of an utility toolbox, as we
> discussed at LPC with Mark Rutland, but it was getting more hairier than
> having a struct describing architecture constraints and a type identifying
> the consumer of execmem.
> 
> And I do think that having the description of architecture constraints for
> allocations of executable memory in a single place is better than having it
> spread all over the place.
> 
> The patches available via git:
> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v5

Thanks! I've merged and pushed this onto modules-next in its entirety now for
wider testing.

  Luis


Re: [PATCH v3 00/11] sysctl: treewide: constify ctl_table argument of sysctl handlers

2024-04-23 Thread Luis Chamberlain
On Tue, Apr 23, 2024 at 09:54:35AM +0200, Thomas Weißschuh wrote:
> * Patch 1 is a bugfix for the stack_erasing sysctl handler
> * Patches 2-10 change various helper functions throughout the kernel to
>   be able to handle 'const ctl_table'.
> * Patch 11 changes the signatures of all proc handlers through the tree.
>   Some other signatures are also adapted, for details see the commit
>   message.
> 
> Only patch 1 changes any code at all.
> 
> The series was compile-tested on top of next-20230423 for
> i386, x86_64, arm, arm64, riscv, loongarch, s390 and m68k.
> 
> The series was split from my larger series sysctl-const series [0].
> It only focusses on the proc_handlers but is an important step to be
> able to move all static definitions of ctl_table into .rodata.
> 
> [0] 
> https://lore.kernel.org/lkml/20231204-const-sysctl-v2-0-7a5060b11...@weissschuh.net/
> 
> Signed-off-by: Thomas Weißschuh 

Cover letters don't need SOBS we only use them for patches.

But anyway:

Reviewed-by: Luis Chamberlain 

  Luis


Re: [PATCH v4 00/15] mm: jit/text allocator

2024-04-11 Thread Luis Chamberlain
On Thu, Apr 11, 2024 at 07:00:36PM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" 
> 
> Hi,
> 
> Since v3 I looked into making execmem more of an utility toolbox, as we
> discussed at LPC with Mark Rutland, but it was getting more hairier than
> having a struct describing architecture constraints and a type identifying
> the consumer of execmem.
> 
> And I do think that having the description of architecture constraints for
> allocations of executable memory in a single place is better that having it
> spread all over the place.
> 
> The patches available via git:
> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v4

I've taken the first 5 patches through modules-next for now to get early
exposure to testing. Of those I just had minor nit feedback on the 5th,
but the rest look good.

Let's wait for review for the rest of the patches 6-15.

  Luis


Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()

2024-04-11 Thread Luis Chamberlain
On Thu, Apr 11, 2024 at 07:00:41PM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" 
> 
> module_alloc() is used everywhere as a mean to allocate memory for code.
> 
> Beside being semantically wrong, this unnecessarily ties all subsystems
> that need to allocate code, such as ftrace, kprobes and BPF to modules and
> puts the burden of code allocation to the modules code.
> 
> Several architectures override module_alloc() because of various
> constraints where the executable memory can be located and this causes
> additional obstacles for improvements of code allocation.
> 
> Start splitting code allocation from modules by introducing execmem_alloc()
> and execmem_free() APIs.
> 
> Initially, execmem_alloc() is a wrapper for module_alloc() and
> execmem_free() is a replacement of module_memfree() to allow updating all
> call sites to use the new APIs.
> 
> Since architectures define different restrictions on placement,
> permissions, alignment and other parameters for memory that can be used by
> different subsystems that allocate executable memory, execmem_alloc() takes
> a type argument, that will be used to identify the calling subsystem and to
> allow architectures define parameters for ranges suitable for that
> subsystem.

It would be good to describe this is a non-fuctional change.

> Signed-off-by: Mike Rapoport (IBM) 
> ---

> diff --git a/mm/execmem.c b/mm/execmem.c
> new file mode 100644
> index ..ed2ea41a2543
> --- /dev/null
> +++ b/mm/execmem.c
> @@ -0,0 +1,26 @@
> +// SPDX-License-Identifier: GPL-2.0

And this just needs to copy over the copyright notices from the main.c file.

  Luis


Re: [PATCH 1/3] init: Declare rodata_enabled and mark_rodata_ro() at all time

2024-01-31 Thread Luis Chamberlain
On Wed, Jan 31, 2024 at 06:53:13AM +, Christophe Leroy wrote:
> The problem being identified in commit 677bfb9db8a3 ("module: Don't 
> ignore errors from set_memory_XX()"), you can keep/re-apply the series 
> [PATCH 1/3] init: Declare rodata_enabled and mark_rodata_ro() at all time.

Sure, queued that up into modules-testing before I push to modules-next.

  Luis


Re: [PATCH 1/3] init: Declare rodata_enabled and mark_rodata_ro() at all time

2024-01-30 Thread Luis Chamberlain
On Tue, Jan 30, 2024 at 06:48:11PM +0100, Marek Szyprowski wrote:
> Dear All,
> 
> On 30.01.2024 12:03, Christophe Leroy wrote:
> > Le 30/01/2024 à 10:16, Chen-Yu Tsai a écrit :
> >> [Vous ne recevez pas souvent de courriers de we...@chromium.org. D?couvrez 
> >> pourquoi ceci est important ? 
> >> https://aka.ms/LearnAboutSenderIdentification ]
> >>
> >> On Mon, Jan 29, 2024 at 12:09:50PM -0800, Luis Chamberlain wrote:
> >>> On Thu, Dec 21, 2023 at 10:02:46AM +0100, Christophe Leroy wrote:
> >>>> Declaring rodata_enabled and mark_rodata_ro() at all time
> >>>> helps removing related #ifdefery in C files.
> >>>>
> >>>> Signed-off-by: Christophe Leroy 
> >>> Very nice cleanup, thanks!, applied and pushed
> >>>
> >>> Luis
> >> On next-20240130, which has your modules-next branch, and thus this
> >> series and the other "module: Use set_memory_rox()" series applied,
> >> my kernel crashes in some very weird way. Reverting your branch
> >> makes the crash go away.
> >>
> >> I thought I'd report it right away. Maybe you folks would know what's
> >> happening here? This is on arm64.
> > That's strange, it seems to bug in module_bug_finalize() which is
> > _before_ calls to module_enable_ro() and such.
> >
> > Can you try to revert the 6 patches one by one to see which one
> > introduces the problem ?
> >
> > In reality, only patch 677bfb9db8a3 really change things. Other ones are
> > more on less only cleanup.
> 
> I've also run into this issue with today's (20240130) linux-next on my 
> test farm. The issue is not fully reproducible, so it was a bit hard to 
> bisect it automatically. I've spent some time on manual testing and it 
> looks that reverting the following 2 commits on top of linux-next fixes 
> the problem:
> 
> 65929884f868 ("modules: Remove #ifdef CONFIG_STRICT_MODULE_RWX around 
> rodata_enabled")
> 677bfb9db8a3 ("module: Don't ignore errors from set_memory_XX()")
> 
> This in fact means that commit 677bfb9db8a3 is responsible for this 
> regression, as 65929884f868 has to be reverted only because the latter 
> depends on it. Let me know what I can do to help debugging this issue.

Thanks for the bisect, I've reset my tree to commit
3559ad395bf02 ("module: Change module_enable_{nx/x/ro}() to more
explicit names") for now then, so to remove those commits.

  Luis


Re: [PATCH 1/3] init: Declare rodata_enabled and mark_rodata_ro() at all time

2024-01-29 Thread Luis Chamberlain
On Thu, Dec 21, 2023 at 10:02:46AM +0100, Christophe Leroy wrote:
> Declaring rodata_enabled and mark_rodata_ro() at all time
> helps removing related #ifdefery in C files.
> 
> Signed-off-by: Christophe Leroy 

Very nice cleanup, thanks!, applied and pushed

  Luis


Re: [PATCH 2/3] modpost: Extended modversion support

2023-11-16 Thread Luis Chamberlain
On Wed, Nov 15, 2023 at 06:50:10PM +, Matthew Maurer wrote:
> Adds a new format for modversions which stores each field in a separate
> elf section.

The "why" is critical and not mentioned. And I'd like to also see
documented this with foresight, if Rust needed could this be used
in the future for other things?

Also please include folks CC'd in *one* patch to *all* patches as
otherwise we have no context.

> This initially adds support for variable length names, but
> could later be used to add additional fields to modversions in a
> backwards compatible way if needed.
> 
> Adding support for variable length names makes it possible to enable
> MODVERSIONS and RUST at the same time.
> 
> Signed-off-by: Matthew Maurer 
> ---
>  arch/powerpc/kernel/module_64.c | 24 +-

Why was only powerpc modified? If the commit log explained this it would
make it easier for review.

> diff --git a/kernel/module/internal.h b/kernel/module/internal.h
> index c8b7b4dcf782..0c188c96a045 100644
> --- a/kernel/module/internal.h
> +++ b/kernel/module/internal.h
> @@ -80,7 +80,7 @@ struct load_info {
>   unsigned int used_pages;
>  #endif
>   struct {
> - unsigned int sym, str, mod, vers, info, pcpu;
> + unsigned int sym, str, mod, vers, info, pcpu, vers_ext_crc, 
> vers_ext_name;

We might as well modify this in a preliminary patch to add each new
unsinged int in a new line, so that it is easier to blame when each new
entry gets added. It should not grow the size of the struct at all but
it would make futur extensions easier to review what is new and git
blame easier to spot when something was added.

Although we don't use this extensively today this can easily grow for
convenience and making code easier to read.

> diff --git a/kernel/module/version.c b/kernel/module/version.c
> index 53f43ac5a73e..93d97dad8c77 100644
> --- a/kernel/module/version.c
> +++ b/kernel/module/version.c
> @@ -19,11 +19,28 @@ int check_version(const struct load_info *info,
>   unsigned int versindex = info->index.vers;
>   unsigned int i, num_versions;
>   struct modversion_info *versions;
> + struct modversion_info_ext version_ext;
>  
>   /* Exporting module didn't supply crcs?  OK, we're already tainted. */
>   if (!crc)
>   return 1;
>  
> + /* If we have extended version info, rely on it */
> + if (modversion_ext_start(info, _ext) >= 0) {

There are two things we need to do to make processing modules easier:

  1) ELF validation
  2) Once checked then process the information

We used to have this split up but also had a few places which did both
1) and 2) together. This was wrong and so I want to keep things tidy
and ensure we do things which validate the ELF separate. To that
end please put the checks to validate the ELF first so that we report
to users with a proper error/debug check in case the ELF is wrong,
this enables futher debug checks for that to be done instead of
confusing users who end up scratching their heads why something
failed.

So please split up the ELF validation check and put that into
elf_validity_cache_copy() which runs *earlier* than this.

Then *if* if has this, you just process it. Please take care to be
very pedantic in the elf_validity_cache_copy() and extend the checks
you have for validation in modversion_ext_start() and bring them to
elf_validity_cache_copy() with perhaps *more* stuff which does any
insane checks to verify it is 100% correct.

> + do {
> + if (strncmp(version_ext.name.value, symname,
> + version_ext.name.end - 
> version_ext.name.value) != 0)
> + continue;
> +
> + if (*version_ext.crc.value == *crc)
> + return 1;
> + pr_debug("Found checksum %X vs module %X\n",
> +  *crc, *version_ext.crc.value);
> + goto bad_version;
> + } while (modversion_ext_advance(_ext) == 0);

Can you do a for_each_foo()) type loop here instead after validation?
Because the validation would ensure your loop is bounded then. Look at
for_each_mod_mem_type() for inspiration.

> + goto broken_toolchain;

The broken toolchain thing would then be an issue reported in the
ELF validation.

> @@ -87,6 +105,65 @@ int same_magic(const char *amagic, const char *bmagic,
>   return strcmp(amagic, bmagic) == 0;
>  }
>  
> +#define MODVERSION_FIELD_START(sec, field) \
> + field.value = (typeof(field.value))sec.sh_addr; \
> + field.end = field.value + sec.sh_size
> +
> +ssize_t modversion_ext_start(const struct load_info *info,
> +  struct modversion_info_ext *start)
> +{
> + unsigned int crc_idx = info->index.vers_ext_crc;
> + unsigned int name_idx = info->index.vers_ext_name;
> + Elf_Shdr *sechdrs = info->sechdrs;
> +
> + // Both of these fields are needed for this to be 

Re: [PATCH v2 00/15] sysctl: Remove sentinel elements from drivers

2023-10-10 Thread Luis Chamberlain
On Mon, Oct 02, 2023 at 10:55:17AM +0200, Joel Granados via B4 Relay wrote:
> Changes in v2:
> - Left the dangling comma in the ctl_table arrays.
> - Link to v1: 
> https://lore.kernel.org/r/20230928-jag-sysctl_remove_empty_elem_drivers-v1-0-e59120fca...@samsung.com

Thanks! Pushed onto sysctl-next for wider testing.

  Luis


Re: [PATCH v3 0/7] sysctl: Remove sentinel elements from arch

2023-10-10 Thread Luis Chamberlain
On Mon, Oct 02, 2023 at 01:30:35PM +0200, Joel Granados via B4 Relay wrote:
> V3:
> * Removed the ia64 patch to avoid conflicts with the ia64 removal
> * Rebased onto v6.6-rc4
> * Kept/added the trailing comma for the ctl_table arrays. This was a comment
>   that we received "drivers/*" patch set.
> * Link to v2: 
> https://lore.kernel.org/r/20230913-jag-sysctl_remove_empty_elem_arch-v2-0-d1bd13a29...@samsung.com

Thanks! I replaced the v2 with this v3 on sysctl-next.

  Luis


Re: [PATCH v2 0/8] sysctl: Remove sentinel elements from arch

2023-09-20 Thread Luis Chamberlain
On Wed, Sep 13, 2023 at 11:10:54AM +0200, Joel Granados via B4 Relay wrote:
> V2:
> * Added clarification both in the commit messages and the coverletter as
>   to why this patch is safe to apply.
> * Added {Acked,Reviewed,Tested}-by from list
> * Link to v1: 
> https://lore.kernel.org/r/20230906-jag-sysctl_remove_empty_elem_arch-v1-0-3935d4854...@samsung.com

Thanks! I've merged this onto sysctl-next.

  Luis


Re: [PATCH v2 3/5] mmu_notifiers: Call invalidate_range() when invalidating TLBs

2023-07-24 Thread Luis Chamberlain
Cc'ing fsdevel + xfs folks as this fixes a regression tests with
XFS with generic/176.

On Thu, Jul 20, 2023 at 10:52:59AM +1000, Alistair Popple wrote:
> 
> SeongJae Park  writes:
> 
> > Hi Alistair,
> >
> > On Wed, 19 Jul 2023 22:18:44 +1000 Alistair Popple  
> > wrote:
> >
> >> The invalidate_range() is going to become an architecture specific mmu
> >> notifier used to keep the TLB of secondary MMUs such as an IOMMU in
> >> sync with the CPU page tables. Currently it is called from separate
> >> code paths to the main CPU TLB invalidations. This can lead to a
> >> secondary TLB not getting invalidated when required and makes it hard
> >> to reason about when exactly the secondary TLB is invalidated.
> >> 
> >> To fix this move the notifier call to the architecture specific TLB
> >> maintenance functions for architectures that have secondary MMUs
> >> requiring explicit software invalidations.
> >> 
> >> This fixes a SMMU bug on ARM64. On ARM64 PTE permission upgrades
> >> require a TLB invalidation. This invalidation is done by the
> >> architecutre specific ptep_set_access_flags() which calls
> >> flush_tlb_page() if required. However this doesn't call the notifier
> >> resulting in infinite faults being generated by devices using the SMMU
> >> if it has previously cached a read-only PTE in it's TLB.
> >> 
> >> Moving the invalidations into the TLB invalidation functions ensures
> >> all invalidations happen at the same time as the CPU invalidation. The
> >> architecture specific flush_tlb_all() routines do not call the
> >> notifier as none of the IOMMUs require this.
> >> 
> >> Signed-off-by: Alistair Popple 
> >> Suggested-by: Jason Gunthorpe 
> >
> > I found below kernel NULL-dereference issue on latest mm-unstable tree, and
> > bisect points me to the commit of this patch, namely
> > 75c400f82d347af1307010a3e06f3aa5d831d995.
> >
> > To reproduce, I use 'stress-ng --bigheap $(nproc)'.  The issue happens as 
> > soon
> > as it starts reclaiming memory.  I didn't dive deep into this yet, but
> > reporting this issue first, since you might have an idea already.
> 
> Thanks for the report SJ!
> 
> I see the problem - current->mm can (obviously!) be NULL which is what's
> leading to the NULL dereference. Instead I think on x86 I need to call
> the notifier when adding the invalidate to the tlbbatch in
> arch_tlbbatch_add_pending() which is equivalent to what ARM64 does.
> 
> The below should fix it. Will do a respin with this.
> 
> ---
> 
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index 837e4a50281a..79c46da919b9 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -4,6 +4,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -282,6 +283,7 @@ static inline void arch_tlbbatch_add_pending(struct 
> arch_tlbflush_unmap_batch *b
>  {
>   inc_mm_tlb_gen(mm);
>   cpumask_or(>cpumask, >cpumask, mm_cpumask(mm));
> + mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL);
>  }
>  
>  static inline void arch_flush_tlb_batched_pending(struct mm_struct *mm)
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 0b990fb56b66..2d253919b3e8 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -1265,7 +1265,6 @@ void arch_tlbbatch_flush(struct 
> arch_tlbflush_unmap_batch *batch)
>  
>   put_flush_tlb_info();
>   put_cpu();
> - mmu_notifier_arch_invalidate_secondary_tlbs(current->mm, 0, -1UL);
>  }
>  
>  /*

This patch also fixes a regression introduced on linux-next, the same
crash on arch_tlbbatch_flush() is reproducible with fstests generic/176
on XFS. This patch fixes that regression [0]. This should also close out
the syzbot crash too [1]

[0] https://gist.github.com/mcgrof/b37fc8cf7e6e1b3935242681de1a83e2
[1] https://lore.kernel.org/all/3afcb4060135a...@google.com/

Tested-by: Luis Chamberlain 

  Luis


Re: [PATCH 00/79] fs: new accessors for inode->i_ctime

2023-06-30 Thread Luis Chamberlain
On Wed, Jun 21, 2023 at 03:21:41PM -0400, Steven Rostedt wrote:
> On Wed, 21 Jun 2023 10:45:05 -0400
> Jeff Layton  wrote:
> 
> > Most of this conversion was done via coccinelle, with a few of the more
> > non-standard accesses done by hand. There should be no behavioral
> > changes with this set. That will come later, as we convert individual
> > filesystems to use multigrain timestamps.
> 
> BTW, Linus has suggested to me that whenever a conccinelle script is used,
> it should be included in the change log.

Sometimes people like the coccinelle included in the commit, sometimes
people don't [0], it really ends up being up to a subjective maintainer
preference. A compromise could be to use git notes as these are
optional, however if we want to go down that path we should try to make
a general consensus on it so we can send a consistent message.

[0] https://lore.kernel.org/all/20230512073100.gc32...@twin.jikos.cz/

  Luis


Re: [PATCH 01/79] fs: add ctime accessors infrastructure

2023-06-30 Thread Luis Chamberlain
On Wed, Jun 21, 2023 at 10:45:06AM -0400, Jeff Layton wrote:
> struct timespec64 has unused bits in the tv_nsec field that can be used
> for other purposes. In future patches, we're going to change how the
> inode->i_ctime is accessed in certain inodes in order to make use of
> them. In order to do that safely though, we'll need to eradicate raw
> accesses of the inode->i_ctime field from the kernel.
> 
> Add new accessor functions for the ctime that we can use to replace them.
> 
> Signed-off-by: Jeff Layton 

Reviewed-by: Luis Chamberlain 

  Luis


Re: [PATCH] driver core: class: mark the struct class for sysfs callbacks as constant

2023-03-27 Thread Luis Chamberlain
On Sat, Mar 25, 2023 at 09:45:37AM +0100, Greg Kroah-Hartman wrote:
> struct class should never be modified in a sysfs callback as there is
> nothing in the structure to modify, and frankly, the structure is almost
> never used in a sysfs callback, so mark it as constant to allow struct
> class to be moved to read-only memory.
> 
> While we are touching all class sysfs callbacks also mark the attribute
> as constant as it can not be modified.  The bonding code still uses this
> structure so it can not be removed from the function callbacks.
> 
> Signed-off-by: Greg Kroah-Hartman 

Reviewed-by: Luis Chamberlain 

  Luis


[PATCH 0/2] ppc: simplify sysctl registration

2023-03-10 Thread Luis Chamberlain
We can simplify the way we do sysctl registration both by
reducing the number of lines and also avoiding calllers which
could do recursion. The docs are being updated to help reflect
this better [0].

[0] https://lore.kernel.org/all/20230310223947.3917711-1-mcg...@kernel.org/T/#u 


Luis Chamberlain (2):
  ppc: simplify one-level sysctl registration for
powersave_nap_ctl_table
  ppc: simplify one-level sysctl registration for
nmi_wd_lpm_factor_ctl_table

 arch/powerpc/kernel/idle.c| 10 +-
 arch/powerpc/platforms/pseries/mobility.c | 10 +-
 2 files changed, 2 insertions(+), 18 deletions(-)

-- 
2.39.1



[PATCH 2/2] ppc: simplify one-level sysctl registration for nmi_wd_lpm_factor_ctl_table

2023-03-10 Thread Luis Chamberlain
There is no need to declare an extra tables to just create directory,
this can be easily be done with a prefix path with register_sysctl().

Simplify this registration.

Signed-off-by: Luis Chamberlain 
---
 arch/powerpc/platforms/pseries/mobility.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c 
b/arch/powerpc/platforms/pseries/mobility.c
index 4cea71aa0f41..2b58e76abef8 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -62,18 +62,10 @@ static struct ctl_table nmi_wd_lpm_factor_ctl_table[] = {
},
{}
 };
-static struct ctl_table nmi_wd_lpm_factor_sysctl_root[] = {
-   {
-   .procname   = "kernel",
-   .mode   = 0555,
-   .child  = nmi_wd_lpm_factor_ctl_table,
-   },
-   {}
-};
 
 static int __init register_nmi_wd_lpm_factor_sysctl(void)
 {
-   register_sysctl_table(nmi_wd_lpm_factor_sysctl_root);
+   register_sysctl("kernel", nmi_wd_lpm_factor_ctl_table);
 
return 0;
 }
-- 
2.39.1



[PATCH 1/2] ppc: simplify one-level sysctl registration for powersave_nap_ctl_table

2023-03-10 Thread Luis Chamberlain
There is no need to declare an extra tables to just create directory,
this can be easily be done with a prefix path with register_sysctl().

Simplify this registration.

Signed-off-by: Luis Chamberlain 
---
 arch/powerpc/kernel/idle.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index b9a725abc596..b1c0418b25c8 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -107,19 +107,11 @@ static struct ctl_table powersave_nap_ctl_table[] = {
},
{}
 };
-static struct ctl_table powersave_nap_sysctl_root[] = {
-   {
-   .procname   = "kernel",
-   .mode   = 0555,
-   .child  = powersave_nap_ctl_table,
-   },
-   {}
-};
 
 static int __init
 register_powersave_nap_sysctl(void)
 {
-   register_sysctl_table(powersave_nap_sysctl_root);
+   register_sysctl("kernel", powersave_nap_ctl_table);
 
return 0;
 }
-- 
2.39.1



Re: [PATCH] kallsyms: Fix scheduling with interrupts disabled in self-test

2023-01-13 Thread Luis Chamberlain
On Fri, Jan 13, 2023 at 09:44:35AM +0100, Petr Mladek wrote:
> On Thu 2023-01-12 10:24:43, Luis Chamberlain wrote:
> > On Thu, Jan 12, 2023 at 08:54:26PM +1000, Nicholas Piggin wrote:
> > > kallsyms_on_each* may schedule so must not be called with interrupts
> > > disabled. The iteration function could disable interrupts, but this
> > > also changes lookup_symbol() to match the change to the other timing
> > > code.
> > > 
> > > Reported-by: Erhard F. 
> > > Link: 
> > > https://lore.kernel.org/all/bug-216902-206...@https.bugzilla.kernel.org%2F/
> > > Reported-by: kernel test robot 
> > > Link: 
> > > https://lore.kernel.org/oe-lkp/202212251728.8d0872ff-oliver.s...@intel.com
> > > Fixes: 30f3bb09778d ("kallsyms: Add self-test facility")
> > > Signed-off-by: Nicholas Piggin 
> > > ---
> > 
> > Thanks Nicholas!
> > 
> > Petr had just suggested removing this aspect of the selftests, the 
> > performance
> > test as its specific to the config, it doesn't run many times to get an
> > average and odd things on a system can create different metrics. Zhen Lei
> > had given up on fixing it and has a patch to instead remove this part of
> > the selftest.
> > 
> > I still find value in keeping it, but Petr, would like your opinion on
> > this fix, if we were to keep it.
> 
> I am fine with this fix.

Merged the fix. I'll push to Linus for 6.2-rc4

  Luis


Re: [PATCH] kallsyms: Fix scheduling with interrupts disabled in self-test

2023-01-12 Thread Luis Chamberlain
On Thu, Jan 12, 2023 at 08:54:26PM +1000, Nicholas Piggin wrote:
> kallsyms_on_each* may schedule so must not be called with interrupts
> disabled. The iteration function could disable interrupts, but this
> also changes lookup_symbol() to match the change to the other timing
> code.
> 
> Reported-by: Erhard F. 
> Link: 
> https://lore.kernel.org/all/bug-216902-206...@https.bugzilla.kernel.org%2F/
> Reported-by: kernel test robot 
> Link: 
> https://lore.kernel.org/oe-lkp/202212251728.8d0872ff-oliver.s...@intel.com
> Fixes: 30f3bb09778d ("kallsyms: Add self-test facility")
> Signed-off-by: Nicholas Piggin 
> ---

Thanks Nicholas!

Petr had just suggested removing this aspect of the selftests, the performance
test as its specific to the config, it doesn't run many times to get an
average and odd things on a system can create different metrics. Zhen Lei
had given up on fixing it and has a patch to instead remove this part of
the selftest.

I still find value in keeping it, but Petr, would like your opinion on
this fix, if we were to keep it.

  Luis

>  kernel/kallsyms_selftest.c | 21 ++---
>  1 file changed, 6 insertions(+), 15 deletions(-)
> 
> diff --git a/kernel/kallsyms_selftest.c b/kernel/kallsyms_selftest.c
> index f35d9cc1aab1..bfbc12da3326 100644
> --- a/kernel/kallsyms_selftest.c
> +++ b/kernel/kallsyms_selftest.c
> @@ -157,14 +157,11 @@ static void test_kallsyms_compression_ratio(void)
>  static int lookup_name(void *data, const char *name, struct module *mod, 
> unsigned long addr)
>  {
>   u64 t0, t1, t;
> - unsigned long flags;
>   struct test_stat *stat = (struct test_stat *)data;
>  
> - local_irq_save(flags);
> - t0 = sched_clock();
> + t0 = ktime_get_ns();
>   (void)kallsyms_lookup_name(name);
> - t1 = sched_clock();
> - local_irq_restore(flags);
> + t1 = ktime_get_ns();
>  
>   t = t1 - t0;
>   if (t < stat->min)
> @@ -234,18 +231,15 @@ static int find_symbol(void *data, const char *name, 
> struct module *mod, unsigne
>  static void test_perf_kallsyms_on_each_symbol(void)
>  {
>   u64 t0, t1;
> - unsigned long flags;
>   struct test_stat stat;
>  
>   memset(, 0, sizeof(stat));
>   stat.max = INT_MAX;
>   stat.name = stub_name;
>   stat.perf = 1;
> - local_irq_save(flags);
> - t0 = sched_clock();
> + t0 = ktime_get_ns();
>   kallsyms_on_each_symbol(find_symbol, );
> - t1 = sched_clock();
> - local_irq_restore(flags);
> + t1 = ktime_get_ns();
>   pr_info("kallsyms_on_each_symbol() traverse all: %lld ns\n", t1 - t0);
>  }
>  
> @@ -270,17 +264,14 @@ static int match_symbol(void *data, unsigned long addr)
>  static void test_perf_kallsyms_on_each_match_symbol(void)
>  {
>   u64 t0, t1;
> - unsigned long flags;
>   struct test_stat stat;
>  
>   memset(, 0, sizeof(stat));
>   stat.max = INT_MAX;
>   stat.name = stub_name;
> - local_irq_save(flags);
> - t0 = sched_clock();
> + t0 = ktime_get_ns();
>   kallsyms_on_each_match_symbol(match_symbol, stat.name, );
> - t1 = sched_clock();
> - local_irq_restore(flags);
> + t1 = ktime_get_ns();
>   pr_info("kallsyms_on_each_match_symbol() traverse all: %lld ns\n", t1 - 
> t0);
>  }
>  
> -- 
> 2.37.2
> 


Re: [PATCH v5 0/2] powerpc module arch checks

2022-11-02 Thread Luis Chamberlain
On Mon, Oct 31, 2022 at 10:07:31PM +1000, Nicholas Piggin wrote:
> Luis if you would be okay for patch 1 to be merged via powerpc or
> prefer to take it in the module tree (or maybe you object to the
> code in the first place).

Looks good to me, and nothing on my radar which would cause a conflict
so happy for you to take it via powerpc or I can take it and apply it
right away to tricke / get tested on linux-next by tomorrow.

Let me know.

  Luis


Re: [PATCH v5 1/2] module: add module_elf_check_arch for module-specific checks

2022-11-02 Thread Luis Chamberlain
On Mon, Oct 31, 2022 at 10:07:32PM +1000, Nicholas Piggin wrote:
> The elf_check_arch() function is also used to test compatibility of
> usermode binaries. Kernel modules may have more specific requirements,
> for example powerpc would like to test for ABI version compatibility.
> 
> Add a weak module_elf_check_arch() that defaults to true, and call it
> from elf_validity_check().
> 
> Cc: Michael Ellerman 
> Signed-off-by: Jessica Yu 
> [np: added changelog, adjust name, rebase]
> Signed-off-by: Nicholas Piggin 

Acked-by: Luis Chamberlain 

  Luis


Re: [PATCH] kprobes: Enable tracing for mololithic kernel images

2022-06-10 Thread Luis Chamberlain
, Jarkko Sakkinen , Sami Tolvanen 
, "Naveen N. Rao" , Marco 
Elver , Kees Cook , Steven Rostedt 
, Nathan Chancellor , Mark Brown 
, Borislav Petkov , Alexander Egorenkov 
, Thomas Bogendoerfer , 
linux-par...@vger.kernel.org, Nathaniel McCallum , 
Dmitry Torokhov , "David S. Miller" 
, "Kirill A. Shutemov" , 
Tobias Huschle , "Peter Zijlstra \(Intel\)" 
, "H. Peter Anvin" , 
sparcli...@vger.kernel.org, Tiezhu Yang , Miroslav 
Benes , Chen Zhongjin , Ard Biesheuvel 
, x...@kernel.org, linux-ri...@lists.infradead.org, Ing
 o Molnar , Aaron Tomlin , Albert Ou 
, Heiko Carstens , Liao Chang 
, Paul Walmsley , Josh 
Poimboeuf , Thomas Richter , 
linux-m...@vger.kernel.org, Changbin Du , Palmer Dabbelt 
, linuxppc-dev@lists.ozlabs.org, 
linux-modu...@vger.kernel.org
Errors-To: linuxppc-dev-bounces+archive=mail-archive@lists.ozlabs.org
Sender: "Linuxppc-dev" 


On Thu, Jun 09, 2022 at 08:47:38AM +0100, Russell King (Oracle) wrote:
> On Wed, Jun 08, 2022 at 02:59:27AM +0300, Jarkko Sakkinen wrote:
> > diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
> > index 553866751e1a..d2bb954cd54f 100644
> > --- a/arch/arm/kernel/Makefile
> > +++ b/arch/arm/kernel/Makefile
> > @@ -44,6 +44,11 @@ obj-$(CONFIG_CPU_IDLE)   += cpuidle.o
> >  obj-$(CONFIG_ISA_DMA_API)  += dma.o
> >  obj-$(CONFIG_FIQ)  += fiq.o fiqasm.o
> >  obj-$(CONFIG_MODULES)  += armksyms.o module.o
> > +ifeq ($(CONFIG_MODULES),y)
> > +obj-y  += module_alloc.o
> > +else
> > +obj-$(CONFIG_KPROBES)  += module_alloc.o
> > +endif
> 
> Doesn't:
> 
> obj-$(CONFIG_MODULES) += module_alloc.o
> obj-$(CONFIG_KPROBES) += module_alloc.o

That just begs for a new kconfig symbol for the object, and for
the object then to be built with it.

The archs which override the default can use ARCH_HAS_VM_ALLOC_EXEC.
Please note that the respective free is important as well and its
not clear if we need an another define for the free. Someone has
to do that work. We want to ensure to noexec the code on free and
this can vary on each arch.

> work just as well? The kbuild modules.rst documentation says:
> 
> The order of files in $(obj-y) is significant.  Duplicates in
> the lists are allowed: the first instance will be linked into
> built-in.a and succeeding instances will be ignored.
> 
> so you should be fine... or the documentation is wrong!

Agreed, but this is just sloppy, better to use a new kconfig symbol
to represent what is actually being required.

  Luis


Re: [PATCH] kprobes: Enable tracing for mololithic kernel images

2022-06-10 Thread Luis Chamberlain
l...@kernel.org>, Masahiro Yamada , Jarkko Sakkinen 
, Sami Tolvanen , "Naveen N. Rao" 
, Marco Elver , Kees Cook 
, Steven Rostedt , Nathan 
Chancellor , "Russell King \(Oracle\)" 
, Mark Brown , Borislav Petkov 
, Alexander Egorenkov , Thomas 
Bogendoerfer , Parisc List 
, Nathaniel McCallum , 
Dmitry Torokhov , "David S. Miller" 
, "Kirill A. Shutemov" , 
Tobias Huschle , "Peter Zijlstra \(Intel\)" 
, "H. Peter Anvin" , sparclinux 
, Tiezhu Yang , Miroslav 
Benes , Chen Zhongjin , Ard Biesheuvel 
, the arch/x86 maintainers , Russell King 
, linux-riscv , Ingo 
Molnar , Aaron Tomlin , Albert Ou 
, Heiko Carstens , Liao Chang 
, Paul Walmsley , Josh 
Poimboeuf , Thomas Richter , "open 
list:BROADCOM NVRAM DRIVER" , Changbin Du 
, Palmer Dabbelt , linuxppc-dev 
, linux-modu...@vger.kernel.org
Errors-To: linuxppc-dev-bounces+archive=mail-archive@lists.ozlabs.org
Sender: "Linuxppc-dev" 


On Thu, Jun 09, 2022 at 05:48:52AM +0200, Christoph Hellwig wrote:
> On Wed, Jun 08, 2022 at 01:26:19PM -0700, Luis Chamberlain wrote:
> > No, that was removed because it has only one user.
> 
> That is only part of the story.  The other part is that the overall
> kernel simply does not have any business allocating exutable memory.
> Executable memory is a very special concept for modules or module-like
> code like kprobes, and should not be exposed as a general concept.

It is not just modules and kprobes, it is also ftrace and bpf too now.
So while it should not be used everywhere calling it module_alloc()
is just confusing at this point. Likewise, module_alloc_huge() is
being proposed too and I'd rather we deal with this properly in aligment
of taking care of the rename as well.

If the concern is to restrict access we can use the module namespace stuff
so to ensure only intended users get access to it.

> Especially as executable memory really should not also be writable
> for security reasons.  In other words, we should actually never
> allocate executable memory, every.  We might seal memory and then
> mark it executable after having written to it, which is how modules
> and kprobes are implemented on all modern Linux ports anyway.

The respective free *should* do the executable bits, and there
is no generic way to do this for all archs and so it is open coded
today. In fact some architectures need further work / help and so
split up the module data and exect already on v5.19+ with the new
ARCH_WANTS_MODULES_DATA_IN_VMALLOC. See this thread for details:

https://lkml.kernel.org/r/yo1xtn441qbnt...@bombadil.infradead.org

Doing this work is not easy, but if we're going to do it, it must
be done right.

  Luis


Re: [PATCH] kprobes: Enable tracing for mololithic kernel images

2022-06-08 Thread Luis Chamberlain
da , Jarkko Sakkinen , Sami Tolvanen 
, "Naveen N. Rao" , Marco 
Elver , Kees Cook , Steven Rostedt 
, Nathan Chancellor , "Russell King 
\(Oracle\)" , Mark Brown , 
Borislav Petkov , Alexander Egorenkov , 
Thomas Bogendoerfer , Parisc List 
, Nathaniel McCallum , 
Dmitry Torokhov , "David S. Miller" 
, "Kirill A. Shutemov" , 
Tobias Huschle , "Peter Zijlstra \(Intel\)" 
, "H. Peter Anvin" , sparclinux 
, Tiezhu Yang , Miroslav 
Benes , Chen Zhongjin , Ard Biesheuvel , the arch/x86 
maintainers , Russell King , 
linux-riscv , Ingo Molnar , 
Aaron Tomlin , Albert Ou , Heiko 
Carstens , Liao Chang , Paul 
Walmsley , Josh Poimboeuf , 
Thomas Richter , "open list:BROADCOM NVRAM DRIVER" 
, Changbin Du , Palmer 
Dabbelt , linuxppc-dev , 
linux-modu...@vger.kernel.org
Errors-To: linuxppc-dev-bounces+archive=mail-archive@lists.ozlabs.org
Sender: "Linuxppc-dev" 


On Wed, Jun 08, 2022 at 11:20:53AM -0700, Song Liu wrote:
> On Wed, Jun 8, 2022 at 9:12 AM Song Liu  wrote:
> > On Wed, Jun 8, 2022 at 7:21 AM Masami Hiramatsu  wrote:
> > > On Wed, 8 Jun 2022 08:25:38 +0300
> > > Jarkko Sakkinen  wrote:
> > > > On Wed, Jun 08, 2022 at 10:35:42AM +0800, Guo Ren wrote:
> > > > > On Wed, Jun 8, 2022 at 8:02 AM Jarkko Sakkinen  
> > > > > wrote:
> > > > > > As the result, kprobes can be used with a monolithic kernel.
> > > > > It's strange when MODULES is n, but vmlinux still obtains 
> > > > > module_alloc.
> > > > >
> > > > > Maybe we need a kprobe_alloc, right?
> > > >
> > > > Perhaps not the best name but at least it documents the fact that
> > > > they use the same allocator.
> > > >
> > > > Few years ago I carved up something "half-way there" for kprobes,
> > > > and I used the name text_alloc() [*].
> > > >
> > > > [*] 
> > > > https://lore.kernel.org/all/20200724050553.1724168-1-jarkko.sakki...@linux.intel.com/
> > >
> > > Yeah, I remember that. Thank you for updating your patch!
> > > I think the idea (split module_alloc() from CONFIG_MODULE) is good to me.
> > > If module support maintainers think this name is not good, you may be
> > > able to rename it as text_alloc() and make the module_alloc() as a
> > > wrapper of it.
> >
> > IIUC, most users of module_alloc() use it to allocate memory for text, 
> > except
> > that module code uses it for both text and data. Therefore, I guess calling 
> > it
> > text_alloc() is not 100% accurate until we change the module code (to use
> > a different API to allocate memory for data).
> 
> Git history showed me
> 
> 7a0e27b2a0ce mm: remove vmalloc_exec
> 
> I guess we are somehow going back in time...

No, that was removed because it has only one user. The real hard work
to generalize vmalloc_exec() with all the arch special sauce was not
done.

To do this properly architectures must be able to override it. We can
use the old vmalloc_exec() or text_alloc(). I think vmalloc_exec() is
more in line with mm stuff, but it would be our first __weak mm call
from what I can tell.

Anyway patches welcomed.

  Luis


Re: [GIT PULL] Modules fixes for v5.19-rc1

2022-05-25 Thread Luis Chamberlain
Sorry the subject should say "Modules changes".

I also forgot to itemize possible merge conflicts and resolutions
which linux-next reported:

powerpc:
https://lkml.kernel.org/r/20220520154055.7f964...@canb.auug.org.au

kbuild:
https://lkml.kernel.org/r/20220523120859.570f7...@canb.auug.org.au

  Luis


[GIT PULL] Modules fixes for v5.19-rc1

2022-05-25 Thread Luis Chamberlain
OK, finally some changes for modules. It is still pretty boring,
but I am hopefull that the cleanup will yield nice results in the
future as further cleanups will make the code much easier to
read, maintain and test. Perhaps the most exciting thing is
Christophe Leroy's CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC.
In reviewing Rick Edgecombe's prior work on enhancements for
special allocators I suspect this is going to help as module
space was the more complex aspect to deal with in his work.

AFAICT you *may* run into conflicts *if* bpf folks submit the
module_alloc_huge() stuff which I was still reviewing with Rick.
To my taste that effort seems to be going fast and I like to
take time to consider a proper interface for it which aligns well
with that others have in mind, specially in consideration for what
other architectures might need. The VM_FLUSH_RESET_PERMS stuff was
what was loose there. It doesn't seem we can address that stuff in
a generic neat way yet, and so the x86 open codes its own solution
for it.

I suspect we'll also need more tests on the huge page front so that
if more module_alloc() users want to convert we can enable folks to
give more realistic performance information rather than loose
numbers. In the future I suspect we'll just generalize module_alloc()
to vmalloc_exec() as its users are growing and the technical debt
of not drawing a clean API for it is growing.

Let me know if there are any issues.

  Luis

The following changes since commit 3123109284176b1532874591f7c81f3837bbdc17:

  Linux 5.18-rc1 (2022-04-03 14:08:21 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/ 
tags/modules-5.19-rc1

for you to fetch changes up to 7390b94a3c2d93272d6da4945b81a9cf78055b7b:

  module: merge check_exported_symbol() into find_exported_symbol_in_section() 
(2022-05-12 10:29:41 -0700)


Modules updates for v5.19-rc1

As promised, for v5.19 I queued up quite a bit of work for modules, but
still with a pretty conservative eye. These changes have been soaking on
modules-next (and so linux-next) for quite some time, the code shift was
merged onto modules-next on March 22, and the last patch was queued on May
5th.

The following are the highlights of what bells and whistles we will get for
v5.19:

 1) It was time to tidy up kernel/module.c and one way of starting with
that effort was to split it up into files. At my request Aaron Tomlin
spearheaded that effort with the goal to not introduce any
functional at all during that endeavour.  The penalty for the split
is +1322 bytes total, +112 bytes in data, +1210 bytes in text while
bss is unchanged. One of the benefits of this other than helping
make the code easier to read and review is summoning more help on review
for changes with livepatching so kernel/module/livepatch.c is now
pegged as maintained by the live patching folks.

The before and after with just the move on a defconfig on x86-64:

 $ size kernel/module.o
textdata bss dec hex filename
   384344540 104   43078a846 kernel/module.o

 $ size -t kernel/module/*.o
textdata bss dec hex filename
   4785 120   049051329 kernel/module/kallsyms.o
  285774416 104   330978149 kernel/module/main.o
   1158   8   01166 48e kernel/module/procfs.o
902 108   01010 3f2 kernel/module/strict_rwx.o
   3390   0   03390 d3e kernel/module/sysfs.o
832   0   0 832 340 kernel/module/tree_lookup.o
  396444652 104   44400ad70 (TOTALS)

 2) Aaron added module unload taint tracking (MODULE_UNLOAD_TAINT_TRACKING),
so to enable tracking unloaded modules which did taint the kernel.

 3) Christophe Leroy added CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
which lets architectures to request having modules data in vmalloc
area instead of module area. There are three reasons why an
architecture might want this:

a) On some architectures (like book3s/32) it is not possible to protect
   against execution on a page basis. The exec stuff can be mapped by
   different arch segment sizes (on book3s/32 that is 256M segments). By
   default the module area is in an Exec segment while vmalloc area is in
   a NoExec segment. Using vmalloc lets you muck with module data as
   NoExec on those architectures whereas before you could not.

b) By pushing more module data to vmalloc you also increase the
   probability of module text to remain within a closer distance
   from kernel core text and this reduces trampolines, this has been
   reported on arm first and powerpc folks are following that lead.

c) Free'ing module_alloc() (Exec by default) area leaves this
   exposed as Exec by default, some architectures 

Re: request_module DoS

2022-05-12 Thread Luis Chamberlain
On Thu, May 12, 2022 at 10:07:26PM +1000, Michael Ellerman wrote:
> Michael Ellerman  writes:
> > Luis Chamberlain  writes:
> ...
> >
> >> Can someone try this on ppc64le system? At this point I am not convinced
> >> this issue is generic.
> >
> > Does your x86 system have at least 784 CPUs?
> >
> > I don't know where the original report came from, but the trace shows
> > "CPU 784", which would usually indicate a system with at least that many
> > CPUs.
> 
> Update, apparently the report originally came from IBM, so I'll chase it
> up internally.
> 
> I think you're right that there's probably no issue in the module code,
> sorry to waste your time.

It gives me testing happiness to know that may be the case :)

  Luis


Re: request_module DoS

2022-05-11 Thread Luis Chamberlain
On Mon, May 09, 2022 at 09:13:03AM -0700, Luis Chamberlain wrote:
> On Mon, May 09, 2022 at 09:23:39PM +1000, Michael Ellerman wrote:
> > Herbert Xu  writes:
> > > Hi:
> > >
> > > There are some code paths in the kernel where you can reliably
> > > trigger a request_module of a non-existant module.  For example,
> > > if you attempt to load a non-existent crypto algorithm, or create
> > > a socket of a non-existent network family, it will result in a
> > > request_module call that is guaranteed to fail.
> > >
> > > As user-space can do this repeatedly, it can quickly overwhelm
> > > the concurrency limit in kmod.  This in itself is expected,
> > > however, at least on some platforms this appears to result in
> > > a live-lock.  Here is an example triggered by stress-ng on ppc64:
> > >
> > > [  529.853264] request_module: kmod_concurrent_max (0) close to 0 
> > > (max_modprobes: 50), for module crypto-aegis128l, throttling...
> > ...
> > > [  580.414590] __request_module: 25 callbacks suppressed
> > > [  580.414597] request_module: kmod_concurrent_max (0) close to 0 
> > > (max_modprobes: 50), for module crypto-aegis256-all, throttling...
> > > [  580.423082] watchdog: CPU 784 self-detected hard LOCKUP @ 
> > > plpar_hcall_norets_notrace+0x18/0x2c
> > > [  580.423097] watchdog: CPU 784 TB:1297691958559475, last heartbeat 
> > > TB:1297686321743840 (11009ms ago)
> > > [  580.423099] Modules linked in: cast6_generic cast5_generic cast_common 
> > > camellia_generic blowfish_generic blowfish_common tun nft_fib_inet 
> > > nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 
> > > nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack 
> > > nf_defrag_ipv6 nf_defrag_ipv4 rfkill bonding tls ip_set nf_tables 
> > > nfnetlink pseries_rng binfmt_misc drm drm_panel_orientation_quirks xfs 
> > > libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp vmx_crypto 
> > > dm_mirror dm_region_hash dm_log dm_mod fuse
> > > [  580.423136] CPU: 784 PID: 77071 Comm: stress-ng Kdump: loaded Not 
> > > tainted 5.14.0-55.el9.ppc64le #1
> > > [  580.423139] NIP:  c00f8ff4 LR: c01f7c38 CTR: 
> > > 
> > > [  580.423140] REGS: c043fdd7bd60 TRAP: 0900   Not tainted  
> > > (5.14.0-55.el9.ppc64le)
> > > [  580.423142] MSR:  8280b033   
> > > CR: 28008202  XER: 2004
> > > [  580.423148] CFAR: 0c00 IRQMASK: 1 
> > >GPR00: 28008202 c044c46b3850 c2a46f00 
> > >  
> > >GPR04:   0010 
> > > c2a83060 
> > >GPR08:  0001 0001 
> > >  
> > >GPR12: c01b9530 c043ffe16700 00020117 
> > > 10185ea8 
> > >GPR16: 10212150 10186198 101863a0 
> > > 1021b3c0 
> > >GPR20: 0001  0001 
> > > 00ff 
> > >GPR24: c043f4a00e14 c043fafe0e00 0c44 
> > >  
> > >GPR28: c043f4a00e00 c043f4a00e00 c21e0e00 
> > > c2561aa0 
> > > [  580.423166] NIP [c00f8ff4] plpar_hcall_norets_notrace+0x18/0x2c
> > > [  580.423168] LR [c01f7c38] 
> > > __pv_queued_spin_lock_slowpath+0x528/0x530
> > > [  580.423173] Call Trace:
> > > [  580.423174] [c044c46b3850] [00016b60] 0x16b60 
> > > (unreliable)
> > > [  580.423177] [c044c46b3910] [c0ea6948] 
> > > _raw_spin_lock_irqsave+0xa8/0xc0
> > > [  580.423182] [c044c46b3940] [c01dd7c0] 
> > > prepare_to_wait_event+0x40/0x200
> > > [  580.423185] [c044c46b39a0] [c019e9e0] 
> > > __request_module+0x320/0x510
> > > [  580.423188] [c044c46b3ac0] [c06f1a14] 
> > > crypto_alg_mod_lookup+0x1e4/0x2e0
> > > [  580.423192] [c044c46b3b60] [c06f2178] 
> > > crypto_alloc_tfm_node+0xa8/0x1a0
> > > [  580.423194] [c044c46b3be0] [c06f84f8] 
> > > crypto_alloc_aead+0x38/0x50
> > > [  580.423196] [c044c46b3c00] [c072cba0] aead_bind+0x70/0x140
> > > [  580.423199] [c044c46b3c40] [c0727824] alg_bind+0xb4/0x210
> > > [

Re: request_module DoS

2022-05-09 Thread Luis Chamberlain
On Mon, May 09, 2022 at 09:23:39PM +1000, Michael Ellerman wrote:
> Herbert Xu  writes:
> > Hi:
> >
> > There are some code paths in the kernel where you can reliably
> > trigger a request_module of a non-existant module.  For example,
> > if you attempt to load a non-existent crypto algorithm, or create
> > a socket of a non-existent network family, it will result in a
> > request_module call that is guaranteed to fail.
> >
> > As user-space can do this repeatedly, it can quickly overwhelm
> > the concurrency limit in kmod.  This in itself is expected,
> > however, at least on some platforms this appears to result in
> > a live-lock.  Here is an example triggered by stress-ng on ppc64:
> >
> > [  529.853264] request_module: kmod_concurrent_max (0) close to 0 
> > (max_modprobes: 50), for module crypto-aegis128l, throttling...
> ...
> > [  580.414590] __request_module: 25 callbacks suppressed
> > [  580.414597] request_module: kmod_concurrent_max (0) close to 0 
> > (max_modprobes: 50), for module crypto-aegis256-all, throttling...
> > [  580.423082] watchdog: CPU 784 self-detected hard LOCKUP @ 
> > plpar_hcall_norets_notrace+0x18/0x2c
> > [  580.423097] watchdog: CPU 784 TB:1297691958559475, last heartbeat 
> > TB:1297686321743840 (11009ms ago)
> > [  580.423099] Modules linked in: cast6_generic cast5_generic cast_common 
> > camellia_generic blowfish_generic blowfish_common tun nft_fib_inet 
> > nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 
> > nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack 
> > nf_defrag_ipv6 nf_defrag_ipv4 rfkill bonding tls ip_set nf_tables nfnetlink 
> > pseries_rng binfmt_misc drm drm_panel_orientation_quirks xfs libcrc32c 
> > sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp vmx_crypto dm_mirror 
> > dm_region_hash dm_log dm_mod fuse
> > [  580.423136] CPU: 784 PID: 77071 Comm: stress-ng Kdump: loaded Not 
> > tainted 5.14.0-55.el9.ppc64le #1
> > [  580.423139] NIP:  c00f8ff4 LR: c01f7c38 CTR: 
> > 
> > [  580.423140] REGS: c043fdd7bd60 TRAP: 0900   Not tainted  
> > (5.14.0-55.el9.ppc64le)
> > [  580.423142] MSR:  8280b033   
> > CR: 28008202  XER: 2004
> > [  580.423148] CFAR: 0c00 IRQMASK: 1 
> >GPR00: 28008202 c044c46b3850 c2a46f00 
> >  
> >GPR04:   0010 
> > c2a83060 
> >GPR08:  0001 0001 
> >  
> >GPR12: c01b9530 c043ffe16700 00020117 
> > 10185ea8 
> >GPR16: 10212150 10186198 101863a0 
> > 1021b3c0 
> >GPR20: 0001  0001 
> > 00ff 
> >GPR24: c043f4a00e14 c043fafe0e00 0c44 
> >  
> >GPR28: c043f4a00e00 c043f4a00e00 c21e0e00 
> > c2561aa0 
> > [  580.423166] NIP [c00f8ff4] plpar_hcall_norets_notrace+0x18/0x2c
> > [  580.423168] LR [c01f7c38] 
> > __pv_queued_spin_lock_slowpath+0x528/0x530
> > [  580.423173] Call Trace:
> > [  580.423174] [c044c46b3850] [00016b60] 0x16b60 
> > (unreliable)
> > [  580.423177] [c044c46b3910] [c0ea6948] 
> > _raw_spin_lock_irqsave+0xa8/0xc0
> > [  580.423182] [c044c46b3940] [c01dd7c0] 
> > prepare_to_wait_event+0x40/0x200
> > [  580.423185] [c044c46b39a0] [c019e9e0] 
> > __request_module+0x320/0x510
> > [  580.423188] [c044c46b3ac0] [c06f1a14] 
> > crypto_alg_mod_lookup+0x1e4/0x2e0
> > [  580.423192] [c044c46b3b60] [c06f2178] 
> > crypto_alloc_tfm_node+0xa8/0x1a0
> > [  580.423194] [c044c46b3be0] [c06f84f8] 
> > crypto_alloc_aead+0x38/0x50
> > [  580.423196] [c044c46b3c00] [c072cba0] aead_bind+0x70/0x140
> > [  580.423199] [c044c46b3c40] [c0727824] alg_bind+0xb4/0x210
> > [  580.423201] [c044c46b3cc0] [c0bc2ad4] __sys_bind+0x114/0x160
> > [  580.423205] [c044c46b3d90] [c0bc2b48] sys_bind+0x28/0x40
> > [  580.423207] [c044c46b3db0] [c0030880] 
> > system_call_exception+0x160/0x300
> > [  580.423209] [c044c46b3e10] [c000c168] 
> > system_call_vectored_common+0xe8/0x278
> > [  580.423213] --- interrupt: 3000 at 0x7fff9b824464
> > [  580.423214] NIP:  7fff9b824464 LR:  CTR: 
> > 
> > [  580.423215] REGS: c044c46b3e80 TRAP: 3000   Not tainted  
> > (5.14.0-55.el9.ppc64le)
> > [  580.423216] MSR:  8280f033   
> > CR: 42004802  XER: 
> > [  580.423221] IRQMASK: 0 
> >GPR00: 0147 7fffdcff2780 7fff9b917100 
> > 0004 
> >GPR04: 7fffdcff27e0 0058  
> >  
> >GPR08: 

Re: request_module DoS

2022-05-08 Thread Luis Chamberlain
On Sat, May 07, 2022 at 12:14:47PM -0700, Luis Chamberlain wrote:
> On Sat, May 07, 2022 at 01:02:20AM -0700, Luis Chamberlain wrote:
> > You can try to reproduce by using adding a new test type for crypto-aegis256
> > on lib/test_kmod.c. These tests however can try something similar but other
> > modules.
> > 
> > /tools/testing/selftests/kmod/kmod.sh -t 0008
> > /tools/testing/selftests/kmod/kmod.sh -t 0009
> > 
> > I can't decipher this yet.
> 
> Without testing it... but something like this might be an easier
> reproducer:
> 
> + config_set_driver crypto-aegis256

If the module is not present though nothing really happens, and so
is it possible this is another issue?

Below a bogus module request.

diff --git a/tools/testing/selftests/kmod/kmod.sh 
b/tools/testing/selftests/kmod/kmod.sh
index afd42387e8b2..a747ad549940 100755
--- a/tools/testing/selftests/kmod/kmod.sh
+++ b/tools/testing/selftests/kmod/kmod.sh
@@ -65,6 +66,7 @@ ALL_TESTS="$ALL_TESTS 0010:1:1"
 ALL_TESTS="$ALL_TESTS 0011:1:1"
 ALL_TESTS="$ALL_TESTS 0012:1:1"
 ALL_TESTS="$ALL_TESTS 0013:1:1"
+ALL_TESTS="$ALL_TESTS 0014:150:1"
 
 # Kselftest framework requirement - SKIP code is 4.
 ksft_skip=4
@@ -504,6 +506,17 @@ kmod_test_0013()
"cat /sys/module/${DEFAULT_KMOD_DRIVER}/sections/.*text | head 
-n1"
 }
 
+kmod_test_0014()
+{
+   kmod_defaults_driver
+   MODPROBE_LIMIT=$(config_get_modprobe_limit)
+   let EXTRA=$MODPROBE_LIMIT/6
+   config_set_driver bogus_module_does_not_exist
+   config_num_thread_limit_extra $EXTRA
+   config_trigger ${FUNCNAME[0]}
+   config_expect_result ${FUNCNAME[0]} MODULE_NOT_FOUND
+}
+
 list_tests()
 {
echo "Test ID list:"
@@ -525,6 +538,7 @@ list_tests()
echo "0011 x $(get_test_count 0011) - test completely disabling module 
autoloading"
echo "0012 x $(get_test_count 0012) - test /proc/modules address 
visibility under CAP_SYSLOG"
echo "0013 x $(get_test_count 0013) - test /sys/module/*/sections/* 
visibility under CAP_SYSLOG"
+   echo "0014 x $(get_test_count 0014) - multithreaded - push 
kmod_concurrent over max_modprobes for request_module() for a missing module"
 }
 
 usage()


Re: request_module DoS

2022-05-07 Thread Luis Chamberlain
On Sat, May 07, 2022 at 01:02:20AM -0700, Luis Chamberlain wrote:
> You can try to reproduce by using adding a new test type for crypto-aegis256
> on lib/test_kmod.c. These tests however can try something similar but other
> modules.
> 
> /tools/testing/selftests/kmod/kmod.sh -t 0008
> /tools/testing/selftests/kmod/kmod.sh -t 0009
> 
> I can't decipher this yet.

Without testing it... but something like this might be an easier
reproducer:

diff --git a/tools/testing/selftests/kmod/kmod.sh 
b/tools/testing/selftests/kmod/kmod.sh
index afd42387e8b2..48b6b5ec6c1e 100755
--- a/tools/testing/selftests/kmod/kmod.sh
+++ b/tools/testing/selftests/kmod/kmod.sh
@@ -41,6 +41,7 @@ set -e
 TEST_NAME="kmod"
 TEST_DRIVER="test_${TEST_NAME}"
 TEST_DIR=$(dirname $0)
+PROC_CONFIG="/proc/config.gz"
 
 # This represents
 #
@@ -65,6 +66,7 @@ ALL_TESTS="$ALL_TESTS 0010:1:1"
 ALL_TESTS="$ALL_TESTS 0011:1:1"
 ALL_TESTS="$ALL_TESTS 0012:1:1"
 ALL_TESTS="$ALL_TESTS 0013:1:1"
+ALL_TESTS="$ALL_TESTS 0014:150:1"
 
 # Kselftest framework requirement - SKIP code is 4.
 ksft_skip=4
@@ -79,6 +81,19 @@ test_modprobe()
fi
 }
 
+kconfig_has()
+{
+   if [ -f $PROC_CONFIG ]; then
+   if zgrep -q $1 $PROC_CONFIG 2>/dev/null; then
+   echo "yes"
+   else
+   echo "no"
+   fi
+   else
+   echo "no"
+   fi
+}
+
 function allow_user_defaults()
 {
if [ -z $DEFAULT_KMOD_DRIVER ]; then
@@ -106,6 +121,8 @@ function allow_user_defaults()
fi
 
MODPROBE_LIMIT_FILE="${PROC_DIR}/kmod-limit"
+   HAS_CRYPTO_AEGIS256_MOD="$(kconfig_has CONFIG_CRYPTO_AEGIS256=m)"
+   HAS_CRYPTO_AEGIS256_BUILTIN="$(kconfig_has CONFIG_CRYPTO_AEGIS256=y)"
 }
 
 test_reqs()
@@ -504,6 +521,21 @@ kmod_test_0013()
"cat /sys/module/${DEFAULT_KMOD_DRIVER}/sections/.*text | head 
-n1"
 }
 
+kmod_test_0014()
+{
+   kmod_defaults_driver
+   MODPROBE_LIMIT=$(config_get_modprobe_limit)
+   let EXTRA=$MODPROBE_LIMIT/6
+   config_set_driver crypto-aegis256
+   config_num_thread_limit_extra $EXTRA
+   config_trigger ${FUNCNAME[0]}
+   if [[ "$HAS_CRYPTO_AEGIS256_MOD" == "yes" || 
"$HAS_CRYPTO_AEGIS256_BUILTIN" == "yes" ]]; then
+   config_expect_result ${FUNCNAME[0]} SUCCESS
+   else
+   config_expect_result ${FUNCNAME[0]} MODULE_NOT_FOUND
+   fi
+}
+
 list_tests()
 {
echo "Test ID list:"
@@ -525,6 +557,7 @@ list_tests()
echo "0011 x $(get_test_count 0011) - test completely disabling module 
autoloading"
echo "0012 x $(get_test_count 0012) - test /proc/modules address 
visibility under CAP_SYSLOG"
echo "0013 x $(get_test_count 0013) - test /sys/module/*/sections/* 
visibility under CAP_SYSLOG"
+   echo "0014 x $(get_test_count 0014) - multithreaded - push 
kmod_concurrent over max_modprobes for request_module() for crypto-aegis256"
 }
 
 usage()


Re: request_module DoS

2022-05-07 Thread Luis Chamberlain
On Sat, May 07, 2022 at 07:10:23AM +, Christophe Leroy wrote:
> > There are some code paths in the kernel where you can reliably
> > trigger a request_module of a non-existant module.  For example,
> > if you attempt to load a non-existent crypto algorithm, or create
> > a socket of a non-existent network family, it will result in a
> > request_module call that is guaranteed to fail.
> > 
> > As user-space can do this repeatedly, it can quickly overwhelm
> > the concurrency limit in kmod.  This in itself is expected,
> > however, at least on some platforms this appears to result in
> > a live-lock.  Here is an example triggered by stress-ng on ppc64:
> > 
> > [  579.845320] request_module: modprobe crypto-aegis256 cannot be 
> > processed, kmod busy with 50 threads for more than 5 seconds now
> > [  580.414590] __request_module: 25 callbacks suppressed
> > [  580.414597] request_module: kmod_concurrent_max (0) close to 0 
> > (max_modprobes: 50), for module crypto-aegis256-all, throttling...
> > [  580.423082] watchdog: CPU 784 self-detected hard LOCKUP @ 
> > plpar_hcall_norets_notrace+0x18/0x2c
> > [  580.423097] watchdog: CPU 784 TB:1297691958559475, last heartbeat 
> > TB:1297686321743840 (11009ms ago)
> > [  580.423099] Modules linked in: cast6_generic cast5_generic cast_common 
> > camellia_generic blowfish_generic blowfish_common tun nft_fib_inet 
> > nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 
> > nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack 
> > nf_defrag_ipv6 nf_defrag_ipv4 rfkill bonding tls ip_set nf_tables nfnetlink 
> > pseries_rng binfmt_misc drm drm_panel_orientation_quirks xfs libcrc32c 
> > sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp vmx_crypto dm_mirror 
> > dm_region_hash dm_log dm_mod fuse
> > [  580.423136] CPU: 784 PID: 77071 Comm: stress-ng Kdump: loaded Not 
> > tainted 5.14.0-55.el9.ppc64le #1
> > [  580.423139] NIP:  c00f8ff4 LR: c01f7c38 CTR: 
> > 
> > [  580.423140] REGS: c043fdd7bd60 TRAP: 0900   Not tainted  
> > (5.14.0-55.el9.ppc64le)
> > [  580.423142] MSR:  8280b033   
> > CR: 28008202  XER: 2004
> > [  580.423148] CFAR: 0c00 IRQMASK: 1
> > GPR00: 28008202 c044c46b3850 c2a46f00 
> > 
> > GPR04:   0010 
> > c2a83060
> > GPR08:  0001 0001 
> > 
> > GPR12: c01b9530 c043ffe16700 00020117 
> > 10185ea8
> > GPR16: 10212150 10186198 101863a0 
> > 1021b3c0
> > GPR20: 0001  0001 
> > 00ff
> > GPR24: c043f4a00e14 c043fafe0e00 0c44 
> > 
> > GPR28: c043f4a00e00 c043f4a00e00 c21e0e00 
> > c2561aa0
> > [  580.423166] NIP [c00f8ff4] plpar_hcall_norets_notrace+0x18/0x2c
> > [  580.423168] LR [c01f7c38] 
> > __pv_queued_spin_lock_slowpath+0x528/0x530
> > [  580.423173] Call Trace:
> > [  580.423174] [c044c46b3850] [00016b60] 0x16b60 
> > (unreliable)
> > [  580.423177] [c044c46b3910] [c0ea6948] 
> > _raw_spin_lock_irqsave+0xa8/0xc0
> > [  580.423182] [c044c46b3940] [c01dd7c0] 
> > prepare_to_wait_event+0x40/0x200
> > [  580.423185] [c044c46b39a0] [c019e9e0] 
> > __request_module+0x320/0x510
> > [  580.423188] [c044c46b3ac0] [c06f1a14] 
> > crypto_alg_mod_lookup+0x1e4/0x2e0
> > [  580.423192] [c044c46b3b60] [c06f2178] 
> > crypto_alloc_tfm_node+0xa8/0x1a0
> > [  580.423194] [c044c46b3be0] [c06f84f8] 
> > crypto_alloc_aead+0x38/0x50
> > [  580.423196] [c044c46b3c00] [c072cba0] aead_bind+0x70/0x140
> > [  580.423199] [c044c46b3c40] [c0727824] alg_bind+0xb4/0x210
> > [  580.423201] [c044c46b3cc0] [c0bc2ad4] __sys_bind+0x114/0x160
> > [  580.423205] [c044c46b3d90] [c0bc2b48] sys_bind+0x28/0x40
> > [  580.423207] [c044c46b3db0] [c0030880] 
> > system_call_exception+0x160/0x300
> > [  580.423209] [c044c46b3e10] [c000c168] 
> > system_call_vectored_common+0xe8/0x278
> > [  580.423213] --- interrupt: 3000 at 0x7fff9b824464
> > [  580.423214] NIP:  7fff9b824464 LR:  CTR: 
> > 
> > [  580.423215] REGS: c044c46b3e80 TRAP: 3000   Not tainted  
> > (5.14.0-55.el9.ppc64le)
> > [  580.423216] MSR:  8280f033   
> > CR: 42004802  XER: 
> > [  580.423221] IRQMASK: 0
> > GPR00: 0147 7fffdcff2780 7fff9b917100 
> > 0004
> > GPR04: 7fffdcff27e0 0058  
> > 
> > GPR08:   

Re: [PATCH v6 0/6] Allocate module text and data separately

2022-03-22 Thread Luis Chamberlain
On Wed, Feb 23, 2022 at 01:02:10PM +0100, Christophe Leroy wrote:
> This series applies on top of my series "miscellanuous cleanups" v4.

Queued onto modules-testing! BTW I just had to rebase the change
with the kdb changes, it was a trivial change.

  Luis


Re: [PATCH v5 0/6] KEXEC_SIG with appended signature

2022-02-10 Thread Luis Chamberlain
On Wed, Feb 09, 2022 at 03:46:05PM +1100, Michael Ellerman wrote:
> Luis Chamberlain  writes:
> > On Tue, Jan 11, 2022 at 12:37:42PM +0100, Michal Suchanek wrote:
> >> Hello,
> >> 
> >> This is a refresh of the KEXEC_SIG series.
> >> 
> >> This adds KEXEC_SIG support on powerpc and deduplicates the code dealing
> >> with appended signatures in the kernel.
> >> 
> >> powerpc supports IMA_KEXEC but that's an exception rather than the norm.
> >> On the other hand, KEXEC_SIG is portable across platforms.
> >> 
> >> For distributions to have uniform security features across platforms one
> >> option should be used on all platforms.
> >> 
> >> Thanks
> >> 
> >> Michal
> >> 
> >> Previous revision: 
> >> https://lore.kernel.org/linuxppc-dev/cover.1637862358.git.msucha...@suse.de/
> >> Patched kernel tree: https://github.com/hramrach/kernel/tree/kexec_sig
> >> 
> >> Michal Suchanek (6):
> >>   s390/kexec_file: Don't opencode appended signature check.
> >>   powerpc/kexec_file: Add KEXEC_SIG support.
> >>   kexec_file: Don't opencode appended signature verification.
> >>   module: strip the signature marker in the verification function.
> >>   module: Use key_being_used_for for log messages in
> >> verify_appended_signature
> >>   module: Move duplicate mod_check_sig users code to mod_parse_sig
> >
> > What tree should this go through? I'd prefer if over through modules
> > tree as it can give a chance for Aaron Tomlin to work with this for his
> > code refactoring of kernel/module*.c to kernel/module/
> 
> Yeah that's fine by me, the arch changes are pretty minimal and unlikely
> to conflict much.

Ok sounds good thanks.

  Luis


Re: [PATCH v3 4/6] modules: Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC

2022-02-03 Thread Luis Chamberlain
On Thu, Feb 03, 2022 at 07:05:13AM +, Christophe Leroy wrote:
> 
> 
> Le 03/02/2022 à 01:01, Luis Chamberlain a écrit :
> > On Sat, Jan 29, 2022 at 05:02:09PM +, Christophe Leroy wrote:
> >> diff --git a/kernel/module.c b/kernel/module.c
> >> index 11f51e17fb9f..f3758115ebaa 100644
> >> --- a/kernel/module.c
> >> +++ b/kernel/module.c
> >> @@ -81,7 +81,9 @@
> >>   /* If this is set, the section belongs in the init part of the module */
> >>   #define INIT_OFFSET_MASK (1UL << (BITS_PER_LONG-1))
> >>   
> >> +#ifndef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
> >>   #define  data_layout core_layout
> >> +#endif
> >>   
> >>   /*
> >>* Mutex protects:
> >> @@ -111,6 +113,12 @@ static struct mod_tree_root {
> >>   #define module_addr_min mod_tree.addr_min
> >>   #define module_addr_max mod_tree.addr_max
> >>   
> >> +#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
> >> +static struct mod_tree_root mod_data_tree __cacheline_aligned = {
> >> +  .addr_min = -1UL,
> >> +};
> >> +#endif
> >> +
> >>   #ifdef CONFIG_MODULES_TREE_LOOKUP
> >>   
> >>   /*
> >> @@ -186,6 +194,11 @@ static void mod_tree_insert(struct module *mod)
> >>__mod_tree_insert(>core_layout.mtn, _tree);
> >>if (mod->init_layout.size)
> >>__mod_tree_insert(>init_layout.mtn, _tree);
> >> +
> >> +#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
> >> +  mod->data_layout.mtn.mod = mod;
> >> +  __mod_tree_insert(>data_layout.mtn, _data_tree);
> >> +#endif
> > 
> > 
> > kernel/ directory has quite a few files, module.c is the second to
> > largest file, and it has tons of stuff. Aaron is doing work to
> > split things out to make code easier to read and so that its easier
> > to review changes. See:
> > 
> > https://lkml.kernel.org/r/20220130213214.1042497-1-atom...@redhat.com
> > 
> > I think this is a good patch example which could benefit from that work.
> > So I'd much prefer to see that work go in first than this, so to see if
> > we can make the below changes more compartamentalized.
> > 
> > Curious, how much testing has been put into this series?
> 
> 
> I tested the change up to (including) patch 4 to verify it doesn't 
> introduce regression when not using 
> CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC,

> Then I tested with patch 5. I first tried with the 'hello world' test 
> module. After that I loaded several important modules and checked I 
> didn't get any regression, both with and without STRICT_MODULES_RWX and 
> I checked the consistency in /proc/vmallocinfo
>   /proc/modules /sys/class/modules/*

I wonder if we have a test for STRICT_MODULES_RWX.

> I also tested with a hacked module_alloc() to force branch trampolines.

So to verify that reducing these trampolines actually helps on an
architecture? I wonder if we can generalize this somehow to let archs
verify such strategies can help.

I was hoping for a bit more wider testing, like actually users, etc.
It does not seem like so. So we can get to that by merging this soon
into modules-next and having this bleed out issues with linux-next.
We are in good time to do this now.

The kmod tree has tons of tests:

https://git.kernel.org/pub/scm/utils/kernel/kmod/kmod.git/

Can you use that to verify there are no regressions?

Aaron, Michal, if you can do the same that'd be appreciated.


  Luis


Re: [PATCH v3 0/6] Allocate module text and data separately

2022-02-02 Thread Luis Chamberlain
On Sat, Jan 29, 2022 at 05:02:03PM +, Christophe Leroy wrote:
> This series allow architectures to request having modules data in
> vmalloc area instead of module area.
> 
> This is required on powerpc book3s/32 in order to set data non
> executable, because it is not possible to set executability on page
> basis, this is done per 256 Mbytes segments. The module area has exec
> right, vmalloc area has noexec. Without this change module data
> remains executable regardless of CONFIG_STRICT_MODULES_RWX.
> 
> This can also be useful on other powerpc/32 in order to maximize the
> chance of code being close enough to kernel core to avoid branch
> trampolines.
> 

This looks good, however I'd like to see Aaron's changes go in first,
and then yours. Aaron's changes still need to be tested by 0-day and I
need to finish review, but that's the order of how I'd prefer to see
changes merged / tested. I'll try to review his changes, dump them to
modules-next and then I'd like to trouble you to rebase ontop of that.

We should get all this tested early for the next release.

  Luis


Re: [PATCH v3 4/6] modules: Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC

2022-02-02 Thread Luis Chamberlain
On Sat, Jan 29, 2022 at 05:02:09PM +, Christophe Leroy wrote:
> diff --git a/kernel/module.c b/kernel/module.c
> index 11f51e17fb9f..f3758115ebaa 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -81,7 +81,9 @@
>  /* If this is set, the section belongs in the init part of the module */
>  #define INIT_OFFSET_MASK (1UL << (BITS_PER_LONG-1))
>  
> +#ifndef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
>  #define  data_layout core_layout
> +#endif
>  
>  /*
>   * Mutex protects:
> @@ -111,6 +113,12 @@ static struct mod_tree_root {
>  #define module_addr_min mod_tree.addr_min
>  #define module_addr_max mod_tree.addr_max
>  
> +#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
> +static struct mod_tree_root mod_data_tree __cacheline_aligned = {
> + .addr_min = -1UL,
> +};
> +#endif
> +
>  #ifdef CONFIG_MODULES_TREE_LOOKUP
>  
>  /*
> @@ -186,6 +194,11 @@ static void mod_tree_insert(struct module *mod)
>   __mod_tree_insert(>core_layout.mtn, _tree);
>   if (mod->init_layout.size)
>   __mod_tree_insert(>init_layout.mtn, _tree);
> +
> +#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
> + mod->data_layout.mtn.mod = mod;
> + __mod_tree_insert(>data_layout.mtn, _data_tree);
> +#endif


kernel/ directory has quite a few files, module.c is the second to
largest file, and it has tons of stuff. Aaron is doing work to
split things out to make code easier to read and so that its easier
to review changes. See:

https://lkml.kernel.org/r/20220130213214.1042497-1-atom...@redhat.com

I think this is a good patch example which could benefit from that work.
So I'd much prefer to see that work go in first than this, so to see if
we can make the below changes more compartamentalized.

Curious, how much testing has been put into this series?

  Luis


Re: [PATCH v3 3/6] modules: Introduce data_layout

2022-02-02 Thread Luis Chamberlain
On Sat, Jan 29, 2022 at 05:02:07PM +, Christophe Leroy wrote:
> diff --git a/kernel/module.c b/kernel/module.c
> index 163e32e39064..11f51e17fb9f 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -81,6 +81,8 @@
>  /* If this is set, the section belongs in the init part of the module */
>  #define INIT_OFFSET_MASK (1UL << (BITS_PER_LONG-1))
>  
> +#define  data_layout core_layout
> +
>  /*
>   * Mutex protects:
>   * 1) List of modules (also safely readable with preempt_disable),
> @@ -2451,7 +2454,10 @@ static void layout_sections(struct module *mod, struct 
> load_info *info)
>   || s->sh_entsize != ~0UL
>   || module_init_layout_section(sname))
>   continue;
> - s->sh_entsize = get_offset(mod, >core_layout.size, 
> s, i);
> + if (m)
> + s->sh_entsize = get_offset(mod, 
> >data_layout.size, s, i);
> + else
> + s->sh_entsize = get_offset(mod, 
> >core_layout.size, s, i);
>   pr_debug("\t%s\n", sname);

Huh why is this branching here, given you just used mod->data_layout in
all other areas?

> @@ -3468,6 +3474,8 @@ static int move_module(struct module *mod, struct 
> load_info *info)
>   if (shdr->sh_entsize & INIT_OFFSET_MASK)
>   dest = mod->init_layout.base
>   + (shdr->sh_entsize & ~INIT_OFFSET_MASK);
> + else if (!(shdr->sh_flags & SHF_EXECINSTR))
> + dest = mod->data_layout.base + shdr->sh_entsize;
>   else
>   dest = mod->core_layout.base + shdr->sh_entsize;
>  

Likewise here.

  Luis


Re: [PATCH v2 5/5] powerpc: Select ARCH_WANTS_MODULES_DATA_IN_VMALLOC on book3s/32 and 8xx

2022-02-02 Thread Luis Chamberlain
On Thu, Jan 27, 2022 at 11:28:12AM +, Christophe Leroy wrote:
> book3s/32 and 8xx have a separate area for allocating modules,
> defined by MODULES_VADDR / MODULES_END.
> 
> On book3s/32, it is not possible to protect against execution
> on a page basis. A full 256M segment is either Exec or NoExec.
> The module area is in an Exec segment while vmalloc area is
> in a NoExec segment.
> 
> In order to protect module data against execution, select
> ARCH_WANTS_MODULES_DATA_IN_VMALLOC.
> 
> For the 8xx (and possibly other 32 bits platform in the future),
> there is no such constraint on Exec/NoExec protection, however
> there is a critical distance between kernel functions and callers
> that needs to remain below 32Mbytes in order to avoid costly
> trampolines. By allocating data outside of module area, we
> increase the chance for module text to remain within acceptable
> distance from kernel core text.
> 
> So select ARCH_WANTS_MODULES_DATA_IN_VMALLOC for 8xx as well.
> 
> Signed-off-by: Christophe Leroy 
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 

Cc list first and then the SOB.

  Luis


Re: [PATCH 6/7] modules: Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC

2022-02-02 Thread Luis Chamberlain
On Wed, Jan 26, 2022 at 06:38:30AM +, Christophe Leroy wrote:
> 
> 
> Le 25/01/2022 à 22:10, Luis Chamberlain a écrit :
> > On Mon, Jan 24, 2022 at 09:22:34AM +, Christophe Leroy wrote:
> >> This can also be useful on other powerpc/32 in order to maximize the
> >> chance of code being close enough to kernel core to avoid branch
> >> trampolines.
> > 
> > Curious about all this branch trampoline talk. Do you have data to show
> > negative impact with things as-is?
> 
> See 
> https://github.com/linuxppc/linux/commit/2ec13df167040cd153c25c4d96d0ffc573ac4c40
> 
> Or 
> https://github.com/linuxppc/linux/commit/7d485f647c1f4a6976264c90447fb0dbf07b111d


This was useful and fun to read, thanks.

> > Also, was powerpc/32 broken then without this? The commit log seems to
> > suggest so, but I don't think that's the case. How was this issue noticed?
> 
> 
> Your question is related to the trampoline topic or the exec/noexec 
> flagging ?
> 
> Regarding trampoline, everything is working OK. That's just cherry on 
> the cake, when putting data away you can have more code closer to the 
> kernel. But that would not have been a reason in itself for this series.
> 
> Regarding the exec/noexec discussion, it's a real issue. powerpc/32 
> doesn't honor page exec flag, so when you select STRICT_MODULES_RWX and 
> flag module data as no-exec, it remains executable. That's because 
> powerpc/32 MMU doesn't have a per page exec flag but only a per 
> 256Mbytes segment exec flag.
> 
> Typical PPC32 layount:
> 0xf000-0x : VMALLOC AREA ==> NO EXEC
> 0xc000-0xefff : Linear kernel memory mapping
> 0xb000-0xbfff : MODULES AREA ==> EXEC
> 0x-0xafff : User space ==> EXEC
> 
> So STRICT_MODULES_RWX is broken on some powerpc/32

You know, this is the sort of information that I think would be
very useful for the commit log. Can you ammend?

> > 
> > Are there other future CPU families being planned where this is all true for
> > as well? Are they goin to be 32-bit as well?
> 
> Future I don't know.
> 
> Regarding the trampoline stuff, I see at least the following existing 
> architectures with a similar constraint:
> 
> ARM:
> 
> https://elixir.bootlin.com/linux/v5.16/source/arch/arm/include/asm/memory.h#L55
> 
> ARM even has a config item to allow trampolines or not. I might add the 
> same to powerpc to reduce number of pages used by modules.
> 
> https://elixir.bootlin.com/linux/v5.16/source/arch/arm/Kconfig#L1514
> 
> NDS32 has the constraint
> 
> https://elixir.bootlin.com/linux/v5.16/source/arch/nds32/include/asm/memory.h#L41
> 
> NIOS2 has the constraint, allthough they handled it in a different way:
> 
> https://elixir.bootlin.com/linux/v5.16/source/arch/nios2/kernel/module.c#L30
> 
> 
> 
> Even ARM64 benefits from modules closer to kernel:
> 
> https://elixir.bootlin.com/linux/v5.16/source/arch/arm64/Kconfig#L1848
> 
> 
> Another future opportunity with the ability to allocate module parts 
> separately is the possibility to then use huge vmalloc mappings.
> 
> Today huge vmalloc mappings cannot be used for modules, see recent 
> discussion at 
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20211227145903.187152-4-wangkefeng.w...@huawei.com/

Alrighty, this is sufficient information, thanks!

  Luis


Re: [PATCH 6/7] modules: Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC

2022-01-25 Thread Luis Chamberlain
On Mon, Jan 24, 2022 at 09:22:34AM +, Christophe Leroy wrote:
> This can also be useful on other powerpc/32 in order to maximize the
> chance of code being close enough to kernel core to avoid branch
> trampolines.

Curious about all this branch trampoline talk. Do you have data to show
negative impact with things as-is?

Also, was powerpc/32 broken then without this? The commit log seems to
suggest so, but I don't think that's the case. How was this issue noticed?

Are there other future CPU families being planned where this is all true for
as well? Are they goin to be 32-bit as well?

  Luis


Re: [PATCH 0/7] Allocate module text and data separately

2022-01-25 Thread Luis Chamberlain
On Mon, Jan 24, 2022 at 09:22:11AM +, Christophe Leroy wrote:
> This series allow architectures to request having modules data in
> vmalloc area instead of module area.
> 
> This is required on powerpc book3s/32 in order to set data non
> executable, because it is not possible to set executability on page
> basis, this is done per 256 Mbytes segments. The module area has exec
> right, vmalloc area has noexec.
> 
> This can also be useful on other powerpc/32 in order to maximize the
> chance of code being close enough to kernel core to avoid branch
> trampolines.

Am I understanding that this entire effort is for 32-bit powerpc?
If so, why such an interest in 32-bit these days?

  Luis


Re: [PATCH v5 0/6] KEXEC_SIG with appended signature

2022-01-25 Thread Luis Chamberlain
On Tue, Jan 11, 2022 at 12:37:42PM +0100, Michal Suchanek wrote:
> Hello,
> 
> This is a refresh of the KEXEC_SIG series.
> 
> This adds KEXEC_SIG support on powerpc and deduplicates the code dealing
> with appended signatures in the kernel.
> 
> powerpc supports IMA_KEXEC but that's an exception rather than the norm.
> On the other hand, KEXEC_SIG is portable across platforms.
> 
> For distributions to have uniform security features across platforms one
> option should be used on all platforms.
> 
> Thanks
> 
> Michal
> 
> Previous revision: 
> https://lore.kernel.org/linuxppc-dev/cover.1637862358.git.msucha...@suse.de/
> Patched kernel tree: https://github.com/hramrach/kernel/tree/kexec_sig
> 
> Michal Suchanek (6):
>   s390/kexec_file: Don't opencode appended signature check.
>   powerpc/kexec_file: Add KEXEC_SIG support.
>   kexec_file: Don't opencode appended signature verification.
>   module: strip the signature marker in the verification function.
>   module: Use key_being_used_for for log messages in
> verify_appended_signature
>   module: Move duplicate mod_check_sig users code to mod_parse_sig

What tree should this go through? I'd prefer if over through modules
tree as it can give a chance for Aaron Tomlin to work with this for his
code refactoring of kernel/module*.c to kernel/module/

  Luis


Re: [PATCH v5 6/6] module: Move duplicate mod_check_sig users code to mod_parse_sig

2022-01-25 Thread Luis Chamberlain
On Tue, Jan 11, 2022 at 12:37:48PM +0100, Michal Suchanek wrote:
> Multiple users of mod_check_sig check for the marker, then call
> mod_check_sig, extract signature length, and remove the signature.
> 
> Put this code in one place together with mod_check_sig.
> 
> This changes the error from ENOENT to ENODATA for ima_read_modsig in the
> case the signature marker is missing.
> 
> This also changes the buffer length in ima_read_modsig from size_t to
> unsigned long. This reduces the possible value range on 32bit but the
> length refers to kernel in-memory buffer which cannot be longer than
> ULONG_MAX.
> 
> Also change mod_check_sig to unsigned long while at it.
> 
> Signed-off-by: Michal Suchanek 

Reviewed-by: Luis Chamberlain 

  Luis


Re: [PATCH v5 5/6] module: Use key_being_used_for for log messages in verify_appended_signature

2022-01-25 Thread Luis Chamberlain
On Tue, Jan 11, 2022 at 12:37:47PM +0100, Michal Suchanek wrote:
> Add value for kexec appended signature and pass in key_being_used_for
> enum rather than a string to verify_appended_signature to produce log
> messages about the signature.
> 
> Signed-off-by: Michal Suchanek 

Acked-by: Luis Chamberlain 

  Luis


Re: [PATCH v5 4/6] module: strip the signature marker in the verification function.

2022-01-25 Thread Luis Chamberlain
On Tue, Jan 11, 2022 at 12:37:46PM +0100, Michal Suchanek wrote:
> It is stripped by each caller separately.
> 
> Note: this changes the error for kexec_file from EKEYREJECTED to ENODATA
> when the signature marker is missing.
> 
> Signed-off-by: Michal Suchanek 
> ---
> v3: - Philipp Rudo : Update the commit with note about
>   change of raturn value
> - the module_signature.h is now no longer needed for kexec_file

Reviewed-by: Luis Chamberlain 

  Luis


Re: [PATCH v5 3/6] kexec_file: Don't opencode appended signature verification.

2022-01-25 Thread Luis Chamberlain
On Tue, Jan 11, 2022 at 12:37:45PM +0100, Michal Suchanek wrote:
> diff --git a/include/linux/verification.h b/include/linux/verification.h
> index a655923335ae..32db9287a7b0 100644
> --- a/include/linux/verification.h
> +++ b/include/linux/verification.h
> @@ -60,5 +60,8 @@ extern int verify_pefile_signature(const void *pebuf, 
> unsigned pelen,
>  enum key_being_used_for usage);
>  #endif
>  
> +int verify_appended_signature(const void *data, unsigned long *len,
> +   struct key *trusted_keys, const char *what);
> +

Looks very non-module specific.

> diff --git a/kernel/module_signing.c b/kernel/module_signing.c
> index 8723ae70ea1f..30149969f21f 100644
> --- a/kernel/module_signing.c
> +++ b/kernel/module_signing.c
> @@ -14,32 +14,38 @@
>  #include 
>  #include "module-internal.h"
>  
> -/*
> - * Verify the signature on a module.
> +/**
> + * verify_appended_signature - Verify the signature on a module with the
> + * signature marker stripped.
> + * @data: The data to be verified
> + * @len: Size of @data.
> + * @trusted_keys: Keyring to use for verification
> + * @what: Informational string for log messages
>   */
> -int mod_verify_sig(const void *mod, struct load_info *info)
> +int verify_appended_signature(const void *data, unsigned long *len,
> +   struct key *trusted_keys, const char *what)
>  {
> - struct module_signature ms;
> - size_t sig_len, modlen = info->len;
> + struct module_signature *ms;

There goes the abstraction, so why not make this clear where we re-use
the struct module_signature for various things and call it as it is,
verify_mod_appended_signature() or some such?

David? Any preference?

Other than that:

Reviewed-by: Luis Chamberlain 

  Luis


Re: [PATCH v4 1/6] s390/kexec_file: Don't opencode appended signature check.

2022-01-25 Thread Luis Chamberlain
On Mon, Jan 10, 2022 at 02:49:53PM +0100, Michal Suchanek wrote:
> Module verification already implements appeded signature check.
> 
> Reuse it for kexec_file.
> 
> The kexec_file implementation uses EKEYREJECTED error in some cases when
> there is no key and the common implementation uses ENOPKG or EBADMSG
> instead.
> 
> Signed-off-by: Michal Suchanek 
> Acked-by: Heiko Carstens 
> ---
> v3: Philipp Rudo : Update the commit with note about
> change of return value
> ---
>  arch/s390/kernel/machine_kexec_file.c | 22 +-
>  1 file changed, 5 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/s390/kernel/machine_kexec_file.c 
> b/arch/s390/kernel/machine_kexec_file.c
> index 8f43575a4dd3..c944d71316c7 100644
> --- a/arch/s390/kernel/machine_kexec_file.c
> +++ b/arch/s390/kernel/machine_kexec_file.c
> @@ -31,6 +31,7 @@ int s390_verify_sig(const char *kernel, unsigned long 
> kernel_len)
>   const unsigned long marker_len = sizeof(MODULE_SIG_STRING) - 1;
>   struct module_signature *ms;
>   unsigned long sig_len;
> + int ret;
>  
>   /* Skip signature verification when not secure IPLed. */
>   if (!ipl_secure_flag)
> @@ -45,25 +46,12 @@ int s390_verify_sig(const char *kernel, unsigned long 
> kernel_len)
>   kernel_len -= marker_len;
>  
>   ms = (void *)kernel + kernel_len - sizeof(*ms);
> - kernel_len -= sizeof(*ms);
> + ret = mod_check_sig(ms, kernel_len, "kexec");
> + if (ret)
> + return ret;
>  
>   sig_len = be32_to_cpu(ms->sig_len);
> - if (sig_len >= kernel_len)
> - return -EKEYREJECTED;

There is a small minor fix here, where by using mod_check_sig() now
decreased the kernel_len by the sizeof(*ms). It is minor though.

> - kernel_len -= sig_len;
> -
> - if (ms->id_type != PKEY_ID_PKCS7)
> - return -EKEYREJECTED;

More importantly is the return value used here changes but given the
Ack by Heiko I suspect this if fine and does not break old userspace,
the only change here is the possible error value returned by the
kexec_file_load() system call.

Reviewed-by: Luis Chamberlain 

   Luis


Re: [PATCH v2 6/8] inotify: simplify subdirectory registration with register_sysctl()

2021-11-24 Thread Luis Chamberlain
On Wed, Nov 24, 2021 at 10:44:09AM +0100, Jan Kara wrote:
> On Tue 23-11-21 12:24:20, Luis Chamberlain wrote:
> > From: Xiaoming Ni 
> > 
> > There is no need to user boiler plate code to specify a set of base
> > directories we're going to stuff sysctls under. Simplify this by using
> > register_sysctl() and specifying the directory path directly.
> > 
> > Move inotify_user sysctl to inotify_user.c while at it to remove clutter
> > from kernel/sysctl.c.
> > 
> > Signed-off-by: Xiaoming Ni 
> > [mcgrof: update commit log to reflect new path we decided to take]
> > Signed-off-by: Luis Chamberlain 
> 
> This looks fishy. You register inotify_table but not fanotify_table and
> remove both...

Indeed, the following was missing, I'll roll it in:

diff --git a/fs/notify/fanotify/fanotify_user.c 
b/fs/notify/fanotify/fanotify_user.c
index 559bc1e9926d..a35693eb1f36 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -59,7 +59,7 @@ static int fanotify_max_queued_events __read_mostly;
 static long ft_zero = 0;
 static long ft_int_max = INT_MAX;
 
-struct ctl_table fanotify_table[] = {
+static struct ctl_table fanotify_table[] = {
{
.procname   = "max_user_groups",
.data   = _user_ns.ucount_max[UCOUNT_FANOTIFY_GROUPS],
@@ -88,6 +88,13 @@ struct ctl_table fanotify_table[] = {
},
{ }
 };
+
+static void __init fanotify_sysctls_init(void)
+{
+   register_sysctl("fs/fanotify", fanotify_table);
+}
+#else
+#define fanotify_sysctls_init() do { } while (0)
 #endif /* CONFIG_SYSCTL */
 
 /*
@@ -1685,6 +1692,7 @@ static int __init fanotify_user_setup(void)
init_user_ns.ucount_max[UCOUNT_FANOTIFY_GROUPS] =
FANOTIFY_DEFAULT_MAX_GROUPS;
init_user_ns.ucount_max[UCOUNT_FANOTIFY_MARKS] = max_marks;
+   fanotify_sysctls_init();
 
return 0;
 }
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 616af2ea20f3..556cc63c88ee 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -5,8 +5,6 @@
 #include 
 #include 
 
-extern struct ctl_table fanotify_table[]; /* for sysctl */
-
 #define FAN_GROUP_FLAG(group, flag) \
((group)->fanotify_data.flags & (flag))
 


[PATCH v2 2/8] i915: simplify subdirectory registration with register_sysctl()

2021-11-23 Thread Luis Chamberlain
There is no need to user boiler plate code to specify a set of base
directories we're going to stuff sysctls under. Simplify this by using
register_sysctl() and specifying the directory path directly.

// pycocci sysctl-subdir-register-sysctl-simplify.cocci PATH

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
-   {
-   .procname = E1,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = sysctls,
-   },
-   { }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
-   {
-   .procname = E2,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = subdir,
-   },
-   { }
-};

@initialize:python@
@@

def make_my_fresh_expression(s1, s2):
  return '"' + s1.strip('"') + "/" + s2.strip('"') + '"'

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
fresh identifier E3 = script:python(E2, E1) { make_my_fresh_expression(E2, E1) 
};
@@

header =
-register_sysctl_table(base);
+register_sysctl(E3, sysctls);

Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain 
---
 drivers/gpu/drm/i915/i915_perf.c | 22 +-
 1 file changed, 1 insertion(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 2f01b8c0284c..5979e3258647 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -4273,26 +4273,6 @@ static struct ctl_table oa_table[] = {
{}
 };
 
-static struct ctl_table i915_root[] = {
-   {
-.procname = "i915",
-.maxlen = 0,
-.mode = 0555,
-.child = oa_table,
-},
-   {}
-};
-
-static struct ctl_table dev_root[] = {
-   {
-.procname = "dev",
-.maxlen = 0,
-.mode = 0555,
-.child = i915_root,
-},
-   {}
-};
-
 static void oa_init_supported_formats(struct i915_perf *perf)
 {
struct drm_i915_private *i915 = perf->i915;
@@ -4488,7 +4468,7 @@ static int destroy_config(int id, void *p, void *data)
 
 int i915_perf_sysctl_register(void)
 {
-   sysctl_header = register_sysctl_table(dev_root);
+   sysctl_header = register_sysctl("dev/i915", oa_table);
return 0;
 }
 
-- 
2.33.0



[PATCH v2 0/8] sysctl: second set of kernel/sysctl cleanups

2021-11-23 Thread Luis Chamberlain
This is the 2nd set of kernel/sysctl.c cleanups. The diff stat should
reflect how this is a much better way to deal with theses. Fortunately
coccinelle can be used to ensure correctness for most of these and/or
future merge conflicts.

Note that since this is part of a larger effort to cleanup
kernel/sysctl.c I think we have no other option but to go with
merging these patches in either Andrew's tree or keep them staged
in a separate tree and send a merge request later. Otherwise
kernel/sysctl.c will end up becoming a sore spot for the next
merge window.

Changes in this v2:

 * As suggested by Eric W. Biederman I dropped the subdir new call
   and just used the register_sysctl() by specifying the parent
   directory.
 * 0-day cleanups, commit log enhancements
 * Updated the coccinelle patch with register_sysctl()

Luis Chamberlain (6):
  hpet: simplify subdirectory registration with register_sysctl()
  i915: simplify subdirectory registration with register_sysctl()
  macintosh/mac_hid.c: simplify subdirectory registration with
register_sysctl()
  ocfs2: simplify subdirectory registration with register_sysctl()
  test_sysctl: simplify subdirectory registration with register_sysctl()
  cdrom: simplify subdirectory registration with register_sysctl()

Xiaoming Ni (2):
  inotify: simplify subdirectory registration with register_sysctl()
  eventpoll: simplify sysctl declaration with register_sysctl()

 drivers/cdrom/cdrom.c| 23 +--
 drivers/char/hpet.c  | 22 +-
 drivers/gpu/drm/i915/i915_perf.c | 22 +-
 drivers/macintosh/mac_hid.c  | 24 +---
 fs/eventpoll.c   | 10 +-
 fs/notify/inotify/inotify_user.c | 11 ++-
 fs/ocfs2/stackglue.c | 25 +
 include/linux/inotify.h  |  3 ---
 include/linux/poll.h |  2 --
 include/linux/sysctl.h   |  1 -
 kernel/sysctl.c  | 28 
 lib/test_sysctl.c| 22 +-
 12 files changed, 25 insertions(+), 168 deletions(-)

-- 
2.33.0



[PATCH v2 6/8] inotify: simplify subdirectory registration with register_sysctl()

2021-11-23 Thread Luis Chamberlain
From: Xiaoming Ni 

There is no need to user boiler plate code to specify a set of base
directories we're going to stuff sysctls under. Simplify this by using
register_sysctl() and specifying the directory path directly.

Move inotify_user sysctl to inotify_user.c while at it to remove clutter
from kernel/sysctl.c.

Signed-off-by: Xiaoming Ni 
[mcgrof: update commit log to reflect new path we decided to take]
Signed-off-by: Luis Chamberlain 
---
 fs/notify/inotify/inotify_user.c | 11 ++-
 include/linux/inotify.h  |  3 ---
 kernel/sysctl.c  | 21 -
 3 files changed, 10 insertions(+), 25 deletions(-)

diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 29fca3284bb5..54583f62dc44 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -58,7 +58,7 @@ struct kmem_cache *inotify_inode_mark_cachep __read_mostly;
 static long it_zero = 0;
 static long it_int_max = INT_MAX;
 
-struct ctl_table inotify_table[] = {
+static struct ctl_table inotify_table[] = {
{
.procname   = "max_user_instances",
.data   = 
_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES],
@@ -87,6 +87,14 @@ struct ctl_table inotify_table[] = {
},
{ }
 };
+
+static void __init inotify_sysctls_init(void)
+{
+   register_sysctl("fs/inotify", inotify_table);
+}
+
+#else
+#define inotify_sysctls_init() do { } while (0)
 #endif /* CONFIG_SYSCTL */
 
 static inline __u32 inotify_arg_to_mask(struct inode *inode, u32 arg)
@@ -849,6 +857,7 @@ static int __init inotify_user_setup(void)
inotify_max_queued_events = 16384;
init_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES] = 128;
init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = watches_max;
+   inotify_sysctls_init();
 
return 0;
 }
diff --git a/include/linux/inotify.h b/include/linux/inotify.h
index 6a24905f6e1e..8d20caa1b268 100644
--- a/include/linux/inotify.h
+++ b/include/linux/inotify.h
@@ -7,11 +7,8 @@
 #ifndef _LINUX_INOTIFY_H
 #define _LINUX_INOTIFY_H
 
-#include 
 #include 
 
-extern struct ctl_table inotify_table[]; /* for sysctl */
-
 #define ALL_INOTIFY_BITS (IN_ACCESS | IN_MODIFY | IN_ATTRIB | IN_CLOSE_WRITE | 
\
  IN_CLOSE_NOWRITE | IN_OPEN | IN_MOVED_FROM | \
  IN_MOVED_TO | IN_CREATE | IN_DELETE | \
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 7a90a12b9ea4..6aa67c737e4e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -125,13 +125,6 @@ static const int maxolduid = 65535;
 static const int ngroups_max = NGROUPS_MAX;
 static const int cap_last_cap = CAP_LAST_CAP;
 
-#ifdef CONFIG_INOTIFY_USER
-#include 
-#endif
-#ifdef CONFIG_FANOTIFY
-#include 
-#endif
-
 #ifdef CONFIG_PROC_SYSCTL
 
 /**
@@ -3099,20 +3092,6 @@ static struct ctl_table fs_table[] = {
.proc_handler   = proc_dointvec,
},
 #endif
-#ifdef CONFIG_INOTIFY_USER
-   {
-   .procname   = "inotify",
-   .mode   = 0555,
-   .child  = inotify_table,
-   },
-#endif
-#ifdef CONFIG_FANOTIFY
-   {
-   .procname   = "fanotify",
-   .mode   = 0555,
-   .child  = fanotify_table,
-   },
-#endif
 #ifdef CONFIG_EPOLL
{
.procname   = "epoll",
-- 
2.33.0



[PATCH v2 5/8] test_sysctl: simplify subdirectory registration with register_sysctl()

2021-11-23 Thread Luis Chamberlain
There is no need to user boiler plate code to specify a set of base
directories we're going to stuff sysctls under. Simplify this by using
register_sysctl() and specifying the directory path directly.

// pycocci sysctl-subdir-register-sysctl-simplify.cocci lib/test_sysctl.c

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
-   {
-   .procname = E1,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = sysctls,
-   },
-   { }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
-   {
-   .procname = E2,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = subdir,
-   },
-   { }
-};

@initialize:python@
@@

def make_my_fresh_expression(s1, s2):
  return '"' + s1.strip('"') + "/" + s2.strip('"') + '"'

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
fresh identifier E3 = script:python(E2, E1) { make_my_fresh_expression(E2, E1) 
};
@@

header =
-register_sysctl_table(base);
+register_sysctl(E3, sysctls);

Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain 
---
 lib/test_sysctl.c | 22 +-
 1 file changed, 1 insertion(+), 21 deletions(-)

diff --git a/lib/test_sysctl.c b/lib/test_sysctl.c
index 3750323973f4..a5a3d6c27e1f 100644
--- a/lib/test_sysctl.c
+++ b/lib/test_sysctl.c
@@ -128,26 +128,6 @@ static struct ctl_table test_table[] = {
{ }
 };
 
-static struct ctl_table test_sysctl_table[] = {
-   {
-   .procname   = "test_sysctl",
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = test_table,
-   },
-   { }
-};
-
-static struct ctl_table test_sysctl_root_table[] = {
-   {
-   .procname   = "debug",
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = test_sysctl_table,
-   },
-   { }
-};
-
 static struct ctl_table_header *test_sysctl_header;
 
 static int __init test_sysctl_init(void)
@@ -155,7 +135,7 @@ static int __init test_sysctl_init(void)
test_data.bitmap_0001 = kzalloc(SYSCTL_TEST_BITMAP_SIZE/8, GFP_KERNEL);
if (!test_data.bitmap_0001)
return -ENOMEM;
-   test_sysctl_header = register_sysctl_table(test_sysctl_root_table);
+   test_sysctl_header = register_sysctl("debug/test_sysctl", test_table);
if (!test_sysctl_header) {
kfree(test_data.bitmap_0001);
return -ENOMEM;
-- 
2.33.0



[PATCH v2 8/8] eventpoll: simplify sysctl declaration with register_sysctl()

2021-11-23 Thread Luis Chamberlain
From: Xiaoming Ni 

The kernel/sysctl.c is a kitchen sink where everyone leaves
their dirty dishes, this makes it very difficult to maintain.

To help with this maintenance let's start by moving sysctls to
places where they actually belong. The proc sysctl maintainers
do not want to know what sysctl knobs you wish to add for your own
piece of code, we just care about the core logic.

So move the epoll_table sysctl to fs/eventpoll.c and use
use register_sysctl().

Signed-off-by: Xiaoming Ni 
Signed-off-by: Luis Chamberlain 
---
 fs/eventpoll.c | 10 +-
 include/linux/poll.h   |  2 --
 include/linux/sysctl.h |  1 -
 kernel/sysctl.c|  7 ---
 4 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 06f4c5ae1451..e2daa940ebce 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -307,7 +307,7 @@ static void unlist_file(struct epitems_head *head)
 static long long_zero;
 static long long_max = LONG_MAX;
 
-struct ctl_table epoll_table[] = {
+static struct ctl_table epoll_table[] = {
{
.procname   = "max_user_watches",
.data   = _user_watches,
@@ -319,6 +319,13 @@ struct ctl_table epoll_table[] = {
},
{ }
 };
+
+static void __init epoll_sysctls_init(void)
+{
+   register_sysctl("fs/epoll", epoll_table);
+}
+#else
+#define epoll_sysctls_init() do { } while (0)
 #endif /* CONFIG_SYSCTL */
 
 static const struct file_operations eventpoll_fops;
@@ -2378,6 +2385,7 @@ static int __init eventpoll_init(void)
/* Allocates slab cache used to allocate "struct eppoll_entry" */
pwq_cache = kmem_cache_create("eventpoll_pwq",
sizeof(struct eppoll_entry), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL);
+   epoll_sysctls_init();
 
ephead_cache = kmem_cache_create("ep_head",
sizeof(struct epitems_head), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL);
diff --git a/include/linux/poll.h b/include/linux/poll.h
index 1cdc32b1f1b0..a9e0e1c2d1f2 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -8,12 +8,10 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
 
-extern struct ctl_table epoll_table[]; /* for sysctl */
 /* ~832 bytes of stack space used max in sys_select/sys_poll before allocating
additional memory. */
 #ifdef __clang__
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 718492057c70..5e0428a71899 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -218,7 +218,6 @@ extern int no_unaligned_warning;
 extern struct ctl_table sysctl_mount_point[];
 extern struct ctl_table random_table[];
 extern struct ctl_table firmware_config_table[];
-extern struct ctl_table epoll_table[];
 
 #else /* CONFIG_SYSCTL */
 static inline struct ctl_table_header *register_sysctl_table(struct ctl_table 
* table)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 6aa67c737e4e..b09ff41720e3 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -3092,13 +3092,6 @@ static struct ctl_table fs_table[] = {
.proc_handler   = proc_dointvec,
},
 #endif
-#ifdef CONFIG_EPOLL
-   {
-   .procname   = "epoll",
-   .mode   = 0555,
-   .child  = epoll_table,
-   },
-#endif
 #endif
{
.procname   = "protected_symlinks",
-- 
2.33.0



[PATCH v2 1/8] hpet: simplify subdirectory registration with register_sysctl()

2021-11-23 Thread Luis Chamberlain
There is no need to user boiler plate code to specify a set of base
directories we're going to stuff sysctls under. Simplify this by using
register_sysctl() and specifying the directory path directly.

// pycocci sysctl-subdir-register-sysctl-simplify.cocci drivers/char/hpet.c

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
-   {
-   .procname = E1,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = sysctls,
-   },
-   { }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
-   {
-   .procname = E2,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = subdir,
-   },
-   { }
-};

@initialize:python@
@@

def make_my_fresh_expression(s1, s2):
  return '"' + s1.strip('"') + "/" + s2.strip('"') + '"'

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
fresh identifier E3 = script:python(E2, E1) { make_my_fresh_expression(E2, E1) 
};
@@

header =
-register_sysctl_table(base);
+register_sysctl(E3, sysctls);

Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain 
---
 drivers/char/hpet.c | 22 +-
 1 file changed, 1 insertion(+), 21 deletions(-)

diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
index 4e5431f01450..563dfae3b8da 100644
--- a/drivers/char/hpet.c
+++ b/drivers/char/hpet.c
@@ -746,26 +746,6 @@ static struct ctl_table hpet_table[] = {
{}
 };
 
-static struct ctl_table hpet_root[] = {
-   {
-.procname = "hpet",
-.maxlen = 0,
-.mode = 0555,
-.child = hpet_table,
-},
-   {}
-};
-
-static struct ctl_table dev_root[] = {
-   {
-.procname = "dev",
-.maxlen = 0,
-.mode = 0555,
-.child = hpet_root,
-},
-   {}
-};
-
 static struct ctl_table_header *sysctl_header;
 
 /*
@@ -1061,7 +1041,7 @@ static int __init hpet_init(void)
if (result < 0)
return -ENODEV;
 
-   sysctl_header = register_sysctl_table(dev_root);
+   sysctl_header = register_sysctl("dev/hpet", hpet_table);
 
result = acpi_bus_register_driver(_acpi_driver);
if (result < 0) {
-- 
2.33.0



[PATCH v2 7/8] cdrom: simplify subdirectory registration with register_sysctl()

2021-11-23 Thread Luis Chamberlain
There is no need to user boiler plate code to specify a set of base
directories we're going to stuff sysctls under. Simplify this by using
register_sysctl() and specifying the directory path directly.

// pycocci sysctl-subdir-register-sysctl-simplify.cocci PATH

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
-   {
-   .procname = E1,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = sysctls,
-   },
-   { }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
-   {
-   .procname = E2,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = subdir,
-   },
-   { }
-};

@initialize:python@
@@

def make_my_fresh_expression(s1, s2):
  return '"' + s1.strip('"') + "/" + s2.strip('"') + '"'

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
fresh identifier E3 = script:python(E2, E1) { make_my_fresh_expression(E2, E1) 
};
@@

header =
-register_sysctl_table(base);
+register_sysctl(E3, sysctls);

Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain 
---
 drivers/cdrom/cdrom.c | 23 +--
 1 file changed, 1 insertion(+), 22 deletions(-)

diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index 9877e413fce3..1b57d4666e43 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -3691,27 +3691,6 @@ static struct ctl_table cdrom_table[] = {
},
{ }
 };
-
-static struct ctl_table cdrom_cdrom_table[] = {
-   {
-   .procname   = "cdrom",
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = cdrom_table,
-   },
-   { }
-};
-
-/* Make sure that /proc/sys/dev is there */
-static struct ctl_table cdrom_root_table[] = {
-   {
-   .procname   = "dev",
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = cdrom_cdrom_table,
-   },
-   { }
-};
 static struct ctl_table_header *cdrom_sysctl_header;
 
 static void cdrom_sysctl_register(void)
@@ -3721,7 +3700,7 @@ static void cdrom_sysctl_register(void)
if (!atomic_add_unless(, 1, 1))
return;
 
-   cdrom_sysctl_header = register_sysctl_table(cdrom_root_table);
+   cdrom_sysctl_header = register_sysctl("dev/cdrom", cdrom_table);
 
/* set the defaults */
cdrom_sysctl_settings.autoclose = autoclose;
-- 
2.33.0



[PATCH v2 3/8] macintosh/mac_hid.c: simplify subdirectory registration with register_sysctl()

2021-11-23 Thread Luis Chamberlain
There is no need to user boiler plate code to specify a set of base
directories we're going to stuff sysctls under. Simplify this by using
register_sysctl() and specifying the directory path directly.

// pycocci sysctl-subdir-register-sysctl-simplify.cocci PATH

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
-   {
-   .procname = E1,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = sysctls,
-   },
-   { }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
-   {
-   .procname = E2,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = subdir,
-   },
-   { }
-};

@initialize:python@
@@

def make_my_fresh_expression(s1, s2):
  return '"' + s1.strip('"') + "/" + s2.strip('"') + '"'

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
fresh identifier E3 = script:python(E2, E1) { make_my_fresh_expression(E2, E1) 
};
@@

header =
-register_sysctl_table(base);
+register_sysctl(E3, sysctls);

Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain 
---
 drivers/macintosh/mac_hid.c | 24 +---
 1 file changed, 1 insertion(+), 23 deletions(-)

diff --git a/drivers/macintosh/mac_hid.c b/drivers/macintosh/mac_hid.c
index 28b8581b44dd..d8c4d5664145 100644
--- a/drivers/macintosh/mac_hid.c
+++ b/drivers/macintosh/mac_hid.c
@@ -239,33 +239,11 @@ static struct ctl_table mac_hid_files[] = {
{ }
 };
 
-/* dir in /proc/sys/dev */
-static struct ctl_table mac_hid_dir[] = {
-   {
-   .procname   = "mac_hid",
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = mac_hid_files,
-   },
-   { }
-};
-
-/* /proc/sys/dev itself, in case that is not there yet */
-static struct ctl_table mac_hid_root_dir[] = {
-   {
-   .procname   = "dev",
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = mac_hid_dir,
-   },
-   { }
-};
-
 static struct ctl_table_header *mac_hid_sysctl_header;
 
 static int __init mac_hid_init(void)
 {
-   mac_hid_sysctl_header = register_sysctl_table(mac_hid_root_dir);
+   mac_hid_sysctl_header = register_sysctl("dev/mac_hid", mac_hid_files);
if (!mac_hid_sysctl_header)
return -ENOMEM;
 
-- 
2.33.0



[PATCH v2 4/8] ocfs2: simplify subdirectory registration with register_sysctl()

2021-11-23 Thread Luis Chamberlain
There is no need to user boiler plate code to specify a set of base
directories we're going to stuff sysctls under. Simplify this by using
register_sysctl() and specifying the directory path directly.

// pycocci sysctl-subdir-register-sysctl-simplify.cocci PATH

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
-   {
-   .procname = E1,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = sysctls,
-   },
-   { }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
-   {
-   .procname = E2,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = subdir,
-   },
-   { }
-};

@initialize:python@
@@

def make_my_fresh_expression(s1, s2):
  return '"' + s1.strip('"') + "/" + s2.strip('"') + '"'

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
fresh identifier E3 = script:python(E2, E1) { make_my_fresh_expression(E2, E1) 
};
@@

header =
-register_sysctl_table(base);
+register_sysctl(E3, sysctls);

Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain 
---
 fs/ocfs2/stackglue.c | 25 +
 1 file changed, 1 insertion(+), 24 deletions(-)

diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index 16f1bfc407f2..731558a6f27d 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -672,31 +672,8 @@ static struct ctl_table ocfs2_mod_table[] = {
{ }
 };
 
-static struct ctl_table ocfs2_kern_table[] = {
-   {
-   .procname   = "ocfs2",
-   .data   = NULL,
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = ocfs2_mod_table
-   },
-   { }
-};
-
-static struct ctl_table ocfs2_root_table[] = {
-   {
-   .procname   = "fs",
-   .data   = NULL,
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = ocfs2_kern_table
-   },
-   { }
-};
-
 static struct ctl_table_header *ocfs2_table_header;
 
-
 /*
  * Initialization
  */
@@ -705,7 +682,7 @@ static int __init ocfs2_stack_glue_init(void)
 {
strcpy(cluster_stack_name, OCFS2_STACK_PLUGIN_O2CB);
 
-   ocfs2_table_header = register_sysctl_table(ocfs2_root_table);
+   ocfs2_table_header = register_sysctl("fs/ocfs2", ocfs2_mod_table);
if (!ocfs2_table_header) {
printk(KERN_ERR
   "ocfs2 stack glue: unable to register sysctl\n");
-- 
2.33.0



Re: [PATCH 12/13] sysctl: add helper to register empty subdir

2021-11-16 Thread Luis Chamberlain
On Fri, May 29, 2020 at 08:03:02AM -0500, Eric W. Biederman wrote:
> Luis Chamberlain  writes:
> 
> > The way to create a subdirectory from the base set of directories
> > is a bit obscure, so provide a helper which makes this clear, and
> > also helps remove boiler plate code required to do this work.
> 
> I agreee calling:
> register_sysctl("fs/binfmt_misc", sysctl_mount_point)
> is a bit obscure but if you are going to make a wrapper
> please make it the trivial one liner above.
> 
> Say something that looks like:
>   struct sysctl_header *register_sysctl_mount_point(const char *path)
> {
>   return register_sysctl(path, sysctl_mount_point);
> }
> 
> And yes please talk about a mount point and not an empty dir, as these
> are permanently empty directories to serve as mount points.  There are
> some subtle but important permission checks this allows in the case of
> unprivileged mounts.
> 
> Further code like this belong in proc_sysctl.c next to all of the code
> it is related to so that it is easier to see how to refactor the code if
> necessary.

Alrighty, it's been a while since this kernel/sysctl.c kitchen sink
cleanup... so it's time to respin this now that the merge window is
open.  I already rebased patches, addressed all input and now just
waiting to fix any compilation errors.  I'm going to split the patches
up into real small sets so to ensure we just get this through becauase
getting this in otherwise is going to be hard.

I'd appreciate folk's review once the patches start going out. I think
a hard part will be deciding what tree this should got through.

  Luis


Re: [PATCH 06/13] nvdimm/blk: avoid calling del_gendisk() on early failures

2021-11-03 Thread Luis Chamberlain
On Tue, Nov 02, 2021 at 07:28:02PM -0600, Jens Axboe wrote:
> On 11/2/21 6:49 PM, Dan Williams wrote:
> > On Tue, Nov 2, 2021 at 5:10 PM Luis Chamberlain  wrote:
> >>
> >> On Fri, Oct 15, 2021 at 05:13:48PM -0700, Dan Williams wrote:
> >>> On Fri, Oct 15, 2021 at 4:53 PM Luis Chamberlain  
> >>> wrote:
> >>>>
> >>>> If nd_integrity_init() fails we'd get del_gendisk() called,
> >>>> but that's not correct as we should only call that if we're
> >>>> done with device_add_disk(). Fix this by providing unwinding
> >>>> prior to the devm call being registered and moving the devm
> >>>> registration to the very end.
> >>>>
> >>>> This should fix calling del_gendisk() if nd_integrity_init()
> >>>> fails. I only spotted this issue through code inspection. It
> >>>> does not fix any real world bug.
> >>>>
> >>>
> >>> Just fyi, I'm preparing patches to delete this driver completely as it
> >>> is unused by any shipping platform. I hope to get that removal into
> >>> v5.16.
> >>
> >> Curious if are you going to nuking it on v5.16? Otherwise it would stand
> >> in the way of the last few patches to add __must_check for the final
> >> add_disk() error handling changes.
> > 
> > True, I don't think I can get it nuked in time, so you can add my
> > Reviewed-by for this one.
> 
> Luis, I lost track of the nv* patches from this discussion. If you want
> them in 5.16 and they are reviewed, please do resend and I'll pick them
> up for the middle-of-merge-window push.

Sure thing, I'll resend whatever is left. I also noticed for some reason
I forgot to convert nvdimm/pmem and so I'll roll those new patches in,
but I suspect that those might be too late unless we get them reviewed
in time.

  Luis


Re: [PATCH 06/13] nvdimm/blk: avoid calling del_gendisk() on early failures

2021-11-03 Thread Luis Chamberlain
On Tue, Nov 02, 2021 at 05:49:12PM -0700, Dan Williams wrote:
> On Tue, Nov 2, 2021 at 5:10 PM Luis Chamberlain  wrote:
> >
> > On Fri, Oct 15, 2021 at 05:13:48PM -0700, Dan Williams wrote:
> > > On Fri, Oct 15, 2021 at 4:53 PM Luis Chamberlain  
> > > wrote:
> > > >
> > > > If nd_integrity_init() fails we'd get del_gendisk() called,
> > > > but that's not correct as we should only call that if we're
> > > > done with device_add_disk(). Fix this by providing unwinding
> > > > prior to the devm call being registered and moving the devm
> > > > registration to the very end.
> > > >
> > > > This should fix calling del_gendisk() if nd_integrity_init()
> > > > fails. I only spotted this issue through code inspection. It
> > > > does not fix any real world bug.
> > > >
> > >
> > > Just fyi, I'm preparing patches to delete this driver completely as it
> > > is unused by any shipping platform. I hope to get that removal into
> > > v5.16.
> >
> > Curious if are you going to nuking it on v5.16? Otherwise it would stand
> > in the way of the last few patches to add __must_check for the final
> > add_disk() error handling changes.
> 
> True, I don't think I can get it nuked in time, so you can add my
> Reviewed-by for this one.

This patch required the previous patch in this series to also be
applied. Can I apply your Reviewed-by there too?

  Luis


Re: [PATCH 06/13] nvdimm/blk: avoid calling del_gendisk() on early failures

2021-11-02 Thread Luis Chamberlain
On Fri, Oct 15, 2021 at 05:13:48PM -0700, Dan Williams wrote:
> On Fri, Oct 15, 2021 at 4:53 PM Luis Chamberlain  wrote:
> >
> > If nd_integrity_init() fails we'd get del_gendisk() called,
> > but that's not correct as we should only call that if we're
> > done with device_add_disk(). Fix this by providing unwinding
> > prior to the devm call being registered and moving the devm
> > registration to the very end.
> >
> > This should fix calling del_gendisk() if nd_integrity_init()
> > fails. I only spotted this issue through code inspection. It
> > does not fix any real world bug.
> >
> 
> Just fyi, I'm preparing patches to delete this driver completely as it
> is unused by any shipping platform. I hope to get that removal into
> v5.16.

Curious if are you going to nuking it on v5.16? Otherwise it would stand
in the way of the last few patches to add __must_check for the final
add_disk() error handling changes.

  Luis


Re: [PATCH 03/13] nvdimm/btt: do not call del_gendisk() if not needed

2021-11-02 Thread Luis Chamberlain
On Sun, Oct 31, 2021 at 10:47:22AM -0700, Dan Williams wrote:
> On Fri, Oct 15, 2021 at 4:53 PM Luis Chamberlain  wrote:
> >
> > We know we don't need del_gendisk() if we haven't added
> > the disk, so just skip it. This should fix a bug on older
> > kernels, as del_gendisk() became able to deal with
> > disks not added only recently, after the patch titled
> > "block: add flag for add_disk() completion notation".
> 
> Perhaps put this in:
> 
> commit $abbrev_commit ("block: add flag for add_disk() completion 
> notation")
> 
> ...format, but I can't seem to find that commit?

Indeed, that patch got dropped and it would seem Christoph preferred
a simpler approach with the new disk_live()

commit 40b3a52ffc5bc3b5427d5d35b035cfb19d03fdd6
Author: Christoph Hellwig 
Date:   Wed Aug 18 16:45:32 2021 +0200

block: add a sanity check for a live disk in del_gendisk

> If you're touching the changelog how about one that clarifies the
> impact and drops "we"?
> "del_gendisk() is not required if the disk has not been added. On
> kernels prior to commit $abbrev_commit ("block: add flag for
> add_disk() completion notation")
> it is mandatory to not call del_gendisk() if the underlying device has
> not been through device_add()."
> 
> Fixes: 41cd8b70c37a ("libnvdimm, btt: add support for blk integrity")
> 
> With that you can add:
> 
> Reviewed-by: Dan Williams 

You got it.

  Luis


Re: [PATCH 00/13] block: add_disk() error handling stragglers

2021-10-25 Thread Luis Chamberlain
On Thu, Oct 21, 2021 at 08:10:49PM -0700, Geoff Levand wrote:
> Hi Luis,
> 
> On 10/18/21 9:15 AM, Luis Chamberlain wrote:
> > On Sun, Oct 17, 2021 at 08:26:33AM -0700, Geoff Levand wrote:
> >> Hi Luis,
> >>
> >> On 10/15/21 4:52 PM, Luis Chamberlain wrote:
> >>> This patch set consists of al the straggler drivers for which we have
> >>> have no patch reviews done for yet. I'd like to ask for folks to please
> >>> consider chiming in, specially if you're the maintainer for the driver.
> >>> Additionally if you can specify if you'll take the patch in yourself or
> >>> if you want Jens to take it, that'd be great too.
> >>
> >> Do you have a git repo with the patch set applied that I can use to test 
> >> with?
> > 
> > Sure, although the second to last patch is in a state of flux given
> > the ataflop driver currently is broken and so we're seeing how to fix
> > that first:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20211011-for-axboe-add-disk-error-handling
> 
> That branch has so many changes applied on top of the base v5.15-rc4
> that the patches I need to apply to test on PS3 with don't apply.
> 
> Do you have something closer to say v5.15-rc5?  Preferred would be
> just your add_disk() error handling patches plus what they depend
> on.

If you just want to test the ps3 changes, I've put this branch together
just for yo, its based on v5.15-rc6:

https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20211025-ps3-add-disk

  Luis


Re: [PATCH 06/13] nvdimm/blk: avoid calling del_gendisk() on early failures

2021-10-19 Thread Luis Chamberlain
On Fri, Oct 15, 2021 at 05:13:48PM -0700, Dan Williams wrote:
> On Fri, Oct 15, 2021 at 4:53 PM Luis Chamberlain  wrote:
> >
> > If nd_integrity_init() fails we'd get del_gendisk() called,
> > but that's not correct as we should only call that if we're
> > done with device_add_disk(). Fix this by providing unwinding
> > prior to the devm call being registered and moving the devm
> > registration to the very end.
> >
> > This should fix calling del_gendisk() if nd_integrity_init()
> > fails. I only spotted this issue through code inspection. It
> > does not fix any real world bug.
> >
> 
> Just fyi, I'm preparing patches to delete this driver completely as it
> is unused by any shipping platform. I hope to get that removal into
> v5.16.

I'll remove this from my queue, any chance you can review the changes
for nvdimm/btt?

  Luis


Re: [PATCH 00/13] block: add_disk() error handling stragglers

2021-10-18 Thread Luis Chamberlain
On Sun, Oct 17, 2021 at 08:26:33AM -0700, Geoff Levand wrote:
> Hi Luis,
> 
> On 10/15/21 4:52 PM, Luis Chamberlain wrote:
> > This patch set consists of al the straggler drivers for which we have
> > have no patch reviews done for yet. I'd like to ask for folks to please
> > consider chiming in, specially if you're the maintainer for the driver.
> > Additionally if you can specify if you'll take the patch in yourself or
> > if you want Jens to take it, that'd be great too.
> 
> Do you have a git repo with the patch set applied that I can use to test with?

Sure, although the second to last patch is in a state of flux given
the ataflop driver currently is broken and so we're seeing how to fix
that first:

https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20211011-for-axboe-add-disk-error-handling

  Luis


[PATCH 08/13] zram: add error handling support for add_disk()

2021-10-15 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/zram/zram_drv.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 0a9309a2ef54..bdbded107e6b 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1983,7 +1983,9 @@ static int zram_add(void)
blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX);
 
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, zram->disk->queue);
-   device_add_disk(NULL, zram->disk, zram_disk_attr_groups);
+   ret = device_add_disk(NULL, zram->disk, zram_disk_attr_groups);
+   if (ret)
+   goto out_cleanup_disk;
 
strlcpy(zram->compressor, default_compressor, sizeof(zram->compressor));
 
@@ -1991,6 +1993,8 @@ static int zram_add(void)
pr_info("Added device: %s\n", zram->disk->disk_name);
return device_id;
 
+out_cleanup_disk:
+   blk_cleanup_disk(zram->disk);
 out_free_idr:
idr_remove(_index_idr, device_id);
 out_free_dev:
-- 
2.30.2



[PATCH 11/13] ps3vram: add error handling support for add_disk()

2021-10-15 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/ps3vram.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
index c7b19e128b03..af2a0d09c598 100644
--- a/drivers/block/ps3vram.c
+++ b/drivers/block/ps3vram.c
@@ -755,9 +755,14 @@ static int ps3vram_probe(struct ps3_system_bus_device *dev)
dev_info(>core, "%s: Using %llu MiB of GPU memory\n",
 gendisk->disk_name, get_capacity(gendisk) >> 11);
 
-   device_add_disk(>core, gendisk, NULL);
+   error = device_add_disk(>core, gendisk, NULL);
+   if (error)
+   goto out_cleanup_disk;
+
return 0;
 
+out_cleanup_disk:
+   blk_cleanup_disk(gendisk);
 out_cache_cleanup:
remove_proc_entry(DEVICE_NAME, NULL);
ps3vram_cache_cleanup(dev);
-- 
2.30.2



[PATCH 05/13] nvdimm/btt: add error handling support for add_disk()

2021-10-15 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/nvdimm/btt.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 23ee8c005db5..57b921c5fbb5 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1542,7 +1542,9 @@ static int btt_blk_init(struct btt *btt)
}
 
set_capacity(btt->btt_disk, btt->nlba * btt->sector_size >> 9);
-   device_add_disk(>nd_btt->dev, btt->btt_disk, NULL);
+   rc = device_add_disk(>nd_btt->dev, btt->btt_disk, NULL);
+   if (rc)
+   goto out_cleanup_disk;
 
btt->nd_btt->size = btt->nlba * (u64)btt->sector_size;
nvdimm_check_and_set_ro(btt->btt_disk);
-- 
2.30.2



[PATCH 13/13] mtd/ubi/block: add error handling support for add_disk()

2021-10-15 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/mtd/ubi/block.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/ubi/block.c b/drivers/mtd/ubi/block.c
index e003b4b44ffa..062e6c2c45f5 100644
--- a/drivers/mtd/ubi/block.c
+++ b/drivers/mtd/ubi/block.c
@@ -447,12 +447,18 @@ int ubiblock_create(struct ubi_volume_info *vi)
list_add_tail(>list, _devices);
 
/* Must be the last step: anyone can call file ops from now on */
-   add_disk(dev->gd);
+   ret = add_disk(dev->gd);
+   if (ret)
+   goto out_destroy_wq;
+
dev_info(disk_to_dev(dev->gd), "created from ubi%d:%d(%s)",
 dev->ubi_num, dev->vol_id, vi->name);
mutex_unlock(_mutex);
return 0;
 
+out_destroy_wq:
+   list_del(>list);
+   destroy_workqueue(dev->wq);
 out_remove_minor:
idr_remove(_minor_idr, gd->first_minor);
 out_cleanup_disk:
-- 
2.30.2



[PATCH 09/13] z2ram: add error handling support for add_disk()

2021-10-15 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling. Only the disk is cleaned up inside
z2ram_register_disk() as the caller deals with the rest.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/z2ram.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/block/z2ram.c b/drivers/block/z2ram.c
index 4eef218108c6..ccc52c935faf 100644
--- a/drivers/block/z2ram.c
+++ b/drivers/block/z2ram.c
@@ -318,6 +318,7 @@ static const struct blk_mq_ops z2_mq_ops = {
 static int z2ram_register_disk(int minor)
 {
struct gendisk *disk;
+   int err;
 
disk = blk_mq_alloc_disk(_set, NULL);
if (IS_ERR(disk))
@@ -333,8 +334,10 @@ static int z2ram_register_disk(int minor)
sprintf(disk->disk_name, "z2ram");
 
z2ram_gendisk[minor] = disk;
-   add_disk(disk);
-   return 0;
+   err = add_disk(disk);
+   if (err)
+   blk_cleanup_disk(disk);
+   return err;
 }
 
 static int __init z2_init(void)
-- 
2.30.2



[PATCH 06/13] nvdimm/blk: avoid calling del_gendisk() on early failures

2021-10-15 Thread Luis Chamberlain
If nd_integrity_init() fails we'd get del_gendisk() called,
but that's not correct as we should only call that if we're
done with device_add_disk(). Fix this by providing unwinding
prior to the devm call being registered and moving the devm
registration to the very end.

This should fix calling del_gendisk() if nd_integrity_init()
fails. I only spotted this issue through code inspection. It
does not fix any real world bug.

Signed-off-by: Luis Chamberlain 
---
 drivers/nvdimm/blk.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/nvdimm/blk.c b/drivers/nvdimm/blk.c
index 088d3dd6f6fa..591fa1f86f1e 100644
--- a/drivers/nvdimm/blk.c
+++ b/drivers/nvdimm/blk.c
@@ -240,6 +240,7 @@ static int nsblk_attach_disk(struct nd_namespace_blk *nsblk)
resource_size_t available_disk_size;
struct gendisk *disk;
u64 internal_nlba;
+   int rc;
 
internal_nlba = div_u64(nsblk->size, nsblk_internal_lbasize(nsblk));
available_disk_size = internal_nlba * nsblk_sector_size(nsblk);
@@ -256,20 +257,26 @@ static int nsblk_attach_disk(struct nd_namespace_blk 
*nsblk)
blk_queue_logical_block_size(disk->queue, nsblk_sector_size(nsblk));
blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
 
-   if (devm_add_action_or_reset(dev, nd_blk_release_disk, disk))
-   return -ENOMEM;
-
if (nsblk_meta_size(nsblk)) {
-   int rc = nd_integrity_init(disk, nsblk_meta_size(nsblk));
+   rc = nd_integrity_init(disk, nsblk_meta_size(nsblk));
 
if (rc)
-   return rc;
+   goto out_before_devm_err;
}
 
set_capacity(disk, available_disk_size >> SECTOR_SHIFT);
device_add_disk(dev, disk, NULL);
+
+   /* nd_blk_release_disk() is called if this fails */
+   if (devm_add_action_or_reset(dev, nd_blk_release_disk, disk))
+   return -ENOMEM;
+
nvdimm_check_and_set_ro(disk);
return 0;
+
+out_before_devm_err:
+   blk_cleanup_disk(disk);
+   return rc;
 }
 
 static int nd_blk_probe(struct device *dev)
-- 
2.30.2



[PATCH 10/13] ps3disk: add error handling support for add_disk()

2021-10-15 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/ps3disk.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
index 8d51efbe045d..3054adf77460 100644
--- a/drivers/block/ps3disk.c
+++ b/drivers/block/ps3disk.c
@@ -467,9 +467,13 @@ static int ps3disk_probe(struct ps3_system_bus_device 
*_dev)
 gendisk->disk_name, priv->model, priv->raw_capacity >> 11,
 get_capacity(gendisk) >> 11);
 
-   device_add_disk(>sbd.core, gendisk, NULL);
-   return 0;
+   error = device_add_disk(>sbd.core, gendisk, NULL);
+   if (error)
+   goto fail_cleanup_disk;
 
+   return 0;
+fail_cleanup_disk:
+   blk_cleanup_disk(gendisk);
 fail_free_tag_set:
blk_mq_free_tag_set(>tag_set);
 fail_teardown:
-- 
2.30.2



[PATCH 03/13] nvdimm/btt: do not call del_gendisk() if not needed

2021-10-15 Thread Luis Chamberlain
We know we don't need del_gendisk() if we haven't added
the disk, so just skip it. This should fix a bug on older
kernels, as del_gendisk() became able to deal with
disks not added only recently, after the patch titled
"block: add flag for add_disk() completion notation".

Signed-off-by: Luis Chamberlain 
---
 drivers/nvdimm/btt.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 52de60b7adee..29cc7325e890 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1538,7 +1538,6 @@ static int btt_blk_init(struct btt *btt)
int rc = nd_integrity_init(btt->btt_disk, btt_meta_size(btt));
 
if (rc) {
-   del_gendisk(btt->btt_disk);
blk_cleanup_disk(btt->btt_disk);
return rc;
}
-- 
2.30.2



[PATCH 02/13] nvme-multipath: add error handling support for add_disk()

2021-10-15 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Since we now can tell for sure when a disk was added, move
setting the bit NVME_NSHEAD_DISK_LIVE only when we did
add the disk successfully.

Nothing to do here as the cleanup is done elsewhere. We take
care and use test_and_set_bit() because it is protects against
two nvme paths simultaneously calling device_add_disk() on the
same namespace head.

Signed-off-by: Luis Chamberlain 
---
 drivers/nvme/host/multipath.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index e8ccdd398f78..022837f7be41 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -496,13 +496,23 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct 
nvme_ns_head *head)
 static void nvme_mpath_set_live(struct nvme_ns *ns)
 {
struct nvme_ns_head *head = ns->head;
+   int rc;
 
if (!head->disk)
return;
 
+   /*
+* test_and_set_bit() is used because it is protecting against two nvme
+* paths simultaneously calling device_add_disk() on the same namespace
+* head.
+*/
if (!test_and_set_bit(NVME_NSHEAD_DISK_LIVE, >flags)) {
-   device_add_disk(>subsys->dev, head->disk,
-   nvme_ns_id_attr_groups);
+   rc = device_add_disk(>subsys->dev, head->disk,
+nvme_ns_id_attr_groups);
+   if (rc) {
+   clear_bit(NVME_NSHEAD_DISK_LIVE, >flags);
+   return;
+   }
nvme_add_ns_head_cdev(head);
}
 
-- 
2.30.2



[PATCH 00/13] block: add_disk() error handling stragglers

2021-10-15 Thread Luis Chamberlain
This patch set consists of al the straggler drivers for which we have
have no patch reviews done for yet. I'd like to ask for folks to please
consider chiming in, specially if you're the maintainer for the driver.
Additionally if you can specify if you'll take the patch in yourself or
if you want Jens to take it, that'd be great too.

Luis Chamberlain (13):
  block/brd: add error handling support for add_disk()
  nvme-multipath: add error handling support for add_disk()
  nvdimm/btt: do not call del_gendisk() if not needed
  nvdimm/btt: use goto error labels on btt_blk_init()
  nvdimm/btt: add error handling support for add_disk()
  nvdimm/blk: avoid calling del_gendisk() on early failures
  nvdimm/blk: add error handling support for add_disk()
  zram: add error handling support for add_disk()
  z2ram: add error handling support for add_disk()
  ps3disk: add error handling support for add_disk()
  ps3vram: add error handling support for add_disk()
  block/sunvdc: add error handling support for add_disk()
  mtd/ubi/block: add error handling support for add_disk()

 drivers/block/brd.c   |  9 +++--
 drivers/block/ps3disk.c   |  8 ++--
 drivers/block/ps3vram.c   |  7 ++-
 drivers/block/sunvdc.c| 14 +++---
 drivers/block/z2ram.c |  7 +--
 drivers/block/zram/zram_drv.c |  6 +-
 drivers/mtd/ubi/block.c   |  8 +++-
 drivers/nvdimm/blk.c  | 21 +++--
 drivers/nvdimm/btt.c  | 24 +++-
 drivers/nvme/host/multipath.c | 14 --
 10 files changed, 89 insertions(+), 29 deletions(-)

-- 
2.30.2



[PATCH 07/13] nvdimm/blk: add error handling support for add_disk()

2021-10-15 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Since nvdimm/blk uses devm we just need to move the devm
registration towards the end. And in hindsight, that seems
to also provide a fix given del_gendisk() should not be
called unless the disk was already added via add_disk().
The probably of that issue happening is low though, like
OOM while calling devm_add_action(), so the fix is minor.

We manually unwind in case of add_disk() failure prior
to the devm registration.

Signed-off-by: Luis Chamberlain 
---
 drivers/nvdimm/blk.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/blk.c b/drivers/nvdimm/blk.c
index 591fa1f86f1e..9f1eb41404ac 100644
--- a/drivers/nvdimm/blk.c
+++ b/drivers/nvdimm/blk.c
@@ -265,7 +265,9 @@ static int nsblk_attach_disk(struct nd_namespace_blk *nsblk)
}
 
set_capacity(disk, available_disk_size >> SECTOR_SHIFT);
-   device_add_disk(dev, disk, NULL);
+   rc = device_add_disk(dev, disk, NULL);
+   if (rc)
+   goto out_before_devm_err;
 
/* nd_blk_release_disk() is called if this fails */
if (devm_add_action_or_reset(dev, nd_blk_release_disk, disk))
-- 
2.30.2



[PATCH 01/13] block/brd: add error handling support for add_disk()

2021-10-15 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/brd.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 530b31240203..6065f493876f 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -372,6 +372,7 @@ static int brd_alloc(int i)
struct brd_device *brd;
struct gendisk *disk;
char buf[DISK_NAME_LEN];
+   int err = -ENOMEM;
 
mutex_lock(_devices_mutex);
list_for_each_entry(brd, _devices, brd_list) {
@@ -422,16 +423,20 @@ static int brd_alloc(int i)
/* Tell the block layer that this is not a rotational device */
blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, disk->queue);
-   add_disk(disk);
+   err = add_disk(disk);
+   if (err)
+   goto out_cleanup_disk;
 
return 0;
 
+out_cleanup_disk:
+   blk_cleanup_disk(disk);
 out_free_dev:
mutex_lock(_devices_mutex);
list_del(>brd_list);
mutex_unlock(_devices_mutex);
kfree(brd);
-   return -ENOMEM;
+   return err;
 }
 
 static void brd_probe(dev_t dev)
-- 
2.30.2



[PATCH 04/13] nvdimm/btt: use goto error labels on btt_blk_init()

2021-10-15 Thread Luis Chamberlain
This will make it easier to share common error paths.

Signed-off-by: Luis Chamberlain 
---
 drivers/nvdimm/btt.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 29cc7325e890..23ee8c005db5 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1520,10 +1520,11 @@ static int btt_blk_init(struct btt *btt)
 {
struct nd_btt *nd_btt = btt->nd_btt;
struct nd_namespace_common *ndns = nd_btt->ndns;
+   int rc = -ENOMEM;
 
btt->btt_disk = blk_alloc_disk(NUMA_NO_NODE);
if (!btt->btt_disk)
-   return -ENOMEM;
+   goto out;
 
nvdimm_namespace_disk_name(ndns, btt->btt_disk->disk_name);
btt->btt_disk->first_minor = 0;
@@ -1535,19 +1536,23 @@ static int btt_blk_init(struct btt *btt)
blk_queue_flag_set(QUEUE_FLAG_NONROT, btt->btt_disk->queue);
 
if (btt_meta_size(btt)) {
-   int rc = nd_integrity_init(btt->btt_disk, btt_meta_size(btt));
-
-   if (rc) {
-   blk_cleanup_disk(btt->btt_disk);
-   return rc;
-   }
+   rc = nd_integrity_init(btt->btt_disk, btt_meta_size(btt));
+   if (rc)
+   goto out_cleanup_disk;
}
+
set_capacity(btt->btt_disk, btt->nlba * btt->sector_size >> 9);
device_add_disk(>nd_btt->dev, btt->btt_disk, NULL);
+
btt->nd_btt->size = btt->nlba * (u64)btt->sector_size;
nvdimm_check_and_set_ro(btt->btt_disk);
 
return 0;
+
+out_cleanup_disk:
+   blk_cleanup_disk(btt->btt_disk);
+out:
+   return rc;
 }
 
 static void btt_blk_cleanup(struct btt *btt)
-- 
2.30.2



[PATCH 12/13] block/sunvdc: add error handling support for add_disk()

2021-10-15 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

We re-use the same free tag call, so we also add a label for
that as well.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/sunvdc.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c
index 4d4bb810c2ae..6f45a53f7cbf 100644
--- a/drivers/block/sunvdc.c
+++ b/drivers/block/sunvdc.c
@@ -826,8 +826,8 @@ static int probe_disk(struct vdc_port *port)
if (IS_ERR(g)) {
printk(KERN_ERR PFX "%s: Could not allocate gendisk.\n",
   port->vio.name);
-   blk_mq_free_tag_set(>tag_set);
-   return PTR_ERR(g);
+   err = PTR_ERR(g);
+   goto out_free_tag;
}
 
port->disk = g;
@@ -879,9 +879,17 @@ static int probe_disk(struct vdc_port *port)
   port->vdisk_size, (port->vdisk_size >> (20 - 9)),
   port->vio.ver.major, port->vio.ver.minor);
 
-   device_add_disk(>vio.vdev->dev, g, NULL);
+   err = device_add_disk(>vio.vdev->dev, g, NULL);
+   if (err)
+   goto out_cleanup_disk;
 
return 0;
+
+out_cleanup_disk:
+   blk_cleanup_disk(g);
+out_free_tag:
+   blk_mq_free_tag_set(>tag_set);
+   return err;
 }
 
 static struct ldc_channel_config vdc_ldc_cfg = {
-- 
2.30.2



[PATCH v2 08/10] block/sx8: add error handling support for add_disk()

2021-09-27 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

A completion is used to notify the initial probe what is
happening and so we must defer error handling on completion.
Do this by remembering the error and using the shared cleanup
function.

The tags are shared and so are hanlded later for the
driver already.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/sx8.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/block/sx8.c b/drivers/block/sx8.c
index 420cd952ddc4..1c79248c4826 100644
--- a/drivers/block/sx8.c
+++ b/drivers/block/sx8.c
@@ -297,6 +297,7 @@ struct carm_host {
 
struct work_struct  fsm_task;
 
+   int probe_err;
struct completion   probe_comp;
 };
 
@@ -1181,8 +1182,11 @@ static void carm_fsm_task (struct work_struct *work)
struct gendisk *disk = port->disk;
 
set_capacity(disk, port->capacity);
-   add_disk(disk);
-   activated++;
+   host->probe_err = add_disk(disk);
+   if (!host->probe_err)
+   activated++;
+   else
+   break;
}
 
printk(KERN_INFO DRV_NAME "(%s): %d ports activated\n",
@@ -1192,11 +1196,9 @@ static void carm_fsm_task (struct work_struct *work)
reschedule = 1;
break;
}
-
case HST_PROBE_FINISHED:
complete(>probe_comp);
break;
-
case HST_ERROR:
/* FIXME: TODO */
break;
@@ -1507,7 +1509,10 @@ static int carm_init_one (struct pci_dev *pdev, const 
struct pci_device_id *ent)
goto err_out_free_irq;
 
DPRINTK("waiting for probe_comp\n");
+   host->probe_err = -ENODEV;
wait_for_completion(>probe_comp);
+   if (host->probe_err)
+   goto err_out_free_irq;
 
printk(KERN_INFO "%s: pci %s, ports %d, io %llx, irq %u, major %d\n",
   host->name, pci_name(pdev), (int) CARM_MAX_PORTS,
-- 
2.30.2



[PATCH v2 02/10] pktcdvd: add error handling support for add_disk()

2021-09-27 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

The out_mem2 error label already does what we need so
re-use that.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/pktcdvd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 0f26b2510a75..415248962e67 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2729,7 +2729,9 @@ static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev)
/* inherit events of the host device */
disk->events = pd->bdev->bd_disk->events;
 
-   add_disk(disk);
+   ret = add_disk(disk);
+   if (ret)
+   goto out_mem2;
 
pkt_sysfs_dev_new(pd);
pkt_debugfs_dev_new(pd);
-- 
2.30.2



[PATCH v2 07/10] block/sunvdc: add error handling support for add_disk()

2021-09-27 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

We re-use the same free tag call, so we also add a label for
that as well.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/sunvdc.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c
index 4d4bb810c2ae..6f45a53f7cbf 100644
--- a/drivers/block/sunvdc.c
+++ b/drivers/block/sunvdc.c
@@ -826,8 +826,8 @@ static int probe_disk(struct vdc_port *port)
if (IS_ERR(g)) {
printk(KERN_ERR PFX "%s: Could not allocate gendisk.\n",
   port->vio.name);
-   blk_mq_free_tag_set(>tag_set);
-   return PTR_ERR(g);
+   err = PTR_ERR(g);
+   goto out_free_tag;
}
 
port->disk = g;
@@ -879,9 +879,17 @@ static int probe_disk(struct vdc_port *port)
   port->vdisk_size, (port->vdisk_size >> (20 - 9)),
   port->vio.ver.major, port->vio.ver.minor);
 
-   device_add_disk(>vio.vdev->dev, g, NULL);
+   err = device_add_disk(>vio.vdev->dev, g, NULL);
+   if (err)
+   goto out_cleanup_disk;
 
return 0;
+
+out_cleanup_disk:
+   blk_cleanup_disk(g);
+out_free_tag:
+   blk_mq_free_tag_set(>tag_set);
+   return err;
 }
 
 static struct ldc_channel_config vdc_ldc_cfg = {
-- 
2.30.2



[PATCH v2 04/10] ps3vram: add error handling support for add_disk()

2021-09-27 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/ps3vram.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
index c7b19e128b03..af2a0d09c598 100644
--- a/drivers/block/ps3vram.c
+++ b/drivers/block/ps3vram.c
@@ -755,9 +755,14 @@ static int ps3vram_probe(struct ps3_system_bus_device *dev)
dev_info(>core, "%s: Using %llu MiB of GPU memory\n",
 gendisk->disk_name, get_capacity(gendisk) >> 11);
 
-   device_add_disk(>core, gendisk, NULL);
+   error = device_add_disk(>core, gendisk, NULL);
+   if (error)
+   goto out_cleanup_disk;
+
return 0;
 
+out_cleanup_disk:
+   blk_cleanup_disk(gendisk);
 out_cache_cleanup:
remove_proc_entry(DEVICE_NAME, NULL);
ps3vram_cache_cleanup(dev);
-- 
2.30.2



[PATCH v2 01/10] mtip32xx: add error handling support for add_disk()

2021-09-27 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

The read_capacity_error error label already does what we need,
so just re-use that.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/mtip32xx/mtip32xx.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/mtip32xx/mtip32xx.c 
b/drivers/block/mtip32xx/mtip32xx.c
index 901855717cb5..d0b40309f47e 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -3633,7 +3633,9 @@ static int mtip_block_initialize(struct driver_data *dd)
set_capacity(dd->disk, capacity);
 
/* Enable the block device and add it to /dev */
-   device_add_disk(>pdev->dev, dd->disk, mtip_disk_attr_groups);
+   rv = device_add_disk(>pdev->dev, dd->disk, mtip_disk_attr_groups);
+   if (rv)
+   goto read_capacity_error;
 
if (dd->mtip_svc_handler) {
set_bit(MTIP_DDF_INIT_DONE_BIT, >dd_flag);
-- 
2.30.2



[PATCH v2 09/10] pf: add error handling support for add_disk()

2021-09-27 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/paride/pf.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/paride/pf.c b/drivers/block/paride/pf.c
index f471d48a87bc..380d80e507c7 100644
--- a/drivers/block/paride/pf.c
+++ b/drivers/block/paride/pf.c
@@ -962,7 +962,9 @@ static int __init pf_init_unit(struct pf_unit *pf, bool 
autoprobe, int port,
if (pf_probe(pf))
goto out_pi_release;
 
-   add_disk(disk);
+   ret = add_disk(disk);
+   if (ret)
+   goto out_pi_release;
pf->present = 1;
return 0;
 
-- 
2.30.2



[PATCH v2 06/10] block/rsxx: add error handling support for add_disk()

2021-09-27 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/rsxx/core.c |  4 +++-
 drivers/block/rsxx/dev.c  | 12 +---
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/block/rsxx/core.c b/drivers/block/rsxx/core.c
index 83636714b8d7..8d9d69f5dfbc 100644
--- a/drivers/block/rsxx/core.c
+++ b/drivers/block/rsxx/core.c
@@ -935,7 +935,9 @@ static int rsxx_pci_probe(struct pci_dev *dev,
card->size8 = 0;
}
 
-   rsxx_attach_dev(card);
+   st = rsxx_attach_dev(card);
+   if (st)
+   goto failed_create_dev;
 
/* Setup Debugfs */
rsxx_debugfs_dev_new(card);
diff --git a/drivers/block/rsxx/dev.c b/drivers/block/rsxx/dev.c
index 1cc40b0ea761..b2d3ac3efce2 100644
--- a/drivers/block/rsxx/dev.c
+++ b/drivers/block/rsxx/dev.c
@@ -192,6 +192,8 @@ static bool rsxx_discard_supported(struct rsxx_cardinfo 
*card)
 
 int rsxx_attach_dev(struct rsxx_cardinfo *card)
 {
+   int err = 0;
+
mutex_lock(>dev_lock);
 
/* The block device requires the stripe size from the config. */
@@ -200,13 +202,17 @@ int rsxx_attach_dev(struct rsxx_cardinfo *card)
set_capacity(card->gendisk, card->size8 >> 9);
else
set_capacity(card->gendisk, 0);
-   device_add_disk(CARD_TO_DEV(card), card->gendisk, NULL);
-   card->bdev_attached = 1;
+   err = device_add_disk(CARD_TO_DEV(card), card->gendisk, NULL);
+   if (err == 0)
+   card->bdev_attached = 1;
}
 
mutex_unlock(>dev_lock);
 
-   return 0;
+   if (err)
+   blk_cleanup_disk(card->gendisk);
+
+   return err;
 }
 
 void rsxx_detach_dev(struct rsxx_cardinfo *card)
-- 
2.30.2



[PATCH v2 00/10] block: fourth batch of add_disk() error handling conversions

2021-09-27 Thread Luis Chamberlain
This is the fourth batch of add_disk() error handling driver
conversions. This set along with the entire 7 set of driver conversions
can be found on my 20210927-for-axboe-add-disk-error-handling branch
[0].

On this v2 series the following modifications have been made since the
last v1 series of this patch set:

  - rebased onto linux-next tag 20210927
  - added the only reviewed-by tag for this series for rnbd Jack Wang

[0] 
https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20210927-for-axboe-add-disk-error-handling

Luis Chamberlain (10):
  mtip32xx: add error handling support for add_disk()
  pktcdvd: add error handling support for add_disk()
  ps3disk: add error handling support for add_disk()
  ps3vram: add error handling support for add_disk()
  rnbd: add error handling support for add_disk()
  block/rsxx: add error handling support for add_disk()
  block/sunvdc: add error handling support for add_disk()
  block/sx8: add error handling support for add_disk()
  pf: add error handling support for add_disk()
  mtd/ubi/block: add error handling support for add_disk()

 drivers/block/mtip32xx/mtip32xx.c |  4 +++-
 drivers/block/paride/pf.c |  4 +++-
 drivers/block/pktcdvd.c   |  4 +++-
 drivers/block/ps3disk.c   |  8 ++--
 drivers/block/ps3vram.c   |  7 ++-
 drivers/block/rnbd/rnbd-clt.c | 13 +
 drivers/block/rsxx/core.c |  4 +++-
 drivers/block/rsxx/dev.c  | 12 +---
 drivers/block/sunvdc.c| 14 +++---
 drivers/block/sx8.c   | 13 +
 drivers/mtd/ubi/block.c   |  8 +++-
 11 files changed, 69 insertions(+), 22 deletions(-)

-- 
2.30.2



[PATCH v2 05/10] rnbd: add error handling support for add_disk()

2021-09-27 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Acked-by: Jack Wang 
Signed-off-by: Luis Chamberlain 
---
 drivers/block/rnbd/rnbd-clt.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/block/rnbd/rnbd-clt.c b/drivers/block/rnbd/rnbd-clt.c
index bd4a41afbbfc..1ba1c868535a 100644
--- a/drivers/block/rnbd/rnbd-clt.c
+++ b/drivers/block/rnbd/rnbd-clt.c
@@ -1384,8 +1384,10 @@ static void setup_request_queue(struct rnbd_clt_dev *dev)
blk_queue_write_cache(dev->queue, dev->wc, dev->fua);
 }
 
-static void rnbd_clt_setup_gen_disk(struct rnbd_clt_dev *dev, int idx)
+static int rnbd_clt_setup_gen_disk(struct rnbd_clt_dev *dev, int idx)
 {
+   int err;
+
dev->gd->major  = rnbd_client_major;
dev->gd->first_minor= idx << RNBD_PART_BITS;
dev->gd->minors = 1 << RNBD_PART_BITS;
@@ -1410,7 +1412,11 @@ static void rnbd_clt_setup_gen_disk(struct rnbd_clt_dev 
*dev, int idx)
 
if (!dev->rotational)
blk_queue_flag_set(QUEUE_FLAG_NONROT, dev->queue);
-   add_disk(dev->gd);
+   err = add_disk(dev->gd);
+   if (err)
+   blk_cleanup_disk(dev->gd);
+
+   return err;
 }
 
 static int rnbd_client_setup_device(struct rnbd_clt_dev *dev)
@@ -1426,8 +1432,7 @@ static int rnbd_client_setup_device(struct rnbd_clt_dev 
*dev)
rnbd_init_mq_hw_queues(dev);
 
setup_request_queue(dev);
-   rnbd_clt_setup_gen_disk(dev, idx);
-   return 0;
+   return rnbd_clt_setup_gen_disk(dev, idx);
 }
 
 static struct rnbd_clt_dev *init_dev(struct rnbd_clt_session *sess,
-- 
2.30.2



[PATCH v2 10/10] mtd/ubi/block: add error handling support for add_disk()

2021-09-27 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/mtd/ubi/block.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/ubi/block.c b/drivers/mtd/ubi/block.c
index e003b4b44ffa..062e6c2c45f5 100644
--- a/drivers/mtd/ubi/block.c
+++ b/drivers/mtd/ubi/block.c
@@ -447,12 +447,18 @@ int ubiblock_create(struct ubi_volume_info *vi)
list_add_tail(>list, _devices);
 
/* Must be the last step: anyone can call file ops from now on */
-   add_disk(dev->gd);
+   ret = add_disk(dev->gd);
+   if (ret)
+   goto out_destroy_wq;
+
dev_info(disk_to_dev(dev->gd), "created from ubi%d:%d(%s)",
 dev->ubi_num, dev->vol_id, vi->name);
mutex_unlock(_mutex);
return 0;
 
+out_destroy_wq:
+   list_del(>list);
+   destroy_workqueue(dev->wq);
 out_remove_minor:
idr_remove(_minor_idr, gd->first_minor);
 out_cleanup_disk:
-- 
2.30.2



[PATCH v2 03/10] ps3disk: add error handling support for add_disk()

2021-09-27 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/ps3disk.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
index 8d51efbe045d..3054adf77460 100644
--- a/drivers/block/ps3disk.c
+++ b/drivers/block/ps3disk.c
@@ -467,9 +467,13 @@ static int ps3disk_probe(struct ps3_system_bus_device 
*_dev)
 gendisk->disk_name, priv->model, priv->raw_capacity >> 11,
 get_capacity(gendisk) >> 11);
 
-   device_add_disk(>sbd.core, gendisk, NULL);
-   return 0;
+   error = device_add_disk(>sbd.core, gendisk, NULL);
+   if (error)
+   goto fail_cleanup_disk;
 
+   return 0;
+fail_cleanup_disk:
+   blk_cleanup_disk(gendisk);
 fail_free_tag_set:
blk_mq_free_tag_set(>tag_set);
 fail_teardown:
-- 
2.30.2



[PATCH 04/10] ps3vram: add error handling support for add_disk()

2021-09-01 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/ps3vram.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
index c7b19e128b03..af2a0d09c598 100644
--- a/drivers/block/ps3vram.c
+++ b/drivers/block/ps3vram.c
@@ -755,9 +755,14 @@ static int ps3vram_probe(struct ps3_system_bus_device *dev)
dev_info(>core, "%s: Using %llu MiB of GPU memory\n",
 gendisk->disk_name, get_capacity(gendisk) >> 11);
 
-   device_add_disk(>core, gendisk, NULL);
+   error = device_add_disk(>core, gendisk, NULL);
+   if (error)
+   goto out_cleanup_disk;
+
return 0;
 
+out_cleanup_disk:
+   blk_cleanup_disk(gendisk);
 out_cache_cleanup:
remove_proc_entry(DEVICE_NAME, NULL);
ps3vram_cache_cleanup(dev);
-- 
2.30.2



[PATCH 02/10] pktcdvd: add error handling support for add_disk()

2021-09-01 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

The out_mem2 error label already does what we need so
re-use that.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/pktcdvd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 0f26b2510a75..415248962e67 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2729,7 +2729,9 @@ static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev)
/* inherit events of the host device */
disk->events = pd->bdev->bd_disk->events;
 
-   add_disk(disk);
+   ret = add_disk(disk);
+   if (ret)
+   goto out_mem2;
 
pkt_sysfs_dev_new(pd);
pkt_debugfs_dev_new(pd);
-- 
2.30.2



[PATCH 01/10] mtip32xx: add error handling support for add_disk()

2021-09-01 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

The read_capacity_error error label already does what we need,
so just re-use that.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/mtip32xx/mtip32xx.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/mtip32xx/mtip32xx.c 
b/drivers/block/mtip32xx/mtip32xx.c
index 901855717cb5..d0b40309f47e 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -3633,7 +3633,9 @@ static int mtip_block_initialize(struct driver_data *dd)
set_capacity(dd->disk, capacity);
 
/* Enable the block device and add it to /dev */
-   device_add_disk(>pdev->dev, dd->disk, mtip_disk_attr_groups);
+   rv = device_add_disk(>pdev->dev, dd->disk, mtip_disk_attr_groups);
+   if (rv)
+   goto read_capacity_error;
 
if (dd->mtip_svc_handler) {
set_bit(MTIP_DDF_INIT_DONE_BIT, >dd_flag);
-- 
2.30.2



[PATCH 00/10] block: fourth batch of add_disk() error handling conversions

2021-09-01 Thread Luis Chamberlain
The full set of changes can be found on my branch titled
20210901-for-axboe-add-disk-error-handling [0] which is
now based on axboe/master.

[0] 
https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20210901-for-axboe-add-disk-error-handling

Luis Chamberlain (10):
  mtip32xx: add error handling support for add_disk()
  pktcdvd: add error handling support for add_disk()
  ps3disk: add error handling support for add_disk()
  ps3vram: add error handling support for add_disk()
  rnbd: add error handling support for add_disk()
  block/rsxx: add error handling support for add_disk()
  block/sunvdc: add error handling support for add_disk()
  block/sx8: add error handling support for add_disk()
  pf: add error handling support for add_disk()
  mtd/ubi/block: add error handling support for add_disk()

 drivers/block/mtip32xx/mtip32xx.c |  4 +++-
 drivers/block/paride/pf.c |  4 +++-
 drivers/block/pktcdvd.c   |  4 +++-
 drivers/block/ps3disk.c   |  8 ++--
 drivers/block/ps3vram.c   |  7 ++-
 drivers/block/rnbd/rnbd-clt.c | 13 +
 drivers/block/rsxx/core.c |  4 +++-
 drivers/block/rsxx/dev.c  | 12 +---
 drivers/block/sunvdc.c| 14 +++---
 drivers/block/sx8.c   | 13 +
 drivers/mtd/ubi/block.c   |  8 +++-
 11 files changed, 69 insertions(+), 22 deletions(-)

-- 
2.30.2



  1   2   >