Re: request_module DoS

2022-05-12 Thread Luis Chamberlain
On Thu, May 12, 2022 at 10:07:26PM +1000, Michael Ellerman wrote:
> Michael Ellerman  writes:
> > Luis Chamberlain  writes:
> ...
> >
> >> Can someone try this on ppc64le system? At this point I am not convinced
> >> this issue is generic.
> >
> > Does your x86 system have at least 784 CPUs?
> >
> > I don't know where the original report came from, but the trace shows
> > "CPU 784", which would usually indicate a system with at least that many
> > CPUs.
> 
> Update, apparently the report originally came from IBM, so I'll chase it
> up internally.
> 
> I think you're right that there's probably no issue in the module code,
> sorry to waste your time.

It gives me testing happiness to know that may be the case :)

  Luis


Re: request_module DoS

2022-05-11 Thread Luis Chamberlain
On Mon, May 09, 2022 at 09:13:03AM -0700, Luis Chamberlain wrote:
> On Mon, May 09, 2022 at 09:23:39PM +1000, Michael Ellerman wrote:
> > Herbert Xu  writes:
> > > Hi:
> > >
> > > There are some code paths in the kernel where you can reliably
> > > trigger a request_module of a non-existant module.  For example,
> > > if you attempt to load a non-existent crypto algorithm, or create
> > > a socket of a non-existent network family, it will result in a
> > > request_module call that is guaranteed to fail.
> > >
> > > As user-space can do this repeatedly, it can quickly overwhelm
> > > the concurrency limit in kmod.  This in itself is expected,
> > > however, at least on some platforms this appears to result in
> > > a live-lock.  Here is an example triggered by stress-ng on ppc64:
> > >
> > > [  529.853264] request_module: kmod_concurrent_max (0) close to 0 
> > > (max_modprobes: 50), for module crypto-aegis128l, throttling...
> > ...
> > > [  580.414590] __request_module: 25 callbacks suppressed
> > > [  580.414597] request_module: kmod_concurrent_max (0) close to 0 
> > > (max_modprobes: 50), for module crypto-aegis256-all, throttling...
> > > [  580.423082] watchdog: CPU 784 self-detected hard LOCKUP @ 
> > > plpar_hcall_norets_notrace+0x18/0x2c
> > > [  580.423097] watchdog: CPU 784 TB:1297691958559475, last heartbeat 
> > > TB:1297686321743840 (11009ms ago)
> > > [  580.423099] Modules linked in: cast6_generic cast5_generic cast_common 
> > > camellia_generic blowfish_generic blowfish_common tun nft_fib_inet 
> > > nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 
> > > nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack 
> > > nf_defrag_ipv6 nf_defrag_ipv4 rfkill bonding tls ip_set nf_tables 
> > > nfnetlink pseries_rng binfmt_misc drm drm_panel_orientation_quirks xfs 
> > > libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp vmx_crypto 
> > > dm_mirror dm_region_hash dm_log dm_mod fuse
> > > [  580.423136] CPU: 784 PID: 77071 Comm: stress-ng Kdump: loaded Not 
> > > tainted 5.14.0-55.el9.ppc64le #1
> > > [  580.423139] NIP:  c00f8ff4 LR: c01f7c38 CTR: 
> > > 
> > > [  580.423140] REGS: c043fdd7bd60 TRAP: 0900   Not tainted  
> > > (5.14.0-55.el9.ppc64le)
> > > [  580.423142] MSR:  8280b033   
> > > CR: 28008202  XER: 2004
> > > [  580.423148] CFAR: 0c00 IRQMASK: 1 
> > >GPR00: 28008202 c044c46b3850 c2a46f00 
> > >  
> > >GPR04:   0010 
> > > c2a83060 
> > >GPR08:  0001 0001 
> > >  
> > >GPR12: c01b9530 c043ffe16700 00020117 
> > > 10185ea8 
> > >GPR16: 10212150 10186198 101863a0 
> > > 1021b3c0 
> > >GPR20: 0001  0001 
> > > 00ff 
> > >GPR24: c043f4a00e14 c043fafe0e00 0c44 
> > >  
> > >GPR28: c043f4a00e00 c043f4a00e00 c21e0e00 
> > > c2561aa0 
> > > [  580.423166] NIP [c00f8ff4] plpar_hcall_norets_notrace+0x18/0x2c
> > > [  580.423168] LR [c01f7c38] 
> > > __pv_queued_spin_lock_slowpath+0x528/0x530
> > > [  580.423173] Call Trace:
> > > [  580.423174] [c044c46b3850] [00016b60] 0x16b60 
> > > (unreliable)
> > > [  580.423177] [c044c46b3910] [c0ea6948] 
> > > _raw_spin_lock_irqsave+0xa8/0xc0
> > > [  580.423182] [c044c46b3940] [c01dd7c0] 
> > > prepare_to_wait_event+0x40/0x200
> > > [  580.423185] [c044c46b39a0] [c019e9e0] 
> > > __request_module+0x320/0x510
> > > [  580.423188] [c044c46b3ac0] [c06f1a14] 
> > > crypto_alg_mod_lookup+0x1e4/0x2e0
> > > [  580.423192] [c044c46b3b60] [c06f2178] 
> > > crypto_alloc_tfm_node+0xa8/0x1a0
> > > [  580.423194] [c044c46b3be0] [c06f84f8] 
> > > crypto_alloc_aead+0x38/0x50
> > > [  580.423196] [c044c46b3c00] [c072cba0] aead_bind+0x70/0x140
> > > [  580.423199] [c044c46b3c40] [c0727824] alg_bind+0xb4/0x210
> > > [

Re: request_module DoS

2022-05-09 Thread Luis Chamberlain
On Mon, May 09, 2022 at 09:23:39PM +1000, Michael Ellerman wrote:
> Herbert Xu  writes:
> > Hi:
> >
> > There are some code paths in the kernel where you can reliably
> > trigger a request_module of a non-existant module.  For example,
> > if you attempt to load a non-existent crypto algorithm, or create
> > a socket of a non-existent network family, it will result in a
> > request_module call that is guaranteed to fail.
> >
> > As user-space can do this repeatedly, it can quickly overwhelm
> > the concurrency limit in kmod.  This in itself is expected,
> > however, at least on some platforms this appears to result in
> > a live-lock.  Here is an example triggered by stress-ng on ppc64:
> >
> > [  529.853264] request_module: kmod_concurrent_max (0) close to 0 
> > (max_modprobes: 50), for module crypto-aegis128l, throttling...
> ...
> > [  580.414590] __request_module: 25 callbacks suppressed
> > [  580.414597] request_module: kmod_concurrent_max (0) close to 0 
> > (max_modprobes: 50), for module crypto-aegis256-all, throttling...
> > [  580.423082] watchdog: CPU 784 self-detected hard LOCKUP @ 
> > plpar_hcall_norets_notrace+0x18/0x2c
> > [  580.423097] watchdog: CPU 784 TB:1297691958559475, last heartbeat 
> > TB:1297686321743840 (11009ms ago)
> > [  580.423099] Modules linked in: cast6_generic cast5_generic cast_common 
> > camellia_generic blowfish_generic blowfish_common tun nft_fib_inet 
> > nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 
> > nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack 
> > nf_defrag_ipv6 nf_defrag_ipv4 rfkill bonding tls ip_set nf_tables nfnetlink 
> > pseries_rng binfmt_misc drm drm_panel_orientation_quirks xfs libcrc32c 
> > sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp vmx_crypto dm_mirror 
> > dm_region_hash dm_log dm_mod fuse
> > [  580.423136] CPU: 784 PID: 77071 Comm: stress-ng Kdump: loaded Not 
> > tainted 5.14.0-55.el9.ppc64le #1
> > [  580.423139] NIP:  c00f8ff4 LR: c01f7c38 CTR: 
> > 
> > [  580.423140] REGS: c043fdd7bd60 TRAP: 0900   Not tainted  
> > (5.14.0-55.el9.ppc64le)
> > [  580.423142] MSR:  8280b033   
> > CR: 28008202  XER: 2004
> > [  580.423148] CFAR: 0c00 IRQMASK: 1 
> >GPR00: 28008202 c044c46b3850 c2a46f00 
> >  
> >GPR04:   0010 
> > c2a83060 
> >GPR08:  0001 0001 
> >  
> >GPR12: c01b9530 c043ffe16700 00020117 
> > 10185ea8 
> >GPR16: 10212150 10186198 101863a0 
> > 1021b3c0 
> >GPR20: 0001  0001 
> > 00ff 
> >GPR24: c043f4a00e14 c043fafe0e00 0c44 
> >  
> >GPR28: c043f4a00e00 c043f4a00e00 c21e0e00 
> > c2561aa0 
> > [  580.423166] NIP [c00f8ff4] plpar_hcall_norets_notrace+0x18/0x2c
> > [  580.423168] LR [c01f7c38] 
> > __pv_queued_spin_lock_slowpath+0x528/0x530
> > [  580.423173] Call Trace:
> > [  580.423174] [c044c46b3850] [00016b60] 0x16b60 
> > (unreliable)
> > [  580.423177] [c044c46b3910] [c0ea6948] 
> > _raw_spin_lock_irqsave+0xa8/0xc0
> > [  580.423182] [c044c46b3940] [c01dd7c0] 
> > prepare_to_wait_event+0x40/0x200
> > [  580.423185] [c044c46b39a0] [c019e9e0] 
> > __request_module+0x320/0x510
> > [  580.423188] [c044c46b3ac0] [c06f1a14] 
> > crypto_alg_mod_lookup+0x1e4/0x2e0
> > [  580.423192] [c044c46b3b60] [c06f2178] 
> > crypto_alloc_tfm_node+0xa8/0x1a0
> > [  580.423194] [c044c46b3be0] [c06f84f8] 
> > crypto_alloc_aead+0x38/0x50
> > [  580.423196] [c044c46b3c00] [c072cba0] aead_bind+0x70/0x140
> > [  580.423199] [c044c46b3c40] [c0727824] alg_bind+0xb4/0x210
> > [  580.423201] [c044c46b3cc0] [c0bc2ad4] __sys_bind+0x114/0x160
> > [  580.423205] [c044c46b3d90] [c0bc2b48] sys_bind+0x28/0x40
> > [  580.423207] [c044c46b3db0] [c0030880] 
> > system_call_exception+0x160/0x300
> > [  580.423209] [c044c46b3e10] [c000c168] 
> > system_call_vectored_common+0xe8/0x278
> > [  580.423213] --- interrupt: 3000 at 0x7fff9b824464
> > [  580.423214] NIP:  7fff9b824464 LR:  CTR: 
> > 
> > [  580.423215] REGS: c044c46b3e80 TRAP: 3000   Not tainted  
> > (5.14.0-55.el9.ppc64le)
> > [  580.423216] MSR:  8280f033   
> > CR: 42004802  XER: 
> > [  580.423221] IRQMASK: 0 
> >GPR00: 0147 7fffdcff2780 7fff9b917100 
> > 0004 
> >GPR04: 7fffdcff27e0 0058  
> >  
> >GPR08: 

Re: request_module DoS

2022-05-08 Thread Luis Chamberlain
On Sat, May 07, 2022 at 12:14:47PM -0700, Luis Chamberlain wrote:
> On Sat, May 07, 2022 at 01:02:20AM -0700, Luis Chamberlain wrote:
> > You can try to reproduce by using adding a new test type for crypto-aegis256
> > on lib/test_kmod.c. These tests however can try something similar but other
> > modules.
> > 
> > /tools/testing/selftests/kmod/kmod.sh -t 0008
> > /tools/testing/selftests/kmod/kmod.sh -t 0009
> > 
> > I can't decipher this yet.
> 
> Without testing it... but something like this might be an easier
> reproducer:
> 
> + config_set_driver crypto-aegis256

If the module is not present though nothing really happens, and so
is it possible this is another issue?

Below a bogus module request.

diff --git a/tools/testing/selftests/kmod/kmod.sh 
b/tools/testing/selftests/kmod/kmod.sh
index afd42387e8b2..a747ad549940 100755
--- a/tools/testing/selftests/kmod/kmod.sh
+++ b/tools/testing/selftests/kmod/kmod.sh
@@ -65,6 +66,7 @@ ALL_TESTS="$ALL_TESTS 0010:1:1"
 ALL_TESTS="$ALL_TESTS 0011:1:1"
 ALL_TESTS="$ALL_TESTS 0012:1:1"
 ALL_TESTS="$ALL_TESTS 0013:1:1"
+ALL_TESTS="$ALL_TESTS 0014:150:1"
 
 # Kselftest framework requirement - SKIP code is 4.
 ksft_skip=4
@@ -504,6 +506,17 @@ kmod_test_0013()
"cat /sys/module/${DEFAULT_KMOD_DRIVER}/sections/.*text | head 
-n1"
 }
 
+kmod_test_0014()
+{
+   kmod_defaults_driver
+   MODPROBE_LIMIT=$(config_get_modprobe_limit)
+   let EXTRA=$MODPROBE_LIMIT/6
+   config_set_driver bogus_module_does_not_exist
+   config_num_thread_limit_extra $EXTRA
+   config_trigger ${FUNCNAME[0]}
+   config_expect_result ${FUNCNAME[0]} MODULE_NOT_FOUND
+}
+
 list_tests()
 {
echo "Test ID list:"
@@ -525,6 +538,7 @@ list_tests()
echo "0011 x $(get_test_count 0011) - test completely disabling module 
autoloading"
echo "0012 x $(get_test_count 0012) - test /proc/modules address 
visibility under CAP_SYSLOG"
echo "0013 x $(get_test_count 0013) - test /sys/module/*/sections/* 
visibility under CAP_SYSLOG"
+   echo "0014 x $(get_test_count 0014) - multithreaded - push 
kmod_concurrent over max_modprobes for request_module() for a missing module"
 }
 
 usage()


Re: request_module DoS

2022-05-07 Thread Luis Chamberlain
On Sat, May 07, 2022 at 01:02:20AM -0700, Luis Chamberlain wrote:
> You can try to reproduce by using adding a new test type for crypto-aegis256
> on lib/test_kmod.c. These tests however can try something similar but other
> modules.
> 
> /tools/testing/selftests/kmod/kmod.sh -t 0008
> /tools/testing/selftests/kmod/kmod.sh -t 0009
> 
> I can't decipher this yet.

Without testing it... but something like this might be an easier
reproducer:

diff --git a/tools/testing/selftests/kmod/kmod.sh 
b/tools/testing/selftests/kmod/kmod.sh
index afd42387e8b2..48b6b5ec6c1e 100755
--- a/tools/testing/selftests/kmod/kmod.sh
+++ b/tools/testing/selftests/kmod/kmod.sh
@@ -41,6 +41,7 @@ set -e
 TEST_NAME="kmod"
 TEST_DRIVER="test_${TEST_NAME}"
 TEST_DIR=$(dirname $0)
+PROC_CONFIG="/proc/config.gz"
 
 # This represents
 #
@@ -65,6 +66,7 @@ ALL_TESTS="$ALL_TESTS 0010:1:1"
 ALL_TESTS="$ALL_TESTS 0011:1:1"
 ALL_TESTS="$ALL_TESTS 0012:1:1"
 ALL_TESTS="$ALL_TESTS 0013:1:1"
+ALL_TESTS="$ALL_TESTS 0014:150:1"
 
 # Kselftest framework requirement - SKIP code is 4.
 ksft_skip=4
@@ -79,6 +81,19 @@ test_modprobe()
fi
 }
 
+kconfig_has()
+{
+   if [ -f $PROC_CONFIG ]; then
+   if zgrep -q $1 $PROC_CONFIG 2>/dev/null; then
+   echo "yes"
+   else
+   echo "no"
+   fi
+   else
+   echo "no"
+   fi
+}
+
 function allow_user_defaults()
 {
if [ -z $DEFAULT_KMOD_DRIVER ]; then
@@ -106,6 +121,8 @@ function allow_user_defaults()
fi
 
MODPROBE_LIMIT_FILE="${PROC_DIR}/kmod-limit"
+   HAS_CRYPTO_AEGIS256_MOD="$(kconfig_has CONFIG_CRYPTO_AEGIS256=m)"
+   HAS_CRYPTO_AEGIS256_BUILTIN="$(kconfig_has CONFIG_CRYPTO_AEGIS256=y)"
 }
 
 test_reqs()
@@ -504,6 +521,21 @@ kmod_test_0013()
"cat /sys/module/${DEFAULT_KMOD_DRIVER}/sections/.*text | head 
-n1"
 }
 
+kmod_test_0014()
+{
+   kmod_defaults_driver
+   MODPROBE_LIMIT=$(config_get_modprobe_limit)
+   let EXTRA=$MODPROBE_LIMIT/6
+   config_set_driver crypto-aegis256
+   config_num_thread_limit_extra $EXTRA
+   config_trigger ${FUNCNAME[0]}
+   if [[ "$HAS_CRYPTO_AEGIS256_MOD" == "yes" || 
"$HAS_CRYPTO_AEGIS256_BUILTIN" == "yes" ]]; then
+   config_expect_result ${FUNCNAME[0]} SUCCESS
+   else
+   config_expect_result ${FUNCNAME[0]} MODULE_NOT_FOUND
+   fi
+}
+
 list_tests()
 {
echo "Test ID list:"
@@ -525,6 +557,7 @@ list_tests()
echo "0011 x $(get_test_count 0011) - test completely disabling module 
autoloading"
echo "0012 x $(get_test_count 0012) - test /proc/modules address 
visibility under CAP_SYSLOG"
echo "0013 x $(get_test_count 0013) - test /sys/module/*/sections/* 
visibility under CAP_SYSLOG"
+   echo "0014 x $(get_test_count 0014) - multithreaded - push 
kmod_concurrent over max_modprobes for request_module() for crypto-aegis256"
 }
 
 usage()


Re: request_module DoS

2022-05-07 Thread Luis Chamberlain
On Sat, May 07, 2022 at 07:10:23AM +, Christophe Leroy wrote:
> > There are some code paths in the kernel where you can reliably
> > trigger a request_module of a non-existant module.  For example,
> > if you attempt to load a non-existent crypto algorithm, or create
> > a socket of a non-existent network family, it will result in a
> > request_module call that is guaranteed to fail.
> > 
> > As user-space can do this repeatedly, it can quickly overwhelm
> > the concurrency limit in kmod.  This in itself is expected,
> > however, at least on some platforms this appears to result in
> > a live-lock.  Here is an example triggered by stress-ng on ppc64:
> > 
> > [  579.845320] request_module: modprobe crypto-aegis256 cannot be 
> > processed, kmod busy with 50 threads for more than 5 seconds now
> > [  580.414590] __request_module: 25 callbacks suppressed
> > [  580.414597] request_module: kmod_concurrent_max (0) close to 0 
> > (max_modprobes: 50), for module crypto-aegis256-all, throttling...
> > [  580.423082] watchdog: CPU 784 self-detected hard LOCKUP @ 
> > plpar_hcall_norets_notrace+0x18/0x2c
> > [  580.423097] watchdog: CPU 784 TB:1297691958559475, last heartbeat 
> > TB:1297686321743840 (11009ms ago)
> > [  580.423099] Modules linked in: cast6_generic cast5_generic cast_common 
> > camellia_generic blowfish_generic blowfish_common tun nft_fib_inet 
> > nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 
> > nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack 
> > nf_defrag_ipv6 nf_defrag_ipv4 rfkill bonding tls ip_set nf_tables nfnetlink 
> > pseries_rng binfmt_misc drm drm_panel_orientation_quirks xfs libcrc32c 
> > sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp vmx_crypto dm_mirror 
> > dm_region_hash dm_log dm_mod fuse
> > [  580.423136] CPU: 784 PID: 77071 Comm: stress-ng Kdump: loaded Not 
> > tainted 5.14.0-55.el9.ppc64le #1
> > [  580.423139] NIP:  c00f8ff4 LR: c01f7c38 CTR: 
> > 
> > [  580.423140] REGS: c043fdd7bd60 TRAP: 0900   Not tainted  
> > (5.14.0-55.el9.ppc64le)
> > [  580.423142] MSR:  8280b033   
> > CR: 28008202  XER: 2004
> > [  580.423148] CFAR: 0c00 IRQMASK: 1
> > GPR00: 28008202 c044c46b3850 c2a46f00 
> > 
> > GPR04:   0010 
> > c2a83060
> > GPR08:  0001 0001 
> > 
> > GPR12: c01b9530 c043ffe16700 00020117 
> > 10185ea8
> > GPR16: 10212150 10186198 101863a0 
> > 1021b3c0
> > GPR20: 0001  0001 
> > 00ff
> > GPR24: c043f4a00e14 c043fafe0e00 0c44 
> > 
> > GPR28: c043f4a00e00 c043f4a00e00 c21e0e00 
> > c2561aa0
> > [  580.423166] NIP [c00f8ff4] plpar_hcall_norets_notrace+0x18/0x2c
> > [  580.423168] LR [c01f7c38] 
> > __pv_queued_spin_lock_slowpath+0x528/0x530
> > [  580.423173] Call Trace:
> > [  580.423174] [c044c46b3850] [00016b60] 0x16b60 
> > (unreliable)
> > [  580.423177] [c044c46b3910] [c0ea6948] 
> > _raw_spin_lock_irqsave+0xa8/0xc0
> > [  580.423182] [c044c46b3940] [c01dd7c0] 
> > prepare_to_wait_event+0x40/0x200
> > [  580.423185] [c044c46b39a0] [c019e9e0] 
> > __request_module+0x320/0x510
> > [  580.423188] [c044c46b3ac0] [c06f1a14] 
> > crypto_alg_mod_lookup+0x1e4/0x2e0
> > [  580.423192] [c044c46b3b60] [c06f2178] 
> > crypto_alloc_tfm_node+0xa8/0x1a0
> > [  580.423194] [c044c46b3be0] [c06f84f8] 
> > crypto_alloc_aead+0x38/0x50
> > [  580.423196] [c044c46b3c00] [c072cba0] aead_bind+0x70/0x140
> > [  580.423199] [c044c46b3c40] [c0727824] alg_bind+0xb4/0x210
> > [  580.423201] [c044c46b3cc0] [c0bc2ad4] __sys_bind+0x114/0x160
> > [  580.423205] [c044c46b3d90] [c0bc2b48] sys_bind+0x28/0x40
> > [  580.423207] [c044c46b3db0] [c0030880] 
> > system_call_exception+0x160/0x300
> > [  580.423209] [c044c46b3e10] [c000c168] 
> > system_call_vectored_common+0xe8/0x278
> > [  580.423213] --- interrupt: 3000 at 0x7fff9b824464
> > [  580.423214] NIP:  7fff9b824464 LR:  CTR: 
> > 
> > [  580.423215] REGS: c044c46b3e80 TRAP: 3000   Not tainted  
> > (5.14.0-55.el9.ppc64le)
> > [  580.423216] MSR:  8280f033   
> > CR: 42004802  XER: 
> > [  580.423221] IRQMASK: 0
> > GPR00: 0147 7fffdcff2780 7fff9b917100 
> > 0004
> > GPR04: 7fffdcff27e0 0058  
> > 
> > GPR08:   

Re: [PATCH v6 0/6] Allocate module text and data separately

2022-03-22 Thread Luis Chamberlain
On Wed, Feb 23, 2022 at 01:02:10PM +0100, Christophe Leroy wrote:
> This series applies on top of my series "miscellanuous cleanups" v4.

Queued onto modules-testing! BTW I just had to rebase the change
with the kdb changes, it was a trivial change.

  Luis


Re: [PATCH v5 0/6] KEXEC_SIG with appended signature

2022-02-10 Thread Luis Chamberlain
On Wed, Feb 09, 2022 at 03:46:05PM +1100, Michael Ellerman wrote:
> Luis Chamberlain  writes:
> > On Tue, Jan 11, 2022 at 12:37:42PM +0100, Michal Suchanek wrote:
> >> Hello,
> >> 
> >> This is a refresh of the KEXEC_SIG series.
> >> 
> >> This adds KEXEC_SIG support on powerpc and deduplicates the code dealing
> >> with appended signatures in the kernel.
> >> 
> >> powerpc supports IMA_KEXEC but that's an exception rather than the norm.
> >> On the other hand, KEXEC_SIG is portable across platforms.
> >> 
> >> For distributions to have uniform security features across platforms one
> >> option should be used on all platforms.
> >> 
> >> Thanks
> >> 
> >> Michal
> >> 
> >> Previous revision: 
> >> https://lore.kernel.org/linuxppc-dev/cover.1637862358.git.msucha...@suse.de/
> >> Patched kernel tree: https://github.com/hramrach/kernel/tree/kexec_sig
> >> 
> >> Michal Suchanek (6):
> >>   s390/kexec_file: Don't opencode appended signature check.
> >>   powerpc/kexec_file: Add KEXEC_SIG support.
> >>   kexec_file: Don't opencode appended signature verification.
> >>   module: strip the signature marker in the verification function.
> >>   module: Use key_being_used_for for log messages in
> >> verify_appended_signature
> >>   module: Move duplicate mod_check_sig users code to mod_parse_sig
> >
> > What tree should this go through? I'd prefer if over through modules
> > tree as it can give a chance for Aaron Tomlin to work with this for his
> > code refactoring of kernel/module*.c to kernel/module/
> 
> Yeah that's fine by me, the arch changes are pretty minimal and unlikely
> to conflict much.

Ok sounds good thanks.

  Luis


Re: [PATCH v3 4/6] modules: Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC

2022-02-03 Thread Luis Chamberlain
On Thu, Feb 03, 2022 at 07:05:13AM +, Christophe Leroy wrote:
> 
> 
> Le 03/02/2022 à 01:01, Luis Chamberlain a écrit :
> > On Sat, Jan 29, 2022 at 05:02:09PM +, Christophe Leroy wrote:
> >> diff --git a/kernel/module.c b/kernel/module.c
> >> index 11f51e17fb9f..f3758115ebaa 100644
> >> --- a/kernel/module.c
> >> +++ b/kernel/module.c
> >> @@ -81,7 +81,9 @@
> >>   /* If this is set, the section belongs in the init part of the module */
> >>   #define INIT_OFFSET_MASK (1UL << (BITS_PER_LONG-1))
> >>   
> >> +#ifndef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
> >>   #define  data_layout core_layout
> >> +#endif
> >>   
> >>   /*
> >>* Mutex protects:
> >> @@ -111,6 +113,12 @@ static struct mod_tree_root {
> >>   #define module_addr_min mod_tree.addr_min
> >>   #define module_addr_max mod_tree.addr_max
> >>   
> >> +#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
> >> +static struct mod_tree_root mod_data_tree __cacheline_aligned = {
> >> +  .addr_min = -1UL,
> >> +};
> >> +#endif
> >> +
> >>   #ifdef CONFIG_MODULES_TREE_LOOKUP
> >>   
> >>   /*
> >> @@ -186,6 +194,11 @@ static void mod_tree_insert(struct module *mod)
> >>__mod_tree_insert(>core_layout.mtn, _tree);
> >>if (mod->init_layout.size)
> >>__mod_tree_insert(>init_layout.mtn, _tree);
> >> +
> >> +#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
> >> +  mod->data_layout.mtn.mod = mod;
> >> +  __mod_tree_insert(>data_layout.mtn, _data_tree);
> >> +#endif
> > 
> > 
> > kernel/ directory has quite a few files, module.c is the second to
> > largest file, and it has tons of stuff. Aaron is doing work to
> > split things out to make code easier to read and so that its easier
> > to review changes. See:
> > 
> > https://lkml.kernel.org/r/20220130213214.1042497-1-atom...@redhat.com
> > 
> > I think this is a good patch example which could benefit from that work.
> > So I'd much prefer to see that work go in first than this, so to see if
> > we can make the below changes more compartamentalized.
> > 
> > Curious, how much testing has been put into this series?
> 
> 
> I tested the change up to (including) patch 4 to verify it doesn't 
> introduce regression when not using 
> CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC,

> Then I tested with patch 5. I first tried with the 'hello world' test 
> module. After that I loaded several important modules and checked I 
> didn't get any regression, both with and without STRICT_MODULES_RWX and 
> I checked the consistency in /proc/vmallocinfo
>   /proc/modules /sys/class/modules/*

I wonder if we have a test for STRICT_MODULES_RWX.

> I also tested with a hacked module_alloc() to force branch trampolines.

So to verify that reducing these trampolines actually helps on an
architecture? I wonder if we can generalize this somehow to let archs
verify such strategies can help.

I was hoping for a bit more wider testing, like actually users, etc.
It does not seem like so. So we can get to that by merging this soon
into modules-next and having this bleed out issues with linux-next.
We are in good time to do this now.

The kmod tree has tons of tests:

https://git.kernel.org/pub/scm/utils/kernel/kmod/kmod.git/

Can you use that to verify there are no regressions?

Aaron, Michal, if you can do the same that'd be appreciated.


  Luis


Re: [PATCH v3 0/6] Allocate module text and data separately

2022-02-02 Thread Luis Chamberlain
On Sat, Jan 29, 2022 at 05:02:03PM +, Christophe Leroy wrote:
> This series allow architectures to request having modules data in
> vmalloc area instead of module area.
> 
> This is required on powerpc book3s/32 in order to set data non
> executable, because it is not possible to set executability on page
> basis, this is done per 256 Mbytes segments. The module area has exec
> right, vmalloc area has noexec. Without this change module data
> remains executable regardless of CONFIG_STRICT_MODULES_RWX.
> 
> This can also be useful on other powerpc/32 in order to maximize the
> chance of code being close enough to kernel core to avoid branch
> trampolines.
> 

This looks good, however I'd like to see Aaron's changes go in first,
and then yours. Aaron's changes still need to be tested by 0-day and I
need to finish review, but that's the order of how I'd prefer to see
changes merged / tested. I'll try to review his changes, dump them to
modules-next and then I'd like to trouble you to rebase ontop of that.

We should get all this tested early for the next release.

  Luis


Re: [PATCH v3 4/6] modules: Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC

2022-02-02 Thread Luis Chamberlain
On Sat, Jan 29, 2022 at 05:02:09PM +, Christophe Leroy wrote:
> diff --git a/kernel/module.c b/kernel/module.c
> index 11f51e17fb9f..f3758115ebaa 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -81,7 +81,9 @@
>  /* If this is set, the section belongs in the init part of the module */
>  #define INIT_OFFSET_MASK (1UL << (BITS_PER_LONG-1))
>  
> +#ifndef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
>  #define  data_layout core_layout
> +#endif
>  
>  /*
>   * Mutex protects:
> @@ -111,6 +113,12 @@ static struct mod_tree_root {
>  #define module_addr_min mod_tree.addr_min
>  #define module_addr_max mod_tree.addr_max
>  
> +#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
> +static struct mod_tree_root mod_data_tree __cacheline_aligned = {
> + .addr_min = -1UL,
> +};
> +#endif
> +
>  #ifdef CONFIG_MODULES_TREE_LOOKUP
>  
>  /*
> @@ -186,6 +194,11 @@ static void mod_tree_insert(struct module *mod)
>   __mod_tree_insert(>core_layout.mtn, _tree);
>   if (mod->init_layout.size)
>   __mod_tree_insert(>init_layout.mtn, _tree);
> +
> +#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
> + mod->data_layout.mtn.mod = mod;
> + __mod_tree_insert(>data_layout.mtn, _data_tree);
> +#endif


kernel/ directory has quite a few files, module.c is the second to
largest file, and it has tons of stuff. Aaron is doing work to
split things out to make code easier to read and so that its easier
to review changes. See:

https://lkml.kernel.org/r/20220130213214.1042497-1-atom...@redhat.com

I think this is a good patch example which could benefit from that work.
So I'd much prefer to see that work go in first than this, so to see if
we can make the below changes more compartamentalized.

Curious, how much testing has been put into this series?

  Luis


Re: [PATCH v3 3/6] modules: Introduce data_layout

2022-02-02 Thread Luis Chamberlain
On Sat, Jan 29, 2022 at 05:02:07PM +, Christophe Leroy wrote:
> diff --git a/kernel/module.c b/kernel/module.c
> index 163e32e39064..11f51e17fb9f 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -81,6 +81,8 @@
>  /* If this is set, the section belongs in the init part of the module */
>  #define INIT_OFFSET_MASK (1UL << (BITS_PER_LONG-1))
>  
> +#define  data_layout core_layout
> +
>  /*
>   * Mutex protects:
>   * 1) List of modules (also safely readable with preempt_disable),
> @@ -2451,7 +2454,10 @@ static void layout_sections(struct module *mod, struct 
> load_info *info)
>   || s->sh_entsize != ~0UL
>   || module_init_layout_section(sname))
>   continue;
> - s->sh_entsize = get_offset(mod, >core_layout.size, 
> s, i);
> + if (m)
> + s->sh_entsize = get_offset(mod, 
> >data_layout.size, s, i);
> + else
> + s->sh_entsize = get_offset(mod, 
> >core_layout.size, s, i);
>   pr_debug("\t%s\n", sname);

Huh why is this branching here, given you just used mod->data_layout in
all other areas?

> @@ -3468,6 +3474,8 @@ static int move_module(struct module *mod, struct 
> load_info *info)
>   if (shdr->sh_entsize & INIT_OFFSET_MASK)
>   dest = mod->init_layout.base
>   + (shdr->sh_entsize & ~INIT_OFFSET_MASK);
> + else if (!(shdr->sh_flags & SHF_EXECINSTR))
> + dest = mod->data_layout.base + shdr->sh_entsize;
>   else
>   dest = mod->core_layout.base + shdr->sh_entsize;
>  

Likewise here.

  Luis


Re: [PATCH v2 5/5] powerpc: Select ARCH_WANTS_MODULES_DATA_IN_VMALLOC on book3s/32 and 8xx

2022-02-02 Thread Luis Chamberlain
On Thu, Jan 27, 2022 at 11:28:12AM +, Christophe Leroy wrote:
> book3s/32 and 8xx have a separate area for allocating modules,
> defined by MODULES_VADDR / MODULES_END.
> 
> On book3s/32, it is not possible to protect against execution
> on a page basis. A full 256M segment is either Exec or NoExec.
> The module area is in an Exec segment while vmalloc area is
> in a NoExec segment.
> 
> In order to protect module data against execution, select
> ARCH_WANTS_MODULES_DATA_IN_VMALLOC.
> 
> For the 8xx (and possibly other 32 bits platform in the future),
> there is no such constraint on Exec/NoExec protection, however
> there is a critical distance between kernel functions and callers
> that needs to remain below 32Mbytes in order to avoid costly
> trampolines. By allocating data outside of module area, we
> increase the chance for module text to remain within acceptable
> distance from kernel core text.
> 
> So select ARCH_WANTS_MODULES_DATA_IN_VMALLOC for 8xx as well.
> 
> Signed-off-by: Christophe Leroy 
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 

Cc list first and then the SOB.

  Luis


Re: [PATCH 6/7] modules: Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC

2022-02-02 Thread Luis Chamberlain
On Wed, Jan 26, 2022 at 06:38:30AM +, Christophe Leroy wrote:
> 
> 
> Le 25/01/2022 à 22:10, Luis Chamberlain a écrit :
> > On Mon, Jan 24, 2022 at 09:22:34AM +, Christophe Leroy wrote:
> >> This can also be useful on other powerpc/32 in order to maximize the
> >> chance of code being close enough to kernel core to avoid branch
> >> trampolines.
> > 
> > Curious about all this branch trampoline talk. Do you have data to show
> > negative impact with things as-is?
> 
> See 
> https://github.com/linuxppc/linux/commit/2ec13df167040cd153c25c4d96d0ffc573ac4c40
> 
> Or 
> https://github.com/linuxppc/linux/commit/7d485f647c1f4a6976264c90447fb0dbf07b111d


This was useful and fun to read, thanks.

> > Also, was powerpc/32 broken then without this? The commit log seems to
> > suggest so, but I don't think that's the case. How was this issue noticed?
> 
> 
> Your question is related to the trampoline topic or the exec/noexec 
> flagging ?
> 
> Regarding trampoline, everything is working OK. That's just cherry on 
> the cake, when putting data away you can have more code closer to the 
> kernel. But that would not have been a reason in itself for this series.
> 
> Regarding the exec/noexec discussion, it's a real issue. powerpc/32 
> doesn't honor page exec flag, so when you select STRICT_MODULES_RWX and 
> flag module data as no-exec, it remains executable. That's because 
> powerpc/32 MMU doesn't have a per page exec flag but only a per 
> 256Mbytes segment exec flag.
> 
> Typical PPC32 layount:
> 0xf000-0x : VMALLOC AREA ==> NO EXEC
> 0xc000-0xefff : Linear kernel memory mapping
> 0xb000-0xbfff : MODULES AREA ==> EXEC
> 0x-0xafff : User space ==> EXEC
> 
> So STRICT_MODULES_RWX is broken on some powerpc/32

You know, this is the sort of information that I think would be
very useful for the commit log. Can you ammend?

> > 
> > Are there other future CPU families being planned where this is all true for
> > as well? Are they goin to be 32-bit as well?
> 
> Future I don't know.
> 
> Regarding the trampoline stuff, I see at least the following existing 
> architectures with a similar constraint:
> 
> ARM:
> 
> https://elixir.bootlin.com/linux/v5.16/source/arch/arm/include/asm/memory.h#L55
> 
> ARM even has a config item to allow trampolines or not. I might add the 
> same to powerpc to reduce number of pages used by modules.
> 
> https://elixir.bootlin.com/linux/v5.16/source/arch/arm/Kconfig#L1514
> 
> NDS32 has the constraint
> 
> https://elixir.bootlin.com/linux/v5.16/source/arch/nds32/include/asm/memory.h#L41
> 
> NIOS2 has the constraint, allthough they handled it in a different way:
> 
> https://elixir.bootlin.com/linux/v5.16/source/arch/nios2/kernel/module.c#L30
> 
> 
> 
> Even ARM64 benefits from modules closer to kernel:
> 
> https://elixir.bootlin.com/linux/v5.16/source/arch/arm64/Kconfig#L1848
> 
> 
> Another future opportunity with the ability to allocate module parts 
> separately is the possibility to then use huge vmalloc mappings.
> 
> Today huge vmalloc mappings cannot be used for modules, see recent 
> discussion at 
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20211227145903.187152-4-wangkefeng.w...@huawei.com/

Alrighty, this is sufficient information, thanks!

  Luis


Re: [PATCH 6/7] modules: Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC

2022-01-25 Thread Luis Chamberlain
On Mon, Jan 24, 2022 at 09:22:34AM +, Christophe Leroy wrote:
> This can also be useful on other powerpc/32 in order to maximize the
> chance of code being close enough to kernel core to avoid branch
> trampolines.

Curious about all this branch trampoline talk. Do you have data to show
negative impact with things as-is?

Also, was powerpc/32 broken then without this? The commit log seems to
suggest so, but I don't think that's the case. How was this issue noticed?

Are there other future CPU families being planned where this is all true for
as well? Are they goin to be 32-bit as well?

  Luis


Re: [PATCH 0/7] Allocate module text and data separately

2022-01-25 Thread Luis Chamberlain
On Mon, Jan 24, 2022 at 09:22:11AM +, Christophe Leroy wrote:
> This series allow architectures to request having modules data in
> vmalloc area instead of module area.
> 
> This is required on powerpc book3s/32 in order to set data non
> executable, because it is not possible to set executability on page
> basis, this is done per 256 Mbytes segments. The module area has exec
> right, vmalloc area has noexec.
> 
> This can also be useful on other powerpc/32 in order to maximize the
> chance of code being close enough to kernel core to avoid branch
> trampolines.

Am I understanding that this entire effort is for 32-bit powerpc?
If so, why such an interest in 32-bit these days?

  Luis


Re: [PATCH v5 0/6] KEXEC_SIG with appended signature

2022-01-25 Thread Luis Chamberlain
On Tue, Jan 11, 2022 at 12:37:42PM +0100, Michal Suchanek wrote:
> Hello,
> 
> This is a refresh of the KEXEC_SIG series.
> 
> This adds KEXEC_SIG support on powerpc and deduplicates the code dealing
> with appended signatures in the kernel.
> 
> powerpc supports IMA_KEXEC but that's an exception rather than the norm.
> On the other hand, KEXEC_SIG is portable across platforms.
> 
> For distributions to have uniform security features across platforms one
> option should be used on all platforms.
> 
> Thanks
> 
> Michal
> 
> Previous revision: 
> https://lore.kernel.org/linuxppc-dev/cover.1637862358.git.msucha...@suse.de/
> Patched kernel tree: https://github.com/hramrach/kernel/tree/kexec_sig
> 
> Michal Suchanek (6):
>   s390/kexec_file: Don't opencode appended signature check.
>   powerpc/kexec_file: Add KEXEC_SIG support.
>   kexec_file: Don't opencode appended signature verification.
>   module: strip the signature marker in the verification function.
>   module: Use key_being_used_for for log messages in
> verify_appended_signature
>   module: Move duplicate mod_check_sig users code to mod_parse_sig

What tree should this go through? I'd prefer if over through modules
tree as it can give a chance for Aaron Tomlin to work with this for his
code refactoring of kernel/module*.c to kernel/module/

  Luis


Re: [PATCH v5 6/6] module: Move duplicate mod_check_sig users code to mod_parse_sig

2022-01-25 Thread Luis Chamberlain
On Tue, Jan 11, 2022 at 12:37:48PM +0100, Michal Suchanek wrote:
> Multiple users of mod_check_sig check for the marker, then call
> mod_check_sig, extract signature length, and remove the signature.
> 
> Put this code in one place together with mod_check_sig.
> 
> This changes the error from ENOENT to ENODATA for ima_read_modsig in the
> case the signature marker is missing.
> 
> This also changes the buffer length in ima_read_modsig from size_t to
> unsigned long. This reduces the possible value range on 32bit but the
> length refers to kernel in-memory buffer which cannot be longer than
> ULONG_MAX.
> 
> Also change mod_check_sig to unsigned long while at it.
> 
> Signed-off-by: Michal Suchanek 

Reviewed-by: Luis Chamberlain 

  Luis


Re: [PATCH v5 5/6] module: Use key_being_used_for for log messages in verify_appended_signature

2022-01-25 Thread Luis Chamberlain
On Tue, Jan 11, 2022 at 12:37:47PM +0100, Michal Suchanek wrote:
> Add value for kexec appended signature and pass in key_being_used_for
> enum rather than a string to verify_appended_signature to produce log
> messages about the signature.
> 
> Signed-off-by: Michal Suchanek 

Acked-by: Luis Chamberlain 

  Luis


Re: [PATCH v5 4/6] module: strip the signature marker in the verification function.

2022-01-25 Thread Luis Chamberlain
On Tue, Jan 11, 2022 at 12:37:46PM +0100, Michal Suchanek wrote:
> It is stripped by each caller separately.
> 
> Note: this changes the error for kexec_file from EKEYREJECTED to ENODATA
> when the signature marker is missing.
> 
> Signed-off-by: Michal Suchanek 
> ---
> v3: - Philipp Rudo : Update the commit with note about
>   change of raturn value
> - the module_signature.h is now no longer needed for kexec_file

Reviewed-by: Luis Chamberlain 

  Luis


Re: [PATCH v5 3/6] kexec_file: Don't opencode appended signature verification.

2022-01-25 Thread Luis Chamberlain
On Tue, Jan 11, 2022 at 12:37:45PM +0100, Michal Suchanek wrote:
> diff --git a/include/linux/verification.h b/include/linux/verification.h
> index a655923335ae..32db9287a7b0 100644
> --- a/include/linux/verification.h
> +++ b/include/linux/verification.h
> @@ -60,5 +60,8 @@ extern int verify_pefile_signature(const void *pebuf, 
> unsigned pelen,
>  enum key_being_used_for usage);
>  #endif
>  
> +int verify_appended_signature(const void *data, unsigned long *len,
> +   struct key *trusted_keys, const char *what);
> +

Looks very non-module specific.

> diff --git a/kernel/module_signing.c b/kernel/module_signing.c
> index 8723ae70ea1f..30149969f21f 100644
> --- a/kernel/module_signing.c
> +++ b/kernel/module_signing.c
> @@ -14,32 +14,38 @@
>  #include 
>  #include "module-internal.h"
>  
> -/*
> - * Verify the signature on a module.
> +/**
> + * verify_appended_signature - Verify the signature on a module with the
> + * signature marker stripped.
> + * @data: The data to be verified
> + * @len: Size of @data.
> + * @trusted_keys: Keyring to use for verification
> + * @what: Informational string for log messages
>   */
> -int mod_verify_sig(const void *mod, struct load_info *info)
> +int verify_appended_signature(const void *data, unsigned long *len,
> +   struct key *trusted_keys, const char *what)
>  {
> - struct module_signature ms;
> - size_t sig_len, modlen = info->len;
> + struct module_signature *ms;

There goes the abstraction, so why not make this clear where we re-use
the struct module_signature for various things and call it as it is,
verify_mod_appended_signature() or some such?

David? Any preference?

Other than that:

Reviewed-by: Luis Chamberlain 

  Luis


Re: [PATCH v4 1/6] s390/kexec_file: Don't opencode appended signature check.

2022-01-25 Thread Luis Chamberlain
On Mon, Jan 10, 2022 at 02:49:53PM +0100, Michal Suchanek wrote:
> Module verification already implements appeded signature check.
> 
> Reuse it for kexec_file.
> 
> The kexec_file implementation uses EKEYREJECTED error in some cases when
> there is no key and the common implementation uses ENOPKG or EBADMSG
> instead.
> 
> Signed-off-by: Michal Suchanek 
> Acked-by: Heiko Carstens 
> ---
> v3: Philipp Rudo : Update the commit with note about
> change of return value
> ---
>  arch/s390/kernel/machine_kexec_file.c | 22 +-
>  1 file changed, 5 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/s390/kernel/machine_kexec_file.c 
> b/arch/s390/kernel/machine_kexec_file.c
> index 8f43575a4dd3..c944d71316c7 100644
> --- a/arch/s390/kernel/machine_kexec_file.c
> +++ b/arch/s390/kernel/machine_kexec_file.c
> @@ -31,6 +31,7 @@ int s390_verify_sig(const char *kernel, unsigned long 
> kernel_len)
>   const unsigned long marker_len = sizeof(MODULE_SIG_STRING) - 1;
>   struct module_signature *ms;
>   unsigned long sig_len;
> + int ret;
>  
>   /* Skip signature verification when not secure IPLed. */
>   if (!ipl_secure_flag)
> @@ -45,25 +46,12 @@ int s390_verify_sig(const char *kernel, unsigned long 
> kernel_len)
>   kernel_len -= marker_len;
>  
>   ms = (void *)kernel + kernel_len - sizeof(*ms);
> - kernel_len -= sizeof(*ms);
> + ret = mod_check_sig(ms, kernel_len, "kexec");
> + if (ret)
> + return ret;
>  
>   sig_len = be32_to_cpu(ms->sig_len);
> - if (sig_len >= kernel_len)
> - return -EKEYREJECTED;

There is a small minor fix here, where by using mod_check_sig() now
decreased the kernel_len by the sizeof(*ms). It is minor though.

> - kernel_len -= sig_len;
> -
> - if (ms->id_type != PKEY_ID_PKCS7)
> - return -EKEYREJECTED;

More importantly is the return value used here changes but given the
Ack by Heiko I suspect this if fine and does not break old userspace,
the only change here is the possible error value returned by the
kexec_file_load() system call.

Reviewed-by: Luis Chamberlain 

   Luis


Re: [PATCH v2 6/8] inotify: simplify subdirectory registration with register_sysctl()

2021-11-24 Thread Luis Chamberlain
On Wed, Nov 24, 2021 at 10:44:09AM +0100, Jan Kara wrote:
> On Tue 23-11-21 12:24:20, Luis Chamberlain wrote:
> > From: Xiaoming Ni 
> > 
> > There is no need to user boiler plate code to specify a set of base
> > directories we're going to stuff sysctls under. Simplify this by using
> > register_sysctl() and specifying the directory path directly.
> > 
> > Move inotify_user sysctl to inotify_user.c while at it to remove clutter
> > from kernel/sysctl.c.
> > 
> > Signed-off-by: Xiaoming Ni 
> > [mcgrof: update commit log to reflect new path we decided to take]
> > Signed-off-by: Luis Chamberlain 
> 
> This looks fishy. You register inotify_table but not fanotify_table and
> remove both...

Indeed, the following was missing, I'll roll it in:

diff --git a/fs/notify/fanotify/fanotify_user.c 
b/fs/notify/fanotify/fanotify_user.c
index 559bc1e9926d..a35693eb1f36 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -59,7 +59,7 @@ static int fanotify_max_queued_events __read_mostly;
 static long ft_zero = 0;
 static long ft_int_max = INT_MAX;
 
-struct ctl_table fanotify_table[] = {
+static struct ctl_table fanotify_table[] = {
{
.procname   = "max_user_groups",
.data   = _user_ns.ucount_max[UCOUNT_FANOTIFY_GROUPS],
@@ -88,6 +88,13 @@ struct ctl_table fanotify_table[] = {
},
{ }
 };
+
+static void __init fanotify_sysctls_init(void)
+{
+   register_sysctl("fs/fanotify", fanotify_table);
+}
+#else
+#define fanotify_sysctls_init() do { } while (0)
 #endif /* CONFIG_SYSCTL */
 
 /*
@@ -1685,6 +1692,7 @@ static int __init fanotify_user_setup(void)
init_user_ns.ucount_max[UCOUNT_FANOTIFY_GROUPS] =
FANOTIFY_DEFAULT_MAX_GROUPS;
init_user_ns.ucount_max[UCOUNT_FANOTIFY_MARKS] = max_marks;
+   fanotify_sysctls_init();
 
return 0;
 }
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 616af2ea20f3..556cc63c88ee 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -5,8 +5,6 @@
 #include 
 #include 
 
-extern struct ctl_table fanotify_table[]; /* for sysctl */
-
 #define FAN_GROUP_FLAG(group, flag) \
((group)->fanotify_data.flags & (flag))
 


[PATCH v2 2/8] i915: simplify subdirectory registration with register_sysctl()

2021-11-23 Thread Luis Chamberlain
There is no need to user boiler plate code to specify a set of base
directories we're going to stuff sysctls under. Simplify this by using
register_sysctl() and specifying the directory path directly.

// pycocci sysctl-subdir-register-sysctl-simplify.cocci PATH

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
-   {
-   .procname = E1,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = sysctls,
-   },
-   { }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
-   {
-   .procname = E2,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = subdir,
-   },
-   { }
-};

@initialize:python@
@@

def make_my_fresh_expression(s1, s2):
  return '"' + s1.strip('"') + "/" + s2.strip('"') + '"'

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
fresh identifier E3 = script:python(E2, E1) { make_my_fresh_expression(E2, E1) 
};
@@

header =
-register_sysctl_table(base);
+register_sysctl(E3, sysctls);

Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain 
---
 drivers/gpu/drm/i915/i915_perf.c | 22 +-
 1 file changed, 1 insertion(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 2f01b8c0284c..5979e3258647 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -4273,26 +4273,6 @@ static struct ctl_table oa_table[] = {
{}
 };
 
-static struct ctl_table i915_root[] = {
-   {
-.procname = "i915",
-.maxlen = 0,
-.mode = 0555,
-.child = oa_table,
-},
-   {}
-};
-
-static struct ctl_table dev_root[] = {
-   {
-.procname = "dev",
-.maxlen = 0,
-.mode = 0555,
-.child = i915_root,
-},
-   {}
-};
-
 static void oa_init_supported_formats(struct i915_perf *perf)
 {
struct drm_i915_private *i915 = perf->i915;
@@ -4488,7 +4468,7 @@ static int destroy_config(int id, void *p, void *data)
 
 int i915_perf_sysctl_register(void)
 {
-   sysctl_header = register_sysctl_table(dev_root);
+   sysctl_header = register_sysctl("dev/i915", oa_table);
return 0;
 }
 
-- 
2.33.0



[PATCH v2 0/8] sysctl: second set of kernel/sysctl cleanups

2021-11-23 Thread Luis Chamberlain
This is the 2nd set of kernel/sysctl.c cleanups. The diff stat should
reflect how this is a much better way to deal with theses. Fortunately
coccinelle can be used to ensure correctness for most of these and/or
future merge conflicts.

Note that since this is part of a larger effort to cleanup
kernel/sysctl.c I think we have no other option but to go with
merging these patches in either Andrew's tree or keep them staged
in a separate tree and send a merge request later. Otherwise
kernel/sysctl.c will end up becoming a sore spot for the next
merge window.

Changes in this v2:

 * As suggested by Eric W. Biederman I dropped the subdir new call
   and just used the register_sysctl() by specifying the parent
   directory.
 * 0-day cleanups, commit log enhancements
 * Updated the coccinelle patch with register_sysctl()

Luis Chamberlain (6):
  hpet: simplify subdirectory registration with register_sysctl()
  i915: simplify subdirectory registration with register_sysctl()
  macintosh/mac_hid.c: simplify subdirectory registration with
register_sysctl()
  ocfs2: simplify subdirectory registration with register_sysctl()
  test_sysctl: simplify subdirectory registration with register_sysctl()
  cdrom: simplify subdirectory registration with register_sysctl()

Xiaoming Ni (2):
  inotify: simplify subdirectory registration with register_sysctl()
  eventpoll: simplify sysctl declaration with register_sysctl()

 drivers/cdrom/cdrom.c| 23 +--
 drivers/char/hpet.c  | 22 +-
 drivers/gpu/drm/i915/i915_perf.c | 22 +-
 drivers/macintosh/mac_hid.c  | 24 +---
 fs/eventpoll.c   | 10 +-
 fs/notify/inotify/inotify_user.c | 11 ++-
 fs/ocfs2/stackglue.c | 25 +
 include/linux/inotify.h  |  3 ---
 include/linux/poll.h |  2 --
 include/linux/sysctl.h   |  1 -
 kernel/sysctl.c  | 28 
 lib/test_sysctl.c| 22 +-
 12 files changed, 25 insertions(+), 168 deletions(-)

-- 
2.33.0



[PATCH v2 6/8] inotify: simplify subdirectory registration with register_sysctl()

2021-11-23 Thread Luis Chamberlain
From: Xiaoming Ni 

There is no need to user boiler plate code to specify a set of base
directories we're going to stuff sysctls under. Simplify this by using
register_sysctl() and specifying the directory path directly.

Move inotify_user sysctl to inotify_user.c while at it to remove clutter
from kernel/sysctl.c.

Signed-off-by: Xiaoming Ni 
[mcgrof: update commit log to reflect new path we decided to take]
Signed-off-by: Luis Chamberlain 
---
 fs/notify/inotify/inotify_user.c | 11 ++-
 include/linux/inotify.h  |  3 ---
 kernel/sysctl.c  | 21 -
 3 files changed, 10 insertions(+), 25 deletions(-)

diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 29fca3284bb5..54583f62dc44 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -58,7 +58,7 @@ struct kmem_cache *inotify_inode_mark_cachep __read_mostly;
 static long it_zero = 0;
 static long it_int_max = INT_MAX;
 
-struct ctl_table inotify_table[] = {
+static struct ctl_table inotify_table[] = {
{
.procname   = "max_user_instances",
.data   = 
_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES],
@@ -87,6 +87,14 @@ struct ctl_table inotify_table[] = {
},
{ }
 };
+
+static void __init inotify_sysctls_init(void)
+{
+   register_sysctl("fs/inotify", inotify_table);
+}
+
+#else
+#define inotify_sysctls_init() do { } while (0)
 #endif /* CONFIG_SYSCTL */
 
 static inline __u32 inotify_arg_to_mask(struct inode *inode, u32 arg)
@@ -849,6 +857,7 @@ static int __init inotify_user_setup(void)
inotify_max_queued_events = 16384;
init_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES] = 128;
init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = watches_max;
+   inotify_sysctls_init();
 
return 0;
 }
diff --git a/include/linux/inotify.h b/include/linux/inotify.h
index 6a24905f6e1e..8d20caa1b268 100644
--- a/include/linux/inotify.h
+++ b/include/linux/inotify.h
@@ -7,11 +7,8 @@
 #ifndef _LINUX_INOTIFY_H
 #define _LINUX_INOTIFY_H
 
-#include 
 #include 
 
-extern struct ctl_table inotify_table[]; /* for sysctl */
-
 #define ALL_INOTIFY_BITS (IN_ACCESS | IN_MODIFY | IN_ATTRIB | IN_CLOSE_WRITE | 
\
  IN_CLOSE_NOWRITE | IN_OPEN | IN_MOVED_FROM | \
  IN_MOVED_TO | IN_CREATE | IN_DELETE | \
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 7a90a12b9ea4..6aa67c737e4e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -125,13 +125,6 @@ static const int maxolduid = 65535;
 static const int ngroups_max = NGROUPS_MAX;
 static const int cap_last_cap = CAP_LAST_CAP;
 
-#ifdef CONFIG_INOTIFY_USER
-#include 
-#endif
-#ifdef CONFIG_FANOTIFY
-#include 
-#endif
-
 #ifdef CONFIG_PROC_SYSCTL
 
 /**
@@ -3099,20 +3092,6 @@ static struct ctl_table fs_table[] = {
.proc_handler   = proc_dointvec,
},
 #endif
-#ifdef CONFIG_INOTIFY_USER
-   {
-   .procname   = "inotify",
-   .mode   = 0555,
-   .child  = inotify_table,
-   },
-#endif
-#ifdef CONFIG_FANOTIFY
-   {
-   .procname   = "fanotify",
-   .mode   = 0555,
-   .child  = fanotify_table,
-   },
-#endif
 #ifdef CONFIG_EPOLL
{
.procname   = "epoll",
-- 
2.33.0



[PATCH v2 5/8] test_sysctl: simplify subdirectory registration with register_sysctl()

2021-11-23 Thread Luis Chamberlain
There is no need to user boiler plate code to specify a set of base
directories we're going to stuff sysctls under. Simplify this by using
register_sysctl() and specifying the directory path directly.

// pycocci sysctl-subdir-register-sysctl-simplify.cocci lib/test_sysctl.c

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
-   {
-   .procname = E1,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = sysctls,
-   },
-   { }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
-   {
-   .procname = E2,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = subdir,
-   },
-   { }
-};

@initialize:python@
@@

def make_my_fresh_expression(s1, s2):
  return '"' + s1.strip('"') + "/" + s2.strip('"') + '"'

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
fresh identifier E3 = script:python(E2, E1) { make_my_fresh_expression(E2, E1) 
};
@@

header =
-register_sysctl_table(base);
+register_sysctl(E3, sysctls);

Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain 
---
 lib/test_sysctl.c | 22 +-
 1 file changed, 1 insertion(+), 21 deletions(-)

diff --git a/lib/test_sysctl.c b/lib/test_sysctl.c
index 3750323973f4..a5a3d6c27e1f 100644
--- a/lib/test_sysctl.c
+++ b/lib/test_sysctl.c
@@ -128,26 +128,6 @@ static struct ctl_table test_table[] = {
{ }
 };
 
-static struct ctl_table test_sysctl_table[] = {
-   {
-   .procname   = "test_sysctl",
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = test_table,
-   },
-   { }
-};
-
-static struct ctl_table test_sysctl_root_table[] = {
-   {
-   .procname   = "debug",
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = test_sysctl_table,
-   },
-   { }
-};
-
 static struct ctl_table_header *test_sysctl_header;
 
 static int __init test_sysctl_init(void)
@@ -155,7 +135,7 @@ static int __init test_sysctl_init(void)
test_data.bitmap_0001 = kzalloc(SYSCTL_TEST_BITMAP_SIZE/8, GFP_KERNEL);
if (!test_data.bitmap_0001)
return -ENOMEM;
-   test_sysctl_header = register_sysctl_table(test_sysctl_root_table);
+   test_sysctl_header = register_sysctl("debug/test_sysctl", test_table);
if (!test_sysctl_header) {
kfree(test_data.bitmap_0001);
return -ENOMEM;
-- 
2.33.0



[PATCH v2 8/8] eventpoll: simplify sysctl declaration with register_sysctl()

2021-11-23 Thread Luis Chamberlain
From: Xiaoming Ni 

The kernel/sysctl.c is a kitchen sink where everyone leaves
their dirty dishes, this makes it very difficult to maintain.

To help with this maintenance let's start by moving sysctls to
places where they actually belong. The proc sysctl maintainers
do not want to know what sysctl knobs you wish to add for your own
piece of code, we just care about the core logic.

So move the epoll_table sysctl to fs/eventpoll.c and use
use register_sysctl().

Signed-off-by: Xiaoming Ni 
Signed-off-by: Luis Chamberlain 
---
 fs/eventpoll.c | 10 +-
 include/linux/poll.h   |  2 --
 include/linux/sysctl.h |  1 -
 kernel/sysctl.c|  7 ---
 4 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 06f4c5ae1451..e2daa940ebce 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -307,7 +307,7 @@ static void unlist_file(struct epitems_head *head)
 static long long_zero;
 static long long_max = LONG_MAX;
 
-struct ctl_table epoll_table[] = {
+static struct ctl_table epoll_table[] = {
{
.procname   = "max_user_watches",
.data   = _user_watches,
@@ -319,6 +319,13 @@ struct ctl_table epoll_table[] = {
},
{ }
 };
+
+static void __init epoll_sysctls_init(void)
+{
+   register_sysctl("fs/epoll", epoll_table);
+}
+#else
+#define epoll_sysctls_init() do { } while (0)
 #endif /* CONFIG_SYSCTL */
 
 static const struct file_operations eventpoll_fops;
@@ -2378,6 +2385,7 @@ static int __init eventpoll_init(void)
/* Allocates slab cache used to allocate "struct eppoll_entry" */
pwq_cache = kmem_cache_create("eventpoll_pwq",
sizeof(struct eppoll_entry), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL);
+   epoll_sysctls_init();
 
ephead_cache = kmem_cache_create("ep_head",
sizeof(struct epitems_head), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL);
diff --git a/include/linux/poll.h b/include/linux/poll.h
index 1cdc32b1f1b0..a9e0e1c2d1f2 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -8,12 +8,10 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
 
-extern struct ctl_table epoll_table[]; /* for sysctl */
 /* ~832 bytes of stack space used max in sys_select/sys_poll before allocating
additional memory. */
 #ifdef __clang__
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 718492057c70..5e0428a71899 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -218,7 +218,6 @@ extern int no_unaligned_warning;
 extern struct ctl_table sysctl_mount_point[];
 extern struct ctl_table random_table[];
 extern struct ctl_table firmware_config_table[];
-extern struct ctl_table epoll_table[];
 
 #else /* CONFIG_SYSCTL */
 static inline struct ctl_table_header *register_sysctl_table(struct ctl_table 
* table)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 6aa67c737e4e..b09ff41720e3 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -3092,13 +3092,6 @@ static struct ctl_table fs_table[] = {
.proc_handler   = proc_dointvec,
},
 #endif
-#ifdef CONFIG_EPOLL
-   {
-   .procname   = "epoll",
-   .mode   = 0555,
-   .child  = epoll_table,
-   },
-#endif
 #endif
{
.procname   = "protected_symlinks",
-- 
2.33.0



[PATCH v2 1/8] hpet: simplify subdirectory registration with register_sysctl()

2021-11-23 Thread Luis Chamberlain
There is no need to user boiler plate code to specify a set of base
directories we're going to stuff sysctls under. Simplify this by using
register_sysctl() and specifying the directory path directly.

// pycocci sysctl-subdir-register-sysctl-simplify.cocci drivers/char/hpet.c

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
-   {
-   .procname = E1,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = sysctls,
-   },
-   { }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
-   {
-   .procname = E2,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = subdir,
-   },
-   { }
-};

@initialize:python@
@@

def make_my_fresh_expression(s1, s2):
  return '"' + s1.strip('"') + "/" + s2.strip('"') + '"'

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
fresh identifier E3 = script:python(E2, E1) { make_my_fresh_expression(E2, E1) 
};
@@

header =
-register_sysctl_table(base);
+register_sysctl(E3, sysctls);

Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain 
---
 drivers/char/hpet.c | 22 +-
 1 file changed, 1 insertion(+), 21 deletions(-)

diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
index 4e5431f01450..563dfae3b8da 100644
--- a/drivers/char/hpet.c
+++ b/drivers/char/hpet.c
@@ -746,26 +746,6 @@ static struct ctl_table hpet_table[] = {
{}
 };
 
-static struct ctl_table hpet_root[] = {
-   {
-.procname = "hpet",
-.maxlen = 0,
-.mode = 0555,
-.child = hpet_table,
-},
-   {}
-};
-
-static struct ctl_table dev_root[] = {
-   {
-.procname = "dev",
-.maxlen = 0,
-.mode = 0555,
-.child = hpet_root,
-},
-   {}
-};
-
 static struct ctl_table_header *sysctl_header;
 
 /*
@@ -1061,7 +1041,7 @@ static int __init hpet_init(void)
if (result < 0)
return -ENODEV;
 
-   sysctl_header = register_sysctl_table(dev_root);
+   sysctl_header = register_sysctl("dev/hpet", hpet_table);
 
result = acpi_bus_register_driver(_acpi_driver);
if (result < 0) {
-- 
2.33.0



[PATCH v2 7/8] cdrom: simplify subdirectory registration with register_sysctl()

2021-11-23 Thread Luis Chamberlain
There is no need to user boiler plate code to specify a set of base
directories we're going to stuff sysctls under. Simplify this by using
register_sysctl() and specifying the directory path directly.

// pycocci sysctl-subdir-register-sysctl-simplify.cocci PATH

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
-   {
-   .procname = E1,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = sysctls,
-   },
-   { }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
-   {
-   .procname = E2,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = subdir,
-   },
-   { }
-};

@initialize:python@
@@

def make_my_fresh_expression(s1, s2):
  return '"' + s1.strip('"') + "/" + s2.strip('"') + '"'

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
fresh identifier E3 = script:python(E2, E1) { make_my_fresh_expression(E2, E1) 
};
@@

header =
-register_sysctl_table(base);
+register_sysctl(E3, sysctls);

Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain 
---
 drivers/cdrom/cdrom.c | 23 +--
 1 file changed, 1 insertion(+), 22 deletions(-)

diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index 9877e413fce3..1b57d4666e43 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -3691,27 +3691,6 @@ static struct ctl_table cdrom_table[] = {
},
{ }
 };
-
-static struct ctl_table cdrom_cdrom_table[] = {
-   {
-   .procname   = "cdrom",
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = cdrom_table,
-   },
-   { }
-};
-
-/* Make sure that /proc/sys/dev is there */
-static struct ctl_table cdrom_root_table[] = {
-   {
-   .procname   = "dev",
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = cdrom_cdrom_table,
-   },
-   { }
-};
 static struct ctl_table_header *cdrom_sysctl_header;
 
 static void cdrom_sysctl_register(void)
@@ -3721,7 +3700,7 @@ static void cdrom_sysctl_register(void)
if (!atomic_add_unless(, 1, 1))
return;
 
-   cdrom_sysctl_header = register_sysctl_table(cdrom_root_table);
+   cdrom_sysctl_header = register_sysctl("dev/cdrom", cdrom_table);
 
/* set the defaults */
cdrom_sysctl_settings.autoclose = autoclose;
-- 
2.33.0



[PATCH v2 3/8] macintosh/mac_hid.c: simplify subdirectory registration with register_sysctl()

2021-11-23 Thread Luis Chamberlain
There is no need to user boiler plate code to specify a set of base
directories we're going to stuff sysctls under. Simplify this by using
register_sysctl() and specifying the directory path directly.

// pycocci sysctl-subdir-register-sysctl-simplify.cocci PATH

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
-   {
-   .procname = E1,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = sysctls,
-   },
-   { }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
-   {
-   .procname = E2,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = subdir,
-   },
-   { }
-};

@initialize:python@
@@

def make_my_fresh_expression(s1, s2):
  return '"' + s1.strip('"') + "/" + s2.strip('"') + '"'

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
fresh identifier E3 = script:python(E2, E1) { make_my_fresh_expression(E2, E1) 
};
@@

header =
-register_sysctl_table(base);
+register_sysctl(E3, sysctls);

Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain 
---
 drivers/macintosh/mac_hid.c | 24 +---
 1 file changed, 1 insertion(+), 23 deletions(-)

diff --git a/drivers/macintosh/mac_hid.c b/drivers/macintosh/mac_hid.c
index 28b8581b44dd..d8c4d5664145 100644
--- a/drivers/macintosh/mac_hid.c
+++ b/drivers/macintosh/mac_hid.c
@@ -239,33 +239,11 @@ static struct ctl_table mac_hid_files[] = {
{ }
 };
 
-/* dir in /proc/sys/dev */
-static struct ctl_table mac_hid_dir[] = {
-   {
-   .procname   = "mac_hid",
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = mac_hid_files,
-   },
-   { }
-};
-
-/* /proc/sys/dev itself, in case that is not there yet */
-static struct ctl_table mac_hid_root_dir[] = {
-   {
-   .procname   = "dev",
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = mac_hid_dir,
-   },
-   { }
-};
-
 static struct ctl_table_header *mac_hid_sysctl_header;
 
 static int __init mac_hid_init(void)
 {
-   mac_hid_sysctl_header = register_sysctl_table(mac_hid_root_dir);
+   mac_hid_sysctl_header = register_sysctl("dev/mac_hid", mac_hid_files);
if (!mac_hid_sysctl_header)
return -ENOMEM;
 
-- 
2.33.0



[PATCH v2 4/8] ocfs2: simplify subdirectory registration with register_sysctl()

2021-11-23 Thread Luis Chamberlain
There is no need to user boiler plate code to specify a set of base
directories we're going to stuff sysctls under. Simplify this by using
register_sysctl() and specifying the directory path directly.

// pycocci sysctl-subdir-register-sysctl-simplify.cocci PATH

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
-   {
-   .procname = E1,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = sysctls,
-   },
-   { }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
-   {
-   .procname = E2,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = subdir,
-   },
-   { }
-};

@initialize:python@
@@

def make_my_fresh_expression(s1, s2):
  return '"' + s1.strip('"') + "/" + s2.strip('"') + '"'

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
fresh identifier E3 = script:python(E2, E1) { make_my_fresh_expression(E2, E1) 
};
@@

header =
-register_sysctl_table(base);
+register_sysctl(E3, sysctls);

Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain 
---
 fs/ocfs2/stackglue.c | 25 +
 1 file changed, 1 insertion(+), 24 deletions(-)

diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index 16f1bfc407f2..731558a6f27d 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -672,31 +672,8 @@ static struct ctl_table ocfs2_mod_table[] = {
{ }
 };
 
-static struct ctl_table ocfs2_kern_table[] = {
-   {
-   .procname   = "ocfs2",
-   .data   = NULL,
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = ocfs2_mod_table
-   },
-   { }
-};
-
-static struct ctl_table ocfs2_root_table[] = {
-   {
-   .procname   = "fs",
-   .data   = NULL,
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = ocfs2_kern_table
-   },
-   { }
-};
-
 static struct ctl_table_header *ocfs2_table_header;
 
-
 /*
  * Initialization
  */
@@ -705,7 +682,7 @@ static int __init ocfs2_stack_glue_init(void)
 {
strcpy(cluster_stack_name, OCFS2_STACK_PLUGIN_O2CB);
 
-   ocfs2_table_header = register_sysctl_table(ocfs2_root_table);
+   ocfs2_table_header = register_sysctl("fs/ocfs2", ocfs2_mod_table);
if (!ocfs2_table_header) {
printk(KERN_ERR
   "ocfs2 stack glue: unable to register sysctl\n");
-- 
2.33.0



Re: [PATCH 12/13] sysctl: add helper to register empty subdir

2021-11-16 Thread Luis Chamberlain
On Fri, May 29, 2020 at 08:03:02AM -0500, Eric W. Biederman wrote:
> Luis Chamberlain  writes:
> 
> > The way to create a subdirectory from the base set of directories
> > is a bit obscure, so provide a helper which makes this clear, and
> > also helps remove boiler plate code required to do this work.
> 
> I agreee calling:
> register_sysctl("fs/binfmt_misc", sysctl_mount_point)
> is a bit obscure but if you are going to make a wrapper
> please make it the trivial one liner above.
> 
> Say something that looks like:
>   struct sysctl_header *register_sysctl_mount_point(const char *path)
> {
>   return register_sysctl(path, sysctl_mount_point);
> }
> 
> And yes please talk about a mount point and not an empty dir, as these
> are permanently empty directories to serve as mount points.  There are
> some subtle but important permission checks this allows in the case of
> unprivileged mounts.
> 
> Further code like this belong in proc_sysctl.c next to all of the code
> it is related to so that it is easier to see how to refactor the code if
> necessary.

Alrighty, it's been a while since this kernel/sysctl.c kitchen sink
cleanup... so it's time to respin this now that the merge window is
open.  I already rebased patches, addressed all input and now just
waiting to fix any compilation errors.  I'm going to split the patches
up into real small sets so to ensure we just get this through becauase
getting this in otherwise is going to be hard.

I'd appreciate folk's review once the patches start going out. I think
a hard part will be deciding what tree this should got through.

  Luis


Re: [PATCH 06/13] nvdimm/blk: avoid calling del_gendisk() on early failures

2021-11-03 Thread Luis Chamberlain
On Tue, Nov 02, 2021 at 07:28:02PM -0600, Jens Axboe wrote:
> On 11/2/21 6:49 PM, Dan Williams wrote:
> > On Tue, Nov 2, 2021 at 5:10 PM Luis Chamberlain  wrote:
> >>
> >> On Fri, Oct 15, 2021 at 05:13:48PM -0700, Dan Williams wrote:
> >>> On Fri, Oct 15, 2021 at 4:53 PM Luis Chamberlain  
> >>> wrote:
> >>>>
> >>>> If nd_integrity_init() fails we'd get del_gendisk() called,
> >>>> but that's not correct as we should only call that if we're
> >>>> done with device_add_disk(). Fix this by providing unwinding
> >>>> prior to the devm call being registered and moving the devm
> >>>> registration to the very end.
> >>>>
> >>>> This should fix calling del_gendisk() if nd_integrity_init()
> >>>> fails. I only spotted this issue through code inspection. It
> >>>> does not fix any real world bug.
> >>>>
> >>>
> >>> Just fyi, I'm preparing patches to delete this driver completely as it
> >>> is unused by any shipping platform. I hope to get that removal into
> >>> v5.16.
> >>
> >> Curious if are you going to nuking it on v5.16? Otherwise it would stand
> >> in the way of the last few patches to add __must_check for the final
> >> add_disk() error handling changes.
> > 
> > True, I don't think I can get it nuked in time, so you can add my
> > Reviewed-by for this one.
> 
> Luis, I lost track of the nv* patches from this discussion. If you want
> them in 5.16 and they are reviewed, please do resend and I'll pick them
> up for the middle-of-merge-window push.

Sure thing, I'll resend whatever is left. I also noticed for some reason
I forgot to convert nvdimm/pmem and so I'll roll those new patches in,
but I suspect that those might be too late unless we get them reviewed
in time.

  Luis


Re: [PATCH 06/13] nvdimm/blk: avoid calling del_gendisk() on early failures

2021-11-03 Thread Luis Chamberlain
On Tue, Nov 02, 2021 at 05:49:12PM -0700, Dan Williams wrote:
> On Tue, Nov 2, 2021 at 5:10 PM Luis Chamberlain  wrote:
> >
> > On Fri, Oct 15, 2021 at 05:13:48PM -0700, Dan Williams wrote:
> > > On Fri, Oct 15, 2021 at 4:53 PM Luis Chamberlain  
> > > wrote:
> > > >
> > > > If nd_integrity_init() fails we'd get del_gendisk() called,
> > > > but that's not correct as we should only call that if we're
> > > > done with device_add_disk(). Fix this by providing unwinding
> > > > prior to the devm call being registered and moving the devm
> > > > registration to the very end.
> > > >
> > > > This should fix calling del_gendisk() if nd_integrity_init()
> > > > fails. I only spotted this issue through code inspection. It
> > > > does not fix any real world bug.
> > > >
> > >
> > > Just fyi, I'm preparing patches to delete this driver completely as it
> > > is unused by any shipping platform. I hope to get that removal into
> > > v5.16.
> >
> > Curious if are you going to nuking it on v5.16? Otherwise it would stand
> > in the way of the last few patches to add __must_check for the final
> > add_disk() error handling changes.
> 
> True, I don't think I can get it nuked in time, so you can add my
> Reviewed-by for this one.

This patch required the previous patch in this series to also be
applied. Can I apply your Reviewed-by there too?

  Luis


Re: [PATCH 06/13] nvdimm/blk: avoid calling del_gendisk() on early failures

2021-11-02 Thread Luis Chamberlain
On Fri, Oct 15, 2021 at 05:13:48PM -0700, Dan Williams wrote:
> On Fri, Oct 15, 2021 at 4:53 PM Luis Chamberlain  wrote:
> >
> > If nd_integrity_init() fails we'd get del_gendisk() called,
> > but that's not correct as we should only call that if we're
> > done with device_add_disk(). Fix this by providing unwinding
> > prior to the devm call being registered and moving the devm
> > registration to the very end.
> >
> > This should fix calling del_gendisk() if nd_integrity_init()
> > fails. I only spotted this issue through code inspection. It
> > does not fix any real world bug.
> >
> 
> Just fyi, I'm preparing patches to delete this driver completely as it
> is unused by any shipping platform. I hope to get that removal into
> v5.16.

Curious if are you going to nuking it on v5.16? Otherwise it would stand
in the way of the last few patches to add __must_check for the final
add_disk() error handling changes.

  Luis


Re: [PATCH 03/13] nvdimm/btt: do not call del_gendisk() if not needed

2021-11-02 Thread Luis Chamberlain
On Sun, Oct 31, 2021 at 10:47:22AM -0700, Dan Williams wrote:
> On Fri, Oct 15, 2021 at 4:53 PM Luis Chamberlain  wrote:
> >
> > We know we don't need del_gendisk() if we haven't added
> > the disk, so just skip it. This should fix a bug on older
> > kernels, as del_gendisk() became able to deal with
> > disks not added only recently, after the patch titled
> > "block: add flag for add_disk() completion notation".
> 
> Perhaps put this in:
> 
> commit $abbrev_commit ("block: add flag for add_disk() completion 
> notation")
> 
> ...format, but I can't seem to find that commit?

Indeed, that patch got dropped and it would seem Christoph preferred
a simpler approach with the new disk_live()

commit 40b3a52ffc5bc3b5427d5d35b035cfb19d03fdd6
Author: Christoph Hellwig 
Date:   Wed Aug 18 16:45:32 2021 +0200

block: add a sanity check for a live disk in del_gendisk

> If you're touching the changelog how about one that clarifies the
> impact and drops "we"?
> "del_gendisk() is not required if the disk has not been added. On
> kernels prior to commit $abbrev_commit ("block: add flag for
> add_disk() completion notation")
> it is mandatory to not call del_gendisk() if the underlying device has
> not been through device_add()."
> 
> Fixes: 41cd8b70c37a ("libnvdimm, btt: add support for blk integrity")
> 
> With that you can add:
> 
> Reviewed-by: Dan Williams 

You got it.

  Luis


Re: [PATCH 00/13] block: add_disk() error handling stragglers

2021-10-25 Thread Luis Chamberlain
On Thu, Oct 21, 2021 at 08:10:49PM -0700, Geoff Levand wrote:
> Hi Luis,
> 
> On 10/18/21 9:15 AM, Luis Chamberlain wrote:
> > On Sun, Oct 17, 2021 at 08:26:33AM -0700, Geoff Levand wrote:
> >> Hi Luis,
> >>
> >> On 10/15/21 4:52 PM, Luis Chamberlain wrote:
> >>> This patch set consists of al the straggler drivers for which we have
> >>> have no patch reviews done for yet. I'd like to ask for folks to please
> >>> consider chiming in, specially if you're the maintainer for the driver.
> >>> Additionally if you can specify if you'll take the patch in yourself or
> >>> if you want Jens to take it, that'd be great too.
> >>
> >> Do you have a git repo with the patch set applied that I can use to test 
> >> with?
> > 
> > Sure, although the second to last patch is in a state of flux given
> > the ataflop driver currently is broken and so we're seeing how to fix
> > that first:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20211011-for-axboe-add-disk-error-handling
> 
> That branch has so many changes applied on top of the base v5.15-rc4
> that the patches I need to apply to test on PS3 with don't apply.
> 
> Do you have something closer to say v5.15-rc5?  Preferred would be
> just your add_disk() error handling patches plus what they depend
> on.

If you just want to test the ps3 changes, I've put this branch together
just for yo, its based on v5.15-rc6:

https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20211025-ps3-add-disk

  Luis


Re: [PATCH 06/13] nvdimm/blk: avoid calling del_gendisk() on early failures

2021-10-19 Thread Luis Chamberlain
On Fri, Oct 15, 2021 at 05:13:48PM -0700, Dan Williams wrote:
> On Fri, Oct 15, 2021 at 4:53 PM Luis Chamberlain  wrote:
> >
> > If nd_integrity_init() fails we'd get del_gendisk() called,
> > but that's not correct as we should only call that if we're
> > done with device_add_disk(). Fix this by providing unwinding
> > prior to the devm call being registered and moving the devm
> > registration to the very end.
> >
> > This should fix calling del_gendisk() if nd_integrity_init()
> > fails. I only spotted this issue through code inspection. It
> > does not fix any real world bug.
> >
> 
> Just fyi, I'm preparing patches to delete this driver completely as it
> is unused by any shipping platform. I hope to get that removal into
> v5.16.

I'll remove this from my queue, any chance you can review the changes
for nvdimm/btt?

  Luis


Re: [PATCH 00/13] block: add_disk() error handling stragglers

2021-10-18 Thread Luis Chamberlain
On Sun, Oct 17, 2021 at 08:26:33AM -0700, Geoff Levand wrote:
> Hi Luis,
> 
> On 10/15/21 4:52 PM, Luis Chamberlain wrote:
> > This patch set consists of al the straggler drivers for which we have
> > have no patch reviews done for yet. I'd like to ask for folks to please
> > consider chiming in, specially if you're the maintainer for the driver.
> > Additionally if you can specify if you'll take the patch in yourself or
> > if you want Jens to take it, that'd be great too.
> 
> Do you have a git repo with the patch set applied that I can use to test with?

Sure, although the second to last patch is in a state of flux given
the ataflop driver currently is broken and so we're seeing how to fix
that first:

https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20211011-for-axboe-add-disk-error-handling

  Luis


[PATCH 08/13] zram: add error handling support for add_disk()

2021-10-15 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/zram/zram_drv.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 0a9309a2ef54..bdbded107e6b 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1983,7 +1983,9 @@ static int zram_add(void)
blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX);
 
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, zram->disk->queue);
-   device_add_disk(NULL, zram->disk, zram_disk_attr_groups);
+   ret = device_add_disk(NULL, zram->disk, zram_disk_attr_groups);
+   if (ret)
+   goto out_cleanup_disk;
 
strlcpy(zram->compressor, default_compressor, sizeof(zram->compressor));
 
@@ -1991,6 +1993,8 @@ static int zram_add(void)
pr_info("Added device: %s\n", zram->disk->disk_name);
return device_id;
 
+out_cleanup_disk:
+   blk_cleanup_disk(zram->disk);
 out_free_idr:
idr_remove(_index_idr, device_id);
 out_free_dev:
-- 
2.30.2



[PATCH 11/13] ps3vram: add error handling support for add_disk()

2021-10-15 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/ps3vram.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
index c7b19e128b03..af2a0d09c598 100644
--- a/drivers/block/ps3vram.c
+++ b/drivers/block/ps3vram.c
@@ -755,9 +755,14 @@ static int ps3vram_probe(struct ps3_system_bus_device *dev)
dev_info(>core, "%s: Using %llu MiB of GPU memory\n",
 gendisk->disk_name, get_capacity(gendisk) >> 11);
 
-   device_add_disk(>core, gendisk, NULL);
+   error = device_add_disk(>core, gendisk, NULL);
+   if (error)
+   goto out_cleanup_disk;
+
return 0;
 
+out_cleanup_disk:
+   blk_cleanup_disk(gendisk);
 out_cache_cleanup:
remove_proc_entry(DEVICE_NAME, NULL);
ps3vram_cache_cleanup(dev);
-- 
2.30.2



[PATCH 05/13] nvdimm/btt: add error handling support for add_disk()

2021-10-15 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/nvdimm/btt.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 23ee8c005db5..57b921c5fbb5 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1542,7 +1542,9 @@ static int btt_blk_init(struct btt *btt)
}
 
set_capacity(btt->btt_disk, btt->nlba * btt->sector_size >> 9);
-   device_add_disk(>nd_btt->dev, btt->btt_disk, NULL);
+   rc = device_add_disk(>nd_btt->dev, btt->btt_disk, NULL);
+   if (rc)
+   goto out_cleanup_disk;
 
btt->nd_btt->size = btt->nlba * (u64)btt->sector_size;
nvdimm_check_and_set_ro(btt->btt_disk);
-- 
2.30.2



[PATCH 13/13] mtd/ubi/block: add error handling support for add_disk()

2021-10-15 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/mtd/ubi/block.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/ubi/block.c b/drivers/mtd/ubi/block.c
index e003b4b44ffa..062e6c2c45f5 100644
--- a/drivers/mtd/ubi/block.c
+++ b/drivers/mtd/ubi/block.c
@@ -447,12 +447,18 @@ int ubiblock_create(struct ubi_volume_info *vi)
list_add_tail(>list, _devices);
 
/* Must be the last step: anyone can call file ops from now on */
-   add_disk(dev->gd);
+   ret = add_disk(dev->gd);
+   if (ret)
+   goto out_destroy_wq;
+
dev_info(disk_to_dev(dev->gd), "created from ubi%d:%d(%s)",
 dev->ubi_num, dev->vol_id, vi->name);
mutex_unlock(_mutex);
return 0;
 
+out_destroy_wq:
+   list_del(>list);
+   destroy_workqueue(dev->wq);
 out_remove_minor:
idr_remove(_minor_idr, gd->first_minor);
 out_cleanup_disk:
-- 
2.30.2



[PATCH 09/13] z2ram: add error handling support for add_disk()

2021-10-15 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling. Only the disk is cleaned up inside
z2ram_register_disk() as the caller deals with the rest.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/z2ram.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/block/z2ram.c b/drivers/block/z2ram.c
index 4eef218108c6..ccc52c935faf 100644
--- a/drivers/block/z2ram.c
+++ b/drivers/block/z2ram.c
@@ -318,6 +318,7 @@ static const struct blk_mq_ops z2_mq_ops = {
 static int z2ram_register_disk(int minor)
 {
struct gendisk *disk;
+   int err;
 
disk = blk_mq_alloc_disk(_set, NULL);
if (IS_ERR(disk))
@@ -333,8 +334,10 @@ static int z2ram_register_disk(int minor)
sprintf(disk->disk_name, "z2ram");
 
z2ram_gendisk[minor] = disk;
-   add_disk(disk);
-   return 0;
+   err = add_disk(disk);
+   if (err)
+   blk_cleanup_disk(disk);
+   return err;
 }
 
 static int __init z2_init(void)
-- 
2.30.2



[PATCH 06/13] nvdimm/blk: avoid calling del_gendisk() on early failures

2021-10-15 Thread Luis Chamberlain
If nd_integrity_init() fails we'd get del_gendisk() called,
but that's not correct as we should only call that if we're
done with device_add_disk(). Fix this by providing unwinding
prior to the devm call being registered and moving the devm
registration to the very end.

This should fix calling del_gendisk() if nd_integrity_init()
fails. I only spotted this issue through code inspection. It
does not fix any real world bug.

Signed-off-by: Luis Chamberlain 
---
 drivers/nvdimm/blk.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/nvdimm/blk.c b/drivers/nvdimm/blk.c
index 088d3dd6f6fa..591fa1f86f1e 100644
--- a/drivers/nvdimm/blk.c
+++ b/drivers/nvdimm/blk.c
@@ -240,6 +240,7 @@ static int nsblk_attach_disk(struct nd_namespace_blk *nsblk)
resource_size_t available_disk_size;
struct gendisk *disk;
u64 internal_nlba;
+   int rc;
 
internal_nlba = div_u64(nsblk->size, nsblk_internal_lbasize(nsblk));
available_disk_size = internal_nlba * nsblk_sector_size(nsblk);
@@ -256,20 +257,26 @@ static int nsblk_attach_disk(struct nd_namespace_blk 
*nsblk)
blk_queue_logical_block_size(disk->queue, nsblk_sector_size(nsblk));
blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
 
-   if (devm_add_action_or_reset(dev, nd_blk_release_disk, disk))
-   return -ENOMEM;
-
if (nsblk_meta_size(nsblk)) {
-   int rc = nd_integrity_init(disk, nsblk_meta_size(nsblk));
+   rc = nd_integrity_init(disk, nsblk_meta_size(nsblk));
 
if (rc)
-   return rc;
+   goto out_before_devm_err;
}
 
set_capacity(disk, available_disk_size >> SECTOR_SHIFT);
device_add_disk(dev, disk, NULL);
+
+   /* nd_blk_release_disk() is called if this fails */
+   if (devm_add_action_or_reset(dev, nd_blk_release_disk, disk))
+   return -ENOMEM;
+
nvdimm_check_and_set_ro(disk);
return 0;
+
+out_before_devm_err:
+   blk_cleanup_disk(disk);
+   return rc;
 }
 
 static int nd_blk_probe(struct device *dev)
-- 
2.30.2



[PATCH 10/13] ps3disk: add error handling support for add_disk()

2021-10-15 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/ps3disk.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
index 8d51efbe045d..3054adf77460 100644
--- a/drivers/block/ps3disk.c
+++ b/drivers/block/ps3disk.c
@@ -467,9 +467,13 @@ static int ps3disk_probe(struct ps3_system_bus_device 
*_dev)
 gendisk->disk_name, priv->model, priv->raw_capacity >> 11,
 get_capacity(gendisk) >> 11);
 
-   device_add_disk(>sbd.core, gendisk, NULL);
-   return 0;
+   error = device_add_disk(>sbd.core, gendisk, NULL);
+   if (error)
+   goto fail_cleanup_disk;
 
+   return 0;
+fail_cleanup_disk:
+   blk_cleanup_disk(gendisk);
 fail_free_tag_set:
blk_mq_free_tag_set(>tag_set);
 fail_teardown:
-- 
2.30.2



[PATCH 03/13] nvdimm/btt: do not call del_gendisk() if not needed

2021-10-15 Thread Luis Chamberlain
We know we don't need del_gendisk() if we haven't added
the disk, so just skip it. This should fix a bug on older
kernels, as del_gendisk() became able to deal with
disks not added only recently, after the patch titled
"block: add flag for add_disk() completion notation".

Signed-off-by: Luis Chamberlain 
---
 drivers/nvdimm/btt.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 52de60b7adee..29cc7325e890 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1538,7 +1538,6 @@ static int btt_blk_init(struct btt *btt)
int rc = nd_integrity_init(btt->btt_disk, btt_meta_size(btt));
 
if (rc) {
-   del_gendisk(btt->btt_disk);
blk_cleanup_disk(btt->btt_disk);
return rc;
}
-- 
2.30.2



[PATCH 02/13] nvme-multipath: add error handling support for add_disk()

2021-10-15 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Since we now can tell for sure when a disk was added, move
setting the bit NVME_NSHEAD_DISK_LIVE only when we did
add the disk successfully.

Nothing to do here as the cleanup is done elsewhere. We take
care and use test_and_set_bit() because it is protects against
two nvme paths simultaneously calling device_add_disk() on the
same namespace head.

Signed-off-by: Luis Chamberlain 
---
 drivers/nvme/host/multipath.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index e8ccdd398f78..022837f7be41 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -496,13 +496,23 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct 
nvme_ns_head *head)
 static void nvme_mpath_set_live(struct nvme_ns *ns)
 {
struct nvme_ns_head *head = ns->head;
+   int rc;
 
if (!head->disk)
return;
 
+   /*
+* test_and_set_bit() is used because it is protecting against two nvme
+* paths simultaneously calling device_add_disk() on the same namespace
+* head.
+*/
if (!test_and_set_bit(NVME_NSHEAD_DISK_LIVE, >flags)) {
-   device_add_disk(>subsys->dev, head->disk,
-   nvme_ns_id_attr_groups);
+   rc = device_add_disk(>subsys->dev, head->disk,
+nvme_ns_id_attr_groups);
+   if (rc) {
+   clear_bit(NVME_NSHEAD_DISK_LIVE, >flags);
+   return;
+   }
nvme_add_ns_head_cdev(head);
}
 
-- 
2.30.2



[PATCH 00/13] block: add_disk() error handling stragglers

2021-10-15 Thread Luis Chamberlain
This patch set consists of al the straggler drivers for which we have
have no patch reviews done for yet. I'd like to ask for folks to please
consider chiming in, specially if you're the maintainer for the driver.
Additionally if you can specify if you'll take the patch in yourself or
if you want Jens to take it, that'd be great too.

Luis Chamberlain (13):
  block/brd: add error handling support for add_disk()
  nvme-multipath: add error handling support for add_disk()
  nvdimm/btt: do not call del_gendisk() if not needed
  nvdimm/btt: use goto error labels on btt_blk_init()
  nvdimm/btt: add error handling support for add_disk()
  nvdimm/blk: avoid calling del_gendisk() on early failures
  nvdimm/blk: add error handling support for add_disk()
  zram: add error handling support for add_disk()
  z2ram: add error handling support for add_disk()
  ps3disk: add error handling support for add_disk()
  ps3vram: add error handling support for add_disk()
  block/sunvdc: add error handling support for add_disk()
  mtd/ubi/block: add error handling support for add_disk()

 drivers/block/brd.c   |  9 +++--
 drivers/block/ps3disk.c   |  8 ++--
 drivers/block/ps3vram.c   |  7 ++-
 drivers/block/sunvdc.c| 14 +++---
 drivers/block/z2ram.c |  7 +--
 drivers/block/zram/zram_drv.c |  6 +-
 drivers/mtd/ubi/block.c   |  8 +++-
 drivers/nvdimm/blk.c  | 21 +++--
 drivers/nvdimm/btt.c  | 24 +++-
 drivers/nvme/host/multipath.c | 14 --
 10 files changed, 89 insertions(+), 29 deletions(-)

-- 
2.30.2



[PATCH 07/13] nvdimm/blk: add error handling support for add_disk()

2021-10-15 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Since nvdimm/blk uses devm we just need to move the devm
registration towards the end. And in hindsight, that seems
to also provide a fix given del_gendisk() should not be
called unless the disk was already added via add_disk().
The probably of that issue happening is low though, like
OOM while calling devm_add_action(), so the fix is minor.

We manually unwind in case of add_disk() failure prior
to the devm registration.

Signed-off-by: Luis Chamberlain 
---
 drivers/nvdimm/blk.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/blk.c b/drivers/nvdimm/blk.c
index 591fa1f86f1e..9f1eb41404ac 100644
--- a/drivers/nvdimm/blk.c
+++ b/drivers/nvdimm/blk.c
@@ -265,7 +265,9 @@ static int nsblk_attach_disk(struct nd_namespace_blk *nsblk)
}
 
set_capacity(disk, available_disk_size >> SECTOR_SHIFT);
-   device_add_disk(dev, disk, NULL);
+   rc = device_add_disk(dev, disk, NULL);
+   if (rc)
+   goto out_before_devm_err;
 
/* nd_blk_release_disk() is called if this fails */
if (devm_add_action_or_reset(dev, nd_blk_release_disk, disk))
-- 
2.30.2



[PATCH 01/13] block/brd: add error handling support for add_disk()

2021-10-15 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/brd.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 530b31240203..6065f493876f 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -372,6 +372,7 @@ static int brd_alloc(int i)
struct brd_device *brd;
struct gendisk *disk;
char buf[DISK_NAME_LEN];
+   int err = -ENOMEM;
 
mutex_lock(_devices_mutex);
list_for_each_entry(brd, _devices, brd_list) {
@@ -422,16 +423,20 @@ static int brd_alloc(int i)
/* Tell the block layer that this is not a rotational device */
blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, disk->queue);
-   add_disk(disk);
+   err = add_disk(disk);
+   if (err)
+   goto out_cleanup_disk;
 
return 0;
 
+out_cleanup_disk:
+   blk_cleanup_disk(disk);
 out_free_dev:
mutex_lock(_devices_mutex);
list_del(>brd_list);
mutex_unlock(_devices_mutex);
kfree(brd);
-   return -ENOMEM;
+   return err;
 }
 
 static void brd_probe(dev_t dev)
-- 
2.30.2



[PATCH 04/13] nvdimm/btt: use goto error labels on btt_blk_init()

2021-10-15 Thread Luis Chamberlain
This will make it easier to share common error paths.

Signed-off-by: Luis Chamberlain 
---
 drivers/nvdimm/btt.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 29cc7325e890..23ee8c005db5 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1520,10 +1520,11 @@ static int btt_blk_init(struct btt *btt)
 {
struct nd_btt *nd_btt = btt->nd_btt;
struct nd_namespace_common *ndns = nd_btt->ndns;
+   int rc = -ENOMEM;
 
btt->btt_disk = blk_alloc_disk(NUMA_NO_NODE);
if (!btt->btt_disk)
-   return -ENOMEM;
+   goto out;
 
nvdimm_namespace_disk_name(ndns, btt->btt_disk->disk_name);
btt->btt_disk->first_minor = 0;
@@ -1535,19 +1536,23 @@ static int btt_blk_init(struct btt *btt)
blk_queue_flag_set(QUEUE_FLAG_NONROT, btt->btt_disk->queue);
 
if (btt_meta_size(btt)) {
-   int rc = nd_integrity_init(btt->btt_disk, btt_meta_size(btt));
-
-   if (rc) {
-   blk_cleanup_disk(btt->btt_disk);
-   return rc;
-   }
+   rc = nd_integrity_init(btt->btt_disk, btt_meta_size(btt));
+   if (rc)
+   goto out_cleanup_disk;
}
+
set_capacity(btt->btt_disk, btt->nlba * btt->sector_size >> 9);
device_add_disk(>nd_btt->dev, btt->btt_disk, NULL);
+
btt->nd_btt->size = btt->nlba * (u64)btt->sector_size;
nvdimm_check_and_set_ro(btt->btt_disk);
 
return 0;
+
+out_cleanup_disk:
+   blk_cleanup_disk(btt->btt_disk);
+out:
+   return rc;
 }
 
 static void btt_blk_cleanup(struct btt *btt)
-- 
2.30.2



[PATCH 12/13] block/sunvdc: add error handling support for add_disk()

2021-10-15 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

We re-use the same free tag call, so we also add a label for
that as well.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/sunvdc.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c
index 4d4bb810c2ae..6f45a53f7cbf 100644
--- a/drivers/block/sunvdc.c
+++ b/drivers/block/sunvdc.c
@@ -826,8 +826,8 @@ static int probe_disk(struct vdc_port *port)
if (IS_ERR(g)) {
printk(KERN_ERR PFX "%s: Could not allocate gendisk.\n",
   port->vio.name);
-   blk_mq_free_tag_set(>tag_set);
-   return PTR_ERR(g);
+   err = PTR_ERR(g);
+   goto out_free_tag;
}
 
port->disk = g;
@@ -879,9 +879,17 @@ static int probe_disk(struct vdc_port *port)
   port->vdisk_size, (port->vdisk_size >> (20 - 9)),
   port->vio.ver.major, port->vio.ver.minor);
 
-   device_add_disk(>vio.vdev->dev, g, NULL);
+   err = device_add_disk(>vio.vdev->dev, g, NULL);
+   if (err)
+   goto out_cleanup_disk;
 
return 0;
+
+out_cleanup_disk:
+   blk_cleanup_disk(g);
+out_free_tag:
+   blk_mq_free_tag_set(>tag_set);
+   return err;
 }
 
 static struct ldc_channel_config vdc_ldc_cfg = {
-- 
2.30.2



[PATCH v2 08/10] block/sx8: add error handling support for add_disk()

2021-09-27 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

A completion is used to notify the initial probe what is
happening and so we must defer error handling on completion.
Do this by remembering the error and using the shared cleanup
function.

The tags are shared and so are hanlded later for the
driver already.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/sx8.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/block/sx8.c b/drivers/block/sx8.c
index 420cd952ddc4..1c79248c4826 100644
--- a/drivers/block/sx8.c
+++ b/drivers/block/sx8.c
@@ -297,6 +297,7 @@ struct carm_host {
 
struct work_struct  fsm_task;
 
+   int probe_err;
struct completion   probe_comp;
 };
 
@@ -1181,8 +1182,11 @@ static void carm_fsm_task (struct work_struct *work)
struct gendisk *disk = port->disk;
 
set_capacity(disk, port->capacity);
-   add_disk(disk);
-   activated++;
+   host->probe_err = add_disk(disk);
+   if (!host->probe_err)
+   activated++;
+   else
+   break;
}
 
printk(KERN_INFO DRV_NAME "(%s): %d ports activated\n",
@@ -1192,11 +1196,9 @@ static void carm_fsm_task (struct work_struct *work)
reschedule = 1;
break;
}
-
case HST_PROBE_FINISHED:
complete(>probe_comp);
break;
-
case HST_ERROR:
/* FIXME: TODO */
break;
@@ -1507,7 +1509,10 @@ static int carm_init_one (struct pci_dev *pdev, const 
struct pci_device_id *ent)
goto err_out_free_irq;
 
DPRINTK("waiting for probe_comp\n");
+   host->probe_err = -ENODEV;
wait_for_completion(>probe_comp);
+   if (host->probe_err)
+   goto err_out_free_irq;
 
printk(KERN_INFO "%s: pci %s, ports %d, io %llx, irq %u, major %d\n",
   host->name, pci_name(pdev), (int) CARM_MAX_PORTS,
-- 
2.30.2



[PATCH v2 02/10] pktcdvd: add error handling support for add_disk()

2021-09-27 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

The out_mem2 error label already does what we need so
re-use that.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/pktcdvd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 0f26b2510a75..415248962e67 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2729,7 +2729,9 @@ static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev)
/* inherit events of the host device */
disk->events = pd->bdev->bd_disk->events;
 
-   add_disk(disk);
+   ret = add_disk(disk);
+   if (ret)
+   goto out_mem2;
 
pkt_sysfs_dev_new(pd);
pkt_debugfs_dev_new(pd);
-- 
2.30.2



[PATCH v2 07/10] block/sunvdc: add error handling support for add_disk()

2021-09-27 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

We re-use the same free tag call, so we also add a label for
that as well.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/sunvdc.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c
index 4d4bb810c2ae..6f45a53f7cbf 100644
--- a/drivers/block/sunvdc.c
+++ b/drivers/block/sunvdc.c
@@ -826,8 +826,8 @@ static int probe_disk(struct vdc_port *port)
if (IS_ERR(g)) {
printk(KERN_ERR PFX "%s: Could not allocate gendisk.\n",
   port->vio.name);
-   blk_mq_free_tag_set(>tag_set);
-   return PTR_ERR(g);
+   err = PTR_ERR(g);
+   goto out_free_tag;
}
 
port->disk = g;
@@ -879,9 +879,17 @@ static int probe_disk(struct vdc_port *port)
   port->vdisk_size, (port->vdisk_size >> (20 - 9)),
   port->vio.ver.major, port->vio.ver.minor);
 
-   device_add_disk(>vio.vdev->dev, g, NULL);
+   err = device_add_disk(>vio.vdev->dev, g, NULL);
+   if (err)
+   goto out_cleanup_disk;
 
return 0;
+
+out_cleanup_disk:
+   blk_cleanup_disk(g);
+out_free_tag:
+   blk_mq_free_tag_set(>tag_set);
+   return err;
 }
 
 static struct ldc_channel_config vdc_ldc_cfg = {
-- 
2.30.2



[PATCH v2 04/10] ps3vram: add error handling support for add_disk()

2021-09-27 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/ps3vram.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
index c7b19e128b03..af2a0d09c598 100644
--- a/drivers/block/ps3vram.c
+++ b/drivers/block/ps3vram.c
@@ -755,9 +755,14 @@ static int ps3vram_probe(struct ps3_system_bus_device *dev)
dev_info(>core, "%s: Using %llu MiB of GPU memory\n",
 gendisk->disk_name, get_capacity(gendisk) >> 11);
 
-   device_add_disk(>core, gendisk, NULL);
+   error = device_add_disk(>core, gendisk, NULL);
+   if (error)
+   goto out_cleanup_disk;
+
return 0;
 
+out_cleanup_disk:
+   blk_cleanup_disk(gendisk);
 out_cache_cleanup:
remove_proc_entry(DEVICE_NAME, NULL);
ps3vram_cache_cleanup(dev);
-- 
2.30.2



[PATCH v2 01/10] mtip32xx: add error handling support for add_disk()

2021-09-27 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

The read_capacity_error error label already does what we need,
so just re-use that.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/mtip32xx/mtip32xx.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/mtip32xx/mtip32xx.c 
b/drivers/block/mtip32xx/mtip32xx.c
index 901855717cb5..d0b40309f47e 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -3633,7 +3633,9 @@ static int mtip_block_initialize(struct driver_data *dd)
set_capacity(dd->disk, capacity);
 
/* Enable the block device and add it to /dev */
-   device_add_disk(>pdev->dev, dd->disk, mtip_disk_attr_groups);
+   rv = device_add_disk(>pdev->dev, dd->disk, mtip_disk_attr_groups);
+   if (rv)
+   goto read_capacity_error;
 
if (dd->mtip_svc_handler) {
set_bit(MTIP_DDF_INIT_DONE_BIT, >dd_flag);
-- 
2.30.2



[PATCH v2 09/10] pf: add error handling support for add_disk()

2021-09-27 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/paride/pf.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/paride/pf.c b/drivers/block/paride/pf.c
index f471d48a87bc..380d80e507c7 100644
--- a/drivers/block/paride/pf.c
+++ b/drivers/block/paride/pf.c
@@ -962,7 +962,9 @@ static int __init pf_init_unit(struct pf_unit *pf, bool 
autoprobe, int port,
if (pf_probe(pf))
goto out_pi_release;
 
-   add_disk(disk);
+   ret = add_disk(disk);
+   if (ret)
+   goto out_pi_release;
pf->present = 1;
return 0;
 
-- 
2.30.2



[PATCH v2 06/10] block/rsxx: add error handling support for add_disk()

2021-09-27 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/rsxx/core.c |  4 +++-
 drivers/block/rsxx/dev.c  | 12 +---
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/block/rsxx/core.c b/drivers/block/rsxx/core.c
index 83636714b8d7..8d9d69f5dfbc 100644
--- a/drivers/block/rsxx/core.c
+++ b/drivers/block/rsxx/core.c
@@ -935,7 +935,9 @@ static int rsxx_pci_probe(struct pci_dev *dev,
card->size8 = 0;
}
 
-   rsxx_attach_dev(card);
+   st = rsxx_attach_dev(card);
+   if (st)
+   goto failed_create_dev;
 
/* Setup Debugfs */
rsxx_debugfs_dev_new(card);
diff --git a/drivers/block/rsxx/dev.c b/drivers/block/rsxx/dev.c
index 1cc40b0ea761..b2d3ac3efce2 100644
--- a/drivers/block/rsxx/dev.c
+++ b/drivers/block/rsxx/dev.c
@@ -192,6 +192,8 @@ static bool rsxx_discard_supported(struct rsxx_cardinfo 
*card)
 
 int rsxx_attach_dev(struct rsxx_cardinfo *card)
 {
+   int err = 0;
+
mutex_lock(>dev_lock);
 
/* The block device requires the stripe size from the config. */
@@ -200,13 +202,17 @@ int rsxx_attach_dev(struct rsxx_cardinfo *card)
set_capacity(card->gendisk, card->size8 >> 9);
else
set_capacity(card->gendisk, 0);
-   device_add_disk(CARD_TO_DEV(card), card->gendisk, NULL);
-   card->bdev_attached = 1;
+   err = device_add_disk(CARD_TO_DEV(card), card->gendisk, NULL);
+   if (err == 0)
+   card->bdev_attached = 1;
}
 
mutex_unlock(>dev_lock);
 
-   return 0;
+   if (err)
+   blk_cleanup_disk(card->gendisk);
+
+   return err;
 }
 
 void rsxx_detach_dev(struct rsxx_cardinfo *card)
-- 
2.30.2



[PATCH v2 00/10] block: fourth batch of add_disk() error handling conversions

2021-09-27 Thread Luis Chamberlain
This is the fourth batch of add_disk() error handling driver
conversions. This set along with the entire 7 set of driver conversions
can be found on my 20210927-for-axboe-add-disk-error-handling branch
[0].

On this v2 series the following modifications have been made since the
last v1 series of this patch set:

  - rebased onto linux-next tag 20210927
  - added the only reviewed-by tag for this series for rnbd Jack Wang

[0] 
https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20210927-for-axboe-add-disk-error-handling

Luis Chamberlain (10):
  mtip32xx: add error handling support for add_disk()
  pktcdvd: add error handling support for add_disk()
  ps3disk: add error handling support for add_disk()
  ps3vram: add error handling support for add_disk()
  rnbd: add error handling support for add_disk()
  block/rsxx: add error handling support for add_disk()
  block/sunvdc: add error handling support for add_disk()
  block/sx8: add error handling support for add_disk()
  pf: add error handling support for add_disk()
  mtd/ubi/block: add error handling support for add_disk()

 drivers/block/mtip32xx/mtip32xx.c |  4 +++-
 drivers/block/paride/pf.c |  4 +++-
 drivers/block/pktcdvd.c   |  4 +++-
 drivers/block/ps3disk.c   |  8 ++--
 drivers/block/ps3vram.c   |  7 ++-
 drivers/block/rnbd/rnbd-clt.c | 13 +
 drivers/block/rsxx/core.c |  4 +++-
 drivers/block/rsxx/dev.c  | 12 +---
 drivers/block/sunvdc.c| 14 +++---
 drivers/block/sx8.c   | 13 +
 drivers/mtd/ubi/block.c   |  8 +++-
 11 files changed, 69 insertions(+), 22 deletions(-)

-- 
2.30.2



[PATCH v2 05/10] rnbd: add error handling support for add_disk()

2021-09-27 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Acked-by: Jack Wang 
Signed-off-by: Luis Chamberlain 
---
 drivers/block/rnbd/rnbd-clt.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/block/rnbd/rnbd-clt.c b/drivers/block/rnbd/rnbd-clt.c
index bd4a41afbbfc..1ba1c868535a 100644
--- a/drivers/block/rnbd/rnbd-clt.c
+++ b/drivers/block/rnbd/rnbd-clt.c
@@ -1384,8 +1384,10 @@ static void setup_request_queue(struct rnbd_clt_dev *dev)
blk_queue_write_cache(dev->queue, dev->wc, dev->fua);
 }
 
-static void rnbd_clt_setup_gen_disk(struct rnbd_clt_dev *dev, int idx)
+static int rnbd_clt_setup_gen_disk(struct rnbd_clt_dev *dev, int idx)
 {
+   int err;
+
dev->gd->major  = rnbd_client_major;
dev->gd->first_minor= idx << RNBD_PART_BITS;
dev->gd->minors = 1 << RNBD_PART_BITS;
@@ -1410,7 +1412,11 @@ static void rnbd_clt_setup_gen_disk(struct rnbd_clt_dev 
*dev, int idx)
 
if (!dev->rotational)
blk_queue_flag_set(QUEUE_FLAG_NONROT, dev->queue);
-   add_disk(dev->gd);
+   err = add_disk(dev->gd);
+   if (err)
+   blk_cleanup_disk(dev->gd);
+
+   return err;
 }
 
 static int rnbd_client_setup_device(struct rnbd_clt_dev *dev)
@@ -1426,8 +1432,7 @@ static int rnbd_client_setup_device(struct rnbd_clt_dev 
*dev)
rnbd_init_mq_hw_queues(dev);
 
setup_request_queue(dev);
-   rnbd_clt_setup_gen_disk(dev, idx);
-   return 0;
+   return rnbd_clt_setup_gen_disk(dev, idx);
 }
 
 static struct rnbd_clt_dev *init_dev(struct rnbd_clt_session *sess,
-- 
2.30.2



[PATCH v2 10/10] mtd/ubi/block: add error handling support for add_disk()

2021-09-27 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/mtd/ubi/block.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/ubi/block.c b/drivers/mtd/ubi/block.c
index e003b4b44ffa..062e6c2c45f5 100644
--- a/drivers/mtd/ubi/block.c
+++ b/drivers/mtd/ubi/block.c
@@ -447,12 +447,18 @@ int ubiblock_create(struct ubi_volume_info *vi)
list_add_tail(>list, _devices);
 
/* Must be the last step: anyone can call file ops from now on */
-   add_disk(dev->gd);
+   ret = add_disk(dev->gd);
+   if (ret)
+   goto out_destroy_wq;
+
dev_info(disk_to_dev(dev->gd), "created from ubi%d:%d(%s)",
 dev->ubi_num, dev->vol_id, vi->name);
mutex_unlock(_mutex);
return 0;
 
+out_destroy_wq:
+   list_del(>list);
+   destroy_workqueue(dev->wq);
 out_remove_minor:
idr_remove(_minor_idr, gd->first_minor);
 out_cleanup_disk:
-- 
2.30.2



[PATCH v2 03/10] ps3disk: add error handling support for add_disk()

2021-09-27 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/ps3disk.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
index 8d51efbe045d..3054adf77460 100644
--- a/drivers/block/ps3disk.c
+++ b/drivers/block/ps3disk.c
@@ -467,9 +467,13 @@ static int ps3disk_probe(struct ps3_system_bus_device 
*_dev)
 gendisk->disk_name, priv->model, priv->raw_capacity >> 11,
 get_capacity(gendisk) >> 11);
 
-   device_add_disk(>sbd.core, gendisk, NULL);
-   return 0;
+   error = device_add_disk(>sbd.core, gendisk, NULL);
+   if (error)
+   goto fail_cleanup_disk;
 
+   return 0;
+fail_cleanup_disk:
+   blk_cleanup_disk(gendisk);
 fail_free_tag_set:
blk_mq_free_tag_set(>tag_set);
 fail_teardown:
-- 
2.30.2



[PATCH 04/10] ps3vram: add error handling support for add_disk()

2021-09-01 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/ps3vram.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
index c7b19e128b03..af2a0d09c598 100644
--- a/drivers/block/ps3vram.c
+++ b/drivers/block/ps3vram.c
@@ -755,9 +755,14 @@ static int ps3vram_probe(struct ps3_system_bus_device *dev)
dev_info(>core, "%s: Using %llu MiB of GPU memory\n",
 gendisk->disk_name, get_capacity(gendisk) >> 11);
 
-   device_add_disk(>core, gendisk, NULL);
+   error = device_add_disk(>core, gendisk, NULL);
+   if (error)
+   goto out_cleanup_disk;
+
return 0;
 
+out_cleanup_disk:
+   blk_cleanup_disk(gendisk);
 out_cache_cleanup:
remove_proc_entry(DEVICE_NAME, NULL);
ps3vram_cache_cleanup(dev);
-- 
2.30.2



[PATCH 02/10] pktcdvd: add error handling support for add_disk()

2021-09-01 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

The out_mem2 error label already does what we need so
re-use that.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/pktcdvd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 0f26b2510a75..415248962e67 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2729,7 +2729,9 @@ static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev)
/* inherit events of the host device */
disk->events = pd->bdev->bd_disk->events;
 
-   add_disk(disk);
+   ret = add_disk(disk);
+   if (ret)
+   goto out_mem2;
 
pkt_sysfs_dev_new(pd);
pkt_debugfs_dev_new(pd);
-- 
2.30.2



[PATCH 01/10] mtip32xx: add error handling support for add_disk()

2021-09-01 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

The read_capacity_error error label already does what we need,
so just re-use that.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/mtip32xx/mtip32xx.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/mtip32xx/mtip32xx.c 
b/drivers/block/mtip32xx/mtip32xx.c
index 901855717cb5..d0b40309f47e 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -3633,7 +3633,9 @@ static int mtip_block_initialize(struct driver_data *dd)
set_capacity(dd->disk, capacity);
 
/* Enable the block device and add it to /dev */
-   device_add_disk(>pdev->dev, dd->disk, mtip_disk_attr_groups);
+   rv = device_add_disk(>pdev->dev, dd->disk, mtip_disk_attr_groups);
+   if (rv)
+   goto read_capacity_error;
 
if (dd->mtip_svc_handler) {
set_bit(MTIP_DDF_INIT_DONE_BIT, >dd_flag);
-- 
2.30.2



[PATCH 00/10] block: fourth batch of add_disk() error handling conversions

2021-09-01 Thread Luis Chamberlain
The full set of changes can be found on my branch titled
20210901-for-axboe-add-disk-error-handling [0] which is
now based on axboe/master.

[0] 
https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20210901-for-axboe-add-disk-error-handling

Luis Chamberlain (10):
  mtip32xx: add error handling support for add_disk()
  pktcdvd: add error handling support for add_disk()
  ps3disk: add error handling support for add_disk()
  ps3vram: add error handling support for add_disk()
  rnbd: add error handling support for add_disk()
  block/rsxx: add error handling support for add_disk()
  block/sunvdc: add error handling support for add_disk()
  block/sx8: add error handling support for add_disk()
  pf: add error handling support for add_disk()
  mtd/ubi/block: add error handling support for add_disk()

 drivers/block/mtip32xx/mtip32xx.c |  4 +++-
 drivers/block/paride/pf.c |  4 +++-
 drivers/block/pktcdvd.c   |  4 +++-
 drivers/block/ps3disk.c   |  8 ++--
 drivers/block/ps3vram.c   |  7 ++-
 drivers/block/rnbd/rnbd-clt.c | 13 +
 drivers/block/rsxx/core.c |  4 +++-
 drivers/block/rsxx/dev.c  | 12 +---
 drivers/block/sunvdc.c| 14 +++---
 drivers/block/sx8.c   | 13 +
 drivers/mtd/ubi/block.c   |  8 +++-
 11 files changed, 69 insertions(+), 22 deletions(-)

-- 
2.30.2



[PATCH 08/10] block/sx8: add error handling support for add_disk()

2021-09-01 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

A completion is used to notify the initial probe what is
happening and so we must defer error handling on completion.
Do this by remembering the error and using the shared cleanup
function.

The tags are shared and so are hanlded later for the
driver already.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/sx8.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/block/sx8.c b/drivers/block/sx8.c
index 420cd952ddc4..1c79248c4826 100644
--- a/drivers/block/sx8.c
+++ b/drivers/block/sx8.c
@@ -297,6 +297,7 @@ struct carm_host {
 
struct work_struct  fsm_task;
 
+   int probe_err;
struct completion   probe_comp;
 };
 
@@ -1181,8 +1182,11 @@ static void carm_fsm_task (struct work_struct *work)
struct gendisk *disk = port->disk;
 
set_capacity(disk, port->capacity);
-   add_disk(disk);
-   activated++;
+   host->probe_err = add_disk(disk);
+   if (!host->probe_err)
+   activated++;
+   else
+   break;
}
 
printk(KERN_INFO DRV_NAME "(%s): %d ports activated\n",
@@ -1192,11 +1196,9 @@ static void carm_fsm_task (struct work_struct *work)
reschedule = 1;
break;
}
-
case HST_PROBE_FINISHED:
complete(>probe_comp);
break;
-
case HST_ERROR:
/* FIXME: TODO */
break;
@@ -1507,7 +1509,10 @@ static int carm_init_one (struct pci_dev *pdev, const 
struct pci_device_id *ent)
goto err_out_free_irq;
 
DPRINTK("waiting for probe_comp\n");
+   host->probe_err = -ENODEV;
wait_for_completion(>probe_comp);
+   if (host->probe_err)
+   goto err_out_free_irq;
 
printk(KERN_INFO "%s: pci %s, ports %d, io %llx, irq %u, major %d\n",
   host->name, pci_name(pdev), (int) CARM_MAX_PORTS,
-- 
2.30.2



[PATCH 09/10] pf: add error handling support for add_disk()

2021-09-01 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/paride/pf.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/paride/pf.c b/drivers/block/paride/pf.c
index f471d48a87bc..380d80e507c7 100644
--- a/drivers/block/paride/pf.c
+++ b/drivers/block/paride/pf.c
@@ -962,7 +962,9 @@ static int __init pf_init_unit(struct pf_unit *pf, bool 
autoprobe, int port,
if (pf_probe(pf))
goto out_pi_release;
 
-   add_disk(disk);
+   ret = add_disk(disk);
+   if (ret)
+   goto out_pi_release;
pf->present = 1;
return 0;
 
-- 
2.30.2



[PATCH 05/10] rnbd: add error handling support for add_disk()

2021-09-01 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/rnbd/rnbd-clt.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/block/rnbd/rnbd-clt.c b/drivers/block/rnbd/rnbd-clt.c
index bd4a41afbbfc..1ba1c868535a 100644
--- a/drivers/block/rnbd/rnbd-clt.c
+++ b/drivers/block/rnbd/rnbd-clt.c
@@ -1384,8 +1384,10 @@ static void setup_request_queue(struct rnbd_clt_dev *dev)
blk_queue_write_cache(dev->queue, dev->wc, dev->fua);
 }
 
-static void rnbd_clt_setup_gen_disk(struct rnbd_clt_dev *dev, int idx)
+static int rnbd_clt_setup_gen_disk(struct rnbd_clt_dev *dev, int idx)
 {
+   int err;
+
dev->gd->major  = rnbd_client_major;
dev->gd->first_minor= idx << RNBD_PART_BITS;
dev->gd->minors = 1 << RNBD_PART_BITS;
@@ -1410,7 +1412,11 @@ static void rnbd_clt_setup_gen_disk(struct rnbd_clt_dev 
*dev, int idx)
 
if (!dev->rotational)
blk_queue_flag_set(QUEUE_FLAG_NONROT, dev->queue);
-   add_disk(dev->gd);
+   err = add_disk(dev->gd);
+   if (err)
+   blk_cleanup_disk(dev->gd);
+
+   return err;
 }
 
 static int rnbd_client_setup_device(struct rnbd_clt_dev *dev)
@@ -1426,8 +1432,7 @@ static int rnbd_client_setup_device(struct rnbd_clt_dev 
*dev)
rnbd_init_mq_hw_queues(dev);
 
setup_request_queue(dev);
-   rnbd_clt_setup_gen_disk(dev, idx);
-   return 0;
+   return rnbd_clt_setup_gen_disk(dev, idx);
 }
 
 static struct rnbd_clt_dev *init_dev(struct rnbd_clt_session *sess,
-- 
2.30.2



[PATCH 07/10] block/sunvdc: add error handling support for add_disk()

2021-09-01 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

We re-use the same free tag call, so we also add a label for
that as well.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/sunvdc.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c
index 4d4bb810c2ae..6f45a53f7cbf 100644
--- a/drivers/block/sunvdc.c
+++ b/drivers/block/sunvdc.c
@@ -826,8 +826,8 @@ static int probe_disk(struct vdc_port *port)
if (IS_ERR(g)) {
printk(KERN_ERR PFX "%s: Could not allocate gendisk.\n",
   port->vio.name);
-   blk_mq_free_tag_set(>tag_set);
-   return PTR_ERR(g);
+   err = PTR_ERR(g);
+   goto out_free_tag;
}
 
port->disk = g;
@@ -879,9 +879,17 @@ static int probe_disk(struct vdc_port *port)
   port->vdisk_size, (port->vdisk_size >> (20 - 9)),
   port->vio.ver.major, port->vio.ver.minor);
 
-   device_add_disk(>vio.vdev->dev, g, NULL);
+   err = device_add_disk(>vio.vdev->dev, g, NULL);
+   if (err)
+   goto out_cleanup_disk;
 
return 0;
+
+out_cleanup_disk:
+   blk_cleanup_disk(g);
+out_free_tag:
+   blk_mq_free_tag_set(>tag_set);
+   return err;
 }
 
 static struct ldc_channel_config vdc_ldc_cfg = {
-- 
2.30.2



[PATCH 10/10] mtd/ubi/block: add error handling support for add_disk()

2021-09-01 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/mtd/ubi/block.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/ubi/block.c b/drivers/mtd/ubi/block.c
index e003b4b44ffa..062e6c2c45f5 100644
--- a/drivers/mtd/ubi/block.c
+++ b/drivers/mtd/ubi/block.c
@@ -447,12 +447,18 @@ int ubiblock_create(struct ubi_volume_info *vi)
list_add_tail(>list, _devices);
 
/* Must be the last step: anyone can call file ops from now on */
-   add_disk(dev->gd);
+   ret = add_disk(dev->gd);
+   if (ret)
+   goto out_destroy_wq;
+
dev_info(disk_to_dev(dev->gd), "created from ubi%d:%d(%s)",
 dev->ubi_num, dev->vol_id, vi->name);
mutex_unlock(_mutex);
return 0;
 
+out_destroy_wq:
+   list_del(>list);
+   destroy_workqueue(dev->wq);
 out_remove_minor:
idr_remove(_minor_idr, gd->first_minor);
 out_cleanup_disk:
-- 
2.30.2



[PATCH 03/10] ps3disk: add error handling support for add_disk()

2021-09-01 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/ps3disk.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
index 8d51efbe045d..3054adf77460 100644
--- a/drivers/block/ps3disk.c
+++ b/drivers/block/ps3disk.c
@@ -467,9 +467,13 @@ static int ps3disk_probe(struct ps3_system_bus_device 
*_dev)
 gendisk->disk_name, priv->model, priv->raw_capacity >> 11,
 get_capacity(gendisk) >> 11);
 
-   device_add_disk(>sbd.core, gendisk, NULL);
-   return 0;
+   error = device_add_disk(>sbd.core, gendisk, NULL);
+   if (error)
+   goto fail_cleanup_disk;
 
+   return 0;
+fail_cleanup_disk:
+   blk_cleanup_disk(gendisk);
 fail_free_tag_set:
blk_mq_free_tag_set(>tag_set);
 fail_teardown:
-- 
2.30.2



[PATCH 06/10] block/rsxx: add error handling support for add_disk()

2021-09-01 Thread Luis Chamberlain
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain 
---
 drivers/block/rsxx/core.c |  4 +++-
 drivers/block/rsxx/dev.c  | 12 +---
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/block/rsxx/core.c b/drivers/block/rsxx/core.c
index 83636714b8d7..8d9d69f5dfbc 100644
--- a/drivers/block/rsxx/core.c
+++ b/drivers/block/rsxx/core.c
@@ -935,7 +935,9 @@ static int rsxx_pci_probe(struct pci_dev *dev,
card->size8 = 0;
}
 
-   rsxx_attach_dev(card);
+   st = rsxx_attach_dev(card);
+   if (st)
+   goto failed_create_dev;
 
/* Setup Debugfs */
rsxx_debugfs_dev_new(card);
diff --git a/drivers/block/rsxx/dev.c b/drivers/block/rsxx/dev.c
index 1cc40b0ea761..b2d3ac3efce2 100644
--- a/drivers/block/rsxx/dev.c
+++ b/drivers/block/rsxx/dev.c
@@ -192,6 +192,8 @@ static bool rsxx_discard_supported(struct rsxx_cardinfo 
*card)
 
 int rsxx_attach_dev(struct rsxx_cardinfo *card)
 {
+   int err = 0;
+
mutex_lock(>dev_lock);
 
/* The block device requires the stripe size from the config. */
@@ -200,13 +202,17 @@ int rsxx_attach_dev(struct rsxx_cardinfo *card)
set_capacity(card->gendisk, card->size8 >> 9);
else
set_capacity(card->gendisk, 0);
-   device_add_disk(CARD_TO_DEV(card), card->gendisk, NULL);
-   card->bdev_attached = 1;
+   err = device_add_disk(CARD_TO_DEV(card), card->gendisk, NULL);
+   if (err == 0)
+   card->bdev_attached = 1;
}
 
mutex_unlock(>dev_lock);
 
-   return 0;
+   if (err)
+   blk_cleanup_disk(card->gendisk);
+
+   return err;
 }
 
 void rsxx_detach_dev(struct rsxx_cardinfo *card)
-- 
2.30.2



Re: [dm-devel] [PATCH 05/26] block: add blk_alloc_disk and blk_cleanup_disk APIs

2021-05-21 Thread Luis Chamberlain
On Fri, May 21, 2021 at 07:50:55AM +0200, Christoph Hellwig wrote:
> Add two new APIs to allocate and free a gendisk including the
> request_queue for use with BIO based drivers.  This is to avoid
> boilerplate code in drivers.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  block/genhd.c | 35 +++
>  include/linux/genhd.h | 22 ++
>  2 files changed, 57 insertions(+)
> 
> diff --git a/block/genhd.c b/block/genhd.c
> index e4974af3d729..6d4ce962866d 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -1302,6 +1302,25 @@ struct gendisk *__alloc_disk_node(int minors, int 
> node_id)
>  }
>  EXPORT_SYMBOL(__alloc_disk_node);
>  
> +struct gendisk *__blk_alloc_disk(int node)
> +{
> + struct request_queue *q;
> + struct gendisk *disk;
> +
> + q = blk_alloc_queue(node);
> + if (!q)
> + return NULL;
> +
> + disk = __alloc_disk_node(0, node);
> + if (!disk) {
> + blk_cleanup_queue(q);
> + return NULL;
> + }
> + disk->queue = q;
> + return disk;
> +}
> +EXPORT_SYMBOL(__blk_alloc_disk);

Its not obvious to me why using this new API requires you then to
set minors explicitly to 1, and yet here underneath we see the minors
argument passed is 0.

Nor is it clear from the documentation.

  Luis


Re: [dm-devel] [PATCH 04/26] block: add a flag to make put_disk on partially initalized disks safer

2021-05-21 Thread Luis Chamberlain
On Fri, May 21, 2021 at 07:50:54AM +0200, Christoph Hellwig wrote:
> Add a flag to indicate that __device_add_disk did grab a queue reference
> so that disk_release only drops it if we actually had it.  This sort
> out one of the major pitfals with partially initialized gendisk that
> a lot of drivers did get wrong or still do.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Luis Chamberlain 

  Luis


Re: [dm-devel] [PATCH 03/26] block: automatically enable GENHD_FL_EXT_DEVT

2021-05-21 Thread Luis Chamberlain
On Fri, May 21, 2021 at 07:50:53AM +0200, Christoph Hellwig wrote:
> Automatically set the GENHD_FL_EXT_DEVT flag for all disks allocated
> without an explicit number of minors.  This is what all new block
> drivers should do, so make sure it is the default without boilerplate
> code.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Luis Chamberlain 

  Luis


Re: [dm-devel] [PATCH 02/26] block: move the DISK_MAX_PARTS sanity check into __device_add_disk

2021-05-21 Thread Luis Chamberlain
On Fri, May 21, 2021 at 07:50:52AM +0200, Christoph Hellwig wrote:
> Keep this together with the first place that actually looks at
> ->minors and prepare for not passing a minors argument to
> alloc_disk.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Luis Chamberlain 

  Luis


Re: [dm-devel] [PATCH 01/26] block: refactor device number setup in __device_add_disk

2021-05-21 Thread Luis Chamberlain
On Fri, May 21, 2021 at 07:50:51AM +0200, Christoph Hellwig wrote:
> diff --git a/block/genhd.c b/block/genhd.c
> index 39ca97b0edc6..2c00bc3261d9 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -335,52 +335,22 @@ static int blk_mangle_minor(int minor)

<-- snip -->

> -int blk_alloc_devt(struct block_device *bdev, dev_t *devt)
> +int blk_alloc_ext_minor(void)
>  {
> - struct gendisk *disk = bdev->bd_disk;
>   int idx;
>  
> - /* in consecutive minor range? */
> - if (bdev->bd_partno < disk->minors) {
> - *devt = MKDEV(disk->major, disk->first_minor + bdev->bd_partno);
> - return 0;
> - }
> -

It is not obviously clear to me, why this was part of add_disk()
path, and ...

> diff --git a/block/partitions/core.c b/block/partitions/core.c
> index dc60ecf46fe6..504297bdc8bf 100644
> --- a/block/partitions/core.c
> +++ b/block/partitions/core.c
> @@ -379,9 +380,15 @@ static struct block_device *add_partition(struct gendisk 
> *disk, int partno,
>   pdev->type = _type;
>   pdev->parent = ddev;
>  
> - err = blk_alloc_devt(bdev, );
> - if (err)
> - goto out_put;
> + /* in consecutive minor range? */
> + if (bdev->bd_partno < disk->minors) {
> + devt = MKDEV(disk->major, disk->first_minor + bdev->bd_partno);
> + } else {
> + err = blk_alloc_ext_minor();
> + if (err < 0)
> + goto out_put;
> + devt = MKDEV(BLOCK_EXT_MAJOR, err);
> + }
>   pdev->devt = devt;
>  
>   /* delay uevent until 'holders' subdir is created */

... and why we only add this here now.

Other than that, this looks like a super nice cleanup!

Reviewed-by: Luis Chamberlain 

  Luis


Re: [PATCH v2 1/1] kernel.h: Split out panic and oops helpers

2021-04-09 Thread Luis Chamberlain
On Fri, Apr 09, 2021 at 01:02:50PM +0300, Andy Shevchenko wrote:
> kernel.h is being used as a dump for all kinds of stuff for a long time.
> Here is the attempt to start cleaning it up by splitting out panic and
> oops helpers.
> 
> There are several purposes of doing this:
> - dropping dependency in bug.h
> - dropping a loop by moving out panic_notifier.h
> - unload kernel.h from something which has its own domain
> 
> At the same time convert users tree-wide to use new headers, although
> for the time being include new header back to kernel.h to avoid twisted
> indirected includes for existing users.
> 
> Signed-off-by: Andy Shevchenko 
> Reviewed-by: Bjorn Andersson 
> Acked-by: Mike Rapoport 
> Acked-by: Corey Minyard 
> Acked-by: Christian Brauner 
> Acked-by: Arnd Bergmann 
> Acked-by: Kees Cook 
> Acked-by: Wei Liu 
> Acked-by: Rasmus Villemoes 
> Signed-off-by: Andrew Morton 

Acked-by: Luis Chamberlain 

  Luis


Re: [PATCH v1 1/1] kernel.h: Split out panic and oops helpers

2021-04-07 Thread Luis Chamberlain
On Wed, Apr 07, 2021 at 05:59:19PM +0300, Andy Shevchenko wrote:
> On Wed, Apr 7, 2021 at 5:30 PM Luis Chamberlain  wrote:
> > On Wed, Apr 07, 2021 at 10:33:44AM +0300, Andy Shevchenko wrote:
> > > On Wed, Apr 7, 2021 at 10:25 AM Luis Chamberlain  
> > > wrote:
> > > > On Tue, Apr 06, 2021 at 04:31:58PM +0300, Andy Shevchenko wrote:
> 
> ...
> 
> > > > Why is it worth it to add another file just for this?
> > >
> > > The main point is to break tons of loops that prevent having clean
> > > headers anymore.
> > >
> > > In this case, see bug.h, which is very important in this sense.
> >
> > OK based on the commit log this was not clear, it seemed more of moving
> > panic stuff to its own file, so just cleanup.
> 
> Sorry for that. it should have mentioned the kernel folder instead of
> lib. But I think it won't clarify the above.
> 
> In any case there are several purposes in this case
>  - dropping dependency in bug.h
>  - dropping a loop by moving out panic_notifier.h
>  - unload kernel.h from something which has its own domain
> 
> I think that you are referring to the commit message describing 3rd
> one, but not 1st and 2nd.

Right!

> I will amend this for the future splits, thanks!

Don't get me wrong, I love the motivation behind just the 3rd purpose,
however I figured there might be something more when I saw panic_notifier.h.
It was just not clear.

But awesome stuff!

  Luis


Re: [PATCH v1 1/1] kernel.h: Split out panic and oops helpers

2021-04-07 Thread Luis Chamberlain
On Wed, Apr 07, 2021 at 10:33:44AM +0300, Andy Shevchenko wrote:
> On Wed, Apr 7, 2021 at 10:25 AM Luis Chamberlain  wrote:
> >
> > On Tue, Apr 06, 2021 at 04:31:58PM +0300, Andy Shevchenko wrote:
> > > diff --git a/include/linux/panic_notifier.h 
> > > b/include/linux/panic_notifier.h
> > > new file mode 100644
> > > index ..41e32483d7a7
> > > --- /dev/null
> > > +++ b/include/linux/panic_notifier.h
> > > @@ -0,0 +1,12 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +#ifndef _LINUX_PANIC_NOTIFIERS_H
> > > +#define _LINUX_PANIC_NOTIFIERS_H
> > > +
> > > +#include 
> > > +#include 
> > > +
> > > +extern struct atomic_notifier_head panic_notifier_list;
> > > +
> > > +extern bool crash_kexec_post_notifiers;
> > > +
> > > +#endif   /* _LINUX_PANIC_NOTIFIERS_H */
> >
> > Why is it worth it to add another file just for this?
> 
> The main point is to break tons of loops that prevent having clean
> headers anymore.
>
> In this case, see bug.h, which is very important in this sense.

OK based on the commit log this was not clear, it seemed more of moving
panic stuff to its own file, so just cleanup.

> >  Seems like a very
> > small file.
> 
> If it is an argument, it's kinda strange. We have much smaller headers.

The motivation for such separate file was just not clear on the commit
log.

  Luis


Re: [PATCH v1 1/1] kernel.h: Split out panic and oops helpers

2021-04-06 Thread Luis Chamberlain
On Tue, Apr 06, 2021 at 04:31:58PM +0300, Andy Shevchenko wrote:
> diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
> new file mode 100644
> index ..41e32483d7a7
> --- /dev/null
> +++ b/include/linux/panic_notifier.h
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_PANIC_NOTIFIERS_H
> +#define _LINUX_PANIC_NOTIFIERS_H
> +
> +#include 
> +#include 
> +
> +extern struct atomic_notifier_head panic_notifier_list;
> +
> +extern bool crash_kexec_post_notifiers;
> +
> +#endif   /* _LINUX_PANIC_NOTIFIERS_H */

Why is it worth it to add another file just for this? Seems like a very
small file.

  Luis


Re: [PATCH v5 08/12] init: main: add KUnit to kernel init

2020-07-07 Thread Luis Chamberlain
On Fri, Jun 26, 2020 at 02:09:13PM -0700, Brendan Higgins wrote:
> Remove KUnit from init calls entirely, instead call directly from
> kernel_init().

The commit log does not explain *why*.

> Co-developed-by: Alan Maguire 
> Signed-off-by: Alan Maguire 
> Signed-off-by: Brendan Higgins 
> Reviewed-by: Stephen Boyd 

Other than that:

Reviewed-by: Luis Chamberlain 

  Luis


Re: [PATCH v5 01/12] vmlinux.lds.h: add linker section for KUnit test suites

2020-07-07 Thread Luis Chamberlain
On Fri, Jun 26, 2020 at 02:22:11PM -0700, Brendan Higgins wrote:
> On Fri, Jun 26, 2020 at 2:20 PM Kees Cook  wrote:
> >
> > On Fri, Jun 26, 2020 at 02:09:06PM -0700, Brendan Higgins wrote:
> > > Add a linker section where KUnit can put references to its test suites.
> > > This patch is the first step in transitioning to dispatching all KUnit
> > > tests from a centralized executor rather than having each as its own
> > > separate late_initcall.
> > >
> > > Co-developed-by: Iurii Zaikin 
> > > Signed-off-by: Iurii Zaikin 
> > > Signed-off-by: Brendan Higgins 
> > > Reviewed-by: Stephen Boyd 
> > > ---
> > >  include/asm-generic/vmlinux.lds.h | 8 
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/include/asm-generic/vmlinux.lds.h 
> > > b/include/asm-generic/vmlinux.lds.h
> > > index db600ef218d7d..4f9b036fc9616 100644
> > > --- a/include/asm-generic/vmlinux.lds.h
> > > +++ b/include/asm-generic/vmlinux.lds.h
> > > @@ -881,6 +881,13 @@
> > >   KEEP(*(.con_initcall.init)) \
> > >   __con_initcall_end = .;
> > >
> > > +/* Alignment must be consistent with (kunit_suite *) in 
> > > include/kunit/test.h */
> >
> > Nit on naming:
> >
> > > +#define KUNIT_TEST_SUITES\
> >
> > I would call this KUNIT_TABLE to maintain the same names as other things
> > of this nature.
> >
> > > + . = ALIGN(8);   \
> > > + __kunit_suites_start = .;   \
> > > + KEEP(*(.kunit_test_suites)) \
> > > + __kunit_suites_end = .;
> > > +
> > >  #ifdef CONFIG_BLK_DEV_INITRD
> > >  #define INIT_RAM_FS  \
> > >   . = ALIGN(4);   \
> > > @@ -1056,6 +1063,7 @@
> > >   INIT_CALLS  \
> > >   CON_INITCALL\
> > >   INIT_RAM_FS \
> > > + KUNIT_TEST_SUITES   \
> > >   }
> >
> > Nack: this must be in INIT_DATA, not in INIT_DATA_SECTION. Not all
> > architectures use the INIT_DATA_SECTION macro (e.g. arm64), but everything
> > uses INIT_DATA.
> 
> Oh, maybe that would eliminate the need for the other linkerscript
> patches? That would be nice.

Curious, did changing it as Kees suggest fix it for m68k?

  Luis


Re: [v2 PATCH] crypto: af_alg - Fix regression on empty requests

2020-07-03 Thread Luis Chamberlain
On Thu, Jul 02, 2020 at 01:32:21PM +1000, Herbert Xu wrote:
> On Tue, Jun 30, 2020 at 02:18:11PM +0530, Naresh Kamboju wrote:
> > 
> > Since we are on this subject,
> > LTP af_alg02  test case fails on stable 4.9 and stable 4.4
> > This is not a regression because the test case has been failing from
> > the beginning.
> > 
> > Is this test case expected to fail on stable 4.9 and 4.4 ?
> > or any chance to fix this on these older branches ?
> > 
> > Test output:
> > af_alg02.c:52: BROK: Timed out while reading from request socket.
> > 
> > ref:
> > https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.228-191-g082e807235d7/testrun/2884917/suite/ltp-crypto-tests/test/af_alg02/history/
> > https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.228-191-g082e807235d7/testrun/2884606/suite/ltp-crypto-tests/test/af_alg02/log
> 
> Actually this test really is broken.

FWIW the patch "umh: fix processed error when UMH_WAIT_PROC is used" was
dropped from linux-next for now as it was missing checking for signals.
I'll be open coding iall checks for each UMH_WAIT_PROC callers next. Its
not clear if this was the issue with this test case, but figured I'd let
you know.

  Luis


Re: [PATCH 6/6] kernel: add a kernel_wait helper

2020-06-20 Thread Luis Chamberlain
On Sat, Jun 20, 2020 at 08:35:38AM +0200, Christoph Hellwig wrote:
> On Fri, Jun 19, 2020 at 09:17:00PM +0000, Luis Chamberlain wrote:
> > On Thu, Jun 18, 2020 at 04:46:27PM +0200, Christoph Hellwig wrote:
> > > --- a/kernel/exit.c
> > > +++ b/kernel/exit.c
> > > @@ -1626,6 +1626,22 @@ long kernel_wait4(pid_t upid, int __user 
> > > *stat_addr, int options,
> > >   return ret;
> > >  }
> > >  
> > > +int kernel_wait(pid_t pid, int *stat)
> > > +{
> > > + struct wait_opts wo = {
> > > + .wo_type= PIDTYPE_PID,
> > > + .wo_pid = find_get_pid(pid),
> > > + .wo_flags   = WEXITED,
> > > + };
> > > + int ret;
> > > +
> > > + ret = do_wait();
> > > + if (ret > 0 && wo.wo_stat)
> > > + *stat = wo.wo_stat;
> > 
> > Since all we care about is WEXITED, that could be simplified
> > to something like this:
> > 
> > if (ret > 0 && KWIFEXITED(wo.wo_stat)
> > *stat = KWEXITSTATUS(wo.wo_stat)
> > 
> > Otherwise callers have to use W*() wrappers.
> > 
> > > + put_pid(wo.wo_pid);
> > > + return ret;
> > > +}
> > 
> > Then we don't get *any* in-kernel code dealing with the W*() crap.
> > I just unwrapped this for the umh [0], given that otherwise we'd
> > have to use KW*() callers elsewhere. Doing it upshot one level
> > further would be even better.
> > 
> > [0] https://lkml.kernel.org/r/20200610154923.27510-1-mcg...@kernel.org  
> > 
> Do you just want to pick this patch up, add your suggested bits and
> add it to the beginning of your series?  That should clean the whole
> thing up a bit.  Nothing else in this series depends on the patch.

Sure but let's wait to hear from the NFS folks.

I'm waiting to hear from NFS folks if the one place where the UMH is
fixed for the error code (on fs/nfsd/nfs4recover.c we never were
disabling the upcall as the error code of -ENOENT or -EACCES was *never*
properly checked for) to see how critical that was. If it can help
stable kernels the fix can go in as I proposed, followed by this patch
to further take the KWEXITSTATUS() up further, and ensure we *never*
deal with this in-kernel. If its not a fix stable kernels should care
for what you suggest of taking this patch first would be best and I'd be
happy to do that.

  Luis


Re: [PATCH 6/6] kernel: add a kernel_wait helper

2020-06-19 Thread Luis Chamberlain
On Thu, Jun 18, 2020 at 04:46:27PM +0200, Christoph Hellwig wrote:
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -1626,6 +1626,22 @@ long kernel_wait4(pid_t upid, int __user *stat_addr, 
> int options,
>   return ret;
>  }
>  
> +int kernel_wait(pid_t pid, int *stat)
> +{
> + struct wait_opts wo = {
> + .wo_type= PIDTYPE_PID,
> + .wo_pid = find_get_pid(pid),
> + .wo_flags   = WEXITED,
> + };
> + int ret;
> +
> + ret = do_wait();
> + if (ret > 0 && wo.wo_stat)
> + *stat = wo.wo_stat;

Since all we care about is WEXITED, that could be simplified
to something like this:

if (ret > 0 && KWIFEXITED(wo.wo_stat)
*stat = KWEXITSTATUS(wo.wo_stat)

Otherwise callers have to use W*() wrappers.

> + put_pid(wo.wo_pid);
> + return ret;
> +}

Then we don't get *any* in-kernel code dealing with the W*() crap.
I just unwrapped this for the umh [0], given that otherwise we'd
have to use KW*() callers elsewhere. Doing it upshot one level
further would be even better.

[0] https://lkml.kernel.org/r/20200610154923.27510-1-mcg...@kernel.org  


  Luis


Re: [PATCH 01/13] sysctl: add new register_sysctl_subdir() helper

2020-05-29 Thread Luis Chamberlain
On Fri, May 29, 2020 at 11:13:21AM +0300, Jani Nikula wrote:
> On Fri, 29 May 2020, Luis Chamberlain  wrote:
> > Often enough all we need to do is create a subdirectory so that
> > we can stuff sysctls underneath it. However, *if* that directory
> > was already created early on the boot sequence we really have no
> > need to use the full boiler plate code for it, we can just use
> > local variables to help us guide sysctl to place the new leaf files.
> 
> I find it hard to figure out the lifetime requirements for the tables
> passed in; when it's okay to use local variables and when you need
> longer lifetimes. It's not documented, everyone appears to be using
> static tables for this. It's far from obvious.

I agree 2000% that it is not obvious. What made me consider it was that
I *knew* that the base directory would already exist, so it wouldn't
make sense for the code to rely on earlier parts of a table if part
of the hierarchy already existed.

In fact, a *huge* part of the due dilligence on this and futre series
on this cleanup will be to be 100% sure that the base path is already
created. And so this use is obviously dangerous, you just *need* to
know that the base path is created before.

Non-posted changes also deal with link order to help address this
in other places, given that link order controls how *initcalls()
(early_initcall(), late_initcall(), etc) are ordered if you have
multiple of these.

I had a link order series long ago which augmented our ability to make
things clearer at a link order. Eventually I believe this will become
more important, specially as we end up wanting to async more code.

For now, we can only rely on manual code inspection for ensuring
proper ordering. Part of the implicit aspects of this cleanup is
to slowly make these things clearer for each base path.

So... the "fs" base path will actually end up being created in
fs/sysctl.c after we are *fully* done with the fs sysctl cleanups.

  Luis


Re: [PATCH 09/13] firmware_loader: simplify sysctl declaration with register_sysctl_subdir()

2020-05-29 Thread Luis Chamberlain
On Fri, May 29, 2020 at 12:26:13PM +0200, Greg KH wrote:
> On Fri, May 29, 2020 at 07:41:04AM +0000, Luis Chamberlain wrote:
> > From: Xiaoming Ni 
> > 
> > Move the firmware config sysctl table to fallback_table.c and use the
> > new register_sysctl_subdir() helper. This removes the clutter from
> > kernel/sysctl.c.
> > 
> > Signed-off-by: Xiaoming Ni 
> > Signed-off-by: Luis Chamberlain 
> > ---
> >  drivers/base/firmware_loader/fallback.c   |  4 
> >  drivers/base/firmware_loader/fallback.h   | 11 ++
> >  drivers/base/firmware_loader/fallback_table.c | 22 +--
> >  include/linux/sysctl.h|  1 -
> >  kernel/sysctl.c   |  7 --
> >  5 files changed, 35 insertions(+), 10 deletions(-)
> 
> So it now takes more lines than the old stuff?  :(

Pretty much agreed with the other changes, thanks for the review!

But this diff-stat change, indeed, it is unfortunate that we end up
with more code here than before. We'll try to reduce it instead
somehow, however in some cases during this spring-cleaning, since
the goal is to move code from one file to another, it *may* require
more code. So it won't always be negative. But we'll try!

  Luis


Re: [Intel-gfx] [PATCH 06/13] ocfs2: use new sysctl subdir helper register_sysctl_subdir()

2020-05-29 Thread Luis Chamberlain
On Fri, May 29, 2020 at 01:23:19AM -0700, Kees Cook wrote:
> On Fri, May 29, 2020 at 07:41:01AM +0000, Luis Chamberlain wrote:
> > This simplifies the code considerably. The following coccinelle
> > SmPL grammar rule was used to transform this code.
> > 
> > // pycocci sysctl-subdir.cocci fs/ocfs2/stackglue.c
> > 
> > @c1@
> > expression E1;
> > identifier subdir, sysctls;
> > @@
> > 
> > static struct ctl_table subdir[] = {
> > {
> > .procname = E1,
> > .maxlen = 0,
> > .mode = 0555,
> > .child = sysctls,
> > },
> > { }
> > };
> > 
> > @c2@
> > identifier c1.subdir;
> > 
> > expression E2;
> > identifier base;
> > @@
> > 
> > static struct ctl_table base[] = {
> > {
> > .procname = E2,
> > .maxlen = 0,
> > .mode = 0555,
> > .child = subdir,
> > },
> > { }
> > };
> > 
> > @c3@
> > identifier c2.base;
> > identifier header;
> > @@
> > 
> > header = register_sysctl_table(base);
> > 
> > @r1 depends on c1 && c2 && c3@
> > expression c1.E1;
> > identifier c1.subdir, c1.sysctls;
> > @@
> > 
> > -static struct ctl_table subdir[] = {
> > -   {
> > -   .procname = E1,
> > -   .maxlen = 0,
> > -   .mode = 0555,
> > -   .child = sysctls,
> > -   },
> > -   { }
> > -};
> > 
> > @r2 depends on c1 && c2 && c3@
> > identifier c1.subdir;
> > 
> > expression c2.E2;
> > identifier c2.base;
> > @@
> > -static struct ctl_table base[] = {
> > -   {
> > -   .procname = E2,
> > -   .maxlen = 0,
> > -   .mode = 0555,
> > -   .child = subdir,
> > -   },
> > -   { }
> > -};
> > 
> > @r3 depends on c1 && c2 && c3@
> > expression c1.E1;
> > identifier c1.sysctls;
> > expression c2.E2;
> > identifier c2.base;
> > identifier c3.header;
> > @@
> > 
> > header =
> > -register_sysctl_table(base);
> > +register_sysctl_subdir(E2, E1, sysctls);
> > 
> > Generated-by: Coccinelle SmPL
> > 
> > Signed-off-by: Luis Chamberlain 
> > ---
> >  fs/ocfs2/stackglue.c | 27 ---
> >  1 file changed, 4 insertions(+), 23 deletions(-)
> > 
> > diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
> > index a191094694c6..addafced7f59 100644
> > --- a/fs/ocfs2/stackglue.c
> > +++ b/fs/ocfs2/stackglue.c
> > @@ -677,28 +677,8 @@ static struct ctl_table ocfs2_mod_table[] = {
> > },
> > { }
> >  };
> > -
> > -static struct ctl_table ocfs2_kern_table[] = {
> > -   {
> > -   .procname   = "ocfs2",
> > -   .data   = NULL,
> > -   .maxlen = 0,
> > -   .mode   = 0555,
> > -   .child  = ocfs2_mod_table
> > -   },
> > -   { }
> > -};
> > -
> > -static struct ctl_table ocfs2_root_table[] = {
> > -   {
> > -   .procname   = "fs",
> > -   .data   = NULL,
> > -   .maxlen = 0,
> > -   .mode   = 0555,
> > -   .child  = ocfs2_kern_table
> > -   },
> > -   { }
> > -};
> > +   .data   = NULL,
> > +   .data   = NULL,
> 
> The conversion script doesn't like the .data field assignments. ;)
> 
> Was this series built with allmodconfig? I would have expected this to
> blow up very badly. :)

Yikes, sense, you're right. Nope, I left the random config tests to
0day. Will fix, thanks!

  Luis


[PATCH 13/13] fs: move binfmt_misc sysctl to its own file

2020-05-29 Thread Luis Chamberlain
This moves the binfmt_misc sysctl to its own file to help remove
clutter from kernel/sysctl.c.

Signed-off-by: Luis Chamberlain 
---
 fs/binfmt_misc.c | 1 +
 kernel/sysctl.c  | 7 ---
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index f69a043f562b..656b3f5f3bbf 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -821,6 +821,7 @@ static int __init init_misc_binfmt(void)
int err = register_filesystem(_fs_type);
if (!err)
insert_binfmt(_format);
+   register_sysctl_empty_subdir("fs", "binfmt_misc");
return err;
 }
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 460532cd5ac8..7714e7b476c2 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -3042,13 +3042,6 @@ static struct ctl_table fs_table[] = {
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_TWO,
},
-#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
-   {
-   .procname   = "binfmt_misc",
-   .mode   = 0555,
-   .child  = sysctl_mount_point,
-   },
-#endif
{
.procname   = "pipe-max-size",
.data   = _max_size,
-- 
2.26.2



[PATCH 12/13] sysctl: add helper to register empty subdir

2020-05-29 Thread Luis Chamberlain
The way to create a subdirectory from the base set of directories
is a bit obscure, so provide a helper which makes this clear, and
also helps remove boiler plate code required to do this work.

Signed-off-by: Luis Chamberlain 
---
 include/linux/sysctl.h |  7 +++
 kernel/sysctl.c| 16 +---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 33a471b56345..89c92390e6de 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -208,6 +208,8 @@ extern void register_sysctl_init(const char *path, struct 
ctl_table *table,
 extern struct ctl_table_header *register_sysctl_subdir(const char *base,
   const char *subdir,
   struct ctl_table *table);
+extern void register_sysctl_empty_subdir(const char *base, const char *subdir);
+
 void do_sysctl_args(void);
 
 extern int pwrsw_enabled;
@@ -231,6 +233,11 @@ inline struct ctl_table_header 
*register_sysctl_subdir(const char *base,
return NULL;
 }
 
+static inline void register_sysctl_empty_subdir(const char *base,
+   const char *subdir)
+{
+}
+
 static inline struct ctl_table_header *register_sysctl_paths(
const struct ctl_path *path, struct ctl_table *table)
 {
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f9a35325d5d5..460532cd5ac8 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -3188,13 +3188,17 @@ struct ctl_table_header *register_sysctl_subdir(const 
char *base,
{ }
};
 
-   if (!table->procname)
+   if (table != sysctl_mount_point && !table->procname)
goto out;
 
hdr = register_sysctl_table(base_table);
if (unlikely(!hdr)) {
-   pr_err("failed when creating subdirectory sysctl %s/%s/%s\n",
-  base, subdir, table->procname);
+   if (table != sysctl_mount_point)
+   pr_err("failed when creating subdirectory sysctl 
%s/%s/%s\n",
+  base, subdir, table->procname);
+   else
+   pr_err("failed when creating empty subddirectory 
%s/%s\n",
+  base, subdir);
goto out;
}
kmemleak_not_leak(hdr);
@@ -3202,6 +3206,12 @@ struct ctl_table_header *register_sysctl_subdir(const 
char *base,
return hdr;
 }
 EXPORT_SYMBOL_GPL(register_sysctl_subdir);
+
+void register_sysctl_empty_subdir(const char *base,
+ const char *subdir)
+{
+   register_sysctl_subdir(base, subdir, sysctl_mount_point);
+}
 #endif /* CONFIG_SYSCTL */
 /*
  * No sense putting this after each symbol definition, twice,
-- 
2.26.2



[PATCH 09/13] firmware_loader: simplify sysctl declaration with register_sysctl_subdir()

2020-05-29 Thread Luis Chamberlain
From: Xiaoming Ni 

Move the firmware config sysctl table to fallback_table.c and use the
new register_sysctl_subdir() helper. This removes the clutter from
kernel/sysctl.c.

Signed-off-by: Xiaoming Ni 
Signed-off-by: Luis Chamberlain 
---
 drivers/base/firmware_loader/fallback.c   |  4 
 drivers/base/firmware_loader/fallback.h   | 11 ++
 drivers/base/firmware_loader/fallback_table.c | 22 +--
 include/linux/sysctl.h|  1 -
 kernel/sysctl.c   |  7 --
 5 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/drivers/base/firmware_loader/fallback.c 
b/drivers/base/firmware_loader/fallback.c
index d9ac7296205e..8190653ae9a3 100644
--- a/drivers/base/firmware_loader/fallback.c
+++ b/drivers/base/firmware_loader/fallback.c
@@ -200,12 +200,16 @@ static struct class firmware_class = {
 
 int register_sysfs_loader(void)
 {
+   int ret = register_firmware_config_sysctl();
+   if (ret != 0)
+   return ret;
return class_register(_class);
 }
 
 void unregister_sysfs_loader(void)
 {
class_unregister(_class);
+   unregister_firmware_config_sysctl();
 }
 
 static ssize_t firmware_loading_show(struct device *dev,
diff --git a/drivers/base/firmware_loader/fallback.h 
b/drivers/base/firmware_loader/fallback.h
index 06f4577733a8..7d2cb5f6ceb8 100644
--- a/drivers/base/firmware_loader/fallback.h
+++ b/drivers/base/firmware_loader/fallback.h
@@ -42,6 +42,17 @@ void fw_fallback_set_default_timeout(void);
 
 int register_sysfs_loader(void);
 void unregister_sysfs_loader(void);
+#ifdef CONFIG_SYSCTL
+extern int register_firmware_config_sysctl(void);
+extern void unregister_firmware_config_sysctl(void);
+#else
+static inline int register_firmware_config_sysctl(void)
+{
+   return 0;
+}
+static inline void unregister_firmware_config_sysctl(void) { }
+#endif /* CONFIG_SYSCTL */
+
 #else /* CONFIG_FW_LOADER_USER_HELPER */
 static inline int firmware_fallback_sysfs(struct firmware *fw, const char 
*name,
  struct device *device,
diff --git a/drivers/base/firmware_loader/fallback_table.c 
b/drivers/base/firmware_loader/fallback_table.c
index 46a731dede6f..4234aa5ee5df 100644
--- a/drivers/base/firmware_loader/fallback_table.c
+++ b/drivers/base/firmware_loader/fallback_table.c
@@ -24,7 +24,7 @@ struct firmware_fallback_config fw_fallback_config = {
 EXPORT_SYMBOL_NS_GPL(fw_fallback_config, FIRMWARE_LOADER_PRIVATE);
 
 #ifdef CONFIG_SYSCTL
-struct ctl_table firmware_config_table[] = {
+static struct ctl_table firmware_config_table[] = {
{
.procname   = "force_sysfs_fallback",
.data   = _fallback_config.force_sysfs_fallback,
@@ -45,4 +45,22 @@ struct ctl_table firmware_config_table[] = {
},
{ }
 };
-#endif
+
+static struct ctl_table_header *hdr;
+int register_firmware_config_sysctl(void)
+{
+   if (hdr)
+   return -EEXIST;
+   hdr = register_sysctl_subdir("kernel", "firmware_config",
+firmware_config_table);
+   if (!hdr)
+   return -ENOMEM;
+   return 0;
+}
+
+void unregister_firmware_config_sysctl(void)
+{
+   if (hdr)
+   unregister_sysctl_table(hdr);
+}
+#endif /* CONFIG_SYSCTL */
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 58bc978d4f03..aa01f54d0442 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -217,7 +217,6 @@ extern int no_unaligned_warning;
 
 extern struct ctl_table sysctl_mount_point[];
 extern struct ctl_table random_table[];
-extern struct ctl_table firmware_config_table[];
 extern struct ctl_table epoll_table[];
 
 #else /* CONFIG_SYSCTL */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 30c2d521502a..e007375c8a11 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2088,13 +2088,6 @@ static struct ctl_table kern_table[] = {
.mode   = 0555,
.child  = usermodehelper_table,
},
-#ifdef CONFIG_FW_LOADER_USER_HELPER
-   {
-   .procname   = "firmware_config",
-   .mode   = 0555,
-   .child  = firmware_config_table,
-   },
-#endif
{
.procname   = "overflowuid",
.data   = ,
-- 
2.26.2



[PATCH 11/13] random: simplify sysctl declaration with register_sysctl_subdir()

2020-05-29 Thread Luis Chamberlain
From: Xiaoming Ni 

Move random_table sysctl from kernel/sysctl.c to drivers/char/random.c
and use register_sysctl_subdir() to help remove the clutter out of
kernel/sysctl.c.

Signed-off-by: Xiaoming Ni 
Signed-off-by: Luis Chamberlain 
---
 drivers/char/random.c  | 14 --
 include/linux/sysctl.h |  1 -
 kernel/sysctl.c|  5 -
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index a7cf6aa65908..73fd4b6e9c18 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2101,8 +2101,7 @@ static int proc_do_entropy(struct ctl_table *table, int 
write,
 }
 
 static int sysctl_poolsize = INPUT_POOL_WORDS * 32;
-extern struct ctl_table random_table[];
-struct ctl_table random_table[] = {
+static struct ctl_table random_table[] = {
{
.procname   = "poolsize",
.data   = _poolsize,
@@ -2164,6 +2163,17 @@ struct ctl_table random_table[] = {
 #endif
{ }
 };
+
+/*
+ * rand_initialize() is called before sysctl_init(),
+ * so we cannot call register_sysctl_init() in rand_initialize()
+ */
+static int __init random_sysctls_init(void)
+{
+   register_sysctl_subdir("kernel", "random", random_table);
+   return 0;
+}
+device_initcall(random_sysctls_init);
 #endif /* CONFIG_SYSCTL */
 
 struct batched_entropy {
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index e5364b69dd95..33a471b56345 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -216,7 +216,6 @@ extern int unaligned_dump_stack;
 extern int no_unaligned_warning;
 
 extern struct ctl_table sysctl_mount_point[];
-extern struct ctl_table random_table[];
 
 #else /* CONFIG_SYSCTL */
 static inline struct ctl_table_header *register_sysctl_table(struct ctl_table 
* table)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 5c116904feb7..f9a35325d5d5 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2078,11 +2078,6 @@ static struct ctl_table kern_table[] = {
.mode   = 0644,
.proc_handler   = sysctl_max_threads,
},
-   {
-   .procname   = "random",
-   .mode   = 0555,
-   .child  = random_table,
-   },
{
.procname   = "usermodehelper",
.mode   = 0555,
-- 
2.26.2



[PATCH 10/13] eventpoll: simplify sysctl declaration with register_sysctl_subdir()

2020-05-29 Thread Luis Chamberlain
From: Xiaoming Ni 

Move epoll_table sysctl to fs/eventpoll.c and remove the
clutter out of kernel/sysctl.c by using register_sysctl_subdir()..

Signed-off-by: Xiaoming Ni 
Signed-off-by: Luis Chamberlain 
---
 fs/eventpoll.c | 10 +-
 include/linux/poll.h   |  2 --
 include/linux/sysctl.h |  1 -
 kernel/sysctl.c|  7 ---
 4 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 12eebcdea9c8..957ebc9700e3 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -299,7 +299,7 @@ static LIST_HEAD(tfile_check_list);
 static long long_zero;
 static long long_max = LONG_MAX;
 
-struct ctl_table epoll_table[] = {
+static struct ctl_table epoll_table[] = {
{
.procname   = "max_user_watches",
.data   = _user_watches,
@@ -311,6 +311,13 @@ struct ctl_table epoll_table[] = {
},
{ }
 };
+
+static void __init epoll_sysctls_init(void)
+{
+   register_sysctl_subdir("fs", "epoll", epoll_table);
+}
+#else
+#define epoll_sysctls_init() do { } while (0)
 #endif /* CONFIG_SYSCTL */
 
 static const struct file_operations eventpoll_fops;
@@ -2422,6 +2429,7 @@ static int __init eventpoll_init(void)
/* Allocates slab cache used to allocate "struct eppoll_entry" */
pwq_cache = kmem_cache_create("eventpoll_pwq",
sizeof(struct eppoll_entry), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL);
+   epoll_sysctls_init();
 
return 0;
 }
diff --git a/include/linux/poll.h b/include/linux/poll.h
index 1cdc32b1f1b0..a9e0e1c2d1f2 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -8,12 +8,10 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
 
-extern struct ctl_table epoll_table[]; /* for sysctl */
 /* ~832 bytes of stack space used max in sys_select/sys_poll before allocating
additional memory. */
 #ifdef __clang__
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index aa01f54d0442..e5364b69dd95 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -217,7 +217,6 @@ extern int no_unaligned_warning;
 
 extern struct ctl_table sysctl_mount_point[];
 extern struct ctl_table random_table[];
-extern struct ctl_table epoll_table[];
 
 #else /* CONFIG_SYSCTL */
 static inline struct ctl_table_header *register_sysctl_table(struct ctl_table 
* table)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index e007375c8a11..5c116904feb7 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -3001,13 +3001,6 @@ static struct ctl_table fs_table[] = {
.proc_handler   = proc_dointvec,
},
 #endif
-#ifdef CONFIG_EPOLL
-   {
-   .procname   = "epoll",
-   .mode   = 0555,
-   .child  = epoll_table,
-   },
-#endif
 #endif
{
.procname   = "protected_symlinks",
-- 
2.26.2



[PATCH 07/13] test_sysctl: use new sysctl subdir helper register_sysctl_subdir()

2020-05-29 Thread Luis Chamberlain
This simplifies the code considerably. The following coccinelle
SmPL grammar rule was used to transform this code.

// pycocci sysctl-subdir.cocci lib/test_sysctl.c

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
-   {
-   .procname = E1,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = sysctls,
-   },
-   { }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
-   {
-   .procname = E2,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = subdir,
-   },
-   { }
-};

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
@@

header =
-register_sysctl_table(base);
+register_sysctl_subdir(E2, E1, sysctls);

Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain 
---
 lib/test_sysctl.c | 23 ++-
 1 file changed, 2 insertions(+), 21 deletions(-)

diff --git a/lib/test_sysctl.c b/lib/test_sysctl.c
index 84eaae22d3a6..b17581307756 100644
--- a/lib/test_sysctl.c
+++ b/lib/test_sysctl.c
@@ -128,26 +128,6 @@ static struct ctl_table test_table[] = {
{ }
 };
 
-static struct ctl_table test_sysctl_table[] = {
-   {
-   .procname   = "test_sysctl",
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = test_table,
-   },
-   { }
-};
-
-static struct ctl_table test_sysctl_root_table[] = {
-   {
-   .procname   = "debug",
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = test_sysctl_table,
-   },
-   { }
-};
-
 static struct ctl_table_header *test_sysctl_header;
 
 static int __init test_sysctl_init(void)
@@ -155,7 +135,8 @@ static int __init test_sysctl_init(void)
test_data.bitmap_0001 = kzalloc(SYSCTL_TEST_BITMAP_SIZE/8, GFP_KERNEL);
if (!test_data.bitmap_0001)
return -ENOMEM;
-   test_sysctl_header = register_sysctl_table(test_sysctl_root_table);
+   test_sysctl_header = register_sysctl_subdir("debug", "test_sysctl",
+   test_table);
if (!test_sysctl_header) {
kfree(test_data.bitmap_0001);
return -ENOMEM;
-- 
2.26.2



[PATCH 08/13] inotify: simplify sysctl declaration with register_sysctl_subdir()

2020-05-29 Thread Luis Chamberlain
From: Xiaoming Ni 

move inotify_user sysctl to inotify_user.c and use the new
register_sysctl_subdir() helper.

Signed-off-by: Xiaoming Ni 
Signed-off-by: Luis Chamberlain 
---
 fs/notify/inotify/inotify_user.c | 11 ++-
 include/linux/inotify.h  |  3 ---
 kernel/sysctl.c  | 11 ---
 3 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index f88bbcc9efeb..64859fbf8463 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -46,7 +46,7 @@ struct kmem_cache *inotify_inode_mark_cachep __read_mostly;
 
 #include 
 
-struct ctl_table inotify_table[] = {
+static struct ctl_table inotify_table[] = {
{
.procname   = "max_user_instances",
.data   = 
_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES],
@@ -73,6 +73,14 @@ struct ctl_table inotify_table[] = {
},
{ }
 };
+
+static void __init inotify_sysctls_init(void)
+{
+   register_sysctl_subdir("fs", "inotify", inotify_table);
+}
+
+#else
+#define inotify_sysctls_init() do { } while (0)
 #endif /* CONFIG_SYSCTL */
 
 static inline __u32 inotify_arg_to_mask(u32 arg)
@@ -826,6 +834,7 @@ static int __init inotify_user_setup(void)
inotify_max_queued_events = 16384;
init_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES] = 128;
init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = 8192;
+   inotify_sysctls_init();
 
return 0;
 }
diff --git a/include/linux/inotify.h b/include/linux/inotify.h
index 6a24905f6e1e..8d20caa1b268 100644
--- a/include/linux/inotify.h
+++ b/include/linux/inotify.h
@@ -7,11 +7,8 @@
 #ifndef _LINUX_INOTIFY_H
 #define _LINUX_INOTIFY_H
 
-#include 
 #include 
 
-extern struct ctl_table inotify_table[]; /* for sysctl */
-
 #define ALL_INOTIFY_BITS (IN_ACCESS | IN_MODIFY | IN_ATTRIB | IN_CLOSE_WRITE | 
\
  IN_CLOSE_NOWRITE | IN_OPEN | IN_MOVED_FROM | \
  IN_MOVED_TO | IN_CREATE | IN_DELETE | \
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 04ff032f2863..30c2d521502a 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -123,10 +123,6 @@ static const int maxolduid = 65535;
 static int ngroups_max = NGROUPS_MAX;
 static const int cap_last_cap = CAP_LAST_CAP;
 
-#ifdef CONFIG_INOTIFY_USER
-#include 
-#endif
-
 #ifdef CONFIG_PROC_SYSCTL
 
 /**
@@ -3012,13 +3008,6 @@ static struct ctl_table fs_table[] = {
.proc_handler   = proc_dointvec,
},
 #endif
-#ifdef CONFIG_INOTIFY_USER
-   {
-   .procname   = "inotify",
-   .mode   = 0555,
-   .child  = inotify_table,
-   },
-#endif 
 #ifdef CONFIG_EPOLL
{
.procname   = "epoll",
-- 
2.26.2



  1   2   >