ppc64le and 32-bit LE userland compatibility
Hey all, a couple of us over in #talos-workstation on freenode have been working on an effort to bring up a Linux PowerPC userland that runs in 32-bit little-endian mode, aka ppcle. As far as we can tell, no ABI has ever been designated for this (unless you count the patchset from a decade ago [1]), so it's pretty much uncharted territory as far as Linux is concerned. We want to sync up with libc and the relevant kernel folks to establish the best path forward.

The practical application that drove these early developments (as you might expect) is x86 emulation. The box86 project [2] implements a translation layer for ia32 library calls to native architecture ones; this way, emulation overhead is significantly reduced by relying on native libraries where possible (libc, libGL, etc.) instead of emulating an entire x86 userspace. box86 is primarily targeted at ARM, but it can be adapted to other architectures, so long as they match ia32's 32-bit, little-endian nature. Hence the need for a ppcle userland; modern POWER brought ppc64le as a supported configuration, but without a 32-bit equivalent there is no option for a 32/64 multilib environment, as seen with ppc/ppc64 and arm/aarch64.

Surprisingly, beyond minor patching of gcc to get crosscompile going, bootstrapping the initial userland was not much of a problem. The work has been done on top of the Void Linux PowerPC project [3], and much of that is now present in its source package tree [4].

The first issue with running the userland came from the ppc32 signal handler forcing BE in the MSR, causing any 32LE process receiving a signal (such as a shell receiving SIGCHLD) to terminate with SIGILL. This was trivially patched, along with enabling the 32-bit vDSO on ppc64le kernels [5]. (Given that this behavior has been in place since 2006, I don't think anyone has been using the kernel in this state to run ppcle userlands.)

The next problem concerns the ABI more directly.
The failure mode was `file` surfacing EINVAL from pread64 when invoked on an ELF; pread64 was passed a garbage value for `pos`, which didn't appear to be caused by anything in `file`. Initially it seemed as though the 32-bit components of the arg were getting swapped, and we made hacky fixes to glibc and musl to put them in the "right order"; however, we weren't sure if that was the correct approach, or if there were knock-on effects we didn't know about. So we found the relevant compat code path in the kernel, at arch/powerpc/kernel/sys_ppc32.c, where there exists this comment:

> /*
>  * long long munging:
>  * The 32 bit ABI passes long longs in an odd even register pair.
>  */

It seems that the opposite is true in LE mode, and something is expecting long longs to start on an even register. I realized this after I tried swapping hi/lo `u32`s here and didn't see an improvement. I whipped up a patch [6] that switches which syscalls use padding arguments depending on endianness, while hopefully remaining tidy enough to be unobtrusive. (I took some liberties with variable names/types so that the macro could be consistent.)

This was enough to fix up the `file` bug. I'm no seasoned kernel hacker, though, and there is still concern over the right way to approach this, whether it should live in the kernel or libc, etc. Frankly, I don't know the ABI structure enough to understand why the register padding has to be different in this case, or what lower-level component is responsible for it.

For comparison, I had a look at the mips tree, since it's bi-endian and has a similar 32/64 situation. There is a macro conditional upon endianness that is responsible for munging long longs; it uses __MIPSEB__ and __MIPSEL__ instead of an if/else on the generic __LITTLE_ENDIAN__. Not sure what to make of that. (It also simply swaps registers for LE, unlike what I did for ppc.)
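To make the padding question concrete, here is a toy userspace model (not kernel code) of where a 64-bit argument lands among the argument registers r3..r10. The helper names are made up for illustration. The odd-register rule is what the quoted comment states for BE; the even-register rule for LE is only the hypothesis described above, not an established ABI fact.

```c
#include <assert.h>

/*
 * Toy model: given the first free argument register (r3..r10) and whether a
 * 64-bit argument must start on an odd-numbered register, return the
 * register where the pair starts. A skipped register is the "padding" slot.
 */
static int pair_start_reg(int first_free, int must_start_odd)
{
	assert(first_free >= 3 && first_free <= 10);
	int is_odd = first_free & 1;

	if (is_odd == must_start_odd)
		return first_free;	/* already aligned, no padding needed */
	return first_free + 1;		/* skip one register as padding */
}

/* Number of padding registers inserted before the pair (0 or 1). */
static int padding_regs(int first_free, int must_start_odd)
{
	return pair_start_reg(first_free, must_start_odd) - first_free;
}
```

For pread64(fd, buf, count, pos) the first three arguments occupy r3..r5, so the first free register is r6. Under the odd rule the model skips r6 and starts `pos` at r7; under the even rule `pos` starts at r6 with no padding, which would explain why a wrapper that always expects a padding slot reads garbage.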
Also worth noting is the one other outstanding bug, where the time-related syscalls in the 32-bit vDSO seem to return garbage. It doesn't look like an endian bug to me, and it doesn't affect standard syscalls (which is why if you run `date` on musl it prints the correct time, unlike on glibc). The vDSO time functions are implemented in ppc asm (arch/powerpc/kernel/vdso32/gettimeofday.S), and I've never touched the stuff, so if anyone has a clue I'm all ears.

Again, I'd appreciate feedback on the approach to take here, in order to touch/special-case only the minimum necessary, while keeping the kernel/libc folks happy.

Cheers,
Will [she/her]

(p.s. there is ancillary interest in a ppcle-native kernel as well; that's a good deal more work and not the focus of this message at all, but it is a topic of interest)

[1]: https://lwn.net/Articles/408845/
[2]: https://github.com/ptitSeb/box86
[3]: https://voidlinux-ppc.org/
[4]: https://github.com/void-ppc/void-packages
[5]: https://gist.github.com/eerykitty/01707dc6bca2be32b4c5e30d15d15dcf
[6]: https://gist.github.com/Skirmisher/02891c1a8cafa0ff18b2460933ef4f3c
Re: [PATCH v3 5/7] powerpc/pmem/of_pmem: Update of_pmem to use the new barrier instruction.
Hi "Aneesh,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.7-rc7]
[cannot apply to next-20200529]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Aneesh-Kumar-K-V/powerpc-pmem-Restrict-papr_scm-to-P8-and-above/20200519-195350
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-randconfig-r016-20200529 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 2d068e534f1671459e1b135852c1b3c10502e929)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc cross compiling tool for clang build
        # apt-get install binutils-powerpc-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot

All errors (new ones prefixed by >>, old ones prefixed by <<):

   In file included from arch/powerpc/kernel/syscall_64.c:4:
   In file included from arch/powerpc/include/asm/asm-prototypes.h:12:
>> arch/powerpc/include/asm/cacheflush.h:126:6: error: implicit declaration of function 'cpu_has_feature' [-Werror,-Wimplicit-function-declaration]
   if (cpu_has_feature(CPU_FTR_ARCH_207S))
   ^
   arch/powerpc/include/asm/cacheflush.h:126:6: note: did you mean 'mmu_has_feature'?
   arch/powerpc/include/asm/mmu.h:234:20: note: 'mmu_has_feature' declared here
   static inline bool mmu_has_feature(unsigned long feature)
   ^
   In file included from arch/powerpc/kernel/syscall_64.c:4:
   In file included from arch/powerpc/include/asm/asm-prototypes.h:17:
   In file included from arch/powerpc/include/asm/mmu_context.h:12:
   In file included from arch/powerpc/include/asm/cputhreads.h:7:
>> arch/powerpc/include/asm/cpu_has_feature.h:49:20: error: static declaration of 'cpu_has_feature' follows non-static declaration
   static inline bool cpu_has_feature(unsigned long feature)
   ^
   arch/powerpc/include/asm/cacheflush.h:126:6: note: previous implicit declaration is here
   if (cpu_has_feature(CPU_FTR_ARCH_207S))
   ^
   2 errors generated.
--
   In file included from arch/powerpc/kernel/kprobes.c:23:
>> arch/powerpc/include/asm/cacheflush.h:126:6: error: implicit declaration of function 'cpu_has_feature' [-Werror,-Wimplicit-function-declaration]
   if (cpu_has_feature(CPU_FTR_ARCH_207S))
   ^
   arch/powerpc/include/asm/cacheflush.h:126:6: note: did you mean 'mmu_has_feature'?
   arch/powerpc/include/asm/mmu.h:234:20: note: 'mmu_has_feature' declared here
   static inline bool mmu_has_feature(unsigned long feature)
   ^
   1 error generated.
--
   In file included from arch/powerpc/kernel/optprobes.c:15:
>> arch/powerpc/include/asm/cacheflush.h:126:6: error: implicit declaration of function 'cpu_has_feature' [-Werror,-Wimplicit-function-declaration]
   if (cpu_has_feature(CPU_FTR_ARCH_207S))
   ^
   arch/powerpc/include/asm/cacheflush.h:126:6: note: did you mean 'mmu_has_feature'?
   arch/powerpc/include/asm/mmu.h:234:20: note: 'mmu_has_feature' declared here
   static inline bool mmu_has_feature(unsigned long feature)
   ^
   arch/powerpc/kernel/optprobes.c:149:6: warning: no previous prototype for function 'patch_imm32_load_insns' [-Wmissing-prototypes]
   void patch_imm32_load_insns(unsigned int val, kprobe_opcode_t *addr)
   ^
   arch/powerpc/kernel/optprobes.c:149:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void patch_imm32_load_insns(unsigned int val, kprobe_opcode_t *addr)
   ^
   static
   arch/powerpc/kernel/optprobes.c:167:6: warning: no previous prototype for function 'patch_imm64_load_insns' [-Wmissing-prototypes]
   void patch_imm64_load_insns(unsigned long val, int reg, kprobe_opcode_t *addr)
   ^
   arch/powerpc/kernel/optprobes.c:167:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void patch_imm64_load_insns(unsigned long val, int reg, kprobe_opcode_t *addr)
   ^
   static
   2 warnings and 1 error generated.
--
   In file included from arch/powerpc/kernel/epapr_paravirt.c:11:
>> arch/powerpc/include/asm/cacheflush.h:126:6: error: implicit declaration of function 'cpu_has_feature' [-Werror,-Wimplicit-function-declaration]
   if (cpu_has_feature(CPU_FTR_ARCH_207S))
   ^
   arch/powerpc/include/asm/cacheflush.h:126:6: note: did you mean 'mmu_has_feature'?
   arch/powerpc/include/asm/mmu.h:234:20: note: 'mmu_has_feature' declared here
   static inline bool mmu_has_feature(unsigned long feature)
   ^
   In file included from arch/powerpc/kernel/epapr_parav
Re: [PATCH v3 3/7] powerpc/pmem: Add flush routines using new pmem store and sync instruction
Hi "Aneesh,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.7-rc7 next-20200529]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Aneesh-Kumar-K-V/powerpc-pmem-Restrict-papr_scm-to-P8-and-above/20200519-195350
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-randconfig-r016-20200529 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 2d068e534f1671459e1b135852c1b3c10502e929)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc cross compiling tool for clang build
        # apt-get install binutils-powerpc-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> arch/powerpc/lib/pmem.c:44:6: error: implicit declaration of function 'cpu_has_feature' [-Werror,-Wimplicit-function-declaration]
   if (cpu_has_feature(CPU_FTR_ARCH_207S))
   ^
   arch/powerpc/lib/pmem.c:44:6: note: did you mean 'mmu_has_feature'?
   arch/powerpc/include/asm/mmu.h:234:20: note: 'mmu_has_feature' declared here
   static inline bool mmu_has_feature(unsigned long feature)
   ^
   arch/powerpc/lib/pmem.c:50:6: error: implicit declaration of function 'cpu_has_feature' [-Werror,-Wimplicit-function-declaration]
   if (cpu_has_feature(CPU_FTR_ARCH_207S))
   ^
   arch/powerpc/lib/pmem.c:57:6: warning: no previous prototype for function 'arch_wb_cache_pmem' [-Wmissing-prototypes]
   void arch_wb_cache_pmem(void *addr, size_t size)
   ^
   arch/powerpc/lib/pmem.c:57:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void arch_wb_cache_pmem(void *addr, size_t size)
   ^
   static
   arch/powerpc/lib/pmem.c:64:6: warning: no previous prototype for function 'arch_invalidate_pmem' [-Wmissing-prototypes]
   void arch_invalidate_pmem(void *addr, size_t size)
   ^
   arch/powerpc/lib/pmem.c:64:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void arch_invalidate_pmem(void *addr, size_t size)
   ^
   static
   2 warnings and 2 errors generated.

vim +/cpu_has_feature +44 arch/powerpc/lib/pmem.c

    41
    42	static inline void clean_pmem_range(unsigned long start, unsigned long stop)
    43	{
  > 44		if (cpu_has_feature(CPU_FTR_ARCH_207S))
    45			return __clean_pmem_range(start, stop);
    46	}
    47

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org

.config.gz
Description: application/gzip
Re: [PATCH net] drivers/net/ibmvnic: Update VNIC protocol version reporting
From: Thomas Falcon
Date: Thu, 28 May 2020 11:19:17 -0500

> VNIC protocol version is reported in big-endian format, but it
> is not byteswapped before logging. Fix that, and remove version
> comparison as only one protocol version exists at this time.
>
> Signed-off-by: Thomas Falcon

Applied, thanks.
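For readers unfamiliar with the class of bug being fixed, a minimal userspace sketch of decoding a big-endian field before printing it follows. In the kernel this job is done by be32_to_cpu(); the helper names below are hypothetical and are not taken from the ibmvnic driver.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Decode a 32-bit big-endian value from raw bytes, on any host endianness. */
static uint32_t be32_decode(const unsigned char b[4])
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Hypothetical logging helper: decode first, then format. */
static int format_version(char *out, size_t len, const unsigned char wire[4])
{
	assert(out != NULL && len > 0);
	return snprintf(out, len, "vNIC protocol version %u",
			(unsigned int)be32_decode(wire));
}
```

Printing the raw field on a little-endian host would instead show the bytes in reversed significance, which is the symptom the commit message describes.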
Re: [RFC][PATCH v3 1/5] sparc64: Fix asm/percpu.h build error
From: Peter Zijlstra
Date: Fri, 29 May 2020 23:35:51 +0200

> ../arch/sparc/include/asm/percpu_64.h:7:24: warning: call-clobbered register used for global register variable
> register unsigned long __local_per_cpu_offset asm("g5");

The "-ffixed-g5" option on the command line tells gcc that we are using 'g5' as a fixed register, so some part of your build isn't using the:

	KBUILD_CFLAGS += -ffixed-g4 -ffixed-g5 -fcall-used-g7 -Wno-sign-compare

from arch/sparc/Makefile for some reason.
[powerpc:next-test 160/198] arch/powerpc/platforms/powernv/pci-ioda.c:1889:13: warning: function 'pnv_ioda_setup_bus_dma' is not needed and will not be emitted
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
head:   e376ca093587eafd840bb0f9df04090e2a54249c
commit: dc3d8f85bb571c3640ebba24b82a527cf2cb3f24 [160/198] powerpc/powernv/pci: Re-work bus PE configuration
config: powerpc64-randconfig-r024-20200529 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 2d068e534f1671459e1b135852c1b3c10502e929)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc64 cross compiling tool for clang build
        # apt-get install binutils-powerpc64-linux-gnu
        git checkout dc3d8f85bb571c3640ebba24b82a527cf2cb3f24
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot

All warnings (new ones prefixed by >>, old ones prefixed by <<):

>> arch/powerpc/platforms/powernv/pci-ioda.c:1889:13: warning: function 'pnv_ioda_setup_bus_dma' is not needed and will not be emitted [-Wunneeded-internal-declaration]
   static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus *bus)
   ^
   1 warning generated.

vim +/pnv_ioda_setup_bus_dma +1889 arch/powerpc/platforms/powernv/pci-ioda.c

fe7e85c6f5ff63 Gavin Shan             2014-09-30  1888
5eada8a3f087df Alexey Kardashevskiy   2018-12-19 @1889  static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus *bus)
74251fe21bfa93 Benjamin Herrenschmidt 2013-07-01  1890  {
74251fe21bfa93 Benjamin Herrenschmidt 2013-07-01  1891  	struct pci_dev *dev;
74251fe21bfa93 Benjamin Herrenschmidt 2013-07-01  1892
74251fe21bfa93 Benjamin Herrenschmidt 2013-07-01  1893  	list_for_each_entry(dev, &bus->devices, bus_list) {
b348aa65297659 Alexey Kardashevskiy   2015-06-05  1894  		set_iommu_table_base(&dev->dev, pe->table_group.tables[0]);
0617fc0ca412b5 Christoph Hellwig      2019-02-13  1895  		dev->dev.archdata.dma_offset = pe->tce_bypass_base;
dff4a39e880062 Gavin Shan             2014-07-15  1896
5c89a87d13d168 Alexey Kardashevskiy   2015-06-18  1897  		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
5eada8a3f087df Alexey Kardashevskiy   2018-12-19  1898  			pnv_ioda_setup_bus_dma(pe, dev->subordinate);
74251fe21bfa93 Benjamin Herrenschmidt 2013-07-01  1899  	}
74251fe21bfa93 Benjamin Herrenschmidt 2013-07-01  1900  }
74251fe21bfa93 Benjamin Herrenschmidt 2013-07-01  1901

:: The code at line 1889 was first introduced by commit
:: 5eada8a3f087df74af1c2797770a3e2249374fe1 powerpc/iommu_api: Move IOMMU groups setup to a single place

:: TO: Alexey Kardashevskiy
:: CC: Michael Ellerman

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org

.config.gz
Description: application/gzip
[PATCH v9 2/5] seq_buf: Export seq_buf_printf
'seq_buf' provides a very useful abstraction for writing to a string buffer without needing to worry about it overflowing. However, even though the API has been stable for a couple of years now, it's still not exported to loadable kernel modules, limiting its usage.

Hence this patch proposes an update to 'seq_buf.c' to mark seq_buf_printf(), which is part of the seq_buf API, to be exported to loadable GPL kernel modules. This symbol will be used in later parts of this patch-set to simplify content creation for a sysfs attribute.

Cc: Piotr Maziarz
Cc: Cezary Rojewski
Cc: Christoph Hellwig
Cc: Steven Rostedt
Cc: Borislav Petkov
Signed-off-by: Vaibhav Jain
---
Changelog:

v8..v9:
* None

v7..v8:
* Updated the patch title [ Christoph Hellwig ]
* Updated patch description to replace confusing term 'external kernel
  modules' with 'kernel loadable modules'.

Resend:
* Added ack from Steven Rostedt

v6..v7:
* New patch in the series
---
 lib/seq_buf.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/seq_buf.c b/lib/seq_buf.c
index 4e865d42ab03..707453f5d58e 100644
--- a/lib/seq_buf.c
+++ b/lib/seq_buf.c
@@ -91,6 +91,7 @@ int seq_buf_printf(struct seq_buf *s, const char *fmt, ...)
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(seq_buf_printf);
 
 #ifdef CONFIG_BINARY_PRINTF
 /**
-- 
2.26.2
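For anyone who has not used seq_buf, the idea of "printf into a buffer that cannot overflow" can be sketched in userspace roughly as below. This is a toy stand-in, not the kernel implementation: every toy_* name is invented here, and the overflow handling only loosely mirrors what the kernel does.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's struct seq_buf. */
struct toy_seq_buf {
	char *buffer;
	size_t size;	/* total capacity in bytes */
	size_t len;	/* bytes used so far */
};

static void toy_seq_buf_init(struct toy_seq_buf *s, char *buf, size_t size)
{
	assert(buf != NULL && size > 0);
	s->buffer = buf;
	s->size = size;
	s->len = 0;
	buf[0] = '\0';
}

/*
 * Append formatted text. Returns 0 on success and -1 if the text did not
 * fit; on overflow the buffer is marked full instead of being overrun.
 */
static int toy_seq_buf_printf(struct toy_seq_buf *s, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (s->len >= s->size)
		return -1;

	va_start(ap, fmt);
	n = vsnprintf(s->buffer + s->len, s->size - s->len, fmt, ap);
	va_end(ap);

	if (n < 0 || (size_t)n >= s->size - s->len) {
		s->len = s->size;	/* overflowed: refuse further writes */
		return -1;
	}
	s->len += (size_t)n;
	return 0;
}
```

A caller can chain several appends (for example one per flag name) and check for overflow once at the end, which is the convenience the commit message refers to.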
[PATCH v9 5/5] powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
This patch implements support for PDSM request 'PAPR_PDSM_HEALTH' that returns a newly introduced 'struct nd_papr_pdsm_health' instance containing dimm health information back to user space in response to ND_CMD_CALL. This functionality is implemented in the newly introduced papr_pdsm_health(), which queries the nvdimm health information and then copies it to the package payload, whose layout is defined by 'struct nd_papr_pdsm_health'.

The patch also introduces a new member 'struct papr_scm_priv.health' that is an instance of 'struct nd_papr_pdsm_health' to cache the health information of an nvdimm. As a result, functions drc_pmem_query_health() and flags_show() are updated to populate and use this new struct instead of the u64 integer that was used earlier.

Cc: "Aneesh Kumar K . V"
Cc: Dan Williams
Cc: Michael Ellerman
Cc: Ira Weiny
Signed-off-by: Vaibhav Jain
---
Changelog:

v8..v9:
* s/PAPR_SCM_PDSM_HEALTH/PAPR_PDSM_HEALTH/g [ Dan , Aneesh ]
* s/PAPR_SCM_PSDM_DIMM_*/PAPR_PDSM_DIMM_*/g
* Renamed papr_scm_get_health() to papr_pdsm_health()
* Updated patch description to replace papr-scm dimm with nvdimm.

v7..v8:
* None

Resend:
* None

v6..v7:
* Updated flags_show() to use seq_buf_printf(). [Mpe]
* Updated papr_scm_get_health() to use newly introduced
  __drc_pmem_query_health() bypassing the cache [Mpe].

v5..v6:
* Added attribute '__packed' to 'struct nd_papr_pdsm_health_v1' to guard
  against the possibility of different compilers adding different paddings
  to the struct [ Dan Williams ]
* Updated 'struct nd_papr_pdsm_health_v1' to use __u8 instead of 'bool' and
  also updated drc_pmem_query_health() to take this into account.
  [ Dan Williams ]

v4..v5:
* None

v3..v4:
* Call the DSM_PAPR_SCM_HEALTH service function from
  papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh]

v2..v3:
* Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx' types
  as it's exported to userspace [Aneesh]
* Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm health from
  enum to #defines [Aneesh]

v1..v2:
* New patch in the series
---
 arch/powerpc/include/uapi/asm/papr_pdsm.h |  39 +++
 arch/powerpc/platforms/pseries/papr_scm.c | 125 +++---
 2 files changed, 147 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
index 6407fefcc007..411725a91591 100644
--- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
+++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
@@ -115,6 +115,7 @@ struct nd_pdsm_cmd_pkg {
  */
 enum papr_pdsm {
 	PAPR_PDSM_MIN = 0x0,
+	PAPR_PDSM_HEALTH,
 	PAPR_PDSM_MAX,
 };
 
@@ -133,4 +134,42 @@ static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
 	return (void *)(pcmd->payload);
 }
 
+/* Various nvdimm health indicators */
+#define PAPR_PDSM_DIMM_HEALTHY       0
+#define PAPR_PDSM_DIMM_UNHEALTHY     1
+#define PAPR_PDSM_DIMM_CRITICAL      2
+#define PAPR_PDSM_DIMM_FATAL         3
+
+/*
+ * Struct exchanged between kernel & ndctl in for PAPR_PDSM_HEALTH
+ * Various flags indicate the health status of the dimm.
+ *
+ * dimm_unarmed		: Dimm not armed. So contents wont persist.
+ * dimm_bad_shutdown	: Previous shutdown did not persist contents.
+ * dimm_bad_restore	: Contents from previous shutdown werent restored.
+ * dimm_scrubbed	: Contents of the dimm have been scrubbed.
+ * dimm_locked		: Contents of the dimm cant be modified until CEC reboot
+ * dimm_encrypted	: Contents of dimm are encrypted.
+ * dimm_health		: Dimm health indicator. One of PAPR_PDSM_DIMM_
+ */
+struct nd_papr_pdsm_health_v1 {
+	__u8 dimm_unarmed;
+	__u8 dimm_bad_shutdown;
+	__u8 dimm_bad_restore;
+	__u8 dimm_scrubbed;
+	__u8 dimm_locked;
+	__u8 dimm_encrypted;
+	__u16 dimm_health;
+} __packed;
+
+/*
+ * Typedef the current struct for dimm_health so that any application
+ * or kernel recompiled after introducing a new version automatically
+ * supports the new version.
+ */
+#define nd_papr_pdsm_health nd_papr_pdsm_health_v1
+
+/* Current version number for the dimm health struct */
+#define ND_PAPR_PDSM_HEALTH_VERSION 1
+
 #endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 5e2237e7ec08..c0606c0c659c 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -88,7 +88,7 @@ struct papr_scm_priv {
 	unsigned long lasthealth_jiffies;
 
 	/* Health information for the dimm */
-	u64 health_bitmap;
+	struct nd_papr_pdsm_health health;
 };
 
 static int drc_pmem_bind(struct papr_scm_priv *p)
@@ -201,6 +201,7 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
 static int __drc_pmem_query_health(struct papr_scm_priv *p)
 {
 	unsigned long ret[PLPAR_HCALL_BUFSIZE];
+	u64 health;
 	long rc;
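As a quick sanity check of the payload layout from the userspace side, the struct above can be mirrored with <stdint.h> types as in the sketch below. The toy_* names are invented for this example; only the field order, the packed attribute and the four health values come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Health values as defined in the patch. */
#define TOY_DIMM_HEALTHY	0
#define TOY_DIMM_UNHEALTHY	1
#define TOY_DIMM_CRITICAL	2
#define TOY_DIMM_FATAL		3

/* Userspace mirror of struct nd_papr_pdsm_health_v1 (hypothetical name). */
struct toy_pdsm_health_v1 {
	uint8_t dimm_unarmed;
	uint8_t dimm_bad_shutdown;
	uint8_t dimm_bad_restore;
	uint8_t dimm_scrubbed;
	uint8_t dimm_locked;
	uint8_t dimm_encrypted;
	uint16_t dimm_health;
} __attribute__((packed));

/* Six one-byte flags followed by a 16-bit indicator: 8 bytes, no padding. */
static_assert(sizeof(struct toy_pdsm_health_v1) == 8, "unexpected size");
static_assert(offsetof(struct toy_pdsm_health_v1, dimm_health) == 6,
	      "unexpected offset");

/* Map the health indicator to a printable name (illustrative only). */
static const char *toy_health_name(uint16_t health)
{
	switch (health) {
	case TOY_DIMM_HEALTHY:		return "healthy";
	case TOY_DIMM_UNHEALTHY:	return "unhealthy";
	case TOY_DIMM_CRITICAL:		return "critical";
	case TOY_DIMM_FATAL:		return "fatal";
	default:			return "unknown";
	}
}
```

Using fixed-width one-byte fields instead of 'bool' keeps the layout the same regardless of how a given compiler sizes 'bool', which is the concern recorded in the v5..v6 changelog.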
[PATCH v9 4/5] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods
Introduce support for PAPR NVDIMM Specific Methods (PDSM) in the papr_scm module and add the command family NVDIMM_FAMILY_PAPR to the white list of NVDIMM command sets. Also advertise support for ND_CMD_CALL for the nvdimm command mask and implement the necessary scaffolding in the module to handle the ND_CMD_CALL ioctl and the PDSM requests that we receive.

The layout of the PDSM request as we expect it from libnvdimm/libndctl is described in the newly introduced uapi header 'papr_pdsm.h', which defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used to communicate the PDSM request via member 'nd_cmd_pkg.nd_command' and the size of the payload that needs to be sent/received for servicing the PDSM.

A new function is_cmd_valid() is implemented that reads the args to papr_scm_ndctl() and performs sanity tests on them. A new function papr_scm_service_pdsm() is introduced and is called from papr_scm_ndctl() when a PDSM request is received via the ND_CMD_CALL command from libnvdimm.

Cc: "Aneesh Kumar K . V"
Cc: Dan Williams
Cc: Michael Ellerman
Cc: Ira Weiny
Signed-off-by: Vaibhav Jain
---
Changelog:

v8..v9:
* Reduced the usage of term SCM replacing it with appropriate
  replacement [ Dan Williams, Aneesh ]
* Renamed 'papr_scm_pdsm.h' to 'papr_pdsm.h'
* s/PAPR_SCM_PDSM_*/PAPR_PDSM_*/g
* s/NVDIMM_FAMILY_PAPR_SCM/NVDIMM_FAMILY_PAPR/g
* Minor updates to 'papr_pdsm.h' to replace usage of term 'SCM'.
* Minor update to patch description.

v7..v8:
* Removed the 'payload_offset' field from 'struct nd_pdsm_cmd_pkg'.
  Instead command payload is always assumed to start at
  'nd_pdsm_cmd_pkg.payload'. [ Aneesh ]
* To enable introducing new fields to 'struct nd_pdsm_cmd_pkg',
  'reserved' field of 10-bytes is introduced. [ Aneesh ]
* Fixed a typo in "Backward Compatibility" section of papr_scm_pdsm.h
  [ Ira ]

Resend:
* None

v6..v7 :
* Removed the re-definitions of __packed macro from papr_scm_pdsm.h
  [Mpe].
* Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe].
* Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h
  [Mpe].
* Made functions defined in papr_scm_pdsm.h as static inline. [Mpe]

v5..v6 :
* Changed the usage of the term DSM to PDSM to distinguish it from the
  ACPI term [ Dan Williams ]
* Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct
  to reflect the new terminology.
* Updated the patch description and title to reflect the new terminology.
* Squashed patch to introduce new command family in 'ndctl.h' with
  this patch [ Dan Williams ]
* Updated the papr_scm_pdsm method starting index from 0x1 to 0x0
  [ Dan Williams ]
* Removed redundant license text from the papr_scm_pdsm.h file.
  [ Dan Williams ]
* s/envelop/envelope/ at various places [ Dan Williams ]
* Added '__packed' attribute to command package header to guard
  against different compilers adding padding between the fields.
  [ Dan Williams]
* Converted various pr_debug to dev_debug [ Dan Williams ]

v4..v5 :
* None

v3..v4 :
* None

v2..v3 :
* Updated the patch prefix to 'ndctl/uapi' [Aneesh]

v1..v2 :
* None
---
 arch/powerpc/include/uapi/asm/papr_pdsm.h | 136 ++
 arch/powerpc/platforms/pseries/papr_scm.c | 101 +++-
 include/uapi/linux/ndctl.h                |   1 +
 3 files changed, 232 insertions(+), 6 deletions(-)
 create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h

diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
new file mode 100644
index ..6407fefcc007
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
@@ -0,0 +1,136 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/*
+ * PAPR nvDimm Specific Methods (PDSM) and structs for libndctl
+ *
+ * (C) Copyright IBM 2020
+ *
+ * Author: Vaibhav Jain
+ */
+
+#ifndef _UAPI_ASM_POWERPC_PAPR_PDSM_H_
+#define _UAPI_ASM_POWERPC_PAPR_PDSM_H_
+
+#include
+
+/*
+ * PDSM Envelope:
+ *
+ * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
+ * envelope which consists of a header and user-defined payload sections.
+ * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
+ * payload following it and accessible via 'nd_pdsm_cmd_pkg.payload' field.
+ * There is reserved field that can used to introduce new fields to the
+ * structure in future. It also tries to ensure that 'nd_pdsm_cmd_pkg.payload'
+ * lies at a 8-byte boundary.
+ *
+ *  +-------------+---------------------+---------------------------+
+ *  |   64-Bytes  |       16-Bytes      |       Max 176-Bytes       |
+ *  +-------------+---------------------+---------------------------+
+ *  |               nd_pdsm_cmd_pkg     |                           |
+ *  |-------------+                     |                           |
+ *  |  nd_cmd_pkg |                     |                           |
+ *  +-------------+---------------------+---------------------------+
+ *  | nd_family   |                     |                           |
+ *  |
[PATCH v9 3/5] powerpc/papr_scm: Fetch nvdimm health information from PHYP
Implement support for fetching nvdimm health information via the H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair of 64-bit bitmaps, the bitwise AND of which is then stored in 'struct papr_scm_priv' and subsequently partially exposed to user-space via the newly introduced dimm-specific attribute 'papr/flags'. Since the hcall is costly, the health information is cached and only re-queried 60s after the previous successful hcall.

The patch also adds documentation text describing the flags reported by the new sysfs attribute 'papr/flags' at Documentation/ABI/testing/sysfs-bus-papr-pmem.

[1] commit 58b278f568f0 ("powerpc: Provide initial documentation for PAPR hcalls")

Cc: "Aneesh Kumar K . V"
Cc: Dan Williams
Cc: Michael Ellerman
Cc: Ira Weiny
Signed-off-by: Vaibhav Jain
---
Changelog:

v8..v9:
* Rename some variables and defines to reduce usage of term SCM
  replacing it with PMEM [Dan Williams, Aneesh]
* s/PAPR_SCM_DIMM/PAPR_PMEM/g
* s/papr_scm_nd_attributes/papr_nd_attributes/g
* s/papr_scm_nd_attribute_group/papr_nd_attribute_group/g
* s/papr_scm_dimm_attr_groups/papr_nd_attribute_groups/g
* Renamed file sysfs-bus-papr-scm to sysfs-bus-papr-pmem

v7..v8:
* Update type of variable 'rc' in __drc_pmem_query_health() and
  drc_pmem_query_health() to long and int respectively. [ Ira ]
* Updated the patch description to s/64 bit Big Endian Number/64-bit
  bitmap/ [ Ira, Aneesh ].

Resend:
* None

v6..v7 :
* Used the exported seq_buf_printf() function to generate content for
  'papr/flags'
* Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c
  and removed the papr_scm.h file [Mpe]
* Some minor consistency issues in sysfs-bus-papr-scm
  documentation. [Mpe]
* s/dimm_mutex/health_mutex/g [Mpe]
* Split drc_pmem_query_health() into two functions, one of which takes
  care of caching and locking. [Mpe]
* Fixed a local copy creation of dimm health information using
  READ_ONCE(). [Mpe]

v5..v6 :
* Change the flags sysfs attribute from 'papr_flags' to 'papr/flags'
  [Dan Williams]
* Include documentation for 'papr/flags' attr [Dan Williams]
* Change flag 'save_fail' to 'flush_fail' [Dan Williams]
* Caching of health bitmap to reduce expensive hcalls [Dan Williams]
* Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe]
* Replaced two __be64 integers from papr_scm_priv to a single u64
  integer [Mpe]
* Updated patch description to reflect the changes made in this
  version.
* Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from
  flags_show() [Dan Williams]

v4..v5 :
* None

v3..v4 :
* None

v2..v3 :
* Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for
  NVDIMM unarmed [Aneesh]

v1..v2 :
* New patch in the series.
---
 Documentation/ABI/testing/sysfs-bus-papr-pmem |  27 +++
 arch/powerpc/platforms/pseries/papr_scm.c     | 169 +-
 2 files changed, 194 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem

diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem
new file mode 100644
index ..5b10d036a8d4
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
@@ -0,0 +1,27 @@
+What:		/sys/bus/nd/devices/nmemX/papr/flags
+Date:		Apr, 2020
+KernelVersion:	v5.8
+Contact:	linuxppc-dev , linux-nvd...@lists.01.org,
+Description:
+		(RO) Report flags indicating various states of a
+		papr-pmem NVDIMM device. Each flag maps to a one or
+		more bits set in the dimm-health-bitmap retrieved in
+		response to H_SCM_HEALTH hcall. The details of the bit
+		flags returned in response to this hcall is available
+		at 'Documentation/powerpc/papr_hcalls.rst' . Below are
+		the flags reported in this sysfs file:
+
+		* "not_armed"	: Indicates that NVDIMM contents will not
+				  survive a power cycle.
+		* "flush_fail"	: Indicates that NVDIMM contents
+				  couldn't be flushed during last
+				  shut-down event.
+		* "restore_fail": Indicates that NVDIMM contents
+				  couldn't be restored during NVDIMM
+				  initialization.
+		* "encrypted"	: NVDIMM contents are encrypted.
+		* "smart_notify": There is health event for the NVDIMM.
+		* "scrubbed"	: Indicating that contents of the
+				  NVDIMM have been scrubbed.
+		* "locked"	: Indicating that NVDIMM contents cant
+				  be modified until next power cycle.
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f35592423380..149431594839 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++
[PATCH v9 1/5] powerpc: Document details on H_SCM_HEALTH hcall
Add documentation to 'papr_hcalls.rst' describing the bitmap flags that are returned from H_SCM_HEALTH hcall as per the PAPR-SCM specification. Cc: "Aneesh Kumar K . V" Cc: Dan Williams Cc: Michael Ellerman Cc: Ira Weiny Signed-off-by: Vaibhav Jain --- Changelog: v8..v9: * s/SCM/PMEM device. [ Dan Williams, Aneesh ] v7..v8: * Added a clarification on bit-ordering of Health Bitmap Resend: * None v6..v7: * None v5..v6: * New patch in the series --- Documentation/powerpc/papr_hcalls.rst | 46 --- 1 file changed, 42 insertions(+), 4 deletions(-) diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst index 3493631a60f8..48fcf1255a33 100644 --- a/Documentation/powerpc/papr_hcalls.rst +++ b/Documentation/powerpc/papr_hcalls.rst @@ -220,13 +220,51 @@ from the LPAR memory. **H_SCM_HEALTH** | Input: drcIndex -| Out: *health-bitmap, health-bit-valid-bitmap* +| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)* | Return Value: *H_Success, H_Parameter, H_Hardware* Given a DRC Index return the info on predictive failure and overall health of -the NVDIMM. The asserted bits in the health-bitmap indicate a single predictive -failure and health-bit-valid-bitmap indicate which bits in health-bitmap are -valid. +the PMEM device. The asserted bits in the health-bitmap indicate one or more states +(described in table below) of the PMEM device and health-bit-valid-bitmap indicate +which bits in health-bitmap are valid. The bits are reported in +reverse bit ordering for example a value of 0xC400 +indicates bits 0, 1, and 5 are valid. + +Health Bitmap Flags: + ++--+---+ +| Bit | Definition | ++==+===+ +| 00 | PMEM device is unable to persist memory contents. | +| | If the system is powered down, nothing will be saved. | ++--+---+ +| 01 | PMEM device failed to persist memory contents. Either contents were | +| | not saved successfully on power down or were not restored properly on | +| | power up. 
| ++--+---+ +| 02 | PMEM device contents are persisted from previous IPL. The data from | +| | the last boot were successfully restored. | ++--+---+ +| 03 | PMEM device contents are not persisted from previous IPL. There was no| +| | data to restore from the last boot. | ++--+---+ +| 04 | PMEM device memory life remaining is critically low | ++--+---+ +| 05 | PMEM device will be garded off next IPL due to failure | ++--+---+ +| 06 | PMEM device contents cannot persist due to current platform health | +| | status. A hardware failure may prevent data from being saved or | +| | restored. | ++--+---+ +| 07 | PMEM device is unable to persist memory contents in certain conditions| ++--+---+ +| 08 | PMEM device is encrypted | ++--+---+ +| 09 | PMEM device has successfully completed a requested erase or secure | +| | erase procedure. | ++--+---+ +|10:63 | Reserved / Unused | ++--+---+ **H_SCM_PERFORMANCE_STATS** -- 2.26.2
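The "reverse bit ordering" in the documentation above can be sketched as follows. This is only an illustration, assuming both bitmaps are 64-bit register values numbered from the most significant bit (bit 0) down to the least significant (bit 63), so that the documented example of bits 0, 1 and 5 corresponds to 0xC400 in the top 16 bits. The names HEALTH_BIT, health_bit_asserted and health_example_mask are hypothetical and do not appear in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: MSB-first numbering, i.e. bit 0 is the most
 * significant bit of the 64-bit value and bit 63 the least significant.
 */
#define HEALTH_BIT(n) (UINT64_C(1) << (63 - (n)))

/*
 * A health bit only carries meaning when the matching bit in the
 * valid bitmap is set as well.
 */
bool health_bit_asserted(uint64_t bitmap, uint64_t valid, unsigned int bit)
{
	const uint64_t mask = HEALTH_BIT(bit);

	return (valid & mask) != 0 && (bitmap & mask) != 0;
}

/* Mask with bits 0, 1 and 5 set, matching the example in the text. */
uint64_t health_example_mask(void)
{
	return HEALTH_BIT(0) | HEALTH_BIT(1) | HEALTH_BIT(5);
}
```

With this numbering, a check for "unable to persist memory contents" (bit 00 in the table) would look like health_bit_asserted(bitmap, valid, 0).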
[PATCH v9 0/5] powerpc/papr_scm: Add support for reporting nvdimm health
Changes since v8 [1]:
* Updated proposed changes to remove usage of term 'SCM' due to
  ambiguity with 'PMEM' and 'NVDIMM'. [ Dan Williams ]
* Replaced the usage of term 'SCM' with 'PMEM' in most contexts.
  [ Aneesh ]
* Renamed NVDIMM health defines from PAPR_SCM_DIMM_* to PAPR_PMEM_* .
* Updates to various newly introduced identifiers in 'papr_scm.c'
  removing the 'SCM' prefix from their names.
* Renamed NVDIMM_FAMILY_PAPR_SCM to NVDIMM_FAMILY_PAPR
* Renamed PAPR_SCM_PDSM_HEALTH to PAPR_PDSM_HEALTH
* Renamed uapi header 'papr_scm_pdsm.h' to 'papr_pdsm.h'.
* Renamed sysfs ABI document from sysfs-bus-papr-scm to
  sysfs-bus-papr-pmem.
* No behavioural changes from v8 [1].

[1] https://lore.kernel.org/linux-nvdimm/20200527041244.37821-1-vaib...@linux.ibm.com/
---

The PAPR standard[2][4] provides mechanisms to query the health and
performance stats of an NVDIMM via various hcalls as described in
Ref[3]. Until now these stats were never available nor exposed to the
user-space tools like 'ndctl'. This is partly due to PAPR platform not
having support for ACPI and NFIT. Hence 'ndctl' is unable to query and
report the dimm health status and a user had no way to determine the
current health status of an NVDIMM.

To overcome this limitation, this patch-set updates papr_scm kernel
module to query and fetch NVDIMM health stats using hcalls described in
Ref[3]. These health and performance stats are then exposed to
userspace via sysfs and PAPR-NVDIMM-Specific-Methods(PDSM) issued by
libndctl.

These changes coupled with proposed ndctl changes located at Ref[5]
should provide a way for the user to retrieve NVDIMM health status
using ndctl.

Below is a sample output using proposed kernel + ndctl for PAPR NVDIMM
in an emulation environment:

 # ndctl list -DH
 [
   {
     "dev":"nmem0",
     "health":{
       "health_state":"fatal",
       "shutdown_state":"dirty"
     }
   }
 ]

Dimm health report output on a pseries guest lpar with vPMEM or HMS
based NVDIMMs that are in perfectly healthy conditions:

 # ndctl list -d nmem0 -H
 [
   {
     "dev":"nmem0",
     "health":{
       "health_state":"ok",
       "shutdown_state":"clean"
     }
   }
 ]

PAPR NVDIMM-Specific-Methods(PDSM)
==================================

PDSM requests are issued by vendor specific code in libndctl to execute
certain operations or fetch information from NVDIMMs. PDSM requests can
be sent to papr_scm module via libndctl(userspace) and libnvdimm
(kernel) using the ND_CMD_CALL ioctl command which can be handled in
the dimm control function papr_scm_ndctl(). Current patchset proposes a
single PDSM to retrieve NVDIMM health, defined in the newly introduced
uapi header named 'papr_pdsm.h'. Support for more PDSMs will be added
in future.

Structure of the patch-set
==========================

The patch-set starts with a doc patch documenting details of hcall
H_SCM_HEALTH. Second patch exports kernel symbol seq_buf_printf()
that's used in subsequent patches to generate sysfs attribute content.

Third patch implements support for fetching NVDIMM health information
from PHYP and partially exposing it to user-space via a NVDIMM sysfs
flag.

Fourth patch deals with implementing support for servicing PDSM
commands in papr_scm module.

Finally the last patch implements support for servicing PDSM
'PAPR_PDSM_HEALTH' that returns the NVDIMM health information to
libndctl.

References:
[2] "Power Architecture Platform Reference"
    https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[3] commit 58b278f568f0
    ("powerpc: Provide initial documentation for PAPR hcalls")
[4] "Linux on Power Architecture Platform Reference"
    https://members.openpowerfoundation.org/document/dl/469
[5] https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v9
---

Vaibhav Jain (5):
  powerpc: Document details on H_SCM_HEALTH hcall
  seq_buf: Export seq_buf_printf
  powerpc/papr_scm: Fetch nvdimm health information from PHYP
  ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
  powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH

 Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 ++
 Documentation/powerpc/papr_hcalls.rst | 46 ++-
 arch/powerpc/include/uapi/asm/papr_pdsm.h | 175 +
 arch/powerpc/platforms/pseries/papr_scm.c | 363 +-
 include/uapi/linux/ndctl.h| 1 +
 lib/seq_buf.c | 1 +
 6 files changed, 600 insertions(+), 13 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
 create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h

-- 
2.26.2
[PATCH v3 0/5] lockdep: Change IRQ state tracking to use per-cpu variables
Ahmed and Sebastian wanted additional lockdep_assert*() macros and ran into header hell. Move the IRQ state into per-cpu variables, which removes the dependency on task_struct, which is what generated the header-hell. These patches are intended to go on top of: https://lkml.kernel.org/r/20200529212728.795169...@infradead.org but should apply on current tip/master without much difficulty. There are a few build fixes for Sparc64, PowerPC64 and s390. Especially the Sparc one I'm not sure about.
[PATCH v3 2/5] powerpc64: Break asm/percpu.h vs spinlock_types.h dependency
In order to use the percpu header in lockdep.h, we need to make sure asm/percpu.h does not itself depend on lockdep. The below seems to make that so and builds powerpc64-defconfig + PROVE_LOCKING. Signed-off-by: Peter Zijlstra (Intel) --- arch/powerpc/include/asm/dtl.h | 52 + arch/powerpc/include/asm/lppaca.h | 44 --- arch/powerpc/kernel/time.c |1 arch/powerpc/kvm/book3s_hv.c |1 arch/powerpc/platforms/pseries/dtl.c |1 arch/powerpc/platforms/pseries/lpar.c |1 arch/powerpc/platforms/pseries/setup.c |1 arch/powerpc/platforms/pseries/svm.c |1 8 files changed, 58 insertions(+), 44 deletions(-) --- /dev/null +++ b/arch/powerpc/include/asm/dtl.h @@ -0,0 +1,52 @@ +#ifndef _ASM_POWERPC_DTL_H +#define _ASM_POWERPC_DTL_H + +#include +#include + +/* + * Layout of entries in the hypervisor's dispatch trace log buffer. + */ +struct dtl_entry { + u8 dispatch_reason; + u8 preempt_reason; + __be16 processor_id; + __be32 enqueue_to_dispatch_time; + __be32 ready_to_enqueue_time; + __be32 waiting_to_ready_time; + __be64 timebase; + __be64 fault_addr; + __be64 srr0; + __be64 srr1; +}; + +#define DISPATCH_LOG_BYTES 4096/* bytes per cpu */ +#define N_DISPATCH_LOG (DISPATCH_LOG_BYTES / sizeof(struct dtl_entry)) + +/* + * Dispatch trace log event enable mask: + * 0x1: voluntary virtual processor waits + * 0x2: time-slice preempts + * 0x4: virtual partition memory page faults + */ +#define DTL_LOG_CEDE 0x1 +#define DTL_LOG_PREEMPT0x2 +#define DTL_LOG_FAULT 0x4 +#define DTL_LOG_ALL(DTL_LOG_CEDE | DTL_LOG_PREEMPT | DTL_LOG_FAULT) + +extern struct kmem_cache *dtl_cache; +extern rwlock_t dtl_access_lock; + +/* + * When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE = y, the cpu accounting code controls + * reading from the dispatch trace log. If other code wants to consume + * DTL entries, it can set this pointer to a function that will get + * called once for each DTL entry that gets processed.
+ */ +extern void (*dtl_consumer)(struct dtl_entry *entry, u64 index); + +extern void register_dtl_buffer(int cpu); +extern void alloc_dtl_buffers(unsigned long *time_limit); +extern long hcall_vphn(unsigned long cpu, u64 flags, __be32 *associativity); + +#endif /* _ASM_POWERPC_DTL_H */ --- a/arch/powerpc/include/asm/lppaca.h +++ b/arch/powerpc/include/asm/lppaca.h @@ -42,7 +42,6 @@ */ #include #include -#include #include #include #include @@ -146,49 +145,6 @@ struct slb_shadow { } save_area[SLB_NUM_BOLTED]; } cacheline_aligned; -/* - * Layout of entries in the hypervisor's dispatch trace log buffer. - */ -struct dtl_entry { - u8 dispatch_reason; - u8 preempt_reason; - __be16 processor_id; - __be32 enqueue_to_dispatch_time; - __be32 ready_to_enqueue_time; - __be32 waiting_to_ready_time; - __be64 timebase; - __be64 fault_addr; - __be64 srr0; - __be64 srr1; -}; - -#define DISPATCH_LOG_BYTES 4096/* bytes per cpu */ -#define N_DISPATCH_LOG (DISPATCH_LOG_BYTES / sizeof(struct dtl_entry)) - -/* - * Dispatch trace log event enable mask: - * 0x1: voluntary virtual processor waits - * 0x2: time-slice preempts - * 0x4: virtual partition memory page faults - */ -#define DTL_LOG_CEDE 0x1 -#define DTL_LOG_PREEMPT0x2 -#define DTL_LOG_FAULT 0x4 -#define DTL_LOG_ALL(DTL_LOG_CEDE | DTL_LOG_PREEMPT | DTL_LOG_FAULT) - -extern struct kmem_cache *dtl_cache; -extern rwlock_t dtl_access_lock; - -/* - * When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE = y, the cpu accounting code controls - * reading from the dispatch trace log. If other code wants to consume - * DTL entries, it can set this pointer to a function that will get - * called once for each DTL entry that gets processed. 
- */ -extern void (*dtl_consumer)(struct dtl_entry *entry, u64 index); - -extern void register_dtl_buffer(int cpu); -extern void alloc_dtl_buffers(unsigned long *time_limit); extern long hcall_vphn(unsigned long cpu, u64 flags, __be32 *associativity); #endif /* CONFIG_PPC_BOOK3S */ --- a/arch/powerpc/kernel/time.c +++ b/arch/powerpc/kernel/time.c @@ -69,6 +69,7 @@ #include #include #include +#include /* powerpc clocksource/clockevent code */ --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -74,6 +74,7 @@ #include #include #include +#include #include "book3s.h" --- a/arch/powerpc/platforms/pseries/dtl.c +++ b/arch/powerpc/platforms/pseries/dtl.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include #include --- a/arch/powerpc/platforms/pseries/lpar.c +++ b/arch/powerpc/platforms/pseries/lpar.c @@ -40,6 +40,7 @@ #include #include #include +#include #include "pseries.h" ---
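As a side note on the header being moved, the relationship between DISPATCH_LOG_BYTES, the entry layout and N_DISPATCH_LOG can be checked with a small userspace sketch. This is not kernel code: struct dtl_entry_model and dtl_entries_per_cpu are hypothetical names, and plain fixed-width integers stand in for the kernel's u8, __be16, __be32 and __be64 types (which have the same sizes).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-in for struct dtl_entry. Only the field sizes matter
 * here; endianness annotations are dropped.
 */
struct dtl_entry_model {
	uint8_t  dispatch_reason;
	uint8_t  preempt_reason;
	uint16_t processor_id;
	uint32_t enqueue_to_dispatch_time;
	uint32_t ready_to_enqueue_time;
	uint32_t waiting_to_ready_time;
	uint64_t timebase;
	uint64_t fault_addr;
	uint64_t srr0;
	uint64_t srr1;
};

#define DISPATCH_LOG_BYTES 4096 /* bytes per cpu */
#define N_DISPATCH_LOG (DISPATCH_LOG_BYTES / sizeof(struct dtl_entry_model))

/* 4 bytes of small fields + 3 * 4 + 4 * 8 = 48 bytes, with no padding. */
_Static_assert(sizeof(struct dtl_entry_model) == 48,
	       "entry layout should be 48 bytes");
_Static_assert(offsetof(struct dtl_entry_model, timebase) == 16,
	       "64-bit fields should start at offset 16");

/* Number of whole entries that fit in one per-cpu log buffer. */
size_t dtl_entries_per_cpu(void)
{
	return N_DISPATCH_LOG;
}
```

Under these assumptions a 4096-byte buffer holds 85 whole entries, with 16 bytes left over at the end.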
[RFC][PATCH v3 1/5] sparc64: Fix asm/percpu.h build error
In order to break a header dependency between lockdep and task_struct, I need per-cpu stuff from lockdep. Including the percpu header from lockdep.h gives a build error; this patch cures that, but results in the following warning: ../arch/sparc/include/asm/percpu_64.h:7:24: warning: call-clobbered register used for global register variable register unsigned long __local_per_cpu_offset asm("g5"); But I've no idea how to fix that :/ but it does build. Not-Signed-off-by: Peter Zijlstra (Intel) --- arch/sparc/include/asm/trap_block.h |2 ++ 1 file changed, 2 insertions(+) --- a/arch/sparc/include/asm/trap_block.h +++ b/arch/sparc/include/asm/trap_block.h @@ -2,6 +2,8 @@ #ifndef _SPARC_TRAP_BLOCK_H #define _SPARC_TRAP_BLOCK_H +#include + #include #include
[PATCH v3 5/5] lockdep: Remove lockdep_hardirq{s_enabled, _context}() argument
Now that the macros use per-cpu data, we no longer need the argument. Signed-off-by: Peter Zijlstra (Intel) --- arch/x86/entry/common.c|2 +- include/linux/irqflags.h |8 include/linux/lockdep.h|2 +- kernel/locking/lockdep.c | 30 +++--- kernel/softirq.c |2 +- tools/include/linux/irqflags.h |4 ++-- 6 files changed, 24 insertions(+), 24 deletions(-) --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -689,7 +689,7 @@ noinstr void idtentry_exit_user(struct p noinstr bool idtentry_enter_nmi(struct pt_regs *regs) { - bool irq_state = lockdep_hardirqs_enabled(current); + bool irq_state = lockdep_hardirqs_enabled(); __nmi_enter(); lockdep_hardirqs_off(CALLER_ADDR0); --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -40,9 +40,9 @@ DECLARE_PER_CPU(int, hardirq_context); extern void trace_hardirqs_off_finish(void); extern void trace_hardirqs_on(void); extern void trace_hardirqs_off(void); -# define lockdep_hardirq_context(p)(this_cpu_read(hardirq_context)) +# define lockdep_hardirq_context() (this_cpu_read(hardirq_context)) # define lockdep_softirq_context(p)((p)->softirq_context) -# define lockdep_hardirqs_enabled(p) (this_cpu_read(hardirqs_enabled)) +# define lockdep_hardirqs_enabled()(this_cpu_read(hardirqs_enabled)) # define lockdep_softirqs_enabled(p) ((p)->softirqs_enabled) # define lockdep_hardirq_enter() \ do { \ @@ -109,9 +109,9 @@ do {\ # define trace_hardirqs_off_finish() do { } while (0) # define trace_hardirqs_on() do { } while (0) # define trace_hardirqs_off() do { } while (0) -# define lockdep_hardirq_context(p)0 +# define lockdep_hardirq_context() 0 # define lockdep_softirq_context(p)0 -# define lockdep_hardirqs_enabled(p) 0 +# define lockdep_hardirqs_enabled()0 # define lockdep_softirqs_enabled(p) 0 # define lockdep_hardirq_enter() do { } while (0) # define lockdep_hardirq_threaded()do { } while (0) --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -736,7 +736,7 @@ do { \ # define lockdep_assert_RT_in_threaded_ctx() do { 
\ WARN_ONCE(debug_locks && !current->lockdep_recursion && \ - lockdep_hardirq_context(current) && \ + lockdep_hardirq_context() && \ !(current->hardirq_threaded || current->irq_config), \ "Not in threaded context on PREEMPT_RT as expected\n"); \ } while (0) --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -2062,9 +2062,9 @@ print_bad_irq_dependency(struct task_str pr_warn("-\n"); pr_warn("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] is trying to acquire:\n", curr->comm, task_pid_nr(curr), - lockdep_hardirq_context(curr), hardirq_count() >> HARDIRQ_SHIFT, + lockdep_hardirq_context(), hardirq_count() >> HARDIRQ_SHIFT, curr->softirq_context, softirq_count() >> SOFTIRQ_SHIFT, - lockdep_hardirqs_enabled(curr), + lockdep_hardirqs_enabled(), curr->softirqs_enabled); print_lock(next); @@ -3331,9 +3331,9 @@ print_usage_bug(struct task_struct *curr pr_warn("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] takes:\n", curr->comm, task_pid_nr(curr), - lockdep_hardirq_context(curr), hardirq_count() >> HARDIRQ_SHIFT, + lockdep_hardirq_context(), hardirq_count() >> HARDIRQ_SHIFT, lockdep_softirq_context(curr), softirq_count() >> SOFTIRQ_SHIFT, - lockdep_hardirqs_enabled(curr), + lockdep_hardirqs_enabled(), lockdep_softirqs_enabled(curr)); print_lock(this); @@ -3655,7 +3655,7 @@ void lockdep_hardirqs_on_prepare(unsigne if (DEBUG_LOCKS_WARN_ON(current->lockdep_recursion & LOCKDEP_RECURSION_MASK)) return; - if (unlikely(lockdep_hardirqs_enabled(current))) { + if (unlikely(lockdep_hardirqs_enabled())) { /* * Neither irq nor preemption are disabled here * so this is racy by nature but losing one hit @@ -3683,7 +3683,7 @@ void lockdep_hardirqs_on_prepare(unsigne * Can't allow enabling interrupts while in an interrupt handler, * that's general bad form and such. Recursion, limited stack etc.. */ - if (DEBUG_LOCKS_WARN_ON(lockdep_hardirq_context(current))) + if (DEBUG_LOCKS_WARN_ON(lockdep_hardirq_context())) return;
[PATCH v3 4/5] lockdep: Change hardirq{s_enabled, _context} to per-cpu variables
Currently all IRQ-tracking state is in task_struct, this means that task_struct needs to be defined before we use it. Especially for lockdep_assert_irq*() this can lead to header-hell. Move the hardirq state into per-cpu variables to avoid the task_struct dependency. Signed-off-by: Peter Zijlstra (Intel) --- include/linux/irqflags.h | 19 --- include/linux/lockdep.h | 34 ++ include/linux/sched.h|2 -- kernel/fork.c|4 +--- kernel/locking/lockdep.c | 30 +++--- kernel/softirq.c |6 ++ 6 files changed, 52 insertions(+), 43 deletions(-) --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -14,6 +14,7 @@ #include #include +#include /* Currently lockdep_softirqs_on/off is used only by lockdep */ #ifdef CONFIG_PROVE_LOCKING @@ -31,18 +32,22 @@ #endif #ifdef CONFIG_TRACE_IRQFLAGS + +DECLARE_PER_CPU(int, hardirqs_enabled); +DECLARE_PER_CPU(int, hardirq_context); + extern void trace_hardirqs_on_prepare(void); extern void trace_hardirqs_off_finish(void); extern void trace_hardirqs_on(void); extern void trace_hardirqs_off(void); -# define lockdep_hardirq_context(p)((p)->hardirq_context) +# define lockdep_hardirq_context(p)(this_cpu_read(hardirq_context)) # define lockdep_softirq_context(p)((p)->softirq_context) -# define lockdep_hardirqs_enabled(p) ((p)->hardirqs_enabled) +# define lockdep_hardirqs_enabled(p) (this_cpu_read(hardirqs_enabled)) # define lockdep_softirqs_enabled(p) ((p)->softirqs_enabled) -# define lockdep_hardirq_enter() \ -do { \ - if (!current->hardirq_context++)\ - current->hardirq_threaded = 0; \ +# define lockdep_hardirq_enter() \ +do { \ + if (this_cpu_inc_return(hardirq_context) == 1) \ + current->hardirq_threaded = 0; \ } while (0) # define lockdep_hardirq_threaded()\ do { \ @@ -50,7 +55,7 @@ do { \ } while (0) # define lockdep_hardirq_exit()\ do { \ - current->hardirq_context--; \ + this_cpu_dec(hardirq_context); \ } while (0) # define lockdep_softirq_enter() \ do { \ --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -19,6 +19,7 @@ 
extern int lock_stat; #define MAX_LOCKDEP_SUBCLASSES 8UL +#include #include enum lockdep_wait_type { @@ -703,28 +704,29 @@ do { \ lock_release(&(lock)->dep_map, _THIS_IP_); \ } while (0) -#define lockdep_assert_irqs_enabled() do {\ - WARN_ONCE(debug_locks && !current->lockdep_recursion && \ - !current->hardirqs_enabled, \ - "IRQs not enabled as expected\n");\ - } while (0) +DECLARE_PER_CPU(int, hardirqs_enabled); +DECLARE_PER_CPU(int, hardirq_context); -#define lockdep_assert_irqs_disabled() do {\ - WARN_ONCE(debug_locks && !current->lockdep_recursion && \ - current->hardirqs_enabled,\ - "IRQs not disabled as expected\n"); \ - } while (0) +#define lockdep_assert_irqs_enabled() \ +do { \ + WARN_ON_ONCE(debug_locks && !this_cpu_read(hardirqs_enabled)); \ +} while (0) -#define lockdep_assert_in_irq() do { \ - WARN_ONCE(debug_locks && !current->lockdep_recursion && \ - !current->hardirq_context,\ - "Not in hardirq as expected\n"); \ - } while (0) +#define lockdep_assert_irqs_disabled() \ +do { \ + WARN_ON_ONCE(debug_locks && this_cpu_read(hardirqs_enabled)); \ +} while (0) + +#define lockdep_assert_in_irq() \ +do { \ + WARN_ON_ONCE(debug_locks && !this_cpu_read(hardirq_context)); \ +} while (0) #else # define might_lock(lock) do { } while (0) # define might_lock_read(lock) do { } while (0) # define might_lock_nested(lock, subclass) do { } while (0) + # define lockdep_assert_irqs_enabled() do { } while (0) # define lockdep_assert_irqs_disabled() do { } while (0) # define lockdep_assert_in_irq() do {
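The rewrite of lockdep_hardirq_enter() in this patch swaps a post-increment test for an increment-and-return test. A minimal userspace sketch of why the two forms detect the same "outermost entry" condition is given below. It is only a model: plain int counters stand in for both the old task_struct field and the new per-cpu variable, model_inc_return is a hypothetical stand-in for this_cpu_inc_return() (assumed to return the value after the increment), and none of these function names exist in the kernel.

```c
#include <stdbool.h>

/* Old form: post-increment, so the test sees the value before the bump. */
bool enter_old(int *ctx)
{
	return !(*ctx)++;
}

/* Stand-in for this_cpu_inc_return(): increment, then return the new value. */
int model_inc_return(int *ctx)
{
	return ++(*ctx);
}

/* New form: the outermost entry is the one that takes the counter to 1. */
bool enter_new(int *ctx)
{
	return model_inc_return(ctx) == 1;
}

/*
 * Nest 'depth' hardirq entries through both forms and report whether
 * they agree at every level (true only for the first entry) and leave
 * the counters equal.
 */
int nesting_models_agree(int depth)
{
	int old_ctx = 0, new_ctx = 0;

	for (int i = 0; i < depth; i++) {
		bool a = enter_old(&old_ctx);
		bool b = enter_new(&new_ctx);

		if (a != b || a != (i == 0))
			return 0;
	}
	return old_ctx == new_ctx;
}
```

In both forms only the first, non-nested entry would reset hardirq_threaded, which is the behaviour the macro relies on.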
[PATCH v3 3/5] s390: Break cyclic percpu include
In order to use the percpu header in irqflags.h, we need to make sure asm/percpu.h does not itself depend on irqflags.h. Signed-off-by: Peter Zijlstra (Intel) --- arch/s390/include/asm/smp.h |1 + arch/s390/include/asm/thread_info.h |1 - 2 files changed, 1 insertion(+), 1 deletion(-) --- a/arch/s390/include/asm/smp.h +++ b/arch/s390/include/asm/smp.h @@ -10,6 +10,7 @@ #include #include +#include #define raw_smp_processor_id() (S390_lowcore.cpu_nr) --- a/arch/s390/include/asm/thread_info.h +++ b/arch/s390/include/asm/thread_info.h @@ -24,7 +24,6 @@ #ifndef __ASSEMBLY__ #include #include -#include #define STACK_INIT_OFFSET \ (THREAD_SIZE - STACK_FRAME_OVERHEAD - sizeof(struct pt_regs))
Re: [musl] ppc64le and 32-bit LE userland compatibility
On Fri, May 29, 2020 at 07:03:48PM +, Will Springer wrote:
> The next problem concerns the ABI more directly. The failure mode was `file`
> surfacing EINVAL from pread64 when invoked on an ELF; pread64 was passed a
> garbage value for `pos`, which didn't appear to be caused by anything in
> `file`. Initially it seemed as though the 32-bit components of the arg were
> getting swapped, and we made hacky fixes to glibc and musl to put them in the
> "right order"; however, we weren't sure if that was the correct approach, or
> if there were knock-on effects we didn't know about. So we found the relevant
> compat code path in the kernel, at arch/powerpc/kernel/sys_ppc32.c, where
> there exists this comment:
>
> > /*
> >  * long long munging:
> >  * The 32 bit ABI passes long longs in an odd even register pair.
> >  */
>
> It seems that the opposite is true in LE mode, and something is expecting long
> longs to start on an even register. I realized this after I tried swapping hi/
> lo `u32`s here and didn't see an improvement. I whipped up a patch [6] that
> switches which syscalls use padding arguments depending on endianness, while
> hopefully remaining tidy enough to be unobtrusive. (I took some liberties with
> variable names/types so that the macro could be consistent.)

The argument passing for pread/pwrite is historically a mess and
differs between archs. musl has a dedicated macro that archs can
define to override it. But it looks like it should match regardless of
BE vs LE, and musl already defines it for powerpc with the default
definition, adding a zero arg to start on an even arg-slot index,
which is an odd register (since ppc32 args start with an odd one, r3).

> [6]: https://gist.github.com/Skirmisher/02891c1a8cafa0ff18b2460933ef4f3c

I don't think this is correct, but I'm confused about where it's
getting messed up because it looks like it should already be right.

> This was enough to fix up the `file` bug. I'm no seasoned kernel hacker,
> though, and there is still concern over the right way to approach this,
> whether it should live in the kernel or libc, etc. Frankly, I don't know the
> ABI structure enough to understand why the register padding has to be
> different in this case, or what lower-level component is responsible for it.
> For comparison, I had a look at the mips tree, since it's bi-endian and has a
> similar 32/64 situation. There is a macro conditional upon endianness that is
> responsible for munging long longs; it uses __MIPSEB__ and __MIPSEL__ instead
> of an if/else on the generic __LITTLE_ENDIAN__. Not sure what to make of
> that.
> (It also simply swaps registers for LE, unlike what I did for ppc.)

Indeed the problem is probably that you need to swap registers for LE,
not remove the padding slot. Did you check what happens if you pass a
value larger than 32 bits?

If so, the right way to fix this on the kernel side would be to
construct the value as a union rather than by bitwise ops so it's
endian-agnostic:

	(union { u32 parts[2]; u64 val; }){{ arg1, arg2 }}.val

But the kernel folks might prefer endian ifdefs for some odd reason...

> Also worth noting is the one other outstanding bug, where the time-related
> syscalls in the 32-bit vDSO seem to return garbage. It doesn't look like an
> endian bug to me, and it doesn't affect standard syscalls (which is why if you
> run `date` on musl it prints the correct time, unlike on glibc). The vDSO time
> functions are implemented in ppc asm (arch/powerpc/kernel/vdso32/
> gettimeofday.S), and I've never touched the stuff, so if anyone has a clue I'm
> all ears.

Not sure about this. Worst-case, just leave it disabled until someone
finds a fix.

Rich
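To make the union suggestion above concrete, here is a small userspace sketch comparing it with the shift-based merge. It is only an illustration under stated assumptions: uint32_t and uint64_t stand in for the kernel's u32 and u64, and the function names merge_union, merge_shift, host_is_little_endian and union_matches_host_order are made up for this example and are not taken from the kernel or from the patch in [6].

```c
#include <stdint.h>

/*
 * Union-based merge: the two 32-bit words are laid out in memory in
 * argument order and reinterpreted as one 64-bit value, so which word
 * ends up as the high half follows the host's endianness.
 */
uint64_t merge_union(uint32_t first, uint32_t second)
{
	return (union { uint32_t parts[2]; uint64_t val; }){ { first, second } }.val;
}

/* Shift-based merge: the caller must say explicitly which word is high. */
uint64_t merge_shift(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Runtime endianness probe, used only to pick the expected result. */
int host_is_little_endian(void)
{
	const uint16_t probe = 1;

	return *(const uint8_t *)&probe == 1;
}

/*
 * Returns 1 when the union merge equals the shift merge with the word
 * order appropriate for the host: (first = high) on big-endian,
 * (first = low) on little-endian.
 */
int union_matches_host_order(uint32_t first, uint32_t second)
{
	const uint64_t expected = host_is_little_endian()
		? merge_shift(second, first)
		: merge_shift(first, second);

	return merge_union(first, second) == expected;
}
```

The point of the union form is that the register pair is consumed in the same order on both endiannesses, while the shift form needs the two arguments swapped for LE, which matches the "swap registers for LE" observation above.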
Re: [RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.
On Fri, May 29, 2020 at 3:55 AM Aneesh Kumar K.V wrote: > > On 5/29/20 3:22 PM, Jan Kara wrote: > > Hi! > > > > On Fri 29-05-20 15:07:31, Aneesh Kumar K.V wrote: > >> Thanks Michal. I also missed Jeff in this email thread. > > > > And I think you'll also need some of the sched maintainers for the prctl > > bits... > > > >> On 5/29/20 3:03 PM, Michal Suchánek wrote: > >>> Adding Jan > >>> > >>> On Fri, May 29, 2020 at 11:11:39AM +0530, Aneesh Kumar K.V wrote: > With POWER10, architecture is adding new pmem flush and sync > instructions. > The kernel should prevent the usage of MAP_SYNC if applications are not > using > the new instructions on newer hardware. > > This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable > the usage of MAP_SYNC. The kernel config option is added to allow the > user > to control whether MAP_SYNC should be enabled by default or not. > > Signed-off-by: Aneesh Kumar K.V > > ... > diff --git a/kernel/fork.c b/kernel/fork.c > index 8c700f881d92..d5a9a363e81e 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -963,6 +963,12 @@ __cacheline_aligned_in_smp > DEFINE_SPINLOCK(mmlist_lock); > static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT; > +#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE > +unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK; > +#else > +unsigned long default_map_sync_mask = 0; > +#endif > + > > > > I'm not sure CONFIG is really the right approach here. For a distro that > > would > > basically mean to disable MAP_SYNC for all PPC kernels unless application > > explicitly uses the right prctl. Shouldn't we rather initialize > > default_map_sync_mask on boot based on whether the CPU we run on requires > > new flush instructions or not? Otherwise the patch looks sensible. > > > > yes that is correct. We ideally want to deny MAP_SYNC only w.r.t > POWER10. But on a virtualized platform there is no easy way to detect > that. 
We could ideally hook this into the nvdimm driver where we look at > the new compat string ibm,persistent-memory-v2 and then disable MAP_SYNC > if we find a device with the specific value. > > BTW with the recent changes I posted for the nvdimm driver, older kernel > won't initialize persistent memory device on newer hardware. Newer > hardware will present the device to OS with a different device tree > compat string. > > My expectation w.r.t this patch was, Distro would want to mark > CONFIG_ARCH_MAP_SYNC_DISABLE=n based on the different application > certification. Otherwise application will have to end up calling the > prctl(MMF_DISABLE_MAP_SYNC, 0) any way. If that is the case, should this > be dependent on P10? > > With that I am wondering should we even have this patch? Can we expect > userspace get updated to use new instruction?. > > With ppc64 we never had a real persistent memory device available for > end user to try. The available persistent memory stack was using vPMEM > which was presented as a volatile memory region for which there is no > need to use any of the flush instructions. We could safely assume that > as we get applications certified/verified for working with pmem device > on ppc64, they would all be using the new instructions? I think prctl is the wrong interface for this. I was thinking a sysfs interface along the same lines as /sys/block/pmemX/dax/write_cache. That attribute is toggling DAXDEV_WRITE_CACHE for the determination of whether the platform or the kernel needs to handle cache flushing relative to power loss. A similar attribute can be established for DAXDEV_SYNC, it would simply default to off based on a configuration time policy, but be dynamically changeable at runtime via sysfs. These flags are device properties that affect the kernel and userspace's handling of persistence.
Re: [PATCH v8 0/8] powerpc: switch VDSO to C implementation
Hi Michael, Le 28/04/2020 à 15:16, Christophe Leroy a écrit : This is the seventh version of a series to switch powerpc VDSO to generic C implementation. Main changes since v7 are: - Added gettime64 on PPC32 This series applies on today's powerpc/merge branch. See the last patches for details on changes and performance. Do you have any plans for this series ? Even if you don't feel like merging it this cycle, I think patches 1 to 3 are worth it. Christophe Christophe Leroy (8): powerpc/vdso64: Switch from __get_datapage() to get_datapage inline macro powerpc/vdso: Remove __kernel_datapage_offset and simplify __get_datapage() powerpc/vdso: Remove unused \tmp param in __get_datapage() powerpc/processor: Move cpu_relax() into asm/vdso/processor.h powerpc/vdso: Prepare for switching VDSO to generic C implementation. powerpc/vdso: Switch VDSO to generic C implementation. lib/vdso: force inlining of __cvdso_clock_gettime_common() powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32 arch/powerpc/Kconfig | 2 + arch/powerpc/include/asm/clocksource.h | 7 + arch/powerpc/include/asm/processor.h | 10 +- arch/powerpc/include/asm/vdso/clocksource.h | 7 + arch/powerpc/include/asm/vdso/gettimeofday.h | 175 +++ arch/powerpc/include/asm/vdso/processor.h| 23 ++ arch/powerpc/include/asm/vdso/vsyscall.h | 25 ++ arch/powerpc/include/asm/vdso_datapage.h | 50 ++-- arch/powerpc/kernel/asm-offsets.c| 49 +-- arch/powerpc/kernel/time.c | 91 +- arch/powerpc/kernel/vdso.c | 58 +--- arch/powerpc/kernel/vdso32/Makefile | 32 +- arch/powerpc/kernel/vdso32/cacheflush.S | 2 +- arch/powerpc/kernel/vdso32/config-fake32.h | 34 +++ arch/powerpc/kernel/vdso32/datapage.S| 7 +- arch/powerpc/kernel/vdso32/gettimeofday.S| 300 +-- arch/powerpc/kernel/vdso32/vdso32.lds.S | 8 +- arch/powerpc/kernel/vdso32/vgettimeofday.c | 35 +++ arch/powerpc/kernel/vdso64/Makefile | 23 +- arch/powerpc/kernel/vdso64/cacheflush.S | 9 +- arch/powerpc/kernel/vdso64/datapage.S| 31 +- arch/powerpc/kernel/vdso64/gettimeofday.S| 
243 +-- arch/powerpc/kernel/vdso64/vdso64.lds.S | 7 +- arch/powerpc/kernel/vdso64/vgettimeofday.c | 29 ++ lib/vdso/gettimeofday.c | 2 +- 25 files changed, 460 insertions(+), 799 deletions(-) create mode 100644 arch/powerpc/include/asm/clocksource.h create mode 100644 arch/powerpc/include/asm/vdso/clocksource.h create mode 100644 arch/powerpc/include/asm/vdso/gettimeofday.h create mode 100644 arch/powerpc/include/asm/vdso/processor.h create mode 100644 arch/powerpc/include/asm/vdso/vsyscall.h create mode 100644 arch/powerpc/kernel/vdso32/config-fake32.h create mode 100644 arch/powerpc/kernel/vdso32/vgettimeofday.c create mode 100644 arch/powerpc/kernel/vdso64/vgettimeofday.c
Re: [PATCH] powerpc/32s: Fix another build failure with CONFIG_PPC_KUAP_DEBUG
On 29/05/2020 at 20:50, Christophe Leroy wrote:
> From: Christophe Leroy
>
> 'thread' doesn't exist in kuap_check() macro. Use 'current' instead.
>
> Fixes: a68c31fc01ef ("powerpc/32s: Implement Kernel Userspace Access Protection")
> Signed-off-by: Christophe Leroy

Argh, can you drop this line?

> Cc: sta...@vger.kernel.org
> Signed-off-by: Christophe Leroy

Reported-by: kbuild test robot

> ---
>  arch/powerpc/include/asm/book3s/32/kup.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h
> index db0a1c281587..668508c8a1b5 100644
> --- a/arch/powerpc/include/asm/book3s/32/kup.h
> +++ b/arch/powerpc/include/asm/book3s/32/kup.h
> @@ -75,7 +75,7 @@
>  .macro kuap_check current, gpr
>  #ifdef CONFIG_PPC_KUAP_DEBUG
> -	lwz	\gpr, KUAP(thread)
> +	lwz	\gpr, THREAD + KUAP(\current)
>  999:	twnei	\gpr, 0
>  	EMIT_BUG_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | BUGFLAG_ONCE)
>  #endif
Re: [PATCH] powerpc/nvram: Replace kmalloc with kzalloc in the error message
> Please just remove the message instead, it's a tiny allocation that's
> unlikely to ever fail, and the caller will print an error anyway.

What do you think about taking another look at a previous update suggestion like the following?

powerpc/nvram: Delete three error messages for a failed memory allocation
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/00845261-8528-d011-d3b8-e9355a231...@users.sourceforge.net/
https://lore.kernel.org/linuxppc-dev/00845261-8528-d011-d3b8-e9355a231...@users.sourceforge.net/
https://lore.kernel.org/patchwork/patch/752720/
https://lkml.org/lkml/2017/1/19/537

Regards,
Markus
[PATCH] powerpc/32s: Fix another build failure with CONFIG_PPC_KUAP_DEBUG
From: Christophe Leroy

'thread' doesn't exist in kuap_check() macro. Use 'current' instead.

Fixes: a68c31fc01ef ("powerpc/32s: Implement Kernel Userspace Access Protection")
Signed-off-by: Christophe Leroy
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/book3s/32/kup.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h
index db0a1c281587..668508c8a1b5 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -75,7 +75,7 @@
 .macro kuap_check current, gpr
 #ifdef CONFIG_PPC_KUAP_DEBUG
-	lwz	\gpr, KUAP(thread)
+	lwz	\gpr, THREAD + KUAP(\current)
 999:	twnei	\gpr, 0
 	EMIT_BUG_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | BUGFLAG_ONCE)
 #endif
--
2.25.0
Re: [PATCH] ASoC: fsl_asrc_dma: Fix dma_chan leak when config DMA channel failed
On Mon, 25 May 2020 22:12:46 +0800, Xiyu Yang wrote:
> fsl_asrc_dma_hw_params() invokes dma_request_channel() or
> fsl_asrc_get_dma_channel(), which returns a reference of the specified
> dma_chan object to "pair->dma_chan[dir]" with increased refcnt.
>
> The reference counting issue happens in one exception handling path of
> fsl_asrc_dma_hw_params(). When config DMA channel failed for Back-End,
> the function forgets to decrease the refcnt increased by
> dma_request_channel() or fsl_asrc_get_dma_channel(), causing a refcnt
> leak.
>
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: fsl_asrc_dma: Fix dma_chan leak when config DMA channel failed
      commit: 36124fb19f1ae68a500cd76a76d40c6e81bee346

All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying to this mail.

Thanks,
Mark
Re: [Intel-gfx] [PATCH 06/13] ocfs2: use new sysctl subdir helper register_sysctl_subdir()
On Fri, May 29, 2020 at 11:49:12AM +, Luis Chamberlain wrote:
> Yikes, sense, you're right. Nope, I left the random config tests to
> 0day. Will fix, thanks!

Yeah, I do the same for randconfig, but I always do an "allmodconfig" build before sending stuff. It's a good smoke test.

--
Kees Cook
Re: [PATCH 12/13] sysctl: add helper to register empty subdir
Luis Chamberlain writes:
> The way to create a subdirectory from the base set of directories
> is a bit obscure, so provide a helper which makes this clear, and
> also helps remove boiler plate code required to do this work.

I agree that calling:

	register_sysctl("fs/binfmt_misc", sysctl_mount_point)

is a bit obscure, but if you are going to make a wrapper please make it the trivial one-liner above. Say something that looks like:

	struct ctl_table_header *register_sysctl_mount_point(const char *path)
	{
		return register_sysctl(path, sysctl_mount_point);
	}

And yes, please talk about a mount point and not an empty dir, as these are permanently empty directories to serve as mount points. There are some subtle but important permission checks this allows in the case of unprivileged mounts.

Further, code like this belongs in proc_sysctl.c next to all of the code it is related to, so that it is easier to see how to refactor the code if necessary.

Eric

>
> Signed-off-by: Luis Chamberlain
> ---
>  include/linux/sysctl.h | 7 +++
>  kernel/sysctl.c | 16 +---
>  2 files changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
> index 33a471b56345..89c92390e6de 100644
> --- a/include/linux/sysctl.h
> +++ b/include/linux/sysctl.h
> @@ -208,6 +208,8 @@ extern void register_sysctl_init(const char *path, struct ctl_table *table,
>  extern struct ctl_table_header *register_sysctl_subdir(const char *base,
>  							const char *subdir,
>  							struct ctl_table *table);
> +extern void register_sysctl_empty_subdir(const char *base, const char *subdir);
> +
>  void do_sysctl_args(void);
>
>  extern int pwrsw_enabled;
> @@ -231,6 +233,11 @@ inline struct ctl_table_header *register_sysctl_subdir(const char *base,
>  	return NULL;
>  }
>
> +static inline void register_sysctl_empty_subdir(const char *base,
> +						const char *subdir)
> +{
> +}
> +
>  static inline struct ctl_table_header *register_sysctl_paths(
>  			const struct ctl_path *path, struct ctl_table *table)
>  {
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index f9a35325d5d5..460532cd5ac8 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -3188,13 +3188,17 @@ struct ctl_table_header *register_sysctl_subdir(const char *base,
>  		{ }
>  	};
>
> -	if (!table->procname)
> +	if (table != sysctl_mount_point && !table->procname)
>  		goto out;
>
>  	hdr = register_sysctl_table(base_table);
>  	if (unlikely(!hdr)) {
> -		pr_err("failed when creating subdirectory sysctl %s/%s/%s\n",
> -		       base, subdir, table->procname);
> +		if (table != sysctl_mount_point)
> +			pr_err("failed when creating subdirectory sysctl %s/%s/%s\n",
> +			       base, subdir, table->procname);
> +		else
> +			pr_err("failed when creating empty subddirectory %s/%s\n",
> +			       base, subdir);
>  		goto out;
>  	}
>  	kmemleak_not_leak(hdr);
> @@ -3202,6 +3206,12 @@ struct ctl_table_header *register_sysctl_subdir(const char *base,
>  	return hdr;
>  }
>  EXPORT_SYMBOL_GPL(register_sysctl_subdir);
> +
> +void register_sysctl_empty_subdir(const char *base,
> +				  const char *subdir)
> +{
> +	register_sysctl_subdir(base, subdir, sysctl_mount_point);
> +}
>  #endif /* CONFIG_SYSCTL */
>  /*
>   * No sense putting this after each symbol definition, twice,
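For illustration, the trivial wrapper proposed in the reply above might look like the sketch below. This is a userspace mock so that it compiles on its own, not kernel code: register_sysctl() and sysctl_mount_point are real kernel names, but the type layouts and the stub body of register_sysctl() here are placeholders invented for the example.

```c
#include <stddef.h>

/* Placeholder types; the real definitions live in include/linux/sysctl.h. */
struct ctl_table {
	const char *procname;
};

struct ctl_table_header {
	const char *path;
	const struct ctl_table *table;
};

/* Stand-in for the kernel's permanently empty table (terminator entry only). */
static struct ctl_table sysctl_mount_point[] = { { NULL } };

/*
 * Mock of register_sysctl(): it only records what was registered.
 * It keeps a single static header, so it models one registration at a time.
 */
static struct ctl_table_header mock_header;

static struct ctl_table_header *register_sysctl(const char *path,
						struct ctl_table *table)
{
	if (!path || !table)
		return NULL;
	mock_header.path = path;
	mock_header.table = table;
	return &mock_header;
}

/* The one-line wrapper: a mount point is a path plus the empty table. */
static struct ctl_table_header *register_sysctl_mount_point(const char *path)
{
	return register_sysctl(path, sysctl_mount_point);
}
```

The point of the shape is that the caller passes a full path such as "fs/binfmt_misc" and never has to split it into base and subdirectory.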
Re: [PATCH 02/13] cdrom: use new sysctl subdir helper register_sysctl_subdir()
Luis Chamberlain writes:
> This simplifies the code considerably. The following coccinelle

With register_sysctl the code would read:

	cdrom_sysctl_header = register_sysctl("dev/cdrom", cdrom_table);

Please go that direction.

Thank you.

Eric
Re: [PATCH 01/13] sysctl: add new register_sysctl_subdir() helper
ebied...@xmission.com (Eric W. Biederman) writes:
> Luis Chamberlain writes:
>
>> Often enough all we need to do is create a subdirectory so that
>> we can stuff sysctls underneath it. However, *if* that directory
>> was already created early on the boot sequence we really have no
>> need to use the full boiler plate code for it, we can just use
>> local variables to help us guide sysctl to place the new leaf files.
>>
>> So use a helper to do precisely this.
>
> Reset restart. This patch is total nonsense.
>
> - You are using register_sysctl_table which as I believe I have
>   mentioned is a deprecated compatibility wrapper. The point of
>   spring house cleaning is to get off of the deprecated functions
>   isn't it?
>
> - You are using the old nasty form for creating directories instead
>   of just passing in a path.
>
> - None of this is even remotely necessary. The directories
>   are created automatically if you just register their entries.

Oh. *blink* The poor naming threw me off.

This is a clumsy and poorly named version of register_sysctl();

Yes. This change is totally unnecessary.

Eric
Re: [PATCH 01/13] sysctl: add new register_sysctl_subdir() helper
Luis Chamberlain writes:
> Often enough all we need to do is create a subdirectory so that
> we can stuff sysctls underneath it. However, *if* that directory
> was already created early on the boot sequence we really have no
> need to use the full boiler plate code for it, we can just use
> local variables to help us guide sysctl to place the new leaf files.
>
> So use a helper to do precisely this.

Reset restart. This patch is total nonsense.

- You are using register_sysctl_table which as I believe I have
  mentioned is a deprecated compatibility wrapper. The point of
  spring house cleaning is to get off of the deprecated functions
  isn't it?

- You are using the old nasty form for creating directories instead
  of just passing in a path.

- None of this is even remotely necessary. The directories
  are created automatically if you just register their entries.

Eric
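The last point in the list above (directories appear automatically when entries are registered by path) can be pictured with a small toy model. The sketch below is not the kernel's implementation: struct dir, get_subdir() and register_path() are hypothetical names for this example, and it only shows the idea of walking a path such as "kernel/random" and creating whichever components are missing.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_CHILDREN 8
#define MAX_NAME 32

/* Toy directory node; nodes are never freed in this sketch. */
struct dir {
	char name[MAX_NAME];
	struct dir *children[MAX_CHILDREN];
	int nr_children;
};

/* Return the child called name[0..len), or NULL if there is none. */
static struct dir *find_child(struct dir *parent, const char *name, size_t len)
{
	for (int i = 0; i < parent->nr_children; i++) {
		struct dir *c = parent->children[i];

		if (strlen(c->name) == len && strncmp(c->name, name, len) == 0)
			return c;
	}
	return NULL;
}

/* Look up a child directory, creating it when it does not exist yet. */
static struct dir *get_subdir(struct dir *parent, const char *name, size_t len)
{
	struct dir *d = find_child(parent, name, len);

	if (d)
		return d;
	if (len == 0 || len >= MAX_NAME || parent->nr_children == MAX_CHILDREN)
		return NULL;
	d = calloc(1, sizeof(*d));
	if (!d)
		return NULL;
	memcpy(d->name, name, len);	/* calloc() left the NUL terminator */
	parent->children[parent->nr_children++] = d;
	return d;
}

/* Walk "a/b/c" from root, creating missing components; return the leaf. */
static struct dir *register_path(struct dir *root, const char *path)
{
	struct dir *cur = root;

	while (*path && cur) {
		const char *slash = strchr(path, '/');
		size_t len = slash ? (size_t)(slash - path) : strlen(path);

		cur = get_subdir(cur, path, len);
		path += len;
		if (*path == '/')
			path++;
	}
	return cur;
}
```

With this shape, registering "kernel/random" and then "kernel/firmware_config" shares the existing "kernel" node, which is why a caller does not need to know whether the base directory was created earlier.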
Re: [PATCH v3 3/3] arch, scripts: Add script to check relocations at compile time
On Sun, May 24, 2020 at 2:26 PM Alexandre Ghiti wrote:
>
> Relocating kernel at runtime is done very early in the boot process, so
> it is not convenient to check for relocations there and react in case a
> relocation was not expected.
>
> Powerpc architecture has a script that allows to check at compile time
> for such unexpected relocations: extract the common logic to scripts/
> and add arch specific scripts triggered at postlink.
>
> At the moment, powerpc and riscv architectures take advantage of this
> compile-time check.
>
> Signed-off-by: Alexandre Ghiti
> ---
>  arch/powerpc/tools/relocs_check.sh | 18 ++-
>  arch/riscv/Makefile.postlink | 36 ++
>  arch/riscv/tools/relocs_check.sh | 26 +
>  scripts/relocs_check.sh | 20 +
>  4 files changed, 84 insertions(+), 16 deletions(-)
>  create mode 100644 arch/riscv/Makefile.postlink
>  create mode 100755 arch/riscv/tools/relocs_check.sh
>  create mode 100755 scripts/relocs_check.sh

Maybe you should send the change to arch/powerpc/tools/relocs_check.sh as a separate patch so that it can be picked up by the arch/powerpc maintainers.

>
> diff --git a/arch/powerpc/tools/relocs_check.sh b/arch/powerpc/tools/relocs_check.sh
> index 014e00e74d2b..e367895941ae 100755
> --- a/arch/powerpc/tools/relocs_check.sh
> +++ b/arch/powerpc/tools/relocs_check.sh
> @@ -15,21 +15,8 @@ if [ $# -lt 3 ]; then
>  	exit 1
>  fi
>
> -# Have Kbuild supply the path to objdump and nm so we handle cross compilation.
> -objdump="$1"
> -nm="$2"
> -vmlinux="$3"
> -
> -# Remove from the bad relocations those that match an undefined weak symbol
> -# which will result in an absolute relocation to 0.
> -# Weak unresolved symbols are of that form in nm output:
> -# " w _binary__btf_vmlinux_bin_end"
> -undef_weak_symbols=$($nm "$vmlinux" | awk '$1 ~ /w/ { print $2 }')
> -
>  bad_relocs=$(
> -$objdump -R "$vmlinux" |
> -	# Only look at relocation lines.
> -	grep -E '\<R_' |
> +${srctree}/scripts/relocs_check.sh "$@" |
>  	# These relocations are okay
>  	# On PPC64:
>  	#	R_PPC64_RELATIVE, R_PPC64_NONE
> @@ -43,8 +30,7 @@ R_PPC_ADDR16_LO
>  R_PPC_ADDR16_HI
>  R_PPC_ADDR16_HA
>  R_PPC_RELATIVE
> -R_PPC_NONE' |
> -	([ "$undef_weak_symbols" ] && grep -F -w -v "$undef_weak_symbols" || cat)
> +R_PPC_NONE'
>  )
>
>  if [ -z "$bad_relocs" ]; then
> diff --git a/arch/riscv/Makefile.postlink b/arch/riscv/Makefile.postlink
> new file mode 100644
> index ..bf2b2bca1845
> --- /dev/null
> +++ b/arch/riscv/Makefile.postlink
> @@ -0,0 +1,36 @@
> +# SPDX-License-Identifier: GPL-2.0
> +# ===
> +# Post-link riscv pass
> +# ===
> +#
> +# Check that vmlinux relocations look sane
> +
> +PHONY := __archpost
> +__archpost:
> +
> +-include include/config/auto.conf
> +include scripts/Kbuild.include
> +
> +quiet_cmd_relocs_check = CHKREL $@
> +cmd_relocs_check = \
> +	$(CONFIG_SHELL) $(srctree)/arch/riscv/tools/relocs_check.sh "$(OBJDUMP)" "$(NM)" "$@"
> +
> +# `@true` prevents complaint when there is nothing to be done
> +
> +vmlinux: FORCE
> +	@true
> +ifdef CONFIG_RELOCATABLE
> +	$(call if_changed,relocs_check)
> +endif
> +
> +%.ko: FORCE
> +	@true
> +
> +clean:
> +	@true
> +
> +PHONY += FORCE clean
> +
> +FORCE:
> +
> +.PHONY: $(PHONY)
> diff --git a/arch/riscv/tools/relocs_check.sh b/arch/riscv/tools/relocs_check.sh
> new file mode 100755
> index ..baeb2e7b2290
> --- /dev/null
> +++ b/arch/riscv/tools/relocs_check.sh
> @@ -0,0 +1,26 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +# Based on powerpc relocs_check.sh
> +
> +# This script checks the relocations of a vmlinux for "suspicious"
> +# relocations.
> +
> +if [ $# -lt 3 ]; then
> +	echo "$0 [path to objdump] [path to nm] [path to vmlinux]" 1>&2
> +	exit 1
> +fi
> +
> +bad_relocs=$(
> +${srctree}/scripts/relocs_check.sh "$@" |
> +	# These relocations are okay
> +	#	R_RISCV_RELATIVE
> +	grep -F -w -v 'R_RISCV_RELATIVE'
> +)
> +
> +if [ -z "$bad_relocs" ]; then
> +	exit 0
> +fi
> +
> +num_bad=$(echo "$bad_relocs" | wc -l)
> +echo "WARNING: $num_bad bad relocations"
> +echo "$bad_relocs"
> diff --git a/scripts/relocs_check.sh b/scripts/relocs_check.sh
> new file mode 100755
> index ..137c660499f3
> --- /dev/null
> +++ b/scripts/relocs_check.sh
> @@ -0,0 +1,20 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +
> +# Get a list of all the relocations, remove from it the relocations
> +# that are known to be legitimate and return this list to arch specific
> +# script that will look for suspicious relocations.
> +
> +objdump="$1"
> +nm="$2"
> +vmlinux="$3"
> +
> +# Remove
Re: [PATCH v3 2/3] riscv: Introduce CONFIG_RELOCATABLE
On Sun, May 24, 2020 at 2:25 PM Alexandre Ghiti wrote: > > This config allows to compile the kernel as PIE and to relocate it at > any virtual address at runtime: this paves the way to KASLR and to 4-level > page table folding at runtime. Runtime relocation is possible since > relocation metadata are embedded into the kernel. > > Note that relocating at runtime introduces an overhead even if the > kernel is loaded at the same address it was linked at and that the compiler > options are those used in arm64 which uses the same RELA relocation > format. > > Signed-off-by: Alexandre Ghiti > --- > arch/riscv/Kconfig | 12 +++ > arch/riscv/Makefile | 5 ++- > arch/riscv/kernel/vmlinux.lds.S | 6 ++-- > arch/riscv/mm/Makefile | 4 +++ > arch/riscv/mm/init.c| 63 + > 5 files changed, 87 insertions(+), 3 deletions(-) > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > index a31e1a41913a..93127d5913fe 100644 > --- a/arch/riscv/Kconfig > +++ b/arch/riscv/Kconfig > @@ -170,6 +170,18 @@ config PGTABLE_LEVELS > default 3 if 64BIT > default 2 > > +config RELOCATABLE > + bool > + depends on MMU > + help > + This builds a kernel as a Position Independent Executable (PIE), > + which retains all relocation metadata required to relocate the > + kernel binary at runtime to a different virtual address than the > + address it was linked at. > + Since RISCV uses the RELA relocation format, this requires a > + relocation pass at runtime even if the kernel is loaded at the > + same address it was linked at. 
> + > source "arch/riscv/Kconfig.socs" > > menu "Platform type" > diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile > index fb6e37db836d..1406416ea743 100644 > --- a/arch/riscv/Makefile > +++ b/arch/riscv/Makefile > @@ -9,7 +9,10 @@ > # > > OBJCOPYFLAGS:= -O binary > -LDFLAGS_vmlinux := > +ifeq ($(CONFIG_RELOCATABLE),y) > +LDFLAGS_vmlinux := -shared -Bsymbolic -z notext -z norelro > +KBUILD_CFLAGS += -fPIE > +endif > ifeq ($(CONFIG_DYNAMIC_FTRACE),y) > LDFLAGS_vmlinux := --no-relax > endif > diff --git a/arch/riscv/kernel/vmlinux.lds.S b/arch/riscv/kernel/vmlinux.lds.S > index a9abde62909f..e8ffba8c2044 100644 > --- a/arch/riscv/kernel/vmlinux.lds.S > +++ b/arch/riscv/kernel/vmlinux.lds.S > @@ -85,8 +85,10 @@ SECTIONS > > BSS_SECTION(PAGE_SIZE, PAGE_SIZE, 0) > > - .rel.dyn : { > - *(.rel.dyn*) > + .rela.dyn : ALIGN(8) { > + __rela_dyn_start = .; > + *(.rela .rela*) > + __rela_dyn_end = .; > } > > _end = .; > diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile > index 363ef01c30b1..dc5cdaa80bc1 100644 > --- a/arch/riscv/mm/Makefile > +++ b/arch/riscv/mm/Makefile > @@ -1,6 +1,10 @@ > # SPDX-License-Identifier: GPL-2.0-only > > CFLAGS_init.o := -mcmodel=medany > +ifdef CONFIG_RELOCATABLE > +CFLAGS_init.o += -fno-pie > +endif > + > ifdef CONFIG_FTRACE > CFLAGS_REMOVE_init.o = -pg > endif > diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c > index 17f108baec4f..7074522d40c6 100644 > --- a/arch/riscv/mm/init.c > +++ b/arch/riscv/mm/init.c > @@ -13,6 +13,9 @@ > #include > #include > #include > +#ifdef CONFIG_RELOCATABLE > +#include > +#endif > > #include > #include > @@ -379,6 +382,53 @@ static uintptr_t __init best_map_size(phys_addr_t base, > phys_addr_t size) > #error "setup_vm() is called from head.S before relocate so it should not > use absolute addressing." 
> #endif > > +#ifdef CONFIG_RELOCATABLE > +extern unsigned long __rela_dyn_start, __rela_dyn_end; > + > +#ifdef CONFIG_64BIT > +#define Elf_Rela Elf64_Rela > +#define Elf_Addr Elf64_Addr > +#else > +#define Elf_Rela Elf32_Rela > +#define Elf_Addr Elf32_Addr > +#endif > + > +void __init relocate_kernel(uintptr_t load_pa) > +{ > + Elf_Rela *rela = (Elf_Rela *)&__rela_dyn_start; > + /* > +* This holds the offset between the linked virtual address and the > +* relocated virtual address. > +*/ > + uintptr_t reloc_offset = kernel_virt_addr - KERNEL_LINK_ADDR; > + /* > +* This holds the offset between kernel linked virtual address and > +* physical address. > +*/ > + uintptr_t va_kernel_link_pa_offset = KERNEL_LINK_ADDR - load_pa; > + > + for ( ; rela < (Elf_Rela *)&__rela_dyn_end; rela++) { > + Elf_Addr addr = (rela->r_offset - va_kernel_link_pa_offset); > + Elf_Addr relocated_addr = rela->r_addend; > + > + if (rela->r_info != R_RISCV_RELATIVE) > + continue; > + > + /* > +* Make sure to not relocate vdso symbols like rt_sigreturn > +* which are linked from the address 0 in vmlinux since > +* vdso symbol addresses are actually used as an
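The quoted patch is cut off before the store, so the following is not a reconstruction of relocate_kernel(). It is a generic userspace sketch of how RELA-style relative relocations are usually applied, assuming a 64-bit image held in a byte buffer: struct rela64 and apply_relative_relocs() are hypothetical names, and the sketch does no bounds checking and ignores the vdso special case the patch comment mentions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Relocation type number of R_RISCV_RELATIVE in the RISC-V ELF psABI. */
#define R_RISCV_RELATIVE 3

/* Same field layout as Elf64_Rela, redeclared so <elf.h> is not needed. */
struct rela64 {
	uint64_t r_offset;	/* link-time address of the word to patch */
	uint64_t r_info;	/* relocation type (symbol index is 0 here) */
	int64_t r_addend;	/* link-time value of the pointer */
};

/*
 * Patch every R_RISCV_RELATIVE entry so that it points into the image as
 * loaded at load_addr instead of link_addr. Returns the number of entries
 * applied. Other relocation types are skipped, as in the quoted loop.
 */
static size_t apply_relative_relocs(uint8_t *image, uint64_t link_addr,
				    uint64_t load_addr,
				    const struct rela64 *rela, size_t count)
{
	uint64_t delta = load_addr - link_addr;
	size_t applied = 0;

	for (size_t i = 0; i < count; i++) {
		uint64_t value;

		if (rela[i].r_info != R_RISCV_RELATIVE)
			continue;

		/* RELA keeps the addend in the table, not in the image. */
		value = (uint64_t)rela[i].r_addend + delta;
		memcpy(image + (rela[i].r_offset - link_addr), &value,
		       sizeof(value));
		applied++;
	}
	return applied;
}
```

Because the addend lives in the relocation table rather than in the patched word, the pass has to run even when delta is zero, which matches the Kconfig help text quoted above.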
Re: [PATCH 11/13] random: simplify sysctl declaration with register_sysctl_subdir()
On 2020/5/29 18:26, Greg KH wrote:
> On Fri, May 29, 2020 at 07:41:06AM +, Luis Chamberlain wrote:
>> From: Xiaoming Ni
>>
>> Move random_table sysctl from kernel/sysctl.c to drivers/char/random.c
>> and use register_sysctl_subdir() to help remove the clutter out of
>> kernel/sysctl.c.
>>
>> Signed-off-by: Xiaoming Ni
>> Signed-off-by: Luis Chamberlain
>> ---
>>  drivers/char/random.c | 14 --
>>  include/linux/sysctl.h | 1 -
>>  kernel/sysctl.c | 5 -
>>  3 files changed, 12 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/char/random.c b/drivers/char/random.c
>> index a7cf6aa65908..73fd4b6e9c18 100644
>> --- a/drivers/char/random.c
>> +++ b/drivers/char/random.c
>> @@ -2101,8 +2101,7 @@ static int proc_do_entropy(struct ctl_table *table, int write,
>>  }
>>
>>  static int sysctl_poolsize = INPUT_POOL_WORDS * 32;
>> -extern struct ctl_table random_table[];
>> -struct ctl_table random_table[] = {
>> +static struct ctl_table random_table[] = {
>>  	{
>>  		.procname = "poolsize",
>>  		.data = &sysctl_poolsize,
>> @@ -2164,6 +2163,17 @@ struct ctl_table random_table[] = {
>>  #endif
>>  	{ }
>>  };
>> +
>> +/*
>> + * rand_initialize() is called before sysctl_init(),
>> + * so we cannot call register_sysctl_init() in rand_initialize()
>> + */
>> +static int __init random_sysctls_init(void)
>> +{
>> +	register_sysctl_subdir("kernel", "random", random_table);
>
> No error checking? :(

It was my mistake, I forgot to add a comment here. As with the comment on register_sysctl_init(), a failure here is very unlikely during the system initialization phase, and a failure at this point does not affect the main functionality.

Thanks
Xiaoming Ni
Re: [PATCH 01/13] sysctl: add new register_sysctl_subdir() helper
On Fri, May 29, 2020 at 11:13:21AM +0300, Jani Nikula wrote:
> On Fri, 29 May 2020, Luis Chamberlain wrote:
> > Often enough all we need to do is create a subdirectory so that
> > we can stuff sysctls underneath it. However, *if* that directory
> > was already created early on the boot sequence we really have no
> > need to use the full boiler plate code for it, we can just use
> > local variables to help us guide sysctl to place the new leaf files.
>
> I find it hard to figure out the lifetime requirements for the tables
> passed in; when it's okay to use local variables and when you need
> longer lifetimes. It's not documented, everyone appears to be using
> static tables for this. It's far from obvious.

I agree 2000% that it is not obvious. What made me consider it was that I *knew* that the base directory would already exist, so it wouldn't make sense for the code to rely on earlier parts of a table if part of the hierarchy already existed.

In fact, a *huge* part of the due diligence on this and future series on this cleanup will be to be 100% sure that the base path is already created. And so this use is obviously dangerous, you just *need* to know that the base path is created before.

Non-posted changes also deal with link order to help address this in other places, given that link order controls how *initcalls() (early_initcall(), late_initcall(), etc) are ordered if you have multiple of these.

I had a link order series long ago which augmented our ability to make things clearer at a link order. Eventually I believe this will become more important, especially as we end up wanting to async more code.

For now, we can only rely on manual code inspection for ensuring proper ordering. Part of the implicit aspects of this cleanup is to slowly make these things clearer for each base path.

So... the "fs" base path will actually end up being created in fs/sysctl.c after we are *fully* done with the fs sysctl cleanups.

  Luis
Re: [PATCH 09/13] firmware_loader: simplify sysctl declaration with register_sysctl_subdir()
On 2020/5/29 18:26, Greg KH wrote:
> On Fri, May 29, 2020 at 07:41:04AM +, Luis Chamberlain wrote:
>> From: Xiaoming Ni
>>
>> Move the firmware config sysctl table to fallback_table.c and use the
>> new register_sysctl_subdir() helper. This removes the clutter from
>> kernel/sysctl.c.
>>
>> Signed-off-by: Xiaoming Ni
>> Signed-off-by: Luis Chamberlain
>> ---
>>  drivers/base/firmware_loader/fallback.c | 4
>>  drivers/base/firmware_loader/fallback.h | 11 ++
>>  drivers/base/firmware_loader/fallback_table.c | 22 +--
>>  include/linux/sysctl.h | 1 -
>>  kernel/sysctl.c | 7 --
>>  5 files changed, 35 insertions(+), 10 deletions(-)
>
> So it now takes more lines than the old stuff? :(

With CONFIG_FW_LOADER=m: before the cleanup, the sysctl interface was created whether or not the module was loaded. Now we need to add register and unregister interfaces, so the number of lines of code has increased.

>> diff --git a/drivers/base/firmware_loader/fallback.c b/drivers/base/firmware_loader/fallback.c
>> index d9ac7296205e..8190653ae9a3 100644
>> --- a/drivers/base/firmware_loader/fallback.c
>> +++ b/drivers/base/firmware_loader/fallback.c
>> @@ -200,12 +200,16 @@ static struct class firmware_class = {
>>  int register_sysfs_loader(void)
>>  {
>> +	int ret = register_firmware_config_sysctl();
>> +	if (ret != 0)
>> +		return ret;
>
> checkpatch :(

This is my fault, thanks for your guidance.

>>  	return class_register(&firmware_class);
>
> And if that fails?

Yes, it is better to call register_firmware_config_sysctl() after class_register(). Thanks for your guidance.

>>  }
>>
>>  void unregister_sysfs_loader(void)
>>  {
>>  	class_unregister(&firmware_class);
>> +	unregister_firmware_config_sysctl();
>>  }
>>
>>  static ssize_t firmware_loading_show(struct device *dev,
>> diff --git a/drivers/base/firmware_loader/fallback.h b/drivers/base/firmware_loader/fallback.h
>> index 06f4577733a8..7d2cb5f6ceb8 100644
>> --- a/drivers/base/firmware_loader/fallback.h
>> +++ b/drivers/base/firmware_loader/fallback.h
>> @@ -42,6 +42,17 @@ void fw_fallback_set_default_timeout(void);
>>  int register_sysfs_loader(void);
>>  void unregister_sysfs_loader(void);
>>
>> +#ifdef CONFIG_SYSCTL
>> +extern int register_firmware_config_sysctl(void);
>> +extern void unregister_firmware_config_sysctl(void);
>> +#else
>> +static inline int register_firmware_config_sysctl(void)
>> +{
>> +	return 0;
>> +}
>> +static inline void unregister_firmware_config_sysctl(void) { }
>> +#endif /* CONFIG_SYSCTL */
>> +
>>  #else /* CONFIG_FW_LOADER_USER_HELPER */
>>  static inline int firmware_fallback_sysfs(struct firmware *fw, const char *name,
>>  					  struct device *device,
>> diff --git a/drivers/base/firmware_loader/fallback_table.c b/drivers/base/firmware_loader/fallback_table.c
>> index 46a731dede6f..4234aa5ee5df 100644
>> --- a/drivers/base/firmware_loader/fallback_table.c
>> +++ b/drivers/base/firmware_loader/fallback_table.c
>> @@ -24,7 +24,7 @@ struct firmware_fallback_config fw_fallback_config = {
>>  EXPORT_SYMBOL_NS_GPL(fw_fallback_config, FIRMWARE_LOADER_PRIVATE);
>>
>>  #ifdef CONFIG_SYSCTL
>> -struct ctl_table firmware_config_table[] = {
>> +static struct ctl_table firmware_config_table[] = {
>>  	{
>>  		.procname = "force_sysfs_fallback",
>>  		.data = &fw_fallback_config.force_sysfs_fallback,
>> @@ -45,4 +45,22 @@ struct ctl_table firmware_config_table[] = {
>>  	},
>>  	{ }
>>  };
>> -#endif
>> +
>> +static struct ctl_table_header *hdr;
>> +int register_firmware_config_sysctl(void)
>> +{
>> +	if (hdr)
>> +		return -EEXIST;
>
> How can hdr be set?

It's my mistake: register_firmware_config_sysctl() is not exported, so there will be no repeated calls. Thanks for your guidance.

>> +	hdr = register_sysctl_subdir("kernel", "firmware_config",
>> +				     firmware_config_table);
>> +	if (!hdr)
>> +		return -ENOMEM;
>> +	return 0;
>> +}
>> +
>> +void unregister_firmware_config_sysctl(void)
>> +{
>> +	if (hdr)
>> +		unregister_sysctl_table(hdr);
>
> Why can't unregister_sysctl_table() take a null pointer value?

Sorry, I didn't notice that unregister_sysctl_table() already checks its input parameter. Thanks for your guidance.

> And what sets 'hdr' (worst name for a static variable) to NULL so that it
> knows not to be unregistered again as it looks like
> register_firmware_config_sysctl() could be called multiple times.

How about renaming hdr to firmware_config_sysct_table_header?

> +	if (hdr)
> +		return -EEXIST;

After deleting this code in register_firmware_config_sysctl(), and considering that register_firmware_config_sysctl() and unregister_firmware_config_sysctl() are not exported, is there still any need to add "hdr = NULL;"?

Thanks
Xiaoming Ni
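To make the review points above concrete, here is one possible shape for the register/unregister pair, written as a userspace mock rather than as the actual follow-up patch. It assumes, as stated in the thread, that unregister_sysctl_table() tolerates NULL. The variable name firmware_config_sysctl_header, mock_register() and the mock bodies are invented for this sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Placeholder for the kernel type; only tracks whether it is registered. */
struct ctl_table_header {
	int registered;
};

/* Mock registration backend; mock_fail simulates an allocation failure. */
static struct ctl_table_header mock_hdr;
static int mock_fail;

static struct ctl_table_header *mock_register(void)
{
	if (mock_fail)
		return NULL;
	mock_hdr.registered = 1;
	return &mock_hdr;
}

/* Mock of unregister_sysctl_table(): NULL is accepted and ignored. */
static void unregister_sysctl_table(struct ctl_table_header *header)
{
	if (!header)
		return;
	header->registered = 0;
}

/* Descriptive name instead of a bare 'hdr'. */
static struct ctl_table_header *firmware_config_sysctl_header;

static int register_firmware_config_sysctl(void)
{
	/* No -EEXIST check: the function has a single internal caller. */
	firmware_config_sysctl_header = mock_register();
	if (!firmware_config_sysctl_header)
		return -ENOMEM;
	return 0;
}

static void unregister_firmware_config_sysctl(void)
{
	/* No NULL check needed; the callee already handles it. */
	unregister_sysctl_table(firmware_config_sysctl_header);
	/* Resetting the pointer makes a second unregister call harmless. */
	firmware_config_sysctl_header = NULL;
}
```

Whether the final reset to NULL is needed is exactly the open question at the end of the mail. The sketch keeps it because it costs one line and makes a second unregister call harmless.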
Re: [PATCH 09/13] firmware_loader: simplify sysctl declaration with register_sysctl_subdir()
On Fri, May 29, 2020 at 12:26:13PM +0200, Greg KH wrote:
> On Fri, May 29, 2020 at 07:41:04AM +, Luis Chamberlain wrote:
> > From: Xiaoming Ni
> >
> > Move the firmware config sysctl table to fallback_table.c and use the
> > new register_sysctl_subdir() helper. This removes the clutter from
> > kernel/sysctl.c.
> >
> > Signed-off-by: Xiaoming Ni
> > Signed-off-by: Luis Chamberlain
> > ---
> >  drivers/base/firmware_loader/fallback.c | 4
> >  drivers/base/firmware_loader/fallback.h | 11 ++
> >  drivers/base/firmware_loader/fallback_table.c | 22 +--
> >  include/linux/sysctl.h | 1 -
> >  kernel/sysctl.c | 7 --
> >  5 files changed, 35 insertions(+), 10 deletions(-)
>
> So it now takes more lines than the old stuff? :(

Pretty much agreed with the other changes, thanks for the review!

But this diff-stat change, indeed, it is unfortunate that we end up with more code here than before. We'll try to reduce it instead somehow, however in some cases during this spring-cleaning, since the goal is to move code from one file to another, it *may* require more code. So it won't always be negative. But we'll try!

  Luis
Re: [Intel-gfx] [PATCH 06/13] ocfs2: use new sysctl subdir helper register_sysctl_subdir()
On Fri, May 29, 2020 at 01:23:19AM -0700, Kees Cook wrote: > On Fri, May 29, 2020 at 07:41:01AM +, Luis Chamberlain wrote: > > This simplifies the code considerably. The following coccinelle > > SmPL grammar rule was used to transform this code. > > > > // pycocci sysctl-subdir.cocci fs/ocfs2/stackglue.c > > > > @c1@ > > expression E1; > > identifier subdir, sysctls; > > @@ > > > > static struct ctl_table subdir[] = { > > { > > .procname = E1, > > .maxlen = 0, > > .mode = 0555, > > .child = sysctls, > > }, > > { } > > }; > > > > @c2@ > > identifier c1.subdir; > > > > expression E2; > > identifier base; > > @@ > > > > static struct ctl_table base[] = { > > { > > .procname = E2, > > .maxlen = 0, > > .mode = 0555, > > .child = subdir, > > }, > > { } > > }; > > > > @c3@ > > identifier c2.base; > > identifier header; > > @@ > > > > header = register_sysctl_table(base); > > > > @r1 depends on c1 && c2 && c3@ > > expression c1.E1; > > identifier c1.subdir, c1.sysctls; > > @@ > > > > -static struct ctl_table subdir[] = { > > - { > > - .procname = E1, > > - .maxlen = 0, > > - .mode = 0555, > > - .child = sysctls, > > - }, > > - { } > > -}; > > > > @r2 depends on c1 && c2 && c3@ > > identifier c1.subdir; > > > > expression c2.E2; > > identifier c2.base; > > @@ > > -static struct ctl_table base[] = { > > - { > > - .procname = E2, > > - .maxlen = 0, > > - .mode = 0555, > > - .child = subdir, > > - }, > > - { } > > -}; > > > > @r3 depends on c1 && c2 && c3@ > > expression c1.E1; > > identifier c1.sysctls; > > expression c2.E2; > > identifier c2.base; > > identifier c3.header; > > @@ > > > > header = > > -register_sysctl_table(base); > > +register_sysctl_subdir(E2, E1, sysctls); > > > > Generated-by: Coccinelle SmPL > > > > Signed-off-by: Luis Chamberlain > > --- > > fs/ocfs2/stackglue.c | 27 --- > > 1 file changed, 4 insertions(+), 23 deletions(-) > > > > diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c > > index a191094694c6..addafced7f59 100644 > > --- 
a/fs/ocfs2/stackglue.c > > +++ b/fs/ocfs2/stackglue.c > > @@ -677,28 +677,8 @@ static struct ctl_table ocfs2_mod_table[] = { > > }, > > { } > > }; > > - > > -static struct ctl_table ocfs2_kern_table[] = { > > - { > > - .procname = "ocfs2", > > - .data = NULL, > > - .maxlen = 0, > > - .mode = 0555, > > - .child = ocfs2_mod_table > > - }, > > - { } > > -}; > > - > > -static struct ctl_table ocfs2_root_table[] = { > > - { > > - .procname = "fs", > > - .data = NULL, > > - .maxlen = 0, > > - .mode = 0555, > > - .child = ocfs2_kern_table > > - }, > > - { } > > -}; > > + .data = NULL, > > + .data = NULL, > > The conversion script doesn't like the .data field assignments. ;) > > Was this series built with allmodconfig? I would have expected this to > blow up very badly. :) Yikes, sense, you're right. Nope, I left the random config tests to 0day. Will fix, thanks! Luis
Re: [PATCH] powerpc/64/syscall: Disable sanitisers for C syscall entry/exit code
Andrew Donnellan writes: > On 29/5/20 4:14 pm, Daniel Axtens wrote: >> syzkaller is picking up a bunch of crashes that look like this: >> >> Unrecoverable exception 380 at c037ed60 (msr=80001031) >> Oops: Unrecoverable exception, sig: 6 [#1] >> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries >> Modules linked in: >> CPU: 0 PID: 874 Comm: syz-executor.0 Not tainted >> 5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e #0 >> NIP: c037ed60 LR: c004bac8 CTR: c0030990 >> REGS: c000555a7230 TRAP: 0380 Not tainted >> (5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e) >> MSR: 80001031 CR: 48222882 XER: 2000 >> CFAR: c004bac4 IRQMASK: 0 >> GPR00: c004bb68 c000555a74c0 c24b3500 0005 >> GPR04: c004bb88 c0080091 >> GPR08: 000b c004bac8 00016000 c2503500 >> GPR12: c0030990 c319 106a5898 106a >> GPR16: 106a5890 c7a92000 c8180e00 c7a8f700 >> GPR20: c7a904b0 1011 c259d318 5deadbeef100 >> GPR24: 5deadbeef122 c00078422700 c9ee88b8 c00078422778 >> GPR28: 0001 8280b033 c000555a75a0 >> NIP [c037ed60] __sanitizer_cov_trace_pc+0x40/0x50 >> LR [c004bac8] interrupt_exit_kernel_prepare+0x118/0x310 >> Call Trace: >> [c000555a74c0] [c004bb68] >> interrupt_exit_kernel_prepare+0x1b8/0x310 (unreliable) >> [c000555a7530] [c000f9a8] interrupt_return+0x118/0x1c0 >> --- interrupt: 900 at __sanitizer_cov_trace_pc+0x0/0x50 >> .. >> >> That looks like the KCOV helper accessing memory that's not safe to >> access in the interrupt handling context. >> >> Do not instrument the new syscall entry/exit code with KCOV, GCOV or >> UBSAN. >> >> Cc: Nicholas Piggin >> Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic >> in C") >> Signed-off-by: Daniel Axtens > > This seems reasonable - I've verified that this does indeed suppress the > kcov trace calls. Thanks. > Acked-by: Andrew Donnellan > > (does this need to be tagged for stable? the Fixes: commit is in 5.6 but > we're at 5.7-rc7 at this point...) No. 
The Fixes: commit is based on v5.6-rc2, but it didn't go in until v5.7-rc1:

  $ git describe --contains --match 'v*' 68b34588e202
  v5.7-rc1~35^2~46

I plan to send it to Linus before the v5.7 release.

cheers
Re: [RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.
On 5/29/20 3:22 PM, Jan Kara wrote:
> Hi!
>
> On Fri 29-05-20 15:07:31, Aneesh Kumar K.V wrote:
>> Thanks Michal. I also missed Jeff in this email thread.
>
> And I think you'll also need some of the sched maintainers for the prctl
> bits...
>
>> On 5/29/20 3:03 PM, Michal Suchánek wrote:
>>> Adding Jan
>>>
>>> On Fri, May 29, 2020 at 11:11:39AM +0530, Aneesh Kumar K.V wrote:
>>>> With POWER10, architecture is adding new pmem flush and sync instructions.
>>>> The kernel should prevent the usage of MAP_SYNC if applications are not using
>>>> the new instructions on newer hardware.
>>>>
>>>> This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable
>>>> the usage of MAP_SYNC. The kernel config option is added to allow the user
>>>> to control whether MAP_SYNC should be enabled by default or not.
>>>>
>>>> Signed-off-by: Aneesh Kumar K.V
>
> ...
>
>>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>>> index 8c700f881d92..d5a9a363e81e 100644
>>>> --- a/kernel/fork.c
>>>> +++ b/kernel/fork.c
>>>> @@ -963,6 +963,12 @@ __cacheline_aligned_in_smp DEFINE_SPINLOCK(mmlist_lock);
>>>>   static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT;
>>>> +#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE
>>>> +unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK;
>>>> +#else
>>>> +unsigned long default_map_sync_mask = 0;
>>>> +#endif
>>>> +
>
> I'm not sure CONFIG is really the right approach here. For a distro that would
> basically mean to disable MAP_SYNC for all PPC kernels unless application
> explicitly uses the right prctl. Shouldn't we rather initialize
> default_map_sync_mask on boot based on whether the CPU we run on requires
> new flush instructions or not? Otherwise the patch looks sensible.

Yes, that is correct. We ideally want to deny MAP_SYNC only w.r.t. POWER10.
But on a virtualized platform there is no easy way to detect that. We could
ideally hook this into the nvdimm driver, where we look at the new compat
string ibm,persistent-memory-v2 and then disable MAP_SYNC if we find a
device with that specific value.

BTW, with the recent changes I posted for the nvdimm driver, an older kernel
won't initialize the persistent memory device on newer hardware. Newer
hardware will present the device to the OS with a different device tree
compat string.

My expectation w.r.t. this patch was that a distro would want to set
CONFIG_ARCH_MAP_SYNC_DISABLE=n based on the different application
certifications. Otherwise applications will end up having to call
prctl(MMF_DISABLE_MAP_SYNC, 0) anyway. If that is the case, should this be
dependent on P10?

With that, I am wondering whether we should even have this patch. Can we
expect userspace to get updated to use the new instructions?

With ppc64 we never had a real persistent memory device available for end
users to try. The available persistent memory stack was using vPMEM, which
was presented as a volatile memory region for which there is no need to use
any of the flush instructions. Could we safely assume that, as we get
applications certified/verified for working with pmem devices on ppc64,
they would all be using the new instructions?

-aneesh
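
For readers following the prctl discussion, the enable/disable semantics of the proposed option can be modelled in plain userspace C. This is only a sketch under stated assumptions: the bit number 27 and the inverted sense (a "disable" flag toggled by an "enable" prctl) come from the quoted patch, while `struct toy_mm`, `toy_set_map_sync_enable()` and `toy_get_map_sync_enable()` are hypothetical names, not kernel API, and the mmap_sem locking is left out.

```c
#include <assert.h>

/* Bit number taken from the quoted patch; everything else is a toy model. */
#define MMF_DISABLE_MAP_SYNC       27
#define MMF_DISABLE_MAP_SYNC_MASK  (1UL << MMF_DISABLE_MAP_SYNC)

/* Hypothetical stand-in for the flags word in struct mm_struct. */
struct toy_mm {
	unsigned long flags;
};

/* Models PR_SET_MAP_SYNC_ENABLE: a non-zero argument clears the disable bit. */
static void toy_set_map_sync_enable(struct toy_mm *mm, unsigned long enable)
{
	assert(mm != 0);
	if (enable)
		mm->flags &= ~MMF_DISABLE_MAP_SYNC_MASK;
	else
		mm->flags |= MMF_DISABLE_MAP_SYNC_MASK;
}

/* Models PR_GET_MAP_SYNC_ENABLE: returns 1 when MAP_SYNC is allowed. */
static int toy_get_map_sync_enable(const struct toy_mm *mm)
{
	assert(mm != 0);
	return !(mm->flags & MMF_DISABLE_MAP_SYNC_MASK);
}
```

Starting from a flags word with the disable bit set, calling the setter with a non-zero argument makes the getter report 1; calling it with 0 restores the disabled state.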
Re: [PATCH 11/13] random: simplify sysctl declaration with register_sysctl_subdir()
On Fri, May 29, 2020 at 07:41:06AM +0000, Luis Chamberlain wrote:
> From: Xiaoming Ni
>
> Move random_table sysctl from kernel/sysctl.c to drivers/char/random.c
> and use register_sysctl_subdir() to help remove the clutter out of
> kernel/sysctl.c.
>
> Signed-off-by: Xiaoming Ni
> Signed-off-by: Luis Chamberlain
> ---
>  drivers/char/random.c  | 14 ++++++++++++--
>  include/linux/sysctl.h |  1 -
>  kernel/sysctl.c        |  5 -----
>  3 files changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index a7cf6aa65908..73fd4b6e9c18 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -2101,8 +2101,7 @@ static int proc_do_entropy(struct ctl_table *table, int
> write,
>  }
>
>  static int sysctl_poolsize = INPUT_POOL_WORDS * 32;
> -extern struct ctl_table random_table[];
> -struct ctl_table random_table[] = {
> +static struct ctl_table random_table[] = {
>  	{
>  		.procname	= "poolsize",
>  		.data		= &sysctl_poolsize,
> @@ -2164,6 +2163,17 @@ struct ctl_table random_table[] = {
>  #endif
>  	{ }
>  };
> +
> +/*
> + * rand_initialize() is called before sysctl_init(),
> + * so we cannot call register_sysctl_init() in rand_initialize()
> + */
> +static int __init random_sysctls_init(void)
> +{
> +	register_sysctl_subdir("kernel", "random", random_table);

No error checking?  :(
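
The review comment above asks for the return value of the registration helper to be checked. A minimal userspace sketch of what that could look like follows. It assumes, as the sibling firmware_loader patch in this series does, that the helper returns NULL on failure and that -ENOMEM is an acceptable error to propagate; `toy_register_sysctl_subdir()`, `toy_fail_registration` and the cut-down structs are hypothetical stand-ins so the example compiles outside the kernel.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical cut-down types; the real kernel structs have more fields. */
struct ctl_table { const char *procname; };
struct ctl_table_header { struct ctl_table *table; };

static int toy_fail_registration; /* test knob: force the helper to fail */

/* Toy model of register_sysctl_subdir(): returns NULL on failure. */
static struct ctl_table_header *
toy_register_sysctl_subdir(const char *base, const char *subdir,
			   struct ctl_table *table)
{
	static struct ctl_table_header hdr;

	(void)base;
	(void)subdir;
	if (toy_fail_registration || !table->procname)
		return NULL;
	hdr.table = table;
	return &hdr;
}

static struct ctl_table random_table[] = {
	{ "poolsize" },
	{ NULL }	/* sentinel entry terminating the table */
};

/* The init function with the result checked instead of ignored. */
static int random_sysctls_init(void)
{
	if (!toy_register_sysctl_subdir("kernel", "random", random_table))
		return -ENOMEM;
	return 0;
}
```

Whether an initcall failure here should be fatal, logged, or ignored is a policy question the thread does not settle; the sketch only shows the mechanical check.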
Re: [PATCH 09/13] firmware_loader: simplify sysctl declaration with register_sysctl_subdir()
On Fri, May 29, 2020 at 07:41:04AM +0000, Luis Chamberlain wrote:
> From: Xiaoming Ni
>
> Move the firmware config sysctl table to fallback_table.c and use the
> new register_sysctl_subdir() helper. This removes the clutter from
> kernel/sysctl.c.
>
> Signed-off-by: Xiaoming Ni
> Signed-off-by: Luis Chamberlain
> ---
>  drivers/base/firmware_loader/fallback.c       |  4 ++++
>  drivers/base/firmware_loader/fallback.h       | 11 +++++++++++
>  drivers/base/firmware_loader/fallback_table.c | 22 ++++++++++++++++++++--
>  include/linux/sysctl.h                        |  1 -
>  kernel/sysctl.c                               |  7 -------
>  5 files changed, 35 insertions(+), 10 deletions(-)

So it now takes more lines than the old stuff?  :(

>
> diff --git a/drivers/base/firmware_loader/fallback.c
> b/drivers/base/firmware_loader/fallback.c
> index d9ac7296205e..8190653ae9a3 100644
> --- a/drivers/base/firmware_loader/fallback.c
> +++ b/drivers/base/firmware_loader/fallback.c
> @@ -200,12 +200,16 @@ static struct class firmware_class = {
>
>  int register_sysfs_loader(void)
>  {
> +	int ret = register_firmware_config_sysctl();
> +	if (ret != 0)
> +		return ret;

checkpatch :(

>  	return class_register(&firmware_class);

And if that fails?

>  }
>
>  void unregister_sysfs_loader(void)
>  {
>  	class_unregister(&firmware_class);
> +	unregister_firmware_config_sysctl();
>  }
>
>  static ssize_t firmware_loading_show(struct device *dev,
> diff --git a/drivers/base/firmware_loader/fallback.h
> b/drivers/base/firmware_loader/fallback.h
> index 06f4577733a8..7d2cb5f6ceb8 100644
> --- a/drivers/base/firmware_loader/fallback.h
> +++ b/drivers/base/firmware_loader/fallback.h
> @@ -42,6 +42,17 @@ void fw_fallback_set_default_timeout(void);
>
>  int register_sysfs_loader(void);
>  void unregister_sysfs_loader(void);
> +#ifdef CONFIG_SYSCTL
> +extern int register_firmware_config_sysctl(void);
> +extern void unregister_firmware_config_sysctl(void);
> +#else
> +static inline int register_firmware_config_sysctl(void)
> +{
> +	return 0;
> +}
> +static inline void unregister_firmware_config_sysctl(void) { }
> +#endif /* CONFIG_SYSCTL */
> +
>  #else /* CONFIG_FW_LOADER_USER_HELPER */
>  static inline int firmware_fallback_sysfs(struct firmware *fw, const char
> *name,
>  					  struct device *device,
> diff --git a/drivers/base/firmware_loader/fallback_table.c
> b/drivers/base/firmware_loader/fallback_table.c
> index 46a731dede6f..4234aa5ee5df 100644
> --- a/drivers/base/firmware_loader/fallback_table.c
> +++ b/drivers/base/firmware_loader/fallback_table.c
> @@ -24,7 +24,7 @@ struct firmware_fallback_config fw_fallback_config = {
>  EXPORT_SYMBOL_NS_GPL(fw_fallback_config, FIRMWARE_LOADER_PRIVATE);
>
>  #ifdef CONFIG_SYSCTL
> -struct ctl_table firmware_config_table[] = {
> +static struct ctl_table firmware_config_table[] = {
>  	{
>  		.procname	= "force_sysfs_fallback",
>  		.data		= &fw_fallback_config.force_sysfs_fallback,
> @@ -45,4 +45,22 @@ struct ctl_table firmware_config_table[] = {
>  	},
>  	{ }
>  };
> -#endif
> +
> +static struct ctl_table_header *hdr;
> +int register_firmware_config_sysctl(void)
> +{
> +	if (hdr)
> +		return -EEXIST;

How can hdr be set?

> +	hdr = register_sysctl_subdir("kernel", "firmware_config",
> +				     firmware_config_table);
> +	if (!hdr)
> +		return -ENOMEM;
> +	return 0;
> +}
> +
> +void unregister_firmware_config_sysctl(void)
> +{
> +	if (hdr)
> +		unregister_sysctl_table(hdr);

Why can't unregister_sysctl_table() take a null pointer value?

And what sets 'hdr' (worst name for a static variable) to NULL so that it
knows not to be unregistered again as it looks like
register_firmware_config_sysctl() could be called multiple times.

thanks,

greg k-h
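
The points raised in this review (unwinding when the second registration fails, resetting the stored handle after unregistering, and giving the static a descriptive name) can be illustrated with a small userspace model. This is a sketch, not the fix that was eventually applied: `toy_register_sysctl()`, `toy_unregister_sysctl()`, `toy_class_register()` and `firmware_config_sysctl_hdr` are hypothetical names, and the NULL-tolerant unregister is an assumption of the model.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical cut-down header type for the model. */
struct ctl_table_header { int registered; };

static struct ctl_table_header toy_storage;
static int toy_class_register_result; /* test knob: 0 or a negative errno */

static struct ctl_table_header *toy_register_sysctl(void)
{
	toy_storage.registered = 1;
	return &toy_storage;
}

/* Assumed to tolerate NULL, so callers need no guard of their own. */
static void toy_unregister_sysctl(struct ctl_table_header *h)
{
	if (h)
		h->registered = 0;
}

static int toy_class_register(void)
{
	return toy_class_register_result;
}

/* A descriptive name instead of a bare 'hdr'. */
static struct ctl_table_header *firmware_config_sysctl_hdr;

static int register_firmware_config_sysctl(void)
{
	firmware_config_sysctl_hdr = toy_register_sysctl();
	return firmware_config_sysctl_hdr ? 0 : -ENOMEM;
}

static void unregister_firmware_config_sysctl(void)
{
	toy_unregister_sysctl(firmware_config_sysctl_hdr);
	firmware_config_sysctl_hdr = NULL; /* a second call becomes a no-op */
}

static int register_sysfs_loader(void)
{
	int ret;

	ret = register_firmware_config_sysctl();
	if (ret)
		return ret;

	ret = toy_class_register();
	if (ret)
		unregister_firmware_config_sysctl(); /* unwind on failure */
	return ret;
}
```

With the handle cleared on unregister, calling the unregister path twice is harmless, and a failing class registration no longer leaves the sysctl entry behind.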
Re: [RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.
Hi!

On Fri 29-05-20 15:07:31, Aneesh Kumar K.V wrote:
> Thanks Michal. I also missed Jeff in this email thread.

And I think you'll also need some of the sched maintainers for the prctl
bits...

> On 5/29/20 3:03 PM, Michal Suchánek wrote:
> > Adding Jan
> >
> > On Fri, May 29, 2020 at 11:11:39AM +0530, Aneesh Kumar K.V wrote:
> > > With POWER10, architecture is adding new pmem flush and sync instructions.
> > > The kernel should prevent the usage of MAP_SYNC if applications are not using
> > > the new instructions on newer hardware.
> > >
> > > This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable
> > > the usage of MAP_SYNC. The kernel config option is added to allow the user
> > > to control whether MAP_SYNC should be enabled by default or not.
> > >
> > > Signed-off-by: Aneesh Kumar K.V

...

> > > diff --git a/kernel/fork.c b/kernel/fork.c
> > > index 8c700f881d92..d5a9a363e81e 100644
> > > --- a/kernel/fork.c
> > > +++ b/kernel/fork.c
> > > @@ -963,6 +963,12 @@ __cacheline_aligned_in_smp DEFINE_SPINLOCK(mmlist_lock);
> > >   static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT;
> > > +#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE
> > > +unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK;
> > > +#else
> > > +unsigned long default_map_sync_mask = 0;
> > > +#endif
> > > +

I'm not sure CONFIG is really the right approach here. For a distro that would
basically mean to disable MAP_SYNC for all PPC kernels unless application
explicitly uses the right prctl. Shouldn't we rather initialize
default_map_sync_mask on boot based on whether the CPU we run on requires
new flush instructions or not? Otherwise the patch looks sensible.

								Honza
-- 
Jan Kara
SUSE Labs, CR
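
The suggestion above, choosing the default at boot instead of at build time, can be sketched in userspace C. The constants and the `default_dump_filter | default_map_sync_mask` combination follow the quoted hunks; `toy_map_sync_init()` and `toy_initial_mm_flags()` are hypothetical helpers, and how the platform would actually report that it needs the new flush instructions (CPU feature bit, device tree compat string, or something else) is deliberately left as a boolean input.

```c
#include <stdbool.h>

/* Bit number taken from the quoted patch. */
#define MMF_DISABLE_MAP_SYNC       27
#define MMF_DISABLE_MAP_SYNC_MASK  (1UL << MMF_DISABLE_MAP_SYNC)

/* Default applied to new address spaces; 0 means MAP_SYNC stays enabled. */
static unsigned long default_map_sync_mask;

/*
 * Hypothetical boot-time hook replacing the #ifdef: the caller passes in
 * whether the platform requires the new flush instructions.
 */
static void toy_map_sync_init(bool needs_new_flush_insns)
{
	default_map_sync_mask = needs_new_flush_insns ?
				MMF_DISABLE_MAP_SYNC_MASK : 0;
}

/* Flags a fresh mm would start with, following the quoted mm_init() hunk. */
static unsigned long toy_initial_mm_flags(unsigned long default_dump_filter)
{
	return default_dump_filter | default_map_sync_mask;
}
```

The trade-off discussed in the thread still applies: a runtime decision is only as good as the detection behind it, which is the hard part on virtualized platforms.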
Re: [PATCH v4 6/7] KVM: MIPS: clean up redundant 'kvm_run' parameters
On 27/05/20 08:24, Tianjia Zhang wrote:
> Hi Huacai,
>
> These two patches(6/7 and 7/7) should be merged into the tree of the
> mips architecture separately. At present, there seems to be no good way
> to merge the whole architecture patchs.
>
> For this series of patches, some architectures have been merged, some
> need to update the patch.

Hi Tianjia,

I will take care of this during the merge window.

Thanks,

Paolo
Re: [RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.
Hi, Thanks Michal. I also missed Jeff in this email thread. -aneesh On 5/29/20 3:03 PM, Michal Suchánek wrote: Adding Jan On Fri, May 29, 2020 at 11:11:39AM +0530, Aneesh Kumar K.V wrote: With POWER10, architecture is adding new pmem flush and sync instructions. The kernel should prevent the usage of MAP_SYNC if applications are not using the new instructions on newer hardware. This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable the usage of MAP_SYNC. The kernel config option is added to allow the user to control whether MAP_SYNC should be enabled by default or not. Signed-off-by: Aneesh Kumar K.V --- include/linux/sched/coredump.h | 13 ++--- include/uapi/linux/prctl.h | 3 +++ kernel/fork.c | 8 +++- kernel/sys.c | 18 ++ mm/Kconfig | 3 +++ mm/mmap.c | 4 6 files changed, 45 insertions(+), 4 deletions(-) diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h index ecdc6542070f..9ba6b3d5f991 100644 --- a/include/linux/sched/coredump.h +++ b/include/linux/sched/coredump.h @@ -72,9 +72,16 @@ static inline int get_dumpable(struct mm_struct *mm) #define MMF_DISABLE_THP 24 /* disable THP for all VMAs */ #define MMF_OOM_VICTIM25 /* mm is the oom victim */ #define MMF_OOM_REAP_QUEUED 26 /* mm was queued for oom_reaper */ -#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP) +#define MMF_DISABLE_MAP_SYNC 27 /* disable THP for all VMAs */ +#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP) +#define MMF_DISABLE_MAP_SYNC_MASK (1 << MMF_DISABLE_MAP_SYNC) -#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\ -MMF_DISABLE_THP_MASK) +#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK | \ + MMF_DISABLE_THP_MASK | MMF_DISABLE_MAP_SYNC_MASK) + +static inline bool map_sync_enabled(struct mm_struct *mm) +{ + return !(mm->flags & MMF_DISABLE_MAP_SYNC_MASK); +} #endif /* _LINUX_SCHED_COREDUMP_H */ diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index 07b4f8131e36..ee4cde32d5cf 100644 --- 
a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -238,4 +238,7 @@ struct prctl_mm_map { #define PR_SET_IO_FLUSHER 57 #define PR_GET_IO_FLUSHER 58 +#define PR_SET_MAP_SYNC_ENABLE 59 +#define PR_GET_MAP_SYNC_ENABLE 60 + #endif /* _LINUX_PRCTL_H */ diff --git a/kernel/fork.c b/kernel/fork.c index 8c700f881d92..d5a9a363e81e 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -963,6 +963,12 @@ __cacheline_aligned_in_smp DEFINE_SPINLOCK(mmlist_lock); static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT; +#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE +unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK; +#else +unsigned long default_map_sync_mask = 0; +#endif + static int __init coredump_filter_setup(char *s) { default_dump_filter = @@ -1039,7 +1045,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, mm->flags = current->mm->flags & MMF_INIT_MASK; mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK; } else { - mm->flags = default_dump_filter; + mm->flags = default_dump_filter | default_map_sync_mask; mm->def_flags = 0; } diff --git a/kernel/sys.c b/kernel/sys.c index d325f3ab624a..f6127cf4128b 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -2450,6 +2450,24 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, clear_bit(MMF_DISABLE_THP, >mm->flags); up_write(>mm->mmap_sem); break; + + case PR_GET_MAP_SYNC_ENABLE: + if (arg2 || arg3 || arg4 || arg5) + return -EINVAL; + error = !test_bit(MMF_DISABLE_MAP_SYNC, >mm->flags); + break; + case PR_SET_MAP_SYNC_ENABLE: + if (arg3 || arg4 || arg5) + return -EINVAL; + if (down_write_killable(>mm->mmap_sem)) + return -EINTR; + if (arg2) + clear_bit(MMF_DISABLE_MAP_SYNC, >mm->flags); + else + set_bit(MMF_DISABLE_MAP_SYNC, >mm->flags); + up_write(>mm->mmap_sem); + break; + case PR_MPX_ENABLE_MANAGEMENT: case PR_MPX_DISABLE_MANAGEMENT: /* No longer implemented: */ diff --git a/mm/Kconfig b/mm/Kconfig index c1acc34c1c35..38fd7cfbfca8 100644 --- 
a/mm/Kconfig +++ b/mm/Kconfig @@ -867,4 +867,7 @@ config ARCH_HAS_HUGEPD config MAPPING_DIRTY_HELPERS bool +config ARCH_MAP_SYNC_DISABLE + bool + endmenu diff --git a/mm/mmap.c b/mm/mmap.c
Re: [RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.
Adding Jan On Fri, May 29, 2020 at 11:11:39AM +0530, Aneesh Kumar K.V wrote: > With POWER10, architecture is adding new pmem flush and sync instructions. > The kernel should prevent the usage of MAP_SYNC if applications are not using > the new instructions on newer hardware. > > This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable > the usage of MAP_SYNC. The kernel config option is added to allow the user > to control whether MAP_SYNC should be enabled by default or not. > > Signed-off-by: Aneesh Kumar K.V > --- > include/linux/sched/coredump.h | 13 ++--- > include/uapi/linux/prctl.h | 3 +++ > kernel/fork.c | 8 +++- > kernel/sys.c | 18 ++ > mm/Kconfig | 3 +++ > mm/mmap.c | 4 > 6 files changed, 45 insertions(+), 4 deletions(-) > > diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h > index ecdc6542070f..9ba6b3d5f991 100644 > --- a/include/linux/sched/coredump.h > +++ b/include/linux/sched/coredump.h > @@ -72,9 +72,16 @@ static inline int get_dumpable(struct mm_struct *mm) > #define MMF_DISABLE_THP 24 /* disable THP for all VMAs */ > #define MMF_OOM_VICTIM 25 /* mm is the oom victim */ > #define MMF_OOM_REAP_QUEUED 26 /* mm was queued for oom_reaper */ > -#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP) > +#define MMF_DISABLE_MAP_SYNC 27 /* disable THP for all VMAs */ > +#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP) > +#define MMF_DISABLE_MAP_SYNC_MASK(1 << MMF_DISABLE_MAP_SYNC) > > -#define MMF_INIT_MASK(MMF_DUMPABLE_MASK | > MMF_DUMP_FILTER_MASK |\ > - MMF_DISABLE_THP_MASK) > +#define MMF_INIT_MASK(MMF_DUMPABLE_MASK | > MMF_DUMP_FILTER_MASK | \ > + MMF_DISABLE_THP_MASK | MMF_DISABLE_MAP_SYNC_MASK) > + > +static inline bool map_sync_enabled(struct mm_struct *mm) > +{ > + return !(mm->flags & MMF_DISABLE_MAP_SYNC_MASK); > +} > > #endif /* _LINUX_SCHED_COREDUMP_H */ > diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h > index 07b4f8131e36..ee4cde32d5cf 100644 > --- a/include/uapi/linux/prctl.h > 
+++ b/include/uapi/linux/prctl.h > @@ -238,4 +238,7 @@ struct prctl_mm_map { > #define PR_SET_IO_FLUSHER57 > #define PR_GET_IO_FLUSHER58 > > +#define PR_SET_MAP_SYNC_ENABLE 59 > +#define PR_GET_MAP_SYNC_ENABLE 60 > + > #endif /* _LINUX_PRCTL_H */ > diff --git a/kernel/fork.c b/kernel/fork.c > index 8c700f881d92..d5a9a363e81e 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -963,6 +963,12 @@ __cacheline_aligned_in_smp DEFINE_SPINLOCK(mmlist_lock); > > static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT; > > +#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE > +unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK; > +#else > +unsigned long default_map_sync_mask = 0; > +#endif > + > static int __init coredump_filter_setup(char *s) > { > default_dump_filter = > @@ -1039,7 +1045,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, > struct task_struct *p, > mm->flags = current->mm->flags & MMF_INIT_MASK; > mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK; > } else { > - mm->flags = default_dump_filter; > + mm->flags = default_dump_filter | default_map_sync_mask; > mm->def_flags = 0; > } > > diff --git a/kernel/sys.c b/kernel/sys.c > index d325f3ab624a..f6127cf4128b 100644 > --- a/kernel/sys.c > +++ b/kernel/sys.c > @@ -2450,6 +2450,24 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, > arg2, unsigned long, arg3, > clear_bit(MMF_DISABLE_THP, >mm->flags); > up_write(>mm->mmap_sem); > break; > + > + case PR_GET_MAP_SYNC_ENABLE: > + if (arg2 || arg3 || arg4 || arg5) > + return -EINVAL; > + error = !test_bit(MMF_DISABLE_MAP_SYNC, >mm->flags); > + break; > + case PR_SET_MAP_SYNC_ENABLE: > + if (arg3 || arg4 || arg5) > + return -EINVAL; > + if (down_write_killable(>mm->mmap_sem)) > + return -EINTR; > + if (arg2) > + clear_bit(MMF_DISABLE_MAP_SYNC, >mm->flags); > + else > + set_bit(MMF_DISABLE_MAP_SYNC, >mm->flags); > + up_write(>mm->mmap_sem); > + break; > + > case PR_MPX_ENABLE_MANAGEMENT: > case PR_MPX_DISABLE_MANAGEMENT: > /* No longer 
implemented: */ > diff --git a/mm/Kconfig b/mm/Kconfig > index c1acc34c1c35..38fd7cfbfca8 100644 > --- a/mm/Kconfig > +++ b/mm/Kconfig > @@ -867,4 +867,7 @@ config ARCH_HAS_HUGEPD > config MAPPING_DIRTY_HELPERS > bool > > +config ARCH_MAP_SYNC_DISABLE > +
Re: [PATCH] powerpc/64/syscall: Disable sanitisers for C syscall entry/exit code
On 29/5/20 4:14 pm, Daniel Axtens wrote: syzkaller is picking up a bunch of crashes that look like this: Unrecoverable exception 380 at c037ed60 (msr=80001031) Oops: Unrecoverable exception, sig: 6 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 0 PID: 874 Comm: syz-executor.0 Not tainted 5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e #0 NIP: c037ed60 LR: c004bac8 CTR: c0030990 REGS: c000555a7230 TRAP: 0380 Not tainted (5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e) MSR: 80001031 CR: 48222882 XER: 2000 CFAR: c004bac4 IRQMASK: 0 GPR00: c004bb68 c000555a74c0 c24b3500 0005 GPR04: c004bb88 c0080091 GPR08: 000b c004bac8 00016000 c2503500 GPR12: c0030990 c319 106a5898 106a GPR16: 106a5890 c7a92000 c8180e00 c7a8f700 GPR20: c7a904b0 1011 c259d318 5deadbeef100 GPR24: 5deadbeef122 c00078422700 c9ee88b8 c00078422778 GPR28: 0001 8280b033 c000555a75a0 NIP [c037ed60] __sanitizer_cov_trace_pc+0x40/0x50 LR [c004bac8] interrupt_exit_kernel_prepare+0x118/0x310 Call Trace: [c000555a74c0] [c004bb68] interrupt_exit_kernel_prepare+0x1b8/0x310 (unreliable) [c000555a7530] [c000f9a8] interrupt_return+0x118/0x1c0 --- interrupt: 900 at __sanitizer_cov_trace_pc+0x0/0x50 .. That looks like the KCOV helper accessing memory that's not safe to access in the interrupt handling context. Do not instrument the new syscall entry/exit code with KCOV, GCOV or UBSAN. Cc: Nicholas Piggin Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C") Signed-off-by: Daniel Axtens This seems reasonable - I've verified that this does indeed suppress the kcov trace calls. Acked-by: Andrew Donnellan (does this need to be tagged for stable? the Fixes: commit is in 5.6 but we're at 5.7-rc7 at this point...) -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited
Re: [PATCH 06/13] ocfs2: use new sysctl subdir helper register_sysctl_subdir()
On Fri, May 29, 2020 at 07:41:01AM +, Luis Chamberlain wrote: > This simplifies the code considerably. The following coccinelle > SmPL grammar rule was used to transform this code. > > // pycocci sysctl-subdir.cocci fs/ocfs2/stackglue.c > > @c1@ > expression E1; > identifier subdir, sysctls; > @@ > > static struct ctl_table subdir[] = { > { > .procname = E1, > .maxlen = 0, > .mode = 0555, > .child = sysctls, > }, > { } > }; > > @c2@ > identifier c1.subdir; > > expression E2; > identifier base; > @@ > > static struct ctl_table base[] = { > { > .procname = E2, > .maxlen = 0, > .mode = 0555, > .child = subdir, > }, > { } > }; > > @c3@ > identifier c2.base; > identifier header; > @@ > > header = register_sysctl_table(base); > > @r1 depends on c1 && c2 && c3@ > expression c1.E1; > identifier c1.subdir, c1.sysctls; > @@ > > -static struct ctl_table subdir[] = { > - { > - .procname = E1, > - .maxlen = 0, > - .mode = 0555, > - .child = sysctls, > - }, > - { } > -}; > > @r2 depends on c1 && c2 && c3@ > identifier c1.subdir; > > expression c2.E2; > identifier c2.base; > @@ > -static struct ctl_table base[] = { > - { > - .procname = E2, > - .maxlen = 0, > - .mode = 0555, > - .child = subdir, > - }, > - { } > -}; > > @r3 depends on c1 && c2 && c3@ > expression c1.E1; > identifier c1.sysctls; > expression c2.E2; > identifier c2.base; > identifier c3.header; > @@ > > header = > -register_sysctl_table(base); > +register_sysctl_subdir(E2, E1, sysctls); > > Generated-by: Coccinelle SmPL > > Signed-off-by: Luis Chamberlain > --- > fs/ocfs2/stackglue.c | 27 --- > 1 file changed, 4 insertions(+), 23 deletions(-) > > diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c > index a191094694c6..addafced7f59 100644 > --- a/fs/ocfs2/stackglue.c > +++ b/fs/ocfs2/stackglue.c > @@ -677,28 +677,8 @@ static struct ctl_table ocfs2_mod_table[] = { > }, > { } > }; > - > -static struct ctl_table ocfs2_kern_table[] = { > - { > - .procname = "ocfs2", > - .data = NULL, > - .maxlen = 0, > - .mode = 
0555, > - .child = ocfs2_mod_table > - }, > - { } > -}; > - > -static struct ctl_table ocfs2_root_table[] = { > - { > - .procname = "fs", > - .data = NULL, > - .maxlen = 0, > - .mode = 0555, > - .child = ocfs2_kern_table > - }, > - { } > -}; > + .data = NULL, > + .data = NULL, The conversion script doesn't like the .data field assignments. ;) Was this series built with allmodconfig? I would have expected this to blow up very badly. :) -- Kees Cook
Re: [PATCH 12/13] sysctl: add helper to register empty subdir
On Fri, May 29, 2020 at 07:41:07AM +0000, Luis Chamberlain wrote:
> The way to create a subdirectory from the base set of directories
> is a bit obscure, so provide a helper which makes this clear, and
> also helps remove boiler plate code required to do this work.
>
> Signed-off-by: Luis Chamberlain

Reviewed-by: Kees Cook

-- 
Kees Cook
Re: [PATCH 13/13] fs: move binfmt_misc sysctl to its own file
On Fri, May 29, 2020 at 07:41:08AM +0000, Luis Chamberlain wrote:
> This moves the binfmt_misc sysctl to its own file to help remove
> clutter from kernel/sysctl.c.
>
> Signed-off-by: Luis Chamberlain
> ---
>  fs/binfmt_misc.c | 1 +
>  kernel/sysctl.c  | 7 -------
>  2 files changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
> index f69a043f562b..656b3f5f3bbf 100644
> --- a/fs/binfmt_misc.c
> +++ b/fs/binfmt_misc.c
> @@ -821,6 +821,7 @@ static int __init init_misc_binfmt(void)
>  	int err = register_filesystem(&bm_fs_type);
>  	if (!err)
>  		insert_binfmt(&misc_format);
> +	register_sysctl_empty_subdir("fs", "binfmt_misc");
>  	return err;

Nit: let's make the dir before registering the filesystem. I can't
imagine a realistic situation where userspace is reacting so fast it
would actually fail to mount the fs on /proc/sys/fs/binfmt_misc, but why
risk it?

-Kees

>  }
>
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 460532cd5ac8..7714e7b476c2 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -3042,13 +3042,6 @@ static struct ctl_table fs_table[] = {
>  		.extra1		= SYSCTL_ZERO,
>  		.extra2		= SYSCTL_TWO,
>  	},
> -#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
> -	{
> -		.procname	= "binfmt_misc",
> -		.mode		= 0555,
> -		.child		= sysctl_mount_point,
> -	},
> -#endif
>  	{
>  		.procname	= "pipe-max-size",
>  		.data		= &pipe_max_size,
> -- 
> 2.26.2
>

-- 
Kees Cook
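
The ordering nit above can be made concrete with a toy model that records the sequence of init steps. The three `toy_*` step functions are hypothetical stand-ins for `register_sysctl_empty_subdir()`, `register_filesystem()` and `insert_binfmt()`; the sketch only demonstrates the reordered control flow and says nothing about how the real functions behave.

```c
#include <string.h>

/* Records the order in which the toy steps ran, e.g. "DF" or "DFI". */
static char toy_order[8];
static int toy_register_filesystem_result; /* test knob: 0 or an error */

static void toy_note(char step)
{
	size_t len = strlen(toy_order);

	if (len + 1 < sizeof(toy_order)) {
		toy_order[len] = step;
		toy_order[len + 1] = '\0';
	}
}

/* 'D': create the mount point directory under /proc/sys/fs. */
static void toy_register_sysctl_empty_subdir(void)
{
	toy_note('D');
}

/* 'F': register the filesystem type; may fail. */
static int toy_register_filesystem(void)
{
	toy_note('F');
	return toy_register_filesystem_result;
}

/* 'I': hook the binary format handler in. */
static void toy_insert_binfmt(void)
{
	toy_note('I');
}

/* The init function with the directory created first, as suggested. */
static int toy_init_misc_binfmt(void)
{
	int err;

	toy_register_sysctl_empty_subdir();
	err = toy_register_filesystem();
	if (!err)
		toy_insert_binfmt();
	return err;
}
```

One consequence of this ordering, visible in the model, is that the directory exists even when filesystem registration fails; whether that is acceptable is not discussed in the thread.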
Re: [PATCH 01/13] sysctl: add new register_sysctl_subdir() helper
On Fri, 29 May 2020, Luis Chamberlain wrote:
> Often enough all we need to do is create a subdirectory so that
> we can stuff sysctls underneath it. However, *if* that directory
> was already created early on the boot sequence we really have no
> need to use the full boiler plate code for it, we can just use
> local variables to help us guide sysctl to place the new leaf files.

I find it hard to figure out the lifetime requirements for the tables
passed in; when it's okay to use local variables and when you need
longer lifetimes. It's not documented, everyone appears to be using
static tables for this. It's far from obvious.

BR,
Jani.

-- 
Jani Nikula, Intel Open Source Graphics Center
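
The lifetime question raised here comes down to which arguments a registration function copies and which it keeps a pointer to. The following is a generic C illustration of that distinction, not a description of what the sysctl core actually does: `toy_register()`, `struct toy_header` and `struct toy_table` are hypothetical, and the choice to copy the directory name while borrowing the leaf table is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_table { const char *procname; };

struct toy_header {
	char dir[32];                 /* copied: the caller's buffer may go away */
	const struct toy_table *leaf; /* borrowed: must outlive the header */
};

static struct toy_header toy_registered;

static struct toy_header *toy_register(const char *dir,
				       const struct toy_table *leaf)
{
	assert(dir != NULL && leaf != NULL);
	strncpy(toy_registered.dir, dir, sizeof(toy_registered.dir) - 1);
	toy_registered.dir[sizeof(toy_registered.dir) - 1] = '\0';
	toy_registered.leaf = leaf;
	return &toy_registered;
}

/* Static storage: valid for as long as the registration exists. */
static const struct toy_table toy_leaf[] = {
	{ "poolsize" },
	{ NULL }	/* sentinel */
};

static struct toy_header *toy_setup(void)
{
	char dir[32]; /* local: only has to live until toy_register() returns */

	strcpy(dir, "kernel/random");
	return toy_register(dir, toy_leaf);
}
```

In this model a local buffer is fine for the path because it is copied, while the leaf table must be static because only its address is stored. A documented contract of this kind is what the message above is asking for.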
[PATCH 13/13] fs: move binfmt_misc sysctl to its own file
This moves the binfmt_misc sysctl to its own file to help remove
clutter from kernel/sysctl.c.

Signed-off-by: Luis Chamberlain
---
 fs/binfmt_misc.c | 1 +
 kernel/sysctl.c  | 7 -------
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index f69a043f562b..656b3f5f3bbf 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -821,6 +821,7 @@ static int __init init_misc_binfmt(void)
 	int err = register_filesystem(&bm_fs_type);
 	if (!err)
 		insert_binfmt(&misc_format);
+	register_sysctl_empty_subdir("fs", "binfmt_misc");
 	return err;
 }
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 460532cd5ac8..7714e7b476c2 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -3042,13 +3042,6 @@ static struct ctl_table fs_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_TWO,
 	},
-#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
-	{
-		.procname	= "binfmt_misc",
-		.mode		= 0555,
-		.child		= sysctl_mount_point,
-	},
-#endif
 	{
 		.procname	= "pipe-max-size",
 		.data		= &pipe_max_size,
-- 
2.26.2
[PATCH 12/13] sysctl: add helper to register empty subdir
The way to create a subdirectory from the base set of directories is a bit obscure, so provide a helper which makes this clear, and also helps remove boiler plate code required to do this work. Signed-off-by: Luis Chamberlain --- include/linux/sysctl.h | 7 +++ kernel/sysctl.c| 16 +--- 2 files changed, 20 insertions(+), 3 deletions(-) diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 33a471b56345..89c92390e6de 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -208,6 +208,8 @@ extern void register_sysctl_init(const char *path, struct ctl_table *table, extern struct ctl_table_header *register_sysctl_subdir(const char *base, const char *subdir, struct ctl_table *table); +extern void register_sysctl_empty_subdir(const char *base, const char *subdir); + void do_sysctl_args(void); extern int pwrsw_enabled; @@ -231,6 +233,11 @@ inline struct ctl_table_header *register_sysctl_subdir(const char *base, return NULL; } +static inline void register_sysctl_empty_subdir(const char *base, + const char *subdir) +{ +} + static inline struct ctl_table_header *register_sysctl_paths( const struct ctl_path *path, struct ctl_table *table) { diff --git a/kernel/sysctl.c b/kernel/sysctl.c index f9a35325d5d5..460532cd5ac8 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -3188,13 +3188,17 @@ struct ctl_table_header *register_sysctl_subdir(const char *base, { } }; - if (!table->procname) + if (table != sysctl_mount_point && !table->procname) goto out; hdr = register_sysctl_table(base_table); if (unlikely(!hdr)) { - pr_err("failed when creating subdirectory sysctl %s/%s/%s\n", - base, subdir, table->procname); + if (table != sysctl_mount_point) + pr_err("failed when creating subdirectory sysctl %s/%s/%s\n", + base, subdir, table->procname); + else + pr_err("failed when creating empty subddirectory %s/%s\n", + base, subdir); goto out; } kmemleak_not_leak(hdr); @@ -3202,6 +3206,12 @@ struct ctl_table_header *register_sysctl_subdir(const char *base, return 
hdr; } EXPORT_SYMBOL_GPL(register_sysctl_subdir); + +void register_sysctl_empty_subdir(const char *base, + const char *subdir) +{ + register_sysctl_subdir(base, subdir, sysctl_mount_point); +} #endif /* CONFIG_SYSCTL */ /* * No sense putting this after each symbol definition, twice, -- 2.26.2
[PATCH 09/13] firmware_loader: simplify sysctl declaration with register_sysctl_subdir()
From: Xiaoming Ni Move the firmware config sysctl table to fallback_table.c and use the new register_sysctl_subdir() helper. This removes the clutter from kernel/sysctl.c. Signed-off-by: Xiaoming Ni Signed-off-by: Luis Chamberlain --- drivers/base/firmware_loader/fallback.c | 4 drivers/base/firmware_loader/fallback.h | 11 ++ drivers/base/firmware_loader/fallback_table.c | 22 +-- include/linux/sysctl.h| 1 - kernel/sysctl.c | 7 -- 5 files changed, 35 insertions(+), 10 deletions(-) diff --git a/drivers/base/firmware_loader/fallback.c b/drivers/base/firmware_loader/fallback.c index d9ac7296205e..8190653ae9a3 100644 --- a/drivers/base/firmware_loader/fallback.c +++ b/drivers/base/firmware_loader/fallback.c @@ -200,12 +200,16 @@ static struct class firmware_class = { int register_sysfs_loader(void) { + int ret = register_firmware_config_sysctl(); + if (ret != 0) + return ret; return class_register(_class); } void unregister_sysfs_loader(void) { class_unregister(_class); + unregister_firmware_config_sysctl(); } static ssize_t firmware_loading_show(struct device *dev, diff --git a/drivers/base/firmware_loader/fallback.h b/drivers/base/firmware_loader/fallback.h index 06f4577733a8..7d2cb5f6ceb8 100644 --- a/drivers/base/firmware_loader/fallback.h +++ b/drivers/base/firmware_loader/fallback.h @@ -42,6 +42,17 @@ void fw_fallback_set_default_timeout(void); int register_sysfs_loader(void); void unregister_sysfs_loader(void); +#ifdef CONFIG_SYSCTL +extern int register_firmware_config_sysctl(void); +extern void unregister_firmware_config_sysctl(void); +#else +static inline int register_firmware_config_sysctl(void) +{ + return 0; +} +static inline void unregister_firmware_config_sysctl(void) { } +#endif /* CONFIG_SYSCTL */ + #else /* CONFIG_FW_LOADER_USER_HELPER */ static inline int firmware_fallback_sysfs(struct firmware *fw, const char *name, struct device *device, diff --git a/drivers/base/firmware_loader/fallback_table.c b/drivers/base/firmware_loader/fallback_table.c index 
46a731dede6f..4234aa5ee5df 100644 --- a/drivers/base/firmware_loader/fallback_table.c +++ b/drivers/base/firmware_loader/fallback_table.c @@ -24,7 +24,7 @@ struct firmware_fallback_config fw_fallback_config = { EXPORT_SYMBOL_NS_GPL(fw_fallback_config, FIRMWARE_LOADER_PRIVATE); #ifdef CONFIG_SYSCTL -struct ctl_table firmware_config_table[] = { +static struct ctl_table firmware_config_table[] = { { .procname = "force_sysfs_fallback", .data = _fallback_config.force_sysfs_fallback, @@ -45,4 +45,22 @@ struct ctl_table firmware_config_table[] = { }, { } }; -#endif + +static struct ctl_table_header *hdr; +int register_firmware_config_sysctl(void) +{ + if (hdr) + return -EEXIST; + hdr = register_sysctl_subdir("kernel", "firmware_config", +firmware_config_table); + if (!hdr) + return -ENOMEM; + return 0; +} + +void unregister_firmware_config_sysctl(void) +{ + if (hdr) + unregister_sysctl_table(hdr); +} +#endif /* CONFIG_SYSCTL */ diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 58bc978d4f03..aa01f54d0442 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -217,7 +217,6 @@ extern int no_unaligned_warning; extern struct ctl_table sysctl_mount_point[]; extern struct ctl_table random_table[]; -extern struct ctl_table firmware_config_table[]; extern struct ctl_table epoll_table[]; #else /* CONFIG_SYSCTL */ diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 30c2d521502a..e007375c8a11 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2088,13 +2088,6 @@ static struct ctl_table kern_table[] = { .mode = 0555, .child = usermodehelper_table, }, -#ifdef CONFIG_FW_LOADER_USER_HELPER - { - .procname = "firmware_config", - .mode = 0555, - .child = firmware_config_table, - }, -#endif { .procname = "overflowuid", .data = , -- 2.26.2
[PATCH 11/13] random: simplify sysctl declaration with register_sysctl_subdir()
From: Xiaoming Ni

Move random_table sysctl from kernel/sysctl.c to drivers/char/random.c
and use register_sysctl_subdir() to help remove the clutter out of
kernel/sysctl.c.

Signed-off-by: Xiaoming Ni
Signed-off-by: Luis Chamberlain
---
 drivers/char/random.c  | 14 --
 include/linux/sysctl.h |  1 -
 kernel/sysctl.c        |  5 -
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index a7cf6aa65908..73fd4b6e9c18 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2101,8 +2101,7 @@ static int proc_do_entropy(struct ctl_table *table, int write,
 }
 
 static int sysctl_poolsize = INPUT_POOL_WORDS * 32;
-extern struct ctl_table random_table[];
-struct ctl_table random_table[] = {
+static struct ctl_table random_table[] = {
 	{
 		.procname	= "poolsize",
 		.data		= &sysctl_poolsize,
@@ -2164,6 +2163,17 @@ struct ctl_table random_table[] = {
 #endif
 	{ }
 };
+
+/*
+ * rand_initialize() is called before sysctl_init(),
+ * so we cannot call register_sysctl_init() in rand_initialize()
+ */
+static int __init random_sysctls_init(void)
+{
+	register_sysctl_subdir("kernel", "random", random_table);
+	return 0;
+}
+device_initcall(random_sysctls_init);
 #endif 	/* CONFIG_SYSCTL */
 
 struct batched_entropy {
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index e5364b69dd95..33a471b56345 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -216,7 +216,6 @@ extern int unaligned_dump_stack;
 extern int no_unaligned_warning;
 
 extern struct ctl_table sysctl_mount_point[];
-extern struct ctl_table random_table[];
 
 #else /* CONFIG_SYSCTL */
 static inline struct ctl_table_header *register_sysctl_table(struct ctl_table * table)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 5c116904feb7..f9a35325d5d5 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2078,11 +2078,6 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= sysctl_max_threads,
 	},
-	{
-		.procname	= "random",
-		.mode		= 0555,
-		.child		= random_table,
-	},
 	{
 		.procname	= "usermodehelper",
 		.mode		= 0555,
--
2.26.2
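The comment in the random patch above explains that registration is deferred to an initcall because rand_initialize() runs before sysctl_init(). A minimal userspace model of that ordering constraint is sketched below. All names are hypothetical (mock_*), and the model assumes only what the patch comment states, namely that the sysctl core must be initialized before a table can be registered.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*mock_initcall_t)(void);

static int mock_sysctl_ready;		/* set once the sysctl core is usable */
static int mock_random_registered;	/* set once the table is registered */

/* Early init: the sysctl core is not up yet, so nothing is registered. */
static int mock_rand_initialize(void)
{
	return 0;
}

/* Stand-in for the point at which the sysctl core becomes usable. */
static int mock_sysctl_init(void)
{
	mock_sysctl_ready = 1;
	return 0;
}

/* Deferred registration, playing the role of the device_initcall() hook. */
static int mock_random_sysctls_init(void)
{
	if (!mock_sysctl_ready)
		return -1;	/* called too early */
	mock_random_registered = 1;
	return 0;
}

/* Run the hooks in the order given and stop at the first failure. */
static int mock_run(const mock_initcall_t *calls, size_t n)
{
	assert(calls != NULL);
	for (size_t i = 0; i < n; i++) {
		int rc = calls[i]();

		if (rc)
			return rc;
	}
	return 0;
}
```

Running the registration hook before mock_sysctl_init() fails in this model, while the order early init, sysctl init, deferred hook succeeds. That is the reason the patch uses a separate initcall instead of registering from rand_initialize().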
[PATCH 10/13] eventpoll: simplify sysctl declaration with register_sysctl_subdir()
From: Xiaoming Ni

Move the epoll_table sysctl to fs/eventpoll.c and remove the clutter from
kernel/sysctl.c by using register_sysctl_subdir().

Signed-off-by: Xiaoming Ni
Signed-off-by: Luis Chamberlain
---
 fs/eventpoll.c         | 10 +-
 include/linux/poll.h   |  2 --
 include/linux/sysctl.h |  1 -
 kernel/sysctl.c        |  7 ---
 4 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 12eebcdea9c8..957ebc9700e3 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -299,7 +299,7 @@ static LIST_HEAD(tfile_check_list);
 static long long_zero;
 static long long_max = LONG_MAX;
 
-struct ctl_table epoll_table[] = {
+static struct ctl_table epoll_table[] = {
 	{
 		.procname	= "max_user_watches",
 		.data		= &max_user_watches,
@@ -311,6 +311,13 @@ struct ctl_table epoll_table[] = {
 	},
 	{ }
 };
+
+static void __init epoll_sysctls_init(void)
+{
+	register_sysctl_subdir("fs", "epoll", epoll_table);
+}
+#else
+#define epoll_sysctls_init() do { } while (0)
 #endif /* CONFIG_SYSCTL */
 
 static const struct file_operations eventpoll_fops;
@@ -2422,6 +2429,7 @@ static int __init eventpoll_init(void)
 	/* Allocates slab cache used to allocate "struct eppoll_entry" */
 	pwq_cache = kmem_cache_create("eventpoll_pwq",
 		sizeof(struct eppoll_entry), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL);
+	epoll_sysctls_init();
 
 	return 0;
 }
diff --git a/include/linux/poll.h b/include/linux/poll.h
index 1cdc32b1f1b0..a9e0e1c2d1f2 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -8,12 +8,10 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
 
-extern struct ctl_table epoll_table[]; /* for sysctl */
 /* ~832 bytes of stack space used max in sys_select/sys_poll before allocating
    additional memory. */
 #ifdef __clang__
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index aa01f54d0442..e5364b69dd95 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -217,7 +217,6 @@ extern int no_unaligned_warning;
 
 extern struct ctl_table sysctl_mount_point[];
 extern struct ctl_table random_table[];
-extern struct ctl_table epoll_table[];
 
 #else /* CONFIG_SYSCTL */
 static inline struct ctl_table_header *register_sysctl_table(struct ctl_table * table)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index e007375c8a11..5c116904feb7 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -3001,13 +3001,6 @@ static struct ctl_table fs_table[] = {
 		.proc_handler	= proc_dointvec,
 	},
 #endif
-#ifdef CONFIG_EPOLL
-	{
-		.procname	= "epoll",
-		.mode		= 0555,
-		.child		= epoll_table,
-	},
-#endif
 #endif
 	{
 		.procname	= "protected_symlinks",
--
2.26.2
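The eventpoll patch above uses a common kernel idiom: when the feature is compiled out, the init hook becomes an empty do { } while (0) macro, so the call site needs no #ifdef. A self-contained userspace sketch of the idiom follows. MOCK_CONFIG_SYSCTL and all mock_* names are hypothetical stand-ins for CONFIG_SYSCTL and the real functions.

```c
#include <assert.h>

/* Stand-in for CONFIG_SYSCTL; build with -DMOCK_CONFIG_SYSCTL=0 for the stub. */
#ifndef MOCK_CONFIG_SYSCTL
#define MOCK_CONFIG_SYSCTL 1
#endif

static int mock_registered;	/* counts registrations for the demo */

#if MOCK_CONFIG_SYSCTL
static void mock_epoll_sysctls_init(void)
{
	/* The real hook would call register_sysctl_subdir() here. */
	mock_registered++;
}
#else
/* Expands to one empty statement, so callers compile unchanged. */
#define mock_epoll_sysctls_init() do { } while (0)
#endif

/* The caller is identical in both configurations. */
static int mock_eventpoll_init(void)
{
	mock_epoll_sysctls_init();
	return 0;
}
```

The do { } while (0) form keeps the macro usable wherever a single statement followed by a semicolon is expected, for example as the body of an unbraced if.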
[PATCH 07/13] test_sysctl: use new sysctl subdir helper register_sysctl_subdir()
This simplifies the code considerably. The following coccinelle SmPL grammar rule was used to transform this code. // pycocci sysctl-subdir.cocci lib/test_sysctl.c @c1@ expression E1; identifier subdir, sysctls; @@ static struct ctl_table subdir[] = { { .procname = E1, .maxlen = 0, .mode = 0555, .child = sysctls, }, { } }; @c2@ identifier c1.subdir; expression E2; identifier base; @@ static struct ctl_table base[] = { { .procname = E2, .maxlen = 0, .mode = 0555, .child = subdir, }, { } }; @c3@ identifier c2.base; identifier header; @@ header = register_sysctl_table(base); @r1 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.subdir, c1.sysctls; @@ -static struct ctl_table subdir[] = { - { - .procname = E1, - .maxlen = 0, - .mode = 0555, - .child = sysctls, - }, - { } -}; @r2 depends on c1 && c2 && c3@ identifier c1.subdir; expression c2.E2; identifier c2.base; @@ -static struct ctl_table base[] = { - { - .procname = E2, - .maxlen = 0, - .mode = 0555, - .child = subdir, - }, - { } -}; @r3 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.sysctls; expression c2.E2; identifier c2.base; identifier c3.header; @@ header = -register_sysctl_table(base); +register_sysctl_subdir(E2, E1, sysctls); Generated-by: Coccinelle SmPL Signed-off-by: Luis Chamberlain --- lib/test_sysctl.c | 23 ++- 1 file changed, 2 insertions(+), 21 deletions(-) diff --git a/lib/test_sysctl.c b/lib/test_sysctl.c index 84eaae22d3a6..b17581307756 100644 --- a/lib/test_sysctl.c +++ b/lib/test_sysctl.c @@ -128,26 +128,6 @@ static struct ctl_table test_table[] = { { } }; -static struct ctl_table test_sysctl_table[] = { - { - .procname = "test_sysctl", - .maxlen = 0, - .mode = 0555, - .child = test_table, - }, - { } -}; - -static struct ctl_table test_sysctl_root_table[] = { - { - .procname = "debug", - .maxlen = 0, - .mode = 0555, - .child = test_sysctl_table, - }, - { } -}; - static struct ctl_table_header *test_sysctl_header; static int __init test_sysctl_init(void) @@ -155,7 +135,8 @@ 
static int __init test_sysctl_init(void) test_data.bitmap_0001 = kzalloc(SYSCTL_TEST_BITMAP_SIZE/8, GFP_KERNEL); if (!test_data.bitmap_0001) return -ENOMEM; - test_sysctl_header = register_sysctl_table(test_sysctl_root_table); + test_sysctl_header = register_sysctl_subdir("debug", "test_sysctl", + test_table); if (!test_sysctl_header) { kfree(test_data.bitmap_0001); return -ENOMEM; -- 2.26.2
[PATCH 08/13] inotify: simplify sysctl declaration with register_sysctl_subdir()
From: Xiaoming Ni move inotify_user sysctl to inotify_user.c and use the new register_sysctl_subdir() helper. Signed-off-by: Xiaoming Ni Signed-off-by: Luis Chamberlain --- fs/notify/inotify/inotify_user.c | 11 ++- include/linux/inotify.h | 3 --- kernel/sysctl.c | 11 --- 3 files changed, 10 insertions(+), 15 deletions(-) diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index f88bbcc9efeb..64859fbf8463 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -46,7 +46,7 @@ struct kmem_cache *inotify_inode_mark_cachep __read_mostly; #include -struct ctl_table inotify_table[] = { +static struct ctl_table inotify_table[] = { { .procname = "max_user_instances", .data = _user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES], @@ -73,6 +73,14 @@ struct ctl_table inotify_table[] = { }, { } }; + +static void __init inotify_sysctls_init(void) +{ + register_sysctl_subdir("fs", "inotify", inotify_table); +} + +#else +#define inotify_sysctls_init() do { } while (0) #endif /* CONFIG_SYSCTL */ static inline __u32 inotify_arg_to_mask(u32 arg) @@ -826,6 +834,7 @@ static int __init inotify_user_setup(void) inotify_max_queued_events = 16384; init_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES] = 128; init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = 8192; + inotify_sysctls_init(); return 0; } diff --git a/include/linux/inotify.h b/include/linux/inotify.h index 6a24905f6e1e..8d20caa1b268 100644 --- a/include/linux/inotify.h +++ b/include/linux/inotify.h @@ -7,11 +7,8 @@ #ifndef _LINUX_INOTIFY_H #define _LINUX_INOTIFY_H -#include #include -extern struct ctl_table inotify_table[]; /* for sysctl */ - #define ALL_INOTIFY_BITS (IN_ACCESS | IN_MODIFY | IN_ATTRIB | IN_CLOSE_WRITE | \ IN_CLOSE_NOWRITE | IN_OPEN | IN_MOVED_FROM | \ IN_MOVED_TO | IN_CREATE | IN_DELETE | \ diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 04ff032f2863..30c2d521502a 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -123,10 +123,6 @@ static const int 
maxolduid = 65535; static int ngroups_max = NGROUPS_MAX; static const int cap_last_cap = CAP_LAST_CAP; -#ifdef CONFIG_INOTIFY_USER -#include -#endif - #ifdef CONFIG_PROC_SYSCTL /** @@ -3012,13 +3008,6 @@ static struct ctl_table fs_table[] = { .proc_handler = proc_dointvec, }, #endif -#ifdef CONFIG_INOTIFY_USER - { - .procname = "inotify", - .mode = 0555, - .child = inotify_table, - }, -#endif #ifdef CONFIG_EPOLL { .procname = "epoll", -- 2.26.2
[PATCH 06/13] ocfs2: use new sysctl subdir helper register_sysctl_subdir()
This simplifies the code considerably. The following coccinelle SmPL grammar rule was used to transform this code. // pycocci sysctl-subdir.cocci fs/ocfs2/stackglue.c @c1@ expression E1; identifier subdir, sysctls; @@ static struct ctl_table subdir[] = { { .procname = E1, .maxlen = 0, .mode = 0555, .child = sysctls, }, { } }; @c2@ identifier c1.subdir; expression E2; identifier base; @@ static struct ctl_table base[] = { { .procname = E2, .maxlen = 0, .mode = 0555, .child = subdir, }, { } }; @c3@ identifier c2.base; identifier header; @@ header = register_sysctl_table(base); @r1 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.subdir, c1.sysctls; @@ -static struct ctl_table subdir[] = { - { - .procname = E1, - .maxlen = 0, - .mode = 0555, - .child = sysctls, - }, - { } -}; @r2 depends on c1 && c2 && c3@ identifier c1.subdir; expression c2.E2; identifier c2.base; @@ -static struct ctl_table base[] = { - { - .procname = E2, - .maxlen = 0, - .mode = 0555, - .child = subdir, - }, - { } -}; @r3 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.sysctls; expression c2.E2; identifier c2.base; identifier c3.header; @@ header = -register_sysctl_table(base); +register_sysctl_subdir(E2, E1, sysctls); Generated-by: Coccinelle SmPL Signed-off-by: Luis Chamberlain --- fs/ocfs2/stackglue.c | 27 --- 1 file changed, 4 insertions(+), 23 deletions(-) diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c index a191094694c6..addafced7f59 100644 --- a/fs/ocfs2/stackglue.c +++ b/fs/ocfs2/stackglue.c @@ -677,28 +677,8 @@ static struct ctl_table ocfs2_mod_table[] = { }, { } }; - -static struct ctl_table ocfs2_kern_table[] = { - { - .procname = "ocfs2", - .data = NULL, - .maxlen = 0, - .mode = 0555, - .child = ocfs2_mod_table - }, - { } -}; - -static struct ctl_table ocfs2_root_table[] = { - { - .procname = "fs", - .data = NULL, - .maxlen = 0, - .mode = 0555, - .child = ocfs2_kern_table - }, - { } -}; + .data = NULL, + .data = NULL, static struct ctl_table_header 
*ocfs2_table_header; @@ -711,7 +691,8 @@ static int __init ocfs2_stack_glue_init(void) { strcpy(cluster_stack_name, OCFS2_STACK_PLUGIN_O2CB); - ocfs2_table_header = register_sysctl_table(ocfs2_root_table); + ocfs2_table_header = register_sysctl_subdir("fs", "ocfs2", + ocfs2_mod_table); if (!ocfs2_table_header) { printk(KERN_ERR "ocfs2 stack glue: unable to register sysctl\n"); -- 2.26.2
[PATCH 01/13] sysctl: add new register_sysctl_subdir() helper
Often all we need to do is create a subdirectory so that we can place
sysctls underneath it. However, *if* that directory was already created
early in the boot sequence, there is no need for the full boilerplate:
local variables are enough to guide sysctl to where the new leaf files
belong. Add a helper that does precisely this.

Signed-off-by: Luis Chamberlain
---
 include/linux/sysctl.h | 11 +++
 kernel/sysctl.c        | 37 +
 2 files changed, 48 insertions(+)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index ddaa06ddd852..58bc978d4f03 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -205,6 +205,9 @@ void unregister_sysctl_table(struct ctl_table_header * table);
 extern int sysctl_init(void);
 extern void register_sysctl_init(const char *path, struct ctl_table *table,
 				 const char *table_name);
+extern struct ctl_table_header *register_sysctl_subdir(const char *base,
+						       const char *subdir,
+						       struct ctl_table *table);
 void do_sysctl_args(void);
 
 extern int pwrsw_enabled;
@@ -223,6 +226,14 @@ static inline struct ctl_table_header *register_sysctl_table(struct ctl_table *
 	return NULL;
 }
 
+static
+inline struct ctl_table_header *register_sysctl_subdir(const char *base,
+						       const char *subdir,
+						       struct ctl_table *table)
+{
+	return NULL;
+}
+
 static inline struct ctl_table_header *register_sysctl_paths(
 			const struct ctl_path *path, struct ctl_table *table)
 {
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 008ac0576ae5..04ff032f2863 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -3195,6 +3195,43 @@ void __init register_sysctl_init(const char *path, struct ctl_table *table,
 	}
 	kmemleak_not_leak(hdr);
 }
+
+struct ctl_table_header *register_sysctl_subdir(const char *base,
+						const char *subdir,
+						struct ctl_table *table)
+{
+	struct ctl_table_header *hdr = NULL;
+	struct ctl_table subdir_table[] = {
+		{
+			.procname	= subdir,
+			.mode		= 0555,
+			.child		= table,
+		},
+		{ }
+	};
+	struct ctl_table base_table[] = {
+		{
+			.procname	= base,
+			.mode		= 0555,
+			.child		= subdir_table,
+		},
+		{ }
+	};
+
+	if (!table->procname)
+		goto out;
+
+	hdr = register_sysctl_table(base_table);
+	if (unlikely(!hdr)) {
+		pr_err("failed when creating subdirectory sysctl %s/%s/%s\n",
+		       base, subdir, table->procname);
+		goto out;
+	}
+	kmemleak_not_leak(hdr);
+out:
+	return hdr;
+}
+EXPORT_SYMBOL_GPL(register_sysctl_subdir);
 #endif /* CONFIG_SYSCTL */
 /*
  * No sense putting this after each symbol definition, twice,
--
2.26.2
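To make the shape of the helper in the patch above easier to see outside the kernel, here is a userspace sketch of the same idea: two single-entry directory tables held in local variables wrap the caller's leaf table, giving a base/subdir/leaf path. The struct and every function name (mock_ctl_table, mock_first_path, mock_subdir_path) are hypothetical simplifications; the sketch only builds the path string and does not model how the kernel registers or copies tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct ctl_table (illustration only). */
struct mock_ctl_table {
	const char *procname;		/* NULL terminates a table */
	unsigned int mode;
	struct mock_ctl_table *child;	/* non-NULL for directory entries */
};

/*
 * Follow the first entry of each nested table and join the names with '/'.
 * Returns 0 on success, -1 on an empty table or a too-small buffer.
 */
static int mock_first_path(const struct mock_ctl_table *t, char *buf, size_t len)
{
	size_t used = 0;

	assert(buf != NULL);
	if (!t || !t->procname || len == 0)
		return -1;
	buf[0] = '\0';
	while (t && t->procname) {
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? "/" : "", t->procname);

		if (n < 0 || (size_t)n >= len - used)
			return -1;	/* output would be truncated */
		used += (size_t)n;
		t = t->child;
	}
	return 0;
}

/* Userspace analogue of the helper: wrap the leaves in two local tables. */
static int mock_subdir_path(const char *base, const char *subdir,
			    struct mock_ctl_table *leaves,
			    char *buf, size_t len)
{
	struct mock_ctl_table subdir_table[] = {
		{ .procname = subdir, .mode = 0555, .child = leaves },
		{ 0 }
	};
	struct mock_ctl_table base_table[] = {
		{ .procname = base, .mode = 0555, .child = subdir_table },
		{ 0 }
	};

	if (!leaves || !leaves->procname)
		return -1;	/* nothing to register, as in the patch */
	return mock_first_path(base_table, buf, len);
}
```

Called with "dev", "cdrom" and a leaf table whose first entry is named "info", the sketch produces the path dev/cdrom/info. This is the same layout the converted drivers previously spelled out with two static wrapper tables each.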
[PATCH 05/13] macintosh/mac_hid.c: use new sysctl subdir helper register_sysctl_subdir()
This simplifies the code considerably. The following coccinelle SmPL grammar rule was used to transform this code. // pycocci sysctl-subdir.cocci drivers/macintosh/mac_hid.c @c1@ expression E1; identifier subdir, sysctls; @@ static struct ctl_table subdir[] = { { .procname = E1, .maxlen = 0, .mode = 0555, .child = sysctls, }, { } }; @c2@ identifier c1.subdir; expression E2; identifier base; @@ static struct ctl_table base[] = { { .procname = E2, .maxlen = 0, .mode = 0555, .child = subdir, }, { } }; @c3@ identifier c2.base; identifier header; @@ header = register_sysctl_table(base); @r1 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.subdir, c1.sysctls; @@ -static struct ctl_table subdir[] = { - { - .procname = E1, - .maxlen = 0, - .mode = 0555, - .child = sysctls, - }, - { } -}; @r2 depends on c1 && c2 && c3@ identifier c1.subdir; expression c2.E2; identifier c2.base; @@ -static struct ctl_table base[] = { - { - .procname = E2, - .maxlen = 0, - .mode = 0555, - .child = subdir, - }, - { } -}; @r3 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.sysctls; expression c2.E2; identifier c2.base; identifier c3.header; @@ header = -register_sysctl_table(base); +register_sysctl_subdir(E2, E1, sysctls); Generated-by: Coccinelle SmPL Signed-off-by: Luis Chamberlain --- drivers/macintosh/mac_hid.c | 25 ++--- 1 file changed, 2 insertions(+), 23 deletions(-) diff --git a/drivers/macintosh/mac_hid.c b/drivers/macintosh/mac_hid.c index 28b8581b44dd..736d0e151716 100644 --- a/drivers/macintosh/mac_hid.c +++ b/drivers/macintosh/mac_hid.c @@ -239,33 +239,12 @@ static struct ctl_table mac_hid_files[] = { { } }; -/* dir in /proc/sys/dev */ -static struct ctl_table mac_hid_dir[] = { - { - .procname = "mac_hid", - .maxlen = 0, - .mode = 0555, - .child = mac_hid_files, - }, - { } -}; - -/* /proc/sys/dev itself, in case that is not there yet */ -static struct ctl_table mac_hid_root_dir[] = { - { - .procname = "dev", - .maxlen = 0, - .mode = 0555, - .child = 
mac_hid_dir, - }, - { } -}; - static struct ctl_table_header *mac_hid_sysctl_header; static int __init mac_hid_init(void) { - mac_hid_sysctl_header = register_sysctl_table(mac_hid_root_dir); + mac_hid_sysctl_header = register_sysctl_subdir("dev", "mac_hid", + mac_hid_files); if (!mac_hid_sysctl_header) return -ENOMEM; -- 2.26.2
[PATCH 04/13] i915: use new sysctl subdir helper register_sysctl_subdir()
This simplifies the code considerably. The following coccinelle SmPL grammar rule was used to transform this code. // pycocci sysctl-subdir.cocci drivers/gpu/drm/i915/i915_perf.c @c1@ expression E1; identifier subdir, sysctls; @@ static struct ctl_table subdir[] = { { .procname = E1, .maxlen = 0, .mode = 0555, .child = sysctls, }, { } }; @c2@ identifier c1.subdir; expression E2; identifier base; @@ static struct ctl_table base[] = { { .procname = E2, .maxlen = 0, .mode = 0555, .child = subdir, }, { } }; @c3@ identifier c2.base; identifier header; @@ header = register_sysctl_table(base); @r1 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.subdir, c1.sysctls; @@ -static struct ctl_table subdir[] = { - { - .procname = E1, - .maxlen = 0, - .mode = 0555, - .child = sysctls, - }, - { } -}; @r2 depends on c1 && c2 && c3@ identifier c1.subdir; expression c2.E2; identifier c2.base; @@ -static struct ctl_table base[] = { - { - .procname = E2, - .maxlen = 0, - .mode = 0555, - .child = subdir, - }, - { } -}; @r3 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.sysctls; expression c2.E2; identifier c2.base; identifier c3.header; @@ header = -register_sysctl_table(base); +register_sysctl_subdir(E2, E1, sysctls); Generated-by: Coccinelle SmPL Signed-off-by: Luis Chamberlain --- drivers/gpu/drm/i915/i915_perf.c | 22 +- 1 file changed, 1 insertion(+), 21 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 665bb076e84d..52509b573794 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -4203,26 +4203,6 @@ static struct ctl_table oa_table[] = { {} }; -static struct ctl_table i915_root[] = { - { -.procname = "i915", -.maxlen = 0, -.mode = 0555, -.child = oa_table, -}, - {} -}; - -static struct ctl_table dev_root[] = { - { -.procname = "dev", -.maxlen = 0, -.mode = 0555, -.child = i915_root, -}, - {} -}; - /** * i915_perf_init - initialize i915-perf state on module bind * @i915: i915 
device instance @@ -4383,7 +4363,7 @@ static int destroy_config(int id, void *p, void *data) void i915_perf_sysctl_register(void) { - sysctl_header = register_sysctl_table(dev_root); + sysctl_header = register_sysctl_subdir("dev", "i915", oa_table); } void i915_perf_sysctl_unregister(void) -- 2.26.2
[PATCH 02/13] cdrom: use new sysctl subdir helper register_sysctl_subdir()
This simplifies the code considerably. The following coccinelle SmPL grammar rule was used to transform this code. // pycocci sysctl-subdir.cocci drivers/cdrom/cdrom.c @c1@ expression E1; identifier subdir, sysctls; @@ static struct ctl_table subdir[] = { { .procname = E1, .maxlen = 0, .mode = 0555, .child = sysctls, }, { } }; @c2@ identifier c1.subdir; expression E2; identifier base; @@ static struct ctl_table base[] = { { .procname = E2, .maxlen = 0, .mode = 0555, .child = subdir, }, { } }; @c3@ identifier c2.base; identifier header; @@ header = register_sysctl_table(base); @r1 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.subdir, c1.sysctls; @@ -static struct ctl_table subdir[] = { - { - .procname = E1, - .maxlen = 0, - .mode = 0555, - .child = sysctls, - }, - { } -}; @r2 depends on c1 && c2 && c3@ identifier c1.subdir; expression c2.E2; identifier c2.base; @@ -static struct ctl_table base[] = { - { - .procname = E2, - .maxlen = 0, - .mode = 0555, - .child = subdir, - }, - { } -}; @r3 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.sysctls; expression c2.E2; identifier c2.base; identifier c3.header; @@ header = -register_sysctl_table(base); +register_sysctl_subdir(E2, E1, sysctls); Generated-by: Coccinelle SmPL Signed-off-by: Luis Chamberlain --- drivers/cdrom/cdrom.c | 23 ++- 1 file changed, 2 insertions(+), 21 deletions(-) diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c index a0a7ae705de8..3c638f464cef 100644 --- a/drivers/cdrom/cdrom.c +++ b/drivers/cdrom/cdrom.c @@ -3719,26 +3719,6 @@ static struct ctl_table cdrom_table[] = { { } }; -static struct ctl_table cdrom_cdrom_table[] = { - { - .procname = "cdrom", - .maxlen = 0, - .mode = 0555, - .child = cdrom_table, - }, - { } -}; - -/* Make sure that /proc/sys/dev is there */ -static struct ctl_table cdrom_root_table[] = { - { - .procname = "dev", - .maxlen = 0, - .mode = 0555, - .child = cdrom_cdrom_table, - }, - { } -}; static struct ctl_table_header *cdrom_sysctl_header; 
static void cdrom_sysctl_register(void) @@ -3748,7 +3728,8 @@ static void cdrom_sysctl_register(void) if (!atomic_add_unless(, 1, 1)) return; - cdrom_sysctl_header = register_sysctl_table(cdrom_root_table); + cdrom_sysctl_header = register_sysctl_subdir("dev", "cdrom", +cdrom_table); /* set the defaults */ cdrom_sysctl_settings.autoclose = autoclose; -- 2.26.2
[PATCH 03/13] hpet: use new sysctl subdir helper register_sysctl_subdir()
This simplifies the code considerably. The following coccinelle SmPL grammar rule was used to transform this code. // pycocci sysctl-subdir.cocci drivers/char/hpet.c @c1@ expression E1; identifier subdir, sysctls; @@ static struct ctl_table subdir[] = { { .procname = E1, .maxlen = 0, .mode = 0555, .child = sysctls, }, { } }; @c2@ identifier c1.subdir; expression E2; identifier base; @@ static struct ctl_table base[] = { { .procname = E2, .maxlen = 0, .mode = 0555, .child = subdir, }, { } }; @c3@ identifier c2.base; identifier header; @@ header = register_sysctl_table(base); @r1 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.subdir, c1.sysctls; @@ -static struct ctl_table subdir[] = { - { - .procname = E1, - .maxlen = 0, - .mode = 0555, - .child = sysctls, - }, - { } -}; @r2 depends on c1 && c2 && c3@ identifier c1.subdir; expression c2.E2; identifier c2.base; @@ -static struct ctl_table base[] = { - { - .procname = E2, - .maxlen = 0, - .mode = 0555, - .child = subdir, - }, - { } -}; @r3 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.sysctls; expression c2.E2; identifier c2.base; identifier c3.header; @@ header = -register_sysctl_table(base); +register_sysctl_subdir(E2, E1, sysctls); Generated-by: Coccinelle SmPL Signed-off-by: Luis Chamberlain --- drivers/char/hpet.c | 22 +- 1 file changed, 1 insertion(+), 21 deletions(-) diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c index ed3b7dab678d..169c970d5ff8 100644 --- a/drivers/char/hpet.c +++ b/drivers/char/hpet.c @@ -746,26 +746,6 @@ static struct ctl_table hpet_table[] = { {} }; -static struct ctl_table hpet_root[] = { - { -.procname = "hpet", -.maxlen = 0, -.mode = 0555, -.child = hpet_table, -}, - {} -}; - -static struct ctl_table dev_root[] = { - { -.procname = "dev", -.maxlen = 0, -.mode = 0555, -.child = hpet_root, -}, - {} -}; - static struct ctl_table_header *sysctl_header; /* @@ -1059,7 +1039,7 @@ static int __init hpet_init(void) if (result < 0) return -ENODEV; - sysctl_header = 
register_sysctl_table(dev_root); + sysctl_header = register_sysctl_subdir("dev", "hpet", hpet_table); result = acpi_bus_register_driver(_acpi_driver); if (result < 0) { -- 2.26.2
[PATCH 00/13] sysctl: spring cleaning
Xiaoming and I are working on some kernel/sysctl.c spring cleaning.
During a recent linux-next merge conflict it became clear that the
kitchen sink in kernel/sysctl.c creates too many conflicts, so we need to
stop stuffing everyone's knobs into this one file. This series is part of
that work. It is not expected to get merged yet, but since our delta is
pretty considerable at this point, we need to split it up piecemeal and
collect reviews for what we have so far. This follows up on some of
Xiaoming's recent work.

This series focuses on a new helper to deal with subdirectories and empty
subdirectories. The terminology we will use is that things like "fs",
"kernel" and "debug" are base directories, and directories underneath
them are subdirectories.

In this case, the cleanup also ends up trimming the amount of code we
have for sysctls.

If this seems reasonable we'll kdocify this a bit too.

This code has been boot tested without issues, and I'm letting 0day do
its thing to test against many kconfig builds. If you spot any issues,
however, please let us know.
Luis Chamberlain (9):
  sysctl: add new register_sysctl_subdir() helper
  cdrom: use new sysctl subdir helper register_sysctl_subdir()
  hpet: use new sysctl subdir helper register_sysctl_subdir()
  i915: use new sysctl subdir helper register_sysctl_subdir()
  macintosh/mac_hid.c: use new sysctl subdir helper register_sysctl_subdir()
  ocfs2: use new sysctl subdir helper register_sysctl_subdir()
  test_sysctl: use new sysctl subdir helper register_sysctl_subdir()
  sysctl: add helper to register empty subdir
  fs: move binfmt_misc sysctl to its own file

Xiaoming Ni (4):
  inotify: simplify sysctl declaration with register_sysctl_subdir()
  firmware_loader: simplify sysctl declaration with register_sysctl_subdir()
  eventpoll: simplify sysctl declaration with register_sysctl_subdir()
  random: simplify sysctl declaration with register_sysctl_subdir()

 drivers/base/firmware_loader/fallback.c       |  4 +
 drivers/base/firmware_loader/fallback.h       | 11 +++
 drivers/base/firmware_loader/fallback_table.c | 22 -
 drivers/cdrom/cdrom.c                         | 23 +
 drivers/char/hpet.c                           | 22 +
 drivers/char/random.c                         | 14 +++-
 drivers/gpu/drm/i915/i915_perf.c              | 22 +
 drivers/macintosh/mac_hid.c                   | 25 +-
 fs/binfmt_misc.c                              |  1 +
 fs/eventpoll.c                                | 10 ++-
 fs/notify/inotify/inotify_user.c              | 11 ++-
 fs/ocfs2/stackglue.c                          | 27 +-
 include/linux/inotify.h                       |  3 -
 include/linux/poll.h                          |  2 -
 include/linux/sysctl.h                        | 21 -
 kernel/sysctl.c                               | 84 +++
 lib/test_sysctl.c                             | 23 +
 17 files changed, 144 insertions(+), 181 deletions(-)

--
2.26.2
[PATCH] powerpc/64/syscall: Disable sanitisers for C syscall entry/exit code
syzkaller is picking up a bunch of crashes that look like this:

Unrecoverable exception 380 at c037ed60 (msr=80001031)
Oops: Unrecoverable exception, sig: 6 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 0 PID: 874 Comm: syz-executor.0 Not tainted 5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e #0
NIP: c037ed60 LR: c004bac8 CTR: c0030990
REGS: c000555a7230 TRAP: 0380 Not tainted (5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e)
MSR: 80001031 CR: 48222882 XER: 2000
CFAR: c004bac4 IRQMASK: 0
GPR00: c004bb68 c000555a74c0 c24b3500 0005
GPR04: c004bb88 c0080091
GPR08: 000b c004bac8 00016000 c2503500
GPR12: c0030990 c319 106a5898 106a
GPR16: 106a5890 c7a92000 c8180e00 c7a8f700
GPR20: c7a904b0 1011 c259d318 5deadbeef100
GPR24: 5deadbeef122 c00078422700 c9ee88b8 c00078422778
GPR28: 0001 8280b033 c000555a75a0
NIP [c037ed60] __sanitizer_cov_trace_pc+0x40/0x50
LR [c004bac8] interrupt_exit_kernel_prepare+0x118/0x310
Call Trace:
[c000555a74c0] [c004bb68] interrupt_exit_kernel_prepare+0x1b8/0x310 (unreliable)
[c000555a7530] [c000f9a8] interrupt_return+0x118/0x1c0
--- interrupt: 900 at __sanitizer_cov_trace_pc+0x0/0x50
..

That looks like the KCOV helper accessing memory that's not safe to
access in the interrupt handling context.

Do not instrument the new syscall entry/exit code with KCOV, GCOV or
UBSAN.

Cc: Nicholas Piggin
Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C")
Signed-off-by: Daniel Axtens
---

Be warned: I haven't attempted to reproduce the crash yet, nor have I
been able to test that this fixes it. I will attempt to do that soon.
Logically though, it does seem like this would be a good thing to do
regardless.
---
 arch/powerpc/kernel/Makefile | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 1c4385852d3d..1d443a7dc8a7 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -156,12 +156,19 @@ obj-$(CONFIG_PPC_SECVAR_SYSFS)+= secvar-sysfs.o
 GCOV_PROFILE_prom_init.o := n
 KCOV_INSTRUMENT_prom_init.o := n
 UBSAN_SANITIZE_prom_init.o := n
+
 GCOV_PROFILE_kprobes.o := n
 KCOV_INSTRUMENT_kprobes.o := n
 UBSAN_SANITIZE_kprobes.o := n
+
 GCOV_PROFILE_kprobes-ftrace.o := n
 KCOV_INSTRUMENT_kprobes-ftrace.o := n
 UBSAN_SANITIZE_kprobes-ftrace.o := n
+
+GCOV_PROFILE_syscall_64.o := n
+KCOV_INSTRUMENT_syscall_64.o := n
+UBSAN_SANITIZE_syscall_64.o := n
+
 UBSAN_SANITIZE_vdso.o := n
 
 # Necessary for booting with kcov enabled on book3e machines
--
2.20.1