ppc64le and 32-bit LE userland compatibility

2020-05-29 Thread Will Springer
Hey all, a couple of us over in #talos-workstation on freenode have been
working on an effort to bring up a Linux PowerPC userland that runs in 32-bit
little-endian mode, aka ppcle. As far as we can tell, no ABI has ever been
designated for this (unless you count the patchset from a decade ago [1]), so
it's pretty much uncharted territory as far as Linux is concerned. We want to
sync up with libc and the relevant kernel folks to establish the best path
forward.

The practical application that drove these early developments (as you might
expect) is x86 emulation. The box86 project [2] implements a translation layer
for ia32 library calls to native architecture ones; this way, emulation
overhead is significantly reduced by relying on native libraries where
possible (libc, libGL, etc.) instead of emulating an entire x86 userspace.
box86 is primarily targeted at ARM, but it can be adapted to other
architectures—so long as they match ia32's 32-bit, little-endian nature. Hence
the need for a ppcle userland; modern POWER brought ppc64le as a supported
configuration, but without a 32-bit equivalent there is no option for a 32/64
multilib environment, as seen with ppc/ppc64 and arm/aarch64.

Surprisingly, beyond minor patching of gcc to get crosscompile going,
bootstrapping the initial userland was not much of a problem. The work has
been done on top of the Void Linux PowerPC project [3], and much of that is
now present in its source package tree [4].

The first issue with running the userland came from the ppc32 signal handler 
forcing BE in the MSR, causing any 32LE process receiving a signal (such as a 
shell receiving SIGCHLD) to terminate with SIGILL. This was trivially patched, 
along with enabling the 32-bit vDSO on ppc64le kernels [5]. (Given that this 
behavior has been in place since 2006, I don't think anyone has been using the 
kernel in this state to run ppcle userlands.)

The next problem concerns the ABI more directly. The failure mode was `file`
surfacing EINVAL from pread64 when invoked on an ELF; pread64 was passed a
garbage value for `pos`, which didn't appear to be caused by anything in 
`file`. Initially it seemed as though the 32-bit components of the arg were
getting swapped, and we made hacky fixes to glibc and musl to put them in the
"right order"; however, we weren't sure if that was the correct approach, or
if there were knock-on effects we didn't know about. So we found the relevant
compat code path in the kernel, at arch/powerpc/kernel/sys_ppc32.c, where
there exists this comment:

> /*
>  * long long munging:
>  * The 32 bit ABI passes long longs in an odd even register pair.
>  */

It seems that the opposite is true in LE mode, and something is expecting long
longs to start on an even register. I realized this after I tried swapping hi/
lo `u32`s here and didn't see an improvement. I whipped up a patch [6] that
switches which syscalls use padding arguments depending on endianness, while
hopefully remaining tidy enough to be unobtrusive. (I took some liberties with
variable names/types so that the macro could be consistent.)

This was enough to fix up the `file` bug. I'm no seasoned kernel hacker,
though, and there is still concern over the right way to approach this,
whether it should live in the kernel or libc, etc. Frankly, I don't know the
ABI structure enough to understand why the register padding has to be
different in this case, or what lower-level component is responsible for it. 
For comparison, I had a look at the mips tree, since it's bi-endian and has a 
similar 32/64 situation. There is a macro conditional upon endianness that is 
responsible for munging long longs; it uses __MIPSEB__ and __MIPSEL__ instead 
of an if/else on the generic __LITTLE_ENDIAN__. Not sure what to make of that. 
(It also simply swaps registers for LE, unlike what I did for ppc.)

Also worth noting is the one other outstanding bug, where the time-related
syscalls in the 32-bit vDSO seem to return garbage. It doesn't look like an
endian bug to me, and it doesn't affect standard syscalls (which is why if you
run `date` on musl it prints the correct time, unlike on glibc). The vDSO time
functions are implemented in ppc asm (arch/powerpc/kernel/vdso32/
gettimeofday.S), and I've never touched the stuff, so if anyone has a clue I'm 
all ears.

Again, I'd appreciate feedback on the approach to take here, in order to 
touch/special-case only the minimum necessary, while keeping the kernel/libc 
folks happy.

Cheers,
Will [she/her]

(p.s. there is ancillary interest in a ppcle-native kernel as well; that's a 
good deal more work and not the focus of this message at all, but it is a 
topic of interest)

[1]: https://lwn.net/Articles/408845/
[2]: https://github.com/ptitSeb/box86
[3]: https://voidlinux-ppc.org/
[4]: https://github.com/void-ppc/void-packages
[5]: https://gist.github.com/eerykitty/01707dc6bca2be32b4c5e30d15d15dcf
[6]: https://gist.github.com/Skirmisher/02891c1a8cafa0ff18b2460933ef4f3c






Re: [PATCH v3 5/7] powerpc/pmem/of_pmem: Update of_pmem to use the new barrier instruction.

2020-05-29 Thread kbuild test robot
Hi "Aneesh,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.7-rc7]
[cannot apply to next-20200529]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Aneesh-Kumar-K-V/powerpc-pmem-Restrict-papr_scm-to-P8-and-above/20200519-195350
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-randconfig-r016-20200529 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 
2d068e534f1671459e1b135852c1b3c10502e929)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc cross compiling tool for clang build
# apt-get install binutils-powerpc-linux-gnu
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>, old ones prefixed by <<):

In file included from arch/powerpc/kernel/syscall_64.c:4:
In file included from arch/powerpc/include/asm/asm-prototypes.h:12:
>> arch/powerpc/include/asm/cacheflush.h:126:6: error: implicit declaration of 
>> function 'cpu_has_feature' [-Werror,-Wimplicit-function-declaration]
if (cpu_has_feature(CPU_FTR_ARCH_207S))
^
arch/powerpc/include/asm/cacheflush.h:126:6: note: did you mean 
'mmu_has_feature'?
arch/powerpc/include/asm/mmu.h:234:20: note: 'mmu_has_feature' declared here
static inline bool mmu_has_feature(unsigned long feature)
^
In file included from arch/powerpc/kernel/syscall_64.c:4:
In file included from arch/powerpc/include/asm/asm-prototypes.h:17:
In file included from arch/powerpc/include/asm/mmu_context.h:12:
In file included from arch/powerpc/include/asm/cputhreads.h:7:
>> arch/powerpc/include/asm/cpu_has_feature.h:49:20: error: static declaration 
>> of 'cpu_has_feature' follows non-static declaration
static inline bool cpu_has_feature(unsigned long feature)
^
arch/powerpc/include/asm/cacheflush.h:126:6: note: previous implicit 
declaration is here
if (cpu_has_feature(CPU_FTR_ARCH_207S))
^
2 errors generated.
--
In file included from arch/powerpc/kernel/kprobes.c:23:
>> arch/powerpc/include/asm/cacheflush.h:126:6: error: implicit declaration of 
>> function 'cpu_has_feature' [-Werror,-Wimplicit-function-declaration]
if (cpu_has_feature(CPU_FTR_ARCH_207S))
^
arch/powerpc/include/asm/cacheflush.h:126:6: note: did you mean 
'mmu_has_feature'?
arch/powerpc/include/asm/mmu.h:234:20: note: 'mmu_has_feature' declared here
static inline bool mmu_has_feature(unsigned long feature)
^
1 error generated.
--
In file included from arch/powerpc/kernel/optprobes.c:15:
>> arch/powerpc/include/asm/cacheflush.h:126:6: error: implicit declaration of 
>> function 'cpu_has_feature' [-Werror,-Wimplicit-function-declaration]
if (cpu_has_feature(CPU_FTR_ARCH_207S))
^
arch/powerpc/include/asm/cacheflush.h:126:6: note: did you mean 
'mmu_has_feature'?
arch/powerpc/include/asm/mmu.h:234:20: note: 'mmu_has_feature' declared here
static inline bool mmu_has_feature(unsigned long feature)
^
arch/powerpc/kernel/optprobes.c:149:6: warning: no previous prototype for 
function 'patch_imm32_load_insns' [-Wmissing-prototypes]
void patch_imm32_load_insns(unsigned int val, kprobe_opcode_t *addr)
^
arch/powerpc/kernel/optprobes.c:149:1: note: declare 'static' if the function 
is not intended to be used outside of this translation unit
void patch_imm32_load_insns(unsigned int val, kprobe_opcode_t *addr)
^
static
arch/powerpc/kernel/optprobes.c:167:6: warning: no previous prototype for 
function 'patch_imm64_load_insns' [-Wmissing-prototypes]
void patch_imm64_load_insns(unsigned long val, int reg, kprobe_opcode_t *addr)
^
arch/powerpc/kernel/optprobes.c:167:1: note: declare 'static' if the function 
is not intended to be used outside of this translation unit
void patch_imm64_load_insns(unsigned long val, int reg, kprobe_opcode_t *addr)
^
static
2 warnings and 1 error generated.
--
In file included from arch/powerpc/kernel/epapr_paravirt.c:11:
>> arch/powerpc/include/asm/cacheflush.h:126:6: error: implicit declaration of 
>> function 'cpu_has_feature' [-Werror,-Wimplicit-function-declaration]
if (cpu_has_feature(CPU_FTR_ARCH_207S))
^
arch/powerpc/include/asm/cacheflush.h:126:6: note: did you mean 
'mmu_has_feature'?
arch/powerpc/include/asm/mmu.h:234:20: note: 'mmu_has_feature' declared here
static inline bool mmu_has_feature(unsigned long feature)
^
In file included from arch/powerpc/kernel/epapr_parav

Re: [PATCH v3 3/7] powerpc/pmem: Add flush routines using new pmem store and sync instruction

2020-05-29 Thread kbuild test robot
Hi "Aneesh,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.7-rc7 next-20200529]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Aneesh-Kumar-K-V/powerpc-pmem-Restrict-papr_scm-to-P8-and-above/20200519-195350
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-randconfig-r016-20200529 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 
2d068e534f1671459e1b135852c1b3c10502e929)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc cross compiling tool for clang build
# apt-get install binutils-powerpc-linux-gnu
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> arch/powerpc/lib/pmem.c:44:6: error: implicit declaration of function 
>> 'cpu_has_feature' [-Werror,-Wimplicit-function-declaration]
if (cpu_has_feature(CPU_FTR_ARCH_207S))
^
arch/powerpc/lib/pmem.c:44:6: note: did you mean 'mmu_has_feature'?
arch/powerpc/include/asm/mmu.h:234:20: note: 'mmu_has_feature' declared here
static inline bool mmu_has_feature(unsigned long feature)
^
arch/powerpc/lib/pmem.c:50:6: error: implicit declaration of function 
'cpu_has_feature' [-Werror,-Wimplicit-function-declaration]
if (cpu_has_feature(CPU_FTR_ARCH_207S))
^
arch/powerpc/lib/pmem.c:57:6: warning: no previous prototype for function 
'arch_wb_cache_pmem' [-Wmissing-prototypes]
void arch_wb_cache_pmem(void *addr, size_t size)
^
arch/powerpc/lib/pmem.c:57:1: note: declare 'static' if the function is not 
intended to be used outside of this translation unit
void arch_wb_cache_pmem(void *addr, size_t size)
^
static
arch/powerpc/lib/pmem.c:64:6: warning: no previous prototype for function 
'arch_invalidate_pmem' [-Wmissing-prototypes]
void arch_invalidate_pmem(void *addr, size_t size)
^
arch/powerpc/lib/pmem.c:64:1: note: declare 'static' if the function is not 
intended to be used outside of this translation unit
void arch_invalidate_pmem(void *addr, size_t size)
^
static
2 warnings and 2 errors generated.

vim +/cpu_has_feature +44 arch/powerpc/lib/pmem.c

41  
42  static inline void clean_pmem_range(unsigned long start, unsigned long 
stop)
43  {
  > 44  if (cpu_has_feature(CPU_FTR_ARCH_207S))
45  return __clean_pmem_range(start, stop);
46  }
47  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


Re: [PATCH net] drivers/net/ibmvnic: Update VNIC protocol version reporting

2020-05-29 Thread David Miller
From: Thomas Falcon 
Date: Thu, 28 May 2020 11:19:17 -0500

> VNIC protocol version is reported in big-endian format, but it
> is not byteswapped before logging. Fix that, and remove version
> comparison as only one protocol version exists at this time.
> 
> Signed-off-by: Thomas Falcon 

Applied, thanks.


Re: [RFC][PATCH v3 1/5] sparc64: Fix asm/percpu.h build error

2020-05-29 Thread David Miller
From: Peter Zijlstra 
Date: Fri, 29 May 2020 23:35:51 +0200

> ../arch/sparc/include/asm/percpu_64.h:7:24: warning: call-clobbered register 
> used for global register variable
> register unsigned long __local_per_cpu_offset asm("g5");

The "-ffixed-g5" option on the command line tells gcc that we are
using 'g5' as a fixed register, so some part of your build isn't using
the:

KBUILD_CFLAGS += -ffixed-g4 -ffixed-g5 -fcall-used-g7 -Wno-sign-compare

from arch/sparc/Makefile for some reason.


[powerpc:next-test 160/198] arch/powerpc/platforms/powernv/pci-ioda.c:1889:13: warning: function 'pnv_ioda_setup_bus_dma' is not needed and will not be emitted

2020-05-29 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
head:   e376ca093587eafd840bb0f9df04090e2a54249c
commit: dc3d8f85bb571c3640ebba24b82a527cf2cb3f24 [160/198] powerpc/powernv/pci: 
Re-work bus PE configuration
config: powerpc64-randconfig-r024-20200529 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 
2d068e534f1671459e1b135852c1b3c10502e929)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc64 cross compiling tool for clang build
# apt-get install binutils-powerpc64-linux-gnu
git checkout dc3d8f85bb571c3640ebba24b82a527cf2cb3f24
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross 
ARCH=powerpc64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot 

All warnings (new ones prefixed by >>, old ones prefixed by <<):

>> arch/powerpc/platforms/powernv/pci-ioda.c:1889:13: warning: function 
>> 'pnv_ioda_setup_bus_dma' is not needed and will not be emitted 
>> [-Wunneeded-internal-declaration]
static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus *bus)
^
1 warning generated.

vim +/pnv_ioda_setup_bus_dma +1889 arch/powerpc/platforms/powernv/pci-ioda.c

fe7e85c6f5ff63 Gavin Shan 2014-09-30  1888  
5eada8a3f087df Alexey Kardashevskiy   2018-12-19 @1889  static void 
pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus *bus)
74251fe21bfa93 Benjamin Herrenschmidt 2013-07-01  1890  {
74251fe21bfa93 Benjamin Herrenschmidt 2013-07-01  1891  struct pci_dev 
*dev;
74251fe21bfa93 Benjamin Herrenschmidt 2013-07-01  1892  
74251fe21bfa93 Benjamin Herrenschmidt 2013-07-01  1893  
list_for_each_entry(dev, >devices, bus_list) {
b348aa65297659 Alexey Kardashevskiy   2015-06-05  1894  
set_iommu_table_base(>dev, pe->table_group.tables[0]);
0617fc0ca412b5 Christoph Hellwig  2019-02-13  1895  
dev->dev.archdata.dma_offset = pe->tce_bypass_base;
dff4a39e880062 Gavin Shan 2014-07-15  1896  
5c89a87d13d168 Alexey Kardashevskiy   2015-06-18  1897  if 
((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
5eada8a3f087df Alexey Kardashevskiy   2018-12-19  1898  
pnv_ioda_setup_bus_dma(pe, dev->subordinate);
74251fe21bfa93 Benjamin Herrenschmidt 2013-07-01  1899  }
74251fe21bfa93 Benjamin Herrenschmidt 2013-07-01  1900  }
74251fe21bfa93 Benjamin Herrenschmidt 2013-07-01  1901  

:: The code at line 1889 was first introduced by commit
:: 5eada8a3f087df74af1c2797770a3e2249374fe1 powerpc/iommu_api: Move IOMMU 
groups setup to a single place

:: TO: Alexey Kardashevskiy 
:: CC: Michael Ellerman 

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


[PATCH v9 2/5] seq_buf: Export seq_buf_printf

2020-05-29 Thread Vaibhav Jain
'seq_buf' provides a very useful abstraction for writing to a string
buffer without needing to worry about it over-flowing. However even
though the API has been stable for couple of years now its still not
exported to kernel loadable modules limiting its usage.

Hence this patch proposes update to 'seq_buf.c' to mark
seq_buf_printf() which is part of the seq_buf API to be exported to
kernel loadable GPL modules. This symbol will be used in later parts
of this patch-set to simplify content creation for a sysfs attribute.

Cc: Piotr Maziarz 
Cc: Cezary Rojewski 
Cc: Christoph Hellwig 
Cc: Steven Rostedt 
Cc: Borislav Petkov 
Signed-off-by: Vaibhav Jain 
---
Changelog:

v8..v9:
* None

v7..v8:
* Updated the patch title [ Christoph Hellwig ]
* Updated patch description to replace confusing term 'external kernel
  modules' to 'kernel lodable modules'.

Resend:
* Added ack from Steven Rostedt

v6..v7:
* New patch in the series
---
 lib/seq_buf.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/seq_buf.c b/lib/seq_buf.c
index 4e865d42ab03..707453f5d58e 100644
--- a/lib/seq_buf.c
+++ b/lib/seq_buf.c
@@ -91,6 +91,7 @@ int seq_buf_printf(struct seq_buf *s, const char *fmt, ...)
 
return ret;
 }
+EXPORT_SYMBOL_GPL(seq_buf_printf);
 
 #ifdef CONFIG_BINARY_PRINTF
 /**
-- 
2.26.2



[PATCH v9 5/5] powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH

2020-05-29 Thread Vaibhav Jain
This patch implements support for PDSM request 'PAPR_PDSM_HEALTH'
that returns a newly introduced 'struct nd_papr_pdsm_health' instance
containing dimm health information back to user space in response to
ND_CMD_CALL. This functionality is implemented in newly introduced
papr_pdsm_health() that queries the nvdimm health information and
then copies this information to the package payload whose layout is
defined by 'struct nd_papr_pdsm_health'.

The patch also introduces a new member 'struct papr_scm_priv.health'
thats an instance of 'struct nd_papr_pdsm_health' to cache the health
information of a nvdimm. As a result functions drc_pmem_query_health()
and flags_show() are updated to populate and use this new struct
instead of a u64 integer that was earlier used.

Cc: "Aneesh Kumar K . V" 
Cc: Dan Williams 
Cc: Michael Ellerman 
Cc: Ira Weiny 
Signed-off-by: Vaibhav Jain 
---
Changelog:

v8..v9:
* s/PAPR_SCM_PDSM_HEALTH/PAPR_PDSM_HEALTH/g  [ Dan , Aneesh ]
* s/PAPR_SCM_PSDM_DIMM_*/PAPR_PDSM_DIMM_*/g
* Renamed papr_scm_get_health() to papr_psdm_health()
* Updated patch description to replace papr-scm dimm with nvdimm.

v7..v8:
* None

Resend:
* None

v6..v7:
* Updated flags_show() to use seq_buf_printf(). [Mpe]
* Updated papr_scm_get_health() to use newly introduced
  __drc_pmem_query_health() bypassing the cache [Mpe].

v5..v6:
* Added attribute '__packed' to 'struct nd_papr_pdsm_health_v1' to
  gaurd against possibility of different compilers adding different
  paddings to the struct [ Dan Williams ]

* Updated 'struct nd_papr_pdsm_health_v1' to use __u8 instead of
  'bool' and also updated drc_pmem_query_health() to take this into
  account. [ Dan Williams ]

v4..v5:
* None

v3..v4:
* Call the DSM_PAPR_SCM_HEALTH service function from
  papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh]

v2..v3:
* Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx' types
  as its exported to the userspace [Aneesh]
* Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm health
  from enum to #defines [Aneesh]

v1..v2:
* New patch in the series
---
 arch/powerpc/include/uapi/asm/papr_pdsm.h |  39 +++
 arch/powerpc/platforms/pseries/papr_scm.c | 125 +++---
 2 files changed, 147 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h 
b/arch/powerpc/include/uapi/asm/papr_pdsm.h
index 6407fefcc007..411725a91591 100644
--- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
+++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
@@ -115,6 +115,7 @@ struct nd_pdsm_cmd_pkg {
  */
 enum papr_pdsm {
PAPR_PDSM_MIN = 0x0,
+   PAPR_PDSM_HEALTH,
PAPR_PDSM_MAX,
 };
 
@@ -133,4 +134,42 @@ static inline void *pdsm_cmd_to_payload(struct 
nd_pdsm_cmd_pkg *pcmd)
return (void *)(pcmd->payload);
 }
 
+/* Various nvdimm health indicators */
+#define PAPR_PDSM_DIMM_HEALTHY   0
+#define PAPR_PDSM_DIMM_UNHEALTHY 1
+#define PAPR_PDSM_DIMM_CRITICAL  2
+#define PAPR_PDSM_DIMM_FATAL 3
+
+/*
+ * Struct exchanged between kernel & ndctl in for PAPR_PDSM_HEALTH
+ * Various flags indicate the health status of the dimm.
+ *
+ * dimm_unarmed: Dimm not armed. So contents wont persist.
+ * dimm_bad_shutdown   : Previous shutdown did not persist contents.
+ * dimm_bad_restore: Contents from previous shutdown werent restored.
+ * dimm_scrubbed   : Contents of the dimm have been scrubbed.
+ * dimm_locked : Contents of the dimm cant be modified until CEC reboot
+ * dimm_encrypted  : Contents of dimm are encrypted.
+ * dimm_health : Dimm health indicator. One of PAPR_PDSM_DIMM_
+ */
+struct nd_papr_pdsm_health_v1 {
+   __u8 dimm_unarmed;
+   __u8 dimm_bad_shutdown;
+   __u8 dimm_bad_restore;
+   __u8 dimm_scrubbed;
+   __u8 dimm_locked;
+   __u8 dimm_encrypted;
+   __u16 dimm_health;
+} __packed;
+
+/*
+ * Typedef the current struct for dimm_health so that any application
+ * or kernel recompiled after introducing a new version automatically
+ * supports the new version.
+ */
+#define nd_papr_pdsm_health nd_papr_pdsm_health_v1
+
+/* Current version number for the dimm health struct */
+#define ND_PAPR_PDSM_HEALTH_VERSION 1
+
 #endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index 5e2237e7ec08..c0606c0c659c 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -88,7 +88,7 @@ struct papr_scm_priv {
unsigned long lasthealth_jiffies;
 
/* Health information for the dimm */
-   u64 health_bitmap;
+   struct nd_papr_pdsm_health health;
 };
 
 static int drc_pmem_bind(struct papr_scm_priv *p)
@@ -201,6 +201,7 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
 static int __drc_pmem_query_health(struct papr_scm_priv *p)
 {
unsigned long ret[PLPAR_HCALL_BUFSIZE];
+   u64 health;
long rc;
 
 

[PATCH v9 4/5] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods

2020-05-29 Thread Vaibhav Jain
Introduce support for PAPR NVDIMM Specific Methods (PDSM) in papr_scm
module and add the command family NVDIMM_FAMILY_PAPR to the white list
of NVDIMM command sets. Also advertise support for ND_CMD_CALL for the
nvdimm command mask and implement necessary scaffolding in the module
to handle ND_CMD_CALL ioctl and PDSM requests that we receive.

The layout of the PDSM request as we expect from libnvdimm/libndctl is
described in newly introduced uapi header 'papr_pdsm.h' which
defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used
to communicate the PDSM request via member
'nd_cmd_pkg.nd_command' and size of payload that need to be
sent/received for servicing the PDSM.

A new function is_cmd_valid() is implemented that reads the args to
papr_scm_ndctl() and performs sanity tests on them. A new function
papr_scm_service_pdsm() is introduced and is called from
papr_scm_ndctl() in case of a PDSM request is received via ND_CMD_CALL
command from libnvdimm.

Cc: "Aneesh Kumar K . V" 
Cc: Dan Williams 
Cc: Michael Ellerman 
Cc: Ira Weiny 
Signed-off-by: Vaibhav Jain 
---
Changelog:

v8..v9:
* Reduced the usage of term SCM replacing it with appropriate
  replacement [ Dan Williams, Aneesh ]
* Renamed 'papr_scm_pdsm.h' to 'papr_pdsm.h'
* s/PAPR_SCM_PDSM_*/PAPR_PDSM_*/g
* s/NVDIMM_FAMILY_PAPR_SCM/NVDIMM_FAMILY_PAPR/g
* Minor updates to 'papr_psdm.h' to replace usage of term 'SCM'.
* Minor update to patch description.

v7..v8:
* Removed the 'payload_offset' field from 'struct
  nd_pdsm_cmd_pkg'. Instead command payload is always assumed to start
  at 'nd_pdsm_cmd_pkg.payload'. [ Aneesh ]
* To enable introducing new fields to 'struct nd_pdsm_cmd_pkg',
  'reserved' field of 10-bytes is introduced. [ Aneesh ]
* Fixed a typo in "Backward Compatibility" section of papr_scm_pdsm.h
  [ Ira ]

Resend:
* None

v6..v7 :
* Removed the re-definitions of __packed macro from papr_scm_pdsm.h
  [Mpe].
* Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe].
* Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h
  [Mpe].
* Made functions defined in papr_scm_pdsm.h as static inline. [Mpe]

v5..v6 :
* Changed the usage of the term DSM to PDSM to distinguish it from the
  ACPI term [ Dan Williams ]
* Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct
  to reflect the new terminology.
* Updated the patch description and title to reflect the new terminology.
* Squashed patch to introduce new command family in 'ndctl.h' with
  this patch [ Dan Williams ]
* Updated the papr_scm_pdsm method starting index from 0x1 to 0x0
  [ Dan Williams ]
* Removed redundant license text from the papr_scm_psdm.h file.
  [ Dan Williams ]
* s/envelop/envelope/ at various places [ Dan Williams ]
* Added '__packed' attribute to command package header to gaurd
  against different compiler adding paddings between the fields.
  [ Dan Williams]
* Converted various pr_debug to dev_debug [ Dan Williams ]

v4..v5 :
* None

v3..v4 :
* None

v2..v3 :
* Updated the patch prefix to 'ndctl/uapi' [Aneesh]

v1..v2 :
* None
---
 arch/powerpc/include/uapi/asm/papr_pdsm.h | 136 ++
 arch/powerpc/platforms/pseries/papr_scm.c | 101 +++-
 include/uapi/linux/ndctl.h|   1 +
 3 files changed, 232 insertions(+), 6 deletions(-)
 create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h

diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h 
b/arch/powerpc/include/uapi/asm/papr_pdsm.h
new file mode 100644
index ..6407fefcc007
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
@@ -0,0 +1,136 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/*
+ * PAPR nvDimm Specific Methods (PDSM) and structs for libndctl
+ *
+ * (C) Copyright IBM 2020
+ *
+ * Author: Vaibhav Jain 
+ */
+
+#ifndef _UAPI_ASM_POWERPC_PAPR_PDSM_H_
+#define _UAPI_ASM_POWERPC_PAPR_PDSM_H_
+
+#include 
+
+/*
+ * PDSM Envelope:
+ *
+ * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
+ * envelope which consists of a header and user-defined payload sections.
+ * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
+ * payload following it and accessible via 'nd_pdsm_cmd_pkg.payload' field.
+ * There is reserved field that can used to introduce new fields to the
+ * structure in future. It also tries to ensure that 'nd_pdsm_cmd_pkg.payload'
+ * lies at a 8-byte boundary.
+ *
+ *  +-+-+---+
+ *  |   64-Bytes  |   16-Bytes  |   Max 176-Bytes   |
+ *  +-+-+---+
+ *  |   nd_pdsm_cmd_pkg |   |
+ *  |-+ |   |
+ *  |  nd_cmd_pkg | |   |
+ *  +-+-+---+
+ *  | nd_family   | |   |
+ *  | 

[PATCH v9 3/5] powerpc/papr_scm: Fetch nvdimm health information from PHYP

2020-05-29 Thread Vaibhav Jain
Implement support for fetching nvdimm health information via
H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
of 64-bit bitmap, bitwise-and of which is then stored in
'struct papr_scm_priv' and subsequently partially exposed to
user-space via newly introduced dimm specific attribute
'papr/flags'. Since the hcall is costly, the health information is
cached and only re-queried, 60s after the previous successful hcall.

The patch also adds a  documentation text describing flags reported by
the the new sysfs attribute 'papr/flags' is also introduced at
Documentation/ABI/testing/sysfs-bus-papr-pmem.

[1] commit 58b278f568f0 ("powerpc: Provide initial documentation for
PAPR hcalls")

Cc: "Aneesh Kumar K . V" 
Cc: Dan Williams 
Cc: Michael Ellerman 
Cc: Ira Weiny 
Signed-off-by: Vaibhav Jain 
---
Changelog:

v8..v9:
* Rename some variables and defines to reduce usage of term SCM
  replacing it with PMEM [Dan Williams, Aneesh]
* s/PAPR_SCM_DIMM/PAPR_PMEM/g
* s/papr_scm_nd_attributes/papr_nd_attributes/g
* s/papr_scm_nd_attribute_group/papr_nd_attribute_group/g
* s/papr_scm_dimm_attr_groups/papr_nd_attribute_groups/g
* Renamed file sysfs-bus-papr-scm to sysfs-bus-papr-pmem

v7..v8:
* Update type of variable 'rc' in __drc_pmem_query_health() and
  drc_pmem_query_health() to long and int respectively. [ Ira ]
* Updated the patch description to s/64 bit Big Endian Number/64-bit
  bitmap/ [ Ira, Aneesh ].

Resend:
* None

v6..v7 :
* Used the exported buf_seq_printf() function to generate content for
  'papr/flags'
* Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c
  and removed the papr_scm.h file [Mpe]
* Some minor consistency issued in sysfs-bus-papr-scm
  documentation. [Mpe]
* s/dimm_mutex/health_mutex/g [Mpe]
* Split drc_pmem_query_health() into two function one of which takes
  care of caching and locking. [Mpe]
* Fixed a local copy creation of dimm health information using
  READ_ONCE(). [Mpe]

v5..v6 :
* Change the flags sysfs attribute from 'papr_flags' to 'papr/flags'
  [Dan Williams]
* Include documentation for 'papr/flags' attr [Dan Williams]
* Change flag 'save_fail' to 'flush_fail' [Dan Williams]
* Caching of health bitmap to reduce expensive hcalls [Dan Williams]
* Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe]
* Replaced two __be64 integers from papr_scm_priv to a single u64
  integer [Mpe]
* Updated patch description to reflect the changes made in this
  version.
* Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from
  flags_show() [Dan Williams]

v4..v5 :
* None

v3..v4 :
* None

v2..v3 :
* Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for
 NVDIMM unarmed [Aneesh]

v1..v2 :
* New patch in the series.
---
 Documentation/ABI/testing/sysfs-bus-papr-pmem |  27 +++
 arch/powerpc/platforms/pseries/papr_scm.c | 169 +-
 2 files changed, 194 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem

diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem 
b/Documentation/ABI/testing/sysfs-bus-papr-pmem
new file mode 100644
index ..5b10d036a8d4
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
@@ -0,0 +1,27 @@
+What:  /sys/bus/nd/devices/nmemX/papr/flags
+Date:  Apr, 2020
+KernelVersion: v5.8
+Contact:   linuxppc-dev , 
linux-nvd...@lists.01.org,
+Description:
+   (RO) Report flags indicating various states of a
+   papr-pmem NVDIMM device. Each flag maps to a one or
+   more bits set in the dimm-health-bitmap retrieved in
+   response to H_SCM_HEALTH hcall. The details of the bit
+   flags returned in response to this hcall is available
+   at 'Documentation/powerpc/papr_hcalls.rst' . Below are
+   the flags reported in this sysfs file:
+
+   * "not_armed"   : Indicates that NVDIMM contents will not
+ survive a power cycle.
+   * "flush_fail"  : Indicates that NVDIMM contents
+ couldn't be flushed during last
+ shut-down event.
+   * "restore_fail": Indicates that NVDIMM contents
+ couldn't be restored during NVDIMM
+ initialization.
+   * "encrypted"   : NVDIMM contents are encrypted.
+   * "smart_notify": There is health event for the NVDIMM.
+   * "scrubbed": Indicating that contents of the
+ NVDIMM have been scrubbed.
+   * "locked"  : Indicating that NVDIMM contents cant
+ be modified until next power cycle.
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index f35592423380..149431594839 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ 

[PATCH v9 1/5] powerpc: Document details on H_SCM_HEALTH hcall

2020-05-29 Thread Vaibhav Jain
Add documentation to 'papr_hcalls.rst' describing the bitmap flags
that are returned from H_SCM_HEALTH hcall as per the PAPR-SCM
specification.

Cc: "Aneesh Kumar K . V" 
Cc: Dan Williams 
Cc: Michael Ellerman 
Cc: Ira Weiny 
Signed-off-by: Vaibhav Jain 
---
Changelog:

v8..v9:
* s/SCM/PMEM device. [ Dan Williams, Aneesh ]

v7..v8:
* Added a clarification on bit-ordering of Health Bitmap

Resend:
* None

v6..v7:
* None

v5..v6:
* New patch in the series
---
 Documentation/powerpc/papr_hcalls.rst | 46 ---
 1 file changed, 42 insertions(+), 4 deletions(-)

diff --git a/Documentation/powerpc/papr_hcalls.rst 
b/Documentation/powerpc/papr_hcalls.rst
index 3493631a60f8..48fcf1255a33 100644
--- a/Documentation/powerpc/papr_hcalls.rst
+++ b/Documentation/powerpc/papr_hcalls.rst
@@ -220,13 +220,51 @@ from the LPAR memory.
 **H_SCM_HEALTH**
 
 | Input: drcIndex
-| Out: *health-bitmap, health-bit-valid-bitmap*
+| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
 | Return Value: *H_Success, H_Parameter, H_Hardware*
 
 Given a DRC Index return the info on predictive failure and overall health of
-the NVDIMM. The asserted bits in the health-bitmap indicate a single predictive
-failure and health-bit-valid-bitmap indicate which bits in health-bitmap are
-valid.
+the PMEM device. The asserted bits in the health-bitmap indicate one or more 
states
+(described in table below) of the PMEM device and health-bit-valid-bitmap 
indicate
+which bits in health-bitmap are valid. The bits are reported in
+reverse bit ordering for example a value of 0xC400
+indicates bits 0, 1, and 5 are valid.
+
+Health Bitmap Flags:
+
++--+---+
+|  Bit |   Definition  
|
++==+===+
+|  00  | PMEM device is unable to persist memory contents. 
|
+|  | If the system is powered down, nothing will be saved. 
|
++--+---+
+|  01  | PMEM device failed to persist memory contents. Either contents were   
|
+|  | not saved successfully on power down or were not restored properly on 
|
+|  | power up. 
|
++--+---+
+|  02  | PMEM device contents are persisted from previous IPL. The data from   
|
+|  | the last boot were successfully restored. 
|
++--+---+
+|  03  | PMEM device contents are not persisted from previous IPL. There was 
no|
+|  | data to restore from the last boot.   
|
++--+---+
+|  04  | PMEM device memory life remaining is critically low   
|
++--+---+
+|  05  | PMEM device will be garded off next IPL due to failure
|
++--+---+
+|  06  | PMEM device contents cannot persist due to current platform health
|
+|  | status. A hardware failure may prevent data from being saved or   
|
+|  | restored. 
|
++--+---+
+|  07  | PMEM device is unable to persist memory contents in certain 
conditions|
++--+---+
+|  08  | PMEM device is encrypted  
|
++--+---+
+|  09  | PMEM device has successfully completed a requested erase or secure
|
+|  | erase procedure.  
|
++--+---+
+|10:63 | Reserved / Unused 
|
++--+---+
 
 **H_SCM_PERFORMANCE_STATS**
 
-- 
2.26.2



[PATCH v9 0/5] powerpc/papr_scm: Add support for reporting nvdimm health

2020-05-29 Thread Vaibhav Jain
Changes since v8 [1]:

* Updated proposed changes to remove usage of term 'SCM' due to
  ambiguity with 'PMEM' and 'NVDIMM'. [ Dan Williams ]
* Replaced the usage of term 'SCM' with 'PMEM' in most contexts.
  [ Aneesh ]
* Renamed NVDIMM health defines from PAPR_SCM_DIMM_* to PAPR_PMEM_* .
* Updates to various newly introduced identifiers in 'papr_scm.c'
  removing the 'SCM' prefix from their names.
* Renamed NVDIMM_FAMILY_PAPR_SCM to NVDIMM_FAMILY_PAPR
* Renamed PAPR_SCM_PDSM_HEALTH PAPR_PDSM_HEALTH
* Renamed uapi header 'papr_scm_pdsm.h' to 'papr_pdsm.h'.
* Renamed sysfs ABI document from sysfs-bus-papr-scm to
  sysfs-bus-papr-pmem.
* No behavioural changes from v8 [1].

[1] 
https://lore.kernel.org/linux-nvdimm/20200527041244.37821-1-vaib...@linux.ibm.com/
---

The PAPR standard[2][4] provides mechanisms to query the health and
performance stats of an NVDIMM via various hcalls as described in
Ref[3].  Until now these stats were never available nor exposed to the
user-space tools like 'ndctl'. This is partly due to PAPR platform not
having support for ACPI and NFIT. Hence 'ndctl' is unable to query and
report the dimm health status and a user had no way to determine the
current health status of a NDVIMM.

To overcome this limitation, this patch-set updates papr_scm kernel
module to query and fetch NVDIMM health stats using hcalls described
in Ref[3].  This health and performance stats are then exposed to
userspace via sysfs and PAPR-NVDIMM-Specific-Methods(PDSM) issued by
libndctl.

These changes coupled with proposed ndtcl changes located at Ref[5]
should provide a way for the user to retrieve NVDIMM health status
using ndtcl.

Below is a sample output using proposed kernel + ndctl for PAPR NVDIMM
in a emulation environment:

 # ndctl list -DH
[
  {
"dev":"nmem0",
"health":{
  "health_state":"fatal",
  "shutdown_state":"dirty"
}
  }
]

Dimm health report output on a pseries guest lpar with vPMEM or HMS
based NVDIMMs that are in perfectly healthy conditions:

 # ndctl list -d nmem0 -H
[
  {
"dev":"nmem0",
"health":{
  "health_state":"ok",
  "shutdown_state":"clean"
}
  }
]

PAPR NVDIMM-Specific-Methods(PDSM)
==

PDSM requests are issued by vendor specific code in libndctl to
execute certain operations or fetch information from NVDIMMS. PDSMs
requests can be sent to papr_scm module via libndctl(userspace) and
libnvdimm (kernel) using the ND_CMD_CALL ioctl command which can be
handled in the dimm control function papr_scm_ndctl(). Current
patchset proposes a single PDSM to retrieve NVDIMM health, defined in
the newly introduced uapi header named 'papr_pdsm.h'. Support for
more PDSMs will be added in future.

Structure of the patch-set
==

The patch-set starts with a doc patch documenting details of hcall
H_SCM_HEALTH. Second patch exports kernel symbol seq_buf_printf()
thats used in subsequent patches to generate sysfs attribute content.

Third patch implements support for fetching NVDIMM health information
from PHYP and partially exposing it to user-space via a NVDIMM sysfs
flag.

Fourth patches deal with implementing support for servicing PDSM
commands in papr_scm module.

Finally the last patch implements support for servicing PDSM
'PAPR_PDSM_HEALTH' that returns the NVDIMM health information to
libndctl.

References:
[2] "Power Architecture Platform Reference"
  https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[3] commit 58b278f568f0
 ("powerpc: Provide initial documentation for PAPR hcalls")
[4] "Linux on Power Architecture Platform Reference"
 https://members.openpowerfoundation.org/document/dl/469
[5] https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v9

---

Vaibhav Jain (5):
  powerpc: Document details on H_SCM_HEALTH hcall
  seq_buf: Export seq_buf_printf
  powerpc/papr_scm: Fetch nvdimm health information from PHYP
  ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
  powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH

 Documentation/ABI/testing/sysfs-bus-papr-pmem |  27 ++
 Documentation/powerpc/papr_hcalls.rst |  46 ++-
 arch/powerpc/include/uapi/asm/papr_pdsm.h | 175 +
 arch/powerpc/platforms/pseries/papr_scm.c | 363 +-
 include/uapi/linux/ndctl.h|   1 +
 lib/seq_buf.c |   1 +
 6 files changed, 600 insertions(+), 13 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
 create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h

-- 
2.26.2



[PATCH v3 0/5] lockdep: Change IRQ state tracking to use per-cpu variables

2020-05-29 Thread Peter Zijlstra
Ahmed and Sebastian wanted additional lockdep_assert*() macros and ran
into header hell.

Move the IRQ state into per-cpu variables, which removes the dependency on
task_struct, which is what generated the header-hell.

These patches are intended to go on top of:

 https://lkml.kernel.org/r/20200529212728.795169...@infradead.org

but should apply on current tip/master without much difficulty.

There are a few build fixes for Sparc64, PowerPC64 and s390. Especially the
Sparc one I'm not sure about.



[PATCH v3 2/5] powerpc64: Break asm/percpu.h vs spinlock_types.h dependency

2020-05-29 Thread Peter Zijlstra
In order to use  in lockdep.h, we need to make sure
asm/percpu.h does not itself depend on lockdep.

The below seems to make that so and builds powerpc64-defconfig +
PROVE_LOCKING.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/powerpc/include/asm/dtl.h |   52 +
 arch/powerpc/include/asm/lppaca.h  |   44 ---
 arch/powerpc/kernel/time.c |1 
 arch/powerpc/kvm/book3s_hv.c   |1 
 arch/powerpc/platforms/pseries/dtl.c   |1 
 arch/powerpc/platforms/pseries/lpar.c  |1 
 arch/powerpc/platforms/pseries/setup.c |1 
 arch/powerpc/platforms/pseries/svm.c   |1 
 8 files changed, 58 insertions(+), 44 deletions(-)

--- /dev/null
+++ b/arch/powerpc/include/asm/dtl.h
@@ -0,0 +1,52 @@
+#ifndef _ASM_POWERPC_DTL_H
+#define _ASM_POWERPC_DTL_H
+
+#include 
+#include 
+
+/*
+ * Layout of entries in the hypervisor's dispatch trace log buffer.
+ */
+struct dtl_entry {
+   u8  dispatch_reason;
+   u8  preempt_reason;
+   __be16  processor_id;
+   __be32  enqueue_to_dispatch_time;
+   __be32  ready_to_enqueue_time;
+   __be32  waiting_to_ready_time;
+   __be64  timebase;
+   __be64  fault_addr;
+   __be64  srr0;
+   __be64  srr1;
+};
+
+#define DISPATCH_LOG_BYTES 4096/* bytes per cpu */
+#define N_DISPATCH_LOG (DISPATCH_LOG_BYTES / sizeof(struct dtl_entry))
+
+/*
+ * Dispatch trace log event enable mask:
+ *   0x1: voluntary virtual processor waits
+ *   0x2: time-slice preempts
+ *   0x4: virtual partition memory page faults
+ */
+#define DTL_LOG_CEDE   0x1
+#define DTL_LOG_PREEMPT0x2
+#define DTL_LOG_FAULT  0x4
+#define DTL_LOG_ALL(DTL_LOG_CEDE | DTL_LOG_PREEMPT | DTL_LOG_FAULT)
+
+extern struct kmem_cache *dtl_cache;
+extern rwlock_t dtl_access_lock;
+
+/*
+ * When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE = y, the cpu accounting code controls
+ * reading from the dispatch trace log.  If other code wants to consume
+ * DTL entries, it can set this pointer to a function that will get
+ * called once for each DTL entry that gets processed.
+ */
+extern void (*dtl_consumer)(struct dtl_entry *entry, u64 index);
+
+extern void register_dtl_buffer(int cpu);
+extern void alloc_dtl_buffers(unsigned long *time_limit);
+extern long hcall_vphn(unsigned long cpu, u64 flags, __be32 *associativity);
+
+#endif /* _ASM_POWERPC_DTL_H */
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -42,7 +42,6 @@
  */
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -146,49 +145,6 @@ struct slb_shadow {
} save_area[SLB_NUM_BOLTED];
 } cacheline_aligned;
 
-/*
- * Layout of entries in the hypervisor's dispatch trace log buffer.
- */
-struct dtl_entry {
-   u8  dispatch_reason;
-   u8  preempt_reason;
-   __be16  processor_id;
-   __be32  enqueue_to_dispatch_time;
-   __be32  ready_to_enqueue_time;
-   __be32  waiting_to_ready_time;
-   __be64  timebase;
-   __be64  fault_addr;
-   __be64  srr0;
-   __be64  srr1;
-};
-
-#define DISPATCH_LOG_BYTES 4096/* bytes per cpu */
-#define N_DISPATCH_LOG (DISPATCH_LOG_BYTES / sizeof(struct dtl_entry))
-
-/*
- * Dispatch trace log event enable mask:
- *   0x1: voluntary virtual processor waits
- *   0x2: time-slice preempts
- *   0x4: virtual partition memory page faults
- */
-#define DTL_LOG_CEDE   0x1
-#define DTL_LOG_PREEMPT0x2
-#define DTL_LOG_FAULT  0x4
-#define DTL_LOG_ALL(DTL_LOG_CEDE | DTL_LOG_PREEMPT | DTL_LOG_FAULT)
-
-extern struct kmem_cache *dtl_cache;
-extern rwlock_t dtl_access_lock;
-
-/*
- * When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE = y, the cpu accounting code controls
- * reading from the dispatch trace log.  If other code wants to consume
- * DTL entries, it can set this pointer to a function that will get
- * called once for each DTL entry that gets processed.
- */
-extern void (*dtl_consumer)(struct dtl_entry *entry, u64 index);
-
-extern void register_dtl_buffer(int cpu);
-extern void alloc_dtl_buffers(unsigned long *time_limit);
 extern long hcall_vphn(unsigned long cpu, u64 flags, __be32 *associativity);
 
 #endif /* CONFIG_PPC_BOOK3S */
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -69,6 +69,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* powerpc clocksource/clockevent code */
 
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -74,6 +74,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "book3s.h"
 
--- a/arch/powerpc/platforms/pseries/dtl.c
+++ b/arch/powerpc/platforms/pseries/dtl.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "pseries.h"
 
--- 

[RFC][PATCH v3 1/5] sparc64: Fix asm/percpu.h build error

2020-05-29 Thread Peter Zijlstra
In order to break a header dependency between lockdep and task_struct,
I need per-cpu stuff from lockdep.

Including  from lockdep.h gives a build error, this
patch cures that, but results in the following warning:

../arch/sparc/include/asm/percpu_64.h:7:24: warning: call-clobbered register 
used for global register variable
register unsigned long __local_per_cpu_offset asm("g5");

But i've no idea how to fix that :/ but it does build.

Not-Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/sparc/include/asm/trap_block.h |2 ++
 1 file changed, 2 insertions(+)

--- a/arch/sparc/include/asm/trap_block.h
+++ b/arch/sparc/include/asm/trap_block.h
@@ -2,6 +2,8 @@
 #ifndef _SPARC_TRAP_BLOCK_H
 #define _SPARC_TRAP_BLOCK_H
 
+#include 
+
 #include 
 #include 
 




[PATCH v3 5/5] lockdep: Remove lockdep_hardirq{s_enabled, _context}() argument

2020-05-29 Thread Peter Zijlstra
Now that the macros use per-cpu data, we no longer need the argument.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/entry/common.c|2 +-
 include/linux/irqflags.h   |8 
 include/linux/lockdep.h|2 +-
 kernel/locking/lockdep.c   |   30 +++---
 kernel/softirq.c   |2 +-
 tools/include/linux/irqflags.h |4 ++--
 6 files changed, 24 insertions(+), 24 deletions(-)

--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -689,7 +689,7 @@ noinstr void idtentry_exit_user(struct p
 
 noinstr bool idtentry_enter_nmi(struct pt_regs *regs)
 {
-   bool irq_state = lockdep_hardirqs_enabled(current);
+   bool irq_state = lockdep_hardirqs_enabled();
 
__nmi_enter();
lockdep_hardirqs_off(CALLER_ADDR0);
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -40,9 +40,9 @@ DECLARE_PER_CPU(int, hardirq_context);
   extern void trace_hardirqs_off_finish(void);
   extern void trace_hardirqs_on(void);
   extern void trace_hardirqs_off(void);
-# define lockdep_hardirq_context(p)(this_cpu_read(hardirq_context))
+# define lockdep_hardirq_context() (this_cpu_read(hardirq_context))
 # define lockdep_softirq_context(p)((p)->softirq_context)
-# define lockdep_hardirqs_enabled(p)   (this_cpu_read(hardirqs_enabled))
+# define lockdep_hardirqs_enabled()(this_cpu_read(hardirqs_enabled))
 # define lockdep_softirqs_enabled(p)   ((p)->softirqs_enabled)
 # define lockdep_hardirq_enter()   \
 do {   \
@@ -109,9 +109,9 @@ do {\
 # define trace_hardirqs_off_finish()   do { } while (0)
 # define trace_hardirqs_on()   do { } while (0)
 # define trace_hardirqs_off()  do { } while (0)
-# define lockdep_hardirq_context(p)0
+# define lockdep_hardirq_context() 0
 # define lockdep_softirq_context(p)0
-# define lockdep_hardirqs_enabled(p)   0
+# define lockdep_hardirqs_enabled()0
 # define lockdep_softirqs_enabled(p)   0
 # define lockdep_hardirq_enter()   do { } while (0)
 # define lockdep_hardirq_threaded()do { } while (0)
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -736,7 +736,7 @@ do {
\
 
 # define lockdep_assert_RT_in_threaded_ctx() do {  \
WARN_ONCE(debug_locks && !current->lockdep_recursion && \
- lockdep_hardirq_context(current) &&   \
+ lockdep_hardirq_context() &&  \
  !(current->hardirq_threaded || current->irq_config),  
\
  "Not in threaded context on PREEMPT_RT as 
expected\n");   \
 } while (0)
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2062,9 +2062,9 @@ print_bad_irq_dependency(struct task_str
pr_warn("-\n");
pr_warn("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] is trying to acquire:\n",
curr->comm, task_pid_nr(curr),
-   lockdep_hardirq_context(curr), hardirq_count() >> HARDIRQ_SHIFT,
+   lockdep_hardirq_context(), hardirq_count() >> HARDIRQ_SHIFT,
curr->softirq_context, softirq_count() >> SOFTIRQ_SHIFT,
-   lockdep_hardirqs_enabled(curr),
+   lockdep_hardirqs_enabled(),
curr->softirqs_enabled);
print_lock(next);
 
@@ -3331,9 +3331,9 @@ print_usage_bug(struct task_struct *curr
 
pr_warn("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] takes:\n",
curr->comm, task_pid_nr(curr),
-   lockdep_hardirq_context(curr), hardirq_count() >> HARDIRQ_SHIFT,
+   lockdep_hardirq_context(), hardirq_count() >> HARDIRQ_SHIFT,
lockdep_softirq_context(curr), softirq_count() >> SOFTIRQ_SHIFT,
-   lockdep_hardirqs_enabled(curr),
+   lockdep_hardirqs_enabled(),
lockdep_softirqs_enabled(curr));
print_lock(this);
 
@@ -3655,7 +3655,7 @@ void lockdep_hardirqs_on_prepare(unsigne
if (DEBUG_LOCKS_WARN_ON(current->lockdep_recursion & 
LOCKDEP_RECURSION_MASK))
return;
 
-   if (unlikely(lockdep_hardirqs_enabled(current))) {
+   if (unlikely(lockdep_hardirqs_enabled())) {
/*
 * Neither irq nor preemption are disabled here
 * so this is racy by nature but losing one hit
@@ -3683,7 +3683,7 @@ void lockdep_hardirqs_on_prepare(unsigne
 * Can't allow enabling interrupts while in an interrupt handler,
 * that's general bad form and such. Recursion, limited stack etc..
 */
-   if (DEBUG_LOCKS_WARN_ON(lockdep_hardirq_context(current)))
+   if (DEBUG_LOCKS_WARN_ON(lockdep_hardirq_context()))
return;
 

[PATCH v3 4/5] lockdep: Change hardirq{s_enabled, _context} to per-cpu variables

2020-05-29 Thread Peter Zijlstra
Currently all IRQ-tracking state is in task_struct, this means that
task_struct needs to be defined before we use it.

Especially for lockdep_assert_irq*() this can lead to header-hell.

Move the hardirq state into per-cpu variables to avoid the task_struct
dependency.

Signed-off-by: Peter Zijlstra (Intel) 
---
 include/linux/irqflags.h |   19 ---
 include/linux/lockdep.h  |   34 ++
 include/linux/sched.h|2 --
 kernel/fork.c|4 +---
 kernel/locking/lockdep.c |   30 +++---
 kernel/softirq.c |6 ++
 6 files changed, 52 insertions(+), 43 deletions(-)

--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -14,6 +14,7 @@
 
 #include 
 #include 
+#include 
 
 /* Currently lockdep_softirqs_on/off is used only by lockdep */
 #ifdef CONFIG_PROVE_LOCKING
@@ -31,18 +32,22 @@
 #endif
 
 #ifdef CONFIG_TRACE_IRQFLAGS
+
+DECLARE_PER_CPU(int, hardirqs_enabled);
+DECLARE_PER_CPU(int, hardirq_context);
+
   extern void trace_hardirqs_on_prepare(void);
   extern void trace_hardirqs_off_finish(void);
   extern void trace_hardirqs_on(void);
   extern void trace_hardirqs_off(void);
-# define lockdep_hardirq_context(p)((p)->hardirq_context)
+# define lockdep_hardirq_context(p)(this_cpu_read(hardirq_context))
 # define lockdep_softirq_context(p)((p)->softirq_context)
-# define lockdep_hardirqs_enabled(p)   ((p)->hardirqs_enabled)
+# define lockdep_hardirqs_enabled(p)   (this_cpu_read(hardirqs_enabled))
 # define lockdep_softirqs_enabled(p)   ((p)->softirqs_enabled)
-# define lockdep_hardirq_enter()   \
-do {   \
-   if (!current->hardirq_context++)\
-   current->hardirq_threaded = 0;  \
+# define lockdep_hardirq_enter()   \
+do {   \
+   if (this_cpu_inc_return(hardirq_context) == 1)  \
+   current->hardirq_threaded = 0;  \
 } while (0)
 # define lockdep_hardirq_threaded()\
 do {   \
@@ -50,7 +55,7 @@ do {  \
 } while (0)
 # define lockdep_hardirq_exit()\
 do {   \
-   current->hardirq_context--; \
+   this_cpu_dec(hardirq_context);  \
 } while (0)
 # define lockdep_softirq_enter()   \
 do {   \
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -19,6 +19,7 @@ extern int lock_stat;
 
 #define MAX_LOCKDEP_SUBCLASSES 8UL
 
+#include 
 #include 
 
 enum lockdep_wait_type {
@@ -703,28 +704,29 @@ do {  
\
lock_release(&(lock)->dep_map, _THIS_IP_);  \
 } while (0)
 
-#define lockdep_assert_irqs_enabled()  do {\
-   WARN_ONCE(debug_locks && !current->lockdep_recursion && \
- !current->hardirqs_enabled,   \
- "IRQs not enabled as expected\n");\
-   } while (0)
+DECLARE_PER_CPU(int, hardirqs_enabled);
+DECLARE_PER_CPU(int, hardirq_context);
 
-#define lockdep_assert_irqs_disabled() do {\
-   WARN_ONCE(debug_locks && !current->lockdep_recursion && \
- current->hardirqs_enabled,\
- "IRQs not disabled as expected\n");   \
-   } while (0)
+#define lockdep_assert_irqs_enabled()  \
+do {   \
+   WARN_ON_ONCE(debug_locks && !this_cpu_read(hardirqs_enabled));  \
+} while (0)
 
-#define lockdep_assert_in_irq() do {   \
-   WARN_ONCE(debug_locks && !current->lockdep_recursion && \
- !current->hardirq_context,\
- "Not in hardirq as expected\n");  \
-   } while (0)
+#define lockdep_assert_irqs_disabled() \
+do {   \
+   WARN_ON_ONCE(debug_locks && this_cpu_read(hardirqs_enabled));   \
+} while (0)
+
+#define lockdep_assert_in_irq()
\
+do {   \
+   WARN_ON_ONCE(debug_locks && !this_cpu_read(hardirq_context));   \
+} while (0)
 
 #else
 # define might_lock(lock) do { } while (0)
 # define might_lock_read(lock) do { } while (0)
 # define might_lock_nested(lock, subclass) do { } while (0)
+
 # define lockdep_assert_irqs_enabled() do { } while (0)
 # define lockdep_assert_irqs_disabled() do { } while (0)
 # define lockdep_assert_in_irq() do { 

[PATCH v3 3/5] s390: Break cyclic percpu include

2020-05-29 Thread Peter Zijlstra
In order to use  in irqflags.h, we need to make sure
asm/percpu.h does not itself depend on irqflags.h

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/s390/include/asm/smp.h |1 +
 arch/s390/include/asm/thread_info.h |1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

--- a/arch/s390/include/asm/smp.h
+++ b/arch/s390/include/asm/smp.h
@@ -10,6 +10,7 @@
 
 #include 
 #include 
+#include 
 
 #define raw_smp_processor_id() (S390_lowcore.cpu_nr)
 
--- a/arch/s390/include/asm/thread_info.h
+++ b/arch/s390/include/asm/thread_info.h
@@ -24,7 +24,6 @@
 #ifndef __ASSEMBLY__
 #include 
 #include 
-#include 
 
 #define STACK_INIT_OFFSET \
(THREAD_SIZE - STACK_FRAME_OVERHEAD - sizeof(struct pt_regs))




Re: [musl] ppc64le and 32-bit LE userland compatibility

2020-05-29 Thread Rich Felker
On Fri, May 29, 2020 at 07:03:48PM +, Will Springer wrote:
> The next problem concerns the ABI more directly. The failure mode was `file`
> surfacing EINVAL from pread64 when invoked on an ELF; pread64 was passed a
> garbage value for `pos`, which didn't appear to be caused by anything in 
> `file`. Initially it seemed as though the 32-bit components of the arg were
> getting swapped, and we made hacky fixes to glibc and musl to put them in the
> "right order"; however, we weren't sure if that was the correct approach, or
> if there were knock-on effects we didn't know about. So we found the relevant
> compat code path in the kernel, at arch/powerpc/kernel/sys_ppc32.c, where
> there exists this comment:
> 
> > /*
> >  * long long munging:
> >  * The 32 bit ABI passes long longs in an odd even register pair.
> >  */
> 
> It seems that the opposite is true in LE mode, and something is expecting long
> longs to start on an even register. I realized this after I tried swapping hi/
> lo `u32`s here and didn't see an improvement. I whipped up a patch [6] that
> switches which syscalls use padding arguments depending on endianness, while
> hopefully remaining tidy enough to be unobtrusive. (I took some liberties with
> variable names/types so that the macro could be consistent.)

The argument passing for pread/pwrite is historically a mess and
differs between archs. musl has a dedicated macro that archs can
define to override it. But it looks like it should match regardless of
BE vs LE, and musl already defines it for powerpc with the default
definition, adding a zero arg to start on an even arg-slot index,
which is an odd register (since ppc32 args start with an odd one, r3).

> [6]: https://gist.github.com/Skirmisher/02891c1a8cafa0ff18b2460933ef4f3c

I don't think this is correct, but I'm confused about where it's
getting messed up because it looks like it should already be right.

> This was enough to fix up the `file` bug. I'm no seasoned kernel hacker,
> though, and there is still concern over the right way to approach this,
> whether it should live in the kernel or libc, etc. Frankly, I don't know the
> ABI structure enough to understand why the register padding has to be
> different in this case, or what lower-level component is responsible for it.. 
> For comparison, I had a look at the mips tree, since it's bi-endian and has a 
> similar 32/64 situation. There is a macro conditional upon endianness that is 
> responsible for munging long longs; it uses __MIPSEB__ and __MIPSEL__ instead 
> of an if/else on the generic __LITTLE_ENDIAN__. Not sure what to make of 
> that. 
> (It also simply swaps registers for LE, unlike what I did for ppc.)

Indeed the problem is probably that you need to swap registers for LE,
not remove the padding slot. Did you check what happens if you pass a
value larger than 32 bits?

If so, the right way to fix this on the kernel side would be to
construct the value as a union rather than by bitwise ops so it's
endian-agnostic:

(union { u32 parts[2]; u64 val; }){{ arg1, arg2 }}.val

But the kernel folks might prefer endian ifdefs for some odd reason...

> Also worth noting is the one other outstanding bug, where the time-related
> syscalls in the 32-bit vDSO seem to return garbage. It doesn't look like an
> endian bug to me, and it doesn't affect standard syscalls (which is why if you
> run `date` on musl it prints the correct time, unlike on glibc). The vDSO time
> functions are implemented in ppc asm (arch/powerpc/kernel/vdso32/
> gettimeofday.S), and I've never touched the stuff, so if anyone has a clue 
> I'm 
> all ears.

Not sure about this. Worst-case, just leave it disabled until someone
finds a fix.

Rich


Re: [RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.

2020-05-29 Thread Dan Williams
On Fri, May 29, 2020 at 3:55 AM Aneesh Kumar K.V
 wrote:
>
> On 5/29/20 3:22 PM, Jan Kara wrote:
> > Hi!
> >
> > On Fri 29-05-20 15:07:31, Aneesh Kumar K.V wrote:
> >> Thanks Michal. I also missed Jeff in this email thread.
> >
> > And I think you'll also need some of the sched maintainers for the prctl
> > bits...
> >
> >> On 5/29/20 3:03 PM, Michal Suchánek wrote:
> >>> Adding Jan
> >>>
> >>> On Fri, May 29, 2020 at 11:11:39AM +0530, Aneesh Kumar K.V wrote:
>  With POWER10, architecture is adding new pmem flush and sync 
>  instructions.
>  The kernel should prevent the usage of MAP_SYNC if applications are not 
>  using
>  the new instructions on newer hardware.
> 
>  This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable
>  the usage of MAP_SYNC. The kernel config option is added to allow the 
>  user
>  to control whether MAP_SYNC should be enabled by default or not.
> 
>  Signed-off-by: Aneesh Kumar K.V 
> > ...
>  diff --git a/kernel/fork.c b/kernel/fork.c
>  index 8c700f881d92..d5a9a363e81e 100644
>  --- a/kernel/fork.c
>  +++ b/kernel/fork.c
>  @@ -963,6 +963,12 @@ __cacheline_aligned_in_smp 
>  DEFINE_SPINLOCK(mmlist_lock);
> static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT;
>  +#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE
>  +unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK;
>  +#else
>  +unsigned long default_map_sync_mask = 0;
>  +#endif
>  +
> >
> > I'm not sure CONFIG is really the right approach here. For a distro that 
> > would
> > basically mean to disable MAP_SYNC for all PPC kernels unless application
> > explicitly uses the right prctl. Shouldn't we rather initialize
> > default_map_sync_mask on boot based on whether the CPU we run on requires
> > new flush instructions or not? Otherwise the patch looks sensible.
> >
>
> yes that is correct. We ideally want to deny MAP_SYNC only w.r.t
> POWER10. But on a virtualized platform there is no easy way to detect
> that. We could ideally hook this into the nvdimm driver where we look at
> the new compat string ibm,persistent-memory-v2 and then disable MAP_SYNC
> if we find a device with the specific value.
>
> BTW with the recent changes I posted for the nvdimm driver, older kernel
> won't initialize persistent memory device on newer hardware. Newer
> hardware will present the device to OS with a different device tree
> compat string.
>
> My expectation  w.r.t this patch was, Distro would want to  mark
> CONFIG_ARCH_MAP_SYNC_DISABLE=n based on the different application
> certification.  Otherwise application will have to end up calling the
> prctl(MMF_DISABLE_MAP_SYNC, 0) any way. If that is the case, should this
> be dependent on P10?
>
> With that I am wondering should we even have this patch? Can we expect
> userspace get updated to use new instruction?.
>
> With ppc64 we never had a real persistent memory device available for
> end user to try. The available persistent memory stack was using vPMEM
> which was presented as a volatile memory region for which there is no
> need to use any of the flush instructions. We could safely assume that
> as we get applications certified/verified for working with pmem device
> on ppc64, they would all be using the new instructions?

I think prctl is the wrong interface for this. I was thinking a sysfs
interface along the same lines as /sys/block/pmemX/dax/write_cache.
That attribute is toggling DAXDEV_WRITE_CACHE for the determination of
whether the platform or the kernel needs to handle cache flushing
relative to power loss. A similar attribute can be established for
DAXDEV_SYNC, it would simply default to off based on a configuration
time policy, but be dynamically changeable at runtime via sysfs.

These flags are device properties that affect the kernel and
userspace's handling of persistence.


Re: [PATCH v8 0/8] powerpc: switch VDSO to C implementation

2020-05-29 Thread Christophe Leroy

Hi Michael,

Le 28/04/2020 à 15:16, Christophe Leroy a écrit :

This is the seventh version of a series to switch powerpc VDSO to
generic C implementation.

Main changes since v7 are:
- Added gettime64 on PPC32

This series applies on today's powerpc/merge branch.

See the last patches for details on changes and performance.


Do you have any plans for this series ?

Even if you don't feel like merging it this cycle, I think patches 1 to 
3 are worth it.


Christophe



Christophe Leroy (8):
   powerpc/vdso64: Switch from __get_datapage() to get_datapage inline
 macro
   powerpc/vdso: Remove __kernel_datapage_offset and simplify
 __get_datapage()
   powerpc/vdso: Remove unused \tmp param in __get_datapage()
   powerpc/processor: Move cpu_relax() into asm/vdso/processor.h
   powerpc/vdso: Prepare for switching VDSO to generic C implementation.
   powerpc/vdso: Switch VDSO to generic C implementation.
   lib/vdso: force inlining of __cvdso_clock_gettime_common()
   powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32

  arch/powerpc/Kconfig |   2 +
  arch/powerpc/include/asm/clocksource.h   |   7 +
  arch/powerpc/include/asm/processor.h |  10 +-
  arch/powerpc/include/asm/vdso/clocksource.h  |   7 +
  arch/powerpc/include/asm/vdso/gettimeofday.h | 175 +++
  arch/powerpc/include/asm/vdso/processor.h|  23 ++
  arch/powerpc/include/asm/vdso/vsyscall.h |  25 ++
  arch/powerpc/include/asm/vdso_datapage.h |  50 ++--
  arch/powerpc/kernel/asm-offsets.c|  49 +--
  arch/powerpc/kernel/time.c   |  91 +-
  arch/powerpc/kernel/vdso.c   |  58 +---
  arch/powerpc/kernel/vdso32/Makefile  |  32 +-
  arch/powerpc/kernel/vdso32/cacheflush.S  |   2 +-
  arch/powerpc/kernel/vdso32/config-fake32.h   |  34 +++
  arch/powerpc/kernel/vdso32/datapage.S|   7 +-
  arch/powerpc/kernel/vdso32/gettimeofday.S| 300 +--
  arch/powerpc/kernel/vdso32/vdso32.lds.S  |   8 +-
  arch/powerpc/kernel/vdso32/vgettimeofday.c   |  35 +++
  arch/powerpc/kernel/vdso64/Makefile  |  23 +-
  arch/powerpc/kernel/vdso64/cacheflush.S  |   9 +-
  arch/powerpc/kernel/vdso64/datapage.S|  31 +-
  arch/powerpc/kernel/vdso64/gettimeofday.S| 243 +--
  arch/powerpc/kernel/vdso64/vdso64.lds.S  |   7 +-
  arch/powerpc/kernel/vdso64/vgettimeofday.c   |  29 ++
  lib/vdso/gettimeofday.c  |   2 +-
  25 files changed, 460 insertions(+), 799 deletions(-)
  create mode 100644 arch/powerpc/include/asm/clocksource.h
  create mode 100644 arch/powerpc/include/asm/vdso/clocksource.h
  create mode 100644 arch/powerpc/include/asm/vdso/gettimeofday.h
  create mode 100644 arch/powerpc/include/asm/vdso/processor.h
  create mode 100644 arch/powerpc/include/asm/vdso/vsyscall.h
  create mode 100644 arch/powerpc/kernel/vdso32/config-fake32.h
  create mode 100644 arch/powerpc/kernel/vdso32/vgettimeofday.c
  create mode 100644 arch/powerpc/kernel/vdso64/vgettimeofday.c



Re: [PATCH] powerpc/32s: Fix another build failure with CONFIG_PPC_KUAP_DEBUG

2020-05-29 Thread Christophe Leroy




Le 29/05/2020 à 20:50, Christophe Leroy a écrit :

From: Christophe Leroy 

'thread' doesn't exist in kuap_check() macro.

Use 'current' instead.

Fixes: a68c31fc01ef ("powerpc/32s: Implement Kernel Userspace Access 
Protection")
Signed-off-by: Christophe Leroy 


Argh, can you drop this line ?


Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy 


Reported-by: kbuild test robot 


---
  arch/powerpc/include/asm/book3s/32/kup.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/32/kup.h 
b/arch/powerpc/include/asm/book3s/32/kup.h
index db0a1c281587..668508c8a1b5 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -75,7 +75,7 @@
  
  .macro kuap_check	current, gpr

  #ifdef CONFIG_PPC_KUAP_DEBUG
-   lwz \gpr, KUAP(thread)
+   lwz \gpr, THREAD + KUAP(\current)
  999:  twnei   \gpr, 0
EMIT_BUG_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | 
BUGFLAG_ONCE)
  #endif



Re: [PATCH] powerpc/nvram: Replace kmalloc with kzalloc in the error message

2020-05-29 Thread Markus Elfring
> Please just remove the message instead, it's a tiny allocation that's
> unlikely to ever fail, and the caller will print an error anyway.

How do you think about to take another look at a previous update suggestion
like the following?

powerpc/nvram: Delete three error messages for a failed memory allocation
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/00845261-8528-d011-d3b8-e9355a231...@users.sourceforge.net/
https://lore.kernel.org/linuxppc-dev/00845261-8528-d011-d3b8-e9355a231...@users.sourceforge.net/
https://lore.kernel.org/patchwork/patch/752720/
https://lkml.org/lkml/2017/1/19/537

Regards,
Markus


[PATCH] powerpc/32s: Fix another build failure with CONFIG_PPC_KUAP_DEBUG

2020-05-29 Thread Christophe Leroy
From: Christophe Leroy 

'thread' doesn't exist in kuap_check() macro.

Use 'current' instead.

Fixes: a68c31fc01ef ("powerpc/32s: Implement Kernel Userspace Access 
Protection")
Signed-off-by: Christophe Leroy 
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/kup.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/32/kup.h 
b/arch/powerpc/include/asm/book3s/32/kup.h
index db0a1c281587..668508c8a1b5 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -75,7 +75,7 @@
 
 .macro kuap_check  current, gpr
 #ifdef CONFIG_PPC_KUAP_DEBUG
-   lwz \gpr, KUAP(thread)
+   lwz \gpr, THREAD + KUAP(\current)
 999:   twnei   \gpr, 0
EMIT_BUG_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | 
BUGFLAG_ONCE)
 #endif
-- 
2.25.0



Re: [PATCH] ASoC: fsl_asrc_dma: Fix dma_chan leak when config DMA channel failed

2020-05-29 Thread Mark Brown
On Mon, 25 May 2020 22:12:46 +0800, Xiyu Yang wrote:
> fsl_asrc_dma_hw_params() invokes dma_request_channel() or
> fsl_asrc_get_dma_channel(), which returns a reference of the specified
> dma_chan object to "pair->dma_chan[dir]" with increased refcnt.
> 
> The reference counting issue happens in one exception handling path of
> fsl_asrc_dma_hw_params(). When config DMA channel failed for Back-End,
> the function forgets to decrease the refcnt increased by
> dma_request_channel() or fsl_asrc_get_dma_channel(), causing a refcnt
> leak.
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: fsl_asrc_dma: Fix dma_chan leak when config DMA channel failed
  commit: 36124fb19f1ae68a500cd76a76d40c6e81bee346

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


Re: [Intel-gfx] [PATCH 06/13] ocfs2: use new sysctl subdir helper register_sysctl_subdir()

2020-05-29 Thread Kees Cook
On Fri, May 29, 2020 at 11:49:12AM +, Luis Chamberlain wrote:
> Yikes, sense, you're right. Nope, I left the random config tests to
> 0day. Will fix, thanks!

Yeah, I do the same for randconfig, but I always do an "allmodconfig"
build before sending stuff. It's a good smoke test.

-- 
Kees Cook


Re: [PATCH 12/13] sysctl: add helper to register empty subdir

2020-05-29 Thread Eric W. Biederman
Luis Chamberlain  writes:

> The way to create a subdirectory from the base set of directories
> is a bit obscure, so provide a helper which makes this clear, and
> also helps remove boiler plate code required to do this work.

I agreee calling:
register_sysctl("fs/binfmt_misc", sysctl_mount_point)
is a bit obscure but if you are going to make a wrapper
please make it the trivial one liner above.

Say something that looks like:
struct sysctl_header *register_sysctl_mount_point(const char *path)
{
return register_sysctl(path, sysctl_mount_point);
}

And yes please talk about a mount point and not an empty dir, as these
are permanently empty directories to serve as mount points.  There are
some subtle but important permission checks this allows in the case of
unprivileged mounts.

Further code like this belong in proc_sysctl.c next to all of the code
it is related to so that it is easier to see how to refactor the code if
necessary.

Eric

>
> Signed-off-by: Luis Chamberlain 
> ---
>  include/linux/sysctl.h |  7 +++
>  kernel/sysctl.c| 16 +---
>  2 files changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
> index 33a471b56345..89c92390e6de 100644
> --- a/include/linux/sysctl.h
> +++ b/include/linux/sysctl.h
> @@ -208,6 +208,8 @@ extern void register_sysctl_init(const char *path, struct 
> ctl_table *table,
>  extern struct ctl_table_header *register_sysctl_subdir(const char *base,
>  const char *subdir,
>  struct ctl_table *table);
> +extern void register_sysctl_empty_subdir(const char *base, const char 
> *subdir);
> +
>  void do_sysctl_args(void);
>  
>  extern int pwrsw_enabled;
> @@ -231,6 +233,11 @@ inline struct ctl_table_header 
> *register_sysctl_subdir(const char *base,
>   return NULL;
>  }
>  
> +static inline void register_sysctl_empty_subdir(const char *base,
> + const char *subdir)
> +{
> +}
> +
>  static inline struct ctl_table_header *register_sysctl_paths(
>   const struct ctl_path *path, struct ctl_table *table)
>  {
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index f9a35325d5d5..460532cd5ac8 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -3188,13 +3188,17 @@ struct ctl_table_header *register_sysctl_subdir(const 
> char *base,
>   { }
>   };
>  
> - if (!table->procname)
> + if (table != sysctl_mount_point && !table->procname)
>   goto out;
>  
>   hdr = register_sysctl_table(base_table);
>   if (unlikely(!hdr)) {
> - pr_err("failed when creating subdirectory sysctl %s/%s/%s\n",
> -base, subdir, table->procname);
> + if (table != sysctl_mount_point)
> + pr_err("failed when creating subdirectory sysctl 
> %s/%s/%s\n",
> +base, subdir, table->procname);
> + else
> + pr_err("failed when creating empty subddirectory 
> %s/%s\n",
> +base, subdir);
>   goto out;
>   }
>   kmemleak_not_leak(hdr);
> @@ -3202,6 +3206,12 @@ struct ctl_table_header *register_sysctl_subdir(const 
> char *base,
>   return hdr;
>  }
>  EXPORT_SYMBOL_GPL(register_sysctl_subdir);
> +
> +void register_sysctl_empty_subdir(const char *base,
> +   const char *subdir)
> +{
> + register_sysctl_subdir(base, subdir, sysctl_mount_point);
> +}
>  #endif /* CONFIG_SYSCTL */
>  /*
>   * No sense putting this after each symbol definition, twice,


Re: [PATCH 02/13] cdrom: use new sysctl subdir helper register_sysctl_subdir()

2020-05-29 Thread Eric W. Biederman
Luis Chamberlain  writes:

> This simplifies the code considerably. The following coccinelle

With register_sysctl the code would read:

cdrom_sysctl_header = register_sysctl("dev/cdrom", cdrom_table);

Please go that direction.  Thank you.

Eric


Re: [PATCH 01/13] sysctl: add new register_sysctl_subdir() helper

2020-05-29 Thread Eric W. Biederman
ebied...@xmission.com (Eric W. Biederman) writes:

> Luis Chamberlain  writes:
>
>> Often enough all we need to do is create a subdirectory so that
>> we can stuff sysctls underneath it. However, *if* that directory
>> was already created early on the boot sequence we really have no
>> need to use the full boiler plate code for it, we can just use
>> local variables to help us guide sysctl to place the new leaf files.
>>
>> So use a helper to do precisely this.
>
> Reset restart.  This is patch is total nonsense.
>
> - You are using register_sysctl_table which as I believe I have
>   mentioned is a deprecated compatibility wrapper.  The point of
>   spring house cleaning is to get off of the deprecated functions
>   isn't it?
>
> - You are using the old nasty form for creating directories instead
>   of just passing in a path.
>
> - None of this is even remotely necessary.  The directories
>   are created automatically if you just register their entries.

Oh.  *blink*  The poor naming threw me off.

This is a clumsy and poorly named version of register_sysctl();

Yes. This change is totally unnecessary.

Eric



Re: [PATCH 01/13] sysctl: add new register_sysctl_subdir() helper

2020-05-29 Thread Eric W. Biederman
Luis Chamberlain  writes:

> Often enough all we need to do is create a subdirectory so that
> we can stuff sysctls underneath it. However, *if* that directory
> was already created early on the boot sequence we really have no
> need to use the full boiler plate code for it, we can just use
> local variables to help us guide sysctl to place the new leaf files.
>
> So use a helper to do precisely this.

Reset restart.  This is patch is total nonsense.

- You are using register_sysctl_table which as I believe I have
  mentioned is a deprecated compatibility wrapper.  The point of
  spring house cleaning is to get off of the deprecated functions
  isn't it?

- You are using the old nasty form for creating directories instead
  of just passing in a path.

- None of this is even remotely necessary.  The directories
  are created automatically if you just register their entries.

Eric


Re: [PATCH v3 3/3] arch, scripts: Add script to check relocations at compile time

2020-05-29 Thread Anup Patel
On Sun, May 24, 2020 at 2:26 PM Alexandre Ghiti  wrote:
>
> Relocating kernel at runtime is done very early in the boot process, so
> it is not convenient to check for relocations there and react in case a
> relocation was not expected.
>
> Powerpc architecture has a script that allows to check at compile time
> for such unexpected relocations: extract the common logic to scripts/
> and add arch specific scripts triggered at postlink.
>
> At the moment, powerpc and riscv architectures take advantage of this
> compile-time check.
>
> Signed-off-by: Alexandre Ghiti 
> ---
>  arch/powerpc/tools/relocs_check.sh | 18 ++-
>  arch/riscv/Makefile.postlink   | 36 ++
>  arch/riscv/tools/relocs_check.sh   | 26 +
>  scripts/relocs_check.sh| 20 +
>  4 files changed, 84 insertions(+), 16 deletions(-)
>  create mode 100644 arch/riscv/Makefile.postlink
>  create mode 100755 arch/riscv/tools/relocs_check.sh
>  create mode 100755 scripts/relocs_check.sh

Maybe you should send the change arch/powerpc/tools/relocs_check.sh
as a separate patch so that it can be picked up by arch/powerpc maintainers.

>
> diff --git a/arch/powerpc/tools/relocs_check.sh 
> b/arch/powerpc/tools/relocs_check.sh
> index 014e00e74d2b..e367895941ae 100755
> --- a/arch/powerpc/tools/relocs_check.sh
> +++ b/arch/powerpc/tools/relocs_check.sh
> @@ -15,21 +15,8 @@ if [ $# -lt 3 ]; then
> exit 1
>  fi
>
> -# Have Kbuild supply the path to objdump and nm so we handle cross 
> compilation.
> -objdump="$1"
> -nm="$2"
> -vmlinux="$3"
> -
> -# Remove from the bad relocations those that match an undefined weak symbol
> -# which will result in an absolute relocation to 0.
> -# Weak unresolved symbols are of that form in nm output:
> -# "  w _binary__btf_vmlinux_bin_end"
> -undef_weak_symbols=$($nm "$vmlinux" | awk '$1 ~ /w/ { print $2 }')
> -
>  bad_relocs=$(
> -$objdump -R "$vmlinux" |
> -   # Only look at relocation lines.
> -   grep -E '\ +${srctree}/scripts/relocs_check.sh "$@" |
> # These relocations are okay
> # On PPC64:
> #   R_PPC64_RELATIVE, R_PPC64_NONE
> @@ -43,8 +30,7 @@ R_PPC_ADDR16_LO
>  R_PPC_ADDR16_HI
>  R_PPC_ADDR16_HA
>  R_PPC_RELATIVE
> -R_PPC_NONE' |
> -   ([ "$undef_weak_symbols" ] && grep -F -w -v "$undef_weak_symbols" || 
> cat)
> +R_PPC_NONE'
>  )
>
>  if [ -z "$bad_relocs" ]; then
> diff --git a/arch/riscv/Makefile.postlink b/arch/riscv/Makefile.postlink
> new file mode 100644
> index ..bf2b2bca1845
> --- /dev/null
> +++ b/arch/riscv/Makefile.postlink
> @@ -0,0 +1,36 @@
> +# SPDX-License-Identifier: GPL-2.0
> +# ===
> +# Post-link riscv pass
> +# ===
> +#
> +# Check that vmlinux relocations look sane
> +
> +PHONY := __archpost
> +__archpost:
> +
> +-include include/config/auto.conf
> +include scripts/Kbuild.include
> +
> +quiet_cmd_relocs_check = CHKREL  $@
> +cmd_relocs_check = \
> +   $(CONFIG_SHELL) $(srctree)/arch/riscv/tools/relocs_check.sh 
> "$(OBJDUMP)" "$(NM)" "$@"
> +
> +# `@true` prevents complaint when there is nothing to be done
> +
> +vmlinux: FORCE
> +   @true
> +ifdef CONFIG_RELOCATABLE
> +   $(call if_changed,relocs_check)
> +endif
> +
> +%.ko: FORCE
> +   @true
> +
> +clean:
> +   @true
> +
> +PHONY += FORCE clean
> +
> +FORCE:
> +
> +.PHONY: $(PHONY)
> diff --git a/arch/riscv/tools/relocs_check.sh 
> b/arch/riscv/tools/relocs_check.sh
> new file mode 100755
> index ..baeb2e7b2290
> --- /dev/null
> +++ b/arch/riscv/tools/relocs_check.sh
> @@ -0,0 +1,26 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +# Based on powerpc relocs_check.sh
> +
> +# This script checks the relocations of a vmlinux for "suspicious"
> +# relocations.
> +
> +if [ $# -lt 3 ]; then
> +echo "$0 [path to objdump] [path to nm] [path to vmlinux]" 1>&2
> +exit 1
> +fi
> +
> +bad_relocs=$(
> +${srctree}/scripts/relocs_check.sh "$@" |
> +   # These relocations are okay
> +   #   R_RISCV_RELATIVE
> +   grep -F -w -v 'R_RISCV_RELATIVE'
> +)
> +
> +if [ -z "$bad_relocs" ]; then
> +   exit 0
> +fi
> +
> +num_bad=$(echo "$bad_relocs" | wc -l)
> +echo "WARNING: $num_bad bad relocations"
> +echo "$bad_relocs"
> diff --git a/scripts/relocs_check.sh b/scripts/relocs_check.sh
> new file mode 100755
> index ..137c660499f3
> --- /dev/null
> +++ b/scripts/relocs_check.sh
> @@ -0,0 +1,20 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +
> +# Get a list of all the relocations, remove from it the relocations
> +# that are known to be legitimate and return this list to arch specific
> +# script that will look for suspicious relocations.
> +
> +objdump="$1"
> +nm="$2"
> +vmlinux="$3"
> +
> +# Remove 

Re: [PATCH v3 2/3] riscv: Introduce CONFIG_RELOCATABLE

2020-05-29 Thread Anup Patel
On Sun, May 24, 2020 at 2:25 PM Alexandre Ghiti  wrote:
>
> This config allows to compile the kernel as PIE and to relocate it at
> any virtual address at runtime: this paves the way to KASLR and to 4-level
> page table folding at runtime. Runtime relocation is possible since
> relocation metadata are embedded into the kernel.
>
> Note that relocating at runtime introduces an overhead even if the
> kernel is loaded at the same address it was linked at and that the compiler
> options are those used in arm64 which uses the same RELA relocation
> format.
>
> Signed-off-by: Alexandre Ghiti 
> ---
>  arch/riscv/Kconfig  | 12 +++
>  arch/riscv/Makefile |  5 ++-
>  arch/riscv/kernel/vmlinux.lds.S |  6 ++--
>  arch/riscv/mm/Makefile  |  4 +++
>  arch/riscv/mm/init.c| 63 +
>  5 files changed, 87 insertions(+), 3 deletions(-)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index a31e1a41913a..93127d5913fe 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -170,6 +170,18 @@ config PGTABLE_LEVELS
> default 3 if 64BIT
> default 2
>
> +config RELOCATABLE
> +   bool
> +   depends on MMU
> +   help
> +  This builds a kernel as a Position Independent Executable (PIE),
> +  which retains all relocation metadata required to relocate the
> +  kernel binary at runtime to a different virtual address than the
> +  address it was linked at.
> +  Since RISCV uses the RELA relocation format, this requires a
> +  relocation pass at runtime even if the kernel is loaded at the
> +  same address it was linked at.
> +
>  source "arch/riscv/Kconfig.socs"
>
>  menu "Platform type"
> diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> index fb6e37db836d..1406416ea743 100644
> --- a/arch/riscv/Makefile
> +++ b/arch/riscv/Makefile
> @@ -9,7 +9,10 @@
>  #
>
>  OBJCOPYFLAGS:= -O binary
> -LDFLAGS_vmlinux :=
> +ifeq ($(CONFIG_RELOCATABLE),y)
> +LDFLAGS_vmlinux := -shared -Bsymbolic -z notext -z norelro
> +KBUILD_CFLAGS += -fPIE
> +endif
>  ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
> LDFLAGS_vmlinux := --no-relax
>  endif
> diff --git a/arch/riscv/kernel/vmlinux.lds.S b/arch/riscv/kernel/vmlinux.lds.S
> index a9abde62909f..e8ffba8c2044 100644
> --- a/arch/riscv/kernel/vmlinux.lds.S
> +++ b/arch/riscv/kernel/vmlinux.lds.S
> @@ -85,8 +85,10 @@ SECTIONS
>
> BSS_SECTION(PAGE_SIZE, PAGE_SIZE, 0)
>
> -   .rel.dyn : {
> -   *(.rel.dyn*)
> +   .rela.dyn : ALIGN(8) {
> +   __rela_dyn_start = .;
> +   *(.rela .rela*)
> +   __rela_dyn_end = .;
> }
>
> _end = .;
> diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
> index 363ef01c30b1..dc5cdaa80bc1 100644
> --- a/arch/riscv/mm/Makefile
> +++ b/arch/riscv/mm/Makefile
> @@ -1,6 +1,10 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>
>  CFLAGS_init.o := -mcmodel=medany
> +ifdef CONFIG_RELOCATABLE
> +CFLAGS_init.o += -fno-pie
> +endif
> +
>  ifdef CONFIG_FTRACE
>  CFLAGS_REMOVE_init.o = -pg
>  endif
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 17f108baec4f..7074522d40c6 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -13,6 +13,9 @@
>  #include 
>  #include 
>  #include 
> +#ifdef CONFIG_RELOCATABLE
> +#include 
> +#endif
>
>  #include 
>  #include 
> @@ -379,6 +382,53 @@ static uintptr_t __init best_map_size(phys_addr_t base, 
> phys_addr_t size)
>  #error "setup_vm() is called from head.S before relocate so it should not 
> use absolute addressing."
>  #endif
>
> +#ifdef CONFIG_RELOCATABLE
> +extern unsigned long __rela_dyn_start, __rela_dyn_end;
> +
> +#ifdef CONFIG_64BIT
> +#define Elf_Rela Elf64_Rela
> +#define Elf_Addr Elf64_Addr
> +#else
> +#define Elf_Rela Elf32_Rela
> +#define Elf_Addr Elf32_Addr
> +#endif
> +
> +void __init relocate_kernel(uintptr_t load_pa)
> +{
> +   Elf_Rela *rela = (Elf_Rela *)&__rela_dyn_start;
> +   /*
> +* This holds the offset between the linked virtual address and the
> +* relocated virtual address.
> +*/
> +   uintptr_t reloc_offset = kernel_virt_addr - KERNEL_LINK_ADDR;
> +   /*
> +* This holds the offset between kernel linked virtual address and
> +* physical address.
> +*/
> +   uintptr_t va_kernel_link_pa_offset = KERNEL_LINK_ADDR - load_pa;
> +
> +   for ( ; rela < (Elf_Rela *)&__rela_dyn_end; rela++) {
> +   Elf_Addr addr = (rela->r_offset - va_kernel_link_pa_offset);
> +   Elf_Addr relocated_addr = rela->r_addend;
> +
> +   if (rela->r_info != R_RISCV_RELATIVE)
> +   continue;
> +
> +   /*
> +* Make sure to not relocate vdso symbols like rt_sigreturn
> +* which are linked from the address 0 in vmlinux since
> +* vdso symbol addresses are actually used as an 

Re: [PATCH 11/13] random: simplify sysctl declaration with register_sysctl_subdir()

2020-05-29 Thread Xiaoming Ni

On 2020/5/29 18:26, Greg KH wrote:

On Fri, May 29, 2020 at 07:41:06AM +, Luis Chamberlain wrote:

From: Xiaoming Ni 

Move random_table sysctl from kernel/sysctl.c to drivers/char/random.c
and use register_sysctl_subdir() to help remove the clutter out of
kernel/sysctl.c.

Signed-off-by: Xiaoming Ni 
Signed-off-by: Luis Chamberlain 
---
  drivers/char/random.c  | 14 --
  include/linux/sysctl.h |  1 -
  kernel/sysctl.c|  5 -
  3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index a7cf6aa65908..73fd4b6e9c18 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2101,8 +2101,7 @@ static int proc_do_entropy(struct ctl_table *table, int 
write,
  }
  
  static int sysctl_poolsize = INPUT_POOL_WORDS * 32;

-extern struct ctl_table random_table[];
-struct ctl_table random_table[] = {
+static struct ctl_table random_table[] = {
{
.procname   = "poolsize",
.data   = _poolsize,
@@ -2164,6 +2163,17 @@ struct ctl_table random_table[] = {
  #endif
{ }
  };
+
+/*
+ * rand_initialize() is called before sysctl_init(),
+ * so we cannot call register_sysctl_init() in rand_initialize()
+ */
+static int __init random_sysctls_init(void)
+{
+   register_sysctl_subdir("kernel", "random", random_table);


No error checking?

:(

It was my mistake, I forgot to add a comment here.
Same as the comment of register_sysctl_init(),
There is almost no failure here during the system initialization phase,
and failure in time does not affect the main function.

Thanks
Xiaoming Ni





Re: [PATCH 01/13] sysctl: add new register_sysctl_subdir() helper

2020-05-29 Thread Luis Chamberlain
On Fri, May 29, 2020 at 11:13:21AM +0300, Jani Nikula wrote:
> On Fri, 29 May 2020, Luis Chamberlain  wrote:
> > Often enough all we need to do is create a subdirectory so that
> > we can stuff sysctls underneath it. However, *if* that directory
> > was already created early on the boot sequence we really have no
> > need to use the full boiler plate code for it, we can just use
> > local variables to help us guide sysctl to place the new leaf files.
> 
> I find it hard to figure out the lifetime requirements for the tables
> passed in; when it's okay to use local variables and when you need
> longer lifetimes. It's not documented, everyone appears to be using
> static tables for this. It's far from obvious.

I agree 2000% that it is not obvious. What made me consider it was that
I *knew* that the base directory would already exist, so it wouldn't
make sense for the code to rely on earlier parts of a table if part
of the hierarchy already existed.

In fact, a *huge* part of the due dilligence on this and futre series
on this cleanup will be to be 100% sure that the base path is already
created. And so this use is obviously dangerous, you just *need* to
know that the base path is created before.

Non-posted changes also deal with link order to help address this
in other places, given that link order controls how *initcalls()
(early_initcall(), late_initcall(), etc) are ordered if you have
multiple of these.

I had a link order series long ago which augmented our ability to make
things clearer at a link order. Eventually I believe this will become
more important, specially as we end up wanting to async more code.

For now, we can only rely on manual code inspection for ensuring
proper ordering. Part of the implicit aspects of this cleanup is
to slowly make these things clearer for each base path.

So... the "fs" base path will actually end up being created in
fs/sysctl.c after we are *fully* done with the fs sysctl cleanups.

  Luis


Re: [PATCH 09/13] firmware_loader: simplify sysctl declaration with register_sysctl_subdir()

2020-05-29 Thread Xiaoming Ni

On 2020/5/29 18:26, Greg KH wrote:

On Fri, May 29, 2020 at 07:41:04AM +, Luis Chamberlain wrote:

From: Xiaoming Ni 

Move the firmware config sysctl table to fallback_table.c and use the
new register_sysctl_subdir() helper. This removes the clutter from
kernel/sysctl.c.

Signed-off-by: Xiaoming Ni 
Signed-off-by: Luis Chamberlain 
---
  drivers/base/firmware_loader/fallback.c   |  4 
  drivers/base/firmware_loader/fallback.h   | 11 ++
  drivers/base/firmware_loader/fallback_table.c | 22 +--
  include/linux/sysctl.h|  1 -
  kernel/sysctl.c   |  7 --
  5 files changed, 35 insertions(+), 10 deletions(-)


So it now takes more lines than the old stuff?  :(


CONFIG_FW_LOADER = m
Before cleaning, no matter whether ko is loaded or not, the sysctl
interface will be created, but now we need to add register and
unregister interfaces, so the number of lines of code has increased



diff --git a/drivers/base/firmware_loader/fallback.c 
b/drivers/base/firmware_loader/fallback.c
index d9ac7296205e..8190653ae9a3 100644
--- a/drivers/base/firmware_loader/fallback.c
+++ b/drivers/base/firmware_loader/fallback.c
@@ -200,12 +200,16 @@ static struct class firmware_class = {
  
  int register_sysfs_loader(void)

  {
+   int ret = register_firmware_config_sysctl();
+   if (ret != 0)
+   return ret;


checkpatch :(

This is my fault,  thanks for your guidance




return class_register(_class);


And if that fails?

Yes, it is better to call register_firmware_config_sysctl() after 
class_register().

thanks for your guidance.



  }
  
  void unregister_sysfs_loader(void)

  {
class_unregister(_class);
+   unregister_firmware_config_sysctl();
  }
  
  static ssize_t firmware_loading_show(struct device *dev,

diff --git a/drivers/base/firmware_loader/fallback.h 
b/drivers/base/firmware_loader/fallback.h
index 06f4577733a8..7d2cb5f6ceb8 100644
--- a/drivers/base/firmware_loader/fallback.h
+++ b/drivers/base/firmware_loader/fallback.h
@@ -42,6 +42,17 @@ void fw_fallback_set_default_timeout(void);
  
  int register_sysfs_loader(void);

  void unregister_sysfs_loader(void);
+#ifdef CONFIG_SYSCTL
+extern int register_firmware_config_sysctl(void);
+extern void unregister_firmware_config_sysctl(void);
+#else
+static inline int register_firmware_config_sysctl(void)
+{
+   return 0;
+}
+static inline void unregister_firmware_config_sysctl(void) { }
+#endif /* CONFIG_SYSCTL */
+
  #else /* CONFIG_FW_LOADER_USER_HELPER */
  static inline int firmware_fallback_sysfs(struct firmware *fw, const char 
*name,
  struct device *device,
diff --git a/drivers/base/firmware_loader/fallback_table.c 
b/drivers/base/firmware_loader/fallback_table.c
index 46a731dede6f..4234aa5ee5df 100644
--- a/drivers/base/firmware_loader/fallback_table.c
+++ b/drivers/base/firmware_loader/fallback_table.c
@@ -24,7 +24,7 @@ struct firmware_fallback_config fw_fallback_config = {
  EXPORT_SYMBOL_NS_GPL(fw_fallback_config, FIRMWARE_LOADER_PRIVATE);
  
  #ifdef CONFIG_SYSCTL

-struct ctl_table firmware_config_table[] = {
+static struct ctl_table firmware_config_table[] = {
{
.procname   = "force_sysfs_fallback",
.data   = _fallback_config.force_sysfs_fallback,
@@ -45,4 +45,22 @@ struct ctl_table firmware_config_table[] = {
},
{ }
  };
-#endif
+
+static struct ctl_table_header *hdr;
+int register_firmware_config_sysctl(void)
+{
+   if (hdr)
+   return -EEXIST;


How can hdr be set?


It's my mistake, register_firmware_config_sysctl() is not exported,
there will be no repeated calls.
thanks for your guidance.


+   hdr = register_sysctl_subdir("kernel", "firmware_config",
+firmware_config_table);
+   if (!hdr)
+   return -ENOMEM;
+   return 0;
+}
+
+void unregister_firmware_config_sysctl(void)
+{
+   if (hdr)
+   unregister_sysctl_table(hdr);


Why can't unregister_sysctl_table() take a null pointer value?

Sorry, I didn't notice that the unregister_sysctl_table() already checks
the input parameters.
thanks for your guidance.



And what sets 'hdr' (worst name for a static variable) to NULL so that
it knows not to be unregistered again as it looks like
register_firmware_config_sysctl() could be called multiple times.


How about renaming hdr to firmware_config_sysct_table_header?

+ if (hdr)
+   return -EEXIST;
After deleting this code in register_firmware_config_sysctl(), and 
considering register_firmware_config_sysctl() and 
unregister_firmware_config_sysctl() are not exported, whether there is

no need to add  "hdr = NULL;" ?

Thanks
Xiaoming Ni






Re: [PATCH 09/13] firmware_loader: simplify sysctl declaration with register_sysctl_subdir()

2020-05-29 Thread Luis Chamberlain
On Fri, May 29, 2020 at 12:26:13PM +0200, Greg KH wrote:
> On Fri, May 29, 2020 at 07:41:04AM +, Luis Chamberlain wrote:
> > From: Xiaoming Ni 
> > 
> > Move the firmware config sysctl table to fallback_table.c and use the
> > new register_sysctl_subdir() helper. This removes the clutter from
> > kernel/sysctl.c.
> > 
> > Signed-off-by: Xiaoming Ni 
> > Signed-off-by: Luis Chamberlain 
> > ---
> >  drivers/base/firmware_loader/fallback.c   |  4 
> >  drivers/base/firmware_loader/fallback.h   | 11 ++
> >  drivers/base/firmware_loader/fallback_table.c | 22 +--
> >  include/linux/sysctl.h|  1 -
> >  kernel/sysctl.c   |  7 --
> >  5 files changed, 35 insertions(+), 10 deletions(-)
> 
> So it now takes more lines than the old stuff?  :(

Pretty much agreed with the other changes, thanks for the review!

But this diff-stat change, indeed, it is unfortunate that we end up
with more code here than before. We'll try to reduce it instead
somehow, however in some cases during this spring-cleaning, since
the goal is to move code from one file to another, it *may* require
more code. So it won't always be negative. But we'll try!

  Luis


Re: [Intel-gfx] [PATCH 06/13] ocfs2: use new sysctl subdir helper register_sysctl_subdir()

2020-05-29 Thread Luis Chamberlain
On Fri, May 29, 2020 at 01:23:19AM -0700, Kees Cook wrote:
> On Fri, May 29, 2020 at 07:41:01AM +, Luis Chamberlain wrote:
> > This simplifies the code considerably. The following coccinelle
> > SmPL grammar rule was used to transform this code.
> > 
> > // pycocci sysctl-subdir.cocci fs/ocfs2/stackglue.c
> > 
> > @c1@
> > expression E1;
> > identifier subdir, sysctls;
> > @@
> > 
> > static struct ctl_table subdir[] = {
> > {
> > .procname = E1,
> > .maxlen = 0,
> > .mode = 0555,
> > .child = sysctls,
> > },
> > { }
> > };
> > 
> > @c2@
> > identifier c1.subdir;
> > 
> > expression E2;
> > identifier base;
> > @@
> > 
> > static struct ctl_table base[] = {
> > {
> > .procname = E2,
> > .maxlen = 0,
> > .mode = 0555,
> > .child = subdir,
> > },
> > { }
> > };
> > 
> > @c3@
> > identifier c2.base;
> > identifier header;
> > @@
> > 
> > header = register_sysctl_table(base);
> > 
> > @r1 depends on c1 && c2 && c3@
> > expression c1.E1;
> > identifier c1.subdir, c1.sysctls;
> > @@
> > 
> > -static struct ctl_table subdir[] = {
> > -   {
> > -   .procname = E1,
> > -   .maxlen = 0,
> > -   .mode = 0555,
> > -   .child = sysctls,
> > -   },
> > -   { }
> > -};
> > 
> > @r2 depends on c1 && c2 && c3@
> > identifier c1.subdir;
> > 
> > expression c2.E2;
> > identifier c2.base;
> > @@
> > -static struct ctl_table base[] = {
> > -   {
> > -   .procname = E2,
> > -   .maxlen = 0,
> > -   .mode = 0555,
> > -   .child = subdir,
> > -   },
> > -   { }
> > -};
> > 
> > @r3 depends on c1 && c2 && c3@
> > expression c1.E1;
> > identifier c1.sysctls;
> > expression c2.E2;
> > identifier c2.base;
> > identifier c3.header;
> > @@
> > 
> > header =
> > -register_sysctl_table(base);
> > +register_sysctl_subdir(E2, E1, sysctls);
> > 
> > Generated-by: Coccinelle SmPL
> > 
> > Signed-off-by: Luis Chamberlain 
> > ---
> >  fs/ocfs2/stackglue.c | 27 ---
> >  1 file changed, 4 insertions(+), 23 deletions(-)
> > 
> > diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
> > index a191094694c6..addafced7f59 100644
> > --- a/fs/ocfs2/stackglue.c
> > +++ b/fs/ocfs2/stackglue.c
> > @@ -677,28 +677,8 @@ static struct ctl_table ocfs2_mod_table[] = {
> > },
> > { }
> >  };
> > -
> > -static struct ctl_table ocfs2_kern_table[] = {
> > -   {
> > -   .procname   = "ocfs2",
> > -   .data   = NULL,
> > -   .maxlen = 0,
> > -   .mode   = 0555,
> > -   .child  = ocfs2_mod_table
> > -   },
> > -   { }
> > -};
> > -
> > -static struct ctl_table ocfs2_root_table[] = {
> > -   {
> > -   .procname   = "fs",
> > -   .data   = NULL,
> > -   .maxlen = 0,
> > -   .mode   = 0555,
> > -   .child  = ocfs2_kern_table
> > -   },
> > -   { }
> > -};
> > +   .data   = NULL,
> > +   .data   = NULL,
> 
> The conversion script doesn't like the .data field assignments. ;)
> 
> Was this series built with allmodconfig? I would have expected this to
> blow up very badly. :)

Yikes, sense, you're right. Nope, I left the random config tests to
0day. Will fix, thanks!

  Luis


Re: [PATCH] powerpc/64/syscall: Disable sanitisers for C syscall entry/exit code

2020-05-29 Thread Michael Ellerman
Andrew Donnellan  writes:
> On 29/5/20 4:14 pm, Daniel Axtens wrote:
>> syzkaller is picking up a bunch of crashes that look like this:
>> 
>> Unrecoverable exception 380 at c037ed60 (msr=80001031)
>> Oops: Unrecoverable exception, sig: 6 [#1]
>> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>> Modules linked in:
>> CPU: 0 PID: 874 Comm: syz-executor.0 Not tainted 
>> 5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e #0
>> NIP:  c037ed60 LR: c004bac8 CTR: c0030990
>> REGS: c000555a7230 TRAP: 0380   Not tainted  
>> (5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e)
>> MSR:  80001031   CR: 48222882  XER: 2000
>> CFAR: c004bac4 IRQMASK: 0
>> GPR00: c004bb68 c000555a74c0 c24b3500 0005
>> GPR04:   c004bb88 c0080091
>> GPR08: 000b c004bac8 00016000 c2503500
>> GPR12: c0030990 c319 106a5898 106a
>> GPR16: 106a5890 c7a92000 c8180e00 c7a8f700
>> GPR20: c7a904b0 1011 c259d318 5deadbeef100
>> GPR24: 5deadbeef122 c00078422700 c9ee88b8 c00078422778
>> GPR28: 0001 8280b033  c000555a75a0
>> NIP [c037ed60] __sanitizer_cov_trace_pc+0x40/0x50
>> LR [c004bac8] interrupt_exit_kernel_prepare+0x118/0x310
>> Call Trace:
>> [c000555a74c0] [c004bb68] 
>> interrupt_exit_kernel_prepare+0x1b8/0x310 (unreliable)
>> [c000555a7530] [c000f9a8] interrupt_return+0x118/0x1c0
>> --- interrupt: 900 at __sanitizer_cov_trace_pc+0x0/0x50
>> ..
>> 
>> That looks like the KCOV helper accessing memory that's not safe to
>> access in the interrupt handling context.
>> 
>> Do not instrument the new syscall entry/exit code with KCOV, GCOV or
>> UBSAN.
>> 
>> Cc: Nicholas Piggin 
>> Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic 
>> in C")
>> Signed-off-by: Daniel Axtens 
>
> This seems reasonable - I've verified that this does indeed suppress the 
> kcov trace calls.

Thanks.

> Acked-by: Andrew Donnellan 
>
> (does this need to be tagged for stable? the Fixes: commit is in 5.6 but 
> we're at 5.7-rc7 at this point...)

No. The Fixed commit is based on v5.6-rc2, but it didn't go in until v5.7-rc1:

  $ git describe --contains --match 'v*' 68b34588e202
  v5.7-rc1~35^2~46

I plan to send it to Linus before the v5.7 release.

cheers


Re: [RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.

2020-05-29 Thread Aneesh Kumar K.V

On 5/29/20 3:22 PM, Jan Kara wrote:

Hi!

On Fri 29-05-20 15:07:31, Aneesh Kumar K.V wrote:

Thanks Michal. I also missed Jeff in this email thread.


And I think you'll also need some of the sched maintainers for the prctl
bits...


On 5/29/20 3:03 PM, Michal Suchánek wrote:

Adding Jan

On Fri, May 29, 2020 at 11:11:39AM +0530, Aneesh Kumar K.V wrote:

With POWER10, architecture is adding new pmem flush and sync instructions.
The kernel should prevent the usage of MAP_SYNC if applications are not using
the new instructions on newer hardware.

This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable
the usage of MAP_SYNC. The kernel config option is added to allow the user
to control whether MAP_SYNC should be enabled by default or not.

Signed-off-by: Aneesh Kumar K.V 

...

diff --git a/kernel/fork.c b/kernel/fork.c
index 8c700f881d92..d5a9a363e81e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -963,6 +963,12 @@ __cacheline_aligned_in_smp DEFINE_SPINLOCK(mmlist_lock);
   static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT;
+#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE
+unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK;
+#else
+unsigned long default_map_sync_mask = 0;
+#endif
+


I'm not sure CONFIG is really the right approach here. For a distro that would
basically mean to disable MAP_SYNC for all PPC kernels unless application
explicitly uses the right prctl. Shouldn't we rather initialize
default_map_sync_mask on boot based on whether the CPU we run on requires
new flush instructions or not? Otherwise the patch looks sensible.



yes that is correct. We ideally want to deny MAP_SYNC only w.r.t 
POWER10. But on a virtualized platform there is no easy way to detect 
that. We could ideally hook this into the nvdimm driver where we look at 
the new compat string ibm,persistent-memory-v2 and then disable MAP_SYNC

if we find a device with the specific value.

BTW with the recent changes I posted for the nvdimm driver, older kernel 
won't initialize persistent memory device on newer hardware. Newer 
hardware will present the device to OS with a different device tree 
compat string.


My expectation  w.r.t this patch was, Distro would want to  mark
CONFIG_ARCH_MAP_SYNC_DISABLE=n based on the different application 
certification.  Otherwise application will have to end up calling the 
prctl(MMF_DISABLE_MAP_SYNC, 0) any way. If that is the case, should this

be dependent on P10?

With that I am wondering should we even have this patch? Can we expect 
userspace get updated to use new instruction?.


With ppc64 we never had a real persistent memory device available for 
end user to try. The available persistent memory stack was using vPMEM 
which was presented as a volatile memory region for which there is no 
need to use any of the flush instructions. We could safely assume that 
as we get applications certified/verified for working with pmem device 
on ppc64, they would all be using the new instructions?



-aneesh





Re: [PATCH 11/13] random: simplify sysctl declaration with register_sysctl_subdir()

2020-05-29 Thread Greg KH
On Fri, May 29, 2020 at 07:41:06AM +, Luis Chamberlain wrote:
> From: Xiaoming Ni 
> 
> Move random_table sysctl from kernel/sysctl.c to drivers/char/random.c
> and use register_sysctl_subdir() to help remove the clutter out of
> kernel/sysctl.c.
> 
> Signed-off-by: Xiaoming Ni 
> Signed-off-by: Luis Chamberlain 
> ---
>  drivers/char/random.c  | 14 --
>  include/linux/sysctl.h |  1 -
>  kernel/sysctl.c|  5 -
>  3 files changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index a7cf6aa65908..73fd4b6e9c18 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -2101,8 +2101,7 @@ static int proc_do_entropy(struct ctl_table *table, int 
> write,
>  }
>  
>  static int sysctl_poolsize = INPUT_POOL_WORDS * 32;
> -extern struct ctl_table random_table[];
> -struct ctl_table random_table[] = {
> +static struct ctl_table random_table[] = {
>   {
>   .procname   = "poolsize",
>   .data   = _poolsize,
> @@ -2164,6 +2163,17 @@ struct ctl_table random_table[] = {
>  #endif
>   { }
>  };
> +
> +/*
> + * rand_initialize() is called before sysctl_init(),
> + * so we cannot call register_sysctl_init() in rand_initialize()
> + */
> +static int __init random_sysctls_init(void)
> +{
> + register_sysctl_subdir("kernel", "random", random_table);

No error checking?

:(



Re: [PATCH 09/13] firmware_loader: simplify sysctl declaration with register_sysctl_subdir()

2020-05-29 Thread Greg KH
On Fri, May 29, 2020 at 07:41:04AM +, Luis Chamberlain wrote:
> From: Xiaoming Ni 
> 
> Move the firmware config sysctl table to fallback_table.c and use the
> new register_sysctl_subdir() helper. This removes the clutter from
> kernel/sysctl.c.
> 
> Signed-off-by: Xiaoming Ni 
> Signed-off-by: Luis Chamberlain 
> ---
>  drivers/base/firmware_loader/fallback.c   |  4 
>  drivers/base/firmware_loader/fallback.h   | 11 ++
>  drivers/base/firmware_loader/fallback_table.c | 22 +--
>  include/linux/sysctl.h|  1 -
>  kernel/sysctl.c   |  7 --
>  5 files changed, 35 insertions(+), 10 deletions(-)

So it now takes more lines than the old stuff?  :(

> 
> diff --git a/drivers/base/firmware_loader/fallback.c 
> b/drivers/base/firmware_loader/fallback.c
> index d9ac7296205e..8190653ae9a3 100644
> --- a/drivers/base/firmware_loader/fallback.c
> +++ b/drivers/base/firmware_loader/fallback.c
> @@ -200,12 +200,16 @@ static struct class firmware_class = {
>  
>  int register_sysfs_loader(void)
>  {
> + int ret = register_firmware_config_sysctl();
> + if (ret != 0)
> + return ret;

checkpatch :(

>   return class_register(_class);

And if that fails?

>  }
>  
>  void unregister_sysfs_loader(void)
>  {
>   class_unregister(_class);
> + unregister_firmware_config_sysctl();
>  }
>  
>  static ssize_t firmware_loading_show(struct device *dev,
> diff --git a/drivers/base/firmware_loader/fallback.h 
> b/drivers/base/firmware_loader/fallback.h
> index 06f4577733a8..7d2cb5f6ceb8 100644
> --- a/drivers/base/firmware_loader/fallback.h
> +++ b/drivers/base/firmware_loader/fallback.h
> @@ -42,6 +42,17 @@ void fw_fallback_set_default_timeout(void);
>  
>  int register_sysfs_loader(void);
>  void unregister_sysfs_loader(void);
> +#ifdef CONFIG_SYSCTL
> +extern int register_firmware_config_sysctl(void);
> +extern void unregister_firmware_config_sysctl(void);
> +#else
> +static inline int register_firmware_config_sysctl(void)
> +{
> + return 0;
> +}
> +static inline void unregister_firmware_config_sysctl(void) { }
> +#endif /* CONFIG_SYSCTL */
> +
>  #else /* CONFIG_FW_LOADER_USER_HELPER */
>  static inline int firmware_fallback_sysfs(struct firmware *fw, const char 
> *name,
> struct device *device,
> diff --git a/drivers/base/firmware_loader/fallback_table.c 
> b/drivers/base/firmware_loader/fallback_table.c
> index 46a731dede6f..4234aa5ee5df 100644
> --- a/drivers/base/firmware_loader/fallback_table.c
> +++ b/drivers/base/firmware_loader/fallback_table.c
> @@ -24,7 +24,7 @@ struct firmware_fallback_config fw_fallback_config = {
>  EXPORT_SYMBOL_NS_GPL(fw_fallback_config, FIRMWARE_LOADER_PRIVATE);
>  
>  #ifdef CONFIG_SYSCTL
> -struct ctl_table firmware_config_table[] = {
> +static struct ctl_table firmware_config_table[] = {
>   {
>   .procname   = "force_sysfs_fallback",
>   .data   = _fallback_config.force_sysfs_fallback,
> @@ -45,4 +45,22 @@ struct ctl_table firmware_config_table[] = {
>   },
>   { }
>  };
> -#endif
> +
> +static struct ctl_table_header *hdr;
> +int register_firmware_config_sysctl(void)
> +{
> + if (hdr)
> + return -EEXIST;

How can hdr be set?

> + hdr = register_sysctl_subdir("kernel", "firmware_config",
> +  firmware_config_table);
> + if (!hdr)
> + return -ENOMEM;
> + return 0;
> +}
> +
> +void unregister_firmware_config_sysctl(void)
> +{
> + if (hdr)
> + unregister_sysctl_table(hdr);

Why can't unregister_sysctl_table() take a null pointer value?

And what sets 'hdr' (worst name for a static variable) to NULL so that
it knows not to be unregistered again as it looks like
register_firmware_config_sysctl() could be called multiple times.

thanks,

greg k-h


Re: [RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.

2020-05-29 Thread Jan Kara
Hi!

On Fri 29-05-20 15:07:31, Aneesh Kumar K.V wrote:
> Thanks Michal. I also missed Jeff in this email thread.

And I think you'll also need some of the sched maintainers for the prctl
bits...

> On 5/29/20 3:03 PM, Michal Suchánek wrote:
> > Adding Jan
> > 
> > On Fri, May 29, 2020 at 11:11:39AM +0530, Aneesh Kumar K.V wrote:
> > > With POWER10, architecture is adding new pmem flush and sync instructions.
> > > The kernel should prevent the usage of MAP_SYNC if applications are not 
> > > using
> > > the new instructions on newer hardware.
> > > 
> > > This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable
> > > the usage of MAP_SYNC. The kernel config option is added to allow the user
> > > to control whether MAP_SYNC should be enabled by default or not.
> > > 
> > > Signed-off-by: Aneesh Kumar K.V 
...
> > > diff --git a/kernel/fork.c b/kernel/fork.c
> > > index 8c700f881d92..d5a9a363e81e 100644
> > > --- a/kernel/fork.c
> > > +++ b/kernel/fork.c
> > > @@ -963,6 +963,12 @@ __cacheline_aligned_in_smp 
> > > DEFINE_SPINLOCK(mmlist_lock);
> > >   static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT;
> > > +#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE
> > > +unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK;
> > > +#else
> > > +unsigned long default_map_sync_mask = 0;
> > > +#endif
> > > +

I'm not sure CONFIG is really the right approach here. For a distro that would
basically mean to disable MAP_SYNC for all PPC kernels unless application
explicitly uses the right prctl. Shouldn't we rather initialize
default_map_sync_mask on boot based on whether the CPU we run on requires
new flush instructions or not? Otherwise the patch looks sensible.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH v4 6/7] KVM: MIPS: clean up redundant 'kvm_run' parameters

2020-05-29 Thread Paolo Bonzini
On 27/05/20 08:24, Tianjia Zhang wrote:
>>>
>>>
> 
> Hi Huacai,
> 
> These two patches(6/7 and 7/7) should be merged into the tree of the
> mips architecture separately. At present, there seems to be no good way
> to merge the whole architecture patchs.
> 
> For this series of patches, some architectures have been merged, some
> need to update the patch.

Hi Tianjia, I will take care of this during the merge window.

Thanks,

Paolo



Re: [RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.

2020-05-29 Thread Aneesh Kumar K.V

Hi,


Thanks Michal. I also missed Jeff in this email thread.

-aneesh

On 5/29/20 3:03 PM, Michal Suchánek wrote:

Adding Jan

On Fri, May 29, 2020 at 11:11:39AM +0530, Aneesh Kumar K.V wrote:

With POWER10, architecture is adding new pmem flush and sync instructions.
The kernel should prevent the usage of MAP_SYNC if applications are not using
the new instructions on newer hardware.

This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable
the usage of MAP_SYNC. The kernel config option is added to allow the user
to control whether MAP_SYNC should be enabled by default or not.

Signed-off-by: Aneesh Kumar K.V 
---
  include/linux/sched/coredump.h | 13 ++---
  include/uapi/linux/prctl.h |  3 +++
  kernel/fork.c  |  8 +++-
  kernel/sys.c   | 18 ++
  mm/Kconfig |  3 +++
  mm/mmap.c  |  4 
  6 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index ecdc6542070f..9ba6b3d5f991 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -72,9 +72,16 @@ static inline int get_dumpable(struct mm_struct *mm)
  #define MMF_DISABLE_THP   24  /* disable THP for all VMAs */
  #define MMF_OOM_VICTIM25  /* mm is the oom victim */
  #define MMF_OOM_REAP_QUEUED   26  /* mm was queued for oom_reaper */
-#define MMF_DISABLE_THP_MASK   (1 << MMF_DISABLE_THP)
+#define MMF_DISABLE_MAP_SYNC   27  /* disable THP for all VMAs */
+#define MMF_DISABLE_THP_MASK   (1 << MMF_DISABLE_THP)
+#define MMF_DISABLE_MAP_SYNC_MASK  (1 << MMF_DISABLE_MAP_SYNC)
  
-#define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\

-MMF_DISABLE_THP_MASK)
+#define MMF_INIT_MASK  (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK | \
+   MMF_DISABLE_THP_MASK | MMF_DISABLE_MAP_SYNC_MASK)
+
+static inline bool map_sync_enabled(struct mm_struct *mm)
+{
+   return !(mm->flags & MMF_DISABLE_MAP_SYNC_MASK);
+}
  
  #endif /* _LINUX_SCHED_COREDUMP_H */

diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 07b4f8131e36..ee4cde32d5cf 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -238,4 +238,7 @@ struct prctl_mm_map {
  #define PR_SET_IO_FLUSHER 57
  #define PR_GET_IO_FLUSHER 58
  
+#define PR_SET_MAP_SYNC_ENABLE		59

+#define PR_GET_MAP_SYNC_ENABLE 60
+
  #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 8c700f881d92..d5a9a363e81e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -963,6 +963,12 @@ __cacheline_aligned_in_smp DEFINE_SPINLOCK(mmlist_lock);
  
  static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT;
  
+#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE

+unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK;
+#else
+unsigned long default_map_sync_mask = 0;
+#endif
+
  static int __init coredump_filter_setup(char *s)
  {
default_dump_filter =
@@ -1039,7 +1045,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, 
struct task_struct *p,
mm->flags = current->mm->flags & MMF_INIT_MASK;
mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK;
} else {
-   mm->flags = default_dump_filter;
+   mm->flags = default_dump_filter | default_map_sync_mask;
mm->def_flags = 0;
}
  
diff --git a/kernel/sys.c b/kernel/sys.c

index d325f3ab624a..f6127cf4128b 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2450,6 +2450,24 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, 
unsigned long, arg3,
clear_bit(MMF_DISABLE_THP, >mm->flags);
up_write(>mm->mmap_sem);
break;
+
+   case PR_GET_MAP_SYNC_ENABLE:
+   if (arg2 || arg3 || arg4 || arg5)
+   return -EINVAL;
+   error = !test_bit(MMF_DISABLE_MAP_SYNC, >mm->flags);
+   break;
+   case PR_SET_MAP_SYNC_ENABLE:
+   if (arg3 || arg4 || arg5)
+   return -EINVAL;
+   if (down_write_killable(>mm->mmap_sem))
+   return -EINTR;
+   if (arg2)
+   clear_bit(MMF_DISABLE_MAP_SYNC, >mm->flags);
+   else
+   set_bit(MMF_DISABLE_MAP_SYNC, >mm->flags);
+   up_write(>mm->mmap_sem);
+   break;
+
case PR_MPX_ENABLE_MANAGEMENT:
case PR_MPX_DISABLE_MANAGEMENT:
/* No longer implemented: */
diff --git a/mm/Kconfig b/mm/Kconfig
index c1acc34c1c35..38fd7cfbfca8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -867,4 +867,7 @@ config ARCH_HAS_HUGEPD
  config MAPPING_DIRTY_HELPERS
  bool
  
+config ARCH_MAP_SYNC_DISABLE

+   bool
+
  endmenu
diff --git a/mm/mmap.c b/mm/mmap.c

Re: [RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.

2020-05-29 Thread Michal Suchánek
Adding Jan

On Fri, May 29, 2020 at 11:11:39AM +0530, Aneesh Kumar K.V wrote:
> With POWER10, architecture is adding new pmem flush and sync instructions.
> The kernel should prevent the usage of MAP_SYNC if applications are not using
> the new instructions on newer hardware.
> 
> This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable
> the usage of MAP_SYNC. The kernel config option is added to allow the user
> to control whether MAP_SYNC should be enabled by default or not.
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  include/linux/sched/coredump.h | 13 ++---
>  include/uapi/linux/prctl.h |  3 +++
>  kernel/fork.c  |  8 +++-
>  kernel/sys.c   | 18 ++
>  mm/Kconfig |  3 +++
>  mm/mmap.c  |  4 
>  6 files changed, 45 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
> index ecdc6542070f..9ba6b3d5f991 100644
> --- a/include/linux/sched/coredump.h
> +++ b/include/linux/sched/coredump.h
> @@ -72,9 +72,16 @@ static inline int get_dumpable(struct mm_struct *mm)
>  #define MMF_DISABLE_THP  24  /* disable THP for all VMAs */
>  #define MMF_OOM_VICTIM   25  /* mm is the oom victim */
>  #define MMF_OOM_REAP_QUEUED  26  /* mm was queued for oom_reaper */
> -#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
> +#define MMF_DISABLE_MAP_SYNC 27  /* disable THP for all VMAs */
> +#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
> +#define MMF_DISABLE_MAP_SYNC_MASK(1 << MMF_DISABLE_MAP_SYNC)
>  
> -#define MMF_INIT_MASK(MMF_DUMPABLE_MASK | 
> MMF_DUMP_FILTER_MASK |\
> -  MMF_DISABLE_THP_MASK)
> +#define MMF_INIT_MASK(MMF_DUMPABLE_MASK | 
> MMF_DUMP_FILTER_MASK | \
> + MMF_DISABLE_THP_MASK | MMF_DISABLE_MAP_SYNC_MASK)
> +
> +static inline bool map_sync_enabled(struct mm_struct *mm)
> +{
> + return !(mm->flags & MMF_DISABLE_MAP_SYNC_MASK);
> +}
>  
>  #endif /* _LINUX_SCHED_COREDUMP_H */
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index 07b4f8131e36..ee4cde32d5cf 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -238,4 +238,7 @@ struct prctl_mm_map {
>  #define PR_SET_IO_FLUSHER57
>  #define PR_GET_IO_FLUSHER58
>  
> +#define PR_SET_MAP_SYNC_ENABLE   59
> +#define PR_GET_MAP_SYNC_ENABLE   60
> +
>  #endif /* _LINUX_PRCTL_H */
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 8c700f881d92..d5a9a363e81e 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -963,6 +963,12 @@ __cacheline_aligned_in_smp DEFINE_SPINLOCK(mmlist_lock);
>  
>  static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT;
>  
> +#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE
> +unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK;
> +#else
> +unsigned long default_map_sync_mask = 0;
> +#endif
> +
>  static int __init coredump_filter_setup(char *s)
>  {
>   default_dump_filter =
> @@ -1039,7 +1045,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, 
> struct task_struct *p,
>   mm->flags = current->mm->flags & MMF_INIT_MASK;
>   mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK;
>   } else {
> - mm->flags = default_dump_filter;
> + mm->flags = default_dump_filter | default_map_sync_mask;
>   mm->def_flags = 0;
>   }
>  
> diff --git a/kernel/sys.c b/kernel/sys.c
> index d325f3ab624a..f6127cf4128b 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2450,6 +2450,24 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, 
> arg2, unsigned long, arg3,
>   clear_bit(MMF_DISABLE_THP, >mm->flags);
>   up_write(>mm->mmap_sem);
>   break;
> +
> + case PR_GET_MAP_SYNC_ENABLE:
> + if (arg2 || arg3 || arg4 || arg5)
> + return -EINVAL;
> + error = !test_bit(MMF_DISABLE_MAP_SYNC, >mm->flags);
> + break;
> + case PR_SET_MAP_SYNC_ENABLE:
> + if (arg3 || arg4 || arg5)
> + return -EINVAL;
> + if (down_write_killable(>mm->mmap_sem))
> + return -EINTR;
> + if (arg2)
> + clear_bit(MMF_DISABLE_MAP_SYNC, >mm->flags);
> + else
> + set_bit(MMF_DISABLE_MAP_SYNC, >mm->flags);
> + up_write(>mm->mmap_sem);
> + break;
> +
>   case PR_MPX_ENABLE_MANAGEMENT:
>   case PR_MPX_DISABLE_MANAGEMENT:
>   /* No longer implemented: */
> diff --git a/mm/Kconfig b/mm/Kconfig
> index c1acc34c1c35..38fd7cfbfca8 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -867,4 +867,7 @@ config ARCH_HAS_HUGEPD
>  config MAPPING_DIRTY_HELPERS
>  bool
>  
> +config ARCH_MAP_SYNC_DISABLE
> +

Re: [PATCH] powerpc/64/syscall: Disable sanitisers for C syscall entry/exit code

2020-05-29 Thread Andrew Donnellan

On 29/5/20 4:14 pm, Daniel Axtens wrote:

syzkaller is picking up a bunch of crashes that look like this:

Unrecoverable exception 380 at c037ed60 (msr=80001031)
Oops: Unrecoverable exception, sig: 6 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 0 PID: 874 Comm: syz-executor.0 Not tainted 
5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e #0
NIP:  c037ed60 LR: c004bac8 CTR: c0030990
REGS: c000555a7230 TRAP: 0380   Not tainted  
(5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e)
MSR:  80001031   CR: 48222882  XER: 2000
CFAR: c004bac4 IRQMASK: 0
GPR00: c004bb68 c000555a74c0 c24b3500 0005
GPR04:   c004bb88 c0080091
GPR08: 000b c004bac8 00016000 c2503500
GPR12: c0030990 c319 106a5898 106a
GPR16: 106a5890 c7a92000 c8180e00 c7a8f700
GPR20: c7a904b0 1011 c259d318 5deadbeef100
GPR24: 5deadbeef122 c00078422700 c9ee88b8 c00078422778
GPR28: 0001 8280b033  c000555a75a0
NIP [c037ed60] __sanitizer_cov_trace_pc+0x40/0x50
LR [c004bac8] interrupt_exit_kernel_prepare+0x118/0x310
Call Trace:
[c000555a74c0] [c004bb68] interrupt_exit_kernel_prepare+0x1b8/0x310 
(unreliable)
[c000555a7530] [c000f9a8] interrupt_return+0x118/0x1c0
--- interrupt: 900 at __sanitizer_cov_trace_pc+0x0/0x50
..

That looks like the KCOV helper accessing memory that's not safe to
access in the interrupt handling context.

Do not instrument the new syscall entry/exit code with KCOV, GCOV or
UBSAN.

Cc: Nicholas Piggin 
Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in 
C")
Signed-off-by: Daniel Axtens 


This seems reasonable - I've verified that this does indeed suppress the 
kcov trace calls.


Acked-by: Andrew Donnellan 

(does this need to be tagged for stable? the Fixes: commit is in 5.6 but 
we're at 5.7-rc7 at this point...)


--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited


Re: [PATCH 06/13] ocfs2: use new sysctl subdir helper register_sysctl_subdir()

2020-05-29 Thread Kees Cook
On Fri, May 29, 2020 at 07:41:01AM +, Luis Chamberlain wrote:
> This simplifies the code considerably. The following coccinelle
> SmPL grammar rule was used to transform this code.
> 
> // pycocci sysctl-subdir.cocci fs/ocfs2/stackglue.c
> 
> @c1@
> expression E1;
> identifier subdir, sysctls;
> @@
> 
> static struct ctl_table subdir[] = {
>   {
>   .procname = E1,
>   .maxlen = 0,
>   .mode = 0555,
>   .child = sysctls,
>   },
>   { }
> };
> 
> @c2@
> identifier c1.subdir;
> 
> expression E2;
> identifier base;
> @@
> 
> static struct ctl_table base[] = {
>   {
>   .procname = E2,
>   .maxlen = 0,
>   .mode = 0555,
>   .child = subdir,
>   },
>   { }
> };
> 
> @c3@
> identifier c2.base;
> identifier header;
> @@
> 
> header = register_sysctl_table(base);
> 
> @r1 depends on c1 && c2 && c3@
> expression c1.E1;
> identifier c1.subdir, c1.sysctls;
> @@
> 
> -static struct ctl_table subdir[] = {
> - {
> - .procname = E1,
> - .maxlen = 0,
> - .mode = 0555,
> - .child = sysctls,
> - },
> - { }
> -};
> 
> @r2 depends on c1 && c2 && c3@
> identifier c1.subdir;
> 
> expression c2.E2;
> identifier c2.base;
> @@
> -static struct ctl_table base[] = {
> - {
> - .procname = E2,
> - .maxlen = 0,
> - .mode = 0555,
> - .child = subdir,
> - },
> - { }
> -};
> 
> @r3 depends on c1 && c2 && c3@
> expression c1.E1;
> identifier c1.sysctls;
> expression c2.E2;
> identifier c2.base;
> identifier c3.header;
> @@
> 
> header =
> -register_sysctl_table(base);
> +register_sysctl_subdir(E2, E1, sysctls);
> 
> Generated-by: Coccinelle SmPL
> 
> Signed-off-by: Luis Chamberlain 
> ---
>  fs/ocfs2/stackglue.c | 27 ---
>  1 file changed, 4 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
> index a191094694c6..addafced7f59 100644
> --- a/fs/ocfs2/stackglue.c
> +++ b/fs/ocfs2/stackglue.c
> @@ -677,28 +677,8 @@ static struct ctl_table ocfs2_mod_table[] = {
>   },
>   { }
>  };
> -
> -static struct ctl_table ocfs2_kern_table[] = {
> - {
> - .procname   = "ocfs2",
> - .data   = NULL,
> - .maxlen = 0,
> - .mode   = 0555,
> - .child  = ocfs2_mod_table
> - },
> - { }
> -};
> -
> -static struct ctl_table ocfs2_root_table[] = {
> - {
> - .procname   = "fs",
> - .data   = NULL,
> - .maxlen = 0,
> - .mode   = 0555,
> - .child  = ocfs2_kern_table
> - },
> - { }
> -};
> + .data   = NULL,
> + .data   = NULL,

The conversion script doesn't like the .data field assignments. ;)

Was this series built with allmodconfig? I would have expected this to
blow up very badly. :)

-- 
Kees Cook


Re: [PATCH 12/13] sysctl: add helper to register empty subdir

2020-05-29 Thread Kees Cook
On Fri, May 29, 2020 at 07:41:07AM +, Luis Chamberlain wrote:
> The way to create a subdirectory from the base set of directories
> is a bit obscure, so provide a helper which makes this clear, and
> also helps remove boiler plate code required to do this work.
> 
> Signed-off-by: Luis Chamberlain 

Reviewed-by: Kees Cook 

-- 
Kees Cook


Re: [PATCH 13/13] fs: move binfmt_misc sysctl to its own file

2020-05-29 Thread Kees Cook
On Fri, May 29, 2020 at 07:41:08AM +, Luis Chamberlain wrote:
> This moves the binfmt_misc sysctl to its own file to help remove
> clutter from kernel/sysctl.c.
> 
> Signed-off-by: Luis Chamberlain 
> ---
>  fs/binfmt_misc.c | 1 +
>  kernel/sysctl.c  | 7 ---
>  2 files changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
> index f69a043f562b..656b3f5f3bbf 100644
> --- a/fs/binfmt_misc.c
> +++ b/fs/binfmt_misc.c
> @@ -821,6 +821,7 @@ static int __init init_misc_binfmt(void)
>   int err = register_filesystem(_fs_type);
>   if (!err)
>   insert_binfmt(_format);
> + register_sysctl_empty_subdir("fs", "binfmt_misc");
>   return err;

Nit: let's make the dir before registering the filesystem. I can't
imagine a realistic situation where userspace is reacting so fast it
would actually fail to mount the fs on /proc/sys/fs/binfmt_misc, but why
risk it?

-Kees

>  }
>  
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 460532cd5ac8..7714e7b476c2 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -3042,13 +3042,6 @@ static struct ctl_table fs_table[] = {
>   .extra1 = SYSCTL_ZERO,
>   .extra2 = SYSCTL_TWO,
>   },
> -#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
> - {
> - .procname   = "binfmt_misc",
> - .mode   = 0555,
> - .child  = sysctl_mount_point,
> - },
> -#endif
>   {
>   .procname   = "pipe-max-size",
>   .data   = _max_size,
> -- 
> 2.26.2
> 

-- 
Kees Cook


Re: [PATCH 01/13] sysctl: add new register_sysctl_subdir() helper

2020-05-29 Thread Jani Nikula
On Fri, 29 May 2020, Luis Chamberlain  wrote:
> Often enough all we need to do is create a subdirectory so that
> we can stuff sysctls underneath it. However, *if* that directory
> was already created early on the boot sequence we really have no
> need to use the full boiler plate code for it, we can just use
> local variables to help us guide sysctl to place the new leaf files.

I find it hard to figure out the lifetime requirements for the tables
passed in; when it's okay to use local variables and when you need
longer lifetimes. It's not documented, everyone appears to be using
static tables for this. It's far from obvious.

BR,
Jani.


-- 
Jani Nikula, Intel Open Source Graphics Center


[PATCH 13/13] fs: move binfmt_misc sysctl to its own file

2020-05-29 Thread Luis Chamberlain
This moves the binfmt_misc sysctl to its own file to help remove
clutter from kernel/sysctl.c.

Signed-off-by: Luis Chamberlain 
---
 fs/binfmt_misc.c | 1 +
 kernel/sysctl.c  | 7 ---
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index f69a043f562b..656b3f5f3bbf 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -821,6 +821,7 @@ static int __init init_misc_binfmt(void)
int err = register_filesystem(_fs_type);
if (!err)
insert_binfmt(_format);
+   register_sysctl_empty_subdir("fs", "binfmt_misc");
return err;
 }
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 460532cd5ac8..7714e7b476c2 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -3042,13 +3042,6 @@ static struct ctl_table fs_table[] = {
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_TWO,
},
-#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
-   {
-   .procname   = "binfmt_misc",
-   .mode   = 0555,
-   .child  = sysctl_mount_point,
-   },
-#endif
{
.procname   = "pipe-max-size",
.data   = _max_size,
-- 
2.26.2



[PATCH 12/13] sysctl: add helper to register empty subdir

2020-05-29 Thread Luis Chamberlain
The way to create a subdirectory from the base set of directories
is a bit obscure, so provide a helper which makes this clear, and
also helps remove boiler plate code required to do this work.

Signed-off-by: Luis Chamberlain 
---
 include/linux/sysctl.h |  7 +++
 kernel/sysctl.c| 16 +---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 33a471b56345..89c92390e6de 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -208,6 +208,8 @@ extern void register_sysctl_init(const char *path, struct 
ctl_table *table,
 extern struct ctl_table_header *register_sysctl_subdir(const char *base,
   const char *subdir,
   struct ctl_table *table);
+extern void register_sysctl_empty_subdir(const char *base, const char *subdir);
+
 void do_sysctl_args(void);
 
 extern int pwrsw_enabled;
@@ -231,6 +233,11 @@ inline struct ctl_table_header 
*register_sysctl_subdir(const char *base,
return NULL;
 }
 
+static inline void register_sysctl_empty_subdir(const char *base,
+   const char *subdir)
+{
+}
+
 static inline struct ctl_table_header *register_sysctl_paths(
const struct ctl_path *path, struct ctl_table *table)
 {
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f9a35325d5d5..460532cd5ac8 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -3188,13 +3188,17 @@ struct ctl_table_header *register_sysctl_subdir(const 
char *base,
{ }
};
 
-   if (!table->procname)
+   if (table != sysctl_mount_point && !table->procname)
goto out;
 
hdr = register_sysctl_table(base_table);
if (unlikely(!hdr)) {
-   pr_err("failed when creating subdirectory sysctl %s/%s/%s\n",
-  base, subdir, table->procname);
+   if (table != sysctl_mount_point)
+   pr_err("failed when creating subdirectory sysctl 
%s/%s/%s\n",
+  base, subdir, table->procname);
+   else
+   pr_err("failed when creating empty subddirectory 
%s/%s\n",
+  base, subdir);
goto out;
}
kmemleak_not_leak(hdr);
@@ -3202,6 +3206,12 @@ struct ctl_table_header *register_sysctl_subdir(const 
char *base,
return hdr;
 }
 EXPORT_SYMBOL_GPL(register_sysctl_subdir);
+
+void register_sysctl_empty_subdir(const char *base,
+ const char *subdir)
+{
+   register_sysctl_subdir(base, subdir, sysctl_mount_point);
+}
 #endif /* CONFIG_SYSCTL */
 /*
  * No sense putting this after each symbol definition, twice,
-- 
2.26.2



[PATCH 09/13] firmware_loader: simplify sysctl declaration with register_sysctl_subdir()

2020-05-29 Thread Luis Chamberlain
From: Xiaoming Ni 

Move the firmware config sysctl table to fallback_table.c and use the
new register_sysctl_subdir() helper. This removes the clutter from
kernel/sysctl.c.

Signed-off-by: Xiaoming Ni 
Signed-off-by: Luis Chamberlain 
---
 drivers/base/firmware_loader/fallback.c   |  4 
 drivers/base/firmware_loader/fallback.h   | 11 ++
 drivers/base/firmware_loader/fallback_table.c | 22 +--
 include/linux/sysctl.h|  1 -
 kernel/sysctl.c   |  7 --
 5 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/drivers/base/firmware_loader/fallback.c 
b/drivers/base/firmware_loader/fallback.c
index d9ac7296205e..8190653ae9a3 100644
--- a/drivers/base/firmware_loader/fallback.c
+++ b/drivers/base/firmware_loader/fallback.c
@@ -200,12 +200,16 @@ static struct class firmware_class = {
 
 int register_sysfs_loader(void)
 {
+   int ret = register_firmware_config_sysctl();
+   if (ret != 0)
+   return ret;
return class_register(_class);
 }
 
 void unregister_sysfs_loader(void)
 {
class_unregister(_class);
+   unregister_firmware_config_sysctl();
 }
 
 static ssize_t firmware_loading_show(struct device *dev,
diff --git a/drivers/base/firmware_loader/fallback.h 
b/drivers/base/firmware_loader/fallback.h
index 06f4577733a8..7d2cb5f6ceb8 100644
--- a/drivers/base/firmware_loader/fallback.h
+++ b/drivers/base/firmware_loader/fallback.h
@@ -42,6 +42,17 @@ void fw_fallback_set_default_timeout(void);
 
 int register_sysfs_loader(void);
 void unregister_sysfs_loader(void);
+#ifdef CONFIG_SYSCTL
+extern int register_firmware_config_sysctl(void);
+extern void unregister_firmware_config_sysctl(void);
+#else
+static inline int register_firmware_config_sysctl(void)
+{
+   return 0;
+}
+static inline void unregister_firmware_config_sysctl(void) { }
+#endif /* CONFIG_SYSCTL */
+
 #else /* CONFIG_FW_LOADER_USER_HELPER */
 static inline int firmware_fallback_sysfs(struct firmware *fw, const char 
*name,
  struct device *device,
diff --git a/drivers/base/firmware_loader/fallback_table.c 
b/drivers/base/firmware_loader/fallback_table.c
index 46a731dede6f..4234aa5ee5df 100644
--- a/drivers/base/firmware_loader/fallback_table.c
+++ b/drivers/base/firmware_loader/fallback_table.c
@@ -24,7 +24,7 @@ struct firmware_fallback_config fw_fallback_config = {
 EXPORT_SYMBOL_NS_GPL(fw_fallback_config, FIRMWARE_LOADER_PRIVATE);
 
 #ifdef CONFIG_SYSCTL
-struct ctl_table firmware_config_table[] = {
+static struct ctl_table firmware_config_table[] = {
{
.procname   = "force_sysfs_fallback",
.data   = _fallback_config.force_sysfs_fallback,
@@ -45,4 +45,22 @@ struct ctl_table firmware_config_table[] = {
},
{ }
 };
-#endif
+
+static struct ctl_table_header *hdr;
+int register_firmware_config_sysctl(void)
+{
+   if (hdr)
+   return -EEXIST;
+   hdr = register_sysctl_subdir("kernel", "firmware_config",
+firmware_config_table);
+   if (!hdr)
+   return -ENOMEM;
+   return 0;
+}
+
+void unregister_firmware_config_sysctl(void)
+{
+   if (hdr)
+   unregister_sysctl_table(hdr);
+}
+#endif /* CONFIG_SYSCTL */
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 58bc978d4f03..aa01f54d0442 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -217,7 +217,6 @@ extern int no_unaligned_warning;
 
 extern struct ctl_table sysctl_mount_point[];
 extern struct ctl_table random_table[];
-extern struct ctl_table firmware_config_table[];
 extern struct ctl_table epoll_table[];
 
 #else /* CONFIG_SYSCTL */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 30c2d521502a..e007375c8a11 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2088,13 +2088,6 @@ static struct ctl_table kern_table[] = {
.mode   = 0555,
.child  = usermodehelper_table,
},
-#ifdef CONFIG_FW_LOADER_USER_HELPER
-   {
-   .procname   = "firmware_config",
-   .mode   = 0555,
-   .child  = firmware_config_table,
-   },
-#endif
{
.procname   = "overflowuid",
.data   = ,
-- 
2.26.2



[PATCH 11/13] random: simplify sysctl declaration with register_sysctl_subdir()

2020-05-29 Thread Luis Chamberlain
From: Xiaoming Ni 

Move random_table sysctl from kernel/sysctl.c to drivers/char/random.c
and use register_sysctl_subdir() to help remove the clutter out of
kernel/sysctl.c.

Signed-off-by: Xiaoming Ni 
Signed-off-by: Luis Chamberlain 
---
 drivers/char/random.c  | 14 --
 include/linux/sysctl.h |  1 -
 kernel/sysctl.c|  5 -
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index a7cf6aa65908..73fd4b6e9c18 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2101,8 +2101,7 @@ static int proc_do_entropy(struct ctl_table *table, int 
write,
 }
 
 static int sysctl_poolsize = INPUT_POOL_WORDS * 32;
-extern struct ctl_table random_table[];
-struct ctl_table random_table[] = {
+static struct ctl_table random_table[] = {
{
.procname   = "poolsize",
.data   = _poolsize,
@@ -2164,6 +2163,17 @@ struct ctl_table random_table[] = {
 #endif
{ }
 };
+
+/*
+ * rand_initialize() is called before sysctl_init(),
+ * so we cannot call register_sysctl_init() in rand_initialize()
+ */
+static int __init random_sysctls_init(void)
+{
+   register_sysctl_subdir("kernel", "random", random_table);
+   return 0;
+}
+device_initcall(random_sysctls_init);
 #endif /* CONFIG_SYSCTL */
 
 struct batched_entropy {
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index e5364b69dd95..33a471b56345 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -216,7 +216,6 @@ extern int unaligned_dump_stack;
 extern int no_unaligned_warning;
 
 extern struct ctl_table sysctl_mount_point[];
-extern struct ctl_table random_table[];
 
 #else /* CONFIG_SYSCTL */
 static inline struct ctl_table_header *register_sysctl_table(struct ctl_table 
* table)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 5c116904feb7..f9a35325d5d5 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2078,11 +2078,6 @@ static struct ctl_table kern_table[] = {
.mode   = 0644,
.proc_handler   = sysctl_max_threads,
},
-   {
-   .procname   = "random",
-   .mode   = 0555,
-   .child  = random_table,
-   },
{
.procname   = "usermodehelper",
.mode   = 0555,
-- 
2.26.2



[PATCH 10/13] eventpoll: simplify sysctl declaration with register_sysctl_subdir()

2020-05-29 Thread Luis Chamberlain
From: Xiaoming Ni 

Move epoll_table sysctl to fs/eventpoll.c and remove the
clutter out of kernel/sysctl.c by using register_sysctl_subdir()..

Signed-off-by: Xiaoming Ni 
Signed-off-by: Luis Chamberlain 
---
 fs/eventpoll.c | 10 +-
 include/linux/poll.h   |  2 --
 include/linux/sysctl.h |  1 -
 kernel/sysctl.c|  7 ---
 4 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 12eebcdea9c8..957ebc9700e3 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -299,7 +299,7 @@ static LIST_HEAD(tfile_check_list);
 static long long_zero;
 static long long_max = LONG_MAX;
 
-struct ctl_table epoll_table[] = {
+static struct ctl_table epoll_table[] = {
{
.procname   = "max_user_watches",
.data   = _user_watches,
@@ -311,6 +311,13 @@ struct ctl_table epoll_table[] = {
},
{ }
 };
+
+static void __init epoll_sysctls_init(void)
+{
+   register_sysctl_subdir("fs", "epoll", epoll_table);
+}
+#else
+#define epoll_sysctls_init() do { } while (0)
 #endif /* CONFIG_SYSCTL */
 
 static const struct file_operations eventpoll_fops;
@@ -2422,6 +2429,7 @@ static int __init eventpoll_init(void)
/* Allocates slab cache used to allocate "struct eppoll_entry" */
pwq_cache = kmem_cache_create("eventpoll_pwq",
sizeof(struct eppoll_entry), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL);
+   epoll_sysctls_init();
 
return 0;
 }
diff --git a/include/linux/poll.h b/include/linux/poll.h
index 1cdc32b1f1b0..a9e0e1c2d1f2 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -8,12 +8,10 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
 
-extern struct ctl_table epoll_table[]; /* for sysctl */
 /* ~832 bytes of stack space used max in sys_select/sys_poll before allocating
additional memory. */
 #ifdef __clang__
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index aa01f54d0442..e5364b69dd95 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -217,7 +217,6 @@ extern int no_unaligned_warning;
 
 extern struct ctl_table sysctl_mount_point[];
 extern struct ctl_table random_table[];
-extern struct ctl_table epoll_table[];
 
 #else /* CONFIG_SYSCTL */
 static inline struct ctl_table_header *register_sysctl_table(struct ctl_table 
* table)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index e007375c8a11..5c116904feb7 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -3001,13 +3001,6 @@ static struct ctl_table fs_table[] = {
.proc_handler   = proc_dointvec,
},
 #endif
-#ifdef CONFIG_EPOLL
-   {
-   .procname   = "epoll",
-   .mode   = 0555,
-   .child  = epoll_table,
-   },
-#endif
 #endif
{
.procname   = "protected_symlinks",
-- 
2.26.2



[PATCH 07/13] test_sysctl: use new sysctl subdir helper register_sysctl_subdir()

2020-05-29 Thread Luis Chamberlain
This simplifies the code considerably. The following coccinelle
SmPL grammar rule was used to transform this code.

// pycocci sysctl-subdir.cocci lib/test_sysctl.c

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
-   {
-   .procname = E1,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = sysctls,
-   },
-   { }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
-   {
-   .procname = E2,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = subdir,
-   },
-   { }
-};

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
@@

header =
-register_sysctl_table(base);
+register_sysctl_subdir(E2, E1, sysctls);

Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain 
---
 lib/test_sysctl.c | 23 ++-
 1 file changed, 2 insertions(+), 21 deletions(-)

diff --git a/lib/test_sysctl.c b/lib/test_sysctl.c
index 84eaae22d3a6..b17581307756 100644
--- a/lib/test_sysctl.c
+++ b/lib/test_sysctl.c
@@ -128,26 +128,6 @@ static struct ctl_table test_table[] = {
{ }
 };
 
-static struct ctl_table test_sysctl_table[] = {
-   {
-   .procname   = "test_sysctl",
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = test_table,
-   },
-   { }
-};
-
-static struct ctl_table test_sysctl_root_table[] = {
-   {
-   .procname   = "debug",
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = test_sysctl_table,
-   },
-   { }
-};
-
 static struct ctl_table_header *test_sysctl_header;
 
 static int __init test_sysctl_init(void)
@@ -155,7 +135,8 @@ static int __init test_sysctl_init(void)
test_data.bitmap_0001 = kzalloc(SYSCTL_TEST_BITMAP_SIZE/8, GFP_KERNEL);
if (!test_data.bitmap_0001)
return -ENOMEM;
-   test_sysctl_header = register_sysctl_table(test_sysctl_root_table);
+   test_sysctl_header = register_sysctl_subdir("debug", "test_sysctl",
+   test_table);
if (!test_sysctl_header) {
kfree(test_data.bitmap_0001);
return -ENOMEM;
-- 
2.26.2



[PATCH 08/13] inotify: simplify sysctl declaration with register_sysctl_subdir()

2020-05-29 Thread Luis Chamberlain
From: Xiaoming Ni 

move inotify_user sysctl to inotify_user.c and use the new
register_sysctl_subdir() helper.

Signed-off-by: Xiaoming Ni 
Signed-off-by: Luis Chamberlain 
---
 fs/notify/inotify/inotify_user.c | 11 ++-
 include/linux/inotify.h  |  3 ---
 kernel/sysctl.c  | 11 ---
 3 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index f88bbcc9efeb..64859fbf8463 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -46,7 +46,7 @@ struct kmem_cache *inotify_inode_mark_cachep __read_mostly;
 
 #include 
 
-struct ctl_table inotify_table[] = {
+static struct ctl_table inotify_table[] = {
{
.procname   = "max_user_instances",
.data   = 
_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES],
@@ -73,6 +73,14 @@ struct ctl_table inotify_table[] = {
},
{ }
 };
+
+static void __init inotify_sysctls_init(void)
+{
+   register_sysctl_subdir("fs", "inotify", inotify_table);
+}
+
+#else
+#define inotify_sysctls_init() do { } while (0)
 #endif /* CONFIG_SYSCTL */
 
 static inline __u32 inotify_arg_to_mask(u32 arg)
@@ -826,6 +834,7 @@ static int __init inotify_user_setup(void)
inotify_max_queued_events = 16384;
init_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES] = 128;
init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = 8192;
+   inotify_sysctls_init();
 
return 0;
 }
diff --git a/include/linux/inotify.h b/include/linux/inotify.h
index 6a24905f6e1e..8d20caa1b268 100644
--- a/include/linux/inotify.h
+++ b/include/linux/inotify.h
@@ -7,11 +7,8 @@
 #ifndef _LINUX_INOTIFY_H
 #define _LINUX_INOTIFY_H
 
-#include 
 #include 
 
-extern struct ctl_table inotify_table[]; /* for sysctl */
-
 #define ALL_INOTIFY_BITS (IN_ACCESS | IN_MODIFY | IN_ATTRIB | IN_CLOSE_WRITE | 
\
  IN_CLOSE_NOWRITE | IN_OPEN | IN_MOVED_FROM | \
  IN_MOVED_TO | IN_CREATE | IN_DELETE | \
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 04ff032f2863..30c2d521502a 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -123,10 +123,6 @@ static const int maxolduid = 65535;
 static int ngroups_max = NGROUPS_MAX;
 static const int cap_last_cap = CAP_LAST_CAP;
 
-#ifdef CONFIG_INOTIFY_USER
-#include 
-#endif
-
 #ifdef CONFIG_PROC_SYSCTL
 
 /**
@@ -3012,13 +3008,6 @@ static struct ctl_table fs_table[] = {
.proc_handler   = proc_dointvec,
},
 #endif
-#ifdef CONFIG_INOTIFY_USER
-   {
-   .procname   = "inotify",
-   .mode   = 0555,
-   .child  = inotify_table,
-   },
-#endif 
 #ifdef CONFIG_EPOLL
{
.procname   = "epoll",
-- 
2.26.2



[PATCH 06/13] ocfs2: use new sysctl subdir helper register_sysctl_subdir()

2020-05-29 Thread Luis Chamberlain
This simplifies the code considerably. The following coccinelle
SmPL grammar rule was used to transform this code.

// pycocci sysctl-subdir.cocci fs/ocfs2/stackglue.c

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
-   {
-   .procname = E1,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = sysctls,
-   },
-   { }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
-   {
-   .procname = E2,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = subdir,
-   },
-   { }
-};

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
@@

header =
-register_sysctl_table(base);
+register_sysctl_subdir(E2, E1, sysctls);

Generated-by: Coccinelle SmPL

Signed-off-by: Luis Chamberlain 
---
 fs/ocfs2/stackglue.c | 27 ---
 1 file changed, 4 insertions(+), 23 deletions(-)

diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index a191094694c6..addafced7f59 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -677,28 +677,8 @@ static struct ctl_table ocfs2_mod_table[] = {
},
{ }
 };
-
-static struct ctl_table ocfs2_kern_table[] = {
-   {
-   .procname   = "ocfs2",
-   .data   = NULL,
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = ocfs2_mod_table
-   },
-   { }
-};
-
-static struct ctl_table ocfs2_root_table[] = {
-   {
-   .procname   = "fs",
-   .data   = NULL,
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = ocfs2_kern_table
-   },
-   { }
-};
+   .data   = NULL,
+   .data   = NULL,
 
 static struct ctl_table_header *ocfs2_table_header;
 
@@ -711,7 +691,8 @@ static int __init ocfs2_stack_glue_init(void)
 {
strcpy(cluster_stack_name, OCFS2_STACK_PLUGIN_O2CB);
 
-   ocfs2_table_header = register_sysctl_table(ocfs2_root_table);
+   ocfs2_table_header = register_sysctl_subdir("fs", "ocfs2",
+   ocfs2_mod_table);
if (!ocfs2_table_header) {
printk(KERN_ERR
   "ocfs2 stack glue: unable to register sysctl\n");
-- 
2.26.2



[PATCH 01/13] sysctl: add new register_sysctl_subdir() helper

2020-05-29 Thread Luis Chamberlain
Often enough all we need to do is create a subdirectory so that
we can stuff sysctls underneath it. However, *if* that directory
was already created early on the boot sequence we really have no
need to use the full boiler plate code for it, we can just use
local variables to help us guide sysctl to place the new leaf files.

So use a helper to do precisely this.

Signed-off-by: Luis Chamberlain 
---
 include/linux/sysctl.h | 11 +++
 kernel/sysctl.c| 37 +
 2 files changed, 48 insertions(+)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index ddaa06ddd852..58bc978d4f03 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -205,6 +205,9 @@ void unregister_sysctl_table(struct ctl_table_header * 
table);
 extern int sysctl_init(void);
 extern void register_sysctl_init(const char *path, struct ctl_table *table,
 const char *table_name);
+extern struct ctl_table_header *register_sysctl_subdir(const char *base,
+  const char *subdir,
+  struct ctl_table *table);
 void do_sysctl_args(void);
 
 extern int pwrsw_enabled;
@@ -223,6 +226,14 @@ static inline struct ctl_table_header 
*register_sysctl_table(struct ctl_table *
return NULL;
 }
 
+static
+inline struct ctl_table_header *register_sysctl_subdir(const char *base,
+  const char *subdir,
+  struct ctl_table *table)
+{
+   return NULL;
+}
+
 static inline struct ctl_table_header *register_sysctl_paths(
const struct ctl_path *path, struct ctl_table *table)
 {
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 008ac0576ae5..04ff032f2863 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -3195,6 +3195,43 @@ void __init register_sysctl_init(const char *path, 
struct ctl_table *table,
}
kmemleak_not_leak(hdr);
 }
+
+struct ctl_table_header *register_sysctl_subdir(const char *base,
+   const char *subdir,
+   struct ctl_table *table)
+{
+   struct ctl_table_header *hdr = NULL;
+   struct ctl_table subdir_table[] = {
+   {
+   .procname   = subdir,
+   .mode   = 0555,
+   .child  = table,
+   },
+   { }
+   };
+   struct ctl_table base_table[] = {
+   {
+   .procname   = base,
+   .mode   = 0555,
+   .child  = subdir_table,
+   },
+   { }
+   };
+
+   if (!table->procname)
+   goto out;
+
+   hdr = register_sysctl_table(base_table);
+   if (unlikely(!hdr)) {
+   pr_err("failed when creating subdirectory sysctl %s/%s/%s\n",
+  base, subdir, table->procname);
+   goto out;
+   }
+   kmemleak_not_leak(hdr);
+out:
+   return hdr;
+}
+EXPORT_SYMBOL_GPL(register_sysctl_subdir);
 #endif /* CONFIG_SYSCTL */
 /*
  * No sense putting this after each symbol definition, twice,
-- 
2.26.2



[PATCH 05/13] macintosh/mac_hid.c: use new sysctl subdir helper register_sysctl_subdir()

2020-05-29 Thread Luis Chamberlain
This simplifies the code considerably. The following coccinelle
SmPL grammar rule was used to transform this code.

// pycocci sysctl-subdir.cocci drivers/macintosh/mac_hid.c

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
-   {
-   .procname = E1,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = sysctls,
-   },
-   { }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
-   {
-   .procname = E2,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = subdir,
-   },
-   { }
-};

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
@@

header =
-register_sysctl_table(base);
+register_sysctl_subdir(E2, E1, sysctls);

Generated-by: Coccinelle SmPL

Signed-off-by: Luis Chamberlain 
---
 drivers/macintosh/mac_hid.c | 25 ++---
 1 file changed, 2 insertions(+), 23 deletions(-)

diff --git a/drivers/macintosh/mac_hid.c b/drivers/macintosh/mac_hid.c
index 28b8581b44dd..736d0e151716 100644
--- a/drivers/macintosh/mac_hid.c
+++ b/drivers/macintosh/mac_hid.c
@@ -239,33 +239,12 @@ static struct ctl_table mac_hid_files[] = {
{ }
 };
 
-/* dir in /proc/sys/dev */
-static struct ctl_table mac_hid_dir[] = {
-   {
-   .procname   = "mac_hid",
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = mac_hid_files,
-   },
-   { }
-};
-
-/* /proc/sys/dev itself, in case that is not there yet */
-static struct ctl_table mac_hid_root_dir[] = {
-   {
-   .procname   = "dev",
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = mac_hid_dir,
-   },
-   { }
-};
-
 static struct ctl_table_header *mac_hid_sysctl_header;
 
 static int __init mac_hid_init(void)
 {
-   mac_hid_sysctl_header = register_sysctl_table(mac_hid_root_dir);
+   mac_hid_sysctl_header = register_sysctl_subdir("dev", "mac_hid",
+  mac_hid_files);
if (!mac_hid_sysctl_header)
return -ENOMEM;
 
-- 
2.26.2



[PATCH 04/13] i915: use new sysctl subdir helper register_sysctl_subdir()

2020-05-29 Thread Luis Chamberlain
This simplifies the code considerably. The following coccinelle
SmPL grammar rule was used to transform this code.

// pycocci sysctl-subdir.cocci drivers/gpu/drm/i915/i915_perf.c

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
-   {
-   .procname = E1,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = sysctls,
-   },
-   { }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
-   {
-   .procname = E2,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = subdir,
-   },
-   { }
-};

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
@@

header =
-register_sysctl_table(base);
+register_sysctl_subdir(E2, E1, sysctls);

Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain 
---
 drivers/gpu/drm/i915/i915_perf.c | 22 +-
 1 file changed, 1 insertion(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 665bb076e84d..52509b573794 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -4203,26 +4203,6 @@ static struct ctl_table oa_table[] = {
{}
 };
 
-static struct ctl_table i915_root[] = {
-   {
-.procname = "i915",
-.maxlen = 0,
-.mode = 0555,
-.child = oa_table,
-},
-   {}
-};
-
-static struct ctl_table dev_root[] = {
-   {
-.procname = "dev",
-.maxlen = 0,
-.mode = 0555,
-.child = i915_root,
-},
-   {}
-};
-
 /**
  * i915_perf_init - initialize i915-perf state on module bind
  * @i915: i915 device instance
@@ -4383,7 +4363,7 @@ static int destroy_config(int id, void *p, void *data)
 
 void i915_perf_sysctl_register(void)
 {
-   sysctl_header = register_sysctl_table(dev_root);
+   sysctl_header = register_sysctl_subdir("dev", "i915", oa_table);
 }
 
 void i915_perf_sysctl_unregister(void)
-- 
2.26.2



[PATCH 02/13] cdrom: use new sysctl subdir helper register_sysctl_subdir()

2020-05-29 Thread Luis Chamberlain
This simplifies the code considerably. The following coccinelle
SmPL grammar rule was used to transform this code.

// pycocci sysctl-subdir.cocci drivers/cdrom/cdrom.c

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
-   {
-   .procname = E1,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = sysctls,
-   },
-   { }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
-   {
-   .procname = E2,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = subdir,
-   },
-   { }
-};

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
@@

header =
-register_sysctl_table(base);
+register_sysctl_subdir(E2, E1, sysctls);

Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain 
---
 drivers/cdrom/cdrom.c | 23 ++-
 1 file changed, 2 insertions(+), 21 deletions(-)

diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index a0a7ae705de8..3c638f464cef 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -3719,26 +3719,6 @@ static struct ctl_table cdrom_table[] = {
{ }
 };
 
-static struct ctl_table cdrom_cdrom_table[] = {
-   {
-   .procname   = "cdrom",
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = cdrom_table,
-   },
-   { }
-};
-
-/* Make sure that /proc/sys/dev is there */
-static struct ctl_table cdrom_root_table[] = {
-   {
-   .procname   = "dev",
-   .maxlen = 0,
-   .mode   = 0555,
-   .child  = cdrom_cdrom_table,
-   },
-   { }
-};
 static struct ctl_table_header *cdrom_sysctl_header;
 
 static void cdrom_sysctl_register(void)
@@ -3748,7 +3728,8 @@ static void cdrom_sysctl_register(void)
if (!atomic_add_unless(, 1, 1))
return;
 
-   cdrom_sysctl_header = register_sysctl_table(cdrom_root_table);
+   cdrom_sysctl_header = register_sysctl_subdir("dev", "cdrom",
+cdrom_table);
 
/* set the defaults */
cdrom_sysctl_settings.autoclose = autoclose;
-- 
2.26.2



[PATCH 03/13] hpet: use new sysctl subdir helper register_sysctl_subdir()

2020-05-29 Thread Luis Chamberlain
This simplifies the code considerably. The following coccinelle
SmPL grammar rule was used to transform this code.

// pycocci sysctl-subdir.cocci drivers/char/hpet.c

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
-   {
-   .procname = E1,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = sysctls,
-   },
-   { }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
-   {
-   .procname = E2,
-   .maxlen = 0,
-   .mode = 0555,
-   .child = subdir,
-   },
-   { }
-};

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
@@

header =
-register_sysctl_table(base);
+register_sysctl_subdir(E2, E1, sysctls);

Generated-by: Coccinelle SmPL

Signed-off-by: Luis Chamberlain 
---
 drivers/char/hpet.c | 22 +-
 1 file changed, 1 insertion(+), 21 deletions(-)

diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
index ed3b7dab678d..169c970d5ff8 100644
--- a/drivers/char/hpet.c
+++ b/drivers/char/hpet.c
@@ -746,26 +746,6 @@ static struct ctl_table hpet_table[] = {
{}
 };
 
-static struct ctl_table hpet_root[] = {
-   {
-.procname = "hpet",
-.maxlen = 0,
-.mode = 0555,
-.child = hpet_table,
-},
-   {}
-};
-
-static struct ctl_table dev_root[] = {
-   {
-.procname = "dev",
-.maxlen = 0,
-.mode = 0555,
-.child = hpet_root,
-},
-   {}
-};
-
 static struct ctl_table_header *sysctl_header;
 
 /*
@@ -1059,7 +1039,7 @@ static int __init hpet_init(void)
if (result < 0)
return -ENODEV;
 
-   sysctl_header = register_sysctl_table(dev_root);
+   sysctl_header = register_sysctl_subdir("dev", "hpet", hpet_table);
 
result = acpi_bus_register_driver(_acpi_driver);
if (result < 0) {
-- 
2.26.2



[PATCH 00/13] sysctl: spring cleaning

2020-05-29 Thread Luis Chamberlain
Me and Xiaoming are working on some kernel/sysctl.c spring cleaning.
During a recent linux-next merge conflict it became clear that
the kitchen sink on kernel/sysctl.c creates too many conflicts,
and so we need to do away with stuffing everyone's knobs on this
one file.

This is part of that work. This is not expected to get merged yet, but
since our delta is pretty considerable at this point, we need to piece
meal this and collect reviews for what we have so far. This follows up
on some of his recent work.

This series focuses on a new helper to deal with subdirectories and
empty subdirectories. The terminology that we will embrace will be
that things like "fs", "kernel", "debug" are based directories, and
directories underneath this are subdirectories.

In this case, the cleanup ends up also trimming the amount of
code we have for sysctls.

If this seems reasonable we'll kdocify this a bit too.

This code has been boot tested without issues, and I'm letting 0day do
its thing to test against many kconfig builds. If you however spot
any issues please let us know.

Luis Chamberlain (9):
  sysctl: add new register_sysctl_subdir() helper
  cdrom: use new sysctl subdir helper register_sysctl_subdir()
  hpet: use new sysctl subdir helper register_sysctl_subdir()
  i915: use new sysctl subdir helper register_sysctl_subdir()
  macintosh/mac_hid.c: use new sysctl subdir helper
register_sysctl_subdir()
  ocfs2: use new sysctl subdir helper register_sysctl_subdir()
  test_sysctl: use new sysctl subdir helper register_sysctl_subdir()
  sysctl: add helper to register empty subdir
  fs: move binfmt_misc sysctl to its own file

Xiaoming Ni (4):
  inotify: simplify sysctl declaration with register_sysctl_subdir()
  firmware_loader: simplify sysctl declaration with
register_sysctl_subdir()
  eventpoll: simplify sysctl declaration with register_sysctl_subdir()
  random: simplify sysctl declaration with register_sysctl_subdir()

 drivers/base/firmware_loader/fallback.c   |  4 +
 drivers/base/firmware_loader/fallback.h   | 11 +++
 drivers/base/firmware_loader/fallback_table.c | 22 -
 drivers/cdrom/cdrom.c | 23 +
 drivers/char/hpet.c   | 22 +
 drivers/char/random.c | 14 +++-
 drivers/gpu/drm/i915/i915_perf.c  | 22 +
 drivers/macintosh/mac_hid.c   | 25 +-
 fs/binfmt_misc.c  |  1 +
 fs/eventpoll.c| 10 ++-
 fs/notify/inotify/inotify_user.c  | 11 ++-
 fs/ocfs2/stackglue.c  | 27 +-
 include/linux/inotify.h   |  3 -
 include/linux/poll.h  |  2 -
 include/linux/sysctl.h| 21 -
 kernel/sysctl.c   | 84 +++
 lib/test_sysctl.c | 23 +
 17 files changed, 144 insertions(+), 181 deletions(-)

-- 
2.26.2



[PATCH] powerpc/64/syscall: Disable sanitisers for C syscall entry/exit code

2020-05-29 Thread Daniel Axtens
syzkaller is picking up a bunch of crashes that look like this:

Unrecoverable exception 380 at c037ed60 (msr=80001031)
Oops: Unrecoverable exception, sig: 6 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 0 PID: 874 Comm: syz-executor.0 Not tainted 
5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e #0
NIP:  c037ed60 LR: c004bac8 CTR: c0030990
REGS: c000555a7230 TRAP: 0380   Not tainted  
(5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e)
MSR:  80001031   CR: 48222882  XER: 2000
CFAR: c004bac4 IRQMASK: 0
GPR00: c004bb68 c000555a74c0 c24b3500 0005
GPR04:   c004bb88 c0080091
GPR08: 000b c004bac8 00016000 c2503500
GPR12: c0030990 c319 106a5898 106a
GPR16: 106a5890 c7a92000 c8180e00 c7a8f700
GPR20: c7a904b0 1011 c259d318 5deadbeef100
GPR24: 5deadbeef122 c00078422700 c9ee88b8 c00078422778
GPR28: 0001 8280b033  c000555a75a0
NIP [c037ed60] __sanitizer_cov_trace_pc+0x40/0x50
LR [c004bac8] interrupt_exit_kernel_prepare+0x118/0x310
Call Trace:
[c000555a74c0] [c004bb68] interrupt_exit_kernel_prepare+0x1b8/0x310 
(unreliable)
[c000555a7530] [c000f9a8] interrupt_return+0x118/0x1c0
--- interrupt: 900 at __sanitizer_cov_trace_pc+0x0/0x50
..

That looks like the KCOV helper accessing memory that's not safe to
access in the interrupt handling context.

Do not instrument the new syscall entry/exit code with KCOV, GCOV or
UBSAN.

Cc: Nicholas Piggin 
Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in 
C")
Signed-off-by: Daniel Axtens 

---

be warned: I haven't attempted to reproduce the crash yet,
nor have I been able to test that this fixes it. I will attempt to do
that soon. Logically though, it does seem like this would be a
good thing to do regardless.
---
 arch/powerpc/kernel/Makefile | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 1c4385852d3d..1d443a7dc8a7 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -156,12 +156,19 @@ obj-$(CONFIG_PPC_SECVAR_SYSFS)+= secvar-sysfs.o
 GCOV_PROFILE_prom_init.o := n
 KCOV_INSTRUMENT_prom_init.o := n
 UBSAN_SANITIZE_prom_init.o := n
+
 GCOV_PROFILE_kprobes.o := n
 KCOV_INSTRUMENT_kprobes.o := n
 UBSAN_SANITIZE_kprobes.o := n
+
 GCOV_PROFILE_kprobes-ftrace.o := n
 KCOV_INSTRUMENT_kprobes-ftrace.o := n
 UBSAN_SANITIZE_kprobes-ftrace.o := n
+
+GCOV_PROFILE_syscall_64.o := n
+KCOV_INSTRUMENT_syscall_64.o := n
+UBSAN_SANITIZE_syscall_64.o := n
+
 UBSAN_SANITIZE_vdso.o := n
 
 # Necessary for booting with kcov enabled on book3e machines
-- 
2.20.1