date:20200317

Re: [PATCH 07/15] powerpc/watchpoint: Get watchpoint count dynamically while disabling them

2020-03-17 Thread Ravi Bangoria





On 3/17/20 4:02 PM, Christophe Leroy wrote:



Le 09/03/2020 à 09:57, Ravi Bangoria a écrit :

Instead of disabling only one watchpooint, get num of available
watchpoints dynamically and disable all of them.

Signed-off-by: Ravi Bangoria 
---
  arch/powerpc/include/asm/hw_breakpoint.h | 15 +++
  1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
b/arch/powerpc/include/asm/hw_breakpoint.h
index 980ac7d9f267..ec61e2b7195c 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -75,14 +75,13 @@ extern void ptrace_triggered(struct perf_event *bp,
  struct perf_sample_data *data, struct pt_regs *regs);
  static inline void hw_breakpoint_disable(void)
  {
-    struct arch_hw_breakpoint brk;
-
-    brk.address = 0;
-    brk.type = 0;
-    brk.len = 0;
-    brk.hw_len = 0;
-    if (ppc_breakpoint_available())
-    __set_breakpoint(&brk, 0);
+    int i;
+    struct arch_hw_breakpoint null_brk = {0};
+
+    if (ppc_breakpoint_available()) {


I think this test should go into nr_wp_slots() which should return zero when no 
breakpoint is available.


Seems possible. Will change it in next version.

Thanks,
Ravi

Re: [PATCH 5/5] selftests/powerpc: Add README for GZIP engine tests

2020-03-17 Thread Daniel Axtens

This is a good readme, the instructions for compiling and testing work.

Reviewed-by: Daniel Axtens 

Regards,
Daniel

Raphael Moreira Zinsly  writes:

> Include a README file with the instructions to use the
> testcases at selftests/powerpc/nx-gzip.
>
> Signed-off-by: Bulent Abali 
> Signed-off-by: Raphael Moreira Zinsly 
> ---
>  .../powerpc/nx-gzip/99-nx-gzip.rules  |  1 +
>  .../testing/selftests/powerpc/nx-gzip/README  | 44 +++
>  2 files changed, 45 insertions(+)
>  create mode 100644 tools/testing/selftests/powerpc/nx-gzip/99-nx-gzip.rules
>  create mode 100644 tools/testing/selftests/powerpc/nx-gzip/README
>
> diff --git a/tools/testing/selftests/powerpc/nx-gzip/99-nx-gzip.rules 
> b/tools/testing/selftests/powerpc/nx-gzip/99-nx-gzip.rules
> new file mode 100644
> index ..5a7118495cb3
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/nx-gzip/99-nx-gzip.rules
> @@ -0,0 +1 @@
> +SUBSYSTEM=="nxgzip", KERNEL=="nx-gzip", MODE="0666"
> diff --git a/tools/testing/selftests/powerpc/nx-gzip/README 
> b/tools/testing/selftests/powerpc/nx-gzip/README
> new file mode 100644
> index ..ff0c817a65c5
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/nx-gzip/README
> @@ -0,0 +1,44 @@
> +Test the nx-gzip function:
> +=
> +
> +Verify that following device exists:
> +  /dev/crypto/nx-gzip
> +If you get a permission error run as sudo or set the device permissions:
> +   sudo chmod go+rw /dev/crypto/nx-gzip
> +However, chmod may not survive across boots. You may create a udev file such
> +as:
> +   /etc/udev/rules.d/99-nx-gzip.rules
> +
> +
> +Then make and run:
> +$ make
> +gcc -O3 -I./inc -o gzfht_test gzfht_test.c gzip_vas.c
> +gcc -O3 -I./inc -o gunz_test gunz_test.c gzip_vas.c
> +
> +
> +Compress any file using Fixed Huffman mode. Output will have a .nx.gz suffix:
> +$ ./gzfht_test gzip_vas.c
> +file gzip_vas.c read, 5276 bytes
> +compressed 5276 to 2564 bytes total, crc32 checksum = b937a37d
> +
> +
> +Uncompress the previous output. Output will have a .nx.gunzip suffix:
> +$ ./gunz_test gzip_vas.c.nx.gz
> +gzHeader FLG 0
> +00 00 00 00 04 03
> +gzHeader MTIME, XFL, OS ignored
> +computed checksum b937a37d isize 149c
> +stored   checksum b937a37d isize 149c
> +decomp is complete: fclose
> +
> +
> +Compare two files:
> +$ sha1sum gzip_vas.c.nx.gz.nx.gunzip gzip_vas.c
> +f041cd8581e8d920f79f6ce7f65411be5d026c2a  gzip_vas.c.nx.gz.nx.gunzip
> +f041cd8581e8d920f79f6ce7f65411be5d026c2a  gzip_vas.c
> +
> +
> +Note that the code here are intended for testing the nx-gzip hardware 
> function.
> +They are not intended for demonstrating performance or compression ratio.
> +For more information and source code consider using:
> +https://github.com/libnxz/power-gzip
> -- 
> 2.21.0

Re: [PATCH 05/15] powerpc/watchpoint: Provide DAWR number to set_dawr

2020-03-17 Thread Ravi Bangoria





On 3/17/20 3:58 PM, Christophe Leroy wrote:



Le 09/03/2020 à 09:57, Ravi Bangoria a écrit :

Introduce new parameter 'nr' to set_dawr() which indicates which DAWR
should be programed.


While we are at it (In another patch I think), we should do the same to 
set_dabr() so that we can use both DABR and DABR2


This series is for DAWR only and does not support DABR2. I'll look
at how other book3s family processors provides DABR2 supports.

Thanks,
Ravi

Re: [PATCH 4/5] selftests/powerpc: Add NX-GZIP engine decompress testcase

2020-03-17 Thread Daniel Axtens

Raphael Moreira Zinsly  writes:

> Include a decompression testcase for the powerpc NX-GZIP
> engine.

I compiled gzip with the AFL++ fuzzer and generated a corpus of tests to
run against this decompressor. I also fuzzed the decompressor
directly. I found a few issues. I _think_ they're just in the userspace
but I'm a bit too early in the process to know.

I realise this is self-test code but:
a) it stops me testing more deeply, and
b) it looks like some of this code is shared with 
https://github.com/libnxz/power-gzip/

The issues I've found are:

1) In the ERR_NX_DATA_LENGTH case, the decompressor doesn't check that
   you're making forward progress, so you can provoke it into an
   infinite loop.

Here's an _extremely_ ugly fix:

diff --git a/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c 
b/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c
index 653de92698cc..236a1f567656 100644
--- a/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c
+++ b/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c
@@ -343,6 +343,8 @@ int decompress_file(int argc, char **argv, void *devhandle)
nx_dde_t dde_out[6] __attribute__((aligned (128)));
int pgfault_retries;
 
+   int last_first_used = 0;
+
/* when using mmap'ed files */
off_t input_file_offset;
 
@@ -642,6 +644,11 @@ int decompress_file(int argc, char **argv, void *devhandle)
first_used = fifo_used_first_bytes(cur_in, used_in, fifo_in_len);
last_used = fifo_used_last_bytes(cur_in, used_in, fifo_in_len);
 
+   if (first_used > 0 && last_first_used > 0) {
+   assert(first_used != last_first_used);
+   }
+   last_first_used = first_used;
+
if (first_used > 0)
nx_append_dde(ddl_in, fifo_in + cur_in, first_used);
 

2) It looks like you can provoke an out-of-bounds write. I've seen both
infinte loops printing something that seems to come from the file
content like:

57201: Got signal 11 si_code 3, si_addr 0xcacacacacacacac8

or a less bizzare address like

19285: Got signal 11 si_code 1, si_addr 0x7fffcf1b

Depending on the build I've also seen the stack smasher protection fire.

I don't understand the code well enough to figure out how this comes to
be just yet.

I've included a few test cases as attachments. I've preconverted them
with xxd to avoid anything that might flag suspicious gzip files!
Decompress them then use `xxd -r attachment testcase.gz` to convert them
back.

Regards,
Daniel



infloop.bz2
Description: Binary data


sig1.bz2
Description: Binary data


sig676767.bz2
Description: Binary data


sigededed.bz2
Description: Binary data


>
> Signed-off-by: Bulent Abali 
> Signed-off-by: Raphael Moreira Zinsly 
> ---
>  .../selftests/powerpc/nx-gzip/Makefile|7 +-
>  .../selftests/powerpc/nx-gzip/gunz_test.c | 1058 +
>  2 files changed, 1062 insertions(+), 3 deletions(-)
>  create mode 100644 tools/testing/selftests/powerpc/nx-gzip/gunz_test.c
>
> diff --git a/tools/testing/selftests/powerpc/nx-gzip/Makefile 
> b/tools/testing/selftests/powerpc/nx-gzip/Makefile
> index ab903f63bbbd..82abc19a49a0 100644
> --- a/tools/testing/selftests/powerpc/nx-gzip/Makefile
> +++ b/tools/testing/selftests/powerpc/nx-gzip/Makefile
> @@ -1,9 +1,9 @@
>  CC = gcc
>  CFLAGS = -O3
>  INC = ./inc
> -SRC = gzfht_test.c
> +SRC = gzfht_test.c gunz_test.c
>  OBJ = $(SRC:.c=.o)
> -TESTS = gzfht_test
> +TESTS = gzfht_test gunz_test
>  EXTRA_SOURCES = gzip_vas.c
>  
>  all: $(TESTS)
> @@ -16,6 +16,7 @@ $(TESTS): $(OBJ)
>  
>  run_tests: $(TESTS)
>   ./gzfht_test gzip_vas.c
> + ./gunz_test gzip_vas.c.nx.gz
>  
>  clean:
> - rm -f $(TESTS) *.o *~ *.gz
> + rm -f $(TESTS) *.o *~ *.gz *.gunzip
> diff --git a/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c 
> b/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c
> new file mode 100644
> index ..653de92698cc
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c
> @@ -0,0 +1,1058 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * P9 gunzip sample code for demonstrating the P9 NX hardware
> + * interface.  Not intended for productive uses or for performance or
> + * compression ratio measurements.  Note also that /dev/crypto/gzip,
> + * VAS and skiboot support are required
> + *
> + * Copyright 2020 IBM Corp.
> + *
> + * Author: Bulent Abali 
> + *
> + * https://github.com/libnxz/power-gzip for zlib api and other utils
> + * Definitions of acronyms used here.  See
> + * P9 NX Gzip Accelerator User's Manual for details
> + *
> + * adler/crc: 32 bit checksums appended to stream tail
> + * ce:   completion extension
> + * cpb:  coprocessor parameter block (metadata)
> + * crb:  coprocessor request block (command)
> + * csb:  coprocessor status block (status)
> + * dht:  dynamic huffman table
> + * dde:  data descriptor element (address, length)
> + * ddl:  list of ddes
> + * dh/fh:dynamic and fixed huffman types
> + *

[PATCHv2] selftests/powerpc: Turn off timeout setting for benchmarks, dscr, signal, tm

2020-03-17 Thread Po-Hsu Lin

Some specific tests in powerpc can take longer than the default 45
seconds that added in commit 852c8cbf34d3 ("selftests/kselftest/runner.sh:
Add 45 second timeout per test") to run, the following test result was
collected across 2 Power8 nodes and 1 Power9 node in our pool:
  powerpc/benchmarks/futex_bench - 52s
  powerpc/dscr/dscr_sysfs_test - 116s
  powerpc/signal/signal_fuzzer - 88s
  powerpc/tm/tm_unavailable_test - 168s
  powerpc/tm/tm-poison - 240s

Thus they will fail with TIMEOUT error. Disable the timeout setting
for these sub-tests to allow them finish properly.

https://bugs.launchpad.net/bugs/1864642
Fixes: 852c8cbf34d3 ("selftests/kselftest/runner.sh: Add 45 second timeout per 
test")
Signed-off-by: Po-Hsu Lin 
---
 tools/testing/selftests/powerpc/benchmarks/Makefile | 2 ++
 tools/testing/selftests/powerpc/benchmarks/settings | 1 +
 tools/testing/selftests/powerpc/dscr/Makefile   | 2 ++
 tools/testing/selftests/powerpc/dscr/settings   | 1 +
 tools/testing/selftests/powerpc/signal/Makefile | 2 ++
 tools/testing/selftests/powerpc/signal/settings | 1 +
 tools/testing/selftests/powerpc/tm/Makefile | 2 ++
 tools/testing/selftests/powerpc/tm/settings | 1 +
 8 files changed, 12 insertions(+)
 create mode 100644 tools/testing/selftests/powerpc/benchmarks/settings
 create mode 100644 tools/testing/selftests/powerpc/dscr/settings
 create mode 100644 tools/testing/selftests/powerpc/signal/settings
 create mode 100644 tools/testing/selftests/powerpc/tm/settings

diff --git a/tools/testing/selftests/powerpc/benchmarks/Makefile 
b/tools/testing/selftests/powerpc/benchmarks/Makefile
index d40300a..a32a6ab 100644
--- a/tools/testing/selftests/powerpc/benchmarks/Makefile
+++ b/tools/testing/selftests/powerpc/benchmarks/Makefile
@@ -2,6 +2,8 @@
 TEST_GEN_PROGS := gettimeofday context_switch fork mmap_bench futex_bench 
null_syscall
 TEST_GEN_FILES := exec_target
 
+TEST_FILES := settings
+
 CFLAGS += -O2
 
 top_srcdir = ../../../../..
diff --git a/tools/testing/selftests/powerpc/benchmarks/settings 
b/tools/testing/selftests/powerpc/benchmarks/settings
new file mode 100644
index 000..e7b9417
--- /dev/null
+++ b/tools/testing/selftests/powerpc/benchmarks/settings
@@ -0,0 +1 @@
+timeout=0
diff --git a/tools/testing/selftests/powerpc/dscr/Makefile 
b/tools/testing/selftests/powerpc/dscr/Makefile
index 5df4763..cfa6eed 100644
--- a/tools/testing/selftests/powerpc/dscr/Makefile
+++ b/tools/testing/selftests/powerpc/dscr/Makefile
@@ -3,6 +3,8 @@ TEST_GEN_PROGS := dscr_default_test dscr_explicit_test 
dscr_user_test   \
  dscr_inherit_test dscr_inherit_exec_test dscr_sysfs_test  \
  dscr_sysfs_thread_test
 
+TEST_FILES := settings
+
 top_srcdir = ../../../../..
 include ../../lib.mk
 
diff --git a/tools/testing/selftests/powerpc/dscr/settings 
b/tools/testing/selftests/powerpc/dscr/settings
new file mode 100644
index 000..e7b9417
--- /dev/null
+++ b/tools/testing/selftests/powerpc/dscr/settings
@@ -0,0 +1 @@
+timeout=0
diff --git a/tools/testing/selftests/powerpc/signal/Makefile 
b/tools/testing/selftests/powerpc/signal/Makefile
index 113838f..153fafc 100644
--- a/tools/testing/selftests/powerpc/signal/Makefile
+++ b/tools/testing/selftests/powerpc/signal/Makefile
@@ -5,6 +5,8 @@ CFLAGS += -maltivec
 $(OUTPUT)/signal_tm: CFLAGS += -mhtm
 $(OUTPUT)/sigfuz: CFLAGS += -pthread -m64
 
+TEST_FILES := settings
+
 top_srcdir = ../../../../..
 include ../../lib.mk
 
diff --git a/tools/testing/selftests/powerpc/signal/settings 
b/tools/testing/selftests/powerpc/signal/settings
new file mode 100644
index 000..e7b9417
--- /dev/null
+++ b/tools/testing/selftests/powerpc/signal/settings
@@ -0,0 +1 @@
+timeout=0
diff --git a/tools/testing/selftests/powerpc/tm/Makefile 
b/tools/testing/selftests/powerpc/tm/Makefile
index b15a1a3..7b99d09 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -7,6 +7,8 @@ TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv 
tm-signal-stack
$(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-sigreturn-nt \
tm-signal-context-force-tm tm-poison
 
+TEST_FILES := settings
+
 top_srcdir = ../../../../..
 include ../../lib.mk
 
diff --git a/tools/testing/selftests/powerpc/tm/settings 
b/tools/testing/selftests/powerpc/tm/settings
new file mode 100644
index 000..e7b9417
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/settings
@@ -0,0 +1 @@
+timeout=0
-- 
2.7.4

Re: [PATCH 03/15] powerpc/watchpoint: Introduce function to get nr watchpoints dynamically

2020-03-17 Thread Ravi Bangoria





diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
b/arch/powerpc/include/asm/hw_breakpoint.h
index f2f8d8aa8e3b..741c4f7573c4 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -43,6 +43,8 @@ struct arch_hw_breakpoint {
  #define DABR_MAX_LEN    8
  #define DAWR_MAX_LEN    512
+extern int nr_wp_slots(void);


'extern' keyword is unneeded and irrelevant here. Please remove it. Even 
checkpatch is unhappy 
(https://openpower.xyz/job/snowpatch/job/snowpatch-linux-checkpatch/12172//artifact/linux/checkpatch.log)


Sure.

...

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 110db94cdf3c..6d4b029532e2 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -835,6 +835,12 @@ static inline bool hw_brk_match(struct arch_hw_breakpoint 
*a,
  return true;
  }
+/* Returns total number of data breakpoints available. */
+int nr_wp_slots(void)
+{
+    return HBP_NUM_MAX;
+}
+


This is not worth a global function. At least it should be a static function 
located in hw_breakpoint.c. But it would be even better to have it as a static 
inline in asm/hw_breakpoint.h


Makes sense. Will change it.

Thanks.

Re: [PATCH] mm/hugetlb: Fix build failure with HUGETLB_PAGE but not HUGEBTLBFS

2020-03-17 Thread Andrew Morton

On Tue, 17 Mar 2020 08:04:14 + (UTC) Christophe Leroy 
 wrote:

> When CONFIG_HUGETLB_PAGE is set but not CONFIG_HUGETLBFS, the
> following build failure is encoutered:
> 
> In file included from arch/powerpc/mm/fault.c:33:0:
> ./include/linux/hugetlb.h: In function 'hstate_inode':
> ./include/linux/hugetlb.h:477:9: error: implicit declaration of function 
> 'HUGETLBFS_SB' [-Werror=implicit-function-declaration]
>   return HUGETLBFS_SB(i->i_sb)->hstate;
>  ^
> ./include/linux/hugetlb.h:477:30: error: invalid type argument of '->' (have 
> 'int')
>   return HUGETLBFS_SB(i->i_sb)->hstate;
>   ^
> 
> Gate hstate_inode() with CONFIG_HUGETLBFS instead of CONFIG_HUGETLB_PAGE.
> 
> Reported-by: kbuild test robot 
> Link: https://patchwork.ozlabs.org/patch/1255548/#2386036
> Fixes: a137e1cc6d6e ("hugetlbfs: per mount huge page sizes")

A 12 year old build error?  If accurate, that has to be a world record.

> Cc: sta...@vger.kernel.org

I think I'll remove this.  Obviously nobody is suffering from this problem!

Re: Slub: Increased mem consumption on cpu,mem-less node powerpc guest

2020-03-17 Thread Bharata B Rao

On Wed, Mar 18, 2020 at 08:50:44AM +0530, Srikar Dronamraju wrote:
> * Vlastimil Babka  [2020-03-17 17:45:15]:
> 
> > On 3/17/20 5:25 PM, Srikar Dronamraju wrote:
> > > * Vlastimil Babka  [2020-03-17 16:56:04]:
> > > 
> > >> 
> > >> I wonder why do you get a memory leak while Sachin in the same situation 
> > >> [1]
> > >> gets a crash? I don't understand anything anymore.
> > > 
> > > Sachin was testing on linux-next which has Kirill's patch which modifies
> > > slub to use kmalloc_node instead of kmalloc. While Bharata is testing on
> > > upstream, which doesn't have this. 
> > 
> > Yes, that Kirill's patch was about the memcg shrinker map allocation. But 
> > the
> > patch hunk that Bharata posted as a "hack" that fixes the problem, it 
> > follows
> > that there has to be something else that calls kmalloc_node(node) where 
> > node is
> > one that doesn't have present pages.
> > 
> > He mentions alloc_fair_sched_group() which has:
> > 
> > for_each_possible_cpu(i) {
> > cfs_rq = kzalloc_node(sizeof(struct cfs_rq),
> >   GFP_KERNEL, cpu_to_node(i));
> > ...
> > se = kzalloc_node(sizeof(struct sched_entity),
> >   GFP_KERNEL, cpu_to_node(i));
> > 
> 
> 
> Sachin's experiment.
> Upstream-next/ memcg /
> possible nodes were 0-31
> online nodes were 0-1
> kmalloc_node called for_each_node / for_each_possible_node.
> This would crash while allocating slab from !N_ONLINE nodes.
> 
> Bharata's experiment.
> Upstream
> possible nodes were 0-1
> online nodes were 0-1
> kmalloc_node called for_each_online_node/ for_each_possible_cpu
> i.e kmalloc is called for N_ONLINE nodes.
> So wouldn't crash
> 
> Even if his possible nodes were 0-256. I don't think we have kmalloc_node
> being called in !N_ONLINE nodes. Hence its not crashing.
> If we see the above code that you quote, kzalloc_node is using cpu_to_node
> which in Bharata's case will always return 1.
> 
> 
> > I assume one of these structs is 1k and other 512 bytes (rounded) and that 
> > for
> > some possible cpu's cpu_to_node(i) will be 0, which has no present pages. 
> > And as
> > Bharata pasted, node_to_mem_node(0) = 0

Correct, these two kazalloc_node() calls for all possible cpus are
causing increased slab memory consumption in my case.

> > So this looks like the same scenario, but it doesn't crash? Is the node 0
> > actually online here, and/or does it have N_NORMAL_MEMORY state?
> 

Node 0 is online, but N_NORMAL_MEMORY state is empty. In fact memory
leak goes away if I insert the below check/assignment in the slab
alloc code path:

+   if (!node_isset(node, node_states[N_NORMAL_MEMORY]))
+   node = NUMA_NO_NODE;

Regards,
Bharata.

Re: [PATCH kernel 0/5] powerpc/powenv/ioda: Allow huge DMA window at 4GB

2020-03-17 Thread Alexey Kardashevskiy




On 18/02/2020 18:36, Alexey Kardashevskiy wrote:
> Here is an attempt to support bigger DMA space for devices
> supporting DMA masks less than 59 bits (GPUs come into mind
> first). POWER9 PHBs have an option to map 2 windows at 0
> and select a windows based on DMA address being below or above
> 4GB.
> 
> This adds the "iommu=iommu_bypass" kernel parameter and
> supports VFIO+pseries machine - current this requires telling
> upstream+unmodified QEMU about this via
> -global spapr-pci-host-bridge.dma64_win_addr=0x1
> or per-phb property. 4/4 advertises the new option but
> there is no automation around it in QEMU (should it be?).
> 
> For now it is either 1<<59 or 4GB mode; dynamic switching is
> not supported (could be via sysfs).
> 
> This is a rebased version of
> https://lore.kernel.org/kvm/20191202015953.127902-1-...@ozlabs.ru/
> 
> This is based on sha1
> 71c3a888cbca Linus Torvalds "Merge tag 'powerpc-5.6-1' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux".
> 
> Please comment. Thanks.


Anyone from the POWERPC side wants to comment? Thanks,


> 
> 
> 
> Alexey Kardashevskiy (5):
>   powerpc/powernv/ioda: Move TCE bypass base to PE
>   powerpc/powernv/ioda: Rework for huge DMA window at 4GB
>   powerpc/powernv/ioda: Allow smaller TCE table levels
>   powerpc/powernv/phb4: Add 4GB IOMMU bypass mode
>   vfio/spapr_tce: Advertise and allow a huge DMA windows at 4GB
> 
>  arch/powerpc/include/asm/iommu.h  |   2 +
>  arch/powerpc/include/asm/opal-api.h   |   9 +-
>  arch/powerpc/include/asm/opal.h   |   2 +
>  arch/powerpc/platforms/powernv/pci.h  |   2 +-
>  include/uapi/linux/vfio.h |   2 +
>  arch/powerpc/platforms/powernv/npu-dma.c  |   1 +
>  arch/powerpc/platforms/powernv/opal-call.c|   2 +
>  arch/powerpc/platforms/powernv/pci-ioda-tce.c |   4 +-
>  arch/powerpc/platforms/powernv/pci-ioda.c | 229 ++
>  drivers/vfio/vfio_iommu_spapr_tce.c   |  10 +-
>  10 files changed, 207 insertions(+), 56 deletions(-)
> 

-- 
Alexey

Re: [PATCH 4/5] selftests/powerpc: Add NX-GZIP engine decompress testcase

2020-03-17 Thread Daniel Axtens

Raphael Moreira Zinsly  writes:

> Include a decompression testcase for the powerpc NX-GZIP
> engine.
>
> Signed-off-by: Bulent Abali 
> Signed-off-by: Raphael Moreira Zinsly 
> ---
>  .../selftests/powerpc/nx-gzip/Makefile|7 +-
>  .../selftests/powerpc/nx-gzip/gunz_test.c | 1058 +
>  2 files changed, 1062 insertions(+), 3 deletions(-)
>  create mode 100644 tools/testing/selftests/powerpc/nx-gzip/gunz_test.c
>
> diff --git a/tools/testing/selftests/powerpc/nx-gzip/Makefile 
> b/tools/testing/selftests/powerpc/nx-gzip/Makefile
> index ab903f63bbbd..82abc19a49a0 100644
> --- a/tools/testing/selftests/powerpc/nx-gzip/Makefile
> +++ b/tools/testing/selftests/powerpc/nx-gzip/Makefile
> @@ -1,9 +1,9 @@
>  CC = gcc
>  CFLAGS = -O3
>  INC = ./inc
> -SRC = gzfht_test.c
> +SRC = gzfht_test.c gunz_test.c
>  OBJ = $(SRC:.c=.o)
> -TESTS = gzfht_test
> +TESTS = gzfht_test gunz_test
>  EXTRA_SOURCES = gzip_vas.c
>  
>  all: $(TESTS)
> @@ -16,6 +16,7 @@ $(TESTS): $(OBJ)
>  
>  run_tests: $(TESTS)
>   ./gzfht_test gzip_vas.c
> + ./gunz_test gzip_vas.c.nx.gz
>  
>  clean:
> - rm -f $(TESTS) *.o *~ *.gz
> + rm -f $(TESTS) *.o *~ *.gz *.gunzip
> diff --git a/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c 
> b/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c
> new file mode 100644
> index ..653de92698cc
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c
> @@ -0,0 +1,1058 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * P9 gunzip sample code for demonstrating the P9 NX hardware
> + * interface.  Not intended for productive uses or for performance or
> + * compression ratio measurements.  Note also that /dev/crypto/gzip,
> + * VAS and skiboot support are required
> + *
> + * Copyright 2020 IBM Corp.
> + *
> + * Author: Bulent Abali 
> + *
> + * https://github.com/libnxz/power-gzip for zlib api and other utils
> + * Definitions of acronyms used here.  See
> + * P9 NX Gzip Accelerator User's Manual for details
> + *
> + * adler/crc: 32 bit checksums appended to stream tail
> + * ce:   completion extension
> + * cpb:  coprocessor parameter block (metadata)
> + * crb:  coprocessor request block (command)
> + * csb:  coprocessor status block (status)
> + * dht:  dynamic huffman table
> + * dde:  data descriptor element (address, length)
> + * ddl:  list of ddes
> + * dh/fh:dynamic and fixed huffman types
> + * fc:   coprocessor function code
> + * histlen:  history/dictionary length
> + * history:  sliding window of up to 32KB of data
> + * lzcount:  Deflate LZ symbol counts
> + * rembytecnt: remaining byte count
> + * sfbt: source final block type; last block's type during decomp
> + * spbc: source processed byte count
> + * subc: source unprocessed bit count
> + * tebc: target ending bit count; valid bits in the last byte
> + * tpbc: target processed byte count
> + * vas:  virtual accelerator switch; the user mode interface
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "nxu.h"
> +#include "nx.h"
> +
> +int nx_dbg = 0;
> +FILE *nx_gzip_log = NULL;
> +
> +#define NX_MIN(X, Y) (((X) < (Y))?(X):(Y))
> +#define NX_MAX(X, Y) (((X) > (Y))?(X):(Y))
> +
> +#define mb() asm volatile("sync" ::: "memory")
> +#define rmb()asm volatile("lwsync" ::: "memory")
> +#define wmb()rmb()
> +
> +const int fifo_in_len = 1<<24;
> +const int fifo_out_len = 1<<24;
> +const int page_sz = 1<<16;
> +const int line_sz = 1<<7;
> +const int window_max = 1<<15;
> +const int retry_max = 50;
> +
> +extern void *nx_fault_storage_address;
> +extern void *nx_function_begin(int function, int pri);
> +extern int nx_function_end(void *handle);
> +
> +/*
> + * Fault in pages prior to NX job submission.  wr=1 may be required to
> + * touch writeable pages.  System zero pages do not fault-in the page as
> + * intended.  Typically set wr=1 for NX target pages and set wr=0 for
> + * NX source pages.
> + */
> +static int nx_touch_pages(void *buf, long buf_len, long page_len, int wr)
> +{
> + char *begin = buf;
> + char *end = (char *) buf + buf_len - 1;
> + volatile char t;
> +
> + assert(buf_len >= 0 && !!buf);
> +
> + NXPRT(fprintf(stderr, "touch %p %p len 0x%lx wr=%d\n", buf,
> + buf + buf_len, buf_len, wr));
> +
> + if (buf_len <= 0 || buf == NULL)
> + return -1;
> +
> + do {
> + t = *begin;
> + if (wr)
> + *begin = t;
> + begin = begin + page_len;
> + } while (begin < end);
> +
> + /* When buf_sz is small or buf tail is in another page. */
> + t = *end;
> + if (wr)
> + *end = t;
> +
> + return 0;
> +}
> +
> +void sigsegv_handler(int sig, siginfo_t *info, void *ctx)
> +{
>

Re: [PATCH 1/5] selftests/powerpc: Add header files for GZIP engine test

2020-03-17 Thread Daniel Axtens

Hi,

This is throwing a number of snowpatch warnings, as well as a whitespace
warning when I apply it. Please could you check the warnings at
https://patchwork.ozlabs.org/patch/1255779/

It looks like the rest of the series also throws some warnings - please
check those also.

Kind regards,
Daniel


Raphael Moreira Zinsly  writes:

> Add files to access the powerpc NX-GZIP engine in user space.
>
> Signed-off-by: Bulent Abali 
> Signed-off-by: Raphael Moreira Zinsly 
> ---
>  .../selftests/powerpc/nx-gzip/inc/crb.h   | 170 ++
>  .../selftests/powerpc/nx-gzip/inc/nx-gzip.h   |  27 +++
>  .../powerpc/nx-gzip/inc/nx-helpers.h  |  53 ++
>  .../selftests/powerpc/nx-gzip/inc/nx.h|  30 
>  4 files changed, 280 insertions(+)
>  create mode 100644 tools/testing/selftests/powerpc/nx-gzip/inc/crb.h
>  create mode 100644 tools/testing/selftests/powerpc/nx-gzip/inc/nx-gzip.h
>  create mode 100644 tools/testing/selftests/powerpc/nx-gzip/inc/nx-helpers.h
>  create mode 100644 tools/testing/selftests/powerpc/nx-gzip/inc/nx.h
>
> diff --git a/tools/testing/selftests/powerpc/nx-gzip/inc/crb.h 
> b/tools/testing/selftests/powerpc/nx-gzip/inc/crb.h
> new file mode 100644
> index ..6af25fb8461a
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/nx-gzip/inc/crb.h
> @@ -0,0 +1,170 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#ifndef __CRB_H
> +#define __CRB_H
> +#include 
> +
> +typedef unsigned char u8;
> +typedef unsigned int u32;
> +typedef unsigned long long u64;
> +
> +/* From nx-842.h */
> +
> +/* CCW 842 CI/FC masks
> + * NX P8 workbook, section 4.3.1, figure 4-6
> + * "CI/FC Boundary by NX CT type"
> + */
> +#define CCW_CI_842  (0x3ff8)
> +#define CCW_FC_842  (0x0007)
> +
> +/* end - nx-842.h */
> +
> +#ifndef __aligned
> +#define __aligned(x)__attribute__((aligned(x)))
> +#endif
> +
> +#ifndef __packed
> +#define __packed__attribute__((packed))
> +#endif
> +
> +/* Chapter 6.5.8 Coprocessor-Completion Block (CCB) */
> +
> +#define CCB_VALUE(0x3fff)
> +#define CCB_ADDRESS  (0xfff8)
> +#define CCB_CM   (0x0007)
> +#define CCB_CM0  (0x0004)
> +#define CCB_CM12 (0x0003)
> +
> +#define CCB_CM0_ALL_COMPLETIONS  (0x0)
> +#define CCB_CM0_LAST_IN_CHAIN(0x4)
> +#define CCB_CM12_STORE   (0x0)
> +#define CCB_CM12_INTERRUPT   (0x1)
> +
> +#define CCB_SIZE (0x10)
> +#define CCB_ALIGNCCB_SIZE
> +
> +struct coprocessor_completion_block {
> + __be64 value;
> + __be64 address;
> +} __packed __aligned(CCB_ALIGN);
> +
> +
> +/* Chapter 6.5.7 Coprocessor-Status Block (CSB) */
> +
> +#define CSB_V(0x80)
> +#define CSB_F(0x04)
> +#define CSB_CH   (0x03)
> +#define CSB_CE_INCOMPLETE(0x80)
> +#define CSB_CE_TERMINATION   (0x40)
> +#define CSB_CE_TPBC  (0x20)
> +
> +#define CSB_CC_SUCCESS   (0)
> +#define CSB_CC_INVALID_ALIGN (1)
> +#define CSB_CC_OPERAND_OVERLAP   (2)
> +#define CSB_CC_DATA_LENGTH   (3)
> +#define CSB_CC_TRANSLATION   (5)
> +#define CSB_CC_PROTECTION(6)
> +#define CSB_CC_RD_EXTERNAL   (7)
> +#define CSB_CC_INVALID_OPERAND   (8)
> +#define CSB_CC_PRIVILEGE (9)
> +#define CSB_CC_INTERNAL  (10)
> +#define CSB_CC_WR_EXTERNAL   (12)
> +#define CSB_CC_NOSPC (13)
> +#define CSB_CC_EXCESSIVE_DDE (14)
> +#define CSB_CC_WR_TRANSLATION(15)
> +#define CSB_CC_WR_PROTECTION (16)
> +#define CSB_CC_UNKNOWN_CODE  (17)
> +#define CSB_CC_ABORT (18)
> +#define CSB_CC_TRANSPORT (20)
> +#define CSB_CC_SEGMENTED_DDL (31)
> +#define CSB_CC_PROGRESS_POINT(32)
> +#define CSB_CC_DDE_OVERFLOW  (33)
> +#define CSB_CC_SESSION   (34)
> +#define CSB_CC_PROVISION (36)
> +#define CSB_CC_CHAIN (37)
> +#define CSB_CC_SEQUENCE  (38)
> +#define CSB_CC_HW(39)
> +
> +#define CSB_SIZE (0x10)
> +#define CSB_ALIGNCSB_SIZE
> +
> +struct coprocessor_status_block {
> + u8 flags;
> + u8 cs;
> + u8 cc;
> + u8 ce;
> + __be32 count;
> + __be64 address;
> +} __packed __aligned(CSB_ALIGN);
> +
> +
> +/* Chapter 6.5.10 Data-Descriptor List (DDL)
> + * each list contains one or more Data-Descriptor Entries (DDE)
> + */
> +
> +#define DDE_P(0x8000)
> +
> +#define DDE_SIZE (0x10)
> +#define DDE_ALIGNDDE_SIZE
> +
> +struct data_descriptor_entry {
> + __be16 flags;
> + u8 count;
> + u8 index;
> + __be32 length;
> + __be64 address;
> +} __packed __aligned(DDE_ALIGN);
> +
> +
> +/* Chapter 6.5.2 Coprocessor-Request Block (CRB) */
> +
> +#define CRB_SIZE (0x80)
> +#define CRB_ALIGN(0x100) /* Errata: requires 256 alignment */
> +
> +
> +/* Copro

Re: Slub: Increased mem consumption on cpu,mem-less node powerpc guest

2020-03-17 Thread Srikar Dronamraju

* Vlastimil Babka  [2020-03-17 17:45:15]:

> On 3/17/20 5:25 PM, Srikar Dronamraju wrote:
> > * Vlastimil Babka  [2020-03-17 16:56:04]:
> > 
> >> 
> >> I wonder why do you get a memory leak while Sachin in the same situation 
> >> [1]
> >> gets a crash? I don't understand anything anymore.
> > 
> > Sachin was testing on linux-next which has Kirill's patch which modifies
> > slub to use kmalloc_node instead of kmalloc. While Bharata is testing on
> > upstream, which doesn't have this. 
> 
> Yes, that Kirill's patch was about the memcg shrinker map allocation. But the
> patch hunk that Bharata posted as a "hack" that fixes the problem, it follows
> that there has to be something else that calls kmalloc_node(node) where node 
> is
> one that doesn't have present pages.
> 
> He mentions alloc_fair_sched_group() which has:
> 
> for_each_possible_cpu(i) {
> cfs_rq = kzalloc_node(sizeof(struct cfs_rq),
>   GFP_KERNEL, cpu_to_node(i));
> ...
> se = kzalloc_node(sizeof(struct sched_entity),
>   GFP_KERNEL, cpu_to_node(i));
> 

Sachin's experiment.
Upstream-next/ memcg /
possible nodes were 0-31
online nodes were 0-1
kmalloc_node called for_each_node / for_each_possible_node.
This would crash while allocating slab from !N_ONLINE nodes.

Bharata's experiment.
Upstream
possible nodes were 0-1
online nodes were 0-1
kmalloc_node called for_each_online_node/ for_each_possible_cpu
i.e kmalloc is called for N_ONLINE nodes.
So wouldn't crash

Even if his possible nodes were 0-256. I don't think we have kmalloc_node
being called in !N_ONLINE nodes. Hence its not crashing.
If we see the above code that you quote, kzalloc_node is using cpu_to_node
which in Bharata's case will always return 1.

> I assume one of these structs is 1k and other 512 bytes (rounded) and that for
> some possible cpu's cpu_to_node(i) will be 0, which has no present pages. And 
> as
> Bharata pasted, node_to_mem_node(0) = 0
> So this looks like the same scenario, but it doesn't crash? Is the node 0
> actually online here, and/or does it have N_NORMAL_MEMORY state?

I still dont have any clue on the leak though.

-- 
Thanks and Regards
Srikar Dronamraju

[PATCH] soc: fsl: enable acpi support

2020-03-17 Thread Peng Ma

This patch enables ACPI support in Rcpm driver.

Signed-off-by: Peng Ma 
---
 drivers/soc/fsl/rcpm.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/soc/fsl/rcpm.c b/drivers/soc/fsl/rcpm.c
index a093dbe..7da6bbd 100644
--- a/drivers/soc/fsl/rcpm.c
+++ b/drivers/soc/fsl/rcpm.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define RCPM_WAKEUP_CELL_MAX_SIZE  7
 
@@ -139,10 +140,17 @@ static const struct of_device_id rcpm_of_match[] = {
 };
 MODULE_DEVICE_TABLE(of, rcpm_of_match);
 
+static const struct acpi_device_id rcpm_imx_acpi_ids[] = {
+   {"NXP0012",},
+   { }
+};
+MODULE_DEVICE_TABLE(acpi, rcpm_imx_acpi_ids);
+
 static struct platform_driver rcpm_driver = {
.driver = {
.name = "rcpm",
.of_match_table = rcpm_of_match,
+   .acpi_match_table = ACPI_PTR(rcpm_imx_acpi_ids),
.pm = &rcpm_pm_ops,
},
.probe = rcpm_probe,
-- 
2.9.5

[powerpc:merge] BUILD SUCCESS 8a445cbcb9f5090cb07ec6cbb89a8a1fc99a0ff7

2020-03-17 Thread kbuild test robot

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  
merge
branch HEAD: 8a445cbcb9f5090cb07ec6cbb89a8a1fc99a0ff7  Automatic merge of 
branches 'master', 'next' and 'fixes' into merge

elapsed time: 785m

configs tested: 156
configs skipped: 0

The following configs have been built successfully.
More configs may be tested in the coming days.

arm64allyesconfig
arm  allyesconfig
arm64 allnoconfig
arm   allnoconfig
arm at91_dt_defconfig
arm   efm32_defconfig
arm  exynos_defconfig
armmulti_v5_defconfig
armmulti_v7_defconfig
armshmobile_defconfig
arm   sunxi_defconfig
arm64   defconfig
sparcallyesconfig
arm64allmodconfig
mips  fuloong2e_defconfig
s390defconfig
openrisc simple_smp_defconfig
riscv allnoconfig
s390 allyesconfig
i386  allnoconfig
i386 alldefconfig
i386 allyesconfig
i386defconfig
ia64 alldefconfig
ia64 allmodconfig
ia64  allnoconfig
ia64 allyesconfig
ia64defconfig
arm  allmodconfig
c6x  allyesconfig
c6xevmc6678_defconfig
nios2 10m50_defconfig
nios2 3c120_defconfig
openriscor1ksim_defconfig
xtensa   common_defconfig
xtensa  iss_defconfig
alpha   defconfig
cskydefconfig
nds32 allnoconfig
nds32   defconfig
h8300 edosk2674_defconfig
h8300h8300h-sim_defconfig
h8300   h8s-sim_defconfig
m68k allmodconfig
m68k   m5475evb_defconfig
m68k  multi_defconfig
m68k   sun3_defconfig
arc  allyesconfig
arc defconfig
microblaze  mmu_defconfig
microblazenommu_defconfig
powerpc   allnoconfig
powerpc defconfig
powerpc   ppc64_defconfig
powerpc  rhel-kconfig
mips   32r2_defconfig
mips 64r6el_defconfig
mips allmodconfig
mips  allnoconfig
mips allyesconfig
mips  malta_kvm_defconfig
pariscallnoconfig
pariscgeneric-64bit_defconfig
pariscgeneric-32bit_defconfig
parisc   allyesconfig
x86_64   randconfig-a001-20200317
x86_64   randconfig-a002-20200317
x86_64   randconfig-a003-20200317
i386     randconfig-a001-20200317
i386     randconfig-a002-20200317
i386     randconfig-a003-20200317
alpha    randconfig-a001-20200317
m68k     randconfig-a001-20200317
mips     randconfig-a001-20200317
nds32    randconfig-a001-20200317
parisc   randconfig-a001-20200317
h8300randconfig-a001-20200318
sparc64  randconfig-a001-20200318
c6x  randconfig-a001-20200318
nios2randconfig-a001-20200318
microblaze   randconfig-a001-20200318
csky randconfig-a001-20200318
openrisc randconfig-a001-20200318
s390 randconfig-a001-20200318
sh   randconfig-a001-20200318
xtensa   randconfig-a001-20200318
xtensa   randconfig-a001-20200317
openrisc     randconfig-a001-20200317
csky     randconfig-a001-20200317
sh   randconfig-a001-20200317
s390     randconfig-a001-20200317
x86_64   randconfig-b001-20200317
x86_64   randconfig-b002-20200317
x86_64   randconfig-b003-20200317
i386     randconfig-b001-20200317
i386     randconfig-b002-20200317
i386     randconfig-b003-20200317
x86_64   randconfig-c001-20200317
x86_64   randconfig-c002-20200317
x86_64

[powerpc:next-test] BUILD SUCCESS 033c9d310e29706e46c05f4fe7e863b1f32939db

2020-03-17 Thread kbuild test robot

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  
next-test
branch HEAD: 033c9d310e29706e46c05f4fe7e863b1f32939db  powerpc/hash64/devmap: 
Use H_PAGE_THP_HUGE when setting up huge devmap PTE entries

elapsed time: 669m

configs tested: 156
configs skipped: 7

The following configs have been built successfully.
More configs may be tested in the coming days.

arm64allyesconfig
arm  allyesconfig
arm64 allnoconfig
arm   allnoconfig
arm at91_dt_defconfig
arm   efm32_defconfig
arm  exynos_defconfig
armmulti_v5_defconfig
armmulti_v7_defconfig
armshmobile_defconfig
arm   sunxi_defconfig
arm64   defconfig
sparcallyesconfig
arm64allmodconfig
mips  fuloong2e_defconfig
s390defconfig
openrisc simple_smp_defconfig
riscv allnoconfig
s390 allyesconfig
powerpc   allnoconfig
i386  allnoconfig
i386 alldefconfig
i386 allyesconfig
i386defconfig
ia64 alldefconfig
ia64 allmodconfig
ia64  allnoconfig
ia64 allyesconfig
ia64defconfig
arm  allmodconfig
c6x  allyesconfig
c6xevmc6678_defconfig
nios2 10m50_defconfig
nios2 3c120_defconfig
openriscor1ksim_defconfig
xtensa   common_defconfig
xtensa  iss_defconfig
alpha   defconfig
cskydefconfig
nds32 allnoconfig
nds32   defconfig
h8300 edosk2674_defconfig
h8300h8300h-sim_defconfig
h8300   h8s-sim_defconfig
m68k allmodconfig
m68k   m5475evb_defconfig
m68k  multi_defconfig
m68k   sun3_defconfig
arc defconfig
arc  allyesconfig
powerpc   ppc64_defconfig
powerpc  rhel-kconfig
microblaze  mmu_defconfig
microblazenommu_defconfig
powerpc defconfig
mips   32r2_defconfig
mips 64r6el_defconfig
mips allmodconfig
mips  allnoconfig
mips allyesconfig
mips  malta_kvm_defconfig
pariscallnoconfig
parisc   allyesconfig
pariscgeneric-32bit_defconfig
pariscgeneric-64bit_defconfig
x86_64   randconfig-a001-20200317
x86_64   randconfig-a002-20200317
x86_64   randconfig-a003-20200317
i386 randconfig-a001-20200317
i386 randconfig-a002-20200317
i386 randconfig-a003-20200317
alpharandconfig-a001-20200317
m68k randconfig-a001-20200317
mips randconfig-a001-20200317
nds32randconfig-a001-20200317
parisc   randconfig-a001-20200317
c6x  randconfig-a001-20200317
h8300randconfig-a001-20200317
microblaze   randconfig-a001-20200317
nios2randconfig-a001-20200317
sparc64  randconfig-a001-20200317
csky randconfig-a001-20200318
openrisc randconfig-a001-20200318
s390 randconfig-a001-20200318
sh   randconfig-a001-20200318
xtensa   randconfig-a001-20200318
csky randconfig-a001-20200317
openrisc randconfig-a001-20200317
s390 randconfig-a001-20200317
sh   randconfig-a001-20200317
xtensa   randconfig-a001-20200317
x86_64   randconfig-b001-20200317
x86_64   randconfig-b002-20200317
x86_64   randconfig-b003-20200317
i386 randconfig-b001-20200317
i386 randconfig-b002-20200317
i386 randconfig-b003-20200317
x86_64   randconfig-c001-20200317
x86_64   randconfig-c002-20200317
x86_64   randconfig-c003-20200317
i386

[powerpc:fixes-test] BUILD SUCCESS af3d0a68698c7e5df8b72267086b23422a3954bb

2020-03-17 Thread kbuild test robot

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  
fixes-test
branch HEAD: af3d0a68698c7e5df8b72267086b23422a3954bb  powerpc/kasan: Fix 
shadow memory protection with CONFIG_KASAN_VMALLOC

elapsed time: 2837m

configs tested: 251
configs skipped: 218

The following configs have been built successfully.
More configs may be tested in the coming days.

arm  allmodconfig
arm   allnoconfig
arm  allyesconfig
arm64allmodconfig
arm64 allnoconfig
arm64allyesconfig
arm at91_dt_defconfig
arm   efm32_defconfig
arm  exynos_defconfig
armmulti_v5_defconfig
armmulti_v7_defconfig
armshmobile_defconfig
arm   sunxi_defconfig
arm64   defconfig
sparcallyesconfig
pariscgeneric-32bit_defconfig
mips allyesconfig
riscvnommu_virt_defconfig
h8300 edosk2674_defconfig
mips  fuloong2e_defconfig
s390defconfig
m68k   sun3_defconfig
sparc   defconfig
ia64defconfig
powerpc defconfig
riscv allnoconfig
s390 alldefconfig
s390 allyesconfig
sparc64 defconfig
shtitan_defconfig
nios2 3c120_defconfig
riscv  rv32_defconfig
i386defconfig
nds32 allnoconfig
powerpc   ppc64_defconfig
powerpc   allnoconfig
ia64  allnoconfig
pariscallnoconfig
s390  debug_defconfig
i386  allnoconfig
i386 alldefconfig
i386 allyesconfig
ia64 alldefconfig
ia64 allmodconfig
ia64 allyesconfig
c6x  allyesconfig
c6xevmc6678_defconfig
nios2 10m50_defconfig
openriscor1ksim_defconfig
openrisc simple_smp_defconfig
xtensa   common_defconfig
xtensa  iss_defconfig
alpha   defconfig
cskydefconfig
nds32   defconfig
h8300h8300h-sim_defconfig
h8300   h8s-sim_defconfig
m68k allmodconfig
m68k   m5475evb_defconfig
m68k  multi_defconfig
arc  allyesconfig
arc defconfig
microblaze  mmu_defconfig
microblazenommu_defconfig
powerpc  rhel-kconfig
mips   32r2_defconfig
mips 64r6el_defconfig
mips allmodconfig
mips  allnoconfig
mips  malta_kvm_defconfig
parisc   allyesconfig
pariscgeneric-64bit_defconfig
x86_64   randconfig-a001-20200316
x86_64   randconfig-a002-20200316
x86_64   randconfig-a003-20200316
i386 randconfig-a001-20200316
i386 randconfig-a002-20200316
i386 randconfig-a003-20200316
x86_64   randconfig-a001-20200317
x86_64   randconfig-a002-20200317
x86_64   randconfig-a003-20200317
i386 randconfig-a001-20200317
i386 randconfig-a002-20200317
i386 randconfig-a003-20200317
x86_64   randconfig-a001-20200318
x86_64   randconfig-a002-20200318
x86_64   randconfig-a003-20200318
i386 randconfig-a001-20200318
i386 randconfig-a002-20200318
i386 randconfig-a003-20200318
alpharandconfig-a001-20200317
m68k randconfig-a001-20200317
mips randconfig-a001-20200317
nds32randconfig-a001-20200317
parisc   randconfig-a001-20200317
riscvrandconfig-a001-20200317
alpharandconfig-a001-20200316
m68k randconfig-a001-20200316
mips randconfig-a001-20200316
nds32randconfig-a001-20200316
parisc

[powerpc:next] BUILD SUCCESS 59ed2adf393109c56d383e568f2e57bb5ad6d901

2020-03-17 Thread kbuild test robot

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  
next
branch HEAD: 59ed2adf393109c56d383e568f2e57bb5ad6d901  powerpc/lib: Fix 
emulate_step() std test

elapsed time: 669m

configs tested: 161
configs skipped: 0

The following configs have been built successfully.
More configs may be tested in the coming days.

arm64allyesconfig
arm  allyesconfig
arm64 allnoconfig
arm   allnoconfig
arm at91_dt_defconfig
arm   efm32_defconfig
arm  exynos_defconfig
armmulti_v5_defconfig
armmulti_v7_defconfig
armshmobile_defconfig
arm   sunxi_defconfig
arm64   defconfig
arm64allmodconfig
mips  fuloong2e_defconfig
s390defconfig
openrisc simple_smp_defconfig
riscv allnoconfig
s390 allyesconfig
powerpc   allnoconfig
i386  allnoconfig
i386 alldefconfig
i386 allyesconfig
i386defconfig
ia64 alldefconfig
ia64 allmodconfig
ia64  allnoconfig
ia64 allyesconfig
ia64defconfig
arm  allmodconfig
c6x  allyesconfig
c6xevmc6678_defconfig
nios2 10m50_defconfig
nios2 3c120_defconfig
openriscor1ksim_defconfig
xtensa   common_defconfig
xtensa  iss_defconfig
alpha   defconfig
cskydefconfig
nds32 allnoconfig
nds32   defconfig
h8300 edosk2674_defconfig
h8300h8300h-sim_defconfig
h8300   h8s-sim_defconfig
m68k allmodconfig
m68k   m5475evb_defconfig
m68k  multi_defconfig
m68k   sun3_defconfig
arc  allyesconfig
arc defconfig
microblaze  mmu_defconfig
microblazenommu_defconfig
powerpc defconfig
powerpc   ppc64_defconfig
powerpc  rhel-kconfig
mips   32r2_defconfig
mips 64r6el_defconfig
mips allmodconfig
mips  allnoconfig
mips allyesconfig
mips  malta_kvm_defconfig
pariscallnoconfig
parisc   allyesconfig
pariscgeneric-32bit_defconfig
pariscgeneric-64bit_defconfig
x86_64   randconfig-a001-20200317
x86_64   randconfig-a002-20200317
x86_64   randconfig-a003-20200317
i386 randconfig-a001-20200317
i386 randconfig-a002-20200317
i386 randconfig-a003-20200317
alpharandconfig-a001-20200317
m68k randconfig-a001-20200317
mips randconfig-a001-20200317
nds32randconfig-a001-20200317
parisc   randconfig-a001-20200317
c6x  randconfig-a001-20200317
h8300randconfig-a001-20200317
microblaze   randconfig-a001-20200317
nios2randconfig-a001-20200317
sparc64  randconfig-a001-20200317
csky randconfig-a001-20200318
openrisc randconfig-a001-20200318
s390 randconfig-a001-20200318
sh   randconfig-a001-20200318
xtensa   randconfig-a001-20200318
csky randconfig-a001-20200317
openrisc randconfig-a001-20200317
s390 randconfig-a001-20200317
sh   randconfig-a001-20200317
xtensa   randconfig-a001-20200317
x86_64   randconfig-b001-20200317
x86_64   randconfig-b002-20200317
x86_64   randconfig-b003-20200317
i386 randconfig-b001-20200317
i386 randconfig-b002-20200317
i386 randconfig-b003-20200317
x86_64   randconfig-c001-20200317
x86_64   randconfig-c002-20200317
x86_64   randconfig-c003-20200317
i386 randconfig-c001-20200317
i386 randconfig-c002-20200317
i386

Re: [PATCH] treewide: Rename "unencrypted" to "decrypted"

2020-03-17 Thread kbuild test robot

Hi Borislav,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/x86/mm]
[cannot apply to linux/master powerpc/next s390/features tip/x86/core 
linus/master v5.6-rc6 next-20200317]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Borislav-Petkov/treewide-Rename-unencrypted-to-decrypted/20200318-015738
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
17c4a2ae15a7aaefe84bdb271952678c5c9cd8e1
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=9.2.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   powerpc64-linux-ld: warning: orphan section `.gnu.hash' from `linker stubs' 
being placed in section `.gnu.hash'
   powerpc64-linux-ld: kernel/dma/mapping.o: in function `.dma_pgprot':
>> mapping.c:(.text+0xc5c): undefined reference to `.force_dma_decrypted'
   powerpc64-linux-ld: kernel/dma/mapping.o: in function `.dma_common_mmap':
   mapping.c:(.text+0xcfc): undefined reference to `.force_dma_decrypted'
   powerpc64-linux-ld: kernel/dma/direct.o: in function 
`.dma_direct_get_required_mask':
>> direct.c:(.text+0x920): undefined reference to `.force_dma_decrypted'
   powerpc64-linux-ld: kernel/dma/direct.o: in function 
`.__dma_direct_alloc_pages':
   direct.c:(.text+0x9fc): undefined reference to `.force_dma_decrypted'
>> powerpc64-linux-ld: direct.c:(.text+0xaa8): undefined reference to 
>> `.force_dma_decrypted'
   powerpc64-linux-ld: kernel/dma/direct.o:direct.c:(.text+0xb28): more 
undefined references to `.force_dma_decrypted' follow

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip

Re: [PATCH] treewide: Rename "unencrypted" to "decrypted"

2020-03-17 Thread kbuild test robot

Hi Borislav,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/x86/mm]
[cannot apply to linux/master powerpc/next s390/features tip/x86/core 
linus/master v5.6-rc6 next-20200317]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Borislav-Petkov/treewide-Rename-unencrypted-to-decrypted/20200318-015738
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
17c4a2ae15a7aaefe84bdb271952678c5c9cd8e1
config: s390-zfcpdump_defconfig (attached as .config)
compiler: s390-linux-gcc (GCC) 9.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=9.2.0 make.cross ARCH=s390 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   s390-linux-ld: kernel/dma/mapping.o: in function `dma_pgprot':
   mapping.c:(.text+0x144): undefined reference to `force_dma_decrypted'
   s390-linux-ld: kernel/dma/mapping.o: in function `dma_common_mmap':
   mapping.c:(.text+0x1a4): undefined reference to `force_dma_decrypted'
   s390-linux-ld: kernel/dma/direct.o: in function 
`dma_direct_get_required_mask':
   direct.c:(.text+0xae): undefined reference to `force_dma_decrypted'
   s390-linux-ld: kernel/dma/direct.o: in function `__dma_direct_alloc_pages':
   direct.c:(.text+0x152): undefined reference to `force_dma_decrypted'
>> s390-linux-ld: direct.c:(.text+0x1e8): undefined reference to 
>> `force_dma_decrypted'
   s390-linux-ld: kernel/dma/direct.o:direct.c:(.text+0x2e0): more undefined 
references to `force_dma_decrypted' follow

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip

Re: [PATCH] cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_work_fn

2020-03-17 Thread Daniel Axtens

Hi Pratik,

Thanks.

I have checked:

 - for matching puts/gets
 - that all the '.' to '->' conversions, aud uses of '&' check out
 - that the Snowpatch checks pass (https://patchwork.ozlabs.org/patch/1255580/)

On that basis:

Reviewed-by: Daniel Axtens 

Regards,
Daniel

> The patch avoids allocating cpufreq_policy on stack hence fixing frame
> size overflow in 'powernv_cpufreq_work_fn'
>
> Fixes: 227942809b52 ("cpufreq: powernv: Restore cpu frequency to policy->cur 
> on unthrottling")
> Signed-off-by: Pratik Rajesh Sampat 
> ---
>  drivers/cpufreq/powernv-cpufreq.c | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/cpufreq/powernv-cpufreq.c 
> b/drivers/cpufreq/powernv-cpufreq.c
> index 56f4bc0d209e..20ee0661555a 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -902,6 +902,7 @@ static struct notifier_block powernv_cpufreq_reboot_nb = {
>  void powernv_cpufreq_work_fn(struct work_struct *work)
>  {
>   struct chip *chip = container_of(work, struct chip, throttle);
> + struct cpufreq_policy *policy;
>   unsigned int cpu;
>   cpumask_t mask;
>  
> @@ -916,12 +917,14 @@ void powernv_cpufreq_work_fn(struct work_struct *work)
>   chip->restore = false;
>   for_each_cpu(cpu, &mask) {
>   int index;
> - struct cpufreq_policy policy;
>  
> - cpufreq_get_policy(&policy, cpu);
> - index = cpufreq_table_find_index_c(&policy, policy.cur);
> - powernv_cpufreq_target_index(&policy, index);
> - cpumask_andnot(&mask, &mask, policy.cpus);
> + policy = cpufreq_cpu_get(cpu);
> + if (!policy)
> + continue;
> + index = cpufreq_table_find_index_c(policy, policy->cur);
> + powernv_cpufreq_target_index(policy, index);
> + cpumask_andnot(&mask, &mask, policy->cpus);
> + cpufreq_cpu_put(policy);
>   }
>  out:
>   put_online_cpus();
> -- 
> 2.24.1

Re: [PATCH v2 6/8] mm/memory_hotplug: unexport memhp_auto_online

2020-03-17 Thread Wei Yang

On Tue, Mar 17, 2020 at 11:49:40AM +0100, David Hildenbrand wrote:
>All in-tree users except the mm-core are gone. Let's drop the export.
>
>Cc: Andrew Morton 
>Cc: Michal Hocko 
>Cc: Oscar Salvador 
>Cc: "Rafael J. Wysocki" 
>Cc: Baoquan He 
>Cc: Wei Yang 
>Signed-off-by: David Hildenbrand 

Reviewed-by: Wei Yang 

>---
> mm/memory_hotplug.c | 1 -
> 1 file changed, 1 deletion(-)
>
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index 1a00b5a37ef6..2d2aae830b92 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -71,7 +71,6 @@ bool memhp_auto_online;
> #else
> bool memhp_auto_online = true;
> #endif
>-EXPORT_SYMBOL_GPL(memhp_auto_online);
> 
> static int __init setup_memhp_default_state(char *str)
> {
>-- 
>2.24.1

-- 
Wei Yang
Help you, Help me

Re: [PATCH v2 4/8] powernv/memtrace: always online added memory blocks

2020-03-17 Thread Wei Yang

On Tue, Mar 17, 2020 at 11:49:38AM +0100, David Hildenbrand wrote:
>Let's always try to online the re-added memory blocks. In case add_memory()
>already onlined the added memory blocks, the first device_online() call
>will fail and stop processing the remaining memory blocks.
>
>This avoids manually having to check memhp_auto_online.
>
>Note: PPC always onlines all hotplugged memory directly from the kernel
>as well - something that is handled by user space on other
>architectures.
>
>Cc: Benjamin Herrenschmidt 
>Cc: Paul Mackerras 
>Cc: Michael Ellerman 
>Cc: Andrew Morton 
>Cc: Greg Kroah-Hartman 
>Cc: Michal Hocko 
>Cc: Oscar Salvador 
>Cc: "Rafael J. Wysocki" 
>Cc: Baoquan He 
>Cc: Wei Yang 
>Cc: linuxppc-dev@lists.ozlabs.org
>Signed-off-by: David Hildenbrand 

Looks good.

Reviewed-by: Wei Yang 

>---
> arch/powerpc/platforms/powernv/memtrace.c | 14 --
> 1 file changed, 4 insertions(+), 10 deletions(-)
>
>diff --git a/arch/powerpc/platforms/powernv/memtrace.c 
>b/arch/powerpc/platforms/powernv/memtrace.c
>index d6d64f8718e6..13b369d2cc45 100644
>--- a/arch/powerpc/platforms/powernv/memtrace.c
>+++ b/arch/powerpc/platforms/powernv/memtrace.c
>@@ -231,16 +231,10 @@ static int memtrace_online(void)
>   continue;
>   }
> 
>-  /*
>-   * If kernel isn't compiled with the auto online option
>-   * we need to online the memory ourselves.
>-   */
>-  if (!memhp_auto_online) {
>-  lock_device_hotplug();
>-  walk_memory_blocks(ent->start, ent->size, NULL,
>- online_mem_block);
>-  unlock_device_hotplug();
>-  }
>+  lock_device_hotplug();
>+  walk_memory_blocks(ent->start, ent->size, NULL,
>+ online_mem_block);
>+  unlock_device_hotplug();
> 
>   /*
>* Memory was added successfully so clean up references to it
>-- 
>2.24.1

-- 
Wei Yang
Help you, Help me

Re: [PATCH 6/6] soc: fsl: qe: fix sparse warnings for ucc_slow.c

2020-03-17 Thread Li Yang

On Mon, Mar 16, 2020 at 4:08 PM Rasmus Villemoes
 wrote:
>
> On 12/03/2020 23.28, Li Yang wrote:
> > Fixes the following sparse warnings:
> >
> [snip]
> >
> > Also removed the unneccessary clearing for kzalloc'ed structure.
>
> Please don't mix that in the same patch, do it in a preparatory patch.
> That makes reviewing much easier.

Thanks for the review.

One of the few lines removed are actually causing sparse warning,
that's why I changed this together with the sparse fixes.  I can add
all the lines removed in the log for better record.

>
> >
> >   /* Get PRAM base */
> >   uccs->us_pram_offset =
> > @@ -231,24 +224,24 @@ int ucc_slow_init(struct ucc_slow_info * us_info, 
> > struct ucc_slow_private ** ucc
> >   /* clear bd buffer */
> >   qe_iowrite32be(0, &bd->buf);
> >   /* set bd status and length */
> > - qe_iowrite32be(0, (u32 *)bd);
> > + qe_iowrite32be(0, (u32 __iomem *)bd);
>
> It's cleaner to do two qe_iowrite16be to &bd->status and &bd->length,
> that avoids the casting altogether.

It probably is cleaner, but also slower for the bd manipulation that
can be in the hot path.  The QE code has been using this way to
access/update the bd status and length together for a long time.  And
I remembered that it could help to prevent synchronization issues for
accessing status and length separately.

It is probably cleaner to change the data structure to use union for
accessing status and length together.  But that will need a big change
to update all the current users of the structure which I don't have
the time to do right now.

>
> >   bd++;
> >   }
> >   /* for last BD set Wrap bit */
> >   qe_iowrite32be(0, &bd->buf);
> > - qe_iowrite32be(cpu_to_be32(T_W), (u32 *)bd);
> > + qe_iowrite32be(T_W, (u32 __iomem *)bd);
>
> Yeah, and this is why. Who can actually keep track of where that bit
> ends up being set with that casting going on. Please use
> qe_iowrite16be() with an appropriately modified constant to the
> appropriate field instead of these games.
>
> And if the hardware doesn't support 16 bit writes, the definition of
> struct qe_bd is wrong and should have a single __be32 status_length
> field, with appropriate accessors defined.

Same comment as above.

>
> >   /* Init Rx bds */
> >   bd = uccs->rx_bd = qe_muram_addr(uccs->rx_base_offset);
> >   for (i = 0; i < us_info->rx_bd_ring_len - 1; i++) {
> >   /* set bd status and length */
> > - qe_iowrite32be(0, (u32 *)bd);
> > + qe_iowrite32be(0, (u32 __iomem *)bd);
>
> Same.
>
> >   /* clear bd buffer */
> >   qe_iowrite32be(0, &bd->buf);
> >   bd++;
> >   }
> >   /* for last BD set Wrap bit */
> > - qe_iowrite32be(cpu_to_be32(R_W), (u32 *)bd);
> > + qe_iowrite32be(R_W, (u32 __iomem *)bd);
>
> Same.
>
> >   qe_iowrite32be(0, &bd->buf);
> >
> >   /* Set GUMR (For more details see the hardware spec.). */
> > @@ -273,8 +266,8 @@ int ucc_slow_init(struct ucc_slow_info * us_info, 
> > struct ucc_slow_private ** ucc
> >   qe_iowrite32be(gumr, &us_regs->gumr_h);
> >
> >   /* gumr_l */
> > - gumr = us_info->tdcr | us_info->rdcr | us_info->tenc | us_info->renc |
> > - us_info->diag | us_info->mode;
> > + gumr = (u32)us_info->tdcr | (u32)us_info->rdcr | (u32)us_info->tenc |
> > +(u32)us_info->renc | (u32)us_info->diag | (u32)us_info->mode;
>
> Are the tdcr, rdcr, tenc, renc fields actually set anywhere (the same
> for the diag and mode, but word-grepping for those give way too many
> false positives)? They seem to be a somewhat pointless split out of the
> bitfields of gumr_l, and not populated anywhere?. That's not directly
> related to this patch, of course, but getting rid of them first (if they
> are indeed completely unused) might make the sparse cleanup a little
> simpler.

I would agree with you that is not neccessary to create different enum
types for these bits in the same register in the first place.  But I
would like to prevent aggressive cleanups of this driver for old
hardware that is becoming harder and harder to be thoroughly tested
right now.  These fields are probably not used by the in tree ucc_uart
driver right now.  But as a generic library for QE, I think it is
harmless to keep these simple code there for potentially customized
version of the uart driver or other ucc slow drivers.

Regards,
Leo

Re: [PATCH] treewide: Rename "unencrypted" to "decrypted"

2020-03-17 Thread Thomas Gleixner

Borislav Petkov  writes:

> On Tue, Mar 17, 2020 at 01:35:12PM -0700, Dave Hansen wrote:
>> On 3/17/20 4:18 AM, Borislav Petkov wrote:
>> > Back then when the whole SME machinery started getting mainlined, it
>> > was agreed that for simplicity, clarity and sanity's sake, the terms
>> > denoting encrypted and not-encrypted memory should be "encrypted" and
>> > "decrypted". And the majority of the code sticks to that convention
>> > except those two. So rename them.
>> 
>> Don't "unencrypted" and "decrypted" mean different things?
>> 
>> Unencrypted to me means "encryption was never used for this data".
>> 
>> Decrypted means "this was/is encrypted but here is a plaintext copy".
>
> Maybe but linguistical semantics is not the point here.
>
> The idea is to represent a "binary" concept of memory being encrypted
> or memory being not encrypted. And at the time we decided to use
> "encrypted" and "decrypted" for those two things.
>
> Do you see the need to differentiate a third "state", so to speak, of
> memory which was never encrypted?

I think so.

encrypted data is something you can't use without having the key

decrypted data is the plaintext copy of something encrypted, so
it might be of sensible nature.

unencrypted data can still be sensible, but nothing ever bothered to
encrypt it in the first place.

So having this distinction is useful in terms of setting the context
straight.

Thanks,

tglx

Re: [PATCH] treewide: Rename "unencrypted" to "decrypted"

2020-03-17 Thread Joe Perches

On Tue, 2020-03-17 at 14:24 -0700, Dave Hansen wrote:
> On 3/17/20 2:06 PM, Borislav Petkov wrote:
> > On Tue, Mar 17, 2020 at 01:35:12PM -0700, Dave Hansen wrote:
> > > On 3/17/20 4:18 AM, Borislav Petkov wrote:
> > > > Back then when the whole SME machinery started getting mainlined, it
> > > > was agreed that for simplicity, clarity and sanity's sake, the terms
> > > > denoting encrypted and not-encrypted memory should be "encrypted" and
> > > > "decrypted". And the majority of the code sticks to that convention
> > > > except those two. So rename them.
> > > Don't "unencrypted" and "decrypted" mean different things?
> > > 
> > > Unencrypted to me means "encryption was never used for this data".
> > > 
> > > Decrypted means "this was/is encrypted but here is a plaintext copy".
> > Maybe but linguistical semantics is not the point here.
> > 
> > The idea is to represent a "binary" concept of memory being encrypted
> > or memory being not encrypted. And at the time we decided to use
> > "encrypted" and "decrypted" for those two things.
> 
> Yeah, agreed.  We're basically trying to name "!encrypted".
> 
> > Do you see the need to differentiate a third "state", so to speak, of
> > memory which was never encrypted?
> 
> No, there are just two states.  I just think the "!encrypted" case
> should not be called "decrypted".

Nor do I, it's completely misleading.

Re: [PATCH] treewide: Rename "unencrypted" to "decrypted"

2020-03-17 Thread Borislav Petkov

On Tue, Mar 17, 2020 at 02:24:59PM -0700, Dave Hansen wrote:
> No, there are just two states.  I just think the "!encrypted" case
> should not be called "decrypted".

Yeah, we suck at naming - news at 11! :-)

I believe we even considered things like "encrypted" vs "clear" but
that sucked too. ;-\

In any case, that ship has sailed now and having two as differently as
possible looking words to denote the two "states" should be good enough
for our purposes...

Oh well.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH] treewide: Rename "unencrypted" to "decrypted"

2020-03-17 Thread Dave Hansen

On 3/17/20 2:06 PM, Borislav Petkov wrote:
> On Tue, Mar 17, 2020 at 01:35:12PM -0700, Dave Hansen wrote:
>> On 3/17/20 4:18 AM, Borislav Petkov wrote:
>>> Back then when the whole SME machinery started getting mainlined, it
>>> was agreed that for simplicity, clarity and sanity's sake, the terms
>>> denoting encrypted and not-encrypted memory should be "encrypted" and
>>> "decrypted". And the majority of the code sticks to that convention
>>> except those two. So rename them.
>> Don't "unencrypted" and "decrypted" mean different things?
>>
>> Unencrypted to me means "encryption was never used for this data".
>>
>> Decrypted means "this was/is encrypted but here is a plaintext copy".
> Maybe but linguistical semantics is not the point here.
> 
> The idea is to represent a "binary" concept of memory being encrypted
> or memory being not encrypted. And at the time we decided to use
> "encrypted" and "decrypted" for those two things.

Yeah, agreed.  We're basically trying to name "!encrypted".

> Do you see the need to differentiate a third "state", so to speak, of
> memory which was never encrypted?

No, there are just two states.  I just think the "!encrypted" case
should not be called "decrypted".

Re: [PATCH] treewide: Rename "unencrypted" to "decrypted"

2020-03-17 Thread Borislav Petkov

On Tue, Mar 17, 2020 at 01:35:12PM -0700, Dave Hansen wrote:
> On 3/17/20 4:18 AM, Borislav Petkov wrote:
> > Back then when the whole SME machinery started getting mainlined, it
> > was agreed that for simplicity, clarity and sanity's sake, the terms
> > denoting encrypted and not-encrypted memory should be "encrypted" and
> > "decrypted". And the majority of the code sticks to that convention
> > except those two. So rename them.
> 
> Don't "unencrypted" and "decrypted" mean different things?
> 
> Unencrypted to me means "encryption was never used for this data".
> 
> Decrypted means "this was/is encrypted but here is a plaintext copy".

Maybe but linguistical semantics is not the point here.

The idea is to represent a "binary" concept of memory being encrypted
or memory being not encrypted. And at the time we decided to use
"encrypted" and "decrypted" for those two things.

Do you see the need to differentiate a third "state", so to speak, of
memory which was never encrypted?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH] treewide: Rename "unencrypted" to "decrypted"

2020-03-17 Thread Dave Hansen

On 3/17/20 4:18 AM, Borislav Petkov wrote:
> Back then when the whole SME machinery started getting mainlined, it
> was agreed that for simplicity, clarity and sanity's sake, the terms
> denoting encrypted and not-encrypted memory should be "encrypted" and
> "decrypted". And the majority of the code sticks to that convention
> except those two. So rename them.

Don't "unencrypted" and "decrypted" mean different things?

Unencrypted to me means "encryption was never used for this data".

Decrypted means "this was/is encrypted but here is a plaintext copy".

This, for instance:

> +++ b/kernel/dma/direct.c
> @@ -26,7 +26,7 @@ unsigned int zone_dma_bits __ro_after_init = 24;
>  static inline dma_addr_t phys_to_dma_direct(struct device *dev,
>   phys_addr_t phys)
>  {
> - if (force_dma_unencrypted(dev))
> + if (force_dma_decrypted(dev))
>   return __phys_to_dma(dev, phys);

is referring to DMA that is not and never was encrypted.  It's skipping
the encryption altogether.  There's no act of "decryption" anywhere.

This, on the other hand, seems named wrong to me:

> /*
>  * Macros to add or remove encryption attribute
>  */
> #define pgprot_encrypted(prot)  __pgprot(__sme_set(pgprot_val(prot)))
> #define pgprot_decrypted(prot)  __pgprot(__sme_clr(pgprot_val(prot)))

This seems like it would be better named pgprot_unencrypted().

Re: [PATCH V7 09/14] powerpc/vas: Update CSB and notify process for fault CRBs

2020-03-17 Thread Haren Myneni

On Tue, 2020-03-17 at 16:28 +1100, Michael Ellerman wrote:
> Haren Myneni  writes:
> > For each fault CRB, update fault address in CRB (fault_storage_addr)
> > and translation error status in CSB so that user space can touch the
> > fault address and resend the request. If the user space passed invalid
> > CSB address send signal to process with SIGSEGV.
> >
> > Signed-off-by: Sukadev Bhattiprolu 
> > Signed-off-by: Haren Myneni 
> > ---
> >  arch/powerpc/platforms/powernv/vas-fault.c | 114 
> > +
> >  1 file changed, 114 insertions(+)
> >
> > diff --git a/arch/powerpc/platforms/powernv/vas-fault.c 
> > b/arch/powerpc/platforms/powernv/vas-fault.c
> > index 1c6d5cc..751ce48 100644
> > --- a/arch/powerpc/platforms/powernv/vas-fault.c
> > +++ b/arch/powerpc/platforms/powernv/vas-fault.c
> > @@ -11,6 +11,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  
> > @@ -26,6 +27,118 @@
> >  #define VAS_FAULT_WIN_FIFO_SIZE(4 << 20)
> >  
> >  /*
> > + * Update the CSB to indicate a translation error.
> > + *
> > + * If we are unable to update the CSB means copy_to_user failed due to
> > + * invalid csb_addr, send a signal to the process.
> > + *
> > + * Remaining settings in the CSB are based on wait_for_csb() of
> > + * NX-GZIP.
> > + */
> > +static void update_csb(struct vas_window *window,
> > +   struct coprocessor_request_block *crb)
> > +{
> > +   int rc;
> > +   struct pid *pid;
> > +   void __user *csb_addr;
> > +   struct task_struct *tsk;
> > +   struct kernel_siginfo info;
> > +   struct coprocessor_status_block csb;
> 
> csb is on the stack, and later copied to user, which is a risk for
> creating an infoleak.
> 
> Also please use reverse Christmas tree layout for your variables.
> 
> > +
> > +   /*
> > +* NX user space windows can not be opened for task->mm=NULL
> > +* and faults will not be generated for kernel requests.
> > +*/
> > +   if (!window->mm || !window->user_win)
> > +   return;
> 
> If that's a should-never-happen condition then should it do a
> WARN_ON_ONCE() rather than silently returning?

Will add WARN_ON

> 
> > +   csb_addr = (void __user *)be64_to_cpu(crb->csb_addr);
> > +
> > +   csb.cc = CSB_CC_TRANSLATION;
> > +   csb.ce = CSB_CE_TERMINATION;
> > +   csb.cs = 0;
> > +   csb.count = 0;
> > +
> > +   /*
> > +* NX operates and returns in BE format as defined CRB struct.
> > +* So return fault_storage_addr in BE as NX pastes in FIFO and
> > +* expects user space to convert to CPU format.
> > +*/
> > +   csb.address = crb->stamp.nx.fault_storage_addr;
> > +   csb.flags = 0;
> 
> I'm pretty sure this has initialised all the fields of csb.
> 
> But, I'd still be much happier if you zeroed the whole struct to begin
> with, that way we know for sure we can't leak any uninitialised bytes to
> userspace. It's only 16 bytes so it shouldn't add any noticeable
> overhead.
Sure, will initialize csb
> 
> > +
> > +   pid = window->pid;
> > +   tsk = get_pid_task(pid, PIDTYPE_PID);
> > +   /*
> > +* Send window will be closed after processing all NX requests
> > +* and process exits after closing all windows. In multi-thread
> > +* applications, thread may not exists, but does not close FD
> > +* (means send window) upon exit. Parent thread (tgid) can use
> > +* and close the window later.
> > +* pid and mm references are taken when window is opened by
> > +* process (pid). So tgid is used only when child thread opens
> > +* a window and exits without closing it in multithread tasks.
> > +*/
> > +   if (!tsk) {
> > +   pid = window->tgid;
> > +   tsk = get_pid_task(pid, PIDTYPE_PID);
> > +   /*
> > +* Parent thread will be closing window during its exit.
> > +* So should not get here.
> > +*/
> > +   if (!tsk)
> > +   return;
> 
> Similar question on WARN_ON_ONCE()
Yes, we can add WARN_ON
> 
> > +   }
> > +
> > +   /* Return if the task is exiting. */
> 
> Why? Just because it's no use? It's racy isn't it, so it can't be for
> correctness?
Yes process is exiting and no need to update CSB. We release the
task->usage refcount after copy_to_user().

> 
> > +   if (tsk->flags & PF_EXITING) {
> > +   put_task_struct(tsk);
> > +   return;
> > +   }
> > +
> > +   use_mm(window->mm);
> 
> There's no check that csb_addr is actually pointing into userspace, but
> copy_to_user() does it for you.
> 
> > +   rc = copy_to_user(csb_addr, &csb, sizeof(csb));
> > +   /*
> > +* User space polls on csb.flags (first byte). So add barrier
> > +* then copy first byte with csb flags update.
> > +*/
> > +   smp_mb();
> 
> You only need to order the stores above vs the store below to csb.flags.
> So you should only need an smp_wmb() here.
Sure, will add
if (!rc) {
csb.flags = CSB_V;
smp_mb();
rc = copy_to_user(csb_addr, &csb

Re: [PATCH V7 08/14] powerpc/vas: Take reference to PID and mm for user space windows

2020-03-17 Thread Haren Myneni

On Tue, 2020-03-17 at 15:09 +1100, Michael Ellerman wrote:
> Haren Myneni  writes:
> > Process close windows after its requests are completed. In multi-thread
> > applications, child can open a window but release FD will not be called
> > upon its exit. Parent thread will be closing it later upon its exit.
> 
> What if the parent exits first?

Thanks for the review.

If the parent exists, child thread will close the window when it exits.
So we should not get the case where window is still open. 


> 
> > The parent can also send NX requests with this window and NX can
> > generate page faults. After kernel handles the page fault, send
> > signal to process by using PID if CSB address is invalid. Parent
> > thread will not receive signal since its PID is different from the one
> > saved in vas_window. So use tgid in case if the task for the pid saved
> > in window is not running and send signal to its parent.
> >
> > To prevent reusing the pid until the window closed, take reference to
> > pid and task mm.
> 
> That text is all very dense. Can you please flesh it out and reword it
> to clearly spell out what's going on in much more detail.

Sure, will update the commit description. 
> 
> 
> > diff --git a/arch/powerpc/platforms/powernv/vas-window.c 
> > b/arch/powerpc/platforms/powernv/vas-window.c
> > index a45d81d..7587258 100644
> > --- a/arch/powerpc/platforms/powernv/vas-window.c
> > +++ b/arch/powerpc/platforms/powernv/vas-window.c
> > @@ -1266,8 +1300,17 @@ int vas_win_close(struct vas_window *window)
> > poll_window_castout(window);
> >  
> > /* if send window, drop reference to matching receive window */
> > -   if (window->tx_win)
> > +   if (window->tx_win) {
> > +   if (window->user_win) {
> > +   /* Drop references to pid and mm */
> > +   put_pid(window->pid);
> > +   if (window->mm) {
> > +   mmdrop(window->mm);
> > +   mm_context_remove_copro(window->mm);
> 
> That seems backward. Once you drop the reference the mm can be freed
> can't it?

Yes, will change.

> 
> > +   }
> > +   }
> > put_rx_win(window->rxwin);
> > +   }
> >  
> > vas_window_free(window);
> 
> cheers

Re: [PATCH v2 5/8] hv_balloon: don't check for memhp_auto_online manually

2020-03-17 Thread David Hildenbrand

> @@ -1707,6 +1701,7 @@ static int balloon_probe(struct hv_device *dev,
>  #ifdef CONFIG_MEMORY_HOTPLUG
>   set_online_page_callback(&hv_online_page);
>   register_memory_notifier(&hv_memory_nb);
> + init_completion(&dm_device.ol_waitevent);

I'll move this one line up.

>  #endif
>  
>   hv_set_drvdata(dev, &dm_device);
> 


-- 
Thanks,

David / dhildenb

Re: [PATCH] mm/hugetlb: Fix build failure with HUGETLB_PAGE but not HUGEBTLBFS

2020-03-17 Thread Mike Kravetz

On 3/17/20 1:04 AM, Christophe Leroy wrote:
> When CONFIG_HUGETLB_PAGE is set but not CONFIG_HUGETLBFS, the
> following build failure is encoutered:
> 
> In file included from arch/powerpc/mm/fault.c:33:0:
> ./include/linux/hugetlb.h: In function 'hstate_inode':
> ./include/linux/hugetlb.h:477:9: error: implicit declaration of function 
> 'HUGETLBFS_SB' [-Werror=implicit-function-declaration]
>   return HUGETLBFS_SB(i->i_sb)->hstate;
>  ^
> ./include/linux/hugetlb.h:477:30: error: invalid type argument of '->' (have 
> 'int')
>   return HUGETLBFS_SB(i->i_sb)->hstate;
>   ^
> 
> Gate hstate_inode() with CONFIG_HUGETLBFS instead of CONFIG_HUGETLB_PAGE.
> 
> Reported-by: kbuild test robot 
> Link: https://patchwork.ozlabs.org/patch/1255548/#2386036
> Fixes: a137e1cc6d6e ("hugetlbfs: per mount huge page sizes")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christophe Leroy 

As hugetlb.h evolved over time, I suspect nobody imagined a configuration
with CONFIG_HUGETLB_PAGE and not CONFIG_HUGETLBFS.  This patch does address
the build issues.  So,

Reviewed-by: Mike Kravetz 

However, there are many definitions in that file not behind #ifdef
CONFIG_HUGETLBFS that make no sense unless CONFIG_HUGETLBFS is defined.
Such cleanup is way beyond the scope of this patch/effort.  I will add
it to the list of hugetlb/hugetlbfs things that can be cleaned up.
-- 
Mike Kravetz

Re: [PATCH] mm/hugetlb: Fix build failure with HUGETLB_PAGE but not HUGEBTLBFS

2020-03-17 Thread Mike Kravetz

On 3/17/20 9:47 AM, Christophe Leroy wrote:
> 
> 
> Le 17/03/2020 à 17:40, Mike Kravetz a écrit :
>> On 3/17/20 1:43 AM, Christophe Leroy wrote:
>>>
>>>
>>> Le 17/03/2020 à 09:25, Baoquan He a écrit :
 On 03/17/20 at 08:04am, Christophe Leroy wrote:
> When CONFIG_HUGETLB_PAGE is set but not CONFIG_HUGETLBFS, the
> following build failure is encoutered:

   From the definition of HUGETLB_PAGE, isn't it relying on HUGETLBFS?
 I could misunderstand the def_bool, please correct me if I am wrong.
>>>
>>> AFAIU, it means that HUGETLBFS rely on HUGETLB_PAGE, by default 
>>> HUGETLB_PAGE is not selected when HUGETLBFS is not. But it is still 
>>> possible for an arch to select HUGETLB_PAGE without selecting HUGETLBFS 
>>> when it uses huge pages for other purpose than hugetlb file system.
>>>
>>
>> Hi Christophe,
>>
>> Do you actually have a use case/example of using hugetlb pages without
>> hugetlbfs?  I can understand that there are some use cases which never
>> use the filesystem interface.  However, hugetlb support is so intertwined
>> with hugetlbfs, I am thinking there would be issues trying to use them
>> separately.  I will look into this further.
>>
> 
> Hi Mike,
> 
> Series https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=164620
> 
> And especially patch 39 to 41.
> 

Ah, ok.  You are simply using a few interfaces in the hugetlb header files.
The huge pages created in your mappings are not PageHuge() pages.

-- 
Mike Kravetz

Re: [PATCH] mm/hugetlb: Fix build failure with HUGETLB_PAGE but not HUGEBTLBFS

2020-03-17 Thread Christophe Leroy





Le 17/03/2020 à 17:40, Mike Kravetz a écrit :

On 3/17/20 1:43 AM, Christophe Leroy wrote:



Le 17/03/2020 à 09:25, Baoquan He a écrit :

On 03/17/20 at 08:04am, Christophe Leroy wrote:

When CONFIG_HUGETLB_PAGE is set but not CONFIG_HUGETLBFS, the
following build failure is encoutered:


  From the definition of HUGETLB_PAGE, isn't it relying on HUGETLBFS?
I could misunderstand the def_bool, please correct me if I am wrong.


AFAIU, it means that HUGETLBFS rely on HUGETLB_PAGE, by default HUGETLB_PAGE is 
not selected when HUGETLBFS is not. But it is still possible for an arch to 
select HUGETLB_PAGE without selecting HUGETLBFS when it uses huge pages for 
other purpose than hugetlb file system.



Hi Christophe,

Do you actually have a use case/example of using hugetlb pages without
hugetlbfs?  I can understand that there are some use cases which never
use the filesystem interface.  However, hugetlb support is so intertwined
with hugetlbfs, I am thinking there would be issues trying to use them
separately.  I will look into this further.



Hi Mike,

Series https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=164620

And especially patch 39 to 41.

Thanks
Christophe

Re: Slub: Increased mem consumption on cpu,mem-less node powerpc guest

2020-03-17 Thread Vlastimil Babka

On 3/17/20 5:25 PM, Srikar Dronamraju wrote:
> * Vlastimil Babka  [2020-03-17 16:56:04]:
> 
>> 
>> I wonder why do you get a memory leak while Sachin in the same situation [1]
>> gets a crash? I don't understand anything anymore.
> 
> Sachin was testing on linux-next which has Kirill's patch which modifies
> slub to use kmalloc_node instead of kmalloc. While Bharata is testing on
> upstream, which doesn't have this. 

Yes, that Kirill's patch was about the memcg shrinker map allocation. But the
patch hunk that Bharata posted as a "hack" that fixes the problem, it follows
that there has to be something else that calls kmalloc_node(node) where node is
one that doesn't have present pages.

He mentions alloc_fair_sched_group() which has:

for_each_possible_cpu(i) {
cfs_rq = kzalloc_node(sizeof(struct cfs_rq),
  GFP_KERNEL, cpu_to_node(i));
...
se = kzalloc_node(sizeof(struct sched_entity),
  GFP_KERNEL, cpu_to_node(i));

I assume one of these structs is 1k and other 512 bytes (rounded) and that for
some possible cpu's cpu_to_node(i) will be 0, which has no present pages. And as
Bharata pasted, node_to_mem_node(0) = 0
So this looks like the same scenario, but it doesn't crash? Is the node 0
actually online here, and/or does it have N_NORMAL_MEMORY state?

>> 
>> [1]
>> https://lore.kernel.org/linux-next/3381cd91-ab3d-4773-ba04-e7a072a63...@linux.vnet.ibm.com/
>> 
>

Re: [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab

2020-03-17 Thread Srikar Dronamraju

* Vlastimil Babka  [2020-03-17 14:34:25]:

> 
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1970,14 +1970,8 @@ static void *get_partial(struct kmem_cache *s, gfp_t 
> > flags, int node,
> > struct kmem_cache_cpu *c)
> >  {
> > void *object;
> > -   int searchnode = node;
> >  
> > -   if (node == NUMA_NO_NODE)
> > -   searchnode = numa_mem_id();
> > -   else if (!node_present_pages(node))
> > -   searchnode = node_to_mem_node(node);
> > -
> > -   object = get_partial_node(s, get_node(s, searchnode), c, flags);
> > +   object = get_partial_node(s, get_node(s, node), c, flags);
> > if (object || node != NUMA_NO_NODE)>return object;
> >
> >   return get_any_partial(s, flags, c);
> 
> I.e. here in this if(), now node will never equal NUMA_NO_NODE (thanks to the
> hunk below), thus the get_any_partial() call becomes dead code?

Very true. 

Would it be okay if we remove the node != NUMA_NO_NODE
if (object || node != NUMA_NO_NODE)
return object;

will now become
if (object)
return object;

-- 
Thanks and Regards
Srikar Dronamraju

Re: [PATCH] mm/hugetlb: Fix build failure with HUGETLB_PAGE but not HUGEBTLBFS

2020-03-17 Thread Mike Kravetz

On 3/17/20 1:43 AM, Christophe Leroy wrote:
> 
> 
> Le 17/03/2020 à 09:25, Baoquan He a écrit :
>> On 03/17/20 at 08:04am, Christophe Leroy wrote:
>>> When CONFIG_HUGETLB_PAGE is set but not CONFIG_HUGETLBFS, the
>>> following build failure is encoutered:
>>
>>  From the definition of HUGETLB_PAGE, isn't it relying on HUGETLBFS?
>> I could misunderstand the def_bool, please correct me if I am wrong.
> 
> AFAIU, it means that HUGETLBFS rely on HUGETLB_PAGE, by default HUGETLB_PAGE 
> is not selected when HUGETLBFS is not. But it is still possible for an arch 
> to select HUGETLB_PAGE without selecting HUGETLBFS when it uses huge pages 
> for other purpose than hugetlb file system.
> 

Hi Christophe,

Do you actually have a use case/example of using hugetlb pages without
hugetlbfs?  I can understand that there are some use cases which never
use the filesystem interface.  However, hugetlb support is so intertwined
with hugetlbfs, I am thinking there would be issues trying to use them
separately.  I will look into this further.

-- 
Mike Kravetz

Re: [PATCH v2 5/8] hv_balloon: don't check for memhp_auto_online manually

2020-03-17 Thread David Hildenbrand

On 17.03.20 17:29, Vitaly Kuznetsov wrote:
> David Hildenbrand  writes:
> 
>> We get the MEM_ONLINE notifier call if memory is added right from the
>> kernel via add_memory() or later from user space.
>>
>> Let's get rid of the "ha_waiting" flag - the wait event has an inbuilt
>> mechanism (->done) for that. Initialize the wait event only once and
>> reinitialize before adding memory. Unconditionally call complete() and
>> wait_for_completion_timeout().
>>
>> If there are no waiters, complete() will only increment ->done - which
>> will be reset by reinit_completion(). If complete() has already been
>> called, wait_for_completion_timeout() will not wait.
>>
>> There is still the chance for a small race between concurrent
>> reinit_completion() and complete(). If complete() wins, we would not
>> wait - which is tolerable (and the race exists in current code as
>> well).
> 
> How can we see concurent reinit_completion() and complete()? Obvioulsy,
> we are not onlining new memory in kernel and hv_mem_hot_add() calls are
> serialized, we're waiting up to 5*HZ for the added block to come online
> before proceeding to the next one. Or do you mean we actually hit this
> 5*HZ timeout, proceeded to the next block and immediately after
> reinit_completion() we saw complete() for the previously added block?

Yes exactly - or if an admin manually offlines+re-onlines a random
memory block.

> This is tolerable indeed, we're making forward progress (and this all is
> 'best effort' anyway).

Exactly my thoughts.

[...]

> 
> Reviewed-by: Vitaly Kuznetsov 
> 

Thanks!

-- 
Thanks,

David / dhildenb

Re: [PATCH v2 5/8] hv_balloon: don't check for memhp_auto_online manually

2020-03-17 Thread Vitaly Kuznetsov

David Hildenbrand  writes:

> We get the MEM_ONLINE notifier call if memory is added right from the
> kernel via add_memory() or later from user space.
>
> Let's get rid of the "ha_waiting" flag - the wait event has an inbuilt
> mechanism (->done) for that. Initialize the wait event only once and
> reinitialize before adding memory. Unconditionally call complete() and
> wait_for_completion_timeout().
>
> If there are no waiters, complete() will only increment ->done - which
> will be reset by reinit_completion(). If complete() has already been
> called, wait_for_completion_timeout() will not wait.
>
> There is still the chance for a small race between concurrent
> reinit_completion() and complete(). If complete() wins, we would not
> wait - which is tolerable (and the race exists in current code as
> well).

How can we see concurent reinit_completion() and complete()? Obvioulsy,
we are not onlining new memory in kernel and hv_mem_hot_add() calls are
serialized, we're waiting up to 5*HZ for the added block to come online
before proceeding to the next one. Or do you mean we actually hit this
5*HZ timeout, proceeded to the next block and immediately after
reinit_completion() we saw complete() for the previously added block?
This is tolerable indeed, we're making forward progress (and this all is
'best effort' anyway).

>
> Note: We only wait for "some" memory to get onlined, which seems to be
>   good enough for now.
>
> Cc: "K. Y. Srinivasan" 
> Cc: Haiyang Zhang 
> Cc: Stephen Hemminger 
> Cc: Wei Liu 
> Cc: Andrew Morton 
> Cc: Michal Hocko 
> Cc: Oscar Salvador 
> Cc: "Rafael J. Wysocki" 
> Cc: Baoquan He 
> Cc: Wei Yang 
> Cc: Vitaly Kuznetsov 
> Cc: linux-hyp...@vger.kernel.org
> Signed-off-by: David Hildenbrand 
> ---
>  drivers/hv/hv_balloon.c | 25 ++---
>  1 file changed, 10 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> index a02ce43d778d..af5e09f08130 100644
> --- a/drivers/hv/hv_balloon.c
> +++ b/drivers/hv/hv_balloon.c
> @@ -533,7 +533,6 @@ struct hv_dynmem_device {
>* State to synchronize hot-add.
>*/
>   struct completion  ol_waitevent;
> - bool ha_waiting;
>   /*
>* This thread handles hot-add
>* requests from the host as well as notifying
> @@ -634,10 +633,7 @@ static int hv_memory_notifier(struct notifier_block *nb, 
> unsigned long val,
>   switch (val) {
>   case MEM_ONLINE:
>   case MEM_CANCEL_ONLINE:
> - if (dm_device.ha_waiting) {
> - dm_device.ha_waiting = false;
> - complete(&dm_device.ol_waitevent);
> - }
> + complete(&dm_device.ol_waitevent);
>   break;
>  
>   case MEM_OFFLINE:
> @@ -726,8 +722,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned 
> long size,
>   has->covered_end_pfn +=  processed_pfn;
>   spin_unlock_irqrestore(&dm_device.ha_lock, flags);
>  
> - init_completion(&dm_device.ol_waitevent);
> - dm_device.ha_waiting = !memhp_auto_online;
> + reinit_completion(&dm_device.ol_waitevent);
>  
>   nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
>   ret = add_memory(nid, PFN_PHYS((start_pfn)),
> @@ -753,15 +748,14 @@ static void hv_mem_hot_add(unsigned long start, 
> unsigned long size,
>   }
>  
>   /*
> -  * Wait for the memory block to be onlined when memory onlining
> -  * is done outside of kernel (memhp_auto_online). Since the hot
> -  * add has succeeded, it is ok to proceed even if the pages in
> -  * the hot added region have not been "onlined" within the
> -  * allowed time.
> +  * Wait for memory to get onlined. If the kernel onlined the
> +  * memory when adding it, this will return directly. Otherwise,
> +  * it will wait for user space to online the memory. This helps
> +  * to avoid adding memory faster than it is getting onlined. As
> +  * adding succeeded, it is ok to proceed even if the memory was
> +  * not onlined in time.
>*/
> - if (dm_device.ha_waiting)
> - wait_for_completion_timeout(&dm_device.ol_waitevent,
> - 5*HZ);
> + wait_for_completion_timeout(&dm_device.ol_waitevent, 5 * HZ);
>   post_status(&dm_device);
>   }
>  }
> @@ -1707,6 +1701,7 @@ static int balloon_probe(struct hv_device *dev,
>  #ifdef CONFIG_MEMORY_HOTPLUG
>   set_online_page_callback(&hv_online_page);
>   register_memory_notifier(&hv_memory_nb);
> + init_completion(&dm_device.ol_waitevent);
>  #endif
>  
>   hv_set_drvdata(dev, &dm_device);

Reviewed-by: Vitaly Kuznetsov 

-- 
Vitaly

Re: Slub: Increased mem consumption on cpu,mem-less node powerpc guest

2020-03-17 Thread Srikar Dronamraju

* Vlastimil Babka  [2020-03-17 16:56:04]:

> 
> I wonder why do you get a memory leak while Sachin in the same situation [1]
> gets a crash? I don't understand anything anymore.

Sachin was testing on linux-next which has Kirill's patch which modifies
slub to use kmalloc_node instead of kmalloc. While Bharata is testing on
upstream, which doesn't have this. 

> 
> [1]
> https://lore.kernel.org/linux-next/3381cd91-ab3d-4773-ba04-e7a072a63...@linux.vnet.ibm.com/
> 

-- 
Thanks and Regards
Srikar Dronamraju

Re: [PATCH v1 16/46] powerpc/mm: Allocate static page tables for fixmap

2020-03-17 Thread Luc Van Oostenryck

On Tue, Mar 17, 2020 at 03:38:46PM +0100, Christophe Leroy wrote:
> > diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> > index f62de06e3d07..9934659cb871 100644
> > --- a/arch/powerpc/mm/pgtable_32.c
> > +++ b/arch/powerpc/mm/pgtable_32.c
> > @@ -29,11 +29,27 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include 
> >   extern char etext[], _stext[], _sinittext[], _einittext[];
> > +static u8 early_fixmap_pagetable[FIXMAP_PTE_SIZE] __page_aligned_data;
> 
> Sparse reports this as a variable size array. This is definitely not. Gcc
> properly sees it is an 8k table (2 pages).

Yes, thing is that FIXMAP_PTE_SIZE is not that constant since it uses
__builtin_ffs() (via PTE_SHIFT / PTE_T_LOG2).
Nevertheless, since Sparse v0.6.1 (released in October) accepts these
in constant expressions, like GCC does.

-- Luc

Re: Slub: Increased mem consumption on cpu,mem-less node powerpc guest

2020-03-17 Thread Vlastimil Babka

On 3/17/20 12:53 PM, Bharata B Rao wrote:
> On Tue, Mar 17, 2020 at 02:56:28PM +0530, Bharata B Rao wrote:
>> Case 1: 2 node NUMA, node0 empty
>> 
>> # numactl -H
>> available: 2 nodes (0-1)
>> node 0 cpus:
>> node 0 size: 0 MB
>> node 0 free: 0 MB
>> node 1 cpus: 0 1 2 3 4 5 6 7
>> node 1 size: 16294 MB
>> node 1 free: 15453 MB
>> node distances:
>> node   0   1 
>>   0:  10  40 
>>   1:  40  10 
>> 
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 17dc00e33115..888e4d245444 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1971,10 +1971,8 @@ static void *get_partial(struct kmem_cache *s, gfp_t 
>> flags, int node,
>>  void *object;
>>  int searchnode = node;
>>  
>> -if (node == NUMA_NO_NODE)
>> +if (node == NUMA_NO_NODE || !node_present_pages(node))
>>  searchnode = numa_mem_id();
>> -else if (!node_present_pages(node))
>> -searchnode = node_to_mem_node(node);
> 
> For the above topology, I see this:
> 
> node_to_mem_node(1) = 1
> node_to_mem_node(0) = 0
> node_to_mem_node(NUMA_NO_NODE) = 0
> 
> Looks like the last two cases (returning memory-less node 0) is the
> problem here?

I wonder why do you get a memory leak while Sachin in the same situation [1]
gets a crash? I don't understand anything anymore.

[1]
https://lore.kernel.org/linux-next/3381cd91-ab3d-4773-ba04-e7a072a63...@linux.vnet.ibm.com/

> Regards,
> Bharata.
> 
>

Re: [PATCH] tpm2: Export tpm2_get_cc_attrs_tbl for ibmvtpm driver as module

2020-03-17 Thread Stefan Berger


On 3/17/20 11:22 AM, Sachin Sant wrote:

On 17-Mar-2020, at 6:38 PM, Stefan Berger  wrote:

From: Stefan Berger 

This patch fixes the following problem when the ibmvtpm driver
is built as a module:

ERROR: modpost: "tpm2_get_cc_attrs_tbl" [drivers/char/tpm/tpm_ibmvtpm.ko] 
undefined!
make[1]: *** [scripts/Makefile.modpost:94: __modpost] Error 1
make: *** [Makefile:1298: modules] Error 2

Signed-off-by: Stefan Berger 

Reported-by: Sachin Sant 
Tested-by: Sachin Sant 


Thank you!

   Stefan

Re: [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab

2020-03-17 Thread Vlastimil Babka

On 3/17/20 3:51 PM, Srikar Dronamraju wrote:
> * Vlastimil Babka  [2020-03-17 14:53:26]:
> 
>> >> > 
>> >> > Mitigate this by allocating the new slab from the node_numa_mem.
>> >> 
>> >> Are you sure this is really needed and the other 3 patches are not enough 
>> >> for
>> >> the current SLUB code to work as needed? It seems you are changing the 
>> >> semantics
>> >> here...
>> >> 
>> > 
>> > The other 3 patches are not enough because we don't carry the searchnode
>> > when the actual alloc_pages_node gets called. 
>> > 
>> > With only the 3 patches, we see the above Panic, its signature is slightly
>> > different from what Sachin first reported and which I have carried in 1st
>> > patch.
>> 
>> Ah, I see. So that's the missing pgdat after your series [1] right?
> 
> Yes the pgdat would be missing after my cpuless, memoryless node patchset.
> However..
>> 
>> That sounds like an argument for Michal's suggestions that pgdats exist and 
>> have
>> correctly populated zonelists for all possible nodes.
> 
> Only the first patch in this series would be affected by pgdat existing or
> not.  Even if the pgdat existed, the NODE_DATA[nid]->node_present_pages
> would be 0. Right? So it would look at node_to_mem_node(). And since node 0 is
> cpuless it would return 0.

I thought the point was to return 1 for node 0.

> If we pass this node 0 (which is memoryless/cpuless) to
> alloc_pages_node. Please note I am only setting node_numa_mem only
> for offline nodes. However we could change this to set for all offline and
> memoryless nodes.

That would indeed make sense.

But I guess that alloc_pages would still crash as the result of
numa_to_mem_node() is not passed down to alloc_pages() without this patch. In
__alloc_pages_node() we currently have "The node must be valid and online" so
offline nodes don't have zonelists. Either they get them, or we indeed need
something like this patch. But in order to not make get_any_partial() dead code,
the final replacement of invalid node with a valid one should be done in
alloc_slab_page() I guess?

>> node_to_mem_node() could be just a shortcut for the first zone's node in the
>> zonelist, so that fallback follows the topology.
>> 
>> [1]
>> https://lore.kernel.org/linuxppc-dev/2020030237.5731-1-sri...@linux.vnet.ibm.com/t/#m76e5b4c4084380b1d4b193d5aa0359b987f2290e
>> 
> 
> 
>

Re: [PATCH] tpm2: Export tpm2_get_cc_attrs_tbl for ibmvtpm driver as module

2020-03-17 Thread Sachin Sant



> On 17-Mar-2020, at 6:38 PM, Stefan Berger  wrote:
> 
> From: Stefan Berger 
> 
> This patch fixes the following problem when the ibmvtpm driver
> is built as a module:
> 
> ERROR: modpost: "tpm2_get_cc_attrs_tbl" [drivers/char/tpm/tpm_ibmvtpm.ko] 
> undefined!
> make[1]: *** [scripts/Makefile.modpost:94: __modpost] Error 1
> make: *** [Makefile:1298: modules] Error 2
> 
> Signed-off-by: Stefan Berger 

Reported-by: Sachin Sant 
Tested-by: Sachin Sant 

Thanks
-Sachin

Re: [PATCH v5 09/11] perf/tools: Enhance JSON/metric infrastructure to handle "?"

2020-03-17 Thread Jiri Olsa

On Tue, Mar 17, 2020 at 11:53:31AM +0530, Kajol Jain wrote:

SBIP

> +static int metricgroup__add_metric_runtime_param(struct strbuf *events,
> + struct list_head *group_list, struct pmu_event *pe)
> +{
> + int i, count;
> + int ret = -EINVAL;
> +
> + count = arch_get_runtimeparam();
> +
> + /* This loop is added to create multiple
> +  * events depend on count value and add
> +  * those events to group_list.
> +  */
> +
> + for (i = 0; i < count; i++) {
> + const char **ids;
> + int idnum;
> + struct egroup *eg;
> + char value[PATH_MAX];
> +
> + if (expr__find_other(pe->metric_expr, NULL, &ids, &idnum, i) < 
> 0)
> + return ret;
> +
> + if (events->len > 0)
> + strbuf_addf(events, ",");
> +
> + if (metricgroup__has_constraint(pe))
> + metricgroup__add_metric_non_group(events, ids, idnum);
> + else
> + metricgroup__add_metric_weak_group(events, ids, idnum);
> +
> + eg = malloc(sizeof(*eg));
> + if (!eg) {
> + ret = -ENOMEM;
> + return ret;
> + }
> +
> + sprintf(value, "%s%c%d", pe->metric_name, '_', i);
> + eg->ids = ids;
> + eg->idnum = idnum;
> + eg->metric_name = strdup(value);
> + if (!eg->metric_name) {
> + ret = -ENOMEM;
> + return ret;
> + }
> +
> + eg->metric_expr = pe->metric_expr;
> + eg->metric_unit = pe->unit;
> + list_add_tail(&eg->nd, group_list);
> + ret = 0;
> +
> + if (ret != 0)
> + break;

again, this is part of metricgroup__add_metric_param no? why not use it?

jirka

Re: [PATCH v5 08/11] perf/tools: Refactoring metricgroup__add_metric function

2020-03-17 Thread Jiri Olsa

On Tue, Mar 17, 2020 at 11:53:30AM +0530, Kajol Jain wrote:
> This patch refactor metricgroup__add_metric function where
> some part of it move to function metricgroup__add_metric_param.
> No logic change.
> 
> Signed-off-by: Kajol Jain 
> ---
>  tools/perf/util/metricgroup.c | 63 +--
>  1 file changed, 38 insertions(+), 25 deletions(-)
> 
> diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> index c3a8c701609a..b4919bcfbd8b 100644
> --- a/tools/perf/util/metricgroup.c
> +++ b/tools/perf/util/metricgroup.c
> @@ -474,6 +474,41 @@ static bool metricgroup__has_constraint(struct pmu_event 
> *pe)
>   return false;
>  }
>  
> +static int metricgroup__add_metric_param(struct strbuf *events,
> + struct list_head *group_list, struct pmu_event *pe)
> +{
> +
> + const char **ids;
> + int idnum;
> + struct egroup *eg;
> + int ret = -EINVAL;
> +
> + if (expr__find_other(pe->metric_expr, NULL, &ids, &idnum, 1) < 0)
> + return ret;
> +
> + if (events->len > 0)
> + strbuf_addf(events, ",");
> +
> + if (metricgroup__has_constraint(pe))
> + metricgroup__add_metric_non_group(events, ids, idnum);
> + else
> + metricgroup__add_metric_weak_group(events, ids, idnum);
> +
> + eg = malloc(sizeof(*eg));
> + if (!eg)
> + ret = -ENOMEM;
> +
> + eg->ids = ids;
> + eg->idnum = idnum;
> + eg->metric_name = pe->metric_name;
> + eg->metric_expr = pe->metric_expr;
> + eg->metric_unit = pe->unit;
> + list_add_tail(&eg->nd, group_list);
> + ret = 0;
> +
> + return ret;

return 0;

jirka

Re: [PATCH v5 08/11] perf/tools: Refactoring metricgroup__add_metric function

2020-03-17 Thread Jiri Olsa

On Tue, Mar 17, 2020 at 11:53:30AM +0530, Kajol Jain wrote:
> This patch refactor metricgroup__add_metric function where
> some part of it move to function metricgroup__add_metric_param.
> No logic change.
> 
> Signed-off-by: Kajol Jain 
> ---
>  tools/perf/util/metricgroup.c | 63 +--
>  1 file changed, 38 insertions(+), 25 deletions(-)
> 
> diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> index c3a8c701609a..b4919bcfbd8b 100644
> --- a/tools/perf/util/metricgroup.c
> +++ b/tools/perf/util/metricgroup.c
> @@ -474,6 +474,41 @@ static bool metricgroup__has_constraint(struct pmu_event 
> *pe)
>   return false;
>  }
>  
> +static int metricgroup__add_metric_param(struct strbuf *events,
> + struct list_head *group_list, struct pmu_event *pe)
> +{
> +
> + const char **ids;
> + int idnum;
> + struct egroup *eg;
> + int ret = -EINVAL;
> +
> + if (expr__find_other(pe->metric_expr, NULL, &ids, &idnum, 1) < 0)
> + return ret;
> +
> + if (events->len > 0)
> + strbuf_addf(events, ",");
> +
> + if (metricgroup__has_constraint(pe))
> + metricgroup__add_metric_non_group(events, ids, idnum);
> + else
> + metricgroup__add_metric_weak_group(events, ids, idnum);
> +
> + eg = malloc(sizeof(*eg));
> + if (!eg)
> + ret = -ENOMEM;

??? you need to return in here, eg is NULL

> +
> + eg->ids = ids;
> + eg->idnum = idnum;
> + eg->metric_name = pe->metric_name;
> + eg->metric_expr = pe->metric_expr;
> + eg->metric_unit = pe->unit;
> + list_add_tail(&eg->nd, group_list);
> + ret = 0;
> +
> + return ret;
> +}
> +
>  static int metricgroup__add_metric(const char *metric, struct strbuf *events,
>  struct list_head *group_list)
>  {
> @@ -493,35 +528,13 @@ static int metricgroup__add_metric(const char *metric, 
> struct strbuf *events,
>   continue;
>   if (match_metric(pe->metric_group, metric) ||
>   match_metric(pe->metric_name, metric)) {
> - const char **ids;
> - int idnum;
> - struct egroup *eg;
>  
>   pr_debug("metric expr %s for %s\n", pe->metric_expr, 
> pe->metric_name);
>  
> - if (expr__find_other(pe->metric_expr,
> -  NULL, &ids, &idnum) < 0)
> + ret = metricgroup__add_metric_param(events,
> + group_list, pe);
> + if (ret == -EINVAL)
>   continue;

previous code did 'continue' on ret < 0, why just -EINVAL now?

jirka

Re: [PATCH v5 08/11] perf/tools: Refactoring metricgroup__add_metric function

2020-03-17 Thread Jiri Olsa

On Tue, Mar 17, 2020 at 11:53:30AM +0530, Kajol Jain wrote:
> This patch refactor metricgroup__add_metric function where
> some part of it move to function metricgroup__add_metric_param.
> No logic change.

can't compile this change:

[jolsa@krava perf]$ make JOBS=1
  BUILD:   Doing 'make -j1' parallel build
  CC   util/metricgroup.o
util/metricgroup.c: In function ‘metricgroup__add_metric_param’:
util/metricgroup.c:486:6: error: too many arguments to function 
‘expr__find_other’
  486 |  if (expr__find_other(pe->metric_expr, NULL, &ids, &idnum, 1) < 0)
  |  ^~~~
In file included from util/metricgroup.c:14:
util/expr.h:25:5: note: declared here
   25 | int expr__find_other(const char *expr, const char *one, const char 
***other,
  | ^~~~
mv: cannot stat 'util/.metricgroup.o.tmp': No such file or directory
make[4]: *** [/home/jolsa/kernel/linux-perf/tools/build/Makefile.build:97: 
util/metricgroup.o] Error 1
make[3]: *** [/home/jolsa/kernel/linux-perf/tools/build/Makefile.build:139: 
util] Error 2
make[2]: *** [Makefile.perf:616: perf-in.o] Error 2
make[1]: *** [Makefile.perf:225: sub-make] Error 2
make: *** [Makefile:70: all] Error 2

jirka

> 
> Signed-off-by: Kajol Jain 
> ---
>  tools/perf/util/metricgroup.c | 63 +--
>  1 file changed, 38 insertions(+), 25 deletions(-)
> 
> diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> index c3a8c701609a..b4919bcfbd8b 100644
> --- a/tools/perf/util/metricgroup.c
> +++ b/tools/perf/util/metricgroup.c
> @@ -474,6 +474,41 @@ static bool metricgroup__has_constraint(struct pmu_event 
> *pe)
>   return false;
>  }
>  
> +static int metricgroup__add_metric_param(struct strbuf *events,
> + struct list_head *group_list, struct pmu_event *pe)
> +{
> +
> + const char **ids;
> + int idnum;
> + struct egroup *eg;
> + int ret = -EINVAL;
> +
> + if (expr__find_other(pe->metric_expr, NULL, &ids, &idnum, 1) < 0)
> + return ret;
> +
> + if (events->len > 0)
> + strbuf_addf(events, ",");
> +
> + if (metricgroup__has_constraint(pe))
> + metricgroup__add_metric_non_group(events, ids, idnum);
> + else
> + metricgroup__add_metric_weak_group(events, ids, idnum);
> +
> + eg = malloc(sizeof(*eg));
> + if (!eg)
> + ret = -ENOMEM;
> +
> + eg->ids = ids;
> + eg->idnum = idnum;
> + eg->metric_name = pe->metric_name;
> + eg->metric_expr = pe->metric_expr;
> + eg->metric_unit = pe->unit;
> + list_add_tail(&eg->nd, group_list);
> + ret = 0;
> +
> + return ret;
> +}
> +
>  static int metricgroup__add_metric(const char *metric, struct strbuf *events,
>  struct list_head *group_list)
>  {
> @@ -493,35 +528,13 @@ static int metricgroup__add_metric(const char *metric, 
> struct strbuf *events,
>   continue;
>   if (match_metric(pe->metric_group, metric) ||
>   match_metric(pe->metric_name, metric)) {
> - const char **ids;
> - int idnum;
> - struct egroup *eg;
>  
>   pr_debug("metric expr %s for %s\n", pe->metric_expr, 
> pe->metric_name);
>  
> - if (expr__find_other(pe->metric_expr,
> -  NULL, &ids, &idnum) < 0)
> + ret = metricgroup__add_metric_param(events,
> + group_list, pe);
> + if (ret == -EINVAL)
>   continue;
> - if (events->len > 0)
> - strbuf_addf(events, ",");
> -
> - if (metricgroup__has_constraint(pe))
> - metricgroup__add_metric_non_group(events, ids, 
> idnum);
> - else
> - metricgroup__add_metric_weak_group(events, ids, 
> idnum);
> -
> - eg = malloc(sizeof(struct egroup));
> - if (!eg) {
> - ret = -ENOMEM;
> - break;
> - }
> - eg->ids = ids;
> - eg->idnum = idnum;
> - eg->metric_name = pe->metric_name;
> - eg->metric_expr = pe->metric_expr;
> - eg->metric_unit = pe->unit;
> - list_add_tail(&eg->nd, group_list);
> - ret = 0;
>   }
>   }
>   return ret;
> -- 
> 2.18.1
>

Re: [PATCH v5 09/11] perf/tools: Enhance JSON/metric infrastructure to handle "?"

2020-03-17 Thread Jiri Olsa

On Tue, Mar 17, 2020 at 11:53:31AM +0530, Kajol Jain wrote:

SNIP

> diff --git a/tools/perf/arch/powerpc/util/header.c 
> b/tools/perf/arch/powerpc/util/header.c
> index 3b4cdfc5efd6..dcc3c6ab2e67 100644
> --- a/tools/perf/arch/powerpc/util/header.c
> +++ b/tools/perf/arch/powerpc/util/header.c
> @@ -7,6 +7,8 @@
>  #include 
>  #include 
>  #include "header.h"
> +#include "metricgroup.h"
> +#include 
>  
>  #define mfspr(rn)   ({unsigned long rval; \
>asm volatile("mfspr %0," __stringify(rn) \
> @@ -16,6 +18,8 @@
>  #define PVR_VER(pvr)(((pvr) >>  16) & 0x) /* Version field */
>  #define PVR_REV(pvr)(((pvr) >>   0) & 0x) /* Revison field */
>  
> +#define SOCKETS_INFO_FILE_PATH "/devices/hv_24x7/interface/sockets"
> +
>  int
>  get_cpuid(char *buffer, size_t sz)
>  {
> @@ -44,3 +48,9 @@ get_cpuid_str(struct perf_pmu *pmu __maybe_unused)
>  
>   return bufp;
>  }
> +
> +int arch_get_runtimeparam(void)
> +{
> + int count;
> + return sysfs__read_int(SOCKETS_INFO_FILE_PATH, &count) < 0 ? 1 : count;

is that SOCKETS_INFO_FILE_PATH define used later? if not please
put the path directly as an argument to sysfs__read_int

jirka

Re: [PATCH v5 09/11] perf/tools: Enhance JSON/metric infrastructure to handle "?"

2020-03-17 Thread Jiri Olsa

On Tue, Mar 17, 2020 at 11:53:31AM +0530, Kajol Jain wrote:

SNIP

> diff --git a/tools/perf/util/expr.h b/tools/perf/util/expr.h
> index 0938ad166ece..7786829b048b 100644
> --- a/tools/perf/util/expr.h
> +++ b/tools/perf/util/expr.h
> @@ -17,12 +17,13 @@ struct expr_parse_ctx {
>  
>  struct expr_scanner_ctx {
>   int start_token;
> + int expr__runtimeparam;

no need for expr__ prefix in here.. jsut runtime_param

> diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
> index 402af3e8d287..85ac6d913782 100644
> --- a/tools/perf/util/stat-shadow.c
> +++ b/tools/perf/util/stat-shadow.c
> @@ -336,7 +336,7 @@ void perf_stat__collect_metric_expr(struct evlist 
> *evsel_list)
>   metric_events = counter->metric_events;
>   if (!metric_events) {
>   if (expr__find_other(counter->metric_expr, 
> counter->name,
> - &metric_names, 
> &num_metric_names) < 0)
> + &metric_names, 
> &num_metric_names, 1) < 0)
>   continue;
>  
>   metric_events = calloc(sizeof(struct evsel *),
> @@ -777,7 +777,15 @@ static void generic_metric(struct perf_stat_config 
> *config,
>   }
>  
>   if (!metric_events[i]) {
> - if (expr__parse(&ratio, &pctx, metric_expr) == 0) {
> + int param = 1;
> + if (strstr(metric_expr, "?")) {
> + char *tmp = strrchr(metric_name, '_');
> +
> + tmp++;
> + param = strtol(tmp, &tmp, 10);
> + }

so, if metric_expr contains '?' you go and search metric_name for '_'
and use the number after '_' ... ugh.. what's the logic? 

I understand you create as many metrics as the magic runtime param
tells you.. but what's the connection to this?

could you please outline in the changelog or comment the whole scheme
of how this all should work? like step by step on some simple example?

thanks,
jirka

Re: [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab

2020-03-17 Thread Srikar Dronamraju

* Vlastimil Babka  [2020-03-17 14:53:26]:

> >> > 
> >> > Mitigate this by allocating the new slab from the node_numa_mem.
> >> 
> >> Are you sure this is really needed and the other 3 patches are not enough 
> >> for
> >> the current SLUB code to work as needed? It seems you are changing the 
> >> semantics
> >> here...
> >> 
> > 
> > The other 3 patches are not enough because we don't carry the searchnode
> > when the actual alloc_pages_node gets called. 
> > 
> > With only the 3 patches, we see the above Panic, its signature is slightly
> > different from what Sachin first reported and which I have carried in 1st
> > patch.
> 
> Ah, I see. So that's the missing pgdat after your series [1] right?

Yes the pgdat would be missing after my cpuless, memoryless node patchset.
However..
> 
> That sounds like an argument for Michal's suggestions that pgdats exist and 
> have
> correctly populated zonelists for all possible nodes.

Only the first patch in this series would be affected by pgdat existing or
not.  Even if the pgdat existed, the NODE_DATA[nid]->node_present_pages
would be 0. Right? So it would look at node_to_mem_node(). And since node 0 is
cpuless it would return 0. If we pass this node 0 (which is memoryless/cpuless) 
to
alloc_pages_node. Please note I am only setting node_numa_mem only
for offline nodes. However we could change this to set for all offline and
memoryless nodes.

> node_to_mem_node() could be just a shortcut for the first zone's node in the
> zonelist, so that fallback follows the topology.
> 
> [1]
> https://lore.kernel.org/linuxppc-dev/2020030237.5731-1-sri...@linux.vnet.ibm.com/t/#m76e5b4c4084380b1d4b193d5aa0359b987f2290e
> 



-- 
Thanks and Regards
Srikar Dronamraju

Re: [PATCH v1 39/46] powerpc/8xx: Add a function to early map kernel via huge pages

2020-03-17 Thread Christophe Leroy





Le 17/03/2020 à 02:39, kbuild test robot a écrit :

Hi Christophe,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on next-20200316]
[cannot apply to powerpc/next v5.6-rc6 v5.6-rc5 v5.6-rc4 v5.6-rc6]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Christophe-Leroy/Use-hugepages-to-map-kernel-mem-on-8xx/20200317-065610
base:8548fd2f20ed19b0e8c0585b71fdfde1ae00ae3c
config: powerpc-tqm8xx_defconfig (attached as .config)
compiler: powerpc-linux-gcc (GCC) 9.2.0
reproduce:
 wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
 chmod +x ~/bin/make.cross
 # save the attached .config to linux build tree
 GCC_VERSION=9.2.0 make.cross ARCH=powerpc

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

In file included from arch/powerpc/mm/fault.c:33:
include/linux/hugetlb.h: In function 'hstate_inode':

include/linux/hugetlb.h:522:9: error: implicit declaration of function 
'HUGETLBFS_SB'; did you mean 'HUGETLBFS_MAGIC'? 
[-Werror=implicit-function-declaration]

  522 |  return HUGETLBFS_SB(i->i_sb)->hstate;
  | ^~~~
  | HUGETLBFS_MAGIC

include/linux/hugetlb.h:522:30: error: invalid type argument of '->' (have 
'int')

  522 |  return HUGETLBFS_SB(i->i_sb)->hstate;
  |  ^~
cc1: all warnings being treated as errors


hstate_inode() shouldn't use HUGETLBFS_SB() which CONFIG_HUGETLBFS is 
not set.


Proposed fix at https://patchwork.ozlabs.org/patch/1256108/

[...]



include/linux/hugetlb.h:522:30: error: invalid type argument of '->' (have 
'int')

  522 |  return HUGETLBFS_SB(i->i_sb)->hstate;
  |  ^~
At top level:
arch/powerpc//mm/nohash/8xx.c:73:18: error: '__early_map_kernel_hugepage' 
defined but not used [-Werror=unused-function]
   73 | static int __ref __early_map_kernel_hugepage(unsigned long va, 
phys_addr_t pa,
  |  ^~~
cc1: all warnings being treated as errors


This patch is a preparation patch. The function is not used yet, that's 
normal. Ok, it breaks bisectability. Should it be squashed with the 
first user of the function ?


Christophe

Re: [PATCH v1 16/46] powerpc/mm: Allocate static page tables for fixmap

2020-03-17 Thread Christophe Leroy





Le 16/03/2020 à 13:36, Christophe Leroy a écrit :

Allocate static page tables for the fixmap area. This allows
setting mappings through page tables before memblock is ready.
That's needed to use early_ioremap() early and to use standard
page mappings with fixmap.

Signed-off-by: Christophe Leroy 
---
  arch/powerpc/include/asm/fixmap.h |  4 
  arch/powerpc/kernel/setup_32.c|  2 +-
  arch/powerpc/mm/pgtable_32.c  | 16 
  3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/fixmap.h 
b/arch/powerpc/include/asm/fixmap.h
index 2ef155a3c821..ccbe2e83c950 100644
--- a/arch/powerpc/include/asm/fixmap.h
+++ b/arch/powerpc/include/asm/fixmap.h
@@ -86,6 +86,10 @@ enum fixed_addresses {
  #define __FIXADDR_SIZE(__end_of_fixed_addresses << PAGE_SHIFT)
  #define FIXADDR_START (FIXADDR_TOP - __FIXADDR_SIZE)
  
+#define FIXMAP_ALIGNED_SIZE	(ALIGN(FIXADDR_TOP, PGDIR_SIZE) - \

+ALIGN_DOWN(FIXADDR_START, PGDIR_SIZE))
+#define FIXMAP_PTE_SIZE(FIXMAP_ALIGNED_SIZE / PGDIR_SIZE * 
PTE_TABLE_SIZE)
+
  #define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_NCG
  #define FIXMAP_PAGE_IOPAGE_KERNEL_NCG
  
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c

index 5b49b26eb154..3f1e1c0b328a 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -81,7 +81,7 @@ notrace void __init machine_init(u64 dt_ptr)
/* Configure static keys first, now that we're relocated. */
setup_feature_keys();
  
-	early_ioremap_setup();

+   early_ioremap_init();
  
  	/* Enable early debugging if any specified (see udbg.h) */

udbg_early_init();
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index f62de06e3d07..9934659cb871 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -29,11 +29,27 @@
  #include 
  #include 
  #include 
+#include 
  
  #include 
  
  extern char etext[], _stext[], _sinittext[], _einittext[];
  
+static u8 early_fixmap_pagetable[FIXMAP_PTE_SIZE] __page_aligned_data;


Sparse reports this as a variable size array. This is definitely not. 
Gcc properly sees it is an 8k table (2 pages).


Christophe



+
+notrace void __init early_ioremap_init(void)
+{
+   unsigned long addr = ALIGN_DOWN(FIXADDR_START, PGDIR_SIZE);
+   pte_t *ptep = (pte_t *)early_fixmap_pagetable;
+   pmd_t *pmdp = pmd_ptr_k(addr);
+
+   for (; (s32)(FIXADDR_TOP - addr) > 0;
+addr += PGDIR_SIZE, ptep += PTRS_PER_PTE, pmdp++)
+   pmd_populate_kernel(&init_mm, pmdp, ptep);
+
+   early_ioremap_setup();
+}
+
  static void __init *early_alloc_pgtable(unsigned long size)
  {
void *ptr = memblock_alloc(size, size);



Christophe

Re: [PATCH 08/12] docs: fix broken references to text files

2020-03-17 Thread Mathieu Poirier

On Tue, Mar 17, 2020 at 02:10:47PM +0100, Mauro Carvalho Chehab wrote:
> Several references got broken due to txt to ReST conversion.
> 
> Several of them can be automatically fixed with:
> 
>   scripts/documentation-file-ref-check --fix
> 
> Signed-off-by: Mauro Carvalho Chehab 
> ---
>  Documentation/admin-guide/kernel-parameters.txt  |  2 +-
>  Documentation/memory-barriers.txt|  2 +-
>  Documentation/process/submit-checklist.rst   |  2 +-
>  .../translations/it_IT/process/submit-checklist.rst  |  2 +-
>  Documentation/translations/ko_KR/memory-barriers.txt |  2 +-
>  .../translations/zh_CN/filesystems/sysfs.txt |  2 +-
>  .../translations/zh_CN/process/submit-checklist.rst  |  2 +-
>  Documentation/virt/kvm/arm/pvtime.rst|  2 +-
>  Documentation/virt/kvm/devices/vcpu.rst  |  2 +-
>  Documentation/virt/kvm/hypercalls.rst|  4 ++--
>  arch/powerpc/include/uapi/asm/kvm_para.h |  2 +-
>  drivers/gpu/drm/Kconfig  |  2 +-
>  drivers/gpu/drm/drm_ioctl.c  |  2 +-
>  drivers/hwtracing/coresight/Kconfig  |  2 +-
>  fs/fat/Kconfig   |  8 
>  fs/fuse/Kconfig  |  2 +-
>  fs/fuse/dev.c|  2 +-
>  fs/overlayfs/Kconfig |  6 +++---
>  include/linux/mm.h   |  4 ++--
>  include/uapi/linux/ethtool_netlink.h |  2 +-
>  include/uapi/rdma/rdma_user_ioctl_cmds.h |  2 +-
>  mm/gup.c | 12 ++--
>  virt/kvm/arm/vgic/vgic-mmio-v3.c |  2 +-
>  virt/kvm/arm/vgic/vgic.h |  4 ++--
>  24 files changed, 37 insertions(+), 37 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index df34a4176e58..28be91d4e66b 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -960,7 +960,7 @@
>   edid/1680x1050.bin, or edid/1920x1080.bin is given
>   and no file with the same name exists. Details and
>   instructions how to build your own EDID data are
> - available in Documentation/driver-api/edid.rst. An EDID
> + available in Documentation/admin-guide/edid.rst. An EDID
>   data set will only be used for a particular connector,
>   if its name and a colon are prepended to the EDID
>   name. Each connector may use a unique EDID data
> diff --git a/Documentation/memory-barriers.txt 
> b/Documentation/memory-barriers.txt
> index e1c355e84edd..eaabc3134294 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -620,7 +620,7 @@ because the CPUs that the Linux kernel supports don't do 
> writes
>  until they are certain (1) that the write will actually happen, (2)
>  of the location of the write, and (3) of the value to be written.
>  But please carefully read the "CONTROL DEPENDENCIES" section and the
> -Documentation/RCU/rcu_dereference.txt file:  The compiler can and does
> +Documentation/RCU/rcu_dereference.rst file:  The compiler can and does
>  break dependencies in a great many highly creative ways.
>  
>   CPU 1 CPU 2
> diff --git a/Documentation/process/submit-checklist.rst 
> b/Documentation/process/submit-checklist.rst
> index 8e56337d422d..3f8e9d5d95c2 100644
> --- a/Documentation/process/submit-checklist.rst
> +++ b/Documentation/process/submit-checklist.rst
> @@ -107,7 +107,7 @@ and elsewhere regarding submitting Linux kernel patches.
>  and why.
>  
>  26) If any ioctl's are added by the patch, then also update
> -``Documentation/ioctl/ioctl-number.rst``.
> +``Documentation/userspace-api/ioctl/ioctl-number.rst``.
>  
>  27) If your modified source code depends on or uses any of the kernel
>  APIs or features that are related to the following ``Kconfig`` symbols,
> diff --git a/Documentation/translations/it_IT/process/submit-checklist.rst 
> b/Documentation/translations/it_IT/process/submit-checklist.rst
> index 995ee69fab11..3e575502690f 100644
> --- a/Documentation/translations/it_IT/process/submit-checklist.rst
> +++ b/Documentation/translations/it_IT/process/submit-checklist.rst
> @@ -117,7 +117,7 @@ sottomissione delle patch, in particolare
>  sorgenti che ne spieghi la logica: cosa fanno e perché.
>  
>  25) Se la patch aggiunge nuove chiamate ioctl, allora aggiornate
> -``Documentation/ioctl/ioctl-number.rst``.
> +``Documentation/userspace-api/ioctl/ioctl-number.rst``.
>  
>  26) Se il codice che avete modificato dipende o usa una qualsiasi 
> interfaccia o
>  funzionali

Re: [PATCH v1 46/46] powerpc/32s: Implement dedicated kasan_init_region()

2020-03-17 Thread Christophe Leroy





Le 16/03/2020 à 13:36, Christophe Leroy a écrit :

Implement a kasan_init_region() dedicated to book3s/32 that
allocates KASAN regions using BATs.

Signed-off-by: Christophe Leroy 


Note that the sparse warning on pmac32_defconfig is definitely a false 
positive. See details patch 16/46 ("powerpc/mm: Allocate static page 
tables for fixmap")


Christophe

Re: [PATCH 4/4] powerpc/numa: Set fallback nodes for offline nodes

2020-03-17 Thread Srikar Dronamraju

* Bharata B Rao  [2020-03-17 19:52:32]:

Thanks Bharata.

> This patchset can also fix a related problem that I reported earlier at
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-March/206076.html
> with an additional change, suggested by Srikar as shown below:
> 
> >  
> > -   for_each_online_node(node) {
> > +   for_each_node(node) {
> > +   /*
> > +* For all possible but not yet online nodes, ensure their
> > +* node_numa_mem is set correctly so that kmalloc_node works
> > +* for such nodes.
> > +*/
> > +   if (!node_online(node)) {
> 
> Change the above line to like below:
> 
> +   if (!node_state(node, N_MEMORY)) {
> 

Just to clarify, this is needed if we don't have
http://lore.kernel.org/lkml/2020030237.5731-1-sri...@linux.vnet.ibm.com/t/#u


-- 
Thanks and Regards
Srikar Dronamraju

Re:Re: [PATCH v3] powerpc/fsl-85xx: fix compile error

2020-03-17 Thread 王文虎

From: Michael Ellerman 
 Date: 2020-03-17 19:22:13
To:"王文虎" 
 cc: Benjamin Herrenschmidt ,Paul Mackerras 
,Allison Randal ,Richard Fontana 
,Greg Kroah-Hartman ,Thomas 
Gleixner 
,linuxppc-dev@lists.ozlabs.org,linux-ker...@vger.kernel.org,triv...@kernel.org,ker...@vivo.com,stable

Subject: Re: [PATCH v3] powerpc/fsl-85xx: fix compile error>王文虎 
 writes:
>> From: Michael Ellerman 
>>  Date: 2020-03-16 17:41:12
>> To:WANG Wenhu ,Benjamin Herrenschmidt 
>> ,Paul Mackerras ,WANG Wenhu 
>> ,Allison Randal ,Richard Fontana 
>> ,Greg Kroah-Hartman ,Thomas 
>> Gleixner 
>> ,linuxppc-dev@lists.ozlabs.org,linux-ker...@vger.kernel.org
>>  cc: triv...@kernel.org,ker...@vivo.com,stable 
>> Subject: Re: [PATCH v3] powerpc/fsl-85xx: fix compile error>WANG Wenhu 
>>  writes:
 Include "linux/of_address.h" to fix the compile error for
 mpc85xx_l2ctlr_of_probe() when compiling fsl_85xx_cache_sram.c.

   CC  arch/powerpc/sysdev/fsl_85xx_l2ctlr.o
 arch/powerpc/sysdev/fsl_85xx_l2ctlr.c: In function 
 ‘mpc85xx_l2ctlr_of_probe’:
 arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:11: error: implicit declaration 
 of function ‘of_iomap’; did you mean ‘pci_iomap’? 
 [-Werror=implicit-function-declaration]
   l2ctlr = of_iomap(dev->dev.of_node, 0);
^~~~
pci_iomap
 arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:9: error: assignment makes 
 pointer from integer without a cast [-Werror=int-conversion]
   l2ctlr = of_iomap(dev->dev.of_node, 0);
  ^
 cc1: all warnings being treated as errors
 scripts/Makefile.build:267: recipe for target 
 'arch/powerpc/sysdev/fsl_85xx_l2ctlr.o' failed
 make[2]: *** [arch/powerpc/sysdev/fsl_85xx_l2ctlr.o] Error 1

 Fixes: commit 6db92cc9d07d ("powerpc/85xx: add cache-sram support")
>>>
>>>The syntax is:
>>>
>>>Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support")
>>>
 Cc: stable 
>>>
>>>The commit above went into v2.6.37.
>>>
>>>So no one has noticed this bug since then, how? Or did something else
>>>change to expose the problem?
>>
>> Actually a hard question to answer cause it also left me scratching for long.
>> However, I can not find right or definite answer.
>
>Oh, actually it's fairly straight forward, the code can't be built at
>all in upstream because CONFIG_FSL_85XX_CACHE_SRAM is not selectable or
>selected by anything.

Yeah, sure that is the reason, and I meant it was hard to figure out why
nobody had ever compiled the driver with FSL_85XX_CACHE_SRAM enabled
until me.
>
>You sent a patch previously to make it selectable, which Scott thought
>was a bad idea.
>
>So this whole file is dead code as far as I'm concerned, so patches for
>it definitely do not need to go to stable.
>
>If you want to add a user for it then please send a series doing that,
>and this commit can be the first.

For this, as you mentioned, it is dead and do not need to be applied to any 
stable.
And I recommand the patch as a unit itself cause our module which uses
it is still under developing, and the module itself would be taken as a
complete logical block. Also it would take some time.

Thanks, Wenhu
>
>cheers

Re: [PATCH 4/4] powerpc/numa: Set fallback nodes for offline nodes

2020-03-17 Thread Bharata B Rao

This patchset can also fix a related problem that I reported earlier at
https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-March/206076.html
with an additional change, suggested by Srikar as shown below:

On Tue, Mar 17, 2020 at 06:47:53PM +0530, Srikar Dronamraju wrote:
> Currently fallback nodes for offline nodes aren't set. Hence by default
> node 0 ends up being the default node. However node 0 might be offline.
> 
> Fix this by explicitly setting fallback node. Ensure first_memory_node
> is set before kernel does explicit setting of fallback node.
> 
>  arch/powerpc/mm/numa.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 281531340230..6e97ab6575cb 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -827,7 +827,16 @@ void __init dump_numa_cpu_topology(void)
>   if (!numa_enabled)
>   return;
>  
> - for_each_online_node(node) {
> + for_each_node(node) {
> + /*
> +  * For all possible but not yet online nodes, ensure their
> +  * node_numa_mem is set correctly so that kmalloc_node works
> +  * for such nodes.
> +  */
> + if (!node_online(node)) {

Change the above line to like below:

+   if (!node_state(node, N_MEMORY)) {

Regards,
Bharata.

[PATCH v5 0/3] Introduce Self-Save API for deep stop states

2020-03-17 Thread Pratik Rajesh Sampat

Skiboot patches v4: 
https://lists.ozlabs.org/pipermail/skiboot/2020-February/016404.html
Linux patches v4: https://lkml.org/lkml/2020/2/12/446
Changelog
v4 --> v5
Based on Michael Ellerman's comments, in patch3:
1. Added documentation in power-mgt.txt for self-save and self-restore
2. Used bitmap abstractions while parsing device tree
3. Fixed scenario where zero or less SPRs were treated as a success
4. Removed redundant device_node pointer
5. Removed unnecessary pr_warn messages
6. Changed the old of_find_node_by_path calls to of_find_compatible_node
7. Fixed leaking reference of the parsed device_node pointer 

Currently the stop-API supports a mechanism called as self-restore
which allows us to restore the values of certain SPRs on wakeup from a
deep-stop state to a desired value. To use this, the Kernel makes an
OPAL call passing the PIR of the CPU, the SPR number and the value to
which the SPR should be restored when that CPU wakes up from a deep
stop state.

Recently, a new feature, named self-save has been enabled in the
stop-api, which is an alternative mechanism to do the same, except
that self-save will save the current content of the SPR before
entering a deep stop state and also restore the content back on
waking up from a deep stop state.

This patch series aims at introducing and leveraging the self-save feature in
the kernel.

Now, as the kernel has a choice to prefer one mode over the other and
there can be registers in both the save/restore SPR list which are sent
from the device tree, a new interface has been defined for the seamless
handing of the modes for each SPR.

A list of preferred SPRs are maintained in the kernel which contains two
properties:
1. supported_mode: Helps in identifying if it strictly supports self
   save or restore or both.
   Initialized using the information from device tree.
2. preferred_mode: Calls out what mode is preferred for each SPR. It
   could be strictly self save or restore, or it can also
   determine the preference of  mode over the other if both
   are present by encapsulating the other in bitmask from
   LSB to MSB.
   Initialized statically.

Below is a table to show the Scenario::Consequence when the self save and
self restore modes are available or disabled in different combinations as
perceived from the device tree thus giving complete backwards compatibly
regardless of an older firmware running a newer kernel or vise-versa.
Support for self save or self-restore is embedded in the device tree,
along with the set of registers it supports.

SR = Self restore; SS = Self save

.---..
| Scenario  |Consequence |
:---+:
| Legacy Firmware. No SS or SR node | Self restore is called for all |
|   | supported SPRs |
:---+:
| SR: !active SS: !active   | Deep stop states disabled  |
:---+:
| SR: active SS: !active| Self restore is called for all |
|   | supported SPRs |
:---+:
| SR: active SS: active | Goes through the preferences for each  |
|   | SPR and executes of the modes  |
|   | accordingly. Currently, Self restore is|
|   | called for all the SPRs except PSSCR   |
|   | which is self saved|
:---+:
| SR: active(only HID0) SS: active  | Self save called for all supported |
|   | registers expect HID0 (as HID0 cannot  |
|   | be self saved currently)   |
:---+:
| SR: !active SS: active| currently will disable deep states as  |
|   | HID0 is needed to be self restored and |
|   | cannot be self saved   |
'---''

Pratik Rajesh Sampat (3):
  powerpc/powernv: Interface to define support and preference for a SPR
  powerpc/powernv: Introduce Self save support
  powerpc/powernv: Parse device tree, population of SPR support

 .../bindings/powerpc/opal/power-mgt.txt   |  10 +
 arch/powerpc/include/asm/opal-api.h   |   3 +-
 arch/p

[PATCH v5 3/3] powerpc/powernv: Parse device tree, population of SPR support

2020-03-17 Thread Pratik Rajesh Sampat

Parse the device tree for nodes self-save, self-restore and populate
support for the preferred SPRs based what was advertised by the device
tree.

Signed-off-by: Pratik Rajesh Sampat 
Reviewed-by: Ram Pai 
---
 .../bindings/powerpc/opal/power-mgt.txt   | 10 +++
 arch/powerpc/platforms/powernv/idle.c | 78 +++
 2 files changed, 88 insertions(+)

diff --git a/Documentation/devicetree/bindings/powerpc/opal/power-mgt.txt 
b/Documentation/devicetree/bindings/powerpc/opal/power-mgt.txt
index 9d619e955576..093cb5fe3d2d 100644
--- a/Documentation/devicetree/bindings/powerpc/opal/power-mgt.txt
+++ b/Documentation/devicetree/bindings/powerpc/opal/power-mgt.txt
@@ -116,3 +116,13 @@ otherwise. The length of all the property arrays must be 
the same.
which of the fields of the PMICR are set in the corresponding
entries in ibm,cpu-idle-state-pmicr. This is an optional
property on POWER8 and is absent on POWER9.
+
+- self-restore:
+ Array of unsigned 64-bit values containing a property for sprn-mask
+ with each bit indicating the index of the supported SPR for the
+ functionality. This is an optional property for both Power8 and Power9
+
+- self-save:
+  Array of unsigned 64-bit values containing a property for sprn-mask
+  with each bit indicating the index of the supported SPR for the
+  functionality. This is an optional property for both Power8 and Power9
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 97aeb45e897b..c39111b338ff 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -1436,6 +1436,81 @@ static void __init pnv_probe_idle_states(void)
supported_cpuidle_states |= pnv_idle_states[i].flags;
 }
 
+/*
+ * Extracts and populates the self save or restore capabilities
+ * passed from the device tree node
+ */
+static int extract_save_restore_state_dt(struct device_node *np, int type)
+{
+   int nr_sprns = 0, i, bitmask_index;
+   u64 *temp_u64;
+   u64 bit_pos;
+
+   nr_sprns = of_property_count_u64_elems(np, "sprn-bitmask");
+   if (nr_sprns <= 0)
+   return -EINVAL;
+   temp_u64 = kcalloc(nr_sprns, sizeof(u64), GFP_KERNEL);
+   if (of_property_read_u64_array(np, "sprn-bitmask",
+  temp_u64, nr_sprns)) {
+   pr_warn("cpuidle-powernv: failed to find registers in DT\n");
+   kfree(temp_u64);
+   return -EINVAL;
+   }
+   /*
+* Populate acknowledgment of support for the sprs in the global vector
+* gotten by the registers supplied by the firmware.
+* The registers are in a bitmask, bit index within
+* that specifies the SPR
+*/
+   for (i = 0; i < nr_preferred_sprs; i++) {
+   bitmask_index = BIT_WORD(preferred_sprs[i].spr);
+   bit_pos = BIT_MASK(preferred_sprs[i].spr);
+   if ((temp_u64[bitmask_index] & bit_pos) == 0) {
+   if (type == SELF_RESTORE_TYPE)
+   preferred_sprs[i].supported_mode &=
+   ~SELF_RESTORE_STRICT;
+   else
+   preferred_sprs[i].supported_mode &=
+   ~SELF_SAVE_STRICT;
+   continue;
+   }
+   if (type == SELF_RESTORE_TYPE) {
+   preferred_sprs[i].supported_mode |=
+   SELF_RESTORE_STRICT;
+   } else {
+   preferred_sprs[i].supported_mode |=
+   SELF_SAVE_STRICT;
+   }
+   }
+
+   kfree(temp_u64);
+   return 0;
+}
+
+static int pnv_parse_deepstate_dt(void)
+{
+   struct device_node *np;
+   int rc = 0, i;
+
+   /* Self restore register population */
+   np = of_find_compatible_node(NULL, NULL, "ibm,opal-self-restore");
+   if (np) {
+   rc = extract_save_restore_state_dt(np, SELF_RESTORE_TYPE);
+   if (rc != 0)
+   return rc;
+   }
+   /* Self save register population */
+   np = of_find_compatible_node(NULL, NULL, "ibm,opal-self-save");
+   if (!np) {
+   for (i = 0; i < nr_preferred_sprs; i++)
+   preferred_sprs[i].supported_mode &= ~SELF_SAVE_STRICT;
+   } else {
+   rc = extract_save_restore_state_dt(np, SELF_SAVE_TYPE);
+   }
+   of_node_put(np);
+   return rc;
+}
+
 /*
  * This function parses device-tree and populates all the information
  * into pnv_idle_states structure. It also sets up nr_pnv_idle_states
@@ -1584,6 +1659,9 @@ static int __init pnv_init_idle_states(void)
return rc;
pnv_probe_idle_states();
 
+   rc = pnv_parse_deepstate_dt();
+   if (rc)
+   return rc;
if (!cpu_has_feature(CPU_FTR_ARCH_3

[PATCH v5 2/3] powerpc/powernv: Introduce Self save support

2020-03-17 Thread Pratik Rajesh Sampat

This commit introduces and leverages the Self save API which OPAL now
supports.

Add the new Self Save OPAL API call in the list of OPAL calls.
Implement the self saving of the SPRs based on the support populated
while respecting it's preferences.

This implementation allows mixing of support for the SPRs, which
means that a SPR can be self restored while another SPR be self saved if
they support and prefer it to be so.

Signed-off-by: Pratik Rajesh Sampat 
Reviewed-by: Ram Pai 
---
 arch/powerpc/include/asm/opal-api.h|  3 ++-
 arch/powerpc/include/asm/opal.h|  1 +
 arch/powerpc/platforms/powernv/idle.c  | 22 ++
 arch/powerpc/platforms/powernv/opal-call.c |  1 +
 4 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index c1f25a760eb1..1b6e1a68d431 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -214,7 +214,8 @@
 #define OPAL_SECVAR_GET176
 #define OPAL_SECVAR_GET_NEXT   177
 #define OPAL_SECVAR_ENQUEUE_UPDATE 178
-#define OPAL_LAST  178
+#define OPAL_SLW_SELF_SAVE_REG 181
+#define OPAL_LAST  181
 
 #define QUIESCE_HOLD   1 /* Spin all calls at entry */
 #define QUIESCE_REJECT 2 /* Fail all calls with OPAL_BUSY */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9986ac34b8e2..389a85b63805 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -203,6 +203,7 @@ int64_t opal_handle_hmi(void);
 int64_t opal_handle_hmi2(__be64 *out_flags);
 int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end);
 int64_t opal_unregister_dump_region(uint32_t id);
+int64_t opal_slw_self_save_reg(uint64_t cpu_pir, uint64_t sprn);
 int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val);
 int64_t opal_config_cpu_idle_state(uint64_t state, uint64_t flag);
 int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t 
pe_number);
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 03fe835aadd1..97aeb45e897b 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -279,6 +279,26 @@ static int pnv_self_save_restore_sprs(void)
if (rc != 0)
return rc;
break;
+   } else if (preferred & curr_spr.supported_mode
+  & SELF_SAVE_STRICT) {
+   is_initialized = true;
+   if (curr_spr.spr == SPRN_HMEER &&
+   cpu_thread_in_core(cpu) != 0) {
+   continue;
+   }
+   rc = opal_slw_self_save_reg(pir,
+   curr_spr.spr);
+   if (rc != 0)
+   return rc;
+   switch (curr_spr.spr) {
+   case SPRN_LPCR:
+   is_lpcr_self_save = true;
+   break;
+   case SPRN_PTCR:
+   is_ptcr_self_save = true;
+   break;
+   }
+   break;
}
preferred_sprs[index].preferred_mode =
preferred_sprs[index].preferred_mode >>
@@ -1159,6 +1179,8 @@ void pnv_program_cpu_hotplug_lpcr(unsigned int cpu, u64 
lpcr_val)
if (!is_lpcr_self_save)
opal_slw_set_reg(pir, SPRN_LPCR,
 lpcr_val);
+   else
+   opal_slw_self_save_reg(pir, SPRN_LPCR);
}
 }
 
diff --git a/arch/powerpc/platforms/powernv/opal-call.c 
b/arch/powerpc/platforms/powernv/opal-call.c
index 5cd0f52d258f..11e0ceb90de0 100644
--- a/arch/powerpc/platforms/powernv/opal-call.c
+++ b/arch/powerpc/platforms/powernv/opal-call.c
@@ -223,6 +223,7 @@ OPAL_CALL(opal_handle_hmi,  
OPAL_HANDLE_HMI);
 OPAL_CALL(opal_handle_hmi2,OPAL_HANDLE_HMI2);
 OPAL_CALL(opal_config_cpu_idle_state,  OPAL_CONFIG_CPU_IDLE_STATE);
 OPAL_CALL(opal_slw_set_reg,OPAL_SLW_SET_REG);
+OPAL_CALL(opal_slw_self_save_reg,  OPAL_SLW_SELF_SAV

[PATCH v5 1/3] powerpc/powernv: Interface to define support and preference for a SPR

2020-03-17 Thread Pratik Rajesh Sampat

Define a bitmask interface to determine support for the Self Restore,
Self Save or both.

Also define an interface to determine the preference of that SPR to
be strictly saved or restored or encapsulated with an order of preference.

The preference bitmask is shown as below:

|... | 2nd pref | 1st pref |

MSB   LSB

The preference from higher to lower is from LSB to MSB with a shift of 8
bits.
Example:
Prefer self save first, if not available then prefer self
restore
The preference mask for this scenario will be seen as below.
((SELF_RESTORE_STRICT << PREFERENCE_SHIFT) | SELF_SAVE_STRICT)
-
|... | Self restore | Self save |
-
MSB LSB

Finally, declare a list of preferred SPRs which encapsulate the bitmaks
for preferred and supported with defaults of both being set to support
legacy firmware.

This commit also implements using the above interface and retains the
legacy functionality of self restore.

Signed-off-by: Pratik Rajesh Sampat 
Reviewed-by: Ram Pai 
---
 arch/powerpc/platforms/powernv/idle.c | 316 +-
 1 file changed, 259 insertions(+), 57 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 78599bca66c2..03fe835aadd1 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -32,9 +32,112 @@
 #define P9_STOP_SPR_MSR 2000
 #define P9_STOP_SPR_PSSCR  855
 
+/* Interface for the stop state supported and preference */
+#define SELF_RESTORE_TYPE0
+#define SELF_SAVE_TYPE   1
+
+#define NR_PREFERENCES2
+#define PREFERENCE_SHIFT  4
+#define PREFERENCE_MASK   0xf
+
+#define UNSUPPORTED 0x0
+#define SELF_RESTORE_STRICT 0x1
+#define SELF_SAVE_STRICT0x2
+
+/*
+ * Bitmask defining the kind of preferences available.
+ * Note : The higher to lower preference is from LSB to MSB, with a shift of
+ * 4 bits.
+ * 
+ * || 2nd pref | 1st pref |
+ * 
+ * MSB   LSB
+ */
+/* Prefer Restore if available, otherwise unsupported */
+#define PREFER_SELF_RESTORE_ONLY   SELF_RESTORE_STRICT
+/* Prefer Save if available, otherwise unsupported */
+#define PREFER_SELF_SAVE_ONLY  SELF_SAVE_STRICT
+/* Prefer Restore when available, otherwise prefer Save */
+#define PREFER_RESTORE_SAVE((SELF_SAVE_STRICT << \
+ PREFERENCE_SHIFT)\
+ | SELF_RESTORE_STRICT)
+/* Prefer Save when available, otherwise prefer Restore*/
+#define PREFER_SAVE_RESTORE((SELF_RESTORE_STRICT <<\
+ PREFERENCE_SHIFT)\
+ | SELF_SAVE_STRICT)
 static u32 supported_cpuidle_states;
 struct pnv_idle_states_t *pnv_idle_states;
 int nr_pnv_idle_states;
+/* Caching the lpcr & ptcr support to use later */
+static bool is_lpcr_self_save;
+static bool is_ptcr_self_save;
+
+struct preferred_sprs {
+   u64 spr;
+   u32 preferred_mode;
+   u32 supported_mode;
+};
+
+/*
+ * Preferred mode: Order of precedence when both self-save and self-restore
+ *supported
+ * Supported mode: Default support. Can be overwritten during system
+ *initialization
+ */
+struct preferred_sprs preferred_sprs[] = {
+   {
+   .spr = SPRN_HSPRG0,
+   .preferred_mode = PREFER_RESTORE_SAVE,
+   .supported_mode = SELF_RESTORE_STRICT,
+   },
+   {
+   .spr = SPRN_LPCR,
+   .preferred_mode = PREFER_RESTORE_SAVE,
+   .supported_mode = SELF_RESTORE_STRICT,
+   },
+   {
+   .spr = SPRN_PTCR,
+   .preferred_mode = PREFER_SAVE_RESTORE,
+   .supported_mode = SELF_RESTORE_STRICT,
+   },
+   {
+   .spr = SPRN_HMEER,
+   .preferred_mode = PREFER_RESTORE_SAVE,
+   .supported_mode = SELF_RESTORE_STRICT,
+   },
+   {
+   .spr = SPRN_HID0,
+   .preferred_mode = PREFER_RESTORE_SAVE,
+   .supported_mode = SELF_RESTORE_STRICT,
+   },
+   {
+   .spr = P9_STOP_SPR_MSR,
+   .preferred_mode = PREFER_RESTORE_SAVE,
+   .supported_mode = SELF_RESTORE_STRICT,
+   },
+   {
+   .spr = P9_STOP_SPR_PSSCR,
+   .preferred_mode = PREFER_SAVE_RESTORE,
+   .supported_mode = SELF_RESTORE_STRICT,
+   },
+   {
+   .spr = SPRN_HID1,
+   .preferred_mode = PREFER_RESTORE_SAVE,
+   .supported_mode = SELF_RESTORE_STRICT,
+   },
+   {
+   .spr = SPRN_HID4,
+   .preferred_mode = PREFER_RESTORE_SAVE,
+   .supported_mode = SELF_RESTORE_STRICT,
+

Re: [PATCH v4 3/3] powerpc/powernv: Parse device tree, population of SPR support

2020-03-17 Thread Pratik Sampat


Thank you Michael for the review.


On 17/03/20 8:43 am, Michael Ellerman wrote:

Pratik Rajesh Sampat  writes:

Parse the device tree for nodes self-save, self-restore and populate
support for the preferred SPRs based what was advertised by the device
tree.

These should be documented in:
   Documentation/devicetree/bindings/powerpc/opal/power-mgt.txt


Sure thing I'll add them.


diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 97aeb45e897b..27dfadf609e8 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -1436,6 +1436,85 @@ static void __init pnv_probe_idle_states(void)
supported_cpuidle_states |= pnv_idle_states[i].flags;
  }
  
+/*

+ * Extracts and populates the self save or restore capabilities
+ * passed from the device tree node
+ */
+static int extract_save_restore_state_dt(struct device_node *np, int type)
+{
+   int nr_sprns = 0, i, bitmask_index;
+   int rc = 0;
+   u64 *temp_u64;
+   u64 bit_pos;
+
+   nr_sprns = of_property_count_u64_elems(np, "sprn-bitmask");
+   if (nr_sprns <= 0)
+   return rc;

Using <= 0 means zero SPRs is treated by success as the caller, is that
intended? If so a comment would be appropriate.


My bad, this is undesirable. This should be treated as a failure.
I'll fix this.


+   temp_u64 = kcalloc(nr_sprns, sizeof(u64), GFP_KERNEL);
+   if (of_property_read_u64_array(np, "sprn-bitmask",
+  temp_u64, nr_sprns)) {
+   pr_warn("cpuidle-powernv: failed to find registers in DT\n");
+   kfree(temp_u64);
+   return -EINVAL;
+   }
+   /*
+* Populate acknowledgment of support for the sprs in the global vector
+* gotten by the registers supplied by the firmware.
+* The registers are in a bitmask, bit index within
+* that specifies the SPR
+*/
+   for (i = 0; i < nr_preferred_sprs; i++) {
+   bitmask_index = preferred_sprs[i].spr / 64;
+   bit_pos = preferred_sprs[i].spr % 64;

This is basically a hand coded bitmap, see eg. BIT_WORD(), BIT_MASK() etc.

I don't think there's an easy way to convert temp_u64 into a proper
bitmap, so it's probably not worth doing that. But at least use the macros.


Right. I'll use the macros for a cleaner abstraction.


+   if ((temp_u64[bitmask_index] & (1UL << bit_pos)) == 0) {
+   if (type == SELF_RESTORE_TYPE)
+   preferred_sprs[i].supported_mode &=
+   ~SELF_RESTORE_STRICT;
+   else
+   preferred_sprs[i].supported_mode &=
+   ~SELF_SAVE_STRICT;
+   continue;
+   }
+   if (type == SELF_RESTORE_TYPE) {
+   preferred_sprs[i].supported_mode |=
+   SELF_RESTORE_STRICT;
+   } else {
+   preferred_sprs[i].supported_mode |=
+   SELF_SAVE_STRICT;
+   }
+   }
+
+   kfree(temp_u64);
+   return rc;
+}
+
+static int pnv_parse_deepstate_dt(void)
+{
+   struct device_node *sr_np, *ss_np;

You never use these concurrently AFAICS, so you could just have a single *np.


Sure, got rid of it.


+   int rc = 0, i;
+
+   /* Self restore register population */
+   sr_np = of_find_node_by_path("/ibm,opal/power-mgt/self-restore");

I know the existing idle code uses of_find_node_by_path(), but that's
because it's old and crufty. Please don't add new searches by path. You
should be searching by compatible.


Alright, I'll replace of_find_node_by_path() calls with 
of_find_compatible_node()


+   if (!sr_np) {
+   pr_warn("opal: self restore Node not found");

This warning and the others below will fire on all existing firmware
versions, which is not OK.


I'll get rid of the warnings. They seem unnecessary to me also now.


+   } else {
+   rc = extract_save_restore_state_dt(sr_np, SELF_RESTORE_TYPE);
+   if (rc != 0)
+   return rc;
+   }
+   /* Self save register population */
+   ss_np = of_find_node_by_path("/ibm,opal/power-mgt/self-save");
+   if (!ss_np) {
+   pr_warn("opal: self save Node not found");
+   pr_warn("Legacy firmware. Assuming default self-restore 
support");
+   for (i = 0; i < nr_preferred_sprs; i++)
+   preferred_sprs[i].supported_mode &= ~SELF_SAVE_STRICT;
+   } else {
+   rc = extract_save_restore_state_dt(ss_np, SELF_SAVE_TYPE);
+   }
+   return rc;

You're leaking references on all the device_nodes in here, you need
of_node_put() before exiting.


Got it. I'll clear the refcount before exitting.


+}


cheers

Thanks!
Prat

Re: [PATCH 1/3] powerpc/numa: Set numa_node for all possible cpus

2020-03-17 Thread Michal Hocko

On Tue 17-03-20 14:44:45, Vlastimil Babka wrote:
> On 3/16/20 10:06 AM, Michal Hocko wrote:
> > On Thu 12-03-20 17:41:58, Vlastimil Babka wrote:
> > [...]
> >> with nid present in:
> >> N_POSSIBLE - pgdat might not exist, node_to_mem_node() must return some 
> >> online
> > 
> > I would rather have a dummy pgdat for those. Have a look at 
> > $ git grep "NODE_DATA.*->" | wc -l
> > 63
> > 
> > Who knows how many else we have there. I haven't looked more closely.
> > Besides that what is a real reason to not have pgdat ther and force all
> > users of a $random node from those that the platform considers possible
> > for special casing? Is that a memory overhead? Is that really a thing?
> 
> I guess we can ignore memory overhead. I guess there only might be some 
> concern
> that for nodes that are initially offline, we will allocate the pgdat on a
> different node, and after they are online, it will stay on a different node 
> with
> more access latency from local cpus. If we only allocate for online nodes, it
> can always be local? But I guess it doesn't matter that much.

This is not the case even now because of chicke&egg. You need a memory
to allocate from and that memory has to be managed somewhere per node
(pgdat). Keep in mind we do not have the bootmem allocator for the
hotplug. Have a look at hotadd_new_pgdat and when it is called. There
are some attempts to allocate memmap from the hotpluged memory but I am
not sure we can do the whole thing without pgdat in place. If we can
then can come up with some replace the pgdat magic. But still I am not
even sure this is something we really have to optimize for.
-- 
Michal Hocko
SUSE Labs

Re: [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab

2020-03-17 Thread Vlastimil Babka

On 3/17/20 2:45 PM, Srikar Dronamraju wrote:
> * Vlastimil Babka  [2020-03-17 14:34:25]:
> 
>> On 3/17/20 2:17 PM, Srikar Dronamraju wrote:
>> > Currently while allocating a slab for a offline node, we use its
>> > associated node_numa_mem to search for a partial slab. If we don't find
>> > a partial slab, we try allocating a slab from the offline node using
>> > __alloc_pages_node. However this is bound to fail.
>> > 
>> > NIP [c039a300] __alloc_pages_nodemask+0x130/0x3b0
>> > LR [c039a3c4] __alloc_pages_nodemask+0x1f4/0x3b0
>> > Call Trace:
>> > [c008b36837f0] [c039a3b4] __alloc_pages_nodemask+0x1e4/0x3b0 
>> > (unreliable)
>> > [c008b3683870] [c03d1ff8] new_slab+0x128/0xcf0
>> > [c008b3683950] [c03d6060] ___slab_alloc+0x410/0x820
>> > [c008b3683a40] [c03d64a4] __slab_alloc+0x34/0x60
>> > [c008b3683a70] [c03d78b0] __kmalloc_node+0x110/0x490
>> > [c008b3683af0] [c0343a08] kvmalloc_node+0x58/0x110
>> > [c008b3683b30] [c03ffd44] mem_cgroup_css_online+0x104/0x270
>> > [c008b3683b90] [c0234e08] online_css+0x48/0xd0
>> > [c008b3683bc0] [c023dedc] 
>> > cgroup_apply_control_enable+0x2ec/0x4d0
>> > [c008b3683ca0] [c02416f8] cgroup_mkdir+0x228/0x5f0
>> > [c008b3683d10] [c0520360] kernfs_iop_mkdir+0x90/0xf0
>> > [c008b3683d50] [c043e400] vfs_mkdir+0x110/0x230
>> > [c008b3683da0] [c0441ee0] do_mkdirat+0xb0/0x1a0
>> > [c008b3683e20] [c000b278] system_call+0x5c/0x68
>> > 
>> > Mitigate this by allocating the new slab from the node_numa_mem.
>> 
>> Are you sure this is really needed and the other 3 patches are not enough for
>> the current SLUB code to work as needed? It seems you are changing the 
>> semantics
>> here...
>> 
> 
> The other 3 patches are not enough because we don't carry the searchnode
> when the actual alloc_pages_node gets called. 
> 
> With only the 3 patches, we see the above Panic, its signature is slightly
> different from what Sachin first reported and which I have carried in 1st
> patch.

Ah, I see. So that's the missing pgdat after your series [1] right?

That sounds like an argument for Michal's suggestions that pgdats exist and have
correctly populated zonelists for all possible nodes.
node_to_mem_node() could be just a shortcut for the first zone's node in the
zonelist, so that fallback follows the topology.

[1]
https://lore.kernel.org/linuxppc-dev/2020030237.5731-1-sri...@linux.vnet.ibm.com/t/#m76e5b4c4084380b1d4b193d5aa0359b987f2290e

>> > --- a/mm/slub.c
>> > +++ b/mm/slub.c
>> > @@ -1970,14 +1970,8 @@ static void *get_partial(struct kmem_cache *s, 
>> > gfp_t flags, int node,
>> >struct kmem_cache_cpu *c)
>> >  {
>> >void *object;
>> > -  int searchnode = node;
>> >  
>> > -  if (node == NUMA_NO_NODE)
>> > -  searchnode = numa_mem_id();
>> > -  else if (!node_present_pages(node))
>> > -  searchnode = node_to_mem_node(node);
>> > -
>> > -  object = get_partial_node(s, get_node(s, searchnode), c, flags);
>> > +  object = get_partial_node(s, get_node(s, node), c, flags);
>> >if (object || node != NUMA_NO_NODE)>return object;
>> >
>> >   return get_any_partial(s, flags, c);
>> 
>> I.e. here in this if(), now node will never equal NUMA_NO_NODE (thanks to the
>> hunk below), thus the get_any_partial() call becomes dead code?
>> 
>> > @@ -2470,6 +2464,11 @@ static inline void *new_slab_objects(struct 
>> > kmem_cache *s, gfp_t flags,
>> >  
>> >WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO));
>> >  
>> > +  if (node == NUMA_NO_NODE)
>> > +  node = numa_mem_id();
>> > +  else if (!node_present_pages(node))
>> > +  node = node_to_mem_node(node);
>> > +
>> >freelist = get_partial(s, flags, node, c);
>> >  
>> >if (freelist)
>> > @@ -2569,12 +2568,10 @@ static void *___slab_alloc(struct kmem_cache *s, 
>> > gfp_t gfpflags, int node,
>> >  redo:
>> >  
>> >if (unlikely(!node_match(page, node))) {
>> > -  int searchnode = node;
>> > -
>> >if (node != NUMA_NO_NODE && !node_present_pages(node))
>> > -  searchnode = node_to_mem_node(node);
>> > +  node = node_to_mem_node(node);
>> >  
>> > -  if (unlikely(!node_match(page, searchnode))) {
>> > +  if (unlikely(!node_match(page, node))) {
>> >stat(s, ALLOC_NODE_MISMATCH);
>> >deactivate_slab(s, page, c->freelist, c);
>> >goto new_slab;
>> > 
>> 
>

Re: [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab

2020-03-17 Thread Srikar Dronamraju

* Vlastimil Babka  [2020-03-17 14:34:25]:

> On 3/17/20 2:17 PM, Srikar Dronamraju wrote:
> > Currently while allocating a slab for a offline node, we use its
> > associated node_numa_mem to search for a partial slab. If we don't find
> > a partial slab, we try allocating a slab from the offline node using
> > __alloc_pages_node. However this is bound to fail.
> > 
> > NIP [c039a300] __alloc_pages_nodemask+0x130/0x3b0
> > LR [c039a3c4] __alloc_pages_nodemask+0x1f4/0x3b0
> > Call Trace:
> > [c008b36837f0] [c039a3b4] __alloc_pages_nodemask+0x1e4/0x3b0 
> > (unreliable)
> > [c008b3683870] [c03d1ff8] new_slab+0x128/0xcf0
> > [c008b3683950] [c03d6060] ___slab_alloc+0x410/0x820
> > [c008b3683a40] [c03d64a4] __slab_alloc+0x34/0x60
> > [c008b3683a70] [c03d78b0] __kmalloc_node+0x110/0x490
> > [c008b3683af0] [c0343a08] kvmalloc_node+0x58/0x110
> > [c008b3683b30] [c03ffd44] mem_cgroup_css_online+0x104/0x270
> > [c008b3683b90] [c0234e08] online_css+0x48/0xd0
> > [c008b3683bc0] [c023dedc] 
> > cgroup_apply_control_enable+0x2ec/0x4d0
> > [c008b3683ca0] [c02416f8] cgroup_mkdir+0x228/0x5f0
> > [c008b3683d10] [c0520360] kernfs_iop_mkdir+0x90/0xf0
> > [c008b3683d50] [c043e400] vfs_mkdir+0x110/0x230
> > [c008b3683da0] [c0441ee0] do_mkdirat+0xb0/0x1a0
> > [c008b3683e20] [c000b278] system_call+0x5c/0x68
> > 
> > Mitigate this by allocating the new slab from the node_numa_mem.
> 
> Are you sure this is really needed and the other 3 patches are not enough for
> the current SLUB code to work as needed? It seems you are changing the 
> semantics
> here...
> 

The other 3 patches are not enough because we don't carry the searchnode
when the actual alloc_pages_node gets called. 

With only the 3 patches, we see the above Panic, its signature is slightly
different from what Sachin first reported and which I have carried in 1st
patch.

> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1970,14 +1970,8 @@ static void *get_partial(struct kmem_cache *s, gfp_t 
> > flags, int node,
> > struct kmem_cache_cpu *c)
> >  {
> > void *object;
> > -   int searchnode = node;
> >  
> > -   if (node == NUMA_NO_NODE)
> > -   searchnode = numa_mem_id();
> > -   else if (!node_present_pages(node))
> > -   searchnode = node_to_mem_node(node);
> > -
> > -   object = get_partial_node(s, get_node(s, searchnode), c, flags);
> > +   object = get_partial_node(s, get_node(s, node), c, flags);
> > if (object || node != NUMA_NO_NODE)>return object;
> >
> >   return get_any_partial(s, flags, c);
> 
> I.e. here in this if(), now node will never equal NUMA_NO_NODE (thanks to the
> hunk below), thus the get_any_partial() call becomes dead code?
> 
> > @@ -2470,6 +2464,11 @@ static inline void *new_slab_objects(struct 
> > kmem_cache *s, gfp_t flags,
> >  
> > WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO));
> >  
> > +   if (node == NUMA_NO_NODE)
> > +   node = numa_mem_id();
> > +   else if (!node_present_pages(node))
> > +   node = node_to_mem_node(node);
> > +
> > freelist = get_partial(s, flags, node, c);
> >  
> > if (freelist)
> > @@ -2569,12 +2568,10 @@ static void *___slab_alloc(struct kmem_cache *s, 
> > gfp_t gfpflags, int node,
> >  redo:
> >  
> > if (unlikely(!node_match(page, node))) {
> > -   int searchnode = node;
> > -
> > if (node != NUMA_NO_NODE && !node_present_pages(node))
> > -   searchnode = node_to_mem_node(node);
> > +   node = node_to_mem_node(node);
> >  
> > -   if (unlikely(!node_match(page, searchnode))) {
> > +   if (unlikely(!node_match(page, node))) {
> > stat(s, ALLOC_NODE_MISMATCH);
> > deactivate_slab(s, page, c->freelist, c);
> > goto new_slab;
> > 
> 

-- 
Thanks and Regards
Srikar Dronamraju

Re: [PATCH 1/3] powerpc/numa: Set numa_node for all possible cpus

2020-03-17 Thread Vlastimil Babka

On 3/16/20 10:06 AM, Michal Hocko wrote:
> On Thu 12-03-20 17:41:58, Vlastimil Babka wrote:
> [...]
>> with nid present in:
>> N_POSSIBLE - pgdat might not exist, node_to_mem_node() must return some 
>> online
> 
> I would rather have a dummy pgdat for those. Have a look at 
> $ git grep "NODE_DATA.*->" | wc -l
> 63
> 
> Who knows how many else we have there. I haven't looked more closely.
> Besides that what is a real reason to not have pgdat ther and force all
> users of a $random node from those that the platform considers possible
> for special casing? Is that a memory overhead? Is that really a thing?

I guess we can ignore memory overhead. I guess there only might be some concern
that for nodes that are initially offline, we will allocate the pgdat on a
different node, and after they are online, it will stay on a different node with
more access latency from local cpus. If we only allocate for online nodes, it
can always be local? But I guess it doesn't matter that much.

> Somebody has suggested to tweak some of the low level routines to do the
> special casing but I really have to say I do not like that. We shouldn't
> use the first online node or anything like that. We should simply always
> follow the topology presented by FW and of that we need to have a pgdat.
>

Re: [PATCH 1/4] mm: Check for node_online in node_present_pages

2020-03-17 Thread Srikar Dronamraju

* Srikar Dronamraju  [2020-03-17 18:47:50]:

> 
> Reported-by: Sachin Sant 
> Tested-by: Sachin Sant 
> Signed-off-by: Srikar Dronamraju 
> ---
>  include/linux/mmzone.h | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index f3f264826423..88078a3b95e5 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -756,8 +756,10 @@ typedef struct pglist_data {
>   atomic_long_t   vm_stat[NR_VM_NODE_STAT_ITEMS];
>  } pg_data_t;
> 
> -#define node_present_pages(nid)  (NODE_DATA(nid)->node_present_pages)
> -#define node_spanned_pages(nid)  (NODE_DATA(nid)->node_spanned_pages)
> +#define node_present_pages(nid)  \
> + (node_online(nid) ? NODE_DATA(nid)->node_present_pages : 0)
> +#define node_spanned_pages(nid)  \
> + (node_online(nid) ? NODE_DATA(nid)->node_spanned_pages : 0)
>  #ifdef CONFIG_FLAT_NODE_MEM_MAP
>  #define pgdat_page_nr(pgdat, pagenr) ((pgdat)->node_mem_map + (pagenr))
>  #else

If we indeed start having pgdat/NODE_DATA for even offline nodes as Michal
Hocko, then we may not this particular patch.

-- 
Thanks and Regards
Srikar Dronamraju

Re: [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab

2020-03-17 Thread Vlastimil Babka

On 3/17/20 2:17 PM, Srikar Dronamraju wrote:
> Currently while allocating a slab for a offline node, we use its
> associated node_numa_mem to search for a partial slab. If we don't find
> a partial slab, we try allocating a slab from the offline node using
> __alloc_pages_node. However this is bound to fail.
> 
> NIP [c039a300] __alloc_pages_nodemask+0x130/0x3b0
> LR [c039a3c4] __alloc_pages_nodemask+0x1f4/0x3b0
> Call Trace:
> [c008b36837f0] [c039a3b4] __alloc_pages_nodemask+0x1e4/0x3b0 
> (unreliable)
> [c008b3683870] [c03d1ff8] new_slab+0x128/0xcf0
> [c008b3683950] [c03d6060] ___slab_alloc+0x410/0x820
> [c008b3683a40] [c03d64a4] __slab_alloc+0x34/0x60
> [c008b3683a70] [c03d78b0] __kmalloc_node+0x110/0x490
> [c008b3683af0] [c0343a08] kvmalloc_node+0x58/0x110
> [c008b3683b30] [c03ffd44] mem_cgroup_css_online+0x104/0x270
> [c008b3683b90] [c0234e08] online_css+0x48/0xd0
> [c008b3683bc0] [c023dedc] cgroup_apply_control_enable+0x2ec/0x4d0
> [c008b3683ca0] [c02416f8] cgroup_mkdir+0x228/0x5f0
> [c008b3683d10] [c0520360] kernfs_iop_mkdir+0x90/0xf0
> [c008b3683d50] [c043e400] vfs_mkdir+0x110/0x230
> [c008b3683da0] [c0441ee0] do_mkdirat+0xb0/0x1a0
> [c008b3683e20] [c000b278] system_call+0x5c/0x68
> 
> Mitigate this by allocating the new slab from the node_numa_mem.

Are you sure this is really needed and the other 3 patches are not enough for
the current SLUB code to work as needed? It seems you are changing the semantics
here...

> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1970,14 +1970,8 @@ static void *get_partial(struct kmem_cache *s, gfp_t 
> flags, int node,
>   struct kmem_cache_cpu *c)
>  {
>   void *object;
> - int searchnode = node;
>  
> - if (node == NUMA_NO_NODE)
> - searchnode = numa_mem_id();
> - else if (!node_present_pages(node))
> - searchnode = node_to_mem_node(node);
> -
> - object = get_partial_node(s, get_node(s, searchnode), c, flags);
> + object = get_partial_node(s, get_node(s, node), c, flags);
>   if (object || node != NUMA_NO_NODE)>return object;
>
>   return get_any_partial(s, flags, c);

I.e. here in this if(), now node will never equal NUMA_NO_NODE (thanks to the
hunk below), thus the get_any_partial() call becomes dead code?

> @@ -2470,6 +2464,11 @@ static inline void *new_slab_objects(struct kmem_cache 
> *s, gfp_t flags,
>  
>   WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO));
>  
> + if (node == NUMA_NO_NODE)
> + node = numa_mem_id();
> + else if (!node_present_pages(node))
> + node = node_to_mem_node(node);
> +
>   freelist = get_partial(s, flags, node, c);
>  
>   if (freelist)
> @@ -2569,12 +2568,10 @@ static void *___slab_alloc(struct kmem_cache *s, 
> gfp_t gfpflags, int node,
>  redo:
>  
>   if (unlikely(!node_match(page, node))) {
> - int searchnode = node;
> -
>   if (node != NUMA_NO_NODE && !node_present_pages(node))
> - searchnode = node_to_mem_node(node);
> + node = node_to_mem_node(node);
>  
> - if (unlikely(!node_match(page, searchnode))) {
> + if (unlikely(!node_match(page, node))) {
>   stat(s, ALLOC_NODE_MISMATCH);
>   deactivate_slab(s, page, c->freelist, c);
>   goto new_slab;
>

Re: [PATCH 08/12] docs: fix broken references to text files

2020-03-17 Thread Jason Gunthorpe

On Tue, Mar 17, 2020 at 02:10:47PM +0100, Mauro Carvalho Chehab wrote:
> Several references got broken due to txt to ReST conversion.
> 
> Several of them can be automatically fixed with:
> 
>   scripts/documentation-file-ref-check --fix
> 
> Signed-off-by: Mauro Carvalho Chehab 
>  Documentation/admin-guide/kernel-parameters.txt  |  2 +-
>  Documentation/memory-barriers.txt|  2 +-
>  Documentation/process/submit-checklist.rst   |  2 +-
>  .../translations/it_IT/process/submit-checklist.rst  |  2 +-
>  Documentation/translations/ko_KR/memory-barriers.txt |  2 +-
>  .../translations/zh_CN/filesystems/sysfs.txt |  2 +-
>  .../translations/zh_CN/process/submit-checklist.rst  |  2 +-
>  Documentation/virt/kvm/arm/pvtime.rst|  2 +-
>  Documentation/virt/kvm/devices/vcpu.rst  |  2 +-
>  Documentation/virt/kvm/hypercalls.rst|  4 ++--
>  arch/powerpc/include/uapi/asm/kvm_para.h |  2 +-
>  drivers/gpu/drm/Kconfig  |  2 +-
>  drivers/gpu/drm/drm_ioctl.c  |  2 +-
>  drivers/hwtracing/coresight/Kconfig  |  2 +-
>  fs/fat/Kconfig   |  8 
>  fs/fuse/Kconfig  |  2 +-
>  fs/fuse/dev.c|  2 +-
>  fs/overlayfs/Kconfig |  6 +++---
>  include/linux/mm.h   |  4 ++--
>  include/uapi/linux/ethtool_netlink.h |  2 +-
>  include/uapi/rdma/rdma_user_ioctl_cmds.h |  2 +-

For the rdma files

Acked-by: Jason Gunthorpe 

Jason

[PATCH 1/4] mm: Check for node_online in node_present_pages

2020-03-17 Thread Srikar Dronamraju

Calling a kmalloc_node on a possible node which is not yet onlined can
lead to panic. Currently node_present_pages() doesn't verify the node is
online before accessing the pgdat for the node. However pgdat struct may
not be available resulting in a crash.

NIP [c03d55f4] ___slab_alloc+0x1f4/0x760
LR [c03d5b94] __slab_alloc+0x34/0x60
Call Trace:
[c008b3783960] [c03d5734] ___slab_alloc+0x334/0x760 (unreliable)
[c008b3783a40] [c03d5b94] __slab_alloc+0x34/0x60
[c008b3783a70] [c03d6fa0] __kmalloc_node+0x110/0x490
[c008b3783af0] [c03443d8] kvmalloc_node+0x58/0x110
[c008b3783b30] [c03fee38] mem_cgroup_css_online+0x108/0x270
[c008b3783b90] [c0235aa8] online_css+0x48/0xd0
[c008b3783bc0] [c023eaec] cgroup_apply_control_enable+0x2ec/0x4d0
[c008b3783ca0] [c0242318] cgroup_mkdir+0x228/0x5f0
[c008b3783d10] [c051e170] kernfs_iop_mkdir+0x90/0xf0
[c008b3783d50] [c043dc00] vfs_mkdir+0x110/0x230
[c008b3783da0] [c0441c90] do_mkdirat+0xb0/0x1a0
[c008b3783e20] [c000b278] system_call+0x5c/0x68

Fix this by verifying the node is online before accessing the pgdat
structure. Fix the same for node_spanned_pages() too.

Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: Mel Gorman 
Cc: Michael Ellerman 
Cc: Sachin Sant 
Cc: Michal Hocko 
Cc: Christopher Lameter 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim 
Cc: Kirill Tkhai 
Cc: Vlastimil Babka 
Cc: Srikar Dronamraju 
Cc: Bharata B Rao 

Reported-by: Sachin Sant 
Tested-by: Sachin Sant 
Signed-off-by: Srikar Dronamraju 
---
 include/linux/mmzone.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f3f264826423..88078a3b95e5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -756,8 +756,10 @@ typedef struct pglist_data {
atomic_long_t   vm_stat[NR_VM_NODE_STAT_ITEMS];
 } pg_data_t;
 
-#define node_present_pages(nid)(NODE_DATA(nid)->node_present_pages)
-#define node_spanned_pages(nid)(NODE_DATA(nid)->node_spanned_pages)
+#define node_present_pages(nid)\
+   (node_online(nid) ? NODE_DATA(nid)->node_present_pages : 0)
+#define node_spanned_pages(nid)\
+   (node_online(nid) ? NODE_DATA(nid)->node_spanned_pages : 0)
 #ifdef CONFIG_FLAT_NODE_MEM_MAP
 #define pgdat_page_nr(pgdat, pagenr)   ((pgdat)->node_mem_map + (pagenr))
 #else
-- 
2.18.1

[PATCH 4/4] powerpc/numa: Set fallback nodes for offline nodes

2020-03-17 Thread Srikar Dronamraju

Currently fallback nodes for offline nodes aren't set. Hence by default
node 0 ends up being the default node. However node 0 might be offline.

Fix this by explicitly setting fallback node. Ensure first_memory_node
is set before kernel does explicit setting of fallback node.

Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: Mel Gorman 
Cc: Michael Ellerman 
Cc: Sachin Sant 
Cc: Michal Hocko 
Cc: Christopher Lameter 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim 
Cc: Kirill Tkhai 
Cc: Vlastimil Babka 
Cc: Srikar Dronamraju 
Cc: Bharata B Rao 

Reported-by: Sachin Sant 
Tested-by: Sachin Sant 
Signed-off-by: Srikar Dronamraju 
---
 arch/powerpc/mm/numa.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 281531340230..6e97ab6575cb 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -827,7 +827,16 @@ void __init dump_numa_cpu_topology(void)
if (!numa_enabled)
return;
 
-   for_each_online_node(node) {
+   for_each_node(node) {
+   /*
+* For all possible but not yet online nodes, ensure their
+* node_numa_mem is set correctly so that kmalloc_node works
+* for such nodes.
+*/
+   if (!node_online(node)) {
+   reset_numa_mem(node);
+   continue;
+   }
pr_info("Node %d CPUs:", node);
 
count = 0;
-- 
2.18.1

[PATCH 3/4] mm: Implement reset_numa_mem

2020-03-17 Thread Srikar Dronamraju

For a memoryless or offline nodes, node_numa_mem refers to a N_MEMORY
fallback node. Currently kernel has an API set_numa_mem that sets
node_numa_mem for memoryless node. However this API cannot be used for
offline nodes. Hence all offline nodes will have their node_numa_mem set
to 0. However systems can themselves have node 0 as offline i.e
memoryless and cpuless at this time. In such cases,
node_to_mem_node() fails to provide a N_MEMORY fallback node.

Mitigate this by having a new API that sets the default node_numa_mem for
offline nodes to be first_memory_node.

Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: Mel Gorman 
Cc: Michael Ellerman 
Cc: Sachin Sant 
Cc: Michal Hocko 
Cc: Christopher Lameter 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim 
Cc: Kirill Tkhai 
Cc: Vlastimil Babka 
Cc: Srikar Dronamraju 
Cc: Bharata B Rao 

Reported-by: Sachin Sant 
Tested-by: Sachin Sant 
Signed-off-by: Srikar Dronamraju 
---
 include/asm-generic/topology.h | 3 +++
 include/linux/topology.h   | 7 +++
 2 files changed, 10 insertions(+)

diff --git a/include/asm-generic/topology.h b/include/asm-generic/topology.h
index 238873739550..e803ee7850e6 100644
--- a/include/asm-generic/topology.h
+++ b/include/asm-generic/topology.h
@@ -68,6 +68,9 @@
 #ifndef set_numa_mem
 #define set_numa_mem(node)
 #endif
+#ifndef reset_numa_mem
+#define reset_numa_mem(node)
+#endif
 #ifndef set_cpu_numa_mem
 #define set_cpu_numa_mem(cpu, node)
 #endif
diff --git a/include/linux/topology.h b/include/linux/topology.h
index eb2fe6edd73c..bebda80038bf 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -147,6 +147,13 @@ static inline int node_to_mem_node(int node)
 }
 #endif
 
+#ifndef reset_numa_mem
+static inline void reset_numa_mem(int node)
+{
+   _node_numa_mem_[node] = first_memory_node;
+}
+#endif
+
 #ifndef numa_mem_id
 /* Returns the number of the nearest Node with memory */
 static inline int numa_mem_id(void)
-- 
2.18.1

[PATCH 0/4] Fix kmalloc_node on offline nodes

2020-03-17 Thread Srikar Dronamraju

Sachin recently reported that linux-next was no more bootable on few
systems.
https://lore.kernel.org/linux-next/3381cd91-ab3d-4773-ba04-e7a072a63...@linux.vnet.ibm.com/

# numactl -H
available: 2 nodes (0-1)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
25 26 27 28 29 30 31
node 1 size: 35247 MB
node 1 free: 30907 MB
node distances:
node   0   1
  0:  10  40
  1:  40  10
#

Sachin bisected the problem to Commit a75056fc1e7c ("mm/memcontrol.c: allocate
shrinker_map on appropriate NUMA node")

The root cause analysis showed that mm/slub and powerpc/numa had some 
shortcomings
with respect to offline nodes.

This patch series is on top of patches posted at
https://lore.kernel.org/linuxppc-dev/2020030237.5731-1-sri...@linux.vnet.ibm.com/t/#u

Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: Mel Gorman 
Cc: Michael Ellerman 
Cc: Sachin Sant 
Cc: Michal Hocko 
Cc: Christopher Lameter 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim 
Cc: Kirill Tkhai 
Cc: Vlastimil Babka 
Cc: Srikar Dronamraju 
Cc: Bharata B Rao 

Srikar Dronamraju (4):
  mm: Check for node_online in node_present_pages
  mm/slub: Use mem_node to allocate a new slab
  mm: Implement reset_numa_mem
  powerpc/numa: Set fallback nodes for offline nodes

 arch/powerpc/mm/numa.c | 11 ++-
 include/asm-generic/topology.h |  3 +++
 include/linux/mmzone.h |  6 --
 include/linux/topology.h   |  7 +++
 mm/slub.c  | 19 ---
 5 files changed, 32 insertions(+), 14 deletions(-)

--
2.18.1

[PATCH 2/4] mm/slub: Use mem_node to allocate a new slab

2020-03-17 Thread Srikar Dronamraju

Currently while allocating a slab for a offline node, we use its
associated node_numa_mem to search for a partial slab. If we don't find
a partial slab, we try allocating a slab from the offline node using
__alloc_pages_node. However this is bound to fail.

NIP [c039a300] __alloc_pages_nodemask+0x130/0x3b0
LR [c039a3c4] __alloc_pages_nodemask+0x1f4/0x3b0
Call Trace:
[c008b36837f0] [c039a3b4] __alloc_pages_nodemask+0x1e4/0x3b0 
(unreliable)
[c008b3683870] [c03d1ff8] new_slab+0x128/0xcf0
[c008b3683950] [c03d6060] ___slab_alloc+0x410/0x820
[c008b3683a40] [c03d64a4] __slab_alloc+0x34/0x60
[c008b3683a70] [c03d78b0] __kmalloc_node+0x110/0x490
[c008b3683af0] [c0343a08] kvmalloc_node+0x58/0x110
[c008b3683b30] [c03ffd44] mem_cgroup_css_online+0x104/0x270
[c008b3683b90] [c0234e08] online_css+0x48/0xd0
[c008b3683bc0] [c023dedc] cgroup_apply_control_enable+0x2ec/0x4d0
[c008b3683ca0] [c02416f8] cgroup_mkdir+0x228/0x5f0
[c008b3683d10] [c0520360] kernfs_iop_mkdir+0x90/0xf0
[c008b3683d50] [c043e400] vfs_mkdir+0x110/0x230
[c008b3683da0] [c0441ee0] do_mkdirat+0xb0/0x1a0
[c008b3683e20] [c000b278] system_call+0x5c/0x68

Mitigate this by allocating the new slab from the node_numa_mem.

Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: Mel Gorman 
Cc: Michael Ellerman 
Cc: Sachin Sant 
Cc: Michal Hocko 
Cc: Christopher Lameter 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim 
Cc: Kirill Tkhai 
Cc: Vlastimil Babka 
Cc: Srikar Dronamraju 
Cc: Bharata B Rao 

Reported-by: Sachin Sant 
Tested-by: Sachin Sant 
Signed-off-by: Srikar Dronamraju 
---
 mm/slub.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 1c55bf7892bf..fdf7f38f96e6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1970,14 +1970,8 @@ static void *get_partial(struct kmem_cache *s, gfp_t 
flags, int node,
struct kmem_cache_cpu *c)
 {
void *object;
-   int searchnode = node;
 
-   if (node == NUMA_NO_NODE)
-   searchnode = numa_mem_id();
-   else if (!node_present_pages(node))
-   searchnode = node_to_mem_node(node);
-
-   object = get_partial_node(s, get_node(s, searchnode), c, flags);
+   object = get_partial_node(s, get_node(s, node), c, flags);
if (object || node != NUMA_NO_NODE)
return object;
 
@@ -2470,6 +2464,11 @@ static inline void *new_slab_objects(struct kmem_cache 
*s, gfp_t flags,
 
WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO));
 
+   if (node == NUMA_NO_NODE)
+   node = numa_mem_id();
+   else if (!node_present_pages(node))
+   node = node_to_mem_node(node);
+
freelist = get_partial(s, flags, node, c);
 
if (freelist)
@@ -2569,12 +2568,10 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t 
gfpflags, int node,
 redo:
 
if (unlikely(!node_match(page, node))) {
-   int searchnode = node;
-
if (node != NUMA_NO_NODE && !node_present_pages(node))
-   searchnode = node_to_mem_node(node);
+   node = node_to_mem_node(node);
 
-   if (unlikely(!node_match(page, searchnode))) {
+   if (unlikely(!node_match(page, node))) {
stat(s, ALLOC_NODE_MISMATCH);
deactivate_slab(s, page, c->freelist, c);
goto new_slab;
-- 
2.18.1

Re: [PATCH] powerpc/32: Fix missing NULL pmd check in virt_to_kpte()

2020-03-17 Thread Michael Ellerman

On Sat, 2020-03-07 at 10:09:15 UTC, Christophe Leroy wrote:
> Commit 2efc7c085f05 ("powerpc/32: drop get_pteptr()"),
> replaced get_pteptr() by virt_to_kpte(). But virt_to_kpte() lacks a
> NULL pmd check and returns an invalid non NULL pointer when there
> is no page table.
> 
> Reported-by: Nick Desaulniers 
> Fixes: 2efc7c085f05 ("powerpc/32: drop get_pteptr()")
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/cc6f0e39000900e5dd1448103a9571f0eccd7d4e

cheers

Re: [PATCH v2 -next] powerpc/pmac/smp: drop unnecessary volatile qualifier

2020-03-17 Thread Michael Ellerman

On Tue, 2020-03-03 at 08:56:04 UTC, YueHaibing wrote:
> core99_l2_cache/core99_l3_cache no need to mark as volatile,
> just remove it.
> 
> Signed-off-by: YueHaibing 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a4037d1f1fc4e92b69d7196d4568c33078d465ea

cheers

Re: [PATCH] powerpc/64s/radix: Fix !SMP build

2020-03-17 Thread Michael Ellerman

On Mon, 2020-03-02 at 01:04:10 UTC, Nicholas Piggin wrote:
> Signed-off-by: Nicholas Piggin 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/993cfecc59e57de237ae27fadc84ed24efa87a4d

cheers

Re: [PATCH] selftests: powerpc: Add tlbie_test in .gitignore

2020-03-17 Thread Michael Ellerman

On Fri, 2020-02-28 at 00:00:09 UTC, Christophe Leroy wrote:
> The commit identified below added tlbie_test but
> forgot to add it in .gitignore.
> 
> Fixes: 93cad5f78995 ("selftests/powerpc: Add test case for tlbie vs mtpidr 
> ordering issue")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/47bf235f324c696395c30541fe4fcf99fcd24188

cheers

Re: [PATCH] powerpc: fix emulate_step std test

2020-03-17 Thread Michael Ellerman

On Wed, 2020-02-26 at 05:53:02 UTC, Nicholas Piggin wrote:
> Signed-off-by: Nicholas Piggin 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/59ed2adf393109c56d383e568f2e57bb5ad6d901

cheers

Re: [PATCH] powerpc/vdso: remove deprecated VDS64_HAS_DESCRIPTORS references

2020-03-17 Thread Michael Ellerman

On Mon, 2020-02-24 at 21:18:48 UTC, Joe Lawrence wrote:
> The original 2005 patch that introduced the powerpc vdso, pre-git
> ("ppc64: Implement a vDSO and use it for signal trampoline") notes that:
> 
>   ... symbols exposed by the vDSO aren't "normal" function symbols, apps
>   can't be expected to link against them directly, the vDSO's are both
>   seen as if they were linked at 0 and the symbols just contain offsets
>   to the various functions.  This is done on purpose to avoid a
>   relocation step (ppc64 functions normally have descriptors with abs
>   addresses in them).  When glibc uses those functions, it's expected to
>   use it's own trampolines that know how to reach them.
> 
> Despite that explanation, there remains dead #ifdef
> VDS64_HAS_DESCRIPTORS code-blocks that provide alternate function
> definitions that setup function descriptors.
> 
> Since VDS64_HAS_DESCRIPTORS has been unused for all these years, we
> might as well finally remove it from the codebase.
> 
> Link: https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-February/204430.=
> html
> Link: https://lore.kernel.org/lkml/1108002773.7733.196.camel@gaston/
> Signed-off-by: Joe Lawrence 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/ffd3eaf178b0f616a071e510e289d937330b0b35

cheers

Re: [PATCH 1/2] cpufreq/powernv: Fix use-after-free

2020-03-17 Thread Michael Ellerman

On Thu, 2020-02-06 at 06:26:21 UTC, Oliver O'Halloran wrote:
> The cpufreq driver has a use-after-free that we can hit if:
> 
> a) There's an OCC message pending when the notifier is registered, and
> b) The cpufreq driver fails to register with the core.
> 
> When a) occurs the notifier schedules a workqueue item to handle the
> message. The backing work_struct is located on chips[].throttle and when b)
> happens we clean up by freeing the array. Once we get to the (now free)
> queued item and the kernel crashes.
> 
> Cc: Vaidyanathan Srinivasan 
> Fixes: c5e29ea ("cpufreq: powernv: Fix bugs in powernv_cpufreq_{init/exit}")
> Signed-off-by: Oliver O'Halloran 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/d0a72efac89d1c35ac55197895201b7b94c5e6ef

cheers

Re: [PATCH] powerpc/fsl_booke: avoid creating duplicate tlb1 entry

2020-03-17 Thread Michael Ellerman

On Thu, 2020-01-23 at 11:19:25 UTC, Laurentiu Tudor wrote:
> In the current implementation, the call to loadcam_multi() is wrapped
> between switch_to_as1() and restore_to_as0() calls so, when it tries
> to create its own temporary AS=3D1 TLB1 entry, it ends up duplicating the
> existing one created by switch_to_as1(). Add a check to skip creating
> the temporary entry if already running in AS=3D1.
> 
> Fixes: d9e1831a4202 ("powerpc/85xx: Load all early TLB entries at once")
> Signed-off-by: Laurentiu Tudor 
> Cc: sta...@vger.kernel.org

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/aa4113340ae6c2811e046f08c2bc21011d20a072

cheers

Re: [PATCH] evh_bytechan: fix out of bounds accesses

2020-03-17 Thread Michael Ellerman

On Thu, 2020-01-09 at 07:39:12 UTC, Stephen Rothwell wrote:
> ev_byte_channel_send() assumes that its third argument is a 16 byte array.
> Some places where it is called it may not be (or we can't easily tell
> if it is).  Newer compilers have started producing warnings about this,
> so make sure we actually pass a 16 byte array.
> 
> There may be more elegant solutions to this, but the driver is quite
> old and hasn't been updated in many years.
> 
> The warnings (from a powerpc allyesconfig build) are:
> 
> In file included from include/linux/byteorder/big_endian.h:5,
>  from arch/powerpc/include/uapi/asm/byteorder.h:14,
>  from include/asm-generic/bitops/le.h:6,
>  from arch/powerpc/include/asm/bitops.h:250,
>  from include/linux/bitops.h:29,
>  from include/linux/kernel.h:12,
>  from include/asm-generic/bug.h:19,
>  from arch/powerpc/include/asm/bug.h:109,
>  from include/linux/bug.h:5,
>  from include/linux/mmdebug.h:5,
>  from include/linux/gfp.h:5,
>  from include/linux/slab.h:15,
>  from drivers/tty/ehv_bytechan.c:24:
> drivers/tty/ehv_bytechan.c: In function =E2=80=98ehv_bc_udbg_putc=E2=80=99:
> arch/powerpc/include/asm/epapr_hcalls.h:298:20: warning: array subscript 1 =
> is outside array bounds of =E2=80=98const char[1]=E2=80=99 [-Warray-bounds]
>   298 |  r6 =3D be32_to_cpu(p[1]);
> include/uapi/linux/byteorder/big_endian.h:40:51: note: in definition of mac=
> ro =E2=80=98__be32_to_cpu=E2=80=99
>40 | #define __be32_to_cpu(x) ((__force __u32)(__be32)(x))
>   |   ^
> arch/powerpc/include/asm/epapr_hcalls.h:298:7: note: in expansion of macro =
> =E2=80=98be32_to_cpu=E2=80=99
>   298 |  r6 =3D be32_to_cpu(p[1]);
>   |   ^~~
> drivers/tty/ehv_bytechan.c:166:13: note: while referencing =E2=80=98data=E2=
> =80=99
>   166 | static void ehv_bc_udbg_putc(char c)
>   | ^~~~
> In file included from include/linux/byteorder/big_endian.h:5,
>  from arch/powerpc/include/uapi/asm/byteorder.h:14,
>  from include/asm-generic/bitops/le.h:6,
>  from arch/powerpc/include/asm/bitops.h:250,
>  from include/linux/bitops.h:29,
>  from include/linux/kernel.h:12,
>  from include/asm-generic/bug.h:19,
>  from arch/powerpc/include/asm/bug.h:109,
>  from include/linux/bug.h:5,
>  from include/linux/mmdebug.h:5,
>  from include/linux/gfp.h:5,
>  from include/linux/slab.h:15,
>  from drivers/tty/ehv_bytechan.c:24:
> arch/powerpc/include/asm/epapr_hcalls.h:299:20: warning: array subscript 2 =
> is outside array bounds of =E2=80=98const char[1]=E2=80=99 [-Warray-bounds]
>   299 |  r7 =3D be32_to_cpu(p[2]);
> include/uapi/linux/byteorder/big_endian.h:40:51: note: in definition of mac=
> ro =E2=80=98__be32_to_cpu=E2=80=99
>40 | #define __be32_to_cpu(x) ((__force __u32)(__be32)(x))
>   |   ^
> arch/powerpc/include/asm/epapr_hcalls.h:299:7: note: in expansion of macro =
> =E2=80=98be32_to_cpu=E2=80=99
>   299 |  r7 =3D be32_to_cpu(p[2]);
>   |   ^~~
> drivers/tty/ehv_bytechan.c:166:13: note: while referencing =E2=80=98data=E2=
> =80=99
>   166 | static void ehv_bc_udbg_putc(char c)
>   | ^~~~
> In file included from include/linux/byteorder/big_endian.h:5,
>  from arch/powerpc/include/uapi/asm/byteorder.h:14,
>  from include/asm-generic/bitops/le.h:6,
>  from arch/powerpc/include/asm/bitops.h:250,
>  from include/linux/bitops.h:29,
>  from include/linux/kernel.h:12,
>  from include/asm-generic/bug.h:19,
>  from arch/powerpc/include/asm/bug.h:109,
>  from include/linux/bug.h:5,
>  from include/linux/mmdebug.h:5,
>  from include/linux/gfp.h:5,
>  from include/linux/slab.h:15,
>  from drivers/tty/ehv_bytechan.c:24:
> arch/powerpc/include/asm/epapr_hcalls.h:300:20: warning: array subscript 3 =
> is outside array bounds of =E2=80=98const char[1]=E2=80=99 [-Warray-bounds]
>   300 |  r8 =3D be32_to_cpu(p[3]);
> include/uapi/linux/byteorder/big_endian.h:40:51: note: in definition of mac=
> ro =E2=80=98__be32_to_cpu=E2=80=99
>40 | #define __be32_to_cpu(x) ((__force __u32)(__be32)(x))
>   |   ^
> arch/powerpc/include/asm/epapr_hcalls.h:300:7: note: in expansion of macro =
> =E2=80=98be32_to_cpu=E2=80=99
>   300 |  r8 =3D be32_to_cpu(p[3]);
>   |   ^~~
> drivers/tty/ehv_bytechan.c:166:13: note: while referencing

Re: [PATCH] powerpc/pmac/smp: avoid unused-variable warnings

2020-03-17 Thread Michael Ellerman

On Fri, 2019-09-20 at 15:39:51 UTC, Ilie Halip wrote:
> When building with ppc64_defconfig, the compiler reports
> that these 2 variables are not used:
> warning: unused variable 'core99_l2_cache' [-Wunused-variable]
> warning: unused variable 'core99_l3_cache' [-Wunused-variable]
> 
> They are only used when CONFIG_PPC64 is not defined. Move
> them into a section which does the same macro check.
> 
> Reported-by: Nathan Chancellor 
> Signed-off-by: Ilie Halip 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/9451c79bc39e610882bdd12370f01af5004a3c4f

cheers

[PATCH 11/12] docs: filesystems: fix renamed references

2020-03-17 Thread Mauro Carvalho Chehab

Some filesystem references got broken by a previous patch
series I submitted. Address those.

Signed-off-by: Mauro Carvalho Chehab 
---
 Documentation/ABI/stable/sysfs-devices-node   |  2 +-
 Documentation/ABI/testing/procfs-smaps_rollup |  2 +-
 Documentation/admin-guide/cpu-load.rst|  2 +-
 .../admin-guide/kernel-parameters.txt |  2 +-
 Documentation/admin-guide/nfs/nfsroot.rst |  2 +-
 .../driver-api/driver-model/device.rst|  4 +-
 .../driver-api/driver-model/overview.rst  |  2 +-
 Documentation/filesystems/dax.txt |  2 +-
 Documentation/filesystems/dnotify.txt |  2 +-
 .../filesystems/ramfs-rootfs-initramfs.rst|  2 +-
 Documentation/filesystems/sysfs.rst   |  2 +-
 .../powerpc/firmware-assisted-dump.rst|  2 +-
 Documentation/process/adding-syscalls.rst |  2 +-
 .../it_IT/process/adding-syscalls.rst |  2 +-
 .../translations/zh_CN/filesystems/sysfs.txt  |  6 +--
 MAINTAINERS   | 54 +--
 Next/merge.log| 12 ++---
 drivers/base/core.c   |  2 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h   |  2 +-
 fs/Kconfig|  2 +-
 fs/Kconfig.binfmt |  2 +-
 fs/adfs/Kconfig   |  2 +-
 fs/affs/Kconfig   |  2 +-
 fs/afs/Kconfig|  6 +--
 fs/bfs/Kconfig|  2 +-
 fs/cramfs/Kconfig |  2 +-
 fs/ecryptfs/Kconfig   |  2 +-
 fs/hfs/Kconfig|  2 +-
 fs/hpfs/Kconfig   |  2 +-
 fs/isofs/Kconfig  |  2 +-
 fs/namespace.c|  2 +-
 fs/notify/inotify/Kconfig |  2 +-
 fs/ntfs/Kconfig   |  2 +-
 fs/ocfs2/Kconfig  |  2 +-
 fs/proc/Kconfig   |  4 +-
 fs/romfs/Kconfig  |  2 +-
 fs/sysfs/dir.c|  2 +-
 fs/sysfs/file.c   |  2 +-
 fs/sysfs/mount.c  |  2 +-
 fs/sysfs/symlink.c|  2 +-
 fs/sysv/Kconfig   |  2 +-
 fs/udf/Kconfig|  2 +-
 include/linux/kobject.h   |  2 +-
 include/linux/kobject_ns.h|  2 +-
 include/linux/relay.h |  2 +-
 include/linux/sysfs.h |  2 +-
 kernel/relay.c|  2 +-
 lib/kobject.c |  4 +-
 48 files changed, 86 insertions(+), 86 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-devices-node 
b/Documentation/ABI/stable/sysfs-devices-node
index df8413cf1468..484fc04bcc25 100644
--- a/Documentation/ABI/stable/sysfs-devices-node
+++ b/Documentation/ABI/stable/sysfs-devices-node
@@ -54,7 +54,7 @@ Date: October 2002
 Contact:   Linux Memory Management list 
 Description:
Provides information about the node's distribution and memory
-   utilization. Similar to /proc/meminfo, see 
Documentation/filesystems/proc.txt
+   utilization. Similar to /proc/meminfo, see 
Documentation/filesystems/proc.rst
 
 What:  /sys/devices/system/node/nodeX/numastat
 Date:  October 2002
diff --git a/Documentation/ABI/testing/procfs-smaps_rollup 
b/Documentation/ABI/testing/procfs-smaps_rollup
index 274df44d8b1b..046978193368 100644
--- a/Documentation/ABI/testing/procfs-smaps_rollup
+++ b/Documentation/ABI/testing/procfs-smaps_rollup
@@ -11,7 +11,7 @@ Description:
Additionally, the fields Pss_Anon, Pss_File and Pss_Shmem
are not present in /proc/pid/smaps.  These fields represent
the sum of the Pss field of each type (anon, file, shmem).
-   For more details, see Documentation/filesystems/proc.txt
+   For more details, see Documentation/filesystems/proc.rst
and the procfs man page.
 
Typical output looks like this:
diff --git a/Documentation/admin-guide/cpu-load.rst 
b/Documentation/admin-guide/cpu-load.rst
index 2d01ce43d2a2..ebdecf864080 100644
--- a/Documentation/admin-guide/cpu-load.rst
+++ b/Documentation/admin-guide/cpu-load.rst
@@ -105,7 +105,7 @@ References
 --
 
 - http://lkml.org/lkml/2007/2/12/6
-- Documentation/filesystems/proc.txt (1.8)
+- Documentation/filesystems/proc.rst (1.8)
 
 
 Thanks
diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 28be91d4e66b..192a36a66fc9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -683,7 +683,7 @@
cored

[PATCH 08/12] docs: fix broken references to text files

2020-03-17 Thread Mauro Carvalho Chehab

Several references got broken due to txt to ReST conversion.

Several of them can be automatically fixed with:

scripts/documentation-file-ref-check --fix

Signed-off-by: Mauro Carvalho Chehab 
---
 Documentation/admin-guide/kernel-parameters.txt  |  2 +-
 Documentation/memory-barriers.txt|  2 +-
 Documentation/process/submit-checklist.rst   |  2 +-
 .../translations/it_IT/process/submit-checklist.rst  |  2 +-
 Documentation/translations/ko_KR/memory-barriers.txt |  2 +-
 .../translations/zh_CN/filesystems/sysfs.txt |  2 +-
 .../translations/zh_CN/process/submit-checklist.rst  |  2 +-
 Documentation/virt/kvm/arm/pvtime.rst|  2 +-
 Documentation/virt/kvm/devices/vcpu.rst  |  2 +-
 Documentation/virt/kvm/hypercalls.rst|  4 ++--
 arch/powerpc/include/uapi/asm/kvm_para.h |  2 +-
 drivers/gpu/drm/Kconfig  |  2 +-
 drivers/gpu/drm/drm_ioctl.c  |  2 +-
 drivers/hwtracing/coresight/Kconfig  |  2 +-
 fs/fat/Kconfig   |  8 
 fs/fuse/Kconfig  |  2 +-
 fs/fuse/dev.c|  2 +-
 fs/overlayfs/Kconfig |  6 +++---
 include/linux/mm.h   |  4 ++--
 include/uapi/linux/ethtool_netlink.h |  2 +-
 include/uapi/rdma/rdma_user_ioctl_cmds.h |  2 +-
 mm/gup.c | 12 ++--
 virt/kvm/arm/vgic/vgic-mmio-v3.c |  2 +-
 virt/kvm/arm/vgic/vgic.h |  4 ++--
 24 files changed, 37 insertions(+), 37 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index df34a4176e58..28be91d4e66b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -960,7 +960,7 @@
edid/1680x1050.bin, or edid/1920x1080.bin is given
and no file with the same name exists. Details and
instructions how to build your own EDID data are
-   available in Documentation/driver-api/edid.rst. An EDID
+   available in Documentation/admin-guide/edid.rst. An EDID
data set will only be used for a particular connector,
if its name and a colon are prepended to the EDID
name. Each connector may use a unique EDID data
diff --git a/Documentation/memory-barriers.txt 
b/Documentation/memory-barriers.txt
index e1c355e84edd..eaabc3134294 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -620,7 +620,7 @@ because the CPUs that the Linux kernel supports don't do 
writes
 until they are certain (1) that the write will actually happen, (2)
 of the location of the write, and (3) of the value to be written.
 But please carefully read the "CONTROL DEPENDENCIES" section and the
-Documentation/RCU/rcu_dereference.txt file:  The compiler can and does
+Documentation/RCU/rcu_dereference.rst file:  The compiler can and does
 break dependencies in a great many highly creative ways.
 
CPU 1 CPU 2
diff --git a/Documentation/process/submit-checklist.rst 
b/Documentation/process/submit-checklist.rst
index 8e56337d422d..3f8e9d5d95c2 100644
--- a/Documentation/process/submit-checklist.rst
+++ b/Documentation/process/submit-checklist.rst
@@ -107,7 +107,7 @@ and elsewhere regarding submitting Linux kernel patches.
 and why.
 
 26) If any ioctl's are added by the patch, then also update
-``Documentation/ioctl/ioctl-number.rst``.
+``Documentation/userspace-api/ioctl/ioctl-number.rst``.
 
 27) If your modified source code depends on or uses any of the kernel
 APIs or features that are related to the following ``Kconfig`` symbols,
diff --git a/Documentation/translations/it_IT/process/submit-checklist.rst 
b/Documentation/translations/it_IT/process/submit-checklist.rst
index 995ee69fab11..3e575502690f 100644
--- a/Documentation/translations/it_IT/process/submit-checklist.rst
+++ b/Documentation/translations/it_IT/process/submit-checklist.rst
@@ -117,7 +117,7 @@ sottomissione delle patch, in particolare
 sorgenti che ne spieghi la logica: cosa fanno e perché.
 
 25) Se la patch aggiunge nuove chiamate ioctl, allora aggiornate
-``Documentation/ioctl/ioctl-number.rst``.
+``Documentation/userspace-api/ioctl/ioctl-number.rst``.
 
 26) Se il codice che avete modificato dipende o usa una qualsiasi interfaccia o
 funzionalità del kernel che è associata a uno dei seguenti simboli
diff --git a/Documentation/translations/ko_KR/memory-barriers.txt 
b/Documentation/translations/ko_KR/memory-barriers.txt
index 2e831ece6e26..e50fe6541335 100644
--- a/Documentati

[PATCH] tpm2: Export tpm2_get_cc_attrs_tbl for ibmvtpm driver as module

2020-03-17 Thread Stefan Berger

From: Stefan Berger 

This patch fixes the following problem when the ibmvtpm driver
is built as a module:

ERROR: modpost: "tpm2_get_cc_attrs_tbl" [drivers/char/tpm/tpm_ibmvtpm.ko] 
undefined!
make[1]: *** [scripts/Makefile.modpost:94: __modpost] Error 1
make: *** [Makefile:1298: modules] Error 2

Signed-off-by: Stefan Berger 
---
 drivers/char/tpm/tpm2-cmd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c
index 76f67b155bd5..eff1f12d981a 100644
--- a/drivers/char/tpm/tpm2-cmd.c
+++ b/drivers/char/tpm/tpm2-cmd.c
@@ -681,6 +681,7 @@ int tpm2_get_cc_attrs_tbl(struct tpm_chip *chip)
rc = -ENODEV;
return rc;
 }
+EXPORT_SYMBOL_GPL(tpm2_get_cc_attrs_tbl);
 
 /**
  * tpm2_startup - turn on the TPM
-- 
2.23.0

Re: Slub: Increased mem consumption on cpu,mem-less node powerpc guest

2020-03-17 Thread Bharata B Rao

On Tue, Mar 17, 2020 at 02:56:28PM +0530, Bharata B Rao wrote:
> Case 1: 2 node NUMA, node0 empty
> 
> # numactl -H
> available: 2 nodes (0-1)
> node 0 cpus:
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 1 cpus: 0 1 2 3 4 5 6 7
> node 1 size: 16294 MB
> node 1 free: 15453 MB
> node distances:
> node   0   1 
>   0:  10  40 
>   1:  40  10 
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 17dc00e33115..888e4d245444 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1971,10 +1971,8 @@ static void *get_partial(struct kmem_cache *s, gfp_t 
> flags, int node,
>   void *object;
>   int searchnode = node;
>  
> - if (node == NUMA_NO_NODE)
> + if (node == NUMA_NO_NODE || !node_present_pages(node))
>   searchnode = numa_mem_id();
> - else if (!node_present_pages(node))
> - searchnode = node_to_mem_node(node);

For the above topology, I see this:

node_to_mem_node(1) = 1
node_to_mem_node(0) = 0
node_to_mem_node(NUMA_NO_NODE) = 0

Looks like the last two cases (returning memory-less node 0) is the
problem here?

Regards,
Bharata.

Re: [PATCH v3] powerpc/fsl-85xx: fix compile error

2020-03-17 Thread Michael Ellerman

王文虎  writes:
> From: Michael Ellerman 
>  Date: 2020-03-16 17:41:12
> To:WANG Wenhu ,Benjamin Herrenschmidt 
> ,Paul Mackerras ,WANG Wenhu 
> ,Allison Randal ,Richard Fontana 
> ,Greg Kroah-Hartman ,Thomas 
> Gleixner 
> ,linuxppc-dev@lists.ozlabs.org,linux-ker...@vger.kernel.org
>  cc: triv...@kernel.org,ker...@vivo.com,stable 
> Subject: Re: [PATCH v3] powerpc/fsl-85xx: fix compile error>WANG Wenhu 
>  writes:
>>> Include "linux/of_address.h" to fix the compile error for
>>> mpc85xx_l2ctlr_of_probe() when compiling fsl_85xx_cache_sram.c.
>>>
>>>   CC  arch/powerpc/sysdev/fsl_85xx_l2ctlr.o
>>> arch/powerpc/sysdev/fsl_85xx_l2ctlr.c: In function 
>>> ‘mpc85xx_l2ctlr_of_probe’:
>>> arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:11: error: implicit declaration of 
>>> function ‘of_iomap’; did you mean ‘pci_iomap’? 
>>> [-Werror=implicit-function-declaration]
>>>   l2ctlr = of_iomap(dev->dev.of_node, 0);
>>>^~~~
>>>pci_iomap
>>> arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:9: error: assignment makes pointer 
>>> from integer without a cast [-Werror=int-conversion]
>>>   l2ctlr = of_iomap(dev->dev.of_node, 0);
>>>  ^
>>> cc1: all warnings being treated as errors
>>> scripts/Makefile.build:267: recipe for target 
>>> 'arch/powerpc/sysdev/fsl_85xx_l2ctlr.o' failed
>>> make[2]: *** [arch/powerpc/sysdev/fsl_85xx_l2ctlr.o] Error 1
>>>
>>> Fixes: commit 6db92cc9d07d ("powerpc/85xx: add cache-sram support")
>>
>>The syntax is:
>>
>>Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support")
>>
>>> Cc: stable 
>>
>>The commit above went into v2.6.37.
>>
>>So no one has noticed this bug since then, how? Or did something else
>>change to expose the problem?
>
> Actually a hard question to answer cause it also left me scratching for long.
> However, I can not find right or definite answer.

Oh, actually it's fairly straight forward, the code can't be built at
all in upstream because CONFIG_FSL_85XX_CACHE_SRAM is not selectable or
selected by anything.

You sent a patch previously to make it selectable, which Scott thought
was a bad idea.

So this whole file is dead code as far as I'm concerned, so patches for
it definitely do not need to go to stable.

If you want to add a user for it then please send a series doing that,
and this commit can be the first.

cheers

[PATCH] treewide: Rename "unencrypted" to "decrypted"

2020-03-17 Thread Borislav Petkov

Hi all,

this hasn't been fully tested yet but it is mechanical rename only so
there shouldn't be any problems (famous last words :-)).

I'll run it through the randconfig bench today and take it through tip if
there are no objections.

Thx.

---

Back then when the whole SME machinery started getting mainlined, it
was agreed that for simplicity, clarity and sanity's sake, the terms
denoting encrypted and not-encrypted memory should be "encrypted" and
"decrypted". And the majority of the code sticks to that convention
except those two. So rename them.

No functional changes.

Signed-off-by: Borislav Petkov 
---
 arch/powerpc/platforms/pseries/Kconfig |  2 +-
 arch/s390/Kconfig  |  2 +-
 arch/x86/Kconfig   |  2 +-
 arch/x86/mm/mem_encrypt.c  |  4 ++--
 include/linux/dma-direct.h |  8 
 kernel/dma/Kconfig |  2 +-
 kernel/dma/direct.c| 14 +++---
 kernel/dma/mapping.c   |  2 +-
 8 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index 24c18362e5ea..a78e2c3e1d92 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -151,7 +151,7 @@ config PPC_SVM
depends on PPC_PSERIES
select SWIOTLB
select ARCH_HAS_MEM_ENCRYPT
-   select ARCH_HAS_FORCE_DMA_UNENCRYPTED
+   select ARCH_HAS_FORCE_DMA_DECRYPTED
help
 There are certain POWER platforms which support secure guests using
 the Protected Execution Facility, with the help of an Ultravisor
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 8abe77536d9d..ab1dbb7415b4 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -192,7 +192,7 @@ config S390
select VIRT_CPU_ACCOUNTING
select ARCH_HAS_SCALED_CPUTIME
select HAVE_NMI
-   select ARCH_HAS_FORCE_DMA_UNENCRYPTED
+   select ARCH_HAS_FORCE_DMA_DECRYPTED
select SWIOTLB
select GENERIC_ALLOCATOR
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index beea77046f9b..2ae904f505e1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1525,7 +1525,7 @@ config AMD_MEM_ENCRYPT
depends on X86_64 && CPU_SUP_AMD
select DYNAMIC_PHYSICAL_MASK
select ARCH_USE_MEMREMAP_PROT
-   select ARCH_HAS_FORCE_DMA_UNENCRYPTED
+   select ARCH_HAS_FORCE_DMA_DECRYPTED
---help---
  Say yes to enable support for the encryption of system memory.
  This requires an AMD processor that supports Secure Memory
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index a03614bd3e1a..66d09f269e6d 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -350,8 +350,8 @@ bool sev_active(void)
return sme_me_mask && sev_enabled;
 }
 
-/* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */
-bool force_dma_unencrypted(struct device *dev)
+/* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_DECRYPTED */
+bool force_dma_decrypted(struct device *dev)
 {
/*
 * For SEV, all DMA must be to unencrypted addresses.
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 24b8684aa21d..9f955844e9c7 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -26,14 +26,14 @@ static inline phys_addr_t __dma_to_phys(struct device *dev, 
dma_addr_t dev_addr)
 }
 #endif /* !CONFIG_ARCH_HAS_PHYS_TO_DMA */
 
-#ifdef CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED
-bool force_dma_unencrypted(struct device *dev);
+#ifdef CONFIG_ARCH_HAS_FORCE_DMA_DECRYPTED
+bool force_dma_decrypted(struct device *dev);
 #else
-static inline bool force_dma_unencrypted(struct device *dev)
+static inline bool force_dma_decrypted(struct device *dev)
 {
return false;
 }
-#endif /* CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED */
+#endif /* CONFIG_ARCH_HAS_FORCE_DMA_DECRYPTED */
 
 /*
  * If memory encryption is supported, phys_to_dma will set the memory 
encryption
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 4c103a24e380..55c4147bb2b1 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -51,7 +51,7 @@ config ARCH_HAS_SYNC_DMA_FOR_CPU_ALL
 config ARCH_HAS_DMA_PREP_COHERENT
bool
 
-config ARCH_HAS_FORCE_DMA_UNENCRYPTED
+config ARCH_HAS_FORCE_DMA_DECRYPTED
bool
 
 config DMA_NONCOHERENT_CACHE_SYNC
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index ac7956c38f69..a0576c0ccacd 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -26,7 +26,7 @@ unsigned int zone_dma_bits __ro_after_init = 24;
 static inline dma_addr_t phys_to_dma_direct(struct device *dev,
phys_addr_t phys)
 {
-   if (force_dma_unencrypted(dev))
+   if (force_dma_decrypted(dev))
return __phys_to_dma(dev, phys);
return phys_to_dma(dev, phys);
 }
@@ -49,7 +49,7 @@ static gfp_t __dma_direct_opt

Re: [PATCH 15/15] powerpc/watchpoint/xmon: Support 2nd dawr

2020-03-17 Thread Christophe Leroy





Le 09/03/2020 à 09:58, Ravi Bangoria a écrit :

Add support for 2nd DAWR in xmon. With this, we can have two
simultaneous breakpoints from xmon.

Signed-off-by: Ravi Bangoria 
---
  arch/powerpc/xmon/xmon.c | 101 ++-
  1 file changed, 69 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index ac18fe3e4295..20adc83404c8 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -110,7 +110,7 @@ struct bpt {
  
  #define NBPTS	256

  static struct bpt bpts[NBPTS];
-static struct bpt dabr;
+static struct bpt dabr[HBP_NUM_MAX];
  static struct bpt *iabr;
  static unsigned bpinstr = 0x7fe8; /* trap */
  
@@ -786,10 +786,17 @@ static int xmon_sstep(struct pt_regs *regs)
  
  static int xmon_break_match(struct pt_regs *regs)

  {
+   int i;
+
if ((regs->msr & (MSR_IR|MSR_PR|MSR_64BIT)) != (MSR_IR|MSR_64BIT))
return 0;
-   if (dabr.enabled == 0)
-   return 0;
+   for (i = 0; i < nr_wp_slots(); i++) {
+   if (dabr[i].enabled)
+   goto found;
+   }
+   return 0;
+
+found:
xmon_core(regs, 0);
return 1;
  }
@@ -928,13 +935,16 @@ static void insert_bpts(void)
  
  static void insert_cpu_bpts(void)

  {
+   int i;
struct arch_hw_breakpoint brk;
  
-	if (dabr.enabled) {

-   brk.address = dabr.address;
-   brk.type = (dabr.enabled & HW_BRK_TYPE_DABR) | 
HW_BRK_TYPE_PRIV_ALL;
-   brk.len = DABR_MAX_LEN;
-   __set_breakpoint(&brk, 0);
+   for (i = 0; i < nr_wp_slots(); i++) {
+   if (dabr[i].enabled) {
+   brk.address = dabr[i].address;
+   brk.type = (dabr[i].enabled & HW_BRK_TYPE_DABR) | 
HW_BRK_TYPE_PRIV_ALL;
+   brk.len = 8;
+   __set_breakpoint(&brk, i);
+   }
}
  
  	if (iabr)

@@ -1348,6 +1358,35 @@ static long check_bp_loc(unsigned long addr)
return 1;
  }
  
+static int free_data_bpt(void)


This names suggests the function frees a breakpoint.
I guess it should be find_free_data_bpt()


+{
+   int i;
+
+   for (i = 0; i < nr_wp_slots(); i++) {
+   if (!dabr[i].enabled)
+   return i;
+   }
+   printf("Couldn't find free breakpoint register\n");
+   return -1;
+}
+
+static void print_data_bpts(void)
+{
+   int i;
+
+   for (i = 0; i < nr_wp_slots(); i++) {
+   if (!dabr[i].enabled)
+   continue;
+
+   printf("   data   "REG"  [", dabr[i].address);
+   if (dabr[i].enabled & 1)
+   printf("r");
+   if (dabr[i].enabled & 2)
+   printf("w");
+   printf("]\n");
+   }
+}
+
  static char *breakpoint_help_string =
  "Breakpoint command usage:\n"
  "bshow breakpoints\n"
@@ -1381,10 +1420,9 @@ bpt_cmds(void)
printf("Hardware data breakpoint not supported on this 
cpu\n");
break;
}
-   if (dabr.enabled) {
-   printf("Couldn't find free breakpoint register\n");
+   i = free_data_bpt();
+   if (i < 0)
break;
-   }
mode = 7;
cmd = inchar();
if (cmd == 'r')
@@ -1393,15 +1431,15 @@ bpt_cmds(void)
mode = 6;
else
termch = cmd;
-   dabr.address = 0;
-   dabr.enabled = 0;
-   if (scanhex(&dabr.address)) {
-   if (!is_kernel_addr(dabr.address)) {
+   dabr[i].address = 0;
+   dabr[i].enabled = 0;
+   if (scanhex(&dabr[i].address)) {
+   if (!is_kernel_addr(dabr[i].address)) {
printf(badaddr);
break;
}
-   dabr.address &= ~HW_BRK_TYPE_DABR;
-   dabr.enabled = mode | BP_DABR;
+   dabr[i].address &= ~HW_BRK_TYPE_DABR;
+   dabr[i].enabled = mode | BP_DABR;
}
  
  		force_enable_xmon();

@@ -1440,7 +1478,9 @@ bpt_cmds(void)
for (i = 0; i < NBPTS; ++i)
bpts[i].enabled = 0;
iabr = NULL;
-   dabr.enabled = 0;
+   for (i = 0; i < nr_wp_slots(); i++)
+   dabr[i].enabled = 0;
+
printf("All breakpoints cleared\n");
break;
}
@@ -1474,14 +1514,7 @@ bpt_cmds(void)
if (xmon_is_ro || !scanhex(&a)) {
/* print all breakpoints */
printf("   type

Re: [PATCH 14/15] powerpc/watchpoint/xmon: Don't allow breakpoint overwriting

2020-03-17 Thread Christophe Leroy





Le 09/03/2020 à 09:58, Ravi Bangoria a écrit :

Xmon allows overwriting breakpoints because it's supported by only
one dawr. But with multiple dawrs, overwriting becomes ambiguous
or unnecessary complicated. So let's not allow it.


Could we drop this completely (I mean the functionnality, not the patch).

Christophe



Signed-off-by: Ravi Bangoria 
---
  arch/powerpc/xmon/xmon.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 0ca0d29f99c6..ac18fe3e4295 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -1381,6 +1381,10 @@ bpt_cmds(void)
printf("Hardware data breakpoint not supported on this 
cpu\n");
break;
}
+   if (dabr.enabled) {
+   printf("Couldn't find free breakpoint register\n");
+   break;
+   }
mode = 7;
cmd = inchar();
if (cmd == 'r')

Re: [PATCH 13/15] powerpc/watchpoint: Don't allow concurrent perf and ptrace events

2020-03-17 Thread Christophe Leroy





Le 09/03/2020 à 09:58, Ravi Bangoria a écrit :

ptrace and perf watchpoints on powerpc behaves differently. Ptrace


On the 8xx, ptrace generates signal after executing the instruction.


watchpoint works in one-shot mode and generates signal before executing
instruction. It's ptrace user's job to single-step the instruction and
re-enable the watchpoint. OTOH, in case of perf watchpoint, kernel
emulates/single-steps the instruction and then generates event. If perf
and ptrace creates two events with same or overlapping address ranges,
it's ambiguous to decide who should single-step the instruction. Because
of this issue ptrace and perf event can't coexist when the address range
overlaps.


Ok, and then ? What's the purpose of this (big) patch ?



Signed-off-by: Ravi Bangoria 
---
  arch/powerpc/include/asm/hw_breakpoint.h |   2 +
  arch/powerpc/kernel/hw_breakpoint.c  | 220 +++
  kernel/events/hw_breakpoint.c|  16 ++
  3 files changed, 238 insertions(+)

diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
b/arch/powerpc/include/asm/hw_breakpoint.h
index ec61e2b7195c..6e1a19af5177 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -66,6 +66,8 @@ extern int hw_breakpoint_exceptions_notify(struct 
notifier_block *unused,
unsigned long val, void *data);
  int arch_install_hw_breakpoint(struct perf_event *bp);
  void arch_uninstall_hw_breakpoint(struct perf_event *bp);
+int arch_reserve_bp_slot(struct perf_event *bp);
+void arch_release_bp_slot(struct perf_event *bp);
  void arch_unregister_hw_breakpoint(struct perf_event *bp);
  void hw_breakpoint_pmu_read(struct perf_event *bp);
  extern void flush_ptrace_hw_breakpoint(struct task_struct *tsk);
diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index 2ac89b92590f..d8529d9151e8 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -123,6 +123,226 @@ static bool is_ptrace_bp(struct perf_event *bp)
return (bp->overflow_handler == ptrace_triggered);
  }
  
+struct breakpoint {

+   struct list_head list;
+   struct perf_event *bp;
+   bool ptrace_bp;
+};


Don't we have an equivalent struct already ?


+
+static DEFINE_PER_CPU(struct breakpoint *, cpu_bps[HBP_NUM_MAX]);
+static LIST_HEAD(task_bps);
+
+static struct breakpoint *alloc_breakpoint(struct perf_event *bp)
+{
+   struct breakpoint *tmp;
+
+   tmp = kzalloc(sizeof(*tmp), GFP_KERNEL);
+   if (!tmp)
+   return ERR_PTR(-ENOMEM);
+   tmp->bp = bp;
+   tmp->ptrace_bp = is_ptrace_bp(bp);
+   return tmp;
+}
+
+static bool bp_addr_range_overlap(struct perf_event *bp1, struct perf_event 
*bp2)
+{
+   __u64 bp1_saddr, bp1_eaddr, bp2_saddr, bp2_eaddr;
+
+   bp1_saddr = bp1->attr.bp_addr & ~HW_BREAKPOINT_ALIGN;
+   bp1_eaddr = (bp1->attr.bp_addr + bp1->attr.bp_len - 1) | 
HW_BREAKPOINT_ALIGN;
+   bp2_saddr = bp2->attr.bp_addr & ~HW_BREAKPOINT_ALIGN;
+   bp2_eaddr = (bp2->attr.bp_addr + bp2->attr.bp_len - 1) | 
HW_BREAKPOINT_ALIGN;
+
+   return (bp1_saddr <= bp2_eaddr && bp1_eaddr >= bp2_saddr);


Would be better with something like (HW_BREAKPOINT_SIZE needs to be 
defined).


bp1_saddr = ALIGN_DOWN(bp1->attr.bp_addr, HW_BREAKPOINT_SIZE);
bp1_eaddr = ALIGN(bp1->attr.bp_addr, HW_BREAKPOINT_SIZE);
bp2_saddr = ALIGN_DOWN(bp2->attr.bp_addr, HW_BREAKPOINT_SIZE);
bp2_eaddr = ALIGN(bp2->attr.bp_addr, HW_BREAKPOINT_SIZE);

return (bp1_saddr < bp2_eaddr && bp1_eaddr > bp2_saddr);


+}
+
+static bool alternate_infra_bp(struct breakpoint *b, struct perf_event *bp)
+{
+   return is_ptrace_bp(bp) ? !b->ptrace_bp : b->ptrace_bp;
+}
+
+static bool can_co_exist(struct breakpoint *b, struct perf_event *bp)
+{
+   return !(alternate_infra_bp(b, bp) && bp_addr_range_overlap(b->bp, bp));
+}
+
+static int task_bps_add(struct perf_event *bp)
+{
+   struct breakpoint *tmp;
+
+   tmp = alloc_breakpoint(bp);
+   if (IS_ERR(tmp))
+   return PTR_ERR(tmp);
+
+   list_add(&tmp->list, &task_bps);
+   return 0;
+}
+
+static void task_bps_remove(struct perf_event *bp)
+{
+   struct list_head *pos, *q;
+   struct breakpoint *tmp;
+
+   list_for_each_safe(pos, q, &task_bps) {
+   tmp = list_entry(pos, struct breakpoint, list);
+
+   if (tmp->bp == bp) {
+   list_del(&tmp->list);
+   kfree(tmp);
+   break;
+   }
+   }
+}
+
+/*
+ * If any task has breakpoint from alternate infrastructure,
+ * return true. Otherwise return false.
+ */
+static bool all_task_bps_check(struct perf_event *bp)
+{
+   struct breakpoint *tmp;
+
+   list_for_each_entry(tmp, &task_bps, list) {
+   if (!can_co_exist(tmp, bp))
+   return true;
+   }
+

Re: [PATCH v2 8/8] mm/memory_hotplug: allow to specify a default online_type

2020-03-17 Thread David Hildenbrand

On 17.03.20 11:49, David Hildenbrand wrote:
> For now, distributions implement advanced udev rules to essentially
> - Don't online any hotplugged memory (s390x)
> - Online all memory to ZONE_NORMAL (e.g., most virt environments like
>   hyperv)
> - Online all memory to ZONE_MOVABLE in case the zone imbalance is taken
>   care of (e.g., bare metal, special virt environments)
> 
> In summary: All memory is usually onlined the same way, however, the
> kernel always has to ask user space to come up with the same answer.
> E.g., Hyper-V always waits for a memory block to get onlined before
> continuing, otherwise it might end up adding memory faster than
> hotplugging it, which can result in strange OOM situations. This waiting

s/hotplugging/onlining/

-- 
Thanks,

David / dhildenb

Re: [PATCH v2 8/8] mm/memory_hotplug: allow to specify a default online_type

2020-03-17 Thread David Hildenbrand

On 17.03.20 12:01, Michal Hocko wrote:
> On Tue 17-03-20 11:49:42, David Hildenbrand wrote:
>> For now, distributions implement advanced udev rules to essentially
>> - Don't online any hotplugged memory (s390x)
>> - Online all memory to ZONE_NORMAL (e.g., most virt environments like
>>   hyperv)
>> - Online all memory to ZONE_MOVABLE in case the zone imbalance is taken
>>   care of (e.g., bare metal, special virt environments)
>>
>> In summary: All memory is usually onlined the same way, however, the
>> kernel always has to ask user space to come up with the same answer.
>> E.g., Hyper-V always waits for a memory block to get onlined before
>> continuing, otherwise it might end up adding memory faster than
>> hotplugging it, which can result in strange OOM situations. This waiting
>> slows down adding of a bigger amount of memory.
>>
>> Let's allow to specify a default online_type, not just "online" and
>> "offline". This allows distributions to configure the default online_type
>> when booting up and be done with it.
>>
>> We can now specify "offline", "online", "online_movable" and
>> "online_kernel" via
>> - "memhp_default_state=" on the kernel cmdline
>> - /sys/devices/system/memory/auto_online_blocks
>> just like we are able to specify for a single memory block via
>> /sys/devices/system/memory/memoryX/state
>>
>> Reviewed-by: Wei Yang 
>> Cc: Greg Kroah-Hartman 
>> Cc: Andrew Morton 
>> Cc: Michal Hocko 
>> Cc: Oscar Salvador 
>> Cc: "Rafael J. Wysocki" 
>> Cc: Baoquan He 
>> Cc: Wei Yang 
>> Signed-off-by: David Hildenbrand 
> 
> As I've said earlier and several times already, I really dislike this
> interface. But it is fact that this patch doesn't make it any worse.
> Quite contrary, so feel free to add
> Acked-by: Michal Hocko 

Thanks Michal!

-- 
Thanks,

David / dhildenb

Re: [PATCH v2 8/8] mm/memory_hotplug: allow to specify a default online_type

2020-03-17 Thread Michal Hocko

On Tue 17-03-20 11:49:42, David Hildenbrand wrote:
> For now, distributions implement advanced udev rules to essentially
> - Don't online any hotplugged memory (s390x)
> - Online all memory to ZONE_NORMAL (e.g., most virt environments like
>   hyperv)
> - Online all memory to ZONE_MOVABLE in case the zone imbalance is taken
>   care of (e.g., bare metal, special virt environments)
> 
> In summary: All memory is usually onlined the same way, however, the
> kernel always has to ask user space to come up with the same answer.
> E.g., Hyper-V always waits for a memory block to get onlined before
> continuing, otherwise it might end up adding memory faster than
> hotplugging it, which can result in strange OOM situations. This waiting
> slows down adding of a bigger amount of memory.
> 
> Let's allow to specify a default online_type, not just "online" and
> "offline". This allows distributions to configure the default online_type
> when booting up and be done with it.
> 
> We can now specify "offline", "online", "online_movable" and
> "online_kernel" via
> - "memhp_default_state=" on the kernel cmdline
> - /sys/devices/system/memory/auto_online_blocks
> just like we are able to specify for a single memory block via
> /sys/devices/system/memory/memoryX/state
> 
> Reviewed-by: Wei Yang 
> Cc: Greg Kroah-Hartman 
> Cc: Andrew Morton 
> Cc: Michal Hocko 
> Cc: Oscar Salvador 
> Cc: "Rafael J. Wysocki" 
> Cc: Baoquan He 
> Cc: Wei Yang 
> Signed-off-by: David Hildenbrand 

As I've said earlier and several times already, I really dislike this
interface. But it is fact that this patch doesn't make it any worse.
Quite contrary, so feel free to add
Acked-by: Michal Hocko 

> ---
>  drivers/base/memory.c  | 11 +--
>  include/linux/memory_hotplug.h |  2 ++
>  mm/memory_hotplug.c|  8 
>  3 files changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 8d3e16dab69f..2b09b68b9f78 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -35,7 +35,7 @@ static const char *const online_type_to_str[] = {
>   [MMOP_ONLINE_MOVABLE] = "online_movable",
>  };
>  
> -static int memhp_online_type_from_str(const char *str)
> +int memhp_online_type_from_str(const char *str)
>  {
>   int i;
>  
> @@ -394,13 +394,12 @@ static ssize_t auto_online_blocks_store(struct device 
> *dev,
>   struct device_attribute *attr,
>   const char *buf, size_t count)
>  {
> - if (sysfs_streq(buf, "online"))
> - memhp_default_online_type = MMOP_ONLINE;
> - else if (sysfs_streq(buf, "offline"))
> - memhp_default_online_type = MMOP_OFFLINE;
> - else
> + const int online_type = memhp_online_type_from_str(buf);
> +
> + if (online_type < 0)
>   return -EINVAL;
>  
> + memhp_default_online_type = online_type;
>   return count;
>  }
>  
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index c6e090b34c4b..ef55115320fb 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -117,6 +117,8 @@ extern int arch_add_memory(int nid, u64 start, u64 size,
>   struct mhp_restrictions *restrictions);
>  extern u64 max_mem_size;
>  
> +extern int memhp_online_type_from_str(const char *str);
> +
>  /* Default online_type (MMOP_*) when new memory blocks are added. */
>  extern int memhp_default_online_type;
>  /* If movable_node boot option specified */
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 1975a2b99a2b..9916977b6ee1 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -74,10 +74,10 @@ int memhp_default_online_type = MMOP_ONLINE;
>  
>  static int __init setup_memhp_default_state(char *str)
>  {
> - if (!strcmp(str, "online"))
> - memhp_default_online_type = MMOP_ONLINE;
> - else if (!strcmp(str, "offline"))
> - memhp_default_online_type = MMOP_OFFLINE;
> + const int online_type = memhp_online_type_from_str(str);
> +
> + if (online_type >= 0)
> + memhp_default_online_type = online_type;
>  
>   return 1;
>  }
> -- 
> 2.24.1

-- 
Michal Hocko
SUSE Labs

Re: [PATCH v2 7/8] mm/memory_hotplug: convert memhp_auto_online to store an online_type

2020-03-17 Thread Michal Hocko

On Tue 17-03-20 11:49:41, David Hildenbrand wrote:
> ... and rename it to memhp_default_online_type. This is a preparation
> for more detailed default online behavior.
> 
> Reviewed-by: Wei Yang 
> Cc: Greg Kroah-Hartman 
> Cc: Andrew Morton 
> Cc: Michal Hocko 
> Cc: Oscar Salvador 
> Cc: "Rafael J. Wysocki" 
> Cc: Baoquan He 
> Cc: Wei Yang 
> Signed-off-by: David Hildenbrand 

Acked-by: Michal Hocko 

> ---
>  drivers/base/memory.c  | 10 --
>  include/linux/memory_hotplug.h |  3 ++-
>  mm/memory_hotplug.c| 11 ++-
>  3 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 8a7f29c0bf97..8d3e16dab69f 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -386,10 +386,8 @@ static DEVICE_ATTR_RO(block_size_bytes);
>  static ssize_t auto_online_blocks_show(struct device *dev,
>  struct device_attribute *attr, char *buf)
>  {
> - if (memhp_auto_online)
> - return sprintf(buf, "online\n");
> - else
> - return sprintf(buf, "offline\n");
> + return sprintf(buf, "%s\n",
> +online_type_to_str[memhp_default_online_type]);
>  }
>  
>  static ssize_t auto_online_blocks_store(struct device *dev,
> @@ -397,9 +395,9 @@ static ssize_t auto_online_blocks_store(struct device 
> *dev,
>   const char *buf, size_t count)
>  {
>   if (sysfs_streq(buf, "online"))
> - memhp_auto_online = true;
> + memhp_default_online_type = MMOP_ONLINE;
>   else if (sysfs_streq(buf, "offline"))
> - memhp_auto_online = false;
> + memhp_default_online_type = MMOP_OFFLINE;
>   else
>   return -EINVAL;
>  
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index c2e06ed5e0e9..c6e090b34c4b 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -117,7 +117,8 @@ extern int arch_add_memory(int nid, u64 start, u64 size,
>   struct mhp_restrictions *restrictions);
>  extern u64 max_mem_size;
>  
> -extern bool memhp_auto_online;
> +/* Default online_type (MMOP_*) when new memory blocks are added. */
> +extern int memhp_default_online_type;
>  /* If movable_node boot option specified */
>  extern bool movable_node_enabled;
>  static inline bool movable_node_is_enabled(void)
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 2d2aae830b92..1975a2b99a2b 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -67,17 +67,17 @@ void put_online_mems(void)
>  bool movable_node_enabled = false;
>  
>  #ifndef CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE
> -bool memhp_auto_online;
> +int memhp_default_online_type = MMOP_OFFLINE;
>  #else
> -bool memhp_auto_online = true;
> +int memhp_default_online_type = MMOP_ONLINE;
>  #endif
>  
>  static int __init setup_memhp_default_state(char *str)
>  {
>   if (!strcmp(str, "online"))
> - memhp_auto_online = true;
> + memhp_default_online_type = MMOP_ONLINE;
>   else if (!strcmp(str, "offline"))
> - memhp_auto_online = false;
> + memhp_default_online_type = MMOP_OFFLINE;
>  
>   return 1;
>  }
> @@ -990,6 +990,7 @@ static int check_hotplug_memory_range(u64 start, u64 size)
>  
>  static int online_memory_block(struct memory_block *mem, void *arg)
>  {
> + mem->online_type = memhp_default_online_type;
>   return device_online(&mem->dev);
>  }
>  
> @@ -1062,7 +1063,7 @@ int __ref add_memory_resource(int nid, struct resource 
> *res)
>   mem_hotplug_done();
>  
>   /* online pages if requested */
> - if (memhp_auto_online)
> + if (memhp_default_online_type != MMOP_OFFLINE)
>   walk_memory_blocks(start, size, NULL, online_memory_block);
>  
>   return ret;
> -- 
> 2.24.1

-- 
Michal Hocko
SUSE Labs

1 2 >

1 - 100 of 134 matches

Mail list logo