Re: [PATCH 07/15] powerpc/watchpoint: Get watchpoint count dynamically while disabling them
On 3/17/20 4:02 PM, Christophe Leroy wrote: Le 09/03/2020 à 09:57, Ravi Bangoria a écrit : Instead of disabling only one watchpooint, get num of available watchpoints dynamically and disable all of them. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/hw_breakpoint.h | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index 980ac7d9f267..ec61e2b7195c 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -75,14 +75,13 @@ extern void ptrace_triggered(struct perf_event *bp, struct perf_sample_data *data, struct pt_regs *regs); static inline void hw_breakpoint_disable(void) { - struct arch_hw_breakpoint brk; - - brk.address = 0; - brk.type = 0; - brk.len = 0; - brk.hw_len = 0; - if (ppc_breakpoint_available()) - __set_breakpoint(&brk, 0); + int i; + struct arch_hw_breakpoint null_brk = {0}; + + if (ppc_breakpoint_available()) { I think this test should go into nr_wp_slots() which should return zero when no breakpoint is available. Seems possible. Will change it in next version. Thanks, Ravi
Re: [PATCH 5/5] selftests/powerpc: Add README for GZIP engine tests
This is a good readme, the instructions for compiling and testing work. Reviewed-by: Daniel Axtens Regards, Daniel Raphael Moreira Zinsly writes: > Include a README file with the instructions to use the > testcases at selftests/powerpc/nx-gzip. > > Signed-off-by: Bulent Abali > Signed-off-by: Raphael Moreira Zinsly > --- > .../powerpc/nx-gzip/99-nx-gzip.rules | 1 + > .../testing/selftests/powerpc/nx-gzip/README | 44 +++ > 2 files changed, 45 insertions(+) > create mode 100644 tools/testing/selftests/powerpc/nx-gzip/99-nx-gzip.rules > create mode 100644 tools/testing/selftests/powerpc/nx-gzip/README > > diff --git a/tools/testing/selftests/powerpc/nx-gzip/99-nx-gzip.rules > b/tools/testing/selftests/powerpc/nx-gzip/99-nx-gzip.rules > new file mode 100644 > index ..5a7118495cb3 > --- /dev/null > +++ b/tools/testing/selftests/powerpc/nx-gzip/99-nx-gzip.rules > @@ -0,0 +1 @@ > +SUBSYSTEM=="nxgzip", KERNEL=="nx-gzip", MODE="0666" > diff --git a/tools/testing/selftests/powerpc/nx-gzip/README > b/tools/testing/selftests/powerpc/nx-gzip/README > new file mode 100644 > index ..ff0c817a65c5 > --- /dev/null > +++ b/tools/testing/selftests/powerpc/nx-gzip/README > @@ -0,0 +1,44 @@ > +Test the nx-gzip function: > += > + > +Verify that following device exists: > + /dev/crypto/nx-gzip > +If you get a permission error run as sudo or set the device permissions: > + sudo chmod go+rw /dev/crypto/nx-gzip > +However, chmod may not survive across boots. You may create a udev file such > +as: > + /etc/udev/rules.d/99-nx-gzip.rules > + > + > +Then make and run: > +$ make > +gcc -O3 -I./inc -o gzfht_test gzfht_test.c gzip_vas.c > +gcc -O3 -I./inc -o gunz_test gunz_test.c gzip_vas.c > + > + > +Compress any file using Fixed Huffman mode. Output will have a .nx.gz suffix: > +$ ./gzfht_test gzip_vas.c > +file gzip_vas.c read, 5276 bytes > +compressed 5276 to 2564 bytes total, crc32 checksum = b937a37d > + > + > +Uncompress the previous output. Output will have a .nx.gunzip suffix: > +$ ./gunz_test gzip_vas.c.nx.gz > +gzHeader FLG 0 > +00 00 00 00 04 03 > +gzHeader MTIME, XFL, OS ignored > +computed checksum b937a37d isize 149c > +stored checksum b937a37d isize 149c > +decomp is complete: fclose > + > + > +Compare two files: > +$ sha1sum gzip_vas.c.nx.gz.nx.gunzip gzip_vas.c > +f041cd8581e8d920f79f6ce7f65411be5d026c2a gzip_vas.c.nx.gz.nx.gunzip > +f041cd8581e8d920f79f6ce7f65411be5d026c2a gzip_vas.c > + > + > +Note that the code here are intended for testing the nx-gzip hardware > function. > +They are not intended for demonstrating performance or compression ratio. > +For more information and source code consider using: > +https://github.com/libnxz/power-gzip > -- > 2.21.0
Re: [PATCH 05/15] powerpc/watchpoint: Provide DAWR number to set_dawr
On 3/17/20 3:58 PM, Christophe Leroy wrote: Le 09/03/2020 à 09:57, Ravi Bangoria a écrit : Introduce new parameter 'nr' to set_dawr() which indicates which DAWR should be programed. While we are at it (In another patch I think), we should do the same to set_dabr() so that we can use both DABR and DABR2 This series is for DAWR only and does not support DABR2. I'll look at how other book3s family processors provides DABR2 supports. Thanks, Ravi
Re: [PATCH 4/5] selftests/powerpc: Add NX-GZIP engine decompress testcase
Raphael Moreira Zinsly writes: > Include a decompression testcase for the powerpc NX-GZIP > engine. I compiled gzip with the AFL++ fuzzer and generated a corpus of tests to run against this decompressor. I also fuzzed the decompressor directly. I found a few issues. I _think_ they're just in the userspace but I'm a bit too early in the process to know. I realise this is self-test code but: a) it stops me testing more deeply, and b) it looks like some of this code is shared with https://github.com/libnxz/power-gzip/ The issues I've found are: 1) In the ERR_NX_DATA_LENGTH case, the decompressor doesn't check that you're making forward progress, so you can provoke it into an infinite loop. Here's an _extremely_ ugly fix: diff --git a/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c b/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c index 653de92698cc..236a1f567656 100644 --- a/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c +++ b/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c @@ -343,6 +343,8 @@ int decompress_file(int argc, char **argv, void *devhandle) nx_dde_t dde_out[6] __attribute__((aligned (128))); int pgfault_retries; + int last_first_used = 0; + /* when using mmap'ed files */ off_t input_file_offset; @@ -642,6 +644,11 @@ int decompress_file(int argc, char **argv, void *devhandle) first_used = fifo_used_first_bytes(cur_in, used_in, fifo_in_len); last_used = fifo_used_last_bytes(cur_in, used_in, fifo_in_len); + if (first_used > 0 && last_first_used > 0) { + assert(first_used != last_first_used); + } + last_first_used = first_used; + if (first_used > 0) nx_append_dde(ddl_in, fifo_in + cur_in, first_used); 2) It looks like you can provoke an out-of-bounds write. I've seen both infinte loops printing something that seems to come from the file content like: 57201: Got signal 11 si_code 3, si_addr 0xcacacacacacacac8 or a less bizzare address like 19285: Got signal 11 si_code 1, si_addr 0x7fffcf1b Depending on the build I've also seen the stack smasher protection fire. I don't understand the code well enough to figure out how this comes to be just yet. I've included a few test cases as attachments. I've preconverted them with xxd to avoid anything that might flag suspicious gzip files! Decompress them then use `xxd -r attachment testcase.gz` to convert them back. Regards, Daniel infloop.bz2 Description: Binary data sig1.bz2 Description: Binary data sig676767.bz2 Description: Binary data sigededed.bz2 Description: Binary data > > Signed-off-by: Bulent Abali > Signed-off-by: Raphael Moreira Zinsly > --- > .../selftests/powerpc/nx-gzip/Makefile|7 +- > .../selftests/powerpc/nx-gzip/gunz_test.c | 1058 + > 2 files changed, 1062 insertions(+), 3 deletions(-) > create mode 100644 tools/testing/selftests/powerpc/nx-gzip/gunz_test.c > > diff --git a/tools/testing/selftests/powerpc/nx-gzip/Makefile > b/tools/testing/selftests/powerpc/nx-gzip/Makefile > index ab903f63bbbd..82abc19a49a0 100644 > --- a/tools/testing/selftests/powerpc/nx-gzip/Makefile > +++ b/tools/testing/selftests/powerpc/nx-gzip/Makefile > @@ -1,9 +1,9 @@ > CC = gcc > CFLAGS = -O3 > INC = ./inc > -SRC = gzfht_test.c > +SRC = gzfht_test.c gunz_test.c > OBJ = $(SRC:.c=.o) > -TESTS = gzfht_test > +TESTS = gzfht_test gunz_test > EXTRA_SOURCES = gzip_vas.c > > all: $(TESTS) > @@ -16,6 +16,7 @@ $(TESTS): $(OBJ) > > run_tests: $(TESTS) > ./gzfht_test gzip_vas.c > + ./gunz_test gzip_vas.c.nx.gz > > clean: > - rm -f $(TESTS) *.o *~ *.gz > + rm -f $(TESTS) *.o *~ *.gz *.gunzip > diff --git a/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c > b/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c > new file mode 100644 > index ..653de92698cc > --- /dev/null > +++ b/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c > @@ -0,0 +1,1058 @@ > +/* SPDX-License-Identifier: GPL-2.0-or-later > + * > + * P9 gunzip sample code for demonstrating the P9 NX hardware > + * interface. Not intended for productive uses or for performance or > + * compression ratio measurements. Note also that /dev/crypto/gzip, > + * VAS and skiboot support are required > + * > + * Copyright 2020 IBM Corp. > + * > + * Author: Bulent Abali > + * > + * https://github.com/libnxz/power-gzip for zlib api and other utils > + * Definitions of acronyms used here. See > + * P9 NX Gzip Accelerator User's Manual for details > + * > + * adler/crc: 32 bit checksums appended to stream tail > + * ce: completion extension > + * cpb: coprocessor parameter block (metadata) > + * crb: coprocessor request block (command) > + * csb: coprocessor status block (status) > + * dht: dynamic huffman table > + * dde: data descriptor element (address, length) > + * ddl: list of ddes > + * dh/fh:dynamic and fixed huffman types > + *
[PATCHv2] selftests/powerpc: Turn off timeout setting for benchmarks, dscr, signal, tm
Some specific tests in powerpc can take longer than the default 45 seconds that added in commit 852c8cbf34d3 ("selftests/kselftest/runner.sh: Add 45 second timeout per test") to run, the following test result was collected across 2 Power8 nodes and 1 Power9 node in our pool: powerpc/benchmarks/futex_bench - 52s powerpc/dscr/dscr_sysfs_test - 116s powerpc/signal/signal_fuzzer - 88s powerpc/tm/tm_unavailable_test - 168s powerpc/tm/tm-poison - 240s Thus they will fail with TIMEOUT error. Disable the timeout setting for these sub-tests to allow them finish properly. https://bugs.launchpad.net/bugs/1864642 Fixes: 852c8cbf34d3 ("selftests/kselftest/runner.sh: Add 45 second timeout per test") Signed-off-by: Po-Hsu Lin --- tools/testing/selftests/powerpc/benchmarks/Makefile | 2 ++ tools/testing/selftests/powerpc/benchmarks/settings | 1 + tools/testing/selftests/powerpc/dscr/Makefile | 2 ++ tools/testing/selftests/powerpc/dscr/settings | 1 + tools/testing/selftests/powerpc/signal/Makefile | 2 ++ tools/testing/selftests/powerpc/signal/settings | 1 + tools/testing/selftests/powerpc/tm/Makefile | 2 ++ tools/testing/selftests/powerpc/tm/settings | 1 + 8 files changed, 12 insertions(+) create mode 100644 tools/testing/selftests/powerpc/benchmarks/settings create mode 100644 tools/testing/selftests/powerpc/dscr/settings create mode 100644 tools/testing/selftests/powerpc/signal/settings create mode 100644 tools/testing/selftests/powerpc/tm/settings diff --git a/tools/testing/selftests/powerpc/benchmarks/Makefile b/tools/testing/selftests/powerpc/benchmarks/Makefile index d40300a..a32a6ab 100644 --- a/tools/testing/selftests/powerpc/benchmarks/Makefile +++ b/tools/testing/selftests/powerpc/benchmarks/Makefile @@ -2,6 +2,8 @@ TEST_GEN_PROGS := gettimeofday context_switch fork mmap_bench futex_bench null_syscall TEST_GEN_FILES := exec_target +TEST_FILES := settings + CFLAGS += -O2 top_srcdir = ../../../../.. diff --git a/tools/testing/selftests/powerpc/benchmarks/settings b/tools/testing/selftests/powerpc/benchmarks/settings new file mode 100644 index 000..e7b9417 --- /dev/null +++ b/tools/testing/selftests/powerpc/benchmarks/settings @@ -0,0 +1 @@ +timeout=0 diff --git a/tools/testing/selftests/powerpc/dscr/Makefile b/tools/testing/selftests/powerpc/dscr/Makefile index 5df4763..cfa6eed 100644 --- a/tools/testing/selftests/powerpc/dscr/Makefile +++ b/tools/testing/selftests/powerpc/dscr/Makefile @@ -3,6 +3,8 @@ TEST_GEN_PROGS := dscr_default_test dscr_explicit_test dscr_user_test \ dscr_inherit_test dscr_inherit_exec_test dscr_sysfs_test \ dscr_sysfs_thread_test +TEST_FILES := settings + top_srcdir = ../../../../.. include ../../lib.mk diff --git a/tools/testing/selftests/powerpc/dscr/settings b/tools/testing/selftests/powerpc/dscr/settings new file mode 100644 index 000..e7b9417 --- /dev/null +++ b/tools/testing/selftests/powerpc/dscr/settings @@ -0,0 +1 @@ +timeout=0 diff --git a/tools/testing/selftests/powerpc/signal/Makefile b/tools/testing/selftests/powerpc/signal/Makefile index 113838f..153fafc 100644 --- a/tools/testing/selftests/powerpc/signal/Makefile +++ b/tools/testing/selftests/powerpc/signal/Makefile @@ -5,6 +5,8 @@ CFLAGS += -maltivec $(OUTPUT)/signal_tm: CFLAGS += -mhtm $(OUTPUT)/sigfuz: CFLAGS += -pthread -m64 +TEST_FILES := settings + top_srcdir = ../../../../.. include ../../lib.mk diff --git a/tools/testing/selftests/powerpc/signal/settings b/tools/testing/selftests/powerpc/signal/settings new file mode 100644 index 000..e7b9417 --- /dev/null +++ b/tools/testing/selftests/powerpc/signal/settings @@ -0,0 +1 @@ +timeout=0 diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile index b15a1a3..7b99d09 100644 --- a/tools/testing/selftests/powerpc/tm/Makefile +++ b/tools/testing/selftests/powerpc/tm/Makefile @@ -7,6 +7,8 @@ TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack $(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-sigreturn-nt \ tm-signal-context-force-tm tm-poison +TEST_FILES := settings + top_srcdir = ../../../../.. include ../../lib.mk diff --git a/tools/testing/selftests/powerpc/tm/settings b/tools/testing/selftests/powerpc/tm/settings new file mode 100644 index 000..e7b9417 --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/settings @@ -0,0 +1 @@ +timeout=0 -- 2.7.4
Re: [PATCH 03/15] powerpc/watchpoint: Introduce function to get nr watchpoints dynamically
diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index f2f8d8aa8e3b..741c4f7573c4 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -43,6 +43,8 @@ struct arch_hw_breakpoint { #define DABR_MAX_LEN 8 #define DAWR_MAX_LEN 512 +extern int nr_wp_slots(void); 'extern' keyword is unneeded and irrelevant here. Please remove it. Even checkpatch is unhappy (https://openpower.xyz/job/snowpatch/job/snowpatch-linux-checkpatch/12172//artifact/linux/checkpatch.log) Sure. ... diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 110db94cdf3c..6d4b029532e2 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -835,6 +835,12 @@ static inline bool hw_brk_match(struct arch_hw_breakpoint *a, return true; } +/* Returns total number of data breakpoints available. */ +int nr_wp_slots(void) +{ + return HBP_NUM_MAX; +} + This is not worth a global function. At least it should be a static function located in hw_breakpoint.c. But it would be even better to have it as a static inline in asm/hw_breakpoint.h Makes sense. Will change it. Thanks.
Re: [PATCH] mm/hugetlb: Fix build failure with HUGETLB_PAGE but not HUGEBTLBFS
On Tue, 17 Mar 2020 08:04:14 + (UTC) Christophe Leroy wrote: > When CONFIG_HUGETLB_PAGE is set but not CONFIG_HUGETLBFS, the > following build failure is encoutered: > > In file included from arch/powerpc/mm/fault.c:33:0: > ./include/linux/hugetlb.h: In function 'hstate_inode': > ./include/linux/hugetlb.h:477:9: error: implicit declaration of function > 'HUGETLBFS_SB' [-Werror=implicit-function-declaration] > return HUGETLBFS_SB(i->i_sb)->hstate; > ^ > ./include/linux/hugetlb.h:477:30: error: invalid type argument of '->' (have > 'int') > return HUGETLBFS_SB(i->i_sb)->hstate; > ^ > > Gate hstate_inode() with CONFIG_HUGETLBFS instead of CONFIG_HUGETLB_PAGE. > > Reported-by: kbuild test robot > Link: https://patchwork.ozlabs.org/patch/1255548/#2386036 > Fixes: a137e1cc6d6e ("hugetlbfs: per mount huge page sizes") A 12 year old build error? If accurate, that has to be a world record. > Cc: sta...@vger.kernel.org I think I'll remove this. Obviously nobody is suffering from this problem!
Re: Slub: Increased mem consumption on cpu,mem-less node powerpc guest
On Wed, Mar 18, 2020 at 08:50:44AM +0530, Srikar Dronamraju wrote: > * Vlastimil Babka [2020-03-17 17:45:15]: > > > On 3/17/20 5:25 PM, Srikar Dronamraju wrote: > > > * Vlastimil Babka [2020-03-17 16:56:04]: > > > > > >> > > >> I wonder why do you get a memory leak while Sachin in the same situation > > >> [1] > > >> gets a crash? I don't understand anything anymore. > > > > > > Sachin was testing on linux-next which has Kirill's patch which modifies > > > slub to use kmalloc_node instead of kmalloc. While Bharata is testing on > > > upstream, which doesn't have this. > > > > Yes, that Kirill's patch was about the memcg shrinker map allocation. But > > the > > patch hunk that Bharata posted as a "hack" that fixes the problem, it > > follows > > that there has to be something else that calls kmalloc_node(node) where > > node is > > one that doesn't have present pages. > > > > He mentions alloc_fair_sched_group() which has: > > > > for_each_possible_cpu(i) { > > cfs_rq = kzalloc_node(sizeof(struct cfs_rq), > > GFP_KERNEL, cpu_to_node(i)); > > ... > > se = kzalloc_node(sizeof(struct sched_entity), > > GFP_KERNEL, cpu_to_node(i)); > > > > > Sachin's experiment. > Upstream-next/ memcg / > possible nodes were 0-31 > online nodes were 0-1 > kmalloc_node called for_each_node / for_each_possible_node. > This would crash while allocating slab from !N_ONLINE nodes. > > Bharata's experiment. > Upstream > possible nodes were 0-1 > online nodes were 0-1 > kmalloc_node called for_each_online_node/ for_each_possible_cpu > i.e kmalloc is called for N_ONLINE nodes. > So wouldn't crash > > Even if his possible nodes were 0-256. I don't think we have kmalloc_node > being called in !N_ONLINE nodes. Hence its not crashing. > If we see the above code that you quote, kzalloc_node is using cpu_to_node > which in Bharata's case will always return 1. > > > > I assume one of these structs is 1k and other 512 bytes (rounded) and that > > for > > some possible cpu's cpu_to_node(i) will be 0, which has no present pages. > > And as > > Bharata pasted, node_to_mem_node(0) = 0 Correct, these two kazalloc_node() calls for all possible cpus are causing increased slab memory consumption in my case. > > So this looks like the same scenario, but it doesn't crash? Is the node 0 > > actually online here, and/or does it have N_NORMAL_MEMORY state? > Node 0 is online, but N_NORMAL_MEMORY state is empty. In fact memory leak goes away if I insert the below check/assignment in the slab alloc code path: + if (!node_isset(node, node_states[N_NORMAL_MEMORY])) + node = NUMA_NO_NODE; Regards, Bharata.
Re: [PATCH kernel 0/5] powerpc/powenv/ioda: Allow huge DMA window at 4GB
On 18/02/2020 18:36, Alexey Kardashevskiy wrote: > Here is an attempt to support bigger DMA space for devices > supporting DMA masks less than 59 bits (GPUs come into mind > first). POWER9 PHBs have an option to map 2 windows at 0 > and select a windows based on DMA address being below or above > 4GB. > > This adds the "iommu=iommu_bypass" kernel parameter and > supports VFIO+pseries machine - current this requires telling > upstream+unmodified QEMU about this via > -global spapr-pci-host-bridge.dma64_win_addr=0x1 > or per-phb property. 4/4 advertises the new option but > there is no automation around it in QEMU (should it be?). > > For now it is either 1<<59 or 4GB mode; dynamic switching is > not supported (could be via sysfs). > > This is a rebased version of > https://lore.kernel.org/kvm/20191202015953.127902-1-...@ozlabs.ru/ > > This is based on sha1 > 71c3a888cbca Linus Torvalds "Merge tag 'powerpc-5.6-1' of > git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux". > > Please comment. Thanks. Anyone from the POWERPC side wants to comment? Thanks, > > > > Alexey Kardashevskiy (5): > powerpc/powernv/ioda: Move TCE bypass base to PE > powerpc/powernv/ioda: Rework for huge DMA window at 4GB > powerpc/powernv/ioda: Allow smaller TCE table levels > powerpc/powernv/phb4: Add 4GB IOMMU bypass mode > vfio/spapr_tce: Advertise and allow a huge DMA windows at 4GB > > arch/powerpc/include/asm/iommu.h | 2 + > arch/powerpc/include/asm/opal-api.h | 9 +- > arch/powerpc/include/asm/opal.h | 2 + > arch/powerpc/platforms/powernv/pci.h | 2 +- > include/uapi/linux/vfio.h | 2 + > arch/powerpc/platforms/powernv/npu-dma.c | 1 + > arch/powerpc/platforms/powernv/opal-call.c| 2 + > arch/powerpc/platforms/powernv/pci-ioda-tce.c | 4 +- > arch/powerpc/platforms/powernv/pci-ioda.c | 229 ++ > drivers/vfio/vfio_iommu_spapr_tce.c | 10 +- > 10 files changed, 207 insertions(+), 56 deletions(-) > -- Alexey
Re: [PATCH 4/5] selftests/powerpc: Add NX-GZIP engine decompress testcase
Raphael Moreira Zinsly writes: > Include a decompression testcase for the powerpc NX-GZIP > engine. > > Signed-off-by: Bulent Abali > Signed-off-by: Raphael Moreira Zinsly > --- > .../selftests/powerpc/nx-gzip/Makefile|7 +- > .../selftests/powerpc/nx-gzip/gunz_test.c | 1058 + > 2 files changed, 1062 insertions(+), 3 deletions(-) > create mode 100644 tools/testing/selftests/powerpc/nx-gzip/gunz_test.c > > diff --git a/tools/testing/selftests/powerpc/nx-gzip/Makefile > b/tools/testing/selftests/powerpc/nx-gzip/Makefile > index ab903f63bbbd..82abc19a49a0 100644 > --- a/tools/testing/selftests/powerpc/nx-gzip/Makefile > +++ b/tools/testing/selftests/powerpc/nx-gzip/Makefile > @@ -1,9 +1,9 @@ > CC = gcc > CFLAGS = -O3 > INC = ./inc > -SRC = gzfht_test.c > +SRC = gzfht_test.c gunz_test.c > OBJ = $(SRC:.c=.o) > -TESTS = gzfht_test > +TESTS = gzfht_test gunz_test > EXTRA_SOURCES = gzip_vas.c > > all: $(TESTS) > @@ -16,6 +16,7 @@ $(TESTS): $(OBJ) > > run_tests: $(TESTS) > ./gzfht_test gzip_vas.c > + ./gunz_test gzip_vas.c.nx.gz > > clean: > - rm -f $(TESTS) *.o *~ *.gz > + rm -f $(TESTS) *.o *~ *.gz *.gunzip > diff --git a/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c > b/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c > new file mode 100644 > index ..653de92698cc > --- /dev/null > +++ b/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c > @@ -0,0 +1,1058 @@ > +/* SPDX-License-Identifier: GPL-2.0-or-later > + * > + * P9 gunzip sample code for demonstrating the P9 NX hardware > + * interface. Not intended for productive uses or for performance or > + * compression ratio measurements. Note also that /dev/crypto/gzip, > + * VAS and skiboot support are required > + * > + * Copyright 2020 IBM Corp. > + * > + * Author: Bulent Abali > + * > + * https://github.com/libnxz/power-gzip for zlib api and other utils > + * Definitions of acronyms used here. See > + * P9 NX Gzip Accelerator User's Manual for details > + * > + * adler/crc: 32 bit checksums appended to stream tail > + * ce: completion extension > + * cpb: coprocessor parameter block (metadata) > + * crb: coprocessor request block (command) > + * csb: coprocessor status block (status) > + * dht: dynamic huffman table > + * dde: data descriptor element (address, length) > + * ddl: list of ddes > + * dh/fh:dynamic and fixed huffman types > + * fc: coprocessor function code > + * histlen: history/dictionary length > + * history: sliding window of up to 32KB of data > + * lzcount: Deflate LZ symbol counts > + * rembytecnt: remaining byte count > + * sfbt: source final block type; last block's type during decomp > + * spbc: source processed byte count > + * subc: source unprocessed bit count > + * tebc: target ending bit count; valid bits in the last byte > + * tpbc: target processed byte count > + * vas: virtual accelerator switch; the user mode interface > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include "nxu.h" > +#include "nx.h" > + > +int nx_dbg = 0; > +FILE *nx_gzip_log = NULL; > + > +#define NX_MIN(X, Y) (((X) < (Y))?(X):(Y)) > +#define NX_MAX(X, Y) (((X) > (Y))?(X):(Y)) > + > +#define mb() asm volatile("sync" ::: "memory") > +#define rmb()asm volatile("lwsync" ::: "memory") > +#define wmb()rmb() > + > +const int fifo_in_len = 1<<24; > +const int fifo_out_len = 1<<24; > +const int page_sz = 1<<16; > +const int line_sz = 1<<7; > +const int window_max = 1<<15; > +const int retry_max = 50; > + > +extern void *nx_fault_storage_address; > +extern void *nx_function_begin(int function, int pri); > +extern int nx_function_end(void *handle); > + > +/* > + * Fault in pages prior to NX job submission. wr=1 may be required to > + * touch writeable pages. System zero pages do not fault-in the page as > + * intended. Typically set wr=1 for NX target pages and set wr=0 for > + * NX source pages. > + */ > +static int nx_touch_pages(void *buf, long buf_len, long page_len, int wr) > +{ > + char *begin = buf; > + char *end = (char *) buf + buf_len - 1; > + volatile char t; > + > + assert(buf_len >= 0 && !!buf); > + > + NXPRT(fprintf(stderr, "touch %p %p len 0x%lx wr=%d\n", buf, > + buf + buf_len, buf_len, wr)); > + > + if (buf_len <= 0 || buf == NULL) > + return -1; > + > + do { > + t = *begin; > + if (wr) > + *begin = t; > + begin = begin + page_len; > + } while (begin < end); > + > + /* When buf_sz is small or buf tail is in another page. */ > + t = *end; > + if (wr) > + *end = t; > + > + return 0; > +} > + > +void sigsegv_handler(int sig, siginfo_t *info, void *ctx) > +{ >
Re: [PATCH 1/5] selftests/powerpc: Add header files for GZIP engine test
Hi, This is throwing a number of snowpatch warnings, as well as a whitespace warning when I apply it. Please could you check the warnings at https://patchwork.ozlabs.org/patch/1255779/ It looks like the rest of the series also throws some warnings - please check those also. Kind regards, Daniel Raphael Moreira Zinsly writes: > Add files to access the powerpc NX-GZIP engine in user space. > > Signed-off-by: Bulent Abali > Signed-off-by: Raphael Moreira Zinsly > --- > .../selftests/powerpc/nx-gzip/inc/crb.h | 170 ++ > .../selftests/powerpc/nx-gzip/inc/nx-gzip.h | 27 +++ > .../powerpc/nx-gzip/inc/nx-helpers.h | 53 ++ > .../selftests/powerpc/nx-gzip/inc/nx.h| 30 > 4 files changed, 280 insertions(+) > create mode 100644 tools/testing/selftests/powerpc/nx-gzip/inc/crb.h > create mode 100644 tools/testing/selftests/powerpc/nx-gzip/inc/nx-gzip.h > create mode 100644 tools/testing/selftests/powerpc/nx-gzip/inc/nx-helpers.h > create mode 100644 tools/testing/selftests/powerpc/nx-gzip/inc/nx.h > > diff --git a/tools/testing/selftests/powerpc/nx-gzip/inc/crb.h > b/tools/testing/selftests/powerpc/nx-gzip/inc/crb.h > new file mode 100644 > index ..6af25fb8461a > --- /dev/null > +++ b/tools/testing/selftests/powerpc/nx-gzip/inc/crb.h > @@ -0,0 +1,170 @@ > +/* SPDX-License-Identifier: GPL-2.0-or-later */ > +#ifndef __CRB_H > +#define __CRB_H > +#include > + > +typedef unsigned char u8; > +typedef unsigned int u32; > +typedef unsigned long long u64; > + > +/* From nx-842.h */ > + > +/* CCW 842 CI/FC masks > + * NX P8 workbook, section 4.3.1, figure 4-6 > + * "CI/FC Boundary by NX CT type" > + */ > +#define CCW_CI_842 (0x3ff8) > +#define CCW_FC_842 (0x0007) > + > +/* end - nx-842.h */ > + > +#ifndef __aligned > +#define __aligned(x)__attribute__((aligned(x))) > +#endif > + > +#ifndef __packed > +#define __packed__attribute__((packed)) > +#endif > + > +/* Chapter 6.5.8 Coprocessor-Completion Block (CCB) */ > + > +#define CCB_VALUE(0x3fff) > +#define CCB_ADDRESS (0xfff8) > +#define CCB_CM (0x0007) > +#define CCB_CM0 (0x0004) > +#define CCB_CM12 (0x0003) > + > +#define CCB_CM0_ALL_COMPLETIONS (0x0) > +#define CCB_CM0_LAST_IN_CHAIN(0x4) > +#define CCB_CM12_STORE (0x0) > +#define CCB_CM12_INTERRUPT (0x1) > + > +#define CCB_SIZE (0x10) > +#define CCB_ALIGNCCB_SIZE > + > +struct coprocessor_completion_block { > + __be64 value; > + __be64 address; > +} __packed __aligned(CCB_ALIGN); > + > + > +/* Chapter 6.5.7 Coprocessor-Status Block (CSB) */ > + > +#define CSB_V(0x80) > +#define CSB_F(0x04) > +#define CSB_CH (0x03) > +#define CSB_CE_INCOMPLETE(0x80) > +#define CSB_CE_TERMINATION (0x40) > +#define CSB_CE_TPBC (0x20) > + > +#define CSB_CC_SUCCESS (0) > +#define CSB_CC_INVALID_ALIGN (1) > +#define CSB_CC_OPERAND_OVERLAP (2) > +#define CSB_CC_DATA_LENGTH (3) > +#define CSB_CC_TRANSLATION (5) > +#define CSB_CC_PROTECTION(6) > +#define CSB_CC_RD_EXTERNAL (7) > +#define CSB_CC_INVALID_OPERAND (8) > +#define CSB_CC_PRIVILEGE (9) > +#define CSB_CC_INTERNAL (10) > +#define CSB_CC_WR_EXTERNAL (12) > +#define CSB_CC_NOSPC (13) > +#define CSB_CC_EXCESSIVE_DDE (14) > +#define CSB_CC_WR_TRANSLATION(15) > +#define CSB_CC_WR_PROTECTION (16) > +#define CSB_CC_UNKNOWN_CODE (17) > +#define CSB_CC_ABORT (18) > +#define CSB_CC_TRANSPORT (20) > +#define CSB_CC_SEGMENTED_DDL (31) > +#define CSB_CC_PROGRESS_POINT(32) > +#define CSB_CC_DDE_OVERFLOW (33) > +#define CSB_CC_SESSION (34) > +#define CSB_CC_PROVISION (36) > +#define CSB_CC_CHAIN (37) > +#define CSB_CC_SEQUENCE (38) > +#define CSB_CC_HW(39) > + > +#define CSB_SIZE (0x10) > +#define CSB_ALIGNCSB_SIZE > + > +struct coprocessor_status_block { > + u8 flags; > + u8 cs; > + u8 cc; > + u8 ce; > + __be32 count; > + __be64 address; > +} __packed __aligned(CSB_ALIGN); > + > + > +/* Chapter 6.5.10 Data-Descriptor List (DDL) > + * each list contains one or more Data-Descriptor Entries (DDE) > + */ > + > +#define DDE_P(0x8000) > + > +#define DDE_SIZE (0x10) > +#define DDE_ALIGNDDE_SIZE > + > +struct data_descriptor_entry { > + __be16 flags; > + u8 count; > + u8 index; > + __be32 length; > + __be64 address; > +} __packed __aligned(DDE_ALIGN); > + > + > +/* Chapter 6.5.2 Coprocessor-Request Block (CRB) */ > + > +#define CRB_SIZE (0x80) > +#define CRB_ALIGN(0x100) /* Errata: requires 256 alignment */ > + > + > +/* Copro
Re: Slub: Increased mem consumption on cpu,mem-less node powerpc guest
* Vlastimil Babka [2020-03-17 17:45:15]: > On 3/17/20 5:25 PM, Srikar Dronamraju wrote: > > * Vlastimil Babka [2020-03-17 16:56:04]: > > > >> > >> I wonder why do you get a memory leak while Sachin in the same situation > >> [1] > >> gets a crash? I don't understand anything anymore. > > > > Sachin was testing on linux-next which has Kirill's patch which modifies > > slub to use kmalloc_node instead of kmalloc. While Bharata is testing on > > upstream, which doesn't have this. > > Yes, that Kirill's patch was about the memcg shrinker map allocation. But the > patch hunk that Bharata posted as a "hack" that fixes the problem, it follows > that there has to be something else that calls kmalloc_node(node) where node > is > one that doesn't have present pages. > > He mentions alloc_fair_sched_group() which has: > > for_each_possible_cpu(i) { > cfs_rq = kzalloc_node(sizeof(struct cfs_rq), > GFP_KERNEL, cpu_to_node(i)); > ... > se = kzalloc_node(sizeof(struct sched_entity), > GFP_KERNEL, cpu_to_node(i)); > Sachin's experiment. Upstream-next/ memcg / possible nodes were 0-31 online nodes were 0-1 kmalloc_node called for_each_node / for_each_possible_node. This would crash while allocating slab from !N_ONLINE nodes. Bharata's experiment. Upstream possible nodes were 0-1 online nodes were 0-1 kmalloc_node called for_each_online_node/ for_each_possible_cpu i.e kmalloc is called for N_ONLINE nodes. So wouldn't crash Even if his possible nodes were 0-256. I don't think we have kmalloc_node being called in !N_ONLINE nodes. Hence its not crashing. If we see the above code that you quote, kzalloc_node is using cpu_to_node which in Bharata's case will always return 1. > I assume one of these structs is 1k and other 512 bytes (rounded) and that for > some possible cpu's cpu_to_node(i) will be 0, which has no present pages. And > as > Bharata pasted, node_to_mem_node(0) = 0 > So this looks like the same scenario, but it doesn't crash? Is the node 0 > actually online here, and/or does it have N_NORMAL_MEMORY state? I still dont have any clue on the leak though. -- Thanks and Regards Srikar Dronamraju
[PATCH] soc: fsl: enable acpi support
This patch enables ACPI support in Rcpm driver. Signed-off-by: Peng Ma --- drivers/soc/fsl/rcpm.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/soc/fsl/rcpm.c b/drivers/soc/fsl/rcpm.c index a093dbe..7da6bbd 100644 --- a/drivers/soc/fsl/rcpm.c +++ b/drivers/soc/fsl/rcpm.c @@ -13,6 +13,7 @@ #include #include #include +#include #define RCPM_WAKEUP_CELL_MAX_SIZE 7 @@ -139,10 +140,17 @@ static const struct of_device_id rcpm_of_match[] = { }; MODULE_DEVICE_TABLE(of, rcpm_of_match); +static const struct acpi_device_id rcpm_imx_acpi_ids[] = { + {"NXP0012",}, + { } +}; +MODULE_DEVICE_TABLE(acpi, rcpm_imx_acpi_ids); + static struct platform_driver rcpm_driver = { .driver = { .name = "rcpm", .of_match_table = rcpm_of_match, + .acpi_match_table = ACPI_PTR(rcpm_imx_acpi_ids), .pm = &rcpm_pm_ops, }, .probe = rcpm_probe, -- 2.9.5
[powerpc:merge] BUILD SUCCESS 8a445cbcb9f5090cb07ec6cbb89a8a1fc99a0ff7
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge branch HEAD: 8a445cbcb9f5090cb07ec6cbb89a8a1fc99a0ff7 Automatic merge of branches 'master', 'next' and 'fixes' into merge elapsed time: 785m configs tested: 156 configs skipped: 0 The following configs have been built successfully. More configs may be tested in the coming days. arm64allyesconfig arm allyesconfig arm64 allnoconfig arm allnoconfig arm at91_dt_defconfig arm efm32_defconfig arm exynos_defconfig armmulti_v5_defconfig armmulti_v7_defconfig armshmobile_defconfig arm sunxi_defconfig arm64 defconfig sparcallyesconfig arm64allmodconfig mips fuloong2e_defconfig s390defconfig openrisc simple_smp_defconfig riscv allnoconfig s390 allyesconfig i386 allnoconfig i386 alldefconfig i386 allyesconfig i386defconfig ia64 alldefconfig ia64 allmodconfig ia64 allnoconfig ia64 allyesconfig ia64defconfig arm allmodconfig c6x allyesconfig c6xevmc6678_defconfig nios2 10m50_defconfig nios2 3c120_defconfig openriscor1ksim_defconfig xtensa common_defconfig xtensa iss_defconfig alpha defconfig cskydefconfig nds32 allnoconfig nds32 defconfig h8300 edosk2674_defconfig h8300h8300h-sim_defconfig h8300 h8s-sim_defconfig m68k allmodconfig m68k m5475evb_defconfig m68k multi_defconfig m68k sun3_defconfig arc allyesconfig arc defconfig microblaze mmu_defconfig microblazenommu_defconfig powerpc allnoconfig powerpc defconfig powerpc ppc64_defconfig powerpc rhel-kconfig mips 32r2_defconfig mips 64r6el_defconfig mips allmodconfig mips allnoconfig mips allyesconfig mips malta_kvm_defconfig pariscallnoconfig pariscgeneric-64bit_defconfig pariscgeneric-32bit_defconfig parisc allyesconfig x86_64 randconfig-a001-20200317 x86_64 randconfig-a002-20200317 x86_64 randconfig-a003-20200317 i386 randconfig-a001-20200317 i386 randconfig-a002-20200317 i386 randconfig-a003-20200317 alpha randconfig-a001-20200317 m68k randconfig-a001-20200317 mips randconfig-a001-20200317 nds32 randconfig-a001-20200317 parisc randconfig-a001-20200317 h8300randconfig-a001-20200318 sparc64 randconfig-a001-20200318 c6x randconfig-a001-20200318 nios2randconfig-a001-20200318 microblaze randconfig-a001-20200318 csky randconfig-a001-20200318 openrisc randconfig-a001-20200318 s390 randconfig-a001-20200318 sh randconfig-a001-20200318 xtensa randconfig-a001-20200318 xtensa randconfig-a001-20200317 openrisc randconfig-a001-20200317 csky randconfig-a001-20200317 sh randconfig-a001-20200317 s390 randconfig-a001-20200317 x86_64 randconfig-b001-20200317 x86_64 randconfig-b002-20200317 x86_64 randconfig-b003-20200317 i386 randconfig-b001-20200317 i386 randconfig-b002-20200317 i386 randconfig-b003-20200317 x86_64 randconfig-c001-20200317 x86_64 randconfig-c002-20200317 x86_64
[powerpc:next-test] BUILD SUCCESS 033c9d310e29706e46c05f4fe7e863b1f32939db
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test branch HEAD: 033c9d310e29706e46c05f4fe7e863b1f32939db powerpc/hash64/devmap: Use H_PAGE_THP_HUGE when setting up huge devmap PTE entries elapsed time: 669m configs tested: 156 configs skipped: 7 The following configs have been built successfully. More configs may be tested in the coming days. arm64allyesconfig arm allyesconfig arm64 allnoconfig arm allnoconfig arm at91_dt_defconfig arm efm32_defconfig arm exynos_defconfig armmulti_v5_defconfig armmulti_v7_defconfig armshmobile_defconfig arm sunxi_defconfig arm64 defconfig sparcallyesconfig arm64allmodconfig mips fuloong2e_defconfig s390defconfig openrisc simple_smp_defconfig riscv allnoconfig s390 allyesconfig powerpc allnoconfig i386 allnoconfig i386 alldefconfig i386 allyesconfig i386defconfig ia64 alldefconfig ia64 allmodconfig ia64 allnoconfig ia64 allyesconfig ia64defconfig arm allmodconfig c6x allyesconfig c6xevmc6678_defconfig nios2 10m50_defconfig nios2 3c120_defconfig openriscor1ksim_defconfig xtensa common_defconfig xtensa iss_defconfig alpha defconfig cskydefconfig nds32 allnoconfig nds32 defconfig h8300 edosk2674_defconfig h8300h8300h-sim_defconfig h8300 h8s-sim_defconfig m68k allmodconfig m68k m5475evb_defconfig m68k multi_defconfig m68k sun3_defconfig arc defconfig arc allyesconfig powerpc ppc64_defconfig powerpc rhel-kconfig microblaze mmu_defconfig microblazenommu_defconfig powerpc defconfig mips 32r2_defconfig mips 64r6el_defconfig mips allmodconfig mips allnoconfig mips allyesconfig mips malta_kvm_defconfig pariscallnoconfig parisc allyesconfig pariscgeneric-32bit_defconfig pariscgeneric-64bit_defconfig x86_64 randconfig-a001-20200317 x86_64 randconfig-a002-20200317 x86_64 randconfig-a003-20200317 i386 randconfig-a001-20200317 i386 randconfig-a002-20200317 i386 randconfig-a003-20200317 alpharandconfig-a001-20200317 m68k randconfig-a001-20200317 mips randconfig-a001-20200317 nds32randconfig-a001-20200317 parisc randconfig-a001-20200317 c6x randconfig-a001-20200317 h8300randconfig-a001-20200317 microblaze randconfig-a001-20200317 nios2randconfig-a001-20200317 sparc64 randconfig-a001-20200317 csky randconfig-a001-20200318 openrisc randconfig-a001-20200318 s390 randconfig-a001-20200318 sh randconfig-a001-20200318 xtensa randconfig-a001-20200318 csky randconfig-a001-20200317 openrisc randconfig-a001-20200317 s390 randconfig-a001-20200317 sh randconfig-a001-20200317 xtensa randconfig-a001-20200317 x86_64 randconfig-b001-20200317 x86_64 randconfig-b002-20200317 x86_64 randconfig-b003-20200317 i386 randconfig-b001-20200317 i386 randconfig-b002-20200317 i386 randconfig-b003-20200317 x86_64 randconfig-c001-20200317 x86_64 randconfig-c002-20200317 x86_64 randconfig-c003-20200317 i386
[powerpc:fixes-test] BUILD SUCCESS af3d0a68698c7e5df8b72267086b23422a3954bb
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git fixes-test branch HEAD: af3d0a68698c7e5df8b72267086b23422a3954bb powerpc/kasan: Fix shadow memory protection with CONFIG_KASAN_VMALLOC elapsed time: 2837m configs tested: 251 configs skipped: 218 The following configs have been built successfully. More configs may be tested in the coming days. arm allmodconfig arm allnoconfig arm allyesconfig arm64allmodconfig arm64 allnoconfig arm64allyesconfig arm at91_dt_defconfig arm efm32_defconfig arm exynos_defconfig armmulti_v5_defconfig armmulti_v7_defconfig armshmobile_defconfig arm sunxi_defconfig arm64 defconfig sparcallyesconfig pariscgeneric-32bit_defconfig mips allyesconfig riscvnommu_virt_defconfig h8300 edosk2674_defconfig mips fuloong2e_defconfig s390defconfig m68k sun3_defconfig sparc defconfig ia64defconfig powerpc defconfig riscv allnoconfig s390 alldefconfig s390 allyesconfig sparc64 defconfig shtitan_defconfig nios2 3c120_defconfig riscv rv32_defconfig i386defconfig nds32 allnoconfig powerpc ppc64_defconfig powerpc allnoconfig ia64 allnoconfig pariscallnoconfig s390 debug_defconfig i386 allnoconfig i386 alldefconfig i386 allyesconfig ia64 alldefconfig ia64 allmodconfig ia64 allyesconfig c6x allyesconfig c6xevmc6678_defconfig nios2 10m50_defconfig openriscor1ksim_defconfig openrisc simple_smp_defconfig xtensa common_defconfig xtensa iss_defconfig alpha defconfig cskydefconfig nds32 defconfig h8300h8300h-sim_defconfig h8300 h8s-sim_defconfig m68k allmodconfig m68k m5475evb_defconfig m68k multi_defconfig arc allyesconfig arc defconfig microblaze mmu_defconfig microblazenommu_defconfig powerpc rhel-kconfig mips 32r2_defconfig mips 64r6el_defconfig mips allmodconfig mips allnoconfig mips malta_kvm_defconfig parisc allyesconfig pariscgeneric-64bit_defconfig x86_64 randconfig-a001-20200316 x86_64 randconfig-a002-20200316 x86_64 randconfig-a003-20200316 i386 randconfig-a001-20200316 i386 randconfig-a002-20200316 i386 randconfig-a003-20200316 x86_64 randconfig-a001-20200317 x86_64 randconfig-a002-20200317 x86_64 randconfig-a003-20200317 i386 randconfig-a001-20200317 i386 randconfig-a002-20200317 i386 randconfig-a003-20200317 x86_64 randconfig-a001-20200318 x86_64 randconfig-a002-20200318 x86_64 randconfig-a003-20200318 i386 randconfig-a001-20200318 i386 randconfig-a002-20200318 i386 randconfig-a003-20200318 alpharandconfig-a001-20200317 m68k randconfig-a001-20200317 mips randconfig-a001-20200317 nds32randconfig-a001-20200317 parisc randconfig-a001-20200317 riscvrandconfig-a001-20200317 alpharandconfig-a001-20200316 m68k randconfig-a001-20200316 mips randconfig-a001-20200316 nds32randconfig-a001-20200316 parisc
[powerpc:next] BUILD SUCCESS 59ed2adf393109c56d383e568f2e57bb5ad6d901
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next branch HEAD: 59ed2adf393109c56d383e568f2e57bb5ad6d901 powerpc/lib: Fix emulate_step() std test elapsed time: 669m configs tested: 161 configs skipped: 0 The following configs have been built successfully. More configs may be tested in the coming days. arm64allyesconfig arm allyesconfig arm64 allnoconfig arm allnoconfig arm at91_dt_defconfig arm efm32_defconfig arm exynos_defconfig armmulti_v5_defconfig armmulti_v7_defconfig armshmobile_defconfig arm sunxi_defconfig arm64 defconfig arm64allmodconfig mips fuloong2e_defconfig s390defconfig openrisc simple_smp_defconfig riscv allnoconfig s390 allyesconfig powerpc allnoconfig i386 allnoconfig i386 alldefconfig i386 allyesconfig i386defconfig ia64 alldefconfig ia64 allmodconfig ia64 allnoconfig ia64 allyesconfig ia64defconfig arm allmodconfig c6x allyesconfig c6xevmc6678_defconfig nios2 10m50_defconfig nios2 3c120_defconfig openriscor1ksim_defconfig xtensa common_defconfig xtensa iss_defconfig alpha defconfig cskydefconfig nds32 allnoconfig nds32 defconfig h8300 edosk2674_defconfig h8300h8300h-sim_defconfig h8300 h8s-sim_defconfig m68k allmodconfig m68k m5475evb_defconfig m68k multi_defconfig m68k sun3_defconfig arc allyesconfig arc defconfig microblaze mmu_defconfig microblazenommu_defconfig powerpc defconfig powerpc ppc64_defconfig powerpc rhel-kconfig mips 32r2_defconfig mips 64r6el_defconfig mips allmodconfig mips allnoconfig mips allyesconfig mips malta_kvm_defconfig pariscallnoconfig parisc allyesconfig pariscgeneric-32bit_defconfig pariscgeneric-64bit_defconfig x86_64 randconfig-a001-20200317 x86_64 randconfig-a002-20200317 x86_64 randconfig-a003-20200317 i386 randconfig-a001-20200317 i386 randconfig-a002-20200317 i386 randconfig-a003-20200317 alpharandconfig-a001-20200317 m68k randconfig-a001-20200317 mips randconfig-a001-20200317 nds32randconfig-a001-20200317 parisc randconfig-a001-20200317 c6x randconfig-a001-20200317 h8300randconfig-a001-20200317 microblaze randconfig-a001-20200317 nios2randconfig-a001-20200317 sparc64 randconfig-a001-20200317 csky randconfig-a001-20200318 openrisc randconfig-a001-20200318 s390 randconfig-a001-20200318 sh randconfig-a001-20200318 xtensa randconfig-a001-20200318 csky randconfig-a001-20200317 openrisc randconfig-a001-20200317 s390 randconfig-a001-20200317 sh randconfig-a001-20200317 xtensa randconfig-a001-20200317 x86_64 randconfig-b001-20200317 x86_64 randconfig-b002-20200317 x86_64 randconfig-b003-20200317 i386 randconfig-b001-20200317 i386 randconfig-b002-20200317 i386 randconfig-b003-20200317 x86_64 randconfig-c001-20200317 x86_64 randconfig-c002-20200317 x86_64 randconfig-c003-20200317 i386 randconfig-c001-20200317 i386 randconfig-c002-20200317 i386
Re: [PATCH] treewide: Rename "unencrypted" to "decrypted"
Hi Borislav, I love your patch! Yet something to improve: [auto build test ERROR on tip/x86/mm] [cannot apply to linux/master powerpc/next s390/features tip/x86/core linus/master v5.6-rc6 next-20200317] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system. BTW, we also suggest to use '--base' option to specify the base tree in git format-patch, please see https://stackoverflow.com/a/37406982] url: https://github.com/0day-ci/linux/commits/Borislav-Petkov/treewide-Rename-unencrypted-to-decrypted/20200318-015738 base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 17c4a2ae15a7aaefe84bdb271952678c5c9cd8e1 config: powerpc-defconfig (attached as .config) compiler: powerpc64-linux-gcc (GCC) 9.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=9.2.0 make.cross ARCH=powerpc If you fix the issue, kindly add following tag Reported-by: kbuild test robot All errors (new ones prefixed by >>): powerpc64-linux-ld: warning: orphan section `.gnu.hash' from `linker stubs' being placed in section `.gnu.hash' powerpc64-linux-ld: kernel/dma/mapping.o: in function `.dma_pgprot': >> mapping.c:(.text+0xc5c): undefined reference to `.force_dma_decrypted' powerpc64-linux-ld: kernel/dma/mapping.o: in function `.dma_common_mmap': mapping.c:(.text+0xcfc): undefined reference to `.force_dma_decrypted' powerpc64-linux-ld: kernel/dma/direct.o: in function `.dma_direct_get_required_mask': >> direct.c:(.text+0x920): undefined reference to `.force_dma_decrypted' powerpc64-linux-ld: kernel/dma/direct.o: in function `.__dma_direct_alloc_pages': direct.c:(.text+0x9fc): undefined reference to `.force_dma_decrypted' >> powerpc64-linux-ld: direct.c:(.text+0xaa8): undefined reference to >> `.force_dma_decrypted' powerpc64-linux-ld: kernel/dma/direct.o:direct.c:(.text+0xb28): more undefined references to `.force_dma_decrypted' follow --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org .config.gz Description: application/gzip
Re: [PATCH] treewide: Rename "unencrypted" to "decrypted"
Hi Borislav, I love your patch! Yet something to improve: [auto build test ERROR on tip/x86/mm] [cannot apply to linux/master powerpc/next s390/features tip/x86/core linus/master v5.6-rc6 next-20200317] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system. BTW, we also suggest to use '--base' option to specify the base tree in git format-patch, please see https://stackoverflow.com/a/37406982] url: https://github.com/0day-ci/linux/commits/Borislav-Petkov/treewide-Rename-unencrypted-to-decrypted/20200318-015738 base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 17c4a2ae15a7aaefe84bdb271952678c5c9cd8e1 config: s390-zfcpdump_defconfig (attached as .config) compiler: s390-linux-gcc (GCC) 9.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=9.2.0 make.cross ARCH=s390 If you fix the issue, kindly add following tag Reported-by: kbuild test robot All errors (new ones prefixed by >>): s390-linux-ld: kernel/dma/mapping.o: in function `dma_pgprot': mapping.c:(.text+0x144): undefined reference to `force_dma_decrypted' s390-linux-ld: kernel/dma/mapping.o: in function `dma_common_mmap': mapping.c:(.text+0x1a4): undefined reference to `force_dma_decrypted' s390-linux-ld: kernel/dma/direct.o: in function `dma_direct_get_required_mask': direct.c:(.text+0xae): undefined reference to `force_dma_decrypted' s390-linux-ld: kernel/dma/direct.o: in function `__dma_direct_alloc_pages': direct.c:(.text+0x152): undefined reference to `force_dma_decrypted' >> s390-linux-ld: direct.c:(.text+0x1e8): undefined reference to >> `force_dma_decrypted' s390-linux-ld: kernel/dma/direct.o:direct.c:(.text+0x2e0): more undefined references to `force_dma_decrypted' follow --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org .config.gz Description: application/gzip
Re: [PATCH] cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_work_fn
Hi Pratik, Thanks. I have checked: - for matching puts/gets - that all the '.' to '->' conversions, aud uses of '&' check out - that the Snowpatch checks pass (https://patchwork.ozlabs.org/patch/1255580/) On that basis: Reviewed-by: Daniel Axtens Regards, Daniel > The patch avoids allocating cpufreq_policy on stack hence fixing frame > size overflow in 'powernv_cpufreq_work_fn' > > Fixes: 227942809b52 ("cpufreq: powernv: Restore cpu frequency to policy->cur > on unthrottling") > Signed-off-by: Pratik Rajesh Sampat > --- > drivers/cpufreq/powernv-cpufreq.c | 13 - > 1 file changed, 8 insertions(+), 5 deletions(-) > > diff --git a/drivers/cpufreq/powernv-cpufreq.c > b/drivers/cpufreq/powernv-cpufreq.c > index 56f4bc0d209e..20ee0661555a 100644 > --- a/drivers/cpufreq/powernv-cpufreq.c > +++ b/drivers/cpufreq/powernv-cpufreq.c > @@ -902,6 +902,7 @@ static struct notifier_block powernv_cpufreq_reboot_nb = { > void powernv_cpufreq_work_fn(struct work_struct *work) > { > struct chip *chip = container_of(work, struct chip, throttle); > + struct cpufreq_policy *policy; > unsigned int cpu; > cpumask_t mask; > > @@ -916,12 +917,14 @@ void powernv_cpufreq_work_fn(struct work_struct *work) > chip->restore = false; > for_each_cpu(cpu, &mask) { > int index; > - struct cpufreq_policy policy; > > - cpufreq_get_policy(&policy, cpu); > - index = cpufreq_table_find_index_c(&policy, policy.cur); > - powernv_cpufreq_target_index(&policy, index); > - cpumask_andnot(&mask, &mask, policy.cpus); > + policy = cpufreq_cpu_get(cpu); > + if (!policy) > + continue; > + index = cpufreq_table_find_index_c(policy, policy->cur); > + powernv_cpufreq_target_index(policy, index); > + cpumask_andnot(&mask, &mask, policy->cpus); > + cpufreq_cpu_put(policy); > } > out: > put_online_cpus(); > -- > 2.24.1
Re: [PATCH v2 6/8] mm/memory_hotplug: unexport memhp_auto_online
On Tue, Mar 17, 2020 at 11:49:40AM +0100, David Hildenbrand wrote: >All in-tree users except the mm-core are gone. Let's drop the export. > >Cc: Andrew Morton >Cc: Michal Hocko >Cc: Oscar Salvador >Cc: "Rafael J. Wysocki" >Cc: Baoquan He >Cc: Wei Yang >Signed-off-by: David Hildenbrand Reviewed-by: Wei Yang >--- > mm/memory_hotplug.c | 1 - > 1 file changed, 1 deletion(-) > >diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >index 1a00b5a37ef6..2d2aae830b92 100644 >--- a/mm/memory_hotplug.c >+++ b/mm/memory_hotplug.c >@@ -71,7 +71,6 @@ bool memhp_auto_online; > #else > bool memhp_auto_online = true; > #endif >-EXPORT_SYMBOL_GPL(memhp_auto_online); > > static int __init setup_memhp_default_state(char *str) > { >-- >2.24.1 -- Wei Yang Help you, Help me
Re: [PATCH v2 4/8] powernv/memtrace: always online added memory blocks
On Tue, Mar 17, 2020 at 11:49:38AM +0100, David Hildenbrand wrote: >Let's always try to online the re-added memory blocks. In case add_memory() >already onlined the added memory blocks, the first device_online() call >will fail and stop processing the remaining memory blocks. > >This avoids manually having to check memhp_auto_online. > >Note: PPC always onlines all hotplugged memory directly from the kernel >as well - something that is handled by user space on other >architectures. > >Cc: Benjamin Herrenschmidt >Cc: Paul Mackerras >Cc: Michael Ellerman >Cc: Andrew Morton >Cc: Greg Kroah-Hartman >Cc: Michal Hocko >Cc: Oscar Salvador >Cc: "Rafael J. Wysocki" >Cc: Baoquan He >Cc: Wei Yang >Cc: linuxppc-dev@lists.ozlabs.org >Signed-off-by: David Hildenbrand Looks good. Reviewed-by: Wei Yang >--- > arch/powerpc/platforms/powernv/memtrace.c | 14 -- > 1 file changed, 4 insertions(+), 10 deletions(-) > >diff --git a/arch/powerpc/platforms/powernv/memtrace.c >b/arch/powerpc/platforms/powernv/memtrace.c >index d6d64f8718e6..13b369d2cc45 100644 >--- a/arch/powerpc/platforms/powernv/memtrace.c >+++ b/arch/powerpc/platforms/powernv/memtrace.c >@@ -231,16 +231,10 @@ static int memtrace_online(void) > continue; > } > >- /* >- * If kernel isn't compiled with the auto online option >- * we need to online the memory ourselves. >- */ >- if (!memhp_auto_online) { >- lock_device_hotplug(); >- walk_memory_blocks(ent->start, ent->size, NULL, >- online_mem_block); >- unlock_device_hotplug(); >- } >+ lock_device_hotplug(); >+ walk_memory_blocks(ent->start, ent->size, NULL, >+ online_mem_block); >+ unlock_device_hotplug(); > > /* >* Memory was added successfully so clean up references to it >-- >2.24.1 -- Wei Yang Help you, Help me
Re: [PATCH 6/6] soc: fsl: qe: fix sparse warnings for ucc_slow.c
On Mon, Mar 16, 2020 at 4:08 PM Rasmus Villemoes wrote: > > On 12/03/2020 23.28, Li Yang wrote: > > Fixes the following sparse warnings: > > > [snip] > > > > Also removed the unneccessary clearing for kzalloc'ed structure. > > Please don't mix that in the same patch, do it in a preparatory patch. > That makes reviewing much easier. Thanks for the review. One of the few lines removed are actually causing sparse warning, that's why I changed this together with the sparse fixes. I can add all the lines removed in the log for better record. > > > > > /* Get PRAM base */ > > uccs->us_pram_offset = > > @@ -231,24 +224,24 @@ int ucc_slow_init(struct ucc_slow_info * us_info, > > struct ucc_slow_private ** ucc > > /* clear bd buffer */ > > qe_iowrite32be(0, &bd->buf); > > /* set bd status and length */ > > - qe_iowrite32be(0, (u32 *)bd); > > + qe_iowrite32be(0, (u32 __iomem *)bd); > > It's cleaner to do two qe_iowrite16be to &bd->status and &bd->length, > that avoids the casting altogether. It probably is cleaner, but also slower for the bd manipulation that can be in the hot path. The QE code has been using this way to access/update the bd status and length together for a long time. And I remembered that it could help to prevent synchronization issues for accessing status and length separately. It is probably cleaner to change the data structure to use union for accessing status and length together. But that will need a big change to update all the current users of the structure which I don't have the time to do right now. > > > bd++; > > } > > /* for last BD set Wrap bit */ > > qe_iowrite32be(0, &bd->buf); > > - qe_iowrite32be(cpu_to_be32(T_W), (u32 *)bd); > > + qe_iowrite32be(T_W, (u32 __iomem *)bd); > > Yeah, and this is why. Who can actually keep track of where that bit > ends up being set with that casting going on. Please use > qe_iowrite16be() with an appropriately modified constant to the > appropriate field instead of these games. > > And if the hardware doesn't support 16 bit writes, the definition of > struct qe_bd is wrong and should have a single __be32 status_length > field, with appropriate accessors defined. Same comment as above. > > > /* Init Rx bds */ > > bd = uccs->rx_bd = qe_muram_addr(uccs->rx_base_offset); > > for (i = 0; i < us_info->rx_bd_ring_len - 1; i++) { > > /* set bd status and length */ > > - qe_iowrite32be(0, (u32 *)bd); > > + qe_iowrite32be(0, (u32 __iomem *)bd); > > Same. > > > /* clear bd buffer */ > > qe_iowrite32be(0, &bd->buf); > > bd++; > > } > > /* for last BD set Wrap bit */ > > - qe_iowrite32be(cpu_to_be32(R_W), (u32 *)bd); > > + qe_iowrite32be(R_W, (u32 __iomem *)bd); > > Same. > > > qe_iowrite32be(0, &bd->buf); > > > > /* Set GUMR (For more details see the hardware spec.). */ > > @@ -273,8 +266,8 @@ int ucc_slow_init(struct ucc_slow_info * us_info, > > struct ucc_slow_private ** ucc > > qe_iowrite32be(gumr, &us_regs->gumr_h); > > > > /* gumr_l */ > > - gumr = us_info->tdcr | us_info->rdcr | us_info->tenc | us_info->renc | > > - us_info->diag | us_info->mode; > > + gumr = (u32)us_info->tdcr | (u32)us_info->rdcr | (u32)us_info->tenc | > > +(u32)us_info->renc | (u32)us_info->diag | (u32)us_info->mode; > > Are the tdcr, rdcr, tenc, renc fields actually set anywhere (the same > for the diag and mode, but word-grepping for those give way too many > false positives)? They seem to be a somewhat pointless split out of the > bitfields of gumr_l, and not populated anywhere?. That's not directly > related to this patch, of course, but getting rid of them first (if they > are indeed completely unused) might make the sparse cleanup a little > simpler. I would agree with you that is not neccessary to create different enum types for these bits in the same register in the first place. But I would like to prevent aggressive cleanups of this driver for old hardware that is becoming harder and harder to be thoroughly tested right now. These fields are probably not used by the in tree ucc_uart driver right now. But as a generic library for QE, I think it is harmless to keep these simple code there for potentially customized version of the uart driver or other ucc slow drivers. Regards, Leo
Re: [PATCH] treewide: Rename "unencrypted" to "decrypted"
Borislav Petkov writes: > On Tue, Mar 17, 2020 at 01:35:12PM -0700, Dave Hansen wrote: >> On 3/17/20 4:18 AM, Borislav Petkov wrote: >> > Back then when the whole SME machinery started getting mainlined, it >> > was agreed that for simplicity, clarity and sanity's sake, the terms >> > denoting encrypted and not-encrypted memory should be "encrypted" and >> > "decrypted". And the majority of the code sticks to that convention >> > except those two. So rename them. >> >> Don't "unencrypted" and "decrypted" mean different things? >> >> Unencrypted to me means "encryption was never used for this data". >> >> Decrypted means "this was/is encrypted but here is a plaintext copy". > > Maybe but linguistical semantics is not the point here. > > The idea is to represent a "binary" concept of memory being encrypted > or memory being not encrypted. And at the time we decided to use > "encrypted" and "decrypted" for those two things. > > Do you see the need to differentiate a third "state", so to speak, of > memory which was never encrypted? I think so. encrypted data is something you can't use without having the key decrypted data is the plaintext copy of something encrypted, so it might be of sensible nature. unencrypted data can still be sensible, but nothing ever bothered to encrypt it in the first place. So having this distinction is useful in terms of setting the context straight. Thanks, tglx
Re: [PATCH] treewide: Rename "unencrypted" to "decrypted"
On Tue, 2020-03-17 at 14:24 -0700, Dave Hansen wrote: > On 3/17/20 2:06 PM, Borislav Petkov wrote: > > On Tue, Mar 17, 2020 at 01:35:12PM -0700, Dave Hansen wrote: > > > On 3/17/20 4:18 AM, Borislav Petkov wrote: > > > > Back then when the whole SME machinery started getting mainlined, it > > > > was agreed that for simplicity, clarity and sanity's sake, the terms > > > > denoting encrypted and not-encrypted memory should be "encrypted" and > > > > "decrypted". And the majority of the code sticks to that convention > > > > except those two. So rename them. > > > Don't "unencrypted" and "decrypted" mean different things? > > > > > > Unencrypted to me means "encryption was never used for this data". > > > > > > Decrypted means "this was/is encrypted but here is a plaintext copy". > > Maybe but linguistical semantics is not the point here. > > > > The idea is to represent a "binary" concept of memory being encrypted > > or memory being not encrypted. And at the time we decided to use > > "encrypted" and "decrypted" for those two things. > > Yeah, agreed. We're basically trying to name "!encrypted". > > > Do you see the need to differentiate a third "state", so to speak, of > > memory which was never encrypted? > > No, there are just two states. I just think the "!encrypted" case > should not be called "decrypted". Nor do I, it's completely misleading.
Re: [PATCH] treewide: Rename "unencrypted" to "decrypted"
On Tue, Mar 17, 2020 at 02:24:59PM -0700, Dave Hansen wrote: > No, there are just two states. I just think the "!encrypted" case > should not be called "decrypted". Yeah, we suck at naming - news at 11! :-) I believe we even considered things like "encrypted" vs "clear" but that sucked too. ;-\ In any case, that ship has sailed now and having two as differently as possible looking words to denote the two "states" should be good enough for our purposes... Oh well. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH] treewide: Rename "unencrypted" to "decrypted"
On 3/17/20 2:06 PM, Borislav Petkov wrote: > On Tue, Mar 17, 2020 at 01:35:12PM -0700, Dave Hansen wrote: >> On 3/17/20 4:18 AM, Borislav Petkov wrote: >>> Back then when the whole SME machinery started getting mainlined, it >>> was agreed that for simplicity, clarity and sanity's sake, the terms >>> denoting encrypted and not-encrypted memory should be "encrypted" and >>> "decrypted". And the majority of the code sticks to that convention >>> except those two. So rename them. >> Don't "unencrypted" and "decrypted" mean different things? >> >> Unencrypted to me means "encryption was never used for this data". >> >> Decrypted means "this was/is encrypted but here is a plaintext copy". > Maybe but linguistical semantics is not the point here. > > The idea is to represent a "binary" concept of memory being encrypted > or memory being not encrypted. And at the time we decided to use > "encrypted" and "decrypted" for those two things. Yeah, agreed. We're basically trying to name "!encrypted". > Do you see the need to differentiate a third "state", so to speak, of > memory which was never encrypted? No, there are just two states. I just think the "!encrypted" case should not be called "decrypted".
Re: [PATCH] treewide: Rename "unencrypted" to "decrypted"
On Tue, Mar 17, 2020 at 01:35:12PM -0700, Dave Hansen wrote: > On 3/17/20 4:18 AM, Borislav Petkov wrote: > > Back then when the whole SME machinery started getting mainlined, it > > was agreed that for simplicity, clarity and sanity's sake, the terms > > denoting encrypted and not-encrypted memory should be "encrypted" and > > "decrypted". And the majority of the code sticks to that convention > > except those two. So rename them. > > Don't "unencrypted" and "decrypted" mean different things? > > Unencrypted to me means "encryption was never used for this data". > > Decrypted means "this was/is encrypted but here is a plaintext copy". Maybe but linguistical semantics is not the point here. The idea is to represent a "binary" concept of memory being encrypted or memory being not encrypted. And at the time we decided to use "encrypted" and "decrypted" for those two things. Do you see the need to differentiate a third "state", so to speak, of memory which was never encrypted? -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH] treewide: Rename "unencrypted" to "decrypted"
On 3/17/20 4:18 AM, Borislav Petkov wrote: > Back then when the whole SME machinery started getting mainlined, it > was agreed that for simplicity, clarity and sanity's sake, the terms > denoting encrypted and not-encrypted memory should be "encrypted" and > "decrypted". And the majority of the code sticks to that convention > except those two. So rename them. Don't "unencrypted" and "decrypted" mean different things? Unencrypted to me means "encryption was never used for this data". Decrypted means "this was/is encrypted but here is a plaintext copy". This, for instance: > +++ b/kernel/dma/direct.c > @@ -26,7 +26,7 @@ unsigned int zone_dma_bits __ro_after_init = 24; > static inline dma_addr_t phys_to_dma_direct(struct device *dev, > phys_addr_t phys) > { > - if (force_dma_unencrypted(dev)) > + if (force_dma_decrypted(dev)) > return __phys_to_dma(dev, phys); is referring to DMA that is not and never was encrypted. It's skipping the encryption altogether. There's no act of "decryption" anywhere. This, on the other hand, seems named wrong to me: > /* > * Macros to add or remove encryption attribute > */ > #define pgprot_encrypted(prot) __pgprot(__sme_set(pgprot_val(prot))) > #define pgprot_decrypted(prot) __pgprot(__sme_clr(pgprot_val(prot))) This seems like it would be better named pgprot_unencrypted().
Re: [PATCH V7 09/14] powerpc/vas: Update CSB and notify process for fault CRBs
On Tue, 2020-03-17 at 16:28 +1100, Michael Ellerman wrote: > Haren Myneni writes: > > For each fault CRB, update fault address in CRB (fault_storage_addr) > > and translation error status in CSB so that user space can touch the > > fault address and resend the request. If the user space passed invalid > > CSB address send signal to process with SIGSEGV. > > > > Signed-off-by: Sukadev Bhattiprolu > > Signed-off-by: Haren Myneni > > --- > > arch/powerpc/platforms/powernv/vas-fault.c | 114 > > + > > 1 file changed, 114 insertions(+) > > > > diff --git a/arch/powerpc/platforms/powernv/vas-fault.c > > b/arch/powerpc/platforms/powernv/vas-fault.c > > index 1c6d5cc..751ce48 100644 > > --- a/arch/powerpc/platforms/powernv/vas-fault.c > > +++ b/arch/powerpc/platforms/powernv/vas-fault.c > > @@ -11,6 +11,7 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > > > @@ -26,6 +27,118 @@ > > #define VAS_FAULT_WIN_FIFO_SIZE(4 << 20) > > > > /* > > + * Update the CSB to indicate a translation error. > > + * > > + * If we are unable to update the CSB means copy_to_user failed due to > > + * invalid csb_addr, send a signal to the process. > > + * > > + * Remaining settings in the CSB are based on wait_for_csb() of > > + * NX-GZIP. > > + */ > > +static void update_csb(struct vas_window *window, > > + struct coprocessor_request_block *crb) > > +{ > > + int rc; > > + struct pid *pid; > > + void __user *csb_addr; > > + struct task_struct *tsk; > > + struct kernel_siginfo info; > > + struct coprocessor_status_block csb; > > csb is on the stack, and later copied to user, which is a risk for > creating an infoleak. > > Also please use reverse Christmas tree layout for your variables. > > > + > > + /* > > +* NX user space windows can not be opened for task->mm=NULL > > +* and faults will not be generated for kernel requests. > > +*/ > > + if (!window->mm || !window->user_win) > > + return; > > If that's a should-never-happen condition then should it do a > WARN_ON_ONCE() rather than silently returning? Will add WARN_ON > > > + csb_addr = (void __user *)be64_to_cpu(crb->csb_addr); > > + > > + csb.cc = CSB_CC_TRANSLATION; > > + csb.ce = CSB_CE_TERMINATION; > > + csb.cs = 0; > > + csb.count = 0; > > + > > + /* > > +* NX operates and returns in BE format as defined CRB struct. > > +* So return fault_storage_addr in BE as NX pastes in FIFO and > > +* expects user space to convert to CPU format. > > +*/ > > + csb.address = crb->stamp.nx.fault_storage_addr; > > + csb.flags = 0; > > I'm pretty sure this has initialised all the fields of csb. > > But, I'd still be much happier if you zeroed the whole struct to begin > with, that way we know for sure we can't leak any uninitialised bytes to > userspace. It's only 16 bytes so it shouldn't add any noticeable > overhead. Sure, will initialize csb > > > + > > + pid = window->pid; > > + tsk = get_pid_task(pid, PIDTYPE_PID); > > + /* > > +* Send window will be closed after processing all NX requests > > +* and process exits after closing all windows. In multi-thread > > +* applications, thread may not exists, but does not close FD > > +* (means send window) upon exit. Parent thread (tgid) can use > > +* and close the window later. > > +* pid and mm references are taken when window is opened by > > +* process (pid). So tgid is used only when child thread opens > > +* a window and exits without closing it in multithread tasks. > > +*/ > > + if (!tsk) { > > + pid = window->tgid; > > + tsk = get_pid_task(pid, PIDTYPE_PID); > > + /* > > +* Parent thread will be closing window during its exit. > > +* So should not get here. > > +*/ > > + if (!tsk) > > + return; > > Similar question on WARN_ON_ONCE() Yes, we can add WARN_ON > > > + } > > + > > + /* Return if the task is exiting. */ > > Why? Just because it's no use? It's racy isn't it, so it can't be for > correctness? Yes process is exiting and no need to update CSB. We release the task->usage refcount after copy_to_user(). > > > + if (tsk->flags & PF_EXITING) { > > + put_task_struct(tsk); > > + return; > > + } > > + > > + use_mm(window->mm); > > There's no check that csb_addr is actually pointing into userspace, but > copy_to_user() does it for you. > > > + rc = copy_to_user(csb_addr, &csb, sizeof(csb)); > > + /* > > +* User space polls on csb.flags (first byte). So add barrier > > +* then copy first byte with csb flags update. > > +*/ > > + smp_mb(); > > You only need to order the stores above vs the store below to csb.flags. > So you should only need an smp_wmb() here. Sure, will add if (!rc) { csb.flags = CSB_V; smp_mb(); rc = copy_to_user(csb_addr, &csb
Re: [PATCH V7 08/14] powerpc/vas: Take reference to PID and mm for user space windows
On Tue, 2020-03-17 at 15:09 +1100, Michael Ellerman wrote: > Haren Myneni writes: > > Process close windows after its requests are completed. In multi-thread > > applications, child can open a window but release FD will not be called > > upon its exit. Parent thread will be closing it later upon its exit. > > What if the parent exits first? Thanks for the review. If the parent exists, child thread will close the window when it exits. So we should not get the case where window is still open. > > > The parent can also send NX requests with this window and NX can > > generate page faults. After kernel handles the page fault, send > > signal to process by using PID if CSB address is invalid. Parent > > thread will not receive signal since its PID is different from the one > > saved in vas_window. So use tgid in case if the task for the pid saved > > in window is not running and send signal to its parent. > > > > To prevent reusing the pid until the window closed, take reference to > > pid and task mm. > > That text is all very dense. Can you please flesh it out and reword it > to clearly spell out what's going on in much more detail. Sure, will update the commit description. > > > > diff --git a/arch/powerpc/platforms/powernv/vas-window.c > > b/arch/powerpc/platforms/powernv/vas-window.c > > index a45d81d..7587258 100644 > > --- a/arch/powerpc/platforms/powernv/vas-window.c > > +++ b/arch/powerpc/platforms/powernv/vas-window.c > > @@ -1266,8 +1300,17 @@ int vas_win_close(struct vas_window *window) > > poll_window_castout(window); > > > > /* if send window, drop reference to matching receive window */ > > - if (window->tx_win) > > + if (window->tx_win) { > > + if (window->user_win) { > > + /* Drop references to pid and mm */ > > + put_pid(window->pid); > > + if (window->mm) { > > + mmdrop(window->mm); > > + mm_context_remove_copro(window->mm); > > That seems backward. Once you drop the reference the mm can be freed > can't it? Yes, will change. > > > + } > > + } > > put_rx_win(window->rxwin); > > + } > > > > vas_window_free(window); > > cheers
Re: [PATCH v2 5/8] hv_balloon: don't check for memhp_auto_online manually
> @@ -1707,6 +1701,7 @@ static int balloon_probe(struct hv_device *dev, > #ifdef CONFIG_MEMORY_HOTPLUG > set_online_page_callback(&hv_online_page); > register_memory_notifier(&hv_memory_nb); > + init_completion(&dm_device.ol_waitevent); I'll move this one line up. > #endif > > hv_set_drvdata(dev, &dm_device); > -- Thanks, David / dhildenb
Re: [PATCH] mm/hugetlb: Fix build failure with HUGETLB_PAGE but not HUGEBTLBFS
On 3/17/20 1:04 AM, Christophe Leroy wrote: > When CONFIG_HUGETLB_PAGE is set but not CONFIG_HUGETLBFS, the > following build failure is encoutered: > > In file included from arch/powerpc/mm/fault.c:33:0: > ./include/linux/hugetlb.h: In function 'hstate_inode': > ./include/linux/hugetlb.h:477:9: error: implicit declaration of function > 'HUGETLBFS_SB' [-Werror=implicit-function-declaration] > return HUGETLBFS_SB(i->i_sb)->hstate; > ^ > ./include/linux/hugetlb.h:477:30: error: invalid type argument of '->' (have > 'int') > return HUGETLBFS_SB(i->i_sb)->hstate; > ^ > > Gate hstate_inode() with CONFIG_HUGETLBFS instead of CONFIG_HUGETLB_PAGE. > > Reported-by: kbuild test robot > Link: https://patchwork.ozlabs.org/patch/1255548/#2386036 > Fixes: a137e1cc6d6e ("hugetlbfs: per mount huge page sizes") > Cc: sta...@vger.kernel.org > Signed-off-by: Christophe Leroy As hugetlb.h evolved over time, I suspect nobody imagined a configuration with CONFIG_HUGETLB_PAGE and not CONFIG_HUGETLBFS. This patch does address the build issues. So, Reviewed-by: Mike Kravetz However, there are many definitions in that file not behind #ifdef CONFIG_HUGETLBFS that make no sense unless CONFIG_HUGETLBFS is defined. Such cleanup is way beyond the scope of this patch/effort. I will add it to the list of hugetlb/hugetlbfs things that can be cleaned up. -- Mike Kravetz
Re: [PATCH] mm/hugetlb: Fix build failure with HUGETLB_PAGE but not HUGEBTLBFS
On 3/17/20 9:47 AM, Christophe Leroy wrote: > > > Le 17/03/2020 à 17:40, Mike Kravetz a écrit : >> On 3/17/20 1:43 AM, Christophe Leroy wrote: >>> >>> >>> Le 17/03/2020 à 09:25, Baoquan He a écrit : On 03/17/20 at 08:04am, Christophe Leroy wrote: > When CONFIG_HUGETLB_PAGE is set but not CONFIG_HUGETLBFS, the > following build failure is encoutered: From the definition of HUGETLB_PAGE, isn't it relying on HUGETLBFS? I could misunderstand the def_bool, please correct me if I am wrong. >>> >>> AFAIU, it means that HUGETLBFS rely on HUGETLB_PAGE, by default >>> HUGETLB_PAGE is not selected when HUGETLBFS is not. But it is still >>> possible for an arch to select HUGETLB_PAGE without selecting HUGETLBFS >>> when it uses huge pages for other purpose than hugetlb file system. >>> >> >> Hi Christophe, >> >> Do you actually have a use case/example of using hugetlb pages without >> hugetlbfs? I can understand that there are some use cases which never >> use the filesystem interface. However, hugetlb support is so intertwined >> with hugetlbfs, I am thinking there would be issues trying to use them >> separately. I will look into this further. >> > > Hi Mike, > > Series https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=164620 > > And especially patch 39 to 41. > Ah, ok. You are simply using a few interfaces in the hugetlb header files. The huge pages created in your mappings are not PageHuge() pages. -- Mike Kravetz
Re: [PATCH] mm/hugetlb: Fix build failure with HUGETLB_PAGE but not HUGEBTLBFS
Le 17/03/2020 à 17:40, Mike Kravetz a écrit : On 3/17/20 1:43 AM, Christophe Leroy wrote: Le 17/03/2020 à 09:25, Baoquan He a écrit : On 03/17/20 at 08:04am, Christophe Leroy wrote: When CONFIG_HUGETLB_PAGE is set but not CONFIG_HUGETLBFS, the following build failure is encoutered: From the definition of HUGETLB_PAGE, isn't it relying on HUGETLBFS? I could misunderstand the def_bool, please correct me if I am wrong. AFAIU, it means that HUGETLBFS rely on HUGETLB_PAGE, by default HUGETLB_PAGE is not selected when HUGETLBFS is not. But it is still possible for an arch to select HUGETLB_PAGE without selecting HUGETLBFS when it uses huge pages for other purpose than hugetlb file system. Hi Christophe, Do you actually have a use case/example of using hugetlb pages without hugetlbfs? I can understand that there are some use cases which never use the filesystem interface. However, hugetlb support is so intertwined with hugetlbfs, I am thinking there would be issues trying to use them separately. I will look into this further. Hi Mike, Series https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=164620 And especially patch 39 to 41. Thanks Christophe
Re: Slub: Increased mem consumption on cpu,mem-less node powerpc guest
On 3/17/20 5:25 PM, Srikar Dronamraju wrote: > * Vlastimil Babka [2020-03-17 16:56:04]: > >> >> I wonder why do you get a memory leak while Sachin in the same situation [1] >> gets a crash? I don't understand anything anymore. > > Sachin was testing on linux-next which has Kirill's patch which modifies > slub to use kmalloc_node instead of kmalloc. While Bharata is testing on > upstream, which doesn't have this. Yes, that Kirill's patch was about the memcg shrinker map allocation. But the patch hunk that Bharata posted as a "hack" that fixes the problem, it follows that there has to be something else that calls kmalloc_node(node) where node is one that doesn't have present pages. He mentions alloc_fair_sched_group() which has: for_each_possible_cpu(i) { cfs_rq = kzalloc_node(sizeof(struct cfs_rq), GFP_KERNEL, cpu_to_node(i)); ... se = kzalloc_node(sizeof(struct sched_entity), GFP_KERNEL, cpu_to_node(i)); I assume one of these structs is 1k and other 512 bytes (rounded) and that for some possible cpu's cpu_to_node(i) will be 0, which has no present pages. And as Bharata pasted, node_to_mem_node(0) = 0 So this looks like the same scenario, but it doesn't crash? Is the node 0 actually online here, and/or does it have N_NORMAL_MEMORY state? >> >> [1] >> https://lore.kernel.org/linux-next/3381cd91-ab3d-4773-ba04-e7a072a63...@linux.vnet.ibm.com/ >> >
Re: [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab
* Vlastimil Babka [2020-03-17 14:34:25]: > > > --- a/mm/slub.c > > +++ b/mm/slub.c > > @@ -1970,14 +1970,8 @@ static void *get_partial(struct kmem_cache *s, gfp_t > > flags, int node, > > struct kmem_cache_cpu *c) > > { > > void *object; > > - int searchnode = node; > > > > - if (node == NUMA_NO_NODE) > > - searchnode = numa_mem_id(); > > - else if (!node_present_pages(node)) > > - searchnode = node_to_mem_node(node); > > - > > - object = get_partial_node(s, get_node(s, searchnode), c, flags); > > + object = get_partial_node(s, get_node(s, node), c, flags); > > if (object || node != NUMA_NO_NODE)>return object; > > > > return get_any_partial(s, flags, c); > > I.e. here in this if(), now node will never equal NUMA_NO_NODE (thanks to the > hunk below), thus the get_any_partial() call becomes dead code? Very true. Would it be okay if we remove the node != NUMA_NO_NODE if (object || node != NUMA_NO_NODE) return object; will now become if (object) return object; -- Thanks and Regards Srikar Dronamraju
Re: [PATCH] mm/hugetlb: Fix build failure with HUGETLB_PAGE but not HUGEBTLBFS
On 3/17/20 1:43 AM, Christophe Leroy wrote: > > > Le 17/03/2020 à 09:25, Baoquan He a écrit : >> On 03/17/20 at 08:04am, Christophe Leroy wrote: >>> When CONFIG_HUGETLB_PAGE is set but not CONFIG_HUGETLBFS, the >>> following build failure is encoutered: >> >> From the definition of HUGETLB_PAGE, isn't it relying on HUGETLBFS? >> I could misunderstand the def_bool, please correct me if I am wrong. > > AFAIU, it means that HUGETLBFS rely on HUGETLB_PAGE, by default HUGETLB_PAGE > is not selected when HUGETLBFS is not. But it is still possible for an arch > to select HUGETLB_PAGE without selecting HUGETLBFS when it uses huge pages > for other purpose than hugetlb file system. > Hi Christophe, Do you actually have a use case/example of using hugetlb pages without hugetlbfs? I can understand that there are some use cases which never use the filesystem interface. However, hugetlb support is so intertwined with hugetlbfs, I am thinking there would be issues trying to use them separately. I will look into this further. -- Mike Kravetz
Re: [PATCH v2 5/8] hv_balloon: don't check for memhp_auto_online manually
On 17.03.20 17:29, Vitaly Kuznetsov wrote: > David Hildenbrand writes: > >> We get the MEM_ONLINE notifier call if memory is added right from the >> kernel via add_memory() or later from user space. >> >> Let's get rid of the "ha_waiting" flag - the wait event has an inbuilt >> mechanism (->done) for that. Initialize the wait event only once and >> reinitialize before adding memory. Unconditionally call complete() and >> wait_for_completion_timeout(). >> >> If there are no waiters, complete() will only increment ->done - which >> will be reset by reinit_completion(). If complete() has already been >> called, wait_for_completion_timeout() will not wait. >> >> There is still the chance for a small race between concurrent >> reinit_completion() and complete(). If complete() wins, we would not >> wait - which is tolerable (and the race exists in current code as >> well). > > How can we see concurent reinit_completion() and complete()? Obvioulsy, > we are not onlining new memory in kernel and hv_mem_hot_add() calls are > serialized, we're waiting up to 5*HZ for the added block to come online > before proceeding to the next one. Or do you mean we actually hit this > 5*HZ timeout, proceeded to the next block and immediately after > reinit_completion() we saw complete() for the previously added block? Yes exactly - or if an admin manually offlines+re-onlines a random memory block. > This is tolerable indeed, we're making forward progress (and this all is > 'best effort' anyway). Exactly my thoughts. [...] > > Reviewed-by: Vitaly Kuznetsov > Thanks! -- Thanks, David / dhildenb
Re: [PATCH v2 5/8] hv_balloon: don't check for memhp_auto_online manually
David Hildenbrand writes: > We get the MEM_ONLINE notifier call if memory is added right from the > kernel via add_memory() or later from user space. > > Let's get rid of the "ha_waiting" flag - the wait event has an inbuilt > mechanism (->done) for that. Initialize the wait event only once and > reinitialize before adding memory. Unconditionally call complete() and > wait_for_completion_timeout(). > > If there are no waiters, complete() will only increment ->done - which > will be reset by reinit_completion(). If complete() has already been > called, wait_for_completion_timeout() will not wait. > > There is still the chance for a small race between concurrent > reinit_completion() and complete(). If complete() wins, we would not > wait - which is tolerable (and the race exists in current code as > well). How can we see concurent reinit_completion() and complete()? Obvioulsy, we are not onlining new memory in kernel and hv_mem_hot_add() calls are serialized, we're waiting up to 5*HZ for the added block to come online before proceeding to the next one. Or do you mean we actually hit this 5*HZ timeout, proceeded to the next block and immediately after reinit_completion() we saw complete() for the previously added block? This is tolerable indeed, we're making forward progress (and this all is 'best effort' anyway). > > Note: We only wait for "some" memory to get onlined, which seems to be > good enough for now. > > Cc: "K. Y. Srinivasan" > Cc: Haiyang Zhang > Cc: Stephen Hemminger > Cc: Wei Liu > Cc: Andrew Morton > Cc: Michal Hocko > Cc: Oscar Salvador > Cc: "Rafael J. Wysocki" > Cc: Baoquan He > Cc: Wei Yang > Cc: Vitaly Kuznetsov > Cc: linux-hyp...@vger.kernel.org > Signed-off-by: David Hildenbrand > --- > drivers/hv/hv_balloon.c | 25 ++--- > 1 file changed, 10 insertions(+), 15 deletions(-) > > diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c > index a02ce43d778d..af5e09f08130 100644 > --- a/drivers/hv/hv_balloon.c > +++ b/drivers/hv/hv_balloon.c > @@ -533,7 +533,6 @@ struct hv_dynmem_device { >* State to synchronize hot-add. >*/ > struct completion ol_waitevent; > - bool ha_waiting; > /* >* This thread handles hot-add >* requests from the host as well as notifying > @@ -634,10 +633,7 @@ static int hv_memory_notifier(struct notifier_block *nb, > unsigned long val, > switch (val) { > case MEM_ONLINE: > case MEM_CANCEL_ONLINE: > - if (dm_device.ha_waiting) { > - dm_device.ha_waiting = false; > - complete(&dm_device.ol_waitevent); > - } > + complete(&dm_device.ol_waitevent); > break; > > case MEM_OFFLINE: > @@ -726,8 +722,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned > long size, > has->covered_end_pfn += processed_pfn; > spin_unlock_irqrestore(&dm_device.ha_lock, flags); > > - init_completion(&dm_device.ol_waitevent); > - dm_device.ha_waiting = !memhp_auto_online; > + reinit_completion(&dm_device.ol_waitevent); > > nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn)); > ret = add_memory(nid, PFN_PHYS((start_pfn)), > @@ -753,15 +748,14 @@ static void hv_mem_hot_add(unsigned long start, > unsigned long size, > } > > /* > - * Wait for the memory block to be onlined when memory onlining > - * is done outside of kernel (memhp_auto_online). Since the hot > - * add has succeeded, it is ok to proceed even if the pages in > - * the hot added region have not been "onlined" within the > - * allowed time. > + * Wait for memory to get onlined. If the kernel onlined the > + * memory when adding it, this will return directly. Otherwise, > + * it will wait for user space to online the memory. This helps > + * to avoid adding memory faster than it is getting onlined. As > + * adding succeeded, it is ok to proceed even if the memory was > + * not onlined in time. >*/ > - if (dm_device.ha_waiting) > - wait_for_completion_timeout(&dm_device.ol_waitevent, > - 5*HZ); > + wait_for_completion_timeout(&dm_device.ol_waitevent, 5 * HZ); > post_status(&dm_device); > } > } > @@ -1707,6 +1701,7 @@ static int balloon_probe(struct hv_device *dev, > #ifdef CONFIG_MEMORY_HOTPLUG > set_online_page_callback(&hv_online_page); > register_memory_notifier(&hv_memory_nb); > + init_completion(&dm_device.ol_waitevent); > #endif > > hv_set_drvdata(dev, &dm_device); Reviewed-by: Vitaly Kuznetsov -- Vitaly
Re: Slub: Increased mem consumption on cpu,mem-less node powerpc guest
* Vlastimil Babka [2020-03-17 16:56:04]: > > I wonder why do you get a memory leak while Sachin in the same situation [1] > gets a crash? I don't understand anything anymore. Sachin was testing on linux-next which has Kirill's patch which modifies slub to use kmalloc_node instead of kmalloc. While Bharata is testing on upstream, which doesn't have this. > > [1] > https://lore.kernel.org/linux-next/3381cd91-ab3d-4773-ba04-e7a072a63...@linux.vnet.ibm.com/ > -- Thanks and Regards Srikar Dronamraju
Re: [PATCH v1 16/46] powerpc/mm: Allocate static page tables for fixmap
On Tue, Mar 17, 2020 at 03:38:46PM +0100, Christophe Leroy wrote: > > diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c > > index f62de06e3d07..9934659cb871 100644 > > --- a/arch/powerpc/mm/pgtable_32.c > > +++ b/arch/powerpc/mm/pgtable_32.c > > @@ -29,11 +29,27 @@ > > #include > > #include > > #include > > +#include > > #include > > extern char etext[], _stext[], _sinittext[], _einittext[]; > > +static u8 early_fixmap_pagetable[FIXMAP_PTE_SIZE] __page_aligned_data; > > Sparse reports this as a variable size array. This is definitely not. Gcc > properly sees it is an 8k table (2 pages). Yes, thing is that FIXMAP_PTE_SIZE is not that constant since it uses __builtin_ffs() (via PTE_SHIFT / PTE_T_LOG2). Nevertheless, since Sparse v0.6.1 (released in October) accepts these in constant expressions, like GCC does. -- Luc
Re: Slub: Increased mem consumption on cpu,mem-less node powerpc guest
On 3/17/20 12:53 PM, Bharata B Rao wrote: > On Tue, Mar 17, 2020 at 02:56:28PM +0530, Bharata B Rao wrote: >> Case 1: 2 node NUMA, node0 empty >> >> # numactl -H >> available: 2 nodes (0-1) >> node 0 cpus: >> node 0 size: 0 MB >> node 0 free: 0 MB >> node 1 cpus: 0 1 2 3 4 5 6 7 >> node 1 size: 16294 MB >> node 1 free: 15453 MB >> node distances: >> node 0 1 >> 0: 10 40 >> 1: 40 10 >> >> diff --git a/mm/slub.c b/mm/slub.c >> index 17dc00e33115..888e4d245444 100644 >> --- a/mm/slub.c >> +++ b/mm/slub.c >> @@ -1971,10 +1971,8 @@ static void *get_partial(struct kmem_cache *s, gfp_t >> flags, int node, >> void *object; >> int searchnode = node; >> >> -if (node == NUMA_NO_NODE) >> +if (node == NUMA_NO_NODE || !node_present_pages(node)) >> searchnode = numa_mem_id(); >> -else if (!node_present_pages(node)) >> -searchnode = node_to_mem_node(node); > > For the above topology, I see this: > > node_to_mem_node(1) = 1 > node_to_mem_node(0) = 0 > node_to_mem_node(NUMA_NO_NODE) = 0 > > Looks like the last two cases (returning memory-less node 0) is the > problem here? I wonder why do you get a memory leak while Sachin in the same situation [1] gets a crash? I don't understand anything anymore. [1] https://lore.kernel.org/linux-next/3381cd91-ab3d-4773-ba04-e7a072a63...@linux.vnet.ibm.com/ > Regards, > Bharata. > >
Re: [PATCH] tpm2: Export tpm2_get_cc_attrs_tbl for ibmvtpm driver as module
On 3/17/20 11:22 AM, Sachin Sant wrote: On 17-Mar-2020, at 6:38 PM, Stefan Berger wrote: From: Stefan Berger This patch fixes the following problem when the ibmvtpm driver is built as a module: ERROR: modpost: "tpm2_get_cc_attrs_tbl" [drivers/char/tpm/tpm_ibmvtpm.ko] undefined! make[1]: *** [scripts/Makefile.modpost:94: __modpost] Error 1 make: *** [Makefile:1298: modules] Error 2 Signed-off-by: Stefan Berger Reported-by: Sachin Sant Tested-by: Sachin Sant Thank you! Stefan
Re: [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab
On 3/17/20 3:51 PM, Srikar Dronamraju wrote: > * Vlastimil Babka [2020-03-17 14:53:26]: > >> >> > >> >> > Mitigate this by allocating the new slab from the node_numa_mem. >> >> >> >> Are you sure this is really needed and the other 3 patches are not enough >> >> for >> >> the current SLUB code to work as needed? It seems you are changing the >> >> semantics >> >> here... >> >> >> > >> > The other 3 patches are not enough because we don't carry the searchnode >> > when the actual alloc_pages_node gets called. >> > >> > With only the 3 patches, we see the above Panic, its signature is slightly >> > different from what Sachin first reported and which I have carried in 1st >> > patch. >> >> Ah, I see. So that's the missing pgdat after your series [1] right? > > Yes the pgdat would be missing after my cpuless, memoryless node patchset. > However.. >> >> That sounds like an argument for Michal's suggestions that pgdats exist and >> have >> correctly populated zonelists for all possible nodes. > > Only the first patch in this series would be affected by pgdat existing or > not. Even if the pgdat existed, the NODE_DATA[nid]->node_present_pages > would be 0. Right? So it would look at node_to_mem_node(). And since node 0 is > cpuless it would return 0. I thought the point was to return 1 for node 0. > If we pass this node 0 (which is memoryless/cpuless) to > alloc_pages_node. Please note I am only setting node_numa_mem only > for offline nodes. However we could change this to set for all offline and > memoryless nodes. That would indeed make sense. But I guess that alloc_pages would still crash as the result of numa_to_mem_node() is not passed down to alloc_pages() without this patch. In __alloc_pages_node() we currently have "The node must be valid and online" so offline nodes don't have zonelists. Either they get them, or we indeed need something like this patch. But in order to not make get_any_partial() dead code, the final replacement of invalid node with a valid one should be done in alloc_slab_page() I guess? >> node_to_mem_node() could be just a shortcut for the first zone's node in the >> zonelist, so that fallback follows the topology. >> >> [1] >> https://lore.kernel.org/linuxppc-dev/2020030237.5731-1-sri...@linux.vnet.ibm.com/t/#m76e5b4c4084380b1d4b193d5aa0359b987f2290e >> > > >
Re: [PATCH] tpm2: Export tpm2_get_cc_attrs_tbl for ibmvtpm driver as module
> On 17-Mar-2020, at 6:38 PM, Stefan Berger wrote: > > From: Stefan Berger > > This patch fixes the following problem when the ibmvtpm driver > is built as a module: > > ERROR: modpost: "tpm2_get_cc_attrs_tbl" [drivers/char/tpm/tpm_ibmvtpm.ko] > undefined! > make[1]: *** [scripts/Makefile.modpost:94: __modpost] Error 1 > make: *** [Makefile:1298: modules] Error 2 > > Signed-off-by: Stefan Berger Reported-by: Sachin Sant Tested-by: Sachin Sant Thanks -Sachin
Re: [PATCH v5 09/11] perf/tools: Enhance JSON/metric infrastructure to handle "?"
On Tue, Mar 17, 2020 at 11:53:31AM +0530, Kajol Jain wrote: SBIP > +static int metricgroup__add_metric_runtime_param(struct strbuf *events, > + struct list_head *group_list, struct pmu_event *pe) > +{ > + int i, count; > + int ret = -EINVAL; > + > + count = arch_get_runtimeparam(); > + > + /* This loop is added to create multiple > + * events depend on count value and add > + * those events to group_list. > + */ > + > + for (i = 0; i < count; i++) { > + const char **ids; > + int idnum; > + struct egroup *eg; > + char value[PATH_MAX]; > + > + if (expr__find_other(pe->metric_expr, NULL, &ids, &idnum, i) < > 0) > + return ret; > + > + if (events->len > 0) > + strbuf_addf(events, ","); > + > + if (metricgroup__has_constraint(pe)) > + metricgroup__add_metric_non_group(events, ids, idnum); > + else > + metricgroup__add_metric_weak_group(events, ids, idnum); > + > + eg = malloc(sizeof(*eg)); > + if (!eg) { > + ret = -ENOMEM; > + return ret; > + } > + > + sprintf(value, "%s%c%d", pe->metric_name, '_', i); > + eg->ids = ids; > + eg->idnum = idnum; > + eg->metric_name = strdup(value); > + if (!eg->metric_name) { > + ret = -ENOMEM; > + return ret; > + } > + > + eg->metric_expr = pe->metric_expr; > + eg->metric_unit = pe->unit; > + list_add_tail(&eg->nd, group_list); > + ret = 0; > + > + if (ret != 0) > + break; again, this is part of metricgroup__add_metric_param no? why not use it? jirka
Re: [PATCH v5 08/11] perf/tools: Refactoring metricgroup__add_metric function
On Tue, Mar 17, 2020 at 11:53:30AM +0530, Kajol Jain wrote: > This patch refactor metricgroup__add_metric function where > some part of it move to function metricgroup__add_metric_param. > No logic change. > > Signed-off-by: Kajol Jain > --- > tools/perf/util/metricgroup.c | 63 +-- > 1 file changed, 38 insertions(+), 25 deletions(-) > > diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c > index c3a8c701609a..b4919bcfbd8b 100644 > --- a/tools/perf/util/metricgroup.c > +++ b/tools/perf/util/metricgroup.c > @@ -474,6 +474,41 @@ static bool metricgroup__has_constraint(struct pmu_event > *pe) > return false; > } > > +static int metricgroup__add_metric_param(struct strbuf *events, > + struct list_head *group_list, struct pmu_event *pe) > +{ > + > + const char **ids; > + int idnum; > + struct egroup *eg; > + int ret = -EINVAL; > + > + if (expr__find_other(pe->metric_expr, NULL, &ids, &idnum, 1) < 0) > + return ret; > + > + if (events->len > 0) > + strbuf_addf(events, ","); > + > + if (metricgroup__has_constraint(pe)) > + metricgroup__add_metric_non_group(events, ids, idnum); > + else > + metricgroup__add_metric_weak_group(events, ids, idnum); > + > + eg = malloc(sizeof(*eg)); > + if (!eg) > + ret = -ENOMEM; > + > + eg->ids = ids; > + eg->idnum = idnum; > + eg->metric_name = pe->metric_name; > + eg->metric_expr = pe->metric_expr; > + eg->metric_unit = pe->unit; > + list_add_tail(&eg->nd, group_list); > + ret = 0; > + > + return ret; return 0; jirka
Re: [PATCH v5 08/11] perf/tools: Refactoring metricgroup__add_metric function
On Tue, Mar 17, 2020 at 11:53:30AM +0530, Kajol Jain wrote: > This patch refactor metricgroup__add_metric function where > some part of it move to function metricgroup__add_metric_param. > No logic change. > > Signed-off-by: Kajol Jain > --- > tools/perf/util/metricgroup.c | 63 +-- > 1 file changed, 38 insertions(+), 25 deletions(-) > > diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c > index c3a8c701609a..b4919bcfbd8b 100644 > --- a/tools/perf/util/metricgroup.c > +++ b/tools/perf/util/metricgroup.c > @@ -474,6 +474,41 @@ static bool metricgroup__has_constraint(struct pmu_event > *pe) > return false; > } > > +static int metricgroup__add_metric_param(struct strbuf *events, > + struct list_head *group_list, struct pmu_event *pe) > +{ > + > + const char **ids; > + int idnum; > + struct egroup *eg; > + int ret = -EINVAL; > + > + if (expr__find_other(pe->metric_expr, NULL, &ids, &idnum, 1) < 0) > + return ret; > + > + if (events->len > 0) > + strbuf_addf(events, ","); > + > + if (metricgroup__has_constraint(pe)) > + metricgroup__add_metric_non_group(events, ids, idnum); > + else > + metricgroup__add_metric_weak_group(events, ids, idnum); > + > + eg = malloc(sizeof(*eg)); > + if (!eg) > + ret = -ENOMEM; ??? you need to return in here, eg is NULL > + > + eg->ids = ids; > + eg->idnum = idnum; > + eg->metric_name = pe->metric_name; > + eg->metric_expr = pe->metric_expr; > + eg->metric_unit = pe->unit; > + list_add_tail(&eg->nd, group_list); > + ret = 0; > + > + return ret; > +} > + > static int metricgroup__add_metric(const char *metric, struct strbuf *events, > struct list_head *group_list) > { > @@ -493,35 +528,13 @@ static int metricgroup__add_metric(const char *metric, > struct strbuf *events, > continue; > if (match_metric(pe->metric_group, metric) || > match_metric(pe->metric_name, metric)) { > - const char **ids; > - int idnum; > - struct egroup *eg; > > pr_debug("metric expr %s for %s\n", pe->metric_expr, > pe->metric_name); > > - if (expr__find_other(pe->metric_expr, > - NULL, &ids, &idnum) < 0) > + ret = metricgroup__add_metric_param(events, > + group_list, pe); > + if (ret == -EINVAL) > continue; previous code did 'continue' on ret < 0, why just -EINVAL now? jirka
Re: [PATCH v5 08/11] perf/tools: Refactoring metricgroup__add_metric function
On Tue, Mar 17, 2020 at 11:53:30AM +0530, Kajol Jain wrote: > This patch refactor metricgroup__add_metric function where > some part of it move to function metricgroup__add_metric_param. > No logic change. can't compile this change: [jolsa@krava perf]$ make JOBS=1 BUILD: Doing 'make -j1' parallel build CC util/metricgroup.o util/metricgroup.c: In function ‘metricgroup__add_metric_param’: util/metricgroup.c:486:6: error: too many arguments to function ‘expr__find_other’ 486 | if (expr__find_other(pe->metric_expr, NULL, &ids, &idnum, 1) < 0) | ^~~~ In file included from util/metricgroup.c:14: util/expr.h:25:5: note: declared here 25 | int expr__find_other(const char *expr, const char *one, const char ***other, | ^~~~ mv: cannot stat 'util/.metricgroup.o.tmp': No such file or directory make[4]: *** [/home/jolsa/kernel/linux-perf/tools/build/Makefile.build:97: util/metricgroup.o] Error 1 make[3]: *** [/home/jolsa/kernel/linux-perf/tools/build/Makefile.build:139: util] Error 2 make[2]: *** [Makefile.perf:616: perf-in.o] Error 2 make[1]: *** [Makefile.perf:225: sub-make] Error 2 make: *** [Makefile:70: all] Error 2 jirka > > Signed-off-by: Kajol Jain > --- > tools/perf/util/metricgroup.c | 63 +-- > 1 file changed, 38 insertions(+), 25 deletions(-) > > diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c > index c3a8c701609a..b4919bcfbd8b 100644 > --- a/tools/perf/util/metricgroup.c > +++ b/tools/perf/util/metricgroup.c > @@ -474,6 +474,41 @@ static bool metricgroup__has_constraint(struct pmu_event > *pe) > return false; > } > > +static int metricgroup__add_metric_param(struct strbuf *events, > + struct list_head *group_list, struct pmu_event *pe) > +{ > + > + const char **ids; > + int idnum; > + struct egroup *eg; > + int ret = -EINVAL; > + > + if (expr__find_other(pe->metric_expr, NULL, &ids, &idnum, 1) < 0) > + return ret; > + > + if (events->len > 0) > + strbuf_addf(events, ","); > + > + if (metricgroup__has_constraint(pe)) > + metricgroup__add_metric_non_group(events, ids, idnum); > + else > + metricgroup__add_metric_weak_group(events, ids, idnum); > + > + eg = malloc(sizeof(*eg)); > + if (!eg) > + ret = -ENOMEM; > + > + eg->ids = ids; > + eg->idnum = idnum; > + eg->metric_name = pe->metric_name; > + eg->metric_expr = pe->metric_expr; > + eg->metric_unit = pe->unit; > + list_add_tail(&eg->nd, group_list); > + ret = 0; > + > + return ret; > +} > + > static int metricgroup__add_metric(const char *metric, struct strbuf *events, > struct list_head *group_list) > { > @@ -493,35 +528,13 @@ static int metricgroup__add_metric(const char *metric, > struct strbuf *events, > continue; > if (match_metric(pe->metric_group, metric) || > match_metric(pe->metric_name, metric)) { > - const char **ids; > - int idnum; > - struct egroup *eg; > > pr_debug("metric expr %s for %s\n", pe->metric_expr, > pe->metric_name); > > - if (expr__find_other(pe->metric_expr, > - NULL, &ids, &idnum) < 0) > + ret = metricgroup__add_metric_param(events, > + group_list, pe); > + if (ret == -EINVAL) > continue; > - if (events->len > 0) > - strbuf_addf(events, ","); > - > - if (metricgroup__has_constraint(pe)) > - metricgroup__add_metric_non_group(events, ids, > idnum); > - else > - metricgroup__add_metric_weak_group(events, ids, > idnum); > - > - eg = malloc(sizeof(struct egroup)); > - if (!eg) { > - ret = -ENOMEM; > - break; > - } > - eg->ids = ids; > - eg->idnum = idnum; > - eg->metric_name = pe->metric_name; > - eg->metric_expr = pe->metric_expr; > - eg->metric_unit = pe->unit; > - list_add_tail(&eg->nd, group_list); > - ret = 0; > } > } > return ret; > -- > 2.18.1 >
Re: [PATCH v5 09/11] perf/tools: Enhance JSON/metric infrastructure to handle "?"
On Tue, Mar 17, 2020 at 11:53:31AM +0530, Kajol Jain wrote: SNIP > diff --git a/tools/perf/arch/powerpc/util/header.c > b/tools/perf/arch/powerpc/util/header.c > index 3b4cdfc5efd6..dcc3c6ab2e67 100644 > --- a/tools/perf/arch/powerpc/util/header.c > +++ b/tools/perf/arch/powerpc/util/header.c > @@ -7,6 +7,8 @@ > #include > #include > #include "header.h" > +#include "metricgroup.h" > +#include > > #define mfspr(rn) ({unsigned long rval; \ >asm volatile("mfspr %0," __stringify(rn) \ > @@ -16,6 +18,8 @@ > #define PVR_VER(pvr)(((pvr) >> 16) & 0x) /* Version field */ > #define PVR_REV(pvr)(((pvr) >> 0) & 0x) /* Revison field */ > > +#define SOCKETS_INFO_FILE_PATH "/devices/hv_24x7/interface/sockets" > + > int > get_cpuid(char *buffer, size_t sz) > { > @@ -44,3 +48,9 @@ get_cpuid_str(struct perf_pmu *pmu __maybe_unused) > > return bufp; > } > + > +int arch_get_runtimeparam(void) > +{ > + int count; > + return sysfs__read_int(SOCKETS_INFO_FILE_PATH, &count) < 0 ? 1 : count; is that SOCKETS_INFO_FILE_PATH define used later? if not please put the path directly as an argument to sysfs__read_int jirka
Re: [PATCH v5 09/11] perf/tools: Enhance JSON/metric infrastructure to handle "?"
On Tue, Mar 17, 2020 at 11:53:31AM +0530, Kajol Jain wrote: SNIP > diff --git a/tools/perf/util/expr.h b/tools/perf/util/expr.h > index 0938ad166ece..7786829b048b 100644 > --- a/tools/perf/util/expr.h > +++ b/tools/perf/util/expr.h > @@ -17,12 +17,13 @@ struct expr_parse_ctx { > > struct expr_scanner_ctx { > int start_token; > + int expr__runtimeparam; no need for expr__ prefix in here.. jsut runtime_param > diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c > index 402af3e8d287..85ac6d913782 100644 > --- a/tools/perf/util/stat-shadow.c > +++ b/tools/perf/util/stat-shadow.c > @@ -336,7 +336,7 @@ void perf_stat__collect_metric_expr(struct evlist > *evsel_list) > metric_events = counter->metric_events; > if (!metric_events) { > if (expr__find_other(counter->metric_expr, > counter->name, > - &metric_names, > &num_metric_names) < 0) > + &metric_names, > &num_metric_names, 1) < 0) > continue; > > metric_events = calloc(sizeof(struct evsel *), > @@ -777,7 +777,15 @@ static void generic_metric(struct perf_stat_config > *config, > } > > if (!metric_events[i]) { > - if (expr__parse(&ratio, &pctx, metric_expr) == 0) { > + int param = 1; > + if (strstr(metric_expr, "?")) { > + char *tmp = strrchr(metric_name, '_'); > + > + tmp++; > + param = strtol(tmp, &tmp, 10); > + } so, if metric_expr contains '?' you go and search metric_name for '_' and use the number after '_' ... ugh.. what's the logic? I understand you create as many metrics as the magic runtime param tells you.. but what's the connection to this? could you please outline in the changelog or comment the whole scheme of how this all should work? like step by step on some simple example? thanks, jirka
Re: [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab
* Vlastimil Babka [2020-03-17 14:53:26]: > >> > > >> > Mitigate this by allocating the new slab from the node_numa_mem. > >> > >> Are you sure this is really needed and the other 3 patches are not enough > >> for > >> the current SLUB code to work as needed? It seems you are changing the > >> semantics > >> here... > >> > > > > The other 3 patches are not enough because we don't carry the searchnode > > when the actual alloc_pages_node gets called. > > > > With only the 3 patches, we see the above Panic, its signature is slightly > > different from what Sachin first reported and which I have carried in 1st > > patch. > > Ah, I see. So that's the missing pgdat after your series [1] right? Yes the pgdat would be missing after my cpuless, memoryless node patchset. However.. > > That sounds like an argument for Michal's suggestions that pgdats exist and > have > correctly populated zonelists for all possible nodes. Only the first patch in this series would be affected by pgdat existing or not. Even if the pgdat existed, the NODE_DATA[nid]->node_present_pages would be 0. Right? So it would look at node_to_mem_node(). And since node 0 is cpuless it would return 0. If we pass this node 0 (which is memoryless/cpuless) to alloc_pages_node. Please note I am only setting node_numa_mem only for offline nodes. However we could change this to set for all offline and memoryless nodes. > node_to_mem_node() could be just a shortcut for the first zone's node in the > zonelist, so that fallback follows the topology. > > [1] > https://lore.kernel.org/linuxppc-dev/2020030237.5731-1-sri...@linux.vnet.ibm.com/t/#m76e5b4c4084380b1d4b193d5aa0359b987f2290e > -- Thanks and Regards Srikar Dronamraju
Re: [PATCH v1 39/46] powerpc/8xx: Add a function to early map kernel via huge pages
Le 17/03/2020 à 02:39, kbuild test robot a écrit : Hi Christophe, Thank you for the patch! Yet something to improve: [auto build test ERROR on next-20200316] [cannot apply to powerpc/next v5.6-rc6 v5.6-rc5 v5.6-rc4 v5.6-rc6] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system. BTW, we also suggest to use '--base' option to specify the base tree in git format-patch, please see https://stackoverflow.com/a/37406982] url: https://github.com/0day-ci/linux/commits/Christophe-Leroy/Use-hugepages-to-map-kernel-mem-on-8xx/20200317-065610 base:8548fd2f20ed19b0e8c0585b71fdfde1ae00ae3c config: powerpc-tqm8xx_defconfig (attached as .config) compiler: powerpc-linux-gcc (GCC) 9.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=9.2.0 make.cross ARCH=powerpc If you fix the issue, kindly add following tag Reported-by: kbuild test robot All errors (new ones prefixed by >>): In file included from arch/powerpc/mm/fault.c:33: include/linux/hugetlb.h: In function 'hstate_inode': include/linux/hugetlb.h:522:9: error: implicit declaration of function 'HUGETLBFS_SB'; did you mean 'HUGETLBFS_MAGIC'? [-Werror=implicit-function-declaration] 522 | return HUGETLBFS_SB(i->i_sb)->hstate; | ^~~~ | HUGETLBFS_MAGIC include/linux/hugetlb.h:522:30: error: invalid type argument of '->' (have 'int') 522 | return HUGETLBFS_SB(i->i_sb)->hstate; | ^~ cc1: all warnings being treated as errors hstate_inode() shouldn't use HUGETLBFS_SB() which CONFIG_HUGETLBFS is not set. Proposed fix at https://patchwork.ozlabs.org/patch/1256108/ [...] include/linux/hugetlb.h:522:30: error: invalid type argument of '->' (have 'int') 522 | return HUGETLBFS_SB(i->i_sb)->hstate; | ^~ At top level: arch/powerpc//mm/nohash/8xx.c:73:18: error: '__early_map_kernel_hugepage' defined but not used [-Werror=unused-function] 73 | static int __ref __early_map_kernel_hugepage(unsigned long va, phys_addr_t pa, | ^~~ cc1: all warnings being treated as errors This patch is a preparation patch. The function is not used yet, that's normal. Ok, it breaks bisectability. Should it be squashed with the first user of the function ? Christophe
Re: [PATCH v1 16/46] powerpc/mm: Allocate static page tables for fixmap
Le 16/03/2020 à 13:36, Christophe Leroy a écrit : Allocate static page tables for the fixmap area. This allows setting mappings through page tables before memblock is ready. That's needed to use early_ioremap() early and to use standard page mappings with fixmap. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/fixmap.h | 4 arch/powerpc/kernel/setup_32.c| 2 +- arch/powerpc/mm/pgtable_32.c | 16 3 files changed, 21 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/fixmap.h b/arch/powerpc/include/asm/fixmap.h index 2ef155a3c821..ccbe2e83c950 100644 --- a/arch/powerpc/include/asm/fixmap.h +++ b/arch/powerpc/include/asm/fixmap.h @@ -86,6 +86,10 @@ enum fixed_addresses { #define __FIXADDR_SIZE(__end_of_fixed_addresses << PAGE_SHIFT) #define FIXADDR_START (FIXADDR_TOP - __FIXADDR_SIZE) +#define FIXMAP_ALIGNED_SIZE (ALIGN(FIXADDR_TOP, PGDIR_SIZE) - \ +ALIGN_DOWN(FIXADDR_START, PGDIR_SIZE)) +#define FIXMAP_PTE_SIZE(FIXMAP_ALIGNED_SIZE / PGDIR_SIZE * PTE_TABLE_SIZE) + #define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_NCG #define FIXMAP_PAGE_IOPAGE_KERNEL_NCG diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c index 5b49b26eb154..3f1e1c0b328a 100644 --- a/arch/powerpc/kernel/setup_32.c +++ b/arch/powerpc/kernel/setup_32.c @@ -81,7 +81,7 @@ notrace void __init machine_init(u64 dt_ptr) /* Configure static keys first, now that we're relocated. */ setup_feature_keys(); - early_ioremap_setup(); + early_ioremap_init(); /* Enable early debugging if any specified (see udbg.h) */ udbg_early_init(); diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c index f62de06e3d07..9934659cb871 100644 --- a/arch/powerpc/mm/pgtable_32.c +++ b/arch/powerpc/mm/pgtable_32.c @@ -29,11 +29,27 @@ #include #include #include +#include #include extern char etext[], _stext[], _sinittext[], _einittext[]; +static u8 early_fixmap_pagetable[FIXMAP_PTE_SIZE] __page_aligned_data; Sparse reports this as a variable size array. This is definitely not. Gcc properly sees it is an 8k table (2 pages). Christophe + +notrace void __init early_ioremap_init(void) +{ + unsigned long addr = ALIGN_DOWN(FIXADDR_START, PGDIR_SIZE); + pte_t *ptep = (pte_t *)early_fixmap_pagetable; + pmd_t *pmdp = pmd_ptr_k(addr); + + for (; (s32)(FIXADDR_TOP - addr) > 0; +addr += PGDIR_SIZE, ptep += PTRS_PER_PTE, pmdp++) + pmd_populate_kernel(&init_mm, pmdp, ptep); + + early_ioremap_setup(); +} + static void __init *early_alloc_pgtable(unsigned long size) { void *ptr = memblock_alloc(size, size); Christophe
Re: [PATCH 08/12] docs: fix broken references to text files
On Tue, Mar 17, 2020 at 02:10:47PM +0100, Mauro Carvalho Chehab wrote: > Several references got broken due to txt to ReST conversion. > > Several of them can be automatically fixed with: > > scripts/documentation-file-ref-check --fix > > Signed-off-by: Mauro Carvalho Chehab > --- > Documentation/admin-guide/kernel-parameters.txt | 2 +- > Documentation/memory-barriers.txt| 2 +- > Documentation/process/submit-checklist.rst | 2 +- > .../translations/it_IT/process/submit-checklist.rst | 2 +- > Documentation/translations/ko_KR/memory-barriers.txt | 2 +- > .../translations/zh_CN/filesystems/sysfs.txt | 2 +- > .../translations/zh_CN/process/submit-checklist.rst | 2 +- > Documentation/virt/kvm/arm/pvtime.rst| 2 +- > Documentation/virt/kvm/devices/vcpu.rst | 2 +- > Documentation/virt/kvm/hypercalls.rst| 4 ++-- > arch/powerpc/include/uapi/asm/kvm_para.h | 2 +- > drivers/gpu/drm/Kconfig | 2 +- > drivers/gpu/drm/drm_ioctl.c | 2 +- > drivers/hwtracing/coresight/Kconfig | 2 +- > fs/fat/Kconfig | 8 > fs/fuse/Kconfig | 2 +- > fs/fuse/dev.c| 2 +- > fs/overlayfs/Kconfig | 6 +++--- > include/linux/mm.h | 4 ++-- > include/uapi/linux/ethtool_netlink.h | 2 +- > include/uapi/rdma/rdma_user_ioctl_cmds.h | 2 +- > mm/gup.c | 12 ++-- > virt/kvm/arm/vgic/vgic-mmio-v3.c | 2 +- > virt/kvm/arm/vgic/vgic.h | 4 ++-- > 24 files changed, 37 insertions(+), 37 deletions(-) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt > b/Documentation/admin-guide/kernel-parameters.txt > index df34a4176e58..28be91d4e66b 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -960,7 +960,7 @@ > edid/1680x1050.bin, or edid/1920x1080.bin is given > and no file with the same name exists. Details and > instructions how to build your own EDID data are > - available in Documentation/driver-api/edid.rst. An EDID > + available in Documentation/admin-guide/edid.rst. An EDID > data set will only be used for a particular connector, > if its name and a colon are prepended to the EDID > name. Each connector may use a unique EDID data > diff --git a/Documentation/memory-barriers.txt > b/Documentation/memory-barriers.txt > index e1c355e84edd..eaabc3134294 100644 > --- a/Documentation/memory-barriers.txt > +++ b/Documentation/memory-barriers.txt > @@ -620,7 +620,7 @@ because the CPUs that the Linux kernel supports don't do > writes > until they are certain (1) that the write will actually happen, (2) > of the location of the write, and (3) of the value to be written. > But please carefully read the "CONTROL DEPENDENCIES" section and the > -Documentation/RCU/rcu_dereference.txt file: The compiler can and does > +Documentation/RCU/rcu_dereference.rst file: The compiler can and does > break dependencies in a great many highly creative ways. > > CPU 1 CPU 2 > diff --git a/Documentation/process/submit-checklist.rst > b/Documentation/process/submit-checklist.rst > index 8e56337d422d..3f8e9d5d95c2 100644 > --- a/Documentation/process/submit-checklist.rst > +++ b/Documentation/process/submit-checklist.rst > @@ -107,7 +107,7 @@ and elsewhere regarding submitting Linux kernel patches. > and why. > > 26) If any ioctl's are added by the patch, then also update > -``Documentation/ioctl/ioctl-number.rst``. > +``Documentation/userspace-api/ioctl/ioctl-number.rst``. > > 27) If your modified source code depends on or uses any of the kernel > APIs or features that are related to the following ``Kconfig`` symbols, > diff --git a/Documentation/translations/it_IT/process/submit-checklist.rst > b/Documentation/translations/it_IT/process/submit-checklist.rst > index 995ee69fab11..3e575502690f 100644 > --- a/Documentation/translations/it_IT/process/submit-checklist.rst > +++ b/Documentation/translations/it_IT/process/submit-checklist.rst > @@ -117,7 +117,7 @@ sottomissione delle patch, in particolare > sorgenti che ne spieghi la logica: cosa fanno e perché. > > 25) Se la patch aggiunge nuove chiamate ioctl, allora aggiornate > -``Documentation/ioctl/ioctl-number.rst``. > +``Documentation/userspace-api/ioctl/ioctl-number.rst``. > > 26) Se il codice che avete modificato dipende o usa una qualsiasi > interfaccia o > funzionali
Re: [PATCH v1 46/46] powerpc/32s: Implement dedicated kasan_init_region()
Le 16/03/2020 à 13:36, Christophe Leroy a écrit : Implement a kasan_init_region() dedicated to book3s/32 that allocates KASAN regions using BATs. Signed-off-by: Christophe Leroy Note that the sparse warning on pmac32_defconfig is definitely a false positive. See details patch 16/46 ("powerpc/mm: Allocate static page tables for fixmap") Christophe
Re: [PATCH 4/4] powerpc/numa: Set fallback nodes for offline nodes
* Bharata B Rao [2020-03-17 19:52:32]: Thanks Bharata. > This patchset can also fix a related problem that I reported earlier at > https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-March/206076.html > with an additional change, suggested by Srikar as shown below: > > > > > - for_each_online_node(node) { > > + for_each_node(node) { > > + /* > > +* For all possible but not yet online nodes, ensure their > > +* node_numa_mem is set correctly so that kmalloc_node works > > +* for such nodes. > > +*/ > > + if (!node_online(node)) { > > Change the above line to like below: > > + if (!node_state(node, N_MEMORY)) { > Just to clarify, this is needed if we don't have http://lore.kernel.org/lkml/2020030237.5731-1-sri...@linux.vnet.ibm.com/t/#u -- Thanks and Regards Srikar Dronamraju
Re:Re: [PATCH v3] powerpc/fsl-85xx: fix compile error
From: Michael Ellerman Date: 2020-03-17 19:22:13 To:"王文虎" cc: Benjamin Herrenschmidt ,Paul Mackerras ,Allison Randal ,Richard Fontana ,Greg Kroah-Hartman ,Thomas Gleixner ,linuxppc-dev@lists.ozlabs.org,linux-ker...@vger.kernel.org,triv...@kernel.org,ker...@vivo.com,stable Subject: Re: [PATCH v3] powerpc/fsl-85xx: fix compile error>王文虎 writes: >> From: Michael Ellerman >> Date: 2020-03-16 17:41:12 >> To:WANG Wenhu ,Benjamin Herrenschmidt >> ,Paul Mackerras ,WANG Wenhu >> ,Allison Randal ,Richard Fontana >> ,Greg Kroah-Hartman ,Thomas >> Gleixner >> ,linuxppc-dev@lists.ozlabs.org,linux-ker...@vger.kernel.org >> cc: triv...@kernel.org,ker...@vivo.com,stable >> Subject: Re: [PATCH v3] powerpc/fsl-85xx: fix compile error>WANG Wenhu >> writes: Include "linux/of_address.h" to fix the compile error for mpc85xx_l2ctlr_of_probe() when compiling fsl_85xx_cache_sram.c. CC arch/powerpc/sysdev/fsl_85xx_l2ctlr.o arch/powerpc/sysdev/fsl_85xx_l2ctlr.c: In function ‘mpc85xx_l2ctlr_of_probe’: arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:11: error: implicit declaration of function ‘of_iomap’; did you mean ‘pci_iomap’? [-Werror=implicit-function-declaration] l2ctlr = of_iomap(dev->dev.of_node, 0); ^~~~ pci_iomap arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:9: error: assignment makes pointer from integer without a cast [-Werror=int-conversion] l2ctlr = of_iomap(dev->dev.of_node, 0); ^ cc1: all warnings being treated as errors scripts/Makefile.build:267: recipe for target 'arch/powerpc/sysdev/fsl_85xx_l2ctlr.o' failed make[2]: *** [arch/powerpc/sysdev/fsl_85xx_l2ctlr.o] Error 1 Fixes: commit 6db92cc9d07d ("powerpc/85xx: add cache-sram support") >>> >>>The syntax is: >>> >>>Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support") >>> Cc: stable >>> >>>The commit above went into v2.6.37. >>> >>>So no one has noticed this bug since then, how? Or did something else >>>change to expose the problem? >> >> Actually a hard question to answer cause it also left me scratching for long. >> However, I can not find right or definite answer. > >Oh, actually it's fairly straight forward, the code can't be built at >all in upstream because CONFIG_FSL_85XX_CACHE_SRAM is not selectable or >selected by anything. Yeah, sure that is the reason, and I meant it was hard to figure out why nobody had ever compiled the driver with FSL_85XX_CACHE_SRAM enabled until me. > >You sent a patch previously to make it selectable, which Scott thought >was a bad idea. > >So this whole file is dead code as far as I'm concerned, so patches for >it definitely do not need to go to stable. > >If you want to add a user for it then please send a series doing that, >and this commit can be the first. For this, as you mentioned, it is dead and do not need to be applied to any stable. And I recommand the patch as a unit itself cause our module which uses it is still under developing, and the module itself would be taken as a complete logical block. Also it would take some time. Thanks, Wenhu > >cheers
Re: [PATCH 4/4] powerpc/numa: Set fallback nodes for offline nodes
This patchset can also fix a related problem that I reported earlier at https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-March/206076.html with an additional change, suggested by Srikar as shown below: On Tue, Mar 17, 2020 at 06:47:53PM +0530, Srikar Dronamraju wrote: > Currently fallback nodes for offline nodes aren't set. Hence by default > node 0 ends up being the default node. However node 0 might be offline. > > Fix this by explicitly setting fallback node. Ensure first_memory_node > is set before kernel does explicit setting of fallback node. > > arch/powerpc/mm/numa.c | 11 ++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c > index 281531340230..6e97ab6575cb 100644 > --- a/arch/powerpc/mm/numa.c > +++ b/arch/powerpc/mm/numa.c > @@ -827,7 +827,16 @@ void __init dump_numa_cpu_topology(void) > if (!numa_enabled) > return; > > - for_each_online_node(node) { > + for_each_node(node) { > + /* > + * For all possible but not yet online nodes, ensure their > + * node_numa_mem is set correctly so that kmalloc_node works > + * for such nodes. > + */ > + if (!node_online(node)) { Change the above line to like below: + if (!node_state(node, N_MEMORY)) { Regards, Bharata.
[PATCH v5 0/3] Introduce Self-Save API for deep stop states
Skiboot patches v4: https://lists.ozlabs.org/pipermail/skiboot/2020-February/016404.html Linux patches v4: https://lkml.org/lkml/2020/2/12/446 Changelog v4 --> v5 Based on Michael Ellerman's comments, in patch3: 1. Added documentation in power-mgt.txt for self-save and self-restore 2. Used bitmap abstractions while parsing device tree 3. Fixed scenario where zero or less SPRs were treated as a success 4. Removed redundant device_node pointer 5. Removed unnecessary pr_warn messages 6. Changed the old of_find_node_by_path calls to of_find_compatible_node 7. Fixed leaking reference of the parsed device_node pointer Currently the stop-API supports a mechanism called as self-restore which allows us to restore the values of certain SPRs on wakeup from a deep-stop state to a desired value. To use this, the Kernel makes an OPAL call passing the PIR of the CPU, the SPR number and the value to which the SPR should be restored when that CPU wakes up from a deep stop state. Recently, a new feature, named self-save has been enabled in the stop-api, which is an alternative mechanism to do the same, except that self-save will save the current content of the SPR before entering a deep stop state and also restore the content back on waking up from a deep stop state. This patch series aims at introducing and leveraging the self-save feature in the kernel. Now, as the kernel has a choice to prefer one mode over the other and there can be registers in both the save/restore SPR list which are sent from the device tree, a new interface has been defined for the seamless handing of the modes for each SPR. A list of preferred SPRs are maintained in the kernel which contains two properties: 1. supported_mode: Helps in identifying if it strictly supports self save or restore or both. Initialized using the information from device tree. 2. preferred_mode: Calls out what mode is preferred for each SPR. It could be strictly self save or restore, or it can also determine the preference of mode over the other if both are present by encapsulating the other in bitmask from LSB to MSB. Initialized statically. Below is a table to show the Scenario::Consequence when the self save and self restore modes are available or disabled in different combinations as perceived from the device tree thus giving complete backwards compatibly regardless of an older firmware running a newer kernel or vise-versa. Support for self save or self-restore is embedded in the device tree, along with the set of registers it supports. SR = Self restore; SS = Self save .---.. | Scenario |Consequence | :---+: | Legacy Firmware. No SS or SR node | Self restore is called for all | | | supported SPRs | :---+: | SR: !active SS: !active | Deep stop states disabled | :---+: | SR: active SS: !active| Self restore is called for all | | | supported SPRs | :---+: | SR: active SS: active | Goes through the preferences for each | | | SPR and executes of the modes | | | accordingly. Currently, Self restore is| | | called for all the SPRs except PSSCR | | | which is self saved| :---+: | SR: active(only HID0) SS: active | Self save called for all supported | | | registers expect HID0 (as HID0 cannot | | | be self saved currently) | :---+: | SR: !active SS: active| currently will disable deep states as | | | HID0 is needed to be self restored and | | | cannot be self saved | '---'' Pratik Rajesh Sampat (3): powerpc/powernv: Interface to define support and preference for a SPR powerpc/powernv: Introduce Self save support powerpc/powernv: Parse device tree, population of SPR support .../bindings/powerpc/opal/power-mgt.txt | 10 + arch/powerpc/include/asm/opal-api.h | 3 +- arch/p
[PATCH v5 3/3] powerpc/powernv: Parse device tree, population of SPR support
Parse the device tree for nodes self-save, self-restore and populate support for the preferred SPRs based what was advertised by the device tree. Signed-off-by: Pratik Rajesh Sampat Reviewed-by: Ram Pai --- .../bindings/powerpc/opal/power-mgt.txt | 10 +++ arch/powerpc/platforms/powernv/idle.c | 78 +++ 2 files changed, 88 insertions(+) diff --git a/Documentation/devicetree/bindings/powerpc/opal/power-mgt.txt b/Documentation/devicetree/bindings/powerpc/opal/power-mgt.txt index 9d619e955576..093cb5fe3d2d 100644 --- a/Documentation/devicetree/bindings/powerpc/opal/power-mgt.txt +++ b/Documentation/devicetree/bindings/powerpc/opal/power-mgt.txt @@ -116,3 +116,13 @@ otherwise. The length of all the property arrays must be the same. which of the fields of the PMICR are set in the corresponding entries in ibm,cpu-idle-state-pmicr. This is an optional property on POWER8 and is absent on POWER9. + +- self-restore: + Array of unsigned 64-bit values containing a property for sprn-mask + with each bit indicating the index of the supported SPR for the + functionality. This is an optional property for both Power8 and Power9 + +- self-save: + Array of unsigned 64-bit values containing a property for sprn-mask + with each bit indicating the index of the supported SPR for the + functionality. This is an optional property for both Power8 and Power9 diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c index 97aeb45e897b..c39111b338ff 100644 --- a/arch/powerpc/platforms/powernv/idle.c +++ b/arch/powerpc/platforms/powernv/idle.c @@ -1436,6 +1436,81 @@ static void __init pnv_probe_idle_states(void) supported_cpuidle_states |= pnv_idle_states[i].flags; } +/* + * Extracts and populates the self save or restore capabilities + * passed from the device tree node + */ +static int extract_save_restore_state_dt(struct device_node *np, int type) +{ + int nr_sprns = 0, i, bitmask_index; + u64 *temp_u64; + u64 bit_pos; + + nr_sprns = of_property_count_u64_elems(np, "sprn-bitmask"); + if (nr_sprns <= 0) + return -EINVAL; + temp_u64 = kcalloc(nr_sprns, sizeof(u64), GFP_KERNEL); + if (of_property_read_u64_array(np, "sprn-bitmask", + temp_u64, nr_sprns)) { + pr_warn("cpuidle-powernv: failed to find registers in DT\n"); + kfree(temp_u64); + return -EINVAL; + } + /* +* Populate acknowledgment of support for the sprs in the global vector +* gotten by the registers supplied by the firmware. +* The registers are in a bitmask, bit index within +* that specifies the SPR +*/ + for (i = 0; i < nr_preferred_sprs; i++) { + bitmask_index = BIT_WORD(preferred_sprs[i].spr); + bit_pos = BIT_MASK(preferred_sprs[i].spr); + if ((temp_u64[bitmask_index] & bit_pos) == 0) { + if (type == SELF_RESTORE_TYPE) + preferred_sprs[i].supported_mode &= + ~SELF_RESTORE_STRICT; + else + preferred_sprs[i].supported_mode &= + ~SELF_SAVE_STRICT; + continue; + } + if (type == SELF_RESTORE_TYPE) { + preferred_sprs[i].supported_mode |= + SELF_RESTORE_STRICT; + } else { + preferred_sprs[i].supported_mode |= + SELF_SAVE_STRICT; + } + } + + kfree(temp_u64); + return 0; +} + +static int pnv_parse_deepstate_dt(void) +{ + struct device_node *np; + int rc = 0, i; + + /* Self restore register population */ + np = of_find_compatible_node(NULL, NULL, "ibm,opal-self-restore"); + if (np) { + rc = extract_save_restore_state_dt(np, SELF_RESTORE_TYPE); + if (rc != 0) + return rc; + } + /* Self save register population */ + np = of_find_compatible_node(NULL, NULL, "ibm,opal-self-save"); + if (!np) { + for (i = 0; i < nr_preferred_sprs; i++) + preferred_sprs[i].supported_mode &= ~SELF_SAVE_STRICT; + } else { + rc = extract_save_restore_state_dt(np, SELF_SAVE_TYPE); + } + of_node_put(np); + return rc; +} + /* * This function parses device-tree and populates all the information * into pnv_idle_states structure. It also sets up nr_pnv_idle_states @@ -1584,6 +1659,9 @@ static int __init pnv_init_idle_states(void) return rc; pnv_probe_idle_states(); + rc = pnv_parse_deepstate_dt(); + if (rc) + return rc; if (!cpu_has_feature(CPU_FTR_ARCH_3
[PATCH v5 2/3] powerpc/powernv: Introduce Self save support
This commit introduces and leverages the Self save API which OPAL now supports. Add the new Self Save OPAL API call in the list of OPAL calls. Implement the self saving of the SPRs based on the support populated while respecting it's preferences. This implementation allows mixing of support for the SPRs, which means that a SPR can be self restored while another SPR be self saved if they support and prefer it to be so. Signed-off-by: Pratik Rajesh Sampat Reviewed-by: Ram Pai --- arch/powerpc/include/asm/opal-api.h| 3 ++- arch/powerpc/include/asm/opal.h| 1 + arch/powerpc/platforms/powernv/idle.c | 22 ++ arch/powerpc/platforms/powernv/opal-call.c | 1 + 4 files changed, 26 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index c1f25a760eb1..1b6e1a68d431 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -214,7 +214,8 @@ #define OPAL_SECVAR_GET176 #define OPAL_SECVAR_GET_NEXT 177 #define OPAL_SECVAR_ENQUEUE_UPDATE 178 -#define OPAL_LAST 178 +#define OPAL_SLW_SELF_SAVE_REG 181 +#define OPAL_LAST 181 #define QUIESCE_HOLD 1 /* Spin all calls at entry */ #define QUIESCE_REJECT 2 /* Fail all calls with OPAL_BUSY */ diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 9986ac34b8e2..389a85b63805 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -203,6 +203,7 @@ int64_t opal_handle_hmi(void); int64_t opal_handle_hmi2(__be64 *out_flags); int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end); int64_t opal_unregister_dump_region(uint32_t id); +int64_t opal_slw_self_save_reg(uint64_t cpu_pir, uint64_t sprn); int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val); int64_t opal_config_cpu_idle_state(uint64_t state, uint64_t flag); int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t pe_number); diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c index 03fe835aadd1..97aeb45e897b 100644 --- a/arch/powerpc/platforms/powernv/idle.c +++ b/arch/powerpc/platforms/powernv/idle.c @@ -279,6 +279,26 @@ static int pnv_self_save_restore_sprs(void) if (rc != 0) return rc; break; + } else if (preferred & curr_spr.supported_mode + & SELF_SAVE_STRICT) { + is_initialized = true; + if (curr_spr.spr == SPRN_HMEER && + cpu_thread_in_core(cpu) != 0) { + continue; + } + rc = opal_slw_self_save_reg(pir, + curr_spr.spr); + if (rc != 0) + return rc; + switch (curr_spr.spr) { + case SPRN_LPCR: + is_lpcr_self_save = true; + break; + case SPRN_PTCR: + is_ptcr_self_save = true; + break; + } + break; } preferred_sprs[index].preferred_mode = preferred_sprs[index].preferred_mode >> @@ -1159,6 +1179,8 @@ void pnv_program_cpu_hotplug_lpcr(unsigned int cpu, u64 lpcr_val) if (!is_lpcr_self_save) opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val); + else + opal_slw_self_save_reg(pir, SPRN_LPCR); } } diff --git a/arch/powerpc/platforms/powernv/opal-call.c b/arch/powerpc/platforms/powernv/opal-call.c index 5cd0f52d258f..11e0ceb90de0 100644 --- a/arch/powerpc/platforms/powernv/opal-call.c +++ b/arch/powerpc/platforms/powernv/opal-call.c @@ -223,6 +223,7 @@ OPAL_CALL(opal_handle_hmi, OPAL_HANDLE_HMI); OPAL_CALL(opal_handle_hmi2,OPAL_HANDLE_HMI2); OPAL_CALL(opal_config_cpu_idle_state, OPAL_CONFIG_CPU_IDLE_STATE); OPAL_CALL(opal_slw_set_reg,OPAL_SLW_SET_REG); +OPAL_CALL(opal_slw_self_save_reg, OPAL_SLW_SELF_SAV
[PATCH v5 1/3] powerpc/powernv: Interface to define support and preference for a SPR
Define a bitmask interface to determine support for the Self Restore, Self Save or both. Also define an interface to determine the preference of that SPR to be strictly saved or restored or encapsulated with an order of preference. The preference bitmask is shown as below: |... | 2nd pref | 1st pref | MSB LSB The preference from higher to lower is from LSB to MSB with a shift of 8 bits. Example: Prefer self save first, if not available then prefer self restore The preference mask for this scenario will be seen as below. ((SELF_RESTORE_STRICT << PREFERENCE_SHIFT) | SELF_SAVE_STRICT) - |... | Self restore | Self save | - MSB LSB Finally, declare a list of preferred SPRs which encapsulate the bitmaks for preferred and supported with defaults of both being set to support legacy firmware. This commit also implements using the above interface and retains the legacy functionality of self restore. Signed-off-by: Pratik Rajesh Sampat Reviewed-by: Ram Pai --- arch/powerpc/platforms/powernv/idle.c | 316 +- 1 file changed, 259 insertions(+), 57 deletions(-) diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c index 78599bca66c2..03fe835aadd1 100644 --- a/arch/powerpc/platforms/powernv/idle.c +++ b/arch/powerpc/platforms/powernv/idle.c @@ -32,9 +32,112 @@ #define P9_STOP_SPR_MSR 2000 #define P9_STOP_SPR_PSSCR 855 +/* Interface for the stop state supported and preference */ +#define SELF_RESTORE_TYPE0 +#define SELF_SAVE_TYPE 1 + +#define NR_PREFERENCES2 +#define PREFERENCE_SHIFT 4 +#define PREFERENCE_MASK 0xf + +#define UNSUPPORTED 0x0 +#define SELF_RESTORE_STRICT 0x1 +#define SELF_SAVE_STRICT0x2 + +/* + * Bitmask defining the kind of preferences available. + * Note : The higher to lower preference is from LSB to MSB, with a shift of + * 4 bits. + * + * || 2nd pref | 1st pref | + * + * MSB LSB + */ +/* Prefer Restore if available, otherwise unsupported */ +#define PREFER_SELF_RESTORE_ONLY SELF_RESTORE_STRICT +/* Prefer Save if available, otherwise unsupported */ +#define PREFER_SELF_SAVE_ONLY SELF_SAVE_STRICT +/* Prefer Restore when available, otherwise prefer Save */ +#define PREFER_RESTORE_SAVE((SELF_SAVE_STRICT << \ + PREFERENCE_SHIFT)\ + | SELF_RESTORE_STRICT) +/* Prefer Save when available, otherwise prefer Restore*/ +#define PREFER_SAVE_RESTORE((SELF_RESTORE_STRICT <<\ + PREFERENCE_SHIFT)\ + | SELF_SAVE_STRICT) static u32 supported_cpuidle_states; struct pnv_idle_states_t *pnv_idle_states; int nr_pnv_idle_states; +/* Caching the lpcr & ptcr support to use later */ +static bool is_lpcr_self_save; +static bool is_ptcr_self_save; + +struct preferred_sprs { + u64 spr; + u32 preferred_mode; + u32 supported_mode; +}; + +/* + * Preferred mode: Order of precedence when both self-save and self-restore + *supported + * Supported mode: Default support. Can be overwritten during system + *initialization + */ +struct preferred_sprs preferred_sprs[] = { + { + .spr = SPRN_HSPRG0, + .preferred_mode = PREFER_RESTORE_SAVE, + .supported_mode = SELF_RESTORE_STRICT, + }, + { + .spr = SPRN_LPCR, + .preferred_mode = PREFER_RESTORE_SAVE, + .supported_mode = SELF_RESTORE_STRICT, + }, + { + .spr = SPRN_PTCR, + .preferred_mode = PREFER_SAVE_RESTORE, + .supported_mode = SELF_RESTORE_STRICT, + }, + { + .spr = SPRN_HMEER, + .preferred_mode = PREFER_RESTORE_SAVE, + .supported_mode = SELF_RESTORE_STRICT, + }, + { + .spr = SPRN_HID0, + .preferred_mode = PREFER_RESTORE_SAVE, + .supported_mode = SELF_RESTORE_STRICT, + }, + { + .spr = P9_STOP_SPR_MSR, + .preferred_mode = PREFER_RESTORE_SAVE, + .supported_mode = SELF_RESTORE_STRICT, + }, + { + .spr = P9_STOP_SPR_PSSCR, + .preferred_mode = PREFER_SAVE_RESTORE, + .supported_mode = SELF_RESTORE_STRICT, + }, + { + .spr = SPRN_HID1, + .preferred_mode = PREFER_RESTORE_SAVE, + .supported_mode = SELF_RESTORE_STRICT, + }, + { + .spr = SPRN_HID4, + .preferred_mode = PREFER_RESTORE_SAVE, + .supported_mode = SELF_RESTORE_STRICT, +
Re: [PATCH v4 3/3] powerpc/powernv: Parse device tree, population of SPR support
Thank you Michael for the review. On 17/03/20 8:43 am, Michael Ellerman wrote: Pratik Rajesh Sampat writes: Parse the device tree for nodes self-save, self-restore and populate support for the preferred SPRs based what was advertised by the device tree. These should be documented in: Documentation/devicetree/bindings/powerpc/opal/power-mgt.txt Sure thing I'll add them. diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c index 97aeb45e897b..27dfadf609e8 100644 --- a/arch/powerpc/platforms/powernv/idle.c +++ b/arch/powerpc/platforms/powernv/idle.c @@ -1436,6 +1436,85 @@ static void __init pnv_probe_idle_states(void) supported_cpuidle_states |= pnv_idle_states[i].flags; } +/* + * Extracts and populates the self save or restore capabilities + * passed from the device tree node + */ +static int extract_save_restore_state_dt(struct device_node *np, int type) +{ + int nr_sprns = 0, i, bitmask_index; + int rc = 0; + u64 *temp_u64; + u64 bit_pos; + + nr_sprns = of_property_count_u64_elems(np, "sprn-bitmask"); + if (nr_sprns <= 0) + return rc; Using <= 0 means zero SPRs is treated by success as the caller, is that intended? If so a comment would be appropriate. My bad, this is undesirable. This should be treated as a failure. I'll fix this. + temp_u64 = kcalloc(nr_sprns, sizeof(u64), GFP_KERNEL); + if (of_property_read_u64_array(np, "sprn-bitmask", + temp_u64, nr_sprns)) { + pr_warn("cpuidle-powernv: failed to find registers in DT\n"); + kfree(temp_u64); + return -EINVAL; + } + /* +* Populate acknowledgment of support for the sprs in the global vector +* gotten by the registers supplied by the firmware. +* The registers are in a bitmask, bit index within +* that specifies the SPR +*/ + for (i = 0; i < nr_preferred_sprs; i++) { + bitmask_index = preferred_sprs[i].spr / 64; + bit_pos = preferred_sprs[i].spr % 64; This is basically a hand coded bitmap, see eg. BIT_WORD(), BIT_MASK() etc. I don't think there's an easy way to convert temp_u64 into a proper bitmap, so it's probably not worth doing that. But at least use the macros. Right. I'll use the macros for a cleaner abstraction. + if ((temp_u64[bitmask_index] & (1UL << bit_pos)) == 0) { + if (type == SELF_RESTORE_TYPE) + preferred_sprs[i].supported_mode &= + ~SELF_RESTORE_STRICT; + else + preferred_sprs[i].supported_mode &= + ~SELF_SAVE_STRICT; + continue; + } + if (type == SELF_RESTORE_TYPE) { + preferred_sprs[i].supported_mode |= + SELF_RESTORE_STRICT; + } else { + preferred_sprs[i].supported_mode |= + SELF_SAVE_STRICT; + } + } + + kfree(temp_u64); + return rc; +} + +static int pnv_parse_deepstate_dt(void) +{ + struct device_node *sr_np, *ss_np; You never use these concurrently AFAICS, so you could just have a single *np. Sure, got rid of it. + int rc = 0, i; + + /* Self restore register population */ + sr_np = of_find_node_by_path("/ibm,opal/power-mgt/self-restore"); I know the existing idle code uses of_find_node_by_path(), but that's because it's old and crufty. Please don't add new searches by path. You should be searching by compatible. Alright, I'll replace of_find_node_by_path() calls with of_find_compatible_node() + if (!sr_np) { + pr_warn("opal: self restore Node not found"); This warning and the others below will fire on all existing firmware versions, which is not OK. I'll get rid of the warnings. They seem unnecessary to me also now. + } else { + rc = extract_save_restore_state_dt(sr_np, SELF_RESTORE_TYPE); + if (rc != 0) + return rc; + } + /* Self save register population */ + ss_np = of_find_node_by_path("/ibm,opal/power-mgt/self-save"); + if (!ss_np) { + pr_warn("opal: self save Node not found"); + pr_warn("Legacy firmware. Assuming default self-restore support"); + for (i = 0; i < nr_preferred_sprs; i++) + preferred_sprs[i].supported_mode &= ~SELF_SAVE_STRICT; + } else { + rc = extract_save_restore_state_dt(ss_np, SELF_SAVE_TYPE); + } + return rc; You're leaking references on all the device_nodes in here, you need of_node_put() before exiting. Got it. I'll clear the refcount before exitting. +} cheers Thanks! Prat
Re: [PATCH 1/3] powerpc/numa: Set numa_node for all possible cpus
On Tue 17-03-20 14:44:45, Vlastimil Babka wrote: > On 3/16/20 10:06 AM, Michal Hocko wrote: > > On Thu 12-03-20 17:41:58, Vlastimil Babka wrote: > > [...] > >> with nid present in: > >> N_POSSIBLE - pgdat might not exist, node_to_mem_node() must return some > >> online > > > > I would rather have a dummy pgdat for those. Have a look at > > $ git grep "NODE_DATA.*->" | wc -l > > 63 > > > > Who knows how many else we have there. I haven't looked more closely. > > Besides that what is a real reason to not have pgdat ther and force all > > users of a $random node from those that the platform considers possible > > for special casing? Is that a memory overhead? Is that really a thing? > > I guess we can ignore memory overhead. I guess there only might be some > concern > that for nodes that are initially offline, we will allocate the pgdat on a > different node, and after they are online, it will stay on a different node > with > more access latency from local cpus. If we only allocate for online nodes, it > can always be local? But I guess it doesn't matter that much. This is not the case even now because of chicke&egg. You need a memory to allocate from and that memory has to be managed somewhere per node (pgdat). Keep in mind we do not have the bootmem allocator for the hotplug. Have a look at hotadd_new_pgdat and when it is called. There are some attempts to allocate memmap from the hotpluged memory but I am not sure we can do the whole thing without pgdat in place. If we can then can come up with some replace the pgdat magic. But still I am not even sure this is something we really have to optimize for. -- Michal Hocko SUSE Labs
Re: [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab
On 3/17/20 2:45 PM, Srikar Dronamraju wrote: > * Vlastimil Babka [2020-03-17 14:34:25]: > >> On 3/17/20 2:17 PM, Srikar Dronamraju wrote: >> > Currently while allocating a slab for a offline node, we use its >> > associated node_numa_mem to search for a partial slab. If we don't find >> > a partial slab, we try allocating a slab from the offline node using >> > __alloc_pages_node. However this is bound to fail. >> > >> > NIP [c039a300] __alloc_pages_nodemask+0x130/0x3b0 >> > LR [c039a3c4] __alloc_pages_nodemask+0x1f4/0x3b0 >> > Call Trace: >> > [c008b36837f0] [c039a3b4] __alloc_pages_nodemask+0x1e4/0x3b0 >> > (unreliable) >> > [c008b3683870] [c03d1ff8] new_slab+0x128/0xcf0 >> > [c008b3683950] [c03d6060] ___slab_alloc+0x410/0x820 >> > [c008b3683a40] [c03d64a4] __slab_alloc+0x34/0x60 >> > [c008b3683a70] [c03d78b0] __kmalloc_node+0x110/0x490 >> > [c008b3683af0] [c0343a08] kvmalloc_node+0x58/0x110 >> > [c008b3683b30] [c03ffd44] mem_cgroup_css_online+0x104/0x270 >> > [c008b3683b90] [c0234e08] online_css+0x48/0xd0 >> > [c008b3683bc0] [c023dedc] >> > cgroup_apply_control_enable+0x2ec/0x4d0 >> > [c008b3683ca0] [c02416f8] cgroup_mkdir+0x228/0x5f0 >> > [c008b3683d10] [c0520360] kernfs_iop_mkdir+0x90/0xf0 >> > [c008b3683d50] [c043e400] vfs_mkdir+0x110/0x230 >> > [c008b3683da0] [c0441ee0] do_mkdirat+0xb0/0x1a0 >> > [c008b3683e20] [c000b278] system_call+0x5c/0x68 >> > >> > Mitigate this by allocating the new slab from the node_numa_mem. >> >> Are you sure this is really needed and the other 3 patches are not enough for >> the current SLUB code to work as needed? It seems you are changing the >> semantics >> here... >> > > The other 3 patches are not enough because we don't carry the searchnode > when the actual alloc_pages_node gets called. > > With only the 3 patches, we see the above Panic, its signature is slightly > different from what Sachin first reported and which I have carried in 1st > patch. Ah, I see. So that's the missing pgdat after your series [1] right? That sounds like an argument for Michal's suggestions that pgdats exist and have correctly populated zonelists for all possible nodes. node_to_mem_node() could be just a shortcut for the first zone's node in the zonelist, so that fallback follows the topology. [1] https://lore.kernel.org/linuxppc-dev/2020030237.5731-1-sri...@linux.vnet.ibm.com/t/#m76e5b4c4084380b1d4b193d5aa0359b987f2290e >> > --- a/mm/slub.c >> > +++ b/mm/slub.c >> > @@ -1970,14 +1970,8 @@ static void *get_partial(struct kmem_cache *s, >> > gfp_t flags, int node, >> >struct kmem_cache_cpu *c) >> > { >> >void *object; >> > - int searchnode = node; >> > >> > - if (node == NUMA_NO_NODE) >> > - searchnode = numa_mem_id(); >> > - else if (!node_present_pages(node)) >> > - searchnode = node_to_mem_node(node); >> > - >> > - object = get_partial_node(s, get_node(s, searchnode), c, flags); >> > + object = get_partial_node(s, get_node(s, node), c, flags); >> >if (object || node != NUMA_NO_NODE)>return object; >> > >> > return get_any_partial(s, flags, c); >> >> I.e. here in this if(), now node will never equal NUMA_NO_NODE (thanks to the >> hunk below), thus the get_any_partial() call becomes dead code? >> >> > @@ -2470,6 +2464,11 @@ static inline void *new_slab_objects(struct >> > kmem_cache *s, gfp_t flags, >> > >> >WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO)); >> > >> > + if (node == NUMA_NO_NODE) >> > + node = numa_mem_id(); >> > + else if (!node_present_pages(node)) >> > + node = node_to_mem_node(node); >> > + >> >freelist = get_partial(s, flags, node, c); >> > >> >if (freelist) >> > @@ -2569,12 +2568,10 @@ static void *___slab_alloc(struct kmem_cache *s, >> > gfp_t gfpflags, int node, >> > redo: >> > >> >if (unlikely(!node_match(page, node))) { >> > - int searchnode = node; >> > - >> >if (node != NUMA_NO_NODE && !node_present_pages(node)) >> > - searchnode = node_to_mem_node(node); >> > + node = node_to_mem_node(node); >> > >> > - if (unlikely(!node_match(page, searchnode))) { >> > + if (unlikely(!node_match(page, node))) { >> >stat(s, ALLOC_NODE_MISMATCH); >> >deactivate_slab(s, page, c->freelist, c); >> >goto new_slab; >> > >> >
Re: [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab
* Vlastimil Babka [2020-03-17 14:34:25]: > On 3/17/20 2:17 PM, Srikar Dronamraju wrote: > > Currently while allocating a slab for a offline node, we use its > > associated node_numa_mem to search for a partial slab. If we don't find > > a partial slab, we try allocating a slab from the offline node using > > __alloc_pages_node. However this is bound to fail. > > > > NIP [c039a300] __alloc_pages_nodemask+0x130/0x3b0 > > LR [c039a3c4] __alloc_pages_nodemask+0x1f4/0x3b0 > > Call Trace: > > [c008b36837f0] [c039a3b4] __alloc_pages_nodemask+0x1e4/0x3b0 > > (unreliable) > > [c008b3683870] [c03d1ff8] new_slab+0x128/0xcf0 > > [c008b3683950] [c03d6060] ___slab_alloc+0x410/0x820 > > [c008b3683a40] [c03d64a4] __slab_alloc+0x34/0x60 > > [c008b3683a70] [c03d78b0] __kmalloc_node+0x110/0x490 > > [c008b3683af0] [c0343a08] kvmalloc_node+0x58/0x110 > > [c008b3683b30] [c03ffd44] mem_cgroup_css_online+0x104/0x270 > > [c008b3683b90] [c0234e08] online_css+0x48/0xd0 > > [c008b3683bc0] [c023dedc] > > cgroup_apply_control_enable+0x2ec/0x4d0 > > [c008b3683ca0] [c02416f8] cgroup_mkdir+0x228/0x5f0 > > [c008b3683d10] [c0520360] kernfs_iop_mkdir+0x90/0xf0 > > [c008b3683d50] [c043e400] vfs_mkdir+0x110/0x230 > > [c008b3683da0] [c0441ee0] do_mkdirat+0xb0/0x1a0 > > [c008b3683e20] [c000b278] system_call+0x5c/0x68 > > > > Mitigate this by allocating the new slab from the node_numa_mem. > > Are you sure this is really needed and the other 3 patches are not enough for > the current SLUB code to work as needed? It seems you are changing the > semantics > here... > The other 3 patches are not enough because we don't carry the searchnode when the actual alloc_pages_node gets called. With only the 3 patches, we see the above Panic, its signature is slightly different from what Sachin first reported and which I have carried in 1st patch. > > --- a/mm/slub.c > > +++ b/mm/slub.c > > @@ -1970,14 +1970,8 @@ static void *get_partial(struct kmem_cache *s, gfp_t > > flags, int node, > > struct kmem_cache_cpu *c) > > { > > void *object; > > - int searchnode = node; > > > > - if (node == NUMA_NO_NODE) > > - searchnode = numa_mem_id(); > > - else if (!node_present_pages(node)) > > - searchnode = node_to_mem_node(node); > > - > > - object = get_partial_node(s, get_node(s, searchnode), c, flags); > > + object = get_partial_node(s, get_node(s, node), c, flags); > > if (object || node != NUMA_NO_NODE)>return object; > > > > return get_any_partial(s, flags, c); > > I.e. here in this if(), now node will never equal NUMA_NO_NODE (thanks to the > hunk below), thus the get_any_partial() call becomes dead code? > > > @@ -2470,6 +2464,11 @@ static inline void *new_slab_objects(struct > > kmem_cache *s, gfp_t flags, > > > > WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO)); > > > > + if (node == NUMA_NO_NODE) > > + node = numa_mem_id(); > > + else if (!node_present_pages(node)) > > + node = node_to_mem_node(node); > > + > > freelist = get_partial(s, flags, node, c); > > > > if (freelist) > > @@ -2569,12 +2568,10 @@ static void *___slab_alloc(struct kmem_cache *s, > > gfp_t gfpflags, int node, > > redo: > > > > if (unlikely(!node_match(page, node))) { > > - int searchnode = node; > > - > > if (node != NUMA_NO_NODE && !node_present_pages(node)) > > - searchnode = node_to_mem_node(node); > > + node = node_to_mem_node(node); > > > > - if (unlikely(!node_match(page, searchnode))) { > > + if (unlikely(!node_match(page, node))) { > > stat(s, ALLOC_NODE_MISMATCH); > > deactivate_slab(s, page, c->freelist, c); > > goto new_slab; > > > -- Thanks and Regards Srikar Dronamraju
Re: [PATCH 1/3] powerpc/numa: Set numa_node for all possible cpus
On 3/16/20 10:06 AM, Michal Hocko wrote: > On Thu 12-03-20 17:41:58, Vlastimil Babka wrote: > [...] >> with nid present in: >> N_POSSIBLE - pgdat might not exist, node_to_mem_node() must return some >> online > > I would rather have a dummy pgdat for those. Have a look at > $ git grep "NODE_DATA.*->" | wc -l > 63 > > Who knows how many else we have there. I haven't looked more closely. > Besides that what is a real reason to not have pgdat ther and force all > users of a $random node from those that the platform considers possible > for special casing? Is that a memory overhead? Is that really a thing? I guess we can ignore memory overhead. I guess there only might be some concern that for nodes that are initially offline, we will allocate the pgdat on a different node, and after they are online, it will stay on a different node with more access latency from local cpus. If we only allocate for online nodes, it can always be local? But I guess it doesn't matter that much. > Somebody has suggested to tweak some of the low level routines to do the > special casing but I really have to say I do not like that. We shouldn't > use the first online node or anything like that. We should simply always > follow the topology presented by FW and of that we need to have a pgdat. >
Re: [PATCH 1/4] mm: Check for node_online in node_present_pages
* Srikar Dronamraju [2020-03-17 18:47:50]: > > Reported-by: Sachin Sant > Tested-by: Sachin Sant > Signed-off-by: Srikar Dronamraju > --- > include/linux/mmzone.h | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index f3f264826423..88078a3b95e5 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -756,8 +756,10 @@ typedef struct pglist_data { > atomic_long_t vm_stat[NR_VM_NODE_STAT_ITEMS]; > } pg_data_t; > > -#define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages) > -#define node_spanned_pages(nid) (NODE_DATA(nid)->node_spanned_pages) > +#define node_present_pages(nid) \ > + (node_online(nid) ? NODE_DATA(nid)->node_present_pages : 0) > +#define node_spanned_pages(nid) \ > + (node_online(nid) ? NODE_DATA(nid)->node_spanned_pages : 0) > #ifdef CONFIG_FLAT_NODE_MEM_MAP > #define pgdat_page_nr(pgdat, pagenr) ((pgdat)->node_mem_map + (pagenr)) > #else If we indeed start having pgdat/NODE_DATA for even offline nodes as Michal Hocko, then we may not this particular patch. -- Thanks and Regards Srikar Dronamraju
Re: [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab
On 3/17/20 2:17 PM, Srikar Dronamraju wrote: > Currently while allocating a slab for a offline node, we use its > associated node_numa_mem to search for a partial slab. If we don't find > a partial slab, we try allocating a slab from the offline node using > __alloc_pages_node. However this is bound to fail. > > NIP [c039a300] __alloc_pages_nodemask+0x130/0x3b0 > LR [c039a3c4] __alloc_pages_nodemask+0x1f4/0x3b0 > Call Trace: > [c008b36837f0] [c039a3b4] __alloc_pages_nodemask+0x1e4/0x3b0 > (unreliable) > [c008b3683870] [c03d1ff8] new_slab+0x128/0xcf0 > [c008b3683950] [c03d6060] ___slab_alloc+0x410/0x820 > [c008b3683a40] [c03d64a4] __slab_alloc+0x34/0x60 > [c008b3683a70] [c03d78b0] __kmalloc_node+0x110/0x490 > [c008b3683af0] [c0343a08] kvmalloc_node+0x58/0x110 > [c008b3683b30] [c03ffd44] mem_cgroup_css_online+0x104/0x270 > [c008b3683b90] [c0234e08] online_css+0x48/0xd0 > [c008b3683bc0] [c023dedc] cgroup_apply_control_enable+0x2ec/0x4d0 > [c008b3683ca0] [c02416f8] cgroup_mkdir+0x228/0x5f0 > [c008b3683d10] [c0520360] kernfs_iop_mkdir+0x90/0xf0 > [c008b3683d50] [c043e400] vfs_mkdir+0x110/0x230 > [c008b3683da0] [c0441ee0] do_mkdirat+0xb0/0x1a0 > [c008b3683e20] [c000b278] system_call+0x5c/0x68 > > Mitigate this by allocating the new slab from the node_numa_mem. Are you sure this is really needed and the other 3 patches are not enough for the current SLUB code to work as needed? It seems you are changing the semantics here... > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -1970,14 +1970,8 @@ static void *get_partial(struct kmem_cache *s, gfp_t > flags, int node, > struct kmem_cache_cpu *c) > { > void *object; > - int searchnode = node; > > - if (node == NUMA_NO_NODE) > - searchnode = numa_mem_id(); > - else if (!node_present_pages(node)) > - searchnode = node_to_mem_node(node); > - > - object = get_partial_node(s, get_node(s, searchnode), c, flags); > + object = get_partial_node(s, get_node(s, node), c, flags); > if (object || node != NUMA_NO_NODE)>return object; > > return get_any_partial(s, flags, c); I.e. here in this if(), now node will never equal NUMA_NO_NODE (thanks to the hunk below), thus the get_any_partial() call becomes dead code? > @@ -2470,6 +2464,11 @@ static inline void *new_slab_objects(struct kmem_cache > *s, gfp_t flags, > > WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO)); > > + if (node == NUMA_NO_NODE) > + node = numa_mem_id(); > + else if (!node_present_pages(node)) > + node = node_to_mem_node(node); > + > freelist = get_partial(s, flags, node, c); > > if (freelist) > @@ -2569,12 +2568,10 @@ static void *___slab_alloc(struct kmem_cache *s, > gfp_t gfpflags, int node, > redo: > > if (unlikely(!node_match(page, node))) { > - int searchnode = node; > - > if (node != NUMA_NO_NODE && !node_present_pages(node)) > - searchnode = node_to_mem_node(node); > + node = node_to_mem_node(node); > > - if (unlikely(!node_match(page, searchnode))) { > + if (unlikely(!node_match(page, node))) { > stat(s, ALLOC_NODE_MISMATCH); > deactivate_slab(s, page, c->freelist, c); > goto new_slab; >
Re: [PATCH 08/12] docs: fix broken references to text files
On Tue, Mar 17, 2020 at 02:10:47PM +0100, Mauro Carvalho Chehab wrote: > Several references got broken due to txt to ReST conversion. > > Several of them can be automatically fixed with: > > scripts/documentation-file-ref-check --fix > > Signed-off-by: Mauro Carvalho Chehab > Documentation/admin-guide/kernel-parameters.txt | 2 +- > Documentation/memory-barriers.txt| 2 +- > Documentation/process/submit-checklist.rst | 2 +- > .../translations/it_IT/process/submit-checklist.rst | 2 +- > Documentation/translations/ko_KR/memory-barriers.txt | 2 +- > .../translations/zh_CN/filesystems/sysfs.txt | 2 +- > .../translations/zh_CN/process/submit-checklist.rst | 2 +- > Documentation/virt/kvm/arm/pvtime.rst| 2 +- > Documentation/virt/kvm/devices/vcpu.rst | 2 +- > Documentation/virt/kvm/hypercalls.rst| 4 ++-- > arch/powerpc/include/uapi/asm/kvm_para.h | 2 +- > drivers/gpu/drm/Kconfig | 2 +- > drivers/gpu/drm/drm_ioctl.c | 2 +- > drivers/hwtracing/coresight/Kconfig | 2 +- > fs/fat/Kconfig | 8 > fs/fuse/Kconfig | 2 +- > fs/fuse/dev.c| 2 +- > fs/overlayfs/Kconfig | 6 +++--- > include/linux/mm.h | 4 ++-- > include/uapi/linux/ethtool_netlink.h | 2 +- > include/uapi/rdma/rdma_user_ioctl_cmds.h | 2 +- For the rdma files Acked-by: Jason Gunthorpe Jason
[PATCH 1/4] mm: Check for node_online in node_present_pages
Calling a kmalloc_node on a possible node which is not yet onlined can lead to panic. Currently node_present_pages() doesn't verify the node is online before accessing the pgdat for the node. However pgdat struct may not be available resulting in a crash. NIP [c03d55f4] ___slab_alloc+0x1f4/0x760 LR [c03d5b94] __slab_alloc+0x34/0x60 Call Trace: [c008b3783960] [c03d5734] ___slab_alloc+0x334/0x760 (unreliable) [c008b3783a40] [c03d5b94] __slab_alloc+0x34/0x60 [c008b3783a70] [c03d6fa0] __kmalloc_node+0x110/0x490 [c008b3783af0] [c03443d8] kvmalloc_node+0x58/0x110 [c008b3783b30] [c03fee38] mem_cgroup_css_online+0x108/0x270 [c008b3783b90] [c0235aa8] online_css+0x48/0xd0 [c008b3783bc0] [c023eaec] cgroup_apply_control_enable+0x2ec/0x4d0 [c008b3783ca0] [c0242318] cgroup_mkdir+0x228/0x5f0 [c008b3783d10] [c051e170] kernfs_iop_mkdir+0x90/0xf0 [c008b3783d50] [c043dc00] vfs_mkdir+0x110/0x230 [c008b3783da0] [c0441c90] do_mkdirat+0xb0/0x1a0 [c008b3783e20] [c000b278] system_call+0x5c/0x68 Fix this by verifying the node is online before accessing the pgdat structure. Fix the same for node_spanned_pages() too. Cc: Andrew Morton Cc: linux...@kvack.org Cc: Mel Gorman Cc: Michael Ellerman Cc: Sachin Sant Cc: Michal Hocko Cc: Christopher Lameter Cc: linuxppc-dev@lists.ozlabs.org Cc: Joonsoo Kim Cc: Kirill Tkhai Cc: Vlastimil Babka Cc: Srikar Dronamraju Cc: Bharata B Rao Reported-by: Sachin Sant Tested-by: Sachin Sant Signed-off-by: Srikar Dronamraju --- include/linux/mmzone.h | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index f3f264826423..88078a3b95e5 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -756,8 +756,10 @@ typedef struct pglist_data { atomic_long_t vm_stat[NR_VM_NODE_STAT_ITEMS]; } pg_data_t; -#define node_present_pages(nid)(NODE_DATA(nid)->node_present_pages) -#define node_spanned_pages(nid)(NODE_DATA(nid)->node_spanned_pages) +#define node_present_pages(nid)\ + (node_online(nid) ? NODE_DATA(nid)->node_present_pages : 0) +#define node_spanned_pages(nid)\ + (node_online(nid) ? NODE_DATA(nid)->node_spanned_pages : 0) #ifdef CONFIG_FLAT_NODE_MEM_MAP #define pgdat_page_nr(pgdat, pagenr) ((pgdat)->node_mem_map + (pagenr)) #else -- 2.18.1
[PATCH 4/4] powerpc/numa: Set fallback nodes for offline nodes
Currently fallback nodes for offline nodes aren't set. Hence by default node 0 ends up being the default node. However node 0 might be offline. Fix this by explicitly setting fallback node. Ensure first_memory_node is set before kernel does explicit setting of fallback node. Cc: Andrew Morton Cc: linux...@kvack.org Cc: Mel Gorman Cc: Michael Ellerman Cc: Sachin Sant Cc: Michal Hocko Cc: Christopher Lameter Cc: linuxppc-dev@lists.ozlabs.org Cc: Joonsoo Kim Cc: Kirill Tkhai Cc: Vlastimil Babka Cc: Srikar Dronamraju Cc: Bharata B Rao Reported-by: Sachin Sant Tested-by: Sachin Sant Signed-off-by: Srikar Dronamraju --- arch/powerpc/mm/numa.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 281531340230..6e97ab6575cb 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -827,7 +827,16 @@ void __init dump_numa_cpu_topology(void) if (!numa_enabled) return; - for_each_online_node(node) { + for_each_node(node) { + /* +* For all possible but not yet online nodes, ensure their +* node_numa_mem is set correctly so that kmalloc_node works +* for such nodes. +*/ + if (!node_online(node)) { + reset_numa_mem(node); + continue; + } pr_info("Node %d CPUs:", node); count = 0; -- 2.18.1
[PATCH 3/4] mm: Implement reset_numa_mem
For a memoryless or offline nodes, node_numa_mem refers to a N_MEMORY fallback node. Currently kernel has an API set_numa_mem that sets node_numa_mem for memoryless node. However this API cannot be used for offline nodes. Hence all offline nodes will have their node_numa_mem set to 0. However systems can themselves have node 0 as offline i.e memoryless and cpuless at this time. In such cases, node_to_mem_node() fails to provide a N_MEMORY fallback node. Mitigate this by having a new API that sets the default node_numa_mem for offline nodes to be first_memory_node. Cc: Andrew Morton Cc: linux...@kvack.org Cc: Mel Gorman Cc: Michael Ellerman Cc: Sachin Sant Cc: Michal Hocko Cc: Christopher Lameter Cc: linuxppc-dev@lists.ozlabs.org Cc: Joonsoo Kim Cc: Kirill Tkhai Cc: Vlastimil Babka Cc: Srikar Dronamraju Cc: Bharata B Rao Reported-by: Sachin Sant Tested-by: Sachin Sant Signed-off-by: Srikar Dronamraju --- include/asm-generic/topology.h | 3 +++ include/linux/topology.h | 7 +++ 2 files changed, 10 insertions(+) diff --git a/include/asm-generic/topology.h b/include/asm-generic/topology.h index 238873739550..e803ee7850e6 100644 --- a/include/asm-generic/topology.h +++ b/include/asm-generic/topology.h @@ -68,6 +68,9 @@ #ifndef set_numa_mem #define set_numa_mem(node) #endif +#ifndef reset_numa_mem +#define reset_numa_mem(node) +#endif #ifndef set_cpu_numa_mem #define set_cpu_numa_mem(cpu, node) #endif diff --git a/include/linux/topology.h b/include/linux/topology.h index eb2fe6edd73c..bebda80038bf 100644 --- a/include/linux/topology.h +++ b/include/linux/topology.h @@ -147,6 +147,13 @@ static inline int node_to_mem_node(int node) } #endif +#ifndef reset_numa_mem +static inline void reset_numa_mem(int node) +{ + _node_numa_mem_[node] = first_memory_node; +} +#endif + #ifndef numa_mem_id /* Returns the number of the nearest Node with memory */ static inline int numa_mem_id(void) -- 2.18.1
[PATCH 0/4] Fix kmalloc_node on offline nodes
Sachin recently reported that linux-next was no more bootable on few systems. https://lore.kernel.org/linux-next/3381cd91-ab3d-4773-ba04-e7a072a63...@linux.vnet.ibm.com/ # numactl -H available: 2 nodes (0-1) node 0 cpus: node 0 size: 0 MB node 0 free: 0 MB node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 node 1 size: 35247 MB node 1 free: 30907 MB node distances: node 0 1 0: 10 40 1: 40 10 # Sachin bisected the problem to Commit a75056fc1e7c ("mm/memcontrol.c: allocate shrinker_map on appropriate NUMA node") The root cause analysis showed that mm/slub and powerpc/numa had some shortcomings with respect to offline nodes. This patch series is on top of patches posted at https://lore.kernel.org/linuxppc-dev/2020030237.5731-1-sri...@linux.vnet.ibm.com/t/#u Cc: Andrew Morton Cc: linux...@kvack.org Cc: Mel Gorman Cc: Michael Ellerman Cc: Sachin Sant Cc: Michal Hocko Cc: Christopher Lameter Cc: linuxppc-dev@lists.ozlabs.org Cc: Joonsoo Kim Cc: Kirill Tkhai Cc: Vlastimil Babka Cc: Srikar Dronamraju Cc: Bharata B Rao Srikar Dronamraju (4): mm: Check for node_online in node_present_pages mm/slub: Use mem_node to allocate a new slab mm: Implement reset_numa_mem powerpc/numa: Set fallback nodes for offline nodes arch/powerpc/mm/numa.c | 11 ++- include/asm-generic/topology.h | 3 +++ include/linux/mmzone.h | 6 -- include/linux/topology.h | 7 +++ mm/slub.c | 19 --- 5 files changed, 32 insertions(+), 14 deletions(-) -- 2.18.1
[PATCH 2/4] mm/slub: Use mem_node to allocate a new slab
Currently while allocating a slab for a offline node, we use its associated node_numa_mem to search for a partial slab. If we don't find a partial slab, we try allocating a slab from the offline node using __alloc_pages_node. However this is bound to fail. NIP [c039a300] __alloc_pages_nodemask+0x130/0x3b0 LR [c039a3c4] __alloc_pages_nodemask+0x1f4/0x3b0 Call Trace: [c008b36837f0] [c039a3b4] __alloc_pages_nodemask+0x1e4/0x3b0 (unreliable) [c008b3683870] [c03d1ff8] new_slab+0x128/0xcf0 [c008b3683950] [c03d6060] ___slab_alloc+0x410/0x820 [c008b3683a40] [c03d64a4] __slab_alloc+0x34/0x60 [c008b3683a70] [c03d78b0] __kmalloc_node+0x110/0x490 [c008b3683af0] [c0343a08] kvmalloc_node+0x58/0x110 [c008b3683b30] [c03ffd44] mem_cgroup_css_online+0x104/0x270 [c008b3683b90] [c0234e08] online_css+0x48/0xd0 [c008b3683bc0] [c023dedc] cgroup_apply_control_enable+0x2ec/0x4d0 [c008b3683ca0] [c02416f8] cgroup_mkdir+0x228/0x5f0 [c008b3683d10] [c0520360] kernfs_iop_mkdir+0x90/0xf0 [c008b3683d50] [c043e400] vfs_mkdir+0x110/0x230 [c008b3683da0] [c0441ee0] do_mkdirat+0xb0/0x1a0 [c008b3683e20] [c000b278] system_call+0x5c/0x68 Mitigate this by allocating the new slab from the node_numa_mem. Cc: Andrew Morton Cc: linux...@kvack.org Cc: Mel Gorman Cc: Michael Ellerman Cc: Sachin Sant Cc: Michal Hocko Cc: Christopher Lameter Cc: linuxppc-dev@lists.ozlabs.org Cc: Joonsoo Kim Cc: Kirill Tkhai Cc: Vlastimil Babka Cc: Srikar Dronamraju Cc: Bharata B Rao Reported-by: Sachin Sant Tested-by: Sachin Sant Signed-off-by: Srikar Dronamraju --- mm/slub.c | 19 --- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index 1c55bf7892bf..fdf7f38f96e6 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1970,14 +1970,8 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node, struct kmem_cache_cpu *c) { void *object; - int searchnode = node; - if (node == NUMA_NO_NODE) - searchnode = numa_mem_id(); - else if (!node_present_pages(node)) - searchnode = node_to_mem_node(node); - - object = get_partial_node(s, get_node(s, searchnode), c, flags); + object = get_partial_node(s, get_node(s, node), c, flags); if (object || node != NUMA_NO_NODE) return object; @@ -2470,6 +2464,11 @@ static inline void *new_slab_objects(struct kmem_cache *s, gfp_t flags, WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO)); + if (node == NUMA_NO_NODE) + node = numa_mem_id(); + else if (!node_present_pages(node)) + node = node_to_mem_node(node); + freelist = get_partial(s, flags, node, c); if (freelist) @@ -2569,12 +2568,10 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node, redo: if (unlikely(!node_match(page, node))) { - int searchnode = node; - if (node != NUMA_NO_NODE && !node_present_pages(node)) - searchnode = node_to_mem_node(node); + node = node_to_mem_node(node); - if (unlikely(!node_match(page, searchnode))) { + if (unlikely(!node_match(page, node))) { stat(s, ALLOC_NODE_MISMATCH); deactivate_slab(s, page, c->freelist, c); goto new_slab; -- 2.18.1
Re: [PATCH] powerpc/32: Fix missing NULL pmd check in virt_to_kpte()
On Sat, 2020-03-07 at 10:09:15 UTC, Christophe Leroy wrote: > Commit 2efc7c085f05 ("powerpc/32: drop get_pteptr()"), > replaced get_pteptr() by virt_to_kpte(). But virt_to_kpte() lacks a > NULL pmd check and returns an invalid non NULL pointer when there > is no page table. > > Reported-by: Nick Desaulniers > Fixes: 2efc7c085f05 ("powerpc/32: drop get_pteptr()") > Signed-off-by: Christophe Leroy Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/cc6f0e39000900e5dd1448103a9571f0eccd7d4e cheers
Re: [PATCH v2 -next] powerpc/pmac/smp: drop unnecessary volatile qualifier
On Tue, 2020-03-03 at 08:56:04 UTC, YueHaibing wrote: > core99_l2_cache/core99_l3_cache no need to mark as volatile, > just remove it. > > Signed-off-by: YueHaibing Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/a4037d1f1fc4e92b69d7196d4568c33078d465ea cheers
Re: [PATCH] powerpc/64s/radix: Fix !SMP build
On Mon, 2020-03-02 at 01:04:10 UTC, Nicholas Piggin wrote: > Signed-off-by: Nicholas Piggin Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/993cfecc59e57de237ae27fadc84ed24efa87a4d cheers
Re: [PATCH] selftests: powerpc: Add tlbie_test in .gitignore
On Fri, 2020-02-28 at 00:00:09 UTC, Christophe Leroy wrote: > The commit identified below added tlbie_test but > forgot to add it in .gitignore. > > Fixes: 93cad5f78995 ("selftests/powerpc: Add test case for tlbie vs mtpidr > ordering issue") > Cc: sta...@vger.kernel.org > Signed-off-by: Christophe Leroy Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/47bf235f324c696395c30541fe4fcf99fcd24188 cheers
Re: [PATCH] powerpc: fix emulate_step std test
On Wed, 2020-02-26 at 05:53:02 UTC, Nicholas Piggin wrote: > Signed-off-by: Nicholas Piggin Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/59ed2adf393109c56d383e568f2e57bb5ad6d901 cheers
Re: [PATCH] powerpc/vdso: remove deprecated VDS64_HAS_DESCRIPTORS references
On Mon, 2020-02-24 at 21:18:48 UTC, Joe Lawrence wrote: > The original 2005 patch that introduced the powerpc vdso, pre-git > ("ppc64: Implement a vDSO and use it for signal trampoline") notes that: > > ... symbols exposed by the vDSO aren't "normal" function symbols, apps > can't be expected to link against them directly, the vDSO's are both > seen as if they were linked at 0 and the symbols just contain offsets > to the various functions. This is done on purpose to avoid a > relocation step (ppc64 functions normally have descriptors with abs > addresses in them). When glibc uses those functions, it's expected to > use it's own trampolines that know how to reach them. > > Despite that explanation, there remains dead #ifdef > VDS64_HAS_DESCRIPTORS code-blocks that provide alternate function > definitions that setup function descriptors. > > Since VDS64_HAS_DESCRIPTORS has been unused for all these years, we > might as well finally remove it from the codebase. > > Link: https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-February/204430.= > html > Link: https://lore.kernel.org/lkml/1108002773.7733.196.camel@gaston/ > Signed-off-by: Joe Lawrence Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/ffd3eaf178b0f616a071e510e289d937330b0b35 cheers
Re: [PATCH 1/2] cpufreq/powernv: Fix use-after-free
On Thu, 2020-02-06 at 06:26:21 UTC, Oliver O'Halloran wrote: > The cpufreq driver has a use-after-free that we can hit if: > > a) There's an OCC message pending when the notifier is registered, and > b) The cpufreq driver fails to register with the core. > > When a) occurs the notifier schedules a workqueue item to handle the > message. The backing work_struct is located on chips[].throttle and when b) > happens we clean up by freeing the array. Once we get to the (now free) > queued item and the kernel crashes. > > Cc: Vaidyanathan Srinivasan > Fixes: c5e29ea ("cpufreq: powernv: Fix bugs in powernv_cpufreq_{init/exit}") > Signed-off-by: Oliver O'Halloran Series applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/d0a72efac89d1c35ac55197895201b7b94c5e6ef cheers
Re: [PATCH] powerpc/fsl_booke: avoid creating duplicate tlb1 entry
On Thu, 2020-01-23 at 11:19:25 UTC, Laurentiu Tudor wrote: > In the current implementation, the call to loadcam_multi() is wrapped > between switch_to_as1() and restore_to_as0() calls so, when it tries > to create its own temporary AS=3D1 TLB1 entry, it ends up duplicating the > existing one created by switch_to_as1(). Add a check to skip creating > the temporary entry if already running in AS=3D1. > > Fixes: d9e1831a4202 ("powerpc/85xx: Load all early TLB entries at once") > Signed-off-by: Laurentiu Tudor > Cc: sta...@vger.kernel.org Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/aa4113340ae6c2811e046f08c2bc21011d20a072 cheers
Re: [PATCH] evh_bytechan: fix out of bounds accesses
On Thu, 2020-01-09 at 07:39:12 UTC, Stephen Rothwell wrote: > ev_byte_channel_send() assumes that its third argument is a 16 byte array. > Some places where it is called it may not be (or we can't easily tell > if it is). Newer compilers have started producing warnings about this, > so make sure we actually pass a 16 byte array. > > There may be more elegant solutions to this, but the driver is quite > old and hasn't been updated in many years. > > The warnings (from a powerpc allyesconfig build) are: > > In file included from include/linux/byteorder/big_endian.h:5, > from arch/powerpc/include/uapi/asm/byteorder.h:14, > from include/asm-generic/bitops/le.h:6, > from arch/powerpc/include/asm/bitops.h:250, > from include/linux/bitops.h:29, > from include/linux/kernel.h:12, > from include/asm-generic/bug.h:19, > from arch/powerpc/include/asm/bug.h:109, > from include/linux/bug.h:5, > from include/linux/mmdebug.h:5, > from include/linux/gfp.h:5, > from include/linux/slab.h:15, > from drivers/tty/ehv_bytechan.c:24: > drivers/tty/ehv_bytechan.c: In function =E2=80=98ehv_bc_udbg_putc=E2=80=99: > arch/powerpc/include/asm/epapr_hcalls.h:298:20: warning: array subscript 1 = > is outside array bounds of =E2=80=98const char[1]=E2=80=99 [-Warray-bounds] > 298 | r6 =3D be32_to_cpu(p[1]); > include/uapi/linux/byteorder/big_endian.h:40:51: note: in definition of mac= > ro =E2=80=98__be32_to_cpu=E2=80=99 >40 | #define __be32_to_cpu(x) ((__force __u32)(__be32)(x)) > | ^ > arch/powerpc/include/asm/epapr_hcalls.h:298:7: note: in expansion of macro = > =E2=80=98be32_to_cpu=E2=80=99 > 298 | r6 =3D be32_to_cpu(p[1]); > | ^~~ > drivers/tty/ehv_bytechan.c:166:13: note: while referencing =E2=80=98data=E2= > =80=99 > 166 | static void ehv_bc_udbg_putc(char c) > | ^~~~ > In file included from include/linux/byteorder/big_endian.h:5, > from arch/powerpc/include/uapi/asm/byteorder.h:14, > from include/asm-generic/bitops/le.h:6, > from arch/powerpc/include/asm/bitops.h:250, > from include/linux/bitops.h:29, > from include/linux/kernel.h:12, > from include/asm-generic/bug.h:19, > from arch/powerpc/include/asm/bug.h:109, > from include/linux/bug.h:5, > from include/linux/mmdebug.h:5, > from include/linux/gfp.h:5, > from include/linux/slab.h:15, > from drivers/tty/ehv_bytechan.c:24: > arch/powerpc/include/asm/epapr_hcalls.h:299:20: warning: array subscript 2 = > is outside array bounds of =E2=80=98const char[1]=E2=80=99 [-Warray-bounds] > 299 | r7 =3D be32_to_cpu(p[2]); > include/uapi/linux/byteorder/big_endian.h:40:51: note: in definition of mac= > ro =E2=80=98__be32_to_cpu=E2=80=99 >40 | #define __be32_to_cpu(x) ((__force __u32)(__be32)(x)) > | ^ > arch/powerpc/include/asm/epapr_hcalls.h:299:7: note: in expansion of macro = > =E2=80=98be32_to_cpu=E2=80=99 > 299 | r7 =3D be32_to_cpu(p[2]); > | ^~~ > drivers/tty/ehv_bytechan.c:166:13: note: while referencing =E2=80=98data=E2= > =80=99 > 166 | static void ehv_bc_udbg_putc(char c) > | ^~~~ > In file included from include/linux/byteorder/big_endian.h:5, > from arch/powerpc/include/uapi/asm/byteorder.h:14, > from include/asm-generic/bitops/le.h:6, > from arch/powerpc/include/asm/bitops.h:250, > from include/linux/bitops.h:29, > from include/linux/kernel.h:12, > from include/asm-generic/bug.h:19, > from arch/powerpc/include/asm/bug.h:109, > from include/linux/bug.h:5, > from include/linux/mmdebug.h:5, > from include/linux/gfp.h:5, > from include/linux/slab.h:15, > from drivers/tty/ehv_bytechan.c:24: > arch/powerpc/include/asm/epapr_hcalls.h:300:20: warning: array subscript 3 = > is outside array bounds of =E2=80=98const char[1]=E2=80=99 [-Warray-bounds] > 300 | r8 =3D be32_to_cpu(p[3]); > include/uapi/linux/byteorder/big_endian.h:40:51: note: in definition of mac= > ro =E2=80=98__be32_to_cpu=E2=80=99 >40 | #define __be32_to_cpu(x) ((__force __u32)(__be32)(x)) > | ^ > arch/powerpc/include/asm/epapr_hcalls.h:300:7: note: in expansion of macro = > =E2=80=98be32_to_cpu=E2=80=99 > 300 | r8 =3D be32_to_cpu(p[3]); > | ^~~ > drivers/tty/ehv_bytechan.c:166:13: note: while referencing
Re: [PATCH] powerpc/pmac/smp: avoid unused-variable warnings
On Fri, 2019-09-20 at 15:39:51 UTC, Ilie Halip wrote: > When building with ppc64_defconfig, the compiler reports > that these 2 variables are not used: > warning: unused variable 'core99_l2_cache' [-Wunused-variable] > warning: unused variable 'core99_l3_cache' [-Wunused-variable] > > They are only used when CONFIG_PPC64 is not defined. Move > them into a section which does the same macro check. > > Reported-by: Nathan Chancellor > Signed-off-by: Ilie Halip Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/9451c79bc39e610882bdd12370f01af5004a3c4f cheers
[PATCH 11/12] docs: filesystems: fix renamed references
Some filesystem references got broken by a previous patch series I submitted. Address those. Signed-off-by: Mauro Carvalho Chehab --- Documentation/ABI/stable/sysfs-devices-node | 2 +- Documentation/ABI/testing/procfs-smaps_rollup | 2 +- Documentation/admin-guide/cpu-load.rst| 2 +- .../admin-guide/kernel-parameters.txt | 2 +- Documentation/admin-guide/nfs/nfsroot.rst | 2 +- .../driver-api/driver-model/device.rst| 4 +- .../driver-api/driver-model/overview.rst | 2 +- Documentation/filesystems/dax.txt | 2 +- Documentation/filesystems/dnotify.txt | 2 +- .../filesystems/ramfs-rootfs-initramfs.rst| 2 +- Documentation/filesystems/sysfs.rst | 2 +- .../powerpc/firmware-assisted-dump.rst| 2 +- Documentation/process/adding-syscalls.rst | 2 +- .../it_IT/process/adding-syscalls.rst | 2 +- .../translations/zh_CN/filesystems/sysfs.txt | 6 +-- MAINTAINERS | 54 +-- Next/merge.log| 12 ++--- drivers/base/core.c | 2 +- drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h | 2 +- fs/Kconfig| 2 +- fs/Kconfig.binfmt | 2 +- fs/adfs/Kconfig | 2 +- fs/affs/Kconfig | 2 +- fs/afs/Kconfig| 6 +-- fs/bfs/Kconfig| 2 +- fs/cramfs/Kconfig | 2 +- fs/ecryptfs/Kconfig | 2 +- fs/hfs/Kconfig| 2 +- fs/hpfs/Kconfig | 2 +- fs/isofs/Kconfig | 2 +- fs/namespace.c| 2 +- fs/notify/inotify/Kconfig | 2 +- fs/ntfs/Kconfig | 2 +- fs/ocfs2/Kconfig | 2 +- fs/proc/Kconfig | 4 +- fs/romfs/Kconfig | 2 +- fs/sysfs/dir.c| 2 +- fs/sysfs/file.c | 2 +- fs/sysfs/mount.c | 2 +- fs/sysfs/symlink.c| 2 +- fs/sysv/Kconfig | 2 +- fs/udf/Kconfig| 2 +- include/linux/kobject.h | 2 +- include/linux/kobject_ns.h| 2 +- include/linux/relay.h | 2 +- include/linux/sysfs.h | 2 +- kernel/relay.c| 2 +- lib/kobject.c | 4 +- 48 files changed, 86 insertions(+), 86 deletions(-) diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node index df8413cf1468..484fc04bcc25 100644 --- a/Documentation/ABI/stable/sysfs-devices-node +++ b/Documentation/ABI/stable/sysfs-devices-node @@ -54,7 +54,7 @@ Date: October 2002 Contact: Linux Memory Management list Description: Provides information about the node's distribution and memory - utilization. Similar to /proc/meminfo, see Documentation/filesystems/proc.txt + utilization. Similar to /proc/meminfo, see Documentation/filesystems/proc.rst What: /sys/devices/system/node/nodeX/numastat Date: October 2002 diff --git a/Documentation/ABI/testing/procfs-smaps_rollup b/Documentation/ABI/testing/procfs-smaps_rollup index 274df44d8b1b..046978193368 100644 --- a/Documentation/ABI/testing/procfs-smaps_rollup +++ b/Documentation/ABI/testing/procfs-smaps_rollup @@ -11,7 +11,7 @@ Description: Additionally, the fields Pss_Anon, Pss_File and Pss_Shmem are not present in /proc/pid/smaps. These fields represent the sum of the Pss field of each type (anon, file, shmem). - For more details, see Documentation/filesystems/proc.txt + For more details, see Documentation/filesystems/proc.rst and the procfs man page. Typical output looks like this: diff --git a/Documentation/admin-guide/cpu-load.rst b/Documentation/admin-guide/cpu-load.rst index 2d01ce43d2a2..ebdecf864080 100644 --- a/Documentation/admin-guide/cpu-load.rst +++ b/Documentation/admin-guide/cpu-load.rst @@ -105,7 +105,7 @@ References -- - http://lkml.org/lkml/2007/2/12/6 -- Documentation/filesystems/proc.txt (1.8) +- Documentation/filesystems/proc.rst (1.8) Thanks diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 28be91d4e66b..192a36a66fc9 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -683,7 +683,7 @@ cored
[PATCH 08/12] docs: fix broken references to text files
Several references got broken due to txt to ReST conversion. Several of them can be automatically fixed with: scripts/documentation-file-ref-check --fix Signed-off-by: Mauro Carvalho Chehab --- Documentation/admin-guide/kernel-parameters.txt | 2 +- Documentation/memory-barriers.txt| 2 +- Documentation/process/submit-checklist.rst | 2 +- .../translations/it_IT/process/submit-checklist.rst | 2 +- Documentation/translations/ko_KR/memory-barriers.txt | 2 +- .../translations/zh_CN/filesystems/sysfs.txt | 2 +- .../translations/zh_CN/process/submit-checklist.rst | 2 +- Documentation/virt/kvm/arm/pvtime.rst| 2 +- Documentation/virt/kvm/devices/vcpu.rst | 2 +- Documentation/virt/kvm/hypercalls.rst| 4 ++-- arch/powerpc/include/uapi/asm/kvm_para.h | 2 +- drivers/gpu/drm/Kconfig | 2 +- drivers/gpu/drm/drm_ioctl.c | 2 +- drivers/hwtracing/coresight/Kconfig | 2 +- fs/fat/Kconfig | 8 fs/fuse/Kconfig | 2 +- fs/fuse/dev.c| 2 +- fs/overlayfs/Kconfig | 6 +++--- include/linux/mm.h | 4 ++-- include/uapi/linux/ethtool_netlink.h | 2 +- include/uapi/rdma/rdma_user_ioctl_cmds.h | 2 +- mm/gup.c | 12 ++-- virt/kvm/arm/vgic/vgic-mmio-v3.c | 2 +- virt/kvm/arm/vgic/vgic.h | 4 ++-- 24 files changed, 37 insertions(+), 37 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index df34a4176e58..28be91d4e66b 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -960,7 +960,7 @@ edid/1680x1050.bin, or edid/1920x1080.bin is given and no file with the same name exists. Details and instructions how to build your own EDID data are - available in Documentation/driver-api/edid.rst. An EDID + available in Documentation/admin-guide/edid.rst. An EDID data set will only be used for a particular connector, if its name and a colon are prepended to the EDID name. Each connector may use a unique EDID data diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index e1c355e84edd..eaabc3134294 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -620,7 +620,7 @@ because the CPUs that the Linux kernel supports don't do writes until they are certain (1) that the write will actually happen, (2) of the location of the write, and (3) of the value to be written. But please carefully read the "CONTROL DEPENDENCIES" section and the -Documentation/RCU/rcu_dereference.txt file: The compiler can and does +Documentation/RCU/rcu_dereference.rst file: The compiler can and does break dependencies in a great many highly creative ways. CPU 1 CPU 2 diff --git a/Documentation/process/submit-checklist.rst b/Documentation/process/submit-checklist.rst index 8e56337d422d..3f8e9d5d95c2 100644 --- a/Documentation/process/submit-checklist.rst +++ b/Documentation/process/submit-checklist.rst @@ -107,7 +107,7 @@ and elsewhere regarding submitting Linux kernel patches. and why. 26) If any ioctl's are added by the patch, then also update -``Documentation/ioctl/ioctl-number.rst``. +``Documentation/userspace-api/ioctl/ioctl-number.rst``. 27) If your modified source code depends on or uses any of the kernel APIs or features that are related to the following ``Kconfig`` symbols, diff --git a/Documentation/translations/it_IT/process/submit-checklist.rst b/Documentation/translations/it_IT/process/submit-checklist.rst index 995ee69fab11..3e575502690f 100644 --- a/Documentation/translations/it_IT/process/submit-checklist.rst +++ b/Documentation/translations/it_IT/process/submit-checklist.rst @@ -117,7 +117,7 @@ sottomissione delle patch, in particolare sorgenti che ne spieghi la logica: cosa fanno e perché. 25) Se la patch aggiunge nuove chiamate ioctl, allora aggiornate -``Documentation/ioctl/ioctl-number.rst``. +``Documentation/userspace-api/ioctl/ioctl-number.rst``. 26) Se il codice che avete modificato dipende o usa una qualsiasi interfaccia o funzionalità del kernel che è associata a uno dei seguenti simboli diff --git a/Documentation/translations/ko_KR/memory-barriers.txt b/Documentation/translations/ko_KR/memory-barriers.txt index 2e831ece6e26..e50fe6541335 100644 --- a/Documentati
[PATCH] tpm2: Export tpm2_get_cc_attrs_tbl for ibmvtpm driver as module
From: Stefan Berger This patch fixes the following problem when the ibmvtpm driver is built as a module: ERROR: modpost: "tpm2_get_cc_attrs_tbl" [drivers/char/tpm/tpm_ibmvtpm.ko] undefined! make[1]: *** [scripts/Makefile.modpost:94: __modpost] Error 1 make: *** [Makefile:1298: modules] Error 2 Signed-off-by: Stefan Berger --- drivers/char/tpm/tpm2-cmd.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c index 76f67b155bd5..eff1f12d981a 100644 --- a/drivers/char/tpm/tpm2-cmd.c +++ b/drivers/char/tpm/tpm2-cmd.c @@ -681,6 +681,7 @@ int tpm2_get_cc_attrs_tbl(struct tpm_chip *chip) rc = -ENODEV; return rc; } +EXPORT_SYMBOL_GPL(tpm2_get_cc_attrs_tbl); /** * tpm2_startup - turn on the TPM -- 2.23.0
Re: Slub: Increased mem consumption on cpu,mem-less node powerpc guest
On Tue, Mar 17, 2020 at 02:56:28PM +0530, Bharata B Rao wrote: > Case 1: 2 node NUMA, node0 empty > > # numactl -H > available: 2 nodes (0-1) > node 0 cpus: > node 0 size: 0 MB > node 0 free: 0 MB > node 1 cpus: 0 1 2 3 4 5 6 7 > node 1 size: 16294 MB > node 1 free: 15453 MB > node distances: > node 0 1 > 0: 10 40 > 1: 40 10 > > diff --git a/mm/slub.c b/mm/slub.c > index 17dc00e33115..888e4d245444 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -1971,10 +1971,8 @@ static void *get_partial(struct kmem_cache *s, gfp_t > flags, int node, > void *object; > int searchnode = node; > > - if (node == NUMA_NO_NODE) > + if (node == NUMA_NO_NODE || !node_present_pages(node)) > searchnode = numa_mem_id(); > - else if (!node_present_pages(node)) > - searchnode = node_to_mem_node(node); For the above topology, I see this: node_to_mem_node(1) = 1 node_to_mem_node(0) = 0 node_to_mem_node(NUMA_NO_NODE) = 0 Looks like the last two cases (returning memory-less node 0) is the problem here? Regards, Bharata.
Re: [PATCH v3] powerpc/fsl-85xx: fix compile error
王文虎 writes: > From: Michael Ellerman > Date: 2020-03-16 17:41:12 > To:WANG Wenhu ,Benjamin Herrenschmidt > ,Paul Mackerras ,WANG Wenhu > ,Allison Randal ,Richard Fontana > ,Greg Kroah-Hartman ,Thomas > Gleixner > ,linuxppc-dev@lists.ozlabs.org,linux-ker...@vger.kernel.org > cc: triv...@kernel.org,ker...@vivo.com,stable > Subject: Re: [PATCH v3] powerpc/fsl-85xx: fix compile error>WANG Wenhu > writes: >>> Include "linux/of_address.h" to fix the compile error for >>> mpc85xx_l2ctlr_of_probe() when compiling fsl_85xx_cache_sram.c. >>> >>> CC arch/powerpc/sysdev/fsl_85xx_l2ctlr.o >>> arch/powerpc/sysdev/fsl_85xx_l2ctlr.c: In function >>> ‘mpc85xx_l2ctlr_of_probe’: >>> arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:11: error: implicit declaration of >>> function ‘of_iomap’; did you mean ‘pci_iomap’? >>> [-Werror=implicit-function-declaration] >>> l2ctlr = of_iomap(dev->dev.of_node, 0); >>>^~~~ >>>pci_iomap >>> arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:9: error: assignment makes pointer >>> from integer without a cast [-Werror=int-conversion] >>> l2ctlr = of_iomap(dev->dev.of_node, 0); >>> ^ >>> cc1: all warnings being treated as errors >>> scripts/Makefile.build:267: recipe for target >>> 'arch/powerpc/sysdev/fsl_85xx_l2ctlr.o' failed >>> make[2]: *** [arch/powerpc/sysdev/fsl_85xx_l2ctlr.o] Error 1 >>> >>> Fixes: commit 6db92cc9d07d ("powerpc/85xx: add cache-sram support") >> >>The syntax is: >> >>Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support") >> >>> Cc: stable >> >>The commit above went into v2.6.37. >> >>So no one has noticed this bug since then, how? Or did something else >>change to expose the problem? > > Actually a hard question to answer cause it also left me scratching for long. > However, I can not find right or definite answer. Oh, actually it's fairly straight forward, the code can't be built at all in upstream because CONFIG_FSL_85XX_CACHE_SRAM is not selectable or selected by anything. You sent a patch previously to make it selectable, which Scott thought was a bad idea. So this whole file is dead code as far as I'm concerned, so patches for it definitely do not need to go to stable. If you want to add a user for it then please send a series doing that, and this commit can be the first. cheers
[PATCH] treewide: Rename "unencrypted" to "decrypted"
Hi all, this hasn't been fully tested yet but it is mechanical rename only so there shouldn't be any problems (famous last words :-)). I'll run it through the randconfig bench today and take it through tip if there are no objections. Thx. --- Back then when the whole SME machinery started getting mainlined, it was agreed that for simplicity, clarity and sanity's sake, the terms denoting encrypted and not-encrypted memory should be "encrypted" and "decrypted". And the majority of the code sticks to that convention except those two. So rename them. No functional changes. Signed-off-by: Borislav Petkov --- arch/powerpc/platforms/pseries/Kconfig | 2 +- arch/s390/Kconfig | 2 +- arch/x86/Kconfig | 2 +- arch/x86/mm/mem_encrypt.c | 4 ++-- include/linux/dma-direct.h | 8 kernel/dma/Kconfig | 2 +- kernel/dma/direct.c| 14 +++--- kernel/dma/mapping.c | 2 +- 8 files changed, 18 insertions(+), 18 deletions(-) diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig index 24c18362e5ea..a78e2c3e1d92 100644 --- a/arch/powerpc/platforms/pseries/Kconfig +++ b/arch/powerpc/platforms/pseries/Kconfig @@ -151,7 +151,7 @@ config PPC_SVM depends on PPC_PSERIES select SWIOTLB select ARCH_HAS_MEM_ENCRYPT - select ARCH_HAS_FORCE_DMA_UNENCRYPTED + select ARCH_HAS_FORCE_DMA_DECRYPTED help There are certain POWER platforms which support secure guests using the Protected Execution Facility, with the help of an Ultravisor diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 8abe77536d9d..ab1dbb7415b4 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -192,7 +192,7 @@ config S390 select VIRT_CPU_ACCOUNTING select ARCH_HAS_SCALED_CPUTIME select HAVE_NMI - select ARCH_HAS_FORCE_DMA_UNENCRYPTED + select ARCH_HAS_FORCE_DMA_DECRYPTED select SWIOTLB select GENERIC_ALLOCATOR diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index beea77046f9b..2ae904f505e1 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1525,7 +1525,7 @@ config AMD_MEM_ENCRYPT depends on X86_64 && CPU_SUP_AMD select DYNAMIC_PHYSICAL_MASK select ARCH_USE_MEMREMAP_PROT - select ARCH_HAS_FORCE_DMA_UNENCRYPTED + select ARCH_HAS_FORCE_DMA_DECRYPTED ---help--- Say yes to enable support for the encryption of system memory. This requires an AMD processor that supports Secure Memory diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c index a03614bd3e1a..66d09f269e6d 100644 --- a/arch/x86/mm/mem_encrypt.c +++ b/arch/x86/mm/mem_encrypt.c @@ -350,8 +350,8 @@ bool sev_active(void) return sme_me_mask && sev_enabled; } -/* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */ -bool force_dma_unencrypted(struct device *dev) +/* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_DECRYPTED */ +bool force_dma_decrypted(struct device *dev) { /* * For SEV, all DMA must be to unencrypted addresses. diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h index 24b8684aa21d..9f955844e9c7 100644 --- a/include/linux/dma-direct.h +++ b/include/linux/dma-direct.h @@ -26,14 +26,14 @@ static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dev_addr) } #endif /* !CONFIG_ARCH_HAS_PHYS_TO_DMA */ -#ifdef CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED -bool force_dma_unencrypted(struct device *dev); +#ifdef CONFIG_ARCH_HAS_FORCE_DMA_DECRYPTED +bool force_dma_decrypted(struct device *dev); #else -static inline bool force_dma_unencrypted(struct device *dev) +static inline bool force_dma_decrypted(struct device *dev) { return false; } -#endif /* CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED */ +#endif /* CONFIG_ARCH_HAS_FORCE_DMA_DECRYPTED */ /* * If memory encryption is supported, phys_to_dma will set the memory encryption diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig index 4c103a24e380..55c4147bb2b1 100644 --- a/kernel/dma/Kconfig +++ b/kernel/dma/Kconfig @@ -51,7 +51,7 @@ config ARCH_HAS_SYNC_DMA_FOR_CPU_ALL config ARCH_HAS_DMA_PREP_COHERENT bool -config ARCH_HAS_FORCE_DMA_UNENCRYPTED +config ARCH_HAS_FORCE_DMA_DECRYPTED bool config DMA_NONCOHERENT_CACHE_SYNC diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index ac7956c38f69..a0576c0ccacd 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -26,7 +26,7 @@ unsigned int zone_dma_bits __ro_after_init = 24; static inline dma_addr_t phys_to_dma_direct(struct device *dev, phys_addr_t phys) { - if (force_dma_unencrypted(dev)) + if (force_dma_decrypted(dev)) return __phys_to_dma(dev, phys); return phys_to_dma(dev, phys); } @@ -49,7 +49,7 @@ static gfp_t __dma_direct_opt
Re: [PATCH 15/15] powerpc/watchpoint/xmon: Support 2nd dawr
Le 09/03/2020 à 09:58, Ravi Bangoria a écrit : Add support for 2nd DAWR in xmon. With this, we can have two simultaneous breakpoints from xmon. Signed-off-by: Ravi Bangoria --- arch/powerpc/xmon/xmon.c | 101 ++- 1 file changed, 69 insertions(+), 32 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index ac18fe3e4295..20adc83404c8 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -110,7 +110,7 @@ struct bpt { #define NBPTS 256 static struct bpt bpts[NBPTS]; -static struct bpt dabr; +static struct bpt dabr[HBP_NUM_MAX]; static struct bpt *iabr; static unsigned bpinstr = 0x7fe8; /* trap */ @@ -786,10 +786,17 @@ static int xmon_sstep(struct pt_regs *regs) static int xmon_break_match(struct pt_regs *regs) { + int i; + if ((regs->msr & (MSR_IR|MSR_PR|MSR_64BIT)) != (MSR_IR|MSR_64BIT)) return 0; - if (dabr.enabled == 0) - return 0; + for (i = 0; i < nr_wp_slots(); i++) { + if (dabr[i].enabled) + goto found; + } + return 0; + +found: xmon_core(regs, 0); return 1; } @@ -928,13 +935,16 @@ static void insert_bpts(void) static void insert_cpu_bpts(void) { + int i; struct arch_hw_breakpoint brk; - if (dabr.enabled) { - brk.address = dabr.address; - brk.type = (dabr.enabled & HW_BRK_TYPE_DABR) | HW_BRK_TYPE_PRIV_ALL; - brk.len = DABR_MAX_LEN; - __set_breakpoint(&brk, 0); + for (i = 0; i < nr_wp_slots(); i++) { + if (dabr[i].enabled) { + brk.address = dabr[i].address; + brk.type = (dabr[i].enabled & HW_BRK_TYPE_DABR) | HW_BRK_TYPE_PRIV_ALL; + brk.len = 8; + __set_breakpoint(&brk, i); + } } if (iabr) @@ -1348,6 +1358,35 @@ static long check_bp_loc(unsigned long addr) return 1; } +static int free_data_bpt(void) This names suggests the function frees a breakpoint. I guess it should be find_free_data_bpt() +{ + int i; + + for (i = 0; i < nr_wp_slots(); i++) { + if (!dabr[i].enabled) + return i; + } + printf("Couldn't find free breakpoint register\n"); + return -1; +} + +static void print_data_bpts(void) +{ + int i; + + for (i = 0; i < nr_wp_slots(); i++) { + if (!dabr[i].enabled) + continue; + + printf(" data "REG" [", dabr[i].address); + if (dabr[i].enabled & 1) + printf("r"); + if (dabr[i].enabled & 2) + printf("w"); + printf("]\n"); + } +} + static char *breakpoint_help_string = "Breakpoint command usage:\n" "bshow breakpoints\n" @@ -1381,10 +1420,9 @@ bpt_cmds(void) printf("Hardware data breakpoint not supported on this cpu\n"); break; } - if (dabr.enabled) { - printf("Couldn't find free breakpoint register\n"); + i = free_data_bpt(); + if (i < 0) break; - } mode = 7; cmd = inchar(); if (cmd == 'r') @@ -1393,15 +1431,15 @@ bpt_cmds(void) mode = 6; else termch = cmd; - dabr.address = 0; - dabr.enabled = 0; - if (scanhex(&dabr.address)) { - if (!is_kernel_addr(dabr.address)) { + dabr[i].address = 0; + dabr[i].enabled = 0; + if (scanhex(&dabr[i].address)) { + if (!is_kernel_addr(dabr[i].address)) { printf(badaddr); break; } - dabr.address &= ~HW_BRK_TYPE_DABR; - dabr.enabled = mode | BP_DABR; + dabr[i].address &= ~HW_BRK_TYPE_DABR; + dabr[i].enabled = mode | BP_DABR; } force_enable_xmon(); @@ -1440,7 +1478,9 @@ bpt_cmds(void) for (i = 0; i < NBPTS; ++i) bpts[i].enabled = 0; iabr = NULL; - dabr.enabled = 0; + for (i = 0; i < nr_wp_slots(); i++) + dabr[i].enabled = 0; + printf("All breakpoints cleared\n"); break; } @@ -1474,14 +1514,7 @@ bpt_cmds(void) if (xmon_is_ro || !scanhex(&a)) { /* print all breakpoints */ printf(" type
Re: [PATCH 14/15] powerpc/watchpoint/xmon: Don't allow breakpoint overwriting
Le 09/03/2020 à 09:58, Ravi Bangoria a écrit : Xmon allows overwriting breakpoints because it's supported by only one dawr. But with multiple dawrs, overwriting becomes ambiguous or unnecessary complicated. So let's not allow it. Could we drop this completely (I mean the functionnality, not the patch). Christophe Signed-off-by: Ravi Bangoria --- arch/powerpc/xmon/xmon.c | 4 1 file changed, 4 insertions(+) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index 0ca0d29f99c6..ac18fe3e4295 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -1381,6 +1381,10 @@ bpt_cmds(void) printf("Hardware data breakpoint not supported on this cpu\n"); break; } + if (dabr.enabled) { + printf("Couldn't find free breakpoint register\n"); + break; + } mode = 7; cmd = inchar(); if (cmd == 'r')
Re: [PATCH 13/15] powerpc/watchpoint: Don't allow concurrent perf and ptrace events
Le 09/03/2020 à 09:58, Ravi Bangoria a écrit : ptrace and perf watchpoints on powerpc behaves differently. Ptrace On the 8xx, ptrace generates signal after executing the instruction. watchpoint works in one-shot mode and generates signal before executing instruction. It's ptrace user's job to single-step the instruction and re-enable the watchpoint. OTOH, in case of perf watchpoint, kernel emulates/single-steps the instruction and then generates event. If perf and ptrace creates two events with same or overlapping address ranges, it's ambiguous to decide who should single-step the instruction. Because of this issue ptrace and perf event can't coexist when the address range overlaps. Ok, and then ? What's the purpose of this (big) patch ? Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/hw_breakpoint.h | 2 + arch/powerpc/kernel/hw_breakpoint.c | 220 +++ kernel/events/hw_breakpoint.c| 16 ++ 3 files changed, 238 insertions(+) diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index ec61e2b7195c..6e1a19af5177 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -66,6 +66,8 @@ extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); int arch_install_hw_breakpoint(struct perf_event *bp); void arch_uninstall_hw_breakpoint(struct perf_event *bp); +int arch_reserve_bp_slot(struct perf_event *bp); +void arch_release_bp_slot(struct perf_event *bp); void arch_unregister_hw_breakpoint(struct perf_event *bp); void hw_breakpoint_pmu_read(struct perf_event *bp); extern void flush_ptrace_hw_breakpoint(struct task_struct *tsk); diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 2ac89b92590f..d8529d9151e8 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -123,6 +123,226 @@ static bool is_ptrace_bp(struct perf_event *bp) return (bp->overflow_handler == ptrace_triggered); } +struct breakpoint { + struct list_head list; + struct perf_event *bp; + bool ptrace_bp; +}; Don't we have an equivalent struct already ? + +static DEFINE_PER_CPU(struct breakpoint *, cpu_bps[HBP_NUM_MAX]); +static LIST_HEAD(task_bps); + +static struct breakpoint *alloc_breakpoint(struct perf_event *bp) +{ + struct breakpoint *tmp; + + tmp = kzalloc(sizeof(*tmp), GFP_KERNEL); + if (!tmp) + return ERR_PTR(-ENOMEM); + tmp->bp = bp; + tmp->ptrace_bp = is_ptrace_bp(bp); + return tmp; +} + +static bool bp_addr_range_overlap(struct perf_event *bp1, struct perf_event *bp2) +{ + __u64 bp1_saddr, bp1_eaddr, bp2_saddr, bp2_eaddr; + + bp1_saddr = bp1->attr.bp_addr & ~HW_BREAKPOINT_ALIGN; + bp1_eaddr = (bp1->attr.bp_addr + bp1->attr.bp_len - 1) | HW_BREAKPOINT_ALIGN; + bp2_saddr = bp2->attr.bp_addr & ~HW_BREAKPOINT_ALIGN; + bp2_eaddr = (bp2->attr.bp_addr + bp2->attr.bp_len - 1) | HW_BREAKPOINT_ALIGN; + + return (bp1_saddr <= bp2_eaddr && bp1_eaddr >= bp2_saddr); Would be better with something like (HW_BREAKPOINT_SIZE needs to be defined). bp1_saddr = ALIGN_DOWN(bp1->attr.bp_addr, HW_BREAKPOINT_SIZE); bp1_eaddr = ALIGN(bp1->attr.bp_addr, HW_BREAKPOINT_SIZE); bp2_saddr = ALIGN_DOWN(bp2->attr.bp_addr, HW_BREAKPOINT_SIZE); bp2_eaddr = ALIGN(bp2->attr.bp_addr, HW_BREAKPOINT_SIZE); return (bp1_saddr < bp2_eaddr && bp1_eaddr > bp2_saddr); +} + +static bool alternate_infra_bp(struct breakpoint *b, struct perf_event *bp) +{ + return is_ptrace_bp(bp) ? !b->ptrace_bp : b->ptrace_bp; +} + +static bool can_co_exist(struct breakpoint *b, struct perf_event *bp) +{ + return !(alternate_infra_bp(b, bp) && bp_addr_range_overlap(b->bp, bp)); +} + +static int task_bps_add(struct perf_event *bp) +{ + struct breakpoint *tmp; + + tmp = alloc_breakpoint(bp); + if (IS_ERR(tmp)) + return PTR_ERR(tmp); + + list_add(&tmp->list, &task_bps); + return 0; +} + +static void task_bps_remove(struct perf_event *bp) +{ + struct list_head *pos, *q; + struct breakpoint *tmp; + + list_for_each_safe(pos, q, &task_bps) { + tmp = list_entry(pos, struct breakpoint, list); + + if (tmp->bp == bp) { + list_del(&tmp->list); + kfree(tmp); + break; + } + } +} + +/* + * If any task has breakpoint from alternate infrastructure, + * return true. Otherwise return false. + */ +static bool all_task_bps_check(struct perf_event *bp) +{ + struct breakpoint *tmp; + + list_for_each_entry(tmp, &task_bps, list) { + if (!can_co_exist(tmp, bp)) + return true; + } +
Re: [PATCH v2 8/8] mm/memory_hotplug: allow to specify a default online_type
On 17.03.20 11:49, David Hildenbrand wrote: > For now, distributions implement advanced udev rules to essentially > - Don't online any hotplugged memory (s390x) > - Online all memory to ZONE_NORMAL (e.g., most virt environments like > hyperv) > - Online all memory to ZONE_MOVABLE in case the zone imbalance is taken > care of (e.g., bare metal, special virt environments) > > In summary: All memory is usually onlined the same way, however, the > kernel always has to ask user space to come up with the same answer. > E.g., Hyper-V always waits for a memory block to get onlined before > continuing, otherwise it might end up adding memory faster than > hotplugging it, which can result in strange OOM situations. This waiting s/hotplugging/onlining/ -- Thanks, David / dhildenb
Re: [PATCH v2 8/8] mm/memory_hotplug: allow to specify a default online_type
On 17.03.20 12:01, Michal Hocko wrote: > On Tue 17-03-20 11:49:42, David Hildenbrand wrote: >> For now, distributions implement advanced udev rules to essentially >> - Don't online any hotplugged memory (s390x) >> - Online all memory to ZONE_NORMAL (e.g., most virt environments like >> hyperv) >> - Online all memory to ZONE_MOVABLE in case the zone imbalance is taken >> care of (e.g., bare metal, special virt environments) >> >> In summary: All memory is usually onlined the same way, however, the >> kernel always has to ask user space to come up with the same answer. >> E.g., Hyper-V always waits for a memory block to get onlined before >> continuing, otherwise it might end up adding memory faster than >> hotplugging it, which can result in strange OOM situations. This waiting >> slows down adding of a bigger amount of memory. >> >> Let's allow to specify a default online_type, not just "online" and >> "offline". This allows distributions to configure the default online_type >> when booting up and be done with it. >> >> We can now specify "offline", "online", "online_movable" and >> "online_kernel" via >> - "memhp_default_state=" on the kernel cmdline >> - /sys/devices/system/memory/auto_online_blocks >> just like we are able to specify for a single memory block via >> /sys/devices/system/memory/memoryX/state >> >> Reviewed-by: Wei Yang >> Cc: Greg Kroah-Hartman >> Cc: Andrew Morton >> Cc: Michal Hocko >> Cc: Oscar Salvador >> Cc: "Rafael J. Wysocki" >> Cc: Baoquan He >> Cc: Wei Yang >> Signed-off-by: David Hildenbrand > > As I've said earlier and several times already, I really dislike this > interface. But it is fact that this patch doesn't make it any worse. > Quite contrary, so feel free to add > Acked-by: Michal Hocko Thanks Michal! -- Thanks, David / dhildenb
Re: [PATCH v2 8/8] mm/memory_hotplug: allow to specify a default online_type
On Tue 17-03-20 11:49:42, David Hildenbrand wrote: > For now, distributions implement advanced udev rules to essentially > - Don't online any hotplugged memory (s390x) > - Online all memory to ZONE_NORMAL (e.g., most virt environments like > hyperv) > - Online all memory to ZONE_MOVABLE in case the zone imbalance is taken > care of (e.g., bare metal, special virt environments) > > In summary: All memory is usually onlined the same way, however, the > kernel always has to ask user space to come up with the same answer. > E.g., Hyper-V always waits for a memory block to get onlined before > continuing, otherwise it might end up adding memory faster than > hotplugging it, which can result in strange OOM situations. This waiting > slows down adding of a bigger amount of memory. > > Let's allow to specify a default online_type, not just "online" and > "offline". This allows distributions to configure the default online_type > when booting up and be done with it. > > We can now specify "offline", "online", "online_movable" and > "online_kernel" via > - "memhp_default_state=" on the kernel cmdline > - /sys/devices/system/memory/auto_online_blocks > just like we are able to specify for a single memory block via > /sys/devices/system/memory/memoryX/state > > Reviewed-by: Wei Yang > Cc: Greg Kroah-Hartman > Cc: Andrew Morton > Cc: Michal Hocko > Cc: Oscar Salvador > Cc: "Rafael J. Wysocki" > Cc: Baoquan He > Cc: Wei Yang > Signed-off-by: David Hildenbrand As I've said earlier and several times already, I really dislike this interface. But it is fact that this patch doesn't make it any worse. Quite contrary, so feel free to add Acked-by: Michal Hocko > --- > drivers/base/memory.c | 11 +-- > include/linux/memory_hotplug.h | 2 ++ > mm/memory_hotplug.c| 8 > 3 files changed, 11 insertions(+), 10 deletions(-) > > diff --git a/drivers/base/memory.c b/drivers/base/memory.c > index 8d3e16dab69f..2b09b68b9f78 100644 > --- a/drivers/base/memory.c > +++ b/drivers/base/memory.c > @@ -35,7 +35,7 @@ static const char *const online_type_to_str[] = { > [MMOP_ONLINE_MOVABLE] = "online_movable", > }; > > -static int memhp_online_type_from_str(const char *str) > +int memhp_online_type_from_str(const char *str) > { > int i; > > @@ -394,13 +394,12 @@ static ssize_t auto_online_blocks_store(struct device > *dev, > struct device_attribute *attr, > const char *buf, size_t count) > { > - if (sysfs_streq(buf, "online")) > - memhp_default_online_type = MMOP_ONLINE; > - else if (sysfs_streq(buf, "offline")) > - memhp_default_online_type = MMOP_OFFLINE; > - else > + const int online_type = memhp_online_type_from_str(buf); > + > + if (online_type < 0) > return -EINVAL; > > + memhp_default_online_type = online_type; > return count; > } > > diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h > index c6e090b34c4b..ef55115320fb 100644 > --- a/include/linux/memory_hotplug.h > +++ b/include/linux/memory_hotplug.h > @@ -117,6 +117,8 @@ extern int arch_add_memory(int nid, u64 start, u64 size, > struct mhp_restrictions *restrictions); > extern u64 max_mem_size; > > +extern int memhp_online_type_from_str(const char *str); > + > /* Default online_type (MMOP_*) when new memory blocks are added. */ > extern int memhp_default_online_type; > /* If movable_node boot option specified */ > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 1975a2b99a2b..9916977b6ee1 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -74,10 +74,10 @@ int memhp_default_online_type = MMOP_ONLINE; > > static int __init setup_memhp_default_state(char *str) > { > - if (!strcmp(str, "online")) > - memhp_default_online_type = MMOP_ONLINE; > - else if (!strcmp(str, "offline")) > - memhp_default_online_type = MMOP_OFFLINE; > + const int online_type = memhp_online_type_from_str(str); > + > + if (online_type >= 0) > + memhp_default_online_type = online_type; > > return 1; > } > -- > 2.24.1 -- Michal Hocko SUSE Labs
Re: [PATCH v2 7/8] mm/memory_hotplug: convert memhp_auto_online to store an online_type
On Tue 17-03-20 11:49:41, David Hildenbrand wrote: > ... and rename it to memhp_default_online_type. This is a preparation > for more detailed default online behavior. > > Reviewed-by: Wei Yang > Cc: Greg Kroah-Hartman > Cc: Andrew Morton > Cc: Michal Hocko > Cc: Oscar Salvador > Cc: "Rafael J. Wysocki" > Cc: Baoquan He > Cc: Wei Yang > Signed-off-by: David Hildenbrand Acked-by: Michal Hocko > --- > drivers/base/memory.c | 10 -- > include/linux/memory_hotplug.h | 3 ++- > mm/memory_hotplug.c| 11 ++- > 3 files changed, 12 insertions(+), 12 deletions(-) > > diff --git a/drivers/base/memory.c b/drivers/base/memory.c > index 8a7f29c0bf97..8d3e16dab69f 100644 > --- a/drivers/base/memory.c > +++ b/drivers/base/memory.c > @@ -386,10 +386,8 @@ static DEVICE_ATTR_RO(block_size_bytes); > static ssize_t auto_online_blocks_show(struct device *dev, > struct device_attribute *attr, char *buf) > { > - if (memhp_auto_online) > - return sprintf(buf, "online\n"); > - else > - return sprintf(buf, "offline\n"); > + return sprintf(buf, "%s\n", > +online_type_to_str[memhp_default_online_type]); > } > > static ssize_t auto_online_blocks_store(struct device *dev, > @@ -397,9 +395,9 @@ static ssize_t auto_online_blocks_store(struct device > *dev, > const char *buf, size_t count) > { > if (sysfs_streq(buf, "online")) > - memhp_auto_online = true; > + memhp_default_online_type = MMOP_ONLINE; > else if (sysfs_streq(buf, "offline")) > - memhp_auto_online = false; > + memhp_default_online_type = MMOP_OFFLINE; > else > return -EINVAL; > > diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h > index c2e06ed5e0e9..c6e090b34c4b 100644 > --- a/include/linux/memory_hotplug.h > +++ b/include/linux/memory_hotplug.h > @@ -117,7 +117,8 @@ extern int arch_add_memory(int nid, u64 start, u64 size, > struct mhp_restrictions *restrictions); > extern u64 max_mem_size; > > -extern bool memhp_auto_online; > +/* Default online_type (MMOP_*) when new memory blocks are added. */ > +extern int memhp_default_online_type; > /* If movable_node boot option specified */ > extern bool movable_node_enabled; > static inline bool movable_node_is_enabled(void) > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 2d2aae830b92..1975a2b99a2b 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -67,17 +67,17 @@ void put_online_mems(void) > bool movable_node_enabled = false; > > #ifndef CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE > -bool memhp_auto_online; > +int memhp_default_online_type = MMOP_OFFLINE; > #else > -bool memhp_auto_online = true; > +int memhp_default_online_type = MMOP_ONLINE; > #endif > > static int __init setup_memhp_default_state(char *str) > { > if (!strcmp(str, "online")) > - memhp_auto_online = true; > + memhp_default_online_type = MMOP_ONLINE; > else if (!strcmp(str, "offline")) > - memhp_auto_online = false; > + memhp_default_online_type = MMOP_OFFLINE; > > return 1; > } > @@ -990,6 +990,7 @@ static int check_hotplug_memory_range(u64 start, u64 size) > > static int online_memory_block(struct memory_block *mem, void *arg) > { > + mem->online_type = memhp_default_online_type; > return device_online(&mem->dev); > } > > @@ -1062,7 +1063,7 @@ int __ref add_memory_resource(int nid, struct resource > *res) > mem_hotplug_done(); > > /* online pages if requested */ > - if (memhp_auto_online) > + if (memhp_default_online_type != MMOP_OFFLINE) > walk_memory_blocks(start, size, NULL, online_memory_block); > > return ret; > -- > 2.24.1 -- Michal Hocko SUSE Labs