Re: [patch 17/22] PCI/MSI: Split out !IRQDOMAIN code

2021-11-28 Thread Cédric Le Goater




+void __weak arch_teardown_msi_irqs(struct pci_dev *dev)
+{
+   struct msi_desc *desc;
+   int i;
+
+   for_each_pci_msi_entry(desc, dev) {
+   if (desc->irq) {
+   for (i = 0; i < entry->nvec_used; i++)


I guess this is 'desc'?

Thanks,

C.


+   arch_teardown_msi_irq(desc->irq + i);
+   }
+   }
+}


Re: [patch 05/22] genirq/msi: Fixup includes

2021-11-28 Thread Cédric Le Goater

On 11/27/21 02:18, Thomas Gleixner wrote:

Remove the kobject.h include from msi.h as it's not required and add a
sysfs.h include to the core code instead.

Signed-off-by: Thomas Gleixner 



This patch breaks the build on powerpc:

  CC  arch/powerpc/kernel/msi.o
In file included from ../arch/powerpc/kernel/msi.c:7:
../include/linux/msi.h:410:65: error: ‘struct cpumask’ declared inside 
parameter list will not be visible outside of this definition or declaration 
[-Werror]
  410 | int msi_domain_set_affinity(struct irq_data *data, const struct cpumask 
*mask,
  | ^~~
cc1: all warnings being treated as errors
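
The warning is the usual C gotcha: a struct tag that first appears inside a prototype's parameter list is scoped to that prototype only. A minimal userspace illustration (not kernel code; function and names are hypothetical — in the kernel the fix is simply to pull in the header that declares struct cpumask before the prototype):

```c
#include <assert.h>

/* Without a prior declaration, a 'struct cpumask *' parameter would
 * introduce a new, prototype-scoped tag — exactly what gcc's
 * "declared inside parameter list will not be visible outside of
 * this definition or declaration" error is about.  A forward
 * declaration (or the defining header) placed first avoids it. */
struct cpumask;                 /* forward declaration: tag is now file-scoped */

static int takes_cpumask(const struct cpumask *mask)
{
	/* Placeholder body for the demo; the real kernel function
	 * here is msi_domain_set_affinity(). */
	return mask == 0;
}
```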

Below is a fix you can merge into patch 5.

Thanks,

C.

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -2,6 +2,7 @@
 #ifndef LINUX_MSI_H
 #define LINUX_MSI_H
 
+#include <linux/cpumask.h>

 #include 
 #include 


---
  include/linux/msi.h |1 -
  kernel/irq/msi.c|1 +
  2 files changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -2,7 +2,6 @@
  #ifndef LINUX_MSI_H
  #define LINUX_MSI_H
  
-#include <linux/kobject.h>

  #include 
  #include 
  
--- a/kernel/irq/msi.c

+++ b/kernel/irq/msi.c
@@ -14,6 +14,7 @@
  #include 
  #include 
  #include 
+#include <linux/sysfs.h>
  #include 
  
  #include "internals.h"






Re: [PATCH 0/9] lib/bitmap: optimize bitmap_weight() usage

2021-11-28 Thread Yury Norov
On Sun, Nov 28, 2021 at 07:03:41PM +0100, mirq-t...@rere.qmqm.pl wrote:
> On Sat, Nov 27, 2021 at 07:56:55PM -0800, Yury Norov wrote:
> > In many cases people use bitmap_weight()-based functions like this:
> > 
> > if (num_present_cpus() > 1)
> > do_something();
> > 
> > This may take a considerable amount of time on many-CPU machines because
> > num_present_cpus() will traverse every word of underlying cpumask
> > unconditionally.
> > 
> > We can significantly improve on it for many real cases if we stop traversing
> > the mask as soon as the count of present CPUs exceeds 1:
> > 
> > if (num_present_cpus_gt(1))
> > do_something();
> > 
> > To implement this idea, the series adds bitmap_weight_{eq,gt,le}
> > functions together with corresponding wrappers in cpumask and nodemask.
> 
> Having slept on it I have more structured thoughts:
> 
> First, I like substituting bitmap_empty/full where possible - I think
> the change stands on its own, so could be split and sent as is.

Ok, I can do it.

> I don't like the proposed API very much. One problem is that it hides
> the comparison operator and makes call sites less readable:
> 
>   bitmap_weight(...) > N
> 
> becomes:
> 
>   bitmap_weight_gt(..., N)
> 
> and:
>   bitmap_weight(...) <= N
> 
> becomes:
> 
>   bitmap_weight_lt(..., N+1)
> or:
>   !bitmap_weight_gt(..., N)
> 
> I'd rather see something resembling memcmp() API that's known enough
> to be easier to grasp. For above examples:
> 
>   bitmap_weight_cmp(..., N) > 0
>   bitmap_weight_cmp(..., N) <= 0
>   ...

bitmap_weight_cmp() cannot be as efficient. Consider this example:

bitmap_weight_lt(1000 0000 0000 0000, 1) == false
                 ^
                 stop here

bitmap_weight_cmp(1000 0000 0000 0000, 1) == 0
                                    ^
                                    stop here

I agree that '_gt' is less readable than '>', but the advantage of
'_gt' over '>' is proportional to the length of the bitmap, which means
that this API should exist.

> This would also make the implementation easier in not having to
> copy and paste the code three times. Could also use a simple
> optimization reducing code size:

In the next version I'll reduce code duplication like this:

bool bitmap_weight_eq(..., N);
bool bitmap_weight_ge(..., N);

#define bitmap_weight_gt(..., N)  bitmap_weight_ge(..., N + 1)
#define bitmap_weight_lt(..., N) !bitmap_weight_ge(..., N)
#define bitmap_weight_le(..., N) !bitmap_weight_gt(..., N)
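
The early-exit idea behind these helpers can be sketched in userspace (illustrative only, not the posted kernel patch; the macro shapes follow the plan above):

```c
#include <assert.h>
#include <stdbool.h>

/* Early-exit weight test: stop scanning as soon as the running
 * popcount reaches n, instead of always weighing the whole bitmap. */
static bool bitmap_weight_ge(const unsigned long *bits,
			     unsigned int nbits, unsigned int n)
{
	unsigned int bpl = 8 * sizeof(unsigned long);	/* bits per long */
	unsigned int w = 0, i;

	for (i = 0; i < nbits / bpl; i++) {
		w += __builtin_popcountl(bits[i]);
		if (w >= n)
			return true;	/* early exit: rest of the mask is never read */
	}
	if (nbits % bpl)	/* partial tail word, mask off unused bits */
		w += __builtin_popcountl(bits[i] & ((1UL << (nbits % bpl)) - 1));
	return w >= n;
}

#define bitmap_weight_gt(b, nbits, n)	bitmap_weight_ge(b, nbits, (n) + 1)
#define bitmap_weight_lt(b, nbits, n)	(!bitmap_weight_ge(b, nbits, n))
```

For a sparse mask with its set bits in the first word, the loop returns after one word regardless of how many words follow — which is the proportional advantage argued above.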

Thanks,
Yury


Re: [PATCH v3 4/4] powerpc/inst: Optimise copy_inst_from_kernel_nofault()

2021-11-28 Thread Christophe Leroy




Le 27/11/2021 à 19:10, Christophe Leroy a écrit :

copy_inst_from_kernel_nofault() uses copy_from_kernel_nofault() to
copy one or two 32-bit words. This means calling an out-of-line
function which itself calls back copy_from_kernel_nofault_allowed()
and then performs a generic copy with loops.

Rewrite copy_inst_from_kernel_nofault() to do everything in a
single place and use __get_kernel_nofault() directly to perform
single accesses without loops.

Before the patch:

0018 :
  18:   94 21 ff e0 stwur1,-32(r1)
  1c:   7c 08 02 a6 mflrr0
  20:   38 a0 00 04 li  r5,4
  24:   93 e1 00 1c stw r31,28(r1)
  28:   7c 7f 1b 78 mr  r31,r3
  2c:   38 61 00 08 addir3,r1,8
  30:   90 01 00 24 stw r0,36(r1)
  34:   48 00 00 01 bl  34 
34: R_PPC_REL24 copy_from_kernel_nofault
  38:   2c 03 00 00 cmpwi   r3,0
  3c:   40 82 00 0c bne 48 
  40:   81 21 00 08 lwz r9,8(r1)
  44:   91 3f 00 00 stw r9,0(r31)
  48:   80 01 00 24 lwz r0,36(r1)
  4c:   83 e1 00 1c lwz r31,28(r1)
  50:   38 21 00 20 addir1,r1,32
  54:   7c 08 03 a6 mtlrr0
  58:   4e 80 00 20 blr

After the patch:

0018 :
  18:   3d 20 b0 00 lis r9,-20480
  1c:   7c 04 48 40 cmplw   r4,r9
  20:   7c 69 1b 78 mr  r9,r3
  24:   41 80 00 2c blt 50 
  28:   81 42 04 d0 lwz r10,1232(r2)
  2c:   39 4a 00 01 addir10,r10,1
  30:   91 42 04 d0 stw r10,1232(r2)
  34:   80 e4 00 00 lwz r7,0(r4)
  38:   81 42 04 d0 lwz r10,1232(r2)
  3c:   38 60 00 00 li  r3,0
  40:   39 4a ff ff addir10,r10,-1
  44:   91 42 04 d0 stw r10,1232(r2)
  48:   90 e9 00 00 stw r7,0(r9)
  4c:   4e 80 00 20 blr

  50:   38 60 ff de li  r3,-34
  54:   4e 80 00 20 blr
  58:   81 22 04 d0 lwz r9,1232(r2)
  5c:   38 60 ff f2 li  r3,-14
  60:   39 29 ff ff addir9,r9,-1
  64:   91 22 04 d0 stw r9,1232(r2)
  68:   4e 80 00 20 blr

Signed-off-by: Christophe Leroy 
---
v3: New
---
  arch/powerpc/mm/maccess.c | 18 --
  1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/mm/maccess.c b/arch/powerpc/mm/maccess.c
index 5abae96b2b46..90309806f5eb 100644
--- a/arch/powerpc/mm/maccess.c
+++ b/arch/powerpc/mm/maccess.c
@@ -15,16 +15,22 @@ bool copy_from_kernel_nofault_allowed(const void 
*unsafe_src, size_t size)
  int copy_inst_from_kernel_nofault(ppc_inst_t *inst, u32 *src)
  {
unsigned int val, suffix;
-   int err;
  
-	err = copy_from_kernel_nofault(&val, src, sizeof(val));

-   if (err)
-   return err;
+   if (unlikely(!is_kernel_addr((unsigned long)src)))
+   return -ERANGE;
+
+   pagefault_disable();


Although the generic version of copy_from_kernel_nofault() does it,
disabling pagefaults is pointless here because we are accessing kernel
addresses only, so a page fault will always fail via bad_kernel_fault()
and never reach the faulthandler_disabled() test.



+   __get_kernel_nofault(&val, src, u32, Efault);
if (IS_ENABLED(CONFIG_PPC64) && get_op(val) == OP_PREFIX) {
-   err = copy_from_kernel_nofault(&suffix, src + 1, sizeof(suffix));
+   __get_kernel_nofault(&suffix, src + 1, u32, Efault);
+   pagefault_enable();
*inst = ppc_inst_prefix(val, suffix);
} else {
+   pagefault_enable();
*inst = ppc_inst(val);
}
-   return err;
+   return 0;
+Efault:
+   pagefault_enable();
+   return -EFAULT;
  }
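
The prefixed-instruction check in the patch keys off the primary opcode field; a userspace sketch of that decode (OP_PREFIX value taken from the kernel's ppc-opcode.h; everything else here is illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define OP_PREFIX 1u	/* primary opcode of ISA v3.1 prefixed instructions */

/* get_op(): the primary opcode lives in the top 6 bits of the word. */
static unsigned int get_op(uint32_t val)
{
	return val >> 26;
}

/* A prefixed instruction needs a second (suffix) word fetched,
 * which is why the patch conditionally reads src + 1. */
static int is_prefixed(uint32_t val)
{
	return get_op(val) == OP_PREFIX;
}
```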



[linux-next] Read-only file system after boot (powerpc)

2021-11-28 Thread Sachin Sant
With recent linux-next builds on IBM Power servers, depending on the
file system used, I have observed that the file system is either
mounted read-only (ext4) or the kernel fails to reach the login prompt (with XFS).
In both cases the following messages are seen during boot:

[4.379474] sd 0:0:1:0: [sda] tag#25 FAILED Result: hostbyte=DID_OK 
driverbyte=DRIVER_OK cmd_age=0s
[4.379479] sd 0:0:1:0: [sda] tag#25 Sense Key : Illegal Request [current]
[4.379483] sd 0:0:1:0: [sda] tag#25 Add. Sense: Invalid field in cdb
[4.379487] sd 0:0:1:0: [sda] tag#25 CDB: Write(10) 2a 08 00 db d7 36 00 00 
01 00
[4.379490] critical target error, dev sda, sector 115259824 op 0x1:(WRITE) 
flags 0x69800 phys_seg 1 prio class 0

I first thought these might be due to failed drives, but I started seeing them
on several servers. I have attached the linux boot log (boot.log) for reference.

git bisect points to following commit:

f1880d26e517a3fe2fd0c32bcbe05e9230a441cf is the first bad commit
commit f1880d26e517a3fe2fd0c32bcbe05e9230a441cf
Author: Christoph Hellwig 
Date:   Wed Nov 24 07:28:56 2021 +0100

blk-mq: cleanup request allocation

Reverting this patch helps. 

# git bisect log
git bisect start
# bad: [f30a24ed97b401416118756fa35fbe5d28f999e3] Add linux-next specific files 
for 20211126
git bisect bad f30a24ed97b401416118756fa35fbe5d28f999e3
# good: [136057256686de39cc3a07c2e39ef6bc43003ff6] Linux 5.16-rc2
git bisect good 136057256686de39cc3a07c2e39ef6bc43003ff6
# good: [f7cdb85df9f5705849b31d44f4f02de4fd0d5f9e] Merge branch 'master' of 
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git
git bisect good f7cdb85df9f5705849b31d44f4f02de4fd0d5f9e
# bad: [df829d7e3ea25733fab4682a3a7ed2339f55d926] Merge branch 'for-next' of 
git://git.kernel.dk/linux-block.git
git bisect bad df829d7e3ea25733fab4682a3a7ed2339f55d926
# good: [bf0d7cde12d6913d21e790efeda122a1cf42c5c3] Merge branch 'drm-next' of 
https://gitlab.freedesktop.org/agd5f/linux
git bisect good bf0d7cde12d6913d21e790efeda122a1cf42c5c3
# good: [06cceaadd85112f6753fa46efd6538ac4bc0fed8] Merge branch 
'for-linux-next-gt' of git://anongit.freedesktop.org/drm-intel
git bisect good 06cceaadd85112f6753fa46efd6538ac4bc0fed8
# good: [6c26b5054ce2b822856e32f1840d13f777c6f295] ASoC: SOF: Intel: add .ack 
support for HDaudio platforms
git bisect good 6c26b5054ce2b822856e32f1840d13f777c6f295
# bad: [8b333e2b7f20a21c8bb969c1c05aa9c2d8c6f73f] Merge branch 
'for-5.17/io_uring' into for-next
git bisect bad 8b333e2b7f20a21c8bb969c1c05aa9c2d8c6f73f
# good: [441a375d2002edb994eba4b39259c6d177714578] blk-ioprio: don't set bio 
priority if not needed
git bisect good 441a375d2002edb994eba4b39259c6d177714578
# bad: [c895b784c699224d690c7dfbdcff309df82366e3] loop: don't hold lo_mutex 
during __loop_clr_fd()
git bisect bad c895b784c699224d690c7dfbdcff309df82366e3
# good: [10e69ae57a1d4a026de15a9c9058c79ecd47e287] block: don't include 
blk-mq-sched.h in blk.h
git bisect good 10e69ae57a1d4a026de15a9c9058c79ecd47e287
# good: [65db5bdc941eab6e3d2adee483d1cb0ec70a39ad] block: don't include 
 in blk.h
git bisect good 65db5bdc941eab6e3d2adee483d1cb0ec70a39ad
# bad: [f1880d26e517a3fe2fd0c32bcbe05e9230a441cf] blk-mq: cleanup request 
allocation
git bisect bad f1880d26e517a3fe2fd0c32bcbe05e9230a441cf
# good: [b1d1d48b8b3a90b4eb28fe3222ca57c6266e211c] block: don't include 
 in blk.h
git bisect good b1d1d48b8b3a90b4eb28fe3222ca57c6266e211c
# first bad commit: [f1880d26e517a3fe2fd0c32bcbe05e9230a441cf] blk-mq: cleanup 
request allocation
#

Thanks
-Sachin




boot.log
Description: Binary data


[Bug 205099] KASAN hit at raid6_pq: BUG: Unable to handle kernel data access at 0x00f0fd0d

2021-11-28 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=205099

--- Comment #44 from Christophe Leroy (christophe.le...@csgroup.eu) ---
Interesting.

I wonder why it works with OUTLINE KASAN but not with INLINE KASAN.

Can you activate CONFIG_PTDUMP_DEBUGFS and provide the content of
/sys/kernel/debug/kernel_page_tables for the cases that don't work (OUTLINE
KASAN + 0x3000 LOWMEM_SIZE, INLINE KASAN + 0x2000 LOWMEM_SIZE)?

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[PATCH v3] powerpc/64s: Get LPID bit width from device tree

2021-11-28 Thread Nicholas Piggin
Allow the LPID bit width and partition table size to be set at runtime
from the device tree.

Move the PID bit width detection into the same place.

KVM does not support using the extra bits yet; this is mainly required
to get the PTCR register values correct (so KVM will run, but it will
not allocate > 4096 LPIDs).

OPAL firmware provides this property for POWER10 CPUs since skiboot
commit 9b85f7d961f2 ("hdata: add mmu-pid-bits and mmu-lpid-bits for
POWER10 CPUs").

Reviewed-by: Fabiano Rosas 
Signed-off-by: Nicholas Piggin 
---
Since v2: fixed compile bug, added skiboot commit hash in changelog.

 arch/powerpc/include/asm/book3s/64/mmu.h |  9 ++---
 arch/powerpc/mm/book3s64/pgtable.c   |  5 ---
 arch/powerpc/mm/book3s64/radix_pgtable.c | 13 +--
 arch/powerpc/mm/init_64.c| 46 +++-
 4 files changed, 51 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index c02f42d1031e..8c500dd6fee4 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -62,6 +62,9 @@ extern struct patb_entry *partition_tb;
 #define PRTS_MASK  0x1f/* process table size field */
 #define PRTB_MASK  0x0000UL
 
+/* Number of supported LPID bits */
+extern unsigned int mmu_lpid_bits;
+
 /* Number of supported PID bits */
 extern unsigned int mmu_pid_bits;
 
@@ -76,10 +79,8 @@ extern unsigned long __ro_after_init radix_mem_block_size;
 #define PRTB_SIZE_SHIFT(mmu_pid_bits + 4)
 #define PRTB_ENTRIES   (1ul << mmu_pid_bits)
 
-/*
- * Power9 currently only support 64K partition table size.
- */
-#define PATB_SIZE_SHIFT16
+#define PATB_SIZE_SHIFT(mmu_lpid_bits + 4)
+#define PATB_ENTRIES   (1ul << mmu_lpid_bits)
 
 typedef unsigned long mm_context_id_t;
 struct spinlock;
diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index 9e16c7b1a6c5..13d1fbddecb9 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -207,17 +207,12 @@ void __init mmu_partition_table_init(void)
unsigned long patb_size = 1UL << PATB_SIZE_SHIFT;
unsigned long ptcr;
 
-   BUILD_BUG_ON_MSG((PATB_SIZE_SHIFT > 36), "Partition table size too 
large.");
/* Initialize the Partition Table with no entries */
partition_tb = memblock_alloc(patb_size, patb_size);
if (!partition_tb)
panic("%s: Failed to allocate %lu bytes align=0x%lx\n",
  __func__, patb_size, patb_size);
 
-   /*
-* update partition table control register,
-* 64 K size.
-*/
ptcr = __pa(partition_tb) | (PATB_SIZE_SHIFT - 12);
set_ptcr_when_no_uv(ptcr);
powernv_set_nmmu_ptcr(ptcr);
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 3a600bd7fbc6..6e365dab9b87 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -33,7 +33,6 @@
 
 #include 
 
-unsigned int mmu_pid_bits;
 unsigned int mmu_base_pid;
 unsigned long radix_mem_block_size __ro_after_init;
 
@@ -357,18 +356,13 @@ static void __init radix_init_pgtable(void)
-1, PAGE_KERNEL));
}
 
-   /* Find out how many PID bits are supported */
if (!cpu_has_feature(CPU_FTR_HVMODE) &&
cpu_has_feature(CPU_FTR_P9_RADIX_PREFETCH_BUG)) {
/*
 * Older versions of KVM on these machines perfer if the
 * guest only uses the low 19 PID bits.
 */
-   if (!mmu_pid_bits)
-   mmu_pid_bits = 19;
-   } else {
-   if (!mmu_pid_bits)
-   mmu_pid_bits = 20;
+   mmu_pid_bits = 19;
}
mmu_base_pid = 1;
 
@@ -449,11 +443,6 @@ static int __init radix_dt_scan_page_sizes(unsigned long 
node,
if (type == NULL || strcmp(type, "cpu") != 0)
return 0;
 
-   /* Find MMU PID size */
-   prop = of_get_flat_dt_prop(node, "ibm,mmu-pid-bits", &size);
-   if (prop && size == 4)
-   mmu_pid_bits = be32_to_cpup(prop);
-
/* Grab page size encodings */
	prop = of_get_flat_dt_prop(node, "ibm,processor-radix-AP-encodings", &size);
if (!prop)
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 386be136026e..3e5f9ac9dded 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -370,6 +370,9 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
 #ifdef CONFIG_PPC_BOOK3S_64
+unsigned int mmu_lpid_bits;
+unsigned int mmu_pid_bits;
+
 static bool disable_radix = !IS_ENABLED(CONFIG_PPC_RADIX_MMU_DEFAULT);
 
 static int __init parse_disable_radix(char *p)
@@ -437,19 +440,60 @@ static void __init 

[PATCH v5 17/17] powerpc/microwatt: add POWER9_CPU, clear PPC_64S_HASH_MMU

2021-11-28 Thread Nicholas Piggin
Microwatt implements a subset of ISA v3.0 (which is equivalent to
the POWER9_CPU option). It is radix-only, so it does not require hash
MMU support.

This saves 20kB compressed dtbImage and 56kB vmlinux size.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/configs/microwatt_defconfig | 3 ++-
 arch/powerpc/platforms/microwatt/Kconfig | 1 -
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/configs/microwatt_defconfig 
b/arch/powerpc/configs/microwatt_defconfig
index 07d87a4044b2..eff933ebbb9e 100644
--- a/arch/powerpc/configs/microwatt_defconfig
+++ b/arch/powerpc/configs/microwatt_defconfig
@@ -15,6 +15,8 @@ CONFIG_EMBEDDED=y
 # CONFIG_COMPAT_BRK is not set
 # CONFIG_SLAB_MERGE_DEFAULT is not set
 CONFIG_PPC64=y
+CONFIG_POWER9_CPU=y
+# CONFIG_PPC_64S_HASH_MMU is not set
 # CONFIG_PPC_KUEP is not set
 # CONFIG_PPC_KUAP is not set
 CONFIG_CPU_LITTLE_ENDIAN=y
@@ -27,7 +29,6 @@ CONFIG_PPC_MICROWATT=y
 CONFIG_CPU_FREQ=y
 CONFIG_HZ_100=y
 CONFIG_PPC_4K_PAGES=y
-# CONFIG_PPC_MEM_KEYS is not set
 # CONFIG_SECCOMP is not set
 # CONFIG_MQ_IOSCHED_KYBER is not set
 # CONFIG_COREDUMP is not set
diff --git a/arch/powerpc/platforms/microwatt/Kconfig 
b/arch/powerpc/platforms/microwatt/Kconfig
index 823192e9d38a..5e320f49583a 100644
--- a/arch/powerpc/platforms/microwatt/Kconfig
+++ b/arch/powerpc/platforms/microwatt/Kconfig
@@ -5,7 +5,6 @@ config PPC_MICROWATT
select PPC_XICS
select PPC_ICS_NATIVE
select PPC_ICP_NATIVE
-   select PPC_HASH_MMU_NATIVE if PPC_64S_HASH_MMU
select PPC_UDBG_16550
select ARCH_RANDOM
help
-- 
2.23.0



[PATCH v5 16/17] powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU

2021-11-28 Thread Nicholas Piggin
Compiling out the hash support code when CONFIG_PPC_64S_HASH_MMU=n saves
128kB of kernel image size (90kB of text) on powernv_defconfig minus KVM,
350kB on pseries_defconfig minus KVM, and 40kB on a tiny config.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/Kconfig  |  2 +-
 arch/powerpc/include/asm/book3s/64/mmu.h  | 21 ++--
 .../include/asm/book3s/64/tlbflush-hash.h |  6 
 arch/powerpc/include/asm/book3s/pgtable.h |  4 +++
 arch/powerpc/include/asm/mmu_context.h|  2 ++
 arch/powerpc/include/asm/paca.h   |  8 +
 arch/powerpc/kernel/asm-offsets.c |  2 ++
 arch/powerpc/kernel/entry_64.S|  4 +--
 arch/powerpc/kernel/exceptions-64s.S  | 16 ++
 arch/powerpc/kernel/mce.c |  2 +-
 arch/powerpc/kernel/mce_power.c   | 10 --
 arch/powerpc/kernel/paca.c| 18 ---
 arch/powerpc/kernel/process.c | 13 
 arch/powerpc/kernel/prom.c|  2 ++
 arch/powerpc/kernel/setup_64.c|  5 +++
 arch/powerpc/kexec/core_64.c  |  4 +--
 arch/powerpc/kexec/ranges.c   |  4 +++
 arch/powerpc/mm/book3s64/Makefile | 15 +
 arch/powerpc/mm/book3s64/hugetlbpage.c|  2 ++
 arch/powerpc/mm/book3s64/mmu_context.c| 32 +++
 arch/powerpc/mm/book3s64/pgtable.c|  2 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c  |  4 +++
 arch/powerpc/mm/copro_fault.c |  2 ++
 arch/powerpc/mm/ptdump/Makefile   |  2 +-
 arch/powerpc/platforms/powernv/idle.c |  2 ++
 arch/powerpc/platforms/powernv/setup.c|  2 ++
 arch/powerpc/platforms/pseries/lpar.c | 11 +--
 arch/powerpc/platforms/pseries/lparcfg.c  |  2 +-
 arch/powerpc/platforms/pseries/mobility.c |  6 
 arch/powerpc/platforms/pseries/ras.c  |  2 ++
 arch/powerpc/platforms/pseries/reconfig.c |  2 ++
 arch/powerpc/platforms/pseries/setup.c|  6 ++--
 arch/powerpc/xmon/xmon.c  |  8 +++--
 drivers/misc/lkdtm/Makefile   |  2 +-
 drivers/misc/lkdtm/core.c |  2 +-
 35 files changed, 174 insertions(+), 53 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1fa336ec8faf..fb48823ccd62 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -129,7 +129,7 @@ config PPC
select ARCH_HAS_KCOV
select ARCH_HAS_MEMBARRIER_CALLBACKS
select ARCH_HAS_MEMBARRIER_SYNC_CORE
-   select ARCH_HAS_MEMREMAP_COMPAT_ALIGN   if PPC_BOOK3S_64
+   select ARCH_HAS_MEMREMAP_COMPAT_ALIGN   if PPC_64S_HASH_MMU
select ARCH_HAS_MMIOWB  if PPC64
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PHYS_TO_DMA
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index c02f42d1031e..046888f7ab6e 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -98,7 +98,9 @@ typedef struct {
 * from EA and new context ids to build the new VAs.
 */
mm_context_id_t id;
+#ifdef CONFIG_PPC_64S_HASH_MMU
mm_context_id_t extended_id[TASK_SIZE_USER64/TASK_CONTEXT_SIZE];
+#endif
};
 
/* Number of bits in the mm_cpumask */
@@ -110,7 +112,9 @@ typedef struct {
/* Number of user space windows opened in process mm_context */
atomic_t vas_windows;
 
+#ifdef CONFIG_PPC_64S_HASH_MMU
struct hash_mm_context *hash_context;
+#endif
 
void __user *vdso;
/*
@@ -133,6 +137,7 @@ typedef struct {
 #endif
 } mm_context_t;
 
+#ifdef CONFIG_PPC_64S_HASH_MMU
 static inline u16 mm_ctx_user_psize(mm_context_t *ctx)
 {
return ctx->hash_context->user_psize;
@@ -193,8 +198,15 @@ static inline struct subpage_prot_table 
*mm_ctx_subpage_prot(mm_context_t *ctx)
 extern int mmu_linear_psize;
 extern int mmu_virtual_psize;
 extern int mmu_vmalloc_psize;
-extern int mmu_vmemmap_psize;
 extern int mmu_io_psize;
+#else /* CONFIG_PPC_64S_HASH_MMU */
+#ifdef CONFIG_PPC_64K_PAGES
+#define mmu_virtual_psize MMU_PAGE_64K
+#else
+#define mmu_virtual_psize MMU_PAGE_4K
+#endif
+#endif
+extern int mmu_vmemmap_psize;
 
 /* MMU initialization */
 void mmu_early_init_devtree(void);
@@ -233,8 +245,9 @@ static inline void setup_initial_memory_limit(phys_addr_t 
first_memblock_base,
 * know which translations we will pick. Hence go with hash
 * restrictions.
 */
-   return hash__setup_initial_memory_limit(first_memblock_base,
-  first_memblock_size);
+   if (!radix_enabled())
+   hash__setup_initial_memory_limit(first_memblock_base,
+first_memblock_size);
 }
 
 #ifdef CONFIG_PPC_PSERIES
@@ -255,6 +268,7 @@ 

[PATCH v5 15/17] powerpc/64s: Make hash MMU support configurable

2021-11-28 Thread Nicholas Piggin
This adds a Kconfig option which allows 64s hash MMU support to be
disabled. It can be disabled if radix support is enabled, the minimum
supported CPU type is POWER9 (or higher), and KVM is not selected.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/Kconfig |  3 ++-
 arch/powerpc/include/asm/mmu.h   | 16 +---
 arch/powerpc/kernel/dt_cpu_ftrs.c| 14 ++
 arch/powerpc/kvm/Kconfig |  1 +
 arch/powerpc/mm/init_64.c| 13 +++--
 arch/powerpc/platforms/Kconfig.cputype   | 23 +--
 arch/powerpc/platforms/cell/Kconfig  |  1 +
 arch/powerpc/platforms/maple/Kconfig |  1 +
 arch/powerpc/platforms/microwatt/Kconfig |  2 +-
 arch/powerpc/platforms/pasemi/Kconfig|  1 +
 arch/powerpc/platforms/powermac/Kconfig  |  1 +
 arch/powerpc/platforms/powernv/Kconfig   |  2 +-
 12 files changed, 64 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 27afb64d027c..1fa336ec8faf 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -845,7 +845,7 @@ config FORCE_MAX_ZONEORDER
 config PPC_SUBPAGE_PROT
bool "Support setting protections for 4k subpages (subpage_prot 
syscall)"
default n
-   depends on PPC_BOOK3S_64 && PPC_64K_PAGES
+   depends on PPC_64S_HASH_MMU && PPC_64K_PAGES
help
  This option adds support for system call to allow user programs
  to set access permissions (read/write, readonly, or no access)
@@ -943,6 +943,7 @@ config PPC_MEM_KEYS
prompt "PowerPC Memory Protection Keys"
def_bool y
depends on PPC_BOOK3S_64
+   depends on PPC_64S_HASH_MMU
select ARCH_USES_HIGH_VMA_FLAGS
select ARCH_HAS_PKEYS
help
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 8abe8e42e045..5f41565a1e5d 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -157,7 +157,7 @@ DECLARE_PER_CPU(int, next_tlbcam_idx);
 
 enum {
MMU_FTRS_POSSIBLE =
-#if defined(CONFIG_PPC_BOOK3S_64) || defined(CONFIG_PPC_BOOK3S_604)
+#if defined(CONFIG_PPC_BOOK3S_604)
MMU_FTR_HPTE_TABLE |
 #endif
 #ifdef CONFIG_PPC_8xx
@@ -184,15 +184,18 @@ enum {
MMU_FTR_USE_TLBRSRV | MMU_FTR_USE_PAIRED_MAS |
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
+   MMU_FTR_KERNEL_RO |
+#ifdef CONFIG_PPC_64S_HASH_MMU
MMU_FTR_NO_SLBIE_B | MMU_FTR_16M_PAGE | MMU_FTR_TLBIEL |
MMU_FTR_LOCKLESS_TLBIE | MMU_FTR_CI_LARGE_PAGE |
MMU_FTR_1T_SEGMENT | MMU_FTR_TLBIE_CROP_VA |
-   MMU_FTR_KERNEL_RO | MMU_FTR_68_BIT_VA |
+   MMU_FTR_68_BIT_VA | MMU_FTR_HPTE_TABLE |
 #endif
 #ifdef CONFIG_PPC_RADIX_MMU
MMU_FTR_TYPE_RADIX |
MMU_FTR_GTSE |
 #endif /* CONFIG_PPC_RADIX_MMU */
+#endif
 #ifdef CONFIG_PPC_KUAP
MMU_FTR_BOOK3S_KUAP |
 #endif /* CONFIG_PPC_KUAP */
@@ -224,6 +227,13 @@ enum {
 #define MMU_FTRS_ALWAYSMMU_FTR_TYPE_FSL_E
 #endif
 
+/* BOOK3S_64 options */
+#if defined(CONFIG_PPC_RADIX_MMU) && !defined(CONFIG_PPC_64S_HASH_MMU)
+#define MMU_FTRS_ALWAYSMMU_FTR_TYPE_RADIX
+#elif !defined(CONFIG_PPC_RADIX_MMU) && defined(CONFIG_PPC_64S_HASH_MMU)
+#define MMU_FTRS_ALWAYSMMU_FTR_HPTE_TABLE
+#endif
+
 #ifndef MMU_FTRS_ALWAYS
 #define MMU_FTRS_ALWAYS0
 #endif
@@ -329,7 +339,7 @@ static __always_inline bool radix_enabled(void)
return mmu_has_feature(MMU_FTR_TYPE_RADIX);
 }
 
-static inline bool early_radix_enabled(void)
+static __always_inline bool early_radix_enabled(void)
 {
return early_mmu_has_feature(MMU_FTR_TYPE_RADIX);
 }
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
b/arch/powerpc/kernel/dt_cpu_ftrs.c
index d2b35fb9181d..1ac8d7357195 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -273,6 +273,9 @@ static int __init feat_enable_mmu_hash(struct 
dt_cpu_feature *f)
 {
u64 lpcr;
 
+   if (!IS_ENABLED(CONFIG_PPC_64S_HASH_MMU))
+   return 0;
+
lpcr = mfspr(SPRN_LPCR);
lpcr &= ~LPCR_ISL;
 
@@ -292,6 +295,9 @@ static int __init feat_enable_mmu_hash_v3(struct 
dt_cpu_feature *f)
 {
u64 lpcr;
 
+   if (!IS_ENABLED(CONFIG_PPC_64S_HASH_MMU))
+   return 0;
+
lpcr = mfspr(SPRN_LPCR);
lpcr &= ~(LPCR_ISL | LPCR_UPRT | LPCR_HR);
mtspr(SPRN_LPCR, lpcr);
@@ -305,15 +311,15 @@ static int __init feat_enable_mmu_hash_v3(struct 
dt_cpu_feature *f)
 
 static int __init feat_enable_mmu_radix(struct dt_cpu_feature *f)
 {
-#ifdef CONFIG_PPC_RADIX_MMU
+   if (!IS_ENABLED(CONFIG_PPC_RADIX_MMU))
+   return 0;
+
+   cur_cpu_spec->mmu_features |= MMU_FTR_KERNEL_RO;
cur_cpu_spec->mmu_features |= MMU_FTR_TYPE_RADIX;
-   cur_cpu_spec->mmu_features |= MMU_FTRS_HASH_BASE;
cur_cpu_spec->mmu_features |= 

[PATCH v5 14/17] powerpc/64s: Fix radix MMU when MMU_FTR_HPTE_TABLE is clear

2021-11-28 Thread Nicholas Piggin
There are a few places that require MMU_FTR_HPTE_TABLE to be set even
when running in radix mode. Fix those up.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/mm/pgtable.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index ce9482383144..abb3198bd277 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -81,9 +81,6 @@ static struct page *maybe_pte_to_page(pte_t pte)
 
 static pte_t set_pte_filter_hash(pte_t pte)
 {
-   if (radix_enabled())
-   return pte;
-
pte = __pte(pte_val(pte) & ~_PAGE_HPTEFLAGS);
if (pte_looks_normal(pte) && !(cpu_has_feature(CPU_FTR_COHERENT_ICACHE) 
||
   cpu_has_feature(CPU_FTR_NOEXECUTE))) {
@@ -112,6 +109,9 @@ static inline pte_t set_pte_filter(pte_t pte)
 {
struct page *pg;
 
+   if (radix_enabled())
+   return pte;
+
if (mmu_has_feature(MMU_FTR_HPTE_TABLE))
return set_pte_filter_hash(pte);
 
@@ -144,6 +144,9 @@ static pte_t set_access_flags_filter(pte_t pte, struct 
vm_area_struct *vma,
 {
struct page *pg;
 
+   if (IS_ENABLED(CONFIG_PPC_BOOK3S_64))
+   return pte;
+
if (mmu_has_feature(MMU_FTR_HPTE_TABLE))
return pte;
 
-- 
2.23.0



[PATCH v5 13/17] powerpc/64e: remove mmu_linear_psize

2021-11-28 Thread Nicholas Piggin
mmu_linear_psize is only set at boot once on 64e, is not necessarily
the correct size of the linear map pages, and is never used anywhere.
Remove it.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/nohash/mmu-book3e.h | 1 -
 arch/powerpc/mm/nohash/tlb.c | 9 -
 2 files changed, 10 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/mmu-book3e.h 
b/arch/powerpc/include/asm/nohash/mmu-book3e.h
index e43a418d3ccd..787e6482e299 100644
--- a/arch/powerpc/include/asm/nohash/mmu-book3e.h
+++ b/arch/powerpc/include/asm/nohash/mmu-book3e.h
@@ -284,7 +284,6 @@ static inline unsigned int mmu_psize_to_shift(unsigned int 
mmu_psize)
 #error Unsupported page size
 #endif
 
-extern int mmu_linear_psize;
 extern int mmu_vmemmap_psize;
 
 struct tlb_core_data {
diff --git a/arch/powerpc/mm/nohash/tlb.c b/arch/powerpc/mm/nohash/tlb.c
index 647bf454a0fa..311281063d48 100644
--- a/arch/powerpc/mm/nohash/tlb.c
+++ b/arch/powerpc/mm/nohash/tlb.c
@@ -150,7 +150,6 @@ static inline int mmu_get_tsize(int psize)
  */
 #ifdef CONFIG_PPC64
 
-int mmu_linear_psize;  /* Page size used for the linear mapping */
 int mmu_pte_psize; /* Page size used for PTE pages */
 int mmu_vmemmap_psize; /* Page size used for the virtual mem map */
 int book3e_htw_mode;   /* HW tablewalk?  Value is PPC_HTW_* */
@@ -657,14 +656,6 @@ static void early_init_this_mmu(void)
 
 static void __init early_init_mmu_global(void)
 {
-   /* XXX This will have to be decided at runtime, but right
-* now our boot and TLB miss code hard wires it. Ideally
-* we should find out a suitable page size and patch the
-* TLB miss code (either that or use the PACA to store
-* the value we want)
-*/
-   mmu_linear_psize = MMU_PAGE_1G;
-
/* XXX This should be decided at runtime based on supported
 * page sizes in the TLB, but for now let's assume 16M is
 * always there and a good fit (which it probably is)
-- 
2.23.0



[PATCH v5 12/17] powerpc: make memremap_compat_align 64s-only

2021-11-28 Thread Nicholas Piggin
memremap_compat_align is only relevant when ZONE_DEVICE is selected.
ZONE_DEVICE depends on ARCH_HAS_PTE_DEVMAP, which is only selected
by PPC_BOOK3S_64.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/Kconfig   |  2 +-
 arch/powerpc/mm/book3s64/pgtable.c | 20 
 arch/powerpc/mm/ioremap.c  | 20 
 3 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index dea74d7717c0..27afb64d027c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -129,7 +129,7 @@ config PPC
select ARCH_HAS_KCOV
select ARCH_HAS_MEMBARRIER_CALLBACKS
select ARCH_HAS_MEMBARRIER_SYNC_CORE
-   select ARCH_HAS_MEMREMAP_COMPAT_ALIGN
+   select ARCH_HAS_MEMREMAP_COMPAT_ALIGN   if PPC_BOOK3S_64
select ARCH_HAS_MMIOWB  if PPC64
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PHYS_TO_DMA
diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index 81f2e5b670e2..4d97d1525d49 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -533,3 +533,23 @@ static int __init pgtable_debugfs_setup(void)
return 0;
 }
 arch_initcall(pgtable_debugfs_setup);
+
+#ifdef CONFIG_ZONE_DEVICE
+/*
+ * Override the generic version in mm/memremap.c.
+ *
+ * With hash translation, the direct-map range is mapped with just one
+ * page size selected by htab_init_page_sizes(). Consult
+ * mmu_psize_defs[] to determine the minimum page size alignment.
+*/
+unsigned long memremap_compat_align(void)
+{
+   if (!radix_enabled()) {
+   unsigned int shift = mmu_psize_defs[mmu_linear_psize].shift;
+   return max(SUBSECTION_SIZE, 1UL << shift);
+   }
+
+   return SUBSECTION_SIZE;
+}
+EXPORT_SYMBOL_GPL(memremap_compat_align);
+#endif
diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
index 57342154d2b0..4f12504fb405 100644
--- a/arch/powerpc/mm/ioremap.c
+++ b/arch/powerpc/mm/ioremap.c
@@ -98,23 +98,3 @@ void __iomem *do_ioremap(phys_addr_t pa, phys_addr_t offset, unsigned long size,
 
return NULL;
 }
-
-#ifdef CONFIG_ZONE_DEVICE
-/*
- * Override the generic version in mm/memremap.c.
- *
- * With hash translation, the direct-map range is mapped with just one
- * page size selected by htab_init_page_sizes(). Consult
- * mmu_psize_defs[] to determine the minimum page size alignment.
-*/
-unsigned long memremap_compat_align(void)
-{
-   unsigned int shift = mmu_psize_defs[mmu_linear_psize].shift;
-
-   if (radix_enabled())
-   return SUBSECTION_SIZE;
-   return max(SUBSECTION_SIZE, 1UL << shift);
-
-}
-EXPORT_SYMBOL_GPL(memremap_compat_align);
-#endif
-- 
2.23.0



[PATCH v5 11/17] powerpc/64: pcpu setup avoid reading mmu_linear_psize on 64e or radix

2021-11-28 Thread Nicholas Piggin
Radix never sets mmu_linear_psize so it's always 4K, which causes pcpu
atom_size to always be PAGE_SIZE. 64e sets it to 1GB always.

Make the paths for these platforms explicit about what value they set
atom_size to.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/setup_64.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 6052f5d5ded3..9a493796ce66 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -880,14 +880,23 @@ void __init setup_per_cpu_areas(void)
int rc = -EINVAL;
 
/*
-* Linear mapping is one of 4K, 1M and 16M.  For 4K, no need
-* to group units.  For larger mappings, use 1M atom which
-* should be large enough to contain a number of units.
+* BookE and BookS radix are historical values and should be revisited.
 */
-   if (mmu_linear_psize == MMU_PAGE_4K)
+   if (IS_ENABLED(CONFIG_PPC_BOOK3E)) {
+   atom_size = SZ_1M;
+   } else if (radix_enabled()) {
atom_size = PAGE_SIZE;
-   else
-   atom_size = 1 << 20;
+   } else {
+   /*
+* Linear mapping is one of 4K, 1M and 16M.  For 4K, no need
+* to group units.  For larger mappings, use 1M atom which
+* should be large enough to contain a number of units.
+*/
+   if (mmu_linear_psize == MMU_PAGE_4K)
+   atom_size = PAGE_SIZE;
+   else
+   atom_size = SZ_1M;
+   }
 
if (pcpu_chosen_fc != PCPU_FC_PAGE) {
	rc = pcpu_embed_first_chunk(0, dyn_size, atom_size, pcpu_cpu_distance,
-- 
2.23.0



[PATCH v5 10/17] powerpc/64s: Rename hash_hugetlbpage.c to hugetlbpage.c

2021-11-28 Thread Nicholas Piggin
This file contains functions and data common to radix, so rename it to
remove the hash_ prefix.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/mm/book3s64/Makefile  | 2 +-
 arch/powerpc/mm/book3s64/{hash_hugetlbpage.c => hugetlbpage.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename arch/powerpc/mm/book3s64/{hash_hugetlbpage.c => hugetlbpage.c} (100%)

diff --git a/arch/powerpc/mm/book3s64/Makefile b/arch/powerpc/mm/book3s64/Makefile
index 1579e18e098d..501efadb287f 100644
--- a/arch/powerpc/mm/book3s64/Makefile
+++ b/arch/powerpc/mm/book3s64/Makefile
@@ -10,7 +10,7 @@ obj-$(CONFIG_PPC_HASH_MMU_NATIVE) += hash_native.o
 obj-$(CONFIG_PPC_RADIX_MMU)+= radix_pgtable.o radix_tlb.o
 obj-$(CONFIG_PPC_4K_PAGES) += hash_4k.o
 obj-$(CONFIG_PPC_64K_PAGES)+= hash_64k.o
-obj-$(CONFIG_HUGETLB_PAGE) += hash_hugetlbpage.o
+obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 ifdef CONFIG_HUGETLB_PAGE
 obj-$(CONFIG_PPC_RADIX_MMU)+= radix_hugetlbpage.o
 endif
diff --git a/arch/powerpc/mm/book3s64/hash_hugetlbpage.c b/arch/powerpc/mm/book3s64/hugetlbpage.c
similarity index 100%
rename from arch/powerpc/mm/book3s64/hash_hugetlbpage.c
rename to arch/powerpc/mm/book3s64/hugetlbpage.c
-- 
2.23.0



[PATCH v5 09/17] powerpc/64s: move page size definitions from hash specific file

2021-11-28 Thread Nicholas Piggin
The radix code uses some of the psize variables. Move the common
ones from hash_utils.c to pgtable.c.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/mm/book3s64/hash_utils.c | 5 -
 arch/powerpc/mm/book3s64/pgtable.c| 7 +++
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 97a36fa3940e..eced266dc5e9 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -99,8 +99,6 @@
  */
 
 static unsigned long _SDR1;
-struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
-EXPORT_SYMBOL_GPL(mmu_psize_defs);
 
 u8 hpte_page_sizes[1 << LP_BITS];
 EXPORT_SYMBOL_GPL(hpte_page_sizes);
@@ -114,9 +112,6 @@ EXPORT_SYMBOL_GPL(mmu_linear_psize);
 int mmu_virtual_psize = MMU_PAGE_4K;
 int mmu_vmalloc_psize = MMU_PAGE_4K;
 EXPORT_SYMBOL_GPL(mmu_vmalloc_psize);
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-int mmu_vmemmap_psize = MMU_PAGE_4K;
-#endif
 int mmu_io_psize = MMU_PAGE_4K;
 int mmu_kernel_ssize = MMU_SEGSIZE_256M;
 EXPORT_SYMBOL_GPL(mmu_kernel_ssize);
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 9e16c7b1a6c5..81f2e5b670e2 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -22,6 +22,13 @@
 
 #include "internal.h"
 
+struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
+EXPORT_SYMBOL_GPL(mmu_psize_defs);
+
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+int mmu_vmemmap_psize = MMU_PAGE_4K;
+#endif
+
 unsigned long __pmd_frag_nr;
 EXPORT_SYMBOL(__pmd_frag_nr);
 unsigned long __pmd_frag_size_shift;
-- 
2.23.0



[PATCH v5 08/17] powerpc/64s: Make flush_and_reload_slb a no-op when radix is enabled

2021-11-28 Thread Nicholas Piggin
The radix test can exclude slb_flush_all_realmode() from being called
because flush_and_reload_slb() is only expected to flush ERAT when
called by flush_erat(), which is only on pre-ISA v3.0 CPUs that do not
support radix.

This helps the later change that makes hash support configurable avoid
introducing runtime changes to radix mode behaviour.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/mce_power.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
index c2f55fe7092d..cf5263b648fc 100644
--- a/arch/powerpc/kernel/mce_power.c
+++ b/arch/powerpc/kernel/mce_power.c
@@ -80,12 +80,12 @@ static bool mce_in_guest(void)
 #ifdef CONFIG_PPC_BOOK3S_64
 void flush_and_reload_slb(void)
 {
-   /* Invalidate all SLBs */
-   slb_flush_all_realmode();
-
if (early_radix_enabled())
return;
 
+   /* Invalidate all SLBs */
+   slb_flush_all_realmode();
+
/*
 * This probably shouldn't happen, but it may be possible it's
 * called in early boot before SLB shadows are allocated.
-- 
2.23.0



[PATCH v5 07/17] powerpc/64s: move THP trace point creation out of hash specific file

2021-11-28 Thread Nicholas Piggin
In preparation for making hash MMU support configurable, move THP
trace point function definitions out of an otherwise hash-specific
file.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/mm/book3s64/Makefile   | 2 +-
 arch/powerpc/mm/book3s64/hash_pgtable.c | 1 -
 arch/powerpc/mm/book3s64/trace.c| 8 
 3 files changed, 9 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/mm/book3s64/trace.c

diff --git a/arch/powerpc/mm/book3s64/Makefile b/arch/powerpc/mm/book3s64/Makefile
index 319f4b7f3357..1579e18e098d 100644
--- a/arch/powerpc/mm/book3s64/Makefile
+++ b/arch/powerpc/mm/book3s64/Makefile
@@ -5,7 +5,7 @@ ccflags-y   := $(NO_MINIMAL_TOC)
 CFLAGS_REMOVE_slb.o = $(CC_FLAGS_FTRACE)
 
 obj-y  += hash_pgtable.o hash_utils.o slb.o \
-  mmu_context.o pgtable.o hash_tlb.o
+  mmu_context.o pgtable.o hash_tlb.o trace.o
 obj-$(CONFIG_PPC_HASH_MMU_NATIVE)  += hash_native.o
 obj-$(CONFIG_PPC_RADIX_MMU)+= radix_pgtable.o radix_tlb.o
 obj-$(CONFIG_PPC_4K_PAGES) += hash_4k.o
diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c b/arch/powerpc/mm/book3s64/hash_pgtable.c
index ad5eff097d31..7ce8914992e3 100644
--- a/arch/powerpc/mm/book3s64/hash_pgtable.c
+++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
@@ -16,7 +16,6 @@
 
 #include 
 
-#define CREATE_TRACE_POINTS
 #include 
 
#if H_PGTABLE_RANGE > (USER_VSID_RANGE * (TASK_SIZE_USER64 / TASK_CONTEXT_SIZE))
diff --git a/arch/powerpc/mm/book3s64/trace.c b/arch/powerpc/mm/book3s64/trace.c
new file mode 100644
index ..b86e7b906257
--- /dev/null
+++ b/arch/powerpc/mm/book3s64/trace.c
@@ -0,0 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * This file is for defining trace points and trace related helpers.
+ */
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define CREATE_TRACE_POINTS
+#include 
+#endif
-- 
2.23.0



[PATCH v5 06/17] powerpc/pseries: lparcfg don't include slb_size line in radix mode

2021-11-28 Thread Nicholas Piggin
This avoids a change in behaviour in the later patch making hash
support configurable. This is possibly a user interface change, so
the alternative would be a hard-coded slb_size=0 here.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/platforms/pseries/lparcfg.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
index f71eac74ea92..3354c00914fa 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -532,7 +532,8 @@ static int pseries_lparcfg_data(struct seq_file *m, void *v)
   lppaca_shared_proc(get_lppaca()));
 
 #ifdef CONFIG_PPC_BOOK3S_64
-   seq_printf(m, "slb_size=%d\n", mmu_slb_size);
+   if (!radix_enabled())
+   seq_printf(m, "slb_size=%d\n", mmu_slb_size);
 #endif
parse_em_data(m);
maxmem_data(m);
-- 
2.23.0



[PATCH v5 05/17] powerpc/pseries: move process table registration away from hash-specific code

2021-11-28 Thread Nicholas Piggin
This reduces ifdefs in a later change which makes hash support configurable.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/platforms/pseries/lpar.c | 56 +--
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 3df6bdfea475..06d6a824c0dc 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -712,6 +712,34 @@ void vpa_init(int cpu)
 
 #ifdef CONFIG_PPC_BOOK3S_64
 
+static int pseries_lpar_register_process_table(unsigned long base,
+   unsigned long page_size, unsigned long table_size)
+{
+   long rc;
+   unsigned long flags = 0;
+
+   if (table_size)
+   flags |= PROC_TABLE_NEW;
+   if (radix_enabled()) {
+   flags |= PROC_TABLE_RADIX;
+   if (mmu_has_feature(MMU_FTR_GTSE))
+   flags |= PROC_TABLE_GTSE;
+   } else
+   flags |= PROC_TABLE_HPT_SLB;
+   for (;;) {
+   rc = plpar_hcall_norets(H_REGISTER_PROC_TBL, flags, base,
+   page_size, table_size);
+   if (!H_IS_LONG_BUSY(rc))
+   break;
+   mdelay(get_longbusy_msecs(rc));
+   }
+   if (rc != H_SUCCESS) {
+   pr_err("Failed to register process table (rc=%ld)\n", rc);
+   BUG();
+   }
+   return rc;
+}
+
 static long pSeries_lpar_hpte_insert(unsigned long hpte_group,
 unsigned long vpn, unsigned long pa,
 unsigned long rflags, unsigned long vflags,
@@ -1680,34 +1708,6 @@ static int pseries_lpar_resize_hpt(unsigned long shift)
return 0;
 }
 
-static int pseries_lpar_register_process_table(unsigned long base,
-   unsigned long page_size, unsigned long table_size)
-{
-   long rc;
-   unsigned long flags = 0;
-
-   if (table_size)
-   flags |= PROC_TABLE_NEW;
-   if (radix_enabled()) {
-   flags |= PROC_TABLE_RADIX;
-   if (mmu_has_feature(MMU_FTR_GTSE))
-   flags |= PROC_TABLE_GTSE;
-   } else
-   flags |= PROC_TABLE_HPT_SLB;
-   for (;;) {
-   rc = plpar_hcall_norets(H_REGISTER_PROC_TBL, flags, base,
-   page_size, table_size);
-   if (!H_IS_LONG_BUSY(rc))
-   break;
-   mdelay(get_longbusy_msecs(rc));
-   }
-   if (rc != H_SUCCESS) {
-   pr_err("Failed to register process table (rc=%ld)\n", rc);
-   BUG();
-   }
-   return rc;
-}
-
 void __init hpte_init_pseries(void)
 {
mmu_hash_ops.hpte_invalidate = pSeries_lpar_hpte_invalidate;
-- 
2.23.0



[PATCH v5 04/17] powerpc/64s: Move and rename do_bad_slb_fault as it is not hash specific

2021-11-28 Thread Nicholas Piggin
slb.c is hash-specific SLB management, but do_bad_slb_fault deals with
segment interrupts that occur with radix MMU as well.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/interrupt.h |  2 +-
 arch/powerpc/kernel/exceptions-64s.S |  4 ++--
 arch/powerpc/mm/book3s64/slb.c   | 16 
 arch/powerpc/mm/fault.c  | 24 
 4 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
index a1d238255f07..3487aab12229 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -564,7 +564,7 @@ DECLARE_INTERRUPT_HANDLER(kernel_bad_stack);
 
 /* slb.c */
 DECLARE_INTERRUPT_HANDLER_RAW(do_slb_fault);
-DECLARE_INTERRUPT_HANDLER(do_bad_slb_fault);
+DECLARE_INTERRUPT_HANDLER(do_bad_segment_interrupt);
 
 /* hash_utils.c */
 DECLARE_INTERRUPT_HANDLER_RAW(do_hash_fault);
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index eaf1f72131a1..046c99e31d01 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1430,7 +1430,7 @@ MMU_FTR_SECTION_ELSE
 ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
std r3,RESULT(r1)
addir3,r1,STACK_FRAME_OVERHEAD
-   bl  do_bad_slb_fault
+   bl  do_bad_segment_interrupt
b   interrupt_return_srr
 
 
@@ -1510,7 +1510,7 @@ MMU_FTR_SECTION_ELSE
 ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
std r3,RESULT(r1)
addir3,r1,STACK_FRAME_OVERHEAD
-   bl  do_bad_slb_fault
+   bl  do_bad_segment_interrupt
b   interrupt_return_srr
 
 
diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
index f0037bcc47a0..31f4cef3adac 100644
--- a/arch/powerpc/mm/book3s64/slb.c
+++ b/arch/powerpc/mm/book3s64/slb.c
@@ -868,19 +868,3 @@ DEFINE_INTERRUPT_HANDLER_RAW(do_slb_fault)
return err;
}
 }
-
-DEFINE_INTERRUPT_HANDLER(do_bad_slb_fault)
-{
-   int err = regs->result;
-
-   if (err == -EFAULT) {
-   if (user_mode(regs))
-   _exception(SIGSEGV, regs, SEGV_BNDERR, regs->dar);
-   else
-   bad_page_fault(regs, SIGSEGV);
-   } else if (err == -EINVAL) {
-   unrecoverable_exception(regs);
-   } else {
-   BUG();
-   }
-}
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index a8d0ce85d39a..2d4a411c7c85 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -620,4 +621,27 @@ DEFINE_INTERRUPT_HANDLER(do_bad_page_fault_segv)
 {
bad_page_fault(regs, SIGSEGV);
 }
+
+/*
+ * In radix, segment interrupts indicate the EA is not addressable by the
+ * page table geometry, so they are always sent here.
+ *
+ * In hash, this is called if do_slb_fault returns error. Typically it is
+ * because the EA was outside the region allowed by software.
+ */
+DEFINE_INTERRUPT_HANDLER(do_bad_segment_interrupt)
+{
+   int err = regs->result;
+
+   if (err == -EFAULT) {
+   if (user_mode(regs))
+   _exception(SIGSEGV, regs, SEGV_BNDERR, regs->dar);
+   else
+   bad_page_fault(regs, SIGSEGV);
+   } else if (err == -EINVAL) {
+   unrecoverable_exception(regs);
+   } else {
+   BUG();
+   }
+}
 #endif
-- 
2.23.0



[PATCH v5 03/17] powerpc/pseries: Stop selecting PPC_HASH_MMU_NATIVE

2021-11-28 Thread Nicholas Piggin
The pseries platform does not use the native hash code but the PAPR
virtualised hash interfaces, so remove PPC_HASH_MMU_NATIVE.

This requires moving tlbiel code from hash_native.c to hash_utils.c.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/book3s/64/tlbflush.h |   4 -
 arch/powerpc/mm/book3s64/hash_native.c| 104 --
 arch/powerpc/mm/book3s64/hash_utils.c | 104 ++
 arch/powerpc/platforms/pseries/Kconfig|   1 -
 4 files changed, 104 insertions(+), 109 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h
index 215973b4cb26..d2e80f178b6d 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
@@ -14,7 +14,6 @@ enum {
TLB_INVAL_SCOPE_LPID = 1,   /* invalidate TLBs for current LPID */
 };
 
-#ifdef CONFIG_PPC_NATIVE
 static inline void tlbiel_all(void)
 {
/*
@@ -30,9 +29,6 @@ static inline void tlbiel_all(void)
else
hash__tlbiel_all(TLB_INVAL_SCOPE_GLOBAL);
 }
-#else
-static inline void tlbiel_all(void) { BUG(); }
-#endif
 
 static inline void tlbiel_all_lpid(bool radix)
 {
diff --git a/arch/powerpc/mm/book3s64/hash_native.c b/arch/powerpc/mm/book3s64/hash_native.c
index d8279bfe68ea..d2a320828c0b 100644
--- a/arch/powerpc/mm/book3s64/hash_native.c
+++ b/arch/powerpc/mm/book3s64/hash_native.c
@@ -43,110 +43,6 @@
 
 static DEFINE_RAW_SPINLOCK(native_tlbie_lock);
 
-static inline void tlbiel_hash_set_isa206(unsigned int set, unsigned int is)
-{
-   unsigned long rb;
-
-   rb = (set << PPC_BITLSHIFT(51)) | (is << PPC_BITLSHIFT(53));
-
-   asm volatile("tlbiel %0" : : "r" (rb));
-}
-
-/*
- * tlbiel instruction for hash, set invalidation
- * i.e., r=1 and is=01 or is=10 or is=11
- */
-static __always_inline void tlbiel_hash_set_isa300(unsigned int set, unsigned int is,
-   unsigned int pid,
-   unsigned int ric, unsigned int prs)
-{
-   unsigned long rb;
-   unsigned long rs;
-   unsigned int r = 0; /* hash format */
-
-   rb = (set << PPC_BITLSHIFT(51)) | (is << PPC_BITLSHIFT(53));
-   rs = ((unsigned long)pid << PPC_BITLSHIFT(31));
-
-   asm volatile(PPC_TLBIEL(%0, %1, %2, %3, %4)
-: : "r"(rb), "r"(rs), "i"(ric), "i"(prs), "i"(r)
-: "memory");
-}
-
-
-static void tlbiel_all_isa206(unsigned int num_sets, unsigned int is)
-{
-   unsigned int set;
-
-   asm volatile("ptesync": : :"memory");
-
-   for (set = 0; set < num_sets; set++)
-   tlbiel_hash_set_isa206(set, is);
-
-   ppc_after_tlbiel_barrier();
-}
-
-static void tlbiel_all_isa300(unsigned int num_sets, unsigned int is)
-{
-   unsigned int set;
-
-   asm volatile("ptesync": : :"memory");
-
-   /*
-* Flush the partition table cache if this is HV mode.
-*/
-   if (early_cpu_has_feature(CPU_FTR_HVMODE))
-   tlbiel_hash_set_isa300(0, is, 0, 2, 0);
-
-   /*
-* Now invalidate the process table cache. UPRT=0 HPT modes (what
-* current hardware implements) do not use the process table, but
-* add the flushes anyway.
-*
-* From ISA v3.0B p. 1078:
-* The following forms are invalid.
-*  * PRS=1, R=0, and RIC!=2 (The only process-scoped
-*HPT caching is of the Process Table.)
-*/
-   tlbiel_hash_set_isa300(0, is, 0, 2, 1);
-
-   /*
-* Then flush the sets of the TLB proper. Hash mode uses
-* partition scoped TLB translations, which may be flushed
-* in !HV mode.
-*/
-   for (set = 0; set < num_sets; set++)
-   tlbiel_hash_set_isa300(set, is, 0, 0, 0);
-
-   ppc_after_tlbiel_barrier();
-
-   asm volatile(PPC_ISA_3_0_INVALIDATE_ERAT "; isync" : : :"memory");
-}
-
-void hash__tlbiel_all(unsigned int action)
-{
-   unsigned int is;
-
-   switch (action) {
-   case TLB_INVAL_SCOPE_GLOBAL:
-   is = 3;
-   break;
-   case TLB_INVAL_SCOPE_LPID:
-   is = 2;
-   break;
-   default:
-   BUG();
-   }
-
-   if (early_cpu_has_feature(CPU_FTR_ARCH_300))
-   tlbiel_all_isa300(POWER9_TLB_SETS_HASH, is);
-   else if (early_cpu_has_feature(CPU_FTR_ARCH_207S))
-   tlbiel_all_isa206(POWER8_TLB_SETS, is);
-   else if (early_cpu_has_feature(CPU_FTR_ARCH_206))
-   tlbiel_all_isa206(POWER7_TLB_SETS, is);
-   else
-   WARN(1, "%s called on pre-POWER7 CPU\n", __func__);
-}
-
 static inline unsigned long  ___tlbie(unsigned long vpn, int psize,
int apsize, int ssize)
 {
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 92680da5229a..97a36fa3940e 

[PATCH v5 02/17] powerpc: Rename PPC_NATIVE to PPC_HASH_MMU_NATIVE

2021-11-28 Thread Nicholas Piggin
PPC_NATIVE now only controls the native HPT code, so rename it to be
more descriptive. Restrict it to Book3S only.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/mm/book3s64/Makefile  | 2 +-
 arch/powerpc/mm/book3s64/hash_utils.c  | 2 +-
 arch/powerpc/platforms/52xx/Kconfig| 2 +-
 arch/powerpc/platforms/Kconfig | 4 ++--
 arch/powerpc/platforms/cell/Kconfig| 2 +-
 arch/powerpc/platforms/chrp/Kconfig| 2 +-
 arch/powerpc/platforms/embedded6xx/Kconfig | 2 +-
 arch/powerpc/platforms/maple/Kconfig   | 2 +-
 arch/powerpc/platforms/microwatt/Kconfig   | 2 +-
 arch/powerpc/platforms/pasemi/Kconfig  | 2 +-
 arch/powerpc/platforms/powermac/Kconfig| 2 +-
 arch/powerpc/platforms/powernv/Kconfig | 2 +-
 arch/powerpc/platforms/pseries/Kconfig | 2 +-
 13 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/Makefile b/arch/powerpc/mm/book3s64/Makefile
index 1b56d3af47d4..319f4b7f3357 100644
--- a/arch/powerpc/mm/book3s64/Makefile
+++ b/arch/powerpc/mm/book3s64/Makefile
@@ -6,7 +6,7 @@ CFLAGS_REMOVE_slb.o = $(CC_FLAGS_FTRACE)
 
 obj-y  += hash_pgtable.o hash_utils.o slb.o \
   mmu_context.o pgtable.o hash_tlb.o
-obj-$(CONFIG_PPC_NATIVE)   += hash_native.o
+obj-$(CONFIG_PPC_HASH_MMU_NATIVE)  += hash_native.o
 obj-$(CONFIG_PPC_RADIX_MMU)+= radix_pgtable.o radix_tlb.o
 obj-$(CONFIG_PPC_4K_PAGES) += hash_4k.o
 obj-$(CONFIG_PPC_64K_PAGES)+= hash_64k.o
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index cfd45245d009..92680da5229a 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1091,7 +1091,7 @@ void __init hash__early_init_mmu(void)
ps3_early_mm_init();
else if (firmware_has_feature(FW_FEATURE_LPAR))
hpte_init_pseries();
-   else if (IS_ENABLED(CONFIG_PPC_NATIVE))
+   else if (IS_ENABLED(CONFIG_PPC_HASH_MMU_NATIVE))
hpte_init_native();
 
if (!mmu_hash_ops.hpte_insert)
diff --git a/arch/powerpc/platforms/52xx/Kconfig b/arch/powerpc/platforms/52xx/Kconfig
index 99d60acc20c8..b72ed2950ca8 100644
--- a/arch/powerpc/platforms/52xx/Kconfig
+++ b/arch/powerpc/platforms/52xx/Kconfig
@@ -34,7 +34,7 @@ config PPC_EFIKA
bool "bPlan Efika 5k2. MPC5200B based computer"
depends on PPC_MPC52xx
select PPC_RTAS
-   select PPC_NATIVE
+   select PPC_HASH_MMU_NATIVE
 
 config PPC_LITE5200
bool "Freescale Lite5200 Eval Board"
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index e02d29a9d12f..d41dad227de8 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -40,9 +40,9 @@ config EPAPR_PARAVIRT
 
  In case of doubt, say Y
 
-config PPC_NATIVE
+config PPC_HASH_MMU_NATIVE
bool
-   depends on PPC_BOOK3S_32 || PPC64
+   depends on PPC_BOOK3S
help
  Support for running natively on the hardware, i.e. without
  a hypervisor. This option is not user-selectable but should
diff --git a/arch/powerpc/platforms/cell/Kconfig b/arch/powerpc/platforms/cell/Kconfig
index cb70c5f25bc6..db4465c51b56 100644
--- a/arch/powerpc/platforms/cell/Kconfig
+++ b/arch/powerpc/platforms/cell/Kconfig
@@ -8,7 +8,7 @@ config PPC_CELL_COMMON
select PPC_DCR_MMIO
select PPC_INDIRECT_PIO
select PPC_INDIRECT_MMIO
-   select PPC_NATIVE
+   select PPC_HASH_MMU_NATIVE
select PPC_RTAS
select IRQ_EDGE_EOI_HANDLER
 
diff --git a/arch/powerpc/platforms/chrp/Kconfig b/arch/powerpc/platforms/chrp/Kconfig
index 9b5c5505718a..ff30ed579a39 100644
--- a/arch/powerpc/platforms/chrp/Kconfig
+++ b/arch/powerpc/platforms/chrp/Kconfig
@@ -11,6 +11,6 @@ config PPC_CHRP
select RTAS_ERROR_LOGGING
select PPC_MPC106
select PPC_UDBG_16550
-   select PPC_NATIVE
+   select PPC_HASH_MMU_NATIVE
select FORCE_PCI
default y
diff --git a/arch/powerpc/platforms/embedded6xx/Kconfig b/arch/powerpc/platforms/embedded6xx/Kconfig
index 4c6d703a4284..c54786f8461e 100644
--- a/arch/powerpc/platforms/embedded6xx/Kconfig
+++ b/arch/powerpc/platforms/embedded6xx/Kconfig
@@ -55,7 +55,7 @@ config MVME5100
select FORCE_PCI
select PPC_INDIRECT_PCI
select PPC_I8259
-   select PPC_NATIVE
+   select PPC_HASH_MMU_NATIVE
select PPC_UDBG_16550
help
  This option enables support for the Motorola (now Emerson) MVME5100
diff --git a/arch/powerpc/platforms/maple/Kconfig b/arch/powerpc/platforms/maple/Kconfig
index 86ae210bee9a..7fd84311ade5 100644
--- a/arch/powerpc/platforms/maple/Kconfig
+++ b/arch/powerpc/platforms/maple/Kconfig
@@ -9,7 +9,7 @@ config PPC_MAPLE
select GENERIC_TBSYNC
select PPC_UDBG_16550
select PPC_970_NAP
-   select PPC_NATIVE
+   select 

[PATCH v5 00/17] powerpc: Make hash MMU code build configurable

2021-11-28 Thread Nicholas Piggin
Now that there's a platform that can make good use of it, here's
a series that can prevent the hash MMU code being built for 64s
platforms that don't need it.

Since v4:
- Fix 32s allnoconfig compile found by kernel test robot.
- Fix 64s platforms like cell that require hash but allowed POWER9 CPU
  and !HASH to be selected (found by Christophe).

Since v3:
- Merged microwatt patches into 1.
- Fix some changelogs, titles, comments.
- Keep MMU_FTR_HPTE_TABLE in the features when booting radix if hash
  support is configured because it will be needed for KVM radix host
  hash guest (when we extend this option to KVM).
- Accounted for hopefully all review comments (thanks Christophe)

Since v2:
- Split MMU_FTR_HPTE_TABLE clearing for radix boot into its own patch.
- Remove memremap_compat_align from other sub archs entirely.
- Flip patch order of the 2 main patches to put Kconfig change first.
- Fixed Book3S/32 xmon segment dumping bug.
- Removed a few more ifdefs, changed numbers to use SZ_ definitions,
  etc.
- Fixed microwatt defconfig so it should actually disable hash MMU now.

Since v1:
- Split out most of the Kconfig change from the conditional compilation
  changes.
- Split out several more changes into preparatory patches.
- Reduced some ifdefs.
- Caught a few missing hash bits: pgtable dump, lkdtm,
  memremap_compat_align.

Since RFC:
- Split out large code movement from other changes.
- Used mmu ftr test constant folding rather than adding new constant
  true/false for radix_enabled().
- Restore tlbie trace point that had to be commented out in the
  previous.
- Avoid minor (probably unreachable) behaviour change in machine check
  handler when hash was not compiled.
- Fix microwatt updates so !HASH is not enforced.
- Rebase, build fixes.
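[Editorial note: the "mmu ftr test constant folding" item above refers to letting the compiler discard dead branches once a feature test reduces to a compile-time constant, so radix_enabled() call sites need no new true/false constant. A minimal stand-alone model of the idea follows; the names here are hypothetical, and the kernel's real mechanism is mmu_has_feature() masked with Kconfig-derived constant feature masks.]

```c
#include <assert.h>

/* Stand-in for a Kconfig-derived constant such as CONFIG_PPC_64S_HASH_MMU;
 * set to 0 to model a build with hash MMU support disabled. */
#define HASH_MMU_BUILT 0

static int boot_detected_radix = 1;  /* set from firmware at boot in reality */

/* When HASH_MMU_BUILT is 0, the condition below is a compile-time
 * constant, so radix_enabled() folds to 'return 1' and every
 * 'if (radix_enabled())' caller has its hash-only branch eliminated. */
static inline int radix_enabled(void)
{
    if (!HASH_MMU_BUILT)
        return 1;
    return boot_detected_radix;
}
```

With HASH_MMU_BUILT set to 1 instead, the same function degrades gracefully to the runtime test.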

Thanks,
Nick

Nicholas Piggin (17):
  powerpc: Remove unused FW_FEATURE_NATIVE references
  powerpc: Rename PPC_NATIVE to PPC_HASH_MMU_NATIVE
  powerpc/pseries: Stop selecting PPC_HASH_MMU_NATIVE
  powerpc/64s: Move and rename do_bad_slb_fault as it is not hash
specific
  powerpc/pseries: move process table registration away from
hash-specific code
  powerpc/pseries: lparcfg don't include slb_size line in radix mode
  powerpc/64s: move THP trace point creation out of hash specific file
  powerpc/64s: Make flush_and_reload_slb a no-op when radix is enabled
  powerpc/64s: move page size definitions from hash specific file
  powerpc/64s: Rename hash_hugetlbpage.c to hugetlbpage.c
  powerpc/64: pcpu setup avoid reading mmu_linear_psize on 64e or radix
  powerpc: make memremap_compat_align 64s-only
  powerpc/64e: remove mmu_linear_psize
  powerpc/64s: Fix radix MMU when MMU_FTR_HPTE_TABLE is clear
  powerpc/64s: Make hash MMU support configurable
  powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU
  powerpc/microwatt: add POWER9_CPU, clear PPC_64S_HASH_MMU

 arch/powerpc/Kconfig  |   5 +-
 arch/powerpc/configs/microwatt_defconfig  |   3 +-
 arch/powerpc/include/asm/book3s/64/mmu.h  |  21 +++-
 .../include/asm/book3s/64/tlbflush-hash.h |   6 +
 arch/powerpc/include/asm/book3s/64/tlbflush.h |   4 -
 arch/powerpc/include/asm/book3s/pgtable.h |   4 +
 arch/powerpc/include/asm/firmware.h   |   8 --
 arch/powerpc/include/asm/interrupt.h  |   2 +-
 arch/powerpc/include/asm/mmu.h|  16 ++-
 arch/powerpc/include/asm/mmu_context.h|   2 +
 arch/powerpc/include/asm/nohash/mmu-book3e.h  |   1 -
 arch/powerpc/include/asm/paca.h   |   8 ++
 arch/powerpc/kernel/asm-offsets.c |   2 +
 arch/powerpc/kernel/dt_cpu_ftrs.c |  14 ++-
 arch/powerpc/kernel/entry_64.S|   4 +-
 arch/powerpc/kernel/exceptions-64s.S  |  20 +++-
 arch/powerpc/kernel/mce.c |   2 +-
 arch/powerpc/kernel/mce_power.c   |  16 ++-
 arch/powerpc/kernel/paca.c|  18 ++-
 arch/powerpc/kernel/process.c |  13 +-
 arch/powerpc/kernel/prom.c|   2 +
 arch/powerpc/kernel/setup_64.c|  26 +++-
 arch/powerpc/kexec/core_64.c  |   4 +-
 arch/powerpc/kexec/ranges.c   |   4 +
 arch/powerpc/kvm/Kconfig  |   1 +
 arch/powerpc/mm/book3s64/Makefile |  19 +--
 arch/powerpc/mm/book3s64/hash_native.c| 104 
 arch/powerpc/mm/book3s64/hash_pgtable.c   |   1 -
 arch/powerpc/mm/book3s64/hash_utils.c | 111 +-
 .../{hash_hugetlbpage.c => hugetlbpage.c} |   2 +
 arch/powerpc/mm/book3s64/mmu_context.c|  32 -
 arch/powerpc/mm/book3s64/pgtable.c|  27 +
 arch/powerpc/mm/book3s64/radix_pgtable.c  |   4 +
 arch/powerpc/mm/book3s64/slb.c|  16 ---
 arch/powerpc/mm/book3s64/trace.c  |   8 ++
 arch/powerpc/mm/copro_fault.c |   2 +
 arch/powerpc/mm/fault.c   |  24 
 

[PATCH v5 01/17] powerpc: Remove unused FW_FEATURE_NATIVE references

2021-11-28 Thread Nicholas Piggin
FW_FEATURE_NATIVE_ALWAYS and FW_FEATURE_NATIVE_POSSIBLE are always
zero and never do anything. Remove them.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/firmware.h | 8 
 1 file changed, 8 deletions(-)

diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index 97a3bd9ffeb9..9b702d2b80fb 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -80,8 +80,6 @@ enum {
FW_FEATURE_POWERNV_ALWAYS = 0,
FW_FEATURE_PS3_POSSIBLE = FW_FEATURE_LPAR | FW_FEATURE_PS3_LV1,
FW_FEATURE_PS3_ALWAYS = FW_FEATURE_LPAR | FW_FEATURE_PS3_LV1,
-   FW_FEATURE_NATIVE_POSSIBLE = 0,
-   FW_FEATURE_NATIVE_ALWAYS = 0,
FW_FEATURE_POSSIBLE =
 #ifdef CONFIG_PPC_PSERIES
FW_FEATURE_PSERIES_POSSIBLE |
@@ -91,9 +89,6 @@ enum {
 #endif
 #ifdef CONFIG_PPC_PS3
FW_FEATURE_PS3_POSSIBLE |
-#endif
-#ifdef CONFIG_PPC_NATIVE
-   FW_FEATURE_NATIVE_ALWAYS |
 #endif
0,
FW_FEATURE_ALWAYS =
@@ -105,9 +100,6 @@ enum {
 #endif
 #ifdef CONFIG_PPC_PS3
FW_FEATURE_PS3_ALWAYS &
-#endif
-#ifdef CONFIG_PPC_NATIVE
-   FW_FEATURE_NATIVE_ALWAYS &
 #endif
FW_FEATURE_POSSIBLE,
 
-- 
2.23.0



Re: [PATCH RFC 0/4] mm: percpu: Cleanup percpu first chunk function

2021-11-28 Thread Dennis Zhou
On Mon, Nov 29, 2021 at 10:51:18AM +0800, Kefeng Wang wrote:
> Hi Dennis and all maintainers, any comments about the changes, many thanks.
> 
> On 2021/11/21 17:35, Kefeng Wang wrote:
> > When adding support for the page mapping percpu first chunk allocator
> > on arm64, we found a lot of duplicated code in the percpu embed/page
> > first chunk allocators. This patchset is aimed at cleaning them up and
> > should introduce no functional change; it has only been tested on arm64.
> > 
> > Kefeng Wang (4):
> >mm: percpu: Generalize percpu related config
> >mm: percpu: Add pcpu_fc_cpu_to_node_fn_t typedef
> >    mm: percpu: Add generic pcpu_fc_alloc/free function
> >mm: percpu: Add generic pcpu_populate_pte() function
> > 
> >   arch/arm64/Kconfig |  20 +
> >   arch/ia64/Kconfig  |   9 +--
> >   arch/mips/Kconfig  |  10 +--
> >   arch/mips/mm/init.c|  14 +---
> >   arch/powerpc/Kconfig   |  17 +---
> >   arch/powerpc/kernel/setup_64.c |  92 +
> >   arch/riscv/Kconfig |  10 +--
> >   arch/sparc/Kconfig |  12 +--
> >   arch/sparc/kernel/smp_64.c | 105 +---
> >   arch/x86/Kconfig   |  17 +---
> >   arch/x86/kernel/setup_percpu.c |  66 ++-
> >   drivers/base/arch_numa.c   |  68 +---
> >   include/linux/percpu.h |  13 +--
> >   mm/Kconfig |  12 +++
> >   mm/percpu.c| 143 +
> >   15 files changed, 165 insertions(+), 443 deletions(-)
> > 

Hi Kefeng,

I apologize for the delay. It's a holiday week in the US + I had some
personal things come up at the beginning of last week. I'll have it
reviewed by tomorrow.

Thanks,
Dennis


Re: [PATCH 0/9] lib/bitmap: optimize bitmap_weight() usage

2021-11-28 Thread Yury Norov
On Sun, Nov 28, 2021 at 09:08:41PM +1000, Nicholas Piggin wrote:
> Excerpts from Yury Norov's message of November 28, 2021 1:56 pm:
> > In many cases people use bitmap_weight()-based functions like this:
> > 
> > if (num_present_cpus() > 1)
> > do_something();
> > 
> > This may take considerable amount of time on many-cpus machines because
> > num_present_cpus() will traverse every word of underlying cpumask
> > unconditionally.
> > 
> > We can significantly improve on it for many real cases if we stop traversing
> > the mask as soon as the count of present cpus exceeds 1:
> > 
> > if (num_present_cpus_gt(1))
> > do_something();
> > 
> > To implement this idea, the series adds bitmap_weight_{eq,gt,le}
> > functions together with corresponding wrappers in cpumask and nodemask.
> 
> There would be no change to callers if you maintain counters like what
> is done for num_online_cpus() today. Maybe some fixes to arch code that
> does not use the set_cpu_possible() etc. APIs would be required, but
> AFAICS it would be better to fix such cases anyway.

Thanks, Nick. I'll try to do this.
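A minimal sketch of the counter approach Nick describes, as a userspace toy model rather than real kernel code — the one-word mask, the C11 atomics, and the function bodies are illustrative assumptions; only the names mirror the kernel API:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Counter maintained next to the mask, the way num_online_cpus()
 * is maintained in the kernel today. */
static atomic_uint nr_present_cpus;
static unsigned long cpu_present_bits;	/* one-word stand-in for cpu_present_mask */

static void set_cpu_present(unsigned int cpu, bool present)
{
	unsigned long bit = 1UL << cpu;

	if (present && !(cpu_present_bits & bit)) {
		cpu_present_bits |= bit;
		atomic_fetch_add(&nr_present_cpus, 1);
	} else if (!present && (cpu_present_bits & bit)) {
		cpu_present_bits &= ~bit;
		atomic_fetch_sub(&nr_present_cpus, 1);
	}
}

/* O(1): no traversal of the mask, so callers like
 * `if (num_present_cpus() > 1)` need no change at all. */
static unsigned int num_present_cpus(void)
{
	return atomic_load(&nr_present_cpus);
}
```

With such a counter in place, `num_present_cpus() > 1` is already constant-time and no `_gt()`/`_le()` variants would be needed.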


[Bug 205099] KASAN hit at raid6_pq: BUG: Unable to handle kernel data access at 0x00f0fd0d

2021-11-28 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=205099

--- Comment #42 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 299761
  --> https://bugzilla.kernel.org/attachment.cgi?id=299761&action=edit
dmesg (5.15.5, INLINE KASAN, LOWMEM_SIZE=0x2000, PowerMac G4 DP)

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 205099] KASAN hit at raid6_pq: BUG: Unable to handle kernel data access at 0x00f0fd0d

2021-11-28 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=205099

Erhard F. (erhar...@mailbox.org) changed:

   What|Removed |Added

 Attachment #299715|0   |1
is obsolete||

--- Comment #43 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 299763
  --> https://bugzilla.kernel.org/attachment.cgi?id=299763&action=edit
kernel .config (5.15.5, PowerMac G4 DP)


[Bug 205099] KASAN hit at raid6_pq: BUG: Unable to handle kernel data access at 0x00f0fd0d

2021-11-28 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=205099

Erhard F. (erhar...@mailbox.org) changed:

   What|Removed |Added

 Attachment #299711|0   |1
is obsolete||

--- Comment #41 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 299759
  --> https://bugzilla.kernel.org/attachment.cgi?id=299759&action=edit
dmesg (5.15.5, OUTLINE KASAN, LOWMEM_SIZE=0x2800, PowerMac G4 DP)

Ok, finally getting somewhere.. With CONFIG_LOWMEM_SIZE=0x2800 I don't get
this "vmap allocation for size 33562624 failed" error and also OUTLINE KASAN
does not show the originally reported "BUG: Unable to handle kernel data access
at 0x00f0fd0d" for raid6_pq!

For INLINE KASAN I still get "vmap allocation for size 33562624 failed", even
with CONFIG_LOWMEM_SIZE=0x2800 or CONFIG_LOWMEM_SIZE=0x2000.


Re: [PATCH v4 08/25] kernel: Add combined power-off+restart handler call chain API

2021-11-28 Thread Dmitry Osipenko
29.11.2021 00:17, Michał Mirosław wrote:
>> I'm having trouble parsing this comment. Could you please try to
>> rephrase it? I don't see how you could check whether a power-off handler
>> is available if you mix all handlers together.
> If notify_call_chain() would be fixed to return NOTIFY_OK if any call
> returned NOTIFY_OK, then this would be a clear way to gather the
> answer if any of the handlers will attempt the final action (reboot or
> power off).
> 

Could you please show a code snippet that implements your suggestion?
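One way to read the suggestion, as a hedged sketch rather than the kernel's actual notifier code (the constants and the flat function-pointer array are simplifications for illustration): walk every callback, stop early only on a STOP request, and report NOTIFY_OK if any callback claimed the event.

```c
#include <stddef.h>

#define NOTIFY_DONE      0x0000	/* handler doesn't care */
#define NOTIFY_OK        0x0001	/* handler claimed the event */
#define NOTIFY_STOP_MASK 0x8000	/* stop further traversal */

typedef int (*notifier_fn_t)(void *data);

/* sample handlers for the model */
static int h_done(void *data) { (void)data; return NOTIFY_DONE; }
static int h_ok(void *data)   { (void)data; return NOTIFY_OK; }

/* Return NOTIFY_OK if any callback returned NOTIFY_OK, so a caller can
 * learn whether somebody in the chain will attempt the final action. */
static int notify_call_chain_any_ok(notifier_fn_t *fns, size_t n, void *data)
{
	int ret = NOTIFY_DONE;
	size_t i;

	for (i = 0; i < n; i++) {
		int r = fns[i](data);

		if (r & NOTIFY_STOP_MASK)
			return r;
		if (r == NOTIFY_OK)
			ret = NOTIFY_OK;
	}
	return ret;
}
```

Under this model, kernel_can_power_off() could be answered by calling the chain with a PREPARE-style argument and checking for NOTIFY_OK.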


Re: [PATCH v4 18/25] x86: Use do_kernel_power_off()

2021-11-28 Thread Dmitry Osipenko
28.11.2021 04:15, Michał Mirosław wrote:
> On Fri, Nov 26, 2021 at 09:00:54PM +0300, Dmitry Osipenko wrote:
>> Kernel now supports chained power-off handlers. Use do_kernel_power_off()
>> that invokes chained power-off handlers. It also invokes legacy
>> pm_power_off() for now, which will be removed once all drivers are
>> converted to the new power-off API.
>>
>> Signed-off-by: Dmitry Osipenko 
>> ---
>>  arch/x86/kernel/reboot.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
>> index 0a40df66a40d..cd7d9416d81a 100644
>> --- a/arch/x86/kernel/reboot.c
>> +++ b/arch/x86/kernel/reboot.c
>> @@ -747,10 +747,10 @@ static void native_machine_halt(void)
>>  
>>  static void native_machine_power_off(void)
>>  {
>> -if (pm_power_off) {
>> +if (kernel_can_power_off()) {
>>  if (!reboot_force)
>>  machine_shutdown();
>> -pm_power_off();
>> +do_kernel_power_off();
>>  }
> 
> Judging from an old commit from 2006 [1], this can be rewritten as:
> 
> if (!reboot_force && kernel_can_power_off())
>   machine_shutdown();
> do_kernel_power_off();
> 
> And maybe later reworked so it doesn't need kernel_can_power_off().
> 
> [1] http://lkml.iu.edu/hypermail//linux/kernel/0511.3/0681.html

It could be rewritten like you're suggesting, but I'd prefer to keep the
old variant, for clarity.
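To illustrate why the two spellings behave the same, here is a toy model with stubbed actions and counters. It assumes, as the new API intends, that do_kernel_power_off() is a no-op when no handler is registered; the stubs are otherwise invented for the sketch.

```c
#include <stdbool.h>

/* Toy model: counters record the observable actions of each variant. */
static bool reboot_force, can_power_off;
static int shutdowns, poweroffs;

static bool kernel_can_power_off(void) { return can_power_off; }
static void machine_shutdown(void) { shutdowns++; }
/* assumed no-op when no power-off handler is registered */
static void do_kernel_power_off(void) { if (can_power_off) poweroffs++; }

/* variant kept in the patch */
static void power_off_original(void)
{
	if (kernel_can_power_off()) {
		if (!reboot_force)
			machine_shutdown();
		do_kernel_power_off();
	}
}

/* variant suggested in the review */
static void power_off_rewritten(void)
{
	if (!reboot_force && kernel_can_power_off())
		machine_shutdown();
	do_kernel_power_off();
}
```

Both variants produce identical shutdown/power-off sequences in every combination of reboot_force and handler availability, which is why the rewrite is a pure style choice.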


Re: [PATCH v4 05/25] reboot: Warn if restart handler has duplicated priority

2021-11-28 Thread Dmitry Osipenko
28.11.2021 03:28, Michał Mirosław wrote:
> On Fri, Nov 26, 2021 at 09:00:41PM +0300, Dmitry Osipenko wrote:
>> Add a sanity check which ensures that there are no two restart handlers
>> registered with the same priority. Normally it's a direct sign of a
>> problem if two handlers use the same priority.
> 
> The patch doesn't ensure the property that there are no duplicated-priority
> entries on the chain.

It's not the exact point of this patch.

> I'd rather see an atomic_notifier_chain_register_unique() that returns
> -EBUSY or something istead of adding an entry with duplicate priority.
> That way it would need only one list traversal unless you want to
> register the duplicate anyway (then you would call the older
> atomic_notifier_chain_register() after reporting the error).

The point of this patch is to warn developers about the problem that
needs to be fixed. We already have such troubling drivers in mainline.

It's not critical to register different handlers with duplicated
priorities, but such cases really need to be corrected. We shouldn't
break users' machines during the transition to the new API; meanwhile,
developers should take action to fix their drivers.

> (Or you could return > 0 when a duplicate is registered in
> atomic_notifier_chain_register() if the callers are prepared
> for that. I don't really like this way, though.)

I had a similar thought at some point before and decided that I'm not in
favor of this approach. It's nicer to have a dedicated function that
verifies the uniqueness, IMO.
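A possible shape for the register_unique() idea discussed above, sketched on a simplified singly-linked list — the real atomic_notifier_head, its spinlock/RCU handling, and the actual struct layout are deliberately omitted, so this is an illustration only:

```c
#include <errno.h>
#include <stddef.h>

struct notifier_block {
	int priority;
	struct notifier_block *next;
};

/* Insert in descending priority order; refuse duplicates with -EBUSY
 * so the caller learns about the conflict in a single traversal. */
static int notifier_chain_register_unique(struct notifier_block **head,
					  struct notifier_block *nb)
{
	while (*head) {
		if ((*head)->priority == nb->priority)
			return -EBUSY;
		if ((*head)->priority < nb->priority)
			break;
		head = &(*head)->next;
	}
	nb->next = *head;
	*head = nb;
	return 0;
}
```

A caller that insists on registering anyway could fall back to the ordinary register function after logging the -EBUSY, matching the "warn but keep working" behaviour the patch aims for.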


Re: [PATCH v4 22/25] memory: emif: Use kernel_can_power_off()

2021-11-28 Thread Dmitry Osipenko
28.11.2021 04:23, Michał Mirosław wrote:
> On Fri, Nov 26, 2021 at 09:00:58PM +0300, Dmitry Osipenko wrote:
>> Replace the legacy pm_power_off with the kernel_can_power_off() helper
>> that is aware of chained power-off handlers.
>>
>> Signed-off-by: Dmitry Osipenko 
>> ---
>>  drivers/memory/emif.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/memory/emif.c b/drivers/memory/emif.c
>> index 762d0c0f0716..cab10d5274a0 100644
>> --- a/drivers/memory/emif.c
>> +++ b/drivers/memory/emif.c
>> @@ -630,7 +630,7 @@ static irqreturn_t emif_threaded_isr(int irq, void 
>> *dev_id)
>>  dev_emerg(emif->dev, "SDRAM temperature exceeds operating 
>> limit.. Needs shut down!!!\n");
>>  
>>  /* If we have Power OFF ability, use it, else try restarting */
>> -if (pm_power_off) {
>> +if (kernel_can_power_off()) {
>>  kernel_power_off();
>>  } else {
>>  WARN(1, "FIXME: NO pm_power_off!!! trying restart\n");
> 
> BTW, this part of the code seems to be better moved to generic code that
> could replace POWER_OFF request with REBOOT like it is done for reboot()
> syscall.

Not sure that it can be done. Somebody will have to verify that it won't
break all those platform power-off handlers. Better to keep this code
as-is in the context of this patchset.


Re: [PATCH v4 08/25] kernel: Add combined power-off+restart handler call chain API

2021-11-28 Thread Dmitry Osipenko
28.11.2021 03:43, Michał Mirosław wrote:
> On Fri, Nov 26, 2021 at 09:00:44PM +0300, Dmitry Osipenko wrote:
>> SoC platforms often have multiple ways to perform the system's
>> power-off and restart operations. Meanwhile today's kernel is limited to
>> a single option. Add combined power-off+restart handler call chain API,
>> which is inspired by the restart API. The new API provides both power-off
>> and restart functionality.
>>
>> The old pm_power_off method will be kept around till all users are
>> converted to the new API.
>>
>> The current restart API will be replaced by the new unified API since the
>> new API is its superset. The restart functionality of the sys-off handler
>> API is built upon the existing restart-notifier APIs.
>>
>> In order to ease conversion to the new API, convenient helpers are added
>> for the common use-cases. They will reduce the amount of boilerplate code and
>> remove global variables. These helpers preserve the old behaviour for cases
>> where only one power-off handler is expected; this is what all existing
>> drivers want, and thus they can be easily converted to the new API.
>> Users of the new API should explicitly enable power-off chaining by
>> setting corresponding flag of the power_handler structure.
> [...]
> 
> Hi,
> 
> A general question: do we really need three distinct chains for this?

Hello Michał,

At minimum this makes code easier to follow.

> Can't there be only one chain of callbacks that gets a stage
> (RESTART_PREPARE, RESTART, POWER_OFF_PREPARE, POWER_OFF) and can ignore
> them at will? Calling through POWER_OFF_PREPARE would also return
> whether that POWER_OFF is possible (for kernel_can_power_off()).

I'm having trouble parsing this comment. Could you please try to
rephrase it? I don't see how you could check whether a power-off handler
is available if you mix all handlers together.

> I would also split this patch into preparation cleanups (like wrapping
> pm_power_off call with a function) and adding the notifier-based
> implementation.

What would the benefit of this split-up be? Are you suggesting that it
will ease reviewing of this patch, or something else?


Re: [PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}

2021-11-28 Thread Yury Norov
On Sun, Nov 28, 2021 at 12:54:00PM -0500, Dennis Zhou wrote:
> Hello,
> 
> On Sun, Nov 28, 2021 at 09:43:20AM -0800, Yury Norov wrote:
> > On Sun, Nov 28, 2021 at 09:07:52AM -0800, Joe Perches wrote:
> > > On Sat, 2021-11-27 at 19:57 -0800, Yury Norov wrote:
> > > > Add num_{possible,present,active}_cpus_{eq,gt,le} and replace 
> > > > num_*_cpus()
> > > > with one of new functions where appropriate. This allows num_*_cpus_*()
> > > > to return earlier depending on the condition.
> > > []
> > > > diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
> > > []
> > > > @@ -103,7 +103,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> > > >  * if platform didn't set the present map already, do it now
> > > >  * boot cpu is set to present already by init/main.c
> > > >  */
> > > > -   if (num_present_cpus() <= 1)
> > > > +   if (num_present_cpus_le(2))
> > > > init_cpu_present(cpu_possible_mask);
> > > 
> > > ?  is this supposed to be 2 or 1
> > 
> > X <= 1 is the equivalent of X < 2.
> > 
> > > > diff --git a/drivers/cpufreq/pcc-cpufreq.c 
> > > > b/drivers/cpufreq/pcc-cpufreq.c
> > > []
> > > > @@ -593,7 +593,7 @@ static int __init pcc_cpufreq_init(void)
> > > > return ret;
> > > > }
> > > >  
> > > > -   if (num_present_cpus() > 4) {
> > > > +   if (num_present_cpus_gt(4)) {
> > > > pcc_cpufreq_driver.flags |= 
> > > > CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING;
> > > > pr_err("%s: Too many CPUs, dynamic performance scaling 
> > > > disabled\n",
> > > >__func__);
> > > 
> > > It looks as if the present variants should be using the same values
> > > so the _le test above with 1 changed to 2 looks odd.
> >  
> 
> I think the confusion comes from le meaning less than rather than lt.
> Given the general convention of: lt (<), le (<=), eq (=), ge (>=),
> gt (>), I'd consider renaming your le to lt.

Ok, makes sense. I'll rename in v2 and add <= and >= versions.
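The naming convention being settled on here, pinned down as a tiny sketch over a precomputed weight (the helpers are hypothetical; the real series would build them on early-exit primitives instead):

```c
#include <stdbool.h>

/* lt (<), le (<=), eq (==), ge (>=), gt (>) over a bit count */
static bool weight_lt(unsigned int w, unsigned int n) { return w <  n; }
static bool weight_le(unsigned int w, unsigned int n) { return w <= n; }
static bool weight_eq(unsigned int w, unsigned int n) { return w == n; }
static bool weight_ge(unsigned int w, unsigned int n) { return w >= n; }
static bool weight_gt(unsigned int w, unsigned int n) { return w >  n; }
```

Under these names, a `num_present_cpus() <= 1` check corresponds to `le(1)` or, equivalently, `lt(2)` — which is why `num_present_cpus_le(2)` in the posted series read oddly.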


Re: [PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}

2021-11-28 Thread Joe Perches
On Sun, 2021-11-28 at 09:43 -0800, Yury Norov wrote:
> On Sun, Nov 28, 2021 at 09:07:52AM -0800, Joe Perches wrote:
> > On Sat, 2021-11-27 at 19:57 -0800, Yury Norov wrote:
> > > Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus()
> > > with one of new functions where appropriate. This allows num_*_cpus_*()
> > > to return earlier depending on the condition.
> > []
> > > diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
> > []
> > > @@ -103,7 +103,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> > >* if platform didn't set the present map already, do it now
> > >* boot cpu is set to present already by init/main.c
> > >*/
> > > - if (num_present_cpus() <= 1)
> > > + if (num_present_cpus_le(2))
> > >   init_cpu_present(cpu_possible_mask);
> > 
> > ?  is this supposed to be 2 or 1
> 
> X <= 1 is the equivalent of X < 2.

True. The call though is _le not _lt




Re: [PATCH 7/9] lib/cpumask: add num_{possible, present, active}_cpus_{eq, gt, le}

2021-11-28 Thread Emil Renner Berthing
On Sun, 28 Nov 2021 at 18:43, Yury Norov  wrote:
> On Sun, Nov 28, 2021 at 09:07:52AM -0800, Joe Perches wrote:
> > On Sat, 2021-11-27 at 19:57 -0800, Yury Norov wrote:
> > > Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus()
> > > with one of new functions where appropriate. This allows num_*_cpus_*()
> > > to return earlier depending on the condition.
> > []
> > > diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
> > []
> > > @@ -103,7 +103,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> > >  * if platform didn't set the present map already, do it now
> > >  * boot cpu is set to present already by init/main.c
> > >  */
> > > -   if (num_present_cpus() <= 1)
> > > +   if (num_present_cpus_le(2))
> > > init_cpu_present(cpu_possible_mask);
> >
> > ?  is this supposed to be 2 or 1
>
> X <= 1 is the equivalent of X < 2.

Ah, then the function is confusing. Usually it's lt = less than and le
= less than or equal. Same idea for gt vs ge.


Re: [PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}

2021-11-28 Thread Dennis Zhou
Hello,

On Sun, Nov 28, 2021 at 09:43:20AM -0800, Yury Norov wrote:
> On Sun, Nov 28, 2021 at 09:07:52AM -0800, Joe Perches wrote:
> > On Sat, 2021-11-27 at 19:57 -0800, Yury Norov wrote:
> > > Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus()
> > > with one of new functions where appropriate. This allows num_*_cpus_*()
> > > to return earlier depending on the condition.
> > []
> > > diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
> > []
> > > @@ -103,7 +103,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> > >* if platform didn't set the present map already, do it now
> > >* boot cpu is set to present already by init/main.c
> > >*/
> > > - if (num_present_cpus() <= 1)
> > > + if (num_present_cpus_le(2))
> > >   init_cpu_present(cpu_possible_mask);
> > 
> > ?  is this supposed to be 2 or 1
> 
> X <= 1 is the equivalent of X < 2.
> 
> > > diff --git a/drivers/cpufreq/pcc-cpufreq.c b/drivers/cpufreq/pcc-cpufreq.c
> > []
> > > @@ -593,7 +593,7 @@ static int __init pcc_cpufreq_init(void)
> > >   return ret;
> > >   }
> > >  
> > > - if (num_present_cpus() > 4) {
> > > + if (num_present_cpus_gt(4)) {
> > >   pcc_cpufreq_driver.flags |= CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING;
> > >   pr_err("%s: Too many CPUs, dynamic performance scaling 
> > > disabled\n",
> > >  __func__);
> > 
> > It looks as if the present variants should be using the same values
> > so the _le test above with 1 changed to 2 looks odd.
>  

I think the confusion comes from le meaning less than rather than lt.
Given the general convention of: lt (<), le (<=), eq (=), ge (>=),
gt (>), I'd consider renaming your le to lt.

Thanks,
Dennis


Re: [PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}

2021-11-28 Thread Yury Norov
On Sun, Nov 28, 2021 at 09:07:52AM -0800, Joe Perches wrote:
> On Sat, 2021-11-27 at 19:57 -0800, Yury Norov wrote:
> > Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus()
> > with one of new functions where appropriate. This allows num_*_cpus_*()
> > to return earlier depending on the condition.
> []
> > diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
> []
> > @@ -103,7 +103,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> >  * if platform didn't set the present map already, do it now
> >  * boot cpu is set to present already by init/main.c
> >  */
> > -   if (num_present_cpus() <= 1)
> > +   if (num_present_cpus_le(2))
> > init_cpu_present(cpu_possible_mask);
> 
> ?  is this supposed to be 2 or 1

X <= 1 is the equivalent of X < 2.

> > diff --git a/drivers/cpufreq/pcc-cpufreq.c b/drivers/cpufreq/pcc-cpufreq.c
> []
> > @@ -593,7 +593,7 @@ static int __init pcc_cpufreq_init(void)
> > return ret;
> > }
> >  
> > -   if (num_present_cpus() > 4) {
> > +   if (num_present_cpus_gt(4)) {
> > pcc_cpufreq_driver.flags |= CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING;
> > pr_err("%s: Too many CPUs, dynamic performance scaling 
> > disabled\n",
> >__func__);
> 
> It looks as if the present variants should be using the same values
> so the _le test above with 1 changed to 2 looks odd.
 


Re: [PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}

2021-11-28 Thread Joe Perches
On Sat, 2021-11-27 at 19:57 -0800, Yury Norov wrote:
> Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus()
> with one of new functions where appropriate. This allows num_*_cpus_*()
> to return earlier depending on the condition.
[]
> diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
[]
> @@ -103,7 +103,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>* if platform didn't set the present map already, do it now
>* boot cpu is set to present already by init/main.c
>*/
> - if (num_present_cpus() <= 1)
> + if (num_present_cpus_le(2))
>   init_cpu_present(cpu_possible_mask);

?  is this supposed to be 2 or 1

> diff --git a/drivers/cpufreq/pcc-cpufreq.c b/drivers/cpufreq/pcc-cpufreq.c
[]
> @@ -593,7 +593,7 @@ static int __init pcc_cpufreq_init(void)
>   return ret;
>   }
>  
> - if (num_present_cpus() > 4) {
> + if (num_present_cpus_gt(4)) {
>   pcc_cpufreq_driver.flags |= CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING;
>   pr_err("%s: Too many CPUs, dynamic performance scaling 
> disabled\n",
>  __func__);

It looks as if the present variants should be using the same values
so the _le test above with 1 changed to 2 looks odd.




Re: [PATCH 0/9] lib/bitmap: optimize bitmap_weight() usage

2021-11-28 Thread Nicholas Piggin
Excerpts from Yury Norov's message of November 28, 2021 1:56 pm:
> In many cases people use bitmap_weight()-based functions like this:
> 
>   if (num_present_cpus() > 1)
>   do_something();
> 
> This may take considerable amount of time on many-cpus machines because
> num_present_cpus() will traverse every word of underlying cpumask
> unconditionally.
> 
> We can significantly improve on it for many real cases if we stop traversing
> the mask as soon as the count of present cpus exceeds 1:
> 
>   if (num_present_cpus_gt(1))
>   do_something();
> 
> To implement this idea, the series adds bitmap_weight_{eq,gt,le}
> functions together with corresponding wrappers in cpumask and nodemask.

There would be no change to callers if you maintain counters like what
is done for num_online_cpus() today. Maybe some fixes to arch code that
does not use the set_cpu_possible() etc. APIs would be required, but
AFAICS it would be better to fix such cases anyway.

Thanks,
Nick
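The early-exit idea from the quoted cover letter can be sketched in userspace like this; hweight_long() is a portable stand-in for the kernel helper of the same name, and only the function's name mirrors the proposed API — it is not the series' actual implementation.

```c
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* portable stand-in for the kernel's hweight_long() */
static unsigned int hweight_long(unsigned long w)
{
	unsigned int n = 0;

	while (w) {
		w &= w - 1;	/* clear lowest set bit */
		n++;
	}
	return n;
}

/* True if the bitmap has more than 'num' set bits; stops traversing
 * as soon as the running count exceeds 'num'. */
static bool bitmap_weight_gt(const unsigned long *bits, unsigned int nbits,
			     unsigned int num)
{
	unsigned int k, w = 0, lim = nbits / BITS_PER_LONG;

	for (k = 0; k < lim; k++) {
		w += hweight_long(bits[k]);
		if (w > num)
			return true;	/* early exit: skip the rest of the mask */
	}
	if (nbits % BITS_PER_LONG)
		w += hweight_long(bits[lim] &
				  ((1UL << (nbits % BITS_PER_LONG)) - 1));
	return w > num;
}
```

For small thresholds like `gt(1)`, the scan usually terminates within the first word on a many-CPU machine, which is the whole point of the series.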


Re: [PATCH 3/9] all: replace bitmap_weigth() with bitmap_{empty,full,eq,gt,le}

2021-11-28 Thread Greg Kroah-Hartman
On Sat, Nov 27, 2021 at 07:56:58PM -0800, Yury Norov wrote:
> bitmap_weight() counts all set bits in the bitmap unconditionally.
> However in some cases we can traverse just a part of the bitmap when we
> only need to check whether the number of set bits is greater than, less
> than, or equal to some number.
> 
> This patch replaces bitmap_weight() with one of
> bitmap_{empty,full,eq,gt,le}, as appropriate.
> 
> In some places driver code has been optimized further, where it's
> trivial.
> 
> Signed-off-by: Yury Norov 
> ---
>  arch/nds32/kernel/perf_event_cpu.c |  4 +---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |  4 ++--
>  arch/x86/kvm/hyperv.c  |  8 
>  drivers/crypto/ccp/ccp-dev-v5.c|  5 +
>  drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c   |  2 +-
>  drivers/iio/adc/mxs-lradc-adc.c|  3 +--
>  drivers/iio/dummy/iio_simple_dummy_buffer.c|  4 ++--
>  drivers/iio/industrialio-buffer.c  |  2 +-
>  drivers/iio/industrialio-trigger.c |  2 +-
>  drivers/memstick/core/ms_block.c   |  4 ++--
>  drivers/net/dsa/b53/b53_common.c   |  2 +-
>  drivers/net/ethernet/broadcom/bcmsysport.c |  6 +-
>  drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c   |  4 ++--
>  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |  2 +-
>  .../ethernet/marvell/octeontx2/nic/otx2_ethtool.c  |  2 +-
>  .../ethernet/marvell/octeontx2/nic/otx2_flows.c|  8 
>  .../net/ethernet/marvell/octeontx2/nic/otx2_pf.c   |  2 +-
>  drivers/net/ethernet/mellanox/mlx4/cmd.c   | 10 +++---
>  drivers/net/ethernet/mellanox/mlx4/eq.c|  4 ++--
>  drivers/net/ethernet/mellanox/mlx4/main.c  |  2 +-
>  .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  2 +-
>  drivers/net/ethernet/qlogic/qed/qed_dev.c  |  3 +--
>  drivers/net/ethernet/qlogic/qed/qed_rdma.c |  4 ++--
>  drivers/net/ethernet/qlogic/qed/qed_roce.c |  2 +-
>  drivers/perf/arm-cci.c |  2 +-
>  drivers/perf/arm_pmu.c |  4 ++--
>  drivers/perf/hisilicon/hisi_uncore_pmu.c   |  2 +-
>  drivers/perf/thunderx2_pmu.c   |  3 +--
>  drivers/perf/xgene_pmu.c   |  2 +-
>  drivers/pwm/pwm-pca9685.c  |  2 +-
>  drivers/staging/media/tegra-video/vi.c |  2 +-
>  drivers/thermal/intel/intel_powerclamp.c   | 10 --
>  fs/ocfs2/cluster/heartbeat.c   | 14 +++---
>  33 files changed, 57 insertions(+), 75 deletions(-)

After you get the new functions added to the kernel tree, this patch
should be broken up into one-patch-per-subsystem and submitted through
the various subsystem trees.

thanks,

greg k-h


Re: [PATCH 1/2] tools/perf: Include global and local variants for p_stage_cyc sort key

2021-11-28 Thread Jiri Olsa
On Thu, Nov 25, 2021 at 08:18:50AM +0530, Athira Rajeev wrote:
> Sort key p_stage_cyc is used to present the latency
> cycles spent in pipeline stages. The perf tool has a local
> p_stage_cyc sort key to display this info, but no
> global variant is available for this sort key. The local variant
> shows the latency of a single sample, whereas a global value
> is useful to present the total latency (sum of
> latencies) in the hist entry. It represents the latency
> number multiplied by the number of samples.
> 
> Add global (p_stage_cyc) and local variant
> (local_p_stage_cyc) for this sort key. Use the
> local_p_stage_cyc as default option for "mem" sort mode.
> Also add this to list of dynamic sort keys.
> 
> Signed-off-by: Athira Rajeev 
> Reported-by: Namhyung Kim 

I can't apply this to Arnaldo's perf/core, could you please rebase?

patching file util/hist.c
patching file util/hist.h
patching file util/sort.c
Hunk #3 FAILED at 1392.
Hunk #4 succeeded at 1878 (offset 20 lines).
1 out of 4 hunks FAILED -- saving rejects to file util/sort.c.rej
patching file util/sort.h

thanks,
jirka

> ---
>  tools/perf/util/hist.c |  4 +++-
>  tools/perf/util/hist.h |  3 ++-
>  tools/perf/util/sort.c | 34 +-
>  tools/perf/util/sort.h |  3 ++-
>  4 files changed, 32 insertions(+), 12 deletions(-)
> 
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> index b776465e04ef..0a8033b09e28 100644
> --- a/tools/perf/util/hist.c
> +++ b/tools/perf/util/hist.c
> @@ -211,7 +211,9 @@ void hists__calc_col_len(struct hists *hists, struct 
> hist_entry *h)
>   hists__new_col_len(hists, HISTC_MEM_BLOCKED, 10);
>   hists__new_col_len(hists, HISTC_LOCAL_INS_LAT, 13);
>   hists__new_col_len(hists, HISTC_GLOBAL_INS_LAT, 13);
> - hists__new_col_len(hists, HISTC_P_STAGE_CYC, 13);
> + hists__new_col_len(hists, HISTC_LOCAL_P_STAGE_CYC, 13);
> + hists__new_col_len(hists, HISTC_GLOBAL_P_STAGE_CYC, 13);
> +
>   if (symbol_conf.nanosecs)
>   hists__new_col_len(hists, HISTC_TIME, 16);
>   else
> diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
> index 5343b62476e6..2752ce681108 100644
> --- a/tools/perf/util/hist.h
> +++ b/tools/perf/util/hist.h
> @@ -75,7 +75,8 @@ enum hist_column {
>   HISTC_MEM_BLOCKED,
>   HISTC_LOCAL_INS_LAT,
>   HISTC_GLOBAL_INS_LAT,
> - HISTC_P_STAGE_CYC,
> + HISTC_LOCAL_P_STAGE_CYC,
> + HISTC_GLOBAL_P_STAGE_CYC,
>   HISTC_NR_COLS, /* Last entry */
>  };
>  
> diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> index e9216a292a04..e978f7883e07 100644
> --- a/tools/perf/util/sort.c
> +++ b/tools/perf/util/sort.c
> @@ -37,7 +37,7 @@ const char  default_parent_pattern[] = 
> "^sys_|^do_page_fault";
>  const char   *parent_pattern = default_parent_pattern;
>  const char   *default_sort_order = "comm,dso,symbol";
>  const char   default_branch_sort_order[] = 
> "comm,dso_from,symbol_from,symbol_to,cycles";
> -const char   default_mem_sort_order[] = 
> "local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,p_stage_cyc";
> +const char   default_mem_sort_order[] = 
> "local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,local_p_stage_cyc";
>  const char   default_top_sort_order[] = "dso,symbol";
>  const char   default_diff_sort_order[] = "dso,symbol";
>  const char   default_tracepoint_sort_order[] = "trace";
> @@ -46,8 +46,8 @@ const char  *field_order;
>  regex_t  ignore_callees_regex;
>  int  have_ignore_callees = 0;
>  enum sort_mode   sort__mode = SORT_MODE__NORMAL;
> -const char   *dynamic_headers[] = {"local_ins_lat", "p_stage_cyc"};
> -const char   *arch_specific_sort_keys[] = {"p_stage_cyc"};
> +const char   *dynamic_headers[] = {"local_ins_lat", "ins_lat", 
> "local_p_stage_cyc", "p_stage_cyc"};
> +const char   *arch_specific_sort_keys[] = {"local_p_stage_cyc", 
> "p_stage_cyc"};
>  
>  /*
>   * Replaces all occurrences of a char used with the:
> @@ -1392,22 +1392,37 @@ struct sort_entry sort_global_ins_lat = {
>  };
>  
>  static int64_t
> -sort__global_p_stage_cyc_cmp(struct hist_entry *left, struct hist_entry 
> *right)
> +sort__p_stage_cyc_cmp(struct hist_entry *left, struct hist_entry *right)
>  {
>   return left->p_stage_cyc - right->p_stage_cyc;
>  }
>  
> +static int hist_entry__global_p_stage_cyc_snprintf(struct hist_entry *he, 
> char *bf,
> + size_t size, unsigned int width)
> +{
> + return repsep_snprintf(bf, size, "%-*u", width,
> + he->p_stage_cyc * he->stat.nr_events);
> +}
> +
> +
>  static int hist_entry__p_stage_cyc_snprintf(struct hist_entry *he, char *bf,
>   size_t size, unsigned int width)
>  {
>   return repsep_snprintf(bf, size, "%-*u", width, he->p_stage_cyc);
>  }
>  
> -struct sort_entry sort_p_stage_cyc = {
> - .se_header  = "Pipeline Stage Cycle",
> - .se_cmp  
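The local/global relationship the commit message describes can be reduced to a toy model (the struct below is a simplified stand-in for perf's hist_entry, not its real layout):

```c
struct hist_entry {
	unsigned long long p_stage_cyc;	/* per-sample pipeline-stage latency */
	unsigned long long nr_events;	/* number of aggregated samples */
};

/* local variant: latency observed in a single sample */
static unsigned long long local_p_stage_cyc(const struct hist_entry *he)
{
	return he->p_stage_cyc;
}

/* global variant: total latency across all samples in the entry */
static unsigned long long global_p_stage_cyc(const struct hist_entry *he)
{
	return he->p_stage_cyc * he->nr_events;
}
```

This is why the patch multiplies by he->stat.nr_events only in the global snprintf helper while the comparison and local display use the raw value.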

[Bug 213837] "Kernel panic - not syncing: corrupted stack end detected inside scheduler" at building via distcc on a G5

2021-11-28 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213837

--- Comment #13 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 299755
  --> https://bugzilla.kernel.org/attachment.cgi?id=299755&action=edit
dmesg (5.16-rc2 + patch, PowerMac G5 11,2)

Still happens with 5.16-rc2, but I am getting a slightly different error
message this time. Also this crash happened earlier, not when building distcc
but when unpacking the to-be-built tar.gz archive with tar + pigz:

[...]
stack: c5b0e600:  0003 c000 00105b2c  ..[,
stack: c5b0e610: c000 05b0e6a0 0031faa1 bd74990f  .1...t..
stack: c5b0e620: c000 00104f50  0006  ..OP
stack: c5b0e630: c000 05b0e6a0 0031faa1 bd74990f  .1...t..
kernel tried to execute exec-protected page (c5b0bbe0) - exploit
attempt? (uid: 0)
stack: c5b0e640: c000 0001 c000 022f2a08  ./*.

The last 2 lines were not in the netconsole.log but were only visible on the
screen of the frozen G5, so I added them manually.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.
You are watching someone on the CC list of the bug.