[PATCH v8 3/3] Documentation/powerpc: update fadump implementation details

2024-02-16 Thread Sourabh Jain
The patch titled ("powerpc: make fadump resilient with memory add/remove
events") made significant changes to the fadump implementation,
particularly around elfcorehdr creation and the fadump crash info header
structure. Therefore, update the fadump implementation documentation
to reflect those changes.

The following updates are made to the firmware-assisted dump documentation:

1. The elfcorehdr is no longer stored after fadump HDR in the reserved
   dump area. Instead, the second kernel dynamically allocates memory
   for the elfcorehdr within the address range from 0 to the boot memory
   size. Therefore, update figures 1 and 2 of Memory Reservation during
   the first and second kernels to reflect this change.

2. A version field has been added to the fadump header to manage future
   changes to the fadump crash info header structure without having to
   change the fadump header magic number. Therefore, remove the
   corresponding TODO from the document.

Signed-off-by: Sourabh Jain 
Cc: Aditya Gupta 
Cc: Aneesh Kumar K.V 
Cc: Hari Bathini 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Naveen N Rao 
---
 .../arch/powerpc/firmware-assisted-dump.rst   | 91 +--
 1 file changed, 42 insertions(+), 49 deletions(-)

diff --git a/Documentation/arch/powerpc/firmware-assisted-dump.rst b/Documentation/arch/powerpc/firmware-assisted-dump.rst
index e363fc48529a..7e37aadd1f77 100644
--- a/Documentation/arch/powerpc/firmware-assisted-dump.rst
+++ b/Documentation/arch/powerpc/firmware-assisted-dump.rst
@@ -134,12 +134,12 @@ that are run. If there is dump data, then the
 memory is held.
 
 If there is no waiting dump data, then only the memory required to
-hold CPU state, HPTE region, boot memory dump, FADump header and
-elfcore header, is usually reserved at an offset greater than boot
-memory size (see Fig. 1). This area is *not* released: this region
-will be kept permanently reserved, so that it can act as a receptacle
-for a copy of the boot memory content in addition to CPU state and
-HPTE region, in the case a crash does occur.
+hold CPU state, HPTE region, boot memory dump, and FADump header is
+usually reserved at an offset greater than boot memory size (see Fig. 1).
+This area is *not* released: this region will be kept permanently
+reserved, so that it can act as a receptacle for a copy of the boot
+memory content in addition to CPU state and HPTE region, in the case
+a crash does occur.
 
 Since this reserved memory area is used only after the system crash,
 there is no point in blocking this significant chunk of memory from
@@ -153,22 +153,22 @@ that were present in CMA region::
 
   o Memory Reservation during first kernel
 
-  Low memory Top of memory
-  0boot memory size   |<--- Reserved dump area --->|   |
-  |   |   |Permanent Reservation   |   |
-  V   V   ||   V
-  +---+-/ /---+---++---+-+-++--+
-  |   |   |///||  DUMP | HDR | ELF ||  |
-  +---+-/ /---+---++---+-+-++--+
-|   ^^ ^  ^   ^
-|   || |  |   |
-\  CPU  HPTE   /  |   |
- --   |   |
-  Boot memory content gets transferred|   |
-  to reserved area by firmware at the |   |
-  time of crash.  |   |
-  FADump Header   |
-   (meta area)|
+  Low memory  Top of memory
+  0boot memory size   |<-- Reserved dump area ->| |
+  |   |   |  Permanent Reservation  | |
+  V   V   | | V
+  +---+-/ /---+---++---+---++-+
+  |   |   |///||DUMP   |  HDR  || |
+  +---+-/ /---+---++---+---++-+
+|   ^^   ^ ^  ^
+|   ||   | |  |
+\  CPU  HPTE / |  |
+   |  |
+  Boot memory content gets transferred |  |
+  to reserved area by firmware at the  |  |
+  time of crash.   |  |
+   FADump Header  |
+(meta area)   |
   |
   |
   Metadata: This area holds a metadata structure whose
@@ -186,13 +186,20 @@ that were present in 

[PATCH v8 2/3] powerpc/fadump: add hotplug_ready sysfs interface

2024-02-16 Thread Sourabh Jain
The elfcorehdr describes the CPUs and memory of the crashed kernel to
the kernel that captures the dump, known as the second or fadump kernel.
The elfcorehdr needs to be updated if the system's memory changes due to
memory hotplug or online/offline events.

Currently, memory hotplug events are monitored in userspace by udev
rules, and fadump is re-registered, which recreates the elfcorehdr with
the latest available memory in the system.

However, the previous patch ("powerpc: make fadump resilient with memory
add/remove events") moved the creation of elfcorehdr to the second or
fadump kernel. This eliminates the need to regenerate the elfcorehdr
during memory hotplug or online/offline events.

Create a sysfs entry at /sys/kernel/fadump/hotplug_ready to let
userspace know that fadump re-registration is not required for memory
add/remove events.

Signed-off-by: Sourabh Jain 
Cc: Aditya Gupta 
Cc: Aneesh Kumar K.V 
Cc: Hari Bathini 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Naveen N Rao 
---
 Documentation/ABI/testing/sysfs-kernel-fadump | 11 +++
 arch/powerpc/kernel/fadump.c  | 14 ++
 2 files changed, 25 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump b/Documentation/ABI/testing/sysfs-kernel-fadump
index 8f7a64a81783..e2786a3db8dd 100644
--- a/Documentation/ABI/testing/sysfs-kernel-fadump
+++ b/Documentation/ABI/testing/sysfs-kernel-fadump
@@ -38,3 +38,14 @@ Contact: linuxppc-dev@lists.ozlabs.org
 Description:   read only
Provide information about the amount of memory reserved by
FADump to save the crash dump in bytes.
+
+What:  /sys/kernel/fadump/hotplug_ready
+Date:  Feb 2024
+Contact:   linuxppc-dev@lists.ozlabs.org
+Description:   read only
+   The kdump udev rule re-registers fadump on memory add/remove
+   events, primarily to update the elfcorehdr. This sysfs node
+   indicates to the kdump udev rule that fadump re-registration is
+   not required on memory add/remove events, because the elfcorehdr
+   is now prepared in the second/fadump kernel.
+User:  kexec-tools
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index a55ad8514745..6478820a2038 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1444,6 +1444,18 @@ static ssize_t enabled_show(struct kobject *kobj,
return sprintf(buf, "%d\n", fw_dump.fadump_enabled);
 }
 
+/*
+ * The /sys/kernel/fadump/hotplug_ready sysfs node returns 1, which indicates
+ * to userspace that fadump re-registration is not required on memory
+ * hotplug events.
+ */
+static ssize_t hotplug_ready_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+   return sprintf(buf, "%d\n", 1);
+}
+
 static ssize_t mem_reserved_show(struct kobject *kobj,
 struct kobj_attribute *attr,
 char *buf)
@@ -1516,11 +1528,13 @@ static struct kobj_attribute release_attr = __ATTR_WO(release_mem);
 static struct kobj_attribute enable_attr = __ATTR_RO(enabled);
 static struct kobj_attribute register_attr = __ATTR_RW(registered);
 static struct kobj_attribute mem_reserved_attr = __ATTR_RO(mem_reserved);
+static struct kobj_attribute hotplug_ready_attr = __ATTR_RO(hotplug_ready);
 
 static struct attribute *fadump_attrs[] = {
	&enable_attr.attr,
	&register_attr.attr,
	&mem_reserved_attr.attr,
+	&hotplug_ready_attr.attr,
NULL,
 };
 
-- 
2.43.0



[PATCH v8 1/3] powerpc: make fadump resilient with memory add/remove events

2024-02-16 Thread Sourabh Jain
Due to changes in memory resources caused by either memory hotplug or
online/offline events, the elfcorehdr, which describes the CPUs and
memory of the crashed kernel to the kernel that collects the dump (known
as second/fadump kernel), becomes outdated. Consequently, attempting
dump collection with an outdated elfcorehdr can lead to failed or
inaccurate dump collection.

Memory hotplug or online/offline events are referred to as memory
add/remove events in the rest of this commit message.

The current solution to address the aforementioned issue is as follows:
Monitor memory add/remove events in userspace using udev rules, and
re-register fadump whenever there are changes in memory resources. This
leads to the creation of a new elfcorehdr with updated system memory
information.

There are several notable issues associated with re-registering fadump
for every memory add/remove event.

1. Bulk memory add/remove events with udev-based fadump re-registration
   can lead to race conditions and, more importantly, it creates a wide
   window during which fadump is inactive until all memory add/remove
   events are settled.
2. Re-registering fadump for every memory add/remove event is
   inefficient.
3. The memory for elfcorehdr is allocated based on the memblock regions
   available during early boot and remains fixed thereafter. However, if
   elfcorehdr is later recreated with additional memblock regions, its
   size will increase, potentially leading to memory corruption.

Address the aforementioned challenges by shifting the creation of
elfcorehdr from the first kernel (also referred to as the crashed kernel),
where it was created and frequently recreated for every memory
add/remove event, to the fadump kernel. As a result, the elfcorehdr only
needs to be created once, thus eliminating the necessity to re-register
fadump during memory add/remove events.

At present, the first kernel is responsible for preparing the fadump
header and storing it in the fadump reserved area. The fadump header
includes the start address of the elfcorehdr, crashing CPU details, and
other relevant information. In the event of a crash in the first kernel,
the second/fadump kernel boots and accesses the fadump header prepared by the
first kernel. It then performs the following steps in a
platform-specific function [rtas|opal]_fadump_process:

1. Sanity check for fadump header
2. Update CPU notes in elfcorehdr

Along with the above, update the setup_fadump()/fadump.c to create
elfcorehdr and set its address to the global variable elfcorehdr_addr
for the vmcore module to process it in the second/fadump kernel.

The section below outlines the information required to create the
elfcorehdr and the changes made to make it available to the fadump
kernel where it is not already.

To create elfcorehdr, the following crashed kernel information is
required: CPU notes, vmcoreinfo, and memory ranges.

At present, the CPU notes are already prepared in the fadump kernel, so
no changes are needed in that regard. The fadump kernel has access to
all crashed kernel memory regions, including boot memory regions that
are relocated by firmware to fadump reserved areas, so no changes for
that either. However, it is necessary to add new members to the fadump
header, i.e., the 'fadump_crash_info_header' structure, in order to pass
the crashed kernel's vmcoreinfo address and its size to the fadump kernel.

In addition to the vmcoreinfo address and size, there are a few other
attributes also added to the fadump_crash_info_header structure.

1. version:
   It stores the fadump header version, which is currently set to 1.
   This provides flexibility to update the fadump crash info header in
   the future without changing the magic number. For each change in the
   fadump header, the version will be increased. This will help the
   updated kernel determine how to handle kernel dumps from older
   kernels. The magic number remains relevant for checking fadump header
   corruption.

2. pt_regs_sz/cpu_mask_sz:
   Store size of pt_regs and cpu_mask structure of first kernel. These
   attributes are used to prevent dump processing if the sizes of
   pt_regs or cpu_mask structure differ between the first and fadump
   kernels.

Note: if either the first/crashed kernel or the second/fadump kernel does
not have the changes introduced here, the kernel fails to collect the dump
and prints a relevant error message on the console.

Signed-off-by: Sourabh Jain 
Cc: Aditya Gupta 
Cc: Aneesh Kumar K.V 
Cc: Hari Bathini 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Naveen N Rao 
---
 arch/powerpc/include/asm/fadump-internal.h   |  31 +-
 arch/powerpc/kernel/fadump.c | 339 +++
 arch/powerpc/platforms/powernv/opal-fadump.c |  22 +-
 arch/powerpc/platforms/pseries/rtas-fadump.c |  30 +-
 4 files changed, 232 insertions(+), 190 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h
index 27f9e11eda28..5d706a7acc8a 100644
--- 

[PATCH v8 0/3] powerpc: make fadump resilient with memory add/remove events

2024-02-16 Thread Sourabh Jain
Problem:

Due to changes in memory resources caused by either memory hotplug or
online/offline events, the elfcorehdr, which describes the CPUs and
memory of the crashed kernel to the kernel that collects the dump (known
as second/fadump kernel), becomes outdated. Consequently, attempting
dump collection with an outdated elfcorehdr can lead to failed or
inaccurate dump collection.

Memory hotplug or online/offline events are referred to as memory
add/remove events in the rest of this patch series.

Existing solution:
==
Monitor memory add/remove events in userspace using udev rules, and
re-register fadump whenever there are changes in memory resources. This
leads to the creation of a new elfcorehdr with updated system memory
information.

Challenges with existing solution:
==
1. Performing bulk memory add/remove with udev-based fadump
   re-registration can lead to race conditions and, more importantly,
   it creates a wide window during which fadump is inactive until
   all memory add/remove events are settled.
2. Re-registering fadump for every memory add/remove event is
   inefficient.
3. Memory for elfcorehdr is allocated based on the memblock regions
   available during first kernel early boot and it remains fixed
   thereafter. However, if the elfcorehdr is later recreated with
   additional memblock regions, its size will increase, potentially
   leading to memory corruption.

Proposed solution:
==
Address the aforementioned challenges by shifting the creation of
elfcorehdr from the first kernel (also referred to as the crashed kernel),
where it was created and frequently recreated for every memory
add/remove event, to the fadump kernel. As a result, the elfcorehdr only
needs to be created once, thus eliminating the necessity to re-register
fadump during memory add/remove events.

To know more about elfcorehdr creation in the fadump kernel, refer to
the first patch in this series.

The second patch adds a new sysfs interface that tells userspace
that fadump re-registration isn't needed for memory add/remove events.
Note that userspace changes do not need to be in sync with kernel
changes; they can roll out independently.

Since there are significant changes in the fadump implementation, the
third patch updates the fadump documentation to reflect the changes made
in this patch series.

Kernel tree rebased on 6.8.0-rc4 with patch series applied:
=
https://github.com/sourabhjains/linux/tree/fadump-mem-hotplug-v8

Userspace changes:
==
To realize this feature, one must update the kdump udev rules to prevent
fadump re-registration during memory add/remove events.

On RHEL, apply the following changes to the file
/usr/lib/udev/rules.d/98-kexec.rules:

-run+="/bin/sh -c '/usr/bin/systemctl is-active kdump.service || exit 0; /usr/bin/systemd-run --quiet --no-block /usr/lib/udev/kdump-udev-throttler'"
+# don't re-register fadump if the value of the node
+# /sys/kernel/fadump/hotplug_ready is 1.
+
+run+="/bin/sh -c '/usr/bin/systemctl is-active kdump.service || exit 0; ! test -f /sys/kernel/fadump_enabled || cat /sys/kernel/fadump_enabled | grep 0 || ! test -f /sys/kernel/fadump/hotplug_ready || cat /sys/kernel/fadump/hotplug_ready | grep 0 || exit 0; /usr/bin/systemd-run --quiet --no-block /usr/lib/udev/kdump-udev-throttler'"

Changelog:
==
v8: 16 Feb 2024
  - Move `elfcorehdr_addr` and `elfcorehdr_size` struct attributes
from `struct fadump_crash_info_header` to `struct fw_dump`.
  - Make minor changes in commit message 1/3.
  - Rebase it to 6.8-rc4.

v7: 11 Jan 2024
  https://lore.kernel.org/all/2024040943.297501-1-sourabhj...@linux.ibm.com/
  - Rebase it to 6.7

v6: 8 Dec 2023
  https://lore.kernel.org/all/20231208115159.82236-1-sourabhj...@linux.ibm.com/
  - Add size fields for `pt_regs` and `cpumask` in the fadump header
structure
  - Don't process the dump if the size of `pt_regs` and `cpu_mask` is
not same in the crashed and fadump kernel
  - Include an additional check for endianness mismatch when the magic
number doesn't match, to print the relevant error message
  - Don't process the dump if the fadump header contains an old magic number
  - Rebased it to 6.7.0-rc4

v5: 29 Oct 2023 
  https://lore.kernel.org/all/20231029124548.12198-1-sourabhj...@linux.ibm.com/
  - Fix a comment on the first patch

v4: 21 Oct 2023
  https://lore.kernel.org/all/20231021181733.204311-1-sourabhj...@linux.ibm.com/
  - Fix a build warning about type casting

v3: 9 Oct 2023
  https://lore.kernel.org/all/20231009041953.36139-1-sourabhj...@linux.ibm.com/
  - Assign physical address of elfcorehdr to fdh->elfcorehdr_addr
  - Rename a variable, boot_mem_dest_addr -> boot_mem_dest_offset

v2: 25 Sep 2023
  https://lore.kernel.org/all/20230925051214.678957-1-sourabhj...@linux.ibm.com/
  - Fixed a few indentation issues reported by the checkpatch script.
  - 

Re: [kvm-unit-tests PATCH v4 8/8] migration: add a migration selftest

2024-02-16 Thread Nicholas Piggin
On Fri Feb 16, 2024 at 9:15 PM AEST, Thomas Huth wrote:
> On 09/02/2024 10.11, Nicholas Piggin wrote:
> > Add a selftest for migration support in  guest library and test harness
> > code. It performs migrations in a tight loop to irritate races and bugs
> > in the test harness code.
> > 
> > Include the test in arm, s390, powerpc.
> > 
> > Acked-by: Claudio Imbrenda  (s390x)
> > Reviewed-by: Thomas Huth 
> > Signed-off-by: Nicholas Piggin 
> > ---
> >   arm/Makefile.common  |  1 +
> >   arm/selftest-migration.c |  1 +
> >   arm/unittests.cfg|  6 ++
>
>   Hi Nicholas,
>
> I just gave the patches a try, but the arm test seems to fail for me: Only 
> the first getchar() seems to wait for a character, all the subsequent ones 
> don't wait anymore and just continue immediately ... is this working for 
> you? Or do I need another patch on top?

Hey sorry missed this comment

It does seem to work for me, I've mostly tested pseries but I did test
others too (that's how I saw the arm getchar limit).

How are you observing it not waiting for migration? I put some sleeps in
the migration script before echo'ing to the console input and it seems
to be doing the right thing. Admittedly the test contains no way to
programmatically verify the machine was migrated the expected number of
times; it would be nice to try to match that up somehow.

Thanks,
Nick


Re: [PATCH v2] MAINTAINERS: adjust file entries after crypto vmx file movement

2024-02-16 Thread Herbert Xu
On Thu, Feb 08, 2024 at 10:33:27AM +0100, Lukas Bulwahn wrote:
> Commit 109303336a0c ("crypto: vmx - Move to arch/powerpc/crypto") moves the
> crypto vmx files to arch/powerpc, but misses to adjust the file entries for
> IBM Power VMX Cryptographic instructions and LINUX FOR POWERPC.
> 
> Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about
> broken references.
> 
> Adjust these file entries accordingly. To keep the matched files exact
> after the movement, spell out each file name in the new directory.
> 
> Signed-off-by: Lukas Bulwahn 
> ---
> v1: 
> https://lore.kernel.org/lkml/20240129131729.4311-1-lukas.bulw...@gmail.com/
> 
> v1 -> v2:
>   - address Herbert Xu's feedback:
>   keep the matched files exactly those which were in the vmx directory
> 
> Danny, please ack.
> Herbert, please pick this minor clean-up patch on your -next tree.
> 
>  MAINTAINERS | 18 +++---
>  1 file changed, 11 insertions(+), 7 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] powerpc: remove unused *_syscall_64.o variables in Makefile

2024-02-16 Thread Masahiro Yamada
+To: Daniel Axtens



Maybe we should check whether the issue fixed by commit
2f26ed1764b42a8c40d9c48441c73a70d805decf
has come back.





On Fri, Feb 16, 2024 at 10:55 PM Masahiro Yamada  wrote:
>
> Commit ab1a517d55b0 ("powerpc/syscall: Rename syscall_64.c into
> interrupt.c") missed to update these three lines:
>
>   GCOV_PROFILE_syscall_64.o := n
>   KCOV_INSTRUMENT_syscall_64.o := n
>   UBSAN_SANITIZE_syscall_64.o := n
>
> To restore the original behavior, we could replace them with:
>
>   GCOV_PROFILE_interrupt.o := n
>   KCOV_INSTRUMENT_interrupt.o := n
>   UBSAN_SANITIZE_interrupt.o := n
>
> However, nobody has noticed the functional change in the past three
> years, so they were unneeded.
>
> Signed-off-by: Masahiro Yamada 
> ---
>
>  arch/powerpc/kernel/Makefile | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 2919433be355..72d1cd6443bc 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -191,9 +191,6 @@ GCOV_PROFILE_kprobes-ftrace.o := n
>  KCOV_INSTRUMENT_kprobes-ftrace.o := n
>  KCSAN_SANITIZE_kprobes-ftrace.o := n
>  UBSAN_SANITIZE_kprobes-ftrace.o := n
> -GCOV_PROFILE_syscall_64.o := n
> -KCOV_INSTRUMENT_syscall_64.o := n
> -UBSAN_SANITIZE_syscall_64.o := n
>  UBSAN_SANITIZE_vdso.o := n
>
>  # Necessary for booting with kcov enabled on book3e machines
> --
> 2.40.1
>


-- 
Best Regards
Masahiro Yamada


Re: [PATCH v6 12/18] arm64/mm: Wire up PTE_CONT for user mappings

2024-02-16 Thread John Hubbard

On 2/16/24 08:56, Catalin Marinas wrote:
...

> The problem is that the contpte_* symbols are called from the ptep_* inline
> functions. So where those inlines are called from modules, we need to make sure
> the contpte_* symbols are available.
> 
> John Hubbard originally reported this problem against v1 and I enumerated all
> the drivers that call into the ptep_* inlines here:
> https://lore.kernel.org/linux-arm-kernel/b994ff89-1a1f-26ca-9479-b08c77f94...@arm.com/#t
> 
> So they definitely need to be exported. Perhaps we can tighten it to


Yes. Let's keep the in-tree modules working.


> EXPORT_SYMBOL_GPL(), but I was being cautious as I didn't want to break anything
> out-of-tree. I'm not sure what the normal policy is? arm64 seems to use ~equal
> amounts of both.


EXPORT_SYMBOL_GPL() seems appropriate and low risk. As Catalin says below,
these really are deeply core mm routines, and any module operating at this
level is not going to be able to survive on EXPORT_SYMBOL alone, IMHO.

Now, if only I could find an out of tree module to test that claim on... :)



> I don't think we are consistent here. For example set_pte_at() can't be
> called from non-GPL modules because of __sync_icache_dcache. OTOH, such
> driver is probably doing something dodgy. Same with
> apply_to_page_range(), it's GPL-only (called from i915).
> 
> Let's see if others have any view over the next week or so, otherwise
> I'd go for _GPL and relax it later if someone has a good use-case (can
> be a patch on top adding _GPL).


I think going directly to _GPL for these is fine, actually.


thanks,
--
John Hubbard
NVIDIA



Re: [PATCH v2] powerpc: Add gpr1 and fpu save/restore functions

2024-02-16 Thread Timothy Pearson
- Original Message -
> From: "christophe leroy" 
> To: "Timothy Pearson" , "linuxppc-dev" 
> 
> Sent: Tuesday, February 13, 2024 9:13:41 AM
> Subject: Re: [PATCH v2] powerpc: Add gpr1 and fpu save/restore functions

> Le 12/02/2024 à 18:14, Timothy Pearson a écrit :


> All this looks very similar to savegpr0_xx and restgpr0_xx with the
> difference being that r12 is used instead of r1.
> 
> Could we avoid duplication by using macros ?
> 

Yes.  v3 sent.


[PATCH v3] powerpc: Add gpr1 and fpu save/restore functions

2024-02-16 Thread Timothy Pearson
When building the kernel in size optimized mode with the amdgpu module enabled,
gcc will begin referencing external gpr1 and fpu save/restore functions.  This
will then cause a linker failure as we do not link against libgcc which
normally contains those builtin functions.

Implement gpr1 and fpu save/restore functions per the PowerPC 64-bit ELFv2 ABI
documentation.

Tested on a Talos II with a WX7100 installed and running in DisplayCore mode.

Reported-by: kernel test robot 
Tested-by: Timothy Pearson 
Signed-off-by: Timothy Pearson 
---
 arch/powerpc/kernel/prom_init_check.sh |   4 +-
 arch/powerpc/lib/crtsavres.S   | 363 +
 scripts/mod/modpost.c  |   4 +
 3 files changed, 253 insertions(+), 118 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init_check.sh b/arch/powerpc/kernel/prom_init_check.sh
index 69623b9045d5..76c5651e29d3 100644
--- a/arch/powerpc/kernel/prom_init_check.sh
+++ b/arch/powerpc/kernel/prom_init_check.sh
@@ -72,10 +72,10 @@ do
 
# ignore register save/restore funcitons
case $UNDEF in
-   _restgpr_*|_restgpr0_*|_rest32gpr_*)
+   _restgpr_*|_restgpr0_*|_restgpr1_*|_rest32gpr_*)
OK=1
;;
-   _savegpr_*|_savegpr0_*|_save32gpr_*)
+   _savegpr_*|_savegpr0_*|_savegpr1_*|_save32gpr_*)
OK=1
;;
esac
diff --git a/arch/powerpc/lib/crtsavres.S b/arch/powerpc/lib/crtsavres.S
index 7e5e1c28e56a..f97270d36720 100644
--- a/arch/powerpc/lib/crtsavres.S
+++ b/arch/powerpc/lib/crtsavres.S
@@ -3,6 +3,7 @@
  *
  *   Copyright (C) 1995, 1996, 1998, 2000, 2001 Free Software Foundation, Inc.
  *   Copyright 2008 Freescale Semiconductor, Inc.
+ *   Copyright 2024 Raptor Engineering, LLC
  *   Written By Michael Meissner
  *
  * Based on gcc/config/rs6000/crtsavres.asm from gcc
@@ -314,126 +315,134 @@ _GLOBAL(_restvr_31)
 
 #else /* CONFIG_PPC64 */
 
-.globl _savegpr0_14
-_savegpr0_14:
-   std r14,-144(r1)
-.globl _savegpr0_15
-_savegpr0_15:
-   std r15,-136(r1)
-.globl _savegpr0_16
-_savegpr0_16:
-   std r16,-128(r1)
-.globl _savegpr0_17
-_savegpr0_17:
-   std r17,-120(r1)
-.globl _savegpr0_18
-_savegpr0_18:
-   std r18,-112(r1)
-.globl _savegpr0_19
-_savegpr0_19:
-   std r19,-104(r1)
-.globl _savegpr0_20
-_savegpr0_20:
-   std r20,-96(r1)
-.globl _savegpr0_21
-_savegpr0_21:
-   std r21,-88(r1)
-.globl _savegpr0_22
-_savegpr0_22:
-   std r22,-80(r1)
-.globl _savegpr0_23
-_savegpr0_23:
-   std r23,-72(r1)
-.globl _savegpr0_24
-_savegpr0_24:
-   std r24,-64(r1)
-.globl _savegpr0_25
-_savegpr0_25:
-   std r25,-56(r1)
-.globl _savegpr0_26
-_savegpr0_26:
-   std r26,-48(r1)
-.globl _savegpr0_27
-_savegpr0_27:
-   std r27,-40(r1)
-.globl _savegpr0_28
-_savegpr0_28:
-   std r28,-32(r1)
-.globl _savegpr0_29
-_savegpr0_29:
-   std r29,-24(r1)
-.globl _savegpr0_30
-_savegpr0_30:
-   std r30,-16(r1)
-.globl _savegpr0_31
-_savegpr0_31:
-   std r31,-8(r1)
-   std r0,16(r1)
+#define __PPC64_SAVEGPR(n,base)\
+.globl _savegpr##n##_14\
+_savegpr##n##_14:  \
+   std r14,-144(base)  \
+.globl _savegpr##n##_15\
+_savegpr##n##_15:  \
+   std r15,-136(base)  \
+.globl _savegpr##n##_16\
+_savegpr##n##_16:  \
+   std r16,-128(base)  \
+.globl _savegpr##n##_17\
+_savegpr##n##_17:  \
+   std r17,-120(base)  \
+.globl _savegpr##n##_18\
+_savegpr##n##_18:  \
+   std r18,-112(base)  \
+.globl _savegpr##n##_19\
+_savegpr##n##_19:  \
+   std r19,-104(base)  \
+.globl _savegpr##n##_20\
+_savegpr##n##_20:  \
+   std r20,-96(base)   \
+.globl _savegpr##n##_21\
+_savegpr##n##_21:  \
+   std r21,-88(base)   \
+.globl _savegpr##n##_22\
+_savegpr##n##_22:  \
+   std r22,-80(base)   \
+.globl _savegpr##n##_23\
+_savegpr##n##_23:  \
+   std r23,-72(base)   \
+.globl _savegpr##n##_24\
+_savegpr##n##_24:  \
+   std r24,-64(base)   \
+.globl _savegpr##n##_25\
+_savegpr##n##_25:  \
+   std r25,-56(base)   \
+.globl _savegpr##n##_26\
+_savegpr##n##_26:  \
+   std r26,-48(base)   \
+.globl _savegpr##n##_27\
+_savegpr##n##_27:  \
+   std r27,-40(base)   \
+.globl _savegpr##n##_28\
+_savegpr##n##_28:  \
+   std r28,-32(base)   \
+.globl _savegpr##n##_29\
+_savegpr##n##_29:  \
+   std r29,-24(base)   \
+.globl _savegpr##n##_30\
+_savegpr##n##_30:  \
+   std r30,-16(base)   \
+.globl _savegpr##n##_31\
+_savegpr##n##_31:  \
+   std r31,-8(base)\
+ 

Re: [PATCH] KVM: Get rid of return value from kvm_arch_create_vm_debugfs()

2024-02-16 Thread Marc Zyngier
On Fri, 16 Feb 2024 15:59:41 +,
Oliver Upton  wrote:
> 
> The general expectation with debugfs is that any initialization failure
> is nonfatal. Nevertheless, kvm_arch_create_vm_debugfs() allows
> implementations to return an error and kvm_create_vm_debugfs() allows
> that to fail VM creation.
> 
> Change to a void return to discourage architectures from making debugfs
> failures fatal for the VM. Seems like everyone already had the right
> idea, as all implementations already return 0 unconditionally.
> 
> Signed-off-by: Oliver Upton 

Acked-by: Marc Zyngier 

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v6 12/18] arm64/mm: Wire up PTE_CONT for user mappings

2024-02-16 Thread Catalin Marinas
On Fri, Feb 16, 2024 at 12:53:43PM +, Ryan Roberts wrote:
> On 16/02/2024 12:25, Catalin Marinas wrote:
> > On Thu, Feb 15, 2024 at 10:31:59AM +, Ryan Roberts wrote:
> >>  arch/arm64/mm/contpte.c  | 285 +++
> > 
> > Nitpick: I think most symbols in contpte.c can be EXPORT_SYMBOL_GPL().
> > We don't expect them to be used by random out of tree modules. In fact,
> > do we expect them to end up in modules at all? Most seem to be called
> > from the core mm code.
> 
> The problem is that the contpte_* symbols are called from the ptep_* inline
> functions. So where those inlines are called from modules, we need to make 
> sure
> the contpte_* symbols are available.
> 
> John Hubbard originally reported this problem against v1 and I enumerated all
> the drivers that call into the ptep_* inlines here:
> https://lore.kernel.org/linux-arm-kernel/b994ff89-1a1f-26ca-9479-b08c77f94...@arm.com/#t
> 
> So they definitely need to be exported. Perhaps we can tighten it to
> EXPORT_SYMBOL_GPL(), but I was being cautious as I didn't want to break 
> anything
> out-of-tree. I'm not sure what the normal policy is? arm64 seems to use ~equal
> amounts of both.

I don't think we are consistent here. For example set_pte_at() can't be
called from non-GPL modules because of __sync_icache_dcache. OTOH, such
driver is probably doing something dodgy. Same with
apply_to_page_range(), it's GPL-only (called from i915).

Let's see if others have any view over the next week or so, otherwise
I'd go for _GPL and relax it later if someone has a good use-case (can
be a patch on top adding _GPL).

> > If you can make this easier to parse (in a few years time) with an
> > additional patch adding some more comments, that would be great. For
> > this patch:
> 
> I already have a big block comment at the top, which was trying to explain it.
> Clearly not well enough though. I'll add more comments as a follow up patch 
> when
> I get back from holiday.

I read that comment but it wasn't immediately obvious what the atomicity
requirements are - basically we require a single PTE to be atomically
read (which it is), the rest is the dirty/young state being added on
top. I guess a sentence along these lines would do.

Enjoy your holiday!

-- 
Catalin


[PATCH] KVM: Get rid of return value from kvm_arch_create_vm_debugfs()

2024-02-16 Thread Oliver Upton
The general expectation with debugfs is that any initialization failure
is nonfatal. Nevertheless, kvm_arch_create_vm_debugfs() allows
implementations to return an error and kvm_create_vm_debugfs() allows
that to fail VM creation.

Change to a void return to discourage architectures from making debugfs
failures fatal for the VM. Seems like everyone already had the right
idea, as all implementations already return 0 unconditionally.

Signed-off-by: Oliver Upton 
---

Compile-tested on arm64, powerpc, and x86 since I don't trust myself to
even get this simple patch right.

 arch/powerpc/kvm/powerpc.c | 3 +--
 arch/x86/kvm/debugfs.c | 3 +--
 include/linux/kvm_host.h   | 2 +-
 virt/kvm/kvm_main.c| 8 ++--
 4 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 23407fbd73c9..d32abe7fe6ab 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -2538,9 +2538,8 @@ void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu, struct dentry *debugfs_
	vcpu->kvm->arch.kvm_ops->create_vcpu_debugfs(vcpu, debugfs_dentry);
 }
 
-int kvm_arch_create_vm_debugfs(struct kvm *kvm)
+void kvm_arch_create_vm_debugfs(struct kvm *kvm)
 {
if (kvm->arch.kvm_ops->create_vm_debugfs)
kvm->arch.kvm_ops->create_vm_debugfs(kvm);
-   return 0;
 }
diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c
index 95ea1a1f7403..999227fc7c66 100644
--- a/arch/x86/kvm/debugfs.c
+++ b/arch/x86/kvm/debugfs.c
@@ -189,9 +189,8 @@ static const struct file_operations mmu_rmaps_stat_fops = {
.release= kvm_mmu_rmaps_stat_release,
 };
 
-int kvm_arch_create_vm_debugfs(struct kvm *kvm)
+void kvm_arch_create_vm_debugfs(struct kvm *kvm)
 {
debugfs_create_file("mmu_rmaps_stat", 0644, kvm->debugfs_dentry, kvm,
			    &mmu_rmaps_stat_fops);
-   return 0;
 }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7e7fd25b09b3..9a45f673f687 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1507,7 +1507,7 @@ bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu);
 bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_post_init_vm(struct kvm *kvm);
 void kvm_arch_pre_destroy_vm(struct kvm *kvm);
-int kvm_arch_create_vm_debugfs(struct kvm *kvm);
+void kvm_arch_create_vm_debugfs(struct kvm *kvm);
 
 #ifndef __KVM_HAVE_ARCH_VM_ALLOC
 /*
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 10bfc88a69f7..c681149c382a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1150,10 +1150,7 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, const char *fdname)
				    &stat_fops_per_vm);
}
 
-   ret = kvm_arch_create_vm_debugfs(kvm);
-   if (ret)
-   goto out_err;
-
+   kvm_arch_create_vm_debugfs(kvm);
return 0;
 out_err:
kvm_destroy_vm_debugfs(kvm);
@@ -1183,9 +1180,8 @@ void __weak kvm_arch_pre_destroy_vm(struct kvm *kvm)
  * Cleanup should be automatic done in kvm_destroy_vm_debugfs() recursively, so
  * a per-arch destroy interface is not needed.
  */
-int __weak kvm_arch_create_vm_debugfs(struct kvm *kvm)
+void __weak kvm_arch_create_vm_debugfs(struct kvm *kvm)
 {
-   return 0;
 }
 
 static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)

base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
-- 
2.44.0.rc0.258.g7320e95886-goog



Re: [kvm-unit-tests PATCH v1 00/18] arm/arm64: Rework cache maintenance at boot

2024-02-16 Thread Alexandru Elisei
Hi,

On Thu, Nov 30, 2023 at 04:07:02AM -0500, Shaoqin Huang wrote:
> Hi,
> 
> I'm posting Alexandru's patch set[1] rebased on the latest branch, with the
> conflicts resolved. No big changes compared to its original code.
> 
> As version 1 of this series was posted a year ago, I would first like to
> recall what the intention of this series is and what it does. You can
> view that in the cover letter via the link[2].
> 
> When series[1] was written, EFI support for arm64[3] had not yet been
> merged into kvm-unit-tests, but now it has. Directly rebasing series[1]
> onto the latest branch would break the EFI tests, mainly because Patch #15
> ("arm/arm64: Enable the MMU early") moves mmu_enable() out of setup_mmu(),
> so the EFI tests would no longer enable the MMU. I therefore made a small
> change in efi_mem_init() so that the EFI tests also enable the MMU early,
> which makes them work.
> 
> Another change worth noting is in Patch #17 ("arm/arm64: Perform dcache
> maintenance"). efi_mem_init() disables the MMU, builds a new page table
> and re-enables the MMU. If asm_mmu_disable cleaned and invalidated the
> data caches for the entire memory, we would not need to care about the
> dcache: after the MMU is disabled, mmu_setup_early() re-enables it and
> takes care of all the cache maintenance. But the situation changed, since
> Patch #18 ("arm/arm64: Rework the cache maintenance in asm_mmu_disable")
> only cleans and invalidates the data caches for the stack memory area.
> So we need to clean and invalidate the data caches manually before
> disabling the MMU. I'm not confident about the current cache maintenance
> on the EFI setup path, so I'd appreciate your help in reviewing whether
> it is right or not.
> 
> And I also dropped one patch ("s390: Do not use the physical allocator")
> from [1], since it causes s390 tests to fail.

This is unfortunate. What tests do you see failing?

I wrote the s390x patch so I could justify dropping the locking in the
physical allocator. And I wanted to drop the locking so I wouldn't have
to do maintenance operations on the spinlock.

Because of how kvm-unit-tests implements spinlocks for arm/arm64, they
don't protect against concurrent accesses when the MMU is turned off. And
because arm and arm64 use the physical allocator during the test boot
sequence, not during a test, using a spin lock is also useless since there
will be no concurrent accesses (the boot phase is done on a single CPU).

But since replacing the physical allocator causes test failures for s390x,
it looks like the physical allocator will still be needed for tests, and
that requires keeping the spinlock.

I guess the best approach would be to teach the physical allocator to do
cache maintenance on the spinlock. We might as well, since the UART needs
it too, and I don't think this series addresses that.

Thanks,
Alex

> 
> This series may include bugs, so I would really appreciate your review to
> improve it together.
> 
> You can get the code from:
> 
> $ git clone https://gitlab.com/shahuang/kvm-unit-tests.git \
>   -b arm-arm64-rework-cache-maintenance-at-boot-v1
> 
> [1] 
> https://gitlab.arm.com/linux-arm/kvm-unit-tests-ae/-/tree/arm-arm64-rework-cache-maintenance-at-boot-v2-wip2
> [2] 
> https://lore.kernel.org/all/20220809091558.14379-1-alexandru.eli...@arm.com/
> [3] 
> https://patchwork.kernel.org/project/kvm/cover/20230530160924.82158-1-nikos.nikole...@arm.com/
> 
> Changelog:
> --
> RFC->v1:
>   - Gathered Reviewed-by tags.
>   - Various changes to commit messages and comments to hopefully make the code
> easier to understand.
>   - Patch #8 ("lib/alloc_phys: Expand documentation with usage and
> limitations") is new.
>   - Folded patch "arm: page.h: Add missing libcflat.h include" into #17
> ("arm/arm64: Perform dcache maintenance at boot").
>   - Reordered the series to group patches that touch approximately the
> same code together - the patches that change the physical allocator
> are now first, followed by the patches that change how the secondaries
> are brought online.
>   - Fixed several nasty bugs where the r4 register was being clobbered in
> the arm assembly.
>   - Unmap the early UART address if the DTB address does not match the early
> address.
>   - Added dcache maintenance when a page table is modified with the MMU 
> disabled.
>   - Moved the cache maintenance when disabling the MMU to be executed before 
> the
> MMU is disabled.
>   - Rebased onto the latest branch, where EFI support has been merged.
>   - Made the EFI tests also enable the MMU early.
>   - Added cache maintenance on the EFI setup path, especially before
> mmu_disable.
> 
> RFC: 
> https://lore.kernel.org/all/20220809091558.14379-1-alexandru.eli...@arm.com/
> 
> Alexandru Elisei (18):
>   Makefile: Define __ASSEMBLY__ for assembly files
>   powerpc: Replace the 

Re: [PATCH v1 4/5] powerpc: Remove cpu-as-y completely

2024-02-16 Thread Christophe Leroy


Le 20/02/2023 à 07:00, Michael Ellerman a écrit :
> Christophe Leroy  writes:
>> cpu-as-y is there to force assembler building options.
>> But there is no need for that. Gcc is passed the necessary
>> options and it automatically pass the appropriate option to
>> GAS.
>>
>> GCC is given -maltivec when relevant, so no need
>> for -Wa,-maltivec in addition
>>
>> And -Wa,-many is wrong as it will hide inappropriate
>> instructions. Better to detect them and handle them on a
>> case by case basis.
>> -Wa,-many was added by commit 960e30029863 ("powerpc/Makefile:
>> Fix PPC_BOOK3S_64 ASFLAGS") in order to fix an issue with
>> clang and the passed -Wa,-mpower4 option. But we have now
>> removed it expecting the compiler to automatically pass the
>> proper options and instructions based on -mcpu=power4
> 
> I wanted to apply this one, but it caused a lot of breakage for big
> endian Book3S-64 builds - where we build for power4 but have lots of
> code that uses >= power5 instructions.
> 
> I'll try and get those all fixed and pick this up for the next merge
> window.
> 

ping ?


Re: [PATCH v2 0/2] PCI endpoint BAR hardware description cleanup

2024-02-16 Thread Manivannan Sadhasivam
On Fri, Feb 16, 2024 at 02:45:13PM +0100, Niklas Cassel wrote:
> The series is based on top of:
> https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/log/?h=endpoint
> 
> 
> Hello all,
> 
> This series cleans up the hardware description for PCI endpoint BARs.
> 
> The problems with the existing hardware description:
> -The documentation is lackluster.
> -Some of the names are confusingly similar, e.g. fixed_64bit and
>  fixed_size, even though these are for completely unrelated things.
> -The way that the BARs are defined in the endpoint controller drivers
>  is messy, because the left hand side is not a BAR, so you can mark a
>  BAR as e.g. both fixed size and reserved.
> 
> This series tries to address all the problems above.
> 
> Personally, I think that the code is now more readable, both in the
> endpoint controller drivers and in pci-epc-core.c.
> 
> (I will be sending out a patch series that adds BAR_RESIZABLE to enum
> pci_epc_bar_type in the coming week.)
> 

Applied to pci/endpoint!

- Mani

> 
> Kind regards,
> Niklas
> 
> 
> Changes since V1:
> -Picked up tags from Kishon and Mani.
> -Improved commit message for patch 1/2 as suggested by Mani.
> -Improved kdoc in patch 2/2 as suggested by Mani.
> 
> 
> Niklas Cassel (2):
>   PCI: endpoint: Clean up hardware description for BARs
>   PCI: endpoint: Drop only_64bit on reserved BARs
> 
>  drivers/pci/controller/dwc/pci-imx6.c |  3 +-
>  drivers/pci/controller/dwc/pci-keystone.c | 12 +++---
>  .../pci/controller/dwc/pci-layerscape-ep.c|  5 ++-
>  drivers/pci/controller/dwc/pcie-keembay.c |  8 +++-
>  drivers/pci/controller/dwc/pcie-rcar-gen4.c   |  4 +-
>  drivers/pci/controller/dwc/pcie-tegra194.c| 10 +++--
>  drivers/pci/controller/dwc/pcie-uniphier-ep.c | 15 +--
>  drivers/pci/controller/pcie-rcar-ep.c | 14 ---
>  drivers/pci/endpoint/functions/pci-epf-ntb.c  |  4 +-
>  drivers/pci/endpoint/functions/pci-epf-test.c |  8 ++--
>  drivers/pci/endpoint/functions/pci-epf-vntb.c |  2 +-
>  drivers/pci/endpoint/pci-epc-core.c   | 25 +---
>  drivers/pci/endpoint/pci-epf-core.c   | 15 +++
>  include/linux/pci-epc.h   | 39 ---
>  14 files changed, 106 insertions(+), 58 deletions(-)
> 
> -- 
> 2.43.1
> 

-- 
மணிவண்ணன் சதாசிவம்


Re: [PATCH v2 1/2] PCI: endpoint: Clean up hardware description for BARs

2024-02-16 Thread Manivannan Sadhasivam
On Fri, Feb 16, 2024 at 02:45:14PM +0100, Niklas Cassel wrote:
> The hardware description for BARs is scattered across many different
> variables in pci_epc_features. Some of these things are mutually exclusive,
> so it can create confusion over which variable has precedence over another.
> 
> Improve the situation by creating a struct pci_epc_bar_desc and a new
> enum pci_epc_bar_type, and convert the endpoint controller drivers to use
> this better-defined format.
> 
> Additionally, some endpoint controller drivers mark the BAR succeeding a
> "64-bit only BAR" as reserved, while some do not. By definition, a 64-bit
> BAR uses the succeeding BAR for the upper 32-bits, so an EPF driver cannot
> use a BAR succeeding a 64-bit BAR. Ensure that all endpoint controller
> drivers are uniform, and actually describe a reserved BAR as reserved.
> 
> Signed-off-by: Niklas Cassel 

Reviewed-by: Manivannan Sadhasivam 

- Mani

> Reviewed-by: Kishon Vijay Abraham I 
> ---
>  drivers/pci/controller/dwc/pci-imx6.c |  3 +-
>  drivers/pci/controller/dwc/pci-keystone.c | 12 +++
>  .../pci/controller/dwc/pci-layerscape-ep.c|  5 ++-
>  drivers/pci/controller/dwc/pcie-keembay.c |  8 +++--
>  drivers/pci/controller/dwc/pcie-rcar-gen4.c   |  4 ++-
>  drivers/pci/controller/dwc/pcie-tegra194.c| 10 --
>  drivers/pci/controller/dwc/pcie-uniphier-ep.c | 15 ++--
>  drivers/pci/controller/pcie-rcar-ep.c | 14 +---
>  drivers/pci/endpoint/functions/pci-epf-ntb.c  |  4 +--
>  drivers/pci/endpoint/functions/pci-epf-test.c |  8 ++---
>  drivers/pci/endpoint/functions/pci-epf-vntb.c |  2 +-
>  drivers/pci/endpoint/pci-epc-core.c   | 32 +
>  drivers/pci/endpoint/pci-epf-core.c   | 15 
>  include/linux/pci-epc.h   | 34 +++
>  14 files changed, 108 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pci-imx6.c 
> b/drivers/pci/controller/dwc/pci-imx6.c
> index dc2c036ab28c..47a9a96484ed 100644
> --- a/drivers/pci/controller/dwc/pci-imx6.c
> +++ b/drivers/pci/controller/dwc/pci-imx6.c
> @@ -1081,7 +1081,8 @@ static const struct pci_epc_features 
> imx8m_pcie_epc_features = {
>   .linkup_notifier = false,
>   .msi_capable = true,
>   .msix_capable = false,
> - .reserved_bar = 1 << BAR_1 | 1 << BAR_3,
> + .bar[BAR_1] = { .type = BAR_RESERVED, },
> + .bar[BAR_3] = { .type = BAR_RESERVED, },
>   .align = SZ_64K,
>  };
>  
> diff --git a/drivers/pci/controller/dwc/pci-keystone.c 
> b/drivers/pci/controller/dwc/pci-keystone.c
> index c0c62533a3f1..b2b93b4fa82d 100644
> --- a/drivers/pci/controller/dwc/pci-keystone.c
> +++ b/drivers/pci/controller/dwc/pci-keystone.c
> @@ -924,12 +924,12 @@ static const struct pci_epc_features 
> ks_pcie_am654_epc_features = {
>   .linkup_notifier = false,
>   .msi_capable = true,
>   .msix_capable = true,
> - .reserved_bar = 1 << BAR_0 | 1 << BAR_1,
> - .bar_fixed_64bit = 1 << BAR_0,
> - .bar_fixed_size[2] = SZ_1M,
> - .bar_fixed_size[3] = SZ_64K,
> - .bar_fixed_size[4] = 256,
> - .bar_fixed_size[5] = SZ_1M,
> + .bar[BAR_0] = { .type = BAR_RESERVED, .only_64bit = true, },
> + .bar[BAR_1] = { .type = BAR_RESERVED, },
> + .bar[BAR_2] = { .type = BAR_FIXED, .fixed_size = SZ_1M, },
> + .bar[BAR_3] = { .type = BAR_FIXED, .fixed_size = SZ_64K, },
> + .bar[BAR_4] = { .type = BAR_FIXED, .fixed_size = 256, },
> + .bar[BAR_5] = { .type = BAR_FIXED, .fixed_size = SZ_1M, },
>   .align = SZ_1M,
>  };
>  
> diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
> b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> index 2e398494e7c0..1f6ee1460ec2 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> @@ -250,7 +250,10 @@ static int __init ls_pcie_ep_probe(struct 
> platform_device *pdev)
>   pci->dev = dev;
>   pci->ops = pcie->drvdata->dw_pcie_ops;
>  
> - ls_epc->bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4);
> + ls_epc->bar[BAR_2].only_64bit = true;
> + ls_epc->bar[BAR_3].type = BAR_RESERVED;
> + ls_epc->bar[BAR_4].only_64bit = true;
> + ls_epc->bar[BAR_5].type = BAR_RESERVED;
>   ls_epc->linkup_notifier = true;
>  
>   pcie->pci = pci;
> diff --git a/drivers/pci/controller/dwc/pcie-keembay.c 
> b/drivers/pci/controller/dwc/pcie-keembay.c
> index 208d3b0ba196..5e8e54f597dd 100644
> --- a/drivers/pci/controller/dwc/pcie-keembay.c
> +++ b/drivers/pci/controller/dwc/pcie-keembay.c
> @@ -312,8 +312,12 @@ static const struct pci_epc_features 
> keembay_pcie_epc_features = {
>   .linkup_notifier= false,
>   .msi_capable= true,
>   .msix_capable   = true,
> - .reserved_bar   = BIT(BAR_1) | BIT(BAR_3) | BIT(BAR_5),
> - .bar_fixed_64bit= BIT(BAR_0) | BIT(BAR_2) | BIT(BAR_4),
> + .bar[BAR_0] = { .only_64bit = true, 

Re: [kvm-unit-tests PATCH v4 8/8] migration: add a migration selftest

2024-02-16 Thread Thomas Huth

On 09/02/2024 10.11, Nicholas Piggin wrote:

Add a selftest for migration support in the guest library and test harness
code. It performs migrations in a tight loop to provoke races and bugs
in the test harness code.

Include the test in arm, s390, powerpc.

Acked-by: Claudio Imbrenda  (s390x)
Reviewed-by: Thomas Huth 
Signed-off-by: Nicholas Piggin 
---
  arm/Makefile.common  |  1 +
  arm/selftest-migration.c |  1 +
  arm/unittests.cfg|  6 ++
  common/selftest-migration.c  | 34 ++
  powerpc/Makefile.common  |  1 +
  powerpc/selftest-migration.c |  1 +
  powerpc/unittests.cfg|  4 
  s390x/Makefile   |  1 +
  s390x/selftest-migration.c   |  1 +
  s390x/unittests.cfg  |  4 
  10 files changed, 54 insertions(+)
  create mode 120000 arm/selftest-migration.c
  create mode 100644 common/selftest-migration.c
  create mode 120000 powerpc/selftest-migration.c
  create mode 120000 s390x/selftest-migration.c

diff --git a/arm/Makefile.common b/arm/Makefile.common
index f828dbe0..f107c478 100644
--- a/arm/Makefile.common
+++ b/arm/Makefile.common
@@ -5,6 +5,7 @@
  #
  
  tests-common  = $(TEST_DIR)/selftest.$(exe)

+tests-common += $(TEST_DIR)/selftest-migration.$(exe)
  tests-common += $(TEST_DIR)/spinlock-test.$(exe)
  tests-common += $(TEST_DIR)/pci-test.$(exe)
  tests-common += $(TEST_DIR)/pmu.$(exe)
diff --git a/arm/selftest-migration.c b/arm/selftest-migration.c
new file mode 120000
index ..bd1eb266
--- /dev/null
+++ b/arm/selftest-migration.c
@@ -0,0 +1 @@
+../common/selftest-migration.c
\ No newline at end of file
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index fe601cbb..db0e4c9b 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -55,6 +55,12 @@ smp = $MAX_SMP
  extra_params = -append 'smp'
  groups = selftest
  
+# Test migration

+[selftest-migration]
+file = selftest-migration.flat
+groups = selftest migration
+arch = arm64
+
  # Test PCI emulation
  [pci-test]
  file = pci-test.flat
diff --git a/common/selftest-migration.c b/common/selftest-migration.c
new file mode 100644
index ..f70c505f
--- /dev/null
+++ b/common/selftest-migration.c
@@ -0,0 +1,34 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Machine independent migration tests
+ *
+ * This is just a very simple test that is intended to stress the migration
+ * support in the test harness. This could be expanded to test more guest
+ * library code, but architecture-specific tests should be used to test
+ * migration of tricky machine state.
+ */
+#include 
+#include 
+
+#if defined(__arm__) || defined(__aarch64__)
+/* arm can only call getchar 15 times */
+#define NR_MIGRATIONS 15
+#else
+#define NR_MIGRATIONS 100
+#endif


FYI, I just wrote a patch that will hopefully fix the limitation to 15
times on arm:

 https://lore.kernel.org/kvm/20240216140210.70280-1-th...@redhat.com/T/#u

 Thomas



[powerpc:merge] BUILD SUCCESS 35d5936fc3d416d9629c6b53a07bd25dc1c2bc7f

2024-02-16 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: 35d5936fc3d416d9629c6b53a07bd25dc1c2bc7f  Automatic merge of 'next' into merge (2024-02-15 23:53)

elapsed time: 1445m

configs tested: 167
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
alpha allnoconfig   gcc  
alphaallyesconfig   gcc  
alpha   defconfig   gcc  
arc  allmodconfig   gcc  
arc   allnoconfig   gcc  
arc  allyesconfig   gcc  
arc defconfig   gcc  
arc haps_hs_smp_defconfig   gcc  
arc   randconfig-001-20240216   gcc  
arc   randconfig-002-20240216   gcc  
arm  alldefconfig   gcc  
arm  allmodconfig   gcc  
arm   allnoconfig   clang
arm  allyesconfig   gcc  
arm  collie_defconfig   gcc  
arm defconfig   clang
armhisi_defconfig   gcc  
armkeystone_defconfig   gcc  
arm  moxart_defconfig   gcc  
armneponset_defconfig   gcc  
arm   randconfig-003-20240216   gcc  
arm   randconfig-004-20240216   gcc  
arm64allmodconfig   clang
arm64 allnoconfig   gcc  
arm64   defconfig   gcc  
arm64 randconfig-002-20240216   gcc  
arm64 randconfig-004-20240216   gcc  
csky allmodconfig   gcc  
csky  allnoconfig   gcc  
csky allyesconfig   gcc  
cskydefconfig   gcc  
csky  randconfig-001-20240216   gcc  
csky  randconfig-002-20240216   gcc  
hexagon  allmodconfig   clang
hexagon   allnoconfig   clang
hexagon  allyesconfig   clang
hexagon defconfig   clang
i386 allmodconfig   gcc  
i386  allnoconfig   gcc  
i386 allyesconfig   gcc  
i386 buildonly-randconfig-001-20240216   gcc  
i386 buildonly-randconfig-002-20240216   clang
i386 buildonly-randconfig-003-20240216   clang
i386 buildonly-randconfig-004-20240216   gcc  
i386 buildonly-randconfig-005-20240216   gcc  
i386 buildonly-randconfig-006-20240216   clang
i386defconfig   clang
i386  randconfig-001-20240216   clang
i386  randconfig-002-20240216   clang
i386  randconfig-003-20240216   gcc  
i386  randconfig-004-20240216   clang
i386  randconfig-005-20240216   gcc  
i386  randconfig-006-20240216   clang
i386  randconfig-011-20240216   gcc  
i386  randconfig-012-20240216   gcc  
i386  randconfig-013-20240216   gcc  
i386  randconfig-014-20240216   clang
i386  randconfig-015-20240216   gcc  
i386  randconfig-016-20240216   gcc  
loongarchallmodconfig   gcc  
loongarch allnoconfig   gcc  
loongarchallyesconfig   gcc  
loongarch   defconfig   gcc  
loongarch randconfig-001-20240216   gcc  
loongarch randconfig-002-20240216   gcc  
m68k allmodconfig   gcc  
m68k  allnoconfig   gcc  
m68k allyesconfig   gcc  
m68kdefconfig   gcc  
m68kmvme16x_defconfig   gcc  
microblaze   allmodconfig   gcc  
microblazeallnoconfig   gcc  
microblaze   allyesconfig   gcc  
microblaze  defconfig   gcc  
mips allmodconfig   gcc  
mips  allnoconfig   gcc  
mips allyesconfig   gcc  
mips   ip28_defconfig   gcc  
mips  loongson3_defconfig   gcc  
mips  pic32mzda_defconfig   gcc  
nios2allmodconfig   gcc  
nios2 allnoconfig   gcc  
nios2allyesconfig   gcc  
nios2   defconfig   gcc  
nios2 randconfig-001-20240216   gcc  
nios2

[PATCH] powerpc: remove unused KCSAN_SANITIZE_early_64.o in Makefile

2024-02-16 Thread Masahiro Yamada
Commit 2fb857bc9f9e ("powerpc/kcsan: Add exclusions from instrumentation")
added KCSAN_SANITIZE_early_64.o to arch/powerpc/kernel/Makefile, but that
Makefile does not compile early_64.o.

Signed-off-by: Masahiro Yamada 
---

 arch/powerpc/kernel/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 72d1cd6443bc..a6f9b53c7490 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -55,7 +55,6 @@ CFLAGS_btext.o += -DDISABLE_BRANCH_PROFILING
 endif
 
 KCSAN_SANITIZE_early_32.o := n
-KCSAN_SANITIZE_early_64.o := n
 KCSAN_SANITIZE_cputable.o := n
 KCSAN_SANITIZE_btext.o := n
 KCSAN_SANITIZE_paca.o := n
-- 
2.40.1



[PATCH] powerpc: remove unused *_syscall_64.o variables in Makefile

2024-02-16 Thread Masahiro Yamada
Commit ab1a517d55b0 ("powerpc/syscall: Rename syscall_64.c into
interrupt.c") failed to update these three lines:

  GCOV_PROFILE_syscall_64.o := n
  KCOV_INSTRUMENT_syscall_64.o := n
  UBSAN_SANITIZE_syscall_64.o := n

To restore the original behavior, we could replace them with:

  GCOV_PROFILE_interrupt.o := n
  KCOV_INSTRUMENT_interrupt.o := n
  UBSAN_SANITIZE_interrupt.o := n

However, nobody has noticed the functional change in the past three
years, so they were unneeded.
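A quick way to spot such stale per-object flags - variables naming an object that nothing builds anymore - is a small shell check. This is a self-contained toy reproducing the situation, not an actual kbuild integration:

```shell
# Toy Makefile resembling the situation described above: one stale
# per-object flag (syscall_64 has no source) and one valid one.
cat > demo_makefile <<'EOF'
GCOV_PROFILE_syscall_64.o := n
UBSAN_SANITIZE_interrupt.o := n
EOF
touch interrupt.c

# Report per-object flag variables whose object has no matching source.
sed -n 's/^\(GCOV_PROFILE\|KCOV_INSTRUMENT\|UBSAN_SANITIZE\|KCSAN_SANITIZE\)_\(.*\)\.o *:=.*/\2/p' demo_makefile |
sort -u | while read -r obj; do
	[ -e "$obj.c" ] || [ -e "$obj.S" ] || echo "stale: $obj.o"
done
# prints: stale: syscall_64.o
```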

Signed-off-by: Masahiro Yamada 
---

 arch/powerpc/kernel/Makefile | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 2919433be355..72d1cd6443bc 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -191,9 +191,6 @@ GCOV_PROFILE_kprobes-ftrace.o := n
 KCOV_INSTRUMENT_kprobes-ftrace.o := n
 KCSAN_SANITIZE_kprobes-ftrace.o := n
 UBSAN_SANITIZE_kprobes-ftrace.o := n
-GCOV_PROFILE_syscall_64.o := n
-KCOV_INSTRUMENT_syscall_64.o := n
-UBSAN_SANITIZE_syscall_64.o := n
 UBSAN_SANITIZE_vdso.o := n
 
 # Necessary for booting with kcov enabled on book3e machines
-- 
2.40.1



[PATCH v2 1/2] PCI: endpoint: Clean up hardware description for BARs

2024-02-16 Thread Niklas Cassel
The hardware description for BARs is scattered across many different
variables in pci_epc_features. Some of these things are mutually exclusive,
so it can create confusion over which variable has precedence over another.

Improve the situation by creating a struct pci_epc_bar_desc and a new
enum pci_epc_bar_type, and convert the endpoint controller drivers to use
this better-defined format.

Additionally, some endpoint controller drivers mark the BAR succeeding a
"64-bit only BAR" as reserved, while some do not. By definition, a 64-bit
BAR uses the succeeding BAR for the upper 32-bits, so an EPF driver cannot
use a BAR succeeding a 64-bit BAR. Ensure that all endpoint controller
drivers are uniform, and actually describe a reserved BAR as reserved.

Signed-off-by: Niklas Cassel 
Reviewed-by: Kishon Vijay Abraham I 
---
 drivers/pci/controller/dwc/pci-imx6.c |  3 +-
 drivers/pci/controller/dwc/pci-keystone.c | 12 +++
 .../pci/controller/dwc/pci-layerscape-ep.c|  5 ++-
 drivers/pci/controller/dwc/pcie-keembay.c |  8 +++--
 drivers/pci/controller/dwc/pcie-rcar-gen4.c   |  4 ++-
 drivers/pci/controller/dwc/pcie-tegra194.c| 10 --
 drivers/pci/controller/dwc/pcie-uniphier-ep.c | 15 ++--
 drivers/pci/controller/pcie-rcar-ep.c | 14 +---
 drivers/pci/endpoint/functions/pci-epf-ntb.c  |  4 +--
 drivers/pci/endpoint/functions/pci-epf-test.c |  8 ++---
 drivers/pci/endpoint/functions/pci-epf-vntb.c |  2 +-
 drivers/pci/endpoint/pci-epc-core.c   | 32 +
 drivers/pci/endpoint/pci-epf-core.c   | 15 
 include/linux/pci-epc.h   | 34 +++
 14 files changed, 108 insertions(+), 58 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-imx6.c b/drivers/pci/controller/dwc/pci-imx6.c
index dc2c036ab28c..47a9a96484ed 100644
--- a/drivers/pci/controller/dwc/pci-imx6.c
+++ b/drivers/pci/controller/dwc/pci-imx6.c
@@ -1081,7 +1081,8 @@ static const struct pci_epc_features imx8m_pcie_epc_features = {
.linkup_notifier = false,
.msi_capable = true,
.msix_capable = false,
-   .reserved_bar = 1 << BAR_1 | 1 << BAR_3,
+   .bar[BAR_1] = { .type = BAR_RESERVED, },
+   .bar[BAR_3] = { .type = BAR_RESERVED, },
.align = SZ_64K,
 };
 
diff --git a/drivers/pci/controller/dwc/pci-keystone.c b/drivers/pci/controller/dwc/pci-keystone.c
index c0c62533a3f1..b2b93b4fa82d 100644
--- a/drivers/pci/controller/dwc/pci-keystone.c
+++ b/drivers/pci/controller/dwc/pci-keystone.c
@@ -924,12 +924,12 @@ static const struct pci_epc_features ks_pcie_am654_epc_features = {
.linkup_notifier = false,
.msi_capable = true,
.msix_capable = true,
-   .reserved_bar = 1 << BAR_0 | 1 << BAR_1,
-   .bar_fixed_64bit = 1 << BAR_0,
-   .bar_fixed_size[2] = SZ_1M,
-   .bar_fixed_size[3] = SZ_64K,
-   .bar_fixed_size[4] = 256,
-   .bar_fixed_size[5] = SZ_1M,
+   .bar[BAR_0] = { .type = BAR_RESERVED, .only_64bit = true, },
+   .bar[BAR_1] = { .type = BAR_RESERVED, },
+   .bar[BAR_2] = { .type = BAR_FIXED, .fixed_size = SZ_1M, },
+   .bar[BAR_3] = { .type = BAR_FIXED, .fixed_size = SZ_64K, },
+   .bar[BAR_4] = { .type = BAR_FIXED, .fixed_size = 256, },
+   .bar[BAR_5] = { .type = BAR_FIXED, .fixed_size = SZ_1M, },
.align = SZ_1M,
 };
 
diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c b/drivers/pci/controller/dwc/pci-layerscape-ep.c
index 2e398494e7c0..1f6ee1460ec2 100644
--- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
+++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
@@ -250,7 +250,10 @@ static int __init ls_pcie_ep_probe(struct platform_device *pdev)
pci->dev = dev;
pci->ops = pcie->drvdata->dw_pcie_ops;
 
-   ls_epc->bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4);
+   ls_epc->bar[BAR_2].only_64bit = true;
+   ls_epc->bar[BAR_3].type = BAR_RESERVED;
+   ls_epc->bar[BAR_4].only_64bit = true;
+   ls_epc->bar[BAR_5].type = BAR_RESERVED;
ls_epc->linkup_notifier = true;
 
pcie->pci = pci;
diff --git a/drivers/pci/controller/dwc/pcie-keembay.c b/drivers/pci/controller/dwc/pcie-keembay.c
index 208d3b0ba196..5e8e54f597dd 100644
--- a/drivers/pci/controller/dwc/pcie-keembay.c
+++ b/drivers/pci/controller/dwc/pcie-keembay.c
@@ -312,8 +312,12 @@ static const struct pci_epc_features keembay_pcie_epc_features = {
.linkup_notifier= false,
.msi_capable= true,
.msix_capable   = true,
-   .reserved_bar   = BIT(BAR_1) | BIT(BAR_3) | BIT(BAR_5),
-   .bar_fixed_64bit= BIT(BAR_0) | BIT(BAR_2) | BIT(BAR_4),
+   .bar[BAR_0] = { .only_64bit = true, },
+   .bar[BAR_1] = { .type = BAR_RESERVED, },
+   .bar[BAR_2] = { .only_64bit = true, },
+   .bar[BAR_3] = { .type = BAR_RESERVED, },
+   .bar[BAR_4] = { .only_64bit = true, },

[PATCH v2 0/2] PCI endpoint BAR hardware description cleanup

2024-02-16 Thread Niklas Cassel
The series is based on top of:
https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/log/?h=endpoint


Hello all,

This series cleans up the hardware description for PCI endpoint BARs.

The problems with the existing hardware description:
-The documentation is lackluster.
-Some of the names are confusingly similar, e.g. fixed_64bit and
 fixed_size, even though these are for completely unrelated things.
-The way that the BARs are defined in the endpoint controller drivers
 is messy, because the left hand side is not a BAR, so you can mark a
 BAR as e.g. both fixed size and reserved.

This series tries to address all the problems above.

Personally, I think that the code is now more readable, both in the
endpoint controller drivers and in pci-epc-core.c.

(I will be sending out a patch series that adds BAR_RESIZABLE to enum
pci_epc_bar_type in the coming week.)


Kind regards,
Niklas


Changes since V1:
-Picked up tags from Kishon and Mani.
-Improved commit message for patch 1/2 as suggested by Mani.
-Improved kdoc in patch 2/2 as suggested by Mani.


Niklas Cassel (2):
  PCI: endpoint: Clean up hardware description for BARs
  PCI: endpoint: Drop only_64bit on reserved BARs

 drivers/pci/controller/dwc/pci-imx6.c |  3 +-
 drivers/pci/controller/dwc/pci-keystone.c | 12 +++---
 .../pci/controller/dwc/pci-layerscape-ep.c|  5 ++-
 drivers/pci/controller/dwc/pcie-keembay.c |  8 +++-
 drivers/pci/controller/dwc/pcie-rcar-gen4.c   |  4 +-
 drivers/pci/controller/dwc/pcie-tegra194.c| 10 +++--
 drivers/pci/controller/dwc/pcie-uniphier-ep.c | 15 +--
 drivers/pci/controller/pcie-rcar-ep.c | 14 ---
 drivers/pci/endpoint/functions/pci-epf-ntb.c  |  4 +-
 drivers/pci/endpoint/functions/pci-epf-test.c |  8 ++--
 drivers/pci/endpoint/functions/pci-epf-vntb.c |  2 +-
 drivers/pci/endpoint/pci-epc-core.c   | 25 +---
 drivers/pci/endpoint/pci-epf-core.c   | 15 +++
 include/linux/pci-epc.h   | 39 ---
 14 files changed, 106 insertions(+), 58 deletions(-)

-- 
2.43.1



Re: [PATCH v6 12/18] arm64/mm: Wire up PTE_CONT for user mappings

2024-02-16 Thread Ryan Roberts
Hi Catalin,

Thanks for the review! Comments below...


On 16/02/2024 12:25, Catalin Marinas wrote:
> On Thu, Feb 15, 2024 at 10:31:59AM +, Ryan Roberts wrote:
>>  arch/arm64/mm/contpte.c  | 285 +++
> 
> Nitpick: I think most symbols in contpte.c can be EXPORT_SYMBOL_GPL().
> We don't expect them to be used by random out of tree modules. In fact,
> do we expect them to end up in modules at all? Most seem to be called
> from the core mm code.

The problem is that the contpte_* symbols are called from the ptep_* inline
functions. So where those inlines are called from modules, we need to make sure
the contpte_* symbols are available.

John Hubbard originally reported this problem against v1 and I enumerated all
the drivers that call into the ptep_* inlines here:
https://lore.kernel.org/linux-arm-kernel/b994ff89-1a1f-26ca-9479-b08c77f94...@arm.com/#t

So they definitely need to be exported. Perhaps we can tighten it to
EXPORT_SYMBOL_GPL(), but I was being cautious as I didn't want to break anything
out-of-tree. I'm not sure what the normal policy is? arm64 seems to use ~equal
amounts of both.

> 
>> +#define ptep_get_lockless ptep_get_lockless
>> +static inline pte_t ptep_get_lockless(pte_t *ptep)
>> +{
>> +pte_t pte = __ptep_get(ptep);
>> +
>> +if (likely(!pte_valid_cont(pte)))
>> +return pte;
>> +
>> +return contpte_ptep_get_lockless(ptep);
>> +}
> [...]
>> +pte_t contpte_ptep_get_lockless(pte_t *orig_ptep)
>> +{
>> +/*
>> + * Gather access/dirty bits, which may be populated in any of the ptes
>> + * of the contig range. We may not be holding the PTL, so any contiguous
>> + * range may be unfolded/modified/refolded under our feet. Therefore we
>> + * ensure we read a _consistent_ contpte range by checking that all ptes
>> + * in the range are valid and have CONT_PTE set, that all pfns are
>> + * contiguous and that all pgprots are the same (ignoring access/dirty).
>> + * If we find a pte that is not consistent, then we must be racing with
>> + * an update so start again. If the target pte does not have CONT_PTE
>> + * set then that is considered consistent on its own because it is not
>> + * part of a contpte range.
>> +*/
> 
> I can't get my head around this lockless API. Maybe it works fine (and
> may have been discussed already) but we should document what the races
> are, why it works, what the memory ordering requirements are. For
> example, the generic (well, x86 PAE) ptep_get_lockless() only needs to
> ensure that the low/high 32 bits of a pte are consistent and there are
> some ordering rules on how these are updated.
> 
> Does the arm64 implementation only need to be correct w.r.t. the
> access/dirty bits? Since we can read orig_ptep atomically, I assume the
> only other updates from unfolding would set the dirty/access bits.
> 
>> +
>> +pgprot_t orig_prot;
>> +unsigned long pfn;
>> +pte_t orig_pte;
>> +pgprot_t prot;
>> +pte_t *ptep;
>> +pte_t pte;
>> +int i;
>> +
>> +retry:
>> +orig_pte = __ptep_get(orig_ptep);
>> +
>> +if (!pte_valid_cont(orig_pte))
>> +return orig_pte;
>> +
>> +orig_prot = pte_pgprot(pte_mkold(pte_mkclean(orig_pte)));
>> +ptep = contpte_align_down(orig_ptep);
>> +pfn = pte_pfn(orig_pte) - (orig_ptep - ptep);
>> +
>> +for (i = 0; i < CONT_PTES; i++, ptep++, pfn++) {
>> +pte = __ptep_get(ptep);
>> +prot = pte_pgprot(pte_mkold(pte_mkclean(pte)));
> 
> We don't have any ordering guarantees in how the ptes in this range are
> read or written in the contpte_set_ptes() and the fold/unfold functions.
> We might not need them given all the other checks below but it's worth
> adding a comment.
> 
>> +
>> +if (!pte_valid_cont(pte) ||
>> +   pte_pfn(pte) != pfn ||
>> +   pgprot_val(prot) != pgprot_val(orig_prot))
>> +goto retry;
> 
> I think this also needs some comment. I get the !pte_valid_cont() check
> to attempt retrying when racing with unfolding. Are the other checks
> needed to detect re-folding with different protection or pfn?
> 
>> +
>> +if (pte_dirty(pte))
>> +orig_pte = pte_mkdirty(orig_pte);
>> +
>> +if (pte_young(pte))
>> +orig_pte = pte_mkyoung(orig_pte);
>> +}
> 
> After writing the comments above, I think I figured out that the whole
> point of this loop is to check that the ptes in the contig range are
> still consistent and the only variation allowed is the dirty/young
> state to be passed to the orig_pte returned. The original pte may have
> been updated by the time this loop finishes but I don't think it
> matters, it wouldn't be any different than reading a single pte and
> returning it while it is being updated.

Correct. The pte can be updated at any time, before, after or during the reads.
That was always the case. But now we have to cope with a 

Re: [PATCH v6 18/18] arm64/mm: Automatically fold contpte mappings

2024-02-16 Thread Catalin Marinas
On Thu, Feb 15, 2024 at 10:32:05AM +, Ryan Roberts wrote:
> There are situations where a change to a single PTE could cause the
> contpte block in which it resides to become foldable (i.e. could be
> repainted with the contiguous bit). Such situations arise, for example,
> when user space temporarily changes protections, via mprotect, for
> individual pages, such can be the case for certain garbage collectors.
> 
> We would like to detect when such a PTE change occurs. However this can
> be expensive due to the amount of checking required. Therefore only
> perform the checks when an individual PTE is modified via mprotect
> (ptep_modify_prot_commit() -> set_pte_at() -> set_ptes(nr=1)) and only
> when we are setting the final PTE in a contpte-aligned block.
> 
> Signed-off-by: Ryan Roberts 

Acked-by: Catalin Marinas 


Re: [PATCH v6 17/18] arm64/mm: __always_inline to improve fork() perf

2024-02-16 Thread Catalin Marinas
On Thu, Feb 15, 2024 at 10:32:04AM +, Ryan Roberts wrote:
> As set_ptes() and wrprotect_ptes() become a bit more complex, the
> compiler may choose not to inline them. But this is critical for fork()
> performance. So mark the functions, along with contpte_try_unfold()
> which is called by them, as __always_inline. This is worth ~1% on the
> fork() microbenchmark with order-0 folios (the common case).
> 
> Acked-by: Mark Rutland 
> Signed-off-by: Ryan Roberts 

Acked-by: Catalin Marinas 


Re: [PATCH v6 16/18] arm64/mm: Implement pte_batch_hint()

2024-02-16 Thread Catalin Marinas
On Thu, Feb 15, 2024 at 10:32:03AM +, Ryan Roberts wrote:
> When core code iterates over a range of ptes and calls ptep_get() for
> each of them, if the range happens to cover contpte mappings, the number
> of pte reads becomes amplified by a factor of the number of PTEs in a
> contpte block. This is because for each call to ptep_get(), the
> implementation must read all of the ptes in the contpte block to which
> it belongs to gather the access and dirty bits.
> 
> This causes a hotspot for fork(), as well as operations that unmap
> memory such as munmap(), exit and madvise(MADV_DONTNEED). Fortunately we
> can fix this by implementing pte_batch_hint() which allows their
> iterators to skip getting the contpte tail ptes when gathering the batch
> of ptes to operate on. This results in the number of PTE reads returning
> to 1 per pte.
> 
> Acked-by: Mark Rutland 
> Reviewed-by: David Hildenbrand 
> Tested-by: John Hubbard 
> Signed-off-by: Ryan Roberts 

Acked-by: Catalin Marinas 


Re: [PATCH v6 14/18] arm64/mm: Implement new [get_and_]clear_full_ptes() batch APIs

2024-02-16 Thread Catalin Marinas
On Thu, Feb 15, 2024 at 10:32:01AM +, Ryan Roberts wrote:
> Optimize the contpte implementation to fix some of the
> exit/munmap/dontneed performance regression introduced by the initial
> contpte commit. Subsequent patches will solve it entirely.
> 
> During exit(), munmap() or madvise(MADV_DONTNEED), mappings must be
> cleared. Previously this was done 1 PTE at a time. But the core-mm
> supports batched clear via the new [get_and_]clear_full_ptes() APIs. So
> let's implement those APIs and for fully covered contpte mappings, we no
> longer need to unfold the contpte. This significantly reduces unfolding
> operations, reducing the number of tlbis that must be issued.
> 
> Tested-by: John Hubbard 
> Signed-off-by: Ryan Roberts 

Acked-by: Catalin Marinas 


Re: [PATCH v6 13/18] arm64/mm: Implement new wrprotect_ptes() batch API

2024-02-16 Thread Catalin Marinas
On Thu, Feb 15, 2024 at 10:32:00AM +, Ryan Roberts wrote:
> Optimize the contpte implementation to fix some of the fork performance
> regression introduced by the initial contpte commit. Subsequent patches
> will solve it entirely.
> 
> During fork(), any private memory in the parent must be write-protected.
> Previously this was done 1 PTE at a time. But the core-mm supports
> batched wrprotect via the new wrprotect_ptes() API. So let's implement
> that API and for fully covered contpte mappings, we no longer need to
> unfold the contpte. This has 2 benefits:
> 
>   - reduced unfolding, reduces the number of tlbis that must be issued.
>   - The memory remains contpte-mapped ("folded") in the parent, so it
> continues to benefit from the more efficient use of the TLB after
> the fork.
> 
> The optimization to wrprotect a whole contpte block without unfolding is
> possible thanks to the tightening of the Arm ARM in respect to the
> definition and behaviour when 'Misprogramming the Contiguous bit'. See
> section D21194 at https://developer.arm.com/documentation/102105/ja-07/
> 
> Tested-by: John Hubbard 
> Signed-off-by: Ryan Roberts 

Acked-by: Catalin Marinas 


Re: [PATCH v6 12/18] arm64/mm: Wire up PTE_CONT for user mappings

2024-02-16 Thread Catalin Marinas
On Thu, Feb 15, 2024 at 10:31:59AM +, Ryan Roberts wrote:
>  arch/arm64/mm/contpte.c  | 285 +++

Nitpick: I think most symbols in contpte.c can be EXPORT_SYMBOL_GPL().
We don't expect them to be used by random out of tree modules. In fact,
do we expect them to end up in modules at all? Most seem to be called
from the core mm code.

> +#define ptep_get_lockless ptep_get_lockless
> +static inline pte_t ptep_get_lockless(pte_t *ptep)
> +{
> + pte_t pte = __ptep_get(ptep);
> +
> + if (likely(!pte_valid_cont(pte)))
> + return pte;
> +
> + return contpte_ptep_get_lockless(ptep);
> +}
[...]
> +pte_t contpte_ptep_get_lockless(pte_t *orig_ptep)
> +{
> + /*
> +  * Gather access/dirty bits, which may be populated in any of the ptes
> +  * of the contig range. We may not be holding the PTL, so any contiguous
> +  * range may be unfolded/modified/refolded under our feet. Therefore we
> +  * ensure we read a _consistent_ contpte range by checking that all ptes
> +  * in the range are valid and have CONT_PTE set, that all pfns are
> +  * contiguous and that all pgprots are the same (ignoring access/dirty).
> +  * If we find a pte that is not consistent, then we must be racing with
> +  * an update so start again. If the target pte does not have CONT_PTE
> +  * set then that is considered consistent on its own because it is not
> +  * part of a contpte range.
> +*/

I can't get my head around this lockless API. Maybe it works fine (and
may have been discussed already) but we should document what the races
are, why it works, what the memory ordering requirements are. For
example, the generic (well, x86 PAE) ptep_get_lockless() only needs to
ensure that the low/high 32 bits of a pte are consistent and there are
some ordering rules on how these are updated.

Does the arm64 implementation only need to be correct w.r.t. the
access/dirty bits? Since we can read orig_ptep atomically, I assume the
only other updates from unfolding would set the dirty/access bits.

> +
> + pgprot_t orig_prot;
> + unsigned long pfn;
> + pte_t orig_pte;
> + pgprot_t prot;
> + pte_t *ptep;
> + pte_t pte;
> + int i;
> +
> +retry:
> + orig_pte = __ptep_get(orig_ptep);
> +
> + if (!pte_valid_cont(orig_pte))
> + return orig_pte;
> +
> + orig_prot = pte_pgprot(pte_mkold(pte_mkclean(orig_pte)));
> + ptep = contpte_align_down(orig_ptep);
> + pfn = pte_pfn(orig_pte) - (orig_ptep - ptep);
> +
> + for (i = 0; i < CONT_PTES; i++, ptep++, pfn++) {
> + pte = __ptep_get(ptep);
> + prot = pte_pgprot(pte_mkold(pte_mkclean(pte)));

We don't have any ordering guarantees in how the ptes in this range are
read or written in the contpte_set_ptes() and the fold/unfold functions.
We might not need them given all the other checks below but it's worth
adding a comment.

> +
> + if (!pte_valid_cont(pte) ||
> +pte_pfn(pte) != pfn ||
> +pgprot_val(prot) != pgprot_val(orig_prot))
> + goto retry;

I think this also needs some comment. I get the !pte_valid_cont() check
to attempt retrying when racing with unfolding. Are the other checks
needed to detect re-folding with different protection or pfn?

> +
> + if (pte_dirty(pte))
> + orig_pte = pte_mkdirty(orig_pte);
> +
> + if (pte_young(pte))
> + orig_pte = pte_mkyoung(orig_pte);
> + }

After writing the comments above, I think I figured out that the whole
point of this loop is to check that the ptes in the contig range are
still consistent and the only variation allowed is the dirty/young
state to be passed to the orig_pte returned. The original pte may have
been updated by the time this loop finishes but I don't think it
matters, it wouldn't be any different than reading a single pte and
returning it while it is being updated.

If you can make this easier to parse (in a few years time) with an
additional patch adding some more comments, that would be great. For
this patch:

Reviewed-by: Catalin Marinas 

-- 
Catalin


Re: [PATCH 1/2] PCI: endpoint: Clean up hardware description for BARs

2024-02-16 Thread Manivannan Sadhasivam
On Fri, Feb 16, 2024 at 04:49:08PM +0530, Manivannan Sadhasivam wrote:
> On Sat, Feb 10, 2024 at 02:26:25AM +0100, Niklas Cassel wrote:
> > The hardware description for BARs is scattered in many different variables
> > in pci_epc_features. Some of these things are mutually exclusive, so it
> > can create confusion over which variable has precedence over another.
> > 
> > Improve the situation by creating a struct pci_epc_bar_desc, and a new
> > enum pci_epc_bar_type, and convert the endpoint controller drivers to use
> > this more well defined format.
> > 
> > Signed-off-by: Niklas Cassel 
> > ---
> >  drivers/pci/controller/dwc/pci-imx6.c |  3 +-
> >  drivers/pci/controller/dwc/pci-keystone.c | 12 +++
> >  .../pci/controller/dwc/pci-layerscape-ep.c|  5 ++-
> >  drivers/pci/controller/dwc/pcie-keembay.c |  8 +++--
> >  drivers/pci/controller/dwc/pcie-rcar-gen4.c   |  4 ++-
> >  drivers/pci/controller/dwc/pcie-tegra194.c| 10 --
> >  drivers/pci/controller/dwc/pcie-uniphier-ep.c | 15 ++--
> >  drivers/pci/controller/pcie-rcar-ep.c | 14 +---
> >  drivers/pci/endpoint/functions/pci-epf-ntb.c  |  4 +--
> >  drivers/pci/endpoint/functions/pci-epf-test.c |  8 ++---
> >  drivers/pci/endpoint/functions/pci-epf-vntb.c |  2 +-
> >  drivers/pci/endpoint/pci-epc-core.c   | 32 +
> >  drivers/pci/endpoint/pci-epf-core.c   | 15 
> >  include/linux/pci-epc.h   | 34 +++
> >  14 files changed, 108 insertions(+), 58 deletions(-)
> > 
> > diff --git a/drivers/pci/controller/dwc/pci-imx6.c 
> > b/drivers/pci/controller/dwc/pci-imx6.c
> > index dc2c036ab28c..47a9a96484ed 100644
> > --- a/drivers/pci/controller/dwc/pci-imx6.c
> > +++ b/drivers/pci/controller/dwc/pci-imx6.c
> > @@ -1081,7 +1081,8 @@ static const struct pci_epc_features imx8m_pcie_epc_features = {
> > .linkup_notifier = false,
> > .msi_capable = true,
> > .msix_capable = false,
> > -   .reserved_bar = 1 << BAR_1 | 1 << BAR_3,
> > +   .bar[BAR_1] = { .type = BAR_RESERVED, },
> > +   .bar[BAR_3] = { .type = BAR_RESERVED, },
> > .align = SZ_64K,
> >  };
> >  
> > diff --git a/drivers/pci/controller/dwc/pci-keystone.c 
> > b/drivers/pci/controller/dwc/pci-keystone.c
> > index c0c62533a3f1..b2b93b4fa82d 100644
> > --- a/drivers/pci/controller/dwc/pci-keystone.c
> > +++ b/drivers/pci/controller/dwc/pci-keystone.c
> > @@ -924,12 +924,12 @@ static const struct pci_epc_features ks_pcie_am654_epc_features = {
> > .linkup_notifier = false,
> > .msi_capable = true,
> > .msix_capable = true,
> > -   .reserved_bar = 1 << BAR_0 | 1 << BAR_1,
> > -   .bar_fixed_64bit = 1 << BAR_0,
> > -   .bar_fixed_size[2] = SZ_1M,
> > -   .bar_fixed_size[3] = SZ_64K,
> > -   .bar_fixed_size[4] = 256,
> > -   .bar_fixed_size[5] = SZ_1M,
> > +   .bar[BAR_0] = { .type = BAR_RESERVED, .only_64bit = true, },
> > +   .bar[BAR_1] = { .type = BAR_RESERVED, },
> > +   .bar[BAR_2] = { .type = BAR_FIXED, .fixed_size = SZ_1M, },
> > +   .bar[BAR_3] = { .type = BAR_FIXED, .fixed_size = SZ_64K, },
> > +   .bar[BAR_4] = { .type = BAR_FIXED, .fixed_size = 256, },
> > +   .bar[BAR_5] = { .type = BAR_FIXED, .fixed_size = SZ_1M, },
> > .align = SZ_1M,
> >  };
> >  
> > diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
> > b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > index 2e398494e7c0..1f6ee1460ec2 100644
> > --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > @@ -250,7 +250,10 @@ static int __init ls_pcie_ep_probe(struct platform_device *pdev)
> > pci->dev = dev;
> > pci->ops = pcie->drvdata->dw_pcie_ops;
> >  
> > -   ls_epc->bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4);
> > +   ls_epc->bar[BAR_2].only_64bit = true;
> > +   ls_epc->bar[BAR_3].type = BAR_RESERVED;
> 
> BAR_3 and BAR_4 were not reserved previously.
> 

Okay, looking at patch 2 makes it clear why you have marked it as such. But it
should've been mentioned in the commit message.

- Mani

-- 
மணிவண்ணன் சதாசிவம்


Re: [PATCH 1/2] PCI: endpoint: Clean up hardware description for BARs

2024-02-16 Thread Manivannan Sadhasivam
On Sat, Feb 10, 2024 at 02:26:25AM +0100, Niklas Cassel wrote:
> The hardware description for BARs is scattered in many different variables
> in pci_epc_features. Some of these things are mutually exclusive, so it
> can create confusion over which variable has precedence over another.
> 
> Improve the situation by creating a struct pci_epc_bar_desc, and a new
> enum pci_epc_bar_type, and convert the endpoint controller drivers to use
> this more well defined format.
> 
> Signed-off-by: Niklas Cassel 
> ---
>  drivers/pci/controller/dwc/pci-imx6.c |  3 +-
>  drivers/pci/controller/dwc/pci-keystone.c | 12 +++
>  .../pci/controller/dwc/pci-layerscape-ep.c|  5 ++-
>  drivers/pci/controller/dwc/pcie-keembay.c |  8 +++--
>  drivers/pci/controller/dwc/pcie-rcar-gen4.c   |  4 ++-
>  drivers/pci/controller/dwc/pcie-tegra194.c| 10 --
>  drivers/pci/controller/dwc/pcie-uniphier-ep.c | 15 ++--
>  drivers/pci/controller/pcie-rcar-ep.c | 14 +---
>  drivers/pci/endpoint/functions/pci-epf-ntb.c  |  4 +--
>  drivers/pci/endpoint/functions/pci-epf-test.c |  8 ++---
>  drivers/pci/endpoint/functions/pci-epf-vntb.c |  2 +-
>  drivers/pci/endpoint/pci-epc-core.c   | 32 +
>  drivers/pci/endpoint/pci-epf-core.c   | 15 
>  include/linux/pci-epc.h   | 34 +++
>  14 files changed, 108 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pci-imx6.c 
> b/drivers/pci/controller/dwc/pci-imx6.c
> index dc2c036ab28c..47a9a96484ed 100644
> --- a/drivers/pci/controller/dwc/pci-imx6.c
> +++ b/drivers/pci/controller/dwc/pci-imx6.c
> @@ -1081,7 +1081,8 @@ static const struct pci_epc_features imx8m_pcie_epc_features = {
>   .linkup_notifier = false,
>   .msi_capable = true,
>   .msix_capable = false,
> - .reserved_bar = 1 << BAR_1 | 1 << BAR_3,
> + .bar[BAR_1] = { .type = BAR_RESERVED, },
> + .bar[BAR_3] = { .type = BAR_RESERVED, },
>   .align = SZ_64K,
>  };
>  
> diff --git a/drivers/pci/controller/dwc/pci-keystone.c 
> b/drivers/pci/controller/dwc/pci-keystone.c
> index c0c62533a3f1..b2b93b4fa82d 100644
> --- a/drivers/pci/controller/dwc/pci-keystone.c
> +++ b/drivers/pci/controller/dwc/pci-keystone.c
> @@ -924,12 +924,12 @@ static const struct pci_epc_features ks_pcie_am654_epc_features = {
>   .linkup_notifier = false,
>   .msi_capable = true,
>   .msix_capable = true,
> - .reserved_bar = 1 << BAR_0 | 1 << BAR_1,
> - .bar_fixed_64bit = 1 << BAR_0,
> - .bar_fixed_size[2] = SZ_1M,
> - .bar_fixed_size[3] = SZ_64K,
> - .bar_fixed_size[4] = 256,
> - .bar_fixed_size[5] = SZ_1M,
> + .bar[BAR_0] = { .type = BAR_RESERVED, .only_64bit = true, },
> + .bar[BAR_1] = { .type = BAR_RESERVED, },
> + .bar[BAR_2] = { .type = BAR_FIXED, .fixed_size = SZ_1M, },
> + .bar[BAR_3] = { .type = BAR_FIXED, .fixed_size = SZ_64K, },
> + .bar[BAR_4] = { .type = BAR_FIXED, .fixed_size = 256, },
> + .bar[BAR_5] = { .type = BAR_FIXED, .fixed_size = SZ_1M, },
>   .align = SZ_1M,
>  };
>  
> diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
> b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> index 2e398494e7c0..1f6ee1460ec2 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> @@ -250,7 +250,10 @@ static int __init ls_pcie_ep_probe(struct platform_device *pdev)
>   pci->dev = dev;
>   pci->ops = pcie->drvdata->dw_pcie_ops;
>  
> - ls_epc->bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4);
> + ls_epc->bar[BAR_2].only_64bit = true;
> + ls_epc->bar[BAR_3].type = BAR_RESERVED;

BAR_3 and BAR_4 were not reserved previously.

> + ls_epc->bar[BAR_4].only_64bit = true;
> + ls_epc->bar[BAR_5].type = BAR_RESERVED;
>   ls_epc->linkup_notifier = true;
>  
>   pcie->pci = pci;
> diff --git a/drivers/pci/controller/dwc/pcie-keembay.c 
> b/drivers/pci/controller/dwc/pcie-keembay.c
> index 208d3b0ba196..5e8e54f597dd 100644
> --- a/drivers/pci/controller/dwc/pcie-keembay.c
> +++ b/drivers/pci/controller/dwc/pcie-keembay.c
> @@ -312,8 +312,12 @@ static const struct pci_epc_features keembay_pcie_epc_features = {
>   .linkup_notifier= false,
>   .msi_capable= true,
>   .msix_capable   = true,
> - .reserved_bar   = BIT(BAR_1) | BIT(BAR_3) | BIT(BAR_5),
> - .bar_fixed_64bit= BIT(BAR_0) | BIT(BAR_2) | BIT(BAR_4),
> + .bar[BAR_0] = { .only_64bit = true, },
> + .bar[BAR_1] = { .type = BAR_RESERVED, },
> + .bar[BAR_2] = { .only_64bit = true, },
> + .bar[BAR_3] = { .type = BAR_RESERVED, },
> + .bar[BAR_4] = { .only_64bit = true, },
> + .bar[BAR_5] = { .type = BAR_RESERVED, },
>   .align  = SZ_16K,
>  };
>  
> diff --git a/drivers/pci/controller/dwc/pcie-rcar-gen4.c 

Re: [kvm-unit-tests PATCH v4 8/8] migration: add a migration selftest

2024-02-16 Thread Thomas Huth

On 09/02/2024 10.11, Nicholas Piggin wrote:

Add a selftest for migration support in the guest library and test harness
code. It performs migrations in a tight loop to irritate races and bugs
in the test harness code.

Include the test in arm, s390, powerpc.

Acked-by: Claudio Imbrenda  (s390x)
Reviewed-by: Thomas Huth 
Signed-off-by: Nicholas Piggin 
---
  arm/Makefile.common  |  1 +
  arm/selftest-migration.c |  1 +
  arm/unittests.cfg|  6 ++


 Hi Nicholas,

I just gave the patches a try, but the arm test seems to fail for me: Only 
the first getchar() seems to wait for a character, all the subsequent ones 
don't wait anymore and just continue immediately ... is this working for 
you? Or do I need another patch on top?


 Thanks,
  Thomas




[PATCH 2/2] powerpc: Don't ignore errors from set_memory_{n}p() in __kernel_map_pages()

2024-02-16 Thread Christophe Leroy
set_memory_p() and set_memory_np() can fail.

As mentioned in linux/mm.h:

/*
 * To support DEBUG_PAGEALLOC architecture must ensure that
 * __kernel_map_pages() never fails
 */

So panic in case set_memory_p() or set_memory_np() fail
in __kernel_map_pages().

Link: https://github.com/KSPP/linux/issues/7
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/64/hash.h |  2 +-
 arch/powerpc/mm/book3s64/hash_utils.c |  3 ++-
 arch/powerpc/mm/pageattr.c| 10 +++---
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index 6e70ae511631..8f47ae79f2a6 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -269,7 +269,7 @@ int hash__create_section_mapping(unsigned long start, unsigned long end,
 int nid, pgprot_t prot);
 int hash__remove_section_mapping(unsigned long start, unsigned long end);
 
-void hash__kernel_map_pages(struct page *page, int numpages, int enable);
+int hash__kernel_map_pages(struct page *page, int numpages, int enable);
 
 #endif /* !__ASSEMBLY__ */
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
b/arch/powerpc/mm/book3s64/hash_utils.c
index 0626a25b0d72..01c3b4b65241 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -2172,7 +2172,7 @@ static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long lmi)
 mmu_kernel_ssize, 0);
 }
 
-void hash__kernel_map_pages(struct page *page, int numpages, int enable)
+int hash__kernel_map_pages(struct page *page, int numpages, int enable)
 {
unsigned long flags, vaddr, lmi;
int i;
@@ -2189,6 +2189,7 @@ void hash__kernel_map_pages(struct page *page, int numpages, int enable)
kernel_unmap_linear_page(vaddr, lmi);
}
local_irq_restore(flags);
+   return 0;
 }
 #endif /* CONFIG_DEBUG_PAGEALLOC || CONFIG_KFENCE */
 
diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
index 16b8d20d6ca8..62b678585878 100644
--- a/arch/powerpc/mm/pageattr.c
+++ b/arch/powerpc/mm/pageattr.c
@@ -106,17 +106,21 @@ int change_memory_attr(unsigned long addr, int numpages, long action)
 #ifdef CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC
 void __kernel_map_pages(struct page *page, int numpages, int enable)
 {
+   int err;
unsigned long addr = (unsigned long)page_address(page);
 
if (PageHighMem(page))
return;
 
if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && !radix_enabled())
-   hash__kernel_map_pages(page, numpages, enable);
+   err = hash__kernel_map_pages(page, numpages, enable);
else if (enable)
-   set_memory_p(addr, numpages);
+   err = set_memory_p(addr, numpages);
else
-   set_memory_np(addr, numpages);
+   err = set_memory_np(addr, numpages);
+
+   if (err)
+   panic("%s: set_memory_%sp() failed\n", __func__, enable ? "" : "n");
 }
 #endif
 #endif
-- 
2.43.0



[PATCH 1/2] powerpc: Refactor __kernel_map_pages()

2024-02-16 Thread Christophe Leroy
__kernel_map_pages() is almost identical for PPC32 and RADIX.

Refactor it.

On PPC32 it is not needed for KFENCE, but to keep it simple
just make it similar to PPC64.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 10 --
 arch/powerpc/include/asm/book3s/64/radix.h   |  2 --
 arch/powerpc/mm/book3s64/radix_pgtable.c | 14 --
 arch/powerpc/mm/pageattr.c   | 19 +++
 arch/powerpc/mm/pgtable_32.c | 15 ---
 5 files changed, 19 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 927d585652bc..62c43d3d80ec 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1027,16 +1027,6 @@ static inline void vmemmap_remove_mapping(unsigned long start,
 }
 #endif
 
-#if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE)
-static inline void __kernel_map_pages(struct page *page, int numpages, int enable)
-{
-   if (radix_enabled())
-   radix__kernel_map_pages(page, numpages, enable);
-   else
-   hash__kernel_map_pages(page, numpages, enable);
-}
-#endif
-
 static inline pte_t pmd_pte(pmd_t pmd)
 {
return __pte_raw(pmd_raw(pmd));
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h 
b/arch/powerpc/include/asm/book3s/64/radix.h
index 357e23a403d3..8f55ff74bb68 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -362,8 +362,6 @@ int radix__create_section_mapping(unsigned long start, unsigned long end,
 int radix__remove_section_mapping(unsigned long start, unsigned long end);
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
-void radix__kernel_map_pages(struct page *page, int numpages, int enable);
-
 #ifdef CONFIG_ARCH_WANT_OPTIMIZE_DAX_VMEMMAP
 #define vmemmap_can_optimize vmemmap_can_optimize
 bool vmemmap_can_optimize(struct vmem_altmap *altmap, struct dev_pagemap *pgmap);
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index c6a4ac766b2b..e16e2fd104c5 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1339,20 +1339,6 @@ void __ref radix__vmemmap_free(unsigned long start, unsigned long end,
 #endif
 #endif
 
-#if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE)
-void radix__kernel_map_pages(struct page *page, int numpages, int enable)
-{
-   unsigned long addr;
-
-   addr = (unsigned long)page_address(page);
-
-   if (enable)
-   set_memory_p(addr, numpages);
-   else
-   set_memory_np(addr, numpages);
-}
-#endif
-
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 
 unsigned long radix__pmd_hugepage_update(struct mm_struct *mm, unsigned long addr,
diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
index 421db7c4f2a4..16b8d20d6ca8 100644
--- a/arch/powerpc/mm/pageattr.c
+++ b/arch/powerpc/mm/pageattr.c
@@ -101,3 +101,22 @@ int change_memory_attr(unsigned long addr, int numpages, long action)
return apply_to_existing_page_range(_mm, start, size,
change_page_attr, (void *)action);
 }
+
+#if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE)
+#ifdef CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC
+void __kernel_map_pages(struct page *page, int numpages, int enable)
+{
+   unsigned long addr = (unsigned long)page_address(page);
+
+   if (PageHighMem(page))
+   return;
+
+   if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && !radix_enabled())
+   hash__kernel_map_pages(page, numpages, enable);
+   else if (enable)
+   set_memory_p(addr, numpages);
+   else
+   set_memory_np(addr, numpages);
+}
+#endif
+#endif
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index e94919853ca3..4be97b4a44f9 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -188,18 +188,3 @@ void mark_rodata_ro(void)
ptdump_check_wx();
 }
 #endif
-
-#if defined(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) && defined(CONFIG_DEBUG_PAGEALLOC)
-void __kernel_map_pages(struct page *page, int numpages, int enable)
-{
-   unsigned long addr = (unsigned long)page_address(page);
-
-   if (PageHighMem(page))
-   return;
-
-   if (enable)
-   set_memory_p(addr, numpages);
-   else
-   set_memory_np(addr, numpages);
-}
-#endif /* CONFIG_DEBUG_PAGEALLOC */
-- 
2.43.0



[PATCH] powerpc: Handle error in mark_rodata_ro() and mark_initmem_nx()

2024-02-16 Thread Christophe Leroy
mark_rodata_ro() and mark_initmem_nx() use functions that can
fail like set_memory_nx() and set_memory_ro(), leading to a not
protected kernel.

In case of failure, panic.

Link: https://github.com/KSPP/linux/issues/7
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/book3s32/mmu.c |  7 --
 arch/powerpc/mm/mmu_decl.h |  8 +++
 arch/powerpc/mm/nohash/8xx.c   | 33 +---
 arch/powerpc/mm/nohash/e500.c  | 10 ++---
 arch/powerpc/mm/pgtable_32.c   | 39 --
 5 files changed, 65 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 5445587bfe84..100f999871bc 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -193,7 +193,7 @@ static bool is_module_segment(unsigned long addr)
return true;
 }
 
-void mmu_mark_initmem_nx(void)
+int mmu_mark_initmem_nx(void)
 {
int nb = mmu_has_feature(MMU_FTR_USE_HIGH_BATS) ? 8 : 4;
int i;
@@ -230,9 +230,10 @@ void mmu_mark_initmem_nx(void)
 
mtsr(mfsr(i << 28) | 0x1000, i << 28);
}
+   return 0;
 }
 
-void mmu_mark_rodata_ro(void)
+int mmu_mark_rodata_ro(void)
 {
int nb = mmu_has_feature(MMU_FTR_USE_HIGH_BATS) ? 8 : 4;
int i;
@@ -245,6 +246,8 @@ void mmu_mark_rodata_ro(void)
}
 
update_bats();
+
+   return 0;
 }
 
 /*
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index 72341b9fb552..6107e4944509 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -160,11 +160,11 @@ static inline unsigned long p_block_mapped(phys_addr_t pa) { return 0; }
 #endif
 
 #if defined(CONFIG_PPC_BOOK3S_32) || defined(CONFIG_PPC_8xx) || defined(CONFIG_PPC_E500)
-void mmu_mark_initmem_nx(void);
-void mmu_mark_rodata_ro(void);
+int mmu_mark_initmem_nx(void);
+int mmu_mark_rodata_ro(void);
 #else
-static inline void mmu_mark_initmem_nx(void) { }
-static inline void mmu_mark_rodata_ro(void) { }
+static inline int mmu_mark_initmem_nx(void) { return 0; }
+static inline int mmu_mark_rodata_ro(void) { return 0; }
 #endif
 
 #ifdef CONFIG_PPC_8xx
diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
index 6be6421086ed..43d4842bb1c7 100644
--- a/arch/powerpc/mm/nohash/8xx.c
+++ b/arch/powerpc/mm/nohash/8xx.c
@@ -119,23 +119,26 @@ void __init mmu_mapin_immr(void)
PAGE_KERNEL_NCG, MMU_PAGE_512K, true);
 }
 
-static void mmu_mapin_ram_chunk(unsigned long offset, unsigned long top,
-   pgprot_t prot, bool new)
+static int mmu_mapin_ram_chunk(unsigned long offset, unsigned long top,
+  pgprot_t prot, bool new)
 {
unsigned long v = PAGE_OFFSET + offset;
unsigned long p = offset;
+   int err = 0;
 
WARN_ON(!IS_ALIGNED(offset, SZ_512K) || !IS_ALIGNED(top, SZ_512K));
 
-   for (; p < ALIGN(p, SZ_8M) && p < top; p += SZ_512K, v += SZ_512K)
-   __early_map_kernel_hugepage(v, p, prot, MMU_PAGE_512K, new);
-   for (; p < ALIGN_DOWN(top, SZ_8M) && p < top; p += SZ_8M, v += SZ_8M)
-   __early_map_kernel_hugepage(v, p, prot, MMU_PAGE_8M, new);
-   for (; p < ALIGN_DOWN(top, SZ_512K) && p < top; p += SZ_512K, v += SZ_512K)
-   __early_map_kernel_hugepage(v, p, prot, MMU_PAGE_512K, new);
+   for (; p < ALIGN(p, SZ_8M) && p < top && !err; p += SZ_512K, v += SZ_512K)
+   err = __early_map_kernel_hugepage(v, p, prot, MMU_PAGE_512K, new);
+   for (; p < ALIGN_DOWN(top, SZ_8M) && p < top && !err; p += SZ_8M, v += SZ_8M)
+   err = __early_map_kernel_hugepage(v, p, prot, MMU_PAGE_8M, new);
+   for (; p < ALIGN_DOWN(top, SZ_512K) && p < top && !err; p += SZ_512K, v += SZ_512K)
+   err = __early_map_kernel_hugepage(v, p, prot, MMU_PAGE_512K, new);
 
if (!new)
flush_tlb_kernel_range(PAGE_OFFSET + v, PAGE_OFFSET + top);
+
+   return err;
 }
 
 unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
@@ -166,27 +169,33 @@ unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
return top;
 }
 
-void mmu_mark_initmem_nx(void)
+int mmu_mark_initmem_nx(void)
 {
unsigned long etext8 = ALIGN(__pa(_etext), SZ_8M);
unsigned long sinittext = __pa(_sinittext);
	unsigned long boundary = strict_kernel_rwx_enabled() ? sinittext : etext8;
unsigned long einittext8 = ALIGN(__pa(_einittext), SZ_8M);
+   int err = 0;
 
if (!debug_pagealloc_enabled_or_kfence())
-   mmu_mapin_ram_chunk(boundary, einittext8, PAGE_KERNEL, false);
-   err = mmu_mapin_ram_chunk(boundary, einittext8, PAGE_KERNEL, false);
 
mmu_pin_tlb(block_mapped_ram, false);
+
+   return err;
 }
 
 #ifdef CONFIG_STRICT_KERNEL_RWX
-void mmu_mark_rodata_ro(void)
+int mmu_mark_rodata_ro(void)
 {
unsigned long 

[PATCH] powerpc/kprobes: Handle error returned by set_memory_rox()

2024-02-16 Thread Christophe Leroy
set_memory_rox() can fail.

If it fails, free the allocated memory and return NULL.

Link: https://github.com/KSPP/linux/issues/7
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/kprobes.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index b20ee72e873a..bbca90a5e2ec 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -134,10 +134,16 @@ void *alloc_insn_page(void)
if (!page)
return NULL;
 
-   if (strict_module_rwx_enabled())
-   set_memory_rox((unsigned long)page, 1);
+   if (strict_module_rwx_enabled()) {
+   int err = set_memory_rox((unsigned long)page, 1);
 
+   if (err)
+   goto error;
+   }
return page;
+error:
+   module_memfree(page);
+   return NULL;
 }
 
 int arch_prepare_kprobe(struct kprobe *p)
-- 
2.43.0



[PATCH] powerpc: Implement set_memory_rox()

2024-02-16 Thread Christophe Leroy
As on x86 and s390, add set_memory_rox() to avoid doing one pass
with set_memory_ro() and a second pass with set_memory_x().

See commit 60463628c9e0 ("x86/mm: Implement native set_memory_rox()")
and commit 22e99fa56443 ("s390/mm: implement set_memory_rox()") for
more information.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/set_memory.h | 7 +++
 arch/powerpc/mm/pageattr.c| 4 
 2 files changed, 11 insertions(+)

diff --git a/arch/powerpc/include/asm/set_memory.h b/arch/powerpc/include/asm/set_memory.h
index 7ebc807aa8cc..9a025b776a4b 100644
--- a/arch/powerpc/include/asm/set_memory.h
+++ b/arch/powerpc/include/asm/set_memory.h
@@ -8,6 +8,7 @@
 #define SET_MEMORY_X   3
 #define SET_MEMORY_NP  4   /* Set memory non present */
 #define SET_MEMORY_P   5   /* Set memory present */
+#define SET_MEMORY_ROX 6
 
 int change_memory_attr(unsigned long addr, int numpages, long action);
 
@@ -41,4 +42,10 @@ static inline int set_memory_p(unsigned long addr, int numpages)
return change_memory_attr(addr, numpages, SET_MEMORY_P);
 }
 
+static inline int set_memory_rox(unsigned long addr, int numpages)
+{
+   return change_memory_attr(addr, numpages, SET_MEMORY_ROX);
+}
+#define set_memory_rox set_memory_rox
+
 #endif
diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
index 6163e484bc6d..421db7c4f2a4 100644
--- a/arch/powerpc/mm/pageattr.c
+++ b/arch/powerpc/mm/pageattr.c
@@ -38,6 +38,10 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
/* Don't clear DIRTY bit */
	pte_update_delta(ptep, addr, _PAGE_KERNEL_RW & ~_PAGE_DIRTY, _PAGE_KERNEL_RO);
		break;
break;
+   case SET_MEMORY_ROX:
+   /* Don't clear DIRTY bit */
+   pte_update_delta(ptep, addr, _PAGE_KERNEL_RW & ~_PAGE_DIRTY, _PAGE_KERNEL_ROX);
+   break;
case SET_MEMORY_RW:
pte_update_delta(ptep, addr, _PAGE_KERNEL_RO, _PAGE_KERNEL_RW);
break;
-- 
2.43.0



[PATCH] powerpc: Use user_mode() macro when possible

2024-02-16 Thread Christophe Leroy
There is a nice macro to check user mode.

Use it instead of open-coding an AND with MSR_PR, to improve
readability and avoid having to comment what that masking is for.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/interrupt.h |  2 +-
 arch/powerpc/kernel/syscall.c|  2 +-
 arch/powerpc/kernel/traps.c  |  4 ++--
 arch/powerpc/lib/sstep.c | 23 +++
 arch/powerpc/perf/core-book3s.c  |  2 +-
 arch/powerpc/xmon/xmon.c |  4 ++--
 6 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
index a4196ab1d016..7b610864b364 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -97,7 +97,7 @@ DECLARE_STATIC_KEY_FALSE(interrupt_exit_not_reentrant);
 
 static inline bool is_implicit_soft_masked(struct pt_regs *regs)
 {
-   if (regs->msr & MSR_PR)
+   if (user_mode(regs))
return false;
 
if (regs->nip >= (unsigned long)__end_soft_masked)
diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c
index 77fedb190c93..f6f868e817e6 100644
--- a/arch/powerpc/kernel/syscall.c
+++ b/arch/powerpc/kernel/syscall.c
@@ -31,7 +31,7 @@ notrace long system_call_exception(struct pt_regs *regs, unsigned long r0)
user_exit_irqoff();
 
BUG_ON(regs_is_unrecoverable(regs));
-   BUG_ON(!(regs->msr & MSR_PR));
+   BUG_ON(!user_mode(regs));
BUG_ON(arch_irq_disabled_regs(regs));
 
 #ifdef CONFIG_PPC_PKEY
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 11e062b47d3f..f23430adb68a 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -404,7 +404,7 @@ noinstr void hv_nmi_check_nonrecoverable(struct pt_regs *regs)
return;
if (!(regs->msr & MSR_HV))
return;
-   if (regs->msr & MSR_PR)
+   if (user_mode(regs))
return;
 
/*
@@ -1510,7 +1510,7 @@ static void do_program_check(struct pt_regs *regs)
if (!is_kernel_addr(bugaddr) && !(regs->msr & MSR_IR))
bugaddr += PAGE_OFFSET;
 
-   if (!(regs->msr & MSR_PR) &&  /* not user-mode */
+   if (!user_mode(regs) &&
report_bug(bugaddr, regs) == BUG_TRAP_TYPE_WARN) {
regs_add_return_ip(regs, 4);
return;
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 5766180f5380..e65f3fb68d06 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1429,7 +1429,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
return 1;
 
case 18:/* rfid, scary */
-   if (regs->msr & MSR_PR)
+   if (user_mode(regs))
goto priv;
op->type = RFI;
return 0;
@@ -1742,13 +1742,13 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
return 1;
 #endif
case 83:/* mfmsr */
-   if (regs->msr & MSR_PR)
+   if (user_mode(regs))
goto priv;
op->type = MFMSR;
op->reg = rd;
return 0;
case 146:   /* mtmsr */
-   if (regs->msr & MSR_PR)
+   if (user_mode(regs))
goto priv;
op->type = MTMSR;
op->reg = rd;
@@ -1756,7 +1756,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
return 0;
 #ifdef CONFIG_PPC64
case 178:   /* mtmsrd */
-   if (regs->msr & MSR_PR)
+   if (user_mode(regs))
goto priv;
op->type = MTMSR;
op->reg = rd;
@@ -3437,14 +3437,14 @@ int emulate_loadstore(struct pt_regs *regs, struct instruction_op *op)
 * stored in the thread_struct.  If the instruction is in
 * the kernel, we must not touch the state in the thread_struct.
 */
-   if (!(regs->msr & MSR_PR) && !(regs->msr & MSR_FP))
+   if (!user_mode(regs) && !(regs->msr & MSR_FP))
return 0;
err = do_fp_load(op, ea, regs, cross_endian);
break;
 #endif
 #ifdef CONFIG_ALTIVEC
case LOAD_VMX:
-   if (!(regs->msr & MSR_PR) && !(regs->msr & MSR_VEC))
+   if (!user_mode(regs) && !(regs->msr & MSR_VEC))
return 0;
err = do_vec_load(op->reg, ea, size, regs, cross_endian);
break;
@@ -3459,7 +3459,7 

[PATCH] powerpc/trace: Restrict hash_fault trace event to HASH MMU

2024-02-16 Thread Christophe Leroy
'perf list' on powerpc 8xx shows an event named "1:hash_fault".

This event is pointless there, because trace_hash_fault() is only
called from mm/book3s64/hash_utils.c, which is not built for the 8xx.

Only define it when CONFIG_PPC_64S_HASH_MMU is selected.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/trace.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/trace.h b/arch/powerpc/include/asm/trace.h
index 82cc2c6704e6..d9ac3a4f46e1 100644
--- a/arch/powerpc/include/asm/trace.h
+++ b/arch/powerpc/include/asm/trace.h
@@ -267,6 +267,7 @@ TRACE_EVENT_FN(opal_exit,
 );
 #endif
 
+#ifdef CONFIG_PPC_64S_HASH_MMU
 TRACE_EVENT(hash_fault,
 
	TP_PROTO(unsigned long addr, unsigned long access, unsigned long trap),
@@ -286,7 +287,7 @@ TRACE_EVENT(hash_fault,
	TP_printk("hash fault with addr 0x%lx and access = 0x%lx trap = 0x%lx",
  __entry->addr, __entry->access, __entry->trap)
 );
-
+#endif
 
 TRACE_EVENT(tlbie,
 
-- 
2.43.0