RE: [PATCH v3 1/3] PCI/AER: Store UNCOR_STATUS bits that might be ANFE in aer_err_info

2024-04-22 Thread Duan, Zhenzhong



>-Original Message-
>From: Jonathan Cameron 
>Subject: Re: [PATCH v3 1/3] PCI/AER: Store UNCOR_STATUS bits that might
>be ANFE in aer_err_info
>
>On Wed, 17 Apr 2024 14:14:05 +0800
>Zhenzhong Duan  wrote:
>
>> In some cases the detector of a Non-Fatal Error(NFE) is not the most
>> appropriate agent to determine the type of the error. For example,
>> when software performs a configuration read from a non-existent
>> device or Function, completer will send an ERR_NONFATAL Message.
>> On some platforms, ERR_NONFATAL results in a System Error, which
>> breaks normal software probing.
>>
>> Advisory Non-Fatal Error(ANFE) is a special case that can be used
>> in above scenario. It is predominantly determined by the role of the
>> detecting agent (Requester, Completer, or Receiver) and the specific
>> error. In such cases, an agent with AER signals the NFE (if enabled)
>> by sending an ERR_COR Message as an advisory to software, instead of
>> sending ERR_NONFATAL.
>>
>> When processing an ANFE, ideally both correctable error(CE) status and
>> uncorrectable error(UE) status should be cleared. However, there is no
>> way to fully identify the UE associated with ANFE. Even worse, a Fatal
>> Error(FE) or Non-Fatal Error(NFE) may set the same UE status bit as
>> ANFE. Treating an ANFE as NFE will reproduce above mentioned issue,
>> i.e., breaking softwore probing; treating NFE as ANFE will make us
>> ignoring some UEs which need active recover operation. To avoid clearing
>> UEs that are not ANFE by accident, the most conservative route is taken
>> here: If any of the FE/NFE Detected bits is set in Device Status, do not
>> touch UE status, they should be cleared later by the UE handler. Otherwise,
>> a specific set of UEs that may be raised as ANFE according to the PCIe
>> specification will be cleared if their corresponding severity is Non-Fatal.
>>
>> To achieve above purpose, store UNCOR_STATUS bits that might be ANFE
>> in aer_err_info.anfe_status. So that those bits could be printed and
>> processed later.
>>
>> Tested-by: Yudong Wang 
>> Co-developed-by: "Wang, Qingshun" 
>> Signed-off-by: "Wang, Qingshun" 
>> Signed-off-by: Zhenzhong Duan 
>> ---
>>  drivers/pci/pci.h  |  1 +
>>  drivers/pci/pcie/aer.c | 45
>++
>>  2 files changed, 46 insertions(+)
>>
>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>> index 17fed1846847..3f9eb807f9fd 100644
>> --- a/drivers/pci/pci.h
>> +++ b/drivers/pci/pci.h
>> @@ -412,6 +412,7 @@ struct aer_err_info {
>>
>>  unsigned int status;/* COR/UNCOR Error Status */
>>  unsigned int mask;  /* COR/UNCOR Error Mask */
>> +unsigned int anfe_status;   /* UNCOR Error Status for ANFE */
>>  struct pcie_tlp_log tlp;/* TLP Header */
>>  };
>>
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index ac6293c24976..27364ab4b148 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -107,6 +107,12 @@ struct aer_stats {
>>  PCI_ERR_ROOT_MULTI_COR_RCV |
>   \
>>  PCI_ERR_ROOT_MULTI_UNCOR_RCV)
>>
>> +#define AER_ERR_ANFE_UNC_MASK
>   (PCI_ERR_UNC_POISON_TLP |   \
>> +PCI_ERR_UNC_COMP_TIME |
>   \
>> +PCI_ERR_UNC_COMP_ABORT |
>   \
>> +PCI_ERR_UNC_UNX_COMP |
>   \
>> +PCI_ERR_UNC_UNSUP)
>> +
>>  static int pcie_aer_disable;
>>  static pci_ers_result_t aer_root_reset(struct pci_dev *dev);
>>
>> @@ -1196,6 +1202,41 @@ void aer_recover_queue(int domain, unsigned
>int bus, unsigned int devfn,
>>  EXPORT_SYMBOL_GPL(aer_recover_queue);
>>  #endif
>>
>> +static void anfe_get_uc_status(struct pci_dev *dev, struct aer_err_info
>*info)
>> +{
>> +u32 uncor_mask, uncor_status;
>> +u16 device_status;
>> +int aer = dev->aer_cap;
>> +
>> +if (pcie_capability_read_word(dev, PCI_EXP_DEVSTA,
>_status))
>> +return;
>> +/*
>> + * Take the most conservative route here. If there are
>> + * Non-Fatal/Fatal errors detected, do not assume any
>> + * bit in uncor_status is set by ANFE.
>> + */
>> +if (device_status & (PCI_EXP_DEVSTA_NFED | PCI_EXP_DEVSTA_FED))
>> +return;
>> +
>
>Is there not a race here?  If we happen to get either an NFED or FED
>between the read of device_status above and here we might pick up a status
>that corresponds to that (and hence clear something we should not).

In this scenario, info->anfe_status is 0.

>
>Or am I missing that race being close somewhere?

The bits leading to NFED or FED is masked out when assigning info->anfe_status.
Bits for FED is masked out by ~info->severity,
bit for NFED is masked out by AER_ERR_ANFE_UNC_MASK.

So we never clear status bits for NFED or FED in ANFE handler.

See below assignment of info->anfe_status.

Thanks

Re: [RFC PATCH 2/2] objtool/powerpc: Enhance objtool to fixup alternate feature relative addresses

2024-04-22 Thread Nathan Chancellor
Hi Sathvika,

On Mon, Apr 22, 2024 at 02:52:06PM +0530, Sathvika Vasireddy wrote:
> Implement build-time fixup of alternate feature relative addresses for
> the out-of-line (else) patch code. Initial posting to achieve the same
> using another tool can be found at [1]. Idea is to implement this using
> objtool instead of introducing another tool since it already has elf
> parsing and processing covered.
> 
> Introduce --ftr-fixup as an option to objtool to do feature fixup at
> build-time.
> 
> Couple of issues and warnings encountered while implementing feature
> fixup using objtool are as follows:
> 
> 1. libelf is creating corrupted vmlinux file after writing necessary
> changes to the file. Due to this, kexec is not able to load new
> kernel.
> 
> It gives the following error:
> ELF Note corrupted !
> Cannot determine the file type of vmlinux
> 
> To fix this issue, after opening vmlinux file, make a call to
> elf_flagelf (e, ELF_C_SET, ELF_F_LAYOUT). This instructs libelf not
> to touch the segment and section layout. It informs the library
> that the application will take responsibility for the layout of the
> file and that the library should not insert any padding between
> sections.
> 
> 2. Fix can't find starting instruction warnings when run on vmlinux
> 
> Objtool throws a lot of can't find starting instruction warnings
> when run on vmlinux with --ftr-fixup option.
> 
> These warnings are seen because find_insn() function looks for
> instructions at offsets that are relative to the start of the section.
> In case of individual object files (.o), there are no can't find
> starting instruction warnings seen because the actual offset
> associated with an instruction is itself a relative offset since the
> sections start at offset 0x0.
> 
> However, in case of vmlinux, find_insn() function fails to find
> instructions at the actual offset associated with an instruction
> since the sections in vmlinux do not start at offset 0x0. Due to
> this, find_insn() will look for absolute offset and not the relative
> offset. This is resulting in a lot of can't find starting instruction
> warnings when objtool is run on vmlinux.
> 
> To fix this, pass offset that is relative to the start of the section
> to find_insn().
> 
> find_insn() is also looking for symbols of size 0. But, objtool does
> not store empty STT_NOTYPE symbols in the rbtree. Due to this,
> for empty symbols, objtool is throwing can't find starting
> instruction warnings. Fix this by ignoring symbols that are of
> size 0 since objtool does not add them to the rbtree.
> 
> 3. Objtool is throwing unannotated intra-function call warnings
> when run on vmlinux with --ftr-fixup option.
> 
> One such example:
> 
> vmlinux: warning: objtool: .text+0x3d94:
> unannotated intra-function call
> 
> .text + 0x3d94 = c0008000 + 3d94 = c00081d4
> 
> c00081d4: 45 24 02 48  bl c002a618
> 
> 
> c002a610 :
> c002a610:   0e 01 4c 3c addis   r2,r12,270
> c002a610: R_PPC64_REL16_HA.TOC.
> c002a614:   f0 6c 42 38 addir2,r2,27888
> c002a614: R_PPC64_REL16_LO.TOC.+0x4
> c002a618:   a6 02 08 7c mflrr0
> 
> This is happening because we should be looking for destination
> symbols that are at absolute offsets instead of relative offsets.
> After fixing dest_off to point to absolute offset, there are still
> a lot of these warnings shown.
> 
> In the above example, objtool is computing the destination
> offset to be c002a618, which points to a completely
> different instruction. find_call_destination() is looking for this
> offset and failing. Instead, we should be looking for destination
> offset c002a610 which points to system_reset_exception
> function.
> 
> Even after fixing the way destination offset is computed, and
> after looking for dest_off - 0x8 in cases where the original offset
> is not found, there are still a lot of unannotated intra-function
> call warnings generated. This is due to symbols that are not
> properly annotated.
> 
> So, for now, as a hack to curb these warnings, do not emit
> unannotated intra-function call warnings when objtool is run
> with --ftr-fixup option.
> 
> TODO:
> This patch enables build time feature fixup only for powerpc little
> endian configs. There are boot failures with big endian configs.
> Posting this as an initial RFC to get some review comments while I work
> on big endian issues.
> 
> [1]
> https://lore.kernel.org/linuxppc-dev/20170521010130.13552-1-npig...@gmail.com/
> 
> Co-developed-by: Nicholas Piggin 
> Signed-off-by: Nicholas Piggin 
> Signed-off-by: Sathvika Vasireddy 

When I build this series with LLVM 14 [1] (due to an issue I report
below), I am getting a crash when CONFIG_FTR_FIXUP_SELFTEST is disabled.

diff --git a/arch/powerpc/configs/ppc64_defconfig 

Re: [RFC PATCH 1/2] objtool: Run objtool only if either of the config options are selected

2024-04-22 Thread Sathvika Vasireddy

Hi Masahiro, thanks for reviewing.

On 4/22/24 5:39 PM, Masahiro Yamada wrote:

On Mon, Apr 22, 2024 at 6:25 PM Sathvika Vasireddy  wrote:

Currently, when objtool is enabled and none of the supported options
are triggered, kernel build errors out with the below error:
error: objtool: At least one command required.


Then, I think CONFIG_OBJTOOL should be disabled.
A subsequent patch introduces --ftr-fixup as an option to objtool to do 
feature fixup at build-time via CONFIG_HAVE_OBJTOOL_FTR_FIXUP option. If 
CONFIG_OBJTOOL is not selected, then objtool cannot be used to pass 
--ftr-fixup option.


In cases where none of the supported options (like --mcount on powerpc 
for example) is triggered, but still require --ftr-fixup option to be 
passed to objtool, we see "error: objtool: At least one command 
required" errors. So, to address this, run only when either of the 
config options are selected.


Thanks,
Sathvika

[PATCH v2] powerpc/pseries/iommu: LPAR panics during boot up with a frozen PE

2024-04-22 Thread Gaurav Batra
At the time of LPAR boot up, partition firmware provides Open Firmware
property ibm,dma-window for the PE. This property is provided on the PCI
bus the PE is attached to.

There are execptions where the partition firmware might not provide this
property for the PE at the time of LPAR boot up. One of the scenario is
where the firmware has frozen the PE due to some error condition. This
PE is frozen for 24 hours or unless the whole system is reinitialized.

Within this time frame, if the LPAR is booted, the frozen PE will be
presented to the LPAR but ibm,dma-window property could be missing.

Today, under these circumstances, the LPAR oopses with NULL pointer
dereference, when configuring the PCI bus the PE is attached to.

BUG: Kernel NULL pointer dereference on read at 0x00c8
Faulting instruction address: 0xc01024c0
Oops: Kernel access of bad area, sig: 7 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
Supported: Yes
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1
Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf06 of:IBM,FW1060.00 
(NM1060_023) hv:phyp pSeries
NIP:  c01024c0 LR: c01024b0 CTR: c0102450
REGS: c37db5c0 TRAP: 0300   Not tainted  (6.4.0-150600.9-default)
MSR:  82009033   CR: 28000822  XER: 
CFAR: c010254c DAR: 00c8 DSISR: 0008 IRQMASK: 0
...
NIP [c01024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0
LR [c01024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0
Call Trace:
pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable)
pcibios_setup_bus_self+0x1c0/0x370
__of_scan_bus+0x2f8/0x330
pcibios_scan_phb+0x280/0x3d0
pcibios_init+0x88/0x12c
do_one_initcall+0x60/0x320
kernel_init_freeable+0x344/0x3e4
kernel_init+0x34/0x1d0
ret_from_kernel_user_thread+0x14/0x1c

Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of 
ibm,dma-window")
Signed-off-by: Gaurav Batra 
---
 arch/powerpc/platforms/pseries/iommu.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index e8c4129697b1..b1e6d275cda9 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -786,8 +786,16 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus 
*bus)
 * parent bus. During reboot, there will be ibm,dma-window property to
 * define DMA window. For kdump, there will at least be default window 
or DDW
 * or both.
+* There is an exception to the above. In case the PE goes into frozen
+* state, firmware may not provide ibm,dma-window property at the time
+* of LPAR boot up.
 */
 
+   if (!pdn) {
+   pr_debug("  no ibm,dma-window property !\n");
+   return;
+   }
+
ppci = PCI_DN(pdn);
 
pr_debug("  parent is %pOF, iommu_table: 0x%p\n",

base-commit: 0bbac3facb5d6cc0171c45c9873a2dc96bea9680
-- 
2.39.3 (Apple Git-146)



[PATCH v10 3/3] Documentation/powerpc: update fadump implementation details

2024-04-22 Thread Sourabh Jain
The patch titled ("powerpc: make fadump resilient with memory add/remove
events") has made significant changes to the implementation of fadump,
particularly on elfcorehdr creation and fadump crash info header
structure. Therefore, updating the fadump implementation documentation
to reflect those changes.

Following updates are done to firmware assisted dump documentation:

1. The elfcorehdr is no longer stored after fadump HDR in the reserved
   dump area. Instead, the second kernel dynamically allocates memory
   for the elfcorehdr within the address range from 0 to the boot memory
   size. Therefore, update figures 1 and 2 of Memory Reservation during
   the first and second kernels to reflect this change.

2. A version field has been added to the fadump header to manage the
   future changes to fadump crash info header structure without changing
   the fadump header magic number in the future. Therefore, remove the
   corresponding TODO from the document.

Signed-off-by: Sourabh Jain 
Cc: Aditya Gupta 
Cc: Aneesh Kumar K.V 
Cc: Hari Bathini 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Naveen N Rao 
---
 .../arch/powerpc/firmware-assisted-dump.rst   | 91 +--
 1 file changed, 42 insertions(+), 49 deletions(-)

diff --git a/Documentation/arch/powerpc/firmware-assisted-dump.rst 
b/Documentation/arch/powerpc/firmware-assisted-dump.rst
index e363fc48529a..7e37aadd1f77 100644
--- a/Documentation/arch/powerpc/firmware-assisted-dump.rst
+++ b/Documentation/arch/powerpc/firmware-assisted-dump.rst
@@ -134,12 +134,12 @@ that are run. If there is dump data, then the
 memory is held.
 
 If there is no waiting dump data, then only the memory required to
-hold CPU state, HPTE region, boot memory dump, FADump header and
-elfcore header, is usually reserved at an offset greater than boot
-memory size (see Fig. 1). This area is *not* released: this region
-will be kept permanently reserved, so that it can act as a receptacle
-for a copy of the boot memory content in addition to CPU state and
-HPTE region, in the case a crash does occur.
+hold CPU state, HPTE region, boot memory dump, and FADump header is
+usually reserved at an offset greater than boot memory size (see Fig. 1).
+This area is *not* released: this region will be kept permanently
+reserved, so that it can act as a receptacle for a copy of the boot
+memory content in addition to CPU state and HPTE region, in the case
+a crash does occur.
 
 Since this reserved memory area is used only after the system crash,
 there is no point in blocking this significant chunk of memory from
@@ -153,22 +153,22 @@ that were present in CMA region::
 
   o Memory Reservation during first kernel
 
-  Low memory Top of memory
-  0boot memory size   |<--- Reserved dump area --->|   |
-  |   |   |Permanent Reservation   |   |
-  V   V   ||   V
-  +---+-/ /---+---++---+-+-++--+
-  |   |   |///||  DUMP | HDR | ELF ||  |
-  +---+-/ /---+---++---+-+-++--+
-|   ^^ ^  ^   ^
-|   || |  |   |
-\  CPU  HPTE   /  |   |
- --   |   |
-  Boot memory content gets transferred|   |
-  to reserved area by firmware at the |   |
-  time of crash.  |   |
-  FADump Header   |
-   (meta area)|
+  Low memory  Top of memory
+  0boot memory size   |<-- Reserved dump area ->| |
+  |   |   |  Permanent Reservation  | |
+  V   V   | | V
+  +---+-/ /---+---++---+---++-+
+  |   |   |///||DUMP   |  HDR  || |
+  +---+-/ /---+---++---+---++-+
+|   ^^   ^ ^  ^
+|   ||   | |  |
+\  CPU  HPTE / |  |
+   |  |
+  Boot memory content gets transferred |  |
+  to reserved area by firmware at the  |  |
+  time of crash.   |  |
+   FADump Header  |
+(meta area)   |
   |
   |
   Metadata: This area holds a metadata structure whose
@@ -186,13 +186,20 @@ that were present in 

[PATCH v10 2/3] powerpc/fadump: add hotplug_ready sysfs interface

2024-04-22 Thread Sourabh Jain
The elfcorehdr describes the CPUs and memory of the crashed kernel to
the kernel that captures the dump, known as the second or fadump kernel.
The elfcorehdr needs to be updated if the system's memory changes due to
memory hotplug or online/offline events.

Currently, memory hotplug events are monitored in userspace by udev
rules, and fadump is re-registered, which recreates the elfcorehdr with
the latest available memory in the system.

However, the previous patch ("powerpc: make fadump resilient with memory
add/remove events") moved the creation of elfcorehdr to the second or
fadump kernel. This eliminates the need to regenerate the elfcorehdr
during memory hotplug or online/offline events.

Create a sysfs entry at /sys/kernel/fadump/hotplug_ready to let
userspace know that fadump re-registration is not required for memory
add/remove events.

Signed-off-by: Sourabh Jain 
Cc: Aditya Gupta 
Cc: "Aneesh Kumar K.V" 
Cc: Hari Bathini 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Naveen N Rao 
---
 Documentation/ABI/testing/sysfs-kernel-fadump | 11 +++
 arch/powerpc/kernel/fadump.c  | 14 ++
 2 files changed, 25 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump 
b/Documentation/ABI/testing/sysfs-kernel-fadump
index 8f7a64a81783..c586054657d6 100644
--- a/Documentation/ABI/testing/sysfs-kernel-fadump
+++ b/Documentation/ABI/testing/sysfs-kernel-fadump
@@ -38,3 +38,14 @@ Contact: linuxppc-dev@lists.ozlabs.org
 Description:   read only
Provide information about the amount of memory reserved by
FADump to save the crash dump in bytes.
+
+What:  /sys/kernel/fadump/hotplug_ready
+Date:  Apr 2024
+Contact:   linuxppc-dev@lists.ozlabs.org
+Description:   read only
+   Kdump udev rule re-registers fadump on memory add/remove events,
+   primarily to update the elfcorehdr. This sysfs indicates the
+   kdump udev rule that fadump re-registration is not required on
+   memory add/remove events because elfcorehdr is now prepared in
+   the second/fadump kernel.
+User:  kexec-tools
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 35254fc1516b..dfab452e947b 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1442,6 +1442,18 @@ static ssize_t enabled_show(struct kobject *kobj,
return sprintf(buf, "%d\n", fw_dump.fadump_enabled);
 }
 
+/*
+ * /sys/kernel/fadump/hotplug_ready sysfs node returns 1, which inidcates
+ * to usersapce that fadump re-registration is not required on memory
+ * hotplug events.
+ */
+static ssize_t hotplug_ready_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+   return sprintf(buf, "%d\n", 1);
+}
+
 static ssize_t mem_reserved_show(struct kobject *kobj,
 struct kobj_attribute *attr,
 char *buf)
@@ -1514,11 +1526,13 @@ static struct kobj_attribute release_attr = 
__ATTR_WO(release_mem);
 static struct kobj_attribute enable_attr = __ATTR_RO(enabled);
 static struct kobj_attribute register_attr = __ATTR_RW(registered);
 static struct kobj_attribute mem_reserved_attr = __ATTR_RO(mem_reserved);
+static struct kobj_attribute hotplug_ready_attr = __ATTR_RO(hotplug_ready);
 
 static struct attribute *fadump_attrs[] = {
_attr.attr,
_attr.attr,
_reserved_attr.attr,
+   _ready_attr.attr,
NULL,
 };
 
-- 
2.44.0



[PATCH v10 1/3] powerpc: make fadump resilient with memory add/remove events

2024-04-22 Thread Sourabh Jain
Due to changes in memory resources caused by either memory hotplug or
online/offline events, the elfcorehdr, which describes the CPUs and
memory of the crashed kernel to the kernel that collects the dump (known
as second/fadump kernel), becomes outdated. Consequently, attempting
dump collection with an outdated elfcorehdr can lead to failed or
inaccurate dump collection.

Memory hotplug or online/offline events is referred as memory add/remove
events in reset of the commit message.

The current solution to address the aforementioned issue is as follows:
Monitor memory add/remove events in userspace using udev rules, and
re-register fadump whenever there are changes in memory resources. This
leads to the creation of a new elfcorehdr with updated system memory
information.

There are several notable issues associated with re-registering fadump
for every memory add/remove events.

1. Bulk memory add/remove events with udev-based fadump re-registration
   can lead to race conditions and, more importantly, it creates a wide
   window during which fadump is inactive until all memory add/remove
   events are settled.
2. Re-registering fadump for every memory add/remove event is
   inefficient.
3. The memory for elfcorehdr is allocated based on the memblock regions
   available during early boot and remains fixed thereafter. However, if
   elfcorehdr is later recreated with additional memblock regions, its
   size will increase, potentially leading to memory corruption.

Address the aforementioned challenges by shifting the creation of
elfcorehdr from the first kernel (also referred as the crashed kernel),
where it was created and frequently recreated for every memory
add/remove event, to the fadump kernel. As a result, the elfcorehdr only
needs to be created once, thus eliminating the necessity to re-register
fadump during memory add/remove events.

At present, the first kernel prepares fadump header and stores it in the
fadump reserved area. The fadump header includes the start address of
the elfcorehdr, crashing CPU details, and other relevant information. In
the event of a crash in the first kernel, the second/fadump boots and
accesses the fadump header prepared by the first kernel. It then
performs the following steps in a platform-specific function
[rtas|opal]_fadump_process:

1. Sanity check for fadump header
2. Update CPU notes in elfcorehdr

Along with the above, update the setup_fadump()/fadump.c to create
elfcorehdr and set its address to the global variable elfcorehdr_addr
for the vmcore module to process it in the second/fadump kernel.

Section below outlines the information required to create the elfcorehdr
and the changes made to make it available to the fadump kernel if it's
not already.

To create elfcorehdr, the following crashed kernel information is
required: CPU notes, vmcoreinfo, and memory ranges.

At present, the CPU notes are already prepared in the fadump kernel, so
no changes are needed in that regard. The fadump kernel has access to
all crashed kernel memory regions, including boot memory regions that
are relocated by firmware to fadump reserved areas, so no changes for
that either. However, it is necessary to add new members to the fadump
header, i.e., the 'fadump_crash_info_header' structure, in order to pass
the crashed kernel's vmcoreinfo address and its size to fadump kernel.

In addition to the vmcoreinfo address and size, there are a few other
attributes also added to the fadump_crash_info_header structure.

1. version:
   It stores the fadump header version, which is currently set to 1.
   This provides flexibility to update the fadump crash info header in
   the future without changing the magic number. For each change in the
   fadump header, the version will be increased. This will help the
   updated kernel determine how to handle kernel dumps from older
   kernels. The magic number remains relevant for checking fadump header
   corruption.

2. pt_regs_sz/cpu_mask_sz:
   Store size of pt_regs and cpu_mask structure of first kernel. These
   attributes are used to prevent dump processing if the sizes of
   pt_regs or cpu_mask structure differ between the first and fadump
   kernels.

Note: if either first/crashed kernel or second/fadump kernel do not have
the changes introduced here then kernel fail to collect the dump and
prints relevant error message on the console.

Signed-off-by: Sourabh Jain 
Cc: Aditya Gupta 
Cc: "Aneesh Kumar K.V" 
Cc: Hari Bathini 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Naveen N Rao 
---
 arch/powerpc/include/asm/fadump-internal.h   |  31 +-
 arch/powerpc/kernel/fadump.c | 361 +++
 arch/powerpc/platforms/powernv/opal-fadump.c |  22 +-
 arch/powerpc/platforms/pseries/rtas-fadump.c |  34 +-
 4 files changed, 242 insertions(+), 206 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h 
b/arch/powerpc/include/asm/fadump-internal.h
index 27f9e11eda28..5d706a7acc8a 100644
--- 

[PATCH v10 0/3] powerpc: make fadump resilient with memory add/remove events

2024-04-22 Thread Sourabh Jain
Problem:

Due to changes in memory resources caused by either memory hotplug or
online/offline events, the elfcorehdr, which describes the cpus and
memory of the crashed kernel to the kernel that collects the dump (known
as second/fadump kernel), becomes outdated. Consequently, attempting
dump collection with an outdated elfcorehdr can lead to failed or
inaccurate dump collection.

Memory hotplug or online/offline events is referred as memory add/remove
events in reset of the patch series.

Existing solution:
==
Monitor memory add/remove events in userspace using udev rules, and
re-register fadump whenever there are changes in memory resources. This
leads to the creation of a new elfcorehdr with updated system memory
information.

Challenges with existing solution:
==
1. Performing bulk memory add/remove with udev-based fadump
   re-registration can lead to race conditions and, more importantly,
   it creates a large wide window during which fadump is inactive until
   all memory add/remove events are settled.
2. Re-registering fadump for every memory add/remove event is
   inefficient.
3. Memory for elfcorehdr is allocated based on the memblock regions
   available during first kernel early boot and it remains fixed
   thereafter. However, if the elfcorehdr is later recreated with
   additional memblock regions, its size will increase, potentially
   leading to memory corruption.

Proposed solution:
==
Address the aforementioned challenges by shifting the creation of
elfcorehdr from the first kernel (also referred as the crashed kernel),
where it was created and frequently recreated for every memory
add/remove event, to the fadump kernel. As a result, the elfcorehdr only
needs to be created once, thus eliminating the necessity to re-register
fadump during memory add/remove events.

To know more about elfcorehdr creation in the fadump kernel, refer to
the first patch in this series.

The second patch includes a new sysfs interface that tells userspace
that fadump re-registration isn't needed for memory add/remove events. 
note that userspace changes do not need to be in sync with kernel
changes; they can roll out independently.

Since there are significant changes in the fadump implementation, the
third patch updates the fadump documentation to reflect the changes made
in this patch series.

Kernel tree rebased on 6.9.0-rc5 with patch series applied:
===
https://github.com/sourabhjains/linux/tree/fadump-mem-hotplug-v10

Userspace changes:
==
To realize this feature, one must update the kdump udev rules to prevent
fadump re-registration during memory add/remove events.

On rhel apply the following changes to file
/usr/lib/udev/rules.d/98-kexec.rules

-RUN+="/bin/sh -c '/usr/bin/systemctl is-active kdump.service || exit 0; 
/usr/bin/systemd-run --quiet --no-block /usr/lib/udev/kdump-udev-throttler'"
+# don't re-register fadump if the value of the node
+# /sys/kernel/fadump/hotplug_ready is 1.
+
+RUN+="/bin/sh -c '/usr/bin/systemctl is-active kdump.service || exit 0; ! test 
-f /sys/kernel/fadump_enabled || cat /sys/kernel/fadump_enabled | grep 0  || ! 
test -f /sys/kernel/fadump/hotplug_ready || cat 
/sys/kernel/fadump/hotplug_ready | grep 0 || exit 0; /usr/bin/systemd-run 
--quiet --no-block /usr/lib/udev/kdump-udev-throttler'"

Changelog:
==
v10: 23 Apr 2024
  - Fix a type cast build error. 1/3
  - Rebase it to 6.9-rc5.

v9: 16 Apr 2024
  https://lore.kernel.org/all/20240416080848.347602-1-sourabhj...@linux.ibm.com/
  - Set the physical address of elfcorehdr to elfcorehdr_addr. 1/3
  - Set elfcorehdr_addr to ELFCORE_ADDR_ERR before freeing the elfcorehdr. 1/3
  - Mark the populate_elf_pt_load function as __init. 1/3
  - Rename a function from process_fadump to fadump_process. 1/3
  - Make minor changes to the commit message and a couple of comments in 1/3.
  - Update date of introduction of /sys/kernel/fadump/hotplug_ready sysfs. 2/3
  - Rebase it to 6.9-rc3.

v8: 16 Feb 2024
  https://lore.kernel.org/all/20240217072004.148293-1-sourabhj...@linux.ibm.com/
  - Move `elfcorehdr_addr` and `elfcorehdr_size` struct attributes
from `struct fadump_crash_info_header` to `struct fw_dump`.
  - Make minor changes in commit message 1/3.
  - Rebase it to 6.8-rc4.

v7: 11 Jan 2024
  https://lore.kernel.org/all/2024040943.297501-1-sourabhj...@linux.ibm.com/
  - Rebase it to 6.7

v6: 8 Dec 2023
  https://lore.kernel.org/all/20231208115159.82236-1-sourabhj...@linux.ibm.com/
  - Add size fields for `pt_regs` and `cpumask` in the fadump header
structure
  - Don't process the dump if the size of `pt_regs` and `cpu_mask` is
not same in the crashed and fadump kernel
  - Include an additional check for endianness mismatch when the magic
number doesn't match, to print the relevant error message
  - Don't process the dump if the fadump header contains an old 

Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()

2024-04-22 Thread Song Liu
Hi Masami and Mike,

On Sat, Apr 20, 2024 at 2:11 AM Masami Hiramatsu  wrote:
[...]
> > >
> > > IIUC, we need to update __execmem_cache_alloc() to take a range pointer as
> > > input. module text will use "range" for EXECMEM_MODULE_TEXT, while kprobe
> > > will use "range" for EXECMEM_KPROBE. Without "map to" concept or sharing
> > > the "range" object, we will have to compare different range parameters to 
> > > check
> > > we can share cached pages between module text and kprobe, which is not
> > > efficient. Did I miss something?
>
> Song, thanks for trying to eplain. I think I need to explain why I used
> module_alloc() originally.
>
> This depends on how kprobe features are implemented on the architecture, and
> how much features are supported on kprobes.
>
> Because kprobe jump optimization and kprobe jump-back optimization need to
> use a jump instruction to jump into the trampoline and jump back from the
> trampoline directly, if the architecuture jmp instruction supports +-2GB range
> like x86, it needs to allocate the trampoline buffer inside such address 
> space.
> This requirement is similar to the modules (because module function needs to
> call other functions in the kernel etc.), at least kprobes on x86 used
> module_alloc().
>
> However, if an architecture only supports breakpoint/trap based kprobe,
> it does not need to consider whether the execmem is allocated.
>
> >
> > We can always share large ROX pages as long as they are within the correct
> > address space. The permissions for them are ROX and the alignment
> > differences are due to KASAN and this is handled during allocation of the
> > large page to refill the cache. __execmem_cache_alloc() only needs to limit
> > the search for the address space of the range.
>
> So I don't think EXECMEM_KPROBE always same as EXECMEM_MODULE_TEXT, it
> should be configured for each arch. Especially, if it is only used for
> searching parameter, it looks OK to me.

Thanks for the explanation!

I was thinking "we can have EXECMEM_KPROBE share the same parameters as
EXECMEM_MODULE_TEXT for all architectures". But this thought is built on top
of assumptions on future changes/improvements within multiple sub systems.
At this moment, I have no objections moving forward with current execmem APIs.

Thanks,
Song


Re: [PATCH v3 14/15] riscv: Add support for suppressing warning backtraces

2024-04-22 Thread Charlie Jenkins
On Wed, Apr 03, 2024 at 06:19:35AM -0700, Guenter Roeck wrote:
> Add name of functions triggering warning backtraces to the __bug_table
> object section to enable support for suppressing WARNING backtraces.
> 
> To limit image size impact, the pointer to the function name is only added
> to the __bug_table section if both CONFIG_KUNIT_SUPPRESS_BACKTRACE and
> CONFIG_DEBUG_BUGVERBOSE are enabled. Otherwise, the __func__ assembly
> parameter is replaced with a (dummy) NULL parameter to avoid an image size
> increase due to unused __func__ entries (this is necessary because __func__
> is not a define but a virtual variable).
> 
> To simplify the implementation, unify the __BUG_ENTRY_ADDR and
> __BUG_ENTRY_FILE macros into a single macro named __BUG_REL() which takes
> the address, file, or function reference as parameter.
> 
> Tested-by: Linux Kernel Functional Testing 
> Acked-by: Dan Carpenter 
> Cc: Paul Walmsley 
> Cc: Palmer Dabbelt 
> Cc: Albert Ou 
> Signed-off-by: Guenter Roeck 
> ---
> v2:
> - Rebased to v6.9-rc1
> - Added Tested-by:, Acked-by:, and Reviewed-by: tags
> - Introduced KUNIT_SUPPRESS_BACKTRACE configuration option
> v3:
> - Rebased to v6.9-rc2
> 
>  arch/riscv/include/asm/bug.h | 38 
>  1 file changed, 26 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/bug.h b/arch/riscv/include/asm/bug.h
> index 1aaea81fb141..79f360af4ad8 100644
> --- a/arch/riscv/include/asm/bug.h
> +++ b/arch/riscv/include/asm/bug.h
> @@ -30,26 +30,39 @@
>  typedef u32 bug_insn_t;
>  
>  #ifdef CONFIG_GENERIC_BUG_RELATIVE_POINTERS
> -#define __BUG_ENTRY_ADDR RISCV_INT " 1b - ."
> -#define __BUG_ENTRY_FILE RISCV_INT " %0 - ."
> +#define __BUG_REL(val)   RISCV_INT " " __stringify(val) " - ."
>  #else
> -#define __BUG_ENTRY_ADDR RISCV_PTR " 1b"
> -#define __BUG_ENTRY_FILE RISCV_PTR " %0"
> +#define __BUG_REL(val)   RISCV_PTR " " __stringify(val)
>  #endif
>  
>  #ifdef CONFIG_DEBUG_BUGVERBOSE
> +
> +#ifdef CONFIG_KUNIT_SUPPRESS_BACKTRACE
> +# define HAVE_BUG_FUNCTION
> +# define __BUG_FUNC_PTR  __BUG_REL(%1)
> +#else
> +# define __BUG_FUNC_PTR
> +#endif /* CONFIG_KUNIT_SUPPRESS_BACKTRACE */
> +
>  #define __BUG_ENTRY  \
> - __BUG_ENTRY_ADDR "\n\t" \
> - __BUG_ENTRY_FILE "\n\t" \
> - RISCV_SHORT " %1\n\t"   \
> - RISCV_SHORT " %2"
> + __BUG_REL(1b) "\n\t"\
> + __BUG_REL(%0) "\n\t"\
> + __BUG_FUNC_PTR "\n\t"   \
> + RISCV_SHORT " %2\n\t"   \
> + RISCV_SHORT " %3"
>  #else
>  #define __BUG_ENTRY  \
> - __BUG_ENTRY_ADDR "\n\t" \
> - RISCV_SHORT " %2"
> + __BUG_REL(1b) "\n\t"\
> + RISCV_SHORT " %3"
>  #endif
>  
>  #ifdef CONFIG_GENERIC_BUG
> +#ifdef HAVE_BUG_FUNCTION
> +# define __BUG_FUNC  __func__
> +#else
> +# define __BUG_FUNC  NULL
> +#endif
> +
>  #define __BUG_FLAGS(flags)   \
>  do { \
>   __asm__ __volatile__ (  \
> @@ -58,10 +71,11 @@ do {  
> \
>   ".pushsection __bug_table,\"aw\"\n\t"   \
>   "2:\n\t"\
>   __BUG_ENTRY "\n\t"  \
> - ".org 2b + %3\n\t"  \
> + ".org 2b + %4\n\t"  \
>   ".popsection"   \
>   :   \
> - : "i" (__FILE__), "i" (__LINE__),   \
> + : "i" (__FILE__), "i" (__BUG_FUNC), \
> +   "i" (__LINE__),   \
> "i" (flags),  \
> "i" (sizeof(struct bug_entry)));  \
>  } while (0)
> -- 
> 2.39.2
> 
> 
> ___
> linux-riscv mailing list
> linux-ri...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

Reviewed-by: Charlie Jenkins 

- Charlie



Re: [PATCH v3 1/3] PCI/AER: Store UNCOR_STATUS bits that might be ANFE in aer_err_info

2024-04-22 Thread Jonathan Cameron
On Wed, 17 Apr 2024 14:14:05 +0800
Zhenzhong Duan  wrote:

> In some cases the detector of a Non-Fatal Error(NFE) is not the most
> appropriate agent to determine the type of the error. For example,
> when software performs a configuration read from a non-existent
> device or Function, completer will send an ERR_NONFATAL Message.
> On some platforms, ERR_NONFATAL results in a System Error, which
> breaks normal software probing.
> 
> Advisory Non-Fatal Error(ANFE) is a special case that can be used
> in above scenario. It is predominantly determined by the role of the
> detecting agent (Requester, Completer, or Receiver) and the specific
> error. In such cases, an agent with AER signals the NFE (if enabled)
> by sending an ERR_COR Message as an advisory to software, instead of
> sending ERR_NONFATAL.
> 
> When processing an ANFE, ideally both correctable error(CE) status and
> uncorrectable error(UE) status should be cleared. However, there is no
> way to fully identify the UE associated with ANFE. Even worse, a Fatal
> Error(FE) or Non-Fatal Error(NFE) may set the same UE status bit as
> ANFE. Treating an ANFE as NFE will reproduce above mentioned issue,
> i.e., breaking softwore probing; treating NFE as ANFE will make us
> ignoring some UEs which need active recover operation. To avoid clearing
> UEs that are not ANFE by accident, the most conservative route is taken
> here: If any of the FE/NFE Detected bits is set in Device Status, do not
> touch UE status, they should be cleared later by the UE handler. Otherwise,
> a specific set of UEs that may be raised as ANFE according to the PCIe
> specification will be cleared if their corresponding severity is Non-Fatal.
> 
> To achieve above purpose, store UNCOR_STATUS bits that might be ANFE
> in aer_err_info.anfe_status. So that those bits could be printed and
> processed later.
> 
> Tested-by: Yudong Wang 
> Co-developed-by: "Wang, Qingshun" 
> Signed-off-by: "Wang, Qingshun" 
> Signed-off-by: Zhenzhong Duan 
> ---
>  drivers/pci/pci.h  |  1 +
>  drivers/pci/pcie/aer.c | 45 ++
>  2 files changed, 46 insertions(+)
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 17fed1846847..3f9eb807f9fd 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -412,6 +412,7 @@ struct aer_err_info {
>  
>   unsigned int status;/* COR/UNCOR Error Status */
>   unsigned int mask;  /* COR/UNCOR Error Mask */
> + unsigned int anfe_status;   /* UNCOR Error Status for ANFE */
>   struct pcie_tlp_log tlp;/* TLP Header */
>  };
>  
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index ac6293c24976..27364ab4b148 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -107,6 +107,12 @@ struct aer_stats {
>   PCI_ERR_ROOT_MULTI_COR_RCV |\
>   PCI_ERR_ROOT_MULTI_UNCOR_RCV)
>  
> +#define AER_ERR_ANFE_UNC_MASK(PCI_ERR_UNC_POISON_TLP |   
> \
> + PCI_ERR_UNC_COMP_TIME | \
> + PCI_ERR_UNC_COMP_ABORT |\
> + PCI_ERR_UNC_UNX_COMP |  \
> + PCI_ERR_UNC_UNSUP)
> +
>  static int pcie_aer_disable;
>  static pci_ers_result_t aer_root_reset(struct pci_dev *dev);
>  
> @@ -1196,6 +1202,41 @@ void aer_recover_queue(int domain, unsigned int bus, 
> unsigned int devfn,
>  EXPORT_SYMBOL_GPL(aer_recover_queue);
>  #endif
>  
> +static void anfe_get_uc_status(struct pci_dev *dev, struct aer_err_info 
> *info)
> +{
> + u32 uncor_mask, uncor_status;
> + u16 device_status;
> + int aer = dev->aer_cap;
> +
> + if (pcie_capability_read_word(dev, PCI_EXP_DEVSTA, _status))
> + return;
> + /*
> +  * Take the most conservative route here. If there are
> +  * Non-Fatal/Fatal errors detected, do not assume any
> +  * bit in uncor_status is set by ANFE.
> +  */
> + if (device_status & (PCI_EXP_DEVSTA_NFED | PCI_EXP_DEVSTA_FED))
> + return;
> +

Is there not a race here?  If we happen to get either an NFED or FED 
between the read of device_status above and here we might pick up a status
that corresponds to that (and hence clear something we should not).

Or am I missing that race being close somewhere?

> + pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, _status);
> + pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_MASK, _mask);
> + /*
> +  * According to PCIe Base Specification Revision 6.1,
> +  * Section 6.2.3.2.4, if an UNCOR error is raised as
> +  * Advisory Non-Fatal error, it will match the following
> +  * conditions:
> +  *  a. The severity of the error is Non-Fatal.
> +  *  b. The error is one of the following:
> +  *  1. Poisoned TLP   (Section 6.2.3.2.4.3)
> + 

Re: [PATCH v5 RESEND] arch/powerpc/kvm: Add support for reading VPA counters for pseries guests

2024-04-22 Thread Naveen N Rao
On Tue, Apr 02, 2024 at 12:36:54PM +0530, Gautam Menghani wrote:
> PAPR hypervisor has introduced three new counters in the VPA area of
> LPAR CPUs for KVM L2 guest (see [1] for terminology) observability - 2
> for context switches from host to guest and vice versa, and 1 counter
> for getting the total time spent inside the KVM guest. Add a tracepoint
> that enables reading the counters for use by ftrace/perf. Note that this
> tracepoint is only available for nestedv2 API (i.e, KVM on PowerVM).
> 
> Also maintain an aggregation of the context switch times in vcpu->arch.
> This will be useful in getting the aggregate times with a pmu driver
> which will be upstreamed in the near future.

It would be better to add code to maintain aggregate times as part of 
that pmu driver.

> 
> [1] Terminology:
> a. L1 refers to the VM (LPAR) booted on top of PAPR hypervisor
> b. L2 refers to the KVM guest booted on top of L1.
> 
> Signed-off-by: Vaibhav Jain 
> Signed-off-by: Gautam Menghani 
> ---
> v5 RESEND: 
> 1. Add the changelog
> 
> v4 -> v5:
> 1. Define helper functions for getting/setting the accumulation counter
> in L2's VPA
> 
> v3 -> v4:
> 1. After vcpu_run, check the VPA flag instead of checking for tracepoint
> being enabled for disabling the cs time accumulation.
> 
> v2 -> v3:
> 1. Move the counter disabling and zeroing code to a different function.
> 2. Move the get_lppaca() inside the tracepoint_enabled() branch.
> 3. Add the aggregation logic to maintain total context switch time.
> 
> v1 -> v2:
> 1. Fix the build error due to invalid struct member reference.
> 
>  arch/powerpc/include/asm/kvm_host.h |  5 
>  arch/powerpc/include/asm/lppaca.h   | 11 +---
>  arch/powerpc/kvm/book3s_hv.c| 40 +
>  arch/powerpc/kvm/trace_hv.h | 25 ++
>  4 files changed, 78 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h 
> b/arch/powerpc/include/asm/kvm_host.h
> index 8abac532146e..d953b32dd68a 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -847,6 +847,11 @@ struct kvm_vcpu_arch {
>   gpa_t nested_io_gpr;
>   /* For nested APIv2 guests*/
>   struct kvmhv_nestedv2_io nestedv2_io;
> +
> + /* Aggregate context switch and guest run time info (in ns) */
> + u64 l1_to_l2_cs_agg;
> + u64 l2_to_l1_cs_agg;
> + u64 l2_runtime_agg;

Can be dropped from this patch.

>  #endif
>  
>  #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
> diff --git a/arch/powerpc/include/asm/lppaca.h 
> b/arch/powerpc/include/asm/lppaca.h
> index 61ec2447dabf..bda6b86b9f13 100644
> --- a/arch/powerpc/include/asm/lppaca.h
> +++ b/arch/powerpc/include/asm/lppaca.h
> @@ -62,7 +62,8 @@ struct lppaca {
>   u8  donate_dedicated_cpu;   /* Donate dedicated CPU cycles */
>   u8  fpregs_in_use;
>   u8  pmcregs_in_use;
> - u8  reserved8[28];
> + u8  l2_accumul_cntrs_enable;  /* Enable usage of counters for KVM 
> guest */

A simpler name - l2_counters_enable or such?

> + u8  reserved8[27];
>   __be64  wait_state_cycles;  /* Wait cycles for this proc */
>   u8  reserved9[28];
>   __be16  slb_count;  /* # of SLBs to maintain */
> @@ -92,9 +93,13 @@ struct lppaca {
>   /* cacheline 4-5 */
>  
>   __be32  page_ins;   /* CMO Hint - # page ins by OS */
> - u8  reserved12[148];
> + u8  reserved12[28];
> + volatile __be64 l1_to_l2_cs_tb;
> + volatile __be64 l2_to_l1_cs_tb;
> + volatile __be64 l2_runtime_tb;
> + u8 reserved13[96];
>   volatile __be64 dtl_idx;/* Dispatch Trace Log head index */
> - u8  reserved13[96];
> + u8  reserved14[96];
>  } cacheline_aligned;
>  
>  #define lppaca_of(cpu)   (*paca_ptrs[cpu]->lppaca_ptr)
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 8e86eb577eb8..fea1c1429975 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4108,6 +4108,37 @@ static void vcpu_vpa_increment_dispatch(struct 
> kvm_vcpu *vcpu)
>   }
>  }
>  
> +static inline int kvmhv_get_l2_accumul(void)
> +{
> + return get_lppaca()->l2_accumul_cntrs_enable;
> +}
> +
> +static inline void kvmhv_set_l2_accumul(int val)
   ^^^
   bool?

> +{
> + get_lppaca()->l2_accumul_cntrs_enable = val;
> +}
> +
> +static void do_trace_nested_cs_time(struct kvm_vcpu *vcpu)
> +{
> + struct lppaca *lp = get_lppaca();
> + u64 l1_to_l2_ns, l2_to_l1_ns, l2_runtime_ns;
> +
> + l1_to_l2_ns = tb_to_ns(be64_to_cpu(lp->l1_to_l2_cs_tb));
> + l2_to_l1_ns = tb_to_ns(be64_to_cpu(lp->l2_to_l1_cs_tb));
> + l2_runtime_ns = tb_to_ns(be64_to_cpu(lp->l2_runtime_tb));
> + trace_kvmppc_vcpu_exit_cs_time(vcpu, l1_to_l2_ns, l2_to_l1_ns,
> + l2_runtime_ns);
> + 

Re: [PATCH v5 14/15] kprobes: remove dependency on CONFIG_MODULES

2024-04-22 Thread Google
On Mon, 22 Apr 2024 12:44:35 +0300
Mike Rapoport  wrote:

> From: "Mike Rapoport (IBM)" 
> 
> kprobes depended on CONFIG_MODULES because it has to allocate memory for
> code.
> 
> Since code allocations are now implemented with execmem, kprobes can be
> enabled in non-modular kernels.
> 
> Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside
> modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the
> dependency of CONFIG_KPROBES on CONFIG_MODULES.

Looks good to me.

Acked-by: Masami Hiramatsu (Google) 

Thank you!

> 
> Signed-off-by: Mike Rapoport (IBM) 
> ---
>  arch/Kconfig|  2 +-
>  include/linux/module.h  |  9 ++
>  kernel/kprobes.c| 55 +++--
>  kernel/trace/trace_kprobe.c | 20 +-
>  4 files changed, 63 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 7006f71f0110..a48ce6a488b3 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -52,9 +52,9 @@ config GENERIC_ENTRY
>  
>  config KPROBES
>   bool "Kprobes"
> - depends on MODULES
>   depends on HAVE_KPROBES
>   select KALLSYMS
> + select EXECMEM
>   select TASKS_RCU if PREEMPTION
>   help
> Kprobes allows you to trap at almost any kernel address and
> diff --git a/include/linux/module.h b/include/linux/module.h
> index 1153b0d99a80..ffa1c603163c 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -605,6 +605,11 @@ static inline bool module_is_live(struct module *mod)
>   return mod->state != MODULE_STATE_GOING;
>  }
>  
> +static inline bool module_is_coming(struct module *mod)
> +{
> +return mod->state == MODULE_STATE_COMING;
> +}
> +
>  struct module *__module_text_address(unsigned long addr);
>  struct module *__module_address(unsigned long addr);
>  bool is_module_address(unsigned long addr);
> @@ -857,6 +862,10 @@ void *dereference_module_function_descriptor(struct 
> module *mod, void *ptr)
>   return ptr;
>  }
>  
> +static inline bool module_is_coming(struct module *mod)
> +{
> + return false;
> +}
>  #endif /* CONFIG_MODULES */
>  
>  #ifdef CONFIG_SYSFS
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index ddd7cdc16edf..ca2c6cbd42d2 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1588,7 +1588,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
>   }
>  
>   /* Get module refcount and reject __init functions for loaded modules. 
> */
> - if (*probed_mod) {
> + if (IS_ENABLED(CONFIG_MODULES) && *probed_mod) {
>   /*
>* We must hold a refcount of the probed module while updating
>* its code to prohibit unexpected unloading.
> @@ -1603,12 +1603,13 @@ static int check_kprobe_address_safe(struct kprobe *p,
>* kprobes in there.
>*/
>   if (within_module_init((unsigned long)p->addr, *probed_mod) &&
> - (*probed_mod)->state != MODULE_STATE_COMING) {
> + !module_is_coming(*probed_mod)) {
>   module_put(*probed_mod);
>   *probed_mod = NULL;
>   ret = -ENOENT;
>   }
>   }
> +
>  out:
>   preempt_enable();
>   jump_label_unlock();
> @@ -2488,24 +2489,6 @@ int kprobe_add_area_blacklist(unsigned long start, 
> unsigned long end)
>   return 0;
>  }
>  
> -/* Remove all symbols in given area from kprobe blacklist */
> -static void kprobe_remove_area_blacklist(unsigned long start, unsigned long 
> end)
> -{
> - struct kprobe_blacklist_entry *ent, *n;
> -
> - list_for_each_entry_safe(ent, n, _blacklist, list) {
> - if (ent->start_addr < start || ent->start_addr >= end)
> - continue;
> - list_del(>list);
> - kfree(ent);
> - }
> -}
> -
> -static void kprobe_remove_ksym_blacklist(unsigned long entry)
> -{
> - kprobe_remove_area_blacklist(entry, entry + 1);
> -}
> -
>  int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long 
> *value,
>  char *type, char *sym)
>  {
> @@ -2570,6 +2553,25 @@ static int __init populate_kprobe_blacklist(unsigned 
> long *start,
>   return ret ? : arch_populate_kprobe_blacklist();
>  }
>  
> +#ifdef CONFIG_MODULES
> +/* Remove all symbols in given area from kprobe blacklist */
> +static void kprobe_remove_area_blacklist(unsigned long start, unsigned long 
> end)
> +{
> + struct kprobe_blacklist_entry *ent, *n;
> +
> + list_for_each_entry_safe(ent, n, _blacklist, list) {
> + if (ent->start_addr < start || ent->start_addr >= end)
> + continue;
> + list_del(>list);
> + kfree(ent);
> + }
> +}
> +
> +static void kprobe_remove_ksym_blacklist(unsigned long entry)
> +{
> + kprobe_remove_area_blacklist(entry, entry + 1);
> +}
> +
>  static void add_module_kprobe_blacklist(struct module 

Re: [PATCH] selftests/powerpc: Install tests in sub-directories

2024-04-22 Thread Michael Ellerman
Michael Ellerman  writes:
> The sources for the powerpc selftests are arranged into sub-directories.
> However when the tests are built and installed, the sub-directories are
> squashed, losing the structure.

This is missing a preparatory patch, new version coming.

cheers


[PATCH v2 2/2] selftests/powerpc: Install tests in sub-directories

2024-04-22 Thread Michael Ellerman
The sources for the powerpc selftests are arranged into sub-directories.
However when the tests are built and installed, the sub-directories are
squashed, losing the structure.

For example, with the current code the result of installing the selftests is:

  $ tree tools/testing/selftests/kselftest_install
  tools/testing/selftests/kselftest_install
  ├── kselftest
  │   ├── ktap_helpers.sh
  │   ├── module.sh
  │   ├── prefix.pl
  │   └── runner.sh
  ├── kselftest-list.txt
  ├── powerpc
  │   ├── alignment_handler
  │   ├── attr_test
  │   ├── back_to_back_ebbs_test
  │   ├── bad_accesses
  │   ├── bhrb_filter_map_test
  │   ├── bhrb_no_crash_wo_pmu_test
  │   ├── blacklisted_events_test
  │   ├── cache_shape
  │   ├── close_clears_pmcc_test
  │   ├── context_switch
  │   ├── copy_first_unaligned
  ...
  │   ├── settings
  ...
  │   └── wild_bctr
  └── run_kselftest.sh

All the powerpc tests are squashed into the single powerpc directory. In
particular, note that there is a single `settings` file, even though
there are multiple settings files in the powerpc selftest sources. One
of the settings files ends up installed, depending on install order,
even if they have different contents.

Similarly if there were two tests with the same name in different
sub-directories they would clobber each other.

Fix it by replicating the directory structure of the source tree into
the install directory. The result being for example:

  $ tree tools/testing/selftests/kselftest_install
  tools/testing/selftests/kselftest_install
  ├── kselftest
  │   ├── ktap_helpers.sh
  │   ├── module.sh
  │   ├── prefix.pl
  │   └── runner.sh
  ├── kselftest-list.txt
  ├── powerpc
  │   ├── alignment
  │   │   ├── alignment_handler
  │   │   └── copy_first_unaligned
  │   ├── benchmarks
  │   │   ├── context_switch
  │   │   ├── exec_target
  │   │   ├── fork
  │   │   ├── futex_bench
  │   │   ├── gettimeofday
  │   │   ├── mmap_bench
  │   │   ├── null_syscall
  │   │   └── settings
  ...
  │   ├── eeh
  │   │   ├── eeh-basic.sh
  │   │   ├── eeh-functions.sh
  │   │   └── settings
  ...
  │   └── vphn
  │   └── test-vphn
  └── run_kselftest.sh

Note multiple settings files in different sub-directories.

This change also has the effect of changing the names of the tests from
the point of view of the kselftest runner. Before the tests are named
eg:

  powerpc:copy_first_unaligned
  powerpc:cache_shape
  powerpc:reg_access_test

After, the test collection names include the sub-directory:

  powerpc/alignment:copy_first_unaligned
  powerpc/cache_shape:cache_shape
  powerpc/pmu/ebb:reg_access_test

That means whereas previously all powerpc tests could be run with:

  $ ./run_kselftest.sh -c powerpc

After the change it's necessary to pass a regex that matches all powerpc
entries, eg:

  $ ./run_kselftest.sh -c "powerpc.*"

The latter form also works before and after the change.

Signed-off-by: Michael Ellerman 
---
 tools/testing/selftests/powerpc/Makefile | 4 ++--
 tools/testing/selftests/powerpc/pmu/Makefile | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

v2: Unchanged.

diff --git a/tools/testing/selftests/powerpc/Makefile 
b/tools/testing/selftests/powerpc/Makefile
index 2f299fd04d2d..b175e94e1901 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -52,14 +52,14 @@ endef
 override define INSTALL_RULE
+@for TARGET in $(SUB_DIRS); do \
BUILD_TARGET=$(OUTPUT)/$$TARGET;\
-   $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET install;\
+   $(MAKE) OUTPUT=$$BUILD_TARGET 
INSTALL_PATH=$$INSTALL_PATH/$$TARGET -C $$TARGET install;\
done;
 endef
 
 emit_tests:
+@for TARGET in $(SUB_DIRS); do \
BUILD_TARGET=$(OUTPUT)/$$TARGET;\
-   $(MAKE) OUTPUT=$$BUILD_TARGET -s -C $$TARGET $@;\
+   $(MAKE) OUTPUT=$$BUILD_TARGET COLLECTION=$(COLLECTION)/$$TARGET 
-s -C $$TARGET $@;\
done;
 
 override define CLEAN
diff --git a/tools/testing/selftests/powerpc/pmu/Makefile 
b/tools/testing/selftests/powerpc/pmu/Makefile
index 773933e5180e..7e9dbf3d0d09 100644
--- a/tools/testing/selftests/powerpc/pmu/Makefile
+++ b/tools/testing/selftests/powerpc/pmu/Makefile
@@ -44,7 +44,7 @@ emit_tests:
done
+@for TARGET in $(SUB_DIRS); do \
BUILD_TARGET=$(OUTPUT)/$$TARGET; \
-   $(MAKE) OUTPUT=$$BUILD_TARGET -s -C $$TARGET emit_tests; \
+   $(MAKE) OUTPUT=$$BUILD_TARGET COLLECTION=$(COLLECTION)/$$TARGET 
-s -C $$TARGET emit_tests; \
done;
 
 DEFAULT_INSTALL_RULE := $(INSTALL_RULE)
@@ -52,7 +52,7 @@ override define INSTALL_RULE
$(DEFAULT_INSTALL_RULE)
+@for TARGET in $(SUB_DIRS); do \
BUILD_TARGET=$(OUTPUT)/$$TARGET; \
-   $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET install; \
+   $(MAKE) OUTPUT=$$BUILD_TARGET 
INSTALL_PATH=$$INSTALL_PATH/$$TARGET -C $$TARGET install; \

[PATCH v2 1/2] selftests/powerpc: Convert pmu Makefile to for loop style

2024-04-22 Thread Michael Ellerman
The pmu Makefile has grown more sub directories over the years. Rather
than open coding the rules for each subdir, use for loops.

Signed-off-by: Michael Ellerman 
---
 tools/testing/selftests/powerpc/pmu/Makefile | 43 ++--
 1 file changed, 22 insertions(+), 21 deletions(-)

v2: Actually send both patches.

diff --git a/tools/testing/selftests/powerpc/pmu/Makefile 
b/tools/testing/selftests/powerpc/pmu/Makefile
index 1fcacae1b188..773933e5180e 100644
--- a/tools/testing/selftests/powerpc/pmu/Makefile
+++ b/tools/testing/selftests/powerpc/pmu/Makefile
@@ -9,7 +9,9 @@ top_srcdir = ../../../../..
 include ../../lib.mk
 include ../flags.mk
 
-all: $(TEST_GEN_PROGS) ebb sampling_tests event_code_tests
+SUB_DIRS := ebb sampling_tests event_code_tests
+
+all: $(TEST_GEN_PROGS) $(SUB_DIRS)
 
 $(TEST_GEN_PROGS): $(EXTRA_SOURCES)
 
@@ -23,12 +25,16 @@ $(OUTPUT)/count_stcx_fail: loop.S $(EXTRA_SOURCES)
 
 $(OUTPUT)/per_event_excludes: ../utils.c
 
+$(SUB_DIRS):
+   BUILD_TARGET=$(OUTPUT)/$@; mkdir -p $$BUILD_TARGET; $(MAKE) 
OUTPUT=$$BUILD_TARGET -k -C $@ all
+
 DEFAULT_RUN_TESTS := $(RUN_TESTS)
 override define RUN_TESTS
$(DEFAULT_RUN_TESTS)
-   +TARGET=ebb; BUILD_TARGET=$$OUTPUT/$$TARGET; $(MAKE) 
OUTPUT=$$BUILD_TARGET -C $$TARGET run_tests
-   +TARGET=sampling_tests; BUILD_TARGET=$$OUTPUT/$$TARGET; $(MAKE) 
OUTPUT=$$BUILD_TARGET -C $$TARGET run_tests
-   +TARGET=event_code_tests; BUILD_TARGET=$$OUTPUT/$$TARGET; $(MAKE) 
OUTPUT=$$BUILD_TARGET -C $$TARGET run_tests
+   +@for TARGET in $(SUB_DIRS); do \
+   BUILD_TARGET=$(OUTPUT)/$$TARGET; \
+   $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET run_tests; \
+   done;
 endef
 
 emit_tests:
@@ -36,34 +42,29 @@ emit_tests:
BASENAME_TEST=`basename $$TEST`;\
echo "$(COLLECTION):$$BASENAME_TEST";   \
done
-   +TARGET=ebb; BUILD_TARGET=$$OUTPUT/$$TARGET; $(MAKE) 
OUTPUT=$$BUILD_TARGET -s -C $$TARGET emit_tests
-   +TARGET=sampling_tests; BUILD_TARGET=$$OUTPUT/$$TARGET; $(MAKE) 
OUTPUT=$$BUILD_TARGET -s -C $$TARGET emit_tests
-   +TARGET=event_code_tests; BUILD_TARGET=$$OUTPUT/$$TARGET; $(MAKE) 
OUTPUT=$$BUILD_TARGET -s -C $$TARGET emit_tests
+   +@for TARGET in $(SUB_DIRS); do \
+   BUILD_TARGET=$(OUTPUT)/$$TARGET; \
+   $(MAKE) OUTPUT=$$BUILD_TARGET -s -C $$TARGET emit_tests; \
+   done;
 
 DEFAULT_INSTALL_RULE := $(INSTALL_RULE)
 override define INSTALL_RULE
$(DEFAULT_INSTALL_RULE)
-   +TARGET=ebb; BUILD_TARGET=$$OUTPUT/$$TARGET; $(MAKE) 
OUTPUT=$$BUILD_TARGET -C $$TARGET install
-   +TARGET=sampling_tests; BUILD_TARGET=$$OUTPUT/$$TARGET; $(MAKE) 
OUTPUT=$$BUILD_TARGET -C $$TARGET install
-   +TARGET=event_code_tests; BUILD_TARGET=$$OUTPUT/$$TARGET; $(MAKE) 
OUTPUT=$$BUILD_TARGET -C $$TARGET install
+   +@for TARGET in $(SUB_DIRS); do \
+   BUILD_TARGET=$(OUTPUT)/$$TARGET; \
+   $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET install; \
+   done;
 endef
 
 DEFAULT_CLEAN := $(CLEAN)
 override define CLEAN
$(DEFAULT_CLEAN)
$(RM) $(TEST_GEN_PROGS) $(OUTPUT)/loop.o
-   +TARGET=ebb; BUILD_TARGET=$$OUTPUT/$$TARGET; $(MAKE) 
OUTPUT=$$BUILD_TARGET -C $$TARGET clean
-   +TARGET=sampling_tests; BUILD_TARGET=$$OUTPUT/$$TARGET; $(MAKE) 
OUTPUT=$$BUILD_TARGET -C $$TARGET clean
-   +TARGET=event_code_tests; BUILD_TARGET=$$OUTPUT/$$TARGET; $(MAKE) 
OUTPUT=$$BUILD_TARGET -C $$TARGET clean
+   +@for TARGET in $(SUB_DIRS); do \
+   BUILD_TARGET=$(OUTPUT)/$$TARGET; \
+   $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET clean; \
+   done;
 endef
 
-ebb:
-   TARGET=$@; BUILD_TARGET=$$OUTPUT/$$TARGET; mkdir -p $$BUILD_TARGET; 
$(MAKE) OUTPUT=$$BUILD_TARGET -k -C $$TARGET all
-
-sampling_tests:
-   TARGET=$@; BUILD_TARGET=$$OUTPUT/$$TARGET; mkdir -p $$BUILD_TARGET; 
$(MAKE) OUTPUT=$$BUILD_TARGET -k -C $$TARGET all
-
-event_code_tests:
-   TARGET=$@; BUILD_TARGET=$$OUTPUT/$$TARGET; mkdir -p $$BUILD_TARGET; 
$(MAKE) OUTPUT=$$BUILD_TARGET -k -C $$TARGET all
 
 .PHONY: all run_tests ebb sampling_tests event_code_tests emit_tests
-- 
2.44.0



[PATCH] selftests/powerpc: Install tests in sub-directories

2024-04-22 Thread Michael Ellerman
The sources for the powerpc selftests are arranged into sub-directories.
However when the tests are built and installed, the sub-directories are
squashed, losing the structure.

For example, with the current code the result of installing the selftests is:

  $ tree tools/testing/selftests/kselftest_install
  tools/testing/selftests/kselftest_install
  ├── kselftest
  │   ├── ktap_helpers.sh
  │   ├── module.sh
  │   ├── prefix.pl
  │   └── runner.sh
  ├── kselftest-list.txt
  ├── powerpc
  │   ├── alignment_handler
  │   ├── attr_test
  │   ├── back_to_back_ebbs_test
  │   ├── bad_accesses
  │   ├── bhrb_filter_map_test
  │   ├── bhrb_no_crash_wo_pmu_test
  │   ├── blacklisted_events_test
  │   ├── cache_shape
  │   ├── close_clears_pmcc_test
  │   ├── context_switch
  │   ├── copy_first_unaligned
  ...
  │   ├── settings
  ...
  │   └── wild_bctr
  └── run_kselftest.sh

All the powerpc tests are squashed into the single powerpc directory. In
particular, note that there is a single `settings` file, even though
there are multiple settings files in the powerpc selftest sources. One
of the settings files ends up installed, depending on install order,
even if they have different contents.

Similarly if there were two tests with the same name in different
sub-directories they would clobber each other.

Fix it by replicating the directory structure of the source tree into
the install directory. The result being for example:

  $ tree tools/testing/selftests/kselftest_install
  tools/testing/selftests/kselftest_install
  ├── kselftest
  │   ├── ktap_helpers.sh
  │   ├── module.sh
  │   ├── prefix.pl
  │   └── runner.sh
  ├── kselftest-list.txt
  ├── powerpc
  │   ├── alignment
  │   │   ├── alignment_handler
  │   │   └── copy_first_unaligned
  │   ├── benchmarks
  │   │   ├── context_switch
  │   │   ├── exec_target
  │   │   ├── fork
  │   │   ├── futex_bench
  │   │   ├── gettimeofday
  │   │   ├── mmap_bench
  │   │   ├── null_syscall
  │   │   └── settings
  ...
  │   ├── eeh
  │   │   ├── eeh-basic.sh
  │   │   ├── eeh-functions.sh
  │   │   └── settings
  ...
  │   └── vphn
  │   └── test-vphn
  └── run_kselftest.sh

Note multiple settings files in different sub-directories.

This change also has the effect of changing the names of the tests from
the point of view of the kselftest runner. Before the tests are named
eg:

  powerpc:copy_first_unaligned
  powerpc:cache_shape
  powerpc:reg_access_test

After, the test collection names include the sub-directory:

  powerpc/alignment:copy_first_unaligned
  powerpc/cache_shape:cache_shape
  powerpc/pmu/ebb:reg_access_test

That means whereas previously all powerpc tests could be run with:

  $ ./run_kselftest.sh -c powerpc

After the change it's necessary to pass a regex that matches all powerpc
entries, eg:

  $ ./run_kselftest.sh -c "powerpc.*"

The latter form also works before and after the change.

Signed-off-by: Michael Ellerman 
---
 tools/testing/selftests/powerpc/Makefile | 4 ++--
 tools/testing/selftests/powerpc/pmu/Makefile | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/powerpc/Makefile 
b/tools/testing/selftests/powerpc/Makefile
index 2f299fd04d2d..b175e94e1901 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -52,14 +52,14 @@ endef
 override define INSTALL_RULE
+@for TARGET in $(SUB_DIRS); do \
BUILD_TARGET=$(OUTPUT)/$$TARGET;\
-   $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET install;\
+   $(MAKE) OUTPUT=$$BUILD_TARGET 
INSTALL_PATH=$$INSTALL_PATH/$$TARGET -C $$TARGET install;\
done;
 endef
 
 emit_tests:
+@for TARGET in $(SUB_DIRS); do \
BUILD_TARGET=$(OUTPUT)/$$TARGET;\
-   $(MAKE) OUTPUT=$$BUILD_TARGET -s -C $$TARGET $@;\
+   $(MAKE) OUTPUT=$$BUILD_TARGET COLLECTION=$(COLLECTION)/$$TARGET 
-s -C $$TARGET $@;\
done;
 
 override define CLEAN
diff --git a/tools/testing/selftests/powerpc/pmu/Makefile 
b/tools/testing/selftests/powerpc/pmu/Makefile
index 773933e5180e..7e9dbf3d0d09 100644
--- a/tools/testing/selftests/powerpc/pmu/Makefile
+++ b/tools/testing/selftests/powerpc/pmu/Makefile
@@ -44,7 +44,7 @@ emit_tests:
done
+@for TARGET in $(SUB_DIRS); do \
BUILD_TARGET=$(OUTPUT)/$$TARGET; \
-   $(MAKE) OUTPUT=$$BUILD_TARGET -s -C $$TARGET emit_tests; \
+   $(MAKE) OUTPUT=$$BUILD_TARGET COLLECTION=$(COLLECTION)/$$TARGET 
-s -C $$TARGET emit_tests; \
done;
 
 DEFAULT_INSTALL_RULE := $(INSTALL_RULE)
@@ -52,7 +52,7 @@ override define INSTALL_RULE
$(DEFAULT_INSTALL_RULE)
+@for TARGET in $(SUB_DIRS); do \
BUILD_TARGET=$(OUTPUT)/$$TARGET; \
-   $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET install; \
+   $(MAKE) OUTPUT=$$BUILD_TARGET 
INSTALL_PATH=$$INSTALL_PATH/$$TARGET -C $$TARGET install; \
done;
 

Re: [PATCH v5 11/15] arch: make execmem setup available regardless of CONFIG_MODULES

2024-04-22 Thread Philippe Mathieu-Daudé

On 22/4/24 11:44, Mike Rapoport wrote:

From: "Mike Rapoport (IBM)" 

execmem does not depend on modules, on the contrary modules use
execmem.

To make execmem available when CONFIG_MODULES=n, for instance for
kprobes, split execmem_params initialization out from
arch/*/kernel/module.c and compile it when CONFIG_EXECMEM=y

Signed-off-by: Mike Rapoport (IBM) 
---
  arch/arm/kernel/module.c   |  43 --
  arch/arm/mm/init.c |  45 +++
  arch/arm64/kernel/module.c | 140 -
  arch/arm64/mm/init.c   | 140 +
  arch/loongarch/kernel/module.c |  19 -
  arch/loongarch/mm/init.c   |  21 +
  arch/mips/kernel/module.c  |  22 --
  arch/mips/mm/init.c|  23 ++
  arch/nios2/kernel/module.c |  20 -
  arch/nios2/mm/init.c   |  21 +
  arch/parisc/kernel/module.c|  20 -
  arch/parisc/mm/init.c  |  23 +-
  arch/powerpc/kernel/module.c   |  63 ---
  arch/powerpc/mm/mem.c  |  64 +++
  arch/riscv/kernel/module.c |  44 ---
  arch/riscv/mm/init.c   |  45 +++
  arch/s390/kernel/module.c  |  27 ---
  arch/s390/mm/init.c|  30 +++
  arch/sparc/kernel/module.c |  19 -
  arch/sparc/mm/Makefile |   2 +
  arch/sparc/mm/execmem.c|  21 +
  arch/x86/kernel/module.c   |  27 ---
  arch/x86/mm/init.c |  29 +++
  23 files changed, 463 insertions(+), 445 deletions(-)
  create mode 100644 arch/sparc/mm/execmem.c


Reviewed-by: Philippe Mathieu-Daudé 



Re: [RFC PATCH 1/2] objtool: Run objtool only if either of the config options are selected

2024-04-22 Thread Masahiro Yamada
On Mon, Apr 22, 2024 at 6:25 PM Sathvika Vasireddy  wrote:
>
> Currently, when objtool is enabled and none of the supported options
> are triggered, kernel build errors out with the below error:
> error: objtool: At least one command required.


Then, I think CONFIG_OBJTOOL should be disabled.


>
> To address this, ensure that objtool is run only when either of the
> config options are selected.
>
> Signed-off-by: Sathvika Vasireddy 
> ---
>  scripts/Makefile.lib | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 3179747cbd2c..c65bb0fbd136 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -286,7 +286,10 @@ objtool-args = $(objtool-args-y) 
>   \
>
>  delay-objtool := $(or $(CONFIG_LTO_CLANG),$(CONFIG_X86_KERNEL_IBT))
>
> +ifneq ($(objtool-args-y),)
>  cmd_objtool = $(if $(objtool-enabled), ; $(objtool) $(objtool-args) $@)
> +endif
> +
>  cmd_gen_objtooldep = $(if $(objtool-enabled), { echo ; echo '$@: $$(wildcard 
> $(objtool))' ; } >> $(dot-target).cmd)
>
>  endif # CONFIG_OBJTOOL
> --
> 2.34.1
>


-- 
Best Regards
Masahiro Yamada


[PATCH] powerpc: Mark memory_limit as initdata

2024-04-22 Thread Michael Ellerman
The `memory_limit` variable should only be used during boot, enforce
that by marking it initdata.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/mm/mem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 3a440004b97d..12316ac66e7e 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -30,7 +30,7 @@
 
 #include 
 
-unsigned long long memory_limit;
+unsigned long long memory_limit __initdata;
 
 unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] 
__page_aligned_bss;
 EXPORT_SYMBOL(empty_zero_page);
-- 
2.44.0



[PATCH v5 15/15] bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

BPF just-in-time compiler depended on CONFIG_MODULES because it used
module_alloc() to allocate memory for the generated code.

Since code allocations are now implemented with execmem, drop dependency of
CONFIG_BPF_JIT on CONFIG_MODULES and make it select CONFIG_EXECMEM.

Suggested-by: Björn Töpel 
Signed-off-by: Mike Rapoport (IBM) 
---
 kernel/bpf/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
index bc25f5098a25..f999e4e0b344 100644
--- a/kernel/bpf/Kconfig
+++ b/kernel/bpf/Kconfig
@@ -43,7 +43,7 @@ config BPF_JIT
bool "Enable BPF Just In Time compiler"
depends on BPF
depends on HAVE_CBPF_JIT || HAVE_EBPF_JIT
-   depends on MODULES
+   select EXECMEM
help
  BPF programs are normally handled by a BPF interpreter. This option
  allows the kernel to generate native code when a program is loaded
-- 
2.43.0



[PATCH v5 14/15] kprobes: remove dependency on CONFIG_MODULES

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

kprobes depended on CONFIG_MODULES because it has to allocate memory for
code.

Since code allocations are now implemented with execmem, kprobes can be
enabled in non-modular kernels.

Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside
modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the
dependency of CONFIG_KPROBES on CONFIG_MODULES.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/Kconfig|  2 +-
 include/linux/module.h  |  9 ++
 kernel/kprobes.c| 55 +++--
 kernel/trace/trace_kprobe.c | 20 +-
 4 files changed, 63 insertions(+), 23 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 7006f71f0110..a48ce6a488b3 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -52,9 +52,9 @@ config GENERIC_ENTRY
 
 config KPROBES
bool "Kprobes"
-   depends on MODULES
depends on HAVE_KPROBES
select KALLSYMS
+   select EXECMEM
select TASKS_RCU if PREEMPTION
help
  Kprobes allows you to trap at almost any kernel address and
diff --git a/include/linux/module.h b/include/linux/module.h
index 1153b0d99a80..ffa1c603163c 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -605,6 +605,11 @@ static inline bool module_is_live(struct module *mod)
return mod->state != MODULE_STATE_GOING;
 }
 
+static inline bool module_is_coming(struct module *mod)
+{
+return mod->state == MODULE_STATE_COMING;
+}
+
 struct module *__module_text_address(unsigned long addr);
 struct module *__module_address(unsigned long addr);
 bool is_module_address(unsigned long addr);
@@ -857,6 +862,10 @@ void *dereference_module_function_descriptor(struct module 
*mod, void *ptr)
return ptr;
 }
 
+static inline bool module_is_coming(struct module *mod)
+{
+   return false;
+}
 #endif /* CONFIG_MODULES */
 
 #ifdef CONFIG_SYSFS
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index ddd7cdc16edf..ca2c6cbd42d2 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1588,7 +1588,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
}
 
/* Get module refcount and reject __init functions for loaded modules. 
*/
-   if (*probed_mod) {
+   if (IS_ENABLED(CONFIG_MODULES) && *probed_mod) {
/*
 * We must hold a refcount of the probed module while updating
 * its code to prohibit unexpected unloading.
@@ -1603,12 +1603,13 @@ static int check_kprobe_address_safe(struct kprobe *p,
 * kprobes in there.
 */
if (within_module_init((unsigned long)p->addr, *probed_mod) &&
-   (*probed_mod)->state != MODULE_STATE_COMING) {
+   !module_is_coming(*probed_mod)) {
module_put(*probed_mod);
*probed_mod = NULL;
ret = -ENOENT;
}
}
+
 out:
preempt_enable();
jump_label_unlock();
@@ -2488,24 +2489,6 @@ int kprobe_add_area_blacklist(unsigned long start, 
unsigned long end)
return 0;
 }
 
-/* Remove all symbols in given area from kprobe blacklist */
-static void kprobe_remove_area_blacklist(unsigned long start, unsigned long 
end)
-{
-   struct kprobe_blacklist_entry *ent, *n;
-
-   list_for_each_entry_safe(ent, n, _blacklist, list) {
-   if (ent->start_addr < start || ent->start_addr >= end)
-   continue;
-   list_del(>list);
-   kfree(ent);
-   }
-}
-
-static void kprobe_remove_ksym_blacklist(unsigned long entry)
-{
-   kprobe_remove_area_blacklist(entry, entry + 1);
-}
-
 int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
   char *type, char *sym)
 {
@@ -2570,6 +2553,25 @@ static int __init populate_kprobe_blacklist(unsigned 
long *start,
return ret ? : arch_populate_kprobe_blacklist();
 }
 
+#ifdef CONFIG_MODULES
+/* Remove all symbols in given area from kprobe blacklist */
+static void kprobe_remove_area_blacklist(unsigned long start, unsigned long 
end)
+{
+   struct kprobe_blacklist_entry *ent, *n;
+
+   list_for_each_entry_safe(ent, n, _blacklist, list) {
+   if (ent->start_addr < start || ent->start_addr >= end)
+   continue;
+   list_del(>list);
+   kfree(ent);
+   }
+}
+
+static void kprobe_remove_ksym_blacklist(unsigned long entry)
+{
+   kprobe_remove_area_blacklist(entry, entry + 1);
+}
+
 static void add_module_kprobe_blacklist(struct module *mod)
 {
unsigned long start, end;
@@ -2672,6 +2674,17 @@ static struct notifier_block kprobe_module_nb = {
.priority = 0
 };
 
+static int kprobe_register_module_notifier(void)
+{
+   return register_module_notifier(_module_nb);
+}
+#else
+static int kprobe_register_module_notifier(void)
+{
+   

[PATCH v5 13/15] powerpc: use CONFIG_EXECMEM instead of CONFIG_MODULES where appropriate

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

There are places where CONFIG_MODULES guards the code that depends on
memory allocation being done with module_alloc().

Replace CONFIG_MODULES with CONFIG_EXECMEM in such places.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/powerpc/Kconfig | 2 +-
 arch/powerpc/include/asm/kasan.h | 2 +-
 arch/powerpc/kernel/head_8xx.S   | 4 ++--
 arch/powerpc/kernel/head_book3s_32.S | 6 +++---
 arch/powerpc/lib/code-patching.c | 2 +-
 arch/powerpc/mm/book3s32/mmu.c   | 2 +-
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1c4be3373686..2e586733a464 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -285,7 +285,7 @@ config PPC
select IOMMU_HELPER if PPC64
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
-   select KASAN_VMALLOCif KASAN && MODULES
+   select KASAN_VMALLOCif KASAN && EXECMEM
select LOCK_MM_AND_FIND_VMA
select MMU_GATHER_PAGE_SIZE
select MMU_GATHER_RCU_TABLE_FREE
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 365d2720097c..b5bbb94c51f6 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -19,7 +19,7 @@
 
 #define KASAN_SHADOW_SCALE_SHIFT   3
 
-#if defined(CONFIG_MODULES) && defined(CONFIG_PPC32)
+#if defined(CONFIG_EXECMEM) && defined(CONFIG_PPC32)
 #define KASAN_KERN_START   ALIGN_DOWN(PAGE_OFFSET - SZ_256M, SZ_256M)
 #else
 #define KASAN_KERN_START   PAGE_OFFSET
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 647b0b445e89..edc479a7c2bc 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -199,12 +199,12 @@ instruction_counter:
mfspr   r10, SPRN_SRR0  /* Get effective address of fault */
INVALIDATE_ADJACENT_PAGES_CPU15(r10, r11)
mtspr   SPRN_MD_EPN, r10
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
mfcrr11
compare_to_kernel_boundary r10, r10
 #endif
mfspr   r10, SPRN_M_TWB /* Get level 1 table */
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
blt+3f
rlwinm  r10, r10, 0, 20, 31
orisr10, r10, (swapper_pg_dir - PAGE_OFFSET)@ha
diff --git a/arch/powerpc/kernel/head_book3s_32.S 
b/arch/powerpc/kernel/head_book3s_32.S
index c1d89764dd22..57196883a00e 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -419,14 +419,14 @@ InstructionTLBMiss:
  */
/* Get PTE (linux-style) and check access */
mfspr   r3,SPRN_IMISS
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw   0,r1,r3
 #endif
mfspr   r2, SPRN_SDR1
li  r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
rlwinm  r2, r2, 28, 0xf000
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
li  r0, 3
bgt-112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha   /* if kernel address, 
use */
@@ -442,7 +442,7 @@ InstructionTLBMiss:
andc.   r1,r1,r2/* check access & ~permission */
bne-InstructionAddressInvalid /* return if access not permitted */
/* Convert linux-style PTE to low word of PPC-style PTE */
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
rlwimi  r2, r0, 0, 31, 31   /* userspace ? -> PP lsb */
 #endif
ori r1, r1, 0xe06   /* clear out reserved bits */
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index c6ab46156cda..7af791446ddf 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -225,7 +225,7 @@ void __init poking_init(void)
 
 static unsigned long get_patch_pfn(void *addr)
 {
-   if (IS_ENABLED(CONFIG_MODULES) && is_vmalloc_or_module_addr(addr))
+   if (IS_ENABLED(CONFIG_EXECMEM) && is_vmalloc_or_module_addr(addr))
return vmalloc_to_pfn(addr);
else
return __pa_symbol(addr) >> PAGE_SHIFT;
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 100f999871bc..625fe7d08e06 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -184,7 +184,7 @@ unsigned long __init mmu_mapin_ram(unsigned long base, 
unsigned long top)
 
 static bool is_module_segment(unsigned long addr)
 {
-   if (!IS_ENABLED(CONFIG_MODULES))
+   if (!IS_ENABLED(CONFIG_EXECMEM))
return false;
if (addr < ALIGN_DOWN(MODULES_VADDR, SZ_256M))
return false;
-- 
2.43.0



[PATCH v5 12/15] x86/ftrace: enable dynamic ftrace without CONFIG_MODULES

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Dynamic ftrace must allocate memory for code and this was impossible
without CONFIG_MODULES.

With execmem separated from the modules code, execmem_text_alloc() is
available regardless of CONFIG_MODULES.

Remove dependency of dynamic ftrace on CONFIG_MODULES and make
CONFIG_DYNAMIC_FTRACE select CONFIG_EXECMEM in Kconfig.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/x86/Kconfig |  1 +
 arch/x86/kernel/ftrace.c | 10 --
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3f5ba72c9480..cd8addb96a0b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -34,6 +34,7 @@ config X86_64
select SWIOTLB
select ARCH_HAS_ELFCORE_COMPAT
select ZONE_DMA32
+   select EXECMEM if DYNAMIC_FTRACE
 
 config FORCE_DYNAMIC_FTRACE
def_bool y
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index c8ddb7abda7c..8da0e66ca22d 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -261,8 +261,6 @@ void arch_ftrace_update_code(int command)
 /* Currently only x86_64 supports dynamic trampolines */
 #ifdef CONFIG_X86_64
 
-#ifdef CONFIG_MODULES
-/* Module allocation simplifies allocating memory for code */
 static inline void *alloc_tramp(unsigned long size)
 {
return execmem_alloc(EXECMEM_FTRACE, size);
@@ -271,14 +269,6 @@ static inline void tramp_free(void *tramp)
 {
execmem_free(tramp);
 }
-#else
-/* Trampolines can only be created if modules are supported */
-static inline void *alloc_tramp(unsigned long size)
-{
-   return NULL;
-}
-static inline void tramp_free(void *tramp) { }
-#endif
 
 /* Defined as markers to the end of the ftrace default trampolines */
 extern void ftrace_regs_caller_end(void);
-- 
2.43.0



[PATCH v5 11/15] arch: make execmem setup available regardless of CONFIG_MODULES

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

execmem does not depend on modules, on the contrary modules use
execmem.

To make execmem available when CONFIG_MODULES=n, for instance for
kprobes, split execmem_params initialization out from
arch/*/kernel/module.c and compile it when CONFIG_EXECMEM=y

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/arm/kernel/module.c   |  43 --
 arch/arm/mm/init.c |  45 +++
 arch/arm64/kernel/module.c | 140 -
 arch/arm64/mm/init.c   | 140 +
 arch/loongarch/kernel/module.c |  19 -
 arch/loongarch/mm/init.c   |  21 +
 arch/mips/kernel/module.c  |  22 --
 arch/mips/mm/init.c|  23 ++
 arch/nios2/kernel/module.c |  20 -
 arch/nios2/mm/init.c   |  21 +
 arch/parisc/kernel/module.c|  20 -
 arch/parisc/mm/init.c  |  23 +-
 arch/powerpc/kernel/module.c   |  63 ---
 arch/powerpc/mm/mem.c  |  64 +++
 arch/riscv/kernel/module.c |  44 ---
 arch/riscv/mm/init.c   |  45 +++
 arch/s390/kernel/module.c  |  27 ---
 arch/s390/mm/init.c|  30 +++
 arch/sparc/kernel/module.c |  19 -
 arch/sparc/mm/Makefile |   2 +
 arch/sparc/mm/execmem.c|  21 +
 arch/x86/kernel/module.c   |  27 ---
 arch/x86/mm/init.c |  29 +++
 23 files changed, 463 insertions(+), 445 deletions(-)
 create mode 100644 arch/sparc/mm/execmem.c

diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index a98fdf6ff26c..677f218f7e84 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -12,57 +12,14 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
-#include 
-#include 
 
 #include 
 #include 
 #include 
 #include 
 
-#ifdef CONFIG_XIP_KERNEL
-/*
- * The XIP kernel text is mapped in the module area for modules and
- * some other stuff to work without any indirect relocations.
- * MODULES_VADDR is redefined here and not in asm/memory.h to avoid
- * recompiling the whole kernel when CONFIG_XIP_KERNEL is turned on/off.
- */
-#undef MODULES_VADDR
-#define MODULES_VADDR  (((unsigned long)_exiprom + ~PMD_MASK) & PMD_MASK)
-#endif
-
-#ifdef CONFIG_MMU
-static struct execmem_info execmem_info __ro_after_init;
-
-struct execmem_info __init *execmem_arch_setup(void)
-{
-   unsigned long fallback_start = 0, fallback_end = 0;
-
-   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) {
-   fallback_start = VMALLOC_START;
-   fallback_end = VMALLOC_END;
-   }
-
-   execmem_info = (struct execmem_info){
-   .ranges = {
-   [EXECMEM_DEFAULT] = {
-   .start  = MODULES_VADDR,
-   .end= MODULES_END,
-   .pgprot = PAGE_KERNEL_EXEC,
-   .alignment = 1,
-   .fallback_start = fallback_start,
-   .fallback_end   = fallback_end,
-   },
-   },
-   };
-
-   return _info;
-}
-#endif
-
 bool module_init_section(const char *name)
 {
return strstarts(name, ".init") ||
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index e8c6f4be0ce1..5345d218899a 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -486,3 +487,47 @@ void free_initrd_mem(unsigned long start, unsigned long 
end)
free_reserved_area((void *)start, (void *)end, -1, "initrd");
 }
 #endif
+
+#ifdef CONFIG_EXECMEM
+
+#ifdef CONFIG_XIP_KERNEL
+/*
+ * The XIP kernel text is mapped in the module area for modules and
+ * some other stuff to work without any indirect relocations.
+ * MODULES_VADDR is redefined here and not in asm/memory.h to avoid
+ * recompiling the whole kernel when CONFIG_XIP_KERNEL is turned on/off.
+ */
+#undef MODULES_VADDR
+#define MODULES_VADDR  (((unsigned long)_exiprom + ~PMD_MASK) & PMD_MASK)
+#endif
+
+#ifdef CONFIG_MMU
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
+{
+   unsigned long fallback_start = 0, fallback_end = 0;
+
+   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) {
+   fallback_start = VMALLOC_START;
+   fallback_end = VMALLOC_END;
+   }
+
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL_EXEC,
+   .alignment = 1,
+   .fallback_start = fallback_start,
+   .fallback_end   = fallback_end,
+   },
+   },
+

[PATCH v5 10/15] powerpc: extend execmem_params for kprobes allocations

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

powerpc overrides kprobes::alloc_insn_page() to remove writable
permissions when STRICT_MODULE_RWX is on.

Add definition of EXECMEM_KRPOBES to execmem_params to allow using the
generic kprobes::alloc_insn_page() with the desired permissions.

As powerpc uses breakpoint instructions to inject kprobes, it does not
need to constrain kprobe allocations to the modules area and can use the
entire vmalloc address space.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/powerpc/kernel/kprobes.c | 20 
 arch/powerpc/kernel/module.c  |  7 +++
 2 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 9fcd01bb2ce6..14c5ddec3056 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -126,26 +126,6 @@ kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long 
addr, unsigned long offse
return (kprobe_opcode_t *)(addr + offset);
 }
 
-void *alloc_insn_page(void)
-{
-   void *page;
-
-   page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
-   if (!page)
-   return NULL;
-
-   if (strict_module_rwx_enabled()) {
-   int err = set_memory_rox((unsigned long)page, 1);
-
-   if (err)
-   goto error;
-   }
-   return page;
-error:
-   execmem_free(page);
-   return NULL;
-}
-
 int arch_prepare_kprobe(struct kprobe *p)
 {
int ret = 0;
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index ac80559015a3..2a23cf7e141b 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -94,6 +94,7 @@ static struct execmem_info execmem_info __ro_after_init;
 
 struct execmem_info __init *execmem_arch_setup(void)
 {
+   pgprot_t kprobes_prot = strict_module_rwx_enabled() ? PAGE_KERNEL_ROX : 
PAGE_KERNEL_EXEC;
pgprot_t prot = strict_module_rwx_enabled() ? PAGE_KERNEL : 
PAGE_KERNEL_EXEC;
unsigned long fallback_start = 0, fallback_end = 0;
unsigned long start, end;
@@ -132,6 +133,12 @@ struct execmem_info __init *execmem_arch_setup(void)
.fallback_start = fallback_start,
.fallback_end   = fallback_end,
},
+   [EXECMEM_KPROBES] = {
+   .start  = VMALLOC_START,
+   .end= VMALLOC_END,
+   .pgprot = kprobes_prot,
+   .alignment = 1,
+   },
[EXECMEM_MODULE_DATA] = {
.start  = VMALLOC_START,
.end= VMALLOC_END,
-- 
2.43.0



[PATCH v5 09/15] riscv: extend execmem_params for generated code allocations

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

The memory allocations for kprobes and BPF on RISC-V are not placed in
the modules area and these custom allocations are implemented with
overrides of alloc_insn_page() and  bpf_jit_alloc_exec().

Slightly reorder execmem_params initialization to support both 32 and 64
bit variants, define EXECMEM_KPROBES and EXECMEM_BPF ranges in
riscv::execmem_params and drop overrides of alloc_insn_page() and
bpf_jit_alloc_exec().

Signed-off-by: Mike Rapoport (IBM) 
Reviewed-by: Alexandre Ghiti 
---
 arch/riscv/kernel/module.c | 28 +---
 arch/riscv/kernel/probes/kprobes.c | 10 --
 arch/riscv/net/bpf_jit_core.c  | 13 -
 3 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
index 182904127ba0..2ecbacbc9993 100644
--- a/arch/riscv/kernel/module.c
+++ b/arch/riscv/kernel/module.c
@@ -906,19 +906,41 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char 
*strtab,
return 0;
 }
 
-#if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
+#ifdef CONFIG_MMU
 static struct execmem_info execmem_info __ro_after_init;
 
 struct execmem_info __init *execmem_arch_setup(void)
 {
+   unsigned long start, end;
+
+   if (IS_ENABLED(CONFIG_64BIT)) {
+   start = MODULES_VADDR;
+   end = MODULES_END;
+   } else {
+   start = VMALLOC_START;
+   end = VMALLOC_END;
+   }
+
execmem_info = (struct execmem_info){
.ranges = {
[EXECMEM_DEFAULT] = {
-   .start  = MODULES_VADDR,
-   .end= MODULES_END,
+   .start  = start,
+   .end= end,
.pgprot = PAGE_KERNEL,
.alignment = 1,
},
+   [EXECMEM_KPROBES] = {
+   .start  = VMALLOC_START,
+   .end= VMALLOC_END,
+   .pgprot = PAGE_KERNEL_READ_EXEC,
+   .alignment = 1,
+   },
+   [EXECMEM_BPF] = {
+   .start  = BPF_JIT_REGION_START,
+   .end= BPF_JIT_REGION_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = PAGE_SIZE,
+   },
},
};
 
diff --git a/arch/riscv/kernel/probes/kprobes.c 
b/arch/riscv/kernel/probes/kprobes.c
index 2f08c14a933d..e64f2f3064eb 100644
--- a/arch/riscv/kernel/probes/kprobes.c
+++ b/arch/riscv/kernel/probes/kprobes.c
@@ -104,16 +104,6 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
return 0;
 }
 
-#ifdef CONFIG_MMU
-void *alloc_insn_page(void)
-{
-   return  __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START, VMALLOC_END,
-GFP_KERNEL, PAGE_KERNEL_READ_EXEC,
-VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
-__builtin_return_address(0));
-}
-#endif
-
 /* install breakpoint in text */
 void __kprobes arch_arm_kprobe(struct kprobe *p)
 {
diff --git a/arch/riscv/net/bpf_jit_core.c b/arch/riscv/net/bpf_jit_core.c
index 6b3acac30c06..e238fdbd5dbc 100644
--- a/arch/riscv/net/bpf_jit_core.c
+++ b/arch/riscv/net/bpf_jit_core.c
@@ -219,19 +219,6 @@ u64 bpf_jit_alloc_exec_limit(void)
return BPF_JIT_REGION_SIZE;
 }
 
-void *bpf_jit_alloc_exec(unsigned long size)
-{
-   return __vmalloc_node_range(size, PAGE_SIZE, BPF_JIT_REGION_START,
-   BPF_JIT_REGION_END, GFP_KERNEL,
-   PAGE_KERNEL, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
-}
-
-void bpf_jit_free_exec(void *addr)
-{
-   return vfree(addr);
-}
-
 void *bpf_arch_text_copy(void *dst, void *src, size_t len)
 {
int ret;
-- 
2.43.0



[PATCH v5 08/15] mm/execmem, arch: convert remaining overrides of module_alloc to execmem

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Extend execmem parameters to accommodate more complex overrides of
module_alloc() by architectures.

This includes specification of a fallback range required by arm, arm64
and powerpc, EXECMEM_MODULE_DATA type required by powerpc, support for
allocation of KASAN shadow required by s390 and x86 and support for
early initialization of execmem required by x86.

The core implementation of execmem_alloc() takes care of suppressing
warnings when the initial allocation fails but there is a fallback range
defined.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Will Deacon 
---
 arch/Kconfig   |  6 +++
 arch/arm/kernel/module.c   | 41 ++---
 arch/arm64/kernel/module.c | 67 ++--
 arch/arm64/kernel/probes/kprobes.c |  7 ---
 arch/arm64/net/bpf_jit_comp.c  | 11 -
 arch/powerpc/kernel/module.c   | 60 -
 arch/s390/kernel/module.c  | 54 ++-
 arch/x86/Kconfig   |  1 +
 arch/x86/kernel/module.c   | 70 ++
 include/linux/execmem.h| 34 +++
 include/linux/moduleloader.h   | 12 -
 kernel/module/main.c   | 26 +++
 mm/execmem.c   | 70 +-
 mm/mm_init.c   |  2 +
 14 files changed, 259 insertions(+), 202 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 65afb1de48b3..7006f71f0110 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -960,6 +960,12 @@ config ARCH_WANTS_MODULES_DATA_IN_VMALLOC
  For architectures like powerpc/32 which have constraints on module
  allocation and need to allocate module data outside of module area.
 
+config ARCH_WANTS_EXECMEM_EARLY
+   bool
+   help
+ For architectures that might allocate executable memory early on
+ boot, for instance ftrace on x86.
+
 config HAVE_IRQ_EXIT_ON_IRQ_STACK
bool
help
diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index e74d84f58b77..a98fdf6ff26c 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -34,23 +35,31 @@
 #endif
 
 #ifdef CONFIG_MMU
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   gfp_t gfp_mask = GFP_KERNEL;
-   void *p;
-
-   /* Silence the initial allocation */
-   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS))
-   gfp_mask |= __GFP_NOWARN;
-
-   p = __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   gfp_mask, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
-   if (!IS_ENABLED(CONFIG_ARM_MODULE_PLTS) || p)
-   return p;
-   return __vmalloc_node_range(size, 1,  VMALLOC_START, VMALLOC_END,
-   GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
+   unsigned long fallback_start = 0, fallback_end = 0;
+
+   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) {
+   fallback_start = VMALLOC_START;
+   fallback_end = VMALLOC_END;
+   }
+
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL_EXEC,
+   .alignment = 1,
+   .fallback_start = fallback_start,
+   .fallback_end   = fallback_end,
+   },
+   },
+   };
+
+   return _info;
 }
 #endif
 
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index e92da4da1b2a..a52240ea084b 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -108,41 +109,59 @@ static int __init module_init_limits(void)
 
return 0;
 }
-subsys_initcall(module_init_limits);
 
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   void *p = NULL;
+   unsigned long fallback_start = 0, fallback_end = 0;
+   unsigned long start = 0, end = 0;
+
+   module_init_limits();
 
/*
 * Where possible, prefer to allocate within direct branch range of the
 * kernel such that no PLTs are necessary.
 */
if (module_direct_base) {
-   p = __vmalloc_node_range(size, MODULE_ALIGN,
-module_direct_base,
-   

[PATCH v5 07/15] mm/execmem, arch: convert simple overrides of module_alloc to execmem

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Several architectures override module_alloc() only to define address
range for code allocations different than VMALLOC address space.

Provide a generic implementation in execmem that uses the parameters for
address space ranges, required alignment and page protections provided
by architectures.

The architectures must fill execmem_info structure and implement
execmem_arch_setup() that returns a pointer to that structure. This way the
execmem initialization won't be called from every architecture, but rather
from a central place, namely a core_initcall() in execmem.

The execmem provides execmem_alloc() API that wraps __vmalloc_node_range()
with the parameters defined by the architectures.  If an architecture does
not implement execmem_arch_setup(), execmem_alloc() will fall back to
module_alloc().

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/loongarch/kernel/module.c | 19 +++--
 arch/mips/kernel/module.c  | 20 --
 arch/nios2/kernel/module.c | 21 +++---
 arch/parisc/kernel/module.c| 24 +++
 arch/riscv/kernel/module.c | 24 +++
 arch/sparc/kernel/module.c | 20 --
 include/linux/execmem.h| 41 +++
 mm/execmem.c   | 73 --
 8 files changed, 208 insertions(+), 34 deletions(-)

diff --git a/arch/loongarch/kernel/module.c b/arch/loongarch/kernel/module.c
index c7d0338d12c1..ca6dd7ea1610 100644
--- a/arch/loongarch/kernel/module.c
+++ b/arch/loongarch/kernel/module.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -490,10 +491,22 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char 
*strtab,
return 0;
 }
 
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE, 
__builtin_return_address(0));
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = 1,
+   },
+   },
+   };
+
+   return _info;
 }
 
 static void module_init_ftrace_plt(const Elf_Ehdr *hdr,
diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
index 9a6c96014904..59225a3cf918 100644
--- a/arch/mips/kernel/module.c
+++ b/arch/mips/kernel/module.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 struct mips_hi16 {
@@ -32,11 +33,22 @@ static LIST_HEAD(dbe_list);
 static DEFINE_SPINLOCK(dbe_lock);
 
 #ifdef MODULES_VADDR
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = 1,
+   },
+   },
+   };
+
+   return _info;
 }
 #endif
 
diff --git a/arch/nios2/kernel/module.c b/arch/nios2/kernel/module.c
index 9c97b7513853..0d1ee86631fc 100644
--- a/arch/nios2/kernel/module.c
+++ b/arch/nios2/kernel/module.c
@@ -18,15 +18,26 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL_EXEC,
-   VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
-   __builtin_return_address(0));
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL_EXEC,
+   .alignment = 1,
+   },
+   },
+   };
+
+   return _info;
 }
 
 int apply_relocate_add(Elf32_Shdr *sechdrs, const char *strtab,
diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c
index 

[PATCH v5 06/15] mm: introduce execmem_alloc() and execmem_free()

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

module_alloc() is used everywhere as a mean to allocate memory for code.

Beside being semantically wrong, this unnecessarily ties all subsystems
that need to allocate code, such as ftrace, kprobes and BPF to modules and
puts the burden of code allocation to the modules code.

Several architectures override module_alloc() because of various
constraints where the executable memory can be located and this causes
additional obstacles for improvements of code allocation.

Start splitting code allocation from modules by introducing execmem_alloc()
and execmem_free() APIs.

Initially, execmem_alloc() is a wrapper for module_alloc() and
execmem_free() is a replacement of module_memfree() to allow updating all
call sites to use the new APIs.

Since architectures define different restrictions on placement,
permissions, alignment and other parameters for memory that can be used by
different subsystems that allocate executable memory, execmem_alloc() takes
a type argument, that will be used to identify the calling subsystem and to
allow architectures define parameters for ranges suitable for that
subsystem.

No functional changes.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Masami Hiramatsu (Google) 
---
 arch/powerpc/kernel/kprobes.c|  6 ++--
 arch/s390/kernel/ftrace.c|  4 +--
 arch/s390/kernel/kprobes.c   |  4 +--
 arch/s390/kernel/module.c|  5 +--
 arch/sparc/net/bpf_jit_comp_32.c |  8 ++---
 arch/x86/kernel/ftrace.c |  6 ++--
 arch/x86/kernel/kprobes/core.c   |  4 +--
 include/linux/execmem.h  | 57 
 include/linux/moduleloader.h |  3 --
 kernel/bpf/core.c|  6 ++--
 kernel/kprobes.c |  8 ++---
 kernel/module/Kconfig|  1 +
 kernel/module/main.c | 25 +-
 mm/Kconfig   |  3 ++
 mm/Makefile  |  1 +
 mm/execmem.c | 32 ++
 16 files changed, 128 insertions(+), 45 deletions(-)
 create mode 100644 include/linux/execmem.h
 create mode 100644 mm/execmem.c

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index bbca90a5e2ec..9fcd01bb2ce6 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -19,8 +19,8 @@
 #include 
 #include 
 #include 
-#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -130,7 +130,7 @@ void *alloc_insn_page(void)
 {
void *page;
 
-   page = module_alloc(PAGE_SIZE);
+   page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
if (!page)
return NULL;
 
@@ -142,7 +142,7 @@ void *alloc_insn_page(void)
}
return page;
 error:
-   module_memfree(page);
+   execmem_free(page);
return NULL;
 }
 
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index c46381ea04ec..798249ef5646 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -7,13 +7,13 @@
  *   Author(s): Martin Schwidefsky 
  */
 
-#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -220,7 +220,7 @@ static int __init ftrace_plt_init(void)
 {
const char *start, *end;
 
-   ftrace_plt = module_alloc(PAGE_SIZE);
+   ftrace_plt = execmem_alloc(EXECMEM_FTRACE, PAGE_SIZE);
if (!ftrace_plt)
panic("cannot allocate ftrace plt\n");
 
diff --git a/arch/s390/kernel/kprobes.c b/arch/s390/kernel/kprobes.c
index f0cf20d4b3c5..3c1b1be744de 100644
--- a/arch/s390/kernel/kprobes.c
+++ b/arch/s390/kernel/kprobes.c
@@ -9,7 +9,6 @@
 
 #define pr_fmt(fmt) "kprobes: " fmt
 
-#include 
 #include 
 #include 
 #include 
@@ -21,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -38,7 +38,7 @@ void *alloc_insn_page(void)
 {
void *page;
 
-   page = module_alloc(PAGE_SIZE);
+   page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
if (!page)
return NULL;
set_memory_rox((unsigned long)page, 1);
diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
index 42215f9404af..ac97a905e8cd 100644
--- a/arch/s390/kernel/module.c
+++ b/arch/s390/kernel/module.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -76,7 +77,7 @@ void *module_alloc(unsigned long size)
 #ifdef CONFIG_FUNCTION_TRACER
 void module_arch_cleanup(struct module *mod)
 {
-   module_memfree(mod->arch.trampolines_start);
+   execmem_free(mod->arch.trampolines_start);
 }
 #endif
 
@@ -510,7 +511,7 @@ static int module_alloc_ftrace_hotpatch_trampolines(struct 
module *me,
 
size = FTRACE_HOTPATCH_TRAMPOLINES_SIZE(s->sh_size);
numpages = DIV_ROUND_UP(size, PAGE_SIZE);
-   start = module_alloc(numpages * PAGE_SIZE);
+   start = execmem_alloc(EXECMEM_FTRACE, numpages * PAGE_SIZE);
if (!start)
return -ENOMEM;

[PATCH v5 05/15] module: make module_memory_{alloc,free} more self-contained

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Move the logic related to the memory allocation and freeing into
module_memory_alloc() and module_memory_free().

Signed-off-by: Mike Rapoport (IBM) 
---
 kernel/module/main.c | 64 +++-
 1 file changed, 39 insertions(+), 25 deletions(-)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index e1e8a7a9d6c1..5b82b069e0d3 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1203,15 +1203,44 @@ static bool mod_mem_use_vmalloc(enum mod_mem_type type)
mod_mem_type_is_core_data(type);
 }
 
-static void *module_memory_alloc(unsigned int size, enum mod_mem_type type)
+static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
 {
+   unsigned int size = PAGE_ALIGN(mod->mem[type].size);
+   void *ptr;
+
+   mod->mem[type].size = size;
+
if (mod_mem_use_vmalloc(type))
-   return vzalloc(size);
-   return module_alloc(size);
+   ptr = vmalloc(size);
+   else
+   ptr = module_alloc(size);
+
+   if (!ptr)
+   return -ENOMEM;
+
+   /*
+* The pointer to these blocks of memory are stored on the module
+* structure and we keep that around so long as the module is
+* around. We only free that memory when we unload the module.
+* Just mark them as not being a leak then. The .init* ELF
+* sections *do* get freed after boot so we *could* treat them
+* slightly differently with kmemleak_ignore() and only grey
+* them out as they work as typical memory allocations which
+* *do* eventually get freed, but let's just keep things simple
+* and avoid *any* false positives.
+*/
+   kmemleak_not_leak(ptr);
+
+   memset(ptr, 0, size);
+   mod->mem[type].base = ptr;
+
+   return 0;
 }
 
-static void module_memory_free(void *ptr, enum mod_mem_type type)
+static void module_memory_free(struct module *mod, enum mod_mem_type type)
 {
+   void *ptr = mod->mem[type].base;
+
if (mod_mem_use_vmalloc(type))
vfree(ptr);
else
@@ -1229,12 +1258,12 @@ static void free_mod_mem(struct module *mod)
/* Free lock-classes; relies on the preceding sync_rcu(). */
lockdep_free_key_range(mod_mem->base, mod_mem->size);
if (mod_mem->size)
-   module_memory_free(mod_mem->base, type);
+   module_memory_free(mod, type);
}
 
/* MOD_DATA hosts mod, so free it at last */
lockdep_free_key_range(mod->mem[MOD_DATA].base, 
mod->mem[MOD_DATA].size);
-   module_memory_free(mod->mem[MOD_DATA].base, MOD_DATA);
+   module_memory_free(mod, MOD_DATA);
 }
 
 /* Free a module, remove from lists, etc. */
@@ -2225,7 +2254,6 @@ static int find_module_sections(struct module *mod, 
struct load_info *info)
 static int move_module(struct module *mod, struct load_info *info)
 {
int i;
-   void *ptr;
enum mod_mem_type t = 0;
int ret = -ENOMEM;
 
@@ -2234,26 +2262,12 @@ static int move_module(struct module *mod, struct 
load_info *info)
mod->mem[type].base = NULL;
continue;
}
-   mod->mem[type].size = PAGE_ALIGN(mod->mem[type].size);
-   ptr = module_memory_alloc(mod->mem[type].size, type);
-   /*
- * The pointer to these blocks of memory are stored on the 
module
- * structure and we keep that around so long as the module is
- * around. We only free that memory when we unload the module.
- * Just mark them as not being a leak then. The .init* ELF
- * sections *do* get freed after boot so we *could* treat them
- * slightly differently with kmemleak_ignore() and only grey
- * them out as they work as typical memory allocations which
- * *do* eventually get freed, but let's just keep things simple
- * and avoid *any* false positives.
-*/
-   kmemleak_not_leak(ptr);
-   if (!ptr) {
+
+   ret = module_memory_alloc(mod, type);
+   if (ret) {
t = type;
goto out_enomem;
}
-   memset(ptr, 0, mod->mem[type].size);
-   mod->mem[type].base = ptr;
}
 
/* Transfer each section which specifies SHF_ALLOC */
@@ -2296,7 +2310,7 @@ static int move_module(struct module *mod, struct 
load_info *info)
return 0;
 out_enomem:
for (t--; t >= 0; t--)
-   module_memory_free(mod->mem[t].base, t);
+   module_memory_free(mod, t);
return ret;
 }
 
-- 
2.43.0



[PATCH v5 04/15] sparc: simplify module_alloc()

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Define MODULES_VADDR and MODULES_END as VMALLOC_START and VMALLOC_END
for 32-bit and reduce module_alloc() to

__vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, ...)

as with the new defines the allocations becames identical for both 32
and 64 bits.

While on it, drop unsed include of 

Suggested-by: Sam Ravnborg 
Signed-off-by: Mike Rapoport (IBM) 
---
 arch/sparc/include/asm/pgtable_32.h |  2 ++
 arch/sparc/kernel/module.c  | 25 +
 2 files changed, 3 insertions(+), 24 deletions(-)

diff --git a/arch/sparc/include/asm/pgtable_32.h 
b/arch/sparc/include/asm/pgtable_32.h
index 9e85d57ac3f2..62bcafe38b1f 100644
--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -432,6 +432,8 @@ static inline int io_remap_pfn_range(struct vm_area_struct 
*vma,
 
 #define VMALLOC_START   _AC(0xfe60,UL)
 #define VMALLOC_END _AC(0xffc0,UL)
+#define MODULES_VADDR   VMALLOC_START
+#define MODULES_END VMALLOC_END
 
 /* We provide our own get_unmapped_area to cope with VA holes for userland */
 #define HAVE_ARCH_UNMAPPED_AREA
diff --git a/arch/sparc/kernel/module.c b/arch/sparc/kernel/module.c
index 66c45a2764bc..d37adb2a0b54 100644
--- a/arch/sparc/kernel/module.c
+++ b/arch/sparc/kernel/module.c
@@ -21,35 +21,12 @@
 
 #include "entry.h"
 
-#ifdef CONFIG_SPARC64
-
-#include 
-
-static void *module_map(unsigned long size)
+void *module_alloc(unsigned long size)
 {
-   if (PAGE_ALIGN(size) > MODULES_LEN)
-   return NULL;
return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
__builtin_return_address(0));
 }
-#else
-static void *module_map(unsigned long size)
-{
-   return vmalloc(size);
-}
-#endif /* CONFIG_SPARC64 */
-
-void *module_alloc(unsigned long size)
-{
-   void *ret;
-
-   ret = module_map(size);
-   if (ret)
-   memset(ret, 0, size);
-
-   return ret;
-}
 
 /* Make generic code ignore STT_REGISTER dummy undefined symbols.  */
 int module_frob_arch_sections(Elf_Ehdr *hdr,
-- 
2.43.0



[PATCH v5 03/15] nios2: define virtual address space for modules

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

nios2 uses kmalloc() to implement module_alloc() because CALL26/PCREL26
cannot reach all of vmalloc address space.

Define module space as 32MiB below the kernel base and switch nios2 to
use vmalloc for module allocations.

Suggested-by: Thomas Gleixner 
Acked-by: Dinh Nguyen 
Acked-by: Song Liu 
Signed-off-by: Mike Rapoport (IBM) 
---
 arch/nios2/include/asm/pgtable.h |  5 -
 arch/nios2/kernel/module.c   | 19 ---
 2 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
index d052dfcbe8d3..eab87c6beacb 100644
--- a/arch/nios2/include/asm/pgtable.h
+++ b/arch/nios2/include/asm/pgtable.h
@@ -25,7 +25,10 @@
 #include 
 
 #define VMALLOC_START  CONFIG_NIOS2_KERNEL_MMU_REGION_BASE
-#define VMALLOC_END(CONFIG_NIOS2_KERNEL_REGION_BASE - 1)
+#define VMALLOC_END(CONFIG_NIOS2_KERNEL_REGION_BASE - SZ_32M - 1)
+
+#define MODULES_VADDR  (CONFIG_NIOS2_KERNEL_REGION_BASE - SZ_32M)
+#define MODULES_END(CONFIG_NIOS2_KERNEL_REGION_BASE - 1)
 
 struct mm_struct;
 
diff --git a/arch/nios2/kernel/module.c b/arch/nios2/kernel/module.c
index 76e0a42d6e36..9c97b7513853 100644
--- a/arch/nios2/kernel/module.c
+++ b/arch/nios2/kernel/module.c
@@ -21,23 +21,12 @@
 
 #include 
 
-/*
- * Modules should NOT be allocated with kmalloc for (obvious) reasons.
- * But we do it for now to avoid relocation issues. CALL26/PCREL26 cannot reach
- * from 0x8000 (vmalloc area) to 0xc (kernel) (kmalloc returns
- * addresses in 0xc000)
- */
 void *module_alloc(unsigned long size)
 {
-   if (size == 0)
-   return NULL;
-   return kmalloc(size, GFP_KERNEL);
-}
-
-/* Free memory returned from module_alloc */
-void module_memfree(void *module_region)
-{
-   kfree(module_region);
+   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
+   GFP_KERNEL, PAGE_KERNEL_EXEC,
+   VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
+   __builtin_return_address(0));
 }
 
 int apply_relocate_add(Elf32_Shdr *sechdrs, const char *strtab,
-- 
2.43.0



[PATCH v5 02/15] mips: module: rename MODULE_START to MODULES_VADDR

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

and MODULE_END to MODULES_END to match other architectures that define
custom address space for modules.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/mips/include/asm/pgtable-64.h | 4 ++--
 arch/mips/kernel/module.c  | 4 ++--
 arch/mips/mm/fault.c   | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/pgtable-64.h 
b/arch/mips/include/asm/pgtable-64.h
index 20ca48c1b606..c0109aff223b 100644
--- a/arch/mips/include/asm/pgtable-64.h
+++ b/arch/mips/include/asm/pgtable-64.h
@@ -147,8 +147,8 @@
 #if defined(CONFIG_MODULES) && defined(KBUILD_64BIT_SYM32) && \
VMALLOC_START != CKSSEG
 /* Load modules into 32bit-compatible segment. */
-#define MODULE_START   CKSSEG
-#define MODULE_END (FIXADDR_START-2*PAGE_SIZE)
+#define MODULES_VADDR  CKSSEG
+#define MODULES_END(FIXADDR_START-2*PAGE_SIZE)
 #endif
 
 #define pte_ERROR(e) \
diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
index 7b2fbaa9cac5..9a6c96014904 100644
--- a/arch/mips/kernel/module.c
+++ b/arch/mips/kernel/module.c
@@ -31,10 +31,10 @@ struct mips_hi16 {
 static LIST_HEAD(dbe_list);
 static DEFINE_SPINLOCK(dbe_lock);
 
-#ifdef MODULE_START
+#ifdef MODULES_VADDR
 void *module_alloc(unsigned long size)
 {
-   return __vmalloc_node_range(size, 1, MODULE_START, MODULE_END,
+   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
__builtin_return_address(0));
 }
diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index aaa9a242ebba..37fedeaca2e9 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -83,8 +83,8 @@ static void __do_page_fault(struct pt_regs *regs, unsigned 
long write,
 
if (unlikely(address >= VMALLOC_START && address <= VMALLOC_END))
goto VMALLOC_FAULT_TARGET;
-#ifdef MODULE_START
-   if (unlikely(address >= MODULE_START && address < MODULE_END))
+#ifdef MODULES_VADDR
+   if (unlikely(address >= MODULES_VADDR && address < MODULES_END))
goto VMALLOC_FAULT_TARGET;
 #endif
 
-- 
2.43.0



[PATCH v5 01/15] arm64: module: remove unneeded call to kasan_alloc_module_shadow()

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Since commit f6f37d9320a1 ("arm64: select KASAN_VMALLOC for SW/HW_TAGS
modes") KASAN_VMALLOC is always enabled when KASAN is on. This means
that allocations in module_alloc() will be tracked by KASAN protection
for vmalloc() and that kasan_alloc_module_shadow() will be always an
empty inline and there is no point in calling it.

Drop meaningless call to kasan_alloc_module_shadow() from
module_alloc().

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/arm64/kernel/module.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index 47e0be610bb6..e92da4da1b2a 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -141,11 +141,6 @@ void *module_alloc(unsigned long size)
__func__);
}
 
-   if (p && (kasan_alloc_module_shadow(p, size, GFP_KERNEL) < 0)) {
-   vfree(p);
-   return NULL;
-   }
-
/* Memory is intended to be executable, reset the pointer tag. */
return kasan_reset_tag(p);
 }
-- 
2.43.0



[PATCH v5 00/15] mm: jit/text allocator

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

(something went wrong with the prevois posting, sorry for the noise)

Hi,

Since v3 I looked into making execmem more of an utility toolbox, as we
discussed at LPC with Mark Rutland, but it was getting more hairier than
having a struct describing architecture constraints and a type identifying
the consumer of execmem.

And I do think that having the description of architecture constraints for
allocations of executable memory in a single place is better than having it
spread all over the place.

The patches available via git:
https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v5

v5 changes:
* rebase on v6.9-rc4 to avoid a conflict in kprobes
* add copyrights to mm/execmem.c (Luis)
* fix spelling (Ingo)
* define MODULES_VADDDR for sparc (Sam)
* consistently initialize struct execmem_info (Peter)
* reduce #ifdefs in function bodies in kprobes (Masami) 

v4: https://lore.kernel.org/all/20240411160051.2093261-1-r...@kernel.org
* rebase on v6.9-rc2
* rename execmem_params to execmem_info and execmem_arch_params() to
  execmem_arch_setup()
* use single execmem_alloc() API instead of execmem_{text,data}_alloc() (Song)
* avoid extra copy of execmem parameters (Rick)
* run execmem_init() as core_initcall() except for the architectures that
  may allocated text really early (currently only x86) (Will)
* add acks for some of arm64 and riscv changes, thanks Will and Alexandre
* new commits:
  - drop call to kasan_alloc_module_shadow() on arm64 because it's not
needed anymore
  - rename MODULE_START to MODULES_VADDR on MIPS
  - use CONFIG_EXECMEM instead of CONFIG_MODULES on powerpc as per Christophe:
https://lore.kernel.org/all/79062fa3-3402-47b3-8920-9231ad05e...@csgroup.eu/

v3: https://lore.kernel.org/all/20230918072955.2507221-1-r...@kernel.org
* add type parameter to execmem allocation APIs
* remove BPF dependency on modules

v2: https://lore.kernel.org/all/20230616085038.4121892-1-r...@kernel.org
* Separate "module" and "others" allocations with execmem_text_alloc()
and jit_text_alloc()
* Drop ROX entailment on x86
* Add ack for nios2 changes, thanks Dinh Nguyen

v1: https://lore.kernel.org/all/20230601101257.530867-1-r...@kernel.org

= Cover letter from v1 (sligtly updated) =

module_alloc() is used everywhere as a mean to allocate memory for code.

Beside being semantically wrong, this unnecessarily ties all subsystmes
that need to allocate code, such as ftrace, kprobes and BPF to modules and
puts the burden of code allocation to the modules code.

Several architectures override module_alloc() because of various
constraints where the executable memory can be located and this causes
additional obstacles for improvements of code allocation.

A centralized infrastructure for code allocation allows allocations of
executable memory as ROX, and future optimizations such as caching large
pages for better iTLB performance and providing sub-page allocations for
users that only need small jit code snippets.

Rick Edgecombe proposed perm_alloc extension to vmalloc [1] and Song Liu
proposed execmem_alloc [2], but both these approaches were targeting BPF
allocations and lacked the ground work to abstract executable allocations
and split them from the modules core.

Thomas Gleixner suggested to express module allocation restrictions and
requirements as struct mod_alloc_type_params [3] that would define ranges,
protections and other parameters for different types of allocations used by
modules and following that suggestion Song separated allocations of
different types in modules (commit ac3b43283923 ("module: replace
module_layout with module_memory")) and posted "Type aware module
allocator" set [4].

I liked the idea of parametrising code allocation requirements as a
structure, but I believe the original proposal and Song's module allocator
was too module centric, so I came up with these patches.

This set splits code allocation from modules by introducing execmem_alloc()
and and execmem_free(), APIs, replaces call sites of module_alloc() and
module_memfree() with the new APIs and implements core text and related
allocations in a central place.

Instead of architecture specific overrides for module_alloc(), the
architectures that require non-default behaviour for text allocation must
fill execmem_info structure and implement execmem_arch_setup() that returns
a pointer to that structure. If an architecture does not implement
execmem_arch_setup(), the defaults compatible with the current
modules::module_alloc() are used.

Since architectures define different restrictions on placement,
permissions, alignment and other parameters for memory that can be used by
different subsystems that allocate executable memory, execmem APIs
take a type argument, that will be used to identify the calling subsystem
and to allow architectures to define parameters for ranges suitable for that
subsystem.

The new infrastructure allows decoupling of BPF, kprobes and ftrace from

Re: [PATCH] powerpc: Split PAGE_SHIFT/SIZE into vdso/page.h

2024-04-22 Thread Michael Ellerman
Michael Ellerman  writes:
> The VDSO needs PAGE_SHIFT/SIZE defined, so it includes asm/page.h.

For the archives, this was superseeded by Arnd's rework:

  https://lore.kernel.org/all/20240320180228.136371-1-a...@kernel.org/

cheers


[RFC PATCH 2/2] objtool/powerpc: Enhance objtool to fixup alternate feature relative addresses

2024-04-22 Thread Sathvika Vasireddy
Implement build-time fixup of alternate feature relative addresses for
the out-of-line (else) patch code. Initial posting to achieve the same
using another tool can be found at [1]. Idea is to implement this using
objtool instead of introducing another tool since it already has elf
parsing and processing covered.

Introduce --ftr-fixup as an option to objtool to do feature fixup at
build-time.

Couple of issues and warnings encountered while implementing feature
fixup using objtool are as follows:

1. libelf is creating corrupted vmlinux file after writing necessary
changes to the file. Due to this, kexec is not able to load new
kernel.

It gives the following error:
ELF Note corrupted !
Cannot determine the file type of vmlinux

To fix this issue, after opening vmlinux file, make a call to
elf_flagelf (e, ELF_C_SET, ELF_F_LAYOUT). This instructs libelf not
to touch the segment and section layout. It informs the library
that the application will take responsibility for the layout of the
file and that the library should not insert any padding between
sections.

2. Fix can't find starting instruction warnings when run on vmlinux

Objtool throws a lot of can't find starting instruction warnings
when run on vmlinux with --ftr-fixup option.

These warnings are seen because find_insn() function looks for
instructions at offsets that are relative to the start of the section.
In case of individual object files (.o), there are no can't find
starting instruction warnings seen because the actual offset
associated with an instruction is itself a relative offset since the
sections start at offset 0x0.

However, in case of vmlinux, find_insn() function fails to find
instructions at the actual offset associated with an instruction
since the sections in vmlinux do not start at offset 0x0. Due to
this, find_insn() will look for absolute offset and not the relative
offset. This is resulting in a lot of can't find starting instruction
warnings when objtool is run on vmlinux.

To fix this, pass offset that is relative to the start of the section
to find_insn().

find_insn() is also looking for symbols of size 0. But, objtool does
not store empty STT_NOTYPE symbols in the rbtree. Due to this,
for empty symbols, objtool is throwing can't find starting
instruction warnings. Fix this by ignoring symbols that are of
size 0 since objtool does not add them to the rbtree.

3. Objtool is throwing unannotated intra-function call warnings
when run on vmlinux with --ftr-fixup option.

One such example:

vmlinux: warning: objtool: .text+0x3d94:
unannotated intra-function call

.text + 0x3d94 = c0008000 + 3d94 = c00081d4

c00081d4: 45 24 02 48  bl c002a618


c002a610 :
c002a610:   0e 01 4c 3c addis   r2,r12,270
c002a610: R_PPC64_REL16_HA.TOC.
c002a614:   f0 6c 42 38 addir2,r2,27888
c002a614: R_PPC64_REL16_LO.TOC.+0x4
c002a618:   a6 02 08 7c mflrr0

This is happening because we should be looking for destination
symbols that are at absolute offsets instead of relative offsets.
After fixing dest_off to point to absolute offset, there are still
a lot of these warnings shown.

In the above example, objtool is computing the destination
offset to be c002a618, which points to a completely
different instruction. find_call_destination() is looking for this
offset and failing. Instead, we should be looking for destination
offset c002a610 which points to system_reset_exception
function.

Even after fixing the way destination offset is computed, and
after looking for dest_off - 0x8 in cases where the original offset
is not found, there are still a lot of unannotated intra-function
call warnings generated. This is due to symbols that are not
properly annotated.

So, for now, as a hack to curb these warnings, do not emit
unannotated intra-function call warnings when objtool is run
with --ftr-fixup option.

TODO:
This patch enables build time feature fixup only for powerpc little
endian configs. There are boot failures with big endian configs.
Posting this as an initial RFC to get some review comments while I work
on big endian issues.

[1]
https://lore.kernel.org/linuxppc-dev/20170521010130.13552-1-npig...@gmail.com/

Co-developed-by: Nicholas Piggin 
Signed-off-by: Nicholas Piggin 
Signed-off-by: Sathvika Vasireddy 
---
 arch/Kconfig  |   3 +
 arch/powerpc/Kconfig  |   5 +
 arch/powerpc/Makefile |   5 +
 arch/powerpc/include/asm/feature-fixups.h |  11 +-
 arch/powerpc/kernel/vmlinux.lds.S |  14 +-
 arch/powerpc/lib/feature-fixups.c |  13 +
 scripts/Makefile.lib  |   7 +
 scripts/Makefile.vmlinux  |  15 +-
 tools/objtool/arch/powerpc/special.c  | 329 ++
 tools/objtool/arch/x86/special.c   

[RFC PATCH 1/2] objtool: Run objtool only if either of the config options are selected

2024-04-22 Thread Sathvika Vasireddy
Currently, when objtool is enabled and none of the supported options
are triggered, kernel build errors out with the below error:
error: objtool: At least one command required.

To address this, ensure that objtool is run only when either of the
config options are selected.

Signed-off-by: Sathvika Vasireddy 
---
 scripts/Makefile.lib | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 3179747cbd2c..c65bb0fbd136 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -286,7 +286,10 @@ objtool-args = $(objtool-args-y)   
\
 
 delay-objtool := $(or $(CONFIG_LTO_CLANG),$(CONFIG_X86_KERNEL_IBT))
 
+ifneq ($(objtool-args-y),)
 cmd_objtool = $(if $(objtool-enabled), ; $(objtool) $(objtool-args) $@)
+endif
+
 cmd_gen_objtooldep = $(if $(objtool-enabled), { echo ; echo '$@: $$(wildcard 
$(objtool))' ; } >> $(dot-target).cmd)
 
 endif # CONFIG_OBJTOOL
-- 
2.34.1



Re: [PATCH] powerpc/iommu: Refactor spapr_tce_platform_iommu_attach_dev()

2024-04-22 Thread Michael Ellerman
On Thu, 15 Feb 2024 07:52:32 -0600, Shivaprasad G Bhat wrote:
> The patch makes the iommu_group_get() call only when using it
> thereby avoiding the unnecessary get & put for domain already
> being set case.
> 
> 

Applied to powerpc/fixes.

[1/1] powerpc/iommu: Refactor spapr_tce_platform_iommu_attach_dev()
  https://git.kernel.org/powerpc/c/5bd31ab5f79eb6e3bdfa0ca0b57650f9d1604062

cheers


Re: [PATCH] powerpc/crypto/chacha-p10: Fix failure on non Power10

2024-04-22 Thread Michael Ellerman
On Fri, 29 Mar 2024 00:02:00 +1100, Michael Ellerman wrote:
> The chacha-p10-crypto module provides optimised chacha routines for
> Power10. It also selects CRYPTO_ARCH_HAVE_LIB_CHACHA which says it
> provides chacha_crypt_arch() to generic code.
> 
> Notably the module needs to provide chacha_crypt_arch() regardless of
> whether it is loaded on Power10 or an older CPU.
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/crypto/chacha-p10: Fix failure on non Power10
  https://git.kernel.org/powerpc/c/69630926011c1f7170a465b7b5c228deb66e9372

cheers


Re: [PATCH v2] selftests/powerpc/papr-vpd: Fix missing variable initialization

2024-04-22 Thread Michael Ellerman
On Thu, 04 Apr 2024 17:02:09 -0500, Nathan Lynch wrote:
> The "close handle without consuming VPD" testcase has inconsistent
> results because it fails to initialize the location code object it
> passes to ioctl() to create a VPD handle. Initialize the location code
> to the empty string as intended.
> 
> 

Applied to powerpc/fixes.

[1/1] selftests/powerpc/papr-vpd: Fix missing variable initialization
  https://git.kernel.org/powerpc/c/210cfef579260ed6c3b700e7baeae51a5e183f43

cheers


[PATCH v5 15/15] bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

BPF just-in-time compiler depended on CONFIG_MODULES because it used
module_alloc() to allocate memory for the generated code.

Since code allocations are now implemented with execmem, drop dependency of
CONFIG_BPF_JIT on CONFIG_MODULES and make it select CONFIG_EXECMEM.

Suggested-by: Björn Töpel 
Signed-off-by: Mike Rapoport (IBM) 
---
 kernel/bpf/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
index bc25f5098a25..f999e4e0b344 100644
--- a/kernel/bpf/Kconfig
+++ b/kernel/bpf/Kconfig
@@ -43,7 +43,7 @@ config BPF_JIT
bool "Enable BPF Just In Time compiler"
depends on BPF
depends on HAVE_CBPF_JIT || HAVE_EBPF_JIT
-   depends on MODULES
+   select EXECMEM
help
  BPF programs are normally handled by a BPF interpreter. This option
  allows the kernel to generate native code when a program is loaded
-- 
2.43.0



[PATCH v5 14/15] kprobes: remove dependency on CONFIG_MODULES

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

kprobes depended on CONFIG_MODULES because it has to allocate memory for
code.

Since code allocations are now implemented with execmem, kprobes can be
enabled in non-modular kernels.

Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside
modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the
dependency of CONFIG_KPROBES on CONFIG_MODULES.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/Kconfig|  2 +-
 include/linux/module.h  |  9 ++
 kernel/kprobes.c| 55 +++--
 kernel/trace/trace_kprobe.c | 20 +-
 4 files changed, 63 insertions(+), 23 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 7006f71f0110..a48ce6a488b3 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -52,9 +52,9 @@ config GENERIC_ENTRY
 
 config KPROBES
bool "Kprobes"
-   depends on MODULES
depends on HAVE_KPROBES
select KALLSYMS
+   select EXECMEM
select TASKS_RCU if PREEMPTION
help
  Kprobes allows you to trap at almost any kernel address and
diff --git a/include/linux/module.h b/include/linux/module.h
index 1153b0d99a80..ffa1c603163c 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -605,6 +605,11 @@ static inline bool module_is_live(struct module *mod)
return mod->state != MODULE_STATE_GOING;
 }
 
+static inline bool module_is_coming(struct module *mod)
+{
+return mod->state == MODULE_STATE_COMING;
+}
+
 struct module *__module_text_address(unsigned long addr);
 struct module *__module_address(unsigned long addr);
 bool is_module_address(unsigned long addr);
@@ -857,6 +862,10 @@ void *dereference_module_function_descriptor(struct module 
*mod, void *ptr)
return ptr;
 }
 
+static inline bool module_is_coming(struct module *mod)
+{
+   return false;
+}
 #endif /* CONFIG_MODULES */
 
 #ifdef CONFIG_SYSFS
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index ddd7cdc16edf..ca2c6cbd42d2 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1588,7 +1588,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
}
 
/* Get module refcount and reject __init functions for loaded modules. 
*/
-   if (*probed_mod) {
+   if (IS_ENABLED(CONFIG_MODULES) && *probed_mod) {
/*
 * We must hold a refcount of the probed module while updating
 * its code to prohibit unexpected unloading.
@@ -1603,12 +1603,13 @@ static int check_kprobe_address_safe(struct kprobe *p,
 * kprobes in there.
 */
if (within_module_init((unsigned long)p->addr, *probed_mod) &&
-   (*probed_mod)->state != MODULE_STATE_COMING) {
+   !module_is_coming(*probed_mod)) {
module_put(*probed_mod);
*probed_mod = NULL;
ret = -ENOENT;
}
}
+
 out:
preempt_enable();
jump_label_unlock();
@@ -2488,24 +2489,6 @@ int kprobe_add_area_blacklist(unsigned long start, 
unsigned long end)
return 0;
 }
 
-/* Remove all symbols in given area from kprobe blacklist */
-static void kprobe_remove_area_blacklist(unsigned long start, unsigned long 
end)
-{
-   struct kprobe_blacklist_entry *ent, *n;
-
-   list_for_each_entry_safe(ent, n, _blacklist, list) {
-   if (ent->start_addr < start || ent->start_addr >= end)
-   continue;
-   list_del(>list);
-   kfree(ent);
-   }
-}
-
-static void kprobe_remove_ksym_blacklist(unsigned long entry)
-{
-   kprobe_remove_area_blacklist(entry, entry + 1);
-}
-
 int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
   char *type, char *sym)
 {
@@ -2570,6 +2553,25 @@ static int __init populate_kprobe_blacklist(unsigned 
long *start,
return ret ? : arch_populate_kprobe_blacklist();
 }
 
+#ifdef CONFIG_MODULES
+/* Remove all symbols in given area from kprobe blacklist */
+static void kprobe_remove_area_blacklist(unsigned long start, unsigned long 
end)
+{
+   struct kprobe_blacklist_entry *ent, *n;
+
+   list_for_each_entry_safe(ent, n, _blacklist, list) {
+   if (ent->start_addr < start || ent->start_addr >= end)
+   continue;
+   list_del(>list);
+   kfree(ent);
+   }
+}
+
+static void kprobe_remove_ksym_blacklist(unsigned long entry)
+{
+   kprobe_remove_area_blacklist(entry, entry + 1);
+}
+
 static void add_module_kprobe_blacklist(struct module *mod)
 {
unsigned long start, end;
@@ -2672,6 +2674,17 @@ static struct notifier_block kprobe_module_nb = {
.priority = 0
 };
 
+static int kprobe_register_module_notifier(void)
+{
+   return register_module_notifier(_module_nb);
+}
+#else
+static int kprobe_register_module_notifier(void)
+{
+   

[PATCH v5 13/15] powerpc: use CONFIG_EXECMEM instead of CONFIG_MODULES where appropriate

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

There are places where CONFIG_MODULES guards the code that depends on
memory allocation being done with module_alloc().

Replace CONFIG_MODULES with CONFIG_EXECMEM in such places.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/powerpc/Kconfig | 2 +-
 arch/powerpc/include/asm/kasan.h | 2 +-
 arch/powerpc/kernel/head_8xx.S   | 4 ++--
 arch/powerpc/kernel/head_book3s_32.S | 6 +++---
 arch/powerpc/lib/code-patching.c | 2 +-
 arch/powerpc/mm/book3s32/mmu.c   | 2 +-
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1c4be3373686..2e586733a464 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -285,7 +285,7 @@ config PPC
select IOMMU_HELPER if PPC64
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
-   select KASAN_VMALLOCif KASAN && MODULES
+   select KASAN_VMALLOCif KASAN && EXECMEM
select LOCK_MM_AND_FIND_VMA
select MMU_GATHER_PAGE_SIZE
select MMU_GATHER_RCU_TABLE_FREE
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 365d2720097c..b5bbb94c51f6 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -19,7 +19,7 @@
 
 #define KASAN_SHADOW_SCALE_SHIFT   3
 
-#if defined(CONFIG_MODULES) && defined(CONFIG_PPC32)
+#if defined(CONFIG_EXECMEM) && defined(CONFIG_PPC32)
 #define KASAN_KERN_START   ALIGN_DOWN(PAGE_OFFSET - SZ_256M, SZ_256M)
 #else
 #define KASAN_KERN_START   PAGE_OFFSET
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 647b0b445e89..edc479a7c2bc 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -199,12 +199,12 @@ instruction_counter:
mfspr   r10, SPRN_SRR0  /* Get effective address of fault */
INVALIDATE_ADJACENT_PAGES_CPU15(r10, r11)
mtspr   SPRN_MD_EPN, r10
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
mfcrr11
compare_to_kernel_boundary r10, r10
 #endif
mfspr   r10, SPRN_M_TWB /* Get level 1 table */
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
blt+3f
rlwinm  r10, r10, 0, 20, 31
orisr10, r10, (swapper_pg_dir - PAGE_OFFSET)@ha
diff --git a/arch/powerpc/kernel/head_book3s_32.S 
b/arch/powerpc/kernel/head_book3s_32.S
index c1d89764dd22..57196883a00e 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -419,14 +419,14 @@ InstructionTLBMiss:
  */
/* Get PTE (linux-style) and check access */
mfspr   r3,SPRN_IMISS
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw   0,r1,r3
 #endif
mfspr   r2, SPRN_SDR1
li  r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
rlwinm  r2, r2, 28, 0xf000
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
li  r0, 3
bgt-112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha   /* if kernel address, 
use */
@@ -442,7 +442,7 @@ InstructionTLBMiss:
andc.   r1,r1,r2/* check access & ~permission */
bne-InstructionAddressInvalid /* return if access not permitted */
/* Convert linux-style PTE to low word of PPC-style PTE */
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
rlwimi  r2, r0, 0, 31, 31   /* userspace ? -> PP lsb */
 #endif
ori r1, r1, 0xe06   /* clear out reserved bits */
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index c6ab46156cda..7af791446ddf 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -225,7 +225,7 @@ void __init poking_init(void)
 
 static unsigned long get_patch_pfn(void *addr)
 {
-   if (IS_ENABLED(CONFIG_MODULES) && is_vmalloc_or_module_addr(addr))
+   if (IS_ENABLED(CONFIG_EXECMEM) && is_vmalloc_or_module_addr(addr))
return vmalloc_to_pfn(addr);
else
return __pa_symbol(addr) >> PAGE_SHIFT;
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 100f999871bc..625fe7d08e06 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -184,7 +184,7 @@ unsigned long __init mmu_mapin_ram(unsigned long base, 
unsigned long top)
 
 static bool is_module_segment(unsigned long addr)
 {
-   if (!IS_ENABLED(CONFIG_MODULES))
+   if (!IS_ENABLED(CONFIG_EXECMEM))
return false;
if (addr < ALIGN_DOWN(MODULES_VADDR, SZ_256M))
return false;
-- 
2.43.0



[PATCH v5 12/15] x86/ftrace: enable dynamic ftrace without CONFIG_MODULES

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Dynamic ftrace must allocate memory for code and this was impossible
without CONFIG_MODULES.

With execmem separated from the modules code, execmem_text_alloc() is
available regardless of CONFIG_MODULES.

Remove dependency of dynamic ftrace on CONFIG_MODULES and make
CONFIG_DYNAMIC_FTRACE select CONFIG_EXECMEM in Kconfig.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/x86/Kconfig |  1 +
 arch/x86/kernel/ftrace.c | 10 --
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3f5ba72c9480..cd8addb96a0b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -34,6 +34,7 @@ config X86_64
select SWIOTLB
select ARCH_HAS_ELFCORE_COMPAT
select ZONE_DMA32
+   select EXECMEM if DYNAMIC_FTRACE
 
 config FORCE_DYNAMIC_FTRACE
def_bool y
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index c8ddb7abda7c..8da0e66ca22d 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -261,8 +261,6 @@ void arch_ftrace_update_code(int command)
 /* Currently only x86_64 supports dynamic trampolines */
 #ifdef CONFIG_X86_64
 
-#ifdef CONFIG_MODULES
-/* Module allocation simplifies allocating memory for code */
 static inline void *alloc_tramp(unsigned long size)
 {
return execmem_alloc(EXECMEM_FTRACE, size);
@@ -271,14 +269,6 @@ static inline void tramp_free(void *tramp)
 {
execmem_free(tramp);
 }
-#else
-/* Trampolines can only be created if modules are supported */
-static inline void *alloc_tramp(unsigned long size)
-{
-   return NULL;
-}
-static inline void tramp_free(void *tramp) { }
-#endif
 
 /* Defined as markers to the end of the ftrace default trampolines */
 extern void ftrace_regs_caller_end(void);
-- 
2.43.0



[PATCH v5 11/15] arch: make execmem setup available regardless of CONFIG_MODULES

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

execmem does not depend on modules, on the contrary modules use
execmem.

To make execmem available when CONFIG_MODULES=n, for instance for
kprobes, split execmem_params initialization out from
arch/*/kernel/module.c and compile it when CONFIG_EXECMEM=y

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/arm/kernel/module.c   |  43 --
 arch/arm/mm/init.c |  45 +++
 arch/arm64/kernel/module.c | 140 -
 arch/arm64/mm/init.c   | 140 +
 arch/loongarch/kernel/module.c |  19 -
 arch/loongarch/mm/init.c   |  21 +
 arch/mips/kernel/module.c  |  22 --
 arch/mips/mm/init.c|  23 ++
 arch/nios2/kernel/module.c |  20 -
 arch/nios2/mm/init.c   |  21 +
 arch/parisc/kernel/module.c|  20 -
 arch/parisc/mm/init.c  |  23 +-
 arch/powerpc/kernel/module.c   |  63 ---
 arch/powerpc/mm/mem.c  |  64 +++
 arch/riscv/kernel/module.c |  44 ---
 arch/riscv/mm/init.c   |  45 +++
 arch/s390/kernel/module.c  |  27 ---
 arch/s390/mm/init.c|  30 +++
 arch/sparc/kernel/module.c |  19 -
 arch/sparc/mm/Makefile |   2 +
 arch/sparc/mm/execmem.c|  21 +
 arch/x86/kernel/module.c   |  27 ---
 arch/x86/mm/init.c |  29 +++
 23 files changed, 463 insertions(+), 445 deletions(-)
 create mode 100644 arch/sparc/mm/execmem.c

diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index a98fdf6ff26c..677f218f7e84 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -12,57 +12,14 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
-#include 
-#include 
 
 #include 
 #include 
 #include 
 #include 
 
-#ifdef CONFIG_XIP_KERNEL
-/*
- * The XIP kernel text is mapped in the module area for modules and
- * some other stuff to work without any indirect relocations.
- * MODULES_VADDR is redefined here and not in asm/memory.h to avoid
- * recompiling the whole kernel when CONFIG_XIP_KERNEL is turned on/off.
- */
-#undef MODULES_VADDR
-#define MODULES_VADDR  (((unsigned long)_exiprom + ~PMD_MASK) & PMD_MASK)
-#endif
-
-#ifdef CONFIG_MMU
-static struct execmem_info execmem_info __ro_after_init;
-
-struct execmem_info __init *execmem_arch_setup(void)
-{
-   unsigned long fallback_start = 0, fallback_end = 0;
-
-   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) {
-   fallback_start = VMALLOC_START;
-   fallback_end = VMALLOC_END;
-   }
-
-   execmem_info = (struct execmem_info){
-   .ranges = {
-   [EXECMEM_DEFAULT] = {
-   .start  = MODULES_VADDR,
-   .end= MODULES_END,
-   .pgprot = PAGE_KERNEL_EXEC,
-   .alignment = 1,
-   .fallback_start = fallback_start,
-   .fallback_end   = fallback_end,
-   },
-   },
-   };
-
-   return _info;
-}
-#endif
-
 bool module_init_section(const char *name)
 {
return strstarts(name, ".init") ||
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index e8c6f4be0ce1..5345d218899a 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -486,3 +487,47 @@ void free_initrd_mem(unsigned long start, unsigned long 
end)
free_reserved_area((void *)start, (void *)end, -1, "initrd");
 }
 #endif
+
+#ifdef CONFIG_EXECMEM
+
+#ifdef CONFIG_XIP_KERNEL
+/*
+ * The XIP kernel text is mapped in the module area for modules and
+ * some other stuff to work without any indirect relocations.
+ * MODULES_VADDR is redefined here and not in asm/memory.h to avoid
+ * recompiling the whole kernel when CONFIG_XIP_KERNEL is turned on/off.
+ */
+#undef MODULES_VADDR
+#define MODULES_VADDR  (((unsigned long)_exiprom + ~PMD_MASK) & PMD_MASK)
+#endif
+
+#ifdef CONFIG_MMU
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
+{
+   unsigned long fallback_start = 0, fallback_end = 0;
+
+   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) {
+   fallback_start = VMALLOC_START;
+   fallback_end = VMALLOC_END;
+   }
+
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL_EXEC,
+   .alignment = 1,
+   .fallback_start = fallback_start,
+   .fallback_end   = fallback_end,
+   },
+   },
+

[PATCH v5 10/15] powerpc: extend execmem_params for kprobes allocations

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

powerpc overrides kprobes::alloc_insn_page() to remove writable
permissions when STRICT_MODULE_RWX is on.

Add definition of EXECMEM_KRPOBES to execmem_params to allow using the
generic kprobes::alloc_insn_page() with the desired permissions.

As powerpc uses breakpoint instructions to inject kprobes, it does not
need to constrain kprobe allocations to the modules area and can use the
entire vmalloc address space.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/powerpc/kernel/kprobes.c | 20 
 arch/powerpc/kernel/module.c  |  7 +++
 2 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 9fcd01bb2ce6..14c5ddec3056 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -126,26 +126,6 @@ kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long 
addr, unsigned long offse
return (kprobe_opcode_t *)(addr + offset);
 }
 
-void *alloc_insn_page(void)
-{
-   void *page;
-
-   page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
-   if (!page)
-   return NULL;
-
-   if (strict_module_rwx_enabled()) {
-   int err = set_memory_rox((unsigned long)page, 1);
-
-   if (err)
-   goto error;
-   }
-   return page;
-error:
-   execmem_free(page);
-   return NULL;
-}
-
 int arch_prepare_kprobe(struct kprobe *p)
 {
int ret = 0;
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index ac80559015a3..2a23cf7e141b 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -94,6 +94,7 @@ static struct execmem_info execmem_info __ro_after_init;
 
 struct execmem_info __init *execmem_arch_setup(void)
 {
+   pgprot_t kprobes_prot = strict_module_rwx_enabled() ? PAGE_KERNEL_ROX : 
PAGE_KERNEL_EXEC;
pgprot_t prot = strict_module_rwx_enabled() ? PAGE_KERNEL : 
PAGE_KERNEL_EXEC;
unsigned long fallback_start = 0, fallback_end = 0;
unsigned long start, end;
@@ -132,6 +133,12 @@ struct execmem_info __init *execmem_arch_setup(void)
.fallback_start = fallback_start,
.fallback_end   = fallback_end,
},
+   [EXECMEM_KPROBES] = {
+   .start  = VMALLOC_START,
+   .end= VMALLOC_END,
+   .pgprot = kprobes_prot,
+   .alignment = 1,
+   },
[EXECMEM_MODULE_DATA] = {
.start  = VMALLOC_START,
.end= VMALLOC_END,
-- 
2.43.0



[PATCH v5 09/15] riscv: extend execmem_params for generated code allocations

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

The memory allocations for kprobes and BPF on RISC-V are not placed in
the modules area and these custom allocations are implemented with
overrides of alloc_insn_page() and  bpf_jit_alloc_exec().

Slightly reorder execmem_params initialization to support both 32 and 64
bit variants, define EXECMEM_KPROBES and EXECMEM_BPF ranges in
riscv::execmem_params and drop overrides of alloc_insn_page() and
bpf_jit_alloc_exec().

Signed-off-by: Mike Rapoport (IBM) 
Reviewed-by: Alexandre Ghiti 
---
 arch/riscv/kernel/module.c | 28 +---
 arch/riscv/kernel/probes/kprobes.c | 10 --
 arch/riscv/net/bpf_jit_core.c  | 13 -
 3 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
index 182904127ba0..2ecbacbc9993 100644
--- a/arch/riscv/kernel/module.c
+++ b/arch/riscv/kernel/module.c
@@ -906,19 +906,41 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char 
*strtab,
return 0;
 }
 
-#if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
+#ifdef CONFIG_MMU
 static struct execmem_info execmem_info __ro_after_init;
 
 struct execmem_info __init *execmem_arch_setup(void)
 {
+   unsigned long start, end;
+
+   if (IS_ENABLED(CONFIG_64BIT)) {
+   start = MODULES_VADDR;
+   end = MODULES_END;
+   } else {
+   start = VMALLOC_START;
+   end = VMALLOC_END;
+   }
+
execmem_info = (struct execmem_info){
.ranges = {
[EXECMEM_DEFAULT] = {
-   .start  = MODULES_VADDR,
-   .end= MODULES_END,
+   .start  = start,
+   .end= end,
.pgprot = PAGE_KERNEL,
.alignment = 1,
},
+   [EXECMEM_KPROBES] = {
+   .start  = VMALLOC_START,
+   .end= VMALLOC_END,
+   .pgprot = PAGE_KERNEL_READ_EXEC,
+   .alignment = 1,
+   },
+   [EXECMEM_BPF] = {
+   .start  = BPF_JIT_REGION_START,
+   .end= BPF_JIT_REGION_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = PAGE_SIZE,
+   },
},
};
 
diff --git a/arch/riscv/kernel/probes/kprobes.c 
b/arch/riscv/kernel/probes/kprobes.c
index 2f08c14a933d..e64f2f3064eb 100644
--- a/arch/riscv/kernel/probes/kprobes.c
+++ b/arch/riscv/kernel/probes/kprobes.c
@@ -104,16 +104,6 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
return 0;
 }
 
-#ifdef CONFIG_MMU
-void *alloc_insn_page(void)
-{
-   return  __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START, VMALLOC_END,
-GFP_KERNEL, PAGE_KERNEL_READ_EXEC,
-VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
-__builtin_return_address(0));
-}
-#endif
-
 /* install breakpoint in text */
 void __kprobes arch_arm_kprobe(struct kprobe *p)
 {
diff --git a/arch/riscv/net/bpf_jit_core.c b/arch/riscv/net/bpf_jit_core.c
index 6b3acac30c06..e238fdbd5dbc 100644
--- a/arch/riscv/net/bpf_jit_core.c
+++ b/arch/riscv/net/bpf_jit_core.c
@@ -219,19 +219,6 @@ u64 bpf_jit_alloc_exec_limit(void)
return BPF_JIT_REGION_SIZE;
 }
 
-void *bpf_jit_alloc_exec(unsigned long size)
-{
-   return __vmalloc_node_range(size, PAGE_SIZE, BPF_JIT_REGION_START,
-   BPF_JIT_REGION_END, GFP_KERNEL,
-   PAGE_KERNEL, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
-}
-
-void bpf_jit_free_exec(void *addr)
-{
-   return vfree(addr);
-}
-
 void *bpf_arch_text_copy(void *dst, void *src, size_t len)
 {
int ret;
-- 
2.43.0



[PATCH v5 08/15] mm/execmem, arch: convert remaining overrides of module_alloc to execmem

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Extend execmem parameters to accommodate more complex overrides of
module_alloc() by architectures.

This includes specification of a fallback range required by arm, arm64
and powerpc, EXECMEM_MODULE_DATA type required by powerpc, support for
allocation of KASAN shadow required by s390 and x86 and support for
early initialization of execmem required by x86.

The core implementation of execmem_alloc() takes care of suppressing
warnings when the initial allocation fails but there is a fallback range
defined.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Will Deacon 
---
 arch/Kconfig   |  6 +++
 arch/arm/kernel/module.c   | 41 ++---
 arch/arm64/kernel/module.c | 67 ++--
 arch/arm64/kernel/probes/kprobes.c |  7 ---
 arch/arm64/net/bpf_jit_comp.c  | 11 -
 arch/powerpc/kernel/module.c   | 60 -
 arch/s390/kernel/module.c  | 54 ++-
 arch/x86/Kconfig   |  1 +
 arch/x86/kernel/module.c   | 70 ++
 include/linux/execmem.h| 34 +++
 include/linux/moduleloader.h   | 12 -
 kernel/module/main.c   | 26 +++
 mm/execmem.c   | 70 +-
 mm/mm_init.c   |  2 +
 14 files changed, 259 insertions(+), 202 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 65afb1de48b3..7006f71f0110 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -960,6 +960,12 @@ config ARCH_WANTS_MODULES_DATA_IN_VMALLOC
  For architectures like powerpc/32 which have constraints on module
  allocation and need to allocate module data outside of module area.
 
+config ARCH_WANTS_EXECMEM_EARLY
+   bool
+   help
+ For architectures that might allocate executable memory early on
+ boot, for instance ftrace on x86.
+
 config HAVE_IRQ_EXIT_ON_IRQ_STACK
bool
help
diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index e74d84f58b77..a98fdf6ff26c 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -34,23 +35,31 @@
 #endif
 
 #ifdef CONFIG_MMU
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   gfp_t gfp_mask = GFP_KERNEL;
-   void *p;
-
-   /* Silence the initial allocation */
-   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS))
-   gfp_mask |= __GFP_NOWARN;
-
-   p = __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   gfp_mask, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
-   if (!IS_ENABLED(CONFIG_ARM_MODULE_PLTS) || p)
-   return p;
-   return __vmalloc_node_range(size, 1,  VMALLOC_START, VMALLOC_END,
-   GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
+   unsigned long fallback_start = 0, fallback_end = 0;
+
+   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) {
+   fallback_start = VMALLOC_START;
+   fallback_end = VMALLOC_END;
+   }
+
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL_EXEC,
+   .alignment = 1,
+   .fallback_start = fallback_start,
+   .fallback_end   = fallback_end,
+   },
+   },
+   };
+
+   return _info;
 }
 #endif
 
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index e92da4da1b2a..a52240ea084b 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -108,41 +109,59 @@ static int __init module_init_limits(void)
 
return 0;
 }
-subsys_initcall(module_init_limits);
 
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   void *p = NULL;
+   unsigned long fallback_start = 0, fallback_end = 0;
+   unsigned long start = 0, end = 0;
+
+   module_init_limits();
 
/*
 * Where possible, prefer to allocate within direct branch range of the
 * kernel such that no PLTs are necessary.
 */
if (module_direct_base) {
-   p = __vmalloc_node_range(size, MODULE_ALIGN,
-module_direct_base,
-   

[PATCH v5 07/15] mm/execmem, arch: convert simple overrides of module_alloc to execmem

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Several architectures override module_alloc() only to define address
range for code allocations different than VMALLOC address space.

Provide a generic implementation in execmem that uses the parameters for
address space ranges, required alignment and page protections provided
by architectures.

The architectures must fill execmem_info structure and implement
execmem_arch_setup() that returns a pointer to that structure. This way the
execmem initialization won't be called from every architecture, but rather
from a central place, namely a core_initcall() in execmem.

The execmem provides execmem_alloc() API that wraps __vmalloc_node_range()
with the parameters defined by the architectures.  If an architecture does
not implement execmem_arch_setup(), execmem_alloc() will fall back to
module_alloc().

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/loongarch/kernel/module.c | 19 +++--
 arch/mips/kernel/module.c  | 20 --
 arch/nios2/kernel/module.c | 21 +++---
 arch/parisc/kernel/module.c| 24 +++
 arch/riscv/kernel/module.c | 24 +++
 arch/sparc/kernel/module.c | 20 --
 include/linux/execmem.h| 41 +++
 mm/execmem.c   | 73 --
 8 files changed, 208 insertions(+), 34 deletions(-)

diff --git a/arch/loongarch/kernel/module.c b/arch/loongarch/kernel/module.c
index c7d0338d12c1..ca6dd7ea1610 100644
--- a/arch/loongarch/kernel/module.c
+++ b/arch/loongarch/kernel/module.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -490,10 +491,22 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char 
*strtab,
return 0;
 }
 
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE, 
__builtin_return_address(0));
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = 1,
+   },
+   },
+   };
+
+   return _info;
 }
 
 static void module_init_ftrace_plt(const Elf_Ehdr *hdr,
diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
index 9a6c96014904..59225a3cf918 100644
--- a/arch/mips/kernel/module.c
+++ b/arch/mips/kernel/module.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 struct mips_hi16 {
@@ -32,11 +33,22 @@ static LIST_HEAD(dbe_list);
 static DEFINE_SPINLOCK(dbe_lock);
 
 #ifdef MODULES_VADDR
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = 1,
+   },
+   },
+   };
+
+   return _info;
 }
 #endif
 
diff --git a/arch/nios2/kernel/module.c b/arch/nios2/kernel/module.c
index 9c97b7513853..0d1ee86631fc 100644
--- a/arch/nios2/kernel/module.c
+++ b/arch/nios2/kernel/module.c
@@ -18,15 +18,26 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL_EXEC,
-   VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
-   __builtin_return_address(0));
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL_EXEC,
+   .alignment = 1,
+   },
+   },
+   };
+
+   return _info;
 }
 
 int apply_relocate_add(Elf32_Shdr *sechdrs, const char *strtab,
diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c
index 

[PATCH v5 06/15] mm: introduce execmem_alloc() and execmem_free()

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

module_alloc() is used everywhere as a mean to allocate memory for code.

Beside being semantically wrong, this unnecessarily ties all subsystems
that need to allocate code, such as ftrace, kprobes and BPF to modules and
puts the burden of code allocation to the modules code.

Several architectures override module_alloc() because of various
constraints where the executable memory can be located and this causes
additional obstacles for improvements of code allocation.

Start splitting code allocation from modules by introducing execmem_alloc()
and execmem_free() APIs.

Initially, execmem_alloc() is a wrapper for module_alloc() and
execmem_free() is a replacement of module_memfree() to allow updating all
call sites to use the new APIs.

Since architectures define different restrictions on placement,
permissions, alignment and other parameters for memory that can be used by
different subsystems that allocate executable memory, execmem_alloc() takes
a type argument, that will be used to identify the calling subsystem and to
allow architectures define parameters for ranges suitable for that
subsystem.

No functional changes.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Masami Hiramatsu (Google) 
---
 arch/powerpc/kernel/kprobes.c|  6 ++--
 arch/s390/kernel/ftrace.c|  4 +--
 arch/s390/kernel/kprobes.c   |  4 +--
 arch/s390/kernel/module.c|  5 +--
 arch/sparc/net/bpf_jit_comp_32.c |  8 ++---
 arch/x86/kernel/ftrace.c |  6 ++--
 arch/x86/kernel/kprobes/core.c   |  4 +--
 include/linux/execmem.h  | 57 
 include/linux/moduleloader.h |  3 --
 kernel/bpf/core.c|  6 ++--
 kernel/kprobes.c |  8 ++---
 kernel/module/Kconfig|  1 +
 kernel/module/main.c | 25 +-
 mm/Kconfig   |  3 ++
 mm/Makefile  |  1 +
 mm/execmem.c | 32 ++
 16 files changed, 128 insertions(+), 45 deletions(-)
 create mode 100644 include/linux/execmem.h
 create mode 100644 mm/execmem.c

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index bbca90a5e2ec..9fcd01bb2ce6 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -19,8 +19,8 @@
 #include 
 #include 
 #include 
-#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -130,7 +130,7 @@ void *alloc_insn_page(void)
 {
void *page;
 
-   page = module_alloc(PAGE_SIZE);
+   page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
if (!page)
return NULL;
 
@@ -142,7 +142,7 @@ void *alloc_insn_page(void)
}
return page;
 error:
-   module_memfree(page);
+   execmem_free(page);
return NULL;
 }
 
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index c46381ea04ec..798249ef5646 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -7,13 +7,13 @@
  *   Author(s): Martin Schwidefsky 
  */
 
-#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -220,7 +220,7 @@ static int __init ftrace_plt_init(void)
 {
const char *start, *end;
 
-   ftrace_plt = module_alloc(PAGE_SIZE);
+   ftrace_plt = execmem_alloc(EXECMEM_FTRACE, PAGE_SIZE);
if (!ftrace_plt)
panic("cannot allocate ftrace plt\n");
 
diff --git a/arch/s390/kernel/kprobes.c b/arch/s390/kernel/kprobes.c
index f0cf20d4b3c5..3c1b1be744de 100644
--- a/arch/s390/kernel/kprobes.c
+++ b/arch/s390/kernel/kprobes.c
@@ -9,7 +9,6 @@
 
 #define pr_fmt(fmt) "kprobes: " fmt
 
-#include 
 #include 
 #include 
 #include 
@@ -21,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -38,7 +38,7 @@ void *alloc_insn_page(void)
 {
void *page;
 
-   page = module_alloc(PAGE_SIZE);
+   page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
if (!page)
return NULL;
set_memory_rox((unsigned long)page, 1);
diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
index 42215f9404af..ac97a905e8cd 100644
--- a/arch/s390/kernel/module.c
+++ b/arch/s390/kernel/module.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -76,7 +77,7 @@ void *module_alloc(unsigned long size)
 #ifdef CONFIG_FUNCTION_TRACER
 void module_arch_cleanup(struct module *mod)
 {
-   module_memfree(mod->arch.trampolines_start);
+   execmem_free(mod->arch.trampolines_start);
 }
 #endif
 
@@ -510,7 +511,7 @@ static int module_alloc_ftrace_hotpatch_trampolines(struct 
module *me,
 
size = FTRACE_HOTPATCH_TRAMPOLINES_SIZE(s->sh_size);
numpages = DIV_ROUND_UP(size, PAGE_SIZE);
-   start = module_alloc(numpages * PAGE_SIZE);
+   start = execmem_alloc(EXECMEM_FTRACE, numpages * PAGE_SIZE);
if (!start)
return -ENOMEM;

[PATCH v5 05/15] module: make module_memory_{alloc,free} more self-contained

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Move the logic related to the memory allocation and freeing into
module_memory_alloc() and module_memory_free().

Signed-off-by: Mike Rapoport (IBM) 
---
 kernel/module/main.c | 64 +++-
 1 file changed, 39 insertions(+), 25 deletions(-)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index e1e8a7a9d6c1..5b82b069e0d3 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1203,15 +1203,44 @@ static bool mod_mem_use_vmalloc(enum mod_mem_type type)
mod_mem_type_is_core_data(type);
 }
 
-static void *module_memory_alloc(unsigned int size, enum mod_mem_type type)
+static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
 {
+   unsigned int size = PAGE_ALIGN(mod->mem[type].size);
+   void *ptr;
+
+   mod->mem[type].size = size;
+
if (mod_mem_use_vmalloc(type))
-   return vzalloc(size);
-   return module_alloc(size);
+   ptr = vmalloc(size);
+   else
+   ptr = module_alloc(size);
+
+   if (!ptr)
+   return -ENOMEM;
+
+   /*
+* The pointer to these blocks of memory are stored on the module
+* structure and we keep that around so long as the module is
+* around. We only free that memory when we unload the module.
+* Just mark them as not being a leak then. The .init* ELF
+* sections *do* get freed after boot so we *could* treat them
+* slightly differently with kmemleak_ignore() and only grey
+* them out as they work as typical memory allocations which
+* *do* eventually get freed, but let's just keep things simple
+* and avoid *any* false positives.
+*/
+   kmemleak_not_leak(ptr);
+
+   memset(ptr, 0, size);
+   mod->mem[type].base = ptr;
+
+   return 0;
 }
 
-static void module_memory_free(void *ptr, enum mod_mem_type type)
+static void module_memory_free(struct module *mod, enum mod_mem_type type)
 {
+   void *ptr = mod->mem[type].base;
+
if (mod_mem_use_vmalloc(type))
vfree(ptr);
else
@@ -1229,12 +1258,12 @@ static void free_mod_mem(struct module *mod)
/* Free lock-classes; relies on the preceding sync_rcu(). */
lockdep_free_key_range(mod_mem->base, mod_mem->size);
if (mod_mem->size)
-   module_memory_free(mod_mem->base, type);
+   module_memory_free(mod, type);
}
 
/* MOD_DATA hosts mod, so free it at last */
lockdep_free_key_range(mod->mem[MOD_DATA].base, 
mod->mem[MOD_DATA].size);
-   module_memory_free(mod->mem[MOD_DATA].base, MOD_DATA);
+   module_memory_free(mod, MOD_DATA);
 }
 
 /* Free a module, remove from lists, etc. */
@@ -2225,7 +2254,6 @@ static int find_module_sections(struct module *mod, 
struct load_info *info)
 static int move_module(struct module *mod, struct load_info *info)
 {
int i;
-   void *ptr;
enum mod_mem_type t = 0;
int ret = -ENOMEM;
 
@@ -2234,26 +2262,12 @@ static int move_module(struct module *mod, struct 
load_info *info)
mod->mem[type].base = NULL;
continue;
}
-   mod->mem[type].size = PAGE_ALIGN(mod->mem[type].size);
-   ptr = module_memory_alloc(mod->mem[type].size, type);
-   /*
- * The pointer to these blocks of memory are stored on the 
module
- * structure and we keep that around so long as the module is
- * around. We only free that memory when we unload the module.
- * Just mark them as not being a leak then. The .init* ELF
- * sections *do* get freed after boot so we *could* treat them
- * slightly differently with kmemleak_ignore() and only grey
- * them out as they work as typical memory allocations which
- * *do* eventually get freed, but let's just keep things simple
- * and avoid *any* false positives.
-*/
-   kmemleak_not_leak(ptr);
-   if (!ptr) {
+
+   ret = module_memory_alloc(mod, type);
+   if (ret) {
t = type;
goto out_enomem;
}
-   memset(ptr, 0, mod->mem[type].size);
-   mod->mem[type].base = ptr;
}
 
/* Transfer each section which specifies SHF_ALLOC */
@@ -2296,7 +2310,7 @@ static int move_module(struct module *mod, struct 
load_info *info)
return 0;
 out_enomem:
for (t--; t >= 0; t--)
-   module_memory_free(mod->mem[t].base, t);
+   module_memory_free(mod, t);
return ret;
 }
 
-- 
2.43.0



[PATCH v5 04/15] sparc: simplify module_alloc()

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Define MODULES_VADDR and MODULES_END as VMALLOC_START and VMALLOC_END
for 32-bit and reduce module_alloc() to

__vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, ...)

as with the new defines the allocations becames identical for both 32
and 64 bits.

While on it, drop unsed include of 

Suggested-by: Sam Ravnborg 
Signed-off-by: Mike Rapoport (IBM) 
---
 arch/sparc/include/asm/pgtable_32.h |  2 ++
 arch/sparc/kernel/module.c  | 25 +
 2 files changed, 3 insertions(+), 24 deletions(-)

diff --git a/arch/sparc/include/asm/pgtable_32.h 
b/arch/sparc/include/asm/pgtable_32.h
index 9e85d57ac3f2..62bcafe38b1f 100644
--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -432,6 +432,8 @@ static inline int io_remap_pfn_range(struct vm_area_struct 
*vma,
 
 #define VMALLOC_START   _AC(0xfe60,UL)
 #define VMALLOC_END _AC(0xffc0,UL)
+#define MODULES_VADDR   VMALLOC_START
+#define MODULES_END VMALLOC_END
 
 /* We provide our own get_unmapped_area to cope with VA holes for userland */
 #define HAVE_ARCH_UNMAPPED_AREA
diff --git a/arch/sparc/kernel/module.c b/arch/sparc/kernel/module.c
index 66c45a2764bc..d37adb2a0b54 100644
--- a/arch/sparc/kernel/module.c
+++ b/arch/sparc/kernel/module.c
@@ -21,35 +21,12 @@
 
 #include "entry.h"
 
-#ifdef CONFIG_SPARC64
-
-#include 
-
-static void *module_map(unsigned long size)
+void *module_alloc(unsigned long size)
 {
-   if (PAGE_ALIGN(size) > MODULES_LEN)
-   return NULL;
return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
__builtin_return_address(0));
 }
-#else
-static void *module_map(unsigned long size)
-{
-   return vmalloc(size);
-}
-#endif /* CONFIG_SPARC64 */
-
-void *module_alloc(unsigned long size)
-{
-   void *ret;
-
-   ret = module_map(size);
-   if (ret)
-   memset(ret, 0, size);
-
-   return ret;
-}
 
 /* Make generic code ignore STT_REGISTER dummy undefined symbols.  */
 int module_frob_arch_sections(Elf_Ehdr *hdr,
-- 
2.43.0



[PATCH v5 03/15] nios2: define virtual address space for modules

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

nios2 uses kmalloc() to implement module_alloc() because CALL26/PCREL26
cannot reach all of vmalloc address space.

Define module space as 32MiB below the kernel base and switch nios2 to
use vmalloc for module allocations.

Suggested-by: Thomas Gleixner 
Acked-by: Dinh Nguyen 
Acked-by: Song Liu 
Signed-off-by: Mike Rapoport (IBM) 
---
 arch/nios2/include/asm/pgtable.h |  5 -
 arch/nios2/kernel/module.c   | 19 ---
 2 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
index d052dfcbe8d3..eab87c6beacb 100644
--- a/arch/nios2/include/asm/pgtable.h
+++ b/arch/nios2/include/asm/pgtable.h
@@ -25,7 +25,10 @@
 #include 
 
 #define VMALLOC_START  CONFIG_NIOS2_KERNEL_MMU_REGION_BASE
-#define VMALLOC_END(CONFIG_NIOS2_KERNEL_REGION_BASE - 1)
+#define VMALLOC_END(CONFIG_NIOS2_KERNEL_REGION_BASE - SZ_32M - 1)
+
+#define MODULES_VADDR  (CONFIG_NIOS2_KERNEL_REGION_BASE - SZ_32M)
+#define MODULES_END(CONFIG_NIOS2_KERNEL_REGION_BASE - 1)
 
 struct mm_struct;
 
diff --git a/arch/nios2/kernel/module.c b/arch/nios2/kernel/module.c
index 76e0a42d6e36..9c97b7513853 100644
--- a/arch/nios2/kernel/module.c
+++ b/arch/nios2/kernel/module.c
@@ -21,23 +21,12 @@
 
 #include 
 
-/*
- * Modules should NOT be allocated with kmalloc for (obvious) reasons.
- * But we do it for now to avoid relocation issues. CALL26/PCREL26 cannot reach
- * from 0x8000 (vmalloc area) to 0xc (kernel) (kmalloc returns
- * addresses in 0xc000)
- */
 void *module_alloc(unsigned long size)
 {
-   if (size == 0)
-   return NULL;
-   return kmalloc(size, GFP_KERNEL);
-}
-
-/* Free memory returned from module_alloc */
-void module_memfree(void *module_region)
-{
-   kfree(module_region);
+   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
+   GFP_KERNEL, PAGE_KERNEL_EXEC,
+   VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
+   __builtin_return_address(0));
 }
 
 int apply_relocate_add(Elf32_Shdr *sechdrs, const char *strtab,
-- 
2.43.0



[PATCH v5 02/15] mips: module: rename MODULE_START to MODULES_VADDR

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

and MODULE_END to MODULES_END to match other architectures that define
custom address space for modules.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/mips/include/asm/pgtable-64.h | 4 ++--
 arch/mips/kernel/module.c  | 4 ++--
 arch/mips/mm/fault.c   | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/pgtable-64.h 
b/arch/mips/include/asm/pgtable-64.h
index 20ca48c1b606..c0109aff223b 100644
--- a/arch/mips/include/asm/pgtable-64.h
+++ b/arch/mips/include/asm/pgtable-64.h
@@ -147,8 +147,8 @@
 #if defined(CONFIG_MODULES) && defined(KBUILD_64BIT_SYM32) && \
VMALLOC_START != CKSSEG
 /* Load modules into 32bit-compatible segment. */
-#define MODULE_START   CKSSEG
-#define MODULE_END (FIXADDR_START-2*PAGE_SIZE)
+#define MODULES_VADDR  CKSSEG
+#define MODULES_END(FIXADDR_START-2*PAGE_SIZE)
 #endif
 
 #define pte_ERROR(e) \
diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
index 7b2fbaa9cac5..9a6c96014904 100644
--- a/arch/mips/kernel/module.c
+++ b/arch/mips/kernel/module.c
@@ -31,10 +31,10 @@ struct mips_hi16 {
 static LIST_HEAD(dbe_list);
 static DEFINE_SPINLOCK(dbe_lock);
 
-#ifdef MODULE_START
+#ifdef MODULES_VADDR
 void *module_alloc(unsigned long size)
 {
-   return __vmalloc_node_range(size, 1, MODULE_START, MODULE_END,
+   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
__builtin_return_address(0));
 }
diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index aaa9a242ebba..37fedeaca2e9 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -83,8 +83,8 @@ static void __do_page_fault(struct pt_regs *regs, unsigned 
long write,
 
if (unlikely(address >= VMALLOC_START && address <= VMALLOC_END))
goto VMALLOC_FAULT_TARGET;
-#ifdef MODULE_START
-   if (unlikely(address >= MODULE_START && address < MODULE_END))
+#ifdef MODULES_VADDR
+   if (unlikely(address >= MODULES_VADDR && address < MODULES_END))
goto VMALLOC_FAULT_TARGET;
 #endif
 
-- 
2.43.0



[PATCH v5 01/15] arm64: module: remove unneeded call to kasan_alloc_module_shadow()

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Since commit f6f37d9320a1 ("arm64: select KASAN_VMALLOC for SW/HW_TAGS
modes") KASAN_VMALLOC is always enabled when KASAN is on. This means
that allocations in module_alloc() will be tracked by KASAN protection
for vmalloc() and that kasan_alloc_module_shadow() will be always an
empty inline and there is no point in calling it.

Drop meaningless call to kasan_alloc_module_shadow() from
module_alloc().

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/arm64/kernel/module.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index 47e0be610bb6..e92da4da1b2a 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -141,11 +141,6 @@ void *module_alloc(unsigned long size)
__func__);
}
 
-   if (p && (kasan_alloc_module_shadow(p, size, GFP_KERNEL) < 0)) {
-   vfree(p);
-   return NULL;
-   }
-
/* Memory is intended to be executable, reset the pointer tag. */
return kasan_reset_tag(p);
 }
-- 
2.43.0



Re: (subset) [PATCH 00/34] address all -Wunused-const warnings

2024-04-22 Thread Michael Ellerman
On Wed, 03 Apr 2024 10:06:18 +0200, Arnd Bergmann wrote:
> Compilers traditionally warn for unused 'static' variables, but not
> if they are constant. The reason here is a custom for C++ programmers
> to define named constants as 'static const' variables in header files
> instead of using macros or enums.
> 
> In W=1 builds, we get warnings only static const variables in C
> files, but not in headers, which is a good compromise, but this still
> produces warning output in at least 30 files. These warnings are
> almost all harmless, but also trivial to fix, and there is no
> good reason to warn only about the non-const variables being unused.
> 
> [...]

Applied to powerpc/next.

[01/34] powerpc/fsl-soc: hide unused const variable

https://git.kernel.org/powerpc/c/01acaf3aa75e1641442cc23d8fe0a7bb4226efb1

cheers


[PATCH v5 00/15] mm: jit/text allocator

2024-04-22 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Hi,

Since v3 I looked into making execmem more of an utility toolbox, as we
discussed at LPC with Mark Rutland, but it was getting more hairier than
having a struct describing architecture constraints and a type identifying
the consumer of execmem.

And I do think that having the description of architecture constraints for
allocations of executable memory in a single place is better than having it
spread all over the place.

The patches available via git:
https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v5

v5 changes:
* rebase on v6.9-rc4 to avoid a conflict in kprobes
* add copyrights to mm/execmem.c (Luis)
* fix spelling (Ingo)
* define MODULES_VADDDR for sparc (Sam)
* consistently initialize struct execmem_info (Peter)
* reduce #ifdefs in function bodies in kprobes (Masami) 

v4: https://lore.kernel.org/all/20240411160051.2093261-1-r...@kernel.org
* rebase on v6.9-rc2
* rename execmem_params to execmem_info and execmem_arch_params() to
  execmem_arch_setup()
* use single execmem_alloc() API instead of execmem_{text,data}_alloc() (Song)
* avoid extra copy of execmem parameters (Rick)
* run execmem_init() as core_initcall() except for the architectures that
  may allocated text really early (currently only x86) (Will)
* add acks for some of arm64 and riscv changes, thanks Will and Alexandre
* new commits:
  - drop call to kasan_alloc_module_shadow() on arm64 because it's not
needed anymore
  - rename MODULE_START to MODULES_VADDR on MIPS
  - use CONFIG_EXECMEM instead of CONFIG_MODULES on powerpc as per Christophe:
https://lore.kernel.org/all/79062fa3-3402-47b3-8920-9231ad05e...@csgroup.eu/

v3: https://lore.kernel.org/all/20230918072955.2507221-1-r...@kernel.org
* add type parameter to execmem allocation APIs
* remove BPF dependency on modules

v2: https://lore.kernel.org/all/20230616085038.4121892-1-r...@kernel.org
* Separate "module" and "others" allocations with execmem_text_alloc()
and jit_text_alloc()
* Drop ROX entailment on x86
* Add ack for nios2 changes, thanks Dinh Nguyen

v1: https://lore.kernel.org/all/20230601101257.530867-1-r...@kernel.org

= Cover letter from v1 (sligtly updated) =

module_alloc() is used everywhere as a mean to allocate memory for code.

Beside being semantically wrong, this unnecessarily ties all subsystmes
that need to allocate code, such as ftrace, kprobes and BPF to modules and
puts the burden of code allocation to the modules code.

Several architectures override module_alloc() because of various
constraints where the executable memory can be located and this causes
additional obstacles for improvements of code allocation.

A centralized infrastructure for code allocation allows allocations of
executable memory as ROX, and future optimizations such as caching large
pages for better iTLB performance and providing sub-page allocations for
users that only need small jit code snippets.

Rick Edgecombe proposed perm_alloc extension to vmalloc [1] and Song Liu
proposed execmem_alloc [2], but both these approaches were targeting BPF
allocations and lacked the ground work to abstract executable allocations
and split them from the modules core.

Thomas Gleixner suggested to express module allocation restrictions and
requirements as struct mod_alloc_type_params [3] that would define ranges,
protections and other parameters for different types of allocations used by
modules and following that suggestion Song separated allocations of
different types in modules (commit ac3b43283923 ("module: replace
module_layout with module_memory")) and posted "Type aware module
allocator" set [4].

I liked the idea of parametrising code allocation requirements as a
structure, but I believe the original proposal and Song's module allocator
was too module centric, so I came up with these patches.

This set splits code allocation from modules by introducing execmem_alloc()
and and execmem_free(), APIs, replaces call sites of module_alloc() and
module_memfree() with the new APIs and implements core text and related
allocations in a central place.

Instead of architecture specific overrides for module_alloc(), the
architectures that require non-default behaviour for text allocation must
fill execmem_info structure and implement execmem_arch_setup() that returns
a pointer to that structure. If an architecture does not implement
execmem_arch_setup(), the defaults compatible with the current
modules::module_alloc() are used.

Since architectures define different restrictions on placement,
permissions, alignment and other parameters for memory that can be used by
different subsystems that allocate executable memory, execmem APIs
take a type argument, that will be used to identify the calling subsystem
and to allow architectures to define parameters for ranges suitable for that
subsystem.

The new infrastructure allows decoupling of BPF, kprobes and ftrace from
modules, and most importantly it paves the way for ROX allocations for

Re: [PATCH] powerpc: Use str_plural() to fix Coccinelle warning

2024-04-22 Thread Michael Ellerman
On Mon, 01 Apr 2024 00:22:50 +0200, Thorsten Blum wrote:
> Fixes the following Coccinelle/coccicheck warning reported by
> string_choices.cocci:
> 
>   opportunity for str_plural(tpc)
> 
> 

Applied to powerpc/next.

[1/1] powerpc: Use str_plural() to fix Coccinelle warning
  https://git.kernel.org/powerpc/c/3e42e72796d8991fecad78d61a180e24a4853427

cheers


Re: [PATCH] powerpc/ptdump: Fix walk_vmemmap to also print first vmemmap entry

2024-04-22 Thread Michael Ellerman
On Wed, 17 Apr 2024 20:37:40 +0530, Ritesh Harjani (IBM) wrote:
> walk_vmemmap() was skipping the first vmemmap entry pointed by
> vmemmap_list pointer itself. This patch fixes that.
> 
> With this we should see the vmemmap entry at 0xc00c for hash
> which wasn't getting printed on doing
> 
> "cat /sys/kernel/debug/kernel_hash_pagetable"
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/ptdump: Fix walk_vmemmap to also print first vmemmap entry
  https://git.kernel.org/powerpc/c/f318c8be797f8572629d5386a88cde7d753457a8

cheers


Re: [PATCH v2] Add static_key_feature_checks_initialized flag

2024-04-22 Thread Michael Ellerman
On Mon, 08 Apr 2024 05:23:58 +, Nicholas Miehlbradt wrote:
> JUMP_LABEL_FEATURE_CHECK_DEBUG used static_key_intialized to determine
> whether {cpu,mmu}_has_feature() is used before static keys were
> initialized. However, {cpu,mmu}_has_feature() should not be used before
> setup_feature_keys() is called but static_key_initialized is set well
> before this by the call to jump_label_init() in early_init_devtree().
> This creates a window in which JUMP_LABEL_FEATURE_CHECK_DEBUG will not
> detect misuse and report errors. Add a flag specifically to indicate
> when {cpu,mmu}_has_feature() is safe to use.
> 
> [...]

Applied to powerpc/next.

[1/1] Add static_key_feature_checks_initialized flag
  https://git.kernel.org/powerpc/c/676b2f99b0f6cd11193eeae13c976565c3fc7545

cheers


Re: [PATCH] powerpc: Fix fatal warnings flag for LLVM's integrated assembler

2024-04-22 Thread Michael Ellerman
On Fri, 05 Apr 2024 12:31:22 -0700, Nathan Chancellor wrote:
> When building with LLVM_IAS=1, there is an error because
> '-fatal-warnings' is not recognized as a valid flag:
> 
>   clang: error: unsupported argument '-fatal-warnings' to option '-Wa,'
> 
> Use the double hyphen version of the flag, '--fatal-warnings', which
> works with both the GNU assembler and LLVM's integrated assembler.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc: Fix fatal warnings flag for LLVM's integrated assembler
  https://git.kernel.org/powerpc/c/8884fc918f6aee220f9b41806974508bd0213aca

cheers


Re: [PATCH v5] powerpc: Avoid nmi_enter/nmi_exit in real mode interrupt.

2024-04-22 Thread Michael Ellerman
On Wed, 10 Apr 2024 10:00:06 +0530, Mahesh Salgaonkar wrote:
> nmi_enter()/nmi_exit() touches per cpu variables which can lead to kernel
> crash when invoked during real mode interrupt handling (e.g. early HMI/MCE
> interrupt handler) if percpu allocation comes from vmalloc area.
> 
> Early HMI/MCE handlers are called through DEFINE_INTERRUPT_HANDLER_NMI()
> wrapper which invokes nmi_enter/nmi_exit calls. We don't see any issue when
> percpu allocation is from the embedded first chunk. However with
> CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK enabled there are chances where percpu
> allocation can come from the vmalloc area.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc: Avoid nmi_enter/nmi_exit in real mode interrupt.
  https://git.kernel.org/powerpc/c/0db880fc865ffb522141ced4bfa66c12ab1fbb70

cheers


Re: [PATCH v2] powerpc: Fix PS3 allmodconfig warning

2024-04-22 Thread Michael Ellerman
On Mon, 01 Apr 2024 16:08:31 +0900, Geoff Levand wrote:
> The struct ps3_notification_device in the ps3_probe_thread routine
> is too large to be on the stack, causing a warning for an
> allmodconfig build with clang.
> 
> Change the struct ps3_notification_device from a variable on the stack
> to a dynamically allocated variable.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc: Fix PS3 allmodconfig warning
  https://git.kernel.org/powerpc/c/bfe51886ca544956eb4ff924d1937ac01d0ca9c8

cheers


Re: [PATCH v1] powerpc: Error on assembly warnings

2024-04-22 Thread Michael Ellerman
On Tue, 26 Mar 2024 15:44:20 +1100, Benjamin Gray wrote:
> We currently enable -Werror on the arch/powerpc subtree. However this
> only catches C warnings. Assembly warnings are logged, but the make
> invocation will still succeed. This can allow incorrect syntax such as
> 
>   ori r3, r4, r5
> 
> to be compiled without catching that the assembler is treating r5
> as the immediate value 5.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc: Error on assembly warnings
  https://git.kernel.org/powerpc/c/608d4a5ca56302181e669cea0aa571cbec6680eb

cheers


Re: [PATCH v1 1/1] powerpc/52xx: Replace of_gpio.h by proper one

2024-04-22 Thread Michael Ellerman
On Wed, 13 Mar 2024 15:56:45 +0200, Andy Shevchenko wrote:
> of_gpio.h is deprecated and subject to remove.
> The driver doesn't use it directly, replace it
> with what is really being used.
> 
> 

Applied to powerpc/next.

[1/1] powerpc/52xx: Replace of_gpio.h by proper one
  https://git.kernel.org/powerpc/c/676abf7c39267080ab81597c6d4f372a10c0fc21

cheers


Re: [PATCH v4] powerpc/pseries: make max polling consistent for longer H_CALLs

2024-04-22 Thread Andrew Donnellan
On Wed, 2024-04-17 at 23:12 -0400, Nayna Jain wrote:
> Currently, plpks_confirm_object_flushed() function polls for 5msec in
> total instead of 5sec.
> 
> Keep max polling time consistent for all the H_CALLs, which take
> longer
> than expected, to be 5sec. Also, make use of fsleep() everywhere to
> insert delay.
> 
> Reported-by: Nageswara R Sastry 
> Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform
> KeyStore")
> Signed-off-by: Nayna Jain 
> Tested-by: Nageswara R Sastry 

Reviewed-by: Andrew Donnellan 

> ---
> v4:
>  * As per Andrew's feedback, squashed Patch 2 with Patch 1.
> Now it is single patch.
> 
> v3:
>  * Addition to Patch 1 timeout patch based on Andrew's feedback.
> 
> v2:
> * Updated based on feedback from Michael Ellerman
>     Replaced usleep_range with fsleep.
>     Since there is no more need to specify range, sleep time is
> reverted back to 10 msec.
> 
>  arch/powerpc/include/asm/plpks.h   |  5 ++---
>  arch/powerpc/platforms/pseries/plpks.c | 10 +-
>  2 files changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/plpks.h
> b/arch/powerpc/include/asm/plpks.h
> index 23b77027c916..7a84069759b0 100644
> --- a/arch/powerpc/include/asm/plpks.h
> +++ b/arch/powerpc/include/asm/plpks.h
> @@ -44,9 +44,8 @@
>  #define PLPKS_MAX_DATA_SIZE  4000
>  
>  // Timeouts for PLPKS operations
> -#define PLPKS_MAX_TIMEOUT5000 // msec
> -#define PLPKS_FLUSH_SLEEP10 // msec
> -#define PLPKS_FLUSH_SLEEP_RANGE  400
> +#define PLPKS_MAX_TIMEOUT(5 * USEC_PER_SEC)
> +#define PLPKS_FLUSH_SLEEP1 // usec
>  
>  struct plpks_var {
>   char *component;
> diff --git a/arch/powerpc/platforms/pseries/plpks.c
> b/arch/powerpc/platforms/pseries/plpks.c
> index febe18f251d0..4a595493d28a 100644
> --- a/arch/powerpc/platforms/pseries/plpks.c
> +++ b/arch/powerpc/platforms/pseries/plpks.c
> @@ -415,8 +415,7 @@ static int plpks_confirm_object_flushed(struct
> label *label,
>   break;
>   }
>  
> - usleep_range(PLPKS_FLUSH_SLEEP,
> -  PLPKS_FLUSH_SLEEP +
> PLPKS_FLUSH_SLEEP_RANGE);
> + fsleep(PLPKS_FLUSH_SLEEP);
>   timeout = timeout + PLPKS_FLUSH_SLEEP;
>   } while (timeout < PLPKS_MAX_TIMEOUT);
>  
> @@ -464,9 +463,10 @@ int plpks_signed_update_var(struct plpks_var
> *var, u64 flags)
>  
>   continuetoken = retbuf[0];
>   if (pseries_status_to_err(rc) == -EBUSY) {
> - int delay_ms = get_longbusy_msecs(rc);
> - mdelay(delay_ms);
> - timeout += delay_ms;
> + int delay_us = get_longbusy_msecs(rc) *
> 1000;
> +
> + fsleep(delay_us);
> + timeout += delay_us;
>   }
>   rc = pseries_status_to_err(rc);
>   } while (rc == -EBUSY && timeout < PLPKS_MAX_TIMEOUT);

-- 
Andrew DonnellanOzLabs, ADL Canberra
a...@linux.ibm.com   IBM Australia Limited


Re: [PATCH 1/1] Replace macro "ARCH_HAVE_EXTRA_ELF_NOTES" with kconfig

2024-04-22 Thread Florian Weimer
* Vignesh Balasubramanian:

> diff --git a/include/linux/elf.h b/include/linux/elf.h
> index c9a46c4e183b..5c402788da19 100644
> --- a/include/linux/elf.h
> +++ b/include/linux/elf.h
> @@ -65,7 +65,7 @@ extern Elf64_Dyn _DYNAMIC [];
>  struct file;
>  struct coredump_params;
>  
> -#ifndef ARCH_HAVE_EXTRA_ELF_NOTES
> +#ifndef CONFIG_ARCH_HAVE_EXTRA_ELF_NOTES

You could add

  #pragma GCC poison ARCH_HAVE_EXTRA_ELF_NOTES

to a central header, to let GCC and Clang flag uses that have not been
converted.

Thanks,
Florian



[PATCH v2] powerpc/eeh: Permanently disable the removed device

2024-04-22 Thread Ganesh Goudar
When a device is hot removed on powernv, the hotplug driver clears
the device's state. However, on pseries, if a device is removed by
phyp after reaching the error threshold, the kernel remains unaware,
leading to the device not being torn down. This prevents necessary
remediation actions like failover.

Permanently disable the device if the presence check fails.

Also, in eeh_dev_check_failure in we may consider the error as false
positive if the device is hotpluged out as the get_state call returns
EEH_STATE_NOT_SUPPORT and we may end up not clearing the device state,
so log the event if the state is not moved to permanent failure state.

Signed-off-by: Ganesh Goudar 
---
V2:
* Elobrate the commit message.
* Fix formatting issues in commit message and comments.
---
 arch/powerpc/kernel/eeh.c| 11 ++-
 arch/powerpc/kernel/eeh_driver.c | 13 +++--
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index ab316e155ea9..6670063a7a6c 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -506,9 +506,18 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 * We will punt with the following conditions: Failure to get
 * PE's state, EEH not support and Permanently unavailable
 * state, PE is in good state.
+*
+* On the pSeries, after reaching the threshold, get_state might
+* return EEH_STATE_NOT_SUPPORT. However, it's possible that the
+* device state remains uncleared if the device is not marked
+* pci_channel_io_perm_failure. Therefore, consider logging the
+* event to let device removal happen.
+*
 */
if ((ret < 0) ||
-   (ret == EEH_STATE_NOT_SUPPORT) || eeh_state_active(ret)) {
+   (ret == EEH_STATE_NOT_SUPPORT &&
+dev->error_state == pci_channel_io_perm_failure) ||
+   eeh_state_active(ret)) {
eeh_stats.false_positives++;
pe->false_positives++;
rc = 0;
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 48773d2d9be3..7efe04c68f0f 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -865,9 +865,18 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
devices++;
 
if (!devices) {
-   pr_debug("EEH: Frozen PHB#%x-PE#%x is empty!\n",
+   pr_warn("EEH: Frozen PHB#%x-PE#%x is empty!\n",
pe->phb->global_number, pe->addr);
-   goto out; /* nothing to recover */
+   /*
+* The device is removed, tear down its state, on powernv
+* hotplug driver would take care of it but not on pseries,
+* permanently disable the card as it is hot removed.
+*
+* In the case of powernv, note that the removal of device
+* is covered by pci rescan lock, so no problem even if hotplug
+* driver attempts to remove the device.
+*/
+   goto recover_failed;
}
 
/* Log the event */
-- 
2.44.0



[PATCH v2] ASoC: dt-bindings: fsl,ssi: Convert to YAML

2024-04-22 Thread Shengjiu Wang
Convert the fsl,ssi binding to YAML.

Add below compatible strings which were not listed
in document:

fsl,imx50-ssi
fsl,imx53-ssi
fsl,imx25-ssi
fsl,imx27-ssi
fsl,imx6q-ssi
fsl,imx6sl-ssi
fsl,imx6sx-ssi

Add below fsl,mode strings which were not listed.

i2s-slave
i2s-master
lj-slave
lj-master
rj-slave
rj-master

Add 'ac97-gpios' property which were not listed.
Then dtbs_check can pass.

And remove the 'codec' description which should be
in the 'codec' binding doc.

Signed-off-by: Shengjiu Wang 
---
changes in v2:
- change fallback string to const.
- add dai-common.yaml
- add ac97-gpios property

 .../devicetree/bindings/sound/fsl,ssi.txt |  87 
 .../devicetree/bindings/sound/fsl,ssi.yaml| 192 ++
 2 files changed, 192 insertions(+), 87 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/sound/fsl,ssi.txt
 create mode 100644 Documentation/devicetree/bindings/sound/fsl,ssi.yaml

diff --git a/Documentation/devicetree/bindings/sound/fsl,ssi.txt 
b/Documentation/devicetree/bindings/sound/fsl,ssi.txt
deleted file mode 100644
index 7e15a85cecd2..
--- a/Documentation/devicetree/bindings/sound/fsl,ssi.txt
+++ /dev/null
@@ -1,87 +0,0 @@
-Freescale Synchronous Serial Interface
-
-The SSI is a serial device that communicates with audio codecs.  It can
-be programmed in AC97, I2S, left-justified, or right-justified modes.
-
-Required properties:
-- compatible:   Compatible list, should contain one of the following
-compatibles:
-  fsl,mpc8610-ssi
-  fsl,imx51-ssi
-  fsl,imx35-ssi
-  fsl,imx21-ssi
-- cell-index:   The SSI, <0> = SSI1, <1> = SSI2, and so on.
-- reg:  Offset and length of the register set for the device.
-- interrupts:where a is the interrupt number and b is a
-field that represents an encoding of the sense and
-level information for the interrupt.  This should be
-encoded based on the information in section 2)
-depending on the type of interrupt controller you
-have.
-- fsl,fifo-depth:   The number of elements in the transmit and receive FIFOs.
-This number is the maximum allowed value for SFCSR[TFWM0].
- - clocks:  "ipg" - Required clock for the SSI unit
-"baud" - Required clock for SSI master mode. Otherwise this
- clock is not used
-
-Required are also ac97 link bindings if ac97 is used. See
-Documentation/devicetree/bindings/sound/soc-ac97link.txt for the necessary
-bindings.
-
-Optional properties:
-- codec-handle: Phandle to a 'codec' node that defines an audio
-codec connected to this SSI.  This node is typically
-a child of an I2C or other control node.
-- fsl,fiq-stream-filter: Bool property. Disabled DMA and use FIQ instead to
-   filter the codec stream. This is necessary for some boards
-   where an incompatible codec is connected to this SSI, e.g.
-   on pca100 and pcm043.
-- dmas:Generic dma devicetree binding as described in
-   Documentation/devicetree/bindings/dma/dma.txt.
-- dma-names:   Two dmas have to be defined, "tx" and "rx", if fsl,imx-fiq
-   is not defined.
-- fsl,mode: The operating mode for the AC97 interface only.
-"ac97-slave" - AC97 mode, SSI is clock slave
-"ac97-master" - AC97 mode, SSI is clock master
-- fsl,ssi-asynchronous:
-If specified, the SSI is to be programmed in asynchronous
-mode.  In this mode, pins SRCK, STCK, SRFS, and STFS must
-all be connected to valid signals.  In synchronous mode,
-SRCK and SRFS are ignored.  Asynchronous mode allows
-playback and capture to use different sample sizes and
-sample rates.  Some drivers may require that SRCK and STCK
-be connected together, and SRFS and STFS be connected
-together.  This would still allow different sample sizes,
-but not different sample rates.
-- fsl,playback-dma: Phandle to a node for the DMA channel to use for
-playback of audio.  This is typically dictated by SOC
-design.  See the notes below.
-Only used on Power Architecture.
-- fsl,capture-dma:  Phandle to a node for the DMA channel to use for
-capture (recording) of audio.  This is typically dictated
-by SOC design.  See the notes below.
-Only used on Power Architecture.
-
-Child 'codec' node required properties:
-- compatible:   Compatible list, contains the name of the codec
-
-Child 'codec' node optional