Re: kernel BUG in ptr_stale

2024-05-09 Thread Kent Overstreet
On Thu, May 09, 2024 at 02:26:24PM +0800, Ubisectech Sirius wrote:
> Hello.
> We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. 
> Recently, our team discovered an issue in Linux kernel 6.7. Attached to
> this email is a PoC file for the issue.

This (and several of your others) are fixed in Linus's tree.

> 
> Stack dump:
> 
> bcachefs (loop1): mounting version 1.7: (unknown version) 
> opts=metadata_checksum=none,data_checksum=none,nojournal_transaction_names
> ------------[ cut here ]------------
> kernel BUG at fs/bcachefs/buckets.h:114!
> invalid opcode:  [#1] PREEMPT SMP KASAN NOPTI
> CPU: 1 PID: 9472 Comm: syz-executor.1 Not tainted 6.7.0 #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 
> 04/01/2014
> RIP: 0010:bucket_gen fs/bcachefs/buckets.h:114 [inline]
> RIP: 0010:ptr_stale+0x474/0x4e0 fs/bcachefs/buckets.h:188
> Code: 48 c7 c2 80 8c 1b 8b be 67 00 00 00 48 c7 c7 e0 8c 1b 8b c6 05 ea a6 72 
> 0b 01 e8 57 55 9c fd e9 fb fc ff ff e8 9d 02 bd fd 90 <0f> 0b 48 89 04 24 e8 
> 31 bb 13 fe 48 8b 04 24 e9 35 fc ff ff e8 23
> RSP: 0018:c90007c4ec38 EFLAGS: 00010246
> RAX: 0004 RBX: 0080 RCX: c90002679000
> RDX: 0004 RSI: 83ccf3b3 RDI: 0006
> RBP:  R08: 0006 R09: 1028
> R10: 0080 R11:  R12: 1028
> R13: 88804dee5100 R14:  R15: 88805b1a4110
> FS:  7f79ba8ab640() GS:88807ec0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7f0bbda3f000 CR3: 5f37a000 CR4: 00750ef0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> PKRU: 5554
> Call Trace:
>  
>  bch2_bkey_ptrs_to_text+0xb4e/0x1760 fs/bcachefs/extents.c:1012
>  bch2_btree_ptr_v2_to_text+0x288/0x330 fs/bcachefs/extents.c:215
>  bch2_val_to_text fs/bcachefs/bkey_methods.c:287 [inline]
>  bch2_bkey_val_to_text+0x1c8/0x210 fs/bcachefs/bkey_methods.c:297
>  journal_validate_key+0x7ab/0xb50 fs/bcachefs/journal_io.c:322
>  journal_entry_btree_root_validate+0x31c/0x380 fs/bcachefs/journal_io.c:411
>  bch2_journal_entry_validate+0xc7/0x130 fs/bcachefs/journal_io.c:752
>  bch2_sb_clean_validate_late+0x14b/0x1e0 fs/bcachefs/sb-clean.c:32
>  bch2_read_superblock_clean+0xbb/0x250 fs/bcachefs/sb-clean.c:160
>  bch2_fs_recovery+0x113/0x52d0 fs/bcachefs/recovery.c:691
>  bch2_fs_start+0x365/0x5e0 fs/bcachefs/super.c:978
>  bch2_fs_open+0x1ac9/0x3890 fs/bcachefs/super.c:1968
>  bch2_mount+0x538/0x13c0 fs/bcachefs/fs.c:1863
>  legacy_get_tree+0x109/0x220 fs/fs_context.c:662
>  vfs_get_tree+0x93/0x380 fs/super.c:1771
>  do_new_mount fs/namespace.c:3337 [inline]
>  path_mount+0x679/0x1e40 fs/namespace.c:3664
>  do_mount fs/namespace.c:3677 [inline]
>  __do_sys_mount fs/namespace.c:3886 [inline]
>  __se_sys_mount fs/namespace.c:3863 [inline]
>  __x64_sys_mount+0x287/0x310 fs/namespace.c:3863
>  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>  do_syscall_64+0x43/0x120 arch/x86/entry/common.c:83
>  entry_SYSCALL_64_after_hwframe+0x6f/0x77
> RIP: 0033:0x7f79b9a91b3e
> Code: 48 c7 c0 ff ff ff ff eb aa e8 be 0d 00 00 66 2e 0f 1f 84 00 00 00 00 00 
> 0f 1f 40 00 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 
> 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:7f79ba8aae38 EFLAGS: 0202 ORIG_RAX: 00a5
> RAX: ffda RBX: 000119f4 RCX: 7f79b9a91b3e
> RDX: 20011a00 RSI: 20011a40 RDI: 7f79ba8aae90
> RBP: 7f79ba8aaed0 R08: 7f79ba8aaed0 R09: 0181c050
> R10: 0181c050 R11: 0202 R12: 20011a00
> R13: 20011a40 R14: 7f79ba8aae90 R15: 21c0
>  
> Modules linked in:
> ---[ end trace  ]---
> 
> 
> Thank you for taking the time to read this email and we look forward to 
> working with you further.
> 
> 
Re: [PATCH v9 2/5] remoteproc: k3-m4: Add a remoteproc driver for M4F subsystem

2024-05-09 Thread Andrew Davis

On 5/9/24 10:32 AM, Mathieu Poirier wrote:

On Wed, 8 May 2024 at 10:54, Andrew Davis  wrote:


On 5/7/24 3:36 PM, Mathieu Poirier wrote:

On Fri, Apr 26, 2024 at 02:18:08PM -0500, Andrew Davis wrote:

From: Martyn Welch 

The AM62x and AM64x SoCs of the TI K3 family have a Cortex-M4F core in
the MCU domain. This core is typically used for safety applications in a
stand-alone mode. However, some applications (non-safety-related) may
want to use the M4F core as a generic remote processor with IPC to the
host processor. The M4F core has internal IRAM and DRAM memories, which
are exposed to the system bus for code and data loading.

A remote processor driver is added to support this subsystem, including
being able to load and boot the M4F core. Loading covers both the M4F
internal memories and predefined external code/data memories. The
carve-outs for external contiguous memory are defined in the M4F device
node and should match the external memory declarations in the M4F
image binary. The M4F subsystem has two resets: one for the entire
subsystem, i.e. including the internal memories, and the other, a local
reset, only for the M4F processing core. When loading the image, the
driver first releases the subsystem reset, loads the firmware image,
and then releases the local reset to let the M4F processing core run.
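
For readers skimming the thread, here is a minimal sketch of that two-stage
boot sequence. Every helper name below is a hypothetical placeholder; the
real driver goes through the TI-SCI reset and processor-control interfaces:

/* Sketch of the two-stage reset handshake described above.
 * release_subsystem_reset()/load_firmware_segments()/release_local_reset()
 * are illustrative stand-ins, not the driver's actual API.
 */
static int m4f_boot(struct m4f_core *core, const struct firmware *fw)
{
        int ret;

        /* 1. Release the subsystem reset so the internal IRAM/DRAM
         *    become visible on the system bus.
         */
        ret = release_subsystem_reset(core);
        if (ret)
                return ret;

        /* 2. Load firmware into internal and external memories. */
        ret = load_firmware_segments(core, fw);
        if (ret)
                return ret;

        /* 3. Only now release the local reset so the core starts running. */
        return release_local_reset(core);
}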

Signed-off-by: Martyn Welch 
Signed-off-by: Hari Nagalla 
Signed-off-by: Andrew Davis 
---
   drivers/remoteproc/Kconfig   |  13 +
   drivers/remoteproc/Makefile  |   1 +
   drivers/remoteproc/ti_k3_m4_remoteproc.c | 785 +++
   3 files changed, 799 insertions(+)
   create mode 100644 drivers/remoteproc/ti_k3_m4_remoteproc.c

diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
index 48845dc8fa852..1a7c0330c91a9 100644
--- a/drivers/remoteproc/Kconfig
+++ b/drivers/remoteproc/Kconfig
@@ -339,6 +339,19 @@ config TI_K3_DSP_REMOTEPROC
It's safe to say N here if you're not interested in utilizing
the DSP slave processors.

+config TI_K3_M4_REMOTEPROC
+tristate "TI K3 M4 remoteproc support"
+depends on ARCH_K3 || COMPILE_TEST
+select MAILBOX
+select OMAP2PLUS_MBOX
+help
+  Say m here to support TI's M4 remote processor subsystems
+  on various TI K3 family of SoCs through the remote processor
+  framework.
+
+  It's safe to say N here if you're not interested in utilizing
+  a remote processor.
+
   config TI_K3_R5_REMOTEPROC
  tristate "TI K3 R5 remoteproc support"
  depends on ARCH_K3
diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
index 91314a9b43cef..5ff4e2fee4abd 100644
--- a/drivers/remoteproc/Makefile
+++ b/drivers/remoteproc/Makefile
@@ -37,5 +37,6 @@ obj-$(CONFIG_ST_REMOTEPROC)+= st_remoteproc.o
   obj-$(CONFIG_ST_SLIM_REMOTEPROC)   += st_slim_rproc.o
   obj-$(CONFIG_STM32_RPROC)  += stm32_rproc.o
   obj-$(CONFIG_TI_K3_DSP_REMOTEPROC) += ti_k3_dsp_remoteproc.o
+obj-$(CONFIG_TI_K3_M4_REMOTEPROC)   += ti_k3_m4_remoteproc.o
   obj-$(CONFIG_TI_K3_R5_REMOTEPROC)  += ti_k3_r5_remoteproc.o
   obj-$(CONFIG_XLNX_R5_REMOTEPROC)   += xlnx_r5_remoteproc.o
diff --git a/drivers/remoteproc/ti_k3_m4_remoteproc.c 
b/drivers/remoteproc/ti_k3_m4_remoteproc.c
new file mode 100644
index 0..0030e509f6b5d
--- /dev/null
+++ b/drivers/remoteproc/ti_k3_m4_remoteproc.c
@@ -0,0 +1,785 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * TI K3 Cortex-M4 Remote Processor(s) driver
+ *
+ * Copyright (C) 2021-2024 Texas Instruments Incorporated - https://www.ti.com/
+ *  Hari Nagalla 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "omap_remoteproc.h"
+#include "remoteproc_internal.h"
+#include "ti_sci_proc.h"
+
+/**
+ * struct k3_m4_rproc_mem - internal memory structure
+ * @cpu_addr: MPU virtual address of the memory region
+ * @bus_addr: Bus address used to access the memory region
+ * @dev_addr: Device address of the memory region from remote processor view
+ * @size: Size of the memory region
+ */
+struct k3_m4_rproc_mem {
+void __iomem *cpu_addr;
+phys_addr_t bus_addr;
+u32 dev_addr;
+size_t size;
+};
+
+/**
+ * struct k3_m4_rproc_mem_data - memory definitions for a remote processor
+ * @name: name for this memory entry
+ * @dev_addr: device address for the memory entry
+ */
+struct k3_m4_rproc_mem_data {
+const char *name;
+const u32 dev_addr;
+};
+
+/**
+ * struct k3_m4_rproc_dev_data - device data structure for a remote processor
+ * @mems: pointer to memory definitions for a remote processor
+ * @num_mems: number of memory regions in @mems
+ * @uses_lreset: flag to denote the need for local reset management
+ */
+struct k3_m4_rproc_dev_data {
+const struct k3_m4_rproc_mem_data *mems;
+u32 num_mems;
+bool uses_lreset;
+};
+
+/**
+ * struct k3_m4_rproc - k3 remote processor driver structure
+ * @dev: cached device pointer
+ * @rproc: 

Re: [PATCHv5 bpf-next 6/8] x86/shstk: Add return uprobe support

2024-05-09 Thread Edgecombe, Rick P
On Thu, 2024-05-09 at 10:30 +0200, Jiri Olsa wrote:
> > Per the earlier discussion, this cannot be reached unless uretprobes are
> > in use, which cannot happen without something with privileges taking an
> > action. But are uretprobes ever used for monitoring applications where
> > security is important? Or is it strictly a debug-time thing?
> 
> sorry, I don't have that level of detail, but we do have customers
> that use uprobes in general, or want to use them, and complain about
> the speed
> 
> there are several tools in bcc [1] that use uretprobes in scripts,
> like:
>   memleak, sslsniff, trace, bashreadline, gethostlatency, argdist,
>   funclatency

Is it possible to have shadow stack only use the non-syscall solution? It
seems to expose a more limited compatibility surface, in that it only allows
writing the specific trampoline address (IIRC). Then shadow stack users could
still use uretprobes, just not the new optimized solution. There are already
operations that are slower with shadow stack, like longjmp(), so this could
maybe be ok.
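
A minimal sketch of that gating (every name here is a hypothetical
placeholder for illustration, not the actual x86 uprobe API):

/* Sketch: choose the uretprobe trampoline per task. If the task runs
 * with a shadow stack, fall back to the traditional trap-based
 * trampoline; otherwise use the optimized syscall trampoline.
 * shadow_stack_enabled(), legacy_swbp_trampoline() and
 * syscall_trampoline() are made-up names for this example.
 */
static void *pick_uretprobe_trampoline(struct task_struct *task)
{
        if (shadow_stack_enabled(task))
                return legacy_swbp_trampoline();  /* slower, shstk-safe */

        return syscall_trampoline();              /* optimized path */
}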


Re: [PATCH v9 2/5] remoteproc: k3-m4: Add a remoteproc driver for M4F subsystem

2024-05-09 Thread Andrew Davis

On 5/9/24 10:22 AM, Mathieu Poirier wrote:

On Wed, 8 May 2024 at 09:36, Andrew Davis  wrote:


On 5/6/24 3:46 PM, Mathieu Poirier wrote:

Good day,

I have started reviewing this patchset.  Comments will be scattered over
multiple days and as such, I will explicitly inform you when I am done with the
review.

On Fri, Apr 26, 2024 at 02:18:08PM -0500, Andrew Davis wrote:

From: Martyn Welch 

The AM62x and AM64x SoCs of the TI K3 family have a Cortex-M4F core in
the MCU domain. This core is typically used for safety applications in a
stand-alone mode. However, some applications (non-safety-related) may
want to use the M4F core as a generic remote processor with IPC to the
host processor. The M4F core has internal IRAM and DRAM memories, which
are exposed to the system bus for code and data loading.

A remote processor driver is added to support this subsystem, including
being able to load and boot the M4F core. Loading covers both the M4F
internal memories and predefined external code/data memories. The
carve-outs for external contiguous memory are defined in the M4F device
node and should match the external memory declarations in the M4F
image binary. The M4F subsystem has two resets: one for the entire
subsystem, i.e. including the internal memories, and the other, a local
reset, only for the M4F processing core. When loading the image, the
driver first releases the subsystem reset, loads the firmware image,
and then releases the local reset to let the M4F processing core run.

Signed-off-by: Martyn Welch 
Signed-off-by: Hari Nagalla 
Signed-off-by: Andrew Davis 
---
   drivers/remoteproc/Kconfig   |  13 +
   drivers/remoteproc/Makefile  |   1 +
   drivers/remoteproc/ti_k3_m4_remoteproc.c | 785 +++
   3 files changed, 799 insertions(+)
   create mode 100644 drivers/remoteproc/ti_k3_m4_remoteproc.c

diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
index 48845dc8fa852..1a7c0330c91a9 100644
--- a/drivers/remoteproc/Kconfig
+++ b/drivers/remoteproc/Kconfig
@@ -339,6 +339,19 @@ config TI_K3_DSP_REMOTEPROC
It's safe to say N here if you're not interested in utilizing
the DSP slave processors.

+config TI_K3_M4_REMOTEPROC
+tristate "TI K3 M4 remoteproc support"
+depends on ARCH_K3 || COMPILE_TEST
+select MAILBOX
+select OMAP2PLUS_MBOX
+help
+  Say m here to support TI's M4 remote processor subsystems
+  on various TI K3 family of SoCs through the remote processor
+  framework.
+
+  It's safe to say N here if you're not interested in utilizing
+  a remote processor.
+
   config TI_K3_R5_REMOTEPROC
  tristate "TI K3 R5 remoteproc support"
  depends on ARCH_K3
diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
index 91314a9b43cef..5ff4e2fee4abd 100644
--- a/drivers/remoteproc/Makefile
+++ b/drivers/remoteproc/Makefile
@@ -37,5 +37,6 @@ obj-$(CONFIG_ST_REMOTEPROC)+= st_remoteproc.o
   obj-$(CONFIG_ST_SLIM_REMOTEPROC)   += st_slim_rproc.o
   obj-$(CONFIG_STM32_RPROC)  += stm32_rproc.o
   obj-$(CONFIG_TI_K3_DSP_REMOTEPROC) += ti_k3_dsp_remoteproc.o
+obj-$(CONFIG_TI_K3_M4_REMOTEPROC)   += ti_k3_m4_remoteproc.o
   obj-$(CONFIG_TI_K3_R5_REMOTEPROC)  += ti_k3_r5_remoteproc.o
   obj-$(CONFIG_XLNX_R5_REMOTEPROC)   += xlnx_r5_remoteproc.o
diff --git a/drivers/remoteproc/ti_k3_m4_remoteproc.c 
b/drivers/remoteproc/ti_k3_m4_remoteproc.c
new file mode 100644
index 0..0030e509f6b5d
--- /dev/null
+++ b/drivers/remoteproc/ti_k3_m4_remoteproc.c
@@ -0,0 +1,785 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * TI K3 Cortex-M4 Remote Processor(s) driver
+ *
+ * Copyright (C) 2021-2024 Texas Instruments Incorporated - https://www.ti.com/
+ *  Hari Nagalla 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "omap_remoteproc.h"
+#include "remoteproc_internal.h"
+#include "ti_sci_proc.h"
+
+/**
+ * struct k3_m4_rproc_mem - internal memory structure
+ * @cpu_addr: MPU virtual address of the memory region
+ * @bus_addr: Bus address used to access the memory region
+ * @dev_addr: Device address of the memory region from remote processor view
+ * @size: Size of the memory region
+ */
+struct k3_m4_rproc_mem {
+void __iomem *cpu_addr;
+phys_addr_t bus_addr;
+u32 dev_addr;
+size_t size;
+};
+
+/**
+ * struct k3_m4_rproc_mem_data - memory definitions for a remote processor
+ * @name: name for this memory entry
+ * @dev_addr: device address for the memory entry
+ */
+struct k3_m4_rproc_mem_data {
+const char *name;
+const u32 dev_addr;
+};
+
+/**
+ * struct k3_m4_rproc_dev_data - device data structure for a remote processor
+ * @mems: pointer to memory definitions for a remote processor
+ * @num_mems: number of memory regions in @mems
+ * @uses_lreset: flag to denote the need for local reset management
+ */
+struct k3_m4_rproc_dev_data {
+const struct 

Re: [PATCH v9 2/5] remoteproc: k3-m4: Add a remoteproc driver for M4F subsystem

2024-05-09 Thread Mathieu Poirier
On Wed, 8 May 2024 at 10:54, Andrew Davis  wrote:
>
> On 5/7/24 3:36 PM, Mathieu Poirier wrote:
> > On Fri, Apr 26, 2024 at 02:18:08PM -0500, Andrew Davis wrote:
> >> From: Martyn Welch 
> >>
> >> The AM62x and AM64x SoCs of the TI K3 family have a Cortex-M4F core in
> >> the MCU domain. This core is typically used for safety applications in a
> >> stand-alone mode. However, some applications (non-safety-related) may
> >> want to use the M4F core as a generic remote processor with IPC to the
> >> host processor. The M4F core has internal IRAM and DRAM memories, which
> >> are exposed to the system bus for code and data loading.
> >>
> >> A remote processor driver is added to support this subsystem, including
> >> being able to load and boot the M4F core. Loading covers both the M4F
> >> internal memories and predefined external code/data memories. The
> >> carve-outs for external contiguous memory are defined in the M4F device
> >> node and should match the external memory declarations in the M4F
> >> image binary. The M4F subsystem has two resets: one for the entire
> >> subsystem, i.e. including the internal memories, and the other, a local
> >> reset, only for the M4F processing core. When loading the image, the
> >> driver first releases the subsystem reset, loads the firmware image,
> >> and then releases the local reset to let the M4F processing core run.
> >>
> >> Signed-off-by: Martyn Welch 
> >> Signed-off-by: Hari Nagalla 
> >> Signed-off-by: Andrew Davis 
> >> ---
> >>   drivers/remoteproc/Kconfig   |  13 +
> >>   drivers/remoteproc/Makefile  |   1 +
> >>   drivers/remoteproc/ti_k3_m4_remoteproc.c | 785 +++
> >>   3 files changed, 799 insertions(+)
> >>   create mode 100644 drivers/remoteproc/ti_k3_m4_remoteproc.c
> >>
> >> diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
> >> index 48845dc8fa852..1a7c0330c91a9 100644
> >> --- a/drivers/remoteproc/Kconfig
> >> +++ b/drivers/remoteproc/Kconfig
> >> @@ -339,6 +339,19 @@ config TI_K3_DSP_REMOTEPROC
> >>It's safe to say N here if you're not interested in utilizing
> >>the DSP slave processors.
> >>
> >> +config TI_K3_M4_REMOTEPROC
> >> +tristate "TI K3 M4 remoteproc support"
> >> +depends on ARCH_K3 || COMPILE_TEST
> >> +select MAILBOX
> >> +select OMAP2PLUS_MBOX
> >> +help
> >> +  Say m here to support TI's M4 remote processor subsystems
> >> +  on various TI K3 family of SoCs through the remote processor
> >> +  framework.
> >> +
> >> +  It's safe to say N here if you're not interested in utilizing
> >> +  a remote processor.
> >> +
> >>   config TI_K3_R5_REMOTEPROC
> >>  tristate "TI K3 R5 remoteproc support"
> >>  depends on ARCH_K3
> >> diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
> >> index 91314a9b43cef..5ff4e2fee4abd 100644
> >> --- a/drivers/remoteproc/Makefile
> >> +++ b/drivers/remoteproc/Makefile
> >> @@ -37,5 +37,6 @@ obj-$(CONFIG_ST_REMOTEPROC)+= 
> >> st_remoteproc.o
> >>   obj-$(CONFIG_ST_SLIM_REMOTEPROC)   += st_slim_rproc.o
> >>   obj-$(CONFIG_STM32_RPROC)  += stm32_rproc.o
> >>   obj-$(CONFIG_TI_K3_DSP_REMOTEPROC) += ti_k3_dsp_remoteproc.o
> >> +obj-$(CONFIG_TI_K3_M4_REMOTEPROC)   += ti_k3_m4_remoteproc.o
> >>   obj-$(CONFIG_TI_K3_R5_REMOTEPROC)  += ti_k3_r5_remoteproc.o
> >>   obj-$(CONFIG_XLNX_R5_REMOTEPROC)   += xlnx_r5_remoteproc.o
> >> diff --git a/drivers/remoteproc/ti_k3_m4_remoteproc.c 
> >> b/drivers/remoteproc/ti_k3_m4_remoteproc.c
> >> new file mode 100644
> >> index 0..0030e509f6b5d
> >> --- /dev/null
> >> +++ b/drivers/remoteproc/ti_k3_m4_remoteproc.c
> >> @@ -0,0 +1,785 @@
> >> +// SPDX-License-Identifier: GPL-2.0-only
> >> +/*
> >> + * TI K3 Cortex-M4 Remote Processor(s) driver
> >> + *
> >> + * Copyright (C) 2021-2024 Texas Instruments Incorporated - 
> >> https://www.ti.com/
> >> + *  Hari Nagalla 
> >> + */
> >> +
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +
> >> +#include "omap_remoteproc.h"
> >> +#include "remoteproc_internal.h"
> >> +#include "ti_sci_proc.h"
> >> +
> >> +/**
> >> + * struct k3_m4_rproc_mem - internal memory structure
> >> + * @cpu_addr: MPU virtual address of the memory region
> >> + * @bus_addr: Bus address used to access the memory region
> >> + * @dev_addr: Device address of the memory region from remote processor 
> >> view
> >> + * @size: Size of the memory region
> >> + */
> >> +struct k3_m4_rproc_mem {
> >> +void __iomem *cpu_addr;
> >> +phys_addr_t bus_addr;
> >> +u32 dev_addr;
> >> +size_t size;
> >> +};
> >> +
> >> +/**
> >> + * struct k3_m4_rproc_mem_data - memory definitions for a remote processor
> >> + * @name: name for this memory entry
> >> + * @dev_addr: device address for the memory entry
> >> + */
> >> +struct k3_m4_rproc_mem_data {
> >> +const char *name;
> >> 

Re: [PATCH v9 2/5] remoteproc: k3-m4: Add a remoteproc driver for M4F subsystem

2024-05-09 Thread Mathieu Poirier
On Wed, 8 May 2024 at 09:36, Andrew Davis  wrote:
>
> On 5/6/24 3:46 PM, Mathieu Poirier wrote:
> > Good day,
> >
> > I have started reviewing this patchset.  Comments will be scattered over
> > multiple days and as such, I will explicitly inform you when I am done
> > with the review.
> >
> > On Fri, Apr 26, 2024 at 02:18:08PM -0500, Andrew Davis wrote:
> >> From: Martyn Welch 
> >>
> >> The AM62x and AM64x SoCs of the TI K3 family have a Cortex-M4F core in
> >> the MCU domain. This core is typically used for safety applications in a
> >> stand-alone mode. However, some applications (non-safety-related) may
> >> want to use the M4F core as a generic remote processor with IPC to the
> >> host processor. The M4F core has internal IRAM and DRAM memories, which
> >> are exposed to the system bus for code and data loading.
> >>
> >> A remote processor driver is added to support this subsystem, including
> >> being able to load and boot the M4F core. Loading covers both the M4F
> >> internal memories and predefined external code/data memories. The
> >> carve-outs for external contiguous memory are defined in the M4F device
> >> node and should match the external memory declarations in the M4F
> >> image binary. The M4F subsystem has two resets: one for the entire
> >> subsystem, i.e. including the internal memories, and the other, a local
> >> reset, only for the M4F processing core. When loading the image, the
> >> driver first releases the subsystem reset, loads the firmware image,
> >> and then releases the local reset to let the M4F processing core run.
> >>
> >> Signed-off-by: Martyn Welch 
> >> Signed-off-by: Hari Nagalla 
> >> Signed-off-by: Andrew Davis 
> >> ---
> >>   drivers/remoteproc/Kconfig   |  13 +
> >>   drivers/remoteproc/Makefile  |   1 +
> >>   drivers/remoteproc/ti_k3_m4_remoteproc.c | 785 +++
> >>   3 files changed, 799 insertions(+)
> >>   create mode 100644 drivers/remoteproc/ti_k3_m4_remoteproc.c
> >>
> >> diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
> >> index 48845dc8fa852..1a7c0330c91a9 100644
> >> --- a/drivers/remoteproc/Kconfig
> >> +++ b/drivers/remoteproc/Kconfig
> >> @@ -339,6 +339,19 @@ config TI_K3_DSP_REMOTEPROC
> >>It's safe to say N here if you're not interested in utilizing
> >>the DSP slave processors.
> >>
> >> +config TI_K3_M4_REMOTEPROC
> >> +tristate "TI K3 M4 remoteproc support"
> >> +depends on ARCH_K3 || COMPILE_TEST
> >> +select MAILBOX
> >> +select OMAP2PLUS_MBOX
> >> +help
> >> +  Say m here to support TI's M4 remote processor subsystems
> >> +  on various TI K3 family of SoCs through the remote processor
> >> +  framework.
> >> +
> >> +  It's safe to say N here if you're not interested in utilizing
> >> +  a remote processor.
> >> +
> >>   config TI_K3_R5_REMOTEPROC
> >>  tristate "TI K3 R5 remoteproc support"
> >>  depends on ARCH_K3
> >> diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
> >> index 91314a9b43cef..5ff4e2fee4abd 100644
> >> --- a/drivers/remoteproc/Makefile
> >> +++ b/drivers/remoteproc/Makefile
> >> @@ -37,5 +37,6 @@ obj-$(CONFIG_ST_REMOTEPROC)+= 
> >> st_remoteproc.o
> >>   obj-$(CONFIG_ST_SLIM_REMOTEPROC)   += st_slim_rproc.o
> >>   obj-$(CONFIG_STM32_RPROC)  += stm32_rproc.o
> >>   obj-$(CONFIG_TI_K3_DSP_REMOTEPROC) += ti_k3_dsp_remoteproc.o
> >> +obj-$(CONFIG_TI_K3_M4_REMOTEPROC)   += ti_k3_m4_remoteproc.o
> >>   obj-$(CONFIG_TI_K3_R5_REMOTEPROC)  += ti_k3_r5_remoteproc.o
> >>   obj-$(CONFIG_XLNX_R5_REMOTEPROC)   += xlnx_r5_remoteproc.o
> >> diff --git a/drivers/remoteproc/ti_k3_m4_remoteproc.c 
> >> b/drivers/remoteproc/ti_k3_m4_remoteproc.c
> >> new file mode 100644
> >> index 0..0030e509f6b5d
> >> --- /dev/null
> >> +++ b/drivers/remoteproc/ti_k3_m4_remoteproc.c
> >> @@ -0,0 +1,785 @@
> >> +// SPDX-License-Identifier: GPL-2.0-only
> >> +/*
> >> + * TI K3 Cortex-M4 Remote Processor(s) driver
> >> + *
> >> + * Copyright (C) 2021-2024 Texas Instruments Incorporated - 
> >> https://www.ti.com/
> >> + *  Hari Nagalla 
> >> + */
> >> +
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +
> >> +#include "omap_remoteproc.h"
> >> +#include "remoteproc_internal.h"
> >> +#include "ti_sci_proc.h"
> >> +
> >> +/**
> >> + * struct k3_m4_rproc_mem - internal memory structure
> >> + * @cpu_addr: MPU virtual address of the memory region
> >> + * @bus_addr: Bus address used to access the memory region
> >> + * @dev_addr: Device address of the memory region from remote processor 
> >> view
> >> + * @size: Size of the memory region
> >> + */
> >> +struct k3_m4_rproc_mem {
> >> +void __iomem *cpu_addr;
> >> +phys_addr_t bus_addr;
> >> +u32 dev_addr;
> >> +size_t size;
> >> +};
> >> +
> >> +/**
> >> + * struct k3_m4_rproc_mem_data - memory definitions 

Re: [PATCH 1/1] livepatch: Rename KLP_* to KLP_TRANSITION_*

2024-05-09 Thread Petr Mladek
On Tue 2024-05-07 13:01:11, zhangwar...@gmail.com wrote:
> From: Wardenjohn 
> 
> The original macros of KLP_* is about the state of the transition.
> Rename macros of KLP_* to KLP_TRANSITION_* to fix the confusing
> description of klp transition state.
> 
> Signed-off-by: Wardenjohn 

JFYI, the patch has been committed into livepatching.git, branch for-10.

Best regards,
Petr



Re: [PATCH v2] module: create weak dependencies

2024-05-09 Thread Lucas De Marchi

On Thu, May 09, 2024 at 12:24:40PM GMT, Jose Ignacio Tornos Martinez wrote:

It has been seen that for some network MAC drivers (e.g. lan78xx) the
related module for the PHY is loaded dynamically depending on the current
hardware. In this case, the associated PHY is read using the MDIO bus and
then the associated PHY module is loaded at runtime (kernel function
phy_request_driver_module). However, no software dependency is defined, so
user tools will not be able to see this dependency. For example, if
dracut is used and the hardware is present, lan78xx will be included but no
PHY module will be added, and on the next restart the device will not work
from boot because no related PHY will be found during the initramfs stage.

To solve this, we could define a normal 'pre' software dependency in the
lan78xx module on all the possible PHY modules (there may be several),
but proceeding that way, all the possible PHY modules would be loaded
while only one is necessary.

The idea is to create a new type of dependency, which we will call 'weak',
to be used only by the user tools that need to detect this situation.
That way, for example, dracut could check the 'weak' dependencies of the
modules involved in order to install those dependencies in the initramfs
too. That is, for the lan78xx module above, by defining the 'weak'
dependency with the list of possible PHY modules, only the necessary PHY
would still be loaded on demand, keeping the same behavior, but all the
possible PHY modules would be available from the initramfs.
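
As a concrete illustration of the mechanism described above, a MAC driver
could then declare its candidate PHY modules like this (the module names
below are made up for the example):

/* Hypothetical usage sketch: declare 'weak' dependencies on the PHY
 * modules this driver may request at runtime. depmod records them in
 * the new weakdep file; nothing is loaded until actually needed.
 */
MODULE_WEAKDEP("phy-module-a");
MODULE_WEAKDEP("phy-module-b");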


I think a note about backward compatibility is important. If a system
doesn't have a new-enough depmod, it will simply not create the new
weakdep file and initrd generators won't be able to use it. The only
downside is not being able to use the new feature; everything else should
still work as before.



The 'weak' dependency support has been included in kmod:
https://github.com/kmod-project/kmod/commit/05828b4a6e9327a63ef94df544a042b5e9ce4fe7

Signed-off-by: Jose Ignacio Tornos Martinez 
---
V1 -> V2:
- Include reference to 'weak' dependency support in kmod.

include/linux/module.h | 5 +
1 file changed, 5 insertions(+)

diff --git a/include/linux/module.h b/include/linux/module.h
index 1153b0d99a80..231e710d8736 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -173,6 +173,11 @@ extern void cleanup_module(void);
 */
#define MODULE_SOFTDEP(_softdep) MODULE_INFO(softdep, _softdep)

+/* Weak module dependencies. See man modprobe.d for details.


Documentation/process/coding-style.rst section 8 says to balance the /*
and */, but it's up to Luis what he enforces in the module tree.

Reviewed-by: Lucas De Marchi 

Lucas De Marchi


+ * Example: MODULE_WEAKDEP("module-foo")
+ */
+#define MODULE_WEAKDEP(_weakdep) MODULE_INFO(weakdep, _weakdep)
+
/*
 * MODULE_FILE is used for generating modules.builtin
 * So, make it no-op when this is being built as a module
--
2.44.0





Re: [PATCH 1/1] livepatch: Rename KLP_* to KLP_TRANSITION_*

2024-05-09 Thread Miroslav Benes
On Tue, 7 May 2024, zhangwar...@gmail.com wrote:

> From: Wardenjohn 
> 
> The original macros of KLP_* is about the state of the transition.
> Rename macros of KLP_* to KLP_TRANSITION_* to fix the confusing
> description of klp transition state.
> 
> Signed-off-by: Wardenjohn 

Acked-by: Miroslav Benes 

M



[PATCH v1 1/1] Input: gpio-keys - expose wakeup keys in sysfs

2024-05-09 Thread Guido Günther
This helps user space figure out which keys should be used to unidle a
device. E.g. on phones the volume rocker should usually not unblank the
screen.
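
As a quick usage sketch, user space could read the new attribute like this
(minimal error handling; the sysfs path matches the one documented in the
patch below):

/* Read and print the wakeup-capable key codes exposed by gpio-keys. */
#include <stdio.h>

int main(void)
{
        char buf[256];
        FILE *f = fopen("/sys/devices/platform/gpio-keys/wakeup_keys", "r");

        if (!f) {
                perror("wakeup_keys");
                return 1;
        }
        if (fgets(buf, sizeof(buf), f))
                printf("wakeup-capable key codes: %s", buf);
        fclose(f);
        return 0;
}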

Signed-off-by: Guido Günther 
---
 drivers/input/keyboard/gpio_keys.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/input/keyboard/gpio_keys.c 
b/drivers/input/keyboard/gpio_keys.c
index 9f3bcd41cf67..84f43d1d4375 100644
--- a/drivers/input/keyboard/gpio_keys.c
+++ b/drivers/input/keyboard/gpio_keys.c
@@ -198,7 +198,8 @@ static void gpio_keys_enable_button(struct gpio_button_data 
*bdata)
  */
 static ssize_t gpio_keys_attr_show_helper(struct gpio_keys_drvdata *ddata,
  char *buf, unsigned int type,
- bool only_disabled)
+ bool only_disabled,
+ bool only_wakeup)
 {
int n_events = get_n_events_by_type(type);
unsigned long *bits;
@@ -218,6 +219,9 @@ static ssize_t gpio_keys_attr_show_helper(struct 
gpio_keys_drvdata *ddata,
if (only_disabled && !bdata->disabled)
continue;
 
+   if (only_wakeup && !bdata->button->wakeup)
+   continue;
+
__set_bit(*bdata->code, bits);
}
 
@@ -297,7 +301,7 @@ static ssize_t gpio_keys_attr_store_helper(struct 
gpio_keys_drvdata *ddata,
return error;
 }
 
-#define ATTR_SHOW_FN(name, type, only_disabled)
\
+#define ATTR_SHOW_FN(name, type, only_disabled, only_wakeup)   \
 static ssize_t gpio_keys_show_##name(struct device *dev,   \
 struct device_attribute *attr, \
 char *buf) \
@@ -306,22 +310,26 @@ static ssize_t gpio_keys_show_##name(struct device *dev,  
\
struct gpio_keys_drvdata *ddata = platform_get_drvdata(pdev);   \
\
return gpio_keys_attr_show_helper(ddata, buf,   \
- type, only_disabled); \
+ type, only_disabled,  \
+ only_wakeup); \
 }
 
-ATTR_SHOW_FN(keys, EV_KEY, false);
-ATTR_SHOW_FN(switches, EV_SW, false);
-ATTR_SHOW_FN(disabled_keys, EV_KEY, true);
-ATTR_SHOW_FN(disabled_switches, EV_SW, true);
+ATTR_SHOW_FN(keys, EV_KEY, false, false);
+ATTR_SHOW_FN(switches, EV_SW, false, false);
+ATTR_SHOW_FN(disabled_keys, EV_KEY, true, false);
+ATTR_SHOW_FN(disabled_switches, EV_SW, true, false);
+ATTR_SHOW_FN(wakeup_keys, EV_KEY, false, true);
 
 /*
  * ATTRIBUTES:
  *
  * /sys/devices/platform/gpio-keys/keys [ro]
  * /sys/devices/platform/gpio-keys/switches [ro]
+ * /sys/devices/platform/gpio-keys/wakeup_keys [ro]
  */
 static DEVICE_ATTR(keys, S_IRUGO, gpio_keys_show_keys, NULL);
 static DEVICE_ATTR(switches, S_IRUGO, gpio_keys_show_switches, NULL);
+static DEVICE_ATTR(wakeup_keys, S_IRUGO, gpio_keys_show_wakeup_keys, NULL);
 
 #define ATTR_STORE_FN(name, type)  \
 static ssize_t gpio_keys_store_##name(struct device *dev,  \
@@ -361,6 +369,7 @@ static struct attribute *gpio_keys_attrs[] = {
_attr_switches.attr,
_attr_disabled_keys.attr,
_attr_disabled_switches.attr,
+   _attr_wakeup_keys.attr,
NULL,
 };
 ATTRIBUTE_GROUPS(gpio_keys);
-- 
2.43.0




[PATCH v4] ftrace: Fix possible use-after-free issue in ftrace_location()

2024-05-09 Thread Zheng Yejian
KASAN reports a bug:

  BUG: KASAN: use-after-free in ftrace_location+0x90/0x120
  Read of size 8 at addr 888141d40010 by task insmod/424
  CPU: 8 PID: 424 Comm: insmod Tainted: GW  6.9.0-rc2+
  [...]
  Call Trace:
   
   dump_stack_lvl+0x68/0xa0
   print_report+0xcf/0x610
   kasan_report+0xb5/0xe0
   ftrace_location+0x90/0x120
   register_kprobe+0x14b/0xa40
   kprobe_init+0x2d/0xff0 [kprobe_example]
   do_one_initcall+0x8f/0x2d0
   do_init_module+0x13a/0x3c0
   load_module+0x3082/0x33d0
   init_module_from_file+0xd2/0x130
   __x64_sys_finit_module+0x306/0x440
   do_syscall_64+0x68/0x140
   entry_SYSCALL_64_after_hwframe+0x71/0x79

The root cause is that, in lookup_rec(), the ftrace record of some address
is being searched for in the ftrace pages of some module while, at the
same time, those ftrace pages are being freed in ftrace_release_mod() as
the corresponding module is being deleted:

   CPU1   |  CPU2
  register_kprobes() {| delete_module() {
check_kprobe_address_safe() { |
  arch_check_ftrace_location() {  |
ftrace_location() {   |
  lookup_rec() // USE!|   ftrace_release_mod() // Free!

To fix this issue:
  1. Hold the RCU read lock while accessing ftrace pages in
     ftrace_location_range();
  2. Use ftrace_location_range() instead of lookup_rec() in
     ftrace_location();
  3. Call synchronize_rcu() before freeing any ftrace pages, in
     ftrace_process_locs()/ftrace_release_mod()/ftrace_free_mem().

Fixes: ae6aa16fdc16 ("kprobes: introduce ftrace based optimization")
Suggested-by: Steven Rostedt 
Signed-off-by: Zheng Yejian 
---
 kernel/trace/ftrace.c | 39 +++
 1 file changed, 23 insertions(+), 16 deletions(-)

v4:
 - Simply add rcu locking into ftrace_location_range() as suggested by Steve.
   Link: 
https://lore.kernel.org/linux-trace-kernel/20240502170743.15a5f...@gandalf.local.home/

v3:
 - Complete the commit description and add Suggested-by tag
 - Add comments around where synchronize_rcu() is called
 - Link: 
https://lore.kernel.org/linux-trace-kernel/20240417032830.1764690-1-zhengyeji...@huawei.com/

v2:
 - Link: 
https://lore.kernel.org/all/20240416112459.1444612-1-zhengyeji...@huawei.com/
 - Use RCU lock instead of holding ftrace_lock as suggested by Steve.
   Link: https://lore.kernel.org/all/20240410112823.1d084...@gandalf.local.home/

v1:
 - Link: 
https://lore.kernel.org/all/20240401125543.1282845-1-zhengyeji...@huawei.com/

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index da1710499698..1e12df9bb531 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1595,12 +1595,15 @@ static struct dyn_ftrace *lookup_rec(unsigned long 
start, unsigned long end)
 unsigned long ftrace_location_range(unsigned long start, unsigned long end)
 {
struct dyn_ftrace *rec;
+   unsigned long ip = 0;
 
+   rcu_read_lock();
rec = lookup_rec(start, end);
if (rec)
-   return rec->ip;
+   ip = rec->ip;
+   rcu_read_unlock();
 
-   return 0;
+   return ip;
 }
 
 /**
@@ -1614,25 +1617,22 @@ unsigned long ftrace_location_range(unsigned long 
start, unsigned long end)
  */
 unsigned long ftrace_location(unsigned long ip)
 {
-   struct dyn_ftrace *rec;
+   unsigned long loc;
unsigned long offset;
unsigned long size;
 
-   rec = lookup_rec(ip, ip);
-   if (!rec) {
+   loc = ftrace_location_range(ip, ip);
+   if (!loc) {
if (!kallsyms_lookup_size_offset(ip, , ))
goto out;
 
/* map sym+0 to __fentry__ */
if (!offset)
-   rec = lookup_rec(ip, ip + size - 1);
+   loc = ftrace_location_range(ip, ip + size - 1);
}
 
-   if (rec)
-   return rec->ip;
-
 out:
-   return 0;
+   return loc;
 }
 
 /**
@@ -6596,6 +6596,8 @@ static int ftrace_process_locs(struct module *mod,
/* We should have used all pages unless we skipped some */
if (pg_unuse) {
WARN_ON(!skipped);
+   /* Need to synchronize with ftrace_location_range() */
+   synchronize_rcu();
ftrace_free_pages(pg_unuse);
}
return ret;
@@ -6809,6 +6811,9 @@ void ftrace_release_mod(struct module *mod)
  out_unlock:
mutex_unlock(_lock);
 
+   /* Need to synchronize with ftrace_location_range() */
+   if (tmp_page)
+   synchronize_rcu();
for (pg = tmp_page; pg; pg = tmp_page) {
 
/* Needs to be called outside of ftrace_lock */
@@ -7142,6 +7147,7 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, 
void *end_ptr)
unsigned long start = (unsigned long)(start_ptr);
unsigned long end = (unsigned long)(end_ptr);
struct ftrace_page **last_pg = _pages_start;
+   struct ftrace_page *tmp_page = NULL;
struct 

Re: [PATCH v22 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-05-09 Thread Vincent Donnefort
On Tue, May 07, 2024 at 10:34:02PM -0400, Steven Rostedt wrote:
> On Tue, 30 Apr 2024 12:13:51 +0100
> Vincent Donnefort  wrote:
> 
> > +#ifdef CONFIG_MMU
> > +static int __rb_map_vma(struct ring_buffer_per_cpu *cpu_buffer,
> > +   struct vm_area_struct *vma)
> > +{
> > +   unsigned long nr_subbufs, nr_pages, vma_pages, pgoff = vma->vm_pgoff;
> > +   unsigned int subbuf_pages, subbuf_order;
> > +   struct page **pages;
> > +   int p = 0, s = 0;
> > +   int err;
> > +
> > +   /* Refuse MP_PRIVATE or writable mappings */
> > +   if (vma->vm_flags & VM_WRITE || vma->vm_flags & VM_EXEC ||
> > +   !(vma->vm_flags & VM_MAYSHARE))
> > +   return -EPERM;
> > +
> > +   /*
> > +* Make sure the mapping cannot become writable later. Also tell the VM
> > +* to not touch these pages (VM_DONTCOPY | VM_DONTEXPAND). Finally,
> > +* prevent migration, GUP and dump (VM_IO).
> > +*/
> > +   vm_flags_mod(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_IO, VM_MAYWRITE);
> 
> Do we really need the VM_IO?
> 
> When testing this in gdb, I would get:
> 
> (gdb) p tmap->map->subbuf_size
> Cannot access memory at address 0x77fc2008
> 
> It appears that you can't ptrace IO memory. When I removed that flag,
> gdb has no problem reading that memory.

Yeah, VM_IO indeed implies DONTDUMP. VM_IO was part of Linus's
recommendations. But perhaps VM_DONTEXPAND and MIXEDMAP (implicitly set by
vm_insert_pages) are enough protection?

I don't see how anything could use GUP there, and as David pointed out on
the previous version, it doesn't even prevent the GUP-fast path.
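
Concretely, the v23 change being discussed would reduce the earlier
vm_flags_mod() call to something like this (a sketch of the direction,
not the final patch):

        /* Same protections minus VM_IO: VM_DONTCOPY | VM_DONTEXPAND still
         * pin the mapping's layout, and clearing VM_MAYWRITE keeps it from
         * ever becoming writable, while ptrace/gdb can now read the pages.
         */
        vm_flags_mod(vma, VM_DONTCOPY | VM_DONTEXPAND, VM_MAYWRITE);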

> 
> I think we should drop that flag.
> 
> Can you send a v23 with that removed, Shuah's update, and also the
> change below:

Ack.

[...]



[PATCH v2] module: create weak dependencies

2024-05-09 Thread Jose Ignacio Tornos Martinez
It has been seen that for some network MAC drivers (e.g. lan78xx) the
related module for the PHY is loaded dynamically depending on the current
hardware. In this case, the associated PHY is read using the MDIO bus and
then the associated PHY module is loaded at runtime (kernel function
phy_request_driver_module). However, no software dependency is defined, so
user tools will not be able to see this dependency. For example, if
dracut is used and the hardware is present, lan78xx will be included but no
PHY module will be added, and on the next restart the device will not work
from boot because no related PHY will be found during the initramfs stage.

To solve this, we could define a normal 'pre' software dependency in the
lan78xx module on all the possible PHY modules (there may be several),
but proceeding that way, all the possible PHY modules would be loaded
while only one is necessary.

The idea is to create a new type of dependency, which we will call 'weak',
to be used only by the user tools that need to detect this situation.
That way, for example, dracut could check the 'weak' dependencies of the
modules involved in order to install those dependencies in the initramfs
too. That is, for the lan78xx module above, by defining the 'weak'
dependency with the list of possible PHY modules, only the necessary PHY
would still be loaded on demand, keeping the same behavior, but all the
possible PHY modules would be available from the initramfs.

The 'weak' dependency support has been included in kmod:
https://github.com/kmod-project/kmod/commit/05828b4a6e9327a63ef94df544a042b5e9ce4fe7

Signed-off-by: Jose Ignacio Tornos Martinez 
---
V1 -> V2:
- Include reference to 'weak' dependency support in kmod.

 include/linux/module.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/linux/module.h b/include/linux/module.h
index 1153b0d99a80..231e710d8736 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -173,6 +173,11 @@ extern void cleanup_module(void);
  */
 #define MODULE_SOFTDEP(_softdep) MODULE_INFO(softdep, _softdep)
 
+/* Weak module dependencies. See man modprobe.d for details.
+ * Example: MODULE_WEAKDEP("module-foo")
+ */
+#define MODULE_WEAKDEP(_weakdep) MODULE_INFO(weakdep, _weakdep)
+
 /*
  * MODULE_FILE is used for generating modules.builtin
  * So, make it no-op when this is being built as a module
-- 
2.44.0




Re: [PATCHv5 bpf-next 6/8] x86/shstk: Add return uprobe support

2024-05-09 Thread Jiri Olsa
On Tue, May 07, 2024 at 05:35:54PM +, Edgecombe, Rick P wrote:
> On Tue, 2024-05-07 at 12:53 +0200, Jiri Olsa wrote:
> > diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> > index 81e6ee95784d..ae6c3458a675 100644
> > --- a/arch/x86/kernel/uprobes.c
> > +++ b/arch/x86/kernel/uprobes.c
> > @@ -406,6 +406,11 @@ SYSCALL_DEFINE0(uretprobe)
> >  * trampoline's ret instruction
> >  */
> > r11_cx_ax[2] = regs->ip;
> > +
> > +   /* make the shadow stack follow that */
> > +   if (shstk_push_frame(regs->ip))
> > +   goto sigill;
> > +
> > regs->ip = ip;
> >  
> 
> Per the earlier discussion, this cannot be reached unless uretprobes are
> in use, which cannot happen without something with privileges taking an
> action. But are uretprobes ever used for monitoring applications where
> security is important? Or is it strictly a debug-time thing?

sorry, I don't have that level of detail, but we do have customers
that use uprobes in general, or want to use them, and complain about
the speed

there are several tools in bcc [1] that use uretprobes in scripts,
like:
  memleak, sslsniff, trace, bashreadline, gethostlatency, argdist,
  funclatency

jirka


[1] https://github.com/iovisor/bcc



general protection fault in crypto_skcipher_encrypt

2024-05-09 Thread Ubisectech Sirius
Hello.
We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. 
Recently, our team discovered an issue in Linux kernel 6.7. Attached to this
email is a PoC file for the issue.

Stack dump:

bcachefs (loop0): error validating btree node on loop0 at btree extents level 
0/0
  u64s 11 type btree_ptr_v2 SPOS_MAX len 0 ver 0: seq 83426fcb67886cbe written 
16 min_key POS_MIN durability: 1 ptr: 0:27:0 gen 0
  node offset 0 bset u64s 0: unknown checksum type 4, fixing
general protection fault, probably for non-canonical address 
0xdc04:  [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0020-0x0027]
CPU: 0 PID: 25892 Comm: syz-executor.0 Not tainted 6.7.0 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:crypto_skcipher_alg include/crypto/skcipher.h:387 [inline]
RIP: 0010:crypto_skcipher_encrypt+0x48/0x170 crypto/skcipher.c:656
Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 17 01 00 00 48 b8 00 00 00 00 00 
fc ff df 48 8b 5d 40 48 8d 7b 18 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 08 01 
00 00 48 8d 7b 04 4c 8b 63 18 48 b8 00 00
RSP: 0018:c90002d46388 EFLAGS: 00010202
RAX: dc00 RBX: 0008 RCX: c9000e04a000
RDX: 0004 RSI: 84162b50 RDI: 0020
RBP: c90002d463f8 R08: 0020 R09: 0001
R10: 0001 R11:  R12: 0008
R13: 0020 R14: 88801133d600 R15: c90002d467d8
FS:  7fdd0174b640() GS:88802c60() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 001b2df21000 CR3: 57314000 CR4: 00750ef0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
PKRU: 5554
Call Trace:
 
 do_encrypt_sg+0xbc/0x150 fs/bcachefs/checksum.c:107
 do_encrypt+0x27d/0x430 fs/bcachefs/checksum.c:149
 gen_poly_key.isra.0+0x144/0x310 fs/bcachefs/checksum.c:190
 bch2_checksum+0x1bc/0x2b0 fs/bcachefs/checksum.c:226
 bch2_btree_node_read_done+0x75b/0x45a0 fs/bcachefs/btree_io.c:1003
 btree_node_read_work+0x77f/0x10e0 fs/bcachefs/btree_io.c:1262
 bch2_btree_node_read+0xef3/0x1460 fs/bcachefs/btree_io.c:1641
 __bch2_btree_root_read fs/bcachefs/btree_io.c:1680 [inline]
 bch2_btree_root_read+0x2bc/0x670 fs/bcachefs/btree_io.c:1704
 read_btree_roots fs/bcachefs/recovery.c:384 [inline]
 bch2_fs_recovery+0x28b0/0x52d0 fs/bcachefs/recovery.c:914
 bch2_fs_start+0x365/0x5e0 fs/bcachefs/super.c:978
 bch2_fs_open+0x1ac9/0x3890 fs/bcachefs/super.c:1968
 bch2_mount+0x538/0x13c0 fs/bcachefs/fs.c:1863
 legacy_get_tree+0x109/0x220 fs/fs_context.c:662
 vfs_get_tree+0x93/0x380 fs/super.c:1771
 do_new_mount fs/namespace.c:3337 [inline]
 path_mount+0x679/0x1e40 fs/namespace.c:3664
 do_mount fs/namespace.c:3677 [inline]
 __do_sys_mount fs/namespace.c:3886 [inline]
 __se_sys_mount fs/namespace.c:3863 [inline]
 __x64_sys_mount+0x287/0x310 fs/namespace.c:3863
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x43/0x120 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7fdd00a91b3e
Code: 48 c7 c0 ff ff ff ff eb aa e8 be 0d 00 00 66 2e 0f 1f 84 00 00 00 00 00 
0f 1f 40 00 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 
c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:7fdd0174ae38 EFLAGS: 0202 ORIG_RAX: 00a5
RAX: ffda RBX: 000119fc RCX: 7fdd00a91b3e
RDX: 20011a00 RSI: 2040 RDI: 7fdd0174ae90
RBP: 7fdd0174aed0 R08: 7fdd0174aed0 R09: 0001
R10: 0001 R11: 0202 R12: 20011a00
R13: 2040 R14: 7fdd0174ae90 R15: 2100
 
Modules linked in:
---[ end trace  ]---
RIP: 0010:crypto_skcipher_alg include/crypto/skcipher.h:387 [inline]
RIP: 0010:crypto_skcipher_encrypt+0x48/0x170 crypto/skcipher.c:656
Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 17 01 00 00 48 b8 00 00 00 00 00 
fc ff df 48 8b 5d 40 48 8d 7b 18 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 08 01 
00 00 48 8d 7b 04 4c 8b 63 18 48 b8 00 00
RSP: 0018:c90002d46388 EFLAGS: 00010202
RAX: dc00 RBX: 0008 RCX: c9000e04a000
RDX: 0004 RSI: 84162b50 RDI: 0020
RBP: c90002d463f8 R08: 0020 R09: 0001
R10: 0001 R11:  R12: 0008
R13: 0020 R14: 88801133d600 R15: c90002d467d8
FS:  7fdd0174b640() GS:88802c60() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 55e4cd278120 CR3: 57314000 CR4: 00750ef0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
PKRU: 5554

Code disassembly (best guess):
   0:   48 89 fa   

WARNING in fscrypt_fname_siphash

2024-05-09 Thread Ubisectech Sirius
Hello.
We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. 
Recently, our team discovered an issue in Linux kernel 6.7. Attached to this
email is a PoC file for the issue.

Stack dump:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 10070 at fs/crypto/fname.c:573 
fscrypt_fname_siphash+0xdf/0x120 fs/crypto/fname.c:573
Modules linked in:
CPU: 0 PID: 10070 Comm: syz-executor.2 Not tainted 6.7.0 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:fscrypt_fname_siphash+0xdf/0x120 fs/crypto/fname.c:573
Code: 00 00 00 fc ff df 48 89 f9 48 c1 e9 03 80 3c 01 00 75 3f 48 8b 7b 08 48 
83 c4 10 5b 5d 41 5c e9 d7 79 6e 08 e8 f2 a2 80 ff 90 <0f> 0b 90 eb 93 48 89 14 
24 e8 b3 5b d7 ff 48 8b 14 24 eb b7 e8 48
RSP: 0018:c90002887238 EFLAGS: 00010246
RAX: 0004 RBX: c900028872d0 RCX: c900030cb000
RDX: 0004 RSI: 8209535e RDI: 0001
RBP: 888063065180 R08: 0001 R09: 
R10:  R11: 8d4c4d91 R12: 
R13: 0001 R14: dc00 R15: 888056aba0c4
FS:  7ff63b86f640() GS:88802c60() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7ff5442398e0 CR3: 57fd8000 CR4: 00750ef0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
PKRU: 5554
Call Trace:
 
 __ext4fs_dirhash+0x36a/0xb60 fs/ext4/hash.c:268
 ext4fs_dirhash+0x246/0x2d0 fs/ext4/hash.c:322
 htree_dirblock_to_tree+0x821/0xc90 fs/ext4/namei.c:1124
 ext4_htree_fill_tree+0x32d/0xc40 fs/ext4/namei.c:1219
 ext4_dx_readdir fs/ext4/dir.c:597 [inline]
 ext4_readdir+0x1d2d/0x36a0 fs/ext4/dir.c:142
 iterate_dir+0x1f4/0x5c0 fs/readdir.c:106
 ovl_dir_read fs/overlayfs/readdir.c:315 [inline]
 ovl_indexdir_cleanup+0x2e2/0x940 fs/overlayfs/readdir.c:1183
 ovl_get_indexdir fs/overlayfs/super.c:887 [inline]
 ovl_fill_super+0x414d/0x6560 fs/overlayfs/super.c:1404
 vfs_get_super fs/super.c:1338 [inline]
 get_tree_nodev+0xda/0x190 fs/super.c:1357
 vfs_get_tree+0x93/0x380 fs/super.c:1771
 do_new_mount fs/namespace.c:3337 [inline]
 path_mount+0x679/0x1e40 fs/namespace.c:3664
 do_mount fs/namespace.c:3677 [inline]
 __do_sys_mount fs/namespace.c:3886 [inline]
 __se_sys_mount fs/namespace.c:3863 [inline]
 __x64_sys_mount+0x287/0x310 fs/namespace.c:3863
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x43/0x120 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7ff63aa8fd6d
Code: c3 e8 97 2b 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 
c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:7ff63b86f028 EFLAGS: 0246 ORIG_RAX: 00a5
RAX: ffda RBX: 7ff63abcc120 RCX: 7ff63aa8fd6d
RDX: 2000 RSI: 2040 RDI: 
RBP: 7ff63aaf14cd R08: 2140 R09: 
R10:  R11: 0246 R12: 
R13: 006e R14: 7ff63abcc120 R15: 7ff63b84f000
 

Thank you for taking the time to read this email and we look forward to working 
with you further.





poc.c
Description: Binary data


Re: [REGRESSION][v6.8-rc1] virtio-pci: Introduce admin virtqueue

2024-05-08 Thread Feng Liu


On 2024-05-08 7:18 a.m., Catherine Redfield wrote:



On a VM with the GCP kernel (where we first identified the problem), I see:

1. The full kernel log from `journalctl --system > kernlog` is attached.
The specific suspend section is here:


May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal 
systemd[1]: Reached target sleep.target - Sleep.
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal 
systemd[1]: Starting systemd-suspend.service - System Suspend...
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal 
systemd-sleep[1413]: Performing sleep operation 'suspend'...
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal 
kernel: PM: suspend entry (deep)
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal 
kernel: Filesystems sync: 0.008 seconds
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal 
kernel: Freezing user space processes
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal 
kernel: Freezing user space processes completed (elapsed 0.001 seconds)
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal 
kernel: OOM killer disabled.
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal 
kernel: Freezing remaining freezable tasks
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal 
kernel: Freezing remaining freezable tasks completed (elapsed 0.000 seconds)
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal 
kernel: printk: Suspending console(s) (use no_console_suspend to debug)
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal 
kernel: port 00:03:0.0: PM: dpm_run_callback(): 
pm_runtime_force_suspend+0x0/0x130 returns -16
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal 
kernel: port 00:03:0.0: PM: failed to suspend: error -16


Thanks to Joseph and Catherine for the help.

Hi,

I have already synced up with the Canonical folks offline about this issue.

I can run "suspend/resume" successfully on my local server and VM.
And "PM: failed to suspend: error -16" does not look like it is caused by
my previous virtio patch (fd27ef6b44be ("virtio-pci: Introduce admin
virtqueue")), which only modified "virtio_device_freeze" for the "suspend"
action.


So I have provided my steps and debug patch to Joseph and Catherine.
I will also share the information here, as follows:


I have read the QEMU code and found a way to trigger "suspend/resume" on
my setup, and added some debug messages in the latest kernel.


My steps are:
1. QEMU cmdline add following

-global PIIX4_PM.disable_s3=0 \
-global PIIX4_PM.disable_s4=1 \

-netdev type=tap,ifname=tap0,id=hostnet0,script=no,downscript=no \
-device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=$SSH_MAC,bus=pci.0,addr=0x3 \

..

2. In the VM, run "systemctl suspend" to suspend the VM to memory
3. In the QEMU HMP shell, run "system_wakeup" to resume the VM again

My VM configuration:
NIC: 1 virtio nic emulated by QEMU
OS:  Ubuntu 22.04.4 LTS
kernel:  latest kernel, 6.9-rc7: ee5b455b0ada (kernel2/net-next-virito, 
kernel2/master, master) Merge tag 'slab-for-6.9-rc7-fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab)



I added some debug messages to the latest kernel and ran the above steps
to trigger "suspend/resume". Everything in the VM is OK; the VM could
suspend/resume successfully.

Following is the kernel log:


May  6 15:59:52 feliu-vm kernel: [   43.446737] PM: suspend entry (deep)
May  6 16:00:04 feliu-vm kernel: [   43.467640] Filesystems sync: 0.020 
seconds
May  6 16:00:04 feliu-vm kernel: [   43.467923] Freezing user space 
processes
May  6 16:00:04 feliu-vm kernel: [   43.470294] Freezing user space 
processes completed (elapsed 0.002 seconds)

May  6 16:00:04 feliu-vm kernel: [   43.470299] OOM killer disabled.
May  6 16:00:04 feliu-vm kernel: [   43.470301] Freezing remaining 
freezable tasks
May  6 16:00:04 feliu-vm kernel: [   43.471482] Freezing remaining 
freezable tasks completed (elapsed 0.001 seconds)
May  6 16:00:04 feliu-vm kernel: [   43.471495] printk: Suspending 
console(s) (use no_console_suspend to debug)
May  6 16:00:04 feliu-vm kernel: [   43.474034] virtio_net virtio0: 
godeng virtio device freeze
May  6 16:00:04 feliu-vm kernel: [   43.475714] virtio_net virtio0 ens3: 
godfeng virtnet_freeze done
May  6 16:00:04 feliu-vm kernel: [   43.475717] virtio_net virtio0: 
godfeng VIRTIO_F_ADMIN_VQ not enabled
May  6 16:00:04 feliu-vm kernel: [   43.475719] virtio_net virtio0: 
godeng virtio device freeze done


May  6 16:00:04 feliu-vm kernel: [   43.535382] smpboot: CPU 1 is now 
offline
May  6 16:00:04 feliu-vm kernel: [   43.537283] IRQ fixup: irq 1 move in 
progress, old vector 32
May  6 16:00:04 feliu-vm kernel: [   43.538504] smpboot: CPU 2 is now 
offline
May  6 16:00:04 feliu-vm kernel: [   43.541392] 

Re: [PATCH v3] ftrace: Fix possible use-after-free issue in ftrace_location()

2024-05-08 Thread Zheng Yejian

On 2024/5/3 05:07, Steven Rostedt wrote:

On Wed, 17 Apr 2024 11:28:30 +0800
Zheng Yejian  wrote:


diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index da1710499698..e05d3e3dc06a 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1581,7 +1581,7 @@ static struct dyn_ftrace *lookup_rec(unsigned long start, 
unsigned long end)
  }
  
  /**

- * ftrace_location_range - return the first address of a traced location
+ * ftrace_location_range_rcu - return the first address of a traced location


kerneldoc comments are for external functions. You need to move this down
to ftrace_location_range() as here you are commenting a local static function.


I'll do it in v4.



But I have to ask, why did you create this static function anyway? There's
only one user of it (the ftrace_location_range()). Why didn't you just
simply add the rcu locking there?


Yes, the only-one-user function looks ugly.
At first I thought that ftrace_location_range() needed a lock, so I just
did it like that, no special reason.



unsigned long ftrace_location_range(unsigned long start, unsigned long end)
{
struct dyn_ftrace *rec;
unsigned long ip = 0;

rcu_read_lock();
rec = lookup_rec(start, end);
if (rec)
ip = rec->ip;
rcu_read_unlock();

return ip;
}

-- Steve



   *if it touches the given ip range
   * @start: start of range to search.
   * @end: end of range to search (inclusive). @end points to the last byte
@@ -1592,7 +1592,7 @@ static struct dyn_ftrace *lookup_rec(unsigned long start, 
unsigned long end)
   * that is either a NOP or call to the function tracer. It checks the ftrace
   * internal tables to determine if the address belongs or not.
   */
-unsigned long ftrace_location_range(unsigned long start, unsigned long end)
+static unsigned long ftrace_location_range_rcu(unsigned long start, unsigned 
long end)
  {
struct dyn_ftrace *rec;
  
@@ -1603,6 +1603,16 @@ unsigned long ftrace_location_range(unsigned long start, unsigned long end)

return 0;
  }
  
+unsigned long ftrace_location_range(unsigned long start, unsigned long end)

+{
+   unsigned long loc;
+
+   rcu_read_lock();
+   loc = ftrace_location_range_rcu(start, end);
+   rcu_read_unlock();
+   return loc;
+}







[PATCH 4/4] selftests/bpf: add test validating uprobe/uretprobe stack traces

2024-05-08 Thread Andrii Nakryiko
Add a set of tests to validate that stack traces captured from or in the
presence of active uprobes and uretprobes are valid and complete.

For this we use BPF programs that are installed either on entry or exit
of user functions, plus a deeply nested USDT. One of the target functions
(target_1) is recursive, to generate two different entries in the stack
trace for the same uprobe/uretprobe, testing potential edge conditions.

Without fixes in this patch set, we get something like this for one of
the scenarios:

 caller: 0x758fff - 0x7595ab
 target_1: 0x758fd5 - 0x758fff
 target_2: 0x758fca - 0x758fd5
 target_3: 0x758fbf - 0x758fca
 target_4: 0x758fb3 - 0x758fbf
 ENTRY #0: 0x758fb3 (in target_4)
 ENTRY #1: 0x758fd3 (in target_2)
 ENTRY #2: 0x758ffd (in target_1)
 ENTRY #3: 0x7fffe000
 ENTRY #4: 0x7fffe000
 ENTRY #5: 0x6f8f39
 ENTRY #6: 0x6fa6f0
 ENTRY #7: 0x7f403f229590

Entries #3 and #4 (0x7fffe000) are uretprobe trampoline addresses
which obscure the two actual target_1 invocations. Also
note that between entry #0 and entry #1 we are missing an entry for
target_3, which is fixed in patch #2.

With all the fixes, we get desired full stack traces:

 caller: 0x758fff - 0x7595ab
 target_1: 0x758fd5 - 0x758fff
 target_2: 0x758fca - 0x758fd5
 target_3: 0x758fbf - 0x758fca
 target_4: 0x758fb3 - 0x758fbf
 ENTRY #0: 0x758fb7 (in target_4)
 ENTRY #1: 0x758fc8 (in target_3)
 ENTRY #2: 0x758fd3 (in target_2)
 ENTRY #3: 0x758ffd (in target_1)
 ENTRY #4: 0x758ff3 (in target_1)
 ENTRY #5: 0x75922c (in caller)
 ENTRY #6: 0x6f8f39
 ENTRY #7: 0x6fa6f0
 ENTRY #8: 0x7f986adc4cd0

Now there is a logical and complete sequence of function calls.

Signed-off-by: Andrii Nakryiko 
---
 .../bpf/prog_tests/uretprobe_stack.c  | 185 ++
 .../selftests/bpf/progs/uretprobe_stack.c |  96 +
 2 files changed, 281 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/uretprobe_stack.c
 create mode 100644 tools/testing/selftests/bpf/progs/uretprobe_stack.c

diff --git a/tools/testing/selftests/bpf/prog_tests/uretprobe_stack.c 
b/tools/testing/selftests/bpf/prog_tests/uretprobe_stack.c
new file mode 100644
index ..2974553da6e5
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/uretprobe_stack.c
@@ -0,0 +1,185 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+
+#include <test_progs.h>
+#include "uretprobe_stack.skel.h"
+#include "../sdt.h"
+
+/* We set up target_1() -> target_2() -> target_3() -> target_4() -> USDT()
+ * call chain, each being traced by our BPF program. On entry or return from
+ * each target_*() we are capturing user stack trace and recording it in
+ * global variable, so that user space part of the test can validate it.
+ *
+ * Note, we put each target function into a custom section to get those
+ * __start_XXX/__stop_XXX symbols, generated by linker for us, which allow us
+ * to know the address range of those functions.
+ */
+__attribute__((section("uprobe__target_4")))
+__weak int target_4(void)
+{
+   STAP_PROBE1(uretprobe_stack, target, 42);
+   return 42;
+}
+
+extern const void *__start_uprobe__target_4;
+extern const void *__stop_uprobe__target_4;
+
+__attribute__((section("uprobe__target_3")))
+__weak int target_3(void)
+{
+   return target_4();
+}
+
+extern const void *__start_uprobe__target_3;
+extern const void *__stop_uprobe__target_3;
+
+__attribute__((section("uprobe__target_2")))
+__weak int target_2(void)
+{
+   return target_3();
+}
+
+extern const void *__start_uprobe__target_2;
+extern const void *__stop_uprobe__target_2;
+
+__attribute__((section("uprobe__target_1")))
+__weak int target_1(int depth)
+{
+   if (depth < 1)
+   return 1 + target_1(depth + 1);
+   else
+   return target_2();
+}
+
+extern const void *__start_uprobe__target_1;
+extern const void *__stop_uprobe__target_1;
+
+extern const void *__start_uretprobe_stack_sec;
+extern const void *__stop_uretprobe_stack_sec;
+
+struct range {
+   long start;
+   long stop;
+};
+
+static struct range targets[] = {
+   {}, /* we want target_1 to map to target[1], so need 1-based indexing */
+   { (long)&__start_uprobe__target_1, (long)&__stop_uprobe__target_1 },
+   { (long)&__start_uprobe__target_2, (long)&__stop_uprobe__target_2 },
+   { (long)&__start_uprobe__target_3, (long)&__stop_uprobe__target_3 },
+   { (long)&__start_uprobe__target_4, (long)&__stop_uprobe__target_4 },
+};
+
+static struct range caller = {
+   (long)&__start_uretprobe_stack_sec,
+   (long)&__stop_uretprobe_stack_sec,
+};
+
+static void validate_stack(__u64 *ips, int stack_len, int cnt, ...)
+{
+   int i, j;
+   va_list args;
+
+   if (!ASSERT_GT(stack_len, 0, "stack_len"))
+   return;
+
+   stack_len /= 8;
+
+   /* check if we have enough entries to satisfy test expectations */
+   if (!ASSERT_GE(stack_len, cnt, "stack_len2"))
+

[PATCH 3/4] perf,x86: avoid missing caller address in stack traces captured in uprobe

2024-05-08 Thread Andrii Nakryiko
When tracing user functions with the uprobe functionality, it's common to
install the probe (e.g., a BPF program) at the first instruction of the
function. This is often going to be the `push %rbp` instruction in the
function preamble, which means that within that function the frame pointer
hasn't been established yet. This leads to consistently missing the actual
caller of the traced function, because perf_callchain_user() only
records the current IP (capturing the traced function) and then follows the
frame pointer chain (which at that point is the caller's frame, containing
the address of the caller's caller).

So when we have target_1 -> target_2 -> target_3 call chain and we are
tracing an entry to target_3, captured stack trace will report
target_1 -> target_3 call chain, which is wrong and confusing.

This patch proposes an x86-64-specific heuristic to detect the `push %rbp`
instruction being traced. If that's the case, under the assumption that the
application is compiled with frame pointers, this instruction is a strong
indicator that this is the entry to the function. In that case, the
return address is still pointed to by %rsp, so we fetch it and add it to the
stack trace before proceeding to unwind the rest using the frame
pointer-based logic.
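
As a rough user-space illustration of the problem (not part of the patch;
assumes x86-64 and a frame-pointer build, and the function names are made up):

#include <stdio.h>

/* Hypothetical demo, not kernel code: on x86-64, before the prologue's
 * `push %rbp` executes, the caller's return address sits at *%rsp;
 * __builtin_return_address(0) reads that same slot, which is exactly the
 * entry an entry-uprobe stack trace would otherwise miss.
 * Build with: gcc -O0 -fno-omit-frame-pointer demo.c
 */
__attribute__((noinline)) static void traced_func(void)
{
	printf("caller return address: %p\n", __builtin_return_address(0));
}

int main(void)
{
	traced_func();
	return 0;
}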

Signed-off-by: Andrii Nakryiko 
---
 arch/x86/events/core.c  | 20 
 include/linux/uprobes.h |  2 ++
 kernel/events/uprobes.c |  2 ++
 3 files changed, 24 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 5b0dd07b1ef1..82d5570b58ff 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2884,6 +2884,26 @@ perf_callchain_user(struct perf_callchain_entry_ctx 
*entry, struct pt_regs *regs
return;
 
pagefault_disable();
+
+#ifdef CONFIG_UPROBES
+   /*
+* If we are called from uprobe handler, and we are indeed at the very
+* entry to user function (which is normally a `push %rbp` instruction,
+* under assumption of application being compiled with frame pointers),
+* we should read return address from *regs->sp before proceeding
+* to follow frame pointers, otherwise we'll skip immediate caller
+* as %rbp is not yet setup.
+*/
+   if (current->utask) {
+   struct arch_uprobe *auprobe = current->utask->auprobe;
+   u64 ret_addr;
+
+   if (auprobe && auprobe->insn[0] == 0x55 /* push %rbp */ &&
+   !__get_user(ret_addr, (const u64 __user *)regs->sp))
+   perf_callchain_store(entry, ret_addr);
+   }
+#endif
+
while (entry->nr < entry->max_stack) {
if (!valid_user_frame(fp, sizeof(frame)))
break;
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 0c57eec85339..7b785cd30d86 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -76,6 +76,8 @@ struct uprobe_task {
struct uprobe   *active_uprobe;
unsigned long   xol_vaddr;
 
+   struct arch_uprobe  *auprobe;
+
struct return_instance  *return_instances;
	unsigned int		depth;
 };
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 1c99380dc89d..504693845187 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -2072,6 +2072,7 @@ static void handler_chain(struct uprobe *uprobe, struct 
pt_regs *regs)
bool need_prep = false; /* prepare return uprobe, when needed */
 
	down_read(&uprobe->register_rwsem);
+	current->utask->auprobe = &uprobe->arch;
for (uc = uprobe->consumers; uc; uc = uc->next) {
int rc = 0;
 
@@ -2086,6 +2087,7 @@ static void handler_chain(struct uprobe *uprobe, struct 
pt_regs *regs)
 
remove &= rc;
}
+   current->utask->auprobe = NULL;
 
if (need_prep && !remove)
prepare_uretprobe(uprobe, regs); /* put bp at return */
-- 
2.43.0




[PATCH 2/4] perf,uprobes: fix user stack traces in the presence of pending uretprobes

2024-05-08 Thread Andrii Nakryiko
When the kernel has pending uretprobes installed, it hijacks the original
user function's return address on the stack with a uretprobe trampoline
address. There could be multiple such pending uretprobes (either on
different user functions or on the same recursive one) at any given
time within the same task.

This approach interferes with the user stack trace capture logic, which
would report surprising addresses (like 0x7fffe000) that correspond
to a special "[uprobes]" section that the kernel installs in the target
process address space for uretprobe trampoline code, while logically it
should be an address somewhere within the calling function of another
traced user function.

This is easy to correct for, though. The uprobes subsystem keeps track of
pending uretprobes and records the original return addresses. This patch
uses that to do a post-processing step, restoring each trampoline address
entry to the correct original return address. This is done only if there
are pending uretprobes for the current task.

Reported-by: Riham Selim 
Signed-off-by: Andrii Nakryiko 
---
 kernel/events/callchain.c | 42 ++-
 kernel/events/uprobes.c   |  9 +
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index 1273be84392c..2f7ceca7ae3f 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include <linux/uprobes.h>
 
 #include "internal.h"
 
@@ -176,13 +177,50 @@ put_callchain_entry(int rctx)
put_recursion_context(this_cpu_ptr(callchain_recursion), rctx);
 }
 
+static void fixup_uretprobe_trampoline_entries(struct perf_callchain_entry *entry,
+					       int start_entry_idx)
+{
+#ifdef CONFIG_UPROBES
+   struct uprobe_task *utask = current->utask;
+   struct return_instance *ri;
+   __u64 *cur_ip, *last_ip, tramp_addr;
+
+   if (likely(!utask || !utask->return_instances))
+   return;
+
+	cur_ip = &entry->ip[start_entry_idx];
+	last_ip = &entry->ip[entry->nr - 1];
+   ri = utask->return_instances;
+   tramp_addr = uprobe_get_trampoline_vaddr();
+
+   /* If there are pending uretprobes for current thread, they are
+* recorded in a list inside utask->return_instances; each such
+* pending uretprobe replaces traced user function's return address on
+* the stack, so when stack trace is captured, instead of seeing
+* actual function's return address, we'll have one or many uretprobe
+* trampoline addresses in the stack trace, which are not helpful and
+* misleading to users.
+* So here we go over the pending list of uretprobes, and each
+* encountered trampoline address is replaced with actual return
+* address.
+*/
+   while (ri && cur_ip <= last_ip) {
+   if (*cur_ip == tramp_addr) {
+   *cur_ip = ri->orig_ret_vaddr;
+   ri = ri->next;
+   }
+   cur_ip++;
+   }
+#endif
+}
+
 struct perf_callchain_entry *
 get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
   u32 max_stack, bool crosstask, bool add_mark)
 {
struct perf_callchain_entry *entry;
struct perf_callchain_entry_ctx ctx;
-   int rctx;
+   int rctx, start_entry_idx;
 
entry = get_callchain_entry();
if (!entry)
@@ -215,7 +253,9 @@ get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool 
kernel, bool user,
if (add_mark)
				perf_callchain_store_context(&ctx, PERF_CONTEXT_USER);
 
+   start_entry_idx = entry->nr;
			perf_callchain_user(&ctx, regs);
+			fixup_uretprobe_trampoline_entries(entry, start_entry_idx);
}
}
 
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index d60d24f0f2f4..1c99380dc89d 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -2149,6 +2149,15 @@ static void handle_trampoline(struct pt_regs *regs)
 
instruction_pointer_set(regs, ri->orig_ret_vaddr);
do {
+		/* Pop the current instance from the stack of pending return
+		 * instances, as it's not pending anymore: we just fixed up the
+		 * original instruction pointer in regs and are about to call
+		 * handlers. This allows fixup_uretprobe_trampoline_entries()
+		 * to properly fix up captured stack traces from uretprobe
+		 * handlers, in which pending trampoline addresses on the stack
+		 * are replaced with correct original return addresses.
+		 */
+		utask->return_instances = ri->next;
if (valid)

[PATCH 1/4] uprobes: rename get_trampoline_vaddr() and make it global

2024-05-08 Thread Andrii Nakryiko
This helper is needed in another file, so make it a bit more uniquely
named and expose it internally.

Signed-off-by: Andrii Nakryiko 
---
 include/linux/uprobes.h | 1 +
 kernel/events/uprobes.c | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index f46e0ca0169c..0c57eec85339 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -138,6 +138,7 @@ extern bool arch_uretprobe_is_alive(struct return_instance 
*ret, enum rp_check c
 extern bool arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs);
 extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
 void *src, unsigned long len);
+extern unsigned long uprobe_get_trampoline_vaddr(void);
 #else /* !CONFIG_UPROBES */
 struct uprobes_state {
 };
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 8ae0eefc3a34..d60d24f0f2f4 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1827,7 +1827,7 @@ void uprobe_copy_process(struct task_struct *t, unsigned 
long flags)
  *
  * Returns -1 in case the xol_area is not allocated.
  */
-static unsigned long get_trampoline_vaddr(void)
+unsigned long uprobe_get_trampoline_vaddr(void)
 {
struct xol_area *area;
unsigned long trampoline_vaddr = -1;
@@ -1878,7 +1878,7 @@ static void prepare_uretprobe(struct uprobe *uprobe, 
struct pt_regs *regs)
if (!ri)
return;
 
-   trampoline_vaddr = get_trampoline_vaddr();
+   trampoline_vaddr = uprobe_get_trampoline_vaddr();
orig_ret_vaddr = arch_uretprobe_hijack_return_addr(trampoline_vaddr, 
regs);
if (orig_ret_vaddr == -1)
goto fail;
@@ -2187,7 +2187,7 @@ static void handle_swbp(struct pt_regs *regs)
int is_swbp;
 
bp_vaddr = uprobe_get_swbp_addr(regs);
-   if (bp_vaddr == get_trampoline_vaddr())
+   if (bp_vaddr == uprobe_get_trampoline_vaddr())
return handle_trampoline(regs);
 
	uprobe = find_active_uprobe(bp_vaddr, &is_swbp);
-- 
2.43.0




[PATCH 0/4] Fix user stack traces captured from uprobes

2024-05-08 Thread Andrii Nakryiko
This patch set addresses two issues with captured stack traces.

The first issue, fixed in patch #2, deals with fixing up uretprobe trampoline
addresses in the captured stack trace. This issue happens when there are
pending return probes, for which the kernel hijacks some of the return
addresses on user stacks. The code matches those special uretprobe trampoline
addresses against the list of pending return probe instances and replaces
them with the actual return addresses.

The second issue, which patch #3 tries to fix with the help of a heuristic,
has to do with capturing user stack traces in entry uprobes. At the very
entrance to a user function, the frame pointer in the rbp register is not yet
set up, so the actual caller return address is still pointed to by rsp. The
patch uses a simple heuristic, looking for the `push %rbp` instruction, to
fetch this extra direct caller return address before proceeding to unwind the
stack using rbp.

Consider patch #3 an RFC; if there are better suggestions for how this can be
solved, I'd be happy to hear them.

Patch #4 adds tests to the BPF selftests that validate that the captured
stack traces at various points are what we expect to get. This patch, while
living in the BPF selftests, is isolated from any other BPF selftests changes
and can go in through a non-BPF tree without the risk of merge conflicts.

Patches are based on latest linux-trace's probes/for-next branch.

Andrii Nakryiko (4):
  uprobes: rename get_trampoline_vaddr() and make it global
  perf,uprobes: fix user stack traces in the presence of pending
uretprobes
  perf,x86: avoid missing caller address in stack traces captured in
uprobe
  selftests/bpf: add test validating uprobe/uretprobe stack traces

 arch/x86/events/core.c|  20 ++
 include/linux/uprobes.h   |   3 +
 kernel/events/callchain.c |  42 +++-
 kernel/events/uprobes.c   |  17 +-
 .../bpf/prog_tests/uretprobe_stack.c  | 185 ++
 .../selftests/bpf/progs/uretprobe_stack.c |  96 +
 6 files changed, 359 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/uretprobe_stack.c
 create mode 100644 tools/testing/selftests/bpf/progs/uretprobe_stack.c

-- 
2.43.0




Re: [PATCH v9 2/5] remoteproc: k3-m4: Add a remoteproc driver for M4F subsystem

2024-05-08 Thread Andrew Davis

On 5/7/24 3:36 PM, Mathieu Poirier wrote:

On Fri, Apr 26, 2024 at 02:18:08PM -0500, Andrew Davis wrote:

From: Martyn Welch 

The AM62x and AM64x SoCs of the TI K3 family have a Cortex-M4F core in
the MCU domain. This core is typically used for safety applications in a
standalone mode. However, some applications (non-safety-related) may
want to use the M4F core as a generic remote processor with IPC to the
host processor. The M4F core has internal IRAM and DRAM memories, which
are exposed to the system bus for code and data loading.

A remote processor driver is added to support this subsystem, including
being able to load and boot the M4F core. Loading includes the M4F
internal memories and predefined external code/data memories. The
carveouts for external contiguous memory are defined in the M4F device
node and should match the external memory declarations in the M4F
image binary. The M4F subsystem has two resets. One reset is for the
entire subsystem, i.e. including the internal memories; the other, a
local reset, is only for the M4F processing core. When loading the image,
the driver first releases the subsystem reset, loads the firmware image,
and then releases the local reset to let the M4F processing core run.
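
A minimal sketch of that boot order (illustrative only; the helper names
below are hypothetical stand-ins for the TI-SCI reset and firmware-loading
calls, not the driver's actual API):

/* Two-stage reset ordering described above, written as straight-line code. */
static int release_subsystem_reset(void) { return 0; } /* memories become accessible */
static int load_firmware_image(void)     { return 0; } /* IRAM/DRAM + external carveouts */
static int release_local_reset(void)     { return 0; } /* M4F core starts running */

static int m4f_boot_sketch(void)
{
	int ret;

	ret = release_subsystem_reset();
	if (ret)
		return ret;

	ret = load_firmware_image();
	if (ret)
		return ret;

	return release_local_reset();
}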

Signed-off-by: Martyn Welch 
Signed-off-by: Hari Nagalla 
Signed-off-by: Andrew Davis 
---
  drivers/remoteproc/Kconfig   |  13 +
  drivers/remoteproc/Makefile  |   1 +
  drivers/remoteproc/ti_k3_m4_remoteproc.c | 785 +++
  3 files changed, 799 insertions(+)
  create mode 100644 drivers/remoteproc/ti_k3_m4_remoteproc.c

diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
index 48845dc8fa852..1a7c0330c91a9 100644
--- a/drivers/remoteproc/Kconfig
+++ b/drivers/remoteproc/Kconfig
@@ -339,6 +339,19 @@ config TI_K3_DSP_REMOTEPROC
  It's safe to say N here if you're not interested in utilizing
  the DSP slave processors.
  
+config TI_K3_M4_REMOTEPROC

+   tristate "TI K3 M4 remoteproc support"
+   depends on ARCH_K3 || COMPILE_TEST
+   select MAILBOX
+   select OMAP2PLUS_MBOX
+   help
+ Say m here to support TI's M4 remote processor subsystems
+ on various TI K3 family of SoCs through the remote processor
+ framework.
+
+ It's safe to say N here if you're not interested in utilizing
+ a remote processor.
+
  config TI_K3_R5_REMOTEPROC
tristate "TI K3 R5 remoteproc support"
depends on ARCH_K3
diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
index 91314a9b43cef..5ff4e2fee4abd 100644
--- a/drivers/remoteproc/Makefile
+++ b/drivers/remoteproc/Makefile
@@ -37,5 +37,6 @@ obj-$(CONFIG_ST_REMOTEPROC)   += st_remoteproc.o
  obj-$(CONFIG_ST_SLIM_REMOTEPROC)  += st_slim_rproc.o
  obj-$(CONFIG_STM32_RPROC) += stm32_rproc.o
  obj-$(CONFIG_TI_K3_DSP_REMOTEPROC)+= ti_k3_dsp_remoteproc.o
+obj-$(CONFIG_TI_K3_M4_REMOTEPROC)  += ti_k3_m4_remoteproc.o
  obj-$(CONFIG_TI_K3_R5_REMOTEPROC) += ti_k3_r5_remoteproc.o
  obj-$(CONFIG_XLNX_R5_REMOTEPROC)  += xlnx_r5_remoteproc.o
diff --git a/drivers/remoteproc/ti_k3_m4_remoteproc.c 
b/drivers/remoteproc/ti_k3_m4_remoteproc.c
new file mode 100644
index 0..0030e509f6b5d
--- /dev/null
+++ b/drivers/remoteproc/ti_k3_m4_remoteproc.c
@@ -0,0 +1,785 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * TI K3 Cortex-M4 Remote Processor(s) driver
+ *
+ * Copyright (C) 2021-2024 Texas Instruments Incorporated - https://www.ti.com/
+ * Hari Nagalla 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "omap_remoteproc.h"
+#include "remoteproc_internal.h"
+#include "ti_sci_proc.h"
+
+/**
+ * struct k3_m4_rproc_mem - internal memory structure
+ * @cpu_addr: MPU virtual address of the memory region
+ * @bus_addr: Bus address used to access the memory region
+ * @dev_addr: Device address of the memory region from remote processor view
+ * @size: Size of the memory region
+ */
+struct k3_m4_rproc_mem {
+   void __iomem *cpu_addr;
+   phys_addr_t bus_addr;
+   u32 dev_addr;
+   size_t size;
+};
+
+/**
+ * struct k3_m4_rproc_mem_data - memory definitions for a remote processor
+ * @name: name for this memory entry
+ * @dev_addr: device address for the memory entry
+ */
+struct k3_m4_rproc_mem_data {
+   const char *name;
+   const u32 dev_addr;
+};
+
+/**
+ * struct k3_m4_rproc_dev_data - device data structure for a remote processor
+ * @mems: pointer to memory definitions for a remote processor
+ * @num_mems: number of memory regions in @mems
+ * @uses_lreset: flag to denote the need for local reset management
+ */
+struct k3_m4_rproc_dev_data {
+   const struct k3_m4_rproc_mem_data *mems;
+   u32 num_mems;
+   bool uses_lreset;
+};
+
+/**
+ * struct k3_m4_rproc - k3 remote processor driver structure
+ * @dev: cached device pointer
+ * @rproc: remoteproc device 

[PATCH v4 4/4] arm64: dts: qcom: msm8976: Add WCNSS node

2024-05-08 Thread Adam Skladowski
Add a node describing the wireless connectivity subsystem.

Signed-off-by: Adam Skladowski 
---
 arch/arm64/boot/dts/qcom/msm8976.dtsi | 105 ++
 1 file changed, 105 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/msm8976.dtsi 
b/arch/arm64/boot/dts/qcom/msm8976.dtsi
index 22a6a09a904d..a7f772485bf5 100644
--- a/arch/arm64/boot/dts/qcom/msm8976.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8976.dtsi
@@ -771,6 +771,36 @@ blsp2_i2c4_sleep: blsp2-i2c4-sleep-state {
drive-strength = <2>;
bias-disable;
};
+
+   wcss_wlan_default: wcss-wlan-default-state  {
+   wcss-wlan2-pins {
+   pins = "gpio40";
+   function = "wcss_wlan2";
+   drive-strength = <6>;
+   bias-pull-up;
+   };
+
+   wcss-wlan1-pins {
+   pins = "gpio41";
+   function = "wcss_wlan1";
+   drive-strength = <6>;
+   bias-pull-up;
+   };
+
+   wcss-wlan0-pins {
+   pins = "gpio42";
+   function = "wcss_wlan0";
+   drive-strength = <6>;
+   bias-pull-up;
+   };
+
+   wcss-wlan-pins {
+   pins = "gpio43", "gpio44";
+   function = "wcss_wlan";
+   drive-strength = <6>;
+   bias-pull-up;
+   };
+   };
};
 
gcc: clock-controller@180 {
@@ -1458,6 +1488,81 @@ blsp2_i2c4: i2c@7af8000 {
status = "disabled";
};
 
+   wcnss: remoteproc@a204000 {
+   compatible = "qcom,pronto-v3-pil", "qcom,pronto";
+   reg = <0x0a204000 0x2000>,
+ <0x0a202000 0x1000>,
+ <0x0a21b000 0x3000>;
+   reg-names = "ccu",
+   "dxe",
+   "pmu";
+
+			memory-region = <&wcnss_fw_mem>;
+
+			interrupts-extended = <&intc GIC_SPI 149 IRQ_TYPE_EDGE_RISING>,
+					      <&wcnss_smp2p_in 0 IRQ_TYPE_EDGE_RISING>,
+					      <&wcnss_smp2p_in 1 IRQ_TYPE_EDGE_RISING>,
+					      <&wcnss_smp2p_in 2 IRQ_TYPE_EDGE_RISING>,
+					      <&wcnss_smp2p_in 3 IRQ_TYPE_EDGE_RISING>;
+   interrupt-names = "wdog",
+ "fatal",
+ "ready",
+ "handover",
+ "stop-ack";
+
+			power-domains = <&rpmpd MSM8976_VDDCX>,
+					<&rpmpd MSM8976_VDDMX>;
+   power-domain-names = "cx", "mx";
+
+			qcom,smem-states = <&wcnss_smp2p_out 0>;
+   qcom,smem-state-names = "stop";
+
+			pinctrl-0 = <&wcss_wlan_default>;
+   pinctrl-names = "default";
+
+   status = "disabled";
+
+   wcnss_iris: iris {
+   /* Separate chip, compatible is board-specific 
*/
+				clocks = <&rpmcc RPM_SMD_RF_CLK2>;
+   clock-names = "xo";
+   };
+
+   smd-edge {
+   interrupts = ;
+
+				mboxes = <&apcs 17>;
+   qcom,smd-edge = <6>;
+   qcom,remote-pid = <4>;
+
+   label = "pronto";
+
+   wcnss_ctrl: wcnss {
+   compatible = "qcom,wcnss";
+   qcom,smd-channels = "WCNSS_CTRL";
+
+					qcom,mmio = <&pronto>;
+
+   wcnss_bt: bluetooth {
+   compatible = "qcom,wcnss-bt";
+   };
+
+   wcnss_wifi: wifi {
+   compatible = "qcom,wcnss-wlan";
+
+   interrupts = ,
+  

[PATCH v4 3/4] arm64: dts: qcom: msm8976: Add Adreno GPU

2024-05-08 Thread Adam Skladowski
Add Adreno GPU node.

Signed-off-by: Adam Skladowski 
---
 arch/arm64/boot/dts/qcom/msm8976.dtsi | 71 +++
 1 file changed, 71 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/msm8976.dtsi 
b/arch/arm64/boot/dts/qcom/msm8976.dtsi
index b26c35796928..22a6a09a904d 100644
--- a/arch/arm64/boot/dts/qcom/msm8976.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8976.dtsi
@@ -1080,6 +1080,77 @@ mdss_dsi1_phy: phy@1a96a00 {
};
};
 
+   adreno_gpu: gpu@1c0 {
+   compatible = "qcom,adreno-510.0", "qcom,adreno";
+
+   reg = <0x01c0 0x4>;
+   reg-names = "kgsl_3d0_reg_memory";
+
+   interrupts = ;
+   interrupt-names = "kgsl_3d0_irq";
+
+			clocks = <&gcc GCC_GFX3D_OXILI_CLK>,
+				 <&gcc GCC_GFX3D_OXILI_AHB_CLK>,
+				 <&gcc GCC_GFX3D_OXILI_GMEM_CLK>,
+				 <&gcc GCC_GFX3D_BIMC_CLK>,
+				 <&gcc GCC_GFX3D_OXILI_TIMER_CLK>,
+				 <&gcc GCC_GFX3D_OXILI_AON_CLK>;
+   clock-names = "core",
+ "iface",
+ "mem",
+ "mem_iface",
+ "rbbmtimer",
+ "alwayson";
+
+			power-domains = <&gcc OXILI_GX_GDSC>;
+
+			iommus = <&gpu_iommu 0>;
+
+			operating-points-v2 = <&gpu_opp_table>;
+
+   status = "disabled";
+
+			gpu_opp_table: opp-table {
+				compatible = "operating-points-v2";
+
+				opp-200000000 {
+					opp-hz = /bits/ 64 <200000000>;
+					required-opps = <&rpmpd_opp_low_svs>;
+					opp-supported-hw = <0xff>;
+				};
+
+				opp-300000000 {
+					opp-hz = /bits/ 64 <300000000>;
+					required-opps = <&rpmpd_opp_svs>;
+					opp-supported-hw = <0xff>;
+				};
+
+				opp-400000000 {
+					opp-hz = /bits/ 64 <400000000>;
+					required-opps = <&rpmpd_opp_nom>;
+					opp-supported-hw = <0xff>;
+				};
+
+				opp-480000000 {
+					opp-hz = /bits/ 64 <480000000>;
+					required-opps = <&rpmpd_opp_nom_plus>;
+					opp-supported-hw = <0xff>;
+				};
+
+				opp-540000000 {
+					opp-hz = /bits/ 64 <540000000>;
+					required-opps = <&rpmpd_opp_turbo>;
+					opp-supported-hw = <0xff>;
+				};
+
+				opp-600000000 {
+					opp-hz = /bits/ 64 <600000000>;
+					required-opps = <&rpmpd_opp_turbo>;
+					opp-supported-hw = <0xff>;
+				};
+			};
+   };
+
	apps_iommu: iommu@1ee0000 {
		compatible = "qcom,msm8976-iommu", "qcom,msm-iommu-v2";
		reg = <0x01ee0000 0x3000>;
-- 
2.44.0




[PATCH v4 2/4] arm64: dts: qcom: msm8976: Add MDSS nodes

2024-05-08 Thread Adam Skladowski
Add MDSS nodes to support displays on MSM8976 SoC.

Signed-off-by: Adam Skladowski 
---
 arch/arm64/boot/dts/qcom/msm8976.dtsi | 280 +-
 1 file changed, 276 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/msm8976.dtsi 
b/arch/arm64/boot/dts/qcom/msm8976.dtsi
index 8bdcc1438177..b26c35796928 100644
--- a/arch/arm64/boot/dts/qcom/msm8976.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8976.dtsi
@@ -785,10 +785,10 @@ gcc: clock-controller@180 {
 
clocks = < RPM_SMD_XO_CLK_SRC>,
 < RPM_SMD_XO_A_CLK_SRC>,
-<0>,
-<0>,
-<0>,
-<0>;
+			 <&mdss_dsi0_phy 1>,
+			 <&mdss_dsi0_phy 0>,
+			 <&mdss_dsi1_phy 1>,
+			 <&mdss_dsi1_phy 0>;
clock-names = "xo",
  "xo_a",
  "dsi0pll",
@@ -808,6 +808,278 @@ tcsr: syscon@1937000 {
reg = <0x01937000 0x3>;
};
 
+   mdss: display-subsystem@1a0 {
+   compatible = "qcom,mdss";
+
+   reg = <0x01a0 0x1000>,
+ <0x01ab 0x3000>;
+   reg-names = "mdss_phys", "vbif_phys";
+
+			power-domains = <&gcc MDSS_GDSC>;
+   interrupts = ;
+
+   interrupt-controller;
+   #interrupt-cells = <1>;
+
+			clocks = <&gcc GCC_MDSS_AHB_CLK>,
+				 <&gcc GCC_MDSS_AXI_CLK>,
+				 <&gcc GCC_MDSS_VSYNC_CLK>,
+				 <&gcc GCC_MDSS_MDP_CLK>;
+   clock-names = "iface",
+ "bus",
+ "vsync",
+ "core";
+
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges;
+
+   status = "disabled";
+
+   mdss_mdp: display-controller@1a01000 {
+   compatible = "qcom,msm8976-mdp5", "qcom,mdp5";
+   reg = <0x01a01000 0x89000>;
+   reg-names = "mdp_phys";
+
+				interrupt-parent = <&mdss>;
+   interrupts = <0>;
+
+				clocks = <&gcc GCC_MDSS_AHB_CLK>,
+					 <&gcc GCC_MDSS_AXI_CLK>,
+					 <&gcc GCC_MDSS_MDP_CLK>,
+					 <&gcc GCC_MDSS_VSYNC_CLK>,
+					 <&gcc GCC_MDP_TBU_CLK>,
+					 <&gcc GCC_MDP_RT_TBU_CLK>;
+   clock-names = "iface",
+ "bus",
+ "core",
+ "vsync",
+ "tbu",
+ "tbu_rt";
+
+				operating-points-v2 = <&mdp_opp_table>;
+				power-domains = <&gcc MDSS_GDSC>;
+
+				iommus = <&apps_iommu 22>;
+
+   ports {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   port@0 {
+   reg = <0>;
+
+   mdss_mdp5_intf1_out: endpoint {
+						mdss_mdp5_intf1_out: endpoint {
+							remote-endpoint = <&mdss_dsi0_in>;
+   };
+   };
+
+   port@1 {
+   reg = <1>;
+
+   mdss_mdp5_intf2_out: endpoint {
+						mdss_mdp5_intf2_out: endpoint {
+							remote-endpoint = <&mdss_dsi1_in>;
+   };
+   };
+   };
+
+   mdp_opp_table: opp-table {
+   compatible = "operating-points-v2";
+
+					opp-177780000 {
+						opp-hz = /bits/ 64 <177780000>;
+						required-opps = <&rpmpd_opp_svs>;
+					};
+
+					opp-270000000 {
+						opp-hz = /bits/ 64 <270000000>;
+

[PATCH v4 1/4] arm64: dts: qcom: msm8976: Add IOMMU nodes

2024-05-08 Thread Adam Skladowski
Add the nodes describing the apps and GPU IOMMUs and their context banks
found on MSM8976 SoCs.

Signed-off-by: Adam Skladowski 
---
 arch/arm64/boot/dts/qcom/msm8976.dtsi | 81 +++
 1 file changed, 81 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/msm8976.dtsi 
b/arch/arm64/boot/dts/qcom/msm8976.dtsi
index d2bb1ada361a..8bdcc1438177 100644
--- a/arch/arm64/boot/dts/qcom/msm8976.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8976.dtsi
@@ -808,6 +808,87 @@ tcsr: syscon@1937000 {
reg = <0x01937000 0x3>;
};
 
+		apps_iommu: iommu@1ee0000 {
+			compatible = "qcom,msm8976-iommu", "qcom,msm-iommu-v2";
+			reg = <0x01ee0000 0x3000>;
+			ranges = <0 0x01e20000 0x20000>;
+
+			clocks = <&gcc GCC_SMMU_CFG_CLK>,
+				 <&gcc GCC_APSS_TCU_CLK>;
+   clock-names = "iface", "bus";
+
+   qcom,iommu-secure-id = <17>;
+
+   #address-cells = <1>;
+   #size-cells = <1>;
+   #iommu-cells = <1>;
+
+   /* VFE */
+   iommu-ctx@15000 {
+   compatible = "qcom,msm-iommu-v2-ns";
+   reg = <0x15000 0x1000>;
+   qcom,ctx-asid = <20>;
+   interrupts = ;
+   };
+
+   /* VENUS NS */
+   iommu-ctx@16000 {
+   compatible = "qcom,msm-iommu-v2-ns";
+   reg = <0x16000 0x1000>;
+   qcom,ctx-asid = <21>;
+   interrupts = ;
+   };
+
+   /* MDP0 */
+   iommu-ctx@17000 {
+   compatible = "qcom,msm-iommu-v2-ns";
+   reg = <0x17000 0x1000>;
+   qcom,ctx-asid = <22>;
+   interrupts = ;
+   };
+   };
+
+   gpu_iommu: iommu@1f08000 {
+   compatible = "qcom,msm8976-iommu", "qcom,msm-iommu-v2";
+   ranges = <0 0x01f08000 0x8000>;
+
+			clocks = <&gcc GCC_SMMU_CFG_CLK>,
+				 <&gcc GCC_GFX3D_TCU_CLK>;
+   clock-names = "iface", "bus";
+
+			power-domains = <&gcc OXILI_CX_GDSC>;
+
+   qcom,iommu-secure-id = <18>;
+
+   #address-cells = <1>;
+   #size-cells = <1>;
+   #iommu-cells = <1>;
+
+   /* gfx3d user */
+   iommu-ctx@0 {
+   compatible = "qcom,msm-iommu-v2-ns";
+   reg = <0x0 0x1000>;
+   qcom,ctx-asid = <0>;
+   interrupts = ;
+   };
+
+   /* gfx3d secure */
+   iommu-ctx@1000 {
+   compatible = "qcom,msm-iommu-v2-sec";
+   reg = <0x1000 0x1000>;
+   qcom,ctx-asid = <2>;
+   interrupts = ;
+   };
+
+   /* gfx3d priv */
+   iommu-ctx@2000 {
+   compatible = "qcom,msm-iommu-v2-sec";
+   reg = <0x2000 0x1000>;
+   qcom,ctx-asid = <1>;
+   interrupts = ;
+   };
+   };
+
spmi_bus: spmi@200f000 {
compatible = "qcom,spmi-pmic-arb";
reg = <0x0200f000 0x1000>,
-- 
2.44.0




[PATCH v4 0/4] MSM8976 MDSS/GPU/WCNSS support

2024-05-08 Thread Adam Skladowski
This patch series provides support for the display subsystem and GPU,
and also adds wireless connectivity subsystem support.

Changes since v3

1. Minor styling fixes
2. Converted qcom,ipc into mailbox on wcnss patch

Changes since v2

1. Disabled mdss_dsi nodes by default
2. Changed reg size of mdss_dsi0 to be equal on both
3. Added operating points to second mdss_dsi
4. Brought back required opp-supported-hw on adreno
5. Moved status under operating points on adreno

Changes since v1

1. Addressed feedback
2. Dropped already applied dt-bindings patches
3. Dropped sdc patch as it was submitted as part of other series
4. Dropped dt-bindings patch for Adreno, also separate now

Adam Skladowski (4):
  arm64: dts: qcom: msm8976: Add IOMMU nodes
  arm64: dts: qcom: msm8976: Add MDSS nodes
  arm64: dts: qcom: msm8976: Add Adreno GPU
  arm64: dts: qcom: msm8976: Add WCNSS node

 arch/arm64/boot/dts/qcom/msm8976.dtsi | 537 +-
 1 file changed, 533 insertions(+), 4 deletions(-)

-- 
2.44.0




Re: [PATCH v9 2/5] remoteproc: k3-m4: Add a remoteproc driver for M4F subsystem

2024-05-08 Thread Andrew Davis

On 5/6/24 3:46 PM, Mathieu Poirier wrote:

Good day,

I have started reviewing this patchset.  Comments will be scattered over
multiple days and as such, I will explicitly inform you when I am done with
the review.

On Fri, Apr 26, 2024 at 02:18:08PM -0500, Andrew Davis wrote:

From: Martyn Welch 

The AM62x and AM64x SoCs of the TI K3 family have a Cortex-M4F core in
the MCU domain. This core is typically used for safety applications in a
standalone mode. However, some applications (non-safety-related) may
want to use the M4F core as a generic remote processor with IPC to the
host processor. The M4F core has internal IRAM and DRAM memories, which
are exposed to the system bus for code and data loading.

A remote processor driver is added to support this subsystem, including
being able to load and boot the M4F core. Loading includes the M4F
internal memories and predefined external code/data memories. The
carveouts for external contiguous memory are defined in the M4F device
node and should match the external memory declarations in the M4F
image binary. The M4F subsystem has two resets. One reset is for the
entire subsystem, i.e. including the internal memories; the other, a
local reset, is only for the M4F processing core. When loading the image,
the driver first releases the subsystem reset, loads the firmware image,
and then releases the local reset to let the M4F processing core run.

Signed-off-by: Martyn Welch 
Signed-off-by: Hari Nagalla 
Signed-off-by: Andrew Davis 
---
  drivers/remoteproc/Kconfig   |  13 +
  drivers/remoteproc/Makefile  |   1 +
  drivers/remoteproc/ti_k3_m4_remoteproc.c | 785 +++
  3 files changed, 799 insertions(+)
  create mode 100644 drivers/remoteproc/ti_k3_m4_remoteproc.c

diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
index 48845dc8fa852..1a7c0330c91a9 100644
--- a/drivers/remoteproc/Kconfig
+++ b/drivers/remoteproc/Kconfig
@@ -339,6 +339,19 @@ config TI_K3_DSP_REMOTEPROC
  It's safe to say N here if you're not interested in utilizing
  the DSP slave processors.
  
+config TI_K3_M4_REMOTEPROC

+   tristate "TI K3 M4 remoteproc support"
+   depends on ARCH_K3 || COMPILE_TEST
+   select MAILBOX
+   select OMAP2PLUS_MBOX
+   help
+ Say m here to support TI's M4 remote processor subsystems
+ on various TI K3 family of SoCs through the remote processor
+ framework.
+
+ It's safe to say N here if you're not interested in utilizing
+ a remote processor.
+
  config TI_K3_R5_REMOTEPROC
tristate "TI K3 R5 remoteproc support"
depends on ARCH_K3
diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
index 91314a9b43cef..5ff4e2fee4abd 100644
--- a/drivers/remoteproc/Makefile
+++ b/drivers/remoteproc/Makefile
@@ -37,5 +37,6 @@ obj-$(CONFIG_ST_REMOTEPROC)   += st_remoteproc.o
  obj-$(CONFIG_ST_SLIM_REMOTEPROC)  += st_slim_rproc.o
  obj-$(CONFIG_STM32_RPROC) += stm32_rproc.o
  obj-$(CONFIG_TI_K3_DSP_REMOTEPROC)+= ti_k3_dsp_remoteproc.o
+obj-$(CONFIG_TI_K3_M4_REMOTEPROC)  += ti_k3_m4_remoteproc.o
  obj-$(CONFIG_TI_K3_R5_REMOTEPROC) += ti_k3_r5_remoteproc.o
  obj-$(CONFIG_XLNX_R5_REMOTEPROC)  += xlnx_r5_remoteproc.o
diff --git a/drivers/remoteproc/ti_k3_m4_remoteproc.c 
b/drivers/remoteproc/ti_k3_m4_remoteproc.c
new file mode 100644
index 0..0030e509f6b5d
--- /dev/null
+++ b/drivers/remoteproc/ti_k3_m4_remoteproc.c
@@ -0,0 +1,785 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * TI K3 Cortex-M4 Remote Processor(s) driver
+ *
+ * Copyright (C) 2021-2024 Texas Instruments Incorporated - https://www.ti.com/
+ * Hari Nagalla 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "omap_remoteproc.h"
+#include "remoteproc_internal.h"
+#include "ti_sci_proc.h"
+
+/**
+ * struct k3_m4_rproc_mem - internal memory structure
+ * @cpu_addr: MPU virtual address of the memory region
+ * @bus_addr: Bus address used to access the memory region
+ * @dev_addr: Device address of the memory region from remote processor view
+ * @size: Size of the memory region
+ */
+struct k3_m4_rproc_mem {
+   void __iomem *cpu_addr;
+   phys_addr_t bus_addr;
+   u32 dev_addr;
+   size_t size;
+};
+
+/**
+ * struct k3_m4_rproc_mem_data - memory definitions for a remote processor
+ * @name: name for this memory entry
+ * @dev_addr: device address for the memory entry
+ */
+struct k3_m4_rproc_mem_data {
+   const char *name;
+   const u32 dev_addr;
+};
+
+/**
+ * struct k3_m4_rproc_dev_data - device data structure for a remote processor
+ * @mems: pointer to memory definitions for a remote processor
+ * @num_mems: number of memory regions in @mems
+ * @uses_lreset: flag to denote the need for local reset management
+ */
+struct k3_m4_rproc_dev_data {
+   const struct k3_m4_rproc_mem_data *mems;
+   u32 

[PATCH net-next v3 07/13] mm: page_frag: avoid caller accessing 'page_frag_cache' directly

2024-05-08 Thread Yunsheng Lin
Use the appropriate page_frag API instead of callers accessing
'page_frag_cache' directly.
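
For illustration, a hedged sketch of what a caller looks like with the new
helpers (the surrounding functions are made up; only page_frag_cache_init(),
page_frag_alloc_va() and page_frag_cache_is_pfmemalloc() come from this
series):

static struct page_frag_cache example_cache;

static int example_setup(void)
{
	page_frag_cache_init(&example_cache);	/* was: example_cache.va = NULL */
	return 0;
}

static void *example_alloc(bool *pfmemalloc)
{
	void *data = page_frag_alloc_va(&example_cache, 256, GFP_ATOMIC);

	if (data)	/* was: a direct read of example_cache.pfmemalloc */
		*pfmemalloc = page_frag_cache_is_pfmemalloc(&example_cache);
	return data;
}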

CC: Alexander Duyck 
Signed-off-by: Yunsheng Lin 
---
 drivers/vhost/net.c |  2 +-
 include/linux/page_frag_cache.h | 10 ++
 mm/page_frag_test.c |  2 +-
 net/core/skbuff.c   |  6 +++---
 net/rxrpc/conn_object.c |  4 +---
 net/rxrpc/local_object.c|  4 +---
 net/sunrpc/svcsock.c|  6 ++
 7 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 6691fac01e0d..b2737dc0dc50 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1325,7 +1325,7 @@ static int vhost_net_open(struct inode *inode, struct 
file *f)
vqs[VHOST_NET_VQ_RX]);
 
f->private_data = n;
-   n->pf_cache.va = NULL;
+	page_frag_cache_init(&n->pf_cache);
 
return 0;
 }
diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
index a5747cf7a3a1..024ff73a7ea4 100644
--- a/include/linux/page_frag_cache.h
+++ b/include/linux/page_frag_cache.h
@@ -23,6 +23,16 @@ struct page_frag_cache {
bool pfmemalloc;
 };
 
+static inline void page_frag_cache_init(struct page_frag_cache *nc)
+{
+   nc->va = NULL;
+}
+
+static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc)
+{
+   return !!nc->pfmemalloc;
+}
+
 void page_frag_cache_drain(struct page_frag_cache *nc);
 void __page_frag_cache_drain(struct page *page, unsigned int count);
 void *__page_frag_alloc_va_align(struct page_frag_cache *nc,
diff --git a/mm/page_frag_test.c b/mm/page_frag_test.c
index 92eb288aab75..8a974d0588bf 100644
--- a/mm/page_frag_test.c
+++ b/mm/page_frag_test.c
@@ -329,7 +329,7 @@ static int __init page_frag_test_init(void)
u64 duration;
int ret;
 
-   test_frag.va = NULL;
+	page_frag_cache_init(&test_frag);
 	atomic_set(&nthreads, 2);
 	init_completion(&wait);
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index dca4e7445348..caee22db1cc7 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -741,12 +741,12 @@ struct sk_buff *__netdev_alloc_skb(struct net_device 
*dev, unsigned int len,
if (in_hardirq() || irqs_disabled()) {
		nc = this_cpu_ptr(&netdev_alloc_cache);
data = page_frag_alloc_va(nc, len, gfp_mask);
-   pfmemalloc = nc->pfmemalloc;
+   pfmemalloc = page_frag_cache_is_pfmemalloc(nc);
} else {
local_bh_disable();
		nc = this_cpu_ptr(&napi_alloc_cache.page);
data = page_frag_alloc_va(nc, len, gfp_mask);
-   pfmemalloc = nc->pfmemalloc;
+   pfmemalloc = page_frag_cache_is_pfmemalloc(nc);
local_bh_enable();
}
 
@@ -834,7 +834,7 @@ struct sk_buff *napi_alloc_skb(struct napi_struct *napi, 
unsigned int len)
len = SKB_HEAD_ALIGN(len);
 
		data = page_frag_alloc_va(&nc->page, len, gfp_mask);
-		pfmemalloc = nc->page.pfmemalloc;
+		pfmemalloc = page_frag_cache_is_pfmemalloc(&nc->page);
}
 
if (unlikely(!data))
diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c
index 1539d315afe7..694c4df7a1a3 100644
--- a/net/rxrpc/conn_object.c
+++ b/net/rxrpc/conn_object.c
@@ -337,9 +337,7 @@ static void rxrpc_clean_up_connection(struct work_struct 
*work)
 */
	rxrpc_purge_queue(&conn->rx_queue);
 
-   if (conn->tx_data_alloc.va)
-   __page_frag_cache_drain(virt_to_page(conn->tx_data_alloc.va),
-   conn->tx_data_alloc.pagecnt_bias);
+	page_frag_cache_drain(&conn->tx_data_alloc);
 	call_rcu(&conn->rcu, rxrpc_rcu_free_connection);
 }
 
diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c
index 504453c688d7..a8cffe47cf01 100644
--- a/net/rxrpc/local_object.c
+++ b/net/rxrpc/local_object.c
@@ -452,9 +452,7 @@ void rxrpc_destroy_local(struct rxrpc_local *local)
 #endif
	rxrpc_purge_queue(&local->rx_queue);
rxrpc_purge_client_connections(local);
-   if (local->tx_alloc.va)
-   __page_frag_cache_drain(virt_to_page(local->tx_alloc.va),
-   local->tx_alloc.pagecnt_bias);
+	page_frag_cache_drain(&local->tx_alloc);
 }
 
 /*
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 42d20412c1c3..4b1e87187614 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1609,7 +1609,6 @@ static void svc_tcp_sock_detach(struct svc_xprt *xprt)
 static void svc_sock_free(struct svc_xprt *xprt)
 {
struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
-	struct page_frag_cache *pfc = &svsk->sk_frag_cache;
struct socket *sock = svsk->sk_sock;
 
trace_svcsock_free(svsk, sock);
@@ -1619,8 +1618,7 @@ static void svc_sock_free(struct svc_xprt *xprt)
sockfd_put(sock);
else
sock_release(sock);
-   if (pfc->va)
-   

[PATCH net-next v3 06/13] mm: page_frag: add '_va' suffix to page_frag API

2024-05-08 Thread Yunsheng Lin
Currently the page_frag API returns a 'virtual address' or 'va' when
allocating and expects a 'virtual address' or 'va' as input when
freeing.

As we are about to support new use cases where the caller needs to deal
with 'struct page', or with both 'va' and 'struct page', add a '_va'
suffix to the corresponding API to differentiate the handling of 'va'
vs. 'struct page', mirroring the page_pool_alloc_va() API of the
page_pool. That way, callers expecting to deal with va, page, or both
va and page may call the page_frag_alloc_va*, page_frag_alloc_pg*, or
page_frag_alloc* APIs accordingly.
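
A small hedged example of the naming convention described above
(illustrative only; the caller and the size are made up, and only the va
flavor from this patch is shown):

static void example(struct page_frag_cache *nc)
{
	/* was: page_frag_alloc()/page_frag_free() before this patch */
	void *buf = page_frag_alloc_va(nc, 512, GFP_KERNEL);

	if (buf)
		page_frag_free_va(buf);
}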

CC: Alexander Duyck 
Signed-off-by: Yunsheng Lin 
---
 drivers/net/ethernet/google/gve/gve_rx.c  |  4 ++--
 drivers/net/ethernet/intel/ice/ice_txrx.c |  2 +-
 drivers/net/ethernet/intel/ice/ice_txrx.h |  2 +-
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c |  2 +-
 .../net/ethernet/intel/ixgbevf/ixgbevf_main.c |  4 ++--
 .../marvell/octeontx2/nic/otx2_common.c   |  2 +-
 drivers/net/ethernet/mediatek/mtk_wed_wo.c|  4 ++--
 drivers/nvme/host/tcp.c   |  8 +++
 drivers/nvme/target/tcp.c | 22 +--
 drivers/vhost/net.c   |  6 ++---
 include/linux/page_frag_cache.h   | 21 +-
 include/linux/skbuff.h|  2 +-
 kernel/bpf/cpumap.c   |  2 +-
 mm/page_frag_cache.c  | 12 +-
 mm/page_frag_test.c   | 11 +-
 net/core/skbuff.c | 18 +++
 net/core/xdp.c|  2 +-
 net/rxrpc/txbuf.c | 15 +++--
 net/sunrpc/svcsock.c  |  6 ++---
 19 files changed, 74 insertions(+), 71 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve_rx.c 
b/drivers/net/ethernet/google/gve/gve_rx.c
index acb73d4d0de6..b6c10100e462 100644
--- a/drivers/net/ethernet/google/gve/gve_rx.c
+++ b/drivers/net/ethernet/google/gve/gve_rx.c
@@ -729,7 +729,7 @@ static int gve_xdp_redirect(struct net_device *dev, struct 
gve_rx_ring *rx,
 
total_len = headroom + SKB_DATA_ALIGN(len) +
SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
-	frame = page_frag_alloc(&rx->page_cache, total_len, GFP_ATOMIC);
+	frame = page_frag_alloc_va(&rx->page_cache, total_len, GFP_ATOMIC);
if (!frame) {
u64_stats_update_begin(>statss);
rx->xdp_alloc_fails++;
@@ -742,7 +742,7 @@ static int gve_xdp_redirect(struct net_device *dev, struct 
gve_rx_ring *rx,
 
err = xdp_do_redirect(dev, , xdp_prog);
if (err)
-   page_frag_free(frame);
+   page_frag_free_va(frame);
 
return err;
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c 
b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 8bb743f78fcb..399b317c509d 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -126,7 +126,7 @@ ice_unmap_and_free_tx_buf(struct ice_tx_ring *ring, struct 
ice_tx_buf *tx_buf)
dev_kfree_skb_any(tx_buf->skb);
break;
case ICE_TX_BUF_XDP_TX:
-   page_frag_free(tx_buf->raw_buf);
+   page_frag_free_va(tx_buf->raw_buf);
break;
case ICE_TX_BUF_XDP_XMIT:
xdp_return_frame(tx_buf->xdpf);
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h 
b/drivers/net/ethernet/intel/ice/ice_txrx.h
index feba314a3fe4..6379f57d8228 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -148,7 +148,7 @@ static inline int ice_skb_pad(void)
  * @ICE_TX_BUF_DUMMY: dummy Flow Director packet, unmap and kfree()
  * @ICE_TX_BUF_FRAG: mapped skb OR _buff frag, only unmap DMA
  * @ICE_TX_BUF_SKB: _buff, unmap and consume_skb(), update stats
- * @ICE_TX_BUF_XDP_TX: _buff, unmap and page_frag_free(), stats
+ * @ICE_TX_BUF_XDP_TX: _buff, unmap and page_frag_free_va(), stats
  * @ICE_TX_BUF_XDP_XMIT: _frame, unmap and xdp_return_frame(), stats
  * @ICE_TX_BUF_XSK_TX: _buff on XSk queue, xsk_buff_free(), stats
  */
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c 
b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 2719f0e20933..a1a41a14df0d 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -250,7 +250,7 @@ ice_clean_xdp_tx_buf(struct device *dev, struct ice_tx_buf 
*tx_buf,
 
switch (tx_buf->type) {
case ICE_TX_BUF_XDP_TX:
-   page_frag_free(tx_buf->raw_buf);
+   page_frag_free_va(tx_buf->raw_buf);
break;
case ICE_TX_BUF_XDP_XMIT:
xdp_return_frame_bulk(tx_buf->xdpf, bq);
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c 
b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 

Re: [PATCH v4 1/2] ipvs: add READ_ONCE barrier for ipvs->sysctl_amemthresh

2024-05-08 Thread patchwork-bot+netdevbpf
Hello:

This series was applied to netdev/net-next.git (main)
by David S. Miller :

On Mon,  6 May 2024 16:14:43 +0200 you wrote:
> Cc: Julian Anastasov 
> Cc: Simon Horman 
> Cc: Pablo Neira Ayuso 
> Cc: Jozsef Kadlecsik 
> Cc: Florian Westphal 
> Suggested-by: Julian Anastasov 
> Signed-off-by: Alexander Mikhalitsyn 
> 
> [...]

Here is the summary with links:
  - [v4,1/2] ipvs: add READ_ONCE barrier for ipvs->sysctl_amemthresh
https://git.kernel.org/netdev/net-next/c/643bb5dbaef7
  - [v4,2/2] ipvs: allow some sysctls in non-init user namespaces
https://git.kernel.org/netdev/net-next/c/2b696a2a101d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html





[PATCH] clk: qcom: gcc-sm6350: Fix gpll6* & gpll7 parents

2024-05-08 Thread Luca Weiss
Both gpll6 and gpll7 are parented to CXO at 19.2 MHz and not to GPLL0
which runs at 600 MHz. Also gpll6_out_even should have the parent gpll6
and not gpll0.

Adjust the parents of these clocks to make Linux report the correct rate
and not absurd numbers like gpll7 at ~25 GHz or gpll6 at 24 GHz.

Corrected rates are the following:

  gpll7           807021875 Hz
  gpll6           768000000 Hz
  gpll6_out_even  384000000 Hz
  gpll0           600000000 Hz
  gpll0_out_odd   200000000 Hz
  gpll0_out_even  300000000 Hz
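
As a rough cross-check (approximate figures, assuming the PLL multipliers
themselves are unchanged): gpll6's multiplier is 768 MHz / 19.2 MHz = 40, so
with the wrong 600 MHz GPLL0 parent it was reported as 40 * 600 MHz = 24 GHz;
gpll7's ~42x multiplier likewise produced the ~25 GHz figure mentioned above
instead of ~807 MHz.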

And because gpll6 is the parent of gcc_sdcc2_apps_clk_src (at 202 MHz)
that clock also reports the correct rate now and avoids this warning:

  [5.984062] mmc0: Card appears overclocked; req 202000000 Hz, actual 6312499237 Hz

Fixes: 131abae905df ("clk: qcom: Add SM6350 GCC driver")
Signed-off-by: Luca Weiss 
---
 drivers/clk/qcom/gcc-sm6350.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/clk/qcom/gcc-sm6350.c b/drivers/clk/qcom/gcc-sm6350.c
index cf4a7b6e0b23..0559a33faf00 100644
--- a/drivers/clk/qcom/gcc-sm6350.c
+++ b/drivers/clk/qcom/gcc-sm6350.c
@@ -100,8 +100,8 @@ static struct clk_alpha_pll gpll6 = {
.enable_mask = BIT(6),
.hw.init = &(struct clk_init_data){
.name = "gpll6",
-		.parent_hws = (const struct clk_hw*[]){
-			&gpll0.clkr.hw,
+   .parent_data = &(const struct clk_parent_data){
+   .fw_name = "bi_tcxo",
},
.num_parents = 1,
.ops = _alpha_pll_fixed_fabia_ops,
@@ -124,7 +124,7 @@ static struct clk_alpha_pll_postdiv gpll6_out_even = {
.clkr.hw.init = &(struct clk_init_data){
.name = "gpll6_out_even",
.parent_hws = (const struct clk_hw*[]){
-			&gpll0.clkr.hw,
+			&gpll6.clkr.hw,
},
.num_parents = 1,
.ops = _alpha_pll_postdiv_fabia_ops,
@@ -139,8 +139,8 @@ static struct clk_alpha_pll gpll7 = {
.enable_mask = BIT(7),
.hw.init = &(struct clk_init_data){
.name = "gpll7",
-		.parent_hws = (const struct clk_hw*[]){
-			&gpll0.clkr.hw,
+   .parent_data = &(const struct clk_parent_data){
+   .fw_name = "bi_tcxo",
},
.num_parents = 1,
.ops = _alpha_pll_fixed_fabia_ops,

---
base-commit: dd5a440a31fae6e459c0d627162825505361
change-id: 20240508-sm6350-gpll-fix-a308bb393434

Best regards,
-- 
Luca Weiss 




inconsistent lock state in __mmap_lock_do_trace_released

2024-05-08 Thread Ubisectech Sirius
Hello.
We are the Ubisectech Sirius Team, the vulnerability lab of China ValiantSec.
Recently, our team discovered an issue in Linux kernel 6.7. Attached to the
email is a PoC file for the issue.

Stack dump:


WARNING: inconsistent lock state
6.7.0 #2 Not tainted

inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
modprobe/17218 [HC1[1]:SC0[0]:HE0:SE1] takes:
88802c6376a0 (lock#13){?.+.}-{2:2}, at: local_lock_acquire 
include/linux/local_lock_internal.h:29 [inline]
88802c6376a0 (lock#13){?.+.}-{2:2}, at: 
__mmap_lock_do_trace_released+0x7b/0x740 mm/mmap_lock.c:243
{HARDIRQ-ON-W} state was registered at:
  lock_acquire kernel/locking/lockdep.c:5754 [inline]
  lock_acquire+0x1b1/0x530 kernel/locking/lockdep.c:5719
  local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
  __mmap_lock_do_trace_acquire_returned+0x96/0x740 mm/mmap_lock.c:237
  __mmap_lock_trace_acquire_returned include/linux/mmap_lock.h:36 [inline]
  mmap_read_trylock include/linux/mmap_lock.h:166 [inline]
  get_mmap_lock_carefully mm/memory.c:5372 [inline]
  lock_mm_and_find_vma+0xf1/0x5a0 mm/memory.c:5432
  do_user_addr_fault+0x390/0x1010 arch/x86/mm/fault.c:1387
  handle_page_fault arch/x86/mm/fault.c:1507 [inline]
  exc_page_fault+0x99/0x180 arch/x86/mm/fault.c:1563
  asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:570
  __put_user_8+0x11/0x20 arch/x86/lib/putuser.S:105
  clear_rseq_cs kernel/rseq.c:257 [inline]
  rseq_ip_fixup kernel/rseq.c:291 [inline]
  __rseq_handle_notify_resume+0xd50/0x1000 kernel/rseq.c:329
  rseq_handle_notify_resume include/linux/sched.h:2361 [inline]
  resume_user_mode_work include/linux/resume_user_mode.h:61 [inline]
  exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
  exit_to_user_mode_prepare+0x170/0x240 kernel/entry/common.c:204
  __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
  syscall_exit_to_user_mode+0x1e/0x60 kernel/entry/common.c:296
  do_syscall_64+0x53/0x120 arch/x86/entry/common.c:89
  entry_SYSCALL_64_after_hwframe+0x6f/0x77
irq event stamp: 696
hardirqs last  enabled at (695): [] 
flush_tlb_mm_range+0x26b/0x340 arch/x86/mm/tlb.c:1035
hardirqs last disabled at (696): [] sysvec_irq_work+0x18/0xf0 
arch/x86/kernel/irq_work.c:17
softirqs last  enabled at (244): [] local_bh_enable 
include/linux/bottom_half.h:33 [inline]
softirqs last  enabled at (244): [] fpregs_unlock 
arch/x86/include/asm/fpu/api.h:80 [inline]
softirqs last  enabled at (244): [] fpu_reset_fpregs 
arch/x86/kernel/fpu/core.c:733 [inline]
softirqs last  enabled at (244): [] 
fpu_flush_thread+0x309/0x400 arch/x86/kernel/fpu/core.c:777
softirqs last disabled at (242): [] local_bh_disable 
include/linux/bottom_half.h:20 [inline]
softirqs last disabled at (242): [] fpregs_lock 
arch/x86/include/asm/fpu/api.h:72 [inline]
softirqs last disabled at (242): [] fpu_reset_fpregs 
arch/x86/kernel/fpu/core.c:716 [inline]
softirqs last disabled at (242): [] 
fpu_flush_thread+0x23a/0x400 arch/x86/kernel/fpu/core.c:777

other info that might help us debug this:
 Possible unsafe locking scenario:

        CPU0
        ----
   lock(lock#13);
   <Interrupt>
     lock(lock#13);

  *** DEADLOCK ***

3 locks held by modprobe/17218:
 #0: 8880200de658 (&vma->vm_lock->lock){}-{3:3}, at: vma_start_read 
include/linux/mm.h:663 [inline]
 #0: 8880200de658 (&vma->vm_lock->lock){}-{3:3}, at: 
lock_vma_under_rcu+0x1e1/0x960 mm/memory.c:5501
 #1: 8d3a9f60 (rcu_read_lock){}-{1:2}, at: rcu_lock_acquire 
include/linux/rcupdate.h:301 [inline]
 #1: 8d3a9f60 (rcu_read_lock){}-{1:2}, at: rcu_read_lock 
include/linux/rcupdate.h:747 [inline]
 #1: 8d3a9f60 (rcu_read_lock){}-{1:2}, at: 
__pte_offset_map+0x42/0x570 mm/pgtable-generic.c:285
 #2: 8880491b0258 (ptlock_ptr(ptdesc)#2){+.+.}-{2:2}, at: spin_lock 
include/linux/spinlock.h:351 [inline]
 #2: 8880491b0258 (ptlock_ptr(ptdesc)#2){+.+.}-{2:2}, at: 
__pte_offset_map_lock+0x10d/0x2f0 mm/pgtable-generic.c:373

stack backtrace:
CPU: 0 PID: 17218 Comm: modprobe Not tainted 6.7.0 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
 
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
 print_usage_bug kernel/locking/lockdep.c:3971 [inline]
 valid_state kernel/locking/lockdep.c:4013 [inline]
 mark_lock_irq kernel/locking/lockdep.c:4216 [inline]
 mark_lock+0x99b/0xd60 kernel/locking/lockdep.c:4678
 mark_usage kernel/locking/lockdep.c:4564 [inline]
 __lock_acquire+0x1339/0x3bb0 kernel/locking/lockdep.c:5091
 lock_acquire kernel/locking/lockdep.c:5754 [inline]
 lock_acquire+0x1b1/0x530 kernel/locking/lockdep.c:5719
 local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
 __mmap_lock_do_trace_released+0x93/0x740 mm/mmap_lock.c:243
 __mmap_lock_trace_released include/linux/mmap_lock.h:42 [inline]
 mmap_read_unlock_non_owner include/linux/mmap_lock.h:178 [inline]
 

Re: [PATCH v8 4/6] LoongArch: KVM: Add vcpu search support from physical cpuid

2024-05-08 Thread maobibo




On 2024/5/8 1:00 PM, Huacai Chen wrote:

On Tue, May 7, 2024 at 11:06 AM maobibo  wrote:




On 2024/5/7 10:05 AM, Huacai Chen wrote:

On Tue, May 7, 2024 at 9:40 AM maobibo  wrote:




On 2024/5/6 10:17 PM, Huacai Chen wrote:

On Mon, May 6, 2024 at 6:05 PM maobibo  wrote:




On 2024/5/6 5:40 PM, Huacai Chen wrote:

On Mon, May 6, 2024 at 5:35 PM maobibo  wrote:




On 2024/5/6 4:59 PM, Huacai Chen wrote:

On Mon, May 6, 2024 at 4:18 PM maobibo  wrote:




On 2024/5/6 3:06 PM, Huacai Chen wrote:

Hi, Bibo,

On Mon, May 6, 2024 at 2:36 PM maobibo  wrote:




On 2024/5/6 9:49 AM, Huacai Chen wrote:

Hi, Bibo,

On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao  wrote:


Physical cpuid is used for interrupt routing for irqchips such as the
ipi/msi/extioi interrupt controllers. The physical cpuid is stored in
the CSR register LOONGARCH_CSR_CPUID; it cannot be changed once a vcpu
is created, and the physical cpuids of two vcpus cannot be the same.

Different irqchips have different size declarations for the physical
cpuid: the max cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512,
the max cpuid supported by the IPI hardware is 1024, 256 for the extioi
irqchip, and 65536 for the MSI irqchip.

The smallest value from all interrupt controllers is selected now, and
the max cpuid size is defined as 256 by KVM, which comes from the
extioi irqchip.

Signed-off-by: Bibo Mao 
---
arch/loongarch/include/asm/kvm_host.h | 26 
arch/loongarch/include/asm/kvm_vcpu.h |  1 +
arch/loongarch/kvm/vcpu.c | 93 ++-
arch/loongarch/kvm/vm.c   | 11 
4 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/arch/loongarch/include/asm/kvm_host.h 
b/arch/loongarch/include/asm/kvm_host.h
index 2d62f7b0d377..3ba16ef1fe69 100644
--- a/arch/loongarch/include/asm/kvm_host.h
+++ b/arch/loongarch/include/asm/kvm_host.h
@@ -64,6 +64,30 @@ struct kvm_world_switch {

#define MAX_PGTABLE_LEVELS 4

+/*
+ * Physical cpu id is used for interrupt routing, and there are different
+ * definitions of physical cpuid on different hardware.
+ *  For the LOONGARCH_CSR_CPUID register, max cpuid size is 512
+ *  For IPI HW, max dest CPUID size 1024
+ *  For extioi interrupt controller, max dest CPUID size is 256
+ *  For MSI interrupt controller, max supported CPUID size is 65536
+ *
+ * Currently max CPUID is defined as 256 for KVM hypervisor, in future
+ * it will be expanded to 4096, including 16 packages at most. And every
+ * package supports at most 256 vcpus
+ */
+#define KVM_MAX_PHYID  256
+
+struct kvm_phyid_info {
+   struct kvm_vcpu *vcpu;
+   boolenabled;
+};
+
+struct kvm_phyid_map {
+   int max_phyid;
+   struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
+};
+
struct kvm_arch {
   /* Guest physical mm */
   kvm_pte_t *pgd;
@@ -71,6 +95,8 @@ struct kvm_arch {
   unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
   unsigned int  pte_shifts[MAX_PGTABLE_LEVELS];
   unsigned int  root_level;
+	spinlock_t		phyid_map_lock;
+   struct kvm_phyid_map  *phyid_map;

   s64 time_offset;
   struct kvm_context __percpu *vmcs;
diff --git a/arch/loongarch/include/asm/kvm_vcpu.h 
b/arch/loongarch/include/asm/kvm_vcpu.h
index 0cb4fdb8a9b5..9f53950959da 100644
--- a/arch/loongarch/include/asm/kvm_vcpu.h
+++ b/arch/loongarch/include/asm/kvm_vcpu.h
@@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
void kvm_restore_timer(struct kvm_vcpu *vcpu);

int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct 
kvm_interrupt *irq);
+struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);

/*
 * Loongarch KVM guest interrupt handling
diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
index 3a8779065f73..b633fd28b8db 100644
--- a/arch/loongarch/kvm/vcpu.c
+++ b/arch/loongarch/kvm/vcpu.c
@@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int 
id, u64 *val)
   return 0;
}

+static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
+{
+   int cpuid;
+   struct loongarch_csrs *csr = vcpu->arch.csr;
+   struct kvm_phyid_map  *map;
+
+   if (val >= KVM_MAX_PHYID)
+   return -EINVAL;
+
+   cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_ESTAT);
+   map = vcpu->kvm->arch.phyid_map;
+   spin_lock(&vcpu->kvm->arch.phyid_map_lock);
+   if (map->phys_map[cpuid].enabled) {
+   /*
+* Cpuid is already set before
+* Forbid changing to a different cpuid at runtime
+*/
+   if (cpuid != val) {
+   /*
+* Cpuid 0 is the initial value for a vcpu; it may be an
+* invalid, unset value for the vcpu
+*/
+   if (cpuid) {
+   spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
+ 
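
The quoted hunk is truncated in this archive. Going by the declarations
quoted above, the kvm_get_vcpu_by_cpuid() lookup this patch adds would look
roughly like the sketch below; this is a reconstruction from the quoted
structures, not the patch's verbatim body.

	struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid)
	{
		struct kvm_phyid_map *map = kvm->arch.phyid_map;

		/* Reject ids outside the fixed KVM_MAX_PHYID-sized map. */
		if (cpuid < 0 || cpuid >= KVM_MAX_PHYID)
			return NULL;

		/* Only a slot claimed by a vcpu via kvm_set_cpuid() is valid. */
		if (!map->phys_map[cpuid].enabled)
			return NULL;

		return map->phys_map[cpuid].vcpu;
	}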

Re: [PATCH 1/1] livepatch: Rename KLP_* to KLP_TRANSITION_*

2024-05-07 Thread Josh Poimboeuf
On Tue, May 07, 2024 at 01:01:11PM +0800, zhangwar...@gmail.com wrote:
> From: Wardenjohn 
> 
> The original macros of KLP_* is about the state of the transition.
> Rename macros of KLP_* to KLP_TRANSITION_* to fix the confusing
> description of klp transition state.
> 
> Signed-off-by: Wardenjohn 

Acked-by: Josh Poimboeuf 

-- 
Josh



Re: [PATCH v8 4/6] LoongArch: KVM: Add vcpu search support from physical cpuid

2024-05-07 Thread Huacai Chen
On Tue, May 7, 2024 at 11:06 AM maobibo  wrote:
>
>
>
> On 2024/5/7 10:05 AM, Huacai Chen wrote:
> > On Tue, May 7, 2024 at 9:40 AM maobibo  wrote:
> >>
> >>
> >>
> >> On 2024/5/6 10:17 PM, Huacai Chen wrote:
> >>> On Mon, May 6, 2024 at 6:05 PM maobibo  wrote:
> 
> 
> 
>  On 2024/5/6 5:40 PM, Huacai Chen wrote:
> > On Mon, May 6, 2024 at 5:35 PM maobibo  wrote:
> >>
> >>
> >>
> >> On 2024/5/6 4:59 PM, Huacai Chen wrote:
> >>> On Mon, May 6, 2024 at 4:18 PM maobibo  wrote:
> 
> 
> 
>  On 2024/5/6 3:06 PM, Huacai Chen wrote:
> > Hi, Bibo,
> >
> > On Mon, May 6, 2024 at 2:36 PM maobibo  wrote:
> >>
> >>
> >>
> >> On 2024/5/6 9:49 AM, Huacai Chen wrote:
> >>> Hi, Bibo,
> >>>
> >>> On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao  
> >>> wrote:
> 
>  Physical cpuid is used for interrupt routing for irqchips such as
>  ipi/msi/extioi interrupt controllers. The physical cpuid is stored
>  at CSR register LOONGARCH_CSR_CPUID; it cannot be changed once a
>  vcpu
>  is created, and the physical cpuids of two vcpus cannot be the same.
> 
>  Different irqchips have different size declarations for
>  physical cpuid:
>  max cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512, 
>  max cpuid
>  supported by IPI hardware is 1024, 256 for extioi irqchip, and 
>  65536
>  for MSI irqchip.
> 
>  The smallest value from all interrupt controllers is selected 
>  now,
> and the max cpuid size is defined as 256 by KVM, which comes from
>  extioi irqchip.
> 
>  Signed-off-by: Bibo Mao 
>  ---
> arch/loongarch/include/asm/kvm_host.h | 26 
> arch/loongarch/include/asm/kvm_vcpu.h |  1 +
> arch/loongarch/kvm/vcpu.c | 93 
>  ++-
> arch/loongarch/kvm/vm.c   | 11 
> 4 files changed, 130 insertions(+), 1 deletion(-)
> 
>  diff --git a/arch/loongarch/include/asm/kvm_host.h 
>  b/arch/loongarch/include/asm/kvm_host.h
>  index 2d62f7b0d377..3ba16ef1fe69 100644
>  --- a/arch/loongarch/include/asm/kvm_host.h
>  +++ b/arch/loongarch/include/asm/kvm_host.h
>  @@ -64,6 +64,30 @@ struct kvm_world_switch {
> 
> #define MAX_PGTABLE_LEVELS 4
> 
>  +/*
>  + * Physical cpu id is used for interrupt routing, there are 
>  different
>  + * definitions of physical cpuid on different hardware.
>  + *  For LOONGARCH_CSR_CPUID register, max cpuid size is 512
>  + *  For IPI HW, max dest CPUID size 1024
>  + *  For extioi interrupt controller, max dest CPUID size is 256
>  + *  For MSI interrupt controller, max supported CPUID size is 
>  65536
>  + *
>  + * Currently max CPUID is defined as 256 for KVM hypervisor, in 
>  future
>  + * it will be expanded to 4096, including 16 packages at most. 
>  And every
>  + * package supports at most 256 vcpus
>  + */
>  +#define KVM_MAX_PHYID  256
>  +
>  +struct kvm_phyid_info {
>  +   struct kvm_vcpu *vcpu;
>  +   boolenabled;
>  +};
>  +
>  +struct kvm_phyid_map {
>  +   int max_phyid;
>  +   struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
>  +};
>  +
> struct kvm_arch {
>    /* Guest physical mm */
>    kvm_pte_t *pgd;
>  @@ -71,6 +95,8 @@ struct kvm_arch {
>    unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
>    unsigned int  pte_shifts[MAX_PGTABLE_LEVELS];
>    unsigned int  root_level;
>  +   spinlock_tphyid_map_lock;
>  +   struct kvm_phyid_map  *phyid_map;
> 
>    s64 time_offset;
>    struct kvm_context __percpu *vmcs;
>  diff --git a/arch/loongarch/include/asm/kvm_vcpu.h 
>  b/arch/loongarch/include/asm/kvm_vcpu.h
>  index 0cb4fdb8a9b5..9f53950959da 100644
>  --- a/arch/loongarch/include/asm/kvm_vcpu.h
>  +++ b/arch/loongarch/include/asm/kvm_vcpu.h
>  @@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
> 

Re: [REGRESSION][v6.8-rc1] virtio-pci: Introduce admin virtqueue

2024-05-07 Thread Jason Wang
On Sat, May 4, 2024 at 2:10 AM Joseph Salisbury
 wrote:
>
> Hi Feng,
>
> During testing, a kernel bug was identified with the suspend/resume
> functionality on instances running in a public cloud [0].  This bug is a
> regression introduced in v6.8-rc1.  After a kernel bisect, the following
> commit was identified as the cause of the regression:
>
> fd27ef6b44be  ("virtio-pci: Introduce admin virtqueue")

Having had a quick glance at the patch, it seems it should not damage
freeze/restore, as it should behave as in the past.

But I found something interesting:

1) it assumes 1 admin vq, which is not what the spec says
2) there is a special function for the admin virtqueue during
freeze/restore, but it doesn't do anything more special than del_vq()
3) it lacks real users, but I guess e.g. destroy_avq() needs to be
synchronized with whatever is using the admin virtqueue

>
> I was hoping to get your feedback, since you are the patch author. Do
> you think gathering any additional data will help diagnose this issue?

Yes, please show us

1) the kernel log here.
2) the features that the device has like
/sys/bus/virtio/devices/virtio0/features

> This commit is depended upon by other virtio commits, so a revert test
> is not really straight forward without reverting all the dependencies.
> Any ideas you have would be greatly appreciated.

Thanks

>
>
> Thanks,
>
> Joe
>
> http://pad.lv/2063315
>




Re: [PATCH v22 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-05-07 Thread Steven Rostedt
On Tue, 30 Apr 2024 12:13:51 +0100
Vincent Donnefort  wrote:

> +#ifdef CONFIG_MMU
> +static int __rb_map_vma(struct ring_buffer_per_cpu *cpu_buffer,
> + struct vm_area_struct *vma)
> +{
> + unsigned long nr_subbufs, nr_pages, vma_pages, pgoff = vma->vm_pgoff;
> + unsigned int subbuf_pages, subbuf_order;
> + struct page **pages;
> + int p = 0, s = 0;
> + int err;
> +
> + /* Refuse MAP_PRIVATE or writable mappings */
> + if (vma->vm_flags & VM_WRITE || vma->vm_flags & VM_EXEC ||
> + !(vma->vm_flags & VM_MAYSHARE))
> + return -EPERM;
> +
> + /*
> +  * Make sure the mapping cannot become writable later. Also tell the VM
> +  * to not touch these pages (VM_DONTCOPY | VM_DONTEXPAND). Finally,
> +  * prevent migration, GUP and dump (VM_IO).
> +  */
> + vm_flags_mod(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_IO, VM_MAYWRITE);

Do we really need the VM_IO?

When testing this in gdb, I would get:

(gdb) p tmap->map->subbuf_size
Cannot access memory at address 0x77fc2008

It appears that you can't ptrace IO memory. When I removed that flag,
gdb had no problem reading that memory.

I think we should drop that flag.

Can you send a v23 with that removed, Shuah's update, and also the
change below:

> +
> + lockdep_assert_held(&cpu_buffer->mapping_lock);
> +
> + subbuf_order = cpu_buffer->buffer->subbuf_order;
> + subbuf_pages = 1 << subbuf_order;
> +
> + nr_subbufs = cpu_buffer->nr_pages + 1; /* + reader-subbuf */
> + nr_pages = ((nr_subbufs) << subbuf_order) - pgoff + 1; /* + meta-page */
> +
> + vma_pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
> + if (!vma_pages || vma_pages > nr_pages)
> + return -EINVAL;
> +
> + nr_pages = vma_pages;
> +
> + pages = kcalloc(nr_pages, sizeof(*pages), GFP_KERNEL);
> + if (!pages)
> + return -ENOMEM;
> +
> + if (!pgoff) {
> + pages[p++] = virt_to_page(cpu_buffer->meta_page);
> +
> + /*
> +  * TODO: Align sub-buffers on their size, once
> +  * vm_insert_pages() supports the zero-page.
> +  */
> + } else {
> + /* Skip the meta-page */
> + pgoff--;
> +
> + if (pgoff % subbuf_pages) {
> + err = -EINVAL;
> + goto out;
> + }
> +
> + s += pgoff / subbuf_pages;
> + }
> +
> + while (s < nr_subbufs && p < nr_pages) {
> + struct page *page = virt_to_page(cpu_buffer->subbuf_ids[s]);
> + int off = 0;
> +
> + for (; off < (1 << (subbuf_order)); off++, page++) {
> + if (p >= nr_pages)
> + break;
> +
> + pages[p++] = page;
> + }
> + s++;
> + }

The above can be made to:

while (p < nr_pages) {
struct page *page;
int off = 0;

if (WARN_ON_ONCE(s >= nr_subbufs))
break;

page = virt_to_page(cpu_buffer->subbuf_ids[s]);
for (; off < (1 << (subbuf_order)); off++, page++) {
if (p >= nr_pages)
break;

pages[p++] = page;
}
s++;
}

Thanks.

-- Steve

> +
> + err = vm_insert_pages(vma, vma->vm_start, pages, &nr_pages);
> +
> +out:
> + kfree(pages);
> +
> + return err;
> +}
> +#else
> +static int __rb_map_vma(struct ring_buffer_per_cpu *cpu_buffer,
> + struct vm_area_struct *vma)
> +{
> + return -EOPNOTSUPP;
> +}
> +#endif



Re: [PATCH RESEND v8 07/16] mm/execmem, arch: convert simple overrides of module_alloc to execmem

2024-05-07 Thread Google
On Sun,  5 May 2024 19:06:19 +0300
Mike Rapoport  wrote:

> From: "Mike Rapoport (IBM)" 
> 
> Several architectures override module_alloc() only to define an address
> range for code allocations different from the VMALLOC address space.
> 
> Provide a generic implementation in execmem that uses the parameters for
> address space ranges, required alignment and page protections provided
> by architectures.
> 
> The architectures must fill the execmem_info structure and implement
> execmem_arch_setup() that returns a pointer to that structure. This way the
> execmem initialization won't be called from every architecture, but rather
> from a central place, namely a core_initcall() in execmem.
> 
> The execmem provides execmem_alloc() API that wraps __vmalloc_node_range()
> with the parameters defined by the architectures.  If an architecture does
> not implement execmem_arch_setup(), execmem_alloc() will fall back to
> module_alloc().
> 

Looks good to me.

Reviewed-by: Masami Hiramatsu (Google) 

Thanks,

> Signed-off-by: Mike Rapoport (IBM) 
> Acked-by: Song Liu 
> ---
>  arch/loongarch/kernel/module.c | 19 --
>  arch/mips/kernel/module.c  | 20 --
>  arch/nios2/kernel/module.c | 21 ---
>  arch/parisc/kernel/module.c| 24 
>  arch/riscv/kernel/module.c | 24 
>  arch/sparc/kernel/module.c | 20 --
>  include/linux/execmem.h| 47 
>  mm/execmem.c   | 67 --
>  mm/mm_init.c   |  2 +
>  9 files changed, 210 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/loongarch/kernel/module.c b/arch/loongarch/kernel/module.c
> index c7d0338d12c1..ca6dd7ea1610 100644
> --- a/arch/loongarch/kernel/module.c
> +++ b/arch/loongarch/kernel/module.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -490,10 +491,22 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char 
> *strtab,
>   return 0;
>  }
>  
> -void *module_alloc(unsigned long size)
> +static struct execmem_info execmem_info __ro_after_init;
> +
> +struct execmem_info __init *execmem_arch_setup(void)
>  {
> - return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> - GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE, 
> __builtin_return_address(0));
> + execmem_info = (struct execmem_info){
> + .ranges = {
> + [EXECMEM_DEFAULT] = {
> + .start  = MODULES_VADDR,
> + .end= MODULES_END,
> + .pgprot = PAGE_KERNEL,
> + .alignment = 1,
> + },
> + },
> + };
> +
> + return &execmem_info;
>  }
>  
>  static void module_init_ftrace_plt(const Elf_Ehdr *hdr,
> diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
> index 9a6c96014904..59225a3cf918 100644
> --- a/arch/mips/kernel/module.c
> +++ b/arch/mips/kernel/module.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  struct mips_hi16 {
> @@ -32,11 +33,22 @@ static LIST_HEAD(dbe_list);
>  static DEFINE_SPINLOCK(dbe_lock);
>  
>  #ifdef MODULES_VADDR
> -void *module_alloc(unsigned long size)
> +static struct execmem_info execmem_info __ro_after_init;
> +
> +struct execmem_info __init *execmem_arch_setup(void)
>  {
> - return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> - GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
> - __builtin_return_address(0));
> + execmem_info = (struct execmem_info){
> + .ranges = {
> + [EXECMEM_DEFAULT] = {
> + .start  = MODULES_VADDR,
> + .end= MODULES_END,
> + .pgprot = PAGE_KERNEL,
> + .alignment = 1,
> + },
> + },
> + };
> +
> + return &execmem_info;
>  }
>  #endif
>  
> diff --git a/arch/nios2/kernel/module.c b/arch/nios2/kernel/module.c
> index 9c97b7513853..0d1ee86631fc 100644
> --- a/arch/nios2/kernel/module.c
> +++ b/arch/nios2/kernel/module.c
> @@ -18,15 +18,26 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> -void *module_alloc(unsigned long size)
> +static struct execmem_info execmem_info __ro_after_init;
> +
> +struct execmem_info __init *execmem_arch_setup(void)
>  {
> - return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> - GFP_KERNEL, PAGE_KERNEL_EXEC,
> - VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
> - __builtin_return_address(0));
> + execmem_info = (struct execmem_info){
> + .ranges = {
> + [EXECMEM_DEFAULT] = {
> + .start  = MODULES_VADDR,
> + 

Re: [PATCH RESEND v8 05/16] module: make module_memory_{alloc,free} more self-contained

2024-05-07 Thread Google
On Sun,  5 May 2024 19:06:17 +0300
Mike Rapoport  wrote:

> From: "Mike Rapoport (IBM)" 
> 
> Move the logic related to the memory allocation and freeing into
> module_memory_alloc() and module_memory_free().
> 

Looks good to me.

Reviewed-by: Masami Hiramatsu (Google) 

Thanks,

> Signed-off-by: Mike Rapoport (IBM) 
> Reviewed-by: Philippe Mathieu-Daudé 
> ---
>  kernel/module/main.c | 64 +++-
>  1 file changed, 39 insertions(+), 25 deletions(-)
> 
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index e1e8a7a9d6c1..5b82b069e0d3 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -1203,15 +1203,44 @@ static bool mod_mem_use_vmalloc(enum mod_mem_type 
> type)
>   mod_mem_type_is_core_data(type);
>  }
>  
> -static void *module_memory_alloc(unsigned int size, enum mod_mem_type type)
> +static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
>  {
> + unsigned int size = PAGE_ALIGN(mod->mem[type].size);
> + void *ptr;
> +
> + mod->mem[type].size = size;
> +
>   if (mod_mem_use_vmalloc(type))
> - return vzalloc(size);
> - return module_alloc(size);
> + ptr = vmalloc(size);
> + else
> + ptr = module_alloc(size);
> +
> + if (!ptr)
> + return -ENOMEM;
> +
> + /*
> +  * The pointer to these blocks of memory are stored on the module
> +  * structure and we keep that around so long as the module is
> +  * around. We only free that memory when we unload the module.
> +  * Just mark them as not being a leak then. The .init* ELF
> +  * sections *do* get freed after boot so we *could* treat them
> +  * slightly differently with kmemleak_ignore() and only grey
> +  * them out as they work as typical memory allocations which
> +  * *do* eventually get freed, but let's just keep things simple
> +  * and avoid *any* false positives.
> +  */
> + kmemleak_not_leak(ptr);
> +
> + memset(ptr, 0, size);
> + mod->mem[type].base = ptr;
> +
> + return 0;
>  }
>  
> -static void module_memory_free(void *ptr, enum mod_mem_type type)
> +static void module_memory_free(struct module *mod, enum mod_mem_type type)
>  {
> + void *ptr = mod->mem[type].base;
> +
>   if (mod_mem_use_vmalloc(type))
>   vfree(ptr);
>   else
> @@ -1229,12 +1258,12 @@ static void free_mod_mem(struct module *mod)
>   /* Free lock-classes; relies on the preceding sync_rcu(). */
>   lockdep_free_key_range(mod_mem->base, mod_mem->size);
>   if (mod_mem->size)
> - module_memory_free(mod_mem->base, type);
> + module_memory_free(mod, type);
>   }
>  
>   /* MOD_DATA hosts mod, so free it at last */
>   lockdep_free_key_range(mod->mem[MOD_DATA].base, 
> mod->mem[MOD_DATA].size);
> - module_memory_free(mod->mem[MOD_DATA].base, MOD_DATA);
> + module_memory_free(mod, MOD_DATA);
>  }
>  
>  /* Free a module, remove from lists, etc. */
> @@ -2225,7 +2254,6 @@ static int find_module_sections(struct module *mod, 
> struct load_info *info)
>  static int move_module(struct module *mod, struct load_info *info)
>  {
>   int i;
> - void *ptr;
>   enum mod_mem_type t = 0;
>   int ret = -ENOMEM;
>  
> @@ -2234,26 +2262,12 @@ static int move_module(struct module *mod, struct 
> load_info *info)
>   mod->mem[type].base = NULL;
>   continue;
>   }
> - mod->mem[type].size = PAGE_ALIGN(mod->mem[type].size);
> - ptr = module_memory_alloc(mod->mem[type].size, type);
> - /*
> - * The pointer to these blocks of memory are stored on the 
> module
> - * structure and we keep that around so long as the module is
> - * around. We only free that memory when we unload the 
> module.
> - * Just mark them as not being a leak then. The .init* ELF
> - * sections *do* get freed after boot so we *could* treat 
> them
> - * slightly differently with kmemleak_ignore() and only grey
> - * them out as they work as typical memory allocations which
> - * *do* eventually get freed, but let's just keep things 
> simple
> - * and avoid *any* false positives.
> -  */
> - kmemleak_not_leak(ptr);
> - if (!ptr) {
> +
> + ret = module_memory_alloc(mod, type);
> + if (ret) {
>   t = type;
>   goto out_enomem;
>   }
> - memset(ptr, 0, mod->mem[type].size);
> - mod->mem[type].base = ptr;
>   }
>  
>   /* Transfer each section which specifies SHF_ALLOC */
> @@ -2296,7 +2310,7 @@ static int move_module(struct module *mod, struct 
> load_info *info)
>   return 0;
>  out_enomem:
>   

Re: [PATCH] mm: Remove mm argument from mm_get_unmapped_area()

2024-05-07 Thread Jarkko Sakkinen
On Mon May 6, 2024 at 7:07 PM EEST, Rick Edgecombe wrote:
> Recently the get_unmapped_area() pointer on mm_struct was removed in
> favor of a directly callable function that determines which of two
> handlers to call based on an mm flag. This function,
> mm_get_unmapped_area(), checks the flag of the mm passed as an argument.
>
> Dan Williams pointed out (see link) that all callers pass current->mm, so
> the mm argument is unneeded. It could be conceivable for a caller to want
> to pass a different mm in the future, but in this case a new helper could
> easily be added.
>
> So remove the mm argument, and rename the function
> current_get_unmapped_area().
>
> Fixes: 529ce23a764f ("mm: switch mm->get_unmapped_area() to a flag")
> Suggested-by: Dan Williams 
> Signed-off-by: Rick Edgecombe 
> Link: 
> https://lore.kernel.org/lkml/6603bed6662a_4a98a29...@dwillia2-mobl3.amr.corp.intel.com.notmuch/
> ---
> Based on linux-next.
> ---
>  arch/sparc/kernel/sys_sparc_64.c |  9 +
>  arch/x86/kernel/cpu/sgx/driver.c |  2 +-
>  drivers/char/mem.c   |  2 +-
>  drivers/dax/device.c |  6 +++---
>  fs/proc/inode.c  |  2 +-
>  fs/ramfs/file-mmu.c  |  2 +-
>  include/linux/sched/mm.h |  6 +++---
>  io_uring/memmap.c|  2 +-
>  kernel/bpf/arena.c   |  2 +-
>  kernel/bpf/syscall.c |  2 +-
>  mm/mmap.c| 11 +--
>  mm/shmem.c   |  9 -
>  12 files changed, 27 insertions(+), 28 deletions(-)
>
> diff --git a/arch/sparc/kernel/sys_sparc_64.c 
> b/arch/sparc/kernel/sys_sparc_64.c
> index d9c3b34ca744..cf0b4ace5bf9 100644
> --- a/arch/sparc/kernel/sys_sparc_64.c
> +++ b/arch/sparc/kernel/sys_sparc_64.c
> @@ -220,7 +220,7 @@ unsigned long get_fb_unmapped_area(struct file *filp, 
> unsigned long orig_addr, u
>  
>   if (flags & MAP_FIXED) {
>   /* Ok, don't mess with it. */
> - return mm_get_unmapped_area(current->mm, NULL, orig_addr, len, 
> pgoff, flags);
> + return current_get_unmapped_area(NULL, orig_addr, len, pgoff, 
> flags);
>   }
>   flags &= ~MAP_SHARED;
>  
> @@ -233,8 +233,9 @@ unsigned long get_fb_unmapped_area(struct file *filp, 
> unsigned long orig_addr, u
>   align_goal = (64UL * 1024);
>  
>   do {
> - addr = mm_get_unmapped_area(current->mm, NULL, orig_addr,
> - len + (align_goal - PAGE_SIZE), 
> pgoff, flags);
> + addr = current_get_unmapped_area(NULL, orig_addr,
> +  len + (align_goal - PAGE_SIZE),
> +  pgoff, flags);
>   if (!(addr & ~PAGE_MASK)) {
>   addr = (addr + (align_goal - 1UL)) & ~(align_goal - 
> 1UL);
>   break;
> @@ -252,7 +253,7 @@ unsigned long get_fb_unmapped_area(struct file *filp, 
> unsigned long orig_addr, u
>* be obtained.
>*/
>   if (addr & ~PAGE_MASK)
> - addr = mm_get_unmapped_area(current->mm, NULL, orig_addr, len, 
> pgoff, flags);
> + addr = current_get_unmapped_area(NULL, orig_addr, len, pgoff, 
> flags);
>  
>   return addr;
>  }
> diff --git a/arch/x86/kernel/cpu/sgx/driver.c 
> b/arch/x86/kernel/cpu/sgx/driver.c
> index 22b65a5f5ec6..5f7bfd9035f7 100644
> --- a/arch/x86/kernel/cpu/sgx/driver.c
> +++ b/arch/x86/kernel/cpu/sgx/driver.c
> @@ -113,7 +113,7 @@ static unsigned long sgx_get_unmapped_area(struct file 
> *file,
>   if (flags & MAP_FIXED)
>   return addr;
>  
> - return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
> + return current_get_unmapped_area(file, addr, len, pgoff, flags);
>  }
>  
>  #ifdef CONFIG_COMPAT
> diff --git a/drivers/char/mem.c b/drivers/char/mem.c
> index 7c359cc406d5..a29c4bd506d5 100644
> --- a/drivers/char/mem.c
> +++ b/drivers/char/mem.c
> @@ -546,7 +546,7 @@ static unsigned long get_unmapped_area_zero(struct file 
> *file,
>   }
>  
>   /* Otherwise flags & MAP_PRIVATE: with no shmem object beneath it */
> - return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
> + return current_get_unmapped_area(file, addr, len, pgoff, flags);
>  #else
>   return -ENOSYS;
>  #endif
> diff --git a/drivers/dax/device.c b/drivers/dax/device.c
> index eb61598247a9..c379902307b7 100644
> --- a/drivers/dax/device.c
> +++ b/drivers/dax/device.c
> @@ -329,14 +329,14 @@ static unsigned long dax_get_unmapped_area(struct file 
> *filp,
>   if ((off + len_align) < off)
>   goto out;
>  
> - addr_align = mm_get_unmapped_area(current->mm, filp, addr, len_align,
> -   pgoff, flags);
> + addr_align = current_get_unmapped_area(filp, addr, len_align,
> +pgoff, flags);
>   if (!IS_ERR_VALUE(addr_align)) {
>   
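
For reference, with the mm argument dropped, the renamed function reduces to
dispatching on the current mm's flag. A minimal sketch, assuming the
MMF_TOPDOWN flag introduced by the commit named in the Fixes: tag; this is
not the patch's verbatim body.

	unsigned long current_get_unmapped_area(struct file *file,
						unsigned long addr,
						unsigned long len,
						unsigned long pgoff,
						unsigned long flags)
	{
		struct mm_struct *mm = current->mm;

		/* Pick the top-down or bottom-up handler from the mm flag. */
		if (test_bit(MMF_TOPDOWN, &mm->flags))
			return arch_get_unmapped_area_topdown(file, addr, len,
							      pgoff, flags);
		return arch_get_unmapped_area(file, addr, len, pgoff, flags);
	}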

Re: [syzbot] [bpf?] [trace?] general protection fault in bpf_get_attach_cookie_tracing

2024-05-07 Thread Andrii Nakryiko
On Sun, May 5, 2024 at 9:13 AM syzbot
 wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:a9e7715ce8b3 libbpf: Avoid casts from pointers to enums in..
> git tree:   bpf-next
> console+strace: https://syzkaller.appspot.com/x/log.txt?x=153c1dc498
> kernel config:  https://syzkaller.appspot.com/x/.config?x=e8aa3e4736485e94
> dashboard link: https://syzkaller.appspot.com/bug?extid=3ab78ff125b7979e45f9
> compiler:   Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 
> 2.40
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=17d4b58898
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16cb047098
>
> Downloadable assets:
> disk image: 
> https://storage.googleapis.com/syzbot-assets/a6daa7801875/disk-a9e7715c.raw.xz
> vmlinux: 
> https://storage.googleapis.com/syzbot-assets/0d5b51385a69/vmlinux-a9e7715c.xz
> kernel image: 
> https://storage.googleapis.com/syzbot-assets/999297a08631/bzImage-a9e7715c.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+3ab78ff125b7979e4...@syzkaller.appspotmail.com
>
> general protection fault, probably for non-canonical address 
> 0xdc00:  [#1] PREEMPT SMP KASAN PTI

I suspect it's the same issue that we already fixed ([0]) in
bpf/master, the fixes just haven't made it into bpf-next tree

  [0] 1a80dbcb2dba bpf: support deferring bpf_link dealloc to after
RCU grace period

> KASAN: null-ptr-deref in range [0x-0x0007]
> CPU: 0 PID: 5082 Comm: syz-executor316 Not tainted 
> 6.9.0-rc5-syzkaller-01452-ga9e7715ce8b3 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 03/27/2024
> RIP: 0010:bpf_get_attach_cookie_tracing kernel/trace/bpf_trace.c:1179 
> [inline]
> RIP: 0010:bpf_get_attach_cookie_tracing+0x46/0x60 
> kernel/trace/bpf_trace.c:1174
> Code: d3 03 00 48 81 c3 00 18 00 00 48 89 d8 48 c1 e8 03 42 80 3c 30 00 74 08 
> 48 89 df e8 54 b9 59 00 48 8b 1b 48 89 d8 48 c1 e8 03 <42> 80 3c 30 00 74 08 
> 48 89 df e8 3b b9 59 00 48 8b 03 5b 41 5e c3
> RSP: 0018:c90002f9fba8 EFLAGS: 00010246
> RAX:  RBX:  RCX: 888029575a00
> RDX:  RSI: c9ace048 RDI: 
> RBP: c90002f9fbc0 R08: 89938ae7 R09: 125e80a0
> R10: dc00 R11: a950 R12: c90002f9fc80
> R13: dc00 R14: dc00 R15: 
> FS:  78992380() GS:8880b940() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 2e3e9388 CR3: 791c2000 CR4: 003506f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  
>  bpf_prog_fe13437f26555f61+0x1a/0x1c
>  bpf_dispatcher_nop_func include/linux/bpf.h:1243 [inline]
>  __bpf_prog_run include/linux/filter.h:691 [inline]
>  bpf_prog_run include/linux/filter.h:698 [inline]
>  __bpf_prog_test_run_raw_tp+0x149/0x310 net/bpf/test_run.c:732
>  bpf_prog_test_run_raw_tp+0x47b/0x6a0 net/bpf/test_run.c:772
>  bpf_prog_test_run+0x33a/0x3b0 kernel/bpf/syscall.c:4286
>  __sys_bpf+0x48d/0x810 kernel/bpf/syscall.c:5700
>  __do_sys_bpf kernel/bpf/syscall.c:5789 [inline]
>  __se_sys_bpf kernel/bpf/syscall.c:5787 [inline]
>  __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5787
>  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>  do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f53be8a0469
> Code: 48 83 c4 28 c3 e8 37 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 
> 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 
> 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:7ffdcf680a08 EFLAGS: 0246 ORIG_RAX: 0141
> RAX: ffda RBX: 7ffdcf680bd8 RCX: 7f53be8a0469
> RDX: 000c RSI: 2080 RDI: 000a
> RBP: 7f53be913610 R08:  R09: 7ffdcf680bd8
> R10: 7f53be8dbae3 R11: 0246 R12: 0001
> R13: 7ffdcf680bc8 R14: 0001 R15: 0001
>  
> Modules linked in:
> ---[ end trace  ]---
> RIP: 0010:bpf_get_attach_cookie_tracing kernel/trace/bpf_trace.c:1179 
> [inline]
> RIP: 0010:bpf_get_attach_cookie_tracing+0x46/0x60 
> kernel/trace/bpf_trace.c:1174
> Code: d3 03 00 48 81 c3 00 18 00 00 48 89 d8 48 c1 e8 03 42 80 3c 30 00 74 08 
> 48 89 df e8 54 b9 59 00 48 8b 1b 48 89 d8 48 c1 e8 03 <42> 80 3c 30 00 74 08 
> 48 89 df e8 3b b9 59 00 48 8b 03 5b 41 5e c3
> RSP: 0018:c90002f9fba8 EFLAGS: 00010246
> RAX:  RBX:  RCX: 888029575a00
> RDX:  RSI: c9ace048 RDI: 
> RBP: c90002f9fbc0 R08: 89938ae7 R09: 125e80a0
> R10: dc00 R11: a950 

Re: [PATCH v9 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

2024-05-07 Thread Andrii Nakryiko
On Wed, May 1, 2024 at 7:06 PM Masami Hiramatsu  wrote:
>
> On Tue, 30 Apr 2024 09:29:40 -0700
> Andrii Nakryiko  wrote:
>
> > On Tue, Apr 30, 2024 at 6:32 AM Masami Hiramatsu  
> > wrote:
> > >
> > > On Mon, 29 Apr 2024 13:25:04 -0700
> > > Andrii Nakryiko  wrote:
> > >
> > > > On Mon, Apr 29, 2024 at 6:51 AM Masami Hiramatsu  
> > > > wrote:
> > > > >
> > > > > Hi Andrii,
> > > > >
> > > > > On Thu, 25 Apr 2024 13:31:53 -0700
> > > > > Andrii Nakryiko  wrote:
> > > > >
> > > > > > Hey Masami,
> > > > > >
> > > > > > I can't really review most of that code as I'm completely unfamiliar
> > > > > > with all those inner workings of fprobe/ftrace/function_graph. I 
> > > > > > left
> > > > > > a few comments where there were somewhat more obvious BPF-related
> > > > > > pieces.
> > > > > >
> > > > > > But I also did run our BPF benchmarks on probes/for-next as a 
> > > > > > baseline
> > > > > > and then with your series applied on top. Just to see if there are 
> > > > > > any
> > > > > > regressions. I think it will be a useful data point for you.
> > > > >
> > > > > Thanks for testing!
> > > > >
> > > > > >
> > > > > > You should be already familiar with the bench tool we have in BPF
> > > > > > selftests (I used it on some other patches for your tree).
> > > > >
> > > > > What patches we need?
> > > > >
> > > >
> > > > You mean for this `bench` tool? They are part of BPF selftests (under
> > > > tools/testing/selftests/bpf), you can build them by running:
> > > >
> > > > $ make RELEASE=1 -j$(nproc) bench
> > > >
> > > > After that you'll get a self-container `bench` binary, which has all
> > > > the self-contained benchmarks.
> > > >
> > > > You might also find a small script (benchs/run_bench_trigger.sh inside
> > > > BPF selftests directory) helpful, it collects final summary of the
> > > > benchmark run and optionally accepts a specific set of benchmarks. So
> > > > you can use it like this:
> > > >
> > > > $ benchs/run_bench_trigger.sh kprobe kprobe-multi
> > > > kprobe :   18.731 ± 0.639M/s
> > > > kprobe-multi   :   23.938 ± 0.612M/s
> > > >
> > > > By default it will run a wider set of benchmarks (no uprobes, but a
> > > > bunch of extra fentry/fexit tests and stuff like this).
> > >
> > > origin:
> > > # benchs/run_bench_trigger.sh
> > > kretprobe :1.329 ± 0.007M/s
> > > kretprobe-multi:1.341 ± 0.004M/s
> > > # benchs/run_bench_trigger.sh
> > > kretprobe :1.288 ± 0.014M/s
> > > kretprobe-multi:1.365 ± 0.002M/s
> > > # benchs/run_bench_trigger.sh
> > > kretprobe :1.329 ± 0.002M/s
> > > kretprobe-multi:1.331 ± 0.011M/s
> > > # benchs/run_bench_trigger.sh
> > > kretprobe :1.311 ± 0.003M/s
> > > kretprobe-multi:1.318 ± 0.002M/s
> > >
> > > patched:
> > >
> > > # benchs/run_bench_trigger.sh
> > > kretprobe :1.274 ± 0.003M/s
> > > kretprobe-multi:1.397 ± 0.002M/s
> > > # benchs/run_bench_trigger.sh
> > > kretprobe :1.307 ± 0.002M/s
> > > kretprobe-multi:1.406 ± 0.004M/s
> > > # benchs/run_bench_trigger.sh
> > > kretprobe :1.279 ± 0.004M/s
> > > kretprobe-multi:1.330 ± 0.014M/s
> > > # benchs/run_bench_trigger.sh
> > > kretprobe :1.256 ± 0.010M/s
> > > kretprobe-multi:1.412 ± 0.003M/s
> > >
> > > Hmm, in my case, it seems smaller differences (~3%?).
> > > I attached perf report results for those, but I don't see large 
> > > difference.
> >
> > I ran my benchmarks on bare metal machine (and quite powerful at that,
> > you can see my numbers are almost 10x of yours), with mitigations
> > disabled, no retpolines, etc. If you have any of those mitigations it
> > might result in smaller differences, probably. If you are running
> > inside QEMU/VM, the results might differ significantly as well.
>
> I ran it on my bare metal machines again, but could not find any difference
> between them. But I think I have Intel mitigations enabled, so that might
> account for the difference from your result.
>
> Can you run the benchmark with perf record? If there is such differences,
> there should be recorded.

I can, yes, will try to do this week, I'm just trying to keep up with
the rest of the stuff on my plate and haven't found yet time to do
this. I'll get back to you (and I'll use the latest version of your
patch set, of course).

> e.g.
>
> # perf record -g -o perf.data-kretprobe-nopatch-raw-bpf -- bench -w2 -d5 -a 
> trig-kretprobe
> # perf report -G -i perf.data-kretprobe-nopatch-raw-bpf -k $VMLINUX --stdio > 
> perf-out-kretprobe-nopatch-raw-bpf
>
> I attached the results in my side.
> The interesting point is, the functions int the result are not touched by
> this series. Thus there may be another reason if you see the kretprobe
> regression.
>
> Thank you,
> --
> Masami Hiramatsu (Google) 



Re: [PATCH v9 2/5] remoteproc: k3-m4: Add a remoteproc driver for M4F subsystem

2024-05-07 Thread Mathieu Poirier
On Fri, Apr 26, 2024 at 02:18:08PM -0500, Andrew Davis wrote:
> From: Martyn Welch 
> 
> The AM62x and AM64x SoCs of the TI K3 family have a Cortex M4F core in
> the MCU domain. This core is typically used for safety applications in a
> stand-alone mode. However, some applications (non-safety related) may
> want to use the M4F core as a generic remote processor with IPC to the
> host processor. The M4F core has internal IRAM and DRAM memories, which
> are exposed to the system bus for code and data loading.
> 
> A remote processor driver is added to support this subsystem, including
> being able to load and boot the M4F core. Loading includes the M4F
> internal memories and predefined external code/data memories. The
> carve-outs for external contiguous memory are defined in the M4F device
> node and should match with the external memory declarations in the M4F
> image binary. The M4F subsystem has two resets. One reset is for the
> entire subsystem i.e including the internal memories and the other, a
> local reset is only for the M4F processing core. When loading the image,
> the driver first releases the subsystem reset, loads the firmware image
> and then releases the local reset to let the M4F processing core run.
> 
> Signed-off-by: Martyn Welch 
> Signed-off-by: Hari Nagalla 
> Signed-off-by: Andrew Davis 
> ---
>  drivers/remoteproc/Kconfig   |  13 +
>  drivers/remoteproc/Makefile  |   1 +
>  drivers/remoteproc/ti_k3_m4_remoteproc.c | 785 +++
>  3 files changed, 799 insertions(+)
>  create mode 100644 drivers/remoteproc/ti_k3_m4_remoteproc.c
> 
> diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
> index 48845dc8fa852..1a7c0330c91a9 100644
> --- a/drivers/remoteproc/Kconfig
> +++ b/drivers/remoteproc/Kconfig
> @@ -339,6 +339,19 @@ config TI_K3_DSP_REMOTEPROC
> It's safe to say N here if you're not interested in utilizing
> the DSP slave processors.
>  
> +config TI_K3_M4_REMOTEPROC
> + tristate "TI K3 M4 remoteproc support"
> + depends on ARCH_K3 || COMPILE_TEST
> + select MAILBOX
> + select OMAP2PLUS_MBOX
> + help
> +   Say m here to support TI's M4 remote processor subsystems
> +   on various TI K3 family of SoCs through the remote processor
> +   framework.
> +
> +   It's safe to say N here if you're not interested in utilizing
> +   a remote processor.
> +
>  config TI_K3_R5_REMOTEPROC
>   tristate "TI K3 R5 remoteproc support"
>   depends on ARCH_K3
> diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
> index 91314a9b43cef..5ff4e2fee4abd 100644
> --- a/drivers/remoteproc/Makefile
> +++ b/drivers/remoteproc/Makefile
> @@ -37,5 +37,6 @@ obj-$(CONFIG_ST_REMOTEPROC) += st_remoteproc.o
>  obj-$(CONFIG_ST_SLIM_REMOTEPROC) += st_slim_rproc.o
>  obj-$(CONFIG_STM32_RPROC)+= stm32_rproc.o
>  obj-$(CONFIG_TI_K3_DSP_REMOTEPROC)   += ti_k3_dsp_remoteproc.o
> +obj-$(CONFIG_TI_K3_M4_REMOTEPROC)+= ti_k3_m4_remoteproc.o
>  obj-$(CONFIG_TI_K3_R5_REMOTEPROC)+= ti_k3_r5_remoteproc.o
>  obj-$(CONFIG_XLNX_R5_REMOTEPROC) += xlnx_r5_remoteproc.o
> diff --git a/drivers/remoteproc/ti_k3_m4_remoteproc.c 
> b/drivers/remoteproc/ti_k3_m4_remoteproc.c
> new file mode 100644
> index 0..0030e509f6b5d
> --- /dev/null
> +++ b/drivers/remoteproc/ti_k3_m4_remoteproc.c
> @@ -0,0 +1,785 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * TI K3 Cortex-M4 Remote Processor(s) driver
> + *
> + * Copyright (C) 2021-2024 Texas Instruments Incorporated - 
> https://www.ti.com/
> + *   Hari Nagalla 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "omap_remoteproc.h"
> +#include "remoteproc_internal.h"
> +#include "ti_sci_proc.h"
> +
> +/**
> + * struct k3_m4_rproc_mem - internal memory structure
> + * @cpu_addr: MPU virtual address of the memory region
> + * @bus_addr: Bus address used to access the memory region
> + * @dev_addr: Device address of the memory region from remote processor view
> + * @size: Size of the memory region
> + */
> +struct k3_m4_rproc_mem {
> + void __iomem *cpu_addr;
> + phys_addr_t bus_addr;
> + u32 dev_addr;
> + size_t size;
> +};
> +
> +/**
> + * struct k3_m4_rproc_mem_data - memory definitions for a remote processor
> + * @name: name for this memory entry
> + * @dev_addr: device address for the memory entry
> + */
> +struct k3_m4_rproc_mem_data {
> + const char *name;
> + const u32 dev_addr;
> +};
> +
> +/**
> + * struct k3_m4_rproc_dev_data - device data structure for a remote processor
> + * @mems: pointer to memory definitions for a remote processor
> + * @num_mems: number of memory regions in @mems
> + * @uses_lreset: flag to denote the need for local reset management
> + */
> +struct k3_m4_rproc_dev_data {
> + const struct k3_m4_rproc_mem_data *mems;
> + u32 num_mems;
> + bool 
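
The two-stage reset sequence described in the commit message can be sketched
as below. All helper names here are illustrative assumptions, not the
driver's actual functions.

	/* Sketch: release subsystem reset, load firmware, then run the core. */
	static int k3_m4_boot_sketch(struct k3_m4_rproc *kproc)
	{
		int ret;

		/* 1) Release the subsystem reset so the internal IRAM/DRAM
		 *    become accessible on the system bus for loading.
		 */
		ret = k3_m4_release_subsystem_reset(kproc);	/* hypothetical */
		if (ret)
			return ret;

		/* 2) Load the image into internal memories and the external
		 *    carve-outs while the M4F core is still in local reset.
		 */
		ret = k3_m4_load_firmware(kproc);		/* hypothetical */
		if (ret)
			return ret;

		/* 3) Release the local reset to let the M4F core run. */
		return k3_m4_release_local_reset(kproc);	/* hypothetical */
	}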

Re: [PATCHv5 bpf-next 6/8] x86/shstk: Add return uprobe support

2024-05-07 Thread Edgecombe, Rick P
On Tue, 2024-05-07 at 12:53 +0200, Jiri Olsa wrote:
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 81e6ee95784d..ae6c3458a675 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -406,6 +406,11 @@ SYSCALL_DEFINE0(uretprobe)
>  * trampoline's ret instruction
>  */
> r11_cx_ax[2] = regs->ip;
> +
> +   /* make the shadow stack follow that */
> +   if (shstk_push_frame(regs->ip))
> +   goto sigill;
> +
> regs->ip = ip;
>  

Per the earlier discussion, this cannot be reached unless uretprobes are in use,
which cannot happen without something with privileges taking an action. But are
uretprobes ever used for monitoring applications where security is important? Or
is it strictly a debug-time thing?


Re: [PATCHv5 bpf-next 5/8] selftests/bpf: Add uretprobe syscall call from user space test

2024-05-07 Thread Andrii Nakryiko
On Tue, May 7, 2024 at 3:54 AM Jiri Olsa  wrote:
>
> Adding test to verify that when called from outside of the
> trampoline provided by kernel, the uretprobe syscall will cause
> calling process to receive SIGILL signal and the attached bpf
> program is not executed.
>
> Reviewed-by: Masami Hiramatsu (Google) 
> Signed-off-by: Jiri Olsa 
> ---
>  .../selftests/bpf/prog_tests/uprobe_syscall.c | 95 +++
>  .../bpf/progs/uprobe_syscall_executed.c   | 17 
>  2 files changed, 112 insertions(+)
>  create mode 100644 
> tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
>

Acked-by: Andrii Nakryiko 

> diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c 
> b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> index 1a50cd35205d..3ef324c2db50 100644
> --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> @@ -7,7 +7,10 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include "uprobe_syscall.skel.h"
> +#include "uprobe_syscall_executed.skel.h"
>

[...]



Re: [PATCH] mm: Remove mm argument from mm_get_unmapped_area()

2024-05-07 Thread Liam R. Howlett
* Edgecombe, Rick P  [240507 09:51]:
> On Mon, 2024-05-06 at 12:32 -0400, Liam R. Howlett wrote:
> > 
> > I like this patch.
> 
> Thanks for taking a look.
> 
> > 
> > I think the context of current->mm is implied. IOW, could we call it
> > get_unmapped_area() instead?  There are other functions today that use
> > current->mm that don't start with current_.  I probably should
> > have responded to Dan's suggestion with my comment.
> 
> Yes, get_unmapped_area() is already taken. What else to call it... It is kind 
> of
> the process "default" get_unmapped_area(). But with Christoph's proposal it
> would basically be arch_get_unmapped_area().

unmapped_area(), but that's also taken..

The arch_get_unmapped_area() implementations are all quite close.  If you look into it, many
of the arch versions were taken from the sparc 32 version.  Subsequent
changes were made and they are no longer exactly the same, but I believe
functionally equivalent - rather tricky to test though.

I wanted to unite these to simplify the mm code a while back, but have
not gotten back to it.  One aspect that some archs have is "cache
coloring" which does affect the VMAs.

The other difference is VDSO, which I may be looking into soon.  Someone
once called me a glutton for punishment and there may be some truth in
that...

Cheers,
Liam




[PATCH RT 0/1] Linux v4.19.312-rt134-rc3

2024-05-07 Thread Daniel Wagner
Dear RT Folks,

This is the RT stable review cycle of patch 4.19.312-rt134-rc3.

Please scream at me if I messed something up. Please test the patches
too.

The -rc release is also available on kernel.org

  https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

on the v4.19-rt-next branch.

If all goes well, this patch will be converted to the next main
release on 2024-05-14.

Signing key fingerprint:

  5BF6 7BC5 0826 72CA BB45  ACAE 587C 5ECA 5D0A 306C

All keys used for the above files and repositories can be found on the
following git repository:

   git://git.kernel.org/pub/scm/docs/kernel/pgpkeys.git

Enjoy!
Daniel

Changes from v4.19.307-rt133:


Daniel Wagner (1):
  Linux 4.19.312-rt134

 localversion-rt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.44.0




[PATCH RT 1/1] Linux 4.19.312-rt134

2024-05-07 Thread Daniel Wagner
v4.19.312-rt134-rc3 stable review patch.
If anyone has any objections, please let me know.

---


Signed-off-by: Daniel Wagner 
---
 localversion-rt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/localversion-rt b/localversion-rt
index c2c7e0fb6685..6067da4c8c99 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt133
+-rt134
-- 
2.44.0




[PATCH v10 36/36] fgraph: Skip recording calltime/rettime if it is not needed

2024-05-07 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Skip recording calltime and rettime if the fgraph_ops does not need it.
This is a kind of performance optimization for fprobe. Since the fprobe
user does not use these entries, recording timestamps in fgraph is just
overhead (e.g. for eBPF, ftrace). So introduce the skip_timestamp flag;
if all fgraph_ops set this flag, skip recording calltime and rettime.

Suggested-by: Jiri Olsa 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v10:
  - Add likely() to skipping timestamp.
 Changes in v9:
  - Newly added.
---
 include/linux/ftrace.h |2 ++
 kernel/trace/fgraph.c  |   51 +---
 kernel/trace/fprobe.c  |1 +
 3 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 64ca91d1527f..eb9de9d70829 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1156,6 +1156,8 @@ struct fgraph_ops {
struct ftrace_ops   ops; /* for the hash lists */
void*private;
int idx;
+   /* If skip_timestamp is true, this does not record timestamps. */
+   boolskip_timestamp;
 };
 
 void *fgraph_reserve_data(int idx, int size_bytes);
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 40f47fcbc6c3..13b41485ce49 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -138,6 +138,7 @@ DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
 int ftrace_graph_active;
 
 static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
+static bool fgraph_skip_timestamp;
 
 /* LRU index table for fgraph_array */
 static int fgraph_lru_table[FGRAPH_ARRAY_SIZE];
@@ -483,7 +484,7 @@ void ftrace_graph_stop(void)
 static int
 ftrace_push_return_trace(unsigned long ret, unsigned long func,
 unsigned long frame_pointer, unsigned long *retp,
-int fgraph_idx)
+int fgraph_idx, bool skip_ts)
 {
struct ftrace_ret_stack *ret_stack;
unsigned long long calltime;
@@ -506,8 +507,12 @@ ftrace_push_return_trace(unsigned long ret, unsigned long 
func,
ret_stack = get_ret_stack(current, current->curr_ret_stack, &offset);
if (ret_stack && ret_stack->func == func &&
get_fgraph_type(current, offset + FGRAPH_FRAME_OFFSET) == 
FGRAPH_TYPE_BITMAP &&
-   !is_fgraph_index_set(current, offset + FGRAPH_FRAME_OFFSET, 
fgraph_idx))
+   !is_fgraph_index_set(current, offset + FGRAPH_FRAME_OFFSET, 
fgraph_idx)) {
+   /* If previous one skips calltime, update it. */
+   if (!skip_ts && !ret_stack->calltime)
+   ret_stack->calltime = trace_clock_local();
return offset + FGRAPH_FRAME_OFFSET;
+   }
 
val = (FGRAPH_TYPE_RESERVED << FGRAPH_TYPE_SHIFT) | FGRAPH_FRAME_OFFSET;
 
@@ -525,7 +530,11 @@ ftrace_push_return_trace(unsigned long ret, unsigned long 
func,
return -EBUSY;
}
 
-   calltime = trace_clock_local();
+   /* This is not really 'likely', but it keeps the fastest path fast. */
+   if (likely(skip_ts))
+   calltime = 0LL;
+   else
+   calltime = trace_clock_local();
 
offset = READ_ONCE(current->curr_ret_stack);
ret_stack = RET_STACK(current, offset);
@@ -609,7 +618,8 @@ int function_graph_enter_regs(unsigned long ret, unsigned 
long func,
trace.func = func;
trace.depth = ++current->curr_ret_depth;
 
-   offset = ftrace_push_return_trace(ret, func, frame_pointer, retp, 0);
+   offset = ftrace_push_return_trace(ret, func, frame_pointer, retp, 0,
+ fgraph_skip_timestamp);
if (offset < 0)
goto out;
 
@@ -662,7 +672,8 @@ int function_graph_enter_ops(unsigned long ret, unsigned 
long func,
return -ENODEV;
 
/* Use start for the distance to ret_stack (skipping over reserve) */
-   offset = ftrace_push_return_trace(ret, func, frame_pointer, retp, 
gops->idx);
+   offset = ftrace_push_return_trace(ret, func, frame_pointer, retp, 
gops->idx,
+ gops->skip_timestamp);
if (offset < 0)
return offset;
type = get_fgraph_type(current, offset);
@@ -740,6 +751,7 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, 
unsigned long *ret,
*ret = ret_stack->ret;
trace->func = ret_stack->func;
trace->calltime = ret_stack->calltime;
+   trace->rettime = 0;
trace->overrun = atomic_read(&ret_stack->trace_overrun);
trace->depth = current->curr_ret_depth;
/*
@@ -800,7 +812,6 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, 
unsigned long frame_pointe
return (unsigned long)panic;
}
 
-   trace.rettime = trace_clock_local();
if (fregs)
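
The hunk above is truncated in this archive. As a usage sketch (illustrative
callbacks, not taken from the patch), an fgraph user that does not need
timestamps would set the new flag in its fgraph_ops before registering it:

	static int my_entry(struct ftrace_graph_ent *trace,
			    struct fgraph_ops *gops)
	{
		return 1;	/* trace this function */
	}

	static void my_return(struct ftrace_graph_ret *trace,
			      struct fgraph_ops *gops)
	{
		/* trace->calltime/rettime stay 0 when timestamps are skipped */
	}

	static struct fgraph_ops my_gops = {
		.entryfunc	= my_entry,
		.retfunc	= my_return,
		.skip_timestamp	= true,	/* the flag added by this patch */
	};

	/* register_ftrace_graph(&my_gops); */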

[PATCH v10 35/36] Documentation: probes: Update fprobe on function-graph tracer

2024-05-07 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Update fprobe documentation for the new fprobe on function-graph
tracer. This includes some behavior changes and the pt_regs to
ftrace_regs interface change.

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v2:
  - Update @fregs parameter explanation.
---
 Documentation/trace/fprobe.rst |   42 ++--
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/Documentation/trace/fprobe.rst b/Documentation/trace/fprobe.rst
index 196f52386aaa..f58bdc64504f 100644
--- a/Documentation/trace/fprobe.rst
+++ b/Documentation/trace/fprobe.rst
@@ -9,9 +9,10 @@ Fprobe - Function entry/exit probe
 Introduction
 
 
-Fprobe is a function entry/exit probe mechanism based on ftrace.
-Instead of using ftrace full feature, if you only want to attach callbacks
-on function entry and exit, similar to the kprobes and kretprobes, you can
+Fprobe is a function entry/exit probe mechanism based on the function-graph
+tracer.
+Instead of tracing all functions, if you want to attach callbacks on specific
+function entry and exit, similar to the kprobes and kretprobes, you can
 use fprobe. Compared with kprobes and kretprobes, fprobe gives faster
 instrumentation for multiple functions with single handler. This document
 describes how to use fprobe.
@@ -91,12 +92,14 @@ The prototype of the entry/exit callback function are as 
follows:
 
 .. code-block:: c
 
- int entry_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long 
ret_ip, struct pt_regs *regs, void *entry_data);
+ int entry_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long 
ret_ip, struct ftrace_regs *fregs, void *entry_data);
 
- void exit_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long 
ret_ip, struct pt_regs *regs, void *entry_data);
+ void exit_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long 
ret_ip, struct ftrace_regs *fregs, void *entry_data);
 
-Note that the @entry_ip is saved at function entry and passed to exit handler.
-If the entry callback function returns !0, the corresponding exit callback 
will be cancelled.
+Note that the @entry_ip is saved at function entry and passed to exit
+handler.
+If the entry callback function returns !0, the corresponding exit callback
+will be cancelled.
 
 @fp
 This is the address of `fprobe` data structure related to this handler.
@@ -112,12 +115,10 @@ If the entry callback function returns !0, the 
corresponding exit callback will
 This is the return address that the traced function will return to,
 somewhere in the caller. This can be used at both entry and exit.
 
-@regs
-This is the `pt_regs` data structure at the entry and exit. Note that
-the instruction pointer of @regs may be different from the @entry_ip
-in the entry_handler. If you need traced instruction pointer, you need
-to use @entry_ip. On the other hand, in the exit_handler, the 
instruction
-pointer of @regs is set to the current return address.
+@fregs
+This is the `ftrace_regs` data structure at the entry and exit. This
+includes the function parameters, or the return values. So user can
+access those values via the appropriate `ftrace_regs_*` APIs.
 
 @entry_data
 This is a local storage to share the data between entry and exit 
handlers.
@@ -125,6 +126,17 @@ If the entry callback function returns !0, the 
corresponding exit callback will
 and `entry_data_size` field when registering the fprobe, the storage is
 allocated and passed to both `entry_handler` and `exit_handler`.
 
+Entry data size and exit handlers on the same function
+======================================================
+
+Since the entry data is passed via the per-task stack and it has a limited
+size, the entry data size per probe is limited to `15 * sizeof(long)`. You
+also need to take care that if different fprobes are probing the same
+function, this limit becomes smaller. The entry data size is aligned to
+`sizeof(long)`, and since each fprobe which has an exit handler uses a
+`sizeof(long)` space on the stack, you should keep the number of fprobes
+on the same function as small as possible.
+
 Share the callbacks with kprobes
 
 
@@ -165,8 +177,8 @@ This counter counts up when;
  - fprobe fails to take ftrace_recursion lock. This usually means that a 
function
which is traced by other ftrace users is called from the entry_handler.
 
- - fprobe fails to setup the function exit because of the shortage of rethook
-   (the shadow stack for hooking the function return.)
+ - fprobe fails to setup the function exit because of failing to allocate the
+   data buffer from the per-task shadow stack.
 
 The `fprobe::nmissed` field counts up in both cases. Therefore, the former
 skips both of entry and exit callback and the latter skips the exit




[PATCH v10 17/36] function_graph: Move graph notrace bit to shadow stack global var

2024-05-07 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

The use of the task->trace_recursion for the logic used for the function
graph no-trace was a bit of an abuse of that variable. Now that there
exists global vars that are per stack for registered graph traces, use
that instead.

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v2:
  - Make description lines shorter than 76 chars.
---
 include/linux/trace_recursion.h  |7 ---
 kernel/trace/trace.h |9 +
 kernel/trace/trace_functions_graph.c |   10 ++
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index fdfb6f66718a..ae04054a1be3 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -44,13 +44,6 @@ enum {
  */
TRACE_IRQ_BIT,
 
-   /*
-* To implement set_graph_notrace, if this bit is set, we ignore
-* function graph tracing of called functions, until the return
-* function is called to clear it.
-*/
-   TRACE_GRAPH_NOTRACE_BIT,
-
/* Used to prevent recursion recording from recursing. */
TRACE_RECORD_RECURSION_BIT,
 };
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 7ab731b9ebc8..f23b6fbd547d 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -918,8 +918,17 @@ enum {
 
TRACE_GRAPH_DEPTH_START_BIT,
TRACE_GRAPH_DEPTH_END_BIT,
+
+   /*
+* To implement set_graph_notrace, if this bit is set, we ignore
+* function graph tracing of called functions, until the return
+* function is called to clear it.
+*/
+   TRACE_GRAPH_NOTRACE_BIT,
 };
 
+#define TRACE_GRAPH_NOTRACE	(1 << TRACE_GRAPH_NOTRACE_BIT)
+
 static inline unsigned long ftrace_graph_depth(unsigned long *task_var)
 {
return (*task_var >> TRACE_GRAPH_DEPTH_START_BIT) & 3;
diff --git a/kernel/trace/trace_functions_graph.c 
b/kernel/trace/trace_functions_graph.c
index 66cce73e94f8..13d0387ac6a6 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -130,6 +130,7 @@ static inline int ftrace_graph_ignore_irqs(void)
 int trace_graph_entry(struct ftrace_graph_ent *trace,
  struct fgraph_ops *gops)
 {
+   unsigned long *task_var = fgraph_get_task_var(gops);
struct trace_array *tr = gops->private;
struct trace_array_cpu *data;
unsigned long flags;
@@ -138,7 +139,7 @@ int trace_graph_entry(struct ftrace_graph_ent *trace,
int ret;
int cpu;
 
-   if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT))
+   if (*task_var & TRACE_GRAPH_NOTRACE)
return 0;
 
/*
@@ -149,7 +150,7 @@ int trace_graph_entry(struct ftrace_graph_ent *trace,
 * returning from the function.
 */
if (ftrace_graph_notrace_addr(trace->func)) {
-   trace_recursion_set(TRACE_GRAPH_NOTRACE_BIT);
+   *task_var |= TRACE_GRAPH_NOTRACE;
/*
 * Need to return 1 to have the return called
 * that will clear the NOTRACE bit.
@@ -240,6 +241,7 @@ void __trace_graph_return(struct trace_array *tr,
 void trace_graph_return(struct ftrace_graph_ret *trace,
struct fgraph_ops *gops)
 {
+   unsigned long *task_var = fgraph_get_task_var(gops);
struct trace_array *tr = gops->private;
struct trace_array_cpu *data;
unsigned long flags;
@@ -249,8 +251,8 @@ void trace_graph_return(struct ftrace_graph_ret *trace,
 
ftrace_graph_addr_finish(gops, trace);
 
-   if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
-   trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
+   if (*task_var & TRACE_GRAPH_NOTRACE) {
+   *task_var &= ~TRACE_GRAPH_NOTRACE;
return;
}
 




[PATCH v10 34/36] selftests/ftrace: Add a test case for repeating register/unregister fprobe

2024-05-07 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

This test case repeatedly defines and undefines the fprobe dynamic event to
ensure that the fprobe does not cause any issues with such operations.

Signed-off-by: Masami Hiramatsu (Google) 
---
 .../test.d/dynevent/add_remove_fprobe_repeat.tc|   19 +++
 1 file changed, 19 insertions(+)
 create mode 100644 
tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc

diff --git 
a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc 
b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc
new file mode 100644
index ..b4ad09237e2a
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc
@@ -0,0 +1,19 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Generic dynamic event - Repeating add/remove fprobe events
+# requires: dynamic_events "f[:[<group>/][<event>]] <func_name>[%return] [<args>]":README
+
+echo 0 > events/enable
+echo > dynamic_events
+
+PLACE=$FUNCTION_FORK
+REPEAT_TIMES=64
+
+for i in `seq 1 $REPEAT_TIMES`; do
+  echo "f:myevent $PLACE" >> dynamic_events
+  grep -q myevent dynamic_events
+  test -d events/fprobes/myevent
+  echo > dynamic_events
+done
+
+clear_trace




[PATCH v10 33/36] selftests: ftrace: Remove obsolete maxactive syntax check

2024-05-07 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Since the fprobe event does not support maxactive anymore, stop
testing the maxactive syntax error checking.

Signed-off-by: Masami Hiramatsu (Google) 
---
 .../ftrace/test.d/dynevent/fprobe_syntax_errors.tc |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git 
a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc 
b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
index 61877d166451..c9425a34fae3 100644
--- a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
@@ -16,9 +16,7 @@ aarch64)
   REG=%r0 ;;
 esac
 
-check_error 'f^100 vfs_read'   # MAXACT_NO_KPROBE
-check_error 'f^1a111 vfs_read' # BAD_MAXACT
-check_error 'f^10 vfs_read'# MAXACT_TOO_BIG
+check_error 'f^100 vfs_read'   # BAD_MAXACT
 
 check_error 'f ^non_exist_func'# BAD_PROBE_ADDR (enoent)
 check_error 'f ^vfs_read+10'   # BAD_PROBE_ADDR




[PATCH v10 32/36] tracing/fprobe: Remove nr_maxactive from fprobe

2024-05-07 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Remove the deprecated fprobe::nr_maxactive field. With this change, fprobe
events reject the maxactive number.
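
For illustration, a minimal sketch of an fprobe user after this change (the
probed symbol and handler names are hypothetical; register_fprobe() is the
existing API from <linux/fprobe.h>):

    static struct fprobe fp = {
        .entry_handler = sample_entry,
        .exit_handler  = sample_exit,
        /* .nr_maxactive is gone: exit contexts no longer come from a
         * fixed-size pool, so there is no max-active count to tune. */
    };

    /* e.g. from a module init function: */
    ret = register_fprobe(&fp, "vfs_read", NULL);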

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v2:
  - Newly added.
---
 include/linux/fprobe.h  |2 --
 kernel/trace/trace_fprobe.c |   44 ++-
 2 files changed, 6 insertions(+), 40 deletions(-)

diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
index 2d06bbd99601..a86b3e4df2a0 100644
--- a/include/linux/fprobe.h
+++ b/include/linux/fprobe.h
@@ -54,7 +54,6 @@ struct fprobe_hlist {
  * @nmissed: The counter for missing events.
  * @flags: The status flag.
  * @entry_data_size: The private data storage size.
- * @nr_maxactive: The max number of active functions. (*deprecated)
  * @entry_handler: The callback function for function entry.
  * @exit_handler: The callback function for function exit.
  * @hlist_array: The fprobe_hlist for fprobe search from IP hash table.
@@ -63,7 +62,6 @@ struct fprobe {
unsigned long   nmissed;
unsigned intflags;
size_t  entry_data_size;
-   int nr_maxactive;
 
fprobe_entry_cb entry_handler;
fprobe_exit_cb  exit_handler;
diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index 86cd6a8c806a..20ef5cd5d419 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -422,7 +422,6 @@ static struct trace_fprobe *alloc_trace_fprobe(const char 
*group,
   const char *event,
   const char *symbol,
   struct tracepoint *tpoint,
-  int maxactive,
   int nargs, bool is_return)
 {
struct trace_fprobe *tf;
@@ -442,7 +441,6 @@ static struct trace_fprobe *alloc_trace_fprobe(const char 
*group,
tf->fp.entry_handler = fentry_dispatcher;
 
tf->tpoint = tpoint;
-   tf->fp.nr_maxactive = maxactive;
 
ret = trace_probe_init(&tf->tp, event, group, false, nargs);
if (ret < 0)
@@ -1021,12 +1019,11 @@ static int __trace_fprobe_create(int argc, const char 
*argv[])
 *  FETCHARG:TYPE : use TYPE instead of unsigned long.
 */
struct trace_fprobe *tf = NULL;
-   int i, len, new_argc = 0, ret = 0;
+   int i, new_argc = 0, ret = 0;
bool is_return = false;
char *symbol = NULL;
const char *event = NULL, *group = FPROBE_EVENT_SYSTEM;
const char **new_argv = NULL;
-   int maxactive = 0;
char buf[MAX_EVENT_NAME_LEN];
char gbuf[MAX_EVENT_NAME_LEN];
char sbuf[KSYM_NAME_LEN];
@@ -1048,33 +1045,13 @@ static int __trace_fprobe_create(int argc, const char 
*argv[])
 
trace_probe_log_init("trace_fprobe", argc, argv);
 
-   event = strchr(&argv[0][1], ':');
-   if (event)
-   event++;
-
-   if (isdigit(argv[0][1])) {
-   if (event)
-   len = event - &argv[0][1] - 1;
-   else
-   len = strlen(&argv[0][1]);
-   if (len > MAX_EVENT_NAME_LEN - 1) {
-   trace_probe_log_err(1, BAD_MAXACT);
-   goto parse_error;
-   }
-   memcpy(buf, &argv[0][1], len);
-   buf[len] = '\0';
-   ret = kstrtouint(buf, 0, &maxactive);
-   if (ret || !maxactive) {
+   if (argv[0][1] != '\0') {
+   if (argv[0][1] != ':') {
+   trace_probe_log_set_index(0);
trace_probe_log_err(1, BAD_MAXACT);
goto parse_error;
}
-   /* fprobe rethook instances are iterated over via a list. The
-* maximum should stay reasonable.
-*/
-   if (maxactive > RETHOOK_MAXACTIVE_MAX) {
-   trace_probe_log_err(1, MAXACT_TOO_BIG);
-   goto parse_error;
-   }
+   event = &argv[0][2];
}
 
trace_probe_log_set_index(1);
@@ -1084,12 +1061,6 @@ static int __trace_fprobe_create(int argc, const char 
*argv[])
if (ret < 0)
goto parse_error;
 
-   if (!is_return && maxactive) {
-   trace_probe_log_set_index(0);
-   trace_probe_log_err(1, BAD_MAXACT_TYPE);
-   goto parse_error;
-   }
-
trace_probe_log_set_index(0);
if (event) {
ret = traceprobe_parse_event_name(, , gbuf,
@@ -1147,8 +1118,7 @@ static int __trace_fprobe_create(int argc, const char 
*argv[])
goto out;
 
/* setup a probe */
-   tf = alloc_trace_fprobe(group, event, symbol, tpoint, maxactive,
-   argc, is_return);
+   tf = alloc_trace_fprobe(group, event, symbol, tpoint, argc, 

[PATCH v10 31/36] fprobe: Rewrite fprobe on function-graph tracer

2024-05-07 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Rewrite the fprobe implementation on top of the function-graph tracer.
Major API changes are:
 -  The 'nr_maxactive' field is deprecated.
 -  This depends on CONFIG_DYNAMIC_FTRACE_WITH_ARGS or
    !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS, and
    CONFIG_HAVE_FUNCTION_GRAPH_FREGS. So it currently works only
    on x86_64.
 -  Currently the entry data size is limited to 15 * sizeof(long).
 -  If too many fprobe exit handlers are set on the same
    function, it will fail to probe.
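
A minimal usage sketch under the new implementation (handler names and the
probed data are hypothetical; the callback signatures are the ftrace_regs
ones used throughout this series):

    struct sample_data { unsigned long entry_time; };

    static int sample_entry(struct fprobe *fp, unsigned long ip,
                            unsigned long ret_ip, struct ftrace_regs *fregs,
                            void *entry_data)
    {
        ((struct sample_data *)entry_data)->entry_time = jiffies;
        return 0;    /* 0: keep the exit handler armed for this call */
    }

    static void sample_exit(struct fprobe *fp, unsigned long ip,
                            unsigned long ret_ip, struct ftrace_regs *fregs,
                            void *entry_data)
    {
        /* entry_data is carried on the fgraph shadow stack, hence the
         * 15 * sizeof(long) entry size limit noted above. */
    }

    static struct fprobe fp = {
        .entry_handler   = sample_entry,
        .exit_handler    = sample_exit,
        .entry_data_size = sizeof(struct sample_data),
    };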

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v9:
  - Remove unneeded prototype of ftrace_regs_get_return_address().
  - Fix entry data address calculation.
  - Remove DIV_ROUND_UP() from hotpath.
 Changes in v8:
  - Use trace_func_graph_ret/ent_t for fgraph_ops.
  - Update CONFIG_FPROBE dependencies.
  - Add ftrace_regs_get_return_address() for each arch.
 Changes in v3:
  - Update for new reserve_data/retrieve_data API.
  - Fix internal push/pop on fgraph data logic so that it can
correctly save/restore the returning fprobes.
 Changes in v2:
  - Add more lockdep_assert_held(fprobe_mutex)
  - Use READ_ONCE() and WRITE_ONCE() for fprobe_hlist_node::fp.
  - Add NOKPROBE_SYMBOL() for the functions which is called from
entry/exit callback.
---
 arch/arm64/include/asm/ftrace.h |6 
 arch/loongarch/include/asm/ftrace.h |6 
 arch/powerpc/include/asm/ftrace.h   |6 
 arch/s390/include/asm/ftrace.h  |6 
 arch/x86/include/asm/ftrace.h   |6 
 include/linux/fprobe.h  |   53 ++-
 kernel/trace/Kconfig|8 
 kernel/trace/fprobe.c   |  638 +--
 lib/test_fprobe.c   |   45 --
 9 files changed, 529 insertions(+), 245 deletions(-)

diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index 95a8f349f871..800c75f46a13 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -143,6 +143,12 @@ ftrace_regs_get_frame_pointer(const struct ftrace_regs 
*fregs)
return fregs->fp;
 }
 
+static __always_inline unsigned long
+ftrace_regs_get_return_address(const struct ftrace_regs *fregs)
+{
+   return fregs->lr;
+}
+
 static __always_inline struct pt_regs *
 ftrace_partial_regs(const struct ftrace_regs *fregs, struct pt_regs *regs)
 {
diff --git a/arch/loongarch/include/asm/ftrace.h 
b/arch/loongarch/include/asm/ftrace.h
index 14a1576bf948..b8432b7cc9d4 100644
--- a/arch/loongarch/include/asm/ftrace.h
+++ b/arch/loongarch/include/asm/ftrace.h
@@ -81,6 +81,12 @@ ftrace_regs_set_instruction_pointer(struct ftrace_regs 
*fregs, unsigned long ip)
 #define ftrace_regs_get_frame_pointer(fregs) \
((fregs)->regs.regs[22])
 
+static __always_inline unsigned long
+ftrace_regs_get_return_address(struct ftrace_regs *fregs)
+{
+   return *(unsigned long *)(fregs->regs.regs[1]);
+}
+
 #define ftrace_graph_func ftrace_graph_func
 void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
   struct ftrace_ops *op, struct ftrace_regs *fregs);
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 51245fd6b45b..d8a74a6570f8 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -77,6 +77,12 @@ ftrace_regs_get_instruction_pointer(struct ftrace_regs 
*fregs)
 #define ftrace_regs_query_register_offset(name) \
regs_query_register_offset(name)
 
+static __always_inline unsigned long
+ftrace_regs_get_return_address(struct ftrace_regs *fregs)
+{
+   return fregs->regs.link;
+}
+
 struct ftrace_ops;
 
 #define ftrace_graph_func ftrace_graph_func
diff --git a/arch/s390/include/asm/ftrace.h b/arch/s390/include/asm/ftrace.h
index cb8d60a5fe1d..d8ca1776c554 100644
--- a/arch/s390/include/asm/ftrace.h
+++ b/arch/s390/include/asm/ftrace.h
@@ -89,6 +89,12 @@ ftrace_regs_get_frame_pointer(struct ftrace_regs *fregs)
return sp[0];   /* return backchain */
 }
 
+static __always_inline unsigned long
+ftrace_regs_get_return_address(const struct ftrace_regs *fregs)
+{
+   return fregs->regs.gprs[14];
+}
+
 #define arch_ftrace_fill_perf_regs(fregs, _regs)do {   \
(_regs)->psw.addr = (fregs)->regs.psw.addr; \
(_regs)->gprs[15] = (fregs)->regs.gprs[15]; \
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 7625887fc49b..979d3458a328 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -82,6 +82,12 @@ arch_ftrace_get_regs(struct ftrace_regs *fregs)
 #define ftrace_regs_get_frame_pointer(fregs) \
frame_pointer(&(fregs)->regs)
 
+static __always_inline unsigned long
+ftrace_regs_get_return_address(struct ftrace_regs *fregs)
+{
+   return *(unsigned long *)ftrace_regs_get_stack_pointer(fregs);
+}
+
 struct ftrace_ops;
 #define ftrace_graph_func ftrace_graph_func
 void ftrace_graph_func(unsigned long ip, unsigned long 

[PATCH v10 30/36] ftrace: Add CONFIG_HAVE_FTRACE_GRAPH_FUNC

2024-05-07 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Add a CONFIG_HAVE_FTRACE_GRAPH_FUNC kconfig symbol in addition to the
ftrace_graph_func macro check. This lets other features (e.g. FPROBE) that
need to access ftrace_regs from fgraph_ops::entryfunc() avoid being built
when fgraph cannot pass a valid ftrace_regs.

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v8:
  - Newly added.
---
 arch/arm64/Kconfig |1 +
 arch/loongarch/Kconfig |1 +
 arch/powerpc/Kconfig   |1 +
 arch/riscv/Kconfig |1 +
 arch/x86/Kconfig   |1 +
 kernel/trace/Kconfig   |5 +
 6 files changed, 10 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8d5047bc13bc..e0a5c69eeda2 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -206,6 +206,7 @@ config ARM64
select HAVE_SAMPLE_FTRACE_DIRECT_MULTI
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_FAST_GUP
+   select HAVE_FTRACE_GRAPH_FUNC
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_TRACER
select HAVE_FUNCTION_ERROR_INJECTION
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 21dc39ae6bc2..0276a6825e6d 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -121,6 +121,7 @@ config LOONGARCH
select HAVE_EFFICIENT_UNALIGNED_ACCESS if !ARCH_STRICT_ALIGN
select HAVE_EXIT_THREAD
select HAVE_FAST_GUP
+   select HAVE_FTRACE_GRAPH_FUNC
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_ARG_ACCESS_API
select HAVE_FUNCTION_ERROR_INJECTION
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1c4be3373686..b79d16c5846a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -237,6 +237,7 @@ config PPC
select HAVE_EBPF_JIT
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_FAST_GUP
+   select HAVE_FTRACE_GRAPH_FUNC
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_ARG_ACCESS_API
select HAVE_FUNCTION_DESCRIPTORSif PPC64_ELF_ABI_V1
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index b58b8e81b510..6fd2a166904b 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -127,6 +127,7 @@ config RISCV
select HAVE_DYNAMIC_FTRACE if !XIP_KERNEL && MMU && 
(CLANG_SUPPORTS_DYNAMIC_FTRACE || GCC_SUPPORTS_DYNAMIC_FTRACE)
select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
select HAVE_DYNAMIC_FTRACE_WITH_REGS if HAVE_DYNAMIC_FTRACE
+   select HAVE_FTRACE_GRAPH_FUNC
select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_GRAPH_FREGS
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4da23dc0b07c..bd86e598e31d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -225,6 +225,7 @@ config X86
select HAVE_EXIT_THREAD
select HAVE_FAST_GUP
select HAVE_FENTRY  if X86_64 || DYNAMIC_FTRACE
+   select HAVE_FTRACE_GRAPH_FUNC   if HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_GRAPH_FREGSif HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_GRAPH_TRACER   if X86_32 || (X86_64 && 
DYNAMIC_FTRACE)
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 0b6ce0a38967..0e4c33f1ab43 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -34,6 +34,11 @@ config HAVE_FUNCTION_GRAPH_TRACER
 config HAVE_FUNCTION_GRAPH_FREGS
bool
 
+config HAVE_FTRACE_GRAPH_FUNC
+   bool
+   help
+ True if ftrace_graph_func() is defined.
+
 config HAVE_DYNAMIC_FTRACE
bool
help




[PATCH v10 29/36] bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled

2024-05-07 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Enable the kprobe_multi feature if CONFIG_FPROBE is enabled. The pt_regs is
converted from ftrace_regs by ftrace_partial_regs(), thus some registers
may always read as 0. But it should be enough for function entry (accessing
arguments) and exit (accessing the return value).

Signed-off-by: Masami Hiramatsu (Google) 
Acked-by: Florent Revest 
---
 Changes in v9:
  - Avoid wasting memory for bpf_kprobe_multi_pt_regs when
CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST=y
---
 kernel/trace/bpf_trace.c |   27 ++-
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index e51a6ef87167..b779f4a83361 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2577,7 +2577,7 @@ static int __init bpf_event_init(void)
 fs_initcall(bpf_event_init);
 #endif /* CONFIG_MODULES */
 
-#if defined(CONFIG_FPROBE) && defined(CONFIG_DYNAMIC_FTRACE_WITH_REGS)
+#ifdef CONFIG_FPROBE
 struct bpf_kprobe_multi_link {
struct bpf_link link;
struct fprobe fp;
@@ -2600,6 +2600,13 @@ struct user_syms {
char *buf;
 };
 
+#ifndef CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST
+static DEFINE_PER_CPU(struct pt_regs, bpf_kprobe_multi_pt_regs);
+#define bpf_kprobe_multi_pt_regs_ptr() this_cpu_ptr(&bpf_kprobe_multi_pt_regs)
+#else
+#define bpf_kprobe_multi_pt_regs_ptr() (NULL)
+#endif
+
 static int copy_user_syms(struct user_syms *us, unsigned long __user *usyms, 
u32 cnt)
 {
unsigned long __user usymbol;
@@ -2792,13 +2799,14 @@ static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx 
*ctx)
 
 static int
 kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
-  unsigned long entry_ip, struct pt_regs *regs)
+  unsigned long entry_ip, struct ftrace_regs *fregs)
 {
struct bpf_kprobe_multi_run_ctx run_ctx = {
.link = link,
.entry_ip = entry_ip,
};
struct bpf_run_ctx *old_run_ctx;
+   struct pt_regs *regs;
int err;
 
if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
@@ -2809,6 +2817,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link 
*link,
 
migrate_disable();
rcu_read_lock();
+   regs = ftrace_partial_regs(fregs, bpf_kprobe_multi_pt_regs_ptr());
old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
err = bpf_prog_run(link->link.prog, regs);
bpf_reset_run_ctx(old_run_ctx);
@@ -2826,13 +2835,9 @@ kprobe_multi_link_handler(struct fprobe *fp, unsigned 
long fentry_ip,
  void *data)
 {
struct bpf_kprobe_multi_link *link;
-   struct pt_regs *regs = ftrace_get_regs(fregs);
-
-   if (!regs)
-   return 0;
 
link = container_of(fp, struct bpf_kprobe_multi_link, fp);
-   kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
+   kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), fregs);
return 0;
 }
 
@@ -2842,13 +2847,9 @@ kprobe_multi_link_exit_handler(struct fprobe *fp, 
unsigned long fentry_ip,
   void *data)
 {
struct bpf_kprobe_multi_link *link;
-   struct pt_regs *regs = ftrace_get_regs(fregs);
-
-   if (!regs)
-   return;
 
link = container_of(fp, struct bpf_kprobe_multi_link, fp);
-   kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
+   kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), fregs);
 }
 
 static int symbols_cmp_r(const void *a, const void *b, const void *priv)
@@ -3107,7 +3108,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr 
*attr, struct bpf_prog *pr
kvfree(cookies);
return err;
 }
-#else /* !CONFIG_FPROBE || !CONFIG_DYNAMIC_FTRACE_WITH_REGS */
+#else /* !CONFIG_FPROBE */
 int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog 
*prog)
 {
return -EOPNOTSUPP;




[PATCH v10 28/36] tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS

2024-05-07 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Allow fprobe events to be enabled with CONFIG_DYNAMIC_FTRACE_WITH_ARGS.
With this change, fprobe events mostly use ftrace_regs instead of pt_regs.
Note that if the arch doesn't enable HAVE_PT_REGS_COMPAT_FTRACE_REGS,
fprobe events cannot be used from perf.

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v9:
  - Copy store_trace_entry_data() as store_fprobe_entry_data() for
fprobe.
 Changes in v3:
  - Use ftrace_regs_get_return_value().
 Changes in v2:
  - Define ftrace_regs_get_kernel_stack_nth() for
!CONFIG_HAVE_REGS_AND_STACK_ACCESS_API.
 Changes from previous series: Update against the new series.
---
 include/linux/ftrace.h  |   17 ++
 kernel/trace/Kconfig|1 
 kernel/trace/trace_fprobe.c |  107 +--
 kernel/trace/trace_probe_tmpl.h |2 -
 4 files changed, 86 insertions(+), 41 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 3871823c1429..64ca91d1527f 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -256,6 +256,23 @@ static __always_inline bool ftrace_regs_has_args(struct 
ftrace_regs *fregs)
frame_pointer(&(fregs)->regs)
 #endif
 
+#ifdef CONFIG_HAVE_REGS_AND_STACK_ACCESS_API
+static __always_inline unsigned long
+ftrace_regs_get_kernel_stack_nth(struct ftrace_regs *fregs, unsigned int nth)
+{
+   unsigned long *stackp;
+
+   stackp = (unsigned long *)ftrace_regs_get_stack_pointer(fregs);
+   if (((unsigned long)(stackp + nth) & ~(THREAD_SIZE - 1)) ==
+   ((unsigned long)stackp & ~(THREAD_SIZE - 1)))
+   return *(stackp + nth);
+
+   return 0;
+}
+#else /* !CONFIG_HAVE_REGS_AND_STACK_ACCESS_API */
+#define ftrace_regs_get_kernel_stack_nth(fregs, nth)   (0L)
+#endif /* CONFIG_HAVE_REGS_AND_STACK_ACCESS_API */
+
 typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip,
  struct ftrace_ops *op, struct ftrace_regs *fregs);
 
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 7df7b1fb305c..0b6ce0a38967 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -680,7 +680,6 @@ config FPROBE_EVENTS
select TRACING
select PROBE_EVENTS
select DYNAMIC_EVENTS
-   depends on DYNAMIC_FTRACE_WITH_REGS
default y
help
  This allows user to add tracing events on the function entry and
diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index 273cdf3cf70c..86cd6a8c806a 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -133,7 +133,7 @@ static int
 process_fetch_insn(struct fetch_insn *code, void *rec, void *edata,
   void *dest, void *base)
 {
-   struct pt_regs *regs = rec;
+   struct ftrace_regs *fregs = rec;
unsigned long val;
int ret;
 
@@ -141,17 +141,17 @@ process_fetch_insn(struct fetch_insn *code, void *rec, 
void *edata,
/* 1st stage: get value from context */
switch (code->op) {
case FETCH_OP_STACK:
-   val = regs_get_kernel_stack_nth(regs, code->param);
+   val = ftrace_regs_get_kernel_stack_nth(fregs, code->param);
break;
case FETCH_OP_STACKP:
-   val = kernel_stack_pointer(regs);
+   val = ftrace_regs_get_stack_pointer(fregs);
break;
case FETCH_OP_RETVAL:
-   val = regs_return_value(regs);
+   val = ftrace_regs_get_return_value(fregs);
break;
 #ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
case FETCH_OP_ARG:
-   val = regs_get_kernel_argument(regs, code->param);
+   val = ftrace_regs_get_argument(fregs, code->param);
break;
case FETCH_OP_EDATA:
val = *(unsigned long *)((unsigned long)edata + code->offset);
@@ -174,7 +174,7 @@ NOKPROBE_SYMBOL(process_fetch_insn)
 /* function entry handler */
 static nokprobe_inline void
 __fentry_trace_func(struct trace_fprobe *tf, unsigned long entry_ip,
-   struct pt_regs *regs,
+   struct ftrace_regs *fregs,
struct trace_event_file *trace_file)
 {
struct fentry_trace_entry_head *entry;
@@ -188,41 +188,71 @@ __fentry_trace_func(struct trace_fprobe *tf, unsigned 
long entry_ip,
if (trace_trigger_soft_disabled(trace_file))
return;
 
-   dsize = __get_data_size(&tf->tp, regs, NULL);
+   dsize = __get_data_size(&tf->tp, fregs, NULL);
 
entry = trace_event_buffer_reserve(&fbuffer, trace_file,
   sizeof(*entry) + tf->tp.size + 
dsize);
if (!entry)
return;
 
-   fbuffer.regs = regs;
+   fbuffer.regs = ftrace_get_regs(fregs);
entry = fbuffer.entry = ring_buffer_event_data(fbuffer.event);
entry->ip = entry_ip;
-   store_trace_args(&entry[1], &tf->tp, regs, NULL,

[PATCH v10 27/36] tracing: Add ftrace_fill_perf_regs() for perf event

2024-05-07 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Add ftrace_fill_perf_regs(), which should be compatible with
perf_fetch_caller_regs(). In other words, the pt_regs returned from
ftrace_fill_perf_regs() must satisfy 'user_mode(regs) == false' and be
usable for stack tracing.
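
A sketch of the intended call pattern on the perf side (the on-stack pt_regs
is only for illustration; real users may prefer per-CPU storage):

    struct pt_regs regs_storage;
    struct pt_regs *regs = ftrace_fill_perf_regs(fregs, &regs_storage);

    /* user_mode(regs) is now false and the arch-specific fields (ip, sp,
     * ...) are filled, so regs can seed a kernel stack trace. */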

Signed-off-by: Masami Hiramatsu (Google) 
---
  Changes from previous series: NOTHING, just forward ported.
---
 arch/arm64/include/asm/ftrace.h   |7 +++
 arch/powerpc/include/asm/ftrace.h |7 +++
 arch/s390/include/asm/ftrace.h|5 +
 arch/x86/include/asm/ftrace.h |7 +++
 include/linux/ftrace.h|   31 +++
 5 files changed, 57 insertions(+)

diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index aab2b7a0f78c..95a8f349f871 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -154,6 +154,13 @@ ftrace_partial_regs(const struct ftrace_regs *fregs, 
struct pt_regs *regs)
return regs;
 }
 
+#define arch_ftrace_fill_perf_regs(fregs, _regs) do {  \
+   (_regs)->pc = (fregs)->pc;  \
+   (_regs)->regs[29] = (fregs)->fp;\
+   (_regs)->sp = (fregs)->sp;  \
+   (_regs)->pstate = PSR_MODE_EL1h;\
+   } while (0)
+
 int ftrace_regs_query_register_offset(const char *name);
 
 int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index cfec6c5a47d0..51245fd6b45b 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -44,6 +44,13 @@ static __always_inline struct pt_regs 
*arch_ftrace_get_regs(struct ftrace_regs *
return fregs->regs.msr ? &fregs->regs : NULL;
 }
 
+#define arch_ftrace_fill_perf_regs(fregs, _regs) do {  \
+   (_regs)->result = 0;\
+   (_regs)->nip = (fregs)->regs.nip;   \
+   (_regs)->gpr[1] = (fregs)->regs.gpr[1]; \
+   asm volatile("mfmsr %0" : "=r" ((_regs)->msr)); \
+   } while (0)
+
 static __always_inline void
 ftrace_regs_set_instruction_pointer(struct ftrace_regs *fregs,
unsigned long ip)
diff --git a/arch/s390/include/asm/ftrace.h b/arch/s390/include/asm/ftrace.h
index 9f8cc6d13bec..cb8d60a5fe1d 100644
--- a/arch/s390/include/asm/ftrace.h
+++ b/arch/s390/include/asm/ftrace.h
@@ -89,6 +89,11 @@ ftrace_regs_get_frame_pointer(struct ftrace_regs *fregs)
return sp[0];   /* return backchain */
 }
 
+#define arch_ftrace_fill_perf_regs(fregs, _regs)do {   \
+   (_regs)->psw.addr = (fregs)->regs.psw.addr; \
+   (_regs)->gprs[15] = (fregs)->regs.gprs[15]; \
+   } while (0)
+
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 /*
  * When an ftrace registered caller is tracing a function that is
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 8d6db2b7d03a..7625887fc49b 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -54,6 +54,13 @@ arch_ftrace_get_regs(struct ftrace_regs *fregs)
return &fregs->regs;
 }
 
+#define arch_ftrace_fill_perf_regs(fregs, _regs) do {  \
+   (_regs)->ip = (fregs)->regs.ip; \
+   (_regs)->sp = (fregs)->regs.sp; \
+   (_regs)->cs = __KERNEL_CS;  \
+   (_regs)->flags = 0; \
+   } while (0)
+
 #define ftrace_regs_set_instruction_pointer(fregs, _ip)\
do { (fregs)->regs.ip = (_ip); } while (0)
 
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 914f451b0d69..3871823c1429 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -194,6 +194,37 @@ ftrace_partial_regs(struct ftrace_regs *fregs, struct 
pt_regs *regs)
 
 #endif /* !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS || 
CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST */
 
+#ifdef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
+
+/*
+ * Please define arch dependent pt_regs which compatible to the
+ * perf_arch_fetch_caller_regs() but based on ftrace_regs.
+ * This requires
+ *   - user_mode(_regs) returns false (always kernel mode).
+ *   - able to use the _regs for stack trace.
+ */
+#ifndef arch_ftrace_fill_perf_regs
+/* As same as perf_arch_fetch_caller_regs(), do nothing by default */
+#define arch_ftrace_fill_perf_regs(fregs, _regs) do {} while (0)
+#endif
+
+static __always_inline struct pt_regs *
+ftrace_fill_perf_regs(struct ftrace_regs *fregs, struct pt_regs *regs)
+{
+   arch_ftrace_fill_perf_regs(fregs, regs);
+   return regs;
+}
+
+#else /* !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS */
+
+static __always_inline struct pt_regs *
+ftrace_fill_perf_regs(struct ftrace_regs *fregs, struct pt_regs *regs)
+{
return &fregs->regs;
+}
+
+#endif
+
 /*
  * When true, the 

[PATCH v10 26/36] tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs

2024-05-07 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Add ftrace_partial_regs(), which converts ftrace_regs to pt_regs. This is
for eBPF, which needs to keep the same pt_regs interface for accessing
registers. It will be used when replacing pt_regs with ftrace_regs in
fprobes (which the kprobe_multi eBPF event uses).

If the architecture defines its own ftrace_regs, this copies the partial
registers to pt_regs and returns it. If not, ftrace_regs is the same as
pt_regs and ftrace_partial_regs() will return ftrace_regs::regs.
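
A sketch of a caller (the scratch pt_regs is the caller's responsibility on
arches that define their own ftrace_regs; elsewhere it is simply unused):

    struct pt_regs copy;    /* scratch storage; per-CPU in real users */
    struct pt_regs *regs = ftrace_partial_regs(fregs, &copy);

    /* On arm64, regs now carries x0-x8, sp, pc, fp (x29) and lr (x30);
     * the remaining registers are not populated. */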

Signed-off-by: Masami Hiramatsu (Google) 
Acked-by: Florent Revest 
---
 Changes in v8:
 - Add the reason why this is required to the changelog.
 Changes from previous series: NOTHING, just forward ported.
---
 arch/arm64/include/asm/ftrace.h |   11 +++
 include/linux/ftrace.h  |   17 +
 2 files changed, 28 insertions(+)

diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index ac82dc43a57d..aab2b7a0f78c 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -143,6 +143,17 @@ ftrace_regs_get_frame_pointer(const struct ftrace_regs 
*fregs)
return fregs->fp;
 }
 
+static __always_inline struct pt_regs *
+ftrace_partial_regs(const struct ftrace_regs *fregs, struct pt_regs *regs)
+{
+   memcpy(regs->regs, fregs->regs, sizeof(u64) * 9);
+   regs->sp = fregs->sp;
+   regs->pc = fregs->pc;
+   regs->regs[29] = fregs->fp;
+   regs->regs[30] = fregs->lr;
+   return regs;
+}
+
 int ftrace_regs_query_register_offset(const char *name);
 
 int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index ff2966a1b529..914f451b0d69 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -177,6 +177,23 @@ static __always_inline struct pt_regs 
*ftrace_get_regs(struct ftrace_regs *fregs
return arch_ftrace_get_regs(fregs);
 }
 
+#if !defined(CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS) || \
+   defined(CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST)
+
+static __always_inline struct pt_regs *
+ftrace_partial_regs(struct ftrace_regs *fregs, struct pt_regs *regs)
+{
+   /*
+* If CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST=y, ftrace_regs memory
+* layout is the same as pt_regs. So always returns that address.
+* Since arch_ftrace_get_regs() will check some members and may return
+* NULL, we can not use it.
+*/
return &fregs->regs;
+}
+
+#endif /* !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS || 
CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST */
+
 /*
  * When true, the ftrace_regs_{get,set}_*() functions may be used on fregs.
  * Note: this can be true even when ftrace_get_regs() cannot provide a pt_regs.




[PATCH v10 25/36] fprobe: Use ftrace_regs in fprobe exit handler

2024-05-07 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Change the fprobe exit handler to use the ftrace_regs structure instead of
pt_regs. This also introduces HAVE_PT_REGS_TO_FTRACE_REGS_CAST, which means
the ftrace_regs memory layout is identical to pt_regs, so the two can be
cast to each other. Fprobe gains a new dependency on this.
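
In short, with HAVE_PT_REGS_TO_FTRACE_REGS_CAST set the two layouts are
asserted to match, so a conversion is just a cast (a sketch of what the
config permits, not code from the patch):

    /* valid only when CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST=y */
    struct ftrace_regs *fregs = (struct ftrace_regs *)regs;
    struct pt_regs *back = (struct pt_regs *)fregs;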

Signed-off-by: Masami Hiramatsu (Google) 
---
  Changes in v3:
   - Use ftrace_regs_get_return_value()
  Changes from previous series: NOTHING, just forward ported.
---
 arch/loongarch/Kconfig  |1 +
 arch/s390/Kconfig   |1 +
 arch/x86/Kconfig|1 +
 include/linux/fprobe.h  |2 +-
 include/linux/ftrace.h  |6 ++
 kernel/trace/Kconfig|8 
 kernel/trace/bpf_trace.c|6 +-
 kernel/trace/fprobe.c   |3 ++-
 kernel/trace/trace_fprobe.c |6 +-
 lib/test_fprobe.c   |6 +++---
 samples/fprobe/fprobe_example.c |2 +-
 11 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 6897eacac063..21dc39ae6bc2 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -114,6 +114,7 @@ config LOONGARCH
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_ARGS
+   select HAVE_PT_REGS_TO_FTRACE_REGS_CAST
select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_EBPF_JIT
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 1ed25b72eb47..ce85b3165065 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -170,6 +170,7 @@ config S390
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_ARGS
+   select HAVE_PT_REGS_TO_FTRACE_REGS_CAST
select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_EBPF_JIT if HAVE_MARCH_Z196_FEATURES
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ea198adb11d2..4da23dc0b07c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -215,6 +215,7 @@ config X86
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_DYNAMIC_FTRACE_WITH_ARGSif X86_64
+   select HAVE_PT_REGS_TO_FTRACE_REGS_CAST if X86_64
select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
select HAVE_SAMPLE_FTRACE_DIRECTif X86_64
select HAVE_SAMPLE_FTRACE_DIRECT_MULTI  if X86_64
diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
index ca64ee5e45d2..ef609bcca0f9 100644
--- a/include/linux/fprobe.h
+++ b/include/linux/fprobe.h
@@ -14,7 +14,7 @@ typedef int (*fprobe_entry_cb)(struct fprobe *fp, unsigned 
long entry_ip,
   void *entry_data);
 
 typedef void (*fprobe_exit_cb)(struct fprobe *fp, unsigned long entry_ip,
-  unsigned long ret_ip, struct pt_regs *regs,
+  unsigned long ret_ip, struct ftrace_regs *regs,
   void *entry_data);
 
 /**
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 1c185fefd932..ff2966a1b529 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -163,6 +163,12 @@ struct ftrace_regs {
 #define ftrace_regs_set_instruction_pointer(fregs, ip) do { } while (0)
 #endif /* CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS */
 
+#ifdef CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST
+
+static_assert(sizeof(struct pt_regs) == sizeof(struct ftrace_regs));
+
+#endif /* CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST */
+
 static __always_inline struct pt_regs *ftrace_get_regs(struct ftrace_regs 
*fregs)
 {
if (!fregs)
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 4a850250dab4..7df7b1fb305c 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -57,6 +57,13 @@ config HAVE_DYNAMIC_FTRACE_WITH_ARGS
 This allows for use of ftrace_regs_get_argument() and
 ftrace_regs_get_stack_pointer().
 
+config HAVE_PT_REGS_TO_FTRACE_REGS_CAST
+   bool
+   help
+If this is set, the memory layout of the ftrace_regs data structure
+is the same as the pt_regs. So the pt_regs is possible to be casted
+to ftrace_regs.
+
 config HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
bool
help
@@ -288,6 +295,7 @@ config FPROBE
bool "Kernel Function Probe (fprobe)"
depends on FUNCTION_TRACER
depends on DYNAMIC_FTRACE_WITH_REGS || DYNAMIC_FTRACE_WITH_ARGS
+   depends on HAVE_PT_REGS_TO_FTRACE_REGS_CAST || 
!HAVE_DYNAMIC_FTRACE_WITH_ARGS
depends on HAVE_RETHOOK
select RETHOOK
default n
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7837cf4e39d9..e51a6ef87167 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2838,10 +2838,14 @@ kprobe_multi_link_handler(struct fprobe *fp, unsigned 
long fentry_ip,
 
 static void
 

[PATCH v10 24/36] fprobe: Use ftrace_regs in fprobe entry handler

2024-05-07 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

This allows fprobes to be available with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
instead of CONFIG_DYNAMIC_FTRACE_WITH_REGS, so that we can enable fprobe
on arm64.
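
With this change, handlers that only need a few values can use the
ftrace_regs accessors directly instead of converting to pt_regs (a sketch;
the handler name is hypothetical):

    static int sample_entry(struct fprobe *fp, unsigned long ip,
                            unsigned long ret_ip, struct ftrace_regs *fregs,
                            void *data)
    {
        /* first function argument, via the WITH_ARGS accessor */
        unsigned long arg0 = ftrace_regs_get_argument(fregs, 0);

        pr_debug("%ps(0x%lx)\n", (void *)ip, arg0);
        return 0;
    }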

Signed-off-by: Masami Hiramatsu (Google) 
Acked-by: Florent Revest 
---
 Changes in v6:
  - Keep using SAVE_REGS flag to avoid breaking bpf kprobe-multi test.
---
 include/linux/fprobe.h  |2 +-
 kernel/trace/Kconfig|3 ++-
 kernel/trace/bpf_trace.c|   10 +++---
 kernel/trace/fprobe.c   |3 ++-
 kernel/trace/trace_fprobe.c |6 +-
 lib/test_fprobe.c   |4 ++--
 samples/fprobe/fprobe_example.c |2 +-
 7 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
index f39869588117..ca64ee5e45d2 100644
--- a/include/linux/fprobe.h
+++ b/include/linux/fprobe.h
@@ -10,7 +10,7 @@
 struct fprobe;
 
 typedef int (*fprobe_entry_cb)(struct fprobe *fp, unsigned long entry_ip,
-  unsigned long ret_ip, struct pt_regs *regs,
+  unsigned long ret_ip, struct ftrace_regs *regs,
   void *entry_data);
 
 typedef void (*fprobe_exit_cb)(struct fprobe *fp, unsigned long entry_ip,
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index f909792713ff..4a850250dab4 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -287,7 +287,7 @@ config DYNAMIC_FTRACE_WITH_ARGS
 config FPROBE
bool "Kernel Function Probe (fprobe)"
depends on FUNCTION_TRACER
-   depends on DYNAMIC_FTRACE_WITH_REGS
+   depends on DYNAMIC_FTRACE_WITH_REGS || DYNAMIC_FTRACE_WITH_ARGS
depends on HAVE_RETHOOK
select RETHOOK
default n
@@ -672,6 +672,7 @@ config FPROBE_EVENTS
select TRACING
select PROBE_EVENTS
select DYNAMIC_EVENTS
+   depends on DYNAMIC_FTRACE_WITH_REGS
default y
help
  This allows user to add tracing events on the function entry and
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 9dc605f08a23..7837cf4e39d9 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2577,7 +2577,7 @@ static int __init bpf_event_init(void)
 fs_initcall(bpf_event_init);
 #endif /* CONFIG_MODULES */
 
-#ifdef CONFIG_FPROBE
+#if defined(CONFIG_FPROBE) && defined(CONFIG_DYNAMIC_FTRACE_WITH_REGS)
 struct bpf_kprobe_multi_link {
struct bpf_link link;
struct fprobe fp;
@@ -2822,10 +2822,14 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link 
*link,
 
 static int
 kprobe_multi_link_handler(struct fprobe *fp, unsigned long fentry_ip,
- unsigned long ret_ip, struct pt_regs *regs,
+ unsigned long ret_ip, struct ftrace_regs *fregs,
  void *data)
 {
struct bpf_kprobe_multi_link *link;
+   struct pt_regs *regs = ftrace_get_regs(fregs);
+
+   if (!regs)
+   return 0;
 
link = container_of(fp, struct bpf_kprobe_multi_link, fp);
kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
@@ -3099,7 +3103,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr 
*attr, struct bpf_prog *pr
kvfree(cookies);
return err;
 }
-#else /* !CONFIG_FPROBE */
+#else /* !CONFIG_FPROBE || !CONFIG_DYNAMIC_FTRACE_WITH_REGS */
 int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog 
*prog)
 {
return -EOPNOTSUPP;
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index 9ff018245840..3d3789283873 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -46,7 +46,7 @@ static inline void __fprobe_handler(unsigned long ip, 
unsigned long parent_ip,
}
 
if (fp->entry_handler)
-   ret = fp->entry_handler(fp, ip, parent_ip, 
ftrace_get_regs(fregs), entry_data);
+   ret = fp->entry_handler(fp, ip, parent_ip, fregs, entry_data);
 
/* If entry_handler returns !0, nmissed is not counted. */
if (rh) {
@@ -182,6 +182,7 @@ static void fprobe_init(struct fprobe *fp)
fp->ops.func = fprobe_kprobe_handler;
else
fp->ops.func = fprobe_handler;
+
fp->ops.flags |= FTRACE_OPS_FL_SAVE_REGS;
 }
 
diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index 62e6a8f4aae9..b2c20d4fdfd7 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -338,12 +338,16 @@ NOKPROBE_SYMBOL(fexit_perf_func);
 #endif /* CONFIG_PERF_EVENTS */
 
 static int fentry_dispatcher(struct fprobe *fp, unsigned long entry_ip,
-unsigned long ret_ip, struct pt_regs *regs,
+unsigned long ret_ip, struct ftrace_regs *fregs,
 void *entry_data)
 {
struct trace_fprobe *tf = container_of(fp, struct trace_fprobe, fp);
+   struct pt_regs *regs = ftrace_get_regs(fregs);
int ret = 

[PATCH v10 23/36] function_graph: Pass ftrace_regs to retfunc

2024-05-07 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Pass ftrace_regs to fgraph_ops::retfunc(). If ftrace_regs is not
available, it passes NULL instead. The user callback function can access
some registers (including the return address) via this ftrace_regs.
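
A sketch of a return callback updated for the new signature (the name is
hypothetical; note the NULL check, since not every arch can supply fregs):

    static void sample_retfunc(struct ftrace_graph_ret *trace,
                               struct fgraph_ops *gops,
                               struct ftrace_regs *fregs)
    {
        if (fregs)
            pr_debug("retval=0x%lx\n",
                     ftrace_regs_get_return_value(fregs));
    }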

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v8:
  - Pass ftrace_regs to retfunc, instead of adding retregfunc.
 Changes in v6:
  - update to use ftrace_regs_get_return_value() because of reordering
patches.
 Changes in v3:
  - Update for new multiple fgraph.
  - Save the return address to instruction pointer in ftrace_regs.
---
 include/linux/ftrace.h   |3 ++-
 kernel/trace/fgraph.c|   14 ++
 kernel/trace/ftrace.c|3 ++-
 kernel/trace/trace.h |3 ++-
 kernel/trace/trace_functions_graph.c |7 ---
 kernel/trace/trace_irqsoff.c |3 ++-
 kernel/trace/trace_sched_wakeup.c|3 ++-
 kernel/trace/trace_selftest.c|3 ++-
 8 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 0af83b115886..1c185fefd932 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1067,7 +1067,8 @@ struct fgraph_ops;
 
 /* Type of the callback handlers for tracing function graph*/
 typedef void (*trace_func_graph_ret_t)(struct ftrace_graph_ret *,
-  struct fgraph_ops *); /* return */
+  struct fgraph_ops *,
+  struct ftrace_regs *); /* return */
 typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *,
  struct fgraph_ops *,
  struct ftrace_regs *); /* entry */
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index d520b2b78918..40f47fcbc6c3 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -264,7 +264,8 @@ static int entry_run(struct ftrace_graph_ent *trace, struct 
fgraph_ops *ops,
 }
 
 /* ftrace_graph_return set to this to tell some archs to run function graph */
-static void return_run(struct ftrace_graph_ret *trace, struct fgraph_ops *ops)
+static void return_run(struct ftrace_graph_ret *trace, struct fgraph_ops *ops,
+  struct ftrace_regs *fregs)
 {
 }
 
@@ -455,7 +456,8 @@ int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace,
 }
 
 static void ftrace_graph_ret_stub(struct ftrace_graph_ret *trace,
- struct fgraph_ops *gops)
+ struct fgraph_ops *gops,
+ struct ftrace_regs *fregs)
 {
 }
 
@@ -799,6 +801,9 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, 
unsigned long frame_pointe
}
 
trace.rettime = trace_clock_local();
+   if (fregs)
+   ftrace_regs_set_instruction_pointer(fregs, ret);
+
 #ifdef CONFIG_FUNCTION_GRAPH_RETVAL
trace.retval = ftrace_regs_get_return_value(fregs);
 #endif
@@ -812,7 +817,7 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, 
unsigned long frame_pointe
if (gops == &fgraph_stub)
continue;
 
-   gops->retfunc(&trace, gops);
+   gops->retfunc(&trace, gops, fregs);
}
 
/*
@@ -976,7 +981,8 @@ void ftrace_graph_sleep_time_control(bool enable)
  * Simply points to ftrace_stub, but with the proper protocol.
  * Defined by the linker script in linux/vmlinux.lds.h
  */
-void ftrace_stub_graph(struct ftrace_graph_ret *trace, struct fgraph_ops 
*gops);
+void ftrace_stub_graph(struct ftrace_graph_ret *trace, struct fgraph_ops *gops,
+  struct ftrace_regs *fregs);
 
 /* The callbacks that hook a function */
 trace_func_graph_ret_t ftrace_graph_return = ftrace_stub_graph;
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 5377a0b22ec9..e869258efc52 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -835,7 +835,8 @@ static int profile_graph_entry(struct ftrace_graph_ent 
*trace,
 }
 
 static void profile_graph_return(struct ftrace_graph_ret *trace,
-struct fgraph_ops *gops)
+struct fgraph_ops *gops,
+struct ftrace_regs *fregs)
 {
struct ftrace_ret_stack *ret_stack;
struct ftrace_profile_stat *stat;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 8221b6febb51..81cb2a90cbda 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -681,7 +681,8 @@ void trace_latency_header(struct seq_file *m);
 void trace_default_header(struct seq_file *m);
 void print_trace_header(struct seq_file *m, struct trace_iterator *iter);
 
-void trace_graph_return(struct ftrace_graph_ret *trace, struct fgraph_ops 
*gops);
+void trace_graph_return(struct ftrace_graph_ret *trace, struct fgraph_ops 
*gops,
+   struct ftrace_regs *fregs);
 int trace_graph_entry(struct 

[PATCH v10 22/36] function_graph: Replace fgraph_ret_regs with ftrace_regs

2024-05-07 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Use ftrace_regs instead of fgraph_ret_regs for tracing the return value
in the function_graph tracer, to simplify the callback interface.

CONFIG_HAVE_FUNCTION_GRAPH_RETVAL is also replaced by
CONFIG_HAVE_FUNCTION_GRAPH_FREGS.

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v8:
  - Newly added.
---
 arch/arm64/Kconfig  |2 +-
 arch/arm64/include/asm/ftrace.h |   23 ++-
 arch/arm64/kernel/asm-offsets.c |   12 
 arch/arm64/kernel/entry-ftrace.S|   32 ++--
 arch/loongarch/Kconfig  |2 +-
 arch/loongarch/include/asm/ftrace.h |   24 ++--
 arch/loongarch/kernel/asm-offsets.c |   12 
 arch/loongarch/kernel/mcount.S  |   17 ++---
 arch/loongarch/kernel/mcount_dyn.S  |   14 +++---
 arch/riscv/Kconfig  |2 +-
 arch/riscv/include/asm/ftrace.h |   21 -
 arch/riscv/kernel/mcount.S  |   24 +---
 arch/s390/Kconfig   |2 +-
 arch/s390/include/asm/ftrace.h  |   26 +-
 arch/s390/kernel/asm-offsets.c  |6 --
 arch/s390/kernel/mcount.S   |9 +
 arch/x86/Kconfig|2 +-
 arch/x86/include/asm/ftrace.h   |   22 ++
 arch/x86/kernel/ftrace_32.S |   15 +--
 arch/x86/kernel/ftrace_64.S |   17 +
 include/linux/ftrace.h  |   14 +++---
 kernel/trace/Kconfig|4 ++--
 kernel/trace/fgraph.c   |   21 +
 23 files changed, 117 insertions(+), 206 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7b11c98b3e84..8d5047bc13bc 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -209,7 +209,7 @@ config ARM64
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_TRACER
select HAVE_FUNCTION_ERROR_INJECTION
-   select HAVE_FUNCTION_GRAPH_RETVAL if HAVE_FUNCTION_GRAPH_TRACER
+   select HAVE_FUNCTION_GRAPH_FREGS
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_GCC_PLUGINS
select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && \
diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index ab158196480c..ac82dc43a57d 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -137,6 +137,12 @@ ftrace_override_function_with_return(struct ftrace_regs 
*fregs)
fregs->pc = fregs->lr;
 }
 
+static __always_inline unsigned long
+ftrace_regs_get_frame_pointer(const struct ftrace_regs *fregs)
+{
+   return fregs->fp;
+}
+
 int ftrace_regs_query_register_offset(const char *name);
 
 int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
@@ -194,23 +200,6 @@ static inline bool arch_syscall_match_sym_name(const char 
*sym,
 
 #ifndef __ASSEMBLY__
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-struct fgraph_ret_regs {
-   /* x0 - x7 */
-   unsigned long regs[8];
-
-   unsigned long fp;
-   unsigned long __unused;
-};
-
-static inline unsigned long fgraph_ret_regs_return_value(struct 
fgraph_ret_regs *ret_regs)
-{
-   return ret_regs->regs[0];
-}
-
-static inline unsigned long fgraph_ret_regs_frame_pointer(struct 
fgraph_ret_regs *ret_regs)
-{
-   return ret_regs->fp;
-}
 
 void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
   unsigned long frame_pointer);
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 81496083c041..81bb6704ff5a 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -200,18 +200,6 @@ int main(void)
   DEFINE(FTRACE_OPS_FUNC,  offsetof(struct ftrace_ops, func));
 #endif
   BLANK();
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-  DEFINE(FGRET_REGS_X0,offsetof(struct 
fgraph_ret_regs, regs[0]));
-  DEFINE(FGRET_REGS_X1,offsetof(struct 
fgraph_ret_regs, regs[1]));
-  DEFINE(FGRET_REGS_X2,offsetof(struct 
fgraph_ret_regs, regs[2]));
-  DEFINE(FGRET_REGS_X3,offsetof(struct 
fgraph_ret_regs, regs[3]));
-  DEFINE(FGRET_REGS_X4,offsetof(struct 
fgraph_ret_regs, regs[4]));
-  DEFINE(FGRET_REGS_X5,offsetof(struct 
fgraph_ret_regs, regs[5]));
-  DEFINE(FGRET_REGS_X6,offsetof(struct 
fgraph_ret_regs, regs[6]));
-  DEFINE(FGRET_REGS_X7,offsetof(struct 
fgraph_ret_regs, regs[7]));
-  DEFINE(FGRET_REGS_FP,offsetof(struct 
fgraph_ret_regs, fp));
-  DEFINE(FGRET_REGS_SIZE,  sizeof(struct fgraph_ret_regs));
-#endif
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
   DEFINE(FTRACE_OPS_DIRECT_CALL,   offsetof(struct ftrace_ops, 
direct_call));
 

[PATCH v10 21/36] function_graph: Pass ftrace_regs to entryfunc

2024-05-07 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Pass ftrace_regs to fgraph_ops::entryfunc(). If ftrace_regs is not
available, it passes NULL instead. The user callback function can access
some registers (including the return address) via this ftrace_regs.
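
A sketch of an entry callback updated for the new signature (the name is
hypothetical; as with retfunc, fregs may be NULL on arches that cannot
supply registers):

    static int sample_entryfunc(struct ftrace_graph_ent *trace,
                                struct fgraph_ops *gops,
                                struct ftrace_regs *fregs)
    {
        if (fregs)
            pr_debug("ip=0x%lx\n",
                     ftrace_regs_get_instruction_pointer(fregs));

        return 1;    /* non-zero: also trace this function's return */
    }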

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v8:
  - Just pass ftrace_regs to the handler instead of adding a new
entryregfunc.
  - Update riscv ftrace_graph_func().
 Changes in v3:
  - Update for new multiple fgraph.
---
 arch/arm64/kernel/ftrace.c   |2 +
 arch/loongarch/kernel/ftrace_dyn.c   |2 +
 arch/powerpc/kernel/trace/ftrace.c   |2 +
 arch/powerpc/kernel/trace/ftrace_64_pg.c |   10 ---
 arch/riscv/kernel/ftrace.c   |2 +
 arch/x86/kernel/ftrace.c |   42 --
 include/linux/ftrace.h   |   20 +++---
 kernel/trace/fgraph.c|   21 +--
 kernel/trace/ftrace.c|3 +-
 kernel/trace/trace.h |3 +-
 kernel/trace/trace_functions_graph.c |3 +-
 kernel/trace/trace_irqsoff.c |3 +-
 kernel/trace/trace_sched_wakeup.c|3 +-
 kernel/trace/trace_selftest.c|8 --
 14 files changed, 76 insertions(+), 48 deletions(-)

diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index b96740829798..779b975f03f5 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -497,7 +497,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long 
parent_ip,
return;
 
if (!function_graph_enter_ops(*parent, ip, frame_pointer,
- (void *)frame_pointer, gops))
+ (void *)frame_pointer, fregs, gops))
*parent = (unsigned long)&return_to_handler;
 
ftrace_test_recursion_unlock(bit);
diff --git a/arch/loongarch/kernel/ftrace_dyn.c 
b/arch/loongarch/kernel/ftrace_dyn.c
index 920eb673b32b..155bdaba2012 100644
--- a/arch/loongarch/kernel/ftrace_dyn.c
+++ b/arch/loongarch/kernel/ftrace_dyn.c
@@ -254,7 +254,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long 
parent_ip,
 
old = *parent;
 
-   if (!function_graph_enter_ops(old, ip, 0, parent, gops))
+   if (!function_graph_enter_ops(old, ip, 0, parent, fregs, gops))
*parent = return_hooker;
 }
 #else
diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 4a9294821c0d..501adb80fc8d 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -435,7 +435,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long 
parent_ip,
if (bit < 0)
goto out;
 
-   if (!function_graph_enter_ops(parent_ip, ip, 0, (unsigned long *)sp, 
gops))
+   if (!function_graph_enter_ops(parent_ip, ip, 0, (unsigned long *)sp, 
fregs, gops))
parent_ip = ppc_function_entry(return_to_handler);
 
ftrace_test_recursion_unlock(bit);
diff --git a/arch/powerpc/kernel/trace/ftrace_64_pg.c 
b/arch/powerpc/kernel/trace/ftrace_64_pg.c
index 12fab1803bcf..4ae9eeb1c8f1 100644
--- a/arch/powerpc/kernel/trace/ftrace_64_pg.c
+++ b/arch/powerpc/kernel/trace/ftrace_64_pg.c
@@ -800,7 +800,8 @@ int ftrace_disable_ftrace_graph_caller(void)
  * in current thread info. Return the address we want to divert to.
  */
 static unsigned long
-__prepare_ftrace_return(unsigned long parent, unsigned long ip, unsigned long 
sp)
+__prepare_ftrace_return(unsigned long parent, unsigned long ip, unsigned long 
sp,
+   struct ftrace_regs *fregs)
 {
unsigned long return_hooker;
int bit;
@@ -817,7 +818,7 @@ __prepare_ftrace_return(unsigned long parent, unsigned long 
ip, unsigned long sp
 
return_hooker = ppc_function_entry(return_to_handler);
 
-   if (!function_graph_enter(parent, ip, 0, (unsigned long *)sp))
+   if (!function_graph_enter_regs(parent, ip, 0, (unsigned long *)sp, 
fregs))
parent = return_hooker;
 
ftrace_test_recursion_unlock(bit);
@@ -829,13 +830,14 @@ __prepare_ftrace_return(unsigned long parent, unsigned 
long ip, unsigned long sp
 void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
   struct ftrace_ops *op, struct ftrace_regs *fregs)
 {
-   fregs->regs.link = __prepare_ftrace_return(parent_ip, ip, 
fregs->regs.gpr[1]);
+   fregs->regs.link = __prepare_ftrace_return(parent_ip, ip,
+  fregs->regs.gpr[1], fregs);
 }
 #else
 unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
unsigned long sp)
 {
-   return __prepare_ftrace_return(parent, ip, sp);
+   return __prepare_ftrace_return(parent, ip, sp, NULL);
 }
 #endif
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
diff --git a/arch/riscv/kernel/ftrace.c 

[PATCH v10 20/36] ftrace: Add multiple fgraph storage selftest

2024-05-07 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Add a selftest for multiple function graph tracers with data storage on the
same function. In this case, the shadow stack entry will be shared among
those fgraph_ops with different data storage, so this ensures fgraph does
not mix up the stored data.
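
The refactor below hinges on embedding fgraph_ops in a per-test fixture, so
each registered tracer recovers its own state via container_of() (a sketch
of the pattern, reduced from the test code):

    struct fgraph_fixture {
        struct fgraph_ops gops;    /* embedded; passed to the callbacks */
        int store_size;
    };

    static int entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops)
    {
        struct fgraph_fixture *fixture =
            container_of(gops, struct fgraph_fixture, gops);

        /* fixture->store_size is this instance's private state */
        return 1;
    }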

Signed-off-by: Masami Hiramatsu (Google) 
Suggested-by: Steven Rostedt (Google) 
---
 Changes in v8:
  - Newly added.
---
 kernel/trace/trace_selftest.c |  171 ++---
 1 file changed, 126 insertions(+), 45 deletions(-)

diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index fcdc744c245e..369efc569238 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -762,28 +762,32 @@ trace_selftest_startup_function(struct tracer *trace, 
struct trace_array *tr)
 #define SHORT_NUMBER 12345
 #define WORD_NUMBER 1234567890
 #define LONG_NUMBER 1234567890123456789LL
-
-static int fgraph_store_size __initdata;
-static const char *fgraph_store_type_name __initdata;
-static char *fgraph_error_str __initdata;
-static char fgraph_error_str_buf[128] __initdata;
+#define ERRSTR_BUFLEN 128
+
+struct fgraph_fixture {
+   struct fgraph_ops gops;
+   int store_size;
+   const char *store_type_name;
+   char error_str_buf[ERRSTR_BUFLEN];
+   char *error_str;
+};
 
 static __init int store_entry(struct ftrace_graph_ent *trace,
  struct fgraph_ops *gops)
 {
-   const char *type = fgraph_store_type_name;
-   int size = fgraph_store_size;
+   struct fgraph_fixture *fixture = container_of(gops, struct 
fgraph_fixture, gops);
+   const char *type = fixture->store_type_name;
+   int size = fixture->store_size;
void *p;
 
p = fgraph_reserve_data(gops->idx, size);
if (!p) {
-   snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+   snprintf(fixture->error_str_buf, ERRSTR_BUFLEN,
 "Failed to reserve %s\n", type);
-   fgraph_error_str = fgraph_error_str_buf;
return 0;
}
 
-   switch (fgraph_store_size) {
+   switch (size) {
case 1:
*(char *)p = BYTE_NUMBER;
break;
@@ -804,7 +808,8 @@ static __init int store_entry(struct ftrace_graph_ent 
*trace,
 static __init void store_return(struct ftrace_graph_ret *trace,
struct fgraph_ops *gops)
 {
-   const char *type = fgraph_store_type_name;
+   struct fgraph_fixture *fixture = container_of(gops, struct 
fgraph_fixture, gops);
+   const char *type = fixture->store_type_name;
long long expect = 0;
long long found = -1;
int size;
@@ -812,20 +817,18 @@ static __init void store_return(struct ftrace_graph_ret 
*trace,
 
p = fgraph_retrieve_data(gops->idx, &size);
if (!p) {
-   snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+   snprintf(fixture->error_str_buf, ERRSTR_BUFLEN,
 "Failed to retrieve %s\n", type);
-   fgraph_error_str = fgraph_error_str_buf;
return;
}
-   if (fgraph_store_size > size) {
-   snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+   if (fixture->store_size > size) {
+   snprintf(fixture->error_str_buf, ERRSTR_BUFLEN,
 "Retrieved size %d is smaller than expected %d\n",
-size, (int)fgraph_store_size);
-   fgraph_error_str = fgraph_error_str_buf;
+size, (int)fixture->store_size);
return;
}
 
-   switch (fgraph_store_size) {
+   switch (fixture->store_size) {
case 1:
expect = BYTE_NUMBER;
found = *(char *)p;
@@ -845,45 +848,44 @@ static __init void store_return(struct ftrace_graph_ret 
*trace,
}
 
if (found != expect) {
-   snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+   snprintf(fixture->error_str_buf, ERRSTR_BUFLEN,
 "%s returned not %lld but %lld\n", type, expect, 
found);
-   fgraph_error_str = fgraph_error_str_buf;
return;
}
-   fgraph_error_str = NULL;
+   fixture->error_str = NULL;
 }
 
-static struct fgraph_ops store_bytes __initdata = {
-   .entryfunc  = store_entry,
-   .retfunc= store_return,
-};
-
-static int __init test_graph_storage_type(const char *name, int size)
+static int __init init_fgraph_fixture(struct fgraph_fixture *fixture)
 {
char *func_name;
int len;
-   int ret;
 
-   fgraph_store_type_name = name;
-   fgraph_store_size = size;
+   snprintf(fixture->error_str_buf, ERRSTR_BUFLEN,
+"Failed to execute storage %s\n", fixture->store_type_name);
+   fixture->error_str = fixture->error_str_buf;
 

[PATCH v10 19/36] function_graph: Add selftest for passing local variables

2024-05-07 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

Add a boot-up selftest that passes variables from function entry to
function exit, and makes sure that they do get passed around.
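
The test exercises the fgraph per-call data API added earlier in this
series; in short (a sketch, with error handling elided):

    /* entry callback: reserve data tied to this function invocation */
    int *p = fgraph_reserve_data(gops->idx, sizeof(int));
    if (p)
        *p = 42;

    /* return callback: read it back; size reports what was reserved */
    int size;
    p = fgraph_retrieve_data(gops->idx, &size);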

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v2:
  - Add reserved size test.
  - Use pr_*() instead of printk(KERN_*).
---
 kernel/trace/trace_selftest.c |  169 +
 1 file changed, 169 insertions(+)

diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index f8f55fd79e53..fcdc744c245e 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -756,6 +756,173 @@ trace_selftest_startup_function(struct tracer *trace, 
struct trace_array *tr)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 
+#ifdef CONFIG_DYNAMIC_FTRACE
+
+#define BYTE_NUMBER 123
+#define SHORT_NUMBER 12345
+#define WORD_NUMBER 1234567890
+#define LONG_NUMBER 1234567890123456789LL
+
+static int fgraph_store_size __initdata;
+static const char *fgraph_store_type_name __initdata;
+static char *fgraph_error_str __initdata;
+static char fgraph_error_str_buf[128] __initdata;
+
+static __init int store_entry(struct ftrace_graph_ent *trace,
+ struct fgraph_ops *gops)
+{
+   const char *type = fgraph_store_type_name;
+   int size = fgraph_store_size;
+   void *p;
+
+   p = fgraph_reserve_data(gops->idx, size);
+   if (!p) {
+   snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+"Failed to reserve %s\n", type);
+   fgraph_error_str = fgraph_error_str_buf;
+   return 0;
+   }
+
+   switch (fgraph_store_size) {
+   case 1:
+   *(char *)p = BYTE_NUMBER;
+   break;
+   case 2:
+   *(short *)p = SHORT_NUMBER;
+   break;
+   case 4:
+   *(int *)p = WORD_NUMBER;
+   break;
+   case 8:
+   *(long long *)p = LONG_NUMBER;
+   break;
+   }
+
+   return 1;
+}
+
+static __init void store_return(struct ftrace_graph_ret *trace,
+   struct fgraph_ops *gops)
+{
+   const char *type = fgraph_store_type_name;
+   long long expect = 0;
+   long long found = -1;
+   int size;
+   char *p;
+
+   p = fgraph_retrieve_data(gops->idx, &size);
+   if (!p) {
+   snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+"Failed to retrieve %s\n", type);
+   fgraph_error_str = fgraph_error_str_buf;
+   return;
+   }
+   if (fgraph_store_size > size) {
+   snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+"Retrieved size %d is smaller than expected %d\n",
+size, (int)fgraph_store_size);
+   fgraph_error_str = fgraph_error_str_buf;
+   return;
+   }
+
+   switch (fgraph_store_size) {
+   case 1:
+   expect = BYTE_NUMBER;
+   found = *(char *)p;
+   break;
+   case 2:
+   expect = SHORT_NUMBER;
+   found = *(short *)p;
+   break;
+   case 4:
+   expect = WORD_NUMBER;
+   found = *(int *)p;
+   break;
+   case 8:
+   expect = LONG_NUMBER;
+   found = *(long long *)p;
+   break;
+   }
+
+   if (found != expect) {
+   snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+"%s returned not %lld but %lld\n", type, expect, 
found);
+   fgraph_error_str = fgraph_error_str_buf;
+   return;
+   }
+   fgraph_error_str = NULL;
+}
+
+static struct fgraph_ops store_bytes __initdata = {
+   .entryfunc  = store_entry,
+   .retfunc= store_return,
+};
+
+static int __init test_graph_storage_type(const char *name, int size)
+{
+   char *func_name;
+   int len;
+   int ret;
+
+   fgraph_store_type_name = name;
+   fgraph_store_size = size;
+
+   snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+"Failed to execute storage %s\n", name);
+   fgraph_error_str = fgraph_error_str_buf;
+
+   pr_cont("PASSED\n");
+   pr_info("Testing fgraph storage of %d byte%s: ", size, size > 1 ? "s" : 
"");
+
+   func_name = "*" __stringify(DYN_FTRACE_TEST_NAME);
+   len = strlen(func_name);
+
+   ret = ftrace_set_filter(&store_bytes.ops, func_name, len, 1);
+   if (ret && ret != -ENODEV) {
+   pr_cont("*Could not set filter* ");
+   return -1;
+   }
+
+   ret = register_ftrace_graph(&store_bytes);
+   if (ret) {
+   pr_warn("Failed to init store_bytes fgraph tracing\n");
+   return -1;
+   }
+
+   DYN_FTRACE_TEST_NAME();
+
+   unregister_ftrace_graph(&store_bytes);

[PATCH v10 04/36] function_graph: Convert ret_stack to a series of longs

2024-05-07 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

In order to make it possible to have multiple callbacks registered with the
function_graph tracer, the retstack needs to be converted from an array of
ftrace_ret_stack structures to an array of longs. This will allow storing
the list of callbacks on the stack for the return side of the functions.
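
To make the new layout concrete, here is a minimal standalone sketch (not
from the patch; the struct and names are illustrative stand-ins) of how a
frame is carved out of a plain array of longs by a word offset:

#include <stdio.h>

/* Stand-in for struct ftrace_ret_stack (illustrative only). */
struct frame {
	unsigned long ret;
	unsigned long func;
	unsigned long long calltime;
};

/* Frame size rounded up to whole words, as FGRAPH_RET_INDEX does. */
#define FRAME_WORDS \
	((sizeof(struct frame) + sizeof(long) - 1) / sizeof(long))

int main(void)
{
	unsigned long stack[64];
	int curr = 0;

	/* "Push": reinterpret the next words as a frame, then advance. */
	struct frame *f = (struct frame *)&stack[curr];

	f->ret = 0xdeadbeef;
	curr += FRAME_WORDS;

	/* "Pop": step back by the frame size in words. */
	curr -= FRAME_WORDS;
	printf("ret=%lx (frame is %zu words)\n",
	       ((struct frame *)&stack[curr])->ret, FRAME_WORDS);
	return 0;
}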

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 include/linux/sched.h |2 -
 kernel/trace/fgraph.c |  124 -
 2 files changed, 71 insertions(+), 55 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3c2abbc587b4..e453ad8d2d79 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1396,7 +1396,7 @@ struct task_struct {
int curr_ret_depth;
 
/* Stack of return addresses for return function tracing: */
-   struct ftrace_ret_stack *ret_stack;
+   unsigned long   *ret_stack;
 
/* Timestamp for last schedule: */
unsigned long long  ftrace_timestamp;
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index c83c005e654e..30edeb6d4aa9 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -25,6 +25,18 @@
 #define ASSIGN_OPS_HASH(opsname, val)
 #endif
 
+#define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
+#define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
+#define SHADOW_STACK_SIZE (PAGE_SIZE)
+#define SHADOW_STACK_INDEX \
+   (ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
+/* Leave on a buffer at the end */
+#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
+
+#define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
+#define RET_STACK_INC(c) ({ c += FGRAPH_RET_INDEX; })
+#define RET_STACK_DEC(c) ({ c -= FGRAPH_RET_INDEX; })
+
 DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
 int ftrace_graph_active;
 
@@ -69,6 +81,7 @@ static int
 ftrace_push_return_trace(unsigned long ret, unsigned long func,
 unsigned long frame_pointer, unsigned long *retp)
 {
+   struct ftrace_ret_stack *ret_stack;
unsigned long long calltime;
int index;
 
@@ -85,23 +98,25 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
smp_rmb();
 
/* The return trace stack is full */
-   if (current->curr_ret_stack == FTRACE_RETFUNC_DEPTH - 1) {
+   if (current->curr_ret_stack >= SHADOW_STACK_MAX_INDEX) {
		atomic_inc(&current->trace_overrun);
return -EBUSY;
}
 
calltime = trace_clock_local();
 
-   index = ++current->curr_ret_stack;
+   index = current->curr_ret_stack;
+   RET_STACK_INC(current->curr_ret_stack);
+   ret_stack = RET_STACK(current, index);
barrier();
-   current->ret_stack[index].ret = ret;
-   current->ret_stack[index].func = func;
-   current->ret_stack[index].calltime = calltime;
+   ret_stack->ret = ret;
+   ret_stack->func = func;
+   ret_stack->calltime = calltime;
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
-   current->ret_stack[index].fp = frame_pointer;
+   ret_stack->fp = frame_pointer;
 #endif
 #ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
-   current->ret_stack[index].retp = retp;
+   ret_stack->retp = retp;
 #endif
return 0;
 }
@@ -148,7 +163,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 
return 0;
  out_ret:
-   current->curr_ret_stack--;
+   RET_STACK_DEC(current->curr_ret_stack);
  out:
current->curr_ret_depth--;
return -EBUSY;
@@ -159,11 +174,13 @@ static void
 ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
unsigned long frame_pointer)
 {
+   struct ftrace_ret_stack *ret_stack;
int index;
 
index = current->curr_ret_stack;
+   RET_STACK_DEC(index);
 
-   if (unlikely(index < 0 || index >= FTRACE_RETFUNC_DEPTH)) {
+   if (unlikely(index < 0 || index > SHADOW_STACK_MAX_INDEX)) {
ftrace_graph_stop();
WARN_ON(1);
/* Might as well panic, otherwise we have no where to go */
@@ -171,6 +188,7 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
return;
}
 
+   ret_stack = RET_STACK(current, index);
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
/*
 * The arch may choose to record the frame pointer used
@@ -186,22 +204,22 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 * Note, -mfentry does not use frame pointers, and this test
 *  is not needed if CC_USING_FENTRY is set.
 */
-   if (unlikely(current->ret_stack[index].fp != frame_pointer)) {
+   if (unlikely(ret_stack->fp != frame_pointer)) {
ftrace_graph_stop();
WARN(1, "Bad frame 

[PATCH v10 18/36] function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data()

2024-05-07 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

Added functions that can be called by a fgraph_ops entryfunc and retfunc to
store state between the entry of the function being traced and the exit of
the same function. The fgraph_ops entryfunc() may call
fgraph_reserve_data() to store up to 32 words onto the task's shadow
ret_stack, and this can then be retrieved by fgraph_retrieve_data() called
by the corresponding retfunc().
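
As a usage sketch (a hypothetical fgraph_ops user, not part of this patch),
an entry handler can stash a timestamp on the shadow stack that the return
handler later recovers:

static int my_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops)
{
	u64 *ts = fgraph_reserve_data(gops->idx, sizeof(*ts));

	if (!ts)
		return 0;	/* no room on the shadow stack: skip retfunc */

	*ts = trace_clock_local();
	return 1;
}

static void my_return(struct ftrace_graph_ret *trace, struct fgraph_ops *gops)
{
	int size;
	u64 *ts = fgraph_retrieve_data(gops->idx, &size);

	if (ts)
		pr_info("%ps took %llu ns\n",
			(void *)trace->func, trace_clock_local() - *ts);
}

The reserved words live in the frame pushed at entry, so the data is
per-call and per-task; nothing needs to be cleaned up on the return side.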

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v10:
 - Fix to support data sizes up to 32 words (previously only up to 31 words
   were supported because the max size check failed to allow exactly 32 words)
  - Use "offset" instead of "index".
 Changes in v8:
  - Avoid using DIV_ROUND_UP() in the hot path.
 Changes in v3:
  - Store fgraph_array index to the data entry.
  - Both function requires fgraph_array index to store/retrieve data.
  - Reserve correct size of the data.
  - Return correct data area.
 Changes in v2:
  - Retrieve the reserved size by fgraph_retrieve_data().
  - Expand the maximum data size to 32 words.
  - Update stack index with __get_index(val) if FGRAPH_TYPE_ARRAY entry.
  - fix typos and make description lines shorter than 76 chars.
---
 include/linux/ftrace.h |3 +
 kernel/trace/fgraph.c  |  179 ++--
 2 files changed, 175 insertions(+), 7 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 97f7d1cf4f8f..1eee028b9a75 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1075,6 +1075,9 @@ struct fgraph_ops {
int idx;
 };
 
+void *fgraph_reserve_data(int idx, int size_bytes);
+void *fgraph_retrieve_data(int idx, int *size_bytes);
+
 /*
  * Stack of return addresses for functions
  * of a thread.
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 3498e8fd8e53..4f62e82448f6 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -37,14 +37,20 @@
  * bits: 10 - 11   Type of storage
  *   0 - reserved
  *   1 - bitmap of fgraph_array index
+ *   2 - reserved data
  *
  * For bitmap of fgraph_array index
  *  bits: 12 - 27  The bitmap of fgraph_ops fgraph_array index
  *
+ * For reserved data:
+ *  bits: 12 - 17  The size in words that is stored
+ *  bits: 18 - 23  The index of fgraph_array, which shows who is stored
+ *
 * That is, at the end of function_graph_enter, if the first and fourth
 * fgraph_ops on the fgraph_array[] (index 0 and 3) need their retfunc called
- * on the return of the function being traced, this is what will be on the
- * task's shadow ret_stack: (the stack grows upward)
+ * on the return of the function being traced, and the fourth fgraph_ops
+ * stored two words of data, this is what will be on the task's shadow
+ * ret_stack: (the stack grows upward)
  *
  *  ret_stack[SHADOW_STACK_IN_WORD]
  * | SHADOW_STACK_TASK_VARS(ret_stack)[15]  |
@@ -54,9 +60,17 @@
  * ...
 * |                                        | <- task->curr_ret_stack
 * +----------------------------------------+
 * | (3 << FGRAPH_DATA_INDEX_SHIFT)         | \  | This is for fgraph_ops[3].
 * | ((2 - 1) << FGRAPH_DATA_SHIFT)         | \  | The data size is 2 words.
 * | (FGRAPH_TYPE_DATA << FGRAPH_TYPE_SHIFT)| \  |
 * | (offset2:FGRAPH_FRAME_OFFSET+3)        | <- the offset2 is from here
 * +----------------------------------------+ (It is 4 words from the ret_stack)
 * |           STORED DATA WORD 2           |
 * |           STORED DATA WORD 1           |
 * +----------------------------------------+
 * | (BIT(3)|BIT(0)) << FGRAPH_INDEX_SHIFT  | \  |
 * | FGRAPH_TYPE_BITMAP << FGRAPH_TYPE_SHIFT| \  |
 * | (offset1:FGRAPH_FRAME_OFFSET)          | <- the offset1 is from here
 * +----------------------------------------+
 * | struct ftrace_ret_stack                |
 * |   (stores the saved ret pointer)       | <- the offset points here
 * +----------------------------------------+
@@ -83,12 +97,26 @@
 enum {
FGRAPH_TYPE_RESERVED= 0,
FGRAPH_TYPE_BITMAP  = 1,
+   FGRAPH_TYPE_DATA= 2,
 };
 
 #define FGRAPH_INDEX_SIZE  16
 #define FGRAPH_INDEX_MASK  GENMASK(FGRAPH_INDEX_SIZE - 1, 0)
 #define FGRAPH_INDEX_SHIFT (FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
 
+/* The data size == 0 means 1 word, and 31 (=2^5 - 1) means 32 words. */
+#define FGRAPH_DATA_SIZE   5
+#define FGRAPH_DATA_MASK   GENMASK(FGRAPH_DATA_SIZE - 1, 0)
+#define FGRAPH_DATA_SHIFT  (FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
+#define FGRAPH_MAX_DATA_SIZE (sizeof(long) * (1 << FGRAPH_DATA_SIZE))
+
+#define FGRAPH_DATA_INDEX_SIZE 4
+#define FGRAPH_DATA_INDEX_MASK GENMASK(FGRAPH_DATA_INDEX_SIZE - 1, 0)
+#define FGRAPH_DATA_INDEX_SHIFT(FGRAPH_DATA_SHIFT + FGRAPH_DATA_SIZE)
+
+#define FGRAPH_MAX_INDEX   \
+   

[PATCH v10 16/36] function_graph: Move graph depth stored data to shadow stack global var

2024-05-07 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

The use of task->trace_recursion for the function graph depth logic
was a bit of an abuse of that variable. Now that per-stack global
variables exist for registered graph tracers, use those instead.

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 include/linux/trace_recursion.h |   29 -
 kernel/trace/trace.h|   34 --
 2 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index 02e6afc6d7fe..fdfb6f66718a 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -44,25 +44,6 @@ enum {
  */
TRACE_IRQ_BIT,
 
-   /*
-* In the very unlikely case that an interrupt came in
-* at a start of graph tracing, and we want to trace
-* the function in that interrupt, the depth can be greater
-* than zero, because of the preempted start of a previous
-* trace. In an even more unlikely case, depth could be 2
-* if a softirq interrupted the start of graph tracing,
-* followed by an interrupt preempting a start of graph
-* tracing in the softirq, and depth can even be 3
-* if an NMI came in at the start of an interrupt function
-* that preempted a softirq start of a function that
-* preempted normal context. Luckily, it can't be
-* greater than 3, so the next two bits are a mask
-* of what the depth is when we set TRACE_GRAPH_FL
-*/
-
-   TRACE_GRAPH_DEPTH_START_BIT,
-   TRACE_GRAPH_DEPTH_END_BIT,
-
/*
 * To implement set_graph_notrace, if this bit is set, we ignore
 * function graph tracing of called functions, until the return
@@ -78,16 +59,6 @@ enum {
 #define trace_recursion_clear(bit)	do { (current)->trace_recursion &= ~(1<<(bit)); } while (0)
 #define trace_recursion_test(bit)	((current)->trace_recursion & (1<<(bit)))
 
-#define trace_recursion_depth() \
-   (((current)->trace_recursion >> TRACE_GRAPH_DEPTH_START_BIT) & 3)
-#define trace_recursion_set_depth(depth) \
-   do {\
-   current->trace_recursion &= \
-   ~(3 << TRACE_GRAPH_DEPTH_START_BIT);\
-   current->trace_recursion |= \
-   ((depth) & 3) << TRACE_GRAPH_DEPTH_START_BIT;   \
-   } while (0)
-
 #define TRACE_CONTEXT_BITS 4
 
 #define TRACE_FTRACE_START TRACE_FTRACE_BIT
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index c7c7e7c9f700..7ab731b9ebc8 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -899,8 +899,38 @@ extern void free_fgraph_ops(struct trace_array *tr);
 
 enum {
TRACE_GRAPH_FL  = 1,
+
+   /*
+* In the very unlikely case that an interrupt came in
+* at a start of graph tracing, and we want to trace
+* the function in that interrupt, the depth can be greater
+* than zero, because of the preempted start of a previous
+* trace. In an even more unlikely case, depth could be 2
+* if a softirq interrupted the start of graph tracing,
+* followed by an interrupt preempting a start of graph
+* tracing in the softirq, and depth can even be 3
+* if an NMI came in at the start of an interrupt function
+* that preempted a softirq start of a function that
+* preempted normal context. Luckily, it can't be
+* greater than 3, so the next two bits are a mask
+* of what the depth is when we set TRACE_GRAPH_FL
+*/
+
+   TRACE_GRAPH_DEPTH_START_BIT,
+   TRACE_GRAPH_DEPTH_END_BIT,
 };
 
+static inline unsigned long ftrace_graph_depth(unsigned long *task_var)
+{
+   return (*task_var >> TRACE_GRAPH_DEPTH_START_BIT) & 3;
+}
+
+static inline void ftrace_graph_set_depth(unsigned long *task_var, int depth)
+{
+   *task_var &= ~(3 << TRACE_GRAPH_DEPTH_START_BIT);
+   *task_var |= (depth & 3) << TRACE_GRAPH_DEPTH_START_BIT;
+}
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 extern struct ftrace_hash __rcu *ftrace_graph_hash;
 extern struct ftrace_hash __rcu *ftrace_graph_notrace_hash;
@@ -933,7 +963,7 @@ ftrace_graph_addr(unsigned long *task_var, struct ftrace_graph_ent *trace)
 * when the depth is zero.
 */
*task_var |= TRACE_GRAPH_FL;
-   trace_recursion_set_depth(trace->depth);
+   ftrace_graph_set_depth(task_var, trace->depth);
 
/*
 * If no irqs are to be traced, but a set_graph_function
@@ -958,7 +988,7 @@ ftrace_graph_addr_finish(struct fgraph_ops *gops, struct ftrace_graph_ret *trace
unsigned long *task_var = fgraph_get_task_var(gops);
 
if 

[PATCH v10 15/36] function_graph: Move set_graph_function tests to shadow stack global var

2024-05-07 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

The use of task->trace_recursion for the set_graph_function logic was a
bit of an abuse of that variable. Now that per-stack global variables
exist for registered graph tracers, use those instead.

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 include/linux/trace_recursion.h  |5 +
 kernel/trace/trace.h |   32 +---
 kernel/trace/trace_functions_graph.c |6 +++---
 kernel/trace/trace_irqsoff.c |4 ++--
 kernel/trace/trace_sched_wakeup.c|4 ++--
 5 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index 24ea8ac049b4..02e6afc6d7fe 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -44,9 +44,6 @@ enum {
  */
TRACE_IRQ_BIT,
 
-   /* Set if the function is in the set_graph_function file */
-   TRACE_GRAPH_BIT,
-
/*
 * In the very unlikely case that an interrupt came in
 * at a start of graph tracing, and we want to trace
@@ -60,7 +57,7 @@ enum {
 * that preempted a softirq start of a function that
*  preempted normal context. Luckily, it can't be
 * greater than 3, so the next two bits are a mask
-* of what the depth is when we set TRACE_GRAPH_BIT
+* of what the depth is when we set TRACE_GRAPH_FL
 */
 
TRACE_GRAPH_DEPTH_START_BIT,
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 9995d6b00a93..c7c7e7c9f700 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -897,11 +897,16 @@ extern void init_array_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops
 extern int allocate_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops);
 extern void free_fgraph_ops(struct trace_array *tr);
 
+enum {
+   TRACE_GRAPH_FL  = 1,
+};
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 extern struct ftrace_hash __rcu *ftrace_graph_hash;
 extern struct ftrace_hash __rcu *ftrace_graph_notrace_hash;
 
-static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
+static inline int
+ftrace_graph_addr(unsigned long *task_var, struct ftrace_graph_ent *trace)
 {
unsigned long addr = trace->func;
int ret = 0;
@@ -923,12 +928,11 @@ static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
}
 
if (ftrace_lookup_ip(hash, addr)) {
-
/*
 * This needs to be cleared on the return functions
 * when the depth is zero.
 */
-   trace_recursion_set(TRACE_GRAPH_BIT);
+   *task_var |= TRACE_GRAPH_FL;
trace_recursion_set_depth(trace->depth);
 
/*
@@ -948,11 +952,14 @@ static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
return ret;
 }
 
-static inline void ftrace_graph_addr_finish(struct ftrace_graph_ret *trace)
+static inline void
+ftrace_graph_addr_finish(struct fgraph_ops *gops, struct ftrace_graph_ret *trace)
 {
-   if (trace_recursion_test(TRACE_GRAPH_BIT) &&
+   unsigned long *task_var = fgraph_get_task_var(gops);
+
+   if ((*task_var & TRACE_GRAPH_FL) &&
trace->depth == trace_recursion_depth())
-   trace_recursion_clear(TRACE_GRAPH_BIT);
+   *task_var &= ~TRACE_GRAPH_FL;
 }
 
 static inline int ftrace_graph_notrace_addr(unsigned long addr)
@@ -979,7 +986,7 @@ static inline int ftrace_graph_notrace_addr(unsigned long addr)
 }
 
 #else
-static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
+static inline int ftrace_graph_addr(unsigned long *task_var, struct ftrace_graph_ent *trace)
 {
return 1;
 }
@@ -988,17 +995,20 @@ static inline int ftrace_graph_notrace_addr(unsigned long addr)
 {
return 0;
 }
-static inline void ftrace_graph_addr_finish(struct ftrace_graph_ret *trace)
+static inline void ftrace_graph_addr_finish(struct fgraph_ops *gops, struct ftrace_graph_ret *trace)
 { }
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 extern unsigned int fgraph_max_depth;
 
-static inline bool ftrace_graph_ignore_func(struct ftrace_graph_ent *trace)
+static inline bool
+ftrace_graph_ignore_func(struct fgraph_ops *gops, struct ftrace_graph_ent *trace)
 {
+   unsigned long *task_var = fgraph_get_task_var(gops);
+
/* trace it when it is-nested-in or is a function enabled. */
-   return !(trace_recursion_test(TRACE_GRAPH_BIT) ||
-ftrace_graph_addr(trace)) ||
+   return !((*task_var & TRACE_GRAPH_FL) ||
+ftrace_graph_addr(task_var, trace)) ||
(trace->depth < 0) ||
(fgraph_max_depth && trace->depth >= fgraph_max_depth);
 }
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 7f30652f0e97..66cce73e94f8 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c

[PATCH v10 07/36] function_graph: Allow multiple users to attach to function graph

2024-05-07 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

Allow multiple users to attach to the function graph tracer at the same
time. Only 16 simultaneous users can attach to the tracer. This is because
there's an array that stores the pointers to the attached fgraph_ops. When
a function being traced is entered, each of the fgraph_ops entryfuncs is
called and if it returns non zero, its index into the array will be added
to the shadow stack.

On exit of the function being traced, the shadow stack will contain the
indexes of the fgraph_ops on the array that want their retfunc to be
called.

Because a function may sleep for a long time (if a task sleeps itself),
the return of the function may be literally days later. If the fgraph_ops
is removed, its place on the array is replaced with a fgraph_ops that
contains the stub functions and that will be called when the function
finally returns.

If another fgraph_ops is added that happens to get the same index into the
array, its return function may be called. But that's actually the way
things currently work with the old function graph tracer. If one tracer is
removed and another is added, the new one will get the return calls of the
functions traced by the previous one, thus this is not a regression. This
can be fixed by adding a counter that is incremented each time the array
item is updated, and saving it on the shadow stack as well, so that the
retfunc is not called if the saved counter does not match the one on the
array.

Note, being able to filter functions when both are called is not completely
handled yet, but that shouldn't be too hard to manage.
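
For illustration, with this patch two users can be live at once; the
callbacks named here (latency_*, count_*) are hypothetical and follow this
series' prototypes:

static struct fgraph_ops latency_gops = {
	.entryfunc	= latency_entry,
	.retfunc	= latency_return,
};

static struct fgraph_ops count_gops = {
	.entryfunc	= count_entry,
	.retfunc	= count_return,
};

static int __init start_both(void)
{
	int ret;

	ret = register_ftrace_graph(&latency_gops);
	if (ret)
		return ret;

	/* A second registration no longer fails with -EBUSY. */
	ret = register_ftrace_graph(&count_gops);
	if (ret)
		unregister_ftrace_graph(&latency_gops);

	return ret;
}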

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v10:
  - Rephrase all "index" on shadow stack to "offset" and some "ret_stack" to
   "frame".
  - Macros are renamed;
FGRAPH_RET_SIZE to FGRAPH_FRAME_SIZE
FGRAPH_RET_INDEX to FGRAPH_FRAME_OFFSET
FGRAPH_RET_INDEX_SIZE to FGRAPH_FRAME_OFFSET_SIZE
FGRAPH_RET_INDEX_MASK to FGRAPH_FRAME_OFFSET_MASK
SHADOW_STACK_INDEX to SHADOW_STACK_IN_WORD
SHADOW_STACK_MAX_INDEX to SHADOW_STACK_MAX_OFFSET
  - Remove unused FGRAPH_MAX_INDEX macro.
  - Fix wrong explanation of the shadow stack entry.
 Changes in v7:
 - Fix max limitation check in ftrace_graph_push_return().
 - Rewrite the shadow stack implementation using a bitmap entry. This allows
   us to detect recursive calls/tail calls more easily. (This implementation
   is moved from a later patch in the series.)
 Changes in v2:
  - Check return value of the ftrace_pop_return_trace() instead of 'ret'
since 'ret' is set to the address of panic().
  - Fix typo and make lines shorter than 76 chars in description.
---
 include/linux/ftrace.h |3 
 kernel/trace/fgraph.c  |  378 +++-
 2 files changed, 310 insertions(+), 71 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index d5df5f8fc35a..0a1a7316de7b 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1066,6 +1066,7 @@ extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace);
 struct fgraph_ops {
trace_func_graph_ent_t  entryfunc;
trace_func_graph_ret_t  retfunc;
+   int idx;
 };
 
 /*
@@ -1100,7 +1101,7 @@ function_graph_enter(unsigned long ret, unsigned long func,
 unsigned long frame_pointer, unsigned long *retp);
 
 struct ftrace_ret_stack *
-ftrace_graph_get_ret_stack(struct task_struct *task, int idx);
+ftrace_graph_get_ret_stack(struct task_struct *task, int skip);
 
 unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
unsigned long ret, unsigned long *retp);
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 3f9dd213e7d8..6b06962657fe 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -7,6 +7,7 @@
  *
  * Highly modified by Steven Rostedt (VMware).
  */
+#include 
 #include 
 #include 
 #include 
@@ -25,25 +26,157 @@
 #define ASSIGN_OPS_HASH(opsname, val)
 #endif
 
-#define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
-#define FGRAPH_RET_INDEX DIV_ROUND_UP(FGRAPH_RET_SIZE, sizeof(long))
+#define FGRAPH_FRAME_SIZE sizeof(struct ftrace_ret_stack)
+#define FGRAPH_FRAME_OFFSET DIV_ROUND_UP(FGRAPH_FRAME_SIZE, sizeof(long))
+
+/*
+ * On entry to a function (via function_graph_enter()), a new ftrace_ret_stack
+ * is allocated on the task's ret_stack with bitmap entry, then each
+ * fgraph_ops on the fgraph_array[]'s entryfunc is called and if that returns
+ * non-zero, the index into the fgraph_array[] for that fgraph_ops is recorded
+ * on the bitmap entry as a bit flag.
+ *
+ * The top of the ret_stack (when not empty) will always have a reference
+ * to the last ftrace_ret_stack saved. All references to the
+ * ftrace_ret_stack have the format of:
+ *
+ * bits:  0 -  9   offset in words from the previous ftrace_ret_stack
+ * (bitmap type should have 

[PATCH v10 14/36] function_graph: Add "task variables" per task for fgraph_ops

2024-05-07 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

Add a "task variables" array on the tasks shadow ret_stack that is the
size of longs for each possible registered fgraph_ops. That's a total
of 16, taking up 8 * 16 = 128 bytes (out of a page size 4k).

This will allow for fgraph_ops to do specific features on a per task basis
having a way to maintain state for each task.
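
As a sketch of the intended use (hypothetical callbacks, not from the
patch), the reserved word can hold, for example, a per-task nesting count:

static int nest_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops)
{
	unsigned long *count = fgraph_get_task_var(gops);

	(*count)++;		/* per task, survives across entry/exit */
	return 1;
}

static void nest_return(struct ftrace_graph_ret *trace, struct fgraph_ops *gops)
{
	unsigned long *count = fgraph_get_task_var(gops);

	if (*count)
		(*count)--;
}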

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v10:
  - Explain where the task vars is placed in shadow stack.
 Changes in v3:
  - Move fgraph_ops::idx to previous patch in the series.
 Changes in v2:
  - Make description lines shorter than 76 chars.
---
 include/linux/ftrace.h |1 +
 kernel/trace/fgraph.c  |   74 +++-
 2 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index b11af9d88438..97f7d1cf4f8f 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1116,6 +1116,7 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int 
skip);
 
 unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
unsigned long ret, unsigned long *retp);
+unsigned long *fgraph_get_task_var(struct fgraph_ops *gops);
 
 /*
  * Sometimes we don't want to trace a function with the function
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index c00a299decb1..3498e8fd8e53 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -46,6 +46,10 @@
  * on the return of the function being traced, this is what will be on the
  * task's shadow ret_stack: (the stack grows upward)
  *
+ *  ret_stack[SHADOW_STACK_IN_WORD]
+ * | SHADOW_STACK_TASK_VARS(ret_stack)[15]  |
+ * ...
+ * | SHADOW_STACK_TASK_VARS(ret_stack)[0]   |
  *  ret_stack[SHADOW_STACK_MAX_OFFSET]
  * ...
  * || <- task->curr_ret_stack
@@ -90,10 +94,18 @@ enum {
 #define SHADOW_STACK_SIZE (PAGE_SIZE)
 #define SHADOW_STACK_IN_WORD (SHADOW_STACK_SIZE / sizeof(long))
 /* Leave on a buffer at the end */
-#define SHADOW_STACK_MAX_OFFSET (SHADOW_STACK_IN_WORD - (FGRAPH_FRAME_OFFSET + 1))
+#define SHADOW_STACK_MAX_OFFSET\
+   (SHADOW_STACK_IN_WORD - (FGRAPH_FRAME_OFFSET + 1 + FGRAPH_ARRAY_SIZE))
 
 #define RET_STACK(t, offset) ((struct ftrace_ret_stack *)(&(t)->ret_stack[offset]))
 
+/*
+ * Each fgraph_ops has a reserved unsigned long at the end (top) of the
+ * ret_stack to store task specific state.
+ */
+#define SHADOW_STACK_TASK_VARS(ret_stack) \
+   ((unsigned long *)(&(ret_stack)[SHADOW_STACK_IN_WORD - FGRAPH_ARRAY_SIZE]))
+
 DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
 int ftrace_graph_active;
 
@@ -184,6 +196,44 @@ static void return_run(struct ftrace_graph_ret *trace, struct fgraph_ops *ops)
 {
 }
 
+static void ret_stack_set_task_var(struct task_struct *t, int idx, long val)
+{
+   unsigned long *gvals = SHADOW_STACK_TASK_VARS(t->ret_stack);
+
+   gvals[idx] = val;
+}
+
+static unsigned long *
+ret_stack_get_task_var(struct task_struct *t, int idx)
+{
+   unsigned long *gvals = SHADOW_STACK_TASK_VARS(t->ret_stack);
+
+   return &gvals[idx];
+}
+
+static void ret_stack_init_task_vars(unsigned long *ret_stack)
+{
+   unsigned long *gvals = SHADOW_STACK_TASK_VARS(ret_stack);
+
+   memset(gvals, 0, sizeof(*gvals) * FGRAPH_ARRAY_SIZE);
+}
+
+/**
+ * fgraph_get_task_var - retrieve a task specific state variable
+ * @gops: The ftrace_ops that owns the task specific variable
+ *
+ * Every registered fgraph_ops has a task state variable
+ * reserved on the task's ret_stack. This function returns the
+ * address to that variable.
+ *
+ * Returns the address of the fgraph_ops @gops task specific
+ * unsigned long variable.
+ */
+unsigned long *fgraph_get_task_var(struct fgraph_ops *gops)
+{
+   return ret_stack_get_task_var(current, gops->idx);
+}
+
 /*
  * @offset: The offset into @t->ret_stack to find the ret_stack entry
  * @frame_offset: Where to place the offset into @t->ret_stack of that entry
@@ -793,6 +843,7 @@ static int alloc_retstack_tasklist(unsigned long **ret_stack_list)
 
if (t->ret_stack == NULL) {
			atomic_set(&t->trace_overrun, 0);
+   ret_stack_init_task_vars(ret_stack_list[start]);
t->curr_ret_stack = 0;
t->curr_ret_depth = -1;
/* Make sure the tasks see the 0 first: */
@@ -853,6 +904,7 @@ static void
 graph_init_task(struct task_struct *t, unsigned long *ret_stack)
 {
	atomic_set(&t->trace_overrun, 0);
+   ret_stack_init_task_vars(ret_stack);
t->ftrace_timestamp = 0;
t->curr_ret_stack = 0;
t->curr_ret_depth = -1;
@@ -951,6 +1003,24 @@ static int start_graph_tracing(void)
return ret;
 }
 
+static void init_task_vars(int idx)
+{
+   struct task_struct *g, *t;
+   int cpu;
+
+   

[PATCH v10 13/36] function_graph: Use a simple LRU for fgraph_array index number

2024-05-07 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Since the fgraph_array index is used for the bitmap on the shadow
stack, some entries may remain after a function_graph instance is
removed. Thus if another instance reuses the fgraph_array index soon
after it is released, fgraph may mistakenly call the newer callback
for entries that were pushed by the older instance.
To avoid reusing an fgraph_array index soon after its release, introduce
a simple LRU table for managing the index numbers. This will reduce the
possibility of this confusion.
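
A worked example of the table behaviour (FGRAPH_ARRAY_SIZE shrunk to 4 for
brevity; illustrative only):

/*
 * init           table = { 0,  1,  2,  3 }, next = 0, last = 0
 * alloc   -> 0   table = {-1,  1,  2,  3 }, next = 1
 * alloc   -> 1   table = {-1, -1,  2,  3 }, next = 2
 * release(0)     table = { 0, -1,  2,  3 }, last = 1
 * alloc   -> 2   table = { 0, -1, -1,  3 }, next = 3
 *
 * Index 0 is not handed out again until 2 and 3 have been consumed,
 * giving stale shadow stack entries that still carry bit 0 time to
 * drain before the index is recycled.
 */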

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v8:
  - Add a WARN_ON_ONCE() if fgraph_lru_table[] is broken when releasing
index, and remove WARN_ON_ONCE() from unregister_ftrace_graph()
  - Fix to release allocated index if register_ftrace_graph() fails.
  - Add comments and code cleanup.
 Changes in v5:
 - Fix the underflow bug in fgraph_lru_release_index() and return 0
   if the release succeeds.
 Changes in v4:
  - Newly added.
---
 kernel/trace/fgraph.c |   71 +++--
 1 file changed, 50 insertions(+), 21 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index b6949e4fda79..c00a299decb1 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -97,10 +97,48 @@ enum {
 DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
 int ftrace_graph_active;
 
-static int fgraph_array_cnt;
-
 static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
 
+/* LRU index table for fgraph_array */
+static int fgraph_lru_table[FGRAPH_ARRAY_SIZE];
+static int fgraph_lru_next;
+static int fgraph_lru_last;
+
+/* Initialize fgraph_lru_table with unused index */
+static void fgraph_lru_init(void)
+{
+   int i;
+
+   for (i = 0; i < FGRAPH_ARRAY_SIZE; i++)
+   fgraph_lru_table[i] = i;
+}
+
+/* Release the used index to the LRU table */
+static int fgraph_lru_release_index(int idx)
+{
+   if (idx < 0 || idx >= FGRAPH_ARRAY_SIZE ||
+   WARN_ON_ONCE(fgraph_lru_table[fgraph_lru_last] != -1))
+   return -1;
+
+   fgraph_lru_table[fgraph_lru_last] = idx;
+   fgraph_lru_last = (fgraph_lru_last + 1) % FGRAPH_ARRAY_SIZE;
+   return 0;
+}
+
+/* Allocate a new index from LRU table */
+static int fgraph_lru_alloc_index(void)
+{
+   int idx = fgraph_lru_table[fgraph_lru_next];
+
+   /* No id is available */
+   if (idx == -1)
+   return -1;
+
+   fgraph_lru_table[fgraph_lru_next] = -1;
+   fgraph_lru_next = (fgraph_lru_next + 1) % FGRAPH_ARRAY_SIZE;
+   return idx;
+}
+
 static inline int get_frame_offset(struct task_struct *t, int offset)
 {
return t->ret_stack[offset] & FGRAPH_FRAME_OFFSET_MASK;
@@ -365,7 +403,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
if (offset < 0)
goto out;
 
-   for (i = 0; i < fgraph_array_cnt; i++) {
+   for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
struct fgraph_ops *gops = fgraph_array[i];
 
		if (gops == &fgraph_stub)
@@ -917,7 +955,7 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 {
int command = 0;
int ret = 0;
-   int i;
+   int i = -1;
 
	mutex_lock(&ftrace_lock);
 
@@ -933,21 +971,16 @@ int register_ftrace_graph(struct fgraph_ops *gops)
/* The array must always have real data on it */
for (i = 0; i < FGRAPH_ARRAY_SIZE; i++)
			fgraph_array[i] = &fgraph_stub;
+   fgraph_lru_init();
}
 
-   /* Look for an available spot */
-   for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
-   if (fgraph_array[i] == &fgraph_stub)
-   break;
-   }
-   if (i >= FGRAPH_ARRAY_SIZE) {
+   i = fgraph_lru_alloc_index();
+   if (i < 0 || WARN_ON_ONCE(fgraph_array[i] != &fgraph_stub)) {
ret = -ENOSPC;
goto out;
}
 
fgraph_array[i] = gops;
-   if (i + 1 > fgraph_array_cnt)
-   fgraph_array_cnt = i + 1;
gops->idx = i;
 
ftrace_graph_active++;
@@ -971,6 +1004,7 @@ int register_ftrace_graph(struct fgraph_ops *gops)
if (ret) {
		fgraph_array[i] = &fgraph_stub;
ftrace_graph_active--;
+   fgraph_lru_release_index(i);
}
 out:
	mutex_unlock(&ftrace_lock);
@@ -980,25 +1014,20 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 void unregister_ftrace_graph(struct fgraph_ops *gops)
 {
int command = 0;
-   int i;
 
	mutex_lock(&ftrace_lock);
 
if (unlikely(!ftrace_graph_active))
goto out;
 
-   if (unlikely(gops->idx < 0 || gops->idx >= fgraph_array_cnt))
+   if (unlikely(gops->idx < 0 || gops->idx >= FGRAPH_ARRAY_SIZE ||
+fgraph_array[gops->idx] != gops))
goto out;
 
-   WARN_ON_ONCE(fgraph_array[gops->idx] != gops);
+   if (fgraph_lru_release_index(gops->idx) < 0)
+   goto out;
 

[PATCH v10 12/36] function_graph: Have the instances use their own ftrace_ops for filtering

2024-05-07 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

Allow instances to have their own ftrace_ops as part of the fgraph_ops,
which makes the function_graph tracer filter on the set_ftrace_filter file
of the instance and not the top instance.

Note that this also requires updating ftrace_graph_func() to call the new
function_graph_enter_ops() instead of function_graph_enter(), so that
it avoids pushing onto the shadow stack multiple times for the same function.

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v10:
  - Use "offset" for shadow stack instead of "index".
 Changes in v9:
  - Fix to clear fgraph_array correctly when ftrace_startup() fails.
  - Return -ENOSPC if fgraph_array is full.
 Changes in v8:
  - Fix a compilation error in loongarch implementation.
  - Update riscv implementation of ftrace_graph_func().
 Changes in v7:
  - Move FGRAPH_TYPE_BITMAP type implementation to earlier patch (
which implements FGRAPH_TYPE_ARRAY) so that it does not need to
replace the FGRAPH_TYPE_ARRAY type.
  - Update loongarch and powerpc implementation of ftrace_graph_func().
  - Update description.
 Changes in v6:
  - Fix to check whether the fgraph_ops is already unregistered in
function_graph_enter_ops().
  - Fix stack unwinder error on arm64 because of passing wrong value
as retp. Thanks Mark!
 Changes in v4:
  - Simplify get_ret_stack() sanity check and use WARN_ON_ONCE() for
obviously wrong value.
  - Do not check ret == return_to_handler but always read the previous
ret_stack in ftrace_push_return_trace() to check it is reusable.
  - Set the bit 0 of the bitmap entry always in function_graph_enter()
because it uses bit 0 to check re-usability.
  - Fix to ensure the ret_stack entry is bitmap type when checking the
bitmap.
 Changes in v3:
  - Pass current fgraph_ops to the new entry handler
   (function_graph_enter_ops) if fgraph use ftrace.
  - Add fgraph_ops::idx in this patch.
  - Replace the array type with the bitmap type so that it can record
which fgraph is called.
  - Fix some helper function to use passed task_struct instead of current.
  - Reduce the ret-index size to 1024 words.
  - Make the ret-index directly points the ret_stack.
  - Fix ftrace_graph_ret_addr() to handle tail-call case correctly.
 Changes in v2:
  - Use ftrace_graph_func and FTRACE_OPS_GRAPH_STUB instead of
ftrace_stub and FTRACE_OPS_FL_STUB for new ftrace based fgraph.
---
 arch/arm64/kernel/ftrace.c   |   21 +-
 arch/loongarch/kernel/ftrace_dyn.c   |   15 
 arch/powerpc/kernel/trace/ftrace.c   |3 +
 arch/riscv/kernel/ftrace.c   |   15 
 arch/x86/kernel/ftrace.c |   19 +
 include/linux/ftrace.h   |6 ++
 kernel/trace/fgraph.c|  125 +-
 kernel/trace/ftrace.c|4 +
 kernel/trace/trace.h |   16 ++--
 kernel/trace/trace_functions.c   |2 -
 kernel/trace/trace_functions_graph.c |8 ++
 11 files changed, 183 insertions(+), 51 deletions(-)

diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index a650f5e11fc5..b96740829798 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -481,7 +481,26 @@ void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
 void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
   struct ftrace_ops *op, struct ftrace_regs *fregs)
 {
-   prepare_ftrace_return(ip, &fregs->lr, fregs->fp);
+   struct fgraph_ops *gops = container_of(op, struct fgraph_ops, ops);
+   unsigned long frame_pointer = fregs->fp;
+   unsigned long *parent = &fregs->lr;
+   int bit;
+
+   if (unlikely(ftrace_graph_is_dead()))
+   return;
+
+   if (unlikely(atomic_read(&current->tracing_graph_pause)))
+   return;
+
+   bit = ftrace_test_recursion_trylock(ip, *parent);
+   if (bit < 0)
+   return;
+
+   if (!function_graph_enter_ops(*parent, ip, frame_pointer,
+ (void *)frame_pointer, gops))
+   *parent = (unsigned long)&return_to_handler;
+
+   ftrace_test_recursion_unlock(bit);
 }
 #else
 /*
diff --git a/arch/loongarch/kernel/ftrace_dyn.c b/arch/loongarch/kernel/ftrace_dyn.c
index 73858c9029cc..920eb673b32b 100644
--- a/arch/loongarch/kernel/ftrace_dyn.c
+++ b/arch/loongarch/kernel/ftrace_dyn.c
@@ -241,10 +241,21 @@ void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent)
 void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
   struct ftrace_ops *op, struct ftrace_regs *fregs)
 {
+   struct fgraph_ops *gops = container_of(op, struct fgraph_ops, ops);
+   unsigned long return_hooker = (unsigned long)&return_to_handler;
	struct pt_regs *regs = &fregs->regs;
-   unsigned long *parent = (unsigned long *)&regs->regs[1];
+   unsigned long *parent;
+   unsigned long old;
+
+   parent = 

[PATCH v10 11/36] ftrace: Allow ftrace startup flags exist without dynamic ftrace

2024-05-07 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

Some of the flags for ftrace_startup() may be exposed even when
CONFIG_DYNAMIC_FTRACE is not configured in. This is fine as the difference
between dynamic ftrace and static ftrace is done within the internals of
ftrace itself. No need to have use cases fail to compile because dynamic
ftrace is disabled.

This change is needed to move some of the logic of what is passed to
ftrace_startup() out of the parameters of ftrace_startup().

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 include/linux/ftrace.h |   18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index ff70ee437209..c4d81e0ec862 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -538,6 +538,15 @@ static inline void stack_tracer_disable(void) { }
 static inline void stack_tracer_enable(void) { }
 #endif
 
+enum {
+   FTRACE_UPDATE_CALLS = (1 << 0),
+   FTRACE_DISABLE_CALLS= (1 << 1),
+   FTRACE_UPDATE_TRACE_FUNC= (1 << 2),
+   FTRACE_START_FUNC_RET   = (1 << 3),
+   FTRACE_STOP_FUNC_RET= (1 << 4),
+   FTRACE_MAY_SLEEP= (1 << 5),
+};
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 
 void ftrace_arch_code_modify_prepare(void);
@@ -632,15 +641,6 @@ void ftrace_set_global_notrace(unsigned char *buf, int 
len, int reset);
 void ftrace_free_filter(struct ftrace_ops *ops);
 void ftrace_ops_set_global_filter(struct ftrace_ops *ops);
 
-enum {
-   FTRACE_UPDATE_CALLS = (1 << 0),
-   FTRACE_DISABLE_CALLS= (1 << 1),
-   FTRACE_UPDATE_TRACE_FUNC= (1 << 2),
-   FTRACE_START_FUNC_RET   = (1 << 3),
-   FTRACE_STOP_FUNC_RET= (1 << 4),
-   FTRACE_MAY_SLEEP= (1 << 5),
-};
-
 /*
  * The FTRACE_UPDATE_* enum is used to pass information back
  * from the ftrace_update_record() and ftrace_test_record()




[PATCH v10 10/36] ftrace: Allow function_graph tracer to be enabled in instances

2024-05-07 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

Now that function graph tracing can handle more than one user, allow it to
be enabled in the ftrace instances. Note, the filtering of the functions is
still tied to the top level set_ftrace_filter and friends, as well as the
graph and nograph files.

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v2:
  - Fix to remove set_graph_array() completely.
---
 include/linux/ftrace.h   |1 +
 kernel/trace/ftrace.c|1 +
 kernel/trace/trace.h |   13 ++-
 kernel/trace/trace_functions.c   |8 
 kernel/trace/trace_functions_graph.c |   65 +-
 kernel/trace/trace_selftest.c|4 +-
 6 files changed, 64 insertions(+), 28 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index f9e51cbf9a97..ff70ee437209 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1070,6 +1070,7 @@ extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace, struct fgraph
 struct fgraph_ops {
trace_func_graph_ent_t  entryfunc;
trace_func_graph_ret_t  retfunc;
+   void*private;
int idx;
 };
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 4b0708106692..92abb9869198 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -7326,6 +7326,7 @@ __init void ftrace_init_global_array_ops(struct trace_array *tr)
tr->ops = _ops;
tr->ops->private = tr;
ftrace_init_trace_array(tr);
+   init_array_fgraph_ops(tr);
 }
 
 void ftrace_init_array_ops(struct trace_array *tr, ftrace_func_t func)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 55bb9a3bf322..114b120afd2a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -396,6 +396,9 @@ struct trace_array {
struct ftrace_ops   *ops;
struct trace_pid_list   __rcu *function_pids;
struct trace_pid_list   __rcu *function_no_pids;
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+   struct fgraph_ops   *gops;
+#endif
 #ifdef CONFIG_DYNAMIC_FTRACE
/* All of these are protected by the ftrace_lock */
struct list_headfunc_probes;
@@ -680,7 +683,6 @@ void print_trace_header(struct seq_file *m, struct trace_iterator *iter);
 
 void trace_graph_return(struct ftrace_graph_ret *trace, struct fgraph_ops *gops);
 int trace_graph_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops);
-void set_graph_array(struct trace_array *tr);
 
 void tracing_start_cmdline_record(void);
 void tracing_stop_cmdline_record(void);
@@ -891,6 +893,9 @@ extern int __trace_graph_entry(struct trace_array *tr,
 extern void __trace_graph_return(struct trace_array *tr,
 struct ftrace_graph_ret *trace,
 unsigned int trace_ctx);
+extern void init_array_fgraph_ops(struct trace_array *tr);
+extern int allocate_fgraph_ops(struct trace_array *tr);
+extern void free_fgraph_ops(struct trace_array *tr);
 
 #ifdef CONFIG_DYNAMIC_FTRACE
 extern struct ftrace_hash __rcu *ftrace_graph_hash;
@@ -1003,6 +1008,12 @@ print_graph_function_flags(struct trace_iterator *iter, u32 flags)
 {
return TRACE_TYPE_UNHANDLED;
 }
+static inline void init_array_fgraph_ops(struct trace_array *tr) { }
+static inline int allocate_fgraph_ops(struct trace_array *tr)
+{
+   return 0;
+}
+static inline void free_fgraph_ops(struct trace_array *tr) { }
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 
 extern struct list_head ftrace_pids;
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 9f1bfbe105e8..8e8da0d0ee52 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -80,6 +80,7 @@ void ftrace_free_ftrace_ops(struct trace_array *tr)
 int ftrace_create_function_files(struct trace_array *tr,
 struct dentry *parent)
 {
+   int ret;
/*
 * The top level array uses the "global_ops", and the files are
 * created on boot up.
@@ -90,6 +91,12 @@ int ftrace_create_function_files(struct trace_array *tr,
if (!tr->ops)
return -EINVAL;
 
+   ret = allocate_fgraph_ops(tr);
+   if (ret) {
+   kfree(tr->ops);
+   return ret;
+   }
+
ftrace_create_filter_files(tr->ops, parent);
 
return 0;
@@ -99,6 +106,7 @@ void ftrace_destroy_function_files(struct trace_array *tr)
 {
ftrace_destroy_filter_files(tr->ops);
ftrace_free_ftrace_ops(tr);
+   free_fgraph_ops(tr);
 }
 
 static ftrace_func_t select_trace_function(u32 flags_val)
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index b7b142b65299..9ccc904a7703 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -83,8 +83,6 @@ static 

[PATCH v10 09/36] ftrace/function_graph: Pass fgraph_ops to function graph callbacks

2024-05-07 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

Pass the fgraph_ops structure to the function graph callbacks. This will
allow callbacks to attach a descriptor to the fgraph_ops private field that
will be added in the future and use it in the callbacks. This will be useful
when more than one callback can be registered to the function graph tracer.
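
A sketch of what this enables (hypothetical descriptor; the private field
itself arrives in a later patch of this series):

struct my_data {		/* hypothetical per-registration state */
	bool	enabled;
};

static int my_graph_entry(struct ftrace_graph_ent *trace,
			  struct fgraph_ops *gops)
{
	struct my_data *data = gops->private;

	/* Each registration can now carry and consult its own state. */
	return data && data->enabled;
}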

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v2:
  - cleanup to set argument name on function prototype.
---
 include/linux/ftrace.h   |   10 +++---
 kernel/trace/fgraph.c|   16 +---
 kernel/trace/ftrace.c|6 --
 kernel/trace/trace.h |4 ++--
 kernel/trace/trace_functions_graph.c |   11 +++
 kernel/trace/trace_irqsoff.c |6 --
 kernel/trace/trace_sched_wakeup.c|6 --
 kernel/trace/trace_selftest.c|5 +++--
 8 files changed, 40 insertions(+), 24 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 0a1a7316de7b..f9e51cbf9a97 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1055,11 +1055,15 @@ struct ftrace_graph_ret {
unsigned long long rettime;
 } __packed;
 
+struct fgraph_ops;
+
 /* Type of the callback handlers for tracing function graph*/
-typedef void (*trace_func_graph_ret_t)(struct ftrace_graph_ret *); /* return */
-typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *); /* entry */
+typedef void (*trace_func_graph_ret_t)(struct ftrace_graph_ret *,
+  struct fgraph_ops *); /* return */
+typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *,
+ struct fgraph_ops *); /* entry */
 
-extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace);
+extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace, struct fgraph_ops *gops);
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index c50edd1344e5..1fcbae5cc6d6 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -144,13 +144,13 @@ add_fgraph_index_bitmap(struct task_struct *t, int offset, unsigned long bitmap)
 }
 
 /* ftrace_graph_entry set to this to tell some archs to run function graph */
-static int entry_run(struct ftrace_graph_ent *trace)
+static int entry_run(struct ftrace_graph_ent *trace, struct fgraph_ops *ops)
 {
return 0;
 }
 
 /* ftrace_graph_return set to this to tell some archs to run function graph */
-static void return_run(struct ftrace_graph_ret *trace)
+static void return_run(struct ftrace_graph_ret *trace, struct fgraph_ops *ops)
 {
 }
 
@@ -211,12 +211,14 @@ int __weak ftrace_disable_ftrace_graph_caller(void)
 }
 #endif
 
-int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
+int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace,
+   struct fgraph_ops *gops)
 {
return 0;
 }
 
-static void ftrace_graph_ret_stub(struct ftrace_graph_ret *trace)
+static void ftrace_graph_ret_stub(struct ftrace_graph_ret *trace,
+ struct fgraph_ops *gops)
 {
 }
 
@@ -377,7 +379,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
		if (gops == &fgraph_stub)
continue;
 
-   if (gops->entryfunc(&trace))
+   if (gops->entryfunc(&trace, gops))
bitmap |= BIT(i);
}
 
@@ -525,7 +527,7 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
		if (gops == &fgraph_stub)
continue;
 
-   gops->retfunc(&trace);
+   gops->retfunc(&trace, gops);
}
 
/*
@@ -679,7 +681,7 @@ void ftrace_graph_sleep_time_control(bool enable)
  * Simply points to ftrace_stub, but with the proper protocol.
  * Defined by the linker script in linux/vmlinux.lds.h
  */
-extern void ftrace_stub_graph(struct ftrace_graph_ret *);
+void ftrace_stub_graph(struct ftrace_graph_ret *trace, struct fgraph_ops *gops);
 
 /* The callbacks that hook a function */
 trace_func_graph_ret_t ftrace_graph_return = ftrace_stub_graph;
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index fef833f63647..4b0708106692 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -815,7 +815,8 @@ void ftrace_graph_graph_time_control(bool enable)
fgraph_graph_time = enable;
 }
 
-static int profile_graph_entry(struct ftrace_graph_ent *trace)
+static int profile_graph_entry(struct ftrace_graph_ent *trace,
+  struct fgraph_ops *gops)
 {
struct ftrace_ret_stack *ret_stack;
 
@@ -832,7 +833,8 @@ static int profile_graph_entry(struct ftrace_graph_ent *trace)
return 1;
 }
 
-static void profile_graph_return(struct ftrace_graph_ret *trace)
+static void profile_graph_return(struct ftrace_graph_ret *trace,
+struct fgraph_ops *gops)
 {

[PATCH v10 08/36] function_graph: Remove logic around ftrace_graph_entry and return

2024-05-07 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

The function pointers ftrace_graph_entry and ftrace_graph_return are no
longer called via the function_graph tracer. Instead, an array structure is
now used that will allow for multiple users of the function_graph
infrastructure. The variables are still used by the architecture code for
non dynamic ftrace configs, where a test is made against them to see if
they point to the default stub function or not. This is how the static
function tracing knows to call into the function graph tracer
infrastructure or not.

Two new stub functions are made: entry_run() and return_run(). The
ftrace_graph_entry and ftrace_graph_return are set to them respectively
when the function graph tracer is enabled, and this will trigger the
architecture specific function graph code to be executed.

This also requires checking the global_ops hash for all calls into the
function_graph tracer.

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v2:
  - Fix typo and make lines shorter than 76 chars in the description.
  - Remove unneeded return from return_run() function.
---
 kernel/trace/fgraph.c  |   67 +---
 kernel/trace/ftrace.c  |2 -
 kernel/trace/ftrace_internal.h |2 -
 3 files changed, 15 insertions(+), 56 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 6b06962657fe..c50edd1344e5 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -143,6 +143,17 @@ add_fgraph_index_bitmap(struct task_struct *t, int offset, unsigned long bitmap)
t->ret_stack[offset] |= (bitmap << FGRAPH_INDEX_SHIFT);
 }
 
+/* ftrace_graph_entry set to this to tell some archs to run function graph */
+static int entry_run(struct ftrace_graph_ent *trace)
+{
+   return 0;
+}
+
+/* ftrace_graph_return set to this to tell some archs to run function graph */
+static void return_run(struct ftrace_graph_ret *trace)
+{
+}
+
 /*
  * @offset: The offset into @t->ret_stack to find the ret_stack entry
  * @frame_offset: Where to place the offset into @t->ret_stack of that entry
@@ -673,7 +684,6 @@ extern void ftrace_stub_graph(struct ftrace_graph_ret *);
 /* The callbacks that hook a function */
 trace_func_graph_ret_t ftrace_graph_return = ftrace_stub_graph;
 trace_func_graph_ent_t ftrace_graph_entry = ftrace_graph_entry_stub;
-static trace_func_graph_ent_t __ftrace_graph_entry = ftrace_graph_entry_stub;
 
 /* Try to assign a return stack array on FTRACE_RETSTACK_ALLOC_SIZE tasks. */
 static int alloc_retstack_tasklist(unsigned long **ret_stack_list)
@@ -756,46 +766,6 @@ ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
}
 }
 
-static int ftrace_graph_entry_test(struct ftrace_graph_ent *trace)
-{
-   if (!ftrace_ops_test(&global_ops, trace->func, NULL))
-   return 0;
-   return __ftrace_graph_entry(trace);
-}
-
-/*
- * The function graph tracer should only trace the functions defined
- * by set_ftrace_filter and set_ftrace_notrace. If another function
- * tracer ops is registered, the graph tracer requires testing the
- * function against the global ops, and not just trace any function
- * that any ftrace_ops registered.
- */
-void update_function_graph_func(void)
-{
-   struct ftrace_ops *op;
-   bool do_test = false;
-
-   /*
-* The graph and global ops share the same set of functions
-* to test. If any other ops is on the list, then
-* the graph tracing needs to test if its the function
-* it should call.
-*/
-   do_for_each_ftrace_op(op, ftrace_ops_list) {
-   if (op != &global_ops && op != &graph_ops &&
-   op != &ftrace_list_end) {
-   do_test = true;
-   /* in double loop, break out with goto */
-   goto out;
-   }
-   } while_for_each_ftrace_op(op);
- out:
-   if (do_test)
-   ftrace_graph_entry = ftrace_graph_entry_test;
-   else
-   ftrace_graph_entry = __ftrace_graph_entry;
-}
-
 static DEFINE_PER_CPU(unsigned long *, idle_ret_stack);
 
 static void
@@ -937,18 +907,12 @@ int register_ftrace_graph(struct fgraph_ops *gops)
ftrace_graph_active--;
goto out;
}
-
-   ftrace_graph_return = gops->retfunc;
-
/*
-* Update the indirect function to the entryfunc, and the
-* function that gets called to the entry_test first. Then
-* call the update fgraph entry function to determine if
-* the entryfunc should be called directly or not.
+* Some archs just test to see if these are not
+* the default function
 */
-   __ftrace_graph_entry = gops->entryfunc;
-   ftrace_graph_entry = ftrace_graph_entry_test;
-   update_function_graph_func();
+   

[PATCH v10 06/36] function_graph: Add an array structure that will allow multiple callbacks

2024-05-07 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

Add an array structure that will eventually allow the function graph tracer
to have up to 16 simultaneous callbacks attached. It's an array of 16
fgraph_ops pointers, where a slot is assigned when one is registered. On
entry of a function, the entryfunc of the first item in the array is
called, and if it returns non zero, the return callback will be called on
exit of the function.

The array will simplify the process of having more than one callback
attached to the same function, as its index into the array can be stored on
the shadow stack. We only need to save the index, because this will allow
the fgraph_ops to be freed before the function returns (which may happen if
the function calls schedule() and sleeps for a long time).
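
The reason for saving an index rather than a pointer can be sketched like
this (illustrative helper, not in the patch; the stub guarantees the slot
always holds callable data):

/* On function return, resolve the callback through the live array.
 * The fgraph_ops that was active at entry may already be gone; its
 * slot then holds &fgraph_stub, which is safe to call.
 */
static void call_retfunc_by_index(int idx, struct ftrace_graph_ret *trace)
{
	struct fgraph_ops *gops = fgraph_array[idx];

	gops->retfunc(trace);
}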

Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v2:
  - Remove unneeded brace.
---
 kernel/trace/fgraph.c |  114 +++--
 1 file changed, 81 insertions(+), 33 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 6f8d36370994..3f9dd213e7d8 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -39,6 +39,11 @@
 DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
 int ftrace_graph_active;
 
+static int fgraph_array_cnt;
+#define FGRAPH_ARRAY_SIZE  16
+
+static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
+
 /* Both enabled by default (can be cleared by function_graph tracer flags */
 static bool fgraph_sleep_time = true;
 
@@ -62,6 +67,20 @@ int __weak ftrace_disable_ftrace_graph_caller(void)
 }
 #endif
 
+int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
+{
+   return 0;
+}
+
+static void ftrace_graph_ret_stub(struct ftrace_graph_ret *trace)
+{
+}
+
+static struct fgraph_ops fgraph_stub = {
+   .entryfunc = ftrace_graph_entry_stub,
+   .retfunc = ftrace_graph_ret_stub,
+};
+
 /**
  * ftrace_graph_stop - set to permanently disable function graph tracing
  *
@@ -159,7 +178,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
goto out;
 
/* Only trace if the calling function expects to */
-   if (!ftrace_graph_entry(&trace))
+   if (!fgraph_array[0]->entryfunc(&trace))
goto out_ret;
 
return 0;
@@ -274,7 +293,7 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
trace.retval = fgraph_ret_regs_return_value(ret_regs);
 #endif
trace.rettime = trace_clock_local();
-   ftrace_graph_return(&trace);
+   fgraph_array[0]->retfunc(&trace);
/*
 * The ftrace_graph_return() may still access the current
 * ret_stack structure, we need to make sure the update of
@@ -410,11 +429,6 @@ void ftrace_graph_sleep_time_control(bool enable)
fgraph_sleep_time = enable;
 }
 
-int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
-{
-   return 0;
-}
-
 /*
  * Simply points to ftrace_stub, but with the proper protocol.
  * Defined by the linker script in linux/vmlinux.lds.h
@@ -652,37 +666,54 @@ static int start_graph_tracing(void)
 int register_ftrace_graph(struct fgraph_ops *gops)
 {
int ret = 0;
+   int i;
 
mutex_lock(&ftrace_lock);
 
-   /* we currently allow only one tracer registered at a time */
-   if (ftrace_graph_active) {
+   if (!fgraph_array[0]) {
+   /* The array must always have real data on it */
+   for (i = 0; i < FGRAPH_ARRAY_SIZE; i++)
+   fgraph_array[i] = &fgraph_stub;
+   }
+
+   /* Look for an available spot */
+   for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
+   if (fgraph_array[i] == &fgraph_stub)
+   break;
+   }
+   if (i >= FGRAPH_ARRAY_SIZE) {
ret = -EBUSY;
goto out;
}
 
-   register_pm_notifier(&ftrace_suspend_notifier);
+   fgraph_array[i] = gops;
+   if (i + 1 > fgraph_array_cnt)
+   fgraph_array_cnt = i + 1;
 
ftrace_graph_active++;
-   ret = start_graph_tracing();
-   if (ret) {
-   ftrace_graph_active--;
-   goto out;
-   }
 
-   ftrace_graph_return = gops->retfunc;
+   if (ftrace_graph_active == 1) {
+   register_pm_notifier(&ftrace_suspend_notifier);
+   ret = start_graph_tracing();
+   if (ret) {
+   ftrace_graph_active--;
+   goto out;
+   }
+
+   ftrace_graph_return = gops->retfunc;
 
-   /*
-* Update the indirect function to the entryfunc, and the
-* function that gets called to the entry_test first. Then
-* call the update fgraph entry function to determine if
-* the entryfunc should be called directly or not.
-*/
-   __ftrace_graph_entry = gops->entryfunc;
-   ftrace_graph_entry = ftrace_graph_entry_test;
-   update_function_graph_func();
+   /*
+  

[PATCH v10 05/36] fgraph: Use BUILD_BUG_ON() to make sure we have structures divisible by long

2024-05-07 Thread Masami Hiramatsu (Google)
From: Steven Rostedt (VMware) 

Instead of using "ALIGN()", use BUILD_BUG_ON() as the structures should
always be divisible by sizeof(long).
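
For reference, the equivalence relied on here can be checked with a small
standalone userspace program (a sketch, not part of the patch; ALIGN and
DIV_ROUND_UP are redefined locally to mirror the kernel macros):

	#include <assert.h>
	#include <stdio.h>

	#define ALIGN(x, a)        (((x) + (a) - 1) & ~((a) - 1))
	#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

	int main(void)
	{
		/* For a power-of-two divisor such as sizeof(long), the old
		 * ALIGN-based index and DIV_ROUND_UP always agree.
		 */
		for (unsigned long x = 1; x <= 128; x++)
			assert(ALIGN(x, sizeof(long)) / sizeof(long) ==
			       DIV_ROUND_UP(x, sizeof(long)));
		printf("ALIGN and DIV_ROUND_UP agree for all tested sizes\n");
		return 0;
	}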

Link: 
http://lkml.kernel.org/r/2019052444.gi2...@hirez.programming.kicks-ass.net

Suggested-by: Peter Zijlstra 
Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v7:
  - Use DIV_ROUND_UP() to calculate FGRAPH_RET_INDEX
---
 kernel/trace/fgraph.c |9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 30edeb6d4aa9..6f8d36370994 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -26,10 +26,9 @@
 #endif
 
 #define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
-#define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
+#define FGRAPH_RET_INDEX DIV_ROUND_UP(FGRAPH_RET_SIZE, sizeof(long))
 #define SHADOW_STACK_SIZE (PAGE_SIZE)
-#define SHADOW_STACK_INDEX \
-   (ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
+#define SHADOW_STACK_INDEX (SHADOW_STACK_SIZE / sizeof(long))
 /* Leave on a buffer at the end */
 #define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
 
@@ -91,6 +90,8 @@ ftrace_push_return_trace(unsigned long ret, unsigned long 
func,
if (!current->ret_stack)
return -EBUSY;
 
+   BUILD_BUG_ON(SHADOW_STACK_SIZE % sizeof(long));
+
/*
 * We must make sure the ret_stack is tested before we read
 * anything else.
@@ -325,6 +326,8 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int 
idx)
 {
int index = task->curr_ret_stack;
 
+   BUILD_BUG_ON(FGRAPH_RET_SIZE % sizeof(long));
+
index -= FGRAPH_RET_INDEX * (idx + 1);
if (index < 0)
return NULL;




[PATCH v10 02/36] tracing: Rename ftrace_regs_return_value to ftrace_regs_get_return_value

2024-05-07 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Rename ftrace_regs_return_value to ftrace_regs_get_return_value to be
consistent with the other ftrace_regs_get/set_* APIs.
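
After the rename, an exit-side user reads the return value like this (a
minimal sketch; the surrounding context is illustrative, not from the patch):

	/* fregs: the struct ftrace_regs * passed to a return-side handler */
	unsigned long retval = ftrace_regs_get_return_value(fregs);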

Signed-off-by: Masami Hiramatsu (Google) 
Acked-by: Mark Rutland 
---
 Changes in v6:
  - Moved to top of the series.
 Changes in v3:
  - Newly added.
---
 arch/loongarch/include/asm/ftrace.h |2 +-
 arch/powerpc/include/asm/ftrace.h   |2 +-
 arch/s390/include/asm/ftrace.h  |2 +-
 arch/x86/include/asm/ftrace.h   |2 +-
 include/linux/ftrace.h  |2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/loongarch/include/asm/ftrace.h 
b/arch/loongarch/include/asm/ftrace.h
index de891c2c83d4..b43acfc5776c 100644
--- a/arch/loongarch/include/asm/ftrace.h
+++ b/arch/loongarch/include/asm/ftrace.h
@@ -70,7 +70,7 @@ ftrace_regs_set_instruction_pointer(struct ftrace_regs 
*fregs, unsigned long ip)
regs_get_kernel_argument(&(fregs)->regs, n)
 #define ftrace_regs_get_stack_pointer(fregs) \
kernel_stack_pointer(&(fregs)->regs)
-#define ftrace_regs_return_value(fregs) \
+#define ftrace_regs_get_return_value(fregs) \
regs_return_value(&(fregs)->regs)
 #define ftrace_regs_set_return_value(fregs, ret) \
regs_set_return_value(&(fregs)->regs, ret)
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 107fc5a48456..cfec6c5a47d0 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -61,7 +61,7 @@ ftrace_regs_get_instruction_pointer(struct ftrace_regs *fregs)
regs_get_kernel_argument(&(fregs)->regs, n)
 #define ftrace_regs_get_stack_pointer(fregs) \
kernel_stack_pointer(&(fregs)->regs)
-#define ftrace_regs_return_value(fregs) \
+#define ftrace_regs_get_return_value(fregs) \
regs_return_value(&(fregs)->regs)
 #define ftrace_regs_set_return_value(fregs, ret) \
regs_set_return_value(&(fregs)->regs, ret)
diff --git a/arch/s390/include/asm/ftrace.h b/arch/s390/include/asm/ftrace.h
index 621f23d5ae30..1912b598d1b8 100644
--- a/arch/s390/include/asm/ftrace.h
+++ b/arch/s390/include/asm/ftrace.h
@@ -88,7 +88,7 @@ ftrace_regs_set_instruction_pointer(struct ftrace_regs *fregs,
regs_get_kernel_argument(&(fregs)->regs, n)
 #define ftrace_regs_get_stack_pointer(fregs) \
kernel_stack_pointer(&(fregs)->regs)
-#define ftrace_regs_return_value(fregs) \
+#define ftrace_regs_get_return_value(fregs) \
regs_return_value(&(fregs)->regs)
 #define ftrace_regs_set_return_value(fregs, ret) \
regs_set_return_value(&(fregs)->regs, ret)
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 897cf02c20b1..cf88cc8cc74d 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -58,7 +58,7 @@ arch_ftrace_get_regs(struct ftrace_regs *fregs)
regs_get_kernel_argument(&(fregs)->regs, n)
 #define ftrace_regs_get_stack_pointer(fregs) \
kernel_stack_pointer(&(fregs)->regs)
-#define ftrace_regs_return_value(fregs) \
+#define ftrace_regs_get_return_value(fregs) \
regs_return_value(&(fregs)->regs)
 #define ftrace_regs_set_return_value(fregs, ret) \
regs_set_return_value(&(fregs)->regs, ret)
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index b81f1afa82a1..d5df5f8fc35a 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -184,7 +184,7 @@ static __always_inline bool ftrace_regs_has_args(struct 
ftrace_regs *fregs)
regs_get_kernel_argument(ftrace_get_regs(fregs), n)
 #define ftrace_regs_get_stack_pointer(fregs) \
kernel_stack_pointer(ftrace_get_regs(fregs))
-#define ftrace_regs_return_value(fregs) \
+#define ftrace_regs_get_return_value(fregs) \
regs_return_value(ftrace_get_regs(fregs))
 #define ftrace_regs_set_return_value(fregs, ret) \
regs_set_return_value(ftrace_get_regs(fregs), ret)




[PATCH v10 03/36] x86: tracing: Add ftrace_regs definition in the header

2024-05-07 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

Add ftrace_regs definition for x86_64 in the ftrace header to
clarify what registers will be accessible from ftrace_regs.

Signed-off-by: Masami Hiramatsu (Google) 
---
 Changes in v3:
  - Add rip to be saved.
 Changes in v2:
  - Newly added.
---
 arch/x86/include/asm/ftrace.h |6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index cf88cc8cc74d..c88bf47f46da 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -36,6 +36,12 @@ static inline unsigned long ftrace_call_adjust(unsigned long 
addr)
 
 #ifdef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
 struct ftrace_regs {
+   /*
+* On x86_64, ftrace_regs saves:
+* rax, rcx, rdx, rdi, rsi, r8, r9, rbp, rip and rsp.
+* Also orig_ax is used for passing the direct trampoline address.
+* x86_32 doesn't support ftrace_regs.
+*/
struct pt_regs  regs;
 };
 




[PATCH v10 01/36] tracing: Add a comment about ftrace_regs definition

2024-05-07 Thread Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google) 

To clarify what will be expected on ftrace_regs, add a comment to the
architecture-independent definition of ftrace_regs.

Signed-off-by: Masami Hiramatsu (Google) 
Acked-by: Mark Rutland 
---
 Changes in v8:
  - Update that the saved registers depends on the context.
 Changes in v3:
  - Add instruction pointer
 Changes in v2:
  - newly added.
---
 include/linux/ftrace.h |   26 ++
 1 file changed, 26 insertions(+)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 54d53f345d14..b81f1afa82a1 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -118,6 +118,32 @@ extern int ftrace_enabled;
 
 #ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
 
+/**
+ * ftrace_regs - ftrace partial/optimal register set
+ *
+ * ftrace_regs represents a group of registers which are used at
+ * function entry and exit. There are three types of registers.
+ *
+ * - Registers for passing the parameters to the callee, including the stack
+ *   pointer. (e.g. rcx, rdx, rdi, rsi, r8, r9 and rsp on x86_64)
+ * - Registers for passing the return values to the caller.
+ *   (e.g. rax and rdx on x86_64)
+ * - Registers for hooking the function call and return, including the
+ *   frame pointer (the frame pointer is architecture/config dependent)
+ *   (e.g. rip, rbp and rsp for x86_64)
+ *
+ * Also, architecture-dependent fields can be used for internal processing.
+ * (e.g. orig_ax on x86_64)
+ *
+ * On function entry, those registers will be restored except for
+ * the stack pointer, so that the user can change the function parameters
+ * and instruction pointer (e.g. live patching.)
+ * On function exit, only the registers which are used for return values
+ * are restored.
+ *
+ * NOTE: user *must not* access regs directly, only do it via the APIs,
+ * because the members can be changed depending on the architecture.
+ */
 struct ftrace_regs {
struct pt_regs  regs;
 };
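
To illustrate the "only via APIs" rule, a callback would use the accessors
like this (a sketch, not from the patch; the callback shape follows the
DYNAMIC_FTRACE_WITH_ARGS convention and new_ip is hypothetical):

	static void my_callback(unsigned long ip, unsigned long parent_ip,
				struct ftrace_ops *op, struct ftrace_regs *fregs)
	{
		/* Read entry state only through the accessor macros. */
		unsigned long arg0 = ftrace_regs_get_argument(fregs, 0);
		unsigned long sp   = ftrace_regs_get_stack_pointer(fregs);

		/* e.g. live patching: redirect execution at function entry. */
		ftrace_regs_set_instruction_pointer(fregs, new_ip);
	}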




[PATCH v10 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

2024-05-07 Thread Masami Hiramatsu (Google)
Hi,

Here is the 10th version of the series to re-implement the fprobe on
function-graph tracer. The previous version is;

https://lore.kernel.org/all/171318533841.254850.15841395205784342850.stgit@devnote2/

This version is ported onto the latest kernel (v6.9-rc6 + probes/for-next)
and fixes some bugs + adds performance optimizations.
 - [7/36] Fix terminology in comments and code. Use "offset" instead
  of "index" for shadow stack. This also updates the macros.
 - [18/36] Fix supported data size bug.
 - [29/36] Define bpf_kprobe_multi_pt_regs only if it is used.
 - [36/36] Add likely() to skip timestamp.

Overview

This series makes 2 major changes: enabling multiple function-graphs on
ftrace (e.g. allowing function-graph on sub-instances) and rewriting the
fprobe on top of this function-graph.

The former change was first sent by Steven Rostedt 4 years ago (*); it
allows users to configure the function-graph tracer (and other tracers
based on function-graph) differently in each trace instance at the
same time.

(*) https://lore.kernel.org/all/20190525031633.811342...@goodmis.org/

The purposes of the latter change are:

 1) Remove fprobe's dependency on the rethook so that we can reduce the
   duplicated return-hook code and shadow stacks.

 2) Make 'ftrace_regs' the common trace interface for the function
   boundary.

1) Currently we have 2 (or 3) different function-return hook implementations:
 the function-graph tracer and rethook (and the legacy kretprobe).
 Since this is redundant and doubles the maintenance cost, I would
 like to unify them. From the user's viewpoint, the function-graph
 tracer is very useful for grasping the execution path. For this
 purpose, it is hard to use the rethook in the function-graph
 tracer, but the opposite is possible. (Strictly speaking, kretprobe
 cannot use it because it requires 'pt_regs' for historical reasons.)

2) Now the fprobe provides 'pt_regs' to its handler, but that is
 wrong for function entry and exit. Moreover, depending on the
 architecture, there is no way to accurately reproduce 'pt_regs'
 outside of interrupt or exception handlers. This means fprobe should
 not use 'pt_regs' because it does not use such exceptions.
 (Conversely, kprobe should use 'pt_regs' because it is an abstract
  interface of the software breakpoint exception.)

This series changes fprobe to use the function-graph tracer for tracing
function entry and exit, instead of a mixture of ftrace and rethook.
Unlike the rethook, which is a per-task list of system-wide allocated
nodes, the function graph's ret_stack is a per-task shadow stack.
Thus it does not need to set 'nr_maxactive' (which is the number of
pre-allocated nodes).
Also the handlers will get 'ftrace_regs' instead of 'pt_regs'.
Since eBPF multi_kprobe/multi_kretprobe events still use 'pt_regs' as
their register interface, this series converts 'ftrace_regs' to
'pt_regs' for them. Of course this conversion makes an incomplete
'pt_regs', so users must access only the registers for function
parameters or the return value.

Design
--
Instead of using ftrace's function entry hook directly, the new fprobe
is built on top of the function-graph's entry and return callbacks
with 'ftrace_regs'.

Since the fprobe requires access to 'ftrace_regs', the architecture
must support CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS and
CONFIG_HAVE_FTRACE_GRAPH_FUNC, which enable calling the function-graph
entry callback with 'ftrace_regs', and also
CONFIG_HAVE_FUNCTION_GRAPH_FREGS, which passes 'ftrace_regs' to
return_to_handler.

All fprobes share a single function-graph ops (meaning they share a
common ftrace filter), similar to kprobe-on-ftrace. This needs another
layer to find the corresponding fprobe in the common function-graph
callbacks, but it has much better scalability, since the number of
registered function-graph ops is limited.

In the entry callback, the fprobe runs its entry_handler and saves the
address of 'fprobe' on the function-graph's shadow stack as data. The
return callback decodes the data to get the 'fprobe' address, and runs
the exit_handler.

The fprobe introduces two hash tables: one for the entry callback,
which looks up the fprobes related to the function address passed to
the entry callback; the other for the return callback, which checks
whether the given 'fprobe' data structure pointer is still valid. Note
that it is possible to unregister an fprobe before the return callback
runs. Thus the address validation must be done before using it in the
return callback.
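
For orientation, an fprobe user after this series would look roughly like
the sketch below (abridged; the exact handler signatures are defined by the
later patches in this series):

	static int my_entry(struct fprobe *fp, unsigned long ip,
			    unsigned long ret_ip, struct ftrace_regs *fregs,
			    void *entry_data)
	{
		return 0;	/* 0: also invoke the exit handler */
	}

	static void my_exit(struct fprobe *fp, unsigned long ip,
			    unsigned long ret_ip, struct ftrace_regs *fregs,
			    void *entry_data)
	{
		/* Only parameter/return-value registers are valid here. */
	}

	static struct fprobe fp = {
		.entry_handler	= my_entry,
		.exit_handler	= my_exit,
	};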

This series can be applied against the probes/for-next branch, which
is based on v6.9-rc6.

This series can also be found below branch.

https://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git/log/?h=topic/fprobe-on-fgraph

Thank you,

---

Masami Hiramatsu (Google) (21):
  tracing: Add a comment about ftrace_regs definition
  tracing: Rename ftrace_regs_return_value to ftrace_regs_get_return_value
  x86: tracing: Add ftrace_regs definition in the header
  function_graph: Use a simple LRU 

Re: [PATCH 0/3] Fix up qcom,halt-regs definition in various schemas

2024-05-07 Thread Bjorn Andersson


On Sun, 07 Apr 2024 11:58:29 +0200, Luca Weiss wrote:
> The original motivation is that a bunch of other schemas fail to
> validate qcom,halt-regs, for example like in the following examples:
> 
> arch/arm64/boot/dts/qcom/apq8016-sbc.dtb: remoteproc@408: 
> qcom,halt-regs:0: [20] is too short
> from schema $id: 
> http://devicetree.org/schemas/remoteproc/qcom,msm8916-mss-pil.yaml#
> arch/arm64/boot/dts/qcom/apq8096-ifc6640.dtb: remoteproc@208: 
> qcom,halt-regs:0: [82] is too short
> from schema $id: 
> http://devicetree.org/schemas/remoteproc/qcom,msm8996-mss-pil.yaml#
> arch/arm64/boot/dts/qcom/apq8039-t2.dtb: remoteproc@408: 
> qcom,halt-regs:0: [32] is too short
> from schema $id: 
> http://devicetree.org/schemas/remoteproc/qcom,msm8916-mss-pil.yaml#
> 
> [...]

Applied, thanks!

[1/3] dt-bindings: remoteproc: qcom,qcs404-cdsp-pil: Fix qcom,halt-regs 
definition
  commit: a0bcbce661216b9d9d00fb652b35f35da77b2287
[2/3] dt-bindings: remoteproc: qcom,sc7280-wpss-pil: Fix qcom,halt-regs 
definition
  commit: 16e204e958096d649aa1617433f31995a9c60809
[3/3] dt-bindings: remoteproc: qcom,sdm845-adsp-pil: Fix qcom,halt-regs 
definition
  commit: 4d5ba6ead1dc9fa298d727e92db40cd98564d1ac

Best regards,
-- 
Bjorn Andersson 



Re: (subset) [PATCH 0/2] Mark qcom,ipc as deprecated in two schemas

2024-05-07 Thread Bjorn Andersson


On Thu, 25 Apr 2024 21:14:29 +0200, Luca Weiss wrote:
> The mboxes property has been supported in those bindings for a while
> and was always meant to replace the qcom,ipc properties, so let's mark
> qcom,ipc as deprecated - and update the examples to use mboxes.
> 
> Related:
> https://lore.kernel.org/linux-arm-msm/20240424-apcs-mboxes-v1-0-6556c47cb...@z3ntu.xyz/
> 
> [...]

Applied, thanks!

[1/2] dt-bindings: remoteproc: qcom,smd-edge: Mark qcom,ipc as deprecated
  commit: 335617f0d502f80c9b9410c518222b2cb33878e8

Best regards,
-- 
Bjorn Andersson 



Re: (subset) [PATCH v2 0/3] arm64: dts: qcom: msm8996: enable fastrpc and glink-edge

2024-05-07 Thread Bjorn Andersson


On Thu, 18 Apr 2024 09:44:19 +0300, Dmitry Baryshkov wrote:
> Enable the FastRPC and glink-edge nodes on MSM8996 platform. Tested on
> APQ8096 Dragonboard820c.
> 
> 

Applied, thanks!

[1/3] dt-bindings: remoteproc: qcom,msm8996-mss-pil: allow glink-edge on msm8996
  commit: a0acdef561d1699b020ab932a0edb556c4829533

Best regards,
-- 
Bjorn Andersson 



Re: [PATCH 1/2] objpool: enable inlining objpool_push() and objpool_pop() operations

2024-05-07 Thread Vlastimil Babka
On 4/24/24 11:52 PM, Andrii Nakryiko wrote:
> objpool_push() and objpool_pop() are very performance-critical functions
> and can be called very frequently in kretprobe triggering path.
> 
> As such, it makes sense to allow the compiler to inline them completely to
> eliminate function call overhead. Luckily, their logic is quite well
> isolated and doesn't have any sprawling dependencies.
> 
> This patch moves both objpool_push() and objpool_pop() into
> include/linux/objpool.h and marks them as static inline functions,
> enabling inlining. To avoid anyone using internal helpers
> (objpool_try_get_slot, objpool_try_add_slot), rename them to use leading
> underscores.
> 
> We used kretprobe microbenchmark from BPF selftests (bench trig-kprobe
> and trig-kprobe-multi benchmarks) running no-op BPF kretprobe/kretprobe.multi
> programs in a tight loop to evaluate the effect. BPF own overhead in
> this case is minimal and it mostly stresses the rest of in-kernel
> kretprobe infrastructure overhead. Results are in millions of calls per
> second. This is not super scientific, but shows the trend nevertheless.
> 
> BEFORE
> ==
> kretprobe  :9.794 ± 0.086M/s
> kretprobe-multi:   10.219 ± 0.032M/s
> 
> AFTER
> =
> kretprobe  :9.937 ± 0.174M/s (+1.5%)
> kretprobe-multi:   10.440 ± 0.108M/s (+2.2%)
> 
> Cc: Matt (Qiang) Wu 
> Signed-off-by: Andrii Nakryiko 

Hello,

this question is not specific to your patch, but since it's a recent thread,
I'll ask it here instead of digging up the original objpool patches.

I'm trying to understand how objpool works and if it could be integrated
into SLUB, for the LSF/MM discussion next week:

https://lore.kernel.org/all/b929d5fb-8e88-4f23-8ec7-6bdaf61f8...@suse.cz/

> +/* adding object to slot, abort if the slot was already full */

I don't see any actual abort in the code (not in this code nor in the
deleted code - it's the same code, just moved for inlining purposes).

> +static inline int
> +__objpool_try_add_slot(void *obj, struct objpool_head *pool, int cpu)
> +{
> + struct objpool_slot *slot = pool->cpu_slots[cpu];
> + uint32_t head, tail;
> +
> + /* loading tail and head as a local snapshot, tail first */
> + tail = READ_ONCE(slot->tail);
> +
> + do {
> + head = READ_ONCE(slot->head);
> + /* fault caught: something must be wrong */
> + WARN_ON_ONCE(tail - head > pool->nr_objs);

So this will only WARN if we go over the capacity, but continue and
overwrite a pointer that was already there, effectively leaking said object, no?

> + } while (!try_cmpxchg_acquire(&slot->tail, &head, tail + 1));
> +
> + /* now the tail position is reserved for the given obj */
> + WRITE_ONCE(slot->entries[tail & slot->mask], obj);
> + /* update sequence to make this obj available for pop() */
> + smp_store_release(&slot->last, tail + 1);
> +
> + return 0;
> +}
>  
>  /**
>   * objpool_push() - reclaim the object and return back to objpool
> @@ -134,7 +219,19 @@ void *objpool_pop(struct objpool_head *pool);
>   * return: 0 or error code (it fails only when user tries to push
>   * the same object multiple times or wrong "objects" into objpool)
>   */
> -int objpool_push(void *obj, struct objpool_head *pool);
> +static inline int objpool_push(void *obj, struct objpool_head *pool)
> +{
> + unsigned long flags;
> + int rc;
> +
> + /* disable local irq to avoid preemption & interruption */
> + raw_local_irq_save(flags);
> + rc = __objpool_try_add_slot(obj, pool, raw_smp_processor_id());

And IIUC, we could in theory objpool_pop() on one cpu, then later another
cpu might do objpool_push() and cause the latter cpu's pool to go over
capacity? Are there some implicit requirements on objpool users to take
care of having matched cpus for pop and push? Are the current objpool
users obeying this requirement? (I can see the selftests do, not sure
about the actual users.)
Or am I missing something? (A standalone toy model of the over-capacity
scenario is sketched at the end of this message.) Thanks.

> + raw_local_irq_restore(flags);
> +
> + return rc;
> +}
> +
>  
>  /**
>   * objpool_drop() - discard the object and deref objpool
> diff --git a/lib/objpool.c b/lib/objpool.c
> index cfdc02420884..f696308fc026 100644
> --- a/lib/objpool.c
> +++ b/lib/objpool.c
> @@ -152,106 +152,6 @@ int objpool_init(struct objpool_head *pool, int 
> nr_objs, int object_size,
>  }
>  EXPORT_SYMBOL_GPL(objpool_init);
>  
> -/* adding object to slot, abort if the slot was already full */
> -static inline int
> -objpool_try_add_slot(void *obj, struct objpool_head *pool, int cpu)
> -{
> - struct objpool_slot *slot = pool->cpu_slots[cpu];
> - uint32_t head, tail;
> -
> - /* loading tail and head as a local snapshot, tail first */
> - tail = READ_ONCE(slot->tail);
> -
> - do {
> - head = READ_ONCE(slot->head);
> - /* fault caught: something must be wrong */
> - WARN_ON_ONCE(tail - head > pool->nr_objs);
> - } while (!try_cmpxchg_acquire(&slot->tail, &head, tail + 1));
> 
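
To make the over-capacity question above concrete, here is a standalone toy
model (userspace, capacity 2; it mimics only the tail/head arithmetic, not
the objpool code itself). The third push silently overwrites a live entry,
while the WARN condition only fires one push later:

	#include <stdint.h>
	#include <stdio.h>

	#define NR_OBJS	2		/* slot capacity */
	#define MASK	(NR_OBJS - 1)	/* ring mask (power of two) */

	int main(void)
	{
		void *entries[NR_OBJS] = { 0 };
		uint32_t head = 0, tail = 0;
		char a, b, c, d;
		void *objs[4] = { &a, &b, &c, &d };

		for (int i = 0; i < 4; i++) {
			if (tail - head > NR_OBJS)	/* the WARN_ON_ONCE check */
				printf("push %d: over capacity detected\n", i);
			entries[tail & MASK] = objs[i];	/* push 2 overwrites &a */
			tail++;
		}
		printf("entries[0] == &c: %s (&a was silently dropped)\n",
		       entries[0] == &c ? "yes" : "no");
		return 0;
	}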

Re: [PATCH] mm: Remove mm argument from mm_get_unmapped_area()

2024-05-07 Thread Edgecombe, Rick P
On Mon, 2024-05-06 at 09:18 -0700, Christoph Hellwig wrote:
> > On Mon, May 06, 2024 at 09:07:47AM -0700, Rick Edgecombe wrote:
> > > > if (flags & MAP_FIXED) {
> > > > /* Ok, don't mess with it. */
> > > > -   return mm_get_unmapped_area(current->mm, NULL, orig_addr,
> > > > -  len, pgoff, flags);
> > > > +   return current_get_unmapped_area(NULL, orig_addr, len,
> > > > +  pgoff, flags);
> > 
> > The old name seems preferable because it's not as crazy long.  In fact
> > just get_unmapped_area would be even better, but that's already taken
> > by something else.

Ok.

> > 
> > Can we maybe take a step back and sort out the mess of the various
> > _get_unmapped_area helpers?
> > 
> > e.g. mm_get_unmapped_area_vmflags just wraps
> > arch_get_unmapped_area_topdown_vmflags and
> > arch_get_unmapped_area_vmflags, and we might as well merge all three
> > by moving the MMF_TOPDOWN into two actual implementations?
> > 
> > And then just update all the implementations to always pass the
> > vm_flags instead of having separate implementations with our without
> > the flags.
> > 
> > And then make __get_unmapped_area static in mmap.c nad move the
> > get_unmapped_area wrappers there.  And eventually write some
> > documentation for the functions based on the learnings who actually
> > uses what..

The rest of the series[0] is in the mm-tree/linux-next currently. Are you
suggesting we not do this patch, and leave the rest you describe here for the
future? I think the removal of the indirect branch is at least a positive step
forward.


[0]
https://lore.kernel.org/lkml/20240326021656.202649-1-rick.p.edgeco...@intel.com/


Re: [PATCH] mm: Remove mm argument from mm_get_unmapped_area()

2024-05-07 Thread Edgecombe, Rick P
On Mon, 2024-05-06 at 12:32 -0400, Liam R. Howlett wrote:
> 
> I like this patch.

Thanks for taking a look.

> 
> I think the context of current->mm is implied. IOW, could we call it
> get_unmapped_area() instead?  There are other functions today that use
> current->mm that don't start with current_.  I probably should
> have responded to Dan's suggestion with my comment.

Yes, get_unmapped_area() is already taken. What else to call it... It is kind of
the process "default" get_unmapped_area(). But with Christoph's proposal it
would basically be arch_get_unmapped_area().

> 
> Either way, this is a minor thing so feel free to add:
> Reviewed-by: Liam R. Howlett 



Re: [PATCH 1/1] livepatch: Rename KLP_* to KLP_TRANSITION_*

2024-05-07 Thread Petr Mladek
On Tue 2024-05-07 13:01:11, zhangwar...@gmail.com wrote:
> From: Wardenjohn 
> 
> The original KLP_* macros are about the state of the transition.
> Rename the KLP_* macros to KLP_TRANSITION_* to fix the confusing
> description of the klp transition state.
> 
> Signed-off-by: Wardenjohn 

Looks good to me:

Reviewed-by: Petr Mladek 
Tested-by: Petr Mladek 

Best Regards,
Petr



Re: [PATCH RT 0/1] Linux v4.19.312-rt134-rc2

2024-05-07 Thread Daniel Wagner
Hi Sebastian,

On Tue, May 07, 2024 at 11:54:07AM GMT, Sebastian Andrzej Siewior wrote:
> I compared mine outcome vs v4.19-rt-next and the diff at the bottom came
> out:
> 
> - timer_delete_sync() used to have "#if 0" block around
>   lockdep_assert_preemption_enabled() because the function is not part
>   of v4.19. You ended up with might_sleep() which is a minor change.
>   Your queue as of a previous release had the if0 block (in
>   __del_timer_sync()).
>   I would say this is minor but it looks like a mis-merge. Therefore I
>   would say it should go back for consistency with the previous release
>   and not be changed due to conflicts.

Makes sense.

> - The timer_delete_sync() is structured differently with
>   __del_timer_sync(). That function invokes timer_sync_wait_running()
>   which drops two locks which are not acquired. That is wrong. It should
>   have been del_timer_wait_running().

Understood. I was struggling a bit here. Glad you caught this error.

> I suggest you apply the diff below to align it with later versions. It
> also gets rid of the basep argument in __try_to_del_timer_sync() which
> is not really used.

Will do.

> As an alternative I can send you my rebased queue if this makes it
> easier for you.

Yes please, this makes it easy to sync the rebase branch.

Thanks a lot!

Cheers,
Daniel



[PATCHv5 8/8] man2: Add uretprobe syscall page

2024-05-07 Thread Jiri Olsa
Adding a man page for the new uretprobe syscall.

Signed-off-by: Jiri Olsa 
---
 man2/uretprobe.2 | 50 
 1 file changed, 50 insertions(+)
 create mode 100644 man2/uretprobe.2

diff --git a/man2/uretprobe.2 b/man2/uretprobe.2
new file mode 100644
index ..690fe3b1a44f
--- /dev/null
+++ b/man2/uretprobe.2
@@ -0,0 +1,50 @@
+.\" Copyright (C) 2024, Jiri Olsa 
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH uretprobe 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+uretprobe \- execute pending return uprobes
+.SH SYNOPSIS
+.nf
+.B int uretprobe(void)
+.fi
+.SH DESCRIPTION
+The
+.BR uretprobe ()
+syscall is an alternative to breakpoint instructions for
+triggering return uprobe consumers.
+.P
+Calls to
+.BR uretprobe ()
syscall are only made from the user-space trampoline provided by the kernel.
+Calls from any other place result in a
+.BR SIGILL .
+
+.SH RETURN VALUE
+The
+.BR uretprobe ()
+syscall return value is architecture-specific.
+
+.SH VERSIONS
+This syscall is not specified in POSIX,
+and details of its behavior vary across systems.
+.SH STANDARDS
+None.
+.SH HISTORY
+TBD
+.SH NOTES
+The
+.BR uretprobe ()
+syscall was initially introduced for the x86_64 architecture where it was shown
+to be faster than breakpoint traps. It might be extended to other 
architectures.
+.P
+The
+.BR uretprobe ()
+syscall exists only to allow the invocation of return uprobe consumers.
+It should
+.B never
+be called directly.
+Details of the arguments (if any) passed to
+.BR uretprobe ()
+and the return value are architecture-specific.
-- 
2.44.0




[PATCHv5 bpf-next 7/8] selftests/x86: Add return uprobe shadow stack test

2024-05-07 Thread Jiri Olsa
Adding return uprobe test for shadow stack and making sure it's
working properly. Borrowed some of the code from bpf selftests.

Signed-off-by: Jiri Olsa 
---
 .../testing/selftests/x86/test_shadow_stack.c | 142 ++
 1 file changed, 142 insertions(+)

diff --git a/tools/testing/selftests/x86/test_shadow_stack.c 
b/tools/testing/selftests/x86/test_shadow_stack.c
index 757e6527f67e..1b919baa999b 100644
--- a/tools/testing/selftests/x86/test_shadow_stack.c
+++ b/tools/testing/selftests/x86/test_shadow_stack.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Define the ABI defines if needed, so people can run the tests
@@ -681,6 +682,141 @@ int test_32bit(void)
return !segv_triggered;
 }
 
+static int parse_uint_from_file(const char *file, const char *fmt)
+{
+   int err, ret;
+   FILE *f;
+
+   f = fopen(file, "re");
+   if (!f) {
+   err = -errno;
+   printf("failed to open '%s': %d\n", file, err);
+   return err;
+   }
+   err = fscanf(f, fmt, &ret);
+   if (err != 1) {
+   err = err == EOF ? -EIO : -errno;
+   printf("failed to parse '%s': %d\n", file, err);
+   fclose(f);
+   return err;
+   }
+   fclose(f);
+   return ret;
+}
+
+static int determine_uprobe_perf_type(void)
+{
+   const char *file = "/sys/bus/event_source/devices/uprobe/type";
+
+   return parse_uint_from_file(file, "%d\n");
+}
+
+static int determine_uprobe_retprobe_bit(void)
+{
+   const char *file = 
"/sys/bus/event_source/devices/uprobe/format/retprobe";
+
+   return parse_uint_from_file(file, "config:%d\n");
+}
+
+static ssize_t get_uprobe_offset(const void *addr)
+{
+   size_t start, end, base;
+   char buf[256];
+   bool found = false;
+   FILE *f;
+
+   f = fopen("/proc/self/maps", "r");
+   if (!f)
+   return -errno;
+
+   while (fscanf(f, "%zx-%zx %s %zx %*[^\n]\n", &start, &end, buf, &base) == 4) {
+   if (buf[2] == 'x' && (uintptr_t)addr >= start && 
(uintptr_t)addr < end) {
+   found = true;
+   break;
+   }
+   }
+
+   fclose(f);
+
+   if (!found)
+   return -ESRCH;
+
+   return (uintptr_t)addr - start + base;
+}
+
+static __attribute__((noinline)) void uretprobe_trigger(void)
+{
+   asm volatile ("");
+}
+
+/*
+ * This test sets up a return uprobe, which is sensitive to shadow stack
+ * (it crashes without the extra fix). After executing the uretprobe we
+ * fail the test if we receive SIGSEGV; no crash means we're good.
+ *
+ * Helper functions above borrowed from bpf selftests.
+ */
+static int test_uretprobe(void)
+{
+   const size_t attr_sz = sizeof(struct perf_event_attr);
+   const char *file = "/proc/self/exe";
+   int bit, fd = 0, type, err = 1;
+   struct perf_event_attr attr;
+   struct sigaction sa = {};
+   ssize_t offset;
+
+   type = determine_uprobe_perf_type();
+   if (type < 0)
+   return 1;
+
+   offset = get_uprobe_offset(uretprobe_trigger);
+   if (offset < 0)
+   return 1;
+
+   bit = determine_uprobe_retprobe_bit();
+   if (bit < 0)
+   return 1;
+
+   sa.sa_sigaction = segv_gp_handler;
+   sa.sa_flags = SA_SIGINFO;
+   if (sigaction(SIGSEGV, &sa, NULL))
+   return 1;
+
+   /* Setup return uprobe through perf event interface. */
+   memset(&attr, 0, attr_sz);
+   attr.size = attr_sz;
+   attr.type = type;
+   attr.config = 1 << bit;
+   attr.config1 = (__u64) (unsigned long) file;
+   attr.config2 = offset;
+
+   fd = syscall(__NR_perf_event_open, &attr, 0 /* pid */, -1 /* cpu */,
+-1 /* group_fd */, PERF_FLAG_FD_CLOEXEC);
+   if (fd < 0)
+   goto out;
+
+   if (sigsetjmp(jmp_buffer, 1))
+   goto out;
+
+   ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK);
+
+   /*
+* This either segfaults and goes through sigsetjmp above
+* or succeeds and we're good.
+*/
+   uretprobe_trigger();
+
+   printf("[OK]\tUretprobe test\n");
+   err = 0;
+
+out:
+   ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK);
+   signal(SIGSEGV, SIG_DFL);
+   if (fd)
+   close(fd);
+   return err;
+}
+
 void segv_handler_ptrace(int signum, siginfo_t *si, void *uc)
 {
/* The SSP adjustment caused a segfault. */
@@ -867,6 +1003,12 @@ int main(int argc, char *argv[])
goto out;
}
 
+   if (test_uretprobe()) {
+   ret = 1;
+   printf("[FAIL]\turetprobe test\n");
+   goto out;
+   }
+
return ret;
 
 out:
-- 
2.44.0




[PATCHv5 bpf-next 6/8] x86/shstk: Add return uprobe support

2024-05-07 Thread Jiri Olsa
Adding return uprobe support to work with enabled shadow stack.

Currently the application with enabled shadow stack will crash
if it sets up return uprobe. The reason is the uretprobe kernel
code changes the user space task's stack, but does not update
shadow stack accordingly.

Adding new functions to update values on shadow stack and using
them in uprobe code to keep shadow stack in sync with uretprobe
changes to user stack.

Signed-off-by: Jiri Olsa 
---
 arch/x86/include/asm/shstk.h |  4 
 arch/x86/kernel/shstk.c  | 29 +
 arch/x86/kernel/uprobes.c| 12 +++-
 3 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index 42fee8959df7..2e1ddcf98242 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h
@@ -21,6 +21,8 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *p, 
unsigned long clon
 void shstk_free(struct task_struct *p);
 int setup_signal_shadow_stack(struct ksignal *ksig);
 int restore_signal_shadow_stack(void);
+int shstk_update_last_frame(unsigned long val);
+int shstk_push_frame(unsigned long val);
 #else
 static inline long shstk_prctl(struct task_struct *task, int option,
   unsigned long arg2) { return -EINVAL; }
@@ -31,6 +33,8 @@ static inline unsigned long shstk_alloc_thread_stack(struct 
task_struct *p,
 static inline void shstk_free(struct task_struct *p) {}
 static inline int setup_signal_shadow_stack(struct ksignal *ksig) { return 0; }
 static inline int restore_signal_shadow_stack(void) { return 0; }
+static inline int shstk_update_last_frame(unsigned long val) { return 0; }
+static inline int shstk_push_frame(unsigned long val) { return 0; }
 #endif /* CONFIG_X86_USER_SHADOW_STACK */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 59e15dd8d0f8..66434dfde52e 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -577,3 +577,32 @@ long shstk_prctl(struct task_struct *task, int option, 
unsigned long arg2)
return wrss_control(true);
return -EINVAL;
 }
+
+int shstk_update_last_frame(unsigned long val)
+{
+   unsigned long ssp;
+
+   if (!features_enabled(ARCH_SHSTK_SHSTK))
+   return 0;
+
+   ssp = get_user_shstk_addr();
+   return write_user_shstk_64((u64 __user *)ssp, (u64)val);
+}
+
+int shstk_push_frame(unsigned long val)
+{
+   unsigned long ssp;
+
+   if (!features_enabled(ARCH_SHSTK_SHSTK))
+   return 0;
+
+   ssp = get_user_shstk_addr();
+   ssp -= SS_FRAME_SIZE;
+   if (write_user_shstk_64((u64 __user *)ssp, (u64)val))
+   return -EFAULT;
+
+   fpregs_lock_and_load();
+   wrmsrl(MSR_IA32_PL3_SSP, ssp);
+   fpregs_unlock();
+   return 0;
+}
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 81e6ee95784d..ae6c3458a675 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -406,6 +406,11 @@ SYSCALL_DEFINE0(uretprobe)
 * trampoline's ret instruction
 */
r11_cx_ax[2] = regs->ip;
+
+   /* make the shadow stack follow that */
+   if (shstk_push_frame(regs->ip))
+   goto sigill;
+
regs->ip = ip;
 
err = copy_to_user((void __user *)regs->sp, r11_cx_ax, 
sizeof(r11_cx_ax));
@@ -1191,8 +1196,13 @@ arch_uretprobe_hijack_return_addr(unsigned long 
trampoline_vaddr, struct pt_regs
return orig_ret_vaddr;
 
nleft = copy_to_user((void __user *)regs->sp, &trampoline_vaddr, rasize);
-   if (likely(!nleft))
+   if (likely(!nleft)) {
+   if (shstk_update_last_frame(trampoline_vaddr)) {
+   force_sig(SIGSEGV);
+   return -1;
+   }
return orig_ret_vaddr;
+   }
 
if (nleft != rasize) {
pr_err("return address clobbered: pid=%d, %%sp=%#lx, 
%%ip=%#lx\n",
-- 
2.44.0




[PATCHv5 bpf-next 5/8] selftests/bpf: Add uretprobe syscall call from user space test

2024-05-07 Thread Jiri Olsa
Adding a test to verify that when called from outside of the
trampoline provided by the kernel, the uretprobe syscall causes the
calling process to receive a SIGILL signal and the attached bpf
program is not executed.

Reviewed-by: Masami Hiramatsu (Google) 
Signed-off-by: Jiri Olsa 
---
 .../selftests/bpf/prog_tests/uprobe_syscall.c | 95 +++
 .../bpf/progs/uprobe_syscall_executed.c   | 17 
 2 files changed, 112 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c

diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c 
b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index 1a50cd35205d..3ef324c2db50 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -7,7 +7,10 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include "uprobe_syscall.skel.h"
+#include "uprobe_syscall_executed.skel.h"
 
 __naked unsigned long uretprobe_regs_trigger(void)
 {
@@ -209,6 +212,91 @@ static void test_uretprobe_regs_change(void)
}
 }
 
+#ifndef __NR_uretprobe
+#define __NR_uretprobe 462
+#endif
+
+__naked unsigned long uretprobe_syscall_call_1(void)
+{
+   /*
+* Pretend we are uretprobe trampoline to trigger the return
+* probe invocation in order to verify we get SIGILL.
+*/
+   asm volatile (
+   "pushq %rax\n"
+   "pushq %rcx\n"
+   "pushq %r11\n"
+   "movq $" __stringify(__NR_uretprobe) ", %rax\n"
+   "syscall\n"
+   "popq %r11\n"
+   "popq %rcx\n"
+   "retq\n"
+   );
+}
+
+__naked unsigned long uretprobe_syscall_call(void)
+{
+   asm volatile (
+   "call uretprobe_syscall_call_1\n"
+   "retq\n"
+   );
+}
+
+static void test_uretprobe_syscall_call(void)
+{
+   LIBBPF_OPTS(bpf_uprobe_multi_opts, opts,
+   .retprobe = true,
+   );
+   struct uprobe_syscall_executed *skel;
+   int pid, status, err, go[2], c;
+
+   if (!ASSERT_OK(pipe(go), "pipe"))
+   return;
+
+   skel = uprobe_syscall_executed__open_and_load();
+   if (!ASSERT_OK_PTR(skel, "uprobe_syscall_executed__open_and_load"))
+   goto cleanup;
+
+   pid = fork();
+   if (!ASSERT_GE(pid, 0, "fork"))
+   goto cleanup;
+
+   /* child */
+   if (pid == 0) {
+   close(go[1]);
+
+   /* wait for parent's kick */
+   err = read(go[0], &c, 1);
+   if (err != 1)
+   exit(-1);
+
+   uretprobe_syscall_call();
+   _exit(0);
+   }
+
+   skel->links.test = bpf_program__attach_uprobe_multi(skel->progs.test, pid,
+   "/proc/self/exe",
+   "uretprobe_syscall_call", &opts);
+   if (!ASSERT_OK_PTR(skel->links.test, 
"bpf_program__attach_uprobe_multi"))
+   goto cleanup;
+
+   /* kick the child */
+   write(go[1], &c, 1);
+   err = waitpid(pid, &status, 0);
+   ASSERT_EQ(err, pid, "waitpid");
+
+   /* verify the child got killed with SIGILL */
+   ASSERT_EQ(WIFSIGNALED(status), 1, "WIFSIGNALED");
+   ASSERT_EQ(WTERMSIG(status), SIGILL, "WTERMSIG");
+
+   /* verify the uretprobe program wasn't called */
+   ASSERT_EQ(skel->bss->executed, 0, "executed");
+
+cleanup:
+   uprobe_syscall_executed__destroy(skel);
+   close(go[1]);
+   close(go[0]);
+}
 #else
 static void test_uretprobe_regs_equal(void)
 {
@@ -219,6 +307,11 @@ static void test_uretprobe_regs_change(void)
 {
test__skip();
 }
+
+static void test_uretprobe_syscall_call(void)
+{
+   test__skip();
+}
 #endif
 
 void test_uprobe_syscall(void)
@@ -227,4 +320,6 @@ void test_uprobe_syscall(void)
test_uretprobe_regs_equal();
if (test__start_subtest("uretprobe_regs_change"))
test_uretprobe_regs_change();
+   if (test__start_subtest("uretprobe_syscall_call"))
+   test_uretprobe_syscall_call();
 }
diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c 
b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
new file mode 100644
index ..0d7f1a7db2e2
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include 
+#include 
+
+struct pt_regs regs;
+
+char _license[] SEC("license") = "GPL";
+
+int executed = 0;
+
+SEC("uretprobe.multi")
+int test(struct pt_regs *regs)
+{
+   executed = 1;
+   return 0;
+}
-- 
2.44.0



