to the function graph tracer.
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Masami Hiramatsu (Google)
---
Changes in v2:
- cleanup to set argument name on function prototype.
---
include/linux/ftrace.h | 10 +++---
kernel/trace/fgraph.c | 16
into the
function_graph tracer.
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Masami Hiramatsu (Google)
---
Changes in v2:
- Fix typo and make lines shorter than 76 chars in the description.
- Remove unneeded return from return_run() function.
---
kernel/trace/fgraph.c | 67
are called is not completely
handled yet, but that shouldn't be too hard to manage.
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Masami Hiramatsu (Google)
---
Changes in v7:
- Fix max limitation check in ftrace_graph_push_return().
- Rewrite the shadow stack implementation using bitmap
on
the shadow stack. We only need to save the index, because this will allow
the fgraph_ops to be freed before the function returns (which may happen if
the function calls schedule() and sleeps for a long time).
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Masami Hiramatsu (Google)
---
Changes in v2:
- Remove
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Masami Hiramatsu (Google)
---
Changes in v7:
- Use DIV_ROUND_UP() to calculate FGRAPH_RET_INDEX
---
kernel/trace/fgraph.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 30
for the return side of the functions.
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Masami Hiramatsu (Google)
---
include/linux/sched.h | 2 -
kernel/trace/fgraph.c | 124 -
2 files changed, 71 insertions(+), 55 deletions(-)
diff --git
From: Masami Hiramatsu (Google)
Add ftrace_regs definition for x86_64 in the ftrace header to
clarify what register will be accessible from ftrace_regs.
Signed-off-by: Masami Hiramatsu (Google)
---
Changes in v3:
- Add rip to be saved.
Changes in v2:
- Newly added.
---
arch/x86/include
From: Masami Hiramatsu (Google)
Rename ftrace_regs_return_value to ftrace_regs_get_return_value as same as
other ftrace_regs_get/set_* APIs.
Signed-off-by: Masami Hiramatsu (Google)
Acked-by: Mark Rutland
---
Changes in v6:
- Moved to top of the series.
Changes in v3:
- Newly added
From: Masami Hiramatsu (Google)
To clarify what will be expected on ftrace_regs, add a comment to the
architecture independent definition of the ftrace_regs.
Signed-off-by: Masami Hiramatsu (Google)
Acked-by: Mark Rutland
---
Changes in v8:
- Update that the saved registers depends
it in the return
callback.
This series can be applied against the probes/for-next branch, which
is based on v6.9-rc3.
This series can also be found below branch.
https://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git/log/?h=topic/fprobe-on-fgraph
Thank you,
---
Masami Hiramatsu (Google) (21
From: Masami Hiramatsu (Google)
Check the number of probe target symbols in the target module when
the module is loaded. If the probe is not on a symbol with a unique name
in the module, it will be rejected at that point.
Note that the symbol which has a unique name in the target module
6605] 88845dd30100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ff ff
> [9.654675]
> ==
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Qiang Zhang
Looks good to me.
Acked-by: Masami Hiramatsu (Google)
h this fix?
Thank you,
On Fri, 12 Apr 2024 22:18:20 +0900
Masami Hiramatsu (Google) wrote:
> On Fri, 12 Apr 2024 18:49:41 +0800
> qiang4.zh...@linux.intel.com wrote:
>
> > From: Qiang Zhang
> >
> > On the time to free xbc memory in xbc_exit(), memblock may have handed
>
onfig.c
> @@ -63,7 +63,7 @@ static inline void * __init xbc_alloc_mem(size_t size)
>
> static inline void __init xbc_free_mem(void *addr, size_t size)
> {
> - memblock_free(addr, size);
> + memblock_free_late(__pa(addr), size);
> }
>
> #else /* !__KERNEL__ */
> --
> 2.39.2
>
>
--
Masami Hiramatsu (Google)
args to make the string in saved_command_line look more perfect.
>
> Signed-off-by: Yuntao Wang
OK, this looks good to me.
Acked-by: Masami Hiramatsu (Google)
Let me pick this to bootconfig/for-next.
Thank you,
> ---
> v1 -> v2: Fix the issue using the method suggested by M
On Thu, 11 Apr 2024 23:29:40 +0800
Yuntao Wang wrote:
> On Thu, 11 Apr 2024 23:07:45 +0900, Masami Hiramatsu (Google)
> wrote:
>
> > On Thu, 11 Apr 2024 09:19:32 +0200
> > Geert Uytterhoeven wrote:
> >
> > > CC Hiramatsu-san (now for real :-)
> >
80.048580: sched_prepare_exec: interp=/bin/bash
> filename=/bin/bash pid=381 comm=sshd
> <...>-385 [001] . 180.068277: sched_prepare_exec:
> interp=/usr/bin/tty filename=/usr/bin/tty pid=385 comm=bash
> <...>-389 [006] . 192.020147: sched_prepare_exec:
> interp=/u
however, we have
> environments where eBPF is not available.
>
> It's sounding like to do this properly without eBPF a new feature would
> be required. If so, I do have some patches I can share in a bit as an
> RFC.
It would be better to share it at the RFC stage, so that we can discuss it
at the design level.
Thank you,
>
> Thanks,
> -Beau
>
> > Thank you,
> >
> > --
> > Masami Hiramatsu (Google)
--
Masami Hiramatsu (Google)
e
> > trailing space */
> > }
> > }
>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like
> that.
> -- Linus Torvalds
>
--
Masami Hiramatsu (Google)
> > > > > > > > > > *uprobe)
> > > > > > > > > > if (WARN_ON(!uprobe_is_active(uprobe)))
> > > > > > > > > > return;
> > > > > > > > > >
> > > > > > > > > > - spin_lock(&uprobes_treelock);
> > > > > > > > > > + write_lock(&uprobes_treelock);
> > > > > > > > > > rb_erase(&uprobe->rb_node, &uprobes_tree);
> > > > > > > > > > - spin_unlock(&uprobes_treelock);
> > > > > > > > > > + write_unlock(&uprobes_treelock);
> > > > > > > > > > RB_CLEAR_NODE(&uprobe->rb_node); /* for
> > > > > > > > > > uprobe_is_active() */
> > > > > > > > > > put_uprobe(uprobe);
> > > > > > > > > > }
> > > > > > > > > > @@ -1298,7 +1298,7 @@ static void build_probe_list(struct
> > > > > > > > > > inode *inode,
> > > > > > > > > > min = vaddr_to_offset(vma, start);
> > > > > > > > > > max = min + (end - start) - 1;
> > > > > > > > > >
> > > > > > > > > > - spin_lock(&uprobes_treelock);
> > > > > > > > > > + read_lock(&uprobes_treelock);
> > > > > > > > > > n = find_node_in_range(inode, min, max);
> > > > > > > > > > if (n) {
> > > > > > > > > > for (t = n; t; t = rb_prev(t)) {
> > > > > > > > > > @@ -1316,7 +1316,7 @@ static void build_probe_list(struct
> > > > > > > > > > inode *inode,
> > > > > > > > > > get_uprobe(u);
> > > > > > > > > > }
> > > > > > > > > > }
> > > > > > > > > > - spin_unlock(&uprobes_treelock);
> > > > > > > > > > + read_unlock(&uprobes_treelock);
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > /* @vma contains reference counter, not the probed
> > > > > > > > > > instruction. */
> > > > > > > > > > @@ -1407,9 +1407,9 @@ vma_has_uprobes(struct vm_area_struct
> > > > > > > > > > *vma, unsigned long start, unsigned long e
> > > > > > > > > > min = vaddr_to_offset(vma, start);
> > > > > > > > > > max = min + (end - start) - 1;
> > > > > > > > > >
> > > > > > > > > > - spin_lock(&uprobes_treelock);
> > > > > > > > > > + read_lock(&uprobes_treelock);
> > > > > > > > > > n = find_node_in_range(inode, min, max);
> > > > > > > > > > - spin_unlock(&uprobes_treelock);
> > > > > > > > > > + read_unlock(&uprobes_treelock);
> > > > > > > > > >
> > > > > > > > > > return !!n;
> > > > > > > > > > }
> > > > > > > > > > --
> > > > > > > > > > 2.43.0
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Masami Hiramatsu (Google)
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Masami Hiramatsu (Google)
> > > > >
> > > > >
> > > > > --
> > > > > Masami Hiramatsu (Google)
> >
> >
> > --
> > Masami Hiramatsu (Google)
--
Masami Hiramatsu (Google)
On Mon, 4 Mar 2024 19:13:42 -0500
Steven Rostedt wrote:
> From: "Steven Rostedt (Google)"
>
> The trace_seq buffer is used to print out entire events. It's typically
> set to PAGE_SIZE * 2 as there's some events that can be quite large.
>
> As a side effect, write
goto out;
> }
>
> - /* Check if 'p' is probing a module. */
> - *probed_mod = __module_text_address((unsigned long) p->addr);
> + /* Get module refcount and reject __init functions for loaded modules. */
> if (*probed_mod) {
> /*
>* We must hold a refcount of the probed module while updating
> --
> 2.25.1
>
--
Masami Hiramatsu (Google)
> + __string( comm, task->comm )
> + ),
> +
> + TP_fast_assign(
> + __assign_str(filename, bprm->filename);
> + __entry->pid = task->pid;
> + __assign_str(comm, task->comm);
> + ),
> +
> + TP_printk("filename=%s pid=%d comm=%s",
> + __get_str(filename), __entry->pid, __get_str(comm))
> +);
> +
> #endif
>
> /* This part must be outside protection */
> --
> 2.44.0.478.gd926399ef9-goog
>
--
Masami Hiramatsu (Google)
possible via some other means? It'd be great to be able
> to do this directly at the perf_event sample via the ABI or a probe.
>
Have you tried to use uprobes? It should be able to access user-space
registers including fs/gs.
Thank you,
--
Masami Hiramatsu (Google)
nt using
"perf probe" tool.
Thank you,
>
> I don't know if "common" is the right question here, because it's a
> chicken-egg problem: no tracepoint, we give up; we have the
> tracepoint, it unlocks a range of new use cases (that require robust
> solution to make BPF programs exec-aware, and a tracepoint is the only
> option IMHO).
>
> Thanks,
> -- Marco
--
Masami Hiramatsu (Google)
On Tue, 9 Apr 2024 15:13:09 -0400
Steven Rostedt wrote:
> From: "Steven Rostedt (Google)"
>
> The "buffer_percent" logic that is used by the ring buffer splice code to
> only wake up the tasks when there's no data after the buffer is filled to
> the perc
rethook
> infrastructure) runtime throughput by 2%, according to BPF benchmarks ([0]).
>
> [0]
> https://lore.kernel.org/bpf/caef4bzauq2wkmjzdc9s0rbwa01bybgwhn6andxqshyia47p...@mail.gmail.com/
>
This looks good to me :)
Acked-by: Masami Hiramatsu (Google)
Thank you,
> Cc: S
claimed
> @@ -174,6 +175,7 @@ struct rethook_node *rethook_try_get(struct rethook *rh)
>*/
> if (unlikely(!rcu_is_watching()))
> return NULL;
> +#endif
>
> return (struct rethook_node *)objpool_pop(&rh->pool);
> }
> --
> 2.43.0
>
--
Masami Hiramatsu (Google)
Hi Paul,
Thanks, both looks good to me.
Acked-by: Masami Hiramatsu (Google)
Let me update bootconfig/fixes.
On Mon, 8 Apr 2024 21:42:49 -0700
"Paul E. McKenney" wrote:
> Hello!
>
> This series removes redundant comments from /proc/bootconfig:
>
> 1.fs/proc:
On Tue, 9 Apr 2024 14:20:45 +0800
Zheng Yejian wrote:
> On 2024/4/8 20:41, Masami Hiramatsu (Google) wrote:
> > Hi Zheng,
> >
> > On Mon, 8 Apr 2024 16:34:03 +0800
> > Zheng Yejian wrote:
> >
> >> There is once warn in __arm_kprobe_ftrace() on:
> >sigreturn() exists only to allow the implementation of signal
> > handlers. It should never be
> >called directly. Details of the arguments (if any) passed to
> > sigreturn() vary depending on
> >the architecture.
> >
> > this is how sys_uretprobe() should be treated/documented.
>
> yes, will include man page patch in new version
And please follow Documentation/process/adding-syscalls.rst in new version,
then we can avoid repeating the same discussion :-)
Thank you!
>
> jirka
>
> >
> > sigreturn() can be "improved" too. Say, it could validate sigcontext->ip
> > and return -EINVAL if this addr is not valid. But why?
> >
> > Oleg.
> >
--
Masami Hiramatsu (Google)
900, Masami Hiramatsu wrote:
> > > > On Fri, 5 Apr 2024 10:23:24 +0900
> > > > Masami Hiramatsu (Google) wrote:
> > > >
> > > > > On Thu, 4 Apr 2024 10:43:14 -0700
> > > > > "Paul E. McKenney" wrote:
> >
if (WARN_ON_ONCE(tsk != current && task_is_running(tsk)))
> + if (tsk != current && task_is_running(tsk))
> return 0;
>
> do {
> --
> 2.34.1
>
--
Masami Hiramatsu (Google)
7,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
> goto out;
> }
>
> - /* Check if 'p' is probing a module. */
> - *probed_mod = __module_text_address((unsigned long) p->addr);
> + /* Get module refcount and reject __init functions for loaded modules. */
> if (*probed_mod) {
> /*
>* We must hold a refcount of the probed module while updating
> --
> 2.25.1
>
--
Masami Hiramatsu (Google)
).
I would like to make it clear that abuse of this syscall cannot harm
normal programs, and that even if it is used by malicious code (e.g. injected
via a stack overflow) it does not cause a problem. Once at least these points
are clarified and documented, it will be easier to push it as a new Linux API.
Thank you,
>
> Thanks!
>
> Oleg.
>
>
--
Masami Hiramatsu (Google)
ry == ri->stack
0: |ret-addr| <- call pushed it
0: |ret-addr| <- sp at return trampoline
3: |r11 | <- regs->sp at syscall
2: |rcx |
1: |rax | <- ri->stack
0: |ret-addr|
(Note: The lower the line, the larger the address.)
Thus, we can check the stack address by (regs->sp + 16 == ri->stack).
Thank you,
--
Masami Hiramatsu (Google)
t_reserved(p->addr, p->addr) ||
> @@ -1581,7 +1587,6 @@ static int check_kprobe_address_safe(struct kprobe *p,
> }
>
> /* Check if 'p' is probing a module. */
/* Get module refcount and reject __init functions for loaded modules. */
> - *probed_mod = __module_text_address((unsigned long) p->addr);
> if (*probed_mod) {
> /*
>* We must hold a refcount of the probed module while updating
> --
> 2.25.1
>
Thank you,
--
Masami Hiramatsu (Google)
h Masami's tree, but the change makes
> > sense to me, given this is an expected condition.
> >
> > Acked-by: Andrii Nakryiko
>
> Masami, I assume you'll pick this up?
OK, anyway it will just return 0 if this situation happens, and caller will
get the trampoline address instead of correct return address in this case.
I think it does not do any unsafe things. So I agree removing it.
But I think the explanation is a bit confusing.
Thank you,
>
> Thanks,
> Daniel
--
Masami Hiramatsu (Google)
On Tue, 2 Apr 2024 22:21:00 -0700
Andrii Nakryiko wrote:
> On Tue, Apr 2, 2024 at 9:00 PM Andrii Nakryiko
> wrote:
> >
> > On Tue, Apr 2, 2024 at 5:52 PM Steven Rostedt wrote:
> > >
> > > On Wed, 3 Apr 2024 09:40:48 +0900
> > > Masami Hirama
> handlers. It should never be
>called directly. Details of the arguments (if any) passed to
> sigreturn() vary depending on
>the architecture.
>
> this is how sys_uretprobe() should be treated/documented.
OK.
>
> sigreturn() can be "improved" too. Say, it could validate sigcontext->ip
> and return -EINVAL if this addr is not valid. But why?
Because sigreturn() never returns, but sys_uretprobe() will return.
If it changes the regs->ip and directly returns to the original return address,
I think it is natural to send SIGILL.
Thank you,
--
Masami Hiramatsu (Google)
On Thu, 4 Apr 2024 21:25:41 -0700
"Paul E. McKenney" wrote:
> On Fri, Apr 05, 2024 at 11:57:45AM +0900, Masami Hiramatsu wrote:
> > On Fri, 5 Apr 2024 10:23:24 +0900
> > Masami Hiramatsu (Google) wrote:
> >
> > > On Thu, 4 Apr 2024 10:43:
On Fri, 5 Apr 2024 10:23:24 +0900
Masami Hiramatsu (Google) wrote:
> On Thu, 4 Apr 2024 10:43:14 -0700
> "Paul E. McKenney" wrote:
>
> > On Thu, Apr 04, 2024 at 08:55:22AM +0900, Masami Hiramatsu wrote:
> > > On Wed, 3 Apr 2024 12:16:28 -07
c894d4b ("fs/proc: Add boot loader arguments as comment to
> > > /proc/bootconfig")
> > > Signed-off-by: Zhenhua Huang
> > > Signed-off-by: Paul E. McKenney
> > > Cc: Masami Hiramatsu
> > > Cc:
> > > Cc:
> >
> > OOps, good
sp + expected_offset)
goto sigill;
expected_offset should be 16 (push * 3 - ret) on x64 if ri->stack is the
regs->sp right after the call.
Thank you,
--
Masami Hiramatsu (Google)
peration strictly.
> > > > > I concern that new system calls introduce vulnerabilities.
> > > > >
> > > >
> > > > As Oleg and Jiri mentioned, this syscall can't harm kernel or other
> > > > processes, only the process that is abusing the API. So any extra
> > > > checks that would slow down this approach is an unnecessary overhead
> > > > and complication that will never be useful in practice.
> > >
> > > I think at least it should check the caller address to ensure the
> > > address is in the trampoline.
> > > But anyway, uprobes itself can break the target process, so no one
> > > might care if this system call breaks the process now.
> >
> > If we already have an expected range of addresses, then I think it's
> > fine to do a quick unlikely() check. I'd be more concerned if we need
> > to do another lookup or search to just validate this. I'm sure Jiri
> > will figure it out.
>
> Oleg mentioned the trampoline address check could race with another
> thread's mremap call, however trap is using that check as well, it
> still seems like good idea to do it also in the uretprobe syscall
Yeah, and also, can we add a stack pointer check if the trampoline is
shared with other probe points?
Thank you,
>
> jirka
--
Masami Hiramatsu (Google)
ul in practice.
> >
> > I think at least it should check the caller address to ensure the
> > address is in the trampoline.
> > But anyway, uprobes itself can break the target process, so no one
> > might care if this system call breaks the process now.
>
> If we already have an expected range of addresses, then I think it's
> fine to do a quick unlikely() check. I'd be more concerned if we need
> to do another lookup or search to just validate this. I'm sure Jiri
> will figure it out.
Good.
>
> >
> > >
> > > Also note that sys_uretprobe is a kind of internal and unstable API
> > > and it is explicitly called out that its contract can change at any
> > > time and user space shouldn't rely on it. It's purely for the kernel's
> > > own usage.
> >
> > Is that OK to use a syscall as "internal" and "unstable" API?
>
> See above about rt_sigreturn. It seems like yes, for some highly
> specialized syscalls it is the case already.
OK, but as I said it is just a performance optimization, which is
a bit different from the rt_sigreturn case.
Thank you,
> >
> > >
> > > So let's please keep it fast and simple.
> > >
> > >
> > > > Thank you,
> > > >
> > > >
> > > > >
> > > > > thanks,
> > > > > jirka
> > > > >
> > > > >
> > > > > >
> > >
> > > [...]
> >
> >
> > ([OT] If we can add syscall so casually, I would like to add sys_traceevent
> > for recording user space events :-) .)
>
> Have you proposed this upstream? :) I have no clue and no opinion about it...
>
> >
> > --
> > Masami Hiramatsu (Google)
--
Masami Hiramatsu (Google)
t should check the caller address to ensure the
address is in the trampoline.
But anyway, uprobes itself can break the target process, so no one
might care if this system call breaks the process now.
>
> Also note that sys_uretprobe is a kind of internal and unstable API
> and it is explicitly called out that its contract can change at any
> time and user space shouldn't rely on it. It's purely for the kernel's
> own usage.
Is that OK to use a syscall as "internal" and "unstable" API?
>
> So let's please keep it fast and simple.
>
>
> > Thank you,
> >
> >
> > >
> > > thanks,
> > > jirka
> > >
> > >
> > > >
>
> [...]
([OT] If we can add syscall so casually, I would like to add sys_traceevent
for recording user space events :-) .)
--
Masami Hiramatsu (Google)
> Signed-off-by: Paul E. McKenney
> Cc: Masami Hiramatsu
> Cc:
> Cc:
OOps, good catch! Let me pick it.
Acked-by: Masami Hiramatsu (Google)
Thank you!
>
> diff --git a/fs/proc/bootconfig.c b/fs/proc/bootconfig.c
> index 902b326e1e560..e5635a6b127b0 100644
>
__NR_uretprobe 462
> > > +__SYSCALL(__NR_uretprobe, sys_uretprobe)
> > > +
> > > #undef __NR_syscalls
> > > -#define __NR_syscalls 462
> > > +#define __NR_syscalls 463
> > >
> > > /*
> > > * 32 bit systems traditionally used different
> > > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> > > index 929e98c62965..90395b16bde0 100644
> > > --- a/kernel/events/uprobes.c
> > > +++ b/kernel/events/uprobes.c
> > > @@ -1474,11 +1474,20 @@ static int xol_add_vma(struct mm_struct *mm,
> > > struct xol_area *area)
> > > return ret;
> > > }
> > >
> > > +void * __weak arch_uprobe_trampoline(unsigned long *psize)
> > > +{
> > > + static uprobe_opcode_t insn = UPROBE_SWBP_INSN;
> > > +
> > > + *psize = UPROBE_SWBP_INSN_SIZE;
> > > + return &insn;
> > > +}
> > > +
> > > static struct xol_area *__create_xol_area(unsigned long vaddr)
> > > {
> > > struct mm_struct *mm = current->mm;
> > > - uprobe_opcode_t insn = UPROBE_SWBP_INSN;
> > > + unsigned long insns_size;
> > > struct xol_area *area;
> > > + void *insns;
> > >
> > > area = kmalloc(sizeof(*area), GFP_KERNEL);
> > > if (unlikely(!area))
> > > @@ -1502,7 +1511,8 @@ static struct xol_area *__create_xol_area(unsigned
> > > long vaddr)
> > > /* Reserve the 1st slot for get_trampoline_vaddr() */
> > > set_bit(0, area->bitmap);
> > > atomic_set(&area->slot_count, 1);
> > > - arch_uprobe_copy_ixol(area->pages[0], 0, &insn, UPROBE_SWBP_INSN_SIZE);
> > > + insns = arch_uprobe_trampoline(&insns_size);
> > > + arch_uprobe_copy_ixol(area->pages[0], 0, insns, insns_size);
> > >
> > > if (!xol_add_vma(mm, area))
> > > return area;
> > > @@ -2123,7 +2133,7 @@ static struct return_instance
> > > *find_next_ret_chain(struct return_instance *ri)
> > > return ri;
> > > }
> > >
> > > -static void handle_trampoline(struct pt_regs *regs)
> > > +void uprobe_handle_trampoline(struct pt_regs *regs)
> > > {
> > > struct uprobe_task *utask;
> > > struct return_instance *ri, *next;
> > > @@ -2188,7 +2198,7 @@ static void handle_swbp(struct pt_regs *regs)
> > >
> > > bp_vaddr = uprobe_get_swbp_addr(regs);
> > > if (bp_vaddr == get_trampoline_vaddr())
> > > - return handle_trampoline(regs);
> > > + return uprobe_handle_trampoline(regs);
> > >
> > > uprobe = find_active_uprobe(bp_vaddr, _swbp);
> > > if (!uprobe) {
> > > diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> > > index faad00cce269..be6195e0d078 100644
> > > --- a/kernel/sys_ni.c
> > > +++ b/kernel/sys_ni.c
> > > @@ -391,3 +391,5 @@ COND_SYSCALL(setuid16);
> > >
> > > /* restartable sequence */
> > > COND_SYSCALL(rseq);
> > > +
> > > +COND_SYSCALL(uretprobe);
> > > --
> > > 2.44.0
> > >
> >
> >
> > --
> > Masami Hiramatsu (Google)
--
Masami Hiramatsu (Google)
> > +static __always_inline unsigned long
> > +ftrace_regs_get_return_value(const struct ftrace_regs *fregs)
> > +{
> > + return fregs->a0;
> > +}
> > +
> > +static __always_inline void
> > +ftrace_regs_set_return_value(struct ftrace_regs *fregs,
> > +unsigned long ret)
> > +{
> > + fregs->a0 = ret;
> > +}
> > +
> > +static __always_inline void
> > +ftrace_override_function_with_return(struct ftrace_regs *fregs)
> > +{
> > + fregs->epc = fregs->ra;
> > +}
>
> Style/nit: All above; Try to use the full 100 chars, and keep the
> function name return value on the same line for grepability.
>
>
> Björn
>
> [1]
> https://lore.kernel.org/all/170887410337.564249.6360118840946697039.stgit@devnote2/
--
Masami Hiramatsu (Google)
> atomic_set(&area->slot_count, 1);
> - arch_uprobe_copy_ixol(area->pages[0], 0, &insn, UPROBE_SWBP_INSN_SIZE);
> + insns = arch_uprobe_trampoline(&insns_size);
> + arch_uprobe_copy_ixol(area->pages[0], 0, insns, insns_size);
>
> if (!xol_add_vma(mm, area))
> return area;
> @@ -2123,7 +2133,7 @@ static struct return_instance
> *find_next_ret_chain(struct return_instance *ri)
> return ri;
> }
>
> -static void handle_trampoline(struct pt_regs *regs)
> +void uprobe_handle_trampoline(struct pt_regs *regs)
> {
> struct uprobe_task *utask;
> struct return_instance *ri, *next;
> @@ -2188,7 +2198,7 @@ static void handle_swbp(struct pt_regs *regs)
>
> bp_vaddr = uprobe_get_swbp_addr(regs);
> if (bp_vaddr == get_trampoline_vaddr())
> - return handle_trampoline(regs);
> + return uprobe_handle_trampoline(regs);
>
> uprobe = find_active_uprobe(bp_vaddr, _swbp);
> if (!uprobe) {
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index faad00cce269..be6195e0d078 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -391,3 +391,5 @@ COND_SYSCALL(setuid16);
>
> /* restartable sequence */
> COND_SYSCALL(rseq);
> +
> +COND_SYSCALL(uretprobe);
> --
> 2.44.0
>
--
Masami Hiramatsu (Google)
:
> > >
> > > > On Mon, 1 Apr 2024 20:25:52 +0900
> > > > Masami Hiramatsu (Google) wrote:
> > > >
> > > > > > Masami,
> > > > > >
> > > > > > Are you OK with just keeping it set to N.
> &
On Mon, 1 Apr 2024 12:09:18 -0400
Steven Rostedt wrote:
> On Mon, 1 Apr 2024 20:25:52 +0900
> Masami Hiramatsu (Google) wrote:
>
> > > Masami,
> > >
> > > Are you OK with just keeping it set to N.
> >
> > OK, if it is only for the debugging,
Thanks!
>
> Masami,
>
> Are you OK with just keeping it set to N.
OK, if it is only for the debugging, I'm OK to set N this.
>
> We could have other options like PROVE_LOCKING enable it.
Agreed (but it should say this is a debug option)
Thank you,
>
> -- Steve
--
Masami Hiramatsu (Google)
> > > }
> > > > > > > > @@ -935,9 +935,9 @@ static void delete_uprobe(struct uprobe
> > > > > > > > *uprobe)
> > > > > > > > if (WARN_ON(!uprobe_is_active(uprobe)))
> > > > > > > > return;
> > > >
struct inode
> > > > > *inode,
> > > > > min = vaddr_to_offset(vma, start);
> > > > > max = min + (end - start) - 1;
> > > > >
> > > > > - spin_lock(&uprobes_treelock);
> > > > > + read_lock(&uprobes_treelock);
> > > > > n = find_node_in_range(inode, min, max);
> > > > > if (n) {
> > > > > for (t = n; t; t = rb_prev(t)) {
> > > > > @@ -1316,7 +1316,7 @@ static void build_probe_list(struct inode
> > > > > *inode,
> > > > > get_uprobe(u);
> > > > > }
> > > > > }
> > > > > - spin_unlock(&uprobes_treelock);
> > > > > + read_unlock(&uprobes_treelock);
> > > > > }
> > > > >
> > > > > /* @vma contains reference counter, not the probed instruction. */
> > > > > @@ -1407,9 +1407,9 @@ vma_has_uprobes(struct vm_area_struct *vma,
> > > > > unsigned long start, unsigned long e
> > > > > min = vaddr_to_offset(vma, start);
> > > > > max = min + (end - start) - 1;
> > > > >
> > > > > - spin_lock(&uprobes_treelock);
> > > > > + read_lock(&uprobes_treelock);
> > > > > n = find_node_in_range(inode, min, max);
> > > > > - spin_unlock(&uprobes_treelock);
> > > > > + read_unlock(&uprobes_treelock);
> > > > >
> > > > > return !!n;
> > > > > }
> > > > > --
> > > > > 2.43.0
> > > > >
> > > >
> > > >
> > > > --
> > > > Masami Hiramatsu (Google)
> >
> >
> > --
> > Masami Hiramatsu (Google)
--
Masami Hiramatsu (Google)
ution time is observed (28 -> 3.5
> > > microsecs).
> >
> > Looks good to me.
> >
> > Acked-by: Masami Hiramatsu (Google)
> >
> > BTW, how did you measure the overhead? I think spinlock overhead
> > will depend on how much lo
> > > This change has been tested against production workloads that exhibit
> > > significant contention on the spinlock and an almost order of magnitude
> > > reduction for mean uprobe execution time is observed (28 -> 3.5
> > > microsecs).
> >
> > Looks go
found that exposing CONFIG_ALLOC_EXECMEM
to the user does not help; it should be an internal change, so hiding it
from the user is the better choice. Then there is no reason to introduce the
new alloc_execmem(); just expanding kprobe_alloc_insn_page() is reasonable.
Mark, can you send this series here, so that others can review/test it?
Thank you!
>
> Thanks,
> Mark.
--
Masami Hiramatsu (Google)
deed. We need module_state() function to avoid it.
> >
> > It is non-existent unless CONFIG_MODULES is set given how things are
> > flagged in include/linux/module.h.
>
> Hey, noticed kconfig issue.
>
> According to kconfig-language.txt:
>
> "select should be used with care. select will force a symbol to a value
> without visiting the dependencies."
>
> So the problem here lies in KPROBES config entry using select statement
> to pick ALLOC_EXECMEM. It will not take the depends on statement into
> account and thus will allow to select kprobes without any allocator in
> place.
OK, in that case "depends on" is good.
>
> So to address this I'd suggest to use depends on statement also for
> describing relation between KPROBES and ALLOC_EXECMEM. It does not make
> life worse than before for anyone because even with the current kernel
> you have to select MODULES before you can move forward with kprobes.
Yeah, since ALLOC_EXECMEM is enabled by default.
Thank you!
>
> BR, Jarkko
--
Masami Hiramatsu (Google)
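The relationship being suggested above can be sketched as a Kconfig fragment (hypothetical symbols and prompts based on the discussion, not the final upstream entries):

```kconfig
config ALLOC_EXECMEM
	bool "Executable memory allocator"
	depends on MODULES

config KPROBES
	bool "Kernel probes"
	# 'depends on' honors ALLOC_EXECMEM's own dependencies, whereas
	# 'select' would force it on without visiting them (so KPROBES
	# could be enabled with no allocator actually available).
	depends on ALLOC_EXECMEM
```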
e_module_callback(struct notifier_block *nb,
> unsigned long val, void *data)
> @@ -704,6 +709,7 @@ static struct notifier_block trace_kprobe_module_nb = {
> .notifier_call = trace_kprobe_module_callback,
> .priority = 1 /* Invoked after kprobe module callback */
> };
> +#endif /* CONFIG_MODULES */
Similar to the above, let's move trace_kprobe_module_nb outside
of the #ifdef.
>
> static int count_symbols(void *data, unsigned long unused)
> {
> @@ -1897,8 +1903,10 @@ static __init int init_kprobe_trace_early(void)
> if (ret)
> return ret;
>
> +#ifdef CONFIG_MODULES
> if (register_module_notifier(&trace_kprobe_module_nb))
> return -EINVAL;
> +#endif /* CONFIG_MODULES */
And remove this #ifdef.
Thank you!
>
> return 0;
> }
> --
> 2.44.0
>
--
Masami Hiramatsu (Google)
&& IS_ENABLED(CONFIG_MODULES))
err = register_module_notifier(&kprobe_module_nb);
kprobes_initialized = (err == 0);
-
Thank you,
--
Masami Hiramatsu (Google)
his change has been tested against production workloads that exhibit
> significant contention on the spinlock and an almost order of magnitude
> reduction for mean uprobe execution time is observed (28 -> 3.5 microsecs).
Looks good to me.
Acked-by: Masami Hiramatsu (Google)
BTW, how d
checks to make sure RCU is on when ftrace callbacks recurse.
> +
> + This will add more overhead to all ftrace-based invocations.
... invocations, but keep it safe.
> +
> + If unsure, say N
If unsure, say Y
Thank you,
> +
> config RING_BUFFER_RECORD_RECURSION
> bool "Record functions that recurse in the ring buffer"
> depends on FTRACE_RECORD_RECURSION
> --
> 2.43.0
>
--
Masami Hiramatsu (Google)
et == -ENOENT) {
> pr_warn("This probe might be able to register after target
> module is loaded. Continue.\n");
> ret = 0;
> }
> @@ -670,6 +680,7 @@ static int register_trace_kprobe(struct trace_kprobe *tk)
> return ret;
> }
>
> +#ifdef CONFIG_MODULES
> /* Module notifier call back, checking event on the module */
> static int trace_kprobe_module_callback(struct notifier_block *nb,
> unsigned long val, void *data)
> @@ -704,6 +715,7 @@ static struct notifier_block trace_kprobe_module_nb = {
> .notifier_call = trace_kprobe_module_callback,
> .priority = 1 /* Invoked after kprobe module callback */
> };
> +#endif /* CONFIG_MODULES */
>
> static int count_symbols(void *data, unsigned long unused)
> {
> @@ -1897,8 +1909,10 @@ static __init int init_kprobe_trace_early(void)
> if (ret)
> return ret;
>
> +#ifdef CONFIG_MODULES
> if (register_module_notifier(&trace_kprobe_module_nb))
> return -EINVAL;
> +#endif /* CONFIG_MODULES */
>
> return 0;
> }
> --
> 2.44.0
>
--
Masami Hiramatsu (Google)
"%s%s+0x0(+0x%zx(+0x%zx(%s))):string",
> +equal ? tmp : "", equal ? "=" : "",
> +offsetof(struct dentry, d_name.name),
> +offsetof(struct file, f_path.dentry),
> +equal ? equal + 1 : tmp);
> +
> + kfree(tmp);
> + if (ret >= bufsize - used)
> + goto nomem;
> + argv[i] = tmpbuf + used;
> + used += ret + 1;
> }
>
> *buf = tmpbuf;
> --
> 2.31.1
>
--
Masami Hiramatsu (Google)
le addresses.
>
Hi Ye,
Thanks for update! There is a nit comment but basically OK.
Acked-by: Masami Hiramatsu (Google)
Thank you,
> Diff v8 vs v7:
> 1. Add detail change log for patch[1-2];
>
> Diff v7 vs v6:
> 1. Squash [1/8] to [3/8] patches into 1 patch;
> 2. S
On Fri, 22 Mar 2024 00:07:59 +0900
Masami Hiramatsu (Google) wrote:
> > What would be really useful is if we had a way to expose BTF here.
> > Something like:
> >
> > "%pB::"
> >
> > The "%pB" would mean to look up the struct/field o
p the struct/field offsets and types via BTF,
> and create the appropriate command to find and print it.
Would you mean casting the pointer to ""?
>
> That would be much more flexible and useful as it would allow reading any
> field from any structure without having to use gdb.
>
> -- Steve
>
>
> > + kfree(tmp);
> > + if (ret >= bufsize - used)
> > + goto nomem;
> > + argv[i] = tmpbuf + used;
> > + used += ret + 1;
> > + }
> > + }
> > +
> > + *buf = tmpbuf;
> > + return 0;
> > +nomem:
> > + kfree(tmpbuf);
> > + return -ENOMEM;
> > +}
>
--
Masami Hiramatsu (Google)
dentry/file addresses.
Thanks for the update! This series looks good to me.
Acked-by: Masami Hiramatsu (Google)
Let me pick this up and test it.
Thank you!
>
> Diff v7 vs v6:
> 1. Squash [1/8] to [3/8] patches into 1 patch;
> 2. Split readme_msg[] into each patch;
>
> Diff v6 vs v
On Wed, 20 Mar 2024 09:27:49 -0400
Steven Rostedt wrote:
> On Wed, 20 Mar 2024 17:10:38 +0900
> "Masami Hiramatsu (Google)" wrote:
>
> > From: Masami Hiramatsu (Google)
> >
> > Fix to initialize 'val' local variable with zero.
> > Dan reported that
@@ static int __trace_fprobe_create(int argc, const char
> *argv[])
> trace_probe_log_clear();
> kfree(new_argv);
> kfree(symbol);
> + kfree(dbuf);
> return ret;
>
> parse_error:
> --
> 2.31.1
>
--
Masami Hiramatsu (Google)
_cnt);
> + do_div(delta, (u32)bm_cnt);
> avg = delta;
>
> if (stddev > 0) {
> --
> 2.44.0
>
--
Masami Hiramatsu (Google)
race_kprobe.c | 6 ++
> kernel/trace/trace_probe.c| 63 +++
> kernel/trace/trace_probe.h| 2 +
> .../ftrace/test.d/dynevent/fprobe_args_vfs.tc | 40
> .../ftrace/test.d/kprobe/kprobe_args_vfs.tc | 43 +
> 8 files changed, 167 insertions(+), 3 deletions(-)
> create mode 100644
> tools/testing/selftests/ftrace/test.d/dynevent/fprobe_args_vfs.tc
> create mode 100644
> tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_vfs.tc
>
> --
> 2.31.1
>
--
Masami Hiramatsu (Google)
\[\\]\n"
> + "\t symstr, %pd/%pD, \\[\\]\n"
> #ifdef CONFIG_HIST_TRIGGERS
> "\tfield: ;\n"
> "\tstype: u8/u16/u32/u64, s8/s16/s32/s64, pid_t,\n"
> --
> 2.31.1
>
--
Masami Hiramatsu (Google)
On Sun, 17 Mar 2024 10:53:59 -0500
Jinghao Jia wrote:
>
>
> On 3/16/24 08:46, Masami Hiramatsu (Google) wrote:
> > On Thu, 14 Mar 2024 18:56:35 -0500
> > Jinghao Jia wrote:
> >
> >> On 3/14/24 10:17, Masami Hiramatsu (Google) wrote:
> >>> F
From: Masami Hiramatsu (Google)
Initialize the local variable 'val' to zero.
Dan reported that the Smatch static checker warns that the local
variable 'val' needs to be initialized. Actually, 'val' is expected to
be initialized by FETCH_OP_ARG in the same loop
On Tue, 19 Mar 2024 10:10:00 -0400
Steven Rostedt wrote:
> On Tue, 19 Mar 2024 10:19:09 +0300
> Dan Carpenter wrote:
>
> > Hello Masami Hiramatsu (Google),
> >
> > Commit 25f00e40ce79 ("tracing/probes: Support $argN in return probe
> > (kprobe and
> +: "Test argument %pD with name"
> +echo 'p:testprobe vfs_read name=$arg1:%pD' > kprobe_events
> +echo 1 > events/kprobes/testprobe/enable
> +grep -q "1" events/kprobes/testprobe/enable
> +echo 0 > events/kprobes/testprobe/enable
> +grep "vfs_read" trace | grep -q "enable"
> +echo "" > kprobe_events
> +echo "" > trace
> +
> +: "Test argument %pD without name"
> +echo 'p:testprobe vfs_read $arg1:%pD' > kprobe_events
> +echo 1 > events/kprobes/testprobe/enable
> +grep -q "1" events/kprobes/testprobe/enable
> +echo 0 > events/kprobes/testprobe/enable
> +grep "vfs_read" trace | grep -q "enable"
> +echo "" > kprobe_events
> +echo "" > trace
> --
> 2.31.1
>
--
Masami Hiramatsu (Google)
> > > uprobes: encapsulate preparation of uprobe args buffer
> > > uprobes: prepare uprobe args buffer lazily
> > > uprobes: add speculative lockless system-wide uprobe filter check
> > >
> > > kernel/trace/trace_uprobe.c | 103 +---
> > > 1 file changed, 59 insertions(+), 44 deletions(-)
> > >
> > > --
> > > 2.43.0
> > >
> >
> >
> > --
> > Masami Hiramatsu (Google)
--
Masami Hiramatsu (Google)
On Tue, 19 Mar 2024 13:20:57 +0900
Masami Hiramatsu (Google) wrote:
> Hi,
>
> On Mon, 18 Mar 2024 11:17:25 -0700
> Andrii Nakryiko wrote:
>
> > This patch set implements two speed ups for uprobe/uretprobe runtime
> > execution
> > path for some common scenar
stem-wide uprobe filter check
>
> kernel/trace/trace_uprobe.c | 103 +---
> 1 file changed, 59 insertions(+), 44 deletions(-)
>
> --
> 2.43.0
>
--
Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google)
Read from an unsafe address with copy_from_kernel_nofault() in
arch_adjust_kprobe_addr(), because this function is used before checking
whether the address is in the text area. The syzkaller bot found a bug and
reported the case where a user specifies an inaccessible data area
On Fri, 15 Mar 2024 00:12:30 +0900
"Masami Hiramatsu (Google)" wrote:
> From: Masami Hiramatsu (Google)
>
> Read from an unsafe address with copy_from_kernel_nofault() in
> arch_adjust_kprobe_addr() because this function is used before checking
> the address is in t
From: Masami Hiramatsu (Google)
Read from an unsafe address with copy_from_kernel_nofault() in
arch_adjust_kprobe_addr(), because this function is used before checking
whether the address is in the text area. The syzkaller bot found a bug and
reported the case where a user specifies an inaccessible data area
On Tue, 12 Mar 2024 11:56:41 -0400
Steven Rostedt wrote:
> From: "Steven Rostedt (Google)"
>
> The rb_watermark_hit() checks if the amount of data in the ring buffer is
> above the percentage level passed in by the "full" variable. If it is, it
>
On Tue, 12 Mar 2024 09:19:21 -0400
Steven Rostedt wrote:
> From: "Steven Rostedt (Google)"
>
> The check for knowing if the poll should wait or not is basically the
> exact same logic as rb_watermark_hit(). The only difference is that
> rb_watermark_hit() al
On Tue, 12 Mar 2024 09:19:20 -0400
Steven Rostedt wrote:
> From: "Steven Rostedt (Google)"
>
> If a reader of the ring buffer is doing a poll, and waiting for the ring
> buffer to hit a specific watermark, there could be a case where it gets
> into an infinite ping
naming, execmem_alloc() seems reasonable to me? I have no strong
> > feelings at all, I'll just use that going forward unless somebody else
> > expresses an opinion.
>
> I am not good at naming things. No objection from me to "execmem_alloc".
Hm, it sounds good to me too. I think we should first add a patch that just
renames module_alloc()/module_memfree() to execmem_alloc()/execmem_free().
Thanks,
--
Masami Hiramatsu (Google)
{
> > int ret;
> >
> > @@ -1580,6 +1584,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
> > goto out;
> > }
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
>
> Plain #ifdef will do here and below. IS_ENABLED is for usage within the
> code, like
>
> if (IS_ENABLED(CONFIG_MODULES))
> ;
>
> > /* Check if 'p' is probing a module. */
> > *probed_mod = __module_text_address((unsigned long) p->addr);
> > if (*probed_mod) {
>
> --
> Sincerely yours,
> Mike.
--
Masami Hiramatsu (Google)
@@ static struct notifier_block trace_kprobe_module_nb = {
> .notifier_call = trace_kprobe_module_callback,
> .priority = 1 /* Invoked after kprobe module callback */
> };
> +#endif /* IS_ENABLED(CONFIG_MODULES) */
>
> static int count_symbols(void *data, unsigned long unused)
> {
> @@ -1897,8 +1906,10 @@ static __init int init_kprobe_trace_early(void)
> if (ret)
> return ret;
>
> +#if IS_ENABLED(CONFIG_MODULES)
> if (register_module_notifier(_kprobe_module_nb))
> return -EINVAL;
> +#endif
>
> return 0;
> }
> --
> 2.43.0
>
--
Masami Hiramatsu (Google)
>
> If you're gonna split code up to move to another place, it'd be nice
> if you can add copyright headers as was done with the kernel/module.c
> split into kernel/module/*.c
>
> Can we start with some small basic stuff we can all agree on?
>
> [0] https://lwn.net/Articles/944857/
>
> Luis
--
Masami Hiramatsu (Google)
VM_FLUSH_RESET_PERMS | VM_DEFER_KMEMLEAK,
> + NUMA_NO_NODE, __builtin_return_address(0));
>
On Mon, 4 Mar 2024 22:34:33 -0500
Steven Rostedt wrote:
> From: "Steven Rostedt (Google)"
>
> Limit the max print event of trace_marker to just 4K string size. This must
> also be less than the amount that can be held by a trace_seq along with
> the text that i
On Thu, 29 Feb 2024 22:58:54 +0100
Jiri Olsa wrote:
> On Thu, Feb 29, 2024 at 08:22:47PM +0900, Masami Hiramatsu (Google) wrote:
> > From: Masami Hiramatsu (Google)
> >
> > Fix to allocate fprobe::entry_data_size buffer with rethook instances.
> > If fprobe doesn
On Thu, 29 Feb 2024 20:22:47 +0900
"Masami Hiramatsu (Google)" wrote:
> From: Masami Hiramatsu (Google)
>
> Fix to allocate fprobe::entry_data_size buffer with rethook instances.
> If fprobe doesn't allocate entry_data_size buffer for each rethook instance,
> fprob
From: Masami Hiramatsu (Google)
Allocate the fprobe::entry_data_size buffer with each rethook instance.
If fprobe doesn't allocate an entry_data_size buffer for each rethook
instance, the fprobe entry handler can cause a buffer overrun when
storing entry data.
Reported-by: Jiri Olsa
On Tue, 27 Feb 2024 12:57:06 -0500
Steven Rostedt wrote:
> From: "Steven Rostedt (Google)"
>
> The trace_marker write goes into the ring buffer. A test was added to
> write a string as big as the sub-buffer of the ring buffer to see if it
> would work. A sub-buff
On Mon, 26 Feb 2024 13:56:09 +0100
Jiri Olsa wrote:
> On Mon, Feb 26, 2024 at 12:20:43AM +0900, Masami Hiramatsu (Google) wrote:
> > From: Masami Hiramatsu (Google)
> >
> > Rewrite fprobe implementation on function-graph tracer.
> > Major API changes are: