Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional

2024-04-02 Thread Andrii Nakryiko
On Tue, Apr 2, 2024 at 9:00 PM Andrii Nakryiko wrote:
>
> On Tue, Apr 2, 2024 at 5:52 PM Steven Rostedt  wrote:
> >
> > On Wed, 3 Apr 2024 09:40:48 +0900
> > Masami Hiramatsu (Google)  wrote:
> >
> > > OK, for me, this last sentence is preferred for the help message. That
> > > explains what this is for.
> > >
> > > All callbacks that attach to the function tracing have some sort
> > > of protection against recursion. This option is only to verify that
> > > ftrace (and other users of ftrace_test_recursion_trylock()) are not
> > > called outside of RCU, as if they are, it can cause a race.
> > > But it also has a noticeable overhead when enabled.
>
> Sounds good to me, I can add this to the description of the Kconfig option.
>
> > >
> > > BTW, how much overhead does this introduce? And does the race cause a
> > > kernel crash?
>
> I just checked our fleet-wide production data for the last 24 hours.
> Within the kprobe/kretprobe code path (ftrace_trampoline and
> everything called from it), rcu_is_watching() (both calls, see below)
> accounts for 0.484% of CPU cycles, which isn't nothing. So we'd
> definitely prefer to be able to avoid that in production use cases.
>

I just ran a synthetic microbenchmark testing multi-kretprobe
throughput. We get (in millions of BPF kretprobe-multi program
invocations per second):
  - 5.568M/s as baseline;
  - 5.679M/s with the changes in this patch (+2% throughput improvement);
  - 5.808M/s with rcu_is_watching() also disabled in rethook_try_get()
    (+2.3% on top of disabling just one rcu_is_watching(), and +4.3% vs baseline).

It's definitely noticeable.

> > > or just mess up the ftrace buffer?
> >
> > There's a hypothetical race where it can cause a use after free.
> >
> > That is, after you shut down ftrace, you need to call synchronize_rcu_tasks(),
> > which requires RCU to be watching. There's a theoretical case where that
> > task calls the trampoline and misses the synchronization. Note, these
> > locations are with preemption disabled, as RCU is always watching when
> > preemption is enabled. Thus it would be extremely fast, whereas
> > synchronize_rcu_tasks() is rather slow.
> >
> > We also have synchronize_rcu_tasks_rude() which would actually keep the
> > trace from happening, as it would schedule on each CPU forcing all CPUs to
> > have RCU watching.
> >
> > I have never heard of this race being hit. I guess it could happen on a VM
> > where a vCPU gets preempted at the right moment for a long time and the
> > other CPUs synchronize.
> >
> > But like lockdep, where deadlocks can crash the kernel, we don't enable it
> > for production.
> >
> > The overhead is another function call within the function tracer. I had
> > numbers before, but I guess I could run tests again and get new numbers.
> >
>
> I just noticed another rcu_is_watching() call, in rethook_try_get(),
> which seems to be a similar and complementary validation check to the
> one we are putting under the CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING option
> in this patch. It feels like both of them should be controlled by the
> same setting. WDYT? Can I add a CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING
> guard around the rcu_is_watching() check in rethook_try_get() as well?
>
>
> > Thanks,
> >
> > -- Steve



Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional

2024-04-02 Thread Andrii Nakryiko
On Tue, Apr 2, 2024 at 5:52 PM Steven Rostedt  wrote:
>
> On Wed, 3 Apr 2024 09:40:48 +0900
> Masami Hiramatsu (Google)  wrote:
>
> > OK, for me, this last sentence is preferred for the help message. That
> > explains what this is for.
> >
> > All callbacks that attach to the function tracing have some sort
> > of protection against recursion. This option is only to verify that
> > ftrace (and other users of ftrace_test_recursion_trylock()) are not
> > called outside of RCU, as if they are, it can cause a race.
> > But it also has a noticeable overhead when enabled.

Sounds good to me, I can add this to the description of the Kconfig option.

> >
> > BTW, how much overhead does this introduce? And does the race cause a
> > kernel crash?

I just checked our fleet-wide production data for the last 24 hours.
Within the kprobe/kretprobe code path (ftrace_trampoline and
everything called from it), rcu_is_watching() (both calls, see below)
accounts for 0.484% of CPU cycles, which isn't nothing. So we'd
definitely prefer to be able to avoid that in production use cases.

> > or just mess up the ftrace buffer?
>
> There's a hypothetical race where it can cause a use after free.
>
> That is, after you shut down ftrace, you need to call synchronize_rcu_tasks(),
> which requires RCU to be watching. There's a theoretical case where that
> task calls the trampoline and misses the synchronization. Note, these
> locations are with preemption disabled, as RCU is always watching when
> preemption is enabled. Thus it would be extremely fast, whereas
> synchronize_rcu_tasks() is rather slow.
>
> We also have synchronize_rcu_tasks_rude() which would actually keep the
> trace from happening, as it would schedule on each CPU forcing all CPUs to
> have RCU watching.
>
> I have never heard of this race being hit. I guess it could happen on a VM
> where a vCPU gets preempted at the right moment for a long time and the
> other CPUs synchronize.
>
> But like lockdep, where deadlocks can crash the kernel, we don't enable it
> for production.
>
> The overhead is another function call within the function tracer. I had
> numbers before, but I guess I could run tests again and get new numbers.
>

I just noticed another rcu_is_watching() call, in rethook_try_get(),
which seems to be a similar and complementary validation check to the
one we are putting under the CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING option
in this patch. It feels like both of them should be controlled by the
same setting. WDYT? Can I add a CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING
guard around the rcu_is_watching() check in rethook_try_get() as well?
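To illustrate, here is a minimal sketch of what that guard could look like.
This is hypothetical, not the actual kernel/trace/rethook.c code; it just
mirrors the IS_ENABLED()-style gating proposed on the ftrace side:

/*
 * Hypothetical sketch only -- not the actual kernel/trace/rethook.c code.
 * The point is to compile the rcu_is_watching() sanity check out unless
 * CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING is enabled.
 */
struct rethook_node *rethook_try_get(struct rethook *rh)
{
        rethook_fn_t handler = READ_ONCE(rh->handler);

        /* @rh is being freed if its handler has already been cleared. */
        if (unlikely(!handler))
                return NULL;

        /*
         * The caller is expected to run in an RCU-watching context; only
         * pay for verifying that when the validation option is enabled.
         */
        if (IS_ENABLED(CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING) &&
            unlikely(!rcu_is_watching()))
                return NULL;

        /* ... pop and return a node from rh's pool, as today ... */
}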


> Thanks,
>
> -- Steve



Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional

2024-04-02 Thread Steven Rostedt
On Wed, 3 Apr 2024 09:40:48 +0900
Masami Hiramatsu (Google)  wrote:

> OK, for me, this last sentence is preferred for the help message. That
> explains what this is for.
>
> All callbacks that attach to the function tracing have some sort
> of protection against recursion. This option is only to verify that
> ftrace (and other users of ftrace_test_recursion_trylock()) are not
> called outside of RCU, as if they are, it can cause a race.
> But it also has a noticeable overhead when enabled.
> 
> BTW, how much overhead does this introduce? And does the race cause a kernel
> crash, or just mess up the ftrace buffer?

There's a hypothetical race where it can cause a use after free.

That is, after you shut down ftrace, you need to call synchronize_rcu_tasks(),
which requires RCU to be watching. There's a theoretical case where that
task calls the trampoline and misses the synchronization. Note, these
locations are with preemption disabled, as RCU is always watching when
preemption is enabled. Thus it would be extremely fast, whereas
synchronize_rcu_tasks() is rather slow.
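To make that ordering concrete, here's a rough sketch of the lifetime rule
this check validates. The names below are made up for illustration; only
synchronize_rcu_tasks(), rcu_is_watching(), WARN_ON_ONCE() and kfree() are
real APIs:

static struct my_ops *ops;              /* data used by the trampoline callback */

static void my_trampoline_callback(void)
{
        /*
         * The check under discussion: if this ever runs while RCU is not
         * watching, the synchronize_rcu_tasks() below may not wait for it.
         */
        WARN_ON_ONCE(!rcu_is_watching());

        use(ops);                       /* may race with the kfree() below */
}

static void my_shutdown(void)
{
        my_unregister_callback();       /* no new invocations will start */

        /*
         * Wait for every task to leave the trampoline. This only accounts
         * for tasks that RCU is watching, which is why the callback above
         * must never run outside of RCU.
         */
        synchronize_rcu_tasks();

        kfree(ops);                     /* use-after-free if a straggler remains */
}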

We also have synchronize_rcu_tasks_rude() which would actually keep the
trace from happening, as it would schedule on each CPU forcing all CPUs to
have RCU watching.

I have never heard of this race being hit. I guess it could happen on a VM
where a vCPU gets preempted at the right moment for a long time and the
other CPUs synchronize.

But like lockdep, where deadlocks can crash the kernel, we don't enable it
for production.

The overhead is another function call within the function tracer. I had
numbers before, but I guess I could run tests again and get new numbers.

Thanks,

-- Steve



Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional

2024-04-02 Thread Masami Hiramatsu (Google)
On Mon, 1 Apr 2024 22:47:33 -0400
Steven Rostedt  wrote:

> On Mon, 1 Apr 2024 19:29:46 -0700
> Andrii Nakryiko  wrote:
> 
> > On Mon, Apr 1, 2024 at 5:38 PM Masami Hiramatsu  wrote:
> > >
> > > On Mon, 1 Apr 2024 12:09:18 -0400
> > > Steven Rostedt  wrote:
> > >  
> > > > On Mon, 1 Apr 2024 20:25:52 +0900
> > > > Masami Hiramatsu (Google)  wrote:
> > > >  
> > > > > > Masami,
> > > > > >
> > > > > > Are you OK with just keeping it set to N?
> > > > >
> > > > > OK, if it is only for debugging, I'm OK with setting this to N.
> > > > >  
> > > > > >
> > > > > > We could have other options like PROVE_LOCKING enable it.  
> > > > >
> > > > > Agreed (but it should say this is a debug option)  
> > > >
> > > > It does say "Validate" which to me is a debug option. What would you
> > > > suggest?  
> > >
> > > I think the help message should have "This is for debugging ftrace."
> > >  
> > 
> > Sent v2 with adjusted wording, thanks!
> 
> You may want to wait till Masami and I agree ;-)
> 
> Masami,
> 
> But it isn't really for "debugging", it's for validating. That is, it
> doesn't give us any information to debug ftrace. It only validates if it is
> executed properly. In other words, I never want to be asked "How can I use
> this option to debug ftrace?"
> 
> For example, we also have:
> 
>   "Verify ring buffer time stamp deltas"
> 
> that makes sure the time stamps of the ring buffer are not buggy.
> 
> And there's:
> 
>   "Verify compile time sorting of ftrace functions"
> 
> Which is also used to make sure things are working properly.
> 
> Neither of the above says they are for "debugging", even though they are
> more useful for debugging than this option.
> 
> I'm not sure saying this is "debugging ftrace" is accurate. It may help
> debug ftrace if it is called outside of an RCU location, which has a
> 1 in 100,000,000,000 chance of causing an actual bug, as the race window is
> extremely small. 
> 
> Now if it is also called outside of instrumentation, that will likely trigger
> other warnings even without this code, and this will not be needed to debug
> that.
> 
> ftrace has all sorts of "verifiers" that are used to make sure things are
> working properly. And yes, you can consider that "debugging". But I would
> not consider this an option to enable if ftrace were broken and you were
> looking into why it is broken.
> 
> To me, this option is only to verify that ftrace (and other users of
> ftrace_test_recursion_trylock()) are not called outside of RCU, as if they
> are, it can cause a race. But it also has a noticeable overhead when enabled.

OK, for me, this last sentence is preferred for the help message. That explains
what this is for.

  All callbacks that attach to the function tracing have some sort
  of protection against recursion. This option is only to verify that
  ftrace (and other users of ftrace_test_recursion_trylock()) are not
  called outside of RCU, as if they are, it can cause a race.
  But it also has a noticeable overhead when enabled.

BTW, how much overhead does this introduce? And does the race cause a kernel
crash, or just mess up the ftrace buffer?

Thank you,

> 
> -- Steve


-- 
Masami Hiramatsu (Google) 



Re: [PATCH bpf-next] rethook: Remove warning messages printed for finding return address of a frame.

2024-04-02 Thread Andrii Nakryiko
On Mon, Apr 1, 2024 at 12:16 PM Kui-Feng Lee  wrote:
>
> rethook_find_ret_addr() prints a warning message and returns 0 when the
> target task is running and not the "current" task to prevent returning an
> incorrect return address. However, this check is incomplete as the target
> task can still transition to the running state when finding the return
> address, although it is safe with RCU.
>
> The issue we encounter is that the kernel frequently prints warning
> messages when BPF profiling programs call bpf_get_task_stack() on
> running tasks.
>
> The callers should be aware of, and willing to take, the risk of receiving
> an incorrect return address from a task that is running and is not the
> "current" one. A warning is not needed here, as the callers knowingly
> accept that risk.
>
> Signed-off-by: Kui-Feng Lee 
> ---
>  kernel/trace/rethook.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
> index fa03094e9e69..4297a132a7ae 100644
> --- a/kernel/trace/rethook.c
> +++ b/kernel/trace/rethook.c
> @@ -248,7 +248,7 @@ unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame
> if (WARN_ON_ONCE(!cur))
> return 0;
>
> -   if (WARN_ON_ONCE(tsk != current && task_is_running(tsk)))
> +   if (tsk != current && task_is_running(tsk))
> return 0;
>

This should probably go through Masami's tree, but the change makes
sense to me, given this is an expected condition.

Acked-by: Andrii Nakryiko 

> do {
> --
> 2.34.1
>
>
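
For completeness, a hypothetical caller-side sketch of the contract described
above, assuming the upstream signature rethook_find_ret_addr(struct task_struct
*tsk, unsigned long frame, struct llist_node **cur): after this change, a stack
walker simply treats a 0 return (e.g. the target task is running and is not
current) as "no fixup available" rather than a condition worth warning about.

/*
 * Hypothetical caller sketch -- not actual unwinder code. It shows how a
 * stack walker consumes rethook_find_ret_addr() after this patch: a 0
 * return is an expected condition that is silently tolerated.
 */
static unsigned long fixup_trampoline_addr(struct task_struct *tsk,
                                           unsigned long addr,
                                           unsigned long frame,
                                           struct llist_node **cursor)
{
        unsigned long real;

        if (!looks_like_rethook_trampoline(addr))   /* illustrative helper */
                return addr;

        real = rethook_find_ret_addr(tsk, frame, cursor);

        /* 0 means "could not resolve"; keep the trampoline address as-is. */
        return real ? real : addr;
}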