Re: [PATCH] uprobes: Use synchronize_rcu() not synchronize_sched()

2018-08-10 Thread Oleg Nesterov
On 08/10, Steven Rostedt wrote:
>
> Are you OK with this patch? I set it for stable and plan on pushing it
> with the patches for the upcoming merge window. If you are OK, mind
> giving me an Acked or Reviewed-by?

Yes, thanks, feel free to add

Acked-by: Oleg Nesterov 

> Even if it is theoretical, we should make sure it can't happen. But
> this is unrelated to the current patch,

agreed.

> and if we should fix this, then
> it can be a separate patch. I don't think your change hurts, and even
> if it can't technically happen, it may let us sleep better at night.
> Want to send a formal patch to make this change?

OK, will do.

Oleg.



Re: [PATCH] uprobes: Use synchronize_rcu() not synchronize_sched()

2018-08-10 Thread Steven Rostedt


[ Removing Jovi as his email is bouncing ]

On Fri, 10 Aug 2018 15:36:08 +0200
Oleg Nesterov  wrote:

> > > Can't we change uprobe_trace_func() and uretprobe_trace_func() to use
> > > rcu_read_lock_sched() instead? It is more cheap.  
> >
> > Is it? rcu_read_lock_sched() is a preempt_disable(),  
> 
> which is just raw_cpu_inc()
> 
> > where
> > rcu_read_lock() may just be a task counter increment.  
> 
> and  __rcu_read_unlock() is more heavy.
> 
> OK, I agree, this doesn't really matter.

Are you OK with this patch? I set it for stable and plan on pushing it
with the patches for the upcoming merge window. If you are OK, mind
giving me an Acked or Reviewed-by?

> 
> > > Hmm. probe_event_enable() does list_del + kfree on failure, this doesn't
> > > look right... Not only because kfree() can race with list_for_each_entry_rcu(),
> > > we should not put the 1st link on list until uprobe_buffer_enable().
> > >
> > > Does the patch below make sense or I am confused?  
> >
> > I guess the question is, if it isn't enabled, are there any users or
> > even past users still running.  
> 
> Note that uprobe_register() is not "atomic".
> 
> To simplify, suppose we have 2 tasks T1 and T2 running the probed binary.
> So we are going to do install_breakpoint(T1->mm) + install_breakpoint(T2->mm).
> If the 2nd install_breakpoint() fails for any reason, _register() will do
> remove_breakpoint(T1->mm) and return the error.
> 
> However, T1 can hit this bp right after install_breakpoint(T1->mm), so it
> can call uprobe_trace_func() before list_del(&link->list).
> 
> OK, even if I am right this is mostly theoretical.

Even if it is theoretical, we should make sure it can't happen. But
this is unrelated to the current patch, and if we should fix this, then
it can be a separate patch. I don't think your change hurts, and even
if it can't technically happen, it may let us sleep better at night.
Want to send a formal patch to make this change?

-- Steve


Re: [PATCH] uprobes: Use synchronize_rcu() not synchronize_sched()

2018-08-10 Thread Oleg Nesterov
On 08/10, Steven Rostedt wrote:
>
> On Fri, 10 Aug 2018 13:35:49 +0200
> Oleg Nesterov  wrote:
>
> > On 08/09, Steven Rostedt wrote:
> > >
> > > --- a/kernel/trace/trace_uprobe.c
> > > +++ b/kernel/trace/trace_uprobe.c
> > > @@ -952,7 +952,7 @@ probe_event_disable(struct trace_uprobe *tu, struct 
> > > trace_event_file *file)
> > >
> > >   list_del_rcu(&link->list);
> > >   /* synchronize with u{,ret}probe_trace_func */
> > > - synchronize_sched();
> > > + synchronize_rcu();
> >
> > Can't we change uprobe_trace_func() and uretprobe_trace_func() to use
> > rcu_read_lock_sched() instead? It is more cheap.
>
> Is it? rcu_read_lock_sched() is a preempt_disable(),

which is just raw_cpu_inc()

> where
> rcu_read_lock() may just be a task counter increment.

and  __rcu_read_unlock() is more heavy.

OK, I agree, this doesn't really matter.

> > Hmm. probe_event_enable() does list_del + kfree on failure, this doesn't
> > look right... Not only because kfree() can race with list_for_each_entry_rcu(),
> > we should not put the 1st link on list until uprobe_buffer_enable().
> >
> > Does the patch below make sense or I am confused?
>
> I guess the question is, if it isn't enabled, are there any users or
> even past users still running.

Note that uprobe_register() is not "atomic".

To simplify, suppose we have 2 tasks T1 and T2 running the probed binary.
So we are going to do install_breakpoint(T1->mm) + install_breakpoint(T2->mm).
If the 2nd install_breakpoint() fails for any reason, _register() will do
remove_breakpoint(T1->mm) and return the error.

However, T1 can hit this bp right after install_breakpoint(T1->mm), so it
can call uprobe_trace_func() before list_del(&link->list).

OK, even if I am right this is mostly theoretical.

Oleg.
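
The T1/T2 scenario above can be replayed deterministically in a small
userspace model. This is only a sketch, not kernel code: every toy_* name is
hypothetical, "installing a breakpoint" is a flag per mm, and a task is
assumed to hit its breakpoint immediately after the install. The counter
records how often the handler found the link on the list before the failure
path removed it.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_MM 2

struct toy_uprobe {
	bool link_on_list;             /* models the link sitting on tu->tp.files */
	bool bp_installed[TOY_NR_MM];  /* models one breakpoint per mm */
	int  handler_saw_link;         /* hits that found the link on the list */
};

/* Models a task hitting the breakpoint in its mm. */
static void toy_hit_breakpoint(struct toy_uprobe *p, int mm)
{
	if (p->bp_installed[mm] && p->link_on_list)
		p->handler_saw_link++;
}

/*
 * Models a non-atomic register: install per mm in order and roll back the
 * earlier installs if the install for mm 'fail_at' fails (-1: never fail).
 * After each successful install the owning task may hit the breakpoint.
 */
static int toy_register(struct toy_uprobe *p, int fail_at)
{
	for (int mm = 0; mm < TOY_NR_MM; mm++) {
		if (mm == fail_at) {
			while (--mm >= 0)
				p->bp_installed[mm] = false;
			return -1;
		}
		p->bp_installed[mm] = true;
		toy_hit_breakpoint(p, mm);
	}
	return 0;
}

/* 'publish_early' selects the ordering questioned in the thread. */
static int toy_enable(struct toy_uprobe *p, bool publish_early, int fail_at)
{
	if (publish_early)
		p->link_on_list = true;

	if (toy_register(p, fail_at) != 0) {
		p->link_on_list = false;  /* models list_del + kfree on failure */
		return -1;
	}

	p->link_on_list = true;           /* publish (or stay published) on success */
	return 0;
}
```

With publish_early set and the second install failing, the handler sees the
link once before it is removed; with late publication it never does.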



Re: [PATCH] uprobes: Use synchronize_rcu() not synchronize_sched()

2018-08-10 Thread Steven Rostedt
On Fri, 10 Aug 2018 13:35:49 +0200
Oleg Nesterov  wrote:

> On 08/09, Steven Rostedt wrote:
> >
> > --- a/kernel/trace/trace_uprobe.c
> > +++ b/kernel/trace/trace_uprobe.c
> > @@ -952,7 +952,7 @@ probe_event_disable(struct trace_uprobe *tu, struct trace_event_file *file)
> >
> >  		list_del_rcu(&link->list);
> >  		/* synchronize with u{,ret}probe_trace_func */
> > -		synchronize_sched();
> > +		synchronize_rcu();
> 
> Can't we change uprobe_trace_func() and uretprobe_trace_func() to use
> rcu_read_lock_sched() instead? It is more cheap.

Is it? rcu_read_lock_sched() is a preempt_disable(), where
rcu_read_lock() may just be a task counter increment.
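
As an illustration of why the reader and updater flavors have to match, here
is a toy userspace model, not kernel code. It assumes a preemptible-RCU
configuration, where rcu_read_lock() only bumps a per-task nesting counter
and rcu_read_lock_sched() disables preemption, and it treats the two as
fully independent. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: one "task" with two independent reader-side counters. */
struct toy_task {
	int rcu_nesting;    /* models the task counter bumped by rcu_read_lock() */
	int preempt_count;  /* models preempt_disable() in rcu_read_lock_sched() */
};

static void toy_rcu_read_lock(struct toy_task *t)
{
	t->rcu_nesting++;
}

static void toy_rcu_read_unlock(struct toy_task *t)
{
	assert(t->rcu_nesting > 0);
	t->rcu_nesting--;
}

static void toy_rcu_read_lock_sched(struct toy_task *t)
{
	t->preempt_count++;
}

static void toy_rcu_read_unlock_sched(struct toy_task *t)
{
	assert(t->preempt_count > 0);
	t->preempt_count--;
}

/* Would a modelled synchronize_rcu() still have to wait for this task? */
static bool toy_blocks_synchronize_rcu(const struct toy_task *t)
{
	return t->rcu_nesting > 0;
}

/* Would a modelled synchronize_sched() still have to wait for this task? */
static bool toy_blocks_synchronize_sched(const struct toy_task *t)
{
	return t->preempt_count > 0;
}
```

In this model a reader inside toy_rcu_read_lock() does not hold back the
modelled synchronize_sched(), which is the mismatch the patch removes by
switching the updater to synchronize_rcu().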

> 
> 
> Hmm. probe_event_enable() does list_del + kfree on failure, this doesn't
> look right... Not only because kfree() can race with list_for_each_entry_rcu(),
> we should not put the 1st link on list until uprobe_buffer_enable().
> 
> Does the patch below make sense or I am confused?

I guess the question is, if it isn't enabled, are there any users or
even past users still running. If not, then I think the current code is
OK, as there shouldn't be anything happening to race with it.

-- Steve

> 
> Oleg.
> 
> 
> --- x/kernel/trace/trace_uprobe.c
> +++ x/kernel/trace/trace_uprobe.c
> @@ -896,8 +896,6 @@ probe_event_enable(struct trace_uprobe *
>  			return -ENOMEM;
>  
>  		link->file = file;
> -		list_add_tail_rcu(&link->list, &tu->tp.files);
> -
>  		tu->tp.flags |= TP_FLAG_TRACE;
>  	} else {
>  		if (tu->tp.flags & TP_FLAG_TRACE)
> @@ -909,7 +907,7 @@ probe_event_enable(struct trace_uprobe *
>  	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
>  
>  	if (enabled)
> -		return 0;
> +		goto add;
>  
>  	ret = uprobe_buffer_enable();
>  	if (ret)
> @@ -920,7 +918,8 @@ probe_event_enable(struct trace_uprobe *
>  	ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
>  	if (ret)
>  		goto err_buffer;
> -
> + add:
> +	list_add_tail_rcu(&link->list, &tu->tp.files);
>  	return 0;
>  
>   err_buffer:
> @@ -928,7 +927,6 @@ probe_event_enable(struct trace_uprobe *
>  
>   err_flags:
>  	if (file) {
> -		list_del(&link->list);
>  		kfree(link);
>  		tu->tp.flags &= ~TP_FLAG_TRACE;
>  	} else {



Re: [PATCH] uprobes: Use synchronize_rcu() not synchronize_sched()

2018-08-10 Thread Oleg Nesterov
On 08/10, Oleg Nesterov wrote:
>
> @@ -920,7 +918,8 @@ probe_event_enable(struct trace_uprobe *
>   ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
>   if (ret)
>   goto err_buffer;
> -
> + add:
> + list_add_tail_rcu(&link->list, &tu->tp.files);

	if (link)
		list_add_tail_rcu(&link->list, &tu->tp.files);

Oleg.
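
The ordering proposed above, including the NULL check from this follow-up,
can be sketched in plain userspace C. This is a model under stated
assumptions, not the kernel function: toy_fallible_setup() stands in for
uprobe_buffer_enable() plus uprobe_register(), the list is a simple
singly-linked list with head insertion rather than list_add_tail_rcu(), and
all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_link {
	struct toy_link *next;
	const void *file;
};

struct toy_probe {
	struct toy_link *files;  /* models tu->tp.files */
	int fail_register;       /* test knob: make the fallible step fail */
};

/* Stand-in for the steps that can fail before the probe is live. */
static int toy_fallible_setup(const struct toy_probe *p)
{
	return p->fail_register ? -1 : 0;
}

/*
 * Publish-last ordering: allocate the link, run every step that can fail,
 * and only then make the link reachable. 'file' may be NULL, in which case
 * there is no link to publish.
 */
static int toy_probe_enable(struct toy_probe *p, const void *file)
{
	struct toy_link *link = NULL;
	int ret;

	if (file) {
		link = malloc(sizeof(*link));
		if (!link)
			return -1;
		link->file = file;
		link->next = NULL;
	}

	ret = toy_fallible_setup(p);
	if (ret) {
		/* Never published, so no reader can hold it; free(NULL) is a no-op. */
		free(link);
		return ret;
	}

	if (link) {  /* the guard added in the follow-up mail */
		link->next = p->files;
		p->files = link;
	}
	return 0;
}
```

Because the link only becomes reachable after the last fallible step, the
error path needs no list removal and no grace period before freeing.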



Re: [PATCH] uprobes: Use synchronize_rcu() not synchronize_sched()

2018-08-10 Thread Oleg Nesterov
On 08/09, Steven Rostedt wrote:
>
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -952,7 +952,7 @@ probe_event_disable(struct trace_uprobe *tu, struct trace_event_file *file)
>
>  		list_del_rcu(&link->list);
>  		/* synchronize with u{,ret}probe_trace_func */
> -		synchronize_sched();
> +		synchronize_rcu();

Can't we change uprobe_trace_func() and uretprobe_trace_func() to use
rcu_read_lock_sched() instead? It is more cheap.


Hmm. probe_event_enable() does list_del + kfree on failure, this doesn't
look right... Not only because kfree() can race with list_for_each_entry_rcu(),
we should not put the 1st link on list until uprobe_buffer_enable().

Does the patch below make sense or I am confused?

Oleg.


--- x/kernel/trace/trace_uprobe.c
+++ x/kernel/trace/trace_uprobe.c
@@ -896,8 +896,6 @@ probe_event_enable(struct trace_uprobe *
 			return -ENOMEM;
 
 		link->file = file;
-		list_add_tail_rcu(&link->list, &tu->tp.files);
-
 		tu->tp.flags |= TP_FLAG_TRACE;
 	} else {
 		if (tu->tp.flags & TP_FLAG_TRACE)
@@ -909,7 +907,7 @@ probe_event_enable(struct trace_uprobe *
 	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
 
 	if (enabled)
-		return 0;
+		goto add;
 
 	ret = uprobe_buffer_enable();
 	if (ret)
@@ -920,7 +918,8 @@ probe_event_enable(struct trace_uprobe *
 	ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
 	if (ret)
 		goto err_buffer;
-
+ add:
+	list_add_tail_rcu(&link->list, &tu->tp.files);
 	return 0;
 
  err_buffer:
@@ -928,7 +927,6 @@ probe_event_enable(struct trace_uprobe *
 
  err_flags:
 	if (file) {
-		list_del(&link->list);
 		kfree(link);
 		tu->tp.flags &= ~TP_FLAG_TRACE;
 	} else {