Re: rcu-bh design

2018-05-04 Thread Joel Fernandes
On Fri, May 4, 2018 at 4:42 PM Paul E. McKenney 
wrote:
[..]
> > Say that the tracepoint code is buggy and for some reason a
> > call_srcu wasn't done. For example, say the hypothetical bug I'm
> > taking about is in tracepoint_remove_func which called the
> > rcu_assign_pointer, but didn't call release_probes:
> >
> > diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> > index d0639d917899..f54cb358f451 100644
> > --- a/kernel/tracepoint.c
> > +++ b/kernel/tracepoint.c
> > @@ -216,7 +216,6 @@ static int tracepoint_add_func(struct tracepoint
*tp,
> > rcu_assign_pointer(tp->funcs, tp_funcs);
> > if (!static_key_enabled(>key))
> > static_key_slow_inc(>key);
> > -   release_probes(old);
> > return 0;
> >  }
> > ---
> > I want to catching this from the test code. If I did the following,
> > it would be insufficient then:
> >
> > trace_probe_register(...);
> > srcu_barrier(_srcu_struct);  // only tells me any queued calls
> >   // are done but not that something
> >   // was queued
> >
> > I was seeing if I could use my_srcu_struct::completed for that. but
> > I'm not sure if that'll work (or that its legal to access it
> > directly - but hey this is just test code! :P).

> This is a bit ugly to do programmatically in the kernel.  The callbacks
> are on per-CPU queues that cannot be safely accessed except by that CPU.
> Plus they are on singly linked lists that can be quite long, so it could
> be insanely slow as well.

> But given that this is test code, why not leverage event tracing?
> There are some relevant events in include/trace/events/rcu.h, starting
> with the rcu_callback trace event.


Sounds like a great idea. I'll try that, thanks.

- Joel


Re: rcu-bh design

2018-05-04 Thread Paul E. McKenney
On Fri, May 04, 2018 at 04:20:41PM -0700, Joel Fernandes wrote:
> 
> 
> On 05/04/2018 03:49 PM, Paul E. McKenney wrote:
> >>Yes just one more ;-). I am trying to write a 'probetorture' test inspired
> >>by RCU torture that whacks the tracepoints in various scenarios. One of the
> >>things I want to do is verify the RCU callbacks are queued and secondly,
> >>they are executed. Just to verify that the garbage collect was done and
> >>we're not leaking the function probe table (not that I don't have
> >>confidence in the chained callback technique which you mentioned, but it
> >>would be nice to assure this mechanism is working for tracepoints).
> >>
> >>Is there a way to detect this given a reference to srcu_struct? Mathieu and
> >>we were chatting about srcu_barrier which is cool but that just tells me
> >>that if there was a callback queued, it would have executed after the
> >>readers were done. But not whether something was queued.
> >
> >Suppose that you are queuing an RCU callback that in turn queues an SRCU
> >callback on my_srcu_struct, like this:
> >
> > void my_rcu_callback(struct rcu_head *rhp)
> > {
> > p = container_of(rhp, struct my_struct, my_rcu_head);
> >
> > free_it_up_or_down(p);
> > }
> >
> > void my_srcu_callback(struct rcu_head *rhp)
> > {
> > call_rcu(rhp, my_rcu_callback);
> > }
> >
> > call_srcu(_srcu_struct, >my_rcu_head, my_srcu_callback);
> >
> >Then to make sure that any previously submitted callback has been fully
> >processed, you do this:
> >
> > rcu_barrier();
> > srcu_barrier(_srcu_struct);
> >
> >Of course if you queue in the opposite order, like this:
> >
> > void my_srcu_callback(struct rcu_head *rhp)
> > {
> > p = container_of(rhp, struct my_struct, my_rcu_head);
> >
> > free_it_up_or_down(p);
> > }
> >
> > void my_rcu_callback(struct rcu_head *rhp)
> > {
> > call_srcu(_srcu_struct, >my_rcu_head, my_srcu_callback);
> > }
> >
> > call_rcu(rhp, my_rcu_callback);
> >
> >Then you must wait in the opposite order:
> >
> > rcu_barrier();
> > srcu_barrier(_srcu_struct);
> >
> >Either way, the trick is that the first *_barrier() call cannot return
> >until after all previous callbacks have executed, which means that by that
> >time the callback is enqueued for the other flavor of {S,}RCU.  So the
> >second *_barrier() call must wait for the callback to be completely done,
> >through both flavors of {S,}RCU.
> >
> >So after executing the pair of *_barrier() calls, you know that the
> >callback is no longer queued.
> >
> >Does that cover it, or am I missing a turn in here somewhere?
> 
> Yes, that covers some of it. Thanks a lot. Btw, I was also thinking
> I want to check that srcu received a callback queuing request as
> well (not just completion but also queuing).
> 
> Say that the tracepoint code is buggy and for some reason a
> call_srcu wasn't done. For example, say the hypothetical bug I'm
> taking about is in tracepoint_remove_func which called the
> rcu_assign_pointer, but didn't call release_probes:
> 
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index d0639d917899..f54cb358f451 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -216,7 +216,6 @@ static int tracepoint_add_func(struct tracepoint *tp,
> rcu_assign_pointer(tp->funcs, tp_funcs);
> if (!static_key_enabled(>key))
> static_key_slow_inc(>key);
> -   release_probes(old);
> return 0;
>  }
> ---
> I want to catching this from the test code. If I did the following,
> it would be insufficient then:
> 
> trace_probe_register(...);
> srcu_barrier(_srcu_struct);  // only tells me any queued calls
>   // are done but not that something
>   // was queued
> 
> I was seeing if I could use my_srcu_struct::completed for that. but
> I'm not sure if that'll work (or that its legal to access it
> directly - but hey this is just test code! :P).

This is a bit ugly to do programmatically in the kernel.  The callbacks
are on per-CPU queues that cannot be safely accessed except by that CPU.
Plus they are on singly linked lists that can be quite long, so it could
be insanely slow as well.

But given that this is test code, why not leverage event tracing?
There are some relevant events in include/trace/events/rcu.h, starting
with the rcu_callback trace event.

Thanx, Paul



Re: rcu-bh design

2018-05-04 Thread Joel Fernandes



On 05/04/2018 03:49 PM, Paul E. McKenney wrote:

Yes just one more ;-). I am trying to write a 'probetorture' test inspired
by RCU torture that whacks the tracepoints in various scenarios. One of the
things I want to do is verify the RCU callbacks are queued and secondly,
they are executed. Just to verify that the garbage collect was done and
we're not leaking the function probe table (not that I don't have
confidence in the chained callback technique which you mentioned, but it
would be nice to assure this mechanism is working for tracepoints).

Is there a way to detect this given a reference to srcu_struct? Mathieu and
we were chatting about srcu_barrier which is cool but that just tells me
that if there was a callback queued, it would have executed after the
readers were done. But not whether something was queued.


Suppose that you are queuing an RCU callback that in turn queues an SRCU
callback on my_srcu_struct, like this:

void my_rcu_callback(struct rcu_head *rhp)
{
p = container_of(rhp, struct my_struct, my_rcu_head);

free_it_up_or_down(p);
}

void my_srcu_callback(struct rcu_head *rhp)
{
call_rcu(rhp, my_rcu_callback);
}

call_srcu(_srcu_struct, >my_rcu_head, my_srcu_callback);

Then to make sure that any previously submitted callback has been fully
processed, you do this:

rcu_barrier();
srcu_barrier(_srcu_struct);

Of course if you queue in the opposite order, like this:

void my_srcu_callback(struct rcu_head *rhp)
{
p = container_of(rhp, struct my_struct, my_rcu_head);

free_it_up_or_down(p);
}

void my_rcu_callback(struct rcu_head *rhp)
{
call_srcu(_srcu_struct, >my_rcu_head, my_srcu_callback);
}

call_rcu(rhp, my_rcu_callback);

Then you must wait in the opposite order:

rcu_barrier();
srcu_barrier(_srcu_struct);

Either way, the trick is that the first *_barrier() call cannot return
until after all previous callbacks have executed, which means that by that
time the callback is enqueued for the other flavor of {S,}RCU.  So the
second *_barrier() call must wait for the callback to be completely done,
through both flavors of {S,}RCU.

So after executing the pair of *_barrier() calls, you know that the
callback is no longer queued.

Does that cover it, or am I missing a turn in here somewhere?


Yes, that covers some of it. Thanks a lot. Btw, I was also thinking I 
want to check that srcu received a callback queuing request as well (not 
just completion but also queuing).


Say that the tracepoint code is buggy and for some reason a call_srcu 
wasn't done. For example, say the hypothetical bug I'm taking about is 
in tracepoint_remove_func which called the rcu_assign_pointer, but 
didn't call release_probes:


diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index d0639d917899..f54cb358f451 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -216,7 +216,6 @@ static int tracepoint_add_func(struct tracepoint *tp,
rcu_assign_pointer(tp->funcs, tp_funcs);
if (!static_key_enabled(>key))
static_key_slow_inc(>key);
-   release_probes(old);
return 0;
 }
---
I want to catching this from the test code. If I did the following, it 
would be insufficient then:


trace_probe_register(...);
srcu_barrier(_srcu_struct);  // only tells me any queued calls
// are done but not that something
// was queued

I was seeing if I could use my_srcu_struct::completed for that. but I'm 
not sure if that'll work (or that its legal to access it directly - but 
hey this is just test code! :P).


thanks,

 - Joel





Re: rcu-bh design

2018-05-04 Thread Paul E. McKenney
On Fri, May 04, 2018 at 08:33:19PM +, Joel Fernandes wrote:
> On Fri, May 4, 2018 at 1:10 PM Paul E. McKenney 
> wrote:
> [...]
> > > >> > Almost.  All context switches in an RCU-preempt read-side critical
> section
> > > >> > must be subject to priority boosting.  Preemption is one example,
> because
> > > >> > boosting the priority of the preempted task will make it runnable.
> > > >> > The priority-inheritance -rt "spinlock" is another example, because
> > > >> > boosting the priority of the task holding the lock will eventually
> make
> > > >> > runnable the task acquiring the lock within the RCU-preempt
> read-side
> > > >> > critical section.
> > > >>
> > > >> Yes I understand priority boosting is needed with preemptible RCU so
> that
> > > >> read-sections are making forward progress. I meant (and correct me
> if I'm
> > > >> wrong) that, as long as a task doesn't sleep in a preemptible RCU
> > > >> read-section (rcu-preempt flavor), then bad things wont happen and
> RCU will
> > > >> work correctly.
> > > >
> > > > The exception is -rt "spinlock" acquisition.  If the "spinlock" is
> held,
> > > > the task acquiring it will block, which is legal within an RCU-preempt
> > > > read-side critical section.
> > > >
> > > > This exception is why I define bad things in terms of lack of
> > > > susceptibility to priority boosting instead of sleeping.
> > >
> > > Oh, that's a tricky situation. Thanks for letting me know. I guess my
> > > view was too idealistic. Makes sense now.
> 
> > Well, let me put it this way...
> 
> > Here is your nice elegant little algorithm:
> > https://upload.wikimedia.org/wikipedia/commons/6/6e/Golde33443.jpg
> 
> > Here is your nice elegant little algorithm equipped to survive within
> > the Linux kernel:
> > https://totatema.files.wordpress.com/2012/06/feeling_grizzly-1600x1200.jpg
> 
> A picture speaks a 1000 words! :-D

And I suspect that a real face-to-face encounter with that guy is worth
1000 pictures.  ;-)

> > Any questions?  ;-)
> 
> Yes just one more ;-). I am trying to write a 'probetorture' test inspired
> by RCU torture that whacks the tracepoints in various scenarios. One of the
> things I want to do is verify the RCU callbacks are queued and secondly,
> they are executed. Just to verify that the garbage collect was done and
> we're not leaking the function probe table (not that I don't have
> confidence in the chained callback technique which you mentioned, but it
> would be nice to assure this mechanism is working for tracepoints).
> 
> Is there a way to detect this given a reference to srcu_struct? Mathieu and
> we were chatting about srcu_barrier which is cool but that just tells me
> that if there was a callback queued, it would have executed after the
> readers were done. But not whether something was queued.

Suppose that you are queuing an RCU callback that in turn queues an SRCU
callback on my_srcu_struct, like this:

void my_rcu_callback(struct rcu_head *rhp)
{
p = container_of(rhp, struct my_struct, my_rcu_head);

free_it_up_or_down(p);
}

void my_srcu_callback(struct rcu_head *rhp)
{
call_rcu(rhp, my_rcu_callback);
}

call_srcu(_srcu_struct, >my_rcu_head, my_srcu_callback);

Then to make sure that any previously submitted callback has been fully
processed, you do this:

rcu_barrier();
srcu_barrier(_srcu_struct);

Of course if you queue in the opposite order, like this:

void my_srcu_callback(struct rcu_head *rhp)
{
p = container_of(rhp, struct my_struct, my_rcu_head);

free_it_up_or_down(p);
}

void my_rcu_callback(struct rcu_head *rhp)
{
call_srcu(_srcu_struct, >my_rcu_head, my_srcu_callback);
}

call_rcu(rhp, my_rcu_callback);

Then you must wait in the opposite order:

rcu_barrier();
srcu_barrier(_srcu_struct);

Either way, the trick is that the first *_barrier() call cannot return
until after all previous callbacks have executed, which means that by that
time the callback is enqueued for the other flavor of {S,}RCU.  So the
second *_barrier() call must wait for the callback to be completely done,
through both flavors of {S,}RCU.

So after executing the pair of *_barrier() calls, you know that the
callback is no longer queued.

Does that cover it, or am I missing a turn in here somewhere?

Thanx, Paul

> thanks,
> 
> - Joel
> PS: I remember Paul, you mentioned you are testing this chained callback
> case in rcutorture, so if the answer is "go read rcutorture", that's
> totally Ok I could just do that ;-)
> 



Re: rcu-bh design

2018-05-04 Thread Joel Fernandes
On Fri, May 4, 2018 at 1:10 PM Paul E. McKenney 
wrote:
[...]
> > >> > Almost.  All context switches in an RCU-preempt read-side critical
section
> > >> > must be subject to priority boosting.  Preemption is one example,
because
> > >> > boosting the priority of the preempted task will make it runnable.
> > >> > The priority-inheritance -rt "spinlock" is another example, because
> > >> > boosting the priority of the task holding the lock will eventually
make
> > >> > runnable the task acquiring the lock within the RCU-preempt
read-side
> > >> > critical section.
> > >>
> > >> Yes I understand priority boosting is needed with preemptible RCU so
that
> > >> read-sections are making forward progress. I meant (and correct me
if I'm
> > >> wrong) that, as long as a task doesn't sleep in a preemptible RCU
> > >> read-section (rcu-preempt flavor), then bad things wont happen and
RCU will
> > >> work correctly.
> > >
> > > The exception is -rt "spinlock" acquisition.  If the "spinlock" is
held,
> > > the task acquiring it will block, which is legal within an RCU-preempt
> > > read-side critical section.
> > >
> > > This exception is why I define bad things in terms of lack of
> > > susceptibility to priority boosting instead of sleeping.
> >
> > Oh, that's a tricky situation. Thanks for letting me know. I guess my
> > view was too idealistic. Makes sense now.

> Well, let me put it this way...

> Here is your nice elegant little algorithm:
> https://upload.wikimedia.org/wikipedia/commons/6/6e/Golde33443.jpg

> Here is your nice elegant little algorithm equipped to survive within
> the Linux kernel:
> https://totatema.files.wordpress.com/2012/06/feeling_grizzly-1600x1200.jpg


A picture speaks a 1000 words! :-D

> Any questions?  ;-)

Yes just one more ;-). I am trying to write a 'probetorture' test inspired
by RCU torture that whacks the tracepoints in various scenarios. One of the
things I want to do is verify the RCU callbacks are queued and secondly,
they are executed. Just to verify that the garbage collect was done and
we're not leaking the function probe table (not that I don't have
confidence in the chained callback technique which you mentioned, but it
would be nice to assure this mechanism is working for tracepoints).

Is there a way to detect this given a reference to srcu_struct? Mathieu and
we were chatting about srcu_barrier which is cool but that just tells me
that if there was a callback queued, it would have executed after the
readers were done. But not whether something was queued.

thanks,

- Joel
PS: I remember Paul, you mentioned you are testing this chained callback
case in rcutorture, so if the answer is "go read rcutorture", that's
totally Ok I could just do that ;-)


Re: rcu-bh design

2018-05-04 Thread Paul E. McKenney
On Fri, May 04, 2018 at 12:57:19PM -0700, Joel Fernandes wrote:
> On Fri, May 4, 2018 at 11:49 AM, Paul E. McKenney
>  wrote:
> > On Fri, May 04, 2018 at 06:34:32PM +, Joel Fernandes wrote:
> >> On Fri, May 4, 2018 at 10:42 AM Paul E. McKenney
> >> 
> >> wrote:
> >> [...]
> >> > > > > But preemptible RCU *does not* use context-switch as a quiescent
> >> state.
> >> > > > It doesn't?
> >> > >
> >> > > I thought that's what preemptible rcu is about. You can get preempted
> >> but
> >> > > you shouldn't block in a read-section. Is that not true?
> >>
> >> > Almost.  All context switches in an RCU-preempt read-side critical 
> >> > section
> >> > must be subject to priority boosting.  Preemption is one example, because
> >> > boosting the priority of the preempted task will make it runnable.
> >> > The priority-inheritance -rt "spinlock" is another example, because
> >> > boosting the priority of the task holding the lock will eventually make
> >> > runnable the task acquiring the lock within the RCU-preempt read-side
> >> > critical section.
> >>
> >> Yes I understand priority boosting is needed with preemptible RCU so that
> >> read-sections are making forward progress. I meant (and correct me if I'm
> >> wrong) that, as long as a task doesn't sleep in a preemptible RCU
> >> read-section (rcu-preempt flavor), then bad things wont happen and RCU will
> >> work correctly.
> >
> > The exception is -rt "spinlock" acquisition.  If the "spinlock" is held,
> > the task acquiring it will block, which is legal within an RCU-preempt
> > read-side critical section.
> >
> > This exception is why I define bad things in terms of lack of
> > susceptibility to priority boosting instead of sleeping.
> 
> Oh, that's a tricky situation. Thanks for letting me know. I guess my
> view was too idealistic. Makes sense now.

Well, let me put it this way...

Here is your nice elegant little algorithm:
https://upload.wikimedia.org/wikipedia/commons/6/6e/Golde33443.jpg

Here is your nice elegant little algorithm equipped to survive within
the Linux kernel:
https://totatema.files.wordpress.com/2012/06/feeling_grizzly-1600x1200.jpg

Any questions?  ;-)

Thanx, Paul



Re: rcu-bh design

2018-05-04 Thread Joel Fernandes
On Fri, May 4, 2018 at 11:49 AM, Paul E. McKenney
 wrote:
> On Fri, May 04, 2018 at 06:34:32PM +, Joel Fernandes wrote:
>> On Fri, May 4, 2018 at 10:42 AM Paul E. McKenney
>> 
>> wrote:
>> [...]
>> > > > > But preemptible RCU *does not* use context-switch as a quiescent
>> state.
>> > > > It doesn't?
>> > >
>> > > I thought that's what preemptible rcu is about. You can get preempted
>> but
>> > > you shouldn't block in a read-section. Is that not true?
>>
>> > Almost.  All context switches in an RCU-preempt read-side critical section
>> > must be subject to priority boosting.  Preemption is one example, because
>> > boosting the priority of the preempted task will make it runnable.
>> > The priority-inheritance -rt "spinlock" is another example, because
>> > boosting the priority of the task holding the lock will eventually make
>> > runnable the task acquiring the lock within the RCU-preempt read-side
>> > critical section.
>>
>> Yes I understand priority boosting is needed with preemptible RCU so that
>> read-sections are making forward progress. I meant (and correct me if I'm
>> wrong) that, as long as a task doesn't sleep in a preemptible RCU
>> read-section (rcu-preempt flavor), then bad things wont happen and RCU will
>> work correctly.
>
> The exception is -rt "spinlock" acquisition.  If the "spinlock" is held,
> the task acquiring it will block, which is legal within an RCU-preempt
> read-side critical section.
>
> This exception is why I define bad things in terms of lack of
> susceptibility to priority boosting instead of sleeping.

Oh, that's a tricky situation. Thanks for letting me know. I guess my
view was too idealistic. Makes sense now.

>> > > I was asking why rcu-bh is needed in the kernel, like why can't we just
>> use
>> > > rcu-preempt. As per above link, the motivation of rcu-bh was to prevent
>> > > denial of service during heavy softirq load. I was trying to understand
>> > > that usecase. In my mind, such denial of service / out of memory is then
>> > > even possible with preemptible rcu which is used in many places in the
>> > > kernel, then why not just use rcu-bh for everything? I was just studying
>> > > this RCU flavor (and all other RCU flavors) and so this question popped
>> up.
>>
>> > Because RCU-bh is not preemptible.
>>
>> > And the non-DoS nature of RCU-bh is one challenge in my current quest to
>> > fold all three flavors (RCU-bh,  RCU-preempt, and RCU-sched) into one
>> > flavor to rule them all.  ;-)
>>
>> But what prevents DoS'ing of RCU-preempt? That means all RCU-preempt uses
>> in the kernel are susceptible to DoS'ing as well?
>
> Right now, not much.  So this is one of the problems I must solve.

Oh, ok.

>> Isn't the issue the heavy softirq processing itself which can also lead to
>> other issues such as scheduling issues (other than the OOM) so probably
>> that should be fixed instead of RCU?
>
> In theory, yes.  In practice, the way that the kernel hangs leads them
> to yell at me about RCU instead of yelling at whoever is behind the
> root cause.  So it behooves me to make RCU able to deal with whatever
> shows up, at least where reasonably feasible.  Otherwise, I am signed up
> to fix random DoS-related bugs though more that 20 million lines of code,
> which would be a nobel quest, but not one that I am currently prepared
> to sign up for.  ;-)

Yes, I understand. :-)

thanks,

- Joel


Re: rcu-bh design

2018-05-04 Thread Paul E. McKenney
On Fri, May 04, 2018 at 06:34:32PM +, Joel Fernandes wrote:
> On Fri, May 4, 2018 at 10:42 AM Paul E. McKenney
> 
> wrote:
> [...]
> > > > > But preemptible RCU *does not* use context-switch as a quiescent
> state.
> > > > It doesn't?
> > >
> > > I thought that's what preemptible rcu is about. You can get preempted
> but
> > > you shouldn't block in a read-section. Is that not true?
> 
> > Almost.  All context switches in an RCU-preempt read-side critical section
> > must be subject to priority boosting.  Preemption is one example, because
> > boosting the priority of the preempted task will make it runnable.
> > The priority-inheritance -rt "spinlock" is another example, because
> > boosting the priority of the task holding the lock will eventually make
> > runnable the task acquiring the lock within the RCU-preempt read-side
> > critical section.
> 
> Yes I understand priority boosting is needed with preemptible RCU so that
> read-sections are making forward progress. I meant (and correct me if I'm
> wrong) that, as long as a task doesn't sleep in a preemptible RCU
> read-section (rcu-preempt flavor), then bad things wont happen and RCU will
> work correctly.

The exception is -rt "spinlock" acquisition.  If the "spinlock" is held,
the task acquiring it will block, which is legal within an RCU-preempt
read-side critical section.

This exception is why I define bad things in terms of lack of
susceptibility to priority boosting instead of sleeping.

> > > > > So in that case rcu-bh would make
> > > > > sense only in a configuration where we're not using preemptible-rcu
> at
> > > all
> > > > > and are getting flooded by softirqs. Is that the reason rcu-bh
> needs to
> > > > > exist?
> > >
> > > > Maybe I'm confused by what you are asking.
> > >
> > > Sorry for any confusion. I was going through the below link for
> motivation
> > > of rcu-bh and why it was created:
> > >
> https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html#Bottom-Half%20Flavor
> > >
> > > I was asking why rcu-bh is needed in the kernel, like why can't we just
> use
> > > rcu-preempt. As per above link, the motivation of rcu-bh was to prevent
> > > denial of service during heavy softirq load. I was trying to understand
> > > that usecase. In my mind, such denial of service / out of memory is then
> > > even possible with preemptible rcu which is used in many places in the
> > > kernel, then why not just use rcu-bh for everything? I was just studying
> > > this RCU flavor (and all other RCU flavors) and so this question popped
> up.
> 
> > Because RCU-bh is not preemptible.
> 
> > And the non-DoS nature of RCU-bh is one challenge in my current quest to
> > fold all three flavors (RCU-bh,  RCU-preempt, and RCU-sched) into one
> > flavor to rule them all.  ;-)
> 
> But what prevents DoS'ing of RCU-preempt? That means all RCU-preempt uses
> in the kernel are susceptible to DoS'ing as well?

Right now, not much.  So this is one of the problems I must solve.

> Isn't the issue the heavy softirq processing itself which can also lead to
> other issues such as scheduling issues (other than the OOM) so probably
> that should be fixed instead of RCU?

In theory, yes.  In practice, the way that the kernel hangs leads them
to yell at me about RCU instead of yelling at whoever is behind the
root cause.  So it behooves me to make RCU able to deal with whatever
shows up, at least where reasonably feasible.  Otherwise, I am signed up
to fix random DoS-related bugs though more that 20 million lines of code,
which would be a nobel quest, but not one that I am currently prepared
to sign up for.  ;-)

Hence things like RCU CPU stall warnings.

But yes, there have been some modifications to softirq and I will
no doubt have to make a few more in order to make this work.

Thanx, Paul



Re: rcu-bh design

2018-05-04 Thread Joel Fernandes
On Fri, May 4, 2018 at 10:42 AM Paul E. McKenney

wrote:
[...]
> > > > But preemptible RCU *does not* use context-switch as a quiescent
state.
> > > It doesn't?
> >
> > I thought that's what preemptible rcu is about. You can get preempted
but
> > you shouldn't block in a read-section. Is that not true?

> Almost.  All context switches in an RCU-preempt read-side critical section
> must be subject to priority boosting.  Preemption is one example, because
> boosting the priority of the preempted task will make it runnable.
> The priority-inheritance -rt "spinlock" is another example, because
> boosting the priority of the task holding the lock will eventually make
> runnable the task acquiring the lock within the RCU-preempt read-side
> critical section.

Yes I understand priority boosting is needed with preemptible RCU so that
read-sections are making forward progress. I meant (and correct me if I'm
wrong) that, as long as a task doesn't sleep in a preemptible RCU
read-section (rcu-preempt flavor), then bad things wont happen and RCU will
work correctly.


> > > > So in that case rcu-bh would make
> > > > sense only in a configuration where we're not using preemptible-rcu
at
> > all
> > > > and are getting flooded by softirqs. Is that the reason rcu-bh
needs to
> > > > exist?
> >
> > > Maybe I'm confused by what you are asking.
> >
> > Sorry for any confusion. I was going through the below link for
motivation
> > of rcu-bh and why it was created:
> >
https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html#Bottom-Half%20Flavor
> >
> > I was asking why rcu-bh is needed in the kernel, like why can't we just
use
> > rcu-preempt. As per above link, the motivation of rcu-bh was to prevent
> > denial of service during heavy softirq load. I was trying to understand
> > that usecase. In my mind, such denial of service / out of memory is then
> > even possible with preemptible rcu which is used in many places in the
> > kernel, then why not just use rcu-bh for everything? I was just studying
> > this RCU flavor (and all other RCU flavors) and so this question popped
up.

> Because RCU-bh is not preemptible.

> And the non-DoS nature of RCU-bh is one challenge in my current quest to
> fold all three flavors (RCU-bh,  RCU-preempt, and RCU-sched) into one
> flavor to rule them all.  ;-)

But what prevents DoS'ing of RCU-preempt? That means all RCU-preempt uses
in the kernel are susceptible to DoS'ing as well?

Isn't the issue the heavy softirq processing itself which can also lead to
other issues such as scheduling issues (other than the OOM) so probably
that should be fixed instead of RCU?

thanks,

- Joel


Re: rcu-bh design

2018-05-04 Thread Paul E. McKenney
On Fri, May 04, 2018 at 05:15:11PM +, Joel Fernandes wrote:
> Hi Steven,
> Just for a warning/disclaimer, I am new to RCU-land and trying to make
> sense ;-) So forgive me if something sounds too outlandish.
> 
> On Fri, May 4, 2018 at 9:30 AM Steven Rostedt <rost...@goodmis.org> wrote:
> 
> > On Fri, 04 May 2018 16:20:11 +
> > Joel Fernandes <joe...@google.com> wrote:
> 
> > > Hi Paul, everyone,
> > >
> > > I had some question(s) about rcu-bh design.
> > > I am trying to understand the reasoning or need of it. I see that rcu-bh
> > > will disable softirqs across read-side sections. But I am wondering why
> > > this is needed. __do_softirq already disables softirq when a softirq
> > > handler is running. The only reason I can see is, rcu-bh helps in
> > > situations where - a softirq interrupts a preemptible RCU read-section
> and
> > > prevents that read section from completing. But this problem would
> happen
> > > if anyone where to use rcu-preempt - then does rcu-preempt even make
> sense
> > > to use and shouldn't everyone be using rcu-bh?
> 
> > I thought rcu-bh uses softirqs as a quiescent state. Thus, blocking
> > softirqs from happening makes sense. I don't think an
> > rcu_read_lock_bh() makes sense in a softirq.
> 
> Ok.
> 
> > > The other usecase for rcu-bh seems to be if context-switch is used as a
> > > quiescent state, then softirq flood can prevent that from happening and
> > > cause rcu grace periods from completing.
> 
> > > But preemptible RCU *does not* use context-switch as a quiescent state.
> > It doesn't?
> 
> I thought that's what preemptible rcu is about. You can get preempted but
> you shouldn't block in a read-section. Is that not true?

Almost.  All context switches in an RCU-preempt read-side critical section
must be subject to priority boosting.  Preemption is one example, because
boosting the priority of the preempted task will make it runnable.
The priority-inheritance -rt "spinlock" is another example, because
boosting the priority of the task holding the lock will eventually make
runnable the task acquiring the lock within the RCU-preempt read-side
critical section.

> > > So in that case rcu-bh would make
> > > sense only in a configuration where we're not using preemptible-rcu at
> all
> > > and are getting flooded by softirqs. Is that the reason rcu-bh needs to
> > > exist?
> 
> > Maybe I'm confused by what you are asking.
> 
> Sorry for any confusion. I was going through the below link for motivation
> of rcu-bh and why it was created:
> https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html#Bottom-Half%20Flavor
> 
> I was asking why rcu-bh is needed in the kernel, like why can't we just use
> rcu-preempt. As per above link, the motivation of rcu-bh was to prevent
> denial of service during heavy softirq load. I was trying to understand
> that usecase. In my mind, such denial of service / out of memory is then
> even possible with preemptible rcu which is used in many places in the
> kernel, then why not just use rcu-bh for everything? I was just studying
> this RCU flavor (and all other RCU flavors) and so this question popped up.

Because RCU-bh is not preemptible.

And the non-DoS nature of RCU-bh is one challenge in my current quest to
fold all three flavors (RCU-bh,  RCU-preempt, and RCU-sched) into one
flavor to rule them all.  ;-)

Thanx, Paul



Re: rcu-bh design

2018-05-04 Thread Steven Rostedt
On Fri, 4 May 2018 10:32:06 -0700
"Paul E. McKenney"  wrote:

> > > But preemptible RCU *does not* use context-switch as a quiescent state.  
> > 
> > It doesn't?  
> 
> It does, but only sort of.

Ah, my confusion. I was thinking of rcu_read_lock_sched() as
"preemptible RCU", not the "we can preempt in rcu_read_lock() context".

-- Steve


> 
> A context switch really is always an RCU-preempt quiescent state from
> the perspective of the CPU.  However, from the perspective of the task,
> context switch is a quiescent state only if the task is not in an
> RCU-preempt read-side critical section at the time.


Re: rcu-bh design

2018-05-04 Thread Paul E. McKenney
On Fri, May 04, 2018 at 12:30:50PM -0400, Steven Rostedt wrote:
> On Fri, 04 May 2018 16:20:11 +
> Joel Fernandes <joe...@google.com> wrote:
> 
> > Hi Paul, everyone,
> > 
> > I had some question(s) about rcu-bh design.
> > I am trying to understand the reasoning or need of it. I see that rcu-bh
> > will disable softirqs across read-side sections. But I am wondering why
> > this is needed. __do_softirq already disables softirq when a softirq
> > handler is running. The only reason I can see is, rcu-bh helps in
> > situations where - a softirq interrupts a preemptible RCU read-section and
> > prevents that read section from completing. But this problem would happen
> > if anyone where to use rcu-preempt - then does rcu-preempt even make sense
> > to use and shouldn't everyone be using rcu-bh?
> 
> I thought rcu-bh uses softirqs as a quiescent state. Thus, blocking
> softirqs from happening makes sense. I don't think an
> rcu_read_lock_bh() makes sense in a softirq.

Agreed, any place in the code where bottom halves are enabled is an RCU-bh
quiescent state.

> > The other usecase for rcu-bh seems to be if context-switch is used as a
> > quiescent state, then softirq flood can prevent that from happening and
> > cause rcu grace periods from completing. 
> 
> > But preemptible RCU *does not* use context-switch as a quiescent state.
> 
> It doesn't?

It does, but only sort of.

A context switch really is always an RCU-preempt quiescent state from
the perspective of the CPU.  However, from the perspective of the task,
context switch is a quiescent state only if the task is not in an
RCU-preempt read-side critical section at the time.

> > So in that case rcu-bh would make
> > sense only in a configuration where we're not using preemptible-rcu at all
> > and are getting flooded by softirqs. Is that the reason rcu-bh needs to
> > exist?
> 
> Maybe I'm confused by what you are asking.

The existence and use of RCU-bh in no way degrades the preemptibility of
RCU-preempt read-side critical sections.  The reason is that RCU-bh and
RCU-preempt are completely separate, represented by different rcu_state
structure instances.

Thanx, Paul



Re: rcu-bh design

2018-05-04 Thread Joel Fernandes
Hi Steven,
Just for a warning/disclaimer, I am new to RCU-land and trying to make
sense ;-) So forgive me if something sounds too outlandish.

On Fri, May 4, 2018 at 9:30 AM Steven Rostedt <rost...@goodmis.org> wrote:

> On Fri, 04 May 2018 16:20:11 +
> Joel Fernandes <joe...@google.com> wrote:

> > Hi Paul, everyone,
> >
> > I had some question(s) about rcu-bh design.
> > I am trying to understand the reasoning or need of it. I see that rcu-bh
> > will disable softirqs across read-side sections. But I am wondering why
> > this is needed. __do_softirq already disables softirq when a softirq
> > handler is running. The only reason I can see is, rcu-bh helps in
> > situations where - a softirq interrupts a preemptible RCU read-section
and
> > prevents that read section from completing. But this problem would
happen
> > if anyone where to use rcu-preempt - then does rcu-preempt even make
sense
> > to use and shouldn't everyone be using rcu-bh?

> I thought rcu-bh uses softirqs as a quiescent state. Thus, blocking
> softirqs from happening makes sense. I don't think an
> rcu_read_lock_bh() makes sense in a softirq.

Ok.

> >
> > The other usecase for rcu-bh seems to be if context-switch is used as a
> > quiescent state, then softirq flood can prevent that from happening and
> > cause rcu grace periods from completing.

> > But preemptible RCU *does not* use context-switch as a quiescent state.

> It doesn't?

I thought that's what preemptible rcu is about. You can get preempted but
you shouldn't block in a read-section. Is that not true?


> > So in that case rcu-bh would make
> > sense only in a configuration where we're not using preemptible-rcu at
all
> > and are getting flooded by softirqs. Is that the reason rcu-bh needs to
> > exist?

> Maybe I'm confused by what you are asking.

Sorry for any confusion. I was going through the below link for motivation
of rcu-bh and why it was created:
https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html#Bottom-Half%20Flavor

I was asking why rcu-bh is needed in the kernel, like why can't we just use
rcu-preempt. As per above link, the motivation of rcu-bh was to prevent
denial of service during heavy softirq load. I was trying to understand
that usecase. In my mind, such denial of service / out of memory is then
even possible with preemptible rcu which is used in many places in the
kernel, then why not just use rcu-bh for everything? I was just studying
this RCU flavor (and all other RCU flavors) and so this question popped up.

thanks,

- Joel


Re: rcu-bh design

2018-05-04 Thread Steven Rostedt
On Fri, 04 May 2018 16:20:11 +
Joel Fernandes <joe...@google.com> wrote:

> Hi Paul, everyone,
> 
> I had some question(s) about rcu-bh design.
> I am trying to understand the reasoning or need of it. I see that rcu-bh
> will disable softirqs across read-side sections. But I am wondering why
> this is needed. __do_softirq already disables softirq when a softirq
> handler is running. The only reason I can see is, rcu-bh helps in
> situations where - a softirq interrupts a preemptible RCU read-section and
> prevents that read section from completing. But this problem would happen
> if anyone where to use rcu-preempt - then does rcu-preempt even make sense
> to use and shouldn't everyone be using rcu-bh?

I thought rcu-bh uses softirqs as a quiescent state. Thus, blocking
softirqs from happening makes sense. I don't think an
rcu_read_lock_bh() makes sense in a softirq.

> 
> The other usecase for rcu-bh seems to be if context-switch is used as a
> quiescent state, then softirq flood can prevent that from happening and
> cause rcu grace periods from completing. 

> But preemptible RCU *does not* use context-switch as a quiescent state.

It doesn't?

> So in that case rcu-bh would make
> sense only in a configuration where we're not using preemptible-rcu at all
> and are getting flooded by softirqs. Is that the reason rcu-bh needs to
> exist?

Maybe I'm confused by what you are asking.

-- Steve


rcu-bh design

2018-05-04 Thread Joel Fernandes
Hi Paul, everyone,

I had some question(s) about rcu-bh design.
I am trying to understand the reasoning or need of it. I see that rcu-bh
will disable softirqs across read-side sections. But I am wondering why
this is needed. __do_softirq already disables softirq when a softirq
handler is running. The only reason I can see is, rcu-bh helps in
situations where - a softirq interrupts a preemptible RCU read-section and
prevents that read section from completing. But this problem would happen
if anyone where to use rcu-preempt - then does rcu-preempt even make sense
to use and shouldn't everyone be using rcu-bh?

The other usecase for rcu-bh seems to be if context-switch is used as a
quiescent state, then softirq flood can prevent that from happening and
cause rcu grace periods from completing. But preemptible RCU *does not* use
context-switch as a quiescent state. So in that case rcu-bh would make
sense only in a configuration where we're not using preemptible-rcu at all
and are getting flooded by softirqs. Is that the reason rcu-bh needs to
exist?

thanks!

- Joel