Re: rcu-bh design
On Fri, May 4, 2018 at 4:42 PM Paul E. McKenney wrote: [..] > > Say that the tracepoint code is buggy and for some reason a > > call_srcu wasn't done. For example, say the hypothetical bug I'm > > talking about is in tracepoint_remove_func which called the > > rcu_assign_pointer, but didn't call release_probes: > > > > diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c > > index d0639d917899..f54cb358f451 100644 > > --- a/kernel/tracepoint.c > > +++ b/kernel/tracepoint.c > > @@ -216,7 +216,6 @@ static int tracepoint_add_func(struct tracepoint *tp, > > rcu_assign_pointer(tp->funcs, tp_funcs); > > if (!static_key_enabled(&tp->key)) > > static_key_slow_inc(&tp->key); > > - release_probes(old); > > return 0; > > } > > --- > > I want to catch this from the test code. If I did the following, > > it would be insufficient then: > > > > trace_probe_register(...); > > srcu_barrier(&my_srcu_struct); // only tells me any queued calls > > // are done but not that something > > // was queued > > > > I was seeing if I could use my_srcu_struct::completed for that. But > > I'm not sure if that'll work (or that it's legal to access it > > directly - but hey this is just test code! :P). > This is a bit ugly to do programmatically in the kernel. The callbacks > are on per-CPU queues that cannot be safely accessed except by that CPU. > Plus they are on singly linked lists that can be quite long, so it could > be insanely slow as well. > But given that this is test code, why not leverage event tracing? > There are some relevant events in include/trace/events/rcu.h, starting > with the rcu_callback trace event. Sounds like a great idea. I'll try that, thanks. - Joel
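The idea Paul suggests above - count enqueues (the job the rcu_callback trace event would do) instead of poking at per-CPU callback queues - can be sketched as a toy userspace model. All names here are invented for illustration; this is not kernel code. The point it demonstrates: a barrier only proves that whatever was queued has completed, while a counter bumped at enqueue time proves that something was queued at all, which is exactly what catches the hypothetical missing-call_srcu bug.

```c
#include <assert.h>

#define MAXQ 16

static void (*queue[MAXQ])(void);
static int npending;   /* callbacks currently queued */
static int nqueued;    /* total enqueues ever: what the trace event would count */
static int freed;      /* set when the cleanup callback actually ran */

/* Stand-in for call_srcu(): the "trace event" (nqueued) fires at enqueue time. */
static void fake_call_srcu(void (*cb)(void))
{
	queue[npending++] = cb;
	nqueued++;
}

/* Stand-in for srcu_barrier(): wait for (here: simply run) everything pending. */
static void fake_srcu_barrier(void)
{
	while (npending > 0)
		queue[--npending]();
}

static void release_probes_cb(void) { freed++; }

/* The hypothetical bug: unregister forgets to queue the cleanup callback. */
static void buggy_remove(void) { /* missing fake_call_srcu() */ }

/* The correct path queues the cleanup callback. */
static void good_remove(void) { fake_call_srcu(release_probes_cb); }
```

After buggy_remove(), the barrier returns successfully even though nothing was ever queued; only the enqueue count exposes the bug.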
Re: rcu-bh design
On Fri, May 04, 2018 at 04:20:41PM -0700, Joel Fernandes wrote: > > > On 05/04/2018 03:49 PM, Paul E. McKenney wrote: > >>Yes just one more ;-). I am trying to write a 'probetorture' test inspired > >>by RCU torture that whacks the tracepoints in various scenarios. One of the > >>things I want to do is verify the RCU callbacks are queued and secondly, > >>they are executed. Just to verify that the garbage collect was done and > >>we're not leaking the function probe table (not that I don't have > >>confidence in the chained callback technique which you mentioned, but it > >>would be nice to assure this mechanism is working for tracepoints). > >> > >>Is there a way to detect this given a reference to srcu_struct? Mathieu and > >>we were chatting about srcu_barrier which is cool but that just tells me > >>that if there was a callback queued, it would have executed after the > >>readers were done. But not whether something was queued. > > > >Suppose that you are queuing an RCU callback that in turn queues an SRCU > >callback on my_srcu_struct, like this: > > > > void my_rcu_callback(struct rcu_head *rhp) > > { > > p = container_of(rhp, struct my_struct, my_rcu_head); > > > > free_it_up_or_down(p); > > } > > > > void my_srcu_callback(struct rcu_head *rhp) > > { > > call_rcu(rhp, my_rcu_callback); > > } > > > > call_srcu(&my_srcu_struct, &p->my_rcu_head, my_srcu_callback); > > > >Then to make sure that any previously submitted callback has been fully > >processed, you do this: > > > > srcu_barrier(&my_srcu_struct); > > rcu_barrier(); > > > >Of course if you queue in the opposite order, like this: > > > > void my_srcu_callback(struct rcu_head *rhp) > > { > > p = container_of(rhp, struct my_struct, my_rcu_head); > > > > free_it_up_or_down(p); > > } > > > > void my_rcu_callback(struct rcu_head *rhp) > > { > > call_srcu(&my_srcu_struct, &p->my_rcu_head, my_srcu_callback); > > } > > > > call_rcu(rhp, my_rcu_callback); > > > >Then you must wait in the opposite order: > > 
rcu_barrier(); > > srcu_barrier(&my_srcu_struct); > > > >Either way, the trick is that the first *_barrier() call cannot return > >until after all previous callbacks have executed, which means that by that > >time the callback is enqueued for the other flavor of {S,}RCU. So the > >second *_barrier() call must wait for the callback to be completely done, > >through both flavors of {S,}RCU. > > > >So after executing the pair of *_barrier() calls, you know that the > >callback is no longer queued. > > > >Does that cover it, or am I missing a turn in here somewhere? > > Yes, that covers some of it. Thanks a lot. Btw, I was also thinking > I want to check that srcu received a callback queuing request as > well (not just completion but also queuing). > > Say that the tracepoint code is buggy and for some reason a > call_srcu wasn't done. For example, say the hypothetical bug I'm > taking about is in tracepoint_remove_func which called the > rcu_assign_pointer, but didn't call release_probes: > > diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c > index d0639d917899..f54cb358f451 100644 > --- a/kernel/tracepoint.c > +++ b/kernel/tracepoint.c > @@ -216,7 +216,6 @@ static int tracepoint_add_func(struct tracepoint *tp, > rcu_assign_pointer(tp->funcs, tp_funcs); > if (!static_key_enabled(&tp->key)) > static_key_slow_inc(&tp->key); > - release_probes(old); > return 0; > } > --- > I want to catching this from the test code. If I did the following, > it would be insufficient then: > > trace_probe_register(...); > srcu_barrier(&my_srcu_struct); // only tells me any queued calls > // are done but not that something > // was queued > > I was seeing if I could use my_srcu_struct::completed for that. but > I'm not sure if that'll work (or that its legal to access it > directly - but hey this is just test code! :P). This is a bit ugly to do programmatically in the kernel. The callbacks are on per-CPU queues that cannot be safely accessed except by that CPU. 
Plus they are on singly linked lists that can be quite long, so it could be insanely slow as well. But given that this is test code, why not leverage event tracing? There are some relevant events in include/trace/events/rcu.h, starting with the rcu_callback trace event. Thanx, Paul
Re: rcu-bh design
On 05/04/2018 03:49 PM, Paul E. McKenney wrote: Yes just one more ;-). I am trying to write a 'probetorture' test inspired by RCU torture that whacks the tracepoints in various scenarios. One of the things I want to do is verify the RCU callbacks are queued and secondly, they are executed. Just to verify that the garbage collect was done and we're not leaking the function probe table (not that I don't have confidence in the chained callback technique which you mentioned, but it would be nice to assure this mechanism is working for tracepoints). Is there a way to detect this given a reference to srcu_struct? Mathieu and we were chatting about srcu_barrier which is cool but that just tells me that if there was a callback queued, it would have executed after the readers were done. But not whether something was queued. Suppose that you are queuing an RCU callback that in turn queues an SRCU callback on my_srcu_struct, like this: void my_rcu_callback(struct rcu_head *rhp) { p = container_of(rhp, struct my_struct, my_rcu_head); free_it_up_or_down(p); } void my_srcu_callback(struct rcu_head *rhp) { call_rcu(rhp, my_rcu_callback); } call_srcu(&my_srcu_struct, &p->my_rcu_head, my_srcu_callback); Then to make sure that any previously submitted callback has been fully processed, you do this: srcu_barrier(&my_srcu_struct); rcu_barrier(); Of course if you queue in the opposite order, like this: void my_srcu_callback(struct rcu_head *rhp) { p = container_of(rhp, struct my_struct, my_rcu_head); free_it_up_or_down(p); } void my_rcu_callback(struct rcu_head *rhp) { call_srcu(&my_srcu_struct, &p->my_rcu_head, my_srcu_callback); } call_rcu(rhp, my_rcu_callback); Then you must wait in the opposite order: rcu_barrier(); srcu_barrier(&my_srcu_struct); Either way, the trick is that the first *_barrier() call cannot return until after all previous callbacks have executed, which means that by that time the callback is enqueued for the other flavor of {S,}RCU. 
So the second *_barrier() call must wait for the callback to be completely done, through both flavors of {S,}RCU. So after executing the pair of *_barrier() calls, you know that the callback is no longer queued. Does that cover it, or am I missing a turn in here somewhere? Yes, that covers some of it. Thanks a lot. Btw, I was also thinking I want to check that srcu received a callback queuing request as well (not just completion but also queuing). Say that the tracepoint code is buggy and for some reason a call_srcu wasn't done. For example, say the hypothetical bug I'm talking about is in tracepoint_remove_func which called the rcu_assign_pointer, but didn't call release_probes: diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c index d0639d917899..f54cb358f451 100644 --- a/kernel/tracepoint.c +++ b/kernel/tracepoint.c @@ -216,7 +216,6 @@ static int tracepoint_add_func(struct tracepoint *tp, rcu_assign_pointer(tp->funcs, tp_funcs); if (!static_key_enabled(&tp->key)) static_key_slow_inc(&tp->key); - release_probes(old); return 0; } --- I want to catch this from the test code. If I did the following, it would be insufficient then: trace_probe_register(...); srcu_barrier(&my_srcu_struct); // only tells me any queued calls // are done but not that something // was queued I was seeing if I could use my_srcu_struct::completed for that. But I'm not sure if that'll work (or that it's legal to access it directly - but hey this is just test code! :P). thanks, - Joel
Re: rcu-bh design
On Fri, May 04, 2018 at 08:33:19PM +, Joel Fernandes wrote: > On Fri, May 4, 2018 at 1:10 PM Paul E. McKenney > wrote: > [...] > > > >> > Almost. All context switches in an RCU-preempt read-side critical > section > > > >> > must be subject to priority boosting. Preemption is one example, > because > > > >> > boosting the priority of the preempted task will make it runnable. > > > >> > The priority-inheritance -rt "spinlock" is another example, because > > > >> > boosting the priority of the task holding the lock will eventually > make > > > >> > runnable the task acquiring the lock within the RCU-preempt > read-side > > > >> > critical section. > > > >> > > > >> Yes I understand priority boosting is needed with preemptible RCU so > that > > > >> read-sections are making forward progress. I meant (and correct me > if I'm > > > >> wrong) that, as long as a task doesn't sleep in a preemptible RCU > > > >> read-section (rcu-preempt flavor), then bad things wont happen and > RCU will > > > >> work correctly. > > > > > > > > The exception is -rt "spinlock" acquisition. If the "spinlock" is > held, > > > > the task acquiring it will block, which is legal within an RCU-preempt > > > > read-side critical section. > > > > > > > > This exception is why I define bad things in terms of lack of > > > > susceptibility to priority boosting instead of sleeping. > > > > > > Oh, that's a tricky situation. Thanks for letting me know. I guess my > > > view was too idealistic. Makes sense now. > > > Well, let me put it this way... > > > Here is your nice elegant little algorithm: > > https://upload.wikimedia.org/wikipedia/commons/6/6e/Golde33443.jpg > > > Here is your nice elegant little algorithm equipped to survive within > > the Linux kernel: > > https://totatema.files.wordpress.com/2012/06/feeling_grizzly-1600x1200.jpg > > A picture speaks a 1000 words! :-D And I suspect that a real face-to-face encounter with that guy is worth 1000 pictures. ;-) > > Any questions? 
;-) > > Yes just one more ;-). I am trying to write a 'probetorture' test inspired > by RCU torture that whacks the tracepoints in various scenarios. One of the > things I want to do is verify the RCU callbacks are queued and secondly, > they are executed. Just to verify that the garbage collect was done and > we're not leaking the function probe table (not that I don't have > confidence in the chained callback technique which you mentioned, but it > would be nice to assure this mechanism is working for tracepoints). > > Is there a way to detect this given a reference to srcu_struct? Mathieu and > we were chatting about srcu_barrier which is cool but that just tells me > that if there was a callback queued, it would have executed after the > readers were done. But not whether something was queued. Suppose that you are queuing an RCU callback that in turn queues an SRCU callback on my_srcu_struct, like this: void my_rcu_callback(struct rcu_head *rhp) { p = container_of(rhp, struct my_struct, my_rcu_head); free_it_up_or_down(p); } void my_srcu_callback(struct rcu_head *rhp) { call_rcu(rhp, my_rcu_callback); } call_srcu(&my_srcu_struct, &p->my_rcu_head, my_srcu_callback); Then to make sure that any previously submitted callback has been fully processed, you do this: srcu_barrier(&my_srcu_struct); rcu_barrier(); Of course if you queue in the opposite order, like this: void my_srcu_callback(struct rcu_head *rhp) { p = container_of(rhp, struct my_struct, my_rcu_head); free_it_up_or_down(p); } void my_rcu_callback(struct rcu_head *rhp) { call_srcu(&my_srcu_struct, &p->my_rcu_head, my_srcu_callback); } call_rcu(rhp, my_rcu_callback); Then you must wait in the opposite order: rcu_barrier(); srcu_barrier(&my_srcu_struct); Either way, the trick is that the first *_barrier() call cannot return until after all previous callbacks have executed, which means that by that time the callback is enqueued for the other flavor of {S,}RCU. 
So the second *_barrier() call must wait for the callback to be completely done, through both flavors of {S,}RCU. So after executing the pair of *_barrier() calls, you know that the callback is no longer queued. Does that cover it, or am I missing a turn in here somewhere? Thanx, Paul > thanks, > > - Joel > PS: I remember Paul, you mentioned you are testing this chained callback > case in rcutorture, so if the answer is "go read rcutorture", that's > totally Ok I could just do that ;-) >
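The ordering trick Paul describes above can be made concrete with a small userspace model (names invented for illustration; real RCU callbacks live on per-CPU lists and real barriers block rather than run callbacks). Model each flavor as a FIFO whose barrier simply drains it, and chain a callback from the "SRCU" stage into the "RCU" stage. Draining in chain order completes the whole chain; draining the second stage first finds nothing, because the chained callback has not been enqueued yet.

```c
#include <assert.h>

#define MAXQ 16

struct flavor { void (*cb[MAXQ])(void); int n; };

static struct flavor srcu_q, rcu_q;   /* the two callback stages */
static int freed;

static void enqueue(struct flavor *f, void (*cb)(void))
{
	f->cb[f->n++] = cb;
}

/* A *_barrier() in this model: run every callback queued so far. */
static void barrier_on(struct flavor *f)
{
	while (f->n > 0)
		f->cb[--f->n]();
}

static void my_rcu_callback(void)  { freed = 1; }                       /* final free */
static void my_srcu_callback(void) { enqueue(&rcu_q, my_rcu_callback); } /* chains */
```

This is why, when call_srcu() queues first, srcu_barrier() must run before rcu_barrier(): the first barrier is what causes the second-stage callback to be enqueued at all.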
Re: rcu-bh design
On Fri, May 4, 2018 at 1:10 PM Paul E. McKenney wrote: [...] > > >> > Almost. All context switches in an RCU-preempt read-side critical section > > >> > must be subject to priority boosting. Preemption is one example, because > > >> > boosting the priority of the preempted task will make it runnable. > > >> > The priority-inheritance -rt "spinlock" is another example, because > > >> > boosting the priority of the task holding the lock will eventually make > > >> > runnable the task acquiring the lock within the RCU-preempt read-side > > >> > critical section. > > >> > > >> Yes I understand priority boosting is needed with preemptible RCU so that > > >> read-sections are making forward progress. I meant (and correct me if I'm > > >> wrong) that, as long as a task doesn't sleep in a preemptible RCU > > >> read-section (rcu-preempt flavor), then bad things wont happen and RCU will > > >> work correctly. > > > > > > The exception is -rt "spinlock" acquisition. If the "spinlock" is held, > > > the task acquiring it will block, which is legal within an RCU-preempt > > > read-side critical section. > > > > > > This exception is why I define bad things in terms of lack of > > > susceptibility to priority boosting instead of sleeping. > > > > Oh, that's a tricky situation. Thanks for letting me know. I guess my > > view was too idealistic. Makes sense now. > Well, let me put it this way... > Here is your nice elegant little algorithm: > https://upload.wikimedia.org/wikipedia/commons/6/6e/Golde33443.jpg > Here is your nice elegant little algorithm equipped to survive within > the Linux kernel: > https://totatema.files.wordpress.com/2012/06/feeling_grizzly-1600x1200.jpg A picture speaks a 1000 words! :-D > Any questions? ;-) Yes just one more ;-). I am trying to write a 'probetorture' test inspired by RCU torture that whacks the tracepoints in various scenarios. One of the things I want to do is verify the RCU callbacks are queued and secondly, they are executed. 
Just to verify that the garbage collect was done and we're not leaking the function probe table (not that I don't have confidence in the chained callback technique which you mentioned, but it would be nice to assure this mechanism is working for tracepoints). Is there a way to detect this given a reference to srcu_struct? Mathieu and we were chatting about srcu_barrier which is cool but that just tells me that if there was a callback queued, it would have executed after the readers were done. But not whether something was queued. thanks, - Joel PS: I remember Paul, you mentioned you are testing this chained callback case in rcutorture, so if the answer is "go read rcutorture", that's totally Ok I could just do that ;-)
Re: rcu-bh design
On Fri, May 04, 2018 at 12:57:19PM -0700, Joel Fernandes wrote: > On Fri, May 4, 2018 at 11:49 AM, Paul E. McKenney > wrote: > > On Fri, May 04, 2018 at 06:34:32PM +, Joel Fernandes wrote: > >> On Fri, May 4, 2018 at 10:42 AM Paul E. McKenney > >> > >> wrote: > >> [...] > >> > > > > But preemptible RCU *does not* use context-switch as a quiescent > >> state. > >> > > > It doesn't? > >> > > > >> > > I thought that's what preemptible rcu is about. You can get preempted > >> but > >> > > you shouldn't block in a read-section. Is that not true? > >> > >> > Almost. All context switches in an RCU-preempt read-side critical > >> > section > >> > must be subject to priority boosting. Preemption is one example, because > >> > boosting the priority of the preempted task will make it runnable. > >> > The priority-inheritance -rt "spinlock" is another example, because > >> > boosting the priority of the task holding the lock will eventually make > >> > runnable the task acquiring the lock within the RCU-preempt read-side > >> > critical section. > >> > >> Yes I understand priority boosting is needed with preemptible RCU so that > >> read-sections are making forward progress. I meant (and correct me if I'm > >> wrong) that, as long as a task doesn't sleep in a preemptible RCU > >> read-section (rcu-preempt flavor), then bad things wont happen and RCU will > >> work correctly. > > > > The exception is -rt "spinlock" acquisition. If the "spinlock" is held, > > the task acquiring it will block, which is legal within an RCU-preempt > > read-side critical section. > > > > This exception is why I define bad things in terms of lack of > > susceptibility to priority boosting instead of sleeping. > > Oh, that's a tricky situation. Thanks for letting me know. I guess my > view was too idealistic. Makes sense now. Well, let me put it this way... 
Here is your nice elegant little algorithm: https://upload.wikimedia.org/wikipedia/commons/6/6e/Golde33443.jpg Here is your nice elegant little algorithm equipped to survive within the Linux kernel: https://totatema.files.wordpress.com/2012/06/feeling_grizzly-1600x1200.jpg Any questions? ;-) Thanx, Paul
Re: rcu-bh design
On Fri, May 4, 2018 at 11:49 AM, Paul E. McKenney wrote: > On Fri, May 04, 2018 at 06:34:32PM +, Joel Fernandes wrote: >> On Fri, May 4, 2018 at 10:42 AM Paul E. McKenney >> >> wrote: >> [...] >> > > > > But preemptible RCU *does not* use context-switch as a quiescent >> state. >> > > > It doesn't? >> > > >> > > I thought that's what preemptible rcu is about. You can get preempted >> but >> > > you shouldn't block in a read-section. Is that not true? >> >> > Almost. All context switches in an RCU-preempt read-side critical section >> > must be subject to priority boosting. Preemption is one example, because >> > boosting the priority of the preempted task will make it runnable. >> > The priority-inheritance -rt "spinlock" is another example, because >> > boosting the priority of the task holding the lock will eventually make >> > runnable the task acquiring the lock within the RCU-preempt read-side >> > critical section. >> >> Yes I understand priority boosting is needed with preemptible RCU so that >> read-sections are making forward progress. I meant (and correct me if I'm >> wrong) that, as long as a task doesn't sleep in a preemptible RCU >> read-section (rcu-preempt flavor), then bad things wont happen and RCU will >> work correctly. > > The exception is -rt "spinlock" acquisition. If the "spinlock" is held, > the task acquiring it will block, which is legal within an RCU-preempt > read-side critical section. > > This exception is why I define bad things in terms of lack of > susceptibility to priority boosting instead of sleeping. Oh, that's a tricky situation. Thanks for letting me know. I guess my view was too idealistic. Makes sense now. >> > > I was asking why rcu-bh is needed in the kernel, like why can't we just >> use >> > > rcu-preempt. As per above link, the motivation of rcu-bh was to prevent >> > > denial of service during heavy softirq load. I was trying to understand >> > > that usecase. 
In my mind, such denial of service / out of memory is then >> > > even possible with preemptible rcu which is used in many places in the >> > > kernel, then why not just use rcu-bh for everything? I was just studying >> > > this RCU flavor (and all other RCU flavors) and so this question popped >> up. >> >> > Because RCU-bh is not preemptible. >> >> > And the non-DoS nature of RCU-bh is one challenge in my current quest to >> > fold all three flavors (RCU-bh, RCU-preempt, and RCU-sched) into one >> > flavor to rule them all. ;-) >> >> But what prevents DoS'ing of RCU-preempt? That means all RCU-preempt uses >> in the kernel are susceptible to DoS'ing as well? > > Right now, not much. So this is one of the problems I must solve. Oh, ok. >> Isn't the issue the heavy softirq processing itself which can also lead to >> other issues such as scheduling issues (other than the OOM) so probably >> that should be fixed instead of RCU? > > In theory, yes. In practice, the way that the kernel hangs leads them > to yell at me about RCU instead of yelling at whoever is behind the > root cause. So it behooves me to make RCU able to deal with whatever > shows up, at least where reasonably feasible. Otherwise, I am signed up > to fix random DoS-related bugs though more that 20 million lines of code, > which would be a nobel quest, but not one that I am currently prepared > to sign up for. ;-) Yes, I understand. :-) thanks, - Joel
Re: rcu-bh design
On Fri, May 04, 2018 at 06:34:32PM +, Joel Fernandes wrote: > On Fri, May 4, 2018 at 10:42 AM Paul E. McKenney > > wrote: > [...] > > > > > But preemptible RCU *does not* use context-switch as a quiescent > state. > > > > It doesn't? > > > > > > I thought that's what preemptible rcu is about. You can get preempted > but > > > you shouldn't block in a read-section. Is that not true? > > > Almost. All context switches in an RCU-preempt read-side critical section > > must be subject to priority boosting. Preemption is one example, because > > boosting the priority of the preempted task will make it runnable. > > The priority-inheritance -rt "spinlock" is another example, because > > boosting the priority of the task holding the lock will eventually make > > runnable the task acquiring the lock within the RCU-preempt read-side > > critical section. > > Yes I understand priority boosting is needed with preemptible RCU so that > read-sections are making forward progress. I meant (and correct me if I'm > wrong) that, as long as a task doesn't sleep in a preemptible RCU > read-section (rcu-preempt flavor), then bad things wont happen and RCU will > work correctly. The exception is -rt "spinlock" acquisition. If the "spinlock" is held, the task acquiring it will block, which is legal within an RCU-preempt read-side critical section. This exception is why I define bad things in terms of lack of susceptibility to priority boosting instead of sleeping. > > > > > So in that case rcu-bh would make > > > > > sense only in a configuration where we're not using preemptible-rcu > at > > > all > > > > > and are getting flooded by softirqs. Is that the reason rcu-bh > needs to > > > > > exist? > > > > > > > Maybe I'm confused by what you are asking. > > > > > > Sorry for any confusion. 
I was going through the below link for > motivation > > > of rcu-bh and why it was created: > > > > https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html#Bottom-Half%20Flavor > > > > > > I was asking why rcu-bh is needed in the kernel, like why can't we just > use > > > rcu-preempt. As per above link, the motivation of rcu-bh was to prevent > > > denial of service during heavy softirq load. I was trying to understand > > > that usecase. In my mind, such denial of service / out of memory is then > > > even possible with preemptible rcu which is used in many places in the > > > kernel, then why not just use rcu-bh for everything? I was just studying > > > this RCU flavor (and all other RCU flavors) and so this question popped > up. > > > Because RCU-bh is not preemptible. > > > And the non-DoS nature of RCU-bh is one challenge in my current quest to > > fold all three flavors (RCU-bh, RCU-preempt, and RCU-sched) into one > > flavor to rule them all. ;-) > > But what prevents DoS'ing of RCU-preempt? That means all RCU-preempt uses > in the kernel are susceptible to DoS'ing as well? Right now, not much. So this is one of the problems I must solve. > Isn't the issue the heavy softirq processing itself which can also lead to > other issues such as scheduling issues (other than the OOM) so probably > that should be fixed instead of RCU? In theory, yes. In practice, the way that the kernel hangs leads them to yell at me about RCU instead of yelling at whoever is behind the root cause. So it behooves me to make RCU able to deal with whatever shows up, at least where reasonably feasible. Otherwise, I am signed up to fix random DoS-related bugs through more than 20 million lines of code, which would be a noble quest, but not one that I am currently prepared to sign up for. ;-) Hence things like RCU CPU stall warnings. But yes, there have been some modifications to softirq and I will no doubt have to make a few more in order to make this work. 
Thanx, Paul
Re: rcu-bh design
On Fri, May 4, 2018 at 10:42 AM Paul E. McKenney wrote: [...] > > > > But preemptible RCU *does not* use context-switch as a quiescent state. > > > It doesn't? > > > > I thought that's what preemptible rcu is about. You can get preempted but > > you shouldn't block in a read-section. Is that not true? > Almost. All context switches in an RCU-preempt read-side critical section > must be subject to priority boosting. Preemption is one example, because > boosting the priority of the preempted task will make it runnable. > The priority-inheritance -rt "spinlock" is another example, because > boosting the priority of the task holding the lock will eventually make > runnable the task acquiring the lock within the RCU-preempt read-side > critical section. Yes I understand priority boosting is needed with preemptible RCU so that read-sections are making forward progress. I meant (and correct me if I'm wrong) that, as long as a task doesn't sleep in a preemptible RCU read-section (rcu-preempt flavor), then bad things won't happen and RCU will work correctly. > > > > So in that case rcu-bh would make > > > > sense only in a configuration where we're not using preemptible-rcu at > > all > > > > and are getting flooded by softirqs. Is that the reason rcu-bh needs to > > > > exist? > > > > > Maybe I'm confused by what you are asking. > > > > Sorry for any confusion. I was going through the below link for motivation > > of rcu-bh and why it was created: > > https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html#Bottom-Half%20Flavor > > > > I was asking why rcu-bh is needed in the kernel, like why can't we just use > > rcu-preempt. As per above link, the motivation of rcu-bh was to prevent > > denial of service during heavy softirq load. I was trying to understand > > that usecase. 
In my mind, such denial of service / out of memory is then > > even possible with preemptible rcu which is used in many places in the > > kernel, then why not just use rcu-bh for everything? I was just studying > > this RCU flavor (and all other RCU flavors) and so this question popped up. > Because RCU-bh is not preemptible. > And the non-DoS nature of RCU-bh is one challenge in my current quest to > fold all three flavors (RCU-bh, RCU-preempt, and RCU-sched) into one > flavor to rule them all. ;-) But what prevents DoS'ing of RCU-preempt? That means all RCU-preempt uses in the kernel are susceptible to DoS'ing as well? Isn't the issue the heavy softirq processing itself which can also lead to other issues such as scheduling issues (other than the OOM) so probably that should be fixed instead of RCU? thanks, - Joel
Re: rcu-bh design
On Fri, May 04, 2018 at 05:15:11PM +, Joel Fernandes wrote: > Hi Steven, > Just for a warning/disclaimer, I am new to RCU-land and trying to make > sense ;-) So forgive me if something sounds too outlandish. > > On Fri, May 4, 2018 at 9:30 AM Steven Rostedt wrote: > > > On Fri, 04 May 2018 16:20:11 + > > Joel Fernandes wrote: > > > > Hi Paul, everyone, > > > > > > I had some question(s) about rcu-bh design. > > > I am trying to understand the reasoning or need of it. I see that rcu-bh > > > will disable softirqs across read-side sections. But I am wondering why > > > this is needed. __do_softirq already disables softirq when a softirq > > > handler is running. The only reason I can see is, rcu-bh helps in > > > situations where - a softirq interrupts a preemptible RCU read-section > and > > > prevents that read section from completing. But this problem would > happen > > > if anyone where to use rcu-preempt - then does rcu-preempt even make > sense > > > to use and shouldn't everyone be using rcu-bh? > > > I thought rcu-bh uses softirqs as a quiescent state. Thus, blocking > > softirqs from happening makes sense. I don't think an > > rcu_read_lock_bh() makes sense in a softirq. > > Ok. > > > > The other usecase for rcu-bh seems to be if context-switch is used as a > > > quiescent state, then softirq flood can prevent that from happening and > > > cause rcu grace periods from completing. > > > > But preemptible RCU *does not* use context-switch as a quiescent state. > > It doesn't? > > I thought that's what preemptible rcu is about. You can get preempted but > you shouldn't block in a read-section. Is that not true? Almost. All context switches in an RCU-preempt read-side critical section must be subject to priority boosting. Preemption is one example, because boosting the priority of the preempted task will make it runnable. 
The priority-inheritance -rt "spinlock" is another example, because boosting the priority of the task holding the lock will eventually make runnable the task acquiring the lock within the RCU-preempt read-side critical section. > > > So in that case rcu-bh would make > > > sense only in a configuration where we're not using preemptible-rcu at > all > > > and are getting flooded by softirqs. Is that the reason rcu-bh needs to > > > exist? > > > Maybe I'm confused by what you are asking. > > Sorry for any confusion. I was going through the below link for motivation > of rcu-bh and why it was created: > https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html#Bottom-Half%20Flavor > > I was asking why rcu-bh is needed in the kernel, like why can't we just use > rcu-preempt. As per above link, the motivation of rcu-bh was to prevent > denial of service during heavy softirq load. I was trying to understand > that usecase. In my mind, such denial of service / out of memory is then > even possible with preemptible rcu which is used in many places in the > kernel, then why not just use rcu-bh for everything? I was just studying > this RCU flavor (and all other RCU flavors) and so this question popped up. Because RCU-bh is not preemptible. And the non-DoS nature of RCU-bh is one challenge in my current quest to fold all three flavors (RCU-bh, RCU-preempt, and RCU-sched) into one flavor to rule them all. ;-) Thanx, Paul
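The RCU-bh rule that comes up in this exchange - read sides run with bottom halves disabled, and any point where bottom halves are enabled counts as a quiescent state - can be sketched as a minimal single-CPU model. This is a deliberate simplification with invented names; the real grace-period machinery is far more involved.

```c
#include <assert.h>

static int bh_disable_count;   /* models the CPU's softirq-disable nesting */
static int qs_seen;

static void fake_local_bh_disable(void) { bh_disable_count++; }
static void fake_local_bh_enable(void)  { bh_disable_count--; }

/* In this model, the bh read-side primitives reduce to disabling and
 * re-enabling bottom halves, matching the discussion above. */
static void fake_rcu_read_lock_bh(void)   { fake_local_bh_disable(); }
static void fake_rcu_read_unlock_bh(void) { fake_local_bh_enable(); }

/* Grace-period machinery samples the CPU: it is quiescent for RCU-bh
 * exactly when bottom halves are enabled. */
static void sample_cpu(void)
{
	if (bh_disable_count == 0)
		qs_seen = 1;
}
```

This also shows why rcu_read_lock_bh() inside a softirq handler adds nothing: bottom halves are already disabled there, so no quiescent state can be sampled either way.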
Re: rcu-bh design
On Fri, 4 May 2018 10:32:06 -0700
"Paul E. McKenney" wrote:

> > > But preemptible RCU *does not* use context-switch as a quiescent state.
> >
> > It doesn't?
>
> It does, but only sort of.

Ah, my confusion. I was thinking of rcu_read_lock_sched() as "preemptible
RCU", not the "we can preempt in rcu_read_lock() context".

-- Steve

> A context switch really is always an RCU-preempt quiescent state from
> the perspective of the CPU. However, from the perspective of the task,
> context switch is a quiescent state only if the task is not in an
> RCU-preempt read-side critical section at the time.
Re: rcu-bh design
On Fri, May 04, 2018 at 12:30:50PM -0400, Steven Rostedt wrote:
> On Fri, 04 May 2018 16:20:11 +0000
> Joel Fernandes wrote:
>
> > Hi Paul, everyone,
> >
> > I had some question(s) about rcu-bh design.
> > I am trying to understand the reasoning or need of it. I see that rcu-bh
> > will disable softirqs across read-side sections. But I am wondering why
> > this is needed. __do_softirq already disables softirq when a softirq
> > handler is running. The only reason I can see is, rcu-bh helps in
> > situations where a softirq interrupts a preemptible RCU read-section and
> > prevents that read section from completing. But this problem would happen
> > if anyone were to use rcu-preempt - then does rcu-preempt even make sense
> > to use and shouldn't everyone be using rcu-bh?
>
> I thought rcu-bh uses softirqs as a quiescent state. Thus, blocking
> softirqs from happening makes sense. I don't think an
> rcu_read_lock_bh() makes sense in a softirq.

Agreed, any place in the code where bottom halves are enabled is an
RCU-bh quiescent state.

> > The other usecase for rcu-bh seems to be if context-switch is used as a
> > quiescent state, then softirq flood can prevent that from happening and
> > keep rcu grace periods from completing.
>
> > But preemptible RCU *does not* use context-switch as a quiescent state.
>
> It doesn't?

It does, but only sort of. A context switch really is always an
RCU-preempt quiescent state from the perspective of the CPU. However,
from the perspective of the task, context switch is a quiescent state
only if the task is not in an RCU-preempt read-side critical section
at the time.

> > So in that case rcu-bh would make
> > sense only in a configuration where we're not using preemptible-rcu at all
> > and are getting flooded by softirqs. Is that the reason rcu-bh needs to
> > exist?
>
> Maybe I'm confused by what you are asking.

The existence and use of RCU-bh in no way degrades the preemptibility
of RCU-preempt read-side critical sections. The reason is that RCU-bh
and RCU-preempt are completely separate, represented by different
rcu_state structure instances.

							Thanx, Paul
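The quiescent-state idea above can be sketched in userspace. The following is a toy, QSBR-style (quiescent-state-based reclamation) illustration, not the kernel's implementation: `gp_ctr`, `reader_qs`, `toy_synchronize_rcu`, and `run_demo` are invented names, the reader announces its own quiescent states (where a real kernel uses context switches or bottom-half-enabled points), and the updater spins where the kernel would sleep and run callbacks.

```c
/* Toy QSBR sketch of a grace period: hypothetical names, illustration only. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Global grace-period counter; the updater bumps it to start a new GP. */
static atomic_long gp_ctr = 1;

/* The reader copies gp_ctr here each time it passes through a quiescent
 * state, i.e. a point where it holds no references to protected data. */
static atomic_long reader_qs = 0;

static atomic_bool stop_reader = false;

static void *reader_thread(void *unused)
{
    (void)unused;
    while (!atomic_load(&stop_reader)) {
        /* A read-side critical section would run here (no blocking!). */

        /* Quiescent state: announce that any prior critical section
         * on this thread has ended. */
        atomic_store(&reader_qs, atomic_load(&gp_ctr));
    }
    return NULL;
}

/* Toy synchronize_rcu(): start a new grace period and wait until the
 * reader has passed through a quiescent state that began after it. */
static void toy_synchronize_rcu(void)
{
    long new_gp = atomic_fetch_add(&gp_ctr, 1) + 1;
    while (atomic_load(&reader_qs) < new_gp)
        ; /* the kernel sleeps and uses callbacks instead of spinning */
}

/* Run one reader against one grace period; returns the grace-period
 * count observed afterward (2 means one full grace period elapsed). */
long run_demo(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, reader_thread, NULL);
    toy_synchronize_rcu();          /* all pre-existing readers are done */
    long gps = atomic_load(&gp_ctr);
    atomic_store(&stop_reader, true);
    pthread_join(tid, NULL);
    return gps;
}
```

In this framing, the different RCU flavors differ mainly in which events count as the quiescent state that readers report, which is consistent with Paul's point that RCU-bh and RCU-preempt track their grace periods in separate state and so do not interfere with each other's read-side semantics.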
Re: rcu-bh design
Hi Steven,

Just for a warning/disclaimer, I am new to RCU-land and trying to make
sense ;-) So forgive me if something sounds too outlandish.

On Fri, May 4, 2018 at 9:30 AM Steven Rostedt wrote:
> On Fri, 04 May 2018 16:20:11 +0000
> Joel Fernandes wrote:
>
> > Hi Paul, everyone,
> >
> > I had some question(s) about rcu-bh design.
> > I am trying to understand the reasoning or need of it. I see that rcu-bh
> > will disable softirqs across read-side sections. But I am wondering why
> > this is needed. __do_softirq already disables softirq when a softirq
> > handler is running. The only reason I can see is, rcu-bh helps in
> > situations where a softirq interrupts a preemptible RCU read-section and
> > prevents that read section from completing. But this problem would happen
> > if anyone were to use rcu-preempt - then does rcu-preempt even make sense
> > to use and shouldn't everyone be using rcu-bh?
>
> I thought rcu-bh uses softirqs as a quiescent state. Thus, blocking
> softirqs from happening makes sense. I don't think an
> rcu_read_lock_bh() makes sense in a softirq.

Ok.

> > The other usecase for rcu-bh seems to be if context-switch is used as a
> > quiescent state, then softirq flood can prevent that from happening and
> > keep rcu grace periods from completing.
>
> But preemptible RCU *does not* use context-switch as a quiescent state.

It doesn't?

I thought that's what preemptible rcu is about. You can get preempted but
you shouldn't block in a read-section. Is that not true?

> > So in that case rcu-bh would make
> > sense only in a configuration where we're not using preemptible-rcu at all
> > and are getting flooded by softirqs. Is that the reason rcu-bh needs to
> > exist?
>
> Maybe I'm confused by what you are asking.

Sorry for any confusion. I was going through the below link for motivation
of rcu-bh and why it was created:
https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html#Bottom-Half%20Flavor

I was asking why rcu-bh is needed in the kernel, like why can't we just use
rcu-preempt. As per the above link, the motivation of rcu-bh was to prevent
denial of service during heavy softirq load. I was trying to understand
that usecase. In my mind, such denial of service / out of memory is then
even possible with preemptible rcu, which is used in many places in the
kernel, so why not just use rcu-bh for everything? I was just studying
this RCU flavor (and all other RCU flavors) and so this question popped up.

thanks,

- Joel
Re: rcu-bh design
On Fri, 04 May 2018 16:20:11 +0000
Joel Fernandes wrote:

> Hi Paul, everyone,
>
> I had some question(s) about rcu-bh design.
> I am trying to understand the reasoning or need of it. I see that rcu-bh
> will disable softirqs across read-side sections. But I am wondering why
> this is needed. __do_softirq already disables softirq when a softirq
> handler is running. The only reason I can see is, rcu-bh helps in
> situations where a softirq interrupts a preemptible RCU read-section and
> prevents that read section from completing. But this problem would happen
> if anyone were to use rcu-preempt - then does rcu-preempt even make sense
> to use and shouldn't everyone be using rcu-bh?

I thought rcu-bh uses softirqs as a quiescent state. Thus, blocking
softirqs from happening makes sense. I don't think an
rcu_read_lock_bh() makes sense in a softirq.

> The other usecase for rcu-bh seems to be if context-switch is used as a
> quiescent state, then softirq flood can prevent that from happening and
> keep rcu grace periods from completing.

But preemptible RCU *does not* use context-switch as a quiescent state.

> So in that case rcu-bh would make
> sense only in a configuration where we're not using preemptible-rcu at all
> and are getting flooded by softirqs. Is that the reason rcu-bh needs to
> exist?

Maybe I'm confused by what you are asking.

-- Steve