Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Tue, Jun 24, 2014 at 01:43:16PM -0700, Dave Hansen wrote:
> On 06/23/2014 05:39 PM, Paul E. McKenney wrote:
> > On Mon, Jun 23, 2014 at 05:20:30PM -0700, Dave Hansen wrote:
> >> On 06/23/2014 05:15 PM, Paul E. McKenney wrote:
> >>> Just out of curiosity, how many CPUs does your system have? 80?
> >>> If 160, looks like something bad is happening at 80.
> >>
> >> 80 cores, 160 threads. >80 processes/threads is where we start using
> >> the second thread on the cores. The tasks are also pinned to
> >> hyperthread pairs, so they disturb each other, and the scheduler moves
> >> them between threads on occasion which causes extra noise.
> >
> > OK, that could explain the near flattening of throughput near 80
> > processes. Is 3.16.0-rc1-pf2 with the two RCU patches? If so, is the
> > new sysfs parameter at its default value?
>
> Here's 3.16-rc1 with e552592e applied and jiffies_till_sched_qs=12 vs. 3.15:
>
> > https://www.sr71.net/~dave/intel/bb.html?2=3.16.0-rc1-paultry2-jtsq12&1=3.15
>
> 3.16-rc1 is actually in the lead up until the end when we're filling up
> the hyperthreads. The same pattern holds when comparing
> 3.16-rc1+e552592e to 3.16-rc1 with ac1bea8 reverted:
>
> > https://www.sr71.net/~dave/intel/bb.html?2=3.16.0-rc1-paultry2-jtsq12&1=3.16.0-rc1-wrevert
>
> So, the current situation is generally _better_ than 3.15, except during
> the noisy ranges of the test where hyperthreading and the scheduler are
> coming in to play.

Good to know that my intuition is not yet completely broken. ;-)

> I made the mistake of doing all my spot-checks at
> the 160-thread number, which honestly wasn't the best point to be
> looking at.

That would do it! ;-)

> At this point, I'm satisfied with how e552592e is dealing with the
> original regression. Thanks for all the prompt attention on this one, Paul.

Glad it worked out, I have sent a pull request to Ingo to hopefully get
this into 3.16.

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On 06/23/2014 05:39 PM, Paul E. McKenney wrote:
> On Mon, Jun 23, 2014 at 05:20:30PM -0700, Dave Hansen wrote:
>> On 06/23/2014 05:15 PM, Paul E. McKenney wrote:
>>> Just out of curiosity, how many CPUs does your system have? 80?
>>> If 160, looks like something bad is happening at 80.
>>
>> 80 cores, 160 threads. >80 processes/threads is where we start using
>> the second thread on the cores. The tasks are also pinned to
>> hyperthread pairs, so they disturb each other, and the scheduler moves
>> them between threads on occasion which causes extra noise.
>
> OK, that could explain the near flattening of throughput near 80
> processes. Is 3.16.0-rc1-pf2 with the two RCU patches? If so, is the
> new sysfs parameter at its default value?

Here's 3.16-rc1 with e552592e applied and jiffies_till_sched_qs=12 vs. 3.15:

> https://www.sr71.net/~dave/intel/bb.html?2=3.16.0-rc1-paultry2-jtsq12&1=3.15

3.16-rc1 is actually in the lead up until the end when we're filling up
the hyperthreads. The same pattern holds when comparing
3.16-rc1+e552592e to 3.16-rc1 with ac1bea8 reverted:

> https://www.sr71.net/~dave/intel/bb.html?2=3.16.0-rc1-paultry2-jtsq12&1=3.16.0-rc1-wrevert

So, the current situation is generally _better_ than 3.15, except during
the noisy ranges of the test where hyperthreading and the scheduler are
coming in to play. I made the mistake of doing all my spot-checks at
the 160-thread number, which honestly wasn't the best point to be
looking at.

At this point, I'm satisfied with how e552592e is dealing with the
original regression. Thanks for all the prompt attention on this one, Paul.
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On 06/23/2014 05:39 PM, Paul E. McKenney wrote:
> On Mon, Jun 23, 2014 at 05:20:30PM -0700, Dave Hansen wrote:
>> On 06/23/2014 05:15 PM, Paul E. McKenney wrote:
>>> Just out of curiosity, how many CPUs does your system have? 80?
>>> If 160, looks like something bad is happening at 80.
>>
>> 80 cores, 160 threads. >80 processes/threads is where we start using
>> the second thread on the cores. The tasks are also pinned to
>> hyperthread pairs, so they disturb each other, and the scheduler moves
>> them between threads on occasion which causes extra noise.
>
> OK, that could explain the near flattening of throughput near 80
> processes. Is 3.16.0-rc1-pf2 with the two RCU patches?

It's actually with _just_ e552592e03 applied on top of 3.16-rc1.

> If so, is the new sysfs parameter at its default value?

I didn't record that, and I've forgotten. I'll re-run it to verify what
it was.
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Mon, Jun 23, 2014 at 05:20:30PM -0700, Dave Hansen wrote:
> On 06/23/2014 05:15 PM, Paul E. McKenney wrote:
> > Just out of curiosity, how many CPUs does your system have? 80?
> > If 160, looks like something bad is happening at 80.
>
> 80 cores, 160 threads. >80 processes/threads is where we start using
> the second thread on the cores. The tasks are also pinned to
> hyperthread pairs, so they disturb each other, and the scheduler moves
> them between threads on occasion which causes extra noise.

OK, that could explain the near flattening of throughput near 80
processes. Is 3.16.0-rc1-pf2 with the two RCU patches? If so, is the
new sysfs parameter at its default value?

Thanx, Paul
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On 06/23/2014 05:15 PM, Paul E. McKenney wrote:
> Just out of curiosity, how many CPUs does your system have? 80?
> If 160, looks like something bad is happening at 80.

80 cores, 160 threads. >80 processes/threads is where we start using
the second thread on the cores. The tasks are also pinned to
hyperthread pairs, so they disturb each other, and the scheduler moves
them between threads on occasion which causes extra noise.
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Mon, Jun 23, 2014 at 04:30:12PM -0700, Dave Hansen wrote:
> On 06/23/2014 11:09 AM, Paul E. McKenney wrote:
> > So let's see... The open1 benchmark sits in a loop doing open()
> > and close(), and probably spends most of its time in the kernel.
> > It doesn't do much context switching. I am guessing that you don't
> > have CONFIG_NO_HZ_FULL=y, or the boot/sysfs parameter would not have
> > much effect because then the first quiescent-state-forcing attempt would
> > likely finish the grace period.
> >
> > So, given that short grace periods help other workloads (I have the
> > scars to prove it), and given that the patch fixes some real problems,
>
> I'm not arguing that short grace periods _can_ help some workloads, or
> that one is better than the other. The patch in question changes
> existing behavior by shortening grace periods. This change of existing
> behavior removes some of the benefits that my system gets out of RCU. I
> suspect this affects a lot more systems, but my core count makes it
> easier to see.

And adds some benefits for other systems. Your tight loop on open() and
close() will be sensitive to some things, and tight loops on other syscalls
will be sensitive to others.

> Perhaps I'm misunderstanding the original patch's intent, but it seemed
> to me to be working around an overactive debug message. While often a
> _useful_ debug message, it was firing falsely in the case being
> addressed in the patch.

You are indeed misunderstanding the original patch's intent. It was
preventing OOMs. The "overactive debug message" is just a warning that
OOMs are possible.

> > and given that the large number for rcutree.jiffies_till_sched_qs got
> > us within 3%, shouldn't we consider this issue closed?
>
> With the default value for the tunable, the regression is still solidly
> over 10%. I think we can have a reasonable argument about it once the
> default delta is down to the small single digits.

Look, you are to be congratulated for identifying a micro-benchmark that
exposes such small changes in timing, but I am not at all interested in
that micro-benchmark becoming the kernel's straightjacket. If you have
real workloads for which this micro-benchmark is a good predictor of
performance, we can talk about quite a few additional steps to take to
tune for those workloads.

> One more thing I just realized: this isn't a scalability problem, at
> least with rcutree.jiffies_till_sched_qs=12. There's a pretty
> consistent delta in throughput throughout the entire range of threads
> from 1->160. See the "processes" column in the data files:
>
> plain 3.15:
> > https://www.sr71.net/~dave/intel/willitscale/systems/bigbox/3.15/open1.csv
> e552592e0383bc:
> > https://www.sr71.net/~dave/intel/willitscale/systems/bigbox/3.16.0-rc1-pf2/open1.csv
>
> or visually:
>
> > https://www.sr71.net/~dave/intel/array-join.html?1=willitscale/systems/bigbox/3.15&2=willitscale/systems/bigbox/3.16.0-rc1-pf2=linear,threads_idle,processes_idle

Just out of curiosity, how many CPUs does your system have? 80?
If 160, looks like something bad is happening at 80.

Thanx, Paul
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On 06/23/2014 11:09 AM, Paul E. McKenney wrote:
> So let's see... The open1 benchmark sits in a loop doing open()
> and close(), and probably spends most of its time in the kernel.
> It doesn't do much context switching. I am guessing that you don't
> have CONFIG_NO_HZ_FULL=y, or the boot/sysfs parameter would not have
> much effect because then the first quiescent-state-forcing attempt would
> likely finish the grace period.
>
> So, given that short grace periods help other workloads (I have the
> scars to prove it), and given that the patch fixes some real problems,

I'm not arguing that short grace periods _can_ help some workloads, or
that one is better than the other. The patch in question changes
existing behavior by shortening grace periods. This change of existing
behavior removes some of the benefits that my system gets out of RCU. I
suspect this affects a lot more systems, but my core count makes it
easier to see.

Perhaps I'm misunderstanding the original patch's intent, but it seemed
to me to be working around an overactive debug message. While often a
_useful_ debug message, it was firing falsely in the case being
addressed in the patch.

> and given that the large number for rcutree.jiffies_till_sched_qs got
> us within 3%, shouldn't we consider this issue closed?

With the default value for the tunable, the regression is still solidly
over 10%. I think we can have a reasonable argument about it once the
default delta is down to the small single digits.

One more thing I just realized: this isn't a scalability problem, at
least with rcutree.jiffies_till_sched_qs=12. There's a pretty
consistent delta in throughput throughout the entire range of threads
from 1->160. See the "processes" column in the data files:

plain 3.15:
> https://www.sr71.net/~dave/intel/willitscale/systems/bigbox/3.15/open1.csv
e552592e0383bc:
> https://www.sr71.net/~dave/intel/willitscale/systems/bigbox/3.16.0-rc1-pf2/open1.csv

or visually:

> https://www.sr71.net/~dave/intel/array-join.html?1=willitscale/systems/bigbox/3.15&2=willitscale/systems/bigbox/3.16.0-rc1-pf2=linear,threads_idle,processes_idle
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Mon, Jun 23, 2014 at 10:17:19AM -0700, Dave Hansen wrote:
> On 06/23/2014 09:55 AM, Dave Hansen wrote:
> > This still has a regression. Commit 1ed70de (from Paul's git tree),
> > gets a result of 52231880. If I back up two commits to v3.16-rc1 and
> > revert ac1bea85 (the original culprit) the result goes back up to 57308512.
> >
> > So something is still going on here.
> >
> > I'll go back and compare the grace period ages to see if I can tell what
> > is going on.
>
> RCU_TRACE interferes with the benchmark a little bit, and it lowers the
> delta that the regression causes. So, evaluate this cautiously.

RCU_TRACE does increase overhead somewhat, so I would expect somewhat
less difference with it enabled. Though I am a bit surprised that the
overhead of its counters is measurable. Or is something going on?

> According to rcu_sched/rcugp, the average "age" is:
>
> v3.16-rc1, with ac1bea85 reverted: 10.7
> v3.16-rc1, plus e552592e:           6.1
>
> Paul, have you been keeping an eye on rcugp? Even if I run my system
> with only 10 threads, I still see this basic pattern where the average
> "age" is lower when I see lower performance. It seems to be a
> reasonable proxy that could be used instead of waiting on me to re-run
> tests.

I do print out GPs/sec when running rcutorture, and they do vary somewhat,
but mostly with different Kconfig parameter settings. Plus rcutorture
ramps up and down, so the GPs/sec is less than what you might see in a
system running an unvarying workload. That said, increasing grace-period
latency is not always good for performance, in fact, I usually get beaten
up for grace periods completing too quickly rather than too slowly.
This current issue is one of the rare exceptions, perhaps even the
only exception.

So let's see... The open1 benchmark sits in a loop doing open()
and close(), and probably spends most of its time in the kernel.
It doesn't do much context switching. I am guessing that you don't
have CONFIG_NO_HZ_FULL=y, or the boot/sysfs parameter would not have
much effect because then the first quiescent-state-forcing attempt would
likely finish the grace period.

So, given that short grace periods help other workloads (I have the
scars to prove it), and given that the patch fixes some real problems,
and given that the large number for rcutree.jiffies_till_sched_qs got
us within 3%, shouldn't we consider this issue closed?

Thanx, Paul
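The workload described in this message, a tight loop that does nothing but open() and close(), can be sketched in a few lines. This is only an illustration of that access pattern and not the will-it-scale open1 testcase itself; the function name `open_close_loop` and the use of a temporary file are assumptions made for the example.

```python
import os
import tempfile
import time


def open_close_loop(path: str, duration_s: float) -> int:
    """Open and close `path` repeatedly for roughly duration_s seconds.

    Returns the number of completed open/close pairs. Nearly all the work
    happens inside the two system calls, which is the property of the
    benchmark that the thread discusses.
    """
    deadline = time.monotonic() + duration_s
    iterations = 0
    while time.monotonic() < deadline:
        fd = os.open(path, os.O_RDWR)  # one syscall into the kernel
        os.close(fd)                   # and one more to release the fd
        iterations += 1
    return iterations


if __name__ == "__main__":
    # Hypothetical driver: one process, one private file, one second.
    with tempfile.NamedTemporaryFile() as tmp:
        print("open/close pairs:", open_close_loop(tmp.name, 1.0))
```

A loop like this spends little time in user space and rarely blocks, which fits the description above of a workload that "doesn't do much context switching".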
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Mon, Jun 23, 2014 at 10:19:05AM -0700, Andi Kleen wrote:
> > In 3.10, RCU had 14,046 lines of code, not counting documentation and
> > test scripting. In 3.15, RCU had 13,208 lines of code, again not counting
> > documentation and test scripting. That is a decrease of almost 1KLoC,
> > so your wish is granted.
>
> Ok that's good progress.

Glad you like it!

> > CONFIG_RCU_NOCB_CPU, CONFIG_RCU_NOCB_CPU_NONE, CONFIG_RCU_NOCB_CPU_ZERO,
> > and CONFIG_RCU_NOCB_CPU_ALL. It also might be reasonable to replace
> > uses of CONFIG_PROVE_RCU with CONFIG_PROVE_LOCKING, thus allowing
> > CONFIG_PROVE_RCU to be eliminated. CONFIG_PROVE_RCU_DELAY hasn't proven
> > very good at finding bugs, so I am considering eliminating it as well.
> > Given recent and planned changes related to RCU's stall-warning stack
> > dumping, I hope to eliminate both CONFIG_RCU_CPU_STALL_VERBOSE and
> > CONFIG_RCU_CPU_STALL_INFO, making them both happen unconditionally.
> > (And yes, I should probably make CONFIG_RCU_CPU_STALL_INFO be the default
> > for some time beforehand.) I have also been considering getting rid of
> > CONFIG_RCU_FANOUT_EXACT, given that it appears that no one uses it.
>
> Yes please to all.
>
> Sounds good thanks.

Very good!

Please note that this will take some time. For example, getting rid
of CONFIG_RCU_CPU_STALL_TIMEOUT resulted in a series of bugs over a
period of well over a year. It turned out that very few people were
exercising it while it was non-default. Hopefully, Fengguang Wu's
RANDCONFIG testing is testing these things more these days.

Also, some of them might have non-obvious effects on performance, witness
the cond_resched() fun and excitement.

Thanx, Paul
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On 06/23/2014 10:16 AM, Paul E. McKenney wrote:
> On Mon, Jun 23, 2014 at 09:55:21AM -0700, Dave Hansen wrote:
>> This still has a regression. Commit 1ed70de (from Paul's git tree),
>> gets a result of 52231880. If I back up two commits to v3.16-rc1 and
>> revert ac1bea85 (the original culprit) the result goes back up to 57308512.
>>
>> So something is still going on here.
>
> And commit 1ed70de is in fact the right one, so...
>
> The rcutree.jiffies_till_sched_qs boot/sysfs parameter controls how
> long RCU waits before asking for quiescent states. The default is
> currently HZ/20. Does increasing this parameter help? Easy for me to
> increase the default if it does.

Making it an insane value:

echo 12 > /sys/module/rcutree/parameters/jiffies_till_sched_qs
average:52248706

echo 999 > /sys/module/rcutree/parameters/jiffies_till_sched_qs
average:55712533

gets us back up _closer_ to our original 57M number, but it's still not
quite there.
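To put the averages quoted in this message into relative terms, the short sketch below computes how far each result sits below the 57308512 baseline (v3.16-rc1 with ac1bea85 reverted). The numbers are copied from the thread; the labels and the helper name `regression_pct` are illustrative assumptions, and the figures cover only the 160-process runs reported here.

```python
# Averages reported in the thread for ./open1_processes -t 160.
BASELINE = 57_308_512  # v3.16-rc1 with ac1bea85 reverted

RESULTS = {
    "commit 1ed70de": 52_231_880,
    "jiffies_till_sched_qs=12": 52_248_706,
    "jiffies_till_sched_qs=999": 55_712_533,
}


def regression_pct(baseline: int, measured: int) -> float:
    """Throughput shortfall relative to baseline, in percent."""
    return 100.0 * (baseline - measured) / baseline


if __name__ == "__main__":
    for label, average in RESULTS.items():
        pct = regression_pct(BASELINE, average)
        print(f"{label:28s} {pct:5.2f}% below baseline")
```

On these figures the value 999 leaves a shortfall a little under 3%, while the other two runs are close to 9% below the baseline.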
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
> In 3.10, RCU had 14,046 lines of code, not counting documentation and
> test scripting. In 3.15, RCU had 13,208 lines of code, again not counting
> documentation and test scripting. That is a decrease of almost 1KLoC,
> so your wish is granted.

Ok that's good progress.

> CONFIG_RCU_NOCB_CPU, CONFIG_RCU_NOCB_CPU_NONE, CONFIG_RCU_NOCB_CPU_ZERO,
> and CONFIG_RCU_NOCB_CPU_ALL. It also might be reasonable to replace
> uses of CONFIG_PROVE_RCU with CONFIG_PROVE_LOCKING, thus allowing
> CONFIG_PROVE_RCU to be eliminated. CONFIG_PROVE_RCU_DELAY hasn't proven
> very good at finding bugs, so I am considering eliminating it as well.
> Given recent and planned changes related to RCU's stall-warning stack
> dumping, I hope to eliminate both CONFIG_RCU_CPU_STALL_VERBOSE and
> CONFIG_RCU_CPU_STALL_INFO, making them both happen unconditionally.
> (And yes, I should probably make CONFIG_RCU_CPU_STALL_INFO be the default
> for some time beforehand.) I have also been considering getting rid of
> CONFIG_RCU_FANOUT_EXACT, given that it appears that no one uses it.

Yes please to all.

Sounds good thanks.

-Andi
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On 06/23/2014 09:55 AM, Dave Hansen wrote:
> This still has a regression. Commit 1ed70de (from Paul's git tree),
> gets a result of 52231880. If I back up two commits to v3.16-rc1 and
> revert ac1bea85 (the original culprit) the result goes back up to 57308512.
>
> So something is still going on here.
>
> I'll go back and compare the grace period ages to see if I can tell what
> is going on.

RCU_TRACE interferes with the benchmark a little bit, and it lowers the
delta that the regression causes. So, evaluate this cautiously.

According to rcu_sched/rcugp, the average "age" is:

v3.16-rc1, with ac1bea85 reverted: 10.7
v3.16-rc1, plus e552592e:           6.1

Paul, have you been keeping an eye on rcugp? Even if I run my system
with only 10 threads, I still see this basic pattern where the average
"age" is lower when I see lower performance. It seems to be a
reasonable proxy that could be used instead of waiting on me to re-run
tests.
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Mon, Jun 23, 2014 at 09:55:21AM -0700, Dave Hansen wrote:
> This still has a regression. Commit 1ed70de (from Paul's git tree),
> gets a result of 52231880. If I back up two commits to v3.16-rc1 and
> revert ac1bea85 (the original culprit) the result goes back up to 57308512.
>
> So something is still going on here.

And commit 1ed70de is in fact the right one, so...

The rcutree.jiffies_till_sched_qs boot/sysfs parameter controls how
long RCU waits before asking for quiescent states. The default is
currently HZ/20. Does increasing this parameter help? Easy for me to
increase the default if it does.

> I'll go back and compare the grace period ages to see if I can tell what
> is going on.

That would be very helpful, thank you!

Thanx, Paul

> --
>
> root@bigbox:~/will-it-scale# ./open1_processes -t 160 -s30
> testcase:Separate file open/close
> warmup
> min:319787 max:386071 total:57464989
> min:307235 max:351905 total:53289241
> min:291765 max:342439 total:51364514
> min:297948 max:349214 total:52552745
> min:294950 max:340132 total:51586179
> min:290791 max:339958 total:50793238
> measurement
> min:298851 max:346868 total:51951469
> min:292879 max:340704 total:50817269
> min:305768 max:347381 total:52655149
> min:301046 max:345616 total:52449584
> min:300428 max:345293 total:52021166
> min:293404 max:337973 total:51012206
> min:303569 max:348191 total:52713179
> min:305523 max:357448 total:53707053
> min:307040 max:356937 total:53271883
> min:302134 max:347923 total:52477496
> min:297823 max:340488 total:51884417
> min:286981 max:338246 total:50496850
> min:295920 max:349405 total:51792563
> min:302749 max:343780 total:52305074
> min:298497 max:345208 total:52035318
> min:291393 max:332195 total:50163093
> min:303561 max:353396 total:52983515
> min:301613 max:352988 total:53029200
> min:300693 max:343726 total:52057334
> min:296801 max:352408 total:52028824
> min:304834 max:358236 total:53526191
> min:297933 max:338351 total:51578481
> min:299571 max:341679 total:51817941
> min:308225 max:354075 total:53760098
> min:296262 max:346965 total:51856596
> min:309196 max:356432 total:53455141
> min:295604 max:341814 total:51449366
> min:296931 max:345961 total:52051944
> min:300533 max:350304 total:52652951
> min:299887 max:350764 total:52955064
> average:52231880
> root@bigbox:~/will-it-scale# uname -a
> Linux bigbox 3.16.0-rc1-2-g1ed70de #176 SMP Mon Jun 23 09:04:02 PDT
> 2014 x86_64 x86_64 x86_64 GNU/Linux
>
>
> root@bigbox:~/will-it-scale# ./open1_processes -t 160 -s 30
> testcase:Separate file open/close
> warmup
> min:346853 max:416035 total:62412724
> min:281766 max:344178 total:52207349
> min:311187 max:374918 total:57149451
> min:326309 max:391061 total:60200366
> min:310327 max:375619 total:56744357
> min:323336 max:393415 total:59619164
> measurement
> min:323934 max:393718 total:59665843
> min:307247 max:368313 total:55681436
> min:318210 max:378048 total:57849321
> min:314494 max:383884 total:57741073
> min:316497 max:385223 total:58565045
> min:320490 max:397636 total:59003133
> min:318695 max:391712 total:57789360
> min:304368 max:378540 total:56412216
> min:314609 max:384462 total:58298008
> min:317235 max:384205 total:58812490
> min:323556 max:388014 total:59468492
> min:301011 max:362664 total:55381779
> min:301113 max:364712 total:55375445
> min:311730 max:369336 total:56640530
> min:316951 max:381341 total:58649244
> min:317077 max:383943 total:58132878
> min:316970 max:390127 total:59039489
> min:315895 max:375937 total:57404755
> min:295500 max:346523 total:53086962
> min:310882 max:371923 total:56612144
> min:321837 max:390544 total:59651640
> min:303481 max:368716 total:56135908
> min:306437 max:367658 total:56388659
> min:307343 max:373645 total:56893136
> min:298703 max:358090 total:54152268
> min:319162 max:386583 total:58999429
> min:304881 max:361968 total:55286607
> min:311034 max:381100 total:57846182
> min:312786 max:378270 total:57964383
> min:311740 max:367481 total:56327526
> average:57308512
> root@bigbox:~/will-it-scale# uname -a
> Linux bigbox 3.16.0-rc1-dirty #177 SMP Mon Jun 23 09:13:59 PDT 2014
> x86_64 x86_64 x86_64 GNU/Linux
>
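The message above gives the default for rcutree.jiffies_till_sched_qs as HZ/20. As a rough sketch, the snippet below shows what that works out to in jiffies and milliseconds for a few common tick rates. It assumes C-style integer division, and the function names are invented for the example; the thread does not state the HZ setting of the test machine.

```python
def default_jiffies_till_sched_qs(hz: int) -> int:
    """HZ/20 as quoted in the thread, assuming truncating integer division."""
    return hz // 20


def jiffies_to_ms(jiffies: int, hz: int) -> float:
    """Convert a jiffies count to milliseconds for a given tick rate."""
    return 1000.0 * jiffies / hz


if __name__ == "__main__":
    # HZ values commonly selectable in kernel configs; illustrative only.
    for hz in (100, 250, 1000):
        j = default_jiffies_till_sched_qs(hz)
        print(f"HZ={hz:4d}: {j:2d} jiffies = {jiffies_to_ms(j, hz):.0f} ms")
```

Under that assumption a HZ=250 kernel would default to 12 jiffies, which is the same value that appears in the jiffies_till_sched_qs=12 runs elsewhere in this thread, though that match is an inference rather than something the thread confirms.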
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
This still has a regression.  Commit 1ed70de (from Paul's git tree), gets a result of 52231880.  If I back up two commits to v3.16-rc1 and revert ac1bea85 (the original culprit) the result goes back up to 57308512.

So something is still going on here.

I'll go back and compare the grace period ages to see if I can tell what is going on.

--

root@bigbox:~/will-it-scale# ./open1_processes -t 160 -s30
testcase:Separate file open/close
warmup
min:319787 max:386071 total:57464989
min:307235 max:351905 total:53289241
min:291765 max:342439 total:51364514
min:297948 max:349214 total:52552745
min:294950 max:340132 total:51586179
min:290791 max:339958 total:50793238
measurement
min:298851 max:346868 total:51951469
min:292879 max:340704 total:50817269
min:305768 max:347381 total:52655149
min:301046 max:345616 total:52449584
min:300428 max:345293 total:52021166
min:293404 max:337973 total:51012206
min:303569 max:348191 total:52713179
min:305523 max:357448 total:53707053
min:307040 max:356937 total:53271883
min:302134 max:347923 total:52477496
min:297823 max:340488 total:51884417
min:286981 max:338246 total:50496850
min:295920 max:349405 total:51792563
min:302749 max:343780 total:52305074
min:298497 max:345208 total:52035318
min:291393 max:332195 total:50163093
min:303561 max:353396 total:52983515
min:301613 max:352988 total:53029200
min:300693 max:343726 total:52057334
min:296801 max:352408 total:52028824
min:304834 max:358236 total:53526191
min:297933 max:338351 total:51578481
min:299571 max:341679 total:51817941
min:308225 max:354075 total:53760098
min:296262 max:346965 total:51856596
min:309196 max:356432 total:53455141
min:295604 max:341814 total:51449366
min:296931 max:345961 total:52051944
min:300533 max:350304 total:52652951
min:299887 max:350764 total:52955064
average:52231880
root@bigbox:~/will-it-scale# uname -a
Linux bigbox 3.16.0-rc1-2-g1ed70de #176 SMP Mon Jun 23 09:04:02 PDT 2014 x86_64 x86_64 x86_64 GNU/Linux

root@bigbox:~/will-it-scale# ./open1_processes -t 160 -s 30
testcase:Separate file open/close
warmup
min:346853 max:416035 total:62412724
min:281766 max:344178 total:52207349
min:311187 max:374918 total:57149451
min:326309 max:391061 total:60200366
min:310327 max:375619 total:56744357
min:323336 max:393415 total:59619164
measurement
min:323934 max:393718 total:59665843
min:307247 max:368313 total:55681436
min:318210 max:378048 total:57849321
min:314494 max:383884 total:57741073
min:316497 max:385223 total:58565045
min:320490 max:397636 total:59003133
min:318695 max:391712 total:57789360
min:304368 max:378540 total:56412216
min:314609 max:384462 total:58298008
min:317235 max:384205 total:58812490
min:323556 max:388014 total:59468492
min:301011 max:362664 total:55381779
min:301113 max:364712 total:55375445
min:311730 max:369336 total:56640530
min:316951 max:381341 total:58649244
min:317077 max:383943 total:58132878
min:316970 max:390127 total:59039489
min:315895 max:375937 total:57404755
min:295500 max:346523 total:53086962
min:310882 max:371923 total:56612144
min:321837 max:390544 total:59651640
min:303481 max:368716 total:56135908
min:306437 max:367658 total:56388659
min:307343 max:373645 total:56893136
min:298703 max:358090 total:54152268
min:319162 max:386583 total:58999429
min:304881 max:361968 total:55286607
min:311034 max:381100 total:57846182
min:312786 max:378270 total:57964383
min:311740 max:367481 total:56327526
average:57308512
root@bigbox:~/will-it-scale# uname -a
Linux bigbox 3.16.0-rc1-dirty #177 SMP Mon Jun 23 09:13:59 PDT 2014 x86_64 x86_64 x86_64 GNU/Linux
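For readers comparing the two runs, the size of the regression follows directly from the two reported averages. A minimal Python sketch of that arithmetic (the variable and function names are illustrative only and are not part of will-it-scale):

```python
# Averages reported by ./open1_processes -t 160 in the two runs.
avg_1ed70de = 52231880    # 3.16.0-rc1-2-g1ed70de
avg_reverted = 57308512   # 3.16.0-rc1 with ac1bea85 reverted


def relative_drop(baseline, candidate):
    """Fractional throughput loss of candidate relative to baseline."""
    return (baseline - candidate) / baseline


drop = relative_drop(avg_reverted, avg_1ed70de)
print(f"{drop:.1%}")
```

By this measure, commit 1ed70de comes in just under 9% below the revert-only kernel at 160 processes.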
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Mon, Jun 23, 2014 at 08:51:08AM -0500, Christoph Lameter wrote:
> On Mon, 23 Jun 2014, Peter Zijlstra wrote:
>
> > On the topic of these threads; I recently noticed RCU grew a metric ton
> > of them, I found some 75 rcu kthreads on my box, wth up with that?
>
> Would kworker threads work for rcu? That would also avoid the shifting
> around of RCU threads for NOHZ configurations (which seems to have to be
> done manually right now). The kworker subsystem work that allows
> restriction to non NOHZ hardware threads would then also allow the
> shifting of the rcu threads which would simplify the whole endeavor.

Short term, I am planning to use a different method to automate the
binding of the rcuo kthreads to housekeeping CPUs, but longer term,
it might well make a lot of sense to move to workqueues and the
kworker threads.

Thanx, Paul
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Mon, Jun 23, 2014 at 08:49:31AM -0700, Andi Kleen wrote:
> > On the topic of these threads; I recently noticed RCU grew a metric ton
> > of them, I found some 75 rcu kthreads on my box, wth up with that?
>
> It seems like RCU is growing in complexity all the time.
>
> Can it be put on a diet in general?

In 3.10, RCU had 14,046 lines of code, not counting documentation and
test scripting.  In 3.15, RCU had 13,208 lines of code, again not counting
documentation and test scripting.  That is a decrease of almost 1KLoC,
so your wish is granted.

In the future, I hope to be able to make NOCB the default and remove the
softirq-based callback handling, which should shrink things a bit further.
Of course, continued work to make NOCB handle various corner cases will
offset that expected shrinkage, though hopefully not by too much.

Of course, I cannot resist taking your call for RCU simplicity as a vote
against Peter's proposal for aligning the rcu_node tree to the hardware's
electrical structure.  ;-)

> No more new CONFIGs please either.

Since 3.10, I have gotten rid of CONFIG_RCU_CPU_STALL_TIMEOUT.

Over time, it might be possible to make CONFIG_RCU_FAST_NO_HZ the default,
and thus eliminate that Kconfig parameter.  As noted above, ditto for
CONFIG_RCU_NOCB_CPU, CONFIG_RCU_NOCB_CPU_NONE, CONFIG_RCU_NOCB_CPU_ZERO,
and CONFIG_RCU_NOCB_CPU_ALL.  It also might be reasonable to replace
uses of CONFIG_PROVE_RCU with CONFIG_PROVE_LOCKING, thus allowing
CONFIG_PROVE_RCU to be eliminated.  CONFIG_PROVE_RCU_DELAY hasn't proven
very good at finding bugs, so I am considering eliminating it as well.
Given recent and planned changes related to RCU's stall-warning stack
dumping, I hope to eliminate both CONFIG_RCU_CPU_STALL_VERBOSE and
CONFIG_RCU_CPU_STALL_INFO, making them both happen unconditionally.
(And yes, I should probably make CONFIG_RCU_CPU_STALL_INFO be the default
for some time beforehand.)  I have also been considering getting rid of
CONFIG_RCU_FANOUT_EXACT, given that it appears that no one uses it.

That should make room for additional RCU Kconfig parameters as needed
for specialized or high-risk new functionality, when and if required.

Thanx, Paul

> -Andi
>
> --
> a...@linux.intel.com -- Speaking for myself only
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
> On the topic of these threads; I recently noticed RCU grew a metric ton
> of them, I found some 75 rcu kthreads on my box, wth up with that?

It seems like RCU is growing in complexity all the time.

Can it be put on a diet in general?

No more new CONFIGs please either.

-Andi

--
a...@linux.intel.com -- Speaking for myself only
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Mon, Jun 23, 2014 at 08:53:12AM -0500, Christoph Lameter wrote:
> On Fri, 20 Jun 2014, Paul E. McKenney wrote:
>
> > > I like this approach *far* better.  This is the kind of thing I had in
> > > mind when I suggested using the fqs machinery: remove the poll entirely
> > > and just thwack a CPU if it takes too long without a quiescent state.
> > > Reviewed-by: Josh Triplett <j...@joshtriplett.org>
> >
> > Glad you like it.  Not a fan of the IPI myself, but then again if you
> > are spending that much time looping in the kernel, an extra IPI is the
> > least of your problems.
>
> Good. The IPI is only used when actually necessary. The code inserted
> was always there and always executed although rarely needed.

Interesting.  I actually proposed this approach several times in the
earlier thread, but to deafening silence:

https://lkml.org/lkml/2014/6/18/836,
https://lkml.org/lkml/2014/6/17/793, and
https://lkml.org/lkml/2014/6/20/479.

I guess this further validates interpreting silence as assent.

Thanx, Paul
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Mon, Jun 23, 2014 at 08:26:15AM +0200, Peter Zijlstra wrote:
> On Fri, Jun 20, 2014 at 07:59:58PM -0700, Paul E. McKenney wrote:
> > Commit ac1bea85781e (Make cond_resched() report RCU quiescent states)
> > fixed a problem where a CPU looping in the kernel with but one runnable
> > task would give RCU CPU stall warnings, even if the in-kernel loop
> > contained cond_resched() calls.  Unfortunately, in so doing, it introduced
> > performance regressions in Anton Blanchard's will-it-scale "open1" test.
> > The problem appears to be not so much the increased cond_resched() path
> > length as an increase in the rate at which grace periods complete, which
> > increased per-update grace-period overhead.
> >
> > This commit takes a different approach to fixing this bug, mainly by
> > moving the RCU-visible quiescent state from cond_resched() to
> > rcu_note_context_switch(), and by further reducing the check to a
> > simple non-zero test of a single per-CPU variable.  However, this
> > approach requires that the force-quiescent-state processing send
> > resched IPIs to the offending CPUs.  These will be sent only once
> > the grace period has reached an age specified by the boot/sysfs
> > parameter rcutree.jiffies_till_sched_qs, or once the grace period
> > reaches an age halfway to the point at which RCU CPU stall warnings
> > will be emitted, whichever comes first.
>
> Right, and I suppose the force quiescent stuff is triggered from the
> tick, which in turn wakes some of these rcu kthreads, which on UP would
> cause scheduling themselves.

Yep, which is another reason why this commit only affects TREE_RCU and
TREE_PREEMPT_RCU, not TINY_RCU.

> On the topic of these threads; I recently noticed RCU grew a metric ton
> of them, I found some 75 rcu kthreads on my box, wth up with that?

The most likely cause of a recent increase would be if you now have
CONFIG_RCU_NOCB_CPU_ALL=y, which would give you a pair of kthreads per
CPU for callback offloading.  Plus an additional kthread per CPU (for
a total of three new kthreads per CPU) for CONFIG_PREEMPT=y.  These would
be the rcuo kthreads.

Are they causing you trouble?

Thanx, Paul
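The timing rule in the quoted commit log can be sketched as follows. This only illustrates the rule as the text describes it and is not the kernel's implementation; the function name, the parameter names, and the example numbers are all hypothetical.

```python
def resched_ipi_age(jiffies_till_sched_qs, stall_warning_age):
    """Grace-period age (in jiffies) at which resched IPIs would start.

    Per the commit log: IPIs are sent once the grace period reaches
    jiffies_till_sched_qs, or half the age at which an RCU CPU stall
    warning would be emitted, whichever comes first.
    """
    return min(jiffies_till_sched_qs, stall_warning_age // 2)


# Hypothetical example values, chosen only to exercise both branches.
stall_age = 21000                          # assumed stall-warning age in jiffies
print(resched_ipi_age(50, stall_age))      # the small parameter value wins
print(resched_ipi_age(999999, stall_age))  # capped at half the stall-warning age
```

The second call shows why an arbitrarily large parameter value cannot postpone the IPIs indefinitely under this rule: the halfway-to-stall-warning bound takes over.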
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Fri, 20 Jun 2014, Paul E. McKenney wrote:

> > I like this approach *far* better.  This is the kind of thing I had in
> > mind when I suggested using the fqs machinery: remove the poll entirely
> > and just thwack a CPU if it takes too long without a quiescent state.
> > Reviewed-by: Josh Triplett <j...@joshtriplett.org>
>
> Glad you like it.  Not a fan of the IPI myself, but then again if you
> are spending that much time looping in the kernel, an extra IPI is the
> least of your problems.

Good. The IPI is only used when actually necessary. The code inserted
was always there and always executed although rarely needed.
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Mon, 23 Jun 2014, Peter Zijlstra wrote:

> On the topic of these threads; I recently noticed RCU grew a metric ton
> of them, I found some 75 rcu kthreads on my box, wth up with that?

Would kworker threads work for rcu? That would also avoid the shifting
around of RCU threads for NOHZ configurations (which seems to have to be
done manually right now). The kworker subsystem work that allows
restriction to non NOHZ hardware threads would then also allow the
shifting of the rcu threads which would simplify the whole endeavor.
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Fri, Jun 20, 2014 at 07:59:58PM -0700, Paul E. McKenney wrote:
> Commit ac1bea85781e (Make cond_resched() report RCU quiescent states)
> fixed a problem where a CPU looping in the kernel with but one runnable
> task would give RCU CPU stall warnings, even if the in-kernel loop
> contained cond_resched() calls.  Unfortunately, in so doing, it introduced
> performance regressions in Anton Blanchard's will-it-scale "open1" test.
> The problem appears to be not so much the increased cond_resched() path
> length as an increase in the rate at which grace periods complete, which
> increased per-update grace-period overhead.
>
> This commit takes a different approach to fixing this bug, mainly by
> moving the RCU-visible quiescent state from cond_resched() to
> rcu_note_context_switch(), and by further reducing the check to a
> simple non-zero test of a single per-CPU variable.  However, this
> approach requires that the force-quiescent-state processing send
> resched IPIs to the offending CPUs.  These will be sent only once
> the grace period has reached an age specified by the boot/sysfs
> parameter rcutree.jiffies_till_sched_qs, or once the grace period
> reaches an age halfway to the point at which RCU CPU stall warnings
> will be emitted, whichever comes first.

Right, and I suppose the force quiescent stuff is triggered from the
tick, which in turn wakes some of these rcu kthreads, which on UP would
cause scheduling themselves.

On the topic of these threads; I recently noticed RCU grew a metric ton
of them, I found some 75 rcu kthreads on my box, wth up with that?
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Mon, Jun 23, 2014 at 09:55:21AM -0700, Dave Hansen wrote:
> This still has a regression.  Commit 1ed70de (from Paul's git tree), gets
> a result of 52231880.  If I back up two commits to v3.16-rc1 and revert
> ac1bea85 (the original culprit) the result goes back up to 57308512.
>
> So something is still going on here.

And commit 1ed70de is in fact the right one, so...

The rcutree.jiffies_till_sched_qs boot/sysfs parameter controls how
long RCU waits before asking for quiescent states.  The default is
currently HZ/20.  Does increasing this parameter help?  Easy for me to
increase the default if it does.

> I'll go back and compare the grace period ages to see if I can tell what
> is going on.

That would be very helpful, thank you!

Thanx, Paul
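Because the default is expressed as HZ/20, its value in jiffies depends on the kernel's tick rate. A small sketch, assuming integer division and a few commonly configured HZ values (the thread does not state the HZ value of the test machine, and the function name is illustrative):

```python
def default_jiffies_till_sched_qs(hz):
    """HZ/20 using integer division, as C would compute it for ints."""
    return hz // 20


for hz in (100, 250, 300, 1000):
    jiffies = default_jiffies_till_sched_qs(hz)
    millis = jiffies * 1000 / hz
    print(f"HZ={hz}: {jiffies} jiffies (~{millis:.0f} ms)")
```

Under these assumptions the default works out to roughly 50 ms of wall-clock time regardless of HZ. A kernel built with HZ=250 would get 12 jiffies, which would be consistent with the value of 12 used later in the thread, though that is an inference rather than something the thread states.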
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On 06/23/2014 09:55 AM, Dave Hansen wrote:
> This still has a regression.  Commit 1ed70de (from Paul's git tree), gets
> a result of 52231880.  If I back up two commits to v3.16-rc1 and revert
> ac1bea85 (the original culprit) the result goes back up to 57308512.
>
> So something is still going on here.
>
> I'll go back and compare the grace period ages to see if I can tell what
> is going on.

RCU_TRACE interferes with the benchmark a little bit, and it lowers the
delta that the regression causes.  So, evaluate this cautiously.

According to rcu_sched/rcugp, the average age is:

	v3.16-rc1, with ac1bea85 reverted:	10.7
	v3.16-rc1, plus e552592e:		 6.1

Paul, have you been keeping an eye on rcugp?  Even if I run my system
with only 10 threads, I still see this basic pattern where the average
age is lower when I see lower performance.  It seems to be a reasonable
proxy that could be used instead of waiting on me to re-run tests.
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
> In 3.10, RCU had 14,046 lines of code, not counting documentation and test scripting. In 3.15, RCU had 13,208 lines of code, again not counting documentation and test scripting. That is a decrease of almost 1KLoC, so your wish is granted.

Ok that's good progress.

> CONFIG_RCU_NOCB_CPU, CONFIG_RCU_NOCB_CPU_NONE, CONFIG_RCU_NOCB_CPU_ZERO, and CONFIG_RCU_NOCB_CPU_ALL. It also might be reasonable to replace uses of CONFIG_PROVE_RCU with CONFIG_PROVE_LOCKING, thus allowing CONFIG_PROVE_RCU to be eliminated. CONFIG_PROVE_RCU_DELAY hasn't proven very good at finding bugs, so I am considering eliminating it as well. Given recent and planned changes related to RCU's stall-warning stack dumping, I hope to eliminate both CONFIG_RCU_CPU_STALL_VERBOSE and CONFIG_RCU_CPU_STALL_INFO, making them both happen unconditionally. (And yes, I should probably make CONFIG_RCU_CPU_STALL_INFO be the default for some time beforehand.) I have also been considering getting rid of CONFIG_RCU_FANOUT_EXACT, given that it appears that no one uses it.

Yes please to all. Sounds good thanks.

-Andi
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On 06/23/2014 10:16 AM, Paul E. McKenney wrote:
> On Mon, Jun 23, 2014 at 09:55:21AM -0700, Dave Hansen wrote:
> > This still has a regression. Commit 1ed70de (from Paul's git tree), gets a result of 52231880. If I back up two commits to v3.16-rc1 and revert ac1bea85 (the original culprit) the result goes back up to 57308512.
> >
> > So something is still going on here.
>
> And commit 1ed70de is in fact the right one, so...
>
> The rcutree.jiffies_till_sched_qs boot/sysfs parameter controls how long RCU waits before asking for quiescent states. The default is currently HZ/20. Does increasing this parameter help? Easy for me to increase the default if it does.

Making it an insane value:

	echo 12 > /sys/module/rcutree/parameters/jiffies_till_sched_qs
	average:52248706

	echo 999 > /sys/module/rcutree/parameters/jiffies_till_sched_qs
	average:55712533

gets us back up _closer_ to our original 57M number, but it's still not quite there.
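For scale, the averages quoted in this message can be turned into percentages. What follows is a minimal userspace sketch, not kernel code: the helper names `regression_pct`, `default_jiffies_till_sched_qs`, and `print_open1_summary` are made up for illustration, and HZ=250 is only an assumption (the thread does not state the kernel's HZ; it is simply a value for which HZ/20 comes out to 12).

```c
#include <stdio.h>

/* Hypothetical helper: percentage by which `result` falls short of `baseline`. */
double regression_pct(double baseline, double result)
{
	return (baseline - result) / baseline * 100.0;
}

/* HZ/20 with integer division, as in the default described above. */
unsigned long default_jiffies_till_sched_qs(unsigned long hz)
{
	return hz / 20;
}

/* Print the shortfall of each quoted average against the reverted-kernel run. */
void print_open1_summary(void)
{
	const double reverted = 57308512.0;	/* v3.16-rc1, ac1bea85 reverted */
	const double jtsq_12 = 52248706.0;	/* jiffies_till_sched_qs=12 */
	const double jtsq_999 = 55712533.0;	/* jiffies_till_sched_qs=999 */

	printf("jtsq=12:  %.2f%% below the reverted kernel\n",
	       regression_pct(reverted, jtsq_12));
	printf("jtsq=999: %.2f%% below the reverted kernel\n",
	       regression_pct(reverted, jtsq_999));
	/* HZ=250 is an assumption, not something stated in the thread. */
	printf("HZ=250 default: %lu jiffies\n",
	       default_jiffies_till_sched_qs(250));
}
```

Calling `print_open1_summary()` from a `main()` prints the two deltas at the 160-process data point only; it says nothing about the rest of the thread-count range.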
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Mon, Jun 23, 2014 at 10:19:05AM -0700, Andi Kleen wrote:
> > In 3.10, RCU had 14,046 lines of code, not counting documentation and test scripting. In 3.15, RCU had 13,208 lines of code, again not counting documentation and test scripting. That is a decrease of almost 1KLoC, so your wish is granted.
>
> Ok that's good progress.

Glad you like it!

> > CONFIG_RCU_NOCB_CPU, CONFIG_RCU_NOCB_CPU_NONE, CONFIG_RCU_NOCB_CPU_ZERO, and CONFIG_RCU_NOCB_CPU_ALL. It also might be reasonable to replace uses of CONFIG_PROVE_RCU with CONFIG_PROVE_LOCKING, thus allowing CONFIG_PROVE_RCU to be eliminated. CONFIG_PROVE_RCU_DELAY hasn't proven very good at finding bugs, so I am considering eliminating it as well. Given recent and planned changes related to RCU's stall-warning stack dumping, I hope to eliminate both CONFIG_RCU_CPU_STALL_VERBOSE and CONFIG_RCU_CPU_STALL_INFO, making them both happen unconditionally. (And yes, I should probably make CONFIG_RCU_CPU_STALL_INFO be the default for some time beforehand.) I have also been considering getting rid of CONFIG_RCU_FANOUT_EXACT, given that it appears that no one uses it.
>
> Yes please to all. Sounds good thanks.

Very good!

Please note that this will take some time. For example, getting rid of CONFIG_RCU_CPU_STALL_TIMEOUT resulted in a series of bugs over a period of well over a year. It turned out that very few people were exercising it while it was non-default. Hopefully, Fengguang Wu's RANDCONFIG testing is testing these things more these days.

Also, some of them might have non-obvious effects on performance, witness the cond_resched() fun and excitement.

Thanx, Paul
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Mon, Jun 23, 2014 at 10:17:19AM -0700, Dave Hansen wrote:
> On 06/23/2014 09:55 AM, Dave Hansen wrote:
> > This still has a regression. Commit 1ed70de (from Paul's git tree), gets a result of 52231880. If I back up two commits to v3.16-rc1 and revert ac1bea85 (the original culprit) the result goes back up to 57308512.
> >
> > So something is still going on here.
> >
> > I'll go back and compare the grace period ages to see if I can tell what is going on.
>
> RCU_TRACE interferes with the benchmark a little bit, and it lowers the delta that the regression causes. So, evaluate this cautiously.

RCU_TRACE does increase overhead somewhat, so I would expect somewhat less difference with it enabled. Though I am a bit surprised that the overhead of its counters is measurable. Or is something going on?

> According to rcu_sched/rcugp, the average age is:
>
>	v3.16-rc1, with ac1bea85 reverted:	10.7
>	v3.16-rc1, plus e552592e:		6.1
>
> Paul, have you been keeping an eye on rcugp? Even if I run my system with only 10 threads, I still see this basic pattern where the average age is lower when I see lower performance. It seems to be a reasonable proxy that could be used instead of waiting on me to re-run tests.

I do print out GPs/sec when running rcutorture, and they do vary somewhat, but mostly with different Kconfig parameter settings. Plus rcutorture ramps up and down, so the GPs/sec is less than what you might see in a system running an unvarying workload.

That said, increasing grace-period latency is not always good for performance, in fact, I usually get beaten up for grace periods completing too quickly rather than too slowly. This current issue is one of the rare exceptions, perhaps even the only exception.

So let's see... The open1 benchmark sits in a loop doing open() and close(), and probably spends most of its time in the kernel. It doesn't do much context switching. I am guessing that you don't have CONFIG_NO_HZ_FULL=y, or the boot/sysfs parameter would not have much effect because then the first quiescent-state-forcing attempt would likely finish the grace period.

So, given that short grace periods help other workloads (I have the scars to prove it), and given that the patch fixes some real problems, and given that the large number for rcutree.jiffies_till_sched_qs got us within 3%, shouldn't we consider this issue closed?

Thanx, Paul
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On 06/23/2014 11:09 AM, Paul E. McKenney wrote:
> So let's see... The open1 benchmark sits in a loop doing open() and close(), and probably spends most of its time in the kernel. It doesn't do much context switching. I am guessing that you don't have CONFIG_NO_HZ_FULL=y, or the boot/sysfs parameter would not have much effect because then the first quiescent-state-forcing attempt would likely finish the grace period.
>
> So, given that short grace periods help other workloads (I have the scars to prove it), and given that the patch fixes some real problems,

I'm not arguing that short grace periods _can_ help some workloads, or that one is better than the other. The patch in question changes existing behavior by shortening grace periods. This change of existing behavior removes some of the benefits that my system gets out of RCU. I suspect this affects a lot more systems, but my core count makes it easier to see.

Perhaps I'm misunderstanding the original patch's intent, but it seemed to me to be working around an overactive debug message. While often a _useful_ debug message, it was firing falsely in the case being addressed in the patch.

> and given that the large number for rcutree.jiffies_till_sched_qs got us within 3%, shouldn't we consider this issue closed?

With the default value for the tunable, the regression is still solidly over 10%. I think we can have a reasonable argument about it once the default delta is down to the small single digits.

One more thing I just realized: this isn't a scalability problem, at least with rcutree.jiffies_till_sched_qs=12. There's a pretty consistent delta in throughput throughout the entire range of threads from 1-160. See the processes column in the data files:

plain 3.15:
https://www.sr71.net/~dave/intel/willitscale/systems/bigbox/3.15/open1.csv

e552592e0383bc:
https://www.sr71.net/~dave/intel/willitscale/systems/bigbox/3.16.0-rc1-pf2/open1.csv

or visually:
https://www.sr71.net/~dave/intel/array-join.html?1=willitscale/systems/bigbox/3.15&2=willitscale/systems/bigbox/3.16.0-rc1-pf2&hide=linear,threads_idle,processes_idle
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Mon, Jun 23, 2014 at 04:30:12PM -0700, Dave Hansen wrote:
> On 06/23/2014 11:09 AM, Paul E. McKenney wrote:
> > So let's see... The open1 benchmark sits in a loop doing open() and close(), and probably spends most of its time in the kernel. It doesn't do much context switching. I am guessing that you don't have CONFIG_NO_HZ_FULL=y, or the boot/sysfs parameter would not have much effect because then the first quiescent-state-forcing attempt would likely finish the grace period.
> >
> > So, given that short grace periods help other workloads (I have the scars to prove it), and given that the patch fixes some real problems,
>
> I'm not arguing that short grace periods _can_ help some workloads, or that one is better than the other. The patch in question changes existing behavior by shortening grace periods. This change of existing behavior removes some of the benefits that my system gets out of RCU. I suspect this affects a lot more systems, but my core count makes it easier to see.

And adds some benefits for other systems. Your tight loop on open() and close() will be sensitive to some things, and tight loops on other syscalls will be sensitive to others.

> Perhaps I'm misunderstanding the original patch's intent, but it seemed to me to be working around an overactive debug message. While often a _useful_ debug message, it was firing falsely in the case being addressed in the patch.

You are indeed misunderstanding the original patch's intent. It was preventing OOMs. The overactive debug message is just a warning that OOMs are possible.

> > and given that the large number for rcutree.jiffies_till_sched_qs got us within 3%, shouldn't we consider this issue closed?
>
> With the default value for the tunable, the regression is still solidly over 10%. I think we can have a reasonable argument about it once the default delta is down to the small single digits.

Look, you are to be congratulated for identifying a micro-benchmark that exposes such small changes in timing, but I am not at all interested in that micro-benchmark becoming the kernel's straightjacket. If you have real workloads for which this micro-benchmark is a good predictor of performance, we can talk about quite a few additional steps to take to tune for those workloads.

> One more thing I just realized: this isn't a scalability problem, at least with rcutree.jiffies_till_sched_qs=12. There's a pretty consistent delta in throughput throughout the entire range of threads from 1-160. See the processes column in the data files:
>
> plain 3.15:
> https://www.sr71.net/~dave/intel/willitscale/systems/bigbox/3.15/open1.csv
>
> e552592e0383bc:
> https://www.sr71.net/~dave/intel/willitscale/systems/bigbox/3.16.0-rc1-pf2/open1.csv
>
> or visually:
> https://www.sr71.net/~dave/intel/array-join.html?1=willitscale/systems/bigbox/3.15&2=willitscale/systems/bigbox/3.16.0-rc1-pf2&hide=linear,threads_idle,processes_idle

Just out of curiosity, how many CPUs does your system have? 80? If 160, looks like something bad is happening at 80.

Thanx, Paul
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On 06/23/2014 05:15 PM, Paul E. McKenney wrote:
> Just out of curiosity, how many CPUs does your system have? 80? If 160, looks like something bad is happening at 80.

80 cores, 160 threads. >80 processes/threads is where we start using the second thread on the cores. The tasks are also pinned to hyperthread pairs, so they disturb each other, and the scheduler moves them between threads on occasion which causes extra noise.
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Mon, Jun 23, 2014 at 05:20:30PM -0700, Dave Hansen wrote:
> On 06/23/2014 05:15 PM, Paul E. McKenney wrote:
> > Just out of curiosity, how many CPUs does your system have? 80? If 160, looks like something bad is happening at 80.
>
> 80 cores, 160 threads. >80 processes/threads is where we start using the second thread on the cores. The tasks are also pinned to hyperthread pairs, so they disturb each other, and the scheduler moves them between threads on occasion which causes extra noise.

OK, that could explain the near flattening of throughput near 80 processes. Is 3.16.0-rc1-pf2 with the two RCU patches? If so, is the new sysfs parameter at its default value?

Thanx, Paul
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Fri, Jun 20, 2014 at 07:59:58PM -0700, Paul E. McKenney wrote:
> Commit ac1bea85781e (Make cond_resched() report RCU quiescent states)
> fixed a problem where a CPU looping in the kernel with but one runnable
> task would give RCU CPU stall warnings, even if the in-kernel loop
> contained cond_resched() calls. Unfortunately, in so doing, it introduced
> performance regressions in Anton Blanchard's will-it-scale "open1" test.
> The problem appears to be not so much the increased cond_resched() path
> length as an increase in the rate at which grace periods complete, which
> increased per-update grace-period overhead.
>
> This commit takes a different approach to fixing this bug, mainly by
> moving the RCU-visible quiescent state from cond_resched() to
> rcu_note_context_switch(), and by further reducing the check to a
> simple non-zero test of a single per-CPU variable. However, this
> approach requires that the force-quiescent-state processing send
> resched IPIs to the offending CPUs. These will be sent only once
> the grace period has reached an age specified by the boot/sysfs
> parameter rcutree.jiffies_till_sched_qs, or once the grace period
> reaches an age halfway to the point at which RCU CPU stall warnings
> will be emitted, whichever comes first.

Right, and I suppose the force quiescent stuff is triggered from the tick, which in turn wakes some of these rcu kthreads, which on UP would cause scheduling themselves.

On the topic of these threads; I recently noticed RCU grew a metric ton of them, I found some 75 rcu kthreads on my box, wth up with that?
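The "whichever comes first" rule in the quoted commit message can be sketched as below. This is a userspace model under stated assumptions, not the code from the patch: the function names `resched_ipi_age` and `should_send_resched_ipi` and the `stall_timeout_jiffies` parameter are hypothetical, and "has reached an age" is read here as a greater-than-or-equal comparison.

```c
#include <stdbool.h>

/*
 * Hypothetical model: grace-period age (in jiffies) at which resched
 * IPIs would start being sent. It is the smaller of the tunable and
 * half the stall-warning timeout, i.e. whichever is reached first.
 */
unsigned long resched_ipi_age(unsigned long jiffies_till_sched_qs,
			      unsigned long stall_timeout_jiffies)
{
	unsigned long half_stall = stall_timeout_jiffies / 2;

	return jiffies_till_sched_qs < half_stall ? jiffies_till_sched_qs
						  : half_stall;
}

/* True once a grace period of age `gp_age` jiffies has reached that threshold. */
bool should_send_resched_ipi(unsigned long gp_age,
			     unsigned long jiffies_till_sched_qs,
			     unsigned long stall_timeout_jiffies)
{
	return gp_age >= resched_ipi_age(jiffies_till_sched_qs,
					 stall_timeout_jiffies);
}
```

One consequence of this reading is that a very large tunable, such as the 999 tried elsewhere in the thread, is capped by the stall-warning half-way point whenever that point is smaller.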
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Fri, Jun 20, 2014 at 09:29:58PM -0700, Josh Triplett wrote:
> On Fri, Jun 20, 2014 at 07:59:58PM -0700, Paul E. McKenney wrote:
> > Commit ac1bea85781e (Make cond_resched() report RCU quiescent states)
> > fixed a problem where a CPU looping in the kernel with but one runnable
> > task would give RCU CPU stall warnings, even if the in-kernel loop
> > contained cond_resched() calls. Unfortunately, in so doing, it introduced
> > performance regressions in Anton Blanchard's will-it-scale "open1" test.
> > The problem appears to be not so much the increased cond_resched() path
> > length as an increase in the rate at which grace periods complete, which
> > increased per-update grace-period overhead.
> >
> > This commit takes a different approach to fixing this bug, mainly by
> > moving the RCU-visible quiescent state from cond_resched() to
> > rcu_note_context_switch(), and by further reducing the check to a
> > simple non-zero test of a single per-CPU variable. However, this
> > approach requires that the force-quiescent-state processing send
> > resched IPIs to the offending CPUs. These will be sent only once
> > the grace period has reached an age specified by the boot/sysfs
> > parameter rcutree.jiffies_till_sched_qs, or once the grace period
> > reaches an age halfway to the point at which RCU CPU stall warnings
> > will be emitted, whichever comes first.
> >
> > Reported-by: Dave Hansen
> > Signed-off-by: Paul E. McKenney
> > Cc: Josh Triplett
> > Cc: Andi Kleen
> > Cc: Christoph Lameter
> > Cc: Mike Galbraith
> > Cc: Eric Dumazet
>
> I like this approach *far* better. This is the kind of thing I had in
> mind when I suggested using the fqs machinery: remove the poll entirely
> and just thwack a CPU if it takes too long without a quiescent state.
> Reviewed-by: Josh Triplett

Glad you like it. Not a fan of the IPI myself, but then again if you are spending that much time looping in the kernel, an extra IPI is the least of your problems.

I will be testing this more thoroughly, and if nothing bad happens will send it on up within a few days.

Thanx, Paul

> > ---
> >
> >  b/Documentation/kernel-parameters.txt |    6 +
> >  b/include/linux/rcupdate.h            |   36
> >  b/kernel/rcu/tree.c                   |  140 +++---
> >  b/kernel/rcu/tree.h                   |    6 +
> >  b/kernel/rcu/tree_plugin.h            |    2
> >  b/kernel/rcu/update.c                 |   18
> >  b/kernel/sched/core.c                 |    7 -
> >  7 files changed, 125 insertions(+), 90 deletions(-)
> >
> > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> > index 6eaa9cdb7094..910c3829f81d 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -2785,6 +2785,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> >  			leaf rcu_node structure. Useful for very large
> >  			systems.
> >
> > +	rcutree.jiffies_till_sched_qs= [KNL]
> > +			Set required age in jiffies for a
> > +			given grace period before RCU starts
> > +			soliciting quiescent-state help from
> > +			rcu_note_context_switch().
> > +
> >  	rcutree.jiffies_till_first_fqs= [KNL]
> >  			Set delay from grace-period initialization to
> >  			first attempt to force quiescent states.
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 5a75d19aa661..243aa4656cb7 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -44,7 +44,6 @@
> >  #include <linux/debugobjects.h>
> >  #include <linux/bug.h>
> >  #include <linux/compiler.h>
> > -#include <linux/percpu.h>
> >  #include <asm/barrier.h>
> >
> >  extern int rcu_expedited; /* for sysctl */
> > @@ -300,41 +299,6 @@ bool __rcu_is_watching(void);
> >  #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) */
> >
> >  /*
> > - * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings.
> > - */
> > -
> > -#define RCU_COND_RESCHED_LIM 256	/* ms vs. 100s of ms. */
> > -DECLARE_PER_CPU(int, rcu_cond_resched_count);
> > -void rcu_resched(void);
> > -
> > -/*
> > - * Is it time to report RCU quiescent states?
> > - *
> > - * Note unsynchronized access to rcu_cond_resched_count. Yes, we might
> > - * increment some random CPU's count, and possibly also load the result from
> > - * yet another CPU's count. We might even clobber some other CPU's attempt
> > - * to zero its counter. This is all OK because the goal is not precision,
> > - * but rather reasonable amortization of rcu_note_context_switch() overhead
> > - * and extremely high probability of avoiding RCU CPU stall warnings.
> > - * Note that this function has to be preempted in just the wrong place,
> > - * many thousands of times in a row, for
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Fri, Jun 20, 2014 at 07:59:58PM -0700, Paul E. McKenney wrote:
> Commit ac1bea85781e (Make cond_resched() report RCU quiescent states)
> fixed a problem where a CPU looping in the kernel with but one runnable
> task would give RCU CPU stall warnings, even if the in-kernel loop
> contained cond_resched() calls. Unfortunately, in so doing, it introduced
> performance regressions in Anton Blanchard's will-it-scale "open1" test.
> The problem appears to be not so much the increased cond_resched() path
> length as an increase in the rate at which grace periods complete, which
> increased per-update grace-period overhead.
>
> This commit takes a different approach to fixing this bug, mainly by
> moving the RCU-visible quiescent state from cond_resched() to
> rcu_note_context_switch(), and by further reducing the check to a
> simple non-zero test of a single per-CPU variable. However, this
> approach requires that the force-quiescent-state processing send
> resched IPIs to the offending CPUs. These will be sent only once
> the grace period has reached an age specified by the boot/sysfs
> parameter rcutree.jiffies_till_sched_qs, or once the grace period
> reaches an age halfway to the point at which RCU CPU stall warnings
> will be emitted, whichever comes first.
>
> Reported-by: Dave Hansen
> Signed-off-by: Paul E. McKenney
> Cc: Josh Triplett
> Cc: Andi Kleen
> Cc: Christoph Lameter
> Cc: Mike Galbraith
> Cc: Eric Dumazet

I like this approach *far* better. This is the kind of thing I had in mind when I suggested using the fqs machinery: remove the poll entirely and just thwack a CPU if it takes too long without a quiescent state.

Reviewed-by: Josh Triplett

> ---
>
>  b/Documentation/kernel-parameters.txt |    6 +
>  b/include/linux/rcupdate.h            |   36
>  b/kernel/rcu/tree.c                   |  140 +++---
>  b/kernel/rcu/tree.h                   |    6 +
>  b/kernel/rcu/tree_plugin.h            |    2
>  b/kernel/rcu/update.c                 |   18
>  b/kernel/sched/core.c                 |    7 -
>  7 files changed, 125 insertions(+), 90 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 6eaa9cdb7094..910c3829f81d 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2785,6 +2785,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  			leaf rcu_node structure. Useful for very large
>  			systems.
>
> +	rcutree.jiffies_till_sched_qs= [KNL]
> +			Set required age in jiffies for a
> +			given grace period before RCU starts
> +			soliciting quiescent-state help from
> +			rcu_note_context_switch().
> +
>  	rcutree.jiffies_till_first_fqs= [KNL]
>  			Set delay from grace-period initialization to
>  			first attempt to force quiescent states.
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 5a75d19aa661..243aa4656cb7 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -44,7 +44,6 @@
>  #include <linux/debugobjects.h>
>  #include <linux/bug.h>
>  #include <linux/compiler.h>
> -#include <linux/percpu.h>
>  #include <asm/barrier.h>
>
>  extern int rcu_expedited; /* for sysctl */
> @@ -300,41 +299,6 @@ bool __rcu_is_watching(void);
>  #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) */
>
>  /*
> - * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings.
> - */
> -
> -#define RCU_COND_RESCHED_LIM 256	/* ms vs. 100s of ms. */
> -DECLARE_PER_CPU(int, rcu_cond_resched_count);
> -void rcu_resched(void);
> -
> -/*
> - * Is it time to report RCU quiescent states?
> - *
> - * Note unsynchronized access to rcu_cond_resched_count. Yes, we might
> - * increment some random CPU's count, and possibly also load the result from
> - * yet another CPU's count. We might even clobber some other CPU's attempt
> - * to zero its counter. This is all OK because the goal is not precision,
> - * but rather reasonable amortization of rcu_note_context_switch() overhead
> - * and extremely high probability of avoiding RCU CPU stall warnings.
> - * Note that this function has to be preempted in just the wrong place,
> - * many thousands of times in a row, for anything bad to happen.
> - */
> -static inline bool rcu_should_resched(void)
> -{
> -	return raw_cpu_inc_return(rcu_cond_resched_count) >=
> -	       RCU_COND_RESCHED_LIM;
> -}
> -
> -/*
> - * Report quiscent states to RCU if it is time to do so.
> - */
> -static inline void rcu_cond_resched(void)
> -{
> -	if (unlikely(rcu_should_resched()))
> -		rcu_resched();
> -}
> -
> -/*
>   * Infrastructure to implement the synchronize_() primitives in
>   * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
>   */
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
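The hooks removed by the hunk above amortize the cost of reporting by acting only once the counter reaches RCU_COND_RESCHED_LIM. A single-threaded userspace sketch of that amortization follows. The struct and the `model_` names are invented for illustration, and resetting the count when a report is made is an assumption: the quoted hunk does not show rcu_resched(), and the removed comment only mentions a CPU's "attempt to zero its counter".

```c
#include <stdbool.h>

#define RCU_COND_RESCHED_LIM 256	/* same limit as in the removed code */

/* Single-threaded stand-in for the per-CPU rcu_cond_resched_count. */
struct counter_model {
	int count;
	unsigned long reports;	/* number of times a report was triggered */
};

/* Mirrors rcu_should_resched(): increment, then compare against the limit. */
bool model_should_resched(struct counter_model *m)
{
	return ++m->count >= RCU_COND_RESCHED_LIM;
}

/* Mirrors rcu_cond_resched(); zeroing the count here is an assumption. */
void model_cond_resched(struct counter_model *m)
{
	if (model_should_resched(m)) {
		m->count = 0;
		m->reports++;
	}
}
```

In this model, 255 calls out of every 256 cost one increment and one compare, which is the path-length increase the commit message says was not the main problem.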
Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU
On Fri, Jun 20, 2014 at 07:59:58PM -0700, Paul E. McKenney wrote: Commit ac1bea85781e (Make cond_resched() report RCU quiescent states) fixed a problem where a CPU looping in the kernel with but one runnable task would give RCU CPU stall warnings, even if the in-kernel loop contained cond_resched() calls. Unfortunately, in so doing, it introduced performance regressions in Anton Blanchard's will-it-scale open1 test. The problem appears to be not so much the increased cond_resched() path length as an increase in the rate at which grace periods complete, which increased per-update grace-period overhead. This commit takes a different approach to fixing this bug, mainly by moving the RCU-visible quiescent state from cond_resched() to rcu_note_context_switch(), and by further reducing the check to a simple non-zero test of a single per-CPU variable. However, this approach requires that the force-quiescent-state processing send resched IPIs to the offending CPUs. These will be sent only once the grace period has reached an age specified by the boot/sysfs parameter rcutree.jiffies_till_sched_qs, or once the grace period reaches an age halfway to the point at which RCU CPU stall warnings will be emitted, whichever comes first. Reported-by: Dave Hansen dave.han...@intel.com Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com Cc: Josh Triplett j...@joshtriplett.org Cc: Andi Kleen a...@linux.intel.com Cc: Christoph Lameter c...@gentwo.org Cc: Mike Galbraith umgwanakikb...@gmail.com Cc: Eric Dumazet eric.duma...@gmail.com I like this approach *far* better. This is the kind of thing I had in mind when I suggested using the fqs machinery: remove the poll entirely and just thwack a CPU if it takes too long without a quiescent state. 
Reviewed-by: Josh Triplett j...@joshtriplett.org

> ---
>
>  b/Documentation/kernel-parameters.txt |    6 +
>  b/include/linux/rcupdate.h            |   36
>  b/kernel/rcu/tree.c                   |  140 +++---
>  b/kernel/rcu/tree.h                   |    6 +
>  b/kernel/rcu/tree_plugin.h            |    2
>  b/kernel/rcu/update.c                 |   18
>  b/kernel/sched/core.c                 |    7 -
>  7 files changed, 125 insertions(+), 90 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 6eaa9cdb7094..910c3829f81d 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2785,6 +2785,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  			leaf rcu_node structure.  Useful for very large
>  			systems.
>
> +	rcutree.jiffies_till_sched_qs= [KNL]
> +			Set required age in jiffies for a
> +			given grace period before RCU starts
> +			soliciting quiescent-state help from
> +			rcu_note_context_switch().
> +
>  	rcutree.jiffies_till_first_fqs= [KNL]
>  			Set delay from grace-period initialization to
>  			first attempt to force quiescent states.
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 5a75d19aa661..243aa4656cb7 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -44,7 +44,6 @@
>  #include <linux/debugobjects.h>
>  #include <linux/bug.h>
>  #include <linux/compiler.h>
> -#include <linux/percpu.h>
>  #include <asm/barrier.h>
>
>  extern int rcu_expedited; /* for sysctl */
> @@ -300,41 +299,6 @@ bool __rcu_is_watching(void);
>  #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) */
>
>  /*
> - * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings.
> - */
> -
> -#define RCU_COND_RESCHED_LIM 256	/* ms vs. 100s of ms. */
> -DECLARE_PER_CPU(int, rcu_cond_resched_count);
> -void rcu_resched(void);
> -
> -/*
> - * Is it time to report RCU quiescent states?
> - *
> - * Note unsynchronized access to rcu_cond_resched_count.  Yes, we might
> - * increment some random CPU's count, and possibly also load the result from
> - * yet another CPU's count.  We might even clobber some other CPU's attempt
> - * to zero its counter.  This is all OK because the goal is not precision,
> - * but rather reasonable amortization of rcu_note_context_switch() overhead
> - * and extremely high probability of avoiding RCU CPU stall warnings.
> - * Note that this function has to be preempted in just the wrong place,
> - * many thousands of times in a row, for anything bad to happen.
> - */
> -static inline bool rcu_should_resched(void)
> -{
> -	return raw_cpu_inc_return(rcu_cond_resched_count) >=
> -	       RCU_COND_RESCHED_LIM;
> -}
> -
> -/*
> - * Report quiscent states to RCU if it is time to do so.
> - */
> -static inline void rcu_cond_resched(void)
> -{
> -	if (unlikely(rcu_should_resched()))
> -		rcu_resched();
> -}
> -
> -/*
>   * Infrastructure to implement the synchronize_<flavor>()