Re: [PATCH] rcu: mark debug_lockdep_rcu_enabled() as pure
On Sun, May 21, 2017 at 11:51:39AM +0300, Konstantin Khlebnikov wrote: > On 20.05.2017 19:42, Paul E. McKenney wrote: > >On Fri, May 19, 2017 at 10:03:59AM +0300, Konstantin Khlebnikov wrote: > >>This allows to get rid of unneeded invocations. > >> > >>Function debug_lockdep_rcu_enabled() becomes really hot if several > >>debug options are enabled together with CONFIG_PROVE_RCU. > >> > >>Hottest path ends with: > >> debug_lockdep_rcu_enabled > >> is_ftrace_trampoline > >> __kernel_text_address > >> > >>Here debug_lockdep_rcu_enabled() is called from condition > >>(debug_lockdep_rcu_enabled() && !__warned && (c)) inside macro > >>do_for_each_ftrace_op(), where "c" is false. > >> > >>With this patch "netperf -H localhost" shows boost from 2400 to 2500. > >> > >>Signed-off-by: Konstantin Khlebnikov> > > >Nice performance increase! > > > >The gcc documentation says that __attribute__((pure)) functions are > >supposed to have return values that depend only at the function's > >arguments and the values of global variables. However, it also says: > > > > Interesting non-pure functions are functions with infinite loops > > or those depending on volatile memory or other system resource, > > that may change between two consecutive calls (such as feof in > > a multithreading environment). > > > >This is OK for current->lockdep_recursion because this variable is changed > >only by the current task (I think so, anyway). > > > >It is sort of OK for debug_locks. This could be set to zero at any time > >by any other task, but if we have a race condition that very rarely causes > >two lockdep splats instead of just one, so what? (But I am sure that > >some of the people on CC will correct me if I am wrong here.) > > > >It should be OK for rcu_scheduler_active because the transition from > >RCU_SCHEDULER_INACTIVE to RCU_SCHEDULER_INIT happens before the first > >context switch, and the various barrier() calls, implied as well as > >explicit, should keep things straight. > > > >But I don't totally trust my analysis. Could you please get someone > >more gcc-savvy to review this and give their ack/review? Given that, > >I will queue it. > > > > Thanx, Paul > > Thank you for analisys. > > __attribute__((pure)) allows compiler to deduplicate / eliminate calls. > This indeed might expand race windows when global switches like debug_locks > changes their state. But strict synchronization here isn't required. Any objections to this patch? Thanx, Paul > >>--- > >> include/linux/rcupdate.h |2 +- > >> kernel/rcu/update.c |2 +- > >> 2 files changed, 2 insertions(+), 2 deletions(-) > >> > >>diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > >>index e1e5d002fdb9..9ecb3cb715bd 100644 > >>--- a/include/linux/rcupdate.h > >>+++ b/include/linux/rcupdate.h > >>@@ -472,7 +472,7 @@ extern struct lockdep_map rcu_lock_map; > >> extern struct lockdep_map rcu_bh_lock_map; > >> extern struct lockdep_map rcu_sched_lock_map; > >> extern struct lockdep_map rcu_callback_map; > >>-int debug_lockdep_rcu_enabled(void); > >>+int __pure debug_lockdep_rcu_enabled(void); > >> > >> int rcu_read_lock_held(void); > >> int rcu_read_lock_bh_held(void); > >>diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c > >>index 273e869ca21d..a0c30abefdcd 100644 > >>--- a/kernel/rcu/update.c > >>+++ b/kernel/rcu/update.c > >>@@ -292,7 +292,7 @@ struct lockdep_map rcu_callback_map = > >>STATIC_LOCKDEP_MAP_INIT("rcu_callback", _callback_key); > >> EXPORT_SYMBOL_GPL(rcu_callback_map); > >> > >>-int notrace debug_lockdep_rcu_enabled(void) > >>+int __pure notrace debug_lockdep_rcu_enabled(void) > >> { > >>return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks && > >> current->lockdep_recursion == 0; > >> > > >
Re: [PATCH] rcu: mark debug_lockdep_rcu_enabled() as pure
On Sun, May 21, 2017 at 11:51:39AM +0300, Konstantin Khlebnikov wrote: > On 20.05.2017 19:42, Paul E. McKenney wrote: > >On Fri, May 19, 2017 at 10:03:59AM +0300, Konstantin Khlebnikov wrote: > >>This allows to get rid of unneeded invocations. > >> > >>Function debug_lockdep_rcu_enabled() becomes really hot if several > >>debug options are enabled together with CONFIG_PROVE_RCU. > >> > >>Hottest path ends with: > >> debug_lockdep_rcu_enabled > >> is_ftrace_trampoline > >> __kernel_text_address > >> > >>Here debug_lockdep_rcu_enabled() is called from condition > >>(debug_lockdep_rcu_enabled() && !__warned && (c)) inside macro > >>do_for_each_ftrace_op(), where "c" is false. > >> > >>With this patch "netperf -H localhost" shows boost from 2400 to 2500. > >> > >>Signed-off-by: Konstantin Khlebnikov > > > >Nice performance increase! > > > >The gcc documentation says that __attribute__((pure)) functions are > >supposed to have return values that depend only at the function's > >arguments and the values of global variables. However, it also says: > > > > Interesting non-pure functions are functions with infinite loops > > or those depending on volatile memory or other system resource, > > that may change between two consecutive calls (such as feof in > > a multithreading environment). > > > >This is OK for current->lockdep_recursion because this variable is changed > >only by the current task (I think so, anyway). > > > >It is sort of OK for debug_locks. This could be set to zero at any time > >by any other task, but if we have a race condition that very rarely causes > >two lockdep splats instead of just one, so what? (But I am sure that > >some of the people on CC will correct me if I am wrong here.) > > > >It should be OK for rcu_scheduler_active because the transition from > >RCU_SCHEDULER_INACTIVE to RCU_SCHEDULER_INIT happens before the first > >context switch, and the various barrier() calls, implied as well as > >explicit, should keep things straight. > > > >But I don't totally trust my analysis. Could you please get someone > >more gcc-savvy to review this and give their ack/review? Given that, > >I will queue it. > > > > Thanx, Paul > > Thank you for analisys. > > __attribute__((pure)) allows compiler to deduplicate / eliminate calls. > This indeed might expand race windows when global switches like debug_locks > changes their state. But strict synchronization here isn't required. Any objections to this patch? Thanx, Paul > >>--- > >> include/linux/rcupdate.h |2 +- > >> kernel/rcu/update.c |2 +- > >> 2 files changed, 2 insertions(+), 2 deletions(-) > >> > >>diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > >>index e1e5d002fdb9..9ecb3cb715bd 100644 > >>--- a/include/linux/rcupdate.h > >>+++ b/include/linux/rcupdate.h > >>@@ -472,7 +472,7 @@ extern struct lockdep_map rcu_lock_map; > >> extern struct lockdep_map rcu_bh_lock_map; > >> extern struct lockdep_map rcu_sched_lock_map; > >> extern struct lockdep_map rcu_callback_map; > >>-int debug_lockdep_rcu_enabled(void); > >>+int __pure debug_lockdep_rcu_enabled(void); > >> > >> int rcu_read_lock_held(void); > >> int rcu_read_lock_bh_held(void); > >>diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c > >>index 273e869ca21d..a0c30abefdcd 100644 > >>--- a/kernel/rcu/update.c > >>+++ b/kernel/rcu/update.c > >>@@ -292,7 +292,7 @@ struct lockdep_map rcu_callback_map = > >>STATIC_LOCKDEP_MAP_INIT("rcu_callback", _callback_key); > >> EXPORT_SYMBOL_GPL(rcu_callback_map); > >> > >>-int notrace debug_lockdep_rcu_enabled(void) > >>+int __pure notrace debug_lockdep_rcu_enabled(void) > >> { > >>return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks && > >> current->lockdep_recursion == 0; > >> > > >
Re: [PATCH] rcu: mark debug_lockdep_rcu_enabled() as pure
On 20.05.2017 19:42, Paul E. McKenney wrote: On Fri, May 19, 2017 at 10:03:59AM +0300, Konstantin Khlebnikov wrote: This allows to get rid of unneeded invocations. Function debug_lockdep_rcu_enabled() becomes really hot if several debug options are enabled together with CONFIG_PROVE_RCU. Hottest path ends with: debug_lockdep_rcu_enabled is_ftrace_trampoline __kernel_text_address Here debug_lockdep_rcu_enabled() is called from condition (debug_lockdep_rcu_enabled() && !__warned && (c)) inside macro do_for_each_ftrace_op(), where "c" is false. With this patch "netperf -H localhost" shows boost from 2400 to 2500. Signed-off-by: Konstantin KhlebnikovNice performance increase! The gcc documentation says that __attribute__((pure)) functions are supposed to have return values that depend only at the function's arguments and the values of global variables. However, it also says: Interesting non-pure functions are functions with infinite loops or those depending on volatile memory or other system resource, that may change between two consecutive calls (such as feof in a multithreading environment). This is OK for current->lockdep_recursion because this variable is changed only by the current task (I think so, anyway). It is sort of OK for debug_locks. This could be set to zero at any time by any other task, but if we have a race condition that very rarely causes two lockdep splats instead of just one, so what? (But I am sure that some of the people on CC will correct me if I am wrong here.) It should be OK for rcu_scheduler_active because the transition from RCU_SCHEDULER_INACTIVE to RCU_SCHEDULER_INIT happens before the first context switch, and the various barrier() calls, implied as well as explicit, should keep things straight. But I don't totally trust my analysis. Could you please get someone more gcc-savvy to review this and give their ack/review? Given that, I will queue it. Thanx, Paul Thank you for analisys. __attribute__((pure)) allows compiler to deduplicate / eliminate calls. This indeed might expand race windows when global switches like debug_locks changes their state. But strict synchronization here isn't required. --- include/linux/rcupdate.h |2 +- kernel/rcu/update.c |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index e1e5d002fdb9..9ecb3cb715bd 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -472,7 +472,7 @@ extern struct lockdep_map rcu_lock_map; extern struct lockdep_map rcu_bh_lock_map; extern struct lockdep_map rcu_sched_lock_map; extern struct lockdep_map rcu_callback_map; -int debug_lockdep_rcu_enabled(void); +int __pure debug_lockdep_rcu_enabled(void); int rcu_read_lock_held(void); int rcu_read_lock_bh_held(void); diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index 273e869ca21d..a0c30abefdcd 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -292,7 +292,7 @@ struct lockdep_map rcu_callback_map = STATIC_LOCKDEP_MAP_INIT("rcu_callback", _callback_key); EXPORT_SYMBOL_GPL(rcu_callback_map); -int notrace debug_lockdep_rcu_enabled(void) +int __pure notrace debug_lockdep_rcu_enabled(void) { return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks && current->lockdep_recursion == 0;
Re: [PATCH] rcu: mark debug_lockdep_rcu_enabled() as pure
On 20.05.2017 19:42, Paul E. McKenney wrote: On Fri, May 19, 2017 at 10:03:59AM +0300, Konstantin Khlebnikov wrote: This allows to get rid of unneeded invocations. Function debug_lockdep_rcu_enabled() becomes really hot if several debug options are enabled together with CONFIG_PROVE_RCU. Hottest path ends with: debug_lockdep_rcu_enabled is_ftrace_trampoline __kernel_text_address Here debug_lockdep_rcu_enabled() is called from condition (debug_lockdep_rcu_enabled() && !__warned && (c)) inside macro do_for_each_ftrace_op(), where "c" is false. With this patch "netperf -H localhost" shows boost from 2400 to 2500. Signed-off-by: Konstantin Khlebnikov Nice performance increase! The gcc documentation says that __attribute__((pure)) functions are supposed to have return values that depend only at the function's arguments and the values of global variables. However, it also says: Interesting non-pure functions are functions with infinite loops or those depending on volatile memory or other system resource, that may change between two consecutive calls (such as feof in a multithreading environment). This is OK for current->lockdep_recursion because this variable is changed only by the current task (I think so, anyway). It is sort of OK for debug_locks. This could be set to zero at any time by any other task, but if we have a race condition that very rarely causes two lockdep splats instead of just one, so what? (But I am sure that some of the people on CC will correct me if I am wrong here.) It should be OK for rcu_scheduler_active because the transition from RCU_SCHEDULER_INACTIVE to RCU_SCHEDULER_INIT happens before the first context switch, and the various barrier() calls, implied as well as explicit, should keep things straight. But I don't totally trust my analysis. Could you please get someone more gcc-savvy to review this and give their ack/review? Given that, I will queue it. Thanx, Paul Thank you for analisys. __attribute__((pure)) allows compiler to deduplicate / eliminate calls. This indeed might expand race windows when global switches like debug_locks changes their state. But strict synchronization here isn't required. --- include/linux/rcupdate.h |2 +- kernel/rcu/update.c |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index e1e5d002fdb9..9ecb3cb715bd 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -472,7 +472,7 @@ extern struct lockdep_map rcu_lock_map; extern struct lockdep_map rcu_bh_lock_map; extern struct lockdep_map rcu_sched_lock_map; extern struct lockdep_map rcu_callback_map; -int debug_lockdep_rcu_enabled(void); +int __pure debug_lockdep_rcu_enabled(void); int rcu_read_lock_held(void); int rcu_read_lock_bh_held(void); diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index 273e869ca21d..a0c30abefdcd 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -292,7 +292,7 @@ struct lockdep_map rcu_callback_map = STATIC_LOCKDEP_MAP_INIT("rcu_callback", _callback_key); EXPORT_SYMBOL_GPL(rcu_callback_map); -int notrace debug_lockdep_rcu_enabled(void) +int __pure notrace debug_lockdep_rcu_enabled(void) { return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks && current->lockdep_recursion == 0;
Re: [PATCH] rcu: mark debug_lockdep_rcu_enabled() as pure
On Fri, May 19, 2017 at 10:03:59AM +0300, Konstantin Khlebnikov wrote: > This allows to get rid of unneeded invocations. > > Function debug_lockdep_rcu_enabled() becomes really hot if several > debug options are enabled together with CONFIG_PROVE_RCU. > > Hottest path ends with: > debug_lockdep_rcu_enabled > is_ftrace_trampoline > __kernel_text_address > > Here debug_lockdep_rcu_enabled() is called from condition > (debug_lockdep_rcu_enabled() && !__warned && (c)) inside macro > do_for_each_ftrace_op(), where "c" is false. > > With this patch "netperf -H localhost" shows boost from 2400 to 2500. > > Signed-off-by: Konstantin KhlebnikovNice performance increase! The gcc documentation says that __attribute__((pure)) functions are supposed to have return values that depend only at the function's arguments and the values of global variables. However, it also says: Interesting non-pure functions are functions with infinite loops or those depending on volatile memory or other system resource, that may change between two consecutive calls (such as feof in a multithreading environment). This is OK for current->lockdep_recursion because this variable is changed only by the current task (I think so, anyway). It is sort of OK for debug_locks. This could be set to zero at any time by any other task, but if we have a race condition that very rarely causes two lockdep splats instead of just one, so what? (But I am sure that some of the people on CC will correct me if I am wrong here.) It should be OK for rcu_scheduler_active because the transition from RCU_SCHEDULER_INACTIVE to RCU_SCHEDULER_INIT happens before the first context switch, and the various barrier() calls, implied as well as explicit, should keep things straight. But I don't totally trust my analysis. Could you please get someone more gcc-savvy to review this and give their ack/review? Given that, I will queue it. Thanx, Paul > --- > include/linux/rcupdate.h |2 +- > kernel/rcu/update.c |2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > index e1e5d002fdb9..9ecb3cb715bd 100644 > --- a/include/linux/rcupdate.h > +++ b/include/linux/rcupdate.h > @@ -472,7 +472,7 @@ extern struct lockdep_map rcu_lock_map; > extern struct lockdep_map rcu_bh_lock_map; > extern struct lockdep_map rcu_sched_lock_map; > extern struct lockdep_map rcu_callback_map; > -int debug_lockdep_rcu_enabled(void); > +int __pure debug_lockdep_rcu_enabled(void); > > int rcu_read_lock_held(void); > int rcu_read_lock_bh_held(void); > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c > index 273e869ca21d..a0c30abefdcd 100644 > --- a/kernel/rcu/update.c > +++ b/kernel/rcu/update.c > @@ -292,7 +292,7 @@ struct lockdep_map rcu_callback_map = > STATIC_LOCKDEP_MAP_INIT("rcu_callback", _callback_key); > EXPORT_SYMBOL_GPL(rcu_callback_map); > > -int notrace debug_lockdep_rcu_enabled(void) > +int __pure notrace debug_lockdep_rcu_enabled(void) > { > return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks && > current->lockdep_recursion == 0; >
Re: [PATCH] rcu: mark debug_lockdep_rcu_enabled() as pure
On Fri, May 19, 2017 at 10:03:59AM +0300, Konstantin Khlebnikov wrote: > This allows to get rid of unneeded invocations. > > Function debug_lockdep_rcu_enabled() becomes really hot if several > debug options are enabled together with CONFIG_PROVE_RCU. > > Hottest path ends with: > debug_lockdep_rcu_enabled > is_ftrace_trampoline > __kernel_text_address > > Here debug_lockdep_rcu_enabled() is called from condition > (debug_lockdep_rcu_enabled() && !__warned && (c)) inside macro > do_for_each_ftrace_op(), where "c" is false. > > With this patch "netperf -H localhost" shows boost from 2400 to 2500. > > Signed-off-by: Konstantin Khlebnikov Nice performance increase! The gcc documentation says that __attribute__((pure)) functions are supposed to have return values that depend only at the function's arguments and the values of global variables. However, it also says: Interesting non-pure functions are functions with infinite loops or those depending on volatile memory or other system resource, that may change between two consecutive calls (such as feof in a multithreading environment). This is OK for current->lockdep_recursion because this variable is changed only by the current task (I think so, anyway). It is sort of OK for debug_locks. This could be set to zero at any time by any other task, but if we have a race condition that very rarely causes two lockdep splats instead of just one, so what? (But I am sure that some of the people on CC will correct me if I am wrong here.) It should be OK for rcu_scheduler_active because the transition from RCU_SCHEDULER_INACTIVE to RCU_SCHEDULER_INIT happens before the first context switch, and the various barrier() calls, implied as well as explicit, should keep things straight. But I don't totally trust my analysis. Could you please get someone more gcc-savvy to review this and give their ack/review? Given that, I will queue it. Thanx, Paul > --- > include/linux/rcupdate.h |2 +- > kernel/rcu/update.c |2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > index e1e5d002fdb9..9ecb3cb715bd 100644 > --- a/include/linux/rcupdate.h > +++ b/include/linux/rcupdate.h > @@ -472,7 +472,7 @@ extern struct lockdep_map rcu_lock_map; > extern struct lockdep_map rcu_bh_lock_map; > extern struct lockdep_map rcu_sched_lock_map; > extern struct lockdep_map rcu_callback_map; > -int debug_lockdep_rcu_enabled(void); > +int __pure debug_lockdep_rcu_enabled(void); > > int rcu_read_lock_held(void); > int rcu_read_lock_bh_held(void); > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c > index 273e869ca21d..a0c30abefdcd 100644 > --- a/kernel/rcu/update.c > +++ b/kernel/rcu/update.c > @@ -292,7 +292,7 @@ struct lockdep_map rcu_callback_map = > STATIC_LOCKDEP_MAP_INIT("rcu_callback", _callback_key); > EXPORT_SYMBOL_GPL(rcu_callback_map); > > -int notrace debug_lockdep_rcu_enabled(void) > +int __pure notrace debug_lockdep_rcu_enabled(void) > { > return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks && > current->lockdep_recursion == 0; >