Re: [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance

2019-08-19 Thread Paul E. McKenney
On Mon, Aug 19, 2019 at 11:46:36AM -0400, Joel Fernandes wrote:
> On Mon, Aug 19, 2019 at 07:41:08AM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 19, 2019 at 10:22:08AM -0400, Joel Fernandes wrote:
> > > On Mon, Aug 19, 2019 at 02:59:08PM +0200, Frederic Weisbecker wrote:
> > > > On Fri, Aug 16, 2019 at 09:52:42AM -0700, Paul E. McKenney wrote:
> > > > > On Fri, Aug 16, 2019 at 12:24:04PM -0400, Joel Fernandes wrote:
> > > > > > On Thu, Aug 15, 2019 at 10:53:11PM -0400, Joel Fernandes (Google) 
> > > > > > wrote:
> > > > > > > I really cannot explain this patch, but without it, the "else if" 
> > > > > > > block
> > > > > > > just doesn't execute thus causing the tick's dep mask to not be 
> > > > > > > set and
> > > > > > > causes the tick to be turned off.
> > > > > > > 
> > > > > > > I tried various _ONCE() macros but the only thing that works is 
> > > > > > > this
> > > > > > > patch.
> > > > > > > 
> > > > > > > Signed-off-by: Joel Fernandes (Google) 
> > > > > > > ---
> > > > > > >  kernel/rcu/tree.c | 3 ++-
> > > > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > index 856d3c9f1955..ac6bcf7614d7 100644
> > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > @@ -802,6 +802,7 @@ static __always_inline void 
> > > > > > > rcu_nmi_enter_common(bool irq)
> > > > > > >  {
> > > > > > >   struct rcu_data *rdp = this_cpu_ptr(_data);
> > > > > > >   long incby = 2;
> > > > > > > + int dnn = rdp->dynticks_nmi_nesting;
> > > > > > 
> > > > > > I believe the accidental sign extension / conversion from long to 
> > > > > > int was
> > > > > > giving me an illusion since things started working well. Changing 
> > > > > > the 'int
> > > > > > dnn' to 'long dnn' gives similar behavior as without this patch! At 
> > > > > > least I
> > > > > > know now. Please feel free to ignore this particular RFC patch 
> > > > > > while I debug
> > > > > > this more (over the weekend or early next week). The first 2 
> > > > > > patches are
> > > > > > good, just ignore this one.
> > > > > 
> > > > > Ah, good point on the type!  So you were ending up with zero due to 
> > > > > the
> > > > > low-order 32 bits of DYNTICK_IRQ_NONIDLE being zero, correct?  If so,
> > > > > the "!rdp->dynticks_nmi_nesting" instead needs to be something like
> > > > > "rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE", which sounds like
> > > > > > it is actually worse than the earlier comparison against the constant
> > > > > 2.
> > > > > 
> > > > > Sounds like I should revert the -rcu commit 805a16eaefc3 ("rcu: Force
> > > > > nohz_full tick on upon irq enter instead of exit").
> > > > 
> > > > I can't find that patch so all I can say so far is that its title 
> > > > doesn't
> > > > inspire me much. Do you still need that change for some reason?
> > > 
> > > No we don't need it. Paul's dev branch fixed it by checking 
> > > DYNTICK_IRQ_NONIDLE:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=227482fd4f3ede0502b586da28a59971dfbac0b0
> > 
> > Ah, so you have tested reverting this?  If so, thank you very much!
> 
> Just tried reverting, and found a bug if done in the reverted way. Sent you
> email with a proposed change which is essentially the top of tree:
> https://github.com/joelagnel/linux-kernel/commits/rcu/nohz-test-3
> 
> Also for Frederic, I wanted to mention why my pure hack above (dnn variable)
> seemed to work. The reason was because of long to int conversion of
> rdp->dynticks_nmi_nesting which I surprisingly did not get a compiler warning
> for. dynticks_nmi_nesting getting converted to int was truncating the
> DYNTICK_IRQ_NONIDLE bit (in fact I believe this was due to the cltq
> instruction in x86). This caused the "else if" condition to always evaluate
> to true and turn off the tick.
> 
> Paul, I wanted to see if I can create a repeatable test case for this issue.
> Not a full blown RCU torture test, but something that one could run and get a
> PASS or FAIL. Do you think this could be useful? And what is the best place
> for such a test?
> Essentially the test would be:
> 1. Run a test and dump some traces.
> 2. Parse the traces and see if things are sane (such as the tick not turning
>    off for this issue).
> 3. Report pass or fail.
> 
> The other way instead of parsing traces could be, a kernel module that does
> trace_probe_register on various tracepoints and tries to see if the tick
> indeed could stay turned on. Then report pass/fail at the end of the module's
> execution.

Or you could increment a per-CPU counter in rcu_sched_clock_irq() and use
that to verify the tick.  Maybe you could use the existing ->ticks_this_gp,
though that does get zeroed at the beginning of each grace period, which
would make sampling it a bit trickier.
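
A minimal user-space sketch of the counter idea follows.  The names
(model_tick_count, model_tick(), model_sample(), model_tick_ran()) are
hypothetical stand-ins rather than kernel APIs; in a real test the
increment would live in rcu_sched_clock_irq() and the counter would be
per-CPU data.  A dedicated monotonic counter is assumed here, which
sidesteps the zeroing that ->ticks_this_gp undergoes at the start of
each grace period.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 4	/* Hypothetical CPU count for this model. */

/* Stand-in for a per-CPU counter; never reset, unlike ->ticks_this_gp. */
static unsigned long model_tick_count[MODEL_NR_CPUS];

/* Stand-in for the increment that would be added to the tick handler. */
static void model_tick(int cpu)
{
	model_tick_count[cpu]++;
}

/* Snapshot the counter for one CPU. */
static unsigned long model_sample(int cpu)
{
	return model_tick_count[cpu];
}

/*
 * PASS/FAIL check: did at least min_ticks ticks occur between the two
 * samples?  Unsigned subtraction keeps this correct across wraparound.
 */
static bool model_tick_ran(unsigned long before, unsigned long after,
			   unsigned long min_ticks)
{
	return after - before >= min_ticks;
}
```

A test would sample before and after the interval of interest and
report FAIL if model_tick_ran() returns false for a CPU on which the
tick was expected to stay on.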

Thanx, Paul



Re: [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance

2019-08-19 Thread Joel Fernandes
On Mon, Aug 19, 2019 at 07:41:08AM -0700, Paul E. McKenney wrote:
> On Mon, Aug 19, 2019 at 10:22:08AM -0400, Joel Fernandes wrote:
> > On Mon, Aug 19, 2019 at 02:59:08PM +0200, Frederic Weisbecker wrote:
> > > On Fri, Aug 16, 2019 at 09:52:42AM -0700, Paul E. McKenney wrote:
> > > > On Fri, Aug 16, 2019 at 12:24:04PM -0400, Joel Fernandes wrote:
> > > > > On Thu, Aug 15, 2019 at 10:53:11PM -0400, Joel Fernandes (Google) 
> > > > > wrote:
> > > > > > I really cannot explain this patch, but without it, the "else if" 
> > > > > > block
> > > > > > just doesn't execute thus causing the tick's dep mask to not be set 
> > > > > > and
> > > > > > causes the tick to be turned off.
> > > > > > 
> > > > > > I tried various _ONCE() macros but the only thing that works is this
> > > > > > patch.
> > > > > > 
> > > > > > Signed-off-by: Joel Fernandes (Google) 
> > > > > > ---
> > > > > >  kernel/rcu/tree.c | 3 ++-
> > > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index 856d3c9f1955..ac6bcf7614d7 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -802,6 +802,7 @@ static __always_inline void 
> > > > > > rcu_nmi_enter_common(bool irq)
> > > > > >  {
> > > > > > > struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > > > > long incby = 2;
> > > > > > +   int dnn = rdp->dynticks_nmi_nesting;
> > > > > 
> > > > > I believe the accidental sign extension / conversion from long to int 
> > > > > was
> > > > > giving me an illusion since things started working well. Changing the 
> > > > > 'int
> > > > > dnn' to 'long dnn' gives similar behavior as without this patch! At 
> > > > > least I
> > > > > know now. Please feel free to ignore this particular RFC patch while 
> > > > > I debug
> > > > > this more (over the weekend or early next week). The first 2 patches 
> > > > > are
> > > > > good, just ignore this one.
> > > > 
> > > > Ah, good point on the type!  So you were ending up with zero due to the
> > > > low-order 32 bits of DYNTICK_IRQ_NONIDLE being zero, correct?  If so,
> > > > the "!rdp->dynticks_nmi_nesting" instead needs to be something like
> > > > "rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE", which sounds like
> > > > > it is actually worse than the earlier comparison against the constant 2.
> > > > 
> > > > Sounds like I should revert the -rcu commit 805a16eaefc3 ("rcu: Force
> > > > nohz_full tick on upon irq enter instead of exit").
> > > 
> > > I can't find that patch so all I can say so far is that its title doesn't
> > > inspire me much. Do you still need that change for some reason?
> > 
> > No we don't need it. Paul's dev branch fixed it by checking 
> > DYNTICK_IRQ_NONIDLE:
> > https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=227482fd4f3ede0502b586da28a59971dfbac0b0
> 
> Ah, so you have tested reverting this?  If so, thank you very much!

Just tried reverting, and found a bug if done in the reverted way. Sent you
email with a proposed change which is essentially the top of tree:
https://github.com/joelagnel/linux-kernel/commits/rcu/nohz-test-3

Also for Frederic, I wanted to mention why my pure hack above (dnn variable)
seemed to work. The reason was because of long to int conversion of
rdp->dynticks_nmi_nesting which I surprisingly did not get a compiler warning
for. dynticks_nmi_nesting getting converted to int was truncating the
DYNTICK_IRQ_NONIDLE bit (in fact I believe this was due to the cltq
instruction in x86). This caused the "else if" condition to always evaluate
to true and turn off the tick.
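
To make the truncation concrete, here is a small user-space sketch.  It
assumes a 64-bit long (modeled with int64_t) and a 32-bit int (modeled
with int32_t), and it relies on the wrapping behavior that mainstream
compilers such as GCC define for out-of-range narrowing conversions.
NONIDLE_64 mirrors the kernel's DYNTICK_IRQ_NONIDLE definition of
((LONG_MAX / 2) + 1); the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* 0x4000000000000000: only bit 62 set, so the low-order 32 bits are zero. */
#define NONIDLE_64 ((INT64_MAX / 2) + 1)

/* What "!rdp->dynticks_nmi_nesting" sees with the full-width value. */
static bool zero_as_long(int64_t nesting)
{
	return !nesting;
}

/* What "!dnn" sees after "int dnn = rdp->dynticks_nmi_nesting;". */
static bool zero_as_int(int64_t nesting)
{
	int32_t dnn = (int32_t)nesting;	/* Drops the upper 32 bits. */

	return !dnn;
}
```

With nesting equal to NONIDLE_64, zero_as_long() returns false while
zero_as_int() returns true, which is why the 'int dnn' version took the
"else if" branch and the 'long dnn' version did not.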

Paul, I wanted to see if I can create a repeatable test case for this issue.
Not a full blown RCU torture test, but something that one could run and get a
PASS or FAIL. Do you think this could be useful? And what is the best place
for such a test?
Essentially the test would be:
1. Run a test and dump some traces.
2. Parse the traces and see if things are sane (such as the tick not turning
   off for this issue).
3. Report pass or fail.

The other way instead of parsing traces could be, a kernel module that does
trace_probe_register on various tracepoints and tries to see if the tick
indeed could stay turned on. Then report pass/fail at the end of the module's
execution.

thanks,

 - Joel



Re: [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance

2019-08-19 Thread Paul E. McKenney
On Mon, Aug 19, 2019 at 10:22:08AM -0400, Joel Fernandes wrote:
> On Mon, Aug 19, 2019 at 02:59:08PM +0200, Frederic Weisbecker wrote:
> > On Fri, Aug 16, 2019 at 09:52:42AM -0700, Paul E. McKenney wrote:
> > > On Fri, Aug 16, 2019 at 12:24:04PM -0400, Joel Fernandes wrote:
> > > > On Thu, Aug 15, 2019 at 10:53:11PM -0400, Joel Fernandes (Google) wrote:
> > > > > I really cannot explain this patch, but without it, the "else if" 
> > > > > block
> > > > > just doesn't execute thus causing the tick's dep mask to not be set 
> > > > > and
> > > > > causes the tick to be turned off.
> > > > > 
> > > > > I tried various _ONCE() macros but the only thing that works is this
> > > > > patch.
> > > > > 
> > > > > Signed-off-by: Joel Fernandes (Google) 
> > > > > ---
> > > > >  kernel/rcu/tree.c | 3 ++-
> > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 856d3c9f1955..ac6bcf7614d7 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -802,6 +802,7 @@ static __always_inline void 
> > > > > rcu_nmi_enter_common(bool irq)
> > > > >  {
> > > > >   struct rcu_data *rdp = this_cpu_ptr(_data);
> > > > >   long incby = 2;
> > > > > + int dnn = rdp->dynticks_nmi_nesting;
> > > > 
> > > > I believe the accidental sign extension / conversion from long to int 
> > > > was
> > > > giving me an illusion since things started working well. Changing the 
> > > > 'int
> > > > dnn' to 'long dnn' gives similar behavior as without this patch! At 
> > > > least I
> > > > know now. Please feel free to ignore this particular RFC patch while I 
> > > > debug
> > > > this more (over the weekend or early next week). The first 2 patches are
> > > > good, just ignore this one.
> > > 
> > > Ah, good point on the type!  So you were ending up with zero due to the
> > > low-order 32 bits of DYNTICK_IRQ_NONIDLE being zero, correct?  If so,
> > > the "!rdp->dynticks_nmi_nesting" instead needs to be something like
> > > "rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE", which sounds like
> > > > it is actually worse than the earlier comparison against the constant 2.
> > > 
> > > Sounds like I should revert the -rcu commit 805a16eaefc3 ("rcu: Force
> > > nohz_full tick on upon irq enter instead of exit").
> > 
> > I can't find that patch so all I can say so far is that its title doesn't
> > inspire me much. Do you still need that change for some reason?
> 
> No we don't need it. Paul's dev branch fixed it by checking 
> DYNTICK_IRQ_NONIDLE:
> https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=227482fd4f3ede0502b586da28a59971dfbac0b0

Ah, so you have tested reverting this?  If so, thank you very much!

Thanx, Paul


Re: [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance

2019-08-19 Thread Paul E. McKenney
On Mon, Aug 19, 2019 at 02:59:08PM +0200, Frederic Weisbecker wrote:
> On Fri, Aug 16, 2019 at 09:52:42AM -0700, Paul E. McKenney wrote:
> > On Fri, Aug 16, 2019 at 12:24:04PM -0400, Joel Fernandes wrote:
> > > On Thu, Aug 15, 2019 at 10:53:11PM -0400, Joel Fernandes (Google) wrote:
> > > > I really cannot explain this patch, but without it, the "else if" block
> > > > just doesn't execute thus causing the tick's dep mask to not be set and
> > > > causes the tick to be turned off.
> > > > 
> > > > I tried various _ONCE() macros but the only thing that works is this
> > > > patch.
> > > > 
> > > > Signed-off-by: Joel Fernandes (Google) 
> > > > ---
> > > >  kernel/rcu/tree.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 856d3c9f1955..ac6bcf7614d7 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -802,6 +802,7 @@ static __always_inline void 
> > > > rcu_nmi_enter_common(bool irq)
> > > >  {
> > > > > struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > > long incby = 2;
> > > > +   int dnn = rdp->dynticks_nmi_nesting;
> > > 
> > > I believe the accidental sign extension / conversion from long to int was
> > > giving me an illusion since things started working well. Changing the 'int
> > > dnn' to 'long dnn' gives similar behavior as without this patch! At least 
> > > I
> > > know now. Please feel free to ignore this particular RFC patch while I 
> > > debug
> > > this more (over the weekend or early next week). The first 2 patches are
> > > good, just ignore this one.
> > 
> > Ah, good point on the type!  So you were ending up with zero due to the
> > low-order 32 bits of DYNTICK_IRQ_NONIDLE being zero, correct?  If so,
> > the "!rdp->dynticks_nmi_nesting" instead needs to be something like
> > "rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE", which sounds like
> > > it is actually worse than the earlier comparison against the constant 2.
> > 
> > Sounds like I should revert the -rcu commit 805a16eaefc3 ("rcu: Force
> > nohz_full tick on upon irq enter instead of exit").
> 
> I can't find that patch so all I can say so far is that its title doesn't
> inspire me much. Do you still need that change for some reason?

It is in -rcu branch dev, but has been rebased.  The current version
is 227482fd4f3e ("rcu: Force nohz_full tick on upon irq enter instead
of exit").

It is not yet clear to me whether this is needed or not.  I -think- that
it is not, but without it, it is possible that some chain of events would
result in the rcu_data structure's ->rcu_urgent_qs field being cleared
before the interrupt-exit code could sample it, which might possibly
result in the tick remaining off.

Thanx, Paul


Re: [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance

2019-08-19 Thread Joel Fernandes
On Mon, Aug 19, 2019 at 02:59:08PM +0200, Frederic Weisbecker wrote:
> On Fri, Aug 16, 2019 at 09:52:42AM -0700, Paul E. McKenney wrote:
> > On Fri, Aug 16, 2019 at 12:24:04PM -0400, Joel Fernandes wrote:
> > > On Thu, Aug 15, 2019 at 10:53:11PM -0400, Joel Fernandes (Google) wrote:
> > > > I really cannot explain this patch, but without it, the "else if" block
> > > > just doesn't execute thus causing the tick's dep mask to not be set and
> > > > causes the tick to be turned off.
> > > > 
> > > > I tried various _ONCE() macros but the only thing that works is this
> > > > patch.
> > > > 
> > > > Signed-off-by: Joel Fernandes (Google) 
> > > > ---
> > > >  kernel/rcu/tree.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 856d3c9f1955..ac6bcf7614d7 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -802,6 +802,7 @@ static __always_inline void 
> > > > rcu_nmi_enter_common(bool irq)
> > > >  {
> > > > > struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > > long incby = 2;
> > > > +   int dnn = rdp->dynticks_nmi_nesting;
> > > 
> > > I believe the accidental sign extension / conversion from long to int was
> > > giving me an illusion since things started working well. Changing the 'int
> > > dnn' to 'long dnn' gives similar behavior as without this patch! At least 
> > > I
> > > know now. Please feel free to ignore this particular RFC patch while I 
> > > debug
> > > this more (over the weekend or early next week). The first 2 patches are
> > > good, just ignore this one.
> > 
> > Ah, good point on the type!  So you were ending up with zero due to the
> > low-order 32 bits of DYNTICK_IRQ_NONIDLE being zero, correct?  If so,
> > the "!rdp->dynticks_nmi_nesting" instead needs to be something like
> > "rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE", which sounds like
> > > it is actually worse than the earlier comparison against the constant 2.
> > 
> > Sounds like I should revert the -rcu commit 805a16eaefc3 ("rcu: Force
> > nohz_full tick on upon irq enter instead of exit").
> 
> I can't find that patch so all I can say so far is that its title doesn't
> inspire me much. Do you still need that change for some reason?

No we don't need it. Paul's dev branch fixed it by checking DYNTICK_IRQ_NONIDLE:
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=227482fd4f3ede0502b586da28a59971dfbac0b0

thanks,

 - Joel



Re: [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance

2019-08-19 Thread Frederic Weisbecker
On Fri, Aug 16, 2019 at 09:52:42AM -0700, Paul E. McKenney wrote:
> On Fri, Aug 16, 2019 at 12:24:04PM -0400, Joel Fernandes wrote:
> > On Thu, Aug 15, 2019 at 10:53:11PM -0400, Joel Fernandes (Google) wrote:
> > > I really cannot explain this patch, but without it, the "else if" block
> > > just doesn't execute thus causing the tick's dep mask to not be set and
> > > causes the tick to be turned off.
> > > 
> > > I tried various _ONCE() macros but the only thing that works is this
> > > patch.
> > > 
> > > Signed-off-by: Joel Fernandes (Google) 
> > > ---
> > >  kernel/rcu/tree.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 856d3c9f1955..ac6bcf7614d7 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -802,6 +802,7 @@ static __always_inline void rcu_nmi_enter_common(bool 
> > > irq)
> > >  {
> > >   struct rcu_data *rdp = this_cpu_ptr(_data);
> > >   long incby = 2;
> > > + int dnn = rdp->dynticks_nmi_nesting;
> > 
> > I believe the accidental sign extension / conversion from long to int was
> > giving me an illusion since things started working well. Changing the 'int
> > dnn' to 'long dnn' gives similar behavior as without this patch! At least I
> > know now. Please feel free to ignore this particular RFC patch while I debug
> > this more (over the weekend or early next week). The first 2 patches are
> > good, just ignore this one.
> 
> Ah, good point on the type!  So you were ending up with zero due to the
> low-order 32 bits of DYNTICK_IRQ_NONIDLE being zero, correct?  If so,
> the "!rdp->dynticks_nmi_nesting" instead needs to be something like
> "rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE", which sounds like
> > it is actually worse than the earlier comparison against the constant 2.
> 
> Sounds like I should revert the -rcu commit 805a16eaefc3 ("rcu: Force
> nohz_full tick on upon irq enter instead of exit").

I can't find that patch so all I can say so far is that its title doesn't
inspire me much. Do you still need that change for some reason?

Thanks.


Re: [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance

2019-08-16 Thread Paul E. McKenney
On Fri, Aug 16, 2019 at 01:07:00PM -0400, Joel Fernandes wrote:
> On Fri, Aug 16, 2019 at 09:52:42AM -0700, Paul E. McKenney wrote:
> > On Fri, Aug 16, 2019 at 12:24:04PM -0400, Joel Fernandes wrote:
> > > On Thu, Aug 15, 2019 at 10:53:11PM -0400, Joel Fernandes (Google) wrote:
> > > > I really cannot explain this patch, but without it, the "else if" block
> > > > just doesn't execute thus causing the tick's dep mask to not be set and
> > > > causes the tick to be turned off.
> > > > 
> > > > I tried various _ONCE() macros but the only thing that works is this
> > > > patch.
> > > > 
> > > > Signed-off-by: Joel Fernandes (Google) 
> > > > ---
> > > >  kernel/rcu/tree.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 856d3c9f1955..ac6bcf7614d7 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -802,6 +802,7 @@ static __always_inline void 
> > > > rcu_nmi_enter_common(bool irq)
> > > >  {
> > > > > struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > > long incby = 2;
> > > > +   int dnn = rdp->dynticks_nmi_nesting;
> > > 
> > > I believe the accidental sign extension / conversion from long to int was
> > > giving me an illusion since things started working well. Changing the 'int
> > > dnn' to 'long dnn' gives similar behavior as without this patch! At least 
> > > I
> > > know now. Please feel free to ignore this particular RFC patch while I 
> > > debug
> > > this more (over the weekend or early next week). The first 2 patches are
> > > good, just ignore this one.
> > 
> > Ah, good point on the type!  So you were ending up with zero due to the
> > low-order 32 bits of DYNTICK_IRQ_NONIDLE being zero, correct?  If so,
> > the "!rdp->dynticks_nmi_nesting" instead needs to be something like
> > "rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE", which sounds like
> > > it is actually worse than the earlier comparison against the constant 2.
> > 
> > Sounds like I should revert the -rcu commit 805a16eaefc3 ("rcu: Force
> > nohz_full tick on upon irq enter instead of exit").
> 
> I think just doing "== DYNTICK_IRQ_NONIDLE" as you mentioned should
> make it work. I'll test that soon, thanks!

My first step is indeed to add "== DYNTICK_IRQ_NONIDLE".

> I would prefer not to revert that commit, and just make the above change.
> Just because I feel this is safer. Since the tick is turned off in the IRQ
> exit path, I am a bit worried about timing (does the tick turn off before RCU
> sees the IRQ exit, or after it?). Either way, doing it on IRQ entry makes the
> question irrelevant and immune to future changes in the timing.

Well, comparing to 0x2 is probably cheaper than comparing to
0x4000000000000000 on most architectures.  Probably not a really big
deal, but this is after all the interrupt-entry fastpath.  Or is the
compiler trickier than I am giving it credit for?  (Still, seems like
comparing to 0x2 is one small instruction and to 0x4000000000000000 is
one big instruction at the very least, and probably several instructions
on many architectures.)

But to your point, if it is absolutely necessary to turn on the tick
at interrupt entry, then the larger comparison cannot be avoided.

> Would you think the check for the nesting variable is more expensive to do on
> IRQ entry than exit? If so, we could discuss doing it in the exit path,
> otherwise we could do it on entry with just the above change in the equality
> condition.

Yes, but let's see what is possible.  ;-)

Thanx, Paul


Re: [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance

2019-08-16 Thread Joel Fernandes
On Fri, Aug 16, 2019 at 09:52:42AM -0700, Paul E. McKenney wrote:
> On Fri, Aug 16, 2019 at 12:24:04PM -0400, Joel Fernandes wrote:
> > On Thu, Aug 15, 2019 at 10:53:11PM -0400, Joel Fernandes (Google) wrote:
> > > I really cannot explain this patch, but without it, the "else if" block
> > > just doesn't execute thus causing the tick's dep mask to not be set and
> > > causes the tick to be turned off.
> > > 
> > > I tried various _ONCE() macros but the only thing that works is this
> > > patch.
> > > 
> > > Signed-off-by: Joel Fernandes (Google) 
> > > ---
> > >  kernel/rcu/tree.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 856d3c9f1955..ac6bcf7614d7 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -802,6 +802,7 @@ static __always_inline void rcu_nmi_enter_common(bool 
> > > irq)
> > >  {
> > >   struct rcu_data *rdp = this_cpu_ptr(_data);
> > >   long incby = 2;
> > > + int dnn = rdp->dynticks_nmi_nesting;
> > 
> > I believe the accidental sign extension / conversion from long to int was
> > giving me an illusion since things started working well. Changing the 'int
> > dnn' to 'long dnn' gives similar behavior as without this patch! At least I
> > know now. Please feel free to ignore this particular RFC patch while I debug
> > this more (over the weekend or early next week). The first 2 patches are
> > good, just ignore this one.
> 
> Ah, good point on the type!  So you were ending up with zero due to the
> low-order 32 bits of DYNTICK_IRQ_NONIDLE being zero, correct?  If so,
> the "!rdp->dynticks_nmi_nesting" instead needs to be something like
> "rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE", which sounds like
> > it is actually worse than the earlier comparison against the constant 2.
> 
> Sounds like I should revert the -rcu commit 805a16eaefc3 ("rcu: Force
> nohz_full tick on upon irq enter instead of exit").

I think just doing "== DYNTICK_IRQ_NONIDLE" as you mentioned should
make it work. I'll test that soon, thanks!

I would prefer not to revert that commit, and just make the above change.
Just because I feel this is safer. Since the tick is turned off in the IRQ
exit path, I am a bit worried about timing (does the tick turn off before RCU
sees the IRQ exit, or after it?). Either way, doing it on IRQ entry makes the
question irrelevant and immune to future changes in the timing.

Would you think the check for the nesting variable is more expensive to do on
IRQ entry than exit? If so, we could discuss doing it in the exit path,
otherwise we could do it on entry with just the above change in the equality
condition.

thanks,

 - Joel

> 
>   Thanx, Paul
> 
> > thanks,
> > 
> >  - Joel
> > 
> > 
> > >  
> > >   /* Complain about underflow. */
> > >   WARN_ON_ONCE(rdp->dynticks_nmi_nesting < 0);
> > > @@ -826,7 +827,7 @@ static __always_inline void rcu_nmi_enter_common(bool 
> > > irq)
> > >  
> > >   incby = 1;
> > >   } else if (tick_nohz_full_cpu(rdp->cpu) &&
> > > -!rdp->dynticks_nmi_nesting &&
> > > +!dnn &&
> > >  rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
> > >   rdp->rcu_forced_tick = true;
> > >   tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
> > > -- 
> > > 2.23.0.rc1.153.gdeed80330f-goog
> > > 
> > 


Re: [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance

2019-08-16 Thread Paul E. McKenney
On Fri, Aug 16, 2019 at 12:24:04PM -0400, Joel Fernandes wrote:
> On Thu, Aug 15, 2019 at 10:53:11PM -0400, Joel Fernandes (Google) wrote:
> > I really cannot explain this patch, but without it, the "else if" block
> > just doesn't execute thus causing the tick's dep mask to not be set and
> > causes the tick to be turned off.
> > 
> > I tried various _ONCE() macros but the only thing that works is this
> > patch.
> > 
> > Signed-off-by: Joel Fernandes (Google) 
> > ---
> >  kernel/rcu/tree.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 856d3c9f1955..ac6bcf7614d7 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -802,6 +802,7 @@ static __always_inline void rcu_nmi_enter_common(bool 
> > irq)
> >  {
> > > struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > long incby = 2;
> > +   int dnn = rdp->dynticks_nmi_nesting;
> 
> I believe the accidental sign extension / conversion from long to int was
> giving me an illusion since things started working well. Changing the 'int
> dnn' to 'long dnn' gives similar behavior as without this patch! At least I
> know now. Please feel free to ignore this particular RFC patch while I debug
> this more (over the weekend or early next week). The first 2 patches are
> good, just ignore this one.

Ah, good point on the type!  So you were ending up with zero due to the
low-order 32 bits of DYNTICK_IRQ_NONIDLE being zero, correct?  If so,
the "!rdp->dynticks_nmi_nesting" instead needs to be something like
"rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE", which sounds like
it is actually worse than the earlier comparison against the constant 2.
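
As a rough user-space model of that suggested condition (a sketch only,
not the code in kernel/rcu/tree.c): struct model_rdp and
model_should_force_tick() are hypothetical names, the nohz_full field
stands in for tick_nohz_full_cpu(rdp->cpu), and the preceding "if"
branch of rcu_nmi_enter_common() is left out.

```c
#include <limits.h>
#include <stdbool.h>

/* Mirrors the kernel definition; the value depends on the width of long. */
#define DYNTICK_IRQ_NONIDLE ((LONG_MAX / 2) + 1)

/* Hypothetical model of the rcu_data fields that the branch consults. */
struct model_rdp {
	long dynticks_nmi_nesting;
	bool nohz_full;		/* Stands in for tick_nohz_full_cpu(). */
	bool rcu_urgent_qs;
	bool rcu_forced_tick;
};

/* The "else if" test with "== DYNTICK_IRQ_NONIDLE" replacing "!nesting". */
static bool model_should_force_tick(const struct model_rdp *rdp)
{
	return rdp->nohz_full &&
	       rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE &&
	       rdp->rcu_urgent_qs && !rdp->rcu_forced_tick;
}
```

In this model the branch fires only for the first interrupt taken from
non-idle context on a nohz_full CPU that has an urgent quiescent-state
request and no tick already forced.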

Sounds like I should revert the -rcu commit 805a16eaefc3 ("rcu: Force
nohz_full tick on upon irq enter instead of exit").

Or would that once again cause RCU to fail to enable the tick?

Thanx, Paul

> thanks,
> 
>  - Joel
> 
> 
> >  
> > /* Complain about underflow. */
> > WARN_ON_ONCE(rdp->dynticks_nmi_nesting < 0);
> > @@ -826,7 +827,7 @@ static __always_inline void rcu_nmi_enter_common(bool 
> > irq)
> >  
> > incby = 1;
> > } else if (tick_nohz_full_cpu(rdp->cpu) &&
> > -  !rdp->dynticks_nmi_nesting &&
> > +  !dnn &&
> >rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
> > rdp->rcu_forced_tick = true;
> > tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
> > -- 
> > 2.23.0.rc1.153.gdeed80330f-goog
> > 
> 


Re: [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance

2019-08-16 Thread Joel Fernandes
On Thu, Aug 15, 2019 at 10:53:11PM -0400, Joel Fernandes (Google) wrote:
> I really cannot explain this patch, but without it, the "else if" block
> just doesn't execute thus causing the tick's dep mask to not be set and
> causes the tick to be turned off.
> 
> I tried various _ONCE() macros but the only thing that works is this
> patch.
> 
> Signed-off-by: Joel Fernandes (Google) 
> ---
>  kernel/rcu/tree.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 856d3c9f1955..ac6bcf7614d7 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -802,6 +802,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
>  {
>   struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
>   long incby = 2;
> + int dnn = rdp->dynticks_nmi_nesting;

I believe the accidental sign extension / conversion from long to int was
giving me an illusion since things started working well. Changing the 'int
dnn' to 'long dnn' gives similar behavior as without this patch! At least I
know now. Please feel free to ignore this particular RFC patch while I debug
this more (over the weekend or early next week). The first 2 patches are
good, just ignore this one.

thanks,

 - Joel


>  
>   /* Complain about underflow. */
>   WARN_ON_ONCE(rdp->dynticks_nmi_nesting < 0);
> @@ -826,7 +827,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
>  
>   incby = 1;
>   } else if (tick_nohz_full_cpu(rdp->cpu) &&
> -!rdp->dynticks_nmi_nesting &&
> +!dnn &&
>  rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
>   rdp->rcu_forced_tick = true;
>   tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
> -- 
> 2.23.0.rc1.153.gdeed80330f-goog
>