Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Peter Zijlstra
On Fri, Jan 31, 2014 at 09:23:52PM +0100, Sebastian Andrzej Siewior wrote:
> Okay, this makes sense. What looked odd was the powerpc implementation
> where they let the timer interrupt expire and call the irq_work from
> the timer interrupt just before the clockevents callback is executed
> (which would invoke the irq_work callback as well).

Ah, so what I remember Paul Mackerras saying was that that was the
easiest way to trigger an interrupt on the local CPU.

ARM runs the things from every IRQ tail IIRC (or maybe only the PMI, I
forget).

> Would it be a win if we would remove the arch specific code and instead
> raise the timer interrupt asap? It sure won't be a win or change for
> -RT but it would allow all architectures to get irq_work done as soon
> as possible in IRQ context (and out of NMI context if I understand
> correct) without an arch specific implementation.

No, because poking at timer hardware on x86 is stupid expensive, whereas
sending a self-IPI through the APIC is tons cheaper.

Also, I don't really want to poke at the fragile x86 timer hardware from
NMI context while we might already be poking at it from another context.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Sebastian Andrzej Siewior
On 01/31/2014 09:05 PM, Peter Zijlstra wrote:
> On Fri, Jan 31, 2014 at 08:48:45PM +0100, Sebastian Andrzej Siewior wrote:
>>
>> How "bad" is it? Is this something generic or just not getting
>> perf events fast enough out? Most users don't seem to require small
>> latencies.
> 
> I have vague memories of there being an actual perf problem if there's a
> hole between the NMI/IRQ triggering the irq_work and the interrupt
> running the work.
> 
> I should have some notes on it somewhere and an todo entry to plug the
> hole.
> 
> But note that the MCE code also uses irq_work, they really _need_ to be
> fast because the system might be crumbling under their feet.

Okay, this makes sense. What looked odd was the powerpc implementation
where they let the timer interrupt expire and call the irq_work from
the timer interrupt just before the clockevents callback is executed
(which would invoke the irq_work callback as well).

Would it be a win if we would remove the arch specific code and instead
raise the timer interrupt asap? It sure won't be a win or change for
-RT but it would allow all architectures to get irq_work done as soon
as possible in IRQ context (and out of NMI context if I understand
correct) without an arch specific implementation.

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Peter Zijlstra
On Fri, Jan 31, 2014 at 08:48:45PM +0100, Sebastian Andrzej Siewior wrote:
> 
> How "bad" is it? Is this something generic or just not getting
> perf events fast enough out? Most users don't seem to require small
> latencies.

I have vague memories of there being an actual perf problem if there's a
hole between the NMI/IRQ triggering the irq_work and the interrupt
running the work.

I should have some notes on it somewhere and an todo entry to plug the
hole.

But note that the MCE code also uses irq_work, they really _need_ to be
fast because the system might be crumbling under their feet.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Steven Rostedt
On Fri, 31 Jan 2014 20:48:45 +0100
Sebastian Andrzej Siewior  wrote:

> > As I have worked on code that uses irq_work() I can say that we want
> > the arch specific interrupts. For those architectures that don't have
> > it will experience larger latencies for the work required. It's
> > basically, a "too bad" for them.
> 
> How "bad" is it? Is this something generic or just not getting
> perf events fast enough out? Most users don't seem to require small
> latencies.

I use it for waking up trace-cmd, and if it is too long, then it we can
miss lots of events. Same goes for perf.

Remember, irq_work can be triggered from NMI context. That means, if
the CPU is idle, it may be several seconds before that work happens.
That is WAY too long to wait.

Anyway, your suggestion to get rid of the arch code doesn't help. We
call the irq_work_run() from interrupt context whether there is work or
not, with or without removing the arch code. The only thing your
suggestion will do is to push the work from happening immediately to
happening on the timer tick (which may be several seconds later). I
don't see the savings.

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Peter Zijlstra
On Fri, Jan 31, 2014 at 02:34:41PM -0500, Steven Rostedt wrote:
> On Fri, 31 Jan 2014 20:26:54 +0100
> Sebastian Andrzej Siewior  wrote:
> 
> > On 01/31/2014 06:57 PM, Steven Rostedt wrote:
> > 
> > > In vanilla Linux, irq_work_run() is called from update_process_times()
> > > when it is called from the timer interrupt. In -rt, there's reasons we
> > 
> > and in vanilla Linux some architectures (like x86 or sparc to name just
> > a few) overwrite arch_irq_work_raise() which means they provide
> > their "own" interrupt like callback. That means on those architectures
> > irq_work_run() gets invoked twice: once via update_process_times() and
> > via and once the custom interface.
> > So my question to the original inventor of this code: Peter, do we
> > really need that arch specific callback? Wouldn't one be enough? Is it
> > that critical that it can't wait to the next timer tick?
> 
> There's flags that determine when the next call should be invoked. The
> irq_work_run() should return immediately if it was already done by the
> arch specific call. The work wont be called twice.
> 
> As I have worked on code that uses irq_work() I can say that we want
> the arch specific interrupts. For those architectures that don't have
> it will experience larger latencies for the work required. It's
> basically, a "too bad" for them.
> 
> But to answer your question, no we want the immediate response.

Yah, what Steve said. That fallback is really a you suck to have to use
this.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Sebastian Andrzej Siewior
On 01/31/2014 08:34 PM, Steven Rostedt wrote:
> There's flags that determine when the next call should be invoked. The
> irq_work_run() should return immediately if it was already done by the
> arch specific call. The work wont be called twice.

Well, it is called twice. It just does nothing because the list is
empty & returns.

> As I have worked on code that uses irq_work() I can say that we want
> the arch specific interrupts. For those architectures that don't have
> it will experience larger latencies for the work required. It's
> basically, a "too bad" for them.

How "bad" is it? Is this something generic or just not getting
perf events fast enough out? Most users don't seem to require small
latencies.

> But to answer your question, no we want the immediate response.
> 
> -- Steve

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Steven Rostedt
On Fri, 31 Jan 2014 20:26:54 +0100
Sebastian Andrzej Siewior  wrote:

> On 01/31/2014 06:57 PM, Steven Rostedt wrote:
> 
> > In vanilla Linux, irq_work_run() is called from update_process_times()
> > when it is called from the timer interrupt. In -rt, there's reasons we
> 
> and in vanilla Linux some architectures (like x86 or sparc to name just
> a few) overwrite arch_irq_work_raise() which means they provide
> their "own" interrupt like callback. That means on those architectures
> irq_work_run() gets invoked twice: once via update_process_times() and
> via and once the custom interface.
> So my question to the original inventor of this code: Peter, do we
> really need that arch specific callback? Wouldn't one be enough? Is it
> that critical that it can't wait to the next timer tick?

There's flags that determine when the next call should be invoked. The
irq_work_run() should return immediately if it was already done by the
arch specific call. The work wont be called twice.

As I have worked on code that uses irq_work() I can say that we want
the arch specific interrupts. For those architectures that don't have
it will experience larger latencies for the work required. It's
basically, a "too bad" for them.

But to answer your question, no we want the immediate response.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Sebastian Andrzej Siewior
On 01/31/2014 06:57 PM, Steven Rostedt wrote:

> In vanilla Linux, irq_work_run() is called from update_process_times()
> when it is called from the timer interrupt. In -rt, there's reasons we

and in vanilla Linux some architectures (like x86 or sparc to name just
a few) overwrite arch_irq_work_raise() which means they provide
their "own" interrupt like callback. That means on those architectures
irq_work_run() gets invoked twice: once via update_process_times() and
via and once the custom interface.
So my question to the original inventor of this code: Peter, do we
really need that arch specific callback? Wouldn't one be enough? Is it
that critical that it can't wait to the next timer tick?

Sebastian

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Sebastian Andrzej Siewior
On 01/31/2014 06:07 PM, Steven Rostedt wrote:
> The bug that I found is that if there *are* active timers, but they
> have not expired yet. Why is this a problem? Because in that case we do

Argh, right. But your patch looks also way better. After I made I was
not too happy about that amount of ifdef and wanted to redo it later.
What you just posted is a way better solution.

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Paul E. McKenney
On Fri, Jan 31, 2014 at 12:57:19PM -0500, Steven Rostedt wrote:
> On Fri, 31 Jan 2014 09:42:27 -0800
> "Paul E. McKenney"  wrote:
> 
> > On Fri, Jan 31, 2014 at 12:07:57PM -0500, Steven Rostedt wrote:
> > > On Fri, 31 Jan 2014 15:34:05 +0100
> > > Sebastian Andrzej Siewior  wrote:
> > > 
> > > > from looking at the code, it seems that the softirq is only raised (in
> > > > the !base->active_timers case) if we have also an expired timer
> > > > (time_before_eq() is true). This patch ensures that the timer softirq is
> > > > also raised in the !base->active_timers && no timer expired.
> > > 
> > > A couple of things. If there is no active timers, we do not need to
> > > check the expired timers. That may contain a deferred timer that does
> > > not need to be raised if the system is idle. This will just
> > > re-introduce the problems that other people have been seeing.
> > > 
> > > The bug that I found is that if there *are* active timers, but they
> > > have not expired yet. Why is this a problem? Because in that case we do
> > > not check if there is irq_work to be done. That means the irq_work will
> > > have to wait till the timer expires, and since RCU depends on this,
> > > that can take a while. I've had a synchronize_sched() take up to 5
> > > seconds to complete due to this!
> > > 
> > > 
> > > The real fix is the following:
> > > 
> > > timer/rt: Always raise the softirq if there's irq_work to be done
> > > 
> > > It was previously discovered that some systems would hang on boot up
> > > with a previous version of 3.12-rt. This was due to RCU using irq_work,
> > > and RT defers the irq_work to a softirq. But if there's no active
> > > timers, the softirq will not be raised, and RCU work will not get done,
> > > causing the system to hang.  The fix was to check that if there was no
> > > active timers but irq_work to be done, then we should raise the softirq.
> > > 
> > > But this fix was not 100% correct. It left out the case that there were
> > > active timers that were not expired yet. This would have the softirq
> > > not get raised even if there was irq work to be done.
> > > 
> > > If there is irq_work to be done, then we must raise the timer softirq
> > > regardless of if there is active timers or whether they are expired or
> > > not. The softirq can handle those cases. But we can never ignore
> > > irq_work.
> > > 
> > > As it is only PREEMPT_RT_FULL that requires irq_work to be done in the
> > > softirq, we can pull out the check in the active_timers condition, and
> > > make the code a bit cleaner by having the irq_work check separate, and
> > > put the code in with the other #ifdef PREEMPT_RT. If there is irq_work
> > > to be done, there's no need to check the active timers or if they are
> > > expired. Just raise the time softirq and be done with it. Otherwise, we
> > > can do the timer checks just like we do with non -rt.
> > > 
> > > Signed-off-by: Steven Rostedt 
> > > 
> > > diff --git a/kernel/timer.c b/kernel/timer.c
> > > index 106968f..426d114 100644
> > > --- a/kernel/timer.c
> > > +++ b/kernel/timer.c
> > > @@ -1461,18 +1461,20 @@ void run_local_timers(void)
> > >* the timer softirq.
> > >*/
> > >  #ifdef CONFIG_PREEMPT_RT_FULL
> > > + /* On RT, irq work runs from softirq */
> > > + if (irq_work_needs_cpu()) {
> > > + raise_softirq(TIMER_SOFTIRQ);
> > 
> > OK, I'll bite...  What if the IRQ work that needs doing is something
> > other than TIMER_SOFTIRQ?
> 
> Heh, don't let the timer part confuse you. The only reason that softirq
> is relevant to irq_work is that is the softirq that we placed the
> irq_work to be done. If you look at the code that is called for that
> softirq (in -rt) you'll see:
> 
> static void run_timer_softirq(struct softirq_action *h)
> {
>   struct tvec_base *base = __this_cpu_read(tvec_bases);
> 
> #if defined(CONFIG_IRQ_WORK) && defined(CONFIG_PREEMPT_RT_FULL)
>   irq_work_run();
> #endif
> 
>   if (time_after_eq(jiffies, base->timer_jiffies))
>   __run_timers(base);
> }
> 
> And we also have:
> 
> void update_process_times(int user_tick)
> {
>   struct task_struct *p = current;
>   int cpu = smp_processor_id();
> 
>   /* Note: this timer irq context must be accounted for as well. */
>   account_process_tick(p, user_tick);
>   scheduler_tick();
>   run_local_timers();
>   rcu_check_callbacks(cpu, user_tick);
> #if defined(CONFIG_IRQ_WORK) && !defined(CONFIG_PREEMPT_RT_FULL)
>   if (in_irq())
>   irq_work_run();
> #endif
>   run_posix_cpu_timers(p);
> }
> 
> 
> In vanilla Linux, irq_work_run() is called from update_process_times()
> when it is called from the timer interrupt. In -rt, there's reasons we
> can't do the irq work from hard irq, so we push it off to the timer
> softirq, and run it there.
> 
> That means if we have *any* irq work to do, we raise the timer softirq,
> even if the work to be done has nothing to do with timers. As you can
> see from the 

Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Steven Rostedt
On Fri, 31 Jan 2014 09:42:27 -0800
"Paul E. McKenney"  wrote:

> On Fri, Jan 31, 2014 at 12:07:57PM -0500, Steven Rostedt wrote:
> > On Fri, 31 Jan 2014 15:34:05 +0100
> > Sebastian Andrzej Siewior  wrote:
> > 
> > > from looking at the code, it seems that the softirq is only raised (in
> > > the !base->active_timers case) if we have also an expired timer
> > > (time_before_eq() is true). This patch ensures that the timer softirq is
> > > also raised in the !base->active_timers && no timer expired.
> > 
> > A couple of things. If there is no active timers, we do not need to
> > check the expired timers. That may contain a deferred timer that does
> > not need to be raised if the system is idle. This will just
> > re-introduce the problems that other people have been seeing.
> > 
> > The bug that I found is that if there *are* active timers, but they
> > have not expired yet. Why is this a problem? Because in that case we do
> > not check if there is irq_work to be done. That means the irq_work will
> > have to wait till the timer expires, and since RCU depends on this,
> > that can take a while. I've had a synchronize_sched() take up to 5
> > seconds to complete due to this!
> > 
> > 
> > The real fix is the following:
> > 
> > timer/rt: Always raise the softirq if there's irq_work to be done
> > 
> > It was previously discovered that some systems would hang on boot up
> > with a previous version of 3.12-rt. This was due to RCU using irq_work,
> > and RT defers the irq_work to a softirq. But if there's no active
> > timers, the softirq will not be raised, and RCU work will not get done,
> > causing the system to hang.  The fix was to check that if there was no
> > active timers but irq_work to be done, then we should raise the softirq.
> > 
> > But this fix was not 100% correct. It left out the case that there were
> > active timers that were not expired yet. This would have the softirq
> > not get raised even if there was irq work to be done.
> > 
> > If there is irq_work to be done, then we must raise the timer softirq
> > regardless of if there is active timers or whether they are expired or
> > not. The softirq can handle those cases. But we can never ignore
> > irq_work.
> > 
> > As it is only PREEMPT_RT_FULL that requires irq_work to be done in the
> > softirq, we can pull out the check in the active_timers condition, and
> > make the code a bit cleaner by having the irq_work check separate, and
> > put the code in with the other #ifdef PREEMPT_RT. If there is irq_work
> > to be done, there's no need to check the active timers or if they are
> > expired. Just raise the time softirq and be done with it. Otherwise, we
> > can do the timer checks just like we do with non -rt.
> > 
> > Signed-off-by: Steven Rostedt 
> > 
> > diff --git a/kernel/timer.c b/kernel/timer.c
> > index 106968f..426d114 100644
> > --- a/kernel/timer.c
> > +++ b/kernel/timer.c
> > @@ -1461,18 +1461,20 @@ void run_local_timers(void)
> >  * the timer softirq.
> >  */
> >  #ifdef CONFIG_PREEMPT_RT_FULL
> > +   /* On RT, irq work runs from softirq */
> > +   if (irq_work_needs_cpu()) {
> > +   raise_softirq(TIMER_SOFTIRQ);
> 
> OK, I'll bite...  What if the IRQ work that needs doing is something
> other than TIMER_SOFTIRQ?

Heh, don't let the timer part confuse you. The only reason that softirq
is relevant to irq_work is that is the softirq that we placed the
irq_work to be done. If you look at the code that is called for that
softirq (in -rt) you'll see:

static void run_timer_softirq(struct softirq_action *h)
{
struct tvec_base *base = __this_cpu_read(tvec_bases);

#if defined(CONFIG_IRQ_WORK) && defined(CONFIG_PREEMPT_RT_FULL)
irq_work_run();
#endif

if (time_after_eq(jiffies, base->timer_jiffies))
__run_timers(base);
}

And we also have:

void update_process_times(int user_tick)
{
struct task_struct *p = current;
int cpu = smp_processor_id();

/* Note: this timer irq context must be accounted for as well. */
account_process_tick(p, user_tick);
scheduler_tick();
run_local_timers();
rcu_check_callbacks(cpu, user_tick);
#if defined(CONFIG_IRQ_WORK) && !defined(CONFIG_PREEMPT_RT_FULL)
if (in_irq())
irq_work_run();
#endif
run_posix_cpu_timers(p);
}


In vanilla Linux, irq_work_run() is called from update_process_times()
when it is called from the timer interrupt. In -rt, there's reasons we
can't do the irq work from hard irq, so we push it off to the timer
softirq, and run it there.

That means if we have *any* irq work to do, we raise the timer softirq,
even if the work to be done has nothing to do with timers. As you can
see from the softirq timer code, in -rt, irq_work_run() is always
called, without having to look at any timers.

-- Steve



> 
>   Thanx, Paul
> 
> > +   return;
> > +   }
> > +
> > if 

Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Paul E. McKenney
On Fri, Jan 31, 2014 at 12:07:57PM -0500, Steven Rostedt wrote:
> On Fri, 31 Jan 2014 15:34:05 +0100
> Sebastian Andrzej Siewior  wrote:
> 
> > from looking at the code, it seems that the softirq is only raised (in
> > the !base->active_timers case) if we have also an expired timer
> > (time_before_eq() is true). This patch ensures that the timer softirq is
> > also raised in the !base->active_timers && no timer expired.
> 
> A couple of things. If there is no active timers, we do not need to
> check the expired timers. That may contain a deferred timer that does
> not need to be raised if the system is idle. This will just
> re-introduce the problems that other people have been seeing.
> 
> The bug that I found is that if there *are* active timers, but they
> have not expired yet. Why is this a problem? Because in that case we do
> not check if there is irq_work to be done. That means the irq_work will
> have to wait till the timer expires, and since RCU depends on this,
> that can take a while. I've had a synchronize_sched() take up to 5
> seconds to complete due to this!
> 
> 
> The real fix is the following:
> 
> timer/rt: Always raise the softirq if there's irq_work to be done
> 
> It was previously discovered that some systems would hang on boot up
> with a previous version of 3.12-rt. This was due to RCU using irq_work,
> and RT defers the irq_work to a softirq. But if there's no active
> timers, the softirq will not be raised, and RCU work will not get done,
> causing the system to hang.  The fix was to check that if there was no
> active timers but irq_work to be done, then we should raise the softirq.
> 
> But this fix was not 100% correct. It left out the case that there were
> active timers that were not expired yet. This would have the softirq
> not get raised even if there was irq work to be done.
> 
> If there is irq_work to be done, then we must raise the timer softirq
> regardless of if there is active timers or whether they are expired or
> not. The softirq can handle those cases. But we can never ignore
> irq_work.
> 
> As it is only PREEMPT_RT_FULL that requires irq_work to be done in the
> softirq, we can pull out the check in the active_timers condition, and
> make the code a bit cleaner by having the irq_work check separate, and
> put the code in with the other #ifdef PREEMPT_RT. If there is irq_work
> to be done, there's no need to check the active timers or if they are
> expired. Just raise the time softirq and be done with it. Otherwise, we
> can do the timer checks just like we do with non -rt.
> 
> Signed-off-by: Steven Rostedt 
> 
> diff --git a/kernel/timer.c b/kernel/timer.c
> index 106968f..426d114 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1461,18 +1461,20 @@ void run_local_timers(void)
>* the timer softirq.
>*/
>  #ifdef CONFIG_PREEMPT_RT_FULL
> + /* On RT, irq work runs from softirq */
> + if (irq_work_needs_cpu()) {
> + raise_softirq(TIMER_SOFTIRQ);

OK, I'll bite...  What if the IRQ work that needs doing is something
other than TIMER_SOFTIRQ?

Thanx, Paul

> + return;
> + }
> +
>   if (!spin_do_trylock(>lock)) {
>   raise_softirq(TIMER_SOFTIRQ);
>   return;
>   }
>  #endif
> - if (!base->active_timers) {
> -#ifdef CONFIG_PREEMPT_RT_FULL
> - /* On RT, irq work runs from softirq */
> - if (!irq_work_needs_cpu())
> -#endif
> - goto out;
> - }
> +
> + if (!base->active_timers)
> + goto out;
> 
>   /* Check whether the next pending timer has expired */
>   if (time_before_eq(base->next_timer, jiffies))
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Steven Rostedt
On Fri, 31 Jan 2014 12:07:57 -0500
Steven Rostedt  wrote:
 
> diff --git a/kernel/timer.c b/kernel/timer.c
> index 106968f..426d114 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1461,18 +1461,20 @@ void run_local_timers(void)
>* the timer softirq.
>*/
>  #ifdef CONFIG_PREEMPT_RT_FULL
> + /* On RT, irq work runs from softirq */
> + if (irq_work_needs_cpu()) {
> + raise_softirq(TIMER_SOFTIRQ);
> + return;
> + }
> +
>   if (!spin_do_trylock(>lock)) {
>   raise_softirq(TIMER_SOFTIRQ);
>   return;
>   }
>  #endif

Note, I debated about doing:

if (irq_work_needs_cpu() ||
!spin_do_trylock(>lock)) {
raise_softirq(TIMER_SOFTIRQ);
return;
}

instead, which is pretty much the same code. But I find it rather ugly,
and does not read as well. I haven't looked at the disassembly, but I
would hope that gcc would make my original version have the same result
as this one.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Steven Rostedt
On Fri, 31 Jan 2014 15:34:05 +0100
Sebastian Andrzej Siewior  wrote:

> from looking at the code, it seems that the softirq is only raised (in
> the !base->active_timers case) if we have also an expired timer
> (time_before_eq() is true). This patch ensures that the timer softirq is
> also raised in the !base->active_timers && no timer expired.

A couple of things. If there is no active timers, we do not need to
check the expired timers. That may contain a deferred timer that does
not need to be raised if the system is idle. This will just
re-introduce the problems that other people have been seeing.

The bug that I found is that if there *are* active timers, but they
have not expired yet. Why is this a problem? Because in that case we do
not check if there is irq_work to be done. That means the irq_work will
have to wait till the timer expires, and since RCU depends on this,
that can take a while. I've had a synchronize_sched() take up to 5
seconds to complete due to this!


The real fix is the following:

timer/rt: Always raise the softirq if there's irq_work to be done

It was previously discovered that some systems would hang on boot up
with a previous version of 3.12-rt. This was due to RCU using irq_work,
and RT defers the irq_work to a softirq. But if there's no active
timers, the softirq will not be raised, and RCU work will not get done,
causing the system to hang.  The fix was to check that if there was no
active timers but irq_work to be done, then we should raise the softirq.

But this fix was not 100% correct. It left out the case that there were
active timers that were not expired yet. This would have the softirq
not get raised even if there was irq work to be done.

If there is irq_work to be done, then we must raise the timer softirq
regardless of if there is active timers or whether they are expired or
not. The softirq can handle those cases. But we can never ignore
irq_work.

As it is only PREEMPT_RT_FULL that requires irq_work to be done in the
softirq, we can pull out the check in the active_timers condition, and
make the code a bit cleaner by having the irq_work check separate, and
put the code in with the other #ifdef PREEMPT_RT. If there is irq_work
to be done, there's no need to check the active timers or if they are
expired. Just raise the time softirq and be done with it. Otherwise, we
can do the timer checks just like we do with non -rt.

Signed-off-by: Steven Rostedt 

diff --git a/kernel/timer.c b/kernel/timer.c
index 106968f..426d114 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1461,18 +1461,20 @@ void run_local_timers(void)
 * the timer softirq.
 */
 #ifdef CONFIG_PREEMPT_RT_FULL
+   /* On RT, irq work runs from softirq */
+   if (irq_work_needs_cpu()) {
+   raise_softirq(TIMER_SOFTIRQ);
+   return;
+   }
+
if (!spin_do_trylock(>lock)) {
raise_softirq(TIMER_SOFTIRQ);
return;
}
 #endif
-   if (!base->active_timers) {
-#ifdef CONFIG_PREEMPT_RT_FULL
-   /* On RT, irq work runs from softirq */
-   if (!irq_work_needs_cpu())
-#endif
-   goto out;
-   }
+
+   if (!base->active_timers)
+   goto out;
 
/* Check whether the next pending timer has expired */
if (time_before_eq(base->next_timer, jiffies))
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Sebastian Andrzej Siewior
from looking at the code, it seems that the softirq is only raised (in
the !base->active_timers case) if we have also an expired timer
(time_before_eq() is true). This patch ensures that the timer softirq is
also raised in the !base->active_timers && no timer expired.

Signed-off-by: Sebastian Andrzej Siewior 
---
 kernel/timer.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index 6b3c172..b04c879 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1453,6 +1453,7 @@ static void run_timer_softirq(struct softirq_action *h)
 void run_local_timers(void)
 {
struct tvec_base *base = __this_cpu_read(tvec_bases);
+   bool need_raise = false;
 
hrtimer_run_queues();
/*
@@ -1470,12 +1471,17 @@ void run_local_timers(void)
 #ifdef CONFIG_PREEMPT_RT_FULL
/* On RT, irq work runs from softirq */
if (!irq_work_needs_cpu())
-#endif
goto out;
+   need_raise = true;
+#else
+   goto out;
+#endif
}
 
/* Check whether the next pending timer has expired */
if (time_before_eq(base->next_timer, jiffies))
+   need_raise = true;
+   if (need_raise)
raise_softirq(TIMER_SOFTIRQ);
 out:
 #ifdef CONFIG_PREEMPT_RT_FULL
-- 
1.8.5.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Steven Rostedt
On Fri, 31 Jan 2014 15:34:05 +0100
Sebastian Andrzej Siewior bige...@linutronix.de wrote:

 from looking at the code, it seems that the softirq is only raised (in
 the !base-active_timers case) if we have also an expired timer
 (time_before_eq() is true). This patch ensures that the timer softirq is
 also raised in the !base-active_timers  no timer expired.

A couple of things. If there is no active timers, we do not need to
check the expired timers. That may contain a deferred timer that does
not need to be raised if the system is idle. This will just
re-introduce the problems that other people have been seeing.

The bug that I found is that if there *are* active timers, but they
have not expired yet. Why is this a problem? Because in that case we do
not check if there is irq_work to be done. That means the irq_work will
have to wait till the timer expires, and since RCU depends on this,
that can take a while. I've had a synchronize_sched() take up to 5
seconds to complete due to this!


The real fix is the following:

timer/rt: Always raise the softirq if there's irq_work to be done

It was previously discovered that some systems would hang on boot up
with a previous version of 3.12-rt. This was due to RCU using irq_work,
and RT defers the irq_work to a softirq. But if there's no active
timers, the softirq will not be raised, and RCU work will not get done,
causing the system to hang.  The fix was to check that if there was no
active timers but irq_work to be done, then we should raise the softirq.

But this fix was not 100% correct. It left out the case that there were
active timers that were not expired yet. This would have the softirq
not get raised even if there was irq work to be done.

If there is irq_work to be done, then we must raise the timer softirq
regardless of if there is active timers or whether they are expired or
not. The softirq can handle those cases. But we can never ignore
irq_work.

As it is only PREEMPT_RT_FULL that requires irq_work to be done in the
softirq, we can pull out the check in the active_timers condition, and
make the code a bit cleaner by having the irq_work check separate, and
put the code in with the other #ifdef PREEMPT_RT. If there is irq_work
to be done, there's no need to check the active timers or if they are
expired. Just raise the time softirq and be done with it. Otherwise, we
can do the timer checks just like we do with non -rt.

Signed-off-by: Steven Rostedt rost...@goodmis.org

diff --git a/kernel/timer.c b/kernel/timer.c
index 106968f..426d114 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1461,18 +1461,20 @@ void run_local_timers(void)
 * the timer softirq.
 */
 #ifdef CONFIG_PREEMPT_RT_FULL
+   /* On RT, irq work runs from softirq */
+   if (irq_work_needs_cpu()) {
+   raise_softirq(TIMER_SOFTIRQ);
+   return;
+   }
+
if (!spin_do_trylock(base-lock)) {
raise_softirq(TIMER_SOFTIRQ);
return;
}
 #endif
-   if (!base-active_timers) {
-#ifdef CONFIG_PREEMPT_RT_FULL
-   /* On RT, irq work runs from softirq */
-   if (!irq_work_needs_cpu())
-#endif
-   goto out;
-   }
+
+   if (!base-active_timers)
+   goto out;
 
/* Check whether the next pending timer has expired */
if (time_before_eq(base-next_timer, jiffies))
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Steven Rostedt
On Fri, 31 Jan 2014 12:07:57 -0500
Steven Rostedt rost...@goodmis.org wrote:
 
 diff --git a/kernel/timer.c b/kernel/timer.c
 index 106968f..426d114 100644
 --- a/kernel/timer.c
 +++ b/kernel/timer.c
 @@ -1461,18 +1461,20 @@ void run_local_timers(void)
* the timer softirq.
*/
  #ifdef CONFIG_PREEMPT_RT_FULL
 + /* On RT, irq work runs from softirq */
 + if (irq_work_needs_cpu()) {
 + raise_softirq(TIMER_SOFTIRQ);
 + return;
 + }
 +
   if (!spin_do_trylock(base-lock)) {
   raise_softirq(TIMER_SOFTIRQ);
   return;
   }
  #endif

Note, I debated about doing:

if (irq_work_needs_cpu() ||
!spin_do_trylock(base-lock)) {
raise_softirq(TIMER_SOFTIRQ);
return;
}

instead, which is pretty much the same code. But I find it rather ugly,
and does not read as well. I haven't looked at the disassembly, but I
would hope that gcc would make my original version have the same result
as this one.

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Paul E. McKenney
On Fri, Jan 31, 2014 at 12:07:57PM -0500, Steven Rostedt wrote:
 On Fri, 31 Jan 2014 15:34:05 +0100
 Sebastian Andrzej Siewior bige...@linutronix.de wrote:
 
  from looking at the code, it seems that the softirq is only raised (in
  the !base-active_timers case) if we have also an expired timer
  (time_before_eq() is true). This patch ensures that the timer softirq is
  also raised in the !base-active_timers  no timer expired.
 
 A couple of things. If there is no active timers, we do not need to
 check the expired timers. That may contain a deferred timer that does
 not need to be raised if the system is idle. This will just
 re-introduce the problems that other people have been seeing.
 
 The bug that I found is that if there *are* active timers, but they
 have not expired yet. Why is this a problem? Because in that case we do
 not check if there is irq_work to be done. That means the irq_work will
 have to wait till the timer expires, and since RCU depends on this,
 that can take a while. I've had a synchronize_sched() take up to 5
 seconds to complete due to this!
 
 
 The real fix is the following:
 
 timer/rt: Always raise the softirq if there's irq_work to be done
 
 It was previously discovered that some systems would hang on boot up
 with a previous version of 3.12-rt. This was due to RCU using irq_work,
 and RT defers the irq_work to a softirq. But if there's no active
 timers, the softirq will not be raised, and RCU work will not get done,
 causing the system to hang.  The fix was to check that if there was no
 active timers but irq_work to be done, then we should raise the softirq.
 
 But this fix was not 100% correct. It left out the case that there were
 active timers that were not expired yet. This would have the softirq
 not get raised even if there was irq work to be done.
 
 If there is irq_work to be done, then we must raise the timer softirq
 regardless of if there is active timers or whether they are expired or
 not. The softirq can handle those cases. But we can never ignore
 irq_work.
 
 As it is only PREEMPT_RT_FULL that requires irq_work to be done in the
 softirq, we can pull out the check in the active_timers condition, and
 make the code a bit cleaner by having the irq_work check separate, and
 put the code in with the other #ifdef PREEMPT_RT. If there is irq_work
 to be done, there's no need to check the active timers or if they are
 expired. Just raise the time softirq and be done with it. Otherwise, we
 can do the timer checks just like we do with non -rt.
 
 Signed-off-by: Steven Rostedt rost...@goodmis.org
 
 diff --git a/kernel/timer.c b/kernel/timer.c
 index 106968f..426d114 100644
 --- a/kernel/timer.c
 +++ b/kernel/timer.c
 @@ -1461,18 +1461,20 @@ void run_local_timers(void)
* the timer softirq.
*/
  #ifdef CONFIG_PREEMPT_RT_FULL
 + /* On RT, irq work runs from softirq */
 + if (irq_work_needs_cpu()) {
 + raise_softirq(TIMER_SOFTIRQ);

OK, I'll bite...  What if the IRQ work that needs doing is something
other than TIMER_SOFTIRQ?

Thanx, Paul

 + return;
 + }
 +
   if (!spin_do_trylock(base-lock)) {
   raise_softirq(TIMER_SOFTIRQ);
   return;
   }
  #endif
 - if (!base-active_timers) {
 -#ifdef CONFIG_PREEMPT_RT_FULL
 - /* On RT, irq work runs from softirq */
 - if (!irq_work_needs_cpu())
 -#endif
 - goto out;
 - }
 +
 + if (!base-active_timers)
 + goto out;
 
   /* Check whether the next pending timer has expired */
   if (time_before_eq(base-next_timer, jiffies))
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Steven Rostedt
On Fri, 31 Jan 2014 09:42:27 -0800
Paul E. McKenney paul...@linux.vnet.ibm.com wrote:

 On Fri, Jan 31, 2014 at 12:07:57PM -0500, Steven Rostedt wrote:
  On Fri, 31 Jan 2014 15:34:05 +0100
  Sebastian Andrzej Siewior bige...@linutronix.de wrote:
  
   from looking at the code, it seems that the softirq is only raised (in
   the !base-active_timers case) if we have also an expired timer
   (time_before_eq() is true). This patch ensures that the timer softirq is
   also raised in the !base-active_timers  no timer expired.
  
  A couple of things. If there is no active timers, we do not need to
  check the expired timers. That may contain a deferred timer that does
  not need to be raised if the system is idle. This will just
  re-introduce the problems that other people have been seeing.
  
  The bug that I found is that if there *are* active timers, but they
  have not expired yet. Why is this a problem? Because in that case we do
  not check if there is irq_work to be done. That means the irq_work will
  have to wait till the timer expires, and since RCU depends on this,
  that can take a while. I've had a synchronize_sched() take up to 5
  seconds to complete due to this!
  
  
  The real fix is the following:
  
  timer/rt: Always raise the softirq if there's irq_work to be done
  
  It was previously discovered that some systems would hang on boot up
  with a previous version of 3.12-rt. This was due to RCU using irq_work,
  and RT defers the irq_work to a softirq. But if there's no active
  timers, the softirq will not be raised, and RCU work will not get done,
  causing the system to hang.  The fix was to check that if there was no
  active timers but irq_work to be done, then we should raise the softirq.
  
  But this fix was not 100% correct. It left out the case that there were
  active timers that were not expired yet. This would have the softirq
  not get raised even if there was irq work to be done.
  
  If there is irq_work to be done, then we must raise the timer softirq
  regardless of if there is active timers or whether they are expired or
  not. The softirq can handle those cases. But we can never ignore
  irq_work.
  
  As it is only PREEMPT_RT_FULL that requires irq_work to be done in the
  softirq, we can pull out the check in the active_timers condition, and
  make the code a bit cleaner by having the irq_work check separate, and
  put the code in with the other #ifdef PREEMPT_RT. If there is irq_work
  to be done, there's no need to check the active timers or if they are
  expired. Just raise the time softirq and be done with it. Otherwise, we
  can do the timer checks just like we do with non -rt.
  
  Signed-off-by: Steven Rostedt rost...@goodmis.org
  
  diff --git a/kernel/timer.c b/kernel/timer.c
  index 106968f..426d114 100644
  --- a/kernel/timer.c
  +++ b/kernel/timer.c
  @@ -1461,18 +1461,20 @@ void run_local_timers(void)
   * the timer softirq.
   */
   #ifdef CONFIG_PREEMPT_RT_FULL
  +   /* On RT, irq work runs from softirq */
  +   if (irq_work_needs_cpu()) {
  +   raise_softirq(TIMER_SOFTIRQ);
 
 OK, I'll bite...  What if the IRQ work that needs doing is something
 other than TIMER_SOFTIRQ?

Heh, don't let the timer part confuse you. The only reason that softirq
is relevant to irq_work is that is the softirq that we placed the
irq_work to be done. If you look at the code that is called for that
softirq (in -rt) you'll see:

static void run_timer_softirq(struct softirq_action *h)
{
struct tvec_base *base = __this_cpu_read(tvec_bases);

#if defined(CONFIG_IRQ_WORK)  defined(CONFIG_PREEMPT_RT_FULL)
irq_work_run();
#endif

if (time_after_eq(jiffies, base-timer_jiffies))
__run_timers(base);
}

And we also have:

void update_process_times(int user_tick)
{
struct task_struct *p = current;
int cpu = smp_processor_id();

/* Note: this timer irq context must be accounted for as well. */
account_process_tick(p, user_tick);
scheduler_tick();
run_local_timers();
rcu_check_callbacks(cpu, user_tick);
#if defined(CONFIG_IRQ_WORK)  !defined(CONFIG_PREEMPT_RT_FULL)
if (in_irq())
irq_work_run();
#endif
run_posix_cpu_timers(p);
}


In vanilla Linux, irq_work_run() is called from update_process_times()
when it is called from the timer interrupt. In -rt, there's reasons we
can't do the irq work from hard irq, so we push it off to the timer
softirq, and run it there.

That means if we have *any* irq work to do, we raise the timer softirq,
even if the work to be done has nothing to do with timers. As you can
see from the softirq timer code, in -rt, irq_work_run() is always
called, without having to look at any timers.

-- Steve



 
   Thanx, Paul
 
  +   return;
  +   }
  +
  if (!spin_do_trylock(base-lock)) {
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the 

Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Paul E. McKenney
On Fri, Jan 31, 2014 at 12:57:19PM -0500, Steven Rostedt wrote:
 On Fri, 31 Jan 2014 09:42:27 -0800
 Paul E. McKenney paul...@linux.vnet.ibm.com wrote:
 
  On Fri, Jan 31, 2014 at 12:07:57PM -0500, Steven Rostedt wrote:
   On Fri, 31 Jan 2014 15:34:05 +0100
   Sebastian Andrzej Siewior bige...@linutronix.de wrote:
   
from looking at the code, it seems that the softirq is only raised (in
the !base-active_timers case) if we have also an expired timer
(time_before_eq() is true). This patch ensures that the timer softirq is
also raised in the !base-active_timers  no timer expired.
   
   A couple of things. If there is no active timers, we do not need to
   check the expired timers. That may contain a deferred timer that does
   not need to be raised if the system is idle. This will just
   re-introduce the problems that other people have been seeing.
   
   The bug that I found is that if there *are* active timers, but they
   have not expired yet. Why is this a problem? Because in that case we do
   not check if there is irq_work to be done. That means the irq_work will
   have to wait till the timer expires, and since RCU depends on this,
   that can take a while. I've had a synchronize_sched() take up to 5
   seconds to complete due to this!
   
   
   The real fix is the following:
   
   timer/rt: Always raise the softirq if there's irq_work to be done
   
   It was previously discovered that some systems would hang on boot up
   with a previous version of 3.12-rt. This was due to RCU using irq_work,
   and RT defers the irq_work to a softirq. But if there's no active
   timers, the softirq will not be raised, and RCU work will not get done,
   causing the system to hang.  The fix was to check that if there was no
   active timers but irq_work to be done, then we should raise the softirq.
   
   But this fix was not 100% correct. It left out the case that there were
   active timers that were not expired yet. This would have the softirq
   not get raised even if there was irq work to be done.
   
   If there is irq_work to be done, then we must raise the timer softirq
   regardless of if there is active timers or whether they are expired or
   not. The softirq can handle those cases. But we can never ignore
   irq_work.
   
   As it is only PREEMPT_RT_FULL that requires irq_work to be done in the
   softirq, we can pull out the check in the active_timers condition, and
   make the code a bit cleaner by having the irq_work check separate, and
   put the code in with the other #ifdef PREEMPT_RT. If there is irq_work
   to be done, there's no need to check the active timers or if they are
   expired. Just raise the time softirq and be done with it. Otherwise, we
   can do the timer checks just like we do with non -rt.
   
   Signed-off-by: Steven Rostedt rost...@goodmis.org
   
   diff --git a/kernel/timer.c b/kernel/timer.c
   index 106968f..426d114 100644
   --- a/kernel/timer.c
   +++ b/kernel/timer.c
   @@ -1461,18 +1461,20 @@ void run_local_timers(void)
  * the timer softirq.
  */
#ifdef CONFIG_PREEMPT_RT_FULL
   + /* On RT, irq work runs from softirq */
   + if (irq_work_needs_cpu()) {
   + raise_softirq(TIMER_SOFTIRQ);
  
  OK, I'll bite...  What if the IRQ work that needs doing is something
  other than TIMER_SOFTIRQ?
 
 Heh, don't let the timer part confuse you. The only reason that softirq
 is relevant to irq_work is that is the softirq that we placed the
 irq_work to be done. If you look at the code that is called for that
 softirq (in -rt) you'll see:
 
 static void run_timer_softirq(struct softirq_action *h)
 {
   struct tvec_base *base = __this_cpu_read(tvec_bases);
 
 #if defined(CONFIG_IRQ_WORK)  defined(CONFIG_PREEMPT_RT_FULL)
   irq_work_run();
 #endif
 
   if (time_after_eq(jiffies, base-timer_jiffies))
   __run_timers(base);
 }
 
 And we also have:
 
 void update_process_times(int user_tick)
 {
   struct task_struct *p = current;
   int cpu = smp_processor_id();
 
   /* Note: this timer irq context must be accounted for as well. */
   account_process_tick(p, user_tick);
   scheduler_tick();
   run_local_timers();
   rcu_check_callbacks(cpu, user_tick);
 #if defined(CONFIG_IRQ_WORK)  !defined(CONFIG_PREEMPT_RT_FULL)
   if (in_irq())
   irq_work_run();
 #endif
   run_posix_cpu_timers(p);
 }
 
 
 In vanilla Linux, irq_work_run() is called from update_process_times()
 when it is called from the timer interrupt. In -rt, there's reasons we
 can't do the irq work from hard irq, so we push it off to the timer
 softirq, and run it there.
 
 That means if we have *any* irq work to do, we raise the timer softirq,
 even if the work to be done has nothing to do with timers. As you can
 see from the softirq timer code, in -rt, irq_work_run() is always
 called, without having to look at any timers.

OK, got it!  Thank you for the tutorial.  ;-)

  

Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Sebastian Andrzej Siewior
On 01/31/2014 06:07 PM, Steven Rostedt wrote:
 The bug that I found is that if there *are* active timers, but they
 have not expired yet. Why is this a problem? Because in that case we do

Argh, right. But your patch looks also way better. After I made I was
not too happy about that amount of ifdef and wanted to redo it later.
What you just posted is a way better solution.

Sebastian
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Sebastian Andrzej Siewior
On 01/31/2014 06:57 PM, Steven Rostedt wrote:

 In vanilla Linux, irq_work_run() is called from update_process_times()
 when it is called from the timer interrupt. In -rt, there's reasons we

and in vanilla Linux some architectures (like x86 or sparc to name just
a few) overwrite arch_irq_work_raise() which means they provide
their own interrupt like callback. That means on those architectures
irq_work_run() gets invoked twice: once via update_process_times() and
via and once the custom interface.
So my question to the original inventor of this code: Peter, do we
really need that arch specific callback? Wouldn't one be enough? Is it
that critical that it can't wait to the next timer tick?

Sebastian

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Steven Rostedt
On Fri, 31 Jan 2014 20:26:54 +0100
Sebastian Andrzej Siewior bige...@linutronix.de wrote:

 On 01/31/2014 06:57 PM, Steven Rostedt wrote:
 
  In vanilla Linux, irq_work_run() is called from update_process_times()
  when it is called from the timer interrupt. In -rt, there's reasons we
 
 and in vanilla Linux some architectures (like x86 or sparc to name just
 a few) overwrite arch_irq_work_raise() which means they provide
 their own interrupt like callback. That means on those architectures
 irq_work_run() gets invoked twice: once via update_process_times() and
 via and once the custom interface.
 So my question to the original inventor of this code: Peter, do we
 really need that arch specific callback? Wouldn't one be enough? Is it
 that critical that it can't wait to the next timer tick?

There's flags that determine when the next call should be invoked. The
irq_work_run() should return immediately if it was already done by the
arch specific call. The work wont be called twice.

As I have worked on code that uses irq_work() I can say that we want
the arch specific interrupts. For those architectures that don't have
it will experience larger latencies for the work required. It's
basically, a too bad for them.

But to answer your question, no we want the immediate response.

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Sebastian Andrzej Siewior
On 01/31/2014 08:34 PM, Steven Rostedt wrote:
 There's flags that determine when the next call should be invoked. The
 irq_work_run() should return immediately if it was already done by the
 arch specific call. The work wont be called twice.

Well, it is called twice. It just does nothing because the list is
empty  returns.

 As I have worked on code that uses irq_work() I can say that we want
 the arch specific interrupts. For those architectures that don't have
 it will experience larger latencies for the work required. It's
 basically, a too bad for them.

How bad is it? Is this something generic or just not getting
perf events fast enough out? Most users don't seem to require small
latencies.

 But to answer your question, no we want the immediate response.
 
 -- Steve

Sebastian
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Peter Zijlstra
On Fri, Jan 31, 2014 at 02:34:41PM -0500, Steven Rostedt wrote:
 On Fri, 31 Jan 2014 20:26:54 +0100
 Sebastian Andrzej Siewior bige...@linutronix.de wrote:
 
  On 01/31/2014 06:57 PM, Steven Rostedt wrote:
  
   In vanilla Linux, irq_work_run() is called from update_process_times()
   when it is called from the timer interrupt. In -rt, there's reasons we
  
  and in vanilla Linux some architectures (like x86 or sparc to name just
  a few) overwrite arch_irq_work_raise() which means they provide
  their own interrupt like callback. That means on those architectures
  irq_work_run() gets invoked twice: once via update_process_times() and
  via and once the custom interface.
  So my question to the original inventor of this code: Peter, do we
  really need that arch specific callback? Wouldn't one be enough? Is it
  that critical that it can't wait to the next timer tick?
 
 There's flags that determine when the next call should be invoked. The
 irq_work_run() should return immediately if it was already done by the
 arch specific call. The work wont be called twice.
 
 As I have worked on code that uses irq_work() I can say that we want
 the arch specific interrupts. For those architectures that don't have
 it will experience larger latencies for the work required. It's
 basically, a too bad for them.
 
 But to answer your question, no we want the immediate response.

Yah, what Steve said. That fallback is really a you suck to have to use
this.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Steven Rostedt
On Fri, 31 Jan 2014 20:48:45 +0100
Sebastian Andrzej Siewior bige...@linutronix.de wrote:

  As I have worked on code that uses irq_work() I can say that we want
  the arch specific interrupts. For those architectures that don't have
  it will experience larger latencies for the work required. It's
  basically, a too bad for them.
 
 How bad is it? Is this something generic or just not getting
 perf events fast enough out? Most users don't seem to require small
 latencies.

I use it for waking up trace-cmd, and if it is too long, then it we can
miss lots of events. Same goes for perf.

Remember, irq_work can be triggered from NMI context. That means, if
the CPU is idle, it may be several seconds before that work happens.
That is WAY too long to wait.

Anyway, your suggestion to get rid of the arch code doesn't help. We
call the irq_work_run() from interrupt context whether there is work or
not, with or without removing the arch code. The only thing your
suggestion will do is to push the work from happening immediately to
happening on the timer tick (which may be several seconds later). I
don't see the savings.

-- Steve

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Peter Zijlstra
On Fri, Jan 31, 2014 at 08:48:45PM +0100, Sebastian Andrzej Siewior wrote:
 
 How bad is it? Is this something generic or just not getting
 perf events fast enough out? Most users don't seem to require small
 latencies.

I have vague memories of there being an actual perf problem if there's a
hole between the NMI/IRQ triggering the irq_work and the interrupt
running the work.

I should have some notes on it somewhere and an todo entry to plug the
hole.

But note that the MCE code also uses irq_work, they really _need_ to be
fast because the system might be crumbling under their feet.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Sebastian Andrzej Siewior
On 01/31/2014 09:05 PM, Peter Zijlstra wrote:
 On Fri, Jan 31, 2014 at 08:48:45PM +0100, Sebastian Andrzej Siewior wrote:

 How bad is it? Is this something generic or just not getting
 perf events fast enough out? Most users don't seem to require small
 latencies.
 
 I have vague memories of there being an actual perf problem if there's a
 hole between the NMI/IRQ triggering the irq_work and the interrupt
 running the work.
 
 I should have some notes on it somewhere and an todo entry to plug the
 hole.
 
 But note that the MCE code also uses irq_work, they really _need_ to be
 fast because the system might be crumbling under their feet.

Okay, this makes sense. What looked odd was the powerpc implementation
where they let the timer interrupt expire and call the irq_work from
the timer interrupt just before the clockevents callback is executed
(which would invoke the irq_work callback as well).

Would it be a win if we would remove the arch specific code and instead
raise the timer interrupt asap? It sure won't be a win or change for
-RT but it would allow all architectures to get irq_work done as soon
as possible in IRQ context (and out of NMI context if I understand
correct) without an arch specific implementation.

Sebastian
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Peter Zijlstra
On Fri, Jan 31, 2014 at 09:23:52PM +0100, Sebastian Andrzej Siewior wrote:
 Okay, this makes sense. What looked odd was the powerpc implementation
 where they let the timer interrupt expire and call the irq_work from
 the timer interrupt just before the clockevents callback is executed
 (which would invoke the irq_work callback as well).

Ah, so what I remember Paul Mackerras saying was that that was the
easiest way to trigger an interrupt on the local CPU.

ARM runs the things from every IRQ tail IIRC (or maybe only the PMI, I
forget).

 Would it be a win if we would remove the arch specific code and instead
 raise the timer interrupt asap? It sure won't be a win or change for
 -RT but it would allow all architectures to get irq_work done as soon
 as possible in IRQ context (and out of NMI context if I understand
 correct) without an arch specific implementation.

No, because poking at timer hardware on x86 is stupid expensive, whereas
sending a self-IPI through the APIC is tons cheaper.

Also, I don't really want to poke at the fragile x86 timer hardware from
NMI context while we might already be poking at it from another context.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] timer: really raise softirq if there is irq_work to do

2014-01-31 Thread Sebastian Andrzej Siewior
from looking at the code, it seems that the softirq is only raised (in
the !base-active_timers case) if we have also an expired timer
(time_before_eq() is true). This patch ensures that the timer softirq is
also raised in the !base-active_timers  no timer expired.

Signed-off-by: Sebastian Andrzej Siewior bige...@linutronix.de
---
 kernel/timer.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index 6b3c172..b04c879 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1453,6 +1453,7 @@ static void run_timer_softirq(struct softirq_action *h)
 void run_local_timers(void)
 {
struct tvec_base *base = __this_cpu_read(tvec_bases);
+   bool need_raise = false;
 
hrtimer_run_queues();
/*
@@ -1470,12 +1471,17 @@ void run_local_timers(void)
 #ifdef CONFIG_PREEMPT_RT_FULL
/* On RT, irq work runs from softirq */
if (!irq_work_needs_cpu())
-#endif
goto out;
+   need_raise = true;
+#else
+   goto out;
+#endif
}
 
/* Check whether the next pending timer has expired */
if (time_before_eq(base-next_timer, jiffies))
+   need_raise = true;
+   if (need_raise)
raise_softirq(TIMER_SOFTIRQ);
 out:
 #ifdef CONFIG_PREEMPT_RT_FULL
-- 
1.8.5.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/