Re: RFC: futex_wait() can DoS the tick

2015-06-11 Thread Mike Galbraith
On Thu, 2015-06-11 at 20:52 +0200, Thomas Gleixner wrote:
> On Thu, 11 Jun 2015, Mike Galbraith wrote:
> > On Thu, 2015-06-11 at 15:13 +0200, Thomas Gleixner wrote:
> > 
> > > The kernel does not care whether the actual interrupt happens or the
> > > timer is expired by other means.
> > 
> > Final question: Notabug explanation fits.. but why would the behavior
> > delta be so radical and repeatable w/wo deadline timer on some boxen?
> 
> Good question. Cannot answer that w/o looking at traces.

I may come back with some, I spent enough time doing circular notabug
chasing all over hell and creation to be a tad annoyed ;-)

-Mike


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-11 Thread Thomas Gleixner
On Thu, 11 Jun 2015, Mike Galbraith wrote:
> On Thu, 2015-06-11 at 15:13 +0200, Thomas Gleixner wrote:
> 
> > The kernel does not care whether the actual interrupt happens or the
> > timer is expired by other means.
> 
> Final question: Notabug explanation fits.. but why would the behavior
> delta be so radical and repeatable w/wo deadline timer on some boxen?

Good question. Cannot answer that w/o looking at traces.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-11 Thread Mike Galbraith
On Thu, 2015-06-11 at 15:13 +0200, Thomas Gleixner wrote:

> The kernel does not care whether the actual interrupt happens or the
> timer is expired by other means.

Final question: Notabug explanation fits.. but why would the behavior
delta be so radical and repeatable w/wo deadline timer on some boxen?

-Mike


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-11 Thread Mike Galbraith
On Thu, 2015-06-11 at 15:13 +0200, Thomas Gleixner wrote:

> What you are looking at is the actual timer interrupt vector, but
> that's not the the tick.
> 
> # grep 'expire_entry.*tick_sched_timer' trace2 | wc -l
> 45
> 
> 175 / 45  =~ 4 . So I you have CPNFIG_HZ=250
> 
> The kernel does not care whether the actual interrupt happens or the
> timer is expired by other means.

Learn something new every day.  I (obviously) had no idea that it's
perfectly fine for interrupts to go away when the CPU is busy.

Oh well.  Thanks for the new wrinkle in the mush between my ears.

-Mike


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-11 Thread Thomas Gleixner
On Thu, 11 Jun 2015, Mike Galbraith wrote:
> On Thu, 2015-06-11 at 10:34 +0200, Thomas Gleixner wrote:
> 
> > Can you try the following, please?
> > 
> > Enable function tracer and hrtimer events manually. Then watch the irq
> > count on cpu3. If it stalls or becomes slow, then stop the trace with
> > 
> >   echo 0 >/sys/kernel/debug/tracing/tracing_on
> > 
> > If the overhead of the function tracer hides the problem, then try just
> > with hrtimer, sched_switch and irq events.
> 
> Yeah, function tracer hides it.  After some fiddling with buffer size to
> be able to ever see a tick, I finally have a trace trimmed with a tick
> on each end, 175ms gap between them.  It's still 1.5 after xz, sending
> off list.

What you are looking at is the actual timer interrupt vector, but
that's not the the tick.

# grep 'expire_entry.*tick_sched_timer' trace2 | wc -l
45

175 / 45  =~ 4 . So I you have CPNFIG_HZ=250

The kernel does not care whether the actual interrupt happens or the
timer is expired by other means.

If Xen relies on the vector, it's none of my problems 

Thanks,

tglx


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-11 Thread Mike Galbraith
On Thu, 2015-06-11 at 10:34 +0200, Thomas Gleixner wrote:

> Can you try the following, please?
> 
> Enable function tracer and hrtimer events manually. Then watch the irq
> count on cpu3. If it stalls or becomes slow, then stop the trace with
> 
>   echo 0 >/sys/kernel/debug/tracing/tracing_on
> 
> If the overhead of the function tracer hides the problem, then try just
> with hrtimer, sched_switch and irq events.

Yeah, function tracer hides it.  After some fiddling with buffer size to
be able to ever see a tick, I finally have a trace trimmed with a tick
on each end, 175ms gap between them.  It's still 1.5 after xz, sending
off list.

-Mike


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-11 Thread Thomas Gleixner
On Thu, 11 Jun 2015, Mike Galbraith wrote:
> On Wed, 2015-06-10 at 20:59 +0200, Thomas Gleixner wrote:
> Well, trying to do that like so...
> 
> trace-cmd record -M 8 -p function -e irq:* -e irq_vectors:local* -e 
> timer:hrtimer*  -- sleep 1

> ..capture takes much more than a second...
> 
> LOC: 248161 226536  42091  38892   Local timer interrupts
> LOC: 248381 226783  42092  38901   Local timer interrupts
> LOC: 248668 227038  42092  38903   Local timer interrupts
> LOC: 248963 227277  42092  38904   Local timer interrupts
> LOC: 249214 227515  42092  38905   Local timer interrupts
> LOC: 249486 227732  42092  38905   Local timer interrupts
> LOC: 249720 227961  42092  38905   Local timer interrupts
> 
> ...but more importantly, when it gets cranked up..
> 
> homer:~/tmp # trace-cmd report > report
> homer:~/tmp # grep local_timer_entry report|wc -l
> 252
> 
> ...it scares the problem away.

Can you try the following, please?

Enable function tracer and hrtimer events manually. Then watch the irq
count on cpu3. If it stalls or becomes slow, then stop the trace with

  echo 0 >/sys/kernel/debug/tracing/tracing_on

If the overhead of the function tracer hides the problem, then try just
with hrtimer, sched_switch and irq events.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-11 Thread Mike Galbraith
On Thu, 2015-06-11 at 09:35 +0200, Mike Galbraith wrote:
> On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
> 
> > I just booted a SNB with lapic=notscdeadline and ran that test
> > program. All what happens is - as expected - that ksoftirqd runs more
> > than we would like it to. I cannot observe any anomality vs. local
> > timer interrupts at all.
> 
> Ditto on a E5-4620 box this morning, no sign of it.  E5-2670 box I had
> verified "went away"

(dropped off the net on reboot, doesn't respond to remote button poking)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-11 Thread Mike Galbraith
On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:

> I just booted a SNB with lapic=notscdeadline and ran that test
> program. All what happens is - as expected - that ksoftirqd runs more
> than we would like it to. I cannot observe any anomality vs. local
> timer interrupts at all.

Ditto on a E5-4620 box this morning, no sign of it.  E5-2670 box I had
verified "went away" when I put master on it this morning, but that
doesn't matter much, having deadline timer feature alone definitely
doesn't mean box can be convinced to reproduce the stall.  Hohum.

-Mike


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-11 Thread Mike Galbraith
On Thu, 2015-06-11 at 10:34 +0200, Thomas Gleixner wrote:

 Can you try the following, please?
 
 Enable function tracer and hrtimer events manually. Then watch the irq
 count on cpu3. If it stalls or becomes slow, then stop the trace with
 
   echo 0 /sys/kernel/debug/tracing/tracing_on
 
 If the overhead of the function tracer hides the problem, then try just
 with hrtimer, sched_switch and irq events.

Yeah, function tracer hides it.  After some fiddling with buffer size to
be able to ever see a tick, I finally have a trace trimmed with a tick
on each end, 175ms gap between them.  It's still 1.5 after xz, sending
off list.

-Mike


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-11 Thread Thomas Gleixner
On Thu, 11 Jun 2015, Mike Galbraith wrote:
 On Thu, 2015-06-11 at 10:34 +0200, Thomas Gleixner wrote:
 
  Can you try the following, please?
  
  Enable function tracer and hrtimer events manually. Then watch the irq
  count on cpu3. If it stalls or becomes slow, then stop the trace with
  
echo 0 /sys/kernel/debug/tracing/tracing_on
  
  If the overhead of the function tracer hides the problem, then try just
  with hrtimer, sched_switch and irq events.
 
 Yeah, function tracer hides it.  After some fiddling with buffer size to
 be able to ever see a tick, I finally have a trace trimmed with a tick
 on each end, 175ms gap between them.  It's still 1.5 after xz, sending
 off list.

What you are looking at is the actual timer interrupt vector, but
that's not the the tick.

# grep 'expire_entry.*tick_sched_timer' trace2 | wc -l
45

175 / 45  =~ 4 . So I you have CPNFIG_HZ=250

The kernel does not care whether the actual interrupt happens or the
timer is expired by other means.

If Xen relies on the vector, it's none of my problems 

Thanks,

tglx


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-11 Thread Mike Galbraith
On Thu, 2015-06-11 at 15:13 +0200, Thomas Gleixner wrote:

 What you are looking at is the actual timer interrupt vector, but
 that's not the the tick.
 
 # grep 'expire_entry.*tick_sched_timer' trace2 | wc -l
 45
 
 175 / 45  =~ 4 . So I you have CPNFIG_HZ=250
 
 The kernel does not care whether the actual interrupt happens or the
 timer is expired by other means.

Learn something new every day.  I (obviously) had no idea that it's
perfectly fine for interrupts to go away when the CPU is busy.

Oh well.  Thanks for the new wrinkle in the mush between my ears.

-Mike


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-11 Thread Mike Galbraith
On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:

 I just booted a SNB with lapic=notscdeadline and ran that test
 program. All what happens is - as expected - that ksoftirqd runs more
 than we would like it to. I cannot observe any anomality vs. local
 timer interrupts at all.

Ditto on a E5-4620 box this morning, no sign of it.  E5-2670 box I had
verified went away when I put master on it this morning, but that
doesn't matter much, having deadline timer feature alone definitely
doesn't mean box can be convinced to reproduce the stall.  Hohum.

-Mike


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-11 Thread Mike Galbraith
On Thu, 2015-06-11 at 09:35 +0200, Mike Galbraith wrote:
 On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
 
  I just booted a SNB with lapic=notscdeadline and ran that test
  program. All what happens is - as expected - that ksoftirqd runs more
  than we would like it to. I cannot observe any anomality vs. local
  timer interrupts at all.
 
 Ditto on a E5-4620 box this morning, no sign of it.  E5-2670 box I had
 verified went away

(dropped off the net on reboot, doesn't respond to remote button poking)

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-11 Thread Thomas Gleixner
On Thu, 11 Jun 2015, Mike Galbraith wrote:
 On Wed, 2015-06-10 at 20:59 +0200, Thomas Gleixner wrote:
 Well, trying to do that like so...
 
 trace-cmd record -M 8 -p function -e irq:* -e irq_vectors:local* -e 
 timer:hrtimer*  -- sleep 1

 ..capture takes much more than a second...
 
 LOC: 248161 226536  42091  38892   Local timer interrupts
 LOC: 248381 226783  42092  38901   Local timer interrupts
 LOC: 248668 227038  42092  38903   Local timer interrupts
 LOC: 248963 227277  42092  38904   Local timer interrupts
 LOC: 249214 227515  42092  38905   Local timer interrupts
 LOC: 249486 227732  42092  38905   Local timer interrupts
 LOC: 249720 227961  42092  38905   Local timer interrupts
 
 ...but more importantly, when it gets cranked up..
 
 homer:~/tmp # trace-cmd report  report
 homer:~/tmp # grep local_timer_entry report|wc -l
 252
 
 ...it scares the problem away.

Can you try the following, please?

Enable function tracer and hrtimer events manually. Then watch the irq
count on cpu3. If it stalls or becomes slow, then stop the trace with

  echo 0 /sys/kernel/debug/tracing/tracing_on

If the overhead of the function tracer hides the problem, then try just
with hrtimer, sched_switch and irq events.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-11 Thread Mike Galbraith
On Thu, 2015-06-11 at 15:13 +0200, Thomas Gleixner wrote:

 The kernel does not care whether the actual interrupt happens or the
 timer is expired by other means.

Final question: Notabug explanation fits.. but why would the behavior
delta be so radical and repeatable w/wo deadline timer on some boxen?

-Mike


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-11 Thread Thomas Gleixner
On Thu, 11 Jun 2015, Mike Galbraith wrote:
 On Thu, 2015-06-11 at 15:13 +0200, Thomas Gleixner wrote:
 
  The kernel does not care whether the actual interrupt happens or the
  timer is expired by other means.
 
 Final question: Notabug explanation fits.. but why would the behavior
 delta be so radical and repeatable w/wo deadline timer on some boxen?

Good question. Cannot answer that w/o looking at traces.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-11 Thread Mike Galbraith
On Thu, 2015-06-11 at 20:52 +0200, Thomas Gleixner wrote:
 On Thu, 11 Jun 2015, Mike Galbraith wrote:
  On Thu, 2015-06-11 at 15:13 +0200, Thomas Gleixner wrote:
  
   The kernel does not care whether the actual interrupt happens or the
   timer is expired by other means.
  
  Final question: Notabug explanation fits.. but why would the behavior
  delta be so radical and repeatable w/wo deadline timer on some boxen?
 
 Good question. Cannot answer that w/o looking at traces.

I may come back with some, I spent enough time doing circular notabug
chasing all over hell and creation to be a tad annoyed ;-)

-Mike


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-10 Thread Mike Galbraith
On Wed, 2015-06-10 at 20:59 +0200, Thomas Gleixner wrote:
> On Wed, 10 Jun 2015, Mike Galbraith wrote:
> 
> > On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
> > 
> > > Some more information about your symptoms in form of configuration,
> > > extra patches, kernel traces etc. would be appreciated.
> > 
> > homer:~/tmp # perf record -C 3 -eirq:*,irq_vectors:local* -- sleep 10
> > [ perf record: Woken up 2 times to write data ]
> > [ perf record: Captured and wrote 1380.507 MB perf.data (19530595 samples) ]
> > homer:~/tmp # perf script | grep irq_vectors:local_timer
> >  ksoftirqd/319 [003]  5586.564238: irq_vectors:local_timer_entry: 
> > vector=239
> >  ksoftirqd/319 [003]  5586.564239: irq_vectors:local_timer_exit: 
> > vector=239
> >  ksoftirqd/319 [003]  5586.568238: irq_vectors:local_timer_entry: 
> > vector=239
> >  ksoftirqd/319 [003]  5586.568239: irq_vectors:local_timer_exit: 
> > vector=239
> >  ksoftirqd/319 [003]  5586.632218: irq_vectors:local_timer_entry: 
> > vector=239
> 
> That gap is odd. Can you fire up the function tracer please?

Well, trying to do that like so...

trace-cmd record -M 8 -p function -e irq:* -e irq_vectors:local* -e 
timer:hrtimer*  -- sleep 1

..capture takes much more than a second...

LOC: 248161 226536  42091  38892   Local timer interrupts
LOC: 248381 226783  42092  38901   Local timer interrupts
LOC: 248668 227038  42092  38903   Local timer interrupts
LOC: 248963 227277  42092  38904   Local timer interrupts
LOC: 249214 227515  42092  38905   Local timer interrupts
LOC: 249486 227732  42092  38905   Local timer interrupts
LOC: 249720 227961  42092  38905   Local timer interrupts
LOC: 249996 228200  42092  38906   Local timer interrupts
LOC: 250215 228450  42092  38906   Local timer interrupts
LOC: 250489 228704  42092  38906   Local timer interrupts
LOC: 250729 228952  42099  38964   Local timer interrupts
LOC: 250951 229141  42100  39215   Local timer interrupts
LOC: 251168 229374  42119  39468   Local timer interrupts
LOC: 251426 229641  42131  39719   Local timer interrupts
LOC: 251652 229868  42131  39970   Local timer interrupts
LOC: 251881 230141  42132  40220   Local timer interrupts
LOC: 252131 230371  42132  40471   Local timer interrupts
LOC: 252361 230646  42133  40721   Local timer interrupts
LOC: 252611 230875  42133  40972   Local timer interrupts
LOC: 252866 231149  42137  41222   Local timer interrupts
LOC: 253103 231386  42153  41450   Local timer interrupts

...but more importantly, when it gets cranked up..

homer:~/tmp # trace-cmd report > report
homer:~/tmp # grep local_timer_entry report|wc -l
252

...it scares the problem away.

(4 cpus because I fired up cpuset shield, isolates 2-3, takes 4-7 down)

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-10 Thread Thomas Gleixner
On Wed, 10 Jun 2015, Steven Rostedt wrote:

> On Wed, 10 Jun 2015 20:59:49 +0200 (CEST)
> Thomas Gleixner  wrote:
> 
> > On Wed, 10 Jun 2015, Mike Galbraith wrote:
> > 
> > > On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
> > > 
> > > > Some more information about your symptoms in form of configuration,
> > > > extra patches, kernel traces etc. would be appreciated.
> > > 
> > > homer:~/tmp # perf record -C 3 -eirq:*,irq_vectors:local* -- sleep 10
> 
> I would recommend using trace-cmd especially if you add function
> tracing, as it's made more for fast tracing.
> 
> -- Steve
> 
> > > [ perf record: Woken up 2 times to write data ]
> > > [ perf record: Captured and wrote 1380.507 MB perf.data (19530595 
> > > samples) ]
> > > homer:~/tmp # perf script | grep irq_vectors:local_timer
> > >  ksoftirqd/319 [003]  5586.564238: irq_vectors:local_timer_entry: 
> > > vector=239
> > >  ksoftirqd/319 [003]  5586.564239: irq_vectors:local_timer_exit: 
> > > vector=239
> > >  ksoftirqd/319 [003]  5586.568238: irq_vectors:local_timer_entry: 
> > > vector=239
> > >  ksoftirqd/319 [003]  5586.568239: irq_vectors:local_timer_exit: 
> > > vector=239
> > >  ksoftirqd/319 [003]  5586.632218: irq_vectors:local_timer_entry: 
> > > vector=239
> > 
> > That gap is odd. Can you fire up the function tracer please?

And please enable hrtimer events as well. Output of that cpu is good enough.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-10 Thread Steven Rostedt
On Wed, 10 Jun 2015 20:59:49 +0200 (CEST)
Thomas Gleixner  wrote:

> On Wed, 10 Jun 2015, Mike Galbraith wrote:
> 
> > On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
> > 
> > > Some more information about your symptoms in form of configuration,
> > > extra patches, kernel traces etc. would be appreciated.
> > 
> > homer:~/tmp # perf record -C 3 -eirq:*,irq_vectors:local* -- sleep 10

I would recommend using trace-cmd especially if you add function
tracing, as it's made more for fast tracing.

-- Steve

> > [ perf record: Woken up 2 times to write data ]
> > [ perf record: Captured and wrote 1380.507 MB perf.data (19530595 samples) ]
> > homer:~/tmp # perf script | grep irq_vectors:local_timer
> >  ksoftirqd/319 [003]  5586.564238: irq_vectors:local_timer_entry: 
> > vector=239
> >  ksoftirqd/319 [003]  5586.564239: irq_vectors:local_timer_exit: 
> > vector=239
> >  ksoftirqd/319 [003]  5586.568238: irq_vectors:local_timer_entry: 
> > vector=239
> >  ksoftirqd/319 [003]  5586.568239: irq_vectors:local_timer_exit: 
> > vector=239
> >  ksoftirqd/319 [003]  5586.632218: irq_vectors:local_timer_entry: 
> > vector=239
> 
> That gap is odd. Can you fire up the function tracer please?
> 
> Thanks,
> 
>   tglx

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-10 Thread Thomas Gleixner
On Wed, 10 Jun 2015, Mike Galbraith wrote:

> On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
> 
> > Some more information about your symptoms in form of configuration,
> > extra patches, kernel traces etc. would be appreciated.
> 
> homer:~/tmp # perf record -C 3 -eirq:*,irq_vectors:local* -- sleep 10
> [ perf record: Woken up 2 times to write data ]
> [ perf record: Captured and wrote 1380.507 MB perf.data (19530595 samples) ]
> homer:~/tmp # perf script | grep irq_vectors:local_timer
>  ksoftirqd/319 [003]  5586.564238: irq_vectors:local_timer_entry: 
> vector=239
>  ksoftirqd/319 [003]  5586.564239: irq_vectors:local_timer_exit: 
> vector=239
>  ksoftirqd/319 [003]  5586.568238: irq_vectors:local_timer_entry: 
> vector=239
>  ksoftirqd/319 [003]  5586.568239: irq_vectors:local_timer_exit: 
> vector=239
>  ksoftirqd/319 [003]  5586.632218: irq_vectors:local_timer_entry: 
> vector=239

That gap is odd. Can you fire up the function tracer please?

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-10 Thread Thomas Gleixner
On Wed, 10 Jun 2015, Mike Galbraith wrote:
> Virgin source or kernels with zillion+ patches, doesn't matter.  To test
> virgin source earlier than EFI_STUB I had to pollute the source with
> EFI backports, but nothing else.
> 
> Just a sec while I check yet again that absolutely virgin master really
> really does stall  Yup.  I pinned the tescase to CPU3..
> 
> LOC:   7432   5803   5418   6387   3371   3039   
> 5757   3080   Local timer interrupts <== here
> LOC:   7560   6028   5632   6394   3538   3195   
> 5937   3084   Local timer interrupts
> LOC:   7747   6135   5720   6394   3543   3262   
> 6087   3086   Local timer interrupts
> LOC:   7930   6206   5785   6394   3571   3288   
> 6303   3087   Local timer interrupts

So now it would be helpful to have a trace to see what CPU3 is
actually doing.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-10 Thread Mike Galbraith
On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:

> Some more information about your symptoms in form of configuration,
> extra patches, kernel traces etc. would be appreciated.

homer:~/tmp # perf record -C 3 -eirq:*,irq_vectors:local* -- sleep 10
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 1380.507 MB perf.data (19530595 samples) ]
homer:~/tmp # perf script | grep irq_vectors:local_timer
 ksoftirqd/319 [003]  5586.564238: irq_vectors:local_timer_entry: 
vector=239
 ksoftirqd/319 [003]  5586.564239: irq_vectors:local_timer_exit: 
vector=239
 ksoftirqd/319 [003]  5586.568238: irq_vectors:local_timer_entry: 
vector=239
 ksoftirqd/319 [003]  5586.568239: irq_vectors:local_timer_exit: 
vector=239
 ksoftirqd/319 [003]  5586.632218: irq_vectors:local_timer_entry: 
vector=239
 ksoftirqd/319 [003]  5586.632219: irq_vectors:local_timer_exit: 
vector=239
 bnc886785_livel  9817 [003]  5587.056069: irq_vectors:local_timer_entry: 
vector=239
 bnc886785_livel  9817 [003]  5587.056071: irq_vectors:local_timer_exit: 
vector=239
 ksoftirqd/319 [003]  5587.307978: irq_vectors:local_timer_entry: 
vector=239
 ksoftirqd/319 [003]  5587.307978: irq_vectors:local_timer_exit: 
vector=239
 ksoftirqd/319 [003]  5588.255633: irq_vectors:local_timer_entry: 
vector=239
 ksoftirqd/319 [003]  5588.255635: irq_vectors:local_timer_exit: 
vector=239
 ksoftirqd/319 [003]  5589.627132: irq_vectors:local_timer_entry: 
vector=239
 ksoftirqd/319 [003]  5589.627133: irq_vectors:local_timer_exit: 
vector=239
 bnc886785_livel  9817 [003]  5590.334877: irq_vectors:local_timer_entry: 
vector=239
 bnc886785_livel  9817 [003]  5590.334883: irq_vectors:local_timer_exit: 
vector=239
 bnc886785_livel  9817 [003]  5590.510815: irq_vectors:local_timer_entry: 
vector=239
 bnc886785_livel  9817 [003]  5590.510816: irq_vectors:local_timer_exit: 
vector=239
 ksoftirqd/319 [003]  5590.558794: irq_vectors:local_timer_entry: 
vector=239
 ksoftirqd/319 [003]  5590.558796: irq_vectors:local_timer_exit: 
vector=239
 bnc886785_livel  9817 [003]  5590.822707: irq_vectors:local_timer_entry: 
vector=239
 bnc886785_livel  9817 [003]  5590.822709: irq_vectors:local_timer_exit: 
vector=239
 ksoftirqd/319 [003]  5592.346152: irq_vectors:local_timer_entry: 
vector=239
 ksoftirqd/319 [003]  5592.346153: irq_vectors:local_timer_exit: 
vector=239
 bnc886785_livel  9817 [003]  5592.558074: irq_vectors:local_timer_entry: 
vector=239
 bnc886785_livel  9817 [003]  5592.558076: irq_vectors:local_timer_exit: 
vector=239
 bnc886785_livel  9817 [003]  5593.741647: irq_vectors:local_timer_entry: 
vector=239
 bnc886785_livel  9817 [003]  5593.741649: irq_vectors:local_timer_exit: 
vector=239
 ksoftirqd/319 [003]  5594.953208: irq_vectors:local_timer_entry: 
vector=239
 ksoftirqd/319 [003]  5594.953209: irq_vectors:local_timer_exit: 
vector=239
 bnc886785_livel  9817 [003]  5595.009197: irq_vectors:local_timer_entry: 
vector=239
 bnc886785_livel  9817 [003]  5595.009198: irq_vectors:local_timer_exit: 
vector=239
 ksoftirqd/319 [003]  5595.397049: irq_vectors:local_timer_entry: 
vector=239
 ksoftirqd/319 [003]  5595.397050: irq_vectors:local_timer_exit: 
vector=239
 bnc886785_livel  9817 [003]  5595.473025: irq_vectors:local_timer_entry: 
vector=239
 bnc886785_livel  9817 [003]  5595.473027: irq_vectors:local_timer_exit: 
vector=239
homer:~/tmp # 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-10 Thread Mike Galbraith
On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
> On Wed, 10 Jun 2015, Mike Galbraith wrote:
> > The above was handed to me by a colleague working on a Xen guest that
> > livelocked.  I at first though Xen arch must have a weird problem, but
> > when I tried proggy on my desktop box, while it didn't stop the tick
> > completely as it did the Xen box, it slowed it to a crawl.  I noticed
> > that this did not happen with newer kernels, so a bisecting I did go,
> > and found that...
> > 
> > 279f14614 x86: apic: Use tsc deadline for oneshot when available
> > 
> > ..is what fixed it up.  Trouble is, while it fixes up my Haswell box, a
> 
> This does not make any sense at all. It does not matter whether the
> box uses tscdeadline or local apic timer. We do not even program the
> hardware because we see that the event is in the past already.

Yup.

> So we raise the hrtimer softirqd, which then expires the timer. So all
> what happens is that ksoftirqd accumulates runtime, but I cannot at
> all see how that amounts to a DoS and brings the machine to a grinding
> halt.

The tick certainly appears to crawl here, and Dom0 boxen gripe if you
let them not tick at all for a while.

> I just booted a SNB with lapic=notscdeadline and ran that test
> program. All what happens is - as expected - that ksoftirqd runs more
> than we would like it to. I cannot observe any anomality vs. local
> timer interrupts at all. If I run this pinned on an otherwise idle
> core, then I get ~ CONFIG_HZ interrupts per second, which is what you
> expect when the cpu never reaches idle.

Hm.  In order to successfully bisect the thing 3.7->3.8 I ran 2xCPUS
copies because the first bisect went gaga.  I'm not having any trouble
reproducing on master with a single pinned copy though, nor did I have
any on any of the kernels either stable or enterprise I tested, and
that's quite a few.  Whatever, that first bisect did go bad.

> > The below targets the symptom, consider it hrtimer cluebat attractant.
> 
> By now I know to take your patches with a grain of salt :)

Sodium being bad for blood pressure is a medical myth.

> Some more information about your symptoms in form of configuration,
> extra patches, kernel traces etc. would be appreciated.

Virgin source or kernels with zillion+ patches, doesn't matter.  To test
virgin source earlier than EFI_STUB I had to pollute the source with
EFI backports, but nothing else.

Just a sec while I check yet again that absolutely virgin master really
really does stall  Yup.  I pinned the tescase to CPU3..

while sleep 1; do grep LOC /proc/interrupts; done
LOC:   6706   5367   5053   6217   3031   2866   
5477   3022   Local timer interrupts
LOC:   6753   5391   5074   6238   3058   2894   
5576   3034   Local timer interrupts
LOC:   6791   5422   5104   6265   3066   2903   
5582   3039   Local timer interrupts
LOC:   6846   5472   5154   6293   3096   2909   
5595   3042   Local timer interrupts
LOC:   6855   5518   5177   6325   3199   2920   
5613   3046   Local timer interrupts
LOC:   6892   5552   5217   6338   3234   2935   
5637   3053   Local timer interrupts
LOC:   6983   5568   5236   6347   3244   2944   
5660   3065   Local timer interrupts
LOC:   7028   5583   5251   6363   3262   2963   
5673   3071   Local timer interrupts
LOC:   7217   5676   5343   6383   3305   2976   
5682   3078   Local timer interrupts
LOC:   7432   5803   5418   6387   3371   3039   
5757   3080   Local timer interrupts <== here
LOC:   7560   6028   5632   6394   3538   3195   
5937   3084   Local timer interrupts
LOC:   7747   6135   5720   6394   3543   3262   
6087   3086   Local timer interrupts
LOC:   7930   6206   5785   6394   3571   3288   
6303   3087   Local timer interrupts
LOC:   8057   6299   5842   6394   3606   3346   
6415   3088   Local timer interrupts
LOC:   8236   6361   5921   6394   3632   3409   
6630   3090   Local timer interrupts
LOC:   8382   6448   6004   6394   3664   3478   
6754   3090   Local timer interrupts
LOC:   8460   6571   6124   6394   3690   3542   
6951   3092   Local timer interrupts
LOC:   8605   6670   6224   6394   3723   3614   
7078   3093   Local timer interrupts
LOC:   8710   6842   6323   6394   3776   3702   
7295   3123   Local timer interrupts
LOC:   8868   6947   6402   6394   3828   3784   
7422   3149   Local timer 

Re: RFC: futex_wait() can DoS the tick

2015-06-10 Thread Thomas Gleixner
On Wed, 10 Jun 2015, Mike Galbraith wrote:
> The above was handed to me by a colleague working on a Xen guest that
> livelocked.  I at first though Xen arch must have a weird problem, but
> when I tried proggy on my desktop box, while it didn't stop the tick
> completely as it did the Xen box, it slowed it to a crawl.  I noticed
> that this did not happen with newer kernels, so a bisecting I did go,
> and found that...
> 
> 279f14614 x86: apic: Use tsc deadline for oneshot when available
> 
> ..is what fixed it up.  Trouble is, while it fixes up my Haswell box, a

This does not make any sense at all. It does not matter whether the
box uses tscdeadline or local apic timer. We do not even program the
hardware because we see that the event is in the past already.

So we raise the hrtimer softirqd, which then expires the timer. So all
what happens is that ksoftirqd accumulates runtime, but I cannot at
all see how that amounts to a DoS and brings the machine to a grinding
halt.

> Xen dom0 remains busted by that testcase whether that patch is applied
> to the host or not, even though the hypervisor supports deadline timer,
> and seemingly regardless of CPU type all together.
>
> Of all the x86_64 bare metal boxen I've tested, only those with the TSC
> deadline timer have shown the issue, and there it goes away as of v3.8
> unless you boot lapic=notscdeadline.

I just booted a SNB with lapic=notscdeadline and ran that test
program. All what happens is - as expected - that ksoftirqd runs more
than we would like it to. I cannot observe any anomality vs. local
timer interrupts at all. If I run this pinned on an otherwise idle
core, then I get ~ CONFIG_HZ interrupts per second, which is what you
expect when the cpu never reaches idle.

With the changes pending in tip/timers/core we get more timer
interrupts instead of offloading crap to ksoftirqd, but they cannot
lead to a DoS either and we do not care whether the user spends its
cycles looping in user space or firing timer interrupts. It can only
do as long as it is on the cpu.

These timers (futex, nanosleep, poll, ...) are oneshot and all timers
which are self rearming are rate limited by the fact that we only
rearm when the previous event has been consumed by the task which
scheduled it. So the scheduler controls how many of these events can
be created from user space.

> However, given any x86_64 Intel box with TSC deadline timer (ivy, sandy,
> hasbeen) can be made to exhibit the symptom, there may be other arches
> that get seriously dinged up or maybe even as thoroughly b0rked as Xen
> does when hrtimer_interrupt() is pounded into the ground by userspace.
> 
> Alternatively, should someone out there know that all bare metal is in
> fact fine post 279f14614, that person will likely also know what the Xen
> folks need to do to fix up their busted arch. 
> 
> The below targets the symptom, consider it hrtimer cluebat attractant.

By now I know to take your patches with a grain of salt :)

Some more information about your symptoms in form of configuration,
extra patches, kernel traces etc. would be appreciated.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RFC: futex_wait() can DoS the tick

2015-06-10 Thread Mike Galbraith
Greetings,

Like so...

#include 
#include 
#include 
#include  

int sys_futex(void *addr1, int op, int val1, struct timespec *timeout, void 
*addr2, int val3)
{
return syscall(SYS_futex, addr1, op, val1, timeout, addr2, val3);
}


int main()
{
struct timespec t;
int f = 1;

clock_gettime(CLOCK_REALTIME, );
t.tv_sec -= 10;

while (1) {
sys_futex(, FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME, 1, , 
NULL, FUTEX_BITSET_MATCH_ANY);
}
}


The above was handed to me by a colleague working on a Xen guest that
livelocked.  I at first though Xen arch must have a weird problem, but
when I tried proggy on my desktop box, while it didn't stop the tick
completely as it did the Xen box, it slowed it to a crawl.  I noticed
that this did not happen with newer kernels, so a bisecting I did go,
and found that...

279f14614 x86: apic: Use tsc deadline for oneshot when available

..is what fixed it up.  Trouble is, while it fixes up my Haswell box, a
Xen dom0 remains busted by that testcase whether that patch is applied
to the host or not, even though the hypervisor supports deadline timer,
and seemingly regardless of CPU type all together.

Of all the x86_64 bare metal boxen I've tested, only those with the TSC
deadline timer have shown the issue, and there it goes away as of v3.8
unless you boot lapic=notscdeadline.

However, given any x86_64 Intel box with TSC deadline timer (ivy, sandy,
hasbeen) can be made to exhibit the symptom, there may be other arches
that get seriously dinged up or maybe even as thoroughly b0rked as Xen
does when hrtimer_interrupt() is pounded into the ground by userspace.

Alternatively, should someone out there know that all bare metal is in
fact fine post 279f14614, that person will likely also know what the Xen
folks need to do to fix up their busted arch. 

The below targets the symptom, consider it hrtimer cluebat attractant.

---
 kernel/time/hrtimer.c |   31 ---
 1 file changed, 28 insertions(+), 3 deletions(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -933,6 +933,8 @@ remove_hrtimer(struct hrtimer *timer, st
return 0;
 }
 
+static enum hrtimer_restart hrtimer_wakeup(struct hrtimer *timer);
+
 int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
unsigned long delta_ns, const enum hrtimer_mode mode,
int wakeup)
@@ -980,8 +982,27 @@ int __hrtimer_start_range_ns(struct hrti
 * on dynticks target.
 */
wake_up_nohz_cpu(new_base->cpu_base->cpu);
-   } else if (new_base->cpu_base == this_cpu_ptr(_bases) &&
-   hrtimer_reprogram(timer, new_base)) {
+   } else if (new_base->cpu_base == this_cpu_ptr(_bases)) {
+   int res = hrtimer_reprogram(timer, new_base);
+
+   if (!res)
+   goto out;
+
+   /*
+* If a buggy app tries forever to be awakened in the past,
+* banging on hrtimer_interrupt() at high speed can stall
+* the tick, and on a Xen box, forever.  On haswell with
+* tsc_deadline_timer disabled you can see it, though it
+* only slows the tick way down.  Other bare metal boxes
+* may also be terminally affected.
+*/
+   if (unlikely(wakeup && !ret && IS_ERR_VALUE(res) &&
+   timer->function == hrtimer_wakeup)) {
+   debug_deactivate(timer);
+   __remove_hrtimer(timer, new_base, 0, 0);
+   ret = -ETIMEDOUT;
+   }
+
/*
 * Only allow reprogramming if the new base is on this CPU.
 * (it might still be on another CPU if the timer was pending)
@@ -994,7 +1015,10 @@ int __hrtimer_start_range_ns(struct hrti
 * lock ordering issue vs. rq->lock.
 */
raw_spin_unlock(_base->cpu_base->lock);
-   raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+   if (!IS_ERR_VALUE(ret))
+   raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+   else
+   hrtimer_wakeup(timer);
local_irq_restore(flags);
return ret;
} else {
@@ -1002,6 +1026,7 @@ int __hrtimer_start_range_ns(struct hrti
}
}
 
+out:
unlock_hrtimer_base(timer, );
 
return ret;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RFC: futex_wait() can DoS the tick

2015-06-10 Thread Mike Galbraith
Greetings,

Like so...

#include time.h
#include sys/time.h
#include linux/futex.h
#include  sys/syscall.h

int sys_futex(void *addr1, int op, int val1, struct timespec *timeout, void 
*addr2, int val3)
{
return syscall(SYS_futex, addr1, op, val1, timeout, addr2, val3);
}


int main()
{
struct timespec t;
int f = 1;

clock_gettime(CLOCK_REALTIME, t);
t.tv_sec -= 10;

while (1) {
sys_futex(f, FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME, 1, t, 
NULL, FUTEX_BITSET_MATCH_ANY);
}
}


The above was handed to me by a colleague working on a Xen guest that
livelocked.  I at first though Xen arch must have a weird problem, but
when I tried proggy on my desktop box, while it didn't stop the tick
completely as it did the Xen box, it slowed it to a crawl.  I noticed
that this did not happen with newer kernels, so a bisecting I did go,
and found that...

279f14614 x86: apic: Use tsc deadline for oneshot when available

..is what fixed it up.  Trouble is, while it fixes up my Haswell box, a
Xen dom0 remains busted by that testcase whether that patch is applied
to the host or not, even though the hypervisor supports deadline timer,
and seemingly regardless of CPU type all together.

Of all the x86_64 bare metal boxen I've tested, only those with the TSC
deadline timer have shown the issue, and there it goes away as of v3.8
unless you boot lapic=notscdeadline.

However, given any x86_64 Intel box with TSC deadline timer (ivy, sandy,
hasbeen) can be made to exhibit the symptom, there may be other arches
that get seriously dinged up or maybe even as thoroughly b0rked as Xen
does when hrtimer_interrupt() is pounded into the ground by userspace.

Alternatively, should someone out there know that all bare metal is in
fact fine post 279f14614, that person will likely also know what the Xen
folks need to do to fix up their busted arch. 

The below targets the symptom, consider it hrtimer cluebat attractant.

---
 kernel/time/hrtimer.c |   31 ---
 1 file changed, 28 insertions(+), 3 deletions(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -933,6 +933,8 @@ remove_hrtimer(struct hrtimer *timer, st
return 0;
 }
 
+static enum hrtimer_restart hrtimer_wakeup(struct hrtimer *timer);
+
 int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
unsigned long delta_ns, const enum hrtimer_mode mode,
int wakeup)
@@ -980,8 +982,27 @@ int __hrtimer_start_range_ns(struct hrti
 * on dynticks target.
 */
wake_up_nohz_cpu(new_base-cpu_base-cpu);
-   } else if (new_base-cpu_base == this_cpu_ptr(hrtimer_bases) 
-   hrtimer_reprogram(timer, new_base)) {
+   } else if (new_base-cpu_base == this_cpu_ptr(hrtimer_bases)) {
+   int res = hrtimer_reprogram(timer, new_base);
+
+   if (!res)
+   goto out;
+
+   /*
+* If a buggy app tries forever to be awakened in the past,
+* banging on hrtimer_interrupt() at high speed can stall
+* the tick, and on a Xen box, forever.  On haswell with
+* tsc_deadline_timer disabled you can see it, though it
+* only slows the tick way down.  Other bare metal boxes
+* may also be terminally affected.
+*/
+   if (unlikely(wakeup  !ret  IS_ERR_VALUE(res) 
+   timer-function == hrtimer_wakeup)) {
+   debug_deactivate(timer);
+   __remove_hrtimer(timer, new_base, 0, 0);
+   ret = -ETIMEDOUT;
+   }
+
/*
 * Only allow reprogramming if the new base is on this CPU.
 * (it might still be on another CPU if the timer was pending)
@@ -994,7 +1015,10 @@ int __hrtimer_start_range_ns(struct hrti
 * lock ordering issue vs. rq-lock.
 */
raw_spin_unlock(new_base-cpu_base-lock);
-   raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+   if (!IS_ERR_VALUE(ret))
+   raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+   else
+   hrtimer_wakeup(timer);
local_irq_restore(flags);
return ret;
} else {
@@ -1002,6 +1026,7 @@ int __hrtimer_start_range_ns(struct hrti
}
}
 
+out:
unlock_hrtimer_base(timer, flags);
 
return ret;


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-10 Thread Mike Galbraith
On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
 On Wed, 10 Jun 2015, Mike Galbraith wrote:
  The above was handed to me by a colleague working on a Xen guest that
  livelocked.  I at first though Xen arch must have a weird problem, but
  when I tried proggy on my desktop box, while it didn't stop the tick
  completely as it did the Xen box, it slowed it to a crawl.  I noticed
  that this did not happen with newer kernels, so a bisecting I did go,
  and found that...
  
  279f14614 x86: apic: Use tsc deadline for oneshot when available
  
  ..is what fixed it up.  Trouble is, while it fixes up my Haswell box, a
 
 This does not make any sense at all. It does not matter whether the
 box uses tscdeadline or local apic timer. We do not even program the
 hardware because we see that the event is in the past already.

Yup.

 So we raise the hrtimer softirqd, which then expires the timer. So all
 what happens is that ksoftirqd accumulates runtime, but I cannot at
 all see how that amounts to a DoS and brings the machine to a grinding
 halt.

The tick certainly appears to crawl here, and Dom0 boxen gripe if you
let them not tick at all for a while.

 I just booted a SNB with lapic=notscdeadline and ran that test
 program. All what happens is - as expected - that ksoftirqd runs more
 than we would like it to. I cannot observe any anomality vs. local
 timer interrupts at all. If I run this pinned on an otherwise idle
 core, then I get ~ CONFIG_HZ interrupts per second, which is what you
 expect when the cpu never reaches idle.

Hm.  In order to successfully bisect the thing 3.7-3.8 I ran 2xCPUS
copies because the first bisect went gaga.  I'm not having any trouble
reproducing on master with a single pinned copy though, nor did I have
any on any of the kernels either stable or enterprise I tested, and
that's quite a few.  Whatever, that first bisect did go bad.

  The below targets the symptom, consider it hrtimer cluebat attractant.
 
 By now I know to take your patches with a grain of salt :)

Sodium being bad for blood pressure is a medical myth.

 Some more information about your symptoms in form of configuration,
 extra patches, kernel traces etc. would be appreciated.

Virgin source or kernels with zillion+ patches, doesn't matter.  To test
virgin source earlier than EFI_STUB I had to pollute the source with
EFI backports, but nothing else.

Just a sec while I check yet again that absolutely virgin master really
really does stall  Yup.  I pinned the tescase to CPU3..

while sleep 1; do grep LOC /proc/interrupts; done
LOC:   6706   5367   5053   6217   3031   2866   
5477   3022   Local timer interrupts
LOC:   6753   5391   5074   6238   3058   2894   
5576   3034   Local timer interrupts
LOC:   6791   5422   5104   6265   3066   2903   
5582   3039   Local timer interrupts
LOC:   6846   5472   5154   6293   3096   2909   
5595   3042   Local timer interrupts
LOC:   6855   5518   5177   6325   3199   2920   
5613   3046   Local timer interrupts
LOC:   6892   5552   5217   6338   3234   2935   
5637   3053   Local timer interrupts
LOC:   6983   5568   5236   6347   3244   2944   
5660   3065   Local timer interrupts
LOC:   7028   5583   5251   6363   3262   2963   
5673   3071   Local timer interrupts
LOC:   7217   5676   5343   6383   3305   2976   
5682   3078   Local timer interrupts
LOC:   7432   5803   5418   6387   3371   3039   
5757   3080   Local timer interrupts == here
LOC:   7560   6028   5632   6394   3538   3195   
5937   3084   Local timer interrupts
LOC:   7747   6135   5720   6394   3543   3262   
6087   3086   Local timer interrupts
LOC:   7930   6206   5785   6394   3571   3288   
6303   3087   Local timer interrupts
LOC:   8057   6299   5842   6394   3606   3346   
6415   3088   Local timer interrupts
LOC:   8236   6361   5921   6394   3632   3409   
6630   3090   Local timer interrupts
LOC:   8382   6448   6004   6394   3664   3478   
6754   3090   Local timer interrupts
LOC:   8460   6571   6124   6394   3690   3542   
6951   3092   Local timer interrupts
LOC:   8605   6670   6224   6394   3723   3614   
7078   3093   Local timer interrupts
LOC:   8710   6842   6323   6394   3776   3702   
7295   3123   Local timer interrupts
LOC:   8868   6947   6402   6394   3828   3784   
7422   3149   Local timer interrupts
LOC:   9077   7124   6523 

Re: RFC: futex_wait() can DoS the tick

2015-06-10 Thread Thomas Gleixner
On Wed, 10 Jun 2015, Steven Rostedt wrote:

 On Wed, 10 Jun 2015 20:59:49 +0200 (CEST)
 Thomas Gleixner t...@linutronix.de wrote:
 
  On Wed, 10 Jun 2015, Mike Galbraith wrote:
  
   On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
   
Some more information about your symptoms in form of configuration,
extra patches, kernel traces etc. would be appreciated.
   
   homer:~/tmp # perf record -C 3 -eirq:*,irq_vectors:local* -- sleep 10
 
 I would recommend using trace-cmd especially if you add function
 tracing, as it's made more for fast tracing.
 
 -- Steve
 
   [ perf record: Woken up 2 times to write data ]
   [ perf record: Captured and wrote 1380.507 MB perf.data (19530595 
   samples) ]
   homer:~/tmp # perf script | grep irq_vectors:local_timer
ksoftirqd/319 [003]  5586.564238: irq_vectors:local_timer_entry: 
   vector=239
ksoftirqd/319 [003]  5586.564239: irq_vectors:local_timer_exit: 
   vector=239
ksoftirqd/319 [003]  5586.568238: irq_vectors:local_timer_entry: 
   vector=239
ksoftirqd/319 [003]  5586.568239: irq_vectors:local_timer_exit: 
   vector=239
ksoftirqd/319 [003]  5586.632218: irq_vectors:local_timer_entry: 
   vector=239
  
  That gap is odd. Can you fire up the function tracer please?

And please enable hrtimer events as well. Output of that cpu is good enough.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-10 Thread Steven Rostedt
On Wed, 10 Jun 2015 20:59:49 +0200 (CEST)
Thomas Gleixner t...@linutronix.de wrote:

 On Wed, 10 Jun 2015, Mike Galbraith wrote:
 
  On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
  
   Some more information about your symptoms in form of configuration,
   extra patches, kernel traces etc. would be appreciated.
  
  homer:~/tmp # perf record -C 3 -eirq:*,irq_vectors:local* -- sleep 10

I would recommend using trace-cmd especially if you add function
tracing, as it's made more for fast tracing.

-- Steve

  [ perf record: Woken up 2 times to write data ]
  [ perf record: Captured and wrote 1380.507 MB perf.data (19530595 samples) ]
  homer:~/tmp # perf script | grep irq_vectors:local_timer
   ksoftirqd/319 [003]  5586.564238: irq_vectors:local_timer_entry: 
  vector=239
   ksoftirqd/319 [003]  5586.564239: irq_vectors:local_timer_exit: 
  vector=239
   ksoftirqd/319 [003]  5586.568238: irq_vectors:local_timer_entry: 
  vector=239
   ksoftirqd/319 [003]  5586.568239: irq_vectors:local_timer_exit: 
  vector=239
   ksoftirqd/319 [003]  5586.632218: irq_vectors:local_timer_entry: 
  vector=239
 
 That gap is odd. Can you fire up the function tracer please?
 
 Thanks,
 
   tglx

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-10 Thread Thomas Gleixner
On Wed, 10 Jun 2015, Mike Galbraith wrote:
 Virgin source or kernels with zillion+ patches, doesn't matter.  To test
 virgin source earlier than EFI_STUB I had to pollute the source with
 EFI backports, but nothing else.
 
 Just a sec while I check yet again that absolutely virgin master really
 really does stall  Yup.  I pinned the tescase to CPU3..
 
 LOC:   7432   5803   5418   6387   3371   3039   
 5757   3080   Local timer interrupts == here
 LOC:   7560   6028   5632   6394   3538   3195   
 5937   3084   Local timer interrupts
 LOC:   7747   6135   5720   6394   3543   3262   
 6087   3086   Local timer interrupts
 LOC:   7930   6206   5785   6394   3571   3288   
 6303   3087   Local timer interrupts

So now it would be helpful to have a trace to see what CPU3 is
actually doing.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-10 Thread Thomas Gleixner
On Wed, 10 Jun 2015, Mike Galbraith wrote:

 On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
 
  Some more information about your symptoms in form of configuration,
  extra patches, kernel traces etc. would be appreciated.
 
 homer:~/tmp # perf record -C 3 -eirq:*,irq_vectors:local* -- sleep 10
 [ perf record: Woken up 2 times to write data ]
 [ perf record: Captured and wrote 1380.507 MB perf.data (19530595 samples) ]
 homer:~/tmp # perf script | grep irq_vectors:local_timer
  ksoftirqd/319 [003]  5586.564238: irq_vectors:local_timer_entry: 
 vector=239
  ksoftirqd/319 [003]  5586.564239: irq_vectors:local_timer_exit: 
 vector=239
  ksoftirqd/319 [003]  5586.568238: irq_vectors:local_timer_entry: 
 vector=239
  ksoftirqd/319 [003]  5586.568239: irq_vectors:local_timer_exit: 
 vector=239
  ksoftirqd/319 [003]  5586.632218: irq_vectors:local_timer_entry: 
 vector=239

That gap is odd. Can you fire up the function tracer please?

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-10 Thread Mike Galbraith
On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:

 Some more information about your symptoms in form of configuration,
 extra patches, kernel traces etc. would be appreciated.

homer:~/tmp # perf record -C 3 -eirq:*,irq_vectors:local* -- sleep 10
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 1380.507 MB perf.data (19530595 samples) ]
homer:~/tmp # perf script | grep irq_vectors:local_timer
 ksoftirqd/319 [003]  5586.564238: irq_vectors:local_timer_entry: 
vector=239
 ksoftirqd/319 [003]  5586.564239: irq_vectors:local_timer_exit: 
vector=239
 ksoftirqd/319 [003]  5586.568238: irq_vectors:local_timer_entry: 
vector=239
 ksoftirqd/319 [003]  5586.568239: irq_vectors:local_timer_exit: 
vector=239
 ksoftirqd/319 [003]  5586.632218: irq_vectors:local_timer_entry: 
vector=239
 ksoftirqd/319 [003]  5586.632219: irq_vectors:local_timer_exit: 
vector=239
 bnc886785_livel  9817 [003]  5587.056069: irq_vectors:local_timer_entry: 
vector=239
 bnc886785_livel  9817 [003]  5587.056071: irq_vectors:local_timer_exit: 
vector=239
 ksoftirqd/319 [003]  5587.307978: irq_vectors:local_timer_entry: 
vector=239
 ksoftirqd/319 [003]  5587.307978: irq_vectors:local_timer_exit: 
vector=239
 ksoftirqd/319 [003]  5588.255633: irq_vectors:local_timer_entry: 
vector=239
 ksoftirqd/319 [003]  5588.255635: irq_vectors:local_timer_exit: 
vector=239
 ksoftirqd/319 [003]  5589.627132: irq_vectors:local_timer_entry: 
vector=239
 ksoftirqd/319 [003]  5589.627133: irq_vectors:local_timer_exit: 
vector=239
 bnc886785_livel  9817 [003]  5590.334877: irq_vectors:local_timer_entry: 
vector=239
 bnc886785_livel  9817 [003]  5590.334883: irq_vectors:local_timer_exit: 
vector=239
 bnc886785_livel  9817 [003]  5590.510815: irq_vectors:local_timer_entry: 
vector=239
 bnc886785_livel  9817 [003]  5590.510816: irq_vectors:local_timer_exit: 
vector=239
 ksoftirqd/319 [003]  5590.558794: irq_vectors:local_timer_entry: 
vector=239
 ksoftirqd/319 [003]  5590.558796: irq_vectors:local_timer_exit: 
vector=239
 bnc886785_livel  9817 [003]  5590.822707: irq_vectors:local_timer_entry: 
vector=239
 bnc886785_livel  9817 [003]  5590.822709: irq_vectors:local_timer_exit: 
vector=239
 ksoftirqd/319 [003]  5592.346152: irq_vectors:local_timer_entry: 
vector=239
 ksoftirqd/319 [003]  5592.346153: irq_vectors:local_timer_exit: 
vector=239
 bnc886785_livel  9817 [003]  5592.558074: irq_vectors:local_timer_entry: 
vector=239
 bnc886785_livel  9817 [003]  5592.558076: irq_vectors:local_timer_exit: 
vector=239
 bnc886785_livel  9817 [003]  5593.741647: irq_vectors:local_timer_entry: 
vector=239
 bnc886785_livel  9817 [003]  5593.741649: irq_vectors:local_timer_exit: 
vector=239
 ksoftirqd/319 [003]  5594.953208: irq_vectors:local_timer_entry: 
vector=239
 ksoftirqd/319 [003]  5594.953209: irq_vectors:local_timer_exit: 
vector=239
 bnc886785_livel  9817 [003]  5595.009197: irq_vectors:local_timer_entry: 
vector=239
 bnc886785_livel  9817 [003]  5595.009198: irq_vectors:local_timer_exit: 
vector=239
 ksoftirqd/319 [003]  5595.397049: irq_vectors:local_timer_entry: 
vector=239
 ksoftirqd/319 [003]  5595.397050: irq_vectors:local_timer_exit: 
vector=239
 bnc886785_livel  9817 [003]  5595.473025: irq_vectors:local_timer_entry: 
vector=239
 bnc886785_livel  9817 [003]  5595.473027: irq_vectors:local_timer_exit: 
vector=239
homer:~/tmp # 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-10 Thread Mike Galbraith
On Wed, 2015-06-10 at 20:59 +0200, Thomas Gleixner wrote:
 On Wed, 10 Jun 2015, Mike Galbraith wrote:
 
  On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
  
   Some more information about your symptoms in form of configuration,
   extra patches, kernel traces etc. would be appreciated.
  
  homer:~/tmp # perf record -C 3 -eirq:*,irq_vectors:local* -- sleep 10
  [ perf record: Woken up 2 times to write data ]
  [ perf record: Captured and wrote 1380.507 MB perf.data (19530595 samples) ]
  homer:~/tmp # perf script | grep irq_vectors:local_timer
   ksoftirqd/319 [003]  5586.564238: irq_vectors:local_timer_entry: 
  vector=239
   ksoftirqd/319 [003]  5586.564239: irq_vectors:local_timer_exit: 
  vector=239
   ksoftirqd/319 [003]  5586.568238: irq_vectors:local_timer_entry: 
  vector=239
   ksoftirqd/319 [003]  5586.568239: irq_vectors:local_timer_exit: 
  vector=239
   ksoftirqd/319 [003]  5586.632218: irq_vectors:local_timer_entry: 
  vector=239
 
 That gap is odd. Can you fire up the function tracer please?

Well, trying to do that like so...

trace-cmd record -M 8 -p function -e irq:* -e irq_vectors:local* -e 
timer:hrtimer*  -- sleep 1

..capture takes much more than a second...

LOC: 248161 226536  42091  38892   Local timer interrupts
LOC: 248381 226783  42092  38901   Local timer interrupts
LOC: 248668 227038  42092  38903   Local timer interrupts
LOC: 248963 227277  42092  38904   Local timer interrupts
LOC: 249214 227515  42092  38905   Local timer interrupts
LOC: 249486 227732  42092  38905   Local timer interrupts
LOC: 249720 227961  42092  38905   Local timer interrupts
LOC: 249996 228200  42092  38906   Local timer interrupts
LOC: 250215 228450  42092  38906   Local timer interrupts
LOC: 250489 228704  42092  38906   Local timer interrupts
LOC: 250729 228952  42099  38964   Local timer interrupts
LOC: 250951 229141  42100  39215   Local timer interrupts
LOC: 251168 229374  42119  39468   Local timer interrupts
LOC: 251426 229641  42131  39719   Local timer interrupts
LOC: 251652 229868  42131  39970   Local timer interrupts
LOC: 251881 230141  42132  40220   Local timer interrupts
LOC: 252131 230371  42132  40471   Local timer interrupts
LOC: 252361 230646  42133  40721   Local timer interrupts
LOC: 252611 230875  42133  40972   Local timer interrupts
LOC: 252866 231149  42137  41222   Local timer interrupts
LOC: 253103 231386  42153  41450   Local timer interrupts

...but more importantly, when it gets cranked up..

homer:~/tmp # trace-cmd report  report
homer:~/tmp # grep local_timer_entry report|wc -l
252

...it scares the problem away.

(4 cpus because I fired up cpuset shield, isolates 2-3, takes 4-7 down)

-Mike

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: futex_wait() can DoS the tick

2015-06-10 Thread Thomas Gleixner
On Wed, 10 Jun 2015, Mike Galbraith wrote:
 The above was handed to me by a colleague working on a Xen guest that
 livelocked.  I at first though Xen arch must have a weird problem, but
 when I tried proggy on my desktop box, while it didn't stop the tick
 completely as it did the Xen box, it slowed it to a crawl.  I noticed
 that this did not happen with newer kernels, so a bisecting I did go,
 and found that...
 
 279f14614 x86: apic: Use tsc deadline for oneshot when available
 
 ..is what fixed it up.  Trouble is, while it fixes up my Haswell box, a

This does not make any sense at all. It does not matter whether the
box uses tscdeadline or local apic timer. We do not even program the
hardware because we see that the event is in the past already.

So we raise the hrtimer softirqd, which then expires the timer. So all
what happens is that ksoftirqd accumulates runtime, but I cannot at
all see how that amounts to a DoS and brings the machine to a grinding
halt.

 Xen dom0 remains busted by that testcase whether that patch is applied
 to the host or not, even though the hypervisor supports deadline timer,
 and seemingly regardless of CPU type all together.

 Of all the x86_64 bare metal boxen I've tested, only those with the TSC
 deadline timer have shown the issue, and there it goes away as of v3.8
 unless you boot lapic=notscdeadline.

I just booted a SNB with lapic=notscdeadline and ran that test
program. All what happens is - as expected - that ksoftirqd runs more
than we would like it to. I cannot observe any anomality vs. local
timer interrupts at all. If I run this pinned on an otherwise idle
core, then I get ~ CONFIG_HZ interrupts per second, which is what you
expect when the cpu never reaches idle.

With the changes pending in tip/timers/core we get more timer
interrupts instead of offloading crap to ksoftirqd, but they cannot
lead to a DoS either and we do not care whether the user spends its
cycles looping in user space or firing timer interrupts. It can only
do as long as it is on the cpu.

These timers (futex, nanosleep, poll, ...) are oneshot and all timers
which are self rearming are rate limited by the fact that we only
rearm when the previous event has been consumed by the task which
scheduled it. So the scheduler controls how many of these events can
be created from user space.

 However, given any x86_64 Intel box with TSC deadline timer (ivy, sandy,
 hasbeen) can be made to exhibit the symptom, there may be other arches
 that get seriously dinged up or maybe even as thoroughly b0rked as Xen
 does when hrtimer_interrupt() is pounded into the ground by userspace.
 
 Alternatively, should someone out there know that all bare metal is in
 fact fine post 279f14614, that person will likely also know what the Xen
 folks need to do to fix up their busted arch. 
 
 The below targets the symptom, consider it hrtimer cluebat attractant.

By now I know to take your patches with a grain of salt :)

Some more information about your symptoms in form of configuration,
extra patches, kernel traces etc. would be appreciated.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/