Re: [PATCH v2 0/4] timer: Improve itimers scalability

2015-10-16 Thread Hideaki Kimura
Removing the dependency on the database code is trivial. It's just ~100
lines that launch many threads and perform NUMA-aware memory accesses, so
that remote-NUMA access costs do not skew the benchmark.

Converting the C++11 code to C/pthreads is just a bit tedious; C++11
really spoiled me. Still, it's not much work. Let me know where to post
the code.

On 10/16/2015 10:34 AM, Jason Low wrote:

> > Mind posting it, so that people can stick it into a new 'perf bench timer'
> > subcommand, and/or reproduce your results with it?
>
> Yes, sure. At the moment, this micro benchmark is written in C++ and
> integrated with the database code. We can look into rewriting it into a
> more general program so that it can be included in perf.
>
> Thanks,
> Jason



--
Hideaki Kimura
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 0/4] timer: Improve itimers scalability

2015-10-16 Thread Jason Low
On Fri, 2015-10-16 at 09:12 +0200, Ingo Molnar wrote:
> * Jason Low  wrote:
> 
> > > > With this patch set (along with commit 1018016c706f mentioned above),
> > > > the performance hit of itimers almost completely goes away on the
> > > > 16 socket system.
> > > > 
> > > > Jason Low (4):
> > > >   timer: Optimize fastpath_timer_check()
> > > >   timer: Check thread timers only when there are active thread timers
> > > >   timer: Convert cputimer->running to bool
> > > >   timer: Reduce unnecessary sighand lock contention
> > > > 
> > > >  include/linux/init_task.h  |3 +-
> > > >  include/linux/sched.h  |9 --
> > > >  kernel/fork.c  |2 +-
> > > >  kernel/time/posix-cpu-timers.c |   63 ---
> > > >  4 files changed, 54 insertions(+), 23 deletions(-)
> > > 
> > > Is there some itimers benchmark that can be used to measure the effects of
> > > these changes?
> > 
> > Yes, we also wrote a micro benchmark which generates cache misses and
> > measures the average cost of each cache miss (with itimers enabled). We
> > used this while writing and testing patches, since it takes a bit longer
> > to set up and run the database.
> 
> Mind posting it, so that people can stick it into a new 'perf bench timer' 
> subcommand, and/or reproduce your results with it?

Yes, sure. At the moment, this micro benchmark is written in C++ and
integrated with the database code. We can look into rewriting it into a
more general program so that it can be included in perf.

Thanks,
Jason



Re: [PATCH v2 0/4] timer: Improve itimers scalability

2015-10-16 Thread Ingo Molnar

* Jason Low  wrote:

> > > With this patch set (along with commit 1018016c706f mentioned above),
> > > the performance hit of itimers almost completely goes away on the
> > > 16 socket system.
> > > 
> > > Jason Low (4):
> > >   timer: Optimize fastpath_timer_check()
> > >   timer: Check thread timers only when there are active thread timers
> > >   timer: Convert cputimer->running to bool
> > >   timer: Reduce unnecessary sighand lock contention
> > > 
> > >  include/linux/init_task.h  |3 +-
> > >  include/linux/sched.h  |9 --
> > >  kernel/fork.c  |2 +-
> > >  kernel/time/posix-cpu-timers.c |   63 ---
> > >  4 files changed, 54 insertions(+), 23 deletions(-)
> > 
> > Is there some itimers benchmark that can be used to measure the effects of
> > these changes?
> 
> Yes, we also wrote a micro benchmark which generates cache misses and
> measures the average cost of each cache miss (with itimers enabled). We
> used this while writing and testing patches, since it takes a bit longer
> to set up and run the database.

Mind posting it, so that people can stick it into a new 'perf bench timer' 
subcommand, and/or reproduce your results with it?

Thanks,

Ingo


Re: [PATCH v2 0/4] timer: Improve itimers scalability

2015-10-15 Thread Jason Low
On Thu, 2015-10-15 at 10:47 +0200, Ingo Molnar wrote:
> * Jason Low  wrote:
> 
> > While running a database workload on a 16 socket machine, there were
> > scalability issues related to itimers. The following link contains a
> > more detailed summary of the issues at the application level.
> > 
> > https://lkml.org/lkml/2015/8/26/737
> > 
> > Commit 1018016c706f addressed the issue with the thread_group_cputimer
> > spinlock taking up a significant portion of total run time.
> > This patch series addresses the secondary issue where a lot of time is
> > spent trying to acquire the sighand lock. It was found in some cases
> > that 200+ threads were simultaneously contending for the same sighand
> > lock, reducing throughput by more than 30%.
> > 
> > With this patch set (along with commit 1018016c706f mentioned above),
> > the performance hit of itimers almost completely goes away on the
> > 16 socket system.
> > 
> > Jason Low (4):
> >   timer: Optimize fastpath_timer_check()
> >   timer: Check thread timers only when there are active thread timers
> >   timer: Convert cputimer->running to bool
> >   timer: Reduce unnecessary sighand lock contention
> > 
> >  include/linux/init_task.h  |3 +-
> >  include/linux/sched.h  |9 --
> >  kernel/fork.c  |2 +-
> >  kernel/time/posix-cpu-timers.c |   63 ---
> >  4 files changed, 54 insertions(+), 23 deletions(-)
> 
> Is there some itimers benchmark that can be used to measure the effects of
> these changes?

Yes, we also wrote a micro benchmark which generates cache misses and
measures the average cost of each cache miss (with itimers enabled). We
used this while writing and testing patches, since it takes a bit longer
to set up and run the database.

Jason



Re: [PATCH v2 0/4] timer: Improve itimers scalability

2015-10-15 Thread Jason Low
On Wed, 2015-10-14 at 17:18 -0400, George Spelvin wrote:
> I'm going to give 4/4 a closer look to see if the races with timer
> expiration make more sense to me than last time around.
> (E.g. do CPU time signals even work in CONFIG_NO_HZ_FULL?)
> 
> But although I haven't yet convinced myself the current code is right,
> the changes don't seem to make it any worse.  So consider all four
> 
> Reviewed-by: George Spelvin 

Thanks George!



Re: [PATCH v2 0/4] timer: Improve itimers scalability

2015-10-15 Thread Frederic Weisbecker
On Wed, Oct 14, 2015 at 05:18:27PM -0400, George Spelvin wrote:
> I'm going to give 4/4 a closer look to see if the races with timer
> expiration make more sense to me than last time around.
> (E.g. do CPU time signals even work in CONFIG_NO_HZ_FULL?)

Those enqueued with timer_settime() do work. But itimers,
and rlimits (RLIMIT_RTTIME, RLIMIT_CPU) aren't supported well. I
need to rework that.

> 
> But although I haven't yet convinced myself the current code is right,
> the changes don't seem to make it any worse.  So consider all four
> 
> Reviewed-by: George Spelvin 
> 
> Thank you!


Re: [PATCH v2 0/4] timer: Improve itimers scalability

2015-10-15 Thread Ingo Molnar

* Jason Low  wrote:

> While running a database workload on a 16 socket machine, there were
> scalability issues related to itimers. The following link contains a
> more detailed summary of the issues at the application level.
> 
> https://lkml.org/lkml/2015/8/26/737
> 
> Commit 1018016c706f addressed the issue with the thread_group_cputimer
> spinlock taking up a significant portion of total run time.
> This patch series addresses the secondary issue where a lot of time is
> spent trying to acquire the sighand lock. It was found in some cases
> that 200+ threads were simultaneously contending for the same sighand
> lock, reducing throughput by more than 30%.
> 
> With this patch set (along with commit 1018016c706f mentioned above),
> the performance hit of itimers almost completely goes away on the
> 16 socket system.
> 
> Jason Low (4):
>   timer: Optimize fastpath_timer_check()
>   timer: Check thread timers only when there are active thread timers
>   timer: Convert cputimer->running to bool
>   timer: Reduce unnecessary sighand lock contention
> 
>  include/linux/init_task.h  |3 +-
>  include/linux/sched.h  |9 --
>  kernel/fork.c  |2 +-
>  kernel/time/posix-cpu-timers.c |   63 ---
>  4 files changed, 54 insertions(+), 23 deletions(-)

Is there some itimers benchmark that can be used to measure the effects of
these changes?

Thanks,

Ingo


Re: [PATCH v2 0/4] timer: Improve itimers scalability

2015-10-14 Thread George Spelvin
I'm going to give 4/4 a closer look to see if the races with timer
expiration make more sense to me than last time around.
(E.g. do CPU time signals even work in CONFIG_NO_HZ_FULL?)

But although I haven't yet convinced myself the current code is right,
the changes don't seem to make it any worse.  So consider all four

Reviewed-by: George Spelvin 

Thank you!


[PATCH v2 0/4] timer: Improve itimers scalability

2015-10-14 Thread Jason Low
While running a database workload on a 16 socket machine, there were
scalability issues related to itimers. The following link contains a
more detailed summary of the issues at the application level.

https://lkml.org/lkml/2015/8/26/737

Commit 1018016c706f addressed the issue with the thread_group_cputimer
spinlock taking up a significant portion of total run time.
This patch series addresses the secondary issue where a lot of time is
spent trying to acquire the sighand lock. It was found in some cases
that 200+ threads were simultaneously contending for the same sighand
lock, reducing throughput by more than 30%.

With this patch set (along with commit 1018016c706f mentioned above),
the performance hit of itimers almost completely goes away on the
16 socket system.

Jason Low (4):
  timer: Optimize fastpath_timer_check()
  timer: Check thread timers only when there are active thread timers
  timer: Convert cputimer->running to bool
  timer: Reduce unnecessary sighand lock contention

 include/linux/init_task.h  |3 +-
 include/linux/sched.h  |9 --
 kernel/fork.c  |2 +-
 kernel/time/posix-cpu-timers.c |   63 ---
 4 files changed, 54 insertions(+), 23 deletions(-)

-- 
1.7.2.5
