Re: [PATCH] perf/bench-futex: Avoid worker cacheline bouncing

2016-10-19 Thread Davidlohr Bueso

On Wed, 19 Oct 2016, Sebastian Andrzej Siewior wrote:

> On 2016-10-19 10:59:33 [-0700], Davidlohr Bueso wrote:
> > Sebastian noted that overhead for worker thread ops (throughput)
> > accounting was producing 'perf' to appear in the profiles, consuming
> > a non-trivial (ie 13%) amount of CPU. This is due to cacheline
> > bouncing due to the increment of w->ops. We can easily fix this by
> > just working on a local copy and updating the actual worker once
> > done running, and ready to show the program summary. There is no
> > danger of the worker being concurrent, so we can trust that no stale
> > value is being seen by another thread.
> >
> > Reported-by: Sebastian Andrzej Siewior
> Acked-by: Sebastian Andrzej Siewior

Thanks.

> > --- a/tools/perf/bench/futex-hash.c
> > +++ b/tools/perf/bench/futex-hash.c
> > @@ -63,8 +63,9 @@ static const char * const bench_futex_hash_usage[] = {
> > static void *workerfn(void *arg)
> > {
> >   int ret;
> > - unsigned int i;
> >   struct worker *w = (struct worker *) arg;
> > + unsigned int i;
> > + unsigned long ops = w->ops; /* avoid cacheline bouncing */
>
> we start at 0 so there is probably no need to init it with w->ops.


Yeah, but I prefer having it this way - separates the init from the actual
work (although no big deal here). The extra load happens ncpu times, so
also no big deal.
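
For illustration only, a minimal standalone sketch of the pattern the patch
describes: each worker counts into a local variable and stores the result to
its shared struct once, after its loop ends. This is not the code from
futex-hash.c. The struct layout, the iters field, and the run_workers()
helper are assumptions made up for this example, and the loop body merely
stands in for a futex operation.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified worker; not the struct from futex-hash.c. */
struct worker {
	pthread_t thread;
	unsigned long ops;	/* stored once, when the worker is done */
	unsigned long iters;	/* assumed field: how many ops to perform */
};

static void *workerfn(void *arg)
{
	struct worker *w = arg;
	unsigned long ops = w->ops;	/* local copy, avoids bouncing w->ops */
	unsigned long i;

	for (i = 0; i < w->iters; i++)
		ops++;			/* stand-in for one futex operation */

	w->ops = ops;			/* single store back to shared memory */
	return NULL;
}

/* Hypothetical helper: run nthreads workers and return the summed ops. */
unsigned long run_workers(unsigned int nthreads, unsigned long iters)
{
	struct worker *workers = calloc(nthreads, sizeof(*workers));
	unsigned long total = 0;
	unsigned int i;

	if (!workers)
		return 0;

	for (i = 0; i < nthreads; i++) {
		workers[i].iters = iters;
		if (pthread_create(&workers[i].thread, NULL, workerfn,
				   &workers[i])) {
			nthreads = i;	/* only join the threads that started */
			break;
		}
	}

	/* Reading w->ops after pthread_join() is safe: the worker has exited. */
	for (i = 0; i < nthreads; i++) {
		pthread_join(workers[i].thread, NULL);
		total += workers[i].ops;
	}

	free(workers);
	return total;
}
```

Because the summary is only computed after every worker has been joined, no
other thread reads w->ops while the worker is still counting, which is the
condition the commit message relies on.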


Re: [PATCH] perf/bench-futex: Avoid worker cacheline bouncing

2016-10-19 Thread Sebastian Andrzej Siewior
On 2016-10-19 10:59:33 [-0700], Davidlohr Bueso wrote:
> Sebastian noted that overhead for worker thread ops (throughput)
> accounting was producing 'perf' to appear in the profiles, consuming
> a non-trivial (ie 13%) amount of CPU. This is due to cacheline
> bouncing due to the increment of w->ops. We can easily fix this by
> just working on a local copy and updating the actual worker once
> done running, and ready to show the program summary. There is no
> danger of the worker being concurrent, so we can trust that no stale
> value is being seen by another thread.
> 
> Reported-by: Sebastian Andrzej Siewior 
Acked-by: Sebastian Andrzej Siewior 

> --- a/tools/perf/bench/futex-hash.c
> +++ b/tools/perf/bench/futex-hash.c
> @@ -63,8 +63,9 @@ static const char * const bench_futex_hash_usage[] = {
> static void *workerfn(void *arg)
> {
>   int ret;
> - unsigned int i;
>   struct worker *w = (struct worker *) arg;
> + unsigned int i;
> + unsigned long ops = w->ops; /* avoid cacheline bouncing */

we start at 0 so there is probably no need to init it with w->ops.

Sebastian