Re: [PATCH] perf/bench-futex: Avoid worker cacheline bouncing
On Wed, 19 Oct 2016, Sebastian Andrzej Siewior wrote:

> On 2016-10-19 10:59:33 [-0700], Davidlohr Bueso wrote:
>> Sebastian noted that overhead for worker thread ops (throughput)
>> accounting was producing 'perf' to appear in the profiles, consuming
>> a non-trivial (ie 13%) amount of CPU. This is due to cacheline
>> bouncing due to the increment of w->ops. We can easily fix this by
>> just working on a local copy and updating the actual worker once
>> done running, and ready to show the program summary. There is no
>> danger of the worker being concurrent, so we can trust that no stale
>> value is being seen by another thread.
>>
>> Reported-by: Sebastian Andrzej Siewior
>
> Acked-by: Sebastian Andrzej Siewior

Thanks.

>> --- a/tools/perf/bench/futex-hash.c
>> +++ b/tools/perf/bench/futex-hash.c
>> @@ -63,8 +63,9 @@ static const char * const bench_futex_hash_usage[] = {
>>  static void *workerfn(void *arg)
>>  {
>>  	int ret;
>> -	unsigned int i;
>>  	struct worker *w = (struct worker *) arg;
>> +	unsigned int i;
>> +	unsigned long ops = w->ops; /* avoid cacheline bouncing */
>
> we start at 0 so there is probably no need to init it with w->ops.

Yeah, but I prefer having it this way - separates the init from the actual
work (although no big deal here). The extra load happens ncpu times, so also
no big deal.
Re: [PATCH] perf/bench-futex: Avoid worker cacheline bouncing
On 2016-10-19 10:59:33 [-0700], Davidlohr Bueso wrote:
> Sebastian noted that overhead for worker thread ops (throughput)
> accounting was producing 'perf' to appear in the profiles, consuming
> a non-trivial (ie 13%) amount of CPU. This is due to cacheline
> bouncing due to the increment of w->ops. We can easily fix this by
> just working on a local copy and updating the actual worker once
> done running, and ready to show the program summary. There is no
> danger of the worker being concurrent, so we can trust that no stale
> value is being seen by another thread.
>
> Reported-by: Sebastian Andrzej Siewior

Acked-by: Sebastian Andrzej Siewior

> --- a/tools/perf/bench/futex-hash.c
> +++ b/tools/perf/bench/futex-hash.c
> @@ -63,8 +63,9 @@ static const char * const bench_futex_hash_usage[] = {
>  static void *workerfn(void *arg)
>  {
>  	int ret;
> -	unsigned int i;
>  	struct worker *w = (struct worker *) arg;
> +	unsigned int i;
> +	unsigned long ops = w->ops; /* avoid cacheline bouncing */

we start at 0 so there is probably no need to init it with w->ops.

Sebastian