Re: [PATCH 2/2] perf bench futex: add NUMA support

2016-10-17 Thread Jiri Olsa
On Mon, Oct 17, 2016 at 11:38:21AM -0300, Arnaldo Carvalho de Melo wrote:
> On Sun, Oct 16, 2016 at 09:08:03PM +0200, Sebastian Andrzej Siewior wrote:
> > By default the application uses malloc() and all available CPUs. This
> > patch introduces NUMA support which means:
> > - memory is allocated node local via numa_alloc_local()
> > - all CPUs of the specified NUMA node are used. This is also true if the
> >   number of threads set is greater than the number of CPUs available on
> >   this node.
> > 
> > Signed-off-by: Sebastian Andrzej Siewior 
> > ---
> >  tools/perf/bench/Build        |  4 ++
> >  tools/perf/bench/futex-hash.c | 87 ++-
> >  2 files changed, 81 insertions(+), 10 deletions(-)
> > 
> > diff --git a/tools/perf/bench/Build b/tools/perf/bench/Build
> > index 60bf11943047..9e6e518d7d62 100644
> > --- a/tools/perf/bench/Build
> > +++ b/tools/perf/bench/Build
> > @@ -1,3 +1,7 @@
> > +ifdef CONFIG_NUMA
> > +CFLAGS_futex-hash.o   += -DCONFIG_NUMA=1
> > +endif
> 
> Jiri, do we really need this? I.e. aren't the CONFIG_FOO defines
> available to tools?

Not directly at the moment. That is being prepared for the custom .config
settings feature.

Meanwhile we set HAVE_* defines; for CONFIG_NUMA the equivalent is:
 -DHAVE_LIBNUMA_SUPPORT

You can check it in Makefile.config.

jirka
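
A minimal sketch of what keying the code on HAVE_LIBNUMA_SUPPORT instead of
CONFIG_NUMA could look like. The fallback stub mirrors the one in the patch;
passing the node to cpu_is_local() as a parameter is an assumption made here
only to keep the example self-contained (the patch uses a file-scope
numa_node variable).

```c
#include <assert.h>
#include <stdbool.h>

#ifdef HAVE_LIBNUMA_SUPPORT
#include <numa.h>	/* real libnuma; link with -lnuma */
#else
/* Fallback stub for builds without libnuma: every CPU is on "node 0". */
static int numa_node_of_cpu(int cpu) { (void)cpu; return 0; }
#endif

/*
 * Return true if @cpu may be used by the benchmark.
 * A negative @node means no NUMA restriction was requested.
 */
static bool cpu_is_local(int cpu, int node)
{
	assert(cpu >= 0);
	if (node < 0)
		return true;
	return numa_node_of_cpu(cpu) == node;
}
```

Without libnuma the stub reports node 0 for every CPU, so a request for any
other node finds no local CPU and the benchmark would bail out early.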


Re: [PATCH 2/2] perf bench futex: add NUMA support

2016-10-17 Thread Arnaldo Carvalho de Melo
On Mon, Oct 17, 2016 at 05:01:23PM +0200, Jiri Olsa wrote:
> On Mon, Oct 17, 2016 at 11:38:21AM -0300, Arnaldo Carvalho de Melo wrote:
> > On Sun, Oct 16, 2016 at 09:08:03PM +0200, Sebastian Andrzej Siewior wrote:
> > > By default the application uses malloc() and all available CPUs. This
> > > patch introduces NUMA support which means:
> > > - memory is allocated node local via numa_alloc_local()
> > > - all CPUs of the specified NUMA node are used. This is also true if the
> > >   number of threads set is greater than the number of CPUs available on
> > >   this node.
> > > 
> > > Signed-off-by: Sebastian Andrzej Siewior 
> > > ---
> > >  tools/perf/bench/Build        |  4 ++
> > >  tools/perf/bench/futex-hash.c | 87 ++-
> > >  2 files changed, 81 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/tools/perf/bench/Build b/tools/perf/bench/Build
> > > index 60bf11943047..9e6e518d7d62 100644
> > > --- a/tools/perf/bench/Build
> > > +++ b/tools/perf/bench/Build
> > > @@ -1,3 +1,7 @@
> > > +ifdef CONFIG_NUMA
> > > +CFLAGS_futex-hash.o   += -DCONFIG_NUMA=1
> > > +endif
> > 
> > Jiri, do we really need this? I.e. aren't the CONFIG_FOO defines
> > available to tools?
> 
> Not directly at the moment. That is being prepared for the custom .config
> settings feature.
> 
> Meanwhile we set HAVE_* defines; for CONFIG_NUMA the equivalent is:
>  -DHAVE_LIBNUMA_SUPPORT
> 
> You can check it in Makefile.config.

So, Andrzej, can you use that instead? I merged the first patch already.

- Arnaldo


Re: [PATCH 2/2] perf bench futex: add NUMA support

2016-10-17 Thread Arnaldo Carvalho de Melo
On Sun, Oct 16, 2016 at 09:08:03PM +0200, Sebastian Andrzej Siewior wrote:
> By default the application uses malloc() and all available CPUs. This
> patch introduces NUMA support which means:
> - memory is allocated node local via numa_alloc_local()
> - all CPUs of the specified NUMA node are used. This is also true if the
>   number of threads set is greater than the number of CPUs available on
>   this node.
> 
> Signed-off-by: Sebastian Andrzej Siewior 
> ---
>  tools/perf/bench/Build        |  4 ++
>  tools/perf/bench/futex-hash.c | 87 ++-
>  2 files changed, 81 insertions(+), 10 deletions(-)
> 
> diff --git a/tools/perf/bench/Build b/tools/perf/bench/Build
> index 60bf11943047..9e6e518d7d62 100644
> --- a/tools/perf/bench/Build
> +++ b/tools/perf/bench/Build
> @@ -1,3 +1,7 @@
> +ifdef CONFIG_NUMA
> +CFLAGS_futex-hash.o   += -DCONFIG_NUMA=1
> +endif

Jiri, do we really need this? I.e. aren't the CONFIG_FOO defines
available to tools?

>  perf-y += sched-messaging.o
>  perf-y += sched-pipe.o
>  perf-y += mem-functions.o
> diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c
> index d9e5e80bb4d0..8db4a5bd6a4e 100644
> --- a/tools/perf/bench/futex-hash.c
> +++ b/tools/perf/bench/futex-hash.c
> @@ -25,6 +25,9 @@
>  
>  #include 
>  #include 
> +#ifdef CONFIG_NUMA
> +#include <numa.h>
> +#endif
>  
>  static unsigned int nthreads = 0;
>  static unsigned int nsecs= 10;
> @@ -32,6 +35,7 @@ static unsigned int nsecs= 10;
>  static unsigned int nfutexes = 1024;
>  static bool fshared = false, done = false, silent = false;
>  static int futex_flag = 0;
> +static int numa_node = -1;
>  
>  struct timeval start, end, runtime;
>  static pthread_mutex_t thread_lock;
> @@ -55,9 +59,28 @@ static const struct option options[] = {
>  	OPT_UINTEGER('f', "futexes", &nfutexes, "Specify amount of futexes per threads"),
>  	OPT_BOOLEAN( 's', "silent",  &silent,   "Silent mode: do not display data/details"),
>  	OPT_BOOLEAN( 'S', "shared",  &fshared,  "Use shared futexes instead of private ones"),
> +#ifdef CONFIG_NUMA
> +	OPT_INTEGER( 'n', "numa",    &numa_node, "Specify the NUMA node"),
> +#endif
>  	OPT_END()
>  };
>  
> +#ifndef CONFIG_NUMA
> +static int numa_run_on_node(int node __maybe_unused) { return 0; }
> +static int numa_node_of_cpu(int node __maybe_unused) { return 0; }
> +static void *numa_alloc_local(size_t size) { return malloc(size); }
> +static void numa_free(void *p, size_t size __maybe_unused) { return free(p); }
> +#endif
> +
> +static bool cpu_is_local(int cpu)
> +{
> +	if (numa_node < 0)
> +		return true;
> +	if (numa_node_of_cpu(cpu) == numa_node)
> +		return true;
> +	return false;
> +}
> +
>  static const char * const bench_futex_hash_usage[] = {
>  	"perf bench futex hash ",
>  	NULL
> @@ -123,6 +146,8 @@ int bench_futex_hash(int argc, const char **argv,
>  	unsigned int i, ncpus;
>  	pthread_attr_t thread_attr;
>  	struct worker *worker = NULL;
> +	char *node_str = NULL;
> +	unsigned int cpunum;
>  
>  	argc = parse_options(argc, argv, options, bench_futex_hash_usage, 0);
>  	if (argc) {
> @@ -136,18 +161,50 @@ int bench_futex_hash(int argc, const char **argv,
>  	act.sa_sigaction = toggle_done;
>  	sigaction(SIGINT, &act, NULL);
>  
> -	if (!nthreads) /* default to the number of CPUs */
> -		nthreads = ncpus;
> +	if (!nthreads) {
> +		/* default to the number of CPUs per NUMA node */
> +		if (numa_node < 0) {
> +			nthreads = ncpus;
> +		} else {
> +			for (i = 0; i < ncpus; i++) {
> +				if (cpu_is_local(i))
> +					nthreads++;
> +			}
> +			if (!nthreads)
> +				err(EXIT_FAILURE, "No online CPUs for this node");
> +		}
> +	} else {
> +		int cpu_available = 0;
>  
> -	worker = calloc(nthreads, sizeof(*worker));
> +		for (i = 0; i < ncpus && !cpu_available; i++) {
> +			if (cpu_is_local(i))
> +				cpu_available = 1;
> +		}
> +		if (!cpu_available)
> +			err(EXIT_FAILURE, "No online CPUs for this node");
> +	}
> +
> +	if (numa_node >= 0) {
> +		ret = numa_run_on_node(numa_node);
> +		if (ret < 0)
> +			err(EXIT_FAILURE, "numa_run_on_node");
> +		ret = asprintf(&node_str, " on node %d", numa_node);
> +		if (ret < 0)
> +			err(EXIT_FAILURE, "numa_node, asprintf");
> +	}
> +
> +	worker = numa_alloc_local(nthreads * sizeof(*worker));
>  	if (!worker)
>  		goto errmem;
>  
>  	if (!fshared)
>  		futex_flag = FUTEX_PRIVATE_FLAG;
>  
> -	printf("Run summary [PID %d]: %d threads, each 

[PATCH 2/2] perf bench futex: add NUMA support

2016-10-16 Thread Sebastian Andrzej Siewior
By default the application uses malloc() and all available CPUs. This
patch introduces NUMA support which means:
- memory is allocated node local via numa_alloc_local()
- all CPUs of the specified NUMA node are used. This is also true if the
  number of threads set is greater than the number of CPUs available on
  this node.

Signed-off-by: Sebastian Andrzej Siewior 
---
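The hunk that pins threads to CPUs is cut off in this archive, so the
following is only a sketch of one way "all CPUs of the node are used, even
with more threads than CPUs" could be implemented: walk the CPUs round-robin
and skip those that are not on the requested node. next_local_cpu(),
node_of_cpu_fn and demo_node_of_cpu() are hypothetical names introduced for
illustration; they are not taken from the patch.

```c
#include <assert.h>

/* Hypothetical topology callback: returns the NUMA node of @cpu. */
typedef int (*node_of_cpu_fn)(int cpu);

/*
 * Return the first CPU after @prev (wrapping at @ncpus) that belongs to
 * @node, or -1 if the node has no CPU at all. Pass prev = -1 on the
 * first call. A negative @node accepts every CPU.
 */
static int next_local_cpu(int prev, int ncpus, int node, node_of_cpu_fn node_of)
{
	int i;

	assert(ncpus > 0 && prev >= -1 && prev < ncpus);
	for (i = 1; i <= ncpus; i++) {
		int cpu = (prev + i) % ncpus;

		if (node < 0 || node_of(cpu) == node)
			return cpu;
	}
	return -1;
}

/* Example topology for illustration only: 4 CPUs per node. */
static int demo_node_of_cpu(int cpu)
{
	return cpu / 4;
}
```

Calling it once per thread, feeding the previous result back in as @prev,
spreads any number of threads over the node's CPUs and wraps around once
they are exhausted.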
 tools/perf/bench/Build        |  4 ++
 tools/perf/bench/futex-hash.c | 87 ++-
 2 files changed, 81 insertions(+), 10 deletions(-)

diff --git a/tools/perf/bench/Build b/tools/perf/bench/Build
index 60bf11943047..9e6e518d7d62 100644
--- a/tools/perf/bench/Build
+++ b/tools/perf/bench/Build
@@ -1,3 +1,7 @@
+ifdef CONFIG_NUMA
+CFLAGS_futex-hash.o   += -DCONFIG_NUMA=1
+endif
+
 perf-y += sched-messaging.o
 perf-y += sched-pipe.o
 perf-y += mem-functions.o
diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c
index d9e5e80bb4d0..8db4a5bd6a4e 100644
--- a/tools/perf/bench/futex-hash.c
+++ b/tools/perf/bench/futex-hash.c
@@ -25,6 +25,9 @@
 
 #include 
 #include 
+#ifdef CONFIG_NUMA
+#include <numa.h>
+#endif
 
 static unsigned int nthreads = 0;
 static unsigned int nsecs= 10;
@@ -32,6 +35,7 @@ static unsigned int nsecs= 10;
 static unsigned int nfutexes = 1024;
 static bool fshared = false, done = false, silent = false;
 static int futex_flag = 0;
+static int numa_node = -1;
 
 struct timeval start, end, runtime;
 static pthread_mutex_t thread_lock;
@@ -55,9 +59,28 @@ static const struct option options[] = {
 	OPT_UINTEGER('f', "futexes", &nfutexes, "Specify amount of futexes per threads"),
 	OPT_BOOLEAN( 's', "silent",  &silent,   "Silent mode: do not display data/details"),
 	OPT_BOOLEAN( 'S', "shared",  &fshared,  "Use shared futexes instead of private ones"),
+#ifdef CONFIG_NUMA
+	OPT_INTEGER( 'n', "numa",    &numa_node, "Specify the NUMA node"),
+#endif
 	OPT_END()
 };
 
+#ifndef CONFIG_NUMA
+static int numa_run_on_node(int node __maybe_unused) { return 0; }
+static int numa_node_of_cpu(int node __maybe_unused) { return 0; }
+static void *numa_alloc_local(size_t size) { return malloc(size); }
+static void numa_free(void *p, size_t size __maybe_unused) { return free(p); }
+#endif
+
+static bool cpu_is_local(int cpu)
+{
+	if (numa_node < 0)
+		return true;
+	if (numa_node_of_cpu(cpu) == numa_node)
+		return true;
+	return false;
+}
+
 static const char * const bench_futex_hash_usage[] = {
 	"perf bench futex hash ",
 	NULL
@@ -123,6 +146,8 @@ int bench_futex_hash(int argc, const char **argv,
 	unsigned int i, ncpus;
 	pthread_attr_t thread_attr;
 	struct worker *worker = NULL;
+	char *node_str = NULL;
+	unsigned int cpunum;
 
 	argc = parse_options(argc, argv, options, bench_futex_hash_usage, 0);
 	if (argc) {
@@ -136,18 +161,50 @@ int bench_futex_hash(int argc, const char **argv,
 	act.sa_sigaction = toggle_done;
 	sigaction(SIGINT, &act, NULL);
 
-	if (!nthreads) /* default to the number of CPUs */
-		nthreads = ncpus;
+	if (!nthreads) {
+		/* default to the number of CPUs per NUMA node */
+		if (numa_node < 0) {
+			nthreads = ncpus;
+		} else {
+			for (i = 0; i < ncpus; i++) {
+				if (cpu_is_local(i))
+					nthreads++;
+			}
+			if (!nthreads)
+				err(EXIT_FAILURE, "No online CPUs for this node");
+		}
+	} else {
+		int cpu_available = 0;
 
-	worker = calloc(nthreads, sizeof(*worker));
+		for (i = 0; i < ncpus && !cpu_available; i++) {
+			if (cpu_is_local(i))
+				cpu_available = 1;
+		}
+		if (!cpu_available)
+			err(EXIT_FAILURE, "No online CPUs for this node");
+	}
+
+	if (numa_node >= 0) {
+		ret = numa_run_on_node(numa_node);
+		if (ret < 0)
+			err(EXIT_FAILURE, "numa_run_on_node");
+		ret = asprintf(&node_str, " on node %d", numa_node);
+		if (ret < 0)
+			err(EXIT_FAILURE, "numa_node, asprintf");
+	}
+
+	worker = numa_alloc_local(nthreads * sizeof(*worker));
 	if (!worker)
 		goto errmem;
 
 	if (!fshared)
 		futex_flag = FUTEX_PRIVATE_FLAG;
 
-	printf("Run summary [PID %d]: %d threads, each operating on %d [%s] futexes for %d secs.\n\n",
-	       getpid(), nthreads, nfutexes, fshared ? "shared":"private", nsecs);
+	printf("Run summary [PID %d]: %d threads%s, each operating on %d [%s] futexes for %d secs.\n\n",
+	       getpid(), nthreads,
+	       node_str ? : "",
+	       nfutexes, fshared ?
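
One detail of the calloc() to numa_alloc_local() switch may be worth a
sketch: calloc() zeroes the array and checks the multiplication for overflow,
while the malloc()-based fallback stub does neither. Assuming some field of
struct worker relies on starting at zero, a wrapper along these lines would
keep the old behaviour. worker_alloc() and worker_free() are hypothetical
helpers, not part of the patch, and the explicit memset() in the libnuma
branch is a defensive assumption rather than a documented requirement.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#ifdef HAVE_LIBNUMA_SUPPORT
#include <numa.h>	/* link with -lnuma */
#endif

/*
 * Allocate @n zero-initialised elements of @size bytes each,
 * node-local when libnuma is available. Returns NULL on failure.
 */
static void *worker_alloc(size_t n, size_t size)
{
	assert(size > 0);
	if (n > SIZE_MAX / size)	/* n * size would overflow */
		return NULL;
#ifdef HAVE_LIBNUMA_SUPPORT
	{
		void *p = numa_alloc_local(n * size);

		if (p)
			memset(p, 0, n * size);	/* do not rely on implicit zeroing */
		return p;
	}
#else
	return calloc(n, size);		/* same zeroing the old code relied on */
#endif
}

/* Release memory from worker_alloc(); pass the same @n and @size. */
static void worker_free(void *p, size_t n, size_t size)
{
	if (!p)
		return;
#ifdef HAVE_LIBNUMA_SUPPORT
	numa_free(p, n * size);
#else
	(void)n;
	(void)size;
	free(p);
#endif
}
```

The size has to be passed to worker_free() because numa_free() needs it,
which is also why the patch's numa_free() stub carries an unused size
argument.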
