Re: [hwloc-users] memory binding on Knights Landing

2016-09-08 Thread Brice Goglin



Le 08/09/2016 17:59, Dave Love a écrit :
> Brice Goglin  writes:
>
>> Hello
>> It's not a feature. This should work fine.
>> Random guess: do you have NUMA headers on your build machine? (package
>> libnuma-dev or numactl-devel)
>> (hwloc-info --support also reports whether membinding is supported or not)
>> Brice
> Oops, you're right!  Thanks.  I thought what I'm using elsewhere was
> built from the same srpm, but the rpm on the KNL box doesn't actually
> require libnuma.  After a rebuild, it's OK and I'm suitably embarrassed.

Is there anything to fix on the RPM side? Intel people are carefully
working with Red Hat so that hwloc is properly packaged for RHEL. I can
report bugs if needed.

> By the way, is it expected that binding will be slow on it?  hwloc-bind
> is ~10 times slower (~1s) than on two-socket sandybridge, and ~3 times
> slower than on a 128-core, 16-socket system.

Binding itself shouldn't be slower. But hwloc's topology discovery
(which hwloc-bind performs before the actual binding) is slower on
KNL than on "normal" nodes. The overhead is roughly linear in the
number of hyperthreads, and KNL's sequential performance is lower than
that of your other nodes.

The easy fix is to export the topology to XML with lstopo foo.xml and
then tell all hwloc users to load from XML:
export HWLOC_XMLFILE=foo.xml
export HWLOC_THISSYSTEM=1
https://www.open-mpi.org/projects/hwloc/doc/v1.11.4/a00030.php#faq_xml
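Put together, the whole workaround looks like the sketch below (the file name
foo.xml, the binding target, and ./my_app are arbitrary examples, not fixed
names; the hwloc invocations are shown as comments since they only work on a
machine with hwloc installed):

```shell
# One-time step: discover the topology once and save it to XML
# (run this on the KNL node itself):
#   lstopo foo.xml
# Then make every hwloc user load the cached XML instead of
# re-discovering the hardware:
export HWLOC_XMLFILE=foo.xml
export HWLOC_THISSYSTEM=1   # assert that the XML describes this very machine
# Subsequent hwloc tools now skip the slow discovery, e.g.:
#   hwloc-bind --membind node:0 -- ./my_app
```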

For hwloc 2.0, I am trying to make sure we don't perform useless
discovery steps. hwloc-bind (like many applications) doesn't require all
topology details. v1.x gathers everything and filters things out later;
for 2.0, the plan is instead to gather only what we need. What you can
try for fun is:
export HWLOC_COMPONENTS=-x86 (without the above XML env vars)
It disables the x86-specific discovery, which is unnecessary in most
cases on Linux.
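As a small sketch (the hwloc-bind timing command is kept as a comment since
the result depends on your machine, and node:0 is an arbitrary example
target):

```shell
# The leading "-" in HWLOC_COMPONENTS excludes a component rather than
# selecting it, so this disables the x86 CPUID-based backend while the
# Linux sysfs discovery still runs:
export HWLOC_COMPONENTS=-x86
# Compare discovery time with and without it, e.g.:
#   time hwloc-bind --membind node:0 -- true
```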

I'll do some performance testing tomorrow too.

Regards
Brice

___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/hwloc-users


Re: [hwloc-users] memory binding on Knights Landing

2016-09-08 Thread Jeff Hammond
On Thu, Sep 8, 2016 at 8:59 AM, Dave Love  wrote:

> Brice Goglin  writes:
>
> > Hello
> > It's not a feature. This should work fine.
> > Random guess: do you have NUMA headers on your build machine? (package
> > libnuma-dev or numactl-devel)
> > (hwloc-info --support also reports whether membinding is supported or not)
> > Brice
>
> Oops, you're right!  Thanks.  I thought what I'm using elsewhere was
> built from the same srpm, but the rpm on the KNL box doesn't actually
> require libnuma.  After a rebuild, it's OK and I'm suitably embarrassed.
>
> By the way, is it expected that binding will be slow on it?  hwloc-bind
> is ~10 times slower (~1s) than on two-socket sandybridge, and ~3 times
> slower than on a 128-core, 16-socket system.

Is this a bottleneck in any application?  Are there codes that bind
memory frequently?

Because most things inside the kernel are limited by single-threaded
performance, it is reasonable for them to be slower than on a Xeon
processor, but I've not seen slowdowns that high.

Jeff

-- 
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/

Re: [hwloc-users] memory binding on Knights Landing

2016-09-08 Thread Dave Love
Brice Goglin  writes:

> Hello
> It's not a feature. This should work fine.
> Random guess: do you have NUMA headers on your build machine? (package
> libnuma-dev or numactl-devel)
> (hwloc-info --support also reports whether membinding is supported or not)
> Brice

Oops, you're right!  Thanks.  I thought what I'm using elsewhere was
built from the same srpm, but the rpm on the KNL box doesn't actually
require libnuma.  After a rebuild, it's OK and I'm suitably embarrassed.

By the way, is it expected that binding will be slow on it?  hwloc-bind
is ~10 times slower (~1s) than on two-socket sandybridge, and ~3 times
slower than on a 128-core, 16-socket system.


Re: [hwloc-users] memory binding on Knights Landing

2016-09-08 Thread Brice Goglin
Hello
It's not a feature. This should work fine.
Random guess: do you have NUMA headers on your build machine? (package
libnuma-dev or numactl-devel)
(hwloc-info --support also reports whether membinding is supported or not)
Brice



Le 08/09/2016 16:34, Dave Love a écrit :
> I'm somewhat confused by binding on Knights Landing -- which is probably
> a feature.
>
> I'm looking at a KNL box configured as "Cluster Mode: SNC4 Memory Mode:
> Cache" with hwloc 1.11.4; I've read the KNL hwloc FAQ entries.  I ran
> openmpi and it reported failure to bind memory (but binding to cores was
> OK).  So I tried hwloc-bind --membind and that seems to fail no matter
> what I do, reporting
>
>   hwloc_set_membind 0x0002 (policy 2 flags 0) failed (errno 38 Function not implemented)
>
> Is that expected, and is there a recommendation on how to do binding in
> that configuration with things that use hwloc?  I'm particularly
> interested in OMPI, but I guess this is a better place to ask.  Thanks.



[hwloc-users] memory binding on Knights Landing

2016-09-08 Thread Dave Love
I'm somewhat confused by binding on Knights Landing -- which is probably
a feature.

I'm looking at a KNL box configured as "Cluster Mode: SNC4 Memory Mode:
Cache" with hwloc 1.11.4; I've read the KNL hwloc FAQ entries.  I ran
openmpi and it reported failure to bind memory (but binding to cores was
OK).  So I tried hwloc-bind --membind and that seems to fail no matter
what I do, reporting

  hwloc_set_membind 0x0002 (policy 2 flags 0) failed (errno 38 Function not implemented)

Is that expected, and is there a recommendation on how to do binding in
that configuration with things that use hwloc?  I'm particularly
interested in OMPI, but I guess this is a better place to ask.  Thanks.