On 11/03/2015 06:13 PM, Dave Love wrote:
Fabian Wein <fabian.w...@fau.de> writes:

There is an old OpenFOAM installation which includes an old Open MPI;
this might cause the trouble.

OpenFOAM should definitely be built against the system MPI (and, in
general, you should avoid bundled libraries wherever possible IMHO).


I don't use OpenFOAM at the moment; it was just still sourced in my .bashrc.

I also suspect that sourcing the Intel 2016
compilers somehow interferes.

I don't know, but the Intel compiler is a definite source of trouble,
particularly because of the myths around it.  I've fixed a fair number
of problems for the users who will listen with "Use GCC and Open MPI".


I build against GCC, and meanwhile I no longer source the Intel compilers.

I don’t know how to check whether hwloc supports NUMA, sockets, … But if I
configure 1.11.1 myself, I see them listed in the configure output. Therefore
I build it manually.

I don't know what the bundled version builds, but if it builds the
utilities, running the hwloc-ps program under hwloc-bind is a way to
test it.  That doesn't verify the MPI installation, though.

   # hwloc-bind node1:1 hwloc-ps | grep hwloc
   13425        NUMANode:1              hwloc-ps

I don't understand what you mean:

/opt/hwloc-1.11.1/bin/hwloc-bind
/opt/hwloc-1.11.1/bin/hwloc-bind: nothing to do!

/opt/hwloc-1.11.1/bin/hwloc-bind node1:1
/opt/hwloc-1.11.1/bin/hwloc-bind: nothing to do!

/opt/hwloc-1.11.1/bin/hwloc-bind node1:1 hwloc-ps
-> no output

Therefore there is nothing to grep. I have no idea what hwloc-ps does; there is no man page, and --help doesn't help.
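
From the online hwloc documentation, my understanding is that hwloc-ps by
default only lists processes that are bound to part of the machine, so empty
output may simply mean that nothing is bound. If that is right, something like

/opt/hwloc-1.11.1/bin/hwloc-ps -a

should list all processes instead (the -a flag is my reading of the docs, so
treat it as a guess).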


   # grep -m1 model\ name /proc/cpuinfo
   model name   : AMD Opteron(TM) Processor 6276

Running hwloc-ps under mpirun should show the default binding anyway.

I don't understand what you mean by that.
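
If you mean something along these lines (the exact command line is just my
guess):

mpirun -np 4 /opt/hwloc-1.11.1/bin/hwloc-ps

then I can try that; I also understand that Open MPI's mpirun has a
--report-bindings option that should print the default binding directly.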


but it does not bring me the performance I expect for the PETSc benchmark.

Without a sane installation it's probably irrelevant, but performance
relative to what?  Anyhow, why don't you want to bind to cores, or at
least L2 cache, if that’s shared?

I compare the performance of the PETSc streams benchmark with a similar but
older 4-package, 24-core Opteron system, and there -bind-to numa results in
a significant increase in performance.
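
For reference, on the older system the run looks roughly like this (the
executable name here is only a placeholder for the PETSc streams binary):

mpirun -np 24 -bind-to numa ./streams

and it is with that -bind-to numa that I see the better numbers.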

I don't know what that benchmark is, but if it's like the canonical
Stream benchmark, that's surprising.  I still don't understand why you
wouldn't want to bind to the lowest level possible.  (lstopo shows that
the system above has 2MB L2 for pairs of cores and 6M L3 for four pairs
on the NUMAnode.)

With Open MPI I cannot bind to numa, socket, core, or cpu.
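
What I tried was along these lines (again the executable name is a
placeholder; --report-bindings is just something I added while testing):

mpirun -np 24 --bind-to numa --report-bindings ./streams
mpirun -np 24 --bind-to core --report-bindings ./streams

If I read the mpirun man page correctly there is also --bind-to l2cache,
which I suppose is what you mean by binding to the shared L2.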



Anyhow, I finally managed to compile MPICH (there were issues with the
Intel compilers), and MPICH allows bindings on my system.

[I think it also uses hwloc.]

I still have
to find the optimal binding/mapping; simply binding to numa as on
the other system doesn't work, but the topology is different. I'm a
user and new to MPI, and I still have a lot to learn.
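
In case it helps anyone else: with MPICH's hydra mpiexec, what works for me
looks roughly like this (the executable name is again a placeholder):

mpiexec -n 24 -bind-to numa ./streams

and I still have to experiment with the various -bind-to and -map-by
settings to see what suits this topology best.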

There is tutorial material on locality and hwloc under
<https://www.open-mpi.org/projects/hwloc/> that looks as good as I'd
expect.


--
Dr. Fabian Wein, University of Erlangen-Nuremberg
Department of Mathematics / Excellence Cluster for Engineering of Advanced Materials
phone: +49 9131 85 20849
