On Feb 14, 2011, at 9:35 AM, Siew Yin Chan wrote:
> 1. I tried Open MPI 1.5.1 before turning to hwloc-bind. Yep. Open MPI 1.5.1
> does provide the --bycore and --bind-to-core options, but these options seem
> to bind processes to cores on my machine according to the *physical* indexes:
FWIW, you might want to try one of the OMPI 1.5.2 nightly tarballs -- we
switched the process affinity stuff to hwloc in 1.5.2 (the 1.5.1 stuff uses a
different mechanism).
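A quick way to check what a 1.5.2 nightly actually does is to ask mpirun to
print the bindings it applies (a sketch; the host and binary names are
placeholders from your earlier mails):

```shell
# --report-bindings makes each daemon report where every rank was bound,
# so you can verify the core ordering without instrumenting the app.
mpirun --host compute-0-8 -np 4 --bycore --bind-to-core \
       --report-bindings ./test1
```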
> FYI, my testing environment and application impose these requirements for
> optimum performance:
>
> i. Different binaries optimized for heterogeneous machines. This necessitates
> MIMD, and can be done in OMPI using the -app option (providing an
> application context file).
> ii. The application is communication-sensitive. Thus, fine-grained process
> mapping on *machines* and on *cores* is required to minimize inter-machine
> and inter-socket communication costs occurring on the network and on the
> system bus. Specifically, processes should be mapped onto successive cores of
> one socket before the next socket is considered, i.e., socket.0:core0-3, then
> socket.1:core0-3. In this case, the communication among neighboring ranks 0-3
> will be confined to socket 0 without going through the system bus. The same
> goes for ranks 4-7 on socket 1. As such, the order of the cores should follow
> the *logical* indexes.
I think that OMPI 1.5.2 should do this for you -- rather than following any
particular logical/physical ordering, it does what you describe: it traverses
successive cores on a socket before going to the next socket (which happens to
correspond to hwloc's logical ordering, but that was not the intent).
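If you do stay with an external binder in the meantime, note that hwloc-bind
takes logical indexes by default, which matches the ordering you want (a
sketch; ./test1 is a placeholder for your binary):

```shell
# Bind to the second core (logical index 1) of the first socket.
# hwloc-bind interprets socket:N.core:M as logical indexes, independent
# of the physical/OS numbering on the machine.
hwloc-bind socket:0.core:1 -- ./test1
```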
FWIW, we have a huge revamp of OMPI's affinity support on the mpirun command
line that will offer much more flexible binding choices.
> Initially, I tried combining the features of rankfile and appfile, e.g.,
>
> $ cat rankfile8np4
> rank 0=compute-0-8 slot=0:0
> rank 1=compute-0-8 slot=0:1
> rank 2=compute-0-8 slot=0:2
> rank 3=compute-0-8 slot=0:3
> $ cat rankfile9np4
> rank 0=compute-0-9 slot=0:0
> rank 1=compute-0-9 slot=0:1
> rank 2=compute-0-9 slot=0:2
> rank 3=compute-0-9 slot=0:3
> $ cat my_appfile_rankfile
> --host compute-0-8 -rf rankfile8np4 -np 4 ./test1
> --host compute-0-9 -rf rankfile9np4 -np 4 ./test2
> $ mpirun -app my_appfile_rankfile
>
> but found that only the rankfile given on the first line took effect; the
> second was ignored completely. After some googling and trial and error, I
> decided to try an external binder, and this direction led me to hwloc-bind.
>
> Maybe I should bring the issue of rankfile + appfile to the OMPI mailing list.
Yes.
I'd have to look at it more closely, but it's possible that we only allow one
rankfile per job -- i.e., that the rankfile should specify all the procs in the
job, not on a per-host basis. But perhaps we don't warn/error if multiple
rankfiles are used; I would consider that a bug.
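If one-rankfile-per-job is indeed the constraint, you might try collapsing
your two per-host rankfiles into a single rankfile that covers all eight
ranks, and pass it once on the mpirun command line rather than inside the
appfile (an untested sketch, reusing your host names and binaries; the
appfile still splits the two binaries):

```shell
$ cat rankfile_all
rank 0=compute-0-8 slot=0:0
rank 1=compute-0-8 slot=0:1
rank 2=compute-0-8 slot=0:2
rank 3=compute-0-8 slot=0:3
rank 4=compute-0-9 slot=0:0
rank 5=compute-0-9 slot=0:1
rank 6=compute-0-9 slot=0:2
rank 7=compute-0-9 slot=0:3
$ cat my_appfile
--host compute-0-8 -np 4 ./test1
--host compute-0-9 -np 4 ./test2
$ mpirun -rf rankfile_all -app my_appfile
```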
--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/