We're open to suggestion, really - just need some help identifying the best
way to get this info out there.
well, OpenMPI information is fragmented and sprayed all over.
In some places, there is mention of a wiki to be updated with
an explanation; for other things, a consumer needs to wander around
loosely-related blogs, mail archives, FAQs, usage statements, etc.
For instance, I've been trying to figure out how to do a simple thing:
launch a hybrid job. Assume I have a scheduled, heterogeneous cluster
where mpirun simply receives a normal nodefile like this:
clu357
clu357
clu357
clu354
clu354
clu354
and I want to launch a 2-rank, 3-thread-per-rank job. forget about
frills like hwloc or binding.
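to be concrete about the layout I'm asking for, here is a rough sketch (plain
Python, not anything OpenMPI actually does) that counts the slots each host
gets in that nodefile and works out how many 3-thread ranks fit on each.
the function name ranks_per_host is made up for illustration.

```python
from collections import Counter


def ranks_per_host(nodefile_lines, threads_per_rank):
    """Count slots per host, then return how many ranks of that width fit."""
    # Each non-empty nodefile line is one slot on the named host.
    slots = Counter(line.strip() for line in nodefile_lines if line.strip())
    # Integer division: leftover slots on a host cannot hold a full rank.
    return {host: n // threads_per_rank for host, n in slots.items()}


# The nodefile from above: three slots on each of two hosts.
nodefile = ["clu357", "clu357", "clu357", "clu354", "clu354", "clu354"]
layout = ranks_per_host(nodefile, threads_per_rank=3)
print(layout)
```

that is one rank per host, two ranks total, which is all I want mpirun to
work out from the nodefile.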
back when --cpus-per-proc was around, this was obvious and worked
flawlessly. I honestly can't figure out how it works now, though -
for any definition of "now" since:
http://www.open-mpi.org/community/lists/devel/2011/12/10060.php
2011! then there's a dribble more info in 2014 (!) that hints that
"--map-by node:pe=3" might do the trick here:
http://comments.gmane.org/gmane.comp.clustering.open-mpi.user/21193
where did "pe" come from? is it the same as slot, hwthread, core?
why does the documentation make snide comments about how the conventional
understanding of "rank" (~ equivalent to process) might not be true?
most of all, when was the break introduced? at this point, I tell people
that 1.4.3 worked, and that everything after that is broken.
recent releases (I tried 1.7.3, 1.7.5 and 1.8.1) choke on this.
I wonder whether it's having trouble with the fact that a job
gets an arbitrary set of cores via cgroup, and perhaps hwloc
doesn't understand that it can only work within this set...
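one way to see what the job is really allowed to use: a small sketch,
assuming Linux, where os.sched_getaffinity should reflect a cpuset imposed by
the cgroup. the fallback branch and the allowed_cpus name are mine, and I have
not checked how hwloc itself discovers this set.

```python
import os


def allowed_cpus():
    """Return the CPU ids this process may run on."""
    # On Linux the affinity mask is limited by any cpuset cgroup in effect.
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    # Fallback for platforms without sched_getaffinity: assume all CPUs.
    return list(range(os.cpu_count() or 1))


threads_per_rank = 3
cpus = allowed_cpus()
print(f"allowed cpus: {cpus}")
print(f"room for one {threads_per_rank}-thread rank: {len(cpus) >= threads_per_rank}")
```

run inside the scheduled job, this shows whether the set of cores handed out
by the scheduler is even large enough for one rank's threads.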
So please see the URL below (especially the first half
of it, pages 1 to 20):
http://www.slideshare.net/jsquyres/open-mpi-explorations-in-process-affinity-eurompi13-presentation
Although these slides by Jeff are the explanation for LAMA,
which is another mapping system installed in the openmpi-1.7
series, I guess you can easily understand what is mapping and
binding in general terms.
AFAICT, the LAMA slide deck seemed to be only concerned with
affinity settings, which are irrelevant here.
confused,
Mark Hahn.