We're open to suggestion, really - just need some help identifying the best
way to get this info out there.

well, OpenMPI information is fragmented and sprayed all over.
In some places, there is mention of a wiki to be updated with an explanation; for other things, a consumer needs to wander around loosely-related blogs, mail archives, FAQs, usage statements, etc.

For instance, I've been trying to figure out how to do a simple thing:
launch a hybrid job.  Assume I have a scheduled, heterogeneous cluster
where mpirun simply receives a normal nodefile like this:

clu357
clu357
clu357
clu354
clu354
clu354

and I want to launch a 2-rank, 3-thread-per-rank job. forget about frills like hwloc or binding.
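To make the intended layout concrete, here is a minimal sketch of the slot arithmetic, assuming each nodefile line is one slot and each rank needs 3 slots on a single host. The helper name ranks_per_host is made up for illustration; it is not part of OpenMPI.

```python
from collections import Counter

def ranks_per_host(nodefile_lines, threads_per_rank):
    """Count slots per host, then return how many whole ranks fit on each host."""
    # One nodefile line is treated as one slot (an assumption, not OpenMPI behaviour).
    slots = Counter(line.strip() for line in nodefile_lines if line.strip())
    return {host: n // threads_per_rank for host, n in slots.items()}

# The nodefile from the example above: 3 slots on each of 2 hosts.
nodefile = ["clu357", "clu357", "clu357", "clu354", "clu354", "clu354"]
layout = ranks_per_host(nodefile, threads_per_rank=3)
print(layout)
```

With this nodefile the result is one rank on each host, 2 ranks in total, which is the placement I would expect mpirun to produce.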

back when --cpus-per-proc was around, this was obvious and worked flawlessly. I honestly can't figure out how it works now, though - for any definition of "now" since:

http://www.open-mpi.org/community/lists/devel/2011/12/10060.php

2011! then there's a dribble more info in 2014 (!) that hints that "--map-by node:pe=3" might do the trick here:

http://comments.gmane.org/gmane.comp.clustering.open-mpi.user/21193
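For reference, this is the command line I read that hint as suggesting, assembled in a small sketch. It is an untested guess built only from the "--map-by node:pe=3" hint above; the program name ./hybrid_app and the helper hybrid_mpirun_argv are hypothetical, and whether any given release accepts this is exactly what is unclear.

```python
import shlex

def hybrid_mpirun_argv(ranks, threads_per_rank, program):
    """Build an mpirun argument list for a hybrid job (a guess, not verified)."""
    return [
        "mpirun",
        "-np", str(ranks),
        # "pe=N" is taken from the 2014 mailing-list hint; its exact meaning is undocumented here.
        "--map-by", f"node:pe={threads_per_rank}",
        program,
    ]

argv = hybrid_mpirun_argv(ranks=2, threads_per_rank=3, program="./hybrid_app")
print(shlex.join(argv))
```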

where did "pe" come from?  is it the same as slot, hwthread, core?
why does the documentation make snide comments about how the conventional
understanding of "rank" (~ equivalent to process) might not be true?

most of all, when was the break introduced?  at this point, I tell people
that 1.4.3 worked, and that everything after that is broken.

recent releases (I tried 1.7.3, 1.7.5 and 1.8.1) choke on this. I wonder whether it's having trouble with the fact that a job gets an arbitrary set of cores via cgroup, and perhaps hwloc doesn't understand that it can only work within this set...


   So please see the URL below (especially the first half,
   pages 1 to 20):
   
http://www.slideshare.net/jsquyres/open-mpi-explorations-in-process-affinity-eurompi13-presentation

   Although these slides by Jeff explain LAMA, which is another
   mapping system installed in the openmpi-1.7 series, I guess
   you can easily understand what mapping and binding are in
   general terms.

AFAICT, the LAMA slide deck is only concerned with affinity settings, which are irrelevant here.

confused,
Mark Hahn.
