Yes, we don’t propagate envars ourselves other than MCA params. You can ask mpirun to forward specific envars to every proc, but that would only push the same value to everyone, and that doesn’t sound like what you are looking for.
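One thing that is already doable today, with no envar forwarding at all, is to build a per-node communicator with MPI-3's MPI_Comm_split_type. That isn't per-switch, but it does answer the "who shares my node?" part of the question. A minimal sketch, assuming an MPI-3 library (recent Open MPI qualifies); the program and variable names are just illustrative:

program node_comm_example
   use mpi
   implicit none
   integer :: ierror, world_rank, node_comm, node_rank, node_size

   call MPI_Init(ierror)
   call MPI_Comm_rank(MPI_COMM_WORLD, world_rank, ierror)

   ! MPI_COMM_TYPE_SHARED groups the ranks that can share memory,
   ! i.e. one communicator per node.
   call MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
                            MPI_INFO_NULL, node_comm, ierror)
   call MPI_Comm_rank(node_comm, node_rank, ierror)
   call MPI_Comm_size(node_comm, node_size, ierror)

   write (*,'(A,I4,A,I4,A,I4)') "World rank", world_rank, &
         " is node rank", node_rank, " of", node_size

   call MPI_Comm_free(node_comm, ierror)
   call MPI_Finalize(ierror)
end program node_comm_example

That sidesteps the propagation problem entirely for the on-node case.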
FWIW: we are working on adding the ability to directly query the info you are seeking - i.e., to ask for things like “which procs are on the same switch as me?”. Hoping to have it later this year, perhaps in the summer.

> On Jan 15, 2016, at 7:56 AM, Matt Thompson <fort...@gmail.com> wrote:
>
> Ralph,
>
> That doesn't help:
>
> (1004) $ mpirun -map-by node -np 8 ./hostenv.x | sort -g -k2
> Process 0 of 8 is on host borgo086
> Process 0 of 8 is on processor borgo086
> Process 1 of 8 is on host borgo086
> Process 1 of 8 is on processor borgo140
> Process 2 of 8 is on host borgo086
> Process 2 of 8 is on processor borgo086
> Process 3 of 8 is on host borgo086
> Process 3 of 8 is on processor borgo140
> Process 4 of 8 is on host borgo086
> Process 4 of 8 is on processor borgo086
> Process 5 of 8 is on host borgo086
> Process 5 of 8 is on processor borgo140
> Process 6 of 8 is on host borgo086
> Process 6 of 8 is on processor borgo086
> Process 7 of 8 is on host borgo086
> Process 7 of 8 is on processor borgo140
>
> But it was doing the right thing before. It saw my SLURM_* bits and correctly put 4 processes on the first node and 4 on the second (see the processor line, which is from MPI, not the environment), and I only asked for 4 tasks per node:
>
> SLURM_NODELIST=borgo[086,140]
> SLURM_NTASKS_PER_NODE=4
> SLURM_NNODES=2
> SLURM_NTASKS=8
> SLURM_TASKS_PER_NODE=4(x2)
>
> My guess is no MPI stack wants to propagate an environment variable to every process. I'm picturing a 1000 node/28000 core job... and poor Open MPI (or MPT or Intel MPI) would have to marshal 28000xN environment variables around and keep track of who gets what...
>
> Matt
>
> On Fri, Jan 15, 2016 at 10:48 AM, Ralph Castain <r...@open-mpi.org> wrote:
> Actually, the explanation is much simpler. You probably have more than 8 slots on borgj020, and so your job is simply small enough that we put it all on one host. If you want to force the job to use both hosts, add “-map-by node” to your cmd line.
>
>> On Jan 15, 2016, at 7:02 AM, Jim Edwards <jedwa...@ucar.edu> wrote:
>>
>> On Fri, Jan 15, 2016 at 7:53 AM, Matt Thompson <fort...@gmail.com> wrote:
>> All,
>>
>> I'm not too sure if this is an MPI issue, a Fortran issue, or something else, but I thought I'd ask the MPI gurus here first since my web search failed me.
>>
>> There is a chance in the future I might want/need to query an environment variable in a Fortran program, namely to figure out which switch a currently running process is on (via SLURM_TOPOLOGY_ADDR in my case) and perhaps make a "per-switch" communicator.[1]
>>
>> So, I coded up a boring Fortran program whose only exciting lines are:
>>
>> call MPI_Get_Processor_Name(processor_name,name_length,ierror)
>> call get_environment_variable("HOST",host_name)
>>
>> write (*,'(A,X,I4,X,A,X,I4,X,A,X,A)') "Process", myid, "of", npes, "is on processor", trim(processor_name)
>> write (*,'(A,X,I4,X,A,X,I4,X,A,X,A)') "Process", myid, "of", npes, "is on host", trim(host_name)
>>
>> I decided to try out the HOST environment variable first because it is simple and different per node (I didn't want to take many, many nodes to find the point where a switch is traversed).
>> I then grabbed two nodes with 4 processes per node and...:
>>
>> (1046) $ echo "$SLURM_NODELIST"
>> borgj[020,036]
>> (1047) $ pdsh -w "$SLURM_NODELIST" echo '$HOST'
>> borgj036: borgj036
>> borgj020: borgj020
>> (1048) $ mpifort -o hostenv.x hostenv.F90
>> (1049) $ mpirun -np 8 ./hostenv.x | sort -g -k2
>> Process 0 of 8 is on host borgj020
>> Process 0 of 8 is on processor borgj020
>> Process 1 of 8 is on host borgj020
>> Process 1 of 8 is on processor borgj020
>> Process 2 of 8 is on host borgj020
>> Process 2 of 8 is on processor borgj020
>> Process 3 of 8 is on host borgj020
>> Process 3 of 8 is on processor borgj020
>> Process 4 of 8 is on host borgj020
>> Process 4 of 8 is on processor borgj036
>> Process 5 of 8 is on host borgj020
>> Process 5 of 8 is on processor borgj036
>> Process 6 of 8 is on host borgj020
>> Process 6 of 8 is on processor borgj036
>> Process 7 of 8 is on host borgj020
>> Process 7 of 8 is on processor borgj036
>>
>> It looks like MPI_Get_Processor_Name is doing its thing, but the HOST one seems to only be reflecting the first host. My guess is that Open MPI doesn't export every process's environment separately to every process, so it is reflecting HOST from process 0.
>>
>> I would guess that what is actually happening is that slurm is exporting all of the variables from the host node, including the $HOST variable, and overwriting the defaults on the other nodes. You should use the SLURM options to limit the list of variables that you export from the host to only those that you need.
>>
>> So, I guess my question is: can this be done? Is there an option to Open MPI that might do it? Or is this just something MPI doesn't do? Or is my Google-fu just too weak to figure out the right search phrase to find the answer to this probable FAQ?
>>
>> Matt
>>
>> [1] Note, this might be unnecessary, but I got to the point where I wanted to see if I *could* do it, rather than *should*.
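On the [1] footnote: once every rank can actually see its own value (which, per the rest of this thread, is the hard part), turning a per-rank string such as SLURM_TOPOLOGY_ADDR, or the name returned by MPI_Get_Processor_Name, into a communicator is straightforward. A sketch of one way to do it, with illustrative names: allgather everyone's key and use the lowest rank holding a matching key as the color, so no hashing or collision handling is needed.

program switch_comm_example
   use mpi
   implicit none
   integer, parameter :: KEYLEN = 256
   character(len=KEYLEN) :: mykey
   character(len=KEYLEN), allocatable :: allkeys(:)
   integer :: ierror, myid, npes, color, i, newcomm, newid, newsize

   call MPI_Init(ierror)
   call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierror)
   call MPI_Comm_size(MPI_COMM_WORLD, npes, ierror)

   ! Any string that differs per switch (or per node) will do as the key;
   ! SLURM_TOPOLOGY_ADDR is the one discussed here, the processor name is
   ! another option that needs no environment at all.
   call get_environment_variable("SLURM_TOPOLOGY_ADDR", mykey)

   ! Everyone learns everyone else's key.
   allocate(allkeys(npes))
   call MPI_Allgather(mykey, KEYLEN, MPI_CHARACTER, &
                      allkeys, KEYLEN, MPI_CHARACTER, MPI_COMM_WORLD, ierror)

   ! Color = lowest world rank holding the same key as mine.
   color = 0
   do i = 1, npes
      if (allkeys(i) == mykey) then
         color = i - 1
         exit
      end if
   end do

   call MPI_Comm_split(MPI_COMM_WORLD, color, myid, newcomm, ierror)
   call MPI_Comm_rank(newcomm, newid, ierror)
   call MPI_Comm_size(newcomm, newsize, ierror)
   write (*,'(A,I4,A,I4,A,I4,A,I4)') "World rank", myid, " is rank", newid, &
         " of", newsize, " in group", color

   call MPI_Comm_free(newcomm, ierror)
   call MPI_Finalize(ierror)
end program switch_comm_example

The caveat, of course, is that the envar route only works if each rank holds the right value in the first place, which is exactly the problem discussed above; the processor-name route avoids that entirely.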
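And for anyone who wants to reproduce the runs above: hostenv.F90 is never shown in full, but based on the quoted lines it is presumably something close to the following. The declarations and the init/finalize boilerplate are guesses, and the nonstandard bare X edit descriptor from the original write statements is spelled 1X here.

program hostenv
   use mpi
   implicit none
   character(len=MPI_MAX_PROCESSOR_NAME) :: processor_name
   character(len=255) :: host_name
   integer :: myid, npes, name_length, ierror

   call MPI_Init(ierror)
   call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierror)
   call MPI_Comm_size(MPI_COMM_WORLD, npes, ierror)

   ! The two "exciting" lines from the original post.
   call MPI_Get_Processor_Name(processor_name, name_length, ierror)
   call get_environment_variable("HOST", host_name)

   write (*,'(A,1X,I4,1X,A,1X,I4,1X,A,1X,A)') "Process", myid, "of", npes, &
         "is on processor", trim(processor_name)
   write (*,'(A,1X,I4,1X,A,1X,I4,1X,A,1X,A)') "Process", myid, "of", npes, &
         "is on host", trim(host_name)

   call MPI_Finalize(ierror)
end program hostenv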