This doesn’t provide info beyond the local node topology, so it won’t help answer the common switch question.
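In the meantime, if your site's Slurm does place SLURM_TOPOLOGY_ADDR in every task's environment (as I understand it, that only happens when the topology/tree plugin is configured, so check with printenv on a compute node), no cross-process environment forwarding is needed at all: each rank reads its own value locally and turns it into a color for MPI_Comm_split. Below is a rough, untested Fortran sketch of that idea; the string handling assumes the usual dotted "switch...switch.nodename" layout, which is site dependent, and the toy hash can in principle collide.

  program per_switch_comm
    use mpi
    implicit none
    character(len=256) :: topo, switch
    integer :: ierror, wrank, srank, color, newcomm, i, stat

    call MPI_Init(ierror)
    call MPI_Comm_rank(MPI_COMM_WORLD, wrank, ierror)

    ! Each rank reads only its OWN environment; nothing is forwarded around.
    call get_environment_variable("SLURM_TOPOLOGY_ADDR", topo, status=stat)
    if (stat /= 0) topo = "unknown"

    ! Strip the trailing ".<nodename>" piece so ranks behind the same switch
    ! chain end up with the same string (the exact format is site dependent).
    switch = topo
    i = index(topo, ".", back=.true.)
    if (i > 1) switch = topo(1:i-1)

    ! Reduce the switch string to a non-negative integer color. This is a toy
    ! hash and can collide; an MPI_Allgather plus string compare is the
    ! collision-free way to assign colors.
    color = 0
    do i = 1, len_trim(switch)
       color = mod(color*31 + ichar(switch(i:i)), 65521)
    end do

    call MPI_Comm_split(MPI_COMM_WORLD, color, wrank, newcomm, ierror)
    call MPI_Comm_rank(newcomm, srank, ierror)
    write (*,*) "world rank", wrank, "is rank", srank, "behind ", trim(switch)

    call MPI_Comm_free(newcomm, ierror)
    call MPI_Finalize(ierror)
  end program per_switch_comm

Hashing keeps the sketch short; a production version would more likely MPI_Allgather the trimmed strings and assign colors by exact comparison, so that two different switches can never land in the same communicator.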
> On Jan 15, 2016, at 8:35 AM, Nick Papior <nickpap...@gmail.com> wrote:
>
> Wouldn't this be partially available via
> https://github.com/open-mpi/ompi/pull/326 in the trunk?
>
> Of course the switch is not gathered from this, but it might work as an
> initial step towards what you seek, Matt?
>
> 2016-01-15 17:27 GMT+01:00 Ralph Castain <r...@open-mpi.org>:
> Yes, we don’t propagate envars ourselves other than MCA params. You can ask
> mpirun to forward specific envars to every proc, but that would only push the
> same value to everyone, and that doesn’t sound like what you are looking for.
>
> FWIW: we are working on adding the ability to directly query the info you are
> seeking - i.e., to ask for things like “which procs are on the same switch as
> me?”. Hoping to have it later this year, perhaps in the summer.
>
>
>> On Jan 15, 2016, at 7:56 AM, Matt Thompson <fort...@gmail.com> wrote:
>>
>> Ralph,
>>
>> That doesn't help:
>>
>> (1004) $ mpirun -map-by node -np 8 ./hostenv.x | sort -g -k2
>> Process 0 of 8 is on host borgo086
>> Process 0 of 8 is on processor borgo086
>> Process 1 of 8 is on host borgo086
>> Process 1 of 8 is on processor borgo140
>> Process 2 of 8 is on host borgo086
>> Process 2 of 8 is on processor borgo086
>> Process 3 of 8 is on host borgo086
>> Process 3 of 8 is on processor borgo140
>> Process 4 of 8 is on host borgo086
>> Process 4 of 8 is on processor borgo086
>> Process 5 of 8 is on host borgo086
>> Process 5 of 8 is on processor borgo140
>> Process 6 of 8 is on host borgo086
>> Process 6 of 8 is on processor borgo086
>> Process 7 of 8 is on host borgo086
>> Process 7 of 8 is on processor borgo140
>>
>> But it was doing the right thing before. It saw my SLURM_* bits and
>> correctly put 4 processes on the first node and 4 on the second (see the
>> processor line, which is from MPI, not the environment), and I only asked
>> for 4 tasks per node:
>>
>> SLURM_NODELIST=borgo[086,140]
>> SLURM_NTASKS_PER_NODE=4
>> SLURM_NNODES=2
>> SLURM_NTASKS=8
>> SLURM_TASKS_PER_NODE=4(x2)
>>
>> My guess is that no MPI stack wants to propagate an environment variable to
>> every process. I'm picturing a 1000 node/28000 core job... and poor Open MPI
>> (or MPT or Intel MPI) would have to marshal 28000xN environment variables
>> around and keep track of who gets what...
>>
>> Matt
>>
>>
>> On Fri, Jan 15, 2016 at 10:48 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> Actually, the explanation is much simpler. You probably have more than 8
>> slots on borgj020, and so your job is simply small enough that we put it all
>> on one host. If you want to force the job to use both hosts, add “-map-by
>> node” to your cmd line.
>>
>>
>>> On Jan 15, 2016, at 7:02 AM, Jim Edwards <jedwa...@ucar.edu> wrote:
>>>
>>>
>>> On Fri, Jan 15, 2016 at 7:53 AM, Matt Thompson <fort...@gmail.com> wrote:
>>> All,
>>>
>>> I'm not too sure if this is an MPI issue, a Fortran issue, or something
>>> else, but I thought I'd ask the MPI gurus here first since my web search
>>> failed me.
>>>
>>> There is a chance in the future I might want/need to query an environment
>>> variable in a Fortran program, namely to figure out what switch a currently
>>> running process is on (via SLURM_TOPOLOGY_ADDR in my case) and perhaps make
>>> a "per-switch" communicator.[1]
>>>
>>> So, I coded up a boring Fortran program whose only exciting lines are:
>>>
>>>   call MPI_Get_Processor_Name(processor_name, name_length, ierror)
>>>   call get_environment_variable("HOST", host_name)
>>>
>>>   write (*,'(A,X,I4,X,A,X,I4,X,A,X,A)') "Process", myid, "of", npes, "is on processor", trim(processor_name)
>>>   write (*,'(A,X,I4,X,A,X,I4,X,A,X,A)') "Process", myid, "of", npes, "is on host", trim(host_name)
>>>
>>> I decided to try this out with the HOST environment variable first because
>>> it is simple and different per node (I didn't want to take many, many nodes
>>> to find the point when a switch is traversed). I then grabbed two nodes
>>> with 4 processes per node and...:
>>>
>>> (1046) $ echo "$SLURM_NODELIST"
>>> borgj[020,036]
>>> (1047) $ pdsh -w "$SLURM_NODELIST" echo '$HOST'
>>> borgj036: borgj036
>>> borgj020: borgj020
>>> (1048) $ mpifort -o hostenv.x hostenv.F90
>>> (1049) $ mpirun -np 8 ./hostenv.x | sort -g -k2
>>> Process 0 of 8 is on host borgj020
>>> Process 0 of 8 is on processor borgj020
>>> Process 1 of 8 is on host borgj020
>>> Process 1 of 8 is on processor borgj020
>>> Process 2 of 8 is on host borgj020
>>> Process 2 of 8 is on processor borgj020
>>> Process 3 of 8 is on host borgj020
>>> Process 3 of 8 is on processor borgj020
>>> Process 4 of 8 is on host borgj020
>>> Process 4 of 8 is on processor borgj036
>>> Process 5 of 8 is on host borgj020
>>> Process 5 of 8 is on processor borgj036
>>> Process 6 of 8 is on host borgj020
>>> Process 6 of 8 is on processor borgj036
>>> Process 7 of 8 is on host borgj020
>>> Process 7 of 8 is on processor borgj036
>>>
>>> It looks like MPI_Get_Processor_Name is doing its thing, but the HOST one
>>> seems to be reflecting only the first host. My guess is that Open MPI
>>> doesn't export every process's environment separately to every process, so
>>> it is reflecting HOST from process 0.
>>>
>>>
>>> I would guess that what is actually happening is that Slurm is exporting
>>> all of the variables from the host node, including the $HOST variable, and
>>> overwriting the defaults on the other nodes. You should use the SLURM
>>> options to limit the list of variables that you export from the host to
>>> only those that you need.
>>>
>>>
>>> So, I guess my question is: can this be done? Is there an option to Open
>>> MPI that might do it? Or is this just something MPI doesn't do? Or is my
>>> Google-fu just too weak to figure out the right search phrase to find the
>>> answer to this probable FAQ?
>>>
>>> Matt
>>>
>>> [1] Note, this might be unnecessary, but I got to the point where I wanted
>>> to see if I *could* do it, rather than *should*.
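For reference, the two environment-handling knobs mentioned in the thread look roughly like this; MY_SETTING and job.sh are just placeholders, and the exact semantics of --export vary between Slurm versions, so check the mpirun and sbatch/srun man pages for your installation:

  # Open MPI: forward one named variable from mpirun's environment to every
  # rank (the same value everywhere, as noted above)
  $ mpirun -x MY_SETTING -np 8 ./hostenv.x

  # Slurm: restrict which variables the submission environment propagates to
  # the job, per Jim's suggestion
  $ sbatch --export=PATH,LD_LIBRARY_PATH,MY_SETTING job.sh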