This doesn’t provide info beyond the local node topology, so it won’t help 
answer the common-switch question (i.e., which procs share a switch).

> On Jan 15, 2016, at 8:35 AM, Nick Papior <nickpap...@gmail.com> wrote:
> 
> Wouldn't this be partially available via 
> https://github.com/open-mpi/ompi/pull/326 in the trunk?
> 
> Of course the switch is not gathered from this, but it might work as an 
> initial step towards what you seek, Matt?
> 
> 2016-01-15 17:27 GMT+01:00 Ralph Castain <r...@open-mpi.org>:
> Yes, we don’t propagate envars ourselves other than MCA params. You can ask 
> mpirun to forward specific envars to every proc, but that would only push the 
> same value to everyone, and that doesn’t sound like what you are looking for.
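> 
> (For example, with Open MPI's existing -x option and a hypothetical variable 
> name, "mpirun -x MY_SETTING -np 8 ./a.out" pushes mpirun's own value of 
> MY_SETTING to every rank - i.e., the same value everywhere, which is the 
> limitation just described.)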
> 
> FWIW: we are working on adding the ability to directly query the info you are 
> seeking - i.e., to ask for things like “which procs are on the same switch as 
> me?”. Hoping to have it later this year, perhaps in the summer.
> 
> 
>> On Jan 15, 2016, at 7:56 AM, Matt Thompson <fort...@gmail.com> wrote:
>> 
>> Ralph,
>> 
>> That doesn't help:
>> 
>> (1004) $ mpirun -map-by node -np 8 ./hostenv.x | sort -g -k2
>> Process    0 of    8 is on host borgo086
>> Process    0 of    8 is on processor borgo086
>> Process    1 of    8 is on host borgo086
>> Process    1 of    8 is on processor borgo140
>> Process    2 of    8 is on host borgo086
>> Process    2 of    8 is on processor borgo086
>> Process    3 of    8 is on host borgo086
>> Process    3 of    8 is on processor borgo140
>> Process    4 of    8 is on host borgo086
>> Process    4 of    8 is on processor borgo086
>> Process    5 of    8 is on host borgo086
>> Process    5 of    8 is on processor borgo140
>> Process    6 of    8 is on host borgo086
>> Process    6 of    8 is on processor borgo086
>> Process    7 of    8 is on host borgo086
>> Process    7 of    8 is on processor borgo140
>> 
>> But it was doing the right thing before. It saw my SLURM_* bits and 
>> correctly put 4 processes on the first node and 4 on the second (see the 
>> processor line, which is from MPI, not the environment), and I only asked 
>> for 4 tasks per node:
>> 
>> SLURM_NODELIST=borgo[086,140]
>> SLURM_NTASKS_PER_NODE=4
>> SLURM_NNODES=2
>> SLURM_NTASKS=8
>> SLURM_TASKS_PER_NODE=4(x2)
>> 
>> My guess is no MPI stack wants to propagate an environment variable to every 
>> process. I'm picturing a 1000-node/28000-core job...and poor Open MPI (or 
>> MPT or Intel MPI) would have to marshal 28000xN environment variables 
>> around and keep track of who gets what...
>> 
>> Matt
>> 
>> 
>> On Fri, Jan 15, 2016 at 10:48 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> Actually, the explanation is much simpler. You probably have more than 8 
>> slots on borgj020, and so your job is simply small enough that we put it all 
>> on one host. If you want to force the job to use both hosts, add “-map-by 
>> node” to your cmd line
>> 
>> 
>>> On Jan 15, 2016, at 7:02 AM, Jim Edwards <jedwa...@ucar.edu> wrote:
>>> 
>>> 
>>> 
>>> On Fri, Jan 15, 2016 at 7:53 AM, Matt Thompson <fort...@gmail.com> wrote:
>>> All,
>>> 
>>> I'm not too sure if this is an MPI issue, a Fortran issue, or something 
>>> else, but I thought I'd ask the MPI gurus here first since my web search 
>>> failed me.
>>> 
>>> There is a chance in the future I might want/need to query an environment 
>>> variable in a Fortran program, namely to figure out what switch a currently 
>>> running process is on (via SLURM_TOPOLOGY_ADDR in my case) and perhaps make 
>>> a "per-switch" communicator.[1]
>>> 
>>> So, I coded up a boring Fortran program whose only exciting lines are:
>>> 
>>>    call MPI_Get_Processor_Name(processor_name,name_length,ierror)
>>>    call get_environment_variable("HOST",host_name)
>>> 
>>>    write (*,'(A,X,I4,X,A,X,I4,X,A,X,A)') "Process", myid, "of", npes, "is on processor", trim(processor_name)
>>>    write (*,'(A,X,I4,X,A,X,I4,X,A,X,A)') "Process", myid, "of", npes, "is on host", trim(host_name)
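>>> 
>>> (The declarations are the boring part; the buffer passed to 
>>> MPI_Get_Processor_Name just needs to be at least MPI_MAX_PROCESSOR_NAME 
>>> characters long.)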
>>> 
>>> I decided to try it out with the HOST environment variable first because it 
>>> is simple and differs per node (I didn't want to take many, many nodes to 
>>> find the point where a switch is traversed). I then grabbed two nodes with 4 
>>> processes per node and...:
>>> 
>>> (1046) $ echo "$SLURM_NODELIST"
>>> borgj[020,036]
>>> (1047) $ pdsh -w "$SLURM_NODELIST" echo '$HOST'
>>> borgj036: borgj036
>>> borgj020: borgj020
>>> (1048) $ mpifort -o hostenv.x hostenv.F90
>>> (1049) $ mpirun -np 8 ./hostenv.x | sort -g -k2
>>> Process    0 of    8 is on host borgj020
>>> Process    0 of    8 is on processor borgj020
>>> Process    1 of    8 is on host borgj020
>>> Process    1 of    8 is on processor borgj020
>>> Process    2 of    8 is on host borgj020
>>> Process    2 of    8 is on processor borgj020
>>> Process    3 of    8 is on host borgj020
>>> Process    3 of    8 is on processor borgj020
>>> Process    4 of    8 is on host borgj020
>>> Process    4 of    8 is on processor borgj036
>>> Process    5 of    8 is on host borgj020
>>> Process    5 of    8 is on processor borgj036
>>> Process    6 of    8 is on host borgj020
>>> Process    6 of    8 is on processor borgj036
>>> Process    7 of    8 is on host borgj020
>>> Process    7 of    8 is on processor borgj036
>>> 
>>> It looks like MPI_Get_Processor_Name is doing its thing, but the HOST one 
>>> seems to only reflect the first host. My guess is that Open MPI doesn't 
>>> export each process's environment separately to every process, so it is 
>>> reflecting HOST from process 0.
>>> 
>>> I would guess that what is actually happening is that SLURM is exporting 
>>> all of the variables from the host node, including the $HOST variable, and 
>>> overwriting the defaults on the other nodes. You should use the SLURM 
>>> options to limit the list of variables that you export from the host to 
>>> only those that you need.
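>>> 
>>> (For instance, SLURM's --export option to sbatch/srun lets you export 
>>> nothing, or only an explicit list of variables, instead of the full 
>>> environment; the exact behavior depends on your SLURM version, so check 
>>> your site's docs.)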
>>> 
>>> So, I guess my question is: can this be done? Is there an option to Open 
>>> MPI that might do it? Or is this just something MPI doesn't do? Or is my 
>>> Google-fu just too weak to figure out the right search-phrase to find the 
>>> answer to this probable FAQ?
>>> 
>>> Matt
>>> 
>>> [1] Note, this might be unnecessary, but I got to the point where I wanted 
>>> to see if I *could* do it, rather than *should*.
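>>> 
>>> For reference, here's a rough sketch of the kind of per-switch communicator 
>>> I have in mind, assuming each rank could somehow read its own switch name 
>>> (which is exactly the part that isn't working yet); the hash-based color is 
>>> just for illustration and a real version should handle possible collisions:
>>> 
>>>    program per_switch
>>>      use mpi
>>>      implicit none
>>>      integer :: ierror, myid, color, i, switch_comm
>>>      character(len=256) :: switch_name
>>> 
>>>      call MPI_Init(ierror)
>>>      call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierror)
>>> 
>>>      ! Hypothetical: assumes this is set per rank/node, which is the
>>>      ! open question in this thread.
>>>      call get_environment_variable("SLURM_TOPOLOGY_ADDR", switch_name)
>>> 
>>>      ! Derive a non-negative integer "color" from the switch name
>>>      ! (toy hash; different switch names could in principle collide).
>>>      color = 0
>>>      do i = 1, len_trim(switch_name)
>>>        color = mod(color*31 + ichar(switch_name(i:i)), 1000000)
>>>      end do
>>> 
>>>      ! Ranks that computed the same color end up in one communicator.
>>>      call MPI_Comm_split(MPI_COMM_WORLD, color, myid, switch_comm, ierror)
>>> 
>>>      call MPI_Comm_free(switch_comm, ierror)
>>>      call MPI_Finalize(ierror)
>>>    end program per_switch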
>>> 
>>> -- 
>>> Matt Thompson
>>> Man Among Men
>>> Fulcrum of History
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Jim Edwards
>>> 
>>> CESM Software Engineer
>>> National Center for Atmospheric Research
>>> Boulder, CO 
>> 
>> 
>> 
>> 
>> -- 
>> Matt Thompson
>> Man Among Men
>> Fulcrum of History
> 
> 
> 
> 
> -- 
> Kind regards Nick
