Ralph,

Sounds good. I'll keep an eye out. I figured it probably wasn't possible.
Of course, it's simple enough to run a script ahead of time that builds a
table that can be read in-program. I was just hoping I could do it in one
step instead of two!
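
In case it helps anyone searching the archives later, the two-step version I
have in mind is roughly the sketch below. It is untested, and the file name
'switch_table.txt', its "hostname switch-index" layout, and the prolog script
that would write it are all just placeholders:

    program switch_table_split
       ! Untested sketch: a job-prolog script is assumed to have written
       ! 'switch_table.txt' with one "hostname  switch_index" pair per node.
       ! Each rank looks up its own node and splits MPI_COMM_WORLD so that
       ! ranks sharing a switch end up in the same communicator.
       use mpi
       implicit none
       integer :: ierror, myid, name_len, unit, ios, sw
       integer :: my_switch, switch_comm
       character(len=MPI_MAX_PROCESSOR_NAME) :: my_name, host

       call MPI_Init(ierror)
       call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierror)
       call MPI_Get_processor_name(my_name, name_len, ierror)

       ! Find this node's switch index in the pre-built table. Ranks whose
       ! host is missing from the table get MPI_UNDEFINED and end up with
       ! MPI_COMM_NULL after the split.
       my_switch = MPI_UNDEFINED
       open(newunit=unit, file='switch_table.txt', status='old')
       do
          read(unit, *, iostat=ios) host, sw
          if (ios /= 0) exit
          if (trim(host) == trim(my_name)) my_switch = sw
       end do
       close(unit)

       ! Same color => same new communicator, i.e., one comm per switch.
       call MPI_Comm_split(MPI_COMM_WORLD, my_switch, myid, switch_comm, ierror)
       write (*,'(A,1X,I4,1X,A,1X,I8)') "Process", myid, "has switch color", my_switch

       call MPI_Finalize(ierror)
    end program switch_table_split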

And, well, I'm slowly learning that whatever I knew about switches in the
Ethernet world means nothing in an InfiniBand one!

On Fri, Jan 15, 2016 at 11:27 AM, Ralph Castain <r...@open-mpi.org> wrote:

> Yes, we don’t propagate envars ourselves other than MCA params. You can
> ask mpirun to forward specific envars to every proc, but that would only
> push the same value to everyone, and that doesn’t sound like what you are
> looking for.
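>
> For example, something along the lines of
>
>     mpirun -x FOO -np 8 ./hostenv.x
>
> would forward FOO to every rank, but each rank would see the value FOO had
> in the environment where mpirun was invoked (FOO is just a placeholder
> name here).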
>
> FWIW: we are working on adding the ability to directly query the info you
> are seeking - i.e., to ask for things like “which procs are on the same
> switch as me?”. Hoping to have it later this year, perhaps in the summer.
>
>
> On Jan 15, 2016, at 7:56 AM, Matt Thompson <fort...@gmail.com> wrote:
>
> Ralph,
>
> That doesn't help:
>
> (1004) $ mpirun -map-by node -np 8 ./hostenv.x | sort -g -k2
> Process    0 of    8 is on host borgo086
> Process    0 of    8 is on processor borgo086
> Process    1 of    8 is on host borgo086
> Process    1 of    8 is on processor borgo140
> Process    2 of    8 is on host borgo086
> Process    2 of    8 is on processor borgo086
> Process    3 of    8 is on host borgo086
> Process    3 of    8 is on processor borgo140
> Process    4 of    8 is on host borgo086
> Process    4 of    8 is on processor borgo086
> Process    5 of    8 is on host borgo086
> Process    5 of    8 is on processor borgo140
> Process    6 of    8 is on host borgo086
> Process    6 of    8 is on processor borgo086
> Process    7 of    8 is on host borgo086
> Process    7 of    8 is on processor borgo140
>
> But it was already doing the right thing before: it saw my SLURM_* settings
> and correctly put 4 processes on the first node and 4 on the second (see the
> processor lines, which come from MPI, not the environment), as I had only
> asked for 4 tasks per node:
>
> SLURM_NODELIST=borgo[086,140]
> SLURM_NTASKS_PER_NODE=4
> SLURM_NNODES=2
> SLURM_NTASKS=8
> SLURM_TASKS_PER_NODE=4(x2)
>
> My guess is that no MPI stack wants to propagate each process's environment
> to every other process. I'm picturing a 1000-node/28000-core job...and poor
> Open MPI (or MPT or Intel MPI) would have to marshal 28000xN environment
> variables around and keep track of who gets what...
>
> Matt
>
>
> On Fri, Jan 15, 2016 at 10:48 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Actually, the explanation is much simpler. You probably have more than 8
>> slots on borgj020, and so your job is simply small enough that we put it
>> all on one host. If you want to force the job to use both hosts, add
>> “-map-by node” to your command line.
>>
>>
>> On Jan 15, 2016, at 7:02 AM, Jim Edwards <jedwa...@ucar.edu> wrote:
>>
>>
>>
>> On Fri, Jan 15, 2016 at 7:53 AM, Matt Thompson <fort...@gmail.com> wrote:
>>
>>> All,
>>>
>>> I'm not too sure if this is an MPI issue, a Fortran issue, or something
>>> else, but I thought I'd ask the MPI gurus here first since my web search
>>> failed me.
>>>
>>> There is a chance in the future I might want/need to query an
>>> environment variable in a Fortran program, namely to figure out what switch
>>> a currently running process is on (via SLURM_TOPOLOGY_ADDR in my case) and
>>> perhaps make a "per-switch" communicator.[1]
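>>>
>>> Roughly, the one-step version I have in mind would be something like the
>>> fragment below (completely untested; folding the switch string into an
>>> integer color with a cheap hash is just a placeholder, and collisions
>>> would need handling):
>>>
>>>    ! Untested fragment that would sit inside an MPI program after
>>>    ! MPI_Init: read SLURM_TOPOLOGY_ADDR, fold it into a small integer
>>>    ! color, and split MPI_COMM_WORLD so ranks on the same switch share
>>>    ! a communicator.
>>>    character(len=256) :: topo
>>>    integer :: color, i, switch_comm, ierror
>>>
>>>    call get_environment_variable("SLURM_TOPOLOGY_ADDR", topo)
>>>    color = 0
>>>    do i = 1, len_trim(topo)
>>>       color = mod(color * 31 + ichar(topo(i:i)), 32749)
>>>    end do
>>>    ! Key 0 keeps ranks in their original relative order in each new comm.
>>>    call MPI_Comm_split(MPI_COMM_WORLD, color, 0, switch_comm, ierror)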
>>>
>>> So, I coded up a boring Fortran program whose only exciting lines are:
>>>
>>>    call MPI_Get_Processor_Name(processor_name, name_length, ierror)
>>>    call get_environment_variable("HOST", host_name)
>>>
>>>    write (*,'(A,1X,I4,1X,A,1X,I4,1X,A,1X,A)') "Process", myid, "of", &
>>>       npes, "is on processor", trim(processor_name)
>>>    write (*,'(A,1X,I4,1X,A,1X,I4,1X,A,1X,A)') "Process", myid, "of", &
>>>       npes, "is on host", trim(host_name)
>>>
>>> I decided to try this out with the HOST environment variable first because
>>> it is simple and differs per node (I didn't want to request many, many
>>> nodes just to find the point where a switch is crossed). I then grabbed
>>> two nodes with 4 processes per node and...:
>>>
>>> (1046) $ echo "$SLURM_NODELIST"
>>> borgj[020,036]
>>> (1047) $ pdsh -w "$SLURM_NODELIST" echo '$HOST'
>>> borgj036: borgj036
>>> borgj020: borgj020
>>> (1048) $ mpifort -o hostenv.x hostenv.F90
>>> (1049) $ mpirun -np 8 ./hostenv.x | sort -g -k2
>>> Process    0 of    8 is on host borgj020
>>> Process    0 of    8 is on processor borgj020
>>> Process    1 of    8 is on host borgj020
>>> Process    1 of    8 is on processor borgj020
>>> Process    2 of    8 is on host borgj020
>>> Process    2 of    8 is on processor borgj020
>>> Process    3 of    8 is on host borgj020
>>> Process    3 of    8 is on processor borgj020
>>> Process    4 of    8 is on host borgj020
>>> Process    4 of    8 is on processor borgj036
>>> Process    5 of    8 is on host borgj020
>>> Process    5 of    8 is on processor borgj036
>>> Process    6 of    8 is on host borgj020
>>> Process    6 of    8 is on processor borgj036
>>> Process    7 of    8 is on host borgj020
>>> Process    7 of    8 is on processor borgj036
>>>
>>> It looks like MPI_Get_Processor_Name is doing its thing, but the HOST
>>> value seems only to reflect the first host. My guess is that Open MPI
>>> doesn't export each process's environment separately to every process, so
>>> it is reflecting HOST from process 0.
>>>
>>
>> I would guess that what is actually happening is that SLURM is exporting
>> all of the variables from the host node, including the $HOST variable, and
>> overwriting the defaults on the other nodes. You should use the SLURM
>> options to limit the list of variables that you export from the host to
>> only those that you need.
>>
>>
>>
>>
>>>
>>> So, I guess my question is: can this be done? Is there an option in Open
>>> MPI that might do it? Or is this just something MPI doesn't do? Or is my
>>> Google-fu just too weak to figure out the right search phrase to find the
>>> answer to this probable FAQ?
>>>
>>> Matt
>>>
>>> [1] Note, this might be unnecessary, but I got to the point where I
>>> wanted to see if I *could* do it, rather than *should*.
>>>
>>> --
>>> Matt Thompson
>>>
>>> Man Among Men
>>> Fulcrum of History
>>>
>>>
>>>
>>
>>
>>
>> --
>> Jim Edwards
>>
>> CESM Software Engineer
>> National Center for Atmospheric Research
>> Boulder, CO
>>
>>
>>
>>
>
>
>
> --
> Matt Thompson
>
> Man Among Men
> Fulcrum of History
>
>
>
>
>



-- 
Matt Thompson

Man Among Men
Fulcrum of History
