Ralph,

That doesn't help:

(1004) $ mpirun -map-by node -np 8 ./hostenv.x | sort -g -k2
Process    0 of    8 is on host borgo086
Process    0 of    8 is on processor borgo086
Process    1 of    8 is on host borgo086
Process    1 of    8 is on processor borgo140
Process    2 of    8 is on host borgo086
Process    2 of    8 is on processor borgo086
Process    3 of    8 is on host borgo086
Process    3 of    8 is on processor borgo140
Process    4 of    8 is on host borgo086
Process    4 of    8 is on processor borgo086
Process    5 of    8 is on host borgo086
Process    5 of    8 is on processor borgo140
Process    6 of    8 is on host borgo086
Process    6 of    8 is on processor borgo086
Process    7 of    8 is on host borgo086
Process    7 of    8 is on processor borgo140

But it was doing the right thing before. It saw my SLURM_* bits and
correctly put 4 processes on the first node and 4 on the second (see the
"processor" lines, which come from MPI, not the environment), and I only
asked for 4 tasks per node:

SLURM_NODELIST=borgo[086,140]
SLURM_NTASKS_PER_NODE=4
SLURM_NNODES=2
SLURM_NTASKS=8
SLURM_TASKS_PER_NODE=4(x2)

My guess is that no MPI stack wants to propagate an environment variable
separately to every process. I'm picturing a 1000-node/28000-core job... and
poor Open MPI (or MPT, or Intel MPI) would have to marshal 28000 x N
environment variables around and keep track of who gets what...
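
If it comes to that, I suppose I can sidestep the propagation question
entirely: let each rank read its own copy of the variable and have MPI do
the shuffling. A rough, untested sketch of what I mean (the names are just
for illustration; HOST here, but it would be the same for
SLURM_TOPOLOGY_ADDR):

program gather_hosts
  use mpi
  implicit none

  integer, parameter :: STRLEN = 256
  character(len=STRLEN) :: my_value
  character(len=STRLEN), allocatable :: all_values(:)
  integer :: ierror, myid, npes, i

  call MPI_Init(ierror)
  call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierror)
  call MPI_Comm_size(MPI_COMM_WORLD, npes, ierror)

  ! Each rank reads the variable from its *own* environment...
  call get_environment_variable("HOST", my_value)

  ! ...and the ranks swap the values themselves, so the launcher never
  ! has to marshal anyone else's environment around.
  allocate(all_values(npes))
  call MPI_Allgather(my_value, STRLEN, MPI_CHARACTER, &
                     all_values, STRLEN, MPI_CHARACTER, &
                     MPI_COMM_WORLD, ierror)

  if (myid == 0) then
     do i = 1, npes
        write (*,'(A,1X,I4,1X,A,1X,A)') "Process", i-1, "sees", trim(all_values(i))
     end do
  end if

  call MPI_Finalize(ierror)
end program gather_hosts

Each rank only ships one string, so the 28000xN bookkeeping never happens.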

Matt


On Fri, Jan 15, 2016 at 10:48 AM, Ralph Castain <r...@open-mpi.org> wrote:

> Actually, the explanation is much simpler. You probably have more than 8
> slots on borgj020, and so your job is simply small enough that we put it
> all on one host. If you want to force the job to use both hosts, add
> “-map-by node” to your cmd line
>
>
> On Jan 15, 2016, at 7:02 AM, Jim Edwards <jedwa...@ucar.edu> wrote:
>
>
>
> On Fri, Jan 15, 2016 at 7:53 AM, Matt Thompson <fort...@gmail.com> wrote:
>
>> All,
>>
>> I'm not too sure if this is an MPI issue, a Fortran issue, or something
>> else but I thought I'd ask the MPI gurus here first since my web search
>> failed me.
>>
>> There is a chance in the future I might want/need to query an environment
>> variable in a Fortran program, namely to figure out what switch a currently
>> running process is on (via SLURM_TOPOLOGY_ADDR in my case) and perhaps make
>> a "per-switch" communicator.[1]
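>>
>> Just to picture what I mean, the following is roughly what I had in mind.
>> It is very much an untested sketch: the little hash is only for
>> illustration (it could collide), SLURM_TOPOLOGY_ADDR actually carries the
>> whole switch path so it would need some parsing first, and it assumes each
>> rank really sees its own node's value, which is of course the question:
>>
>> subroutine make_switch_comm(switch_comm)
>>    use mpi
>>    implicit none
>>    integer, intent(out) :: switch_comm
>>    character(len=256) :: topo
>>    integer :: color, i, myid, ierror
>>
>>    call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierror)
>>    call get_environment_variable("SLURM_TOPOLOGY_ADDR", topo)
>>
>>    ! Fold the switch string into a non-negative color; ranks that see
>>    ! the same string land in the same sub-communicator.
>>    color = 0
>>    do i = 1, len_trim(topo)
>>       color = mod(color * 31 + ichar(topo(i:i)), 2**20)
>>    end do
>>
>>    call MPI_Comm_split(MPI_COMM_WORLD, color, myid, switch_comm, ierror)
>> end subroutine make_switch_comm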
>>
>> So, I coded up a boring Fortran program whose only exciting lines are:
>>
>>    call MPI_Get_Processor_Name(processor_name, name_length, ierror)
>>    call get_environment_variable("HOST", host_name)
>>
>>    write (*,'(A,1X,I4,1X,A,1X,I4,1X,A,1X,A)') "Process", myid, "of", npes, &
>>        "is on processor", trim(processor_name)
>>    write (*,'(A,1X,I4,1X,A,1X,I4,1X,A,1X,A)') "Process", myid, "of", npes, &
>>        "is on host", trim(host_name)
>>
>> I decided to try it out with the HOST environment variable first because
>> it is simple and differs per node (I didn't want to grab many, many nodes
>> just to find the point where a switch is traversed). I then grabbed two
>> nodes with 4 processes per node and...:
>>
>> (1046) $ echo "$SLURM_NODELIST"
>> borgj[020,036]
>> (1047) $ pdsh -w "$SLURM_NODELIST" echo '$HOST'
>> borgj036: borgj036
>> borgj020: borgj020
>> (1048) $ mpifort -o hostenv.x hostenv.F90
>> (1049) $ mpirun -np 8 ./hostenv.x | sort -g -k2
>> Process    0 of    8 is on host borgj020
>> Process    0 of    8 is on processor borgj020
>> Process    1 of    8 is on host borgj020
>> Process    1 of    8 is on processor borgj020
>> Process    2 of    8 is on host borgj020
>> Process    2 of    8 is on processor borgj020
>> Process    3 of    8 is on host borgj020
>> Process    3 of    8 is on processor borgj020
>> Process    4 of    8 is on host borgj020
>> Process    4 of    8 is on processor borgj036
>> Process    5 of    8 is on host borgj020
>> Process    5 of    8 is on processor borgj036
>> Process    6 of    8 is on host borgj020
>> Process    6 of    8 is on processor borgj036
>> Process    7 of    8 is on host borgj020
>> Process    7 of    8 is on processor borgj036
>>
>> It looks like MPI_Get_Processor_Name is doing its thing, but the HOST one
>> seems to only be reflecting the first host. My guess is that Open MPI
>> doesn't export each process's environment separately to every process, so
>> every rank is reflecting HOST from process 0.
>>
>
> I would guess that what is actually happening is that Slurm is exporting
> all of the variables from the host node, including the $HOST variable, and
> overwriting the defaults on the other nodes. You should use the SLURM
> options to limit the list of variables that you export from the host to
> only those that you need.
>
>
>
>
>>
>> So, I guess my question is: can this be done? Is there an option to Open
>> MPI that might do it? Or is this just something MPI doesn't do? Or is my
>> Google-fu just too weak to figure out the right search-phrase to find the
>> answer to this probable FAQ?
>>
>> Matt
>>
>> [1] Note, this might be unnecessary, but I got to the point where I
>> wanted to see if I *could* do it, rather than *should*.
>>
>> --
>> Matt Thompson
>>
>> Man Among Men
>> Fulcrum of History
>>
>>
>
>
>
> --
> Jim Edwards
>
> CESM Software Engineer
> National Center for Atmospheric Research
> Boulder, CO



-- 
Matt Thompson

Man Among Men
Fulcrum of History
