Re: [OMPI users] Guaranteed run rank 0 on a given machine?

2010-12-12 Thread jody
In a similar situation I wrote a simple shell script, "rankcreate.sh",
which, given a number of processes, creates a rank file assigning the
ranks to the correct processors/slots. The script then prints the name
of the rank file it created. I use it like this:

mpirun -np 5 --rankfile `rankcreate.sh 5` myApplication

Maybe this is of use to you.
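
The script itself was not posted. A minimal sketch of what such a script
might look like, assuming the host names come from a HOSTS variable and
every host offers SLOTS_PER_HOST slots (both names and their defaults are
hypothetical, as is the fill-first-host-first policy):

```shell
#!/bin/sh
# rankcreate.sh (hypothetical reconstruction, not the script from the post):
# write a rankfile for N ranks and print the rankfile's name on stdout.
# Assumptions: HOSTS is a space-separated host list and each host has
# SLOTS_PER_HOST slots; the defaults below are purely illustrative.

rankcreate() {
    nprocs=$1
    hosts=${HOSTS:-"aa bb"}
    slots=${SLOTS_PER_HOST:-2}
    rankfile=$(mktemp "${TMPDIR:-/tmp}/rankfile.XXXXXX") || return 1

    rank=0
    # Fill the hosts in the order given, so rank 0 lands on the first host.
    for host in $hosts; do
        slot=0
        while [ "$slot" -lt "$slots" ] && [ "$rank" -lt "$nprocs" ]; do
            echo "rank $rank=$host slot=$slot" >> "$rankfile"
            rank=$((rank + 1))
            slot=$((slot + 1))
        done
    done

    # Refuse to hand back a rankfile that does not cover every rank.
    if [ "$rank" -lt "$nprocs" ]; then
        echo "rankcreate: not enough slots for $nprocs ranks" >&2
        rm -f "$rankfile"
        return 1
    fi

    echo "$rankfile"
}

# When run as a script with an argument, behave like "rankcreate.sh N".
if [ "$#" -gt 0 ]; then
    rankcreate "$1"
fi
```

Because only the file name goes to stdout, the backtick substitution in
the mpirun command above receives exactly one word.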

jody

On Fri, Dec 10, 2010 at 11:50 PM, Eugene Loh  wrote:
> David Mathog wrote:
>
>> Also, in my limited testing --host and -hostfile seem to be mutually
>> exclusive.
>>
> No.  You can use both together.  Indeed, the mpirun man page even has
> examples of this (though personally, I don't see having a use for this).  I
> think the idea was you might use a hostfile to define the nodes in your
> cluster and an mpirun command line that uses --host to select specific nodes
> from the file.
>
>> That is reasonable, but it isn't clear that it is intended.
>> Example, with a hostfile containing one entry for "monkey02.cluster
>> slots=1":
>>
>> mpirun  --host monkey01   --mca plm_rsh_agent rsh  hostname
>> monkey01.cluster
>>
>
> Okay.
>
>> mpirun  --host monkey02   --mca plm_rsh_agent rsh  hostname
>> monkey02.cluster
>>
>
> Okay.
>
>> mpirun  -hostfile /usr/common/etc/openmpi.machines.test1 \
>>  --mca plm_rsh_agent rsh  hostname
>> monkey02.cluster
>>
>
> Okay.
>
>> mpirun  --host monkey01  \
>>  -hostfile /usr/common/etc/openmpi.machines.test1 \
>>  --mca plm_rsh_agent rsh  hostname
>> --
>> There are no allocated resources for the application  hostname
>> that match the requested mapping:
>>
>> Verify that you have mapped the allocated resources properly using the
>> --host or --hostfile specification.
>> --
>>
>
> Right.  Your hostfile has monkey02.  On the command line, you specify
> monkey01, but that's not in your hostfile.  That's a problem.  Just like on
> the mpirun man page.
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



Re: [OMPI users] Guaranteed run rank 0 on a given machine?

2010-12-10 Thread Eugene Loh

David Mathog wrote:


> Also, in my limited testing --host and -hostfile seem to be mutually
> exclusive.

No.  You can use both together.  Indeed, the mpirun man page even has
examples of this (though personally, I don't see having a use for
this).  I think the idea was you might use a hostfile to define the
nodes in your cluster and an mpirun command line that uses --host to
select specific nodes from the file.

> That is reasonable, but it isn't clear that it is intended.
> Example, with a hostfile containing one entry for "monkey02.cluster
> slots=1":
>
> mpirun  --host monkey01   --mca plm_rsh_agent rsh  hostname
> monkey01.cluster

Okay.

> mpirun  --host monkey02   --mca plm_rsh_agent rsh  hostname
> monkey02.cluster

Okay.

> mpirun  -hostfile /usr/common/etc/openmpi.machines.test1 \
>   --mca plm_rsh_agent rsh  hostname
> monkey02.cluster

Okay.

> mpirun  --host monkey01  \
>  -hostfile /usr/common/etc/openmpi.machines.test1 \
>  --mca plm_rsh_agent rsh  hostname
> --
> There are no allocated resources for the application
>   hostname
> that match the requested mapping:
>
> Verify that you have mapped the allocated resources properly using the
> --host or --hostfile specification.
> --

Right.  Your hostfile has monkey02.  On the command line, you specify
monkey01, but that's not in your hostfile.  That's a problem.  Just like
on the mpirun man page.
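
The rule described here (hosts named with --host have to appear in the
hostfile) can be checked before launching. A rough sketch, assuming a
hypothetical helper called check_hosts; it compares names literally and
does not try to reproduce how Open MPI itself matches short and full
host names, which this thread does not cover:

```shell
#!/bin/sh
# check_hosts (hypothetical helper): succeed only if every host in a
# comma-separated --host list also appears as the first field of some
# line in the hostfile. Names are compared literally.
check_hosts() {
    hostfile=$1
    hostlist=$2
    for host in $(echo "$hostlist" | tr ',' ' '); do
        # awk exits 0 if a line's first field equals the host, 1 otherwise.
        if ! awk -v h="$host" '$1 == h { found = 1 } END { exit !found }' "$hostfile"; then
            echo "$host is not listed in $hostfile" >&2
            return 1
        fi
    done
    return 0
}
```

It could be used as a guard, for example:
check_hosts /usr/common/etc/openmpi.machines.test1 monkey01 && mpirun ...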


Re: [OMPI users] Guaranteed run rank 0 on a given machine?

2010-12-10 Thread Ralph Castain
Terry is correct - not guaranteed, but that is the typical behavior.

However, you -can- guarantee that rank=0 will be on a particular host. Just run 
your job:

mpirun -n 1 -host  my_app : -n (N-1) my_app

This guarantees that rank=0 is on the host given to -host. All other ranks will be 
distributed according to the selected mapping algorithm, including loadbalance.
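
The (N-1) in the command above is something the shell can compute. A
small sketch, assuming a hypothetical helper named rank0_cmd and using
"mastermachine" from David's question as the host; it only prints the
command line so it can be inspected before anything is launched:

```shell
#!/bin/sh
# rank0_cmd (hypothetical helper): print an mpirun command line that starts
# one process of the app on the given host as the first app context, then
# the remaining N-1 processes as a second app context with no host pinned.
rank0_cmd() {
    host=$1
    nprocs=$2
    app=$3
    echo "mpirun -n 1 -host $host $app : -n $((nprocs - 1)) $app"
}

rank0_cmd mastermachine 8 my_app
# prints: mpirun -n 1 -host mastermachine my_app : -n 7 my_app
```
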

Ralph

On Dec 10, 2010, at 12:08 PM, Terry Dontje wrote:

> On 12/10/2010 01:46 PM, David Mathog wrote:
>> 
>> The master is commonly very different from the workers, so I expected
>> there would be something like
>> 
>>   --rank0-on 
>> 
>> but there doesn't seem to be a single switch on mpirun to do that.
>> 
>> If "mastermachine" is the first entry in the hostfile, or the first
>> machine in a -hosts list, will rank 0 always run there?  If so, will it
>> always run in the first slot on the first machine listed?  That seems to
>> be the case in practice, but is it guaranteed?  Even if -loadbalance is
>> used?  
>> 
> For Open MPI the above is correct, though I am hesitant to say
> "guaranteed".
>> Otherwise, there is the rankfile method.  In the situation where the
>> master must run on a specific node, but there is no preference for the
>> workers, would a rank file like this be sufficient?
>> 
>> rank 0=mastermachine slot=0
> I thought you might have had to give all ranks, but empirically it looks
> like you can give just some of them.
>> The mpirun man page gives an example where all nodes/slots are
>> specified, but it doesn't say explicitly what happens if the
>> configuration is only partially specified, or how it interacts with the
>> -np parameter.  Modifying the man page example:
>> 
>> cat myrankfile
>> rank 0=aa slot=1:0-2
>> rank 1=bb slot=0:0,1
>> rank 2=cc slot=1-2
>> mpirun -H aa,bb,cc,dd -np 4 -rf myrankfile ./a.out
>> 
>> Rank 0 runs on node aa, bound to socket 1, cores 0-2.
>> Rank 1 runs on node bb, bound to socket 0, cores 0 and 1.
>> Rank 2 runs on node cc, bound to cores 1 and 2.
>> 
>> Rank 3 runs where?  not at all, or on dd, aa:slot=0, or ...? 
> From my empirical runs, it looks like rank 3 would end up on aa, possibly
> in slot=0.  In other words, once you run out of entries in the rankfile,
> the remaining processes appear to start again from the beginning of the
> host list.
> 
> --td
>> Also, in my limited testing --host and -hostfile seem to be mutually
>> exclusive.  That is reasonable, but it isn't clear that it is intended.
>>  Example, with a hostfile containing one entry for "monkey02.cluster
>> slots=1":
>> 
>> mpirun  --host monkey01   --mca plm_rsh_agent rsh  hostname
>> monkey01.cluster
>> mpirun  --host monkey02   --mca plm_rsh_agent rsh  hostname
>> monkey02.cluster
>> mpirun  -hostfile /usr/common/etc/openmpi.machines.test1 \
>>   --mca plm_rsh_agent rsh  hostname
>> monkey02.cluster
>> mpirun  --host monkey01  \
>>   -hostfile /usr/common/etc/openmpi.machines.test1 \
>>   --mca plm_rsh_agent rsh  hostname
>> --
>> There are no allocated resources for the application 
>>   hostname
>> that match the requested mapping:
>>   
>> 
>> Verify that you have mapped the allocated resources properly using the 
>> --host or --hostfile specification.
>> --
>> 
>> 
>> 
>> 
>> Thanks,
>> 
>> David Mathog
>> mat...@caltech.edu
>> Manager, Sequence Analysis Facility, Biology Division, Caltech
> 
> 
> -- 
> 
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.don...@oracle.com
> 
> 
> 



Re: [OMPI users] Guaranteed run rank 0 on a given machine?

2010-12-10 Thread Terry Dontje

On 12/10/2010 01:46 PM, David Mathog wrote:

> The master is commonly very different from the workers, so I expected
> there would be something like
>
>   --rank0-on
>
> but there doesn't seem to be a single switch on mpirun to do that.
>
> If "mastermachine" is the first entry in the hostfile, or the first
> machine in a -hosts list, will rank 0 always run there?  If so, will it
> always run in the first slot on the first machine listed?  That seems to
> be the case in practice, but is it guaranteed?  Even if -loadbalance is
> used?

For Open MPI the above is correct, though I am hesitant to say
"guaranteed".

> Otherwise, there is the rankfile method.  In the situation where the
> master must run on a specific node, but there is no preference for the
> workers, would a rank file like this be sufficient?
>
> rank 0=mastermachine slot=0

I thought you might have had to give all ranks, but empirically it looks
like you can give just some of them.

> The mpirun man page gives an example where all nodes/slots are
> specified, but it doesn't say explicitly what happens if the
> configuration is only partially specified, or how it interacts with the
> -np parameter.  Modifying the man page example:
>
> cat myrankfile
> rank 0=aa slot=1:0-2
> rank 1=bb slot=0:0,1
> rank 2=cc slot=1-2
> mpirun -H aa,bb,cc,dd -np 4 -rf myrankfile ./a.out
>
> Rank 0 runs on node aa, bound to socket 1, cores 0-2.
> Rank 1 runs on node bb, bound to socket 0, cores 0 and 1.
> Rank 2 runs on node cc, bound to cores 1 and 2.
>
> Rank 3 runs where?  not at all, or on dd, aa:slot=0, or ...?

From my empirical runs, it looks like rank 3 would end up on aa, possibly
in slot=0.  In other words, once you run out of entries in the rankfile,
the remaining processes appear to start again from the beginning of the
host list.
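
Since the placement of ranks missing from the rankfile is only observed
empirically here, it may be worth checking on your own cluster. One way
to do that, sketched below, is to launch a tiny script in place of
./a.out that reports its own rank and host. The script name
where_am_i.sh is hypothetical; it relies on the OMPI_COMM_WORLD_RANK
environment variable that Open MPI sets for each process it launches.

```shell
#!/bin/sh
# where_am_i.sh (hypothetical helper): print this process's rank and host.
# Falls back to "unknown" when not started by Open MPI's mpirun.
where_am_i() {
    echo "rank ${OMPI_COMM_WORLD_RANK:-unknown} on $(uname -n)"
}

where_am_i
```

With the rankfile from the example, it could be run as:
mpirun -H aa,bb,cc,dd -np 4 -rf myrankfile ./where_am_i.sh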


--td

> Also, in my limited testing --host and -hostfile seem to be mutually
> exclusive.  That is reasonable, but it isn't clear that it is intended.
> Example, with a hostfile containing one entry for "monkey02.cluster
> slots=1":
>
> mpirun  --host monkey01   --mca plm_rsh_agent rsh  hostname
> monkey01.cluster
> mpirun  --host monkey02   --mca plm_rsh_agent rsh  hostname
> monkey02.cluster
> mpirun  -hostfile /usr/common/etc/openmpi.machines.test1 \
>   --mca plm_rsh_agent rsh  hostname
> monkey02.cluster
> mpirun  --host monkey01  \
>   -hostfile /usr/common/etc/openmpi.machines.test1 \
>   --mca plm_rsh_agent rsh  hostname
> --
> There are no allocated resources for the application
>   hostname
> that match the requested mapping:
>
> Verify that you have mapped the allocated resources properly using the
> --host or --hostfile specification.
> --
>
> Thanks,
>
> David Mathog
> mat...@caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech



--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com