I checked that bug using the current 1.8.4 branch and I can’t replicate it; it 
looks like it might have already been fixed. If I give a hostfile like the one 
you described:
node1
node1
node2
node3

and then ask to launch four processes:
mpirun -n 4 --display-allocation --display-map --do-not-launch --do-not-resolve 
-hostfile ./hosts hostname

I get the following allocation and map:

======================   ALLOCATED NODES   ======================
        bend001: slots=6 max_slots=0 slots_inuse=0 state=UP
        node1: slots=2 max_slots=0 slots_inuse=0 state=UNKNOWN
        node2: slots=12 max_slots=0 slots_inuse=0 state=UNKNOWN
        node3: slots=12 max_slots=0 slots_inuse=0 state=UNKNOWN
=================================================================
 Data for JOB [54391,1] offset 0

 ========================   JOB MAP   ========================

 Data for node: node1   Num slots: 2    Max slots: 0    Num procs: 2
        Process OMPI jobid: [54391,1] App: 0 Process rank: 0
        Process OMPI jobid: [54391,1] App: 0 Process rank: 1

 Data for node: node2   Num slots: 12   Max slots: 0    Num procs: 2
        Process OMPI jobid: [54391,1] App: 0 Process rank: 2
        Process OMPI jobid: [54391,1] App: 0 Process rank: 3

Note that the host where mpirun is executing (bend001) appears in the 
“allocation”, but it isn’t used because the hostfile didn’t include it. You can 
also see the effect of the slot-autodetection algorithm. Since node1 was listed 
more than once, we assume the repetition is intended to provide a slot count 
and use that instead of what we detect. Since node2 and node3 were each listed 
only once, we autodetect the number of cores on those nodes and set their slot 
counts to that value.
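
If you would rather control the slot counts yourself than rely on 
autodetection, you can also state them explicitly in the hostfile. A minimal 
sketch (the slot values here are purely illustrative):

node1 slots=2
node2 slots=4
node3 slots=4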

The job map matches what I would have expected, so I think we are okay here.

HTH
Ralph


> On Nov 11, 2014, at 8:10 AM, Blosch, Edwin L <edwin.l.blo...@lmco.com> wrote:
> 
> Thanks Ralph.  I’ll experiment with these options.  Much appreciated.
>  
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Tuesday, November 11, 2014 10:00 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] EXTERNAL: Re: Question on mapping processes to 
> hosts file
>  
>  
> On Nov 11, 2014, at 6:11 AM, Blosch, Edwin L <edwin.l.blo...@lmco.com 
> <mailto:edwin.l.blo...@lmco.com>> wrote:
>  
> OK, that’s what I was suspecting.  It’s a bug, right?  I asked for 4 
> processes and I supplied a host file with 4 lines in it, and mpirun didn’t 
> launch the processes where I told it to launch them. 
>  
> Actually, no - it’s an intended “feature”. When the dinosaurs still roamed 
> the earth and OMPI was an infant, we had no way of detecting the number of 
> processors on a node in advance of the map/launch phase. During that time, 
> users were required to tell us that info in the hostfile, which was a source 
> of constant complaint.
>  
> Since that time, we have changed the launch procedure so we do have access to 
> that info when we need it. Accordingly, we now check to see if you told us 
> the number of slots on each node in the hostfile - if not, then we autodetect 
> it for you.
>  
> Quite honestly, it sounds to me like you might be happier using the 
> “sequential” mapper for this use case. It will place one proc on each of the 
> indicated nodes, with the rank set by the order in the hostfile. So a 
> hostfile like this:
>  
> node1
> node2
> node1
> node3
>  
> will result in
> rank 0 -> node1
> rank 1 -> node2
> rank 2 -> node1
> rank 3 -> node3
>  
> etc. To use it, just add "-mca rmaps seq" to your cmd line. Alternatively, you 
> could add "--map-by node" to your cmd line and we will round-robin by node.
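> 
> For example (just a sketch; the hostfile name and executable are 
> placeholders), the two variants would look something like:
> 
> mpirun -mca rmaps seq -hostfile ./hosts ./my_app
> mpirun -n 4 --map-by node -hostfile ./hosts ./my_app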
> 
> 
>  
> Do you know when or if this changed?  I can’t recall seeing this behavior in 
> 1.6.5 or 1.4 or 1.2, and I know I’ve run cases across workstation clusters, so 
> I think I would have noticed it.
>  
> It changed early in the 1.7 series, and has remained consistent since then.
> 
> 
>  
> Can I throw another one at you, most likely related?  On a system where 
> node01, node02, node03, and node04 already had a full load of work (i.e. 
> other applications were running a number of processes equal to the number of 
> cores on each node), I had a hosts file like this:  node01, node01, node02, 
> node02.   I asked for 4 processes.  mpirun launched them as I would think: 
> rank 0 and rank 1 on node01, and rank 2 and 3 on node02.  Then I tried 
> node01, node01, node02, node03.  In this case, all 4 processes were launched 
> on node01.  Is there a logical explanation for this behavior as well?
>  
> Now that one is indeed a bug! I’ll dig it up and fix it.
>  
> 
> 
>  
> Thanks again,
>  
> Ed
>  
>  
> From: users [mailto:users-boun...@open-mpi.org 
> <mailto:users-boun...@open-mpi.org>] On Behalf Of Ralph Castain
> Sent: Friday, November 07, 2014 11:51 AM
> To: Open MPI Users
> Subject: EXTERNAL: Re: [OMPI users] Question on mapping processes to hosts 
> file
>  
> Ah, yes - so here is what is happening. When no slot info is provided, we use 
> the number of detected cores on each node as the #slots. So if you want to 
> load-balance across the nodes, you need to set --map-by node.
>  
> Or add slots=1 to each line of your host file to override the default behavior.
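>  
> As a rough sketch (keeping your hosts.dat and using a placeholder executable), 
> either of these should give you one rank per node:
>  
> mpirun --map-by node --machinefile hosts.dat -np 4 ./my_app
>  
> or a hosts.dat of the form:
>  
> node01 slots=1
> node02 slots=1
> node03 slots=1
> node04 slots=1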
>  
> On Nov 7, 2014, at 8:52 AM, Blosch, Edwin L <edwin.l.blo...@lmco.com 
> <mailto:edwin.l.blo...@lmco.com>> wrote:
>  
> Here’s my command:
>  
> <path_to_OpenMPI_1.8.3>/bin/mpirun <unrelated MCA options> --machinefile 
> hosts.dat -np 4 <executable>
>  
> Here’s my hosts.dat file:
>  
> % cat hosts.dat
> node01
> node02
> node03
> node04
>  
> All 4 ranks are launched on node01.  I don’t believe I’ve ever seen this 
> before.  I had to do a sanity check, so I tried MVAPICH2-2.1a and got what I 
> expected: 1 process runs on each of the 4 nodes.  The mpirun man page says 
> ‘round-robin’, which I take to mean that one process would be launched per 
> line in the hosts file, so this really seems like incorrect behavior.
>  
> What could be the possibilities here?
>  
> Thanks for the help!
>  
>  
>  
