Hmmm...what is in your "hostsfile"?

On Aug 28, 2012, at 2:33 PM, Brian Budge <brian.bu...@gmail.com> wrote:

> Hi Ralph -
> 
> Thanks for confirming this is possible.  I'm trying this and currently
> failing.  Perhaps there's something I'm missing in the code to make
> this work.  Here are the two invocations and their outputs:
> 
>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib 
>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
> cannot start slaves... not enough nodes
> 
>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib 
>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile mpirun -n 1 ./master_exe
> master spawned 1 slaves...
> slave responding...
> 
> 
> The code:
> 
> //master.cpp
> #include <mpi.h>
> #include <boost/filesystem.hpp>
> #include <iostream>
> #include <cstring>   // memcpy
> #include <alloca.h>  // alloca
> 
> int main(int argc, char **args) {
>    int worldSize, universeSize, *puniverseSize, flag;
> 
>    MPI_Comm everyone; //intercomm
>    boost::filesystem::path curPath =
>        boost::filesystem::absolute(boost::filesystem::current_path());
> 
>    std::string toRun = (curPath / "slave_exe").string();
> 
>    int ret = MPI_Init(&argc, &args);
> 
>    if(ret != MPI_SUCCESS) {
>        std::cerr << "failed init" << std::endl;
>        return -1;
>    }
> 
>    MPI_Comm_size(MPI_COMM_WORLD, &worldSize);
> 
>    if(worldSize != 1) {
>        std::cerr << "too many masters" << std::endl;
>    }
> 
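>    // MPI_UNIVERSE_SIZE is the number of processes the runtime believes it
>    // can usefully run; for a singleton it is typically 1 unless a default
>    // hostfile was picked up.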
>    MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);
> 
>    if(!flag) {
>        std::cerr << "no universe size" << std::endl;
>        return -1;
>    }
>    universeSize = *puniverseSize;
>    if(universeSize == 1) {
>        std::cerr << "cannot start slaves... not enough nodes" << std::endl;
>        MPI_Finalize();
>        return -1;
>    }
> 
> 
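>    // Older MPI bindings declare MPI_Comm_spawn's command argument as a
>    // plain char*, so copy the path into a writable scratch buffer.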
>    char *buf = (char*)alloca(toRun.size() + 1);
>    memcpy(buf, toRun.c_str(), toRun.size());
>    buf[toRun.size()] = '\0';
> 
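>    // Spawn one slave per remaining slot in the universe (all slots minus
>    // this master), rooted at this process over MPI_COMM_SELF.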
>    MPI_Comm_spawn(buf, MPI_ARGV_NULL, universeSize-1, MPI_INFO_NULL,
>                   0, MPI_COMM_SELF, &everyone, MPI_ERRCODES_IGNORE);
> 
>    std::cerr << "master spawned " << universeSize-1 << " slaves..."
> << std::endl;
> 
>    MPI_Finalize();
> 
>   return 0;
> }
> 
> 
> //slave.cpp
> #include <mpi.h>
> #include <iostream>  // std::cerr
> 
> int main(int argc, char **args) {
>    int size;
>    MPI_Comm parent;
>    MPI_Init(&argc, &args);
> 
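>    // MPI_Comm_get_parent returns the intercommunicator to the spawning
>    // master, or MPI_COMM_NULL if this process was not started via spawn.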
>    MPI_Comm_get_parent(&parent);
> 
>    if(parent == MPI_COMM_NULL) {
>        std::cerr << "slave has no parent" << std::endl;
>    }
>    MPI_Comm_remote_size(parent, &size);
>    if(size != 1) {
>        std::cerr << "parent size is " << size << std::endl;
>    }
> 
>    std::cerr << "slave responding..." << std::endl;
> 
>    MPI_Finalize();
> 
>    return 0;
> }
> 
> 
> Any ideas?  Thanks for any help.
> 
>  Brian
> 
> On Wed, Aug 22, 2012 at 9:03 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> It really is just that simple :-)
>> 
>> On Aug 22, 2012, at 8:56 AM, Brian Budge <brian.bu...@gmail.com> wrote:
>> 
>>> Okay.  Is there a tutorial or FAQ for setting everything up?  Or is it
>>> really just that simple?  I don't need to run a copy of the orte
>>> server somewhere?
>>> 
>>> if my current ip is 192.168.0.1,
>>> 
>>> 0 > echo 192.168.0.11 > /tmp/hostfile
>>> 1 > echo 192.168.0.12 >> /tmp/hostfile
>>> 2 > export OMPI_MCA_orte_default_hostfile=/tmp/hostfile
>>> 3 > ./mySpawningExe
>>> 
>>> At this point, mySpawningExe will be the master, running on
>>> 192.168.0.1, and I can have spawned, for example, childExe on
>>> 192.168.0.11 and 192.168.0.12?  Or childExe1 on 192.168.0.11 and
>>> childExe2 on 192.168.0.12?
>>> 
>>> Thanks for the help.
>>> 
>>> Brian
>>> 
>>> On Wed, Aug 22, 2012 at 7:15 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> Sure, that's still true on all 1.3 or above releases. All you need to do 
>>>> is set the hostfile envar so we pick it up:
>>>> 
>>>> OMPI_MCA_orte_default_hostfile=<foo>
>>>> 
>>>> 
>>>> On Aug 21, 2012, at 7:23 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>>>> 
>>>>> Hi.  I know this is an old thread, but I'm curious if there are any
>>>>> tutorials describing how to set this up?  Is this still available on
>>>>> newer open mpi versions?
>>>>> 
>>>>> Thanks,
>>>>> Brian
>>>>> 
>>>>> On Fri, Jan 4, 2008 at 7:57 AM, Ralph Castain <r...@lanl.gov> wrote:
>>>>>> Hi Elena
>>>>>> 
>>>>>> I'm copying this to the user list just to correct a mis-statement on
>>>>>> my part in an earlier message that went there. I had stated that a
>>>>>> singleton could comm_spawn onto other nodes listed in a hostfile by
>>>>>> setting an environmental variable that pointed us to the hostfile.
>>>>>> 
>>>>>> This is incorrect in the 1.2 code series. That series does not allow
>>>>>> singletons to read a hostfile at all. Hence, any comm_spawn done by a
>>>>>> singleton can only launch child processes on the singleton's local host.
>>>>>> 
>>>>>> This situation has been corrected for the upcoming 1.3 code series. For
>>>>>> the 1.2 series, though, you will have to do it via an mpirun command line.
>>>>>> 
>>>>>> Sorry for the confusion - I sometimes have too many code families to keep
>>>>>> straight in this old mind!
>>>>>> 
>>>>>> Ralph
>>>>>> 
>>>>>> 
>>>>>> On 1/4/08 5:10 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> wrote:
>>>>>> 
>>>>>>> Hello Ralph,
>>>>>>> 
>>>>>>> Thank you very much for the explanations.
>>>>>>> But I still do not get it running...
>>>>>>> 
>>>>>>> For the case
>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe
>>>>>>> everything works.
>>>>>>> 
>>>>>>> For the case
>>>>>>> ./my_master.exe
>>>>>>> it does not.
>>>>>>> 
>>>>>>> I did:
>>>>>>> - create my_hostfile and put it in the $HOME/.openmpi/components/
>>>>>>> my_hostfile :
>>>>>>> bollenstreek slots=2 max_slots=3
>>>>>>> octocore01 slots=8  max_slots=8
>>>>>>> octocore02 slots=8  max_slots=8
>>>>>>> clstr000 slots=2 max_slots=3
>>>>>>> clstr001 slots=2 max_slots=3
>>>>>>> clstr002 slots=2 max_slots=3
>>>>>>> clstr003 slots=2 max_slots=3
>>>>>>> clstr004 slots=2 max_slots=3
>>>>>>> clstr005 slots=2 max_slots=3
>>>>>>> clstr006 slots=2 max_slots=3
>>>>>>> clstr007 slots=2 max_slots=3
>>>>>>> - setenv OMPI_MCA_rds_hostfile_path my_hostfile (I put it in .tcshrc
>>>>>>> and then source .tcshrc)
>>>>>>> - in my_master.cpp I did
>>>>>>> MPI_Info info1;
>>>>>>> MPI_Info_create(&info1);
>>>>>>> char* hostname =
>>>>>>> "clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02";
>>>>>>> MPI_Info_set(info1, "host", hostname);
>>>>>>> 
>>>>>>> _intercomm = intracomm.Spawn("./childexe", argv1, _nProc, info1, 0,
>>>>>>>                              MPI_ERRCODES_IGNORE);
>>>>>>> 
>>>>>>> - After I call the executable, I've got this error message
>>>>>>> 
>>>>>>> bollenstreek: > ./my_master
>>>>>>> number of processes to run: 1
>>>>>>> --------------------------------------------------------------------------
>>>>>>> Some of the requested hosts are not included in the current allocation 
>>>>>>> for
>>>>>>> the application:
>>>>>>> ./childexe
>>>>>>> The requested hosts were:
>>>>>>> clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02
>>>>>>> 
>>>>>>> Verify that you have mapped the allocated resources properly using the
>>>>>>> --host specification.
>>>>>>> --------------------------------------------------------------------------
>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
>>>>>>> base/rmaps_base_support_fns.c at line 225
>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
>>>>>>> rmaps_rr.c at line 478
>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
>>>>>>> base/rmaps_base_map_job.c at line 210
>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
>>>>>>> rmgr_urm.c at line 372
>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
>>>>>>> communicator/comm_dyn.c at line 608
>>>>>>> 
>>>>>>> Did I miss something?
>>>>>>> Thanks for help!
>>>>>>> 
>>>>>>> Elena
>>>>>>> 
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Ralph H Castain [mailto:r...@lanl.gov]
>>>>>>> Sent: Tuesday, December 18, 2007 3:50 PM
>>>>>>> To: Elena Zhebel; Open MPI Users <us...@open-mpi.org>
>>>>>>> Cc: Ralph H Castain
>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster 
>>>>>>> configuration
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 12/18/07 7:35 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> wrote:
>>>>>>> 
>>>>>>>> Thanks a lot! Now it works!
>>>>>>>> The solution is to use mpirun -n 1 -hostfile my.hosts *.exe and pass
>>>>>>>> the MPI_Info key to the Spawn function!
>>>>>>>> 
>>>>>>>> One more question: is it necessary to start my "master" program with
>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe ?
>>>>>>> 
>>>>>>> No, it isn't necessary - assuming that my_master_host is the first host
>>>>>>> listed in your hostfile! If you are only executing one my_master.exe
>>>>>>> (i.e., you gave -n 1 to mpirun), then we will automatically map that
>>>>>>> process onto the first host in your hostfile.
>>>>>>> 
>>>>>>> If you want my_master.exe to go on a host other than the first one in
>>>>>>> the file, then you have to give us the -host option.
>>>>>>> 
>>>>>>>> 
>>>>>>>> Are there other possibilities for an easy start?
>>>>>>>> I would say just run ./my_master.exe, but then the master process
>>>>>>>> doesn't know about the hosts available on the network.
>>>>>>> 
>>>>>>> You can set the hostfile parameter in your environment instead of on the
>>>>>>> command line. Just set OMPI_MCA_rds_hostfile_path = my.hosts.
>>>>>>> 
>>>>>>> You can then just run ./my_master.exe on the host where you want the
>>>>>>> master to reside - everything should work the same.
>>>>>>> 
>>>>>>> Just as an FYI: the name of that environmental variable is going to
>>>>>>> change in the 1.3 release, but everything will still work the same.
>>>>>>> 
>>>>>>> Hope that helps
>>>>>>> Ralph
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks and regards,
>>>>>>>> Elena
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: Ralph H Castain [mailto:r...@lanl.gov]
>>>>>>>> Sent: Monday, December 17, 2007 5:49 PM
>>>>>>>> To: Open MPI Users <us...@open-mpi.org>; Elena Zhebel
>>>>>>>> Cc: Ralph H Castain
>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster 
>>>>>>>> configuration
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 12/17/07 8:19 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> wrote:
>>>>>>>> 
>>>>>>>>> Hello Ralph,
>>>>>>>>> 
>>>>>>>>> Thank you for your answer.
>>>>>>>>> 
>>>>>>>>> I'm using OpenMPI 1.2.3, compiler glibc232, Linux Suse 10.0.
>>>>>>>>> My "master" executable runs only on the one local host, then it spawns
>>>>>>>>> "slaves" (with MPI::Intracomm::Spawn).
>>>>>>>>> My question was: how to determine the hosts where these "slaves" will
>>>>>>>>> be spawned?
>>>>>>>>> You said: "You have to specify all of the hosts that can be used by
>>>>>>>>> your job in the original hostfile". How can I specify the hostfile? I
>>>>>>>>> cannot find it in the documentation.
>>>>>>>> 
>>>>>>>> Hmmm...sorry about the lack of documentation. I always assumed that the
>>>>>>>> MPI folks in the project would document such things since it has little
>>>>>>>> to do with the underlying run-time, but I guess that fell through the
>>>>>>>> cracks.
>>>>>>>> 
>>>>>>>> There are two parts to your question:
>>>>>>>> 
>>>>>>>> 1. how to specify the hosts to be used for the entire job. I believe
>>>>>>>> that is somewhat covered here:
>>>>>>>> http://www.open-mpi.org/faq/?category=running#simple-spmd-run
>>>>>>>> 
>>>>>>>> That FAQ tells you what a hostfile should look like, though you may
>>>>>>>> already know that. Basically, we require that you list -all- of the
>>>>>>>> nodes that both your master and slave programs will use.
>>>>>>>> 
>>>>>>>> 2. how to specify which nodes are available for the master, and which
>>>>>>>> for the slave.
>>>>>>>> 
>>>>>>>> You would specify the host for your master on the mpirun command line
>>>>>>>> with something like:
>>>>>>>> 
>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe
>>>>>>>> 
>>>>>>>> This directs Open MPI to map that specified executable onto the
>>>>>>>> specified host - note that my_master_host must have been in my_hostfile.
>>>>>>>> 
>>>>>>>> Inside your master, you would create an MPI_Info key "host" whose value
>>>>>>>> is a string "host1,host2,host3" identifying the hosts you want your
>>>>>>>> slaves to execute upon. Those hosts must have been included in
>>>>>>>> my_hostfile. Include that key in the MPI_Info passed to your Spawn.
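>>>>>>>> 
>>>>>>>> A rough sketch of that recipe with the C API (the executable name and
>>>>>>>> host names below are just placeholders; the hosts must appear in
>>>>>>>> my_hostfile):
>>>>>>>> 
>>>>>>>>    char cmd[]   = "./childexe";            // placeholder slave binary
>>>>>>>>    char key[]   = "host";
>>>>>>>>    char hosts[] = "host1,host2,host3";     // placeholder host list
>>>>>>>>    MPI_Info info;
>>>>>>>>    MPI_Comm slaves;
>>>>>>>>    MPI_Info_create(&info);
>>>>>>>>    MPI_Info_set(info, key, hosts);         // where the slaves may run
>>>>>>>>    MPI_Comm_spawn(cmd, MPI_ARGV_NULL, 3 /* one per host here */, info,
>>>>>>>>                   0, MPI_COMM_SELF, &slaves, MPI_ERRCODES_IGNORE);
>>>>>>>>    MPI_Info_free(&info);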
>>>>>>>> 
>>>>>>>> We don't currently support providing a hostfile for the slaves (as
>>>>>>>> opposed to the host-at-a-time string above). This may become available
>>>>>>>> in a future release - TBD.
>>>>>>>> 
>>>>>>>> Hope that helps
>>>>>>>> Ralph
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks and regards,
>>>>>>>>> Elena
>>>>>>>>> 
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] 
>>>>>>>>> On
>>>>>>>>> Behalf Of Ralph H Castain
>>>>>>>>> Sent: Monday, December 17, 2007 3:31 PM
>>>>>>>>> To: Open MPI Users <us...@open-mpi.org>
>>>>>>>>> Cc: Ralph H Castain
>>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster
>>>>>>>>> configuration
>>>>>>>>> 
>>>>>>>>> On 12/12/07 5:46 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Hello,
>>>>>>>>>> 
>>>>>>>>>> I'm working on a MPI application where I'm using OpenMPI instead of
>>>>>>>>>> MPICH.
>>>>>>>>>> 
>>>>>>>>>> In my "master" program I call the function MPI::Intracomm::Spawn 
>>>>>>>>>> which
>>>>>>>>> spawns
>>>>>>>>>> "slave" processes. It is not clear for me how to spawn the "slave"
>>>>>>>>> processes
>>>>>>>>>> over the network. Currently "master" creates "slaves" on the same
>>>>>>>>>> host.
>>>>>>>>>> 
>>>>>>>>>> If I use 'mpirun --hostfile openmpi.hosts' then processes are spawn
>>>>>>>>>> over
>>>>>>>>> the
>>>>>>>>>> network as expected. But now I need to spawn processes over the
>>>>>>>>>> network
>>>>>>>>> from
>>>>>>>>>> my own executable using MPI::Intracomm::Spawn, how can I achieve it?
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I'm not sure from your description exactly what you are trying to do,
>>>>>>>>> nor in what environment this is all operating within or what version
>>>>>>>>> of Open MPI you are using. Setting aside the environment and version
>>>>>>>>> issue, I'm guessing that you are running your executable over some
>>>>>>>>> specified set of hosts, but want to provide a different hostfile that
>>>>>>>>> specifies the hosts to be used for the "slave" processes. Correct?
>>>>>>>>> 
>>>>>>>>> If that is correct, then I'm afraid you can't do that in any version
>>>>>>>>> of Open MPI today. You have to specify all of the hosts that can be
>>>>>>>>> used by your job in the original hostfile. You can then specify a
>>>>>>>>> subset of those hosts to be used by your original "master" program,
>>>>>>>>> and then specify a different subset to be used by the "slaves" when
>>>>>>>>> calling Spawn.
>>>>>>>>> 
>>>>>>>>> But the system requires that you tell it -all- of the hosts that are
>>>>>>>>> going to be used at the beginning of the job.
>>>>>>>>> 
>>>>>>>>> At the moment, there is no plan to remove that requirement, though
>>>>>>>>> there has been occasional discussion about doing so at some point in
>>>>>>>>> the future. No promises that it will happen, though - managed
>>>>>>>>> environments, in particular, currently object to the idea of changing
>>>>>>>>> the allocation on-the-fly. We may, though, make a provision for purely
>>>>>>>>> hostfile-based environments (i.e., unmanaged) at some time in the
>>>>>>>>> future.
>>>>>>>>> 
>>>>>>>>> Ralph
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Thanks in advance for any help.
>>>>>>>>>> 
>>>>>>>>>> Elena
>>>>>>>>>> 
>>>>>>>>>> 

