Hmmm...what is in your "hostsfile"?

On Aug 28, 2012, at 2:33 PM, Brian Budge <brian.bu...@gmail.com> wrote:
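[For readers following along: the "hostsfile" in question is a plain-text Open MPI hostfile - one node per line, optionally with slot counts - which the singleton picks up through the OMPI_MCA_orte_default_hostfile environment variable used below. A minimal sketch with placeholder hostnames:

    node01 slots=4
    node02 slots=4
]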
> Hi Ralph -
>
> Thanks for confirming this is possible. I'm trying this and currently
> failing. Perhaps there's something I'm missing in the code to make
> this work. Here are the two instantiations and their outputs:
>
>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib
>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
> cannot start slaves... not enough nodes
>
>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib
>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile mpirun -n 1 ./master_exe
> master spawned 1 slaves...
> slave responding...
>
> The code:
>
> // master.cpp
> #include <mpi.h>
> #include <boost/filesystem.hpp>
> #include <iostream>
> #include <cstring>   // memcpy
> #include <alloca.h>  // alloca
>
> int main(int argc, char **args) {
>   int worldSize, universeSize, *puniverseSize, flag;
>
>   MPI_Comm everyone;  // intercomm
>   boost::filesystem::path curPath =
>     boost::filesystem::absolute(boost::filesystem::current_path());
>
>   std::string toRun = (curPath / "slave_exe").string();
>
>   int ret = MPI_Init(&argc, &args);
>   if(ret != MPI_SUCCESS) {
>     std::cerr << "failed init" << std::endl;
>     return -1;
>   }
>
>   MPI_Comm_size(MPI_COMM_WORLD, &worldSize);
>   if(worldSize != 1) {
>     std::cerr << "too many masters" << std::endl;
>   }
>
>   MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);
>   if(!flag) {
>     std::cerr << "no universe size" << std::endl;
>     return -1;
>   }
>   universeSize = *puniverseSize;
>   if(universeSize == 1) {
>     std::cerr << "cannot start slaves... not enough nodes" << std::endl;
>   }
>
>   char *buf = (char*)alloca(toRun.size() + 1);
>   memcpy(buf, toRun.c_str(), toRun.size());
>   buf[toRun.size()] = '\0';
>
>   MPI_Comm_spawn(buf, MPI_ARGV_NULL, universeSize-1, MPI_INFO_NULL,
>                  0, MPI_COMM_SELF, &everyone,
>                  MPI_ERRCODES_IGNORE);
>
>   std::cerr << "master spawned " << universeSize-1 << " slaves..."
>             << std::endl;
>
>   MPI_Finalize();
>   return 0;
> }
>
> // slave.cpp
> #include <mpi.h>
> #include <iostream>
>
> int main(int argc, char **args) {
>   int size;
>   MPI_Comm parent;
>   MPI_Init(&argc, &args);
>
>   MPI_Comm_get_parent(&parent);
>   if(parent == MPI_COMM_NULL) {
>     std::cerr << "slave has no parent" << std::endl;
>   }
>   MPI_Comm_remote_size(parent, &size);
>   if(size != 1) {
>     std::cerr << "parent size is " << size << std::endl;
>   }
>
>   std::cerr << "slave responding..." << std::endl;
>
>   MPI_Finalize();
>   return 0;
> }
>
> Any ideas? Thanks for any help.
>
>   Brian
>
> On Wed, Aug 22, 2012 at 9:03 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> It really is just that simple :-)
>>
>> On Aug 22, 2012, at 8:56 AM, Brian Budge <brian.bu...@gmail.com> wrote:
>>
>>> Okay. Is there a tutorial or FAQ for setting everything up? Or is it
>>> really just that simple? I don't need to run a copy of the orte
>>> server somewhere?
>>>
>>> If my current IP is 192.168.0.1:
>>>
>>> 0 > echo 192.168.0.11 > /tmp/hostfile
>>> 1 > echo 192.168.0.12 >> /tmp/hostfile
>>> 2 > export OMPI_MCA_orte_default_hostfile=/tmp/hostfile
>>> 3 > ./mySpawningExe
>>>
>>> At this point, mySpawningExe will be the master, running on
>>> 192.168.0.1, and I can have spawned, for example, childExe on
>>> 192.168.0.11 and 192.168.0.12? Or childExe1 on 192.168.0.11 and
>>> childExe2 on 192.168.0.12?
>>>
>>> Thanks for the help.
>>>
>>>   Brian
>>>
>>> On Wed, Aug 22, 2012 at 7:15 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> Sure, that's still true on all 1.3 or above releases.
>>>> All you need to do is set the hostfile envar so we pick it up:
>>>>
>>>> OMPI_MCA_orte_default_hostfile=<foo>
>>>>
>>>> On Aug 21, 2012, at 7:23 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>>>>
>>>>> Hi. I know this is an old thread, but I'm curious if there are any
>>>>> tutorials describing how to set this up? Is this still available on
>>>>> newer Open MPI versions?
>>>>>
>>>>> Thanks,
>>>>>   Brian
>>>>>
>>>>> On Fri, Jan 4, 2008 at 7:57 AM, Ralph Castain <r...@lanl.gov> wrote:
>>>>>> Hi Elena
>>>>>>
>>>>>> I'm copying this to the user list just to correct a mis-statement on my
>>>>>> part in an earlier message that went there. I had stated that a
>>>>>> singleton could comm_spawn onto other nodes listed in a hostfile by
>>>>>> setting an environmental variable that pointed us to the hostfile.
>>>>>>
>>>>>> This is incorrect in the 1.2 code series. That series does not allow
>>>>>> singletons to read a hostfile at all. Hence, any comm_spawn done by a
>>>>>> singleton can only launch child processes on the singleton's local host.
>>>>>>
>>>>>> This situation has been corrected for the upcoming 1.3 code series. For
>>>>>> the 1.2 series, though, you will have to do it via an mpirun command line.
>>>>>>
>>>>>> Sorry for the confusion - I sometimes have too many code families to
>>>>>> keep straight in this old mind!
>>>>>>
>>>>>> Ralph
>>>>>>
>>>>>> On 1/4/08 5:10 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> wrote:
>>>>>>
>>>>>>> Hello Ralph,
>>>>>>>
>>>>>>> Thank you very much for the explanations, but I still do not get it
>>>>>>> running...
>>>>>>>
>>>>>>> For the case
>>>>>>>   mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe
>>>>>>> everything works.
>>>>>>>
>>>>>>> For the case
>>>>>>>   ./my_master.exe
>>>>>>> it does not.
>>>>>>>
>>>>>>> I did:
>>>>>>> - create my_hostfile and put it in $HOME/.openmpi/components/
>>>>>>>   my_hostfile:
>>>>>>>     bollenstreek slots=2 max_slots=3
>>>>>>>     octocore01   slots=8 max_slots=8
>>>>>>>     octocore02   slots=8 max_slots=8
>>>>>>>     clstr000     slots=2 max_slots=3
>>>>>>>     clstr001     slots=2 max_slots=3
>>>>>>>     clstr002     slots=2 max_slots=3
>>>>>>>     clstr003     slots=2 max_slots=3
>>>>>>>     clstr004     slots=2 max_slots=3
>>>>>>>     clstr005     slots=2 max_slots=3
>>>>>>>     clstr006     slots=2 max_slots=3
>>>>>>>     clstr007     slots=2 max_slots=3
>>>>>>> - setenv OMPI_MCA_rds_hostfile_path my_hostfile (I put it in .tcshrc
>>>>>>>   and then source .tcshrc)
>>>>>>> - in my_master.cpp I did:
>>>>>>>     MPI_Info info1;
>>>>>>>     MPI_Info_create(&info1);
>>>>>>>     char* hostname =
>>>>>>>       "clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02";
>>>>>>>     MPI_Info_set(info1, "host", hostname);
>>>>>>>
>>>>>>>     _intercomm = intracomm.Spawn("./childexe", argv1, _nProc, info1, 0,
>>>>>>>                                  MPI_ERRCODES_IGNORE);
>>>>>>>
>>>>>>> - After I call the executable, I get this error message:
>>>>>>>
>>>>>>> bollenstreek: > ./my_master
>>>>>>> number of processes to run: 1
>>>>>>> --------------------------------------------------------------------------
>>>>>>> Some of the requested hosts are not included in the current allocation
>>>>>>> for the application:
>>>>>>>   ./childexe
>>>>>>> The requested hosts were:
>>>>>>>   clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02
>>>>>>>
>>>>>>> Verify that you have mapped the allocated resources properly using the
>>>>>>> --host specification.
>>>>>>> --------------------------------------------------------------------------
>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
>>>>>>> base/rmaps_base_support_fns.c at line 225
>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
>>>>>>> rmaps_rr.c at line 478
>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
>>>>>>> base/rmaps_base_map_job.c at line 210
>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
>>>>>>> rmgr_urm.c at line 372
>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
>>>>>>> communicator/comm_dyn.c at line 608
>>>>>>>
>>>>>>> Did I miss something?
>>>>>>> Thanks for the help!
>>>>>>>
>>>>>>> Elena
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Ralph H Castain [mailto:r...@lanl.gov]
>>>>>>> Sent: Tuesday, December 18, 2007 3:50 PM
>>>>>>> To: Elena Zhebel; Open MPI Users <us...@open-mpi.org>
>>>>>>> Cc: Ralph H Castain
>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration
>>>>>>>
>>>>>>> On 12/18/07 7:35 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> wrote:
>>>>>>>
>>>>>>>> Thanks a lot! Now it works!
>>>>>>>> The solution is to use mpirun -n 1 -hostfile my.hosts *.exe and pass
>>>>>>>> an MPI_Info key to the Spawn function!
>>>>>>>>
>>>>>>>> One more question: is it necessary to start my "master" program with
>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe ?
>>>>>>>
>>>>>>> No, it isn't necessary - assuming that my_master_host is the first host
>>>>>>> listed in your hostfile! If you are only executing one my_master.exe
>>>>>>> (i.e., you gave -n 1 to mpirun), then we will automatically map that
>>>>>>> process onto the first host in your hostfile.
>>>>>>>
>>>>>>> If you want my_master.exe to go on someone other than the first host in
>>>>>>> the file, then you have to give us the -host option.
>>>>>>>
>>>>>>>> Are there other possibilities for an easy start?
>>>>>>>> I would say just run ./my_master.exe, but then the master process
>>>>>>>> doesn't know about the hosts available in the network.
>>>>>>>
>>>>>>> You can set the hostfile parameter in your environment instead of on
>>>>>>> the command line. Just set OMPI_MCA_rds_hostfile_path = my.hosts.
>>>>>>>
>>>>>>> You can then just run ./my_master.exe on the host where you want the
>>>>>>> master to reside - everything should work the same.
>>>>>>>
>>>>>>> Just as an FYI: the name of that environmental variable is going to
>>>>>>> change in the 1.3 release, but everything will still work the same.
>>>>>>>
>>>>>>> Hope that helps
>>>>>>> Ralph
>>>>>>>
>>>>>>>> Thanks and regards,
>>>>>>>> Elena
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Ralph H Castain [mailto:r...@lanl.gov]
>>>>>>>> Sent: Monday, December 17, 2007 5:49 PM
>>>>>>>> To: Open MPI Users <us...@open-mpi.org>; Elena Zhebel
>>>>>>>> Cc: Ralph H Castain
>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration
>>>>>>>>
>>>>>>>> On 12/17/07 8:19 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> wrote:
>>>>>>>>
>>>>>>>>> Hello Ralph,
>>>>>>>>>
>>>>>>>>> Thank you for your answer.
>>>>>>>>>
>>>>>>>>> I'm using OpenMPI 1.2.3, compiler glibc232, Linux SuSE 10.0.
>>>>>>>>> My "master" executable runs only on the one local host; it then spawns
>>>>>>>>> "slaves" (with MPI::Intracomm::Spawn).
>>>>>>>>> My question was: how do I determine the hosts where these "slaves"
>>>>>>>>> will be spawned?
>>>>>>>>> You said: "You have to specify all of the hosts that can be used by
>>>>>>>>> your job in the original hostfile." How can I specify the host file?
>>>>>>>>> I cannot find it in the documentation.
>>>>>>>>
>>>>>>>> Hmmm...sorry about the lack of documentation. I always assumed that
>>>>>>>> the MPI folks in the project would document such things since it has
>>>>>>>> little to do with the underlying run-time, but I guess that fell
>>>>>>>> through the cracks.
>>>>>>>>
>>>>>>>> There are two parts to your question:
>>>>>>>>
>>>>>>>> 1. How to specify the hosts to be used for the entire job. I believe
>>>>>>>> that is somewhat covered here:
>>>>>>>> http://www.open-mpi.org/faq/?category=running#simple-spmd-run
>>>>>>>>
>>>>>>>> That FAQ tells you what a hostfile should look like, though you may
>>>>>>>> already know that. Basically, we require that you list -all- of the
>>>>>>>> nodes that both your master and slave programs will use.
>>>>>>>>
>>>>>>>> 2. How to specify which nodes are available for the master, and which
>>>>>>>> for the slave.
>>>>>>>>
>>>>>>>> You would specify the host for your master on the mpirun command line
>>>>>>>> with something like:
>>>>>>>>
>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe
>>>>>>>>
>>>>>>>> This directs Open MPI to map that specified executable onto the
>>>>>>>> specified host - note that my_master_host must have been in my_hostfile.
>>>>>>>>
>>>>>>>> Inside your master, you would create an MPI_Info key "host" that has a
>>>>>>>> value consisting of a string "host1,host2,host3" identifying the hosts
>>>>>>>> you want your slave to execute upon. Those hosts must have been
>>>>>>>> included in my_hostfile. Include that key in the MPI_Info array passed
>>>>>>>> to your Spawn.
>>>>>>>>
>>>>>>>> We don't currently support providing a hostfile for the slaves (as
>>>>>>>> opposed to the host-at-a-time string above). This may become available
>>>>>>>> in a future release - TBD.
>>>>>>>>
>>>>>>>> Hope that helps
>>>>>>>> Ralph
>>>>>>>>
>>>>>>>>> Thanks and regards,
>>>>>>>>> Elena
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
>>>>>>>>> On Behalf Of Ralph H Castain
>>>>>>>>> Sent: Monday, December 17, 2007 3:31 PM
>>>>>>>>> To: Open MPI Users <us...@open-mpi.org>
>>>>>>>>> Cc: Ralph H Castain
>>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration
>>>>>>>>>
>>>>>>>>> On 12/12/07 5:46 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I'm working on an MPI application where I'm using OpenMPI instead of
>>>>>>>>>> MPICH.
>>>>>>>>>>
>>>>>>>>>> In my "master" program I call the function MPI::Intracomm::Spawn,
>>>>>>>>>> which spawns "slave" processes. It is not clear to me how to spawn
>>>>>>>>>> the "slave" processes over the network. Currently the "master"
>>>>>>>>>> creates "slaves" on the same host.
>>>>>>>>>> If I use 'mpirun --hostfile openmpi.hosts' then processes are spawned
>>>>>>>>>> over the network as expected. But now I need to spawn processes over
>>>>>>>>>> the network from my own executable using MPI::Intracomm::Spawn - how
>>>>>>>>>> can I achieve that?
>>>>>>>>>
>>>>>>>>> I'm not sure from your description exactly what you are trying to do,
>>>>>>>>> nor in what environment this is all operating within or what version
>>>>>>>>> of Open MPI you are using. Setting aside the environment and version
>>>>>>>>> issue, I'm guessing that you are running your executable over some
>>>>>>>>> specified set of hosts, but want to provide a different hostfile that
>>>>>>>>> specifies the hosts to be used for the "slave" processes. Correct?
>>>>>>>>>
>>>>>>>>> If that is correct, then I'm afraid you can't do that in any version
>>>>>>>>> of Open MPI today. You have to specify all of the hosts that can be
>>>>>>>>> used by your job in the original hostfile. You can then specify a
>>>>>>>>> subset of those hosts to be used by your original "master" program,
>>>>>>>>> and then specify a different subset to be used by the "slaves" when
>>>>>>>>> calling Spawn.
>>>>>>>>>
>>>>>>>>> But the system requires that you tell it -all- of the hosts that are
>>>>>>>>> going to be used at the beginning of the job.
>>>>>>>>>
>>>>>>>>> At the moment, there is no plan to remove that requirement, though
>>>>>>>>> there has been occasional discussion about doing so at some point in
>>>>>>>>> the future. No promises that it will happen, though - managed
>>>>>>>>> environments, in particular, currently object to the idea of changing
>>>>>>>>> the allocation on-the-fly. We may, though, make a provision for
>>>>>>>>> purely hostfile-based environments (i.e., unmanaged) at some time in
>>>>>>>>> the future.
>>>>>>>>>
>>>>>>>>> Ralph
>>>>>>>>>
>>>>>>>>>> Thanks in advance for any help.
>>>>>>>>>>
>>>>>>>>>> Elena
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
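[For quick reference, below is a minimal, self-contained sketch of the approach described in the replies above: list every node in the hostfile you give to mpirun (or to the hostfile environment variable), then pass an MPI_Info "host" key to MPI_Comm_spawn naming the subset of those hosts the slaves should run on. The hostnames "node01,node02", the slave count of 2, and the "./slave_exe" path are placeholders, not values from the thread's actual setups.]

  // spawn_with_hosts.cpp - illustrative sketch only; error handling trimmed
  #include <mpi.h>
  #include <iostream>

  int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    // Slaves may only be placed on hosts that already appear in the job's
    // hostfile; the "host" info key selects a subset of those hosts.
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "host", "node01,node02");   // placeholder hostnames

    MPI_Comm intercomm;
    int errcodes[2];
    MPI_Comm_spawn("./slave_exe",      // placeholder slave executable
                   MPI_ARGV_NULL,
                   2,                  // number of slaves to spawn
                   info,
                   0,                  // root rank providing the arguments
                   MPI_COMM_SELF,
                   &intercomm,
                   errcodes);

    std::cerr << "spawned 2 slaves" << std::endl;

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
  }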