Hi Elena

I'm copying this to the user list just to correct a misstatement on my part
in an earlier message that went there. I had stated that a singleton could
comm_spawn onto other nodes listed in a hostfile by setting an environment
variable that pointed to the hostfile.

This is incorrect in the 1.2 code series. That series does not allow
singletons to read a hostfile at all. Hence, any comm_spawn done by a
singleton can only launch child processes on the singleton's local host.

This situation has been corrected for the upcoming 1.3 code series. For the
1.2 series, though, you will have to start your master via an mpirun command
line so that the hostfile actually gets read.
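
For example, something along these lines (using the file and executable names
from your setup) will let comm_spawn reach the other hosts under 1.2:

  mpirun -n 1 -hostfile my_hostfile ./my_master.exe

The master then passes the "host" info key to Spawn just as before.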

Sorry for the confusion - I sometimes have too many code families to keep
straight in this old mind!

Ralph


On 1/4/08 5:10 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> wrote:

> Hello Ralph,
> 
> Thank you very much for the explanations.
> But I still do not get it running...
> 
> For the case 
> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe
> everything works.
> 
> For the case 
> ./my_master.exe
> it does not.
> 
> I did:
> - created my_hostfile and put it in $HOME/.openmpi/components/ ;
>   my_hostfile contains:
> bollenstreek slots=2 max_slots=3
> octocore01 slots=8  max_slots=8
> octocore02 slots=8  max_slots=8
> clstr000 slots=2 max_slots=3
> clstr001 slots=2 max_slots=3
> clstr002 slots=2 max_slots=3
> clstr003 slots=2 max_slots=3
> clstr004 slots=2 max_slots=3
> clstr005 slots=2 max_slots=3
> clstr006 slots=2 max_slots=3
> clstr007 slots=2 max_slots=3
> - setenv OMPI_MCA_rds_hostfile_path my_hostfile (I put it in .tcshrc and
>   then sourced .tcshrc)
> - in my_master.cpp I did:
>   MPI_Info info1;
>   MPI_Info_create(&info1);
>   char* hostname =
>     "clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02";
>   MPI_Info_set(info1, "host", hostname);
> 
>   _intercomm = intracomm.Spawn("./childexe", argv1, _nProc, info1, 0,
>                                MPI_ERRCODES_IGNORE);
> 
> - After I call the executable, I get this error message:
> 
> bollenstreek: > ./my_master
> number of processes to run: 1
> --------------------------------------------------------------------------
> Some of the requested hosts are not included in the current allocation for
> the application:
>   ./childexe
> The requested hosts were:
>   clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02
> 
> Verify that you have mapped the allocated resources properly using the
> --host specification.
> --------------------------------------------------------------------------
> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
> base/rmaps_base_support_fns.c at line 225
> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
> rmaps_rr.c at line 478
> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
> base/rmaps_base_map_job.c at line 210
> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
> rmgr_urm.c at line 372
> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
> communicator/comm_dyn.c at line 608
> 
> Did I miss something?
> Thanks for help!
> 
> Elena
> 
> 
> -----Original Message-----
> From: Ralph H Castain [mailto:r...@lanl.gov]
> Sent: Tuesday, December 18, 2007 3:50 PM
> To: Elena Zhebel; Open MPI Users <us...@open-mpi.org>
> Cc: Ralph H Castain
> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration
> 
> 
> 
> 
> On 12/18/07 7:35 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> wrote:
> 
>> Thanks a lot! Now it works!
>> The solution is to use mpirun -n 1 -hostfile my.hosts *.exe and pass an
>> MPI_Info key to the Spawn function!
>> 
>> One more question: is it necessary to start my "master" program with
>> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe ?
> 
> No, it isn't necessary - assuming that my_master_host is the first host
> listed in your hostfile! If you are only executing one my_master.exe (i.e.,
> you gave -n 1 to mpirun), then we will automatically map that process onto
> the first host in your hostfile.
> 
> If you want my_master.exe to go on someone other than the first host in the
> file, then you have to give us the -host option.
> 
>> 
>> Are there other possibilities for easy start?
>> I would say just to run ./my_master.exe, but then the master process
>> doesn't know about the hosts available in the network.
> 
> You can set the hostfile parameter in your environment instead of on the
> command line. Just set OMPI_MCA_rds_hostfile_path = my.hosts.
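> For example, under tcsh that would be something like
> 
>   setenv OMPI_MCA_rds_hostfile_path my.hosts
> 
> (or "export OMPI_MCA_rds_hostfile_path=my.hosts" under bash).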
> 
> You can then just run ./my_master.exe on the host where you want the master
> to reside - everything should work the same.
> 
> Just as an FYI: the name of that environmental variable is going to change
> in the 1.3 release, but everything will still work the same.
> 
> Hope that helps
> Ralph
> 
> 
>>  
>> Thanks and regards,
>> Elena
>> 
>> 
>> -----Original Message-----
>> From: Ralph H Castain [mailto:r...@lanl.gov]
>> Sent: Monday, December 17, 2007 5:49 PM
>> To: Open MPI Users <us...@open-mpi.org>; Elena Zhebel
>> Cc: Ralph H Castain
>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration
>> 
>> 
>> 
>> 
>> On 12/17/07 8:19 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> wrote:
>> 
>>> Hello Ralph,
>>> 
>>> Thank you for your answer.
>>> 
>>> I'm using OpenMPI 1.2.3, compiler glibc232, Linux Suse 10.0.
>>> My "master" executable runs only on the one local host, then it spawns
>>> "slaves" (with MPI::Intracomm::Spawn).
>>> My question was: how to determine the hosts where these "slaves" will be
>>> spawned?
>>> You said: "You have to specify all of the hosts that can be used by
>>> your job
>>> in the original hostfile". How can I specify the host file? I can not
>>> find it
>>> in the documentation.
>> 
>> Hmmm...sorry about the lack of documentation. I always assumed that the MPI
>> folks in the project would document such things since it has little to do
>> with the underlying run-time, but I guess that fell through the cracks.
>> 
>> There are two parts to your question:
>> 
>> 1. how to specify the hosts to be used for the entire job. I believe that is
>> somewhat covered here:
>> http://www.open-mpi.org/faq/?category=running#simple-spmd-run
>> 
>> That FAQ tells you what a hostfile should look like, though you may already
>> know that. Basically, we require that you list -all- of the nodes that both
>> your master and slave programs will use.
>> 
>> 2. how to specify which nodes are available for the master, and which for
>> the slave.
>> 
>> You would specify the host for your master on the mpirun command line with
>> something like:
>> 
>> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe
>> 
>> This directs Open MPI to map that specified executable onto the specified
>> host - note that my_master_host must have been in my_hostfile.
>> 
>> Inside your master, you would create an MPI_Info key "host" that has a value
>> consisting of a string "host1,host2,host3" identifying the hosts you want
>> your slave to execute upon. Those hosts must have been included in
>> my_hostfile. Include that key in the MPI_Info array passed to your Spawn.
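>> 
>> As a rough sketch (the slave executable name and process count below are
>> just placeholders), the master side could look something like:
>> 
>>   MPI::Info info = MPI::Info::Create();
>>   info.Set("host", "host1,host2,host3");   // all must appear in my_hostfile
>>   MPI::Intercomm children =
>>       MPI::COMM_WORLD.Spawn("./my_slave.exe", MPI::ARGV_NULL, 3, info, 0);
>>   info.Free();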
>> 
>> We don't currently support providing a hostfile for the slaves (as opposed
>> to the host-at-a-time string above). This may become available in a future
>> release - TBD.
>> 
>> Hope that helps
>> Ralph
>> 
>>> 
>>> Thanks and regards,
>>> Elena
>>> 
>>> -----Original Message-----
>>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
>>> Behalf Of Ralph H Castain
>>> Sent: Monday, December 17, 2007 3:31 PM
>>> To: Open MPI Users <us...@open-mpi.org>
>>> Cc: Ralph H Castain
>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster
>>> configuration
>>> 
>>> On 12/12/07 5:46 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> wrote:
>>>> 
>>>> 
>>>> Hello,
>>>> 
>>>> I'm working on a MPI application where I'm using OpenMPI instead of
>>>> MPICH.
>>>> 
>>>> In my "master" program I call the function MPI::Intracomm::Spawn which
>>> spawns
>>>> "slave" processes. It is not clear for me how to spawn the "slave"
>>> processes
>>>> over the network. Currently "master" creates "slaves" on the same
>>>> host.
>>>> 
>>>> If I use 'mpirun --hostfile openmpi.hosts' then processes are spawned over
>>>> the network as expected. But now I need to spawn processes over the network
>>>> from my own executable using MPI::Intracomm::Spawn; how can I achieve it?
>>>> 
>>> 
>>> I'm not sure from your description exactly what you are trying to do,
>>> nor in
>>> what environment this is all operating within or what version of Open
>>> MPI
>>> you are using. Setting aside the environment and version issue, I'm
>>> guessing
>>> that you are running your executable over some specified set of hosts,
>>> but
>>> want to provide a different hostfile that specifies the hosts to be
>>> used for
>>> the "slave" processes. Correct?
>>> 
>>> If that is correct, then I'm afraid you can't do that in any version
>>> of Open
>>> MPI today. You have to specify all of the hosts that can be used by
>>> your job
>>> in the original hostfile. You can then specify a subset of those hosts
>>> to be
>>> used by your original "master" program, and then specify a different
>>> subset
>>> to be used by the "slaves" when calling Spawn.
>>> 
>>> But the system requires that you tell it -all- of the hosts that are
>>> going
>>> to be used at the beginning of the job.
>>> 
>>> At the moment, there is no plan to remove that requirement, though
>>> there has
>>> been occasional discussion about doing so at some point in the future.
>>> No
>>> promises that it will happen, though - managed environments, in
>>> particular,
>>> currently object to the idea of changing the allocation on-the-fly. We
>>> may,
>>> though, make a provision for purely hostfile-based environments (i.e.,
>>> unmanaged) at some time in the future.
>>> 
>>> Ralph
>>> 
>>>> 
>>>> 
>>>> Thanks in advance for any help.
>>>> 
>>>> Elena
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 
> 

