Great.  I'll try applying this tomorrow and I'll let you know if it
works for me.

  Brian

On Mon, Sep 3, 2012 at 2:36 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Give the attached patch a try - this works for me, but I'd like it verified 
> before it goes into the next 1.6 release (singleton comm_spawn is so rarely 
> used that it can easily be overlooked for some time).
>
> Thx
> Ralph
>
>
>
>
> On Aug 31, 2012, at 3:32 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>
>> Thanks, much appreciated.
>>
>> On Fri, Aug 31, 2012 at 2:37 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> I see - well, I hope to work on it this weekend and may get it fixed. If I 
>>> do, I can provide you with a patch for the 1.6 series that you can use 
>>> until the actual release is issued, if that helps.
>>>
>>>
>>> On Aug 31, 2012, at 2:33 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>>>
>>>> Hi Ralph -
>>>>
>>>> This is true, but we may not know until well into the process whether
>>>> we need MPI at all.  We have an SMP/NUMA mode that is designed to run
>>>> faster on a single machine.  We may also build our application on
>>>> machines where there is no MPI, in which case we simply don't build
>>>> the code that provides the MPI functionality.  We have scripts all
>>>> over the place that need to start this application, and it would be
>>>> much easier to simply run the program than to figure out when or if
>>>> mpirun needs to start it.
>>>>
>>>> Before, we went so far as to fork and exec a full mpirun when we ran
>>>> in clustered mode.  This resulted in an additional process running,
>>>> and we had to use sockets to get the data to the new master process.
>>>> I very much like the idea of having our process become the MPI master
>>>> instead, so I have been very excited about your work on this singleton
>>>> fork/exec under the hood.
>>>>
>>>> Once I get my new infrastructure designed to work with mpirun -n 1 +
>>>> spawn, I will try some previous Open MPI versions to see if I can find
>>>> a version with this singleton functionality intact.
>>>>
>>>> Thanks again,
>>>> Brian
>>>>
>>>> On Thu, Aug 30, 2012 at 4:51 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> Not off the top of my head. However, as noted earlier, there is
>>>>> absolutely no advantage to a singleton vs mpirun start - all the
>>>>> singleton does is immediately fork/exec "mpirun" to support the rest of
>>>>> the job. In both cases, you have a daemon running the job - the only
>>>>> difference is in the number of characters the user types to start it.
>>>>>
>>>>>
>>>>> On Aug 30, 2012, at 8:44 AM, Brian Budge <brian.bu...@gmail.com> wrote:
>>>>>
>>>>>> In the event that I need to get this up-and-running soon (I do need
>>>>>> something working within 2 weeks), can you recommend an older version
>>>>>> where this is expected to work?
>>>>>>
>>>>>> Thanks,
>>>>>> Brian
>>>>>>
>>>>>> On Tue, Aug 28, 2012 at 4:58 PM, Brian Budge <brian.bu...@gmail.com> 
>>>>>> wrote:
>>>>>>> Thanks!
>>>>>>>
>>>>>>> On Tue, Aug 28, 2012 at 4:57 PM, Ralph Castain <r...@open-mpi.org> 
>>>>>>> wrote:
>>>>>>>> Yeah, I'm seeing the hang as well when running across multiple 
>>>>>>>> machines. Let me dig a little and get this fixed.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Ralph
>>>>>>>>
>>>>>>>> On Aug 28, 2012, at 4:51 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hmmm, I went to the Open MPI build directories on my two machines,
>>>>>>>>> went into the orte/test/mpi directory, and made the executables on both
>>>>>>>>> machines.  I set the env variable to point at the hostsfile on the
>>>>>>>>> "master" machine.
>>>>>>>>>
>>>>>>>>> Here's the output:
>>>>>>>>>
>>>>>>>>> OMPI_MCA_orte_default_hostfile=/home/budgeb/p4/pseb/external/install/openmpi-1.6.1/orte/test/mpi/hostsfile
>>>>>>>>> ./simple_spawn
>>>>>>>>> Parent [pid 97504] starting up!
>>>>>>>>> 0 completed MPI_Init
>>>>>>>>> Parent [pid 97504] about to spawn!
>>>>>>>>> Parent [pid 97507] starting up!
>>>>>>>>> Parent [pid 97508] starting up!
>>>>>>>>> Parent [pid 30626] starting up!
>>>>>>>>> ^C
>>>>>>>>> zsh: interrupt  OMPI_MCA_orte_default_hostfile= ./simple_spawn
>>>>>>>>>
>>>>>>>>> I had to ^C to kill the hung process.
>>>>>>>>>
>>>>>>>>> When I run using mpirun:
>>>>>>>>>
>>>>>>>>> OMPI_MCA_orte_default_hostfile=/home/budgeb/p4/pseb/external/install/openmpi-1.6.1/orte/test/mpi/hostsfile
>>>>>>>>> mpirun -np 1 ./simple_spawn
>>>>>>>>> Parent [pid 97511] starting up!
>>>>>>>>> 0 completed MPI_Init
>>>>>>>>> Parent [pid 97511] about to spawn!
>>>>>>>>> Parent [pid 97513] starting up!
>>>>>>>>> Parent [pid 30762] starting up!
>>>>>>>>> Parent [pid 30764] starting up!
>>>>>>>>> Parent done with spawn
>>>>>>>>> Parent sending message to child
>>>>>>>>> 1 completed MPI_Init
>>>>>>>>> Hello from the child 1 of 3 on host budgeb-sandybridge pid 97513
>>>>>>>>> 0 completed MPI_Init
>>>>>>>>> Hello from the child 0 of 3 on host budgeb-interlagos pid 30762
>>>>>>>>> 2 completed MPI_Init
>>>>>>>>> Hello from the child 2 of 3 on host budgeb-interlagos pid 30764
>>>>>>>>> Child 1 disconnected
>>>>>>>>> Child 0 received msg: 38
>>>>>>>>> Child 0 disconnected
>>>>>>>>> Parent disconnected
>>>>>>>>> Child 2 disconnected
>>>>>>>>> 97511: exiting
>>>>>>>>> 97513: exiting
>>>>>>>>> 30762: exiting
>>>>>>>>> 30764: exiting
>>>>>>>>>
>>>>>>>>> As you can see, I'm using Open MPI v1.6.1, freshly installed on both
>>>>>>>>> machines using the default configure options.
>>>>>>>>>
>>>>>>>>> Thanks for all your help.
>>>>>>>>>
>>>>>>>>> Brian
>>>>>>>>>
>>>>>>>>> On Tue, Aug 28, 2012 at 4:39 PM, Ralph Castain <r...@open-mpi.org> 
>>>>>>>>> wrote:
>>>>>>>>>> Looks to me like it didn't find your executable - could be a 
>>>>>>>>>> question of where it exists relative to where you are running. If 
>>>>>>>>>> you look in your OMPI source tree at the orte/test/mpi directory, 
>>>>>>>>>> you'll see an example program "simple_spawn.c" there. Just "make 
>>>>>>>>>> simple_spawn" and execute that with your default hostfile set - does 
>>>>>>>>>> it work okay?
>>>>>>>>>>
>>>>>>>>>> It works fine for me, hence the question.
>>>>>>>>>>
>>>>>>>>>> Also, what OMPI version are you using?
>>>>>>>>>>
>>>>>>>>>> On Aug 28, 2012, at 4:25 PM, Brian Budge <brian.bu...@gmail.com> 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I see.  Okay.  So, I just tried removing the check for universe 
>>>>>>>>>>> size,
>>>>>>>>>>> and set the universe size to 2.  Here's my output:
>>>>>>>>>>>
>>>>>>>>>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib
>>>>>>>>>>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
>>>>>>>>>>> [budgeb-interlagos:29965] [[4156,0],0] ORTE_ERROR_LOG: Fatal in file
>>>>>>>>>>> base/plm_base_receive.c at line 253
>>>>>>>>>>> [budgeb-interlagos:29963] [[4156,1],0] ORTE_ERROR_LOG: The specified
>>>>>>>>>>> application failed to start in file dpm_orte.c at line 785
>>>>>>>>>>>
>>>>>>>>>>> The corresponding run with mpirun still works.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Brian
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 28, 2012 at 2:46 PM, Ralph Castain <r...@open-mpi.org> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>> I see the issue - it's here:
>>>>>>>>>>>>
>>>>>>>>>>>>> MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, 
>>>>>>>>>>>>> &flag);
>>>>>>>>>>>>>
>>>>>>>>>>>>> if(!flag) {
>>>>>>>>>>>>>  std::cerr << "no universe size" << std::endl;
>>>>>>>>>>>>>  return -1;
>>>>>>>>>>>>> }
>>>>>>>>>>>>> universeSize = *puniverseSize;
>>>>>>>>>>>>> if(universeSize == 1) {
>>>>>>>>>>>>>  std::cerr << "cannot start slaves... not enough nodes" << 
>>>>>>>>>>>>> std::endl;
>>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> The universe size is set to 1 on a singleton because the attribute 
>>>>>>>>>>>> gets set at the beginning of time - we haven't any way to go back 
>>>>>>>>>>>> and change it. The sequence of events explains why. The singleton 
>>>>>>>>>>>> starts up and sets its attributes, including universe_size. It 
>>>>>>>>>>>> also spins off an orte daemon to act as its own private "mpirun" 
>>>>>>>>>>>> in case you call comm_spawn. At this point, however, no hostfile 
>>>>>>>>>>>> has been read - the singleton is just an MPI proc doing its own 
>>>>>>>>>>>> thing, and the orte daemon is just sitting there on "stand-by".
>>>>>>>>>>>>
>>>>>>>>>>>> When your app calls comm_spawn, then the orte daemon gets called 
>>>>>>>>>>>> to launch the new procs. At that time, it (not the original 
>>>>>>>>>>>> singleton!) reads the hostfile to find out how many nodes are 
>>>>>>>>>>>> around, and then does the launch.
>>>>>>>>>>>>
>>>>>>>>>>>> You are trying to check the number of nodes from within the 
>>>>>>>>>>>> singleton, which won't work - it has no way of discovering that 
>>>>>>>>>>>> info.
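>>>>>>>>>>>>
>>>>>>>>>>>> If you just want the spawn to proceed in the singleton case, you could drop
>>>>>>>>>>>> the universe-size check and decide the slave count yourself. A rough,
>>>>>>>>>>>> untested sketch (NUM_SLAVES is just a made-up env var for illustration; it
>>>>>>>>>>>> reuses the buf from your master.cpp and needs <cstdlib>):
>>>>>>>>>>>>
>>>>>>>>>>>> // Choose the slave count without consulting MPI_UNIVERSE_SIZE,
>>>>>>>>>>>> // since a singleton always reports 1 there.
>>>>>>>>>>>> int nSlaves = 1;
>>>>>>>>>>>> if (const char *env = std::getenv("NUM_SLAVES")) {
>>>>>>>>>>>>   int n = std::atoi(env);
>>>>>>>>>>>>   if (n > 0) nSlaves = n;
>>>>>>>>>>>> }
>>>>>>>>>>>> MPI_Comm everyone;
>>>>>>>>>>>> MPI_Comm_spawn(buf, MPI_ARGV_NULL, nSlaves, MPI_INFO_NULL,
>>>>>>>>>>>>                0, MPI_COMM_SELF, &everyone, MPI_ERRCODES_IGNORE);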
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Aug 28, 2012, at 2:38 PM, Brian Budge <brian.bu...@gmail.com> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>> echo hostsfile
>>>>>>>>>>>>> localhost
>>>>>>>>>>>>> budgeb-sandybridge
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Brian
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Aug 28, 2012 at 2:36 PM, Ralph Castain 
>>>>>>>>>>>>> <r...@open-mpi.org> wrote:
>>>>>>>>>>>>>> Hmmm...what is in your "hostsfile"?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Aug 28, 2012, at 2:33 PM, Brian Budge <brian.bu...@gmail.com> 
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Ralph -
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for confirming this is possible.  I'm trying this and 
>>>>>>>>>>>>>>> currently
>>>>>>>>>>>>>>> failing.  Perhaps there's something I'm missing in the code to 
>>>>>>>>>>>>>>> make
>>>>>>>>>>>>>>> this work.  Here are the two instantiations and their outputs:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib
>>>>>>>>>>>>>>>>  OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
>>>>>>>>>>>>>>> cannot start slaves... not enough nodes
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib
>>>>>>>>>>>>>>>>  OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile mpirun -n 1 
>>>>>>>>>>>>>>>> ./master_exe
>>>>>>>>>>>>>>> master spawned 1 slaves...
>>>>>>>>>>>>>>> slave responding...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The code:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> //master.cpp
>>>>>>>>>>>>>>> #include <mpi.h>
>>>>>>>>>>>>>>> #include <boost/filesystem.hpp>
>>>>>>>>>>>>>>> #include <iostream>
>>>>>>>>>>>>>>> #include <cstring>   // memcpy
>>>>>>>>>>>>>>> #include <alloca.h>  // alloca
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> int main(int argc, char **args) {
>>>>>>>>>>>>>>> int worldSize, universeSize, *puniverseSize, flag;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> MPI_Comm everyone; //intercomm
>>>>>>>>>>>>>>> boost::filesystem::path curPath =
>>>>>>>>>>>>>>> boost::filesystem::absolute(boost::filesystem::current_path());
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> std::string toRun = (curPath / "slave_exe").string();
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> int ret = MPI_Init(&argc, &args);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> if(ret != MPI_SUCCESS) {
>>>>>>>>>>>>>>>  std::cerr << "failed init" << std::endl;
>>>>>>>>>>>>>>>  return -1;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> MPI_Comm_size(MPI_COMM_WORLD, &worldSize);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> if(worldSize != 1) {
>>>>>>>>>>>>>>>  std::cerr << "too many masters" << std::endl;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, 
>>>>>>>>>>>>>>> &flag);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> if(!flag) {
>>>>>>>>>>>>>>>  std::cerr << "no universe size" << std::endl;
>>>>>>>>>>>>>>>  return -1;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> universeSize = *puniverseSize;
>>>>>>>>>>>>>>> if(universeSize == 1) {
>>>>>>>>>>>>>>>  std::cerr << "cannot start slaves... not enough nodes" << 
>>>>>>>>>>>>>>> std::endl;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
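>>>>>>>>>>>>>>> // Copy the path into a writable buffer; MPI_Comm_spawn's command
>>>>>>>>>>>>>>> // argument is a non-const char* in the MPI-2 C bindings.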
>>>>>>>>>>>>>>> char *buf = (char*)alloca(toRun.size() + 1);
>>>>>>>>>>>>>>> memcpy(buf, toRun.c_str(), toRun.size());
>>>>>>>>>>>>>>> buf[toRun.size()] = '\0';
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> MPI_Comm_spawn(buf, MPI_ARGV_NULL, universeSize-1, MPI_INFO_NULL,
>>>>>>>>>>>>>>>                0, MPI_COMM_SELF, &everyone, MPI_ERRCODES_IGNORE);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> std::cerr << "master spawned " << universeSize-1 << " slaves..."
>>>>>>>>>>>>>>> << std::endl;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> MPI_Finalize();
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> return 0;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> //slave.cpp
>>>>>>>>>>>>>>> #include <mpi.h>
>>>>>>>>>>>>>>> #include <iostream>  // std::cerr
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> int main(int argc, char **args) {
>>>>>>>>>>>>>>> int size;
>>>>>>>>>>>>>>> MPI_Comm parent;
>>>>>>>>>>>>>>> MPI_Init(&argc, &args);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> MPI_Comm_get_parent(&parent);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> if(parent == MPI_COMM_NULL) {
>>>>>>>>>>>>>>>  std::cerr << "slave has no parent" << std::endl;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> MPI_Comm_remote_size(parent, &size);
>>>>>>>>>>>>>>> if(size != 1) {
>>>>>>>>>>>>>>>  std::cerr << "parent size is " << size << std::endl;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> std::cerr << "slave responding..." << std::endl;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> MPI_Finalize();
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> return 0;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any ideas?  Thanks for any help.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Brian
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Aug 22, 2012 at 9:03 AM, Ralph Castain 
>>>>>>>>>>>>>>> <r...@open-mpi.org> wrote:
>>>>>>>>>>>>>>>> It really is just that simple :-)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Aug 22, 2012, at 8:56 AM, Brian Budge 
>>>>>>>>>>>>>>>> <brian.bu...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Okay.  Is there a tutorial or FAQ for setting everything up?  
>>>>>>>>>>>>>>>>> Or is it
>>>>>>>>>>>>>>>>> really just that simple?  I don't need to run a copy of the 
>>>>>>>>>>>>>>>>> orte
>>>>>>>>>>>>>>>>> server somewhere?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> if my current ip is 192.168.0.1,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 0 > echo 192.168.0.11 > /tmp/hostfile
>>>>>>>>>>>>>>>>> 1 > echo 192.168.0.12 >> /tmp/hostfile
>>>>>>>>>>>>>>>>> 2 > export OMPI_MCA_orte_default_hostfile=/tmp/hostfile
>>>>>>>>>>>>>>>>> 3 > ./mySpawningExe
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> At this point, mySpawningExe will be the master, running on
>>>>>>>>>>>>>>>>> 192.168.0.1, and I can have spawned, for example, childExe on
>>>>>>>>>>>>>>>>> 192.168.0.11 and 192.168.0.12?  Or childExe1 on 192.168.0.11 
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> childExe2 on 192.168.0.12?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for the help.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Brian
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Aug 22, 2012 at 7:15 AM, Ralph Castain 
>>>>>>>>>>>>>>>>> <r...@open-mpi.org> wrote:
>>>>>>>>>>>>>>>>>> Sure, that's still true on all 1.3 or above releases. All 
>>>>>>>>>>>>>>>>>> you need to do is set the hostfile envar so we pick it up:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> OMPI_MCA_orte_default_hostfile=<foo>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Aug 21, 2012, at 7:23 PM, Brian Budge 
>>>>>>>>>>>>>>>>>> <brian.bu...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi.  I know this is an old thread, but I'm curious if there are any
>>>>>>>>>>>>>>>>>>> tutorials describing how to set this up?  Is this still available on
>>>>>>>>>>>>>>>>>>> newer Open MPI versions?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Brian
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Jan 4, 2008 at 7:57 AM, Ralph Castain 
>>>>>>>>>>>>>>>>>>> <r...@lanl.gov> wrote:
>>>>>>>>>>>>>>>>>>>> Hi Elena
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'm copying this to the user list just to correct a 
>>>>>>>>>>>>>>>>>>>> mis-statement on my part
>>>>>>>>>>>>>>>>>>>> in an earlier message that went there. I had stated that a 
>>>>>>>>>>>>>>>>>>>> singleton could
>>>>>>>>>>>>>>>>>>>> comm_spawn onto other nodes listed in a hostfile by 
>>>>>>>>>>>>>>>>>>>> setting an environmental
>>>>>>>>>>>>>>>>>>>> variable that pointed us to the hostfile.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> This is incorrect in the 1.2 code series. That series does 
>>>>>>>>>>>>>>>>>>>> not allow
>>>>>>>>>>>>>>>>>>>> singletons to read a hostfile at all. Hence, any 
>>>>>>>>>>>>>>>>>>>> comm_spawn done by a
>>>>>>>>>>>>>>>>>>>> singleton can only launch child processes on the 
>>>>>>>>>>>>>>>>>>>> singleton's local host.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> This situation has been corrected for the upcoming 1.3 
>>>>>>>>>>>>>>>>>>>> code series. For the
>>>>>>>>>>>>>>>>>>>> 1.2 series, though, you will have to do it via an mpirun 
>>>>>>>>>>>>>>>>>>>> command line.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Sorry for the confusion - I sometimes have too many code 
>>>>>>>>>>>>>>>>>>>> families to keep
>>>>>>>>>>>>>>>>>>>> straight in this old mind!
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 1/4/08 5:10 AM, "Elena Zhebel" 
>>>>>>>>>>>>>>>>>>>> <ezhe...@fugro-jason.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hello Ralph,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thank you very much for the explanations.
>>>>>>>>>>>>>>>>>>>>> But I still do not get it running...
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> For the case
>>>>>>>>>>>>>>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host 
>>>>>>>>>>>>>>>>>>>>> my_master.exe
>>>>>>>>>>>>>>>>>>>>> everything works.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> For the case
>>>>>>>>>>>>>>>>>>>>> ./my_master.exe
>>>>>>>>>>>>>>>>>>>>> it does not.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I did:
>>>>>>>>>>>>>>>>>>>>> - create my_hostfile and put it in the 
>>>>>>>>>>>>>>>>>>>>> $HOME/.openmpi/components/
>>>>>>>>>>>>>>>>>>>>> my_hostfile :
>>>>>>>>>>>>>>>>>>>>> bollenstreek slots=2 max_slots=3
>>>>>>>>>>>>>>>>>>>>> octocore01 slots=8  max_slots=8
>>>>>>>>>>>>>>>>>>>>> octocore02 slots=8  max_slots=8
>>>>>>>>>>>>>>>>>>>>> clstr000 slots=2 max_slots=3
>>>>>>>>>>>>>>>>>>>>> clstr001 slots=2 max_slots=3
>>>>>>>>>>>>>>>>>>>>> clstr002 slots=2 max_slots=3
>>>>>>>>>>>>>>>>>>>>> clstr003 slots=2 max_slots=3
>>>>>>>>>>>>>>>>>>>>> clstr004 slots=2 max_slots=3
>>>>>>>>>>>>>>>>>>>>> clstr005 slots=2 max_slots=3
>>>>>>>>>>>>>>>>>>>>> clstr006 slots=2 max_slots=3
>>>>>>>>>>>>>>>>>>>>> clstr007 slots=2 max_slots=3
>>>>>>>>>>>>>>>>>>>>> - setenv OMPI_MCA_rds_hostfile_path my_hostfile (I  put 
>>>>>>>>>>>>>>>>>>>>> it in .tcshrc and
>>>>>>>>>>>>>>>>>>>>> then source .tcshrc)
>>>>>>>>>>>>>>>>>>>>> - in my_master.cpp I did
>>>>>>>>>>>>>>>>>>>>> MPI_Info info1;
>>>>>>>>>>>>>>>>>>>>> MPI_Info_create(&info1);
>>>>>>>>>>>>>>>>>>>>> char* hostname =
>>>>>>>>>>>>>>>>>>>>> "clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02";
>>>>>>>>>>>>>>>>>>>>> MPI_Info_set(info1, "host", hostname);
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> _intercomm = intracomm.Spawn("./childexe", argv1, _nProc, 
>>>>>>>>>>>>>>>>>>>>> info1, 0,
>>>>>>>>>>>>>>>>>>>>> MPI_ERRCODES_IGNORE);
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> - After I call the executable, I've got this error message
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> bollenstreek: > ./my_master
>>>>>>>>>>>>>>>>>>>>> number of processes to run: 1
>>>>>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>>>>>> Some of the requested hosts are not included in the 
>>>>>>>>>>>>>>>>>>>>> current allocation for
>>>>>>>>>>>>>>>>>>>>> the application:
>>>>>>>>>>>>>>>>>>>>> ./childexe
>>>>>>>>>>>>>>>>>>>>> The requested hosts were:
>>>>>>>>>>>>>>>>>>>>> clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Verify that you have mapped the allocated resources 
>>>>>>>>>>>>>>>>>>>>> properly using the
>>>>>>>>>>>>>>>>>>>>> --host specification.
>>>>>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of 
>>>>>>>>>>>>>>>>>>>>> resource in file
>>>>>>>>>>>>>>>>>>>>> base/rmaps_base_support_fns.c at line 225
>>>>>>>>>>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of 
>>>>>>>>>>>>>>>>>>>>> resource in file
>>>>>>>>>>>>>>>>>>>>> rmaps_rr.c at line 478
>>>>>>>>>>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of 
>>>>>>>>>>>>>>>>>>>>> resource in file
>>>>>>>>>>>>>>>>>>>>> base/rmaps_base_map_job.c at line 210
>>>>>>>>>>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of 
>>>>>>>>>>>>>>>>>>>>> resource in file
>>>>>>>>>>>>>>>>>>>>> rmgr_urm.c at line 372
>>>>>>>>>>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of 
>>>>>>>>>>>>>>>>>>>>> resource in file
>>>>>>>>>>>>>>>>>>>>> communicator/comm_dyn.c at line 608
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Did I miss something?
>>>>>>>>>>>>>>>>>>>>> Thanks for help!
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Elena
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>>>>>> From: Ralph H Castain [mailto:r...@lanl.gov]
>>>>>>>>>>>>>>>>>>>>> Sent: Tuesday, December 18, 2007 3:50 PM
>>>>>>>>>>>>>>>>>>>>> To: Elena Zhebel; Open MPI Users <us...@open-mpi.org>
>>>>>>>>>>>>>>>>>>>>> Cc: Ralph H Castain
>>>>>>>>>>>>>>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and 
>>>>>>>>>>>>>>>>>>>>> cluster configuration
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 12/18/07 7:35 AM, "Elena Zhebel" 
>>>>>>>>>>>>>>>>>>>>> <ezhe...@fugro-jason.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks a lot! Now it works!
>>>>>>>>>>>>>>>>>>>>>> The solution is to use mpirun -n 1 -hostfile my.hosts *.exe and pass the
>>>>>>>>>>>>>>>>>>>>>> MPI_Info key to the Spawn function!
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> One more question: is it necessary to start my "master" 
>>>>>>>>>>>>>>>>>>>>>> program with
>>>>>>>>>>>>>>>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host 
>>>>>>>>>>>>>>>>>>>>>> my_master.exe ?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> No, it isn't necessary - assuming that my_master_host is 
>>>>>>>>>>>>>>>>>>>>> the first host
>>>>>>>>>>>>>>>>>>>>> listed in your hostfile! If you are only executing one 
>>>>>>>>>>>>>>>>>>>>> my_master.exe (i.e.,
>>>>>>>>>>>>>>>>>>>>> you gave -n 1 to mpirun), then we will automatically map 
>>>>>>>>>>>>>>>>>>>>> that process onto
>>>>>>>>>>>>>>>>>>>>> the first host in your hostfile.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> If you want my_master.exe to go on someone other than the 
>>>>>>>>>>>>>>>>>>>>> first host in the
>>>>>>>>>>>>>>>>>>>>> file, then you have to give us the -host option.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Are there other possibilities for easy start?
>>>>>>>>>>>>>>>>>>>>>> I would say just to run ./my_master.exe, but then the master process
>>>>>>>>>>>>>>>>>>>>>> doesn't know about the hosts available in the network.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> You can set the hostfile parameter in your environment 
>>>>>>>>>>>>>>>>>>>>> instead of on the
>>>>>>>>>>>>>>>>>>>>> command line. Just set OMPI_MCA_rds_hostfile_path = 
>>>>>>>>>>>>>>>>>>>>> my.hosts.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> You can then just run ./my_master.exe on the host where 
>>>>>>>>>>>>>>>>>>>>> you want the master
>>>>>>>>>>>>>>>>>>>>> to reside - everything should work the same.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Just as an FYI: the name of that environmental variable 
>>>>>>>>>>>>>>>>>>>>> is going to change
>>>>>>>>>>>>>>>>>>>>> in the 1.3 release, but everything will still work the 
>>>>>>>>>>>>>>>>>>>>> same.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hope that helps
>>>>>>>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks and regards,
>>>>>>>>>>>>>>>>>>>>>> Elena
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>>>>>>> From: Ralph H Castain [mailto:r...@lanl.gov]
>>>>>>>>>>>>>>>>>>>>>> Sent: Monday, December 17, 2007 5:49 PM
>>>>>>>>>>>>>>>>>>>>>> To: Open MPI Users <us...@open-mpi.org>; Elena Zhebel
>>>>>>>>>>>>>>>>>>>>>> Cc: Ralph H Castain
>>>>>>>>>>>>>>>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and 
>>>>>>>>>>>>>>>>>>>>>> cluster configuration
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On 12/17/07 8:19 AM, "Elena Zhebel" 
>>>>>>>>>>>>>>>>>>>>>> <ezhe...@fugro-jason.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hello Ralph,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thank you for your answer.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I'm using OpenMPI 1.2.3. , compiler glibc232, Linux 
>>>>>>>>>>>>>>>>>>>>>>> Suse 10.0.
>>>>>>>>>>>>>>>>>>>>>>> My "master" executable runs only on the one local host, 
>>>>>>>>>>>>>>>>>>>>>>> then it spawns
>>>>>>>>>>>>>>>>>>>>>>> "slaves" (with MPI::Intracomm::Spawn).
>>>>>>>>>>>>>>>>>>>>>>> My question was: how to determine the hosts where these 
>>>>>>>>>>>>>>>>>>>>>>> "slaves" will be
>>>>>>>>>>>>>>>>>>>>>>> spawned?
>>>>>>>>>>>>>>>>>>>>>>> You said: "You have to specify all of the hosts that 
>>>>>>>>>>>>>>>>>>>>>>> can be used by
>>>>>>>>>>>>>>>>>>>>>>> your job
>>>>>>>>>>>>>>>>>>>>>>> in the original hostfile". How can I specify the host 
>>>>>>>>>>>>>>>>>>>>>>> file? I can not
>>>>>>>>>>>>>>>>>>>>>>> find it
>>>>>>>>>>>>>>>>>>>>>>> in the documentation.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hmmm...sorry about the lack of documentation. I always 
>>>>>>>>>>>>>>>>>>>>>> assumed that the MPI
>>>>>>>>>>>>>>>>>>>>>> folks in the project would document such things since it 
>>>>>>>>>>>>>>>>>>>>>> has little to do
>>>>>>>>>>>>>>>>>>>>>> with the underlying run-time, but I guess that fell 
>>>>>>>>>>>>>>>>>>>>>> through the cracks.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> There are two parts to your question:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 1. how to specify the hosts to be used for the entire 
>>>>>>>>>>>>>>>>>>>>>> job. I believe that
>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>> somewhat covered here:
>>>>>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/faq/?category=running#simple-spmd-run
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> That FAQ tells you what a hostfile should look like, 
>>>>>>>>>>>>>>>>>>>>>> though you may already
>>>>>>>>>>>>>>>>>>>>>> know that. Basically, we require that you list -all- of 
>>>>>>>>>>>>>>>>>>>>>> the nodes that both
>>>>>>>>>>>>>>>>>>>>>> your master and slave programs will use.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 2. how to specify which nodes are available for the 
>>>>>>>>>>>>>>>>>>>>>> master, and which for
>>>>>>>>>>>>>>>>>>>>>> the slave.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> You would specify the host for your master on the mpirun 
>>>>>>>>>>>>>>>>>>>>>> command line with
>>>>>>>>>>>>>>>>>>>>>> something like:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host 
>>>>>>>>>>>>>>>>>>>>>> my_master.exe
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> This directs Open MPI to map that specified executable 
>>>>>>>>>>>>>>>>>>>>>> on the specified
>>>>>>>>>>>>>>>>>>>>> host
>>>>>>>>>>>>>>>>>>>>>> - note that my_master_host must have been in my_hostfile.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Inside your master, you would create an MPI_Info key 
>>>>>>>>>>>>>>>>>>>>>> "host" that has a
>>>>>>>>>>>>>>>>>>>>> value
>>>>>>>>>>>>>>>>>>>>>> consisting of a string "host1,host2,host3" identifying 
>>>>>>>>>>>>>>>>>>>>>> the hosts you want
>>>>>>>>>>>>>>>>>>>>>> your slave to execute upon. Those hosts must have been 
>>>>>>>>>>>>>>>>>>>>>> included in
>>>>>>>>>>>>>>>>>>>>>> my_hostfile. Include that key in the MPI_Info array 
>>>>>>>>>>>>>>>>>>>>>> passed to your Spawn.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> We don't currently support providing a hostfile for the 
>>>>>>>>>>>>>>>>>>>>>> slaves (as opposed
>>>>>>>>>>>>>>>>>>>>>> to the host-at-a-time string above). This may become 
>>>>>>>>>>>>>>>>>>>>>> available in a future
>>>>>>>>>>>>>>>>>>>>>> release - TBD.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hope that helps
>>>>>>>>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks and regards,
>>>>>>>>>>>>>>>>>>>>>>> Elena
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>>>>>>>> From: users-boun...@open-mpi.org 
>>>>>>>>>>>>>>>>>>>>>>> [mailto:users-boun...@open-mpi.org] On
>>>>>>>>>>>>>>>>>>>>>>> Behalf Of Ralph H Castain
>>>>>>>>>>>>>>>>>>>>>>> Sent: Monday, December 17, 2007 3:31 PM
>>>>>>>>>>>>>>>>>>>>>>> To: Open MPI Users <us...@open-mpi.org>
>>>>>>>>>>>>>>>>>>>>>>> Cc: Ralph H Castain
>>>>>>>>>>>>>>>>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and 
>>>>>>>>>>>>>>>>>>>>>>> cluster
>>>>>>>>>>>>>>>>>>>>>>> configuration
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On 12/12/07 5:46 AM, "Elena Zhebel" 
>>>>>>>>>>>>>>>>>>>>>>> <ezhe...@fugro-jason.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I'm working on a MPI application where I'm using 
>>>>>>>>>>>>>>>>>>>>>>>> OpenMPI instead of
>>>>>>>>>>>>>>>>>>>>>>>> MPICH.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> In my "master" program I call the function 
>>>>>>>>>>>>>>>>>>>>>>>> MPI::Intracomm::Spawn which
>>>>>>>>>>>>>>>>>>>>>>> spawns
>>>>>>>>>>>>>>>>>>>>>>>> "slave" processes. It is not clear for me how to spawn 
>>>>>>>>>>>>>>>>>>>>>>>> the "slave"
>>>>>>>>>>>>>>>>>>>>>>> processes
>>>>>>>>>>>>>>>>>>>>>>>> over the network. Currently "master" creates "slaves" 
>>>>>>>>>>>>>>>>>>>>>>>> on the same
>>>>>>>>>>>>>>>>>>>>>>>> host.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> If I use 'mpirun --hostfile openmpi.hosts' then 
>>>>>>>>>>>>>>>>>>>>>>>> processes are spawn
>>>>>>>>>>>>>>>>>>>>>>>> over
>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> network as expected. But now I need to spawn processes 
>>>>>>>>>>>>>>>>>>>>>>>> over the
>>>>>>>>>>>>>>>>>>>>>>>> network
>>>>>>>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>>>>>>> my own executable using MPI::Intracomm::Spawn, how can 
>>>>>>>>>>>>>>>>>>>>>>>> I achieve it?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I'm not sure from your description exactly what you are 
>>>>>>>>>>>>>>>>>>>>>>> trying to do,
>>>>>>>>>>>>>>>>>>>>>>> nor in
>>>>>>>>>>>>>>>>>>>>>>> what environment this is all operating within or what 
>>>>>>>>>>>>>>>>>>>>>>> version of Open
>>>>>>>>>>>>>>>>>>>>>>> MPI
>>>>>>>>>>>>>>>>>>>>>>> you are using. Setting aside the environment and 
>>>>>>>>>>>>>>>>>>>>>>> version issue, I'm
>>>>>>>>>>>>>>>>>>>>>>> guessing
>>>>>>>>>>>>>>>>>>>>>>> that you are running your executable over some 
>>>>>>>>>>>>>>>>>>>>>>> specified set of hosts,
>>>>>>>>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>>>> want to provide a different hostfile that specifies the 
>>>>>>>>>>>>>>>>>>>>>>> hosts to be
>>>>>>>>>>>>>>>>>>>>>>> used for
>>>>>>>>>>>>>>>>>>>>>>> the "slave" processes. Correct?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> If that is correct, then I'm afraid you can't do that 
>>>>>>>>>>>>>>>>>>>>>>> in any version
>>>>>>>>>>>>>>>>>>>>>>> of Open
>>>>>>>>>>>>>>>>>>>>>>> MPI today. You have to specify all of the hosts that 
>>>>>>>>>>>>>>>>>>>>>>> can be used by
>>>>>>>>>>>>>>>>>>>>>>> your job
>>>>>>>>>>>>>>>>>>>>>>> in the original hostfile. You can then specify a subset 
>>>>>>>>>>>>>>>>>>>>>>> of those hosts
>>>>>>>>>>>>>>>>>>>>>>> to be
>>>>>>>>>>>>>>>>>>>>>>> used by your original "master" program, and then 
>>>>>>>>>>>>>>>>>>>>>>> specify a different
>>>>>>>>>>>>>>>>>>>>>>> subset
>>>>>>>>>>>>>>>>>>>>>>> to be used by the "slaves" when calling Spawn.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> But the system requires that you tell it -all- of the 
>>>>>>>>>>>>>>>>>>>>>>> hosts that are
>>>>>>>>>>>>>>>>>>>>>>> going
>>>>>>>>>>>>>>>>>>>>>>> to be used at the beginning of the job.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> At the moment, there is no plan to remove that 
>>>>>>>>>>>>>>>>>>>>>>> requirement, though
>>>>>>>>>>>>>>>>>>>>>>> there has
>>>>>>>>>>>>>>>>>>>>>>> been occasional discussion about doing so at some point 
>>>>>>>>>>>>>>>>>>>>>>> in the future.
>>>>>>>>>>>>>>>>>>>>>>> No
>>>>>>>>>>>>>>>>>>>>>>> promises that it will happen, though - managed 
>>>>>>>>>>>>>>>>>>>>>>> environments, in
>>>>>>>>>>>>>>>>>>>>>>> particular,
>>>>>>>>>>>>>>>>>>>>>>> currently object to the idea of changing the allocation 
>>>>>>>>>>>>>>>>>>>>>>> on-the-fly. We
>>>>>>>>>>>>>>>>>>>>>>> may,
>>>>>>>>>>>>>>>>>>>>>>> though, make a provision for purely hostfile-based 
>>>>>>>>>>>>>>>>>>>>>>> environments (i.e.,
>>>>>>>>>>>>>>>>>>>>>>> unmanaged) at some time in the future.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks in advance for any help.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Elena
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>