George,

Thanks for the tip. In fact, calling MPI_Comm_spawn right away with
MPI_COMM_SELF has worked just as well for me -- no subgroups needed at all.

I am testing an Open MPI app named "siesta" in parallel. Its source code is
available, so it can be made "spawn ready" by adding the pair
MPI_Comm_get_parent + MPI_Comm_disconnect to its main code. If that works,
maybe siesta's developers can be convinced to add this feature in a future
release.
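
For concreteness, here is roughly what I have in mind -- an untested sketch,
with the child count and variable names just as placeholders:

  ! parent side (my code), per rank:
  integer :: intercomm, ierror
  call MPI_COMM_SPAWN('siesta', MPI_ARGV_NULL, 4, MPI_INFO_NULL, 0, &
                      MPI_COMM_SELF, intercomm, MPI_ERRCODES_IGNORE, ierror)
  ! ... wait for the run to finish and read its output file ...
  call MPI_COMM_DISCONNECT(intercomm, ierror)

  ! spawnee side: the pair to be added to siesta's main code,
  ! just before its call to MPI_Finalize
  integer :: parent, ierror
  call MPI_COMM_GET_PARENT(parent, ierror)
  if (parent .ne. MPI_COMM_NULL) call MPI_COMM_DISCONNECT(parent, ierror)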

However, siesta is launched only by specifying its input and output files via
I/O redirection, like

mpirun -n <*some number*>  siesta < infile > outfile

So far, I have not found anything about how to set a stdin file for a spawnee
process. Specifying it in an app context file doesn't seem to work. Can it be
done? Maybe through an MCA parameter?

Alex





2014-12-15 2:43 GMT-02:00 George Bosilca <bosi...@icl.utk.edu>:
>
> Alex,
>
> The code looks good, and is 100% MPI standard accurate.
>
> I would change the way you create the subcoms in the parent. You do a lot
> of useless operations, as you can achieve exactly the same outcome (one
> communicator per node), either by duplicating MPI_COMM_SELF or doing an
> MPI_Comm_split with the color equal to your rank.
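>
> In Fortran, that is something like (just a sketch):
>
>   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
>   call MPI_COMM_SPLIT(MPI_COMM_WORLD, rank, 0, newcomm, ierr)
>   ! or, equivalently for this purpose:
>   call MPI_COMM_DUP(MPI_COMM_SELF, newcomm, ierr)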
>
>   George.
>
>
> On Sun, Dec 14, 2014 at 2:20 AM, Alex A. Schmidt <a...@ufsm.br> wrote:
>
>> Hi,
>>
>> Sorry, guys. I don't think the newbie here can follow any discussion
>> beyond basic mpi...
>>
>> Anyway, if I add the pair
>>
>> call MPI_COMM_GET_PARENT(mpi_comm_parent,ierror)
>> call MPI_COMM_DISCONNECT(mpi_comm_parent,ierror)
>>
>> on the spawnee side I get the proper response in the spawning processes.
>>
>> Please, take a look at the attached toy codes parent.F and child.F
>> I've been playing with. 'mpirun -n 2 parent' seems to work as expected.
>>
>> Alex
>>
>> 2014-12-13 23:46 GMT-02:00 Gilles Gouaillardet <
>> gilles.gouaillar...@gmail.com>:
>>>
>>> Alex,
>>>
>>> Are you calling MPI_Comm_disconnect in the 3 "master" tasks and with the
>>> same remote communicator ?
>>>
>>> I also read the man page again, and MPI_Comm_disconnect does not ensure
>>> the remote processes have finished or called MPI_Comm_disconnect, so that
>>> might not be the thing you need.
>>> George, can you please comment on that ?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> George Bosilca <bosi...@icl.utk.edu> wrote:
>>> MPI_Comm_disconnect should be a local operation, there is no reason for
>>> it to deadlock. I looked at the code and everything is local with the
>>> exception of a call to PMIX.FENCE. Can you attach to your deadlocked
>>> processes and confirm that they are stopped in the pmix.fence?
>>>
>>>   George.
>>>
>>>
>>> On Sat, Dec 13, 2014 at 8:47 AM, Alex A. Schmidt <a...@ufsm.br> wrote:
>>>
>>>> Hi
>>>>
>>>> Sorry, I was calling MPI_Comm_disconnect on the group comm handle, not
>>>> on the intercomm handle returned from the spawn call, as it should be.
>>>>
>>>> Well, calling the disconnect on the intercomm handle does halt the
>>>> spawner side, but the wait never completes since, as George points out,
>>>> there is no disconnect call being made on the spawnee side... and that
>>>> brings me back to the beginning of the problem: being a third party app,
>>>> that call would never be there. I guess an MPI wrapper could be made to
>>>> deal with that for the app, but I feel the wrapper itself would, in the
>>>> end, face the same problem we face right now.
>>>>
>>>> My application is a genetic algorithm code that searches for optimal
>>>> configurations (minimum or maximum energy) of clusters of atoms. The
>>>> workflow bottleneck is the calculation of the cluster energy. For the
>>>> cases in which an analytical potential is available, the calculation can
>>>> be done internally, with the workload distributed from a master node
>>>> among slave nodes. The same is done when an analytical potential is not
>>>> available and the energy calculation must be performed externally by a
>>>> quantum chemistry code like dftb+, siesta or Gaussian. So far, we have
>>>> been running these codes in serial mode. Needless to say, we could do a
>>>> lot better if they could be executed in parallel.
>>>>
>>>> I am not familiar with DRMAA, but it seems to be the right choice for
>>>> dealing with job schedulers, as it covers the ones I am interested in
>>>> (PBS/Torque and LoadLeveler).
>>>>
>>>> Alex
>>>>
>>>> 2014-12-13 7:49 GMT-02:00 Gilles Gouaillardet <
>>>> gilles.gouaillar...@gmail.com>:
>>>>>
>>>>> George is right about the semantics.
>>>>>
>>>>> However, I am surprised it returns immediately...
>>>>> That should either work or hang, IMHO.
>>>>>
>>>>> The second point is no longer MPI related; it is batch-manager specific.
>>>>>
>>>>> You will likely find a submit parameter to make the command block
>>>>> until the job completes. Or you can write your own wrapper.
>>>>> Or you can retrieve the job id and run qstat periodically to get the
>>>>> job state.
>>>>> If an API is available, that is also an option.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>> George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>> You have to call MPI_Comm_disconnect on both sides of the
>>>>> intercommunicator. On the spawner processes you should call it on the
>>>>> intercomm, while on the spawnees you should call it on the communicator
>>>>> returned by MPI_Comm_get_parent.
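>>>>>
>>>>> For example (just a sketch):
>>>>>
>>>>>   ! spawner side:
>>>>>   call MPI_COMM_DISCONNECT(intercomm, ierr)
>>>>>
>>>>>   ! spawnee side:
>>>>>   call MPI_COMM_GET_PARENT(parent, ierr)
>>>>>   call MPI_COMM_DISCONNECT(parent, ierr)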
>>>>>
>>>>>   George.
>>>>>
>>>>> On Dec 12, 2014, at 20:43 , Alex A. Schmidt <a...@ufsm.br> wrote:
>>>>>
>>>>> Gilles,
>>>>>
>>>>> MPI_Comm_disconnect seems to work, but not quite:
>>>>> the call returns almost immediately, while
>>>>> the spawned processes keep piling up in the background
>>>>> until they are all done...
>>>>>
>>>>> I think using system('env -i qsub...') to launch the third party apps
>>>>> would send every one of those calls back to the scheduler queue.
>>>>> How would I track each of them for completion?
>>>>>
>>>>> Alex
>>>>>
>>>>> 2014-12-12 22:35 GMT-02:00 Gilles Gouaillardet <
>>>>> gilles.gouaillar...@gmail.com>:
>>>>>>
>>>>>> Alex,
>>>>>>
>>>>>> You need MPI_Comm_disconnect at least.
>>>>>> I am not sure whether this is 100% correct, nor whether it works.
>>>>>>
>>>>>> If you are using third party apps, why don't you do something like
>>>>>> system("env -i qsub ...")
>>>>>> with the right options to make qsub block, or manually wait for
>>>>>> the end of the job?
>>>>>>
>>>>>> That looks like a much cleaner and simpler approach to me.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Gilles
>>>>>>
>>>>>> "Alex A. Schmidt" <a...@ufsm.br> wrote:
>>>>>> Hello Gilles,
>>>>>>
>>>>>> Ok, I believe I have a simple toy app running as I think it should:
>>>>>> 'n' parent processes running under MPI_COMM_WORLD, each one
>>>>>> spawning its own 'm' child processes (each child group works
>>>>>> together nicely, returning the expected result for an MPI_Allreduce
>>>>>> call).
>>>>>>
>>>>>> Now, as I mentioned before, the apps I want to run in the spawned
>>>>>> processes are third party MPI apps, and I don't think it will be
>>>>>> possible to exchange messages with them from my app. So, how do I tell
>>>>>> when the spawned processes have finished running? All I have to work
>>>>>> with is the intercommunicator returned from the MPI_Comm_spawn call...
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014-12-12 2:42 GMT-02:00 Alex A. Schmidt <a...@ufsm.br>:
>>>>>>>
>>>>>>> Gilles,
>>>>>>>
>>>>>>> Well, yes, I guess....
>>>>>>>
>>>>>>> I'll do tests with the real third party apps and let you know.
>>>>>>> These are huge quantum chemistry codes (dftb+, siesta and Gaussian)
>>>>>>> which greatly benefit from a parallel environment. My code is just
>>>>>>> a front end to them, but since we have a lot of data to process,
>>>>>>> it also benefits from a parallel environment.
>>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>>
>>>>>>> 2014-12-12 2:30 GMT-02:00 Gilles Gouaillardet <
>>>>>>> gilles.gouaillar...@iferc.org>:
>>>>>>>>
>>>>>>>>  Alex,
>>>>>>>>
>>>>>>>> just to make sure ...
>>>>>>>> this is the behavior you expected, right ?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Gilles
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2014/12/12 13:27, Alex A. Schmidt wrote:
>>>>>>>>
>>>>>>>> Gilles,
>>>>>>>>
>>>>>>>> Ok, very nice!
>>>>>>>>
>>>>>>>> When I execute
>>>>>>>>
>>>>>>>> do rank = 1, 3
>>>>>>>>    call MPI_Comm_spawn('hello_world', ' ', 5, MPI_INFO_NULL, rank, &
>>>>>>>>         MPI_COMM_WORLD, my_intercomm, MPI_ERRCODES_IGNORE, status)
>>>>>>>> enddo
>>>>>>>>
>>>>>>>> I do get 15 instances of the 'hello_world' app running: 5 for each 
>>>>>>>> parent
>>>>>>>> rank 1, 2 and 3.
>>>>>>>>
>>>>>>>> Thanks a lot, Gilles.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Alex
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-12-12 1:32 GMT-02:00 Gilles Gouaillardet
>>>>>>>> <gilles.gouaillar...@iferc.org>:
>>>>>>>>
>>>>>>>>  Alex,
>>>>>>>>
>>>>>>>> just ask MPI_Comm_spawn to start (up to) 5 tasks via the maxprocs
>>>>>>>> parameter :
>>>>>>>>
>>>>>>>>        int MPI_Comm_spawn(char *command, char *argv[], int maxprocs,
>>>>>>>>                           MPI_Info info, int root, MPI_Comm comm,
>>>>>>>>                           MPI_Comm *intercomm, int array_of_errcodes[])
>>>>>>>>
>>>>>>>> INPUT PARAMETERS
>>>>>>>>        maxprocs
>>>>>>>>               - maximum number of processes to start (integer,
>>>>>>>>                 significant only at root)
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Gilles
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2014/12/12 12:23, Alex A. Schmidt wrote:
>>>>>>>>
>>>>>>>> Hello Gilles,
>>>>>>>>
>>>>>>>> Thanks for your reply. The "env -i PATH=..." stuff seems to work!!!
>>>>>>>>
>>>>>>>> call system("sh -c 'env -i PATH=/usr/lib64/openmpi/bin:/bin mpirun -n 2
>>>>>>>> hello_world' ")
>>>>>>>>
>>>>>>>> did produce the expected result with a simple Open MPI "hello_world"
>>>>>>>> code I wrote.
>>>>>>>>
>>>>>>>> It might be harder, though, with the real third party app I have in
>>>>>>>> mind. And I realize that getting past a job scheduler with this
>>>>>>>> approach might not work at all...
>>>>>>>>
>>>>>>>> I have looked at the MPI_Comm_spawn call, but I failed to understand
>>>>>>>> how it could help here. For instance, can I use it to launch an MPI
>>>>>>>> app with the option "-n 5"?
>>>>>>>>
>>>>>>>> Alex
>>>>>>>>
>>>>>>>> 2014-12-12 0:36 GMT-02:00 Gilles Gouaillardet
>>>>>>>> <gilles.gouaillar...@iferc.org>:
>>>>>>>>
>>>>>>>>  Alex,
>>>>>>>>
>>>>>>>> can you try something like
>>>>>>>> call system("sh -c 'env -i /.../mpirun -np 2 /.../app_name'")
>>>>>>>>
>>>>>>>> -i starts with an empty environment.
>>>>>>>> That being said, you might need to set a few environment variables
>>>>>>>> manually:
>>>>>>>> env -i PATH=/bin ...
>>>>>>>>
>>>>>>>> That also said, this "trick" could just be a bad idea:
>>>>>>>> you might be using a scheduler, and if you empty the environment, the
>>>>>>>> scheduler will not be aware of the "inside" run.
>>>>>>>>
>>>>>>>> On top of that, invoking system might fail depending on the
>>>>>>>> interconnect you use.
>>>>>>>>
>>>>>>>> Bottom line, I believe Ralph's reply is still valid, even if five
>>>>>>>> years have passed:
>>>>>>>> changing your workflow, or using MPI_Comm_spawn, is a much better
>>>>>>>> approach.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Gilles
>>>>>>>>
>>>>>>>> On 2014/12/12 11:22, Alex A. Schmidt wrote:
>>>>>>>>
>>>>>>>> Dear OpenMPI users,
>>>>>>>>
>>>>>>>> Regarding this previous post
>>>>>>>> <http://www.open-mpi.org/community/lists/users/2009/06/9560.php>
>>>>>>>> from 2009, I wonder if the reply from Ralph Castain is still valid.
>>>>>>>> My need is similar but quite a bit simpler:
>>>>>>>> to make a system call from an Open MPI Fortran application in order
>>>>>>>> to run a third party Open MPI application. I don't need to exchange
>>>>>>>> MPI messages with that application; I just need to read the output
>>>>>>>> file it generates. I have tried the following system call from my
>>>>>>>> Fortran Open MPI code:
>>>>>>>>
>>>>>>>> call system("sh -c 'mpirun -n 2 app_name'")
>>>>>>>>
>>>>>>>> but I get
>>>>>>>>
>>>>>>>> **********************************************************
>>>>>>>>
>>>>>>>> Open MPI does not support recursive calls of mpirun
>>>>>>>>
>>>>>>>> **********************************************************
>>>>>>>>
>>>>>>>> Is there a way to make this work?
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Alex