Hmmm.... I -think- this will work, but I cannot guarantee it:

1. Launch one process (it can just be a spinner) using mpirun with the following option:

   mpirun -report-uri file

where "file" is some filename that mpirun can create and write its contact info into. This can be a relative or absolute path. This process must remain alive throughout your application - it doesn't matter what it does; its sole purpose is to keep mpirun alive.

2. Set OMPI_MCA_dpm_orte_server=FILE:file in your environment, where "file" is the filename given above. This tells your processes how to find mpirun, which acts as a meeting place to handle the connect/accept operations.

Now run your processes and have them connect/accept to each other - a rough sketch of the whole recipe is below.

The reason I cannot guarantee this will work is that these processes will all have the same rank and name, since they all start as singletons. Hence, connect/accept is likely to fail. But it -might- work, so you might want to give it a try.
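To make that concrete, here is a rough, untested sketch of what one pair of processes might do. Only mpirun -report-uri, the OMPI_MCA_dpm_orte_server variable, and connect/accept come from the recipe above; the file path, the service name "rendezvous", and the use of MPI_Publish_name/MPI_Lookup_name to exchange the port are just one way to illustrate it.

/*
 * Untested sketch.  Assumes the rendezvous mpirun was started beforehand,
 * e.g.:
 *     mpirun -report-uri /tmp/mpirun-uri.txt ./spinner
 * and that every process below has
 *     OMPI_MCA_dpm_orte_server=FILE:/tmp/mpirun-uri.txt
 * in its environment (path and service name are illustrative only).
 * One process plays the "server" role and publishes a port; another
 * looks the port up and connects to it.
 */
#include <mpi.h>
#include <string.h>

int main(int argc, char **argv)
{
    char     port[MPI_MAX_PORT_NAME];
    MPI_Comm peer;                     /* intercommunicator to the other side */

    MPI_Init(&argc, &argv);

    if (argc > 1 && strcmp(argv[1], "server") == 0) {
        MPI_Open_port(MPI_INFO_NULL, port);
        /* The publish/lookup exchange is what goes through mpirun here. */
        MPI_Publish_name("rendezvous", MPI_INFO_NULL, port);
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &peer);
        MPI_Unpublish_name("rendezvous", MPI_INFO_NULL, port);
        MPI_Close_port(port);
    } else {
        MPI_Lookup_name("rendezvous", MPI_INFO_NULL, port);
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &peer);
    }

    /* ... use "peer" (e.g. MPI_Intercomm_merge it into an intracomm) ... */

    MPI_Comm_disconnect(&peer);
    MPI_Finalize();
    return 0;
}

In practice you would also need to handle the race where a client calls MPI_Lookup_name before the server has published the name (retry in a loop, for example), and repeat the connect/accept for every process you want to bring into the group.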
On Apr 23, 2010, at 8:10 AM, Grzegorz Maj wrote:

> To be more precise: by 'server process' I mean some process that I
> could run once on my system and it could help in creating those
> groups.
> My typical scenario is:
> 1. run N separate processes, each without mpirun
> 2. connect them into an MPI group
> 3. do some job
> 4. exit all N processes
> 5. goto 1
>
> 2010/4/23 Grzegorz Maj <ma...@wp.pl>:
>> Thank you Ralph for your explanation.
>> And, apart from that descriptors issue, is there any other way to
>> solve my problem, i.e. to run separately a number of processes
>> without mpirun and then to collect them into an MPI intracomm group?
>> If I, for example, would need to run some 'server process' (even using
>> mpirun) for this task, that's OK. Any ideas?
>>
>> Thanks,
>> Grzegorz Maj
>>
>> 2010/4/18 Ralph Castain <r...@open-mpi.org>:
>>> Okay, but here is the problem. If you don't use mpirun, and are not
>>> operating in an environment we support for "direct" launch (i.e., starting
>>> processes outside of mpirun), then every one of those processes thinks it
>>> is a singleton - yes?
>>>
>>> What you may not realize is that each singleton immediately fork/exec's an
>>> orted daemon that is configured to behave just like mpirun. This is
>>> required in order to support MPI-2 operations such as MPI_Comm_spawn,
>>> MPI_Comm_connect/accept, etc.
>>>
>>> So if you launch 64 processes that think they are singletons, then you have
>>> 64 copies of orted running as well. This eats up a lot of file descriptors,
>>> which is probably why you are hitting this 65 process limit - your system
>>> is probably running out of file descriptors. You might check your system
>>> limits and see if you can get them revised upward.
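As an aside on the "check your system limits" point above: the per-process open-file limit is the usual suspect. A small, generic sketch of how a process can inspect and raise its own limit at startup (plain POSIX getrlimit/setrlimit, nothing Open MPI specific):

/* Prints the open-file limit and raises the soft limit to the hard limit. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("open files: soft=%llu hard=%llu\n",
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max);

    rl.rlim_cur = rl.rlim_max;     /* raise soft limit up to the hard limit */
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }
    return 0;
}

Raising the hard limit itself typically requires root (or, on Linux, an entry in /etc/security/limits.conf).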
>>> On Apr 17, 2010, at 4:24 PM, Grzegorz Maj wrote:
>>>
>>>> Yes, I know. The problem is that I need to use some special way of
>>>> running my processes provided by the environment in which I'm working,
>>>> and unfortunately I can't use mpirun.
>>>>
>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>:
>>>>> Guess I don't understand why you can't use mpirun - all it does is start
>>>>> things, provide a means to forward io, etc. It mainly sits there quietly
>>>>> without using any cpu unless required to support the job.
>>>>>
>>>>> Sounds like it would solve your problem. Otherwise, I know of no way to
>>>>> get all these processes into comm_world.
>>>>>
>>>>> On Apr 17, 2010, at 2:27 PM, Grzegorz Maj wrote:
>>>>>
>>>>>> Hi,
>>>>>> I'd like to dynamically create a group of processes communicating via
>>>>>> MPI. Those processes need to be run without mpirun and create an
>>>>>> intracommunicator after startup. Any ideas how to do this efficiently?
>>>>>> I came up with a solution in which the processes connect one by
>>>>>> one using MPI_Comm_connect, but unfortunately all the processes that
>>>>>> are already in the group need to call MPI_Comm_accept. This means that
>>>>>> when the n-th process wants to connect I need to collect all the n-1
>>>>>> processes on the MPI_Comm_accept call. After I run about 40 processes,
>>>>>> every subsequent call takes more and more time, which I'd like to
>>>>>> avoid.
>>>>>> Another problem with this solution is that when I try to connect the
>>>>>> 66-th process, the root of the existing group segfaults on
>>>>>> MPI_Comm_accept. Maybe it's my bug, but it's weird, as everything
>>>>>> works fine for at most 65 processes. Is there any limitation I don't
>>>>>> know about?
>>>>>> My last question is about MPI_COMM_WORLD. When I run my processes
>>>>>> without mpirun, their MPI_COMM_WORLD is the same as MPI_COMM_SELF. Is
>>>>>> there any way to change MPI_COMM_WORLD and set it to the
>>>>>> intracommunicator that I've created?
>>>>>>
>>>>>> Thanks,
>>>>>> Grzegorz Maj
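For reference, the "connect one by one" scheme described in the quoted message would look roughly like the sketch below. It is untested, and how the port string (from MPI_Open_port on the group's root) reaches each newcomer - a file, a socket, whatever - is an assumption left out of the thread.

/*
 * Untested sketch of the incremental scheme described above.  The existing
 * group (communicator "world", initially MPI_COMM_SELF on the first process)
 * collectively accepts one newcomer at a time and merges it in.  Every
 * current member must make the accept/merge calls, which is why each round
 * gets more expensive as the group grows.
 */
#include <mpi.h>

/* Called collectively by every current member of "world". */
static void accept_one(MPI_Comm *world, const char *port)
{
    MPI_Comm inter, merged;

    MPI_Comm_accept(port, MPI_INFO_NULL, 0, *world, &inter);
    MPI_Intercomm_merge(inter, 0 /* existing group ranks first */, &merged);
    MPI_Comm_disconnect(&inter);
    *world = merged;               /* the group is now one process larger */
}

/* Called by the newcomer; afterwards its "world" spans the whole group. */
static void join_group(MPI_Comm *world, const char *port)
{
    MPI_Comm inter, merged;

    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    MPI_Intercomm_merge(inter, 1 /* newcomer ranks last */, &merged);
    MPI_Comm_disconnect(&inter);
    *world = merged;
}

Each round is a collective over the whole existing group, which matches the observation in the quoted message that every subsequent join takes longer as the group grows.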