2010/7/12 Ralph Castain <r...@open-mpi.org>:
> Dug around a bit and found the problem!!
>
> I have no idea who did this or why, but somebody set a limit of 64
> separate jobids in the dynamic init called by ompi_comm_set, which builds
> the intercommunicator. Unfortunately, they hard-wired the array size but
> never checked that size before adding to it.
>
> So after 64 calls to connect_accept, you are overwriting other areas of
> the code. As you found, hitting 66 causes it to segfault.
>
> I'll fix this on the developer's trunk (I'll also add that original patch to 
> it). Rather than my searching this thread in detail, can you remind me what 
> version you are using so I can patch it too?

I'm using 1.4.2.
Thanks a lot; I'm looking forward to the patch.
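
For reference, the failure mode described above boils down to a fixed-size
array that is filled without a bounds check. Below is a minimal sketch of
that pattern in C; the names and the suggested check are invented for
illustration and are not the actual Open MPI source:

    #define MAX_JOBIDS 64                 /* hard-wired limit */

    static int jobids[MAX_JOBIDS];
    static int num_jobids = 0;

    /* Broken: the 65th entry writes past the end of the array and
     * silently corrupts whatever lives next to it in memory. */
    void add_jobid_broken(int jobid)
    {
        jobids[num_jobids++] = jobid;     /* no bounds check */
    }

    /* Fixed: check the size before adding (or grow the array). */
    int add_jobid_fixed(int jobid)
    {
        if (num_jobids >= MAX_JOBIDS) {
            return -1;                    /* report an error instead of overwriting */
        }
        jobids[num_jobids++] = jobid;
        return 0;
    }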

>
> Thanks for your patience with this!
> Ralph
>
>
> On Jul 12, 2010, at 7:20 AM, Grzegorz Maj wrote:
>
>> 1024 is not the problem: changing it to 2048 hasn't changed anything.
>> Following your advice I've run my process using gdb. Unfortunately I
>> didn't get anything more than:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0xf7e4c6c0 (LWP 20246)]
>> 0xf7f39905 in ompi_comm_set () from /home/gmaj/openmpi/lib/libmpi.so.0
>>
>> (gdb) bt
>> #0  0xf7f39905 in ompi_comm_set () from /home/gmaj/openmpi/lib/libmpi.so.0
>> #1  0xf7e3ba95 in connect_accept () from
>> /home/gmaj/openmpi/lib/openmpi/mca_dpm_orte.so
>> #2  0xf7f62013 in PMPI_Comm_connect () from 
>> /home/gmaj/openmpi/lib/libmpi.so.0
>> #3  0x080489ed in main (argc=825832753, argv=0x34393638) at client.c:43
>>
>> What's more: when I added a breakpoint on ompi_comm_set in the 66th
>> process and stepped through a couple of instructions, one of the other
>> processes crashed (as usual, in ompi_comm_set) before the 66th did.
>>
>> Finally I decided to recompile Open MPI with the -g flag for gcc. In
>> that case the 66-process issue is gone! I ran my application exactly the
>> same way as before (without even recompiling it) and successfully ran
>> over 130 processes.
>> When I switch back to the Open MPI build without -g, it segfaults again.
>>
>> Any ideas? I'm really confused.
>>
>>
>>
>> 2010/7/7 Ralph Castain <r...@open-mpi.org>:
>>> I would guess the #files limit of 1024. However, if it behaves the same way 
>>> when spread across multiple machines, I would suspect it is somewhere in 
>>> your program itself. Given that the segfault is in your process, can you 
>>> use gdb to look at the core file and see where and why it fails?
>>>
>>> On Jul 7, 2010, at 10:17 AM, Grzegorz Maj wrote:
>>>
>>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>:
>>>>>
>>>>> On Jul 6, 2010, at 8:48 AM, Grzegorz Maj wrote:
>>>>>
>>>>>> Hi Ralph,
>>>>>> sorry for the late response, but I couldn't find free time to play
>>>>>> with this. I've finally applied the patch you prepared. I launched
>>>>>> my processes the way you described and I think it's working as you
>>>>>> expected: none of my processes runs an orted daemon and they can
>>>>>> still perform MPI operations. Unfortunately I'm still hitting the
>>>>>> 65-process issue :(
>>>>>> Maybe I'm doing something wrong.
>>>>>> I attach my source code. If anybody could have a look at it, I would
>>>>>> be grateful.
>>>>>>
>>>>>> When I run that code with clients_count <= 65 everything works fine:
>>>>>> all the processes create a common grid, exchange some information and
>>>>>> disconnect.
>>>>>> When I set clients_count > 65 the 66th process crashes on
>>>>>> MPI_Comm_connect (segmentation fault).
>>>>>
>>>>> I didn't have time to check the code, but my guess is that you are still 
>>>>> hitting some kind of file descriptor or other limit. Check to see what 
>>>>> your limits are - usually "ulimit" will tell you.
>>>>
>>>> My limitations are:
>>>> time(seconds)        unlimited
>>>> file(blocks)         unlimited
>>>> data(kb)             unlimited
>>>> stack(kb)            10240
>>>> coredump(blocks)     0
>>>> memory(kb)           unlimited
>>>> locked memory(kb)    64
>>>> process              200704
>>>> nofiles              1024
>>>> vmemory(kb)          unlimited
>>>> locks                unlimited
>>>>
>>>> Which one do you think could be responsible for that?
>>>>
>>>> I tried running all 66 processes on one machine and also spreading
>>>> them across several machines; it always crashes the same way on the
>>>> 66th process.
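
As a side note, a process can also query its own descriptor limit at run
time; the following small C sketch (plain POSIX getrlimit(), nothing Open
MPI specific) prints the same numbers as the "nofiles" entry in the list
above:

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;

        /* RLIMIT_NOFILE is the per-process limit on open file descriptors. */
        if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
            perror("getrlimit");
            return 1;
        }
        printf("open files: soft=%llu hard=%llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);
        return 0;
    }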
>>>>
>>>>>
>>>>>>
>>>>>> Another thing I would like to know: is it normal that a process
>>>>>> calling MPI_Comm_connect or MPI_Comm_accept while the other side is
>>>>>> not ready eats up a full CPU?
>>>>>
>>>>> Yes - the waiting process is polling in a tight loop waiting for the 
>>>>> connection to be made.
>>>>>
>>>>>>
>>>>>> Any help would be appreciated,
>>>>>> Grzegorz Maj
>>>>>>
>>>>>>
>>>>>> 2010/4/24 Ralph Castain <r...@open-mpi.org>:
>>>>>>> Actually, OMPI is distributed with a daemon that does pretty much
>>>>>>> what you want. Check out "man ompi-server". I originally wrote that
>>>>>>> code to support cross-application MPI publish/subscribe operations,
>>>>>>> but we can utilize it here too. Have to blame me for not making it
>>>>>>> more publicly known.
>>>>>>>
>>>>>>> The attached patch upgrades ompi-server and modifies the singleton
>>>>>>> startup to provide your desired support. This solution works in the
>>>>>>> following manner:
>>>>>>>
>>>>>>> 1. Launch "ompi-server -report-uri <filename>". This starts a
>>>>>>> persistent daemon called "ompi-server" that acts as a rendezvous
>>>>>>> point for independently started applications. The problem with
>>>>>>> starting different applications and wanting them to MPI
>>>>>>> connect/accept lies in the need to have the applications find each
>>>>>>> other. If they can't discover contact info for the other app, then
>>>>>>> they can't wire up their interconnects. The "ompi-server" tool
>>>>>>> provides that rendezvous point. I don't like that comm_accept
>>>>>>> segfaulted - it should have just error'd out.
>>>>>>>
>>>>>>> 2. Set "OMPI_MCA_orte_server=file:<filename>" in the environment
>>>>>>> where you will start your processes. This will allow your singleton
>>>>>>> processes to find the ompi-server. I also automatically set the
>>>>>>> envar to connect the MPI publish/subscribe system for you.
>>>>>>>
>>>>>>> 3. Run your processes. As they think they are singletons, they will
>>>>>>> detect the presence of the above envar and automatically connect
>>>>>>> themselves to the "ompi-server" daemon. This provides each process
>>>>>>> with the ability to perform any MPI-2 operation.
>>>>>>>
>>>>>>> I tested this on my machines and it worked, so hopefully it will
>>>>>>> meet your needs. You only need to run one "ompi-server", period, so
>>>>>>> long as you locate it where all of the processes can find the
>>>>>>> contact file and can open a TCP socket to the daemon. There is a
>>>>>>> way to knit multiple ompi-servers into a broader network (e.g., to
>>>>>>> connect processes that cannot directly access a server due to
>>>>>>> network segmentation), but it's a tad tricky - let me know if you
>>>>>>> require it and I'll try to help.
>>>>>>>
>>>>>>> If you have trouble wiring them all into a single communicator, you
>>>>>>> might ask separately about that and see if one of our MPI experts
>>>>>>> can provide advice (I'm just the RTE grunt).
>>>>>>>
>>>>>>> HTH - let me know how this works for you and I'll incorporate it
>>>>>>> into future OMPI releases.
>>>>>>> Ralph
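
For illustration, here is a minimal sketch of how two independently started
singletons can rendezvous once ompi-server is running and
OMPI_MCA_orte_server points at its URI file, using the standard MPI-2
name-service calls (the machinery that ompi-server backs). The service name
"grid-head" and the overall structure are made up for this example; this is
not code from the patch:

    /* accept side -- one process opens a port and publishes it */
    #include <mpi.h>

    int main(void)
    {
        char port[MPI_MAX_PORT_NAME];
        MPI_Comm inter;

        MPI_Init(NULL, NULL);
        MPI_Open_port(MPI_INFO_NULL, port);
        MPI_Publish_name("grid-head", MPI_INFO_NULL, port); /* visible via ompi-server */
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        /* ... talk over 'inter' ... */
        MPI_Unpublish_name("grid-head", MPI_INFO_NULL, port);
        MPI_Close_port(port);
        MPI_Finalize();
        return 0;
    }

    /* connect side -- any other singleton, built as a separate program */
    #include <mpi.h>

    int main(void)
    {
        char port[MPI_MAX_PORT_NAME];
        MPI_Comm inter;

        MPI_Init(NULL, NULL);
        MPI_Lookup_name("grid-head", MPI_INFO_NULL, port);   /* resolved via ompi-server */
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        /* ... talk over 'inter' ... */
        MPI_Finalize();
        return 0;
    }

Both programs must be started in an environment where
OMPI_MCA_orte_server=file:<filename> points at the file ompi-server wrote,
as described in step 2 above.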
>>>>>>>
>>>>>>>
>>>>>>> On Apr 24, 2010, at 1:49 AM, Krzysztof Zarzycki wrote:
>>>>>>>
>>>>>>> Hi Ralph,
>>>>>>> I'm Krzysztof and I'm working with Grzegorz Maj on this small
>>>>>>> project/experiment of ours.
>>>>>>> We would definitely like to give your patch a try. But could you
>>>>>>> please explain your solution a little more?
>>>>>>> Would you still start one mpirun per MPI grid, and then have the
>>>>>>> processes we start join the MPI comm?
>>>>>>> That is of course a good solution.
>>>>>>> But it would be especially preferable to have one daemon running
>>>>>>> persistently on our "entry" machine that could handle several MPI
>>>>>>> grid starts.
>>>>>>> Can your patch help us this way too?
>>>>>>> Thanks for your help!
>>>>>>> Krzysztof
>>>>>>>
>>>>>>> On 24 April 2010 03:51, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>>
>>>>>>>> In thinking about this, my proposed solution won't entirely fix the
>>>>>>>> problem - you'll still wind up with all those daemons. I believe I can
>>>>>>>> resolve that one as well, but it would require a patch.
>>>>>>>>
>>>>>>>> Would you like me to send you something you could try? Might take
>>>>>>>> a couple of iterations to get it right...
>>>>>>>>
>>>>>>>> On Apr 23, 2010, at 12:12 PM, Ralph Castain wrote:
>>>>>>>>
>>>>>>>>> Hmmm....I -think- this will work, but I cannot guarantee it:
>>>>>>>>>
>>>>>>>>> 1. Launch one process (can just be a spinner) using mpirun that
>>>>>>>>> includes the following option:
>>>>>>>>>
>>>>>>>>> mpirun -report-uri file
>>>>>>>>>
>>>>>>>>> where "file" is some filename that mpirun can create and insert
>>>>>>>>> its contact info into. This can be a relative or absolute path.
>>>>>>>>> This process must remain alive throughout your application - it
>>>>>>>>> doesn't matter what it does. Its purpose is solely to keep mpirun
>>>>>>>>> alive.
>>>>>>>>>
>>>>>>>>> 2. Set OMPI_MCA_dpm_orte_server=FILE:file in your environment,
>>>>>>>>> where "file" is the filename given above. This will tell your
>>>>>>>>> processes how to find mpirun, which is acting as a meeting place
>>>>>>>>> to handle the connect/accept operations.
>>>>>>>>>
>>>>>>>>> Now run your processes, and have them connect/accept to each other.
>>>>>>>>>
>>>>>>>>> The reason I cannot guarantee this will work is that these
>>>>>>>>> processes will all have the same rank && name since they all
>>>>>>>>> start as singletons. Hence, connect/accept is likely to fail.
>>>>>>>>>
>>>>>>>>> But it -might- work, so you might want to give it a try.
>>>>>>>>>
>>>>>>>>> On Apr 23, 2010, at 8:10 AM, Grzegorz Maj wrote:
>>>>>>>>>
>>>>>>>>>> To be more precise: by 'server process' I mean some process that
>>>>>>>>>> I could run once on my system and that would help in creating
>>>>>>>>>> those groups.
>>>>>>>>>> My typical scenario is:
>>>>>>>>>> 1. run N separate processes, each without mpirun
>>>>>>>>>> 2. connect them into an MPI group
>>>>>>>>>> 3. do some job
>>>>>>>>>> 4. exit all N processes
>>>>>>>>>> 5. goto 1
>>>>>>>>>>
>>>>>>>>>> 2010/4/23 Grzegorz Maj <ma...@wp.pl>:
>>>>>>>>>>> Thank you Ralph for your explanation.
>>>>>>>>>>> And, apart from that descriptor issue, is there any other way
>>>>>>>>>>> to solve my problem, i.e. to run a number of processes
>>>>>>>>>>> separately, without mpirun, and then collect them into an MPI
>>>>>>>>>>> intracomm group?
>>>>>>>>>>> If, for example, I would need to run some 'server process'
>>>>>>>>>>> (even using mpirun) for this task, that's OK. Any ideas?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Grzegorz Maj
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>>> Okay, but here is the problem. If you don't use mpirun, and
>>>>>>>>>>>> are not operating in an environment we support for "direct"
>>>>>>>>>>>> launch (i.e., starting processes outside of mpirun), then
>>>>>>>>>>>> every one of those processes thinks it is a singleton - yes?
>>>>>>>>>>>>
>>>>>>>>>>>> What you may not realize is that each singleton immediately
>>>>>>>>>>>> fork/exec's an orted daemon that is configured to behave just
>>>>>>>>>>>> like mpirun. This is required in order to support MPI-2
>>>>>>>>>>>> operations such as MPI_Comm_spawn, MPI_Comm_connect/accept,
>>>>>>>>>>>> etc.
>>>>>>>>>>>>
>>>>>>>>>>>> So if you launch 64 processes that think they are singletons,
>>>>>>>>>>>> then you have 64 copies of orted running as well. This eats up
>>>>>>>>>>>> a lot of file descriptors, which is probably why you are
>>>>>>>>>>>> hitting this 65-process limit - your system is probably
>>>>>>>>>>>> running out of file descriptors. You might check your system
>>>>>>>>>>>> limits and see if you can get them revised upward.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Apr 17, 2010, at 4:24 PM, Grzegorz Maj wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, I know. The problem is that I need to use some special
>>>>>>>>>>>>> way of running my processes, provided by the environment in
>>>>>>>>>>>>> which I'm working, and unfortunately I can't use mpirun.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>>>>> Guess I don't understand why you can't use mpirun - all it
>>>>>>>>>>>>>> does is start things, provide a means to forward io, etc. It
>>>>>>>>>>>>>> mainly sits there quietly without using any cpu unless
>>>>>>>>>>>>>> required to support the job.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sounds like it would solve your problem. Otherwise, I know
>>>>>>>>>>>>>> of no way to get all these processes into comm_world.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Apr 17, 2010, at 2:27 PM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> I'd like to dynamically create a group of processes
>>>>>>>>>>>>>>> communicating via MPI. Those processes need to be run
>>>>>>>>>>>>>>> without mpirun and then create an intracommunicator after
>>>>>>>>>>>>>>> startup. Any ideas how to do this efficiently?
>>>>>>>>>>>>>>> I came up with a solution in which the processes connect
>>>>>>>>>>>>>>> one by one using MPI_Comm_connect, but unfortunately all
>>>>>>>>>>>>>>> the processes that are already in the group need to call
>>>>>>>>>>>>>>> MPI_Comm_accept. This means that when the n-th process
>>>>>>>>>>>>>>> wants to connect, I need to collect all n-1 existing
>>>>>>>>>>>>>>> processes on the MPI_Comm_accept call. After about 40
>>>>>>>>>>>>>>> processes, every subsequent call takes more and more time,
>>>>>>>>>>>>>>> which I'd like to avoid.
>>>>>>>>>>>>>>> Another problem with this solution is that when I try to
>>>>>>>>>>>>>>> connect the 66th process, the root of the existing group
>>>>>>>>>>>>>>> segfaults on MPI_Comm_accept. Maybe it's my bug, but it's
>>>>>>>>>>>>>>> weird, as everything works fine for at most 65 processes.
>>>>>>>>>>>>>>> Is there any limitation I don't know about?
>>>>>>>>>>>>>>> My last question is about MPI_COMM_WORLD. When I run my
>>>>>>>>>>>>>>> processes without mpirun, their MPI_COMM_WORLD is the same
>>>>>>>>>>>>>>> as MPI_COMM_SELF. Is there any way to change MPI_COMM_WORLD
>>>>>>>>>>>>>>> and set it to the intracommunicator that I've created?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Grzegorz Maj
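
For reference, here is a minimal sketch of the one-by-one
accept/connect/merge scheme described in this message (the function names
are invented; this is not the attached client.c/server.c). Every process
already in the group takes part in each accept, which is exactly where the
growing cost and the crash at the 66th process show up:

    #include <mpi.h>

    /* Run by every process already in the group.  'intra' starts out as
     * MPI_COMM_SELF in the very first process; the port string comes from
     * MPI_Open_port on the root and is only significant there. */
    MPI_Comm accept_one(const char *port, MPI_Comm intra)
    {
        MPI_Comm inter, merged;

        MPI_Comm_accept(port, MPI_INFO_NULL, 0, intra, &inter);
        MPI_Intercomm_merge(inter, 0 /* existing group sorts low */, &merged);
        MPI_Comm_free(&inter);
        return merged;               /* intracommunicator, one rank larger */
    }

    /* Run by the newcomer; afterwards it joins the accept_one() calls for
     * every process that arrives later. */
    MPI_Comm connect_once(const char *port)
    {
        MPI_Comm inter, merged;

        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        MPI_Intercomm_merge(inter, 1 /* newcomer sorts high */, &merged);
        MPI_Comm_free(&inter);
        return merged;
    }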