I've attached gdb to the client which has just connected to the grid. Its bt is almost exactly the same as the server's one:

#0  0x428066d7 in sched_yield () from /lib/libc.so.6
#1  0x00933cbf in opal_progress () at ../../opal/runtime/opal_progress.c:220
#2  0x00d460b8 in opal_condition_wait (c=0xdc3160, m=0xdc31a0) at ../../opal/threads/condition.h:99
#3  0x00d463cc in ompi_request_default_wait_all (count=2, requests=0xff8a36d0, statuses=0x0) at ../../ompi/request/req_wait.c:262
#4  0x00a1431f in mca_coll_inter_allgatherv_inter (sbuf=0xff8a3794, scount=1, sdtype=0x8049400, rbuf=0xff8a3750, rcounts=0x80948e0, disps=0x8093938, rdtype=0x8049400, comm=0x8094fb8, module=0x80954a0) at ../../../../../ompi/mca/coll/inter/coll_inter_allgatherv.c:127
#5  0x00d3198f in ompi_comm_determine_first (intercomm=0x8094fb8, high=1) at ../../ompi/communicator/comm.c:1199
#6  0x00d75833 in PMPI_Intercomm_merge (intercomm=0x8094fb8, high=1, newcomm=0xff8a4c00) at pintercomm_merge.c:84
#7  0x08048a16 in main (argc=892352312, argv=0x32323038) at client.c:28
I've tried both of the scenarios described: a hang with a client connecting from machine B, and with a client connecting from machine C. In both cases the bt looks the same. How does it look to you?
Shall I repost this using a different subject, as Ralph suggested?

Regards,
Grzegorz

2010/7/27 Edgar Gabriel <gabr...@cs.uh.edu>:
> based on your output shown here, there is absolutely nothing wrong (yet). Both processes are in the same function and do what they are supposed to do.
>
> However, I am fairly sure that the client process bt that you show is already part of current_intracomm. Could you try to create a bt of the process that is not yet part of current_intracomm? (If I understand your code correctly, the intercommunicator is an n-1 configuration, with each client process becoming part of n after the intercomm_merge.) It would be interesting to see where that process is...
>
> Thanks
> Edgar
>
> On 7/27/2010 1:42 PM, Ralph Castain wrote:
>> This slides outside of my purview - I would suggest you post this question with a different subject line specifically mentioning failure of intercomm_merge to work so it attracts the attention of those with knowledge of that area.
>>
>> On Jul 27, 2010, at 9:30 AM, Grzegorz Maj wrote:
>>
>>> So now I have a new question.
>>> When I run my server and a lot of clients on the same machine, everything looks fine.
>>>
>>> But when I try to run the clients on several machines the most frequent scenario is:
>>> * the server is started on machine A
>>> * X (= 1, 4, 10, ...) clients are started on machine B and they connect successfully
>>> * the first client starting on machine C connects successfully to the server, but the whole grid hangs on MPI_Comm_merge (all the processes from the intercommunicator get there).
>>>
>>> As I said, it's the most frequent scenario. Sometimes I can connect the clients from several machines. Sometimes it hangs (always on MPI_Comm_merge) when connecting the clients from machine B.
>>> The interesting thing is that if, before MPI_Comm_merge, I send a dummy message on the intercommunicator from process rank 0 in one group to process rank 0 in the other one, it will not hang on MPI_Comm_merge.
>>>
>>> I've tried both versions, with and without the first patch (ompi-server as orted), but it doesn't change the behavior.
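For reference, a minimal sketch of the dummy-message workaround mentioned above, assuming intercomm is the intercommunicator returned by MPI_Comm_accept/MPI_Comm_connect and that the caller knows which side should send; the helper name, the i_am_sender flag and the high argument are illustrative, not taken from the actual client.c/server.c:

/* Sketch of the workaround only, NOT the original client.c/server.c:
 * one group leader sends a dummy message to the other over the
 * intercommunicator before everyone calls the merge. */
#include <mpi.h>

static void merge_with_dummy_message(MPI_Comm intercomm, int i_am_sender,
                                     int high, MPI_Comm *merged)
{
    int rank, dummy = 0;

    MPI_Comm_rank(intercomm, &rank);      /* rank within the local group */
    if (rank == 0) {
        if (i_am_sender)                  /* leader of one group sends ...    */
            MPI_Send(&dummy, 1, MPI_INT, 0, 0, intercomm);
        else                              /* ... leader of the other receives */
            MPI_Recv(&dummy, 1, MPI_INT, 0, 0, intercomm, MPI_STATUS_IGNORE);
    }

    /* the call every process of both groups still has to reach */
    MPI_Intercomm_merge(intercomm, high, merged);
}

Only the two rank-0 processes exchange the message; on an intercommunicator a destination or source rank of 0 refers to the remote group, which is why the same rank number appears on both sides.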
>>>
>>> I've attached gdb to my server, this is bt:
>>> #0  0xffffe410 in __kernel_vsyscall ()
>>> #1  0x00637afc in sched_yield () from /lib/libc.so.6
>>> #2  0xf7e8ce31 in opal_progress () at ../../opal/runtime/opal_progress.c:220
>>> #3  0xf7f60ad4 in opal_condition_wait (c=0xf7fd7dc0, m=0xf7fd7e00) at ../../opal/threads/condition.h:99
>>> #4  0xf7f60dee in ompi_request_default_wait_all (count=2, requests=0xff8d7754, statuses=0x0) at ../../ompi/request/req_wait.c:262
>>> #5  0xf7d3e221 in mca_coll_inter_allgatherv_inter (sbuf=0xff8d7824, scount=1, sdtype=0x8049200, rbuf=0xff8d77e0, rcounts=0x9783df8, disps=0x9755520, rdtype=0x8049200, comm=0x978c2a8, module=0x9794b08) at ../../../../../ompi/mca/coll/inter/coll_inter_allgatherv.c:127
>>> #6  0xf7f4c615 in ompi_comm_determine_first (intercomm=0x978c2a8, high=0) at ../../ompi/communicator/comm.c:1199
>>> #7  0xf7f8d1d9 in PMPI_Intercomm_merge (intercomm=0x978c2a8, high=0, newcomm=0xff8d78c0) at pintercomm_merge.c:84
>>> #8  0x0804893c in main (argc=Cannot access memory at address 0xf) at server.c:50
>>>
>>> And this is bt from one of the clients:
>>> #0  0xffffe410 in __kernel_vsyscall ()
>>> #1  0x0064993b in poll () from /lib/libc.so.6
>>> #2  0xf7de027f in poll_dispatch (base=0x8643fb8, arg=0x86442d8, tv=0xff82299c) at ../../../opal/event/poll.c:168
>>> #3  0xf7dde4b2 in opal_event_base_loop (base=0x8643fb8, flags=2) at ../../../opal/event/event.c:807
>>> #4  0xf7dde34f in opal_event_loop (flags=2) at ../../../opal/event/event.c:730
>>> #5  0xf7dcfc77 in opal_progress () at ../../opal/runtime/opal_progress.c:189
>>> #6  0xf7ea80b8 in opal_condition_wait (c=0xf7f25160, m=0xf7f251a0) at ../../opal/threads/condition.h:99
>>> #7  0xf7ea7ff3 in ompi_request_wait_completion (req=0x8686680) at ../../ompi/request/request.h:375
>>> #8  0xf7ea7ef1 in ompi_request_default_wait (req_ptr=0xff822ae8, status=0x0) at ../../ompi/request/req_wait.c:37
>>> #9  0xf7c663a6 in ompi_coll_tuned_bcast_intra_generic (buffer=0xff822d20, original_count=1, datatype=0x868bd00, root=0, comm=0x86aa7f8, module=0x868b700, count_by_segment=1, tree=0x868b3d8) at ../../../../../ompi/mca/coll/tuned/coll_tuned_bcast.c:237
>>> #10 0xf7c668ea in ompi_coll_tuned_bcast_intra_binomial (buffer=0xff822d20, count=1, datatype=0x868bd00, root=0, comm=0x86aa7f8, module=0x868b700, segsize=0) at ../../../../../ompi/mca/coll/tuned/coll_tuned_bcast.c:368
>>> #11 0xf7c5af12 in ompi_coll_tuned_bcast_intra_dec_fixed (buff=0xff822d20, count=1, datatype=0x868bd00, root=0, comm=0x86aa7f8, module=0x868b700) at ../../../../../ompi/mca/coll/tuned/coll_tuned_decision_fixed.c:256
>>> #12 0xf7c73269 in mca_coll_sync_bcast (buff=0xff822d20, count=1, datatype=0x868bd00, root=0, comm=0x86aa7f8, module=0x86aaa28) at ../../../../../ompi/mca/coll/sync/coll_sync_bcast.c:44
>>> #13 0xf7c80381 in mca_coll_inter_allgatherv_inter (sbuf=0xff822d64, scount=0, sdtype=0x8049400, rbuf=0xff822d20, rcounts=0x868a188, disps=0x868abb8, rdtype=0x8049400, comm=0x86aa300, module=0x86aae18) at ../../../../../ompi/mca/coll/inter/coll_inter_allgatherv.c:134
>>> #14 0xf7e9398f in ompi_comm_determine_first (intercomm=0x86aa300, high=0) at ../../ompi/communicator/comm.c:1199
>>> #15 0xf7ed7833 in PMPI_Intercomm_merge (intercomm=0x86aa300, high=0, newcomm=0xff8241d0) at pintercomm_merge.c:84
>>> #16 0x08048afd in main (argc=943274038, argv=0x33393133) at client.c:47
>>>
>>> What do you think may cause the problem?
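Both backtraces sit in the allgatherv that implements ompi_comm_determine_first, i.e. inside the merge of the n-1 intercommunicator Edgar refers to above. For readers following along, here is a hedged sketch of the incremental accept-and-merge pattern the thread describes; port_name, nclients and the loop structure are illustrative assumptions, not the actual server.c:

/* Sketch of the n-1 accept/merge loop described in this thread; assumes
 * "port_name" came from MPI_Open_port and is known to the clients, and
 * "nclients" is how many singletons will join. Not the actual server.c. */
#include <mpi.h>

void build_grid(const char *port_name, int nclients)
{
    MPI_Comm current_intracomm = MPI_COMM_WORLD;   /* effectively MPI_COMM_SELF for a singleton */
    MPI_Comm intercomm, merged;

    for (int i = 0; i < nclients; i++) {
        /* every process already in current_intracomm takes part in the accept */
        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, current_intracomm, &intercomm);

        /* the call the backtraces above show everyone blocked in */
        MPI_Intercomm_merge(intercomm, 0, &merged);

        MPI_Comm_disconnect(&intercomm);
        if (current_intracomm != MPI_COMM_WORLD)
            MPI_Comm_free(&current_intracomm);
        current_intracomm = merged;                /* the new client is now part of it */
    }
}

Each new client performs the matching MPI_Comm_connect and the same merge, which is why every process already in current_intracomm has to participate each time a client joins.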
>>> >>> >>> 2010/7/26 Ralph Castain <r...@open-mpi.org>: >>>> No problem at all - glad it works! >>>> >>>> On Jul 26, 2010, at 7:58 AM, Grzegorz Maj wrote: >>>> >>>>> Hi, >>>>> I'm very sorry, but the problem was on my side. My installation >>>>> process was not always taking the newest sources of openmpi. In this >>>>> case it hasn't installed the version with the latest patch. Now I >>>>> think everything works fine - I could run over 130 processes with no >>>>> problems. >>>>> I'm sorry again that I've wasted your time. And thank you for the patch. >>>>> >>>>> 2010/7/21 Ralph Castain <r...@open-mpi.org>: >>>>>> We're having some problem replicating this once my patches are applied. >>>>>> Can you send us your configure cmd? Just the output from "head >>>>>> config.log" will do for now. >>>>>> >>>>>> Thanks! >>>>>> >>>>>> On Jul 20, 2010, at 9:09 AM, Grzegorz Maj wrote: >>>>>> >>>>>>> My start script looks almost exactly the same as the one published by >>>>>>> Edgar, ie. the processes are starting one by one with no delay. >>>>>>> >>>>>>> 2010/7/20 Ralph Castain <r...@open-mpi.org>: >>>>>>>> Grzegorz: something occurred to me. When you start all these >>>>>>>> processes, how are you staggering their wireup? Are they flooding us, >>>>>>>> or are you time-shifting them a little? >>>>>>>> >>>>>>>> >>>>>>>> On Jul 19, 2010, at 10:32 AM, Edgar Gabriel wrote: >>>>>>>> >>>>>>>>> Hm, so I am not sure how to approach this. First of all, the test case >>>>>>>>> works for me. I used up to 80 clients, and for both optimized and >>>>>>>>> non-optimized compilation. I ran the tests with trunk (not with 1.4 >>>>>>>>> series, but the communicator code is identical in both cases). >>>>>>>>> Clearly, >>>>>>>>> the patch from Ralph is necessary to make it work. >>>>>>>>> >>>>>>>>> Additionally, I went through the communicator creation code for >>>>>>>>> dynamic >>>>>>>>> communicators trying to find spots that could create problems. The >>>>>>>>> only >>>>>>>>> place that I found the number 64 appear is the fortran-to-c mapping >>>>>>>>> arrays (e.g. for communicators), where the initial size of the table >>>>>>>>> is >>>>>>>>> 64. I looked twice over the pointer-array code to see whether we could >>>>>>>>> have a problem their (since it is a key-piece of the cid allocation >>>>>>>>> code >>>>>>>>> for communicators), but I am fairly confident that it is correct. >>>>>>>>> >>>>>>>>> Note, that we have other (non-dynamic tests), were comm_set is called >>>>>>>>> 100,000 times, and the code per se does not seem to have a problem due >>>>>>>>> to being called too often. So I am not sure what else to look at. >>>>>>>>> >>>>>>>>> Edgar >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 7/13/2010 8:42 PM, Ralph Castain wrote: >>>>>>>>>> As far as I can tell, it appears the problem is somewhere in our >>>>>>>>>> communicator setup. The people knowledgeable on that area are going >>>>>>>>>> to look into it later this week. >>>>>>>>>> >>>>>>>>>> I'm creating a ticket to track the problem and will copy you on it. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Jul 13, 2010, at 6:57 AM, Ralph Castain wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Jul 13, 2010, at 3:36 AM, Grzegorz Maj wrote: >>>>>>>>>>> >>>>>>>>>>>> Bad news.. >>>>>>>>>>>> I've tried the latest patch with and without the prior one, but it >>>>>>>>>>>> hasn't changed anything. I've also tried using the old code but >>>>>>>>>>>> with >>>>>>>>>>>> the OMPI_DPM_BASE_MAXJOBIDS constant changed to 80, but it also >>>>>>>>>>>> didn't >>>>>>>>>>>> help. 
>>>>>>>>>>>> While looking through the sources of openmpi-1.4.2 I couldn't find any call of the function ompi_dpm_base_mark_dyncomm.
>>>>>>>>>>>
>>>>>>>>>>> It isn't directly called - it shows in ompi_comm_set as ompi_dpm.mark_dyncomm. You were definitely overrunning that array, but I guess something else is also being hit. Have to look further...
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2010/7/12 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>>>> Just so you don't have to wait for the 1.4.3 release, here is the patch (doesn't include the prior patch).
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Jul 12, 2010, at 12:13 PM, Grzegorz Maj wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2010/7/12 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>>>>>> Dug around a bit and found the problem!!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have no idea who did this or why, but somebody set a limit of 64 separate jobids in the dynamic init called by ompi_comm_set, which builds the intercommunicator. Unfortunately, they hard-wired the array size but never checked that size before adding to it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So after 64 calls to connect_accept, you are overwriting other areas of the code. As you found, hitting 66 causes it to segfault.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'll fix this on the developer's trunk (I'll also add that original patch to it). Rather than my searching this thread in detail, can you remind me what version you are using so I can patch it too?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm using 1.4.2
>>>>>>>>>>>>>> Thanks a lot and I'm looking forward to the patch.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for your patience with this!
>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Jul 12, 2010, at 7:20 AM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1024 is not the problem: changing it to 2048 hasn't changed anything.
>>>>>>>>>>>>>>>> Following your advice I've run my process using gdb. Unfortunately I didn't get anything more than:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>>>>>>>>>>>>> [Switching to Thread 0xf7e4c6c0 (LWP 20246)]
>>>>>>>>>>>>>>>> 0xf7f39905 in ompi_comm_set () from /home/gmaj/openmpi/lib/libmpi.so.0
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (gdb) bt
>>>>>>>>>>>>>>>> #0  0xf7f39905 in ompi_comm_set () from /home/gmaj/openmpi/lib/libmpi.so.0
>>>>>>>>>>>>>>>> #1  0xf7e3ba95 in connect_accept () from /home/gmaj/openmpi/lib/openmpi/mca_dpm_orte.so
>>>>>>>>>>>>>>>> #2  0xf7f62013 in PMPI_Comm_connect () from /home/gmaj/openmpi/lib/libmpi.so.0
>>>>>>>>>>>>>>>> #3  0x080489ed in main (argc=825832753, argv=0x34393638) at client.c:43
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What's more: when I added a breakpoint on ompi_comm_set in the 66th process and stepped a couple of instructions, one of the other processes crashed (as usual, on ompi_comm_set) earlier than the 66th did.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Finally I decided to recompile openmpi using the -g flag for gcc. In this case the 66-process issue was gone!
>>>>>>>>>>>>>>>> I was running my applications exactly the same way as previously (even without recompilation) and I've run successfully over 130 processes. When switching back to the openmpi compilation without -g it again segfaults.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Any ideas? I'm really confused.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>>>>>>>> I would guess the #files limit of 1024. However, if it behaves the same way when spread across multiple machines, I would suspect it is somewhere in your program itself. Given that the segfault is in your process, can you use gdb to look at the core file and see where and why it fails?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Jul 7, 2010, at 10:17 AM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Jul 6, 2010, at 8:48 AM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Ralph,
>>>>>>>>>>>>>>>>>>>> sorry for the late response, but I couldn't find free time to play with this. Finally I've applied the patch you prepared. I've launched my processes in the way you've described and I think it's working as you expected. None of my processes runs the orted daemon and they can perform MPI operations. Unfortunately I'm still hitting the 65 processes issue :(
>>>>>>>>>>>>>>>>>>>> Maybe I'm doing something wrong.
>>>>>>>>>>>>>>>>>>>> I attach my source code. If anybody could have a look at this, I would be grateful.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> When I run that code with clients_count <= 65 everything works fine: all the processes create a common grid, exchange some information and disconnect.
>>>>>>>>>>>>>>>>>>>> When I set clients_count > 65 the 66th process crashes on MPI_Comm_connect (segmentation fault).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I didn't have time to check the code, but my guess is that you are still hitting some kind of file descriptor or other limit. Check to see what your limits are - usually "ulimit" will tell you.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> My limitations are:
>>>>>>>>>>>>>>>>>> time(seconds)        unlimited
>>>>>>>>>>>>>>>>>> file(blocks)         unlimited
>>>>>>>>>>>>>>>>>> data(kb)             unlimited
>>>>>>>>>>>>>>>>>> stack(kb)            10240
>>>>>>>>>>>>>>>>>> coredump(blocks)     0
>>>>>>>>>>>>>>>>>> memory(kb)           unlimited
>>>>>>>>>>>>>>>>>> locked memory(kb)    64
>>>>>>>>>>>>>>>>>> process              200704
>>>>>>>>>>>>>>>>>> nofiles              1024
>>>>>>>>>>>>>>>>>> vmemory(kb)          unlimited
>>>>>>>>>>>>>>>>>> locks                unlimited
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Which one do you think could be responsible for that?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I was trying to run all the 66 processes on one machine or spread them across several machines and it always crashes the same way on the 66th process.
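As an aside on the "nofiles 1024" entry above: the open-descriptor limit is the one discussed here and again further down in the thread. A small standalone check (standard POSIX, not part of the attached sources) that prints it might look like this:

/* Standalone sketch (standard POSIX, not from the attached client.c/server.c):
 * print the open-file-descriptor limit shown as "nofiles" above. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("nofiles: soft=%llu hard=%llu\n",
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max);
    return 0;
}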
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Another thing I would like to know is if it's normal that >>>>>>>>>>>>>>>>>>>> any of my >>>>>>>>>>>>>>>>>>>> processes when calling MPI_Comm_connect or MPI_Comm_accept >>>>>>>>>>>>>>>>>>>> when the >>>>>>>>>>>>>>>>>>>> other side is not ready, is eating up a full CPU available. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Yes - the waiting process is polling in a tight loop >>>>>>>>>>>>>>>>>>> waiting for the connection to be made. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Any help would be appreciated, >>>>>>>>>>>>>>>>>>>> Grzegorz Maj >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 2010/4/24 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>>>>>> Actually, OMPI is distributed with a daemon that does >>>>>>>>>>>>>>>>>>>>> pretty much what you >>>>>>>>>>>>>>>>>>>>> want. Checkout "man ompi-server". I originally wrote that >>>>>>>>>>>>>>>>>>>>> code to support >>>>>>>>>>>>>>>>>>>>> cross-application MPI publish/subscribe operations, but >>>>>>>>>>>>>>>>>>>>> we can utilize it >>>>>>>>>>>>>>>>>>>>> here too. Have to blame me for not making it more >>>>>>>>>>>>>>>>>>>>> publicly known. >>>>>>>>>>>>>>>>>>>>> The attached patch upgrades ompi-server and modifies the >>>>>>>>>>>>>>>>>>>>> singleton startup >>>>>>>>>>>>>>>>>>>>> to provide your desired support. This solution works in >>>>>>>>>>>>>>>>>>>>> the following >>>>>>>>>>>>>>>>>>>>> manner: >>>>>>>>>>>>>>>>>>>>> 1. launch "ompi-server -report-uri <filename>". This >>>>>>>>>>>>>>>>>>>>> starts a persistent >>>>>>>>>>>>>>>>>>>>> daemon called "ompi-server" that acts as a rendezvous >>>>>>>>>>>>>>>>>>>>> point for >>>>>>>>>>>>>>>>>>>>> independently started applications. The problem with >>>>>>>>>>>>>>>>>>>>> starting different >>>>>>>>>>>>>>>>>>>>> applications and wanting them to MPI connect/accept lies >>>>>>>>>>>>>>>>>>>>> in the need to have >>>>>>>>>>>>>>>>>>>>> the applications find each other. If they can't discover >>>>>>>>>>>>>>>>>>>>> contact info for >>>>>>>>>>>>>>>>>>>>> the other app, then they can't wire up their >>>>>>>>>>>>>>>>>>>>> interconnects. The >>>>>>>>>>>>>>>>>>>>> "ompi-server" tool provides that rendezvous point. I >>>>>>>>>>>>>>>>>>>>> don't like that >>>>>>>>>>>>>>>>>>>>> comm_accept segfaulted - should have just error'd out. >>>>>>>>>>>>>>>>>>>>> 2. set OMPI_MCA_orte_server=file:<filename>" in the >>>>>>>>>>>>>>>>>>>>> environment where you >>>>>>>>>>>>>>>>>>>>> will start your processes. This will allow your singleton >>>>>>>>>>>>>>>>>>>>> processes to find >>>>>>>>>>>>>>>>>>>>> the ompi-server. I automatically also set the envar to >>>>>>>>>>>>>>>>>>>>> connect the MPI >>>>>>>>>>>>>>>>>>>>> publish/subscribe system for you. >>>>>>>>>>>>>>>>>>>>> 3. run your processes. As they think they are singletons, >>>>>>>>>>>>>>>>>>>>> they will detect >>>>>>>>>>>>>>>>>>>>> the presence of the above envar and automatically connect >>>>>>>>>>>>>>>>>>>>> themselves to the >>>>>>>>>>>>>>>>>>>>> "ompi-server" daemon. This provides each process with the >>>>>>>>>>>>>>>>>>>>> ability to perform >>>>>>>>>>>>>>>>>>>>> any MPI-2 operation. >>>>>>>>>>>>>>>>>>>>> I tested this on my machines and it worked, so hopefully >>>>>>>>>>>>>>>>>>>>> it will meet your >>>>>>>>>>>>>>>>>>>>> needs. 
You only need to run one "ompi-server" period, so >>>>>>>>>>>>>>>>>>>>> long as you locate >>>>>>>>>>>>>>>>>>>>> it where all of the processes can find the contact file >>>>>>>>>>>>>>>>>>>>> and can open a TCP >>>>>>>>>>>>>>>>>>>>> socket to the daemon. There is a way to knit multiple >>>>>>>>>>>>>>>>>>>>> ompi-servers into a >>>>>>>>>>>>>>>>>>>>> broader network (e.g., to connect processes that cannot >>>>>>>>>>>>>>>>>>>>> directly access a >>>>>>>>>>>>>>>>>>>>> server due to network segmentation), but it's a tad >>>>>>>>>>>>>>>>>>>>> tricky - let me know if >>>>>>>>>>>>>>>>>>>>> you require it and I'll try to help. >>>>>>>>>>>>>>>>>>>>> If you have trouble wiring them all into a single >>>>>>>>>>>>>>>>>>>>> communicator, you might >>>>>>>>>>>>>>>>>>>>> ask separately about that and see if one of our MPI >>>>>>>>>>>>>>>>>>>>> experts can provide >>>>>>>>>>>>>>>>>>>>> advice (I'm just the RTE grunt). >>>>>>>>>>>>>>>>>>>>> HTH - let me know how this works for you and I'll >>>>>>>>>>>>>>>>>>>>> incorporate it into future >>>>>>>>>>>>>>>>>>>>> OMPI releases. >>>>>>>>>>>>>>>>>>>>> Ralph >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Apr 24, 2010, at 1:49 AM, Krzysztof Zarzycki wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hi Ralph, >>>>>>>>>>>>>>>>>>>>> I'm Krzysztof and I'm working with Grzegorz Maj on this >>>>>>>>>>>>>>>>>>>>> our small >>>>>>>>>>>>>>>>>>>>> project/experiment. >>>>>>>>>>>>>>>>>>>>> We definitely would like to give your patch a try. But >>>>>>>>>>>>>>>>>>>>> could you please >>>>>>>>>>>>>>>>>>>>> explain your solution a little more? >>>>>>>>>>>>>>>>>>>>> You still would like to start one mpirun per mpi grid, >>>>>>>>>>>>>>>>>>>>> and then have >>>>>>>>>>>>>>>>>>>>> processes started by us to join the MPI comm? >>>>>>>>>>>>>>>>>>>>> It is a good solution of course. >>>>>>>>>>>>>>>>>>>>> But it would be especially preferable to have one daemon >>>>>>>>>>>>>>>>>>>>> running >>>>>>>>>>>>>>>>>>>>> persistently on our "entry" machine that can handle >>>>>>>>>>>>>>>>>>>>> several mpi grid starts. >>>>>>>>>>>>>>>>>>>>> Can your patch help us this way too? >>>>>>>>>>>>>>>>>>>>> Thanks for your help! >>>>>>>>>>>>>>>>>>>>> Krzysztof >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On 24 April 2010 03:51, Ralph Castain <r...@open-mpi.org> >>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> In thinking about this, my proposed solution won't >>>>>>>>>>>>>>>>>>>>>> entirely fix the >>>>>>>>>>>>>>>>>>>>>> problem - you'll still wind up with all those daemons. I >>>>>>>>>>>>>>>>>>>>>> believe I can >>>>>>>>>>>>>>>>>>>>>> resolve that one as well, but it would require a patch. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Would you like me to send you something you could try? >>>>>>>>>>>>>>>>>>>>>> Might take a couple >>>>>>>>>>>>>>>>>>>>>> of iterations to get it right... >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Apr 23, 2010, at 12:12 PM, Ralph Castain wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Hmmm....I -think- this will work, but I cannot >>>>>>>>>>>>>>>>>>>>>>> guarantee it: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> 1. launch one process (can just be a spinner) using >>>>>>>>>>>>>>>>>>>>>>> mpirun that includes >>>>>>>>>>>>>>>>>>>>>>> the following option: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> mpirun -report-uri file >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> where file is some filename that mpirun can create and >>>>>>>>>>>>>>>>>>>>>>> insert its >>>>>>>>>>>>>>>>>>>>>>> contact info into it. 
This can be a relative or >>>>>>>>>>>>>>>>>>>>>>> absolute path. This process >>>>>>>>>>>>>>>>>>>>>>> must remain alive throughout your application - doesn't >>>>>>>>>>>>>>>>>>>>>>> matter what it does. >>>>>>>>>>>>>>>>>>>>>>> It's purpose is solely to keep mpirun alive. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> 2. set OMPI_MCA_dpm_orte_server=FILE:file in your >>>>>>>>>>>>>>>>>>>>>>> environment, where >>>>>>>>>>>>>>>>>>>>>>> "file" is the filename given above. This will tell your >>>>>>>>>>>>>>>>>>>>>>> processes how to >>>>>>>>>>>>>>>>>>>>>>> find mpirun, which is acting as a meeting place to >>>>>>>>>>>>>>>>>>>>>>> handle the connect/accept >>>>>>>>>>>>>>>>>>>>>>> operations >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Now run your processes, and have them connect/accept to >>>>>>>>>>>>>>>>>>>>>>> each other. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> The reason I cannot guarantee this will work is that >>>>>>>>>>>>>>>>>>>>>>> these processes >>>>>>>>>>>>>>>>>>>>>>> will all have the same rank && name since they all >>>>>>>>>>>>>>>>>>>>>>> start as singletons. >>>>>>>>>>>>>>>>>>>>>>> Hence, connect/accept is likely to fail. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> But it -might- work, so you might want to give it a try. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Apr 23, 2010, at 8:10 AM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> To be more precise: by 'server process' I mean some >>>>>>>>>>>>>>>>>>>>>>>> process that I >>>>>>>>>>>>>>>>>>>>>>>> could run once on my system and it could help in >>>>>>>>>>>>>>>>>>>>>>>> creating those >>>>>>>>>>>>>>>>>>>>>>>> groups. >>>>>>>>>>>>>>>>>>>>>>>> My typical scenario is: >>>>>>>>>>>>>>>>>>>>>>>> 1. run N separate processes, each without mpirun >>>>>>>>>>>>>>>>>>>>>>>> 2. connect them into MPI group >>>>>>>>>>>>>>>>>>>>>>>> 3. do some job >>>>>>>>>>>>>>>>>>>>>>>> 4. exit all N processes >>>>>>>>>>>>>>>>>>>>>>>> 5. goto 1 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> 2010/4/23 Grzegorz Maj <ma...@wp.pl>: >>>>>>>>>>>>>>>>>>>>>>>>> Thank you Ralph for your explanation. >>>>>>>>>>>>>>>>>>>>>>>>> And, apart from that descriptors' issue, is there any >>>>>>>>>>>>>>>>>>>>>>>>> other way to >>>>>>>>>>>>>>>>>>>>>>>>> solve my problem, i.e. to run separately a number of >>>>>>>>>>>>>>>>>>>>>>>>> processes, >>>>>>>>>>>>>>>>>>>>>>>>> without mpirun and then to collect them into an MPI >>>>>>>>>>>>>>>>>>>>>>>>> intracomm group? >>>>>>>>>>>>>>>>>>>>>>>>> If I for example would need to run some 'server >>>>>>>>>>>>>>>>>>>>>>>>> process' (even using >>>>>>>>>>>>>>>>>>>>>>>>> mpirun) for this task, that's OK. Any ideas? >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>> Grzegorz Maj >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>>>>>>>>>>> Okay, but here is the problem. If you don't use >>>>>>>>>>>>>>>>>>>>>>>>>> mpirun, and are not >>>>>>>>>>>>>>>>>>>>>>>>>> operating in an environment we support for "direct" >>>>>>>>>>>>>>>>>>>>>>>>>> launch (i.e., starting >>>>>>>>>>>>>>>>>>>>>>>>>> processes outside of mpirun), then every one of >>>>>>>>>>>>>>>>>>>>>>>>>> those processes thinks it is >>>>>>>>>>>>>>>>>>>>>>>>>> a singleton - yes? 
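To make the two rendezvous setups described above concrete (a persistent ompi-server, or an mpirun kept alive with -report-uri, plus the OMPI_MCA_orte_server or OMPI_MCA_dpm_orte_server environment variable pointing at the contact file), here is a hedged sketch of how singleton processes might then find each other through the MPI-2 name service. The service name "my-grid" and the function layout are illustrative assumptions, not the attached client.c/server.c:

/* Hedged sketch, not the attached sources: once the processes can reach the
 * rendezvous daemon (ompi-server or the kept-alive mpirun), one singleton
 * publishes a port and the others look it up and connect. */
#include <mpi.h>

void run_server_side(MPI_Comm *intercomm)
{
    char port[MPI_MAX_PORT_NAME];

    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Publish_name("my-grid", MPI_INFO_NULL, port);    /* stored at the rendezvous point */
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, intercomm);
    MPI_Unpublish_name("my-grid", MPI_INFO_NULL, port);
    MPI_Close_port(port);
}

void run_client_side(MPI_Comm *intercomm)
{
    char port[MPI_MAX_PORT_NAME];

    MPI_Lookup_name("my-grid", MPI_INFO_NULL, port);     /* resolved via the rendezvous point */
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, intercomm);
}

Whether a published name or a port string passed out of band is used, the key point from the discussion above is only that every process can reach the same rendezvous daemon for the connect/accept wire-up.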
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> What you may not realize is that each singleton >>>>>>>>>>>>>>>>>>>>>>>>>> immediately >>>>>>>>>>>>>>>>>>>>>>>>>> fork/exec's an orted daemon that is configured to >>>>>>>>>>>>>>>>>>>>>>>>>> behave just like mpirun. >>>>>>>>>>>>>>>>>>>>>>>>>> This is required in order to support MPI-2 >>>>>>>>>>>>>>>>>>>>>>>>>> operations such as >>>>>>>>>>>>>>>>>>>>>>>>>> MPI_Comm_spawn, MPI_Comm_connect/accept, etc. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> So if you launch 64 processes that think they are >>>>>>>>>>>>>>>>>>>>>>>>>> singletons, then >>>>>>>>>>>>>>>>>>>>>>>>>> you have 64 copies of orted running as well. This >>>>>>>>>>>>>>>>>>>>>>>>>> eats up a lot of file >>>>>>>>>>>>>>>>>>>>>>>>>> descriptors, which is probably why you are hitting >>>>>>>>>>>>>>>>>>>>>>>>>> this 65 process limit - >>>>>>>>>>>>>>>>>>>>>>>>>> your system is probably running out of file >>>>>>>>>>>>>>>>>>>>>>>>>> descriptors. You might check you >>>>>>>>>>>>>>>>>>>>>>>>>> system limits and see if you can get them revised >>>>>>>>>>>>>>>>>>>>>>>>>> upward. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Apr 17, 2010, at 4:24 PM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I know. The problem is that I need to use some >>>>>>>>>>>>>>>>>>>>>>>>>>> special way for >>>>>>>>>>>>>>>>>>>>>>>>>>> running my processes provided by the environment in >>>>>>>>>>>>>>>>>>>>>>>>>>> which I'm >>>>>>>>>>>>>>>>>>>>>>>>>>> working >>>>>>>>>>>>>>>>>>>>>>>>>>> and unfortunately I can't use mpirun. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>>>>>>>>>>>>> Guess I don't understand why you can't use mpirun >>>>>>>>>>>>>>>>>>>>>>>>>>>> - all it does is >>>>>>>>>>>>>>>>>>>>>>>>>>>> start things, provide a means to forward io, etc. >>>>>>>>>>>>>>>>>>>>>>>>>>>> It mainly sits there >>>>>>>>>>>>>>>>>>>>>>>>>>>> quietly without using any cpu unless required to >>>>>>>>>>>>>>>>>>>>>>>>>>>> support the job. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Sounds like it would solve your problem. >>>>>>>>>>>>>>>>>>>>>>>>>>>> Otherwise, I know of no >>>>>>>>>>>>>>>>>>>>>>>>>>>> way to get all these processes into comm_world. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Apr 17, 2010, at 2:27 PM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'd like to dynamically create a group of >>>>>>>>>>>>>>>>>>>>>>>>>>>>> processes communicating >>>>>>>>>>>>>>>>>>>>>>>>>>>>> via >>>>>>>>>>>>>>>>>>>>>>>>>>>>> MPI. Those processes need to be run without >>>>>>>>>>>>>>>>>>>>>>>>>>>>> mpirun and create >>>>>>>>>>>>>>>>>>>>>>>>>>>>> intracommunicator after the startup. Any ideas >>>>>>>>>>>>>>>>>>>>>>>>>>>>> how to do this >>>>>>>>>>>>>>>>>>>>>>>>>>>>> efficiently? >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I came up with a solution in which the processes >>>>>>>>>>>>>>>>>>>>>>>>>>>>> are connecting >>>>>>>>>>>>>>>>>>>>>>>>>>>>> one by >>>>>>>>>>>>>>>>>>>>>>>>>>>>> one using MPI_Comm_connect, but unfortunately all >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the processes >>>>>>>>>>>>>>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>>>>>>>>>>>>> are already in the group need to call >>>>>>>>>>>>>>>>>>>>>>>>>>>>> MPI_Comm_accept. 
This means >>>>>>>>>>>>>>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>>>>>>>>>>>>> when the n-th process wants to connect I need to >>>>>>>>>>>>>>>>>>>>>>>>>>>>> collect all the >>>>>>>>>>>>>>>>>>>>>>>>>>>>> n-1 >>>>>>>>>>>>>>>>>>>>>>>>>>>>> processes on the MPI_Comm_accept call. After I >>>>>>>>>>>>>>>>>>>>>>>>>>>>> run about 40 >>>>>>>>>>>>>>>>>>>>>>>>>>>>> processes >>>>>>>>>>>>>>>>>>>>>>>>>>>>> every subsequent call takes more and more time, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> which I'd like to >>>>>>>>>>>>>>>>>>>>>>>>>>>>> avoid. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Another problem in this solution is that when I >>>>>>>>>>>>>>>>>>>>>>>>>>>>> try to connect >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 66-th >>>>>>>>>>>>>>>>>>>>>>>>>>>>> process the root of the existing group segfaults >>>>>>>>>>>>>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>>>>>>>>>>>>> MPI_Comm_accept. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Maybe it's my bug, but it's weird as everything >>>>>>>>>>>>>>>>>>>>>>>>>>>>> works fine for at >>>>>>>>>>>>>>>>>>>>>>>>>>>>> most >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 65 processes. Is there any limitation I don't >>>>>>>>>>>>>>>>>>>>>>>>>>>>> know about? >>>>>>>>>>>>>>>>>>>>>>>>>>>>> My last question is about MPI_COMM_WORLD. When I >>>>>>>>>>>>>>>>>>>>>>>>>>>>> run my processes >>>>>>>>>>>>>>>>>>>>>>>>>>>>> without mpirun their MPI_COMM_WORLD is the same >>>>>>>>>>>>>>>>>>>>>>>>>>>>> as MPI_COMM_SELF. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Is >>>>>>>>>>>>>>>>>>>>>>>>>>>>> there any way to change MPI_COMM_WORLD and set it >>>>>>>>>>>>>>>>>>>>>>>>>>>>> to the >>>>>>>>>>>>>>>>>>>>>>>>>>>>> intracommunicator that I've created? >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Grzegorz Maj >>>>>>>>>>>>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>>>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>>>>>>>>> 
> --
> Edgar Gabriel
> Assistant Professor
> Parallel Software Technologies Lab      http://pstl.cs.uh.edu
> Department of Computer Science          University of Houston
> Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
> Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335