Re: [OMPI users] Issues with Get/Put and IRecv
Well, I've managed to get a working solution, but I'm not sure how I got there. I built a test case that looked like a nice simple version of what I was trying to do and it worked, so I moved the test code into my implementation and, lo and behold, it works. I must have been doing something a little funky in the original pass, likely causing a stack smash somewhere or trying to do a get/put out of bounds. If I have any more problems, I'll let y'all know. I've tested pretty heavy usage up to 128 MPI processes across 16 nodes and things seem to be behaving. I did notice that one-sided transfers seem to be a little slower than explicit send/recv, at least on GigE. Once I do some more testing, I'll bring things up on IB and see how things are going. -Mike

Mike Houston wrote:

Brian Barrett wrote: On Mar 20, 2007, at 3:15 PM, Mike Houston wrote:

If I only do gets/puts, things seem to be working correctly with version 1.2. However, if I have a posted Irecv on the target node and issue an MPI_Get against that target, MPI_Test on the posted Irecv causes a segfault. Anyone have suggestions? Sadly, I need to have Irecvs posted. I'll attempt to find a workaround, but it looks like the posted Irecv is getting all the data of the MPI_Get from the other node. It's like the message tagging is getting ignored. I've never tried posting two different Irecvs with different message tags either...

Hi Mike - I've spent some time this afternoon looking at the problem and have some ideas on what could be happening. I don't think it's a data mismatch (the data intended for the Irecv getting delivered to the Get), but more a problem with the call to MPI_Test perturbing the progress flow of the one-sided engine. I can see one or two places where it's possible this could happen, although I'm having trouble replicating the problem with any test case I can write. Is it possible for you to share the code causing the problem (or some small test case)? It would make me feel considerably better if I could really understand the conditions required to end up in a segfault state. Thanks, Brian

Well, I can give you a Linux x86 binary if that would do it. The code is huge as it's part of a much larger system, so there is no such thing as a simple case at the moment, and the code is in pieces and largely unrunnable now with all the hacking... I basically have one thread spinning on an MPI_Test on a posted Irecv while being used as the target of the MPI_Get. I'll see if I can hack together a simple version that breaks late tonight.
I've just played with posting a send to that Irecv, issuing the MPI_Get, handshaking and then posting another Irecv, and the MPI_Test continues to eat it, but in a memcpy:

#0  0x001c068c in memcpy () from /lib/libc.so.6
#1  0x00e412d9 in ompi_convertor_pack (pConv=0x83c1198, iov=0xa0, out_size=0xaffc1fd8, max_data=0xaffc1fdc) at convertor.c:254
#2  0x00ea265d in ompi_osc_pt2pt_replyreq_send (module=0x856e668, replyreq=0x83c1180) at osc_pt2pt_data_move.c:411
#3  0x00ea0ebe in ompi_osc_pt2pt_component_fragment_cb (pt2pt_buffer=0x8573380) at osc_pt2pt_component.c:582
#4  0x00ea1389 in ompi_osc_pt2pt_progress () at osc_pt2pt_component.c:769
#5  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#6  0x00ea59e5 in ompi_osc_pt2pt_passive_unlock (module=0x856e668, origin=1, count=1) at osc_pt2pt_sync.c:60
#7  0x00ea0cd2 in ompi_osc_pt2pt_component_fragment_cb (pt2pt_buffer=0x856f300) at osc_pt2pt_component.c:688
#8  0x00ea1389 in ompi_osc_pt2pt_progress () at osc_pt2pt_component.c:769
#9  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#10 0x00e33f05 in ompi_request_test (rptr=0xaffc2430, completed=0xaffc2434, status=0xaffc23fc) at request/req_test.c:82
#11 0x00e61770 in PMPI_Test (request=0xaffc2430, completed=0xaffc2434, status=0xaffc23fc) at ptest.c:52

-Mike
Re: [OMPI users] multithreading support
On Mar 16, 2007, at 1:35 AM, Chevchenkovic Chevchenkovic wrote: Could someone let me know about the status of multithread support in Open MPI and MVAPICH? I got some details about MVAPICH which say that it is supported in MVAPICH2, but I am not sure of the same for Open MPI.

Open MPI's threading support has had "light testing", at best. In reality, it probably will not work. Threading support was designed into the system from the beginning, but we have not really gotten around to debugging/testing it yet. It is possible that we will do so over the next few months; a few of the Open MPI member organizations have indicated that they will be working on threading support for the v1.3 series (probably towards the end of this year -- v1.3 is very much in the planning/definition stage at this point).

We cannot really comment on MVAPICH2 here; it's an entirely different software project. You'll probably want to post to their mailing list to get an answer. -- Jeff Squyres Cisco Systems
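[For reference, a minimal sketch (not from this thread) of the standard way an application requests a thread level and checks what the library actually provides; the requested level and the fallback message are illustrative only.]

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* Ask for full multi-threaded support; the library reports the level
       it can actually deliver in 'provided'. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided < MPI_THREAD_MULTIPLE) {
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            printf("Requested MPI_THREAD_MULTIPLE, got level %d; "
                   "restricting MPI calls to a single thread\n", provided);
        }
    }

    MPI_Finalize();
    return 0;
}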
Re: [OMPI users] Signal 13
FWIW, most LDAP installations I have seen have ended up doing the same thing -- if you have a large enough cluster, you have MPI jobs starting all the time, and rate control of a single job startup is not sufficient to avoid overloading your LDAP server. The solutions that I have seen typically have a job fired once a day via cron that dumps the relevant information from LDAP into local /etc/passwd, shadow, and group files, and then simply use those for authentication across the cluster. Hope that helps.

On Mar 18, 2007, at 8:34 PM, David Bronke wrote: That's great to hear! For now we'll just create local users for those who need access to MPI on this system, but I'll keep an eye on the list for when you do get a chance to finish that fix. Thanks again!

On 3/18/07, Ralph Castain wrote: Excellent! Yes, we use pipe() in several places, including in the run-time during various stages of launch, so that could be a problem. Also, be aware that other users have reported problems on LDAP-based systems when attempting to launch large jobs. The problem is that the OpenMPI launch system has no rate control in it - the LDAP slapd servers get overwhelmed by the launch when we ssh to a large number of nodes. I promised another user to concoct a fix for this problem, but am taking a break from the project for a few months, so it may be a little while before a fix is available. When I do get it done, it may or may not make it into an OpenMPI release for some time - I'm not sure how they will decide to schedule the change (is it a "bug", or a new "feature"?). So I may do an interim release as a patch on the OpenRTE site (since that is the run-time underneath OpenMPI). I'll let people know via this mailing list either way. Ralph

On 3/18/07 2:06 PM, "David Bronke" wrote: I just received an email from a friend who is helping me work on resolving this; he was able to trace the problem back to a pipe() call in OpenMPI, apparently: The problem is with the pipe() system call (which is invoked by MPI_Send() as far as I can tell) by an LDAP-authenticated user. Still working out where exactly that goes wrong, but the fact is that it isn't actually a permissions problem - the reason it works as root is because root is a local user and goes through normal /etc/passwd authentication. I had forgotten to mention that we use LDAP for authentication on this machine; PAM and NSS are set up to use it, but I'm guessing that either OpenMPI itself or the pipe() system call won't check with them when needed... We have made some local users on the machine to get things going, but I'll probably have to find an LDAP mailing list to get this issue resolved. Thanks for all the help so far!

On 3/16/07, Ralph Castain wrote: I'm afraid I have zero knowledge or experience with gentoo portage, so I can't help you there. I always install our releases from the tarball source, as it is pretty trivial to do and avoids any issues. I will have to defer to someone who knows that system to help you from here. It sounds like an installation or configuration issue. Ralph

On 3/16/07 3:15 PM, "David Bronke" wrote: On 3/15/07, Ralph Castain wrote: Hmmm... well, a few thoughts to hopefully help with the debugging. One initial comment, though - 1.1.2 is quite old. You might want to upgrade to 1.2 (releasing momentarily - you can use the last release candidate in the interim as it is identical).

Version 1.2 doesn't seem to be in gentoo portage yet, so I may end up having to compile from source...
I generally prefer to do everything from portage if possible, because it makes upgrades and maintenance much cleaner.

Meantime, looking at this output, there appear to be a couple of common possibilities. First, I don't see any of the diagnostic output from after we do a local fork (we do this prior to actually launching the daemon). Is it possible your system doesn't allow you to fork processes (some don't, though it's unusual)?

I don't see any problems with forking on this system... I'm able to start a dbus daemon as a regular user without any problems.

Second, it could be that the "orted" program isn't being found in your path. People often forget that the path in shells started up by programs isn't necessarily the same as that in their login shell. You might try executing a simple shell script that outputs the results of "which orted" to verify this is correct.

'which orted' from a shell script gives me '/usr/bin/orted', which seems to be correct.

BTW, I should have asked as well: what are you running this on, and how did you configure openmpi?

I'm running this on two identical machines, each with two dual-core, hyperthreaded (EM64T) Xeon processors. I simply installed OpenMPI using portage, with the USE flags "debug fortran pbs -threads". (I've also tried it
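[A minimal sketch of the cron-based snapshot approach mentioned at the top of this thread, assuming NSS/getent is already wired to LDAP; the file names are placeholders, and the merge with local system accounts is deliberately left site-specific.]

#!/bin/sh
# Hypothetical nightly cron job: snapshot LDAP accounts into local files
# so that large parallel launches do not hammer the slapd servers.
# Assumes 'getent' resolves users and groups through LDAP via NSS.
getent passwd > /etc/passwd.ldap-snapshot
getent group  > /etc/group.ldap-snapshot
# A site-specific step would then merge these snapshots with the local
# system accounts and configure NSS to consult files before LDAP.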
Re: [OMPI users] Issues with Get/Put and IRecv
Brian Barrett wrote: On Mar 20, 2007, at 3:15 PM, Mike Houston wrote:

If I only do gets/puts, things seem to be working correctly with version 1.2. However, if I have a posted Irecv on the target node and issue an MPI_Get against that target, MPI_Test on the posted Irecv causes a segfault. Anyone have suggestions? Sadly, I need to have Irecvs posted. I'll attempt to find a workaround, but it looks like the posted Irecv is getting all the data of the MPI_Get from the other node. It's like the message tagging is getting ignored. I've never tried posting two different Irecvs with different message tags either...

Hi Mike - I've spent some time this afternoon looking at the problem and have some ideas on what could be happening. I don't think it's a data mismatch (the data intended for the Irecv getting delivered to the Get), but more a problem with the call to MPI_Test perturbing the progress flow of the one-sided engine. I can see one or two places where it's possible this could happen, although I'm having trouble replicating the problem with any test case I can write. Is it possible for you to share the code causing the problem (or some small test case)? It would make me feel considerably better if I could really understand the conditions required to end up in a segfault state. Thanks, Brian

Well, I can give you a Linux x86 binary if that would do it. The code is huge as it's part of a much larger system, so there is no such thing as a simple case at the moment, and the code is in pieces and largely unrunnable now with all the hacking... I basically have one thread spinning on an MPI_Test on a posted Irecv while being used as the target of the MPI_Get. I'll see if I can hack together a simple version that breaks late tonight.

I've just played with posting a send to that Irecv, issuing the MPI_Get, handshaking and then posting another Irecv, and the MPI_Test continues to eat it, but in a memcpy:

#0  0x001c068c in memcpy () from /lib/libc.so.6
#1  0x00e412d9 in ompi_convertor_pack (pConv=0x83c1198, iov=0xa0, out_size=0xaffc1fd8, max_data=0xaffc1fdc) at convertor.c:254
#2  0x00ea265d in ompi_osc_pt2pt_replyreq_send (module=0x856e668, replyreq=0x83c1180) at osc_pt2pt_data_move.c:411
#3  0x00ea0ebe in ompi_osc_pt2pt_component_fragment_cb (pt2pt_buffer=0x8573380) at osc_pt2pt_component.c:582
#4  0x00ea1389 in ompi_osc_pt2pt_progress () at osc_pt2pt_component.c:769
#5  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#6  0x00ea59e5 in ompi_osc_pt2pt_passive_unlock (module=0x856e668, origin=1, count=1) at osc_pt2pt_sync.c:60
#7  0x00ea0cd2 in ompi_osc_pt2pt_component_fragment_cb (pt2pt_buffer=0x856f300) at osc_pt2pt_component.c:688
#8  0x00ea1389 in ompi_osc_pt2pt_progress () at osc_pt2pt_component.c:769
#9  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#10 0x00e33f05 in ompi_request_test (rptr=0xaffc2430, completed=0xaffc2434, status=0xaffc23fc) at request/req_test.c:82
#11 0x00e61770 in PMPI_Test (request=0xaffc2430, completed=0xaffc2434, status=0xaffc23fc) at ptest.c:52

-Mike
Re: [OMPI users] Issues with Get/Put and IRecv
On Mar 20, 2007, at 3:15 PM, Mike Houston wrote:

If I only do gets/puts, things seem to be working correctly with version 1.2. However, if I have a posted Irecv on the target node and issue an MPI_Get against that target, MPI_Test on the posted Irecv causes a segfault. Anyone have suggestions? Sadly, I need to have Irecvs posted. I'll attempt to find a workaround, but it looks like the posted Irecv is getting all the data of the MPI_Get from the other node. It's like the message tagging is getting ignored. I've never tried posting two different Irecvs with different message tags either...

Hi Mike - I've spent some time this afternoon looking at the problem and have some ideas on what could be happening. I don't think it's a data mismatch (the data intended for the Irecv getting delivered to the Get), but more a problem with the call to MPI_Test perturbing the progress flow of the one-sided engine. I can see one or two places where it's possible this could happen, although I'm having trouble replicating the problem with any test case I can write. Is it possible for you to share the code causing the problem (or some small test case)? It would make me feel considerably better if I could really understand the conditions required to end up in a segfault state. Thanks, Brian
[OMPI users] Issues with Get/Put and IRecv
If I only do gets/puts, things seem to be working correctly with version 1.2. However, if I have a posted Irecv on the target node and issue an MPI_Get against that target, MPI_Test on the posted Irecv causes a segfault:

[expose:21249] *** Process received signal ***
[expose:21249] Signal: Segmentation fault (11)
[expose:21249] Signal code: Address not mapped (1)
[expose:21249] Failing at address: 0xa0
[expose:21249] [ 0] [0x96e440]
[expose:21249] [ 1] /usr/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_replyreq_send+0xed) [0x2c765d]
[expose:21249] [ 2] /usr/lib/openmpi/mca_osc_pt2pt.so [0x2c5ebe]
[expose:21249] [ 3] /usr/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_progress+0x119) [0x2c6389]
[expose:21249] [ 4] /usr/lib/libopen-pal.so.0(opal_progress+0x69) [0x67d019]
[expose:21249] [ 5] /usr/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_passive_unlock+0xb5) [0x2ca9e5]
[expose:21249] [ 6] /usr/lib/openmpi/mca_osc_pt2pt.so [0x2c5cd2]
[expose:21249] [ 7] /usr/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_progress+0x119) [0x2c6389]
[expose:21249] [ 8] /usr/lib/libopen-pal.so.0(opal_progress+0x69) [0x67d019]
[expose:21249] [ 9] /usr/lib/libmpi.so.0(ompi_request_test+0x35) [0x3d6f05]
[expose:21249] [10] /usr/lib/libmpi.so.0(PMPI_Test+0x80) [0x404770]

Anyone have suggestions? Sadly, I need to have Irecvs posted. I'll attempt to find a workaround, but it looks like the posted Irecv is getting all the data of the MPI_Get from the other node. It's like the message tagging is getting ignored. I've never tried posting two different Irecvs with different message tags either... -Mike
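[For illustration, a minimal sketch of the access pattern described above (not the actual application code): rank 0 exposes a window and spins on MPI_Test over a posted Irecv while rank 1 performs a passive-target MPI_Get against it and then sends the message that completes the Irecv. The buffer size, tag, and handshake message are placeholders; run with two processes.]

#include <mpi.h>

#define WIN_COUNT 1024
#define TAG_CTRL  42

int main(int argc, char **argv)
{
    int rank, flag = 0, ctrl = 0, i;
    double *winbuf, remote[WIN_COUNT];
    MPI_Win win;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Every rank exposes a window (window creation is collective). */
    MPI_Alloc_mem(WIN_COUNT * sizeof(double), MPI_INFO_NULL, &winbuf);
    for (i = 0; i < WIN_COUNT; i++) winbuf[i] = 0.0;
    MPI_Win_create(winbuf, WIN_COUNT * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    if (rank == 0) {
        /* Target: post a tagged Irecv and poll it with MPI_Test while the
           origin drives a one-sided MPI_Get against our window. */
        MPI_Irecv(&ctrl, 1, MPI_INT, 1, TAG_CTRL, MPI_COMM_WORLD, &req);
        while (!flag) {
            MPI_Test(&req, &flag, &status);   /* reported crash site */
        }
    } else if (rank == 1) {
        /* Origin: passive-target epoch with MPI_Get, then the point-to-point
           send that eventually completes rank 0's Irecv. */
        MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
        MPI_Get(remote, WIN_COUNT, MPI_DOUBLE,
                0, 0, WIN_COUNT, MPI_DOUBLE, win);
        MPI_Win_unlock(0, win);

        ctrl = 1;
        MPI_Send(&ctrl, 1, MPI_INT, 0, TAG_CTRL, MPI_COMM_WORLD);
    }

    MPI_Win_free(&win);
    MPI_Free_mem(winbuf);
    MPI_Finalize();
    return 0;
}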
Re: [OMPI users] v1.2 Bus Error (/tmp usage)
One option would be to amend your mpirun command with -mca btl ^sm. This turns off the shared memory subsystem, so you'll see some performance loss in your collectives. However, it will reduce your /tmp usage to almost nothing. Others may suggest alternative solutions. Ralph

On 3/20/07 2:32 PM, "Hugh Merz" wrote:

> Good Day,
>
> I'm using Open MPI on a diskless cluster (/tmp is part of a 1m ramdisk), and I found that after upgrading from v1.1.4 to v1.2 that jobs using np > 4 would fail to start during MPI_Init, due to what appears to be a lack of space in /tmp. The error output is:
>
> [tpb200:32193] *** Process received signal ***
> [tpb200:32193] Signal: Bus error (7)
> [tpb200:32193] Signal code: (2)
> [tpb200:32193] Failing at address: 0x2a998f4120
> [tpb200:32193] [ 0] /lib64/tls/libpthread.so.0 [0x2a95f6e430]
> [tpb200:32193] [ 1] /opt/openmpi/1.2.gcc3/lib/libmpi.so.0(ompi_free_list_grow+0x138) [0x2a9568abc8]
> [tpb200:32193] [ 2] /opt/openmpi/1.2.gcc3/lib/libmpi.so.0(ompi_free_list_resize+0x2d) [0x2a9568b0dd]
> [tpb200:32193] [ 3] /opt/openmpi/1.2.gcc3/lib/openmpi/mca_btl_sm.so(mca_btl_sm_add_procs_same_base_addr+0x6bf) [0x2a98ba419f]
> [tpb200:32193] [ 4] /opt/openmpi/1.2.gcc3/lib/openmpi/mca_bml_r2.so(mca_bml_r2_add_procs+0x28a) [0x2a9899a4fa]
> [tpb200:32193] [ 5] /opt/openmpi/1.2.gcc3/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xe8) [0x2a98889308]
> [tpb200:32193] [ 6] /opt/openmpi/1.2.gcc3/lib/libmpi.so.0(ompi_mpi_init+0x45d) [0x2a956a32ed]
> [tpb200:32193] [ 7] /opt/openmpi/1.2.gcc3/lib/libmpi.so.0(MPI_Init+0x93) [0x2a956c5c93]
> [tpb200:32193] [ 8] a.out(main+0x1c) [0x400a44]
> [tpb200:32193] [ 9] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x2a960933fb]
> [tpb200:32193] [10] a.out [0x40099a]
> [tpb200:32193] *** End of error message ***
>
> ... lots of the above for each process ...
>
> mpirun noticed that job rank 0 with PID 32040 on node tpb200 exited on signal 7 (Bus error).
>
> If I increase the size of my ramdisk or point $TMP to a network filesystem then jobs start and complete fine, so it's not a showstopper, but with v1.1.4 (or LAM v7.1.2) I didn't encounter this issue with my default 1m ramdisk (even with np > 100). Is there a way to limit /tmp usage in Open MPI v1.2?
>
> Hugh
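[For concreteness, this is roughly how the suggestion looks in practice; the process count and executable name are placeholders, and the environment-variable form follows the usual OMPI_MCA_<param> convention.]

# Exclude the shared-memory BTL for this run
mpirun -np 8 -mca btl ^sm ./a.out

# The same thing via the environment, if editing the command line is awkward
export OMPI_MCA_btl=^sm
mpirun -np 8 ./a.out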
[OMPI users] CFP: 2007 IEEE International Conference on Cluster Computing (Cluster2007)
*** Call for Papers ***
2007 IEEE International Conference on Cluster Computing (Cluster2007)
17 - 21 September 2007, Austin, Texas, USA
http://www.cluster2007.org/

In less than a decade, cluster computing has become the mainstream technology for High Performance Computing and Information Technology. It has gained this prominence by providing reliable, robust and cost-effective platforms for solving many complex computational problems, accessing and visualizing data, and providing information services.

Cluster 2007 is hosted by the Texas Advanced Computing Center (TACC) in the culturally rich, high-tech city of Austin, Texas. Here you will experience an open forum with fellow cluster researchers, system designers and installers, and users for presenting and discussing new directions, opportunities and ideas that will shape Cluster Computing. Cluster 2007 welcomes paper and poster submissions on innovative work from researchers in academia, industry and government describing original research work in cluster computing.

The ability to aggregate the computing power of thousands of processors is a significant milestone in the scalability of commodity systems. Nevertheless, the ability to use both small and large systems efficiently is an ongoing effort in the areas of Networking, Management, Interconnects, and Application Optimization. Continued vigilance and assessment of R&D efforts is important to ensure that Cluster Computing will harness new technological advances in hardware and software to solve the challenges of our age, and the next generation.

Topics of interest are (but not limited to):

Cluster Software and Middleware
- Software Environments and Tools
- Single-System Image Services
- Parallel File Systems and I/O Libraries
- Standard Software for Clusters

Cluster Networking
- High-Speed Interconnects
- High Performance Message Passing Libraries
- Lightweight Communication Protocols

Applications
- Application Methods and Algorithms
- Adaptation to Multi-Core
- Data Distribution, Load Balancing & Scaling
- MPI/OpenMP Hybrid Computing
- Visualization

Performance Analysis and Evaluation
- Benchmarking & Profiling Tools
- Performance Prediction & Modeling

Cluster Management
- Security and Reliability
- High Availability Solutions
- Resource and Job Management
- Administration and Maintenance Tools

Paper Submission:

Paper Format: Since the camera-ready version of accepted papers must be compliant with the IEEE Xplore format for publication, submitted papers must conform to the following Xplore layout, page limit, and font size. This will ensure size consistency and a uniform layout for the reviewers. (With minimal changes, an accepted document can be styled for publication according to Xplore requirements explained in the Xplore formatting guide, which is also in Xplore format.)

- PDF files only.
- Maximum 10 pages for Technical Papers, maximum 6 pages for Posters.
- Single-spaced, 2-column numbered pages in IEEE Xplore format (8.5x11-inch paper; margins in inches -- top: 0.75, bottom: 1.0, sides: 0.625, between columns: 0.25; main text: 10pt).
- Format instructions are available for: LaTeX, Word document, PDF files.
- Margin and placement guides are available in: Word, PDF and PostScript files.
- Concerning the final camera-ready version: maximum of 2 extra pages at $100/page. Camera-ready means the PDF file must comply with IEEE Xplore formatting and style for publication.
- A conversion tool kit for converting from Word, LaTeX, and PostScript and checking compliance will be available by April 11. See the Final Submission section then.
- Electronic Submission: Only web-based submission is accepted. The URL will be announced two weeks before the submission deadline, on the Cluster2007 web page.

In addition to the normal technical paper sessions, we plan to organize vendor sessions and industrial exhibitions. Companies interested in participating in the vendor sessions, presenting their exhibits at the meeting, or both, should contact the Exhibits Chair, Ivan R. Judson (jud...@mcs.anl.gov), by July 13, 2007.

Important Dates:

Technical paper submissions:    11 May 2007
Last minute paper abstracts:    11 May 2007
Workshop/tutorial proposals:    11 May 2007
Poster submissions:              8 Jun 2007
Panel proposals:                 8 Jun 2007
Workshop/tutorial notification:  8 Jun 2007
Technical paper notification:   29 Jun 2007
Poster notification:            13 Jul 2007
Exhibit proposals:              13 Jul 2007
Last minute papers:
Re: [OMPI users] mpirun exit status for non-existent executable
Well, that's not a good thing. I have filed a bug about this (https://svn.open-mpi.org/trac/ompi/ticket/954) and will try to look into it soon, but don't know when it will get fixed. Thanks for bringing this to our attention! Tim

On Mar 20, 2007, at 1:39 AM, Bill Saphir wrote:

If you ask mpirun to launch an executable that does not exist, it fails, but returns an exit status of 0. This makes it difficult to write scripts that invoke mpirun and need to check for errors. I'm wondering if a) this is considered a bug and b) whether it might be fixed in a near-term release. Example:

orterun -np 2 asdflkj
--------------------------------------------------------------------------
Failed to find the following executable:

Host:       build-linux64
Executable: asdflkj

Cannot continue.
--------------------------------------------------------------------------
echo $?
0

I see this behavior for both 1.2 and 1.1.x. Thanks for your help. Bill
[OMPI users] mpirun exit status for non-existent executable
If you ask mpirun to launch an executable that does not exist, it fails, but returns an exit status of 0. This makes it difficult to write scripts that invoke mpirun and need to check for errors. I'm wondering if a) this is considered a bug and b) whether it might be fixed in a near-term release. Example:

> orterun -np 2 asdflkj
--------------------------------------------------------------------------
Failed to find the following executable:

Host:       build-linux64
Executable: asdflkj

Cannot continue.
--------------------------------------------------------------------------
> echo $?
0

I see this behavior for both 1.2 and 1.1.x. Thanks for your help. Bill
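[To make the impact concrete, this is the kind of wrapper script that the always-zero exit status defeats; the executable, input file, and log name are placeholders.]

#!/bin/sh
# Placeholder job wrapper that relies on mpirun's exit status to detect
# launch failures -- exactly what a silent exit status of 0 breaks.
mpirun -np 2 ./solver input.dat > solver.log 2>&1
status=$?
if [ "$status" -ne 0 ]; then
    echo "mpirun failed with exit status $status" >&2
    exit "$status"
fi
echo "run completed"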