Re: [OMPI users] Issues with Get/Put and IRecv
Well, mpich2 and mvapich2 are working smoothly for my app. mpich2 under gige is also giving ~2X the performance of openmpi in the cases where openmpi works. After the paper deadline, I'll attempt to package up a simple test case and send it to the list. Thanks! -Mike

Mike Houston wrote: Sadly, I've just hit this problem again, so I'll have to find another MPI implementation as I have a paper deadline quickly approaching. I'm using single threads now, but I had very similar issues when using multiple threads and issuing send/recv on one thread while waiting on a posted MPI_Recv on another. The issue seems to actually be with MPI_Get. I can do heavy MPI_Puts and things seem okay, but as soon as I have a similar communication pattern with MPI_Gets, things get unstable. -Mike

Brian Barrett wrote: Mike - In Open MPI 1.2, one-sided is implemented over point-to-point, so I would expect it to be slower. This may or may not be addressed in a future version of Open MPI (I would guess so, but don't want to commit to it). Were you using multiple threads? If so, how? On the good news front, I think your call stack looked similar to what I was seeing, so hopefully I can make some progress on a real solution. Brian

On Mar 20, 2007, at 8:54 PM, Mike Houston wrote: Well, I've managed to get a working solution, but I'm not sure how I got there. I built a test case that looked like a nice simple version of what I was trying to do and it worked, so I moved the test code into my implementation and, lo and behold, it works. I must have been doing something a little funky in the original pass, likely causing a stack smash somewhere or trying to do a get/put out of bounds. If I have any more problems, I'll let y'all know. I've tested pretty heavy usage up to 128 MPI processes across 16 nodes and things seem to be behaving. I did notice that single-sided transfers seem to be a little slower than explicit send/recv, at least on GigE.
Once I do some more testing, I'll bring things up on IB and see how things are going. -Mike

Mike Houston wrote: Brian Barrett wrote: On Mar 20, 2007, at 3:15 PM, Mike Houston wrote: If I only do gets/puts, things seem to be working correctly with version 1.2. However, if I have a posted Irecv on the target node and issue an MPI_Get against that target, MPI_Test on the posted IRecv causes a segfault: Anyone have suggestions? Sadly, I need to have IRecvs posted. I'll attempt to find a workaround, but it looks like the posted IRecv is getting all the data of the MPI_Get from the other node. It's like the message tagging is getting ignored. I've never tried posting two different IRecvs with different message tags either...

Hi Mike - I've spent some time this afternoon looking at the problem and have some ideas on what could be happening. I don't think it's a data mismatch (the data intended for the IRecv getting delivered to the Get), but more a problem with the call to MPI_Test perturbing the progress flow of the one-sided engine. I can see one or two places where it's possible this could happen, although I'm having trouble replicating the problem with any test case I can write. Is it possible for you to share the code causing the problem (or some small test case)? It would make me feel considerably better if I could really understand the conditions required to end up in a seg fault state. Thanks, Brian

Well, I can give you a linux x86 binary if that would do it. The code is huge as it's part of a much larger system, so there is no such thing as a simple case at the moment, and the code is in pieces and largely unrunnable now with all the hacking... I basically have one thread spinning on an MPI_Test on a posted IRecv while being used as the target of the MPI_Get. I'll see if I can hack together a simple version that breaks late tonight.
I've just played with posting a send to that IRecv, issuing the MPI_Get, handshaking and then posting another IRecv, and the MPI_Test continues to eat it, but in a memcpy:

#0  0x001c068c in memcpy () from /lib/libc.so.6
#1  0x00e412d9 in ompi_convertor_pack (pConv=0x83c1198, iov=0xa0, out_size=0xaffc1fd8, max_data=0xaffc1fdc) at convertor.c:254
#2  0x00ea265d in ompi_osc_pt2pt_replyreq_send (module=0x856e668, replyreq=0x83c1180) at osc_pt2pt_data_move.c:411
#3  0x00ea0ebe in ompi_osc_pt2pt_component_fragment_cb (pt2pt_buffer=0x8573380) at osc_pt2pt_component.c:582
#4  0x00ea1389 in ompi_osc_pt2pt_progress () at osc_pt2pt_component.c:769
#5  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#6  0x00ea59e5 in ompi_osc_pt2pt_passive_unlock (module=0x856e668, origin=1, count=1) at osc_pt2pt_sync.c:60
#7  0x00ea0cd2 in ompi_osc
Re: [OMPI users] Issues with Get/Put and IRecv
Sadly, I've just hit this problem again, so I'll have to find another MPI implementation as I have a paper deadline quickly approaching. I'm using single threads now, but I had very similar issues when using multiple threads and issuing send/recv on one thread while waiting on a posted MPI_Recv on another. The issue seems to actually be with MPI_Get. I can do heavy MPI_Puts and things seem okay, but as soon as I have a similar communication pattern with MPI_Gets, things get unstable. -Mike

Brian Barrett wrote: Mike - In Open MPI 1.2, one-sided is implemented over point-to-point, so I would expect it to be slower. This may or may not be addressed in a future version of Open MPI (I would guess so, but don't want to commit to it). Were you using multiple threads? If so, how? On the good news front, I think your call stack looked similar to what I was seeing, so hopefully I can make some progress on a real solution. Brian

On Mar 20, 2007, at 8:54 PM, Mike Houston wrote: Well, I've managed to get a working solution, but I'm not sure how I got there. I built a test case that looked like a nice simple version of what I was trying to do and it worked, so I moved the test code into my implementation and, lo and behold, it works. I must have been doing something a little funky in the original pass, likely causing a stack smash somewhere or trying to do a get/put out of bounds. If I have any more problems, I'll let y'all know. I've tested pretty heavy usage up to 128 MPI processes across 16 nodes and things seem to be behaving. I did notice that single-sided transfers seem to be a little slower than explicit send/recv, at least on GigE. Once I do some more testing, I'll bring things up on IB and see how things are going. -Mike

Mike Houston wrote: Brian Barrett wrote: On Mar 20, 2007, at 3:15 PM, Mike Houston wrote: If I only do gets/puts, things seem to be working correctly with version 1.2.
However, if I have a posted Irecv on the target node and issue an MPI_Get against that target, MPI_Test on the posted IRecv causes a segfault: Anyone have suggestions? Sadly, I need to have IRecvs posted. I'll attempt to find a workaround, but it looks like the posted IRecv is getting all the data of the MPI_Get from the other node. It's like the message tagging is getting ignored. I've never tried posting two different IRecvs with different message tags either...

Hi Mike - I've spent some time this afternoon looking at the problem and have some ideas on what could be happening. I don't think it's a data mismatch (the data intended for the IRecv getting delivered to the Get), but more a problem with the call to MPI_Test perturbing the progress flow of the one-sided engine. I can see one or two places where it's possible this could happen, although I'm having trouble replicating the problem with any test case I can write. Is it possible for you to share the code causing the problem (or some small test case)? It would make me feel considerably better if I could really understand the conditions required to end up in a seg fault state. Thanks, Brian

Well, I can give you a linux x86 binary if that would do it. The code is huge as it's part of a much larger system, so there is no such thing as a simple case at the moment, and the code is in pieces and largely unrunnable now with all the hacking... I basically have one thread spinning on an MPI_Test on a posted IRecv while being used as the target of the MPI_Get. I'll see if I can hack together a simple version that breaks late tonight.
I've just played with posting a send to that IRecv, issuing the MPI_Get, handshaking and then posting another IRecv, and the MPI_Test continues to eat it, but in a memcpy:

#0  0x001c068c in memcpy () from /lib/libc.so.6
#1  0x00e412d9 in ompi_convertor_pack (pConv=0x83c1198, iov=0xa0, out_size=0xaffc1fd8, max_data=0xaffc1fdc) at convertor.c:254
#2  0x00ea265d in ompi_osc_pt2pt_replyreq_send (module=0x856e668, replyreq=0x83c1180) at osc_pt2pt_data_move.c:411
#3  0x00ea0ebe in ompi_osc_pt2pt_component_fragment_cb (pt2pt_buffer=0x8573380) at osc_pt2pt_component.c:582
#4  0x00ea1389 in ompi_osc_pt2pt_progress () at osc_pt2pt_component.c:769
#5  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#6  0x00ea59e5 in ompi_osc_pt2pt_passive_unlock (module=0x856e668, origin=1, count=1) at osc_pt2pt_sync.c:60
#7  0x00ea0cd2 in ompi_osc_pt2pt_component_fragment_cb (pt2pt_buffer=0x856f300) at osc_pt2pt_component.c:688
#8  0x00ea1389 in ompi_osc_pt2pt_progress () at osc_pt2pt_component.c:769
#9  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#10 0x00e33f05 in ompi_request_test (rptr=0xaffc2430, completed=0xaffc2434, status=0xaffc23fc) at request/re
Re: [OMPI users] Failure to launch on a remote node. SSH problem?
Also make sure that /tmp is user writable. By default, that is where openmpi likes to stick some files. -Mike

David Burns wrote: Could also be a firewall problem. Make sure all nodes in the cluster accept tcp packets from all others. Dave

Walker, David T. wrote: I am presently trying to get OpenMPI up and running on a small cluster of MacPros (dual dual-core Xeons) using TCP. Open MPI was compiled using the Intel Fortran Compiler (9.1) and gcc. When I try to launch a job on a remote node, orted starts on the remote node but then times out. I am guessing that the problem is SSH related. Any thoughts? Thanks, Dave

Details: I am using SSH, set up as outlined in the FAQ, using ssh-agent to allow passwordless logins. The paths for all the libraries appear to be OK. A simple MPI code (Hello_World_Fortran) launched on node01 will run OK for up to four processors (all on node01). The output is shown here.

node01 1247% mpirun --debug-daemons -hostfile machinefile -np 4 Hello_World_Fortran
Calling MPI_INIT
Calling MPI_INIT
Calling MPI_INIT
Calling MPI_INIT
Fortran version of Hello World, rank2
Rank 0 is present in Fortran version of Hello World.
Fortran version of Hello World, rank3
Fortran version of Hello World, rank1

For five processors mpirun tries to start an additional process on node03. Everything launches the same on node01 (four instances of Hello_World_Fortran are launched). On node03, orted starts, but times out after 10 seconds and the output below is generated.

node01 1246% mpirun --debug-daemons -hostfile machinefile -np 5 Hello_World_Fortran
Calling MPI_INIT
Calling MPI_INIT
Calling MPI_INIT
Calling MPI_INIT
[node03:02422] [0,0,1]-[0,0,0] mca_oob_tcp_peer_send_blocking: send() failed with errno=57
[node01.local:21427] ERROR: A daemon on node node03 failed to start as expected.
[node01.local:21427] ERROR: There may be more information available from
[node01.local:21427] ERROR: the remote shell (see above).
[node01.local:21427] ERROR: The daemon exited unexpectedly with status 255.
forrtl: error (78): process killed (SIGTERM)
forrtl: error (78): process killed (SIGTERM)

Here is the ompi_info:

node01 1248% ompi_info --all
Open MPI: 1.1.2
Open MPI SVN revision: r12073
Open RTE: 1.1.2
Open RTE SVN revision: r12073
OPAL: 1.1.2
OPAL SVN revision: r12073
MCA memory: darwin (MCA v1.0, API v1.0, Component v1.1.2)
MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1.2)
MCA timer: darwin (MCA v1.0, API v1.0, Component v1.1.2)
MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.2)
MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.2)
MCA coll: self (MCA v1.0, API v1.0, Component v1.1.2)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.2)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.2)
MCA io: romio (MCA v1.0, API v1.0, Component v1.1.2)
MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1.2)
MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1.2)
MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1.2)
MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1.2)
MCA btl: self (MCA v1.0, API v1.0, Component v1.1.2)
MCA btl: sm (MCA v1.0, API v1.0, Component v1.1.2)
MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.1.2)
MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
MCA gpr: null (MCA v1.0, API v1.0, Component v1.1.2)
MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1.2)
MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1.2)
MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1.2)
MCA iof: svc (MCA v1.0, API v1.0, Component v1.1.2)
MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1.2)
MCA ns: replica (MCA v1.0, API v1.0, Component v1.1.2)
MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1.2)
MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1.2)
MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1.2)
MCA ras: xgrid (MCA v1.0, API v1.0, Component v1.1.2)
MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1.2)
MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1.2)
MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1.2)
MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1.2)
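The advice earlier in the thread (user-writable /tmp, firewall, SSH environment) can be checked mechanically on each node before blaming SSH itself. A minimal sketch; the orted check assumes Open MPI is installed on the node being tested:

```shell
# Run on each node; these probe the common causes of the orted launch
# failure above.

# /tmp must be user-writable: Open MPI keeps its session directories there.
test -w /tmp && echo "/tmp writable" || echo "/tmp NOT writable"

# orted must resolve in *non-interactive* shells, since that is how ssh
# starts the remote daemon (check LD_LIBRARY_PATH the same way).
command -v orted >/dev/null && echo "orted in PATH" || echo "orted NOT in PATH"
```

Running these over `ssh node03 '...'` exercises the same non-interactive shell that mpirun uses, which is where PATH problems usually hide.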
Re: [OMPI users] Cell EIB support for OpenMPI
Marcus G. Daniels wrote: Marcus G. Daniels wrote: Mike Houston wrote: The main issue with this, and addressed at the end of the report, is that the code size is going to be a problem, as data and code must live in the same 256KB in each SPE. They mention dynamic overlay loading, which is also how we deal with large code size, but things get tricky and slow with the potentially needed save and restore of registers and LS.

I did some checking on this. Apparently the trunk of GCC and the latest GNU Binutils handle overlays. Because the SPU compiler knows of its limited address space, the ELF object code sections reflect this, and the linker can transparently generate stubs to trigger the loading. GCC also has options like -ffunction-sections that enable the linker to optimize for locality. So even though the OpenMPI shared libraries in total appear to have a footprint about four times too big for code alone (I don't know about the typical stack & heap requirements), perhaps it's still doable without a big effort to strip down OpenMPI?

But loading an overlay can be quite expensive, depending on how much needs to be loaded and how much user data/code needs to be restored. If the user is trying to use most of the LS for data, which is perfectly sane and reasonable, then you might have to load multiple overlays to complete a function. We've also been having issues with mixing manual overlay loading of our code with the autoloading generated by the compiler. Regardless, it would be interesting to see if this can even be made to work. If so, it might really help people get apps up on Cell, since it can be reasonably thought of as a cluster on a chip, backed by a larger address space. -Mike
Re: [OMPI users] Cell EIB support for OpenMPI
That's pretty cool. The main issue with this, and addressed at the end of the report, is that the code size is going to be a problem, as data and code must live in the same 256KB in each SPE. They mention dynamic overlay loading, which is also how we deal with large code size, but things get tricky and slow with the potentially needed save and restore of registers and LS. It would be interesting to see how much of MPI could be implemented and how much is really needed. Maybe it's time to think about an MPI-ES spec? -Mike

Marcus G. Daniels wrote: Hi, Has anyone investigated adding intra-chip Cell EIB messaging to OpenMPI? It seems like it ought to work. This paper seems pretty convincing: http://www.cs.fsu.edu/research/reports/TR-061215.pdf

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Issues with Get/Put and IRecv
Well, I've managed to get a working solution, but I'm not sure how I got there. I built a test case that looked like a nice simple version of what I was trying to do and it worked, so I moved the test code into my implementation and, lo and behold, it works. I must have been doing something a little funky in the original pass, likely causing a stack smash somewhere or trying to do a get/put out of bounds. If I have any more problems, I'll let y'all know. I've tested pretty heavy usage up to 128 MPI processes across 16 nodes and things seem to be behaving. I did notice that single-sided transfers seem to be a little slower than explicit send/recv, at least on GigE. Once I do some more testing, I'll bring things up on IB and see how things are going. -Mike

Mike Houston wrote: Brian Barrett wrote: On Mar 20, 2007, at 3:15 PM, Mike Houston wrote: If I only do gets/puts, things seem to be working correctly with version 1.2. However, if I have a posted Irecv on the target node and issue an MPI_Get against that target, MPI_Test on the posted IRecv causes a segfault: Anyone have suggestions? Sadly, I need to have IRecvs posted. I'll attempt to find a workaround, but it looks like the posted IRecv is getting all the data of the MPI_Get from the other node. It's like the message tagging is getting ignored. I've never tried posting two different IRecvs with different message tags either...

Hi Mike - I've spent some time this afternoon looking at the problem and have some ideas on what could be happening. I don't think it's a data mismatch (the data intended for the IRecv getting delivered to the Get), but more a problem with the call to MPI_Test perturbing the progress flow of the one-sided engine. I can see one or two places where it's possible this could happen, although I'm having trouble replicating the problem with any test case I can write. Is it possible for you to share the code causing the problem (or some small test case)?
It would make me feel considerably better if I could really understand the conditions required to end up in a seg fault state. Thanks, Brian

Well, I can give you a linux x86 binary if that would do it. The code is huge as it's part of a much larger system, so there is no such thing as a simple case at the moment, and the code is in pieces and largely unrunnable now with all the hacking... I basically have one thread spinning on an MPI_Test on a posted IRecv while being used as the target of the MPI_Get. I'll see if I can hack together a simple version that breaks late tonight.

I've just played with posting a send to that IRecv, issuing the MPI_Get, handshaking and then posting another IRecv, and the MPI_Test continues to eat it, but in a memcpy:

#0  0x001c068c in memcpy () from /lib/libc.so.6
#1  0x00e412d9 in ompi_convertor_pack (pConv=0x83c1198, iov=0xa0, out_size=0xaffc1fd8, max_data=0xaffc1fdc) at convertor.c:254
#2  0x00ea265d in ompi_osc_pt2pt_replyreq_send (module=0x856e668, replyreq=0x83c1180) at osc_pt2pt_data_move.c:411
#3  0x00ea0ebe in ompi_osc_pt2pt_component_fragment_cb (pt2pt_buffer=0x8573380) at osc_pt2pt_component.c:582
#4  0x00ea1389 in ompi_osc_pt2pt_progress () at osc_pt2pt_component.c:769
#5  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#6  0x00ea59e5 in ompi_osc_pt2pt_passive_unlock (module=0x856e668, origin=1, count=1) at osc_pt2pt_sync.c:60
#7  0x00ea0cd2 in ompi_osc_pt2pt_component_fragment_cb (pt2pt_buffer=0x856f300) at osc_pt2pt_component.c:688
#8  0x00ea1389 in ompi_osc_pt2pt_progress () at osc_pt2pt_component.c:769
#9  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#10 0x00e33f05 in ompi_request_test (rptr=0xaffc2430, completed=0xaffc2434, status=0xaffc23fc) at request/req_test.c:82
#11 0x00e61770 in PMPI_Test (request=0xaffc2430, completed=0xaffc2434, status=0xaffc23fc) at ptest.c:52

-Mike
Re: [OMPI users] Issues with Get/Put and IRecv
Brian Barrett wrote: On Mar 20, 2007, at 3:15 PM, Mike Houston wrote: If I only do gets/puts, things seem to be working correctly with version 1.2. However, if I have a posted Irecv on the target node and issue an MPI_Get against that target, MPI_Test on the posted IRecv causes a segfault: Anyone have suggestions? Sadly, I need to have IRecvs posted. I'll attempt to find a workaround, but it looks like the posted IRecv is getting all the data of the MPI_Get from the other node. It's like the message tagging is getting ignored. I've never tried posting two different IRecvs with different message tags either...

Hi Mike - I've spent some time this afternoon looking at the problem and have some ideas on what could be happening. I don't think it's a data mismatch (the data intended for the IRecv getting delivered to the Get), but more a problem with the call to MPI_Test perturbing the progress flow of the one-sided engine. I can see one or two places where it's possible this could happen, although I'm having trouble replicating the problem with any test case I can write. Is it possible for you to share the code causing the problem (or some small test case)? It would make me feel considerably better if I could really understand the conditions required to end up in a seg fault state. Thanks, Brian

Well, I can give you a linux x86 binary if that would do it. The code is huge as it's part of a much larger system, so there is no such thing as a simple case at the moment, and the code is in pieces and largely unrunnable now with all the hacking... I basically have one thread spinning on an MPI_Test on a posted IRecv while being used as the target of the MPI_Get. I'll see if I can hack together a simple version that breaks late tonight.
I've just played with posting a send to that IRecv, issuing the MPI_Get, handshaking and then posting another IRecv, and the MPI_Test continues to eat it, but in a memcpy:

#0  0x001c068c in memcpy () from /lib/libc.so.6
#1  0x00e412d9 in ompi_convertor_pack (pConv=0x83c1198, iov=0xa0, out_size=0xaffc1fd8, max_data=0xaffc1fdc) at convertor.c:254
#2  0x00ea265d in ompi_osc_pt2pt_replyreq_send (module=0x856e668, replyreq=0x83c1180) at osc_pt2pt_data_move.c:411
#3  0x00ea0ebe in ompi_osc_pt2pt_component_fragment_cb (pt2pt_buffer=0x8573380) at osc_pt2pt_component.c:582
#4  0x00ea1389 in ompi_osc_pt2pt_progress () at osc_pt2pt_component.c:769
#5  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#6  0x00ea59e5 in ompi_osc_pt2pt_passive_unlock (module=0x856e668, origin=1, count=1) at osc_pt2pt_sync.c:60
#7  0x00ea0cd2 in ompi_osc_pt2pt_component_fragment_cb (pt2pt_buffer=0x856f300) at osc_pt2pt_component.c:688
#8  0x00ea1389 in ompi_osc_pt2pt_progress () at osc_pt2pt_component.c:769
#9  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#10 0x00e33f05 in ompi_request_test (rptr=0xaffc2430, completed=0xaffc2434, status=0xaffc23fc) at request/req_test.c:82
#11 0x00e61770 in PMPI_Test (request=0xaffc2430, completed=0xaffc2434, status=0xaffc23fc) at ptest.c:52

-Mike
[OMPI users] Issues with Get/Put and IRecv
If I only do gets/puts, things seem to be working correctly with version 1.2. However, if I have a posted Irecv on the target node and issue an MPI_Get against that target, MPI_Test on the posted IRecv causes a segfault:

[expose:21249] *** Process received signal ***
[expose:21249] Signal: Segmentation fault (11)
[expose:21249] Signal code: Address not mapped (1)
[expose:21249] Failing at address: 0xa0
[expose:21249] [ 0] [0x96e440]
[expose:21249] [ 1] /usr/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_replyreq_send+0xed) [0x2c765d]
[expose:21249] [ 2] /usr/lib/openmpi/mca_osc_pt2pt.so [0x2c5ebe]
[expose:21249] [ 3] /usr/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_progress+0x119) [0x2c6389]
[expose:21249] [ 4] /usr/lib/libopen-pal.so.0(opal_progress+0x69) [0x67d019]
[expose:21249] [ 5] /usr/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_passive_unlock+0xb5) [0x2ca9e5]
[expose:21249] [ 6] /usr/lib/openmpi/mca_osc_pt2pt.so [0x2c5cd2]
[expose:21249] [ 7] /usr/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_progress+0x119) [0x2c6389]
[expose:21249] [ 8] /usr/lib/libopen-pal.so.0(opal_progress+0x69) [0x67d019]
[expose:21249] [ 9] /usr/lib/libmpi.so.0(ompi_request_test+0x35) [0x3d6f05]
[expose:21249] [10] /usr/lib/libmpi.so.0(PMPI_Test+0x80) [0x404770]

Anyone have suggestions? Sadly, I need to have IRecvs posted. I'll attempt to find a workaround, but it looks like the posted IRecv is getting all the data of the MPI_Get from the other node. It's like the message tagging is getting ignored. I've never tried posting two different IRecvs with different message tags either... -Mike
Re: [OMPI users] Signal 13
I've been having similar issues with brand new FC5/6 and RHEL5 machines, but our FC4/RHEL4 machines are just fine. On the FC5/6 RHEL5 machines, I can get things to run as root. There must be some ACL or security setting issue that's enabled by default on the newer distros. If I figure it out this weekend, I'll let you know. If anyone else knows the solution, please post to the list. -Mike David Bronke wrote: I've been trying to get OpenMPI working on two of the computers at a lab I help administer, and I'm running into a rather large issue. When running anything using mpirun as a normal user, I get the following output: $ mpirun --no-daemonize --host localhost,localhost,localhost,localhost,localhost,localhost,localhost,localhost /workspace/bronke/mpi/hello mpirun noticed that job rank 0 with PID 0 on node "localhost" exited on signal 13. [trixie:18104] ERROR: A daemon on node localhost failed to start as expected. [trixie:18104] ERROR: There may be more information available from [trixie:18104] ERROR: the remote shell (see above). [trixie:18104] The daemon received a signal 13. 8 additional processes aborted (not shown) However, running the same exact command line as root works fine: $ sudo mpirun --no-daemonize --host localhost,localhost,localhost,localhost,localhost,localhost,localhost,localhost /workspace/bronke/mpi/hello Password: p is 8, my_rank is 0 p is 8, my_rank is 1 p is 8, my_rank is 2 p is 8, my_rank is 3 p is 8, my_rank is 6 p is 8, my_rank is 7 Greetings from process 1! Greetings from process 2! Greetings from process 3! p is 8, my_rank is 5 p is 8, my_rank is 4 Greetings from process 4! Greetings from process 5! Greetings from process 6! Greetings from process 7! 
I've looked up signal 13 and found that it is apparently SIGPIPE; I also found a thread on the LAM-MPI site: http://www.lam-mpi.org/MailArchives/lam/2004/08/8486.php However, this thread seems to indicate that the problem would be in the application (/workspace/bronke/mpi/hello in this case), but there are no pipes in use in this app, and the fact that it works as expected as root doesn't seem to fit either. I have tried running mpirun with --verbose and it doesn't show any more output than without it, so I've run into a sort of dead end on this issue. Does anyone know of any way I can figure out what's going wrong or how I can fix it? Thanks!
[OMPI users] Fun with threading
At least with 1.1.4, I'm having a heck of a time enabling multi-threading. Configuring with --with-threads=posix --enable-mpi-threads --enable-progress-threads leads to mpirun just hanging, even when not launching MPI apps (i.e. mpirun -np 1 hostname), and I can't ctrl-c to kill it; I have to kill -9 it. Removing progress threads support results in the same behavior. Removing --enable-mpi-threads gets mpirun working again, but without the thread protection I need. What is the status of multi-thread support? It looks like it's still largely untested, from my reading of the mailing lists.

We actually have an application that would be much easier to deal with if we could have two threads in a process both using MPI. Funneling everything through a single processor creates a locking nightmare, and generally means we will be forced to spin checking an IRecv and the status of a data structure, instead of having one thread happily sitting on a blocking receive and the other watching the data structure, basically pissing away a processor that we could be using to do something useful. (We are basically doing a simplified version of DSM and we need to respond to remote data requests.)

At the moment, it seems that when running without threading support enabled, if we only post a receive on a single thread, things are mostly happy, except if one thread in a process sends to the other thread in the same process that has posted a receive. Under TCP, the send fails with:

*** An error occurred in MPI_Send
*** on communicator MPI_COMM_WORLD
*** MPI_ERR_INTERN: internal error
*** MPI_ERRORS_ARE_FATAL (goodbye)
[0,0,0]-[0,1,0] mca_oob_tcp_msg_recv: readv failed with errno=104

SM has undefined results. Obviously I'm playing fast and loose, which is why I'm attempting to get threading support to work, to see if it solves the headaches.
If you really want to have some fun, have a posted MPI_Recv on one thread and issue an MPI_Barrier on the other (with SM):

Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR) Failing at addr:0x1c
[0] func:/usr/lib/libopal.so.0 [0xc030f4]
[1] func:/lib/tls/libpthread.so.0 [0x46f93890]
[2] func:/usr/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_match+0xb08) [0x14ec38]
[3] func:/usr/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback+0x2f9) [0x14f7e9]
[4] func:/usr/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0xa87) [0x806c07]
[5] func:/usr/lib/openmpi/mca_bml_r2.so(mca_bml_r2_progress+0x39) [0x510c69]
[6] func:/usr/lib/libopal.so.0(opal_progress+0x69) [0xbecc39]
[7] func:/usr/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x785) [0x14d675]
[8] func:/usr/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_sendrecv_actual_localcompleted+0x8c) [0x5cc3fc]
[9] func:/usr/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_barrier_intra_two_procs+0x76) [0x5ceef6]
[10] func:/usr/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_barrier_intra_dec_fixed+0x38) [0x5cc638]
[11] func:/usr/lib/libmpi.so.0(PMPI_Barrier+0xe9) [0x29a1b9]

-Mike
Re: [O-MPI users] Infiniband performance problems (mvapi)
Sometimes getting crashes:

mpirun -np 2 -mca mpi_leave_pinned 1 -mca btl_mvapi_flags 2 -hostfile /u/mhouston/mpihosts mpi_bandwidth 25 131072
mpirun noticed that job rank 0 with PID 10611 on node "spire-2.stanford.edu" exited on signal 11.
1 process killed (possibly by Open MPI).

The backtrace is bogus, else I'd drop it in. Setting the number of messages <=10 always seems to work. -Mike

Tim S. Woodall wrote: Mike, There appears to be an issue in our mvapi get protocol. To temporarily disable this:

/u/twoodall> orterun -np 2 -mca mpi_leave_pinned 1 -mca btl_mvapi_flags 2 ./bw 25 131072
131072 801.580272 (MillionBytes/sec) 764.446518 (MegaBytes/sec)

Mike Houston wrote: What's the ETA, or should I try grabbing from cvs? -Mike

Tim S. Woodall wrote: Mike, I believe this was probably corrected today and should be in the next release candidate. Thanks, Tim

Mike Houston wrote: Whoops, spoke too soon. The performance quoted was not actually going between nodes. Actually using the network with the pinned option gives:

[0,1,0][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress] [0,1,1][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress] Got error : VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb74a1c18 Got error : VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb73e1720

repeated many times. -Mike

Mike Houston wrote: That seems to work with the pinning option enabled. THANKS! Now I'll go back to testing my real code. I'm getting 700MB/s for messages >=128KB. This is a little lower than MVAPICH (10-20%), but still pretty darn good. My guess is that I can play with the settings more to tweak up performance. Now if I can get the tcp layer working, I'm pretty much good to go. Any word on an SDP layer? I can probably modify the tcp layer quickly to do SDP, but I thought I would ask. -Mike

Tim S. Woodall wrote: Hello Mike, Mike Houston wrote: When only sending a few messages, we get reasonably good IB performance, ~500MB/s (MVAPICH is 850MB/s).
However, if I crank the number of messages up, we drop to 3MB/s(!!!). This is with the OSU NBCL mpi_bandwidth test. We are running Mellanox IB Gold 1.8 with 3.3.3 firmware on PCI-X (Couger) boards. Everything works with MVAPICH, but we really need the thread support in OpenMPI. Ideas? I noticed there are a plethora of runtime options configurable for mvapi. Do I need to tweak these to get performacne up? You might try running w/ the: mpirun -mca mpi_leave_pinned 1 Which will cause mvapi port to maintain an mru cache of registrations, rather than dynamically pinning/unpinning memory. If this does not resolve the BW problems, try increasing the resources allocated to each connection: -mca btl_mvapi_rd_min 128 -mca btl_mvapi_rd_max 256 Also can you forward me a copy of the test code or a reference to it? Thanks, Tim ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [O-MPI users] Infiniband performance problems (mvapi)
Better, but still having issues at lots of outstanding messages:

mpirun -np 2 -mca mpi_leave_pinned 1 -mca btl_mvapi_flags 2 mpi_bandwidth 1000 131072
131072 669.574904 (MillionBytes/sec) 638.556389 (MegaBytes/sec)
mpirun -np 2 -mca mpi_leave_pinned 1 -mca btl_mvapi_flags 2 mpi_bandwidth 1 131072
131072 115.873284 (MillionBytes/sec) 110.505375 (MegaBytes/sec)

Sorry to be such a pain... We need the speed of MVAPICH with the threading support of OpenMPI... -Mike

Tim S. Woodall wrote:
Mike, There appears to be an issue in our mvapi get protocol. To temporarily disable this:
/u/twoodall> orterun -np 2 -mca mpi_leave_pinned 1 -mca btl_mvapi_flags 2 ./bw 25 131072
131072 801.580272 (MillionBytes/sec) 764.446518 (MegaBytes/sec)
[...]

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
[O-MPI users] TCP problems
I have things working now. I needed to limit OpenMPI to actual working interfaces (thanks for the tip). It still seems that this should be figured out correctly by the library... Now I've moved on to stress testing with the bandwidth testing app I posted earlier in the Infiniband thread:

mpirun -mca btl_tcp_if_include eth0 -mca btl tcp -np 2 -hostfile /u/mhouston/mpihosts mpi_bandwidth 3750 262144
262144 109.697279 (MillionBytes/sec) 104.615478 (MegaBytes/sec)
mpirun -mca btl_tcp_if_include eth0 -mca btl tcp -np 2 -hostfile /u/mhouston/mpihosts mpi_bandwidth 4000 262144
[spire-2.Stanford.EDU:06645] mca_btl_tcp_frag_send: writev failed with errno=104
mpirun noticed that job rank 1 with PID 21281 on node "spire-3.stanford.edu" exited on signal 11.

Cranking up the number of messages in flight makes things really unhappy. I haven't seen this behavior with LAM or MPICH, so I thought I'd mention it. Thanks! -Mike
Re: [O-MPI users] Infiniband performance problems (mvapi)
What's the ETA, or should I try grabbing from cvs? -Mike

Tim S. Woodall wrote:
Mike, I believe it was probably corrected today and should be in the next release candidate. Thanks, Tim
[...]

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [O-MPI users] Infiniband performance problems (mvapi)
mpirun -mca btl_mvapi_rd_min 128 -mca btl_mvapi_rd_max 256 -np 2 -hostfile /u/mhouston/mpihosts mpi_bandwidth 21 131072
131072 519.922184 (MillionBytes/sec) 495.836433 (MegaBytes/sec)
mpirun -mca btl_mvapi_rd_min 128 -mca btl_mvapi_rd_max 256 -np 2 -hostfile /u/mhouston/mpihosts mpi_bandwidth 22 131072
131072 3.360296 (MillionBytes/sec) 3.204628 (MegaBytes/sec)

Moving from 21 messages to 22 causes a HUGE drop in performance. The app tries to send all of the messages non-blocking at once... Setting -mca mpi_leave_pinned 1 causes:
[0,1,1][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress] Got error : VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb73412fc
repeated until it eventually hangs. -Mike

Mike Houston wrote:
Whoops, spoke too soon. The performance quoted was not actually going between nodes. [...]

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [O-MPI users] Infiniband performance problems (mvapi)
Whoops, spoke too soon. The performance quoted was not actually going between nodes. Actually using the network with the pinned option gives:
[0,1,0][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress]
[0,1,1][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress]
Got error : VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb74a1c18
Got error : VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb73e1720
repeated many times. -Mike

Mike Houston wrote:
That seems to work with the pinning option enabled. THANKS! [...]

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [O-MPI users] Infiniband performance problems (mvapi)
That seems to work with the pinning option enabled. THANKS! Now I'll go back to testing my real code. I'm getting 700MB/s for messages >= 128KB. This is a little lower than MVAPICH, 10-20%, but still pretty darn good. My guess is that I can play with the settings more to tweak up performance. Now if I can get the tcp layer working, I'm pretty much good to go. Any word on an SDP layer? I can probably modify the tcp layer quickly to do SDP, but I thought I would ask. -Mike

Tim S. Woodall wrote:
[...] You might try running with: mpirun -mca mpi_leave_pinned 1, which will cause the mvapi port to maintain an MRU cache of registrations, rather than dynamically pinning/unpinning memory. [...]

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [O-MPI users] Infiniband performance problems (mvapi)
I'll give it a go. Attached is the code. Thanks! -Mike

Tim S. Woodall wrote:
[...] Also, can you forward me a copy of the test code or a reference to it? Thanks, Tim

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users

/*
 * Copyright (C) 2002-2003 the Network-Based Computing Laboratory
 * (NBCL), The Ohio State University.
 */
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <assert.h>

#define MYBUFSIZE (4*1024*1028)
#define MAX_REQ_NUM 10

char s_buf1[MYBUFSIZE];
char r_buf1[MYBUFSIZE];
MPI_Request request[MAX_REQ_NUM];
MPI_Status stat[MAX_REQ_NUM];

int main(int argc, char *argv[])
{
    int myid, numprocs, i;
    int size, loop, page_size;
    char *s_buf, *r_buf;
    double t_start = 0.0, t_end = 0.0, t = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (argc < 3) {
        fprintf(stderr, "Usage: bw loop msg_size\n");
        MPI_Finalize();
        return 0;
    }
    size = atoi(argv[2]);
    loop = atoi(argv[1]);
    if (size > MYBUFSIZE) {
        fprintf(stderr, "Maximum message size is %d\n", MYBUFSIZE);
        MPI_Finalize();
        return 0;
    }
    if (loop > MAX_REQ_NUM) {
        fprintf(stderr, "Maximum number of iterations is %d\n", MAX_REQ_NUM);
        MPI_Finalize();
        return 0;
    }

    /* Round the send/receive buffers up to a page boundary. */
    page_size = getpagesize();
    s_buf = (char *)(((unsigned long)s_buf1 + (page_size - 1)) / page_size * page_size);
    r_buf = (char *)(((unsigned long)r_buf1 + (page_size - 1)) / page_size * page_size);
    assert((s_buf != NULL) && (r_buf != NULL));

    /* NOTE: the archived copy of this attachment was truncated here; the
       remainder below is a reconstruction assuming the standard OSU
       bandwidth-test pattern: rank 0 posts `loop` non-blocking sends,
       rank 1 posts `loop` non-blocking receives, both wait for completion,
       then rank 0 reports bandwidth. */
    for (i = 0; i < size; i++) {
        s_buf[i] = 'a';
        r_buf[i] = 'b';
    }
    MPI_Barrier(MPI_COMM_WORLD);

    if (myid == 0) {
        t_start = MPI_Wtime();
        for (i = 0; i < loop; i++)
            MPI_Isend(s_buf, size, MPI_CHAR, 1, 100, MPI_COMM_WORLD, &request[i]);
        MPI_Waitall(loop, request, stat);
        MPI_Recv(r_buf, 4, MPI_CHAR, 1, 101, MPI_COMM_WORLD, &stat[0]);
        t_end = MPI_Wtime();
        t = t_end - t_start;
    } else if (myid == 1) {
        for (i = 0; i < loop; i++)
            MPI_Irecv(r_buf, size, MPI_CHAR, 0, 100, MPI_COMM_WORLD, &request[i]);
        MPI_Waitall(loop, request, stat);
        MPI_Send(s_buf, 4, MPI_CHAR, 0, 101, MPI_COMM_WORLD);
    }

    if (myid == 0) {
        double mbytes = ((double)size * loop) / 1.0e6;
        fprintf(stdout, "%d\t%f (MillionBytes/sec) %f(MegaBytes/sec)\n",
                size, mbytes / t, mbytes / t / 1.048576);
    }

    MPI_Finalize();
    return 0;
}
Re: [O-MPI users] Infiniband performance problems (mvapi)
Same performance problems with that fix. In fact, if I ever use tcp currently, OpenMPI crashes... -Mike

George Bosilca wrote:
If there are several networks available between 2 nodes they will get selected. That can lead to poor performance when the second network is a high-latency one (like TCP). If you want to ensure that only the IB driver is loaded, you have to add the following line to .openmpi/mca-params.conf:
btl_base_exclude=tcp
This will force the TCP driver to be unloaded (always). In order to use this option you have to have all nodes reachable via IB. Thanks, george.

On Oct 31, 2005, at 10:50 AM, Mike Houston wrote:
When only sending a few messages, we get reasonably good IB performance, ~500MB/s (MVAPICH is 850MB/s). [...]

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users

"Half of what I say is meaningless; but I say it so that the other half may reach you" - Kahlil Gibran
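George's suggestion amounts to a one-line MCA parameter file. A sketch of the resulting ~/.openmpi/mca-params.conf (the btl_base_exclude line is taken from the message above; the comments are illustrative):

# ~/.openmpi/mca-params.conf
# Always exclude the TCP BTL so only IB (mvapi) and shared memory are used.
# All nodes must be reachable over IB for jobs to start.
btl_base_exclude=tcp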
[O-MPI users] Infiniband performance problems (mvapi)
When only sending a few messages, we get reasonably good IB performance, ~500MB/s (MVAPICH is 850MB/s). However, if I crank the number of messages up, we drop to 3MB/s(!!!). This is with the OSU NBCL mpi_bandwidth test. We are running Mellanox IB Gold 1.8 with 3.3.3 firmware on PCI-X (Couger) boards. Everything works with MVAPICH, but we really need the thread support in OpenMPI. Ideas? I noticed there are a plethora of runtime options configurable for mvapi. Do I need to tweak these to get performance up? Thanks! -Mike
[O-MPI users] TCP problems with 1.0rc4
We can't seem to run across TCP. We did a default 'configure'. Shared memory seems to work, but trying tcp gives us:
[0,1,1][btl_tcp_endpoint.c:557:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=113
I'm assuming that the tcp backend is the most thoroughly tested, so I thought I'd ask in case we are doing something silly. The above is caused when running the OSU NBCL mpi_bandwidth test. Thanks! -Mike