Re: [OMPI users] Issues with Get/Put and IRecv

2007-03-27 Thread Mike Houston
Well, MPICH2 and MVAPICH2 are working smoothly for my app.  MPICH2 under 
GigE is also giving ~2X the performance of Open MPI in the cases where 
Open MPI works.  After the paper deadline, I'll attempt to package up 
a simple test case and send it to the list.


Thanks!

-Mike

Mike Houston wrote:
Sadly, I've just hit this problem again, so I'll have to find another 
MPI implementation as I have a paper deadline quickly approaching.


I'm using single threads now, but I had very similar issues when using 
multiple threads and issuing send/recv on one thread and waiting on a 
posted MPI_Recv on another.  The issue seems to actually be with 
MPI_Gets.  I can do heavy MPI_Put's and things seem okay.  But as soon 
as I have a similar communication pattern with MPI_Get's things get 
unstable.


-Mike

Brian Barrett wrote:
  

Mike -

In Open MPI 1.2, one-sided is implemented over point-to-point, so I  
would expect it to be slower.  This may or may not be addressed in a  
future version of Open MPI (I would guess so, but don't want to  
commit to it).  Were you using multiple threads?  If so, how?


On the good news front, I think your call stack looked similar to what I  
was seeing, so hopefully I can make some progress on a real solution.


Brian

On Mar 20, 2007, at 8:54 PM, Mike Houston wrote:

  

Well, I've managed to get a working solution, but I'm not sure how  
I got

there.  I built a test case that looked like a nice simple version of
what I was trying to do and it worked, so I moved the test code  
into my

implementation and, lo and behold, it works.  I must have been doing
something a little funky in the original pass, likely causing a stack
smash somewhere or trying to do a get/put out of bounds.

If I have any more problems, I'll let y'all know.  I've tested pretty
heavy usage up to 128 MPI processes across 16 nodes and things seem to
be behaving.  I did notice that single sided transfers seem to be a
little slower than explicit send/recv, at least on GigE.  Once I do  
some

more testing, I'll bring things up on IB and see how things are going.

-Mike

Mike Houston wrote:

  

Brian Barrett wrote:

  


On Mar 20, 2007, at 3:15 PM, Mike Houston wrote:




  

If I only do gets/puts, things seem to be working correctly with
version
1.2.  However, if I have a posted Irecv on the target node and  
issue a

MPI_Get against that target, MPI_Test on the posted IRecv causes a
segfault:

Anyone have suggestions?  Sadly, I need to have IRecv's posted.   
I'll

attempt to find a workaround, but it looks like the posted IRecv is
getting all the data of the MPI_Get from the other node.  It's like
the
message tagging is getting ignored.  I've never tried posting two
different IRecv's with different message tags either...


  


Hi Mike -

I've spent some time this afternoon looking at the problem and have
some ideas on what could be happening.  I don't think it's a data
mismatch (the data intended for the IRecv getting delivered to the
Get), but more a problem with the call to MPI_Test perturbing the
progress flow of the one-sided engine.  I can see one or two places
where it's possible this could happen, although I'm having trouble
replicating the problem with any test case I can write.  Is it
possible for you to share the code causing the problem (or some  
small

test case)?  It would make me feel considerably better if I could
really understand the conditions required to end up in a seg fault
state.

Thanks,

Brian



  
Well, I can give you a linux x86 binary if that would do it.  The  
code
is huge as it's part of a much larger system, so there is no such  
thing

as a simple case at the moment, and the code is in pieces and largely
unrunnable now with all the hacking...

I basically have one thread spinning on an MPI_Test on a posted IRecv
while being used as the target to the MPI_Get.  I'll see if I can  
hack

together a simple version that breaks late tonight.  I've just played
with posting a send to that IRecv, issuing the MPI_Get,  
handshaking and
then posting another IRecv and the MPI_Test continues to eat it,  
but in

a memcpy:

#0  0x001c068c in memcpy () from /lib/libc.so.6
#1  0x00e412d9 in ompi_convertor_pack (pConv=0x83c1198, iov=0xa0,
out_size=0xaffc1fd8, max_data=0xaffc1fdc) at convertor.c:254
#2  0x00ea265d in ompi_osc_pt2pt_replyreq_send (module=0x856e668,
replyreq=0x83c1180) at osc_pt2pt_data_move.c:411
#3  0x00ea0ebe in ompi_osc_pt2pt_component_fragment_cb
(pt2pt_buffer=0x8573380) at osc_pt2pt_component.c:582
#4  0x00ea1389 in ompi_osc_pt2pt_progress () at  
osc_pt2pt_component.c:769

#5  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#6  0x00ea59e5 in ompi_osc_pt2pt_passive_unlock (module=0x856e668,
origin=1, count=1) at osc_pt2pt_sync.c:60
#7  0x00ea0cd2 in ompi_osc

Re: [OMPI users] Issues with Get/Put and IRecv

2007-03-26 Thread Mike Houston
Sadly, I've just hit this problem again, so I'll have to find another 
MPI implementation as I have a paper deadline quickly approaching.


I'm using single threads now, but I had very similar issues when using 
multiple threads and issuing send/recv on one thread and waiting on a 
posted MPI_Recv on another.  The issue seems to actually be with 
MPI_Gets.  I can do heavy MPI_Put's and things seem okay.  But as soon 
as I have a similar communication pattern with MPI_Get's things get 
unstable.


-Mike

Brian Barrett wrote:

Mike -

In Open MPI 1.2, one-sided is implemented over point-to-point, so I  
would expect it to be slower.  This may or may not be addressed in a  
future version of Open MPI (I would guess so, but don't want to  
commit to it).  Were you using multiple threads?  If so, how?


On the good news front, I think your call stack looked similar to what I  
was seeing, so hopefully I can make some progress on a real solution.


Brian

On Mar 20, 2007, at 8:54 PM, Mike Houston wrote:

  
Well, I've managed to get a working solution, but I'm not sure how  
I got

there.  I built a test case that looked like a nice simple version of
what I was trying to do and it worked, so I moved the test code  
into my

implementation and, lo and behold, it works.  I must have been doing
something a little funky in the original pass, likely causing a stack
smash somewhere or trying to do a get/put out of bounds.

If I have any more problems, I'll let y'all know.  I've tested pretty
heavy usage up to 128 MPI processes across 16 nodes and things seem to
be behaving.  I did notice that single sided transfers seem to be a
little slower than explicit send/recv, at least on GigE.  Once I do  
some

more testing, I'll bring things up on IB and see how things are going.

-Mike

Mike Houston wrote:


Brian Barrett wrote:

  

On Mar 20, 2007, at 3:15 PM, Mike Houston wrote:





If I only do gets/puts, things seem to be working correctly with
version
1.2.  However, if I have a posted Irecv on the target node and  
issue a

MPI_Get against that target, MPI_Test on the posted IRecv causes a
segfault:

Anyone have suggestions?  Sadly, I need to have IRecv's posted.   
I'll

attempt to find a workaround, but it looks like the posted IRecv is
getting all the data of the MPI_Get from the other node.  It's like
the
message tagging is getting ignored.  I've never tried posting two
different IRecv's with different message tags either...


  

Hi Mike -

I've spent some time this afternoon looking at the problem and have
some ideas on what could be happening.  I don't think it's a data
mismatch (the data intended for the IRecv getting delivered to the
Get), but more a problem with the call to MPI_Test perturbing the
progress flow of the one-sided engine.  I can see one or two places
where it's possible this could happen, although I'm having trouble
replicating the problem with any test case I can write.  Is it
possible for you to share the code causing the problem (or some  
small

test case)?  It would make me feel considerably better if I could
really understand the conditions required to end up in a seg fault
state.

Thanks,

Brian



Well, I can give you a linux x86 binary if that would do it.  The  
code
is huge as it's part of a much larger system, so there is no such  
thing

as a simple case at the moment, and the code is in pieces and largely
unrunnable now with all the hacking...

I basically have one thread spinning on an MPI_Test on a posted IRecv
while being used as the target to the MPI_Get.  I'll see if I can  
hack

together a simple version that breaks late tonight.  I've just played
with posting a send to that IRecv, issuing the MPI_Get,  
handshaking and
then posting another IRecv and the MPI_Test continues to eat it,  
but in

a memcpy:

#0  0x001c068c in memcpy () from /lib/libc.so.6
#1  0x00e412d9 in ompi_convertor_pack (pConv=0x83c1198, iov=0xa0,
out_size=0xaffc1fd8, max_data=0xaffc1fdc) at convertor.c:254
#2  0x00ea265d in ompi_osc_pt2pt_replyreq_send (module=0x856e668,
replyreq=0x83c1180) at osc_pt2pt_data_move.c:411
#3  0x00ea0ebe in ompi_osc_pt2pt_component_fragment_cb
(pt2pt_buffer=0x8573380) at osc_pt2pt_component.c:582
#4  0x00ea1389 in ompi_osc_pt2pt_progress () at  
osc_pt2pt_component.c:769

#5  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#6  0x00ea59e5 in ompi_osc_pt2pt_passive_unlock (module=0x856e668,
origin=1, count=1) at osc_pt2pt_sync.c:60
#7  0x00ea0cd2 in ompi_osc_pt2pt_component_fragment_cb
(pt2pt_buffer=0x856f300) at osc_pt2pt_component.c:688
#8  0x00ea1389 in ompi_osc_pt2pt_progress () at  
osc_pt2pt_component.c:769

#9  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#10 0x00e33f05 in ompi_request_test (rptr=0xaffc2430,
completed=0xaffc2434, status=0xaffc23fc) at request/re

Re: [OMPI users] Failure to launch on a remote node. SSH problem?

2007-03-24 Thread Mike Houston
Also make sure that /tmp is user-writable.  By default, that is where 
Open MPI likes to stick some files.


-Mike

David Burns wrote:
Could also be a firewall problem. Make sure all nodes in the cluster 
accept tcp packets from all others.


Dave

Walker, David T. wrote:
  

I am presently trying to get OpenMPI up and running on a small cluster
of MacPros (dual dual-core Xeons) using TCP. Open MPI was compiled using
the Intel Fortran Compiler (9.1) and gcc.  When I try to launch a job on
a remote node, orted starts on the remote node but then times out.  I am
guessing that the problem is SSH related.  Any thoughts?

Thanks,

Dave

Details:  


I am using SSH, set up as outlined in the FAQ, using ssh-agent to allow
passwordless logins.  The paths for all the libraries appear to be OK.  


A simple MPI code (Hello_World_Fortran) launched on node01 will run OK
for up to four processors (all on node01).  The output is shown here.

node01 1247% mpirun --debug-daemons -hostfile machinefile -np 4
Hello_World_Fortran
 Calling MPI_INIT
 Calling MPI_INIT
 Calling MPI_INIT
 Calling MPI_INIT
Fortran version of Hello World, rank2
Rank 0 is present in Fortran version of Hello World.
Fortran version of Hello World, rank3
Fortran version of Hello World, rank1

For five processors mpirun tries to start an additional process on
node03.  Everything launches the same on node01 (four instances of
Hello_World_Fortran are launched).  On node03, orted starts, but times
out after 10 seconds and the output below is generated.   


node01 1246% mpirun --debug-daemons -hostfile machinefile -np 5
Hello_World_Fortran
 Calling MPI_INIT
 Calling MPI_INIT
 Calling MPI_INIT
 Calling MPI_INIT
[node03:02422] [0,0,1]-[0,0,0] mca_oob_tcp_peer_send_blocking: send()
failed with errno=57
[node01.local:21427] ERROR: A daemon on node node03 failed to start as
expected.
[node01.local:21427] ERROR: There may be more information available from
[node01.local:21427] ERROR: the remote shell (see above).
[node01.local:21427] ERROR: The daemon exited unexpectedly with status
255.
forrtl: error (78): process killed (SIGTERM)
forrtl: error (78): process killed (SIGTERM)

Here is the ompi info:


node01 1248% ompi_info --all
Open MPI: 1.1.2
   Open MPI SVN revision: r12073
Open RTE: 1.1.2
   Open RTE SVN revision: r12073
OPAL: 1.1.2
   OPAL SVN revision: r12073
  MCA memory: darwin (MCA v1.0, API v1.0, Component v1.1.2)
   MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1.2)
   MCA timer: darwin (MCA v1.0, API v1.0, Component v1.1.2)
   MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
   MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.2)
MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.2)
MCA coll: self (MCA v1.0, API v1.0, Component v1.1.2)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.2)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.2)
  MCA io: romio (MCA v1.0, API v1.0, Component v1.1.2)
   MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1.2)
 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1.2)
 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1.2)
  MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1.2)
 MCA btl: self (MCA v1.0, API v1.0, Component v1.1.2)
 MCA btl: sm (MCA v1.0, API v1.0, Component v1.1.2)
 MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.1.2)
 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
 MCA gpr: null (MCA v1.0, API v1.0, Component v1.1.2)
 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1.2)
 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1.2)
 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1.2)
 MCA iof: svc (MCA v1.0, API v1.0, Component v1.1.2)
  MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1.2)
  MCA ns: replica (MCA v1.0, API v1.0, Component v1.1.2)
 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
 MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1.2)
 MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1.2)
 MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1.2)
 MCA ras: xgrid (MCA v1.0, API v1.0, Component v1.1.2)
 MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1.2)
 MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1.2)
   MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1.2)
MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1.2)
 

Re: [OMPI users] Cell EIB support for OpenMPI

2007-03-23 Thread Mike Houston



Marcus G. Daniels wrote:

Marcus G. Daniels wrote:
  

Mike Houston wrote:
  

The main issue with this, as addressed at the end 
of the report, is that the code size is going to be a problem as data 
and code must live in the same 256KB in each SPE.  They mention dynamic 
overlay loading, which is also how we deal with large code size, but 
things get tricky and slow with the potentially needed save and restore 
of registers and LS. 

  


I did some checking on this.   Apparently the trunk of GCC and the 
latest GNU Binutils handle overlays.   Because the SPU compiler knows of 
its limited address space, the ELF object code sections reflect this, and 
the linker can transparently generate stubs to trigger the 
loading.   GCC also has options like -ffunction-sections that enable the 
linker to optimize for locality. 

So even though the OpenMPI shared libraries in total appear to have a 
footprint about four times too big for code alone (don't know about the 
typical stack & heap requirements), perhaps it's still doable without a 
big effort to strip down OpenMPI?
  
But loading an overlay can be quite expensive depending on how much 
needs to be loaded and how much user data/code needs to be restored.  If 
the user is trying to use most of the LS for data, which is perfectly 
sane and reasonable, then you might have to load multiple overlays to 
complete a function. We've also been having issues with mixing manual 
overlay loading of our code with the autoloading generated by the compiler.


Regardless, it would be interesting to see if this can even be made to 
work.  If so, it might really help people get apps up on Cell since it 
can be reasonably thought of as a cluster on a chip, backed by a larger 
address space.


-Mike


Re: [OMPI users] Cell EIB support for OpenMPI

2007-03-22 Thread Mike Houston
That's pretty cool.  The main issue with this, as addressed at the end 
of the report, is that the code size is going to be a problem as data 
and code must live in the same 256KB in each SPE.  They mention dynamic 
overlay loading, which is also how we deal with large code size, but 
things get tricky and slow with the potentially needed save and restore 
of registers and LS.  It would be interesting to see how much of MPI 
could be implemented and how much is really needed.  Maybe it's time to 
think about an MPI-ES spec?


-Mike

Marcus G. Daniels wrote:

Hi,

Has anyone investigated adding intra chip Cell EIB messaging to OpenMPI?
It seems like it ought to work.  This paper seems pretty convincing:

http://www.cs.fsu.edu/research/reports/TR-061215.pdf

  


Re: [OMPI users] Issues with Get/Put and IRecv

2007-03-20 Thread Mike Houston
Well, I've managed to get a working solution, but I'm not sure how I got 
there.  I built a test case that looked like a nice simple version of 
what I was trying to do and it worked, so I moved the test code into my 
implementation and, lo and behold, it works.  I must have been doing 
something a little funky in the original pass, likely causing a stack 
smash somewhere or trying to do a get/put out of bounds.


If I have any more problems, I'll let y'all know.  I've tested pretty 
heavy usage up to 128 MPI processes across 16 nodes and things seem to 
be behaving.  I did notice that single sided transfers seem to be a 
little slower than explicit send/recv, at least on GigE.  Once I do some 
more testing, I'll bring things up on IB and see how things are going.


-Mike

Mike Houston wrote:

Brian Barrett wrote:
  

On Mar 20, 2007, at 3:15 PM, Mike Houston wrote:

  

If I only do gets/puts, things seem to be working correctly with  
version

1.2.  However, if I have a posted Irecv on the target node and issue a
MPI_Get against that target, MPI_Test on the posted IRecv causes a  
segfault:


Anyone have suggestions?  Sadly, I need to have IRecv's posted.  I'll
attempt to find a workaround, but it looks like the posted IRecv is
getting all the data of the MPI_Get from the other node.  It's like  
the

message tagging is getting ignored.  I've never tried posting two
different IRecv's with different message tags either...

  

Hi Mike -

I've spent some time this afternoon looking at the problem and have  
some ideas on what could be happening.  I don't think it's a data  
mismatch (the data intended for the IRecv getting delivered to the  
Get), but more a problem with the call to MPI_Test perturbing the  
progress flow of the one-sided engine.  I can see one or two places  
where it's possible this could happen, although I'm having trouble  
replicating the problem with any test case I can write.  Is it  
possible for you to share the code causing the problem (or some small  
test case)?  It would make me feel considerably better if I could  
really understand the conditions required to end up in a seg fault  
state.


Thanks,

Brian
  

Well, I can give you a linux x86 binary if that would do it.  The code 
is huge as it's part of a much larger system, so there is no such thing 
as a simple case at the moment, and the code is in pieces and largely 
unrunnable now with all the hacking...


I basically have one thread spinning on an MPI_Test on a posted IRecv 
while being used as the target to the MPI_Get.  I'll see if I can hack 
together a simple version that breaks late tonight.  I've just played 
with posting a send to that IRecv, issuing the MPI_Get, handshaking and 
then posting another IRecv and the MPI_Test continues to eat it, but in 
a memcpy:


#0  0x001c068c in memcpy () from /lib/libc.so.6
#1  0x00e412d9 in ompi_convertor_pack (pConv=0x83c1198, iov=0xa0, 
out_size=0xaffc1fd8, max_data=0xaffc1fdc) at convertor.c:254
#2  0x00ea265d in ompi_osc_pt2pt_replyreq_send (module=0x856e668, 
replyreq=0x83c1180) at osc_pt2pt_data_move.c:411
#3  0x00ea0ebe in ompi_osc_pt2pt_component_fragment_cb 
(pt2pt_buffer=0x8573380) at osc_pt2pt_component.c:582

#4  0x00ea1389 in ompi_osc_pt2pt_progress () at osc_pt2pt_component.c:769
#5  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#6  0x00ea59e5 in ompi_osc_pt2pt_passive_unlock (module=0x856e668, 
origin=1, count=1) at osc_pt2pt_sync.c:60
#7  0x00ea0cd2 in ompi_osc_pt2pt_component_fragment_cb 
(pt2pt_buffer=0x856f300) at osc_pt2pt_component.c:688

#8  0x00ea1389 in ompi_osc_pt2pt_progress () at osc_pt2pt_component.c:769
#9  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#10 0x00e33f05 in ompi_request_test (rptr=0xaffc2430, 
completed=0xaffc2434, status=0xaffc23fc) at request/req_test.c:82
#11 0x00e61770 in PMPI_Test (request=0xaffc2430, completed=0xaffc2434, 
status=0xaffc23fc) at ptest.c:52


-Mike

  


Re: [OMPI users] Issues with Get/Put and IRecv

2007-03-20 Thread Mike Houston



Brian Barrett wrote:

On Mar 20, 2007, at 3:15 PM, Mike Houston wrote:

  
If I only do gets/puts, things seem to be working correctly with  
version

1.2.  However, if I have a posted Irecv on the target node and issue a
MPI_Get against that target, MPI_Test on the posted IRecv causes a  
segfault:


Anyone have suggestions?  Sadly, I need to have IRecv's posted.  I'll
attempt to find a workaround, but it looks like the posted IRecv is
getting all the data of the MPI_Get from the other node.  It's like  
the

message tagging is getting ignored.  I've never tried posting two
different IRecv's with different message tags either...



Hi Mike -

I've spent some time this afternoon looking at the problem and have  
some ideas on what could be happening.  I don't think it's a data  
mismatch (the data intended for the IRecv getting delivered to the  
Get), but more a problem with the call to MPI_Test perturbing the  
progress flow of the one-sided engine.  I can see one or two places  
where it's possible this could happen, although I'm having trouble  
replicating the problem with any test case I can write.  Is it  
possible for you to share the code causing the problem (or some small  
test case)?  It would make me feel considerably better if I could  
really understand the conditions required to end up in a seg fault  
state.


Thanks,

Brian
  
Well, I can give you a linux x86 binary if that would do it.  The code 
is huge as it's part of a much larger system, so there is no such thing 
as a simple case at the moment, and the code is in pieces and largely 
unrunnable now with all the hacking...


I basically have one thread spinning on an MPI_Test on a posted IRecv 
while being used as the target to the MPI_Get.  I'll see if I can hack 
together a simple version that breaks late tonight.  I've just played 
with posting a send to that IRecv, issuing the MPI_Get, handshaking and 
then posting another IRecv and the MPI_Test continues to eat it, but in 
a memcpy:


#0  0x001c068c in memcpy () from /lib/libc.so.6
#1  0x00e412d9 in ompi_convertor_pack (pConv=0x83c1198, iov=0xa0, 
out_size=0xaffc1fd8, max_data=0xaffc1fdc) at convertor.c:254
#2  0x00ea265d in ompi_osc_pt2pt_replyreq_send (module=0x856e668, 
replyreq=0x83c1180) at osc_pt2pt_data_move.c:411
#3  0x00ea0ebe in ompi_osc_pt2pt_component_fragment_cb 
(pt2pt_buffer=0x8573380) at osc_pt2pt_component.c:582

#4  0x00ea1389 in ompi_osc_pt2pt_progress () at osc_pt2pt_component.c:769
#5  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#6  0x00ea59e5 in ompi_osc_pt2pt_passive_unlock (module=0x856e668, 
origin=1, count=1) at osc_pt2pt_sync.c:60
#7  0x00ea0cd2 in ompi_osc_pt2pt_component_fragment_cb 
(pt2pt_buffer=0x856f300) at osc_pt2pt_component.c:688

#8  0x00ea1389 in ompi_osc_pt2pt_progress () at osc_pt2pt_component.c:769
#9  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#10 0x00e33f05 in ompi_request_test (rptr=0xaffc2430, 
completed=0xaffc2434, status=0xaffc23fc) at request/req_test.c:82
#11 0x00e61770 in PMPI_Test (request=0xaffc2430, completed=0xaffc2434, 
status=0xaffc23fc) at ptest.c:52


-Mike


[OMPI users] Issues with Get/Put and IRecv

2007-03-20 Thread Mike Houston
If I only do gets/puts, things seem to be working correctly with version 
1.2.  However, if I have a posted Irecv on the target node and issue a 
MPI_Get against that target, MPI_Test on the posted IRecv causes a segfault:


[expose:21249] *** Process received signal ***
[expose:21249] Signal: Segmentation fault (11)
[expose:21249] Signal code: Address not mapped (1)
[expose:21249] Failing at address: 0xa0
[expose:21249] [ 0] [0x96e440]
[expose:21249] [ 1] 
/usr/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_replyreq_send+0xed) 
[0x2c765d]

[expose:21249] [ 2] /usr/lib/openmpi/mca_osc_pt2pt.so [0x2c5ebe]
[expose:21249] [ 3] 
/usr/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_progress+0x119) [0x2c6389]

[expose:21249] [ 4] /usr/lib/libopen-pal.so.0(opal_progress+0x69) [0x67d019]
[expose:21249] [ 5] 
/usr/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_passive_unlock+0xb5) 
[0x2ca9e5]

[expose:21249] [ 6] /usr/lib/openmpi/mca_osc_pt2pt.so [0x2c5cd2]
[expose:21249] [ 7] 
/usr/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_progress+0x119) [0x2c6389]

[expose:21249] [ 8] /usr/lib/libopen-pal.so.0(opal_progress+0x69) [0x67d019]
[expose:21249] [ 9] /usr/lib/libmpi.so.0(ompi_request_test+0x35) [0x3d6f05]
[expose:21249] [10] /usr/lib/libmpi.so.0(PMPI_Test+0x80) [0x404770]

Anyone have suggestions?  Sadly, I need to have IRecv's posted.  I'll 
attempt to find a workaround, but it looks like the posted IRecv is 
getting all the data of the MPI_Get from the other node.  It's like the 
message tagging is getting ignored.  I've never tried posting two 
different IRecv's with different message tags either...


-Mike
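
For reference, a minimal sketch of the communication pattern being described 
(run with at least two ranks): rank 1 exposes a window and spins MPI_Test on a 
posted, tagged Irecv while rank 0 issues an MPI_Get against it under a 
passive-target lock.  This is not the actual code from this thread; the tag, 
buffer size, and two-rank layout are arbitrary illustration choices.

#include <mpi.h>
#include <stdio.h>

#define N   1024
#define TAG 42              /* arbitrary tag for the posted receive */

int main(int argc, char **argv)
{
    int rank, i, flag = 0, ctrl = 0;
    double winbuf[N], getbuf[N];
    MPI_Win win;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < N; i++)
        winbuf[i] = (double)rank;

    /* Every rank exposes winbuf; rank 1 is the target of the MPI_Get. */
    MPI_Win_create(winbuf, (MPI_Aint)(N * sizeof(double)), (int)sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    if (rank == 1) {
        /* Post a tagged receive, then spin in MPI_Test (driving progress)
           while this rank is also the target of the one-sided get. */
        MPI_Irecv(&ctrl, 1, MPI_INT, 0, TAG, MPI_COMM_WORLD, &req);
        while (!flag)
            MPI_Test(&req, &flag, &status);
    } else if (rank == 0) {
        /* One-sided read from rank 1 under a passive-target lock. */
        MPI_Win_lock(MPI_LOCK_SHARED, 1, 0, win);
        MPI_Get(getbuf, N, MPI_DOUBLE, 1, 0, N, MPI_DOUBLE, win);
        MPI_Win_unlock(1, win);

        /* Only now satisfy the posted receive. */
        MPI_Send(&ctrl, 1, MPI_INT, 1, TAG, MPI_COMM_WORLD);
        printf("rank 0: get and send complete, got %f\n", getbuf[0]);
    }

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}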


Re: [OMPI users] Signal 13

2007-03-15 Thread Mike Houston
I've been having similar issues with brand new FC5/6 and RHEL5 machines, 
but our FC4/RHEL4 machines are just fine.  On the FC5/6 RHEL5 machines, 
I can get things to run as root.  There must be some ACL or security 
setting issue that's enabled by default on the newer distros.  If I 
figure it out this weekend, I'll let you know.  If anyone else knows the 
solution, please post to the list.


-Mike

David Bronke wrote:

I've been trying to get OpenMPI working on two of the computers at a
lab I help administer, and I'm running into a rather large issue. When
running anything using mpirun as a normal user, I get the following
output:


$ mpirun --no-daemonize --host
localhost,localhost,localhost,localhost,localhost,localhost,localhost,localhost
/workspace/bronke/mpi/hello
mpirun noticed that job rank 0 with PID 0 on node "localhost" exited
on signal 13.
[trixie:18104] ERROR: A daemon on node localhost failed to start as expected.
[trixie:18104] ERROR: There may be more information available from
[trixie:18104] ERROR: the remote shell (see above).
[trixie:18104] The daemon received a signal 13.
8 additional processes aborted (not shown)


However, running the same exact command line as root works fine:


$ sudo mpirun --no-daemonize --host
localhost,localhost,localhost,localhost,localhost,localhost,localhost,localhost
/workspace/bronke/mpi/hello
Password:
p is 8, my_rank is 0
p is 8, my_rank is 1
p is 8, my_rank is 2
p is 8, my_rank is 3
p is 8, my_rank is 6
p is 8, my_rank is 7
Greetings from process 1!

Greetings from process 2!

Greetings from process 3!

p is 8, my_rank is 5
p is 8, my_rank is 4
Greetings from process 4!

Greetings from process 5!

Greetings from process 6!

Greetings from process 7!


I've looked up signal 13, and have found that it is apparently
SIGPIPE; I also found a thread on the LAM-MPI site:
http://www.lam-mpi.org/MailArchives/lam/2004/08/8486.php
However, this thread seems to indicate that the problem would be in
the application, (/workspace/bronke/mpi/hello in this case) but there
are no pipes in use in this app, and the fact that it works as
expected as root doesn't seem to fit either. I have tried running
mpirun with --verbose and it doesn't show any more output than without
it, so I've run into a sort of dead-end on this issue. Does anyone
know of any way I can figure out what's going wrong or how I can fix
it?

Thanks!
  


[OMPI users] Fun with threading

2007-03-13 Thread Mike Houston
At least with 1.1.4, I'm having a heck of a time with enabling 
multi-threading.  Configuring with --with-threads=posix 
--enable-mpi-threads --enable-progress-threads leads to mpirun just 
hanging, even when not launching MPI apps (e.g. mpirun -np 1 hostname), 
and I can't ctrl-C to kill it; I have to kill -9 it.  Removing progress 
threads support results in the same behavior.  Removing 
--enable-mpi-threads gets mpirun working again, but without the thread 
protection I need.


What is the status of multi-threading support?  It looks like it's still 
largely untested from my reading of the mailing lists.  We actually have 
an application that would be much easier to deal with if we could have 
two threads in a process both using MPI.  Funneling everything through a 
single processor creates a locking nightmare, and generally means we 
will be forced to spin checking an IRecv and the status of a data 
structure instead of having one thread happily sitting on a blocking 
receive and the other watching the data structure, basically pissing 
away a processor that we could be using to do something useful.  (We are 
basically doing a simplified version of DSM and we need to respond to 
remote data requests).
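
For what it's worth, the two-threads-both-using-MPI design described above 
requires the library to grant MPI_THREAD_MULTIPLE.  A minimal sketch of 
requesting that level and checking what is actually provided follows; this is 
not the code from this thread, the tag and thread layout are illustrative 
assumptions, and anything less than MPI_THREAD_MULTIPLE makes the concurrent 
calls below unsafe.

#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

/* Thread that blocks in MPI_Recv, standing in for a thread that services
   remote data requests. */
static void *service_thread(void *arg)
{
    int msg;
    (void)arg;
    MPI_Recv(&msg, 1, MPI_INT, MPI_ANY_SOURCE, 99, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, rank, size, msg = 1;
    pthread_t tid;

    /* Ask for full multi-thread support and check what we actually got. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (provided < MPI_THREAD_MULTIPLE && rank == 0)
        printf("warning: wanted MPI_THREAD_MULTIPLE, got level %d\n",
               provided);

    pthread_create(&tid, NULL, service_thread, NULL);

    /* The main thread keeps making MPI calls concurrently; here it just
       feeds the service thread on the next rank its one message. */
    MPI_Send(&msg, 1, MPI_INT, (rank + 1) % size, 99, MPI_COMM_WORLD);

    pthread_join(tid, NULL);
    MPI_Finalize();
    return 0;
}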


At the moment, it seems that when running without threading support 
enabled, if we only post a receive on a single thread, things are mostly 
happy, except if one thread in a process sends to the other thread in the 
same process, which has posted a receive.  Under TCP, the send fails with:


*** An error occurred in MPI_Send
*** on communicator MPI_COMM_WORLD
*** MPI_ERR_INTERN: internal error
*** MPI_ERRORS_ARE_FATAL (goodbye)
[0,0,0]-[0,1,0] mca_oob_tcp_msg_recv: readv failed with errno=104

SM has undefined results.

Obviously I'm playing fast and loose, which is why I'm attempting to get 
threading support to work to see if it solves the headaches.  If you 
really want to have some fun, have a posted MPI_Recv on one thread and 
issue an MPI_Barrier on the other (with SM):


Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0x1c
[0] func:/usr/lib/libopal.so.0 [0xc030f4]
[1] func:/lib/tls/libpthread.so.0 [0x46f93890]
[2] 
func:/usr/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_match+0xb08) 
[0x14ec38]
[3] 
func:/usr/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback+0x2f9) 
[0x14f7e9]
[4] 
func:/usr/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0xa87) 
[0x806c07]

[5] func:/usr/lib/openmpi/mca_bml_r2.so(mca_bml_r2_progress+0x39) [0x510c69]
[6] func:/usr/lib/libopal.so.0(opal_progress+0x69) [0xbecc39]
[7] func:/usr/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x785) [0x14d675]
[8] 
func:/usr/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_sendrecv_actual_localcompleted+0x8c) 
[0x5cc3fc]
[9] 
func:/usr/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_barrier_intra_two_procs+0x76) 
[0x5ceef6]
[10] 
func:/usr/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_barrier_intra_dec_fixed+0x38) 
[0x5cc638]

[11] func:/usr/lib/libmpi.so.0(PMPI_Barrier+0xe9) [0x29a1b9]

-Mike


Re: [O-MPI users] Infiniband performance problems (mvapi)

2005-10-31 Thread Mike Houston

Sometimes getting crashes:

mpirun -np 2 -mca mpi_leave_pinned 1 -mca btl_mvapi_flags 2 -hostfile 
/u/mhouston/mpihosts mpi_bandwidth 25 131072
mpirun noticed that job rank 0 with PID 10611 on node 
"spire-2.stanford.edu" exited on signal 11.

1 process killed (possibly by Open MPI).

The backtrace is bogus, else I'd drop it in.

Setting the number of messages <=10 always seems to work.

-Mike

Tim S. Woodall wrote:


Mike,

There appears to be an issue in our mvapi get protocol. To temporarily
disable this:

/u/twoodall> orterun -np 2 -mca mpi_leave_pinned 1 -mca btl_mvapi_flags 2 ./bw 
25 131072
131072  801.580272 (MillionBytes/sec)   764.446518(MegaBytes/sec)


Mike Houston wrote:
 


What's the ETA, or should I try grabbing from cvs?

-Mike

Tim S. Woodall wrote:


   


Mike,

I believe this was probably corrected today and should be in the
next release candidate.

Thanks,
Tim

Mike Houston wrote:



 

Woops, spoke too soon.  The performance quoted was not actually going 
between nodes.  Actually using the network with the pinned option gives:


[0,1,0][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress] 
[0,1,1][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress] Got 
error : VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb74a1c18Got error : 
VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb73e1720


repeated many times.

-Mike

Mike Houston wrote:


 



   

That seems to work with the pinning option enabled.  THANKS! 

Now I'll go back to testing my real code.  I'm getting 700MB/s for 
messages >=128KB.  This is a little bit lower than MVAPICH, 10-20%, but 
still pretty darn good.  My guess is that I can play with the setting 
more to tweak up performance.  Now if I can get the tcp layer working, 
I'm pretty much good to go.


Any word on an SDP layer?  I can probably modify the tcp layer quickly 
to do SDP, but I thought I would ask.


-Mike

Tim S. Woodall wrote:




   



 


Hello Mike,

Mike Houston wrote:





 



   

When only sending a few messages, we get reasonably good IB performance, 
~500MB/s (MVAPICH is 850MB/s).  However, if I crank the number of 
messages up, we drop to 3MB/s(!!!).  This is with the OSU NBCL 
mpi_bandwidth test.  We are running Mellanox IB Gold 1.8 with 3.3.3 
firmware on PCI-X (Cougar) boards.  Everything works with MVAPICH, but 
we really need the thread support in OpenMPI.


Ideas?  I noticed there are a plethora of runtime options configurable 
for mvapi.  Do I need to tweak these to get performance up?




 

   

 


You might try running w/ the:

mpirun -mca mpi_leave_pinned 1

This will cause the mvapi port to maintain an MRU cache of registrations,
rather than dynamically pinning/unpinning memory.

If this does not resolve the BW problems, try increasing the
resources allocated to each connection:

-mca btl_mvapi_rd_min 128
-mca btl_mvapi_rd_max 256

Also can you forward me a copy of the test code or a reference to it?

Thanks,
Tim




 

   




   

 



 

   




 



   


 





Re: [O-MPI users] Infiniband performance problems (mvapi)

2005-10-31 Thread Mike Houston

Better, but still having issues at lots of outstanding messages:

mpirun -np 2 -mca mpi_leave_pinned 1 -mca btl_mvapi_flags 2 
mpi_bandwidth 1000 131072

131072  669.574904 (MillionBytes/sec)   638.556389(MegaBytes/sec)

mpirun -np 2 -mca mpi_leave_pinned 1 -mca btl_mvapi_flags 2 
mpi_bandwidth 1 131072

131072  115.873284 (MillionBytes/sec)   110.505375(MegaBytes/sec)

Sorry to be such a pain...  We need the speed of MVAPICH with the 
threading support of OpenMPI...


-Mike


Tim S. Woodall wrote:


Mike,

There appears to be an issue in our mvapi get protocol. To temporarily
disable this:

/u/twoodall> orterun -np 2 -mca mpi_leave_pinned 1 -mca btl_mvapi_flags 2 ./bw 
25 131072
131072  801.580272 (MillionBytes/sec)   764.446518(MegaBytes/sec)


Mike Houston wrote:
 


What's the ETA, or should I try grabbing from cvs?

-Mike

Tim S. Woodall wrote:


   


Mike,

I believe this was probably corrected today and should be in the
next release candidate.

Thanks,
Tim

Mike Houston wrote:



 

Woops, spoke too soon.  The performance quoted was not actually going 
between nodes.  Actually using the network with the pinned option gives:


[0,1,0][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress] 
[0,1,1][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress] Got 
error : VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb74a1c18Got error : 
VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb73e1720


repeated many times.

-Mike

Mike Houston wrote:


 



   

That seems to work with the pinning option enabled.  THANKS! 

Now I'll go back to testing my real code.  I'm getting 700MB/s for 
messages >=128KB.  This is a little bit lower than MVAPICH, 10-20%, but 
still pretty darn good.  My guess is that I can play with the setting 
more to tweak up performance.  Now if I can get the tcp layer working, 
I'm pretty much good to go.


Any word on an SDP layer?  I can probably modify the tcp layer quickly 
to do SDP, but I thought I would ask.


-Mike

Tim S. Woodall wrote:




   



     


Hello Mike,

Mike Houston wrote:





 



   

When only sending a few messages, we get reasonably good IB performance, 
~500MB/s (MVAPICH is 850MB/s).  However, if I crank the number of 
messages up, we drop to 3MB/s(!!!).  This is with the OSU NBCL 
mpi_bandwidth test.  We are running Mellanox IB Gold 1.8 with 3.3.3 
firmware on PCI-X (Cougar) boards.  Everything works with MVAPICH, but 
we really need the thread support in OpenMPI.


Ideas?  I noticed there are a plethora of runtime options configurable 
for mvapi.  Do I need to tweak these to get performance up?




 

   

 


You might try running w/ the:

mpirun -mca mpi_leave_pinned 1

This will cause the mvapi port to maintain an MRU cache of registrations,
rather than dynamically pinning/unpinning memory.

If this does not resolve the BW problems, try increasing the
resources allocated to each connection:

-mca btl_mvapi_rd_min 128
-mca btl_mvapi_rd_max 256

Also can you forward me a copy of the test code or a reference to it?

Thanks,
Tim




 

   




   

 



 

   




 



   


 





[O-MPI users] TCP problems

2005-10-31 Thread Mike Houston
I have things working now.  I needed to limit OpenMPI to actual working 
interfaces (thanks for the tip).  It still seems like that should be figured 
out automatically...  Now I've moved on to stress testing with the bandwidth 
testing app I posted earlier in the Infiniband thread:


mpirun -mca btl_tcp_if_include eth0 -mca btl tcp -np 2 -hostfile 
/u/mhouston/mpihosts mpi_bandwidth 3750 262144


262144  109.697279 (MillionBytes/sec)   104.615478(MegaBytes/sec)

mpirun -mca btl_tcp_if_include eth0 -mca btl tcp -np 2 -hostfile 
/u/mhouston/mpihosts mpi_bandwidth 4000 262144
[spire-2.Stanford.EDU:06645] mca_btl_tcp_frag_send: writev failed with 
errno=104mpirun noticed that job rank 1 with PID 21281 on node 
"spire-3.stanford.edu" exited on signal 11.


Cranking up the number of messages in flight makes things really 
unhappy.  I haven't seen this behavior with LAM or MPICH so I thought 
I'd mention it.


Thanks!

-Mike


Re: [O-MPI users] Infiniband performance problems (mvapi)

2005-10-31 Thread Mike Houston

What's the ETA, or should I try grabbing from cvs?

-Mike

Tim S. Woodall wrote:


Mike,

I believe this was probably corrected today and should be in the
next release candidate.

Thanks,
Tim

Mike Houston wrote:
 

Woops, spoke too soon.  The performance quoted was not actually going 
between nodes.  Actually using the network with the pinned option gives:


[0,1,0][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress] 
[0,1,1][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress] Got 
error : VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb74a1c18Got error : 
VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb73e1720


repeated many times.

-Mike

Mike Houston wrote:


   

That seems to work with the pinning option enabled.  THANKS! 

Now I'll go back to testing my real code.  I'm getting 700MB/s for 
messages >=128KB.  This is a little bit lower than MVAPICH, 10-20%, but 
still pretty darn good.  My guess is that I can play with the setting 
more to tweak up performance.  Now if I can get the tcp layer working, 
I'm pretty much good to go.


Any word on an SDP layer?  I can probably modify the tcp layer quickly 
to do SDP, but I thought I would ask.


-Mike

Tim S. Woodall wrote:




     


Hello Mike,

Mike Houston wrote:


 



   

When only sending a few messages, we get reasonably good IB performance, 
~500MB/s (MVAPICH is 850MB/s).  However, if I crank the number of 
messages up, we drop to 3MB/s(!!!).  This is with the OSU NBCL 
mpi_bandwidth test.  We are running Mellanox IB Gold 1.8 with 3.3.3 
firmware on PCI-X (Cougar) boards.  Everything works with MVAPICH, but 
we really need the thread support in OpenMPI.


Ideas?  I noticed there are a plethora of runtime options configurable 
for mvapi.  Do I need to tweak these to get performance up?




   

 


You might try running w/ the:

mpirun -mca mpi_leave_pinned 1

This will cause the mvapi port to maintain an MRU cache of registrations,
rather than dynamically pinning/unpinning memory.

If this does not resolve the BW problems, try increasing the
resources allocated to each connection:

-mca btl_mvapi_rd_min 128
-mca btl_mvapi_rd_max 256

Also can you forward me a copy of the test code or a reference to it?

Thanks,
Tim


 

   




 



   


 





Re: [O-MPI users] Infiniband performance problems (mvapi)

2005-10-31 Thread Mike Houston
mpirun -mca btl_mvapi_rd_min 128 -mca btl_mvapi_rd_max 256 -np 2 
-hostfile /u/mhouston/mpihosts mpi_bandwidth 21 131072


131072  519.922184 (MillionBytes/sec)   495.836433(MegaBytes/sec)

mpirun -mca btl_mvapi_rd_min 128 -mca btl_mvapi_rd_max 256 -np 2 
-hostfile /u/mhouston/mpihosts mpi_bandwidth 22 131072


131072  3.360296 (MillionBytes/sec) 3.204628(MegaBytes/sec)

Moving from 21 messages to 22 causes a HUGE drop in performance.  The 
app tries to send all of the messages non-blocking at once...  Setting 
-mca mpi_leave_pinned 1 causes:


[0,1,1][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress] Got 
error : VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb73412fc


repeated until it eventually hangs.

-Mike

Mike Houston wrote:

Woops, spoke too soon.  The performance quoted was not actually going 
between nodes.  Actually using the network with the pinned option gives:


[0,1,0][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress] 
[0,1,1][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress] Got 
error : VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb74a1c18Got error : 
VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb73e1720


repeated many times.

-Mike

Mike Houston wrote:

 

That seems to work with the pinning option enabled.  THANKS! 

Now I'll go back to testing my real code.  I'm getting 700MB/s for 
messages >=128KB.  This is a little bit lower than MVAPICH, 10-20%, but 
still pretty darn good.  My guess is that I can play with the setting 
more to tweak up performance.  Now if I can get the tcp layer working, 
I'm pretty much good to go.


Any word on an SDP layer?  I can probably modify the tcp layer quickly 
to do SDP, but I thought I would ask.


-Mike

Tim S. Woodall wrote:



   


Hello Mike,

Mike Houston wrote:


  

 

When only sending a few messages, we get reasonably good IB performance, 
~500MB/s (MVAPICH is 850MB/s).  However, if I crank the number of 
messages up, we drop to 3MB/s(!!!).  This is with the OSU NBCL 
mpi_bandwidth test.  We are running Mellanox IB Gold 1.8 with 3.3.3 
firmware on PCI-X (Cougar) boards.  Everything works with MVAPICH, but 
we really need the thread support in OpenMPI.


Ideas?  I noticed there are a plethora of runtime options configurable 
for mvapi.  Do I need to tweak these to get performance up?


 



   


You might try running w/ the:

mpirun -mca mpi_leave_pinned 1

This will cause the mvapi port to maintain an MRU cache of registrations,
rather than dynamically pinning/unpinning memory.

If this does not resolve the BW problems, try increasing the
resources allocated to each connection:

-mca btl_mvapi_rd_min 128
-mca btl_mvapi_rd_max 256

Also can you forward me a copy of the test code or a reference to it?

Thanks,
Tim


  

 




   



 





Re: [O-MPI users] Infiniband performance problems (mvapi)

2005-10-31 Thread Mike Houston
Woops, spoke too soon.  The performance quoted was not actually going 
between nodes.  Actually using the network with the pinned option gives:


[0,1,0][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress] 
[0,1,1][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress] Got 
error : VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb74a1c18Got error : 
VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb73e1720


repeated many times.

-Mike

Mike Houston wrote:

That seems to work with the pinning option enabled.  THANKS! 

Now I'll go back to testing my real code.  I'm getting 700MB/s for 
messages >=128KB.  This is a little bit lower than MVAPICH, 10-20%, but 
still pretty darn good.  My guess is that I can play with the setting 
more to tweak up performance.  Now if I can get the tcp layer working, 
I'm pretty much good to go.


Any word on an SDP layer?  I can probably modify the tcp layer quickly 
to do SDP, but I thought I would ask.


-Mike

Tim S. Woodall wrote:

 


Hello Mike,

Mike Houston wrote:


   

When only sending a few messages, we get reasonably good IB performance, 
~500MB/s (MVAPICH is 850MB/s).  However, if I crank the number of 
messages up, we drop to 3MB/s(!!!).  This is with the OSU NBCL 
mpi_bandwidth test.  We are running Mellanox IB Gold 1.8 with 3.3.3 
firmware on PCI-X (Cougar) boards.  Everything works with MVAPICH, but 
we really need the thread support in OpenMPI.


Ideas?  I noticed there are a plethora of runtime options configurable 
for mvapi.  Do I need to tweak these to get performance up?


  

 


You might try running w/ the:

mpirun -mca mpi_leave_pinned 1

This will cause the mvapi port to maintain an MRU cache of registrations,
rather than dynamically pinning/unpinning memory.

If this does not resolve the BW problems, try increasing the
resources allocated to each connection:

-mca btl_mvapi_rd_min 128
-mca btl_mvapi_rd_max 256

Also can you forward me a copy of the test code or a reference to it?

Thanks,
Tim


   



 





Re: [O-MPI users] Infiniband performance problems (mvapi)

2005-10-31 Thread Mike Houston
That seems to work with the pinning option enabled.  THANKS! 

Now I'll go back to testing my real code.  I'm getting 700MB/s for 
messages >=128KB.  This is a little bit lower than MVAPICH, 10-20%, but 
still pretty darn good.  My guess is that I can play with the setting 
more to tweak up performance.  Now if I can get the tcp layer working, 
I'm pretty much good to go.


Any word on an SDP layer?  I can probably modify the tcp layer quickly 
to do SDP, but I thought I would ask.


-Mike

Tim S. Woodall wrote:


Hello Mike,

Mike Houston wrote:
 

When only sending a few messages, we get reasonably good IB performance, 
~500MB/s (MVAPICH is 850MB/s).  However, if I crank the number of 
messages up, we drop to 3MB/s(!!!).  This is with the OSU NBCL 
mpi_bandwidth test.  We are running Mellanox IB Gold 1.8 with 3.3.3 
firmware on PCI-X (Cougar) boards.  Everything works with MVAPICH, but 
we really need the thread support in OpenMPI.


Ideas?  I noticed there are a plethora of runtime options configurable 
for mvapi.  Do I need to tweak these to get performance up?


   



You might try running w/ the:

mpirun -mca mpi_leave_pinned 1

This will cause the mvapi port to maintain an MRU cache of registrations,
rather than dynamically pinning/unpinning memory.

If this does not resolve the BW problems, try increasing the
resources allocated to each connection:

-mca btl_mvapi_rd_min 128
-mca btl_mvapi_rd_max 256

Also can you forward me a copy of the test code or a reference to it?

Thanks,
Tim
 





Re: [O-MPI users] Infiniband performance problems (mvapi)

2005-10-31 Thread Mike Houston

I'll give it a go.  Attached is the code.

Thanks!

-Mike

Tim S. Woodall wrote:


Hello Mike,

Mike Houston wrote:
 

When only sending a few messages, we get reasonably good IB performance, 
~500MB/s (MVAPICH is 850MB/s).  However, if I crank the number of 
messages up, we drop to 3MB/s(!!!).  This is with the OSU NBCL 
mpi_bandwidth test.  We are running Mellanox IB Gold 1.8 with 3.3.3 
firmware on PCI-X (Cougar) boards.  Everything works with MVAPICH, but 
we really need the thread support in OpenMPI.


Ideas?  I noticed there are a plethora of runtime options configurable 
for mvapi.  Do I need to tweak these to get performance up?


   



You might try running w/ the:

mpirun -mca mpi_leave_pinned 1

This will cause the mvapi port to maintain an MRU cache of registrations,
rather than dynamically pinning/unpinning memory.

If this does not resolve the BW problems, try increasing the
resources allocated to each connection:

-mca btl_mvapi_rd_min 128
-mca btl_mvapi_rd_max 256

Also can you forward me a copy of the test code or a reference to it?

Thanks,
Tim
 




/*
 * Copyright (C) 2002-2003 the Network-Based Computing Laboratory
 * (NBCL), The Ohio State University.  
 */

#include "mpi.h"
#include <stdio.h>      /* fprintf */
#include <stdlib.h>     /* atoi */
#include <string.h>
#include <unistd.h>     /* getpagesize */
#include <assert.h>     /* assert */


#define MYBUFSIZE (4*1024*1028)
#define MAX_REQ_NUM 10

char s_buf1[MYBUFSIZE];
char r_buf1[MYBUFSIZE];


MPI_Request request[MAX_REQ_NUM];
MPI_Status stat[MAX_REQ_NUM];

int main(int argc,char *argv[])
{
int  myid, numprocs, i;
int size, loop, page_size;
char *s_buf, *r_buf;
double t_start=0.0, t_end=0.0, t=0.0;


MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);

if ( argc < 3 ) {
   fprintf(stderr, "Usage: bw loop msg_size\n");
   MPI_Finalize();
   return 0;
}
size=atoi(argv[2]);
loop = atoi(argv[1]);

if(size > MYBUFSIZE){
 fprintf(stderr, "Maximum message size is %d\n",MYBUFSIZE);
 MPI_Finalize();
 return 0;
}

if(loop > MAX_REQ_NUM){
 fprintf(stderr, "Maximum number of iterations is %d\n",MAX_REQ_NUM);
 MPI_Finalize();
 return 0;
}

page_size = getpagesize();
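/* Round the statically allocated buffers up to the next page boundary so
   the benchmark runs on page-aligned memory. */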

s_buf = (char*)(((unsigned long)s_buf1 + (page_size -1))/page_size * 
page_size);
r_buf = (char*)(((unsigned long)r_buf1 + (page_size -1))/page_size * 
page_size);

assert( (s_buf != NULL) && (r_buf != NULL) );

for ( i=0; i

Re: [O-MPI users] Infiniband performance problems (mvapi)

2005-10-31 Thread Mike Houston
Same performance problems with that fix.  In fact, if I ever use tcp 
currently, OpenMPI crashes...


-Mike

George Bosilca wrote:

If there are several networks available between 2 nodes they will all get 
selected. That can lead to poor performance when the second network is a 
high-latency one (like TCP). If you want to ensure that only the IB driver 
is loaded, you have to add the following line to .openmpi/mca-params.conf:

btl_base_exclude=tcp

This will force the TCP driver to be unloaded (always). In order to 
use this option you have to have all nodes reachable via IB.


  Thanks,
george.

On Oct 31, 2005, at 10:50 AM, Mike Houston wrote:

When only sending a few messages, we get reasonably good IB  
performance,

~500MB/s (MVAPICH is 850MB/s).  However, if I crank the number of
messages up, we drop to 3MB/s(!!!).  This is with the OSU NBCL
mpi_bandwidth test.  We are running Mellanox IB Gold 1.8 with 3.3.3
firmware on PCI-X (Cougar) boards.  Everything works with MVAPICH, but
we really need the thread support in OpenMPI.

Ideas?  I noticed there are a plethora of runtime options configurable
for mvapi.  Do I need to tweak these to get performance up?

Thanks!

-Mike



"Half of what I say is meaningless; but I say it so that the other  
half may reach you"

  Kahlil Gibran






[O-MPI users] Infiniband performance problems (mvapi)

2005-10-31 Thread Mike Houston
When only sending a few messages, we get reasonably good IB performance, 
~500MB/s (MVAPICH is 850MB/s).  However, if I crank the number of 
messages up, we drop to 3MB/s(!!!).  This is with the OSU NBCL 
mpi_bandwidth test.  We are running Mellanox IB Gold 1.8 with 3.3.3 
firmware on PCI-X (Cougar) boards.  Everything works with MVAPICH, but 
we really need the thread support in OpenMPI.


Ideas?  I noticed there are a plethora of runtime options configurable 
for mvapi.  Do I need to tweak these to get performance up?


Thanks!

-Mike


[O-MPI users] TCP problems with 1.0rc4

2005-10-31 Thread Mike Houston
We can't seem to run across TCP.  We did a default 'configure'.  Shared 
memory seems to work, but trying tcp gives us:


[0,1,1][btl_tcp_endpoint.c:557:mca_btl_tcp_endpoint_complete_connect] 
connect() failed with errno=113


I'm assuming that the tcp backend is the most thoroughly tested, so I 
thought I'd ask in case we are doing something silly.  The above is 
caused when running the OSU NBCL mpi_bandwidth test.


Thanks!

-Mike