[OMPI users] Threading fault(?)

2009-02-26 Thread Mahmoud Payami
Dear All, I have installed openmpi-1.3.1 (with the defaults), and built my application. The Linux box is a 2x AMD64 quad-core. In the middle of a run, my application prints the message and stops. I tried to configure openmpi using "--disable-mpi-threads" but it automatically assumes "posix". This

Re: [OMPI users] Latest SVN failures

2009-02-26 Thread George Bosilca
Last time I got such an error was when the shared libraries on my head node didn't match the ones loaded by the compute nodes. It was a simple LD_LIBRARY_PATH mistake on my part. And it was the last time I didn't build my tree with --enable-mpirun-prefix-by-default. george. On Feb 26, 2

Re: [OMPI users] valgrind problems

2009-02-26 Thread Jeff Squyres
On Feb 26, 2009, at 10:27 PM, Justin wrote: Also the stable version of openmpi on Debian is 1.2.7rc2. Are there any known issues with this version and valgrind? To be honest, the Debian OMPI maintainers will need to answer this. I'm afraid that it's been so long since I've used the 1.2 ser

Re: [OMPI users] valgrind problems

2009-02-26 Thread Justin
Also the stable version of openmpi on Debian is 1.2.7rc2. Are there any known issues with this version and valgrind? Thanks, Justin Justin wrote: Are there any tricks to getting it to work? When we run with valgrind we get segfaults, and valgrind reports errors in different MPI functions, for exam

Re: [OMPI users] valgrind problems

2009-02-26 Thread Justin
Are there any tricks to getting it to work? When we run with valgrind we get segfaults, and valgrind reports errors in different MPI functions, for example: ==3629== Invalid read of size 4 ==3629== at 0x1CF7AEEC: (within /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so) ==3629== by 0x1D9C23F4: mca

Re: [OMPI users] valgrind problems

2009-02-26 Thread Jeff Squyres
On Feb 26, 2009, at 7:03 PM, Justin wrote: I'm trying to use valgrind to check if we have any memory problems in our code when running with parallel processors. However, when I run using mpi and valgrind it crashes in various places. For example, sometimes it will crash with a segf

Re: [OMPI users] Latest SVN failures

2009-02-26 Thread Ralph Castain
FWIW: I tested the trunk tonight using both SLURM and rsh launchers, and everything checks out fine. However, this is running under SGE and thus using qrsh, so it is possible the SGE support is having a problem. Perhaps one of the Sun OMPI developers can help here? Ralph On Feb 26, 2009, at

[OMPI users] openib RETRY EXCEEDED ERROR

2009-02-26 Thread Brett Pemberton
Hey, I've had a couple of errors recently, of the form: [[1176,1],0][btl_openib_component.c:2905:handle_wc] from tango092.vpac.org to: tango090 error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for wr_id 38996224 opcode 0 qp_idx 0 --

Re: [OMPI users] Latest SVN failures

2009-02-26 Thread Ralph Castain
It looks like the system doesn't know what nodes the procs are to be placed upon. Can you run this with --display-devel-map? That will tell us where the system thinks it is placing things. Thanks Ralph On Feb 26, 2009, at 3:41 PM, Mostyn Lewis wrote: Maybe it's my pine mailer. This is a N

[OMPI users] valgrind problems

2009-02-26 Thread Justin
Hi, I'm trying to use valgrind to check if we have any memory problems in our code when running with parallel processors. However, when I run using mpi and valgrind it crashes in various places. For example, sometimes it will crash with a segfault within MPI_Allgatherv despite the fa

Re: [OMPI users] Latest SVN failures

2009-02-26 Thread Mostyn Lewis
Maybe it's my pine mailer. This is a NAMD run on 256 procs across 32 dual-socket quad-core AMD Shanghai nodes running a standard benchmark called stmv. The basic error message, which occurs 31 times, is like: [s0164:24296] [[64102,0],16] ORTE_ERROR_LOG: Not found in file ../../../.././orte/mca/o

Re: [OMPI users] Latest SVN failures

2009-02-26 Thread Ralph Castain
I'm sorry, but I can't make any sense of this message. Could you provide a little explanation of what you are doing, what the system looks like, what is supposed to happen, etc? I can barely parse your cmd line... Thanks Ralph On Feb 26, 2009, at 1:03 PM, Mostyn Lewis wrote: Today's and

[OMPI users] Latest SVN failures

2009-02-26 Thread Mostyn Lewis
Today's and yesterday's. 1.4a1r20643_svn + mpirun --prefix /tools/openmpi/1.4a1r20643_svn/connectx/intel64/10.1.015/openib/suse_sles_10/x86_64/opteron -np 256 --mca btl sm,openib,self -x OMPI_MCA_btl_openib_use_eager_rdma -x OMPI_MCA_btl_openib_eager_limit -x OMPI_MCA_btl_self_eager_limit -x

Re: [OMPI users] Memory leak in my code

2009-02-26 Thread Mark Allan
Eugene / Dorian, Thanks for the advice. I didn't appreciate that it was necessary to explicitly complete the first send with an MPI call. I assumed that once the second receive was complete, the first send must also have completed and all would be ok. In any case, I'm now using MPI_Probe to e

Re: [OMPI users] Memory leak in my code

2009-02-26 Thread doriankrause
Mark Allan wrote: Dear all, With this simple code I find I am getting a memory leak when I run on 2 processors. Can anyone advise why? I suspect the prototype of nonBlockingRecv is actually MPI_Request nonBlockingRecv(int **t, int &size, const int tag, const int senderRank) and in thi

Re: [OMPI users] Memory leak in my code

2009-02-26 Thread Eugene Loh
Mark Allan wrote: Dear all, With this simple code I find I am getting a memory leak when I run on 2 processors.  Can anyone advise why? I'm fairly new to MPI (have only done very simple things in the past).  I'm trying to do a non-blocking send

[OMPI users] Memory leak in my code

2009-02-26 Thread Mark Allan
Dear all, With this simple code I find I am getting a memory leak when I run on 2 processors.  Can anyone advise why? I'm fairly new to MPI (have only done very simple things in the past).  I'm trying to do a non-blocking send/recv (from any proc to any proc) but the receiving processor doesn'

[OMPI users] OMPI, and HPUX

2009-02-26 Thread Nader
Hello, Has anyone installed OMPI on an HPUX system? I would appreciate any info. Best Regards. Nader

Re: [OMPI users] Problems in 1.3 loading shared libs when using VampirServer

2009-02-26 Thread Jeff Squyres
Schweet! Sorry it took so long to figure out... On Feb 26, 2009, at 7:54 AM, Kiril Dichev wrote: I am happy to confirm that Jeff's suggestion worked. The problem was the following: in previous versions VampirServer issued ComLib = dlopen( driverName, RTLD_LAZY ); Changing this to the following

Re: [OMPI users] Problems in 1.3 loading shared libs when using VampirServer

2009-02-26 Thread Kiril Dichev
I am happy to confirm that Jeff's suggestion worked. The problem was the following: in previous versions VampirServer issued ComLib = dlopen( driverName, RTLD_LAZY ); Changing this to the following fixed the problem: ComLib = dlopen( driverName, RTLD_LAZY | RTLD_GLOBAL ); The VampirServe
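
The dlopen change described in this message can be sketched as a small C helper. This is only an illustration, not VampirServer's actual code: the function name load_driver_global is hypothetical, and the only detail taken from the thread is the flag combination RTLD_LAZY | RTLD_GLOBAL. The likely reason it helps is that RTLD_GLOBAL makes the symbols of the loaded library and its dependencies visible to libraries that are loaded afterwards, such as plugins opened later at run time.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper (not VampirServer's real code): open a shared
 * library so that its symbols, and those of its dependencies, are
 * visible to libraries loaded later in the process. */
void *load_driver_global(const char *driver_name)
{
    /* RTLD_LAZY:   resolve function symbols on first use.
     * RTLD_GLOBAL: make the symbols available for resolving references
     *              in subsequently loaded libraries. */
    void *handle = dlopen(driver_name, RTLD_LAZY | RTLD_GLOBAL);
    if (handle == NULL) {
        /* dlerror() returns a human-readable description of the failure. */
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
    }
    return handle;
}
```

A caller would use the returned handle with dlsym() and release it with dlclose(). On older glibc versions the program may need to be linked with -ldl.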

Re: [OMPI users] openmpi 1.2.9 with Xgrid support more information

2009-02-26 Thread Ricardo Fernández-Perea
Hi, I have been looking at the code and some documentation. According to Apple's documentation, the finalize method should never be called, so that seems to be erroneous. The connection belongs to an autorelease pool that is released just after it, and due to the comment in the code /* need to shut down con

Re: [OMPI users] Ompi runs thru cmd line but fails when run thru SGE

2009-02-26 Thread Reuti
Hi, the daemons will fork into daemon land - no accounting, no control by SGE via qdel (nevertheless it runs, just not tightly integrated): https://svn.open-mpi.org/trac/ompi/ticket/1783 -- Reuti Am 26.02.2009 um 06:13 schrieb Sangamesh B: Hello Reuti, I'm sorry for the late response

Re: [OMPI users] openmpi 1.2.9 with Xgrid support more information

2009-02-26 Thread Ricardo Fernández-Perea
Yes Brian, it's on Leopard. Thanks for your interest. Ricardo On Wed, Feb 25, 2009 at 9:45 PM, Brian W. Barrett wrote: > Ricardo - > > That's really interesting. This is on a Leopard system, right? I'm the > author/maintainer of the xgrid code. Unfortunately, I've been hiding trying > to finis

Re: [OMPI users] Ompi runs thru cmd line but fails when run thru SGE

2009-02-26 Thread Sangamesh B
Hello Reuti, I'm sorry for the late response. On Mon, Jan 26, 2009 at 7:11 PM, Reuti wrote: > Am 25.01.2009 um 06:16 schrieb Sangamesh B: > >> Thanks Reuti for the reply. >> >> On Sun, Jan 25, 2009 at 2:22 AM, Reuti wrote: >>> >>> Am 24.01.2009 um 17:12 schrieb Jeremy Stout: >>> The RLI