[OMPI users] OpenMPI 1.3 Infiniband Hang

2009-08-12 Thread Allen Barnett
Hi: I recently tried to build my MPI application against OpenMPI 1.3.3. It worked fine with OMPI 1.2.9, but with OMPI 1.3.3, it hangs part way through. It does a fair amount of comm, but eventually it stops in a Send/Recv point-to-point exchange. If I turn off the openib btl, it runs to completion.
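For readers hitting the same hang: the usual way to exclude the openib BTL on the command line is the `^` (exclusion) syntax. A minimal sketch — the process count and application name are placeholders:

```shell
# Exclude the InfiniBand (openib) BTL; Open MPI then falls back to
# TCP and shared memory. "./my_app" is a placeholder.
mpirun -np 4 --mca btl ^openib ./my_app
```

With the exclusion syntax you do not list the remaining components; everything other than openib stays eligible.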

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-12 Thread Ralph Castain
Hmmm...well, I'm going to ask our TCP friends for some help here. Meantime, I do see one thing that stands out. Port 4 is an awfully low port number that usually sits in the reserved range. I checked the /etc/services file on my Mac, and it was commented out as unassigned, which should mean

Re: [OMPI users] orte_launch_agent usage?

2009-08-12 Thread Ralph Castain
Okay - let me debug this. It is likely broken, but I can get the fix into 1.3.4 (probably coming out fairly soon) Will update shortly. On Aug 12, 2009, at 6:26 PM, Kenneth Yoshimoto wrote: This is 1.3.3. I would like to specify the path to orted on different sets of nodes. Thanks, Kenneth

Re: [OMPI users] orte_launch_agent usage?

2009-08-12 Thread Kenneth Yoshimoto
This is 1.3.3. I would like to specify the path to orted on different sets of nodes. Thanks, Kenneth On Wed, 12 Aug 2009, Ralph Castain wrote: Date: Wed, 12 Aug 2009 17:03:17 -0600 From: Ralph Castain To: Kenneth Yoshimoto , Open MPI Users Subject: Re: [OMPI users] orte_launch_agent usage?

Re: [OMPI users] orte_launch_agent usage?

2009-08-12 Thread Ralph Castain
This is using 1.3.3, devel trunk, ...?? I doubt anyone has really tested it in a long time as everyone just uses the default orted - are you just trying to see if it works, or are you trying your own orted out? On Aug 12, 2009, at 4:04 PM, Kenneth Yoshimoto wrote: If I use -mca orte_lau

[OMPI users] orte_launch_agent usage?

2009-08-12 Thread Kenneth Yoshimoto
If I use -mca orte_launch_agent /home/kenneth/info/openmpi/install/bin/orted, I get an error: ... bash: -c: line 0: `( test ! -r ./.profile || . ./.profile; PATH=/home/kenneth/info/openmpi/install/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/home/kenneth/info/openmpi/install/lib:$LD_LIBRARY_PATH
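For context, a working invocation of this parameter would look roughly like the following; the orted path is hypothetical and must exist on every remote node:

```shell
# Tell mpirun to start a specific orted binary on the remote nodes
# (the path is hypothetical; it must be present on each node).
mpirun -np 2 --mca orte_launch_agent /opt/openmpi/bin/orted hostname
```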

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-12 Thread Gus Correa
Hi Jody Jody Klymak wrote: On Aug 11, 2009, at 18:55 PM, Gus Correa wrote: Did you wipe off the old directories before reinstalling? Check. I prefer to install on a NFS mounted directory, Check Have you tried to ssh from node to node on all possible pairs? check - fixed this toda

Re: [OMPI users] init failing

2009-08-12 Thread Dominik Táborský
Update: No, it still doesn't work. I have been trying setups with different env. variables like OPAL_PREFIX, but I just get the same error all over again. I've also been trying to compile the package, but I didn't even get past the configure script. I got stuck with configure being unable to compute

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-12 Thread Jody Klymak
On Aug 12, 2009, at 12:46 PM, Jody Klymak wrote: So I think ranks 0 and 2 are on xserve02 and rank 1 is on xserve01, Should read xserve03, -- Jody Klymak http://web.uvic.ca/~jklymak/

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-12 Thread Jody Klymak
On Aug 12, 2009, at 12:31 PM, Ralph Castain wrote: Well, it is getting better! :-) On your cmd line, what btl's are you specifying? You should try -mca btl sm,tcp,self for this to work. Reason: sometimes systems block tcp loopback on the node. What I see below indicates that inter-node

[OMPI users] configure OPENMPI with DMTCP

2009-08-12 Thread Kritiraj Sajadah
Hi, I want to configure OPENMPI to checkpoint MPI applications using DMTCP. Does anyone know how to specify the path to the DMTCP application when installing OPENMPI? Also, I wanted to use OPENMPI with SELF instead of BLCR. Is there any guide for setting up OPENMPI with SELF? Thanks a lot.
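For what it's worth, in the 1.3 series the built-in checkpoint/restart framework was enabled at configure time and the checkpointer was chosen at run time; a sketch from memory (verify the exact flags against `./configure --help` for your version — the prefix is a placeholder):

```shell
# Build with the checkpoint/restart (fault tolerance) framework enabled.
./configure --prefix=/opt/openmpi --with-ft=cr
make all install

# At run time, select the "self" checkpointer instead of BLCR.
mpirun -np 2 -am ft-enable-cr -mca crs self ./my_app
```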

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-12 Thread Ralph Castain
Well, it is getting better! :-) On your cmd line, what btl's are you specifying? You should try -mca btl sm,tcp,self for this to work. Reason: sometimes systems block tcp loopback on the node. What I see below indicates that inter-node comm was fine, but the two procs that share a node couldn't co
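Spelled out, the suggested command line is along these lines (process count and binary are placeholders):

```shell
# Shared memory on-node, TCP between nodes, "self" for a process
# talking to itself; sidesteps a blocked TCP loopback on the node.
mpirun -np 4 --mca btl sm,tcp,self ./my_app
```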

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-12 Thread Jody Klymak
Hi Ralph, That gives me something more to work with... On Aug 12, 2009, at 9:44 AM, Ralph Castain wrote: I believe TCP works fine, Jody, as it is used on Macs fairly widely. I suspect this is something funny about your installation. One thing I have found is that you can get this error me

Re: [OMPI users] Open MPI:Problem with 64-bit openMPI andintel compiler

2009-08-12 Thread Sims, James S. Dr.
Sorry, I don't understand what you want me to do. I assume you want me to run the app on n296 as rank 0 and run the app on n298 as rank 1, but I don't know how to do that outside of either torque or mpirun -hostfile Jim P.S. I tried -x LD_LIBRARY_PATH and it doesn't work.

Re: [OMPI users] Memchecker and Wait

2009-08-12 Thread Shiqing Fan
Hi Allen, Sorry for the confusion: your application doesn't use non-blocking communications, so the receive buffers are still valid after you call MPI_Recv_init; that's why the first two printfs didn't complain. But MPI_Wait still checks the buffer, and makes it invalid after packing the

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-12 Thread Ralph Castain
I believe TCP works fine, Jody, as it is used on Macs fairly widely. I suspect this is something funny about your installation. One thing I have found is that you can get this error message when you have multiple NICs installed, each with a different subnet, and the procs try to connect across dif
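When multiple NICs on different subnets are the culprit, the usual remedy is to pin the TCP BTL to one interface; a sketch, with the interface name as a placeholder:

```shell
# Restrict the TCP BTL to a single interface so processes don't try
# to connect across mismatched subnets ("en0" is a placeholder).
mpirun -np 4 --mca btl_tcp_if_include en0 ./my_app
```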

Re: [OMPI users] tcp connectivity OS X and 1.3.3

2009-08-12 Thread Jody Klymak
On Aug 11, 2009, at 18:55 PM, Gus Correa wrote: Did you wipe off the old directories before reinstalling? Check. I prefer to install on a NFS mounted directory, Check Have you tried to ssh from node to node on all possible pairs? check - fixed this today, works fine with the spawni

[OMPI users] Totalview and OpenMPI problem solved

2009-08-12 Thread Gabriele Fatigati
Dear OpenMPI developers, regarding the following problem: http://openmpi.igor.onlinedirect.bg/faq/?category=troubleshooting#parallel-debugger-attach Cristiano Calonaci and I compiled openmpi 1.3.3 with intel 11 and ran an example under Totalview 8.6. We solved the problem below by setting th

Re: [OMPI users] PGI-9.0: -lpthread instead of -pthread

2009-08-12 Thread Gus Correa
Hi Jalel, list This is a libtool problem, I was told. I had the same problem with PGI 8.0-4 and OpenMPI 1.2.8 to 1.3.2 (I haven't tried 1.3.3. yet). From what you say, apparently the problem is still there on OpenMPI 1.3.3, PGI 9.0-1, and whatever libtool you have in your system. The workarou
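One common shape for that workaround is a small wrapper that rewrites the `-pthread` flag (which PGI rejects) into `-lpthread` before the real compiler sees it. A minimal sketch; the pgf95 invocation in the comment is an assumption about your setup:

```shell
# Rewrite -pthread into -lpthread in an argument list; a real wrapper
# script would end with something like:
#   exec pgf95 $(translate_pthread "$@")
translate_pthread() {
    out=
    for a in "$@"; do
        case $a in
            -pthread) out="$out -lpthread" ;;
            *)        out="$out $a" ;;
        esac
    done
    # Drop the leading space and print the rewritten list.
    printf '%s\n' "${out# }"
}

translate_pthread -shared -pthread -o libfoo.so foo.o
# prints: -shared -lpthread -o libfoo.so foo.o
```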

Re: [OMPI users] strange IMB runs

2009-08-12 Thread Michael Di Domenico
So, pushing this along a little more: running with openmpi-1.3 svn rev 20295, mpirun -np 2 -mca btl sm,self -mca mpi_paffinity_alone 1 -mca mpi_leave_pinned 1 -mca btl_sm_eager_limit 8192 $PWD/IMB-MPI1 pingpong yields ~390MB/sec. So we're getting there, but still only about half speed On

Re: [OMPI users] Hooks for Collective's sync/transfer activity?

2009-08-12 Thread Eugene Loh
Manfred Muecke wrote: I would like to understand in more detail how much time some collective communication calls really spend waiting for the last process to enter. I know this can be done by logging entry times for each process, but I wonder if there is a better and more efficient way. "Bette

Re: [OMPI users] Hooks for Collective's sync/transfer activity?

2009-08-12 Thread Rainer Keller
Hello Manfred, this is more of an MPI-standardization question; Open MPI happens to be (the only?) implementation providing Peruse. While there are people using Peruse event tracing to collect information on collectives in Open MPI, these hooks are not in trunk. The specification itself has n

[OMPI users] Hooks for Collective's sync/transfer activity?

2009-08-12 Thread Manfred Muecke
Hello *, I would like to understand in more detail how much time some collective communication calls really spend waiting for the last process to enter. I know this can be done by logging entry times for each process, but I wonder if there is a better and more efficient way. The peruse interface

Re: [OMPI users] Memchecker and Wait

2009-08-12 Thread Allen Barnett
Hi Shiqing: That is very clever to invalidate the buffer memory until the comm completes! However, I guess I'm still confused by my results. Lines 30 and 31 identified by valgrind are the lines after the Wait, and, if I comment out the prints before the Wait, I still get the valgrind errors on the

Re: [OMPI users] Open MPI:Problem with 64-bit openMPI andintel compiler

2009-08-12 Thread Ralph Castain
We use Torque with OMPI here on almost every cluster, running 64-bit jobs with the Intel compilers, so I doubt the problem is with Torque. It is probably an issue with library paths. Torque doesn't automatically forward your environment, nor does it execute your remote .bashrc (or equivalen
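Concretely, the two usual fixes are to export the path through mpirun or to set it in the non-interactive part of the remote shell's startup file; a sketch with hypothetical paths:

```shell
# Option 1: forward the variable from the submitting shell.
mpirun -np 2 -x LD_LIBRARY_PATH ./my_app

# Option 2: set it unconditionally near the top of ~/.bashrc on the
# nodes (before any early return for non-interactive shells):
#   export LD_LIBRARY_PATH=/opt/intel/lib/intel64:$LD_LIBRARY_PATH
```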

[OMPI users] PGI-9.0: -lpthread instead of -pthread

2009-08-12 Thread Jalel Chergui (LIMSI-CNRS)
Hello, Trying to link OpenMPI-1.3.3 with PGI 9.0-1 and got the following error : # ./configure --prefix=/opt/ofed/mpi/pgi/openmpi-1.3.3 --with-openib=/opt/ofed FC=pgf95 CC=gcc CXX=g++ # make [...] libtool: link: pgf95 -shared -fpic -Mnomain .libs/mpi.o .libs/mpi_sizeof.o .libs/mpi_comm_spaw

Re: [OMPI users] Memchecker and Wait

2009-08-12 Thread Shiqing Fan
Hi Allen, The invalid reads come from line 30 and 31 of your code, and I guess they are the two 'printf's before MPI_Wait. In Open MPI, when memchecker is enabled, OMPI marks the receive buffer as invalid internally, immediately after receive starts for MPI semantic checks, in this case, it

Re: [OMPI users] Open MPI:Problem with 64-bit openMPI andintel compiler

2009-08-12 Thread Sims, James S. Dr.
Back to this problem. The last suggestion was to upgrade to 1.3.3, which has been done. Still cannot get this code to run in 64 bit mode with torque. What I can do is run the job in 64 bit mode using a hostfile. Specifically, if I use qsub -I -l nodes=2:ppn=1 torque allocates two nodes to the