Re: [OMPI users] Large TCP cluster timeout issue

2011-09-20 Thread Ralph Castain
Truly am sorry about that - we were just talking today about the need to update and improve our FAQ on running on large clusters. Did you by any chance look at it? Would appreciate any thoughts on how it should be improved from a user's perspective. On Sep 20, 2011, at 3:28 PM, Henderson,

Re: [OMPI users] Large TCP cluster timeout issue

2011-09-20 Thread Henderson, Brent
Nope, but if I had, that would have saved me about an hour of coding time! I'm still curious whether it would be beneficial to inject some barriers at certain locations so that if you had a slow node, not everyone would end up connecting to it all at once. Anyway, if I get access to another large

Re: [OMPI users] Large TCP cluster timeout issue

2011-09-20 Thread Ralph Castain
Hmmm... perhaps you didn't notice the mpi_preconnect_all option? It does precisely what you described - it pushes zero-byte messages around a ring to force all the connections open at MPI_Init. On Sep 20, 2011, at 3:06 PM, Henderson, Brent wrote: > I recently had access to a 200+ node Magny
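For readers finding this thread later: assuming the standard Open MPI MCA mechanism, the option Ralph mentions can be set on the mpirun command line or via the environment (the application name and rank count below are placeholders; verify the parameter name against your installation with ompi_info):

```shell
# Force all connections open during MPI_Init rather than lazily on
# first send - zero-byte messages are pushed around a ring of ranks:
mpirun --mca mpi_preconnect_all 1 -np 256 ./my_app

# Equivalent via the environment:
export OMPI_MCA_mpi_preconnect_all=1
mpirun -np 256 ./my_app
```

This trades a longer MPI_Init for predictable connection setup, which is exactly the slow-node scenario Brent was worried about.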

[OMPI users] Trouble compiling 1.4.3 with PGI 10.9 compilers

2011-09-20 Thread Blosch, Edwin L
I'm having trouble building 1.4.3 using PGI 10.9. I searched the list archives briefly but I didn't stumble across anything that looked like the same problem, so I thought I'd ask if an expert might recognize the nature of the problem here. The configure command: ./configure

Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-20 Thread Blosch, Edwin L
Here is a diff -y output of the compilation of one of the program's files. The one on the left is OpenMPI mpif90, the one on the right is MVAPICH mpif90. Does that suggest perhaps I should try adding -fPIC to the OpenMPI-linked compilation?

Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-20 Thread Blosch, Edwin L
Thank you for this explanation. I will assume that my problem here is some kind of memory corruption. -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Tim Prince Sent: Tuesday, September 20, 2011 10:36 AM To: us...@open-mpi.org

Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-20 Thread Tim Prince
On 9/20/2011 10:50 AM, Blosch, Edwin L wrote: It appears to be a side effect of linkage that is able to change a compute-only routine's answers. I have assumed that max/sqrt/tiny/abs might be replaced, but some other kind of corruption may be going on. Those intrinsics have direct

Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-20 Thread Reuti
Am 20.09.2011 um 16:50 schrieb Blosch, Edwin L: > Thank you all for the replies. > > Certainly optimization flags can be useful to address differences between > compilers, etc. And differences in MPI_ALLREDUCE are appreciated as possible. > But I don't think either is quite relevant because:

Re: [OMPI users] MPI hangs on multiple nodes

2011-09-20 Thread Gus Correa
Ole Nielsen wrote: Thanks for your suggestion Gus, we need a way of debugging what is going on. I am pretty sure the problem lies with our cluster configuration. I know MPI simply relies on the underlying network. However, we can ping and ssh to all nodes (and between any pair as well) so

Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-20 Thread Eugene Loh
I've not been following closely. How do you know you're using the identical compilation flags? Are you saying you specify the same flags to "mpicc" (or whatever) or are you confirming that the back-end compiler is seeing the same flags? The MPI compiler wrapper (mpicc, et al.) can add
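Eugene's question can be answered directly by asking each wrapper what it passes to the back-end compiler. The flag spellings differ between the two MPIs in this thread (Open MPI uses --showme, MPICH-derived wrappers such as MVAPICH's use -show); the source file name is illustrative:

```shell
# Open MPI: print the full back-end compile line without compiling
mpif90 --showme my_prog.f90

# MVAPICH/MPICH-style wrappers use -show for the same purpose
mpif90 -show my_prog.f90
```

Diffing the two outputs shows exactly which flags (e.g. -fPIC) each wrapper injects beyond what the user specified.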

Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-20 Thread Blosch, Edwin L
Thank you all for the replies. Certainly optimization flags can be useful to address differences between compilers, etc. And differences in MPI_ALLREDUCE are appreciated as possible. But I don't think either is quite relevant because: - It was exact same compiler, with identical compilation

Re: [OMPI users] How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-20 Thread Samuel K. Gutierrez
Hi, Maybe you can leverage some of the techniques outlined in: Robert W. Robey, Jonathan M. Robey, and Rob Aulwes. 2011. In search of numerical consistency in parallel programming. Parallel Comput. 37, 4-5 (April 2011), 217-229. DOI=10.1016/j.parco.2011.02.009

Re: [OMPI users] Open MPI and Objective C

2011-09-20 Thread Barrett, Brian W
The problem you're running into is not due to Open MPI. The Objective C and C compilers on OS X (and most platforms) are the same binary, so you should be able to use mpicc without any problems. It will see the .m extension and switch to Objective C mode. However, NSLog is in the Foundation
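A minimal sketch of what Brian describes, assuming an OS X box with Open MPI installed (hello.m is a hypothetical file name): the .m extension puts the compiler into Objective C mode, and NSLog must be resolved by linking Foundation explicitly:

```shell
# mpicc passes the .m file to the underlying compiler, which switches
# to Objective C mode; Foundation provides NSLog at link time:
mpicc hello.m -framework Foundation -o hello
mpirun -np 2 ./hello
```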

Re: [OMPI users] MPI hangs on multiple nodes

2011-09-20 Thread Rolf vandeVaart
>> 1: After a reboot of two nodes I ran again, and the inter-node freeze didn't happen until the third iteration. I take that to mean that the basic communication works, but that something is saturating. Is there some notion of buffer size somewhere in the MPI system that could explain this?
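On the buffer-size question: Open MPI does expose TCP buffer sizing as MCA parameters. One way to inspect and adjust them on a given installation (parameter names are from 1.4-era documentation; confirm with ompi_info first, as the set varies by version):

```shell
# List every TCP BTL tuning knob, including send/receive buffer sizes
ompi_info --param btl tcp

# Illustrative override of socket buffer sizes (values in bytes):
mpirun --mca btl_tcp_sndbuf 4194304 --mca btl_tcp_rcvbuf 4194304 ./my_app
```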

Re: [OMPI users] How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-20 Thread Reuti
Am 20.09.2011 um 13:52 schrieb Tim Prince: > On 9/20/2011 7:25 AM, Reuti wrote: >> Hi, >> >> Am 20.09.2011 um 00:41 schrieb Blosch, Edwin L: >> >>> I am observing differences in floating-point results from an application >>> program that appear to be related to whether I link with OpenMPI

Re: [OMPI users] Latency of 250 microseconds with Open-MPI 1.4.3, Mellanox Infiniband and 256 MPI ranks

2011-09-20 Thread Yevgeny Kliteynik
Hi Sébastien, If I understand you correctly, you are running your application on two different MPIs on two different clusters with two different IB vendors. Could you make a comparison more "apples to apples"-ish? For instance: - run the same version of Open MPI on both clusters - run the same

Re: [OMPI users] MPI hangs on multiple nodes

2011-09-20 Thread Jeff Squyres
On Sep 19, 2011, at 10:23 PM, Ole Nielsen wrote: > Hi all - and sorry for the multiple postings, but I have more information. +1 on Eugene's comments. The test program looks fine to me. FWIW, you don't need -lmpi to compile your program; OMPI's wrapper compiler allows you to just: mpicc

Re: [OMPI users] How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-20 Thread Jeff Squyres
On Sep 20, 2011, at 7:52 AM, Tim Prince wrote: > Quoted comment from OP seems to show a somewhat different question: Does > OpenMPI implement any operations in a different way from MVAPICH? I would > think it probable that the answer could be affirmative for operations such as > allreduce, but

Re: [OMPI users] How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-20 Thread Tim Prince
On 9/20/2011 7:25 AM, Reuti wrote: Hi, Am 20.09.2011 um 00:41 schrieb Blosch, Edwin L: I am observing differences in floating-point results from an application program that appear to be related to whether I link with OpenMPI 1.4.3 or MVAPICH 1.2.0. Both packages were built with the same