Re: [OMPI users] mpirun only works when -np <4 (Gus Correa)

2009-12-10 Thread Mark Bolstad
Some additional data: Without threads it still hangs, similar behavior as before. All of the tests were run on a system running FC11 with X5550 processors. I just reran on a node of a RHEL 5.3 cluster with E5530 processors (dual Nehalem): - openmpi 1.3.4 and gcc 4.1.2 - No issues:

Re: [OMPI users] mpirun only works when -np <4 (Gus Correa)

2009-12-10 Thread Eugene Loh
Gus Correa wrote: Why wouldn't shared memory work right on Nehalem? We don't know exactly what is driving this problem, but the issue appears to be related to memory fences. Messages have to be posted to a receiver's queue. By default, each process (since OMPI 1.3.2) has only one queue.

Re: [OMPI users] Notifier Framework howto

2009-12-10 Thread Jeff Squyres
On Dec 10, 2009, at 5:06 PM, Brock Palen wrote: > I would like to try out the notifier framework, problem is I am having > trouble finding documentation for it, I am digging around the website and > not finding much. > > Currently we have a problem where hosts are throwing up errors like: >

Re: [OMPI users] mpirun only works when -np <4 (Gus Correa)

2009-12-10 Thread Jeff Squyres
On Dec 10, 2009, at 5:53 PM, Gus Correa wrote: > How does the efficiency of loopback > (let's say, over TCP and over IB) compare with "sm"? Definitely not as good; that's why we have sm. :-) I don't have any quantification of that assertion, though (i.e., no numbers to back that up). > FYI,

Re: [OMPI users] mpirun only works when -np <4 (Gus Correa)

2009-12-10 Thread Gus Correa
Hi Jeff Thanks for jumping in! :) And for your clarifications too, of course. How does the efficiency of loopback (let's say, over TCP and over IB) compare with "sm"? FYI, I do NOT see the problem reported by Matthew et al. on our AMD Opteron Shanghai dual-socket quad-core. They run a quite

Re: [OMPI users] mpirun only works when -np <4 (Gus Correa)

2009-12-10 Thread Jonathan Dursi
Jeff Squyres wrote: Why wouldn't shared memory work right on Nehalem? (That is probably distressing for Mark, Matthew, and other Nehalem owners.) To be clear, we don't know that this is a Nehalem-specific problem. I have definitely had this problem on Harpertown cores. - Jonathan

Re: [OMPI users] mpirun only works when -np <4 (Gus Correa)

2009-12-10 Thread Jeff Squyres
On Dec 10, 2009, at 5:01 PM, Gus Correa wrote: > > Just a quick interjection, I also have a dual-quad Nehalem system, HT > > on, 24GB ram, hand compiled 1.3.4 with options: --enable-mpi-threads > > --enable-mpi-f77=no --with-openib=no > > > > With v1.3.4 I see roughly the same behavior, hello,

Re: [OMPI users] mpirun only works when -np <4 (Gus Correa) RESOLVED FOR NOW

2009-12-10 Thread Gus Correa
Hi Matthew, Mark, Mattijs Great news that a solution was found, actually two, which seem to have been around for a while. Thanks Mark and Mattijs for posting the solutions. Much better that all can be solved by software, with a single mca parameter. A pity that it took a while for the actual

Re: [OMPI users] mpirun only works when -np <4 (Gus Correa)

2009-12-10 Thread Jeff Squyres
On Dec 10, 2009, at 5:01 PM, Gus Correa wrote: > A couple of questions to the OpenMPI pros: > If shared memory ("sm") is turned off on a standalone computer, > which mechanism is used for MPI communication? > TCP via loopback port? Other? Whatever device supports node-local loopback. TCP is

Re: [OMPI users] mpirun only works when -np <4 (Gus Correa)

2009-12-10 Thread Matthew MacManes
Hi All, I agree that the issue is troublesome. It apparently has been reported, and there is an active bug report, with some technical discussion of the underlying problems, found here: https://svn.open-mpi.org/trac/ompi/ticket/2043 For now, it is OK, but it is an issue that hopefully will

[OMPI users] Notifier Framework howto

2009-12-10 Thread Brock Palen
I would like to try out the notifier framework, problem is I am having trouble finding documentation for it, I am digging around the website and not finding much. Currently we have a problem where hosts are throwing up errors like: [nyx0891.engin.umich.edu][[25560,1],45][btl_tcp_endpoint.c:

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-10 Thread Kevin . Buckley
>> Are you going to upgrade the NetBSD port to build against OpenMPI 1.4 >> now that it is available? Might be a good time to check the fuzz in the >> existing patches. > > http://pkgsrc-wip.cvs.sourceforge.net/viewvc/pkgsrc-wip/wip/openmpi/Makefile Just to say that I built the NetBSD OpenMPI 1.4

Re: [OMPI users] mpirun only works when -np <4 (Gus Correa)

2009-12-10 Thread Gus Correa
Hi Mark, Matthew, list Oh well, Mark's direct experience on a Nehalem is a game changer, and his recommendation to turn off the shared memory feature may be the way to go for Matthew, at least to have things working. Thank you Mark, your interjection sheds new light on the awkward situation

Re: [OMPI users] Problem building OpenMPI with PGI compilers

2009-12-10 Thread Jeff Squyres
On Dec 10, 2009, at 4:02 PM, Joshua Bernstein wrote: > > On Dec 9, 2009, at 4:36 PM, Jeff Squyres wrote: > > Given that we haven't moved this patch to the v1.4 branch yet (i.e., it's > > not > > yet in a nightly v1.4 tarball), probably the easiest thing to do is to apply > > the attached patch

Re: [OMPI users] Problem building OpenMPI with PGI compilers

2009-12-10 Thread Joshua Bernstein
Jeff Squyres wrote: On Dec 9, 2009, at 4:36 PM, Jeff Squyres wrote: Given that we haven't moved this patch to the v1.4 branch yet (i.e., it's not yet in a nightly v1.4 tarball), probably the easiest thing to do is to apply the attached patch to a v1.4 tarball. I tried it with my PGI 10.0

Re: [OMPI users] mpirun only works when -np <4 (Gus Correa) RESOLVED FOR NOW

2009-12-10 Thread Matthew MacManes
Mark, Exciting.. SOLVED.. There is an open ticket #2043 regarding Nehalem/OpenMPI/Hang problem (https://svn.open-mpi.org/trac/ompi/ticket/2043).. Seems like the problem might be specific to gcc4.4x and OMPI <1.3.2.. It seems like there is a group of us with dual socket Nehalems trying to use

Re: [OMPI users] mpirun only works when -np <4 (Gus Correa)

2009-12-10 Thread Mattijs Janssens
On Thursday 10 December 2009 15:42:49 Mark Bolstad wrote: > Just a quick interjection, I also have a dual-quad Nehalem system, HT on, > 24GB ram, hand compiled 1.3.4 with options: --enable-mpi-threads > --enable-mpi-f77=no --with-openib=no > > With v1.3.4 I see roughly the same behavior, hello,

Re: [OMPI users] mpirun only works when -np <4 (Gus Correa)

2009-12-10 Thread Mark Bolstad
Just a quick interjection, I also have a dual-quad Nehalem system, HT on, 24GB ram, hand compiled 1.3.4 with options: --enable-mpi-threads --enable-mpi-f77=no --with-openib=no With v1.3.4 I see roughly the same behavior, hello, ring work, connectivity fails randomly with np >= 8. Turning on -v

Re: [OMPI users] Problem building OpenMPI with PGI compilers

2009-12-10 Thread Jeff Squyres
On Dec 9, 2009, at 4:36 PM, Jeff Squyres wrote: > That's the commit message for r22273. Also see the commit message for r22274 > (https://svn.open-mpi.org/trac/ompi/changeset/22274). > > Meaning: the fix is now in the SVN trunk; it hasn't migrated over to the v1.4 > and v1.5 branches yet.

Re: [OMPI users] Mimicking timeout for MPI_Wait

2009-12-10 Thread Ashley Pittman
On Tue, 2009-12-08 at 10:14 +, Number Cruncher wrote: > Whilst MPI has traditionally been run on dedicated hardware, the rise of > cheap multicore CPUs makes it very attractive for ISVs such as ourselves > (http://www.cambridgeflowsolutions.com/) to build a *single* executable > that can be