Some additional data:
Without threads it still hangs, with the same behavior as before.
All of the tests were run on a system running FC11 with X5550 processors.
I just reran on a node of a RHEL 5.3 cluster with E5530 processors (dual
Nehalem):
- openmpi 1.3.4 and gcc 4.1.2
- No issues:
Gus Correa wrote:
Why wouldn't shared memory work right on Nehalem?
We don't know exactly what is driving this problem, but the issue
appears to be related to memory fences. Messages have to be posted to a
receiver's queue. By default, each process (since OMPI 1.3.2) has only
one queue.
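If the single queue is indeed the trigger, one thing worth trying (assuming
your build still exposes the 1.3.x sm BTL parameter; check with
"ompi_info --param btl sm") is to give each process one FIFO per peer:

    mpirun -np 8 --mca btl_sm_num_fifos 8 ./a.out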
On Dec 10, 2009, at 5:06 PM, Brock Palen wrote:
> I would like to try out the notifier framework; the problem is I am having
> trouble finding documentation for it. I have been digging around the
> website and not finding much.
>
> Currently we have a problem where hosts are throwing up errors like:
>
On Dec 10, 2009, at 5:53 PM, Gus Correa wrote:
> How does the efficiency of loopback
> (let's say, over TCP and over IB) compare with "sm"?
Definitely not as good; that's why we have sm. :-) I don't have any
quantification of that assertion, though (i.e., no numbers to back that up).
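If anyone wants numbers, a 2-process ping-pong on a single node, run once
per transport, would give them; osu_latency below is just one example of
such a benchmark, and any ping-pong test would do:

    mpirun -np 2 --mca btl sm,self ./osu_latency
    mpirun -np 2 --mca btl tcp,self ./osu_latency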
> FYI,
Hi Jeff
Thanks for jumping in! :)
And for your clarifications too, of course.
How does the efficiency of loopback
(let's say, over TCP and over IB) compare with "sm"?
FYI, I do NOT see the problem reported by Matthew et al.
on our AMD Opteron Shanghai dual-socket quad-core.
They run a quite
Jeff Squyres wrote:
Why wouldn't shared memory work right on Nehalem?
(That is probably distressing for Mark, Matthew, and other Nehalem owners.)
To be clear, we don't know that this is a Nehalem-specific problem.
I have definitely had this problem on Harpertown cores.
- Jonathan
On Dec 10, 2009, at 5:01 PM, Gus Correa wrote:
> > Just a quick interjection, I also have a dual-quad Nehalem system, HT
> > on, 24GB ram, hand compiled 1.3.4 with options: --enable-mpi-threads
> > --enable-mpi-f77=no --with-openib=no
> >
> > With v1.3.4 I see roughly the same behavior, hello,
Hi Matthew, Mark, Mattijs
Great news that a solution was found, actually two,
which seem to have been around for a while.
Thanks, Mark and Mattijs, for posting the solutions.
Much better that it can all be solved in software,
with a single MCA parameter.
A pity that it took a while for the actual
On Dec 10, 2009, at 5:01 PM, Gus Correa wrote:
> A couple of questions to the OpenMPI pros:
> If shared memory ("sm") is turned off on a standalone computer,
> which mechanism is used for MPI communication?
> TCP via loopback port? Other?
Whatever device supports node-local loopback. TCP is
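You can also make the choice explicit rather than leaving it to the
selection logic; the btl lists below are illustrative:

    mpirun -np 8 --mca btl ^sm ./a.out        # anything except shared memory
    mpirun -np 8 --mca btl tcp,self ./a.out   # force TCP loopback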
Hi All,
I agree that the issue is troublesome. It apparently has been reported, and
there is an active bug report, with some technical discussion of the underlying
problems, found here: https://svn.open-mpi.org/trac/ompi/ticket/2043
For now, it is OK, but it is an issue that hopefully will
I would like to try out the notifier framework; the problem is I am having
trouble finding documentation for it. I have been digging around the website
and not finding much.
Currently we have a problem where hosts are throwing up errors like:
[nyx0891.engin.umich.edu][[25560,1],45][btl_tcp_endpoint.c:
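In the absence of documentation, one starting point (assuming the framework
was compiled into your build) is to ask ompi_info which notifier components
and parameters exist:

    ompi_info --param notifier all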
>> Are you going to upgrade the NetBSD port to build against OpenMPI 1.4
>> now that it is available? Might be a good time to check the fuzz in the
>> existing patches.
>
> http://pkgsrc-wip.cvs.sourceforge.net/viewvc/pkgsrc-wip/wip/openmpi/Makefile
Just to say that I built the NetBSD OpenMPI 1.4
Hi Mark, Matthew, list
Oh well, Mark's direct experience on a Nehalem
is a game changer, and his recommendation to turn off the shared
memory feature may be the way to go for Matthew, at least to have
things working.
Thank you Mark, your interjection sheds new light on the awkward
situation
On Dec 10, 2009, at 4:02 PM, Joshua Bernstein wrote:
> > On Dec 9, 2009, at 4:36 PM, Jeff Squyres wrote:
> > Given that we haven't moved this patch to the v1.4 branch yet (i.e., it's
> > not
> > yet in a nightly v1.4 tarball), probably the easiest thing to do is to apply
> > the attached patch
Jeff Squyres wrote:
On Dec 9, 2009, at 4:36 PM, Jeff Squyres wrote:
Given that we haven't moved this patch to the v1.4 branch yet (i.e., it's not
yet in a nightly v1.4 tarball), probably the easiest thing to do is to apply
the attached patch to a v1.4 tarball. I tried it with my PGI 10.0
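For anyone following along, the usual tarball sequence would be something
like the following; the patch file name is a placeholder and the -p0 level
is a guess, so adjust both to match the attachment:

    tar xzf openmpi-1.4.tar.gz
    cd openmpi-1.4
    patch -p0 < ../attached-fix.patch
    ./configure && make && make install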
Mark,
Exciting... SOLVED! There is an open ticket, #2043, regarding the
Nehalem/OpenMPI hang problem (https://svn.open-mpi.org/trac/ompi/ticket/2043).
It seems the problem might be specific to gcc 4.4.x and OMPI >= 1.3.2. It
seems there is a group of us with dual-socket Nehalems trying to use
On Thursday 10 December 2009 15:42:49 Mark Bolstad wrote:
> Just a quick interjection, I also have a dual-quad Nehalem system, HT on,
> 24GB ram, hand compiled 1.3.4 with options: --enable-mpi-threads
> --enable-mpi-f77=no --with-openib=no
>
> With v1.3.4 I see roughly the same behavior, hello,
Just a quick interjection, I also have a dual-quad Nehalem system, HT on,
24GB ram, hand compiled 1.3.4 with options: --enable-mpi-threads
--enable-mpi-f77=no --with-openib=no
With v1.3.4 I see roughly the same behavior: hello and ring work,
connectivity fails randomly with np >= 8. Turning on -v
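For anyone trying to reproduce this, hello, ring, and connectivity are the
stock test programs shipped in the examples/ directory of the source tree:

    cd examples && make
    mpirun -np 8 ./connectivity_c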
On Dec 9, 2009, at 4:36 PM, Jeff Squyres wrote:
> That's the commit message for r22273. Also see the commit message for r22274
> (https://svn.open-mpi.org/trac/ompi/changeset/22274).
>
> Meaning: the fix is now in the SVN trunk; it hasn't migrated over to the v1.4
> and v1.5 branches yet.
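Until it migrates, the fix can be picked up from a trunk checkout; the URL
below is inferred from the trac links in this thread, and building from SVN
also requires recent autotools for the autogen step:

    svn co https://svn.open-mpi.org/svn/ompi/trunk ompi-trunk
    cd ompi-trunk && ./autogen.sh && ./configure && make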
On Tue, 2009-12-08 at 10:14, Number Cruncher wrote:
> Whilst MPI has traditionally been run on dedicated hardware, the rise of
> cheap multicore CPUs makes it very attractive for ISVs such as ourselves
> (http://www.cambridgeflowsolutions.com/) to build a *single* executable
> that can be