Some additional data:

Without threads it still hangs; the behavior is the same as before.
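
(To confirm that the rebuild really has thread support off, a quick check,
assuming a standard install, is:

    ompi_info | grep -i thread

which reports whether MPI thread support was compiled in.)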

All of the tests were run on a system running FC11 with X5550 processors.

I just reran on a node of a RHEL 5.3 cluster with E5530 processors (dual
Nehalem):
 - openmpi 1.3.4 and gcc 4.1.2
     - No issues: connectivity_c works through np = 128 (commands sketched below)

 - openmpi 1.3.4 and gcc 4.4.0
     - Hangs as before
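
For reference, the runs were along these lines (using the connectivity_c
example shipped in the Open MPI examples/ directory; the paths and np values
are from my setup, so treat this as a sketch):

    mpicc examples/connectivity_c.c -o connectivity_c
    mpirun -np 128 ./connectivity_c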

Anything else you want me to try? ;-)

Mark

On Thu, Dec 10, 2009 at 5:20 PM, Jeff Squyres <jsquy...@cisco.com> wrote:

> On Dec 10, 2009, at 5:01 PM, Gus Correa wrote:
>
> > > Just a quick interjection: I also have a dual quad-core Nehalem system,
> > > HT on, 24 GB RAM, hand-compiled 1.3.4 with options: --enable-mpi-threads
> > > --enable-mpi-f77=no --with-openib=no
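> > >
> > > (For illustration, the full configure invocation would have been
> > > something like the following; the install prefix is my own choice:
> > >
> > >     ./configure --prefix=$HOME/ompi-1.3.4 --enable-mpi-threads \
> > >         --enable-mpi-f77=no --with-openib=no
> > >     make all install
> > > )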
> > >
> > > With v1.3.4 I see roughly the same behavior: hello and ring work, but
> > > connectivity fails randomly with np >= 8. Turning on -v increased the
> > > success rate, but it still hangs. np = 16 fails more often, and which
> > > pair of processes is communicating when it hangs is random.
> > >
> > > However, it seems to be related to a problem in the shared-memory
> > > layer: running with -mca btl ^sm works consistently through np = 128.
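> > >
> > > (Concretely, something like:
> > >
> > >     mpirun -np 128 -mca btl ^sm ./connectivity_c
> > >
> > > which excludes the shared-memory BTL, so on-node traffic falls back to
> > > the TCP loopback path.)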
>
> Note, too, that --enable-mpi-threads "works", but I would not say that it
> is production-quality hardened yet.  IBM is looking into thread-safety
> issues to harden this code.  If the same hangs can be observed without
> --enable-mpi-threads, that would be a good data point.
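>
> (I.e., rebuild with that one flag dropped and rerun the same test; a sketch:
>
>     ./configure --enable-mpi-f77=no --with-openib=no
>     make all install
>     mpirun -np 16 ./connectivity_c
> )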
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
