On Dec 10, 2009, at 5:53 PM, Gus Correa wrote:

> How does the efficiency of loopback
> (let's say, over TCP and over IB) compare with "sm"?
Definitely not as good; that's why we have sm. :-)  I don't have any
quantification of that assertion, though (i.e., no numbers to back it up).

> FYI, I do NOT see the problem reported by Matthew et al.
> on our AMD Opteron Shanghai dual-socket quad-core.
> They run a quite outdated
> CentOS kernel 2.6.18-92.1.22.el5, with gcc 4.1.2,
> and Open MPI 1.3.2.
> (I've been lazy to upgrade; it is a production machine.)
>
> I could run all three Open MPI test programs (hello_c, ring_c, and
> connectivity_c) on all 8 cores on a single node WITH "sm" turned ON
> with no problem whatsoever.

Good.

> Moreover, all works fine if I oversubscribe up to 256 processes on
> one node.
> Beyond that I get a segmentation fault (not hanging) sometimes,
> but not always.
> I understand that extreme oversubscription is a no-no.

It's quite possible that extreme oversubscription and/or that many procs
in sm have not been well tested.

> Moreover, on the screenshots that Matthew posted, the cores
> were at 100% CPU utilization on the simple connectivity_c
> (although this was when he had "sm" turned on on Nehalem).
> On my platform I don't get anything more than 3% or so.

100% CPU utilization usually means that some expected completion hasn't
occurred, so everything is spinning, waiting for it.  The "hasn't
occurred" bit is probably the bug here -- it's likely that a completion
should have been generated but somehow got missed.  But this is
speculative -- we're still investigating...

-- 
Jeff Squyres
jsquy...@cisco.com