Jeff Squyres wrote:
On Dec 12, 2008, at 11:46 AM, Eugene Loh wrote:
FWIW, I've run into the need for this a few times when running HPCC tests
on large (>100 MPI procs) nodes or multicore systems. HPCC (among
other things) looks at the performance of a single process while all
other np-1 processes spinwait -- or of a single pingpong pair while
all other np-2 processes wait. I'm not 100% sure what's going on,
but I'm guessing that the hard spinning of waiting processes hits
the memory system or some other resource, degrading the performance
of working processes. This is on nodes that are not oversubscribed.
I guess I could <waving hands> see how shmem kinds of communication
could lead to this kind of bottleneck, and that increasing core
counts would magnify the effect. It would be good to understand if
shmem activity is the cause of the slowdown to know if this is a
good data point in the rationale for whether we should do blocking progress
(or, more specifically, whether we need to increase the priority of
implementing blocking progress).
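
To make sure we're talking about the same scenario, here is roughly the
pattern (a minimal sketch, not the actual HPCC code; the ranks and the
"done" message are made up for illustration):

    #include <mpi.h>
    #include <stdio.h>

    /* np-1 ranks wait for a "done" message while rank 0 does timed work.
     * With the default aggressive/polling progress, the waiting ranks
     * spin on their shared-memory queues the whole time rank 0 works. */
    int main(int argc, char **argv)
    {
        int rank, np, done = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);

        if (0 == rank) {
            double t0 = MPI_Wtime();
            /* ... timed kernel (or a pingpong with rank 1) ... */
            printf("work took %f s with %d ranks spin-waiting\n",
                   MPI_Wtime() - t0, np - 1);
            for (int i = 1; i < np; ++i) {
                MPI_Send(&done, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
            }
        } else {
            /* "Idle", but MPI_Recv polls, so these ranks stay busy. */
            MPI_Recv(&done, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }
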
I don't understand all of what's going on here, but I/we've seen this
sort of "catastrophic degradation" on two large (>100 processes) nodes
of rather different architectures. Prototypes indicate that either
blocking *or* directed polling addresses the problem, but those
are preliminary findings that are not backed up by sound understanding
of what's going on under the hood. Yes, still handwaving.
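
FWIW, here is roughly what the two workarounds amount to, as a sketch from
the user level rather than the actual prototype code. "Blocking" means
backing off (yield or sleep) between progress attempts instead of spinning
flat out; "directed polling" means the receiver checks only the sender it
actually expects instead of sweeping all np-1 incoming queues on every
pass -- that part lives inside the shared-memory BTL, so the specified
source below is only the API-level analogue.

    #include <mpi.h>
    #include <time.h>

    /* Sketch: wait for a message without spinning flat out.  MPI_Test
     * drives progress; nanosleep backs off between attempts so an
     * "idle" rank stops hammering the memory system.  Real blocking
     * progress would live inside the library; the effect is similar. */
    static void backoff_recv(void *buf, int count, MPI_Datatype type,
                             int src, int tag, MPI_Comm comm)
    {
        MPI_Request req;
        int flag = 0;
        struct timespec ts = { 0, 100000 };     /* 100 us back-off */

        MPI_Irecv(buf, count, type, src, tag, comm, &req);
        MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
        while (!flag) {
            nanosleep(&ts, NULL);               /* instead of spinning */
            MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
        }
    }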