Yes, I'm using SGE. I also just noticed that when 2 tasks/slots run on a 4-core node, the 2 tasks are still cycling between run and sleep, with higher system time than user time.
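A quick way to see whether the tasks are putting themselves to sleep or being preempted (just a sketch; <PID> stands in for the process ID of one MPI rank, and the counters are only available on kernels that expose them) is to watch the per-process context-switch counts:

    grep ctxt_switches /proc/<PID>/status

If voluntary_ctxt_switches grows much faster than nonvoluntary_ctxt_switches, the ranks are going to sleep on their own (blocking calls or sched_yield) rather than being pushed off the CPU by other processes.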
Ompi_info shows the MCA parameter mpi_yield_when_idle to be 0 (aggressive), so that suggests the tasks aren't swapping out on blocking calls. Still puzzled.

Thanks,
Todd

On 3/22/07 7:36 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

> Are you using a scheduler on your system?
>
> More specifically, does Open MPI know that you have four process slots
> on each node? If you are using a hostfile and didn't specify
> "slots=4" for each host, Open MPI will think that it's
> oversubscribing and will therefore call sched_yield() in the depths
> of its progress engine.
>
> On Mar 21, 2007, at 5:08 PM, Heywood, Todd wrote:
>
>> P.s. I should have said that this is a pretty coarse-grained application,
>> and netstat doesn't show much communication going on (except in stages).
>>
>> On 3/21/07 4:21 PM, "Heywood, Todd" <heyw...@cshl.edu> wrote:
>>
>>> I noticed that my Open MPI processes are using larger amounts of system time
>>> than user time (via vmstat, top). I'm running on dual-core, dual-CPU
>>> Opterons, with 4 slots per node, where the program has the nodes to
>>> themselves. A closer look showed that they are constantly switching between
>>> run and sleep states with 4-8 page faults per second.
>>>
>>> Why would this be? It doesn't happen with 4 sequential jobs running on a
>>> node, where I get 99% user time, maybe 1% system time.
>>>
>>> The processes have plenty of memory. This behavior occurs whether I use
>>> processor/memory affinity or not (there is no oversubscription).
>>>
>>> Thanks,
>>>
>>> Todd
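For reference, the hostfile setup Jeff describes above would look roughly like this (hostnames, slot counts, and the application name are illustrative, not taken from the thread):

    # hostfile: tell Open MPI each node really has 4 slots
    node01 slots=4
    node02 slots=4

    mpirun -np 8 --hostfile hostfile ./my_app

The yield behavior can also be pinned explicitly on the command line, independent of what Open MPI infers about oversubscription:

    mpirun -np 8 --hostfile hostfile --mca mpi_yield_when_idle 0 ./my_app

where 0 means aggressive (spin) and 1 means degraded (yield), matching what ompi_info reports.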