On Thu, 01 Jun 2006 18:07:07 -0600, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> This *sounds* like the classic oversubscription problem: Open MPI's
> aggressive vs. degraded operating modes:
>
> http://www.open-mpi.org/faq/?category=running#oversubscribing

Good link; bookmarked for (internal) documentation...

Specifically, "slots" is *not* meant to be the number of processes to
run.  It's meant to be how many processors are available to run.  Hence,
if you lie and tell OMPI that you have more slots than CPUs, OMPI will
think that it can run in aggressive mode.  But you'll have less
processors than processes, and all of them will be running in aggressive
mode -- hence, massive slowdown.

However, you say that you've got 2 dual core opterons in a single box,
so there should be 4 processors.  Hence "slots=4" should not be a lie.

It's good to hear that my concept of slots wasn't off (although my message may not have given that impression). It certainly seems to me that with two dual-core Opterons I should use slots=4.
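(For reference, the machinefile in question looks something like the sketch below; "n01" is a placeholder hostname, and slots reflects the number of physical cores, not the number of processes to launch:)
******
# machinefile: one entry per node; slots = physical cores on that node
n01 slots=4
******
which would then be launched with something like "mpirun -np 4 -machinefile machinefile ./xhpl".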

> I can't think of why this would happen.

> Can you confirm that your Linux installation thinks that it has 4
> processors and will schedule 4 processes simultaneously?

Fun story: at first, *I* thought it was a simple case of two single-core processors (slots=2, and I used two nodes to get 4 CPUs). I believed it had only two processors because `cat /proc/cpuinfo` would list two processors: CPU0 and CPU1. (I.e., the Linux installation doesn't see four processors; it sees two dual-core processors.)

Then somebody pointed out to me that they were dual-core, and that cpuinfo showed it:
******
processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 33
model name      : unknown
stepping        : 2
cpu MHz         : 2613.419
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2     <----- Two cores -------
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm
bogomips        : 5227.16
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
******
To verify that it acted like it had four cores, I tried the following (using two nodes in the machinefile, each with slots=2):
1.) Start a 4-CPU linpack job. (Supposedly using half of the CPU power in each machine.)
    * With just 4 processes in total, the problem size took approximately 0.08 s to finish (repeatably; the HPL.dat is set to run several of the same problem size).
    * 'top' listed *two* CPUs, both pegged at 100%. Each hpl process was taking 100% of a CPU.
2.) Start a second 4-CPU linpack job (using the other half of the CPU power).
    * When I started the second job (8 total processes, 4 in each job), the same problem size started to take 0.19 s to complete (on both jobs).
    * 'top' listed *two* CPUs, both pegged at 100%. Each hpl process was taking 50% of a CPU.
************
Then I tried the same 4-process linpack job on a single node (one node in the machinefile, slots=2). The results were essentially identical to #2 above (where each node was likewise running 4 processes).

So it seems that although the system has dual-core CPUs, only one core is being used per CPU; four simultaneous processes are not being scheduled.
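As a sanity check independent of /proc/cpuinfo, a trivial program can ask the kernel how many processors it actually has online. A minimal sketch (using glibc's sysconf(); "nprocs.c" is just my name for it):
******
/* nprocs.c: report how many processors the kernel sees.
   Build: gcc -o nprocs nprocs.c */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long online = sysconf(_SC_NPROCESSORS_ONLN);  /* online/schedulable */
    long conf   = sysconf(_SC_NPROCESSORS_CONF);  /* physically present */
    printf("online: %ld, configured: %ld\n", online, conf);
    return 0;
}
******
If this prints "online: 2" on a box with two dual-core Opterons, the missing cores are a kernel/BIOS issue, not an Open MPI one.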

So the oversubscription hypothesis appears to be 100% correct; slots=4 is oversubscribing the job.
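(Side note for the archives: per the FAQ above, degraded mode can also be forced explicitly while the core count is sorted out, e.g.:)
******
mpirun --mca mpi_yield_when_idle 1 -np 4 -machinefile machinefile ./xhpl
******
so that oversubscribed processes at least yield the CPU instead of spinning.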

Now I get to go find out *why* the job is oversubscribed, since there are 4 cores able to handle the processes... I'll have to see if the system behaves similarly with non-MPI processes (i.e., whether it fails to use all of the available cores). It may very well be a problem with the hardware or OS; it's the pre-release distro I wrote about in another posting yesterday...
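A throwaway non-MPI test along those lines: fork four pure spin loops and watch 'top'. (A minimal sketch; if only two CPUs peg at 100% while it runs, the problem is below MPI entirely.)
******
/* burn4.c: fork four CPU burners; with four usable cores, 'top'
   should show four CPUs at ~100%.  Build: gcc -o burn4 burn4.c
   Kill with Ctrl-C when done. */
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int i;
    for (i = 0; i < 4; i++) {
        if (fork() == 0) {              /* child: spin forever */
            volatile unsigned long x = 0;
            for (;;) x++;
        }
    }
    while (wait(NULL) > 0)              /* parent: never returns */
        ;
    return 0;
}
******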

I'm wondering if there is something happening behind the scenes... I'll have to check...
--
Troy Telford
