This might or might not be related to openmpi 1.8.1. I have not seen the 
problem with the same program and previous versions of openmpi.
We have 64 core AMD nodes. I have recently recompiled a large Monte Carlo 
program using version 1.8.1 of openmpi. Users start this program using 
maui/torque asking for a number of cores, usually on only one node. One run of 
the program asking for any number of cores up to 64 runs with full cpu 
utilisation on each core. A user might start a run asking for 16 cores - fine. 
Then he starts a second run on the same node, asking for another 16 cores. 
Immediately the cpu utilisation on all cores of the first job drops to 50%, as 
it is for the newly starting job. If a different program is using the 
remaining 32 cores on the same node at the same time, the cpu utilisation of 
its cores is unaffected. If we qdel the second 16 core job, the cpu utilisation 
of each core of the first job immediately climbs back to 100%. Any suggestions 
please, on where I might start looking for the solution to this problem?
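
One check I could run, offered as a guess and not a known cause: if processes 
of both jobs were restricted to the same set of cores, each would get roughly 
half of a core, which would match the 50% figure. The following is a minimal 
Python sketch (Linux only) that compares the allowed-core sets of two groups 
of processes. The function names are my own invention for illustration, and 
the PIDs in the usage note are hypothetical.

```python
import os


def cores_in_use(affinities):
    """Return the union of the allowed-core sets of one job's processes."""
    cores = set()
    for allowed in affinities:
        cores |= set(allowed)
    return cores


def shared_cores(job_a, job_b):
    """Return the sorted cores that processes of both jobs may run on."""
    return sorted(cores_in_use(job_a) & cores_in_use(job_b))


def affinities_of(pids):
    """Return the allowed-core set of each PID (Linux only)."""
    return [os.sched_getaffinity(pid) for pid in pids]
```

Run on the node while both jobs are active, with the real PIDs of each job's 
ranks in place of the hypothetical ones: 
shared_cores(affinities_of([12001, 12002]), affinities_of([12950, 12951])). 
An empty list would mean the two jobs are confined to disjoint cores. A list 
of all 64 cores would mean neither job is restricted at all. A short list of 
shared cores would suggest both jobs are pinned to the same cores.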
Greg Doherty
ANSTO
