This may or may not be related to Open MPI 1.8.1. I have not seen the
problem with the same program under previous versions of Open MPI.

We have 64-core AMD nodes. I have recently recompiled a large Monte Carlo
program using version 1.8.1 of Open MPI. Users start this program through
maui/torque, asking for a number of cores, usually on only one node. A single
run of the program asking for any number of cores up to 64 runs with full CPU
utilisation on each core. A user might start a run asking for 16 cores, which
is fine. Then he starts a second run on the same node, asking for another 16
cores. Immediately the CPU utilisation on all cores of the first job drops to
50%, and the newly started job also runs at 50% per core. If a different
program is using the remaining 32 cores on the same node at the same time, the
CPU utilisation of its cores is unaffected. If we qdel the second 16-core job,
the CPU utilisation of each core of the first job immediately climbs back to
100%.

Any suggestions, please, on where I might start looking for the solution to
this problem?
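
One thing that could be checked first is whether the ranks of the two jobs
are allowed to run on the same cores. The following is only a minimal sketch,
assuming Linux and Python 3; overlapping_pairs and affinity_of are
hypothetical helper names, not part of Open MPI or torque, and the PIDs passed
on the command line would be those of the running ranks of both jobs.

```python
import os
import sys
from itertools import combinations


def affinity_of(pid):
    """Return the set of CPU ids the process may run on (Linux only)."""
    return os.sched_getaffinity(pid)


def overlapping_pairs(affinities):
    """Given {pid: set_of_cpu_ids}, list the pairs of PIDs that share CPUs.

    Returns a list of (pid_a, pid_b, sorted_shared_cpu_ids) tuples.
    """
    shared = []
    for pid_a, pid_b in combinations(sorted(affinities), 2):
        common = set(affinities[pid_a]) & set(affinities[pid_b])
        if common:
            shared.append((pid_a, pid_b, sorted(common)))
    return shared


if __name__ == "__main__":
    # Hypothetical usage: pass the PIDs of the ranks of both jobs as arguments.
    pids = [int(arg) for arg in sys.argv[1:]]
    table = {pid: affinity_of(pid) for pid in pids}
    for pid_a, pid_b, cpus in overlapping_pairs(table):
        print(f"PID {pid_a} and PID {pid_b} share CPUs {cpus}")
```

A process that is not pinned at all is allowed on every core, so it will
overlap with everything. The output is only meaningful if each rank is
restricted to a small set of cores.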
Greg Doherty
ANSTO