On Thu, Mar 10, 2011 at 03:30:42PM -0500, Lane Schwartz wrote:
> Hi,
> I've been unable to find a good definition for np_load_avg.
It's the 5-minute load average divided by the number of CPUs detected
on the system. This is mentioned somewhere in the docs, but it's hard
to find.
There are also np_load_short and np_load_long, which are the 1-minute
and 15-minute averages, respectively. The np_load_medium value is the
same as np_load_avg.
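
As a rough illustration of the arithmetic described above, here is a minimal Python sketch. It is not SGE code: the function names are made up for this example, and os.cpu_count() is only a stand-in for the CPU count that SGE detects on the host, which may differ.

```python
import os


def np_loads(loads, num_proc):
    """Divide raw (1-, 5-, 15-minute) load averages by the CPU count."""
    if num_proc < 1:
        raise ValueError("num_proc must be at least 1")
    short, medium, long_ = loads
    return {
        "np_load_short": short / num_proc,    # 1-minute average per CPU
        "np_load_medium": medium / num_proc,  # 5-minute average per CPU
        "np_load_avg": medium / num_proc,     # same value as np_load_medium
        "np_load_long": long_ / num_proc,     # 15-minute average per CPU
    }


def local_np_loads():
    """Approximate the values for the local host (Unix only).

    Assumption: os.cpu_count() matches the CPU count SGE would detect.
    """
    return np_loads(os.getloadavg(), os.cpu_count() or 1)
```

For example, a 4-CPU host with a 5-minute load average of 2.0 would get an np_load_avg of 0.5.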
> My first thought was that this value was defined for each compute
> node, and the value should be the total system load reported by the
> compute node divided by the number of cores on that compute node.
It should be "close". There is a reporting delay that can create a
difference between the value SGE uses and the value on the compute node.
> A colleague of mine thought that this value was defined at the queue
> level, and the value should be the sum of the total system loads
> reported by all compute nodes in the queue, divided by the number of
> compute nodes in the queue.
> What does np_load_avg really mean?
It's a host-level value.
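
To make the difference between the two readings concrete, here is a small Python sketch. The host names and numbers are invented purely for illustration, and the functions are hypothetical helpers, not part of SGE.

```python
# Hypothetical hosts: name -> (5-minute load average, CPU count).
# These numbers are made up for illustration only.
hosts = {
    "node01": (8.0, 8),
    "node02": (1.0, 4),
}


def np_load_avg_per_host(hosts):
    """Host-level reading: each host's load divided by its own CPU count."""
    return {name: load / cpus for name, (load, cpus) in hosts.items()}


def queue_wide_average(hosts):
    """Queue-level reading: summed load divided by the number of nodes.

    Shown only for contrast; this is not what np_load_avg reports.
    """
    return sum(load for load, _ in hosts.values()) / len(hosts)
```

With these made-up numbers, the host-level reading gives one value per node (1.0 and 0.25), while the queue-level reading would collapse everything into a single figure (4.5).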
If you ever need the "real" load average, SGE also has
"load_{avg,short,medium,long}" which do *not* adjust for the number of
CPUs. These numbers should more closely match what is actually reported
by the OS on the node.
--
Jesse Becker
NHGRI Linux support (Digicon Contractor)