Re: [gridengine users] Where do the factors for np_load_short come from?

Reuti Thu, 16 May 2013 11:24:20 -0700

On 16.05.2013 at 20:05, Tim Landscheidt wrote:

> Reuti <[email protected]> wrote:
> 
>>> we're using OGS/GE 2011.11 at toolserver.org, and unfortunately
>>> our admins are AWOL.  I'm trying to investigate why
>>> the grid is heavily underloaded (while queues are filling
>>> up).  A simple job gets queued and has scheduling_info:
> 
>>> | scheduling info:            queue instance "[email protected]" dropped because it is temporarily not available
>>> |                             queue instance "[email protected]" dropped because it is temporarily not available
>>> |                             queue instance "[email protected]" dropped because it is temporarily not available
>>> |                             queue instance "[email protected]" dropped because it is temporarily not available
>>> |                             queue instance "[email protected]" dropped because it is disabled
>>> |                             queue instance "[email protected]" dropped because it is disabled
>>> |                             queue instance "[email protected]" dropped because it is overloaded: np_load_short=0.845508 (= 0.645508 + 0.8 * 1.000000 with nproc=4) >= 0.75
>>> |                             queue instance "[email protected]" dropped because it is overloaded: np_load_short=0.831445 (= 0.231445 + 0.8 * 6.000000 with nproc=8) >= 0.75
>>> |                             queue instance "[email protected]" dropped because it is overloaded: np_load_short=1.245508 (= 0.645508 + 0.8 * 3.000000 with nproc=4) >= 1.2
>>> |                             queue instance "[email protected]" dropped because it is overloaded: np_load_short=1.231445 (= 0.231445 + 0.8 * 10.000000 with nproc=8) >= 1.2
>>> |                             queue instance "[email protected]" dropped because it is overloaded: np_load_short=1.202500 (= 0.002500 + 0.8 * 6.000000 with nproc=4) >= 1.2
>>> |                             queue instance "[email protected]" dropped because it is full
>>> |                             queue instance "[email protected]" dropped because it is overloaded: mem_free=-173461503.737856 (= 13834.574219M - 500M * 28.000000) <= 500
>>> |                             queue instance "[email protected]" dropped because it is overloaded: np_load_short=3.202500 (= 0.002500 + 0.8 * 16.000000 with nproc=4) >= 3.1
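The printed numbers are consistent with the np_ adjustment being divided by nproc: 0.645508 + 0.8 * 1 / 4 = 0.845508, whereas 0.645508 + 0.8 * 1 would give 1.445508. A minimal Python sketch, assuming that reading (inferred from the numbers above, not from the scheduler source; the function name adjusted_np_load is hypothetical), reproduces every np_load_short value in the scheduling info:

```python
# Sketch: reproduce the np_load_short arithmetic printed in scheduling_info.
# Assumption (inferred from the printed numbers, not from SGE source code):
#   adjusted = reported + adjustment * factor / nproc   for np_* load values

def adjusted_np_load(reported: float, adjustment: float, factor: float, nproc: int) -> float:
    """Return the adjusted np_* load that is compared against load_thresholds."""
    return reported + adjustment * factor / nproc


# (reported load, adjustment, multiplier, nproc, threshold) copied from the output
samples = [
    (0.645508, 0.8, 1.0, 4, 0.75),
    (0.231445, 0.8, 6.0, 8, 0.75),
    (0.645508, 0.8, 3.0, 4, 1.2),
    (0.231445, 0.8, 10.0, 8, 1.2),
    (0.002500, 0.8, 6.0, 4, 1.2),
    (0.002500, 0.8, 16.0, 4, 3.1),
]

for reported, adj, factor, nproc, threshold in samples:
    value = adjusted_np_load(reported, adj, factor, nproc)
    # Each queue instance is dropped when the adjusted value reaches its threshold.
    print(f"np_load_short={value:.6f} >= {threshold}: {value >= threshold}")
```

Every sample crosses its threshold only because of the adjustment term; the reported loads alone are all well below it.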
> 
>>> For example, in queue instance
>>> [email protected], where do the factors 0.8
>>> and 6.000000 come from?  Neither "qconf -sconf global" nor
>>> "qconf -sconf yarrow" show anything obvious, and "qconf -sq
>>> medium-lx" only has load_thresholds with:
> 
>>> | [...]
>>> | load_thresholds       np_load_short=1.2,np_load_long=1.5,cpu=98, \
>>> |                       mem_free=1000M, \
>>> |                       [mayapple.toolserver.org=np_load_short=2.1,mem_free=300M]
>>> | [...]
> 
>>> to define the threshold, but not the calculation.  I believe
>>> the factors are applied at
>>> source/libs/sched/sge_select_queue.c:2057, but I don't want
>>> to read the whole source :-).  Are these factors some
>>> default, or where should I look?
> 
>> What is defined in `qconf -ssconf` for "job_load_adjustments"?
> 
> | job_load_adjustments              tmp_free=150M,np_load_short=0.8,mem_free=500M
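The mem_free line in the scheduling info uses the matching mem_free=500M entry, but with no division by nproc and with the adjustment subtracted: 13834.574219M - 500M * 28 = -165.425781M, which is -173461503.737856 bytes if M means 1024 * 1024. A sketch, assuming that suffix convention (K/M/G as powers of 1024, k/m/g as powers of 1000) and using hypothetical helper names parse_value and parse_adjustments, that splits the job_load_adjustments string and reapplies the mem_free entry:

```python
# Sketch: parse a job_load_adjustments string and reapply the mem_free entry.
# Assumption: SGE-style suffixes, K/M/G = powers of 1024, k/m/g = powers of 1000
# (the mem_free arithmetic in the scheduling info is consistent with M = 1024**2).

SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3,
            "k": 1000, "m": 1000**2, "g": 1000**3}


def parse_value(text: str) -> float:
    """Convert a value such as '500M' or '0.8' to a plain number."""
    if text and text[-1] in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1]]
    return float(text)


def parse_adjustments(spec: str) -> dict[str, float]:
    """Split 'name=value,name=value' into {load value name: adjustment}."""
    result = {}
    for item in spec.split(","):
        name, _, value = item.partition("=")
        result[name.strip()] = parse_value(value.strip())
    return result


adjustments = parse_adjustments("tmp_free=150M,np_load_short=0.8,mem_free=500M")

# mem_free has no np_ prefix, so no nproc division; per the printed output the
# adjustment is subtracted from the reported free memory.
reported_mem_free = parse_value("13834.574219M")
factor = 28.0  # multiplier copied from the scheduling info
adjusted = reported_mem_free - adjustments["mem_free"] * factor
print(f"mem_free={adjusted:.6f}")
```

The result is far below the 500 threshold, which is why that queue instance is reported as overloaded even though the host reports about 13.5G free.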
> 
> Ah!  So there's the 0.8.  Any idea for the 6.000000?


Is it still 6.000000? Please have a look at the line below
"job_load_adjustments" (load_adjustment_decay_time) and its explanation in `man sched_conf`.

-- Reuti
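The line below job_load_adjustments in `qconf -ssconf` is load_adjustment_decay_time, the interval over which the adjustments fade out. One way to picture how the multiplier (such as 6.000000) could arise is the hypothetical sketch below. It assumes that each slot dispatched within the decay time contributes a weight starting at 1 and falling linearly to 0, and that the multiplier is the sum of those weights. The helper names, the 0:7:30 decay time and the dispatch ages are all illustrative assumptions, not SGE's actual implementation.

```python
# Hypothetical sketch: a linearly decaying load-adjustment multiplier.
# Assumptions (not taken from SGE source): every slot dispatched within
# load_adjustment_decay_time weighs 1 at dispatch and falls linearly to 0
# once decay_time has elapsed; the multiplier is the sum of these weights.

def decay_weight(elapsed_s: float, decay_time_s: float) -> float:
    """Linearly decayed weight of one recently dispatched slot."""
    if decay_time_s <= 0:
        raise ValueError("decay_time_s must be positive in this sketch")
    if elapsed_s >= decay_time_s:
        return 0.0
    return 1.0 - elapsed_s / decay_time_s


def adjustment_factor(dispatch_ages_s: list[float], decay_time_s: float) -> float:
    """Sum the decayed weights of all recently dispatched slots."""
    return sum(decay_weight(age, decay_time_s) for age in dispatch_ages_s)


decay_time = 7 * 60 + 30  # 0:7:30, an illustrative load_adjustment_decay_time
ages = [0, 0, 0, 90, 225, 450, 600]  # hypothetical seconds since each dispatch

factor = adjustment_factor(ages, decay_time)  # 3 * 1.0 + 0.8 + 0.5 = 4.3
np_load = 0.231445 + 0.8 * factor / 8         # reported load, 0.8 adjustment, nproc=8
print(f"factor={factor:.6f} np_load_short={np_load:.6f}")
```

Under this assumption the multiplier would usually be fractional. The whole-number multipliers in the scheduling info are a reason to check `man sched_conf` rather than treat this sketch as the exact rule.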


> 
> Thanks,
> Tim
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

