On 16.05.2013 at 20:05, Tim Landscheidt wrote:

> Reuti <[email protected]> wrote:
>
>>> we're using OGS/GE 2011.11 at toolserver.org, and unfortunately
>>> our admins are AWOL. I'm trying to investigate why
>>> the grid is heavily underloaded (while queues are filling
>>> up). A simple job gets queued and has scheduling_info:
>
>>> | scheduling info: queue instance "[email protected]" dropped because it is temporarily not available
>>> |                  queue instance "[email protected]" dropped because it is temporarily not available
>>> |                  queue instance "[email protected]" dropped because it is temporarily not available
>>> |                  queue instance "[email protected]" dropped because it is temporarily not available
>>> |                  queue instance "[email protected]" dropped because it is disabled
>>> |                  queue instance "[email protected]" dropped because it is disabled
>>> |                  queue instance "[email protected]" dropped because it is overloaded: np_load_short=0.845508 (= 0.645508 + 0.8 * 1.000000 with nproc=4) >= 0.75
>>> |                  queue instance "[email protected]" dropped because it is overloaded: np_load_short=0.831445 (= 0.231445 + 0.8 * 6.000000 with nproc=8) >= 0.75
>>> |                  queue instance "[email protected]" dropped because it is overloaded: np_load_short=1.245508 (= 0.645508 + 0.8 * 3.000000 with nproc=4) >= 1.2
>>> |                  queue instance "[email protected]" dropped because it is overloaded: np_load_short=1.231445 (= 0.231445 + 0.8 * 10.000000 with nproc=8) >= 1.2
>>> |                  queue instance "[email protected]" dropped because it is overloaded: np_load_short=1.202500 (= 0.002500 + 0.8 * 6.000000 with nproc=4) >= 1.2
>>> |                  queue instance "[email protected]" dropped because it is full
>>> |                  queue instance "[email protected]" dropped because it is overloaded: mem_free=-173461503.737856 (= 13834.574219M - 500M * 28.000000) <= 500
>>> |                  queue instance "[email protected]" dropped because it is overloaded: np_load_short=3.202500 (= 0.002500 + 0.8 * 16.000000 with nproc=4) >= 3.1
>
>>> For example, in queue instance
>>> [email protected], where do the factors 0.8
>>> and 6.000000 come from? Neither "qconf -sconf global" nor
>>> "qconf -sconf yarrow" show anything obvious, and "qconf -sq
>>> medium-lx" only has load_thresholds with:
>
>>> | [...]
>>> | load_thresholds       np_load_short=1.2,np_load_long=1.5,cpu=98, \
>>> |                       mem_free=1000M, \
>>> |                       [mayapple.toolserver.org=np_load_short=2.1,mem_free=300M]
>>> | [...]
>
>>> to define the threshold, but not the calculation. I believe
>>> the factors are applied at
>>> source/libs/sched/sge_select_queue.c:2057, but I don't want
>>> to read the whole source :-). Are these factors some default,
>>> or where should I look?
>
>> What is defined in `qconf -ssconf` for "job_load_adjustments"?
>
> | job_load_adjustments              tmp_free=150M,np_load_short=0.8,mem_free=500M
>
> Ah! So there's the 0.8. Any idea for the 6.000000?

Is it still 6.000000? Please have a look at the line below "job_load_adjustments" and the explanation in `man sched_conf`.

-- Reuti

> Thanks,
> Tim
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
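
For readers following the arithmetic: the figures in the scheduling info above are consistent with "reported load + adjustment * count / nproc" for np_load_short and "reported value - adjustment * count" for mem_free. The sketch below only re-checks the numbers quoted in this thread. The function names are made up for illustration and are not Grid Engine APIs, and reading "count" as the number of job load adjustments currently charged to the host is an assumption, not something the output itself states.

```python
# Re-check the numbers from the quoted scheduling info.
# adjusted_np_load and adjusted_mem_free_mb are hypothetical helper names.

def adjusted_np_load(reported, adjustment, count, nproc):
    """Reported per-core load plus the per-job adjustment, scaled by core count."""
    return reported + adjustment * count / nproc


def adjusted_mem_free_mb(reported_mb, adjustment_mb, count):
    """Reported free memory (in MB) minus the per-job adjustment."""
    return reported_mb - adjustment_mb * count


# (reported, count, nproc, value printed by the scheduler, threshold)
rows = [
    (0.645508, 1, 4, 0.845508, 0.75),
    (0.231445, 6, 8, 0.831445, 0.75),
    (0.645508, 3, 4, 1.245508, 1.2),
    (0.231445, 10, 8, 1.231445, 1.2),
    (0.002500, 6, 4, 1.202500, 1.2),
    (0.002500, 16, 4, 3.202500, 3.1),
]

for reported, count, nproc, printed, threshold in rows:
    value = adjusted_np_load(reported, 0.8, count, nproc)
    # Each recomputed value should match the scheduler's printed value.
    assert abs(value - printed) < 1e-6
    print(f"{value:.6f} >= {threshold}: {value >= threshold}")

# mem_free case: 13834.574219M - 500M * 28, printed by the scheduler in bytes.
mem_bytes = adjusted_mem_free_mb(13834.574219, 500, 28) * 1024 * 1024
print(f"mem_free = {mem_bytes:.6f} bytes")
```

So in the yarrow example with nproc=4, the almost idle host (0.0025) ends up at 1.2025 purely because of six adjustments of 0.8 spread over four cores.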
