I'm looking into several cases where jobs don't enter our queues even
though the load is lower than the threshold, and I noticed that the
scheduler uses a different calculation there that I can't figure out.

With logging turned on, qstat -j shows the following for a job that
should enter the queue but doesn't:
queue instance "al...@n38.--.com" dropped because it is overloaded:
np_load_avg=1.306875 (= 0.965536 + 0.50 * 38.230000 with nproc=56) >= 1.30
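
Checking the arithmetic in that message: the numbers only add up if the
adjustment term is divided by nproc. That division is my inference from
the figures, not something I found documented, and the 38.23 factor is
simply copied from the message as-is. A minimal sketch of the check:

```python
# Values copied from the qstat -j message above.
reported_load = 0.965536  # np_load_avg reported for the host
adjustment = 0.50         # np_load_avg entry in job_load_adjustments
factor = 38.23            # the factor from the message (origin unknown to me)
nproc = 56                # processor count printed in the message
threshold = 1.30          # threshold printed in the message

# Assumption: the adjustment term is normalized by nproc, which is the
# only way these figures reproduce the reported 1.306875.
corrected = reported_load + adjustment * factor / nproc
overloaded = corrected >= threshold

print(f"{corrected:.6f}")  # 1.306875
print(overloaded)          # True
```

Without the division the sum would be about 20.08, so the extra load
being added here is roughly 0.34 per processor.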

load_formula is load_avg-num_proc and both job_load_adjustments are 0.50:

$ qconf -ssconf
algorithm                         default
schedule_interval                 00:00:01
maxujobs                          0
queue_sort_method                 load
job_load_adjustments              np_load_avg=0.50,load_avg=0.50
load_adjustment_decay_time        0:7:30
load_formula                      load_avg-num_proc

Given the n38 example, I see the average load is 0.965536, but I have
absolutely ZERO idea where that 38.23 comes from. num_proc is 56 and
load_avg is less than that, so where does 38.23 come from?

Also, I should note that all our jobs use one full CPU each, and they
reach that load about 10 seconds after starting. What should I set the
decay time to? We took 7:30 minutes as a default we found somewhere.
