I'm looking into several cases where jobs don't enter our queues even though the load is below the threshold, and I noticed a calculation in the scheduler output that I can't figure out.
After turning on logging, I see the following in qstat -j for a job that should be scheduled but isn't:

    queue instance "al...@n38.--.com" dropped because it is overloaded: np_load_avg=1.306875 (= 0.965536 + 0.50 * 38.230000 with nproc=56) >= 1.30

load_formula is load_avg-num_proc and the load adjustments are 0.5:

    $ qconf -ssconf
    algorithm                    default
    schedule_interval            00:00:01
    maxujobs                     0
    queue_sort_method            load
    job_load_adjustments         np_load_avg=0.50,load_avg=0.50
    load_adjustment_decay_time   0:7:30
    load_formula                 load_avg-num_proc

In the n38 example, I see the average load is 0.965536, but I have no idea where the 38.23 comes from. num_proc is 56 and load_avg is less than that, so where does 38.23 come from?

Also, I should note that each of our jobs uses one full CPU, and reaches that about 10 seconds after starting. What should I set the decay time to? We took 7:30 minutes as a default we found somewhere.
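One reading that makes the numbers in the message add up, offered as a sketch and not as a confirmed description of the scheduler: the 0.50 * 38.23 term appears to be divided by nproc before it is added to the reported load, which would make 38.23 a per-host total rather than a per-CPU value. The linear decay in the second function is an assumption based on the description of load_adjustment_decay_time in sched_conf(5); the function and variable names are made up for illustration.

```python
# Values copied from the scheduler message in the post.
BASE_NP_LOAD_AVG = 0.965536   # np_load_avg before the adjustment
ADJUSTMENT = 0.50             # np_load_avg entry in job_load_adjustments
UNEXPLAINED_TERM = 38.23      # the number the post asks about
NPROC = 56
THRESHOLD = 1.30


def adjusted_np_load(base, adjustment, term, nproc):
    """Assumption: the adjustment term is normalized by nproc,
    which is the only reading under which the logged numbers agree."""
    return base + adjustment * term / nproc


def decayed_weight(age_seconds, decay_seconds):
    """Assumption: a recently started job counts fully at start and
    its weight falls linearly to zero over the decay time."""
    remaining = 1.0 - age_seconds / decay_seconds
    return max(0.0, remaining)


result = adjusted_np_load(BASE_NP_LOAD_AVG, ADJUSTMENT, UNEXPLAINED_TERM, NPROC)
print(f"{result:.6f}")        # 1.306875, the value in the message
print(result >= THRESHOLD)    # True, so the queue instance is dropped

# With the configured 0:7:30 (450 s), a job started 225 s ago
# would still count as half a job under this assumption.
print(decayed_weight(225, 450))
```

If this reading is right, 38.23 would be the sum of such decayed weights over the jobs recently started on n38, so a decay time closer to the roughly 10 seconds the jobs need to reach full CPU use would shrink that term.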