Is there anything else I should look into, or is it more likely a bug in this version?
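In case it helps, this is roughly how I'm generating the test PE jobs shown in the qstat output below and checking where they land. The sleep is just a stand-in for the real workload and <job-id> is a placeholder, so treat this as a sketch rather than the exact commands:

# 2-slot job in the "threaded" PE; it shows up as STDIN in qstat because the
# script is piped in on stdin
echo "sleep 600" | qsub -pe threaded 2

# where the granted slots actually landed, plus per-host load
qstat -g t
qhost

# scheduling reasoning for a given job (schedd_job_info is true in our config)
qstat -j <job-id>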
Thanks!

On Thu, Mar 20, 2014 at 1:29 PM, Karun K <[email protected]> wrote:
> Hi Reuti,
>
> Yes, this issue is only with parallel jobs. Regular jobs are being
> distributed fine as per load.
>
> Thanks
>
>
> On Thu, Mar 20, 2014 at 12:27 PM, Reuti <[email protected]> wrote:
>
>> Hi,
>>
>> On 20.03.2014 at 18:15, Karun K wrote:
>>
>> > Our scheduler is configured with a "least_used_host" policy based on
>> > load average, and for PE environments it's $pe_slots.
>> > Regular jobs are being allocated as expected, but PE jobs fill up a
>> > host before moving to the next available node.
>> > How can I configure PE jobs to be round-robin as well? I.e., all
>> > requested slots of a PE job have to be on the same host, but the jobs
>> > should be distributed across hosts rather than filling up one host.
>>
>> Is this issue limited to parallel jobs? For $pe_slots it should work, and
>> I see you already defined a "job_load_adjustments".
>>
>> -- Reuti
>>
>> > Our GE configs are included below, version 2011.11p1.
>> >
>> > Thanks,
>> > Karun
>> >
>> > job-ID  prior    name   user  state  submit/start at      queue                     slots  ja-task-ID
>> > -------------------------------------------------------------------------------------------------------
>> > 124688  0.51929  STDIN  kk    r      03/13/2014 23:07:57  [email protected]  2
>> > 124689  0.51929  STDIN  kk    r      03/13/2014 23:07:57  [email protected]  2
>> > 124690  0.51929  STDIN  kk    r      03/13/2014 23:07:57  [email protected]  2
>> > 124691  0.51929  STDIN  kk    r      03/13/2014 23:08:02  [email protected]  2
>> > 124692  0.51929  STDIN  kk    r      03/13/2014 23:08:02  [email protected]  2
>> > 124694  0.50500  STDIN  kk    r      03/13/2014 23:08:27  [email protected]  1
>> > 124695  0.50500  STDIN  kk    r      03/13/2014 23:08:27  [email protected]  1
>> > 124696  0.50500  STDIN  kk    r      03/13/2014 23:08:27  [email protected]  1
>> > 124697  0.50500  STDIN  kk    r      03/13/2014 23:08:27  [email protected]  1
>> >
>> > [root@cluster ~]# qconf -ssconf
>> > algorithm                     default
>> > schedule_interval             0:0:05
>> > maxujobs                      0
>> > queue_sort_method             load
>> > job_load_adjustments          np_load_avg=3.0
>> > load_adjustment_decay_time    0:7:30
>> > load_formula                  np_load_avg
>> > schedd_job_info               true
>> > flush_submit_sec              0
>> > flush_finish_sec              0
>> > params                        none
>> > reprioritize_interval         0:0:0
>> > halftime                      168
>> > usage_weight_list             cpu=1.000000,mem=0.000000,io=0.000000
>> >
>> > ----
>> >
>> > [root@cluster ~]# qconf -sp threaded
>> > pe_name             threaded
>> > slots               9999
>> > user_lists          NONE
>> > xuser_lists         NONE
>> > start_proc_args     /bin/true
>> > stop_proc_args      /bin/true
>> > allocation_rule     $pe_slots
>> > control_slaves      FALSE
>> > job_is_first_task   TRUE
>> > urgency_slots       min
>> > accounting_summary  FALSE
>> >
>> > All nodes have an identical complex configuration:
>> >
>> > [root@cluster ~]# qconf -se compute-4-3
>> > hostname            compute-4-3.local
>> > load_scaling        NONE
>> > complex_values      slots=30,h_vmem=120G
>> > load_values         arch=linux-x64,num_proc=30,mem_total=123136.023438M, \
>> > -------------------truncated-----------------------
>> > processors          30
>> > user_lists          NONE
>> > xuser_lists         NONE
>> > projects            NONE
>> > xprojects           NONE
>> > usage_scaling       NONE
>> > report_variables    NONE
>> >
>> >
>> > _______________________________________________
>> > users mailing list
>> > [email protected]
>> > https://gridengine.org/mailman/listinfo/users
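P.S. Two things I still plan to rule out on my side before calling it a bug. The idea that a slow load report could explain the pile-up is only my guess, not something I've confirmed in the docs:

# how often execd reports load back to qmaster; if this is much longer than
# our 5-second schedule_interval, a burst of PE jobs could all be placed
# before np_load_avg on the first host ever moves
qconf -sconf | grep load_report_time

# the np_load_avg the scheduler actually sees per queue instance, captured
# right after submitting a batch of PE jobs
qstat -F np_load_avg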
