Hi,

On 28.05.2014 at 17:08, Opera Wang wrote:

> Our pool has more than 700 hosts and now have around 86k jobs pending, The
> scheduler is very slow,
> I'd like to know if changing maxujobs

This would only limit the overall number of jobs per user. What value would you put there in your case so that the cluster is still fully loaded?

> or report_pjob_tickets

Just try.

> will help.
> 05/28/2014 07:07:56|schedu|host|P|PROF: job dispatching took 390.610 s (86544 fast, 0 fast_soft, 2600 pe, 0 pe_soft, 0 res)
> 05/28/2014 07:07:56|schedu|host|P|PROF: dispatched 2571 job(s)
> 05/28/2014 07:07:56|schedu|host|P|PROF: parallel matching 2801 308526 19607 918254 157151 918254 153706
> 05/28/2014 07:07:56|schedu|host|P|PROF: sequential matching 30417 4606158 103535 4525629 4525629 4243369 2228
> 05/28/2014 07:07:56|schedu|host|P|PROF: create pending job orders: 0.670 s
> 05/28/2014 07:07:56|schedu|host|P|PROF: scheduled in 394.370 (u 657.940 + s 113.700 = 771.640): 2228 sequential, 343 parallel, 96792 orders, 731 H, 340 Q, 891 QA, 86573 J(qw), 8674 J(r), 0 J(s), 0 J(h), 0 J(e), 1536 J(x), 96786 J(all), 126 C, 61 ACL, 10 PE, 169 U, 1 D, 51 PRJ, 0 ST, 0 CKPT, 0 RU, 1 gMes, 0 jMes, 96792/53 pre-send, 0/0/0 pe-alg
>
> Thanks.
>
> % qconf -ssconf
> algorithm                         default
> schedule_interval                 0:0:30
> maxujobs                          0
> queue_sort_method                 seqno
> job_load_adjustments              NONE
> load_adjustment_decay_time        0:0:30
> load_formula                      np_load_short
> schedd_job_info                   false
> flush_submit_sec                  1
> flush_finish_sec                  1

This will start a scheduler run one second after each job submission or job end. I would suggest setting both to zero, as you already have a scheduler run every 30 seconds and can do without this additional invocation.

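To put the quoted numbers side by side, here is a minimal Python sketch that pulls the run time out of the "scheduled in" PROF line and compares it with schedule_interval. The regular expression and the helper names (`parse_interval`, `scheduled_seconds`) are my own assumptions for illustration; they are not part of Grid Engine.

```python
import re

# Excerpt of the PROF summary line quoted above, joined onto one line.
PROF_LINE = (
    "05/28/2014 07:07:56|schedu|host|P|PROF: scheduled in 394.370 "
    "(u 657.940 + s 113.700 = 771.640): 2228 sequential, 343 parallel, "
    "96792 orders"
)


def parse_interval(hms):
    """Convert an h:m:s string such as '0:0:30' into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def scheduled_seconds(line):
    """Return the duration from a 'scheduled in N' PROF line, or None."""
    match = re.search(r"PROF: scheduled in (\d+(?:\.\d+)?)", line)
    return float(match.group(1)) if match else None


interval = parse_interval("0:0:30")      # schedule_interval from qconf -ssconf
run_time = scheduled_seconds(PROF_LINE)  # duration of this scheduler run
ratio = run_time / interval

print(f"one scheduler run took {run_time:.1f} s, "
      f"{ratio:.1f}x the schedule_interval")
```

For the quoted run this gives a ratio of about 13, which suggests the scheduler is already running more or less back to back, so extra runs triggered by flush_submit_sec and flush_finish_sec are unlikely to gain anything.
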
-- Reuti

> params                            PE_RANGE_ALG=bin,PROFILE=1
> reprioritize_interval             0:0:0
> halftime                          1
> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
> compensation_factor               5.000000
> weight_user                       0.500000
> weight_project                    0.500000
> weight_department                 0.000000
> weight_job                        0.050000
> weight_tickets_functional         1000000
> weight_tickets_share              0
> share_override_tickets            FALSE
> share_functional_shares           TRUE
> max_functional_jobs_to_schedule   800
> report_pjob_tickets               TRUE
> max_pending_tasks_per_job         50
> halflife_decay_list               none
> policy_hierarchy                  OFS
> weight_ticket                     1.000000
> weight_waiting_time               0.000000
> weight_deadline                   0.000000
> weight_urgency                    0.000000
> weight_priority                   0.000000
> fair_urgency_list                 NONE
> max_reservation                   0
> default_duration                  2:00:0
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
