Hi,

On 28.05.2014 at 17:08, Opera Wang wrote:

> Our pool has more than 700 hosts and now has around 86k jobs pending. The
> scheduler is very slow.
> I'd like to know if changing maxujobs

This would only limit the number of jobs each user can have running at the same
time. What value would you put there in your case and still keep the cluster fully loaded?
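To make the trade-off concrete, here is a minimal Python sketch of a simplified model (not Grid Engine's actual dispatcher) of how a per-user running-job cap in the spirit of maxujobs limits how many pending jobs can start in one pass. It assumes every job needs exactly one slot; the class and function names and the per-user numbers are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class UserState:
    """Hypothetical per-user snapshot: jobs already running and jobs waiting."""
    running: int
    pending: int


def dispatchable(users: dict, free_slots: int, maxujobs: int) -> int:
    """Count jobs that could start in one pass under a per-user running-job cap.

    Simplified model: every job needs one slot, and maxujobs == 0 means
    "no limit", mirroring the meaning of 0 in the scheduler configuration.
    """
    started = 0
    for state in users.values():
        if maxujobs == 0:
            allowed = state.pending
        else:
            # A user may only top up to the cap, counting jobs already running.
            allowed = min(state.pending, max(0, maxujobs - state.running))
        started += allowed
    # The cluster cannot start more jobs than it has free slots.
    return min(started, free_slots)


if __name__ == "__main__":
    # Made-up example: three users competing for 100 free slots.
    users = {
        "alice": UserState(running=40, pending=5000),
        "bob": UserState(running=10, pending=200),
        "carol": UserState(running=0, pending=30),
    }
    for cap in (0, 50, 20):
        print(cap, dispatchable(users, free_slots=100, maxujobs=cap))
```

In this toy data a cap of 50 fills only 80 of the 100 free slots and a cap of 20 only 30, which is why the value has to be chosen against the real per-user load rather than just set low.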

> or report_pjob_tickets

Just try.

> will help.
> 05/28/2014 07:07:56|schedu|host|P|PROF: job dispatching took 390.610 s (86544 fast, 0 fast_soft, 2600 pe, 0 pe_soft, 0 res)
> 05/28/2014 07:07:56|schedu|host|P|PROF: dispatched 2571 job(s)
> 05/28/2014 07:07:56|schedu|host|P|PROF: parallel matching        2801    308526     19607    918254    157151    918254    153706
> 05/28/2014 07:07:56|schedu|host|P|PROF: sequential matching     30417   4606158    103535   4525629   4525629   4243369      2228
> 05/28/2014 07:07:56|schedu|host|P|PROF: create pending job orders: 0.670 s
> 05/28/2014 07:07:56|schedu|host|P|PROF: scheduled in 394.370 (u 657.940 + s 113.700 = 771.640): 2228 sequential, 343 parallel, 96792 orders, 731 H, 340 Q, 891 QA, 86573 J(qw), 8674 J(r), 0 J(s), 0 J(h), 0 J(e), 1536 J(x), 96786 J(all), 126 C, 61 ACL, 10 PE, 169 U, 1 D, 51 PRJ, 0 ST, 0 CKPT, 0 RU, 1 gMes, 0 jMes, 96792/53 pre-send, 0/0/0 pe-alg
> 
> Thanks.
> 
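The key figure in this profile is the run time: a single pass spent about 390 s dispatching, while schedule_interval is 30 s. A small Python sketch that pulls those numbers out of PROF lines like the ones quoted above; the regular expressions are written against exactly these lines and are an assumption that may need adjusting for other Grid Engine versions or log layouts.

```python
import re

# Two PROF lines copied from the quoted scheduler profiling output (PROFILE=1).
LOG = """\
05/28/2014 07:07:56|schedu|host|P|PROF: job dispatching took 390.610 s (86544 fast, 0 fast_soft, 2600 pe, 0 pe_soft, 0 res)
05/28/2014 07:07:56|schedu|host|P|PROF: dispatched 2571 job(s)
"""

DISPATCH_TIME = re.compile(r"PROF: job dispatching took ([\d.]+) s")
DISPATCHED = re.compile(r"PROF: dispatched (\d+) job\(s\)")


def _first_group(pattern: re.Pattern, log: str) -> str:
    """Return the first capture group of the first match, or fail loudly."""
    match = pattern.search(log)
    if match is None:
        raise ValueError(f"no line matches {pattern.pattern!r}")
    return match.group(1)


def dispatch_stats(log: str) -> tuple:
    """Return (seconds spent dispatching, jobs dispatched) from PROF lines."""
    seconds = float(_first_group(DISPATCH_TIME, log))
    jobs = int(_first_group(DISPATCHED, log))
    return seconds, jobs


if __name__ == "__main__":
    seconds, jobs = dispatch_stats(LOG)
    schedule_interval = 30  # seconds, from "schedule_interval 0:0:30" below
    print(f"{jobs / seconds:.1f} jobs/s")
    print(f"one pass spans {seconds / schedule_interval:.1f} schedule intervals")
```

For the quoted run this works out to roughly 6.6 dispatched jobs per second, with one pass lasting about 13 schedule intervals.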
> % qconf -ssconf
> algorithm                         default
> schedule_interval                 0:0:30
> maxujobs                          0
> queue_sort_method                 seqno
> job_load_adjustments              NONE
> load_adjustment_decay_time        0:0:30
> load_formula                      np_load_short
> schedd_job_info                   false
> flush_submit_sec                  1
> flush_finish_sec                  1

This will start a scheduler run one second after each job submission or job end.
I would suggest setting both to zero, as you already have a scheduler run every
30 seconds and can do without these additional invocations.
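One way to apply this without the interactive editor is to dump the configuration with `qconf -ssconf`, change the two lines, and load the file back with `qconf -Msconf <file>` (assuming your qconf supports the file-based variant; otherwise use `qconf -msconf`). A minimal Python sketch of the text-editing step; the function name is hypothetical, and it assumes every setting sits on one `name value` line.

```python
def set_sconf_params(sconf_text: str, updates: dict) -> str:
    """Return a copy of `qconf -ssconf` output with selected parameters changed.

    Assumes each configuration line looks like "name<whitespace>value".
    Parameters not present in the text raise KeyError instead of being added.
    """
    out_lines = []
    seen = set()
    for line in sconf_text.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0] in updates:
            name = parts[0]
            # Pad the name to a fixed column, like the quoted qconf output.
            out_lines.append(f"{name:<34}{updates[name]}")
            seen.add(name)
        else:
            out_lines.append(line)
    missing = set(updates) - seen
    if missing:
        raise KeyError(f"parameters not found: {sorted(missing)}")
    return "\n".join(out_lines) + "\n"


if __name__ == "__main__":
    current = """\
schedule_interval                 0:0:30
flush_submit_sec                  1
flush_finish_sec                  1
"""
    new = set_sconf_params(current, {"flush_submit_sec": "0", "flush_finish_sec": "0"})
    print(new, end="")
```

Raising on unknown names is deliberate: a typo in a parameter name should fail here rather than silently produce an unchanged configuration.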

-- Reuti


> params                            PE_RANGE_ALG=bin,PROFILE=1
> reprioritize_interval             0:0:0
> halftime                          1
> usage_weight_list cpu=1.000000,mem=0.000000,io=0.000000
> compensation_factor               5.000000
> weight_user                       0.500000
> weight_project                    0.500000
> weight_department                 0.000000
> weight_job                        0.050000
> weight_tickets_functional         1000000
> weight_tickets_share              0
> share_override_tickets            FALSE
> share_functional_shares           TRUE
> max_functional_jobs_to_schedule   800
> report_pjob_tickets               TRUE
> max_pending_tasks_per_job         50
> halflife_decay_list               none
> policy_hierarchy                  OFS
> weight_ticket                     1.000000
> weight_waiting_time               0.000000
> weight_deadline                   0.000000
> weight_urgency                    0.000000
> weight_priority                   0.000000
> fair_urgency_list                 NONE
> max_reservation                   0
> default_duration                  2:00:0
> 
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users


