On 22.02.2011 at 15:51, Andreas Haupt wrote:

> Hi Reuti and Richard,
> 
> hmm, actually the scheduler configuration should be correct for such a
> setup. But maybe I just can't see the wood for the trees ...
> 
> [oreade38] ~ % qconf -ssconf
> algorithm                         default
> schedule_interval                 0:0:1

Doesn't this put a high load on the qmaster? Especially combined with low
values for ...

> maxujobs                          0
> queue_sort_method                 load
> job_load_adjustments              np_load_avg=1.0
> load_adjustment_decay_time        0:7:30
> load_formula                      np_load_avg
> schedd_job_info                   true
> flush_submit_sec                  1
> flush_finish_sec                  1

... the flush settings. I think the default schedule interval is 0:20, with
the flush settings set to 4.
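
If you want to try the relaxed intervals, the change could look roughly like
this (a sketch only: it assumes `qconf -Msconf` is available in your GE
version; otherwise the same edits can be made interactively via
`qconf -msconf`):

```shell
# Dump the current scheduler configuration, relax the scheduling
# interval and the flush settings, and load the file back.
qconf -ssconf > sched.conf
sed -i -e 's/^schedule_interval .*/schedule_interval 0:0:20/' \
       -e 's/^flush_submit_sec .*/flush_submit_sec 4/' \
       -e 's/^flush_finish_sec .*/flush_finish_sec 4/' sched.conf
qconf -Msconf sched.conf
```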


> params                            none
> reprioritize_interval             0:0:0
> halftime                          24
> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
> compensation_factor               5.000000
> weight_user                       1.000000
> weight_project                    0.000000
> weight_department                 0.000000
> weight_job                        1.000000
> weight_tickets_functional         1000
> weight_tickets_share              10000
> share_override_tickets            FALSE
> share_functional_shares           TRUE
> max_functional_jobs_to_schedule   1000
> report_pjob_tickets               TRUE
> max_pending_tasks_per_job         50
> halflife_decay_list               none
> policy_hierarchy                  FS
> weight_ticket                     0.500000
> weight_waiting_time               0.000000
> weight_deadline                   3600000.000000
> weight_urgency                    0.000000
> weight_priority                   1.000000
> max_reservation                   250
> default_duration                  9999:00:00

These settings look fine.


> Do you see a common mistake here? There are < 100 waiting jobs in the
> queue most of the time.

Given that there are always waiting jobs, flush_submit_sec could even be
higher, as there are most likely no free slots anyway. But this shouldn't
influence the odd behavior you observe. Does it work with smaller parallel
jobs that are waiting for their slots?

To investigate, you could also try to submit an advance reservation for some 
point in the future (unfortunately `qrsub` has no option to request a 
reservation without a given start time [and I don't mean "now" here], but it 
outputs the earliest time at which the reservation could be granted). Is such 
a reservation granted in your case?
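
As a concrete test, something along these lines (the PE name `mpi`, the slot
count, and the start time are placeholders for your own setup):

```shell
# Request an advance reservation of 64 slots in the parallel
# environment "mpi", starting tomorrow at 12:00 for one hour.
qrsub -a 02241200 -d 1:0:0 -pe mpi 64

# Check whether (and when) the reservation was granted.
qrstat
```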

-- Reuti


> Thanks,
> Andreas
> 
> On Tue, 2011-02-22 at 15:38 +0100, Richard Ems wrote:
>> On 02/22/2011 03:07 PM, Andreas Haupt wrote:
>>> Do you see a similar behaviour? Is it a misconfiguration? Anything I
>>> could do (apart from watching the queue regularly and schedule "by
>>> hand" ...)?
>> 
>> We use the same GE version and have a "similar" configuration, but we
>> don't start parallel jobs on that many slots.
>> 
>> Could it be that max_reservation is set too low?
>> 
>> Regards, Richard
>> 
>> 
>> 
> -- 
> | Andreas Haupt             | E-Mail: [email protected]
> |  DESY Zeuthen             | WWW:    http://www-zeuthen.desy.de/~ahaupt
> |  Platanenallee 6          | Phone:  +49/33762/7-7359
> |  D-15738 Zeuthen          | Fax:    +49/33762/7-7216
> 
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

