On 04.10.2013 at 22:43, Alex Chekholko wrote:

> Hi Txema,
> 
> Are the jobs submitted with "-R y"?
> 
> Maybe also try
> 
> load_formula                      slots
> 
> 
> More philosophically, how do you expect backfilling to work if there are no 
> hard runtime limits on jobs?
> 
> Regards,
> Alex
> 
> On 10/04/2013 07:25 AM, Txema Heredia wrote:
>> Hi all,
>> 
>> I have a 27-node cluster. Currently all 320 slots are filled, every
>> one of them by a job requesting 1 slot.
>> 
>> At the top of my waiting queue there are 28 different jobs requesting 3
>> to 12 cores using two different parallel environments. All these jobs
>> are requesting -R y. They are being ignored and overrun by the myriad of
>> 1-slot jobs behind them in the waiting queue.
>> 
>> I have enabled the scheduler logging. During the last 4 hours, it has
>> logged 724 new jobs starting, in all the 27 nodes. Not a single job on
>> the system is requesting -l h_rt, but single-core jobs keep being
>> scheduled and all the parallel jobs are starving.
>> 
>> As far as I understand, backfilling is killing my reservations, even
>> though no job requests any runtime limit. But if I set
>> "default_duration" to INFINITY, all the RESERVING log messages disappear.

With this setting SGE assumes by default that every job runs forever, so
nothing can be reserved. This holds even if some jobs request h_rt: SGE judges
INFINITY to be smaller than INFINITY, so running jobs without an h_rt request
may allow other jobs to slip in, whether those jobs request h_rt or not.
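
The effect can be illustrated with a toy model in Python. This is only a
sketch of the reasoning, not SGE's scheduler code: the function names, the
one-slot-per-running-job simplification, and the "fits before the reservation
starts" test are all assumptions made for the illustration.

```python
import math

def reservation_start(now, running_remaining, slots_needed):
    """Earliest time at which `slots_needed` slots become free.

    Toy assumption: every running job holds exactly one slot and
    `running_remaining` lists the assumed remaining runtime of each job.
    """
    ends = sorted(now + r for r in running_remaining)
    if slots_needed > len(ends):
        return math.inf
    return ends[slots_needed - 1]

def may_backfill(now, assumed_duration, start):
    """A job may slip in if it is assumed to finish by the reservation start."""
    return now + assumed_duration <= start

# Finite assumed runtimes: the 3-slot reservation starts at t=3, so a job
# assumed to run for 24 hours is not allowed to jump ahead.
start = reservation_start(0, [1, 2, 3], slots_needed=3)
print(start, may_backfill(0, 24, start))          # 3 False

# default_duration INFINITY and no h_rt anywhere: the reservation starts
# "at infinity", and an infinitely long job still passes the check.
start = reservation_start(0, [math.inf] * 3, slots_needed=3)
print(start, may_backfill(0, math.inf, start))    # inf True
```

In this model a reservation only constrains backfilling once the running jobs
have a finite assumed end, either from -l h_rt or from a finite
default_duration.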

-- Reuti


>> Additionally, for some odd reason, I only receive RESERVING messages from
>> the jobs requesting a given number of slots (-pe whatever N). The jobs
>> requesting a slot-range (-pe threaded 4-10) seem to reserve nothing.
>> 
>> My scheduler configuration is as follows:
>> 
>> # qconf -ssconf
>> algorithm                         default
>> schedule_interval                 0:0:5
>> maxujobs                          0
>> queue_sort_method                 load
>> job_load_adjustments              np_load_avg=0.50
>> load_adjustment_decay_time        0:7:30
>> load_formula                      np_load_avg
>> schedd_job_info                   true
>> flush_submit_sec                  0
>> flush_finish_sec                  0
>> params                            MONITOR=1
>> reprioritize_interval             0:0:0
>> halftime                          168
>> usage_weight_list                 cpu=0.187000,mem=0.116000,io=0.697000
>> compensation_factor               5.000000
>> weight_user                       0.250000
>> weight_project                    0.250000
>> weight_department                 0.250000
>> weight_job                        0.250000
>> weight_tickets_functional         1000000000
>> weight_tickets_share              1000000000
>> share_override_tickets            TRUE
>> share_functional_shares           TRUE
>> max_functional_jobs_to_schedule   200
>> report_pjob_tickets               TRUE
>> max_pending_tasks_per_job         50
>> halflife_decay_list               none
>> policy_hierarchy                  OSF
>> weight_ticket                     0.010000
>> weight_waiting_time               0.000000
>> weight_deadline                   3600000.000000
>> weight_urgency                    0.100000
>> weight_priority                   1.000000
>> max_reservation                   50
>> default_duration                  24:00:00
>> 
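
The reservation-related settings in a dump like the one above can be checked
mechanically. The following Python sketch is hypothetical helper code, not part
of Grid Engine: the function names are invented, and the rules rest on the
assumptions that max_reservation 0 means no reservations are made and that
MONITOR=1 is what writes the RESERVING lines to the schedule file.

```python
def parse_sched_conf(text):
    """Parse `qconf -ssconf` style output into a dict of key -> value."""
    conf = {}
    for line in text.splitlines():
        line = line.lstrip("> ").strip()   # tolerate mail quoting prefixes
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf

def reservation_warnings(conf):
    """Return human-readable warnings about reservation-related settings."""
    warnings = []
    max_res = conf.get("max_reservation")
    if max_res is not None and int(max_res) == 0:
        warnings.append("max_reservation is 0: no reservations are made")
    duration = conf.get("default_duration")
    if duration is not None and duration.upper() == "INFINITY":
        warnings.append("default_duration is INFINITY: jobs without h_rt "
                        "are assumed to run forever")
    if "MONITOR=1" not in conf.get("params", ""):
        warnings.append("params lacks MONITOR=1: reservations are not logged")
    return warnings

sample = """
>> params                            MONITOR=1
>> max_reservation                   50
>> default_duration                  INFINITY
"""
for warning in reservation_warnings(parse_sched_conf(sample)):
    print(warning)
```

Run against the configuration listed above (default_duration 24:00:00,
max_reservation 50, MONITOR=1), these rules report nothing.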
>> 
>> I have also tested it with params PROFILE=1 and default_duration
>> INFINITY. But, when I set it, not a single reservation is logged in
>> /opt/gridengine/default/common/schedule and new jobs keep starting.
>> 
>> 
>> What am I missing? Is it possible to kill the backfilling? Are my
>> reservations really working?
>> 
>> Thanks in advance,
>> 
>> Txema
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
> 
> -- 
> Alex Chekholko [email protected]
