Hello,

We solved it by setting `h_rt` to FORCED in the complex list:

    #name   shortcut  type  relop  requestable  consumable  default  urgency
    #-----------------------------------------------------------------------
    h_rt    h_rt      TIME  <=     FORCED       YES         0:0:0    0

We also have a JSV that rejects jobs that don't request it (because they
would otherwise be pending indefinitely unless you have a default duration
or use qalter).
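
A minimal sketch of such a JSV in shell, assuming the stock `jsv_include.sh`
helper library that ships under `$SGE_ROOT/util/resources/jsv/`. The
rejection message is made up for illustration, and this is not our
production script:

```shell
#!/bin/sh
# Sketch: refuse jobs that carry no hard h_rt request.

jsv_on_start() {
    return
}

jsv_on_verify() {
    # l_hard holds the hard resource list (-l ...); h_rt is one entry in it.
    if [ "$(jsv_is_param l_hard)" = "true" ] &&
       [ "$(jsv_sub_is_param l_hard h_rt)" = "true" ]; then
        jsv_accept "h_rt requested"
    else
        # Hypothetical message text; adjust to your site.
        jsv_reject "Please request a runtime limit, e.g. -l h_rt=1:0:0"
    fi
}

# Only enter the JSV protocol loop when the Grid Engine helpers are present.
if [ -n "$SGE_ROOT" ] && [ -r "$SGE_ROOT/util/resources/jsv/jsv_include.sh" ]; then
    . "$SGE_ROOT/util/resources/jsv/jsv_include.sh"
    jsv_main
fi
```

A script like this would typically be registered as a server-side JSV via
`jsv_url` in the global configuration (`qconf -mconf`), so that it applies
to every submission.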

You could also use a JSV to enforce that only jobs with large resource
requests (in your case, more than a certain number of slots) are able to
request a reservation, for example:

    # pseudo JSV code
    
    SLOT_RESERVATION_THRESHOLD=...
    
    if slots < SLOT_RESERVATION_THRESHOLD then
        "disable reservation / reject"
    else
        "enable reservation"
    fi
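
The pseudocode above could be fleshed out roughly as follows, again
assuming the stock shell JSV helpers. The threshold of 4 and the
`reservation_flag` helper are hypothetical, and comparing against `pe_min`
(the lower bound of a slot range) is my assumption about how you would want
to treat range requests:

```shell
#!/bin/sh
# Sketch: allow reservation only for jobs at or above a slot threshold.

# Hypothetical value; pick one that fits your cluster.
SLOT_RESERVATION_THRESHOLD=4

# Prints "y" if a job with $1 slots may reserve, "n" otherwise.
# An empty argument (serial job, no -pe request) is treated as 1 slot.
reservation_flag() {
    slots=${1:-1}
    if [ "$slots" -lt "$SLOT_RESERVATION_THRESHOLD" ]; then
        echo n
    else
        echo y
    fi
}

jsv_on_start() {
    return
}

jsv_on_verify() {
    # pe_min is the lower bound of the requested slot range (-pe name N-M).
    flag=$(reservation_flag "$(jsv_get_param pe_min)")
    jsv_set_param R "$flag"
    jsv_correct "reservation set to $flag"
}

# Only enter the JSV protocol loop when the Grid Engine helpers are present.
if [ -n "$SGE_ROOT" ] && [ -r "$SGE_ROOT/util/resources/jsv/jsv_include.sh" ]; then
    . "$SGE_ROOT/util/resources/jsv/jsv_include.sh"
    jsv_main
fi
```

This variant silently corrects the job instead of rejecting it; swapping
`jsv_set_param`/`jsv_correct` for `jsv_reject` in the small-job case would
give the "reject" behaviour instead.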


On Fri, Oct 04, 2013 at 04:25:29PM +0200, Txema Heredia wrote:
> Hi all,
> 
> I have a 27-node cluster. Currently there are 320 out of 320 slots
> filled up. All by jobs requesting 1-slot.
> 
> At the top of my waiting queue there are 28 different jobs
> requesting 3 to 12 cores using two different parallel environments.
> All these jobs are requesting -R y. They are being ignored and
> overrun by the myriad of 1-slot requesting jobs behind them in the
> waiting queue.
> 
> I have enabled the scheduler logging. During the last 4 hours, it
> has logged 724 new jobs starting, in all the 27 nodes. Not a single
> job on the system is requesting -l h_rt, but single-core jobs keep
> being scheduled and all the parallel jobs are starving.
> 
> As far as I understand, the backfilling is killing my reservations,
> even if no one is requesting any kind of time, but if I set the
> "default_duration" to INFINITY, all the RESERVING log messages
> disappear.
> 
> Additionally, for some odd reason, I only receive RESERVING messages
> from the jobs requesting a given number of slots (-pe whatever N).
> The jobs requesting a slot-range (-pe threaded 4-10) seem to reserve
> nothing.
> 
> My scheduler configuration is as follows:
> 
> # qconf -ssconf
> algorithm                         default
> schedule_interval                 0:0:5
> maxujobs                          0
> queue_sort_method                 load
> job_load_adjustments              np_load_avg=0.50
> load_adjustment_decay_time        0:7:30
> load_formula                      np_load_avg
> schedd_job_info                   true
> flush_submit_sec                  0
> flush_finish_sec                  0
> params                            MONITOR=1
> reprioritize_interval             0:0:0
> halftime                          168
> usage_weight_list                 cpu=0.187000,mem=0.116000,io=0.697000
> compensation_factor               5.000000
> weight_user                       0.250000
> weight_project                    0.250000
> weight_department                 0.250000
> weight_job                        0.250000
> weight_tickets_functional         1000000000
> weight_tickets_share              1000000000
> share_override_tickets            TRUE
> share_functional_shares           TRUE
> max_functional_jobs_to_schedule   200
> report_pjob_tickets               TRUE
> max_pending_tasks_per_job         50
> halflife_decay_list               none
> policy_hierarchy                  OSF
> weight_ticket                     0.010000
> weight_waiting_time               0.000000
> weight_deadline                   3600000.000000
> weight_urgency                    0.100000
> weight_priority                   1.000000
> max_reservation                   50
> default_duration                  24:00:00
> 
> 
> I have also tested it with params PROFILE=1 and default_duration
> INFINITY. But, when I set it, not a single reservation is logged in
> /opt/gridengine/default/common/schedule and new jobs keep starting.
> 
> 
> What am I missing? Is it possible to kill the backfilling? Are my
> reservations really working?
> 
> Thanks in advance,
> 
> Txema
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

-- 

With kind regards

-------------------------------------------

Christian Krause

Wissenschaftliche und Kaufmännische Datenverarbeitung /  
Scientific and Commercial Data Processing (WKDV)

Wissenschaftliches Rechnen und Wissenschaftliches Datenmanagement /  
Scientific Computing and Scientific Data Management (WRWD)

-------------------------------------------

Helmholtz-Zentrum für Umweltforschung GmbH - UFZ /  
Helmholtz Centre for Environmental Research - UFZ  
Permoserstr. 15 / 04318 Leipzig / Germany  
phone +49 341 235 1001 / fax +49 341 235 1468  
<[email protected]> / <http://www.ufz.de>

Sitz der Gesellschaft: Leipzig  
Registergericht: Amtsgericht Leipzig, Handelsregister Nr. B 4703  
Vorsitzender des Aufsichtsrats: MinDirig Wilfried Kraus  
Wissenschaftlicher Geschäftsführer: Prof. Dr. Georg Teutsch  
Administrative Geschäftsführerin: Dr. Heike Graßmann

