> Am 15.04.2016 um 20:47 schrieb Christopher Black <cbl...@nygenome.org>:
> 
> If your jobs or queues have an h_rt specified, you can look into advanced 
> reservation and submitting large memory jobs with -R y.

To avoid confusion: this is in fact called resource reservation. Advance 
reservation is done by `qrsub` to get resources granted for known time in 
advance.

-- Reuti


> You will likely want to look into tweaking max*reservation* and 
> default_duration parameters via qconf -mconf/msconf. Utilizing advanced 
> reservation puts more load on the qmaster/scheduler but allows it to prevent 
> smaller jobs from flooding out large jobs when only small portions of nodes 
> become available.
> 
> Other options are using qhold or disabling the all.q queue instance on many 
> nodes when there is a backlog of high.q jobs.
> 
> Also, if you haven't already you may want to look into making mem a 
> consumable resource (based on your qconf -se output you may have already done 
> this).
> 
> Best,
> Chris
> 
> 
> 
> 
> On 4/14/16, 8:10 PM, "users-boun...@gridengine.org on behalf of Happy Monk" 
> <users-boun...@gridengine.org on behalf of gascan...@gmail.com> wrote:
> 
>> Hi,
>> 
>> 
>> We are using Open grid scheduler/Grid Engine version 2011.11p1
>> 
>> Currently have two queues, with identical settings except priority.
>> 
>> 
>> all.q  -- default queue
>> 
>> high.q    ---- higher priority
>> 
>> 
>> The scheduler is set to least nodes used policy. All our nodes have 
>> identical resources, 30 cores, 120GB RAM. Scheduler is working as expected 
>> when submitting jobs with low resource requests as per queue priorities. But 
>> when a high mem job (50+GB) is submitted
>> in high.q, it gets stuck in queue waiting forever, as low mem jobs from 
>> default.q are executed when ever a resource is available and scheduler is 
>> not able to fulfill high mem job requirements even though it is of higher 
>> priority. How can I make all jobs in
>> default.q to wait until higher priority jobs finish ?
>> 
>> 
>> 
>> Thanks,
>> 
>> 
>> Here are the details of our GE config,
>> 
>> 
>> root@master1: gridengine#qconf -ssconf
>> algorithm                         default
>> schedule_interval                 0:0:05
>> maxujobs                          0
>> queue_sort_method                 load
>> job_load_adjustments              np_load_avg=1.75
>> load_adjustment_decay_time        0:7:30
>> load_formula                      np_load_avg
>> schedd_job_info                   true
>> flush_submit_sec                  0
>> flush_finish_sec                  0
>> params                            none
>> reprioritize_interval             0:0:0
>> halftime                          168
>> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
>> compensation_factor               5.000000
>> weight_user                       0.250000
>> weight_project                    0.250000
>> weight_department                 0.250000
>> weight_job                        0.250000
>> weight_tickets_functional         0
>> weight_tickets_share              0
>> share_override_tickets            TRUE
>> share_functional_shares           TRUE
>> max_functional_jobs_to_schedule   200
>> report_pjob_tickets               TRUE
>> max_pending_tasks_per_job         50
>> halflife_decay_list               none
>> policy_hierarchy                  OFS
>> weight_ticket                     0.010000
>> weight_waiting_time               0.000000
>> weight_deadline                   3600000.000000
>> weight_urgency                    0.100000
>> weight_priority                   1.000000
>> max_reservation                   64
>> default_duration                  360:00:00
>> 
>> root@master1: gridengine#qconf -sq high.q
>> qname                 high.q
>> hostlist              @allhosts
>> seq_no                0
>> load_thresholds       np_load_avg=3.0
>> suspend_thresholds    NONE
>> nsuspend              1
>> suspend_interval      00:05:00
>> priority              -10
>> min_cpu_interval      00:05:00
>> processors            UNDEFINED
>> qtype                 BATCH INTERACTIVE
>> ckpt_list             NONE
>> pe_list               make mpich mpi orte smp threaded
>> rerun                 FALSE
>> slots                 1,[]
>> tmpdir                /tmp
>> shell                 /bin/bash
>> prolog                NONE
>> epilog                NONE
>> shell_start_mode      posix_compliant
>> starter_method        NONE
>> suspend_method        NONE
>> resume_method         NONE
>> terminate_method      NONE
>> notify                00:00:60
>> owner_list            NONE
>> user_lists            NONE
>> xuser_lists           NONE
>> subordinate_list      NONE
>> complex_values        NONE
>> projects              NONE
>> xprojects             NONE
>> calendar              NONE
>> initial_state         default
>> s_rt                  INFINITY
>> h_rt                  INFINITY
>> s_cpu                 INFINITY
>> h_cpu                 INFINITY
>> s_fsize               INFINITY
>> h_fsize               INFINITY
>> s_data                INFINITY
>> h_data                INFINITY
>> s_stack               20971520
>> h_stack               104857600
>> s_core                INFINITY
>> h_core                0
>> s_rss                 INFINITY
>> h_rss                 INFINITY
>> s_vmem                INFINITY
>> h_vmem                INFINITY
>> 
>> root@master1: gridengine#qconf -sq all.q
>> qname                 all.q
>> hostlist              @allhosts
>> seq_no                0
>> load_thresholds       np_load_avg=3.0
>> suspend_thresholds    NONE
>> nsuspend              1
>> suspend_interval      00:05:00
>> priority              0
>> min_cpu_interval      00:05:00
>> processors            UNDEFINED
>> qtype                 BATCH INTERACTIVE
>> ckpt_list             NONE
>> pe_list               make mpich mpi orte smp threaded
>> rerun                 FALSE
>> slots                 1,[]
>> tmpdir                /tmp
>> shell                 /bin/bash
>> prolog                NONE
>> epilog                NONE
>> shell_start_mode      posix_compliant
>> starter_method        NONE
>> suspend_method        NONE
>> resume_method         NONE
>> terminate_method      NONE
>> notify                00:00:60
>> owner_list            NONE
>> user_lists            NONE
>> xuser_lists           NONE
>> subordinate_list      NONE
>> complex_values        NONE
>> projects              NONE
>> xprojects             NONE
>> calendar              NONE
>> initial_state         default
>> s_rt                  INFINITY
>> h_rt                  INFINITY
>> s_cpu                 INFINITY
>> h_cpu                 INFINITY
>> s_fsize               INFINITY
>> h_fsize               INFINITY
>> s_data                INFINITY
>> h_data                INFINITY
>> s_stack               20971520
>> h_stack               104857600
>> s_core                INFINITY
>> h_core                0
>> s_rss                 INFINITY
>> h_rss                 INFINITY
>> s_vmem                INFINITY
>> h_vmem                INFINITY
>> 
>> root@master1: gridengine#qconf -se compute-2-1
>> hostname              compute-2-1.local
>> load_scaling          NONE
>> complex_values        slots=30,h_vmem=120G,io_slots=30
>> load_values           arch=linux-x64,num_proc=32,mem_total=129169.750000M, \
>>                     swap_total=31983.871094M,virtual_total=161153.621094M, \
>>                     load_avg=21.680000,load_short=21.950000, \
>>                     load_medium=21.680000,load_long=21.480000, \
>>                     mem_free=102849.832031M,swap_free=31983.871094M, \
>>                     virtual_free=134833.703125M,mem_used=26319.917969M, \
>>                     swap_used=0.000000M,virtual_used=26319.917969M, \
>>                     cpu=65.300000, \
>>                     
>> m_topology=SCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTT, \
>>                     
>> m_topology_inuse=SCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTT, \
>>                     m_socket=2,m_core=16,np_load_avg=0.677500, \
>>                     np_load_short=0.685937,np_load_medium=0.677500, \
>>                     np_load_long=0.671250
>> processors            32
>> user_lists            NONE
>> xuser_lists           NONE
>> projects              NONE
>> xprojects             NONE
>> usage_scaling         NONE
>> report_variables      NONE
>> 
>> root@squid: master1#qconf -sp threaded
>> pe_name            threaded
>> slots              9999
>> user_lists         NONE
>> xuser_lists        NONE
>> start_proc_args    /bin/true
>> stop_proc_args     /bin/true
>> allocation_rule    $pe_slots
>> control_slaves     FALSE
>> job_is_first_task  TRUE
>> urgency_slots      min
>> accounting_summary FALSE
>> 
>> 
>> 
>> 
>> 
>> 
>> 
> This electronic message is intended for the use of the named recipient only, 
> and may contain information that is confidential, privileged or protected 
> from disclosure under applicable law. If you are not the intended recipient, 
> or an employee or agent responsible for delivering this message to the 
> intended recipient, you are hereby notified that any reading, disclosure, 
> dissemination, distribution, copying or use of the contents of this message 
> including any of its attachments is strictly prohibited. If you have received 
> this message in error or are not the named recipient, please notify us 
> immediately by contacting the sender at the electronic mail address noted 
> above, and destroy all copies of this message. Please note, the recipient 
> should check this email and any attachments for the presence of viruses. The 
> organization accepts no liability for any damage caused by any virus 
> transmitted by this email.
> 
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users


_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users

Reply via email to