Hi,

> Am 15.04.2016 um 23:12 schrieb Happy Monk <gascan...@gmail.com>:
> 
> Is it possible to make advanced reservation the default for all jobs in a queue?

Well, the global "sge_request" file could do it, but then it would apply to all 
jobs. If there are dedicated users of the high priority queue, they could instead 
have a personal ".sge_request" in their home directory, or even in specific 
subdirectories.
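
For example, users who should always get a reservation and land in the high 
priority queue could carry defaults like these (a minimal sketch; the runtime 
limit and queue name are placeholders, and any option valid on a qsub command 
line may appear):

```
# ~/.sge_request -- defaults applied to every qsub by this user
-R y
-q high.q
-l h_rt=24:00:00
```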

Another way would be to implement a JSV (Job Submission Verifier) and attach 
"-R y" only to certain jobs, depending on some criteria.
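
A JSV in Bourne shell could look roughly like this (a sketch, not tested here; 
it relies on the helper functions shipped in 
$SGE_ROOT/util/resources/jsv/jsv_include.sh, and assumes the criterion is 
"the job requests high.q"):

```
#!/bin/sh
# Force "-R y" for jobs submitted to high.q; accept everything else as-is.

jsv_on_start()
{
   return
}

jsv_on_verify()
{
   if [ "`jsv_get_param q_hard`" = "high.q" ]; then
      jsv_set_param R y
      jsv_correct "-R y attached to high.q job"
   else
      jsv_accept "job accepted unchanged"
   fi
   return
}

. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh
jsv_main
```

It can be attached server-side via "jsv_url" in the global configuration 
(qconf -mconf), or client-side with qsub's -jsv option.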

-- Reuti


> Any disadvantages to it? 
> Yes, mem is already a consumable. 
> 
> Thanks,  
> 
> On Fri, Apr 15, 2016 at 11:47 AM, Christopher Black <cbl...@nygenome.org> 
> wrote:
> If your jobs or queues have an h_rt specified, you can look into advanced 
> reservation and submitting large memory jobs with -R y. You will likely want 
> to tweak the max_reservation and default_duration parameters via 
> qconf -mconf/-msconf. Using reservation puts more load on the 
> qmaster/scheduler, but allows it to prevent smaller jobs from flooding out 
> large jobs when only small portions of nodes become available.
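> 
> For example (a sketch; the memory value, runtime limit, script name and 
> scheduler settings are placeholders to adapt to your site):
> 
> ```
> # Submit a large-memory job with reservation and a runtime limit:
> qsub -R y -l h_vmem=50G,h_rt=24:00:00 big_mem_job.sh
> 
> # In the scheduler configuration (qconf -msconf), something like:
> #   max_reservation   32
> #   default_duration  1:00:00
> ```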
> 
> Other options are using qhold or disabling the all.q queue instance on many 
> nodes when there is a backlog of high.q jobs.
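> 
> For instance (host names are placeholders):
> 
> ```
> # Disable selected all.q instances while high.q has a backlog:
> qmod -d all.q@compute-2-1 all.q@compute-2-2
> # Re-enable them once the backlog has drained:
> qmod -e all.q@compute-2-1 all.q@compute-2-2
> 
> # Or put pending low-priority jobs on hold, and release them later:
> qhold <job_id>
> qrls <job_id>
> ```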
> 
> Also, if you haven't already you may want to look into making mem a 
> consumable resource (based on your qconf -se output you may have already done 
> this).
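> 
> A sketch of what that looks like (the default and capacity values below are 
> examples; your qconf -se output suggests h_vmem is already set as a 
> complex_values entry on the hosts):
> 
> ```
> # qconf -mc -- mark h_vmem as requestable and consumable:
> #name    shortcut  type    relop  requestable  consumable  default  urgency
> h_vmem   h_vmem    MEMORY  <=     YES          YES         2G       0
> 
> # qconf -me compute-2-1 -- set the per-host capacity:
> complex_values        h_vmem=120G
> ```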
> 
> Best,
> Chris
> 
> 
> 
> 
> On 4/14/16, 8:10 PM, "users-boun...@gridengine.org on behalf of Happy Monk" 
> <users-boun...@gridengine.org on behalf of gascan...@gmail.com> wrote:
> 
> >Hi,
> >
> >
> >We are using Open grid scheduler/Grid Engine version 2011.11p1
> >
> >Currently have two queues, with identical settings except priority.
> >
> >
> >all.q  -- default queue
> >
> >high.q    ---- higher priority
> >
> >
> >The scheduler is set to a least-nodes-used policy. All our nodes have 
> >identical resources: 30 cores, 120GB RAM. The scheduler works as expected, 
> >per the queue priorities, when jobs with low resource requests are submitted. 
> >But when a high-memory job (50+GB) is submitted to high.q, it gets stuck 
> >waiting in the queue forever: low-memory jobs from all.q are started whenever 
> >a resource becomes available, so the scheduler can never satisfy the 
> >high-memory job's requirements, even though it has higher priority. How can I 
> >make all jobs in all.q wait until the higher priority jobs finish?
> >
> >
> >
> >Thanks,
> >
> >
> >Here are the details of our GE config,
> >
> >
> >root@master1: gridengine#qconf -ssconf
> >algorithm                         default
> >schedule_interval                 0:0:05
> >maxujobs                          0
> >queue_sort_method                 load
> >job_load_adjustments              np_load_avg=1.75
> >load_adjustment_decay_time        0:7:30
> >load_formula                      np_load_avg
> >schedd_job_info                   true
> >flush_submit_sec                  0
> >flush_finish_sec                  0
> >params                            none
> >reprioritize_interval             0:0:0
> >halftime                          168
> >usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
> >compensation_factor               5.000000
> >weight_user                       0.250000
> >weight_project                    0.250000
> >weight_department                 0.250000
> >weight_job                        0.250000
> >weight_tickets_functional         0
> >weight_tickets_share              0
> >share_override_tickets            TRUE
> >share_functional_shares           TRUE
> >max_functional_jobs_to_schedule   200
> >report_pjob_tickets               TRUE
> >max_pending_tasks_per_job         50
> >halflife_decay_list               none
> >policy_hierarchy                  OFS
> >weight_ticket                     0.010000
> >weight_waiting_time               0.000000
> >weight_deadline                   3600000.000000
> >weight_urgency                    0.100000
> >weight_priority                   1.000000
> >max_reservation                   64
> >default_duration                  360:00:00
> >
> >root@master1: gridengine#qconf -sq high.q
> >qname                 high.q
> >hostlist              @allhosts
> >seq_no                0
> >load_thresholds       np_load_avg=3.0
> >suspend_thresholds    NONE
> >nsuspend              1
> >suspend_interval      00:05:00
> >priority              -10
> >min_cpu_interval      00:05:00
> >processors            UNDEFINED
> >qtype                 BATCH INTERACTIVE
> >ckpt_list             NONE
> >pe_list               make mpich mpi orte smp threaded
> >rerun                 FALSE
> >slots                 1,[]
> >tmpdir                /tmp
> >shell                 /bin/bash
> >prolog                NONE
> >epilog                NONE
> >shell_start_mode      posix_compliant
> >starter_method        NONE
> >suspend_method        NONE
> >resume_method         NONE
> >terminate_method      NONE
> >notify                00:00:60
> >owner_list            NONE
> >user_lists            NONE
> >xuser_lists           NONE
> >subordinate_list      NONE
> >complex_values        NONE
> >projects              NONE
> >xprojects             NONE
> >calendar              NONE
> >initial_state         default
> >s_rt                  INFINITY
> >h_rt                  INFINITY
> >s_cpu                 INFINITY
> >h_cpu                 INFINITY
> >s_fsize               INFINITY
> >h_fsize               INFINITY
> >s_data                INFINITY
> >h_data                INFINITY
> >s_stack               20971520
> >h_stack               104857600
> >s_core                INFINITY
> >h_core                0
> >s_rss                 INFINITY
> >h_rss                 INFINITY
> >s_vmem                INFINITY
> >h_vmem                INFINITY
> >
> >root@master1: gridengine#qconf -sq all.q
> >qname                 all.q
> >hostlist              @allhosts
> >seq_no                0
> >load_thresholds       np_load_avg=3.0
> >suspend_thresholds    NONE
> >nsuspend              1
> >suspend_interval      00:05:00
> >priority              0
> >min_cpu_interval      00:05:00
> >processors            UNDEFINED
> >qtype                 BATCH INTERACTIVE
> >ckpt_list             NONE
> >pe_list               make mpich mpi orte smp threaded
> >rerun                 FALSE
> >slots                 1,[]
> >tmpdir                /tmp
> >shell                 /bin/bash
> >prolog                NONE
> >epilog                NONE
> >shell_start_mode      posix_compliant
> >starter_method        NONE
> >suspend_method        NONE
> >resume_method         NONE
> >terminate_method      NONE
> >notify                00:00:60
> >owner_list            NONE
> >user_lists            NONE
> >xuser_lists           NONE
> >subordinate_list      NONE
> >complex_values        NONE
> >projects              NONE
> >xprojects             NONE
> >calendar              NONE
> >initial_state         default
> >s_rt                  INFINITY
> >h_rt                  INFINITY
> >s_cpu                 INFINITY
> >h_cpu                 INFINITY
> >s_fsize               INFINITY
> >h_fsize               INFINITY
> >s_data                INFINITY
> >h_data                INFINITY
> >s_stack               20971520
> >h_stack               104857600
> >s_core                INFINITY
> >h_core                0
> >s_rss                 INFINITY
> >h_rss                 INFINITY
> >s_vmem                INFINITY
> >h_vmem                INFINITY
> >
> >root@master1: gridengine#qconf -se compute-2-1
> >hostname              compute-2-1.local
> >load_scaling          NONE
> >complex_values        slots=30,h_vmem=120G,io_slots=30
> >load_values           arch=linux-x64,num_proc=32,mem_total=129169.750000M, \
> >                      swap_total=31983.871094M,virtual_total=161153.621094M, \
> >                      load_avg=21.680000,load_short=21.950000, \
> >                      load_medium=21.680000,load_long=21.480000, \
> >                      mem_free=102849.832031M,swap_free=31983.871094M, \
> >                      virtual_free=134833.703125M,mem_used=26319.917969M, \
> >                      swap_used=0.000000M,virtual_used=26319.917969M, \
> >                      cpu=65.300000, \
> >                      m_topology=SCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTT, \
> >                      m_topology_inuse=SCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTT, \
> >                      m_socket=2,m_core=16,np_load_avg=0.677500, \
> >                      np_load_short=0.685937,np_load_medium=0.677500, \
> >                      np_load_long=0.671250
> >processors            32
> >user_lists            NONE
> >xuser_lists           NONE
> >projects              NONE
> >xprojects             NONE
> >usage_scaling         NONE
> >report_variables      NONE
> >
> >root@squid: master1#qconf -sp threaded
> >pe_name            threaded
> >slots              9999
> >user_lists         NONE
> >xuser_lists        NONE
> >start_proc_args    /bin/true
> >stop_proc_args     /bin/true
> >allocation_rule    $pe_slots
> >control_slaves     FALSE
> >job_is_first_task  TRUE
> >urgency_slots      min
> >accounting_summary FALSE
> >
> >
> >
> >
> >
> >
> >
> 
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users

