I know that in a lot of scheduling environments, queues such as
short, long, etc. are used to differentiate classes of jobs. In our
environment, we're doing much the same thing, along with some fancy
pe_list syntax to differentiate our various clusters. It occurred to
me, however, that it might be better to ditch that strategy and
instead use JSV and complex attributes with a single default queue
instance.
Let's say I want to have the following job classes:
devel  <= 1hr
short  <= 6hr
medium <= 48hr
long   <= 168hr
xlong  >  168hr (no upper limit; restricted access)
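For example, a submission like this (job.sh is just a placeholder)
would land in the short class:

   qsub -l h_rt=04:00:00 ./job.sh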
Our current methodology for ensuring QoS for those queues involves
RQS & JSV. Scheduling intervals are pretty long and hairy even for a
<500 node cluster due to the complex PE configuration:
{
   name         host_slotcap
   description  make sure only the right number of slots get used
   enabled      TRUE
   limit        queues * hosts {*} to slots=$num_proc
}
{
   name         queue_slotcap
   description  slot limits for each queue
   enabled      TRUE
   limit        queues xlong to slots=512
   limit        queues long to slots=1436
   limit        queues medium to slots=1724
}
{
   name         user_slotcap
   description  make sure users can only use so much
   enabled      TRUE
   limit        users {*} to slots=512
}
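(For reference, rule sets like these can be loaded from a file with
stock qconf; the file name here is illustrative:

   qconf -Arqs ./slotcaps.rqs

or edited in place with qconf -mrqs.)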
We use a JSV to classify the jobs into queues:
...
# Set queue based on specified runtime
if [ -z "$hrt" ]; then
   jsv_sub_add_param q_hard "devel"
   jsv_sub_add_param l_hard h_rt "01:00:00"
   do_correct="true"
else
   do_correct="true"
   if [ "$hrt" -le $((3600*1)) ]; then
      jsv_sub_add_param q_hard "devel"
   elif [ "$hrt" -le $((3600*6)) ]; then
      jsv_sub_add_param q_hard "short"
   elif [ "$hrt" -le $((3600*48)) ]; then
      jsv_sub_add_param q_hard "medium"
   elif [ "$hrt" -le $((3600*168)) ]; then
      jsv_sub_add_param q_hard "long"
   else
      jsv_sub_add_param q_hard "xlong"
   fi
fi
...
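For anyone reading along: $hrt is derived earlier in the script. A
minimal sketch of that step, assuming bash and an HH:MM:SS-style h_rt
request (a plain-seconds request would need extra handling):

   # Pull h_rt out of the hard resource list and convert to seconds
   h_rt=$(jsv_sub_get_param l_hard h_rt)
   if [ -n "$h_rt" ]; then
      IFS=: read -r h m s <<< "$h_rt"
      # 10# forces base-10 so leading zeros aren't read as octal
      hrt=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
   fi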
We also use my GitHub project for PBS-esque parallel environment
support: https://github.com/brichsmith/gepetools
This means each queue has a complicated PE configuration:
pe_list make smp,[@cms_X7DBR-3=pe_cms_X7DBR-3_hg \
pe_cms_X7DBR-3_hg.1 pe_cms_X7DBR-3_hg.2 \
pe_cms_X7DBR-3_hg.4 pe_cms_X7DBR-3_hg.6 \
pe_cms_X7DBR-3_hg.8], \
...
[@MRI_Sun_X4150=pe_MRI_Sun_X4150_hg \
pe_MRI_Sun_X4150_hg.1 pe_MRI_Sun_X4150_hg.2 \
pe_MRI_Sun_X4150_hg.4 pe_MRI_Sun_X4150_hg.6 \
pe_MRI_Sun_X4150_hg.8], \
...
[@RC_Dell_R410=pe_RC_Dell_R410_hg \
pe_RC_Dell_R410_hg.1 \
pe_RC_Dell_R410_hg.12 pe_RC_Dell_R410_hg.2 \
pe_RC_Dell_R410_hg.4 pe_RC_Dell_R410_hg.6 \
pe_RC_Dell_R410_hg.8], \
...
[@RC_HP_DL165G7=pe_RC_HP_DL165G7_hg \
pe_RC_HP_DL165G7_hg.1 pe_RC_HP_DL165G7_hg.12 \
pe_RC_HP_DL165G7_hg.16 pe_RC_HP_DL165G7_hg.2 \
pe_RC_HP_DL165G7_hg.4 pe_RC_HP_DL165G7_hg.6 \
pe_RC_HP_DL165G7_hg.8], \
...
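To get a sense of the sprawl, the generated PE objects can be counted
with stock qconf (this assumes they all share the pe_ prefix, as
above):

   qconf -spl | grep -c '^pe_'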
We set a negative urgency value on h_rt so that longer jobs get lower
priority.
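In complex(5) terms that looks something like the following; the -1
urgency is illustrative, not our real value:

   $ qconf -sc | grep '^h_rt'
   h_rt    h_rt    TIME    <=    YES    NO    0:0:0    -1

Since the urgency contribution of a numeric attribute scales with the
requested amount, a bigger h_rt request drags the job's priority
further down.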
This approach seems to confuse the scheduler when it comes to
resource reservations, so we pretty much can't use them and end up
with the occasional starving >128 slot parallel job. It's also pretty
difficult to determine scheduling bottlenecks, etc. It's elegant from
a user perspective, but somewhat difficult to administer and
troubleshoot (we've whipped up some tools to help, but there are
still limitations).
I want to ditch the "queues-as-classifiers" model and use complex
attributes instead. Think a single "default" queue, but my JSV will
now do this:
...
# Set the job-class consumable based on specified runtime
if [ -z "$hrt" ]; then
   jsv_sub_add_param l_hard h_rt "01:00:00"
   jsv_sub_add_param l_hard devel 1
   do_correct="true"
else
   do_correct="true"
   if [ "$hrt" -le $((3600*1)) ]; then
      jsv_sub_add_param l_hard devel 1
   elif [ "$hrt" -le $((3600*6)) ]; then
      jsv_sub_add_param l_hard short 1
   elif [ "$hrt" -le $((3600*48)) ]; then
      jsv_sub_add_param l_hard medium 1
   elif [ "$hrt" -le $((3600*168)) ]; then
      jsv_sub_add_param l_hard long 1
   else
      # xlong keeps its own restricted queue
      jsv_sub_add_param q_hard "xlong"
   fi
fi
...
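One way to sanity-check the new classification without running
anything would be to submit a held job and inspect it (job.sh and the
job id are placeholders):

   qsub -h -l h_rt=24:00:00 ./job.sh
   qstat -j <job_id> | grep resource_list

The hard resource list should come back with medium=1 injected by the
JSV; qdel the job afterwards.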
RQS gets simplified to:
{
   name         host_slotcap
   description  make sure only the right number of slots get used
   enabled      TRUE
   limit        hosts {*} to slots=$num_proc
}
{
   name         user_slotcap
   description  make sure users can only use so much
   enabled      TRUE
   limit        users {*} to slots=512
}
And the global host gets configured like so:
...
complex_values ...,short=4096,devel=4096,medium=1768,long=1534
...
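That could be set non-interactively with something like the following
(mirroring the values above), instead of hand-editing via
qconf -me global:

   qconf -mattr exechost complex_values \
      devel=4096,short=4096,medium=1768,long=1534 global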
We drop the urgency from h_rt and instead associate it with the
complex attributes (the final column below is the urgency value, so
devel outranks short, which outranks medium, which outranks long):
$ qconf -sc | egrep '^(devel|short|medium|long)[ ]+'
devel    devel    INT   <=   YES   YES   0   1000
long     long     INT   <=   YES   YES   0   0
medium   medium   INT   <=   YES   YES   0   10
short    short    INT   <=   YES   YES   0   100
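For the urgency values to matter, the scheduler's urgency weight has
to be significant relative to the other policy weights, which is
worth double-checking:

   qconf -ssconf | egrep 'weight_(priority|urgency|ticket)'

since those three weights combine into the final job priority.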
What say other GridEngine gurus about this approach? I believe this
will help with my resource reservation woes and, at the very least,
should make my scheduler iterations much shorter. Is there a better
way? Are there any potential pitfalls I may have missed?
Any input or suggestions would be appreciated.
Best Regards,
Brian Smith
Sr. System Administrator
Research Computing, University of South Florida
4202 E. Fowler Ave. SVC4010
Office Phone: +1 813 974-1467
Organization URL: http://rc.usf.edu