On 16 August 2012 17:07, Brian Smith <[email protected]> wrote:
> I want to ditch the "queues-as-classifiers" model and use complex
> attributes instead. Think a single "default" queue, but my jsv will now:
>
> ...
> # Set queue based on specified runtime
> if [ -z "$hrt" ]; then
>     jsv_sub_add_param l_hard h_rt "01:00:00"
>     jsv_sub_add_param l_hard devel 1
>     do_correct="true"
> else
>     do_correct="true"
>     if [ $hrt -le $((3600*1)) ]; then
>         jsv_sub_add_param l_hard devel 1
>     elif [ $hrt -gt $((3600*1)) -a $hrt -le $((3600*6)) ]; then
>         jsv_sub_add_param l_hard short 1
>     elif [ $hrt -gt $((3600*6)) -a $hrt -le $((3600*48)) ]; then
>         jsv_sub_add_param l_hard medium 1
>     elif [ $hrt -gt $((3600*48)) -a $hrt -le $((3600*168)) ]; then
>         jsv_sub_add_param l_hard long 1
>     elif [ $hrt -gt $((3600*168)) ]; then
>         jsv_sub_add_param q_hard "xlong"
>     fi
> fi
>
> And global host gets configured as such:
> ...
> complex_values ...,short=4096,devel=4096,medium=1768,long=1534
> ...
>
> We drop the urgency from h_rt and instead associate it with the complex
> attributes:
>
> $ qconf -sc | egrep '^(devel|short|medium|long)[ ]+'
> devel    devel    INT    <=    YES    YES    0    1000
> long     long     INT    <=    YES    YES    0    0
> medium   medium   INT    <=    YES    YES    0    10
> short    short    INT    <=    YES    YES    0    100
>
> What say other GridEngine gurus about this approach? I believe this
> will help with my resource reservation woes and at the very least,
> should make my scheduler iterations much shorter. Is there a better
> way? Are there any potential pitfalls I may have missed?
>
> Any input or suggestions would be appreciated.
>

We do something fairly similar to what you are proposing. We have only
two complexes with an urgency associated with them though: bonus and
penalty. The JSV calculates appropriate values for them based on our
rules. They are given the same weight (positive or negative) as waiting
time so that we can in effect add and subtract waiting time.
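
As an illustration, the runtime thresholds in your script can be written
as a small lookup table instead of a chain of tests. This is only a
minimal shell sketch, not our code: the RULES format and the
classify_runtime name are made up for the example, and in a real JSV the
result would be passed to jsv_sub_add_param as in your script rather
than echoed.

```shell
#!/bin/sh
# Hypothetical rule table: "upper limit in seconds:complex name", in
# ascending order. The limits come from the quoted JSV (1h, 6h, 48h,
# 168h); the table format itself is an assumption for this sketch.
RULES="3600:devel 21600:short 172800:medium 604800:long"

# Print the class for a runtime given in seconds. Anything above the
# last limit falls through to "xlong", which the quoted JSV requests
# as a queue (q_hard) rather than as a complex (l_hard).
classify_runtime() {
    hrt=${1:-3600}    # no h_rt given: assume the 01:00:00 default
    for rule in $RULES; do
        limit=${rule%%:*}    # text before the colon
        name=${rule#*:}      # text after the colon
        if [ "$hrt" -le "$limit" ]; then
            echo "$name"
            return 0
        fi
    done
    echo "xlong"
}

classify_runtime 1800      # 30 minutes
classify_runtime 90000     # 25 hours
classify_runtime 700000    # a bit over 8 days
```

Adding or moving a class then means editing one line of the table
rather than two adjacent conditions.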

Rather than lots of elif statements for the rules though, we have
defined data structures that are kept in perl modules that the JSV
loads. At some point I should change the modules to populate the data
structures from a non-perl-specific config file, database or whatever.

We still have a large number of queues on our cluster though, for other
reasons. We used to have the JSV restrict queue selection with
-q name@@hostgroup-glob. When I removed that and replaced it with a
bunch of numeric attributes to select the appropriate queue, the
scheduler started taking far less time to run. I'm not saying it is
causal, but it makes sense that comparing numeric values would take less
time than globbing (especially after hostgroup expansion). We'd
previously tried reducing the number of queues we had, but this seemed
to make far less difference than switching to queue selection via
numeric comparisons.

The annoying thing is I'd originally planned to write it the way it is
now, but I was pressed for time and setting -q from the JSV was a lot
quicker to write.

That said, we still aren't scheduling as fast as I'd like. I'm currently
working on modifying a locally developed pbs-alike feature. We have a
string complex called status that we use to record problems with a node.
The JSV currently requests status=OK (the default) to ensure the job
doesn't get assigned to a known broken node, which means we can use the
queue's enabled/disabled flag to mean "disabled until next reboot".
I've changed this so that, in addition, a numeric value already used in
queue selection is set (for the host) to a value that prevents any jobs
from running on the node when we disable it, so I can drop the
string-matching request from the JSV. I doubt this will provide quite
the same speedup as not globbing on the queue though.

We don't currently use any RQS here.

William
