Well, the Grid Engine config options haven't been put there just to confuse you, Chris - trust me ;-). They really were added in response to concrete requests and requirements. The problem you and others are experiencing is that DRM systems are Swiss Army knives. There are scenarios like TACC's, with 64K cores and a focus on very large parallel jobs; throughput sites with thousands of nodes running tens of millions of jobs a month; and quite small installations, sometimes even within a single embedded system. As a result, sites have diverse goals, such as maximizing resource utilization, ensuring an equal share of resources, or guaranteeing full exploitation of the most critical/expensive resources. And so on. It's not unusual for such goals to conflict when a site wants to manage towards two or more of them.

So part of the problem is that there are so many different use cases, and those scenarios are often inherently complex. The other issue is that you do not always get the chance to rework large parts of your policy system just because you are adding support for another case ... and existing users actually don't want to have to change how they have been managing their system just because something was added in another corner.

So there is complexity, there is overlap, and policies are often not orthogonal or even conflict with each other.

What I've always suggested as a best-practice approach for configuring Grid Engine policies is to take it step by step:

- Start with your topmost policy goal. If you have more than one of equal importance, pick one of them.
- Then choose the Grid Engine policy which best matches that goal. Yes, there are cases where you will have a choice among Grid Engine policies.
- Tune the chosen policy to meet your top-priority goal. That's usually not too hard.
- Once you're done with that, pick the next goal and a suitable Grid Engine policy, and try to understand potential interference with the first policy before you change anything.
- Then tune that second policy while observing what's happening with your top-priority goal. Make adjustments there as well, as needed.
- And so on.

All that said: simplifying the existing policy scheme is a long-standing goal, as is providing some wizard(s) on top to make dealing with the existing system more palatable.

Cheers,

Fritz



On 07.04.11 00:15, Chris Dagdigian wrote:
Jiri forwarded me the URL to his post and I found it fascinating:

"Calculating GE Job Priorities"
http://olwynion.blogspot.com/2011/04/calculating-ge-job-priorities.html

I've always felt that one of the strengths of GE (unlimited number of 
knobs that you can alter) is also one of its biggest problems (infinite 
number of potential configurations and no huge corpus of well-tested 
values ...) and this post reinforces a lot of those thoughts.

What do others think? I gave up years ago trying to understand the 
policy mechanism at any deep level. I have a few good config recipes 
that I stick with. Whenever I have to deviate from those, I often end up 
making best-guess changes to odd SGE values/weights and then I have to 
watch the pending/active job list to see if the resource allocation mix 
is doing what I hoped. More clarity and "predictive-ness" would be welcome.

-dag

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users

--

Fritz Ferstl | CTO and Business Development, EMEA
Univa Corporation | The Data Center Optimization Company
E-Mail: [email protected] | Phone: +49.9471.200.195 | Mobile: +49.170.819.7390

Where Grid Engine lives
