If you run jobs on the qmaster host, then jobs with high memory usage
can trigger the Linux OOM killer. And if the kernel picks the qmaster
process as its victim, then all job scheduling stops... unless you have
a shadow master.

If you have RHEL 6 (or RHEL 5.8 + Oracle's Unbreakable kernel), you
can use cgroups to limit the resource usage of the jobs (memory, CPU,
I/O, etc.). But if it is a small cluster, I think the best bang for
the buck is to set up a shadow master, so that if the OOM killer picks
the qmaster, there is at least a backup:

http://gridscheduler.sourceforge.net/howto/shadow.html
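
For the cgroups route, here is a rough sketch of what capping a job's
memory looks like with the cgroup v1 memory controller. It is only an
illustration and not how Grid Engine itself does it: the function name
limit_job_memory, the mount point /cgroup/memory, and the group name
sge_job_1 are all made up for the example, and it assumes Python 3 and
root privileges. The control files memory.limit_in_bytes and tasks are
the standard cgroup v1 ones.

```python
import os


def limit_job_memory(cgroup_root, group_name, limit_bytes, pid):
    """Create a cgroup v1 memory group, cap it, and move pid into it.

    cgroup_root is where the memory controller is mounted (assumed here;
    check your own system). Returns the path of the group directory.
    """
    group_dir = os.path.join(cgroup_root, group_name)
    # On a real cgroup mount, creating the directory creates the group.
    os.makedirs(group_dir, exist_ok=True)

    # Hard memory limit for every process in the group.
    with open(os.path.join(group_dir, "memory.limit_in_bytes"), "w") as f:
        f.write(str(limit_bytes))

    # Writing a PID to "tasks" moves that process into the group.
    with open(os.path.join(group_dir, "tasks"), "w") as f:
        f.write(str(pid))

    return group_dir


# Hypothetical usage: cap the job with PID 12345 at 4 GiB.
# limit_job_memory("/cgroup/memory", "sge_job_1", 4 * 1024**3, 12345)
```

With a limit like this in place, a runaway job is killed inside its own
group when it exceeds the cap, rather than pushing the whole host into
a system-wide OOM situation.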

Rayson



On Mon, May 28, 2012 at 3:16 PM, Alan McKay <[email protected]> wrote:
> Hey folks,
>
> Firstly, I know very little about Grid Engine. If there is some
> recommended reading that will easily answer my question, please be so
> kind as to point me at it. I manage a few Linux clusters, one of
> which runs SGE 6.2u3 on RHEL 5.8.
>
> That cluster has a dedicated node for sgemaster, which also exports
> /gridware to the other nodes. To me this seems wasteful since the
> load on that system is flatlined at just about zero 100% of the time.
> It essentially does nothing. I have several months of "munin" data
> to show this.
>
> This would seem to tell me that I could put sgemaster on one of the
> compute nodes without putting any excess load on that node.  Then
> repurpose the box running sgemaster.
>
> As for the /gridware currently exported by the sgemaster, I would move
> that to our ZFS appliance.
>
> Do most people keep a dedicated node for sgemaster?  Or do most do as
> I'd like to do and run it on a compute node?
>
> thanks,
> -Alan
>
>
> --
> “Don't eat anything you've ever seen advertised on TV”
>          - Michael Pollan, author of "In Defense of Food"
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
