Ben De Luca <[email protected]> writes:

> I was wondering how people deal with OOM conditions on their cluster.
> We constantly have machines that die because the OOM killer takes out
> critical system services.
>
> Has anyone any experience with the oom_adj proc value, or a patch to
> grid to support it?

I second the advice about controlling the memory used by jobs.

However, for what it's worth, OOM adjustment should be straightforward
with the planned loadable module support in the shepherd.  SLURM has a
module to do it.  It needs to be done there as it's a privileged
operation, unless there's some reasonably safe way to do it with an suid
starter method.
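
For illustration only, here is a minimal Python sketch of what such an
adjustment amounts to on Linux: writing a value to a per-process file
under /proc.  It is not Grid Engine or SLURM code.  It uses
oom_score_adj (range -1000 to 1000), the interface that superseded the
oom_adj file mentioned above as of Linux 2.6.36.  The function name and
the proc_root parameter are hypothetical, the latter added so the
sketch can be tried against a scratch directory.  Lowering the value to
protect a process needs CAP_SYS_RESOURCE, which is why this belongs in
privileged code.

```python
import os

# Valid range for /proc/<pid>/oom_score_adj on Linux.
OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000


def set_oom_score_adj(pid, value, proc_root="/proc"):
    """Write an OOM score adjustment for the given process.

    Hypothetical helper, not part of any scheduler.  Higher values make
    the process a more likely OOM-killer victim; -1000 exempts it.
    Returns the path that was written.
    """
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    # On a real /proc this raises PermissionError when the caller lacks
    # the privilege needed for the requested change.
    with open(path, "w") as f:
        f.write(f"{value}\n")
    return path
```

A job starter could call something like set_oom_score_adj(job_pid, 500)
so that jobs are chosen as victims ahead of system services; the value
500 is an arbitrary example, not a recommendation.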

-- 
Community Grid Engine:  http://arc.liv.ac.uk/SGE/
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
