I was wondering how people deal with OOM conditions on their cluster.
We constantly have machines that die because the OOM killer takes out
critical system services.

Does anyone have experience with the oom_adj proc value, or a patch to
Grid Engine to support it?


 /proc/[pid]/oom_adj (since Linux 2.6.11)
              This file can be used to adjust the score used to select
              which process should be killed in an out-of-memory (OOM)
              situation.  The kernel uses this value for a bit-shift
              operation of the process's oom_score value: valid values
              are in the range -16 to +15, plus the special value -17,
              which disables OOM-killing altogether for this process.
              A positive score increases the likelihood of this process
              being killed by the OOM-killer; a negative score decreases
              the likelihood.  The default value for this file is 0; a
              new process inherits its parent's oom_adj setting.  A
              process must be privileged (CAP_SYS_RESOURCE) to update
              this file.
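
To make the excerpt above concrete, here is a rough Python sketch of what
I have in mind. The names set_oom_adj, adjusted_score and the proc_root
parameter are my own, made up for illustration; they are not part of any
existing tool. adjusted_score is only a simplified model of the bit-shift
the man page describes, not the kernel's actual code.

```python
from pathlib import Path

# Values taken from the man page excerpt above.
OOM_ADJ_MIN = -16
OOM_ADJ_MAX = 15
OOM_DISABLE = -17  # special value: never OOM-kill this process


def _check(value):
    """Reject anything outside -17..+15."""
    if not (OOM_DISABLE <= value <= OOM_ADJ_MAX):
        raise ValueError(f"oom_adj must be in -17..15, got {value}")


def adjusted_score(score, oom_adj):
    """Simplified model of the bit-shift described in the excerpt.

    Positive oom_adj shifts the score left (more likely to be killed),
    negative shifts it right (less likely), -17 exempts the process.
    """
    _check(oom_adj)
    if oom_adj == OOM_DISABLE:
        return 0
    if oom_adj > 0:
        return score << oom_adj
    return score >> -oom_adj


def set_oom_adj(pid, value, proc_root="/proc"):
    """Write value to <proc_root>/<pid>/oom_adj and return the path.

    proc_root is a hypothetical knob so this can be tried against a
    scratch directory. Writing to the real /proc needs privileges
    (CAP_SYS_RESOURCE, per the excerpt).
    """
    _check(value)
    path = Path(proc_root) / str(pid) / "oom_adj"
    path.write_text(f"{value}\n")
    return path
```

Run as root, something like set_oom_adj(pid_of_sshd, OOM_DISABLE) would
exempt that one service. Because children inherit the setting, any job
started from an exempted daemon would need its value reset to 0 before
it runs, otherwise the jobs are protected too. Newer kernels (2.6.36 and
later, as far as I know) prefer /proc/[pid]/oom_score_adj, so the file
name may need changing depending on the kernel in use.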
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
