Hi,
The oom_adj interface has been deprecated since the 2.6.36 kernel (see
http://code.google.com/p/chromium/issues/detail?id=65009, among others),
and the semantics have changed, i.e.:
/proc/pid/oom_adj range: -17..15
/proc/pid/oom_score_adj range: -1000..1000
This means that writing -17 to /proc/pid/oom_score_adj offers essentially
no OOM protection. The value that disables OOM killing there is -1000.
I guess it's a matter of documentation and adjustments:
- the SLURMD_OOM_ADJ and SLURMSTEPD_OOM_ADJ environment variables
- adjust set_oom_adj() to do "The Right Thing(Tm)" depending
  on whether oom_score_adj is present or not.
- doc/html/faq.shtml adjustments ...
- src/plugins/task/cgroup/task_cgroup_memory.c adjustments:
  it uses -17, but that value may end up in oom_score_adj and hence
  not provide OOM protection ...
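
As a rough sketch of the set_oom_adj() adjustment (this is not the actual
Slurm code; oom_adj_to_score_adj(), write_int() and set_oom_adj_compat() are
hypothetical names), one could keep accepting legacy -17..15 values, prefer
/proc/self/oom_score_adj when it exists, and fall back to /proc/self/oom_adj
otherwise. The linear scaling by 1000/17 is an assumption, chosen to match
what newer kernels appear to do for legacy oom_adj writes:

```c
#include <stdio.h>
#include <unistd.h>

#define OOM_ADJ_MIN        (-17)    /* legacy "never kill" value */
#define OOM_ADJ_MAX        15
#define OOM_SCORE_ADJ_MIN  (-1000)  /* new "never kill" value */
#define OOM_SCORE_ADJ_MAX  1000

/* Map a legacy oom_adj value (-17..15) onto the oom_score_adj scale
 * (-1000..1000). Out-of-range input is clamped to the nearest bound. */
int oom_adj_to_score_adj(int adj)
{
    if (adj <= OOM_ADJ_MIN)
        return OOM_SCORE_ADJ_MIN;
    if (adj >= OOM_ADJ_MAX)
        return OOM_SCORE_ADJ_MAX;
    return (adj * OOM_SCORE_ADJ_MAX) / -OOM_ADJ_MIN;
}

/* Write a single integer to a /proc file. Returns 0 on success, -1 on error. */
int write_int(const char *path, int value)
{
    FILE *fp = fopen(path, "w");

    if (!fp)
        return -1;
    if (fprintf(fp, "%d\n", value) < 0) {
        fclose(fp);
        return -1;
    }
    return (fclose(fp) == 0) ? 0 : -1;
}

/* Hypothetical replacement for set_oom_adj(): takes a legacy oom_adj value
 * and applies it through whichever interface the running kernel offers. */
int set_oom_adj_compat(int adj)
{
    if (access("/proc/self/oom_score_adj", F_OK) == 0)
        return write_int("/proc/self/oom_score_adj",
                         oom_adj_to_score_adj(adj));
    return write_int("/proc/self/oom_adj", adj);
}
```

With this, an existing setting of -17 would translate to -1000 on a kernel
that has oom_score_adj, and would be written unchanged on an older one.
Lowering the value normally requires privileges, so a caller should still
check the return code.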
And, yes, "I hate when they do that(Tm)" :)
FYI, I checked the slurm 2.3.4 source.
Maybe it is already handled in more recent slurm versions?
Cheers,
--
-----------------------------------------------------------
Michel Bourget - SGI - Linux Software Engineering
"Past BIOS POST, everything else is extra" (travis)
-----------------------------------------------------------