On Wed, 23 Jan 2013 05:52:04 -0700, Bjørn-Helge Mevik <[email protected]> wrote:
>
> "Mark A. Grondona" <[email protected]> writes:
>
> > In upstream kernels there is already a feature for setting up
> > memcg notifications, and the majority of this should be backported
> > to RHEL as of RHEL6.4.
>
> This is very good news!  It is always a nuisance when users' jobs get
> killed and there is no message about why in slurm-nnn.out.  Then we get
> an RT ticket, and have to grep in /var/log/messages.
>
> --

We use the following spank/lua plugin, which makes a best-effort attempt
to notify users when one of their tasks is killed by the OOM killer:

  http://code.google.com/p/slurm-spank-plugins/source/browse/lua/oom-detect.lua

It just greps the dmesg output, so it isn't perfect. Things will
be much better with OOM notifications.

mark

> Cheers,
> Bjørn-Helge Mevik, dr. scient,
> Research Computing Services, University of Oslo
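
For anyone curious what "grep of dmesg output" amounts to, here is a rough
Python sketch of the idea. It is not the oom-detect.lua code. The names
find_oom_kills and read_kernel_log are made up for illustration, and the
"Killed process <pid> ... (<name>)" pattern is an assumption about the
kernel's log wording, which differs between kernel versions.

```python
import re
import subprocess

# Assumed log format: the kernel prints "Killed process <pid> ... (<name>)"
# when the OOM killer acts. The exact wording varies between kernel versions.
OOM_KILL_RE = re.compile(r"Killed process (\d+)[^(\n]*\(([^)\n]+)\)")


def find_oom_kills(log_text, task_pids):
    """Return (pid, name) pairs for OOM kills in log_text whose pid is in task_pids."""
    wanted = set(task_pids)
    kills = []
    for match in OOM_KILL_RE.finditer(log_text):
        pid = int(match.group(1))
        if pid in wanted:
            kills.append((pid, match.group(2)))
    return kills


def read_kernel_log():
    """Return the current dmesg output as text (may need privileges on some systems)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout
```

Calling find_oom_kills(read_kernel_log(), pids) with the pids of a job's
tasks would give the kills to report to the user. It stays best effort for
the usual reasons: the kernel ring buffer can wrap before it is read, pids
can be reused, and the message text is not a stable interface.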
