On Wed, 23 Jan 2013 05:52:04 -0700, Bjørn-Helge Mevik <[email protected]> 
wrote:
> 
> "Mark A. Grondona" <[email protected]> writes:
> 
> > In upstream kernels there is already a feature for setting up
> > memcg notifications, and the majority of this should be backported
> > to RHEL as of RHEL6.4. 
> 
> This is very good news!  It is always a nuisance when users' jobs get
> killed and there is no message about why in slurm-nnn.out.  Then we get
> an RT ticket, and have to grep in /var/log/messages.
> 
> -- 

We use the following SPANK/Lua plugin, which makes a best-effort attempt
to notify users when one of their tasks is killed by the OOM killer.

 http://code.google.com/p/slurm-spank-plugins/source/browse/lua/oom-detect.lua

It just greps the dmesg output, so it isn't perfect. Things
will be much better once memcg OOM notifications are available.
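
For anyone who wants the general idea without reading the plugin, here is
a rough sketch in Python. It is not the plugin's code. The function name,
the regular expression, and the sample log lines are all assumptions made
for illustration, and the exact wording of the kernel's "Killed process"
message differs between kernel versions.

```python
import re

# Assumed message shape. Some kernels print "Killed process 123 (cmd) ...",
# others insert ", UID n," before the command name.
KILLED_RE = re.compile(r"Killed process (\d+)(?:, UID \d+,)? \(([^)]+)\)")


def find_oom_kills(log_text, task_pids):
    """Return (pid, command) pairs for OOM kills whose pid is in task_pids."""
    wanted = set(task_pids)
    hits = []
    for line in log_text.splitlines():
        match = KILLED_RE.search(line)
        if match and int(match.group(1)) in wanted:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


# Illustrative sample only, not real log output.
SAMPLE = """\
[12345.678] Out of memory: Kill process 4242 (a.out) score 900 or sacrifice child
[12345.679] Killed process 4242 (a.out) total-vm:1024kB, anon-rss:512kB, file-rss:0kB
[12350.000] Killed process 9999, UID 500, (other) total-vm:2048kB, anon-rss:1024kB, file-rss:0kB
"""

if __name__ == "__main__":
    for pid, cmd in find_oom_kills(SAMPLE, [4242]):
        print("task %d (%s) was killed by the OOM killer" % (pid, cmd))
```

In practice the text would come from the output of dmesg and the pids
from the job's tasks. Matching on pid alone can miss kills if the kernel
ring buffer has wrapped, and can misreport if a pid has been reused,
which is part of why this approach is only best effort.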

mark


> Cheers,
> Bjørn-Helge Mevik, dr. scient,
> Research Computing Services, University of Oslo
