I would strongly recommend that you (and your users) bite the bullet and enforce memory and other resource limits. It sounds painful, but the long-term payoff is high: it shifts the incentive toward writing high-quality, instrumented code. Without resource limits there is no such incentive, because well-behaved code is just as likely to be killed by an errant, sloppily written job as the sloppy job is to be killed by something else.
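As a minimal sketch of what "enforcing memory limits" can look like in SGE (the hostname and sizes below are illustrative assumptions, not values from this thread): make h_vmem a consumable complex with a default, give each execution host a memory budget, and have jobs request memory at submission time.

```shell
# Sketch: per-job memory limits in SGE. Hostname (node01) and sizes
# are assumptions for illustration only.

# 1. Make h_vmem a requestable, consumable complex with a default, so
#    jobs that request nothing are still charged something sane.
#    qconf -mc opens the complex list in an editor; the h_vmem line
#    would end up looking like:
#    h_vmem   h_vmem   MEMORY   <=   YES   YES   4G   0

# 2. Give each execution host a memory budget that the scheduler
#    decrements as jobs are dispatched to it:
qconf -rattr exechost complex_values h_vmem=120G node01

# 3. Users request what they need; the limit is enforced on the job
#    and the scheduler will not oversubscribe the host:
qsub -l h_vmem=8G myjob.sh
```

With the default in place, even users who never add `-l h_vmem` are scheduled against the host's budget rather than running unbounded.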
We started this process by having a subset of our cluster users run in an environment with resource limits in place, and over the course of a year we had people clamoring for limits after they heard of a world where nodes no longer get run into the ground by one misbehaving job.

On Wed, Jun 01, 2016 at 04:44:57PM +0300, Ben Daniel Pere wrote:
> Hi all,
>
> I'm trying to stop SGE from submitting jobs to any node which has less
> than 10% memory free, regardless of queue, user or anything else - just
> add a rule / complex / resource quota that will make SGE not submit
> tasks to machines with less than 10% memory free (or 20GB if the number
> must be absolute).
>
> We've got several queues, each of which can run tasks with various
> memory demands. We're not asking users to declare how much memory they
> need, and there's no way to start doing so: people will give wrong
> estimates, and we'll either have dying tasks or an idle cluster due to
> exaggerated estimates. Memory use really depends on the size of the
> dataset being worked on, which often can't be predetermined.
>
> Is there a way to do this? Thanks!
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users

--
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine