Have you tried using "mem_free"? We also use h_vmem for all the reasons cited before, but using load sensors (mem_free + load_avg) is fine for a simple setup.
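
For anyone skimming the quoted thread below, here is a rough sketch of the slot arithmetic it describes: with consumable=YES the request is multiplied by the number of slots, with consumable=JOB it is counted once, and a JSV can divide the request by the slot count as a workaround. The function names are invented for illustration and are not part of Grid Engine; the behaviour modelled is only what the quoted messages report, and it may differ between SGE versions.

```python
# Sketch only: models how a consumable memory request scales with slots,
# as described in the quoted thread. Function and parameter names are
# hypothetical; they are not part of Grid Engine.

def total_reserved_gb(request_gb: float, slots: int, consumable: str) -> float:
    """Memory debited from the host for one job."""
    if consumable == "YES":
        # Per-slot accounting: the request is multiplied by the slot count.
        return request_gb * slots
    if consumable == "JOB":
        # Per-job accounting: the request is counted once.
        return request_gb
    raise ValueError(f"unexpected consumable setting: {consumable!r}")


def per_slot_request_gb(wanted_total_gb: float, slots: int) -> float:
    """JSV-style workaround: divide the request by the slot count."""
    if slots < 1:
        raise ValueError("slots must be at least 1")
    return wanted_total_gb / slots


# The example from the thread: a 4G request with a 2-slot smp PE.
print(total_reserved_gb(4, 2, "YES"))
print(total_reserved_gb(4, 2, "JOB"))
print(per_slot_request_gb(4, 2))
```

The first call reproduces the "4G with 2 slots asks for 8GB" case, and the last shows the per-slot value a JSV would substitute so that the job ends up with 4GB in total.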

2016-06-01 17:38 GMT+02:00 Simon Andrews <simon.andr...@babraham.ac.uk>:

> We also hit the problem with h_vmem and parallel jobs. I can't remember
> what it was off-hand, but I know there was a problem with setting JOB for
> the consumable, so our fix was to get the JSV to divide the request by the
> number of cores so they got the right amount in the end that way instead.
>
> On 01/06/2016, 16:06, "users-boun...@gridengine.org on behalf of
> Christopher Black" <users-boun...@gridengine.org on behalf of
> cbl...@nygenome.org> wrote:
>
> >We also set h_vmem as a consumable complex. We set the default memory
> >request per job by setting a default value in qconf -mc rather than using
> >a jsv.
> >One thing to be aware of is by default if you set h_vmem to consumable=YES
> >it gets multiplied by number of cores (so qsub -l mem=4G -pe smp 2 would
> >be asking for 8GB). I've dealt with this at a previous site and this time
> >we set consumable=JOB so it doesn't get multiplied (SoGE 8.1.x). This may
> >be sge-version-dependent.
> >
> >Related to Skylar's comments about "a world where nodes no longer get run
> >into the ground by one misbehaving job", I also recommend core binding or
> >other limits to ensure an aggressively multithreaded job doesn't fight
> >for far more cores than it asked for in a pe request. Similar to memory
> >contention, this is not an issue in all environments and you will have to
> >consider whether it is worth it.
> >
> >Best,
> >Chris
> >
> >On 6/1/16, 10:57 AM, "users-boun...@gridengine.org on behalf of Ian
> >Kaufman" <users-boun...@gridengine.org on behalf of
> >ikauf...@eng.ucsd.edu> wrote:
> >
> >>This from Simon, and what Skylar said, are what you should heed.
> >>
> >>In the end our solution was to have strict hard limits (h_vmem) on
> >>memory and to define h_vmem as a consumable complex.
> >>To make life easier for our users though we used a job submission
> >>verifier to add a default allocation of 1GB to any job which didn't
> >>ask for any memory. This covers all of the small jobs. For larger
> >>jobs we simply tell people to ask for more than they need if they're
> >>only doing something once, or if they have a bunch of jobs to run then
> >>run one with too much memory allocated and then use qacct to look at
> >>the actual max usage so they know what they should ask for next time.
> >>We had some teething troubles with this for a few weeks after it was
> >>introduced, but it's all been working smoothly for a long time now.
> >>
> >>--
> >>Ian Kaufman
> >>Research Systems Administrator
> >>UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users