We also hit the problem with h_vmem and parallel jobs. I can't remember the details off-hand, but we had an issue with setting the consumable to JOB, so our fix was to have the JSV divide the h_vmem request by the number of cores; since the scheduler then multiplies the per-slot value back up, the job ends up with the amount it actually asked for. A rough sketch of that kind of JSV is below.
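In case it helps anyone heading the same way, here is a minimal sketch of that sort of server-side JSV, written against the bash helper functions in $SGE_ROOT/util/resources/jsv/jsv_include.sh. It is not our production script: it assumes h_vmem is requested in whole gigabytes and takes the slot count from pe_max, so adjust the unit handling and rounding for your own site.

    #!/bin/sh
    # Sketch: divide a per-job h_vmem request by the number of PE slots so that,
    # once Grid Engine multiplies the per-slot value by the slot count again,
    # the job ends up with roughly the total it asked for.
    # Assumes whole-gigabyte requests such as 4G; a real script needs proper
    # unit parsing, and note that the integer division truncates.

    jsv_on_start()
    {
        return
    }

    jsv_on_verify()
    {
        pe_name=$(jsv_get_param pe_name)
        vmem=$(jsv_sub_get_param l_hard h_vmem)

        if [ -n "$pe_name" ] && [ -n "$vmem" ]; then
            slots=$(jsv_get_param pe_max)
            gigs=${vmem%[Gg]}
            if [ "$slots" -gt 1 ] 2>/dev/null && [ "$gigs" -gt 0 ] 2>/dev/null; then
                per_slot=$(( gigs / slots ))
                [ "$per_slot" -lt 1 ] && per_slot=1
                # Overwrite the hard resource request with the per-slot value
                jsv_sub_add_param l_hard h_vmem "${per_slot}G"
                jsv_correct "h_vmem divided across $slots slots"
                return
            fi
        fi
        jsv_accept "Job is accepted"
    }

    . ${SGE_ROOT:-/usr/sge}/util/resources/jsv/jsv_include.sh
    jsv_main

You can hook something like this in per submission with qsub -jsv, or cluster-wide via the jsv_url parameter in qconf -mconf.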
On 01/06/2016, 16:06, "users-boun...@gridengine.org on behalf of Christopher Black" <users-boun...@gridengine.org on behalf of cbl...@nygenome.org> wrote:

>We also set h_vmem as a consumable complex. We set the default memory
>request per job by setting a default value in qconf -mc rather than using
>a jsv.
>One thing to be aware of is that, by default, if you set h_vmem to
>consumable=YES it gets multiplied by the number of cores (so
>qsub -l h_vmem=4G -pe smp 2 would be asking for 8GB). I've dealt with this
>at a previous site, and this time we set consumable=JOB so it doesn't get
>multiplied (SoGE 8.1.x). This may be sge-version-dependent.
>
>Related to Skylar's comments about "a world where nodes no longer get run
>into the ground by one misbehaving job", I also recommend core binding or
>other limits to ensure an aggressively multithreaded job doesn't fight
>for far more cores than it asked for in a pe request. As with memory
>contention, this is not an issue in all environments and you will have to
>consider whether it is worth it.
>
>Best,
>Chris
>
>On 6/1/16, 10:57 AM, "users-boun...@gridengine.org on behalf of Ian
>Kaufman" <users-boun...@gridengine.org on behalf of
>ikauf...@eng.ucsd.edu> wrote:
>
>>This, from Simon, and what Skylar said, are what you should heed.
>>
>>In the end our solution was to have strict hard limits (h_vmem) on
>>memory and to define h_vmem as a consumable complex. To make life
>>easier for our users, though, we used a job submission verifier to add a
>>default allocation of 1GB to any job which didn't ask for any memory.
>>This covers all of the small jobs. For larger jobs we simply tell people
>>to ask for more than they need if they're only doing something once; or,
>>if they have a bunch of jobs to run, run one with too much memory
>>allocated and then use qacct to look at the actual max usage, so they
>>know what they should ask for next time. We had some teething troubles
>>with this for a few weeks after it was introduced, but it's all been
>>working smoothly for a long time now.
>>
>>--
>>Ian Kaufman
>>Research Systems Administrator
>>UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
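To make Chris's consumable=YES vs consumable=JOB point concrete, the h_vmem line in qconf -mc looks roughly like this in recent SGE/SoGE (the 1G default is just an example of setting the per-job default there rather than in a JSV):

    #name    shortcut  type    relop  requestable  consumable  default  urgency
    h_vmem   h_vmem    MEMORY  <=     YES          YES         1G       0

    # With consumable=JOB the request is not multiplied by the slot count (SoGE 8.1.x):
    h_vmem   h_vmem    MEMORY  <=     YES          JOB         1G       0

With YES, qsub -l h_vmem=4G -pe smp 2 debits 8G from the host's pool; with JOB it debits 4G once for the whole job. The pool itself is whatever you set per exec host, e.g. complex_values h_vmem=64G via qconf -me <hostname> (64G being a placeholder for the node's real memory). Core binding, as Chris suggests, can be requested per job with something like qsub -binding linear:2.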
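And on Simon's suggestion of sizing future requests from a real run, the field to look at in the accounting data is maxvmem; the job id below is just a placeholder:

    # Peak virtual memory a finished job actually used
    qacct -j <jobid> | grep maxvmem

That number, plus a bit of headroom, is usually a sensible h_vmem to ask for next time.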