Have you tried using "mem_free"?

We also use h_vmem for all the reasons cited before, but using load sensors
(mem_free + load_avg) is fine for a simple setup.
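
For anyone following the h_vmem discussion quoted below, here is a rough
sketch of the accounting arithmetic being described: per-slot debiting with
consumable=YES, per-job debiting with consumable=JOB, the JSV-style
workaround of dividing the request by the slot count, and a 1GB default for
jobs that request nothing. It is plain Python, it does not talk to Grid
Engine, and every function name in it is made up for illustration. Actual
behaviour may depend on the SGE version, as noted in the thread.

```python
# Illustrative arithmetic only: nothing here calls Grid Engine.
# All function names are hypothetical and exist only for this sketch.

def reserved_gb(request_gb, slots, consumable):
    """Memory (in GB) debited from the consumable for a single job."""
    if consumable == "YES":
        # Per-slot accounting: the request is multiplied by the slot count.
        return request_gb * slots
    if consumable == "JOB":
        # Per-job accounting: the request is debited once.
        return request_gb
    raise ValueError("consumable must be 'YES' or 'JOB'")


def per_slot_request_gb(total_gb, slots):
    """Divide a job-wide total by the slot count, as a JSV might do,
    so that per-slot accounting ends up debiting the intended total."""
    return total_gb / slots


def with_default_gb(request_gb, default_gb=1.0):
    """Apply a default allocation when the job asked for no memory."""
    return default_gb if request_gb is None else request_gb


if __name__ == "__main__":
    # A 4GB request on a 2-slot parallel environment.
    print(reserved_gb(4, 2, "YES"))
    print(reserved_gb(4, 2, "JOB"))
    # The same job after dividing the request by the slot count.
    print(reserved_gb(per_slot_request_gb(4, 2), 2, "YES"))
    # A job that asked for nothing gets the default.
    print(with_default_gb(None))
```

With consumable=YES the 4GB request on 2 slots debits 8GB, which is why the
division step (or consumable=JOB) is needed if users think in job-wide
totals.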

2016-06-01 17:38 GMT+02:00 Simon Andrews <simon.andr...@babraham.ac.uk>:

> We also hit the problem with h_vmem and parallel jobs.  I can't remember
> what it was off-hand, but I know there was a problem with setting JOB for
> the consumable, so our fix was to have the JSV divide the request by the
> number of cores, so that jobs ended up with the right total amount.
>
>
> On 01/06/2016, 16:06, "users-boun...@gridengine.org on behalf of
> Christopher Black" <users-boun...@gridengine.org on behalf of
> cbl...@nygenome.org> wrote:
>
> >We also set h_vmem as a consumable complex. We set the default memory
> >request per job by setting a default value in qconf -mc rather than using
> >a jsv.
> >One thing to be aware of is that by default, if you set h_vmem to
> >consumable=YES, it gets multiplied by the number of cores (so qsub -l
> >mem=4G -pe smp 2 would be asking for 8GB). I've dealt with this at a
> >previous site and this time we set consumable=JOB so it doesn't get
> >multiplied (SoGE 8.1.x). This may be sge-version-dependent.
> >
> >Related to Skylar's comments about "a world where nodes no longer get run
> >into the ground by one misbehaving job", I also recommend core binding or
> >other limits to ensure an aggressively multithreaded job doesn't fight
> >for far more cores than it asked for in a pe request. Similar to memory
> >contention, this is not an issue in all environments and you will have to
> >consider whether it is worth it.
> >
> >Best,
> >Chris
> >
> >On 6/1/16, 10:57 AM, "users-boun...@gridengine.org on behalf of Ian
> >Kaufman" <users-boun...@gridengine.org on behalf of
> >ikauf...@eng.ucsd.edu> wrote:
> >
> >>This from Simon, and what Skylar said, are what you should heed.
> >>
> >>In the end our solution was to have strict hard limits (h_vmem) on
> >>memory and to define h_vmem as a consumable complex.  To make life
> >>easier for our users though we used a job submission verifier to add a
> >>default allocation of 1GB to any job which didn't ask for any memory.
> >>This covers all of the small jobs.  For larger jobs we simply tell
> >>people to ask for more than they need if they're only doing something
> >>once, or if they have a bunch of jobs to run then run one with too much
> >>memory allocated and then use qacct to look at the actual max usage so
> >>they know what they should ask for next time.  We had some teething
> >>troubles with this for a few weeks after it was introduced, but it's all
> >>been working smoothly for a long time now.
> >>
> >>--
> >>Ian Kaufman
> >>Research Systems Administrator
> >>UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
> >>
> >>
> >>
> >
> >_______________________________________________
> >users mailing list
> >users@gridengine.org
> >https://gridengine.org/mailman/listinfo/users
>
> The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT
> Registered Charity No. 1053902.
