Thanks to everyone for the responses.
Ed, we tried increasing h_vmem, but regardless of the value, any job we
submit without the -l h_vmem=x parameter still fails.
> The easiest way to deal with the issue is to purchase machines with a lot
> of memory... :)
^- The Truth! haha.
Bill, thanks for the link and nifty line of code.
> qhost | awk 'NR>3 { print "qconf -mattr exechost complex_values mem_free=" $8,$1 }' | sh
Unfortunately, your link to the discussion now returns a 404. In your
example, what does $8 refer to? On our system the 8th column of the qhost
output is SWAPUS.
Arne, unfortunately we are stuck using 6.1u3 for a while as our sysadmin
refuses to change things on our older system. I will look into cgroups once
we get a new system in place, thanks.
Ian, it sounds like a JSV might be what we need. We are already using a
wrapper around qsub, so it shouldn't be too hard to have every job submit
with a default -l h_vmem if none is provided. It seems odd, though, that
there is no way to set a default value so that jobs without an explicit
request for the resource don't die.
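For the wrapper route, something like the following is roughly what I have
in mind. It is only a sketch: the function names and the DEFAULT_H_VMEM
variable are made up for illustration, and the 1G default is just the value
from our working qsub example, not a recommendation. It does a plain text
match on the arguments, so it would also be fooled by "h_vmem=" appearing in
the job's own arguments.

```shell
#!/bin/sh
# Hypothetical default request; 1G is simply the value from the qsub
# example that already works for us. Override via the environment.
DEFAULT_H_VMEM="${DEFAULT_H_VMEM:-1G}"

# Succeed if any argument contains "h_vmem=". This covers both
# "-l h_vmem=2G" and combined requests like "-l mem_free=1G,h_vmem=2G".
requests_h_vmem() {
    for arg in "$@"; do
        case "$arg" in
            *h_vmem=*) return 0 ;;
        esac
    done
    return 1
}

# Print the argument list the wrapper would pass to qsub, one per line,
# prepending a default h_vmem request only when the user gave none.
qsub_args_with_default() {
    if requests_h_vmem "$@"; then
        printf '%s\n' "$@"
    else
        printf '%s\n' -l "h_vmem=$DEFAULT_H_VMEM" "$@"
    fi
}

# In the real wrapper the last two lines would be something like:
#   requests_h_vmem "$@" || set -- -l "h_vmem=$DEFAULT_H_VMEM" "$@"
#   exec qsub "$@"
```

With that in place, "sleep 100" submitted through the wrapper would go out
as "qsub -l h_vmem=1G sleep 100", while anything that already carries its
own h_vmem request is passed through untouched.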
On Tue, Feb 24, 2015 at 6:58 AM, Ian Kaufman <[email protected]> wrote:
> I wound up using a JSV that checked to see if h_vmem was supplied by
> the user, and if not, I supplied a default 4G request in the JSV
> rewrite.
>
> Not sure if JSVs existed in 6.1u3 though.
>
> Ian
>
> On Mon, Feb 23, 2015 at 4:07 PM, Mishkin Derakhshan
> <[email protected]> wrote:
> > Hi,
> > We have some jobs that require significant amounts of memory so we want to
> > try and setup h_vmem as a consumable resource to manage this.
> >
> > This is what we have setup:
> > $ qconf -sq dev.q | grep h_vmem
> > h_vmem 3.7G
> >
> > $ qconf -sc | grep h_vmem
> > h_vmem h_vmem MEMORY <= YES YES 0 0
> >
> > And if we submit jobs like this then we don't have any problems,
> > $ qsub -b y -j y -l h_vmem=1G -q dev.q sleep 100
> >
> > But if we submit jobs without explicitly requesting h_vmem (i.e., we don't
> > use -l h_vmem=X) then the jobs die on startup saying it can't allocate
> > memory:
> > error reason 1: 02/19/2015 14:13:39 [0:14840]: can't set additional group id (uid=0, euid=0): Cannot allocate memory
> >
> > We _think_ this has to do with setting a default h_vmem (on a queue basis?
> > host basis?) so jobs that don't explicitly request the resource will use
> > something by default, but we've been unable to figure out how to set this
> > up.
> >
> > We are using 6.1u3.
> >
> > thanks
> >
> > _______________________________________________
> > users mailing list
> > [email protected]
> > https://gridengine.org/mailman/listinfo/users
> >
>
>
>
> --
> Ian Kaufman
> Research Systems Administrator
> UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
>