Thanks for the tip. I just tried that with h_vmem set to 71G in the queue definition, and then set to INFINITY, submitting jobs with different -l h_vmem= limits. Both had the same effect, so that doesn't seem to explain the mixed results from my tests.
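
In case it's useful for anyone else following this thread, the ulimit check Alex describes below boils down to a trivial job script (say, ulimit_check.sh); roughly something like this, where the script name and the 8G request are only examples:

  #!/bin/bash
  #$ -S /bin/bash
  #$ -cwd
  #$ -j y
  #$ -l h_vmem=8G
  # Print the limits the job actually runs under.  With a hard h_vmem
  # request, the "virtual memory" (ulimit -v) line should match the
  # requested value (8G = 8388608 kbytes).
  ulimit -a

Submitting that with a plain 'qsub ulimit_check.sh' and comparing the "virtual memory" line in the output file against the requested h_vmem should show whether the limit is actually reaching the job environment.
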
Brett Taylor
Systems Administrator
Center for Systems and Computational Biology
The Wistar Institute
3601 Spruce St.
Room 214
Philadelphia PA 19104
Tel: 215-495-6914
Sending me a large file? Use my secure dropbox:
https://cscb-filetransfer.wistar.upenn.edu/dropbox/[email protected]

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Alex Chekholko
Sent: Friday, February 08, 2013 2:20 PM
To: [email protected]
Subject: Re: [gridengine users] h_vmem not actually restricting memory usage?

Hi Brett,

I saw the issue with JOB vs YES when the jobs were requesting more than one
complex, e.g.

  qsub -l h_vmem=10G,h_stack=10M

was not hitting memory limits when "JOB" was set. If your jobs are only
requesting one complex, the JOB setting should work as expected, and you can
try both ways and compare the results.

For troubleshooting, I also usually just make a super-short job script that
prints the output of 'ulimit -a' from inside the job environment, and check
that the 'ulimit -v' value matches your h_vmem value. Also check your
'qhost -F h_vmem' output to see that it looks as you expect.

Regards,
Alex

On 2/7/13 12:42 PM, Brett Taylor wrote:
> Hello,
>
> I've been testing out the h_vmem settings for a while now, and currently I
> have this setup:
>
> Exec host
>   complex_values   slots=36,h_vmem=142G
> high_priority.q
>   h_vmem     INFINITY
>   slots      24
>   priority   0
> low_priority.q
>   h_vmem     INFINITY
>   priority   18
>   slots      12
> qconf -sc
>   h_vmem   h_vmem   MEMORY   <=   YES   YES   3.95G   0
>
> I know there has been discussion of a bug with respect to setting the
> complex to JOB, which is why I settled on this configuration a few months
> ago in order to have two queues without oversubscribing the memory.
> However, this doesn't seem to actually limit the memory usage at run time,
> the way I have seen GE do before.
>
> I have one script that I have been using to benchmark my cluster and work
> out the queue stats. It runs tophat and bowtie, and my metrics for knowing
> whether the memory is being limited are the "Max vmem:" and "Wall clock
> time:" stats. If the memory isn't limited and I submit the job using 24
> cores, I see "Max vmem: 35.342G" and a wall clock time around 2:20:00:00.
> When I was able to limit the vmem, I saw stats more like "Wallclock Time =
> 19:51:49 ... Max vmem = 3.932G". As you can see, 19 hours is a lot quicker
> than 2 days.
>
> I don't have definitive proof, but I think changing the complex to JOB and
> setting a limit in the queue definition, instead of INFINITY, might restore
> the actual runtime limit. But then I wouldn't be able to have two queues
> the way I have them now. I'd like to test this myself, but my tiny cluster
> is full at the moment. Can anyone confirm these settings for me?
>
> Thanks,
> Brett
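
P.S. To be concrete about the change I'm proposing to test once the cluster
frees up, it would be roughly the following; the per-queue value just reuses
the 71G figure from my test above, so treat it as illustrative rather than
validated:

  # qconf -mc: flip the consumable column for h_vmem from YES to JOB
  #name    shortcut  type    relop  requestable  consumable  default  urgency
  h_vmem   h_vmem    MEMORY  <=     YES          JOB         3.95G    0

  # qconf -mq high_priority.q and qconf -mq low_priority.q: a real per-job
  # limit instead of INFINITY
  h_vmem                71G

  # qconf -me <exec host>: the overall pool stays as it is now
  complex_values        slots=36,h_vmem=142G

My understanding is that with the consumable set to JOB the h_vmem request is
counted against the host's 142G once per job rather than once per slot, which
is part of what I'd like someone to confirm.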
