Hi Brett,
I saw the issue with JOB vs. YES when jobs requested more than one
complex. For example, qsub -l h_vmem=10G,h_stack=10M was not hitting the
memory limits when the consumable was set to "JOB".
If your jobs only request one complex, the JOB setting should work as
expected; you can try it both ways and compare the results.
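For example (just a rough sketch; the test job name and the 10G value are placeholders):

  # edit the consumable column of the h_vmem line (YES or JOB), then resubmit
  qconf -mc
  qsub -l h_vmem=10G my_test_job.sh

and compare whether the limit is enforced in each case.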
For troubleshooting, I also usually just make a super-short script that
prints the output of 'ulimit -a' from inside the job environment, and
check that the 'ulimit -v' value matches your h_vmem request.
Also check your 'qhost -F h_vmem' output to see that it looks as you expect.
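Something along these lines (a minimal sketch; the script name and the 4G request are just examples):

  $ cat check_limits.sh
  #!/bin/bash
  # print the resource limits seen inside the job environment
  ulimit -a

  $ qsub -cwd -j y -o limits.out -l h_vmem=4G check_limits.sh

The 'virtual memory' line in limits.out should line up with the h_vmem you requested.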
Regards,
Alex
On 2/7/13 12:42 PM, Brett Taylor wrote:
Hello,
I've been testing out the h_vmem settings for a while now, and currently I have
this setup:
Exec host:
  complex_values  slots=36,h_vmem=142G

high_priority.q:
  h_vmem    INFINITY
  slots     24
  priority  0

low_priority.q:
  h_vmem    INFINITY
  priority  18
  slots     12
qconf -sc (h_vmem line):
  #name    shortcut  type    relop  requestable  consumable  default  urgency
  h_vmem   h_vmem    MEMORY  <=     YES          YES         3.95G    0
I know there has been discussion of a bug with respect to setting the
complex to JOB, which is why I settled on this configuration a few months ago
in order to have two queues without oversubscribing the memory. However, this
doesn't seem to actually limit the memory usage at run time, as I have
seen GE do before.
I have one script that I have been using to benchmark my cluster and work out
the queue stats. It runs tophat and bowtie, and my metrics for telling whether
the memory is being limited are the "Max vmem:" and "Wall clock time:" stats.
If the memory isn't limited and I submit the job using 24 cores, I'll see
"Max vmem: 35.342G" and a wall clock time around 2:20:00:00. When I was able
to limit the vmem, I saw stats more like "Wallclock Time = 19:51:49... Max vmem
= 3.932G". As you can see, 19 hours is a lot quicker than 2 days.
I don't have definitive proof, but I think changing to JOB and setting a limit
in the queue definition, instead of INFINITY, might restore the actual runtime
limit. But then I wouldn't be able to have two queues the way I have them
now. I'd like to test this myself, but my tiny cluster is full at the moment.
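Roughly, the change I have in mind would look like this (untested, and the 120G figure is just an example):

  # qconf -mc: change the consumable column for h_vmem from YES to JOB
  #name    shortcut  type    relop  requestable  consumable  default  urgency
  h_vmem   h_vmem    MEMORY  <=     YES          JOB         3.95G    0

  # qconf -mq high_priority.q: replace INFINITY with a hard per-job limit
  h_vmem   120G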
Can anyone confirm these settings for me?
Thanks,
Brett
Brett Taylor
Systems Administrator
Center for Systems and Computational Biology
The Wistar Institute
3601 Spruce St.
Room 214
Philadelphia PA 19104
Tel: 215-495-6914
Sending me a large file? Use my secure dropbox:
https://cscb-filetransfer.wistar.upenn.edu/dropbox/[email protected]