On 07.02.2013, at 23:24, Brett Taylor wrote:

> Maybe I didn't describe things as clearly as I should have. Or maybe I just
> don't understand your response.
>
> My second queue is really only there for "emergencies", when someone needs
> to run something small but the main queue is filled up for days. So right
> now, it's accounting for the slots and the memory as I want, in that I have
> 142G total, and once that 142G is spoken for it can't assign more jobs to
> that host, whether it is in the main queue or the secondary queue. But my
> issue is that GE no longer seems to actually place a hard limit on the
> scripts that are running. At one time, I was able to say
> `-pe smp 24 -l h_vmem=3.5G` and my script would stay right around 3.5G and
> finish in 19 hours. At other times, the exact same script with the exact
> same variables at submission would try to use 35+G (I am pretty sure these
> are per core/slot numbers) and take 3 days to complete, which is the same as
> if I had no h_vmem settings at all.
>
> So I guess the boiled-down version of my question is: would dropping back to
> only one queue, setting the complex configuration to JOB, and then setting
> the queue configuration to "h_vmem 142G"
Unless you request h_vmem in the job submission, a setting of h_vmem in the
queue configuration will put an upper limit on the job's memory usage (and on
the possible request per job), but it won't be withdrawn automatically from
the consumable complex.

> fix the issue with my script and get it back to the ~19 hour speed?

I can't make any reliable statement, as I don't know your application in
detail.

-- Reuti


> Brett Taylor
> Systems Administrator
> Center for Systems and Computational Biology
>
> The Wistar Institute
> 3601 Spruce St.
> Room 214
> Philadelphia PA 19104
> Tel: 215-495-6914
> Sending me a large file? Use my secure dropbox:
> https://cscb-filetransfer.wistar.upenn.edu/dropbox/[email protected]
>
>
> -----Original Message-----
> From: Reuti [mailto:[email protected]]
> Sent: Thursday, February 07, 2013 4:58 PM
> To: Brett Taylor
> Cc: [email protected]
> Subject: Re: [gridengine users] h_vmem not actually restricting memory usage?
>
> On 07.02.2013, at 21:42, Brett Taylor wrote:
>
>> Hello,
>>
>> I've been testing out the h_vmem settings for a while now, and currently I
>> have this setup:
>>
>> Exec host
>>   complex_values   slots=36,h_vmem=142G
>> high_priority.q
>>   h_vmem           INFINITY
>
> This is not the consumable complex per se. You would also add this in the
> queue definition's complex_values to have a consumable per queue instance.
>
>>   slots            24
>>   priority         0
>> low_priority.q
>>   h_vmem           INFINITY
>>   priority         18
>>   slots            12
>> qconf -sc
>>   h_vmem  h_vmem  MEMORY  <=  YES  YES  3.95G  0
>>
>> I know that there has been discussion of a bug with respect to setting the
>> complex to JOB, which is why I settled on this configuration a few months
>> ago in order to have two queues without oversubscribing the memory.
>> However, this doesn't seem to actually limit the memory usage at run time,
>> like I have seen GE do before.
>>
>> I have one script that I have been using to benchmark my cluster and
>> figure out the queue stats. It runs tophat and bowtie, and my metrics for
>> knowing whether the memory is being limited are the "Max vmem:" and
>> "Wall clock time:" stats. If the memory isn't limited and I submit the job
>> using 24 cores, I'll see "Max vmem: 35.342G" and a wall clock time around
>> 2:20:00:00. When I was able to limit the vmem, I saw stats more like
>> "Wallclock Time = 19:51:49 ... Max vmem = 3.932G". As you can see,
>> 19 hours is a lot quicker than 2 days.
>
> You mean this sounds like the application is confused by too much memory?
>
> -- Reuti
>
>
>> I don't have definitive proof, but I think changing to JOB and setting a
>> limit in the queue definition, instead of INFINITY, might restore the
>> actual runtime limit. But then I wouldn't be able to have two queues in
>> the way I have them now. I'd like to test this myself, but my tiny cluster
>> is full at the moment. Can anyone confirm these settings for me?
>>
>> Thanks,
>> Brett
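
For reference, a minimal sketch of the setup discussed above: the host-wide
per-slot consumable plus a memory request at submission time. The host name
"node01" and script name "benchmark.sh" are placeholders, not taken from the
thread, and the commands are an illustration rather than a tested recipe:

    # h_vmem as a per-slot (consumable=YES) complex, as shown by "qconf -sc"
    # above; edit the line with "qconf -mc" if needed:
    #   h_vmem  h_vmem  MEMORY  <=  YES  YES  3.95G  0

    # Host-wide pool that both queues draw from (placeholder host "node01"):
    qconf -mattr exechost complex_values "slots=36,h_vmem=142G" node01

    # A submission that both reserves and limits memory. With a YES
    # consumable the request is multiplied by the slot count, so this books
    # 24 x 3.5G = 84G of the 142G pool; the h_vmem hard limit is enforced on
    # the job's processes via setrlimit:
    qsub -pe smp 24 -l h_vmem=3.5G benchmark.sh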
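
And a sketch of the single-queue variant asked about above: the consumable
attribute set to JOB, with the queue itself capping usage at the host's 142G.
Again the script name is a placeholder, and the JOB-consumable issue with
parallel jobs mentioned in the thread may still apply, so treat this only as
an illustration:

    # Complex line with the consumable attribute set to JOB ("qconf -mc"):
    #   h_vmem  h_vmem  MEMORY  <=  YES  JOB  3.95G  0

    # Make h_vmem a per-queue-instance consumable and set the hard queue
    # limit ("-aattr" appends to complex_values instead of overwriting it):
    qconf -aattr queue complex_values "h_vmem=142G" high_priority.q
    qconf -mattr queue h_vmem "142G" high_priority.q

    # With a JOB consumable the request is accounted once per job rather
    # than per slot, so the job's total memory is requested directly:
    qsub -pe smp 24 -l h_vmem=84G benchmark.sh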
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
