Maybe I didn't describe things as clearly as I should have.  Or maybe I just 
don't understand your response.  

My second queue is really only there for "emergencies" when someone needs to 
run something small but the main queue is filled up for days.  So right now, 
it's accounting for the slots and the memory as I want, in that I have 142G 
total, and once that 142G is spoken for it can't assign more jobs to that host, 
whether the job is in the main queue or the secondary queue.  But my issue is 
that GE no longer seems to be placing an actual physical limit on the scripts 
that are running.  At one time, I was able to say `-pe smp 24 -l h_vmem=3.5G` 
and my script would stay right around 3.5G and finish in 19 hours.  At other 
times, the exact same script with the exact same variables at submission would 
try to use 35+G (I am pretty sure these are per core/slot numbers) and take 3 
days to complete, which is the same as if I had no h_vmem setting at all.
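For reference, assuming h_vmem really is a per-slot consumable as in my 
`qconf -sc` output below, my understanding of the submission is sketched here 
(the script name is just a placeholder for my benchmark script):

```shell
# Per-slot h_vmem: 24 slots x 3.5G = 84G total reserved on the host.
# (run_tophat.sh is a placeholder for my tophat/bowtie benchmark.)
qsub -pe smp 24 -l h_vmem=3.5G run_tophat.sh

# Afterwards I compare the accounting numbers for the finished job:
qacct -j <job_id> | grep -iE 'maxvmem|wallclock'
```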

So I guess the boiled-down version of my question is: would dropping back to 
only one queue, setting the complex config to JOB, and then setting the queue 
config to "h_vmem 142G" fix the issue with my script and get it back to the 
~19 hour speed?
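In case it helps, here is a sketch of the change I am considering, untested so 
far (the queue name all.q is just an example; the indented lines are the 
fields I would edit in the interactive editor):

```shell
# 1. Change the h_vmem consumable from per-slot (YES) to per-job (JOB):
qconf -mc
#   name    shortcut  type    relop requestable consumable default  urgency
#   h_vmem  h_vmem    MEMORY  <=    YES         JOB        3.95G    0

# 2. Drop back to a single queue and set a hard limit there
#    instead of INFINITY:
qconf -mq all.q
#   h_vmem                142G
```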

Brett Taylor
Systems Administrator
Center for Systems and Computational Biology

The Wistar Institute
3601 Spruce St.
Room 214
Philadelphia PA 19104
Tel: 215-495-6914
Sending me a large file? Use my secure dropbox:
https://cscb-filetransfer.wistar.upenn.edu/dropbox/[email protected]


-----Original Message-----
From: Reuti [mailto:[email protected]] 
Sent: Thursday, February 07, 2013 4:58 PM
To: Brett Taylor
Cc: [email protected]
Subject: Re: [gridengine users] h_vmem not actually restricting memory usage?

Am 07.02.2013 um 21:42 schrieb Brett Taylor:

> Hello,
> 
> I've been testing out the h_vmem settings for a while now, and currently I 
> have this setup:
> 
> Exec host
>       complex_values        slots=36,h_vmem=142G
> high_priority.q
>       h_vmem INFINITY

This is not the consumable complex itself. You would also need to add it to the 
queue definition's complex_values to get a consumable per queue instance.
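A minimal sketch of that suggestion, using the queue name from the config 
quoted below (verify the exact fields with `qconf -sq`):

```shell
# Per-queue-instance consumable: add h_vmem to the queue's
# complex_values, in addition to the exec-host entry:
qconf -mq high_priority.q
#   complex_values        h_vmem=142G
```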


>       slots                 24
>       priority              0
> low_priority.q
>       h_vmem INFINITY
>       priority              18
>       slots                 12
> qconf -sc
>       h_vmem    h_vmem    MEMORY    <=    YES    YES    3.95G    0
> 
> I know that there has been discussion of a bug with respect to setting the 
> complex to JOB, which is why I settled on this configuration a few months ago 
> in order to have two queues without oversubscribing the memory.  However, 
> this doesn't seem to actually limit the memory usage during run time, like I 
> have seen GE do before.
> 
> I have one script that I have been using to benchmark my cluster and figure 
> out the queue stats.  It runs tophat and bowtie and my metrics for knowing if 
> the memory is being limited are the "Max vmem:" and "Wall clock time:" stats. 
> If the memory isn't limited and I submit the job using 24 cores, I'll see 
> "Max vmem: 35.342G" and a wall clock time around 2:20:00:00.  When I was 
> able to limit the vmem, I saw stats more like " Wallclock Time   = 
> 19:51:49... Max vmem         = 3.932G".  As you can see, 19 hours is a lot 
> quicker than 2 days.

You mean this sounds like the application is confused by too much memory?

-- Reuti


> I don't have definitive proof, but I think changing to JOB and setting a 
> limit in the queue definition, instead of INFINITY, might restore the actual 
> runtime limit. But, then I wouldn't be able to have two queues in the way I 
> have them now.  I'd like to test this myself but my tiny cluster is full at 
> the moment. Can anyone confirm these settings for me?
> 
> Thanks,
> Brett
> 
> 
> Brett Taylor
> Systems Administrator
> Center for Systems and Computational Biology
> 
> The Wistar Institute
> 3601 Spruce St.
> Room 214
> Philadelphia PA 19104
> Tel: 215-495-6914
> Sending me a large file? Use my secure dropbox:
> https://cscb-filetransfer.wistar.upenn.edu/dropbox/[email protected]
> 
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
> 


