On Thu, 4 Jun 2015 06:18:36 +0000 "sudha.penme...@wipro.com" <sudha.penme...@wipro.com> wrote:
> Hi,
>
> While running jobs in a parallel environment, if we want to run a job on the
> grid using 4 cores with a total memory consumption of 40G, we define it as,
> for example:
>
>   qrsh -V -cwd -q test.q -l mem_free=40G,h_vmem=10G -pe sharedmem 4 sleep 40
>
> However, this assumes that each of the threads consumes at most 10G of
> memory; the total h_vmem consumed on the execution host is 40G.
>
> Our experiments have shown that when running the job on a single core it
> requires the 40G of memory, but if we divide the 40G by four (running with
> "-pe sharedmem 4") the job crashes with out-of-memory.
>
> One option is to run it like this:
>
>   qrsh -V -cwd -q test.q -l mem_free=40G,h_vmem=40G -pe sharedmem 4 sleep 40
>
> however, then we end up consuming 160G of h_vmem from the execution host.
>
> So how do we ensure that each thread consumes memory only if needed, i.e.
> that 40G of total h_vmem is consumed and each thread can use up to 40G if
> needed?
>
> One option, of course, is to leave out the h_vmem definition:
>
>   qrsh -V -cwd -q test.q -l mem_free=40G -pe sharedmem 4 sleep 40
>
> however, then other users might eat the memory on the host and our run
> crashes again.

This depends on how your cluster is set up. It sounds like you have h_vmem
configured as a consumable. Are you the admin on your cluster?

One trick we use here is a complex resource to represent the number of threads
a multithreaded process requires, rather than a PE, so h_vmem isn't scaled
with the number of threads. Our JSV/defaults ensure everyone requests at least
one thread per slot. However, this is on a rather old version of Grid Engine
and might not play nicely with core binding or cpuset/cgroup integration.

--
William Hay <w....@ucl.ac.uk>
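For reference, William's consumable-complex trick might look something like the
sketch below. This is a guess at the setup he describes, not his actual
configuration: the complex name `threads`, its shortcut, the host capacity
value, and the queue name are all assumptions made up for illustration.

```shell
# Hypothetical consumable complex added via `qconf -mc` (names are assumptions):
#
#   #name     shortcut  type  relop  requestable  consumable  default  urgency
#   threads   thr       INT   <=     YES          YES         1        0
#
# plus a per-host capacity set via `qconf -me <hostname>`, e.g.:
#
#   complex_values        threads=16
#
# The job then requests ONE slot plus 4 thread tokens. Because h_vmem is a
# per-slot consumable, it is now charged once (40G) instead of once per PE
# slot (4 x 40G = 160G as in the `-pe sharedmem 4` variant above):
qrsh -V -cwd -q test.q -l mem_free=40G,h_vmem=40G,threads=4 sleep 40
```

As William notes, a JSV (or sge_request defaults) would need to force
`threads` >= slots so ordinary multi-slot jobs still account for their cores,
and without cgroup/core-binding integration nothing physically stops the
process from using more cores than it requested.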
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users