On Thu, 4 Jun 2015 06:18:36 +0000 "sudha.penme...@wipro.com" <sudha.penme...@wipro.com> wrote:
> Hi,
>
> While running jobs in a parallel environment, if we want to run a job on the
> grid using 4 cores with a total memory consumption of 40G, we define it as,
> for example:
>
>   qrsh -V -cwd -q test.q -l mem_free=40G,h_vmem=10G -pe sharedmem 4 sleep 40
>
> However, this assumes that each of the threads consumes at most 10G of
> memory; the total h_vmem consumed on the execution host is 40G.
>
> Our experiments have shown that when running the job on a single core it
> requires the 40G of memory, but if we divide the 40G by four (running with
> "-pe sharedmem 4") the job crashes with out-of-memory.
>
> One option is to run it like this:
>
>   qrsh -V -cwd -q test.q -l mem_free=40G,h_vmem=40G -pe sharedmem 4 sleep 40
>
> however, then we end up consuming 160G of h_vmem from the execution host.
>
> So how do we ensure that each thread consumes memory only if needed, i.e.
> that 40G of total h_vmem is consumed and each thread can use up to 40G if
> needed?
>
> One option, of course, is to leave out the h_vmem definition:
>
>   qrsh -V -cwd -q test.q -l mem_free=40G -pe sharedmem 4 sleep 40
>
> however, then other users might eat the memory on the host and our run
> crashes again.

This depends on how your cluster is set up. It sounds like you have h_vmem
configured as a consumable. Are you the admin on your cluster?

One trick we use here is a complex resource to represent the number of threads
a multithreaded process requires, rather than a PE, so h_vmem isn't scaled
with the number of threads. Our JSV/defaults ensure everyone requests at least
one thread per slot. However, this is on a rather old version of Grid Engine
and might not play nicely with core binding or cpuset/cgroup integration.

--
William Hay <w....@ucl.ac.uk>
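For reference, William's consumable-complex trick might look something like the
sketch below. This is a guess at the setup he describes, not his actual
configuration: the complex name `threads`, its shortcut, the host capacity
value, and the queue name are all assumptions made up for illustration.

```shell
# Hypothetical consumable complex added via `qconf -mc` (names are assumptions):
#
#   #name     shortcut  type  relop  requestable  consumable  default  urgency
#   threads   thr       INT   <=     YES          YES         1        0
#
# plus a per-host capacity set via `qconf -me <hostname>`, e.g.:
#
#   complex_values        threads=16
#
# The job then requests ONE slot plus 4 thread tokens. Because h_vmem is a
# per-slot consumable, it is now charged once (40G) instead of once per PE
# slot (4 x 40G = 160G as in the `-pe sharedmem 4` variant above):
qrsh -V -cwd -q test.q -l mem_free=40G,h_vmem=40G,threads=4 sleep 40
```

As William notes, a JSV (or sge_request defaults) would need to force
`threads` >= slots so ordinary multi-slot jobs still account for their cores,
and without cgroup/core-binding integration nothing physically stops the
process from using more cores than it requested.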
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users