I started another job with a vsize limit of 50G, and it got killed as well. Third time was indeed the charm: I tried 80G and it worked. Logging ps output throughout the run, I found a peak vsize of 53G (!?). Remember, this is a samtools sort process started with "-@4 -m5G", which ended up using >13G per thread, >260% of what I requested (!).
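
For anyone who wants to check the arithmetic on their own runs, here is a minimal sketch in Python. The function name and arguments are hypothetical, made up for illustration; it only divides the peak vsize reported by ps by the thread count and compares the result with the "-m" value, and it assumes memory is split evenly across threads, which may not match what samtools does internally.

```python
def overshoot(peak_vsize_g, threads, per_thread_limit_g):
    """Return (peak memory per thread in G, percent of the requested -m value).

    Assumes an even split across threads; this is an illustration only,
    not a description of how samtools allocates memory.
    """
    per_thread = peak_vsize_g / threads
    percent = 100.0 * per_thread / per_thread_limit_g
    return per_thread, percent


# Numbers from the run above: peak vsize 53G with "-@4 -m5G".
per_thread, percent = overshoot(peak_vsize_g=53, threads=4, per_thread_limit_g=5)
print(f"{per_thread:.2f}G per thread, {percent:.0f}% of the requested -m value")
```

With the numbers from this run, that gives 13.25G per thread, or 265% of the requested 5G.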
I don't care that much about how much RAM a process requires, as long as it's reasonable. But in a distributed environment (e.g. grid engine) where various users need various resources, it is desirable to have an accurate estimate of what a particular job needs. In that sense, I would like to suggest two things to the maintainers:

1. Fix the approximation. I'm pretty sure this has to do with the size of the reads: the process probably grossly underestimates the length of the reads and tries to load too many of them into a buffer.

2. In addition to 1, "-m" should always be a hard limit, not an approximation. The extra memory required to sort should all be handled under the hood: if the command-line parameters are "-@4 -m5G", you know the vsize limit is 20G, so if you need a 2G internal buffer, just give each thread (20-2)/4=4.5G. A grid engine job trying to run "-@4 -m5G" should never fail with e.g. a very reasonable "-l h_vmem=6G -pe smp 4".

On Mon, 19 Jan 2015 13:52:14 +0100
Devon Ryan <dpr...@dpryan.com> wrote:

> Hi Matei,
>
> Just as a point of reference, the -m option sets an approximate size
> limit on the memory needed to hold the alignments that are to be
> sorted. That doesn't include the memory needed to do the actual
> sorting. The memory needed for that will actually depend on how many
> alignments were needed to hit the -m option. I would guess that it's
> that additional memory required that's causing the problem (someone
> would have to look into how the actual merge sort is implemented to
> see if the additional 50% is reasonable).
>
> Devon
>
> --
> Devon Ryan, Ph.D.
> Email: dpr...@dpryan.com
> Laboratory for Molecular and Cellular Cognition
> German Centre for Neurodegenerative Diseases (DZNE)
> Ludwig-Erhard-Allee 2
> 53175 Bonn
> Germany
> <devon.r...@dzne.de>
>
> On Fri, Jan 16, 2015 at 6:35 PM, Matei David <ma...@cs.toronto.edu>
> wrote:
>
> > Hi,
> >
> > I'm using samtools 1.0, and I'm seeing a process started with
> > "samtools sort -@ 4 -m 5G ..."
> > reach VSZ 30441316 (and get killed by SGE or the kernel because of
> > reaching ulimit). 4 threads x 5G should be 20G at most, right?
> >
> > The samtools sort documentation mentions that "-m" is
> > "approximately" max mem per thread. I'm not sure what to make of
> > that, is there a range we could expect? In my case, the
> > approximation seems to be quite far off: 30/4 = 7.5, which is 150%
> > of what I specified as max.
> >
> > If that might matter, the reads I'm dealing with are quite big
> > (>4Kbp average length), could that be causing problems with RAM
> > usage estimation?
> >
> > What h_vmem should I ask for to safely sort with "-@ 4 -m 5G"?
> >
> > Thanks,
> > Matei
> >
> >
> > ------------------------------------------------------------------------------
> > New Year. New Location. New Benefits. New Data Center in Ashburn,
> > VA. GigeNET is offering a free month of service with a new server
> > in Ashburn. Choose from 2 high performing configs, both with 100TB
> > of bandwidth. Higher redundancy.Lower latency.Increased
> > capacity.Completely compliant. http://p.sf.net/sfu/gigenet
> > _______________________________________________
> > Samtools-help mailing list
> > Samtools-help@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/samtools-help