I have the same concern: samtools sort uses much more RAM than the requested size. I guess this is because samtools reserves extra RAM to reduce the number of malloc() calls. It would be good to improve this. For example, samtools could periodically free unused memory.
I sometimes use sambamba for sorting. It also uses more RAM than the requested size, but not as much.

Heng

On Jan 19, 2015, at 7:52, Devon Ryan <dpr...@dpryan.com> wrote:

> Hi Matei,
>
> Just as a point of reference, the -m option sets an approximate size limit on
> the memory needed to hold the alignments that are to be sorted. That doesn't
> include the memory needed to do the actual sorting. The memory needed for
> that will actually depend on how many alignments were needed to hit the -m
> option. I would guess that it's that additional memory required that's
> causing the problem (someone would have to look into how the actual merge
> sort is implemented to see if the additional 50% is reasonable).
>
> Devon
>
> --
> Devon Ryan, Ph.D.
> Email: dpr...@dpryan.com
> Laboratory for Molecular and Cellular Cognition
> German Centre for Neurodegenerative Diseases (DZNE)
> Ludwig-Erhard-Allee 2
> 53175 Bonn
> Germany
>
> On Fri, Jan 16, 2015 at 6:35 PM, Matei David <ma...@cs.toronto.edu> wrote:
> Hi,
>
> I'm using samtools 1.0, and I'm seeing a process started with
> "samtools sort -@ 4 -m 5G ..."
> reach VSZ 30441316 (and get killed by SGE or the kernel because of
> reaching ulimit). 4 threads x 5G should be 20G at most, right?
>
> The samtools sort documentation mentions that "-m" is "approximately"
> max mem per thread. I'm not sure what to make of that, is there a
> range we could expect? In my case, the approximation seems to be quite
> far off: 30/4 = 7.5, which is 150% of what I specified as max.
>
> If that might matter, the reads I'm dealing with are quite big (>4Kbp
> average length), could that be causing problems with RAM usage
> estimation?
>
> What h_vmem should I ask for to safely sort with "-@ 4 -m 5G"?
>
> Thanks,
> Matei
>
> _______________________________________________
> Samtools-help mailing list
> Samtools-help@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/samtools-help
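
To make the arithmetic in this thread concrete, here is a minimal Python sketch that compares the reported VSZ with the nominal "-@ 4 -m 5G" budget. It assumes that the VSZ figure is in KiB (as ps usually reports it) and that "5G" means 5 GiB. The function names and the 2.0 headroom factor are hypothetical and chosen only for illustration; nothing in the thread or in the samtools documentation states a safe multiplier.

```python
KIB = 1024
GIB = 1024 ** 3


def nominal_budget_bytes(threads, per_thread_gib):
    """Memory implied by '-@ threads -m per_thread_gib' if -m were a hard cap."""
    return threads * per_thread_gib * GIB


def overshoot_ratio(vsz_kib, threads, per_thread_gib):
    """Observed virtual size divided by the nominal budget."""
    return (vsz_kib * KIB) / nominal_budget_bytes(threads, per_thread_gib)


def limit_with_headroom_gib(threads, per_thread_gib, headroom):
    """Scale the nominal budget by a headroom factor (an assumption, not a rule)."""
    return threads * per_thread_gib * headroom


# Figures from the report: VSZ 30441316 with "samtools sort -@ 4 -m 5G".
ratio = overshoot_ratio(30441316, threads=4, per_thread_gib=5)
observed_gib = 30441316 * KIB / GIB

print(f"observed VSZ: {observed_gib:.2f} GiB")
print(f"observed / nominal: {ratio:.2f}")
# 2.0 is an arbitrary illustrative margin, not a measured requirement.
print(f"limit with 2.0x headroom: {limit_with_headroom_gib(4, 5, 2.0):.0f} GiB")
```

Under these assumptions the process reached roughly 1.45 times the nominal 20 GiB, which is in line with the "additional 50%" discussed above. A real limit would have to be determined by measuring peak usage on representative data.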