On Sun, Jan 08, 2017 at 06:55:03PM -0500, Angela Zou wrote:
> I read somewhere that Heng Li said that bam files should be smaller after
> sorting because sorting gives a better compression ratio. However, for me,
> after sorting my bam file went from ~31 GB to ~34 GB. I was wondering if
> this is normal or that there is something wrong with my bam file?

I doubt there is anything "wrong", but it's atypical.

Name-sorted data means similar read names are adjacent and name
compression is usually good.  However the DNA is likely very
dissimilar to the previous/next line so DNA compression is poor.

Genome-position-sorted data means read names are essentially in a
random order, so name compression is poorer.  However, the DNA
string is likely to look very similar to the previous line, which
improves the compression ratio.  Usually this reduction far outweighs
the growth in the compressed read names.
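
A rough way to see this trade-off is a toy simulation. The sketch below is
an assumption-laden model, not how samtools measures anything: a uniformly
random 200 kb "genome", error-free 100 bp reads at 10x coverage, hypothetical
names READ00000000, READ00000001, ... in sequencing order, and zlib's DEFLATE
standing in for BGZF (the blocked gzip that BAM uses). It compresses names and
sequences as separate streams, which is a simplification since BAM interleaves
fields within each record.

```python
import random
import zlib


def simulate_reads(genome_len, coverage, read_len=100, seed=1):
    """Toy model (assumption): uniform random genome, error-free reads at
    uniformly random positions, named in the order they were 'sequenced'."""
    rng = random.Random(seed)
    genome = "".join(rng.choices("ACGT", k=genome_len))
    n_reads = int(genome_len * coverage / read_len)
    reads = []
    for i in range(n_reads):
        pos = rng.randrange(genome_len - read_len + 1)
        # Hypothetical zero-padded names, so lexical order == sequencing order.
        reads.append((f"READ{i:08d}", pos, genome[pos:pos + read_len]))
    return reads


def compressed_sizes(reads):
    """Compress the name and sequence fields as two separate streams
    (a simplification: BAM interleaves fields within each record)."""
    names = "\n".join(name for name, _, _ in reads).encode()
    seqs = "\n".join(seq for _, _, seq in reads).encode()
    return len(zlib.compress(names, 6)), len(zlib.compress(seqs, 6))


reads = simulate_reads(genome_len=200_000, coverage=10)
by_name = sorted(reads, key=lambda r: r[0])  # analogous to `samtools sort -n`
by_pos = sorted(reads, key=lambda r: r[1])   # analogous to plain `samtools sort`

name_sorted = compressed_sizes(by_name)
pos_sorted = compressed_sizes(by_pos)
for label, (n, s) in (("name-sorted", name_sorted), ("position-sorted", pos_sorted)):
    print(f"{label:>15}: names={n:>7} seqs={s:>7} total={n + s:>7} bytes")
```

With these parameters the name stream should come out somewhat larger in
position order and the sequence stream much smaller. Real read names
(instrument, tile, x/y coordinates) and real genomes will shift the exact
numbers.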

However, if you have very low coverage, adjacent reads in position-sorted
data rarely overlap, so sorting doesn't help sequence compression (while
still harming name compression).
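
The same kind of toy model (again an assumption: random genome, error-free
reads, zlib in place of BGZF) can show the coverage dependence. The
hypothetical helper sequence_gain below returns how many times smaller the
sequence stream gets when reads are position-sorted rather than left in random
(e.g. name) order.

```python
import random
import zlib


def sequence_gain(genome_len, coverage, read_len=100, seed=1):
    """Hypothetical helper: compressed size of the sequence stream in random
    (e.g. name-sorted) order divided by its size in position-sorted order,
    under a toy model of a uniform random genome and error-free reads."""
    rng = random.Random(seed)
    genome = "".join(rng.choices("ACGT", k=genome_len))
    n_reads = int(genome_len * coverage / read_len)
    starts = [rng.randrange(genome_len - read_len + 1) for _ in range(n_reads)]

    def stream_size(order):
        data = "\n".join(genome[p:p + read_len] for p in order).encode()
        return len(zlib.compress(data, 6))

    # `starts` is in generation order, i.e. random with respect to position.
    return stream_size(starts) / stream_size(sorted(starts))


low = sequence_gain(genome_len=1_000_000, coverage=0.1)  # ~1,000 reads
high = sequence_gain(genome_len=200_000, coverage=10)    # ~20,000 reads
print(f"position-sort gain for sequences: 0.1x -> {low:.2f}, 10x -> {high:.2f}")
```

At 0.1x the ratio should sit close to 1, because reads that end up next to
each other rarely share any bases, whereas at 10x it should be well above 1.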

Fundamentally though, you need the data position-sorted anyway (e.g. to
index it), and the reduction in size is just a bonus.  If size really
matters, switch to CRAM, which stores bases relative to the reference. :-)

James

-- 
James Bonfield (j...@sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova
                                  | Plurima gyrabant gymbolitare vabo;
  A Staden Package developer:     | Et Borogovorum mimzebant undique formae,
https://sf.net/projects/staden/   | Momiferique omnes exgrabure Rathi. 

_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
