On Sun, Jan 08, 2017 at 06:55:03PM -0500, Angela Zou wrote:
> I read somewhere that Heng Li said that bam files should be smaller after
> sorting because sorting gives a better compression ratio. However, for me,
> after sorting my bam file went from ~31 GB to ~34 GB. I was wondering if
> this is normal or that there is something wrong with my bam file?

I doubt there is anything "wrong", but it is atypical.

In name-sorted data, similar read names are adjacent, so name compression
is usually good. However, the DNA of each read is likely to be very
dissimilar to that of the previous and next records, so sequence
compression is poor.

In data sorted by genome position, read names are in essentially random
order, so name compression is poorer. However, each read's DNA string is
likely to look very similar to the previous record's, because neighbouring
reads overlap the same stretch of the genome, and that improves the
compression ratio. Usually this reduction far outweighs the growth in read
names. With very low coverage, though, adjacent reads rarely overlap, so
position sorting does not help sequence compression while still harming
name compression.

Fundamentally, though, you need the data position-sorted anyway, and the
reduction in size is just a bonus.

If size really matters, switch to CRAM. :-)

James

-- 
James Bonfield (j...@sanger.ac.uk)
A Staden Package developer: https://sf.net/projects/staden/
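
The trade-off described above can be tried out on toy data. The sketch
below is only an illustration under loud assumptions: it is not how
samtools or BGZF work internally. It builds a random "genome", draws
error-free reads of a fixed length from it, gives them made-up sequential
names (r0000000, r0000001, ...), and compresses plain "name<TAB>sequence"
text records with zlib in name order and in position order. The function
names, the record layout, and the genome and read counts are all
hypothetical choices made for this example.

```python
import random
import zlib

READ_LEN = 100  # assumed fixed read length for this toy model

# Map every byte value to one of A/C/G/T so random bytes become random DNA.
BASE_TABLE = bytes(b"ACGT"[i & 3] for i in range(256))


def random_genome(length, rng):
    """Return `length` pseudo-random bases as a bytes object."""
    raw = rng.getrandbits(8 * length).to_bytes(length, "little")
    return raw.translate(BASE_TABLE)


def deflated_size(records):
    """Size in bytes of the concatenated records after zlib (DEFLATE)."""
    return len(zlib.compress(b"".join(records), 6))


def simulate(genome_len, n_reads, seed=0):
    """Return (name_sorted_size, position_sorted_size) for simulated reads."""
    rng = random.Random(seed)
    genome = random_genome(genome_len, rng)
    reads = []
    for i in range(n_reads):
        pos = rng.randrange(genome_len - READ_LEN + 1)
        name = b"r%07d" % i  # sequential names, like reads off one run
        seq = genome[pos:pos + READ_LEN]
        reads.append((name, pos, name + b"\t" + seq + b"\n"))
    by_name = [rec for _, _, rec in sorted(reads, key=lambda r: r[0])]
    by_pos = [rec for _, _, rec in sorted(reads, key=lambda r: r[1])]
    return deflated_size(by_name), deflated_size(by_pos)


if __name__ == "__main__":
    scenarios = [
        ("about 20x coverage", 200_000, 40_000),
        ("about 0.01x coverage", 20_000_000, 2_000),
    ]
    for label, genome_len, n_reads in scenarios:
        by_name, by_pos = simulate(genome_len, n_reads)
        print(label, "name-sorted:", by_name, "position-sorted:", by_pos)
```

In this toy model the position-sorted records should come out clearly
smaller at high coverage and slightly larger at very low coverage, which
is the pattern described above. Real BAM files also carry qualities,
flags, and tags, so actual sizes will differ.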