Interesting thread! I like these comments:

"This is twice as long as the longest human chromosome and is quite adequate for practical use" - Thomas W. Blackwell

"Hopefully 2 Gbp suffices for most species that people are interested in." - John Marshall
As already mentioned, bread wheat, which provides ~20% of all the calories consumed by humans globally, has 6 of its 27 chromosomes *assembled* to a length of greater than 537 Mbp [1]... I won't mention Paris japonica ;-)

[1] https://docs.google.com/spreadsheet/ccc?key=0Aqs5UFlky_s6dDlmVVFib3FoSm1tS1JIQzMxY0RVb2c

On 18 October 2013 11:50, John Marshall <j...@sanger.ac.uk> wrote:
> On 17 Oct 2013, at 17:08, Thomas W. Blackwell wrote:
>> Chromosome positions beyond 2^29 - 1 = 536,870,911 are not representable
>> in the .bam binary format.
>
> Actually the 2^29 - 1 limitation is in the BAI index format.
>
> BAM files themselves can represent positions up to 2^31 - 1, i.e., 2 billion
> or so. The SAM/BAM specification originally said 2^29 out of sympathy with
> BAI, but this was increased to 2^31 - 1 a couple of years ago [1].
>
> If a BAM file contains reads with positions beyond 2^29 - 1 then you won't be
> able to index it with a .bai index, as the original poster found. HTSlib
> introduces a new index format, "CSI", that is similar to BAI but is more
> flexible about bin size and depth, so can index all the positions
> representable in BAM.
>
> Hopefully 2 Gbp suffices for most species that people are interested in. In
> principle the existing file formats could be compatibly pushed to 4 Gbp, but
> this would require great care with getting signed v. unsigned arithmetic
> correct in the implementations so would likely not really be worthwhile.
>
> John
>
> [1] See
> https://github.com/samtools/hts-specs/commit/28c54abe2d3478eb7ea7c5fea0a268282b087f2b
> and also today's commit that updates a couple of 2^29 figures that were
> previously missed.
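For anyone who wants to check their own reference against the limits discussed above, here is a rough Python sketch. It assumes the usual binning relation, in which an index with smallest bin size 2^min_shift and a given depth covers 2^(min_shift + 3*depth) positions, and that BAI corresponds to min_shift = 14 and depth = 5. The names `max_indexable_pos` and `index_format_for` are made up for illustration and are not part of samtools or HTSlib.

```python
# Sketch only: helper names are hypothetical, not a samtools/HTSlib API.

BAI_MIN_SHIFT = 14  # assumed: smallest BAI bin spans 2^14 bases
BAI_DEPTH = 5       # assumed: five bin levels below the root bin


def max_indexable_pos(min_shift: int, depth: int) -> int:
    """Largest position covered by a binning index with these parameters."""
    return (1 << (min_shift + 3 * depth)) - 1


BAI_MAX_POS = max_indexable_pos(BAI_MIN_SHIFT, BAI_DEPTH)  # 2^29 - 1
BAM_MAX_POS = (1 << 31) - 1  # limit of the BAM format quoted in the thread


def index_format_for(max_pos: int) -> str:
    """Suggest an index format for the largest position in a BAM file."""
    if max_pos > BAM_MAX_POS:
        raise ValueError("position is beyond what BAM itself can represent")
    return "bai" if max_pos <= BAI_MAX_POS else "csi"


if __name__ == "__main__":
    # 537 Mbp, the wheat chromosome length mentioned above
    for pos in (536_870_911, 537_000_000, BAM_MAX_POS):
        print(f"{pos:>13,d} -> {index_format_for(pos)}")
```

Under these assumptions, anything past 536,870,911 needs a CSI index, and anything past 2^31 - 1 cannot be stored in BAM at all.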