Interesting thread!

I like these comments:
        "This is twice as long as the longest human chromosome and is
        quite adequate for practical use" - Thomas W. Blackwell
        "Hopefully 2 Gbp suffices for most species that people are
        interested in." - John Marshall

As already mentioned, bread wheat, which provides ~20% of all the
calories consumed by humans globally, has 6 of its 27 chromosomes
*assembled* to lengths greater than 537 Mbp [1]...

I won't mention Paris japonica ;-)


[1] https://docs.google.com/spreadsheet/ccc?key=0Aqs5UFlky_s6dDlmVVFib3FoSm1tS1JIQzMxY0RVb2c

On 18 October 2013 11:50, John Marshall <j...@sanger.ac.uk> wrote:
> On 17 Oct 2013, at 17:08, Thomas W. Blackwell wrote:
>> Chromosome positions beyond  2^29 - 1 = 536,870,911  are not representable
>> in the .bam binary format.
>
> Actually the 2^29 - 1 limitation is in the BAI index format.
>
> BAM files themselves can represent positions up to 2^31 - 1, i.e., 2 billion 
> or so.  The SAM/BAM specification originally said 2^29 out of sympathy with 
> BAI, but this was increased to 2^31 - 1 a couple of years ago [1].
>
> If a BAM file contains reads with positions beyond 2^29 - 1 then you won't be 
> able to index it with a .bai index, as the original poster found.  HTSlib 
> introduces a new index format, "CSI", that is similar to BAI but is more 
> flexible about bin size and depth, so can index all the positions 
> representable in BAM.
>
> Hopefully 2 Gbp suffices for most species that people are interested in.  In 
> principle the existing file formats could be compatibly pushed to 4 Gbp, but 
> this would require great care with getting signed v. unsigned arithmetic 
> correct in the implementations so would likely not really be worthwhile.
>
>     John
>
> [1] See 
> https://github.com/samtools/hts-specs/commit/28c54abe2d3478eb7ea7c5fea0a268282b087f2b
>  and also today's commit that updates a couple of 2^29 figures that were 
> previously missed.
>
> _______________________________________________
> Samtools-help mailing list
> Samtools-help@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/samtools-help
