On 15 Dec 2017, Brent Pedersen <bpede...@gmail.com> wrote:
> With bam/tabix, we can recognize the stats bin in index 37450
> 
> It is not documented how to find this bin for CSI which can have real
> data in 37450.

There is a draft of a fleshed-out CSI document [1], but alas it still needs 
to be rescued from the back burner. CSI was introduced in HTSlib, so the 
appropriate number for this pseudo-bin in CSI can be gleaned from the HTSlib 
source code. Alternatively, the relevant information from that draft is below.

> What is the way to find it?

The pseudo-bin's contents and layout are the same as in BAI, and it appears as 
bin number bin_limit(min_shift, depth)+1, where bin_limit() is the function 
below. This generalises BAI's 37450, so the calculation produces the right bin 
number for BAI (depth 5) too. The "+1" is an accident of history: a single slot 
(37449) was left vacant between the largest populatable BAI bin number (37448) 
and 37450. There is no discernible reason for this and it doesn't affect 
anything in practice.

    John


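/* Calculate the exclusive upper bound on bin numbers: valid bin numbers
   lie within [0, bin_limit). min_shift is accepted but unused, as the
   number of bins depends only on depth. */
int bin_limit(int min_shift, int depth)
{
    (void) min_shift;
    return ((1 << ((depth + 1) * 3)) - 1) / 7;
}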
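
As a worked check of the formula, here is a minimal sketch that computes the 
pseudo-bin number for a given depth. The names csi_bin_limit, csi_meta_bin and 
csi_self_check are hypothetical and chosen for this example only; they are not 
taken from HTSlib. Only the arithmetic comes from the function above, and the 
64-bit type is an assumption made to keep deeper indexes from overflowing int.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: exclusive upper bound on ordinary bin numbers.
   A tree with levels 0..depth has 1 + 8 + ... + 8^depth bins,
   which sums to (8^(depth+1) - 1) / 7. */
int64_t csi_bin_limit(int depth)
{
    return ((INT64_C(1) << ((depth + 1) * 3)) - 1) / 7;
}

/* Hypothetical helper: number of the metadata pseudo-bin,
   i.e. bin_limit + 1 as described in the message. */
int64_t csi_meta_bin(int depth)
{
    return csi_bin_limit(depth) + 1;
}

/* Sanity check against the BAI constants quoted in this thread. */
void csi_self_check(void)
{
    assert(csi_bin_limit(5) == 37449); /* largest real BAI bin is 37448 */
    assert(csi_meta_bin(5) == 37450);  /* BAI/tabix pseudo-bin */
}
```

With depth 5 this gives 37450, matching BAI and tabix. With depth 6 the 
pseudo-bin would be 299594, so in such a CSI index bin 37450 is an ordinary 
bin that can hold real data.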

[1] https://sourceforge.net/p/samtools/mailman/message/33475986/