I'll try to find some time to write more docs soon, but fyi there
are actually several overlap query algorithms of different types in bx.

IntervalIndexFile is mostly used for indexing large alignments on disk
by position. It is a tree of fixed-size bins (unlike the dynamic
binning that NCLs do). The tools that actually use it are mostly MAF
oriented, but I can write a simple script for indexing something more
arbitrary. (What it actually stores in the on-disk representation is
an offset to the data of interest in another file.) It is very disk
oriented, and works well on network file systems since it only needs
to read the subset of bins that can contain intervals in the queried
region.
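To make the fixed-size binning concrete, here is a minimal in-memory sketch. It assumes a UCSC-style hierarchical scheme (128 kb smallest bins, each level 8x larger, up to 512 Mb); the constants and the names `bin_for`, `bins_overlapping` and `BinnedIndex` are illustrative assumptions, not the bx API. Each entry stores a file offset rather than the data itself, and a query only looks at the bins that could hold an overlapping interval.

```python
from collections import defaultdict

# UCSC-style binning constants (an assumption for this sketch).
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # smallest bins cover 2**17 = 128 kb
BIN_NEXT_SHIFT = 3    # each level up is 8x larger, up to 512 Mb


def bin_for(start, end):
    """Return the smallest fixed-size bin that fully contains [start, end)."""
    start >>= BIN_FIRST_SHIFT
    end = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start == end:
            return offset + start
        start >>= BIN_NEXT_SHIFT
        end >>= BIN_NEXT_SHIFT
    raise ValueError("interval exceeds the 512 Mb binning range")


def bins_overlapping(start, end):
    """Yield every bin id that could hold an interval overlapping [start, end)."""
    start >>= BIN_FIRST_SHIFT
    end = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        yield from range(offset + start, offset + end + 1)
        start >>= BIN_NEXT_SHIFT
        end >>= BIN_NEXT_SHIFT


class BinnedIndex:
    """In-memory stand-in for an on-disk binned index (hypothetical class).

    Each entry is (start, end, offset); offset would point into a separate
    data file rather than holding the data itself.
    """

    def __init__(self):
        self.bins = defaultdict(list)

    def add(self, start, end, offset):
        self.bins[bin_for(start, end)].append((start, end, offset))

    def find(self, start, end):
        hits = []
        for b in bins_overlapping(start, end):
            # On disk, only these bins would need to be read.
            for s, e, off in self.bins.get(b, ()):
                if s < end and e > start:  # half-open overlap test
                    hits.append(off)
        return hits


idx = BinnedIndex()
idx.add(100, 200, 0)            # fits in the first 128 kb bin
idx.add(150_000, 160_000, 512)  # fits in the second 128 kb bin
idx.add(10, 5_000_000, 1024)    # too long for small bins; lands in an 8 Mb bin
print(idx.find(120, 130))       # [0, 1024]
```

On disk, each bin would be a separately readable block, so a query only has to fetch the handful of bins that `bins_overlapping` yields.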

There is also intervals.Intersecter, which is of the offline,
sorted-endpoint / binary tree variety. This is in memory only.
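Roughly, the offline sorted-endpoint approach can be sketched like this: sort all intervals by start once, then answer each query with a binary search, using the longest interval length to bound how far left an overlapping interval can begin. `SortedIntersecter` is a hypothetical name; this shows the general technique, and bx's `Intersecter` may differ in its details.

```python
import bisect


class SortedIntersecter:
    """Overlap queries over a fixed set of intervals (hypothetical class).

    "Offline": all intervals are known up front and sorted once by start.
    A query binary-searches the sorted starts, using the longest interval
    length to bound how far left an overlapping interval can begin.
    """

    def __init__(self, intervals):
        # intervals: iterable of (start, end, value), half-open [start, end)
        self.intervals = sorted(intervals, key=lambda iv: iv[0])
        self.starts = [iv[0] for iv in self.intervals]
        self.max_len = max((e - s for s, e, _ in self.intervals), default=0)

    def find(self, start, end):
        # Any interval overlapping [start, end) must begin in
        # [start - max_len, end); everything else can be skipped.
        lo = bisect.bisect_left(self.starts, start - self.max_len)
        hi = bisect.bisect_left(self.starts, end)
        return [v for s, e, v in self.intervals[lo:hi] if e > start]


ix = SortedIntersecter([(0, 10, "a"), (5, 20, "b"), (30, 40, "c")])
print(ix.find(8, 12))   # ['a', 'b']
print(ix.find(21, 29))  # []
```

The catch with this design is that a single very long interval inflates `max_len` and widens every scan, which is one reason tree-based structures can do better on real annotation data.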

And there is quicksect.py, which implements a different type of tree
and is what the join tool in Galaxy uses. It is also in memory only,
but it uses an extension module, and I think it can be a lot faster
than Intersecter.
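As a rough illustration of the general idea behind tree-based overlap queries, here is an augmented interval tree: nodes are keyed on start, and each node also records the largest end in its subtree so whole branches can be skipped. `Node`, `insert` and `find` are hypothetical names; this is not a port of quicksect.py, and it leaves out balancing, so already-sorted input would degrade it into a linked list.

```python
class Node:
    """Node of an unbalanced augmented interval tree (hypothetical, for illustration)."""

    __slots__ = ("start", "end", "value", "max_end", "left", "right")

    def __init__(self, start, end, value):
        self.start, self.end, self.value = start, end, value
        self.max_end = end  # largest end anywhere in this subtree
        self.left = self.right = None


def insert(root, start, end, value):
    """Insert [start, end) keyed on start; return the (possibly new) root."""
    if root is None:
        return Node(start, end, value)
    if start < root.start:
        root.left = insert(root.left, start, end, value)
    else:
        root.right = insert(root.right, start, end, value)
    root.max_end = max(root.max_end, end)
    return root


def find(root, start, end, out=None):
    """Collect values of intervals overlapping [start, end), in start order."""
    if out is None:
        out = []
    # Skip the whole subtree if nothing in it reaches past `start`.
    if root is None or root.max_end <= start:
        return out
    find(root.left, start, end, out)
    if root.start < end and root.end > start:
        out.append(root.value)
    # Right-subtree starts are >= root.start, so they can only overlap
    # if root.start < end.
    if root.start < end:
        find(root.right, start, end, out)
    return out


root = None
for s, e, v in [(10, 20, "a"), (5, 8, "b"), (15, 40, "c"), (50, 60, "d")]:
    root = insert(root, s, e, v)
print(find(root, 18, 52))  # ['a', 'c', 'd']
```

The `max_end` field is what lets the query prune: the subtree holding "b" is never entered for the query above because its largest end (8) is below the query start.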

Maybe this weekend I can put together some quick examples.

-- jt

On Feb 11, 2009, at 3:15 PM, Istvan Albert wrote:

>
> On Feb 11, 2:44 pm, James Taylor <[email protected]> wrote:
>
>> It'd be really cool to compare some of the other features like
>> IntervalIndexFile vs. NLMSA. I've not had time to do it, but it might
>> be interesting (and encourage one or all of us to do some
>> optimization ;).
>
> Lately I've become interested in exploring interval overlap query
> algorithms. It looks like just about everything in the genome is
> transcribed at some point, so we will end up with orders of
> magnitude more interval data than we are used to. Having a
> seamless way to query it is just as important as being able to
> fetch a sequence and operate on it.
>
> If you guys write some docs on how IntervalIndexFile is used in
> practice, with some examples etc., I'll write benchmarks to
> compare it to other types of queries.
>
> Istvan
> 

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"pygr-dev" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/pygr-dev?hl=en
-~----------~----~----~----~------~----~------~--~---
