I'll try to find some time to write some more docs soon, but fyi there are actually several overlap query algorithms in bx of different types.
IntervalIndexFile is mostly used for indexing large alignments on disk by position. It is a tree of bins of fixed size (unlike the dynamic binning the NCLs do). The actual tools to use it are mostly MAF oriented, but I can write a simple script for indexing something more arbitrary. (What it actually stores in the on-disk representation is an offset to the data of interest in another file.) It is pretty disk oriented, and works well on network file systems since it only needs to read the subset of bins needed to find intervals in a specific region.

There is also intervals.Intersecter, which is of the offline sorted endpoint / binary tree sort. This is in memory only.

And there is quicksect.py, which is a different type of tree, and is what the join tool in Galaxy uses. This is also in memory only but uses an extension module -- I think it can be a lot faster than Intersecter.

Maybe this weekend I can put together some quick examples.

-- jt

On Feb 11, 2009, at 3:15 PM, Istvan Albert wrote:

> On Feb 11, 2:44 pm, James Taylor <[email protected]> wrote:
>
>> It'd be really cool to compare some of the other features like
>> IntervalIndexFile vs. NLMSA. I've not had time to do it, but it might
>> be interesting (and encourage one or all of us to do some
>> optimization ;).
>
> lately I've become interested in exploring interval overlap query
> algorithms. It looks like just about everything in the genome is
> transcribed at some point, thus we will end up with orders of
> magnitude more interval data than what we were used to have. Having a
> seamless way to query these is just as important as being able to
> fetch a sequence and operate on it.
>
> If you guys write some docs on how this IntervalIndexFile is used in
> practice, with some examples etc. I'll write benchmarks to
> compare it to other type of queries.
>
> Istvan
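
P.S. To make the "tree of bins of fixed size" idea a bit more concrete, here is a rough sketch in plain Python. It is not the IntervalIndexFile code and says nothing about its real on-disk format: the class name BinnedOffsetIndex, the level sizes, and the in-memory dict standing in for the file are all assumptions made up for illustration. Each interval is stored, along with an offset into some other data file, in the smallest bin that fully contains it, and a query only looks at the bins that overlap the requested region.

```python
from collections import defaultdict

# Hypothetical bin widths, finest first; each level is 8x coarser than the
# one before it, so the bins nest like a tree. Not bx's real constants.
LEVEL_SIZES = [128_000, 1_024_000, 8_192_000, 65_536_000, 524_288_000]


class BinnedOffsetIndex:
    """Toy fixed-size bin hierarchy mapping intervals to file offsets."""

    def __init__(self, level_sizes=LEVEL_SIZES):
        self.level_sizes = level_sizes
        # (level, bin number) -> list of (start, end, offset)
        self.bins = defaultdict(list)

    def add(self, start, end, offset):
        """Store half-open [start, end) in the smallest bin containing it."""
        for level, size in enumerate(self.level_sizes):
            if start // size == (end - 1) // size:
                self.bins[(level, start // size)].append((start, end, offset))
                return
        raise ValueError("interval does not fit in the coarsest bin")

    def find(self, start, end):
        """Return (start, end, offset) for stored intervals overlapping [start, end)."""
        hits = []
        for level, size in enumerate(self.level_sizes):
            # Only the bins at this level that overlap the query are visited.
            for b in range(start // size, (end - 1) // size + 1):
                for s, e, off in self.bins.get((level, b), ()):
                    if s < end and e > start:
                        hits.append((s, e, off))
        return sorted(hits)


idx = BinnedOffsetIndex()
idx.add(100, 200, 0)              # fits in a finest-level bin
idx.add(127_900, 128_100, 512)    # straddles a bin edge, so goes one level up
print(idx.find(127_950, 128_050))
```

In a disk-backed version each bin would be a block of the index file, so a region query reads only those few blocks and then seeks to the stored offsets in the data file.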
