Re: [Bioc-devel] RFC: IntervalTrees for GRanges objects

2013-04-08 Thread Hector Corrada Bravo
Thanks everyone for their comments. I think the easiest (non-hack) implementation would do the following: 1) For now, use the newly created IntervalTreeList class to implement Michael's hash idea. I'm weary of pushing this into the IntervalTree class since it would make this class depart from the

Re: [Bioc-devel] RFC: IntervalTrees for GRanges objects

2013-04-03 Thread Laurent Gautier
On 2013-04-03 19:28, Kasper Daniel Hansen wrote: Making IntervalTree chromosome would also be a great addition for organisms with many sequences, like bee (due to an incomplete genome; 10,000s of sequences). It does not matter for humans, but findOverlaps is excruciatingly slow for bee's. I hav

Re: [Bioc-devel] RFC: IntervalTrees for GRanges objects

2013-04-03 Thread Michael Lawrence
Yes, we could use this offset-based approach. Though we would want to just use the seqlengths for calculating the offsets, so that every query will be compatible with the index. This would usually mean doubles, instead of integers. The hash approach might be a tiny bit faster, as we wouldn't need t

Re: [Bioc-devel] RFC: IntervalTrees for GRanges objects

2013-04-03 Thread Martin Morgan
On 04/03/2013 10:28 AM, Kasper Daniel Hansen wrote: Making IntervalTree chromosome would also be a great addition for organisms with many sequences, like bee (due to an incomplete genome; 10,000s of sequences). It does not matter for humans, but findOverlaps is excruciatingly slow for bee's. I

Re: [Bioc-devel] RFC: IntervalTrees for GRanges objects

2013-04-03 Thread Kasper Daniel Hansen
Making IntervalTree chromosome would also be a great addition for organisms with many sequences, like bee (due to an incomplete genome; 10,000s of sequences). It does not matter for humans, but findOverlaps is excruciatingly slow for bee's. I have a couple of posts on this in the archive. I am p

Re: [Bioc-devel] RFC: IntervalTrees for GRanges objects

2013-04-03 Thread Michael Lawrence
Some ideas: - Turn the IntervalTree into a list/array of nodes that can be subset/reordered with shallow copying (just copy the pointers to the nodes), and the index would be secondary. The index in the array could be stored in each node, for lookup during overlap queries. Right now, as far as I c

Re: [Bioc-devel] RFC: IntervalTrees for GRanges objects

2013-04-03 Thread Michael Lawrence
Hi Hector, That's interesting, thanks for passing this along. I'm still wishing that somehow GRanges itself could abstract the way it stores ranges. I know that Herve/Patrick had some reasons for depending specifically on GRanges. One reason was probably convenience at the C level, but it wouldn't

Re: [Bioc-devel] RFC: IntervalTrees for GRanges objects

2013-04-03 Thread Hector Corrada Bravo
Yep, I didn't comment on that, but I agree that abstracting how GRanges stores ranges would make this more elegant. Right now ranges(GRanges) is specified to be of IRanges class instead of the abstract Ranges class. If it were the latter then GIntervalTree can be a subclass of GenomicRanges, in a

[Bioc-devel] RFC: IntervalTrees for GRanges objects

2013-04-02 Thread Hector Corrada Bravo
Hello bioc-develers, I'm writing an application where lots findOverlap calls are made on static GRanges objects. For IRanges we can create persistent IntervalTree objects that would serve the multiple overlap query use-case. There is no equivalent for GenomicRanges objects, so I'm proposing an imp