You're pretty much spot on regarding two aspects of the current IntersectingIterator:

1- It's not really extensible (there are hooks for building doc IDs, but you still need the same `partition term: docId` key structure)
2- Its main strength is that it can do the merges of sorted lists of doc IDs based on equality expressions (i.e., `author=="bob" and day=="20120627"`)

Fortunately, the logic for re-creating the merging stuff isn't very complicated. Personally, I think it's easy enough to separate the logic of joining N streams of iterator results from the actual scanning. Unfortunately, this would be left up to you to do at the moment :)

You could do range searches by consuming the sets of values in the range and sorting all of their doc IDs by throwing them into a TreeSet. That would let you emit doc IDs in a globally sorted order for the given range of terms. This can get problematic if the range ends up being very large, because your iterator stack may periodically be destroyed and rebuilt, and anything buffered in memory would have to be collected again.

On Thu, Jun 28, 2012 at 1:49 PM, Sukant Hajra <[email protected]> wrote:
> We're in a position right now, where we have a change list (like a transaction
> log) and we'd like to index the changes by author, but a typical query is:
>
>     Show the last n changes for author "Foo Bar"
>
> or
>
>     Show changes after Jan. 1st, 2012 for author "Foo Bar"
>
> Certainly, we can denormalize our data to facilitate this lookup. But the idea
> of using intersecting iterators seems intriguing (to get a modicum of
> data-local server-side joining), but our ideas for shoe-horning the query into
> intersecting iterators seems really wonky or half-baked. Largely, we're
> running into the restriction that intersecting iterators are based upon the
> product of a boolean conjunctive statements about term equality. What we'd
> really like is a little more range-based. The Accumulo documentation alludes
> to the problem a little:
>
>     If the results are unordered this is quite effective as the first results
>     to arrive are as good as any others to the user.
>
> In our case, order matters because we want the last results without pulling in
> everything.
>
> We looked at the code for intersecting iterators a little, and noticed that
> there's an inheritance design, but we're not convinced that it's really
> "designed for extension" and if it is, we're not sure if it can be extended to
> meet our needs gracefully. If it can, we're really interested in any
> suggestions or prior work.
>
> Otherwise, we're open to the idea that there's Accumulo features we're just not
> aware of beyond intersecting iterators that are a better fit.
>
> It would be wonderful to have a technique to hedge against over-denormalizing
> our data for every variant of query we have to support.
>
> Thanks for your help,
> Sukant
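
To make the two ideas in the reply a bit more concrete, here is a rough plain-Java sketch of (a) joining N streams of sorted doc IDs on equality and (b) collecting the doc IDs for a range of terms into a TreeSet so they come out globally sorted. It deliberately uses no Accumulo classes: `DocIdMerge`, `intersect`, and `docIdsInRange` are made-up names for illustration, the in-memory map stands in for a scan over index entries, and each stream is assumed to be sorted ascending with no duplicate doc IDs. It is not how IntersectingIterator is implemented, just the shape of the logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of Accumulo.
final class DocIdMerge {

    // Intersect N streams of doc IDs, each sorted ascending without duplicates.
    // This is the equality case, e.g. author=="bob" and day=="20120627".
    static List<String> intersect(List<Iterator<String>> streams) {
        List<String> result = new ArrayList<>();
        int n = streams.size();
        if (n == 0) return result;
        String[] heads = new String[n];
        for (int i = 0; i < n; i++) {
            if (!streams.get(i).hasNext()) return result; // an empty stream means no matches
            heads[i] = streams.get(i).next();
        }
        while (true) {
            // The largest head is the smallest doc ID that could still be in every stream.
            String max = heads[0];
            for (String h : heads) if (h.compareTo(max) > 0) max = h;
            boolean allEqual = true;
            for (int i = 0; i < n; i++) {
                // Skip stream i forward until its head is at least max.
                while (heads[i].compareTo(max) < 0) {
                    if (!streams.get(i).hasNext()) return result;
                    heads[i] = streams.get(i).next();
                }
                if (heads[i].compareTo(max) > 0) allEqual = false;
            }
            if (allEqual) {
                result.add(max);
                // Move every stream past the match before looking for the next one.
                for (int i = 0; i < n; i++) {
                    if (!streams.get(i).hasNext()) return result;
                    heads[i] = streams.get(i).next();
                }
            }
        }
    }

    // Gather the doc IDs of every term in [startTerm, endTerm] into one TreeSet.
    // Memory use grows with the size of the range, which is the caveat noted above.
    static TreeSet<String> docIdsInRange(NavigableMap<String, List<String>> index,
                                         String startTerm, String endTerm) {
        TreeSet<String> sorted = new TreeSet<>();
        for (List<String> docIds : index.subMap(startTerm, true, endTerm, true).values()) {
            sorted.addAll(docIds);
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<Iterator<String>> streams = new ArrayList<>();
        streams.add(Arrays.asList("doc1", "doc3", "doc5", "doc7").iterator());
        streams.add(Arrays.asList("doc3", "doc4", "doc7").iterator());
        System.out.println(intersect(streams)); // prints [doc3, doc7]

        NavigableMap<String, List<String>> byDay = new TreeMap<>();
        byDay.put("20120626", Arrays.asList("doc9"));
        byDay.put("20120627", Arrays.asList("doc7", "doc2"));
        byDay.put("20120628", Arrays.asList("doc4"));
        System.out.println(docIdsInRange(byDay, "20120627", "20120628")); // prints [doc2, doc4, doc7]
    }
}
```

For the "author plus date range" query, one way to combine the two would be to feed the sorted doc IDs from `docIdsInRange` in as one of the streams to `intersect`, with the author's doc IDs as the other.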
