On Tue, Nov 13, 2012 at 9:47 AM, Peter Karman <[email protected]> wrote:
> On 11/13/12 11:33 AM, Marvin Humphrey wrote:
>
>> I would oppose a seek() which runs implicit searches behind the scenes
>> because it would surprise users with a hidden performance cost.
>
> I was imagining that $hits->seek() would just adjust the iterator pointer,

Why not just use the `offset` parameter to Searcher#hits?  So long as
rerunning the search is ruled out, adding seek() offers little
functionality beyond what we already get from `offset`.

The motivation for adding seek() seems to be to make Lucy more Swish-like.
However, if we really wanted to offer Swish-like behavior, we'd have to run
the initial search with the size of the priority queue set to the size of
the index every time in order to capture all possible hits.  Speed and
memory consumption would suck, but at least they would suck uniformly for
all values of `offset`! :)  And you could seek all over without having to
rerun a search.

I assume nobody wants that -- not even the original poster.  So we're stuck
with the less satisfactory semantics of having seek() fail silently beyond
`offset + num_wanted`.

Well, if we're not going to offer full Swish-e semantics with seek(), and
the limited version offers hardly any new functionality, what do we
accomplish by adding it?  We'd make people like the original poster feel a
little better at first -- but we're just setting them up for disappointment
when they discover that Lucy's seek() and Swish-e's seek() aren't really
the same after all.

There's another reason not to add seek(): it conflicts with adding
pre-fetching support to Hits.  What happens when you seek back and forth to
different points in the iterator?  Does it throw away the docs that were
just pre-fetched?  Does it keep them and risk exceeding the prefetch_count
and blowing up memory consumption?

Right this moment, it seems like adding seek() to Hits would be
straightforward -- but in my view, the conflict with pre-fetching serves to
illustrate why it pays to be prudent about expanding public APIs, both in
this specific case and in general.

Marvin Humphrey
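
For illustration only, here is a toy Python model of the priority-queue
tradeoff described above.  It is not Lucy's implementation: `ToyHits` and
`collect_top_hits` are hypothetical names, and the only assumption carried
over from the discussion is that a search retains just the best
`offset + num_wanted` hits, so a pointer-only seek() has nothing to return
beyond that window.

```python
import heapq


def collect_top_hits(scores, offset, num_wanted):
    """Keep only the best offset + num_wanted hits (a bounded queue)."""
    queue_size = offset + num_wanted
    # enumerate() pairs each score with a stand-in doc id.
    return heapq.nlargest(queue_size, enumerate(scores),
                          key=lambda pair: pair[1])


class ToyHits:
    """Hypothetical iterator whose seek() only moves a pointer."""

    def __init__(self, scores, offset, num_wanted):
        self._collected = collect_top_hits(scores, offset, num_wanted)
        self._pos = offset

    def seek(self, position):
        # No search is rerun here; only the pointer changes.
        self._pos = position

    def next(self):
        # Past the collected window there is nothing to return, so the
        # iterator is silently exhausted.
        if self._pos >= len(self._collected):
            return None
        hit = self._collected[self._pos]
        self._pos += 1
        return hit


# 100 docs; doc i scores i, so the best hit is doc 99.
hits = ToyHits(list(range(100)), offset=10, num_wanted=10)
first = hits.next()        # hit at rank 10: (89, 89)
hits.seek(0)
best = hits.next()         # hit at rank 0: (99, 99)
hits.seek(50)
beyond = hits.next()       # None: rank 50 was never collected
```

Making seek(50) succeed in this model would require a queue as large as the
whole index, which is the uniform cost described above.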
