Re: [lucy-user] Hits offset and search performance

Marvin Humphrey Tue, 13 Nov 2012 16:38:38 -0800

On Tue, Nov 13, 2012 at 9:47 AM, Peter Karman wrote:
> On 11/13/12 11:33 AM, Marvin Humphrey wrote:
>
>> I would oppose a seek() which runs implicit searches behind the scenes
>> because it would surprise users with a hidden performance cost.
>
> I was imagining that $hits->seek() would just adjust the iterator pointer,


Why not just use the `offset` parameter to Searcher#hits?  So long as
rerunning the search is ruled out, adding seek() offers little functionality
beyond what we already get from `offset`.

The motivation for adding seek() seems to be to make Lucy more Swish-like.
However, if we really wanted to offer Swish-like behavior, we'd have to run
the initial search with the size of the priority queue set to the size of the
index every time in order to capture all possible hits.  Speed and memory
consumption would suck, but at least they would suck uniformly for all values
of `offset`! :)  And you could seek all over without having to rerun a search.

I assume nobody wants that -- not even the original poster.  So we're stuck
with the less satisfactory semantics of having seek() fail silently beyond
`offset + num_wanted`.
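
To make the tradeoff concrete, here is a rough sketch in Python. It is not
Lucy code, and the function name `collect_top_hits` and its tuple-based input
are invented for illustration. It assumes the collector keeps a bounded
priority queue of `offset + num_wanted` entries, so anything ranked below
that cutoff is discarded during the search and cannot be reached afterwards
without searching again.

```python
import heapq


def collect_top_hits(scored_docs, offset, num_wanted):
    """Return (page_of_doc_ids, number_of_hits_retained).

    Hypothetical illustration only: scored_docs is an iterable of
    (doc_id, score) pairs standing in for whatever the matcher produces.
    """
    queue_size = offset + num_wanted
    if queue_size <= 0:
        return [], 0

    # Min-heap of (score, doc_id); heap[0] is the weakest hit still retained.
    heap = []
    for doc_id, score in scored_docs:
        entry = (score, doc_id)
        if len(heap) < queue_size:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # Better than the weakest retained hit: evict it for good.
            heapq.heapreplace(heap, entry)

    # Best hits first, then slice out the requested page.
    ranked = sorted(heap, reverse=True)
    page = [doc_id for _score, doc_id in ranked[offset:offset + num_wanted]]
    return page, len(heap)
```

In this sketch, memory is bounded by `offset + num_wanted` rather than by the
number of matching documents. Making every position seekable would mean
setting the queue size to the size of the index, which is the uniformly slow
case described above.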

Well, if we're not going to offer full Swish-e semantics with seek(), and the
limited version offers hardly any new functionality, what do we accomplish by
adding it?  We'd make people like the original poster feel a little better at
first -- but we're just setting them up for disappointment when they discover
that Lucy's seek() and Swish-e's seek() aren't really the same after all.

There's another reason not to add seek(): it conflicts with adding
pre-fetching support to Hits.  What happens when you seek back and forth to
different points in the iterator?  Does it throw away the docs that were just
pre-fetched?  Does it keep them and risk exceeding the prefetch_count and
blowing up memory consumption?

Right this moment, it seems like adding seek() to Hits would be
straightforward -- but in my view, the conflict with pre-fetching serves to
illustrate why it pays to be prudent about expanding public APIs, both in this
specific case and in general.

Marvin Humphrey
