On Wed, Oct 24, 2012 at 5:08 AM, Dag Lem <[email protected]> wrote:
> Furthermore fetch_doc and fetch_doc_vec should be replaced with
> something like fetch_docs and fetch_docs_vec, facilitating the
> fetching of several documents with a single request / response.
The place to address batch-fetching of documents is the Lucy::Search::Hits
iterator class.
Right now, Hits doesn't prefetch -- it just retrieves individual docs on
demand:
    HitDoc*
    Hits_next(Hits *self) {
        MatchDoc *match_doc
            = (MatchDoc*)VA_Fetch(self->match_docs, self->offset);
        self->offset++;

        if (!match_doc) {
            /** Bail if there aren't any more *captured* hits. (There may
             * be more total hits.) */
            return NULL;
        }
        else {
            // Lazily fetch HitDoc, set score.
            HitDoc *hit_doc = Searcher_Fetch_Doc(self->searcher,
                                                 match_doc->doc_id);
            HitDoc_Set_Score(hit_doc, match_doc->score);
            return hit_doc;
        }
    }
We could modify Hits by giving it a `prefetch_count` member variable and a
`set_prefetch_count()` method. The default for `prefetch_count` would be 0,
preserving the current behavior, but ClusterSearcher could set the count
before returning the Hits object so that all documents are prefetched on the
first call to `next()`. That would cut fetches from one round-trip per hit
down to one round-trip per shard-with-hits.
There's no need to make `set_prefetch_count()` public yet -- it can remain
an implementation detail for the time being.
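
To make the idea concrete, here is a rough, self-contained sketch of a
prefetching `next()`. It is not Lucy code: the `MatchDoc`, `HitDoc`, and `Hits`
structs are simplified stand-ins, `fetch_docs()` is a hypothetical batch call
that only simulates one round-trip, and the cache fields and `round_trips`
counter exist purely for illustration. Unlike the real `Hits_next()`, the
sketch returns a pointer into an internal cache that is only valid until the
next refill.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for Lucy's MatchDoc and HitDoc. */
typedef struct { int32_t doc_id; float score; } MatchDoc;
typedef struct { int32_t doc_id; float score; } HitDoc;

/* Counts simulated round-trips so the effect of batching is visible. */
int round_trips = 0;

/* Hypothetical batch fetch: one simulated round-trip for `n` docs. */
void
fetch_docs(const MatchDoc *matches, size_t n, HitDoc *out) {
    round_trips++;
    for (size_t i = 0; i < n; i++) {
        out[i].doc_id = matches[i].doc_id;
        out[i].score  = 0.0f;  /* Score is set by the caller. */
    }
}

typedef struct {
    const MatchDoc *match_docs;   /* Captured hits, borrowed. */
    size_t num_matches;
    size_t offset;                /* Index of the next hit to return. */
    size_t prefetch_count;        /* 0 means one doc per fetch. */
    HitDoc *cache;                /* Docs from the latest fetch. */
    size_t cache_start;           /* Offset of cache[0]. */
    size_t cache_len;
} Hits;

void
Hits_init(Hits *self, const MatchDoc *match_docs, size_t num_matches) {
    self->match_docs     = match_docs;
    self->num_matches    = num_matches;
    self->offset         = 0;
    self->prefetch_count = 0;     /* Default: current on-demand behavior. */
    self->cache          = NULL;
    self->cache_start    = 0;
    self->cache_len      = 0;
}

void
Hits_set_prefetch_count(Hits *self, size_t count) {
    self->prefetch_count = count;
}

/* Returns the next hit, or NULL when the captured hits are exhausted.
 * The pointer is only valid until the next refill of the cache. */
HitDoc*
Hits_next(Hits *self) {
    assert(self != NULL);
    if (self->offset >= self->num_matches) {
        return NULL;
    }
    if (self->offset >= self->cache_start + self->cache_len) {
        /* Cache exhausted: fetch the next batch in a single round-trip. */
        size_t remaining = self->num_matches - self->offset;
        size_t batch = self->prefetch_count ? self->prefetch_count : 1;
        if (batch > remaining) { batch = remaining; }
        HitDoc *cache = realloc(self->cache, batch * sizeof(HitDoc));
        if (cache == NULL) { abort(); }
        self->cache = cache;
        fetch_docs(self->match_docs + self->offset, batch, self->cache);
        for (size_t i = 0; i < batch; i++) {
            self->cache[i].score = self->match_docs[self->offset + i].score;
        }
        self->cache_start = self->offset;
        self->cache_len   = batch;
    }
    HitDoc *hit_doc = &self->cache[self->offset - self->cache_start];
    self->offset++;
    return hit_doc;
}

void
Hits_destroy(Hits *self) {
    free(self->cache);
    self->cache = NULL;
}
```

With `prefetch_count` left at 0, iterating over five captured hits costs five
simulated round-trips. Setting it to 5 before the first `Hits_next()` call
costs one, which is the effect ClusterSearcher would be after.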
The question of what to do about `fetch_doc_vec()` is harder. Highlighter is
the only place that calls `fetch_doc_vec()`, but it can't prefetch because it
only deals with one hit at a time.
Perhaps we ought to explore integrating Highlighter with Hits instead of
limiting it to dealing with individual Doc objects. That way, Hits could
assume responsibility for prefetching both Doc and DocVector objects at the
same time.
Marvin Humphrey