Dag Lem <[email protected]> writes:

[...]

> Some observations:
> 
> * Lucy::Search::IndexSearcher::top_docs (used by SearchServer) is
>   about twice as slow as Lucy::Search::Searcher::hits (used by
>   IndexSearcher).
> * The time used for object serialization with sharding surpasses the
>   time spent in Lucy::Search::Searcher::hits without sharding, and
>   scales with query complexity.
> * Quite a lot of time is spent on (local) network communication, and
>   this also seems to scale with query complexity.

Having run strace, I found that one culprit in the current
implementation is that the client requests "doc_freq" from the server
once for every single term in the query. This seems to be the
fundamental cause of the last two observations above. Fixing this
issue would help a bit, I think, but performance would still be
severely limited by the network roundtrips caused by hit fetching
(which is not part of my test) and by the overhead of Storable.

If I may be so bold as to suggest how to make sharding really fly, I
believe what's called for is the following:

1. Get rid of as many network roundtrips as possible.
2. Design a (simple) custom application protocol, to get rid of the
   overhead of Storable.

As far as I can tell, the current protocol covers the following
actions:

  handshake
  terminate
  doc_max
  doc_freq
  top_docs
  fetch_doc
  fetch_doc_vec

Here, doc_freq and top_docs should be replaced with something like
docs_freq_and_top_docs, i.e. only one request / response per query.
Furthermore fetch_doc and fetch_doc_vec should be replaced with
something like fetch_docs and fetch_docs_vec, facilitating the
fetching of several documents with a single request / response.
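
To illustrate the roundtrip argument, here is a minimal Python sketch.
It is not Lucy code: FakeShard, its methods, the toy scoring and the
sample data are all hypothetical, and it models only a single shard.
It simply counts how many requests a per-term doc_freq exchange costs
compared with one combined docs_freq_and_top_docs request.

```python
class FakeShard:
    """Hypothetical stand-in for a remote shard; counts requests received."""

    def __init__(self, postings):
        self.postings = postings  # term -> set of document ids
        self.requests = 0         # one increment per simulated roundtrip

    def doc_freq(self, term):
        self.requests += 1
        return len(self.postings.get(term, ()))

    def top_docs(self, terms, num_wanted):
        self.requests += 1
        return self._rank(terms, num_wanted)

    def docs_freq_and_top_docs(self, terms, num_wanted):
        # Combined action: all frequencies plus the ranking in one request.
        self.requests += 1
        freqs = {t: len(self.postings.get(t, ())) for t in terms}
        return freqs, self._rank(terms, num_wanted)

    def _rank(self, terms, num_wanted):
        # Toy scoring (an assumption): count how many query terms match.
        scores = {}
        for term in terms:
            for doc in self.postings.get(term, ()):
                scores[doc] = scores.get(doc, 0) + 1
        ranked = sorted(scores, key=lambda d: (-scores[d], d))
        return ranked[:num_wanted]


def search_per_term(shard, terms, num_wanted):
    """One doc_freq request per term, then one top_docs request."""
    freqs = {t: shard.doc_freq(t) for t in terms}
    return freqs, shard.top_docs(terms, num_wanted)


def search_batched(shard, terms, num_wanted):
    """A single request, whatever the number of terms."""
    return shard.docs_freq_and_top_docs(terms, num_wanted)
```

In this sketch, a query with N terms costs N + 1 requests in the
per-term version and exactly one in the batched version, which is the
scaling with query complexity described above.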

With this in place, Storable serialization could be replaced by a
custom application protocol to make things *really* fast. Note,
however, that this is not required to fix the fundamental issue:
network roundtrips.
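
As a rough idea of what a simple custom protocol could look like, here
is a Python sketch of a length-prefixed binary encoding for a batch of
term/doc_freq pairs. The wire layout (4-byte payload length, 4-byte
entry count, then a 2-byte term length, the UTF-8 term and a 4-byte
frequency for each entry) and the function names are assumptions made
up for illustration; nothing here is an existing Lucy format.

```python
import struct


def encode_doc_freqs(freqs):
    """Encode {term: doc_freq} as a length-prefixed binary message."""
    parts = [struct.pack("!I", len(freqs))]       # number of entries
    for term, freq in freqs.items():
        raw = term.encode("utf-8")
        parts.append(struct.pack("!H", len(raw)))  # term length in bytes
        parts.append(raw)                          # term bytes
        parts.append(struct.pack("!I", freq))      # document frequency
    payload = b"".join(parts)
    # The leading length lets the receiver know how many bytes to read.
    return struct.pack("!I", len(payload)) + payload


def decode_doc_freqs(message):
    """Decode a message produced by encode_doc_freqs."""
    (length,) = struct.unpack_from("!I", message, 0)
    payload = message[4:4 + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    (count,) = struct.unpack_from("!I", payload, 0)
    offset = 4
    freqs = {}
    for _ in range(count):
        (term_len,) = struct.unpack_from("!H", payload, offset)
        offset += 2
        term = payload[offset:offset + term_len].decode("utf-8")
        offset += term_len
        (freq,) = struct.unpack_from("!I", payload, offset)
        offset += 4
        freqs[term] = freq
    return freqs
```

A fixed layout like this avoids general-purpose object serialization
altogether, at the cost of having to define and version each message
type by hand.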

The only remaining question would be whether it is possible to
optimize Lucy::Search::IndexSearcher::top_docs to perform nearly as
well as Lucy::Search::Searcher::hits.

-- 
Best regards,

Dag Lem
