Totally agree on optimizing the out-of-the-box experience; it's just never a one-size-fits-all thing. And we have to be very careful about micro-benchmarks driving these settings. Currently, many of us use Wikipedia, but that's just one doc set, and I'd venture to say most Solr users do not have docs that look anything like Wikipedia. One of the things the Open Relevance project (http://wiki.apache.org/lucene-java/OpenRelevance , see the discussion on gene...@lucene.a.o) should aim to do is bring in a variety of test collections from lots of different genres. This will help with both relevance and speed testing.

-Grant

On May 14, 2009, at 6:47 AM, Michael McCandless wrote:

On Wed, May 13, 2009 at 12:33 PM, Grant Ingersoll <gsing...@apache.org> wrote:
I've contacted
others in the past who have done "comparisons" and after one round of
emailing it was almost always clear that they didn't know what best
practices are for any given product and thus were doing things
sub-optimally.

While I agree that one should properly match & tune all apps they are
testing (for a fair comparison), we in turn must set out-of-the-box
defaults (in Lucene and Solr) that get you as close to the "best
practices" as possible.

We don't always do that, and I think we should do better.

My most recent example of this is BooleanQuery's performance.  It
turns out, if you setAllowDocsOutOfOrder(true), it yields a sizable
performance gain (27% on my most recent test) for OR queries.

So why haven't we enabled this by default, already?  (As far as I can
tell it's functionally equivalent, as long as the Collector can accept
out-of-order docs, which our core collectors can).
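
To illustrate the "functionally equivalent" point, here is a minimal, self-contained Java sketch. It is not Lucene code: the Hit class, the topN method and the sample scores are all hypothetical, made up for illustration. It assumes that the only thing a top-N collector must do to accept out-of-order docs is break score ties on doc id explicitly instead of relying on arrival order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class OutOfOrderCollectDemo {

    /** Hypothetical (docId, score) pair; not a Lucene class. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Ranking order: higher score first, ties broken by lower doc id.
    // The explicit doc-id tie-break is what makes arrival order irrelevant.
    static final Comparator<Hit> BEST_FIRST = (a, b) -> {
        int c = Float.compare(b.score, a.score);
        return c != 0 ? c : Integer.compare(a.doc, b.doc);
    };

    /** Hypothetical top-N collector that does not assume increasing doc ids. */
    static List<Integer> topN(List<Hit> hits, int n) {
        // The heap head is the worst hit currently kept.
        PriorityQueue<Hit> heap = new PriorityQueue<>(BEST_FIRST.reversed());
        for (Hit h : hits) {
            heap.add(h);
            if (heap.size() > n) {
                heap.poll(); // evict the worst hit
            }
        }
        List<Hit> kept = new ArrayList<>(heap);
        kept.sort(BEST_FIRST);
        List<Integer> docs = new ArrayList<>();
        for (Hit h : kept) {
            docs.add(h.doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Hit> inOrder = Arrays.asList(
            new Hit(1, 0.5f), new Hit(2, 0.9f), new Hit(3, 0.5f), new Hit(4, 0.7f));
        // Same hits, delivered in a different (out-of-order) sequence.
        List<Hit> outOfOrder = Arrays.asList(
            inOrder.get(2), inOrder.get(0), inOrder.get(3), inOrder.get(1));
        System.out.println(topN(inOrder, 3));
        System.out.println(topN(outOfOrder, 3));
    }
}
```

Both calls return the same ranked doc ids, which is the property a collector needs before a scorer can safely hand it docs out of order.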

We can't expect the "other camp" to discover that this obscure setting
must be set, to maximize Lucene's OR query performance.

Mike

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
