On 31-Jan-08, at 9:41 AM, Andy Blower wrote:
Yonik Seeley wrote:
This surprises me because the filter query submitted has usually already
been submitted along with a normal query, and so should be cached in the
filter cache. Surely all Solr needs to do is return a handful of fields for
the first 100 records in the list from the cache - or so I thought.
To calculate the DocSet (the set of all documents matching *:* and
your filters), Solr can just use its caches as long as *:* and the
filters have been used before.
*But*, to retrieve the top 10 documents matching *:* and your filters,
the query must be re-run. That is probably where the time is being
spent. Since you aren't looking for relevancy scores at all, but just
faceting, it seems like we could potentially optimize this in Solr.
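To make the distinction above concrete, here is a rough conceptual sketch. It is not Solr's actual implementation: it simply models each cached DocSet as a Python set of document ids keyed by its query string. The names `filter_cache` and `doc_set` and the sample data are made up for illustration.

```python
# Conceptual sketch only: cached DocSets modelled as Python sets of doc ids.
# `filter_cache` and `doc_set` are hypothetical names, not Solr APIs.
filter_cache = {
    "*:*": {0, 1, 2, 3, 4, 5},
    "type:book": {1, 2, 4},
    "lang:en": {2, 4, 5},
}


def doc_set(query, filters, cache):
    """Intersect cached sets; raises KeyError if an entry was never cached."""
    result = set(cache[query])
    for fq in filters:
        result &= cache[fq]
    return result


matching = doc_set("*:*", ["type:book", "lang:en"], filter_cache)
print(sorted(matching))
```

The intersection gives the unordered set of matches, which is enough for faceting. Picking an ordered "top N" out of that set is the extra step discussed above.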
I'm actually retrieving the first 100 in my tests, which will be necessary
in one of the two scenarios we use blank queries for. The other scenario
doesn't require any docs at all - just the facets, and I've not put that in
my tests. What would the situation be if I specified a sort order for the
facets and/or retrieved no docs at all? I'd be sorting the facets
alphabetically, which is currently done by my app rather than the search
engine. (since I sometimes have to merge facets from more than one field)
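For readers wondering what that app-side step might look like, here is a minimal sketch. It assumes facet results arrive as (value, count) pairs per field and that merging means summing counts for identical values; that merge rule is an assumption (a document carrying the same value in both fields would be counted twice), and `merge_facets` is a hypothetical helper, not part of Solr or of the poster's application.

```python
from collections import Counter


def merge_facets(*facet_lists):
    """Sum counts for identical values across fields, then sort by value.

    Assumption: summing is an acceptable merge rule for the application.
    """
    merged = Counter()
    for facets in facet_lists:
        for value, count in facets:
            merged[value] += count
    # Alphabetical order, ignoring case.
    return sorted(merged.items(), key=lambda item: item[0].lower())


# Hypothetical facet counts from two different fields.
subject_facets = [("history", 3), ("art", 5)]
keyword_facets = [("art", 2), ("maps", 1)]
print(merge_facets(subject_facets, keyword_facets))
```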
First question: What is the use of retrieving 100 documents if there
is no defined sort order?
The situation could be optimized in Solr, but there is a related case
that _is_ optimized that should be almost as fast. If you
a) don't ask for document score in field list (fl)
b) enable <useFilterForSortedQuery> in solrconfig.xml
c) specify _some_ sort order other than score
Then Solr will do cached bitset intersections only. It will also do
sorting, but that may not be terribly expensive. If it is close to
the desired performance, it would be relatively easy to patch Solr to
not do that step.
(Note: this is query sort, not facet sort).
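A request that meets conditions (a) and (c) might be built roughly as follows. This is only a sketch: the base URL, the field names `id` and `title`, and the helper `build_select_url` are placeholders, and condition (b) still has to be set separately in solrconfig.xml.

```python
from urllib.parse import urlencode


def build_select_url(base_url, filters, sort, fields, rows=100):
    """Build a match-all query URL with filters and an explicit sort."""
    params = [("q", "*:*")]
    params += [("fq", fq) for fq in filters]   # one fq parameter per filter
    params += [
        ("sort", sort),                        # (c) a sort other than score
        ("fl", ",".join(fields)),              # (a) field list without score
        ("rows", str(rows)),
    ]
    return base_url + "?" + urlencode(params)


# Placeholder host, filter, and field names.
url = build_select_url(
    "http://localhost:8983/solr/select",
    filters=["type:book"],
    sort="id asc",
    fields=["id", "title"],
)
print(url)
```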
I had assumed that no doc would be considered more relevant than any other
without any query terms - i.e. filter query terms wouldn't affect relevance.
This seems sensible to me, but maybe that's only because our current search
engine works that way.
It won't, but it will still try to calculate the score if you ask it
to (all docs will score the same, though).
Regarding optimization, I certainly think that being able to access all
facets for subsets of the indexed data (defined by the filter query) is an
incredibly useful feature. My search engine usage may not be very common
though. What it means to us is that we can drive all aspects of our sites
from the search engine, not just the obvious search forms.
I also use this feature. It would be useful to optimize the case
where rows=0.
-Mike