On 31-Jan-08, at 9:41 AM, Andy Blower wrote:

Yonik Seeley wrote:

This surprises me because the filter query submitted has usually already been submitted along with a normal query, and so should be cached in the filter cache. Surely all Solr needs to do is return a handful of fields for the first 100 records in the list from the cache - or so I thought.

To calculate the DocSet (the set of all documents matching *:* and
your filters), Solr can just use its caches as long as *:* and the
filters have been used before.

*But*, to retrieve the top 10 documents matching *:* and your filters,
the query must be re-run.  That is probably where the time is being
spent. Since you aren't looking for relevancy scores at all, but just
faceting, it seems like we could potentially optimize this in Solr.
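The distinction above, between computing the DocSet from cached filters and re-running the query to get the top documents, can be pictured with a toy model. This is only a sketch: it is not Solr's implementation, the cache is a plain dict, bitsets are Python integers (bit i set means document i matches), and the names `doc_set` and `filter_cache` are hypothetical.

```python
def doc_set(filter_cache, filters):
    """Intersect cached per-filter bitsets.

    Returns None when any filter is missing from the cache, which stands in
    for "this filter would have to be executed against the index first".
    """
    result = None
    for fq in filters:
        bits = filter_cache.get(fq)
        if bits is None:
            return None  # cache miss: cannot answer from the cache alone
        result = bits if result is None else result & bits
    return result


# Hypothetical cache over a four-document index (documents 0..3).
filter_cache = {
    "*:*": 0b1111,        # every document
    "type:book": 0b0110,  # documents 1 and 2
    "lang:en": 0b1100,    # documents 2 and 3
}

matching = doc_set(filter_cache, ["*:*", "type:book", "lang:en"])
hit_count = bin(matching).count("1")  # number of matching documents
```

In this model the set of matching documents, and therefore any facet counts derived from it, needs no scoring at all. Choosing and ordering the documents to return is a separate step.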


I'm actually retrieving the first 100 in my tests, which will be necessary in one of the two scenarios we use blank queries for. The other scenario doesn't require any docs at all - just the facets, and I've not put that in my tests. What would the situation be if I specified a sort order for the facets and/or retrieved no docs at all? I'd be sorting the facets alphabetically, which is currently done by my app rather than the search engine (since I sometimes have to merge facets from more than one field).

First question: What is the use of retrieving 100 documents if there is no defined sort order?

The situation could be optimized in Solr, but there is a related case that _is_ optimized that should be almost as fast. If you

a) don't ask for document score in field list (fl)
b) enable <useFilterForSortedQuery> in solrconfig.xml
c) specify _some_ sort order other than score

Then Solr will do cached bitset intersections only. It will also do sorting, but that may not be terribly expensive. If it is close to the desired performance, it would be relatively easy to patch Solr to skip the sorting step.

(Note: this is query sort, not facet sort).
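As a rough illustration of conditions (a) and (c), a request along these lines could be built as follows. This is a sketch under assumptions: the base URL, the `id` sort field, and the `type:book` and `category` values are placeholders, not taken from this thread, and the helper name is hypothetical. Condition (b) is a solrconfig.xml setting and is not covered by the request itself.

```python
from urllib.parse import urlencode


def build_filter_only_query(base_url, filters, sort_field, rows=100, facet_fields=()):
    """Build a match-all Solr request that avoids asking for scores."""
    params = [
        ("q", "*:*"),                   # blank (match-all) query
        ("fl", "id"),                   # (a) field list without the score
        ("sort", f"{sort_field} asc"),  # (c) some sort order other than score
        ("rows", str(rows)),
    ]
    params += [("fq", fq) for fq in filters]  # filter queries
    if facet_fields:
        params.append(("facet", "true"))
        params += [("facet.field", name) for name in facet_fields]
    return f"{base_url}?{urlencode(params)}"


# Assumed endpoint and field names, for illustration only.
url = build_filter_only_query(
    "http://localhost:8983/solr/select",
    filters=["type:book"],
    sort_field="id",
    facet_fields=["category"],
)
print(url)
```

Passing rows=0 to the same helper gives the facets-only variant, where no documents are returned at all.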

I had assumed that no doc would be considered more relevant than any other without any query terms - i.e. filter query terms wouldn't affect relevance. This seems sensible to me, but maybe that's only because our current search engine works that way.

It won't, but it will still try to calculate the score if you ask it to (all docs will score the same, though).

Regarding optimization, I certainly think that being able to access all facets for subsets of the indexed data (defined by the filter query) is an incredibly useful feature. My search engine usage may not be very common though. What it means to us is that we can drive all aspects of our sites from the search engine, not just the obvious search forms.

I also use this feature. It would be useful to optimize the case where rows=0.

-Mike
