On 11/5/2013 3:36 PM, Scott Schneider wrote:
I'm wondering if filter queries are efficient enough for my use cases. I have
lots and lots of users in a big, multi-tenant, sharded index. To run a search,
I can use an fq on the user id and pass in the search terms. Does this scale
well with the number of users? I suppose that, since the user id is indexed, generating
the filter data (which is cached) will be fast. And looking up search terms is
fast, of course. But if the search term is a common one that many users have
in their documents, then Solr may have to perform an intersection between two
large sets: docs from all users with the search term and all of the current
user's docs.
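To make that concrete, a search for one user would look something like this (the
field names are just placeholders for whatever my schema actually uses):

  q=body:report&fq=user_id:12345

where the fq restricts results to the current user's documents.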
Also, how about auto-complete and searching with a trailing wildcard? As I understand it, these work well in
a single-tenant index because keywords are sorted in the index, so it's easy to get all the search terms that
match "foo*". In a multi-tenant index, all users' keywords are stored together. So if Lucene were
to look at all the keywords from "foo" to "foozzzzz" (I'm not sure if it actually does
this), it would have to skip over a large number of keywords that don't belong to the current user.
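The auto-complete case would be a prefix query with the same kind of filter (again, the
field names are only an example):

  q=keyword:foo*&fq=user_id:12345

and my concern is whether expanding foo* has to walk terms from every user before the
fq narrows things down.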
From what I understand, there's not really a whole lot of difference
between queries and filter queries when they are NOT cached, except that
the main query and the filter queries are executed in parallel, which
can save time.
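As an illustration, with a cold cache these two requests should end up doing roughly
the same amount of work (the field names are just an example):

  q=text:foo AND user_id:12345
  q=text:foo&fq=user_id:12345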
When filter queries are found in the filterCache, it's a different
story. They get applied *before* the main query, which means that the
main query won't have to work as hard. The filterCache stores
information about which documents in the entire index match the filter.
Because it is stored as a bitset, the amount of space required is relatively
low. Applying filterCache results is very efficient.
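For reference, the filterCache is defined in solrconfig.xml, and a typical definition
looks something like this (the sizes here are just the stock example values, which you
would tune for your own install):

  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>

Each cached entry is roughly one bit per document in the index, so even on an index
with ten million documents a cached filter needs only a little over a megabyte.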
There are also advanced techniques, like assigning a cost to each filter
and creating postfilters:
http://yonik.com/posts/advanced-filter-caching-in-solr/
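For example, the filter syntax described there looks roughly like this (the frange
filter over mul(popularity,price) is just the illustration used in that post):

  fq={!frange l=10 u=100 cache=false cost=150}mul(popularity,price)

Setting cache=false with a cost of 100 or more turns a filter that supports it into a
postfilter, which only runs against documents that have already matched the main query
and the cheaper filters.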
Thanks,
Shawn