Re: best way to cache base queries (before application of filters)
On Thu, May 21, 2009 at 3:30 AM, Kent Fitch kent.fi...@gmail.com wrote:
>> #2) Your problem might be able to be solved with field collapsing on the category field in the future (but it's not in Solr yet).
>
> Sorry - I didn't understand this

A single relevancy search, but group or collapse results based on the value of the category field such that you don't get more than 10 results for each value of category. But it's not in Solr yet...
http://issues.apache.org/jira/browse/SOLR-236

> - we've got one query we want filtered 5 ways to find the top scoring results matching the query and each filter

The problem is that caching the base query involves caching not only all of the matching documents, but the score for each document. That's expensive.

You could also write your own HitCollector that filtered the results of the base query 5 different ways simultaneously.

-Yonik
http://www.lucidimagination.com
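As a rough illustration of the "filter 5 different ways simultaneously" idea, here is a minimal sketch in plain Java. It does not use Lucene's HitCollector API; the class name MultiFilterCollector, its methods, and the use of an IntPredicate per filter are all assumptions made for the example. The point is only that the base query is scored once, and each scored hit is routed into a separate top-N queue for every filter it passes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.function.IntPredicate;

// Hypothetical sketch, not Lucene's HitCollector: same idea in plain Java.
class MultiFilterCollector {
    // One scored hit from the base query.
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int topN;
    private final Map<String, IntPredicate> filters;
    private final Map<String, PriorityQueue<Hit>> queues = new HashMap<>();

    MultiFilterCollector(int topN, Map<String, IntPredicate> filters) {
        this.topN = topN;
        this.filters = filters;
        for (String name : filters.keySet()) {
            // Min-heap on score, so the weakest retained hit is evicted first.
            queues.put(name, new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score)));
        }
    }

    // Called once per document matching the base query.
    void collect(int doc, float score) {
        for (Map.Entry<String, IntPredicate> e : filters.entrySet()) {
            if (!e.getValue().test(doc)) continue;   // doc is not in this category
            PriorityQueue<Hit> q = queues.get(e.getKey());
            q.add(new Hit(doc, score));
            if (q.size() > topN) q.poll();           // keep only the best topN
        }
    }

    // Retained hits for one filter, best score first.
    List<Hit> topDocs(String filterName) {
        List<Hit> out = new ArrayList<>(queues.get(filterName));
        out.sort((a, b) -> Float.compare(b.score, a.score));
        return out;
    }
}
```

In a real Solr or Lucene setting the predicates would presumably be backed by the cached filter bitsets for each category, so that the membership test per document stays cheap. Unlike caching the base query, this keeps only topN hits per filter rather than a score for every matching document.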
Re: best way to cache base queries (before application of filters)
On Wed, May 20, 2009 at 12:07 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
> Solr plays nice with HTTP caches. Perhaps the simplest solution is to put Solr behind a caching server such as Varnish, Squid, or even Apache?

In Kent's case, the other query parameters (the other filters mainly) change, so an external cache won't help.

-Yonik
http://www.lucidimagination.com
Re: best way to cache base queries (before application of filters)
How often do you update the indexes? We update once per day, and our HTTP cache has a hit rate of 75% once it gets warmed up.

wunder

On 5/20/09 9:07 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
> Kent,
>
> Solr plays nice with HTTP caches. Perhaps the simplest solution is to put Solr behind a caching server such as Varnish, Squid, or even Apache?
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: Kent Fitch kent.fi...@gmail.com
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 20, 2009 3:47:02 AM
> Subject: best way to cache base queries (before application of filters)
>
> Hi,
>
> I'm looking for some advice on how to add base query caching to SOLR.
>
> Our use-case for SOLR is:
>
> - a large Lucene index (32M docs, doubling in 6 months, 110GB increasing x 8 in 6 months)
> - a frontend which presents views of this data in 5 categories by firing off 5 queries with the same search term but 5 different fq values
>
> For example, an originating query for sydney harbour generates 5 SOLR queries:
>
> - ../search?q=<complicated expansion of sydney harbour>&fq=category:books
> - ../search?q=<complicated expansion of sydney harbour>&fq=category:maps
> - ../search?q=<complicated expansion of sydney harbour>&fq=category:music
> etc
>
> The complicated expansion requiring sloppy phrase matches, and the large database with lots of very large documents, means that some queries take quite some time (tens to several hundreds of ms), so we'd like to cache the results of the base query for a short time (long enough for all related queries to be issued).
>
> It looks like this isn't the use-case for queryResultCache, because its key is calculated in SolrIndexSearcher like this:
>
>     key = new QueryResultKey(cmd.getQuery(), cmd.getFilterList(), cmd.getSort(), cmd.getFlags());
>
> That is, the filters are part of the key, and the cached result reflects the application of the filters. This works great for what it is probably designed for: supporting paging through results.
>
> So, I think our options are:
>
> - create a new queryComponent that invokes SolrIndexSearcher differently, and which has its own (short lived but long entry length) cache of the base query results
> - subclass or change SolrIndexSearcher, perhaps making it pluggable, perhaps defining an optional new cache of base query results
> - create a subclass of the Lucene IndexSearcher which manages a cache of query results hidden from SolrIndexSearcher (and organise somehow for SolrIndexSearcher to use that subclass)
>
> Or perhaps I'm taking the wrong approach to this problem entirely!
>
> Any advice is greatly appreciated.
>
> Kent Fitch
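The first option in Kent's list (a short-lived cache of base query results, keyed without the filters) might look roughly like the sketch below. This is plain Java under stated assumptions, not Solr code: BaseQueryCache, its constructor arguments, and the injected clock are hypothetical names, and the cached value type V stands in for whatever holds the matching document ids and scores. The contrast with QueryResultKey is that only the base query string forms the key, so all five fq variants of one search share a single entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch of a short-lived cache keyed by the base query only.
class BaseQueryCache<V> {
    // A cached value plus the time after which it must be recomputed.
    private static final class Cached<T> {
        final T value;
        final long expiresAt;
        Cached(T value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final long ttlMillis;
    private final LongSupplier clock;
    private final Map<String, Cached<V>> entries;

    BaseQueryCache(long ttlMillis, final int maxEntries, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        // Access-ordered LinkedHashMap gives simple LRU eviction.
        this.entries = new LinkedHashMap<String, Cached<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Cached<V>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached base-query result, running the search only on a miss or after expiry.
    synchronized V get(String baseQuery, Function<String, V> search) {
        long now = clock.getAsLong();
        Cached<V> hit = entries.get(baseQuery);
        if (hit != null && hit.expiresAt > now) {
            return hit.value;
        }
        V value = search.apply(baseQuery);
        entries.put(baseQuery, new Cached<>(value, now + ttlMillis));
        return value;
    }
}
```

A caller would pass something like System::currentTimeMillis as the clock and apply each category filter to the returned value. Each entry would have to hold a score for every matching document, so maxEntries and the TTL would need to be kept small on an index of this size.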
Re: best way to cache base queries (before application of filters)
Some thoughts:

#1) This is sort of already implemented in some form... see this section of solrconfig.xml and try uncommenting it:

    <!-- An optimization that attempts to use a filter to satisfy a search.
         If the requested sort does not include score, then the filterCache
         will be checked for a filter matching the query. If found, the filter
         will be used as the source of document ids, and then the sort will be
         applied to that.
    <useFilterForSortedQuery>true</useFilterForSortedQuery>
    -->

Unfortunately, it's currently a system-wide setting... you can't select it per-query.

#2) Your problem might be able to be solved with field collapsing on the category field in the future (but it's not in Solr yet).

#3) Work I'm doing right now will push Filters down a level and check them in tandem with the query instead of after. This should speed things up by at least a factor of 2 in your case.
https://issues.apache.org/jira/browse/SOLR-1165

I'm trying to get SOLR-1165 finished this week, and I'd love to see how it affects your performance. In the meantime, try useFilterForSortedQuery and let us know if it still works (it's been turned off for a long time) ;-)

-Yonik
http://www.lucidimagination.com

On Wed, May 20, 2009 at 3:47 AM, Kent Fitch kent.fi...@gmail.com wrote:
> Hi,
>
> I'm looking for some advice on how to add base query caching to SOLR.
>
> Our use-case for SOLR is:
>
> - a large Lucene index (32M docs, doubling in 6 months, 110GB increasing x 8 in 6 months)
> - a frontend which presents views of this data in 5 categories by firing off 5 queries with the same search term but 5 different fq values
>
> For example, an originating query for sydney harbour generates 5 SOLR queries:
>
> - ../search?q=<complicated expansion of sydney harbour>&fq=category:books
> - ../search?q=<complicated expansion of sydney harbour>&fq=category:maps
> - ../search?q=<complicated expansion of sydney harbour>&fq=category:music
> etc
>
> The complicated expansion requiring sloppy phrase matches, and the large database with lots of very large documents, means that some queries take quite some time (tens to several hundreds of ms), so we'd like to cache the results of the base query for a short time (long enough for all related queries to be issued).
>
> It looks like this isn't the use-case for queryResultCache, because its key is calculated in SolrIndexSearcher like this:
>
>     key = new QueryResultKey(cmd.getQuery(), cmd.getFilterList(), cmd.getSort(), cmd.getFlags());
>
> That is, the filters are part of the key, and the cached result reflects the application of the filters. This works great for what it is probably designed for: supporting paging through results.
>
> So, I think our options are:
>
> - create a new queryComponent that invokes SolrIndexSearcher differently, and which has its own (short lived but long entry length) cache of the base query results
> - subclass or change SolrIndexSearcher, perhaps making it pluggable, perhaps defining an optional new cache of base query results
> - create a subclass of the Lucene IndexSearcher which manages a cache of query results hidden from SolrIndexSearcher (and organise somehow for SolrIndexSearcher to use that subclass)
>
> Or perhaps I'm taking the wrong approach to this problem entirely!
>
> Any advice is greatly appreciated.
>
> Kent Fitch
Re: best way to cache base queries (before application of filters)
Kent,

Solr plays nice with HTTP caches. Perhaps the simplest solution is to put Solr behind a caching server such as Varnish, Squid, or even Apache?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Kent Fitch kent.fi...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, May 20, 2009 3:47:02 AM
Subject: best way to cache base queries (before application of filters)

Hi,

I'm looking for some advice on how to add base query caching to SOLR.

Our use-case for SOLR is:

- a large Lucene index (32M docs, doubling in 6 months, 110GB increasing x 8 in 6 months)
- a frontend which presents views of this data in 5 categories by firing off 5 queries with the same search term but 5 different fq values

For example, an originating query for sydney harbour generates 5 SOLR queries:

- ../search?q=<complicated expansion of sydney harbour>&fq=category:books
- ../search?q=<complicated expansion of sydney harbour>&fq=category:maps
- ../search?q=<complicated expansion of sydney harbour>&fq=category:music
etc

The complicated expansion requiring sloppy phrase matches, and the large database with lots of very large documents, means that some queries take quite some time (tens to several hundreds of ms), so we'd like to cache the results of the base query for a short time (long enough for all related queries to be issued).

It looks like this isn't the use-case for queryResultCache, because its key is calculated in SolrIndexSearcher like this:

    key = new QueryResultKey(cmd.getQuery(), cmd.getFilterList(), cmd.getSort(), cmd.getFlags());

That is, the filters are part of the key, and the cached result reflects the application of the filters. This works great for what it is probably designed for: supporting paging through results.

So, I think our options are:

- create a new queryComponent that invokes SolrIndexSearcher differently, and which has its own (short lived but long entry length) cache of the base query results
- subclass or change SolrIndexSearcher, perhaps making it pluggable, perhaps defining an optional new cache of base query results
- create a subclass of the Lucene IndexSearcher which manages a cache of query results hidden from SolrIndexSearcher (and organise somehow for SolrIndexSearcher to use that subclass)

Or perhaps I'm taking the wrong approach to this problem entirely!

Any advice is greatly appreciated.

Kent Fitch
Re: best way to cache base queries (before application of filters)
On Wed, May 20, 2009 at 12:43 PM, Yonik Seeley yo...@lucidimagination.com wrote:
> <useFilterForSortedQuery>true</useFilterForSortedQuery>

Of course the examples you gave used the default sort (by score), so this wouldn't help if you do actually need to sort by score.

-Yonik
http://www.lucidimagination.com
Re: best way to cache base queries (before application of filters)
An HTTP cache will still work. We make three or four back end queries for each search page. We use separate request handlers with filter query specs instead of putting the filter query in the URL, but those two approaches are equivalent for the HTTP cache. We get similar cache hit rates on the faceted browse.

wunder

On 5/20/09 9:14 AM, Yonik Seeley yo...@lucidimagination.com wrote:
> On Wed, May 20, 2009 at 12:07 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
>> Solr plays nice with HTTP caches. Perhaps the simplest solution is to put Solr behind a caching server such as Varnish, Squid, or even Apache?
>
> In Kent's case, the other query parameters (the other filters mainly) change, so an external cache won't help.
>
> -Yonik
> http://www.lucidimagination.com
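The two positions above can be reconciled with a small sketch of how a URL-keyed cache behaves. This is a toy model in plain Java, not Varnish, Squid, or any real HTTP cache; the class UrlKeyedCache and its counters are invented for illustration. Because the key is the whole request URL, the per-category requests of one search never share an entry (Yonik's point), but the same URLs issued again before the index changes are all served from cache (wunder's point).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical toy model of an HTTP cache: the full URL is the cache key.
class UrlKeyedCache {
    private final Map<String, String> responses = new HashMap<>();
    int hits = 0;
    int misses = 0;

    // Returns the cached body for this URL, calling the backend only on a miss.
    String fetch(String url, Function<String, String> backend) {
        String cached = responses.get(url);
        if (cached != null) {
            hits++;
            return cached;
        }
        misses++;
        String body = backend.apply(url);
        responses.put(url, body);
        return body;
    }
}
```

Whether this helps in Kent's case would depend on how often the same search terms recur across users between index updates, which the thread does not say.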