Re: best way to cache base queries (before application of filters)

2009-05-21 Thread Yonik Seeley
On Thu, May 21, 2009 at 3:30 AM, Kent Fitch kent.fi...@gmail.com wrote:
  #2) Your problem might be able to be solved with field collapsing on
  the category field in the future (but it's not in Solr yet).
 Sorry - I didn't understand this

A single relevancy search, but group or collapse results based on the
value of the category field such that you don't get more than 10
results for each value of category.
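
To make the idea concrete, here is a rough sketch in plain Java. It is not the SOLR-236 patch and none of these names exist in Solr; the Hit class and the collapse method are made up for illustration. It walks hits that are already in relevancy order and keeps at most N per category value:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: none of these names exist in Solr.
final class CollapseSketch {

    /** One search hit: a document id, its category value, and its relevancy score. */
    static final class Hit {
        final int doc;
        final String category;
        final float score;

        Hit(int doc, String category, float score) {
            this.doc = doc;
            this.category = category;
            this.score = score;
        }
    }

    /**
     * Walks hits that are already sorted by descending score and keeps
     * at most maxPerCategory hits for each category value.
     */
    static List<Hit> collapse(List<Hit> rankedHits, int maxPerCategory) {
        Map<String, Integer> keptPerCategory = new HashMap<>();
        List<Hit> out = new ArrayList<>();
        for (Hit h : rankedHits) {
            int n = keptPerCategory.getOrDefault(h.category, 0);
            if (n < maxPerCategory) {
                out.add(h);
                keptPerCategory.put(h.category, n + 1);
            }
        }
        return out;
    }
}
```

With maxPerCategory set to 10, a single relevancy-ordered pass would yield the per-category top 10 that the five separate queries produce today.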

but it's not in Solr yet...
http://issues.apache.org/jira/browse/SOLR-236

 - we've got one query we want filtered 5 ways to find the top scoring
 results matching the query and each filter

The problem is that caching the base query involves caching not only
all of the matching documents, but the score for each document.
That's expensive.

You could also write your own HitCollector that filtered the results
of the base query 5 different ways simultaneously.
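
Something along these lines, as a minimal sketch only. It is written in plain Java so it stands alone; in real code the class would extend Lucene's HitCollector and implement collect(int doc, float score). The class name, the use of java.util.BitSet for the filters, and the topN parameter are all assumptions made for the example:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: a stand-in for a Lucene HitCollector subclass.
// The class, its fields, and the BitSet filters are assumptions.
final class MultiFilterCollector {

    static final class ScoredDoc {
        final int doc;
        final float score;

        ScoredDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final BitSet[] filters;                        // allowed doc ids, one set per filter
    private final List<PriorityQueue<ScoredDoc>> queues;   // min-heap of best hits, one per filter
    private final int topN;

    MultiFilterCollector(BitSet[] filters, int topN) {
        this.filters = filters;
        this.topN = topN;
        this.queues = new ArrayList<>();
        Comparator<ScoredDoc> lowestScoreFirst = (a, b) -> Float.compare(a.score, b.score);
        for (int i = 0; i < filters.length; i++) {
            queues.add(new PriorityQueue<ScoredDoc>(lowestScoreFirst));
        }
    }

    /** Called once per document matching the base query, like HitCollector.collect. */
    void collect(int doc, float score) {
        for (int i = 0; i < filters.length; i++) {
            if (!filters[i].get(doc)) {
                continue;  // this document is not in filter i
            }
            PriorityQueue<ScoredDoc> q = queues.get(i);
            if (q.size() < topN) {
                q.add(new ScoredDoc(doc, score));
            } else if (q.peek().score < score) {
                q.poll();  // evict the current lowest-scoring hit
                q.add(new ScoredDoc(doc, score));
            }
        }
    }

    /** Top hits for filter i, best score first. */
    List<ScoredDoc> topDocs(int i) {
        List<ScoredDoc> out = new ArrayList<>(queues.get(i));
        out.sort((a, b) -> Float.compare(b.score, a.score));
        return out;
    }
}
```

The base query is evaluated and scored once, and only the top N hits per filter are retained, so nothing the size of the full scored result set has to be cached.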

-Yonik
http://www.lucidimagination.com


Re: best way to cache base queries (before application of filters)

2009-05-20 Thread Yonik Seeley
On Wed, May 20, 2009 at 12:07 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Solr plays nice with HTTP caches.  Perhaps the simplest solution is to put 
 Solr behind a caching server such as Varnish, Squid, or even Apache?

In Kent's case, the other query parameters (the other filters mainly)
change, so an external cache won't help.
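
A toy sketch of the reasoning (plain Java, not modelled on any real proxy; UrlCacheSketch and its placeholder response are made up): an HTTP cache keys on the full request URL, so each of the five fq variants is its own entry and the first request for each one still runs the whole base query. Only an exact repeat of the same URL is served from the cache.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an HTTP cache: responses are keyed by the full request URL.
final class UrlCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    int hits = 0;
    int misses = 0;

    /** Returns the cached response for url, or "runs the query" on a miss. */
    String fetch(String url) {
        String cached = cache.get(url);
        if (cached != null) {
            hits++;
            return cached;
        }
        misses++;
        String response = "results for " + url;  // placeholder for the real Solr call
        cache.put(url, response);
        return response;
    }
}
```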

-Yonik
http://www.lucidimagination.com


Re: best way to cache base queries (before application of filters)

2009-05-20 Thread Walter Underwood
How often do you update the indexes? We update once per day, and our
HTTP cache has a hit rate of 75% once it gets warmed up.

wunder

On 5/20/09 9:07 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
 
 Kent,
 
 Solr plays nice with HTTP caches.  Perhaps the simplest solution is to put
 Solr behind a caching server such as Varnish, Squid, or even Apache?
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
 ----- Original Message ----
 From: Kent Fitch kent.fi...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, May 20, 2009 3:47:02 AM
 Subject: best way to cache base queries (before application of filters)
 
 Hi,  I'm looking for some advice on how to add base query caching to SOLR.
 
 Our use-case for SOLR is:
 
 - a large Lucene index (32M docs, doubling in 6 months, 110GB increasing x 8
 in 6 months)
 - a frontend which presents views of this data in 5 categories by firing
 off 5 queries with the same search term but 5 different fq values
 
 For example, an originating query for sydney harbour generates 5 SOLR
 queries:
 
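 - ../search?q=<complicated expansion of sydney harbour>&fq=category:books
 - ../search?q=<complicated expansion of sydney harbour>&fq=category:maps
 - ../search?q=<complicated expansion of sydney harbour>&fq=category:music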
 etc
 
 The complicated expansion requiring sloppy phrase matches, and the large
 database with lots of very large documents means that some queries take
 quite some time (10's to several 100's of ms), so we'd like to cache the
 results of the base query for a short time (long enough for all related
 queries to be issued).
 
 It looks like this isn't the use-case for queryResultCache, because its key
 is calculated in SolrIndexSearcher like this:
 
 key = new QueryResultKey(cmd.getQuery(), cmd.getFilterList(), cmd.getSort(),
 cmd.getFlags());
 
 That is, the filters are part of the key; and the cached result
 reflects the application of the filters, and this works great for
 what it is probably designed for - supporting paging through results.
 
 So, I think our options are:
 
 - create a new queryComponent that invokes SolrIndexSearcher differently,
 and which has its own (short lived but long entry length) cache of the base
 query results
 
 - subclass or change SolrIndexSearcher, perhaps making it pluggable,
 perhaps defining an optional new cache of base query results
 
 - create a subclass of the Lucene IndexSearcher which manages a cache of
 query results hidden from SolrIndexSearcher (and organise somehow for
 SolrIndexSearcher to use that subclass)
 
 Or perhaps I'm taking the wrong approach to this problem entirely!  Any
 advice is greatly appreciated.
 
 Kent Fitch
 



Re: best way to cache base queries (before application of filters)

2009-05-20 Thread Yonik Seeley
Some thoughts:

#1) This is sort of already implemented in some form... see this
section of solrconfig.xml and try uncommenting it:

    <!-- An optimization that attempts to use a filter to satisfy a search.
         If the requested sort does not include score, then the filterCache
         will be checked for a filter matching the query. If found, the filter
         will be used as the source of document ids, and then the sort will be
         applied to that.
    <useFilterForSortedQuery>true</useFilterForSortedQuery>
    -->

Unfortunately, it's currently a system-wide setting... you can't
select it per-query.

#2) Your problem might be able to be solved with field collapsing on
the category field in the future (but it's not in Solr yet).

#3) Current work I'm doing right now will push Filters down a level
and check them in tandem with the query instead of after.  This should
speed things up by at least a factor of 2 in your case.
https://issues.apache.org/jira/browse/SOLR-1165
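
As a rough picture of why checking the filter alongside the query helps, here is a conceptual sketch only. It is not the SOLR-1165 code; the class, the BitSet inputs, and the score counter are made up, and the precomputed set of query matches is a simplification of how matching really works. It counts how many documents pay the scoring cost under each ordering:

```java
import java.util.BitSet;

// Conceptual illustration only; this is not the SOLR-1165 implementation.
final class FilterOrderSketch {
    int scored = 0;  // how many documents had a score computed

    private void score(int doc) {
        scored++;  // stands in for the expensive sloppy-phrase scoring
    }

    /** Score every query match, then drop those outside the filter. */
    int filterAfter(BitSet queryMatches, BitSet filter) {
        int kept = 0;
        for (int d = queryMatches.nextSetBit(0); d >= 0; d = queryMatches.nextSetBit(d + 1)) {
            score(d);
            if (filter.get(d)) {
                kept++;
            }
        }
        return kept;
    }

    /** Check the filter first and only score documents that pass it. */
    int filterInTandem(BitSet queryMatches, BitSet filter) {
        int kept = 0;
        for (int d = queryMatches.nextSetBit(0); d >= 0; d = queryMatches.nextSetBit(d + 1)) {
            if (!filter.get(d)) {
                continue;  // skipped before any scoring work is done
            }
            score(d);
            kept++;
        }
        return kept;
    }
}
```

Both methods keep the same documents; the difference is only in how many are scored, and the more selective the category filter, the bigger that difference.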

I'm trying to get SOLR-1165 finished this week, and I'd love to see
how it affects your performance.
In the meantime, try useFilterForSortedQuery and let us know if it
still works (it's been turned off for a long time) ;-)

-Yonik
http://www.lucidimagination.com



On Wed, May 20, 2009 at 3:47 AM, Kent Fitch kent.fi...@gmail.com wrote:
 Hi,  I'm looking for some advice on how to add base query caching to SOLR.

 Our use-case for SOLR is:

 - a large Lucene index (32M docs, doubling in 6 months, 110GB increasing x 8
 in 6 months)
 - a frontend which presents views of this data in 5 categories by firing
 off 5 queries with the same search term but 5 different fq values

 For example, an originating query for sydney harbour generates 5 SOLR
 queries:

 - ../search?q=<complicated expansion of sydney harbour>&fq=category:books
 - ../search?q=<complicated expansion of sydney harbour>&fq=category:maps
 - ../search?q=<complicated expansion of sydney harbour>&fq=category:music
 etc

 The complicated expansion requiring sloppy phrase matches, and the large
 database with lots of very large documents means that some queries take
 quite some time (10's to several 100's of ms), so we'd like to cache the
 results of the base query for a short time (long enough for all related
 queries to be issued).

 It looks like this isn't the use-case for queryResultCache, because its key
 is calculated in SolrIndexSearcher like this:

 key = new QueryResultKey(cmd.getQuery(), cmd.getFilterList(), cmd.getSort(),
 cmd.getFlags());

 That is, the filters are part of the key; and the cached result
 reflects the application of the filters, and this works great for
 what it is probably designed for - supporting paging through results.

 So, I think our options are:

 - create a new queryComponent that invokes SolrIndexSearcher differently,
 and which has its own (short lived but long entry length) cache of the base
 query results

 - subclass or change SolrIndexSearcher, perhaps making it pluggable,
 perhaps defining an optional new cache of base query results

 - create a subclass of the Lucene IndexSearcher which manages a cache of
 query results hidden from SolrIndexSearcher (and organise somehow for
 SolrIndexSearcher to use that subclass)

 Or perhaps I'm taking the wrong approach to this problem entirely!  Any
 advice is greatly appreciated.

 Kent Fitch



Re: best way to cache base queries (before application of filters)

2009-05-20 Thread Otis Gospodnetic

Kent,

Solr plays nice with HTTP caches.  Perhaps the simplest solution is to put Solr 
behind a caching server such as Varnish, Squid, or even Apache?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
 From: Kent Fitch kent.fi...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, May 20, 2009 3:47:02 AM
 Subject: best way to cache base queries (before application of filters)
 
 Hi,  I'm looking for some advice on how to add base query caching to SOLR.
 
 Our use-case for SOLR is:
 
 - a large Lucene index (32M docs, doubling in 6 months, 110GB increasing x 8
 in 6 months)
 - a frontend which presents views of this data in 5 categories by firing
 off 5 queries with the same search term but 5 different fq values
 
 For example, an originating query for sydney harbour generates 5 SOLR
 queries:
 
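 - ../search?q=<complicated expansion of sydney harbour>&fq=category:books
 - ../search?q=<complicated expansion of sydney harbour>&fq=category:maps
 - ../search?q=<complicated expansion of sydney harbour>&fq=category:music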
 etc
 
 The complicated expansion requiring sloppy phrase matches, and the large
 database with lots of very large documents means that some queries take
 quite some time (10's to several 100's of ms), so we'd like to cache the
 results of the base query for a short time (long enough for all related
 queries to be issued).
 
 It looks like this isn't the use-case for queryResultCache, because its key
 is calculated in SolrIndexSearcher like this:
 
 key = new QueryResultKey(cmd.getQuery(), cmd.getFilterList(), cmd.getSort(),
 cmd.getFlags());
 
 That is, the filters are part of the key; and the cached result
 reflects the application of the filters, and this works great for
 what it is probably designed for - supporting paging through results.
 
 So, I think our options are:
 
 - create a new queryComponent that invokes SolrIndexSearcher differently,
 and which has its own (short lived but long entry length) cache of the base
 query results
 
 - subclass or change SolrIndexSearcher, perhaps making it pluggable,
 perhaps defining an optional new cache of base query results
 
 - create a subclass of the Lucene IndexSearcher which manages a cache of
 query results hidden from SolrIndexSearcher (and organise somehow for
 SolrIndexSearcher to use that subclass)
 
 Or perhaps I'm taking the wrong approach to this problem entirely!  Any
 advice is greatly appreciated.
 
 Kent Fitch



Re: best way to cache base queries (before application of filters)

2009-05-20 Thread Yonik Seeley
On Wed, May 20, 2009 at 12:43 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
    <useFilterForSortedQuery>true</useFilterForSortedQuery>

Of course the examples you gave used the default sort (by score) so
this wouldn't help if you do actually need to sort by score.

-Yonik
http://www.lucidimagination.com


Re: best way to cache base queries (before application of filters)

2009-05-20 Thread Walter Underwood
An HTTP cache will still work. We make three or four back end queries
for each search page. We use separate request handlers with filter query
specs instead of putting the filter query in the URL, but those two
approaches are equivalent for the HTTP cache.

We get similar cache hit rates on the faceted browse.

wunder

On 5/20/09 9:14 AM, Yonik Seeley yo...@lucidimagination.com wrote:

 On Wed, May 20, 2009 at 12:07 PM, Otis Gospodnetic
 otis_gospodne...@yahoo.com wrote:
 Solr plays nice with HTTP caches.  Perhaps the simplest solution is to put
 Solr behind a caching server such as Varnish, Squid, or even Apache?
 
 In Kent's case, the other query parameters (the other filters mainly)
 change, so an external cache won't help.
 
 -Yonik
 http://www.lucidimagination.com