Re: How to deal with cache for facet search when index is always increment?

2013-05-02 Thread Daniel Tyreus
On Wed, May 1, 2013 at 7:01 PM, 李威 li...@antvision.cn wrote:


 For facet search, Solr builds a cache that is based on all the docs in the
 index. If I import a new doc into the index, the cache goes out of date and
 has to be created again.
 For real-time search, docs can be imported into the index at any time. In
 this case the cache nearly always has to be created again, which makes facet
 search very slow.
 Do you have any ideas for dealing with this problem?


We're in a similar situation and have had better performance using
facet.method=fcs.

http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
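
A minimal sketch of what such a request's query string might look like. The match-all main query (q=*:*), rows=0, and the helper name facet_query_string are assumptions made for illustration; the state and city fields are borrowed from the examples elsewhere in this thread. The Solr wiki describes fcs as per-segment faceting for single-valued string fields, so it may not suit every field.

```python
from urllib.parse import urlencode


def facet_query_string(filter_query, facet_fields, method="fcs"):
    """Build the query string for a filtered, faceted Solr select request.

    Hypothetical helper: q=*:* and rows=0 are illustrative assumptions.
    """
    params = [
        ("q", "*:*"),              # assumed match-all main query
        ("fq", filter_query),      # filter query, e.g. state:California
        ("rows", 0),               # assumed: only facet counts are wanted
        ("facet", "true"),
        ("facet.method", method),  # fc, enum, or fcs
    ]
    # One facet.field parameter per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    return urlencode(params)


query_string = facet_query_string("state:California", ["city"])
print(query_string)
```

The resulting string would be appended to the select handler URL of whichever core or collection is being queried.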

The suggestion to use soft commits is also a good one.

Best regards,
Daniel


Re: filter before facet

2013-04-25 Thread Daniel Tyreus
On Thu, Apr 25, 2013 at 12:35 AM, Toke Eskildsen
t...@statsbiblioteket.dk wrote:



  This leads me to believe that the FQ is being applied AFTER the facets are
  calculated on the whole data set. For my use case it would make a ton of
  sense to apply the FQ first and then facet. Is it possible to specify this
  behavior or do I need to get into the code and get my hands dirty?


 As for creating a new faceting implementation that avoids the startup
 penalty by using only the found documents, then it is technically quite
 simple: Use stored fields, iterate the hits and request the values.
 Unfortunately this scales poorly with the number of hits, so unless you
 can guarantee that you will always have small result sets, this is
 probably not a viable option.


Thank you Toke for your detailed reply. I have perhaps an unusual use case
where we may have hundreds of thousands of users each with a few thousand
documents. On some queries I can guarantee the result size will be small
compared to the entire corpus since I'm filtering on one user's documents.
I may give this alternative faceting implementation a try.
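
A rough sketch of the approach Toke describes (iterate the hits and count the stored values), not an actual Solr faceting implementation. It assumes the hits have already been fetched as dictionaries of stored fields, and the sample documents and the function name facet_counts_from_hits are invented for illustration.

```python
from collections import Counter


def facet_counts_from_hits(hits, field):
    """Count the values of a stored field across an already-fetched result set.

    Work grows linearly with the number of hits, which is why this is only
    reasonable when result sets are guaranteed to be small.
    """
    counts = Counter()
    for doc in hits:
        value = doc.get(field)
        if value is None:
            continue  # document has no value for this field
        # A multi-valued field arrives as a list; treat a single value alike.
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts


# Invented sample hits standing in for one user's filtered documents.
hits = [
    {"id": "1", "city": "San Francisco"},
    {"id": "2", "city": "Oakland"},
    {"id": "3", "city": "San Francisco"},
    {"id": "4"},
]
top_cities = facet_counts_from_hits(hits, "city").most_common()
print(top_cities)
```

With a few thousand documents per user the loop stays cheap, but the same code over a result set in the millions would run into the scaling problem Toke warns about.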

Best regards,
Daniel


filter before facet

2013-04-24 Thread Daniel Tyreus
We're testing SolrCloud 4.1 for NRT search over hundreds of millions of
documents. I've been really impressed. The query performance is so much
better than we were getting out of our database.

With filter queries, we're able to get query times of less than 100ms under
moderate load. That's amazing.

My question today is on faceting. Let me give some examples to help make my
point.

*fq=state:California*
numFound = 92193
QTime = *80*

*fq=state:Calforni*
numFound = 0
QTime = *8*

*fq=state:California&facet=true&facet.field=city*
numFound = 92193
QTime = *1316*

*fq=city:San Francisco&facet=true&facet.field=city*
numFound = 1961
QTime = *1477*

*fq=state:Californi&facet=true&facet.field=city*
numFound = 0
QTime = *1380*
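
For anyone reproducing measurements like these, here is a minimal sketch of reading numFound and QTime out of a response. It assumes the request was made with wt=json; the function name summarize is hypothetical, and the sample body is hand-written using the figures from the first example above rather than captured from a server.

```python
import json


def summarize(response_text):
    """Return (numFound, QTime in ms) from a Solr JSON response body."""
    body = json.loads(response_text)
    return body["response"]["numFound"], body["responseHeader"]["QTime"]


# Hand-written sample body (assumption), mirroring fq=state:California above.
sample = (
    '{"responseHeader": {"status": 0, "QTime": 80},'
    ' "response": {"numFound": 92193, "start": 0, "docs": []}}'
)
num_found, qtime = summarize(sample)
print(f"numFound = {num_found}, QTime = {qtime}")
```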

So filtering is fast and faceting is slow, which is understandable.

But why is it slow to generate facets on a result set of 0? Furthermore,
why does it take the same amount of time to generate facets on a result set
of 2000 as 100,000 documents?

This leads me to believe that the FQ is being applied AFTER the facets are
calculated on the whole data set. For my use case it would make a ton of
sense to apply the FQ first and then facet. Is it possible to specify this
behavior or do I need to get into the code and get my hands dirty?

Best Regards,
Daniel


Re: filter before facet

2013-04-24 Thread Daniel Tyreus
I'm actually using one not listed in that doc (I suspect it's new). At
least with 3 or more facet fields, the FCS method is by far the best.

Here are some representative numbers with everything the same except for
the facet.method.

facet.method = fc
QTime = 3168

facet.method = enum
QTime = 309

facet.method = fcs
QTime = 19
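
To put those three numbers side by side, a small worked calculation of each method's QTime relative to the fastest one. Only the three figures above are used; nothing else is measured here.

```python
# QTime in milliseconds, as reported above for each facet.method.
qtime_ms = {"fc": 3168, "enum": 309, "fcs": 19}

fastest = min(qtime_ms, key=qtime_ms.get)
# How many times slower each method is than the fastest one.
slowdown = {method: t / qtime_ms[fastest] for method, t in qtime_ms.items()}

for method, ratio in sorted(slowdown.items(), key=lambda item: item[1]):
    print(f"{method}: {qtime_ms[method]} ms ({ratio:.1f}x the fastest)")
```

On these figures fc takes roughly 167 times as long as fcs, and enum roughly 16 times as long.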






On Wed, Apr 24, 2013 at 2:19 PM, Alexandre Rafalovitch
arafa...@gmail.com wrote:

 What's your facet.method? Have you tried setting it both ways?
 http://wiki.apache.org/solr/SimpleFacetParameters#facet.method

 Regards,
Alex.
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)

