Re: facet performance tips
Right, I haven't used SOLR-475 yet and am more familiar with Bobo. I believe there are differences, but I haven't looked into them yet. Since I'm using Solr 1.4 now, maybe I'll test the UnInvertedField approach. Feel free to report back results; I don't think I've seen many yet.

On Thu, Aug 13, 2009 at 10:51 AM, Fuad Efendi wrote:
> SOLR-1.4-trunk uses terms counting instead of bitset intersects (seems to
> be); check this
> http://issues.apache.org/jira/browse/SOLR-475
> (and probably http://issues.apache.org/jira/browse/SOLR-711)
RE: facet performance tips
It seems that SOLR-1.4-trunk counts terms instead of intersecting bitsets; see http://issues.apache.org/jira/browse/SOLR-475 (and probably http://issues.apache.org/jira/browse/SOLR-711).

-----Original Message-----
From: Jason Rutherglen

Yeah we need a performance comparison, I haven't had time to put one together.
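The difference between "counting terms" and "intersecting bitsets" can be illustrated with a toy sketch. This is not Solr's UnInvertedField code: the function names and the in-memory dict/set structures below are assumptions made purely for illustration.

```python
# Toy comparison of two ways to compute facet counts on a multivalued field.
# Assumption (for illustration only): documents are small ints and each
# document is a list of term strings held in a plain dict.


def facet_by_intersection(result_docs, postings):
    """filterCache-style approach: one set intersection per unique term."""
    result = set(result_docs)
    counts = {}
    for term, docs in postings.items():
        n = len(result & docs)  # cost grows with the number of unique terms
        if n:
            counts[term] = n
    return counts


def uninvert(doc_terms):
    """Build a sorted term dictionary and, per document, its term ordinals."""
    terms = sorted({t for ts in doc_terms.values() for t in ts})
    ord_of = {t: i for i, t in enumerate(terms)}
    doc_ords = {d: sorted({ord_of[t] for t in ts}) for d, ts in doc_terms.items()}
    return terms, doc_ords


def facet_by_counting(result_docs, terms, doc_ords):
    """Term-counting approach: visit only matching docs, bump one counter per term."""
    counts = [0] * len(terms)
    for d in result_docs:
        for o in doc_ords.get(d, ()):
            counts[o] += 1
    return {terms[o]: c for o, c in enumerate(counts) if c}


doc_terms = {0: ["red", "blue"], 1: ["blue"], 2: ["green", "blue"], 3: ["red"]}
postings = {}
for d, ts in doc_terms.items():
    for t in ts:
        postings.setdefault(t, set()).add(d)

terms, doc_ords = uninvert(doc_terms)
hits = [0, 1, 2]
print(facet_by_intersection(hits, postings))
print(facet_by_counting(hits, terms, doc_ords))
```

Both calls report blue: 3, green: 1, red: 1 for hits 0-2. The trade-off shows up in the loops: the first does work per distinct term, the second per matching document (at the price of keeping `doc_ords` in memory).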
Re: facet performance tips
Yeah, we need a performance comparison; I haven't had time to put one together. If/when I do, I'll compare Bobo's performance and memory consumption against Solr's bitset-intersection-based facets.

For near-realtime search, Solr needs to cache and merge bitsets at the SegmentReader level, and Bobo needs to be upgraded to work with Lucene 2.9's per-segment searching (it currently uses a MultiSearcher).

Distributed search on either should be fairly straightforward, right?

On Thu, Aug 13, 2009 at 9:55 AM, Fuad Efendi wrote:
> It seems BOBO-Browse is alternate faceting engine; would be interesting to
> compare performance with SOLR... Distributed?
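Merging facet results at the segment level (or across shards) comes down to summing partial counts. A minimal sketch, assuming each segment or shard can already produce its own term-to-count map; the function name and sample data are hypothetical, and this is not how Solr or Bobo actually implement it.

```python
from collections import Counter


def merge_facet_counts(partial_counts, limit=None):
    """Sum facet counts computed independently per segment or per shard.

    Term ordinals are local to each segment/shard, so partial results must be
    keyed by the term value itself before they can be added together."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)  # Counter.update adds counts, it does not replace
    # Sort by descending count, then by term for a stable order.
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if limit is None else ranked[:limit]


segment_a = {"red": 2, "blue": 1}
segment_b = {"blue": 4, "green": 1}
print(merge_facet_counts([segment_a, segment_b], limit=2))  # [('blue', 5), ('red', 2)]
```

The sum is exact only if every partial result contains all of its terms; if each shard returned just its own top N, a term that narrowly missed one shard's list would be undercounted in the merged result.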
RE: facet performance tips
Interesting: it has "BoboRequestHandler implements SolrRequestHandler", so it's easy to try, and it has shards support.

[Fuad Efendi] It seems BOBO-Browse is alternate faceting engine; would be interesting to compare performance with SOLR... Distributed?

[Jason Rutherglen] For your fields with many terms you may want to try Bobo http://code.google.com/p/bobo-browse/ which could work well with your case.
RE: facet performance tips
It seems BOBO-Browse is an alternative faceting engine; it would be interesting to compare its performance with SOLR's. Does it support distributed search?

-----Original Message-----
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
Sent: August-12-09 6:12 PM
To: solr-user@lucene.apache.org
Subject: Re: facet performance tips

For your fields with many terms you may want to try Bobo http://code.google.com/p/bobo-browse/ which could work well with your case.
RE: facet performance tips
I took 1.4 from trunk three days ago, and it seems OK for production (at least for my master instance, which only does writes). I use the same config files. 500 000 terms are OK too; I am using several million with a pre-1.3 SOLR taken from trunk.

However, do not try to "facet" (probably an outdated term after SOLR-475) on generic queries such as [* TO *] that return a huge result set. For smaller query results (100,000 instead of 100,000,000), "counting terms" is fast enough (a few milliseconds at http://www.tokenizer.org).

-----Original Message-----
From: Jérôme Etévé [mailto:jerome.et...@gmail.com]
Sent: August-13-09 5:38 AM
To: solr-user@lucene.apache.org
Subject: Re: facet performance tips

Thanks everyone for your advices. I increased my filterCache, and the faceting performances improved greatly.
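The advice about [* TO *] follows from how term counting works: the work grows with the number of matching documents. A back-of-the-envelope sketch, assuming a hypothetical average of 5 facet terms per document (that figure is not from the thread):

```python
def counting_increments(num_hits, avg_terms_per_doc):
    """Rough work estimate for term counting: one increment per
    (matching document, term in that document) pair."""
    return num_hits * avg_terms_per_doc


AVG_TERMS = 5  # hypothetical average number of facet terms per document
small = counting_increments(100_000, AVG_TERMS)
huge = counting_increments(100_000_000, AVG_TERMS)
print(small, huge, huge // small)  # 500000 500000000 1000
```

Whatever the real per-document average is, the match-all query needs about 1000 times as many counter increments as the 100,000-result query.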
Re: facet performance tips
Thanks, everyone, for your advice. I increased my filterCache, and faceting performance improved greatly. My faceted field currently has ~4 different terms, so I set a filterCache size of 5, and it works very well.

However, I'm planning to increase the number of terms to around 500 000, so I guess this approach won't work anymore; I doubt a 500 000-entry filterCache would work. So my best move is probably to upgrade to the soon-to-be-released Solr 1.4 to benefit from its new faceting method.

I know this is a bit off-topic, but do you have a rough idea of when 1.4 will be officially released? Also, is the current trunk OK for production? Is it compatible with 1.3 configuration files?

Thanks!
Jerome.

2009/8/13 Stephen Duncan Jr:
> Note that depending on the profile of your field (full text and how many
> unique terms on average per document), the improvements from 1.4 may not
> apply, as you may exceed the limits of the new faceting technique in Solr
> 1.4.
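For anyone following the plan to move to 1.4: as far as I know, the 1.4 release adds a `facet.method` request parameter, where `fc` uses the new term-counting code for multivalued fields and `enum` keeps the filterCache-based approach (check the facet parameter documentation for your exact version). The sketch below only builds a request URL; the host, path, query and field name `tags` are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port and path to your own setup.
SOLR_SELECT = "http://localhost:8983/solr/select"


def facet_query_url(q, field, method):
    """Build a Solr select URL that facets on `field` using the given
    facet.method ('fc' or 'enum')."""
    params = [
        ("q", q),
        ("rows", "0"),            # only facet counts are wanted here
        ("facet", "true"),
        ("facet.field", field),
        ("facet.method", method),
        ("facet.mincount", "1"),  # skip terms with zero hits
    ]
    return SOLR_SELECT + "?" + urlencode(params)


print(facet_query_url("title:solr", "tags", "fc"))
```

Running the same query once with `fc` and once with `enum` is a cheap way to compare the two methods on your own data.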
Re: facet performance tips
Note that depending on the profile of your field (whether it is full text, and how many unique terms each document has on average), the improvements in 1.4 may not apply, as you may exceed the limits of Solr 1.4's new faceting technique.

-Stephen

On Wed, Aug 12, 2009 at 2:12 PM, Erik Hatcher wrote:
> Yes, increasing the filterCache size will help with Solr 1.3 performance.
>
> Do note that trunk (soon Solr 1.4) has dramatically improved faceting
> performance.

--
Stephen Duncan Jr
www.stephenduncanjr.com
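One way to see why the field's profile matters: a term-counting approach has to store, for every document, the list of terms it contains. The sketch below estimates that size by storing sorted term ordinals as delta-encoded variable-length integers. This is loosely in the spirit of an uninverted field, not Solr's actual encoding or its limits, and the two sample documents are hypothetical.

```python
def vint_len(n):
    """Bytes needed for a non-negative int stored 7 bits per byte (varint)."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def encoded_doc_bytes(term_ords):
    """Bytes for one document's sorted, de-duplicated term ordinals stored
    as deltas from the previous ordinal, each delta written as a varint."""
    total, prev = 0, 0
    for o in sorted(set(term_ords)):
        total += vint_len(o - prev)
        prev = o
    return total


# Hypothetical documents: a tag-like field vs. a full-text field whose
# 300 unique terms are spread over a large term dictionary.
tags_doc = [10, 5_000, 70_000]
text_doc = [i * 3_000 for i in range(300)]
print(encoded_doc_bytes(tags_doc), encoded_doc_bytes(text_doc))  # 6 599
```

Multiplied by 160K documents, that is roughly 1 MB for the tag-like field versus roughly 96 MB for the full-text one, before any term-dictionary overhead.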
Re: facet performance tips
For your fields with many terms, you may want to try Bobo (http://code.google.com/p/bobo-browse/), which could work well in your case.

On Wed, Aug 12, 2009 at 12:02 PM, Fuad Efendi wrote:
> I am currently faceting on tokenized multi-valued field at
> http://www.tokenizer.org (25 mlns simple docs)
RE: facet performance tips
I am currently faceting on a tokenized multi-valued field at http://www.tokenizer.org (25 million simple docs).

It uses some home-made quick fixes similar to SOLR-475 (SOLR-711) and a non-synchronized cache (similar to LingPipe's FastCache; see SOLR-665 and SOLR-667).

Average faceting time on query results is 0.2-0.3 seconds; without those patches it is 20-50 seconds.

I am going to upgrade to SOLR-1.4 from trunk (with SOLR-475 & SOLR-667) and compare results...

P.S.
Avoid faceting on a field with a very large number of distinct terms (a few million in my case); it won't work in SOLR 1.3.

TIP: use a non-tokenized, single-valued field for faceting, such as a non-tokenized "country" field.

P.P.S.
It would be nice to load/stress test http://alias-i.com/lingpipe/docs/api/com/aliasi/util/FastCache.html against ConcurrentHashMap, which can put the CPU in a spin loop.

-----Original Message-----
From: Erik Hatcher [mailto:ehatc...@apache.org]
Sent: August-12-09 2:12 PM
To: solr-user@lucene.apache.org
Subject: Re: facet performance tips

Yes, increasing the filterCache size will help with Solr 1.3 performance.
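The tip about a non-tokenized, single-valued field works because each document has at most one value, so counting needs just one array lookup per hit. A toy sketch, loosely in the spirit of a per-document ordinal array; the function names and the `country` data are hypothetical.

```python
def build_ords(values):
    """Single-valued field: a sorted array of distinct values plus one
    ordinal per document (-1 means the document has no value)."""
    lookup = sorted({v for v in values if v is not None})
    ord_of = {v: i for i, v in enumerate(lookup)}
    ords = [ord_of[v] if v is not None else -1 for v in values]
    return lookup, ords


def facet_single_valued(result_docs, lookup, ords):
    """One array read and at most one increment per matching document."""
    counts = [0] * len(lookup)
    for d in result_docs:
        o = ords[d]
        if o >= 0:
            counts[o] += 1
    return {lookup[o]: c for o, c in enumerate(counts) if c}


country = ["CA", "US", "CA", None, "FR"]  # one value (or none) per document
lookup, ords = build_ords(country)
print(facet_single_valued([0, 1, 2, 3], lookup, ords))  # {'CA': 2, 'US': 1}
```

Memory is one int per document plus the distinct values, which stays small for a field like `country` even on 25 million documents.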
Re: facet performance tips
Yes, increasing the filterCache size will help with Solr 1.3 performance.

Do note that trunk (soon Solr 1.4) has dramatically improved faceting performance.

Erik

On Aug 12, 2009, at 1:30 PM, Jérôme Etévé wrote:
> Hi everyone,
>
> I'm using some faceting on a solr index containing ~ 160K documents.
> I perform facets on multivalued string fields. The number of possible
> different values is quite large.
RE: facet performance tips
Jerome,

Yes, you need to increase the filterCache size to something close to the number of unique facet values. But also consider the RAM required to accommodate the increase. I did see a significant performance gain by increasing the filterCache size.

Thanks,
Kalyan Manepalli

-----Original Message-----
From: Jérôme Etévé [mailto:jerome.et...@gmail.com]
Sent: Wednesday, August 12, 2009 12:31 PM
To: solr-user@lucene.apache.org
Subject: facet performance tips

Hi everyone, I'm using some faceting on a solr index containing ~ 160K documents.
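The RAM caveat can be made concrete with a worst-case estimate: if every cached filter were stored as a bitset with one bit per document, each entry would cost maxDoc/8 bytes. Solr may store small sets more compactly, so treat this as an upper bound; the function name is hypothetical.

```python
def filter_cache_upper_bound_bytes(max_doc, entries):
    """Worst case: every cached filter is a bitset of max_doc bits."""
    bytes_per_entry = (max_doc + 7) // 8  # round up to whole bytes
    return bytes_per_entry * entries


MAX_DOC = 160_000  # roughly the index size mentioned in the original question
for entries in (2_048, 500_000):
    mb = filter_cache_upper_bound_bytes(MAX_DOC, entries) / 1_000_000
    print(f"{entries:>7} entries -> up to {mb:,.0f} MB")
```

At 20,000 bytes per entry, the current 2048-entry cache stays around 41 MB, while a 500 000-entry cache could reach about 10 GB, which is why sizing the cache to the number of unique terms stops being practical.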
facet performance tips
Hi everyone,

I'm using faceting on a Solr index containing ~160K documents. I facet on multivalued string fields, and the number of distinct values is quite large.

Enabling facets degrades performance by a factor of 3.

Because I'm using Solr 1.3, I guess faceting relies on the filterCache. My filterCache is set to a size of 2048. I also noticed in my Solr stats a very small cache hit ratio (~0.01%).

Could this be the reason faceting is slow? Does it make sense to increase the filterCache size so that it roughly matches the number of distinct values in the faceted fields? Wouldn't that make memory usage explode?

Thanks for your help!

--
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net
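One plausible explanation for the near-zero hit ratio, assuming faceting looks up one cached filter per distinct term in the same order on every request and the cache evicts the least recently used entry: once there are more distinct terms than cache slots, each full scan evicts every entry before it is reused. A small simulation; the `LRUCache` class, the 10,000-term figure and the empty stand-in sets are hypothetical, not Solr's implementation.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache that records hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, compute):
        if key in self.data:
            self.data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.data[key]
        self.misses += 1
        value = compute(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
        return value


def hit_ratio(capacity, num_terms, num_requests):
    """Each facet request looks up the filter for every term, in term order."""
    cache = LRUCache(capacity)
    for _ in range(num_requests):
        for term in range(num_terms):
            cache.get_or_compute(term, lambda t: frozenset())  # stand-in doc set
    return cache.hits / (cache.hits + cache.misses)


print(hit_ratio(capacity=2048, num_terms=10_000, num_requests=5))    # 0.0
print(hit_ratio(capacity=10_000, num_terms=10_000, num_requests=5))  # 0.8
```

With a cache smaller than the term count, the ratio stays at zero no matter how many requests run; once every term fits, only the first request misses.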