Every faceting implementation I’ve seen (not just Solr/Lucene) makes big 
in-memory lists. Lots of values means a bigger list.
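
To make that concrete, here is a minimal sketch of the general idea, not Solr's or Lucene's actual code. The function name facet_counts and the sample documents are made up for illustration. It keeps one counter per distinct value seen in the matching docs, so the structure grows with the number of unique values rather than with the number of docs.

```python
from collections import Counter

def facet_counts(docs, field):
    """Hypothetical helper: count how many docs carry each value of `field`."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is not None:
            counts[value] += 1  # one counter entry per distinct value
    return counts

# Low-cardinality field: 1,000 docs share 3 distinct values.
low = [{"color": ("red", "green", "blue")[i % 3]} for i in range(1000)]
# High-cardinality field: 1,000 docs, every value unique.
high = [{"id": f"doc-{i}"} for i in range(1000)]

low_counts = facet_counts(low, "color")
high_counts = facet_counts(high, "id")

# Same number of docs, very different number of counters held in memory.
print(len(low_counts), len(high_counts))
```

With a field that is supposed to be unique, the list ends up with roughly one entry per matching doc, which is the worst case.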

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

On Sep 8, 2015, at 8:33 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 9/8/2015 9:10 AM, adfel70 wrote:
>> I am trying to understand why faceting on a field with lots of unique values
>> has a great impact on query performance. Since Googling for "Solr facet
>> algorithm" did not yield anything, I looked at how facets are implemented
>> in Lucene. I found that there are 2 methods: taxonomy-based and
>> SortedSetDocValues-based. Are Solr's facet capabilities based on one of
>> those methods? If so, I still can't understand why the number of unique
>> values impacts query performance...
> 
> Lucene's facet implementation is completely separate (and different)
> from Solr's implementation.  I am not familiar with the inner workings
> of either implementation.  Solr implemented faceting long before Lucene
> did.  I think *Solr* actually contains at least two different facet
> implementations, used for different kinds of facets.
> 
> Faceting on a field with many unique values uses a HUGE amount of heap
> memory, which is likely why query performance is impacted.
> 
> I have a dev system with all my indexes (each of which has dedicated
> hardware for production) on it.  Normally it requires 15GB of heap to
> operate properly.  Every now and then, I get asked to do a duplicate
> check on a field that *should* be unique, on an index with 250 million
> docs in it.  The query that I am asked to do for the facet matches about
> 100 million docs.  This facet query, on a field that DOES have
> docValues, will throw OOM if my heap is less than 27GB.  The dev machine
> only has 32GB of RAM, so as you might imagine, performance is really
> terrible when I do this query.  Thankfully it's a dev machine.  When I
> was doing these queries, it was running 4.9.1.  I have since upgraded it
> to 5.2.1, as a proof of concept for upgrading our production indexes ...
> but I have not attempted the facet query since the upgrade.
> 
> Thanks,
> Shawn
> 
