Every faceting implementation I’ve seen (not just Solr/Lucene) makes big in-memory lists. Lots of values means a bigger list.
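
To make that concrete, here is a rough sketch in Python. It is not how Solr or Lucene actually do it, and the function and field names are made up for illustration. It keeps one counter per distinct value seen in the matching docs, so the structure grows with the number of unique values, not just with the number of docs:

```python
from collections import Counter

def facet_counts(matching_docs, field_values):
    """Count how many matching docs carry each value of one field.

    Hypothetical helper for illustration only: field_values[doc_id]
    is the single value that doc holds for the faceted field.
    """
    counts = Counter()
    for doc_id in matching_docs:
        counts[field_values[doc_id]] += 1
    return counts

num_docs = 100_000
docs = range(num_docs)

# Low-cardinality field: only 5 distinct values across all docs.
status = [doc_id % 5 for doc_id in docs]
# High-cardinality field: every doc has its own value.
serial = list(docs)

low = facet_counts(docs, status)
high = facet_counts(docs, serial)

# Number of counters held in memory for each facet.
print(len(low), len(high))
```

Same number of docs in both cases, but the second facet has to hold one entry per doc. Real implementations use more compact arrays than a Python dict, but the scaling problem is the same.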

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Sep 8, 2015, at 8:33 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 9/8/2015 9:10 AM, adfel70 wrote:
>> I am trying to understand why faceting on a field with lots of unique values
>> has a great impact on query performance. Since Googling for Solr facet
>> algorithm did not yield anything, I looked how facets are implemented in
>> Lucene. I found out that there are 2 methods - taxonomy-based and
>> SortedSetDocValues-based. Does Solr facet capabilities are based on one of
>> those methods? if so, I still cant understand why unique values impacts
>> query performance...
>
> Lucene's facet implementation is completely separate (and different)
> from Solr's implementation. I am not familiar with the inner workings
> of either implementation. Solr implemented faceting long before Lucene
> did. I think *Solr* actually contains at least two different facet
> implementations, used for different kinds of facets.
>
> Faceting on a field with many unique values uses a HUGE amount of heap
> memory, which is likely why query performance is impacted.
>
> I have a dev system with all my indexes (each of which has dedicated
> hardware for production) on it. Normally it requires 15GB of heap to
> operate properly. Every now and then, I get asked to do a duplicate
> check on a field that *should* be unique, on an index with 250 million
> docs in it. The query that I am asked to do for the facet matches about
> 100 million docs. This facet query, on a field that DOES have
> docValues, will throw OOM if my heap is less than 27GB. The dev machine
> only has 32GB of RAM, so as you might imagine, performance is really
> terrible when I do this query. Thankfully it's a dev machine. When I
> was doing these queries, it was running 4.9.1. I have since upgraded it
> to 5.2.1, as a proof of concept for upgrading our production indexes ...
> but I have not attempted the facet query since the upgrade.
>
> Thanks,
> Shawn
>