Re: slow solr facet processing

Ere Maijala Fri, 05 Jan 2018 06:15:58 -0800

Hi Everyone,

This is a follow-up on the discussion from September 2017. Since then I've spent a lot of time gaining a better understanding of docValues compared to UIF and other things related to Solr performance. Here's a summary of the results based on my real-world experience:


1. Making sure Solr needs as little Java heap as possible is crucial.

2. UIF requires a lot of Java heap. With a larger index it becomes impractical, since Java GC can't easily keep up with the heaps required.

3. UIF is really fast, but only after serious warmup. DocValues work better if the index is updated regularly, since the same level of warmup is not needed.

4. DocValues, taking advantage of memory-mapped files, don't have the above problem, and after moving to all-docValues we have been able to reduce the Java heap from 31G to 6G. This is pretty significant, since it means we don't have to deal with long GC pauses.

5. Make sure docValues are also enabled for all fields used for sorting. This helps avoid spending memory on the field cache. Without docValues we could easily have 2 GB of field cache entries.

6. It seems that having docValues for the id field is useful too. For now stored needs to remain true as well (see https://issues.apache.org/jira/browse/SOLR-10816).

7. Sharding the index helps faceting with docValues perform more work in parallel and results in much better performance. This doesn't seem to negatively affect the overall performance (at least not enough to be perceived), and it seems that splitting our index into three shards resulted in a speedup that's better than the previous performance divided by three. There is a caveat [1], though.

8. In many cases fields that have docValues enabled can be switched from stored="true" to stored="false" since Solr can fetch the contents from docValues. A notable exception is multivalued fields where the order of the values is important. This means that enabling docValues doesn't add to the index size significantly.

9. The different replica types available in Solr 7 are really useful in reducing the CPU time spent indexing records. I'd still like a way to combine PULL replicas with NRT replicas so that only the PULL replicas handle search queries.

10. Lastly, a lot can be done on the application level. For instance, in our case many users don't care about facets or only use a couple of them, so we fetch them asynchronously as needed and collapse most by default without fetching them at all. This lowers the server load significantly (I'll work on contributing the option to upstream VuFind).
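
As an illustration of the docValues-related points above, here is a minimal sketch of how such field settings might be expressed as Solr Schema API "replace-field" commands. The field names and types (format, title_sort, id as string fields) are assumptions made up for this example, not taken from any real schema, and the helper function is hypothetical. The sketch only builds the JSON payloads; each one would be POSTed to the collection's /schema endpoint, and the index needs to be rebuilt after changing docValues on a field.

```python
import json


def replace_field_command(name, field_type, multi_valued=False, stored=False):
    """Build a Schema API 'replace-field' command with docValues enabled.

    Hypothetical helper for illustration; it only constructs the payload.
    """
    return {
        "replace-field": {
            "name": name,
            "type": field_type,
            "indexed": True,
            "stored": stored,
            "docValues": True,
            "multiValued": multi_valued,
        }
    }


# Hypothetical fields: a facet field, a sort field and the id field.
commands = [
    # Facet field: docValues on, stored off (value order does not matter here).
    replace_field_command("format", "string", multi_valued=True),
    # Sort field: docValues avoid building field cache entries on the heap.
    replace_field_command("title_sort", "string"),
    # id field: docValues on, but stored stays true (see SOLR-10816).
    replace_field_command("id", "string", stored=True),
]

for command in commands:
    print(json.dumps(command))
```

A multivalued field whose value order matters would keep stored=True instead, as noted in the list above.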



I hope this helps others make informed choices.

--Ere

[1] Care must be taken to avoid requests that cause Solr to fetch a lot of rows at once from each shard, since that blows up the memory usage, wreaking havoc in Solr. One particular case that, at first sight, doesn't look too dangerous, is deep paging without a cursor (Yonik has a good explanation of this at http://yonik.com/solr/paging-and-deep-paging/).
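
For reference, a rough sketch of paging with a cursor instead of large start offsets. It relies on Solr's cursorMark parameter: the first request sends cursorMark=*, the sort must include the uniqueKey field, and each response carries a nextCursorMark that is sent with the next request until it stops changing. The fetch function is an assumption: the caller supplies something that sends the parameters to a /select handler and returns the parsed JSON. The fake_fetch stub below only imitates that behaviour so the loop can be run without a Solr server; real cursor marks are opaque strings, not offsets.

```python
def iter_docs(fetch, query, sort, rows=100):
    """Yield all matching documents using cursorMark paging.

    `fetch` is a caller-supplied function (an assumption of this sketch)
    that takes a dict of request parameters and returns the parsed JSON
    response as a dict.
    """
    params = {"q": query, "sort": sort, "rows": rows, "cursorMark": "*"}
    while True:
        response = fetch(params)
        yield from response["response"]["docs"]
        next_mark = response["nextCursorMark"]
        # Solr returns the same mark again once there are no more results.
        if next_mark == params["cursorMark"]:
            break
        params = {**params, "cursorMark": next_mark}


def fake_fetch(params):
    """Stand-in for a real Solr request, serving five made-up documents."""
    ids = ["a", "b", "c", "d", "e"]
    mark = params["cursorMark"]
    start = 0 if mark == "*" else int(mark)
    page = ids[start:start + params["rows"]]
    return {
        "response": {"docs": [{"id": doc_id} for doc_id in page]},
        "nextCursorMark": str(start + len(page)),
    }


# The sort includes the uniqueKey field (assumed to be "id"), as cursors require.
all_ids = [doc["id"] for doc in iter_docs(fake_fetch, "*:*", "id asc", rows=2)]
print(all_ids)
```

With a cursor each shard only has to return rows for the current page, rather than start + rows for every request.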
