On 6/1/2019 12:27 AM, John Davis wrote:
I've read a bunch of the wikis on Solr heap usage and wanted to confirm my
understanding of what Solr uses the heap for:

This is something that's not straightforward to answer. It would not be wrong to say that Solr uses the Java heap for everything it does ... but saying that doesn't help you.

It's extremely difficult to predict in advance exactly how much heap you need to give to Solr.

https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

We can (and sometimes do) make specific recommendations to users that provide us with a wealth of information about their setup ... but you should know that those recommendations are always given with caveats. There's a good chance that things will actually work with less heap than we mention -- we're going to aim for larger values simply because the performance implications of a heap that's too small are orders of magnitude worse than one that's too large.

In practice, the way I deal with heap sizing is to start with a large value that seems big enough to work, and then analyze GC logs to try and determine whether it needs to be changed. The initial value is mostly arbitrary, influenced by experience.
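To make that concrete (purely as a hedged illustration -- the 8g figure is an arbitrary placeholder, not a recommendation for your index), the starting point on a typical Linux install is a line or two in solr.in.sh, and the later analysis is done on the GC log that Solr writes:

  # solr.in.sh -- the initial heap size is a guess, refined later from GC logs
  SOLR_HEAP="8g"
  # Recent Solr versions already log GC activity to solr_gc.log in the log
  # directory; GC_LOG_OPTS can be overridden here if different flags are needed.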

Most of Solr's functionality is provided by Lucene, which is a Java library for search. For me, Lucene and Solr's usage of Lucene are mostly a black box: precisely how it works internally is unknown to me. The source code is available, but it would take a very in-depth study to actually understand it.

1. Indexing new documents - are they kept in heap until committed? If not, how
long are new documents kept in heap?

Lucene sets aside a buffer to hold data that will be flushed to a new segment. Solr's default for this buffer size is 100MB. That buffer is flushed when it fills up, not just on commit. The segments produced by default are smaller than 100MB, so clearly Lucene does not store the data internally in the precise format that ends up on disk.

Indexing needs additional memory beyond that 100MB buffer for all the manipulations that Lucene and Solr must perform on the data.
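To sketch what that buffer is at the Lucene level (illustrative Lucene code, not Solr's actual implementation; the index path is made up), Solr's ramBufferSizeMB setting in solrconfig.xml corresponds to the RAM buffer on IndexWriterConfig:

  import java.nio.file.Paths;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.store.FSDirectory;

  public class RamBufferSketch {
      public static void main(String[] args) throws Exception {
          IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
          // Matches Solr's ramBufferSizeMB default: added documents are held
          // in this heap buffer until it fills, then flushed to a new segment,
          // independent of commits.
          iwc.setRAMBufferSizeMB(100.0);
          try (IndexWriter writer = new IndexWriter(
                  FSDirectory.open(Paths.get("/tmp/sketch-index")), iwc)) {
              // writer.addDocument(...) calls accumulate in the RAM buffer here.
          }
      }
  }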

2. Merging segments - does Solr load the entire segment into memory, or chunks
of it? If the latter, how large are those chunks?

Again, this is Lucene, so I don't know in detail. I can optimize an index that is much larger than all the memory in the system, so it cannot be loading all the data into memory. I don't think it's enormously RAM-hungry, but it does hit the CPU pretty hard. The fastest I have ever seen segment merging proceed is at about 30 megabytes per second, with 20 megabytes per second being more common. Virtually all modern disks are capable of faster transfer rates than 30MB/s, especially RAID10 volumes and SSD -- the disk is not the bottleneck.
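For what it's worth, the knobs that govern merging are the merge policy and merge scheduler, which Solr exposes in solrconfig.xml and which map to Lucene classes like these (a rough sketch; the values shown are just Lucene's usual defaults, not tuning advice):

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.ConcurrentMergeScheduler;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.index.TieredMergePolicy;

  public class MergeConfigSketch {
      public static void main(String[] args) {
          TieredMergePolicy mp = new TieredMergePolicy();
          mp.setMaxMergedSegmentMB(5 * 1024.0); // cap on merged segment size (~5GB default)
          mp.setSegmentsPerTier(10.0);          // default segments allowed per tier

          IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
          iwc.setMergePolicy(mp);
          iwc.setMergeScheduler(new ConcurrentMergeScheduler());
          // Merges stream data from the source segments and write the new
          // segment incrementally; whole segments are not pulled into heap.
      }
  }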

3. Queries, facets, caches - anything else major?

Faceting, grouping, and sorting are all RAM-hungry operations whose heap usage is greatly reduced by using docValues in the field definition, because the docValues data is already in exactly the form those operations need.
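In the Solr schema that just means docValues="true" on the field or field type. At the Lucene level it amounts to writing a second, column-oriented copy of each value; a rough illustrative sketch (the field names are invented):

  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.document.NumericDocValuesField;
  import org.apache.lucene.document.SortedDocValuesField;
  import org.apache.lucene.document.StringField;
  import org.apache.lucene.util.BytesRef;

  public class DocValuesSketch {
      public static void main(String[] args) {
          Document doc = new Document();
          // Inverted-index field, used for matching queries:
          doc.add(new StringField("category", "books", Field.Store.YES));
          // docValues copies of the values: stored column-oriented on disk, so
          // sorting, faceting, and grouping can read them without uninverting
          // the field into large heap structures at query time.
          doc.add(new SortedDocValuesField("category", new BytesRef("books")));
          doc.add(new NumericDocValuesField("price", 1999L));
      }
  }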

Was this wiki page one of the things you read?  I wrote it:

https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

Thanks,
Shawn
