On 6/1/2019 12:27 AM, John Davis wrote:
> I've read a bunch of the wikis on Solr heap usage and wanted to confirm my
> understanding of what Solr uses the heap for:
This is something that's not straightforward to answer. It would not be
wrong to say that Solr uses the Java heap for everything it does ... but
saying that doesn't help you.
It's extremely difficult to predict in advance exactly how much heap you
need to give to Solr.
https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
We can (and sometimes do) make specific recommendations to users who
provide us with a wealth of information about their setup ... but you
should know that those recommendations always come with caveats.
There's a good chance that things will actually work with less heap than
we mention -- we aim for larger values simply because the performance
implications of a heap that's too small are orders of magnitude worse
than those of one that's too large.
In practice, the way I deal with heap sizing is to start with a value
that seems comfortably large enough to work, and then analyze GC logs to
determine whether it needs to be changed. The initial value is mostly
arbitrary, informed by experience.
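As a rough illustration only (the number below is a placeholder, not a
recommendation), the heap for a standard install can be set on startup,
and recent Solr releases enable GC logging out of the box:

    # start Solr with an 8GB heap -- example value only
    bin/solr start -m 8g

    # after the node has been under real load for a while, look for
    # solr_gc.log* next to solr.log (the usual location in a default
    # install) and feed it to a GC log analyzer

For a permanent setting, the SOLR_HEAP variable in solr.in.sh does the
same thing as the -m option.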
Most of Solr's functionality is provided by Lucene, which is a
programming API for search. To me, Lucene, and Solr's usage of it, is
mostly a black box; precisely how it functions internally is unknown to
me. The source code is available, but it would take a very in-depth
study to actually understand it.
> 1. Indexing new documents - until committed? If not, how long are the
> new documents kept in heap?
Lucene sets aside a buffer to hold data that will be flushed to a new
segment. Solr's default for this buffer size is 100MB. That buffer is
flushed when it fills up, not just on commit. The segments produced by
default are smaller than 100MB, so Lucene clearly does not hold the data
in memory in the exact format that ends up on disk. Beyond that 100MB
buffer, indexing needs additional memory for all the manipulations that
Lucene and Solr must perform.
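If you want to see or change that buffer size, it lives in
solrconfig.xml. A minimal sketch (100 is the default if nothing is
configured):

    <indexConfig>
      <!-- flush the in-memory indexing buffer to a new segment once it
           reaches this many megabytes, even if no commit has happened -->
      <ramBufferSizeMB>100</ramBufferSizeMB>
    </indexConfig>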
> 2. Merging segments - does Solr load the entire segment in memory or
> chunks of it? If the latter, how large are these chunks?
Again, this is Lucene, so I don't know the details. I can optimize an
index that is much larger than all the memory in the system, so it
cannot be loading all of the data into memory at once. I don't think
merging is enormously RAM-hungry, but it does hit the CPU pretty hard.
The fastest I have ever seen segment merging proceed is about 30
megabytes per second, with 20 megabytes per second being more common.
Virtually all modern disks can sustain transfer rates well above
30MB/s, especially RAID10 volumes and SSDs, so the disk is not the
bottleneck.
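For what it's worth, merge behavior is controlled by the merge policy
in solrconfig.xml rather than by any memory setting. The values below
are the documented defaults for the tiered policy, shown only as a
sketch of where this is configured:

    <indexConfig>
      <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
        <!-- how many segments are merged together in one merge -->
        <int name="maxMergeAtOnce">10</int>
        <!-- how many similarly sized segments may accumulate per tier
             before a merge is triggered -->
        <int name="segmentsPerTier">10</int>
      </mergePolicyFactory>
    </indexConfig>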
> 3. Queries, facets, caches - anything else major?
Facets, grouping, and sorting are all RAM-hungry operations whose heap
usage is greatly reduced by enabling docValues in the field definition,
because the docValues data is already in exactly the form those
features need, so it does not have to be rebuilt on the heap.
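In the schema that just means adding docValues to the fields you facet,
group, or sort on. The field and type names below are only an example:

    <!-- managed-schema / schema.xml -->
    <field name="category" type="string" indexed="true" stored="true"
           docValues="true"/>

Keep in mind that turning docValues on for an existing field requires a
full reindex before it takes effect.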
Was this wiki page one of the things you read? I wrote it:
https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
Thanks,
Shawn