On 11/30/22 08:57, Matias Laino wrote:
> Q: What is the total document count?
> A: Based on the dashboard, it's Total #docs: 68.6mn on each node (I'm replicating the same data on both nodes).
> Each core has its own count, and in this screenshot you can see what I was talking about with maxDoc compared to numDocs:
> https://www.dropbox.com/s/jdgddn4ve5mluhr/core_doc_counts.png?dl=0
> Q: But it would be great to have an on-disk size and document count (max docs, not num docs) for each collection.
> A: I'm not sure where to get that from metrics. Based on the cloud dashboard it says the following by shard:
>
> preview_s1r2: 1.9GB
> preview_s2r11: 1.9GB
> preview_s2r6: 1.9GB
> staging-d_s1r1: 1.8GB
> staging-d_s2r4: 1.8GB
> staging-a_s1r1: 1.7GB
> staging-a_s2r4: 1.7GB
> staging-c_s2r5: 1.6GB
> staging-c_s1r2: 1.6GB
> pre-prod_s1r1: 1.6GB
> pre-prod_s2r4: 1.6GB
> staging-b_s1r2: 1.5GB
> staging-b_s2r5: 1.5GB
>
> That is replicated on the other node.
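To answer the metrics question: the CoreAdmin STATUS API reports per-core on-disk size along with both document counts, so you don't have to read them off the dashboard. A minimal sketch, assuming Solr listens on localhost:8983 (adjust host and port for your nodes):

    # Run against each node. The "index" section of every core in the
    # response includes numDocs, maxDoc, deletedDocs, and sizeInBytes.
    curl "http://localhost:8983/solr/admin/cores?action=STATUS&wt=json"

As a side note, maxDoc counts documents that have been deleted but not yet merged away, which is why it runs higher than numDocs.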
So you've got about 22GB of index data and, assuming Solr is the only thing running on the machine, only about 8GB of memory to cache it (16GB of total RAM minus the 8GB Solr heap). I would hope for at least 12GB of cache for that much index, and more is always better; 8GB may not be enough. If other software is running on the machine, there will be even less. Does ZK live on the same instance? If so, how much heap are you giving to that?
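If you want to see what is actually left for the OS disk cache, standard Linux tooling is enough on each node; nothing Solr-specific is needed:

    # Memory totals in gigabytes. The buff/cache and available columns
    # show what the OS has free for caching index files.
    free -g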
Performance of a system is often perfectly fine up until some threshold, and once you throw just a little bit more data into the mix so it goes over that threshold, performance drops drastically. That is how a small increase can bring a system to its knees.
If you can upgrade the instance to one with more memory, that might also help, but I do think the biggest problem is the autoSoftCommit setting. If you really can't make it at least two minutes, which is the value I would use, then set it as high as you can tolerate: 10 to 30 seconds, maybe.
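For reference, the soft commit interval lives in the updateHandler section of solrconfig.xml. A sketch of what I would start from, with two minutes (120000 ms) for soft commits; the autoCommit values shown are my usual assumption, not something taken from your config:

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- Hard commit: flushes index segments to disk without opening
           a new searcher, so it is cheap and can stay frequent -->
      <autoCommit>
        <maxTime>60000</maxTime>
        <openSearcher>false</openSearcher>
      </autoCommit>
      <!-- Soft commit: opens a new searcher, the expensive part; this
           controls how quickly new documents become visible -->
      <autoSoftCommit>
        <maxTime>120000</maxTime>
      </autoSoftCommit>
    </updateHandler>

Every soft commit discards Solr's caches and warms a new searcher, so the longer you can stretch that interval, the less pressure you put on the little memory you have.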
Thanks,
Shawn