Hi Shawn, thanks again for the reply. I've tried increasing the machine's memory to 32 GB, with a 16 GB heap and 8 cores, and even though I still see peaks of 300% CPU on the Solr process, it can handle it (Solr doesn't go down). However, I've tried several different configurations for autoCommit and autoSoftCommit, and results always take a few minutes to show up in search, which is really unacceptable for us. I'm not sure how to proceed now.
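In case it helps to see it concretely, the section I've been adjusting is the autoCommit/autoSoftCommit block in solrconfig.xml; a rough sketch of its shape is below (the intervals shown are just illustrative placeholders, not the exact values from any of the configurations I tried):

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes segments to disk and truncates the transaction log.
       With openSearcher=false it does not make new documents visible. -->
  <autoCommit>
    <maxTime>60000</maxTime>            <!-- placeholder: 60 seconds -->
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: controls when new documents become visible in search
       results, i.e. the interval I've been tuning. -->
  <autoSoftCommit>
    <maxTime>30000</maxTime>            <!-- placeholder: 30 seconds -->
  </autoSoftCommit>
</updateHandler>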
I've looked at the cores, and for example, for the collection I'm testing against right now I see these values:

Core 1: Num Docs: 4806841  Max Doc: 4845793  Heap Memory Usage: 387392
Core 2: Num Docs: 4810159  Max Doc: 4849229  Heap Memory Usage: 450008

Other collections look fairly similar, except for this one:

Preview Core 1: Num Docs: 5774937  Max Doc: 5832482  Heap Memory Usage: 407424
Preview Core 2: Num Docs: 5774937  Max Doc: 5833942  Heap Memory Usage: 463632
Preview Core 3: Num Docs: 5778245  Max Doc: 5790174  Heap Memory Usage: 480672

For some reason, the "Preview" collection has 3 shards instead of 2 like it had before... maybe that could be related? The collection overview says 2 shards and a replication factor of 2.

As additional info, ZooKeeper is running on its own server, and Solr is the only thing running on the Solr server, aside from some system processes.

Thanks again!

MATIAS LAINO | DIRECTOR OF PASSARE REMOTE DEVELOPMENT
matias.la...@passare.com | +54 11-6357-2143

-----Original Message-----
From: Shawn Heisey <elyog...@elyograg.org>
Sent: Thursday, December 1, 2022 1:07 AM
To: users@solr.apache.org
Subject: Re: Very High CPU when indexing

On 11/30/22 08:57, Matias Laino wrote:
> Q: What is the total document count?
> A: Based on the dashboard, it's Total #docs: 68.6mn each node (I'm
> replicating the same data on both)

Each core has a count. And here you can see what I was talking about with max doc compared to num docs:

https://www.dropbox.com/s/jdgddn4ve5mluhr/core_doc_counts.png?dl=0

> Q: but it would be great to have an on-disk size and document count
> (max docs, not num docs) for each collection
> A: I'm not sure where to get that from metrics, based on the cloud dashboard
> it says the following by shard:
> preview_s1r2: 1.9Gb
> preview_s2r11: 1.9Gb
> preview_s2r6: 1.9Gb
> staging-d_s1r1: 1.8Gb
> staging-d_s2r4: 1.8Gb
> staging-a_s1r1: 1.7Gb
> staging-a_s2r4: 1.7Gb
> staging-c_s2r5: 1.6Gb
> staging-c_s1r2: 1.6Gb
> pre-prod_s1r1: 1.6Gb
> pre-prod_s2r4: 1.6Gb
> staging-b_s1r2: 1.5Gb
> staging-b_s2r5: 1.5Gb
> That is replicated on the other node.

So you've got 22GB of data, and assuming Solr is the only thing running on the machine, only about 8GB of memory to cache it (total RAM of 16GB minus 8GB for the Solr heap). I would hope for at least 12GB of cache for that, and more is always better. 8GB may not be enough. If you have other software running on the machine, it will be even less. Does ZK live on the same instance? If so, how much heap are you giving to that?

Performance of a system is often perfectly fine up until some threshold, and once you throw just a little bit more data into the mix so it goes over that threshold, performance drops drastically. That is how a small increase can bring a system to its knees.

If you can upgrade the instance to one with more memory, that might also help, but I do think that the biggest problem is the autoSoftCommit setting. If you really can't make it at least two minutes, which is the value I would use, then set it as high as you can. 10 to 30 seconds, maybe.

Thanks,
Shawn