On 11/30/22 08:57, Matias Laino wrote:
> Q: What is the total document count?
> A: Based on the dashboard, it's Total #docs: 68.6mn on each node (I'm replicating the same data on both nodes).
> Each core has its own count, and in this screenshot you can see what I was talking about with maxDoc compared to numDocs:
> https://www.dropbox.com/s/jdgddn4ve5mluhr/core_doc_counts.png?dl=0
> Q: But it would be great to have an on-disk size and document count (max docs, not num docs) for each collection.
> A: I'm not sure where to get that from metrics. Based on the cloud dashboard it says the following by shard:
>
> preview_s1r2: 1.9GB
> preview_s2r11: 1.9GB
> preview_s2r6: 1.9GB
> staging-d_s1r1: 1.8GB
> staging-d_s2r4: 1.8GB
> staging-a_s1r1: 1.7GB
> staging-a_s2r4: 1.7GB
> staging-c_s2r5: 1.6GB
> staging-c_s1r2: 1.6GB
> pre-prod_s1r1: 1.6GB
> pre-prod_s2r4: 1.6GB
> staging-b_s1r2: 1.5GB
> staging-b_s2r5: 1.5GB
>
> That is replicated on the other node.
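To answer the metrics question: the CoreAdmin STATUS API reports per-core on-disk size along with both document counts, so you don't have to read them off the dashboard. A minimal sketch, assuming Solr listens on localhost:8983 (adjust host and port for your nodes):

    # Run against each node. The "index" section of every core in the
    # response includes numDocs, maxDoc, deletedDocs, and sizeInBytes.
    curl "http://localhost:8983/solr/admin/cores?action=STATUS&wt=json"

As a side note, maxDoc counts documents that have been deleted but not yet merged away, which is why it runs higher than numDocs.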
So you've got about 22GB of index data and, assuming Solr is the only thing running on the machine, only about 8GB of memory to cache it (16GB of total RAM minus the 8GB Solr heap). I would hope for at least 12GB of cache for that much index, and more is always better; 8GB may not be enough. If other software is running on the machine, there will be even less. Does ZK live on the same instance? If so, how much heap are you giving to that?
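If you want to see what is actually left for the OS disk cache, standard Linux tooling is enough on each node; nothing Solr-specific is needed:

    # Memory totals in gigabytes. The buff/cache and available columns
    # show what the OS has free for caching index files.
    free -g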
Performance of a system is often perfectly fine up until some threshold, and once you throw just a little bit more data into the mix so it goes over that threshold, performance drops drastically. That is how a small increase can bring a system to its knees.
If you can upgrade the instance to one with more memory, that might also help, but I do think the biggest problem is the autoSoftCommit setting. If you really can't make it at least two minutes, which is the value I would use, then set it as high as you can tolerate: 10 to 30 seconds, maybe.
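For reference, the soft commit interval lives in the updateHandler section of solrconfig.xml. A sketch of what I would start from, with two minutes (120000 ms) for soft commits; the autoCommit values shown are my usual assumption, not something taken from your config:

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- Hard commit: flushes index segments to disk without opening
           a new searcher, so it is cheap and can stay frequent -->
      <autoCommit>
        <maxTime>60000</maxTime>
        <openSearcher>false</openSearcher>
      </autoCommit>
      <!-- Soft commit: opens a new searcher, the expensive part; this
           controls how quickly new documents become visible -->
      <autoSoftCommit>
        <maxTime>120000</maxTime>
      </autoSoftCommit>
    </updateHandler>

Every soft commit discards Solr's caches and warms a new searcher, so the longer you can stretch that interval, the less pressure you put on the little memory you have.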
Thanks,
Shawn