On 11/30/22 08:57, Matias Laino wrote:
Q: What is the total document count?
A: Based on the dashboard, it's Total #docs: 68.6mn on each node (I'm replicating 
the same data on both)

Each core has a count.  And here you can see what I was talking about with max doc compared to num docs.

https://www.dropbox.com/s/jdgddn4ve5mluhr/core_doc_counts.png?dl=0

Q: but it would be great to have an on-disk size and document count (max docs, 
not num docs) for each collection
A: I'm not sure where to get that from metrics; based on the cloud dashboard it 
says the following by shard:
preview_s1r2:  1.9Gb
preview_s2r11:  1.9Gb
preview_s2r6:  1.9Gb
staging-d_s1r1:  1.8Gb
staging-d_s2r4:  1.8Gb
staging-a_s1r1:  1.7Gb
staging-a_s2r4:  1.7Gb
staging-c_s2r5:  1.6Gb
staging-c_s1r2:  1.6Gb
pre-prod_s1r1:  1.6Gb
pre-prod_s2r4:  1.6Gb
staging-b_s1r2:  1.5Gb
staging-b_s2r5:  1.5Gb
That is replicated on the other node.
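One place to get on-disk size, numDocs, and maxDoc per core without the dashboard is Solr's CoreAdmin STATUS API (admin/cores?action=STATUS&wt=json). Here's a minimal sketch that summarizes such a response; the core names and numbers in the sample are invented for illustration, but the JSON shape matches what CoreAdmin returns under status.&lt;core&gt;.index:

```python
def summarize_cores(status_json):
    """Return (core, numDocs, maxDoc, size_gb) tuples, largest index first."""
    rows = []
    for core, info in status_json["status"].items():
        idx = info["index"]
        rows.append((core, idx["numDocs"], idx["maxDoc"],
                     idx["sizeInBytes"] / 1024 ** 3))
    return sorted(rows, key=lambda r: r[3], reverse=True)

# Sample STATUS response fragment (values invented for illustration):
sample = {"status": {
    "preview_s1r2": {"index": {"numDocs": 5000000, "maxDoc": 6200000,
                               "sizeInBytes": 2040109466}},
    "staging-b_s1r2": {"index": {"numDocs": 4100000, "maxDoc": 4500000,
                                 "sizeInBytes": 1610612736}},
}}

for core, num_docs, max_doc, gb in summarize_cores(sample):
    print(f"{core}: {gb:.1f}GB  numDocs={num_docs}  maxDoc={max_doc}")
```

Summing the sizeInBytes values across cores gives the total index size that the OS disk cache needs to cover.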

So you've got 22GB of data, and assuming Solr is the only thing running on the machine, only about 8GB of memory to cache it (total RAM of 16GB minus 8GB for the Solr heap).  I would hope for at least 12GB of cache for that index, and more is always better.  8GB may not be enough.  If you have other software running on the machine, the available cache will be even less.  Does ZK live on the same instance?  If so, how much heap are you giving to that?

Performance of a system is often perfectly fine up until some threshold, and once you throw just a little bit more data into the mix so it goes over that threshold, performance drops drastically.  That is how a small increase can bring a system to its knees.

If you can upgrade the instance to one with more memory, that might also help, but I do think that the biggest problem is the autoSoftCommit setting.  If you really can't make it at least two minutes, which is the value I would use, then set it as high as you can.  10 to 30 seconds, maybe.
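For reference, here's roughly what that would look like in solrconfig.xml. This is a hypothetical excerpt, not your actual config; the maxTime values are the two-minute soft commit I'd aim for, paired with a hard commit that doesn't open a new searcher:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- hard commit: flush to disk, but don't open a new searcher -->
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <!-- soft commit controls visibility of new docs: every 2 minutes -->
    <maxTime>120000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

Every soft commit opens a new searcher and invalidates caches, which is why a very short interval hurts so much; the longer you can make it, the better.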

Thanks,
Shawn
