Hi Shawn! Thanks again for replying; a few answers to your questions below.

Q: How did the index size compare 3 months ago to today?
A: Pretty much the same. We had been using websolr for years, but they had a lot 
of performance issues and it was expensive (their service went down pretty 
frequently), so we moved from their v4.10 to our own SolrCloud 8.11 cluster 
(our test environments use luceneMatchVersion 7.1.0, though). Our collection 
might have grown by maybe 1M records at most, if that.

Q: How much total index data is there on each Solr node?
A: I'm not sure how that differs from the total # of docs; I think it's the 
same, but I'm probably wrong. It's 68.6M per node.

Q: What is the total document count?
A: Based on the dashboard, it's Total #docs: 68.6M on each node (I'm 
replicating the same data on both).

Q: but it would be great to have an on-disk size and document count (max docs, 
not num docs) for each collection
A: I'm not sure where to get that from the metrics (I took a guess at the 
Metrics API below); based on the Cloud dashboard, it says the following per 
shard:
preview_s1r2:  1.9GB
preview_s2r11:  1.9GB
preview_s2r6:  1.9GB
staging-d_s1r1:  1.8GB
staging-d_s2r4:  1.8GB
staging-a_s1r1:  1.7GB
staging-a_s2r4:  1.7GB
staging-c_s2r5:  1.6GB
staging-c_s1r2:  1.6GB
pre-prod_s1r1:  1.6GB
pre-prod_s2r4:  1.6GB
staging-b_s1r2:  1.5GB
staging-b_s2r5:  1.5GB
That is replicated on the other node.
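
If it helps, this is what I was planning to try for per-core on-disk size and 
max docs; the metric names are my guess from the Metrics API docs, so please 
correct me if these aren't the right ones:

  # per-core index size on disk (assumed metric name: INDEX.sizeInBytes)
  curl "http://localhost:8983/solr/admin/metrics?group=core&prefix=INDEX.sizeInBytes"
  # per-core max docs, including deleted (assumed: SEARCHER.searcher.maxDoc)
  curl "http://localhost:8983/solr/admin/metrics?group=core&prefix=SEARCHER.searcher.maxDoc"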

> I think what I would start with is lowering autoCommit to 15000 and raising 
> autoSoftCommit to 60000.  
I will try this. As far as I understood from the Solr documentation on NRT, the 
auto soft commit interval should be lower than the autoCommit interval, since a 
hard commit is a more expensive operation; should I try autoSoftCommit 15000 
and autoCommit 60000 instead? (See the snippet below for what I plan to 
change.) The baseline is that we need "almost instant" availability when 
indexing: we use Solr for searches, so whenever we add a new record it needs to 
show up in search almost immediately. I'm not sure what's best for this, but 
we've been using the configuration I mentioned for a long time (we had 
individual "indexes" on websolr; I'm fairly sure they weren't on different 
servers, but I also don't have any info on how much memory those servers had).
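
For reference, this is how I understand your suggestion would look in our 
solrconfig.xml (a sketch with your suggested values, and assuming 
openSearcher=false on the hard commit, as the ref guide recommends):

  <autoCommit>
    <!-- hard commit every 15s: flushes to disk and truncates the tlog -->
    <maxTime>15000</maxTime>
    <!-- don't open a new searcher here; visibility comes from soft commits -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <!-- soft commit every 60s: new documents become searchable -->
    <maxTime>60000</maxTime>
  </autoSoftCommit>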

For GC logs, yes! Here is the .0 file for each node:

node 1: 
https://drive.google.com/file/d/1IgneAh412HQbHC2cwZTD_PIAR7NFPWe8/view?usp=share_link
node 2: 
https://drive.google.com/file/d/1lll7WQK3T_p3G9bFv3w1B5SPnvTQleV3/view?usp=share_link

Process list: sorry if this is not enough; I don't know how else to make it 
available, but I'm open to any suggestions!
Node 1: 
https://drive.google.com/file/d/1YQF0571oHecyPwEEuxZsxyafIf5EfUz9/view?usp=share_link
Node 2: 
https://drive.google.com/file/d/1xX72JS-LVb-VfJBxVUk45-jtGvx8SBHv/view?usp=share_link
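
If a plain-text capture would be more useful than a screenshot, I can also run 
something like this on each node (assuming Linux with procps top) and attach 
the output:

  # one batch iteration, sorted by memory usage, top 25 processes
  top -b -n 1 -o %MEM | head -n 25 > processes.txt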

Thank you very much again for your help on this!

MATIAS LAINO | DIRECTOR OF PASSARE REMOTE DEVELOPMENT
matias.la...@passare.com | +54 11-6357-2143


-----Original Message-----
From: Shawn Heisey <apa...@elyograg.org> 
Sent: Tuesday, November 29, 2022 7:10 PM
To: users@solr.apache.org
Subject: Re: Very High CPU when indexing

On 11/29/22 13:58, Matias Laino wrote:
> Thank you Shawn, I'm definitely checking out those recommendations, but what 
> I cannot explain is how this worked fine for the last 3 months and then 
> suddenly this issue started happening.

I'd say you got REALLY lucky that there weren't problems sooner. How did the 
index size compare 3 months ago to today?  How much total index data is there 
on each Solr node?  What is the total document count?  From the original 
message, I can conclude it's probably in the ballpark of 60 million, but it 
would be great to have an on-disk size and document count (max docs, not num 
docs) for each collection.

> On our application, customers expect that when a record is created, that 
> record should be available on search immediately (that's why the auto Soft 
> commit of 1 second), what can you recommend for a situation like this?

I think what I would start with is lowering autoCommit to 15000 and raising 
autoSoftCommit to 60000.  As I said, it is completely unrealistic to expect 1 
second latency unless the index is VERY small. With a total document count 
north of 60 million, I would not call it small, even though there are users 
with much bigger indexes.

By chance can you gather the GC logs from your install and make them available? 
 That can answer a LOT of questions.

The wiki article I sent last time has a section about getting a screenshot of a 
process list.  Can you get that and make it available?

Depending on what I learn from that info, I may have more questions.

Thanks,
Shawn
