Hi guys
Thanks for your helpful help!

More details about my env.
Cluster:
A 4 GCP(google cloud) hosts cluster, each host: 16Core cpu, 60G mem, 2TB HDD.
I set up 2 solr nodes on each host and there are 1000+ replicas on each solr 
node.
(Sorry for forgetting this before: 2 solr node on each host, so there are 2000+ 
replicas on each host...)
zookeeper has 3 instances, reuse the solr hosts (using a separated disk).
Workload:
just index tens of millions of record (total size near 100GB) into dozens (near 
100) of indexes, 30 concurrent, no search operation at the same time (I will do 
search test later).
Error:
"unstable" means there are many solr errors in log and the solr request is 
failed.
e.g. "No registered leader was found after waiting for 4000ms , collection ..."

@ Hendrik
after saw your reply, I noted my replicas num is too big, so I adjusted to: 720 
replicas on each host (reduced shard num), then all my index requests are 
successful. (happy)
but I saw the JVM peak mem usage is 24GB (via solr web UI), it's too big to be 
risky in the future (my JMV xmx is 32GB).
so would you give me some guides to reduce the memory usage? (like you 
mentioned "tuned a few caches down to a minimum")

@ Erick
I gave details above, please check.

@ Shawn
thanks for your info, it's a bad news...
hope solr-cloud can handle more collections in future.


________________________________
From: Shawn Heisey <apa...@elyograg.org>
Sent: Thursday, August 29, 2019 21:58
To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
Subject: Re: Question: Solr perform well with thousands of replicas?

On 8/28/2019 9:27 PM, Hongxu Ma wrote:
> I have a solr-cloud cluster, but it's unstable when collection number is big: 
> 1000 replica/core per solr node.
>
> To solve this issue, I have read the performance guide:
> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems
>
> I noted there is a sentence on solr-cloud section:
> "Recent Solr versions perform well with thousands of replicas."

The SolrPerformanceProblems wiki page is my work.  I only wrote that
sentence because other devs working in SolrCloud code told me that was
the case.  Based on things said by people (including your comments on
this thread), I think newer versions probably aren't any better, and
that sentence needs to be removed from the wiki page.

See this issue that I created a few years ago:

https://issues.apache.org/jira/browse/SOLR-7191

This issue was closed with a 6.3 fix version ... but nothing was
committed with a tag for the issue, so I have no idea why it was closed.
  I think the problems described there are still there in recent Solr
versions, and MIGHT be even worse than they were in 4.x and 5.x.

> I want to know does it mean a single solr node can handle thousands of 
> replicas? or a solr cluster can (if so, what's the size of the cluster?)

A single standalone Solr instance can handle lots of indexes, but Solr
startup is probably going to be slow.

No matter how many nodes there are, SolrCloud has problems with
thousands of collections or replicas due to issues with the overseer
queue getting enormous.  When I created SOLR-7191, I found that
restarting a node in a cloud with thousands of replicas (cores) can
result in a performance death spiral.

I haven't ever administered a production setup with thousands of
indexes, I've only done some single machine testing for the issue I
created.  I need to repeat it with 8.x and see what happens.  But I have
very little free time these days.

Thanks,
Shawn

Reply via email to