Re: Restart on a node triggers restart like impact on all the other nodes in cluster

Shawn Heisey Wed, 13 Sep 2023 05:51:42 -0700

On 9/12/23 18:28, rajani m wrote:

   Solr 9.1.1 version, upon restarting solr on any node in the cluster, a
unique event is triggered across all the *other* nodes in the cluster that
has an impact similar to restarting solr on all the other nodes in the
cluster. There is dip in the cpu usage, all the caches are emptied and
warmed up, there are disk reads/writes on all the other nodes.

How much RAM is in each node? How much is given to the Java heap? Are you running more than one Solr instance on each node? How much disk space do the indexes on each node consume?


What are the counts of:

* Nodes
* Collections
* Shards per collection
* Replica count per shard
* Documents per shard

There is sometimes some confusion about replica count. I've seen people say they have "one shard and one replica" when the right way to state it is that the replica count is two.
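
If it helps with gathering those numbers, here is a rough Python sketch that tallies most of them from an already-parsed Collections API CLUSTERSTATUS response. The function name summarize_cluster is made up for illustration, and the field names it reads (live_nodes, collections, shards, replicas, node_name) are my assumption about the shape of that response, so check them against what your cluster actually returns. Documents per shard are not part of that response and would have to be collected separately.

```python
from collections import Counter


def summarize_cluster(status):
    """Tally nodes, shards, replicas and cores from a parsed CLUSTERSTATUS dict.

    Hypothetical helper: the keys read here are assumed, not guaranteed.
    """
    cluster = status["cluster"]
    cores_per_node = Counter()
    collections = {}

    for name, coll in cluster.get("collections", {}).items():
        shards = coll.get("shards", {})
        replicas_per_shard = {}
        for shard_name, shard in shards.items():
            replicas = shard.get("replicas", {})
            # Every copy counts as a replica, the leader included.
            replicas_per_shard[shard_name] = len(replicas)
            for replica in replicas.values():
                # Each replica is one core on the node that hosts it.
                cores_per_node[replica["node_name"]] += 1
        collections[name] = {
            "shards": len(shards),
            "replicas_per_shard": replicas_per_shard,
        }

    return {
        "nodes": len(cluster.get("live_nodes", [])),
        "collections": collections,
        "total_cores": sum(cores_per_node.values()),
        "cores_per_node": dict(cores_per_node),
    }


# Made-up sample: what people often call "one shard and one replica".
sample = {
    "cluster": {
        "live_nodes": ["host1:8983_solr", "host2:8983_solr"],
        "collections": {
            "c1": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {"node_name": "host1:8983_solr", "leader": "true"},
                            "core_node2": {"node_name": "host2:8983_solr"},
                        }
                    }
                }
            }
        },
    }
}

summary = summarize_cluster(sample)
```

For the sample, the replica count for shard1 comes out as two, which is the number I'm asking for. The total_cores value is the one that matters most for the restart problem described below.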

If the counts above are large (meaning that you have a LOT of cores) then restarting a node can be very disruptive to the cloud as a whole. See this issue from several years ago where I explored this:


https://issues.apache.org/jira/browse/SOLR-7191

The issue has been marked as resolved in version 6.3.0, but no code was modified, and as far as I know, the problem still exists.

It's worth noting that in my tests for that issue, the collections were empty. For collections that actually have data, the problem will be worse.

If there are a lot of adds/updates/deletes happening, then the delta between the replicas might exceed the threshold for transaction log recovery. Solr may be doing a full replication to the cores on the restarted node. But I would expect that to only affect the shard leaders, which are the source for the replicated data.


Thanks,
Shawn
