On 9/12/23 18:28, rajani m wrote:
> With Solr 9.1.1, restarting Solr on any node in the cluster triggers a
> unique event across all the *other* nodes in the cluster that has an impact
> similar to restarting Solr on all of those nodes. There is a dip in CPU
> usage, all the caches are emptied and warmed up, and there are disk
> reads/writes on all the other nodes.

How much RAM is in each node? How much is given to the Java heap? Are you running more than one Solr instance on each node? How much disk space do the indexes on each node consume?

What are the counts of:

* Nodes
* Collections
* Shards per collection
* Replica count per shard
* Documents per shard

There is sometimes some confusion about replica count. I've seen people say they have "one shard and one replica" when the right way to state it is that the replica count is two. In SolrCloud, the shard leader is itself one of the replicas.
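One way to gather the core counts is to ask any node for the Collections API CLUSTERSTATUS response and count replicas per node. The sketch below assumes that response nests data as cluster → collections → shards → replicas, with a `node_name` on each replica; the function names and the sample data are hypothetical, for illustration only. In the sample, two shards with a replica count of two spread over two nodes give four cores in total.

```python
import json
from collections import Counter
from urllib.request import urlopen

# Hypothetical sample shaped like a CLUSTERSTATUS response: one collection,
# two shards, replica count two (the leader counts as one of the two).
SAMPLE_STATUS = {
    "cluster": {
        "collections": {
            "demo": {
                "shards": {
                    "shard1": {"replicas": {
                        "core_node1": {"node_name": "host1:8983_solr", "leader": "true"},
                        "core_node2": {"node_name": "host2:8983_solr"},
                    }},
                    "shard2": {"replicas": {
                        "core_node3": {"node_name": "host2:8983_solr", "leader": "true"},
                        "core_node4": {"node_name": "host1:8983_solr"},
                    }},
                }
            }
        }
    }
}


def cores_per_node(cluster_status: dict) -> Counter:
    """Count the replicas (cores) hosted on each node in a CLUSTERSTATUS response."""
    counts = Counter()
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for coll in collections.values():
        for shard in coll.get("shards", {}).values():
            for replica in shard.get("replicas", {}).values():
                counts[replica["node_name"]] += 1
    return counts


def fetch_cluster_status(base_url: str) -> dict:
    """Fetch CLUSTERSTATUS from a node, e.g. base_url='http://localhost:8983/solr' (assumed)."""
    url = f"{base_url}/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urlopen(url) as resp:
        return json.load(resp)


if __name__ == "__main__":
    counts = cores_per_node(SAMPLE_STATUS)
    print("total cores:", sum(counts.values()))
    for node, n in counts.most_common():
        print(node, n)
```

Against a live cluster, `cores_per_node(fetch_cluster_status(...))` would give the per-node totals; the larger those numbers, the more a single restart can ripple through the cloud.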

If the counts above are large (meaning that you have a LOT of cores), then restarting a node can be very disruptive to the cloud as a whole. See this issue from several years ago, where I explored the problem:

https://issues.apache.org/jira/browse/SOLR-7191

The issue has been marked as resolved in version 6.3.0, but no code was modified, and as far as I know, the problem still exists.

It's worth noting that in my tests for that issue, the collections were empty. For collections that actually have data, the problem will be worse.

If there are a lot of adds/updates/deletes happening, then the delta between the replicas might exceed the threshold for transaction log recovery (governed by `numRecordsToKeep` in the `updateLog` config, which defaults to 100). In that case Solr may be doing a full replication to the cores on the restarted node. But I would expect that to affect only the shard leaders, which are the source for the replicated data, and the restarted node itself.
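To check whether recovery activity really is confined to the restarted node and the shard leaders, one could poll CLUSTERSTATUS during a restart and summarize, per node, how many leaders it hosts and how many replicas are not `active`. This is a minimal sketch assuming the same response layout as the Collections API (a `state` field and a `leader` field set to the string `"true"` on each replica); `recovery_summary` and the sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical CLUSTERSTATUS-shaped sample taken while host2 is recovering.
SAMPLE_STATUS = {
    "cluster": {"collections": {"demo": {"shards": {"shard1": {"replicas": {
        "core_node1": {"node_name": "host1:8983_solr", "state": "active", "leader": "true"},
        "core_node2": {"node_name": "host2:8983_solr", "state": "recovering"},
    }}}}}}
}


def recovery_summary(cluster_status: dict) -> dict:
    """Per node: number of shard leaders hosted and number of replicas not in 'active' state."""
    summary = defaultdict(lambda: {"leaders": 0, "not_active": 0})
    for coll in cluster_status.get("cluster", {}).get("collections", {}).values():
        for shard in coll.get("shards", {}).values():
            for replica in shard.get("replicas", {}).values():
                node = summary[replica["node_name"]]
                if replica.get("leader") == "true":
                    node["leaders"] += 1
                if replica.get("state") != "active":
                    node["not_active"] += 1
    return dict(summary)


if __name__ == "__main__":
    for node, info in sorted(recovery_summary(SAMPLE_STATUS).items()):
        print(node, info)
```

If nodes other than the restarted one show non-active replicas while no full replication is expected, that would point at something beyond the leader-to-replica copy described above.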

Thanks,
Shawn
