## Situation

I'm currently trying to set up SolrCloud in an AWS Autoscaling Group, so that it can scale dynamically.
I've also added the following autoscaling configuration to Solr, so that each node holds exactly one replica of each collection:

```json
{
  "set-cluster-policy": [
    {"replica": "<2", "shard": "#EACH", "node": "#EACH"}
  ],
  "set-trigger": [
    {
      "name": "node_added_trigger",
      "event": "nodeAdded",
      "waitFor": "5s",
      "preferredOperation": "ADDREPLICA"
    },
    {
      "name": "node_lost_trigger",
      "event": "nodeLost",
      "waitFor": "120s",
      "preferredOperation": "DELETENODE"
    }
  ]
}
```

This works pretty well. But my problem is that when a node gets removed, Solr doesn't remove all 19 replicas from that node, and the "Nodes" page of the admin UI breaks:

[Screenshot of the broken "Nodes" admin page](https://i.stack.imgur.com/QyJrY.png)

In the logs, this exception occurs:

```
Operation deletenode failed:java.util.concurrent.RejectedExecutionException: Task org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$45/1104948431@467049e2 rejected from org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@773563df[Running, pool size = 10, active threads = 10, queued tasks = 0, completed tasks = 1]
        at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:194)
        at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
        at org.apache.solr.cloud.api.collections.DeleteReplicaCmd.deleteCore(DeleteReplicaCmd.java:276)
        at org.apache.solr.cloud.api.collections.DeleteReplicaCmd.deleteReplica(DeleteReplicaCmd.java:95)
        at org.apache.solr.cloud.api.collections.DeleteNodeCmd.cleanupReplicas(DeleteNodeCmd.java:109)
        at org.apache.solr.cloud.api.collections.DeleteNodeCmd.call(DeleteNodeCmd.java:62)
        at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:292)
        at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:496)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
```

## Problem description

So, the problem is that the executor has a pool size of only 10, all 10 threads are busy, and nothing gets queued (the tasks are submitted for synchronous execution), so any further deletion tasks are rejected. In fact, Solr really only removed 10 replicas, and the other 9 replicas stayed there. When I manually send the DELETENODE API command afterwards, it works fine, since Solr then only needs to remove the remaining 9 replicas, and everything is good again.

## Question

How can I either increase this (small) thread pool size and/or enable queueing of the remaining deletion tasks? Another solution might be to retry the failed tasks until they succeed.

I'm using Solr 7.7.1 on Ubuntu Server, installed with Solr's installation script (so I guess it's using Jetty?). Thanks for your help!
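In case it helps frame the "retry until it succeeds" idea: until the pool-size limit can be raised in Solr itself, one workaround is to drive the DELETENODE call from outside and repeat it while replicas remain. Below is a minimal client-side sketch; `send_request`, the attempt count, and the delay are my own placeholders (not part of Solr), and `send_request` would wrap whatever HTTP client you use to call the Collections API:

```python
import time

def delete_node_with_retry(send_request, node_name, attempts=5, delay=10):
    """Retry DELETENODE until Solr reports success.

    send_request: a callable (placeholder for your HTTP client) that issues
    /solr/admin/collections?action=DELETENODE&node=<node_name> and returns
    the parsed JSON response as a dict.
    """
    for attempt in range(1, attempts + 1):
        resp = send_request(node_name)
        # The Collections API signals success with responseHeader.status == 0
        if resp.get("responseHeader", {}).get("status") == 0:
            return True
        if attempt < attempts:
            time.sleep(delay)  # give the Overseer pool time to drain
    return False
```

Each retry only has to clean up the replicas that survived the previous rejected batch (9 in my case), so a couple of attempts should be enough for the node to come out clean.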