Oh great, thanks for the hint! I've upvoted the issue, since I think it would be worth being able to configure that (rather low) thread pool size.
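
In case it helps anyone who finds this thread later: the rejection can be reproduced outside Solr with a plain java.util.concurrent pool. This is only a sketch under assumptions, not Solr's actual code. I am assuming a pool capped at 10 threads that hands tasks off through a SynchronousQueue, which would match "pool size = 10, active threads = 10, queued tasks = 0" in the log below. The class and method names (PoolRejectionSketch, countRejected) are made up for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolRejectionSketch {

    /**
     * Submits `tasks` jobs that all block until released, and returns how many
     * of them the pool rejected. The pool size and queue type are assumptions
     * chosen to mimic the log output, not values read from the Solr source.
     */
    public static int countRejected(BlockingQueue<Runnable> queue, int maxThreads, int tasks) {
        ThreadPoolExecutor pool =
                new ThreadPoolExecutor(maxThreads, maxThreads, 0L, TimeUnit.MILLISECONDS, queue);
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    // Each task stands in for one replica deletion that is still running.
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    // Default AbortPolicy: no free thread and no queue capacity.
                    rejected++;
                }
            }
        } finally {
            release.countDown(); // let the accepted tasks finish
            pool.shutdown();
            try {
                pool.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 19 replicas, 10 threads, no queueing: the last 9 submissions are rejected.
        System.out.println("SynchronousQueue rejected: "
                + countRejected(new SynchronousQueue<>(), 10, 19));
        // Same load with an unbounded queue: the extra 9 tasks simply wait their turn.
        System.out.println("LinkedBlockingQueue rejected: "
                + countRejected(new LinkedBlockingQueue<>(), 10, 19));
    }
}
```

With 19 blocking tasks the first variant rejects 9 and the second rejects none, which lines up with the 10 removed and 9 leftover replicas I saw. Whether a queue, a larger pool, or a retry is the right fix inside Solr is for the issue to decide.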

On Wed, 3 Apr 2019 at 10:23, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:

> Thanks Roger. This was reported earlier but missed our attention.
>
> The issue is https://issues.apache.org/jira/browse/SOLR-11208
>
> On Tue, Apr 2, 2019 at 5:56 PM Roger Lehmann <roger.lehm...@offerista.com>
> wrote:
>
> > To be more specific: I currently have 19 collections, where each node has
> > exactly one replica per collection. A new node will automatically create
> > new replicas on itself, one for each existing collection (see
> > cluster-policy above).
> > So when removing a node, all 19 collection replicas of it need to be
> > removed. This can't be done in one go because the thread count (parallel
> > synchronous execution) is only 10 and does not scale up when necessary.
> >
> > On Fri, 29 Mar 2019 at 14:20, Roger Lehmann <roger.lehm...@offerista.com>
> > wrote:
> >
> > > Situation
> > >
> > > I'm currently trying to set up SolrCloud in an AWS Autoscaling Group, so
> > > that it can scale dynamically.
> > >
> > > I've also added the following triggers to Solr, so that each node will
> > > have 1 (and only one) replica of each collection:
> > >
> > > {
> > >   "set-cluster-policy": [
> > >     {"replica": "<2", "shard": "#EACH", "node": "#EACH"}
> > >   ],
> > >   "set-trigger": [{
> > >     "name": "node_added_trigger",
> > >     "event": "nodeAdded",
> > >     "waitFor": "5s",
> > >     "preferredOperation": "ADDREPLICA"
> > >   },{
> > >     "name": "node_lost_trigger",
> > >     "event": "nodeLost",
> > >     "waitFor": "120s",
> > >     "preferredOperation": "DELETENODE"
> > >   }]
> > > }
> > >
> > > This works pretty well. But my problem is that when a node gets
> > > removed, it doesn't remove all 19 replicas from this node and I have
> > > problems when accessing the "nodes" page:
> > >
> > > [image] <https://i.stack.imgur.com/QyJrY.png>
> > >
> > > In the logs, this exception occurs:
> > >
> > > Operation deletenode failed:java.util.concurrent.RejectedExecutionException: Task org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$45/1104948431@467049e2 rejected from org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@773563df[Running, pool size = 10, active threads = 10, queued tasks = 0, completed tasks = 1]
> > >     at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
> > >     at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
> > >     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
> > >     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:194)
> > >     at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
> > >     at org.apache.solr.cloud.api.collections.DeleteReplicaCmd.deleteCore(DeleteReplicaCmd.java:276)
> > >     at org.apache.solr.cloud.api.collections.DeleteReplicaCmd.deleteReplica(DeleteReplicaCmd.java:95)
> > >     at org.apache.solr.cloud.api.collections.DeleteNodeCmd.cleanupReplicas(DeleteNodeCmd.java:109)
> > >     at org.apache.solr.cloud.api.collections.DeleteNodeCmd.call(DeleteNodeCmd.java:62)
> > >     at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:292)
> > >     at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:496)
> > >     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > >     at java.lang.Thread.run(Thread.java:748)
> > >
> > > Problem description
> > >
> > > So, the problem is that it only has a pool size of 10, of which 10 are
> > > busy and nothing gets queued (synchronous execution). In fact, it really
> > > only removed 10 replicas and the other 9 replicas stayed there. When
> > > manually sending the API command to delete this node it works fine, since
> > > Solr only needs to remove the remaining 9 replicas and everything is good
> > > again.
> > >
> > > Question
> > >
> > > How can I either increase this (small) thread pool size and/or activate
> > > queueing the remaining deletion tasks? Another solution might be to retry
> > > the failed task until it succeeds.
> > >
> > > Using Solr 7.7.1 on Ubuntu Server installed with the installation script
> > > from Solr (so I guess it's using Jetty?).
> > >
> > > Thanks for your help!
> > >
> >
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

--
Roger Lehmann
Linux-System-Engineer
T: 0351-418 894 –76
roger.lehm...@offerista.com
https://www.xing.com/profile/Roger_Lehmann8
https://www.offerista.com/
__________________________________________
Offerista Group GmbH | Schützenplatz 14 | D - 01067 Dresden
Managing Directors: Tobias Bräuer, Benjamin Thym
Registered office: Dresden | Amtsgericht Dresden | HRB 28678