Dear all,

I have seen a replica-not-recovering issue in both Solr 8.7.0 and 8.8.2. We start getting the error message below in the Solr logs of the shard leader, and the replica never recovers unless the leader node is restarted manually to clear the queue.

2021-09-29 15:02:20.541 ERROR (updateExecutor-9-thread-6936-processing-x:multi_shard_shard2_replica_n6 r:core_node9 null n:shard2leader_ip:8983_solr c:multi_shard s:shard2) x:multi_shard_shard2_replica_n6 o.a.s.u.ErrorReportingConcurrentUpdateSolrClient Error when calling SolrCmdDistributor$Req: cmd=add{_version_=1712248823138484227,id=astrier Caryota concurrency,commitWithin=15}; node=StdNode: http://shard2replica_ip:8983/solr/multi_shard_shard2_replica_n8/ to http://shard2replica_ip:8983/solr/multi_shard_shard2_replica_n8/
java.io.IOException: java.util.concurrent.RejectedExecutionException: Max requests queued per destination 3000 exceeded for HttpDestination[http://shard2replica_ip:8983 ]@3fc860ca,queue=3000,pool=MultiplexConnectionPool@5bcab306[c=0/4/4,a=4,i=0]
	at org.eclipse.jetty.client.util.DeferredContentProvider.flush(DeferredContentProvider.java:197) ~[?:?]
	at org.eclipse.jetty.client.util.OutputStreamContentProvider$DeferredOutputStream.flush(OutputStreamContentProvider.java:151) ~[?:?]
	at org.eclipse.jetty.client.util.OutputStreamContentProvider$DeferredOutputStream.write(OutputStreamContentProvider.java:145) ~[?:?]
	at org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:216) ~[?:?]
	at org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:209) ~[?:?]
	at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:172) ~[?:?]
	at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.marshal(JavaBinUpdateRequestCodec.java:106) ~[?:?]
	at org.apache.solr.client.solrj.impl.BinaryRequestWriter.write(BinaryRequestWriter.java:83) ~[?:?]
	at org.apache.solr.client.solrj.impl.Http2SolrClient.send(Http2SolrClient.java:342) ~[?:?]
	at org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.sendUpdateStream(ConcurrentUpdateHttp2SolrClient.java:237) ~[?:?]
	at org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.run(ConcurrentUpdateHttp2SolrClient.java:181) ~[?:?]
	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180) ~[metrics-core-4.1.5.jar:4.1.5]
	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
	Suppressed: java.io.IOException: java.util.concurrent.RejectedExecutionException: Max requests queued per destination 3000 exceeded for HttpDestination[http://shard2replica_ip:8983 ]@3fc860ca,queue=3000,pool=MultiplexConnectionPool@5bcab306[c=0/4/4,a=4,i=0]
		at org.eclipse.jetty.client.util.DeferredContentProvider.flush(DeferredContentProvider.java:197) ~[?:?]
		at org.eclipse.jetty.client.util.OutputStreamContentProvider$DeferredOutputStream.flush(OutputStreamContentProvider.java:151) ~[?:?]
		at org.eclipse.jetty.client.util.OutputStreamContentProvider$DeferredOutputStream.write(OutputStreamContentProvider.java:145) ~[?:?]
		at org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:216) ~[?:?]
		at org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:209) ~[?:?]
		at org.apache.solr.common.util.JavaBinCodec.close(JavaBinCodec.java:1277) ~[?:?]
		at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.marshal(JavaBinUpdateRequestCodec.java:107) ~[?:?]
		at org.apache.solr.client.solrj.impl.BinaryRequestWriter.write(BinaryRequestWriter.java:83) ~[?:?]
		at org.apache.solr.client.solrj.impl.Http2SolrClient.send(Http2SolrClient.java:342) ~[?:?]
		at org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.sendUpdateStream(ConcurrentUpdateHttp2SolrClient.java:237) ~[?:?]
		at org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.run(ConcurrentUpdateHttp2SolrClient.java:181) ~[?:?]
		at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180) ~[metrics-core-4.1.5.jar:4.1.5]
		at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218) ~[?:?]
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
		at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.util.concurrent.RejectedExecutionException: Max requests queued per destination 3000 exceeded for HttpDestination[http://shard2replica_ip:8983 ]@3fc860ca,queue=3000,pool=MultiplexConnectionPool@5bcab306[c=0/4/4,a=4,i=0]

Before the *Max requests queued per destination 3000 exceeded for HttpDestination* error appears, I see the following in the shard leader log: *Request processing has stalled for 20076ms with 100 remaining elements in the queue.*

2021-09-29 15:02:19.689 ERROR (qtp1956415355-2035) x:multi_shard_shard2_replica_n6 o.a.s.u.SolrCmdDistributor
java.io.IOException: Request processing has stalled for 20076ms with 100 remaining elements in the queue.
	at org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient.request(ConcurrentUpdateHttp2SolrClient.java:449)
	at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290)
	at org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:345)
	at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:338)
	at org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:244)
	at org.apache.solr.update.processor.DistributedZkUpdateProcessor.doDistribAdd(DistributedZkUpdateProcessor.java:300)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:230)
	at org.apache.solr.update.processor.DistributedZkUpdateProcessor.processAdd(DistributedZkUpdateProcessor.java:245)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
	at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:481)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
	at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
	at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
	at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
	at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
	at org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:75)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
	at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
	at org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:92)
	at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:110)
	at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:343)
	at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readIterator(JavaBinUpdateRequestCodec.java:291)
	at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:338)
	at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283)
	at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readNamedList(JavaBinUpdateRequestCodec.java:244)
	at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:303)
	at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283)
	at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:196)
	at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:131)
	at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:122)
	at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:70)
	at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:82)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2646)
	at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:794)
	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:567)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:357)
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201)
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1612)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1582)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)
	at org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
	at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:179)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.server.Server.handle(Server.java:516)
	at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:556)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905)
	at java.base/java.lang.Thread.run(Thread.java:829)

This message comes from the stall-prevention logic added by the stall-prevention Jira patch; see the commit: https://github.com/apache/lucene-solr/commit/c4f0c3363828c088eefa2b99783178848c2f1f7a
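Judging from the message, the stall check appears to work roughly like this: when the leader cannot add an update to the replica's full queue (the log above reports 100 remaining elements) and the queue size stops changing for longer than a threshold, the request is aborted with an IOException. Below is a minimal Java sketch of that idea under those assumptions. `StallDetector`, `onOfferFailed`, and the 15,000 ms threshold are hypothetical names and values chosen for illustration, not Solr's actual API; time is passed in explicitly so the example is deterministic.

```java
import java.io.IOException;

// Hypothetical model of a stall check; not Solr's actual implementation.
// The caller reports each failed attempt to add to a full queue; if the queue
// size stays unchanged for longer than stallTimeMs, the request is aborted.
class StallDetector {
    private final long stallTimeMs;
    private int lastQueueSize = -1;  // queue size seen at the previous failed offer
    private long stallStartMs = -1;  // -1 means no stall is currently being timed

    StallDetector(long stallTimeMs) {
        this.stallTimeMs = stallTimeMs;
    }

    void onOfferFailed(int queueSize, long nowMs) throws IOException {
        if (queueSize != lastQueueSize) {
            // The consumer made progress (or this is the first failure): reset the timer.
            lastQueueSize = queueSize;
            stallStartMs = -1;
        } else if (stallStartMs == -1) {
            // No progress since the last failure: start timing the stall.
            stallStartMs = nowMs;
        } else {
            long stalledForMs = nowMs - stallStartMs;
            if (stalledForMs > stallTimeMs) {
                throw new IOException("Request processing has stalled for " + stalledForMs
                        + "ms with " + queueSize + " remaining elements in the queue.");
            }
        }
    }
}

class StallDemo {
    public static void main(String[] args) {
        StallDetector detector = new StallDetector(15_000); // assumed threshold
        try {
            detector.onOfferFailed(100, 0);      // first failure: remember the queue size
            detector.onOfferFailed(100, 1_000);  // same size: the stall timer starts
            detector.onOfferFailed(100, 21_076); // stalled for 20076 ms > 15000 ms
        } catch (IOException e) {
            // prints "Request processing has stalled for 20076ms with 100 remaining elements in the queue."
            System.out.println(e.getMessage());
        }
    }
}
```

In this model any change in queue size resets the timer, so a replica that is slow but still draining never trips the check; only a consumer that makes no progress at all does.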
The error message *Max requests queued per destination 3000 exceeded for HttpDestination* is raised in Jetty here: https://github.com/eclipse/jetty.project/blob/1a594cef608373b0d321cda96b2ee5094c243209/jetty-client/src/main/java/org/eclipse/jetty/client/HttpDestination.java#L274

The limit of 3000 (1000 * 3) is hardcoded in Solr's Http2SolrClient here: https://github.com/tflobbe/lucene-solr-1/blob/branch_8x/solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java#L797-L828

Has anyone faced this issue, and is there any way to avoid it?

Best Regards,
Dinesh Naik
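The per-destination cap described above can be illustrated with a simplified Java model. It assumes, following the linked code, a limit of 1000 * 3 = 3000 queued requests per destination, with any request beyond that rejected with a `RejectedExecutionException`. `BoundedDestinationQueue`, its methods, and its constant names are hypothetical and are not Jetty's `HttpDestination` API.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical, simplified model of a per-destination request queue with a hard cap.
// It is not Jetty's HttpDestination; it only mimics the reject-when-full behavior.
class BoundedDestinationQueue {
    // Illustrative constants for the 1000 * 3 limit (names are hypothetical).
    static final int MAX_OUTSTANDING_REQUESTS = 1000;
    static final int MAX_QUEUED_PER_DESTINATION = MAX_OUTSTANDING_REQUESTS * 3;

    private final String destination;
    private final int maxQueued;
    private final Queue<String> queued = new ArrayDeque<>();

    BoundedDestinationQueue(String destination, int maxQueued) {
        this.destination = destination;
        this.maxQueued = maxQueued;
    }

    // Queues a request, or rejects it once the cap is reached.
    void enqueue(String request) {
        if (queued.size() >= maxQueued) {
            throw new RejectedExecutionException("Max requests queued per destination "
                    + maxQueued + " exceeded for " + destination);
        }
        queued.add(request);
    }

    // Simulates a connection taking the next queued request; returns null if the queue is empty.
    String dispatchNext() {
        return queued.poll();
    }

    int size() {
        return queued.size();
    }
}

class QueueDemo {
    public static void main(String[] args) {
        BoundedDestinationQueue dest = new BoundedDestinationQueue(
                "HttpDestination[http://shard2replica_ip:8983]",
                BoundedDestinationQueue.MAX_QUEUED_PER_DESTINATION);
        // Nothing is dispatched (the replica is not draining), so the queue only grows.
        for (int i = 0; i < 3000; i++) {
            dest.enqueue("update-" + i);
        }
        try {
            dest.enqueue("update-3000"); // the 3001st request exceeds the cap
        } catch (RejectedExecutionException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the rejection persists for as long as nothing drains the queue; each `dispatchNext()` frees exactly one slot.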