Hello, can you please tell us the JVM heap setting for both versions, 8.3.1 and 8.9.0?
I will also have to look into the following code: FileFloatSource.java:210. (I will do it tonight IST and update.)

Deepak

On Wed, Oct 13, 2021 at 4:06 PM Dominic Humphries <[email protected]> wrote:

> Oh, that's very helpful to know about, thank you.
>
> The overwhelming majority appear to be threads in TIMED_WAITING, all
> waiting on the same thing:
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3b315cbb
>
> I've attached a screenshot which includes the stack trace. Stopping all
> queries to the instance and waiting didn't result in any noticeable
> decrease in the number of threads, so it looks like, despite being timed,
> they're simply not getting terminated.
>
> Restarting the service takes me back down to just 53 threads; re-running a
> test results in many new threads immediately coming into being, this time
> with a higher proportion of threads BLOCKED on
> org.apache.solr.search.function.FileFloatSource$CreationPlaceholder@37b782de
> - see the second screenshot.
> The stack trace for those is too big for one screen, so here's the output:
>
> qtp178604517-861 (861)
> org.apache.solr.search.function.FileFloatSource$CreationPlaceholder@37b782de
> - org.apache.solr.search.function.FileFloatSource$Cache.get(FileFloatSource.java:210)
> - org.apache.solr.search.function.FileFloatSource.getCachedFloats(FileFloatSource.java:158)
> - org.apache.solr.search.function.FileFloatSource.getValues(FileFloatSource.java:97)
> - org.apache.lucene.queries.function.ValueSource$WrappedDoubleValuesSource.getValues(ValueSource.java:203)
> - org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource.getValues(FunctionScoreQuery.java:261)
> - org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight.scorer(FunctionScoreQuery.java:224)
> - org.apache.lucene.search.Weight.scorerSupplier(Weight.java:148)
> - org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:379)
> - org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:344)
> - org.apache.lucene.search.Weight.bulkScorer(Weight.java:182)
> - org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:338)
> - org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:656)
> - org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443)
> - org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:211)
> - org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1705)
> - org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1408)
> - org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:596)
> - org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1500)
> - org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:390)
> - org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:369)
> - org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)
> - org.apache.solr.core.SolrCore.execute(SolrCore.java:2637)
> - org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:794)
> - org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:567)
> - org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
> - org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:357)
> - org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201)
> - org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
> - org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
> - org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> - org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602)
> - org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> - org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
> - org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
> - org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
> - org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)
> - org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> - org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
> - org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)
> - org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
> - org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)
> - org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> - org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)
> - org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)
> - org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
> - org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> - org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
> - org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> - org.eclipse.jetty.server.Server.handle(Server.java:516)
> - org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
> - org.eclipse.jetty.server.HttpChannel$$Lambda$556/0x000000080067a440.dispatch(Unknown Source)
> - org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
> - org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
> - org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
> - org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
> - org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
> - org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
> - org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
> - org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
> - org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
> - org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
> - org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:383)
> - org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
> - org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
> - [email protected]/java.lang.Thread.run(Thread.java:834)
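As an aside, a long trace like the one above is easier to reason about in aggregate. A full dump taken with `jstack <pid>` (shipped with the JDK) can be summarized with a short shell pipeline; the file written below is a tiny hand-made sample standing in for a real dump, and the thread names in it are hypothetical:

```shell
# A thread dump can be captured with `jstack <pid>`. The sample file below
# stands in for a real dump so this sketch is self-contained; its thread
# names and frames are illustrative only.
cat > /tmp/solr-threads.txt <<'EOF'
"qtp178604517-861" #861
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.solr.search.function.FileFloatSource$Cache.get(FileFloatSource.java:210)
"qtp178604517-862" #862
   java.lang.Thread.State: TIMED_WAITING (parking)
EOF

# Tally threads by state -- a quick way to spot a pile-up:
grep 'java.lang.Thread.State:' /tmp/solr-threads.txt | awk '{print $2}' | sort | uniq -c | sort -rn

# Count threads currently inside FileFloatSource, the hot spot in the trace:
grep -c 'FileFloatSource' /tmp/solr-threads.txt
```

On a real dump, a large count for one state (or one frame) usually points straight at the contended resource.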
>
> [image: image.png]
> [image: image.png]
>
> On Wed, 13 Oct 2021 at 00:03, Joel Bernstein <[email protected]> wrote:
>
>> There is a thread dump on the Solr admin UI. You can use that to determine
>> what all those threads are doing and where they are getting stuck. You can
>> post parts of the thread dump back to this email thread as well.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Tue, Oct 12, 2021 at 11:15 AM Dominic Humphries <[email protected]> wrote:
>>
>>> We run 8.3.1 in prod without any problems, but we're having issues with
>>> trying to upgrade.
>>>
>>> I've created an 8.9.0 leader and follower, imported our live data into
>>> it, and am testing it by replaying requests made to prod. We're seeing a
>>> big problem where fairly moderate request rates make the instance so
>>> slow that it fails its healthcheck. The logs showed a lot of errors
>>> around creating threads:
>>>
>>> solr[4507]: [124136.511s][warning][os,thread] Failed to start thread -
>>> pthread_create failed (EAGAIN) for attributes: stacksize: 256k,
>>> guardsize: 0k, detached.
>>>
>>> WARN (qtp178604517-3891) [ ] o.e.j.i.ManagedSelector =>
>>> java.lang.OutOfMemoryError: unable to create native thread: possibly
>>> out of memory or process/resource limits reached
>>>
>>> So I monitored the thread count for the process while running the test
>>> suite and saw a persistent pattern: threads increased until maxed out,
>>> the logs flooded with errors as it tried to create still more threads,
>>> and the instance slowed down until it was terminated as unhealthy.
>>>
>>> DefaultTasksMax is set to 4915; I've tried raising and lowering it, but
>>> regardless of the value the result is the same: it gets maxed out and
>>> everything slows down.
>>>
>>> Is there anything I can do to stop Solr spinning up so many threads that
>>> it ceases to function?
>>> There have been a few test passes where it spontaneously dropped the
>>> thread count from thousands to hundreds and stayed up longer, but there
>>> seems to be no pattern to when this happens. Running the same tests on
>>> 8.3.1 results in a much slower increase in threads, and it never quite
>>> maxes them out, so things continue to function.
>>>
>>> See below for the thread count and healthcheck times seen on a (fairly
>>> harsh) test run of 100 requests/sec.
>>>
>>> Thanks
>>>
>>> Dominic
>>>
>>>
>>> Thread count:
>>>
>>> ubuntu@ip-10-40-22-166:~$ while [ 1 ]; do date; ps -eLF | grep 'start.jar' | wc -l; sleep 10s; done
>>> Tue Oct 12 14:27:33 UTC 2021
>>> 52
>>> Tue Oct 12 14:27:43 UTC 2021
>>> 52
>>> Tue Oct 12 14:27:54 UTC 2021
>>> 52
>>> Tue Oct 12 14:28:04 UTC 2021
>>> 52
>>> Tue Oct 12 14:28:14 UTC 2021
>>> 569
>>> Tue Oct 12 14:28:24 UTC 2021
>>> 899
>>> Tue Oct 12 14:28:34 UTC 2021
>>> 1198
>>> Tue Oct 12 14:28:44 UTC 2021
>>> 1589
>>> Tue Oct 12 14:28:54 UTC 2021
>>> 2016
>>> Tue Oct 12 14:29:05 UTC 2021
>>> 2451
>>> Tue Oct 12 14:29:15 UTC 2021
>>> 2851
>>> Tue Oct 12 14:29:26 UTC 2021
>>> 2934
>>> Tue Oct 12 14:29:36 UTC 2021
>>> 3249
>>> Tue Oct 12 14:29:46 UTC 2021
>>> 3501
>>> Tue Oct 12 14:29:57 UTC 2021
>>> 3734
>>> Tue Oct 12 14:30:07 UTC 2021
>>> 4128
>>> Tue Oct 12 14:30:18 UTC 2021
>>> 4374
>>> Tue Oct 12 14:30:29 UTC 2021
>>> 4637
>>> Tue Oct 12 14:30:39 UTC 2021
>>> 4693
>>> Tue Oct 12 14:30:50 UTC 2021
>>> 4807
>>> Tue Oct 12 14:31:01 UTC 2021
>>> 4916
>>> Tue Oct 12 14:31:11 UTC 2021
>>> 4916
>>> Tue Oct 12 14:31:22 UTC 2021
>>> Connection to 10.40.22.166 closed by remote host.
>>> Healthcheck:
>>>
>>> ubuntu@ip-10-40-22-166:~$ while [ 1 ]; do date; curl -v localhost:8983/solr/ 2>&1 | grep HTTP; date; echo '----'; sleep 10s; done
>>> Tue Oct 12 14:27:34 UTC 2021
>>> > GET /solr/ HTTP/1.1
>>> < HTTP/1.1 200 OK
>>> Tue Oct 12 14:27:34 UTC 2021
>>> ----
>>> Tue Oct 12 14:27:44 UTC 2021
>>> > GET /solr/ HTTP/1.1
>>> < HTTP/1.1 200 OK
>>> Tue Oct 12 14:27:44 UTC 2021
>>> ----
>>> Tue Oct 12 14:27:54 UTC 2021
>>> > GET /solr/ HTTP/1.1
>>> < HTTP/1.1 200 OK
>>> Tue Oct 12 14:27:54 UTC 2021
>>> ----
>>> Tue Oct 12 14:28:04 UTC 2021
>>> > GET /solr/ HTTP/1.1
>>> < HTTP/1.1 200 OK
>>> Tue Oct 12 14:28:04 UTC 2021
>>> ----
>>> Tue Oct 12 14:28:14 UTC 2021
>>> > GET /solr/ HTTP/1.1
>>> 0 0 0 0 0 0 0 0 --:--:-- 0:00:02 --:--:-- 0< HTTP/1.1 200 OK
>>> Tue Oct 12 14:28:16 UTC 2021
>>> ----
>>> Tue Oct 12 14:28:26 UTC 2021
>>> > GET /solr/ HTTP/1.1
>>> 0 0 0 0 0 0 0 0 --:--:-- 0:00:12 --:--:-- 0< HTTP/1.1 200 OK
>>> Tue Oct 12 14:28:39 UTC 2021
>>> ----
>>> Tue Oct 12 14:28:49 UTC 2021
>>> 0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0> GET /solr/ HTTP/1.1
>>> 0 0 0 0 0 0 0 0 --:--:-- 0:00:23 --:--:-- 0< HTTP/1.1 200 OK
>>> Tue Oct 12 14:29:13 UTC 2021
>>> ----
>>> Tue Oct 12 14:29:23 UTC 2021
>>> 0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0> GET /solr/ HTTP/1.1
>>> < HTTP/1.1 200 OK
>>> Tue Oct 12 14:29:25 UTC 2021
>>> ----
>>> Tue Oct 12 14:29:35 UTC 2021
>>> 0 0 0 0 0 0 0 0 --:--:-- 0:00:03 --:--:-- 0> GET /solr/ HTTP/1.1
>>> 0 0 0 0 0 0 0 0 --:--:-- 0:00:09 --:--:-- 0< HTTP/1.1 200 OK
>>> Tue Oct 12 14:29:44 UTC 2021
>>> ----
>>> Tue Oct 12 14:29:54 UTC 2021
>>> > GET /solr/ HTTP/1.1
>>> 0 0 0 0 0 0 0 0 --:--:-- 0:00:11 --:--:-- 0< HTTP/1.1 200 OK
>>> Tue Oct 12 14:30:06 UTC 2021
>>> ----
>>> Tue Oct 12 14:30:16 UTC 2021
>>> > GET /solr/ HTTP/1.1
>>> 0 0 0 0 0 0 0 0 --:--:-- 0:00:03 --:--:-- 0< HTTP/1.1 200 OK
>>> Tue Oct 12 14:30:20 UTC 2021
>>> ----
>>> Tue Oct 12 14:30:30 UTC 2021
>>> > GET /solr/ HTTP/1.1
>>> 0 0 0 0 0 0 0 0 --:--:-- 0:00:02 --:--:-- 0< HTTP/1.1 200 OK
>>> Tue Oct 12 14:30:33 UTC 2021
>>> ----
>>> Tue Oct 12 14:30:43 UTC 2021
>>> > GET /solr/ HTTP/1.1
>>> < HTTP/1.1 200 OK
>>> Tue Oct 12 14:30:43 UTC 2021
>>> ----
>>> Tue Oct 12 14:30:53 UTC 2021
>>> > GET /solr/ HTTP/1.1
>>> Tue Oct 12 14:30:55 UTC 2021
>>> ----
>>> Tue Oct 12 14:31:05 UTC 2021
>>> > GET /solr/ HTTP/1.1
>>> < HTTP/1.1 200 OK
>>> Tue Oct 12 14:31:05 UTC 2021
>>> ----
>>> Tue Oct 12 14:31:15 UTC 2021
>>> > GET /solr/ HTTP/1.1
>>> < HTTP/1.1 200 OK
>>> Tue Oct 12 14:31:15 UTC 2021
>>> ----
>>> Connection to 10.40.22.166 closed by remote host.
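The `pthread_create failed (EAGAIN)` and `unable to create native thread` errors quoted above typically mean a process or resource limit was hit rather than heap exhaustion. As a hedged, Linux-only sketch (using the current shell's PID purely so the commands run standalone; substitute the Solr PID in practice), the relevant limits can be inspected like this:

```shell
# Linux-only sketch: inspect the limits that make pthread_create fail with
# EAGAIN. $$ (the current shell) stands in for the Solr PID so the sketch
# runs standalone.
PID=$$

# Per-process cap on threads/processes ("Max processes" in the limits file):
grep -i 'max processes' /proc/$PID/limits

# How many threads the process currently has:
ls /proc/$PID/task | wc -l

# System-wide ceiling on threads:
cat /proc/sys/kernel/threads-max

# Under systemd, the unit's TasksMax (or DefaultTasksMax, as in the report
# above) caps threads as well; the service name "solr" is an assumption:
# systemctl show solr -p TasksMax
```

Comparing the live thread count against the smallest of these ceilings shows which limit the 4916-thread plateau in the transcript is hitting.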
