Also, can you please share the output of the below REST call from the browser (any non-IE browser)?

localhost:8983/solr/admin/info/threads

Please replace the Solr port in the above call as per your environment.
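
If the browser route is awkward, a small script along these lines can pull the same report and give a rough count of thread states. Just a sketch: the host/port and the exact field names are assumptions, so adjust for your environment.

# Rough sketch (untested): pull the Solr thread report over HTTP and do a
# crude count of thread states by scanning the raw JSON.
import re
from collections import Counter
from urllib.request import urlopen

url = "http://localhost:8983/solr/admin/info/threads?wt=json"
raw = urlopen(url).read().decode("utf-8")

# How many RUNNABLE vs BLOCKED vs WAITING threads, plus the start of the
# full dump for a quick sanity check.
print(Counter(re.findall(r'"state"\s*:\s*"(\w+)"', raw)))
print(raw[:2000])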

- Rahul

On Wed, Jun 16, 2021 at 11:22 AM Dave <hastings.recurs...@gmail.com> wrote:

> Out of curiosity do you have any suggesters configured? They tend to do
> hard lock-ups if they need to rebuild. Not saying that’s what’s going on,
> but I’ve definitely had issues with them on restarting the servers and
> freezing the entire thing until they were done.
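
On Dave's point: one quick way to see whether any suggesters are configured on that core, and whether they build on startup or on commit, is to pull the live config from the Config API and search it for "suggest". A rough sketch, with "customer" as a placeholder core name and the response layout from memory:

# Rough sketch (untested): fetch the effective config for a core via the
# Config API and print anything that mentions "suggest".
import json
from urllib.request import urlopen

url = "http://localhost:8983/solr/customer/config?wt=json"
cfg = json.load(urlopen(url)).get("config", {})

for section in ("searchComponent", "requestHandler"):
    for name, body in cfg.get(section, {}).items():
        if "suggest" in name.lower() or "suggest" in json.dumps(body).lower():
            print(section, name, json.dumps(body, indent=2))

If something turns up there with buildOnStartup set over a large field, a rebuild kicked off by the 14:50 restart you mention would be worth ruling out.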

> > On Jun 16, 2021, at 11:15 AM, Charlie Hubbard <charlie.hubb...@gmail.com> wrote:
> >
> > Hi,
> >
> > I had an event recently where my Solr server CPU spiked and it stopped
> > answering incoming requests in a timely manner. I'm having a hard time
> > pinpointing the cause. I'm reviewing the log files, but I can't explain
> > why the CPU suddenly spiked. Once this happened, all of the queries
> > suddenly started to back up. There don't appear to be many queries
> > coming in when this happened, but since queries are logged after they
> > complete, I think it's hard to know for sure.
> >
> > My setup is:
> >
> > - Single-node standalone server
> > - OpenJDK 1.8
> > - Solr 6.6.6
> > - No replication configured
> > - 4 collections ranging from 200 MB to 20 GB
> > - 112 million docs in the collection in question
> >
> > Here is the section in the logs where the queries started to pile up.
> > You see the QTimes start to go way up.
> >
> >> 2021-06-15 16:27:35.647 INFO (qtp1836643189-506) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=60736
> >> 2021-06-15 16:27:35.647 INFO (qtp1836643189-483) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=132802
> >> 2021-06-15 16:27:35.647 INFO (qtp1836643189-474) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=154802
> >> 2021-06-15 16:27:35.647 INFO (qtp1836643189-467) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=121797
> >> 2021-06-15 16:27:35.648 INFO (qtp1836643189-459) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=165804
> >> 2021-06-15 16:27:35.648 INFO (qtp1836643189-487) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623773588322} status=0 QTime=110803
> >> 2021-06-15 16:27:35.648 INFO (qtp1836643189-497) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623773588322} status=0 QTime=798
> >> 2021-06-15 16:27:35.647 INFO (qtp1836643189-495) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=99798
> >> 2021-06-15 16:27:35.650 INFO (qtp1836643189-448) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=176807
> >> 2021-06-15 16:27:35.650 INFO (qtp1836643189-464) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623773588322} status=0 QTime=176809
> >> 2021-06-15 16:27:35.650 INFO (qtp1836643189-479) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623773588322} status=0 QTime=99804
> >> 2021-06-15 16:27:35.650 INFO (qtp1836643189-445) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=187795
> >
> > That burst of replication and luke handlers lasted for 1 second, but
> > there are 155 entries of that over and over. Prior to this there isn't
> > any evidence, that I can see, of queries happening that might have
> > slowed things down. There was some updating happening on the smaller
> > collection, but not on the collection in question.
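
Rather than eyeballing those, it may help to count every request over some threshold in solr.log and bucket them by handler. Another rough sketch: the log path is a placeholder and the regex assumes the 6.x request-log format you pasted.

# Rough sketch: scan solr.log for requests above a QTime threshold and
# bucket them by path.
import re
from collections import Counter

THRESHOLD_MS = 10000
slow = Counter()

with open("/var/solr/logs/solr.log") as f:  # placeholder path
    for line in f:
        m = re.search(r"path=(\S+).*QTime=(\d+)", line)
        if m and int(m.group(2)) >= THRESHOLD_MS:
            slow[m.group(1)] += 1

for path, count in slow.most_common():
    print("%6d  %s" % (count, path))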

> > From the GC logs I did find some overlap where thread stoppages were
> > taking place around and leading up to that point in time.
> >
> >> Heap after GC invocations=92 (full 1):
> >>  par new generation total 436928K, used 86155K [0x0000000540000000, 0x0000000560000000, 0x00000005e0000000)
> >>   eden space 349568K, 0% used [0x0000000540000000, 0x0000000540000000, 0x0000000555560000)
> >>   from space 87360K, 98% used [0x0000000555560000, 0x000000055a982ef8, 0x000000055aab0000)
> >>   to space 87360K, 0% used [0x000000055aab0000, 0x000000055aab0000, 0x0000000560000000)
> >>  concurrent mark-sweep generation total 1572864K, used 681710K [0x00000005e0000000, 0x0000000640000000, 0x00000007c0000000)
> >>  Metaspace used 41348K, capacity 42508K, committed 42772K, reserved 1087488K
> >>   class space used 4812K, capacity 5121K, committed 5140K, reserved 1048576K
> >> }
> >> 2021-06-15T16:27:39.384+0000: 1462.170: Total time for which application threads were stopped: 0.0653007 seconds, Stopping threads took: 0.0019805 seconds
> >> 2021-06-15T16:27:40.389+0000: 1463.174: Total time for which application threads were stopped: 0.0041873 seconds, Stopping threads took: 0.0037697 seconds
> >> 2021-06-15T16:27:40.921+0000: 1463.706: Total time for which application threads were stopped: 0.0029477 seconds, Stopping threads took: 0.0024678 seconds
> >> 2021-06-15T16:27:41.268+0000: 1464.053: Total time for which application threads were stopped: 0.0224393 seconds, Stopping threads took: 0.0219226 seconds
> >> 2021-06-15T16:27:41.388+0000: 1464.173: [GC (CMS Initial Mark) [1 CMS-initial-mark: 788750K(1572864K)] 1052906K(2009792K), 0.0022256 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
> >> 2021-06-15T16:27:41.390+0000: 1464.176: Total time for which application threads were stopped: 0.0060337 seconds, Stopping threads took: 0.0032933 seconds
> >> 2021-06-15T16:27:41.390+0000: 1464.176: [CMS-concurrent-mark-start]
> >> 2021-06-15T16:27:41.440+0000: 1464.225: [CMS-concurrent-mark: 0.050/0.050 secs] [Times: user=0.39 sys=0.00, real=0.05 secs]
> >> 2021-06-15T16:27:41.440+0000: 1464.225: [CMS-concurrent-preclean-start]
> >> 2021-06-15T16:27:41.445+0000: 1464.231: [CMS-concurrent-preclean: 0.005/0.005 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
> >> 2021-06-15T16:27:41.445+0000: 1464.231: [CMS-concurrent-abortable-preclean-start]
> >> 2021-06-15T16:27:41.531+0000: 1464.316: Total time for which application threads were stopped: 0.0018167 seconds, Stopping threads took: 0.0012697 seconds
> >> 2021-06-15T16:27:41.532+0000: 1464.317: Total time for which application threads were stopped: 0.0009634 seconds, Stopping threads took: 0.0005877 seconds
> >> 2021-06-15T16:27:41.533+0000: 1464.318: Total time for which application threads were stopped: 0.0006194 seconds, Stopping threads took: 0.0003124 seconds
> >> 2021-06-15T16:27:41.534+0000: 1464.319: Total time for which application threads were stopped: 0.0006736 seconds, Stopping threads took: 0.0003113 seconds
> >> 2021-06-15T16:27:41.735+0000: 1464.521: Total time for which application threads were stopped: 0.0009390 seconds, Stopping threads took: 0.0005340 seconds
> >> 2021-06-15T16:27:42.213+0000: 1464.998: Total time for which application threads were stopped: 0.0016779 seconds, Stopping threads took: 0.0003837 seconds
> >> 2021-06-15T16:27:42.214+0000: 1464.999: Total time for which application threads were stopped: 0.0006721 seconds, Stopping threads took: 0.0002727 seconds
> >> 2021-06-15T16:27:42.215+0000: 1465.000: Total time for which application threads were stopped: 0.0003734 seconds, Stopping threads took: 0.0000991 seconds
> >> 2021-06-15T16:27:42.217+0000: 1465.002: Total time for which application threads were stopped: 0.0004387 seconds, Stopping threads took: 0.0001799 seconds
> >> 2021-06-15T16:27:42.219+0000: 1465.004: Total time for which application threads were stopped: 0.0004952 seconds, Stopping threads took: 0.0002235 seconds
> >> 2021-06-15T16:27:42.837+0000: 1465.622: Total time for which application threads were stopped: 0.0016946 seconds, Stopping threads took: 0.0013149 seconds
> >> 2021-06-15T16:27:43.433+0000: 1466.218: Total time for which application threads were stopped: 0.0008748 seconds, Stopping threads took: 0.0004336 seconds
> >> 2021-06-15T16:27:44.337+0000: 1467.122: Total time for which application threads were stopped: 0.0624611 seconds, Stopping threads took: 0.0610425 seconds
> >> 2021-06-15T16:27:45.458+0000: 1468.243: Total time for which application threads were stopped: 0.0329742 seconds, Stopping threads took: 0.0324816 seconds
> >> 2021-06-15T16:27:46.182+0000: 1468.967: Total time for which application threads were stopped: 0.0010628 seconds, Stopping threads took: 0.0006594 seconds
> >> 2021-06-15T16:27:46.241+0000: 1469.026: Total time for which application threads were stopped: 0.0583459 seconds, Stopping threads took: 0.0579144 seconds
> >> 2021-06-15T16:27:47.308+0000: 1470.093: Total time for which application threads were stopped: 0.0007076 seconds, Stopping threads took: 0.0003400 seconds
> >> 2021-06-15T16:27:47.371+0000: 1470.157: Total time for which application threads were stopped: 0.0020045 seconds, Stopping threads took: 0.0014785 seconds
> >> 2021-06-15T16:27:47.444+0000: 1470.229: Total time for which application threads were stopped: 0.0007924 seconds, Stopping threads took: 0.0004075 seconds
> >> CMS: abort preclean due to time 2021-06-15T16:27:47.450+0000: 1470.235: [CMS-concurrent-abortable-preclean: 0.391/6.004 secs] [Times: user=9.74 sys=0.27, real=6.00 secs]
> >> 2021-06-15T16:27:47.451+0000: 1470.236: [GC (CMS Final Remark) [YG occupancy: 412669 K (436928 K)]{Heap before GC invocations=92 (full 2):
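
On the GC side, a quick way to see whether those stoppages actually add up to anything is to total the stopped time per minute from the GC log. Another rough sketch: the path is a placeholder and the line format is taken from your excerpt.

# Rough sketch: total the "application threads were stopped" time per
# minute of wall-clock time from the GC log.
import re
from collections import defaultdict

stopped = defaultdict(float)

with open("/var/solr/logs/solr_gc.log") as f:  # placeholder path
    for line in f:
        m = re.match(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}).*"
                     r"Total time for which application threads were stopped: "
                     r"([\d.]+) seconds", line)
        if m:
            stopped[m.group(1)] += float(m.group(2))

for minute in sorted(stopped):
    print(minute, round(stopped[minute], 3), "seconds stopped")

In the excerpt above the individual pauses are a few tens of milliseconds at most, so I'd expect the totals to come out small, but it's a cheap way to rule GC in or out.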

> > Also I had restarted the service at 14:50, which is almost 2 hours prior
> > to this. I found a similar pattern in this one, but involving a somewhat
> > smaller collection:
> >
> >> 2021-06-15 15:44:09.698 INFO (qtp1836643189-60) [ x:archiver] o.a.s.c.S.Request [fusearchiver] webapp=/solr path=/replication params={wt=json&command=details&_=1623771837117} status=0 QTime=8397
> >> 2021-06-15 15:44:09.698 INFO (qtp1836643189-23) [ x:archiver] o.a.s.c.S.Request [fusearchiver] webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623771837117} status=0 QTime=8399
> >> 2021-06-15 15:44:10.035 INFO (qtp1836643189-24) [ x:archiver] o.a.s.c.S.Request [fusearchiver] webapp=/solr path=/select params={q=*:*&indent=on&fq=_accountId:12&wt=json&_=1623771792912} hits=1272929 status=0 QTime=50886
> >> 2021-06-15 15:44:10.039 INFO (qtp1836643189-53) [ x:archiver] o.a.s.c.S.Request [fusearchiver] webapp=/solr path=/select params={q=*:*&indent=on&fq=_accountId:12&wt=json&_=1623771792912} hits=1272929 status=0 QTime=42102
> >> 2021-06-15 15:44:10.077 INFO (qtp1836643189-53) [ x:archiver] o.a.s.s.HttpSolrCall Unable to write response, client closed connection or we are shutting down
> >
> > Again this is preceded by calls to the replication and luke handlers
> > with fairly large QTimes.
> >
> > I am not using replication, but it is enabled. From researching the
> > command, this is just a status call. So where could this be triggered
> > from?
> >
> > Beyond that strange event, what else should I be looking at or changing
> > in my configuration to find these issues? Memory was around 7 GB of the
> > max 8 GB. GC doesn't seem excessive. The memory is growing, but not
> > spiking. Should I turn up the memory allocation?
> >
> > TIA
> > Charlie
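
One more thing on the memory question: rather than going by the dashboard graph, you can poll Solr's system info endpoint and record the heap numbers it reports over time. A last rough sketch; the field names are from memory, so adjust if the response looks different.

# Rough sketch: poll Solr's system info endpoint once a minute and print
# the JVM heap figures it reports.
import json
import time
from urllib.request import urlopen

url = "http://localhost:8983/solr/admin/info/system?wt=json"
while True:
    mem = json.load(urlopen(url)).get("jvm", {}).get("memory", {})
    print(time.strftime("%H:%M:%S"), "used:", mem.get("used"), "max:", mem.get("max"))
    time.sleep(60)

That, together with the thread dump from the top of this mail, should help tell whether this is heap/GC pressure or threads stuck behind something else.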