Hi Rahul,

Thank you for your response. Unfortunately, this spike happened yesterday,
and I didn't capture this data at the time. But this is very good
information in case I see it again.
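In the meantime, I put together a rough capture script that I can leave
running so that the next time this happens I have thread data from during
the spike instead of trying to reconstruct things afterward. This is just a
sketch against the threads endpoint you mentioned; the port, output
directory, and polling interval are placeholders I picked for my
environment:

    # poll_threads.py - rough sketch: snapshot Solr thread info every few
    # seconds so there is data to look at if the CPU spike happens again.
    # Assumes standalone Solr on localhost:8983; SOLR_URL, OUT_DIR and
    # INTERVAL are placeholders to adjust per environment.
    import json
    import time
    import urllib.request
    from datetime import datetime
    from pathlib import Path

    SOLR_URL = "http://localhost:8983/solr/admin/info/threads?wt=json"
    OUT_DIR = Path("thread-dumps")  # local directory for snapshots
    INTERVAL = 10                   # seconds between snapshots

    OUT_DIR.mkdir(exist_ok=True)
    while True:
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        try:
            with urllib.request.urlopen(SOLR_URL, timeout=30) as resp:
                snapshot = json.load(resp)
            # Keep the raw response so it can be lined up against the CPU
            # graph later.
            (OUT_DIR / ("threads-" + stamp + ".json")).write_text(
                json.dumps(snapshot, indent=2))
        except Exception as exc:
            # Keep collecting even if a request times out while the server
            # is pegged.
            (OUT_DIR / ("threads-" + stamp + ".error")).write_text(str(exc))
        time.sleep(INTERVAL)

Does that look like the kind of output that would be useful to you?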
I should add that I found that I had rebooted the server at 14:50 because
it crashed from an OOM condition, which now makes me think I just need to
crank up the Xmx memory on the server. But I wanted to rule out that
something else was going on before I raise the memory and move on.

Charlie

On Wed, Jun 16, 2021 at 11:48 AM Rahul Goswami <rahul196...@gmail.com> wrote:

> Forgot to mention, you need to collect the output while you observe the
> CPU spike.
>
> On Wed, Jun 16, 2021 at 11:38 AM Rahul Goswami <rahul196...@gmail.com>
> wrote:
>
> > Also, can you please share the output of the below REST call from the
> > browser (any non-IE browser)?
> >
> > localhost:8983/solr/admin/info/threads
> >
> > Please replace the Solr port in the above call as per your environment.
> >
> > - Rahul
> >
> > On Wed, Jun 16, 2021 at 11:22 AM Dave <hastings.recurs...@gmail.com>
> > wrote:
> >
> >> Out of curiosity, do you have any suggesters configured? They tend to
> >> do hard lock-ups if they need to rebuild. Not saying that's what's
> >> going on, but I've definitely had issues with them on restarting the
> >> servers and freezing the entire thing until they were done.
> >>
> >> > On Jun 16, 2021, at 11:15 AM, Charlie Hubbard
> >> > <charlie.hubb...@gmail.com> wrote:
> >> >
> >> > Hi,
> >> >
> >> > I had an event recently where my Solr server's CPU spiked and it
> >> > stopped answering incoming requests in a timely manner. I'm having a
> >> > hard time pinpointing the cause. I'm reviewing the log files, but I
> >> > can't explain why the CPU suddenly spiked. Once this happened, all of
> >> > the queries started to back up. There don't appear to be many queries
> >> > coming in when this happened, but since queries are logged after they
> >> > complete, it's hard to know for sure.
> >> >
> >> > My setup is:
> >> >
> >> >   - Single-node standalone server
> >> >   - OpenJDK 1.8
> >> >   - Solr 6.6.6
> >> >   - No replication configured
> >> >   - 4 collections ranging from 200 MB to 20 GB
> >> >   - 112 million docs in the collection in question
> >> >
> >> > Here is the section in the logs where the queries started to pile up.
> >> > You see the QTimes start to go way up.
> >> >
> >> >> 2021-06-15 16:27:35.647 INFO (qtp1836643189-506) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=60736
> >> >> 2021-06-15 16:27:35.647 INFO (qtp1836643189-483) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=132802
> >> >> 2021-06-15 16:27:35.647 INFO (qtp1836643189-474) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=154802
> >> >> 2021-06-15 16:27:35.647 INFO (qtp1836643189-467) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=121797
> >> >> 2021-06-15 16:27:35.648 INFO (qtp1836643189-459) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=165804
> >> >> 2021-06-15 16:27:35.648 INFO (qtp1836643189-487) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623773588322} status=0 QTime=110803
> >> >> 2021-06-15 16:27:35.648 INFO (qtp1836643189-497) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623773588322} status=0 QTime=798
> >> >> 2021-06-15 16:27:35.647 INFO (qtp1836643189-495) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=99798
> >> >> 2021-06-15 16:27:35.650 INFO (qtp1836643189-448) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=176807
> >> >> 2021-06-15 16:27:35.650 INFO (qtp1836643189-464) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623773588322} status=0 QTime=176809
> >> >> 2021-06-15 16:27:35.650 INFO (qtp1836643189-479) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623773588322} status=0 QTime=99804
> >> >> 2021-06-15 16:27:35.650 INFO (qtp1836643189-445) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=187795
> >> >
> >> > That burst of replication and Luke handler calls lasted for 1 second,
> >> > but there are 155 entries like that, over and over. Prior to this I
> >> > can't see any evidence of queries that might have slowed things down.
> >> > There was some updating happening on the smaller collection, but not
> >> > on the collection in question.
> >> >
> >> > From the GC logs I did find some overlap where thread stoppages were
> >> > taking place around and leading up to that point in time.
> >> >
> >> >> Heap after GC invocations=92 (full 1):
> >> >>  par new generation total 436928K, used 86155K [0x0000000540000000, 0x0000000560000000, 0x00000005e0000000)
> >> >>   eden space 349568K, 0% used [0x0000000540000000, 0x0000000540000000, 0x0000000555560000)
> >> >>   from space 87360K, 98% used [0x0000000555560000, 0x000000055a982ef8, 0x000000055aab0000)
> >> >>   to space 87360K, 0% used [0x000000055aab0000, 0x000000055aab0000, 0x0000000560000000)
> >> >>  concurrent mark-sweep generation total 1572864K, used 681710K [0x00000005e0000000, 0x0000000640000000, 0x00000007c0000000)
> >> >>  Metaspace used 41348K, capacity 42508K, committed 42772K, reserved 1087488K
> >> >>   class space used 4812K, capacity 5121K, committed 5140K, reserved 1048576K
> >> >> }
> >> >> 2021-06-15T16:27:39.384+0000: 1462.170: Total time for which application threads were stopped: 0.0653007 seconds, Stopping threads took: 0.0019805 seconds
> >> >> 2021-06-15T16:27:40.389+0000: 1463.174: Total time for which application threads were stopped: 0.0041873 seconds, Stopping threads took: 0.0037697 seconds
> >> >> 2021-06-15T16:27:40.921+0000: 1463.706: Total time for which application threads were stopped: 0.0029477 seconds, Stopping threads took: 0.0024678 seconds
> >> >> 2021-06-15T16:27:41.268+0000: 1464.053: Total time for which application threads were stopped: 0.0224393 seconds, Stopping threads took: 0.0219226 seconds
> >> >> 2021-06-15T16:27:41.388+0000: 1464.173: [GC (CMS Initial Mark) [1 CMS-initial-mark: 788750K(1572864K)] 1052906K(2009792K), 0.0022256 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
> >> >> 2021-06-15T16:27:41.390+0000: 1464.176: Total time for which application threads were stopped: 0.0060337 seconds, Stopping threads took: 0.0032933 seconds
> >> >> 2021-06-15T16:27:41.390+0000: 1464.176: [CMS-concurrent-mark-start]
> >> >> 2021-06-15T16:27:41.440+0000: 1464.225: [CMS-concurrent-mark: 0.050/0.050 secs] [Times: user=0.39 sys=0.00, real=0.05 secs]
> >> >> 2021-06-15T16:27:41.440+0000: 1464.225: [CMS-concurrent-preclean-start]
> >> >> 2021-06-15T16:27:41.445+0000: 1464.231: [CMS-concurrent-preclean: 0.005/0.005 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
> >> >> 2021-06-15T16:27:41.445+0000: 1464.231: [CMS-concurrent-abortable-preclean-start]
> >> >> 2021-06-15T16:27:41.531+0000: 1464.316: Total time for which application threads were stopped: 0.0018167 seconds, Stopping threads took: 0.0012697 seconds
> >> >> 2021-06-15T16:27:41.532+0000: 1464.317: Total time for which application threads were stopped: 0.0009634 seconds, Stopping threads took: 0.0005877 seconds
> >> >> 2021-06-15T16:27:41.533+0000: 1464.318: Total time for which application threads were stopped: 0.0006194 seconds, Stopping threads took: 0.0003124 seconds
> >> >> 2021-06-15T16:27:41.534+0000: 1464.319: Total time for which application threads were stopped: 0.0006736 seconds, Stopping threads took: 0.0003113 seconds
> >> >> 2021-06-15T16:27:41.735+0000: 1464.521: Total time for which application threads were stopped: 0.0009390 seconds, Stopping threads took: 0.0005340 seconds
> >> >> 2021-06-15T16:27:42.213+0000: 1464.998: Total time for which application threads were stopped: 0.0016779 seconds, Stopping threads took: 0.0003837 seconds
> >> >> 2021-06-15T16:27:42.214+0000: 1464.999: Total time for which application threads were stopped: 0.0006721 seconds, Stopping threads took: 0.0002727 seconds
> >> >> 2021-06-15T16:27:42.215+0000: 1465.000: Total time for which application threads were stopped: 0.0003734 seconds, Stopping threads took: 0.0000991 seconds
> >> >> 2021-06-15T16:27:42.217+0000: 1465.002: Total time for which application threads were stopped: 0.0004387 seconds, Stopping threads took: 0.0001799 seconds
> >> >> 2021-06-15T16:27:42.219+0000: 1465.004: Total time for which application threads were stopped: 0.0004952 seconds, Stopping threads took: 0.0002235 seconds
> >> >> 2021-06-15T16:27:42.837+0000: 1465.622: Total time for which application threads were stopped: 0.0016946 seconds, Stopping threads took: 0.0013149 seconds
> >> >> 2021-06-15T16:27:43.433+0000: 1466.218: Total time for which application threads were stopped: 0.0008748 seconds, Stopping threads took: 0.0004336 seconds
> >> >> 2021-06-15T16:27:44.337+0000: 1467.122: Total time for which application threads were stopped: 0.0624611 seconds, Stopping threads took: 0.0610425 seconds
> >> >> 2021-06-15T16:27:45.458+0000: 1468.243: Total time for which application threads were stopped: 0.0329742 seconds, Stopping threads took: 0.0324816 seconds
> >> >> 2021-06-15T16:27:46.182+0000: 1468.967: Total time for which application threads were stopped: 0.0010628 seconds, Stopping threads took: 0.0006594 seconds
> >> >> 2021-06-15T16:27:46.241+0000: 1469.026: Total time for which application threads were stopped: 0.0583459 seconds, Stopping threads took: 0.0579144 seconds
> >> >> 2021-06-15T16:27:47.308+0000: 1470.093: Total time for which application threads were stopped: 0.0007076 seconds, Stopping threads took: 0.0003400 seconds
> >> >> 2021-06-15T16:27:47.371+0000: 1470.157: Total time for which application threads were stopped: 0.0020045 seconds, Stopping threads took: 0.0014785 seconds
> >> >> 2021-06-15T16:27:47.444+0000: 1470.229: Total time for which application threads were stopped: 0.0007924 seconds, Stopping threads took: 0.0004075 seconds
> >> >> CMS: abort preclean due to time 2021-06-15T16:27:47.450+0000: 1470.235: [CMS-concurrent-abortable-preclean: 0.391/6.004 secs] [Times: user=9.74 sys=0.27, real=6.00 secs]
> >> >> 2021-06-15T16:27:47.451+0000: 1470.236: [GC (CMS Final Remark) [YG occupancy: 412669 K (436928 K)]{Heap before GC invocations=92 (full 2):
> >> >
> >> > Also I had restarted the service at 14:50 which is almost 2 hours
> >> > prior to this.
> >> >
> >> > I found a similar pattern in this one, but involving a somewhat
> >> > smaller collection:
> >> >
> >> >> 2021-06-15 15:44:09.698 INFO (qtp1836643189-60) [ x:archiver] o.a.s.c.S.Request [fusearchiver] webapp=/solr path=/replication params={wt=json&command=details&_=1623771837117} status=0 QTime=8397
> >> >> 2021-06-15 15:44:09.698 INFO (qtp1836643189-23) [ x:archiver] o.a.s.c.S.Request [fusearchiver] webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623771837117} status=0 QTime=8399
> >> >> 2021-06-15 15:44:10.035 INFO (qtp1836643189-24) [ x:archiver] o.a.s.c.S.Request [fusearchiver] webapp=/solr path=/select params={q=*:*&indent=on&fq=_accountId:12&wt=json&_=1623771792912} hits=1272929 status=0 QTime=50886
> >> >> 2021-06-15 15:44:10.039 INFO (qtp1836643189-53) [ x:archiver] o.a.s.c.S.Request [fusearchiver] webapp=/solr path=/select params={q=*:*&indent=on&fq=_accountId:12&wt=json&_=1623771792912} hits=1272929 status=0 QTime=42102
> >> >> 2021-06-15 15:44:10.077 INFO (qtp1836643189-53) [ x:archiver] o.a.s.s.HttpSolrCall Unable to write response, client closed connection or we are shutting down
> >> >
> >> > Again, this is preceded by calls to the replication and Luke handlers
> >> > with fairly large QTimes.
> >> >
> >> > I am not using replication, but it is enabled. From researching the
> >> > command, this is just a status call, so where could it be triggered
> >> > from?
> >> >
> >> > Beyond that strange event, what else should I be looking at or
> >> > changing in my configuration to find these issues? Memory was around
> >> > 7 GB of the max 8 GB. GC doesn't seem excessive. The memory is
> >> > growing, but not spiking. Should I turn up the memory allocation?
> >> >
> >> > TIA
> >> > Charlie
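P.S. To sanity-check my read that GC isn't excessive, I stopped eyeballing
the GC log and started summarizing the "Total time for which application
threads were stopped" lines with a small script. It's only a sketch against
the JDK 8 CMS log format quoted above; the default log path and the
1-second threshold for flagging long pauses are just assumptions I made for
my setup:

    # gc_pauses.py - rough sketch: total up safepoint pause time from a
    # JDK 8 GC log (the "Total time for which application threads were
    # stopped" lines) and flag any pause longer than a threshold.
    # The default log path and the threshold are assumptions.
    import re
    import sys

    PAUSE_RE = re.compile(
        r"(?P<ts>\S+): \S+ Total time for which application threads "
        r"were stopped: (?P<secs>[0-9.]+) seconds"
    )

    def summarize(path, threshold=1.0):
        total = 0.0
        long_pauses = []
        with open(path) as log:
            for line in log:
                m = PAUSE_RE.search(line)
                if not m:
                    continue
                secs = float(m.group("secs"))
                total += secs
                if secs >= threshold:
                    long_pauses.append((m.group("ts"), secs))
        print("total stopped time: %.2fs" % total)
        for ts, secs in long_pauses:
            print("  long pause of %.2fs at %s" % (secs, ts))

    if __name__ == "__main__":
        summarize(sys.argv[1] if len(sys.argv) > 1 else "solr_gc.log")

For the window I quoted above, those pauses only add up to a fraction of a
second, which lines up with my sense that GC alone doesn't explain the
spike.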