The Solr admin UI will tell you if it started rebuilding the suggest index or not.
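
If this happens again, Rahul's suggestion below of capturing /solr/admin/info/threads while the CPU is pegged is probably the most useful data to have. Something along these lines could be left running so the window isn't missed next time; this is only a rough sketch (untested), assuming standalone Solr on localhost:8983 with no authentication, and /tmp/solr-threads is just a throwaway output directory:

#!/usr/bin/env python3
# Rough sketch: poll Solr's thread dump endpoint and keep timestamped snapshots
# so there is something to inspect after the next CPU spike.
# Assumptions: standalone Solr on localhost:8983, no authentication, and
# /tmp/solr-threads as a scratch directory -- adjust for your environment.
import os
import time
import urllib.request

SOLR_THREADS_URL = "http://localhost:8983/solr/admin/info/threads?wt=json"
OUT_DIR = "/tmp/solr-threads"
INTERVAL_SECONDS = 10

os.makedirs(OUT_DIR, exist_ok=True)

while True:  # stop with Ctrl-C once the spike has been captured
    stamp = time.strftime("%Y%m%dT%H%M%S")
    try:
        with urllib.request.urlopen(SOLR_THREADS_URL, timeout=30) as resp:
            body = resp.read()
        with open(os.path.join(OUT_DIR, "threads-%s.json" % stamp), "wb") as out:
            out.write(body)
    except Exception as exc:
        # Keep polling; a failed request during the spike is itself a data point.
        print("%s poll failed: %s" % (stamp, exc))
    time.sleep(INTERVAL_SECONDS)

A handful of snapshots taken a few seconds apart during the spike is usually enough to see which threads are burning CPU or waiting on a lock.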
> On Jun 16, 2021, at 3:44 PM, Charlie Hubbard <charlie.hubb...@gmail.com> wrote:
>
> Hi Rahul,
>
> Thank you for your response. Unfortunately, this spike happened yesterday,
> and I didn't capture this data at the time. But this is very good
> information if I see this again.
>
> I should add that I found I had rebooted the server at 14:50 after it
> crashed with an OOM condition, which now makes me think I just need to
> crank up the Xmx memory on the server. But I wanted to rule out that
> something else was going on before I raise the memory and move on.
>
> Charlie
>
>> On Wed, Jun 16, 2021 at 11:48 AM Rahul Goswami <rahul196...@gmail.com> wrote:
>>
>> Forgot to mention, you need to collect the output while you observe the
>> CPU spike.
>>
>>> On Wed, Jun 16, 2021 at 11:38 AM Rahul Goswami <rahul196...@gmail.com> wrote:
>>>
>>> Also, can you please share the output of the below REST call from the
>>> browser (any non-IE browser)?
>>>
>>> localhost:8983/solr/admin/info/threads
>>>
>>> Please replace the Solr port in the above call as per your environment.
>>>
>>> - Rahul
>>>
>>>> On Wed, Jun 16, 2021 at 11:22 AM Dave <hastings.recurs...@gmail.com> wrote:
>>>>
>>>> Out of curiosity, do you have any suggesters configured? They tend to do
>>>> hard lockups if they need to rebuild. Not saying that's what's going on,
>>>> but I've definitely had issues with them freezing the entire thing on
>>>> server restarts until they were done.
>>>>
>>>>> On Jun 16, 2021, at 11:15 AM, Charlie Hubbard <charlie.hubb...@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I had an event recently where my Solr server's CPU spiked and it stopped
>>>>> answering incoming requests in a timely manner. I'm having a hard time
>>>>> pinpointing the cause. I'm reviewing the log files, but I can't explain
>>>>> why the CPU suddenly spiked. Once this happened, all of the queries
>>>>> started to back up. There don't appear to have been many queries coming
>>>>> in when this happened, but since queries are logged after they complete
>>>>> I think it's hard to know for sure.
>>>>>
>>>>> My setup is:
>>>>>
>>>>> - Single-node standalone server
>>>>> - OpenJDK 1.8
>>>>> - Solr 6.6.6
>>>>> - No replication configured
>>>>> - 4 collections ranging from 200 MB to 20 GB
>>>>> - 112 million docs in the collection in question
>>>>>
>>>>> Here is the section in the logs where the queries started to pile up.
>>>>> You can see the QTimes start to go way up.
>>>>>
>>>>>> 2021-06-15 16:27:35.647 INFO (qtp1836643189-506) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=60736
>>>>>> 2021-06-15 16:27:35.647 INFO (qtp1836643189-483) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=132802
>>>>>> 2021-06-15 16:27:35.647 INFO (qtp1836643189-474) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=154802
>>>>>> 2021-06-15 16:27:35.647 INFO (qtp1836643189-467) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=121797
>>>>>> 2021-06-15 16:27:35.648 INFO (qtp1836643189-459) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=165804
>>>>>> 2021-06-15 16:27:35.648 INFO (qtp1836643189-487) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623773588322} status=0 QTime=110803
>>>>>> 2021-06-15 16:27:35.648 INFO (qtp1836643189-497) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623773588322} status=0 QTime=798
>>>>>> 2021-06-15 16:27:35.647 INFO (qtp1836643189-495) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=99798
>>>>>> 2021-06-15 16:27:35.650 INFO (qtp1836643189-448) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=176807
>>>>>> 2021-06-15 16:27:35.650 INFO (qtp1836643189-464) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623773588322} status=0 QTime=176809
>>>>>> 2021-06-15 16:27:35.650 INFO (qtp1836643189-479) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623773588322} status=0 QTime=99804
>>>>>> 2021-06-15 16:27:35.650 INFO (qtp1836643189-445) [ x:customer] o.a.s.c.S.Request [customer] webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=187795
>>>>>
>>>>> That burst of replication and luke handler requests lasted for 1 second,
>>>>> but there are 155 entries like that, over and over. Prior to this I can't
>>>>> see any evidence of queries that might have slowed things down. There was
>>>>> some updating happening on the smaller collection, but not on the
>>>>> collection in question.
>>>>>
>>>>> From the GC logs I did find some overlap, with thread stoppages taking
>>>>> place around, and leading up to, that point in time.
>>>>>
>>>>>> Heap after GC invocations=92 (full 1):
>>>>>>  par new generation total 436928K, used 86155K [0x0000000540000000, 0x0000000560000000, 0x00000005e0000000)
>>>>>>   eden space 349568K, 0% used [0x0000000540000000, 0x0000000540000000, 0x0000000555560000)
>>>>>>   from space 87360K, 98% used [0x0000000555560000, 0x000000055a982ef8, 0x000000055aab0000)
>>>>>>   to space 87360K, 0% used [0x000000055aab0000, 0x000000055aab0000, 0x0000000560000000)
>>>>>>  concurrent mark-sweep generation total 1572864K, used 681710K [0x00000005e0000000, 0x0000000640000000, 0x00000007c0000000)
>>>>>>  Metaspace used 41348K, capacity 42508K, committed 42772K, reserved 1087488K
>>>>>>   class space used 4812K, capacity 5121K, committed 5140K, reserved 1048576K
>>>>>> }
>>>>>> 2021-06-15T16:27:39.384+0000: 1462.170: Total time for which application threads were stopped: 0.0653007 seconds, Stopping threads took: 0.0019805 seconds
>>>>>> 2021-06-15T16:27:40.389+0000: 1463.174: Total time for which application threads were stopped: 0.0041873 seconds, Stopping threads took: 0.0037697 seconds
>>>>>> 2021-06-15T16:27:40.921+0000: 1463.706: Total time for which application threads were stopped: 0.0029477 seconds, Stopping threads took: 0.0024678 seconds
>>>>>> 2021-06-15T16:27:41.268+0000: 1464.053: Total time for which application threads were stopped: 0.0224393 seconds, Stopping threads took: 0.0219226 seconds
>>>>>> 2021-06-15T16:27:41.388+0000: 1464.173: [GC (CMS Initial Mark) [1 CMS-initial-mark: 788750K(1572864K)] 1052906K(2009792K), 0.0022256 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
>>>>>> 2021-06-15T16:27:41.390+0000: 1464.176: Total time for which application threads were stopped: 0.0060337 seconds, Stopping threads took: 0.0032933 seconds
>>>>>> 2021-06-15T16:27:41.390+0000: 1464.176: [CMS-concurrent-mark-start]
>>>>>> 2021-06-15T16:27:41.440+0000: 1464.225: [CMS-concurrent-mark: 0.050/0.050 secs] [Times: user=0.39 sys=0.00, real=0.05 secs]
>>>>>> 2021-06-15T16:27:41.440+0000: 1464.225: [CMS-concurrent-preclean-start]
>>>>>> 2021-06-15T16:27:41.445+0000: 1464.231: [CMS-concurrent-preclean: 0.005/0.005 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
>>>>>> 2021-06-15T16:27:41.445+0000: 1464.231: [CMS-concurrent-abortable-preclean-start]
>>>>>> 2021-06-15T16:27:41.531+0000: 1464.316: Total time for which application threads were stopped: 0.0018167 seconds, Stopping threads took: 0.0012697 seconds
>>>>>> 2021-06-15T16:27:41.532+0000: 1464.317: Total time for which application threads were stopped: 0.0009634 seconds, Stopping threads took: 0.0005877 seconds
>>>>>> 2021-06-15T16:27:41.533+0000: 1464.318: Total time for which application threads were stopped: 0.0006194 seconds, Stopping threads took: 0.0003124 seconds
>>>>>> 2021-06-15T16:27:41.534+0000: 1464.319: Total time for which application threads were stopped: 0.0006736 seconds, Stopping threads took: 0.0003113 seconds
>>>>>> 2021-06-15T16:27:41.735+0000: 1464.521: Total time for which application threads were stopped: 0.0009390 seconds, Stopping threads took: 0.0005340 seconds
>>>>>> 2021-06-15T16:27:42.213+0000: 1464.998: Total time for which application threads were stopped: 0.0016779 seconds, Stopping threads took: 0.0003837 seconds
>>>>>> 2021-06-15T16:27:42.214+0000: 1464.999: Total time for which application threads were stopped: 0.0006721 seconds, Stopping threads took: 0.0002727 seconds
>>>>>> 2021-06-15T16:27:42.215+0000: 1465.000: Total time for which application threads were stopped: 0.0003734 seconds, Stopping threads took: 0.0000991 seconds
>>>>>> 2021-06-15T16:27:42.217+0000: 1465.002: Total time for which application threads were stopped: 0.0004387 seconds, Stopping threads took: 0.0001799 seconds
>>>>>> 2021-06-15T16:27:42.219+0000: 1465.004: Total time for which application threads were stopped: 0.0004952 seconds, Stopping threads took: 0.0002235 seconds
>>>>>> 2021-06-15T16:27:42.837+0000: 1465.622: Total time for which application threads were stopped: 0.0016946 seconds, Stopping threads took: 0.0013149 seconds
>>>>>> 2021-06-15T16:27:43.433+0000: 1466.218: Total time for which application threads were stopped: 0.0008748 seconds, Stopping threads took: 0.0004336 seconds
>>>>>> 2021-06-15T16:27:44.337+0000: 1467.122: Total time for which application threads were stopped: 0.0624611 seconds, Stopping threads took: 0.0610425 seconds
>>>>>> 2021-06-15T16:27:45.458+0000: 1468.243: Total time for which application threads were stopped: 0.0329742 seconds, Stopping threads took: 0.0324816 seconds
>>>>>> 2021-06-15T16:27:46.182+0000: 1468.967: Total time for which application threads were stopped: 0.0010628 seconds, Stopping threads took: 0.0006594 seconds
>>>>>> 2021-06-15T16:27:46.241+0000: 1469.026: Total time for which application threads were stopped: 0.0583459 seconds, Stopping threads took: 0.0579144 seconds
>>>>>> 2021-06-15T16:27:47.308+0000: 1470.093: Total time for which application threads were stopped: 0.0007076 seconds, Stopping threads took: 0.0003400 seconds
>>>>>> 2021-06-15T16:27:47.371+0000: 1470.157: Total time for which application threads were stopped: 0.0020045 seconds, Stopping threads took: 0.0014785 seconds
>>>>>> 2021-06-15T16:27:47.444+0000: 1470.229: Total time for which application threads were stopped: 0.0007924 seconds, Stopping threads took: 0.0004075 seconds
>>>>>> CMS: abort preclean due to time 2021-06-15T16:27:47.450+0000: 1470.235: [CMS-concurrent-abortable-preclean: 0.391/6.004 secs] [Times: user=9.74 sys=0.27, real=6.00 secs]
>>>>>> 2021-06-15T16:27:47.451+0000: 1470.236: [GC (CMS Final Remark) [YG occupancy: 412669 K (436928 K)]{Heap before GC invocations=92 (full 2):
>>>>>
>>>>> Also, I had restarted the service at 14:50, which is almost 2 hours prior
>>>>> to this.
>>>>> I found a similar pattern in this one, but involving a somewhat smaller
>>>>> collection:
>>>>>
>>>>>> 2021-06-15 15:44:09.698 INFO (qtp1836643189-60) [ x:archiver] o.a.s.c.S.Request [fusearchiver] webapp=/solr path=/replication params={wt=json&command=details&_=1623771837117} status=0 QTime=8397
>>>>>> 2021-06-15 15:44:09.698 INFO (qtp1836643189-23) [ x:archiver] o.a.s.c.S.Request [fusearchiver] webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623771837117} status=0 QTime=8399
>>>>>> 2021-06-15 15:44:10.035 INFO (qtp1836643189-24) [ x:archiver] o.a.s.c.S.Request [fusearchiver] webapp=/solr path=/select params={q=*:*&indent=on&fq=_accountId:12&wt=json&_=1623771792912} hits=1272929 status=0 QTime=50886
>>>>>> 2021-06-15 15:44:10.039 INFO (qtp1836643189-53) [ x:archiver] o.a.s.c.S.Request [fusearchiver] webapp=/solr path=/select params={q=*:*&indent=on&fq=_accountId:12&wt=json&_=1623771792912} hits=1272929 status=0 QTime=42102
>>>>>> 2021-06-15 15:44:10.077 INFO (qtp1836643189-53) [ x:archiver] o.a.s.s.HttpSolrCall Unable to write response, client closed connection or we are shutting down
>>>>>
>>>>> Again, this is preceded by calls to the replication and luke handlers with
>>>>> fairly large QTimes.
>>>>>
>>>>> I am not using replication, but the handler is enabled. From researching
>>>>> it, command=details is just a status call. So where could these requests
>>>>> be triggered from?
>>>>>
>>>>> Beyond that strange event, what else should I be looking at or changing in
>>>>> my configuration to track down these issues? Memory usage was around 7 GB
>>>>> of the 8 GB max. GC doesn't seem excessive. The memory is growing, but not
>>>>> spiking. Should I turn up the memory allocation?
>>>>>
>>>>> TIA
>>>>> Charlie
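
One more note on the GC side: the safepoint pauses in the excerpt above are all under about 70 ms, which on their own wouldn't explain QTimes of 60-180+ seconds, but it's easy to total them over a longer window to rule GC in or out. A rough sketch (it assumes a CMS GC log containing the -XX:+PrintGCApplicationStoppedTime lines in the format quoted above; the log path below is only an example):

#!/usr/bin/env python3
# Rough sketch: sum the JVM safepoint pauses reported by
# -XX:+PrintGCApplicationStoppedTime ("Total time for which application
# threads were stopped: ...") lines in a GC log. The default path is an example.
import re
import sys

STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds")

def summarize(path):
    pauses = []
    with open(path) as log:
        for line in log:
            match = STOPPED_RE.search(line)
            if match:
                pauses.append(float(match.group(1)))
    if not pauses:
        print("no stoppage lines found in %s" % path)
        return
    print("pauses:        %d" % len(pauses))
    print("total stopped: %.3f s" % sum(pauses))
    print("longest pause: %.4f s" % max(pauses))

if __name__ == "__main__":
    summarize(sys.argv[1] if len(sys.argv) > 1 else "solr_gc.log")

If the total stopped time during the spike window stays small relative to those QTimes, GC pauses alone aren't the bottleneck and the thread dumps are the better lead.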