The Solr admin UI will tell you whether it has started rebuilding the suggest 
index or not.
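
If you do find one configured, a common way to keep it from rebuilding 
automatically (and blocking on restart) is to turn off buildOnStartup and 
buildOnCommit in solrconfig.xml and only build it explicitly when convenient. 
Roughly like this - the suggester name, field, and analyzer below are just 
placeholders, not taken from your config:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">title</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
      <!-- don't rebuild the suggest index automatically on startup/commit;
           build it on demand with a request like /suggest?suggest.build=true -->
      <str name="buildOnStartup">false</str>
      <str name="buildOnCommit">false</str>
    </lst>
  </searchComponent>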

> On Jun 16, 2021, at 3:44 PM, Charlie Hubbard <charlie.hubb...@gmail.com> 
> wrote:
> 
> Hi Rahul,
> 
> Thank you for your response.  Unfortunately, this spike happened yesterday,
> and I didn't capture this data at the time.  But this is very good
> information if I should see this again.
> 
> I should add that I found I had rebooted the server at 14:50 because it
> crashed from an OOM condition, which now makes me think I just need to
> crank up the Xmx memory on the server.  But I wanted to rule out anything
> else going on before I raise the memory and move on.
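> 
> If I do end up raising it, my assumption is that the place to change it is
> SOLR_HEAP (or SOLR_JAVA_MEM) in bin/solr.in.sh, something like the sketch
> below - the values are only illustrative, not what I've settled on:
> 
>   # bin/solr.in.sh (illustrative values only)
>   SOLR_HEAP="12g"
>   # or, equivalently, with explicit -Xms/-Xmx:
>   # SOLR_JAVA_MEM="-Xms12g -Xmx12g"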
> 
> Charlie
> 
>> On Wed, Jun 16, 2021 at 11:48 AM Rahul Goswami <rahul196...@gmail.com>
>> wrote:
>> 
>> Forgot to mention, you need to collect the output while you observe the CPU
>> spike.
>> 
>> On Wed, Jun 16, 2021 at 11:38 AM Rahul Goswami <rahul196...@gmail.com>
>> wrote:
>> 
>>> Also, can you please share the output of the below REST call from the
>>> browser (any non-IE browser)?
>>> 
>>> localhost:8983/solr/admin/info/threads
>>> 
>>> Please replace the Solr port in the above call as per your environment.
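>>> 
>>> For example, with the default port (adjust as needed) the full URL would be
>>> something like this; adding wt=json is optional but makes the output easier
>>> to capture and share:
>>> 
>>> http://localhost:8983/solr/admin/info/threads?wt=json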
>>> 
>>> - Rahul
>>> 
>>> On Wed, Jun 16, 2021 at 11:22 AM Dave <hastings.recurs...@gmail.com>
>>> wrote:
>>> 
>>>> Out of curiosity, do you have any suggesters configured?  They tend to do
>>>> hard lockups if they need to rebuild.  Not saying that’s what’s going on,
>>>> but I’ve definitely had issues with them on restarting the servers and
>>>> freezing the entire thing until they were done.
>>>> 
>>>>> On Jun 16, 2021, at 11:15 AM, Charlie Hubbard <charlie.hubb...@gmail.com> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I had an event recently where my Solr server's CPU spiked and it stopped
>>>>> answering incoming requests in a timely manner.  I'm having a hard time
>>>>> pinpointing the cause.  I'm reviewing the log files, but I can't explain
>>>>> why the CPU suddenly spiked.  Once this happened, all of the queries
>>>>> started to back up.  There don't appear to have been many queries coming
>>>>> in when this happened, but since queries are logged after they complete,
>>>>> I think it's hard to know for sure.
>>>>> 
>>>>> My setup is:
>>>>> 
>>>>>  - Single Node standalone server
>>>>>  - OpenJDK 1.8
>>>>>  - Solr 6.6.6
>>>>>  - No replication configured
>>>>>  - 4 collections ranging from 200 MB to 20 GB
>>>>>  - 112 million docs in the collection in question
>>>>> 
>>>>> Here is the section in the logs where the queries started to pile up.
>>>>> You see the QTimes start to go way up.
>>>>> 
>>>>>> 2021-06-15 16:27:35.647 INFO  (qtp1836643189-506) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=60736
>>>>>> 2021-06-15 16:27:35.647 INFO  (qtp1836643189-483) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=132802
>>>>>> 2021-06-15 16:27:35.647 INFO  (qtp1836643189-474) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=154802
>>>>>> 2021-06-15 16:27:35.647 INFO  (qtp1836643189-467) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=121797
>>>>>> 2021-06-15 16:27:35.648 INFO  (qtp1836643189-459) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=165804
>>>>>> 2021-06-15 16:27:35.648 INFO  (qtp1836643189-487) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623773588322} status=0 QTime=110803
>>>>>> 2021-06-15 16:27:35.648 INFO  (qtp1836643189-497) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623773588322} status=0 QTime=798
>>>>>> 2021-06-15 16:27:35.647 INFO  (qtp1836643189-495) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=99798
>>>>>> 2021-06-15 16:27:35.650 INFO  (qtp1836643189-448) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=176807
>>>>>> 2021-06-15 16:27:35.650 INFO  (qtp1836643189-464) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623773588322} status=0 QTime=176809
>>>>>> 2021-06-15 16:27:35.650 INFO  (qtp1836643189-479) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623773588322} status=0 QTime=99804
>>>>>> 2021-06-15 16:27:35.650 INFO  (qtp1836643189-445) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=187795
>>>>> 
>>>>> 
>>>>> That burst of replication and luke handler requests lasted for 1 second,
>>>>> but there are 155 entries like that, over and over.  Prior to this I can't
>>>>> see any evidence of queries happening that might have slowed things down.
>>>>> There was some updating happening on the smaller collection, but not on
>>>>> the collection in question.
>>>>> 
>>>>> From the GC logs I did find some overlap where thread stoppages were
>>>>> taking place around, and leading up to, that point in time.
>>>>> 
>>>>>> Heap after GC invocations=92 (full 1):
>>>>>> par new generation   total 436928K, used 86155K [0x0000000540000000, 0x0000000560000000, 0x00000005e0000000)
>>>>>> eden space 349568K,   0% used [0x0000000540000000, 0x0000000540000000, 0x0000000555560000)
>>>>>> from space 87360K,  98% used [0x0000000555560000, 0x000000055a982ef8, 0x000000055aab0000)
>>>>>> to   space 87360K,   0% used [0x000000055aab0000, 0x000000055aab0000, 0x0000000560000000)
>>>>>> concurrent mark-sweep generation total 1572864K, used 681710K [0x00000005e0000000, 0x0000000640000000, 0x00000007c0000000)
>>>>>> Metaspace       used 41348K, capacity 42508K, committed 42772K, reserved 1087488K
>>>>>> class space    used 4812K, capacity 5121K, committed 5140K, reserved 1048576K
>>>>>> }
>>>>>> 2021-06-15T16:27:39.384+0000: 1462.170: Total time for which application threads were stopped: 0.0653007 seconds, Stopping threads took: 0.0019805 seconds
>>>>>> 2021-06-15T16:27:40.389+0000: 1463.174: Total time for which application threads were stopped: 0.0041873 seconds, Stopping threads took: 0.0037697 seconds
>>>>>> 2021-06-15T16:27:40.921+0000: 1463.706: Total time for which application threads were stopped: 0.0029477 seconds, Stopping threads took: 0.0024678 seconds
>>>>>> 2021-06-15T16:27:41.268+0000: 1464.053: Total time for which application threads were stopped: 0.0224393 seconds, Stopping threads took: 0.0219226 seconds
>>>>>> 2021-06-15T16:27:41.388+0000: 1464.173: [GC (CMS Initial Mark) [1 CMS-initial-mark: 788750K(1572864K)] 1052906K(2009792K), 0.0022256 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
>>>>>> 2021-06-15T16:27:41.390+0000: 1464.176: Total time for which application threads were stopped: 0.0060337 seconds, Stopping threads took: 0.0032933 seconds
>>>>>> 2021-06-15T16:27:41.390+0000: 1464.176: [CMS-concurrent-mark-start]
>>>>>> 2021-06-15T16:27:41.440+0000: 1464.225: [CMS-concurrent-mark: 0.050/0.050 secs] [Times: user=0.39 sys=0.00, real=0.05 secs]
>>>>>> 2021-06-15T16:27:41.440+0000: 1464.225: [CMS-concurrent-preclean-start]
>>>>>> 2021-06-15T16:27:41.445+0000: 1464.231: [CMS-concurrent-preclean: 0.005/0.005 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
>>>>>> 2021-06-15T16:27:41.445+0000: 1464.231: [CMS-concurrent-abortable-preclean-start]
>>>>>> 2021-06-15T16:27:41.531+0000: 1464.316: Total time for which application threads were stopped: 0.0018167 seconds, Stopping threads took: 0.0012697 seconds
>>>>>> 2021-06-15T16:27:41.532+0000: 1464.317: Total time for which application threads were stopped: 0.0009634 seconds, Stopping threads took: 0.0005877 seconds
>>>>>> 2021-06-15T16:27:41.533+0000: 1464.318: Total time for which application threads were stopped: 0.0006194 seconds, Stopping threads took: 0.0003124 seconds
>>>>>> 2021-06-15T16:27:41.534+0000: 1464.319: Total time for which application threads were stopped: 0.0006736 seconds, Stopping threads took: 0.0003113 seconds
>>>>>> 2021-06-15T16:27:41.735+0000: 1464.521: Total time for which application threads were stopped: 0.0009390 seconds, Stopping threads took: 0.0005340 seconds
>>>>>> 2021-06-15T16:27:42.213+0000: 1464.998: Total time for which application threads were stopped: 0.0016779 seconds, Stopping threads took: 0.0003837 seconds
>>>>>> 2021-06-15T16:27:42.214+0000: 1464.999: Total time for which application threads were stopped: 0.0006721 seconds, Stopping threads took: 0.0002727 seconds
>>>>>> 2021-06-15T16:27:42.215+0000: 1465.000: Total time for which application threads were stopped: 0.0003734 seconds, Stopping threads took: 0.0000991 seconds
>>>>>> 2021-06-15T16:27:42.217+0000: 1465.002: Total time for which application threads were stopped: 0.0004387 seconds, Stopping threads took: 0.0001799 seconds
>>>>>> 2021-06-15T16:27:42.219+0000: 1465.004: Total time for which application threads were stopped: 0.0004952 seconds, Stopping threads took: 0.0002235 seconds
>>>>>> 2021-06-15T16:27:42.837+0000: 1465.622: Total time for which application threads were stopped: 0.0016946 seconds, Stopping threads took: 0.0013149 seconds
>>>>>> 2021-06-15T16:27:43.433+0000: 1466.218: Total time for which application threads were stopped: 0.0008748 seconds, Stopping threads took: 0.0004336 seconds
>>>>>> 2021-06-15T16:27:44.337+0000: 1467.122: Total time for which application threads were stopped: 0.0624611 seconds, Stopping threads took: 0.0610425 seconds
>>>>>> 2021-06-15T16:27:45.458+0000: 1468.243: Total time for which application threads were stopped: 0.0329742 seconds, Stopping threads took: 0.0324816 seconds
>>>>>> 2021-06-15T16:27:46.182+0000: 1468.967: Total time for which application threads were stopped: 0.0010628 seconds, Stopping threads took: 0.0006594 seconds
>>>>>> 2021-06-15T16:27:46.241+0000: 1469.026: Total time for which application threads were stopped: 0.0583459 seconds, Stopping threads took: 0.0579144 seconds
>>>>>> 2021-06-15T16:27:47.308+0000: 1470.093: Total time for which application threads were stopped: 0.0007076 seconds, Stopping threads took: 0.0003400 seconds
>>>>>> 2021-06-15T16:27:47.371+0000: 1470.157: Total time for which application threads were stopped: 0.0020045 seconds, Stopping threads took: 0.0014785 seconds
>>>>>> 2021-06-15T16:27:47.444+0000: 1470.229: Total time for which application threads were stopped: 0.0007924 seconds, Stopping threads took: 0.0004075 seconds
>>>>>> CMS: abort preclean due to time 2021-06-15T16:27:47.450+0000: 1470.235: [CMS-concurrent-abortable-preclean: 0.391/6.004 secs] [Times: user=9.74 sys=0.27, real=6.00 secs]
>>>>>> 2021-06-15T16:27:47.451+0000: 1470.236: [GC (CMS Final Remark) [YG occupancy: 412669 K (436928 K)]{Heap before GC invocations=92 (full 2):
>>>>> 
>>>>> 
>>>>> Also, I had restarted the service at 14:50, which is almost 2 hours prior
>>>>> to this.  I found a similar pattern there as well, but involving a
>>>>> somewhat smaller collection:
>>>>> 
>>>>>> 2021-06-15 15:44:09.698 INFO  (qtp1836643189-60) [   x:archiver] o.a.s.c.S.Request [fusearchiver]  webapp=/solr path=/replication params={wt=json&command=details&_=1623771837117} status=0 QTime=8397
>>>>>> 2021-06-15 15:44:09.698 INFO  (qtp1836643189-23) [   x:archiver] o.a.s.c.S.Request [fusearchiver]  webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623771837117} status=0 QTime=8399
>>>>>> 2021-06-15 15:44:10.035 INFO  (qtp1836643189-24) [   x:archiver] o.a.s.c.S.Request [fusearchiver]  webapp=/solr path=/select params={q=*:*&indent=on&fq=_accountId:12&wt=json&_=1623771792912} hits=1272929 status=0 QTime=50886
>>>>>> 2021-06-15 15:44:10.039 INFO  (qtp1836643189-53) [   x:archiver] o.a.s.c.S.Request [fusearchiver]  webapp=/solr path=/select params={q=*:*&indent=on&fq=_accountId:12&wt=json&_=1623771792912} hits=1272929 status=0 QTime=42102
>>>>>> 2021-06-15 15:44:10.077 INFO  (qtp1836643189-53) [   x:archiver] o.a.s.s.HttpSolrCall Unable to write response, client closed connection or we are shutting down
>>>>> 
>>>>> 
>>>>> Again, this is preceded by calls to the replication and luke handlers
>>>>> that have fairly large QTimes.
>>>>> 
>>>>> I am not using replication, but it is enabled.  From researching the
>>>>> command, this is just a status call.  So where could this be triggered
>>>>> from?
>>>>> 
>>>>> Beyond that strange event, what else should I be looking at or changing in
>>>>> my configuration to find these issues?  Memory was around 7 GB of the max
>>>>> 8 GB.  GC doesn't seem excessive.  The memory is growing, but not spiking.
>>>>> Should I turn up the memory allocation?
>>>>> 
>>>>> TIA
>>>>> Charlie
>>>> 
>>> 
>> 
