Hi Dave,

Thank you for the quick response.  I do have the SuggestComponent enabled,
but I don't actually use it.  Would the configuration below cause it to
rebuild the way you described?  I could certainly try turning it off.  Do
you know of any log messages I could look for to confirm whether it was
running at the time?  Thanks again for your help.

<searchComponent name="suggest" class="solr.SuggestComponent">
   <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>      <!--
org.apache.solr.spelling.suggest.fst -->
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
<!-- org.apache.solr.spelling.suggest.HighFrequencyDictionaryFactory
-->
    <str name="field">cat</str>
    <str name="weightField">price</str>
    <str name="suggestAnalyzerFieldType">string</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
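
If the suggester does turn out to be the culprit, I was thinking I could add
explicit build flags to the suggester definition so it never rebuilds
automatically on startup or commit.  Something along these lines (just a
sketch; I haven't verified the default behavior on 6.6.6):

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <!-- ...existing lookupImpl/dictionaryImpl/field settings as above... -->
    <!-- assumption: setting these to false prevents automatic rebuilds -->
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>

Would grepping the logs for SuggestComponent or SolrSuggester entries around
the time of the spike be the right way to tell whether a build was in
progress?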


On Wed, Jun 16, 2021 at 11:22 AM Dave <hastings.recurs...@gmail.com> wrote:

> Out of curiosity do you have any suggesters configured?  They tend to do
> hard lock ups if they need to rebuild.  Not saying that’s what’s going on
> but I’ve definitely had issues with them on restarting the servers and
> freezing the entire thing until they were done
>
> > On Jun 16, 2021, at 11:15 AM, Charlie Hubbard <charlie.hubb...@gmail.com>
> wrote:
> >
> > Hi,
> >
> > I had an event recently where my Solr Server CPU spiked, and stopped
> > answering incoming messages in a timely manner.  I'm having a hard time
> > pinpointing the cause.  I'm reviewing the log files, but I can't explain
> > why the CPU suddenly spiked.  Once this happened all of the
> > queries suddenly started to back up.  There doesn't appear to be many
> > queries coming in when this happened, but since queries are logged after
> > they complete I think it's hard to know for sure.
> >
> > My setup is:
> >
> >   - Single Node standalone server
> >   - OpenJDK 1.8
> >   - Solr 6.6.6
> >   - No replication configured
> >   -  4 collections ranging from 200mb - 20gb
> >   - 112 million docs in the collection in question
> >
> > Here is the section in the logs where the queries started to pile up.
> You
> > see the QTimes start to go way up.
> >
> > 2021-06-15 16:27:35.647 INFO  (qtp1836643189-506) [   x:customer]
> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/replication
> >> params={wt=json&command=details&_=1623773588323} status=0 QTime=60736
> >> 2021-06-15 16:27:35.647 INFO  (qtp1836643189-483) [   x:customer]
> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/replication
> >> params={wt=json&command=details&_=1623773588323} status=0 QTime=132802
> >> 2021-06-15 16:27:35.647 INFO  (qtp1836643189-474) [   x:customer]
> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/replication
> >> params={wt=json&command=details&_=1623773588323} status=0 QTime=154802
> >> 2021-06-15 16:27:35.647 INFO  (qtp1836643189-467) [   x:customer]
> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/replication
> >> params={wt=json&command=details&_=1623773588323} status=0 QTime=121797
> >> 2021-06-15 16:27:35.648 INFO  (qtp1836643189-459) [   x:customer]
> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/replication
> >> params={wt=json&command=details&_=1623773588323} status=0 QTime=165804
> >> 2021-06-15 16:27:35.648 INFO  (qtp1836643189-487) [   x:customer]
> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/admin/luke
> >> params={numTerms=0&show=index&wt=json&_=1623773588322} status=0
> QTime=110803
> >> 2021-06-15 16:27:35.648 INFO  (qtp1836643189-497) [   x:customer]
> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/admin/luke
> >> params={numTerms=0&show=index&wt=json&_=1623773588322} status=0
> QTime=798
> >> 2021-06-15 16:27:35.647 INFO  (qtp1836643189-495) [   x:customer]
> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/replication
> >> params={wt=json&command=details&_=1623773588323} status=0 QTime=99798
> >> 2021-06-15 16:27:35.650 INFO  (qtp1836643189-448) [   x:customer]
> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/replication
> >> params={wt=json&command=details&_=1623773588323} status=0 QTime=176807
> >> 2021-06-15 16:27:35.650 INFO  (qtp1836643189-464) [   x:customer]
> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/admin/luke
> >> params={numTerms=0&show=index&wt=json&_=1623773588322} status=0
> QTime=176809
> >> 2021-06-15 16:27:35.650 INFO  (qtp1836643189-479) [   x:customer]
> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/admin/luke
> >> params={numTerms=0&show=index&wt=json&_=1623773588322} status=0
> QTime=99804
> >> 2021-06-15 16:27:35.650 INFO  (qtp1836643189-445) [   x:customer]
> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/replication
> >> params={wt=json&command=details&_=1623773588323} status=0 QTime=187795
> >
> >
> > That burst of replication and luke handlers lasted for 1 second, but
> there
> > are 155 entries of that over and over.  Prior to this there isn't any
> > evidence that I can see there were any queries happening that might have
> > slowed things down.  There was some updating happening on the smaller
> > collection, but not on the collection in question.
> >
> > From the GC logs I did find some overlap where thread stoppages were
> taking
> > place around and leading up to the point in time.
> >
> > Heap after GC invocations=92 (full 1):
> >> par new generation   total 436928K, used 86155K [0x0000000540000000,
> >> 0x0000000560000000, 0x00000005e0000000)
> >>  eden space 349568K,   0% used [0x0000000540000000, 0x0000000540000000,
> >> 0x0000000555560000)
> >>  from space 87360K,  98% used [0x0000000555560000, 0x000000055a982ef8,
> >> 0x000000055aab0000)
> >>  to   space 87360K,   0% used [0x000000055aab0000, 0x000000055aab0000,
> >> 0x0000000560000000)
> >> concurrent mark-sweep generation total 1572864K, used 681710K
> >> [0x00000005e0000000, 0x0000000640000000, 0x00000007c0000000)
> >> Metaspace       used 41348K, capacity 42508K, committed 42772K, reserved
> >> 1087488K
> >>  class space    used 4812K, capacity 5121K, committed 5140K, reserved
> >> 1048576K
> >> }
> >> 2021-06-15T16:27:39.384+0000: 1462.170: Total time for which application
> >> threads were stopped: 0.0653007 seconds, Stopping threads took:
> 0.0019805
> >> seconds
> >> 2021-06-15T16:27:40.389+0000: 1463.174: Total time for which application
> >> threads were stopped: 0.0041873 seconds, Stopping threads took:
> 0.0037697
> >> seconds
> >> 2021-06-15T16:27:40.921+0000: 1463.706: Total time for which application
> >> threads were stopped: 0.0029477 seconds, Stopping threads took:
> 0.0024678
> >> seconds
> >> 2021-06-15T16:27:41.268+0000: 1464.053: Total time for which application
> >> threads were stopped: 0.0224393 seconds, Stopping threads took:
> 0.0219226
> >> seconds
> >> 2021-06-15T16:27:41.388+0000: 1464.173: [GC (CMS Initial Mark) [1
> >> CMS-initial-mark: 788750K(1572864K)] 1052906K(2009792K), 0.0022256 secs]
> >> [Times: user=0.01 sys=0.00, real=0.00 secs]
> >> 2021-06-15T16:27:41.390+0000: 1464.176: Total time for which application
> >> threads were stopped: 0.0060337 seconds, Stopping threads took:
> 0.0032933
> >> seconds
> >> 2021-06-15T16:27:41.390+0000: 1464.176: [CMS-concurrent-mark-start]
> >> 2021-06-15T16:27:41.440+0000: 1464.225: [CMS-concurrent-mark:
> 0.050/0.050
> >> secs] [Times: user=0.39 sys=0.00, real=0.05 secs]
> >> 2021-06-15T16:27:41.440+0000: 1464.225: [CMS-concurrent-preclean-start]
> >> 2021-06-15T16:27:41.445+0000: 1464.231: [CMS-concurrent-preclean:
> >> 0.005/0.005 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
> >> 2021-06-15T16:27:41.445+0000: 1464.231:
> >> [CMS-concurrent-abortable-preclean-start]
> >> 2021-06-15T16:27:41.531+0000: 1464.316: Total time for which application
> >> threads were stopped: 0.0018167 seconds, Stopping threads took:
> 0.0012697
> >> seconds
> >> 2021-06-15T16:27:41.532+0000: 1464.317: Total time for which application
> >> threads were stopped: 0.0009634 seconds, Stopping threads took:
> 0.0005877
> >> seconds
> >> 2021-06-15T16:27:41.533+0000: 1464.318: Total time for which application
> >> threads were stopped: 0.0006194 seconds, Stopping threads took:
> 0.0003124
> >> seconds
> >> 2021-06-15T16:27:41.534+0000: 1464.319: Total time for which application
> >> threads were stopped: 0.0006736 seconds, Stopping threads took:
> 0.0003113
> >> seconds
> >> 2021-06-15T16:27:41.735+0000: 1464.521: Total time for which application
> >> threads were stopped: 0.0009390 seconds, Stopping threads took:
> 0.0005340
> >> seconds
> >> 2021-06-15T16:27:42.213+0000: 1464.998: Total time for which application
> >> threads were stopped: 0.0016779 seconds, Stopping threads took:
> 0.0003837
> >> seconds
> >> 2021-06-15T16:27:42.214+0000: 1464.999: Total time for which application
> >> threads were stopped: 0.0006721 seconds, Stopping threads took:
> 0.0002727
> >> seconds
> >> 2021-06-15T16:27:42.215+0000: 1465.000: Total time for which application
> >> threads were stopped: 0.0003734 seconds, Stopping threads took:
> 0.0000991
> >> seconds
> >> 2021-06-15T16:27:42.217+0000: 1465.002: Total time for which application
> >> threads were stopped: 0.0004387 seconds, Stopping threads took:
> 0.0001799
> >> seconds
> >> 2021-06-15T16:27:42.219+0000: 1465.004: Total time for which application
> >> threads were stopped: 0.0004952 seconds, Stopping threads took:
> 0.0002235
> >> seconds
> >> 2021-06-15T16:27:42.837+0000: 1465.622: Total time for which application
> >> threads were stopped: 0.0016946 seconds, Stopping threads took:
> 0.0013149
> >> seconds
> >> 2021-06-15T16:27:43.433+0000: 1466.218: Total time for which application
> >> threads were stopped: 0.0008748 seconds, Stopping threads took:
> 0.0004336
> >> seconds
> >> 2021-06-15T16:27:44.337+0000: 1467.122: Total time for which application
> >> threads were stopped: 0.0624611 seconds, Stopping threads took:
> 0.0610425
> >> seconds
> >> 2021-06-15T16:27:45.458+0000: 1468.243: Total time for which application
> >> threads were stopped: 0.0329742 seconds, Stopping threads took:
> 0.0324816
> >> seconds
> >> 2021-06-15T16:27:46.182+0000: 1468.967: Total time for which application
> >> threads were stopped: 0.0010628 seconds, Stopping threads took:
> 0.0006594
> >> seconds
> >> 2021-06-15T16:27:46.241+0000: 1469.026: Total time for which application
> >> threads were stopped: 0.0583459 seconds, Stopping threads took:
> 0.0579144
> >> seconds
> >> 2021-06-15T16:27:47.308+0000: 1470.093: Total time for which application
> >> threads were stopped: 0.0007076 seconds, Stopping threads took:
> 0.0003400
> >> seconds
> >> 2021-06-15T16:27:47.371+0000: 1470.157: Total time for which application
> >> threads were stopped: 0.0020045 seconds, Stopping threads took:
> 0.0014785
> >> seconds
> >> 2021-06-15T16:27:47.444+0000: 1470.229: Total time for which application
> >> threads were stopped: 0.0007924 seconds, Stopping threads took:
> 0.0004075
> >> seconds
> >> CMS: abort preclean due to time 2021-06-15T16:27:47.450+0000: 1470.235:
> >> [CMS-concurrent-abortable-preclean: 0.391/6.004 secs] [Times: user=9.74
> >> sys=0.27, real=6.00 secs]
> >> 2021-06-15T16:27:47.451+0000: 1470.236: [GC (CMS Final Remark) [YG
> >> occupancy: 412669 K (436928 K)]{Heap before GC invocations=92 (full 2):
> >
> >
> > Also I had restarted the service at 14:50 which is almost 2 hours prior
> to
> > this.  I found a similar pattern in this one but involving a somewhat
> > smaller collection:
> >
> > 2021-06-15 15:44:09.698 INFO  (qtp1836643189-60) [   x:archiver]
> >> o.a.s.c.S.Request [fusearchiver]  webapp=/solr path=/replication
> >> params={wt=json&command=details&_=1623771837117} status=0 QTime=8397
> >> 2021-06-15 15:44:09.698 INFO  (qtp1836643189-23) [   x:archiver]
> >> o.a.s.c.S.Request [fusearchiver]  webapp=/solr path=/admin/luke
> >> params={numTerms=0&show=index&wt=json&_=1623771837117} status=0
> QTime=8399
> >> 2021-06-15 15:44:10.035 INFO  (qtp1836643189-24) [   x:archiver]
> >> o.a.s.c.S.Request [fusearchiver]  webapp=/solr path=/select
> >> params={q=*:*&indent=on&fq=_accountId:12&wt=json&_=1623771792912}
> >> hits=1272929 status=0 QTime=50886
> >> 2021-06-15 15:44:10.039 INFO  (qtp1836643189-53) [   x:archiver]
> >> o.a.s.c.S.Request [fusearchiver]  webapp=/solr path=/select
> >> params={q=*:*&indent=on&fq=_accountId:12&wt=json&_=1623771792912}
> >> hits=1272929 status=0 QTime=42102
> >> 2021-06-15 15:44:10.077 INFO  (qtp1836643189-53) [   x:archiver]
> >> o.a.s.s.HttpSolrCall Unable to write response, client closed connection
> or
> >> we are shutting down
> >
> >
> > Again this is preceded by these calls to replication and luke handler
> that
> > have fairly large QTime.
> >
> > I am not using replication, but it is enabled.  From researching the
> > command this is just a status call.  So where could this be triggered
> from?
> >
> > Beyond that strange event what else should I be looking at or changing in
> > my configuration to find these issues?  Memory was around 7gb of the max
> > 8gb.  GC doesn't seem excessive.  The memory is growing, but not spiking.
> > Should I turn up the memory allocation?
> >
> > TIA
> > Charlie
>
