Hi Rahul,

Thank you for your response.  Unfortunately, the spike happened yesterday,
and I didn't capture that data at the time, but this is very good information
to have in case I see it again.

I should add that at 14:50 I had rebooted the server because it had crashed
from an OOM condition, which now makes me think I just need to crank up the
Xmx heap setting on the server.  But I wanted to rule out anything else going
on before I raise the memory and move on.
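
If I do end up raising it, my understanding is that it's just a matter of
bumping SOLR_HEAP in solr.in.sh (or passing -m when starting Solr), roughly
like this, with 12g only as a placeholder value:

   # in solr.in.sh (example value, not a recommendation)
   SOLR_HEAP="12g"

   # or when restarting by hand
   bin/solr restart -m 12g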

Charlie

On Wed, Jun 16, 2021 at 11:48 AM Rahul Goswami <rahul196...@gmail.com>
wrote:

> Forgot to mention, you need to collect the output while you observe the CPU
> spike.
>
> On Wed, Jun 16, 2021 at 11:38 AM Rahul Goswami <rahul196...@gmail.com>
> wrote:
>
> > Also, can you please share the output of the below REST call from the
> > browser (any non-IE browser)?
> >
> > localhost:8983/solr/admin/info/threads
> >
> > Please replace the Solr port in the above call as per your environment.
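> >
> > To capture it while the spike is happening, something along these lines
> > should work (a sketch; adjust the host/port, and the output file name is
> > just an example):
> >
> >   curl -o solr-threads.json "http://localhost:8983/solr/admin/info/threads?wt=json"
> >
> > If the JDK tools are on the box, running jstack against the Solr process
> > (e.g. jstack PID > threads.txt, where PID is the Solr java process id) is
> > an alternative way to grab a JVM thread dump.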
> >
> > - Rahul
> >
> > On Wed, Jun 16, 2021 at 11:22 AM Dave <hastings.recurs...@gmail.com>
> > wrote:
> >
> >> Out of curiosity, do you have any suggesters configured?  They tend to do
> >> hard lockups if they need to rebuild.  Not saying that’s what’s going on,
> >> but I’ve definitely had issues with them on restarting the servers and
> >> freezing the entire thing until they were done.
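> >>
> >> If you're not sure whether any are configured, the Config API should show
> >> them; a rough sketch (swap in your own collection name and port), looking
> >> for solr.SuggestComponent and any buildOnStartup/buildOnCommit settings:
> >>
> >>   curl "http://localhost:8983/solr/yourCollection/config/searchComponent"
> >>   curl "http://localhost:8983/solr/yourCollection/config/requestHandler"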
> >>
> >> > On Jun 16, 2021, at 11:15 AM, Charlie Hubbard <charlie.hubb...@gmail.com> wrote:
> >> >
> >> > Hi,
> >> >
> >> > I had an event recently where my Solr server's CPU spiked and it stopped
> >> > answering incoming requests in a timely manner.  I'm having a hard time
> >> > pinpointing the cause.  I'm reviewing the log files, but I can't explain
> >> > why the CPU suddenly spiked.  Once this happened, all of the queries
> >> > started to back up.  There don't appear to have been many queries coming
> >> > in when this happened, but since queries are logged after they complete,
> >> > it's hard to know for sure.
> >> >
> >> > My setup is:
> >> >
> >> >   - Single Node standalone server
> >> >   - OpenJDK 1.8
> >> >   - Solr 6.6.6
> >> >   - No replication configured
> >> >   - 4 collections ranging from 200 MB to 20 GB
> >> >   - 112 million docs in the collection in question
> >> >
> >> > Here is the section in the logs where the queries started to pile up.
> >> > You can see the QTimes start to go way up:
> >> >
> >> >> 2021-06-15 16:27:35.647 INFO  (qtp1836643189-506) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=60736
> >> >> 2021-06-15 16:27:35.647 INFO  (qtp1836643189-483) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=132802
> >> >> 2021-06-15 16:27:35.647 INFO  (qtp1836643189-474) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=154802
> >> >> 2021-06-15 16:27:35.647 INFO  (qtp1836643189-467) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=121797
> >> >> 2021-06-15 16:27:35.648 INFO  (qtp1836643189-459) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=165804
> >> >> 2021-06-15 16:27:35.648 INFO  (qtp1836643189-487) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623773588322} status=0 QTime=110803
> >> >> 2021-06-15 16:27:35.648 INFO  (qtp1836643189-497) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623773588322} status=0 QTime=798
> >> >> 2021-06-15 16:27:35.647 INFO  (qtp1836643189-495) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=99798
> >> >> 2021-06-15 16:27:35.650 INFO  (qtp1836643189-448) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=176807
> >> >> 2021-06-15 16:27:35.650 INFO  (qtp1836643189-464) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623773588322} status=0 QTime=176809
> >> >> 2021-06-15 16:27:35.650 INFO  (qtp1836643189-479) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623773588322} status=0 QTime=99804
> >> >> 2021-06-15 16:27:35.650 INFO  (qtp1836643189-445) [   x:customer] o.a.s.c.S.Request [customer]  webapp=/solr path=/replication params={wt=json&command=details&_=1623773588323} status=0 QTime=187795
> >> >
> >> >
> >> > That burst of /replication and /admin/luke requests lasted about a
> >> > second, but there are 155 entries like that repeated over and over.
> >> > Prior to this I can't see any evidence of queries that might have slowed
> >> > things down.  There was some updating happening on the smaller
> >> > collection, but not on the collection in question.
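> >> >
> >> > In case anyone wants to reproduce the count, something like this over
> >> > solr.log is roughly what it takes (server/logs/solr.log is just the
> >> > default location; adjust for your install):
> >> >
> >> >   grep -c "command=details" server/logs/solr.log
> >> >   grep "path=/admin/luke" server/logs/solr.log | grep -o "QTime=[0-9]*" | sort -t= -k2 -n | tail -5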
> >> >
> >> > From the GC logs I did find some overlap, with thread stoppages taking
> >> > place around and leading up to that point in time:
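> >> >
> >> > A rough way to total up the stoppages around that window (the awk field
> >> > position assumes this exact safepoint log format, and
> >> > server/logs/solr_gc.log is just the default location):
> >> >
> >> >   grep "application threads were stopped" server/logs/solr_gc.log | awk '{sum += $11} END {printf "%.2f s stopped across %d safepoints\n", sum, NR}'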
> >> >
> >> >> Heap after GC invocations=92 (full 1):
> >> >> par new generation   total 436928K, used 86155K [0x0000000540000000, 0x0000000560000000, 0x00000005e0000000)
> >> >>  eden space 349568K,   0% used [0x0000000540000000, 0x0000000540000000, 0x0000000555560000)
> >> >>  from space 87360K,  98% used [0x0000000555560000, 0x000000055a982ef8, 0x000000055aab0000)
> >> >>  to   space 87360K,   0% used [0x000000055aab0000, 0x000000055aab0000, 0x0000000560000000)
> >> >> concurrent mark-sweep generation total 1572864K, used 681710K [0x00000005e0000000, 0x0000000640000000, 0x00000007c0000000)
> >> >> Metaspace       used 41348K, capacity 42508K, committed 42772K, reserved 1087488K
> >> >>  class space    used 4812K, capacity 5121K, committed 5140K, reserved 1048576K
> >> >> }
> >> >> 2021-06-15T16:27:39.384+0000: 1462.170: Total time for which application threads were stopped: 0.0653007 seconds, Stopping threads took: 0.0019805 seconds
> >> >> 2021-06-15T16:27:40.389+0000: 1463.174: Total time for which application threads were stopped: 0.0041873 seconds, Stopping threads took: 0.0037697 seconds
> >> >> 2021-06-15T16:27:40.921+0000: 1463.706: Total time for which application threads were stopped: 0.0029477 seconds, Stopping threads took: 0.0024678 seconds
> >> >> 2021-06-15T16:27:41.268+0000: 1464.053: Total time for which application threads were stopped: 0.0224393 seconds, Stopping threads took: 0.0219226 seconds
> >> >> 2021-06-15T16:27:41.388+0000: 1464.173: [GC (CMS Initial Mark) [1 CMS-initial-mark: 788750K(1572864K)] 1052906K(2009792K), 0.0022256 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
> >> >> 2021-06-15T16:27:41.390+0000: 1464.176: Total time for which application threads were stopped: 0.0060337 seconds, Stopping threads took: 0.0032933 seconds
> >> >> 2021-06-15T16:27:41.390+0000: 1464.176: [CMS-concurrent-mark-start]
> >> >> 2021-06-15T16:27:41.440+0000: 1464.225: [CMS-concurrent-mark: 0.050/0.050 secs] [Times: user=0.39 sys=0.00, real=0.05 secs]
> >> >> 2021-06-15T16:27:41.440+0000: 1464.225: [CMS-concurrent-preclean-start]
> >> >> 2021-06-15T16:27:41.445+0000: 1464.231: [CMS-concurrent-preclean: 0.005/0.005 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
> >> >> 2021-06-15T16:27:41.445+0000: 1464.231: [CMS-concurrent-abortable-preclean-start]
> >> >> 2021-06-15T16:27:41.531+0000: 1464.316: Total time for which application threads were stopped: 0.0018167 seconds, Stopping threads took: 0.0012697 seconds
> >> >> 2021-06-15T16:27:41.532+0000: 1464.317: Total time for which application threads were stopped: 0.0009634 seconds, Stopping threads took: 0.0005877 seconds
> >> >> 2021-06-15T16:27:41.533+0000: 1464.318: Total time for which application threads were stopped: 0.0006194 seconds, Stopping threads took: 0.0003124 seconds
> >> >> 2021-06-15T16:27:41.534+0000: 1464.319: Total time for which application threads were stopped: 0.0006736 seconds, Stopping threads took: 0.0003113 seconds
> >> >> 2021-06-15T16:27:41.735+0000: 1464.521: Total time for which application threads were stopped: 0.0009390 seconds, Stopping threads took: 0.0005340 seconds
> >> >> 2021-06-15T16:27:42.213+0000: 1464.998: Total time for which application threads were stopped: 0.0016779 seconds, Stopping threads took: 0.0003837 seconds
> >> >> 2021-06-15T16:27:42.214+0000: 1464.999: Total time for which application threads were stopped: 0.0006721 seconds, Stopping threads took: 0.0002727 seconds
> >> >> 2021-06-15T16:27:42.215+0000: 1465.000: Total time for which application threads were stopped: 0.0003734 seconds, Stopping threads took: 0.0000991 seconds
> >> >> 2021-06-15T16:27:42.217+0000: 1465.002: Total time for which application threads were stopped: 0.0004387 seconds, Stopping threads took: 0.0001799 seconds
> >> >> 2021-06-15T16:27:42.219+0000: 1465.004: Total time for which application threads were stopped: 0.0004952 seconds, Stopping threads took: 0.0002235 seconds
> >> >> 2021-06-15T16:27:42.837+0000: 1465.622: Total time for which application threads were stopped: 0.0016946 seconds, Stopping threads took: 0.0013149 seconds
> >> >> 2021-06-15T16:27:43.433+0000: 1466.218: Total time for which application threads were stopped: 0.0008748 seconds, Stopping threads took: 0.0004336 seconds
> >> >> 2021-06-15T16:27:44.337+0000: 1467.122: Total time for which application threads were stopped: 0.0624611 seconds, Stopping threads took: 0.0610425 seconds
> >> >> 2021-06-15T16:27:45.458+0000: 1468.243: Total time for which application threads were stopped: 0.0329742 seconds, Stopping threads took: 0.0324816 seconds
> >> >> 2021-06-15T16:27:46.182+0000: 1468.967: Total time for which application threads were stopped: 0.0010628 seconds, Stopping threads took: 0.0006594 seconds
> >> >> 2021-06-15T16:27:46.241+0000: 1469.026: Total time for which application threads were stopped: 0.0583459 seconds, Stopping threads took: 0.0579144 seconds
> >> >> 2021-06-15T16:27:47.308+0000: 1470.093: Total time for which application threads were stopped: 0.0007076 seconds, Stopping threads took: 0.0003400 seconds
> >> >> 2021-06-15T16:27:47.371+0000: 1470.157: Total time for which application threads were stopped: 0.0020045 seconds, Stopping threads took: 0.0014785 seconds
> >> >> 2021-06-15T16:27:47.444+0000: 1470.229: Total time for which application threads were stopped: 0.0007924 seconds, Stopping threads took: 0.0004075 seconds
> >> >> CMS: abort preclean due to time 2021-06-15T16:27:47.450+0000: 1470.235: [CMS-concurrent-abortable-preclean: 0.391/6.004 secs] [Times: user=9.74 sys=0.27, real=6.00 secs]
> >> >> 2021-06-15T16:27:47.451+0000: 1470.236: [GC (CMS Final Remark) [YG occupancy: 412669 K (436928 K)]{Heap before GC invocations=92 (full 2):
> >> >
> >> >
> >> > Also, I had restarted the service at 14:50, which is almost 2 hours
> >> > prior to this.  I found a similar pattern there, but involving a somewhat
> >> > smaller collection:
> >> >
> >> >> 2021-06-15 15:44:09.698 INFO  (qtp1836643189-60) [   x:archiver] o.a.s.c.S.Request [fusearchiver]  webapp=/solr path=/replication params={wt=json&command=details&_=1623771837117} status=0 QTime=8397
> >> >> 2021-06-15 15:44:09.698 INFO  (qtp1836643189-23) [   x:archiver] o.a.s.c.S.Request [fusearchiver]  webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1623771837117} status=0 QTime=8399
> >> >> 2021-06-15 15:44:10.035 INFO  (qtp1836643189-24) [   x:archiver] o.a.s.c.S.Request [fusearchiver]  webapp=/solr path=/select params={q=*:*&indent=on&fq=_accountId:12&wt=json&_=1623771792912} hits=1272929 status=0 QTime=50886
> >> >> 2021-06-15 15:44:10.039 INFO  (qtp1836643189-53) [   x:archiver] o.a.s.c.S.Request [fusearchiver]  webapp=/solr path=/select params={q=*:*&indent=on&fq=_accountId:12&wt=json&_=1623771792912} hits=1272929 status=0 QTime=42102
> >> >> 2021-06-15 15:44:10.077 INFO  (qtp1836643189-53) [   x:archiver] o.a.s.s.HttpSolrCall Unable to write response, client closed connection or we are shutting down
> >> >
> >> >
> >> > Again, this is preceded by calls to the replication and luke handlers
> >> > with fairly large QTimes.
> >> >
> >> > I am not using replication, but the handler is enabled.  From researching
> >> > the command, details is just a status call.  So where could these calls
> >> > be triggered from?
> >> >
> >> > Beyond that strange event, what else should I be looking at or changing
> >> > in my configuration to track down these issues?  Heap usage was around
> >> > 7 GB of the 8 GB max.  GC doesn't seem excessive.  The memory is growing,
> >> > but not spiking.  Should I turn up the memory allocation?
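> >> >
> >> > If it helps, I can also start sampling heap numbers over time; I believe
> >> > the metrics endpoint exposes them directly (a sketch, assuming the
> >> > default port):
> >> >
> >> >   curl "http://localhost:8983/solr/admin/metrics?group=jvm&prefix=memory.heap&wt=json"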
> >> >
> >> > TIA
> >> > Charlie
> >>
> >
>
