RE: 6.6 cloud starting to eat CPU after 8+ hours

Markus Jelsma Thu, 20 Jul 2017 05:52:36 -0700

cc mailing list

Hello,


I thought that would come to mind, but do not worry: the heap averages 55% 
all day long, there is very little garbage collection going on, and when there 
is, it is the eden space that gets collected. If you really want, I can send a 
gc.log file when the problem occurs again, but even at those moments GC is 
minimal and the heap stays at about 55-60%, peaking only every 15 minutes when 
documents are indexed.
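
For anyone who wants to check the same two numbers (heap occupancy and time 
spent in GC) without VisualVM, a minimal sketch using the standard 
java.lang.management beans could look like the following. The class name 
HeapGcSnapshot and its method names are hypothetical and not part of Solr, and 
the code only reports on the JVM it runs in, so looking at a Solr node this way 
would assume the same beans are read over a remote JMX connection instead.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, for illustration only: reports heap occupancy and
// cumulative GC time for the JVM this code is running in.
public class HeapGcSnapshot {

    /** Percentage of the maximum heap in use, or -1 if no maximum is defined. */
    static double heapUsedPercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the maximum is undefined
        if (max <= 0) {
            return -1.0;
        }
        return 100.0 * heap.getUsed() / max;
    }

    /** Approximate total time in milliseconds spent in GC since JVM start. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.printf("heap used: %.1f%%, total GC time: %d ms%n",
                heapUsedPercent(), totalGcMillis());
    }
}
```

Sampling these values at intervals on a busy node and on a freshly restarted 
node would show whether GC time actually grows along with the CPU usage.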

Thanks,
Markus
 
-----Original message-----
> From:Shawn Heisey <apa...@elyograg.org>
> Sent: Wednesday 19th July 2017 16:08
> To: Markus Jelsma <markus.jel...@openindex.io>
> Subject: Re: 6.6 cloud starting to eat CPU after 8+ hours
> 
> On 7/19/2017 3:35 AM, Markus Jelsma wrote:
> > Another peculiarity here, our six node (2 shards / 3 replicas) cluster is 
> > going crazy after a good part of the day has passed. It starts eating CPU 
> > for no good reason and its latency goes up. Grafana graphs show the problem 
> > really well.
> >
> > After restarting 2/6 nodes, there is also quite a distinction in the 
> > VisualVM monitor views, and the VisualVM CPU sampler reports (sorted on 
> > self time (CPU)). The busy nodes are deeply red in 
> > o.a.h.impl.io.AbstractSessionInputBuffer.fillBuffer (as usual), the 
> > restarted nodes are not.
> >
> > The real distinction between busy and calm nodes is that busy nodes all 
> > have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms() as 
> > second to fillBuffer(). What are they doing?! Why? The calm nodes don't 
> > show this at all. Busy nodes all have o.a.l.codec stuff on top, restarted 
> > nodes don't.
> >
> > So, actually, I don't have a clue! Any, any ideas? 
> >
> > Thanks,
> > Markus
> >
> > Each replica is underpowered but performing really well after restart (and 
> > JVM warmup): 4 CPUs, 900M heap, 8 GB RAM, maxDoc 2.8 million, index size 
> > 18 GB.
> 
> A 900MB heap seems very small for an 18GB index with millions of
> documents.  The first thing I would suspect is that the heap is running
> very near the maximum and the JVM is spending a lot of time doing
> garbage collection.  Can you share the gc.log file from an instance that
> is showing the high CPU so this can be checked?  I'd also be interested
> in seeing solrconfig.xml.
> 
> Thanks,
> Shawn
> 
>
