cc mailinglist

Hello,
I thought that would come to your mind, but do not worry: the heap averages 55 % all day long, there is very little garbage collection going on, and when there is, it is the eden space that gets collected. If you really want, I can send such a file when the problem occurs again, but even at those moments GC is minimal and the heap stays at about 55 - 60 %, peaking only every 15 minutes when documents are indexed.

Thanks,
Markus

-----Original message-----
> From:Shawn Heisey <apa...@elyograg.org>
> Sent: Wednesday 19th July 2017 16:08
> To: Markus Jelsma <markus.jel...@openindex.io>
> Subject: Re: 6.6 cloud starting to eat CPU after 8+ hours
>
> On 7/19/2017 3:35 AM, Markus Jelsma wrote:
> > Another peculiarity here, our six node (2 shards / 3 replica's) cluster is
> > going crazy after a good part of the day has passed. It starts eating CPU
> > for no good reason and its latency goes up. Grafana graphs show the problem
> > really well
> >
> > After restarting 2/6 nodes, there is also quite a distinction in the
> > VisualVM monitor views, and the VisualVM CPU sampler reports (sorted on
> > self time (CPU)). The busy nodes are deeply red in
> > o.a.h.impl.io.AbstractSessionInputBuffer.fillBuffer (as usual), the
> > restarted nodes are not.
> >
> > The real distinction between busy and calm nodes is that busy nodes all
> > have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms() as
> > second to fillBuffer(), what are they doing?! Why? The calm nodes don't
> > show this at all. Busy nodes all have o.a.l.codec stuff on top, restarted
> > nodes don't.
> >
> > So, actually, i don't have a clue! Any, any ideas?
> >
> > Thanks,
> > Markus
> >
> > Each replica is underpowered but performing really well after restart (and
> > JVM warmup), 4 CPU's, 900M heap, 8 GB RAM, maxDoc 2.8 million, index size
> > 18 GB.
>
> A 900MB heap seems very small for an 18GB index with millions of
> documents. The first thing I would suspect is that the heap is running
> very near the maximum and the JVM is spending a lot of time doing
> garbage collection. Can you share the gc.log file from an instance that
> is running the high CPU so this can be checked? I'd also be interested
> in seeing solrconfig.xml.
>
> Thanks,
> Shawn