Hi Rob, is there any recommended documentation describing the configuration of the JVM heap and permanent generation? We are stuck in this same situation too. :(
Jason

On Tue, Dec 2, 2014 at 3:42 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Fri, Nov 28, 2014 at 12:55 PM, Paulo Ricardo Motta Gomes <
> paulo.mo...@chaordicsystems.com> wrote:
>
>> We restart the whole cluster every 1 or 2 months to avoid machines
>> getting into this crazy state. We tried tuning GC size and parameters and
>> different Cassandra versions (1.1, 1.2, 2.0), but this behavior keeps
>> happening. More recently, during Black Friday, we received about 5x our
>> normal load, and some machines started presenting this behavior. Once
>> again, we restarted the nodes and the GC behaved normally again.
>> ...
>> You can clearly notice that some memory is actually reclaimed during GC
>> in healthy nodes, while in sick machines very little memory is reclaimed.
>> Also, since GC is executed more frequently in sick machines, it uses
>> about 2x more CPU than non-sick nodes.
>>
>> Have you ever observed this behavior in your cluster? Could this be
>> related to heap fragmentation? Would using the G1 collector help in this
>> case? Any GC tuning or monitoring advice to troubleshoot this issue?
>
> That specific combination of symptoms does in fact sound like being close
> to heap exhaustion from your working set, with fragmentation then putting
> you over the top.
>
> I would probably start by increasing your heap, which will help avoid the
> pre-fail condition from your working set.
>
> But for tuning, examine the contents of each generation when the JVM gets
> into this state. You are probably exhausting the permanent generation,
> but depending on what that shows, you could change the relative sizing of
> the generations.
>
> =Rob
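For anyone wanting to follow Rob's advice to examine each generation, one way is through the standard `java.lang.management` memory-pool beans, which report per-pool usage (eden, survivor, old gen, and perm gen on pre-8 JVMs). A minimal sketch, not Cassandra-specific — pool names vary by collector (e.g. "CMS Old Gen" vs "PS Old Gen"):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class GenerationUsage {
    public static void main(String[] args) {
        // Each memory pool corresponds to a heap generation (eden,
        // survivor, old gen) or a non-heap region (perm gen / metaspace,
        // code cache). A pool that stays near its max across collections
        // is the one to resize.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();
            long usedMb = u.getUsed() / (1024 * 1024);
            // getMax() returns -1 when the pool has no defined limit.
            long maxMb = u.getMax() < 0 ? -1 : u.getMax() / (1024 * 1024);
            System.out.printf("%-25s used=%d MB, max=%d MB%n",
                              pool.getName(), usedMb, maxMb);
        }
    }
}
```

Against a running Cassandra process you can get the same numbers without deploying code by pointing `jstat -gcold <pid>` or `jmap -heap <pid>` at the JVM and watching whether used space actually drops after collections.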