Hi Rob, is there any recommended documentation describing the configuration of the JVM heap and permanent generation? We are stuck in this same situation too. :(
Jason

On Tue, Dec 2, 2014 at 3:42 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Fri, Nov 28, 2014 at 12:55 PM, Paulo Ricardo Motta Gomes <
> paulo.mo...@chaordicsystems.com> wrote:
>
>> We restart the whole cluster every 1 or 2 months to avoid machines
>> getting into this crazy state. We tried tuning GC size and parameters and
>> different Cassandra versions (1.1, 1.2, 2.0), but this behavior keeps
>> happening. More recently, during Black Friday, we received about 5x our
>> normal load, and some machines started presenting this behavior. Once
>> again, we restarted the nodes and the GC behaved normally again.
>> ...
>> You can clearly notice that some memory is actually reclaimed during GC
>> in healthy nodes, while in sick machines very little memory is reclaimed.
>> Also, since GC is executed more frequently in sick machines, it uses
>> about 2x more CPU than non-sick nodes.
>>
>> Have you ever observed this behavior in your cluster? Could this be
>> related to heap fragmentation? Would using the G1 collector help in this
>> case? Any GC tuning or monitoring advice to troubleshoot this issue?
>
> That specific combination of symptoms does in fact sound like being close
> to heap exhaustion from your working set, with fragmentation then putting
> you over the top.
>
> I would probably start by increasing your heap, which will help avoid the
> pre-fail condition from your working set.
>
> But for tuning, examine the contents of each generation when the JVM gets
> into this state. You are probably exhausting the permanent generation,
> but depending on what that shows, you could change the relative sizing of
> the generations.
>
> =Rob
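For anyone wanting to follow Rob's advice to examine each generation, one way is through the standard `java.lang.management` memory-pool beans, which report per-pool usage (eden, survivor, old gen, and perm gen on pre-8 JVMs). A minimal sketch, not Cassandra-specific — pool names vary by collector (e.g. "CMS Old Gen" vs "PS Old Gen"):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class GenerationUsage {
    public static void main(String[] args) {
        // Each memory pool corresponds to a heap generation (eden,
        // survivor, old gen) or a non-heap region (perm gen / metaspace,
        // code cache). A pool that stays near its max across collections
        // is the one to resize.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();
            long usedMb = u.getUsed() / (1024 * 1024);
            // getMax() returns -1 when the pool has no defined limit.
            long maxMb = u.getMax() < 0 ? -1 : u.getMax() / (1024 * 1024);
            System.out.printf("%-25s used=%d MB, max=%d MB%n",
                              pool.getName(), usedMb, maxMb);
        }
    }
}
```

Against a running Cassandra process you can get the same numbers without deploying code by pointing `jstat -gcold <pid>` or `jmap -heap <pid>` at the JVM and watching whether used space actually drops after collections.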