Re: [EXTERNAL] Re: Garbage Collector

2019-03-22 Thread Ahmed Eljami
Thx guys for sharing your experiences with G1.

Since I sent my question about GC, we have updated the Java version: still
CMS on Java 8, moving from u9x to u201. With that change alone, we observed
a 66% improvement (150 ms ==> 50 ms of STW) :)

We are planning a second tuning, this time with G1.

Thanks.

On Tue, Mar 19, 2019 at 7:56 PM Durity, Sean R
wrote:

> My default is G1GC using 50% of available RAM (so typically a minimum of
> 16 GB for the JVM). That has worked in just about every case I’m familiar
> with. In the old days we used CMS, but tuning that beast is a black art
> with few wizards available (though several on this mailing list). Today, I
> just don’t see GC issues – unless there is a bad query in play. For me, the
> data model/query construction is the more fruitful path to achieving
> performance and reliability.
>
>
>
> Sean Durity
>
>
>
> *From:* Jon Haddad 
> *Sent:* Tuesday, March 19, 2019 2:16 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Garbage Collector
>
>
>
> G1 is optimized for high throughput with higher pause times.  It's great
> if you have mixed / unpredictable workloads, and as Elliott mentioned is
> mostly set & forget.
>
>
>
> ZGC requires Java 11, which is only supported on trunk.  I plan on messing
> with it soon, but I haven't had time yet.  We'll share the results on our
> blog (TLP) when we get to it.
>
>
>
> Jon
>
>
>
> On Tue, Mar 19, 2019 at 10:12 AM Elliott Sims 
> wrote:
>
> I use G1, and I think it's actually the default now for newer Cassandra
> versions.  For G1, I've done very little custom config/tuning.  I increased
> heap to 16GB (out of 64GB physical), but most of the rest is at or near
> default.  For the most part, it's been "feed it more RAM, and it works"
> compared to CMS's "lower overhead, works great until it doesn't" and dozens
> of knobs.
>
> I haven't tried ZGC yet, but anecdotally I've heard that it doesn't really
> match or beat G1 quite yet.
>
>
>
> On Tue, Mar 19, 2019 at 9:44 AM Ahmed Eljami 
> wrote:
>
> Hi Folks,
>
>
>
> Does anyone use G1 GC or ZGC in production?
>
>
>
> Can you share your feedback, and the configuration used if possible?
>
>
>
> Thanks.
>
>
>
>
> --
>
> The information in this Internet Email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this Email
> by anyone else is unauthorized. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful. When addressed
> to our clients any opinions or advice contained in this Email are subject
> to the terms and conditions expressed in any applicable governing The Home
> Depot terms of business or client engagement letter. The Home Depot
> disclaims all responsibility and liability for the accuracy and content of
> this attachment and for any damages or losses arising from any
> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
> items of a destructive nature, which may be contained in this attachment
> and shall not be liable for direct, indirect, consequential or special
> damages in connection with this e-mail message or its attachment.
>


-- 
Best regards,

Ahmed ELJAMI


RE: [EXTERNAL] Re: Garbage Collector

2019-03-19 Thread Durity, Sean R
My default is G1GC using 50% of available RAM (so typically a minimum of 16 GB 
for the JVM). That has worked in just about every case I’m familiar with. In 
the old days we used CMS, but tuning that beast is a black art with few wizards 
available (though several on this mailing list). Today, I just don’t see GC 
issues – unless there is a bad query in play. For me, the data model/query 
construction is the more fruitful path to achieving performance and reliability.
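
For what it's worth, here is a minimal sketch of that kind of starting point in jvm.options terms. The flags are standard HotSpot options; the heap size is illustrative for a 32 GB host, not a recommendation for yours:

```
## Illustrative G1 starting point: ~50% of a 32 GB host given to the JVM
-Xms16G
-Xmx16G
-XX:+UseG1GC
-XX:MaxGCPauseMillis=500
```

From there we mostly leave G1 alone and spend the time on the data model instead.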



Sean Durity

From: Jon Haddad 
Sent: Tuesday, March 19, 2019 2:16 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Garbage Collector

G1 is optimized for high throughput with higher pause times.  It's great if you 
have mixed / unpredictable workloads, and as Elliott mentioned is mostly set & 
forget.

ZGC requires Java 11, which is only supported on trunk.  I plan on messing with 
it soon, but I haven't had time yet.  We'll share the results on our blog (TLP) 
when we get to it.

Jon

On Tue, Mar 19, 2019 at 10:12 AM Elliott Sims <elli...@backblaze.com> wrote:
I use G1, and I think it's actually the default now for newer Cassandra 
versions.  For G1, I've done very little custom config/tuning.  I increased 
heap to 16GB (out of 64GB physical), but most of the rest is at or near 
default.  For the most part, it's been "feed it more RAM, and it works" 
compared to CMS's "lower overhead, works great until it doesn't" and dozens of 
knobs.
I haven't tried ZGC yet, but anecdotally I've heard that it doesn't really 
match or beat G1 quite yet.

On Tue, Mar 19, 2019 at 9:44 AM Ahmed Eljami <ahmed.elj...@gmail.com> wrote:
Hi Folks,

Does anyone use G1 GC or ZGC in production?

Can you share your feedback, and the configuration used if possible?

Thanks.






Re: Garbage Collector

2019-03-19 Thread Jon Haddad
G1 is optimized for high throughput with higher pause times.  It's great if
you have mixed / unpredictable workloads, and as Elliott mentioned is
mostly set & forget.

ZGC requires Java 11, which is only supported on trunk.  I plan on messing
with it soon, but I haven't had time yet.  We'll share the results on our
blog (TLP) when we get to it.

Jon

On Tue, Mar 19, 2019 at 10:12 AM Elliott Sims  wrote:

> I use G1, and I think it's actually the default now for newer Cassandra
> versions.  For G1, I've done very little custom config/tuning.  I increased
> heap to 16GB (out of 64GB physical), but most of the rest is at or near
> default.  For the most part, it's been "feed it more RAM, and it works"
> compared to CMS's "lower overhead, works great until it doesn't" and dozens
> of knobs.
>
> I haven't tried ZGC yet, but anecdotally I've heard that it doesn't really
> match or beat G1 quite yet.
>
> On Tue, Mar 19, 2019 at 9:44 AM Ahmed Eljami 
> wrote:
>
>> Hi Folks,
>>
>> Does anyone use G1 GC or ZGC in production?
>>
>> Can you share your feedback, and the configuration used if possible?
>>
>> Thanks.
>>
>>


Re: Garbage Collector

2019-03-19 Thread Elliott Sims
I use G1, and I think it's actually the default now for newer Cassandra
versions.  For G1, I've done very little custom config/tuning.  I increased
heap to 16GB (out of 64GB physical), but most of the rest is at or near
default.  For the most part, it's been "feed it more RAM, and it works"
compared to CMS's "lower overhead, works great until it doesn't" and dozens
of knobs.

I haven't tried ZGC yet, but anecdotally I've heard that it doesn't really
match or beat G1 quite yet.

On Tue, Mar 19, 2019 at 9:44 AM Ahmed Eljami  wrote:

> Hi Folks,
>
> Does anyone use G1 GC or ZGC in production?
>
> Can you share your feedback, and the configuration used if possible?
>
> Thanks.
>
>


Garbage Collector

2019-03-19 Thread Ahmed Eljami
Hi Folks,

Does anyone use G1 GC or ZGC in production?

Can you share your feedback, and the configuration used if possible?

Thanks.


RE: Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread Steinmaurer, Thomas
Hi,

my previously mentioned G1 bug does not seem to be related to your case

Thomas

From: Gustavo Scudeler [mailto:scudel...@gmail.com]
Sent: Monday, 09 October 2017 15:13
To: user@cassandra.apache.org
Subject: Re: Cassandra and G1 Garbage collector stop the world event (STW)

Hello,

@kurt greaves: Have you tried CMS with that sized heap?

Yes, for testing purposes, I have 3 nodes with CMS and 3 with G1.
The behavior is basically the same.


Using CMS suggested settings 
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTAtNDk=

Using G1 suggested settings 
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTExLTE3


@Steinmaurer, Thomas If this happens very frequently within a short time and
depending on your allocation rate in MB/s, a combination of the G1 bug and a
small heap might result in going towards OOM.

We have a really high obj allocation rate:

Avg creation rate: 622.9 MB/sec
Avg promotion rate: 18.39 MB/sec


This could be the cause: the GC can't keep up with this rate.

I'm starting to think it could be some wrong configuration, where Cassandra is
configured in a way that bursts allocations faster than G1 can keep up
with.

Any ideas?

Best regards,


2017-10-09 12:44 GMT+01:00 Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com>:
Hi,

although not happening here with Cassandra (due to using CMS), we had some
weird problems with our server application, e.g. being hit by the following JVM/G1 bugs:
https://bugs.openjdk.java.net/browse/JDK-8140597
https://bugs.openjdk.java.net/browse/JDK-8141402 (more or less  a duplicate of 
above)
https://bugs.openjdk.java.net/browse/JDK-8048556

Especially the first, JDK-8140597, might be interesting if you see periodic
humongous allocations (according to a GC log) resulting in mixed GC phases
being steadily interrupted due to the G1 bug, thus no GC in OLD regions. Humongous
allocations happen if a single allocation is > (region size / 2), if I
remember correctly. I can't recall the default G1 region size for a 12GB heap,
but possibly 4MB. So, in case you are allocating something larger than 2MB,
you might end up with so-called “humongous” allocations, spanning several
G1 regions. If this happens very frequently within a short time, then depending on
your allocation rate in MB/s, a combination of the G1 bug and a small heap
might result in going towards OOM.

Possibly worth a further route for investigation.

Regards,
Thomas

From: Gustavo Scudeler [mailto:scudel...@gmail.com]
Sent: Monday, 09 October 2017 13:12
To: user@cassandra.apache.org
Subject: Cassandra and G1 Garbage collector stop the world event (STW)


Hi guys,

We have a 6 node Cassandra Cluster under heavy utilization. We have been 
dealing a lot with garbage collector stop the world event, which can take up to 
50 seconds in our nodes, in the meantime Cassandra Node is unresponsive, not 
even accepting new logins.

Extra details:
- Cassandra Version: 3.11
- Heap Size = 12 GB
- We are using G1 Garbage Collector with default settings
- Nodes size: 4 CPUs, 28 GB RAM
- All CPU cores are at 100% all the time.
- The G1 GC behavior is the same across all nodes.

The behavior remains basically:
1. Old Gen starts to fill up.
2. GC can't clean it properly without a full GC and a STW event.
3. The full GC starts to take longer, until the node is completely unresponsive.
Extra details and GC reports:
https://stackoverflow.com/questions/46568777/cassandra-and-g1-garbage-collector-stop-the-world-event-stw

Can someone point me to what configurations or events I could check?

Thanks!

Best regards,

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313





Re: Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread Chris Lohfink
Can you share your schema and cfstats? This sounds kinda like a wide
partition, backed up compactions, or tombstone issue for it to create so
much and have issues like that so quickly with those settings.

A heap dump would be most telling but they are rather large and hard to
share.

Chris

On Mon, Oct 9, 2017 at 8:12 AM, Gustavo Scudeler <scudel...@gmail.com>
wrote:

> Hello,
>
> @kurt greaves: Have you tried CMS with that sized heap?
>
>
> Yes, for testing purposes, I have 3 nodes with CMS and 3 with
> G1. The behavior is basically the same.
>
> *Using CMS suggested settings* http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTAtNDk=
>
> *Using G1 suggested settings* http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTExLTE3
>
>
> @Steinmaurer, Thomas If this happens very frequently within a short time and
>> depending on your allocation rate in MB/s, a combination of the G1 bug and
>> a small heap might result in going towards OOM.
>
>
> We have a really high obj allocation rate:
>
> Avg creation rate: 622.9 MB/sec
> Avg promotion rate: 18.39 MB/sec
>
> This could be the cause: the GC can't keep up with this rate.
>
> I'm starting to think it could be some wrong configuration, where Cassandra is
> configured in a way that bursts allocations faster than G1 can keep
> up with.
>
> Any ideas?
>
> Best regards,
>
>
> 2017-10-09 12:44 GMT+01:00 Steinmaurer, Thomas <
> thomas.steinmau...@dynatrace.com>:
>
>> Hi,
>>
>>
>>
>> although not happening here with Cassandra (due to using CMS), we had
>> some weird problem with our server application e.g. hit by the following
>> JVM/G1 bugs:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8140597
>>
>> https://bugs.openjdk.java.net/browse/JDK-8141402 (more or less  a
>> duplicate of above)
>>
>> https://bugs.openjdk.java.net/browse/JDK-8048556
>>
>>
>>
>> Especially the first, JDK-8140597, might be interesting, if you see
>> periodic humongous allocations (according to a GC log) resulting in mixed
>> GC phases being steadily interrupted due to G1 bug, thus no GC in OLD
>> regions. Humongous allocations will happen if a single (?) allocation is >
>> (region size / 2), if I remember correctly. Can’t recall the default G1
>> region size for a 12GB heap, but possibly 4MB. So, in case you are
>> allocating something larger than > 2MB, you might end up in something
>> called “humongous” allocations, spanning several G1 regions. If this
>> happens very frequently within a short time, then depending on your allocation
>> rate in MB/s, a combination of the G1 bug and a small heap might result
>> in going towards OOM.
>>
>>
>>
>> Possibly worth a further route for investigation.
>>
>>
>>
>> Regards,
>>
>> Thomas
>>
>>
>>
>> *From:* Gustavo Scudeler [mailto:scudel...@gmail.com]
>> *Sent:* Monday, 09 October 2017 13:12
>> *To:* user@cassandra.apache.org
>> *Subject:* Cassandra and G1 Garbage collector stop the world event (STW)
>>
>>
>>
>> Hi guys,
>>
>> We have a 6 node Cassandra Cluster under heavy utilization. We have been
>> dealing a lot with garbage collector stop the world event, which can take
>> up to 50 seconds in our nodes, in the meantime Cassandra Node is
>> unresponsive, not even accepting new logins.
>>
>> Extra details:
>>
>> · Cassandra Version: 3.11
>>
>> · Heap Size = 12 GB
>>
>> · We are using G1 Garbage Collector with default settings
>>
>> ·     Nodes size: 4 CPUs 28 GB RAM
>>
>> · All CPU cores are at 100% all the time.
>>
>> · The G1 GC behavior is the same across all nodes.
>>
>> The behavior remains basically:
>>
>> 1.  Old Gen starts to fill up.
>>
>> 2.  GC can't clean it properly without a full GC and a STW event.
>>
>> 3.  The full GC starts to take longer, until the node is completely
>> unresponsive.
>>
>> *Extra details and GC reports:*
>>
>> https://stackoverflow.com/questions/46568777/cassandra-and-
>> g1-garbage-collector-stop-the-world-event-stw
>>
>>
>>
>> Can someone point me to what configurations or events I could check?
>>
>>
>>
>> Thanks!
>>
>>
>>
>> Best regards,
>>
>>
>>
>
>
>
>


Re: Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread Gustavo Scudeler
Hello,

@kurt greaves: Have you tried CMS with that sized heap?


Yes, for testing purposes, I have 3 nodes with CMS and 3 with
G1. The behavior is basically the same.

*Using CMS suggested settings*
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTAtNDk=

*Using G1 suggested settings*
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTExLTE3


@Steinmaurer, Thomas If this happens very frequently within a short time and
> depending on your allocation rate in MB/s, a combination of the G1 bug and
> a small heap might result in going towards OOM.


We have a really high obj allocation rate:

Avg creation rate: 622.9 MB/sec
Avg promotion rate: 18.39 MB/sec

This could be the cause: the GC can't keep up with this rate.

I'm starting to think it could be some wrong configuration, where Cassandra is
configured in a way that bursts allocations faster than G1 can keep
up with.
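
As a rough sanity check on those numbers (the young-generation size below is an assumption for illustration, not a measured value from our nodes):

```python
# Back-of-the-envelope on the rates reported by the GC report above.
creation_mb_s = 622.9   # avg object creation rate
promotion_mb_s = 18.39  # avg promotion rate into old gen

# Assume roughly half of the 12 GB heap ends up as young generation.
young_gen_mb = 6 * 1024
print(round(young_gen_mb / creation_mb_s, 1))  # ~9.9 s between young GCs

# Sustained promotion grows old gen by about 1.1 GB per minute,
# which lines up with "Old Gen starts to fill up".
print(round(promotion_mb_s * 60))  # ~1103 MB promoted per minute
```

So even a modest rise in the promotion rate would fill old gen well before mixed collections can reclaim it.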

Any ideas?

Best regards,


2017-10-09 12:44 GMT+01:00 Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com>:

> Hi,
>
>
>
> although not happening here with Cassandra (due to using CMS), we had some
> weird problem with our server application e.g. hit by the following JVM/G1
> bugs:
>
> https://bugs.openjdk.java.net/browse/JDK-8140597
>
> https://bugs.openjdk.java.net/browse/JDK-8141402 (more or less  a
> duplicate of above)
>
> https://bugs.openjdk.java.net/browse/JDK-8048556
>
>
>
> Especially the first, JDK-8140597, might be interesting, if you see
> periodic humongous allocations (according to a GC log) resulting in mixed
> GC phases being steadily interrupted due to G1 bug, thus no GC in OLD
> regions. Humongous allocations will happen if a single (?) allocation is >
> (region size / 2), if I remember correctly. Can’t recall the default G1
> region size for a 12GB heap, but possibly 4MB. So, in case you are
> allocating something larger than > 2MB, you might end up in something
> called “humongous” allocations, spanning several G1 regions. If this
> happens very frequently within a short time, then depending on your allocation
> rate in MB/s, a combination of the G1 bug and a small heap might result
> in going towards OOM.
>
>
>
> Possibly worth a further route for investigation.
>
>
>
> Regards,
>
> Thomas
>
>
>
> *From:* Gustavo Scudeler [mailto:scudel...@gmail.com]
> *Sent:* Monday, 09 October 2017 13:12
> *To:* user@cassandra.apache.org
> *Subject:* Cassandra and G1 Garbage collector stop the world event (STW)
>
>
>
> Hi guys,
>
> We have a 6 node Cassandra Cluster under heavy utilization. We have been
> dealing a lot with garbage collector stop the world event, which can take
> up to 50 seconds in our nodes, in the meantime Cassandra Node is
> unresponsive, not even accepting new logins.
>
> Extra details:
>
> · Cassandra Version: 3.11
>
> · Heap Size = 12 GB
>
> · We are using G1 Garbage Collector with default settings
>
> · Nodes size: 4 CPUs 28 GB RAM
>
> · All CPU cores are at 100% all the time.
>
> · The G1 GC behavior is the same across all nodes.
>
> The behavior remains basically:
>
> 1.  Old Gen starts to fill up.
>
> 2.  GC can't clean it properly without a full GC and a STW event.
>
> 3.  The full GC starts to take longer, until the node is completely
> unresponsive.
>
> *Extra details and GC reports:*
>
> https://stackoverflow.com/questions/46568777/cassandra-and-g1-garbage-collector-stop-the-world-event-stw
>
>
>
> Can someone point me to what configurations or events I could check?
>
>
>
> Thanks!
>
>
>
> Best regards,
>
>
>


RE: Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread Steinmaurer, Thomas
Hi,

although not happening here with Cassandra (due to using CMS), we had some
weird problems with our server application, e.g. being hit by the following JVM/G1 bugs:
https://bugs.openjdk.java.net/browse/JDK-8140597
https://bugs.openjdk.java.net/browse/JDK-8141402 (more or less  a duplicate of 
above)
https://bugs.openjdk.java.net/browse/JDK-8048556

Especially the first, JDK-8140597, might be interesting if you see periodic
humongous allocations (according to a GC log) resulting in mixed GC phases
being steadily interrupted due to the G1 bug, thus no GC in OLD regions. Humongous
allocations happen if a single allocation is > (region size / 2), if I
remember correctly. I can't recall the default G1 region size for a 12GB heap,
but possibly 4MB. So, in case you are allocating something larger than 2MB,
you might end up with so-called “humongous” allocations, spanning several
G1 regions. If this happens very frequently within a short time, then depending on
your allocation rate in MB/s, a combination of the G1 bug and a small heap
might result in going towards OOM.
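
As a rough illustration of that arithmetic, assuming G1's usual sizing rule (aim for roughly 2048 regions, with the region size a power of two between 1 MB and 32 MB):

```python
# Estimate the default G1 region size and the humongous-allocation threshold.
def g1_region_size_mb(heap_mb):
    target = heap_mb / 2048        # G1 aims for roughly 2048 regions
    size_mb = 1
    while size_mb * 2 <= target and size_mb < 32:
        size_mb *= 2               # power-of-two region size, capped at 32 MB
    return size_mb

region_mb = g1_region_size_mb(12 * 1024)  # the 12 GB heap in question
print(region_mb)       # 4  -> 4 MB regions, matching the guess above
print(region_mb / 2)   # 2.0 -> allocations above ~2 MB count as humongous
```

The GC log (or an explicit -XX:G1HeapRegionSize setting) can confirm the actual region size rather than estimating it.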

Possibly worth a further route for investigation.

Regards,
Thomas

From: Gustavo Scudeler [mailto:scudel...@gmail.com]
Sent: Monday, 09 October 2017 13:12
To: user@cassandra.apache.org
Subject: Cassandra and G1 Garbage collector stop the world event (STW)


Hi guys,

We have a 6 node Cassandra Cluster under heavy utilization. We have been 
dealing a lot with garbage collector stop the world event, which can take up to 
50 seconds in our nodes, in the meantime Cassandra Node is unresponsive, not 
even accepting new logins.

Extra details:
- Cassandra Version: 3.11
- Heap Size = 12 GB
- We are using G1 Garbage Collector with default settings
- Nodes size: 4 CPUs, 28 GB RAM
- All CPU cores are at 100% all the time.
- The G1 GC behavior is the same across all nodes.

The behavior remains basically:
1. Old Gen starts to fill up.
2. GC can't clean it properly without a full GC and a STW event.
3. The full GC starts to take longer, until the node is completely unresponsive.
Extra details and GC reports:
https://stackoverflow.com/questions/46568777/cassandra-and-g1-garbage-collector-stop-the-world-event-stw

Can someone point me to what configurations or events I could check?

Thanks!

Best regards,



Re: Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread kurt greaves
Have you tried CMS with that sized heap? G1 is only really worthwhile with
24 GB+ heap sizes, which wouldn't really make sense on machines with 28 GB of
RAM. In general CMS is found to work better for C*, leaving excess memory
to be utilised by the OS page cache.


Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread Gustavo Scudeler
Hi guys,

We have a 6 node Cassandra Cluster under heavy utilization. We have been
dealing a lot with garbage collector stop the world event, which can take
up to 50 seconds in our nodes, in the meantime Cassandra Node is
unresponsive, not even accepting new logins.

Extra details:

   - Cassandra Version: 3.11
   - Heap Size = 12 GB
   - We are using G1 Garbage Collector with default settings
   - Nodes size: 4 CPUs 28 GB RAM
   - All CPU cores are at 100% all the time.
   - The G1 GC behavior is the same across all nodes.

The behavior remains basically:

   1. Old Gen starts to fill up.
   2. GC can't clean it properly without a full GC and a STW event.
   3. The full GC starts to take longer, until the node is completely
   unresponsive.

*Extra details and GC reports:*
https://stackoverflow.com/questions/46568777/cassandra-and-g1-garbage-collector-stop-the-world-event-stw

Can someone point me to what configurations or events I could check?
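
In case it matters, here is a sketch of the standard Java 8 GC-logging flags that would capture these pauses in detail (the log path is illustrative):

```
-Xloggc:/var/log/cassandra/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M
```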

Thanks!

Best regards,


Re: Garbage collector launched on all nodes at once

2015-06-18 Thread Jonathan Haddad
How much memory do you have?  Recently people have been seeing really great
performance using G1GC with heaps > 8GB and offheap memtable objects.
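
The memtable part is a one-line change in cassandra.yaml on 2.1+; a sketch (heap_buffers is the default, offheap_buffers the intermediate option):

```
# cassandra.yaml (Cassandra 2.1+): move memtable cell objects off the JVM heap
memtable_allocation_type: offheap_objects
```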

On Thu, Jun 18, 2015 at 1:31 AM Jason Wee peich...@gmail.com wrote:

 okay, IIRC memtables have been moved off heap; googled and got this
 http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1
  apparently, there are still some references on heap.

 On Thu, Jun 18, 2015 at 1:11 PM, Marcus Eriksson krum...@gmail.com
 wrote:

 It is probably this: https://issues.apache.org/jira/browse/CASSANDRA-9549

 On Wed, Jun 17, 2015 at 7:37 PM, Michał Łowicki mlowi...@gmail.com
 wrote:

 Looks that memtable heap size is growing on some nodes rapidly (
 https://www.dropbox.com/s/3brloiy3fqang1r/Screenshot%202015-06-17%2019.21.49.png?dl=0).
 Drops are the places when nodes have been restarted.

 On Wed, Jun 17, 2015 at 6:53 PM, Michał Łowicki mlowi...@gmail.com
 wrote:

 Hi,

 Two datacenters with 6 nodes (2.1.6) each. In each DC garbage
 collection is launched at the same time on each node (See [1] for total GC
 duration per 5 seconds). RF is set to 3. Any ideas?

 [1]
 https://www.dropbox.com/s/bsbyew1jxbe3dgo/Screenshot%202015-06-17%2018.49.48.png?dl=0

 --
 BR,
 Michał Łowicki




 --
 BR,
 Michał Łowicki






Re: Garbage collector launched on all nodes at once

2015-06-17 Thread Marcus Eriksson
It is probably this: https://issues.apache.org/jira/browse/CASSANDRA-9549

On Wed, Jun 17, 2015 at 7:37 PM, Michał Łowicki mlowi...@gmail.com wrote:

 Looks that memtable heap size is growing on some nodes rapidly (
 https://www.dropbox.com/s/3brloiy3fqang1r/Screenshot%202015-06-17%2019.21.49.png?dl=0).
 Drops are the places when nodes have been restarted.

 On Wed, Jun 17, 2015 at 6:53 PM, Michał Łowicki mlowi...@gmail.com
 wrote:

 Hi,

 Two datacenters with 6 nodes (2.1.6) each. In each DC garbage collection
 is launched at the same time on each node (See [1] for total GC duration
 per 5 seconds). RF is set to 3. Any ideas?

 [1]
 https://www.dropbox.com/s/bsbyew1jxbe3dgo/Screenshot%202015-06-17%2018.49.48.png?dl=0

 --
 BR,
 Michał Łowicki




 --
 BR,
 Michał Łowicki



Re: Garbage collector launched on all nodes at once

2015-06-17 Thread Michał Łowicki
It looks like the memtable heap size is growing rapidly on some nodes (
https://www.dropbox.com/s/3brloiy3fqang1r/Screenshot%202015-06-17%2019.21.49.png?dl=0).
Drops are the points where nodes were restarted.

On Wed, Jun 17, 2015 at 6:53 PM, Michał Łowicki mlowi...@gmail.com wrote:

 Hi,

 Two datacenters with 6 nodes (2.1.6) each. In each DC garbage collection
 is launched at the same time on each node (See [1] for total GC duration
 per 5 seconds). RF is set to 3. Any ideas?

 [1]
 https://www.dropbox.com/s/bsbyew1jxbe3dgo/Screenshot%202015-06-17%2018.49.48.png?dl=0

 --
 BR,
 Michał Łowicki




-- 
BR,
Michał Łowicki


Re: Garbage collector launched on all nodes at once

2015-06-17 Thread Jason Wee
okay, IIRC memtables have been moved off heap; googled and got this
http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1
 apparently, there are still some references on heap.

On Thu, Jun 18, 2015 at 1:11 PM, Marcus Eriksson krum...@gmail.com wrote:

 It is probably this: https://issues.apache.org/jira/browse/CASSANDRA-9549

 On Wed, Jun 17, 2015 at 7:37 PM, Michał Łowicki mlowi...@gmail.com
 wrote:

 Looks that memtable heap size is growing on some nodes rapidly (
 https://www.dropbox.com/s/3brloiy3fqang1r/Screenshot%202015-06-17%2019.21.49.png?dl=0).
 Drops are the places when nodes have been restarted.

 On Wed, Jun 17, 2015 at 6:53 PM, Michał Łowicki mlowi...@gmail.com
 wrote:

 Hi,

 Two datacenters with 6 nodes (2.1.6) each. In each DC garbage collection
 is launched at the same time on each node (See [1] for total GC duration
 per 5 seconds). RF is set to 3. Any ideas?

 [1]
 https://www.dropbox.com/s/bsbyew1jxbe3dgo/Screenshot%202015-06-17%2018.49.48.png?dl=0

 --
 BR,
 Michał Łowicki




 --
 BR,
 Michał Łowicki





Garbage collector launched on all nodes at once

2015-06-17 Thread Michał Łowicki
Hi,

Two datacenters with 6 nodes (2.1.6) each. In each DC garbage collection is
launched at the same time on each node (See [1] for total GC duration per 5
seconds). RF is set to 3. Any ideas?

[1]
https://www.dropbox.com/s/bsbyew1jxbe3dgo/Screenshot%202015-06-17%2018.49.48.png?dl=0

-- 
BR,
Michał Łowicki


Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-24 Thread Edward Capriolo
Haha, OK.
It is not a total waste, but practically your time is better spent in other
places. The problem is that just about everything is a moving target: schema,
request rate, hardware. Generally tuning nudges a couple of variables in one
direction or the other and you see some decent returns. But each nudge
takes a restart and a warm-up period, and with how Cassandra distributes
requests you likely have to flip several nodes, or all of them, before you
can see the change! By the time you do that it's probably a different day or
week. Essentially, finding out if one setting is better than the other is
like a 3-day test in production.

Before C* I used to deal with this in Tomcat. Once in a while we would get
a dev who had read some article about tuning, something about a new JVM, or a
collector. With bright-eyed enthusiasm they would want to try tuning our
current cluster. They would spend a couple of days, measure something, and say it
was good: lower memory usage. Meanwhile someone else would come to me and
report higher 95th-percentile response times. More short pauses, fewer long pauses,
great taste, less filling.

Most people just want to roflscale their Heroku cloud. Tuning stuff is
sysadmin work, and the cloud has taught us that the cost of sysadmins is a
needless waste of money.

Just kidding !

But I do believe the default cassandra settings are reasonable and
typically I find that most who look at tuning GC usually need more hardware
and actually need to be tuning something somewhere else.

G1 is the perfect example of a time suck. It claims low pause latency for big
heaps, yet delivers something widely regarded by the Cassandra community (and
HBase's as well) as working worse than CMS. If you spent 3 hours switching
tuning knobs and analysing, that is 3 hours of your life you will never get
back.

Better to let SUN and other people worry about tuning (at least from where
I sit)

On Saturday, September 15, 2012, Peter Schuller peter.schul...@infidyne.com
wrote:
 Generally tuning the garbage collector is a waste of time.

 Sorry, that's BS. It can be absolutely critical, when done right, and
 only useless when done wrong. There's a spectrum in between.

 Just follow
 someone else's recommendation and use that.

 No, don't.

 Most recommendations out there are completely useless in the general
 case because someone did some very specific benchmark under very
 specific circumstances and then recommends some particular combination
 of options. In order to understand whether a particular recommendation
 applies to you, you need to know enough about your use-case that I
 suspect you're better off just reading up on the available options and
 figuring things out. Of course, randomly trying various different
 settings to see which seems to work well may be realistic - but you
 lose predictability (in the face of changing patterns of traffic for
 example) if you don't know why it's behaving like it is.

 If you care about GC related behavior you want to understand how the
 application behaves, how the garbage collector behaves, what your
 requirements are, and select settings based on those requirements and
 how the application and GC behavior combine to produce emergent
 behavior. The best GC options may vary *wildly* depending on the
 nature of your cluster and your goals. There are also non-GC settings
 (in the specific case of Cassandra) that affect the interaction with
 the garbage collector, like whether you're using row/key caching, or
 things like the phi convict threshold and/or timeouts. It's very hard
 for anyone to give generalized recommendations. If it weren't,
 Cassandra would ship with The One True set of settings that are always
 the best and there would be no discussion.

 It's very unfortunate that the state of GC in the freely available
 JVMs is at this point, given that there exist known and working
 algorithms (and at least one practical implementation) that avoids it,
 mostly. But, it's the situation we're in. The only way around it that
 I know of if you're on Hotspot, is to have the application behave in
 such a way that it avoids the causes of un-predictable behavior w.r.t.
 GC by being careful about its memory allocation and *retention*
 profile. For the specific case of avoiding *ever* seeing a full gc, it
 gets even more complex.

 --
 / Peter Schuller (@scode, http://worldmodscode.wordpress.com)



Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-24 Thread Peter Schuller
 It is not a total waste, but practically your time is better spent in other
 places. The problem is just about everything is a moving target, schema,
 request rate, hardware. Generally tuning nudges a couple variables in one
 direction or the other and you see some decent returns. But each nudge takes
 a restart and a warm up period, and with how Cassandra distributes requests
 you likely have to flip several nodes or all of them before you can see the
 change! By the time you do that it's probably a different day or week.
 Essentially finding out if one setting is better than the other is like a
 3-day test in production.

 Before c* I used to deal with this in tomcat. Once in a while we would get a
 dev that read some article about tuning, something about a new jvm, or
 collector. With bright eyed enthusiasm they would want to try tuning our
 current cluster. They spend a couple days and measure something and say it
 was good lower memory usage. Meanwhile someone else would come to me and
 say higher 95th response time. More short pauses, fewer long pauses, great
 taste, less filling.

That's why blind blackbox testing isn't the way to go. Understanding
what the application does, what the GC does, and the goals you have in
mind is more fruitful. For example, are you trying to improve p99?
Maybe you want to improve p999 at the cost of worse p99? What about
failure modes (non-happy cases)? Perhaps you don't care about
few-hundred-ms pauses but want to avoid full gc:s? There's lots of
different goals one might have, and workloads.

Testing is key, but only in combination with some directed choice of
what to tweak. Especially since it's hard to test for the
non-happy cases (e.g., a node takes a burst of traffic and starts
promoting everything into old-gen prior to processing a request,
resulting in a death spiral).
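As a rough illustration of the directed measurement being argued for here, tail percentiles can be computed straight from observed GC pause times. This is a hedged sketch: the `percentile` helper and the sample numbers are made up for illustration, not taken from any real GC log.

```python
def percentile(samples, p):
    """Nearest-rank percentile of a list of pause times (milliseconds)."""
    ranked = sorted(samples)
    # Nearest-rank method: pick the sample whose rank covers p% of the data.
    k = max(0, int(round(p / 100.0 * len(ranked))) - 1)
    return ranked[k]

# Hypothetical per-collection pause samples (ms) scraped from GC logs.
pauses = [12, 15, 14, 250, 13, 16, 11, 900, 14, 12]

p50 = percentile(pauses, 50)   # typical pause
p99 = percentile(pauses, 99)   # tail pause, dominated by the rare long stop
```

With these made-up samples, p50 is 14 ms while p99 is 900 ms; that gap between "typical" and "tail" is exactly the kind of goal-dependent trade-off the paragraph above describes.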

 G1 is the perfect example of a time suck. Claims low pause latency for big
 heaps, and delivers something regarded by the Cassandra community (and hbase
 as well) that works worse than CMS. If you spent 3 hours switching tuning
 knobs and analysing, that is 3 hours of your life you will never get back.

This is similar to saying that someone told you to switch to CMS (or,
use some particular flag, etc), you tried it, and it didn't have the
result you expected.

G1 and CMS have different trade-offs. Neither one will consistently
result in better latencies across the board. It's all about the
details.

 Better to let SUN and other people worry about tuning (at least from where I
 sit)

They're not tuning. They are providing very general purpose default
behavior, including things that make *no* sense at all with Cassandra.
For example, the default behavior with CMS is to try to make the
marking phase run as late as possible so that it finishes just prior
to heap exhaustion, in order to optimize for throughput; except
that's not a good idea in many cases because it exacerbates
fragmentation problems in old-gen by pushing usage very high
repeatedly, and it increases the chance of full gc because marking
started too late (even if you don't hit promotion failures due to
fragmentation). Sudden changes in workloads (e.g., compaction kicks
in) also makes it harder for CMS's mark triggering heuristics to work
well.
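For the specific CMS failure mode described above (marking starting too late), the usual countermeasure is to pin the initiating occupancy instead of trusting the JVM's adaptive heuristics. A hedged sketch of the relevant HotSpot flags, in the JVM_OPTS style of cassandra-env.sh; the threshold value is illustrative, not a recommendation:

```shell
# Start the CMS marking cycle once old-gen occupancy crosses a fixed threshold,
# instead of letting adaptive heuristics delay it until near heap exhaustion.
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"  # illustrative value
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"     # disable heuristic trigger
```

Starting marking earlier trades some concurrent-mode CPU overhead for a lower chance of concurrent mode failure and the fragmentation-driven full GCs discussed above.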

As such, the default options for Cassandra use certain settings that
diverge from that of the default behavior of the JVM, because
Cassandra-in-general is much more specific a use-case than the
completely general target audience of the JVM. Similarly, a particular
cluster (with certain workloads/goals/etc) is a yet more specific
use-case than Cassandra-in-general and may be better served by
settings that differ from that of default Cassandra.

But, I certainly agree with this (which I think roughly matches what
you're saying): Don't randomly pick options someone claims are good in
a blog post and expect it to just make things better. If it were that
easy, it would be the default behavior for obvious reasons. The reason
it's not, is likely that it depends on the situation. Further, even if
you do play the lottery and win - if you don't know *why*, how are you
able to extrapolate the behavior of the system with slightly changed
workloads? It's very hard to blackbox-test GC settings, which is
probably why GC tuning can be perceived as a useless game of
whack-a-mole.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-15 Thread Edward Capriolo
Generally tuning the garbage collector is a waste of time. Just follow
someone else's recommendation and use that.

The problem with tuning is that workloads change then you have to tune
again and again. New garbage collectors come out and you have to tune again
and again. Someone at your company reads a blog about some new jvm and its
awesomeness and you tune again and again, cassandra adds off heap caching
you tune again and again.

All this work takes a lot of time and usually results in negligible
returns. Garbage collectors and tuning are not magic bullets.

On Wednesday, September 12, 2012, Peter Schuller 
peter.schul...@infidyne.com wrote:
 Our full gc:s are typically not very frequent. Few days or even weeks
 in between, depending on cluster.

 *PER NODE* that is. On a cluster of hundreds of nodes, that's pretty
 often (and all it takes is a single node).

 --
 / Peter Schuller (@scode, http://worldmodscode.wordpress.com)



Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-15 Thread Peter Schuller
 Generally tuning the garbage collector is a waste of time.

Sorry, that's BS. It can be absolutely critical, when done right, and
only useless when done wrong. There's a spectrum in between.

 Just follow
 someone else's recommendation and use that.

No, don't.

Most recommendations out there are completely useless in the general
case because someone did some very specific benchmark under very
specific circumstances and then recommends some particular combination
of options. In order to understand whether a particular recommendation
applies to you, you need to know enough about your use-case that I
suspect you're better off just reading up on the available options and
figuring things out. Of course, randomly trying various different
settings to see which seems to work well may be realistic - but you
lose predictability (in the face of changing patterns of traffic for
example) if you don't know why it's behaving like it is.

If you care about GC related behavior you want to understand how the
application behaves, how the garbage collector behaves, what your
requirements are, and select settings based on those requirements and
how the application and GC behavior combine to produce emergent
behavior. The best GC options may vary *wildly* depending on the
nature of your cluster and your goals. There are also non-GC settings
(in the specific case of Cassandra) that affect the interaction with
the garbage collector, like whether you're using row/key caching, or
things like the phi convict threshold and/or timeouts. It's very hard
for anyone to give generalized recommendations. If it weren't,
Cassandra would ship with The One True set of settings that are always
the best and there would be no discussion.

It's very unfortunate that the state of GC in the freely available
JVMs is at this point, given that there exist known and working
algorithms (and at least one practical implementation) that avoids it,
mostly. But, it's the situation we're in. The only way around it that
I know of if you're on Hotspot, is to have the application behave in
such a way that it avoids the causes of un-predictable behavior w.r.t.
GC by being careful about its memory allocation and *retention*
profile. For the specific case of avoiding *ever* seeing a full gc, it
gets even more complex.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-12 Thread Peter Schuller
 Relatedly, I'd love to learn how to reliably reproduce full GC pauses
 on C* 1.1+.

Our full gc:s are typically not very frequent. Few days or even weeks
in between, depending on cluster. But it happens on several clusters;
I'm guessing most (but I haven't done a systematic analysis). The only
question is how often. But given the lack of handling of such failure
modes, the effect on clients is huge. We recommend data reads by default
to mitigate this and a slew of other sources of problems (and for
counter increments, we're rolling out least-active-request routing).

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-12 Thread Peter Schuller
 I was able to run IBM Java 7 with Cassandra (could not do it with 1.6
 because of snappy). It has a new Garbage collection policy (called balanced)
 that is good for very large heap size (over 8 GB), documented here that is
 so promising with Cassandra. I have not tried it but I like to see how it is
 in action.

FWIW, J9's balanced collector is very similar to G1 in its design.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-12 Thread Peter Schuller
 Our full gc:s are typically not very frequent. Few days or even weeks
 in between, depending on cluster.

*PER NODE* that is. On a cluster of hundreds of nodes, that's pretty
often (and all it takes is a single node).
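The per-node vs. cluster-wide point can be made concrete with a bit of arithmetic, under the simplifying (and hypothetical) assumption that nodes hit full GCs independently and at a steady per-node rate:

```python
def cluster_full_gc_interval_days(per_node_interval_days, node_count):
    """Expected time between full GCs *somewhere* in the cluster, assuming
    each node independently hits one full GC per per_node_interval_days."""
    return per_node_interval_days / node_count

# One full GC per node per week, on a hypothetical 200-node cluster:
interval_days = cluster_full_gc_interval_days(7.0, 200)
interval_minutes = interval_days * 24 * 60  # roughly 50 minutes between events
```

So a pause that is rare per node ("few days or even weeks in between") becomes, cluster-wide, something a client can plausibly hit many times a day.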

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-11 Thread Jonathan Ellis
Relatedly, I'd love to learn how to reliably reproduce full GC pauses
on C* 1.1+.

On Mon, Sep 10, 2012 at 12:37 PM, Oleg Dulin oleg.du...@gmail.com wrote:
 I am currently profiling a Cassandra 1.1.1 set up using G1 and JVM 7.

 It is my feeble attempt to reduce Full GC pauses.

 Has anyone had any experience with this ? Anyone tried it ?

 --
 Regards,
 Oleg Dulin
 NYC Java Big Data Engineer
 http://www.olegdulin.com/





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-11 Thread Shahryar Sedghi
I was able to run IBM Java 7 with Cassandra (could not do it with 1.6
because of snappy). It has a new garbage collection policy (called
balanced) that is good for very large heap sizes (over 8 GB), documented here:
http://www.ibm.com/developerworks/websphere/techjournal/1108_sciampacone/1108_sciampacone.html
It looks very promising for Cassandra. I have not tried it, but I would like
to see how it works in action.

Regards

Shahryar

On Mon, Sep 10, 2012 at 1:37 PM, Oleg Dulin oleg.du...@gmail.com wrote:

 I am currently profiling a Cassandra 1.1.1 set up using G1 and JVM 7.

 It is my feeble attempt to reduce Full GC pauses.

 Has anyone had any experience with this ? Anyone tried it ?

 --
 Regards,
 Oleg Dulin
 NYC Java Big Data Engineer
 http://www.olegdulin.com/





JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-10 Thread Oleg Dulin

I am currently profiling a Cassandra 1.1.1 set up using G1 and JVM 7.

It is my feeble attempt to reduce Full GC pauses.

Has anyone had any experience with this ? Anyone tried it ?

--
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/




Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-10 Thread Peter Schuller
 I am currently profiling a Cassandra 1.1.1 set up using G1 and JVM 7.

 It is my feeble attempt to reduce Full GC pauses.

 Has anyone had any experience with this ? Anyone tried it ?

Have tried; for some workloads it's looking promising. This is without
key cache and row cache and with a pretty large young gen.

The main thing you'll want to look for is whether your post-mixed-mode
collection heap usage remains stable or keeps growing. The main issue
with G1 that causes fallbacks to full GC is regions becoming
effectively uncollectable due to high remembered set scanning costs
(driven by inter-region pointers). If you can avoid that, one might
hope to avoid full gc:s altogether.
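One hedged way to watch for exactly this (heap occupancy after mixed collections creeping upward) is to enable GC logging and track the post-collection occupancy over time. The flag names below are for the JDK 7-era HotSpot discussed in this thread; later JDKs replace most of them with -Xlog:gc*, and the log path is illustrative:

```shell
# JDK 7-era G1 logging flags (illustrative; verify against your JVM version).
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
# In the log, compare the "Heap: ... -> ..." occupancy after successive
# "(mixed)" pauses; a steadily rising floor suggests regions that have become
# too expensive to collect, the precursor to a full GC fallback.
```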

The jury is still out on my side; but like I said, I've seen promising
indications.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)