[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2020-11-18 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Resolution: Duplicate
Status: Resolved  (was: Open)

Duplicate of CASSANDRA-16201

> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Thomas Steinmaurer
>Priority: Urgent
> Attachments: cassandra2118_vs_3014.jpg, cassandra3014_jfr_5min.jpg, 
> cassandra_3.11.0_min_memory_utilization.jpg
>
>
> In short: after upgrading to 3.0.14 (from 2.1.18), we are no longer able to 
> process the same incoming write load on the same infrastructure.
> We have a loadtest environment running 24x7, testing our software with 
> Cassandra as the backend. Both loadtest and production are hosted in AWS and 
> have the same spec on the Cassandra side, namely (per node):
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> * Client requests are entirely CQL
> We have had a solid, constant baseline in loadtest of ~60% average cluster 
> CPU under constant, simulated load, and have been running Cassandra 2.1 this 
> way for more than 2 years now.
> Recently we started to upgrade this 9-node loadtest environment to 3.0.14, 
> and basically 3.0.14 cannot cope with the load anymore. There are no special 
> tweaks or memory settings/changes; everything is configured as in 2.1.18. We 
> also have not upgraded sstables yet, so the increase shown in the screenshot 
> is not related to any manually triggered maintenance operation after 
> upgrading to 3.0.14.
> According to our monitoring, with 3.0.14 we see a *GC suspension time 
> increase by a factor of > 2*, directly correlating with a CPU increase to > 80%. 
> See the attached screenshot "cassandra2118_vs_3014.jpg".
> This all means that the incoming load 2.1.18 handled is something 3.0.14 
> cannot handle. We would need to either scale up (e.g. m4.xlarge => 
> m4.2xlarge) or scale out to handle the same load, which is not an option 
> cost-wise.
> Unfortunately I do not have Java Flight Recorder runs for 2.1.18 at the 
> mentioned load, but I can provide a JFR session for our current 3.0.14 setup. 
> The attached 5-minute JFR memory allocation view (cassandra3014_jfr_5min.jpg) 
> shows compaction as the top contributor for the captured 5-minute time frame. 
> It may be coincidence that compaction dominates this particular 5-minute 
> window (the mentioned simulated client load was running at the time), but 
> according to the stack traces we see classes new in 3.0, e.g. 
> BTreeRow.searchIterator(), popping up as top contributors, so the new 
> classes / data structures are possibly causing much more object churn now.
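The "GC suspension" figure referenced above is a monitoring-derived metric rather than something Cassandra reports directly. A minimal sketch of how such a number can be approximated from the JVM's standard GarbageCollectorMXBeans (for the CMS setup described, typically the "ParNew" and "ConcurrentMarkSweep" beans) follows; the class name, sampling interval and output format are illustrative assumptions, not the reporter's actual tooling.

{code:java}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/**
 * Illustrative sketch only: approximates "GC suspension" as the share of
 * wall-clock time the JVM spent in garbage collection over each sampling
 * interval, using the standard GarbageCollectorMXBeans.
 */
public class GcSuspensionSampler {

    public static void main(String[] args) throws InterruptedException {
        long intervalMs = 10_000;                 // sampling interval (assumed)
        long previousGcMs = totalGcMillis();

        while (true) {
            Thread.sleep(intervalMs);
            long currentGcMs = totalGcMillis();
            double suspensionPct = 100.0 * (currentGcMs - previousGcMs) / intervalMs;
            System.out.printf("GC suspension over last %ds: %.2f%%%n",
                              intervalMs / 1000, suspensionPct);
            previousGcMs = currentGcMs;
        }
    }

    /** Sum of accumulated collection time across all registered collectors. */
    private static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();      // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }
}
{code}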



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2018-11-18 Thread C. Scott Andreas (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

C. Scott Andreas updated CASSANDRA-13900:
-
Component/s: Core

> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2118_vs_3014.jpg, cassandra3014_jfr_5min.jpg, 
> cassandra_3.11.0_min_memory_utilization.jpg
>
>
> In short: after upgrading to 3.0.14 (from 2.1.18), we are no longer able to 
> process the same incoming write load on the same infrastructure.
> We have a loadtest environment running 24x7, testing our software with 
> Cassandra as the backend. Both loadtest and production are hosted in AWS and 
> have the same spec on the Cassandra side, namely (per node):
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> * Client requests are entirely CQL
> We have had a solid, constant baseline in loadtest of ~60% average cluster 
> CPU under constant, simulated load, and have been running Cassandra 2.1 this 
> way for more than 2 years now.
> Recently we started to upgrade this 9-node loadtest environment to 3.0.14, 
> and basically 3.0.14 cannot cope with the load anymore. There are no special 
> tweaks or memory settings/changes; everything is configured as in 2.1.18. We 
> also have not upgraded sstables yet, so the increase shown in the screenshot 
> is not related to any manually triggered maintenance operation after 
> upgrading to 3.0.14.
> According to our monitoring, with 3.0.14 we see a *GC suspension time 
> increase by a factor of > 2*, directly correlating with a CPU increase to > 80%. 
> See the attached screenshot "cassandra2118_vs_3014.jpg".
> This all means that the incoming load 2.1.18 handled is something 3.0.14 
> cannot handle. We would need to either scale up (e.g. m4.xlarge => 
> m4.2xlarge) or scale out to handle the same load, which is not an option 
> cost-wise.
> Unfortunately I do not have Java Flight Recorder runs for 2.1.18 at the 
> mentioned load, but I can provide a JFR session for our current 3.0.14 setup. 
> The attached 5-minute JFR memory allocation view (cassandra3014_jfr_5min.jpg) 
> shows compaction as the top contributor for the captured 5-minute time frame. 
> It may be coincidence that compaction dominates this particular 5-minute 
> window (the mentioned simulated client load was running at the time), but 
> according to the stack traces we see classes new in 3.0, e.g. 
> BTreeRow.searchIterator(), popping up as top contributors, so the new 
> classes / data structures are possibly causing much more object churn now.
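For context on the JFR observation quoted above: a per-frame allocation ranking like the one in the attached 5-minute allocation view can also be reproduced offline from a recording file. The sketch below uses the jdk.jfr.consumer API (JDK 11+, so newer than the JDK 8 era tooling presumably used for this ticket); the recording file name and the top-10 cut-off are placeholders, not values from this issue.

{code:java}
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

/**
 * Illustrative sketch: aggregates sampled allocation volume by the top stack
 * frame of each JFR allocation event and prints the heaviest contributors.
 */
public class JfrAllocationSummary {

    public static void main(String[] args) throws Exception {
        Map<String, Long> bytesByFrame = new HashMap<>();

        for (RecordedEvent event : RecordingFile.readAllRecordedEvents(Path.of("cassandra_3014.jfr"))) {
            String name = event.getEventType().getName();
            boolean inTlab = name.equals("jdk.ObjectAllocationInNewTLAB");
            if (!inTlab && !name.equals("jdk.ObjectAllocationOutsideTLAB")) {
                continue;                                   // only allocation events
            }
            if (event.getStackTrace() == null || event.getStackTrace().getFrames().isEmpty()) {
                continue;                                   // no stack trace captured
            }
            var frame = event.getStackTrace().getFrames().get(0);
            String key = frame.getMethod().getType().getName() + "." + frame.getMethod().getName();
            // Attribute the whole TLAB to in-TLAB samples, the exact size otherwise.
            long size = inTlab ? event.getLong("tlabSize") : event.getLong("allocationSize");
            bytesByFrame.merge(key, size, Long::sum);
        }

        bytesByFrame.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(10)
                .forEach(e -> System.out.printf("%-80s %,d bytes%n", e.getKey(), e.getValue()));
    }
}
{code}

A ranking produced this way would make it straightforward to compare how much of the sampled allocation volume is attributable to compaction versus read/write-path frames such as BTreeRow.searchIterator() between 2.1 and 3.0 nodes.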



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-27 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Attachment: cassandra_3.11.0_min_memory_utilization.jpg

> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2118_vs_3014.jpg, cassandra3014_jfr_5min.jpg, 
> cassandra_3.11.0_min_memory_utilization.jpg
>
>
> In short: after upgrading to 3.0.14 (from 2.1.18), we are no longer able to 
> process the same incoming write load on the same infrastructure.
> We have a loadtest environment running 24x7, testing our software with 
> Cassandra as the backend. Both loadtest and production are hosted in AWS and 
> have the same spec on the Cassandra side, namely (per node):
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> * Client requests are entirely CQL
> We have had a solid, constant baseline in loadtest of ~60% average cluster 
> CPU under constant, simulated load, and have been running Cassandra 2.1 this 
> way for more than 2 years now.
> Recently we started to upgrade this 9-node loadtest environment to 3.0.14, 
> and basically 3.0.14 cannot cope with the load anymore. There are no special 
> tweaks or memory settings/changes; everything is configured as in 2.1.18. We 
> also have not upgraded sstables yet, so the increase shown in the screenshot 
> is not related to any manually triggered maintenance operation after 
> upgrading to 3.0.14.
> According to our monitoring, with 3.0.14 we see a *GC suspension time 
> increase by a factor of > 2*, directly correlating with a CPU increase to > 80%. 
> See the attached screenshot "cassandra2118_vs_3014.jpg".
> This all means that the incoming load 2.1.18 handled is something 3.0.14 
> cannot handle. We would need to either scale up (e.g. m4.xlarge => 
> m4.2xlarge) or scale out to handle the same load, which is not an option 
> cost-wise.
> Unfortunately I do not have Java Flight Recorder runs for 2.1.18 at the 
> mentioned load, but I can provide a JFR session for our current 3.0.14 setup. 
> The attached 5-minute JFR memory allocation view (cassandra3014_jfr_5min.jpg) 
> shows compaction as the top contributor for the captured 5-minute time frame. 
> It may be coincidence that compaction dominates this particular 5-minute 
> window (the mentioned simulated client load was running at the time), but 
> according to the stack traces we see classes new in 3.0, e.g. 
> BTreeRow.searchIterator(), popping up as top contributors, so the new 
> classes / data structures are possibly causing much more object churn now.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Description: 
In short: after upgrading to 3.0.14 (from 2.1.18), we are no longer able to 
process the same incoming write load on the same infrastructure.

We have a loadtest environment running 24x7, testing our software with 
Cassandra as the backend. Both loadtest and production are hosted in AWS and 
have the same spec on the Cassandra side, namely (per node):
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2
* Client requests are entirely CQL

We have had a solid, constant baseline in loadtest of ~60% average cluster CPU 
under constant, simulated load, and have been running Cassandra 2.1 this way 
for more than 2 years now.

Recently we started to upgrade this 9-node loadtest environment to 3.0.14, and 
basically 3.0.14 cannot cope with the load anymore. There are no special 
tweaks or memory settings/changes; everything is configured as in 2.1.18. We 
also have not upgraded sstables yet, so the increase shown in the screenshot 
is not related to any manually triggered maintenance operation after upgrading 
to 3.0.14.

According to our monitoring, with 3.0.14 we see a *GC suspension time increase 
by a factor of > 2*, directly correlating with a CPU increase to > 80%. See 
the attached screenshot "cassandra2118_vs_3014.jpg".

This all means that the incoming load 2.1.18 handled is something 3.0.14 
cannot handle. We would need to either scale up (e.g. m4.xlarge => m4.2xlarge) 
or scale out to handle the same load, which is not an option cost-wise.

Unfortunately I do not have Java Flight Recorder runs for 2.1.18 at the 
mentioned load, but I can provide a JFR session for our current 3.0.14 setup. 
The attached 5-minute JFR memory allocation view (cassandra3014_jfr_5min.jpg) 
shows compaction as the top contributor for the captured 5-minute time frame. 
It may be coincidence that compaction dominates this particular 5-minute 
window (the mentioned simulated client load was running at the time), but 
according to the stack traces we see classes new in 3.0, e.g. 
BTreeRow.searchIterator(), popping up as top contributors, so the new 
classes / data structures are possibly causing much more object churn now.

  was:
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
have the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2
* Client requests are entirely CQL

per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
AVG with constant, simulated load running against our cluster, using Cassandra 
2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
special tweaks, memory settings/changes etc., all the same as in 2.1.8. We also 
didn't upgrade sstables yet, thus the increase mentioned in the screenshot is 
not related to any manually triggered maintenance operation after upgrading to 
3.0.14.

According to our monitoring, with 3.0.14, we see a *GC suspension time increase 
by a factor of > 2*, of course directly correlating with an CPU increase > 80%. 
See: attached screen "cassandra2118_vs_3014.jpg"

This all means that our incoming load against 2.1.18 is something, 3.0.14 can't 
handle. So, we would need to either scale up (e.g. m4.xlarge => m4.2xlarge) or 
scale out for being able to handle the same load, which is cost-wise not an 
option.

Unfortunately I do not have Java Flight Recorder runs for 2.1.18 at the 
mentioned load, but can provide JFR session for our current 3.0.14 setup. The 
attached 5min JFR memory allocation area (cassandra3014_jfr_5min.jpg) shows 
compaction being the top contributor for the captured 5min time-frame. Could be 
by "accident" covering the 5min with compaction as top contributor only 
(although mentioned simulated client load is attached), but according to stack 
traces, we see new classes from 3.0, e.g. BTreeRow.searchIterator() etc. 
popping up as top contributor, thus possibly new classes / data structures are 
causing much more object churn now.


> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2118_vs_3014.jpg, cassandra3014_jfr_5min.jpg
>

[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Description: 
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
have the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2
* Client requests are entirely CQL

per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
AVG with constant, simulated load running against our cluster, using Cassandra 
2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
special tweaks, memory settings/changes etc., all the same as in 2.1.8. We also 
didn't upgrade sstables yet, thus the increase mentioned in the screenshot is 
not related to any manually triggered maintenance operation after upgrading to 
3.0.14.

According to our monitoring, with 3.0.14, we see a *GC suspension time increase 
by a factor of > 2*, of course directly correlating with an CPU increase > 80%. 
See: attached screen "cassandra2118_vs_3014.jpg"

This all means that our incoming load against 2.1.18 is something, 3.0.14 can't 
handle. So, we would need to either scale up (e.g. m4.xlarge => m4.2xlarge) or 
scale out for being able to handle the same load, which is cost-wise not an 
option.

Unfortunately I do not have Java Flight Recorder runs for 2.1.18 at the 
mentioned load, but can provide JFR session for our current 3.0.14 setup. The 
attached 5min JFR memory allocation area (cassandra3014_jfr_5min.jpg) shows 
compaction being the top contributor for the captured 5min time-frame. Could be 
by "accident" covering the 5min with compaction as top contributor only 
(although mentioned simulated client load is attached), but according to stack 
traces, we see new classes from 3.0, e.g. BTreeRow.searchIterator() etc. 
popping up as top contributor, thus possibly new classes / data structures are 
causing much more object churn now.

  was:
In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
have the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2
* Client requests are entirely CQL

per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
AVG with constant, simulated load running against our cluster, using Cassandra 
2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
special tweaks, memory settings/changes etc., all the same as in 2.1.8. We also 
didn't upgrade sstables yet, thus the increase mentioned in the screenshot is 
not related to any manually triggered maintenance operation after upgrading to 
3.0.14.

According to our monitoring, with 3.0.14, we see a *GC suspension time increase 
by a factor of > 2*, of course directly correlating with an CPU increase > 80%. 
See: attached screen "cassandra2118_vs_3014.jpg"

This all means that our incoming load against 2.1.18 is something, 3.0.14 can't 
handle. So, we would need to either scale up (e.g. m4.xlarge => m4.2xlarge) or 
scale out for being able to handle the same load, which is cost-wise not an 
option.

Unfortunately I do not have Java Flight Recorder runs for 2.1.18 at the 
mentioned load, but can provide JFR session for our current 3.0.14 setup. The 
attached 5min JFR memory allocation area (cassandra3014_jfr_5min.jpg) shows 
compaction being the top contributor for the captured 5min time-frame. Could be 
by "accident" covering the 5min with compaction as top contributor only 
(although mentioned simulated client load is attached), but according to stack 
traces, we see new classes from 3.0, e.g. BTreeRow.searchIterator() etc. 
popping up as top contributor, thus possibly new classes / data structures are 
causing much more object churn now.


> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2118_vs_3014.jpg, cassandra3014_jfr_5min.jpg
>
>
> I

[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Description: 
In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
have the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2
* Client requests are entirely CQL

per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
AVG with constant, simulated load running against our cluster, using Cassandra 
2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
special tweaks, memory settings/changes etc., all the same as in 2.1.8. We also 
didn't upgrade sstables yet, thus the increase mentioned in the screenshot is 
not related to any manually triggered maintenance operation after upgrading to 
3.0.14.

According to our monitoring, with 3.0.14, we see a *GC suspension time increase 
by a factor of > 2*, of course directly correlating with an CPU increase > 80%. 
See: attached screen "cassandra2118_vs_3014.jpg"

This all means that our incoming load against 2.1.18 is something, 3.0.14 can't 
handle. So, we would need to either scale up (e.g. m4.xlarge => m4.2xlarge) or 
scale out for being able to handle the same load, which is cost-wise not an 
option.

Unfortunately I do not have Java Flight Recorder runs for 2.1.18 at the 
mentioned load, but can provide JFR session for our current 3.0.14 setup. The 
attached 5min JFR memory allocation area (cassandra3014_jfr_5min.jpg) shows 
compaction being the top contributor for the captured 5min time-frame. Could be 
by "accident" covering the 5min with compaction as top contributor only 
(although mentioned simulated client load is attached), but according to stack 
traces, we see new classes from 3.0, e.g. BTreeRow.searchIterator() etc. 
popping up as top contributor, thus possibly new classes / data structures are 
causing much more object churn now.

  was:
In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
have the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2
* Client requests are entirely CQL

per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
AVG with constant, simulated load running against our cluster, using Cassandra 
2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
special tweaks, memory settings/changes etc., all the same as in 2.1.8. We also 
didn't upgrade sstables yet, thus the increase mentioned in the screenshot is 
not related to any manually triggered maintenance operation after upgrading to 
3.0.14.

According to our monitoring, with 3.0.14, we see a *GC suspension time increase 
by a factor of > 2*, of course directly correlating with an CPU increase > 80%. 
See: attached screen "cassandra2118_vs_3014.jpg"

This all means that our incoming load against 2.1.18 is something, 3.0.14 can't 
handle. So, we would need to either scale up (e.g. m4.xlarge => m4.2xlarge) or 
scale out for being able to handle the same load, which is cost-wise not an 
option.

Unfortunately I do not have Java Flight Recorder runs for 2.1.18 at the 
mentioned load, but can provide JFR session for our current 3.0.14 setup. The 
attached 5min JFR memory allocation area (cassandra3014_jfr_5min.jpg) shows 
compaction being the top contributor for the captured 5min time-frame. Could be 
by "accident" covering the 5min with compaction as top contributor only 
(although mentioned simulated client load is attached), but according to stack 
traces, we see new classes in 3.0, e.g. BTreeRow.searchIterator() etc. pop up 
as top contributor, thus possibly new classes / data structures are causing 
much more object churn now.


> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2118_vs_3014.jpg, cassandra3014_jfr_5min.jpg
>
>
> In short: Af

[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Description: 
In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
have the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2
* Client requests are entirely CQL

per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
AVG with constant, simulated load running against our cluster, using Cassandra 
2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
special tweaks, memory settings/changes etc., all the same as in 2.1.8. We also 
didn't upgrade sstables yet, thus the increase mentioned in the screenshot is 
not related to any manually triggered maintenance operation after upgrading to 
3.0.14.

According to our monitoring, with 3.0.14, we see a *GC suspension time increase 
by a factor of > 2*, of course directly correlating with an CPU increase > 80%. 
See: attached screen "cassandra2118_vs_3014.jpg"

This all means that our incoming load against 2.1.18 is something, 3.0.14 can't 
handle. So, we would need to either scale up (e.g. m4.xlarge => m4.2xlarge) or 
scale out for being able to handle the same load, which is cost-wise not an 
option.

Unfortunately I do not have Java Flight Recorder runs for 2.1.18 at the 
mentioned load, but can provide JFR session for our current 3.0.14 setup. The 
attached 5min JFR memory allocation area (cassandra3014_jfr_5min.jpg) shows 
compaction being the top contributor for the captured 5min time-frame. Could be 
by "accident" covering the 5min with compaction as top contributor only 
(although mentioned simulated client load is attached), but according to stack 
traces, we see new classes in 3.0, e.g. BTreeRow.searchIterator() etc. pop up 
as top contributor, thus possibly new classes / data structures are causing 
much more object churn now.

  was:
In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
have the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2
* Client requests are entirely CQL

per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
AVG with constant, simulated load running against our cluster, using Cassandra 
2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
special tweaks, memory settings/changes etc., all the same as in 2.1.8. We also 
didn't upgrade sstables yet, thus the increase mentioned in the screenshot is 
not related to any manually triggered maintenance operation after upgrading to 
3.0.14.

According to our monitoring, with 3.0.14, we see a *GC suspension time increase 
by a factor of > 2*, of course directly correlating with an CPU increase > 80%. 
See: attached screen "cassandra2118_vs_3014.jpg"

This all means that our incoming load against 2.1.18 is something, 3.0.14 can't 
handle. So, we would need to either scale up (e.g. m4.xlarge => m4.2xlarge) or 
scale out for being able to handle the same load, which is cost-wise not an 
option.

Unfortunately I do not have Java Flight Recorder runs for 2.1.18 at the 
mentioned load, but can provide JFR session for our current 3.0.14 setup


> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2118_vs_3014.jpg, cassandra3014_jfr_5min.jpg
>
>
> In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
> same incoming write load on the same infrastructure anymore.
> We have a loadtest environment running 24x7 testing our software using 
> Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
> have the same spec on the Cassandra-side, namely:
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> * Client requests are entirely CQL
> per node. We have a solid/constant baseline in loadtest at ~ 60% CP

[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Attachment: cassandra3014_jfr_5min.jpg

> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2118_vs_3014.jpg, cassandra3014_jfr_5min.jpg
>
>
> In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
> same incoming write load on the same infrastructure anymore.
> We have a loadtest environment running 24x7 testing our software using 
> Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
> have the same spec on the Cassandra-side, namely:
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> * Client requests are entirely CQL
> per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
> AVG with constant, simulated load running against our cluster, using 
> Cassandra 2.1 for > 2 years now.
> Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
> and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
> special tweaks, memory settings/changes etc., all the same as in 2.1.8. We 
> also didn't upgrade sstables yet, thus the increase mentioned in the 
> screenshot is not related to any manually triggered maintenance operation 
> after upgrading to 3.0.14.
> According to our monitoring, with 3.0.14, we see a *GC suspension time 
> increase by a factor of > 2*, of course directly correlating with an CPU 
> increase > 80%. See: attached screen "cassandra2118_vs_3014.jpg"
> This all means that our incoming load against 2.1.18 is something, 3.0.14 
> can't handle. So, we would need to either scale up (e.g. m4.xlarge => 
> m4.2xlarge) or scale out for being able to handle the same load, which is 
> cost-wise not an option.
> Unfortunately I do not have Java Flight Recorder runs for 2.1.18 at the 
> mentioned load, but can provide JFR session for our current 3.0.14 setup



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Description: 
In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
have the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2
* Client requests are entirely CQL

per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
AVG with constant, simulated load running against our cluster, using Cassandra 
2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
special tweaks, memory settings/changes etc., all the same as in 2.1.8. We also 
didn't upgrade sstables yet, thus the increase mentioned in the screenshot is 
not related to any manually triggered maintenance operation after upgrading to 
3.0.14.

According to our monitoring, with 3.0.14, we see a *GC suspension time increase 
by a factor of > 2*, of course directly correlating with an CPU increase > 80%. 
See: attached screen "cassandra2118_vs_3014.jpg"

This all means that our incoming load against 2.1.18 is something, 3.0.14 can't 
handle. So, we would need to either scale up (e.g. m4.xlarge => m4.2xlarge) or 
scale out for being able to handle the same load, which is cost-wise not an 
option.

Unfortunately I do not have Java Flight Recorder runs for 2.1.18 at the 
mentioned load, but can provide JFR session for our current 3.0.14 setup

  was:
In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
have the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2
* Client requests are entirely CQL

per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
AVG with constant, simulated load running against our cluster, using Cassandra 
2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
special tweaks, memory settings/changes etc., all the same as in 2.1.8. We also 
didn't upgrade sstables yet, thus the increase mentioned in the screenshot is 
not related to any manually triggered maintenance operation after upgrading to 
3.0.14.

According to our monitoring, with 3.0.14, we see a *GC suspension time increase 
by a factor of > 2*, of course directly correlating with an CPU increase > 80%. 
See: attached screen "cassandra2118_vs_3014.jpg"

This all means that our incoming load against 2.1.18 is something, 3.0.14 can't 
handle. So, we would need to either scale up (e.g. m4.xlarge => m4.2xlarge) or 
scale out for being able to handle the same load, which is cost-wise not an 
option.

Unfortunately I do not have Java Flight Recorder runs for 2.1.18 at the 
mentioned load, but can provide JFR session for our current 3.0.14 setup, if 
needed.


> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2118_vs_3014.jpg, cassandra3014_jfr_5min.jpg
>
>
> In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
> same incoming write load on the same infrastructure anymore.
> We have a loadtest environment running 24x7 testing our software using 
> Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
> have the same spec on the Cassandra-side, namely:
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> * Client requests are entirely CQL
> per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
> AVG with constant, simulated load running against our cluster, using 
> Cassandra 2.1 for > 2 years now.
> Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
> and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
> special tweaks, memory settings/changes etc., all the same as in 2.1.8. We 
> also didn't upgrade sstables yet, thus the increase mentioned in the 
> screenshot is not related to any manually triggered main

[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Description: 
In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
have the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2
* Client requests are entirely CQL

per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
AVG with constant, simulated load running against our cluster, using Cassandra 
2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
special tweaks, memory settings/changes etc., all the same as in 2.1.8. We also 
didn't upgrade sstables yet, thus the increase mentioned in the screenshot is 
not related to any manually triggered maintenance operation after upgrading to 
3.0.14.

According to our monitoring, with 3.0.14, we see a *GC suspension time increase 
by a factor of > 2*, of course directly correlating with an CPU increase > 80%. 
See: attached screen "cassandra2118_vs_3014.jpg"

This all means that our incoming load against 2.1.18 is something, 3.0.14 can't 
handle. So, we would need to either scale up (e.g. m4.xlarge => m4.2xlarge) or 
scale out for being able to handle the same load, which is cost-wise not an 
option.

Unfortunately I do not have Java Flight Recorder runs for 2.1.18 at the 
mentioned load, but can provide JFR session for our current 3.0.14 setup, if 
needed.

  was:
In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
have the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
AVG with constant, simulated load running against our cluster, using Cassandra 
2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
special tweaks, memory settings/changes etc., all the same as in 2.1.8. We also 
didn't upgrade sstables yet, thus the increase mentioned in the screenshot is 
not related to any manually triggered maintenance operation after upgrading to 
3.0.14.

According to our monitoring, with 3.0.14, we see a *GC suspension time increase 
by a factor of > 2*, of course directly correlating with an CPU increase > 80%. 
See: attached screen "cassandra2118_vs_3014.jpg"

This all means that our incoming load against 2.1.18 is something, 3.0.14 can't 
handle. So, we would need to either scale up (e.g. m4.xlarge => m4.2xlarge) or 
scale out for being able to handle the same load, which is cost-wise not an 
option.

Unfortunately I do not have Java Flight Recorder runs for 2.1.18 at the 
mentioned load, but can provide JFR session for our current 3.0.14 setup, if 
needed.


> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2118_vs_3014.jpg
>
>
> In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
> same incoming write load on the same infrastructure anymore.
> We have a loadtest environment running 24x7 testing our software using 
> Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
> have the same spec on the Cassandra-side, namely:
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> * Client requests are entirely CQL
> per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
> AVG with constant, simulated load running against our cluster, using 
> Cassandra 2.1 for > 2 years now.
> Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
> and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
> special tweaks, memory settings/changes etc., all the same as in 2.1.8. We 
> also didn't upgrade sstables yet, thus the increase mentioned in the 
> screenshot is not related to any manually triggered maintenance operation 
> after upgrading to 3.0.14.
> 

[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Description: 
In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
have the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
AVG with constant, simulated load running against our cluster, using Cassandra 
2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
special tweaks, memory settings/changes etc., all the same as in 2.1.8. We also 
didn't upgrade sstables yet, thus the increase mentioned in the screenshot is 
not related to any manually triggered maintenance operation after upgrading to 
3.0.14.

According to our monitoring, with 3.0.14, we see a *GC suspension time increase 
by a factor of > 2*, of course directly correlating with an CPU increase > 80%. 
See: attached screen "cassandra2118_vs_3014.jpg"

This all means that our incoming load against 2.1.18 is something, 3.0.14 can't 
handle. So, we would need to either scale up (e.g. m4.xlarge => m4.2xlarge) or 
scale out for being able to handle the same load, which is cost-wise not an 
option.

Unfortunately I do not have Java Flight Recorder runs for 2.1.18 at the 
mentioned load, but can provide JFR session for our current 3.0.14 setup, if 
needed.

  was:
In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
have the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
AVG with constant, simulated load running against our cluster, using Cassandra 
2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
special tweaks, memory settings/changes etc., all the same as in 2.1.8. We also 
didn't upgrade sstables yet, thus the increase mentioned in the screenshot is 
not related to any manually triggered maintenance operation after upgrading to 
3.0.14.

According to our monitoring, with 3.0.14, we see a *GC suspension time increase 
by a factor of > 2*, of course directly correlating with an CPU increase > 80%. 
See: attached screen "cassandra2118_vs_3014.jpg"

This all means that our incoming load against 2.1.18 is something, 3.0.14 can't 
handle. So, we would need to either scale up (e.g. m4.xlarge => m4.2xlarge) or 
scale out for being able to handle the same load, which is cost-wise not an 
option.


> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2118_vs_3014.jpg
>
>
> In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
> same incoming write load on the same infrastructure anymore.
> We have a loadtest environment running 24x7 testing our software using 
> Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
> have the same spec on the Cassandra-side, namely:
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
> AVG with constant, simulated load running against our cluster, using 
> Cassandra 2.1 for > 2 years now.
> Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
> and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
> special tweaks, memory settings/changes etc., all the same as in 2.1.8. We 
> also didn't upgrade sstables yet, thus the increase mentioned in the 
> screenshot is not related to any manually triggered maintenance operation 
> after upgrading to 3.0.14.
> According to our monitoring, with 3.0.14, we see a *GC suspension time 
> increase by a factor of > 2*, of course directly correlating with an CPU 
> increase > 80%. See: attached screen "cassandra2118_vs_3014.jpg"
> This all means 

[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Description: 
In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
have the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
AVG with constant, simulated load running against our cluster, using Cassandra 
2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
special tweaks, memory settings/changes etc., all the same as in 2.1.8. We also 
didn't upgrade sstables yet, thus the increase mentioned in the screenshot is 
not related to any manually triggered maintenance operation after upgrading to 
3.0.14.

According to our monitoring, with 3.0.14, we see a *GC suspension time increase 
by a factor of > 2*, of course directly correlating with an CPU increase > 80%. 
See: attached screen "cassandra2118_vs_3014.jpg"

This all means that our incoming load against 2.1.18 is something, 3.0.14 can't 
handle. So, we would need to either scale up (e.g. m4.xlarge => m4.2xlarge) or 
scale out for being able to handle the same load, which is cost-wise not an 
option.

  was:
In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
have the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
AVG with constant, simulated load running against our cluster, using Cassandra 
2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
special tweaks, memory settings/changes etc., all the same as in 2.1.8. We also 
didn't upgrade sstables yet, thus the increase mentioned in the screenshot is 
not related to any manually triggered maintenance operation after upgrading to 
3.0.14.

According to our monitoring, with 3.0.14, we see a *GC suspension time increase 
by a factor of > 2*, of course directly correlating with an CPU increase > 80%. 
See: attached screen "cassandra2118_vs_3014.jpg"

This all means that our incoming load for several weeks now against 2.1.18 is 
something, 3.0.14 can't handle. So, we would need to either scale up (e.g. to 
m4.2xlarge) or scale out for being able to handle the same load


> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2118_vs_3014.jpg
>
>
> In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
> same incoming write load on the same infrastructure anymore.
> We have a loadtest environment running 24x7 testing our software using 
> Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
> have the same spec on the Cassandra-side, namely:
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
> AVG with constant, simulated load running against our cluster, using 
> Cassandra 2.1 for > 2 years now.
> Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
> and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
> special tweaks, memory settings/changes etc., all the same as in 2.1.8. We 
> also didn't upgrade sstables yet, thus the increase mentioned in the 
> screenshot is not related to any manually triggered maintenance operation 
> after upgrading to 3.0.14.
> According to our monitoring, with 3.0.14, we see a *GC suspension time 
> increase by a factor of > 2*, of course directly correlating with an CPU 
> increase > 80%. See: attached screen "cassandra2118_vs_3014.jpg"
> This all means that our incoming load against 2.1.18 is something, 3.0.14 
> can't handle. So, we would need to either scale up (e.g. m4.xlarge => 
> m4.2xlarge) or scale out for being able to handle

[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Description: 
In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
have the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
AVG with constant, simulated load running against our cluster, using Cassandra 
2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
special tweaks, memory settings/changes etc., all the same as in 2.1.8. We also 
didn't upgrade sstables yet, thus the increase mentioned in the screenshot is 
not related to any manually triggered maintenance operation after upgrading to 
3.0.14.

According to our monitoring, with 3.0.14, we see a *GC suspension time increase 
by a factor of > 2*, of course directly correlating with an CPU increase > 80%. 
See: attached screen "cassandra2118_vs_3014.jpg"

This all means that our incoming load for several weeks now against 2.1.18 is 
something, 3.0.14 can't handle. So, we would need to either scale up (e.g. to 
m4.2xlarge) or scale out for being able to handle the same load

  was:
In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
have the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
AVG with constant, simulated load running against our cluster, using Cassandra 
2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
special tweaks, memory settings/changes etc., all the same as in 2.1.8. We also 
didn't upgrade sstables yet, thus the increase mentioned in the screenshot is 
not related to any manually triggered maintenance operation after upgrading to 
3.0.14.

According to our monitoring, with 3.0.14, we see a GC suspension time increase 
by a factor of > 2, of course directly correlating with an CPU increase > 80%. 
See: attached screen "cassandra2118_vs_3014.jpg"

This all means that our incoming load for several weeks now against 2.1.18 is 
something, 3.0.14 can't handle. So, we would need to either scale up (e.g. to 
m4.2xlarge) or scale out for being able to handle the same load


> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2118_vs_3014.jpg
>
>
> In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
> same incoming write load on the same infrastructure anymore.
> We have a loadtest environment running 24x7 testing our software using 
> Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
> have the same spec on the Cassandra-side, namely:
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
> AVG with constant, simulated load running against our cluster, using 
> Cassandra 2.1 for > 2 years now.
> Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
> and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
> special tweaks, memory settings/changes etc., all the same as in 2.1.8. We 
> also didn't upgrade sstables yet, thus the increase mentioned in the 
> screenshot is not related to any manually triggered maintenance operation 
> after upgrading to 3.0.14.
> According to our monitoring, with 3.0.14, we see a *GC suspension time 
> increase by a factor of > 2*, of course directly correlating with an CPU 
> increase > 80%. See: attached screen "cassandra2118_vs_3014.jpg"
> This all means that our incoming load for several weeks now against 2.1.18 is 
> something, 3.0.14 can't handle. So, we would need to either scale up (e.g. to 
> m4.2xlarge) or scale out for being able to handle the same load

[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Description: 
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both loadtest and production are hosted in AWS and have 
the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid, constant baseline in loadtest at ~60% cluster-average 
CPU, with constant, simulated load running against our cluster, which has been 
on Cassandra 2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9-node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
tweaks or memory settings/changes; everything is the same as in 2.1.18. We also 
didn't upgrade sstables yet, thus the increase mentioned in the screenshot is 
not related to any manually triggered maintenance operation after upgrading to 
3.0.14.

According to our monitoring, with 3.0.14, we see a GC suspension time increase 
by a factor of > 2, of course directly correlating with a CPU increase > 80%. 
See the attached screenshot "cassandra2118_vs_3014.jpg".

This all means that the incoming load we have been running against 2.1.18 for 
several weeks now is something 3.0.14 can't handle. So we would need to either 
scale up (e.g. to m4.2xlarge) or scale out to be able to handle the same load.

  was:
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both loadtest and production are hosted in AWS and have 
the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid, constant baseline in loadtest at ~60% cluster-average 
CPU, with constant, simulated load running against our cluster, which has been 
on Cassandra 2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9-node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
tweaks or memory settings/changes; everything is the same as in 2.1.18. We also 
didn't upgrade sstables yet, thus the increase mentioned below is not related 
to any manually triggered maintenance operation after upgrading to 3.0.14.

According to our monitoring, with 3.0.14, we see a GC suspension time increase 
by a factor of > 2, of course directly correlating with a CPU increase > 80%. 
See the attached screenshot "cassandra2118_vs_3014.jpg".

This all means that the incoming load we have been running against 2.1.18 for 
several weeks now is something 3.0.14 can't handle. So we would need to either 
scale up (e.g. to m4.2xlarge) or scale out to be able to handle the same load.


> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2118_vs_3014.jpg
>
>
> In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
> same incoming write load on the same infrastructure anymore.
> We have a loadtest environment running 24x7 testing our software using 
> Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
> have the same spec on the Cassandra-side, namely:
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
> AVG with constant, simulated load running against our cluster, using 
> Cassandra 2.1 for > 2 years now.
> Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
> and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
> special tweaks, memory settings/changes etc., all the same as in 2.1.8. We 
> also didn't upgrade sstables yet, thus the increase mentioned in the 
> screenshot is not related to any manually triggered maintenance operation 
> after upgrading to 3.0.14.
> According to our monitoring, with 3.0.14, we see a GC suspension time 
> increase by a factor of > 2, of course directly correlating with an CPU 
> increase > 80%. See: attached screen "cassandra2118_vs_3014.jpg"
> This all means that our incoming load for several weeks now against 2.1.18 is 
> something, 3.0.14 can't handle. So, we would need to either scale up (e.g. to 
> m4.2xlarge) or scale out for being able to handle the same load



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Description: 
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both loadtest and production are hosted in AWS and have 
the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid, constant baseline in loadtest at ~60% cluster-average 
CPU, with constant, simulated load running against our cluster, which has been 
on Cassandra 2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9-node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
tweaks or memory settings/changes; everything is the same as in 2.1.18. We also 
didn't upgrade sstables yet, thus the increase mentioned below is not related 
to any manually triggered maintenance operation after upgrading to 3.0.14.

According to our monitoring, with 3.0.14, we see a GC suspension time increase 
by a factor of > 2, of course directly correlating with a CPU increase > 80%. 
See the attached screenshot "cassandra2118_vs_3014.jpg".

This all means that the incoming load we have been running against 2.1.18 for 
several weeks now is something 3.0.14 can't handle. So we would need to either 
scale up (e.g. to m4.2xlarge) or scale out to be able to handle the same load.

  was:
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both loadtest and production are hosted in AWS and have 
the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid, constant baseline in loadtest at ~60% cluster-average 
CPU, with constant, simulated load running against our cluster, which has been 
on Cassandra 2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9-node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
tweaks or memory settings/changes; everything is the same as in 2.1.18. We also 
didn't upgrade sstables yet, thus the increase mentioned below is not related 
to any manually triggered maintenance operation after upgrading to 3.0.14.

According to our monitoring, with 3.0.14, we see a GC suspension time increase 
by a factor of > 2, of course directly correlating with a CPU increase > 80%.
!cassandra2118_vs_3014.jpg|thumbnail!

This all means that the incoming load we have been running against 2.1.18 for 
several weeks now is something 3.0.14 can't handle. So we would need to either 
scale up (e.g. to m4.2xlarge) or scale out to be able to handle the same load.


> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2118_vs_3014.jpg
>
>
> In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
> same incoming write load on the same infrastructure anymore.
> We have a loadtest environment running 24x7 testing our software using 
> Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
> have the same spec on the Cassandra-side, namely:
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
> AVG with constant, simulated load running against our cluster, using 
> Cassandra 2.1 for > 2 years now.
> Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
> and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
> special tweaks, memory settings/changes etc., all the same as in 2.1.8. We 
> also didn't upgrade sstables yet, thus the increase mentioned below is not 
> related to any manually triggered maintenance operation after upgrading to 
> 3.0.14.
> According to our monitoring, with 3.0.14, we see a GC suspension time 
> increase by a factor of > 2, of course directly correlating with an CPU 
> increase > 80%. See: attached screen "cassandra2118_vs_3014.jpg"
> This all means that our incoming load for several weeks now against 2.1.18 is 
> something, 3.0.14 can't handle. So, we would need to either scale up (e.g. to 
> m4.2xlarge) or scale out for being able to handle the same load.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Description: 
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both loadtest and production are hosted in AWS and have 
the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid, constant baseline in loadtest at ~60% cluster-average 
CPU, with constant, simulated load running against our cluster, which has been 
on Cassandra 2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9-node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
tweaks or memory settings/changes; everything is the same as in 2.1.18. We also 
didn't upgrade sstables yet, thus the increase mentioned below is not related 
to any manually triggered maintenance operation after upgrading to 3.0.14.

According to our monitoring, with 3.0.14, we see a GC suspension time increase 
by a factor of > 2, of course directly correlating with a CPU increase > 80%. 
See the attached screenshot "cassandra2118_vs_3014.jpg".

This all means that the incoming load we have been running against 2.1.18 for 
several weeks now is something 3.0.14 can't handle. So we would need to either 
scale up (e.g. to m4.2xlarge) or scale out to be able to handle the same load.

  was:
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both loadtest and production are hosted in AWS and have 
the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid, constant baseline in loadtest at ~60% cluster-average 
CPU, with constant, simulated load running against our cluster, which has been 
on Cassandra 2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9-node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
tweaks or memory settings/changes; everything is the same as in 2.1.18. We also 
didn't upgrade sstables yet, thus the increase mentioned below is not related 
to any manually triggered maintenance operation after upgrading to 3.0.14.

According to our monitoring, with 3.0.14, we see a GC suspension time increase 
by a factor of > 2, of course directly correlating with a CPU increase > 80%. 
See the attached screenshot "cassandra2118_vs_3014.jpg".

This all means that the incoming load we have been running against 2.1.18 for 
several weeks now is something 3.0.14 can't handle. So we would need to either 
scale up (e.g. to m4.2xlarge) or scale out to be able to handle the same load.


> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2118_vs_3014.jpg
>
>
> In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
> same incoming write load on the same infrastructure anymore.
> We have a loadtest environment running 24x7 testing our software using 
> Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
> have the same spec on the Cassandra-side, namely:
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
> AVG with constant, simulated load running against our cluster, using 
> Cassandra 2.1 for > 2 years now.
> Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
> and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
> special tweaks, memory settings/changes etc., all the same as in 2.1.8. We 
> also didn't upgrade sstables yet, thus the increase mentioned below is not 
> related to any manually triggered maintenance operation after upgrading to 
> 3.0.14.
> According to our monitoring, with 3.0.14, we see a GC suspension time 
> increase by a factor of > 2, of course directly correlating with an CPU 
> increase > 80%. See: attached screen "cassandra2118_vs_3014.jpg"
> This all means that our incoming load for several weeks now against 2.1.18 is 
> something, 3.0.14 can't handle. So, we would need to either scale up (e.g. to 
> m4.2xlarge) or scale out for being able to handle the same load



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Description: 
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both loadtest and production are hosted in AWS and have 
the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid, constant baseline in loadtest at ~60% cluster-average 
CPU, with constant, simulated load running against our cluster, which has been 
on Cassandra 2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9-node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
tweaks or memory settings/changes; everything is the same as in 2.1.18. We also 
didn't upgrade sstables yet, thus the increase mentioned below is not related 
to any manually triggered maintenance operation after upgrading to 3.0.14.

According to our monitoring, with 3.0.14, we see a GC suspension time increase 
by a factor of > 2, of course directly correlating with a CPU increase > 80%.
!cassandra2118_vs_3014.jpg|thumbnail!

This all means that the incoming load we have been running against 2.1.18 for 
several weeks now is something 3.0.14 can't handle. So we would need to either 
scale up (e.g. to m4.2xlarge) or scale out to be able to handle the same load.

  was:
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both loadtest and production are hosted in AWS and have 
the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid, constant baseline in loadtest at ~60% cluster-average 
CPU, with constant, simulated load running against our cluster, which has been 
on Cassandra 2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9-node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
tweaks or memory settings/changes; everything is the same as in 2.1.18. We also 
didn't upgrade sstables yet, thus the increase mentioned below is not related 
to any manually triggered maintenance operation after upgrading to 3.0.14.

According to our monitoring, with 3.0.14, we see a GC suspension time increase 
by a factor of > 2, of course directly correlating with a CPU increase > 80%.
!cassandra2118_vs_3014.jpg!

This all means that the incoming load we have been running against 2.1.18 for 
several weeks now is something 3.0.14 can't handle. So we would need to either 
scale up (e.g. to m4.2xlarge) or scale out to be able to handle the same load.


> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2118_vs_3014.jpg
>
>
> In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
> same incoming write load on the same infrastructure anymore.
> We have a loadtest environment running 24x7 testing our software using 
> Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
> have the same spec on the Cassandra-side, namely:
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
> AVG with constant, simulated load running against our cluster, using 
> Cassandra 2.1 for > 2 years now.
> Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
> and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
> special tweaks, memory settings/changes etc., all the same as in 2.1.8. We 
> also didn't upgrade sstables yet, thus the increase mentioned below is not 
> related to any manually triggered maintenance operation after upgrading to 
> 3.0.14.
> According to our monitoring, with 3.0.14, we see a GC suspension time 
> increase by a factor of > 2, of course directly correlating with an CPU 
> increase > 80%.
> !cassandra2118_vs_3014.jpg|thumbnail!
> This all means that our incoming load for several weeks now against 2.1.18 is 
> something, 3.0.14 can't handle. So, we would need to either scale up (e.g. to 
> m4.2xlarge) or scale out for being able to handle the same load.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Description: 
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both loadtest and production are hosted in AWS and have 
the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid, constant baseline in loadtest at ~60% cluster-average 
CPU, with constant, simulated load running against our cluster, which has been 
on Cassandra 2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9-node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
tweaks or memory settings/changes; everything is the same as in 2.1.18. We also 
didn't upgrade sstables yet, thus the increase mentioned below is not related 
to any manually triggered maintenance operation after upgrading to 3.0.14.

According to our monitoring, with 3.0.14, we see a GC suspension time increase 
by a factor of > 2, of course directly correlating with a CPU increase > 80%.
!cassandra2118_vs_3014.jpg|thumbnail!

This all means that the incoming load we have been running against 2.1.18 for 
several weeks now is something 3.0.14 can't handle. So we would need to either 
scale up (e.g. to m4.2xlarge) or scale out to be able to handle the same load.

  was:
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both loadtest and production are hosted in AWS and have 
the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid, constant baseline in loadtest at ~60% cluster-average 
CPU, with constant, simulated load running against our cluster, which has been 
on Cassandra 2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9-node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
tweaks or memory settings/changes; everything is the same as in 2.1.18. We also 
didn't upgrade sstables yet, thus the increase mentioned below is not related 
to any manually triggered maintenance operation after upgrading to 3.0.14.

According to our monitoring, with 3.0.14, we see a GC suspension time increase 
by a factor of > 2, of course directly correlating with a CPU increase > 80%.
!cassandra2.1.18_vs_3.0.14.png|thumbnail!

This all means that the incoming load we have been running against 2.1.18 for 
several weeks now is something 3.0.14 can't handle. So we would need to either 
scale up (e.g. to m4.2xlarge) or scale out to be able to handle the same load.


> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2118_vs_3014.jpg
>
>
> In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
> same incoming write load on the same infrastructure anymore.
> We have a loadtest environment running 24x7 testing our software using 
> Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
> have the same spec on the Cassandra-side, namely:
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
> AVG with constant, simulated load running against our cluster, using 
> Cassandra 2.1 for > 2 years now.
> Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
> and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
> special tweaks, memory settings/changes etc., all the same as in 2.1.8. We 
> also didn't upgrade sstables yet, thus the increase mentioned below is not 
> related to any manually triggered maintenance operation after upgrading to 
> 3.0.14.
> According to our monitoring, with 3.0.14, we see a GC suspension time 
> increase by a factor of > 2, of course directly correlating with an CPU 
> increase > 80%.
> !cassandra2118_vs_3014.jpg|thumbnail!
> This all means that our incoming load for several weeks now against 2.1.18 is 
> something, 3.0.14 can't handle. So, we would need to either scale up (e.g. to 
> m4.2xlarge) or scale out for being able to handle the same load.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Attachment: (was: cassandra2.1.18_vs_3.0.14.png)

> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2118_vs_3014.jpg
>
>
> In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
> same incoming write load on the same infrastructure anymore.
> We have a loadtest environment running 24x7 testing our software using 
> Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
> have the same spec on the Cassandra-side, namely:
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
> AVG with constant, simulated load running against our cluster, using 
> Cassandra 2.1 for > 2 years now.
> Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
> and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
> special tweaks, memory settings/changes etc., all the same as in 2.1.8. We 
> also didn't upgrade sstables yet, thus the increase mentioned below is not 
> related to any manually triggered maintenance operation after upgrading to 
> 3.0.14.
> According to our monitoring, with 3.0.14, we see a GC suspension time 
> increase by a factor of > 2, of course directly correlating with an CPU 
> increase > 80%.
> !cassandra2118_vs_3014.jpg!
> This all means that our incoming load for several weeks now against 2.1.18 is 
> something, 3.0.14 can't handle. So, we would need to either scale up (e.g. to 
> m4.2xlarge) or scale out for being able to handle the same load.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Description: 
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both loadtest and production are hosted in AWS and have 
the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid, constant baseline in loadtest at ~60% cluster-average 
CPU, with constant, simulated load running against our cluster, which has been 
on Cassandra 2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9-node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
tweaks or memory settings/changes; everything is the same as in 2.1.18. We also 
didn't upgrade sstables yet, thus the increase mentioned below is not related 
to any manually triggered maintenance operation after upgrading to 3.0.14.

According to our monitoring, with 3.0.14, we see a GC suspension time increase 
by a factor of > 2, of course directly correlating with a CPU increase > 80%.
!cassandra2118_vs_3014.jpg!

This all means that the incoming load we have been running against 2.1.18 for 
several weeks now is something 3.0.14 can't handle. So we would need to either 
scale up (e.g. to m4.2xlarge) or scale out to be able to handle the same load.

  was:
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both loadtest and production are hosted in AWS and have 
the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid, constant baseline in loadtest at ~60% cluster-average 
CPU, with constant, simulated load running against our cluster, which has been 
on Cassandra 2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9-node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
tweaks or memory settings/changes; everything is the same as in 2.1.18. We also 
didn't upgrade sstables yet, thus the increase mentioned below is not related 
to any manually triggered maintenance operation after upgrading to 3.0.14.

According to our monitoring, with 3.0.14, we see a GC suspension time increase 
by a factor of > 2, of course directly correlating with a CPU increase > 80%.
!cassandra2118_vs_3014.jpg|thumbnail!

This all means that the incoming load we have been running against 2.1.18 for 
several weeks now is something 3.0.14 can't handle. So we would need to either 
scale up (e.g. to m4.2xlarge) or scale out to be able to handle the same load.


> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2118_vs_3014.jpg
>
>
> In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
> same incoming write load on the same infrastructure anymore.
> We have a loadtest environment running 24x7 testing our software using 
> Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
> have the same spec on the Cassandra-side, namely:
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
> AVG with constant, simulated load running against our cluster, using 
> Cassandra 2.1 for > 2 years now.
> Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
> and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
> special tweaks, memory settings/changes etc., all the same as in 2.1.8. We 
> also didn't upgrade sstables yet, thus the increase mentioned below is not 
> related to any manually triggered maintenance operation after upgrading to 
> 3.0.14.
> According to our monitoring, with 3.0.14, we see a GC suspension time 
> increase by a factor of > 2, of course directly correlating with an CPU 
> increase > 80%.
> !cassandra2118_vs_3014.jpg!
> This all means that our incoming load for several weeks now against 2.1.18 is 
> something, 3.0.14 can't handle. So, we would need to either scale up (e.g. to 
> m4.2xlarge) or scale out for being able to handle the same load.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Attachment: cassandra2118_vs_3014.jpg

> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2118_vs_3014.jpg, cassandra2.1.18_vs_3.0.14.png
>
>
> In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
> same incoming write load on the same infrastructure anymore.
> We have a loadtest environment running 24x7 testing our software using 
> Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
> have the same spec on the Cassandra-side, namely:
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
> AVG with constant, simulated load running against our cluster, using 
> Cassandra 2.1 for > 2 years now.
> Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
> and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
> special tweaks, memory settings/changes etc., all the same as in 2.1.8. We 
> also didn't upgrade sstables yet, thus the increase mentioned below is not 
> related to any manually triggered maintenance operation after upgrading to 
> 3.0.14.
> According to our monitoring, with 3.0.14, we see a GC suspension time 
> increase by a factor of > 2, of course directly correlating with an CPU 
> increase > 80%.
> !cassandra2.1.18_vs_3.0.14.png|thumbnail!
> This all means that our incoming load for several weeks now against 2.1.18 is 
> something, 3.0.14 can't handle. So, we would need to either scale up (e.g. to 
> m4.2xlarge) or scale out for being able to handle the same load.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Description: 
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both loadtest and production are hosted in AWS and have 
the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid, constant baseline in loadtest at ~60% cluster-average 
CPU, with constant, simulated load running against our cluster, which has been 
on Cassandra 2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9-node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
tweaks or memory settings/changes; everything is the same as in 2.1.18. We also 
didn't upgrade sstables yet, thus the increase mentioned below is not related 
to any manually triggered maintenance operation after upgrading to 3.0.14.

According to our monitoring, with 3.0.14, we see a GC suspension time increase 
by a factor of > 2, of course directly correlating with a CPU increase > 80%.
!cassandra2.1.18_vs_3.0.14.png|thumbnail!

This all means that the incoming load we have been running against 2.1.18 for 
several weeks now is something 3.0.14 can't handle. So we would need to either 
scale up (e.g. to m4.2xlarge) or scale out to be able to handle the same load.

  was:
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both loadtest and production are hosted in AWS and have 
the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid, constant baseline in loadtest at ~60% cluster-average 
CPU, with constant, simulated load running against our cluster, which has been 
on Cassandra 2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9-node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
tweaks or memory settings/changes; everything is the same as in 2.1.18. We also 
didn't upgrade sstables yet, thus the increase mentioned below is not related 
to any manually triggered maintenance operation after upgrading to 3.0.14.

According to our monitoring, with 3.0.14, we see a GC suspension time increase 
by a factor of > 2, of course directly correlating with a CPU increase > 80%.
!!
!cassandra2.1.8_vs_3.0.14.png|thumbnail!

This all means that the incoming load we have been running against 2.1.18 for 
several weeks now is something 3.0.14 can't handle. So we would need to either 
scale up (e.g. to m4.2xlarge) or scale out to be able to handle the same load.


> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2118_vs_3014.jpg, cassandra2.1.18_vs_3.0.14.png
>
>
> In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
> same incoming write load on the same infrastructure anymore.
> We have a loadtest environment running 24x7 testing our software using 
> Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
> have the same spec on the Cassandra-side, namely:
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
> AVG with constant, simulated load running against our cluster, using 
> Cassandra 2.1 for > 2 years now.
> Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
> and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
> special tweaks, memory settings/changes etc., all the same as in 2.1.8. We 
> also didn't upgrade sstables yet, thus the increase mentioned below is not 
> related to any manually triggered maintenance operation after upgrading to 
> 3.0.14.
> According to our monitoring, with 3.0.14, we see a GC suspension time 
> increase by a factor of > 2, of course directly correlating with an CPU 
> increase > 80%.
> !cassandra2.1.18_vs_3.0.14.png|thumbnail!
> This all means that our incoming load for several weeks now against 2.1.18 is 
> something, 3.0.14 can't handle. So, we would need to either scale up (e.g. to 
> m4.2xlarge) or scale out for being able to handle the same load.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Description: 
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both loadtest and production are hosted in AWS and have 
the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid, constant baseline in loadtest at ~60% cluster-average 
CPU, with constant, simulated load running against our cluster, which has been 
on Cassandra 2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9-node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
tweaks or memory settings/changes; everything is the same as in 2.1.18. We also 
didn't upgrade sstables yet, thus the increase mentioned below is not related 
to any manually triggered maintenance operation after upgrading to 3.0.14.

According to our monitoring, with 3.0.14, we see a GC suspension time increase 
by a factor of > 2, of course directly correlating with a CPU increase > 80%.
!!
!cassandra2.1.8_vs_3.0.14.png|thumbnail!

This all means that the incoming load we have been running against 2.1.18 for 
several weeks now is something 3.0.14 can't handle. So we would need to either 
scale up (e.g. to m4.2xlarge) or scale out to be able to handle the same load.

  was:
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both loadtest and production are hosted in AWS and have 
the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid, constant baseline in loadtest at ~60% cluster-average 
CPU, with constant, simulated load running against our cluster, which has been 
on Cassandra 2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9-node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
tweaks or memory settings/changes; everything is the same as in 2.1.18. We also 
didn't upgrade sstables yet, thus the increase mentioned below is not related 
to any manually triggered maintenance operation after upgrading to 3.0.14.

According to our monitoring, with 3.0.14, we see a GC suspension time increase 
by a factor of > 2, of course directly correlating with a CPU increase > 80%.
!cassandra2.1.8_vs_3.0.14.png|thumbnail!

This all means that the incoming load we have been running against 2.1.18 for 
several weeks now is something 3.0.14 can't handle. So we would need to either 
scale up (e.g. to m4.2xlarge) or scale out to be able to handle the same load.


> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2.1.18_vs_3.0.14.png
>
>
> In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
> same incoming write load on the same infrastructure anymore.
> We have a loadtest environment running 24x7 testing our software using 
> Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
> have the same spec on the Cassandra-side, namely:
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
> AVG with constant, simulated load running against our cluster, using 
> Cassandra 2.1 for > 2 years now.
> Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
> and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
> special tweaks, memory settings/changes etc., all the same as in 2.1.8. We 
> also didn't upgrade sstables yet, thus the increase mentioned below is not 
> related to any manually triggered maintenance operation after upgrading to 
> 3.0.14.
> According to our monitoring, with 3.0.14, we see a GC suspension time 
> increase by a factor of > 2, of course directly correlating with an CPU 
> increase > 80%.
> !!
> !cassandra2.1.8_vs_3.0.14.png|thumbnail!
> This all means that our incoming load for several weeks now against 2.1.18 is 
> something, 3.0.14 can't handle. So, we would need to either scale up (e.g. to 
> m4.2xlarge) or scale out for being able to handle the same load.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Description: 
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both loadtest and production are hosted in AWS and have 
the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid, constant baseline in loadtest at ~60% cluster-average 
CPU, with constant, simulated load running against our cluster, which has been 
on Cassandra 2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9-node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
tweaks or memory settings/changes; everything is the same as in 2.1.18. We also 
didn't upgrade sstables yet, thus the increase mentioned below is not related 
to any manually triggered maintenance operation after upgrading to 3.0.14.

According to our monitoring, with 3.0.14, we see a GC suspension time increase 
by a factor of > 2, of course directly correlating with a CPU increase > 80%.
!attachment-name.jpg|thumbnail!

This all means that the incoming load we have been running against 2.1.18 for 
several weeks now is something 3.0.14 can't handle. So we would need to either 
scale up (e.g. to m4.2xlarge) or scale out to be able to handle the same load.

  was:
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both loadtest and production are hosted in AWS and have 
the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid, constant baseline in loadtest at ~60% cluster-average 
CPU, with constant, simulated load running against our cluster, which has been 
on Cassandra 2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9-node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
tweaks or memory settings/changes; everything is the same as in 2.1.18. We also 
didn't upgrade sstables yet, thus the increase mentioned below is not related 
to any manually triggered maintenance operation after upgrading to 3.0.14.

According to our monitoring, with 3.0.14, we see a GC suspension time increase 
by a factor of > 2, of course directly correlating with a CPU increase > 80%.
!cassandra2.1.18_vs_3.0.14.png|thumbnail!

This all means that the incoming load we have been running against 2.1.18 for 
several weeks now is something 3.0.14 can't handle. So we would need to either 
scale up (e.g. to m4.2xlarge) or scale out to be able to handle the same load.


> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2.1.18_vs_3.0.14.png
>
>
> In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
> same incoming write load on the same infrastructure anymore.
> We have a loadtest environment running 24x7 testing our software using 
> Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
> have the same spec on the Cassandra-side, namely:
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
> AVG with constant, simulated load running against our cluster, using 
> Cassandra 2.1 for > 2 years now.
> Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
> and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
> special tweaks, memory settings/changes etc., all the same as in 2.1.8. We 
> also didn't upgrade sstables yet, thus the increase mentioned below is not 
> related to any manually triggered maintenance operation after upgrading to 
> 3.0.14.
> According to our monitoring, with 3.0.14, we see a GC suspension time 
> increase by a factor of > 2, of course directly correlating with an CPU 
> increase > 80%.
> !attachment-name.jpg|thumbnail!
> This all means that our incoming load for several weeks now against 2.1.18 is 
> something, 3.0.14 can't handle. So, we would need to either scale up (e.g. to 
> m4.2xlarge) or scale out for being able to handle the same load.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2017-09-25 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Description: 
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both loadtest and production are hosted in AWS and have 
the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid, constant baseline in loadtest at ~60% cluster-average 
CPU, with constant, simulated load running against our cluster, which has been 
on Cassandra 2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9-node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
tweaks or memory settings/changes; everything is the same as in 2.1.18. We also 
didn't upgrade sstables yet, thus the increase mentioned below is not related 
to any manually triggered maintenance operation after upgrading to 3.0.14.

According to our monitoring, with 3.0.14, we see a GC suspension time increase 
by a factor of > 2, of course directly correlating with a CPU increase > 80%.
!cassandra2.1.8_vs_3.0.14.png|thumbnail!

This all means that the incoming load we have been running against 2.1.18 for 
several weeks now is something 3.0.14 can't handle. So we would need to either 
scale up (e.g. to m4.2xlarge) or scale out to be able to handle the same load.

  was:
In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
the same incoming write load on the same infrastructure anymore.

We have a loadtest environment running 24x7 testing our software using 
Cassandra as backend. Both loadtest and production are hosted in AWS and have 
the same spec on the Cassandra-side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2

per node. We have a solid, constant baseline in loadtest at ~60% cluster-average 
CPU, with constant, simulated load running against our cluster, which has been 
on Cassandra 2.1 for > 2 years now.

Recently we started to upgrade to 3.0.14 in this 9-node loadtest environment, 
and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
tweaks or memory settings/changes; everything is the same as in 2.1.18. We also 
didn't upgrade sstables yet, thus the increase mentioned below is not related 
to any manually triggered maintenance operation after upgrading to 3.0.14.

According to our monitoring, with 3.0.14, we see a GC suspension time increase 
by a factor of > 2, of course directly correlating with a CPU increase > 80%.
!attachment-name.jpg|thumbnail!

This all means that the incoming load we have been running against 2.1.18 for 
several weeks now is something 3.0.14 can't handle. So we would need to either 
scale up (e.g. to m4.2xlarge) or scale out to be able to handle the same load.


> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Blocker
> Attachments: cassandra2.1.18_vs_3.0.14.png
>
>
> In short: After upgrading to 3.0.14 (2.1.18), we aren't able to process the 
> same incoming write load on the same infrastructure anymore.
> We have a loadtest environment running 24x7 testing our software using 
> Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
> have the same spec on the Cassandra-side, namely:
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> per node. We have a solid/constant baseline in loadtest at ~ 60% CPU cluster 
> AVG with constant, simulated load running against our cluster, using 
> Cassandra 2.1 for > 2 years now.
> Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
> and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
> special tweaks, memory settings/changes etc., all the same as in 2.1.8. We 
> also didn't upgrade sstables yet, thus the increase mentioned below is not 
> related to any manually triggered maintenance operation after upgrading to 
> 3.0.14.
> According to our monitoring, with 3.0.14, we see a GC suspension time 
> increase by a factor of > 2, of course directly correlating with an CPU 
> increase > 80%.
> !cassandra2.1.8_vs_3.0.14.png|thumbnail!
> This all means that our incoming load for several weeks now against 2.1.18 is 
> something, 3.0.14 can't handle. So, we would need to either scale up (e.g. to 
> m4.2xlarge) or scale out for being able to handle the same load.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)