[jira] [Updated] (CASSANDRA-19554) Website - Download section - Update / remove EOL dates

2024-04-11 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-19554:
---
Description: 
Enterprise customers with on-prem Cassandra usage can be pretty nitpicky about 
EOL when running unsupported Cassandra versions. They often refer to what is 
stated at https://cassandra.apache.org/_/download.html (as the only source 
available?) and don't really think about the dependency on the 5.0 GA date, but 
simply go by the EOL date information stated there.

As of April 11, 2024, the download section states the following information:
 !image-2024-04-11-13-15-52-317.png! 

According to that, 3.x is unmaintained, 4.0 is soon to be EOL, etc.

Either remove these EOL estimates or keep them strongly maintained and aligned 
with an updated 5.0 GA timeline.

Thanks!


  was:
Enterprise customers with on-prem Cassandra usage can be pretty nitpicky about 
EOL when running unsupported Cassandra versions. They often refer to what is 
stated at https://cassandra.apache.org/_/download.html (as the only source 
available) and don't really think about the dependency on the 5.0 GA date, but 
simply go by the EOL date information stated there.

As of April 11, 2024, the download section states the following information:
 !image-2024-04-11-13-15-52-317.png! 

According to that, 3.x is unmaintained, 4.0 is soon to be EOL, etc.

Either remove these EOL estimates or keep them strongly maintained and aligned 
with an updated 5.0 GA timeline.

Thanks!



> Website - Download section - Update / remove EOL dates
> --
>
> Key: CASSANDRA-19554
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19554
> Project: Cassandra
>  Issue Type: Task
>  Components: Documentation/Website
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: image-2024-04-11-13-15-52-317.png
>
>
> Enterprise customers with on-prem Cassandra usage can be pretty nitpicky about 
> EOL when running unsupported Cassandra versions. They often refer to what is 
> stated at https://cassandra.apache.org/_/download.html (as the only source 
> available?) and don't really think about the dependency on the 5.0 GA date, but 
> simply go by the EOL date information stated there.
> As of April 11, 2024, the download section states the following information:
>  !image-2024-04-11-13-15-52-317.png! 
> According to that, 3.x is unmaintained, 4.0 is soon to be EOL, etc.
> Either remove these EOL estimates or keep them strongly maintained and aligned 
> with an updated 5.0 GA timeline.
> Thanks!






[jira] [Created] (CASSANDRA-19554) Website - Download section - Update / remove EOL dates

2024-04-11 Thread Thomas Steinmaurer (Jira)
Thomas Steinmaurer created CASSANDRA-19554:
--

 Summary: Website - Download section - Update / remove EOL dates
 Key: CASSANDRA-19554
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19554
 Project: Cassandra
  Issue Type: Task
  Components: Documentation/Website
Reporter: Thomas Steinmaurer
 Attachments: image-2024-04-11-13-15-52-317.png

Enterprise customers with on-prem Cassandra usage can be pretty nitpicky about 
EOL when running unsupported Cassandra versions. They often refer to what is 
stated at https://cassandra.apache.org/_/download.html (as the only source 
available) and don't really think about the dependency on the 5.0 GA date, but 
simply go by the EOL date information stated there.

As of April 11, 2024, the download section states the following information:
 !image-2024-04-11-13-15-52-317.png! 

According to that, 3.x is unmaintained, 4.0 is soon to be EOL, etc.

Either remove these EOL estimates or keep them strongly maintained and aligned 
with an updated 5.0 GA timeline.

Thanks!







[jira] [Updated] (CASSANDRA-19554) Website - Download section - Update / remove EOL dates

2024-04-11 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-19554:
---
Description: 
Enterprise customers with on-prem Cassandra usage can be pretty nitpicky about 
EOL when running unsupported Cassandra versions. They often refer to what is 
stated at https://cassandra.apache.org/_/download.html (as the only source 
available?) and don't really think about the dependency on the 5.0 GA date, but 
simply go by the EOL date information stated there.

As of April 11, 2024, the download section states the following information:
 !image-2024-04-11-13-15-52-317.png! 

According to that, 3.x is unmaintained, 4.0 is soon to be EOL, etc.

Either remove these EOL estimates or keep them strongly maintained and aligned 
with an updated 5.0 GA timeline.

Thanks!


  was:
Enterprise customers with on-prem Cassandra usage can be pretty nitpicky about 
EOL when running unsupported Cassandra versions. They often refer to what is 
stated at https://cassandra.apache.org/_/download.html (as the only source 
available?) and don't really think about the dependency on the 5.0 GA date, but 
simply go by the EOL date information stated there.

As of April 11, 2024, the download section states the following information:
 !image-2024-04-11-13-15-52-317.png! 

According to that, 3.x is unmaintained, 4.0 is soon to be EOL, etc.

Either remove these EOL estimates or keep them strongly maintained and aligned 
with an updated 5.0 GA timeline.

Thanks!



> Website - Download section - Update / remove EOL dates
> --
>
> Key: CASSANDRA-19554
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19554
> Project: Cassandra
>  Issue Type: Task
>  Components: Documentation/Website
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: image-2024-04-11-13-15-52-317.png
>
>
> Enterprise customers with on-prem Cassandra usage can be pretty nitpicky about 
> EOL when running unsupported Cassandra versions. They often refer to what is 
> stated at https://cassandra.apache.org/_/download.html (as the only source 
> available?) and don't really think about the dependency on the 5.0 GA date, but 
> simply go by the EOL date information stated there.
> As of April 11, 2024, the download section states the following information:
>  !image-2024-04-11-13-15-52-317.png! 
> According to that, 3.x is unmaintained, 4.0 is soon to be EOL, etc.
> Either remove these EOL estimates or keep them strongly maintained and aligned 
> with an updated 5.0 GA timeline.
> Thanks!






[jira] [Assigned] (CASSANDRA-18891) Cassandra 4.0 - JNA 5.6.0 does not support macOS M1 arm64

2023-11-03 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer reassigned CASSANDRA-18891:
--

Assignee: Thomas Steinmaurer  (was: Maxim Muzafarov)

> Cassandra 4.0 - JNA 5.6.0 does not support macOS M1 arm64
> -
>
> Key: CASSANDRA-18891
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18891
> Project: Cassandra
>  Issue Type: Bug
>  Components: Dependencies
>Reporter: Thomas Steinmaurer
>Assignee: Thomas Steinmaurer
>Priority: Normal
> Fix For: 4.0.x
>
> Attachments: signature.asc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As discussed on Slack: 
> [https://the-asf.slack.com/archives/CJZLTM05A/p1684745250901489]
> Created this ticket as clone of CASSANDRA-17019, to ask for considering a JNA 
> library upgrade in Cassandra 4.0, so that we could utilize ARM-based AWS 
> Graviton instances (e.g. m7g) already with Cassandra 4.0.
> From linked ticket:
> "Cassandra depends on net.java.dev.jna.jna version 5.6.0 to do the native 
> binding into the C library. JNA 5.6.0 does not support arm64 architecture 
> (Apple M1 devices), causing cassandra to fail on bootstrap."






[jira] [Commented] (CASSANDRA-18891) Cassandra 4.0 - JNA 5.6.0 does not support macOS M1 arm64

2023-11-03 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782532#comment-17782532
 ] 

Thomas Steinmaurer commented on CASSANDRA-18891:


[~mmuzaf], not yet. I'm really sorry, but the whole ticket was a bit misleading 
then; sorry for the confusion. I was under the impression that 4.0 and arm64 
were a general compatibility issue fixed in 4.1, and not something specific to 
arm64 on Apple Silicon. 

> Cassandra 4.0 - JNA 5.6.0 does not support macOS M1 arm64
> -
>
> Key: CASSANDRA-18891
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18891
> Project: Cassandra
>  Issue Type: Bug
>  Components: Dependencies
>Reporter: Thomas Steinmaurer
>Assignee: Maxim Muzafarov
>Priority: Normal
> Fix For: 4.0.x
>
> Attachments: signature.asc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As discussed on Slack: 
> [https://the-asf.slack.com/archives/CJZLTM05A/p1684745250901489]
> Created this ticket as clone of CASSANDRA-17019, to ask for considering a JNA 
> library upgrade in Cassandra 4.0, so that we could utilize ARM-based AWS 
> Graviton instances (e.g. m7g) already with Cassandra 4.0.
> From linked ticket:
> "Cassandra depends on net.java.dev.jna.jna version 5.6.0 to do the native 
> binding into the C library. JNA 5.6.0 does not support arm64 architecture 
> (Apple M1 devices), causing cassandra to fail on bootstrap."






[jira] [Commented] (CASSANDRA-18891) Cassandra 4.0 - JNA 5.6.0 does not support arm64

2023-09-28 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770297#comment-17770297
 ] 

Thomas Steinmaurer commented on CASSANDRA-18891:


Sounds great!

> Cassandra 4.0 - JNA 5.6.0 does not support arm64
> 
>
> Key: CASSANDRA-18891
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18891
> Project: Cassandra
>  Issue Type: Bug
>  Components: Dependencies
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 4.0.x
>
>
> As discussed on Slack: 
> [https://the-asf.slack.com/archives/CJZLTM05A/p1684745250901489]
> Created this ticket as clone of CASSANDRA-17019, to ask for considering a JNA 
> library upgrade in Cassandra 4.0, so that we could utilize ARM-based AWS 
> Graviton instances (e.g. m7g) already with Cassandra 4.0.
> From linked ticket:
> "Cassandra depends on net.java.dev.jna.jna version 5.6.0 to do the native 
> binding into the C library. JNA 5.6.0 does not support arm64 architecture 
> (Apple M1 devices), causing cassandra to fail on bootstrap."






[jira] [Updated] (CASSANDRA-18891) Cassandra 4.0 - JNA 5.6.0 does not support arm64

2023-09-28 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-18891:
---
Description: 
As discussed on Slack: 
[https://the-asf.slack.com/archives/CJZLTM05A/p1684745250901489]

Created this ticket as clone of CASSANDRA-17019, to ask for considering a JNA 
library upgrade in Cassandra 4.0, so that we could utilize ARM-based AWS 
Graviton instances (e.g. m7g) already with Cassandra 4.0.

From linked ticket:
"Cassandra depends on net.java.dev.jna.jna version 5.6.0 to do the native 
binding into the C library. JNA 5.6.0 does not support arm64 architecture 
(Apple M1 devices), causing cassandra to fail on bootstrap."

  was:
As discussed on Slack: 
[https://the-asf.slack.com/archives/CJZLTM05A/p1684745250901489]

Created this ticket as clone of CASSANDRA-17019, to ask for considering a JNA 
library upgrade in Cassandra 4.0, so that we could utilize ARM-based AWS 
Graviton instances (e.g. m7g) already with Cassandra 4.0.

Cassandra depends on net.java.dev.jna.jna version 5.6.0 to do the native 
binding into the C library.

JNA 5.6.0 does not support arm64 architecture (Apple M1 devices), causing 
cassandra to fail on bootstrap.


> Cassandra 4.0 - JNA 5.6.0 does not support arm64
> 
>
> Key: CASSANDRA-18891
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18891
> Project: Cassandra
>  Issue Type: Bug
>  Components: Dependencies
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 4.0.x
>
>
> As discussed on Slack: 
> [https://the-asf.slack.com/archives/CJZLTM05A/p1684745250901489]
> Created this ticket as clone of CASSANDRA-17019, to ask for considering a JNA 
> library upgrade in Cassandra 4.0, so that we could utilize ARM-based AWS 
> Graviton instances (e.g. m7g) already with Cassandra 4.0.
> From linked ticket:
> "Cassandra depends on net.java.dev.jna.jna version 5.6.0 to do the native 
> binding into the C library. JNA 5.6.0 does not support arm64 architecture 
> (Apple M1 devices), causing cassandra to fail on bootstrap."






[jira] [Updated] (CASSANDRA-18891) Cassandra 4.0 - JNA 5.6.0 does not support arm64

2023-09-28 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-18891:
---
Description: 
As discussed on Slack: 
[https://the-asf.slack.com/archives/CJZLTM05A/p1684745250901489]

Created this ticket as clone of CASSANDRA-17019, to ask for considering a JNA 
library upgrade in Cassandra 4.0, so that we could utilize ARM-based AWS 
Graviton instances (e.g. m7g) already with Cassandra 4.0.

Cassandra depends on net.java.dev.jna.jna version 5.6.0 to do the native 
binding into the C library.

JNA 5.6.0 does not support arm64 architecture (Apple M1 devices), causing 
cassandra to fail on bootstrap.

  was:
Cassandra depends on net.java.dev.jna.jna version 5.6.0 to do the native 
binding into the C library.

JNA 5.6.0 does not support arm64 architecture (Apple M1 devices), causing 
cassandra to fail on bootstrap.
 Bumping the dependency to 5.9.0 adds arm64 support. Will a PR to bump the 
dependency be acceptable ?


> Cassandra 4.0 - JNA 5.6.0 does not support arm64
> 
>
> Key: CASSANDRA-18891
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18891
> Project: Cassandra
>  Issue Type: Bug
>  Components: Dependencies
>Reporter: Thomas Steinmaurer
>Priority: Normal
>
> As discussed on Slack: 
> [https://the-asf.slack.com/archives/CJZLTM05A/p1684745250901489]
> Created this ticket as clone of CASSANDRA-17019, to ask for considering a JNA 
> library upgrade in Cassandra 4.0, so that we could utilize ARM-based AWS 
> Graviton instances (e.g. m7g) already with Cassandra 4.0.
> Cassandra depends on net.java.dev.jna.jna version 5.6.0 to do the native 
> binding into the C library.
> JNA 5.6.0 does not support arm64 architecture (Apple M1 devices), causing 
> cassandra to fail on bootstrap.






[jira] [Updated] (CASSANDRA-18891) Cassandra 4.0 - JNA 5.6.0 does not support arm64

2023-09-28 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-18891:
---
Fix Version/s: 4.0.x

> Cassandra 4.0 - JNA 5.6.0 does not support arm64
> 
>
> Key: CASSANDRA-18891
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18891
> Project: Cassandra
>  Issue Type: Bug
>  Components: Dependencies
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 4.0.x
>
>
> As discussed on Slack: 
> [https://the-asf.slack.com/archives/CJZLTM05A/p1684745250901489]
> Created this ticket as clone of CASSANDRA-17019, to ask for considering a JNA 
> library upgrade in Cassandra 4.0, so that we could utilize ARM-based AWS 
> Graviton instances (e.g. m7g) already with Cassandra 4.0.
> Cassandra depends on net.java.dev.jna.jna version 5.6.0 to do the native 
> binding into the C library.
> JNA 5.6.0 does not support arm64 architecture (Apple M1 devices), causing 
> cassandra to fail on bootstrap.






[jira] [Updated] (CASSANDRA-18891) Cassandra 4.0 - JNA 5.6.0 does not support arm64

2023-09-28 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-18891:
---
Fix Version/s: (was: 4.1-alpha1)

> Cassandra 4.0 - JNA 5.6.0 does not support arm64
> 
>
> Key: CASSANDRA-18891
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18891
> Project: Cassandra
>  Issue Type: Bug
>  Components: Dependencies
>Reporter: Thomas Steinmaurer
>Priority: Normal
>
> Cassandra depends on net.java.dev.jna.jna version 5.6.0 to do the native 
> binding into the C library.
> JNA 5.6.0 does not support arm64 architecture (Apple M1 devices), causing 
> cassandra to fail on bootstrap.
>  Bumping the dependency to 5.9.0 adds arm64 support. Will a PR to bump the 
> dependency be acceptable ?






[jira] [Assigned] (CASSANDRA-18891) Cassandra 4.0 - JNA 5.6.0 does not support arm64

2023-09-28 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer reassigned CASSANDRA-18891:
--

Assignee: (was: Yuqi Gu)

> Cassandra 4.0 - JNA 5.6.0 does not support arm64
> 
>
> Key: CASSANDRA-18891
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18891
> Project: Cassandra
>  Issue Type: Bug
>  Components: Dependencies
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 4.1-alpha1
>
>
> Cassandra depends on net.java.dev.jna.jna version 5.6.0 to do the native 
> binding into the C library.
> JNA 5.6.0 does not support arm64 architecture (Apple M1 devices), causing 
> cassandra to fail on bootstrap.
>  Bumping the dependency to 5.9.0 adds arm64 support. Will a PR to bump the 
> dependency be acceptable ?






[jira] [Created] (CASSANDRA-18891) Cassandra 4.0 - JNA 5.6.0 does not support arm64

2023-09-28 Thread Thomas Steinmaurer (Jira)
Thomas Steinmaurer created CASSANDRA-18891:
--

 Summary: Cassandra 4.0 - JNA 5.6.0 does not support arm64
 Key: CASSANDRA-18891
 URL: https://issues.apache.org/jira/browse/CASSANDRA-18891
 Project: Cassandra
  Issue Type: Bug
  Components: Dependencies
Reporter: Thomas Steinmaurer
Assignee: Yuqi Gu
 Fix For: 4.1-alpha1


Cassandra depends on net.java.dev.jna.jna version 5.6.0 to do the native 
binding into the C library.

JNA 5.6.0 does not support arm64 architecture (Apple M1 devices), causing 
cassandra to fail on bootstrap.
 Bumping the dependency to 5.9.0 adds arm64 support. Will a PR to bump the 
dependency be acceptable ?
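For anyone triaging this locally, the failure is easy to reproduce outside Cassandra: if the bundled JNA has no native stub for the current architecture, the first Native.load() call typically fails with an UnsatisfiedLinkError, which is what then surfaces as the bootstrap failure. A minimal, stand-alone check (hypothetical class name and libc binding chosen for illustration; only JNA on the classpath is assumed):

{code:java}
import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.Platform;

// Hypothetical stand-alone check: prints the platform JNA resolves and forces
// the native dispatch library to load. On a JNA build without support for the
// current architecture this typically fails with UnsatisfiedLinkError.
public class JnaArchCheck
{
    // Minimal libc binding, just enough to force JNA's native stub to load.
    interface CLib extends Library
    {
        int getpid();
    }

    public static void main(String[] args)
    {
        System.out.println("os.arch             = " + System.getProperty("os.arch"));
        System.out.println("JNA resource prefix = " + Platform.RESOURCE_PREFIX);
        CLib libc = Native.load("c", CLib.class);
        System.out.println("getpid() via JNA    = " + libc.getpid());
    }
}
{code}

Running this once against jna 5.6.0 and once against 5.9.0+ on an arm64 machine should show the difference this ticket is about.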






[jira] [Updated] (CASSANDRA-18891) Cassandra 4.0 - JNA 5.6.0 does not support arm64

2023-09-28 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-18891:
---
Source Control Link:   (was: 
https://github.com/apache/cassandra/commit/2043cb9fb6b25ff34afb90467b9476a09acc3933)

> Cassandra 4.0 - JNA 5.6.0 does not support arm64
> 
>
> Key: CASSANDRA-18891
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18891
> Project: Cassandra
>  Issue Type: Bug
>  Components: Dependencies
>Reporter: Thomas Steinmaurer
>Priority: Normal
>
> Cassandra depends on net.java.dev.jna.jna version 5.6.0 to do the native 
> binding into the C library.
> JNA 5.6.0 does not support arm64 architecture (Apple M1 devices), causing 
> cassandra to fail on bootstrap.
>  Bumping the dependency to 5.9.0 adds arm64 support. Will a PR to bump the 
> dependency be acceptable ?






[jira] [Commented] (CASSANDRA-16555) Add support for AWS Ec2 IMDSv2

2023-06-29 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17738372#comment-17738372
 ] 

Thomas Steinmaurer commented on CASSANDRA-16555:


Thanks a lot for driving that forward!

Quick question though, without having checked the 3.11 PR in detail: in case 
IMDSv2 fails for whatever reason (v2 apparently being the new default), e.g. 
also while refreshing the token, will there be a silent fallback to v1 (the old 
behavior) plus e.g. a log entry in cassandra.log, so that Cassandra remains 
operational? Just a thought, as the default changes when going from pre-3.11.16 
to 3.11.16. Thanks a lot.
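For context, the IMDSv2 flow this question refers to is a two-step exchange: first PUT a token request with a TTL header, then present that token on every metadata GET. A rough sketch of the kind of fallback being asked about (illustrative only; the class and method names are made up here and this is not the actual Cassandra patch):

{code:java}
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Illustrative sketch of an IMDSv2 metadata call with a logged fallback to IMDSv1.
public class Ec2MetadataSketch
{
    private static final String BASE = "http://169.254.169.254/latest";
    private static final HttpClient CLIENT =
            HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(2)).build();

    static String fetch(String path) throws Exception
    {
        try
        {
            // IMDSv2: request a session token first ...
            HttpRequest tokenReq = HttpRequest.newBuilder(URI.create(BASE + "/api/token"))
                    .header("X-aws-ec2-metadata-token-ttl-seconds", "21600")
                    .PUT(HttpRequest.BodyPublishers.noBody())
                    .build();
            String token = CLIENT.send(tokenReq, HttpResponse.BodyHandlers.ofString()).body();

            // ... then present it on the actual metadata request.
            HttpRequest req = HttpRequest.newBuilder(URI.create(BASE + "/meta-data/" + path))
                    .header("X-aws-ec2-metadata-token", token)
                    .GET().build();
            return CLIENT.send(req, HttpResponse.BodyHandlers.ofString()).body();
        }
        catch (Exception e)
        {
            // The fallback asked about above: log it and retry without a token (IMDSv1).
            System.err.println("IMDSv2 failed (" + e + "), falling back to IMDSv1");
            HttpRequest req = HttpRequest.newBuilder(URI.create(BASE + "/meta-data/" + path))
                    .GET().build();
            return CLIENT.send(req, HttpResponse.BodyHandlers.ofString()).body();
        }
    }

    public static void main(String[] args) throws Exception
    {
        System.out.println(fetch("placement/availability-zone"));
    }
}
{code}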

> Add support  for AWS Ec2 IMDSv2
> ---
>
> Key: CASSANDRA-16555
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16555
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Consistency/Coordination
>Reporter: Paul Rütter (BlueConic)
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 3.0.30, 3.11.16, 4.0.11, 4.1.3, 5.0
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> In order to patch a vulnerability, Amazon came up with a new version of their 
> metadata service.
> It's no longer unrestricted but now requires a token (in a header), in order 
> to access the metadata service.
> See 
> [https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html]
>  for more information.
> Cassandra currently doesn't offer an out-of-the-box snitch class to support 
> this.
> See 
> [https://cassandra.apache.org/doc/latest/operating/snitch.html#snitch-classes]
> This issue asks to add support for this as a separate snitch class.
> We'll probably do a PR for this, as we are in the process of developing one.






[jira] [Commented] (CASSANDRA-16555) Add out-of-the-box snitch for Ec2 IMDSv2

2023-05-02 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718590#comment-17718590
 ] 

Thomas Steinmaurer commented on CASSANDRA-16555:


As there was a "VOTE on 3.11.15" sent out today on the dev mailing list, I guess 
this addition won't make it into 3.11.15.

> Add out-of-the-box snitch for Ec2 IMDSv2
> 
>
> Key: CASSANDRA-16555
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16555
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Consistency/Coordination
>Reporter: Paul Rütter (BlueConic)
>Assignee: fulco taen
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.x
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In order to patch a vulnerability, Amazon came up with a new version of their 
> metadata service.
> It's no longer unrestricted but now requires a token (in a header), in order 
> to access the metadata service.
> See 
> [https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html]
>  for more information.
> Cassandra currently doesn't offer an out-of-the-box snitch class to support 
> this.
> See 
> [https://cassandra.apache.org/doc/latest/operating/snitch.html#snitch-classes]
> This issue asks to add support for this as a separate snitch class.
> We'll probably do a PR for this, as we are in the process of developing one.






[jira] [Comment Edited] (CASSANDRA-18169) Warning at startup in 3.11.11 or above version of Cassandra

2023-01-17 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677731#comment-17677731
 ] 

Thomas Steinmaurer edited comment on CASSANDRA-18169 at 1/17/23 12:13 PM:
--

We have seen that in the past as well, where this WARN log produced a bit of 
confusion after upgrading to 3.11.11+. 
https://issues.apache.org/jira/browse/CASSANDRA-16619?focusedCommentId=17441530=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17441530

I may be wrong, but this is perhaps related to the "md" => "me" SSTable format 
upgrade in 3.11.11+, when 3.11.11+ reads "md" files upon startup.


was (Author: tsteinmaurer):
We have seen that in the past as well, where this WARN log produced a bit of 
confusion after upgrading to 3.11.11+. 
https://issues.apache.org/jira/browse/CASSANDRA-16619?focusedCommentId=17441530=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17441530

I may be wrong, but this is perhaps related to the "md" => "me" SSTable format 
upgrade in 3.11.11+.

> Warning at startup in 3.11.11 or above version of Cassandra
> ---
>
> Key: CASSANDRA-18169
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18169
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Mohammad Aburadeh
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.11.15
>
>
> We are seeing the following warning in Cassandra 3.11.11/14 at startup : 
> {code:java}
> WARN  [main] 2022-12-27 16:41:28,016 CommitLogReplayer.java:253 - Origin of 2 
> sstables is unknown or doesn't match the local node; commitLogIntervals for 
> them were ignored
> DEBUG [main] 2022-12-27 16:41:28,016 CommitLogReplayer.java:254 - Ignored 
> commitLogIntervals from the following sstables: 
> [/data/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/me-65-big-Data.db,
>  
> /data/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/me-64-big-Data.db]
>  {code}
> It looks like HostID metadata is missing at startup in the system.local 
> table. 
> We noticed that this issue does not exist in the 4.0.X version of Cassandra. 
> Could you please fix it in 3.11.X Cassandra? 






[jira] [Commented] (CASSANDRA-18169) Warning at startup in 3.11.11 or above version of Cassandra

2023-01-17 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677731#comment-17677731
 ] 

Thomas Steinmaurer commented on CASSANDRA-18169:


We have seen that in the past as well, where this WARN log produced a bit of 
confusion after upgrading to 3.11.11+. 
https://issues.apache.org/jira/browse/CASSANDRA-16619?focusedCommentId=17441530=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17441530

I may be wrong, but this is perhaps related to the "md" => "me" SSTable format 
upgrade in 3.11.11+.

> Warning at startup in 3.11.11 or above version of Cassandra
> ---
>
> Key: CASSANDRA-18169
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18169
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Mohammad Aburadeh
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.11.15
>
>
> We are seeing the following warning in Cassandra 3.11.11/14 at startup : 
> {code:java}
> WARN  [main] 2022-12-27 16:41:28,016 CommitLogReplayer.java:253 - Origin of 2 
> sstables is unknown or doesn't match the local node; commitLogIntervals for 
> them were ignored
> DEBUG [main] 2022-12-27 16:41:28,016 CommitLogReplayer.java:254 - Ignored 
> commitLogIntervals from the following sstables: 
> [/data/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/me-65-big-Data.db,
>  
> /data/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/me-64-big-Data.db]
>  {code}
> It looks like HostID metadata is missing at startup in the system.local 
> table. 
> We noticed that this issue does not exist in the 4.0.X version of Cassandra. 
> Could you please fix it in 3.11.X Cassandra? 






[jira] [Comment Edited] (CASSANDRA-16555) Add out-of-the-box snitch for Ec2 IMDSv2

2022-12-16 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17648566#comment-17648566
 ] 

Thomas Steinmaurer edited comment on CASSANDRA-16555 at 12/16/22 10:55 AM:
---

I wonder if the dedicated/separate snitch implementation is the best way to 
move forward, if I may challenge that :).

Perhaps it would make more sense to extend the existing {{Ec2Snitch}} 
implementation to make it configurable for IMDSv2, or even smarter in a way 
that it first automatically detects what is available on the EC2 instance and 
then simply uses that behind the scenes?


was (Author: tsteinmaurer):
I wonder if the dedicated/separate is the best way to move forward, if I may 
challenge that :).

Perhaps it would make more sense to extend the existing \{{Ec2Snitch}} 
implementation to make it configurable for IMDSv2, or even smarter in a way 
that it first automatically detects what is available on the EC2 instance and 
then simply uses that behind the scenes?

> Add out-of-the-box snitch for Ec2 IMDSv2
> 
>
> Key: CASSANDRA-16555
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16555
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Consistency/Coordination
>Reporter: Paul Rütter (BlueConic)
>Assignee: fulco taen
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 4.x
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In order to patch a vulnerability, Amazon came up with a new version of their 
> metadata service.
> It's no longer unrestricted but now requires a token (in a header), in order 
> to access the metadata service.
> See 
> [https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html]
>  for more information.
> Cassandra currently doesn't offer an out-of-the-box snitch class to support 
> this.
> See 
> [https://cassandra.apache.org/doc/latest/operating/snitch.html#snitch-classes]
> This issue asks to add support for this as a separate snitch class.
> We'll probably do a PR for this, as we are in the process of developing one.






[jira] [Commented] (CASSANDRA-16555) Add out-of-the-box snitch for Ec2 IMDSv2

2022-12-16 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17648566#comment-17648566
 ] 

Thomas Steinmaurer commented on CASSANDRA-16555:


I wonder if the dedicated/separate is the best way to move forward, if I may 
challenge that :).

Perhaps it would make more sense to extend the existing \{{Ec2Snitch}} 
implementation to make it configurable for IMDSv2, or even smarter in a way 
that it first automatically detects what is available on the EC2 instance and 
then simply uses that behind the scenes?

> Add out-of-the-box snitch for Ec2 IMDSv2
> 
>
> Key: CASSANDRA-16555
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16555
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Consistency/Coordination
>Reporter: Paul Rütter (BlueConic)
>Assignee: fulco taen
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 4.x
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In order to patch a vulnerability, Amazon came up with a new version of their 
> metadata service.
> It's no longer unrestricted but now requires a token (in a header), in order 
> to access the metadata service.
> See 
> [https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html]
>  for more information.
> Cassandra currently doesn't offer an out-of-the-box snitch class to support 
> this.
> See 
> [https://cassandra.apache.org/doc/latest/operating/snitch.html#snitch-classes]
> This issue asks to add support for this as a separate snitch class.
> We'll probably do a PR for this, as we are in the process of developing one.






[jira] [Comment Edited] (CASSANDRA-9881) Rows with negative-sized keys can't be skipped by sstablescrub

2022-12-14 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17647675#comment-17647675
 ] 

Thomas Steinmaurer edited comment on CASSANDRA-9881 at 12/15/22 6:12 AM:
-

Interesting, I'm also getting this with 3.11.14 while trying to scrub a single 
Cassandra table where a single physical SSTable on disk seems to be broken. It 
appears to be stuck in an infinite loop with the same log line as shown above, 
and there is no progress according to "nodetool compactionstats" etc.
{noformat}
WARN  [CompactionExecutor:3252] 2022-12-14 19:29:32,206 UTC 
OutputHandler.java:57 - Error reading partition (unreadable key) (stacktrace 
follows):
java.io.IOError: java.io.IOException: Unable to read partition key from data 
file
at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:222)
at 
org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:1052)
at 
org.apache.cassandra.db.compaction.CompactionManager.access$200(CompactionManager.java:86)
at 
org.apache.cassandra.db.compaction.CompactionManager$3.execute(CompactionManager.java:399)
at 
org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:319)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Unable to read partition key from data file
{noformat}


was (Author: tsteinmaurer):
Interesting, I'm also getting this with 3.11.14 while trying to scrub a single 
Cassandra table where a single physical SSTable on disk seems to be broken. It 
appears to be stuck in an infinite loop with the same log line as shown above, 
and there is no progress according to "nodetool compactionstats" etc.

> Rows with negative-sized keys can't be skipped by sstablescrub
> --
>
> Key: CASSANDRA-9881
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9881
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Brandon Williams
>Priority: Low
> Fix For: 2.1.x
>
>
> It is possible to have corruption in such a way that scrub (on or offline) 
> can't skip the row, so you end up in a loop where this just keeps repeating:
> {noformat}
> WARNING: Row starting at position 2087453 is unreadable; skipping to next 
> Reading row at 2087453 
> row (unreadable key) is -1 bytes
> {noformat}
> The workaround is to just delete the problem sstable since you were going to 
> have to repair anyway, but it would still be nice to salvage the rest of the 
> sstable.






[jira] [Commented] (CASSANDRA-9881) Rows with negative-sized keys can't be skipped by sstablescrub

2022-12-14 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17647675#comment-17647675
 ] 

Thomas Steinmaurer commented on CASSANDRA-9881:
---

Interesting, I'm also getting this with 3.11.14 while trying to scrub a single 
Cassandra table where a single physical SSTable on disk seems to be broken. It 
appears to be stuck in an infinite loop with the same log line as shown above, 
and there is no progress according to "nodetool compactionstats" etc.

> Rows with negative-sized keys can't be skipped by sstablescrub
> --
>
> Key: CASSANDRA-9881
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9881
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Brandon Williams
>Priority: Low
> Fix For: 2.1.x
>
>
> It is possible to have corruption in such a way that scrub (on or offline) 
> can't skip the row, so you end up in a loop where this just keeps repeating:
> {noformat}
> WARNING: Row starting at position 2087453 is unreadable; skipping to next 
> Reading row at 2087453 
> row (unreadable key) is -1 bytes
> {noformat}
> The workaround is to just delete the problem sstable since you were going to 
> have to repair anyway, but it would still be nice to salvage the rest of the 
> sstable.






[jira] [Commented] (CASSANDRA-16555) Add out-of-the-box snitch for Ec2 IMDSv2

2022-12-08 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644718#comment-17644718
 ] 

Thomas Steinmaurer commented on CASSANDRA-16555:


[~brandon.williams] many thanks for picking this up! As there are PRs available 
now, how realistic would it be that this goes into the not yet released 
3.11.15? Again, thanks a lot!

> Add out-of-the-box snitch for Ec2 IMDSv2
> 
>
> Key: CASSANDRA-16555
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16555
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Consistency/Coordination
>Reporter: Paul Rütter (BlueConic)
>Assignee: fulco taen
>Priority: Normal
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In order to patch a vulnerability, Amazon came up with a new version of their 
> metadata service.
> It's no longer unrestricted but now requires a token (in a header), in order 
> to access the metadata service.
> See 
> [https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html]
>  for more information.
> Cassandra currently doesn't offer an out-of-the-box snitch class to support 
> this.
> See 
> [https://cassandra.apache.org/doc/latest/operating/snitch.html#snitch-classes]
> This issue asks to add support for this as a separate snitch class.
> We'll probably do a PR for this, as we are in the process of developing one.






[jira] [Commented] (CASSANDRA-17840) IndexOutOfBoundsException in Paging State Version Inference (V3 State Received on V4 Connection)

2022-09-05 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600581#comment-17600581
 ] 

Thomas Steinmaurer commented on CASSANDRA-17840:


Any chance this is similar, also fixes CASSANDRA-17507?

> IndexOutOfBoundsException in Paging State Version Inference (V3 State 
> Received on V4 Connection)
> 
>
> Key: CASSANDRA-17840
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17840
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Josh McKenzie
>Assignee: Josh McKenzie
>Priority: Normal
> Fix For: 3.11.14, 4.0.6, 4.1, 4.2
>
>
> In {{PagingState.java}}, {{index}} is an integer field, and we add long 
> values to it without a {{Math.toIntExact}} check. While we’re checking for 
> negative return values returned by {{getUnsignedVInt}}, there's a chance that 
> the value returned by it is so large that addition operation would cause 
> integer overflow, or the value itself is large enough to cause overflow.
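The overflow concern in the quoted description is plain Java arithmetic, independent of the paging-state layout. A minimal illustration (variable names are made up; this is not the PagingState code):

{code:java}
// Minimal illustration of the hazard described above: adding an unchecked long
// (e.g. a value decoded as an unsigned vint) to an int index silently wraps,
// while Math.toIntExact turns the same situation into an explicit error.
public class IndexOverflowDemo
{
    public static void main(String[] args)
    {
        int index = 10;
        long decodedLength = 4_294_967_296L;       // 2^32, a plausible result of misreading a vint

        int wrapped = index + (int) decodedLength; // silent truncation: result is still 10
        System.out.println("unchecked cast  : " + wrapped);

        try
        {
            int checked = Math.toIntExact(index + decodedLength);
            System.out.println("Math.toIntExact : " + checked);
        }
        catch (ArithmeticException e)
        {
            System.out.println("Math.toIntExact : " + e); // integer overflow, surfaced explicitly
        }
    }
}
{code}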






[jira] [Commented] (CASSANDRA-17507) IllegalArgumentException in query code path during 3.11.12 => 4.0.3 rolling upgrade

2022-04-03 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516623#comment-17516623
 ] 

Thomas Steinmaurer commented on CASSANDRA-17507:


According to [https://the-asf.slack.com/archives/CJZLTM05A/p1648727883515419], 
this is not known; it is possibly a bug causing queries to fail during the 
rolling upgrade, which is why I have opened this ticket.

> IllegalArgumentException in query code path during 3.11.12 => 4.0.3 rolling 
> upgrade
> ---
>
> Key: CASSANDRA-17507
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17507
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 4.0.x
>
>
> In a 6-node 3.11.12 test cluster - freshly set up, thus no legacy SSTables 
> etc. - with ~1 TB of SSTables on disk per node, I have been running a rolling 
> upgrade to 4.0.3. On upgraded 4.0.3 nodes I then saw the following exception 
> regularly, which disappeared once all 6 nodes were on 4.0.3. Is this known? 
> Can it be ignored? As said, this was just a test drive, but I'm not sure we 
> want that in production, especially with a larger number of nodes, where it 
> could take some time until all are upgraded. Thanks!
> {code}
> ERROR [Native-Transport-Requests-8] 2022-03-30 11:30:24,057 
> ErrorMessage.java:457 - Unexpected exception during request
> java.lang.IllegalArgumentException: newLimit > capacity: (290 > 15)
> at java.base/java.nio.Buffer.createLimitException(Buffer.java:372)
> at java.base/java.nio.Buffer.limit(Buffer.java:346)
> at java.base/java.nio.ByteBuffer.limit(ByteBuffer.java:1107)
> at java.base/java.nio.ByteBuffer.limit(ByteBuffer.java:262)
> at 
> org.apache.cassandra.db.marshal.ByteBufferAccessor.slice(ByteBufferAccessor.java:107)
> at 
> org.apache.cassandra.db.marshal.ByteBufferAccessor.slice(ByteBufferAccessor.java:39)
> at 
> org.apache.cassandra.db.marshal.ValueAccessor.sliceWithShortLength(ValueAccessor.java:225)
> at 
> org.apache.cassandra.db.marshal.CompositeType.splitName(CompositeType.java:222)
> at 
> org.apache.cassandra.service.pager.PagingState$RowMark.decodeClustering(PagingState.java:434)
> at 
> org.apache.cassandra.service.pager.PagingState$RowMark.clustering(PagingState.java:388)
> at 
> org.apache.cassandra.service.pager.SinglePartitionPager.nextPageReadQuery(SinglePartitionPager.java:88)
> at 
> org.apache.cassandra.service.pager.SinglePartitionPager.nextPageReadQuery(SinglePartitionPager.java:32)
> at 
> org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(AbstractQueryPager.java:69)
> at 
> org.apache.cassandra.service.pager.SinglePartitionPager.fetchPage(SinglePartitionPager.java:32)
> at 
> org.apache.cassandra.cql3.statements.SelectStatement$Pager$NormalPager.fetchPage(SelectStatement.java:352)
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:400)
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:250)
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:88)
> at 
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:244)
> at 
> org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:723)
> at 
> org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:701)
> at 
> org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:159)
> at 
> org.apache.cassandra.transport.Message$Request.execute(Message.java:242)
> at 
> org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:86)
> at 
> org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:106)
> at 
> org.apache.cassandra.transport.Dispatcher.lambda$dispatch$0(Dispatcher.java:70)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
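The "newLimit > capacity: (290 > 15)" message in the trace above comes straight from java.nio.Buffer: a slice is asked to extend past the bytes the buffer actually holds, which is consistent with a paging state written under one version's format being decoded with another version's assumptions. A two-line reproduction of just that exception (the numbers are taken from the log, everything else is illustrative):

{code:java}
import java.nio.ByteBuffer;

// Reproduces only the exception text seen in the stack trace above:
// asking a 15-byte buffer for a 290-byte limit is rejected by java.nio.Buffer.
public class LimitDemo
{
    public static void main(String[] args)
    {
        ByteBuffer buf = ByteBuffer.allocate(15);
        buf.limit(290); // java.lang.IllegalArgumentException: newLimit > capacity: (290 > 15)
    }
}
{code}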




[jira] [Created] (CASSANDRA-17507) IllegalArgumentException in query code path during 3.11.12 => 4.0.3 rolling upgrade

2022-03-31 Thread Thomas Steinmaurer (Jira)
Thomas Steinmaurer created CASSANDRA-17507:
--

 Summary: IllegalArgumentException in query code path during 
3.11.12 => 4.0.3 rolling upgrade
 Key: CASSANDRA-17507
 URL: https://issues.apache.org/jira/browse/CASSANDRA-17507
 Project: Cassandra
  Issue Type: Bug
Reporter: Thomas Steinmaurer


In a 6-node 3.11.12 test cluster - freshly set up, thus no legacy SSTables etc. 
- with ~1 TB of SSTables on disk per node, I have been running a rolling upgrade 
to 4.0.3. On upgraded 4.0.3 nodes I then saw the following exception regularly, 
which disappeared once all 6 nodes were on 4.0.3. Is this known? Can it be 
ignored? As said, this was just a test drive, but I'm not sure we want that in 
production, especially with a larger number of nodes, where it could take some 
time until all are upgraded. Thanks!
{code}
ERROR [Native-Transport-Requests-8] 2022-03-30 11:30:24,057 
ErrorMessage.java:457 - Unexpected exception during request
java.lang.IllegalArgumentException: newLimit > capacity: (290 > 15)
at java.base/java.nio.Buffer.createLimitException(Buffer.java:372)
at java.base/java.nio.Buffer.limit(Buffer.java:346)
at java.base/java.nio.ByteBuffer.limit(ByteBuffer.java:1107)
at java.base/java.nio.ByteBuffer.limit(ByteBuffer.java:262)
at 
org.apache.cassandra.db.marshal.ByteBufferAccessor.slice(ByteBufferAccessor.java:107)
at 
org.apache.cassandra.db.marshal.ByteBufferAccessor.slice(ByteBufferAccessor.java:39)
at 
org.apache.cassandra.db.marshal.ValueAccessor.sliceWithShortLength(ValueAccessor.java:225)
at 
org.apache.cassandra.db.marshal.CompositeType.splitName(CompositeType.java:222)
at 
org.apache.cassandra.service.pager.PagingState$RowMark.decodeClustering(PagingState.java:434)
at 
org.apache.cassandra.service.pager.PagingState$RowMark.clustering(PagingState.java:388)
at 
org.apache.cassandra.service.pager.SinglePartitionPager.nextPageReadQuery(SinglePartitionPager.java:88)
at 
org.apache.cassandra.service.pager.SinglePartitionPager.nextPageReadQuery(SinglePartitionPager.java:32)
at 
org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(AbstractQueryPager.java:69)
at 
org.apache.cassandra.service.pager.SinglePartitionPager.fetchPage(SinglePartitionPager.java:32)
at 
org.apache.cassandra.cql3.statements.SelectStatement$Pager$NormalPager.fetchPage(SelectStatement.java:352)
at 
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:400)
at 
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:250)
at 
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:88)
at 
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:244)
at 
org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:723)
at 
org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:701)
at 
org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:159)
at 
org.apache.cassandra.transport.Message$Request.execute(Message.java:242)
at 
org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:86)
at 
org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:106)
at 
org.apache.cassandra.transport.Dispatcher.lambda$dispatch$0(Dispatcher.java:70)
at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165)
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
{code}






[jira] [Commented] (CASSANDRA-17204) Upgrade to Logback 1.2.8 (security)

2021-12-22 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463855#comment-17463855
 ] 

Thomas Steinmaurer commented on CASSANDRA-17204:


Should 1.2.9 perhaps be used?

> Upgrade to Logback 1.2.8 (security)
> ---
>
> Key: CASSANDRA-17204
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17204
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Dependencies
>Reporter: Jochen Schalanda
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x, 4.x
>
>
> Logback 1.2.8 has been released with a fix for a potential vulnerability in 
> its JNDI lookup.
>  * [http://logback.qos.ch/news.html]
>  * [https://jira.qos.ch/browse/LOGBACK-1591]
> {quote}*14th of December, 2021, Release of version 1.2.8*
> We note that the vulnerability mentioned in LOGBACK-1591 requires write 
> access to logback's configuration file as a prerequisite.
> * In response to LOGBACK-1591, we have disabled all JNDI lookup code in 
> logback until further notice. This impacts {{ContextJNDISelector}} and the 
> {{insertFromJNDI}} element in configuration files.
> * Also in response to LOGBACK-1591, we have removed all database (JDBC) 
> related code in the project with no replacement.
> We note that the vulnerability mentioned in LOGBACK-1591 requires write 
> access to logback's configuration file as a prerequisite. A successful RCE 
> requires all of the following to be true:
> * write access to logback.xml
> * use of versions < 1.2.8
> * reloading of poisoned configuration data, which implies application restart 
> or scan="true" set prior to attack
> Therefore and as an additional precaution, in addition to upgrading to 
> version 1.2.8, we also recommend users to set their logback configuration 
> files as read-only.
> {quote}
> This is not as bad as CVE-2021-44228 in Log4j <2.15.0 (Log4Shell), but should 
> probably be fixed anyway.






[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-11-09 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17441530#comment-17441530
 ] 

Thomas Steinmaurer commented on CASSANDRA-16619:


Regarding the WARN log, which got introduced by that ticket, e.g.:
{noformat}
WARN  [main] 2021-11-08 21:54:06,826 CommitLogReplayer.java:253 - Origin of 1 
sstables is unknown or doesn't match the local node; commitLogIntervals for 
them were ignored
{noformat}

While I understand the intention to guard against SSTables that have been 
copied around (or e.g. restored), the WARN log also seems to happen when 
Cassandra 3.11.11 reads pre-"*me*" SSTables, e.g. from 3.11.10. I understand 
that the WARN log will go away eventually on its own, or at the latest (I 
guess?) after running "nodetool upgradesstables".

This sort of WARN log has produced quite some confusion and customer 
interaction for on-premise installations.
* Would it be possible to WARN only in the context of a "me" SSTable, to avoid 
confusion after upgrading from pre-3.11.11?
* Would it be possible to mention the SSTable minor format upgrade in e.g. 
{{NEWS.txt}} (or perhaps I missed it), as there might be tooling out there 
which counts the number of SSTables per "format" via file name (a sketch of 
such a count follows below)?

Many thanks.
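On the second bullet (the sketch mentioned there): the kind of external tooling meant here just groups data files by the version prefix in their names ("md", "me", ...). A hypothetical example; the path below is only the one from the CommitLogReplayer log quoted earlier in this thread, and nothing here ships with Cassandra:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper: count SSTable data files per format version ("md", "me", ...)
// by looking at the filename prefix, e.g. me-65-big-Data.db.
public class SSTableFormatCount
{
    public static void main(String[] args) throws IOException
    {
        Path tableDir = Path.of(args.length > 0 ? args[0]
                : "/data/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377");
        Map<String, Long> byVersion = new TreeMap<>();
        try (Stream<Path> files = Files.list(tableDir))
        {
            files.map(p -> p.getFileName().toString())
                 .filter(n -> n.endsWith("-Data.db"))
                 .forEach(n -> byVersion.merge(n.substring(0, n.indexOf('-')), 1L, Long::sum));
        }
        byVersion.forEach((v, c) -> System.out.println(v + ": " + c + " sstable(s)"));
    }
}
{code}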

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]
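The replay-time rule described in the quoted solution boils down to a single guard. A schematic sketch of that check (field and method names are paraphrased here; this is not the actual CommitLogReplayer code):

{code:java}
import java.util.UUID;

// Schematic version of the guard described above: commit log intervals from an
// sstable only count when the sstable was created by this very node.
public class ReplayFilterSketch
{
    record SSTableStats(UUID originatingHostId, String name) {}

    static boolean useCommitLogIntervals(SSTableStats stats, UUID localHostId)
    {
        // Unknown origin (older sstable format) or a different node: ignore its
        // intervals, which is exactly the situation the WARN line at startup reports.
        return stats.originatingHostId() != null
            && stats.originatingHostId().equals(localHostId);
    }

    public static void main(String[] args)
    {
        UUID local = UUID.randomUUID();
        System.out.println(useCommitLogIntervals(new SSTableStats(local, "me-65-big-Data.db"), local));             // true
        System.out.println(useCommitLogIntervals(new SSTableStats(UUID.randomUUID(), "me-64-big-Data.db"), local)); // false -> ignored
        System.out.println(useCommitLogIntervals(new SSTableStats(null, "md-12-big-Data.db"), local));              // false -> ignored
    }
}
{code}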






[jira] [Commented] (CASSANDRA-11418) Nodetool status should reflect hibernate/replacing states

2021-08-03 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17392008#comment-17392008
 ] 

Thomas Steinmaurer commented on CASSANDRA-11418:


As 4.0 has been released, is this something that could be picked up in the 
near future, perhaps even for the 3.11 series?

The reason is that showing a node in normal state when using replace_address is 
not only confusing for operators, but especially for any automation/monitoring 
tooling built around a Cassandra cluster.

Additionally, when a Cassandra process disables Gossip (and client protocols) 
due to disk issues, other nodes will see it as DN, but running nodetool on that 
particular node will report UN even though Gossip is disabled. This is equally 
confusing for any automation/monitoring tooling.

> Nodetool status should reflect hibernate/replacing states
> -
>
> Key: CASSANDRA-11418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Observability, Tool/nodetool
>Reporter: Joel Knighton
>Assignee: Shaurya Gupta
>Priority: Low
>  Labels: lhf
> Fix For: 4.x
>
> Attachments: cassandra-11418-trunk
>
>
> Currently, the four options for state in nodetool status are 
> joining/leaving/moving/normal.
> Joining nodes are determined based on bootstrap tokens, leaving nodes are 
> based on leaving endpoints in TokenMetadata, moving nodes are based on moving 
> endpoints in TokenMetadata.
> This means that a node will appear in normal state when going through a 
> bootstrap with flag replace_address, which can be confusing to operators.
> We should add another state for hibernation/replacing to make this visible. 
> This will require a way to get a list of all hibernating endpoints.
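A small self-contained sketch of the requested behaviour (an illustrative model 
only, not actual nodetool/TokenMetadata code; the state names follow the 
description above, and the hibernating set is the piece that would need to be 
exposed):
{noformat}
import java.util.Set;

public class NodeStateSketch
{
    enum State { JOINING, LEAVING, MOVING, HIBERNATING, NORMAL }

    // Today the state column falls through to NORMAL for a replace_address
    // bootstrap; the HIBERNATING branch is the proposed addition.
    static State stateOf(String endpoint,
                         Set<String> joining, Set<String> leaving,
                         Set<String> moving, Set<String> hibernating)
    {
        if (joining.contains(endpoint))     return State.JOINING;
        if (leaving.contains(endpoint))     return State.LEAVING;
        if (moving.contains(endpoint))      return State.MOVING;
        if (hibernating.contains(endpoint)) return State.HIBERNATING; // proposed
        return State.NORMAL;
    }

    public static void main(String[] args)
    {
        Set<String> none = Set.of();
        // a node bootstrapping via replace_address would sit in the hibernating set
        System.out.println(stateOf("10.0.0.5", none, none, none, Set.of("10.0.0.5")));
    }
}
{noformat}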



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-11418) Nodetool status should reflect hibernate/replacing states

2021-08-03 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer reassigned CASSANDRA-11418:
--

Assignee: Shaurya Gupta  (was: Thomas Steinmaurer)

> Nodetool status should reflect hibernate/replacing states
> -
>
> Key: CASSANDRA-11418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Observability, Tool/nodetool
>Reporter: Joel Knighton
>Assignee: Shaurya Gupta
>Priority: Low
>  Labels: lhf
> Fix For: 4.x
>
> Attachments: cassandra-11418-trunk
>
>
> Currently, the four options for state in nodetool status are 
> joining/leaving/moving/normal.
> Joining nodes are determined based on bootstrap tokens, leaving nodes are 
> based on leaving endpoints in TokenMetadata, moving nodes are based on moving 
> endpoints in TokenMetadata.
> This means that a node will appear in normal state when going through a 
> bootstrap with flag replace_address, which can be confusing to operators.
> We should add another state for hibernation/replacing to make this visible. 
> This will require a way to get a list of all hibernating endpoints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-11418) Nodetool status should reflect hibernate/replacing states

2021-08-03 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-11418:
---
Authors: Shaurya Gupta  (was: Thomas Steinmaurer)

> Nodetool status should reflect hibernate/replacing states
> -
>
> Key: CASSANDRA-11418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Observability, Tool/nodetool
>Reporter: Joel Knighton
>Assignee: Thomas Steinmaurer
>Priority: Low
>  Labels: lhf
> Fix For: 4.x
>
> Attachments: cassandra-11418-trunk
>
>
> Currently, the four options for state in nodetool status are 
> joining/leaving/moving/normal.
> Joining nodes are determined based on bootstrap tokens, leaving nodes are 
> based on leaving endpoints in TokenMetadata, moving nodes are based on moving 
> endpoints in TokenMetadata.
> This means that a node will appear in normal state when going through a 
> bootstrap with flag replace_address, which can be confusing to operators.
> We should add another state for hibernation/replacing to make this visible. 
> This will require a way to get a list of all hibernating endpoints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-11418) Nodetool status should reflect hibernate/replacing states

2021-08-03 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer reassigned CASSANDRA-11418:
--

Assignee: Thomas Steinmaurer  (was: Shaurya Gupta)

> Nodetool status should reflect hibernate/replacing states
> -
>
> Key: CASSANDRA-11418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Observability, Tool/nodetool
>Reporter: Joel Knighton
>Assignee: Thomas Steinmaurer
>Priority: Low
>  Labels: lhf
> Fix For: 4.x
>
> Attachments: cassandra-11418-trunk
>
>
> Currently, the four options for state in nodetool status are 
> joining/leaving/moving/normal.
> Joining nodes are determined based on bootstrap tokens, leaving nodes are 
> based on leaving endpoints in TokenMetadata, moving nodes are based on moving 
> endpoints in TokenMetadata.
> This means that a node will appear in normal state when going through a 
> bootstrap with flag replace_address, which can be confusing to operators.
> We should add another state for hibernation/replacing to make this visible. 
> This will require a way to get a list of all hibernating endpoints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16442) Improve handling of failed prepared statement loading

2021-02-11 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-16442:
---
Fix Version/s: 3.11.x

> Improve handling of failed prepared statement loading
> -
>
> Key: CASSANDRA-16442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16442
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 3.11.x
>
>
> In an internal DEV cluster, when going from 3.0 to 3.11 we have seen the 
> following WARN logs constantly upon Cassandra startup.
> {noformat}
> ...
> WARN [main] 2021-02-05 09:25:06,892 QueryProcessor.java:160 - prepared 
> statement recreation error: SELECT n,v FROM "Ts2Volatile60Min" WHERE k=? 
> LIMIT ?;
> WARN [main] 2021-02-05 09:25:06,895 QueryProcessor.java:160 - prepared 
> statement recreation error: INSERT INTO "Ts2Final01Min" (k,n,v) VALUES 
> (?,?,?) USING TIMESTAMP ?;
> ...
> {noformat}
> I guess 3.11 tries to pre-load prepared statements for tables which don't 
> exist anymore. How we got into this situation was our fault, I think 
> (Cassandra 3.0 => upgrade to 3.11 => downgrade to 3.0 => some tables dropped 
> while on 3.0 => upgrade to 3.11.10).
> Still, perhaps there is room for improvement when it comes to loading 
> persisted prepared statements, which might fail.
> I thought about:
> * An additional {{nodetool}} option to wipe the persisted prepared statement 
> cache
> * Perhaps even making the startup code smarter: when loading a prepared 
> statement fails because its table is no longer available, auto-remove such 
> entries from the {{prepared_statements}} system table
> To get rid of the WARN log, I currently need to work directly on the 
> "prepared_statements" system table, but I don't know whether it is safe to 
> run e.g. a TRUNCATE statement; thus it currently seems we need to take each 
> node offline and run a Linux {{rm}} command on the SSTables of that system 
> table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-16442) Improve handling of failed prepared statement loading

2021-02-11 Thread Thomas Steinmaurer (Jira)
Thomas Steinmaurer created CASSANDRA-16442:
--

 Summary: Improve handling of failed prepared statement loading
 Key: CASSANDRA-16442
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16442
 Project: Cassandra
  Issue Type: Improvement
Reporter: Thomas Steinmaurer


In an internal DEV cluster, when going from 3.0 to 3.11 we have seen the 
following WARN logs constantly upon Cassandra startup.
{noformat}
...
WARN [main] 2021-02-05 09:25:06,892 QueryProcessor.java:160 - prepared 
statement recreation error: SELECT n,v FROM "Ts2Volatile60Min" WHERE k=? LIMIT 
?;
WARN [main] 2021-02-05 09:25:06,895 QueryProcessor.java:160 - prepared 
statement recreation error: INSERT INTO "Ts2Final01Min" (k,n,v) VALUES (?,?,?) 
USING TIMESTAMP ?;
...
{noformat}

I guess 3.11 tries to pre-load prepared statements for tables which don't exist 
anymore. How we got into this situation was our fault, I think (Cassandra 3.0 
=> upgrade to 3.11 => downgrade to 3.0 => some tables dropped while on 3.0 => 
upgrade to 3.11.10).

Still, perhaps there is room for improvement when it comes to loading persisted 
prepared statements, which might fail.

I thought about:
* An additional {{nodetool}} option to wipe the persisted prepared statement 
cache
* Perhaps even making the startup code smarter: when loading a prepared 
statement fails because its table is no longer available, auto-remove such 
entries from the {{prepared_statements}} system table

To get rid of the WARN log, I currently need to work directly on the 
"prepared_statements" system table, but I don't know whether it is safe to run 
e.g. a TRUNCATE statement; thus it currently seems we need to take each node 
offline and run a Linux {{rm}} command on the SSTables of that system table.
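For reference, the statements in question look roughly like this (assuming the 
{{system.prepared_statements}} table layout of 3.11 with its {{query_string}} 
column; whether the commented-out {{TRUNCATE}} is safe is exactly the open 
question above):
{noformat}
-- inspect which prepared statements the node will try to re-create at startup
SELECT query_string FROM system.prepared_statements;

-- the possible workaround whose safety is unclear (the point of this ticket):
-- TRUNCATE system.prepared_statements;
{noformat}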



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16201) Reduce amount of allocations during batch statement execution

2021-01-14 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264975#comment-17264975
 ] 

Thomas Steinmaurer commented on CASSANDRA-16201:


Any ideas if this will make it into 3.0.24?

> Reduce amount of allocations during batch statement execution
> -
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: 16201_jfr_3023_alloc.png, 16201_jfr_3023_obj.png, 
> 16201_jfr_3118_alloc.png, 16201_jfr_3118_obj.png, 16201_jfr_40b3_alloc.png, 
> 16201_jfr_40b3_obj.png, screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>
> In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
> multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.
>  !screenshot-1.png|width=100%! 
> So it seems we have many, many pre-allocated 20K-element object arrays, 
> resulting in a shallow heap of 80K each, although there is only one element 
> in the array.
> This sort of pre-allocation is causing a lot of memory pressure.
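For context on the 80K figure, a small back-of-the-envelope sketch (assuming a 
64-bit JVM with compressed oops, i.e. a 16-byte array header plus 4 bytes per 
reference; the 20,000 comes from the 20K-element arrays observed in the heap 
dump):
{noformat}
public class ShallowSizeSketch
{
    public static void main(String[] args)
    {
        // shallow size of one pre-allocated Object[20_000], independent of how
        // many slots are actually used (here: a single BTreeRow per update)
        long header = 16;                   // array header, 64-bit HotSpot + compressed oops
        long slots  = 20_000L * 4;          // one 4-byte compressed reference per element
        System.out.println(header + slots); // 80016 bytes, i.e. ~80K per array
    }
}
{noformat}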



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14701) Cleanup (and other) compaction type(s) not counted in compaction remaining time

2020-12-09 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-14701:
---
Fix Version/s: 3.11.x
   3.0.x

> Cleanup (and other) compaction type(s) not counted in compaction remaining 
> time
> ---
>
> Key: CASSANDRA-14701
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14701
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Observability
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 3.0.x, 3.11.x
>
>
> Opened a ticket, as discussed on the user list.
> Looks like the compaction remaining time only includes compactions of type 
> COMPACTION; other compaction types like cleanup etc. aren't part of the 
> estimation calculation.
> E.g. from one of our environments:
> {noformat}
> nodetool compactionstats -H
> pending tasks: 1
>    compaction type   keyspace   table    completed     total   unit   progress
>            Cleanup        XXX     YYY    908.16 GB   1.13 TB  bytes     78.63%
> Active compaction remaining time :   0h00m00s
> {noformat}
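A self-contained sketch of why the estimate reads 0h00m00s here (an illustrative 
model of the calculation, not the actual Cassandra code; the throughput value is 
an assumption): the remaining-bytes sum only considers tasks of type COMPACTION, 
so a running Cleanup contributes nothing.
{noformat}
import java.util.List;

public class RemainingTimeSketch
{
    enum OperationType { COMPACTION, CLEANUP, SCRUB }
    record Task(OperationType type, long completedBytes, long totalBytes) {}

    // remaining time = remaining bytes of COMPACTION tasks / compaction throughput
    static long remainingSeconds(List<Task> active, long throughputBytesPerSec)
    {
        long remaining = active.stream()
                               .filter(t -> t.type() == OperationType.COMPACTION) // cleanup etc. are skipped
                               .mapToLong(t -> t.totalBytes() - t.completedBytes())
                               .sum();
        return throughputBytesPerSec > 0 ? remaining / throughputBytesPerSec : 0;
    }

    public static void main(String[] args)
    {
        // the situation from the ticket: only a Cleanup is active
        List<Task> active = List.of(new Task(OperationType.CLEANUP, 908L << 30, 1130L << 30));
        System.out.println(remainingSeconds(active, 16L << 20) + "s remaining"); // prints: 0s remaining
    }
}
{noformat}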



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14709) Global configuration parameter to reject repairs with anti-compaction

2020-12-09 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-14709:
---
Description: 
We have moved from Cassandra 2.1 to 3.0, and from an operational aspect the 
Cassandra repair area changed significantly / got more complex. Besides 
incremental repairs not working reliably, full repairs (-full command-line 
option) also run into anti-compaction code paths, splitting repaired / 
non-repaired data into separate SSTables.

Cassandra 4.x (with repair enhancements) is quite a way off for us (for 
production usage), thus we want to avoid anti-compactions with Cassandra 3.x at 
any cost. Especially for our on-premise installations at our customer sites, 
with less control over how e.g. nodetool is used, we simply want to have a 
configuration parameter in e.g. cassandra.yaml which we could use to reject any 
repair invocation that results in anti-compaction being active.

I know, such a flag can still be flipped (by the customer), but as a first 
safety stage it is possibly sufficient to reject anti-compaction repairs, e.g. 
if someone executes nodetool repair ... the wrong way (accidentally).


  was:
We have moved from Cassandra 2.1 to 3.0, and from an operational aspect the 
Cassandra repair area changed significantly / got more complex. Besides 
incremental repairs not working reliably, full repairs (-full command-line 
option) also run into anti-compaction code paths, splitting repaired / 
non-repaired data into separate SSTables.

Cassandra 4.x (with repair enhancements) is quite a way off for us (for 
production usage), thus we want to avoid anti-compactions with Cassandra 3.x at 
any cost. Especially for our on-premise installations at our customer sites, 
with less control over how e.g. nodetool is used, we simply want to have a 
configuration parameter in e.g. cassandra.yaml which we could use to reject any 
repair invocation that results in anti-compaction being active.

I know, such a flag can still be flipped (by the customer), but as a first 
safety stage it is possibly sufficient to reject anti-compaction repairs.



> Global configuration parameter to reject repairs with anti-compaction
> -
>
> Key: CASSANDRA-14709
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14709
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair, Local/Config
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 3.0.x, 3.11.x
>
>
> We have moved from Cassandra 2.1 to 3.0, and from an operational aspect the 
> Cassandra repair area changed significantly / got more complex. Besides 
> incremental repairs not working reliably, full repairs (-full command-line 
> option) also run into anti-compaction code paths, splitting repaired / 
> non-repaired data into separate SSTables.
> Cassandra 4.x (with repair enhancements) is quite a way off for us (for 
> production usage), thus we want to avoid anti-compactions with Cassandra 3.x 
> at any cost. Especially for our on-premise installations at our customer 
> sites, with less control over how e.g. nodetool is used, we simply want to 
> have a configuration parameter in e.g. cassandra.yaml which we could use to 
> reject any repair invocation that results in anti-compaction being active.
> I know, such a flag can still be flipped (by the customer), but as a first 
> safety stage it is possibly sufficient to reject anti-compaction repairs, 
> e.g. if someone executes nodetool repair ... the wrong way (accidentally).
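What is being asked for is something along these lines (a purely hypothetical 
cassandra.yaml snippet, name and placement made up for illustration; no such 
option exists at the time of writing):
{noformat}
# Hypothetical option: when false, reject any repair session whose options
# imply anti-compaction, before any repair work is started.
allow_anticompaction_repairs: false
{noformat}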



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14709) Global configuration parameter to reject repairs with anti-compaction

2020-12-09 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-14709:
---
Fix Version/s: (was: 4.x)
   (was: 2.2.x)

> Global configuration parameter to reject repairs with anti-compaction
> -
>
> Key: CASSANDRA-14709
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14709
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair, Local/Config
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 3.0.x, 3.11.x
>
>
> We have moved from Cassandra 2.1 to 3.0, and from an operational aspect the 
> Cassandra repair area changed significantly / got more complex. Besides 
> incremental repairs not working reliably, full repairs (-full command-line 
> option) also run into anti-compaction code paths, splitting repaired / 
> non-repaired data into separate SSTables.
> Cassandra 4.x (with repair enhancements) is quite a way off for us (for 
> production usage), thus we want to avoid anti-compactions with Cassandra 3.x 
> at any cost. Especially for our on-premise installations at our customer 
> sites, with less control over how e.g. nodetool is used, we simply want to 
> have a configuration parameter in e.g. cassandra.yaml which we could use to 
> reject any repair invocation that results in anti-compaction being active.
> I know, such a flag can still be flipped (by the customer), but as a first 
> safety stage it is possibly sufficient to reject anti-compaction repairs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14709) Global configuration parameter to reject repairs with anti-compaction

2020-12-09 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-14709:
---
Description: 
We have moved from Cassandra 2.1 to 3.0, and from an operational aspect the 
Cassandra repair area changed significantly / got more complex. Besides 
incremental repairs not working reliably, full repairs (-full command-line 
option) also run into anti-compaction code paths, splitting repaired / 
non-repaired data into separate SSTables.

Cassandra 4.x (with repair enhancements) is quite a way off for us (for 
production usage), thus we want to avoid anti-compactions with Cassandra 3.x at 
any cost. Especially for our on-premise installations at our customer sites, 
with less control over how e.g. nodetool is used, we simply want to have a 
configuration parameter in e.g. cassandra.yaml which we could use to reject any 
repair invocation that results in anti-compaction being active.

I know, such a flag can still be flipped (by the customer), but as a first 
safety stage it is possibly sufficient to reject anti-compaction repairs.


  was:
We are running Cassandra in AWS and On-Premise at customer sites, currently 2.1 
in production with 3.0/3.11 in pre-production stages including loadtest.

In a migration path from 2.1 to 3.11.x, I’m afraid that at some point in time 
we end up with incremental repairs being enabled / run for the first time 
unintentionally, because:
a) A lot of online resources / examples do not use the _-full_ command-line 
option available since 2.2 (?)
b) Our internal (support) tickets of course also state the nodetool repair 
command without the -full option, as these examples are for 2.1

Especially for On-Premise customers (with less control than with our AWS 
deployments), this risks getting out of control once we have 3.11 out and 
nodetool repair being run without the -full command-line option.

With the troubles incremental repair introduces and incremental being the 
default since 2.2 (?), what do you think about a JVM system property, 
cassandra.yaml setting or whatever … to basically let the cluster administrator 
choose whether incremental repairs are allowed or not? I know, such a flag can 
still be flipped (by the customer), but as a first safety stage it is possibly 
sufficient.



> Global configuration parameter to reject repairs with anti-compaction
> -
>
> Key: CASSANDRA-14709
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14709
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair, Local/Config
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>
> We have moved from Cassandra 2.1 to 3.0, and from an operational aspect the 
> Cassandra repair area changed significantly / got more complex. Besides 
> incremental repairs not working reliably, full repairs (-full command-line 
> option) also run into anti-compaction code paths, splitting repaired / 
> non-repaired data into separate SSTables.
> Cassandra 4.x (with repair enhancements) is quite a way off for us (for 
> production usage), thus we want to avoid anti-compactions with Cassandra 3.x 
> at any cost. Especially for our on-premise installations at our customer 
> sites, with less control over how e.g. nodetool is used, we simply want to 
> have a configuration parameter in e.g. cassandra.yaml which we could use to 
> reject any repair invocation that results in anti-compaction being active.
> I know, such a flag can still be flipped (by the customer), but as a first 
> safety stage it is possibly sufficient to reject anti-compaction repairs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14709) Global configuration parameter to reject repairs with anti-compaction

2020-12-09 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-14709:
---
Summary: Global configuration parameter to reject repairs with 
anti-compaction  (was: Global configuration parameter to reject increment 
repair and allow full repair only)

> Global configuration parameter to reject repairs with anti-compaction
> -
>
> Key: CASSANDRA-14709
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14709
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair, Local/Config
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>
> We are running Cassandra in AWS and On-Premise at customer sites, currently 
> 2.1 in production with 3.0/3.11 in pre-production stages including loadtest.
> In a migration path from 2.1 to 3.11.x, I’m afraid that at some point in time 
> we end up with incremental repairs being enabled / run for the first time 
> unintentionally, because:
> a) A lot of online resources / examples do not use the _-full_ command-line 
> option available since 2.2 (?)
> b) Our internal (support) tickets of course also state the nodetool repair 
> command without the -full option, as these examples are for 2.1
> Especially for On-Premise customers (with less control than with our AWS 
> deployments), this risks getting out of control once we have 3.11 out and 
> nodetool repair being run without the -full command-line option.
> With the troubles incremental repair introduces and incremental being the 
> default since 2.2 (?), what do you think about a JVM system property, 
> cassandra.yaml setting or whatever … to basically let the cluster 
> administrator choose whether incremental repairs are allowed or not? I know, 
> such a flag can still be flipped (by the customer), but as a first safety 
> stage it is possibly sufficient.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16201) Reduce amount of allocations during batch statement execution

2020-11-25 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239087#comment-17239087
 ] 

Thomas Steinmaurer commented on CASSANDRA-16201:


Do we have an ETA for the patch being included/merged?

> Reduce amount of allocations during batch statement execution
> -
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: 16201_jfr_3023_alloc.png, 16201_jfr_3023_obj.png, 
> 16201_jfr_3118_alloc.png, 16201_jfr_3118_obj.png, 16201_jfr_40b3_alloc.png, 
> 16201_jfr_40b3_obj.png, screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>
> In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
> multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.
>  !screenshot-1.png|width=100%! 
> So it seems we have many, many pre-allocated 20K-element object arrays, 
> resulting in a shallow heap of 80K each, although there is only one element 
> in the array.
> This sort of pre-allocation is causing a lot of memory pressure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15563) Backport removal of OpenJDK warning log

2020-11-18 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-15563:
---
Description: As requested on ASF Slack, creating this ticket for a backport 
of CASSANDRA-13916 for 3.0.  (was: As requested on Slack, creating this ticket 
for a backport of CASSANDRA-13916 for 3.0.)

> Backport removal of OpenJDK warning log
> ---
>
> Key: CASSANDRA-15563
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15563
> Project: Cassandra
>  Issue Type: Task
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 3.0.x
>
>
> As requested on ASF Slack, creating this ticket for a backport of 
> CASSANDRA-13916 for 3.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15563) Backport removal of OpenJDK warning log

2020-11-18 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-15563:
---
Description: As requested on Slack, creating this ticket for a backport of 
CASSANDRA-13916 for 3.0.  (was: As requested on Slack, creating this ticket for 
a backport of CASSANDRA-13916, potentially to 2.2 and 3.0.)

> Backport removal of OpenJDK warning log
> ---
>
> Key: CASSANDRA-15563
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15563
> Project: Cassandra
>  Issue Type: Task
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 3.0.x
>
>
> As requested on Slack, creating this ticket for a backport of CASSANDRA-13916 
> for 3.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15563) Backport removal of OpenJDK warning log

2020-11-18 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-15563:
---
Fix Version/s: (was: 2.2.x)

> Backport removal of OpenJDK warning log
> ---
>
> Key: CASSANDRA-15563
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15563
> Project: Cassandra
>  Issue Type: Task
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 3.0.x
>
>
> As requested on Slack, creating this ticket for a backport of 
> CASSANDRA-13916, potentially to 2.2 and 3.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13900) Massive GC suspension increase after updating to 3.0.14 from 2.1.18

2020-11-18 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13900:
---
Resolution: Duplicate
Status: Resolved  (was: Open)

DUP of CASSANDRA-16201

> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> ---
>
> Key: CASSANDRA-13900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Thomas Steinmaurer
>Priority: Urgent
> Attachments: cassandra2118_vs_3014.jpg, cassandra3014_jfr_5min.jpg, 
> cassandra_3.11.0_min_memory_utilization.jpg
>
>
> In short: After upgrading to 3.0.14 (from 2.1.18), we aren't able to process 
> the same incoming write load on the same infrastructure anymore.
> We have a loadtest environment running 24x7 testing our software using 
> Cassandra as backend. Both, loadtest and production is hosted in AWS and do 
> have the same spec on the Cassandra-side, namely:
> * 9x m4.xlarge
> * 8G heap
> * CMS (400MB newgen)
> * 2TB EBS gp2
> * Client requests are entirely CQL
> per node. We have a solid/constant baseline in loadtest at ~ 60% cluster AVG 
> CPU with constant, simulated load running against our cluster, having used 
> Cassandra 2.1 for > 2 years now.
> Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment, 
> and basically, 3.0.14 isn't able to cope with the load anymore. No particular 
> special tweaks, memory settings/changes etc., all the same as in 2.1.18. We 
> also didn't upgrade sstables yet, thus the increase mentioned in the 
> screenshot is not related to any manually triggered maintenance operation 
> after upgrading to 3.0.14.
> According to our monitoring, with 3.0.14 we see a *GC suspension time 
> increase by a factor of > 2*, of course directly correlating with a CPU 
> increase to > 80%. See the attached screen "cassandra2118_vs_3014.jpg".
> This all means that the incoming load 2.1.18 handles is something 3.0.14 
> can't handle. So we would need to either scale up (e.g. m4.xlarge => 
> m4.2xlarge) or scale out to be able to handle the same load, which is 
> cost-wise not an option.
> Unfortunately I do not have Java Flight Recorder runs for 2.1.18 at the 
> mentioned load, but I can provide a JFR session for our current 3.0.14 setup. 
> The attached 5min JFR memory allocation area (cassandra3014_jfr_5min.jpg) 
> shows compaction being the top contributor for the captured 5min time-frame. 
> This 5min window could cover compaction as the top contributor only by 
> "accident" (although the mentioned simulated client load was running), but 
> according to stack traces we see new classes from 3.0, e.g. 
> BTreeRow.searchIterator(), popping up as top contributors, so possibly new 
> classes / data structures are causing much more object churn now.
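For reference, the heap/GC part of the spec above roughly maps to the following 
standard HotSpot flags (as they would typically appear in cassandra-env.sh on 
Java 8; the -Xms value is an assumption, only "8G heap" is stated above):
{noformat}
-Xms8G
-Xmx8G
-Xmn400M                  # CMS new generation ("newgen") of 400MB
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
{noformat}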



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-16201) Reduce amount of allocations during batch statement execution

2020-11-04 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225957#comment-17225957
 ] 

Thomas Steinmaurer edited comment on CASSANDRA-16201 at 11/4/20, 9:56 AM:
--

[~mck], thanks a lot for the extensive follow-up. In our tests, from which the 
actual JFR files come, we now see that Cassandra 3.11 and Cassandra 4.0 are 
basically on the same level as 3.0 again, or even slightly better than 3.0, 
but 2.1 remains unbeaten :-)

Do you see any further improvements in regard to 2.1 vs. 3.0/3.11/4.0? The 
following chart is an AVG over the last 24hrs for a bunch of metrics, for all 
versions, with the patch applied to 3.0/3.11/4.0, processing the same ingest. 
The only main difference here is that 2.1 is using STCS for our timeseries 
tables, whereas 3.0+ is using TWCS.
 !screenshot-4.png|width=100%! 

So, in short:
|| ||Cassandra 2.1||Cassandra 3.0 Patched (Rel. diff to 2.1)||Cassandra 3.11 Patched (Rel. diff to 2.1)||Cassandra 4.0 Patched (Rel. diff to 2.1)||
|AVG CPU|52,86%|61,43% (+16,2%)|61,04% (+15,5%)|75,06% (+42%)|
|AVG Suspension|3,76%|6,13% (+63%)|5,74% (+52,7%)|5,60% (+48,9%)|

But for *Cassandra 3.11* and *Cassandra 4.0*, this was a huge step forward! 
Thanks a lot!


was (Author: tsteinmaurer):
[~mck], thanks a lot for the extensive follow-up. In our tests, from which the 
actual JFR files come, we now see that Cassandra 3.11 and Cassandra 4.0 are 
basically on the same level as 3.0 again, or even slightly better than 3.0, 
but 2.1 remains unbeaten :-)

Do you see any further improvements in regard to 2.1 vs. 3.0/3.11/4.0? The 
following chart is an AVG over the last 24hrs for a bunch of metrics, for all 
versions, with the patch applied to 3.0/3.11/4.0, processing the same ingest. 
The only main difference here is that 2.1 is using STCS for our timeseries 
tables, whereas 3.0+ is using TWCS.
 !screenshot-4.png|width=100%! 

So, in short:
|| ||Cassandra 2.1||Cassandra 3.0 Patched (Rel. diff to 2.1)||Cassandra 3.11 Patched (Rel. diff to 2.1)||Cassandra 4.0 Patched (Rel. diff to 2.1)||
|AVG CPU|52,86%|61,43% (+16,2%)|61,04% (+15,5%)|75,06% (+42%)|
|AVG Suspension|3,76%|6,13% (+63%)|5,74% (+52,7%)|5,60% (+48,9%)|

> Reduce amount of allocations during batch statement execution
> -
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: 16201_jfr_3023_alloc.png, 16201_jfr_3023_obj.png, 
> 16201_jfr_3118_alloc.png, 16201_jfr_3118_obj.png, 16201_jfr_40b3_alloc.png, 
> 16201_jfr_40b3_obj.png, screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>
> In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
> multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.
>  !screenshot-1.png|width=100%! 
> So it seems we have many, many pre-allocated 20K-element object arrays, 
> resulting in a shallow heap of 80K each, although there is only one element 
> in the array.
> This sort of pre-allocation is causing a lot of memory pressure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16201) Reduce amount of allocations during batch statement execution

2020-11-04 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225957#comment-17225957
 ] 

Thomas Steinmaurer commented on CASSANDRA-16201:


[~mck], thanks a lot for the extensive follow-up. In our tests, from which the 
actual JFR files come, we now see that Cassandra 3.11 and Cassandra 4.0 are 
basically on the same level as 3.0 again, or even slightly better than 3.0, 
but 2.1 remains unbeaten :-)

Do you see any further improvements in regard to 2.1 vs. 3.0/3.11/4.0? The 
following chart is an AVG over the last 24hrs for a bunch of metrics, for all 
versions, with the patch applied to 3.0/3.11/4.0, processing the same ingest. 
The only main difference here is that 2.1 is using STCS for our timeseries 
tables, whereas 3.0+ is using TWCS.
 !screenshot-4.png|width=100%! 

So, in short:
|| ||Cassandra 2.1||Cassandra 3.0 Patched (Rel. diff to 2.1)||Cassandra 3.11 Patched (Rel. diff to 2.1)||Cassandra 4.0 Patched (Rel. diff to 2.1)||
|AVG CPU|52,86%|61,43% (+16,2%)|61,04% (+15,5%)|75,06% (+42%)|
|AVG Suspension|3,76%|6,13% (+63%)|5,74% (+52,7%)|5,60% (+48,9%)|

> Reduce amount of allocations during batch statement execution
> -
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: 16201_jfr_3023_alloc.png, 16201_jfr_3023_obj.png, 
> 16201_jfr_3118_alloc.png, 16201_jfr_3118_obj.png, 16201_jfr_40b3_alloc.png, 
> 16201_jfr_40b3_obj.png, screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>
> In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
> multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.
>  !screenshot-1.png|width=100%! 
> So it seems we have many, many pre-allocated 20K-element object arrays, 
> resulting in a shallow heap of 80K each, although there is only one element 
> in the array.
> This sort of pre-allocation is causing a lot of memory pressure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16201) Reduce amount of allocations during batch statement execution

2020-11-04 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-16201:
---
Attachment: screenshot-4.png

> Reduce amount of allocations during batch statement execution
> -
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: 16201_jfr_3023_alloc.png, 16201_jfr_3023_obj.png, 
> 16201_jfr_3118_alloc.png, 16201_jfr_3118_obj.png, 16201_jfr_40b3_alloc.png, 
> 16201_jfr_40b3_obj.png, screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>
> In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
> multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.
>  !screenshot-1.png|width=100%! 
> So it seems we have many, many pre-allocated 20K-element object arrays, 
> resulting in a shallow heap of 80K each, although there is only one element 
> in the array.
> This sort of pre-allocation is causing a lot of memory pressure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16201) Reduce amount of allocations during batch statement execution

2020-10-30 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223585#comment-17223585
 ] 

Thomas Steinmaurer commented on CASSANDRA-16201:


{quote}
I will keep that running overnight and provide another set of JFR files 
for 3.0, 3.11 and 4.0 with the patch.
{quote}

[~mck], in the OneDrive share I provided to you on Oct 12, 2020, there is now 
an additional sub-directory called {{_perffixes_jfr_20201027}}, which contains 
a new set of JFR files for all versions (including 2.1), with the patch 
applied to 3.0, 3.11 and 4.0.

[~marcuse], let me know if/how I could share the new JFR files with you as well.

> Reduce amount of allocations during batch statement execution
> -
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
> multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.
>  !screenshot-1.png|width=100%! 
> So it seems we have many, many pre-allocated 20K-element object arrays, 
> resulting in a shallow heap of 80K each, although there is only one element 
> in the array.
> This sort of pre-allocation is causing a lot of memory pressure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16201) Reduce amount of allocations during batch statement execution

2020-10-21 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218301#comment-17218301
 ] 

Thomas Steinmaurer commented on CASSANDRA-16201:


[~marcuse], first impression from our comparison infrastructure regarding the 
3.0, 3.11 and 4.0 patches.

When having a look at 2 high-level metrics:
* JVM suspension, marked as "1" in the dashboard below
* Cassandra dropped messages, marked as "2" in the dashboard below

 !screenshot-3.png|width=100%! 

* Cassandra 3.0: No positive impact on suspension
* Cassandra 3.11: Huge positive impact on suspension
* Cassandra 4.0: Huge positive impact on suspension + no dropped messages with 
the patch

I will keep that running overnight and provide another set of JFR files 
for 3.0, 3.11 and 4.0 with the patch.

Thanks for your efforts!


> Reduce amount of allocations during batch statement execution
> -
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
> multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.
>  !screenshot-1.png|width=100%! 
> So it seems we have many, many pre-allocated 20K-element object arrays, 
> resulting in a shallow heap of 80K each, although there is only one element 
> in the array.
> This sort of pre-allocation is causing a lot of memory pressure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16201) Reduce amount of allocations during batch statement execution

2020-10-21 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-16201:
---
Attachment: screenshot-3.png

> Reduce amount of allocations during batch statement execution
> -
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
> multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.
>  !screenshot-1.png|width=100%! 
> So it seems we have many, many pre-allocated 20K-element object arrays, 
> resulting in a shallow heap of 80K each, although there is only one element 
> in the array.
> This sort of pre-allocation is causing a lot of memory pressure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-13 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213000#comment-17213000
 ] 

Thomas Steinmaurer commented on CASSANDRA-15430:


[~mck], thanks!

In Cassandra 3.0, I see in {{BTree}}:
{noformat}
public static <V> Builder<V> builder(Comparator<? super V> comparator, int initialCapacity)
{
    return new Builder<>(comparator);
}
{noformat}

that this is basically missing forwarding the provided {{initialCapacity}} in 
the {{new Builder ...}} call. Not doing that potentially creates the backing 
{{Object[]}} at too small a size, resulting in many resize operations over the 
lifetime of the {{Object[]}}, correct? Once {{initialCapacity}} is propagated 
(added in 3.11+, thus the backport to 3.0.x), we then start to hit 
CASSANDRA-16201, so I understand we need both for 3.0.x.

What I don't understand yet (or perhaps I haven't looked closely enough) is how 
{{MultiCBuilder.build()}} could benefit from that, because it won't call 
{{BTreeSet.builder}} with any sort of {{initialCapacity}} information, thus 
falling back to the default {{Object[]}} size of 16.

Thanks again.
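In other words, the gist of the requested change is simply to hand the capacity 
through. A sketch of what that could look like (the exact {{Builder}} 
constructor taking an initial capacity is assumed here, as in 3.11+):
{noformat}
public static <V> Builder<V> builder(Comparator<? super V> comparator, int initialCapacity)
{
    // forward the caller-provided capacity instead of silently dropping it, so the
    // backing Object[] starts at a sensible size and avoids repeated resizing
    return new Builder<>(comparator, initialCapacity);
}
{noformat}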

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
> Attachments: dashboard.png, jfr_allocations.png, jfr_jmc_2-1.png, 
> jfr_jmc_2-1_obj.png, jfr_jmc_3-0.png, jfr_jmc_3-0_obj.png, 
> jfr_jmc_3-0_obj_obj_alloc.png, jfr_jmc_3-11.png, jfr_jmc_3-11_obj.png, 
> jfr_jmc_4-0-b2.png, jfr_jmc_4-0-b2_obj.png, mutation_stage.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>
> In a 6 node loadtest cluster, we have been constantly running a certain 
> production-like workload on 2.1.18, which it handles sufficiently. After 
> upgrading one node to 3.0.18 (the remaining 5 still on 2.1.18 after we saw 
> the sort of regression described below), 3.0.18 is showing increased CPU 
> usage, increased GC, high mutation stage pending tasks, dropped mutation 
> messages ...
> Some specs (all 6 nodes equally sized):
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus same settings in regard to number 
> of threads, compaction throttling etc.
> The following dashboard shows highlighted areas (CPU, suspension) with 
> metrics for all 6 nodes, the one outlier being the node upgraded to Cassandra 
> 3.0.18.
>  !dashboard.png|width=1280!
> Additionally we see a large increase in pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - 
> MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout 
> and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> MutationStage   256 81824 3360532756 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ReadStage 0 0   62862266 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> RequestResponseStage  0 0 2176659856 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> ReadRepairStage   0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> CounterMutationStage  0 0  0 0
>  0
> ...
> {noformat}
> Judging from a 15min JFR session for both 3.0.18 and 2.1.18 on a different 
> node, at a high level it looks like the code path underneath 
> {{BatchMessage.execute}} is producing ~ 10x more on-heap allocations in 
> 3.0.18 compared to 2.1.18.
>  !jfr_allocations.png!
> Left => 3.0.18
>  Right => 2.1.18
> The zipped JFRs exceed the 60MB limit for attaching directly to the ticket. 
> I can upload them if there is another destination available.
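For reference, the heap/GC spec above corresponds to roughly these standard 
HotSpot flags (only the options actually mentioned; where they live, 
jvm.options or cassandra-env.sh, depends on the version):
{noformat}
-Xmx31G                     # 31G heap, staying below the ~32G compressed-oops threshold
-XX:+UseG1GC
-XX:MaxGCPauseMillis=2000   # "max pause millis = 2000ms"
{noformat}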



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-13 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212876#comment-17212876
 ] 

Thomas Steinmaurer commented on CASSANDRA-15430:


[~mck], while I somehow understand the relation to my recently reported 
CASSANDRA-16201, can you please help me understand why CASSANDRA-13929 seems 
to be related? Thanks.

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
> Attachments: dashboard.png, jfr_allocations.png, jfr_jmc_2-1.png, 
> jfr_jmc_2-1_obj.png, jfr_jmc_3-0.png, jfr_jmc_3-0_obj.png, 
> jfr_jmc_3-0_obj_obj_alloc.png, jfr_jmc_3-11.png, jfr_jmc_3-11_obj.png, 
> jfr_jmc_4-0-b2.png, jfr_jmc_4-0-b2_obj.png, mutation_stage.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>
> In a 6 node loadtest cluster, we have been constantly running a certain 
> production-like workload on 2.1.18, which it handles sufficiently. After 
> upgrading one node to 3.0.18 (the remaining 5 still on 2.1.18 after we saw 
> the sort of regression described below), 3.0.18 is showing increased CPU 
> usage, increased GC, high mutation stage pending tasks, dropped mutation 
> messages ...
> Some specs (all 6 nodes equally sized):
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus same settings in regard to number 
> of threads, compaction throttling etc.
> The following dashboard shows highlighted areas (CPU, suspension) with 
> metrics for all 6 nodes, the one outlier being the node upgraded to Cassandra 
> 3.0.18.
>  !dashboard.png|width=1280!
> Additionally we see a large increase in pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - 
> MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout 
> and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> MutationStage   256 81824 3360532756 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ReadStage 0 0   62862266 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> RequestResponseStage  0 0 2176659856 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> ReadRepairStage   0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> CounterMutationStage  0 0  0 0
>  0
> ...
> {noformat}
> Judging from a 15min JFR session for both, 3.0.18 and 2.1.18 on a different 
> node, high-level, it looks like the code path underneath 
> {{BatchMessage.execute}} is producing ~ 10x more on-heap allocations in 
> 3.0.18 compared to 2.1.18.
>  !jfr_allocations.png!
> Left => 3.0.18
>  Right => 2.1.18
> JFRs zipped are exceeding the 60MB limit to directly attach to the ticket. I 
> can upload them, if there is another destination available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-16201) Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations in BatchUpdatesCollector

2020-10-12 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210238#comment-17210238
 ] 

Thomas Steinmaurer edited comment on CASSANDRA-16201 at 10/13/20, 5:06 AM:
---

[~marcuse], yes I think so. :-) TRUNK, locally checked out, calling hierarchy 
from {{BatchUpdatesCollector.getPartitionUpdateBuilder}} up to 
{{PartitionUpdate.Builder.rowBuilder}}

 !screenshot-2.png|width=100%! 

Thanks again.


was (Author: tsteinmaurer):
[~marcuse], yes I think so. :-) TRUNK, locally checked out, calling hierarchy 
from {{BatchUpdatesCollector.getPartitionUpdateBuilder}} up to 
{{PartitionUpdate.Builder.rowBuilder}}

 !screenshot-2.png! 

Thanks again.

> Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations 
> in BatchUpdatesCollector
> -
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
> multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.
>  !screenshot-1.png|width=100%! 
> So it seems we have many, many 20K-element pre-allocated object arrays 
> resulting in a shallow heap of 80K each, although there is only one element 
> in the array.
> This sort of pre-allocation is causing a lot of memory pressure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16201) Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations in BatchUpdatesCollector

2020-10-12 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-16201:
---
Description: 
In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
multiple NTR threads in a 3-digit MB range.

This is likely related to object array pre-allocations at the size of 
{{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
only 1 {{BTreeRow}} in the {{BTree}}.
 !screenshot-1.png|width=100%! 

So it seems we have many, many 20K-element pre-allocated object arrays 
resulting in a shallow heap of 80K each, although there is only one element in 
the array.

This sort of pre-allocation is causing a lot of memory pressure.
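As a rough back-of-the-envelope check (a minimal sketch only, assuming a 64-bit 
HotSpot JVM with compressed oops, i.e. roughly a 16-byte array header plus 4 
bytes per reference slot; exact numbers vary by JVM and flags), the ~80K shallow 
heap per array follows directly from the pre-allocated slot count:
{noformat}
public class PreallocationSketch {
    // Rough shallow-heap estimate for an Object[]: ~16-byte array header
    // plus 4 bytes per reference slot (compressed oops assumed).
    static long estimatedShallowHeapBytes(int slots) {
        return 16L + 4L * slots;
    }

    public static void main(String[] args) {
        // Pre-allocating room for ~20K rows costs the full backing array up front,
        // even though only a single row ever ends up in it.
        Object[] updatedRows = new Object[20_000];
        updatedRows[0] = "the only row actually present";

        System.out.printf("slots=%d, estimated shallow heap ~%d bytes%n",
                updatedRows.length, estimatedShallowHeapBytes(updatedRows.length));
        // -> slots=20000, estimated shallow heap ~80016 bytes, i.e. ~80K per array,
        //    multiplied across many concurrent NTR threads and batches.
    }
}
{noformat}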


  was:
In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
multiple NTR threads in a 3-digit MB range.

This is likely related to object array pre-allocations at the size of 
{{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
only 1 {{BTreeRow}} in the {{BTree}}.
 !screenshot-1.png! 

So it seems we have many, many 20K-element pre-allocated object arrays 
resulting in a shallow heap of 80K each, although there is only one element in 
the array.

This sort of pre-allocation is causing a lot of memory pressure.



> Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations 
> in BatchUpdatesCollector
> -
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
> multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.
>  !screenshot-1.png|width=100%! 
> So it seems we have many, many 20K-element pre-allocated object arrays 
> resulting in a shallow heap of 80K each, although there is only one element 
> in the array.
> This sort of pre-allocation is causing a lot of memory pressure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-12 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212408#comment-17212408
 ] 

Thomas Steinmaurer commented on CASSANDRA-15430:


Sent [~mck] a fresh set of JFR files today from our recent 2.1.18 / 3.0.20 / 
3.11.8 / 4.0 Beta2 testing.
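As a lighter-weight alternative where shipping multi-GB recordings is not 
practical (an illustrative sketch only, not part of the JFR workflow used here, 
and HotSpot-specific), per-thread allocated bytes can be sampled via 
{{com.sun.management.ThreadMXBean}} and compared for the same workload across 
versions:
{noformat}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

public class AllocationProbe {
    public static void main(String[] args) {
        // HotSpot-specific extension of the standard ThreadMXBean.
        com.sun.management.ThreadMXBean tmx =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        if (!tmx.isThreadAllocatedMemorySupported()) {
            System.out.println("Per-thread allocation accounting not supported on this JVM");
            return;
        }
        tmx.setThreadAllocatedMemoryEnabled(true);

        // Cumulative allocated bytes per live thread; diffing two samples taken
        // under steady load gives an allocation rate to compare between versions.
        for (long id : tmx.getAllThreadIds()) {
            ThreadInfo info = tmx.getThreadInfo(id);
            long bytes = tmx.getThreadAllocatedBytes(id);
            if (info != null && bytes > 0) {
                System.out.printf("%-45s %,d bytes%n", info.getThreadName(), bytes);
            }
        }
    }
}
{noformat}
Run inside the same JVM as the workload, this gives a cheap cross-version number 
without full recordings, though it lacks the stack-trace detail JFR provides.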

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>
> In a 6 node loadtest cluster, we have been running a certain production-like 
> workload with 2.1.18 constantly and sufficiently. After upgrading one 
> node to 3.0.18 (remaining 5 still on 2.1.18 after we have seen that sort of 
> regression described below), 3.0.18 is showing increased CPU usage, increased 
> GC, high mutation stage pending tasks, dropped mutation messages ...
> Some spec. All 6 nodes equally sized:
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus same settings in regard to number 
> of threads, compaction throttling etc.
> Following dashboard shows highlighted areas (CPU, suspension) with metrics 
> for all 6 nodes and the one outlier being the node upgraded to Cassandra 
> 3.0.18.
>  !dashboard.png|width=1280!
> Additionally we see a large increase on pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - 
> MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout 
> and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> MutationStage   256 81824 3360532756 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ReadStage 0 0   62862266 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> RequestResponseStage  0 0 2176659856 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> ReadRepairStage   0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> CounterMutationStage  0 0  0 0
>  0
> ...
> {noformat}
> Judging from a 15min JFR session for both, 3.0.18 and 2.1.18 on a different 
> node, high-level, it looks like the code path underneath 
> {{BatchMessage.execute}} is producing ~ 10x more on-heap allocations in 
> 3.0.18 compared to 2.1.18.
>  !jfr_allocations.png!
> Left => 3.0.18
>  Right => 2.1.18
> JFRs zipped are exceeding the 60MB limit to directly attach to the ticket. I 
> can upload them, if there is another destination available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16201) Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations in BatchUpdatesCollector

2020-10-08 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210249#comment-17210249
 ] 

Thomas Steinmaurer commented on CASSANDRA-16201:


[~dcapwell], the code screenshot above is from local TRUNK, thus not strictly 
Beta2. [~marcuse] already contacted me via Slack. Thanks for your attention.

> Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations 
> in BatchUpdatesCollector
> -
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
> multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.
>  !screenshot-1.png! 
> So it seems we have many, many 20K-element pre-allocated object arrays 
> resulting in a shallow heap of 80K each, although there is only one element 
> in the array.
> This sort of pre-allocation is causing a lot of memory pressure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-16201) Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations in BatchUpdatesCollector

2020-10-08 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210238#comment-17210238
 ] 

Thomas Steinmaurer edited comment on CASSANDRA-16201 at 10/8/20, 2:14 PM:
--

[~marcuse], yes I think so. :-) TRUNK, locally checked out, calling hierarchy 
from {{BatchUpdatesCollector.getPartitionUpdateBuilder}} up to 
{{PartitionUpdate.Builder.rowBuilder}}

 !screenshot-2.png! 

Thanks again.


was (Author: tsteinmaurer):
[~marcuse], yes I think so. :-) Locally checked out, calling hierarchy from 
{{BatchUpdatesCollector.getPartitionUpdateBuilder}} up to 
{{PartitionUpdate.Builder.rowBuilder}}

 !screenshot-2.png! 

Thanks again.

> Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations 
> in BatchUpdatesCollector
> -
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
> multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.
>  !screenshot-1.png! 
> So it seems we have many, many 20K-element pre-allocated object arrays 
> resulting in a shallow heap of 80K each, although there is only one element 
> in the array.
> This sort of pre-allocation is causing a lot of memory pressure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16201) Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations in BatchUpdatesCollector

2020-10-08 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210238#comment-17210238
 ] 

Thomas Steinmaurer commented on CASSANDRA-16201:


[~marcuse], yes I think so. :-) Locally checked out, calling hierarchy from 
{{BatchUpdatesCollector.getPartitionUpdateBuilder}} up to 
{{PartitionUpdate.Builder.rowBuilder}}

 !screenshot-2.png! 

Thanks again.

> Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations 
> in BatchUpdatesCollector
> -
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
> multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.
>  !screenshot-1.png! 
> So it seems we have many, many 20K-element pre-allocated object arrays 
> resulting in a shallow heap of 80K each, although there is only one element 
> in the array.
> This sort of pre-allocation is causing a lot of memory pressure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16201) Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations in BatchUpdatesCollector

2020-10-08 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-16201:
---
Attachment: screenshot-2.png

> Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations 
> in BatchUpdatesCollector
> -
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
> multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.
>  !screenshot-1.png! 
> So it seems we have many, many 20K-element pre-allocated object arrays 
> resulting in a shallow heap of 80K each, although there is only one element 
> in the array.
> This sort of pre-allocation is causing a lot of memory pressure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16201) Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations in BatchUpdatesCollector

2020-10-08 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-16201:
---
Description: 
In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
multiple NTR threads in a 3-digit MB range.

This is likely related to object array pre-allocations at the size of 
{{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
only 1 {{BTreeRow}} in the {{BTree}}.
 !screenshot-1.png! 

So it seems we have many, many 20K-element pre-allocated object arrays 
resulting in a shallow heap of 80K each, although there is only one element in 
the array.

This sort of pre-allocation is causing a lot of memory pressure.


  was:
In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
multiple NTR threads in a 3-digit MB range.

This is likely related to object array pre-allocations at the size of 
{{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
only 1 {{BTreeRow}} in the {{BTree}}.
 !screenshot-1.png! 

So it seems we have many, many 20K-element pre-allocated object arrays 
resulting in a shallow heap of 80K each, although there is only one element in 
the array.



> Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations 
> in BatchUpdatesCollector
> -
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: screenshot-1.png
>
>
> In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
> multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.
>  !screenshot-1.png! 
> So it seems we have many, many 20K-element pre-allocated object arrays 
> resulting in a shallow heap of 80K each, although there is only one element 
> in the array.
> This sort of pre-allocation is causing a lot of memory pressure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16201) Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations in BatchUpdatesCollector

2020-10-08 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-16201:
---
Description: 
In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
multiple NTR threads in a 3-digit MB range.

This is likely related to object array pre-allocations at the size of 
{{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
only 1 {{BTreeRow}} in the {{BTree}}.
 !screenshot-1.png! 

So it seems we have many, many 20K-element pre-allocated object arrays 
resulting in a shallow heap of 80K each, although there is only one element in 
the array.


  was:
In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
we see 4.0b2 going OOM from time to time. According to a heap dump, likely 
related to object array pre-allocations at the size of 
{{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
only 1 {{BTreeRow}} in the {{BTree}}.




> Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations 
> in BatchUpdatesCollector
> -
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: screenshot-1.png
>
>
> In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
> multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.
>  !screenshot-1.png! 
> So it seems we have many, many 20K-element pre-allocated object arrays 
> resulting in a shallow heap of 80K each, although there is only one element 
> in the array.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16201) Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations in BatchUpdatesCollector

2020-10-08 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-16201:
---
Attachment: screenshot-1.png

> Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations 
> in BatchUpdatesCollector
> -
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: screenshot-1.png
>
>
> In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, likely 
> related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-16201) Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations in BatchUpdatesCollector

2020-10-08 Thread Thomas Steinmaurer (Jira)
Thomas Steinmaurer created CASSANDRA-16201:
--

 Summary: Cassandra 4.0 b2 - OOM / memory pressure due to object 
array pre-allocations in BatchUpdatesCollector
 Key: CASSANDRA-16201
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
 Project: Cassandra
  Issue Type: Bug
Reporter: Thomas Steinmaurer


In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
we see 4.0b2 going OOM from time to time. According to a heap dump, likely 
related to object array pre-allocations at the size of 
{{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
only 1 {{BTreeRow}} in the {{BTree}}.





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16153) Cassandra 4b2 - JVM options from *.options not read/set

2020-10-06 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208986#comment-17208986
 ] 

Thomas Steinmaurer commented on CASSANDRA-16153:


[~brandon.williams], sorry for wasting your time. I have discovered that this 
is an issue on our side with how we start Cassandra. Feel free to close.

> Cassandra 4b2 - JVM options from *.options not read/set
> ---
>
> Key: CASSANDRA-16153
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16153
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Scripts
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: debug.log.2020-10-01.0.zip, system.log.2020-10-01.0.zip
>
>
> Trying out Cassandra 4 beta 2 with Java 8 (AdoptOpenJDK) in AWS.
> {noformat}
> NAME="Amazon Linux AMI"
> VERSION="2018.03"
> ID="amzn"
> ID_LIKE="rhel fedora"
> VERSION_ID="2018.03"
> PRETTY_NAME="Amazon Linux AMI 2018.03"
> ANSI_COLOR="0;33"
> CPE_NAME="cpe:/o:amazon:linux:2018.03:ga"
> HOME_URL="http://aws.amazon.com/amazon-linux-ami/"
> {noformat}
> It seems the Cassandra JVM results in using Parallel GC.
> {noformat}
> INFO  [Service Thread] 2020-10-01 00:00:56,233 GCInspector.java:299 - PS 
> Scavenge GC in 541ms.  PS Old Gen: 5152844776 -> 5726724752;
> WARN  [Service Thread] 2020-10-01 00:00:56,234 GCInspector.java:297 - PS 
> MarkSweep GC in 1969ms.  PS Eden Space: 2111307776 -> 0; PS Old Gen: 
> 5726724752 -> 2581334376; PS Survivor Space: 363850224 -> 0
> {noformat}
> Although {{jvm8-server.options}} is using CMS.
> {noformat}
> #
> #  GC SETTINGS  #
> #
> ### CMS Settings
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSWaitDuration=1
> -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSEdenChunksRecordAlways
> ## some JVMs will fill up their heap when accessed via JMX, see CASSANDRA-6541
> -XX:+CMSClassUnloadingEnabled
> ...
> {noformat}
> In Cassandra 3, default has been CMS.
> So, possibly there is something wrong in reading/processing 
> {{jvm8-server.options}}?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16153) Cassandra 4b2 - JVM options from *.options not read/set

2020-10-04 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-16153:
---
Attachment: debug.log.2020-10-01.0.zip

> Cassandra 4b2 - JVM options from *.options not read/set
> ---
>
> Key: CASSANDRA-16153
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16153
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Scripts
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: debug.log.2020-10-01.0.zip, system.log.2020-10-01.0.zip
>
>
> Trying out Cassandra 4 beta 2 with Java 8 (AdoptOpenJDK) in AWS.
> {noformat}
> NAME="Amazon Linux AMI"
> VERSION="2018.03"
> ID="amzn"
> ID_LIKE="rhel fedora"
> VERSION_ID="2018.03"
> PRETTY_NAME="Amazon Linux AMI 2018.03"
> ANSI_COLOR="0;33"
> CPE_NAME="cpe:/o:amazon:linux:2018.03:ga"
> HOME_URL="http://aws.amazon.com/amazon-linux-ami/"
> {noformat}
> It seems the Cassandra JVM results in using Parallel GC.
> {noformat}
> INFO  [Service Thread] 2020-10-01 00:00:56,233 GCInspector.java:299 - PS 
> Scavenge GC in 541ms.  PS Old Gen: 5152844776 -> 5726724752;
> WARN  [Service Thread] 2020-10-01 00:00:56,234 GCInspector.java:297 - PS 
> MarkSweep GC in 1969ms.  PS Eden Space: 2111307776 -> 0; PS Old Gen: 
> 5726724752 -> 2581334376; PS Survivor Space: 363850224 -> 0
> {noformat}
> Although {{jvm8-server.options}} is using CMS.
> {noformat}
> #
> #  GC SETTINGS  #
> #
> ### CMS Settings
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSWaitDuration=1
> -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSEdenChunksRecordAlways
> ## some JVMs will fill up their heap when accessed via JMX, see CASSANDRA-6541
> -XX:+CMSClassUnloadingEnabled
> ...
> {noformat}
> In Cassandra 3, default has been CMS.
> So, possibly there is something wrong in reading/processing 
> {{jvm8-server.options}}?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16153) Cassandra 4b2 - JVM options from *.options not read/set

2020-10-04 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17207828#comment-17207828
 ] 

Thomas Steinmaurer commented on CASSANDRA-16153:


Sure. Attached [^debug.log.2020-10-01.0.zip]

> Cassandra 4b2 - JVM options from *.options not read/set
> ---
>
> Key: CASSANDRA-16153
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16153
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Scripts
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: debug.log.2020-10-01.0.zip, system.log.2020-10-01.0.zip
>
>
> Trying out Cassandra 4 beta 2 with Java 8 (AdoptOpenJDK) in AWS.
> {noformat}
> NAME="Amazon Linux AMI"
> VERSION="2018.03"
> ID="amzn"
> ID_LIKE="rhel fedora"
> VERSION_ID="2018.03"
> PRETTY_NAME="Amazon Linux AMI 2018.03"
> ANSI_COLOR="0;33"
> CPE_NAME="cpe:/o:amazon:linux:2018.03:ga"
> HOME_URL="http://aws.amazon.com/amazon-linux-ami/"
> {noformat}
> It seems the Cassandra JVM results in using Parallel GC.
> {noformat}
> INFO  [Service Thread] 2020-10-01 00:00:56,233 GCInspector.java:299 - PS 
> Scavenge GC in 541ms.  PS Old Gen: 5152844776 -> 5726724752;
> WARN  [Service Thread] 2020-10-01 00:00:56,234 GCInspector.java:297 - PS 
> MarkSweep GC in 1969ms.  PS Eden Space: 2111307776 -> 0; PS Old Gen: 
> 5726724752 -> 2581334376; PS Survivor Space: 363850224 -> 0
> {noformat}
> Although {{jvm8-server.options}} is using CMS.
> {noformat}
> #
> #  GC SETTINGS  #
> #
> ### CMS Settings
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSWaitDuration=1
> -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSEdenChunksRecordAlways
> ## some JVMs will fill up their heap when accessed via JMX, see CASSANDRA-6541
> -XX:+CMSClassUnloadingEnabled
> ...
> {noformat}
> In Cassandra 3, default has been CMS.
> So, possibly there is something wrong in reading/processing 
> {{jvm8-server.options}}?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16153) Cassandra 4b2 - JVM options from *.options not read/set

2020-10-01 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205973#comment-17205973
 ] 

Thomas Steinmaurer commented on CASSANDRA-16153:


[^system.log.2020-10-01.0.zip]

Search for:
{noformat}
...
INFO  [main] 2020-10-01 06:17:53,135 CassandraDaemon.java:507 - Hostname: 
ip-X-Y-68-230:7000:7001
...
{noformat}

> Cassandra 4b2 - JVM options from *.options not read/set
> ---
>
> Key: CASSANDRA-16153
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16153
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Scripts
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: system.log.2020-10-01.0.zip
>
>
> Trying out Cassandra 4 beta 2 with Java 8 (AdoptOpenJDK) in AWS.
> {noformat}
> NAME="Amazon Linux AMI"
> VERSION="2018.03"
> ID="amzn"
> ID_LIKE="rhel fedora"
> VERSION_ID="2018.03"
> PRETTY_NAME="Amazon Linux AMI 2018.03"
> ANSI_COLOR="0;33"
> CPE_NAME="cpe:/o:amazon:linux:2018.03:ga"
> HOME_URL="http://aws.amazon.com/amazon-linux-ami/"
> {noformat}
> It seems the Cassandra JVM results in using Parallel GC.
> {noformat}
> INFO  [Service Thread] 2020-10-01 00:00:56,233 GCInspector.java:299 - PS 
> Scavenge GC in 541ms.  PS Old Gen: 5152844776 -> 5726724752;
> WARN  [Service Thread] 2020-10-01 00:00:56,234 GCInspector.java:297 - PS 
> MarkSweep GC in 1969ms.  PS Eden Space: 2111307776 -> 0; PS Old Gen: 
> 5726724752 -> 2581334376; PS Survivor Space: 363850224 -> 0
> {noformat}
> Although {{jvm8-server.options}} is using CMS.
> {noformat}
> #
> #  GC SETTINGS  #
> #
> ### CMS Settings
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSWaitDuration=1
> -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSEdenChunksRecordAlways
> ## some JVMs will fill up their heap when accessed via JMX, see CASSANDRA-6541
> -XX:+CMSClassUnloadingEnabled
> ...
> {noformat}
> In Cassandra 3, default has been CMS.
> So, possibly there is something wrong in reading/processing 
> {{jvm8-server.options}}?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16153) Cassandra 4b2 - JVM options from *.options not read/set

2020-10-01 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-16153:
---
Attachment: system.log.2020-10-01.0.zip

> Cassandra 4b2 - JVM options from *.options not read/set
> ---
>
> Key: CASSANDRA-16153
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16153
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Scripts
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: system.log.2020-10-01.0.zip
>
>
> Trying out Cassandra 4 beta 2 with Java 8 (AdoptOpenJDK) in AWS.
> {noformat}
> NAME="Amazon Linux AMI"
> VERSION="2018.03"
> ID="amzn"
> ID_LIKE="rhel fedora"
> VERSION_ID="2018.03"
> PRETTY_NAME="Amazon Linux AMI 2018.03"
> ANSI_COLOR="0;33"
> CPE_NAME="cpe:/o:amazon:linux:2018.03:ga"
> HOME_URL="http://aws.amazon.com/amazon-linux-ami/"
> {noformat}
> It seems the Cassandra JVM results in using Parallel GC.
> {noformat}
> INFO  [Service Thread] 2020-10-01 00:00:56,233 GCInspector.java:299 - PS 
> Scavenge GC in 541ms.  PS Old Gen: 5152844776 -> 5726724752;
> WARN  [Service Thread] 2020-10-01 00:00:56,234 GCInspector.java:297 - PS 
> MarkSweep GC in 1969ms.  PS Eden Space: 2111307776 -> 0; PS Old Gen: 
> 5726724752 -> 2581334376; PS Survivor Space: 363850224 -> 0
> {noformat}
> Although {{jvm8-server.options}} is using CMS.
> {noformat}
> #
> #  GC SETTINGS  #
> #
> ### CMS Settings
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSWaitDuration=1
> -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSEdenChunksRecordAlways
> ## some JVMs will fill up their heap when accessed via JMX, see CASSANDRA-6541
> -XX:+CMSClassUnloadingEnabled
> ...
> {noformat}
> In Cassandra 3, default has been CMS.
> So, possibly there is something wrong in reading/processing 
> {{jvm8-server.options}}?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16153) Cassandra 4b2 - JVM options from *.options not read/set

2020-10-01 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205967#comment-17205967
 ] 

Thomas Steinmaurer commented on CASSANDRA-16153:


[~brandon.williams], no. 4 vCores (m5.xlarge).

> Cassandra 4b2 - JVM options from *.options not read/set
> ---
>
> Key: CASSANDRA-16153
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16153
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Scripts
>Reporter: Thomas Steinmaurer
>Priority: Normal
>
> Trying out Cassandra 4 beta 2 with Java 8 (AdoptOpenJDK) in AWS.
> {noformat}
> NAME="Amazon Linux AMI"
> VERSION="2018.03"
> ID="amzn"
> ID_LIKE="rhel fedora"
> VERSION_ID="2018.03"
> PRETTY_NAME="Amazon Linux AMI 2018.03"
> ANSI_COLOR="0;33"
> CPE_NAME="cpe:/o:amazon:linux:2018.03:ga"
> HOME_URL="http://aws.amazon.com/amazon-linux-ami/"
> {noformat}
> It seems the Cassandra JVM results in using Parallel GC.
> {noformat}
> INFO  [Service Thread] 2020-10-01 00:00:56,233 GCInspector.java:299 - PS 
> Scavenge GC in 541ms.  PS Old Gen: 5152844776 -> 5726724752;
> WARN  [Service Thread] 2020-10-01 00:00:56,234 GCInspector.java:297 - PS 
> MarkSweep GC in 1969ms.  PS Eden Space: 2111307776 -> 0; PS Old Gen: 
> 5726724752 -> 2581334376; PS Survivor Space: 363850224 -> 0
> {noformat}
> Although {{jvm8-server.options}} is using CMS.
> {noformat}
> #
> #  GC SETTINGS  #
> #
> ### CMS Settings
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSWaitDuration=1
> -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSEdenChunksRecordAlways
> ## some JVMs will fill up their heap when accessed via JMX, see CASSANDRA-6541
> -XX:+CMSClassUnloadingEnabled
> ...
> {noformat}
> In Cassandra 3, default has been CMS.
> So, possibly there is something wrong in reading/processing 
> {{jvm8-server.options}}?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16153) Cassandra 4b2 - JVM options from *.options not read/set

2020-10-01 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-16153:
---
Description: 
Trying out Cassandra 4 beta 2 with Java 8 (AdoptOpenJDK) in AWS.
{noformat}
NAME="Amazon Linux AMI"
VERSION="2018.03"
ID="amzn"
ID_LIKE="rhel fedora"
VERSION_ID="2018.03"
PRETTY_NAME="Amazon Linux AMI 2018.03"
ANSI_COLOR="0;33"
CPE_NAME="cpe:/o:amazon:linux:2018.03:ga"
HOME_URL="http://aws.amazon.com/amazon-linux-ami/"
{noformat}

It seems the Cassandra JVM results in using Parallel GC.
{noformat}
INFO  [Service Thread] 2020-10-01 00:00:56,233 GCInspector.java:299 - PS 
Scavenge GC in 541ms.  PS Old Gen: 5152844776 -> 5726724752;
WARN  [Service Thread] 2020-10-01 00:00:56,234 GCInspector.java:297 - PS 
MarkSweep GC in 1969ms.  PS Eden Space: 2111307776 -> 0; PS Old Gen: 5726724752 
-> 2581334376; PS Survivor Space: 363850224 -> 0
{noformat}

Although {{jvm8-server.options}} is using CMS.
{noformat}
#
#  GC SETTINGS  #
#

### CMS Settings
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSWaitDuration=1
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways
## some JVMs will fill up their heap when accessed via JMX, see CASSANDRA-6541
-XX:+CMSClassUnloadingEnabled
...
{noformat}

In Cassandra 3, default has been CMS.

So, possibly there is something wrong in reading/processing 
{{jvm8-server.options}}?

  was:
Trying out Cassandra 4 beta 2 with Java 8 (AdoptOpenJDK) on Ubuntu 18.04 LTS. 
It seems the Cassandra JVM results in using Parallel GC. In Cassandra 3, 
default has been CMS.

Digging a bit further, it seems like the {{jvm8-server.options}} and 
{{jvm11-server.options}} files aren't read/processed in e.g. 
{{cassandra-env.sh}}.

E.g. in Cassandra 3.11, we have something like this in {{cassandra-env.sh}}:
{noformat}
# Read user-defined JVM options from jvm.options file
JVM_OPTS_FILE=$CASSANDRA_CONF/jvm.options
for opt in `grep "^-" $JVM_OPTS_FILE`
do
  JVM_OPTS="$JVM_OPTS $opt"
done
{noformat}

Can't find something similar in {{cassandra-env.sh}} for Cassandra 4 beta2.


> Cassandra 4b2 - JVM options from *.options not read/set
> ---
>
> Key: CASSANDRA-16153
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16153
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Scripts
>Reporter: Thomas Steinmaurer
>Priority: Normal
>
> Trying out Cassandra 4 beta 2 with Java 8 (AdoptOpenJDK) in AWS.
> {noformat}
> NAME="Amazon Linux AMI"
> VERSION="2018.03"
> ID="amzn"
> ID_LIKE="rhel fedora"
> VERSION_ID="2018.03"
> PRETTY_NAME="Amazon Linux AMI 2018.03"
> ANSI_COLOR="0;33"
> CPE_NAME="cpe:/o:amazon:linux:2018.03:ga"
> HOME_URL="http://aws.amazon.com/amazon-linux-ami/"
> {noformat}
> It seems the Cassandra JVM results in using Parallel GC.
> {noformat}
> INFO  [Service Thread] 2020-10-01 00:00:56,233 GCInspector.java:299 - PS 
> Scavenge GC in 541ms.  PS Old Gen: 5152844776 -> 5726724752;
> WARN  [Service Thread] 2020-10-01 00:00:56,234 GCInspector.java:297 - PS 
> MarkSweep GC in 1969ms.  PS Eden Space: 2111307776 -> 0; PS Old Gen: 
> 5726724752 -> 2581334376; PS Survivor Space: 363850224 -> 0
> {noformat}
> Although {{jvm8-server.options}} is using CMS.
> {noformat}
> #
> #  GC SETTINGS  #
> #
> ### CMS Settings
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSWaitDuration=1
> -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSEdenChunksRecordAlways
> ## some JVMs will fill up their heap when accessed via JMX, see CASSANDRA-6541
> -XX:+CMSClassUnloadingEnabled
> ...
> {noformat}
> In Cassandra 3, default has been CMS.
> So, possibly there is something wrong in reading/processing 
> {{jvm8-server.options}}?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-16153) Cassandra 4b2 - JVM options from *.options not read/set

2020-09-30 Thread Thomas Steinmaurer (Jira)
Thomas Steinmaurer created CASSANDRA-16153:
--

 Summary: Cassandra 4b2 - JVM options from *.options not read/set
 Key: CASSANDRA-16153
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16153
 Project: Cassandra
  Issue Type: Bug
  Components: Local/Scripts
Reporter: Thomas Steinmaurer


Trying out Cassandra 4 beta 2 with Java 8 (AdoptOpenJDK) on Ubuntu 18.04 LTS. 
It seems the Cassandra JVM results in using Parallel GC. In Cassandra 3, 
default has been CMS.

Digging a bit further, it seems like the {{jvm8-server.options}} and 
{{jvm11-server.options}} files aren't read/processed in e.g. 
{{cassandra-env.sh}}.

E.g. in Cassandra 3.11, we have something like this in {{cassandra-env.sh}}:
{noformat}
# Read user-defined JVM options from jvm.options file
JVM_OPTS_FILE=$CASSANDRA_CONF/jvm.options
for opt in `grep "^-" $JVM_OPTS_FILE`
do
  JVM_OPTS="$JVM_OPTS $opt"
done
{noformat}

Can't find something similar in {{cassandra-env.sh}} for Cassandra 4 beta2.
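Independent of how the *.options files are (or are not) being picked up, a 
quick way to confirm which collector the running JVM actually selected is to 
query the GC MX beans (a minimal sketch; the collector names below are the 
usual HotSpot ones and may differ on other JVMs). Parallel GC reports itself as 
"PS Scavenge" / "PS MarkSweep", the same names GCInspector logs, while CMS 
would report "ParNew" / "ConcurrentMarkSweep":
{noformat}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCheck {
    public static void main(String[] args) {
        // On HotSpot: Parallel GC -> "PS Scavenge" / "PS MarkSweep",
        //             CMS         -> "ParNew" / "ConcurrentMarkSweep".
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("collector=%s collections=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
{noformat}
The same check is possible from outside the process with {{jcmd <pid> VM.flags}}, 
which makes a mismatch with {{jvm8-server.options}} easy to spot.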



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15563) Backport removal of OpenJDK warning log

2020-02-10 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-15563:
---
Summary: Backport removal of OpenJDK warning log  (was: Backport OpenJDK 
warning log)

> Backport removal of OpenJDK warning log
> ---
>
> Key: CASSANDRA-15563
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15563
> Project: Cassandra
>  Issue Type: Task
>Reporter: Thomas Steinmaurer
>Priority: Normal
>
> As requested on Slack, creating this ticket for a backport of 
> CASSANDRA-13916, potentially to 2.2 and 3.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15563) Backport OpenJDK warning log

2020-02-10 Thread Thomas Steinmaurer (Jira)
Thomas Steinmaurer created CASSANDRA-15563:
--

 Summary: Backport OpenJDK warning log
 Key: CASSANDRA-15563
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15563
 Project: Cassandra
  Issue Type: Task
Reporter: Thomas Steinmaurer


As requested on Slack, creating this ticket for a backport of CASSANDRA-13916, 
potentially to 2.2 and 3.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-01-23 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022712#comment-17022712
 ] 

Thomas Steinmaurer edited comment on CASSANDRA-15430 at 1/24/20 6:07 AM:
-

[~benedict], in the previously provided OneDrive Link, I have put another JFR 
(sub-directory {{full_on_cas_3.0.18}}) where the entire cluster was on 3.0.18, 
thus any {{LegacyLayout}} related signs in the stack traces should be gone. I 
see no reason to open another ticket for that, because it does not change the 
situation that the write path churns a lot (compared to 2.1).

Thanks!


was (Author: tsteinmaurer):
[~benedict], in the previously provided OneDrive Link, I have put another JFR 
where the entire cluster was on 3.0.18, thus any {{LegacyLayout}} related signs 
in the stack traces should be gone. I see no reason to open another ticket for 
that, because it does not change the situation that the write path churns a lot 
(compared to 2.1).

Thanks!

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>
> In a 6 node loadtest cluster, we have been running a certain production-like 
> workload with 2.1.18 constantly and sufficiently. After upgrading one 
> node to 3.0.18 (remaining 5 still on 2.1.18 after we have seen that sort of 
> regression described below), 3.0.18 is showing increased CPU usage, increased 
> GC, high mutation stage pending tasks, dropped mutation messages ...
> Some spec. All 6 nodes equally sized:
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus same settings in regard to number 
> of threads, compaction throttling etc.
> Following dashboard shows highlighted areas (CPU, suspension) with metrics 
> for all 6 nodes and the one outlier being the node upgraded to Cassandra 
> 3.0.18.
>  !dashboard.png|width=1280!
> Additionally we see a large increase on pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - 
> MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout 
> and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> MutationStage   256 81824 3360532756 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ReadStage 0 0   62862266 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> RequestResponseStage  0 0 2176659856 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> ReadRepairStage   0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> CounterMutationStage  0 0  0 0
>  0
> ...
> {noformat}
> Judging from a 15min JFR session for both, 3.0.18 and 2.1.18 on a different 
> node, high-level, it looks like the code path underneath 
> {{BatchMessage.execute}} is producing ~ 10x more on-heap allocations in 
> 3.0.18 compared to 2.1.18.
>  !jfr_allocations.png!
> Left => 3.0.18
>  Right => 2.1.18
> JFRs zipped are exceeding the 60MB limit to directly attach to the ticket. I 
> can upload them, if there is another destination available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-01-23 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022712#comment-17022712
 ] 

Thomas Steinmaurer commented on CASSANDRA-15430:


[~benedict], in the previously provided OneDrive Link, I have put another JFR 
where the entire cluster was on 3.0.18, thus any {{LegacyLayout}} related signs 
in the stack traces should be gone. I see no reason to open another ticket for 
that, because it does not change the situation that the write path churns a lot 
(compared to 2.1).

Thanks!

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>
> In a 6 node loadtest cluster, we have been running a certain production-like 
> workload with 2.1.18 constantly and sufficiently. After upgrading one 
> node to 3.0.18 (remaining 5 still on 2.1.18 after we have seen that sort of 
> regression described below), 3.0.18 is showing increased CPU usage, increased 
> GC, high mutation stage pending tasks, dropped mutation messages ...
> Some spec. All 6 nodes equally sized:
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus same settings in regard to number 
> of threads, compaction throttling etc.
> Following dashboard shows highlighted areas (CPU, suspension) with metrics 
> for all 6 nodes and the one outlier being the node upgraded to Cassandra 
> 3.0.18.
>  !dashboard.png|width=1280!
> Additionally we see a large increase on pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - 
> MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout 
> and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> MutationStage   256 81824 3360532756 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ReadStage 0 0   62862266 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> RequestResponseStage  0 0 2176659856 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> ReadRepairStage   0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> CounterMutationStage  0 0  0 0
>  0
> ...
> {noformat}
> Judging from a 15min JFR session for both, 3.0.18 and 2.1.18 on a different 
> node, high-level, it looks like the code path underneath 
> {{BatchMessage.execute}} is producing ~ 10x more on-heap allocations in 
> 3.0.18 compared to 2.1.18.
>  !jfr_allocations.png!
> Left => 3.0.18
>  Right => 2.1.18
> JFRs zipped are exceeding the 60MB limit to directly attach to the ticket. I 
> can upload them, if there is another destination available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-01-17 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017990#comment-17017990
 ] 

Thomas Steinmaurer edited comment on CASSANDRA-15430 at 1/17/20 1:09 PM:
-

[~benedict], thanks!

Just another internal high-level overview on our 3.0 experience compared to 2.1 
when it comes to the write path.
 !screenshot-4.png! 

"Timeseries (TS)" written is simply our "payload" (data point) we are ingesting 
in a 6 node load test environment. While 2.1.18 was able to handle ~ 2 million TS 
payloads / min / Cassandra JVM at 3-4% GC suspension and 25% CPU usage on a 64 
vCore box (32 physical cores) without dropping mutation messages, 3.0.18 looks 
much worse. All 3.0.18 based tests in the table have been done without being in 
a mixed Cassandra binary version scenario.

So, any low-hanging fruit would be much appreciated. :-)


was (Author: tsteinmaurer):
[~benedict], thanks!

Just another internal high-level overview on our 3.0 experience compared to 2.1 
when it comes to the write patch.
 !screenshot-4.png! 

"Timeseries (TS)" written is simply our "payload" (data point) we are ingesting 
in a 6 node load test environment. While 2.1.18 was able to handle ~ 2 million TS 
payloads / min / Cassandra JVM at 3-4% GC suspension and 25% CPU usage on a 64 
vCore box (32 physical cores) without dropping mutation messages, 3.0.18 looks 
much worse. All 3.0.18 based tests in the table have been done without being in 
a mixed Cassandra binary version scenario.

So, any low-hanging fruit would be much appreciated. :-)

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>
> In a 6 node loadtest cluster, we have been running a certain production-like 
> workload with 2.1.18 constantly and sufficiently. After upgrading one 
> node to 3.0.18 (remaining 5 still on 2.1.18 after we have seen that sort of 
> regression described below), 3.0.18 is showing increased CPU usage, increased 
> GC, high mutation stage pending tasks, dropped mutation messages ...
> Some spec. All 6 nodes equally sized:
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus same settings in regard to number 
> of threads, compaction throttling etc.
> Following dashboard shows highlighted areas (CPU, suspension) with metrics 
> for all 6 nodes and the one outlier being the node upgraded to Cassandra 
> 3.0.18.
>  !dashboard.png|width=1280!
> Additionally we see a large increase on pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - 
> MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout 
> and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> MutationStage   256 81824 3360532756 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ReadStage 0 0   62862266 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> RequestResponseStage  0 0 2176659856 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> ReadRepairStage   0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> CounterMutationStage  0 0  0 0
>  0
> ...
> {noformat}
> Judging from a 15min JFR session for both, 3.0.18 and 2.1.18 on a different 
> node, high-level, it looks like the code path underneath 
> {{BatchMessage.execute}} is producing ~ 10x more on-heap allocations in 
> 3.0.18 compared to 2.1.18.
>  !jfr_allocations.png!
> Left => 3.0.18
>  Right => 2.1.18
> JFRs zipped are exceeding the 60MB limit to directly attach to the ticket. I 
> can upload them, if there is another destination available.

[jira] [Commented] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-01-17 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017990#comment-17017990
 ] 

Thomas Steinmaurer commented on CASSANDRA-15430:


[~benedict], thanks!

Just another internal high-level overview of our 3.0 experience compared to 2.1 
when it comes to the write path.
 !screenshot-4.png! 

"Timeseries (TS) written" is simply the payload (data point) count we are 
ingesting in a 6-node load test environment. While 2.1.18 was able to handle 
~ 2 million TS payloads / min / Cassandra JVM at 3-4% GC suspension and 25% CPU 
usage on a 64 vCore box (32 physical cores) without dropping mutation messages, 
3.0.18 looks much worse. All 3.0.18-based tests in the table were done without 
being in a mixed Cassandra binary version scenario.

So, any low-hanging fruit would be much appreciated. :-)

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>
> In a 6 node loadtest cluster, we have been running with 2.1.18 a certain 
> production-like workload constantly and sufficiently. After upgrading one 
> node to 3.0.18 (remaining 5 still on 2.1.18 after we have seen that sort of 
> regression described below), 3.0.18 is showing increased CPU usage, increase 
> GC, high mutation stage pending tasks, dropped mutation messages ...
> Some spec. All 6 nodes equally sized:
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus same settings in regard to number 
> of threads, compaction throttling etc.
> Following dashboard shows highlighted areas (CPU, suspension) with metrics 
> for all 6 nodes and the one outlier being the node upgraded to Cassandra 
> 3.0.18.
>  !dashboard.png|width=1280!
> Additionally we see a large increase on pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - 
> MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout 
> and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> MutationStage   256 81824 3360532756 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ReadStage 0 0   62862266 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> RequestResponseStage  0 0 2176659856 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> ReadRepairStage   0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> CounterMutationStage  0 0  0 0
>  0
> ...
> {noformat}
> Judging from a 15min JFR session for both, 3.0.18 and 2.1.18 on a different 
> node, high-level, it looks like the code path underneath 
> {{BatchMessage.execute}} is producing ~ 10x more on-heap allocations in 
> 3.0.18 compared to 2.1.18.
>  !jfr_allocations.png!
> Left => 3.0.18
>  Right => 2.1.18
> JFRs zipped are exceeding the 60MB limit to directly attach to the ticket. I 
> can upload them, if there is another destination available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-01-17 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-15430:
---
Attachment: screenshot-4.png

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>
> In a 6 node loadtest cluster, we have been running with 2.1.18 a certain 
> production-like workload constantly and sufficiently. After upgrading one 
> node to 3.0.18 (remaining 5 still on 2.1.18 after we have seen that sort of 
> regression described below), 3.0.18 is showing increased CPU usage, increase 
> GC, high mutation stage pending tasks, dropped mutation messages ...
> Some spec. All 6 nodes equally sized:
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus same settings in regard to number 
> of threads, compaction throttling etc.
> Following dashboard shows highlighted areas (CPU, suspension) with metrics 
> for all 6 nodes and the one outlier being the node upgraded to Cassandra 
> 3.0.18.
>  !dashboard.png|width=1280!
> Additionally we see a large increase on pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - 
> MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout 
> and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> MutationStage   256 81824 3360532756 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ReadStage 0 0   62862266 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> RequestResponseStage  0 0 2176659856 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> ReadRepairStage   0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> CounterMutationStage  0 0  0 0
>  0
> ...
> {noformat}
> Judging from a 15min JFR session for both, 3.0.18 and 2.1.18 on a different 
> node, high-level, it looks like the code path underneath 
> {{BatchMessage.execute}} is producing ~ 10x more on-heap allocations in 
> 3.0.18 compared to 2.1.18.
>  !jfr_allocations.png!
> Left => 3.0.18
>  Right => 2.1.18
> JFRs zipped are exceeding the 60MB limit to directly attach to the ticket. I 
> can upload them, if there is another destination available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-01-17 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017955#comment-17017955
 ] 

Thomas Steinmaurer commented on CASSANDRA-15430:


[~benedict], as discussed, a few more JFR-based screens to better show the 
differences, until you find time to open the JFR files yourself. Again, it is 
all about the write path (only), i.e. executing batch messages. Both recordings 
had the same *JFR duration*, namely *15min*.

Cassandra *2.1.18*:
* BatchMessage.execute - In total: 42,76 GByte with next level top contributors:
** BatchStatement.getMutations => 19,46 GByte
** BatchStatement.executeWithoutConditions => 16,97 GByte
 !screenshot-1.png! 

Cassandra *3.0.18*:
* BatchMessage.execute - In total: 451,86 GByte (factor 10 more) with next 
level top contributors:
** BatchStatement.executeWithoutConditions => 214,23 GByte
** BatchStatement.getMutations => 205,52 GByte

For *3.0.18*, a more in-depth drill-down for 
BatchStatement.executeWithoutConditions (214,23 GByte):
 !screenshot-2.png! 
and for BatchStatement.getMutations (205,52 GByte):
 !screenshot-3.png! 


It is hard to give sufficient detail with screenshots, so working directly with 
the provided JFR files is likely the best option.
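
For reference, the ratio of the two 15-minute allocation totals above is 
451,86 GByte / 42,76 GByte, i.e. roughly a factor of 10,6.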

Thanks a lot!

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> In a 6 node loadtest cluster, we have been running with 2.1.18 a certain 
> production-like workload constantly and sufficiently. After upgrading one 
> node to 3.0.18 (remaining 5 still on 2.1.18 after we have seen that sort of 
> regression described below), 3.0.18 is showing increased CPU usage, increase 
> GC, high mutation stage pending tasks, dropped mutation messages ...
> Some spec. All 6 nodes equally sized:
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus same settings in regard to number 
> of threads, compaction throttling etc.
> Following dashboard shows highlighted areas (CPU, suspension) with metrics 
> for all 6 nodes and the one outlier being the node upgraded to Cassandra 
> 3.0.18.
>  !dashboard.png|width=1280!
> Additionally we see a large increase on pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - 
> MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout 
> and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> MutationStage   256 81824 3360532756 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ReadStage 0 0   62862266 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> RequestResponseStage  0 0 2176659856 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> ReadRepairStage   0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> CounterMutationStage  0 0  0 0
>  0
> ...
> {noformat}
> Judging from a 15min JFR session for both, 3.0.18 and 2.1.18 on a different 
> node, high-level, it looks like the code path underneath 
> {{BatchMessage.execute}} is producing ~ 10x more on-heap allocations in 
> 3.0.18 compared to 2.1.18.
>  !jfr_allocations.png!
> Left => 3.0.18
>  Right => 2.1.18
> JFRs zipped are exceeding the 60MB limit to directly attach to the ticket. I 
> can upload them, if there is another destination available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-01-17 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-15430:
---
Attachment: screenshot-3.png

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> In a 6 node loadtest cluster, we have been running with 2.1.18 a certain 
> production-like workload constantly and sufficiently. After upgrading one 
> node to 3.0.18 (remaining 5 still on 2.1.18 after we have seen that sort of 
> regression described below), 3.0.18 is showing increased CPU usage, increase 
> GC, high mutation stage pending tasks, dropped mutation messages ...
> Some spec. All 6 nodes equally sized:
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus same settings in regard to number 
> of threads, compaction throttling etc.
> Following dashboard shows highlighted areas (CPU, suspension) with metrics 
> for all 6 nodes and the one outlier being the node upgraded to Cassandra 
> 3.0.18.
>  !dashboard.png|width=1280!
> Additionally we see a large increase on pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - 
> MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout 
> and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> MutationStage   256 81824 3360532756 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ReadStage 0 0   62862266 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> RequestResponseStage  0 0 2176659856 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> ReadRepairStage   0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> CounterMutationStage  0 0  0 0
>  0
> ...
> {noformat}
> Judging from a 15min JFR session for both, 3.0.18 and 2.1.18 on a different 
> node, high-level, it looks like the code path underneath 
> {{BatchMessage.execute}} is producing ~ 10x more on-heap allocations in 
> 3.0.18 compared to 2.1.18.
>  !jfr_allocations.png!
> Left => 3.0.18
>  Right => 2.1.18
> JFRs zipped are exceeding the 60MB limit to directly attach to the ticket. I 
> can upload them, if there is another destination available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-01-17 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-15430:
---
Attachment: screenshot-2.png

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png, 
> screenshot-1.png, screenshot-2.png
>
>
> In a 6 node loadtest cluster, we have been running with 2.1.18 a certain 
> production-like workload constantly and sufficiently. After upgrading one 
> node to 3.0.18 (remaining 5 still on 2.1.18 after we have seen that sort of 
> regression described below), 3.0.18 is showing increased CPU usage, increase 
> GC, high mutation stage pending tasks, dropped mutation messages ...
> Some spec. All 6 nodes equally sized:
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus same settings in regard to number 
> of threads, compaction throttling etc.
> Following dashboard shows highlighted areas (CPU, suspension) with metrics 
> for all 6 nodes and the one outlier being the node upgraded to Cassandra 
> 3.0.18.
>  !dashboard.png|width=1280!
> Additionally we see a large increase on pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - 
> MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout 
> and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> MutationStage   256 81824 3360532756 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ReadStage 0 0   62862266 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> RequestResponseStage  0 0 2176659856 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> ReadRepairStage   0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> CounterMutationStage  0 0  0 0
>  0
> ...
> {noformat}
> Judging from a 15min JFR session for both, 3.0.18 and 2.1.18 on a different 
> node, high-level, it looks like the code path underneath 
> {{BatchMessage.execute}} is producing ~ 10x more on-heap allocations in 
> 3.0.18 compared to 2.1.18.
>  !jfr_allocations.png!
> Left => 3.0.18
>  Right => 2.1.18
> JFRs zipped are exceeding the 60MB limit to directly attach to the ticket. I 
> can upload them, if there is another destination available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-01-17 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-15430:
---
Attachment: screenshot-1.png

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png, 
> screenshot-1.png
>
>
> In a 6 node loadtest cluster, we have been running with 2.1.18 a certain 
> production-like workload constantly and sufficiently. After upgrading one 
> node to 3.0.18 (remaining 5 still on 2.1.18 after we have seen that sort of 
> regression described below), 3.0.18 is showing increased CPU usage, increase 
> GC, high mutation stage pending tasks, dropped mutation messages ...
> Some spec. All 6 nodes equally sized:
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus same settings in regard to number 
> of threads, compaction throttling etc.
> Following dashboard shows highlighted areas (CPU, suspension) with metrics 
> for all 6 nodes and the one outlier being the node upgraded to Cassandra 
> 3.0.18.
>  !dashboard.png|width=1280!
> Additionally we see a large increase on pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - 
> MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout 
> and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> MutationStage   256 81824 3360532756 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ReadStage 0 0   62862266 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> RequestResponseStage  0 0 2176659856 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> ReadRepairStage   0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> CounterMutationStage  0 0  0 0
>  0
> ...
> {noformat}
> Judging from a 15min JFR session for both, 3.0.18 and 2.1.18 on a different 
> node, high-level, it looks like the code path underneath 
> {{BatchMessage.execute}} is producing ~ 10x more on-heap allocations in 
> 3.0.18 compared to 2.1.18.
>  !jfr_allocations.png!
> Left => 3.0.18
>  Right => 2.1.18
> JFRs zipped are exceeding the 60MB limit to directly attach to the ticket. I 
> can upload them, if there is another destination available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-01-15 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016628#comment-17016628
 ] 

Thomas Steinmaurer edited comment on CASSANDRA-15430 at 1/16/20 7:57 AM:
-

[~benedict], please try to download the JFR files for both 2.1.18 and 3.0.18 
here: 
[https://dynatrace-my.sharepoint.com/:f:/p/thomas_steinmaurer/EoFkdBH-WnlOmuGZ4hL_8PwByBTQLwhtlBGBLW_0y3P9rg?e=uKlr6W]

The data model is pretty straightforward. It originates from our Astyanax/Thrift 
legacy days and was moved over to CQL as a BLOB-centric model, with our 
client-side "serializer framework".

E.g.:
{noformat}
CREATE TABLE ks."cf" (
k blob,
n blob,
v blob,
PRIMARY KEY (k, n)
) WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (n ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '2'}
AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 259200
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = 'NONE';
{noformat}

Regarding queries: it is really just about the write path (batch message 
processing) in Cassandra 2.1 vs. 3.0, as outlined in the issue description. We 
have tried single-partition batches vs. multi-partition batches (I know, bad 
practice), but in our tests single-partition batches didn't have a positive 
impact on the 3.0 write path either; see the sketch of both shapes right below.
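
To make the two batch shapes concrete, here is a minimal CQL sketch against the 
table above (the hex blob literals are placeholders rather than our real 
serialized payloads, and the batches are shown as UNLOGGED just for 
illustration):
{noformat}
-- single-partition batch: every row targets the same partition key k
BEGIN UNLOGGED BATCH
  INSERT INTO ks."cf" (k, n, v) VALUES (0x01, 0x000a, 0xcafe);
  INSERT INTO ks."cf" (k, n, v) VALUES (0x01, 0x000b, 0xcafe);
APPLY BATCH;

-- multi-partition batch: rows spread across different partition keys
BEGIN UNLOGGED BATCH
  INSERT INTO ks."cf" (k, n, v) VALUES (0x01, 0x000a, 0xcafe);
  INSERT INTO ks."cf" (k, n, v) VALUES (0x02, 0x000a, 0xcafe);
APPLY BATCH;
{noformat}
Neither shape avoided the extra allocations underneath BatchMessage.execute in 
3.0.18 in our tests.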

Moving from 2.1 to 3.0 would mean we have to add ~ 30-40% more resources to 
handle the same write load sufficiently. Thanks for any help in that area!


was (Author: tsteinmaurer):
[~benedict], please try to download the JFR files for both 2.1.18 and 3.0.18 
here: 
[https://dynatrace-my.sharepoint.com/:f:/p/thomas_steinmaurer/EoFkdBH-WnlOmuGZ4hL_8PwByBTQLwhtlBGBLW_0y3P9rg?e=uKlr6W]

The data model is pretty straightforward originating from Astyanax/Thrift 
legacy days, moving over to CQL, in a BLOB-centric model, with our client-side 
"serializer framework".

E.g.:
{noformat}
CREATE TABLE ks."cf" (
k blob,
n blob,
v blob,
PRIMARY KEY (k, n)
) WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (n ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '2'}
AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 259200
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = 'NONE';
{noformat}

Regarding queries. It is really just about the write path (batch message 
processing) in Cas 2.1 vs. 3.0 as outlined in the issue description. We have 
tried single-partition batches vs. multi-partition batches (I know, bad 
practice), but single-partition batches didn't have a positive impact on the 
write path in 3.0 either in our tests.

Moving from 2.1 to 3.0 would mean for us to add ~ 30-40% more resources to 
handle the same load sufficiently. Thanks for any help in that area!

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png
>
>
> In a 6 node loadtest cluster, we have been running with 2.1.18 a certain 
> production-like workload constantly and sufficiently. After upgrading one 
> node to 3.0.18 (remaining 5 still on 2.1.18 after we have seen that sort of 
> regression described below), 3.0.18 is showing increased CPU usage, increase 
> GC, high mutation stage pending tasks, dropped mutation messages ...
> Some spec. All 6 nodes equally sized:
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus same settings in regard to number 
> of threads, compaction throttling etc.
> Following dashboard 

[jira] [Commented] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-01-15 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016628#comment-17016628
 ] 

Thomas Steinmaurer commented on CASSANDRA-15430:


[~benedict], please try to download the JFR files for both 2.1.18 and 3.0.18 
here: 
[https://dynatrace-my.sharepoint.com/:f:/p/thomas_steinmaurer/EoFkdBH-WnlOmuGZ4hL_8PwByBTQLwhtlBGBLW_0y3P9rg?e=uKlr6W]

The data model is pretty straightforward. It originates from our Astyanax/Thrift 
legacy days and was moved over to CQL as a BLOB-centric model, with our 
client-side "serializer framework".

E.g.:
{noformat}
CREATE TABLE ks."cf" (
k blob,
n blob,
v blob,
PRIMARY KEY (k, n)
) WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (n ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '2'}
AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 259200
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = 'NONE';
{noformat}

Regarding queries: it is really just about the write path (batch message 
processing) in Cassandra 2.1 vs. 3.0, as outlined in the issue description. We 
have tried single-partition batches vs. multi-partition batches (I know, bad 
practice), but in our tests single-partition batches didn't have a positive 
impact on the 3.0 write path either.

Moving from 2.1 to 3.0 would mean we have to add ~ 30-40% more resources to 
handle the same load sufficiently. Thanks for any help in that area!

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png
>
>
> In a 6 node loadtest cluster, we have been running with 2.1.18 a certain 
> production-like workload constantly and sufficiently. After upgrading one 
> node to 3.0.18 (remaining 5 still on 2.1.18 after we have seen that sort of 
> regression described below), 3.0.18 is showing increased CPU usage, increase 
> GC, high mutation stage pending tasks, dropped mutation messages ...
> Some spec. All 6 nodes equally sized:
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus same settings in regard to number 
> of threads, compaction throttling etc.
> Following dashboard shows highlighted areas (CPU, suspension) with metrics 
> for all 6 nodes and the one outlier being the node upgraded to Cassandra 
> 3.0.18.
>  !dashboard.png|width=1280!
> Additionally we see a large increase on pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - 
> MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout 
> and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> MutationStage   256 81824 3360532756 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ReadStage 0 0   62862266 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> RequestResponseStage  0 0 2176659856 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> ReadRepairStage   0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> CounterMutationStage  0 0  0 0
>  0
> ...
> {noformat}
> Judging from a 15min JFR session for both, 3.0.18 

[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2019-11-15 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-15430:
---
Description: 
In a 6-node loadtest cluster, we have been running a certain production-like 
workload constantly and sufficiently with 2.1.18. After upgrading one node to 
3.0.18 (the remaining 5 are still on 2.1.18 after we saw the sort of regression 
described below), 3.0.18 is showing increased CPU usage, increased GC, high 
mutation stage pending tasks, dropped mutation messages ...

Some specs, all 6 nodes equally sized:
 * Bare metal, 32 physical cores, 512G RAM
 * Xmx31G, G1, max pause millis = 2000ms
 * cassandra.yaml basically unchanged, thus same settings in regard to number 
of threads, compaction throttling etc.

The following dashboard shows highlighted areas (CPU, suspension) with metrics 
for all 6 nodes, the one outlier being the node upgraded to Cassandra 3.0.18.
 !dashboard.png|width=1280!

Additionally, we see a large increase in pending tasks in the mutation stage 
after the upgrade:
 !mutation_stage.png!

And dropped mutation messages, also confirmed in the Cassandra log:
{noformat}
INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout and 0 for cross node timeout
INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - MutationStage                   256     81824     3360532756         0                 0
INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ViewMutationStage                 0         0              0         0                 0
INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ReadStage                         0         0       62862266         0                 0
INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - RequestResponseStage              0         0     2176659856         0                 0
INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - ReadRepairStage                   0         0              0         0                 0
INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - CounterMutationStage              0         0              0         0                 0
...
{noformat}
Judging from a 15min JFR session each for 3.0.18 and 2.1.18 (the latter on a 
different node), at a high level it looks like the code path underneath 
{{BatchMessage.execute}} is producing ~ 10x more on-heap allocations in 3.0.18 
compared to 2.1.18.
 !jfr_allocations.png!

Left => 3.0.18
 Right => 2.1.18

Zipped, the JFRs exceed the 60MB limit for attaching directly to the ticket. I 
can upload them if there is another destination available.

  was:
In a 6 node loadtest cluster, we have been running with 2.1.18 a certain 
production-like workload constantly and sufficiently. After upgrading one node 
to 3.0.18 (remaining 5 still on 2.1.18 after we have seen that sort of 
regression described below), 3.0.18 is showing increased CPU usage, increase 
GC, high mutation stage pending tasks, dropped mutation messages ...

Some spec. All 6 nodes equally sized:
 * Bare metal, 32 physical cores, 512G RAM
 * Xmx31G, G1, max pause millis = 2000ms
 * cassandra.yaml basically unchanged, thus some settings in regard to number 
of threads, compaction throttling etc.

Following dashboard shows highlighted areas (CPU, suspension) with metrics for 
all 6 nodes and the one outlier being the node upgraded to Cassandra 3.0.18.
 !dashboard.png|width=1280!

Additionally we see a large increase on pending tasks in the mutation stage 
after the upgrade:
 !mutation_stage.png!

And dropped mutation messages, also confirmed in the Cassandra log:
{noformat}
INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - 
MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout and 
0 for cross node timeout
INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool 
NameActive   Pending  Completed   Blocked  All Time 
Blocked
INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
MutationStage   256 81824 3360532756 0  
   0

INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
ViewMutationStage 0 0  0 0  
   0

INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
ReadStage 0 0   62862266 0  
   0

INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
RequestResponseStage  0 0 2176659856  

[jira] [Created] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2019-11-15 Thread Thomas Steinmaurer (Jira)
Thomas Steinmaurer created CASSANDRA-15430:
--

 Summary: Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap 
allocations compared to 2.1.18
 Key: CASSANDRA-15430
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
 Project: Cassandra
  Issue Type: Bug
Reporter: Thomas Steinmaurer
 Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png

In a 6 node loadtest cluster, we have been running with 2.1.18 a certain 
production-like workload constantly and sufficiently. After upgrading one node 
to 3.0.18 (remaining 5 still on 2.1.18 after we have seen that sort of 
regression described below), 3.0.18 is showing increased CPU usage, increase 
GC, high mutation stage pending tasks, dropped mutation messages ...

Some spec. All 6 nodes equally sized:
* Bare metal, 32 physical cores, 512G RAM
* Xmx31G, G1, max pause millis = 2000ms
* cassandra.yaml basically unchanged, thus some settings in regard to number of 
threads, compaction throttling etc.

Following dashboard shows highlighted areas (CPU, suspension) with metrics for 
all 6 nodes and the outlier being the node upgraded to Cassandra 3.0.18.
!dashboard.png|width=1280!

Additionally we see a large increase on pending tasks in the mutation stage 
after the upgrade:
!mutation_stage.png!

And dropped mutation messages, also confirmed in the Cassandra log:
{noformat}
INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - 
MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout and 
0 for cross node timeout
INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool 
NameActive   Pending  Completed   Blocked  All Time 
Blocked
INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
MutationStage   256 81824 3360532756 0  
   0

INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
ViewMutationStage 0 0  0 0  
   0

INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
ReadStage 0 0   62862266 0  
   0

INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
RequestResponseStage  0 0 2176659856 0  
   0

INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
ReadRepairStage   0 0  0 0  
   0

INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
CounterMutationStage  0 0  0 0  
   0
...
{noformat}

Judging from a 15min JFR session for both, 3.0.18 and 2.1.18 on a different 
node, high-level, it looks like the code path underneath 
{{BatchMessage.execute}} is producing ~ 10x more on-heap allocations in 3.0.18 
compared to 2.1.18.
!jfr_allocations.png!

Left => 3.0.18
Right => 2.1.18

JFRs zipped are exceeding the 60MB limit to directly attach to the ticket. I 
can upload them, if there is another destination available.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2019-11-15 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-15430:
---
Description: 
In a 6 node loadtest cluster, we have been running with 2.1.18 a certain 
production-like workload constantly and sufficiently. After upgrading one node 
to 3.0.18 (remaining 5 still on 2.1.18 after we have seen that sort of 
regression described below), 3.0.18 is showing increased CPU usage, increase 
GC, high mutation stage pending tasks, dropped mutation messages ...

Some spec. All 6 nodes equally sized:
 * Bare metal, 32 physical cores, 512G RAM
 * Xmx31G, G1, max pause millis = 2000ms
 * cassandra.yaml basically unchanged, thus some settings in regard to number 
of threads, compaction throttling etc.

Following dashboard shows highlighted areas (CPU, suspension) with metrics for 
all 6 nodes and the one outlier being the node upgraded to Cassandra 3.0.18.
 !dashboard.png|width=1280!

Additionally we see a large increase on pending tasks in the mutation stage 
after the upgrade:
 !mutation_stage.png!

And dropped mutation messages, also confirmed in the Cassandra log:
{noformat}
INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - 
MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout and 
0 for cross node timeout
INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool 
NameActive   Pending  Completed   Blocked  All Time 
Blocked
INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
MutationStage   256 81824 3360532756 0  
   0

INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
ViewMutationStage 0 0  0 0  
   0

INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
ReadStage 0 0   62862266 0  
   0

INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
RequestResponseStage  0 0 2176659856 0  
   0

INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
ReadRepairStage   0 0  0 0  
   0

INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
CounterMutationStage  0 0  0 0  
   0
...
{noformat}
Judging from a 15min JFR session for both, 3.0.18 and 2.1.18 on a different 
node, high-level, it looks like the code path underneath 
{{BatchMessage.execute}} is producing ~ 10x more on-heap allocations in 3.0.18 
compared to 2.1.18.
 !jfr_allocations.png!

Left => 3.0.18
 Right => 2.1.18

JFRs zipped are exceeding the 60MB limit to directly attach to the ticket. I 
can upload them, if there is another destination available.

  was:
In a 6 node loadtest cluster, we have been running with 2.1.18 a certain 
production-like workload constantly and sufficiently. After upgrading one node 
to 3.0.18 (remaining 5 still on 2.1.18 after we have seen that sort of 
regression described below), 3.0.18 is showing increased CPU usage, increase 
GC, high mutation stage pending tasks, dropped mutation messages ...

Some spec. All 6 nodes equally sized:
* Bare metal, 32 physical cores, 512G RAM
* Xmx31G, G1, max pause millis = 2000ms
* cassandra.yaml basically unchanged, thus some settings in regard to number of 
threads, compaction throttling etc.

Following dashboard shows highlighted areas (CPU, suspension) with metrics for 
all 6 nodes and the outlier being the node upgraded to Cassandra 3.0.18.
!dashboard.png|width=1280!

Additionally we see a large increase on pending tasks in the mutation stage 
after the upgrade:
!mutation_stage.png!

And dropped mutation messages, also confirmed in the Cassandra log:
{noformat}
INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - 
MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout and 
0 for cross node timeout
INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool 
NameActive   Pending  Completed   Blocked  All Time 
Blocked
INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
MutationStage   256 81824 3360532756 0  
   0

INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
ViewMutationStage 0 0  0 0  
   0

INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
ReadStage 0 0   62862266 0  
   0

INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
RequestResponseStage  0 0 2176659856 0 

[jira] [Updated] (CASSANDRA-15400) Cassandra 3.0.18 went OOM several hours after joining a cluster

2019-11-13 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-15400:
---
Attachment: oldgen_increase_nov12.jpg

> Cassandra 3.0.18 went OOM several hours after joining a cluster
> ---
>
> Key: CASSANDRA-15400
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15400
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Thomas Steinmaurer
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 3.0.20, 3.11.6, 4.0
>
> Attachments: cassandra_hprof_bigtablereader_statsmetadata.png, 
> cassandra_hprof_dominator_classes.png, cassandra_hprof_statsmetadata.png, 
> cassandra_jvm_metrics.png, cassandra_operationcount.png, 
> cassandra_sstables_pending_compactions.png, image.png, 
> oldgen_increase_nov12.jpg
>
>
> We have been moving from Cassandra 2.1.18 to Cassandra 3.0.18 and have been 
> facing an OOM two times with 3.0.18 on newly added nodes joining an existing 
> cluster after several hours being successfully bootstrapped.
> Running in AWS:
> * m5.2xlarge, EBS SSD (gp2)
> * Xms/Xmx12G, Xmn3G, CMS GC, OpenJDK8u222
> * 4 compaction threads, throttling set to 32 MB/s
> What we see is a steady increase in the OLD gen over many hours.
> !cassandra_jvm_metrics.png!
> * The node started to join / auto-bootstrap the cluster on Oct 30 ~ 12:00
> * It basically finished joining the cluster (UJ => UN) ~ 19hrs later on Oct 
> 31 ~ 07:00 also starting to be a member of serving client read requests
> !cassandra_operationcount.png!
> Memory-wise (on-heap) it didn't look that bad at that time, but old gen usage 
> constantly increased.
> We see a correlation in increased number of SSTables and pending compactions.
> !cassandra_sstables_pending_compactions.png!
> Until we reached the OOM somewhere in Nov 1 in the night. After a Cassandra 
> startup (metric gap in the chart above), number of SSTables + pending 
> compactions is still high, but without facing memory troubles since then.
> This correlation is confirmed by the auto-generated heap dump with e.g. ~ 5K 
> BigTableReader instances with ~ 8.7GByte retained heap in total.
> !cassandra_hprof_dominator_classes.png!
> Having a closer look on a single object instance, seems like each instance is 
> ~ 2MByte in size.
> !cassandra_hprof_bigtablereader_statsmetadata.png!
> With 2 pre-allocated byte buffers (highlighted in the screen above) at 1 
> MByte each
> We have been running with 2.1.18 for > 3 years and I can't remember dealing 
> with such OOM in the context of extending a cluster.
> While the MAT screens above are from our production cluster, we partly can 
> reproduce this behavior in our loadtest environment (although not going full 
> OOM there), thus I might be able to share a hprof from this non-prod 
> environment if needed.
> Thanks a lot.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15400) Cassandra 3.0.18 went OOM several hours after joining a cluster

2019-11-13 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973109#comment-16973109
 ] 

Thomas Steinmaurer commented on CASSANDRA-15400:


[~bdeggleston], thanks for the follow-up. Yes, this is causing quite some pain 
in prod at the moment, e.g. yesterday evening we came close to running OOM again.
!oldgen_increase_nov12.jpg!

> Cassandra 3.0.18 went OOM several hours after joining a cluster
> ---
>
> Key: CASSANDRA-15400
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15400
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Thomas Steinmaurer
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 3.0.20, 3.11.6, 4.0
>
> Attachments: cassandra_hprof_bigtablereader_statsmetadata.png, 
> cassandra_hprof_dominator_classes.png, cassandra_hprof_statsmetadata.png, 
> cassandra_jvm_metrics.png, cassandra_operationcount.png, 
> cassandra_sstables_pending_compactions.png, image.png, 
> oldgen_increase_nov12.jpg
>
>
> We have been moving from Cassandra 2.1.18 to Cassandra 3.0.18 and have been 
> facing an OOM two times with 3.0.18 on newly added nodes joining an existing 
> cluster after several hours being successfully bootstrapped.
> Running in AWS:
> * m5.2xlarge, EBS SSD (gp2)
> * Xms/Xmx12G, Xmn3G, CMS GC, OpenJDK8u222
> * 4 compaction threads, throttling set to 32 MB/s
> What we see is a steady increase in the OLD gen over many hours.
> !cassandra_jvm_metrics.png!
> * The node started to join / auto-bootstrap the cluster on Oct 30 ~ 12:00
> * It basically finished joining the cluster (UJ => UN) ~ 19hrs later on Oct 
> 31 ~ 07:00 also starting to be a member of serving client read requests
> !cassandra_operationcount.png!
> Memory-wise (on-heap) it didn't look that bad at that time, but old gen usage 
> constantly increased.
> We see a correlation in increased number of SSTables and pending compactions.
> !cassandra_sstables_pending_compactions.png!
> Until we reached the OOM somewhere in Nov 1 in the night. After a Cassandra 
> startup (metric gap in the chart above), number of SSTables + pending 
> compactions is still high, but without facing memory troubles since then.
> This correlation is confirmed by the auto-generated heap dump with e.g. ~ 5K 
> BigTableReader instances with ~ 8.7GByte retained heap in total.
> !cassandra_hprof_dominator_classes.png!
> Having a closer look on a single object instance, seems like each instance is 
> ~ 2MByte in size.
> !cassandra_hprof_bigtablereader_statsmetadata.png!
> With 2 pre-allocated byte buffers (highlighted in the screen above) at 1 
> MByte each
> We have been running with 2.1.18 for > 3 years and I can't remember dealing 
> with such OOM in the context of extending a cluster.
> While the MAT screens above are from our production cluster, we partly can 
> reproduce this behavior in our loadtest environment (although not going full 
> OOM there), thus I might be able to share a hprof from this non-prod 
> environment if needed.
> Thanks a lot.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15400) Cassandra 3.0.18 went OOM several hours after joining a cluster

2019-11-08 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970154#comment-16970154
 ] 

Thomas Steinmaurer commented on CASSANDRA-15400:


[~bdeggleston], from ticket creation to a patch in ~ 24h. This is awesome! Many 
thanks.

* Out of curiosity (I haven't looked too deeply): I guess the patch does not 
make the content of the byte array smaller, but keeps the capacity of the byte 
array basically in sync with the content size instead of 1 MByte in general?
* Secondly, as 3.0.19 was released just recently, is there any ETA for when a 
3.0.20 public release might be available?

Again, many thanks.

> Cassandra 3.0.18 went OOM several hours after joining a cluster
> ---
>
> Key: CASSANDRA-15400
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15400
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Thomas Steinmaurer
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 3.0.20, 3.11.6, 4.0
>
> Attachments: cassandra_hprof_bigtablereader_statsmetadata.png, 
> cassandra_hprof_dominator_classes.png, cassandra_hprof_statsmetadata.png, 
> cassandra_jvm_metrics.png, cassandra_operationcount.png, 
> cassandra_sstables_pending_compactions.png
>
>
> We have been moving from Cassandra 2.1.18 to Cassandra 3.0.18 and have been 
> facing an OOM two times with 3.0.18 on newly added nodes joining an existing 
> cluster after several hours being successfully bootstrapped.
> Running in AWS:
> * m5.2xlarge, EBS SSD (gp2)
> * Xms/Xmx12G, Xmn3G, CMS GC, OpenJDK8u222
> * 4 compaction threads, throttling set to 32 MB/s
> What we see is a steady increase in the OLD gen over many hours.
> !cassandra_jvm_metrics.png!
> * The node started to join / auto-bootstrap the cluster on Oct 30 ~ 12:00
> * It basically finished joining the cluster (UJ => UN) ~ 19hrs later on Oct 
> 31 ~ 07:00 also starting to be a member of serving client read requests
> !cassandra_operationcount.png!
> Memory-wise (on-heap) it didn't look that bad at that time, but old gen usage 
> constantly increased.
> We see a correlation in increased number of SSTables and pending compactions.
> !cassandra_sstables_pending_compactions.png!
> Until we reached the OOM somewhere in Nov 1 in the night. After a Cassandra 
> startup (metric gap in the chart above), number of SSTables + pending 
> compactions is still high, but without facing memory troubles since then.
> This correlation is confirmed by the auto-generated heap dump with e.g. ~ 5K 
> BigTableReader instances with ~ 8.7GByte retained heap in total.
> !cassandra_hprof_dominator_classes.png!
> Having a closer look on a single object instance, seems like each instance is 
> ~ 2MByte in size.
> !cassandra_hprof_bigtablereader_statsmetadata.png!
> With 2 pre-allocated byte buffers (highlighted in the screen above) at 1 
> MByte each
> We have been running with 2.1.18 for > 3 years and I can't remember dealing 
> with such OOM in the context of extending a cluster.
> While the MAT screens above are from our production cluster, we partly can 
> reproduce this behavior in our loadtest environment (although not going full 
> OOM there), thus I might be able to share a hprof from this non-prod 
> environment if needed.
> Thanks a lot.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15400) Cassandra 3.0.18 went OOM several hours after joining a cluster

2019-11-06 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-15400:
---
Description: 
We have been moving from Cassandra 2.1.18 to Cassandra 3.0.18 and have faced an 
OOM twice with 3.0.18 on newly added nodes, several hours after they had 
successfully bootstrapped and joined an existing cluster.

Running in AWS:
* m5.2xlarge, EBS SSD (gp2)
* Xms/Xmx12G, Xmn3G, CMS GC, OpenJDK8u222
* 4 compaction threads, throttling set to 32 MB/s

What we see is a steady increase in the OLD gen over many hours.
!cassandra_jvm_metrics.png!

* The node started to join / auto-bootstrap the cluster on Oct 30 ~ 12:00
* It basically finished joining the cluster (UJ => UN) ~ 19hrs later on Oct 31 
~ 07:00 also starting to be a member of serving client read requests
!cassandra_operationcount.png!

Memory-wise (on-heap) it didn't look that bad at that time, but old gen usage 
constantly increased.

We see a correlation in increased number of SSTables and pending compactions.
!cassandra_sstables_pending_compactions.png!

Until we reached the OOM somewhere in Nov 1 in the night. After a Cassandra 
startup (metric gap in the chart above), number of SSTables + pending 
compactions is still high, but without facing memory troubles since then.

This correlation is confirmed by the auto-generated heap dump with e.g. ~ 5K 
BigTableReader instances with ~ 8.7GByte retained heap in total.
!cassandra_hprof_dominator_classes.png!

Having a closer look on a single object instance, seems like each instance is ~ 
2MByte in size.
!cassandra_hprof_bigtablereader_statsmetadata.png!
With 2 pre-allocated byte buffers (highlighted in the screen above) at 1 MByte 
each

We have been running with 2.1.18 for > 3 years and I can't remember dealing 
with such OOM in the context of extending a cluster.

While the MAT screens above are from our production cluster, we partly can 
reproduce this behavior in our loadtest environment (although not going full 
OOM there), thus I might be able to share a hprof from this non-prod 
environment if needed.

Thanks a lot.





  was:
We have been moving from Cassandra 2.1.18 to Cassandra 3.0.18 and have been 
facing an OOM two times with 3.0.18 on newly added nodes joining an existing 
cluster after several hours being successfully bootstrapped.

Running in AWS:
* m5.2xlarge, EBS SSD (gp2)
* Xms/Xmx12G, Xmn3G, CMS GC
* 4 compaction threads, throttling set to 32 MB/s

What we see is a steady increase in the OLD gen over many hours.
!cassandra_jvm_metrics.png!

* The node started to join / auto-bootstrap the cluster on Oct 30 ~ 12:00
* It basically finished joining the cluster (UJ => UN) ~ 19hrs later on Oct 31 
~ 07:00 also starting to be a member of serving client read requests
!cassandra_operationcount.png!

Memory-wise (on-heap) it didn't look that bad at that time, but old gen usage 
constantly increased.

We see a correlation in increased number of SSTables and pending compactions.
!cassandra_sstables_pending_compactions.png!

Until we reached the OOM somewhere in Nov 1 in the night. After a Cassandra 
startup (metric gap in the chart above), number of SSTables + pending 
compactions is still high, but without facing memory troubles since then.

This correlation is confirmed by the auto-generated heap dump with e.g. ~ 5K 
BigTableReader instances with ~ 8.7GByte retained heap in total.
!cassandra_hprof_dominator_classes.png!

Having a closer look on a single object instance, seems like each instance is ~ 
2MByte in size.
!cassandra_hprof_bigtablereader_statsmetadata.png!
With 2 pre-allocated byte buffers (highlighted in the screen above) at 1 MByte 
each

We have been running with 2.1.18 for > 3 years and I can't remember dealing 
with such OOM in the context of extending a cluster.

While the MAT screens above are from our production cluster, we partly can 
reproduce this behavior in our loadtest environment (although not going full 
OOM there), thus I might be able to share a hprof from this non-prod 
environment if needed.

Thanks a lot.






> Cassandra 3.0.18 went OOM several hours after joining a cluster
> ---
>
> Key: CASSANDRA-15400
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15400
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Assignee: Blake Eggleston
>Priority: Normal
> Attachments: cassandra_hprof_bigtablereader_statsmetadata.png, 
> cassandra_hprof_dominator_classes.png, cassandra_hprof_statsmetadata.png, 
> cassandra_jvm_metrics.png, cassandra_operationcount.png, 
> cassandra_sstables_pending_compactions.png
>
>
> We have been moving from Cassandra 2.1.18 to Cassandra 3.0.18 and have been 
> facing an OOM two times with 3.0.18 on newly added nodes joining 

[jira] [Comment Edited] (CASSANDRA-15400) Cassandra 3.0.18 went OOM several hours after joining a cluster

2019-11-06 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968762#comment-16968762
 ] 

Thomas Steinmaurer edited comment on CASSANDRA-15400 at 11/6/19 10:18 PM:
--

[~marcuse], the data model has evolved: it started with Astyanax/Thrift and moved 
over to pure CQL3 access (without a real data migration), but still uses our own 
application-side serializer framework working with byte buffers, thus BLOBs on 
the data model side.

Our high-volume (usually > 1TByte per node, RF=3) CF/table looks like this; 
according to our per-CF JMX based self-monitoring, it is also where we see the 
majority of the increase in pending compaction tasks:
{noformat}
CREATE TABLE ks.cf1 (
k blob,
n blob,
v blob,
PRIMARY KEY (k, n)
) WITH COMPACT STORAGE
...
;
{noformat}
Although we also tend to have single partitions in the area of > 100 MByte, e.g. 
visible through the corresponding compaction entries in the Cassandra log, none 
of this has been a real problem in practice with a heap of Xms/Xmx12G resp. 
Xmn3G on Cassandra 2.1.

A few additional thoughts:
 * Likely the Cassandra node is utilizing most of the compaction threads (4 in 
this scenario with the m5.2xlarge instance type) for larger compactions of 
streamed data while in UJ, leaving less room for compactions of live data / 
actual writes, so that once in UN and starting to serve read requests it has to 
access many more small SSTables (we appear to have/had plenty in the 
10-50 MByte range)
 * Is there anything known in Cassandra 3.0 that might result in streaming more 
data from other nodes than 2.1 does, and therefore in increased compaction work 
for newly joined nodes?
 * Is there anything known in Cassandra 3.0 that results in more frequent 
memtable flushes compared to 2.1, again leading to increased compaction work?
 * Talking about a single {{BigTableReader}} instance again: did anything change 
with regard to the 1 MByte byte buffer pre-allocation in {{StatsMetadata}} for 
the data members {{minClusteringValues}} and {{maxClusteringValues}} shown in 
the hprof? It looks to me like we potentially waste quite a lot of on-heap 
memory here (see the sketch below).
 !cassandra_hprof_statsmetadata.png|width=800!
* Is {{StatsMetadata}} purely on-heap? Or is it somehow pulled from off-heap 
first, resulting in the 1 MByte allocation? This reminds me a bit of the NIO 
cache buffer bug 
(https://support.datastax.com/hc/en-us/articles/36863663-JVM-OOM-direct-buffer-errors-affected-by-unlimited-java-nio-cache),
 where the recommendation is to set -Djdk.nio.maxCachedBufferSize to exactly the 
value (1048576) we see in the hprof for the on-heap byte buffer.

The number of compaction threads and the compaction throttling were left 
unchanged during the upgrade from 2.1 to 3.0, and if memory serves me well, we 
should see improved compaction throughput in 3.0 with the same throttling 
settings anyway.
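To make the {{StatsMetadata}} concern above concrete, here is a minimal, hypothetical sketch (not the actual org.apache.cassandra code; the class and field names only mirror the hprof screenshot, and the fixed allocation size is exactly the assumption under discussion) of how a 1 MByte pre-allocation per clustering bound multiplies across open SSTable readers:
{noformat}
import java.nio.ByteBuffer;

// Hypothetical illustration only -- not the actual org.apache.cassandra code.
// Sketches the suspected pattern: a fixed 1 MByte buffer pre-allocated for each
// clustering bound, regardless of how much of it is actually used.
public class StatsMetadataSketch
{
    static final int PREALLOC = 1024 * 1024; // 1 MByte, the buffer size visible in the hprof

    // Assumption under discussion: pre-sized buffers instead of right-sized ones.
    final ByteBuffer minClusteringValues = ByteBuffer.allocate(PREALLOC);
    final ByteBuffer maxClusteringValues = ByteBuffer.allocate(PREALLOC);

    // -> ~2 MByte per instance, matching the per-BigTableReader retained size reported;
    //    with thousands of open SSTable readers this alone reaches gigabytes on-heap.
}
{noformat}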


was (Author: tsteinmaurer):
[~marcuse], the data model has evolved starting with Astyanax/Thrift moved over 
to pure CQL3 access (without real data migration), but still with our own 
application-side serializer framework, working with byte buffers, thus BLOBs on 
the data model side.

Our high volume (usually > 1TByte per node, RF=3) CF/table looks like that, 
where we also see the majority of increasing number of pending compaction 
tasks, according to a per-CF JMX based self-monitoring:
{noformat}
CREATE TABLE ks.cf1 (
k blob,
n blob,
v blob,
PRIMARY KEY (k, n)
) WITH COMPACT STORAGE
...
;
{noformat}
Although we tend to also have single partitions in the area of > 100MByte, e.g. 
visible due to according compaction logs in the Cassandra log, all not being a 
real problem in practice with a heap of Xms/Xmx12G resp Xmn3G and Cas 2.1.

A few additional thoughts:
 * Likely the Cassandra node is utilizing most of the compaction threads (4 in 
this scenario with the m5.2xlarge instance type) with larger compactions on 
streamed data, giving less room for compactions of live data / actual writes 
while being in UJ, resulting in accessing much more smaller SSTables (looks 
like we have/had plenty in the area of 10-50MByte) then in UN starting to serve 
read requests
 * Is there anything known in Cas 3.0, which might result in streaming more 
data from other nodes compared to 2.1 resulting in increased compaction work to 
be done for newly joined nodes compared to 2.1
 * Is there anything known in Cas 3.0, which results in more frequent memtable 
flushes compared to 2.1, again resulting in increased compaction work
 * Talking about a single {{BigTableReader}} instance again, did anything 
change in regard to byte buffer pre-allocation at 1MByte in {{StatsMetadata}} 
per data member {{minClusteringValues}} and {{maxClusteringValues}} as shown in 
the hprof? Looks to me we potentially waste quite some on-heap memory here 
 !cassandra_hprof_statsmetadata.png|width=800!
* Is 

[jira] [Commented] (CASSANDRA-15400) Cassandra 3.0.18 went OOM several hours after joining a cluster

2019-11-06 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968762#comment-16968762
 ] 

Thomas Steinmaurer commented on CASSANDRA-15400:


[~marcuse], the data model has evolved: it started with Astyanax/Thrift and moved 
over to pure CQL3 access (without a real data migration), but still uses our own 
application-side serializer framework working with byte buffers, thus BLOBs on 
the data model side.

Our high-volume (usually > 1TByte per node, RF=3) CF/table looks like this; 
according to our per-CF JMX based self-monitoring, it is also where we see the 
majority of the increase in pending compaction tasks:
{noformat}
CREATE TABLE ks.cf1 (
k blob,
n blob,
v blob,
PRIMARY KEY (k, n)
) WITH COMPACT STORAGE
...
;
{noformat}
Although we also tend to have single partitions in the area of > 100 MByte, e.g. 
visible through the corresponding compaction entries in the Cassandra log, none 
of this has been a real problem in practice with a heap of Xms/Xmx12G resp. 
Xmn3G on Cassandra 2.1.

A few additional thoughts:
 * Likely the Cassandra node is utilizing most of the compaction threads (4 in 
this scenario with the m5.2xlarge instance type) for larger compactions of 
streamed data while in UJ, leaving less room for compactions of live data / 
actual writes, so that once in UN and starting to serve read requests it has to 
access many more small SSTables (we appear to have/had plenty in the 
10-50 MByte range)
 * Is there anything known in Cassandra 3.0 that might result in streaming more 
data from other nodes than 2.1 does, and therefore in increased compaction work 
for newly joined nodes?
 * Is there anything known in Cassandra 3.0 that results in more frequent 
memtable flushes compared to 2.1, again leading to increased compaction work?
 * Talking about a single {{BigTableReader}} instance again: did anything change 
with regard to the 1 MByte byte buffer pre-allocation in {{StatsMetadata}} for 
the data members {{minClusteringValues}} and {{maxClusteringValues}} shown in 
the hprof? It looks to me like we potentially waste quite a lot of on-heap 
memory here.
 !cassandra_hprof_statsmetadata.png|width=800!
* Is {{StatsMetadata}} purely on-heap? Or is it somehow pulled from off-heap 
first, resulting in the 1 MByte allocation? This reminds me a bit of the NIO 
cache buffer bug 
(https://support.datastax.com/hc/en-us/articles/36863663-JVM-OOM-direct-buffer-errors-affected-by-unlimited-java-nio-cache),
 where the recommendation is to set -Djdk.nio.maxCachedBufferSize to exactly the 
value (1048576) we see in the hprof for the on-heap byte buffer (see the note 
below).

The number of compaction threads and the compaction throttling were left 
unchanged during the upgrade from 2.1 to 3.0.
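Regarding the NIO cache buffer question above, a minimal sketch of a startup check for the {{jdk.nio.maxCachedBufferSize}} system property could look like the following (a hypothetical helper, not part of Cassandra); when the property is unset, the JVM places no size limit on the temporary direct buffers it caches per thread:
{noformat}
// Hypothetical startup check -- not Cassandra code, just an illustration of the
// jdk.nio.maxCachedBufferSize workaround mentioned in the linked article.
public final class NioCacheCheck
{
    public static void main(String[] args)
    {
        String limit = System.getProperty("jdk.nio.maxCachedBufferSize");
        if (limit == null)
            System.out.println("jdk.nio.maxCachedBufferSize not set -- temporary direct buffers of any size may be cached per thread");
        else
            System.out.println("temporary direct buffers larger than " + limit + " bytes will not be cached");
    }
}
{noformat}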

> Cassandra 3.0.18 went OOM several hours after joining a cluster
> ---
>
> Key: CASSANDRA-15400
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15400
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Assignee: Blake Eggleston
>Priority: Normal
> Attachments: cassandra_hprof_bigtablereader_statsmetadata.png, 
> cassandra_hprof_dominator_classes.png, cassandra_hprof_statsmetadata.png, 
> cassandra_jvm_metrics.png, cassandra_operationcount.png, 
> cassandra_sstables_pending_compactions.png
>
>
> We have been moving from Cassandra 2.1.18 to Cassandra 3.0.18 and have been 
> facing an OOM two times with 3.0.18 on newly added nodes joining an existing 
> cluster after several hours being successfully bootstrapped.
> Running in AWS:
> * m5.2xlarge, EBS SSD (gp2)
> * Xms/Xmx12G, Xmn3G, CMS GC
> * 4 compaction threads, throttling set to 32 MB/s
> What we see is a steady increase in the OLD gen over many hours.
> !cassandra_jvm_metrics.png!
> * The node started to join / auto-bootstrap the cluster on Oct 30 ~ 12:00
> * It basically finished joining the cluster (UJ => UN) ~ 19hrs later on Oct 
> 31 ~ 07:00 also starting to be a member of serving client read requests
> !cassandra_operationcount.png!
> Memory-wise (on-heap) it didn't look that bad at that time, but old gen usage 
> constantly increased.
> We see a correlation in increased number of SSTables and pending compactions.
> !cassandra_sstables_pending_compactions.png!
> Until we reached the OOM somewhere in Nov 1 in the night. After a Cassandra 
> startup (metric gap in the chart above), number of SSTables + pending 
> compactions is still high, but without facing memory troubles since then.
> This correlation is confirmed by the auto-generated heap dump with e.g. ~ 5K 
> BigTableReader instances with ~ 8.7GByte retained heap in total.
> !cassandra_hprof_dominator_classes.png!
> Having a closer look on a single object instance, seems like each instance is 
> ~ 2MByte in size.
> !cassandra_hprof_bigtablereader_statsmetadata.png!
> With 2 

[jira] [Updated] (CASSANDRA-15400) Cassandra 3.0.18 went OOM several hours after joining a cluster

2019-11-06 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-15400:
---
Attachment: cassandra_hprof_statsmetadata.png

> Cassandra 3.0.18 went OOM several hours after joining a cluster
> ---
>
> Key: CASSANDRA-15400
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15400
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Assignee: Blake Eggleston
>Priority: Normal
> Attachments: cassandra_hprof_bigtablereader_statsmetadata.png, 
> cassandra_hprof_dominator_classes.png, cassandra_hprof_statsmetadata.png, 
> cassandra_jvm_metrics.png, cassandra_operationcount.png, 
> cassandra_sstables_pending_compactions.png
>
>
> We have been moving from Cassandra 2.1.18 to Cassandra 3.0.18 and have been 
> facing an OOM two times with 3.0.18 on newly added nodes joining an existing 
> cluster after several hours being successfully bootstrapped.
> Running in AWS:
> * m5.2xlarge, EBS SSD (gp2)
> * Xms/Xmx12G, Xmn3G, CMS GC
> * 4 compaction threads, throttling set to 32 MB/s
> What we see is a steady increase in the OLD gen over many hours.
> !cassandra_jvm_metrics.png!
> * The node started to join / auto-bootstrap the cluster on Oct 30 ~ 12:00
> * It basically finished joining the cluster (UJ => UN) ~ 19hrs later on Oct 
> 31 ~ 07:00 also starting to be a member of serving client read requests
> !cassandra_operationcount.png!
> Memory-wise (on-heap) it didn't look that bad at that time, but old gen usage 
> constantly increased.
> We see a correlation in increased number of SSTables and pending compactions.
> !cassandra_sstables_pending_compactions.png!
> Until we reached the OOM somewhere in Nov 1 in the night. After a Cassandra 
> startup (metric gap in the chart above), number of SSTables + pending 
> compactions is still high, but without facing memory troubles since then.
> This correlation is confirmed by the auto-generated heap dump with e.g. ~ 5K 
> BigTableReader instances with ~ 8.7GByte retained heap in total.
> !cassandra_hprof_dominator_classes.png!
> Having a closer look on a single object instance, seems like each instance is 
> ~ 2MByte in size.
> !cassandra_hprof_bigtablereader_statsmetadata.png!
> With 2 pre-allocated byte buffers (highlighted in the screen above) at 1 
> MByte each
> We have been running with 2.1.18 for > 3 years and I can't remember dealing 
> with such OOM in the context of extending a cluster.
> While the MAT screens above are from our production cluster, we partly can 
> reproduce this behavior in our loadtest environment (although not going full 
> OOM there), thus I might be able to share a hprof from this non-prod 
> environment if needed.
> Thanks a lot.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15400) Cassandra 3.0.18 went OOM several hours after joining a cluster

2019-11-06 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-15400:
---
Description: 
We have been moving from Cassandra 2.1.18 to Cassandra 3.0.18 and have hit an 
OOM twice with 3.0.18 on newly added nodes, several hours after they had 
successfully bootstrapped and joined an existing cluster.

Running in AWS:
* m5.2xlarge, EBS SSD (gp2)
* Xms/Xmx12G, Xmn3G, CMS GC
* 4 compaction threads, throttling set to 32 MB/s

What we see is a steady increase in the OLD gen over many hours.
!cassandra_jvm_metrics.png!

* The node started to join / auto-bootstrap the cluster on Oct 30 ~ 12:00
* It basically finished joining the cluster (UJ => UN) ~ 19hrs later on Oct 31 
~ 07:00, also starting to serve client read requests
!cassandra_operationcount.png!

Memory-wise (on-heap) it didn't look that bad at that time, but old gen usage 
constantly increased.

We see a correlation with an increased number of SSTables and pending compactions.
!cassandra_sstables_pending_compactions.png!

This went on until we hit the OOM during the night of Nov 1. After a Cassandra 
restart (metric gap in the chart above), the number of SSTables + pending 
compactions is still high, but we have not faced memory troubles since then.

The correlation is confirmed by the auto-generated heap dump, with e.g. ~ 5K 
BigTableReader instances retaining ~ 8.7 GByte of heap in total.
!cassandra_hprof_dominator_classes.png!

Having a closer look at a single object instance, it seems each instance is ~ 
2 MByte in size,
!cassandra_hprof_bigtablereader_statsmetadata.png!
with 2 pre-allocated byte buffers (highlighted in the screenshot above) at 1 MByte 
each.

We have been running 2.1.18 for > 3 years and I can't remember dealing 
with such an OOM in the context of extending a cluster.

While the MAT screenshots above are from our production cluster, we can partly 
reproduce this behavior in our loadtest environment (although it does not go 
full OOM there), so I might be able to share an hprof from this non-prod 
environment if needed.

Thanks a lot.





  was:
We have been moving from Cassandra 2.1.18 to Cassandra 3.0.18 and have been 
facing an OOM two times with 3.0.18 on newly added nodes joining an existing 
cluster after several hours being successfully bootstrapped.

Running in AWS:
* m5.2xlarge, EBS SSD (gp2)
* Xms/Xmx12G, Xmn3G, CMS GC
* 4 compaction threads, throttling set to 32 MB/s

What we see is a steady increase in the OLD gen over many hours.
!cassandra_jvm_metrics.png!

* The node started to join / auto-bootstrap the cluster on Oct 30 ~ 12:00
* It basically finished joining the cluster (UJ => UN) ~ 19hrs later on Oct 31 
~ 07:00 also starting to be a member of serving client read requests
!cassandra_operationcount.png!

Memory-wise (on-heap) it didn't look that bad at that time, but old gen usage 
constantly increased.

We see a correlation in increased number of SSTables and pending compactions.
!cassandra_sstables_pending_compactions.png!

Until we reached the OOM somewhere in Nov 1 in the night. After a Cassandra 
startup (metric gap in the chart above), number of SSTables + pending 
compactions is still high, but without facing memory troubles since then.

This correlation is confirmed by the auto-generated heap dump with e.g. ~ 5K 
BigTableReader instances with ~ 8.7GByte retained hype in total.
!cassandra_hprof_dominator_classes.png!

Having a closer look on a single object instance, seems like each instance is ~ 
2MByte in size.
!cassandra_hprof_bigtablereader_statsmetadata.png!
With 2 pre-allocated byte buffers (highlighted in the screen above) at 1 MByte 
each

We have been running with 2.1.18 for > 3 years and I can't remember dealing 
with such OOM in the context of extending a cluster.

While the MAT screens above are from our production cluster, we partly can 
reproduce this behavior in our loadtest environment (although not going full 
OOM there), thus I might be able to share a hprof from this non-prod 
environment if needed.

Thanks a lot.






> Cassandra 3.0.18 went OOM several hours after joining a cluster
> ---
>
> Key: CASSANDRA-15400
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15400
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: cassandra_hprof_bigtablereader_statsmetadata.png, 
> cassandra_hprof_dominator_classes.png, cassandra_jvm_metrics.png, 
> cassandra_operationcount.png, cassandra_sstables_pending_compactions.png
>
>
> We have been moving from Cassandra 2.1.18 to Cassandra 3.0.18 and have been 
> facing an OOM two times with 3.0.18 on newly added nodes joining an existing 
> cluster after several hours being successfully bootstrapped.
> Running in 

[jira] [Created] (CASSANDRA-15400) Cassandra 3.0.18 went OOM several hours after joining a cluster

2019-11-06 Thread Thomas Steinmaurer (Jira)
Thomas Steinmaurer created CASSANDRA-15400:
--

 Summary: Cassandra 3.0.18 went OOM several hours after joining a 
cluster
 Key: CASSANDRA-15400
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15400
 Project: Cassandra
  Issue Type: Bug
Reporter: Thomas Steinmaurer
 Attachments: cassandra_hprof_bigtablereader_statsmetadata.png, 
cassandra_hprof_dominator_classes.png, cassandra_jvm_metrics.png, 
cassandra_operationcount.png, cassandra_sstables_pending_compactions.png

We have been moving from Cassandra 2.1.18 to Cassandra 3.0.18 and have hit an 
OOM twice with 3.0.18 on newly added nodes, several hours after they had 
successfully bootstrapped and joined an existing cluster.

Running in AWS:
* m5.2xlarge, EBS SSD (gp2)
* Xms/Xmx12G, Xmn3G, CMS GC
* 4 compaction threads, throttling set to 32 MB/s

What we see is a steady increase in the OLD gen over many hours.
!cassandra_jvm_metrics.png!

* The node started to join / auto-bootstrap the cluster on Oct 30 ~ 12:00
* It basically finished joining the cluster (UJ => UN) ~ 19hrs later on Oct 31 
~ 07:00, also starting to serve client read requests
!cassandra_operationcount.png!

Memory-wise (on-heap) it didn't look that bad at that time, but old gen usage 
constantly increased.

We see a correlation with an increased number of SSTables and pending compactions.
!cassandra_sstables_pending_compactions.png!

This went on until we hit the OOM during the night of Nov 1. After a Cassandra 
restart (metric gap in the chart above), the number of SSTables + pending 
compactions is still high, but we have not faced memory troubles since then.

The correlation is confirmed by the auto-generated heap dump, with e.g. ~ 5K 
BigTableReader instances retaining ~ 8.7 GByte of heap in total.
!cassandra_hprof_dominator_classes.png!

Having a closer look at a single object instance, it seems each instance is ~ 
2 MByte in size,
!cassandra_hprof_bigtablereader_statsmetadata.png!
with 2 pre-allocated byte buffers (highlighted in the screenshot above) at 1 MByte 
each.

We have been running 2.1.18 for > 3 years and I can't remember dealing 
with such an OOM in the context of extending a cluster.

While the MAT screenshots above are from our production cluster, we can partly 
reproduce this behavior in our loadtest environment (although it does not go 
full OOM there), so I might be able to share an hprof from this non-prod 
environment if needed.

Thanks a lot.







--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14691) Cassandra 2.1 backport - The JVM should exit if jmx fails to bind

2018-09-11 Thread Thomas Steinmaurer (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610483#comment-16610483
 ] 

Thomas Steinmaurer commented on CASSANDRA-14691:


Well, sure, but how does this ticket about corruption compare to e.g. 
CASSANDRA-14284, which was included in 2.1.21 (corruption vs. crash)? I thought 
there might be e.g. a 2.1.22 anyhow ... Anyway, I will stop bothering now. :-)

> Cassandra 2.1 backport - The JVM should exit if jmx fails to bind
> -
>
> Key: CASSANDRA-14691
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14691
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Priority: Major
>  Labels: lhf
> Fix For: 2.1.x
>
>
> If you are already running a cassandra instance, but for some reason try to 
> start another one, this happens:
> {noformat}
> INFO  20:57:09 JNA mlockall successful
> WARN  20:57:09 JMX is not enabled to receive remote connections. Please see 
> cassandra-env.sh for more info.
> ERROR 20:57:10 Error starting local jmx server:
> java.rmi.server.ExportException: Port already in use: 7199; nested exception 
> is:
> java.net.BindException: Address already in use
> at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:340) 
> ~[na:1.7.0_76]
> at 
> sun.rmi.transport.tcp.TCPTransport.exportObject(TCPTransport.java:248) 
> ~[na:1.7.0_76]
> at 
> sun.rmi.transport.tcp.TCPEndpoint.exportObject(TCPEndpoint.java:411) 
> ~[na:1.7.0_76]
> at sun.rmi.transport.LiveRef.exportObject(LiveRef.java:147) 
> ~[na:1.7.0_76]
> at 
> sun.rmi.server.UnicastServerRef.exportObject(UnicastServerRef.java:207) 
> ~[na:1.7.0_76]
> at sun.rmi.registry.RegistryImpl.setup(RegistryImpl.java:122) 
> ~[na:1.7.0_76]
> at sun.rmi.registry.RegistryImpl.<init>(RegistryImpl.java:98) 
> ~[na:1.7.0_76]
> at 
> java.rmi.registry.LocateRegistry.createRegistry(LocateRegistry.java:239) 
> ~[na:1.7.0_76]
> at 
> org.apache.cassandra.service.CassandraDaemon.maybeInitJmx(CassandraDaemon.java:100)
>  [main/:na]
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:222) 
> [main/:na]
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:564)
>  [main/:na]
> at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:653) 
> [main/:na]
> Caused by: java.net.BindException: Address already in use
> at java.net.PlainSocketImpl.socketBind(Native Method) ~[na:1.7.0_76]
> at 
> java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:376) 
> ~[na:1.7.0_76]
> at java.net.ServerSocket.bind(ServerSocket.java:376) ~[na:1.7.0_76]
> at java.net.ServerSocket.<init>(ServerSocket.java:237) ~[na:1.7.0_76]
> at 
> javax.net.DefaultServerSocketFactory.createServerSocket(ServerSocketFactory.java:231)
>  ~[na:1.7.0_76]
> at 
> org.apache.cassandra.utils.RMIServerSocketFactoryImpl.createServerSocket(RMIServerSocketFactoryImpl.java:13)
>  ~[main/:na]
> at 
> sun.rmi.transport.tcp.TCPEndpoint.newServerSocket(TCPEndpoint.java:666) 
> ~[na:1.7.0_76]
> at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:329) 
> ~[na:1.7.0_76]
> ... 11 common frames omitted
> {noformat}
> However the startup continues, and ends up replaying commitlogs, which is 
> probably not a good thing.
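The behaviour requested here would look roughly like the following sketch (hypothetical code, not the actual backport patch): treat a failure to bind the local JMX registry as fatal and exit before commitlog replay starts, instead of only logging the error:
{noformat}
import java.rmi.registry.LocateRegistry;
import java.rmi.server.ExportException;

// Hypothetical sketch of the requested fail-fast behaviour -- not the actual patch.
public final class JmxInit
{
    static void maybeInitJmx(int jmxPort)
    {
        try
        {
            LocateRegistry.createRegistry(jmxPort);
        }
        catch (ExportException e)   // e.g. "Port already in use: 7199"
        {
            System.err.println("Fatal: could not bind local JMX server on port " + jmxPort + ": " + e);
            System.exit(1);         // exit instead of continuing into commitlog replay
        }
        catch (Exception e)
        {
            System.err.println("Fatal: error starting local JMX server: " + e);
            System.exit(1);
        }
    }
}
{noformat}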



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14691) Cassandra 2.1 backport - The JVM should exit if jmx fails to bind

2018-09-11 Thread Thomas Steinmaurer (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610432#comment-16610432
 ] 

Thomas Steinmaurer commented on CASSANDRA-14691:


[~spo...@gmail.com], thanks for the feedback. So, potential corruption caused 
by this does not qualify as critical?

> Cassandra 2.1 backport - The JVM should exit if jmx fails to bind
> -
>
> Key: CASSANDRA-14691
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14691
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Priority: Major
>  Labels: lhf
> Fix For: 2.1.x
>
>
> If you are already running a cassandra instance, but for some reason try to 
> start another one, this happens:
> {noformat}
> INFO  20:57:09 JNA mlockall successful
> WARN  20:57:09 JMX is not enabled to receive remote connections. Please see 
> cassandra-env.sh for more info.
> ERROR 20:57:10 Error starting local jmx server:
> java.rmi.server.ExportException: Port already in use: 7199; nested exception 
> is:
> java.net.BindException: Address already in use
> at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:340) 
> ~[na:1.7.0_76]
> at 
> sun.rmi.transport.tcp.TCPTransport.exportObject(TCPTransport.java:248) 
> ~[na:1.7.0_76]
> at 
> sun.rmi.transport.tcp.TCPEndpoint.exportObject(TCPEndpoint.java:411) 
> ~[na:1.7.0_76]
> at sun.rmi.transport.LiveRef.exportObject(LiveRef.java:147) 
> ~[na:1.7.0_76]
> at 
> sun.rmi.server.UnicastServerRef.exportObject(UnicastServerRef.java:207) 
> ~[na:1.7.0_76]
> at sun.rmi.registry.RegistryImpl.setup(RegistryImpl.java:122) 
> ~[na:1.7.0_76]
> at sun.rmi.registry.RegistryImpl.<init>(RegistryImpl.java:98) 
> ~[na:1.7.0_76]
> at 
> java.rmi.registry.LocateRegistry.createRegistry(LocateRegistry.java:239) 
> ~[na:1.7.0_76]
> at 
> org.apache.cassandra.service.CassandraDaemon.maybeInitJmx(CassandraDaemon.java:100)
>  [main/:na]
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:222) 
> [main/:na]
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:564)
>  [main/:na]
> at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:653) 
> [main/:na]
> Caused by: java.net.BindException: Address already in use
> at java.net.PlainSocketImpl.socketBind(Native Method) ~[na:1.7.0_76]
> at 
> java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:376) 
> ~[na:1.7.0_76]
> at java.net.ServerSocket.bind(ServerSocket.java:376) ~[na:1.7.0_76]
> at java.net.ServerSocket.<init>(ServerSocket.java:237) ~[na:1.7.0_76]
> at 
> javax.net.DefaultServerSocketFactory.createServerSocket(ServerSocketFactory.java:231)
>  ~[na:1.7.0_76]
> at 
> org.apache.cassandra.utils.RMIServerSocketFactoryImpl.createServerSocket(RMIServerSocketFactoryImpl.java:13)
>  ~[main/:na]
> at 
> sun.rmi.transport.tcp.TCPEndpoint.newServerSocket(TCPEndpoint.java:666) 
> ~[na:1.7.0_76]
> at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:329) 
> ~[na:1.7.0_76]
> ... 11 common frames omitted
> {noformat}
> However the startup continues, and ends up replaying commitlogs, which is 
> probably not a good thing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14709) Global configuration parameter to reject increment repair and allow full repair only

2018-09-09 Thread Thomas Steinmaurer (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-14709:
---
Description: 
We are running Cassandra in AWS and On-Premise at customer sites, currently 2.1 
in production with 3.0/3.11 in pre-production stages including loadtest.

On the migration path from 2.1 to 3.11.x, I’m afraid that at some point 
incremental repairs end up being enabled / run for the first time 
unintentionally, because:
a) A lot of online resources / examples do not use the _-full_ command-line 
option available since 2.2 (?)
b) Our internal (support) tickets of course also state the nodetool repair 
command without the -full option, as these examples are for 2.1

Especially for On-Premise customers (where we have less control than with our 
AWS deployments), this is likely to get out of control once we ship 3.11 and 
nodetool repair is run without the -full command-line option.

Given the troubles incremental repair introduces, and incremental being the 
default since 2.2 (?), what do you think about a JVM system property, 
cassandra.yaml setting or whatever … to basically let the cluster administrator 
choose whether incremental repairs are allowed or not? I know such a flag can 
still be flipped (by the customer), but as a first safety stage it is possibly 
sufficient.


  was:
We are running Cassandra in AWS and On-Premise at customer sites, currently 2.1 
in production with 3.0/3.11 in pre-production stages including loadtest.

In a migration path from 2.1 to 3.11.x, I’m afraid that at some point in time 
we end up in incremental repairs being enabled / ran a first time 
unintentionally, cause:
a) A lot of online resources / examples do not use the -full command-line option
b) Our internal (support) tickets of course also state nodetool repair command 
without the -full option, as these examples are for 2.1

Especially for On-Premise customers (with less control than with our AWS 
deployments), this asks a bit for getting out-of-control once we have 3.11 out 
and nodetool repair being run without the -full command-line option.

With troubles incremental repair are introducing and incremental being the 
default since 2.2 (?), what do you think about a JVM system property, 
cassandra.yaml setting or whatever … to basically let the cluster administrator 
chose if incremental repairs are allowed or not? I know, such a flag still can 
be flipped then (by the customer), but as a first safety stage possibly 
sufficient enough.



> Global configuration parameter to reject increment repair and allow full 
> repair only
> 
>
> Key: CASSANDRA-14709
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14709
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Major
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.0.x
>
>
> We are running Cassandra in AWS and On-Premise at customer sites, currently 
> 2.1 in production with 3.0/3.11 in pre-production stages including loadtest.
> In a migration path from 2.1 to 3.11.x, I’m afraid that at some point in time 
> we end up in incremental repairs being enabled / ran a first time 
> unintentionally, cause:
> a) A lot of online resources / examples do not use the _-full_ command-line 
> option available since 2.2 (?)
> b) Our internal (support) tickets of course also state nodetool repair 
> command without the -full option, as these examples are for 2.1
> Especially for On-Premise customers (with less control than with our AWS 
> deployments), this asks a bit for getting out-of-control once we have 3.11 
> out and nodetool repair being run without the -full command-line option.
> With troubles incremental repair are introducing and incremental being the 
> default since 2.2 (?), what do you think about a JVM system property, 
> cassandra.yaml setting or whatever … to basically let the cluster 
> administrator chose if incremental repairs are allowed or not? I know, such a 
> flag still can be flipped then (by the customer), but as a first safety stage 
> possibly sufficient enough.
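A minimal sketch of such a guard could look like the following (the property name, class and method are made up for illustration; the ticket deliberately leaves the exact mechanism open), checked wherever repair options are parsed so that only full repairs are accepted once the operator disables incremental repair:
{noformat}
// Hypothetical sketch -- property name and class are made up for illustration.
public final class RepairGuard
{
    // e.g. -Dcassandra.allow_incremental_repair=false in the JVM options
    private static final boolean ALLOW_INCREMENTAL =
            Boolean.parseBoolean(System.getProperty("cassandra.allow_incremental_repair", "true"));

    public static void validate(boolean incrementalRequested)
    {
        if (incrementalRequested && !ALLOW_INCREMENTAL)
            throw new IllegalArgumentException(
                "Incremental repair is disabled on this cluster; re-run with 'nodetool repair -full'");
    }
}
{noformat}
With such a switch in place, operators would run {{nodetool repair -full}} as usual, while an accidental plain {{nodetool repair}} (incremental by default since 2.2, per the description above) would be rejected up front.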



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14709) Global configuration parameter to reject increment repair and allow full repair only

2018-09-09 Thread Thomas Steinmaurer (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-14709:
---
Description: 
We are running Cassandra in AWS and On-Premise at customer sites, currently 2.1 
in production with 3.0/3.11 in pre-production stages including loadtest.

On the migration path from 2.1 to 3.11.x, I’m afraid that at some point 
incremental repairs end up being enabled / run for the first time 
unintentionally, because:
a) A lot of online resources / examples do not use the -full command-line option
b) Our internal (support) tickets of course also state the nodetool repair 
command without the -full option, as these examples are for 2.1

Especially for On-Premise customers (where we have less control than with our 
AWS deployments), this is likely to get out of control once we ship 3.11 and 
nodetool repair is run without the -full command-line option.

Given the troubles incremental repair introduces, and incremental being the 
default since 2.2 (?), what do you think about a JVM system property, 
cassandra.yaml setting or whatever … to basically let the cluster administrator 
choose whether incremental repairs are allowed or not? I know such a flag can 
still be flipped (by the customer), but as a first safety stage it is possibly 
sufficient.


  was:
We are running Cassandra in AWS and On-Premise at customer sites, currently 2.1 
in production with 3.0/3.11 in pre-production stages including loadtest.

In a migration path from 2.1 to 3.11.x, I’m afraid that at some point in time 
we end up in incremental repairs being enabled / ran a first time 
unintentionally, cause:
a) A lot of online resources / examples do not use the -full command-line option
b) Our internal (support) tickets of course also state nodetool repair command 
without the -full option, as these are for 2.1

Especially for On-Premise customers (with less control than with our AWS 
deployments), this asks a bit for getting out-of-control once we have 3.11 out 
and nodetool repair being run without the -full command-line option.

With troubles incremental repair are introducing and incremental being the 
default since 2.2 (?), what do you think about a JVM system property, 
cassandra.yaml setting or whatever … to basically let the cluster administrator 
chose if incremental repairs are allowed or not? I know, such a flag still can 
be flipped then (by the customer), but as a first safety stage possibly 
sufficient enough.



> Global configuration parameter to reject increment repair and allow full 
> repair only
> 
>
> Key: CASSANDRA-14709
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14709
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Major
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.0.x
>
>
> We are running Cassandra in AWS and On-Premise at customer sites, currently 
> 2.1 in production with 3.0/3.11 in pre-production stages including loadtest.
> In a migration path from 2.1 to 3.11.x, I’m afraid that at some point in time 
> we end up in incremental repairs being enabled / ran a first time 
> unintentionally, cause:
> a) A lot of online resources / examples do not use the -full command-line 
> option
> b) Our internal (support) tickets of course also state nodetool repair 
> command without the -full option, as these examples are for 2.1
> Especially for On-Premise customers (with less control than with our AWS 
> deployments), this asks a bit for getting out-of-control once we have 3.11 
> out and nodetool repair being run without the -full command-line option.
> With troubles incremental repair are introducing and incremental being the 
> default since 2.2 (?), what do you think about a JVM system property, 
> cassandra.yaml setting or whatever … to basically let the cluster 
> administrator chose if incremental repairs are allowed or not? I know, such a 
> flag still can be flipped then (by the customer), but as a first safety stage 
> possibly sufficient enough.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org


