[jira] [Commented] (CASSANDRA-16115) New Cassandra website design, content and layout to work with Antora

2020-10-12 Thread Erick Ramirez (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212846#comment-17212846
 ] 

Erick Ramirez commented on CASSANDRA-16115:
---

Melissa, noting here too that view access to the document is working. Cheers!

> New Cassandra website design, content and layout to work with Antora
> 
>
> Key: CASSANDRA-16115
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16115
> Project: Cassandra
>  Issue Type: Task
>  Components: Documentation/Website
>Reporter: Melissa Logan
>Assignee: Melissa Logan
>Priority: Normal
> Fix For: 4.0.x
>
> Attachments: Screen Shot 2020-09-03 at 09.48.53.png
>
>
> This task is related to CASSANDRA-16066 (Update and rework the 
> cassandra-website material to work with Antora). The goal is to update the 
> front-end of the C* website (design, IA and content) to work with Antora to 
> help modernize the website as discussed on the [mailing 
> list|https://www.mail-archive.com/dev@cassandra.apache.org/msg15537.html].
> *Design Concepts:* A minimum of two homepage design concepts will be created 
> and shared for input, which will help standardize a brand palette for C* and 
> a design language for the site. This may include custom iconography and 
> graphics. The chosen design language will be used to develop the remaining 
> templates. 
> *Template Design*: It's estimated that 7 template designs will be needed 
> including the creation of several new pages: 
>  * Homepage template
>  * Top-level template - e.g. Community.
>  * General template - Mostly textual with some images, e.g. Intro, Quickstart 
>  * “Library” template - A library of assets (links, downloads, logos etc.) 
> that are sortable by metadata, e.g. Resources, or Kafka's Powered By page.
>  * Blog landing template 
>  * Blog single template
>  * Docs template 
> *Website Content:* Along with new design will be a need for new or updated 
> content to fit the new page layouts. The intention is to use as much as 
> possible from existing content, and augment with new content where needed.
> *Template Development:* This includes the frontend development, such as any 
> HTML markup to achieve designs. HTML would be crafted so as to preserve any 
> backend/API calls, such that content is pulled in as designed. The majority 
> of the frontend work would come in the form of crafting CSS to bring the 
> designs to life, plus any minor JavaScript to add subtle delights to key 
> pages.
> *Style Guide*: Once all is complete, a Style Guide will be added to GitHub for 
> contributors.
> The [cassandra-website|https://github.com/apache/cassandra-website] 
> repository would need to be modified. Specific changes to be determined. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-16201) Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations in BatchUpdatesCollector

2020-10-12 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210238#comment-17210238
 ] 

Thomas Steinmaurer edited comment on CASSANDRA-16201 at 10/13/20, 5:06 AM:
---

[~marcuse], yes I think so. :-) TRUNK, locally checked out, calling hierarchy 
from {{BatchUpdatesCollector.getPartitionUpdateBuilder}} up to 
{{PartitionUpdate.Builder.rowBuilder}}

 !screenshot-2.png|width=100%! 

Thanks again.


was (Author: tsteinmaurer):
[~marcuse], yes I think so. :-) TRUNK, locally checked out, calling hierarchy 
from {{BatchUpdatesCollector.getPartitionUpdateBuilder}} up to 
{{PartitionUpdate.Builder.rowBuilder}}

 !screenshot-2.png! 

Thanks again.

> Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations 
> in BatchUpdatesCollector
> -
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
> multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.
>  !screenshot-1.png|width=100%! 
> So it seems we have many, many pre-allocated object arrays of 20K elements, 
> resulting in a shallow heap of 80K each, although there is only one element 
> in the array.
> This sort of pre-allocation is causing a lot of memory pressure.
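
The arithmetic in the description above can be checked with a small sketch. It is not Cassandra code: the class name, the 4-byte reference size (compressed oops), the 16-byte array header and the 8-byte alignment are assumptions about a typical 64-bit HotSpot JVM, chosen because 20,000 references at 4 bytes each match the reported 80K shallow heap.

```java
// Hypothetical sketch, not taken from the Cassandra code base.
final class PreallocationSketch {
    // Assumption: compressed oops, so each array slot holds a 4-byte reference.
    static final int REFERENCE_BYTES = 4;
    // Assumption: typical HotSpot array header size with compressed class pointers.
    static final int ARRAY_HEADER_BYTES = 16;

    /** Approximate shallow size in bytes of an Object[] with the given capacity. */
    static long shallowBytes(int capacity) {
        long raw = ARRAY_HEADER_BYTES + (long) capacity * REFERENCE_BYTES;
        return (raw + 7) / 8 * 8; // round up to the assumed 8-byte object alignment
    }

    public static void main(String[] args) {
        long oversized = shallowBytes(20_000); // array sized for all updated rows
        long rightSized = shallowBytes(1);     // array sized for the single row actually stored
        System.out.println("pre-allocated: " + oversized + " bytes, right-sized: " + rightSized + " bytes");
    }
}
```

Under these assumptions, one oversized array costs roughly 80 KB where a right-sized one would need a few dozen bytes, which is consistent with the memory pressure seen in the heap dump.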






[jira] [Updated] (CASSANDRA-16201) Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations in BatchUpdatesCollector

2020-10-12 Thread Thomas Steinmaurer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-16201:
---
Description: 
In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
multiple NTR threads in a 3-digit MB range.

This is likely related to object array pre-allocations at the size of 
{{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
only 1 {{BTreeRow}} in the {{BTree}}.
 !screenshot-1.png|width=100%! 

So it seems we have many, many pre-allocated object arrays of 20K elements, 
resulting in a shallow heap of 80K each, although there is only one element in 
the array.

This sort of pre-allocation is causing a lot of memory pressure.


  was:
In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
multiple NTR threads in a 3-digit MB range.

This is likely related to object array pre-allocations at the size of 
{{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
only 1 {{BTreeRow}} in the {{BTree}}.
 !screenshot-1.png! 

So it seems we have many, many pre-allocated object arrays of 20K elements, 
resulting in a shallow heap of 80K each, although there is only one element in 
the array.

This sort of pre-allocation is causing a lot of memory pressure.









[jira] [Commented] (CASSANDRA-14361) Allow SimpleSeedProvider to resolve multiple IPs per DNS name

2020-10-12 Thread Ben Bromhead (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212789#comment-17212789
 ] 

Ben Bromhead commented on CASSANDRA-14361:
--

Ah that will teach me for blindly clicking accept on the patch notes in GitHub 
:P

I've reverted to the old behavior for `getAllByNameOverrideDefaults`, as 
the subsequent call to `getByAddressOverrideDefaults` will actually do the 
null check and set the default for us. 

I also included a new boolean config value to revert to the old behavior. I 
wasn't 100% sure about the naming conventions, so let me know. 

> Allow SimpleSeedProvider to resolve multiple IPs per DNS name
> -
>
> Key: CASSANDRA-14361
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14361
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Ben Bromhead
>Assignee: Ben Bromhead
>Priority: Low
> Fix For: 4.0.x
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently SimpleSeedProvider can accept a comma separated string of IPs or 
> hostnames as the set of Cassandra seeds. Hostnames are resolved via 
> InetAddress.getByName, which will only return the first IP associated with an 
> A, AAAA or CNAME record.
> By changing to InetAddress.getAllByName, existing behavior is preserved, but 
> now Cassandra can discover multiple IP addresses per record, allowing seed 
> discovery by DNS to be a little easier.
> Some examples of improved workflows with this change include: 
>  * specify the DNS name of a headless service in Kubernetes which will 
> resolve to all IP addresses of pods within that service. 
>  * seed discovery for multi-region clusters via AWS route53, AzureDNS etc
>  * Other common DNS service discovery mechanisms.
> The only behavior this is likely to impact would be where users are relying 
> on the fact that getByName only returns a single IP address.
> I can't imagine any scenario where that is a sane choice. Even when that 
> choice has been made, it only impacts the first startup of Cassandra and 
> would not be on any critical path.
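
The difference between the two lookups described above can be sketched as follows. This is only an illustration, not the actual SimpleSeedProvider patch: the class and method names are hypothetical, and wrapping the lookup failure in an unchecked exception is a choice made for this sketch. The only JDK calls used are `InetAddress.getAllByName` and standard collections.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the SimpleSeedProvider implementation.
final class SeedResolutionSketch {
    /** Expands a comma separated list of IPs or hostnames into every address each entry resolves to. */
    static List<InetAddress> resolveSeeds(String commaSeparatedSeeds) {
        List<InetAddress> seeds = new ArrayList<>();
        for (String entry : commaSeparatedSeeds.split(",")) {
            String host = entry.trim();
            if (host.isEmpty())
                continue;
            try {
                // getByName would return only the first address; getAllByName returns all of them.
                for (InetAddress address : InetAddress.getAllByName(host)) {
                    if (!seeds.contains(address))
                        seeds.add(address);
                }
            } catch (UnknownHostException e) {
                // Sketch-only choice: surface the failure as an unchecked exception.
                throw new UncheckedIOException(e);
            }
        }
        return seeds;
    }

    public static void main(String[] args) {
        // IP literals need no DNS lookup; a hostname with several records would yield several entries.
        System.out.println(resolveSeeds("127.0.0.1, 127.0.0.2"));
    }
}
```

A plain IP literal still yields exactly one address, so existing seed lists keep their meaning, while a DNS name backed by several records contributes all of its addresses.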






[jira] [Updated] (CASSANDRA-16207) NPE when calling broadcast address on unintialized node

2020-10-12 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-16207:
--
Reviewers: David Capwell, David Capwell  (was: David Capwell)
   David Capwell, David Capwell
   Status: Review In Progress  (was: Patch Available)

> NPE when calling broadcast address on unintialized node
> ---
>
> Key: CASSANDRA-16207
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16207
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
>
> When trying to run upgrades, sometimes we’re calling broadcast address on 
> an uninitialised new node:
> {code}
> java.lang.IllegalStateException: Can't use shut down instances, delegate is null
>   at org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:163)
>   at org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.broadcastAddress(DelegatingInvokableInstance.java:53)
>   at org.apache.cassandra.distributed.impl.Instance$2.allowIncomingMessage(Instance.java:278)
>   at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:1031) ~[dtest-3.0.19.jar:?]
>   at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:213)
>   at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:182)
>   at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
> {code}






[jira] [Commented] (CASSANDRA-16207) NPE when calling broadcast address on unintialized node

2020-10-12 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212766#comment-17212766
 ] 

David Capwell commented on CASSANDRA-16207:
---

[~ifesdjeen] you linked https://github.com/apache/cassandra/pull/773, which adds 
a test but doesn't change anything in dtest or src/java based on the above 
comment; is there something missing?







[jira] [Updated] (CASSANDRA-15935) Improve machinery for testing consistency in presence of range movements

2020-10-12 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15935:
--
Reviewers: Benedict Elliott Smith, David Capwell, David Capwell  (was: 
Benedict Elliott Smith, David Capwell)
   Benedict Elliott Smith, David Capwell, David Capwell  (was: 
Benedict Elliott Smith, David Capwell)
   Status: Review In Progress  (was: Patch Available)

> Improve machinery for testing consistency in presence of range movements
> 
>
> Key: CASSANDRA-15935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15935
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
>
> Currently, we can test range movements only by adding and bootstrapping a new 
> node. This is both inefficient and insufficient for large-scale tests. We 
> need a way to dynamically change ring ownership over the lifetime of a 
> cluster, with the flexibility to change the gossip status of a node from the 
> perspective of other participants, adding and removing nodes from other 
> nodes' views on demand.






[jira] [Updated] (CASSANDRA-15935) Improve machinery for testing consistency in presence of range movements

2020-10-12 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15935:
--
Status: Changes Suggested  (was: Review In Progress)







[jira] [Commented] (CASSANDRA-15935) Improve machinery for testing consistency in presence of range movements

2020-10-12 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212765#comment-17212765
 ] 

David Capwell commented on CASSANDRA-15935:
---

Did my first pass, mostly looking at structure rather than whether the gossip 
changes are valid; gave feedback.







[jira] [Updated] (CASSANDRA-16157) RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade

2020-10-12 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-16157:
--
Status: Ready to Commit  (was: Review In Progress)

Overall LGTM +1

I left a comment about the test; it would be good to address it, but it is fine 
if you do that just before merging.

> RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade
> ---
>
> Key: CASSANDRA-16157
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16157
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
>
> When trying to upgrade 3.0 to 4.0, we often run into a problem if an 
> older node serves as the coordinator:
> {code}
> java.lang.RuntimeException: Can not deserialize message org.apache.cassandra.distributed.impl.MessageImpl@4c46aead
>   at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:299) ~[dtest-4.0-beta3.jar:?]
>   at org.apache.cassandra.distributed.impl.Instance.lambda$receiveMessage$7(Instance.java:315) ~[dtest-4.0-beta3.jar:?]
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
>   at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [dtest-4.0-beta3.jar:?]
>   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
> Caused by: java.io.EOFException
>   at org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:180) ~[dtest-4.0-beta3.jar:?]
>   at org.apache.cassandra.utils.vint.VIntCoding.readUnsignedVInt(VIntCoding.java:68) ~[dtest-4.0-beta3.jar:?]
>   at org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedVInt(RebufferingInputStream.java:243) ~[dtest-4.0-beta3.jar:?]
>   at org.apache.cassandra.net.Message$Serializer.deserializeHeaderPost40(Message.java:694) ~[dtest-4.0-beta3.jar:?]
>   at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:765) ~[dtest-4.0-beta3.jar:?]
>   at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:625) ~[dtest-4.0-beta3.jar:?]
>   at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:295) ~[dtest-4.0-beta3.jar:?]
>   ... 7 more
> {code}






[jira] [Commented] (CASSANDRA-16157) RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade

2020-10-12 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212761#comment-17212761
 ] 

David Capwell commented on CASSANDRA-16157:
---

The other exception that happens is

{code}
java.lang.IllegalArgumentException
  at org.apache.cassandra.net.NoPayload$1.serialize(NoPayload.java:40)
  at org.apache.cassandra.net.NoPayload$1.serialize(NoPayload.java:36)
  at org.apache.cassandra.net.Message$Serializer.serializePost40(Message.java:760)
  at org.apache.cassandra.net.Message$Serializer.serialize(Message.java:618)
  at org.apache.cassandra.distributed.impl.Instance.serializeMessage(Instance.java:291)
  at org.apache.cassandra.distributed.impl.Instance.lambda$registerInboundFilter$4(Instance.java:264)
  at org.apache.cassandra.net.InboundSink$Filtered.accept(InboundSink.java:62)
  at org.apache.cassandra.net.InboundSink$Filtered.accept(InboundSink.java:49)
  at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:93)
  at org.apache.cassandra.distributed.impl.Instance.lambda$null$6(Instance.java:334)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
  at java.lang.Thread.run(Thread.java:748)
{code}







[jira] [Updated] (CASSANDRA-16157) RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade

2020-10-12 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-16157:
--
Reviewers: David Capwell, David Capwell  (was: David Capwell)
   David Capwell, David Capwell
   Status: Review In Progress  (was: Patch Available)







[jira] [Updated] (CASSANDRA-15935) Improve machinery for testing consistency in presence of range movements

2020-10-12 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15935:
--
Reviewers: Benedict Elliott Smith, David Capwell  (was: Benedict Elliott 
Smith)







[jira] [Commented] (CASSANDRA-16157) RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade

2020-10-12 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212755#comment-17212755
 ] 

Yifan Cai commented on CASSANDRA-16157:
---

I can reproduce the {{IllegalArgumentException}} in the 
{{Instance#serializeMessage}} code path by reverting the patch. But I cannot 
reproduce the mentioned RTE in the {{deserializeMessage}} code path. 

When applying the patch, the {{reserializationDuringUpgradeFrom30}} test always 
passes. 

Besides, providing a {{toString}} override in {{MessageImpl}} to display the 
{{verb, id, version and from}} can give better clarity in the exception 
message. For example, 
{code:java}
java.lang.RuntimeException: Can not deserialize message (verb: 1, id: 1, 
version: VERSION, from: IP)
{code}
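
A minimal sketch of such an override is shown below. It uses a stand-in class rather than the real {{MessageImpl}}: the class name, the field types and the use of `getHostString()` for the sender are assumptions made for illustration, and only the four fields named above are included.

```java
import java.net.InetSocketAddress;

// Hypothetical stand-in for MessageImpl; field names and types are assumptions.
final class MessageSketch {
    final int verb;
    final long id;
    final int version;
    final InetSocketAddress from;

    MessageSketch(int verb, long id, int version, InetSocketAddress from) {
        this.verb = verb;
        this.id = id;
        this.version = version;
        this.from = from;
    }

    @Override
    public String toString() {
        // Print the identifying fields instead of the default ClassName@hash form.
        return "(verb: " + verb + ", id: " + id + ", version: " + version
               + ", from: " + from.getHostString() + ":" + from.getPort() + ")";
    }

    public static void main(String[] args) {
        MessageSketch message = new MessageSketch(1, 1L, 12, InetSocketAddress.createUnresolved("127.0.0.1", 7000));
        // String concatenation calls toString(), so the exception text carries the fields.
        System.out.println(new RuntimeException("Can not deserialize message " + message).getMessage());
    }
}
```

With an override along these lines, the exception text identifies the message directly instead of ending in an opaque value such as {{MessageImpl@4c46aead}}.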








[jira] [Commented] (CASSANDRA-15897) Dropping compact storage with 2.1-sstables on disk make them unreadable

2020-10-12 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212735#comment-17212735
 ] 

Ekaterina Dimitrova commented on CASSANDRA-15897:
-

Branch rebased. CI running [here | 
https://jenkins-cm4.apache.org/job/Cassandra-devbranch-artifacts/83/]

> Dropping compact storage with 2.1-sstables on disk make them unreadable
> ---
>
> Key: CASSANDRA-15897
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15897
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Marcus Eriksson
>Assignee: Sylvain Lebresne
>Priority: Normal
> Fix For: 3.0.x, 4.0-beta3
>
>
> Test reproducing: 
> https://github.com/krummas/cassandra/commits/marcuse/dropcompactstorage






[jira] [Updated] (CASSANDRA-15897) Dropping compact storage with 2.1-sstables on disk make them unreadable

2020-10-12 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-15897:

Fix Version/s: 3.0.x




[jira] [Commented] (CASSANDRA-16063) Fix user experience when upgrading to 4.0 with compact tables

2020-10-12 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212710#comment-17212710
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16063:
-

Thank you [~adelapena]

Indeed, these are the failures we already know about. I don't see the compact 
storage related tests in the list of failures. Also, I ran all tests several 
times locally. I believe the patch is ready for commit. 

> Fix user experience when upgrading to 4.0 with compact tables
> -
>
> Key: CASSANDRA-16063
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16063
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
>Reporter: Sylvain Lebresne
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: Compact_storage_upgrade_tests.txt
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The code to handle compact tables has been removed from 4.0, and the intended 
> upgrade path to 4.0 for users having compact tables on 3.x is that they must 
> execute {{ALTER ... DROP COMPACT STORAGE}} on all of their compact tables 
> *before* attempting the upgrade.
> Obviously, some users won't read the upgrade instructions (or miss a table) 
> and may try upgrading despite still having compact tables. If they do so, the 
> intent is that the node will _not_ start, with a message clearly indicating 
> the pre-upgrade step the user has missed. The user will then downgrade back 
> the node(s) to 3.x, run the proper {{ALTER ... DROP COMPACT STORAGE}}, and 
> then upgrade again.
> But while 4.0 does currently fail startup when finding any compact tables 
> with a decent message, I believe the check is done too late during startup.
> Namely, that check is done as we read the tables schema, so within 
> [{{Schema.instance.loadFromDisk()}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/CassandraDaemon.java#L241].
> But by then, we've _at least_ called 
> {{SystemKeyspace.persistLocalMetadata()}} and 
> {{SystemKeyspaceMigrator40.migrate()}}, which will get into the commit log, 
> and even possibly flush new {{na}} format sstables. As a result, a user 
> might not be able to seamlessly restart the node on 3.x (to drop compact 
> storage on the appropriate tables).
> Basically, we should make sure the check for compact tables done at 4.0 
> startup is done as a {{StartupCheck}}, before the node does anything.
> We should also add a test for this (checking that if you try upgrading to 4.0 
> with compact storage, you can downgrade back with no intervention whatsoever).
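
The ordering requested in the description can be sketched as follows: every check runs first, and the steps that write to disk run only once all checks have passed. This is an illustration under stated assumptions. {{StartupSketch}}, its {{Check}} interface and the recorded step names are hypothetical and do not reproduce Cassandra's {{StartupCheck}} API or the real startup sequence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these types are illustrative and are not Cassandra's StartupCheck API.
final class StartupSketch {
    interface Check { void run() throws IllegalStateException; }

    // Records the disk-mutating steps that were reached, so the ordering can be observed.
    final List<String> sideEffects = new ArrayList<>();

    // Fails if any table still uses compact storage.
    static Check noCompactTables(List<String> compactTables) {
        return () -> {
            if (!compactTables.isEmpty())
                throw new IllegalStateException("Compact tables found: " + compactTables
                        + "; downgrade and run ALTER ... DROP COMPACT STORAGE first");
        };
    }

    // Run every check first; only touch disk once all of them pass.
    void start(List<Check> checks) {
        for (Check c : checks) c.run();
        sideEffects.add("persistLocalMetadata");
        sideEffects.add("migrateSystemKeyspace");
        sideEffects.add("loadSchema");
    }
}
```

Because the check throws before anything is recorded, a node that fails it has written nothing, which is the property the proposed downgrade test would verify.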






[jira] [Updated] (CASSANDRA-16208) Fail truncation requests when they fail on replica

2020-10-12 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-16208:

 Bug Category: Parent values: Correctness(12982)
   Complexity: Normal
  Component/s: Legacy/Local Write-Read Paths
Discovered By: User Report
 Severity: Low
 Assignee: Ekaterina Dimitrova
   Status: Open  (was: Triage Needed)

> Fail truncation requests when they fail on replica
> --
>
> Key: CASSANDRA-16208
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16208
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
>







[jira] [Updated] (CASSANDRA-16208) Fail truncation requests when they fail on replica

2020-10-12 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-16208:

Fix Version/s: 4.0-beta




[jira] [Created] (CASSANDRA-16208) Fail truncation requests when they fail on replica

2020-10-12 Thread Ekaterina Dimitrova (Jira)
Ekaterina Dimitrova created CASSANDRA-16208:
---

 Summary: Fail truncation requests when they fail on replica
 Key: CASSANDRA-16208
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16208
 Project: Cassandra
  Issue Type: Bug
Reporter: Ekaterina Dimitrova









[jira] [Commented] (CASSANDRA-16201) Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations in BatchUpdatesCollector

2020-10-12 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212674#comment-17212674
 ] 

Michael Semb Wever commented on CASSANDRA-16201:


[~marcuse], looking at CASSANDRA-15430 it looks like {{initialCapacity}} 
(CASSANDRA-13929) needs to be back-ported to 3.0, and this ticket also applies 
to 3.0. wdyt?

> Cassandra 4.0 b2 - OOM / memory pressure due to object array pre-allocations 
> in BatchUpdatesCollector
> -
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
> multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.
>  !screenshot-1.png! 
> So it seems we have a great many pre-allocated object arrays of 20K elements, 
> each with a shallow heap of 80K, although each array holds only one 
> element.
> This sort of pre-allocation is causing a lot of memory pressure.
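
The 80K figure is consistent with simple arithmetic on array layout. The sketch below is a back-of-the-envelope estimate, not a measurement from the heap dump. It assumes a 64-bit HotSpot JVM with compressed oops, that is 4-byte references and a 16-byte array header, and {{ArrayFootprint}} is a hypothetical helper name.

```java
// Back-of-the-envelope estimate; the header and reference sizes are assumptions
// (64-bit HotSpot with compressed oops), not values measured from the heap dump.
final class ArrayFootprint {
    static final long ARRAY_HEADER_BYTES = 16;
    static final long REFERENCE_BYTES = 4;

    // Shallow size of an Object[] with the given capacity, rounded up to 8-byte alignment.
    static long shallowBytes(int capacity) {
        long raw = ARRAY_HEADER_BYTES + REFERENCE_BYTES * capacity;
        return (raw + 7) / 8 * 8;
    }

    public static void main(String[] args) {
        System.out.println(shallowBytes(20_000)); // array pre-allocated for 20K elements
        System.out.println(shallowBytes(1));      // array sized for the single row actually stored
    }
}
```

Under these assumptions a 20K-element array takes 80,016 bytes, against 24 bytes for an array sized to one element.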






[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-12 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15430:
---
 Bug Category: Parent values: Degradation(12984), Level 1 values: Resource 
Management(12995)
   Complexity: Normal
  Component/s: Local/Other
Discovered By: User Report
Fix Version/s: 4.0.x
   3.11.x
   3.0.x
 Severity: Normal
   Status: Open  (was: Triage Needed)

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
> Attachments: dashboard.png, jfr_allocations.png, jfr_jmc_2-1.png, 
> jfr_jmc_2-1_obj.png, jfr_jmc_3-0.png, jfr_jmc_3-0_obj.png, 
> jfr_jmc_3-0_obj_obj_alloc.png, jfr_jmc_3-11.png, jfr_jmc_3-11_obj.png, 
> jfr_jmc_4-0-b2.png, jfr_jmc_4-0-b2_obj.png, mutation_stage.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>
> In a 6 node loadtest cluster, we have been running a certain 
> production-like workload on 2.1.18 constantly and with adequate performance. 
> After upgrading one node to 3.0.18 (the remaining 5 are still on 2.1.18 since 
> we saw the regression described below), the 3.0.18 node is showing increased 
> CPU usage, increased GC, high mutation stage pending tasks, dropped mutation 
> messages ...
> Some specs. All 6 nodes are equally sized:
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus same settings in regard to number 
> of threads, compaction throttling etc.
> The following dashboard shows highlighted areas (CPU, suspension) with metrics 
> for all 6 nodes, the one outlier being the node upgraded to Cassandra 
> 3.0.18.
>  !dashboard.png|width=1280!
> Additionally, we see a large increase in pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool Name               Active  Pending   Completed  Blocked  All Time Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - MutationStage              256    81824  3360532756        0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ViewMutationStage            0        0           0        0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ReadStage                    0        0    62862266        0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - RequestResponseStage         0        0  2176659856        0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - ReadRepairStage              0        0           0        0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - CounterMutationStage         0        0           0        0                 0
> ...
> {noformat}
> Judging at a high level from a 15min JFR session on both the 3.0.18 node and 
> a 2.1.18 node, it looks like the code path underneath 
> {{BatchMessage.execute}} is producing ~ 10x more on-heap allocations in 
> 3.0.18 compared to 2.1.18.
>  !jfr_allocations.png!
> Left => 3.0.18
>  Right => 2.1.18
> The zipped JFRs exceed the 60MB limit for attaching directly to the ticket. I 
> can upload them if another destination is available.






[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-12 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15430:
---
Attachment: jfr_jmc_3-0_obj_obj_alloc.png




[jira] [Commented] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-12 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212667#comment-17212667
 ] 

Michael Semb Wever commented on CASSANDRA-15430:


Looking at {{Object[]}} allocation in 3.0.11 I see a lot of occurrences of 
{{Arrays.copyOf(..)}} under {{MultiCBuilder}}, in addition to those under 
{{BTree$Builder}}.

 !jfr_jmc_3-0_obj_obj_alloc.png|width=1200! 




[jira] [Commented] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-12 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212663#comment-17212663
 ] 

Benedict Elliott Smith commented on CASSANDRA-15430:


Oof, so in this case 3.11 is the worst?  

It looks like in 3.x the cause is at least partially that we do not propagate 
{{initialCapacity}} to the {{BTree.Builder}}, though this has been fixed in 
trunk. This is trivial to fix and should definitely be backported.
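
To illustrate why dropping the capacity hint costs allocations, here is a sketch of a growable builder that counts its reallocations. It is a generic illustration, not Cassandra's {{BTree.Builder}}: {{SketchBuilder}}, its doubling growth policy and the default capacity of 16 used in {{main}} are all assumptions.

```java
import java.util.Arrays;

// Hypothetical growable builder; it is not Cassandra's BTree.Builder.
final class SketchBuilder {
    private Object[] values;
    private int count;
    int copies; // number of Arrays.copyOf reallocations performed

    SketchBuilder(int initialCapacity) {
        values = new Object[Math.max(1, initialCapacity)];
    }

    SketchBuilder add(Object v) {
        if (count == values.length) {
            // Doubling growth: each step allocates a new array and discards the old one.
            values = Arrays.copyOf(values, values.length * 2);
            copies++;
        }
        values[count++] = v;
        return this;
    }

    // Fill a builder with n elements and report how many reallocations were needed.
    static int copiesFor(int initialCapacity, int n) {
        SketchBuilder b = new SketchBuilder(initialCapacity);
        for (int i = 0; i < n; i++) b.add(i);
        return b.copies;
    }

    public static void main(String[] args) {
        System.out.println(copiesFor(16, 1000));   // capacity hint dropped, assumed default used
        System.out.println(copiesFor(1000, 1000)); // capacity hint propagated
    }
}
```

In this sketch, filling 1000 elements from a capacity of 16 takes six reallocations, each leaving a discarded array behind for the GC, while a builder created with the known size needs none.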




[jira] [Comment Edited] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-12 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212659#comment-17212659
 ] 

Michael Semb Wever edited comment on CASSANDRA-15430 at 10/12/20, 8:32 PM:
---

Here are the screenshots from the latest JFRs, with relevant areas of the stack 
trace expanded.

h4. 2.1.18
Allocations
* BatchMessage.execute - 815523
 ** BatchStatement.getMutations => 377635 (46%)
 ** BatchStatement.executeWithoutConditions => 307894 (38%)

 !jfr_jmc_2-1.png|width=1200! 

Sizes by object under {{BatchStatement.getMutations}}

 !jfr_jmc_2-1_obj.png|width=1200! 

h4. 3.0.20
Allocations
* BatchMessage.execute - 1687923
 ** BatchStatement.getMutations => 1016728 (60%)
 ** BatchStatement.executeWithoutConditions => 498539 (30%)

 !jfr_jmc_3-0.png|width=1200! 

Sizes by object under {{BatchStatement.getMutations}}

 !jfr_jmc_3-0_obj.png|width=1200! 

h4. 3.11.8
Allocations
* BatchMessage.execute - 2161156
 ** BatchStatement.getMutations => 1350691 (62%)
 ** BatchStatement.executeWithoutConditions => 648225 (30%)

 !jfr_jmc_3-11.png|width=1200! 

Sizes by object under {{BatchStatement.getMutations}}

 !jfr_jmc_3-11_obj.png|width=1200! 


h4. 4.0-beta2
Allocations
* BatchMessage.execute - 1908578
 ** BatchStatement.getMutations => 1344102 (70%)
 ** BatchStatement.executeWithoutConditions => 431344 (23%)

 !jfr_jmc_4-0-b2.png|width=1200! 

Sizes by object under {{BatchStatement.getMutations}}

 !jfr_jmc_4-0-b2_obj.png|width=1200! 
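
For comparison across versions, the {{BatchMessage.execute}} counts listed above can be expressed relative to the 2.1.18 baseline. The sketch below only does arithmetic on the quoted numbers; {{AllocationRatios}} is an illustrative name.

```java
// Arithmetic on the allocation counts quoted above; the class and method names are illustrative.
final class AllocationRatios {
    static final long BASELINE_2_1_18 = 815_523L;

    // Allocations under BatchMessage.execute relative to the 2.1.18 baseline.
    static double ratioToBaseline(long allocations) {
        return (double) allocations / BASELINE_2_1_18;
    }

    public static void main(String[] args) {
        System.out.printf("3.0.20    %.2fx%n", ratioToBaseline(1_687_923L));
        System.out.printf("3.11.8    %.2fx%n", ratioToBaseline(2_161_156L));
        System.out.printf("4.0-beta2 %.2fx%n", ratioToBaseline(1_908_578L));
    }
}
```

On these figures 3.11.8 allocates the most, at roughly 2.65 times the 2.1.18 count, with 4.0-beta2 at about 2.34 times and 3.0.20 at about 2.07 times.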




[jira] [Comment Edited] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-12 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212659#comment-17212659
 ] 

Michael Semb Wever edited comment on CASSANDRA-15430 at 10/12/20, 8:32 PM:
---

Here are the screenshots from the latest JFRs, with relevant areas of the stack 
trace expanded.

h4. 2.1.18
Allocations
* BatchMessage.execute - 815523
 ** BatchStatement.getMutations => 377635 (46%)
 ** BatchStatement.executeWithoutConditions => 307894 (38%)

 !jfr_jmc_2-1.png|width=1200! 

Sizes by object under {{BatchStatement.getMutations}}

 !jfr_jmc_2-1_obj.png! 

h4. 3.0.20
Allocations
* BatchMessage.execute - 1687923
 ** BatchStatement.getMutations => 1016728 (60%)
 ** BatchStatement.executeWithoutConditions => 498539 (30%)

 !jfr_jmc_3-0.png! 

Sizes by object under {{BatchStatement.getMutations}}

 !jfr_jmc_3-0_obj.png! 

h4. 3.11.8
Allocations
* BatchMessage.execute - 2161156
 ** BatchStatement.getMutations => 1350691 (62%)
 ** BatchStatement.executeWithoutConditions => 648225 (30%)

 !jfr_jmc_3-11.png! 

Sizes by object under {{BatchStatement.getMutations}}

 !jfr_jmc_3-11_obj.png! 


h4. 4.0-beta2
Allocations
* BatchMessage.execute - 1908578
 ** BatchStatement.getMutations => 1344102 (70%)
 ** BatchStatement.executeWithoutConditions => 431344 (23%)

 !jfr_jmc_4-0-b2.png! 

Sizes by object under {{BatchStatement.getMutations}}

 !jfr_jmc_4-0-b2_obj.png! 
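
The per-version figures above can be compared directly. The following is a minimal Python sketch, not part of the original comment, that recomputes each method's share of {{BatchMessage.execute}} and the growth relative to 2.1.18. The dictionary layout and the helper names ({{share}}, {{ratio_to_baseline}}) are illustrative assumptions; only the numbers come from the comment, and their unit is not stated there.

```python
# Allocation figures copied from the JFR comment above; units are whatever
# JMC reported (the comment does not state them).
ALLOCATIONS = {
    "2.1.18":    {"execute": 815523,  "getMutations": 377635,  "executeWithoutConditions": 307894},
    "3.0.20":    {"execute": 1687923, "getMutations": 1016728, "executeWithoutConditions": 498539},
    "3.11.8":    {"execute": 2161156, "getMutations": 1350691, "executeWithoutConditions": 648225},
    "4.0-beta2": {"execute": 1908578, "getMutations": 1344102, "executeWithoutConditions": 431344},
}


def share(version, method):
    """Percentage of BatchMessage.execute allocations attributed to `method`."""
    row = ALLOCATIONS[version]
    return 100.0 * row[method] / row["execute"]


def ratio_to_baseline(version, method="execute", baseline="2.1.18"):
    """How many times larger `method` is in `version` than in `baseline`."""
    return ALLOCATIONS[version][method] / ALLOCATIONS[baseline][method]


if __name__ == "__main__":
    for version in ALLOCATIONS:
        print(
            f"{version:>9}: execute x{ratio_to_baseline(version):.2f}, "
            f"getMutations x{ratio_to_baseline(version, 'getMutations'):.2f} "
            f"({share(version, 'getMutations'):.0f}% of execute)"
        )
```

On these figures, {{BatchMessage.execute}} comes out at about 2.07x (3.0.20), 2.65x (3.11.8) and 2.34x (4.0-beta2) the 2.1.18 value, and {{BatchStatement.getMutations}} at roughly 2.7x to 3.6x.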



[jira] [Comment Edited] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-12 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212659#comment-17212659
 ] 

Michael Semb Wever edited comment on CASSANDRA-15430 at 10/12/20, 8:31 PM:
---

Here are the screenshots from the latest JFRs, with relevant areas of the stack 
trace expanded.

h4. 2.1.18
Allocations
* BatchMessage.execute - 815523
 ** BatchStatement.getMutations => 377635 (46%)
 ** BatchStatement.executeWithoutConditions => 307894 (38%)

 !jfr_jmc_2-1.png|width=700! 

Sizes by object under {{BatchStatement.getMutations}}

 !jfr_jmc_2-1_obj.png! 

h4. 3.0.20
Allocations
* BatchMessage.execute - 1687923
 ** BatchStatement.getMutations => 1016728 (60%)
 ** BatchStatement.executeWithoutConditions => 498539 (30%)

 !jfr_jmc_3-0.png! 

Sizes by object under {{BatchStatement.getMutations}}

 !jfr_jmc_3-0_obj.png! 

h4. 3.11.8
Allocations
* BatchMessage.execute - 2161156
 ** BatchStatement.getMutations => 1350691 (62%)
 ** BatchStatement.executeWithoutConditions => 648225 (30%)

 !jfr_jmc_3-11.png! 

Sizes by object under {{BatchStatement.getMutations}}

 !jfr_jmc_3-11_obj.png! 


h4. 4.0-beta2
Allocations
* BatchMessage.execute - 1908578
 ** BatchStatement.getMutations => 1344102 (70%)
 ** BatchStatement.executeWithoutConditions => 431344 (23%)

 !jfr_jmc_4-0-b2.png! 

Sizes by object under {{BatchStatement.getMutations}}

 !jfr_jmc_4-0-b2_obj.png! 



[jira] [Commented] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-12 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212659#comment-17212659
 ] 

Michael Semb Wever commented on CASSANDRA-15430:


Here are the screenshots from the latest JFRs, with relevant areas of the stack 
trace expanded.

h4. 2.1.18
Allocations
* BatchMessage.execute - 815523
 ** BatchStatement.getMutations => 377635 (46%)
 ** BatchStatement.executeWithoutConditions => 307894 (38%)

 !jfr_jmc_2-1.png! 

Sizes by object under {{BatchStatement.getMutations}}

 !jfr_jmc_2-1_obj.png! 

h4. 3.0.20
Allocations
* BatchMessage.execute - 1687923
 ** BatchStatement.getMutations => 1016728 (60%)
 ** BatchStatement.executeWithoutConditions => 498539 (30%)

 !jfr_jmc_3-0.png! 

Sizes by object under {{BatchStatement.getMutations}}

 !jfr_jmc_3-0_obj.png! 

h4. 3.11.8
Allocations
* BatchMessage.execute - 2161156
 ** BatchStatement.getMutations => 1350691 (62%)
 ** BatchStatement.executeWithoutConditions => 648225 (30%)

 !jfr_jmc_3-11.png! 

Sizes by object under {{BatchStatement.getMutations}}

 !jfr_jmc_3-11_obj.png! 


h4. 4.0-beta2
Allocations
* BatchMessage.execute - 1908578
 ** BatchStatement.getMutations => 1344102 (70%)
 ** BatchStatement.executeWithoutConditions => 431344 (23%)

 !jfr_jmc_4-0-b2.png! 

Sizes by object under {{BatchStatement.getMutations}}

 !jfr_jmc_4-0-b2_obj.png! 


[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-12 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15430:
---
Attachment: jfr_jmc_3-11_obj.png

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, jfr_jmc_2-1.png, 
> jfr_jmc_2-1_obj.png, jfr_jmc_3-0.png, jfr_jmc_3-0_obj.png, jfr_jmc_3-11.png, 
> jfr_jmc_3-11_obj.png, jfr_jmc_4-0-b2.png, jfr_jmc_4-0-b2_obj.png, 
> mutation_stage.png, screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>
> In a 6-node load-test cluster, we have been running a production-like 
> workload on 2.1.18 continuously and without issues. After upgrading one 
> node to 3.0.18 (the remaining 5 are still on 2.1.18, given the regression 
> described below), the 3.0.18 node is showing increased CPU usage, increased 
> GC, high mutation stage pending tasks, dropped mutation messages ...
> Specs (all 6 nodes are equally sized):
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus same settings in regard to number 
> of threads, compaction throttling etc.
> The following dashboard shows highlighted areas (CPU, suspension) with metrics 
> for all 6 nodes; the one outlier is the node upgraded to Cassandra 3.0.18.
>  !dashboard.png|width=1280!
> Additionally, we see a large increase in pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool Name                   Active  Pending   Completed  Blocked  All Time Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - MutationStage                  256    81824  3360532756        0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ViewMutationStage                0        0           0        0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ReadStage                        0        0    62862266        0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - RequestResponseStage             0        0  2176659856        0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - ReadRepairStage                  0        0           0        0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - CounterMutationStage             0        0           0        0                 0
> ...
> {noformat}
> Judging from a 15-minute JFR session on each version (3.0.18, and 2.1.18 on 
> a different node), at a high level it looks like the code path underneath 
> {{BatchMessage.execute}} is producing ~10x more on-heap allocations in 
> 3.0.18 compared to 2.1.18.
>  !jfr_allocations.png!
> Left => 3.0.18
>  Right => 2.1.18
> The zipped JFRs exceed the 60MB limit for attaching directly to the ticket. I 
> can upload them if another destination is available.
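
The dropped-message line in the log excerpt above can be turned into a rate for comparison across nodes. This is a minimal Python sketch, not part of the ticket: the regular expression is written against the single {{MessagingService}} line quoted in the description, so treating it as the general log format is an assumption, and the helper names ({{parse_dropped}}, {{dropped_per_second}}) are hypothetical.

```python
import re

# Pattern written against the one MessagingService line quoted in the ticket;
# other Cassandra versions may word this message differently.
DROPPED_RE = re.compile(
    r"(?P<verb>[A-Z_]+) messages were dropped in last (?P<window_ms>\d+) ms: "
    r"(?P<internal>\d+) for internal timeout and "
    r"(?P<cross_node>\d+) for cross node timeout"
)


def parse_dropped(line):
    """Return the dropped-message counters from one log line, or None."""
    match = DROPPED_RE.search(line)
    if match is None:
        return None
    return {
        "verb": match.group("verb"),
        "window_ms": int(match.group("window_ms")),
        "internal": int(match.group("internal")),
        "cross_node": int(match.group("cross_node")),
    }


def dropped_per_second(record):
    """Total dropped messages per second over the reporting window."""
    total = record["internal"] + record["cross_node"]
    return total * 1000.0 / record["window_ms"]


# The log line from the description, with the e-mail line wrapping undone.
SAMPLE = (
    "INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - "
    "MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout "
    "and 0 for cross node timeout"
)

if __name__ == "__main__":
    record = parse_dropped(SAMPLE)
    print(record["verb"], f"{dropped_per_second(record):.1f} dropped/s")
```

For the quoted line this works out to 41552 dropped mutations over a 5000 ms window, i.e. about 8310 per second, all from internal timeouts.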



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-12 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15430:
---
Attachment: jfr_jmc_4-0-b2.png



[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-12 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15430:
---
Attachment: jfr_jmc_4-0-b2_obj.png



[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-12 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15430:
---
Attachment: jfr_jmc_3-11.png



[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-12 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15430:
---
Attachment: jfr_jmc_3-0.png

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, jfr_jmc_2-1.png, 
> jfr_jmc_2-1_obj.png, jfr_jmc_3-0.png, jfr_jmc_3-0_obj.png, 
> mutation_stage.png, screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>
> In a 6 node loadtest cluster, we have been running with 2.1.18 a certain 
> production-like workload constantly and sufficiently. After upgrading one 
> node to 3.0.18 (remaining 5 still on 2.1.18 after we have seen that sort of 
> regression described below), 3.0.18 is showing increased CPU usage, increase 
> GC, high mutation stage pending tasks, dropped mutation messages ...
> Some spec. All 6 nodes equally sized:
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus same settings in regard to number 
> of threads, compaction throttling etc.
> Following dashboard shows highlighted areas (CPU, suspension) with metrics 
> for all 6 nodes and the one outlier being the node upgraded to Cassandra 
> 3.0.18.
>  !dashboard.png|width=1280!
> Additionally we see a large increase on pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - MutationStage                   256     81824     3360532756         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ViewMutationStage                 0         0              0         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ReadStage                         0         0       62862266         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - RequestResponseStage              0         0     2176659856         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - ReadRepairStage                   0         0              0         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - CounterMutationStage              0         0              0         0                 0
> ...
> {noformat}
> Judging at a high level from a 15-minute JFR session on each version 
> (3.0.18, and 2.1.18 on a different node), the code path underneath 
> {{BatchMessage.execute}} appears to produce ~10x more on-heap allocations in 
> 3.0.18 than in 2.1.18.
>  !jfr_allocations.png!
> Left => 3.0.18
>  Right => 2.1.18
> JFRs zipped are exceeding the 60MB limit to directly attach to the ticket. I 
> can upload them, if there is another destination available.
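
A minimal sketch, not part of the ticket: the StatusLogger pool lines quoted above can be scanned for backlogged stages with a few lines of Python. The column order (pool name, active, pending, completed, blocked, all-time blocked) is taken from the log header in the ticket; the function names and the 1000-task threshold are hypothetical and chosen only for illustration.

```python
import re

# Pool name followed by five integer columns, in the order given by the
# StatusLogger header: Active, Pending, Completed, Blocked, All Time Blocked.
POOL_LINE = re.compile(r"^(\w+) (\d+) (\d+) (\d+) (\d+) (\d+)$")
FIELDS = ("active", "pending", "completed", "blocked", "all_time_blocked")


def parse_pool_line(text):
    """Return (pool_name, stats_dict), or None if text is not a pool line."""
    # Collapse runs of whitespace so column padding does not matter.
    match = POOL_LINE.match(" ".join(text.split()))
    if match is None:
        return None
    name, *counts = match.groups()
    return name, dict(zip(FIELDS, map(int, counts)))


def backlogged_pools(lines, threshold=1000):
    """Names of pools whose pending count exceeds threshold (hypothetical cutoff)."""
    result = []
    for line in lines:
        parsed = parse_pool_line(line)
        if parsed is not None and parsed[1]["pending"] > threshold:
            result.append(parsed[0])
    return result


# Pool lines copied from the ticket, with the log prefix stripped.
sample = [
    "MutationStage   256 81824 3360532756 0 0",
    "ViewMutationStage 0 0  0 0 0",
    "ReadStage 0 0   62862266 0 0",
]
print(backlogged_pools(sample))
```

With the three sample lines, only MutationStage (81824 pending) is above the assumed threshold.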



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-12 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15430:
---
Attachment: jfr_jmc_2-1_obj.png

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
> Issue Type: Bug
> Reporter: Thomas Steinmaurer
> Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, jfr_jmc_2-1.png, 
> jfr_jmc_2-1_obj.png, jfr_jmc_3-0.png, jfr_jmc_3-0_obj.png, 
> mutation_stage.png, screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>






[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-12 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15430:
---
Attachment: jfr_jmc_2-1.png

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
> Issue Type: Bug
> Reporter: Thomas Steinmaurer
> Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, jfr_jmc_2-1.png, 
> jfr_jmc_2-1_obj.png, jfr_jmc_3-0.png, jfr_jmc_3-0_obj.png, 
> mutation_stage.png, screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>






[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-12 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15430:
---
Attachment: jfr_jmc_3-0_obj.png

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
> Issue Type: Bug
> Reporter: Thomas Steinmaurer
> Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, jfr_jmc_2-1.png, 
> jfr_jmc_2-1_obj.png, jfr_jmc_3-0.png, jfr_jmc_3-0_obj.png, 
> mutation_stage.png, screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>






[jira] [Commented] (CASSANDRA-16155) ByteBufferAccessor cast exceptions are thrown when trying to query a virtual table

2020-10-12 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212600#comment-17212600
 ] 

David Capwell commented on CASSANDRA-16155:
---

Works for me, thanks =)

> ByteBufferAccessor cast exceptions are thrown when trying to query a virtual 
> table
> --
>
> Key: CASSANDRA-16155
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16155
> Project: Cassandra
> Issue Type: Bug
> Components: Feature/Virtual Tables
> Reporter: Alex Petrov
> Assignee: Alex Petrov
> Priority: Normal
> Fix For: 4.0-beta3
>
>
> Start a fresh trunk node, and try to run
> SELECT * FROM system_views.local_read_latency ;
> You’ll get: 
> {code:java}
> ERROR [Native-Transport-Requests-1] 2020-09-30 09:44:45,099 ErrorMessage.java:457 - Unexpected exception during request
>  java.lang.ClassCastException: org.apache.cassandra.db.marshal.ByteBufferAccessor cannot be cast to java.lang.String
>          at org.apache.cassandra.serializers.AbstractTextSerializer.serialize(AbstractTextSerializer.java:29)
>          at org.apache.cassandra.db.marshal.AbstractType.decompose(AbstractType.java:131)
> {code}
>  






[jira] [Updated] (CASSANDRA-15992) Fix flaky python dtest test_13595 - consistency_test.TestConsistency

2020-10-12 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-15992:
-
Reviewers: Brandon Williams  (was: Brandon Williams)
   Status: Review In Progress  (was: Patch Available)

> Fix flaky python dtest test_13595 - consistency_test.TestConsistency
> 
>
> Key: CASSANDRA-15992
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15992
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/python
> Reporter: David Capwell
> Assignee: Adam Holmberg
> Priority: Normal
> Fix For: 4.0-beta
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/355/workflows/7b8df61d-706f-4094-a206-7cdc6b4e0451/jobs/1818
> {code}
> >   assert 9 == jmx.read_attribute(srp, 'Count')
> E   AssertionError: assert 9 == 5
> {code}






[jira] [Updated] (CASSANDRA-15992) Fix flaky python dtest test_13595 - consistency_test.TestConsistency

2020-10-12 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-15992:
-
Status: Ready to Commit  (was: Review In Progress)

> Fix flaky python dtest test_13595 - consistency_test.TestConsistency
> 
>
> Key: CASSANDRA-15992
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15992
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/python
> Reporter: David Capwell
> Assignee: Adam Holmberg
> Priority: Normal
> Fix For: 4.0-beta
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/355/workflows/7b8df61d-706f-4094-a206-7cdc6b4e0451/jobs/1818
> {code}
> >   assert 9 == jmx.read_attribute(srp, 'Count')
> E   AssertionError: assert 9 == 5
> {code}






[jira] [Updated] (CASSANDRA-15992) Fix flaky python dtest test_13595 - consistency_test.TestConsistency

2020-10-12 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-15992:
-
      Fix Version/s: 4.0-beta3  (was: 4.0-beta)
      Since Version: NA
Source Control Link: https://github.com/apache/cassandra-dtest/commit/19f50572016e5d88a114d730256cbf7bfd27889e
         Resolution: Fixed
             Status: Resolved  (was: Ready to Commit)

Committed.

> Fix flaky python dtest test_13595 - consistency_test.TestConsistency
> 
>
> Key: CASSANDRA-15992
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15992
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/python
> Reporter: David Capwell
> Assignee: Adam Holmberg
> Priority: Normal
> Fix For: 4.0-beta3
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/355/workflows/7b8df61d-706f-4094-a206-7cdc6b4e0451/jobs/1818
> {code}
> >   assert 9 == jmx.read_attribute(srp, 'Count')
> E   AssertionError: assert 9 == 5
> {code}






[cassandra-dtest] branch master updated: Remove flaky metrics assertion

2020-10-12 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-dtest.git


The following commit(s) were added to refs/heads/master by this push:
 new 19f5057  Remove flaky  metrics assertion
19f5057 is described below

commit 19f50572016e5d88a114d730256cbf7bfd27889e
Author: Adam Holmberg 
AuthorDate: Mon Oct 12 12:21:16 2020 -0500

Remove flaky  metrics assertion

Patch by Adam Holmberg, reviewed by brandonwilliams for CASSANDRA-15992
---
 consistency_test.py | 6 --
 1 file changed, 6 deletions(-)

diff --git a/consistency_test.py b/consistency_test.py
index 493526b..e422b81 100644
--- a/consistency_test.py
+++ b/consistency_test.py
@@ -15,7 +15,6 @@ from tools.assertions import (assert_all, assert_length_equal, assert_none,
 from dtest import MultiError, Tester, create_ks, create_cf
 from tools.data import (create_c1c2_table, insert_c1c2, insert_columns,
                         query_c1c2, rows_to_list)
-from tools.jmxutils import JolokiaAgent, make_mbean
 
 since = pytest.mark.since
 logger = logging.getLogger(__name__)
@@ -1281,11 +1280,6 @@ class TestConsistency(Tester):
                    [[3]],
                    cl=ConsistencyLevel.ALL)
 
-        srp = make_mbean('metrics', type='Table', name='ShortReadProtectionRequests', keyspace='test', scope='test')
-        with JolokiaAgent(node1) as jmx:
-            # 4 srp requests for node1 and 5 for node2, total of 9
-            assert 9 == jmx.read_attribute(srp, 'Count')
-
     @since('3.0')
     def test_12872(self):
         """





[jira] [Commented] (CASSANDRA-15992) Fix flaky python dtest test_13595 - consistency_test.TestConsistency

2020-10-12 Thread Adam Holmberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212572#comment-17212572
 ] 

Adam Holmberg commented on CASSANDRA-15992:
---

[a very predictable patch|https://github.com/apache/cassandra-dtest/compare/master...aholmberg:CASSANDRA-15992?expand=1]
[ci|https://app.circleci.com/pipelines/github/aholmberg/cassandra?branch=CASSANDRA-15992] with several unrelated errors.

> Fix flaky python dtest test_13595 - consistency_test.TestConsistency
> 
>
> Key: CASSANDRA-15992
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15992
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/python
> Reporter: David Capwell
> Assignee: Adam Holmberg
> Priority: Normal
> Fix For: 4.0-beta
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/355/workflows/7b8df61d-706f-4094-a206-7cdc6b4e0451/jobs/1818
> {code}
> >   assert 9 == jmx.read_attribute(srp, 'Count')
> E   AssertionError: assert 9 == 5
> {code}






[jira] [Updated] (CASSANDRA-16127) NullPointerException when calling nodetool enablethrift

2020-10-12 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-16127:
--
      Fix Version/s: 3.11.9, 3.0.23, 2.2.19  (was: 3.11.x, 3.0.x, 2.2.x)
      Since Version: 2.2.18  (was: 3.11.8)
Source Control Link: https://github.com/apache/cassandra/commit/3ee90cfc94ee038b7758a57b56d3ec09b514cb88
         Resolution: Fixed
             Status: Resolved  (was: Ready to Commit)

python dtest commit: 1b71196a036b4f33d1ef53418bd21ac4b241399e

> NullPointerException when calling nodetool enablethrift
> ---
>
> Key: CASSANDRA-16127
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16127
> Project: Cassandra
> Issue Type: Bug
> Components: Messaging/Thrift
> Reporter: Tibor Repasi
> Assignee: David Capwell
> Priority: Normal
> Fix For: 2.2.19, 3.0.23, 3.11.9
>
>
> Having thrift disabled, it's impossible to enable it again without restarting 
> the node:
> {code}
> $ nodetool statusthrift
> not running
> $ nodetool enablethrift
> error: null
> -- StackTrace --
> java.lang.NullPointerException
>   at org.apache.cassandra.service.StorageService.startRPCServer(StorageService.java:392)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
>   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
>   at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
>   at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
>   at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
>   at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
>   at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
>   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
>   at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
>   at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
>   at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
>   at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
>   at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
>   at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
>   at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
>   at sun.rmi.transport.Transport$1.run(Transport.java:200)
>   at sun.rmi.transport.Transport$1.run(Transport.java:197)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
>   at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
>   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
>   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Comment Edited] (CASSANDRA-16127) NullPointerException when calling nodetool enablethrift

2020-10-12 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212494#comment-17212494
 ] 

David Capwell edited comment on CASSANDRA-16127 at 10/12/20, 6:09 PM:
--

CI Results: Yellow, normal failures

Branch: cassandra-2.2
Circle: https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16127-cassandra-2.2-1535B57E-CC4D-4F4D-91C3-9FC233923F02
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/95/

Branch: cassandra-3.0
Circle: https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16127-cassandra-3.0-1535B57E-CC4D-4F4D-91C3-9FC233923F02
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/96/

Branch: cassandra-3.11
Circle: https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16127-cassandra-3.11-1535B57E-CC4D-4F4D-91C3-9FC233923F02
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/97/

Branch: trunk
Circle: https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16127-trunk-1535B57E-CC4D-4F4D-91C3-9FC233923F02
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/98/



was (Author: dcapwell):
Starting commit

CI Results (pending):

Branch: cassandra-2.2
Circle: https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16127-cassandra-2.2-1535B57E-CC4D-4F4D-91C3-9FC233923F02
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/95/

Branch: cassandra-3.0
Circle: https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16127-cassandra-3.0-1535B57E-CC4D-4F4D-91C3-9FC233923F02
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/96/

Branch: cassandra-3.11
Circle: https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16127-cassandra-3.11-1535B57E-CC4D-4F4D-91C3-9FC233923F02
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/97/

Branch: trunk
Circle: https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16127-trunk-1535B57E-CC4D-4F4D-91C3-9FC233923F02
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/98/


> NullPointerException when calling nodetool enablethrift
> ---
>
> Key: CASSANDRA-16127
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16127
> Project: Cassandra
> Issue Type: Bug
> Components: Messaging/Thrift
> Reporter: Tibor Repasi
> Assignee: David Capwell
> Priority: Normal
> Fix For: 2.2.19, 3.0.23, 3.11.9
>
>

[cassandra-dtest] branch master updated: Fixed a NullPointerException when calling nodetool enablethrift

2020-10-12 Thread dcapwell
This is an automated email from the ASF dual-hosted git repository.

dcapwell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-dtest.git


The following commit(s) were added to refs/heads/master by this push:
 new 1b71196  Fixed a NullPointerException when calling nodetool enablethrift
1b71196 is described below

commit 1b71196a036b4f33d1ef53418bd21ac4b241399e
Author: David Capwell 
AuthorDate: Mon Oct 12 09:30:33 2020 -0700

    Fixed a NullPointerException when calling nodetool enablethrift

    patch by David Capwell; reviewed by Ekaterina Dimitrova, Jordan West, Yifan Cai for CASSANDRA-16127
---
 bootstrap_test.py                 |  14 +++--
 client_network_stop_start_test.py | 112 ++
 conftest.py                       |  79 +--
 3 files changed, 195 insertions(+), 10 deletions(-)

diff --git a/bootstrap_test.py b/bootstrap_test.py
index 526992d..1cda916 100644
--- a/bootstrap_test.py
+++ b/bootstrap_test.py
@@ -835,10 +835,17 @@ class TestBootstrap(Tester):
         shutil.rmtree(commitlog_dir)
 
     @since('2.2')
+    @pytest.mark.ported_to_in_jvm # see org.apache.cassandra.distributed.test.BootstrapBinaryDisabledTest
     def test_bootstrap_binary_disabled(self):
         """
-        Test binary while bootstrapping and streaming fails
-        @jira_ticket CASSANDRA-14526, CASSANDRA-14525
+        Test binary while bootstrapping and streaming fails.
+
+        This test was ported to jvm-dtest org.apache.cassandra.distributed.test.BootstrapBinaryDisabledTest,
+        as of this writing there are a few limitations with jvm-dtest which requries this test to
+        stay, namely vnode support (ci also tests under different configs).  Once jvm-dtest supports
+        vnodes, this test can go away in favor of that class.
+
+        @jira_ticket CASSANDRA-14526, CASSANDRA-14525, CASSANDRA-16127
         """
         config = {'authenticator': 'org.apache.cassandra.auth.PasswordAuthenticator',
                   'authorizer': 'org.apache.cassandra.auth.CassandraAuthorizer',
@@ -871,9 +878,6 @@ class TestBootstrap(Tester):
         node2.start(jvm_args=["-Dcassandra.ring_delay_ms=5000"])
         self.assert_log_had_msg(node2, 'Some data streaming failed')
 
-        if self.cluster.version() >= LooseVersion('4.0'):
-            self.assert_log_had_msg(node2, 'Not starting client transports as bootstrap has not completed')
-
         try:
             node2.nodetool('join')
             pytest.fail('nodetool should have errored and failed to join ring')
diff --git a/client_network_stop_start_test.py 
b/client_network_stop_start_test.py
new file mode 100644
index 000..6f472df
--- /dev/null
+++ b/client_network_stop_start_test.py
@@ -0,0 +1,112 @@
+import logging
+import os
+import os.path
+import pytest
+import shutil
+import string
+import time
+
+from ccmlib.node import TimeoutError
+from distutils.version import LooseVersion
+from dtest import Tester
+from tools import sslkeygen
+
+since = pytest.mark.since
+logger = logging.getLogger(__name__)
+
+# see https://issues.apache.org/jira/browse/CASSANDRA-16127
+class TestClientNetworkStopStart(Tester):
+
+    def _normalize(self, a):
+        return a.translate(str.maketrans(dict.fromkeys(string.whitespace)))
+
+    def _in(self, a, b):
+        return self._normalize(a) in self._normalize(b)
+
+    def _assert_client_active_msg(self, name, enabled, out):
+        expected = "{} active: {}".format(name, str(enabled).lower())
+        activated = "activated" if enabled else "deactivated"
+        assert self._in(expected, out), "{} is expected to be {} ({}) but was not found in output: {}".format(name, activated, str(enabled).lower(), out)
+
+    def _assert_watch_log_for(self, node_or_cluster, to_watch, assert_msg=None):
+        if assert_msg is None:
+            assert_msg = "Unable to locate '{}'".format(to_watch)
+        nodelist_fn = getattr(node_or_cluster, "nodelist", None)
+        logger.debug("watching for '{}'".format(to_watch))
+        start = time.perf_counter()
+        if callable(nodelist_fn):
+            for node in nodelist_fn():
+                assert node.watch_log_for_no_errors(to_watch), assert_msg
+        else:
+            assert node_or_cluster.watch_log_for_no_errors(to_watch), assert_msg
+        logger.debug("Completed watching for '{}'; took {}s".format(to_watch, time.perf_counter() - start))
+
+    def _assert_binary_actually_found(self, node_or_cluster):
+        # ccm will silently move on if the logs don't have CQL in time, which then leads to
+        # flaky tests; to avoid that force waiting to be correct and assert the log was seen.
+        logger.debug("Verifying that the CQL log was seen and that ccm didn't return early...")
+        self._assert_watch_log_for(node_or_cluster, "Starting listening for CQL clients on", "Binary didn't start...")
+
+def _assert_client_enable(self, 
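
(The diff above is cut off in the archive.) For readers following along, the whitespace-insensitive match that `_normalize` and `_in` implement can be tried standalone. The sketch below is not part of the patch: the function names `normalize`, `contains_ignoring_whitespace` and `active_line` are made up for illustration, and the sample output string is an invented stand-in for `nodetool info` output, not a captured log.

```python
import string

# Translation table that maps every ASCII whitespace character to None,
# which tells str.translate to delete it.
_STRIP_WHITESPACE = str.maketrans(dict.fromkeys(string.whitespace))


def normalize(text):
    """Return text with all whitespace removed."""
    return text.translate(_STRIP_WHITESPACE)


def contains_ignoring_whitespace(needle, haystack):
    """True if needle occurs in haystack once whitespace is ignored in both."""
    return normalize(needle) in normalize(haystack)


def active_line(name, enabled):
    """Build the '<name> active: <true|false>' line the test looks for."""
    return "{} active: {}".format(name, str(enabled).lower())


if __name__ == "__main__":
    # Invented sample; column padding is what makes a plain 'in' check fail.
    sample = "Gossip active          : true\nThrift active          : false\n"
    print(active_line("Thrift", False) in sample)
    print(contains_ignoring_whitespace(active_line("Thrift", False), sample))
```

Stripping whitespace on both sides trades precision for robustness: the check no longer depends on column padding, but it would also match across line boundaries, which is acceptable for a short status report.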

[jira] [Updated] (CASSANDRA-16127) NullPointerException when calling nodetool enablethrift

2020-10-12 Thread David Capwell (Jira)


 [ https://issues.apache.org/jira/browse/CASSANDRA-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Capwell updated CASSANDRA-16127:
--
Status: Ready to Commit  (was: Review In Progress)

> NullPointerException when calling nodetool enablethrift
> ---
>
> Key: CASSANDRA-16127
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16127
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Thrift
>Reporter: Tibor Repasi
>Assignee: David Capwell
>Priority: Normal
> Fix For: 2.2.x, 3.0.x, 3.11.x
>
>
> Having thrift disabled, it's impossible to enable it again without restarting 
> the node:
> {code}
> $ nodetool statusthrift
> not running
> $ nodetool enablethrift
> error: null
> -- StackTrace --
> java.lang.NullPointerException
>   at org.apache.cassandra.service.StorageService.startRPCServer(StorageService.java:392)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
>   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
>   at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
>   at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
>   at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
>   at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
>   at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
>   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
>   at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
>   at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
>   at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
>   at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
>   at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
>   at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
>   at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
>   at sun.rmi.transport.Transport$1.run(Transport.java:200)
>   at sun.rmi.transport.Transport$1.run(Transport.java:197)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
>   at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
>   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
>   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
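
The trace only shows that something dereferenced inside `StorageService.startRPCServer` was null; it does not show why. As a rough illustration of the general failure shape, here is a Python sketch under a loudly stated assumption: that the server object is created only when the transport is enabled at startup. The classes `RpcServer` and `Daemon` and all their methods are hypothetical stand-ins, not Cassandra's classes, and the lazy creation in `enable_rpc` is one possible guard, not necessarily what the committed patch does.

```python
class RpcServer:
    """Hypothetical stand-in for a client-facing server."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False


class Daemon:
    """Hypothetical daemon that owns an optional RPC server."""

    def __init__(self, start_rpc):
        # Assumption for this sketch: the server object only exists when
        # the transport was enabled at startup.
        self.rpc_server = RpcServer() if start_rpc else None

    def enable_rpc_unguarded(self):
        # Raises AttributeError (the Python analogue of a
        # NullPointerException) when rpc_server was never created.
        self.rpc_server.start()

    def enable_rpc(self):
        # One possible guard: create the server on first use.
        if self.rpc_server is None:
            self.rpc_server = RpcServer()
        self.rpc_server.start()

    def is_rpc_running(self):
        return self.rpc_server is not None and self.rpc_server.running


if __name__ == "__main__":
    daemon = Daemon(start_rpc=False)
    try:
        daemon.enable_rpc_unguarded()
    except AttributeError as exc:
        print("unguarded enable failed:", exc)
    daemon.enable_rpc()
    print("running after guarded enable:", daemon.is_rpc_running())
```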



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[cassandra] 01/01: Merge branch 'cassandra-3.11' into trunk

2020-10-12 Thread dcapwell
This is an automated email from the ASF dual-hosted git repository.

dcapwell pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit 2ae1ec5dd2d98178f3ab4b3ed64a87147e713560
Merge: 0e0056c 34dde96
Author: David Capwell 
AuthorDate: Mon Oct 12 11:06:42 2020 -0700

Merge branch 'cassandra-3.11' into trunk

 CHANGES.txt|   2 +
 build.xml  |   3 +
 .../apache/cassandra/service/CassandraDaemon.java  | 108 +--
 .../apache/cassandra/service/StorageService.java   |   2 +-
 .../distributed/impl/AbstractCluster.java  |  18 +-
 .../cassandra/distributed/impl/Instance.java   |  41 +++-
 .../cassandra/distributed/shared/Byteman.java  | 207 +
 .../cassandra/distributed/shared/Shared.java   |  37 
 .../test/BootstrapBinaryDisabledTest.java  | 165 
 .../test/ClientNetworkStopStartTest.java   |  79 
 .../distributed/test/TopologyChangeTest.java   |  45 +++--
 test/resources/byteman/stream_failure.btm  |  14 ++
 12 files changed, 633 insertions(+), 88 deletions(-)

diff --cc CHANGES.txt
index bf80c8c,ee70af5..6829dac
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -29,34 -6,15 +29,36 @@@ Merged from 3.11
  Merged from 3.0:
   * Handle unexpected columns due to schema races (CASSANDRA-15899)
   * Add flag to ignore unreplicated keyspaces during repair (CASSANDRA-15160)
+ Merged from 2.2:
+  * Fixed a NullPointerException when calling nodetool enablethrift (CASSANDRA-16127)
  
 -3.11.8
 +4.0-beta2
 + * Add addition incremental repair visibility to nodetool repair_admin (CASSANDRA-14939)
 + * Always access system properties and environment variables via the new CassandraRelevantProperties and CassandraRelevantEnv classes (CASSANDRA-15876)
 + * Remove deprecated HintedHandOffManager (CASSANDRA-15939)
 + * Prevent repair from overrunning compaction (CASSANDRA-15817)
 + * fix cqlsh COPY functions in Python 3.8 on Mac (CASSANDRA-16053)
 + * Strip comment blocks from cqlsh input before processing statements (CASSANDRA-15802)
 + * Fix unicode chars error input (CASSANDRA-15990)
 + * Improved testability for CacheMetrics and ChunkCacheMetrics (CASSANDRA-15788)
 + * Handle errors in StreamSession#prepare (CASSANDRA-15852)
 + * FQL replay should have options to ignore DDL statements (CASSANDRA-16039)
 + * Remove COMPACT STORAGE internals (CASSANDRA-13994)
 + * Make TimestampSerializer accept fractional seconds of varying precision (CASSANDRA-15976)
 + * Improve cassandra-stress logging when using a profile file that doesn't exist (CASSANDRA-14425)
 + * Improve logging for socket connection/disconnection (CASSANDRA-15980)
 + * Throw FSWriteError upon write failures in order to apply DiskFailurePolicy (CASSANDRA-15928)
 + * Forbid altering UDTs used in partition keys (CASSANDRA-15933)
 + * Fix version parsing logic when upgrading from 3.0 (CASSANDRA-15973)
 + * Optimize NoSpamLogger use in hot paths (CASSANDRA-15766)
 + * Verify sstable components on startup (CASSANDRA-15945)
 + * Resolve JMX output inconsistencies from CASSANDRA-7544 storage-port-configurable-per-node (CASSANDRA-15937)
 +Merged from 3.11:
   * Correctly interpret SASI's `max_compaction_flush_memory_in_mb` setting in megabytes not bytes (CASSANDRA-16071)
   * Fix short read protection for GROUP BY queries (CASSANDRA-15459)
 + * stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up (CASSANDRA-15191)
   * Frozen RawTuple is not annotated with frozen in the toString method (CASSANDRA-15857)
  Merged from 3.0:
 - * Use IF NOT EXISTS for index and UDT create statements in snapshot schema files (CASSANDRA-13935)
   * Fix gossip shutdown order (CASSANDRA-15816)
   * Remove broken 'defrag-on-read' optimization (CASSANDRA-15432)
   * Check for endpoint collision with hibernating nodes (CASSANDRA-14599)
diff --cc build.xml
index e026630,191c1c8..5c9ac2f
--- a/build.xml
+++ b/build.xml
@@@ -582,13 -412,20 +582,14 @@@



 -  
 -
 -  
 -
 -
 -  
 -  
 -  
 -   
 -  
 -  
 +  
 +  
 +  

 +  
 +  

+   

   

@@@ -731,19 -542,22 +732,20 @@@
  version="${version}"/>
  
  
 -
 +
+ 
 +
 +
 +
  
  
 -  
 -  
 +
 +
  
 -
 -  
 -  
 -  
 -  
 -
 +
  
 -
 -
 +
 +
  
  
  
@@@ -770,7 -572,14 +772,8 @@@
  version="${version}"/>
  
  
 -
+ 
 -
 -  
 -  
 -  

[cassandra] branch cassandra-3.11 updated (925ad35 -> 34dde96)

2020-10-12 Thread dcapwell

dcapwell pushed a change to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 925ad35  Merge branch 'cassandra-3.0' into cassandra-3.11
 new 521a6e2  Fixed a NullPointerException when calling nodetool enablethrift
 new 42989ce  Merge branch 'cassandra-2.2' into cassandra-3.0
 new 34dde96  Merge branch 'cassandra-3.0' into cassandra-3.11

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGES.txt|   2 +
 build.xml  |   3 +
 .../apache/cassandra/service/CassandraDaemon.java  |  94 ++
 .../apache/cassandra/service/StorageService.java   |  11 +-
 .../distributed/impl/AbstractCluster.java  |  21 ++-
 .../cassandra/distributed/impl/Instance.java   |  35 +++-
 .../cassandra/distributed/shared/Byteman.java  | 207 +
 .../shared/{ShutdownException.java => Shared.java} |  23 ++-
 .../test/BootstrapBinaryDisabledTest.java  | 165 
 .../distributed/test/BytemanExamples.java  |  87 +
 .../test/ClientNetworkStopStartTest.java   | 192 +++
 test/resources/byteman/stream_failure.btm  |  14 ++
 12 files changed, 794 insertions(+), 60 deletions(-)
 create mode 100644 test/distributed/org/apache/cassandra/distributed/shared/Byteman.java
 copy test/distributed/org/apache/cassandra/distributed/shared/{ShutdownException.java => Shared.java} (56%)
 create mode 100644 test/distributed/org/apache/cassandra/distributed/test/BootstrapBinaryDisabledTest.java
 create mode 100644 test/distributed/org/apache/cassandra/distributed/test/BytemanExamples.java
 create mode 100644 test/distributed/org/apache/cassandra/distributed/test/ClientNetworkStopStartTest.java
 create mode 100644 test/resources/byteman/stream_failure.btm





[cassandra] branch trunk updated (0e0056c -> 2ae1ec5)

2020-10-12 Thread dcapwell

dcapwell pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 0e0056c  Ninja: add a missing CHANGES.txt entry for CASSANDRA-16155
 new 521a6e2  Fixed a NullPointerException when calling nodetool enablethrift
 new 42989ce  Merge branch 'cassandra-2.2' into cassandra-3.0
 new 34dde96  Merge branch 'cassandra-3.0' into cassandra-3.11
 new 2ae1ec5  Merge branch 'cassandra-3.11' into trunk

The 4 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGES.txt|   2 +
 build.xml  |   3 +
 .../apache/cassandra/service/CassandraDaemon.java  | 108 +--
 .../apache/cassandra/service/StorageService.java   |   2 +-
 .../distributed/impl/AbstractCluster.java  |  18 +-
 .../cassandra/distributed/impl/Instance.java   |  41 +++-
 .../cassandra/distributed/shared/Byteman.java  | 207 +
 .../shared/{RepairResult.java => Shared.java}  |  24 ++-
 .../test/BootstrapBinaryDisabledTest.java  | 165 
 .../test/ClientNetworkStopStartTest.java   |  79 
 .../distributed/test/TopologyChangeTest.java   |  45 +++--
 test/resources/byteman/stream_failure.btm  |  14 ++
 12 files changed, 611 insertions(+), 97 deletions(-)
 create mode 100644 test/distributed/org/apache/cassandra/distributed/shared/Byteman.java
 copy test/distributed/org/apache/cassandra/distributed/shared/{RepairResult.java => Shared.java} (56%)
 create mode 100644 test/distributed/org/apache/cassandra/distributed/test/BootstrapBinaryDisabledTest.java
 create mode 100644 test/distributed/org/apache/cassandra/distributed/test/ClientNetworkStopStartTest.java
 create mode 100644 test/resources/byteman/stream_failure.btm





[cassandra] branch cassandra-2.2 updated: Fixed a NullPointerException when calling nodetool enablethrift

2020-10-12 Thread dcapwell

dcapwell pushed a commit to branch cassandra-2.2
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/cassandra-2.2 by this push:
 new 521a6e2  Fixed a NullPointerException when calling nodetool enablethrift
521a6e2 is described below

commit 521a6e2aa9f8a4bc95dd13e768ec6de33cf6fa15
Author: David Capwell 
AuthorDate: Mon Oct 12 09:30:41 2020 -0700

Fixed a NullPointerException when calling nodetool enablethrift

patch by David Capwell; reviewed by Ekaterina Dimitrova, Jordan West, Yifan Cai for CASSANDRA-16127
---
 CHANGES.txt|   1 +
 build.xml  |   3 +
 .../apache/cassandra/service/CassandraDaemon.java  | 139 +-
 .../apache/cassandra/service/StorageService.java   |  20 +-
 .../distributed/impl/AbstractCluster.java  |  18 +-
 .../cassandra/distributed/impl/Instance.java   |  32 +++-
 .../cassandra/distributed/shared/Byteman.java  | 207 +
 .../cassandra/distributed/shared/Shared.java   |  37 
 .../test/BootstrapBinaryDisabledTest.java  | 165 
 .../test/ClientNetworkStopStartTest.java   | 192 +++
 test/resources/byteman/stream_failure.btm  |  14 ++
 11 files changed, 761 insertions(+), 67 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index 73e9ba9..1274689 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,5 +1,6 @@
 2.2.19
  * Fix ExceptionInInitializerError when data_file_directories is not set (CASSANDRA-16008)
+ * Fixed a NullPointerException when calling nodetool enablethrift (CASSANDRA-16127)
 
 2.2.18
  * Fix CQL parsing of collections when the column type is reversed (CASSANDRA-15814)
diff --git a/build.xml b/build.xml
index 657cbbb..1e701f8 100644
--- a/build.xml
+++ b/build.xml
@@ -397,6 +397,7 @@
   
   
   
+  
   
  
   
@@ -513,6 +514,7 @@
 
 
 
+
 
 

@@ -538,6 +540,7 @@
 version="${version}"/>
 
 
+
 
   
   
diff --git a/src/java/org/apache/cassandra/service/CassandraDaemon.java b/src/java/org/apache/cassandra/service/CassandraDaemon.java
index 86e2464..a67011d 100644
--- a/src/java/org/apache/cassandra/service/CassandraDaemon.java
+++ b/src/java/org/apache/cassandra/service/CassandraDaemon.java
@@ -29,17 +29,14 @@ import java.rmi.AlreadyBoundException;
 import java.rmi.NotBoundException;
 import java.rmi.Remote;
 import java.rmi.RemoteException;
-import java.rmi.registry.LocateRegistry;
 import java.rmi.registry.Registry;
 import java.rmi.server.RMIClientSocketFactory;
 import java.rmi.server.RMIServerSocketFactory;
-import java.util.Collections;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.UUID;
 import java.util.concurrent.TimeUnit;
-
 import javax.management.ObjectName;
 import javax.management.StandardMBean;
 import javax.management.remote.JMXConnectorServer;
@@ -47,6 +44,13 @@ import javax.management.remote.JMXServiceURL;
 import javax.management.remote.rmi.RMIConnectorServer;
 import javax.management.remote.rmi.RMIJRMPServerImpl;
 
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.util.concurrent.Futures;
+import com.google.common.util.concurrent.ListenableFuture;
+import com.google.common.util.concurrent.Uninterruptibles;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
 import com.addthis.metrics3.reporter.config.ReporterConfig;
 import com.codahale.metrics.Meter;
 import com.codahale.metrics.MetricRegistryListener;
@@ -55,18 +59,18 @@ import com.codahale.metrics.jvm.BufferPoolMetricSet;
 import com.codahale.metrics.jvm.FileDescriptorRatioGauge;
 import com.codahale.metrics.jvm.GarbageCollectorMetricSet;
 import com.codahale.metrics.jvm.MemoryUsageGaugeSet;
-import com.google.common.annotations.VisibleForTesting;
-import com.google.common.util.concurrent.Futures;
-import com.google.common.util.concurrent.ListenableFuture;
-import com.google.common.util.concurrent.Uninterruptibles;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import org.apache.cassandra.concurrent.*;
+import org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor;
+import org.apache.cassandra.concurrent.ScheduledExecutors;
+import org.apache.cassandra.concurrent.Stage;
+import org.apache.cassandra.concurrent.StageManager;
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.config.Schema;
-import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.ColumnFamilyStore;
+import org.apache.cassandra.db.Keyspace;
+import org.apache.cassandra.db.SizeEstimatesRecorder;
+import 

[cassandra] 01/01: Merge branch 'cassandra-2.2' into cassandra-3.0

2020-10-12 Thread dcapwell

dcapwell pushed a commit to branch cassandra-3.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit 42989cee287ddacf30db331829bf927695816966
Merge: 5ef77d2 521a6e2
Author: David Capwell 
AuthorDate: Mon Oct 12 11:04:41 2020 -0700

Merge branch 'cassandra-2.2' into cassandra-3.0

 CHANGES.txt|   2 +
 build.xml  |   3 +
 .../apache/cassandra/service/CassandraDaemon.java  | 105 ++-
 .../apache/cassandra/service/StorageService.java   |  11 +-
 .../distributed/impl/AbstractCluster.java  |  18 +-
 .../cassandra/distributed/impl/Instance.java   |  33 +++-
 .../cassandra/distributed/shared/Byteman.java  | 207 +
 .../cassandra/distributed/shared/Shared.java   |  37 
 .../test/BootstrapBinaryDisabledTest.java  | 165 
 .../test/ClientNetworkStopStartTest.java   | 192 +++
 test/resources/byteman/stream_failure.btm  |  14 ++
 11 files changed, 729 insertions(+), 58 deletions(-)

diff --cc CHANGES.txt
index 1ea5184,1274689..1dd52c2
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,20 -1,8 +1,22 @@@
 -2.2.19
 - * Fix ExceptionInInitializerError when data_file_directories is not set (CASSANDRA-16008)
 +3.0.23:
 + * Handle unexpected columns due to schema races (CASSANDRA-15899)
 + * Avoid failing compactions with very large partitions (CASSANDRA-15164)
 + * Use IF NOT EXISTS for index and UDT create statements in snapshot schema files (CASSANDRA-13935)
 + * Add flag to ignore unreplicated keyspaces during repair (CASSANDRA-15160)
++Merged from 2.2:
+  * Fixed a NullPointerException when calling nodetool enablethrift (CASSANDRA-16127)
  
 -2.2.18
 +3.0.22:
 + * Fix gossip shutdown order (CASSANDRA-15816)
 + * Remove broken 'defrag-on-read' optimization (CASSANDRA-15432)
 + * Check for endpoint collision with hibernating nodes (CASSANDRA-14599)
 + * Operational improvements and hardening for replica filtering protection (CASSANDRA-15907)
 + * stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up (CASSANDRA-15191)
 + * 3.x fails to start if commit log has range tombstones from a column which is also deleted (CASSANDRA-15970)
 + * Forbid altering UDTs used in partition keys (CASSANDRA-15933)
 + * Fix empty/null json string representation (CASSANDRA-15896)
 + * Handle difference in timestamp precision between java8 and java11 in LogFIle.java (CASSANDRA-16050)
 +Merged from 2.2:
   * Fix CQL parsing of collections when the column type is reversed (CASSANDRA-15814)
  Merged from 2.1:
   * Only allow strings to be passed to JMX authentication (CASSANDRA-16077)
diff --cc build.xml
index e3270d8,1e701f8..8ecf70b
--- a/build.xml
+++ b/build.xml
@@@ -528,7 -540,19 +530,8 @@@
  version="${version}"/>
  
  
+ 
 -
 -  
 -  
 -  
 -  
 -  
 -  
 -  
 -
 -
  
  
  
diff --cc src/java/org/apache/cassandra/service/CassandraDaemon.java
index 666ca72,a67011d..ce0b8f3
--- a/src/java/org/apache/cassandra/service/CassandraDaemon.java
+++ b/src/java/org/apache/cassandra/service/CassandraDaemon.java
@@@ -35,9 -35,8 +34,7 @@@ import java.rmi.server.RMIServerSocketF
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;
 -import java.util.UUID;
  import java.util.concurrent.TimeUnit;
- 
  import javax.management.ObjectName;
  import javax.management.StandardMBean;
  import javax.management.remote.JMXConnectorServer;
@@@ -53,20 -59,18 +57,17 @@@ import com.codahale.metrics.jvm.BufferP
  import com.codahale.metrics.jvm.FileDescriptorRatioGauge;
  import com.codahale.metrics.jvm.GarbageCollectorMetricSet;
  import com.codahale.metrics.jvm.MemoryUsageGaugeSet;
- import com.google.common.annotations.VisibleForTesting;
- import com.google.common.util.concurrent.Futures;
- import com.google.common.util.concurrent.ListenableFuture;
- import com.google.common.util.concurrent.Uninterruptibles;
- 
- import org.slf4j.Logger;
- import org.slf4j.LoggerFactory;
- 
- import org.apache.cassandra.concurrent.*;
 -import org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor;
++import org.apache.cassandra.batchlog.LegacyBatchlogMigrator;
+ import org.apache.cassandra.concurrent.ScheduledExecutors;
 -import org.apache.cassandra.concurrent.Stage;
 -import org.apache.cassandra.concurrent.StageManager;
  import org.apache.cassandra.config.CFMetaData;
  import org.apache.cassandra.config.DatabaseDescriptor;
  import org.apache.cassandra.config.Schema;
- import org.apache.cassandra.db.*;
- import org.apache.cassandra.batchlog.LegacyBatchlogMigrator;
++import org.apache.cassandra.cql3.functions.ThreadAwareSecurityManager;
+ import org.apache.cassandra.db.ColumnFamilyStore;
+ import 

[cassandra] 01/01: Merge branch 'cassandra-3.0' into cassandra-3.11

2020-10-12 Thread dcapwell

dcapwell pushed a commit to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit 34dde962fc0b785d4c5d94db5f2be16c913a4257
Merge: 925ad35 42989ce
Author: David Capwell 
AuthorDate: Mon Oct 12 11:05:34 2020 -0700

Merge branch 'cassandra-3.0' into cassandra-3.11

 CHANGES.txt|   2 +
 build.xml  |   3 +
 .../apache/cassandra/service/CassandraDaemon.java  |  94 ++
 .../apache/cassandra/service/StorageService.java   |  11 +-
 .../distributed/impl/AbstractCluster.java  |  21 ++-
 .../cassandra/distributed/impl/Instance.java   |  35 +++-
 .../cassandra/distributed/shared/Byteman.java  | 207 +
 .../cassandra/distributed/shared/Shared.java   |  37 
 .../test/BootstrapBinaryDisabledTest.java  | 165 
 .../distributed/test/BytemanExamples.java  |  87 +
 .../test/ClientNetworkStopStartTest.java   | 192 +++
 test/resources/byteman/stream_failure.btm  |  14 ++
 12 files changed, 816 insertions(+), 52 deletions(-)

diff --cc CHANGES.txt
index 99369fa,1dd52c2..ee70af5
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,18 -1,12 +1,20 @@@
 -3.0.23:
 - * Handle unexpected columns due to schema races (CASSANDRA-15899)
 +3.11.9
 + * Fix memory leak in CompressedChunkReader (CASSANDRA-15880)
 + * Don't attempt value skipping with mixed version cluster (CASSANDRA-15833)
   * Avoid failing compactions with very large partitions (CASSANDRA-15164)
 - * Use IF NOT EXISTS for index and UDT create statements in snapshot schema files (CASSANDRA-13935)
 + * Make sure LCS handles duplicate sstable added/removed notifications correctly (CASSANDRA-14103)
 +Merged from 3.0:
 + * Handle unexpected columns due to schema races (CASSANDRA-15899)
   * Add flag to ignore unreplicated keyspaces during repair (CASSANDRA-15160)
+ Merged from 2.2:
+  * Fixed a NullPointerException when calling nodetool enablethrift (CASSANDRA-16127)
  
 -3.0.22:
 +3.11.8
 + * Correctly interpret SASI's `max_compaction_flush_memory_in_mb` setting in megabytes not bytes (CASSANDRA-16071)
 + * Fix short read protection for GROUP BY queries (CASSANDRA-15459)
 + * Frozen RawTuple is not annotated with frozen in the toString method (CASSANDRA-15857)
 +Merged from 3.0:
 + * Use IF NOT EXISTS for index and UDT create statements in snapshot schema files (CASSANDRA-13935)
   * Fix gossip shutdown order (CASSANDRA-15816)
   * Remove broken 'defrag-on-read' optimization (CASSANDRA-15432)
   * Check for endpoint collision with hibernating nodes (CASSANDRA-14599)
diff --cc build.xml
index f078d34,8ecf70b..191c1c8
--- a/build.xml
+++ b/build.xml
@@@ -570,14 -530,8 +572,15 @@@
  version="${version}"/>
  
  
 +
+ 
 -
 +
 +  
 +  
 +  
 +  
 +
 +
  
  
  
diff --cc src/java/org/apache/cassandra/service/CassandraDaemon.java
index cf185ec,ce0b8f3..d8bd165
--- a/src/java/org/apache/cassandra/service/CassandraDaemon.java
+++ b/src/java/org/apache/cassandra/service/CassandraDaemon.java
@@@ -22,15 -22,33 +22,20 @@@ import java.io.IOException
  import java.lang.management.ManagementFactory;
  import java.lang.management.MemoryPoolMXBean;
  import java.net.InetAddress;
 +import java.net.URL;
  import java.net.UnknownHostException;
 -import java.rmi.AccessException;
 -import java.rmi.AlreadyBoundException;
 -import java.rmi.NotBoundException;
 -import java.rmi.Remote;
 -import java.rmi.RemoteException;
 -import java.rmi.registry.Registry;
 -import java.rmi.server.RMIClientSocketFactory;
 -import java.rmi.server.RMIServerSocketFactory;
 -import java.util.HashMap;
  import java.util.List;
 -import java.util.Map;
  import java.util.concurrent.TimeUnit;
- import javax.management.MBeanServer;
  import javax.management.ObjectName;
  import javax.management.StandardMBean;
  import javax.management.remote.JMXConnectorServer;
 -import javax.management.remote.JMXServiceURL;
 -import javax.management.remote.rmi.RMIConnectorServer;
 -import javax.management.remote.rmi.RMIJRMPServerImpl;
  
+ import com.google.common.annotations.VisibleForTesting;
+ import com.google.common.util.concurrent.Futures;
+ import com.google.common.util.concurrent.ListenableFuture;
 -import com.google.common.util.concurrent.Uninterruptibles;
+ import org.slf4j.Logger;
+ import org.slf4j.LoggerFactory;
+ 
  import com.addthis.metrics3.reporter.config.ReporterConfig;
  import com.codahale.metrics.Meter;
  import com.codahale.metrics.MetricRegistryListener;
@@@ -50,9 -62,12 +49,13 @@@ import org.apache.cassandra.concurrent.
  import org.apache.cassandra.config.CFMetaData;
  import org.apache.cassandra.config.DatabaseDescriptor;
  import org.apache.cassandra.config.Schema;
 -import 

[cassandra] branch cassandra-3.0 updated (5ef77d2 -> 42989ce)

2020-10-12 Thread dcapwell

dcapwell pushed a change to branch cassandra-3.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 5ef77d2  Merge branch 'cassandra-2.2' into cassandra-3.0
 new 521a6e2  Fixed a NullPointerException when calling nodetool enablethrift
 new 42989ce  Merge branch 'cassandra-2.2' into cassandra-3.0

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGES.txt|   2 +
 build.xml  |   3 +
 .../apache/cassandra/service/CassandraDaemon.java  | 105 ++-
 .../apache/cassandra/service/StorageService.java   |  11 +-
 .../distributed/impl/AbstractCluster.java  |  18 +-
 .../cassandra/distributed/impl/Instance.java   |  33 +++-
 .../cassandra/distributed/shared/Byteman.java  | 207 +
 .../shared/{ShutdownException.java => Shared.java} |  23 ++-
 .../test/BootstrapBinaryDisabledTest.java  | 165 
 .../test/ClientNetworkStopStartTest.java   | 192 +++
 test/resources/byteman/stream_failure.btm  |  14 ++
 11 files changed, 707 insertions(+), 66 deletions(-)
 create mode 100644 test/distributed/org/apache/cassandra/distributed/shared/Byteman.java
 copy test/distributed/org/apache/cassandra/distributed/shared/{ShutdownException.java => Shared.java} (56%)
 create mode 100644 test/distributed/org/apache/cassandra/distributed/test/BootstrapBinaryDisabledTest.java
 create mode 100644 test/distributed/org/apache/cassandra/distributed/test/ClientNetworkStopStartTest.java
 create mode 100644 test/resources/byteman/stream_failure.btm





[jira] [Assigned] (CASSANDRA-16083) Missing JMX objects and attributes upgrading from 3.0 to 4.0

2020-10-12 Thread Uchenna (Jira)


 [ https://issues.apache.org/jira/browse/CASSANDRA-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uchenna reassigned CASSANDRA-16083:
---

Assignee: (was: Uchenna)

> Missing JMX objects and attributes upgrading from 3.0 to 4.0
> 
>
> Key: CASSANDRA-16083
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16083
> Project: Cassandra
>  Issue Type: Task
>  Components: Observability/Metrics
>Reporter: David Capwell
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Using the tools added in CASSANDRA-16082, below is the list of metrics 
> missing in 4.0 but present in 3.0.  The work here is to make sure we had 
> proper deprecation for each metric and, if not, to add it back.
> {code}
> $ tools/bin/jmxtool diff -f yaml cassandra-3.0-jmx.yaml trunk-jmx.yaml --ignore-missing-on-left
> Objects not in right:
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=schema_columnfamilies,name=CasPrepareLatency
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=EstimatedPartitionSizeHistogram
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=hints,name=BloomFilterFalseRatio
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=views_builds_in_progress,name=ReplicaFilteringProtectionRequests
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=batchlog,name=RowCacheHitOutOfRange
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=views_builds_in_progress,name=CasPrepareLatency
> org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=CounterMutationStage,name=MaxPoolSize
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=views_builds_in_progress,name=ColUpdateTimeDeltaHistogram
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=batchlog,name=TombstoneScannedHistogram
> org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=ActiveTasks
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=hints,name=WaitingOnFreeMemtableSpace
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=schema_columnfamilies,name=CasCommitTotalLatency
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=hints,name=MemtableOnHeapSize
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=schema_aggregates,name=CasProposeLatency
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=batchlog,name=AllMemtablesLiveDataSize
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=hints,name=ViewReadTime
> org.apache.cassandra.db:type=HintedHandoffManager
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=BloomFilterDiskSpaceUsed
> org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=RequestResponseStage,name=PendingTasks
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=views_builds_in_progress,name=MemtableSwitchCount
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=hints,name=MemtableOnHeapSize
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=range_xfers,name=ReplicaFilteringProtectionRowsCachedPerQuery
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=views_builds_in_progress,name=SnapshotsSize
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=RecentBloomFilterFalsePositives
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=views_builds_in_progress,name=ColUpdateTimeDeltaHistogram
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=range_xfers,name=SpeculativeRetries
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=LiveDiskSpaceUsed
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=views_builds_in_progress,name=ViewReadTime
> org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MigrationStage,name=CompletedTasks
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=AllMemtablesLiveDataSize
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=batchlog,name=ViewReadTime
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=hints,name=BloomFilterFalsePositives
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=range_xfers,name=CompressionMetadataOffHeapMemoryUsed
> org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=TotalBlockedTasks
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=views_builds_in_progress,name=LiveScannedHistogram
> org.apache.cassandra.db:type=Tables,keyspace=system,table=views_builds_in_progress
> org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MiscStage,name=ActiveTasks
> 
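
The comparison behind the "Objects not in right" listing above amounts to a one-sided set difference. The following is a minimal sketch of that idea only, not the jmxtool implementation; the function name and the two input lists are hypothetical, and real inputs would come from the JMX dumps named in the command.

```python
def objects_not_in_right(left, right):
    """Return object names present in `left` but absent from `right`, sorted."""
    return sorted(set(left) - set(right))


# Hypothetical, hand-written inputs standing in for the 3.0 and trunk dumps.
left_dump = [
    "org.apache.cassandra.db:type=HintedHandoffManager",
    "org.apache.cassandra.metrics:type=Table,keyspace=system,scope=hints,name=MemtableOnHeapSize",
    "org.apache.cassandra.metrics:type=Storage,name=Load",
]
right_dump = [
    "org.apache.cassandra.metrics:type=Storage,name=Load",
    "org.apache.cassandra.metrics:type=Storage,name=Exceptions",
]

missing = objects_not_in_right(left_dump, right_dump)
print("Objects not in right:")
for name in missing:
    print(name)
```

Names that exist only in the right-hand dump are never reported here, which is presumably the effect the `--ignore-missing-on-left` flag has in the command above.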

[jira] [Assigned] (CASSANDRA-16185) Add tests to cover CommitLog metrics

2020-10-12 Thread Uchenna (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uchenna reassigned CASSANDRA-16185:
---

Assignee: Uchenna

> Add tests to cover CommitLog metrics
> 
>
> Key: CASSANDRA-16185
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16185
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/unit
>Reporter: Benjamin Lerer
>Assignee: Uchenna
>Priority: Normal
> Fix For: 4.0-beta
>
>
> The only CommitLog metric that seems to be covered by unit tests is 
> {{oversizedMutations}}. We should add tests for the other ones.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (CASSANDRA-16127) NullPointerException when calling nodetool enablethrift

2020-10-12 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212494#comment-17212494
 ] 

David Capwell commented on CASSANDRA-16127:
---

Starting commit

CI Results (pending):

Branch: cassandra-2.2
Circle: 
https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16127-cassandra-2.2-1535B57E-CC4D-4F4D-91C3-9FC233923F02
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/95/

Branch: cassandra-3.0
Circle: 
https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16127-cassandra-3.0-1535B57E-CC4D-4F4D-91C3-9FC233923F02
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/96/

Branch: cassandra-3.11
Circle: 
https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16127-cassandra-3.11-1535B57E-CC4D-4F4D-91C3-9FC233923F02
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/97/

Branch: trunk
Circle: 
https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16127-trunk-1535B57E-CC4D-4F4D-91C3-9FC233923F02
Jenkins: https://ci-cassandra.apache.org/job/Cassandra-devbranch/98/


> NullPointerException when calling nodetool enablethrift
> ---
>
> Key: CASSANDRA-16127
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16127
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Thrift
>Reporter: Tibor Repasi
>Assignee: David Capwell
>Priority: Normal
> Fix For: 2.2.x, 3.0.x, 3.11.x
>
>
> With thrift disabled, it is impossible to enable it again without restarting 
> the node:
> {code}
> $ nodetool statusthrift
> not running
> $ nodetool enablethrift
> error: null
> -- StackTrace --
> java.lang.NullPointerException
>   at org.apache.cassandra.service.StorageService.startRPCServer(StorageService.java:392)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
>   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
>   at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
>   at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
>   at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
>   at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
>   at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
>   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
>   at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
>   at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
>   at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
>   at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
>   at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
>   at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
>   at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
>   at sun.rmi.transport.Transport$1.run(Transport.java:200)
>   at sun.rmi.transport.Transport$1.run(Transport.java:197)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
>   at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
>   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
>   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> 

[jira] [Updated] (CASSANDRA-15369) Fake row deletions and range tombstones, causing digest mismatch and sstable growth

2020-10-12 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie updated CASSANDRA-15369:
--
Complexity: Challenging  (was: Impossible)

> Fake row deletions and range tombstones, causing digest mismatch and sstable 
> growth
> ---
>
> Key: CASSANDRA-15369
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15369
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Local/Memtable, Local/SSTable
>Reporter: Benedict Elliott Smith
>Assignee: Zhao Yang
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>
> As assessed in CASSANDRA-15363, we generate fake row deletions and fake 
> tombstone markers under various circumstances:
>  * If we perform a clustering key query (or select a compact column):
>  * Serving from a {{Memtable}}, we will generate fake row deletions
>  * Serving from an sstable, we will generate fake row tombstone markers
>  * If we perform a slice query, we will generate only fake row tombstone 
> markers for any range tombstone that begins or ends outside of the limit of 
> the requested slice
>  * If we perform a multi-slice or IN query, this will occur for each 
> slice/clustering
> Unfortunately, these different behaviours can lead to very different data 
> stored in sstables until a full repair is run.  When we read-repair, we only 
> send these fake deletions or range tombstones.  A fake row deletion, 
> clustering RT and slice RT, each produces a different digest.  So for each 
> single point lookup we can produce a digest mismatch twice, and until a full 
> repair is run we can encounter an unlimited number of digest mismatches 
> across different overlapping queries.
> Relatedly, this seems a more problematic variant of our atomicity failures 
> caused by our monotonic reads, since RTs can have an atomic effect across (up 
> to) the entire partition, whereas the propagation may happen on an 
> arbitrarily small portion.  If the RT exists on only one node, this could 
> plausibly lead to a fairly problematic scenario if that node fails before the 
> range can be repaired.
> At the very least, this behaviour can lead to an almost unlimited amount of 
> extraneous data being stored until the range is repaired and compaction 
> happens to overwrite the sub-range RTs and row deletions.
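
To illustrate why the description above expects digest mismatches, here is a minimal sketch showing that three different representations of the same logical deletion hash to three different digests. The tuple encoding, the `digest` helper, and the choice of SHA-256 are assumptions made for illustration; they are not Cassandra's serialization or digest algorithm.

```python
import hashlib


def digest(fragments):
    """Hash a sequence of (kind, start, end, timestamp) fragments in order."""
    h = hashlib.sha256()
    for fragment in fragments:
        h.update(repr(fragment).encode("utf-8"))
    return h.hexdigest()


# Three hypothetical encodings of "clustering key 5 was deleted at ts=100".
row_deletion = [("row_deletion", 5, 5, 100)]
clustering_rt = [("range_tombstone", 5, 5, 100)]
# A slice query over [3, 8] against a wider tombstone yields a tombstone
# clipped to the bounds of the requested slice.
slice_rt = [("range_tombstone", 3, 8, 100)]

digests = {digest(row_deletion), digest(clustering_rt), digest(slice_rt)}
print(len(digests))
```

Because each replica may have stored whichever form an earlier read repair sent it, replicas that agree on the live data can still disagree on the digest.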






[jira] [Commented] (CASSANDRA-15992) Fix flaky python dtest test_13595 - consistency_test.TestConsistency

2020-10-12 Thread Adam Holmberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212468#comment-17212468
 ] 

Adam Holmberg commented on CASSANDRA-15992:
---

Thanks. Will do.

> Fix flaky python dtest test_13595 - consistency_test.TestConsistency
> 
>
> Key: CASSANDRA-15992
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15992
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: David Capwell
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 4.0-beta
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/355/workflows/7b8df61d-706f-4094-a206-7cdc6b4e0451/jobs/1818
> {code}
> >   assert 9 == jmx.read_attribute(srp, 'Count')
> E   AssertionError: assert 9 == 5
> {code}






[jira] [Updated] (CASSANDRA-16156) Decommissioned nodes are picked for gossip when unreachable nodes are considered for gossiping

2020-10-12 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-16156:

Test and Documentation Plan: Test included in [CASSANDRA-15935]; it can be 
found 
[here|https://github.com/apache/cassandra/blob/27470d38f57695a766d8efd27a90be6f779ed625/test/distributed/org/apache/cassandra/distributed/test/BootstrapTest.java#L154-L191].
 Status: Patch Available  (was: Open)

|[patch|https://github.com/apache/cassandra/pull/774]|[ci|https://app.circleci.com/pipelines/github/ifesdjeen/cassandra?branch=16156-gossip-with-unreachable-nodes]|

> Decommissioned nodes are picked for gossip when unreachable nodes are 
> considered for gossiping 
> --
>
> Key: CASSANDRA-16156
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16156
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
>
> After a node is decommissioned, it is still considered for gossip via 
> “unreachable” nodes, which results in the following exceptions:
>  
> {code}
> INFO  [node4_Messaging-EventLoop-3-3] node4 2020-09-29 16:37:37,527 NoSpamLogger.java:91 - /127.0.0.4:7012->/127.0.0.1:7012-URGENT_MESSAGES-[no-channel] failed to connect
> io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:7012
> Caused by: java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>   at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
>   at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
>   at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
>  {code}
> Trace of the method that attempts to establish connection:
> {code} 
> org.apache.cassandra.net.MessagingService.getOutbound(MessagingService.java:492)
>   at org.apache.cassandra.net.MessagingService.doSend(MessagingService.java:335)
>   at org.apache.cassandra.net.OutboundSink$Filtered.accept(OutboundSink.java:55)
>   at org.apache.cassandra.net.OutboundSink.accept(OutboundSink.java:70)
>   at org.apache.cassandra.net.MessagingService.send(MessagingService.java:327)
>   at org.apache.cassandra.net.MessagingService.send(MessagingService.java:314)
>   at org.apache.cassandra.gms.Gossiper.sendGossip(Gossiper.java:813)
>   at org.apache.cassandra.gms.Gossiper.maybeGossipToUnreachableMember(Gossiper.java:840)
>   at org.apache.cassandra.gms.Gossiper.access$400(Gossiper.java:86)
>  {code}
> LEFT and other nodes that are considered dead should not be picked for gossip 
> with unreachable nodes.
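
The behaviour requested in the last sentence of the description can be sketched as a filter applied before a gossip target is picked from the unreachable set. This is a hedged illustration only, not the Gossiper patch: the function name, the status map, and the contents of `DEAD_STATES` (beyond `LEFT`, which the ticket names) are hypothetical.

```python
import random

# Only LEFT is named in the ticket; REMOVED is an illustrative placeholder
# for "other nodes that are considered dead".
DEAD_STATES = {"LEFT", "REMOVED"}


def pick_unreachable_gossip_target(unreachable, status_by_endpoint, rng=random):
    """Pick a random unreachable endpoint, skipping ones known to be dead."""
    candidates = sorted(
        ep for ep in unreachable
        if status_by_endpoint.get(ep) not in DEAD_STATES
    )
    if not candidates:
        return None  # nothing worth gossiping to
    return rng.choice(candidates)


unreachable = {"127.0.0.1:7012", "127.0.0.3:7012"}
statuses = {"127.0.0.1:7012": "LEFT", "127.0.0.3:7012": "NORMAL"}
target = pick_unreachable_gossip_target(unreachable, statuses)
print(target)
```

With the filter in place, the decommissioned endpoint from the log above (`127.0.0.1:7012`) is never selected, so no connection attempt is made to it.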






[jira] [Comment Edited] (CASSANDRA-14746) Ensure Netty Internode Messaging Refactor is Solid

2020-10-12 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212457#comment-17212457
 ] 

Joey Lynch edited comment on CASSANDRA-14746 at 10/12/20, 3:45 PM:
---

Hi [~jmckenzie] thanks for pinging on this!
{quote}Was this the goal of the MS rewrite? I have no horse in this race - I 
just thought the goal of it was to tighten up some of the things that were 
present / still troublesome after Jason's rewrite of things rather than 
specifically targeting performance improvements.
{quote}
I think phrasing it as no regression is also fine. Our testing so far 
identified major regressions after both refactors (stability after the first 
e.g. not delivering mutations across datacenters and some decent performance 
regressions especially around TLS after the second that had to be fixed up).
{quote}And fwiw, the benchmarks I've seen on 4.0 show a pretty significant 
improvement in throughput if nothing else, but in terms of bar - no regression 
for a rewrite seems like a good low water mark to block on.
{quote}
Some of the issues only surfaced on multi hundred node clusters spanning 
multiple datacenters and in various configurations (e.g. TLS on + compression 
off, compression on + TLS off), I haven't seen very many large scale tests 
outside this ticket (most are 6 node clusters with a single datacenter) or that 
span uncommon configurations (e.g. which options enabled, disabled, token 
setup, etc...). If you know of any results that are public we can certainly 
link them to this ticket :) and aggregate all the verification work in one 
place!
{quote}> What do you think about this as acceptance criteria for the work here?
{quote}
I think the three identified remaining tests in the sub-tasks are good enough 
to call this done from our end, but if there are more test setups from the 
public let's get those recorded as well:

1. CASSANDRA-14764 - breaking point comparison with 3.0. This will give us good 
signal on regressions from the 3.0 series with a "typical" multi-DC setup
 2. CASSANDRA-14747 - 200 node cluster with all options disabled (e.g. if 
someone is using VPC and direct connects to peer their VPCs so they don't have 
to pay TLS compute costs): This will tell us if we broke the performance of the 
non TLS path
 3. CASSANDRA-15181 - Can we successfully stream to nodes, how long does that 
take, how long does it take with and without TLS on.

[~jmckenzie] For what it's worth, these kinds of scientific and rigorous tests 
are hard to run and expensive (in dollars and engineering time), which is 
probably why they don't usually get run before, e.g. 2.1, 3.0 and 3.11 all 
failed these kinds of tests resulting in numerous regression bug reports in the 
6-12 months after release. Our hope is that we can invest the time and money 
ahead of time instead of after the release for 4.0.


was (Author: jolynch):
Hi [~jmckenzie] thanks for pinging on this!
{quote}Was this the goal of the MS rewrite? I have no horse in this race - I 
just thought the goal of it was to tighten up some of the things that were 
present / still troublesome after Jason's rewrite of things rather than 
specifically targeting performance improvements.
{quote}
I think phrasing it as no regression is also fine. Our testing so far 
identified major regressions after both refactors (stability after the first 
e.g. not delivering mutations across datacenters and some decent performance 
regressions especially around TLS after the second that had to be fixed up).
{quote}And fwiw, the benchmarks I've seen on 4.0 show a pretty significant 
improvement in throughput if nothing else, but in terms of bar - no regression 
for a rewrite seems like a good low water mark to block on.
{quote}
Some of the issues only surfaced on multi hundred node clusters spanning 
multiple datacenters and in various configurations (e.g. TLS on + encryption 
off, encryption on + TLS off), I haven't seen very many large scale tests 
outside this ticket (most are 6 node clusters with a single datacenter) or that 
span uncommon configurations (e.g. which options enabled, disabled, token 
setup, etc...). If you know of any results that are public we can certainly 
link them to this ticket :) and aggregate all the verification work in one 
place!
{quote}> What do you think about this as acceptance criteria for the work here?
{quote}
I think the three identified remaining tests in the sub-tasks are good enough 
to call this done from our end, but if there are more test setups from the 
public let's get those recorded as well:

1. CASSANDRA-14764 - breaking point comparison with 3.0. This will give us good 
signal on regressions from the 3.0 series with a "typical" multi-DC setup
 2. CASSANDRA-14747 - 200 node cluster with all options disabled (e.g. if 
someone is using VPC and direct connects to peer their VPCs so they don't have 
to pay TLS 

[jira] [Comment Edited] (CASSANDRA-14746) Ensure Netty Internode Messaging Refactor is Solid

2020-10-12 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212457#comment-17212457
 ] 

Joey Lynch edited comment on CASSANDRA-14746 at 10/12/20, 3:45 PM:
---

Hi [~jmckenzie] thanks for pinging on this!
{quote}Was this the goal of the MS rewrite? I have no horse in this race - I 
just thought the goal of it was to tighten up some of the things that were 
present / still troublesome after Jason's rewrite of things rather than 
specifically targeting performance improvements.
{quote}
I think phrasing it as no regression is also fine. Our testing so far 
identified major regressions after both refactors (stability after the first 
e.g. not delivering mutations across datacenters and some decent performance 
regressions especially around TLS after the second that had to be fixed up).
{quote}And fwiw, the benchmarks I've seen on 4.0 show a pretty significant 
improvement in throughput if nothing else, but in terms of bar - no regression 
for a rewrite seems like a good low water mark to block on.
{quote}
Some of the issues only surfaced on multi hundred node clusters spanning 
multiple datacenters and in various configurations (e.g. TLS on + encryption 
off, encryption on + TLS off), I haven't seen very many large scale tests 
outside this ticket (most are 6 node clusters with a single datacenter) or that 
span uncommon configurations (e.g. which options enabled, disabled, token 
setup, etc...). If you know of any results that are public we can certainly 
link them to this ticket :) and aggregate all the verification work in one 
place!
{quote}> What do you think about this as acceptance criteria for the work here?
{quote}
I think the three identified remaining tests in the sub-tasks are good enough 
to call this done from our end, but if there are more test setups from the 
public let's get those recorded as well:

1. CASSANDRA-14764 - breaking point comparison with 3.0. This will give us good 
signal on regressions from the 3.0 series with a "typical" multi-DC setup
 2. CASSANDRA-14747 - 200 node cluster with all options disabled (e.g. if 
someone is using VPC and direct connects to peer their VPCs so they don't have 
to pay TLS compute costs): This will tell us if we broke the performance of the 
non TLS path
 3. CASSANDRA-15181 - Can we successfully stream to nodes, how long does that 
take, how long does it take with and without TLS on.

[~jmckenzie] For what it's worth, these kinds of scientific and rigorous tests 
are hard to run and expensive (in dollars and engineering time), which is 
probably why they don't usually get run before, e.g. 2.1, 3.0 and 3.11 all 
failed these kinds of tests resulting in numerous regression bug reports in the 
6-12 months after release. Our hope is that we can invest the time and money 
ahead of time instead of after the release for 4.0.


was (Author: jolynch):
Hi [~jmckenzie] thanks for pinging on this!
{quote}Was this the goal of the MS rewrite? I have no horse in this race - I 
just thought the goal of it was to tighten up some of the things that were 
present / still troublesome after Jason's rewrite of things rather than 
specifically targeting performance improvements.
{quote}
I think phrasing it as no regression is also fine. Our testing so far 
identified major regressions after both refactors (stability after the first 
e.g. not delivering mutations across datacenters and some decent performance 
regressions after the second that had to be fixed up).
{quote}And fwiw, the benchmarks I've seen on 4.0 show a pretty significant 
improvement in throughput if nothing else, but in terms of bar - no regression 
for a rewrite seems like a good low water mark to block on.
{quote}
Some of the issues only surfaced on multi hundred node clusters spanning 
multiple datacenters and in various configurations (e.g. TLS on + encryption 
off, encryption on + TLS off), I haven't seen very many large scale tests 
outside this ticket (most are 6 node clusters with a single datacenter) or that 
span uncommon configurations (e.g. which options enabled, disabled, token 
setup, etc...). If you know of any results that are public we can certainly 
link them to this ticket :) and aggregate all the verification work in one 
place!
{quote}> What do you think about this as acceptance criteria for the work here?
{quote}
I think the three identified remaining tests in the sub-tasks are good enough 
to call this done from our end, but if there are more test setups from the 
public let's get those recorded as well:

1. CASSANDRA-14764 - breaking point comparison with 3.0. This will give us good 
signal on regressions from the 3.0 series with a "typical" multi-DC setup
 2. CASSANDRA-14747 - 200 node cluster with all options disabled (e.g. if 
someone is using VPC and direct connects to peer their VPCs so they don't have 
to pay TLS compute costs): This will tell 

[jira] [Commented] (CASSANDRA-14746) Ensure Netty Internode Messaging Refactor is Solid

2020-10-12 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212457#comment-17212457
 ] 

Joey Lynch commented on CASSANDRA-14746:


Hi [~jmckenzie] thanks for pinging on this!
{quote}Was this the goal of the MS rewrite? I have no horse in this race - I 
just thought the goal of it was to tighten up some of the things that were 
present / still troublesome after Jason's rewrite of things rather than 
specifically targeting performance improvements.
{quote}
I think phrasing it as no regression is also fine. Our testing so far 
identified major regressions after both refactors (stability after the first 
e.g. not delivering mutations across datacenters and some decent performance 
regressions after the second that had to be fixed up).
{quote}And fwiw, the benchmarks I've seen on 4.0 show a pretty significant 
improvement in throughput if nothing else, but in terms of bar - no regression 
for a rewrite seems like a good low water mark to block on.
{quote}
Some of the issues only surfaced on multi hundred node clusters spanning 
multiple datacenters and in various configurations (e.g. TLS on + encryption 
off, encryption on + TLS off), I haven't seen very many large scale tests 
outside this ticket (most are 6 node clusters with a single datacenter) or that 
span uncommon configurations (e.g. which options enabled, disabled, token 
setup, etc...). If you know of any results that are public we can certainly 
link them to this ticket :) and aggregate all the verification work in one 
place!
{quote}> What do you think about this as acceptance criteria for the work here?
{quote}
I think the three identified remaining tests in the sub-tasks are good enough 
to call this done from our end, but if there are more test setups from the 
public let's get those recorded as well:

1. CASSANDRA-14764 - breaking point comparison with 3.0. This will give us good 
signal on regressions from the 3.0 series with a "typical" multi-DC setup
 2. CASSANDRA-14747 - 200 node cluster with all options disabled (e.g. if 
someone is using VPC and direct connects to peer their VPCs so they don't have 
to pay TLS compute costs): This will tell us if we broke the performance of the 
non TLS path
 3. CASSANDRA-15181 - Can we successfully stream to nodes, how long does that 
take, how long does it take with and without TLS on.

[~jmckenzie] For what it's worth, these kinds of scientific and rigorous tests 
are hard to run and expensive (in dollars and engineering time), which is 
probably why they don't usually get run before, e.g. 2.1, 3.0 and 3.11 all 
failed these kinds of tests resulting in numerous regression bug reports in the 
6-12 months after release. Our hope is that we can invest the time and money 
ahead of time instead of after the release for 4.0.

> Ensure Netty Internode Messaging Refactor is Solid
> --
>
> Key: CASSANDRA-14746
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14746
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Streaming and Messaging
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Normal
>  Labels: 4.0-QA
> Fix For: 4.0-beta
>
>
> Before we release 4.0 let's ensure that the internode messaging refactor is 
> 100% solid. As internode messaging is naturally used in many code paths and 
> widely configurable we have a large number of cluster configurations and test 
> configurations that must be vetted.
> We plan to vary the following:
>  * Version of Cassandra 3.0.17 vs 4.0-alpha
>  * Cluster sizes with *multi-dc* deployments ranging from 6 - 100 nodes
>  * Client request rates varying between 1k QPS and 100k QPS of varying sizes 
> and shapes (BATCH, INSERT, SELECT point, SELECT range, etc ...)
>  * Internode compression
>  * Internode SSL (as well as openssl vs jdk)
>  * Internode Coalescing options
> We are looking to measure the following as appropriate:
>  * Latency distributions of reads and writes (lower is better)
>  * Scaling limit, aka maximum throughput before violating p99 latency 
> deadline of 10ms @ LOCAL_QUORUM, on a fixed hardware deployment for 100% 
> writes, 100% reads and 50-50 writes+reads (higher is better)
>  * Thread counts (lower is better)
>  * Context switches (lower is better)
>  * On-CPU time of tasks (higher periods without context switch is better)
>  * GC allocation rates / throughput for a fixed size heap (lower allocation 
> better)
>  * Streaming recovery time for a single node failure, i.e. can Cassandra 
> saturate the NIC
>  
> The goal is that 4.0 should have better latency, more throughput, fewer 
> threads, fewer context switches, less GC allocation, and faster recovery 
> time. I'm putting Jason Brown as the reviewer since he implemented most of 
> the 
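
The "scaling limit" measurement listed in the description above (maximum throughput before violating a p99 latency deadline of 10ms) can be sketched as follows. This is a minimal illustration under stated assumptions: the nearest-rank percentile, the function names, and the sample measurements are all hypothetical and are not taken from the ticket's test harness.

```python
def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n), 1-based
    return ordered[rank - 1]


def scaling_limit(runs, deadline_ms=10.0):
    """Highest request rate whose p99 stays within the deadline, else None."""
    passing = [qps for qps, latencies in runs.items()
               if p99(latencies) <= deadline_ms]
    return max(passing) if passing else None


# Hypothetical measurements: request rate (QPS) -> observed latencies in ms.
runs = {
    1_000: [1.0] * 99 + [4.0],
    50_000: [2.0] * 99 + [9.5],
    100_000: [3.0] * 98 + [12.0, 30.0],
}
print(scaling_limit(runs))
```

Running the same calculation on 3.0.17 and 4.0-alpha results from identical hardware would give the "higher is better" comparison the description asks for.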

[jira] [Updated] (CASSANDRA-15580) 4.0 quality testing: Repair

2020-10-12 Thread Alexander Dejanovski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Dejanovski updated CASSANDRA-15580:
-
Description: 
Reference [doc from 
NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
 for context.

*Shepherd: Alexander Dejanovski*

We aim for 4.0 to have the first fully functioning incremental repair solution 
(CASSANDRA-9143)! Furthermore we aim to verify that all types of repair: (full 
range, sub range, incremental) function as expected as well as ensuring 
community tools such as Reaper work. CASSANDRA-3200 adds an experimental option 
to reduce the amount of data streamed during repair, we should write more tests 
and see how it works with big nodes.

  was:
Reference [doc from 
NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
 for context.

*Shepherd: None*

We aim for 4.0 to have the first fully functioning incremental repair solution 
(CASSANDRA-9143)! Furthermore we aim to verify that all types of repair: (full 
range, sub range, incremental) function as expected as well as ensuring 
community tools such as Reaper work. CASSANDRA-3200 adds an experimental option 
to reduce the amount of data streamed during repair, we should write more tests 
and see how it works with big nodes.


> 4.0 quality testing: Repair
> ---
>
> Key: CASSANDRA-15580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15580
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest/python
>Reporter: Josh McKenzie
>Assignee: Alexander Dejanovski
>Priority: Normal
> Fix For: 4.0-rc
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Alexander Dejanovski*
> We aim for 4.0 to have the first fully functioning incremental repair 
> solution (CASSANDRA-9143)! Furthermore we aim to verify that all types of 
> repair: (full range, sub range, incremental) function as expected as well as 
> ensuring community tools such as Reaper work. CASSANDRA-3200 adds an 
> experimental option to reduce the amount of data streamed during repair, we 
> should write more tests and see how it works with big nodes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15992) Fix flaky python dtest test_13595 - consistency_test.TestConsistency

2020-10-12 Thread Aleksey Yeschenko (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212439#comment-17212439
 ] 

Aleksey Yeschenko commented on CASSANDRA-15992:
---

Hmh. Frankly, we can just ditch that assertion. The important part of that 
regression test is the query returning a correct resultset, which it does now, 
with C* fix in, and wouldn't without. In retrospect that last assert feels to 
me like an overspecification.

> Fix flaky python dtest test_13595 - consistency_test.TestConsistency
> 
>
> Key: CASSANDRA-15992
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15992
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: David Capwell
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 4.0-beta
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/355/workflows/7b8df61d-706f-4094-a206-7cdc6b4e0451/jobs/1818
> {code}
> >   assert 9 == jmx.read_attribute(srp, 'Count')
> E   AssertionError: assert 9 == 5
> {code}






[jira] [Commented] (CASSANDRA-16063) Fix user experience when upgrading to 4.0 with compact tables

2020-10-12 Thread Andres de la Peña (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212431#comment-17212431
 ] 

Andres de la Peña commented on CASSANDRA-16063:
---

[~e.dimitrova] there are a lot of failures in the CI run of upgrade tests but I'd 
say they are not related, do you agree? Are we ready to commit?

> Fix user experience when upgrading to 4.0 with compact tables
> -
>
> Key: CASSANDRA-16063
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16063
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
>Reporter: Sylvain Lebresne
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: Compact_storage_upgrade_tests.txt
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The code to handle compact tables has been removed from 4.0, and the intended 
> upgrade path to 4.0 for users having compact tables on 3.x is that they must 
> execute {{ALTER ... DROP COMPACT STORAGE}} on all of their compact tables 
> *before* attempting the upgrade.
> Obviously, some users won't read the upgrade instructions (or miss a table) 
> and may try upgrading despite still having compact tables. If they do so, the 
> intent is that the node will _not_ start, with a message clearly indicating 
> the pre-upgrade step the user has missed. The user will then downgrade the 
> node(s) back to 3.x, run the proper {{ALTER ... DROP COMPACT STORAGE}}, and 
> then upgrade again.
> But while 4.0 does currently fail startup when finding any compact tables 
> with a decent message, I believe the check is done too late during startup.
> Namely, that check is done as we read the tables schema, so within 
> [{{Schema.instance.loadFromDisk()}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/CassandraDaemon.java#L241].
>   But by then, we've _at least_ called 
> {{SystemKeyspace.persistLocalMetadata()}} and 
> {{SystemKeyspaceMigrator40.migrate()}}, which will get into the commit log, 
> and even possibly flush new {{na}} format sstables. As a result, a user 
> might not be able to seamlessly restart the node on 3.x (to drop compact 
> storage on the appropriate tables).
> Basically, we should make sure the check for compact tables done at 4.0 
> startup is done as a {{StartupCheck}}, before the node does anything.
> We should also add a test for this (checking that if you try upgrading to 4.0 
> with compact storage, you can downgrade back with no intervention whatsoever).
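The ordering problem described above can be illustrated with a minimal sketch. It assumes nothing about the real Cassandra classes: PreflightCheck and StartupOrderSketch are hypothetical names, and the "writes" list merely stands in for commit log and sstable activity. The point is only that every check runs before the first step that touches disk, so a failed check leaves the node untouched.

```java
import java.util.List;

public class StartupOrderSketch {
    /** Hypothetical stand-in for a startup check; throws if the node must not start. */
    interface PreflightCheck {
        void verify();
    }

    /**
     * Runs all checks first, then the steps that would write to disk.
     * The writes list is a placeholder for commit log / sstable side effects.
     */
    static void start(List<PreflightCheck> checks, List<String> writes) {
        for (PreflightCheck check : checks) {
            check.verify(); // may throw; nothing has been written at this point
        }
        // Only reached when every check passed.
        writes.add("persist local metadata");
        writes.add("migrate system keyspace");
        writes.add("load schema from disk");
    }
}
```

If a check throws, the writes list stays empty, which corresponds to being able to restart on 3.x with no intervention.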






[jira] [Commented] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-12 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212408#comment-17212408
 ] 

Thomas Steinmaurer commented on CASSANDRA-15430:


Sent [~mck] a fresh set of JFR files today from our recent 2.1.18 / 3.0.20 / 
3.11.8 / 4.0 Beta2 testing.

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>
> In a 6 node loadtest cluster, we have been running a certain production-like 
> workload on 2.1.18 constantly and sufficiently. After upgrading one 
> node to 3.0.18 (the remaining 5 are still on 2.1.18 after we saw the sort of 
> regression described below), 3.0.18 is showing increased CPU usage, increased 
> GC, high mutation stage pending tasks, dropped mutation messages ...
> Some spec. All 6 nodes equally sized:
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus same settings in regard to number 
> of threads, compaction throttling etc.
> Following dashboard shows highlighted areas (CPU, suspension) with metrics 
> for all 6 nodes and the one outlier being the node upgraded to Cassandra 
> 3.0.18.
>  !dashboard.png|width=1280!
> Additionally we see a large increase on pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool Name               Active   Pending    Completed   Blocked  All Time Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - MutationStage              256     81824   3360532756         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ViewMutationStage            0         0            0         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ReadStage                    0         0     62862266         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - RequestResponseStage         0         0   2176659856         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - ReadRepairStage              0         0            0         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - CounterMutationStage         0         0            0         0                 0
> ...
> {noformat}
> Judging from a 15min JFR session for both, 3.0.18 and 2.1.18 on a different 
> node, high-level, it looks like the code path underneath 
> {{BatchMessage.execute}} is producing ~ 10x more on-heap allocations in 
> 3.0.18 compared to 2.1.18.
>  !jfr_allocations.png!
> Left => 3.0.18
>  Right => 2.1.18
> JFRs zipped are exceeding the 60MB limit to directly attach to the ticket. I 
> can upload them, if there is another destination available.






[cassandra-harry] branch master updated: Update jackson dependency to 2.11.3 to force yaml to 1.26

2020-10-12 Thread ifesdjeen
This is an automated email from the ASF dual-hosted git repository.

ifesdjeen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-harry.git


The following commit(s) were added to refs/heads/master by this push:
 new 7360458  Update jackson dependency to 2.11.3 to force yaml to 1.26
7360458 is described below

commit 73604582a5488e52185fa9f0ae48d1f7ed605cd0
Author: Alex Petrov 
AuthorDate: Fri Oct 9 15:01:27 2020 +0200

Update jackson dependency to 2.11.3 to force yaml to 1.26
---
 pom.xml | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/pom.xml b/pom.xml
index c8ef97a..0bcf9c5 100755
--- a/pom.xml
+++ b/pom.xml
@@ -39,6 +39,7 @@
 1.8
 0.0.1-SNAPSHOT
 4.0.0-SNAPSHOT
+2.11.3
 0.0.4
 1.11.3
 
@@ -152,19 +153,19 @@
 
 com.fasterxml.jackson.dataformat
 jackson-dataformat-yaml
-2.9.8
+${jackson.version}
 
 
 
 com.fasterxml.jackson.core
 jackson-databind
-2.9.8
+${jackson.version}
 
 
 
 com.fasterxml.jackson.core
 jackson-annotations
-2.9.8
+${jackson.version}
 
 
 





[jira] [Updated] (CASSANDRA-16207) NPE when calling broadcast address on unintialized node

2020-10-12 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-16207:

Test and Documentation Plan: Test included
 Status: Patch Available  (was: Open)

|[patch|https://github.com/apache/cassandra/pull/773]|[ci|https://app.circleci.com/pipelines/github/ifesdjeen/cassandra?branch=16207-npe-broadcast-address]|

> NPE when calling broadcast address on unintialized node
> ---
>
> Key: CASSANDRA-16207
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16207
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
>
> When trying to run upgrades, sometimes we’re calling broadcast address on 
> an uninitialised new node:
> {code}
> java.lang.IllegalStateException: Can't use shut down instances, delegate is 
> null
>   at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:163)
>   at 
> org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.broadcastAddress(DelegatingInvokableInstance.java:53)
>  
>   at 
> org.apache.cassandra.distributed.impl.Instance$2.allowIncomingMessage(Instance.java:278)
>  
>   at 
> org.apache.cassandra.net.MessagingService.receive(MessagingService.java:1031) 
> ~[dtest-3.0.19.jar:?]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:213)
>  
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:182)
>  
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
>  
> {code}






[jira] [Updated] (CASSANDRA-16207) NPE when calling broadcast address on unintialized node

2020-10-12 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-16207:

 Bug Category: Parent values: Availability(12983)Level 1 values: 
Unavailable(12994)
   Complexity: Low Hanging Fruit
  Component/s: Test/dtest/java
Discovered By: Unit Test
 Severity: Critical
 Assignee: Alex Petrov
   Status: Open  (was: Triage Needed)

> NPE when calling broadcast address on unintialized node
> ---
>
> Key: CASSANDRA-16207
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16207
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
>
> When trying to run upgrades, sometimes we’re calling broadcast address on 
> an uninitialised new node:
> {code}
> java.lang.IllegalStateException: Can't use shut down instances, delegate is 
> null
>   at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:163)
>   at 
> org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.broadcastAddress(DelegatingInvokableInstance.java:53)
>  
>   at 
> org.apache.cassandra.distributed.impl.Instance$2.allowIncomingMessage(Instance.java:278)
>  
>   at 
> org.apache.cassandra.net.MessagingService.receive(MessagingService.java:1031) 
> ~[dtest-3.0.19.jar:?]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:213)
>  
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:182)
>  
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
>  
> {code}






[jira] [Updated] (CASSANDRA-16207) NPE when calling broadcast address on unintialized node

2020-10-12 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-16207:

Description: 
When trying to run upgrades, sometimes we’re calling broadcast address on an 
uninitialised new node:

{code}
java.lang.IllegalStateException: Can't use shut down instances, delegate is null
at 
org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:163)
at 
org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.broadcastAddress(DelegatingInvokableInstance.java:53)
 
at 
org.apache.cassandra.distributed.impl.Instance$2.allowIncomingMessage(Instance.java:278)
 
at 
org.apache.cassandra.net.MessagingService.receive(MessagingService.java:1031) 
~[dtest-3.0.19.jar:?]
at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:213)
 
at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:182)
 
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
 
{code}

  was:
When trying to run upgrades, sometimes we’re calling broadcast address on an 
uninitialised new node:

{code}
java.lang.IllegalStateException: Can't use shut down instances, delegate is null
at 
org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:163)
at 
org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.broadcastAddress(DelegatingInvokableInstance.java:53)
 
at 
org.apache.cassandra.distributed.impl.Instance$2.allowIncomingMessage(Instance.java:278)
 
at 
org.apache.cassandra.net.MessagingService.receive(MessagingService.java:1031) 
~[dtest-3.0.19.jar:?]
at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:213)
 
at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:182)
 
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
 
INFO  [node1_Messaging-EventLoop-3-4] node1 18:02:56,785 
/127.0.0.1:7012(/127.0.0.2:54939)->/127.0.0.2:7012-URGENT_MESSAGES-3858ba33 
successfully connected, version = 11, framing = CRC, encryption = disabled
ERROR 18:02:56,786 uncaught exception in thread 
Thread[MessagingService-Incoming-/127.0.0.2,5,node2]
{code}


> NPE when calling broadcast address on unintialized node
> ---
>
> Key: CASSANDRA-16207
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16207
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Priority: Normal
>
> When trying to run upgrades, sometimes we’re calling broadcast address on 
> an uninitialised new node:
> {code}
> java.lang.IllegalStateException: Can't use shut down instances, delegate is 
> null
>   at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:163)
>   at 
> org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.broadcastAddress(DelegatingInvokableInstance.java:53)
>  
>   at 
> org.apache.cassandra.distributed.impl.Instance$2.allowIncomingMessage(Instance.java:278)
>  
>   at 
> org.apache.cassandra.net.MessagingService.receive(MessagingService.java:1031) 
> ~[dtest-3.0.19.jar:?]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:213)
>  
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:182)
>  
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
>  
> {code}






[jira] [Created] (CASSANDRA-16207) NPE when calling broadcast address on unintialized node

2020-10-12 Thread Alex Petrov (Jira)
Alex Petrov created CASSANDRA-16207:
---

 Summary: NPE when calling broadcast address on unintialized node
 Key: CASSANDRA-16207
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16207
 Project: Cassandra
  Issue Type: Bug
Reporter: Alex Petrov


When trying to run upgrades, sometimes we’re calling broadcast address on an 
uninitialised new node:

{code}
java.lang.IllegalStateException: Can't use shut down instances, delegate is null
at 
org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:163)
at 
org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.broadcastAddress(DelegatingInvokableInstance.java:53)
 
at 
org.apache.cassandra.distributed.impl.Instance$2.allowIncomingMessage(Instance.java:278)
 
at 
org.apache.cassandra.net.MessagingService.receive(MessagingService.java:1031) 
~[dtest-3.0.19.jar:?]
at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:213)
 
at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:182)
 
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
 
INFO  [node1_Messaging-EventLoop-3-4] node1 18:02:56,785 
/127.0.0.1:7012(/127.0.0.2:54939)->/127.0.0.2:7012-URGENT_MESSAGES-3858ba33 
successfully connected, version = 11, framing = CRC, encryption = disabled
ERROR 18:02:56,786 uncaught exception in thread 
Thread[MessagingService-Incoming-/127.0.0.2,5,node2]
{code}






[jira] [Updated] (CASSANDRA-16072) Reduce thread contention in CommitLogSegment and HintsBuffer by rewriting CAS loops to atomic adds

2020-10-12 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-16072:
---
Complexity:   (was: Low Hanging Fruit)

> Reduce thread contention in CommitLogSegment and HintsBuffer by rewriting CAS 
> loops to atomic adds
> --
>
> Key: CASSANDRA-16072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16072
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Hints, Local/Commit Log
>Reporter: Michael Semb Wever
>Assignee: Michael Semb Wever
>Priority: Normal
> Fix For: 3.11.x, 4.0.x
>
>
> Follow up to CASSANDRA-15922
> Both CommitLogSegment and HintsBuffer use AtomicIntegers for the current 
> offset when allocating. As in CASSANDRA-15922, the loops on 
> {{.compareAndSet(..)}} can be replaced with atomic adds using the 
> {{.getAndAdd(..)}} method.
> In highly contended environments the rate of CAS failures can be high, 
> starving writes in a running Cassandra node. On the same cluster where 
> CASSANDRA-15922 was found, after CASSANDRA-15922's fix was deployed, there 
> were still problems around commit log flushing and hints. No flamegraph was 
> collected that demonstrated the thread contention as clearly as was found in 
> CASSANDRA-15922, but the performance fix proposed here is hopefully obvious 
> enough.
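As a rough illustration of the change described above, here is a minimal sketch contrasting the two allocation styles. OffsetAllocator and its method names are hypothetical and are not the CommitLogSegment or HintsBuffer code; only AtomicInteger.compareAndSet and AtomicInteger.getAndAdd are real JDK calls.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class OffsetAllocator {
    private final AtomicInteger offset = new AtomicInteger();
    private final int capacity;

    public OffsetAllocator(int capacity) {
        this.capacity = capacity;
    }

    /** CAS loop: every lost race costs another read-compute-retry round. */
    public int allocateWithCas(int size) {
        while (true) {
            int prev = offset.get();
            long next = (long) prev + size;
            if (next > capacity) {
                return -1; // no room; offset is left unchanged
            }
            if (offset.compareAndSet(prev, (int) next)) {
                return prev;
            }
        }
    }

    /**
     * Atomic add: one operation and no retry loop. The trade-off is that a
     * failed allocation still advances the offset past capacity, so callers
     * must treat the buffer as full from then on. (Overflow of the counter
     * after very many failed calls is ignored in this sketch.)
     */
    public int allocateWithAdd(int size) {
        int prev = offset.getAndAdd(size);
        if ((long) prev + size > capacity) {
            return -1;
        }
        return prev;
    }
}
```

Both methods return the start position of the reserved region, or -1 when the request does not fit.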






[jira] [Updated] (CASSANDRA-16204) PicklingError: Can't pickle : attribute lookup video_encoding on cqlshlib.copyutil failed

2020-10-12 Thread Phil Miesle (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Miesle updated CASSANDRA-16204:

Description: 
This seems to be a different issue than CASSANDRA-14982:

$ cqlsh --version
 cqlsh 6.8.0

Following Datastax Academy course DS220, in the Denormalization exercise. I 
received 
{code:java}
PicklingError: Can't pickle : 
attribute lookup video_encoding on cqlshlib.copyutil failed{code}
on the COPY command into the following table:
{code:java}
CREATE TYPE IF NOT EXISTS video_encoding (
encoding TEXT,
height INT,
width INT,
bit_rates SET
);

CREATE TABLE videos_by_actor (
actor text,
added_date timestamp,
video_id timeuuid,
character_name text,
description text,
encoding frozen,
tags set,
title text,
user_id uuid,
PRIMARY KEY ( (actor), added_date, video_id )
) WITH CLUSTERING ORDER BY ( added_date desc, video_id asc);

COPY 
videos_by_actor(actor,added_date,video_id,character_name,description,encoding,tags,title,user_id)
 FROM 'videos_by_actor.csv' WITH HEADER = true{code}
Now, as it turns out my PRIMARY KEY was non-unique (noted when I failed to load 
as many records as were in the file), and when I changed to:
{code:java}
 PRIMARY KEY ((actor), added_date, video_id, character_name){code}
the command worked. BUT the following options also worked (though they both 
dropped records):
{code:java}
 WITH HEADER = true AND MINBATCHSIZE=1 AND MAXBATCHSIZE=1 AND PAGESIZE=10;{code}
and
{code:java}
 WITH HEADER = true AND NUMPROCESSES=1;{code}
So this seems to be a problem of multi-threading and user-defined TYPEs?

I'll note that I'm running inside a WSL2 Docker Container based on 
ubuntu:bionic:
 $ uname -a
 Linux node-1 4.19.104-microsoft-standard #1 SMP Wed Feb 19 06:37:35 UTC 2020 
x86_64 x86_64 x86_64 GNU/Linux

  was:
This seems to be a different issue than CASSANDRA-14982:

$ cqlsh --version
 cqlsh 6.8.0

Following Datastax Academy course DS220, in the Denormalization exercise. I 
received 
{code:java}
PicklingError: Can't pickle : 
attribute lookup video_encoding on cqlshlib.copyutil failed{code}
on the COPY command into the following table:
{code:java}
CREATE TABLE videos_by_actor (
actor text,
added_date timestamp,
video_id timeuuid,
character_name text,
description text,
encoding frozen,
tags set,
title text,
user_id uuid,
PRIMARY KEY ( (actor), added_date, video_id )
) WITH CLUSTERING ORDER BY ( added_date desc, video_id asc);

COPY 
videos_by_actor(actor,added_date,video_id,character_name,description,encoding,tags,title,user_id)
 FROM 'videos_by_actor.csv' WITH HEADER = true{code}
Now, as it turns out my PRIMARY KEY was non-unique (noted when I failed to load 
as many records as were in the file), and when I changed to:
{code:java}
 PRIMARY KEY ((actor), added_date, video_id, character_name){code}
the command worked. BUT the following options also worked (though they both 
dropped records):
{code:java}
 WITH HEADER = true AND MINBATCHSIZE=1 AND MAXBATCHSIZE=1 AND PAGESIZE=10;{code}
and
{code:java}
 WITH HEADER = true AND NUMPROCESSES=1;{code}
So this seems to be a problem of multi-threading and user-defined TYPEs?

I'll note that I'm running inside a WSL2 Docker Container based on 
ubuntu:bionic:
$ uname -a
Linux node-1 4.19.104-microsoft-standard #1 SMP Wed Feb 19 06:37:35 UTC 2020 
x86_64 x86_64 x86_64 GNU/Linux


> PicklingError: Can't pickle : 
> attribute lookup video_encoding on cqlshlib.copyutil failed
> ---
>
> Key: CASSANDRA-16204
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16204
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL/Interpreter
>Reporter: Phil Miesle
>Priority: Normal
> Attachments: videos_by_actor.csv.gz
>
>
> This seems to be a different issue than CASSANDRA-14982:
> $ cqlsh --version
>  cqlsh 6.8.0
> Following Datastax Academy course DS220, in the Denormalization exercise. I 
> received 
> {code:java}
> PicklingError: Can't pickle : 
> attribute lookup video_encoding on cqlshlib.copyutil failed{code}
> on the COPY command into the following table:
> {code:java}
> CREATE TYPE IF NOT EXISTS video_encoding (
> encoding TEXT,
> height INT,
> width INT,
> bit_rates SET
> );
> CREATE TABLE videos_by_actor (
> actor text,
> added_date timestamp,
> video_id timeuuid,
> character_name text,
> description text,
> encoding frozen,
> tags set,
> title text,
> user_id uuid,
> PRIMARY KEY ( (actor), added_date, video_id )
> ) WITH CLUSTERING ORDER BY ( added_date desc, video_id asc);
> COPY 
> videos_by_actor(actor,added_date,video_id,character_name,description,encoding,tags,title,user_id)
>  FROM 'videos_by_actor.csv' WITH HEADER = true{code}
> Now, as it turns out my PRIMARY KEY was non-unique (noted when I failed to 
> load as 

[jira] [Comment Edited] (CASSANDRA-14361) Allow SimpleSeedProvider to resolve multiple IPs per DNS name

2020-10-12 Thread Andres de la Peña (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212343#comment-17212343
 ] 

Andres de la Peña edited comment on CASSANDRA-14361 at 10/12/20, 12:55 PM:
---

[~benbromhead] changes look mostly good, but I'm getting a 
{{NullPointerException}} during startup that doesn't even allow the server to 
start. That happens because of my suggestion of using {{getPortOrDefault}} 
in 
[{{InetAddressAndPort.getAllByNameOverrideDefaults}}|https://github.com/apache/cassandra/blob/f9de59c5856fc3fee1aaf1f9c09aa63cf39b10ee/src/java/org/apache/cassandra/locator/InetAddressAndPort.java#L234].
 This is because {{HostAndPort#getPortOrDefault}} expects an {{int}} argument 
and it's receiving a {{null}} instead, so we should either keep using it and 
pass {{InetAddressAndPort.defaultPort}} to it, or leave it as it was.

Additionally, I still think it would be nice to place the threshold property in 
{{cassandra.yaml}}. Note that there is [a new section for safety thresholds at 
{{cassandra.yaml}}|https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L1217-L1273],
 that would be the place to put it. That way we would give visibility and some 
documentation to the new property, so users know that it exists and how to use 
it.
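A minimal sketch of the failure mode described above, assuming the null arrives as a boxed Integer that Java auto-unboxes at the call site. The portOrDefault method here is a hypothetical stand-in with a primitive int parameter; it is not the Guava HostAndPort implementation.

```java
public class UnboxingNpeDemo {
    /** Hypothetical stand-in for a method that takes a primitive int default. */
    static int portOrDefault(Integer parsedPort, int defaultPort) {
        return parsedPort != null ? parsedPort : defaultPort;
    }

    /** Returns true if passing the given boxed default triggers an NPE. */
    static boolean throwsNpe(Integer boxedDefault) {
        try {
            // The Integer is unboxed to int before the call happens;
            // a null value fails right here, not inside portOrDefault.
            portOrDefault(null, boxedDefault);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

Passing a non-null default, as suggested in the comment above, avoids the unboxing failure.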


was (Author: adelapena):
[~benbromhead] changes look mostly good, but I'm getting a 
{{NullPointerException}} during startup that doesn't even allow the server to 
start. That happens because of my suggestion of using {{getPortOrDefault}} 
in 
[{{InetAddressAndPort.getAllByNameOverrideDefaults}}|https://github.com/apache/cassandra/blob/f9de59c5856fc3fee1aaf1f9c09aa63cf39b10ee/src/java/org/apache/cassandra/locator/InetAddressAndPort.java#L234].
 This is because {{HostAndPort#getPortOrDefault}} expects an {{int}} argument 
and it's receiving a {{null}} instead, so we should either keep using it and 
pass {{InetAddressAndPort.defaultPort}} to it, or leave it as it was.

Additionally, I still think it would be nice to place the threshold property in 
{{cassandra.yaml}}. Note that there is [a new section for safety thresholds at 
{{cassandra.yaml}}|https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L1217-L1273],
 that would be the place to put it.

> Allow SimpleSeedProvider to resolve multiple IPs per DNS name
> -
>
> Key: CASSANDRA-14361
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14361
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Ben Bromhead
>Assignee: Ben Bromhead
>Priority: Low
> Fix For: 4.0.x
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently SimpleSeedProvider can accept a comma-separated string of IPs or 
> hostnames as the set of Cassandra seeds. Hostnames are resolved via 
> InetAddress.getByName, which will only return the first IP associated with an 
> A, AAAA or CNAME record.
> By changing to InetAddress.getAllByName, existing behavior is preserved, but 
> now Cassandra can discover multiple IP addresses per record, allowing seed 
> discovery by DNS to be a little easier.
> Some examples of improved workflows with this change include: 
>  * specify the DNS name of a headless service in Kubernetes which will 
> resolve to all IP addresses of pods within that service. 
>  * seed discovery for multi-region clusters via AWS route53, AzureDNS etc
>  * Other common DNS service discovery mechanisms.
> The only behavior this is likely to impact would be where users are relying 
> on the fact that getByName only returns a single IP address.
> I can't imagine any scenario where that is a sane choice. Even when that 
> choice has been made, it only impacts the first startup of Cassandra and 
> would not be on any critical path.
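
The difference between the two lookups described above can be sketched with the JDK alone. This is not the patch itself: the class name {{SeedResolutionSketch}} and the helper {{resolveAll}} are hypothetical, and the error handling (rethrowing as an unchecked exception) is an assumption made to keep the example short.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class SeedResolutionSketch {
    // Resolves every entry of a comma-separated seed list to all of its addresses.
    static List<InetAddress> resolveAll(String seeds) {
        List<InetAddress> resolved = new ArrayList<>();
        for (String seed : seeds.split(",")) {
            String host = seed.trim();
            if (host.isEmpty()) {
                continue;
            }
            try {
                // getAllByName returns every address known for the name;
                // getByName would return only the first one.
                for (InetAddress address : InetAddress.getAllByName(host)) {
                    resolved.add(address);
                }
            } catch (UnknownHostException e) {
                throw new IllegalArgumentException("Cannot resolve seed " + host, e);
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // IP literals are parsed without a DNS lookup, so this runs offline.
        System.out.println(resolveAll("127.0.0.1, 127.0.0.2"));
    }
}
```

With a DNS name that maps to several records, such as a Kubernetes headless service, the same loop would add one entry per returned address.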



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14361) Allow SimpleSeedProvider to resolve multiple IPs per DNS name

2020-10-12 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212343#comment-17212343
 ] 

Andres de la Peña commented on CASSANDRA-14361:
---

[~benbromhead] changes look mostly good, but I'm getting a 
{{NullPointerException}} during startup that prevents the server from 
starting. That happens because of my suggestion of using {{getPortOrDefault}} 
in 
[{{InetAddressAndPort.getAllByNameOverrideDefaults}}|https://github.com/apache/cassandra/blob/f9de59c5856fc3fee1aaf1f9c09aa63cf39b10ee/src/java/org/apache/cassandra/locator/InetAddressAndPort.java#L234].
 This is because {{HostAndPort#getPortOrDefault}} expects an {{int}} argument 
and it's receiving a {{null}} instead, so we should either keep using it and 
pass {{InetAddressAndPort.defaultPort}} to it, or leave it as it was.

Additionally, I still think it would be nice to place the threshold property in 
{{cassandra.yaml}}. Note that there is [a new section for safety thresholds at 
{{cassandra.yaml}}|https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L1217-L1273],
 that would be the place to put it.




[jira] [Commented] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-12 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212241#comment-17212241
 ] 

Benedict Elliott Smith commented on CASSANDRA-15430:


[~tsteinmaurer] sorry for completely dropping the ball on this - I forgot about 
it entirely until Mick posted just now.

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>
> In a 6-node load test cluster, we have been running a production-like 
> workload on 2.1.18 constantly and without problems. After upgrading one node 
> to 3.0.18 (the remaining 5 are still on 2.1.18, since we saw the regression 
> described below), the 3.0.18 node is showing increased CPU usage, increased 
> GC activity, a high number of pending tasks in the mutation stage, and dropped 
> mutation messages.
> Some specs. All 6 nodes are equally sized:
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus same settings in regard to number 
> of threads, compaction throttling etc.
> The following dashboard shows highlighted areas (CPU, suspension) with metrics 
> for all 6 nodes; the one outlier is the node upgraded to Cassandra 
> 3.0.18.
>  !dashboard.png|width=1280!
> Additionally we see a large increase in pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - MutationStage                   256     81824     3360532756         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ViewMutationStage                 0         0              0         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ReadStage                         0         0       62862266         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - RequestResponseStage              0         0     2176659856         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - ReadRepairStage                   0         0              0         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - CounterMutationStage              0         0              0         0                 0
> ...
> {noformat}
> Judging from a 15-minute JFR session for each version (3.0.18, and 2.1.18 on 
> a different node), at a high level it looks like the code path underneath 
> {{BatchMessage.execute}} is producing ~10x more on-heap allocations in 
> 3.0.18 compared to 2.1.18.
>  !jfr_allocations.png!
> Left => 3.0.18
>  Right => 2.1.18
> The zipped JFRs exceed the 60MB limit for attaching them directly to the 
> ticket. I can upload them if another destination is available.






[jira] [Commented] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-12 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212233#comment-17212233
 ] 

Michael Semb Wever commented on CASSANDRA-15430:


[~tsteinmaurer], the OneDrive Link has expired. 

I would like to get access and take a look. Without the LegacyLayout work 
involved we can get this ticket out of triage.
In addition to making the full_on_cas_3.0.18 JFR accessible again, it would be 
very helpful if you could post new screenshots of the allocations from this 
latest JFR.




[jira] [Updated] (CASSANDRA-16155) ByteBufferAccessor cast exceptions are thrown when trying to query a virtual table

2020-10-12 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-16155:

  Fix Version/s: 4.0-beta3
  Since Version: 4.0-beta2
Source Control Link: 
https://github.com/apache/cassandra/commit/896baf64159463d9dd72a8829eec8311f8a888da
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> ByteBufferAccessor cast exceptions are thrown when trying to query a virtual 
> table
> --
>
> Key: CASSANDRA-16155
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16155
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Virtual Tables
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.0-beta3
>
>
> Start a fresh trunk node, and try to run
> SELECT * FROM system_views.local_read_latency ;
> You’ll get: 
> {code:java}
> ERROR [Native-Transport-Requests-1] 2020-09-30 09:44:45,099 ErrorMessage.java:457 - Unexpected exception during request
> java.lang.ClassCastException: org.apache.cassandra.db.marshal.ByteBufferAccessor cannot be cast to java.lang.String
>     at org.apache.cassandra.serializers.AbstractTextSerializer.serialize(AbstractTextSerializer.java:29)
>     at org.apache.cassandra.db.marshal.AbstractType.decompose(AbstractType.java:131)
> {code}
>  
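
One plausible reading of this trace, based on the fix commit for this ticket (which drops {{ByteBufferAccessor.instance}} from a {{decompose(...)}} call), is that an extra leading argument passed to a method accepting {{Object...}} was treated as the first key component. The sketch below reproduces that general pattern with hypothetical stand-in types. {{Accessor}} and this {{decompose}} are assumptions for illustration, not Cassandra's classes, and the actual root cause may differ in detail.

```java
public class VarargsPitfallSketch {
    // Hypothetical stand-in for an accessor singleton passed by mistake.
    static final class Accessor {
        static final Accessor instance = new Accessor();
    }

    // Hypothetical stand-in for a varargs decompose: every component is
    // expected to be a String and is cast accordingly.
    static String decompose(Object... components) {
        StringBuilder joined = new StringBuilder();
        for (Object component : components) {
            if (joined.length() > 0) {
                joined.append(':');
            }
            joined.append((String) component); // fails for non-String components
        }
        return joined.toString();
    }

    // Returns true if the call with the extra leading argument throws.
    static boolean extraArgumentFails(Object[] keyValues) {
        try {
            // Compiles fine: the accessor and the array become two varargs elements.
            decompose(Accessor.instance, keyValues);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object[] keyValues = { "ks", "table" };
        System.out.println(decompose(keyValues));          // the array is used as the components
        System.out.println(extraArgumentFails(keyValues)); // the accessor is treated as a component
    }
}
```

Because the compiler accepts both call shapes, a mistake of this kind only shows up at runtime, which is why a test that queries the affected table is useful.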






[jira] [Comment Edited] (CASSANDRA-16155) ByteBufferAccessor cast exceptions are thrown when trying to query a virtual table

2020-10-12 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212218#comment-17212218
 ] 

Alex Petrov edited comment on CASSANDRA-16155 at 10/12/20, 8:20 AM:


[~dcapwell] thank you for elaborating! There is a test that uses this table in 
[CASSANDRA-15935] and we can add more test coverage for this (and other) 
virtual tables if needed.

Committed to trunk as 
[896baf64159463d9dd72a8829eec8311f8a888da|https://github.com/apache/cassandra/commit/896baf64159463d9dd72a8829eec8311f8a888da].


was (Author: ifesdjeen):
[~dcapwell] thank you for elaborating! There is a test that uses this table in 
[CASSANDRA-15935] and we can add more test coverage for this (and other) 
virtual tables if needed.

Committed to trunk as 
[https://github.com/apache/cassandra/commit/896baf64159463d9dd72a8829eec8311f8a888da|https://github.com/apache/cassandra/commit/896baf64159463d9dd72a8829eec8311f8a888da].




[jira] [Updated] (CASSANDRA-16155) ByteBufferAccessor cast exceptions are thrown when trying to query a virtual table

2020-10-12 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-16155:

Status: Ready to Commit  (was: Review In Progress)




[cassandra] branch trunk updated: Ninja: add a missing CHANGES.txt entry for CASSANDRA-16155

2020-10-12 Thread ifesdjeen
This is an automated email from the ASF dual-hosted git repository.

ifesdjeen pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/trunk by this push:
 new 0e0056c  Ninja: add a missing CHANGES.txt entry for CASSANDRA-16155
0e0056c is described below

commit 0e0056c3db1e0e8726549b03bacb407c88c34390
Author: Alex Petrov 
AuthorDate: Mon Oct 12 10:17:57 2020 +0200

Ninja: add a missing CHANGES.txt entry for CASSANDRA-16155
---
 CHANGES.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGES.txt b/CHANGES.txt
index 412b336..bf80c8c 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0-beta3
+ * Fix ByteBufferAccessor cast exceptions are thrown when trying to query a virtual table (CASSANDRA-16155)
  * Consolidate node liveness check for forced repair (CASSANDRA-16113)
  * Use unsigned short in ValueAccessor.sliceWithShortLength (CASSANDRA-16147)
  * Abort repairs when getting a truncation request (CASSANDRA-15854)





[jira] [Commented] (CASSANDRA-16205) Offline token allocation strategy generator tool

2020-10-12 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212220#comment-17212220
 ] 

Michael Semb Wever commented on CASSANDRA-16205:



CI results 
[here|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/94/pipeline].

> Offline token allocation strategy generator tool
> 
>
> Key: CASSANDRA-16205
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16205
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config, Local/Scripts
>Reporter: Michael Semb Wever
>Assignee: Michael Semb Wever
>Priority: Normal
>
> A command line tool to generate tokens (using the 
> allocate_tokens_for_local_replication_factor algorithm) for pre-configuration 
> of {{initial_tokens}} in cassandra.yaml.






[jira] [Commented] (CASSANDRA-16155) ByteBufferAccessor cast exceptions are thrown when trying to query a virtual table

2020-10-12 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212218#comment-17212218
 ] 

Alex Petrov commented on CASSANDRA-16155:
-

[~dcapwell] thank you for elaborating! There is a test that uses this table in 
[CASSANDRA-15935] and we can add more test coverage for this (and other) 
virtual tables if needed.

Committed to trunk as 
[https://github.com/apache/cassandra/commit/896baf64159463d9dd72a8829eec8311f8a888da|https://github.com/apache/cassandra/commit/896baf64159463d9dd72a8829eec8311f8a888da].




[cassandra] branch trunk updated: Fix ByteBufferAccessor cast exceptions are thrown when trying to query a virtual table

2020-10-12 Thread ifesdjeen
This is an automated email from the ASF dual-hosted git repository.

ifesdjeen pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/trunk by this push:
 new 896baf6  Fix ByteBufferAccessor cast exceptions are thrown when trying to query a virtual table
896baf6 is described below

commit 896baf64159463d9dd72a8829eec8311f8a888da
Author: Alex Petrov 
AuthorDate: Thu Oct 1 17:01:01 2020 +0200

Fix ByteBufferAccessor cast exceptions are thrown when trying to query a virtual table

Patch by Alex Petrov and Caleb Rackliffe; reviewed by David Capwell and Chris Lohfink for CASSANDRA-16155

Co-authored-by: Caleb Rackliffe 
---
 .../apache/cassandra/db/virtual/SimpleDataSet.java | 23 ++--
 .../cql3/validation/entities/VirtualTableTest.java | 65 ++
 2 files changed, 70 insertions(+), 18 deletions(-)

diff --git a/src/java/org/apache/cassandra/db/virtual/SimpleDataSet.java b/src/java/org/apache/cassandra/db/virtual/SimpleDataSet.java
index 8b6f7ca..b8cb9f5 100644
--- a/src/java/org/apache/cassandra/db/virtual/SimpleDataSet.java
+++ b/src/java/org/apache/cassandra/db/virtual/SimpleDataSet.java
@@ -18,18 +18,29 @@
 package org.apache.cassandra.db.virtual;
 
 import java.nio.ByteBuffer;
-import java.util.*;
-import java.util.concurrent.TimeUnit;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.Map;
+import java.util.NavigableMap;
+import java.util.TreeMap;
 
 import com.google.common.collect.Iterables;
 
-import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.Clustering;
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.db.DeletionTime;
+import org.apache.cassandra.db.RegularAndStaticColumns;
 import org.apache.cassandra.db.filter.ClusteringIndexFilter;
 import org.apache.cassandra.db.filter.ColumnFilter;
 import org.apache.cassandra.db.marshal.AbstractType;
-import org.apache.cassandra.db.marshal.ByteBufferAccessor;
 import org.apache.cassandra.db.marshal.CompositeType;
-import org.apache.cassandra.db.rows.*;
+import org.apache.cassandra.db.rows.AbstractUnfilteredRowIterator;
+import org.apache.cassandra.db.rows.BTreeRow;
+import org.apache.cassandra.db.rows.BufferCell;
+import org.apache.cassandra.db.rows.EncodingStats;
+import org.apache.cassandra.db.rows.Rows;
+import org.apache.cassandra.db.rows.Unfiltered;
+import org.apache.cassandra.db.rows.UnfilteredRowIterator;
 import org.apache.cassandra.schema.ColumnMetadata;
 import org.apache.cassandra.schema.TableMetadata;
 import org.apache.cassandra.utils.ByteBufferUtil;
@@ -84,7 +95,7 @@ public class SimpleDataSet extends AbstractVirtualTable.AbstractDataSet
 {
 ByteBuffer partitionKey = partitionKeyValues.length == 1
 ? decompose(metadata.partitionKeyType, partitionKeyValues[0])
-: ((CompositeType) metadata.partitionKeyType).decompose(ByteBufferAccessor.instance, partitionKeyValues);
+: ((CompositeType) metadata.partitionKeyType).decompose(partitionKeyValues);
 return metadata.partitioner.decorateKey(partitionKey);
 }
 
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/VirtualTableTest.java b/test/unit/org/apache/cassandra/cql3/validation/entities/VirtualTableTest.java
index cd67cc9..9808c96 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/VirtualTableTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/VirtualTableTest.java
@@ -51,6 +51,7 @@ public class VirtualTableTest extends CQLTester
 private static final String KS_NAME = "test_virtual_ks";
 private static final String VT1_NAME = "vt1";
 private static final String VT2_NAME = "vt2";
+private static final String VT3_NAME = "vt3";
 
 private static class WritableVirtualTable extends AbstractVirtualTable
 {
@@ -80,10 +81,10 @@ public class VirtualTableTest extends CQLTester
 {
 String key = (String) metadata().partitionKeyType.compose(update.partitionKey().getKey());
 update.forEach(row ->
-{
-Integer value = Int32Type.instance.compose(row.getCell(valueColumn).buffer());
-backingMap.put(key, value);
-});
+   {
+   Integer value = Int32Type.instance.compose(row.getCell(valueColumn).buffer());
+   backingMap.put(key, value);
+   });
 }
 }
 
@@ -91,13 +92,13 @@ public class VirtualTableTest extends CQLTester
 public static void setUpClass()
 {
 TableMetadata vt1Metadata =
-TableMetadata.builder(KS_NAME, VT1_NAME)
- .kind(TableMetadata.Kind.VIRTUAL)
- .addPartitionKeyColumn("pk", UTF8Type.instance)
- 

[jira] [Updated] (CASSANDRA-16157) RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade

2020-10-12 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-16157:

Test and Documentation Plan: Test included
 Status: Patch Available  (was: Open)

> RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade
> ---
>
> Key: CASSANDRA-16157
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16157
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
>
> When upgrading from 3.0 to 4.0, we often run into a problem if an 
> older node serves as a coordinator:
> {code}
> java.lang.RuntimeException: Can not deserialize message org.apache.cassandra.distributed.impl.MessageImpl@4c46aead
>     at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:299) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.distributed.impl.Instance.lambda$receiveMessage$7(Instance.java:315) ~[dtest-4.0-beta3.jar:?]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [dtest-4.0-beta3.jar:?]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
> Caused by: java.io.EOFException
>     at org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:180) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.utils.vint.VIntCoding.readUnsignedVInt(VIntCoding.java:68) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedVInt(RebufferingInputStream.java:243) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserializeHeaderPost40(Message.java:694) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:765) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:625) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:295) ~[dtest-4.0-beta3.jar:?]
>     ... 7 more
> {code}






[jira] [Updated] (CASSANDRA-16157) RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade

2020-10-12 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-16157:

 Bug Category: Parent values: Availability(12983)Level 1 values: 
Unavailable(12994)
   Complexity: Normal
  Component/s: Test/dtest/java
Discovered By: Unit Test
 Severity: Critical
   Status: Open  (was: Triage Needed)




[jira] [Commented] (CASSANDRA-16157) RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade

2020-10-12 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212213#comment-17212213
 ] 

Alex Petrov commented on CASSANDRA-16157:
-

|[patch|https://github.com/apache/cassandra/pull/772]|[ci|https://app.circleci.com/pipelines/github/ifesdjeen/cassandra?branch=16157-npe-during-reserialization]|

> RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade
> ---
>
> Key: CASSANDRA-16157
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16157
> Project: Cassandra
> Issue Type: Bug
> Reporter: Alex Petrov
> Assignee: Alex Petrov
> Priority: Normal
>
> When upgrading from 3.0 to 4.0, we often run into a problem if an older
> node serves as the coordinator:
> {code}
> java.lang.RuntimeException: Can not deserialize message org.apache.cassandra.distributed.impl.MessageImpl@4c46aead
>     at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:299) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.distributed.impl.Instance.lambda$receiveMessage$7(Instance.java:315) ~[dtest-4.0-beta3.jar:?]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [dtest-4.0-beta3.jar:?]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
> Caused by: java.io.EOFException
>     at org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:180) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.utils.vint.VIntCoding.readUnsignedVInt(VIntCoding.java:68) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedVInt(RebufferingInputStream.java:243) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserializeHeaderPost40(Message.java:694) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:765) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:625) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:295) ~[dtest-4.0-beta3.jar:?]
>     ... 7 more
> {code}
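
The "Caused by" section of the trace above ends in a variable-length integer read that runs out of bytes. As a minimal sketch of that failure mode only, the Java below decodes a vint from a byte array and reports an EOFException when the input stops early. The class {{TruncatedVIntSketch}}, its methods, and the sample bytes are hypothetical, the encoding is a simplified stand-in rather than Cassandra's {{VIntCoding}}, and nothing here is meant to state the root cause of this ticket.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class TruncatedVIntSketch {
    // Simplified unsigned vint decoder (assumption, not Cassandra's VIntCoding):
    // the number of leading 1 bits in the first byte is the number of extra bytes.
    static long readUnsignedVInt(DataInputStream in) throws IOException {
        int first = in.readByte();               // EOFException if the stream is empty
        if (first >= 0) {
            return first;                        // high bit clear: single-byte value
        }
        int extra = Integer.numberOfLeadingZeros(~first) - 24;
        long value = first & (0xff >> extra);    // payload bits left in the first byte
        for (int i = 0; i < extra; i++) {
            // EOFException here means the value was cut off mid-way
            value = (value << 8) | (in.readByte() & 0xff);
        }
        return value;
    }

    // Decodes one vint and reports either the value or the kind of failure.
    static String decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return "value=" + readUnsignedVInt(in);
        } catch (EOFException e) {
            return "EOFException";
        } catch (IOException e) {
            return "IOException";
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {0x05}));               // complete, one byte
        System.out.println(decode(new byte[] {(byte) 0x81, 0x02}));  // complete, two bytes
        System.out.println(decode(new byte[] {(byte) 0x81}));        // second byte missing
        System.out.println(decode(new byte[] {}));                   // nothing to read
    }
}
```

The last two calls show the two ways the read can run dry: a value whose trailing bytes are missing, and a stream that is already exhausted when the read starts.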






[jira] [Updated] (CASSANDRA-16157) RTE during re-serialization for message filtering during 3.0 -> 4.0

2020-10-12 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-16157:

Summary: RTE during re-serialization for message filtering during 3.0 -> 
4.0  (was: )

> RTE during re-serialization for message filtering during 3.0 -> 4.0
> ---
>
> Key: CASSANDRA-16157
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16157
> Project: Cassandra
> Issue Type: Bug
> Reporter: Alex Petrov
> Priority: Normal
>
> When upgrading from 3.0 to 4.0, we often run into a problem if an older
> node serves as the coordinator:
> {code}
> java.lang.RuntimeException: Can not deserialize message org.apache.cassandra.distributed.impl.MessageImpl@4c46aead
>     at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:299) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.distributed.impl.Instance.lambda$receiveMessage$7(Instance.java:315) ~[dtest-4.0-beta3.jar:?]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [dtest-4.0-beta3.jar:?]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
> Caused by: java.io.EOFException
>     at org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:180) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.utils.vint.VIntCoding.readUnsignedVInt(VIntCoding.java:68) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedVInt(RebufferingInputStream.java:243) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserializeHeaderPost40(Message.java:694) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:765) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:625) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:295) ~[dtest-4.0-beta3.jar:?]
>     ... 7 more
> {code}






[jira] [Assigned] (CASSANDRA-16157) RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade

2020-10-12 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov reassigned CASSANDRA-16157:
---

Assignee: Alex Petrov

> RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade
> ---
>
> Key: CASSANDRA-16157
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16157
> Project: Cassandra
> Issue Type: Bug
> Reporter: Alex Petrov
> Assignee: Alex Petrov
> Priority: Normal
>
> When upgrading from 3.0 to 4.0, we often run into a problem if an older
> node serves as the coordinator:
> {code}
> java.lang.RuntimeException: Can not deserialize message org.apache.cassandra.distributed.impl.MessageImpl@4c46aead
>     at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:299) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.distributed.impl.Instance.lambda$receiveMessage$7(Instance.java:315) ~[dtest-4.0-beta3.jar:?]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [dtest-4.0-beta3.jar:?]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
> Caused by: java.io.EOFException
>     at org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:180) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.utils.vint.VIntCoding.readUnsignedVInt(VIntCoding.java:68) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedVInt(RebufferingInputStream.java:243) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserializeHeaderPost40(Message.java:694) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:765) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:625) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:295) ~[dtest-4.0-beta3.jar:?]
>     ... 7 more
> {code}






[jira] [Updated] (CASSANDRA-16157) RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade

2020-10-12 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-16157:

Summary: RTE during re-serialization for message filtering during 3.0 -> 
4.0 upgrade  (was: RTE during re-serialization for message filtering during 3.0 
-> 4.0)

> RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade
> ---
>
> Key: CASSANDRA-16157
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16157
> Project: Cassandra
> Issue Type: Bug
> Reporter: Alex Petrov
> Priority: Normal
>
> When upgrading from 3.0 to 4.0, we often run into a problem if an older
> node serves as the coordinator:
> {code}
> java.lang.RuntimeException: Can not deserialize message org.apache.cassandra.distributed.impl.MessageImpl@4c46aead
>     at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:299) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.distributed.impl.Instance.lambda$receiveMessage$7(Instance.java:315) ~[dtest-4.0-beta3.jar:?]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [dtest-4.0-beta3.jar:?]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
> Caused by: java.io.EOFException
>     at org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:180) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.utils.vint.VIntCoding.readUnsignedVInt(VIntCoding.java:68) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedVInt(RebufferingInputStream.java:243) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserializeHeaderPost40(Message.java:694) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:765) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:625) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:295) ~[dtest-4.0-beta3.jar:?]
>     ... 7 more
> {code}






[jira] [Updated] (CASSANDRA-16157)

2020-10-12 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-16157:

Description: 

{code}
java.lang.RuntimeException: Can not deserialize message org.apache.cassandra.distributed.impl.MessageImpl@4c46aead
    at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:299) ~[dtest-4.0-beta3.jar:?]
    at org.apache.cassandra.distributed.impl.Instance.lambda$receiveMessage$7(Instance.java:315) ~[dtest-4.0-beta3.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [dtest-4.0-beta3.jar:?]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
Caused by: java.io.EOFException
    at org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:180) ~[dtest-4.0-beta3.jar:?]
    at org.apache.cassandra.utils.vint.VIntCoding.readUnsignedVInt(VIntCoding.java:68) ~[dtest-4.0-beta3.jar:?]
    at org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedVInt(RebufferingInputStream.java:243) ~[dtest-4.0-beta3.jar:?]
    at org.apache.cassandra.net.Message$Serializer.deserializeHeaderPost40(Message.java:694) ~[dtest-4.0-beta3.jar:?]
    at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:765) ~[dtest-4.0-beta3.jar:?]
    at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:625) ~[dtest-4.0-beta3.jar:?]
    at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:295) ~[dtest-4.0-beta3.jar:?]
    ... 7 more
{code}

> 
> -
>
> Key: CASSANDRA-16157
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16157
> Project: Cassandra
> Issue Type: Bug
> Reporter: Alex Petrov
> Priority: Normal
>
> {code}
> java.lang.RuntimeException: Can not deserialize message org.apache.cassandra.distributed.impl.MessageImpl@4c46aead
>     at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:299) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.distributed.impl.Instance.lambda$receiveMessage$7(Instance.java:315) ~[dtest-4.0-beta3.jar:?]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [dtest-4.0-beta3.jar:?]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
> Caused by: java.io.EOFException
>     at org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:180) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.utils.vint.VIntCoding.readUnsignedVInt(VIntCoding.java:68) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedVInt(RebufferingInputStream.java:243) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserializeHeaderPost40(Message.java:694) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:765) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:625) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:295) ~[dtest-4.0-beta3.jar:?]
>     ... 7 more
> {code}






[jira] [Updated] (CASSANDRA-16157)

2020-10-12 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-16157:

Description: 
When upgrading from 3.0 to 4.0, we often run into a problem if an older
node serves as the coordinator:

{code}
java.lang.RuntimeException: Can not deserialize message org.apache.cassandra.distributed.impl.MessageImpl@4c46aead
    at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:299) ~[dtest-4.0-beta3.jar:?]
    at org.apache.cassandra.distributed.impl.Instance.lambda$receiveMessage$7(Instance.java:315) ~[dtest-4.0-beta3.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [dtest-4.0-beta3.jar:?]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
Caused by: java.io.EOFException
    at org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:180) ~[dtest-4.0-beta3.jar:?]
    at org.apache.cassandra.utils.vint.VIntCoding.readUnsignedVInt(VIntCoding.java:68) ~[dtest-4.0-beta3.jar:?]
    at org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedVInt(RebufferingInputStream.java:243) ~[dtest-4.0-beta3.jar:?]
    at org.apache.cassandra.net.Message$Serializer.deserializeHeaderPost40(Message.java:694) ~[dtest-4.0-beta3.jar:?]
    at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:765) ~[dtest-4.0-beta3.jar:?]
    at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:625) ~[dtest-4.0-beta3.jar:?]
    at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:295) ~[dtest-4.0-beta3.jar:?]
    ... 7 more
{code}

  was:

{code}
java.lang.RuntimeException: Can not deserialize message org.apache.cassandra.distributed.impl.MessageImpl@4c46aead
    at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:299) ~[dtest-4.0-beta3.jar:?]
    at org.apache.cassandra.distributed.impl.Instance.lambda$receiveMessage$7(Instance.java:315) ~[dtest-4.0-beta3.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [dtest-4.0-beta3.jar:?]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
Caused by: java.io.EOFException
    at org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:180) ~[dtest-4.0-beta3.jar:?]
    at org.apache.cassandra.utils.vint.VIntCoding.readUnsignedVInt(VIntCoding.java:68) ~[dtest-4.0-beta3.jar:?]
    at org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedVInt(RebufferingInputStream.java:243) ~[dtest-4.0-beta3.jar:?]
    at org.apache.cassandra.net.Message$Serializer.deserializeHeaderPost40(Message.java:694) ~[dtest-4.0-beta3.jar:?]
    at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:765) ~[dtest-4.0-beta3.jar:?]
    at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:625) ~[dtest-4.0-beta3.jar:?]
    at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:295) ~[dtest-4.0-beta3.jar:?]
    ... 7 more
{code}


> 
> -
>
> Key: CASSANDRA-16157
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16157
> Project: Cassandra
> Issue Type: Bug
> Reporter: Alex Petrov
> Priority: Normal
>
> When upgrading from 3.0 to 4.0, we often run into a problem if an older
> node serves as the coordinator:
> {code}
>  15294 java.lang.RuntimeException: Can not deserialize message 
> org.apache.cassandra.distributed.impl.MessageImpl@4c46aead
>   15295 at 
> 

[jira] [Updated] (CASSANDRA-16205) Offline token allocation strategy generator tool

2020-10-12 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-16205:
---
Status: Patch Available  (was: In Progress)

> Offline token allocation strategy generator tool
> 
>
> Key: CASSANDRA-16205
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16205
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Config, Local/Scripts
> Reporter: Michael Semb Wever
> Assignee: Michael Semb Wever
> Priority: Normal
>
> A command line tool to generate tokens (using the 
> allocate_tokens_for_local_replication_factor algorithm) for pre-configuration 
> of {{initial_tokens}} in cassandra.yaml.
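
To make "pre-configuration" concrete, the Java sketch below turns a list of tokens into a single cassandra.yaml line. It assumes the yaml key is {{initial_token}} taking a comma-separated list, and it uses evenly spaced tokens over the signed 64-bit range as placeholder input. That spacing is NOT the allocate_tokens_for_local_replication_factor algorithm the ticket refers to, and the class {{InitialTokenSketch}} and its methods are hypothetical names, not part of the proposed tool.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class InitialTokenSketch {
    // Placeholder token source: evenly spaced values across the signed 64-bit range.
    // This stands in for the tool's output and is not the allocation algorithm
    // named in the ticket.
    static List<Long> evenlySpacedTokens(int count) {
        BigInteger range = BigInteger.ONE.shiftLeft(64);      // 2^64 possible tokens
        BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);  // -2^63
        List<Long> tokens = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            BigInteger offset = range.multiply(BigInteger.valueOf(i))
                                     .divide(BigInteger.valueOf(count));
            tokens.add(min.add(offset).longValueExact());
        }
        return tokens;
    }

    // Formats tokens as one yaml line (assumed key: initial_token, comma-separated).
    static String toYamlLine(List<Long> tokens) {
        return "initial_token: " + tokens.stream()
                                         .map(String::valueOf)
                                         .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(toYamlLine(evenlySpacedTokens(4)));
    }
}
```

The point of generating the line offline is that each node's tokens can be fixed in its configuration before the node first starts, instead of being chosen at bootstrap.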






[jira] [Comment Edited] (CASSANDRA-16205) Offline token allocation strategy generator tool

2020-10-12 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212139#comment-17212139
 ] 

Michael Semb Wever edited comment on CASSANDRA-16205 at 10/12/20, 6:11 AM:
---

rpm/deb packaging, and doc updates being added.
-Changing the script name to {{cassandra-generate-tokens}}-


was (Author: michaelsembwever):
rpm/deb packaging, and doc updates being added.
Changing the script name to {{cassandra-generate-tokens}}

> Offline token allocation strategy generator tool
> 
>
> Key: CASSANDRA-16205
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16205
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Config, Local/Scripts
> Reporter: Michael Semb Wever
> Assignee: Michael Semb Wever
> Priority: Normal
>
> A command line tool to generate tokens (using the 
> allocate_tokens_for_local_replication_factor algorithm) for pre-configuration 
> of {{initial_tokens}} in cassandra.yaml.






[jira] [Commented] (CASSANDRA-16205) Offline token allocation strategy generator tool

2020-10-12 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212139#comment-17212139
 ] 

Michael Semb Wever commented on CASSANDRA-16205:


rpm/deb packaging, and doc updates being added.
Changing the script name to {{cassandra-generate-tokens}}

> Offline token allocation strategy generator tool
> 
>
> Key: CASSANDRA-16205
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16205
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Config, Local/Scripts
> Reporter: Michael Semb Wever
> Assignee: Michael Semb Wever
> Priority: Normal
>
> A command line tool to generate tokens (using the 
> allocate_tokens_for_local_replication_factor algorithm) for pre-configuration 
> of {{initial_tokens}} in cassandra.yaml.






[jira] [Updated] (CASSANDRA-16205) Offline token allocation strategy generator tool

2020-10-12 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-16205:
---
Status: In Progress  (was: Patch Available)

> Offline token allocation strategy generator tool
> 
>
> Key: CASSANDRA-16205
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16205
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Config, Local/Scripts
> Reporter: Michael Semb Wever
> Assignee: Michael Semb Wever
> Priority: Normal
>
> A command line tool to generate tokens (using the 
> allocate_tokens_for_local_replication_factor algorithm) for pre-configuration 
> of {{initial_tokens}} in cassandra.yaml.


